Scene Understanding - 2024-06
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2024-06-30 | ESGNN: Towards Equivariant Scene Graph Neural Network for 3D Scene Understanding | Quang P. M. Pham et al. | 2407.00609 | translate | read | null |
| 2024-06-28 | EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting | Daiwei Zhang et al. | 2406.19811 | translate | read | null |
| 2024-06-28 | PPTFormer: Pseudo Multi-Perspective Transformer for UAV Segmentation | Deyi Ji et al. | 2406.19632 | translate | read | null |
| 2024-06-27 | Enhanced Data Transfer Cooperating with Artificial Triplets for Scene Graph Generation | KuanChao Chu et al. | 2406.19316 | translate | read | null |
| 2024-06-26 | 3D-MVP: 3D Multiview Pretraining for Robotic Manipulation | Shengyi Qian et al. | 2406.18158 | translate | read | null |
| 2024-06-24 | GPT-4V Explorations: Mining Autonomous Driving | Zixuan Li et al. | 2406.16817 | translate | read | null |
| 2024-06-25 | AudioBench: A Universal Benchmark for Audio Large Language Models | Bin Wang et al. | 2406.16020 | translate | read | link |
| 2024-06-20 | EvSegSNN: Neuromorphic Semantic Segmentation for Event Data | Dalia Hareb et al. | 2406.14178 | translate | read | null |
| 2024-06-19 | StableSemantics: A Synthetic Language-Vision Dataset of Semantic Representations in Naturalistic Images | Rushikesh Zawar et al. | 2406.13735 | translate | read | null |
| 2024-06-17 | DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features | Letian Wang et al. | 2406.12095 | translate | read | null |
| 2024-06-17 | Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding | Yunsong Wang et al. | 2406.11283 | translate | read | null |
| 2024-06-15 | PIG: Prompt Images Guidance for Night-Time Scene Parsing | Zhifeng Xie et al. | 2406.10531 | translate | read | link |
| 2024-06-14 | MapVision: CVPR 2024 Autonomous Grand Challenge Mapless Driving Tech Report | Zhongyu Yang et al. | 2406.10125 | translate | read | null |
| 2024-06-14 | SkySenseGPT: A Fine-Grained Instruction Tuning Dataset and Model for Remote Sensing Vision-Language Understanding | Junwei Luo et al. | 2406.10100 | translate | read | link |
| 2024-06-14 | A Two-Stage Masked Autoencoder Based Network for Indoor Depth Completion | Kailai Sun et al. | 2406.09792 | translate | read | link |
| 2024-06-13 | MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding | Fei Wang et al. | 2406.09411 | translate | read | link |
| 2024-06-13 | Scene Graph Generation in Large-Size VHR Satellite Imagery: A Large-Scale Dataset and A Context-Aware Approach | Yansheng Li et al. | 2406.09410 | translate | read | link |
| 2024-06-12 | Category-level Neural Field for Reconstruction of Partially Observed Objects in Indoor Environment | Taekbeom Lee et al. | 2406.08176 | translate | read | link |
| 2024-06-13 | A3VLM: Actionable Articulation-Aware Vision Language Model | Siyuan Huang et al. | 2406.07549 | translate | read | link |
| 2024-06-10 | ReCon1M: A Large-scale Benchmark Dataset for Relation Comprehension in Remote Sensing Imagery | Xian Sun et al. | 2406.06028 | translate | read | null |
| 2024-06-11 | LOP-Field: Brain-inspired Layout-Object-Position Fields for Robotic Scene Understanding | Jiawei Hou et al. | 2406.05985 | translate | read | null |
| 2024-06-08 | 1st Place Winner of the 2024 Pixel-level Video Understanding in the Wild (CVPR’24 PVUW) Challenge in Video Panoptic Segmentation and Best Long Video Consistency of Video Semantic Segmentation | Qingfeng Liu et al. | 2406.05352 | translate | read | null |
| 2024-06-06 | Semantic Similarity Score for Measuring Visual Similarity at Semantic Level | Senran Fan et al. | 2406.03865 | translate | read | null |
| 2024-06-04 | Radar Spectra-Language Model for Automotive Scene Parsing | Mariia Pushkareva et al. | 2406.02158 | translate | read | null |
| 2024-06-04 | Leveraging Predicate and Triplet Learning for Scene Graph Generation | Jiankai Li et al. | 2406.02038 | translate | read | link |
| 2024-06-04 | FastLGS: Speeding up Language Embedded Gaussians with Feature Grid Mapping | Yuzhou Ji et al. | 2406.01916 | translate | read | null |
| 2024-06-04 | PlanAgent: A Multi-modal Large Language Agent for Closed-loop Vehicle Motion Planning | Yupeng Zheng et al. | 2406.01587 | translate | read | null |
| 2024-06-03 | EAGLE: Efficient Adaptive Geometry-based Learning in Cross-view Understanding | Thanh-Dat Truong et al. | 2406.01429 | translate | read | null |
| 2024-06-03 | Object Aware Egocentric Online Action Detection | Joungbin An et al. | 2406.01079 | translate | read | null |
| 2024-06-03 | CYCLO: Cyclic Graph Transformer Approach to Multi-Object Relationship Modeling in Aerial Videos | Trong-Thuan Nguyen et al. | 2406.01029 | translate | read | null |
| 2024-06-02 | Compositional 4D Dynamic Scenes Understanding with Physics Priors for Video Question Answering | Xingrui Wang et al. | 2406.00622 | translate | read | link |
| 2024-06-02 | Semi-supervised Video Semantic Segmentation Using Unreliable Pseudo Labels for PVUW2024 | Biao Wu et al. | 2406.00587 | translate | read | null |
(<a href=../Scene_Understanding.md>back to Scene Understanding</a>)