Scene Understanding - 2024-12
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2024-12-31 | STORM: Spatio-Temporal Reconstruction Model for Large-Scale Outdoor Scenes | Jiawei Yang et.al. | 2501.00602 | translate | read | null |
| 2024-12-31 | Embodied VideoAgent: Persistent Memory from Egocentric Videos and Embodied Sensors Enables Dynamic Scene Understanding | Yue Fan et.al. | 2501.00358 | translate | read | null |
| 2024-12-31 | OVGaussian: Generalizable 3D Gaussian Segmentation with Open Vocabularies | Runnan Chen et.al. | 2501.00326 | translate | read | link |
| 2024-12-30 | Text-to-Image GAN with Pretrained Representations | Xiaozhou You et.al. | 2501.00116 | translate | read | null |
| 2024-12-30 | 4D Gaussian Splatting: Modeling Dynamic Scenes with Native 4D Primitives | Zeyu Yang et.al. | 2412.20720 | translate | read | null |
| 2024-12-27 | An Actionable Hierarchical Scene Representation Enhancing Autonomous Inspection Missions in Unknown Environments | Vignesh Kottayam Viswanathan et.al. | 2412.19582 | translate | read | null |
| 2024-12-27 | xFLIE: Leveraging Actionable Hierarchical Scene Representations for Autonomous Semantic-Aware Inspection Missions | Vignesh Kottayam Viswanathan et.al. | 2412.19571 | translate | read | link |
| 2024-12-27 | MLLM-SUL: Multimodal Large Language Model for Semantic Scene Understanding and Localization in Traffic Scenarios | Jiaqi Fan et.al. | 2412.19406 | translate | read | null |
| 2024-12-26 | Relation-aware Hierarchical Prompt for Open-vocabulary Scene Graph Generation | Tao Liu et.al. | 2412.19021 | translate | read | null |
| 2024-12-25 | 3DGraphLLM: Combining Semantic Graphs and Large Language Models for 3D Scene Understanding | Tatiana Zemskova et.al. | 2412.18450 | translate | read | link |
| 2024-12-24 | MR-COGraphs: Communication-efficient Multi-Robot Open-vocabulary Mapping System via 3D Scene Graphs | Qiuyi Gu et.al. | 2412.18381 | translate | read | null |
| 2024-12-24 | Parallel Neural Computing for Scene Understanding from LiDAR Perception in Autonomous Racing | Suwesh Prasad Sah et.al. | 2412.18165 | translate | read | link |
| 2024-12-24 | UniPLV: Towards Label-Efficient Open-World 3D Scene Understanding by Regional Visual Language Supervision | Yuru Wang et.al. | 2412.18131 | translate | read | null |
| 2024-12-24 | LangSurf: Language-Embedded Surface Gaussians for 3D Scene Understanding | Hao Li et.al. | 2412.17635 | translate | read | null |
| 2024-12-21 | Application of Multimodal Large Language Models in Autonomous Driving | Md Robiul Islam et.al. | 2412.16410 | translate | read | null |
| 2024-12-20 | Improving Object Detection for Time-Lapse Imagery Using Temporal Features in Wildlife Monitoring | Marcus Jenkins et.al. | 2412.16329 | translate | read | link |
| 2024-12-19 | AutoTrust: Benchmarking Trustworthiness in Large Vision Language Models for Autonomous Driving | Shuo Xing et.al. | 2412.15206 | translate | read | link |
| 2024-12-19 | ObjVariantEnsemble: Advancing Point Cloud LLM Evaluation in Challenging Scenes with Subtly Distinguished Objects | Qihang Cao et.al. | 2412.14837 | translate | read | null |
| 2024-12-19 | PC-BEV: An Efficient Polar-Cartesian BEV Fusion Framework for LiDAR Semantic Segmentation | Shoumeng Qiu et.al. | 2412.14821 | translate | read | link |
| 2024-12-18 | GAGS: Granularity-Aware Feature Distillation for Language Gaussian Splatting | Yuning Peng et.al. | 2412.13654 | translate | read | link |
| 2024-12-18 | RelationField: Relate Anything in Radiance Fields | Sebastian Koch et.al. | 2412.13652 | translate | read | null |
| 2024-12-18 | Multi-View Pedestrian Occupancy Prediction with a Novel Synthetic Dataset | Sithu Aung et.al. | 2412.13569 | translate | read | null |
| 2024-12-17 | RA-SGG: Retrieval-Augmented Scene Graph Generation Framework via Multi-Prototype Learning | Kanghoon Yoon et.al. | 2412.12788 | translate | read | link |
| 2024-12-18 | Dense Audio-Visual Event Localization under Cross-Modal Consistency and Multi-Temporal Granularity Collaboration | Ziheng Zhou et.al. | 2412.12628 | translate | read | null |
| 2024-12-17 | Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning | Qi Sun et.al. | 2412.11974 | translate | read | link |
| 2024-12-16 | DINO-Foresight: Looking into the Future with DINO | Efstathios Karypidis et.al. | 2412.11673 | translate | read | link |
| 2024-12-16 | An Enhanced Classification Method Based on Adaptive Multi-Scale Fusion for Long-tailed Multispectral Point Clouds | TianZhu Liu et.al. | 2412.11407 | translate | read | null |
| 2024-12-15 | SceneLLM: Implicit Language Reasoning in LLM for Dynamic Scene Graph Generation | Hang Zhang et.al. | 2412.11026 | translate | read | null |
| 2024-12-13 | SuperGSeg: Open-Vocabulary 3D Segmentation with Structured Super-Gaussians | Siyun Liang et.al. | 2412.10231 | translate | read | null |
| 2024-12-13 | Arbitrary Reading Order Scene Text Spotter with Local Semantics Guidance | Jiahao Lyu et.al. | 2412.10159 | translate | read | null |
| 2024-12-17 | WiseAD: Knowledge Augmented End-to-End Autonomous Driving with Vision-Language Model | Songyan Zhang et.al. | 2412.09951 | translate | read | link |
| 2024-12-12 | LIVE-GS: LLM Powers Interactive VR by Enhancing Gaussian Splatting | Haotian Mao et.al. | 2412.09176 | translate | read | null |
| 2024-12-11 | SLGaussian: Fast Language Gaussian Splatting in Sparse Views | Kangjie Chen et.al. | 2412.08331 | translate | read | null |
| 2024-12-11 | TGOSPA Metric Parameters Selection and Evaluation for Visual Multi-object Tracking | Jan Krejčí et.al. | 2412.08321 | translate | read | null |
| 2024-12-11 | THUD++: Large-Scale Dynamic Indoor Scene Dataset and Benchmark for Mobile Robots | Zeshun Li et.al. | 2412.08096 | translate | read | null |
| 2024-12-11 | MAGIC: Mastering Physical Adversarial Generation in Context through Collaborative LLM Agents | Yun Xing et.al. | 2412.08014 | translate | read | null |
| 2024-12-10 | Motion-aware Contrastive Learning for Temporal Panoptic Scene Graph Generation | Thong Thanh Nguyen et.al. | 2412.07160 | translate | read | null |
| 2024-12-11 | ProVision: Programmatically Scaling Vision-centric Instruction Data for Multimodal Language Models | Jieyu Zhang et.al. | 2412.07012 | translate | read | link |
| 2024-12-07 | Timely reliable Bayesian decision-making enabled using memristors | Lekai Song et.al. | 2412.06838 | translate | read | null |
| 2024-12-09 | Visual Lexicon: Rich Image Features in Language Space | XuDong Wang et.al. | 2412.06774 | translate | read | null |
| 2024-12-09 | LLaVA-SpaceSGG: Visual Instruct Tuning for Open-vocabulary Scene Graph Generation with Enhanced Spatial Relations | Mingjie Xu et.al. | 2412.06322 | translate | read | link |
| 2024-12-09 | Event fields: Capturing light fields at high speed, resolution, and dynamic range | Ziyuan Qu et.al. | 2412.06191 | translate | read | null |
| 2024-12-07 | TB-HSU: Hierarchical 3D Scene Understanding with Contextual Affordances | Wenting Xu et.al. | 2412.05596 | translate | read | null |
| 2024-12-06 | Stag-1: Towards Realistic 4D Driving Simulation with Video Generation Model | Lening Wang et.al. | 2412.05280 | translate | read | link |
| 2024-12-06 | EmbodiedOcc: Embodied 3D Occupancy Prediction for Vision-based Online Scene Understanding | Yuqi Wu et.al. | 2412.04380 | translate | read | link |
| 2024-12-04 | Designing DNNs for a trade-off between robustness and processing performance in embedded devices | Jon Gutiérrez-Zaballa et.al. | 2412.03682 | translate | read | null |
| 2024-12-04 | Assessing the performance of CT image denoisers using Laguerre-Gauss Channelized Hotelling Observer for lesion detection | Prabhat Kc et.al. | 2412.02920 | translate | read | null |
| 2024-12-03 | BYE: Build Your Encoder with One Sequence of Exploration Data for Long-Term Dynamic Scene Understanding | Chenguang Huang et.al. | 2412.02449 | translate | read | null |
| 2024-12-04 | SparseLGS: Sparse View Language Embedded Gaussian Splatting | Jun Hu et.al. | 2412.02245 | translate | read | null |
| 2024-12-02 | Occam’s LGS: A Simple Approach for Language Gaussian Splatting | Jiahuan Cheng et.al. | 2412.01807 | translate | read | null |
| 2024-12-02 | Holistic Understanding of 3D Scenes as Universal Scene Description | Anna-Maria Halacheva et.al. | 2412.01398 | translate | read | null |
| 2024-12-02 | LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences | Hongyan Zhi et.al. | 2412.01292 | translate | read | null |
| 2024-12-02 | A Semantic Communication System for Real-time 3D Reconstruction Tasks | Jiaxing Zhang et.al. | 2412.01191 | translate | read | null |
| 2024-12-02 | TextSSR: Diffusion-based Data Synthesis for Scene Text Recognition | Xingsong Ye et.al. | 2412.01137 | translate | read | link |
| 2024-12-01 | ChatSplat: 3D Conversational Gaussian Splatting | Hanlin Chen et.al. | 2412.00734 | translate | read | null |