Scene Understanding - 2024-12

| Publish Date | Title | Authors | PDF | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2024-12-31 | STORM: Spatio-Temporal Reconstruction Model for Large-Scale Outdoor Scenes | Jiawei Yang et al. | 2501.00602 | translate | read | null |
| 2024-12-31 | Embodied VideoAgent: Persistent Memory from Egocentric Videos and Embodied Sensors Enables Dynamic Scene Understanding | Yue Fan et al. | 2501.00358 | translate | read | null |
| 2024-12-31 | OVGaussian: Generalizable 3D Gaussian Segmentation with Open Vocabularies | Runnan Chen et al. | 2501.00326 | translate | read | link |
| 2024-12-30 | Text-to-Image GAN with Pretrained Representations | Xiaozhou You et al. | 2501.00116 | translate | read | null |
| 2024-12-30 | 4D Gaussian Splatting: Modeling Dynamic Scenes with Native 4D Primitives | Zeyu Yang et al. | 2412.20720 | translate | read | null |
| 2024-12-27 | An Actionable Hierarchical Scene Representation Enhancing Autonomous Inspection Missions in Unknown Environments | Vignesh Kottayam Viswanathan et al. | 2412.19582 | translate | read | null |
| 2024-12-27 | xFLIE: Leveraging Actionable Hierarchical Scene Representations for Autonomous Semantic-Aware Inspection Missions | Vignesh Kottayam Viswanathan et al. | 2412.19571 | translate | read | link |
| 2024-12-27 | MLLM-SUL: Multimodal Large Language Model for Semantic Scene Understanding and Localization in Traffic Scenarios | Jiaqi Fan et al. | 2412.19406 | translate | read | null |
| 2024-12-26 | Relation-aware Hierarchical Prompt for Open-vocabulary Scene Graph Generation | Tao Liu et al. | 2412.19021 | translate | read | null |
| 2024-12-25 | 3DGraphLLM: Combining Semantic Graphs and Large Language Models for 3D Scene Understanding | Tatiana Zemskova et al. | 2412.18450 | translate | read | link |
| 2024-12-24 | MR-COGraphs: Communication-efficient Multi-Robot Open-vocabulary Mapping System via 3D Scene Graphs | Qiuyi Gu et al. | 2412.18381 | translate | read | null |
| 2024-12-24 | Parallel Neural Computing for Scene Understanding from LiDAR Perception in Autonomous Racing | Suwesh Prasad Sah et al. | 2412.18165 | translate | read | link |
| 2024-12-24 | UniPLV: Towards Label-Efficient Open-World 3D Scene Understanding by Regional Visual Language Supervision | Yuru Wang et al. | 2412.18131 | translate | read | null |
| 2024-12-24 | LangSurf: Language-Embedded Surface Gaussians for 3D Scene Understanding | Hao Li et al. | 2412.17635 | translate | read | null |
| 2024-12-21 | Application of Multimodal Large Language Models in Autonomous Driving | Md Robiul Islam et al. | 2412.16410 | translate | read | null |
| 2024-12-20 | Improving Object Detection for Time-Lapse Imagery Using Temporal Features in Wildlife Monitoring | Marcus Jenkins et al. | 2412.16329 | translate | read | link |
| 2024-12-19 | AutoTrust: Benchmarking Trustworthiness in Large Vision Language Models for Autonomous Driving | Shuo Xing et al. | 2412.15206 | translate | read | link |
| 2024-12-19 | ObjVariantEnsemble: Advancing Point Cloud LLM Evaluation in Challenging Scenes with Subtly Distinguished Objects | Qihang Cao et al. | 2412.14837 | translate | read | null |
| 2024-12-19 | PC-BEV: An Efficient Polar-Cartesian BEV Fusion Framework for LiDAR Semantic Segmentation | Shoumeng Qiu et al. | 2412.14821 | translate | read | link |
| 2024-12-18 | GAGS: Granularity-Aware Feature Distillation for Language Gaussian Splatting | Yuning Peng et al. | 2412.13654 | translate | read | link |
| 2024-12-18 | RelationField: Relate Anything in Radiance Fields | Sebastian Koch et al. | 2412.13652 | translate | read | null |
| 2024-12-18 | Multi-View Pedestrian Occupancy Prediction with a Novel Synthetic Dataset | Sithu Aung et al. | 2412.13569 | translate | read | null |
| 2024-12-17 | RA-SGG: Retrieval-Augmented Scene Graph Generation Framework via Multi-Prototype Learning | Kanghoon Yoon et al. | 2412.12788 | translate | read | link |
| 2024-12-18 | Dense Audio-Visual Event Localization under Cross-Modal Consistency and Multi-Temporal Granularity Collaboration | Ziheng Zhou et al. | 2412.12628 | translate | read | null |
| 2024-12-17 | Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning | Qi Sun et al. | 2412.11974 | translate | read | link |
| 2024-12-16 | DINO-Foresight: Looking into the Future with DINO | Efstathios Karypidis et al. | 2412.11673 | translate | read | link |
| 2024-12-16 | An Enhanced Classification Method Based on Adaptive Multi-Scale Fusion for Long-tailed Multispectral Point Clouds | TianZhu Liu et al. | 2412.11407 | translate | read | null |
| 2024-12-15 | SceneLLM: Implicit Language Reasoning in LLM for Dynamic Scene Graph Generation | Hang Zhang et al. | 2412.11026 | translate | read | null |
| 2024-12-13 | SuperGSeg: Open-Vocabulary 3D Segmentation with Structured Super-Gaussians | Siyun Liang et al. | 2412.10231 | translate | read | null |
| 2024-12-13 | Arbitrary Reading Order Scene Text Spotter with Local Semantics Guidance | Jiahao Lyu et al. | 2412.10159 | translate | read | null |
| 2024-12-17 | WiseAD: Knowledge Augmented End-to-End Autonomous Driving with Vision-Language Model | Songyan Zhang et al. | 2412.09951 | translate | read | link |
| 2024-12-12 | LIVE-GS: LLM Powers Interactive VR by Enhancing Gaussian Splatting | Haotian Mao et al. | 2412.09176 | translate | read | null |
| 2024-12-11 | SLGaussian: Fast Language Gaussian Splatting in Sparse Views | Kangjie Chen et al. | 2412.08331 | translate | read | null |
| 2024-12-11 | TGOSPA Metric Parameters Selection and Evaluation for Visual Multi-object Tracking | Jan Krejčí et al. | 2412.08321 | translate | read | null |
| 2024-12-11 | THUD++: Large-Scale Dynamic Indoor Scene Dataset and Benchmark for Mobile Robots | Zeshun Li et al. | 2412.08096 | translate | read | null |
| 2024-12-11 | MAGIC: Mastering Physical Adversarial Generation in Context through Collaborative LLM Agents | Yun Xing et al. | 2412.08014 | translate | read | null |
| 2024-12-10 | Motion-aware Contrastive Learning for Temporal Panoptic Scene Graph Generation | Thong Thanh Nguyen et al. | 2412.07160 | translate | read | null |
| 2024-12-11 | ProVision: Programmatically Scaling Vision-centric Instruction Data for Multimodal Language Models | Jieyu Zhang et al. | 2412.07012 | translate | read | link |
| 2024-12-07 | Timely reliable Bayesian decision-making enabled using memristors | Lekai Song et al. | 2412.06838 | translate | read | null |
| 2024-12-09 | Visual Lexicon: Rich Image Features in Language Space | XuDong Wang et al. | 2412.06774 | translate | read | null |
| 2024-12-09 | LLaVA-SpaceSGG: Visual Instruct Tuning for Open-vocabulary Scene Graph Generation with Enhanced Spatial Relations | Mingjie Xu et al. | 2412.06322 | translate | read | link |
| 2024-12-09 | Event fields: Capturing light fields at high speed, resolution, and dynamic range | Ziyuan Qu et al. | 2412.06191 | translate | read | null |
| 2024-12-07 | TB-HSU: Hierarchical 3D Scene Understanding with Contextual Affordances | Wenting Xu et al. | 2412.05596 | translate | read | null |
| 2024-12-06 | Stag-1: Towards Realistic 4D Driving Simulation with Video Generation Model | Lening Wang et al. | 2412.05280 | translate | read | link |
| 2024-12-06 | EmbodiedOcc: Embodied 3D Occupancy Prediction for Vision-based Online Scene Understanding | Yuqi Wu et al. | 2412.04380 | translate | read | link |
| 2024-12-04 | Designing DNNs for a trade-off between robustness and processing performance in embedded devices | Jon Gutiérrez-Zaballa et al. | 2412.03682 | translate | read | null |
| 2024-12-04 | Assessing the performance of CT image denoisers using Laguerre-Gauss Channelized Hotelling Observer for lesion detection | Prabhat Kc et al. | 2412.02920 | translate | read | null |
| 2024-12-03 | BYE: Build Your Encoder with One Sequence of Exploration Data for Long-Term Dynamic Scene Understanding | Chenguang Huang et al. | 2412.02449 | translate | read | null |
| 2024-12-04 | SparseLGS: Sparse View Language Embedded Gaussian Splatting | Jun Hu et al. | 2412.02245 | translate | read | null |
| 2024-12-02 | Occam’s LGS: A Simple Approach for Language Gaussian Splatting | Jiahuan Cheng et al. | 2412.01807 | translate | read | null |
| 2024-12-02 | Holistic Understanding of 3D Scenes as Universal Scene Description | Anna-Maria Halacheva et al. | 2412.01398 | translate | read | null |
| 2024-12-02 | LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences | Hongyan Zhi et al. | 2412.01292 | translate | read | null |
| 2024-12-02 | A Semantic Communication System for Real-time 3D Reconstruction Tasks | Jiaxing Zhang et al. | 2412.01191 | translate | read | null |
| 2024-12-02 | TextSSR: Diffusion-based Data Synthesis for Scene Text Recognition | Xingsong Ye et al. | 2412.01137 | translate | read | link |
| 2024-12-01 | ChatSplat: 3D Conversational Gaussian Splatting | Hanlin Chen et al. | 2412.00734 | translate | read | null |

(<a href=../Scene_Understanding.md>back to Scene Understanding</a>)