Scene Understanding - 2024-12 | Paper Arxiv Daily

Publish Date	Title	Authors	arXiv ID	Code
2024-12-31	STORM: Spatio-Temporal Reconstruction Model for Large-Scale Outdoor Scenes	Jiawei Yang et al.	2501.00602	null
2024-12-31	Embodied VideoAgent: Persistent Memory from Egocentric Videos and Embodied Sensors Enables Dynamic Scene Understanding	Yue Fan et al.	2501.00358	null
2024-12-31	OVGaussian: Generalizable 3D Gaussian Segmentation with Open Vocabularies	Runnan Chen et al.	2501.00326	link
2024-12-30	Text-to-Image GAN with Pretrained Representations	Xiaozhou You et al.	2501.00116	null
2024-12-30	4D Gaussian Splatting: Modeling Dynamic Scenes with Native 4D Primitives	Zeyu Yang et al.	2412.20720	null
2024-12-27	An Actionable Hierarchical Scene Representation Enhancing Autonomous Inspection Missions in Unknown Environments	Vignesh Kottayam Viswanathan et al.	2412.19582	null
2024-12-27	xFLIE: Leveraging Actionable Hierarchical Scene Representations for Autonomous Semantic-Aware Inspection Missions	Vignesh Kottayam Viswanathan et al.	2412.19571	link
2024-12-27	MLLM-SUL: Multimodal Large Language Model for Semantic Scene Understanding and Localization in Traffic Scenarios	Jiaqi Fan et al.	2412.19406	null
2024-12-26	Relation-aware Hierarchical Prompt for Open-vocabulary Scene Graph Generation	Tao Liu et al.	2412.19021	null
2024-12-25	3DGraphLLM: Combining Semantic Graphs and Large Language Models for 3D Scene Understanding	Tatiana Zemskova et al.	2412.18450	link
2024-12-24	MR-COGraphs: Communication-efficient Multi-Robot Open-vocabulary Mapping System via 3D Scene Graphs	Qiuyi Gu et al.	2412.18381	null
2024-12-24	Parallel Neural Computing for Scene Understanding from LiDAR Perception in Autonomous Racing	Suwesh Prasad Sah et al.	2412.18165	link
2024-12-24	UniPLV: Towards Label-Efficient Open-World 3D Scene Understanding by Regional Visual Language Supervision	Yuru Wang et al.	2412.18131	null
2024-12-24	LangSurf: Language-Embedded Surface Gaussians for 3D Scene Understanding	Hao Li et al.	2412.17635	null
2024-12-21	Application of Multimodal Large Language Models in Autonomous Driving	Md Robiul Islam et al.	2412.16410	null
2024-12-20	Improving Object Detection for Time-Lapse Imagery Using Temporal Features in Wildlife Monitoring	Marcus Jenkins et al.	2412.16329	link
2024-12-19	AutoTrust: Benchmarking Trustworthiness in Large Vision Language Models for Autonomous Driving	Shuo Xing et al.	2412.15206	link
2024-12-19	ObjVariantEnsemble: Advancing Point Cloud LLM Evaluation in Challenging Scenes with Subtly Distinguished Objects	Qihang Cao et al.	2412.14837	null
2024-12-19	PC-BEV: An Efficient Polar-Cartesian BEV Fusion Framework for LiDAR Semantic Segmentation	Shoumeng Qiu et al.	2412.14821	link
2024-12-18	GAGS: Granularity-Aware Feature Distillation for Language Gaussian Splatting	Yuning Peng et al.	2412.13654	link
2024-12-18	RelationField: Relate Anything in Radiance Fields	Sebastian Koch et al.	2412.13652	null
2024-12-18	Multi-View Pedestrian Occupancy Prediction with a Novel Synthetic Dataset	Sithu Aung et al.	2412.13569	null
2024-12-17	RA-SGG: Retrieval-Augmented Scene Graph Generation Framework via Multi-Prototype Learning	Kanghoon Yoon et al.	2412.12788	link
2024-12-18	Dense Audio-Visual Event Localization under Cross-Modal Consistency and Multi-Temporal Granularity Collaboration	Ziheng Zhou et al.	2412.12628	null
2024-12-17	Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning	Qi Sun et al.	2412.11974	link
2024-12-16	DINO-Foresight: Looking into the Future with DINO	Efstathios Karypidis et al.	2412.11673	link
2024-12-16	An Enhanced Classification Method Based on Adaptive Multi-Scale Fusion for Long-tailed Multispectral Point Clouds	TianZhu Liu et al.	2412.11407	null
2024-12-15	SceneLLM: Implicit Language Reasoning in LLM for Dynamic Scene Graph Generation	Hang Zhang et al.	2412.11026	null
2024-12-13	SuperGSeg: Open-Vocabulary 3D Segmentation with Structured Super-Gaussians	Siyun Liang et al.	2412.10231	null
2024-12-13	Arbitrary Reading Order Scene Text Spotter with Local Semantics Guidance	Jiahao Lyu et al.	2412.10159	null
2024-12-17	WiseAD: Knowledge Augmented End-to-End Autonomous Driving with Vision-Language Model	Songyan Zhang et al.	2412.09951	link
2024-12-12	LIVE-GS: LLM Powers Interactive VR by Enhancing Gaussian Splatting	Haotian Mao et al.	2412.09176	null
2024-12-11	SLGaussian: Fast Language Gaussian Splatting in Sparse Views	Kangjie Chen et al.	2412.08331	null
2024-12-11	TGOSPA Metric Parameters Selection and Evaluation for Visual Multi-object Tracking	Jan Krejčí et al.	2412.08321	null
2024-12-11	THUD++: Large-Scale Dynamic Indoor Scene Dataset and Benchmark for Mobile Robots	Zeshun Li et al.	2412.08096	null
2024-12-11	MAGIC: Mastering Physical Adversarial Generation in Context through Collaborative LLM Agents	Yun Xing et al.	2412.08014	null
2024-12-10	Motion-aware Contrastive Learning for Temporal Panoptic Scene Graph Generation	Thong Thanh Nguyen et al.	2412.07160	null
2024-12-11	ProVision: Programmatically Scaling Vision-centric Instruction Data for Multimodal Language Models	Jieyu Zhang et al.	2412.07012	link
2024-12-07	Timely reliable Bayesian decision-making enabled using memristors	Lekai Song et al.	2412.06838	null
2024-12-09	Visual Lexicon: Rich Image Features in Language Space	XuDong Wang et al.	2412.06774	null
2024-12-09	LLaVA-SpaceSGG: Visual Instruct Tuning for Open-vocabulary Scene Graph Generation with Enhanced Spatial Relations	Mingjie Xu et al.	2412.06322	link
2024-12-09	Event fields: Capturing light fields at high speed, resolution, and dynamic range	Ziyuan Qu et al.	2412.06191	null
2024-12-07	TB-HSU: Hierarchical 3D Scene Understanding with Contextual Affordances	Wenting Xu et al.	2412.05596	null
2024-12-06	Stag-1: Towards Realistic 4D Driving Simulation with Video Generation Model	Lening Wang et al.	2412.05280	link
2024-12-06	EmbodiedOcc: Embodied 3D Occupancy Prediction for Vision-based Online Scene Understanding	Yuqi Wu et al.	2412.04380	link
2024-12-04	Designing DNNs for a trade-off between robustness and processing performance in embedded devices	Jon Gutiérrez-Zaballa et al.	2412.03682	null
2024-12-04	Assessing the performance of CT image denoisers using Laguerre-Gauss Channelized Hotelling Observer for lesion detection	Prabhat Kc et al.	2412.02920	null
2024-12-03	BYE: Build Your Encoder with One Sequence of Exploration Data for Long-Term Dynamic Scene Understanding	Chenguang Huang et al.	2412.02449	null
2024-12-04	SparseLGS: Sparse View Language Embedded Gaussian Splatting	Jun Hu et al.	2412.02245	null
2024-12-02	Occam’s LGS: A Simple Approach for Language Gaussian Splatting	Jiahuan Cheng et al.	2412.01807	null
2024-12-02	Holistic Understanding of 3D Scenes as Universal Scene Description	Anna-Maria Halacheva et al.	2412.01398	null
2024-12-02	LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences	Hongyan Zhi et al.	2412.01292	null
2024-12-02	A Semantic Communication System for Real-time 3D Reconstruction Tasks	Jiaxing Zhang et al.	2412.01191	null
2024-12-02	TextSSR: Diffusion-based Data Synthesis for Scene Text Recognition	Xingsong Ye et al.	2412.01137	link
2024-12-01	ChatSplat: 3D Conversational Gaussian Splatting	Hanlin Chen et al.	2412.00734	null
