Scene Understanding - 2024-08 | Paper Arxiv Daily

Publish Date	Title	Authors	PDF	Code
2024-08-31	Leaky Wave Antenna-Equipped RF Chipless Tags for Orientation Estimation	Onel L. A. López et al.	2409.00501	null
2024-08-30	UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios	Baichuan Zhou et al.	2408.17267	link
2024-08-30	AdaptVision: Dynamic Input Scaling in MLLMs for Versatile Scene Understanding	Yonghui Wang et al.	2408.16986	link
2024-08-29	DriveGenVLM: Real-world Video Generation for Vision Language Model based Autonomous Driving	Yongjie Fu et al.	2408.16647	null
2024-08-28	Str-L Pose: Integrating Point and Structured Line for Relative Pose Estimation in Dual-Graph	Zherong Zhang et al.	2408.15750	null
2024-08-28	RoboSense: Large-scale Dataset and Benchmark for Multi-sensor Low-speed Autonomous Driving	Haisheng Su et al.	2408.15503	link
2024-08-27	Handling Geometric Domain Shifts in Semantic Segmentation of Surgical RGB and Hyperspectral Images	Silvia Seidlitz et al.	2408.15373	link
2024-08-27	MTMamba++: Enhancing Multi-Task Dense Scene Understanding via Mamba-Based Decoders	Baijiong Lin et al.	2408.15101	link
2024-08-27	Interactive Occlusion Boundary Estimation through Exploitation of Synthetic Data	Lintao Xu et al.	2408.15038	null
2024-08-27	BOX3D: Lightweight Camera-LiDAR Fusion for 3D Object Detection and Localization	Mario A. V. Saucedo et al.	2408.14941	null
2024-08-27	Platypus: A Generalized Specialist Model for Reading Text in Various Forms	Peng Wang et al.	2408.14805	link
2024-08-27	RSTeller: Scaling Up Visual Language Modeling in Remote Sensing with Rich Linguistic Semantics from Openly Available Data and Large Language Models	Junyao Ge et al.	2408.14744	link
2024-08-26	Ensemble Predicate Decoding for Unbiased Scene Graph Generation	Jiasong Feng et al.	2408.14187	null
2024-08-26	FusionSAM: Latent Space driven Segment Anything Model for Multimodal Fusion and Segmentation	Daixun Li et al.	2408.13980	null
2024-08-25	Making Large Language Models Better Planners with Reasoning-Decision Alignment	Zhijian Huang et al.	2408.13890	null
2024-08-25	3D-VirtFusion: Synthetic 3D Data Augmentation through Generative Diffusion Models and Controllable Editing	Shichao Dong et al.	2408.13788	null
2024-08-25	Extremely Fine-Grained Visual Classification over Resembling Glyphs in the Wild	Fares Bougourzi et al.	2408.13774	link
2024-08-25	SeeBelow: Sub-dermal 3D Reconstruction of Tumors with Surgical Robotic Palpation and Tactile Exploration	Raghava Uppuluri et al.	2408.13699	null
2024-08-21	Exploring Scene Coherence for Semi-Supervised 3D Semantic Segmentation	Chuandong Liu et al.	2408.11280	null
2024-08-20	OpenScan: A Benchmark for Generalized Open-Vocabulary 3D Scene Understanding	Youjun Zhao et al.	2408.11030	link
2024-08-19	3D-Aware Instance Segmentation and Tracking in Egocentric Videos	Yash Bhalgat et al.	2408.09860	null
2024-08-16	Zero-Shot Dual-Path Integration Framework for Open-Vocabulary 3D Instance Segmentation	Tri Ton et al.	2408.08591	null
2024-08-15	Towards Flexible Visual Relationship Segmentation	Fangrui Zhu et al.	2408.08305	null
2024-08-13	SpectralGaussians: Semantic, spectral 3D Gaussian splatting for multi-spectral scene representation, visualization and analysis	Saptarshi Neil Sinha et al.	2408.06975	null
2024-08-13	SceneGPT: A Language Model for 3D Scene Understanding	Shivam Chandhok et al.	2408.06926	null
2024-08-12	HeLiMOS: A Dataset for Moving Object Segmentation in 3D Point Clouds From Heterogeneous LiDAR Sensors	Hyungtae Lim et al.	2408.06328	null
2024-08-11	Decoder Pre-Training with only Text for Scene Text Recognition	Shuai Zhao et al.	2408.05706	link
2024-08-09	Spherical World-Locking for Audio-Visual Localization in Egocentric Videos	Heeseung Yun et al.	2408.05364	null
2024-08-15	DeepInteraction++: Multi-Modality Interaction for Autonomous Driving	Zeyu Yang et al.	2408.05075	link
2024-08-09	Mesh-based Object Tracking for Dynamic Semantic 3D Scene Graphs via Ray Tracing	Lennart Niecksch et al.	2408.04979	null
2024-08-09	Manipulable Semantic Components: a Computational Representation of Data Visualization Scenes	Zhicheng Liu et al.	2408.04798	null
2024-08-07	Leveraging LLMs for Enhanced Open-Vocabulary 3D Scene Understanding in Autonomous Driving	Amirhosein Chahe et al.	2408.03516	null
2024-08-04	LEGO: Self-Supervised Representation Learning for Scene Text Images	Yujin Ren et al.	2408.02036	null
