Scene Understanding - 2024-08

| Publish Date | Title | Authors | PDF | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2024-08-31 | Leaky Wave Antenna-Equipped RF Chipless Tags for Orientation Estimation | Onel L. A. López et al. | 2409.00501 | translate | read | null |
| 2024-08-30 | UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios | Baichuan Zhou et al. | 2408.17267 | translate | read | link |
| 2024-08-30 | AdaptVision: Dynamic Input Scaling in MLLMs for Versatile Scene Understanding | Yonghui Wang et al. | 2408.16986 | translate | read | link |
| 2024-08-29 | DriveGenVLM: Real-world Video Generation for Vision Language Model based Autonomous Driving | Yongjie Fu et al. | 2408.16647 | translate | read | null |
| 2024-08-28 | Str-L Pose: Integrating Point and Structured Line for Relative Pose Estimation in Dual-Graph | Zherong Zhang et al. | 2408.15750 | translate | read | null |
| 2024-08-28 | RoboSense: Large-scale Dataset and Benchmark for Multi-sensor Low-speed Autonomous Driving | Haisheng Su et al. | 2408.15503 | translate | read | link |
| 2024-08-27 | Handling Geometric Domain Shifts in Semantic Segmentation of Surgical RGB and Hyperspectral Images | Silvia Seidlitz et al. | 2408.15373 | translate | read | link |
| 2024-08-27 | MTMamba++: Enhancing Multi-Task Dense Scene Understanding via Mamba-Based Decoders | Baijiong Lin et al. | 2408.15101 | translate | read | link |
| 2024-08-27 | Interactive Occlusion Boundary Estimation through Exploitation of Synthetic Data | Lintao Xu et al. | 2408.15038 | translate | read | null |
| 2024-08-27 | BOX3D: Lightweight Camera-LiDAR Fusion for 3D Object Detection and Localization | Mario A. V. Saucedo et al. | 2408.14941 | translate | read | null |
| 2024-08-27 | Platypus: A Generalized Specialist Model for Reading Text in Various Forms | Peng Wang et al. | 2408.14805 | translate | read | link |
| 2024-08-27 | RSTeller: Scaling Up Visual Language Modeling in Remote Sensing with Rich Linguistic Semantics from Openly Available Data and Large Language Models | Junyao Ge et al. | 2408.14744 | translate | read | link |
| 2024-08-26 | Ensemble Predicate Decoding for Unbiased Scene Graph Generation | Jiasong Feng et al. | 2408.14187 | translate | read | null |
| 2024-08-26 | FusionSAM: Latent Space driven Segment Anything Model for Multimodal Fusion and Segmentation | Daixun Li et al. | 2408.13980 | translate | read | null |
| 2024-08-25 | Making Large Language Models Better Planners with Reasoning-Decision Alignment | Zhijian Huang et al. | 2408.13890 | translate | read | null |
| 2024-08-25 | 3D-VirtFusion: Synthetic 3D Data Augmentation through Generative Diffusion Models and Controllable Editing | Shichao Dong et al. | 2408.13788 | translate | read | null |
| 2024-08-25 | Extremely Fine-Grained Visual Classification over Resembling Glyphs in the Wild | Fares Bougourzi et al. | 2408.13774 | translate | read | link |
| 2024-08-25 | SeeBelow: Sub-dermal 3D Reconstruction of Tumors with Surgical Robotic Palpation and Tactile Exploration | Raghava Uppuluri et al. | 2408.13699 | translate | read | null |
| 2024-08-21 | Exploring Scene Coherence for Semi-Supervised 3D Semantic Segmentation | Chuandong Liu et al. | 2408.11280 | translate | read | null |
| 2024-08-20 | OpenScan: A Benchmark for Generalized Open-Vocabulary 3D Scene Understanding | Youjun Zhao et al. | 2408.11030 | translate | read | link |
| 2024-08-19 | 3D-Aware Instance Segmentation and Tracking in Egocentric Videos | Yash Bhalgat et al. | 2408.09860 | translate | read | null |
| 2024-08-16 | Zero-Shot Dual-Path Integration Framework for Open-Vocabulary 3D Instance Segmentation | Tri Ton et al. | 2408.08591 | translate | read | null |
| 2024-08-15 | Towards Flexible Visual Relationship Segmentation | Fangrui Zhu et al. | 2408.08305 | translate | read | null |
| 2024-08-13 | SpectralGaussians: Semantic, spectral 3D Gaussian splatting for multi-spectral scene representation, visualization and analysis | Saptarshi Neil Sinha et al. | 2408.06975 | translate | read | null |
| 2024-08-13 | SceneGPT: A Language Model for 3D Scene Understanding | Shivam Chandhok et al. | 2408.06926 | translate | read | null |
| 2024-08-12 | HeLiMOS: A Dataset for Moving Object Segmentation in 3D Point Clouds From Heterogeneous LiDAR Sensors | Hyungtae Lim et al. | 2408.06328 | translate | read | null |
| 2024-08-11 | Decoder Pre-Training with only Text for Scene Text Recognition | Shuai Zhao et al. | 2408.05706 | translate | read | link |
| 2024-08-09 | Spherical World-Locking for Audio-Visual Localization in Egocentric Videos | Heeseung Yun et al. | 2408.05364 | translate | read | null |
| 2024-08-15 | DeepInteraction++: Multi-Modality Interaction for Autonomous Driving | Zeyu Yang et al. | 2408.05075 | translate | read | link |
| 2024-08-09 | Mesh-based Object Tracking for Dynamic Semantic 3D Scene Graphs via Ray Tracing | Lennart Niecksch et al. | 2408.04979 | translate | read | null |
| 2024-08-09 | Manipulable Semantic Components: a Computational Representation of Data Visualization Scenes | Zhicheng Liu et al. | 2408.04798 | translate | read | null |
| 2024-08-07 | Leveraging LLMs for Enhanced Open-Vocabulary 3D Scene Understanding in Autonomous Driving | Amirhosein Chahe et al. | 2408.03516 | translate | read | null |
| 2024-08-04 | LEGO: Self-Supervised Representation Learning for Scene Text Images | Yujin Ren et al. | 2408.02036 | translate | read | null |

(<a href=../Scene_Understanding.md>back to Scene Understanding</a>)