Scene Understanding - 2024-08
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2024-08-31 | Leaky Wave Antenna-Equipped RF Chipless Tags for Orientation Estimation | Onel L. A. López et al. | 2409.00501 | translate | read | null |
| 2024-08-30 | UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios | Baichuan Zhou et al. | 2408.17267 | translate | read | link |
| 2024-08-30 | AdaptVision: Dynamic Input Scaling in MLLMs for Versatile Scene Understanding | Yonghui Wang et al. | 2408.16986 | translate | read | link |
| 2024-08-29 | DriveGenVLM: Real-world Video Generation for Vision Language Model based Autonomous Driving | Yongjie Fu et al. | 2408.16647 | translate | read | null |
| 2024-08-28 | Str-L Pose: Integrating Point and Structured Line for Relative Pose Estimation in Dual-Graph | Zherong Zhang et al. | 2408.15750 | translate | read | null |
| 2024-08-28 | RoboSense: Large-scale Dataset and Benchmark for Multi-sensor Low-speed Autonomous Driving | Haisheng Su et al. | 2408.15503 | translate | read | link |
| 2024-08-27 | Handling Geometric Domain Shifts in Semantic Segmentation of Surgical RGB and Hyperspectral Images | Silvia Seidlitz et al. | 2408.15373 | translate | read | link |
| 2024-08-27 | MTMamba++: Enhancing Multi-Task Dense Scene Understanding via Mamba-Based Decoders | Baijiong Lin et al. | 2408.15101 | translate | read | link |
| 2024-08-27 | Interactive Occlusion Boundary Estimation through Exploitation of Synthetic Data | Lintao Xu et al. | 2408.15038 | translate | read | null |
| 2024-08-27 | BOX3D: Lightweight Camera-LiDAR Fusion for 3D Object Detection and Localization | Mario A. V. Saucedo et al. | 2408.14941 | translate | read | null |
| 2024-08-27 | Platypus: A Generalized Specialist Model for Reading Text in Various Forms | Peng Wang et al. | 2408.14805 | translate | read | link |
| 2024-08-27 | RSTeller: Scaling Up Visual Language Modeling in Remote Sensing with Rich Linguistic Semantics from Openly Available Data and Large Language Models | Junyao Ge et al. | 2408.14744 | translate | read | link |
| 2024-08-26 | Ensemble Predicate Decoding for Unbiased Scene Graph Generation | Jiasong Feng et al. | 2408.14187 | translate | read | null |
| 2024-08-26 | FusionSAM: Latent Space driven Segment Anything Model for Multimodal Fusion and Segmentation | Daixun Li et al. | 2408.13980 | translate | read | null |
| 2024-08-25 | Making Large Language Models Better Planners with Reasoning-Decision Alignment | Zhijian Huang et al. | 2408.13890 | translate | read | null |
| 2024-08-25 | 3D-VirtFusion: Synthetic 3D Data Augmentation through Generative Diffusion Models and Controllable Editing | Shichao Dong et al. | 2408.13788 | translate | read | null |
| 2024-08-25 | Extremely Fine-Grained Visual Classification over Resembling Glyphs in the Wild | Fares Bougourzi et al. | 2408.13774 | translate | read | link |
| 2024-08-25 | SeeBelow: Sub-dermal 3D Reconstruction of Tumors with Surgical Robotic Palpation and Tactile Exploration | Raghava Uppuluri et al. | 2408.13699 | translate | read | null |
| 2024-08-21 | Exploring Scene Coherence for Semi-Supervised 3D Semantic Segmentation | Chuandong Liu et al. | 2408.11280 | translate | read | null |
| 2024-08-20 | OpenScan: A Benchmark for Generalized Open-Vocabulary 3D Scene Understanding | Youjun Zhao et al. | 2408.11030 | translate | read | link |
| 2024-08-19 | 3D-Aware Instance Segmentation and Tracking in Egocentric Videos | Yash Bhalgat et al. | 2408.09860 | translate | read | null |
| 2024-08-16 | Zero-Shot Dual-Path Integration Framework for Open-Vocabulary 3D Instance Segmentation | Tri Ton et al. | 2408.08591 | translate | read | null |
| 2024-08-15 | Towards Flexible Visual Relationship Segmentation | Fangrui Zhu et al. | 2408.08305 | translate | read | null |
| 2024-08-13 | SpectralGaussians: Semantic, spectral 3D Gaussian splatting for multi-spectral scene representation, visualization and analysis | Saptarshi Neil Sinha et al. | 2408.06975 | translate | read | null |
| 2024-08-13 | SceneGPT: A Language Model for 3D Scene Understanding | Shivam Chandhok et al. | 2408.06926 | translate | read | null |
| 2024-08-12 | HeLiMOS: A Dataset for Moving Object Segmentation in 3D Point Clouds From Heterogeneous LiDAR Sensors | Hyungtae Lim et al. | 2408.06328 | translate | read | null |
| 2024-08-11 | Decoder Pre-Training with only Text for Scene Text Recognition | Shuai Zhao et al. | 2408.05706 | translate | read | link |
| 2024-08-09 | Spherical World-Locking for Audio-Visual Localization in Egocentric Videos | Heeseung Yun et al. | 2408.05364 | translate | read | null |
| 2024-08-15 | DeepInteraction++: Multi-Modality Interaction for Autonomous Driving | Zeyu Yang et al. | 2408.05075 | translate | read | link |
| 2024-08-09 | Mesh-based Object Tracking for Dynamic Semantic 3D Scene Graphs via Ray Tracing | Lennart Niecksch et al. | 2408.04979 | translate | read | null |
| 2024-08-09 | Manipulable Semantic Components: a Computational Representation of Data Visualization Scenes | Zhicheng Liu et al. | 2408.04798 | translate | read | null |
| 2024-08-07 | Leveraging LLMs for Enhanced Open-Vocabulary 3D Scene Understanding in Autonomous Driving | Amirhosein Chahe et al. | 2408.03516 | translate | read | null |
| 2024-08-04 | LEGO: Self-Supervised Representation Learning for Scene Text Images | Yujin Ren et al. | 2408.02036 | translate | read | null |