Scene Understanding - 2025-01
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2025-01-30 | Integrating LMM Planners and 3D Skill Policies for Generalizable Manipulation | Yuelei Li et al. | 2501.18733 | translate | read | null |
| 2025-01-30 | Efficient Interactive 3D Multi-Object Removal | Jingcheng Ni et al. | 2501.17636 | translate | read | null |
| 2025-01-29 | PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding | Wei Chow et al. | 2501.16411 | translate | read | link |
| 2025-01-26 | Ocean-OCR: Towards General OCR Application via a Vision-Language Model | Song Chen et al. | 2501.15558 | translate | read | link |
| 2025-01-26 | Unveiling the Potential of iMarkers: Invisible Fiducial Markers for Advanced Robotics | Ali Tourani et al. | 2501.15505 | translate | read | link |
| 2025-01-24 | HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation | Xin Zhou et al. | 2501.14729 | translate | read | link |
| 2025-01-24 | Scene Understanding Enabled Semantic Communication with Open Channel Coding | Zhe Xiang et al. | 2501.14520 | translate | read | null |
| 2025-01-23 | GeomGS: LiDAR-Guided Geometry-Aware Gaussian Splatting for Robot Localization | Jaewon Lee et al. | 2501.13417 | translate | read | null |
| 2025-01-22 | Neural Radiance Fields for the Real World: A Survey | Wenhui Xiao et al. | 2501.13104 | translate | read | null |
| 2025-01-22 | PSGSL: A Probabilistic Framework Integrating Semantic Scene Understanding and Gas Sensing for Gas Source Localization | Pepe Ojeda et al. | 2501.12812 | translate | read | null |
| 2025-01-20 | Dynamic Scene Understanding from Vision-Language Representations | Shahaf Pruss et al. | 2501.11653 | translate | read | null |
| 2025-01-20 | EndoChat: Grounded Multimodal Large Language Model for Endoscopic Surgery | Guankun Wang et al. | 2501.11347 | translate | read | link |
| 2025-01-20 | A Survey of World Models for Autonomous Driving | Tuo Feng et al. | 2501.11260 | translate | read | null |
| 2025-01-17 | A Vision-Language Framework for Multispectral Scene Representation Using Language-Grounded Features | Enes Karanfil et al. | 2501.10144 | translate | read | null |
| 2025-01-16 | CrossModalityDiffusion: Multi-Modal Novel View Synthesis with Unified Intermediate Representation | Alex Berian et al. | 2501.09838 | translate | read | link |
| 2025-01-16 | YETI (YET to Intervene) Proactive Interventions by Multimodal AI Agents in Augmented Reality Tasks | Saptarashmi Bandyopadhyay et al. | 2501.09355 | translate | read | null |
| 2025-01-15 | Embodied Scene Understanding for Vision Language Models via MetaVQA | Weizhen Wang et al. | 2501.09167 | translate | read | null |
| 2025-01-15 | GOTLoc: General Outdoor Text-based Localization Using Scene Graph Retrieval with OpenStreetMap | Donghwi Jung et al. | 2501.08575 | translate | read | link |
| 2025-01-14 | 3UR-LLM: An End-to-End Multimodal Large Language Model for 3D Scene Understanding | Haomiao Xiong et al. | 2501.07819 | translate | read | link |
| 2025-01-13 | Zero-Shot Scene Understanding for Automatic Target Recognition Using Large Vision-Language Models | Yasiru Ranasinghe et al. | 2501.07396 | translate | read | null |
| 2025-01-13 | Hierarchical Superpixel Segmentation via Structural Information Theory | Minhui Xie et al. | 2501.07069 | translate | read | link |
| 2025-01-12 | Application of Vision-Language Model to Pedestrians Behavior and Scene Understanding in Autonomous Driving | Haoxiang Gao et al. | 2501.06680 | translate | read | null |
| 2025-01-08 | NextStop: An Improved Tracker For Panoptic LIDAR Segmentation Data | Nirit Alkalay et al. | 2501.06235 | translate | read | null |
| 2025-01-10 | Self-Supervised Partial Cycle-Consistency for Multi-View Matching | Fedor Taggenbrock et al. | 2501.06000 | translate | read | link |
| 2025-01-10 | UniQ: Unified Decoder with Task-specific Queries for Efficient Scene Graph Generation | Xinyao Liao et al. | 2501.05687 | translate | read | null |
| 2025-01-09 | Vision-Language Models for Autonomous Driving: CLIP-Based Dynamic Scene Understanding | Mohammed Elhenawy et al. | 2501.05566 | translate | read | null |
| 2025-01-09 | A Systematic Literature Review on Deep Learning-based Depth Estimation in Computer Vision | Ali Rohan et al. | 2501.05147 | translate | read | null |
| 2025-01-08 | TADFormer : Task-Adaptive Dynamic Transformer for Efficient Multi-Task Learning | Seungmin Baek et al. | 2501.04293 | translate | read | null |
| 2025-01-07 | A Bayesian Modeling Framework for Estimation and Ground Segmentation of Cluttered Staircases | Prasanna Sriganesh et al. | 2501.04170 | translate | read | null |
| 2025-01-07 | LargeAD: Large-Scale Cross-Sensor Data Pretraining for Autonomous Driving | Lingdong Kong et al. | 2501.04005 | translate | read | null |
| 2025-01-07 | CL3DOR: Contrastive Learning for 3D Large Multimodal Models via Odds Ratio on High-Resolution Point Clouds | Keonwoo Kim et al. | 2501.03879 | translate | read | null |
| 2025-01-07 | Advancing the Understanding of Fine-Grained 3D Forest Structures using Digital Cousins and Simulation-to-Reality: Methods and Datasets | Jing Liu et al. | 2501.03637 | translate | read | null |
| 2025-01-03 | VideoLifter: Lifting Videos to 3D with Fast Hierarchical Stereo Alignment | Wenyan Cong et al. | 2501.01949 | translate | read | null |
| 2025-01-03 | IAM: Enhancing RGB-D Instance Segmentation with New Benchmarks | Aecheon Jung et al. | 2501.01685 | translate | read | link |
| 2025-01-09 | GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models | Zhangyang Qi et al. | 2501.01428 | translate | read | null |
| 2025-01-02 | 3D-LLaVA: Towards Generalist 3D LMMs with Omni Superpoint Transformer | Jiajun Deng et al. | 2501.01163 | translate | read | null |
| 2025-01-02 | Leverage Cross-Attention for End-to-End Open-Vocabulary Panoptic Reconstruction | Xuan Yu et al. | 2501.01119 | translate | read | null |