Scene Understanding - 2025-01

| Publish Date | Title | Authors | PDF | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2025-01-30 | Integrating LMM Planners and 3D Skill Policies for Generalizable Manipulation | Yuelei Li et al. | 2501.18733 | translate | read | null |
| 2025-01-30 | Efficient Interactive 3D Multi-Object Removal | Jingcheng Ni et al. | 2501.17636 | translate | read | null |
| 2025-01-29 | PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding | Wei Chow et al. | 2501.16411 | translate | read | link |
| 2025-01-26 | Ocean-OCR: Towards General OCR Application via a Vision-Language Model | Song Chen et al. | 2501.15558 | translate | read | link |
| 2025-01-26 | Unveiling the Potential of iMarkers: Invisible Fiducial Markers for Advanced Robotics | Ali Tourani et al. | 2501.15505 | translate | read | link |
| 2025-01-24 | HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation | Xin Zhou et al. | 2501.14729 | translate | read | link |
| 2025-01-24 | Scene Understanding Enabled Semantic Communication with Open Channel Coding | Zhe Xiang et al. | 2501.14520 | translate | read | null |
| 2025-01-23 | GeomGS: LiDAR-Guided Geometry-Aware Gaussian Splatting for Robot Localization | Jaewon Lee et al. | 2501.13417 | translate | read | null |
| 2025-01-22 | Neural Radiance Fields for the Real World: A Survey | Wenhui Xiao et al. | 2501.13104 | translate | read | null |
| 2025-01-22 | PSGSL: A Probabilistic Framework Integrating Semantic Scene Understanding and Gas Sensing for Gas Source Localization | Pepe Ojeda et al. | 2501.12812 | translate | read | null |
| 2025-01-20 | Dynamic Scene Understanding from Vision-Language Representations | Shahaf Pruss et al. | 2501.11653 | translate | read | null |
| 2025-01-20 | EndoChat: Grounded Multimodal Large Language Model for Endoscopic Surgery | Guankun Wang et al. | 2501.11347 | translate | read | link |
| 2025-01-20 | A Survey of World Models for Autonomous Driving | Tuo Feng et al. | 2501.11260 | translate | read | null |
| 2025-01-17 | A Vision-Language Framework for Multispectral Scene Representation Using Language-Grounded Features | Enes Karanfil et al. | 2501.10144 | translate | read | null |
| 2025-01-16 | CrossModalityDiffusion: Multi-Modal Novel View Synthesis with Unified Intermediate Representation | Alex Berian et al. | 2501.09838 | translate | read | link |
| 2025-01-16 | YETI (YET to Intervene) Proactive Interventions by Multimodal AI Agents in Augmented Reality Tasks | Saptarashmi Bandyopadhyay et al. | 2501.09355 | translate | read | null |
| 2025-01-15 | Embodied Scene Understanding for Vision Language Models via MetaVQA | Weizhen Wang et al. | 2501.09167 | translate | read | null |
| 2025-01-15 | GOTLoc: General Outdoor Text-based Localization Using Scene Graph Retrieval with OpenStreetMap | Donghwi Jung et al. | 2501.08575 | translate | read | link |
| 2025-01-14 | 3UR-LLM: An End-to-End Multimodal Large Language Model for 3D Scene Understanding | Haomiao Xiong et al. | 2501.07819 | translate | read | link |
| 2025-01-13 | Zero-Shot Scene Understanding for Automatic Target Recognition Using Large Vision-Language Models | Yasiru Ranasinghe et al. | 2501.07396 | translate | read | null |
| 2025-01-13 | Hierarchical Superpixel Segmentation via Structural Information Theory | Minhui Xie et al. | 2501.07069 | translate | read | link |
| 2025-01-12 | Application of Vision-Language Model to Pedestrians Behavior and Scene Understanding in Autonomous Driving | Haoxiang Gao et al. | 2501.06680 | translate | read | null |
| 2025-01-08 | NextStop: An Improved Tracker For Panoptic LIDAR Segmentation Data | Nirit Alkalay et al. | 2501.06235 | translate | read | null |
| 2025-01-10 | Self-Supervised Partial Cycle-Consistency for Multi-View Matching | Fedor Taggenbrock et al. | 2501.06000 | translate | read | link |
| 2025-01-10 | UniQ: Unified Decoder with Task-specific Queries for Efficient Scene Graph Generation | Xinyao Liao et al. | 2501.05687 | translate | read | null |
| 2025-01-09 | Vision-Language Models for Autonomous Driving: CLIP-Based Dynamic Scene Understanding | Mohammed Elhenawy et al. | 2501.05566 | translate | read | null |
| 2025-01-09 | A Systematic Literature Review on Deep Learning-based Depth Estimation in Computer Vision | Ali Rohan et al. | 2501.05147 | translate | read | null |
| 2025-01-08 | TADFormer : Task-Adaptive Dynamic Transformer for Efficient Multi-Task Learning | Seungmin Baek et al. | 2501.04293 | translate | read | null |
| 2025-01-07 | A Bayesian Modeling Framework for Estimation and Ground Segmentation of Cluttered Staircases | Prasanna Sriganesh et al. | 2501.04170 | translate | read | null |
| 2025-01-07 | LargeAD: Large-Scale Cross-Sensor Data Pretraining for Autonomous Driving | Lingdong Kong et al. | 2501.04005 | translate | read | null |
| 2025-01-07 | CL3DOR: Contrastive Learning for 3D Large Multimodal Models via Odds Ratio on High-Resolution Point Clouds | Keonwoo Kim et al. | 2501.03879 | translate | read | null |
| 2025-01-07 | Advancing the Understanding of Fine-Grained 3D Forest Structures using Digital Cousins and Simulation-to-Reality: Methods and Datasets | Jing Liu et al. | 2501.03637 | translate | read | null |
| 2025-01-03 | VideoLifter: Lifting Videos to 3D with Fast Hierarchical Stereo Alignment | Wenyan Cong et al. | 2501.01949 | translate | read | null |
| 2025-01-03 | IAM: Enhancing RGB-D Instance Segmentation with New Benchmarks | Aecheon Jung et al. | 2501.01685 | translate | read | link |
| 2025-01-09 | GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models | Zhangyang Qi et al. | 2501.01428 | translate | read | null |
| 2025-01-02 | 3D-LLaVA: Towards Generalist 3D LMMs with Omni Superpoint Transformer | Jiajun Deng et al. | 2501.01163 | translate | read | null |
| 2025-01-02 | Leverage Cross-Attention for End-to-End Open-Vocabulary Panoptic Reconstruction | Xuan Yu et al. | 2501.01119 | translate | read | null |

(<a href=../Scene_Understanding.md>back to Scene Understanding</a>)