Scene Understanding - 2024-10

| Publish Date | Title | Authors | PDF | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2024-10-30 | UniRiT: Towards Few-Shot Non-Rigid Point Cloud Registration | Geng Li et al. | 2410.22909 | translate | read | null |
| 2024-10-30 | Situational Scene Graph for Structured Human-centric Situation Understanding | Chinthani Sugandhika et al. | 2410.22829 | translate | read | null |
| 2024-10-30 | Symbolic Graph Inference for Compound Scene Understanding | FNU Aryan et al. | 2410.22626 | translate | read | null |
| 2024-10-29 | Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving | Bo Jiang et al. | 2410.22313 | translate | read | link |
| 2024-10-26 | Towards Robust Algorithms for Surgical Phase Recognition via Digital Twin-based Scene Representation | Hao Ding et al. | 2410.20026 | translate | read | null |
| 2024-10-23 | Surgical Scene Segmentation by Transformer With Asymmetric Feature Enhancement | Cheng Yuan et al. | 2410.17642 | translate | read | link |
| 2024-10-22 | PerspectiveNet: Multi-View Perception for Dynamic Scene Understanding | Vinh Nguyen et al. | 2410.16824 | translate | read | null |
| 2024-10-20 | Scene Graph Generation with Role-Playing Large Language Models | Guikun Chen et al. | 2410.15364 | translate | read | null |
| 2024-10-20 | Large Language Models for Autonomous Driving (LLM4AD): Concept, Benchmark, Simulation, and Real-Vehicle Experiment | Can Cui et al. | 2410.15281 | translate | read | null |
| 2024-10-19 | Semantically Safe Robot Manipulation: From Semantic Scene Understanding to Motion Safeguards | Lukas Brunke et al. | 2410.15185 | translate | read | null |
| 2024-10-19 | Part-Whole Relational Fusion Towards Multi-Modal Scene Understanding | Yi Liu et al. | 2410.14944 | translate | read | link |
| 2024-10-17 | ARKit LabelMaker: A New Scale for Indoor 3D Scene Understanding | Guangda Ji et al. | 2410.13924 | translate | read | link |
| 2024-10-17 | VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding | Runsen Xu et al. | 2410.13860 | translate | read | link |
| 2024-10-16 | 3D Gaussian Splatting in Robotics: A Survey | Siting Zhu et al. | 2410.12262 | translate | read | null |
| 2024-10-17 | SAM-Guided Masked Token Prediction for 3D Scene Understanding | Zhimin Chen et al. | 2410.12158 | translate | read | null |
| 2024-10-16 | Leveraging Large Vision Language Model For Better Automatic Web GUI Testing | Siyi Wang et al. | 2410.12157 | translate | read | null |
| 2024-10-15 | MCTBench: Multimodal Cognition towards Text-Rich Visual Scenes Benchmark | Bin Shan et al. | 2410.11538 | translate | read | link |
| 2024-10-14 | 3DArticCyclists: Generating Simulated Dynamic 3D Cyclists for Human-Object Interaction (HOI) and Autonomous Driving Applications | Eduardo R. Corral-Soto et al. | 2410.10782 | translate | read | null |
| 2024-10-17 | Stratified Domain Adaptation: A Progressive Self-Training Approach for Scene Text Recognition | Kha Nhat Le et al. | 2410.09913 | translate | read | null |
| 2024-10-13 | LoLI-Street: Benchmarking Low-Light Image Enhancement and Beyond | Md Tanvir Islam et al. | 2410.09831 | translate | read | link |
| 2024-10-12 | Enhancing Single Image to 3D Generation using Gaussian Splatting and Hybrid Diffusion Priors | Hritam Basak et al. | 2410.09467 | translate | read | null |
| 2024-10-11 | Dual-AEB: Synergizing Rule-Based and Multimodal Large Language Models for Effective Emergency Braking | Wei Zhang et al. | 2410.08616 | translate | read | null |
| 2024-10-10 | A transition towards virtual representations of visual scenes | Américo Pereira et al. | 2410.07987 | translate | read | null |
| 2024-10-10 | RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation | Songming Liu et al. | 2410.07864 | translate | read | null |
| 2024-10-11 | Test-Time Intensity Consistency Adaptation for Shadow Detection | Leyi Zhu et al. | 2410.07695 | translate | read | null |
| 2024-10-10 | 3D Vision-Language Gaussian Splatting | Qucheng Peng et al. | 2410.07577 | translate | read | null |
| 2024-10-09 | Evaluating the Impact of Point Cloud Colorization on Semantic Segmentation Accuracy | Qinfeng Zhu et al. | 2410.06725 | translate | read | null |
| 2024-10-09 | Open-RGBT: Open-vocabulary RGB-T Zero-shot Semantic Segmentation in Open-world Environments | Meng Yu et al. | 2410.06626 | translate | read | null |
| 2024-10-08 | BoxMap: Efficient Structural Mapping and Navigation | Zili Wang et al. | 2410.06263 | translate | read | null |
| 2024-10-08 | OrionNav: Online Planning for Robot Autonomy with Context-Aware LLM and Open-Vocabulary Semantic Scene Graphs | Venkata Naren Devarakonda et al. | 2410.06239 | translate | read | null |
| 2024-10-07 | Resource-Efficient Multiview Perception: Integrating Semantic Masking with Masked Autoencoders | Kosta Dakic et al. | 2410.04817 | translate | read | null |
| 2024-10-07 | Diffusion Models in 3D Vision: A Survey | Zhen Wang et al. | 2410.04738 | translate | read | null |
| 2024-10-06 | In-Place Panoptic Radiance Field Segmentation with Perceptual Prior for 3D Scene Understanding | Shenghao Li et al. | 2410.04529 | translate | read | null |
| 2024-10-05 | ETHcavation: A Dataset and Pipeline for Panoptic Scene Understanding and Object Tracking in Dynamic Construction Environments | Lorenzo Terenzi et al. | 2410.04250 | translate | read | null |
| 2024-10-05 | Fast Object Detection with a Machine Learning Edge Device | Richard C. Rodriguez et al. | 2410.04173 | translate | read | null |
| 2024-10-04 | SPARTUN3D: Situated Spatial Understanding of 3D World in Large Language Models | Yue Zhang et al. | 2410.03878 | translate | read | null |
| 2024-10-03 | RESSCAL3D++: Joint Acquisition and Semantic Segmentation of 3D Point Clouds | Remco Royen et al. | 2410.02323 | translate | read | link |
| 2024-10-01 | A Critical Assessment of Visual Sound Source Localization Models Including Negative Audio | Xavier Juanola et al. | 2410.01020 | translate | read | link |
| 2024-10-02 | BehAV: Behavioral Rule Guided Autonomy Using VLMs for Robot Navigation in Outdoor Scenes | Kasun Weerakoon et al. | 2409.16484 | translate | read | null |

(<a href=../Scene_Understanding.md>back to Scene Understanding</a>)