Scene Understanding - 2024-10
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2024-10-30 | UniRiT: Towards Few-Shot Non-Rigid Point Cloud Registration | Geng Li et al. | 2410.22909 | translate | read | null |
| 2024-10-30 | Situational Scene Graph for Structured Human-centric Situation Understanding | Chinthani Sugandhika et al. | 2410.22829 | translate | read | null |
| 2024-10-30 | Symbolic Graph Inference for Compound Scene Understanding | FNU Aryan et al. | 2410.22626 | translate | read | null |
| 2024-10-29 | Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving | Bo Jiang et al. | 2410.22313 | translate | read | link |
| 2024-10-26 | Towards Robust Algorithms for Surgical Phase Recognition via Digital Twin-based Scene Representation | Hao Ding et al. | 2410.20026 | translate | read | null |
| 2024-10-23 | Surgical Scene Segmentation by Transformer With Asymmetric Feature Enhancement | Cheng Yuan et al. | 2410.17642 | translate | read | link |
| 2024-10-22 | PerspectiveNet: Multi-View Perception for Dynamic Scene Understanding | Vinh Nguyen et al. | 2410.16824 | translate | read | null |
| 2024-10-20 | Scene Graph Generation with Role-Playing Large Language Models | Guikun Chen et al. | 2410.15364 | translate | read | null |
| 2024-10-20 | Large Language Models for Autonomous Driving (LLM4AD): Concept, Benchmark, Simulation, and Real-Vehicle Experiment | Can Cui et al. | 2410.15281 | translate | read | null |
| 2024-10-19 | Semantically Safe Robot Manipulation: From Semantic Scene Understanding to Motion Safeguards | Lukas Brunke et al. | 2410.15185 | translate | read | null |
| 2024-10-19 | Part-Whole Relational Fusion Towards Multi-Modal Scene Understanding | Yi Liu et al. | 2410.14944 | translate | read | link |
| 2024-10-17 | ARKit LabelMaker: A New Scale for Indoor 3D Scene Understanding | Guangda Ji et al. | 2410.13924 | translate | read | link |
| 2024-10-17 | VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding | Runsen Xu et al. | 2410.13860 | translate | read | link |
| 2024-10-16 | 3D Gaussian Splatting in Robotics: A Survey | Siting Zhu et al. | 2410.12262 | translate | read | null |
| 2024-10-17 | SAM-Guided Masked Token Prediction for 3D Scene Understanding | Zhimin Chen et al. | 2410.12158 | translate | read | null |
| 2024-10-16 | Leveraging Large Vision Language Model For Better Automatic Web GUI Testing | Siyi Wang et al. | 2410.12157 | translate | read | null |
| 2024-10-15 | MCTBench: Multimodal Cognition towards Text-Rich Visual Scenes Benchmark | Bin Shan et al. | 2410.11538 | translate | read | link |
| 2024-10-14 | 3DArticCyclists: Generating Simulated Dynamic 3D Cyclists for Human-Object Interaction (HOI) and Autonomous Driving Applications | Eduardo R. Corral-Soto et al. | 2410.10782 | translate | read | null |
| 2024-10-17 | Stratified Domain Adaptation: A Progressive Self-Training Approach for Scene Text Recognition | Kha Nhat Le et al. | 2410.09913 | translate | read | null |
| 2024-10-13 | LoLI-Street: Benchmarking Low-Light Image Enhancement and Beyond | Md Tanvir Islam et al. | 2410.09831 | translate | read | link |
| 2024-10-12 | Enhancing Single Image to 3D Generation using Gaussian Splatting and Hybrid Diffusion Priors | Hritam Basak et al. | 2410.09467 | translate | read | null |
| 2024-10-11 | Dual-AEB: Synergizing Rule-Based and Multimodal Large Language Models for Effective Emergency Braking | Wei Zhang et al. | 2410.08616 | translate | read | null |
| 2024-10-10 | A transition towards virtual representations of visual scenes | Américo Pereira et al. | 2410.07987 | translate | read | null |
| 2024-10-10 | RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation | Songming Liu et al. | 2410.07864 | translate | read | null |
| 2024-10-11 | Test-Time Intensity Consistency Adaptation for Shadow Detection | Leyi Zhu et al. | 2410.07695 | translate | read | null |
| 2024-10-10 | 3D Vision-Language Gaussian Splatting | Qucheng Peng et al. | 2410.07577 | translate | read | null |
| 2024-10-09 | Evaluating the Impact of Point Cloud Colorization on Semantic Segmentation Accuracy | Qinfeng Zhu et al. | 2410.06725 | translate | read | null |
| 2024-10-09 | Open-RGBT: Open-vocabulary RGB-T Zero-shot Semantic Segmentation in Open-world Environments | Meng Yu et al. | 2410.06626 | translate | read | null |
| 2024-10-08 | BoxMap: Efficient Structural Mapping and Navigation | Zili Wang et al. | 2410.06263 | translate | read | null |
| 2024-10-08 | OrionNav: Online Planning for Robot Autonomy with Context-Aware LLM and Open-Vocabulary Semantic Scene Graphs | Venkata Naren Devarakonda et al. | 2410.06239 | translate | read | null |
| 2024-10-07 | Resource-Efficient Multiview Perception: Integrating Semantic Masking with Masked Autoencoders | Kosta Dakic et al. | 2410.04817 | translate | read | null |
| 2024-10-07 | Diffusion Models in 3D Vision: A Survey | Zhen Wang et al. | 2410.04738 | translate | read | null |
| 2024-10-06 | In-Place Panoptic Radiance Field Segmentation with Perceptual Prior for 3D Scene Understanding | Shenghao Li et al. | 2410.04529 | translate | read | null |
| 2024-10-05 | ETHcavation: A Dataset and Pipeline for Panoptic Scene Understanding and Object Tracking in Dynamic Construction Environments | Lorenzo Terenzi et al. | 2410.04250 | translate | read | null |
| 2024-10-05 | Fast Object Detection with a Machine Learning Edge Device | Richard C. Rodriguez et al. | 2410.04173 | translate | read | null |
| 2024-10-04 | SPARTUN3D: Situated Spatial Understanding of 3D World in Large Language Models | Yue Zhang et al. | 2410.03878 | translate | read | null |
| 2024-10-03 | RESSCAL3D++: Joint Acquisition and Semantic Segmentation of 3D Point Clouds | Remco Royen et al. | 2410.02323 | translate | read | link |
| 2024-10-01 | A Critical Assessment of Visual Sound Source Localization Models Including Negative Audio | Xavier Juanola et al. | 2410.01020 | translate | read | link |
| 2024-10-02 | BehAV: Behavioral Rule Guided Autonomy Using VLMs for Robot Navigation in Outdoor Scenes | Kasun Weerakoon et al. | 2409.16484 | translate | read | null |
(<a href=../Scene_Understanding.md>back to Scene Understanding</a>)