Scene Understanding - 2024-09

| Publish Date | Title | Authors | PDF | Translate | Read | Code |
| --- | --- | --- | --- | --- | --- | --- |
| 2024-09-30 | Class-Agnostic Visio-Temporal Scene Sketch Semantic Segmentation | Aleyna Kütük et al. | 2410.00266 | translate | read | null |
| 2024-09-30 | Procedure-Aware Surgical Video-language Pretraining with Hierarchical Knowledge Augmentation | Kun Yuan et al. | 2410.00263 | translate | read | link |
| 2024-09-30 | You Only Speak Once to See | Wenhao Yang et al. | 2409.18372 | translate | read | null |
| 2024-09-26 | LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness | Chenming Zhu et al. | 2409.18125 | translate | read | null |
| 2024-09-26 | Text Image Generation for Low-Resource Languages with Dual Translation Learning | Chihiro Noguchi et al. | 2409.17747 | translate | read | null |
| 2024-09-26 | Scene Understanding in Pick-and-Place Tasks: Analyzing Transformations Between Initial and Final Scenes | Seraj Ghasemi et al. | 2409.17720 | translate | read | null |
| 2024-09-24 | Open-World Object Detection with Instance Representation Learning | Sunoh Lee et al. | 2409.16073 | translate | read | null |
| 2024-09-24 | Learning Multiple Probabilistic Decisions from Latent World Model in Autonomous Driving | Lingyu Xiao et al. | 2409.15730 | translate | read | link |
| 2024-09-27 | Diffusion-based RGB-D Semantic Segmentation with Deformable Attention Transformer | Minh Bui et al. | 2409.15117 | translate | read | null |
| 2024-09-23 | An Adverse Weather-Immune Scheme with Unfolded Regularization and Foundation Model Knowledge Distillation for Street Scene Understanding | Wei-Bin Kou et al. | 2409.14737 | translate | read | null |
| 2024-09-22 | One Model for Two Tasks: Cooperatively Recognizing and Recovering Low-Resolution Scene Text Images by Iterative Mutual Guidance | Minyi Zhao et al. | 2409.14483 | translate | read | null |
| 2024-09-22 | Scene-Text Grounding for Text-Based Video Question Answering | Sheng Zhou et al. | 2409.14319 | translate | read | null |
| 2024-09-21 | MOSE: Monocular Semantic Reconstruction Using NeRF-Lifted Noisy Priors | Zhenhua Du et al. | 2409.14019 | translate | read | null |
| 2024-09-21 | Relevance-driven Decision Making for Safer and More Efficient Human Robot Collaboration | Xiaotong Zhang et al. | 2409.13998 | translate | read | null |
| 2024-09-21 | Enhanced Semantic Segmentation for Large-Scale and Imbalanced Point Clouds | Haoran Gong et al. | 2409.13983 | translate | read | null |
| 2024-09-19 | CLAIR-A: Leveraging Large Language Models to Judge Audio Captions | Tsung-Han Wu et al. | 2409.12962 | translate | read | link |
| 2024-09-18 | Towards Global Localization using Multi-Modal Object-Instance Re-Identification | Aneesh Chavan et al. | 2409.12002 | translate | read | null |
| 2024-09-18 | SpotLight: Robotic Scene Understanding through Interaction and Affordance Detection | Tim Engelbracht et al. | 2409.11870 | translate | read | null |
| 2024-09-18 | VL-Reader: Vision and Language Reconstructor is an Effective Scene Text Recognizer | Humen Zhong et al. | 2409.11656 | translate | read | null |
| 2024-09-18 | DAF-Net: A Dual-Branch Feature Decomposition Fusion Network with Domain Adaptive for Infrared and Visible Image Fusion | Jian Xu et al. | 2409.11642 | translate | read | link |
| 2024-09-16 | Video Token Sparsification for Efficient Multimodal LLMs in Autonomous Driving | Yunsheng Ma et al. | 2409.11182 | translate | read | null |
| 2024-09-16 | Point2Graph: An End-to-end Point Cloud-based 3D Open-Vocabulary Scene Graph for Robot Navigation | Yifan Xu et al. | 2409.10350 | translate | read | null |
| 2024-09-16 | Hydra-SGG: Hybrid Relation Assignment for One-stage Scene Graph Generation | Minghan Chen et al. | 2409.10262 | translate | read | null |
| 2024-09-15 | Semantic2D: A Semantic Dataset for 2D Lidar Semantic Segmentation | Zhanteng Xie et al. | 2409.09899 | translate | read | null |
| 2024-09-12 | LED: Light Enhanced Depth Estimation at Night | Simon de Moreau et al. | 2409.08031 | translate | read | link |
| 2024-09-12 | Relevance for Human Robot Collaboration | Xiaotong Zhang et al. | 2409.07753 | translate | read | null |
| 2024-09-10 | Towards Localizing Structural Elements: Merging Geometrical Detection with Semantic Verification in RGB-D Data | Ali Tourani et al. | 2409.06625 | translate | read | null |
| 2024-09-10 | Loss Distillation via Gradient Matching for Point Cloud Completion with Weighted Chamfer Distance | Fangzhou Lin et al. | 2409.06171 | translate | read | link |
| 2024-09-09 | Online 3D reconstruction and dense tracking in endoscopic videos | Michel Hayoz et al. | 2409.06037 | translate | read | link |
| 2024-09-08 | TanDepth: Leveraging Global DEMs for Metric Monocular Depth Estimation in UAVs | Horatiu Florea et al. | 2409.05142 | translate | read | null |
| 2024-09-06 | Future Does Matter: Boosting 3D Object Detection with Temporal Motion Estimation in Point Cloud Sequences | Rui Yu et al. | 2409.04390 | translate | read | null |
| 2024-09-06 | RCNet: Deep Recurrent Collaborative Network for Multi-View Low-Light Image Enhancement | Hao Luo et al. | 2409.04363 | translate | read | link |
| 2024-09-05 | Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding | Yunze Man et al. | 2409.03757 | translate | read | link |
| 2024-09-05 | Optimizing 3D Gaussian Splatting for Sparse Viewpoint Scene Reconstruction | Shen Chen et al. | 2409.03213 | translate | read | null |
| 2024-09-04 | Can LVLMs Obtain a Driver’s License? A Benchmark Towards Reliable AGI for Autonomous Driving | Yuhang Lu et al. | 2409.02914 | translate | read | null |
| 2024-09-03 | Unveiling Deep Shadows: A Survey on Image and Video Shadow Detection, Removal, and Generation in the Era of Deep Learning | Xiaowei Hu et al. | 2409.02108 | translate | read | link |
| 2024-09-03 | EPRecon: An Efficient Framework for Real-Time Panoptic 3D Reconstruction from Monocular Video | Zhen Zhou et al. | 2409.01807 | translate | read | link |
| 2024-09-03 | GaussianPU: A Hybrid 2D-3D Upsampling Framework for Enhancing Color Point Clouds via 3D Gaussian Splatting | Zixuan Guo et al. | 2409.01581 | translate | read | null |

(<a href=../Scene_Understanding.md>back to Scene Understanding</a>)