Scene Understanding - 2024-09
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2024-09-30 | Class-Agnostic Visio-Temporal Scene Sketch Semantic Segmentation | Aleyna Kütük et al. | 2410.00266 | translate | read | null |
| 2024-09-30 | Procedure-Aware Surgical Video-language Pretraining with Hierarchical Knowledge Augmentation | Kun Yuan et al. | 2410.00263 | translate | read | link |
| 2024-09-30 | You Only Speak Once to See | Wenhao Yang et al. | 2409.18372 | translate | read | null |
| 2024-09-26 | LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness | Chenming Zhu et al. | 2409.18125 | translate | read | null |
| 2024-09-26 | Text Image Generation for Low-Resource Languages with Dual Translation Learning | Chihiro Noguchi et al. | 2409.17747 | translate | read | null |
| 2024-09-26 | Scene Understanding in Pick-and-Place Tasks: Analyzing Transformations Between Initial and Final Scenes | Seraj Ghasemi et al. | 2409.17720 | translate | read | null |
| 2024-09-24 | Open-World Object Detection with Instance Representation Learning | Sunoh Lee et al. | 2409.16073 | translate | read | null |
| 2024-09-24 | Learning Multiple Probabilistic Decisions from Latent World Model in Autonomous Driving | Lingyu Xiao et al. | 2409.15730 | translate | read | link |
| 2024-09-27 | Diffusion-based RGB-D Semantic Segmentation with Deformable Attention Transformer | Minh Bui et al. | 2409.15117 | translate | read | null |
| 2024-09-23 | An Adverse Weather-Immune Scheme with Unfolded Regularization and Foundation Model Knowledge Distillation for Street Scene Understanding | Wei-Bin Kou et al. | 2409.14737 | translate | read | null |
| 2024-09-22 | One Model for Two Tasks: Cooperatively Recognizing and Recovering Low-Resolution Scene Text Images by Iterative Mutual Guidance | Minyi Zhao et al. | 2409.14483 | translate | read | null |
| 2024-09-22 | Scene-Text Grounding for Text-Based Video Question Answering | Sheng Zhou et al. | 2409.14319 | translate | read | null |
| 2024-09-21 | MOSE: Monocular Semantic Reconstruction Using NeRF-Lifted Noisy Priors | Zhenhua Du et al. | 2409.14019 | translate | read | null |
| 2024-09-21 | Relevance-driven Decision Making for Safer and More Efficient Human Robot Collaboration | Xiaotong Zhang et al. | 2409.13998 | translate | read | null |
| 2024-09-21 | Enhanced Semantic Segmentation for Large-Scale and Imbalanced Point Clouds | Haoran Gong et al. | 2409.13983 | translate | read | null |
| 2024-09-19 | CLAIR-A: Leveraging Large Language Models to Judge Audio Captions | Tsung-Han Wu et al. | 2409.12962 | translate | read | link |
| 2024-09-18 | Towards Global Localization using Multi-Modal Object-Instance Re-Identification | Aneesh Chavan et al. | 2409.12002 | translate | read | null |
| 2024-09-18 | SpotLight: Robotic Scene Understanding through Interaction and Affordance Detection | Tim Engelbracht et al. | 2409.11870 | translate | read | null |
| 2024-09-18 | VL-Reader: Vision and Language Reconstructor is an Effective Scene Text Recognizer | Humen Zhong et al. | 2409.11656 | translate | read | null |
| 2024-09-18 | DAF-Net: A Dual-Branch Feature Decomposition Fusion Network with Domain Adaptive for Infrared and Visible Image Fusion | Jian Xu et al. | 2409.11642 | translate | read | link |
| 2024-09-16 | Video Token Sparsification for Efficient Multimodal LLMs in Autonomous Driving | Yunsheng Ma et al. | 2409.11182 | translate | read | null |
| 2024-09-16 | Point2Graph: An End-to-end Point Cloud-based 3D Open-Vocabulary Scene Graph for Robot Navigation | Yifan Xu et al. | 2409.10350 | translate | read | null |
| 2024-09-16 | Hydra-SGG: Hybrid Relation Assignment for One-stage Scene Graph Generation | Minghan Chen et al. | 2409.10262 | translate | read | null |
| 2024-09-15 | Semantic2D: A Semantic Dataset for 2D Lidar Semantic Segmentation | Zhanteng Xie et al. | 2409.09899 | translate | read | null |
| 2024-09-12 | LED: Light Enhanced Depth Estimation at Night | Simon de Moreau et al. | 2409.08031 | translate | read | link |
| 2024-09-12 | Relevance for Human Robot Collaboration | Xiaotong Zhang et al. | 2409.07753 | translate | read | null |
| 2024-09-10 | Towards Localizing Structural Elements: Merging Geometrical Detection with Semantic Verification in RGB-D Data | Ali Tourani et al. | 2409.06625 | translate | read | null |
| 2024-09-10 | Loss Distillation via Gradient Matching for Point Cloud Completion with Weighted Chamfer Distance | Fangzhou Lin et al. | 2409.06171 | translate | read | link |
| 2024-09-09 | Online 3D reconstruction and dense tracking in endoscopic videos | Michel Hayoz et al. | 2409.06037 | translate | read | link |
| 2024-09-08 | TanDepth: Leveraging Global DEMs for Metric Monocular Depth Estimation in UAVs | Horatiu Florea et al. | 2409.05142 | translate | read | null |
| 2024-09-06 | Future Does Matter: Boosting 3D Object Detection with Temporal Motion Estimation in Point Cloud Sequences | Rui Yu et al. | 2409.04390 | translate | read | null |
| 2024-09-06 | RCNet: Deep Recurrent Collaborative Network for Multi-View Low-Light Image Enhancement | Hao Luo et al. | 2409.04363 | translate | read | link |
| 2024-09-05 | Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding | Yunze Man et al. | 2409.03757 | translate | read | link |
| 2024-09-05 | Optimizing 3D Gaussian Splatting for Sparse Viewpoint Scene Reconstruction | Shen Chen et al. | 2409.03213 | translate | read | null |
| 2024-09-04 | Can LVLMs Obtain a Driver’s License? A Benchmark Towards Reliable AGI for Autonomous Driving | Yuhang Lu et al. | 2409.02914 | translate | read | null |
| 2024-09-03 | Unveiling Deep Shadows: A Survey on Image and Video Shadow Detection, Removal, and Generation in the Era of Deep Learning | Xiaowei Hu et al. | 2409.02108 | translate | read | link |
| 2024-09-03 | EPRecon: An Efficient Framework for Real-Time Panoptic 3D Reconstruction from Monocular Video | Zhen Zhou et al. | 2409.01807 | translate | read | link |
| 2024-09-03 | GaussianPU: A Hybrid 2D-3D Upsampling Framework for Enhancing Color Point Clouds via 3D Gaussian Splatting | Zixuan Guo et al. | 2409.01581 | translate | read | null |