Scene Understanding - 2024-07

| Publish Date | Title | Authors | PDF | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2024-07-31 | RoadFormer+: Delivering RGB-X Scene Parsing through Scale-Aware Information Decoupling and Advanced Heterogeneous Feature Fusion | Jianxin Huang et al. | 2407.21631 | translate | read | null |
| 2024-07-31 | Voxel Scene Graph for Intracranial Hemorrhage | Antoine P. Sanner et al. | 2407.21580 | translate | read | null |
| 2024-07-31 | A Plug-and-Play Method for Rare Human-Object Interactions Detection by Bridging Domain Gap | Lijun Zhang et al. | 2407.21438 | translate | read | link |
| 2024-07-31 | DEF-oriCORN: efficient 3D scene understanding for robust language-directed manipulation without demonstrations | Dongwon Son et al. | 2407.21267 | translate | read | null |
| 2024-07-30 | From Feature Importance to Natural Language Explanations Using LLMs with RAG | Sule Tekkesinoglu et al. | 2407.20990 | translate | read | null |
| 2024-07-30 | Dynamic Scene Understanding through Object-Centric Voxelization and Neural Rendering | Yanpeng Zhao et al. | 2407.20908 | translate | read | link |
| 2024-07-30 | NIS-SLAM: Neural Implicit Semantic RGB-D SLAM for 3D Consistent Scene Understanding | Hongjia Zhai et al. | 2407.20853 | translate | read | null |
| 2024-07-29 | SANGRIA: Surgical Video Scene Graph Optimization for Surgical Workflow Prediction | Çağhan Köksal et al. | 2407.20214 | translate | read | null |
| 2024-07-29 | Rethinking RGB-D Fusion for Semantic Segmentation in Surgical Datasets | Muhammad Abdullah Jamal et al. | 2407.19714 | translate | read | null |
| 2024-07-28 | ASI-Seg: Audio-Driven Surgical Instrument Segmentation with Surgeon Intention Understanding | Zhen Chen et al. | 2407.19435 | translate | read | link |
| 2024-07-27 | GP-VLS: A general-purpose vision language model for surgery | Samuel Schmidgall et al. | 2407.19305 | translate | read | null |
| 2024-07-27 | Fine-Grained Scene Graph Generation via Sample-Level Bias Prediction | Yansheng Li et al. | 2407.19259 | translate | read | null |
| 2024-07-26 | BCTR: Bidirectional Conditioning Transformer for Scene Graph Generation | Peng Hao et al. | 2407.18715 | translate | read | null |
| 2024-07-26 | MOoSE: Multi-Orientation Sharing Experts for Open-set Scene Text Recognition | Chang Liu et al. | 2407.18616 | translate | read | link |
| 2024-07-26 | Answerability Fields: Answerable Location Estimation via Diffusion Models | Daichi Azuma et al. | 2407.18497 | translate | read | null |
| 2024-07-24 | 3D Question Answering for City Scene Understanding | Penglei Sun et al. | 2407.17398 | translate | read | null |
| 2024-07-23 | Augmented Efficiency: Reducing Memory Footprint and Accelerating Inference for 3D Semantic Segmentation through Hybrid Vision | Aditya Krishnan et al. | 2407.16102 | translate | read | null |
| 2024-07-25 | Semantic Diversity-aware Prototype-based Learning for Unbiased Scene Graph Generation | Jaehyeong Jeon et al. | 2407.15396 | translate | read | link |
| 2024-07-21 | VideoGameBunny: Towards vision assistants for video games | Mohammad Reza Taesiri et al. | 2407.15295 | translate | read | null |
| 2024-07-21 | Self-training Room Layout Estimation via Geometry-aware Ray-casting | Bolivar Solarte et al. | 2407.15041 | translate | read | null |
| 2024-07-19 | A New Lightweight Hybrid Graph Convolutional Neural Network – CNN Scheme for Scene Classification using Object Detection Inference | Ayman Beghdadi et al. | 2407.14658 | translate | read | null |
| 2024-07-19 | OpenSU3D: Open World 3D Scene Understanding using Foundation Models | Rafay Mohiuddin et al. | 2407.14279 | translate | read | null |
| 2024-07-19 | MC-PanDA: Mask Confidence for Panoptic Domain Adaptation | Ivan Martinović et al. | 2407.14110 | translate | read | link |
| 2024-07-19 | GaussianBeV: 3D Gaussian Representation meets Perception Models for BeV Segmentation | Florian Chabot et al. | 2407.14108 | translate | read | null |
| 2024-07-18 | Training-Free Model Merging for Multi-target Domain Adaptation | Wenyi Li et al. | 2407.13771 | translate | read | null |
| 2024-07-18 | General Geometry-aware Weakly Supervised 3D Object Detection | Guowen Zhang et al. | 2407.13748 | translate | read | link |
| 2024-07-18 | Open Vocabulary 3D Scene Understanding via Geometry Guided Self-Distillation | Pengfei Wang et al. | 2407.13362 | translate | read | null |
| 2024-07-17 | InfoNorm: Mutual Information Shaping of Normals for Sparse-View Reconstruction | Xulong Wang et al. | 2407.12661 | translate | read | link |
| 2024-07-17 | Out of Length Text Recognition with Sub-String Matching | Yongkun Du et al. | 2407.12317 | translate | read | link |
| 2024-07-17 | Dual-Hybrid Attention Network for Specular Highlight Removal | Xiaojiao Guo et al. | 2407.12255 | translate | read | null |
| 2024-07-16 | Disentangled Acoustic Fields For Multimodal Physical Scene Understanding | Jie Yin et al. | 2407.11333 | translate | read | null |
| 2024-07-15 | OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models | Zijian Zhou et al. | 2407.11213 | translate | read | link |
| 2024-07-15 | No Train, all Gain: Self-Supervised Gradients Improve Deep Frozen Representations | Walter Simoncini et al. | 2407.10964 | translate | read | link |
| 2024-07-18 | Benchmarking Vision Language Models for Cultural Understanding | Shravan Nayak et al. | 2407.10920 | translate | read | null |
| 2024-07-14 | Shape2Scene: 3D Scene Representation Learning Through Pre-training on Shape Data | Tuo Feng et al. | 2407.10200 | translate | read | link |
| 2024-07-13 | Dense Multimodal Alignment for Open-Vocabulary 3D Scene Understanding | Ruihuang Li et al. | 2407.09781 | translate | read | null |
| 2024-07-12 | A Fair Ranking and New Model for Panoptic Scene Graph Generation | Julian Lorenz et al. | 2407.09216 | translate | read | link |
| 2024-07-12 | From Easy to Hard: Learning Curricular Shape-aware Features for Robust Panoptic Scene Graph Generation | Hanrong Shi et al. | 2407.09191 | translate | read | null |
| 2024-07-11 | BLOS-BEV: Navigation Map Enhanced Lane Segmentation Network, Beyond Line of Sight | Hang Wu et al. | 2407.08526 | translate | read | null |
| 2024-07-10 | Pareto Low-Rank Adapters: Efficient Multi-Task Learning with Preferences | Nikolaos Dimitriadis et al. | 2407.08056 | translate | read | null |
| 2024-07-10 | Swiss DINO: Efficient and Versatile Vision Framework for On-device Personal Object Search | Kirill Paramonov et al. | 2407.07541 | translate | read | null |
| 2024-07-09 | Joint prototype and coefficient prediction for 3D instance segmentation | Remco Royen et al. | 2407.06958 | translate | read | null |
| 2024-07-09 | LVLM-empowered Multi-modal Representation Learning for Visual Place Recognition | Teng Wang et al. | 2407.06730 | translate | read | null |
| 2024-07-08 | Focus on the Whole Character: Discriminative Character Modeling for Scene Text Recognition | Bangbang Zhou et al. | 2407.05562 | translate | read | link |
| 2024-07-07 | Self-supervised Learning via Cluster Distance Prediction for Operating Room Context Awareness | Idris Hamoud et al. | 2407.05448 | translate | read | null |
| 2024-07-05 | Hybrid Primal Sketch: Combining Analogy, Qualitative Representations, and Computer Vision for Scene Understanding | Kenneth D. Forbus et al. | 2407.04859 | translate | read | null |
| 2024-07-03 | A Unified Framework for 3D Scene Understanding | Wei Xu et al. | 2407.03263 | translate | read | null |
| 2024-07-11 | Open Panoramic Segmentation | Junwei Zheng et al. | 2407.02685 | translate | read | link |
| 2024-07-02 | MTMamba: Enhancing Multi-Task Dense Scene Understanding by Mamba-Based Decoders | Baijiong Lin et al. | 2407.02228 | translate | read | link |
| 2024-07-02 | Multi-Grained Contrast for Data-Efficient Unsupervised Representation Learning | Chengchao Shen et al. | 2407.02014 | translate | read | link |
| 2024-07-01 | PanopticRecon: Leverage Open-vocabulary Instance Segmentation for Zero-shot Panoptic Reconstruction | Xuan Yu et al. | 2407.01349 | translate | read | null |
| 2024-07-01 | Mobile Robot Oriented Large-Scale Indoor Dataset for Dynamic Scene Understanding | Yifan Tang et al. | 2406.19791 | translate | read | null |

(<a href=../Scene_Understanding.md>back to Scene Understanding</a>)