Scene Understanding - 2024-07
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2024-07-31 | RoadFormer+: Delivering RGB-X Scene Parsing through Scale-Aware Information Decoupling and Advanced Heterogeneous Feature Fusion | Jianxin Huang et.al. | 2407.21631 | translate | read | null |
| 2024-07-31 | Voxel Scene Graph for Intracranial Hemorrhage | Antoine P. Sanner et.al. | 2407.21580 | translate | read | null |
| 2024-07-31 | A Plug-and-Play Method for Rare Human-Object Interactions Detection by Bridging Domain Gap | Lijun Zhang et.al. | 2407.21438 | translate | read | link |
| 2024-07-31 | DEF-oriCORN: efficient 3D scene understanding for robust language-directed manipulation without demonstrations | Dongwon Son et.al. | 2407.21267 | translate | read | null |
| 2024-07-30 | From Feature Importance to Natural Language Explanations Using LLMs with RAG | Sule Tekkesinoglu et.al. | 2407.20990 | translate | read | null |
| 2024-07-30 | Dynamic Scene Understanding through Object-Centric Voxelization and Neural Rendering | Yanpeng Zhao et.al. | 2407.20908 | translate | read | link |
| 2024-07-30 | NIS-SLAM: Neural Implicit Semantic RGB-D SLAM for 3D Consistent Scene Understanding | Hongjia Zhai et.al. | 2407.20853 | translate | read | null |
| 2024-07-29 | SANGRIA: Surgical Video Scene Graph Optimization for Surgical Workflow Prediction | Çağhan Köksal et.al. | 2407.20214 | translate | read | null |
| 2024-07-29 | Rethinking RGB-D Fusion for Semantic Segmentation in Surgical Datasets | Muhammad Abdullah Jamal et.al. | 2407.19714 | translate | read | null |
| 2024-07-28 | ASI-Seg: Audio-Driven Surgical Instrument Segmentation with Surgeon Intention Understanding | Zhen Chen et.al. | 2407.19435 | translate | read | link |
| 2024-07-27 | GP-VLS: A general-purpose vision language model for surgery | Samuel Schmidgall et.al. | 2407.19305 | translate | read | null |
| 2024-07-27 | Fine-Grained Scene Graph Generation via Sample-Level Bias Prediction | Yansheng Li et.al. | 2407.19259 | translate | read | null |
| 2024-07-26 | BCTR: Bidirectional Conditioning Transformer for Scene Graph Generation | Peng Hao et.al. | 2407.18715 | translate | read | null |
| 2024-07-26 | MOoSE: Multi-Orientation Sharing Experts for Open-set Scene Text Recognition | Chang Liu et.al. | 2407.18616 | translate | read | link |
| 2024-07-26 | Answerability Fields: Answerable Location Estimation via Diffusion Models | Daichi Azuma et.al. | 2407.18497 | translate | read | null |
| 2024-07-24 | 3D Question Answering for City Scene Understanding | Penglei Sun et.al. | 2407.17398 | translate | read | null |
| 2024-07-23 | Augmented Efficiency: Reducing Memory Footprint and Accelerating Inference for 3D Semantic Segmentation through Hybrid Vision | Aditya Krishnan et.al. | 2407.16102 | translate | read | null |
| 2024-07-25 | Semantic Diversity-aware Prototype-based Learning for Unbiased Scene Graph Generation | Jaehyeong Jeon et.al. | 2407.15396 | translate | read | link |
| 2024-07-21 | VideoGameBunny: Towards vision assistants for video games | Mohammad Reza Taesiri et.al. | 2407.15295 | translate | read | null |
| 2024-07-21 | Self-training Room Layout Estimation via Geometry-aware Ray-casting | Bolivar Solarte et.al. | 2407.15041 | translate | read | null |
| 2024-07-19 | A New Lightweight Hybrid Graph Convolutional Neural Network – CNN Scheme for Scene Classification using Object Detection Inference | Ayman Beghdadi et.al. | 2407.14658 | translate | read | null |
| 2024-07-19 | OpenSU3D: Open World 3D Scene Understanding using Foundation Models | Rafay Mohiuddin et.al. | 2407.14279 | translate | read | null |
| 2024-07-19 | MC-PanDA: Mask Confidence for Panoptic Domain Adaptation | Ivan Martinović et.al. | 2407.14110 | translate | read | link |
| 2024-07-19 | GaussianBeV: 3D Gaussian Representation meets Perception Models for BeV Segmentation | Florian Chabot et.al. | 2407.14108 | translate | read | null |
| 2024-07-18 | Training-Free Model Merging for Multi-target Domain Adaptation | Wenyi Li et.al. | 2407.13771 | translate | read | null |
| 2024-07-18 | General Geometry-aware Weakly Supervised 3D Object Detection | Guowen Zhang et.al. | 2407.13748 | translate | read | link |
| 2024-07-18 | Open Vocabulary 3D Scene Understanding via Geometry Guided Self-Distillation | Pengfei Wang et.al. | 2407.13362 | translate | read | null |
| 2024-07-17 | InfoNorm: Mutual Information Shaping of Normals for Sparse-View Reconstruction | Xulong Wang et.al. | 2407.12661 | translate | read | link |
| 2024-07-17 | Out of Length Text Recognition with Sub-String Matching | Yongkun Du et.al. | 2407.12317 | translate | read | link |
| 2024-07-17 | Dual-Hybrid Attention Network for Specular Highlight Removal | Xiaojiao Guo et.al. | 2407.12255 | translate | read | null |
| 2024-07-16 | Disentangled Acoustic Fields For Multimodal Physical Scene Understanding | Jie Yin et.al. | 2407.11333 | translate | read | null |
| 2024-07-15 | OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models | Zijian Zhou et.al. | 2407.11213 | translate | read | link |
| 2024-07-15 | No Train, all Gain: Self-Supervised Gradients Improve Deep Frozen Representations | Walter Simoncini et.al. | 2407.10964 | translate | read | link |
| 2024-07-18 | Benchmarking Vision Language Models for Cultural Understanding | Shravan Nayak et.al. | 2407.10920 | translate | read | null |
| 2024-07-14 | Shape2Scene: 3D Scene Representation Learning Through Pre-training on Shape Data | Tuo Feng et.al. | 2407.10200 | translate | read | link |
| 2024-07-13 | Dense Multimodal Alignment for Open-Vocabulary 3D Scene Understanding | Ruihuang Li et.al. | 2407.09781 | translate | read | null |
| 2024-07-12 | A Fair Ranking and New Model for Panoptic Scene Graph Generation | Julian Lorenz et.al. | 2407.09216 | translate | read | link |
| 2024-07-12 | From Easy to Hard: Learning Curricular Shape-aware Features for Robust Panoptic Scene Graph Generation | Hanrong Shi et.al. | 2407.09191 | translate | read | null |
| 2024-07-11 | BLOS-BEV: Navigation Map Enhanced Lane Segmentation Network, Beyond Line of Sight | Hang Wu et.al. | 2407.08526 | translate | read | null |
| 2024-07-10 | Pareto Low-Rank Adapters: Efficient Multi-Task Learning with Preferences | Nikolaos Dimitriadis et.al. | 2407.08056 | translate | read | null |
| 2024-07-10 | Swiss DINO: Efficient and Versatile Vision Framework for On-device Personal Object Search | Kirill Paramonov et.al. | 2407.07541 | translate | read | null |
| 2024-07-09 | Joint prototype and coefficient prediction for 3D instance segmentation | Remco Royen et.al. | 2407.06958 | translate | read | null |
| 2024-07-09 | LVLM-empowered Multi-modal Representation Learning for Visual Place Recognition | Teng Wang et.al. | 2407.06730 | translate | read | null |
| 2024-07-08 | Focus on the Whole Character: Discriminative Character Modeling for Scene Text Recognition | Bangbang Zhou et.al. | 2407.05562 | translate | read | link |
| 2024-07-07 | Self-supervised Learning via Cluster Distance Prediction for Operating Room Context Awareness | Idris Hamoud et.al. | 2407.05448 | translate | read | null |
| 2024-07-05 | Hybrid Primal Sketch: Combining Analogy, Qualitative Representations, and Computer Vision for Scene Understanding | Kenneth D. Forbus et.al. | 2407.04859 | translate | read | null |
| 2024-07-03 | A Unified Framework for 3D Scene Understanding | Wei Xu et.al. | 2407.03263 | translate | read | null |
| 2024-07-11 | Open Panoramic Segmentation | Junwei Zheng et.al. | 2407.02685 | translate | read | link |
| 2024-07-02 | MTMamba: Enhancing Multi-Task Dense Scene Understanding by Mamba-Based Decoders | Baijiong Lin et.al. | 2407.02228 | translate | read | link |
| 2024-07-02 | Multi-Grained Contrast for Data-Efficient Unsupervised Representation Learning | Chengchao Shen et.al. | 2407.02014 | translate | read | link |
| 2024-07-01 | PanopticRecon: Leverage Open-vocabulary Instance Segmentation for Zero-shot Panoptic Reconstruction | Xuan Yu et.al. | 2407.01349 | translate | read | null |
| 2024-07-01 | Mobile Robot Oriented Large-Scale Indoor Dataset for Dynamic Scene Understanding | Yifan Tang et.al. | 2406.19791 | translate | read | null |