Scene Understanding - 2024-04
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2024-04-29 | Q-GroundCAM: Quantifying Grounding in Vision Language Models via GradCAM | Navid Rajabi et al. | 2404.19128 | translate | read | null |
| 2024-04-29 | Compositional Factorization of Visual Scenes with Convolutional Sparse Coding and Resonator Networks | Christopher J. Kymn et al. | 2404.19126 | translate | read | null |
| 2024-04-24 | Seeing Beyond Classes: Zero-Shot Grounded Situation Recognition via Language Explainer | Jiaming Lei et al. | 2404.15785 | translate | read | null |
| 2024-04-22 | CloudFort: Enhancing Robustness of 3D Point Cloud Classification Against Backdoor Attacks via Spatial Partitioning and Ensemble Prediction | Wenhao Lan et al. | 2404.14042 | translate | read | null |
| 2024-04-22 | On Support Relations Inference and Scene Hierarchy Graph Construction from Point Cloud in Clustered Environments | Gang Ma et al. | 2404.13842 | translate | read | null |
| 2024-04-29 | Clio: Real-time Task-Driven Open-Set 3D Scene Graphs | Dominic Maggio et al. | 2404.13696 | translate | read | link |
| 2024-04-19 | BACS: Background Aware Continual Semantic Segmentation | Mostafa ElAraby et al. | 2404.13148 | translate | read | link |
| 2024-04-19 | Unified Scene Representation and Reconstruction for 3D Large Language Models | Tao Chu et al. | 2404.13044 | translate | read | null |
| 2024-04-18 | SPIdepth: Strengthened Pose Information for Self-supervised Monocular Depth Estimation | Mykola Lavreniuk et al. | 2404.12501 | translate | read | link |
| 2024-04-19 | AccidentBlip2: Accident Detection With Multi-View MotionBlip2 | Yihua Shao et al. | 2404.12149 | translate | read | link |
| 2024-04-17 | Multimodal 3D Object Detection on Unseen Domains | Deepti Hegde et al. | 2404.11764 | translate | read | null |
| 2024-04-16 | ECLAIR: A High-Fidelity Aerial LiDAR Dataset for Semantic Segmentation | Iaroslav Melekhov et al. | 2404.10699 | translate | read | link |
| 2024-04-16 | PyTorchGeoNodes: Enabling Differentiable Shape Programs for 3D Shape Reconstruction | Sinisa Stekovic et al. | 2404.10620 | translate | read | link |
| 2024-04-16 | PreGSU-A Generalized Traffic Scene Understanding Model for Autonomous Driving based on Pre-trained Graph Attention Network | Yuning Wang et al. | 2404.10263 | translate | read | null |
| 2024-04-15 | No More Ambiguity in 360° Room Layout via Bi-Layout Estimation | Yu-Ju Tsai et al. | 2404.09993 | translate | read | null |
| 2024-04-15 | A Review and Efficient Implementation of Scene Graph Generation Metrics | Julian Lorenz et al. | 2404.09616 | translate | read | link |
| 2024-04-14 | Tri-modal Confluence with Temporal Dynamics for Scene Graph Generation in Operating Rooms | Diandian Guo et al. | 2404.09231 | translate | read | null |
| 2024-04-11 | Gaga: Group Any Gaussians via 3D-aware Memory Bank | Weijie Lyu et al. | 2404.07977 | translate | read | null |
| 2024-04-11 | AUG: A New Dataset and An Efficient Model for Aerial Image Urban Scene Graph Generation | Yansheng Li et al. | 2404.07788 | translate | read | null |
| 2024-04-11 | Depth Estimation using Weighted-loss and Transfer Learning | Muhammad Adeel Hafeez et al. | 2404.07686 | translate | read | null |
| 2024-04-11 | Mitigating Object Dependencies: Improving Point Cloud Self-Supervised Learning through Object Exchange | Yanhao Wu et al. | 2404.07504 | translate | read | null |
| 2024-04-10 | Incorporating Explanations into Human-Machine Interfaces for Trust and Situation Awareness in Autonomous Vehicles | Shahin Atakishiyev et al. | 2404.07383 | translate | read | null |
| 2024-04-10 | ORacle: Large Vision-Language Models for Knowledge-Guided Holistic OR Domain Modeling | Ege Özsoy et al. | 2404.07031 | translate | read | link |
| 2024-04-10 | O2V-Mapping: Online Open-Vocabulary Mapping with Neural Implicit Representation | Muer Tie et al. | 2404.06836 | translate | read | null |
| 2024-04-09 | QueSTMaps: Queryable Semantic Topological Maps for 3D Scene Understanding | Yash Mehan et al. | 2404.06442 | translate | read | null |
| 2024-04-09 | DaF-BEVSeg: Distortion-aware Fisheye Camera based Bird’s Eye View Segmentation with Occlusion Reasoning | Senthil Yogamani et al. | 2404.06352 | translate | read | null |
| 2024-04-09 | JSTR: Judgment Improves Scene Text Recognition | Masato Fujitake et al. | 2404.05967 | translate | read | null |
| 2024-04-06 | Panoptic Perception: A Novel Task and Fine-grained Dataset for Universal Remote Sensing Image Interpretation | Danpei Zhao et al. | 2404.04608 | translate | read | null |
| 2024-04-06 | SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos | Tao Wu et al. | 2404.04565 | translate | read | link |
| 2024-04-05 | Sigma: Siamese Mamba Network for Multi-Modal Semantic Segmentation | Zifu Wan et al. | 2404.04256 | translate | read | link |
| 2024-04-06 | HAPNet: Toward Superior RGB-Thermal Scene Parsing via Hybrid, Asymmetric, and Progressive Heterogeneous Feature Fusion | Jiahang Li et al. | 2404.03527 | translate | read | link |
| 2024-04-04 | You Only Scan Once: A Dynamic Scene Reconstruction Pipeline for 6-DoF Robotic Grasping of Novel Objects | Lei Zhou et al. | 2404.03462 | translate | read | null |
| 2024-04-03 | Weakly-Supervised 3D Scene Graph Generation via Visual-Linguistic Assisted Pseudo-labeling | Xu Wang et al. | 2404.02527 | translate | read | null |
| 2024-04-05 | EGTR: Extracting Graph from Transformer for Scene Graph Generation | Jinbae Im et al. | 2404.02072 | translate | read | link |
| 2024-04-01 | NeRF-MAE: Masked AutoEncoders for Self Supervised 3D representation Learning for Neural Radiance Fields | Muhammad Zubair Irshad et al. | 2404.01300 | translate | read | null |
| 2024-04-08 | 360+x: A Panoptic Multi-modal Scene Understanding Dataset | Hao Chen et al. | 2404.00989 | translate | read | null |
| 2024-04-01 | Improving Visual Recognition with Hyperbolical Visual Hierarchy Mapping | Hyeongjun Kwon et al. | 2404.00974 | translate | read | link |
| 2024-04-01 | GOV-NeSF: Generalizable Open-Vocabulary Neural Semantic Fields | Yunsong Wang et al. | 2404.00931 | translate | read | link |
| 2024-04-01 | MM3DGS SLAM: Multi-modal 3D Gaussian Splatting for SLAM Using Vision, Depth, and Inertial Measurements | Lisong C. Sun et al. | 2404.00923 | translate | read | link |
| 2024-04-01 | From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models | Rongjie Li et al. | 2404.00906 | translate | read | null |
| 2024-04-01 | Efficient 3D Instance Mapping and Localization with Neural Fields | George Tang et al. | 2403.19797 | translate | read | null |