Scene Understanding - 2024-04

| Publish Date | Title | Authors | PDF | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2024-04-29 | Q-GroundCAM: Quantifying Grounding in Vision Language Models via GradCAM | Navid Rajabi et al. | 2404.19128 | translate | read | null |
| 2024-04-29 | Compositional Factorization of Visual Scenes with Convolutional Sparse Coding and Resonator Networks | Christopher J. Kymn et al. | 2404.19126 | translate | read | null |
| 2024-04-24 | Seeing Beyond Classes: Zero-Shot Grounded Situation Recognition via Language Explainer | Jiaming Lei et al. | 2404.15785 | translate | read | null |
| 2024-04-22 | CloudFort: Enhancing Robustness of 3D Point Cloud Classification Against Backdoor Attacks via Spatial Partitioning and Ensemble Prediction | Wenhao Lan et al. | 2404.14042 | translate | read | null |
| 2024-04-22 | On Support Relations Inference and Scene Hierarchy Graph Construction from Point Cloud in Clustered Environments | Gang Ma et al. | 2404.13842 | translate | read | null |
| 2024-04-29 | Clio: Real-time Task-Driven Open-Set 3D Scene Graphs | Dominic Maggio et al. | 2404.13696 | translate | read | link |
| 2024-04-19 | BACS: Background Aware Continual Semantic Segmentation | Mostafa ElAraby et al. | 2404.13148 | translate | read | link |
| 2024-04-19 | Unified Scene Representation and Reconstruction for 3D Large Language Models | Tao Chu et al. | 2404.13044 | translate | read | null |
| 2024-04-18 | SPIdepth: Strengthened Pose Information for Self-supervised Monocular Depth Estimation | Mykola Lavreniuk et al. | 2404.12501 | translate | read | link |
| 2024-04-19 | AccidentBlip2: Accident Detection With Multi-View MotionBlip2 | Yihua Shao et al. | 2404.12149 | translate | read | link |
| 2024-04-17 | Multimodal 3D Object Detection on Unseen Domains | Deepti Hegde et al. | 2404.11764 | translate | read | null |
| 2024-04-16 | ECLAIR: A High-Fidelity Aerial LiDAR Dataset for Semantic Segmentation | Iaroslav Melekhov et al. | 2404.10699 | translate | read | link |
| 2024-04-16 | PyTorchGeoNodes: Enabling Differentiable Shape Programs for 3D Shape Reconstruction | Sinisa Stekovic et al. | 2404.10620 | translate | read | link |
| 2024-04-16 | PreGSU-A Generalized Traffic Scene Understanding Model for Autonomous Driving based on Pre-trained Graph Attention Network | Yuning Wang et al. | 2404.10263 | translate | read | null |
| 2024-04-15 | No More Ambiguity in 360° Room Layout via Bi-Layout Estimation | Yu-Ju Tsai et al. | 2404.09993 | translate | read | null |
| 2024-04-15 | A Review and Efficient Implementation of Scene Graph Generation Metrics | Julian Lorenz et al. | 2404.09616 | translate | read | link |
| 2024-04-14 | Tri-modal Confluence with Temporal Dynamics for Scene Graph Generation in Operating Rooms | Diandian Guo et al. | 2404.09231 | translate | read | null |
| 2024-04-11 | Gaga: Group Any Gaussians via 3D-aware Memory Bank | Weijie Lyu et al. | 2404.07977 | translate | read | null |
| 2024-04-11 | AUG: A New Dataset and An Efficient Model for Aerial Image Urban Scene Graph Generation | Yansheng Li et al. | 2404.07788 | translate | read | null |
| 2024-04-11 | Depth Estimation using Weighted-loss and Transfer Learning | Muhammad Adeel Hafeez et al. | 2404.07686 | translate | read | null |
| 2024-04-11 | Mitigating Object Dependencies: Improving Point Cloud Self-Supervised Learning through Object Exchange | Yanhao Wu et al. | 2404.07504 | translate | read | null |
| 2024-04-10 | Incorporating Explanations into Human-Machine Interfaces for Trust and Situation Awareness in Autonomous Vehicles | Shahin Atakishiyev et al. | 2404.07383 | translate | read | null |
| 2024-04-10 | ORacle: Large Vision-Language Models for Knowledge-Guided Holistic OR Domain Modeling | Ege Özsoy et al. | 2404.07031 | translate | read | link |
| 2024-04-10 | O2V-Mapping: Online Open-Vocabulary Mapping with Neural Implicit Representation | Muer Tie et al. | 2404.06836 | translate | read | null |
| 2024-04-09 | QueSTMaps: Queryable Semantic Topological Maps for 3D Scene Understanding | Yash Mehan et al. | 2404.06442 | translate | read | null |
| 2024-04-09 | DaF-BEVSeg: Distortion-aware Fisheye Camera based Bird’s Eye View Segmentation with Occlusion Reasoning | Senthil Yogamani et al. | 2404.06352 | translate | read | null |
| 2024-04-09 | JSTR: Judgment Improves Scene Text Recognition | Masato Fujitake et al. | 2404.05967 | translate | read | null |
| 2024-04-06 | Panoptic Perception: A Novel Task and Fine-grained Dataset for Universal Remote Sensing Image Interpretation | Danpei Zhao et al. | 2404.04608 | translate | read | null |
| 2024-04-06 | SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos | Tao Wu et al. | 2404.04565 | translate | read | link |
| 2024-04-05 | Sigma: Siamese Mamba Network for Multi-Modal Semantic Segmentation | Zifu Wan et al. | 2404.04256 | translate | read | link |
| 2024-04-06 | HAPNet: Toward Superior RGB-Thermal Scene Parsing via Hybrid, Asymmetric, and Progressive Heterogeneous Feature Fusion | Jiahang Li et al. | 2404.03527 | translate | read | link |
| 2024-04-04 | You Only Scan Once: A Dynamic Scene Reconstruction Pipeline for 6-DoF Robotic Grasping of Novel Objects | Lei Zhou et al. | 2404.03462 | translate | read | null |
| 2024-04-03 | Weakly-Supervised 3D Scene Graph Generation via Visual-Linguistic Assisted Pseudo-labeling | Xu Wang et al. | 2404.02527 | translate | read | null |
| 2024-04-05 | EGTR: Extracting Graph from Transformer for Scene Graph Generation | Jinbae Im et al. | 2404.02072 | translate | read | link |
| 2024-04-01 | NeRF-MAE: Masked AutoEncoders for Self Supervised 3D representation Learning for Neural Radiance Fields | Muhammad Zubair Irshad et al. | 2404.01300 | translate | read | null |
| 2024-04-08 | 360+x: A Panoptic Multi-modal Scene Understanding Dataset | Hao Chen et al. | 2404.00989 | translate | read | null |
| 2024-04-01 | Improving Visual Recognition with Hyperbolical Visual Hierarchy Mapping | Hyeongjun Kwon et al. | 2404.00974 | translate | read | link |
| 2024-04-01 | GOV-NeSF: Generalizable Open-Vocabulary Neural Semantic Fields | Yunsong Wang et al. | 2404.00931 | translate | read | link |
| 2024-04-01 | MM3DGS SLAM: Multi-modal 3D Gaussian Splatting for SLAM Using Vision, Depth, and Inertial Measurements | Lisong C. Sun et al. | 2404.00923 | translate | read | link |
| 2024-04-01 | From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models | Rongjie Li et al. | 2404.00906 | translate | read | null |
| 2024-04-01 | Efficient 3D Instance Mapping and Localization with Neural Fields | George Tang et al. | 2403.19797 | translate | read | null |

(<a href=../Scene_Understanding.md>back to Scene Understanding</a>)