Scene Understanding - 2024-04 | Paper Arxiv Daily

Publish Date	Title	Authors	arXiv ID	Translate	Read	Code
2024-04-29	Q-GroundCAM: Quantifying Grounding in Vision Language Models via GradCAM	Navid Rajabi et.al.	2404.19128	translate	read	null
2024-04-29	Compositional Factorization of Visual Scenes with Convolutional Sparse Coding and Resonator Networks	Christopher J. Kymn et.al.	2404.19126	translate	read	null
2024-04-24	Seeing Beyond Classes: Zero-Shot Grounded Situation Recognition via Language Explainer	Jiaming Lei et.al.	2404.15785	translate	read	null
2024-04-22	CloudFort: Enhancing Robustness of 3D Point Cloud Classification Against Backdoor Attacks via Spatial Partitioning and Ensemble Prediction	Wenhao Lan et.al.	2404.14042	translate	read	null
2024-04-22	On Support Relations Inference and Scene Hierarchy Graph Construction from Point Cloud in Clustered Environments	Gang Ma et.al.	2404.13842	translate	read	null
2024-04-29	Clio: Real-time Task-Driven Open-Set 3D Scene Graphs	Dominic Maggio et.al.	2404.13696	translate	read	link
2024-04-19	BACS: Background Aware Continual Semantic Segmentation	Mostafa ElAraby et.al.	2404.13148	translate	read	link
2024-04-19	Unified Scene Representation and Reconstruction for 3D Large Language Models	Tao Chu et.al.	2404.13044	translate	read	null
2024-04-18	SPIdepth: Strengthened Pose Information for Self-supervised Monocular Depth Estimation	Mykola Lavreniuk et.al.	2404.12501	translate	read	link
2024-04-19	AccidentBlip2: Accident Detection With Multi-View MotionBlip2	Yihua Shao et.al.	2404.12149	translate	read	link
2024-04-17	Multimodal 3D Object Detection on Unseen Domains	Deepti Hegde et.al.	2404.11764	translate	read	null
2024-04-16	ECLAIR: A High-Fidelity Aerial LiDAR Dataset for Semantic Segmentation	Iaroslav Melekhov et.al.	2404.10699	translate	read	link
2024-04-16	PyTorchGeoNodes: Enabling Differentiable Shape Programs for 3D Shape Reconstruction	Sinisa Stekovic et.al.	2404.10620	translate	read	link
2024-04-16	PreGSU-A Generalized Traffic Scene Understanding Model for Autonomous Driving based on Pre-trained Graph Attention Network	Yuning Wang et.al.	2404.10263	translate	read	null
2024-04-15	No More Ambiguity in 360° Room Layout via Bi-Layout Estimation	Yu-Ju Tsai et.al.	2404.09993	translate	read	null
2024-04-15	A Review and Efficient Implementation of Scene Graph Generation Metrics	Julian Lorenz et.al.	2404.09616	translate	read	link
2024-04-14	Tri-modal Confluence with Temporal Dynamics for Scene Graph Generation in Operating Rooms	Diandian Guo et.al.	2404.09231	translate	read	null
2024-04-11	Gaga: Group Any Gaussians via 3D-aware Memory Bank	Weijie Lyu et.al.	2404.07977	translate	read	null
2024-04-11	AUG: A New Dataset and An Efficient Model for Aerial Image Urban Scene Graph Generation	Yansheng Li et.al.	2404.07788	translate	read	null
2024-04-11	Depth Estimation using Weighted-loss and Transfer Learning	Muhammad Adeel Hafeez et.al.	2404.07686	translate	read	null
2024-04-11	Mitigating Object Dependencies: Improving Point Cloud Self-Supervised Learning through Object Exchange	Yanhao Wu et.al.	2404.07504	translate	read	null
2024-04-10	Incorporating Explanations into Human-Machine Interfaces for Trust and Situation Awareness in Autonomous Vehicles	Shahin Atakishiyev et.al.	2404.07383	translate	read	null
2024-04-10	ORacle: Large Vision-Language Models for Knowledge-Guided Holistic OR Domain Modeling	Ege Özsoy et.al.	2404.07031	translate	read	link
2024-04-10	O2V-Mapping: Online Open-Vocabulary Mapping with Neural Implicit Representation	Muer Tie et.al.	2404.06836	translate	read	null
2024-04-09	QueSTMaps: Queryable Semantic Topological Maps for 3D Scene Understanding	Yash Mehan et.al.	2404.06442	translate	read	null
2024-04-09	DaF-BEVSeg: Distortion-aware Fisheye Camera based Bird’s Eye View Segmentation with Occlusion Reasoning	Senthil Yogamani et.al.	2404.06352	translate	read	null
2024-04-09	JSTR: Judgment Improves Scene Text Recognition	Masato Fujitake et.al.	2404.05967	translate	read	null
2024-04-06	Panoptic Perception: A Novel Task and Fine-grained Dataset for Universal Remote Sensing Image Interpretation	Danpei Zhao et.al.	2404.04608	translate	read	null
2024-04-06	SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos	Tao Wu et.al.	2404.04565	translate	read	link
2024-04-05	Sigma: Siamese Mamba Network for Multi-Modal Semantic Segmentation	Zifu Wan et.al.	2404.04256	translate	read	link
2024-04-06	HAPNet: Toward Superior RGB-Thermal Scene Parsing via Hybrid, Asymmetric, and Progressive Heterogeneous Feature Fusion	Jiahang Li et.al.	2404.03527	translate	read	link
2024-04-04	You Only Scan Once: A Dynamic Scene Reconstruction Pipeline for 6-DoF Robotic Grasping of Novel Objects	Lei Zhou et.al.	2404.03462	translate	read	null
2024-04-03	Weakly-Supervised 3D Scene Graph Generation via Visual-Linguistic Assisted Pseudo-labeling	Xu Wang et.al.	2404.02527	translate	read	null
2024-04-05	EGTR: Extracting Graph from Transformer for Scene Graph Generation	Jinbae Im et.al.	2404.02072	translate	read	link
2024-04-01	NeRF-MAE: Masked AutoEncoders for Self Supervised 3D representation Learning for Neural Radiance Fields	Muhammad Zubair Irshad et.al.	2404.01300	translate	read	null
2024-04-08	360+x: A Panoptic Multi-modal Scene Understanding Dataset	Hao Chen et.al.	2404.00989	translate	read	null
2024-04-01	Improving Visual Recognition with Hyperbolical Visual Hierarchy Mapping	Hyeongjun Kwon et.al.	2404.00974	translate	read	link
2024-04-01	GOV-NeSF: Generalizable Open-Vocabulary Neural Semantic Fields	Yunsong Wang et.al.	2404.00931	translate	read	link
2024-04-01	MM3DGS SLAM: Multi-modal 3D Gaussian Splatting for SLAM Using Vision, Depth, and Inertial Measurements	Lisong C. Sun et.al.	2404.00923	translate	read	link
2024-04-01	From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models	Rongjie Li et.al.	2404.00906	translate	read	null
2024-04-01	Efficient 3D Instance Mapping and Localization with Neural Fields	George Tang et.al.	2403.19797	translate	read	null
