## Scene Understanding - 2024-06

| Publish Date | Title | Authors | PDF | Translate | Read | Code |
|:---|:---|:---|:---|:---|:---|:---|
| 2024-06-30 | ESGNN: Towards Equivariant Scene Graph Neural Network for 3D Scene Understanding | Quang P. M. Pham et al. | 2407.00609 | translate | read | null |
| 2024-06-28 | EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting | Daiwei Zhang et al. | 2406.19811 | translate | read | null |
| 2024-06-28 | PPTFormer: Pseudo Multi-Perspective Transformer for UAV Segmentation | Deyi Ji et al. | 2406.19632 | translate | read | null |
| 2024-06-27 | Enhanced Data Transfer Cooperating with Artificial Triplets for Scene Graph Generation | KuanChao Chu et al. | 2406.19316 | translate | read | null |
| 2024-06-26 | 3D-MVP: 3D Multiview Pretraining for Robotic Manipulation | Shengyi Qian et al. | 2406.18158 | translate | read | null |
| 2024-06-24 | GPT-4V Explorations: Mining Autonomous Driving | Zixuan Li et al. | 2406.16817 | translate | read | null |
| 2024-06-25 | AudioBench: A Universal Benchmark for Audio Large Language Models | Bin Wang et al. | 2406.16020 | translate | read | link |
| 2024-06-20 | EvSegSNN: Neuromorphic Semantic Segmentation for Event Data | Dalia Hareb et al. | 2406.14178 | translate | read | null |
| 2024-06-19 | StableSemantics: A Synthetic Language-Vision Dataset of Semantic Representations in Naturalistic Images | Rushikesh Zawar et al. | 2406.13735 | translate | read | null |
| 2024-06-17 | DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features | Letian Wang et al. | 2406.12095 | translate | read | null |
| 2024-06-17 | Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding | Yunsong Wang et al. | 2406.11283 | translate | read | null |
| 2024-06-15 | PIG: Prompt Images Guidance for Night-Time Scene Parsing | Zhifeng Xie et al. | 2406.10531 | translate | read | link |
| 2024-06-14 | MapVision: CVPR 2024 Autonomous Grand Challenge Mapless Driving Tech Report | Zhongyu Yang et al. | 2406.10125 | translate | read | null |
| 2024-06-14 | SkySenseGPT: A Fine-Grained Instruction Tuning Dataset and Model for Remote Sensing Vision-Language Understanding | Junwei Luo et al. | 2406.10100 | translate | read | link |
| 2024-06-14 | A Two-Stage Masked Autoencoder Based Network for Indoor Depth Completion | Kailai Sun et al. | 2406.09792 | translate | read | link |
| 2024-06-13 | MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding | Fei Wang et al. | 2406.09411 | translate | read | link |
| 2024-06-13 | Scene Graph Generation in Large-Size VHR Satellite Imagery: A Large-Scale Dataset and A Context-Aware Approach | Yansheng Li et al. | 2406.09410 | translate | read | link |
| 2024-06-12 | Category-level Neural Field for Reconstruction of Partially Observed Objects in Indoor Environment | Taekbeom Lee et al. | 2406.08176 | translate | read | link |
| 2024-06-13 | A3VLM: Actionable Articulation-Aware Vision Language Model | Siyuan Huang et al. | 2406.07549 | translate | read | link |
| 2024-06-10 | ReCon1M:A Large-scale Benchmark Dataset for Relation Comprehension in Remote Sensing Imagery | Xian Sun et al. | 2406.06028 | translate | read | null |
| 2024-06-11 | LOP-Field: Brain-inspired Layout-Object-Position Fields for Robotic Scene Understanding | Jiawei Hou et al. | 2406.05985 | translate | read | null |
| 2024-06-08 | 1st Place Winner of the 2024 Pixel-level Video Understanding in the Wild (CVPR’24 PVUW) Challenge in Video Panoptic Segmentation and Best Long Video Consistency of Video Semantic Segmentation | Qingfeng Liu et al. | 2406.05352 | translate | read | null |
| 2024-06-06 | Semantic Similarity Score for Measuring Visual Similarity at Semantic Level | Senran Fan et al. | 2406.03865 | translate | read | null |
| 2024-06-04 | Radar Spectra-Language Model for Automotive Scene Parsing | Mariia Pushkareva et al. | 2406.02158 | translate | read | null |
| 2024-06-04 | Leveraging Predicate and Triplet Learning for Scene Graph Generation | Jiankai Li et al. | 2406.02038 | translate | read | link |
| 2024-06-04 | FastLGS: Speeding up Language Embedded Gaussians with Feature Grid Mapping | Yuzhou Ji et al. | 2406.01916 | translate | read | null |
| 2024-06-04 | PlanAgent: A Multi-modal Large Language Agent for Closed-loop Vehicle Motion Planning | Yupeng Zheng et al. | 2406.01587 | translate | read | null |
| 2024-06-03 | EAGLE: Efficient Adaptive Geometry-based Learning in Cross-view Understanding | Thanh-Dat Truong et al. | 2406.01429 | translate | read | null |
| 2024-06-03 | Object Aware Egocentric Online Action Detection | Joungbin An et al. | 2406.01079 | translate | read | null |
| 2024-06-03 | CYCLO: Cyclic Graph Transformer Approach to Multi-Object Relationship Modeling in Aerial Videos | Trong-Thuan Nguyen et al. | 2406.01029 | translate | read | null |
| 2024-06-02 | Compositional 4D Dynamic Scenes Understanding with Physics Priors for Video Question Answering | Xingrui Wang et al. | 2406.00622 | translate | read | link |
| 2024-06-02 | Semi-supervised Video Semantic Segmentation Using Unreliable Pseudo Labels for PVUW2024 | Biao Wu et al. | 2406.00587 | translate | read | null |

(<a href=../Scene_Understanding.md>back to Scene Understanding</a>)