Scene Understanding - 2024-06 | Paper Arxiv Daily

| Publish Date | Title | Authors | PDF | Code |
|---|---|---|---|---|
| 2024-06-30 | ESGNN: Towards Equivariant Scene Graph Neural Network for 3D Scene Understanding | Quang P. M. Pham et al. | 2407.00609 | none |
| 2024-06-28 | EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting | Daiwei Zhang et al. | 2406.19811 | none |
| 2024-06-28 | PPTFormer: Pseudo Multi-Perspective Transformer for UAV Segmentation | Deyi Ji et al. | 2406.19632 | none |
| 2024-06-27 | Enhanced Data Transfer Cooperating with Artificial Triplets for Scene Graph Generation | KuanChao Chu et al. | 2406.19316 | none |
| 2024-06-26 | 3D-MVP: 3D Multiview Pretraining for Robotic Manipulation | Shengyi Qian et al. | 2406.18158 | none |
| 2024-06-24 | GPT-4V Explorations: Mining Autonomous Driving | Zixuan Li et al. | 2406.16817 | none |
| 2024-06-25 | AudioBench: A Universal Benchmark for Audio Large Language Models | Bin Wang et al. | 2406.16020 | link |
| 2024-06-20 | EvSegSNN: Neuromorphic Semantic Segmentation for Event Data | Dalia Hareb et al. | 2406.14178 | none |
| 2024-06-19 | StableSemantics: A Synthetic Language-Vision Dataset of Semantic Representations in Naturalistic Images | Rushikesh Zawar et al. | 2406.13735 | none |
| 2024-06-17 | DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features | Letian Wang et al. | 2406.12095 | none |
| 2024-06-17 | Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding | Yunsong Wang et al. | 2406.11283 | none |
| 2024-06-15 | PIG: Prompt Images Guidance for Night-Time Scene Parsing | Zhifeng Xie et al. | 2406.10531 | link |
| 2024-06-14 | MapVision: CVPR 2024 Autonomous Grand Challenge Mapless Driving Tech Report | Zhongyu Yang et al. | 2406.10125 | none |
| 2024-06-14 | SkySenseGPT: A Fine-Grained Instruction Tuning Dataset and Model for Remote Sensing Vision-Language Understanding | Junwei Luo et al. | 2406.10100 | link |
| 2024-06-14 | A Two-Stage Masked Autoencoder Based Network for Indoor Depth Completion | Kailai Sun et al. | 2406.09792 | link |
| 2024-06-13 | MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding | Fei Wang et al. | 2406.09411 | link |
| 2024-06-13 | Scene Graph Generation in Large-Size VHR Satellite Imagery: A Large-Scale Dataset and A Context-Aware Approach | Yansheng Li et al. | 2406.09410 | link |
| 2024-06-12 | Category-level Neural Field for Reconstruction of Partially Observed Objects in Indoor Environment | Taekbeom Lee et al. | 2406.08176 | link |
| 2024-06-13 | A3VLM: Actionable Articulation-Aware Vision Language Model | Siyuan Huang et al. | 2406.07549 | link |
| 2024-06-10 | ReCon1M: A Large-scale Benchmark Dataset for Relation Comprehension in Remote Sensing Imagery | Xian Sun et al. | 2406.06028 | none |
| 2024-06-11 | LOP-Field: Brain-inspired Layout-Object-Position Fields for Robotic Scene Understanding | Jiawei Hou et al. | 2406.05985 | none |
| 2024-06-08 | 1st Place Winner of the 2024 Pixel-level Video Understanding in the Wild (CVPR’24 PVUW) Challenge in Video Panoptic Segmentation and Best Long Video Consistency of Video Semantic Segmentation | Qingfeng Liu et al. | 2406.05352 | none |
| 2024-06-06 | Semantic Similarity Score for Measuring Visual Similarity at Semantic Level | Senran Fan et al. | 2406.03865 | none |
| 2024-06-04 | Radar Spectra-Language Model for Automotive Scene Parsing | Mariia Pushkareva et al. | 2406.02158 | none |
| 2024-06-04 | Leveraging Predicate and Triplet Learning for Scene Graph Generation | Jiankai Li et al. | 2406.02038 | link |
| 2024-06-04 | FastLGS: Speeding up Language Embedded Gaussians with Feature Grid Mapping | Yuzhou Ji et al. | 2406.01916 | none |
| 2024-06-04 | PlanAgent: A Multi-modal Large Language Agent for Closed-loop Vehicle Motion Planning | Yupeng Zheng et al. | 2406.01587 | none |
| 2024-06-03 | EAGLE: Efficient Adaptive Geometry-based Learning in Cross-view Understanding | Thanh-Dat Truong et al. | 2406.01429 | none |
| 2024-06-03 | Object Aware Egocentric Online Action Detection | Joungbin An et al. | 2406.01079 | none |
| 2024-06-03 | CYCLO: Cyclic Graph Transformer Approach to Multi-Object Relationship Modeling in Aerial Videos | Trong-Thuan Nguyen et al. | 2406.01029 | none |
| 2024-06-02 | Compositional 4D Dynamic Scenes Understanding with Physics Priors for Video Question Answering | Xingrui Wang et al. | 2406.00622 | link |
| 2024-06-02 | Semi-supervised Video Semantic Segmentation Using Unreliable Pseudo Labels for PVUW2024 | Biao Wu et al. | 2406.00587 | none |