Scene Understanding - 2024-05 | Paper Arxiv Daily

Publish Date	Title	Authors	arXiv ID	Code
2024-05-30	Learning 3D Robotics Perception using Inductive Priors	Muhammad Zubair Irshad et al.	2405.20364	null
2024-05-30	SAM-E: Leveraging Visual Foundation Model with Sequence Imitation for Embodied Manipulation	Junjie Zhang et al.	2405.19586	null
2024-05-29	Kestrel: Point Grounding Multimodal LLM for Part-Aware 3D Vision-Language Understanding	Junjie Fei et al.	2405.18937	null
2024-05-27	GOI: Find 3D Gaussians of Interest with an Optimizable Open-vocabulary Semantic-space Hyperplane	Yansong Qu et al.	2405.17596	null
2024-05-27	OED: Towards One-stage End-to-End Dynamic Scene Graph Generation	Guan Wang et al.	2405.16925	link
2024-05-25	Real-Time Scene Graph Generation	Maëlic Neau et al.	2405.16116	link
2024-05-24	Open-Vocabulary SAM3D: Understand Any 3D Scene	Hanchen Tai et al.	2405.15580	null
2024-05-23	Generative Camera Dolly: Extreme Monocular Dynamic Novel View Synthesis	Basile Van Hoorick et al.	2405.14868	null
2024-05-23	CoPeD-Advancing Multi-Robot Collaborative Perception: A Comprehensive Dataset in Real-World Environments	Yang Zhou et al.	2405.14731	link
2024-05-23	Efficient Robot Learning for Perception and Mapping	Niclas Vödisch et al.	2405.14688	null
2024-05-24	Transformers for Image-Goal Navigation	Nikhilanj Pelluri et al.	2405.14128	null
2024-05-22	TS40K: a 3D Point Cloud Dataset of Rural Terrain and Electrical Transmission System	Diogo Lavado et al.	2405.13989	null
2024-05-22	A General Framework for Jersey Number Recognition in Sports Video	Maria Koshkina et al.	2405.13896	link
2024-05-22	GameVLM: A Decision-making Framework for Robotic Task Planning Based on Visual Language Models and Zero-sum Games	Aoran Mei et al.	2405.13751	null
2024-05-21	Anticipating Object State Changes	Victoria Manousaki et al.	2405.12789	null
2024-05-21	Scene Graph Generation Strategy with Co-occurrence Knowledge and Learnable Term Frequency	Hyeongjin Kim et al.	2405.12648	null
2024-05-20	MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering	Jingqun Tang et al.	2405.11985	link
2024-05-19	The First Swahili Language Scene Text Detection and Recognition Dataset	Fadila Wendigoundi Douamba et al.	2405.11437	link
2024-05-16	Grounded 3D-LLM with Referent Tokens	Yilun Chen et al.	2405.10370	link
2024-05-16	4D Panoptic Scene Graph Generation	Jingkang Yang et al.	2405.10305	link
2024-05-16	When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models	Xianzheng Ma et al.	2405.10255	link
2024-05-16	A Preprocessing and Postprocessing Voxel-based Method for LiDAR Semantic Segmentation Improvement in Long Distance	Andrea Matteazzi et al.	2405.10046	null
2024-05-15	BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation	Yunhao Ge et al.	2405.09546	null
2024-05-15	HAAP: Vision-context Hierarchical Attention Autoregressive with Adaptive Permutation for Scene Text Recognition	Honghui Chen et al.	2405.09125	null
2024-05-15	3D Shape Augmentation with Content-Aware Shape Resizing	Mingxiang Chen et al.	2405.09050	null
2024-05-09	Pre-trained Text-to-Image Diffusion Models Are Versatile Representation Learners for Control	Gunshi Gupta et al.	2405.05852	link
2024-05-11	Self-Supervised Pre-training with Symmetric Superimposition Modeling for Scene Text Recognition	Zuan Gao et al.	2405.05841	null
2024-05-09	Benchmarking Neural Radiance Fields for Autonomous Robots: An Overview	Yuhang Ming et al.	2405.05526	null
2024-05-09	DTCLMapper: Dual Temporal Consistent Learning for Vectorized HD Map Construction	Siyu Li et al.	2405.05518	null
2024-05-08	OpenESS: Event-based Semantic Scene Understanding with Open Vocabularies	Lingdong Kong et al.	2405.05259	link
2024-05-08	Multi-Modal Data-Efficient 3D Scene Understanding for Autonomous Driving	Lingdong Kong et al.	2405.05258	link
2024-05-07	DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving	Chen Min et al.	2405.04390	null
2024-05-07	Choose What You Need: Disentangled Representation Learning for Scene Text Recognition, Removal and Editing	Boqiang Zhang et al.	2405.04377	null
2024-05-06	An Empty Room is All We Want: Automatic Defurnishing of Indoor Panoramas	Mira Slavcheva et al.	2405.03682	null
2024-05-04	Few-Shot Fruit Segmentation via Transfer Learning	Jordan A. James et al.	2405.02556	link
