Scene Understanding - 2024-05

| Publish Date | Title | Authors | PDF | Translate | Read | Code |
|:---|:---|:---|:---|:---|:---|:---|
| 2024-05-30 | Learning 3D Robotics Perception using Inductive Priors | Muhammad Zubair Irshad et.al. | 2405.20364 | translate | read | null |
| 2024-05-30 | SAM-E: Leveraging Visual Foundation Model with Sequence Imitation for Embodied Manipulation | Junjie Zhang et.al. | 2405.19586 | translate | read | null |
| 2024-05-29 | Kestrel: Point Grounding Multimodal LLM for Part-Aware 3D Vision-Language Understanding | Junjie Fei et.al. | 2405.18937 | translate | read | null |
| 2024-05-27 | GOI: Find 3D Gaussians of Interest with an Optimizable Open-vocabulary Semantic-space Hyperplane | Yansong Qu et.al. | 2405.17596 | translate | read | null |
| 2024-05-27 | OED: Towards One-stage End-to-End Dynamic Scene Graph Generation | Guan Wang et.al. | 2405.16925 | translate | read | link |
| 2024-05-25 | Real-Time Scene Graph Generation | Maëlic Neau et.al. | 2405.16116 | translate | read | link |
| 2024-05-24 | Open-Vocabulary SAM3D: Understand Any 3D Scene | Hanchen Tai et.al. | 2405.15580 | translate | read | null |
| 2024-05-23 | Generative Camera Dolly: Extreme Monocular Dynamic Novel View Synthesis | Basile Van Hoorick et.al. | 2405.14868 | translate | read | null |
| 2024-05-23 | CoPeD-Advancing Multi-Robot Collaborative Perception: A Comprehensive Dataset in Real-World Environments | Yang Zhou et.al. | 2405.14731 | translate | read | link |
| 2024-05-23 | Efficient Robot Learning for Perception and Mapping | Niclas Vödisch et.al. | 2405.14688 | translate | read | null |
| 2024-05-24 | Transformers for Image-Goal Navigation | Nikhilanj Pelluri et.al. | 2405.14128 | translate | read | null |
| 2024-05-22 | TS40K: a 3D Point Cloud Dataset of Rural Terrain and Electrical Transmission System | Diogo Lavado et.al. | 2405.13989 | translate | read | null |
| 2024-05-22 | A General Framework for Jersey Number Recognition in Sports Video | Maria Koshkina et.al. | 2405.13896 | translate | read | link |
| 2024-05-22 | GameVLM: A Decision-making Framework for Robotic Task Planning Based on Visual Language Models and Zero-sum Games | Aoran Mei et.al. | 2405.13751 | translate | read | null |
| 2024-05-21 | Anticipating Object State Changes | Victoria Manousaki et.al. | 2405.12789 | translate | read | null |
| 2024-05-21 | Scene Graph Generation Strategy with Co-occurrence Knowledge and Learnable Term Frequency | Hyeongjin Kim et.al. | 2405.12648 | translate | read | null |
| 2024-05-20 | MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering | Jingqun Tang et.al. | 2405.11985 | translate | read | link |
| 2024-05-19 | The First Swahili Language Scene Text Detection and Recognition Dataset | Fadila Wendigoundi Douamba et.al. | 2405.11437 | translate | read | link |
| 2024-05-16 | Grounded 3D-LLM with Referent Tokens | Yilun Chen et.al. | 2405.10370 | translate | read | link |
| 2024-05-16 | 4D Panoptic Scene Graph Generation | Jingkang Yang et.al. | 2405.10305 | translate | read | link |
| 2024-05-16 | When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models | Xianzheng Ma et.al. | 2405.10255 | translate | read | link |
| 2024-05-16 | A Preprocessing and Postprocessing Voxel-based Method for LiDAR Semantic Segmentation Improvement in Long Distance | Andrea Matteazzi et.al. | 2405.10046 | translate | read | null |
| 2024-05-15 | BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation | Yunhao Ge et.al. | 2405.09546 | translate | read | null |
| 2024-05-15 | HAAP: Vision-context Hierarchical Attention Autoregressive with Adaptive Permutation for Scene Text Recognition | Honghui Chen et.al. | 2405.09125 | translate | read | null |
| 2024-05-15 | 3D Shape Augmentation with Content-Aware Shape Resizing | Mingxiang Chen et.al. | 2405.09050 | translate | read | null |
| 2024-05-09 | Pre-trained Text-to-Image Diffusion Models Are Versatile Representation Learners for Control | Gunshi Gupta et.al. | 2405.05852 | translate | read | link |
| 2024-05-11 | Self-Supervised Pre-training with Symmetric Superimposition Modeling for Scene Text Recognition | Zuan Gao et.al. | 2405.05841 | translate | read | null |
| 2024-05-09 | Benchmarking Neural Radiance Fields for Autonomous Robots: An Overview | Yuhang Ming et.al. | 2405.05526 | translate | read | null |
| 2024-05-09 | DTCLMapper: Dual Temporal Consistent Learning for Vectorized HD Map Construction | Siyu Li et.al. | 2405.05518 | translate | read | null |
| 2024-05-08 | OpenESS: Event-based Semantic Scene Understanding with Open Vocabularies | Lingdong Kong et.al. | 2405.05259 | translate | read | link |
| 2024-05-08 | Multi-Modal Data-Efficient 3D Scene Understanding for Autonomous Driving | Lingdong Kong et.al. | 2405.05258 | translate | read | link |
| 2024-05-07 | DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving | Chen Min et.al. | 2405.04390 | translate | read | null |
| 2024-05-07 | Choose What You Need: Disentangled Representation Learning for Scene Text Recognition, Removal and Editing | Boqiang Zhang et.al. | 2405.04377 | translate | read | null |
| 2024-05-06 | An Empty Room is All We Want: Automatic Defurnishing of Indoor Panoramas | Mira Slavcheva et.al. | 2405.03682 | translate | read | null |
| 2024-05-04 | Few-Shot Fruit Segmentation via Transfer Learning | Jordan A. James et.al. | 2405.02556 | translate | read | link |
