Scene Understanding - 2024-05
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2024-05-30 | Learning 3D Robotics Perception using Inductive Priors | Muhammad Zubair Irshad et al. | 2405.20364 | translate | read | null |
| 2024-05-30 | SAM-E: Leveraging Visual Foundation Model with Sequence Imitation for Embodied Manipulation | Junjie Zhang et al. | 2405.19586 | translate | read | null |
| 2024-05-29 | Kestrel: Point Grounding Multimodal LLM for Part-Aware 3D Vision-Language Understanding | Junjie Fei et al. | 2405.18937 | translate | read | null |
| 2024-05-27 | GOI: Find 3D Gaussians of Interest with an Optimizable Open-vocabulary Semantic-space Hyperplane | Yansong Qu et al. | 2405.17596 | translate | read | null |
| 2024-05-27 | OED: Towards One-stage End-to-End Dynamic Scene Graph Generation | Guan Wang et al. | 2405.16925 | translate | read | link |
| 2024-05-25 | Real-Time Scene Graph Generation | Maëlic Neau et al. | 2405.16116 | translate | read | link |
| 2024-05-24 | Open-Vocabulary SAM3D: Understand Any 3D Scene | Hanchen Tai et al. | 2405.15580 | translate | read | null |
| 2024-05-23 | Generative Camera Dolly: Extreme Monocular Dynamic Novel View Synthesis | Basile Van Hoorick et al. | 2405.14868 | translate | read | null |
| 2024-05-23 | CoPeD-Advancing Multi-Robot Collaborative Perception: A Comprehensive Dataset in Real-World Environments | Yang Zhou et al. | 2405.14731 | translate | read | link |
| 2024-05-23 | Efficient Robot Learning for Perception and Mapping | Niclas Vödisch et al. | 2405.14688 | translate | read | null |
| 2024-05-24 | Transformers for Image-Goal Navigation | Nikhilanj Pelluri et al. | 2405.14128 | translate | read | null |
| 2024-05-22 | TS40K: a 3D Point Cloud Dataset of Rural Terrain and Electrical Transmission System | Diogo Lavado et al. | 2405.13989 | translate | read | null |
| 2024-05-22 | A General Framework for Jersey Number Recognition in Sports Video | Maria Koshkina et al. | 2405.13896 | translate | read | link |
| 2024-05-22 | GameVLM: A Decision-making Framework for Robotic Task Planning Based on Visual Language Models and Zero-sum Games | Aoran Mei et al. | 2405.13751 | translate | read | null |
| 2024-05-21 | Anticipating Object State Changes | Victoria Manousaki et al. | 2405.12789 | translate | read | null |
| 2024-05-21 | Scene Graph Generation Strategy with Co-occurrence Knowledge and Learnable Term Frequency | Hyeongjin Kim et al. | 2405.12648 | translate | read | null |
| 2024-05-20 | MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering | Jingqun Tang et al. | 2405.11985 | translate | read | link |
| 2024-05-19 | The First Swahili Language Scene Text Detection and Recognition Dataset | Fadila Wendigoundi Douamba et al. | 2405.11437 | translate | read | link |
| 2024-05-16 | Grounded 3D-LLM with Referent Tokens | Yilun Chen et al. | 2405.10370 | translate | read | link |
| 2024-05-16 | 4D Panoptic Scene Graph Generation | Jingkang Yang et al. | 2405.10305 | translate | read | link |
| 2024-05-16 | When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models | Xianzheng Ma et al. | 2405.10255 | translate | read | link |
| 2024-05-16 | A Preprocessing and Postprocessing Voxel-based Method for LiDAR Semantic Segmentation Improvement in Long Distance | Andrea Matteazzi et al. | 2405.10046 | translate | read | null |
| 2024-05-15 | BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation | Yunhao Ge et al. | 2405.09546 | translate | read | null |
| 2024-05-15 | HAAP: Vision-context Hierarchical Attention Autoregressive with Adaptive Permutation for Scene Text Recognition | Honghui Chen et al. | 2405.09125 | translate | read | null |
| 2024-05-15 | 3D Shape Augmentation with Content-Aware Shape Resizing | Mingxiang Chen et al. | 2405.09050 | translate | read | null |
| 2024-05-09 | Pre-trained Text-to-Image Diffusion Models Are Versatile Representation Learners for Control | Gunshi Gupta et al. | 2405.05852 | translate | read | link |
| 2024-05-11 | Self-Supervised Pre-training with Symmetric Superimposition Modeling for Scene Text Recognition | Zuan Gao et al. | 2405.05841 | translate | read | null |
| 2024-05-09 | Benchmarking Neural Radiance Fields for Autonomous Robots: An Overview | Yuhang Ming et al. | 2405.05526 | translate | read | null |
| 2024-05-09 | DTCLMapper: Dual Temporal Consistent Learning for Vectorized HD Map Construction | Siyu Li et al. | 2405.05518 | translate | read | null |
| 2024-05-08 | OpenESS: Event-based Semantic Scene Understanding with Open Vocabularies | Lingdong Kong et al. | 2405.05259 | translate | read | link |
| 2024-05-08 | Multi-Modal Data-Efficient 3D Scene Understanding for Autonomous Driving | Lingdong Kong et al. | 2405.05258 | translate | read | link |
| 2024-05-07 | DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving | Chen Min et al. | 2405.04390 | translate | read | null |
| 2024-05-07 | Choose What You Need: Disentangled Representation Learning for Scene Text Recognition, Removal and Editing | Boqiang Zhang et al. | 2405.04377 | translate | read | null |
| 2024-05-06 | An Empty Room is All We Want: Automatic Defurnishing of Indoor Panoramas | Mira Slavcheva et al. | 2405.03682 | translate | read | null |
| 2024-05-04 | Few-Shot Fruit Segmentation via Transfer Learning | Jordan A. James et al. | 2405.02556 | translate | read | link |