Scene Understanding - 2024-09 | Paper Arxiv Daily

Publish Date	Title	Authors	arXiv ID	Code
2024-09-30	Class-Agnostic Visio-Temporal Scene Sketch Semantic Segmentation	Aleyna Kütük et al.	2410.00266	null
2024-09-30	Procedure-Aware Surgical Video-language Pretraining with Hierarchical Knowledge Augmentation	Kun Yuan et al.	2410.00263	link
2024-09-30	You Only Speak Once to See	Wenhao Yang et al.	2409.18372	null
2024-09-26	LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness	Chenming Zhu et al.	2409.18125	null
2024-09-26	Text Image Generation for Low-Resource Languages with Dual Translation Learning	Chihiro Noguchi et al.	2409.17747	null
2024-09-26	Scene Understanding in Pick-and-Place Tasks: Analyzing Transformations Between Initial and Final Scenes	Seraj Ghasemi et al.	2409.17720	null
2024-09-24	Open-World Object Detection with Instance Representation Learning	Sunoh Lee et al.	2409.16073	null
2024-09-24	Learning Multiple Probabilistic Decisions from Latent World Model in Autonomous Driving	Lingyu Xiao et al.	2409.15730	link
2024-09-27	Diffusion-based RGB-D Semantic Segmentation with Deformable Attention Transformer	Minh Bui et al.	2409.15117	null
2024-09-23	An Adverse Weather-Immune Scheme with Unfolded Regularization and Foundation Model Knowledge Distillation for Street Scene Understanding	Wei-Bin Kou et al.	2409.14737	null
2024-09-22	One Model for Two Tasks: Cooperatively Recognizing and Recovering Low-Resolution Scene Text Images by Iterative Mutual Guidance	Minyi Zhao et al.	2409.14483	null
2024-09-22	Scene-Text Grounding for Text-Based Video Question Answering	Sheng Zhou et al.	2409.14319	null
2024-09-21	MOSE: Monocular Semantic Reconstruction Using NeRF-Lifted Noisy Priors	Zhenhua Du et al.	2409.14019	null
2024-09-21	Relevance-driven Decision Making for Safer and More Efficient Human Robot Collaboration	Xiaotong Zhang et al.	2409.13998	null
2024-09-21	Enhanced Semantic Segmentation for Large-Scale and Imbalanced Point Clouds	Haoran Gong et al.	2409.13983	null
2024-09-19	CLAIR-A: Leveraging Large Language Models to Judge Audio Captions	Tsung-Han Wu et al.	2409.12962	link
2024-09-18	Towards Global Localization using Multi-Modal Object-Instance Re-Identification	Aneesh Chavan et al.	2409.12002	null
2024-09-18	SpotLight: Robotic Scene Understanding through Interaction and Affordance Detection	Tim Engelbracht et al.	2409.11870	null
2024-09-18	VL-Reader: Vision and Language Reconstructor is an Effective Scene Text Recognizer	Humen Zhong et al.	2409.11656	null
2024-09-18	DAF-Net: A Dual-Branch Feature Decomposition Fusion Network with Domain Adaptive for Infrared and Visible Image Fusion	Jian Xu et al.	2409.11642	link
2024-09-16	Video Token Sparsification for Efficient Multimodal LLMs in Autonomous Driving	Yunsheng Ma et al.	2409.11182	null
2024-09-16	Point2Graph: An End-to-end Point Cloud-based 3D Open-Vocabulary Scene Graph for Robot Navigation	Yifan Xu et al.	2409.10350	null
2024-09-16	Hydra-SGG: Hybrid Relation Assignment for One-stage Scene Graph Generation	Minghan Chen et al.	2409.10262	null
2024-09-15	Semantic2D: A Semantic Dataset for 2D Lidar Semantic Segmentation	Zhanteng Xie et al.	2409.09899	null
2024-09-12	LED: Light Enhanced Depth Estimation at Night	Simon de Moreau et al.	2409.08031	link
2024-09-12	Relevance for Human Robot Collaboration	Xiaotong Zhang et al.	2409.07753	null
2024-09-10	Towards Localizing Structural Elements: Merging Geometrical Detection with Semantic Verification in RGB-D Data	Ali Tourani et al.	2409.06625	null
2024-09-10	Loss Distillation via Gradient Matching for Point Cloud Completion with Weighted Chamfer Distance	Fangzhou Lin et al.	2409.06171	link
2024-09-09	Online 3D reconstruction and dense tracking in endoscopic videos	Michel Hayoz et al.	2409.06037	link
2024-09-08	TanDepth: Leveraging Global DEMs for Metric Monocular Depth Estimation in UAVs	Horatiu Florea et al.	2409.05142	null
2024-09-06	Future Does Matter: Boosting 3D Object Detection with Temporal Motion Estimation in Point Cloud Sequences	Rui Yu et al.	2409.04390	null
2024-09-06	RCNet: Deep Recurrent Collaborative Network for Multi-View Low-Light Image Enhancement	Hao Luo et al.	2409.04363	link
2024-09-05	Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding	Yunze Man et al.	2409.03757	link
2024-09-05	Optimizing 3D Gaussian Splatting for Sparse Viewpoint Scene Reconstruction	Shen Chen et al.	2409.03213	null
2024-09-04	Can LVLMs Obtain a Driver’s License? A Benchmark Towards Reliable AGI for Autonomous Driving	Yuhang Lu et al.	2409.02914	null
2024-09-03	Unveiling Deep Shadows: A Survey on Image and Video Shadow Detection, Removal, and Generation in the Era of Deep Learning	Xiaowei Hu et al.	2409.02108	link
2024-09-03	EPRecon: An Efficient Framework for Real-Time Panoptic 3D Reconstruction from Monocular Video	Zhen Zhou et al.	2409.01807	link
2024-09-03	GaussianPU: A Hybrid 2D-3D Upsampling Framework for Enhancing Color Point Clouds via 3D Gaussian Splatting	Zixuan Guo et al.	2409.01581	null
