Scene Understanding - 2024-07 | Paper Arxiv Daily


| Publish Date | Title | Authors | PDF | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2024-07-31 | RoadFormer+: Delivering RGB-X Scene Parsing through Scale-Aware Information Decoupling and Advanced Heterogeneous Feature Fusion | Jianxin Huang et al. | 2407.21631 | translate | read | null |
| 2024-07-31 | Voxel Scene Graph for Intracranial Hemorrhage | Antoine P. Sanner et al. | 2407.21580 | translate | read | null |
| 2024-07-31 | A Plug-and-Play Method for Rare Human-Object Interactions Detection by Bridging Domain Gap | Lijun Zhang et al. | 2407.21438 | translate | read | link |
| 2024-07-31 | DEF-oriCORN: efficient 3D scene understanding for robust language-directed manipulation without demonstrations | Dongwon Son et al. | 2407.21267 | translate | read | null |
| 2024-07-30 | From Feature Importance to Natural Language Explanations Using LLMs with RAG | Sule Tekkesinoglu et al. | 2407.20990 | translate | read | null |
| 2024-07-30 | Dynamic Scene Understanding through Object-Centric Voxelization and Neural Rendering | Yanpeng Zhao et al. | 2407.20908 | translate | read | link |
| 2024-07-30 | NIS-SLAM: Neural Implicit Semantic RGB-D SLAM for 3D Consistent Scene Understanding | Hongjia Zhai et al. | 2407.20853 | translate | read | null |
| 2024-07-29 | SANGRIA: Surgical Video Scene Graph Optimization for Surgical Workflow Prediction | Çağhan Köksal et al. | 2407.20214 | translate | read | null |
| 2024-07-29 | Rethinking RGB-D Fusion for Semantic Segmentation in Surgical Datasets | Muhammad Abdullah Jamal et al. | 2407.19714 | translate | read | null |
| 2024-07-28 | ASI-Seg: Audio-Driven Surgical Instrument Segmentation with Surgeon Intention Understanding | Zhen Chen et al. | 2407.19435 | translate | read | link |
| 2024-07-27 | GP-VLS: A general-purpose vision language model for surgery | Samuel Schmidgall et al. | 2407.19305 | translate | read | null |
| 2024-07-27 | Fine-Grained Scene Graph Generation via Sample-Level Bias Prediction | Yansheng Li et al. | 2407.19259 | translate | read | null |
| 2024-07-26 | BCTR: Bidirectional Conditioning Transformer for Scene Graph Generation | Peng Hao et al. | 2407.18715 | translate | read | null |
| 2024-07-26 | MOoSE: Multi-Orientation Sharing Experts for Open-set Scene Text Recognition | Chang Liu et al. | 2407.18616 | translate | read | link |
| 2024-07-26 | Answerability Fields: Answerable Location Estimation via Diffusion Models | Daichi Azuma et al. | 2407.18497 | translate | read | null |
| 2024-07-24 | 3D Question Answering for City Scene Understanding | Penglei Sun et al. | 2407.17398 | translate | read | null |
| 2024-07-23 | Augmented Efficiency: Reducing Memory Footprint and Accelerating Inference for 3D Semantic Segmentation through Hybrid Vision | Aditya Krishnan et al. | 2407.16102 | translate | read | null |
| 2024-07-25 | Semantic Diversity-aware Prototype-based Learning for Unbiased Scene Graph Generation | Jaehyeong Jeon et al. | 2407.15396 | translate | read | link |
| 2024-07-21 | VideoGameBunny: Towards vision assistants for video games | Mohammad Reza Taesiri et al. | 2407.15295 | translate | read | null |
| 2024-07-21 | Self-training Room Layout Estimation via Geometry-aware Ray-casting | Bolivar Solarte et al. | 2407.15041 | translate | read | null |
| 2024-07-19 | A New Lightweight Hybrid Graph Convolutional Neural Network – CNN Scheme for Scene Classification using Object Detection Inference | Ayman Beghdadi et al. | 2407.14658 | translate | read | null |
| 2024-07-19 | OpenSU3D: Open World 3D Scene Understanding using Foundation Models | Rafay Mohiuddin et al. | 2407.14279 | translate | read | null |
| 2024-07-19 | MC-PanDA: Mask Confidence for Panoptic Domain Adaptation | Ivan Martinović et al. | 2407.14110 | translate | read | link |
| 2024-07-19 | GaussianBeV: 3D Gaussian Representation meets Perception Models for BeV Segmentation | Florian Chabot et al. | 2407.14108 | translate | read | null |
| 2024-07-18 | Training-Free Model Merging for Multi-target Domain Adaptation | Wenyi Li et al. | 2407.13771 | translate | read | null |
| 2024-07-18 | General Geometry-aware Weakly Supervised 3D Object Detection | Guowen Zhang et al. | 2407.13748 | translate | read | link |
| 2024-07-18 | Open Vocabulary 3D Scene Understanding via Geometry Guided Self-Distillation | Pengfei Wang et al. | 2407.13362 | translate | read | null |
| 2024-07-17 | InfoNorm: Mutual Information Shaping of Normals for Sparse-View Reconstruction | Xulong Wang et al. | 2407.12661 | translate | read | link |
| 2024-07-17 | Out of Length Text Recognition with Sub-String Matching | Yongkun Du et al. | 2407.12317 | translate | read | link |
| 2024-07-17 | Dual-Hybrid Attention Network for Specular Highlight Removal | Xiaojiao Guo et al. | 2407.12255 | translate | read | null |
| 2024-07-16 | Disentangled Acoustic Fields For Multimodal Physical Scene Understanding | Jie Yin et al. | 2407.11333 | translate | read | null |
| 2024-07-15 | OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models | Zijian Zhou et al. | 2407.11213 | translate | read | link |
| 2024-07-15 | No Train, all Gain: Self-Supervised Gradients Improve Deep Frozen Representations | Walter Simoncini et al. | 2407.10964 | translate | read | link |
| 2024-07-18 | Benchmarking Vision Language Models for Cultural Understanding | Shravan Nayak et al. | 2407.10920 | translate | read | null |
| 2024-07-14 | Shape2Scene: 3D Scene Representation Learning Through Pre-training on Shape Data | Tuo Feng et al. | 2407.10200 | translate | read | link |
| 2024-07-13 | Dense Multimodal Alignment for Open-Vocabulary 3D Scene Understanding | Ruihuang Li et al. | 2407.09781 | translate | read | null |
| 2024-07-12 | A Fair Ranking and New Model for Panoptic Scene Graph Generation | Julian Lorenz et al. | 2407.09216 | translate | read | link |
| 2024-07-12 | From Easy to Hard: Learning Curricular Shape-aware Features for Robust Panoptic Scene Graph Generation | Hanrong Shi et al. | 2407.09191 | translate | read | null |
| 2024-07-11 | BLOS-BEV: Navigation Map Enhanced Lane Segmentation Network, Beyond Line of Sight | Hang Wu et al. | 2407.08526 | translate | read | null |
| 2024-07-10 | Pareto Low-Rank Adapters: Efficient Multi-Task Learning with Preferences | Nikolaos Dimitriadis et al. | 2407.08056 | translate | read | null |
| 2024-07-10 | Swiss DINO: Efficient and Versatile Vision Framework for On-device Personal Object Search | Kirill Paramonov et al. | 2407.07541 | translate | read | null |
| 2024-07-09 | Joint prototype and coefficient prediction for 3D instance segmentation | Remco Royen et al. | 2407.06958 | translate | read | null |
| 2024-07-09 | LVLM-empowered Multi-modal Representation Learning for Visual Place Recognition | Teng Wang et al. | 2407.06730 | translate | read | null |
| 2024-07-08 | Focus on the Whole Character: Discriminative Character Modeling for Scene Text Recognition | Bangbang Zhou et al. | 2407.05562 | translate | read | link |
| 2024-07-07 | Self-supervised Learning via Cluster Distance Prediction for Operating Room Context Awareness | Idris Hamoud et al. | 2407.05448 | translate | read | null |
| 2024-07-05 | Hybrid Primal Sketch: Combining Analogy, Qualitative Representations, and Computer Vision for Scene Understanding | Kenneth D. Forbus et al. | 2407.04859 | translate | read | null |
| 2024-07-03 | A Unified Framework for 3D Scene Understanding | Wei Xu et al. | 2407.03263 | translate | read | null |
| 2024-07-11 | Open Panoramic Segmentation | Junwei Zheng et al. | 2407.02685 | translate | read | link |
| 2024-07-02 | MTMamba: Enhancing Multi-Task Dense Scene Understanding by Mamba-Based Decoders | Baijiong Lin et al. | 2407.02228 | translate | read | link |
| 2024-07-02 | Multi-Grained Contrast for Data-Efficient Unsupervised Representation Learning | Chengchao Shen et al. | 2407.02014 | translate | read | link |
| 2024-07-01 | PanopticRecon: Leverage Open-vocabulary Instance Segmentation for Zero-shot Panoptic Reconstruction | Xuan Yu et al. | 2407.01349 | translate | read | null |
| 2024-07-01 | Mobile Robot Oriented Large-Scale Indoor Dataset for Dynamic Scene Understanding | Yifan Tang et al. | 2406.19791 | translate | read | null |

(<a href=../Scene_Understanding.md>back to Scene Understanding</a>)