Object Detection - 2024-11
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2024-11-29 | SpaRC: Sparse Radar-Camera Fusion for 3D Object Detection | Philipp Wolters et.al. | 2411.19860 | translate | read | null |
| 2024-11-29 | Feedback-driven object detection and iterative model improvement | Sönke Tenckhoff et.al. | 2411.19835 | translate | read | link |
| 2024-11-29 | Real-Time Anomaly Detection in Video Streams | Fabien Poirier et.al. | 2411.19731 | translate | read | null |
| 2024-11-29 | LDA-AQU: Adaptive Query-guided Upsampling via Local Deformable Attention | Zewen Du et.al. | 2411.19585 | translate | read | link |
| 2024-11-29 | Bootstraping Clustering of Gaussians for View-consistent 3D Scene Understanding | Wenbo Zhang et.al. | 2411.19551 | translate | read | null |
| 2024-11-28 | Automatic Prompt Generation and Grounding Object Detection for Zero-Shot Image Anomaly Detection | Tsun-Hin Cheung et.al. | 2411.19220 | translate | read | null |
| 2024-11-28 | Co-Learning: Towards Semi-Supervised Object Detection with Road-side Cameras | Jicheng Yuan et.al. | 2411.19143 | translate | read | null |
| 2024-11-28 | On Moving Object Segmentation from Monocular Video with Transformers | Christian Homeyer et.al. | 2411.19141 | translate | read | null |
| 2024-11-28 | Dynamic Attention and Bi-directional Fusion for Safety Helmet Wearing Detection | Junwei Feng et.al. | 2411.19071 | translate | read | null |
| 2024-11-28 | MVFormer: Diversifying Feature Normalization and Token Mixing for Efficient Vision Transformers | Jongseong Bae et.al. | 2411.18995 | translate | read | null |
| 2024-11-27 | Exploring Depth Information for Detecting Manipulated Face Videos | Haoyue Wang et.al. | 2411.18572 | translate | read | null |
| 2024-11-27 | Efficient Dynamic LiDAR Odometry for Mobile Robots with Structured Point Clouds | Jonathan Lichtenfeld et.al. | 2411.18443 | translate | read | link |
| 2024-11-27 | Deep Fourier-embedded Network for Bi-modal Salient Object Detection | Pengfei Lyu et.al. | 2411.18409 | translate | read | link |
| 2024-11-27 | Optimizing Multispectral Object Detection: A Bag of Tricks and Comprehensive Benchmarks | Chen Zhou et.al. | 2411.18288 | translate | read | link |
| 2024-11-27 | From Open Vocabulary to Open World: Teaching Vision Language Models to Detect Novel Objects | Zizhao Li et.al. | 2411.18207 | translate | read | link |
| 2024-11-27 | RPEE-HEADS: A Novel Benchmark for Pedestrian Head Detection in Crowd Videos | Mohamad Abubaker et.al. | 2411.18164 | translate | read | null |
| 2024-11-27 | Revisiting Misalignment in Multispectral Pedestrian Detection: A Language-Driven Approach for Cross-modal Alignment Fusion | Taeheon Kim et.al. | 2411.17995 | translate | read | null |
| 2024-11-27 | ROICtrl: Boosting Instance Control for Visual Generation | Yuchao Gu et.al. | 2411.17949 | translate | read | null |
| 2024-11-26 | Box for Mask and Mask for Box: weak losses for multi-task partially supervised learning | Hoàng-Ân Lê et.al. | 2411.17536 | translate | read | link |
| 2024-11-26 | TinyViM: Frequency Decoupling for Tiny Hybrid Vision Mamba | Xiaowen Ma et.al. | 2411.17473 | translate | read | link |
| 2024-11-26 | Communication-Efficient Cooperative SLAMMOT via Determining the Number of Collaboration Vehicles | Susu Fang et.al. | 2411.17432 | translate | read | null |
| 2024-11-26 | DGNN-YOLO: Dynamic Graph Neural Networks with YOLO11 for Small Object Detection and Tracking in Traffic Surveillance | Shahriar Soudeep et.al. | 2411.17251 | translate | read | null |
| 2024-11-26 | Event-based Spiking Neural Networks for Object Detection: A Review of Datasets, Architectures, Learning Rules, and Implementation | Craig Iaboni et.al. | 2411.17006 | translate | read | link |
| 2024-11-25 | Online Episodic Memory Visual Query Localization with Egocentric Streaming Object Memory | Zaira Manigrasso et.al. | 2411.16934 | translate | read | null |
| 2024-11-25 | Open Vocabulary Monocular 3D Object Detection | Jin Yao et.al. | 2411.16833 | translate | read | link |
| 2024-11-25 | Imperceptible Adversarial Examples in the Physical World | Weilin Xu et.al. | 2411.16622 | translate | read | null |
| 2024-11-25 | STDWeb: Simple Transient Detection pipeline for the Web | Sergey Karpov et.al. | 2411.16470 | translate | read | null |
| 2024-11-25 | Machine Learning for the Digital Typhoon Dataset: Extensions to Multiple Basins and New Developments in Representations and Tasks | Asanobu Kitamoto et.al. | 2411.16421 | translate | read | link |
| 2024-11-25 | CutS3D: Cutting Semantics in 3D for 2D Unsupervised Instance Segmentation | Leon Sick et.al. | 2411.16319 | translate | read | null |
| 2024-11-25 | Diagnosis of diabetic retinopathy using machine learning & deep learning technique | Eric Shah et.al. | 2411.16250 | translate | read | null |
| 2024-11-25 | Interpreting Object-level Foundation Models via Visual Precision Search | Ruoyu Chen et.al. | 2411.16198 | translate | read | null |
| 2024-11-25 | Learn from Foundation Model: Fruit Detection Model without Manual Annotation | Yanan Wang et.al. | 2411.16196 | translate | read | null |
| 2024-11-25 | CIA: Controllable Image Augmentation Framework Based on Stable Diffusion | Mohamed Benkedadra et.al. | 2411.16128 | translate | read | null |
| 2024-11-25 | You only thermoelastically deform once: Point Absorber Detection in LIGO Test Masses with YOLO | Simon R. Goode et.al. | 2411.16104 | translate | read | null |
| 2024-11-25 | Leverage Task Context for Object Affordance Ranking | Haojie Huang et.al. | 2411.16082 | translate | read | null |
| 2024-11-22 | A Real-Time DETR Approach to Bangladesh Road Object Detection for Autonomous Vehicles | Irfan Nafiz Shahan et.al. | 2411.15110 | translate | read | null |
| 2024-11-22 | MSSF: A 4D Radar and Camera Fusion Framework With Multi-Stage Sampling for 3D Object Detection in Autonomous Driving | Hongsi Liu et.al. | 2411.15016 | translate | read | null |
| 2024-11-22 | VisionPAD: A Vision-Centric Pre-training Paradigm for Autonomous Driving | Haiming Zhang et.al. | 2411.14716 | translate | read | null |
| 2024-11-21 | Unveiling the Hidden: A Comprehensive Evaluation of Underwater Image Enhancement and Its Impact on Object Detection | Ali Awad et.al. | 2411.14626 | translate | read | null |
| 2024-11-21 | DINO-X: A Unified Vision Model for Open-World Object Detection and Understanding | Tianhe Ren et.al. | 2411.14347 | translate | read | link |
| 2024-11-21 | AnywhereDoor: Multi-Target Backdoor Attacks on Object Detection | Jialin Lu et.al. | 2411.14243 | translate | read | null |
| 2024-11-21 | Transforming Static Images Using Generative Models for Video Salient Object Detection | Suhwan Cho et.al. | 2411.13975 | translate | read | link |
| 2024-11-21 | Multitask Learning for SAR Ship Detection with Gaussian-Mask Joint Segmentation | Ming Zhao et.al. | 2411.13847 | translate | read | null |
| 2024-11-20 | MambaDETR: Query-based Temporal Modeling using State Space Model for Multi-View 3D Object Detection | Tong Ning et.al. | 2411.13628 | translate | read | null |
| 2024-11-20 | DIS-Mine: Instance Segmentation for Disaster-Awareness in Poor-Light Condition in Underground Mines | Mizanur Rahman Jewel et.al. | 2411.13544 | translate | read | null |
| 2024-11-20 | A Resource Efficient Fusion Network for Object Detection in Bird’s-Eye View using Camera and Raw Radar Data | Kavin Chandrasekaran et.al. | 2411.13311 | translate | read | link |
| 2024-11-20 | VADet: Multi-frame LiDAR 3D Object Detection using Variable Aggregation | Chengjie Huang et.al. | 2411.13186 | translate | read | null |
| 2024-11-20 | RAW-Diffusion: RGB-Guided Diffusion Models for High-Fidelity RAW Image Generation | Christoph Reinders et.al. | 2411.13150 | translate | read | link |
| 2024-11-20 | YCB-LUMA: YCB Object Dataset with Luminance Keying for Object Localization | Thomas Pöllabauer et.al. | 2411.13149 | translate | read | link |
| 2024-11-20 | Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension | Yongdong Luo et.al. | 2411.13093 | translate | read | link |
| 2024-11-20 | Bounding-box Watermarking: Defense against Model Extraction Attacks on Object Detectors | Satoru Koda et.al. | 2411.13047 | translate | read | null |
| 2024-11-20 | Collaborative Feature-Logits Contrastive Learning for Open-Set Semi-Supervised Object Detection | Xinhao Zhong et.al. | 2411.13001 | translate | read | null |
| 2024-11-19 | Maps from Motion (MfM): Generating 2D Semantic Maps from Sparse Multi-view Images | Matteo Toso et.al. | 2411.12620 | translate | read | null |
| 2024-11-19 | GaussianPretrain: A Simple Unified 3D Gaussian Representation for Visual Pre-training in Autonomous Driving | Shaoqing Xu et.al. | 2411.12452 | translate | read | null |
| 2024-11-19 | Physics-Guided Detector for SAR Airplanes | Zhongling Huang et.al. | 2411.12301 | translate | read | link |
| 2024-11-18 | Scaling Deep Learning Research with Kubernetes on the NRP Nautilus HyperCluster | J. Alex Hurt et.al. | 2411.12038 | translate | read | null |
| 2024-11-18 | LightFFDNets: Lightweight Convolutional Neural Networks for Rapid Facial Forgery Detection | Günel Jabbarlı et.al. | 2411.11826 | translate | read | null |
| 2024-11-18 | WoodYOLO: A Novel Object Detector for Wood Species Detection in Microscopic Images | Lars Nieradzik et.al. | 2411.11738 | translate | read | null |
| 2024-11-18 | Exploring Emerging Trends and Research Opportunities in Visual Place Recognition | Antonios Gasteratos et.al. | 2411.11481 | translate | read | null |
| 2024-11-18 | SL-YOLO: A Stronger and Lighter Drone Target Detection Model | Defan Chen et.al. | 2411.11477 | translate | read | null |
| 2024-11-19 | EVT: Efficient View Transformation for Multi-Modal 3D Object Detection | Yongjin Lee et.al. | 2411.10715 | translate | read | null |
| 2024-11-15 | Vision Eagle Attention: A New Lens for Advancing Image Classification | Mahmudul Hasan et.al. | 2411.10564 | translate | read | link |
| 2024-11-15 | Interactive Image-Based Aphid Counting in Yellow Water Traps under Stirring Actions | Xumin Gao et.al. | 2411.10357 | translate | read | null |
| 2024-11-15 | RETR: Multi-View Radar Detection Transformer for Indoor Perception | Ryoma Yataka et.al. | 2411.10293 | translate | read | null |
| 2024-11-15 | Visual-Linguistic Agent: Towards Collaborative Contextual Object Reasoning | Jingru Yang et.al. | 2411.10252 | translate | read | null |
| 2024-11-15 | Real-Time AI-Driven People Tracking and Counting Using Overhead Cameras | Ishrath Ahamed et.al. | 2411.10072 | translate | read | null |
| 2024-11-15 | Diachronic Document Dataset for Semantic Layout Analysis | Thibault Clérice et.al. | 2411.10068 | translate | read | null |
| 2024-11-14 | Adversarial Attacks Using Differentiable Rendering: A Survey | Matthew Hull et.al. | 2411.09749 | translate | read | null |
| 2024-11-14 | Local-Global Attention: An Adaptive Mechanism for Multi-Scale Feature Integration | Yifan Shao et.al. | 2411.09604 | translate | read | link |
| 2024-11-14 | Long-Tailed Object Detection Pre-training: Dynamic Rebalancing Contrastive Learning with Dual Reconstruction | Chen-Long Duan et.al. | 2411.09453 | translate | read | null |
| 2024-11-14 | Instruction-Driven Fusion of Infrared-Visible Images: Tailoring for Diverse Downstream Tasks | Zengyi Yang et.al. | 2411.09387 | translate | read | null |
| 2024-11-14 | DT-JRD: Deep Transformer based Just Recognizable Difference Prediction Model for Video Coding for Machines | Junqi Liu et.al. | 2411.09308 | translate | read | null |
| 2024-11-14 | Cross-Modal Consistency in Multimodal Large Language Models | Xiang Zhang et.al. | 2411.09273 | translate | read | null |
| 2024-11-14 | LEAP:D – A Novel Prompt-based Approach for Domain-Generalized Aerial Object Detection | Chanyeong Park et.al. | 2411.09180 | translate | read | null |
| 2024-11-13 | Multimodal Object Detection using Depth and Image Data for Manufacturing Parts | Nazanin Mahjourian et.al. | 2411.09062 | translate | read | null |
| 2024-11-13 | DART-LLM: Dependency-Aware Multi-Robot Task Decomposition and Execution using Large Language Models | Yongdong Wang et.al. | 2411.09022 | translate | read | null |
| 2024-11-13 | UIFormer: A Unified Transformer-based Framework for Incremental Few-Shot Object Detection and Instance Segmentation | Chengyuan Zhang et.al. | 2411.08569 | translate | read | null |
| 2024-11-13 | Methodology for a Statistical Analysis of Influencing Factors on 3D Object Detection Performance | Anton Kuznietsov et.al. | 2411.08482 | translate | read | null |
| 2024-11-13 | V2X-R: Cooperative LiDAR-4D Radar Fusion for 3D Object Detection with Denoising Diffusion | Xun Huang et.al. | 2411.08402 | translate | read | link |
| 2024-11-12 | Large-scale Remote Sensing Image Target Recognition and Automatic Annotation | Wuzheng Dong et.al. | 2411.07802 | translate | read | link |
| 2024-11-12 | Efficient 3D Perception on Multi-Sweep Point Cloud with Gumbel Spatial Pruning | Jianhao Li et.al. | 2411.07742 | translate | read | null |
| 2024-11-12 | Depthwise Separable Convolutions with Deep Residual Convolutions | Md Arid Hasan et.al. | 2411.07544 | translate | read | null |
| 2024-11-11 | Transformers for Charged Particle Track Reconstruction in High Energy Physics | Samuel Van Stroud et.al. | 2411.07149 | translate | read | null |
| 2024-11-11 | Multi-scale Frequency Enhancement Network for Blind Image Deblurring | Yawen Xiang et.al. | 2411.06893 | translate | read | null |
| 2024-11-11 | Fast and Efficient Transformer-based Method for Bird’s Eye View Instance Prediction | Miguel Antunes-García et.al. | 2411.06851 | translate | read | link |
| 2024-11-11 | AV-PedAware: Self-Supervised Audio-Visual Fusion for Dynamic Pedestrian Awareness | Yizhuo Yang et.al. | 2411.06789 | translate | read | null |
| 2024-11-11 | United Domain Cognition Network for Salient Object Detection in Optical Remote Sensing Images | Yanguang Sun et.al. | 2411.06703 | translate | read | link |
| 2024-11-11 | Track Any Peppers: Weakly Supervised Sweet Pepper Tracking Using VLMs | Jia Syuen Lim et.al. | 2411.06702 | translate | read | null |
| 2024-11-11 | LFSamba: Marry SAM with Mamba for Light Field Salient Object Detection | Zhengyi Liu et.al. | 2411.06652 | translate | read | null |
| 2024-11-09 | Robust Detection of LLM-Generated Text: A Comparative Analysis | Yongye Su et.al. | 2411.06248 | translate | read | null |
| 2024-11-09 | LSSInst: Improving Geometric Modeling in LSS-Based BEV Perception with Instance Representation | Weijie Ma et.al. | 2411.06173 | translate | read | link |
| 2024-11-09 | AI-Compass: A Comprehensive and Effective Multi-module Testing Tool for AI Systems | Zhiyu Zhu et.al. | 2411.06146 | translate | read | null |
| 2024-11-08 | Open-set object detection: towards unified problem formulation and benchmarking | Hejer Ammar et.al. | 2411.05564 | translate | read | null |
| 2024-11-08 | ZOPP: A Framework of Zero-shot Offboard Panoptic Perception for Autonomous Driving | Tao Ma et.al. | 2411.05311 | translate | read | null |
| 2024-11-08 | SimpleBEV: Improved LiDAR-Camera Fusion Architecture for 3D Object Detection | Yun Zhao et.al. | 2411.05292 | translate | read | null |
| 2024-11-07 | On the Inherent Robustness of One-Stage Object Detection against Out-of-Distribution Data | Aitor Martinez-Seras et.al. | 2411.04586 | translate | read | null |
| 2024-11-07 | l0-Regularized Sparse Coding-based Interpretable Network for Multi-Modal Image Fusion | Gargi Panda et.al. | 2411.04519 | translate | read | null |
| 2024-11-07 | Pose2Trajectory: Using Transformers on Body Pose to Predict Tennis Player’s Trajectory | Ali K. AlShami et.al. | 2411.04501 | translate | read | null |
| 2024-11-07 | SuperQ-GRASP: Superquadrics-based Grasp Pose Estimation on Larger Objects for Mobile-Manipulation | Xun Tu et.al. | 2411.04386 | translate | read | null |
| 2024-11-07 | UEVAVD: A Dataset for Developing UAV’s Eye View Active Object Detection | Xinhua Jiang et.al. | 2411.04348 | translate | read | null |
| 2024-11-07 | GazeGen: Gaze-Driven User Interaction for Visual Content Generation | He-Yen Hsieh et.al. | 2411.04335 | translate | read | null |
| 2024-11-06 | An Enhancement of Haar Cascade Algorithm Applied to Face Recognition for Gate Pass Security | Clarence A. Antipona et.al. | 2411.03831 | translate | read | null |
| 2024-11-06 | Understanding the Effects of Human-written Paraphrases in LLM-generated Text Detection | Hiu Ting Lau et.al. | 2411.03806 | translate | read | link |
| 2024-11-06 | Efficient Fourier Filtering Network with Contrastive Learning for UAV-based Unaligned Bi-modal Salient Object Detection | Pengfei Lyu et.al. | 2411.03728 | translate | read | link |
| 2024-11-06 | Estimation of Psychosocial Work Environment Exposures Through Video Object Detection. Proof of Concept Using CCTV Footage | Claus D. Hansen et.al. | 2411.03724 | translate | read | null |
| 2024-11-06 | Hybrid Attention for Robust RGB-T Pedestrian Detection in Real-World Conditions | Arunkumar Rathinam et.al. | 2411.03576 | translate | read | null |
| 2024-11-05 | An Application-Agnostic Automatic Target Recognition System Using Vision Language Models | Anthony Palladino et.al. | 2411.03491 | translate | read | null |
| 2024-11-05 | Self-supervised cross-modality learning for uncertainty-aware object detection and recognition in applications which lack pre-labelled training data | Irum Mehboob et.al. | 2411.03082 | translate | read | null |
| 2024-11-05 | CRT-Fusion: Camera, Radar, Temporal Fusion Using Motion Information for 3D Object Detection | Jisong Kim et.al. | 2411.03013 | translate | read | null |
| 2024-11-05 | Centerness-based Instance-aware Knowledge Distillation with Task-wise Mutual Lifting for Object Detection on Drone Imagery | Bowei Du et.al. | 2411.02861 | translate | read | null |
| 2024-11-05 | Correlation of Object Detection Performance with Visual Saliency and Depth Estimation | Matthias Bartolo et.al. | 2411.02844 | translate | read | link |
| 2024-11-05 | ERUP-YOLO: Enhancing Object Detection Robustness for Adverse Weather Condition by Unified Image-Adaptive Processing | Yuka Ogino et.al. | 2411.02799 | translate | read | null |
| 2024-11-05 | Real-Time Text Detection with Similar Mask in Traffic, Industrial, and Natural Scenes | Xu Han et.al. | 2411.02794 | translate | read | link |
| 2024-11-05 | Efficient Feature Aggregation and Scale-Aware Regression for Monocular 3D Object Detection | Yifan Wang et.al. | 2411.02747 | translate | read | null |
| 2024-11-05 | Analysis of Multi-epoch JWST Images of $\sim 300$ Little Red Dots: Tentative Detection of Variability in a Minority of Sources | Zijian Zhang et.al. | 2411.02729 | translate | read | null |
| 2024-11-04 | Intelligent Video Recording Optimization using Activity Detection for Surveillance Systems | Youssef Elmir et.al. | 2411.02632 | translate | read | null |
| 2024-11-04 | SIRA: Scalable Inter-frame Relation and Association for Radar Perception | Ryoma Yataka et.al. | 2411.02220 | translate | read | null |
| 2024-11-04 | Advanced computer vision for extracting georeferenced vehicle trajectories from drone imagery | Robert Fonod et.al. | 2411.02136 | translate | read | null |
| 2024-11-04 | Exploiting Unlabeled Data with Multiple Expert Teachers for Open Vocabulary Aerial Object Detection and Its Orientation Adaptation | Yan Li et.al. | 2411.02057 | translate | read | link |
| 2024-11-04 | V-CAS: A Realtime Vehicle Anti Collision System Using Vision Transformer on Multi-Camera Streams | Muhammad Waqas Ashraf et.al. | 2411.01963 | translate | read | null |
| 2024-11-04 | Exploiting Contextual Uncertainty of Visual Data for Efficient Training of Deep Models | Sharat Agarwal et.al. | 2411.01925 | translate | read | null |
| 2024-11-04 | LiDAttack: Robust Black-box Attack on LiDAR-based Object Detection | Jinyin Chen et.al. | 2411.01889 | translate | read | link |
| 2024-11-03 | ROAD-Waymo: Action Awareness at Scale for Autonomous Driving | Salman Khan et.al. | 2411.01683 | translate | read | null |
| 2024-11-03 | OSAD: Open-Set Aircraft Detection in SAR Images | Xiayang Xiao et.al. | 2411.01597 | translate | read | null |
| 2024-11-03 | One for All: Multi-Domain Joint Training for Point Cloud Based 3D Object Detection | Zhenyu Wang et.al. | 2411.01584 | translate | read | null |
| 2024-11-03 | A Visual Question Answering Method for SAR Ship: Breaking the Requirement for Multimodal Dataset Construction and Model Fine-Tuning | Fei Wang et.al. | 2411.01445 | translate | read | null |
(<a href=../Object_Detection.md>back to Object Detection</a>)