<p align="center"><h1 align="center">
Paper-List-DAILY
An automatically updated daily list of papers</h1></p>
Updated on 2025.12.25
Introduction
This repository provides a daily-updated list of computer vision papers from arXiv, organized by topic. The updates are automated using GitHub Actions to ensure you stay current with the latest research.
Online documentation: https://islinxu.github.io/paper-list/
Usage
To generate the paper list locally, follow these steps:
- Install Dependencies: `pip install -r requirements.txt`
- Run the Script: `python get_paper.py`
- Configuration: You can customize the search keywords and other settings in `config.yaml`.
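
To illustrate how search keywords could drive the daily update, here is a minimal sketch that turns a keyword list into an arXiv API query URL. The `config` dictionary, its key names (`max_results`, `keywords`), and the `build_query_url` helper are assumptions made for this example. They are not taken from `get_paper.py` or the real `config.yaml`, whose layout may differ. Only the query parameters (`search_query`, `sortBy`, `sortOrder`, `max_results`) come from the public arXiv API.

```python
from urllib.parse import urlencode

# Hypothetical settings; the real keys in config.yaml may differ.
config = {
    "max_results": 10,
    "keywords": {
        "Classification": ["image classification", "text classification"],
    },
}

ARXIV_API = "http://export.arxiv.org/api/query"


def build_query_url(filters, max_results):
    """Build an arXiv API URL that matches any of the given phrases."""
    # Quote each phrase so a multi-word keyword is matched as a unit.
    search_query = " OR ".join(f'all:"{phrase}"' for phrase in filters)
    params = {
        "search_query": search_query,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urlencode(params)}"


# Print one query URL per topic; no network request is made here.
for topic, filters in config["keywords"].items():
    print(topic, build_query_url(filters, config["max_results"]))
```

Sorting by submission date in descending order puts the newest papers first, which is the ordering a daily list would most likely want.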
Advanced Usage
You can also use the scripts in the scripts/ directory for additional tasks:
- Count Papers in Range: Count the number of papers within a specific date range.
  `python scripts/count_range.py 2024-01-01 2024-12-31`
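
The idea behind a date-range count can be sketched as follows. This is not the code in `scripts/count_range.py`. It is a hypothetical illustration that assumes papers are stored as Markdown table rows like the ones in this README, with the publish date in the first cell. The `count_in_range` helper is a name invented for the example.

```python
from datetime import date

# Sample lines copied from the table in this README, including the
# header and separator rows so the skipping logic is exercised.
rows = [
    "| Publish Date | Title | Authors | Code | |",
    "|---|---|---|---|---|",
    "| 2025-12-23 | FedDPC : Handling Data Heterogeneity and Partial Client Participation in Federated Learning | Mrinmay Sen et.al. | 2512.20329 | null |",
    "| 2025-11-28 | Decoding the Past: Explainable Machine Learning Models for Dating Historical Texts | Paulo J. N. Pinto et.al. | 2511.23056 | null |",
    "| 2025-10-31 | C-LEAD: Contrastive Learning for Enhanced Adversarial Defense | Suklav Ghosh et.al. | 2510.27249 | null |",
]


def count_in_range(lines, start, end):
    """Count table rows whose publish date lies in [start, end]."""
    count = 0
    for line in lines:
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        try:
            published = date.fromisoformat(cells[0])
        except ValueError:
            continue  # header, separator, or any non-table line
        if start <= published <= end:
            count += 1
    return count


print(count_in_range(rows, date(2025, 11, 1), date(2025, 12, 31)))
```

Both endpoints are treated as inclusive here. Whether the real script does the same is not stated in this README.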
Classification
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2025-12-23 | Programmable Optical Spectrum Shapers as Computing Primitives for Accelerating Convolutional Neural Networks | Georgios Moustakas et.al. | 2512.20580 | null |
| 2025-12-23 | FedDPC : Handling Data Heterogeneity and Partial Client Participation in Federated Learning | Mrinmay Sen et.al. | 2512.20329 | null |
| 2025-12-23 | A Novel Graph-Sequence Learning Model for Inductive Text Classification | Zuo Wang et.al. | 2512.20097 | null |
| 2025-12-23 | 3D Stack In-Sensor-Computing (3DS-ISC): Accelerating Time-Surface Construction for Neuromorphic Event Cameras | Hongyang Shang et.al. | 2512.20073 | null |
| 2025-12-23 | Gaussian Process Assisted Meta-learning for Image Classification and Object Detection Models | Anna R. Flowers et.al. | 2512.20021 | null |
| 2025-12-23 | WSD-MIL: Window Scale Decay Multiple Instance Learning for Whole Slide Image Classification | Le Feng et.al. | 2512.19982 | null |
| 2025-12-22 | Phase-space entropy at acquisition reflects downstream learnability | Xiu-Cheng Wang et.al. | 2512.19223 | null |
| 2025-12-22 | Photonic Spiking Graph Neural Network for Energy-Efficient Structured Data Processing | Wanting Yu et.al. | 2512.19182 | null |
| 2025-12-20 | Towards Ancient Plant Seed Classification: A Benchmark Dataset and Baseline Model | Rui Xing et.al. | 2512.18247 | null |
| 2025-12-17 | SCS-SupCon: Sigmoid-based Common and Style Supervised Contrastive Learning with Adaptive Decision Boundaries | Bin Wang et.al. | 2512.17954 | null |
| 2025-12-19 | Domain-Aware Quantum Circuit for QML | Gurinder Singh et.al. | 2512.17800 | null |
| 2025-12-19 | Resource-efficient medical image classification for edge devices | Mahsa Lavaei et.al. | 2512.17515 | null |
| 2025-12-19 | AIFloodSense: A Global Aerial Imagery Dataset for Semantic Segmentation and Understanding of Flooded Environments | Georgios Simantiris et.al. | 2512.17432 | null |
| 2025-12-19 | Can Synthetic Images Serve as Effective and Efficient Class Prototypes? | Dianxing Shi et.al. | 2512.17160 | null |
| 2025-12-18 | Do Generalized-Gamma Scale Mixtures of Normals Fit Large Image Datasets? | Brandon Marks et.al. | 2512.17038 | null |
| 2025-12-18 | Blog Data Showdown: Machine Learning vs Neuro-Symbolic Models for Gender Classification | Natnael Tilahun Sinshaw et.al. | 2512.16687 | null |
| 2025-12-18 | Protecting Deep Neural Network Intellectual Property with Chaos-Based White-Box Watermarking | Sangeeth B et.al. | 2512.16658 | null |
| 2025-12-10 | D3G: Diverse Demographic Data Generation Increases Zero-Shot Image Classification Accuracy within Multimodal Models | Javon Hickmon et.al. | 2512.15747 | null |
| 2025-12-17 | Stylized Synthetic Augmentation further improves Corruption Robustness | Georg Siedel et.al. | 2512.15675 | null |
| 2025-12-17 | Vision-based module for accurately reading linear scales in a laboratory | Parvesh Saini et.al. | 2512.15327 | null |
| 2025-12-17 | TrajSyn: Privacy-Preserving Dataset Distillation from Federated Model Trajectories for Server-Side Adversarial Training | Mukur Gupta et.al. | 2512.15123 | null |
| 2025-12-09 | A Critical Perspective on Finite Sample Conformal Prediction Theory in Medical Applications | Klaus-Rudolf Kladny et.al. | 2512.14727 | null |
| 2025-12-16 | An Energy-Efficient Adiabatic Capacitive Neural Network Chip | Himadri Singh Raghav et.al. | 2512.14642 | null |
| 2025-12-16 | Low-Resource, High-Impact: Building Corpora for Inclusive Language Technologies | Ekaterina Artemova et.al. | 2512.14576 | null |
| 2025-12-16 | FoodLogAthl-218: Constructing a Real-World Food Image Dataset Using Dietary Management Applications | Mitsuki Watanabe et.al. | 2512.14574 | null |
| 2025-12-16 | Improving Semantic Uncertainty Quantification in LVLMs with Semantic Gaussian Processes | Joseph Hoche et.al. | 2512.14177 | null |
| 2025-12-15 | Ensemble-Guided Distillation for Compact and Robust Acoustic Scene Classification on Edge Devices | Hossein Sharify et.al. | 2512.13905 | null |
| 2025-12-14 | DL$^3$M: A Vision-to-Language Framework for Expert-Level Medical Reasoning through Deep Learning and Large Language Models | Md. Najib Hasan et.al. | 2512.13742 | null |
| 2025-12-15 | REVERB-FL: Server-Side Adversarial and Reserve-Enhanced Federated Learning for Robust Audio Classification | Sathwika Peechara et.al. | 2512.13647 | null |
| 2025-12-15 | On the Ability of Deep Learning to Detect Signals with Unknown Parameters | Tom Anders et.al. | 2512.13542 | null |
| 2025-12-15 | Dual-Qubit Hierarchical Fuzzy Neural Network for Image Classification: Enabling Relational Learning via Quantum Entanglement | Wenwei Zhang et.al. | 2512.13274 | null |
| 2025-12-15 | Reveal Hidden Pitfalls and Navigate Next Generation of Vector Similarity Search from Task-Centric Views | Tingyang Chen et.al. | 2512.12980 | link |
| 2025-12-15 | Revisiting 2D Foundation Models for Scalable 3D Medical Image Classification | Han Liu et.al. | 2512.12887 | null |
| 2025-12-14 | Adapting Multimodal Foundation Models for Few-Shot Learning: A Comprehensive Study on Contrastive Captioners | N. K. B. M. P. K. B. Narasinghe et.al. | 2512.12824 | null |
| 2025-12-14 | Fine-Tuning Causal LLMs for Text Classification: Embedding-Based vs. Instruction-Based Approaches | Amirhossein Yousefiramandi et.al. | 2512.12677 | null |
| 2025-12-13 | Advancing Cache-Based Few-Shot Classification via Patch-Driven Relational Gated Graph Attention | Tasweer Ahmad et.al. | 2512.12498 | null |
| 2025-12-13 | Semantic Distance Measurement based on Multi-Kernel Gaussian Processes | Yinzhu Cheng et.al. | 2512.12238 | null |
| 2025-12-12 | A Comparative Analysis of Semiconductor Wafer Map Defect Detection with Image Transformer | Sushmita Nath et.al. | 2512.11977 | null |
| 2025-12-12 | ACCOR: Attention-Enhanced Complex-Valued Contrastive Learning for Occluded Object Classification Using mmWave Radar IQ Signals | Stefan Hägele et.al. | 2512.11556 | null |
| 2025-12-12 | FRQI Pairs method for image classification using Quantum Recurrent Neural Network | Rafał Potempa et.al. | 2512.11499 | null |
| 2025-12-12 | VLM2GeoVec: Toward Universal Multimodal Embeddings for Remote Sensing | Emanuel Sánchez Aimar et.al. | 2512.11490 | null |
| 2025-12-12 | Multi-task Learning with Extended Temporal Shift Module for Temporal Action Localization | Anh-Kiet Duong et.al. | 2512.11189 | null |
| 2025-12-11 | VL-JEPA: Joint Embedding Predictive Architecture for Vision-language | Delong Chen et.al. | 2512.10942 | null |
| 2025-12-11 | LabelFusion: Learning to Fuse LLMs and Transformer Classifiers for Robust Text Classification | Michael Schlee et.al. | 2512.10793 | null |
| 2025-12-11 | Uncertainty-Preserving QBNNs: Multi-Level Quantization of SVI-Based Bayesian Neural Networks for Image Classification | Hendrik Borras et.al. | 2512.10602 | null |
| 2025-12-10 | MedXAI: A Retrieval-Augmented and Self-Verifying Framework for Knowledge-Guided Medical Image Analysis | Midhat Urooj et.al. | 2512.10098 | null |
| 2025-12-10 | Text2Graph: Combining Lightweight LLMs and GNNs for Efficient Text Classification in Label-Scarce Scenarios | João Lucas Luz Lima Sarcinelli et.al. | 2512.10061 | null |
| 2025-12-10 | Stylized Meta-Album: Group-bias injection with style transfer to study robustness against distribution shifts | Romain Mussard et.al. | 2512.09773 | null |
| 2025-12-10 | OxEnsemble: Fair Ensembles for Low-Data Classification | Jonathan Rystrøm et.al. | 2512.09665 | null |
| 2025-12-10 | Hands-on Evaluation of Visual Transformers for Object Recognition and Detection | Dimitrios N. Vlachogiannis et.al. | 2512.09579 | null |
| 2025-12-10 | NeuroSketch: An Effective Framework for Neural Decoding via Systematic Architectural Optimization | Gaorui Zhang et.al. | 2512.09524 | link |
| 2025-12-10 | Advancing Text Classification with Large Language Models and Neural Attention Mechanisms | Ning Lyu et.al. | 2512.09444 | null |
| 2025-12-10 | Representation Calibration and Uncertainty Guidance for Class-Incremental Learning based on Vision Language Model | Jiantao Tan et.al. | 2512.09441 | null |
| 2025-12-10 | Benchmarking Real-World Medical Image Classification with Noisy Labels: Challenges, Practice, and Outlook | Yuan Ma et.al. | 2512.09315 | null |
| 2025-12-09 | GS-KAN: Parameter-Efficient Kolmogorov-Arnold Networks via Sprecher-Type Shared Basis Functions | Oscar Eliasson et.al. | 2512.09084 | null |
| 2025-12-09 | Improving Multi-Class Calibration through Normalization-Aware Isotonic Techniques | Alon Arad et.al. | 2512.09054 | null |
| 2025-12-09 | Luxical: High-Speed Lexical-Dense Text Embeddings | DatologyAI et.al. | 2512.09015 | link |
| 2025-12-08 | Enhancing Knowledge Transfer in Hyperspectral Image Classification via Cross-scene Knowledge Integration | Lu Huo et.al. | 2512.08989 | null |
| 2025-12-09 | Automated Pollen Recognition in Optical and Holographic Microscopy Images | Swarn Singh Warshaneyan et.al. | 2512.08589 | null |
| 2025-12-09 | Low Rank Support Quaternion Matrix Machine | Wang Chen et.al. | 2512.08327 | null |
| 2025-12-08 | Applicability of Metalenses for Generalizable Computer Vision | Yubo Zhang et.al. | 2512.08109 | null |
| 2025-12-08 | Exploiting the Randomness of Large Language Models (LLM) in Text Classification Tasks: Locating Privileged Documents in Legal Matters | Keith Huffman et.al. | 2512.08083 | null |
| 2025-11-28 | GSPN-2: Efficient Parallel Sequence Modeling | Hongjun Wang et.al. | 2512.07884 | null |
| 2025-11-27 | Semi-Supervised Contrastive Learning with Orthonormal Prototypes | Huanran Li et.al. | 2512.07880 | null |
| 2025-12-08 | Complementary Learning Approach for Text Classification using Large Language Models | Navid Asgari et.al. | 2512.07583 | null |
| 2025-12-08 | Integrating Multi-scale and Multi-filtration Topological Features for Medical Image Classification | Pengfei Gu et.al. | 2512.07190 | null |
| 2025-12-08 | Winning the Lottery by Preserving Network Training Dynamics with Concrete Ticket Search | Tanay Arora et.al. | 2512.07142 | null |
| 2025-12-08 | Dual Refinement Cycle Learning: Unsupervised Text Classification of Mamba and Community Detection on Text Attributed Graph | Hong Wang et.al. | 2512.07100 | null |
| 2025-12-07 | Toward Reliable Machine Unlearning: Theory, Algorithms, and Evaluation | Ali Ebrahimpour-Boroojeny et.al. | 2512.06993 | null |
| 2025-12-07 | SceneMixer: Exploring Convolutional Mixing Networks for Remote Sensing Scene Classification | Mohammed Q. Alkhatib et.al. | 2512.06877 | null |
| 2025-12-07 | Arc Gradient Descent: A Mathematically Derived Reformulation of Gradient Descent with Phase-Aware, User-Controlled Step Dynamics | Nikhil Verma et.al. | 2512.06737 | null |
| 2025-12-07 | Hierarchical Deep Learning for Diatom Image Classification: A Multi-Level Taxonomic Approach | Yueying Ke et.al. | 2512.06613 | null |
| 2025-12-06 | Proof of Concept for Mammography Classification with Enhanced Compactness and Separability Modules | Fariza Dahes et.al. | 2512.06575 | null |
| 2025-12-06 | LOCUS: A System and Method for Low-Cost Customization for Universal Specialization | Dhanasekar Sundararaman et.al. | 2512.06239 | null |
| 2025-12-05 | Efficient Text Classification with Conformal In-Context Learning | Ippokratis Pantelidis et.al. | 2512.05732 | null |
| 2025-12-05 | NormalView: sensor-agnostic tree species classification from backpack and aerial lidar data using geometric projections | Juho Korkeala et.al. | 2512.05610 | null |
| 2025-12-04 | GeoPE:A Unified Geometric Positional Embedding for Structured Tensors | Yupu Yao et.al. | 2512.04963 | null |
| 2025-12-04 | An all-optical convolutional neural network for image identification | Wei-Wei Fu et.al. | 2512.04569 | null |
| 2025-12-04 | Performance Evaluation of Transfer Learning Based Medical Image Classification Techniques for Disease Detection | Zeeshan Ahmad et.al. | 2512.04397 | null |
| 2025-11-20 | Memory-DD: A Low-Complexity Dendrite-Inspired Neuron for Temporal Prediction Tasks | Dongjian Yang et.al. | 2512.04094 | null |
| 2025-12-03 | Improving Alignment Between Human and Machine Codes: An Empirical Assessment of Prompt Engineering for Construct Identification in Psychology | Kylie L. Anglin et.al. | 2512.03818 | null |
| 2025-12-03 | Research on Brain Tumor Classification Method Based on Improved ResNet34 Network | Yufeng Li et.al. | 2512.03751 | null |
| 2025-12-03 | Multi-Scale Visual Prompting for Lightweight Small-Image Classification | Salim Khazem et.al. | 2512.03663 | null |
| 2025-12-03 | FeatureLens: A Highly Generalizable and Interpretable Framework for Detecting Adversarial Examples Based on Image Features | Zhigang Yang et.al. | 2512.03625 | null |
| 2025-12-03 | Label-Efficient Hyperspectral Image Classification via Spectral FiLM Modulation of Low-Level Pretrained Diffusion Features | Yuzhen Hu et.al. | 2512.03430 | null |
| 2025-12-01 | ALARM: Automated MLLM-Based Anomaly Detection in Complex-EnviRonment Monitoring with Uncertainty Quantification | Congjing Zhang et.al. | 2512.03101 | null |
| 2025-12-01 | Context-Enriched Contrastive Loss: Enhancing Presentation of Inherent Sample Connections in Contrastive Learning Framework | Haojin Deng et.al. | 2512.02152 | null |
| 2025-11-29 | Parallel Multi-Circuit Quantum Feature Fusion in Hybrid Quantum-Classical Convolutional Neural Networks for Breast Tumor Classification | Ece Yurtseven et.al. | 2512.02066 | null |
| 2025-12-01 | ViT$^3$: Unlocking Test-Time Training in Vision | Dongchen Han et.al. | 2512.01643 | null |
| 2025-12-01 | Supervised Contrastive Machine Unlearning of Background Bias in Sonar Image Classification with Fine-Grained Explainable AI | Kamal Basha S et.al. | 2512.01291 | null |
| 2025-12-01 | nnMobileNet++: Towards Efficient Hybrid Networks for Retinal Image Analysis | Xin Li et.al. | 2512.01273 | null |
| 2025-12-01 | Teaching by Failure: Counter-Example-Driven Curricula for Transformer Self-Improvement | Harshil Vejendla et.al. | 2512.01187 | null |
| 2025-11-30 | Projection-Free CNN Pruning via Frank-Wolfe with Momentum: Sparser Models with Less Pretraining | Hamza ElMokhtar Shili et.al. | 2512.01147 | null |
| 2025-11-30 | OmniFD: A Unified Model for Versatile Face Forgery Detection | Haotian Liu et.al. | 2512.01128 | link |
| 2025-11-29 | Financial Text Classification Based On rLoRA Finetuning On Qwen3-8B model | Zhiming Lian et.al. | 2512.00630 | null |
| 2025-11-29 | Learning What Helps: Task-Aligned Context Selection for Vision Tasks | Jingyu Guo et.al. | 2512.00489 | null |
| 2025-11-29 | Vision Transformer for Classification of UAV and Helicopters Using Micro-Doppler Spectrograms in Surveillance Radar | Arkadiusz Czuba et.al. | 2512.00374 | null |
| 2025-11-26 | SemImage: Semantic Image Representation for Text, a Novel Framework for Embedding Disentangled Linguistic Features | Mohammad Zare et.al. | 2512.00088 | null |
| 2025-11-28 | Decoding the Past: Explainable Machine Learning Models for Dating Historical Texts | Paulo J. N. Pinto et.al. | 2511.23056 | null |
| 2025-11-27 | The Collapse of Patches | Wei Guo et.al. | 2511.22281 | link |
| 2025-11-27 | Support Vector Machine Classifier with Rescaled Huberized Pinball Loss | Shibo Diao et.al. | 2511.22065 | null |
| 2025-11-27 | When Do Domain-Specific Foundation Models Justify Their Cost? A Systematic Evaluation Across Retinal Imaging Tasks | David Isztl et.al. | 2511.22001 | null |
| 2025-11-26 | DeepGI: Explainable Deep Learning for Gastrointestinal Image Classification | Walid Houmaidi et.al. | 2511.21959 | null |
| 2025-11-23 | Semantics as a Shield: Label Disguise Defense (LDD) against Prompt Injection in LLM Sentiment Classification | Yanxi Li et.al. | 2511.21752 | null |
| 2025-11-18 | DNNs, Dataset Statistics, and Correlation Functions | Robert W. Batterman et.al. | 2511.21715 | null |
| 2025-11-26 | Continual Error Correction on Low-Resource Devices | Kirill Paramonov et.al. | 2511.21652 | null |
| 2025-11-25 | CHiQPM: Calibrated Hierarchical Interpretable Image Classification | Thomas Norrenbrock et.al. | 2511.20779 | null |
| 2025-11-25 | Adaptive Hopfield Network: Rethinking Similarities in Associative Memory | Shurong Wang et.al. | 2511.20609 | null |
| 2025-11-25 | HVAdam: A Full-Dimension Adaptive Optimizer | Yiheng Zhang et.al. | 2511.20277 | null |
| 2025-11-25 | ScenarioCLIP: Pretrained Transferable Visual Language Models and Action-Genome Dataset for Natural Scene Analysis | Advik Sinha et.al. | 2511.20274 | null |
| 2025-11-25 | Advancing Image Classification with Discrete Diffusion Classification Modeling | Omer Belhasin et.al. | 2511.20263 | null |
| 2025-11-25 | Exploring State-of-the-art models for Early Detection of Forest Fires | Sharjeel Ahmed et.al. | 2511.20096 | null |
| 2025-11-24 | Multiscale Vector-Quantized Variational Autoencoder for Endoscopic Image Synthesis | Dimitrios E. Diamantis et.al. | 2511.19578 | null |
| 2025-11-24 | An Anatomy Aware Hybrid Deep Learning Framework for Lung Cancer Tumor Stage Classification | Saniah Kayenat Chowdhury et.al. | 2511.19367 | null |
| 2025-11-24 | Neural Architecture Search for Quantum Autoencoders | Hibah Agha et.al. | 2511.19246 | null |
| 2025-11-24 | Uncertainty-Aware Dual-Student Knowledge Distillation for Efficient Image Classification | Aakash Gore et.al. | 2511.18826 | null |
| 2025-11-24 | Dendritic Convolution for Noise Image Recognition | Jiarui Xue et.al. | 2511.18699 | null |
| 2025-11-24 | EVCC: Enhanced Vision Transformer-ConvNeXt-CoAtNet Fusion for Classification | Kazi Reyazul Hasan et.al. | 2511.18691 | null |
| 2025-11-23 | A Unified BERT-CNN-BiLSTM Framework for Simultaneous Headline Classification and Sentiment Analysis of Bangla News | Mirza Raquib et.al. | 2511.18618 | null |
| 2025-11-22 | AdaPerceiver: Transformers with Adaptive Width, Depth, and Tokens | Purvish Jajal et.al. | 2511.18105 | null |
| 2025-11-22 | Less Is More: An Explainable AI Framework for Lightweight Malaria Classification | Md Abdullah Al Kafi et.al. | 2511.18083 | null |
| 2025-11-22 | Hierarchical Semi-Supervised Active Learning for Remote Sensing | Wei Huang et.al. | 2511.18058 | null |
| 2025-11-21 | A Hybrid Classical-Quantum Fine Tuned BERT for Text Classification | Abu Kaisar Mohammad Masum et.al. | 2511.17677 | null |
| 2025-11-21 | REMSA: An LLM Agent for Foundation Model Selection in Remote Sensing | Binger Chen et.al. | 2511.17442 | null |
| 2025-11-21 | DSeq-JEPA: Discriminative Sequential Joint-Embedding Predictive Architecture | Xiangteng He et.al. | 2511.17354 | link |
| 2025-11-21 | Attention-Guided Feature Fusion (AGFF) Model for Integrating Statistical and Semantic Features in News Text Classification | Mohammad Zare et.al. | 2511.17184 | null |
| 2025-11-15 | Concept-Based Interpretability for Toxicity Detection | Samarth Garg et.al. | 2511.16689 | null |
| 2025-11-20 | Teacher-Guided One-Shot Pruning via Context-Aware Knowledge Distillation | Md. Samiul Alim et.al. | 2511.16653 | null |
| 2025-11-20 | Formal Abductive Latent Explanations for Prototype-Based Networks | Jules Soria et.al. | 2511.16588 | link |
| 2025-11-20 | Unsupervised Image Classification with Adaptive Nearest Neighbor Selection and Cluster Ensembles | Melih Baydar et.al. | 2511.16213 | null |
| 2025-11-20 | SpectralTrain: A Universal Framework for Hyperspectral Image Classification | Meihua Zhou et.al. | 2511.16084 | null |
| 2025-11-19 | RB-FT: Rationale-Bootstrapped Fine-Tuning for Video Classification | Meilong Xu et.al. | 2511.15923 | null |
| 2025-11-19 | Hyperspectral Image Classification using Spectral-Spatial Mixer Network | Mohammed Q. Alkhatib et.al. | 2511.15692 | null |
| 2025-11-19 | IPTQ-ViT: Post-Training Quantization of Non-linear Functions for Integer-only Vision Transformers | Gihwan Kim et.al. | 2511.15369 | null |
| 2025-11-19 | Computer Vision Modeling of the Development of Geometric and Numerical Concepts in Humans | Zekun Wang et.al. | 2511.15029 | null |
| 2025-11-18 | Logit-Based Losses Limit the Effectiveness of Feature Knowledge Distillation | Nicholas Cooper et.al. | 2511.14981 | null |
| 2025-11-18 | Vision Large Language Models Are Good Noise Handlers in Engagement Analysis | Alexander Vedernikov et.al. | 2511.14749 | null |
| 2025-11-18 | Task Addition and Weight Disentanglement in Closed-Vocabulary Models | Adam Hazimeh et.al. | 2511.14569 | null |
| 2025-11-18 | Step by Step Network | Dongchen Han et.al. | 2511.14329 | null |
| 2025-11-18 | Zero-Training Task-Specific Model Synthesis for Few-Shot Medical Image Classification | Yao Qin et.al. | 2511.14082 | null |
| 2025-11-16 | Semantic Multiplexing | Mohammad Abdi et.al. | 2511.13779 | null |
| 2025-11-17 | Cross-Learning from Scarce Data via Multi-Task Constrained Optimization | Leopoldo Agorio et.al. | 2511.13680 | null |
| 2025-11-17 | Data Value in the Age of Scaling: Understanding LLM Scaling Dynamics Under Real-Synthetic Data Mixtures | Haohui Wang et.al. | 2511.13640 | null |
| 2025-11-17 | Minimax Multi-Target Conformal Prediction with Applications to Imaging Inverse Problems | Jeffrey Wen et.al. | 2511.13533 | null |
| 2025-11-17 | Tight and Practical Privacy Auditing for Differentially Private In-Context Learning | Yuyang Xia et.al. | 2511.13502 | null |
| 2025-11-17 | Aspect-Level Obfuscated Sentiment in Thai Financial Disclosures and Its Impact on Abnormal Returns | Attapol T. Rutherford et.al. | 2511.13481 | null |
| 2025-11-17 | Hardware optimization on Android for inference of AI models | Iulius Gherasim et.al. | 2511.13453 | null |
| 2025-11-17 | MedDCR: Learning to Design Agentic Workflows for Medical Coding | Jiyang Zheng et.al. | 2511.13361 | null |
| 2025-11-17 | Synthetic Forgetting without Access: A Few-shot Zero-glance Framework for Machine Unlearning | Qipeng Song et.al. | 2511.13116 | null |
| 2025-11-17 | Angular Gradient Sign Method: Uncovering Vulnerabilities in Hyperbolic Networks | Minsoo Jo et.al. | 2511.12985 | null |
| 2025-11-17 | CalibrateMix: Guided-Mixup Calibration of Image Semi-Supervised Models | Mehrab Mustafy Rahman et.al. | 2511.12964 | null |
| 2025-11-16 | Catastrophic Forgetting in Kolmogorov-Arnold Networks | Mohammad Marufur Rahman et.al. | 2511.12828 | null |
| 2025-11-16 | Medical Knowledge Intervention Prompt Tuning for Medical Image Classification | Ye Du et.al. | 2511.12639 | null |
| 2025-11-15 | AGGRNet: Selective Feature Extraction and Aggregation for Enhanced Medical Image Classification | Ansh Makwe et.al. | 2511.12382 | null |
| 2025-11-15 | CLAReSNet: When Convolution Meets Latent Attention for Hyperspectral Image Classification | Asmit Bandyopadhyay et.al. | 2511.12346 | null |
| 2025-11-15 | Learning Time in Static Classifiers | Xi Ding et.al. | 2511.12321 | null |
| 2025-11-15 | Rethinking Bias in Generative Data Augmentation for Medical AI: a Frequency Recalibration Method | Chi Liu et.al. | 2511.12301 | null |
| 2025-11-15 | FaNe: Towards Fine-Grained Cross-Modal Contrast with False-Negative Reduction and Text-Conditioned Sparse Attention | Peng Zhang et.al. | 2511.12215 | null |
| 2025-11-15 | MPD-SGR: Robust Spiking Neural Networks with Membrane Potential Distribution-Driven Surrogate Gradient Regularization | Runhao Jiang et.al. | 2511.12199 | null |
| 2025-11-15 | Breaking the Modality Wall: Time-step Mixup for Efficient Spiking Knowledge Transfer from Static to Event Domain | Yuqi Xie et.al. | 2511.12150 | null |
| 2025-11-15 | Supervised Multilabel Image Classification Using Residual Networks with Probabilistic Reasoning | Lokender Singh et.al. | 2511.12082 | null |
| 2025-11-15 | FedSDA: Federated Stain Distribution Alignment for Non-IID Histopathological Image Classification | Cheng-Chang Tsai et.al. | 2511.12044 | null |
| 2025-11-14 | Additive Large Language Models for Semi-Structured Text | Karthikeyan K et.al. | 2511.11922 | null |
| 2025-11-14 | Quantifying and Improving Adaptivity in Conformal Prediction through Input Transformations | Sooyong Jang et.al. | 2511.11472 | null |
| 2025-11-06 | Google-MedGemma Based Abnormality Detection in Musculoskeletal radiographs | Soumyajit Maity et.al. | 2511.05600 | null |
| 2025-11-06 | EETnet: a CNN for Gaze Detection and Tracking for Smart-Eyewear | Andrea Aspesi et.al. | 2511.04779 | null |
| 2025-11-06 | Courant algebroid lifts and curved Courant algebroids | Filip Moučka et.al. | 2511.04743 | null |
| 2025-11-06 | Trustworthiness Calibration Framework for Phishing Email Detection Using Large Language Models | Daniyal Ganiuly et.al. | 2511.04728 | null |
| 2025-11-06 | When retrieval outperforms generation: Dense evidence retrieval for scalable fake news detection | Alamgir Munir Qazi et.al. | 2511.04643 | link |
| 2025-11-06 | CardioPHON: Quality assessment and self-supervised pretraining for screening of cardiac function based on phonocardiogram recordings | Vladimir Despotovic et.al. | 2511.04533 | null |
| 2025-11-06 | IntelliProof: An Argumentation Network-based Conversational Helper for Organized Reflection | Kaveh Eskandari Miandoab et.al. | 2511.04528 | null |
| 2025-11-06 | Modeling Clinical Uncertainty in Radiology Reports: from Explicit Uncertainty Markers to Implicit Reasoning Pathways | Paloma Rabaey et.al. | 2511.04506 | null |
| 2025-11-06 | Differentially Private In-Context Learning with Nearest Neighbor Search | Antti Koskela et.al. | 2511.04332 | null |
| 2025-11-06 | Classification of four-quark operators with $ΔF\le 2$ under flavor symmetry and their renormalization in a gauge-invariant scheme | Gregoris Spanoudes et.al. | 2511.04305 | null |
| 2025-11-06 | Covariance Descriptors Meet General Vision Encoders: Riemannian Deep Learning for Medical Image Classification | Josef Mayr et.al. | 2511.04190 | null |
| 2025-11-06 | SynQuE: Estimating Synthetic Dataset Quality Without Annotations | Arthur Chen et.al. | 2511.03928 | null |
| 2025-11-05 | Divide, Cache, Conquer: Dichotomic Prompting for Efficient Multi-Label LLM-Based Classification | Mikołaj Langner et.al. | 2511.03830 | null |
| 2025-11-04 | Hybrid Convolution and Vision Transformer NAS Search Space for TinyML Image Classification | Mikhael Djajapermana et.al. | 2511.02992 | null |
| 2025-11-04 | Diffusion Models are Robust Pretrainers | Mika Yagoda et.al. | 2511.02793 | null |
| 2025-11-03 | Towards Continuous-variable Quantum Neural Networks for Biomedical Imaging | Daniel Alejandro Lopez et.al. | 2511.02051 | null |
| 2025-11-03 | Reliability Assessment Framework Based on Feature Separability for Pathological Cell Image Classification under Prior Bias | Takaaki Tachibana et.al. | 2511.01953 | null |
| 2025-11-03 | Game-theoretic distributed learning of generative models for heterogeneous data collections | Dmitrij Schlesinger et.al. | 2511.01740 | null |
| 2025-11-03 | Bayesian Natural Gradient Fine-Tuning of CLIP Models via Kalman Filtering | Hossein Abdi et.al. | 2511.01694 | null |
| 2025-11-03 | Protecting the Neural Networks against FGSM Attack Using Machine Unlearning | Amir Hossein Khorasani et.al. | 2511.01377 | null |
| 2025-11-02 | Parameter Interpolation Adversarial Training for Robust Image Classification | Xin Liu et.al. | 2511.00836 | null |
| 2025-11-01 | FeNN-DMA: A RISC-V SoC for SNN acceleration | Zainab Aizaz et.al. | 2511.00732 | null |
| 2025-11-01 | Leveraging Hierarchical Image-Text Misalignment for Universal Fake Image Detection | Daichi Zhang et.al. | 2511.00427 | null |
| 2025-11-01 | LGCA: Enhancing Semantic Representation via Progressive Expansion | Thanh Hieu Cao et.al. | 2511.00419 | null |
| 2025-10-31 | Cross-fluctuation phase transitions reveal sampling dynamics in diffusion models | Sai Niranjan Ramachandran et.al. | 2511.00124 | null |
| 2025-10-28 | ReLaX-Net: Reusing Layers for Parameter-Efficient Physical Neural Networks | Kohei Tsuchiyama et.al. | 2511.00044 | null |
| 2025-10-31 | C-LEAD: Contrastive Learning for Enhanced Adversarial Defense | Suklav Ghosh et.al. | 2510.27249 | null |
| 2025-10-31 | SpecAware: A Spectral-Content Aware Foundation Model for Unifying Multi-Sensor Learning in Hyperspectral Remote Sensing Mapping | Renjie Ji et.al. | 2510.27219 | null |
| 2025-10-31 | AFM-Net: Advanced Fusing Hierarchical CNN Visual Priors with Global Sequence Modeling for Remote Sensing Image Scene Classification | Yuanhao Tang et.al. | 2510.27155 | link |
| 2025-10-30 | Overspecified Mixture Discriminant Analysis: Exponential Convergence, Statistical Guarantees, and Remote Sensing Applications | Arman Bolatov et.al. | 2510.27056 | null |
| 2025-10-30 | Non-Convex Over-the-Air Heterogeneous Federated Learning: A Bias-Variance Trade-off | Muhammad Faraz Ul Abrar et.al. | 2510.26722 | null |
| 2025-10-30 | FlowQ-Net: A Generative Framework for Automated Quantum Circuit Design | Jun Dai et.al. | 2510.26688 | null |
| 2025-10-30 | Do Students Debias Like Teachers? On the Distillability of Bias Mitigation Methods | Jiali Cheng et.al. | 2510.26038 | null |
| 2025-10-29 | Binaspect – A Python Library for Binaural Audio Analysis, Visualization & Feature Generation | Dan Barry et.al. | 2510.25714 | null |
| 2025-10-29 | Monitoring the calibration of probability forecasts with an application to concept drift detection involving image classification | Christopher T. Franck et.al. | 2510.25573 | null |
| 2025-10-29 | Neighborhood Feature Pooling for Remote Sensing Image Classification | Fahimeh Orvati Nia et.al. | 2510.25077 | null |
| 2025-10-29 | Joint Analysis of Acoustic Scenes and Sound Events Based on Semi-Supervised Training of Sound Events With Partial Labels | Keisuke Imoto et.al. | 2510.25075 | null |
| 2025-10-28 | Fair Indivisible Payoffs through Shapley Value | Mikołaj Czarnecki et.al. | 2510.24906 | null |
| 2025-10-25 | CFL-SparseMed: Communication-Efficient Federated Learning for Medical Imaging with Top-k Sparse Updates | Gousia Habib et.al. | 2510.24776 | null |
| 2025-10-28 | All in one timestep: Enhancing Sparsity and Energy efficiency in Multi-level Spiking Neural Networks | Andrea Castagnetti et.al. | 2510.24637 | null |
| 2025-10-28 | Few-Shot Remote Sensing Image Scene Classification with CLIP and Prompt Learning | Ivica Dimitrovski et.al. | 2510.24321 | null |
| 2025-10-26 | Quantum Machine Learning for Image Classification: A Hybrid Model of Residual Network with Quantum Support Vector Machine | Md. Farhan Shahriyar et.al. | 2510.23659 | null |
| 2025-10-27 | iPac: Incorporating Intra-image Patch Context into Graph Neural Networks for Medical Image Classification | Usama Zidan et.al. | 2510.23504 | null |
| 2025-10-27 | Mixed Precision Training of Neural ODEs | Elena Celledoni et.al. | 2510.23498 | null |
| 2025-10-27 | Human-AI Collaborative Uncertainty Quantification | Sima Noorani et.al. | 2510.23476 | null |
| 2025-10-27 | CURVETE: Curriculum Learning and Progressive Self-supervised Training for Medical Image Classification | Asmaa Abbas et.al. | 2510.23442 | null |
| 2025-10-27 | One-Timestep is Enough: Achieving High-performance ANN-to-SNN Conversion via Scale-and-Fire Neurons | Qiuyang Chen et.al. | 2510.23383 | null |
| 2025-10-27 | The Benchmarking Epistemology: Construct Validity for Evaluating Machine Learning Models | Timo Freiesleben et.al. | 2510.23191 | null |
| 2025-10-27 | Generating Auxiliary Tasks with Reinforcement Learning | Judah Goldfeder et.al. | 2510.22940 | null |
| 2025-10-26 | SALSA: Single-pass Autoregressive LLM Structured Classification | Ruslan Berdichevsky et.al. | 2510.22691 | null |
| 2025-10-26 | Alias-Free ViT: Fractional Shift Invariance via Linear Attention | Hagay Michaeli et.al. | 2510.22673 | null |
| 2025-10-25 | Stable neural networks and connections to continuous dynamical systems | Matthias J. Ehrhardt et.al. | 2510.22299 | null |
| 2025-10-25 | WAON: Large-Scale and High-Quality Japanese Image-Text Pair Dataset for Vision-Language Models | Issa Sugiura et.al. | 2510.22276 | null |
| 2025-10-25 | Simplifying Knowledge Transfer in Pretrained Models | Siddharth Jain et.al. | 2510.22208 | null |
| 2025-10-23 | Framework for Machine Evaluation of Reasoning Completeness in Large Language Models For Classification Tasks | Avinash Patil et.al. | 2510.21884 | null |
| 2025-10-23 | TernaryCLIP: Efficiently Compressing Vision-Language Models with Ternary Weights and Distilled Knowledge | Shu-Hao Zhang et.al. | 2510.21879 | null |
| 2025-10-22 | Towards Accurate and Efficient Waste Image Classification: A Hybrid Deep Learning and Machine Learning Approach | Ngoc-Bao-Quang Nguyen et.al. | 2510.21833 | null |
| 2025-10-13 | A Multi-lingual Dataset of Classified Paragraphs from Open Access Scientific Publications | Eric Jeangirard et.al. | 2510.21762 | null |
| 2025-10-24 | Head Pursuit: Probing Attention Specialization in Multimodal Transformers | Lorenzo Basile et.al. | 2510.21518 | null |
| 2025-10-24 | Compressing Quaternion Convolutional Neural Networks for Audio Classification | Arshdeep Singh et.al. | 2510.21388 | null |
| 2025-10-24 | Weak-to-Strong Generalization under Distribution Shifts | Myeongho Jeon et.al. | 2510.21332 | null |
| 2025-10-24 | VL-SAE: Interpreting and Enhancing Vision-Language Alignment with a Unified Concept Set | Shufan Shen et.al. | 2510.21323 | link |
| 2025-10-23 | H-SPLID: HSIC-based Saliency Preserving Latent Information Decomposition | Lukas Miklautz et.al. | 2510.20627 | null |
| 2025-10-23 | Breakdance Video classification in the age of Generative AI | Sauptik Dhar et.al. | 2510.20287 | null |
| 2025-10-22 | Improving Predictive Confidence in Medical Imaging via Online Label Smoothing | Kushan Choudhury et.al. | 2510.20011 | null |
| 2025-10-22 | Uncertainty evaluation of segmentation models for Earth observation | Melanie Rey et.al. | 2510.19586 | null |
| 2025-10-22 | AegisRF: Adversarial Perturbations Guided with Sensitivity for Protecting Intellectual Property of Neural Radiance Fields | Woo Jae Kim et.al. | 2510.19371 | null |
| 2025-10-22 | AMAuT: A Flexible and Efficient Multiview Audio Transformer Framework Trained from Scratch | Weichuang Shao et.al. | 2510.19368 | null |
| 2025-10-22 | Feature Space Adaptation for Robust Model Fine-Tuning | Peng Wang et.al. | 2510.19155 | null |
| 2025-10-21 | Robustness Verification of Graph Neural Networks Via Lightweight Satisfiability Testing | Chia-Hsuan Lu et.al. | 2510.18591 | null |
| 2025-10-21 | DWaste: Greener AI for Waste Sorting using Mobile and Edge Devices | Suman Kunwar et.al. | 2510.18513 | null |
| 2025-10-21 | ScaleNet: Scaling up Pretrained Neural Networks with Incremental Parameters | Zhiwei Hao et.al. | 2510.18431 | null |
| 2025-10-21 | Learning from N-Tuple Data with M Positive Instances: Unbiased Risk Estimation and Theoretical Guarantees | Miao Zhang et.al. | 2510.18406 | null |
| 2025-10-21 | Ensembling Pruned Attention Heads For Uncertainty-Aware Efficient Transformers | Firas Gabetni et.al. | 2510.18358 | null |
| 2025-10-21 | Adaptive Per-Channel Energy Normalization Front-end for Robust Audio Signal Processing | Hanyu Meng et.al. | 2510.18206 | null |
| 2025-10-18 | Advances in Pre-trained Language Models for Domain-Specific Text Classification: A Systematic Review | Zhyar Rzgar K. Rostam et.al. | 2510.17892 | null |
| 2025-10-10 | MAT-Agent: Adaptive Multi-Agent Training Optimization | Jusheng Zhang et.al. | 2510.17845 | null |
| 2025-10-20 | Reliable Inference in Edge-Cloud Model Cascades via Conformal Alignment | Jiayi Huang et.al. | 2510.17543 | null |
| 2025-10-20 | BenCao: An Instruction-Tuned Large Language Model for Traditional Chinese Medicine | Jiacheng Xie et.al. | 2510.17415 | null |
| 2025-10-20 | DDSC: Dynamic Dual-Signal Curriculum for Data-Efficient Acoustic Scene Classification under Domain Shift | Peihong Zhang et.al. | 2510.17345 | null |
| 2025-10-20 | EndoCIL: A Class-Incremental Learning Framework for Endoscopic Image Classification | Bingrong Liu et.al. | 2510.17200 | null |
| 2025-10-19 | ReclAIm: A multi-agent framework for degradation-aware performance tuning of medical imaging AI | Eleftherios Tzanis et.al. | 2510.17004 | null |
| 2025-10-18 | Adversarially Robust Quantum Transfer Learning | Amena Khatun et.al. | 2510.16301 | null |
| 2025-10-17 | Expert Merging in Sparse Mixture of Experts with Nash Bargaining | Dung V. Nguyen et.al. | 2510.16138 | null |
| 2025-10-18 | Differentiable, Bit-shifting, and Scalable Quantization without training neural network from scratch | Zia Badar et.al. | 2510.16088 | null |
| 2025-10-17 | Data-Driven Analysis of Intersectional Bias in Image Classification: A Framework with Bias-Weighted Augmentation | Farjana Yesmin et.al. | 2510.16072 | null |
| 2025-10-17 | FedPURIN: Programmed Update and Reduced INformation for Sparse Personalized Federated Learning | Lunchen Xie et.al. | 2510.16065 | null |
| 2025-10-14 | Layer-Aware Influence for Online Data Valuation Estimation | Ziao Yang et.al. | 2510.16007 | null |
| 2025-10-17 | Balanced Multi-Task Attention for Satellite Image Classification: A Systematic Approach to Achieving 97.23% Accuracy on EuroSAT Without Pre-Training | Aditya Vir et.al. | 2510.15527 | null |
| 2025-10-17 | A Tsetlin Machine Image Classification Accelerator on a Flexible Substrate | Yushu Qin et.al. | 2510.15519 | null |
| 2025-10-17 | Adaptive transfer learning for surgical tool presence detection in laparoscopic videos through gradual freezing fine-tuning | Ana Davila et.al. | 2510.15372 | null |
| 2025-10-16 | Fourier Transform Multiple Instance Learning for Whole Slide Image Classification | Anthony Bilic et.al. | 2510.15138 | null |
| 2025-10-16 | Programmatic Representation Learning with Language Models | Gabriel Poesia et.al. | 2510.14825 | null |
| 2025-10-16 | Rethinking Hebbian Principle: Low-Dimensional Structural Projection for Unsupervised Learning | Shikuang Deng et.al. | 2510.14810 | null |
| 2025-10-16 | Free-Grained Hierarchical Recognition | Seulki Park et.al. | 2510.14737 | null |
| 2025-10-16 | Camera Movement Classification in Historical Footage: A Comparative Study of Deep Video Models | Tingyu Lin et.al. | 2510.14713 | null |
| 2025-10-16 | FedPPA: Progressive Parameter Alignment for Personalized Federated Learning | Maulidi Adi Prasetia et.al. | 2510.14698 | null |
| 2025-10-16 | Geometric Moment Alignment for Domain Adaptation via Siegel Embeddings | Shayan Gharib et.al. | 2510.14666 | null |
| 2025-10-16 | Vision Mamba for Permeability Prediction of Porous Media | Ali Kashefi et.al. | 2510.14516 | null |
| 2025-10-15 | NAPPure: Adversarial Purification for Robust Image Classification under Non-Additive Perturbations | Junjie Nan et.al. | 2510.14025 | null |
| 2025-10-14 | MultiFoodhat: A potential new paradigm for intelligent food quality inspection | Yue Hu et.al. | 2510.13889 | null |
| 2025-10-14 | Large Language Model Agents Enable Autonomous Design and Image Analysis of Microwell Microfluidics | Dinh-Nguyen Nguyen et.al. | 2510.13883 | null |
| 2025-10-15 | Multi-Scale High-Resolution Logarithmic Grapher Module for Efficient Vision GNNs | Mustafa Munir et.al. | 2510.13740 | null |
| 2025-10-15 | Automated document processing system for government agencies using DBNET++ and BART models | Aya Kaysan Bahjat et.al. | 2510.13303 | null |
| 2025-10-15 | Approximate Bilevel Graph Structure Learning for Histopathology Image Classification | Sudipta Paul et.al. | 2510.13188 | null |
| 2025-10-14 | ProtoSiTex: Learning Semi-Interpretable Prototypes for Multi-label Text Classification | Utsav Kumar Nareti et.al. | 2510.12534 | null |
| 2025-10-14 | A Function Centric Perspective On Flat and Sharp Minima | Israel Mason-Williams et.al. | 2510.12451 | null |
| 2025-10-14 | Deep Attention-guided Adaptive Subsampling | Sharath M Shankaranarayana et.al. | 2510.12376 | null |
| 2025-10-14 | Hybrid Vision Transformer and Quantum Convolutional Neural Network for Image Classification | Mingzhu Wang et.al. | 2510.12291 | null |
| 2025-10-14 | State Space Prompting via Gathering and Spreading Spatio-Temporal Information for Video Understanding | Jiahuan Zhou et.al. | 2510.12160 | null |
| 2025-10-14 | A Review on Domain Adaption and Generative Adversarial Networks(GANs) | Aashish Dhawan et.al. | 2510.12075 | null |
| 2025-10-13 | Evaluating the Explainability of Vision Transformers in Medical Imaging | Leili Barekatain et.al. | 2510.12021 | null |
| 2025-10-13 | Bayesian Topological Convolutional Neural Nets | Sarah Harkins Dayton et.al. | 2510.11704 | null |
| 2025-10-13 | Benchmarking foundation models for hyperspectral image classification: Application to cereal crop type mapping | Walid Elbarz et.al. | 2510.11576 | null |
| 2025-10-13 | Investigating Large Language Models’ Linguistic Abilities for Text Preprocessing | Marco Braga et.al. | 2510.11482 | null |
| 2025-10-13 | GADA: Graph Attention-based Detection Aggregation for Ultrasound Video Classification | Li Chen et.al. | 2510.11437 | null |
| 2025-10-13 | Exploring and Leveraging Class Vectors for Classifier Editing | Jaeik Kim et.al. | 2510.11268 | null |
| 2025-10-13 | Multiview Manifold Evidential Fusion for PolSAR Image Classification | Junfei Shi et.al. | 2510.11171 | null |
| 2025-10-13 | One Size Does Not Fit All: Exploring Variable Thresholds for Distance-Based Multi-Label Text Classification | Jens Van Nooten et.al. | 2510.11160 | null |
| 2025-10-13 | Efficient Edge Test-Time Adaptation via Latent Feature Coordinate Correction | Xinyu Luo et.al. | 2510.11068 | null |
| 2025-10-12 | Identifying bias in CNN image classification using image scrambling and transforms | Sai Teja Erukude et.al. | 2510.10383 | null |
| 2025-10-11 | Weed Out, Then Harvest: Dual Low-Rank Adaptation is an Effective Noisy Label Detector for Noise-Robust Learning | Bo Yuan et.al. | 2510.10208 | null |
| 2025-10-11 | Lightweight Baselines for Medical Abstract Classification: DistilBERT with Cross-Entropy as a Strong Default | Jiaqi Liu et.al. | 2510.10025 | null |
| 2025-10-10 | Phase-Aware Deep Learning with Complex-Valued CNNs for Audio Signal Applications | Naman Agrawal et.al. | 2510.09926 | null |
| 2025-10-10 | One Sentence, Two Embeddings: Contrastive Learning of Explicit and Implicit Semantic Representations | Kohei Oda et.al. | 2510.09293 | null |
| 2025-10-10 | Instance-Level Generation for Representation Learning | Yankun Wu et.al. | 2510.09171 | null |
| 2025-10-10 | Exploring Single Domain Generalization of LiDAR-based Semantic Segmentation under Imperfect Labels | Weitong Kong et.al. | 2510.09035 | null |
| 2025-10-10 | Defense against Unauthorized Distillation in Image Restoration via Feature Space Perturbation | Han Hu et.al. | 2510.08925 | null |
| 2025-10-09 | The Boundaries of Fair AI in Medical Image Prognosis: A Causal Perspective | Thai-Hoang Pham et.al. | 2510.08840 | null |
| 2025-10-09 | Structured Output Regularization: a framework for few-shot transfer learning | Nicolas Ewen et.al. | 2510.08728 | null |
| 2025-10-09 | Entropy Regularizing Activation: Boosting Continuous Control, Large Language Models, and Image Classification with Activation as Entropy Constraints | Zilin Kang et.al. | 2510.08549 | link |
| 2025-10-09 | Efficient Prompt Optimisation for Legal Text Classification with Proxy Prompt Evaluator | Hyunji Lee et.al. | 2510.08524 | null |
| 2025-10-09 | Adaptive Gradient Calibration for Single-Positive Multi-Label Learning in Remote Sensing Image Scene Classification | Chenying Liu et.al. | 2510.08269 | null |
| 2025-10-09 | Enhancing Visual Prompting through Expanded Transformation Space and Overfitting Mitigation | Shohei Enomoto et.al. | 2510.07823 | null |
| 2025-10-08 | Multi-Task Pre-Finetuning of Lightweight Transformer Encoders for Text Classification and NER | Junyi Zhu et.al. | 2510.07566 | null |
| 2025-10-08 | Label Semantics for Robust Hyperspectral Image Classification | Rafin Hassan et.al. | 2510.07556 | null |
| 2025-10-08 | Reasoning for Hierarchical Text Classification: The Case of Patents | Lekang Jiang et.al. | 2510.07167 | null |
| 2025-10-08 | Few-Shot Adaptation Benchmark for Remote Sensing Vision-Language Models | Karim El Khoury et.al. | 2510.07135 | link |
| 2025-10-08 | Textual interpretation of transient image classifications from large language models | Fiorenzo Stoppa et.al. | 2510.06931 | null |
| 2025-10-08 | Lamb wave-based MVDR imaging and CNN classification of defects in pipelines | Shuangshuang Li et.al. | 2510.06899 | null |
| 2025-10-08 | Continual Action Quality Assessment via Adaptive Manifold-Aligned Graph Regularization | Kanglei Zhou et.al. | 2510.06842 | null |
| 2025-10-08 | CLAQS: Compact Learnable All-Quantum Token Mixer with Shared-ansatz for Text Classification | Junhao Chen et.al. | 2510.06532 | null |
| 2025-10-02 | User to Video: A Model for Spammer Detection Inspired by Video Classification Technology | Haoyang Zhang et.al. | 2510.06233 | null |
| 2025-10-07 | Shaken or Stirred? An Analysis of MetaFormer’s Token Mixing for Medical Imaging | Ron Keuth et.al. | 2510.05971 | null |
| 2025-10-07 | Leveraging Vision Transformers for Enhanced Classification of Emotions using ECG Signals | Pubudu L. Indrasiri et.al. | 2510.05826 | null |
| 2025-10-07 | A Novel Technique for Robust Training of Deep Networks With Multisource Weak Labeled Remote Sensing Data | Gianmarco Perantoni et.al. | 2510.05760 | null |
| 2025-10-07 | Large Language Model-Based Uncertainty-Adjusted Label Extraction for Artificial Intelligence Model Development in Upper Extremity Radiography | Hanna Kreutzer et.al. | 2510.05664 | null |
| 2025-10-06 | NASP-T: A Fuzzy Neuro-Symbolic Transformer for Logic-Constrained Aviation Safety Report Classification | Fadi Al Machot et.al. | 2510.05451 | null |
| 2025-10-06 | Neuroplastic Modular Framework: Cross-Domain Image Classification of Garbage and Industrial Surfaces | Debojyoti Ghosh et.al. | 2510.05071 | null |
| 2025-10-06 | AWARE, Beyond Sentence Boundaries: A Contextual Transformer Framework for Identifying Cultural Capital in STEM Narratives | Khalid Mehtab Khan et.al. | 2510.04983 | null |
| 2025-10-06 | REN: Anatomically-Informed Mixture-of-Experts for Interstitial Lung Disease Diagnosis | Alec K. Peltekian et.al. | 2510.04923 | null |
| 2025-10-06 | A Semantics-Aware Hierarchical Self-Supervised Approach to Classification of Remote Sensing Images | Giulio Weikmann et.al. | 2510.04916 | null |
| 2025-10-06 | ERDE: Entropy-Regularized Distillation for Early-exit | Martial Guidez et.al. | 2510.04856 | null |
| 2025-10-06 | Federated Learning for Surgical Vision in Appendicitis Classification: Results of the FedSurg EndoVis 2024 Challenge | Max Kirchner et.al. | 2510.04772 | null |
| 2025-10-06 | Do Superpixel Segmentation Methods Influence Deforestation Image Classification? | Hugo Resende et.al. | 2510.04645 | null |
| 2025-10-06 | A Spatial-Spectral-Frequency Interactive Network for Multimodal Remote Sensing Classification | Hao Liu et.al. | 2510.04628 | null |
| 2025-10-05 | LLM Based Bayesian Optimization for Prompt Search | Adam Ballew et.al. | 2510.04384 | null |
| 2025-10-05 | Unmasking Backdoors: An Explainable Defense via Gradient-Attention Anomaly Scoring for Pre-trained Language Models | Anindya Sundar Das et.al. | 2510.04347 | null |
| 2025-10-05 | SliceMoE: Routing Embedding Slices Instead of Tokens for Fine-Grained and Balanced Transformer Scaling | Harshil Vejendla et.al. | 2510.04286 | null |
| 2025-10-05 | From Segments to Concepts: Interpretable Image Classification via Concept-Guided Segmentation | Ran Eisenberg et.al. | 2510.04180 | null |
| 2025-10-05 | Quantization Range Estimation for Convolutional Neural Networks | Bingtao Yang et.al. | 2510.04044 | null |
| 2025-10-05 | Replacing Softmax Similarity with a Sharpened Angular Similarity: Theory and Practice of Scaling To Billion-Context Attention | Sahil Joshi et.al. | 2510.04008 | null |
| 2025-10-04 | Zero-Shot Fine-Grained Image Classification Using Large Vision-Language Models | Md. Atabuzzaman et.al. | 2510.03903 | null |
| 2025-10-04 | Cross-View Open-Vocabulary Object Detection in Aerial Imagery | Jyoti Kini et.al. | 2510.03858 | null |
| 2025-10-04 | Lightweight and Generalizable Acoustic Scene Representations via Contrastive Fine-Tuning and Distillation | Kuang Yuan et.al. | 2510.03728 | null |
| 2025-10-04 | Exploring the Hierarchical Reasoning Model for Small Natural-Image Classification Without Augmentation | Alexander V. Mantzaris et.al. | 2510.03598 | null |
| 2025-10-03 | What is a protest anyway? Codebook conceptualization is still a first-order concern in LLM-era classification | Andrew Halterman et.al. | 2510.03541 | null |
| 2025-10-03 | A Robust Clustered Federated Learning Approach for Non-IID Data with Quantity Skew | Michael Ben Ali et.al. | 2510.03380 | null |
| 2025-10-02 | Error correction in multiclass image classification of facial emotion on unbalanced samples | Andrey A. Lebedev et.al. | 2510.03337 | null |
| 2025-10-01 | SVDefense: Effective Defense against Gradient Inversion Attacks via Singular Value Decomposition | Chenxiang Luo et.al. | 2510.03319 | null |
| 2025-09-28 | QuadEnhancer: Leveraging Quadratic Transformations to Enhance Deep Neural Networks | Qian Chen et.al. | 2510.03276 | null |
| 2025-10-03 | Test-Time Defense Against Adversarial Attacks via Stochastic Resonance of Latent Ensembles | Dong Lao et.al. | 2510.03224 | null |
| 2025-10-02 | In-memory Training on Analog Devices with Limited Conductance States via Multi-tile Residual Learning | Jindan Li et.al. | 2510.02516 | null |
| 2025-09-27 | Language, Culture, and Ideology: Personalizing Offensiveness Detection in Political Tweets with Reasoning LLMs | Dzmitry Pihulski et.al. | 2510.02351 | null |
| 2025-10-02 | Knowledge Distillation Detection for Open-weights Models | Qin Shi et.al. | 2510.02302 | null |
| 2025-10-02 | microCLIP: Unsupervised CLIP Adaptation via Coarse-Fine Token Fusion for Fine-Grained Image Classification | Sathira Silva et.al. | 2510.02270 | null |
| 2025-10-02 | StelLA: Subspace Learning in Low-rank Adaptation using Stiefel Manifold | Zhizhong Li et.al. | 2510.01938 | null |
| 2025-10-02 | A Methodology for Transparent Logic-Based Classification Using a Multi-Task Convolutional Tsetlin Machine | Mayur Kishor Shende et.al. | 2510.01906 | null |
| 2025-10-01 | Intuitions of Machine Learning Researchers about Transfer Learning for Medical Image Classification | Yucheng Lu et.al. | 2510.00902 | null |
| 2025-10-01 | Uncertainty-Aware Concept Bottleneck Models with Enhanced Interpretability | Haifei Zhang et.al. | 2510.00773 | null |
| 2025-10-01 | Quantum Probabilistic Label Refining: Enhancing Label Quality for Robust Image Classification | Fang Qi et.al. | 2510.00528 | null |
| 2025-09-30 | Efficient Layer-wise LLM Fine-tuning for Revision Intention Prediction | Zhexiong Liu et.al. | 2510.00268 | null |
| 2025-09-30 | Object-Centric Case-Based Reasoning via Argumentation | Gabriel de Olim Gaul et.al. | 2510.00185 | null |
| 2025-09-30 | Zero-Shot Decentralized Federated Learning | Alessio Masano et.al. | 2509.26462 | null |
| 2025-09-30 | Attention over Scene Graphs: Indoor Scene Representations Toward CSAI Classification | Artur Barros et.al. | 2509.26457 | null |
| 2025-09-30 | MAPLE: Multi-scale Attribute-enhanced Prompt Learning for Few-shot Whole Slide Image Classification | Junjie Zhou et.al. | 2509.25863 | null |
| 2025-09-29 | Accelerating Dynamic Image Graph Construction on FPGA for Vision GNNs | Anvitha Ramachandran et.al. | 2509.25121 | null |
| 2025-09-29 | Unmute the Patch Tokens: Rethinking Probing in Multi-Label Audio Classification | Lukas Rauch et.al. | 2509.24901 | null |
| 2025-09-29 | A TRIANGLE Enables Multimodal Alignment Beyond Cosine Similarity | Giordano Cicchetti et.al. | 2509.24734 | null |
| 2025-09-29 | VNODE: A Piecewise Continuous Volterra Neural Network | Siddharth Roheda et.al. | 2509.24659 | null |
| 2025-09-29 | Combining Discrepancy-Confusion Uncertainty and Calibration Diversity for Active Fine-Grained Image Classification | Yinghao Jin et.al. | 2509.24181 | null |
| 2025-09-29 | High-Order Progressive Trajectory Matching for Medical Image Dataset Distillation | Le Dong et.al. | 2509.24177 | null |
| 2025-09-28 | Singleton-Optimized Conformal Prediction | Tao Wang et.al. | 2509.24095 | null |
| 2025-09-28 | Bridging the Task Gap: Multi-Task Adversarial Transferability in CLIP and Its Derivatives | Kuanrong Liu et.al. | 2509.23917 | null |
| 2025-09-28 | CE-FAM: Concept-Based Explanation via Fusion of Activation Maps | Michihiro Kuroki et.al. | 2509.23849 | null |
| 2025-09-28 | Spatially Parallel All-optical Neural Networks | Jianwei Qin et.al. | 2509.23611 | null |
| 2025-09-28 | Deep Taxonomic Networks for Unsupervised Hierarchical Prototype Discovery | Zekun Wang et.al. | 2509.23602 | null |
| 2025-09-27 | The Impact of Role Design in In-Context Learning for Large Language Models | Hamidreza Rouzegar et.al. | 2509.23501 | null |
| 2025-09-27 | S$^3$F-Net: A Multi-Modal Approach to Medical Image Classification via Spatial-Spectral Summarizer Fusion Network | Md. Saiful Bari Siddiqui et.al. | 2509.23442 | null |
| 2025-09-27 | Dynamics of Learning: Generative Schedules from Latent ODEs | Matt L. Sampson et.al. | 2509.23052 | null |
| 2025-09-26 | MonoCon: A general framework for learning ultra-compact high-fidelity representations using monotonicity constraints | Shreyas Gokhale et.al. | 2509.22931 | null |
| 2025-09-26 | FishAI 2.0: Marine Fish Image Classification with Multi-modal Few-shot Learning | Chenghan Yang et.al. | 2509.22930 | null |
| 2025-09-24 | Achieving Fair Skin Lesion Detection through Skin Tone Normalization and Channel Pruning | Zihan Wei et.al. | 2509.22712 | null |
| 2025-09-26 | Training-Free Synthetic Data Generation with Dual IP-Adapter Guidance | Luc Boudier et.al. | 2509.22635 | null |
| 2025-09-26 | Rule-Based Reinforcement Learning for Document Image Classification with Vision Language Models | Michael Jungo et.al. | 2509.22283 | null |
| 2025-09-26 | Universal Legal Article Prediction via Tight Collaboration between Supervised Classification Model and LLM | Xiao Chi et.al. | 2509.22119 | null |
| 2025-09-25 | Filtering with Confidence: When Data Augmentation Meets Conformal Prediction | Zixuan Wu et.al. | 2509.21479 | null |
| 2025-09-23 | Coreset selection based on Intra-class diversity | Imran Ashraf et.al. | 2509.21380 | null |
| 2025-09-21 | MDF-MLLM: Deep Fusion Through Cross-Modal Feature Alignment for Contextually Aware Fundoscopic Image Classification | Jason Jordan et.al. | 2509.21358 | null |
| 2025-09-25 | AutoIntent: AutoML for Text Classification | Ilya Alekseev et.al. | 2509.21138 | null |
| 2025-09-25 | Sparse Representations Improve Adversarial Robustness of Neural Network Classifiers | Killian Steunou et.al. | 2509.21130 | null |
| 2025-09-25 | Concepts in Motion: Temporal Bottlenecks for Interpretable Video Classification | Patrick Knab et.al. | 2509.20899 | null |
| 2025-09-25 | Plant identification based on noisy web data: the amazing performance of deep learning (LifeCLEF 2017) | Herve Goeau et.al. | 2509.20856 | null |
| 2025-09-25 | Punching Above Precision: Small Quantized Model Distillation with Learnable Regularizer | Abdur Rehman et.al. | 2509.20854 | null |
| 2025-09-24 | Understanding and Improving Adversarial Robustness of Neural Probabilistic Circuits | Weixin Chen et.al. | 2509.20549 | null |
| 2025-09-24 | Efficiently Attacking Memorization Scores | Tue Do et.al. | 2509.20463 | null |
| 2025-09-24 | Enabling Multi-Species Bird Classification on Low-Power Bioacoustic Loggers | Stefano Ciapponi et.al. | 2509.20103 | null |
| 2025-09-24 | Anatomically Constrained Transformers for Cardiac Amyloidosis Classification | Alexander Thorley et.al. | 2509.19691 | null |
| 2025-09-24 | Thinking While Listening: Simple Test Time Scaling For Audio Classification | Prateek Verma et.al. | 2509.19676 | null |
| 2025-09-14 | Holographic Transformers for Complex-Valued Signal Processing: Integrating Phase Interference into Self-Attention | Enhao Huang et.al. | 2509.19331 | null |
| 2025-09-23 | Algorithms for Adversarially Robust Deep Learning | Alexander Robey et.al. | 2509.19100 | null |
| 2025-09-23 | No Labels Needed: Zero-Shot Image Classification with Collaborative Self-Learning | Matheus Vinícius Todescato et.al. | 2509.18938 | null |
| 2025-09-23 | Benchmarking Vision-Language and Multimodal Large Language Models in Zero-shot and Few-shot Scenarios: A study on Christian Iconography | Gianmarco Spinaci et.al. | 2509.18839 | null |
| 2025-09-23 | Lightweight Vision Transformer with Window and Spatial Attention for Food Image Classification | Xinle Gao et.al. | 2509.18692 | null |
| 2025-09-23 | An overview of neural architectures for self-supervised audio representation learning from masked spectrograms | Sarthak Yadav et.al. | 2509.18691 | null |
| 2025-09-21 | Automatic Classification of Magnetic Chirality of Solar Filaments from H-Alpha Observations | Alexis Chalmers et.al. | 2509.18214 | null |
| 2025-09-17 | Self Identity Mapping | Xiuding Cai et.al. | 2509.18165 | null |
| 2025-09-22 | Elucidating the Design Space of FP4 training | Robert Hu et.al. | 2509.17791 | null |
| 2025-09-22 | Dual-View Alignment Learning with Hierarchical-Prompt for Class-Imbalance Multi-Label Classification | Sheng Huang et.al. | 2509.17747 | null |
| 2025-09-22 | WISE: Weak-Supervision-Guided Step-by-Step Explanations for Multimodal LLMs in Image Classification | Yiwen Jiang et.al. | 2509.17740 | null |
| 2025-09-22 | Enhancing Cross-Lingual Transfer through Reversible Transliteration: A Huffman-Based Approach for Low-Resource Languages | Wenhao Zhuang et.al. | 2509.17493 | null |
| 2025-09-22 | Multimodal Medical Image Classification via Synergistic Learning Pre-training | Qinghua Lin et.al. | 2509.17492 | null |
| 2025-09-21 | DeepASA: An Object-Oriented One-for-All Network for Auditory Scene Analysis | Dongheon Lee et.al. | 2509.17247 | null |
| 2025-09-21 | Flow-Induced Diagonal Gaussian Processes | Moule Lin et.al. | 2509.17153 | null |
| 2025-09-20 | Looking in the mirror: A faithful counterfactual explanation method for interpreting deep image classification models | Townim Faisal Chowdhury et.al. | 2509.16822 | null |
| 2025-09-20 | Towards a Transparent and Interpretable AI Model for Medical Image Classifications | Binbin Wen et.al. | 2509.16685 | null |
| 2025-09-20 | LLM-Guided Co-Training for Text Classification | Md Mezbaur Rahman et.al. | 2509.16516 | null |
| 2025-09-19 | Training Variational Quantum Circuits Using Particle Swarm Optimization | Marco Mordacci et.al. | 2509.15726 | null |
| 2025-09-19 | Impact of Single Rotations and Entanglement Topologies in Quantum Neural Networks | Marco Mordacci et.al. | 2509.15722 | null |
| 2025-09-18 | Training thermodynamic computers by gradient descent | Stephen Whitelam et.al. | 2509.15324 | null |
| 2025-09-18 | Which Direction to Choose? An Analysis on the Representation Power of Self-Supervised ViTs in Downstream Tasks | Yannis Kaltampanidis et.al. | 2509.15272 | null |
| 2025-09-17 | M-PACE: Mother Child Framework for Multimodal Compliance | Shreyash Verma et.al. | 2509.15241 | null |
| 2025-09-18 | Leveraging Geometric Visual Illusions as Perceptual Inductive Biases for Vision Models | Haobo Yang et.al. | 2509.15156 | null |
| 2025-09-18 | Low-rank surrogate modeling and stochastic zero-order optimization for training of neural networks with black-box layers | Andrei Chertkov et.al. | 2509.15113 | null |
| 2025-09-18 | MARIC: Multi-Agent Reasoning for Image Classification | Wonduk Seo et.al. | 2509.14860 | null |
| 2025-09-18 | Threat Modeling for Enhancing Security of IoT Audio Classification Devices under a Secure Protocols Framework | Sergio Benlloch-Lopez et.al. | 2509.14657 | null |
| 2025-09-18 | Enhancing Situational Awareness in Wearable Audio Devices Using a Lightweight Sound Event Localization and Detection System | Jun-Wei Yeow et.al. | 2509.14650 | null |
| 2025-09-16 | HQCNN: A Hybrid Quantum-Classical Neural Network for Medical Image Classification | Shahjalal et.al. | 2509.14277 | null |
| 2025-09-17 | CSMoE: An Efficient Remote Sensing Foundation Model with Soft Mixture-of-Experts | Leonard Hackel et.al. | 2509.14104 | null |
| 2025-09-17 | Comprehensive Evaluation of CNN-Based Audio Tagging Models on Resource-Constrained Devices | Jordi Grau-Haro et.al. | 2509.14049 | null |
| 2025-09-17 | Quantum Variational Activation Functions Empower Kolmogorov-Arnold Networks | Jiun-Cheng Jiang et.al. | 2509.14026 | link |
| 2025-09-17 | Taylor-Series Expanded Kolmogorov-Arnold Network for Medical Imaging Classification | Kaniz Fatema et.al. | 2509.13687 | null |
| 2025-09-17 | Deep Lookup Network | Yulan Guo et.al. | 2509.13662 | null |
| 2025-09-16 | Multimodal Hate Detection Using Dual-Stream Graph Neural Networks | Jiangbei Yue et.al. | 2509.13515 | null |
| 2025-09-14 | Hybrid Quantum-Classical Model for Image Classification | Muhammad Adnan Shahzad et.al. | 2509.13353 | link |
| 2025-09-16 | Hierarchical Deep Fusion Framework for Multi-dimensional Facial Forgery Detection – The 2024 Global Deepfake Image Detection Challenge | Kohou Wang et.al. | 2509.13107 | null |
| 2025-09-16 | Time-step Mixup for Efficient Spiking Knowledge Transfer from Appearance to Event Domain | Yuqi Xie et.al. | 2509.12959 | null |
| 2025-09-16 | Reversible Deep Equilibrium Models | Sam McCallum et.al. | 2509.12917 | null |
| 2025-09-15 | GhostNetV3-Small: A Tailored Architecture and Comparative Study of Distillation Strategies for Tiny Images | Florian Zager et.al. | 2509.12380 | null |
| 2025-09-13 | A Modern Look at Simplicity Bias in Image Classification Tasks | Xiaoguang Chang et.al. | 2509.12265 | null |
| 2025-09-10 | RU-Net for Automatic Characterization of TRISO Fuel Cross Sections | Lu Cai et.al. | 2509.12244 | null |
| 2025-09-15 | GTA: Supervised-Guided Reinforcement Learning for Text Classification with Large Language Models | Min Zeng et.al. | 2509.12108 | null |
| 2025-09-15 | Neuromorphic Photonic Circuits with Nonlinear Dynamics and Memory for Time Sequence Classification | Alessandro Foradori et.al. | 2509.11721 | null |
| 2025-09-15 | Optimizing Class Distributions for Bias-Aware Multi-Class Learning | Mirco Felske et.al. | 2509.11588 | null |
| 2025-09-14 | Decoding Musical Origins: Distinguishing Human and AI Composers | Cheng-Yang Tsai et.al. | 2509.11369 | null |
| 2025-09-14 | Promoting Shape Bias in CNNs: Frequency-Based and Contrastive Regularization for Corruption Robustness | Robin Narsingh Ranabhat et.al. | 2509.11355 | null |
| 2025-09-14 | The Impact of Skin Tone Label Granularity on the Performance and Fairness of AI Based Dermatology Image Classification Models | Partha Shah et.al. | 2509.11184 | null |
| 2025-09-14 | An Entropy-Guided Curriculum Learning Strategy for Data-Efficient Acoustic Scene Classification under Domain Shift | Peihong Zhang et.al. | 2509.11168 | null |
| 2025-09-14 | A Collaborative Framework for Quantum Optimisation and Quantum Neural Networks: Credit Feature Selection and Image Classification | JiaNing Long et.al. | 2509.11110 | null |
| 2025-09-14 | UltraUPConvNet: A UPerNet- and ConvNeXt-Based Multi-Task Network for Ultrasound Tissue Segmentation and Disease Prediction | Zhi Chen et.al. | 2509.11108 | null |
| 2025-09-12 | A Comparison and Evaluation of Fine-tuned Convolutional Neural Networks to Large Language Models for Image Classification and Segmentation of Brain Tumors on MRI | Felicia Liu et.al. | 2509.10683 | null |
| 2025-09-10 | Combining Audio and Non-Audio Inputs in Evolved Neural Networks for Ovenbird | Sergio Poo Hernandez et.al. | 2509.10566 | null |
| 2025-09-02 | FireGNN: Neuro-Symbolic Graph Neural Networks with Trainable Fuzzy Rules for Interpretable Medical Image Classification | Prajit Sengupta et.al. | 2509.10510 | link |
| 2025-09-12 | Beyond Token Limits: Assessing Language Model Performance on Long Text Classification | Miklós Sebők et.al. | 2509.10199 | null |
| 2025-09-12 | Prototypical Contrastive Learning For Improved Few-Shot Audio Classification | Christos Sgouropoulos et.al. | 2509.10074 | null |
| 2025-09-12 | Acoustic Scene Classification Using CNN-GRU Model Without Knowledge Distillation | Ee-Leng Tan et.al. | 2509.09931 | null |
| 2025-09-11 | Images in Motion?: A First Look into Video Leakage in Collaborative Deep Learning | Md Fazle Rasul et.al. | 2509.09742 | null |
| 2025-09-11 | Image Recognition with Vision and Language Embeddings of VLMs | Illia Volkov et.al. | 2509.09311 | null |
| 2025-09-11 | Adaptive Knowledge Distillation using a Device-Aware Teacher for Low-Complexity Acoustic Scene Classification | Seung Gyu Jeong et.al. | 2509.09262 | null |
| 2025-09-11 | CWSSNet: Hyperspectral Image Classification Enhanced by Wavelet Domain Convolution | Yulin Tong et.al. | 2509.09163 | null |
| 2025-09-10 | CoSwin: Convolution Enhanced Hierarchical Shifted Window Attention For Small-Scale Vision | Puskal Khadka et.al. | 2509.08959 | link |
| 2025-09-10 | UOPSL: Unpaired OCT Predilection Sites Learning for Fundus Image Diagnosis Augmentation | Zhihao Zhao et.al. | 2509.08624 | null |
| 2025-09-10 | HyperTTA: Test-Time Adaptation for Hyperspectral Image Classification under Distribution Shifts | Xia Yue et.al. | 2509.08436 | null |
| 2025-09-10 | Boosted Training of Lightweight Early Exits for Optimizing CNN Image Classification Inference | Yehudit Aperstein et.al. | 2509.08318 | null |
| 2025-09-10 | SimCroP: Radiograph Representation Learning with Similarity-driven Cross-granularity Pre-training | Rongsheng Wang et.al. | 2509.08311 | link |
| 2025-09-09 | Are Humans as Brittle as Large Language Models? | Jiahui Li et.al. | 2509.07869 | null |
| 2025-09-09 | Spectral and Rhythm Feature Performance Evaluation for Category and Class Level Audio Classification with Deep Convolutional Neural Networks | Friedrich Wolf-Monheim et.al. | 2509.07756 | null |
| 2025-09-09 | Nearest Neighbor Projection Removal Adversarial Training | Himanshu Singh et.al. | 2509.07673 | null |
| 2025-09-09 | MedicalPatchNet: A Patch-Based Self-Explainable AI Architecture for Chest X-ray Classification | Patrick Wienholt et.al. | 2509.07477 | link |
| 2025-09-08 | Dimensionally Reduced Open-World Clustering: DROWCULA | Erencem Ozbey et.al. | 2509.07184 | null |
| 2025-09-07 | 1 bit is all we need: binary normalized neural networks | Eduardo Lobo Lustoda Cabral et.al. | 2509.07025 | null |
| 2025-09-03 | FedAPT: Federated Adversarial Prompt Tuning for Vision-Language Models | Kun Zhai et.al. | 2509.06992 | null |
| 2025-09-08 | Entanglement and Classical Simulability in Quantum Extreme Learning Machines | A. De Lorenzis et.al. | 2509.06873 | null |
| 2025-09-08 | Video-Based MPAA Rating Prediction: An Attention-Driven Hybrid Architecture Using Contrastive Learning | Dipta Neogi et.al. | 2509.06826 | null |
| 2025-09-08 | Classical Neural Networks on Quantum Devices via Tensor Network Disentanglers: A Case Study in Image Classification | Borja Aizpurua et.al. | 2509.06653 | null |
| 2025-09-08 | IGAff: Benchmarking Adversarial Iterative and Genetic Affine Algorithms on Deep Neural Networks | Sebastian-Vasile Echim et.al. | 2509.06459 | null |
| 2025-09-07 | Khana: A Comprehensive Indian Cuisine Dataset | Omkar Prabhu et.al. | 2509.06006 | null |
| 2025-09-07 | A brain-inspired paradigm for scalable quantum vision | Chenghua Duan et.al. | 2509.05919 | null |
| 2025-09-06 | Brain Tumor Detection Through Diverse CNN Architectures in IoT Healthcare Industries: Fast R-CNN, U-Net, Transfer Learning-Based CNN, and Fully Connected CNN | Mohsen Asghari Ilani et.al. | 2509.05821 | null |
| 2025-09-06 | High Utilization Energy-Aware Real-Time Inference Deep Convolutional Neural Network Accelerator | Kuan-Ting Lin et.al. | 2509.05688 | null |
| 2025-09-06 | LM-Searcher: Cross-domain Neural Architecture Search with LLMs via Unified Numerical Encoding | Yuxuan Hu et.al. | 2509.05657 | null |
| 2025-09-05 | Quaternion Approximation Networks for Enhanced Image Classification and Oriented Object Detection | Bryce Grant et.al. | 2509.05512 | null |
| 2025-09-05 | Prior Distribution and Model Confidence | Maksim Kazanskii et.al. | 2509.05485 | null |
| 2025-09-06 | Universality of physical neural networks with multivariate nonlinearity | Benjamin Savinson et.al. | 2509.05420 | null |
| 2025-08-30 | Application of discrete Ricci curvature in pruning randomly wired neural networks: A case study with chest x-ray classification of COVID-19 | Pavithra Elumalai et.al. | 2509.05322 | null |
| 2025-08-30 | Context-Aware Knowledge Distillation with Adaptive Weighting for Image Classification | Zhengda Li et.al. | 2509.05319 | null |
| 2025-09-04 | Noisy Label Refinement with Semantically Reliable Synthetic Images | Yingxuan Li et.al. | 2509.04298 | null |
| 2025-09-04 | An Automated, Scalable Machine Learning Model Inversion Assessment Pipeline | Tyler Shumaker et.al. | 2509.04214 | null |
| 2025-09-04 | Real Time FPGA Based Transformers & VLMs for Vision Tasks: SOTA Designs and Optimizations | Safa Mohammed Sali et.al. | 2509.04162 | null |
| 2025-09-04 | SAC-MIL: Spatial-Aware Correlated Multiple Instance Learning for Histopathology Whole Slide Image Classification | Yu Bai et.al. | 2509.03973 | null |
| 2025-09-03 | Learning Mechanism Underlying NLP Pre-Training and Fine-Tuning | Yarden Tzach et.al. | 2509.03407 | null |
| 2025-09-03 | TinyDrop: Tiny Model Guided Token Dropping for Vision Transformers | Guoxin Wang et.al. | 2509.03379 | null |
| 2025-08-24 | The Lifecycle Principle: Stabilizing Dynamic Neural Networks with State Memory | Zichuan Yang et.al. | 2509.02575 | null |
| 2025-09-02 | Ordinal Adaptive Correction: A Data-Centric Approach to Ordinal Image Classification with Noisy Labels | Alireza Sedighi Moghaddam et.al. | 2509.02351 | null |
| 2025-09-02 | Extrapolated Markov Chain Oversampling Method for Imbalanced Text Classification | Aleksi Avela et.al. | 2509.02332 | null |
| 2025-09-02 | HydroVision: Predicting Optically Active Parameters in Surface Water Using Computer Vision | Shubham Laxmikant Deshmukh et.al. | 2509.01882 | null |
| 2025-09-01 | Modeling and benchmarking quantum optical neurons for efficient neural computation | Andrea Andrisani et.al. | 2509.01784 | null |
| 2025-09-01 | Examination of PCA Utilisation for Multilabel Classifier of Multispectral Images | Filip Karpowicz et.al. | 2509.01691 | null |
| 2025-09-01 | AgroSense: An Integrated Deep Learning System for Crop Recommendation via Soil Image Analysis and Nutrient Profiling | Vishal Pandey et.al. | 2509.01344 | null |
| 2025-08-31 | Hybrid Topic-Semantic Labeling and Graph Embeddings for Unsupervised Legal Document Clustering | Deepak Bastola et.al. | 2509.00990 | null |
| 2025-08-31 | Performance Analysis of Supervised Machine Learning Algorithms for Text Classification | Sadia Zaman Mishu et.al. | 2509.00983 | null |
| 2025-08-31 | Quantization Meets OOD: Generalizable Quantization-aware Training from a Flatness Perspective | Jiacheng Jiang et.al. | 2509.00859 | null |
| 2025-08-31 | A computer vision-based approach to enhance seismic catalogues | Michele De Solda et.al. | 2509.00791 | null |
| 2025-08-31 | Multi-Level CLS Token Fusion for Contrastive Learning in Endoscopy Image Classification | Y Hop Nguyen et.al. | 2509.00752 | null |
| 2025-08-31 | CSFMamba: Cross State Fusion Mamba Operator for Multimodal Remote Sensing Image Classification | Qingyu Wang et.al. | 2509.00677 | null |
| 2025-08-30 | All-optical classification of real biomedical cell images using a diffractive neural network: a simulation study | Norihide Sagami et.al. | 2509.00370 | null |
| 2025-08-30 | Target-Oriented Single Domain Generalization | Marzi Heidari et.al. | 2509.00351 | null |
| 2025-08-29 | Principled Approximation Methods for Efficient and Scalable Deep Learning | Pedro Savarese et.al. | 2509.00174 | null |
| 2025-08-27 | Yet Unnoticed in LSTM: Binary Tree Based Input Reordering, Weight Regularization, and Gate Nonlinearization | Mojtaba Moattari et.al. | 2509.00087 | null |
| 2025-08-24 | Performance is not All You Need: Sustainability Considerations for Algorithms | Xiang Li et.al. | 2509.00045 | null |
| 2025-08-29 | I Stolenly Swear That I Am Up to (No) Good: Design and Evaluation of Model Stealing Attacks | Daryna Oliynyk et.al. | 2508.21654 | null |
| 2025-08-28 | Full-Frequency Temporal Patching and Structured Masking for Enhanced Audio Classification | Aditya Makineni et.al. | 2508.21243 | null |
| 2025-08-28 | Online incremental learning for audio classification using a pretrained audio model | Manjunath Mulimani et.al. | 2508.20732 | null |
| 2025-08-28 | Domain Adaptation Techniques for Natural and Medical Image Classification | Ahmad Chaddad et.al. | 2508.20537 | null |
| 2025-08-28 | Dual-Model Weight Selection and Self-Knowledge Distillation for Medical Image Classification | Ayaka Tsutsumi et.al. | 2508.20461 | null |
| 2025-08-27 | Exploring Selective Retrieval-Augmentation for Long-Tail Legal Text Classification | Boheng Mao et.al. | 2508.19997 | null |
| 2025-08-27 | Dhati+: Fine-tuned Large Language Models for Arabic Subjectivity Evaluation | Slimane Bellaouar et.al. | 2508.19966 | null |
| 2025-08-27 | Microscale optoelectronic reservoir networks of halide perovskite for in-sensor computing | Jeroen J. de Boer et.al. | 2508.19916 | null |
| 2025-08-27 | Image Quality Assessment for Machines: Paradigm, Large-scale Database, and Models | Xiaoqi Wang et.al. | 2508.19850 | link |
| 2025-08-26 | Time Series Analysis of Spiking Neural Systems via Transfer Entropy and Directed Persistent Homology | Dylan Peek et.al. | 2508.19048 | null |
| 2025-08-26 | Automatic Prompt Optimization with Prompt Distillation | Ernest A. Dyagin et.al. | 2508.18992 | null |
| 2025-08-26 | Flatness-aware Curriculum Learning via Adversarial Difficulty | Hiroaki Aizawa et.al. | 2508.18726 | null |
| 2025-08-26 | Class-wise Flooding Regularization for Imbalanced Image Classification | Hiroaki Aizawa et.al. | 2508.18723 | null |
| 2025-08-26 | Natural Image Classification via Quasi-Cyclic Graph Ensembles and Random-Bond Ising Models at the Nishimori Temperature | V. S. Usatyuk et.al. | 2508.18717 | null |
| 2025-08-25 | Analise de Desaprendizado de Maquina em Modelos de Classificacao de Imagens Medicas | Andreza M. C. Falcao et.al. | 2508.18509 | null |
| 2025-08-25 | Scene-Aware Vectorized Memory Multi-Agent Framework with Cross-Modal Differentiated Quantization VLMs for Visually Impaired Assistance | Xiangxiang Wang et.al. | 2508.18177 | null |
| 2025-08-25 | Hybrid Quantum-Classical Learning for Multiclass Image Classification | Shuchismita Anwar et.al. | 2508.18161 | null |
| 2025-08-25 | Designing Practical Models for Isolated Word Visual Speech Recognition | Iason Ioannis Panagos et.al. | 2508.17894 | null |
| 2025-08-25 | Towards Optimal Convolutional Transfer Learning Architectures for Breast Lesion Classification and ACL Tear Detection | Daniel Frees et.al. | 2508.17567 | null |
| 2025-08-24 | Efficient Zero-Shot Long Document Classification by Reducing Context Through Sentence Ranking | Prathamesh Kokate et.al. | 2508.17490 | null |
| 2025-08-24 | Morphological Cognition: Classifying MNIST Digits Through Morphological Computation Alone | Alican Mertan et.al. | 2508.17469 | null |
| 2025-08-24 | ResLink: A Novel Deep Learning Architecture for Brain Tumor Classification with Area Attention and Residual Connections | Sumedha Arya et.al. | 2508.17259 | null |
| 2025-08-23 | GRAID: Synthetic Data Generation with Geometric Constraints and Multi-Agentic Reflection for Harmful Content Detection | Melissa Kazemi Rad et.al. | 2508.17057 | null |
| 2025-08-22 | Enhanced NIRMAL Optimizer With Damped Nesterov Acceleration: A Comparative Analysis | Nirmal Gaud et.al. | 2508.16550 | null |
| 2025-08-22 | LLM-as-classifier: Semi-Supervised, Iterative Framework for Hierarchical Text Classification using Large Language Models | Doohee You et.al. | 2508.16478 | null |
| 2025-08-22 | Vision encoders should be image size agnostic and task driven | Nedyalko Prisadnikov et.al. | 2508.16317 | null |
| 2025-08-22 | An Investigation of Visual Foundation Models Robustness | Sandeep Gupta et.al. | 2508.16225 | null |
| 2025-08-21 | Contributions to Label-Efficient Learning in Computer Vision and Remote Sensing | Minh-Tan Pham et.al. | 2508.15973 | null |
| 2025-08-21 | Glo-VLMs: Leveraging Vision-Language Models for Fine-Grained Diseased Glomerulus Classification | Zhenhao Guo et.al. | 2508.15960 | null |
| 2025-08-21 | Investigating Different Geo Priors for Image Classification | Angela Zhu et.al. | 2508.15946 | null |
| 2025-08-21 | Strategic Sample Selection for Improved Clean-Label Backdoor Attacks in Text Classification | Onur Alp Kirci et.al. | 2508.15934 | null |
| 2025-08-21 | Structure-Preserving Medical Image Generation from a Latent Graph Representation | Kevin Arias et.al. | 2508.15920 | null |
| 2025-08-13 | A BERT-based Hierarchical Classification Model with Applications in Chinese Commodity Classification | Kun Liu et.al. | 2508.15800 | null |
| 2025-08-21 | Tutorial on the Probabilistic Unification of Estimation Theory, Machine Learning, and Generative AI | Mohammed Elmusrati et.al. | 2508.15719 | null |
| 2025-08-21 | ASCMamba: Multimodal Time-Frequency Mamba for Acoustic Scene Classification | Bochao Sun et.al. | 2508.15632 | null |
| 2025-08-21 | AudioSet-R: A Refined AudioSet with Multi-Stage LLM Label Reannotation | Yulin Sun et.al. | 2508.15429 | null |
| 2025-08-21 | Transfer learning optimization based on evolutionary selective fine tuning | Jacinto Colan et.al. | 2508.15367 | null |
| 2025-08-21 | Explainable Knowledge Distillation for Efficient Medical Image Classification | Aqib Nazir Mir et.al. | 2508.15251 | null |
| 2025-08-21 | Robust and Efficient Quantum Reservoir Computing with Discrete Time Crystal | Da Zhang et.al. | 2508.15230 | null |
| 2025-08-20 | Fast Graph Neural Network for Image Classification | Mustafa Mohammadi Gharasuie et.al. | 2508.14958 | null |
| 2025-08-20 | HHNAS-AM: Hierarchical Hybrid Neural Architecture Search using Adaptive Mutation Policies | Anurag Tripathi et.al. | 2508.14946 | null |
| 2025-08-19 | TOM: An Open-Source Tongue Segmentation Method with Multi-Teacher Distillation and Task-Specific Data Augmentation | Jiacheng Xie et.al. | 2508.14932 | null |
| 2025-08-20 | Seeing Further on the Shoulders of Giants: Knowledge Inheritance for Vision Foundation Models | Jiabo Huang et.al. | 2508.14707 | null |
| 2025-08-20 | SMTrack: End-to-End Trained Spiking Neural Networks for Multi-Object Tracking in RGB Videos | Pengzhi Zhong et.al. | 2508.14607 | null |
| 2025-08-20 | Incremental Object Detection with Prompt-based Methods | Matthias Neuwirth-Trapp et.al. | 2508.14599 | null |
| 2025-08-20 | Multi-view Graph Condensation via Tensor Decomposition | Nícolas Roque dos Santos et.al. | 2508.14330 | null |
| 2025-08-19 | Graph Concept Bottleneck Models | Haotian Xu et.al. | 2508.14255 | null |
| 2025-08-19 | Accelerating Image Classification with Graph Convolutional Neural Networks using Voronoi Diagrams | Mustafa Mohammadi Gharasuie et.al. | 2508.14218 | null |
| 2025-08-19 | Comparing energy consumption and accuracy in text classification inference | Johannes Zschache et.al. | 2508.14170 | null |
| 2025-08-12 | Toward Lifelong Learning in Equilibrium Propagation: Sleep-like and Awake Rehearsal for Enhanced Stability | Yoshimasa Kubo et.al. | 2508.14081 | null |
| 2025-08-19 | Towards Efficient Vision State Space Models via Token Merging | Jinyoung Park et.al. | 2508.13599 | null |
| 2025-08-19 | A fully-programmable integrated photonic processor for both domain-specific and general-purpose computing | Feng-Kai Han et.al. | 2508.13551 | null |
| 2025-08-19 | Compressed Models are NOT Trust-equivalent to Their Large Counterparts | Rohit Raj Rai et.al. | 2508.13533 | null |
| 2025-08-19 | Vision Transformers for Kidney Stone Image Classification: A Comparative Study with CNNs | Ivan Reyes-Amezcua et.al. | 2508.13461 | null |
| 2025-08-18 | Applications of Small Language Models in Medical Imaging Classification with a Focus on Prompt Strategies | Yiting Wang et.al. | 2508.13378 | null |
| 2025-08-18 | Preserve and Sculpt: Manifold-Aligned Fine-tuning of Vision-Language Models for Few-Shot Learning | Dexia Chen et.al. | 2508.12877 | null |
| 2025-08-18 | CLAIRE-DSA: Fluoroscopic Image Classification for Quality Assurance of Computer Vision Pipelines in Acute Ischemic Stroke | Cristo J. van den Berg et.al. | 2508.12755 | null |
| 2025-08-17 | Skin Cancer Classification: Hybrid CNN-Transformer Models with KAN-Based Fusion | Shubhi Agarwal et.al. | 2508.12484 | null |
| 2025-08-17 | Federated Cross-Modal Style-Aware Prompt Generation | Suraj Prasad et.al. | 2508.12399 | null |
| 2025-08-17 | Attention Pooling Enhances NCA-based Classification of Microscopy Images | Chen Yang et.al. | 2508.12324 | null |
| 2025-08-17 | CryptPEFT: Efficient and Private Neural Network Inference via Parameter-Efficient Fine-Tuning | Saisai Xia et.al. | 2508.12264 | null |
| 2025-08-16 | Extending Straight-Through Estimation for Robust Neural Networks on Analog CIM Hardware | Yuannuo Feng et.al. | 2508.11940 | null |
| 2025-08-15 | An Efficient Medical Image Classification Method Based on a Lightweight Improved ConvNeXt-Tiny Architecture | Jingsong Xia et.al. | 2508.11532 | null |
| 2025-08-15 | Robust Convolution Neural ODEs via Contractivity-promoting regularization | Muhammad Zakwan et.al. | 2508.11432 | null |
| 2025-08-15 | Model Interpretability and Rationale Extraction by Input Mask Optimization | Marc Brinner et.al. | 2508.11388 | null |
| 2025-08-15 | Noise Matters: Optimizing Matching Noise for Diffusion Classifiers | Yanghao Wang et.al. | 2508.11330 | null |
| 2025-08-13 | NIRMAL Pooling: An Adaptive Max Pooling Approach with Non-linear Activation for Enhanced Image Classification | Nirmal Gaud et.al. | 2508.10940 | null |
| 2025-08-14 | X-Node: Self-Explanation is All We Need | Prajit Sengupta et.al. | 2508.10461 | null |
| 2025-08-13 | Invisible Watermarks, Visible Gains: Steering Machine Unlearning with Bi-Level Watermarking Design | Yuhao Sun et.al. | 2508.10065 | null |
| 2025-08-10 | Multi-task Adversarial Attacks against Black-box Model with Few-shot Queries | Wenqiang Wang et.al. | 2508.10039 | null |
| 2025-08-08 | LLMCARE: Alzheimer’s Detection via Transformer Models Enhanced by LLM-Generated Synthetic Data | Ali Zolnour et.al. | 2508.10027 | null |
| 2025-08-04 | AutoGeTS: Knowledge-based Automated Generation of Text Synthetics for Improving Text Classification | Chenhao Xue et.al. | 2508.10000 | null |
| 2025-08-13 | MOC: Meta-Optimized Classifier for Few-Shot Whole Slide Image Classification | Tianqi Xiang et.al. | 2508.09967 | null |
| 2025-08-13 | HKT: A Biologically Inspired Framework for Modular Hereditary Knowledge Transfer in Neural Networks | Yanick Chistian Tchenko et.al. | 2508.09743 | null |
| 2025-08-13 | Exploring the Equivalence of Closed-Set Generative and Real Data Augmentation in Image Classification | Haowen Wang et.al. | 2508.09550 | null |
| 2025-08-13 | CLIP-Flow: A Universal Discriminator for AI-Generated Images Inspired by Anomaly Detection | Zhipeng Yuan et.al. | 2508.09477 | null |
| 2025-08-12 | SinLlama – A Large Language Model for Sinhala | H. W. K. Aravinda et.al. | 2508.09115 | null |
| 2025-08-12 | 3DFroMLLM: 3D Prototype Generation only from Pretrained Multimodal LLMs | Noor Ahmed et.al. | 2508.08821 | null |
| 2025-08-12 | Classifier Language Models: Unifying Sparse Finetuning and Adaptive Tokenization for Specialized Classification Tasks | Adit Krishnan et.al. | 2508.08635 | null |
| 2025-08-11 | Incoherent Light-Driven Nonlinear Optical Extreme Learner via Data Reverberation | Bofeng Liu et.al. | 2508.08428 | null |
| 2025-08-11 | Neural Tangent Knowledge Distillation for Optical Convolutional Networks | Jinlin Xiang et.al. | 2508.08421 | null |
| 2025-08-11 | SHeRL-FL: When Representation Learning Meets Split Learning in Hierarchical Federated Learning | Dung T. Tran et.al. | 2508.08339 | null |
| 2025-08-11 | FairFLRep: Fairness aware fault localization and repair of Deep Neural Networks | Moses Openja et.al. | 2508.08151 | null |
| 2025-08-11 | Pindrop it! Audio and Visual Deepfake Countermeasures for Robust Detection and Fine Grained-Localization | Nicholas Klein et.al. | 2508.08141 | null |
| 2025-08-11 | Data-Efficient Biomedical In-Context Learning: A Diversity-Enhanced Submodular Perspective | Jun Wang et.al. | 2508.08140 | null |
| 2025-08-11 | Auditory Intelligence: Understanding the World Through Sound | Hyeonuk Nam et.al. | 2508.07829 | null |
| 2025-08-11 | Importance-Aware Semantic Communication in MIMO-OFDM Systems Using Vision Transformer | Joohyuk Park et.al. | 2508.07696 | null |
| 2025-08-11 | GLiClass: Generalist Lightweight Model for Sequence Classification Tasks | Ihor Stepanov et.al. | 2508.07662 | link |
| 2025-08-09 | Sensory robustness through top-down feedback and neural stochasticity in recurrent vision models | Antonino Greco et.al. | 2508.07115 | null |
| 2025-08-09 | Nonlinear Photonic Neuromorphic Chips for Spiking Reinforcement Learning | Shuiying Xiang et.al. | 2508.06962 | null |
| 2025-08-09 | Beyond Frequency: Seeing Subtle Cues Through the Lens of Spatial Decomposition for Fine-Grained Visual Classification | Qin Xu et.al. | 2508.06959 | null |
| 2025-08-08 | Large Language Models for Oral History Understanding with Text Classification and Sentiment Analysis | Komala Subramanyam Cherukuri et.al. | 2508.06729 | link |
| 2025-08-06 | Slice or the Whole Pie? Utility Control for AI Models | Ye Tao et.al. | 2508.06551 | null |
| 2025-08-02 | Large Language Models Facilitate Vision Reflection in Image Classification | Guoyuan An et.al. | 2508.06525 | null |
| 2025-08-08 | Blockchain-Enabled Federated Learning | Murtaza Rangwala et.al. | 2508.06406 | null |
| 2025-08-08 | Text as Any-Modality for Zero-Shot Classification by Consistent Prompt Tuning | Xiangyu Wu et.al. | 2508.06382 | null |
| 2025-08-08 | FedX: Explanation-Guided Pruning for Communication-Efficient Federated Learning in Remote Sensing | Barış Büyüktaş et.al. | 2508.06256 | null |
| 2025-08-07 | AHDMIL: Asymmetric Hierarchical Distillation Multi-Instance Learning for Fast and Accurate Whole-Slide Image Classification | Jiuyang Dong et.al. | 2508.05114 | null |
| 2025-08-07 | ULU: A Unified Activation Function | Simin Huo et.al. | 2508.05073 | null |
| 2025-08-07 | MedMambaLite: Hardware-Aware Mamba for Medical Image Classification | Romina Aalishah et.al. | 2508.05049 | null |
| 2025-08-06 | Revealing Temporal Label Noise in Multimodal Hateful Video Classification | Shuonan Yang et.al. | 2508.04900 | null |
| 2025-08-06 | Share Your Attention: Transformer Weight Sharing via Matrix-based Dictionary Learning | Magauiya Zhussip et.al. | 2508.04581 | null |
| 2025-08-06 | Benchmarking Uncertainty and its Disentanglement in multi-label Chest X-Ray Classification | Simon Baur et.al. | 2508.04457 | null |
| 2025-08-06 | Matrix-Free Two-to-Infinity and One-to-Two Norms Estimation | Askar Tsyganov et.al. | 2508.04444 | null |
| 2025-08-06 | WSS-CL: Weight Saliency Soft-Guided Contrastive Learning for Efficient Machine Unlearning Image Classification | Thang Duc Tran et.al. | 2508.04308 | null |
| 2025-08-06 | Comparative Analysis of Novel NIRMAL Optimizer Against Adam and SGD with Momentum | Nirmal Gaud et.al. | 2508.04293 | null |
| 2025-08-06 | A machine learning approach for image classification in synthetic aperture RADAR | Romina Gaburro et.al. | 2508.04234 | null |
| 2025-08-06 | DocVCE: Diffusion-based Visual Counterfactual Explanations for Document Image Classification | Saifullah Saifullah et.al. | 2508.04233 | null |
| 2025-08-06 | Hierarchical Text Classification Using Black Box Large Language Models | Kosuke Yoshimura et.al. | 2508.04219 | null |
| 2025-08-06 | DP-DocLDM: Differentially Private Document Image Generation using Latent Diffusion Models | Saifullah Saifullah et.al. | 2508.04208 | null |
| 2025-08-06 | Dual Prompt Learning for Adapting Vision-Language Models to Downstream Image-Text Retrieval | Yifan Wang et.al. | 2508.04028 | null |
| 2025-08-05 | FedPromo: Federated Lightweight Proxy Models at the Edge Bring New Domains to Foundation Models | Matteo Caligiuri et.al. | 2508.03356 | null |
| 2025-08-05 | Adaptive Sparse Softmax: An Effective and Efficient Softmax Variant | Qi Lv et.al. | 2508.03175 | null |
| 2025-08-05 | Augmenting Continual Learning of Diseases with LLM-Generated Visual Concepts | Jiantao Tan et.al. | 2508.03094 | null |
| 2025-08-05 | Contrastive Cross-Bag Augmentation for Multiple Instance Learning-based Whole Slide Image Classification | Bo Zhang et.al. | 2508.03081 | null |
| 2025-08-05 | The Geometry of Cortical Computation: Manifold Disentanglement and Predictive Dynamics in VCNet | Brennen A. Hill et.al. | 2508.02995 | null |
| 2025-08-04 | Tricks and Plug-ins for Gradient Boosting with Transformers | Biyi Fang et.al. | 2508.02924 | null |
| 2025-08-04 | ASMR: Angular Support for Malfunctioning Client Resilience in Federated Learning | Mirko Konstantin et.al. | 2508.02414 | null |
| 2025-08-04 | Semi-Supervised Dual-Threshold Contrastive Learning for Ultrasound Image Classification and Segmentation | Peng Zhang et.al. | 2508.02265 | null |
| 2025-08-04 | Reservoir Computing with Evolved Critical Neural Cellular Automata | Sidney Pontes-Filho et.al. | 2508.02218 | null |
| 2025-08-04 | Large-Scale Model Enabled Semantic Communication Based on Robust Knowledge Distillation | Kuiyuan Ding et.al. | 2508.02148 | null |
| 2025-08-04 | FedLAD: A Linear Algebra Based Data Poisoning Defence for Federated Learning | Qi Xiong et.al. | 2508.02136 | null |
| 2025-08-04 | REACT-KD: Region-Aware Cross-modal Topological Knowledge Distillation for Interpretable Medical Image Classification | Hongzhao Chen et.al. | 2508.02104 | null |
| 2025-08-04 | Deeply Dual Supervised learning for melanoma recognition | Rujosh Polma et.al. | 2508.01994 | null |
| 2025-08-03 | Granular Concept Circuits: Toward a Fine-Grained Circuit Discovery for Concept Representations | Dahee Kwon et.al. | 2508.01728 | null |
| 2025-08-03 | HateClipSeg: A Segment-Level Annotated Dataset for Fine-Grained Hate Video Detection | Han Wang et.al. | 2508.01712 | null |
| 2025-08-03 | TopoImages: Incorporating Local Topology Encoding into Deep Learning Models for Medical Image Classification | Pengfei Gu et.al. | 2508.01574 | null |
| 2025-08-03 | EvoVLMA: Evolutionary Vision-Language Model Adaptation | Kun Ding et.al. | 2508.01558 | null |
| 2025-08-02 | TeSent: A Benchmark Dataset for Fairness-aware Explainable Sentiment Classification in Telugu | Vallabhaneni Raj Kumar et.al. | 2508.01486 | null |
| 2025-08-02 | GMAT: Grounded Multi-Agent Clinical Description Generation for Text Encoder in Vision-Language MIL for Whole Slide Image Classification | Ngoc Bui Lam Quang et.al. | 2508.01293 | null |
| 2025-08-02 | Eigen Neural Network: Unlocking Generalizable Vision with Eigenbasis | Anzhe Cheng et.al. | 2508.01219 | null |
| 2025-08-01 | Small sample-based adaptive text classification through iterative and contrastive description refinement | Amrit Rajeev et.al. | 2508.00957 | null |
| 2025-07-30 | XAutoLM: Efficient Fine-Tuning of Language Models via Meta-Learning and AutoML | Ernesto L. Estevanell-Valladares et.al. | 2508.00924 | null |
| 2025-07-31 | Object-Centric Cropping for Visual Few-Shot Classification | Aymane Abdali et.al. | 2508.00218 | null |
| 2025-07-31 | Explainable Image Classification with Reduced Overconfidence for Tissue Characterisation | Alfie Roddan et.al. | 2507.23709 | null |
| 2025-07-31 | I Am Big, You Are Little; I Am Right, You Are Wrong | David A. Kelly et.al. | 2507.23509 | null |
| 2025-07-31 | Causal Identification of Sufficient, Contrastive and Complete Feature Sets in Image Classification | David A Kelly et.al. | 2507.23497 | null |
| 2025-07-31 | Smart Video Capsule Endoscopy: Raw Image-Based Localization for Enhanced GI Tract Investigation | Oliver Bause et.al. | 2507.23398 | null |
| 2025-07-31 | Popov Mirror-Prox Method for Variational Inequalities | Abhishek Chakraborty et.al. | 2507.23395 | null |
| 2025-07-31 | Analysis of Hyperparameter Optimization Effects on Lightweight Deep Models for Real-Time Image Classification | Vineet Kumar Rakesh et.al. | 2507.23315 | null |
| 2025-07-30 | Vocabulary-free Fine-grained Visual Recognition via Enriched Contextually Grounded Vision-Language Model | Dmitry Demidov et.al. | 2507.23070 | null |
| 2025-07-30 | Tricks and Plug-ins for Gradient Boosting in Image Classification | Biyi Fang et.al. | 2507.22842 | null |
| 2025-07-30 | Label-free estimation of clinically relevant performance metrics under distribution shifts | Tim Flühmann et.al. | 2507.22776 | null |
| 2025-07-30 | RainbowPrompt: Diversity-Enhanced Prompt-Evolving for Continual Learning | Kiseong Hong et.al. | 2507.22553 | null |
| 2025-07-30 | LoReUn: Data Itself Implicitly Provides Cues to Improve Machine Unlearning | Xiang Li et.al. | 2507.22499 | null |
| 2025-07-30 | Visual Language Models as Zero-Shot Deepfake Detectors | Viacheslav Pirogov et.al. | 2507.22469 | null |
| 2025-07-29 | HOG-CNN: Integrating Histogram of Oriented Gradients with Convolutional Neural Networks for Retinal Image Classification | Faisal Ahmed et.al. | 2507.22274 | null |
| 2025-07-29 | LLM-based Content Classification Approach for GitHub Repositories by the README Files | Malik Uzair Mehmood et.al. | 2507.21899 | null |
| 2025-07-29 | Improving Neural Network Training using Dynamic Learning Rate Schedule for PINNs and Image Classification | D. Veerababu et.al. | 2507.21749 | null |
| 2025-07-29 | Ethical Classification of Non-Coding Contributions in Open-Source Projects via Large Language Models | Sergio Cobos et.al. | 2507.21583 | null |
| 2025-07-28 | Evaluating Deep Learning Models for African Wildlife Image Classification: From DenseNet to Vision Transformers | Lukman Jibril Aliyu et.al. | 2507.21364 | null |
| 2025-07-28 | Can human clinical rationales improve the performance and explainability of clinical text classification models? | Christoph Metzner et.al. | 2507.21302 | null |
| 2025-07-28 | Dual Guidance Semi-Supervised Action Detection | Ankit Singh et.al. | 2507.21247 | null |
| 2025-07-27 | Contrast-CAT: Contrasting Activations for Enhanced Interpretability in Transformer-based Text Classifiers | Sungmin Han et.al. | 2507.21186 | null |
| 2025-07-24 | Comparative Analysis of Vision Transformers and Convolutional Neural Networks for Medical Image Classification | Kunal Kawadkar et.al. | 2507.21156 | null |
| 2025-07-28 | Compositional Function Networks: A High-Performance Alternative to Deep Neural Networks with Built-in Interpretability | Fang Li et.al. | 2507.21004 | null |
| 2025-07-28 | Lightweight Remote Sensing Scene Classification on Edge Devices via Knowledge Distillation and Early-exit | Yang Zhao et.al. | 2507.20623 | null |
| 2025-07-28 | PhaseNAS: Language-Model Driven Architecture Search with Dynamic Phase Adaptation | Fei Kong et.al. | 2507.20592 | null |
| 2025-07-27 | L-MCAT: Unpaired Multimodal Transformer with Contrastive Attention for Label-Efficient Satellite Image Classification | Mitul Goswami et.al. | 2507.20259 | null |
| 2025-07-27 | Dual-Stream Global-Local Feature Collaborative Representation Network for Scene Classification of Mining Area | Shuqi Fan et.al. | 2507.20216 | null |
| 2025-07-26 | Improving Audio Classification by Transitioning from Zero- to Few-Shot | James Taylor et.al. | 2507.20036 | null |
| 2025-07-26 | AF-CLIP: Zero-Shot Anomaly Detection via Anomaly-Focused CLIP Adaptation | Qingqing Fang et.al. | 2507.19949 | null |
| 2025-07-26 | Causality-aligned Prompt Learning via Diffusion-based Counterfactual Generation | Xinshu Li et.al. | 2507.19882 | null |
| 2025-07-26 | FedS2R: One-Shot Federated Domain Generalization for Synthetic-to-Real Semantic Segmentation in Autonomous Driving | Tao Lian et.al. | 2507.19881 | null |
| 2025-07-26 | Debunking Optimization Myths in Federated Learning for Medical Image Classification | Youngjoon Lee et.al. | 2507.19822 | null |
| 2025-07-25 | Joint Feature and Output Distillation for Low-complexity Acoustic Scene Classification | Haowen Li et.al. | 2507.19557 | null |
| 2025-07-25 | MedSymmFlow: Bridging Generative Modeling and Classification in Medical Imaging through Symmetrical Flow Matching | Francisco Caetano et.al. | 2507.19098 | link |
| 2025-07-25 | A New One-Shot Federated Learning Framework for Medical Imaging Classification with Feature-Guided Rectified Flow and Knowledge Distillation | Yufei Ma et.al. | 2507.19045 | null |
| 2025-07-24 | The Role of Orthographic Consistency in Multilingual Embedding Models for Text Classification in Arabic-Script Languages | Abdulhady Abas Abdullah et.al. | 2507.18762 | null |
| 2025-07-24 | CatchPhrase: EXPrompt-Guided Encoder Adaptation for Audio-to-Image Generation | Hyunwoo Oh et.al. | 2507.18750 | null |
| 2025-07-23 | VGS-ATD: Robust Distributed Learning for Multi-Label Medical Image Classification Under Heterogeneous and Imbalanced Conditions | Zehui Zhao et.al. | 2507.18657 | null |
| 2025-07-24 | On the Performance of Concept Probing: The Influence of the Data (Extended Version) | Manuel de Sousa Ribeiro et.al. | 2507.18550 | null |
| 2025-07-24 | GLiNER2: An Efficient Multi-Task Information Extraction System with Schema-Driven Interface | Urchade Zaratiana et.al. | 2507.18546 | link |
| 2025-07-24 | Iwin Transformer: Hierarchical Vision Transformer using Interleaved Windows | Simin Huo et.al. | 2507.18405 | link |
| 2025-07-24 | FedSA-GCL: A Semi-Asynchronous Federated Graph Learning Framework with Personalized Aggregation and Cluster-Aware Broadcasting | Zhongzheng Yuan et.al. | 2507.18219 | null |
| 2025-07-23 | LTLZinc: a Benchmarking Framework for Continual Learning and Neuro-Symbolic Temporal Reasoning | Luca Salvatore Lorello et.al. | 2507.17482 | null |
| 2025-07-23 | Filter-And-Refine: A MLLM Based Cascade System for Industrial-Scale Video Content Moderation | Zixuan Wang et.al. | 2507.17204 | null |
| 2025-07-22 | Combining Language and Topic Models for Hierarchical Text Classification | Jaco du Toit et.al. | 2507.16490 | null |
| 2025-07-22 | The Cost of Compression: Tight Quadratic Black-Box Attacks on Sketches for $\ell_2$ Norm Estimation | Sara Ahmadian et.al. | 2507.16345 | null |
| 2025-07-22 | Cross-Modal Distillation For Widely Differing Modalities | Cairong Zhao et.al. | 2507.16296 | null |
| 2025-07-22 | MAN++: Scaling Momentum Auxiliary Network for Supervised Local Learning in Vision Tasks | Junhao Su et.al. | 2507.16279 | null |
| 2025-07-22 | Quality Text, Robust Vision: The Role of Language in Enhancing Visual Robustness of Vision-Language Models | Futa Waseda et.al. | 2507.16257 | null |
| 2025-07-21 | Stop-band Energy Constraint for Orthogonal Tunable Wavelet Units in Convolutional Neural Networks for Computer Vision problems | An D. Le et.al. | 2507.16114 | null |
| 2025-07-21 | Optimizing Canaries for Privacy Auditing with Metagradient Descent | Matteo Boglioni et.al. | 2507.15836 | null |
| 2025-07-21 | GeMix: Conditional GAN-Based Mixup for Improved Medical Image Augmentation | Hugo Carlesso et.al. | 2507.15577 | null |
| 2025-07-21 | Smart Eyes for Silent Threats: VLMs and In-Context Learning for THz Imaging | Nicolas Poggi et.al. | 2507.15576 | null |
| 2025-07-21 | An Investigation of Test-time Adaptation for Audio Classification under Background Noise | Weichuang Shao et.al. | 2507.15523 | null |
| 2025-07-20 | Polymorph: Energy-Efficient Multi-Label Classification for Video Streams on Embedded Devices | Saeid Ghafouri et.al. | 2507.14959 | null |
| 2025-07-20 | Probabilistic smooth attention for deep multiple instance learning in medical imaging | Francisco M. Castro-Macías et.al. | 2507.14932 | null |
| 2025-07-20 | Semantic-Aware Representation Learning for Multi-label Image Classification | Ren-Dong Xie et.al. | 2507.14918 | null |
| 2025-07-20 | The Tsetlin Machine Goes Deep: Logical Learning and Reasoning With Graphs | Ole-Christoffer Granmo et.al. | 2507.14874 | null |
| 2025-07-19 | Performance comparison of medical image classification systems using TensorFlow Keras, PyTorch, and JAX | Merjem Bećirović et.al. | 2507.14587 | null |
| 2025-07-18 | Classification of Histopathology Slides with Persistence Homology Convolutions | Shrunal Pothagoni et.al. | 2507.14378 | null |
| 2025-07-18 | Quantum Boltzmann Machines using Parallel Annealing for Medical Image Classification | Daniëlle Schuman et.al. | 2507.14116 | null |
| 2025-07-18 | Foundation Models as Class-Incremental Learners for Dermatological Image Classification | Mohamed Elkhayat et.al. | 2507.14050 | null |
| 2025-07-18 | Evaluating the Effectiveness of Cost-Efficient Large Language Models in Benchmark Biomedical Tasks | Israt Jahan et.al. | 2507.14045 | null |
| 2025-07-18 | Automatic Classification and Segmentation of Tunnel Cracks Based on Deep Learning and Visual Explanations | Yong Feng et.al. | 2507.14010 | null |
| 2025-07-18 | Feature Engineering is Not Dead: Reviving Classical Machine Learning with Entropy, HOG, and LBP Feature Fusion for Image Classification | Abhijit Sen et.al. | 2507.13772 | null |
| 2025-07-18 | Adversarial Training Improves Generalization Under Distribution Shifts in Bioacoustics | René Heinrich et.al. | 2507.13727 | null |
| 2025-07-18 | Enhanced image classification via hybridizing quantum dynamics with classical neural networks | Ruiyang Zhou et.al. | 2507.13587 | null |
| 2025-07-17 | Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy | Yiting Yang et.al. | 2507.13260 | null |
| 2025-07-17 | Adversarial attacks to image classification systems using evolutionary algorithms | Sergio Nesmachnow et.al. | 2507.13136 | null |
| 2025-07-17 | MUPAX: Multidimensional Problem Agnostic eXplainable AI | Vincenzo Dentamaro et.al. | 2507.13090 | null |
| 2025-07-17 | Making Language Model a Hierarchical Classifier and Generator | Yihong Wang et.al. | 2507.12930 | null |
| 2025-07-17 | Federated Learning for Commercial Image Sources | Shreyansh Jain et.al. | 2507.12903 | null |
| 2025-07-17 | LanePerf: a Performance Estimation Framework for Lane Detection | Yin Wu et.al. | 2507.12894 | null |
| 2025-07-17 | Feature-Enhanced TResNet for Fine-Grained Food Image Classification | Lulu Liu et.al. | 2507.12828 | null |
| 2025-07-17 | Task-Specific Audio Coding for Machines: Machine-Learned Latent Features Are Codes for That Machine | Anastasia Kuznetsova et.al. | 2507.12701 | null |
| 2025-07-16 | Comparative Analysis of CNN Performance in Keras, PyTorch and JAX on PathMNIST | Anida Nezović et.al. | 2507.12248 | null |
| 2025-07-16 | PRISM: Distributed Inference for Foundation Models at Edge | Muhammad Azlan Qazi et.al. | 2507.12145 | null |
| 2025-07-16 | Effective Fine-Tuning of Vision Transformers with Low-Rank Adaptation for Privacy-Preserving Image Classification | Haiwei Lin et.al. | 2507.11943 | null |
| 2025-07-16 | Spatial Frequency Modulation for Semantic Segmentation | Linwei Chen et.al. | 2507.11893 | link |
| 2025-07-16 | ProtoConNet: Prototypical Augmentation and Alignment for Open-Set Few-Shot Image Classification | Kexuan Shi et.al. | 2507.11845 | null |
| 2025-07-15 | Quantum Adaptive Excitation Network with Variational Quantum Circuits for Channel Attention | Yu-Chao Hsu et.al. | 2507.11217 | null |
| 2025-07-15 | Hashed Watermark as a Filter: Defeating Forging and Overwriting Attacks in Weight-based Neural Network Watermarking | Yuan Yao et.al. | 2507.11137 | link |
| 2025-07-15 | Focus on Texture: Rethinking Pre-training in Masked Autoencoders for Medical Image Classification | Chetan Madan et.al. | 2507.10869 | null |
| 2025-07-14 | AudioMAE++: learning better masked audio representations with SwiGLU FFNs | Sarthak Yadav et.al. | 2507.10464 | null |
| 2025-07-14 | Improving Remote Sensing Classification using Topological Data Analysis and Convolutional Neural Networks | Aaryam Sharma et.al. | 2507.10381 | null |
| 2025-07-14 | FTCFormer: Fuzzy Token Clustering Transformer for Image Classification | Muyi Bao et.al. | 2507.10283 | null |
| 2025-07-14 | Transferring Styles for Reduced Texture Bias and Improved Robustness in Semantic Segmentation Networks | Ben Hamscher et.al. | 2507.10239 | null |
| 2025-07-14 | MEDebiaser: A Human-AI Feedback System for Mitigating Bias in Multi-label Medical Image Classification | Shaohan Shi et.al. | 2507.10044 | null |
| 2025-07-14 | Effects of structural properties of neural networks on machine learning performance | Yash Arya et.al. | 2507.10005 | null |
| 2025-07-14 | Hierarchical Job Classification with Similarity Graph Integration | Md Ahsanul Kabir et.al. | 2507.09949 | null |
| 2025-07-13 | Post-Training Quantization of Generative and Discriminative LSTM Text Classifiers: A Study of Calibration, Class Balance, and Robustness | Md Mushfiqur Rahaman et.al. | 2507.09687 | null |
| 2025-07-13 | MLoRQ: Bridging Low-Rank and Quantization for Transformer Compression | Ofir Gordon et.al. | 2507.09616 | null |
| 2025-07-13 | SDTN and TRN: Adaptive Spectral-Spatial Feature Extraction for Hyperspectral Image Classification | Fuyin Ye et.al. | 2507.09492 | null |
| 2025-07-11 | A Hybrid Multi-Well Hopfield-CNN with Feature Extraction and K-Means for MNIST Classification | Ahmed Farooq et.al. | 2507.08766 | null |
| 2025-07-11 | DatasetAgent: A Novel Multi-Agent System for Auto-Constructing Datasets from Real-World Images | Haoran Sun et.al. | 2507.08648 | null |
| 2025-07-11 | Onboard Neuromorphic Split Computing via Optical Links for LEO Remote Sensing | Zihang Song et.al. | 2507.08490 | null |
| 2025-07-11 | Interpretability-Aware Pruning for Efficient Medical Image Analysis | Nikita Malik et.al. | 2507.08330 | null |
| 2025-07-11 | Admissibility of Stein Shrinkage for Batch Normalization in the Presence of Adversarial Attacks | Sofia Ivolgina et.al. | 2507.08261 | null |
| 2025-07-10 | A Hybrid Multilayer Extreme Learning Machine for Image Classification with an Application to Quadcopters | Rolando A. Hernandez-Hernandez et.al. | 2507.08047 | null |
| 2025-07-10 | Where are we with calibration under dataset shift in image classification? | Mélanie Roschewitz et.al. | 2507.07780 | null |
| 2025-07-10 | TRIX- Trading Adversarial Fairness via Mixed Adversarial Training | Tejaswini Medi et.al. | 2507.07768 | null |
| 2025-07-10 | OPC: One-Point-Contraction Unlearning Toward Deep Feature Forgetting | Jaeheun Jung et.al. | 2507.07754 | null |
| 2025-07-10 | Temporal Unlearnable Examples: Preventing Personal Video Data from Unauthorized Exploitation by Object Tracking | Qiangqiang Wu et.al. | 2507.07483 | null |
| 2025-07-10 | EPIC: Efficient Prompt Interaction for Text-Image Classification | Xinyao Yu et.al. | 2507.07415 | null |
| 2025-07-10 | GNN-CNN: An Efficient Hybrid Model of Convolutional and Graph Neural Networks for Text Representation | Fardin Rastakhiz et.al. | 2507.07414 | null |
| 2025-07-09 | GNN-ViTCap: GNN-Enhanced Multiple Instance Learning with Vision Transformers for Whole Slide Image Classification and Captioning | S M Taslim Uddin Raju et.al. | 2507.07006 | null |
| 2025-07-09 | Unifying Re-Identification, Attribute Inference, and Data Reconstruction Risks in Differential Privacy | Bogdan Kulynych et.al. | 2507.06969 | null |
| 2025-07-09 | Steps Adaptive Decay DPSGD: Enhancing Performance on Imbalanced Datasets with Differential Privacy with HAM10000 | Xiaobo Huang et.al. | 2507.06619 | null |
| 2025-07-08 | Capsule-ConvKAN: A Hybrid Neural Approach to Medical Image Classification | Laura Pituková et.al. | 2507.06417 | null |
| 2025-07-08 | SoftReMish: A Novel Activation Function for Enhanced Convolutional Neural Networks for Visual Recognition Performance | Mustafa Bayram Gücen et.al. | 2507.06148 | null |
| 2025-07-08 | On the Effectiveness of Methods and Metrics for Explainable AI in Remote Sensing Image Scene Classification | Jonas Klotz et.al. | 2507.05916 | null |
| 2025-07-08 | Knowledge-guided Complex Diffusion Model for PolSAR Image Classification in Contourlet Domain | Junfei Shi et.al. | 2507.05666 | null |
| 2025-07-08 | Model-free Optical Processors using In Situ Reinforcement Learning with Proximal Policy Optimization | Yuhang Li et.al. | 2507.05583 | null |
| 2025-07-07 | Experimental data re-uploading with provable enhanced learning capabilities | Martin F. X. Mauser et.al. | 2507.05120 | null |
| 2025-07-07 | Verified Language Processing with Hybrid Explainability: A Technical Report | Oliver Robert Fox et.al. | 2507.05017 | null |
| 2025-07-07 | Co-DETECT: Collaborative Discovery of Edge Cases in Text Classification | Chenfei Xiong et.al. | 2507.05010 | null |
| 2025-07-07 | Bridging KAN and MLP: MJKAN, a Hybrid Architecture with Both Efficiency and Expressiveness | Hanseon Joo et.al. | 2507.04690 | null |
| 2025-07-07 | Recovering Plasticity of Neural Networks via Soft Weight Rescaling | Seungwon Oh et.al. | 2507.04683 | null |
| 2025-07-07 | VLM2Vec-V2: Advancing Multimodal Embedding for Videos, Images, and Visual Documents | Rui Meng et.al. | 2507.04590 | link |
| 2025-07-06 | MVNet: Hyperspectral Remote Sensing Image Classification Based on Hybrid Mamba-Transformer Vision Backbone Architecture | Guandong Li et.al. | 2507.04409 | null |
| 2025-07-06 | Transferring Visual Explainability of Self-Explaining Models through Task Arithmetic | Yuya Yoshikawa et.al. | 2507.04380 | null |
| 2025-07-06 | Efficient Training of Deep Networks using Guided Spectral Data Selection: A Step Toward Learning What You Need | Mohammadreza Sharifi et.al. | 2507.04269 | null |
| 2025-07-06 | Siberian radioheliograph image classification using ensemble of CLIP, EfficientNet and CatBoost models | Yaroslav Egorov et.al. | 2507.04211 | null |
| 2025-07-03 | Linear Attention with Global Context: A Multipole Attention Mechanism for Vision and Physics | Alex Colagrande et.al. | 2507.02748 | link |
| 2025-07-03 | ASDA: Audio Spectrogram Differential Attention Mechanism for Self-Supervised Representation Learning | Junyu Wang et.al. | 2507.02666 | null |
| 2025-07-03 | MedFormer: Hierarchical Medical Vision Transformer with Content-Aware Dual Sparse Selection Attention | Zunhui Xia et.al. | 2507.02488 | null |
| 2025-07-03 | F^2TTA: Free-Form Test-Time Adaptation on Cross-Domain Medical Image Classification via Image-Level Disentangled Prompt Tuning | Wei Li et.al. | 2507.02437 | null |
| 2025-07-03 | Cross-domain Hyperspectral Image Classification based on Bi-directional Domain Adaptation | Yuxiang Zhang et.al. | 2507.02268 | null |
| 2025-07-03 | High-Fidelity Differential-information Driven Binary Vision Transformer | Tian Gao et.al. | 2507.02222 | null |
| 2025-07-02 | Selective Feature Re-Encoded Quantum Convolutional Neural Network with Joint Optimization for Image Classification | Shaswata Mahernob Sarkar et.al. | 2507.02086 | null |
| 2025-07-02 | How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks | Rahul Ramachandran et.al. | 2507.01955 | link |
| 2025-07-02 | evMLP: An Efficient Event-Driven MLP Architecture for Vision | Zhentan Zheng et.al. | 2507.01927 | link |
| 2025-07-02 | mGRADE: Minimal Recurrent Gating Meets Delay Convolutions for Lightweight Sequence Modeling | Tristan Torchet et.al. | 2507.01829 | null |
| 2025-07-02 | Are Vision Transformer Representations Semantically Meaningful? A Case Study in Medical Imaging | Montasir Shams et.al. | 2507.01788 | null |
| 2025-07-02 | Learning from Random Subspace Exploration: Generalized Test-Time Augmentation with Self-supervised Distillation | Andrei Jelea et.al. | 2507.01347 | null |
| 2025-07-01 | Biorthogonal Tunable Wavelet Unit with Lifting Scheme in Convolutional Neural Network | An Le et.al. | 2507.00739 | null |
| 2025-07-01 | Rectifying Magnitude Neglect in Linear Attention | Qihang Fan et.al. | 2507.00698 | link |
| 2025-07-01 | Few-shot Classification as Multi-instance Verification: Effective Backbone-agnostic Transfer across Domains | Xin Xu et.al. | 2507.00401 | null |
| 2025-06-30 | Two-Stage Reasoning-Infused Learning: Improving Classification with LLM-Generated Reasoning | Mads Henrichsen et.al. | 2507.00214 | null |
| 2025-06-30 | Toward Simple and Robust Contrastive Explanations for Image Classification by Leveraging Instance Similarity and Concept Relevance | Yuliia Kaidashova et.al. | 2506.23975 | null |
| 2025-06-30 | Unveiling Decision-Making in LLMs for Text Classification: Extraction of influential and interpretable concepts with Sparse Autoencoders | Mathis Le Bail et.al. | 2506.23951 | null |
| 2025-06-30 | Controllable Reference-Based Real-World Remote Sensing Image Super-Resolution with Generative Diffusion Priors | Ce Wang et.al. | 2506.23801 | null |
| 2025-07-01 | Towards the Training of Deeper Predictive Coding Neural Networks | Chang Qi et.al. | 2506.23800 | null |
| 2025-06-30 | A Unified Framework for Stealthy Adversarial Generation via Latent Optimization and Transferability Enhancement | Gaozheng Pei et.al. | 2506.23676 | null |
| 2025-06-30 | Robustness of Misinformation Classification Systems to Adversarial Examples Through BeamAttack | Arnisa Fazla et.al. | 2506.23661 | null |
| 2025-06-30 | AdFair-CLIP: Adversarial Fair Contrastive Language-Image Pre-training for Chest X-rays | Chenlang Yi et.al. | 2506.23467 | null |
| 2025-06-29 | Federated Breast Cancer Detection Enhanced by Synthetic Ultrasound Image Augmentation | Hongyi Pan et.al. | 2506.23334 | null |
| 2025-07-01 | Exposing and Mitigating Calibration Biases and Demographic Unfairness in MLLM Few-Shot In-Context Learning for Medical Image Classification | Xing Shen et.al. | 2506.23298 | null |
| 2025-06-29 | Aggregating Local Saliency Maps for Semi-Global Explainable Image Classification | James Hinns et.al. | 2506.23247 | null |
| 2025-06-27 | Boosting Classification with Quantum-Inspired Augmentations | Matthias Tschöpe et.al. | 2506.22241 | null |
| 2025-06-27 | Remote Sensing Large Vision-Language Model: Semantic-augmented Multi-level Alignment and Semantic-aware Expert Modeling | Sungjune Park et.al. | 2506.21863 | null |
| 2025-06-27 | LinguaSynth: Heterogeneous Linguistic Signals for News Classification | Duo Zhang et.al. | 2506.21848 | null |
| 2025-06-25 | Disentangled representations of microscopy images | Jacopo Dapueto et.al. | 2506.20649 | null |
| 2025-06-25 | Counterfactual Influence as a Distributional Quantity | Matthieu Meeus et.al. | 2506.20481 | null |
| 2025-06-25 | Practical insights on the effect of different encodings, ansätze and measurements in quantum and hybrid convolutional neural networks | Jesús Lozano-Cruz et.al. | 2506.20355 | link |
| 2025-06-25 | Learning Moderately Input-Sensitive Functions: A Case Study in QR Code Decoding | Kazuki Yoda et.al. | 2506.20305 | null |
| 2025-06-25 | Hierarchical Mask-Enhanced Dual Reconstruction Network for Few-Shot Fine-Grained Image Classification | Ning Luo et.al. | 2506.20263 | null |
| 2025-06-25 | Perspectives in Play: A Multi-Perspective Approach for More Inclusive NLP Systems | Benedetta Muscato et.al. | 2506.20209 | null |
| 2025-06-26 | Towards Scalable and Generalizable Earth Observation Data Mining via Foundation Model Composition | Man Duc Chuc et.al. | 2506.20174 | null |
| 2025-06-24 | Neuromorphic Wireless Split Computing with Resonate-and-Fire Neurons | Dengyu Wu et.al. | 2506.20015 | null |
| 2025-06-24 | Ensemble nonlinear optical learner by electrically tunable linear scattering | Tunan Xia et.al. | 2506.19976 | null |
| 2025-06-25 | One Prototype Is Enough: Single-Prototype Activation for Interpretable Image Classification | Yitao Peng et.al. | 2506.19808 | null |
| 2025-06-24 | MambaOutRS: A Hybrid CNN-Fourier Architecture for Remote Sensing Image Classification | Minjong Cheon et.al. | 2506.19561 | null |
| 2025-06-24 | Iterative Quantum Feature Maps | Nasa Matsumoto et.al. | 2506.19461 | null |
| 2025-06-24 | Comparative Performance of Finetuned ImageNet Pre-trained Models for Electronic Component Classification | Yidi Shao et.al. | 2506.19330 | null |
| 2025-06-23 | LKA: Large Kernel Adapter for Enhanced Medical Image Classification | Ziquan Zhu et.al. | 2506.19118 | null |
| 2025-06-23 | Sensitivity Analysis of Image Classification Models using Generalized Polynomial Chaos | Lukas Bahr et.al. | 2506.18751 | null |
| 2025-06-23 | SIM-Net: A Multimodal Fusion Network Using Inferred 3D Object Shape Point Clouds from RGB Images for 2D Classification | Youcef Sklab et.al. | 2506.18683 | null |
| 2025-06-23 | SpaNN: Detecting Multiple Adversarial Patches on CNNs by Spanning Saliency Thresholds | Mauricio Byrd Victorica et.al. | 2506.18591 | null |
| 2025-06-23 | Geometry-aware Distance Measure for Diverse Hierarchical Structures in Hyperbolic Spaces | Pengxiang Li et.al. | 2506.18533 | null |
| 2025-06-23 | A Set-to-Set Distance Measure in Hyperbolic Space | Pengxiang Li et.al. | 2506.18529 | null |
| 2025-06-23 | Fully Few-shot Class-incremental Audio Classification Using Multi-level Embedding Extractor and Ridge Regression Classifier | Yongjie Si et.al. | 2506.18406 | null |
| 2025-06-23 | Open Set Recognition for Endoscopic Image Classification: A Deep Learning Approach on the Kvasir Dataset | Kasra Moazzami et.al. | 2506.18284 | null |
| 2025-06-22 | Pitfalls of Conformal Predictions for Medical Image Classification | Hendrik Mehrtens et.al. | 2506.18162 | null |
| 2025-06-22 | HE-LRM: Encrypted Deep Learning Recommendation Models using Fully Homomorphic Encryption | Karthik Garimella et.al. | 2506.18150 | null |
| 2025-06-22 | Training-free Test-time Improvement for Explainable Medical Image Classification | Hangzhou He et.al. | 2506.18070 | link |
| 2025-06-20 | Robust Training with Data Augmentation for Medical Imaging Classification | Josué Martínez-Martínez et.al. | 2506.17133 | null |
| 2025-06-20 | Acquiring and Accumulating Knowledge from Diverse Datasets for Multi-label Driving Scene Classification | Ke Li et.al. | 2506.17101 | null |
| 2025-06-20 | From Concepts to Components: Concept-Agnostic Attention Module Discovery in Transformers | Jingtong Su et.al. | 2506.17052 | null |
| 2025-06-20 | With Limited Data for Multimodal Alignment, Let the STRUCTURE Guide You | Fabian Gröger et.al. | 2506.16895 | null |
| 2025-06-20 | Transition of AI Models in dependence of noise | Thomas Seidler et.al. | 2506.16715 | null |
| 2025-06-19 | Efficient Transformations in Deep Learning Convolutional Neural Networks | Berk Yilmaz et.al. | 2506.16418 | null |
| 2025-06-19 | SHREC and PHEONA: Using Large Language Models to Advance Next-Generation Computational Phenotyping | Sarah Pungitore et.al. | 2506.16359 | null |
| 2025-06-19 | Polyline Path Masked Attention for Vision Transformer | Zhongchen Zhao et.al. | 2506.15940 | link |
| 2025-06-18 | FedWSIDD: Federated Whole Slide Image Classification via Dataset Distillation | Haolong Jin et.al. | 2506.15365 | link |
| 2025-06-18 | Enhancing One-run Privacy Auditing with Quantile Regression-Based Membership Inference | Terrance Liu et.al. | 2506.15349 | null |
| 2025-06-19 | OpenPath: Open-Set Active Learning for Pathology Image Classification via Pre-trained Vision-Language Models | Lanfeng Zhong et.al. | 2506.15318 | null |
| 2025-06-18 | J3DAI: A tiny DNN-Based Edge AI Accelerator for 3D-Stacked CMOS Image Sensor | Benoit Tain et.al. | 2506.15316 | null |
| 2025-06-18 | Domain Adaptation for Image Classification of Defects in Semiconductor Manufacturing | Adrian Poniatowski et.al. | 2506.15260 | null |
| 2025-06-18 | A Comparative Study of Task Adaptation Techniques of Large Language Models for Identifying Sustainable Development Goals | Andrea Cadeddu et.al. | 2506.15208 | null |
| 2025-06-18 | Identifying social isolation themes in NVDRS text narratives using topic modeling and text-classification methods | Drew Walker et.al. | 2506.15030 | null |
| 2025-06-17 | DDS-NAS: Dynamic Data Selection within Neural Architecture Search via On-line Hard Example Mining applied to Image Classification | Matt Poyser et.al. | 2506.14667 | null |
| 2025-06-17 | Train Once, Forget Precisely: Anchored Optimization for Efficient Post-Hoc Unlearning | Prabhav Sanga et.al. | 2506.14515 | null |
| 2025-06-17 | Compositional Attribute Imbalance in Vision Datasets | Jiayi Chen et.al. | 2506.14418 | null |
| 2025-06-17 | One-Shot Neural Architecture Search with Network Similarity Directed Initialization for Pathological Image Classification | Renao Yan et.al. | 2506.14176 | null |
| 2025-06-17 | SeqPE: Transformer with Sequential Position Encoding | Huayang Li et.al. | 2506.13277 | link |
| 2025-06-15 | Intriguing Frequency Interpretation of Adversarial Robustness for CNNs and ViTs | Lu Chen et.al. | 2506.12875 | null |
| 2025-06-15 | Medical Argument Mining: Exploitation of Scarce Data Using NLI Systems | Maitane Urruela et.al. | 2506.12823 | null |
| 2025-06-15 | Cross-architecture universal feature coding via distribution alignment | Changsheng Gao et.al. | 2506.12737 | null |
| 2025-06-15 | Unsupervised Contrastive Learning Using Out-Of-Distribution Data for Long-Tailed Dataset | Cuong Manh Hoang et.al. | 2506.12698 | null |
| 2025-06-15 | Evaluating Cell Type Inference in Vision Language Models Under Varying Visual Context | Samarth Singhal et.al. | 2506.12683 | null |
| 2025-06-14 | OscNet v1.5: Energy Efficient Hopfield Network on CMOS Oscillators for Image Classification | Wenxiao Cai et.al. | 2506.12610 | null |
| 2025-06-14 | DejaVid: Encoder-Agnostic Learned Temporal Matching for Video Classification | Darryl Ho et.al. | 2506.12585 | null |
| 2025-06-14 | MVP-CBM: Multi-layer Visual Preference-enhanced Concept Bottleneck Model for Explainable Medical Image Classification | Chunjiang Wang et.al. | 2506.12568 | null |
| 2025-06-14 | PLD: A Choice-Theoretic List-Wise Knowledge Distillation | Ejafa Bassam et.al. | 2506.12542 | null |
| 2025-06-13 | GeistBERT: Breathing Life into German NLP | Raphael Scheible-Schmitt et.al. | 2506.11903 | null |
| 2025-06-13 | Evaluating Fairness and Mitigating Bias in Machine Learning: A Novel Technique using Tensor Data and Bayesian Regression | Kuniko Paxton et.al. | 2506.11627 | null |
| 2025-06-13 | Machine Unlearning for Robust DNNs: Attribution-Guided Partitioning and Neuron Pruning in Noisy Environments | Deliang Jin et.al. | 2506.11615 | null |
| 2025-06-13 | Black-Box Edge AI Model Selection with Conformal Latency and Accuracy Guarantees | Anders E. Kalør et.al. | 2506.11391 | null |
| 2025-06-12 | SNR and Resource Adaptive Deep JSCC for Distributed IoT Image Classification | Ali Waqas et.al. | 2506.10699 | null |
| 2025-06-13 | PiPViT: Patch-based Visual Interpretable Prototypes for Retinal Image Analysis | Marzieh Oghbaie et.al. | 2506.10669 | link |
| 2025-06-12 | Boosting Adversarial Transferability for Hyperspectral Image Classification Using 3D Structure-invariant Transformation and Intermediate Feature Distance | Chun Liu et.al. | 2506.10459 | null |
| 2025-06-12 | Can We Infer Confidential Properties of Training Data from LLMs? | Penguin Huang et.al. | 2506.10364 | null |
| 2025-06-12 | Flick: Few Labels Text Classification using K-Aware Intermediate Learning in Multi-Task Low-Resource Languages | Ali Almutairi et.al. | 2506.10292 | null |
| 2025-06-11 | FedMLAC: Mutual Learning Driven Heterogeneous Federated Audio Classification | Jun Bai et.al. | 2506.10207 | null |
| 2025-06-11 | Detecção da Psoríase Utilizando Visão Computacional: Uma Abordagem Comparativa Entre CNNs e Vision Transformers | Natanael Lucena et.al. | 2506.10119 | null |
| 2025-06-11 | DeepTraverse: A Depth-First Search Inspired Network for Algorithmic Visual Understanding | Bin Guo et.al. | 2506.10084 | null |
| 2025-06-11 | Evidential Deep Learning with Spectral-Spatial Uncertainty Disentanglement for Open-Set Hyperspectral Domain Generalization | Amirreza Khoshbakht et.al. | 2506.09460 | null |
| 2025-06-11 | MSSDF: Modality-Shared Self-supervised Distillation for High-Resolution Multi-modal Remote Sensing Image Learning | Tong Wang et.al. | 2506.09327 | null |
| 2025-06-10 | ScalableHD: Scalable and High-Throughput Hyperdimensional Computing Inference on Multi-Core CPUs | Dhruv Parikh et.al. | 2506.09282 | null |
| 2025-06-10 | Hyperbolic Dual Feature Augmentation for Open-Environment | Peilin Yu et.al. | 2506.08906 | null |
| 2025-06-10 | Normalized Radon Cumulative Distribution Transforms for Invariance and Robustness in Optimal Transport Based Image Classification | Matthias Beckmann et.al. | 2506.08761 | null |
| 2025-06-12 | InceptionMamba: An Efficient Hybrid Network with Large Band Convolution and Bottleneck Mamba | Yuhang Wang et.al. | 2506.08735 | null |
| 2025-06-10 | Biologically Inspired Deep Learning Approaches for Fetal Ultrasound Image Classification | Rinat Prochii et.al. | 2506.08623 | null |
| 2025-06-10 | mSTEB: Massively Multilingual Evaluation of LLMs on Speech and Text Tasks | Luel Hagos Beyene et.al. | 2506.08400 | null |
| 2025-06-10 | An Adaptive Method Stabilizing Activations for Enhanced Generalization | Hyunseok Seung et.al. | 2506.08353 | null |
| 2025-06-11 | Hyperspectral Image Classification via Transformer-based Spectral-Spatial Attention Decoupling and Adaptive Gating | Guandong Li et.al. | 2506.08324 | null |
| 2025-06-09 | TokenBreak: Bypassing Text Classification Models Through Token Manipulation | Kasimir Schulz et.al. | 2506.07948 | null |
| 2025-06-09 | MultiMatch: Multihead Consistency Regularization Matching for Semi-Supervised Text Classification | Iustin Sirbu et.al. | 2506.07801 | null |
| 2025-06-09 | Improving Memory Efficiency for Training KANs via Meta Learning | Zhangchi Zhao et.al. | 2506.07549 | null |
| 2025-06-09 | Mind the Gap: Removing the Discretization Gap in Differentiable Logic Gate Networks | Shakir Yousefi et.al. | 2506.07500 | null |
| 2025-06-08 | Mobility-Aware Asynchronous Federated Learning with Dynamic Sparsification | Jintao Yan et.al. | 2506.07328 | null |
| 2025-06-08 | A Stable Whitening Optimizer for Efficient Neural Network Training | Kevin Frans et.al. | 2506.07254 | null |
| 2025-06-08 | Hierarchical Feature-level Reverse Propagation for Post-Training Neural Networks | Ni Ding et.al. | 2506.07188 | null |
| 2025-06-08 | CTDGSI: A comprehensive exploitation of instance selection methods for automatic text classification. VII Concurso de Teses, Dissertações e Trabalhos de Graduação em SI – XXI Simpósio Brasileiro de Sistemas de Informação | Washington Cunha et.al. | 2506.07169 | null |
| 2025-06-08 | pFedSOP : Accelerating Training Of Personalized Federated Learning Using Second-Order Optimization | Mrinmay Sen et.al. | 2506.07159 | null |
| 2025-06-07 | Rewriting the Budget: A General Framework for Black-Box Attacks Under Cost Asymmetry | Mahdi Salmani et.al. | 2506.06933 | null |
| 2025-06-06 | Eigenspectrum Analysis of Neural Networks without Aspect Ratio Bias | Yuanzhe Hu et.al. | 2506.06280 | null |
| 2025-06-06 | FPDANet: A Multi-Section Classification Model for Intelligent Screening of Fetal Ultrasound | Minglang Chen et.al. | 2506.06054 | null |
| 2025-06-06 | Enhancing Orthopox Image Classification Using Hybrid Machine Learning and Deep Learning Models | Alejandro Puente-Castro et.al. | 2506.06007 | null |
| 2025-06-06 | LTG at SemEval-2025 Task 10: Optimizing Context for Classification of Narrative Roles | Egil Rønningstad et.al. | 2506.05976 | null |
| 2025-06-06 | Integer Binary-Range Alignment Neuron for Spiking Neural Networks | Binghao Ye et.al. | 2506.05679 | null |
| 2025-06-05 | FRAME: Pre-Training Video Feature Representations via Anticipation and Memory | Sethuraman TV et.al. | 2506.05543 | null |
| 2025-06-05 | Spectral Graph Neural Networks are Incomplete on Graphs with a Simple Spectrum | Snir Hordan et.al. | 2506.05530 | null |
| 2025-06-05 | Robustness Evaluation for Video Models with Reinforcement Learning | Ashwin Ramesh Babu et.al. | 2506.05431 | null |
| 2025-06-05 | Interpretable Few-Shot Image Classification via Prototypical Concept-Guided Mixture of LoRA Experts | Zhong Ji et.al. | 2506.04673 | null |
| 2025-06-04 | Deep Learning for Absorption-Image Analysis | Jacob Morrey et.al. | 2506.04517 | null |
| 2025-06-04 | KOALA++: Efficient Kalman-Based Optimization of Neural Networks with Gradient-Covariance Products | Zixuan Xia et.al. | 2506.04432 | null |
| 2025-06-04 | Benchmarking Time-localized Explanations for Audio Classification Models | Cecilia Bolaños et.al. | 2506.04391 | null |
| 2025-06-04 | Hierarchical Text Classification Using Contrastive Learning Informed Path Guided Hierarchy | Neeraj Agrawal et.al. | 2506.04381 | null |
| 2025-06-04 | Recent Advances in Medical Image Classification | Loan Dao et.al. | 2506.04129 | null |
| 2025-06-04 | Prompt Candidates, then Distill: A Teacher-Student Framework for LLM-driven Data Annotation | Mingxuan Xia et.al. | 2506.03857 | link |
| 2025-06-04 | RhoDARTS: Differentiable Quantum Architecture Search with Density Matrix Simulations | Swagat Kumar et.al. | 2506.03697 | null |
| 2025-06-04 | Directional Non-Commutative Monoidal Embeddings for MNIST | Mahesh Godavarti et.al. | 2506.03472 | null |
| 2025-06-03 | RoNFA: Robust Neural Field-based Approach for Few-Shot Image Classification with Noisy Labels | Nan Xiang et.al. | 2506.03461 | null |
| 2025-06-02 | Quantifying task-relevant representational similarity using decision variable correlation | Yu et.al. | 2506.02164 | null |
| 2025-06-02 | Towards Better Generalization and Interpretability in Unsupervised Concept-Based Models | Francesco De Santis et.al. | 2506.02092 | null |
| 2025-06-02 | OD3: Optimization-free Dataset Distillation for Object Detection | Salwa K. Al Khatib et.al. | 2506.01942 | link |
| 2025-06-02 | Generalized Gradient Norm Clipping & Non-Euclidean $(L_0,L_1)$-Smoothness | Thomas Pethick et.al. | 2506.01913 | null |
| 2025-06-02 | Beyond Static Responses: Multi-Agent LLM Systems as a New Paradigm for Social Science Research | Jennifer Haase et.al. | 2506.01839 | null |
| 2025-06-02 | mdok of KInIT: Robustly Fine-tuned LLM for Binary and Multiclass AI-Generated Text Detection | Dominik Macko et.al. | 2506.01702 | null |
| 2025-06-02 | Data Pruning by Information Maximization | Haoru Tan et.al. | 2506.01701 | null |
| 2025-06-02 | Domain Lexical Knowledge-based Word Embedding Learning for Text Classification under Small Data | Zixiao Zhu et.al. | 2506.01621 | null |
| 2025-06-02 | Speed-up of Vision Transformer Models by Attention-aware Token Filtering | Takahiro Naruko et.al. | 2506.01519 | null |
| 2025-06-02 | A Novel Context-Adaptive Fusion of Shadow and Highlight Regions for Efficient Sonar Image Classification | Kamal Basha S et.al. | 2506.01445 | null |
| 2025-05-30 | Optimal Weighted Convolution for Classification and Denosing | Simone Cammarasana et.al. | 2505.24558 | link |
| 2025-05-30 | SASP: Strip-Aware Spatial Perception for Fine-Grained Bird Image Classification | Zheng Wang et.al. | 2505.24380 | null |
| 2025-05-30 | Spatiotemporal Analysis of Forest Machine Operations Using 3D Video Classification | Maciej Wielgosz et.al. | 2505.24375 | null |
| 2025-05-30 | GeoVision Labeler: Zero-Shot Geospatial Classification with Vision and Language Models | Gilles Quentin Hacheme et.al. | 2505.24340 | null |
| 2025-05-30 | Provably Improving Generalization of Few-Shot Models with Synthetic Data | Lan-Cuong Nguyen et.al. | 2505.24190 | null |
| 2025-05-30 | FeatureSense: Protecting Speaker Attributes in Always-On Audio Sensing System | Bhawana Chhaglani et.al. | 2505.24115 | null |
| 2025-05-30 | Proxy-FDA: Proxy-based Feature Distribution Alignment for Fine-tuning Vision Foundation Models without Forgetting | Chen Huang et.al. | 2505.24088 | null |
| 2025-05-29 | BIRD: Behavior Induction via Representation-structure Distillation | Galen Pogoncheff et.al. | 2505.23933 | null |
| 2025-05-29 | Boosting Domain Incremental Learning: Selecting the Optimal Parameters is All You Need | Qiang Wang et.al. | 2505.23744 | link |
| 2025-05-29 | Spectrotemporal Modulation: Efficient and Interpretable Feature Representation for Classifying Speech, Music, and Environmental Sounds | Andrew Chang et.al. | 2505.23509 | link |
| 2025-05-29 | MCFNet: A Multimodal Collaborative Fusion Network for Fine-Grained Semantic Classification | Yang Qiao et.al. | 2505.23365 | null |
| 2025-05-29 | DSAGL: Dual-Stream Attention-Guided Learning for Weakly Supervised Whole Slide Image Classification | Daoxi Cao et.al. | 2505.23341 | null |
| 2025-05-29 | Deep Modeling and Optimization of Medical Image Classification | Yihang Wu et.al. | 2505.23040 | link |
| 2025-05-28 | Leveraging Diffusion Models for Synthetic Data Augmentation in Protein Subcellular Localization Classification | Sylvey Lin et.al. | 2505.22926 | null |
| 2025-05-28 | Frequency-Adaptive Discrete Cosine-ViT-ResNet Architecture for Sparse-Data Vision | Ziyue Kang et.al. | 2505.22701 | null |
| 2025-05-28 | S2AFormer: Strip Self-Attention for Efficient Vision Transformer | Guoan Xu et.al. | 2505.22195 | null |
| 2025-05-28 | Efficient Ensemble for Fine-tuning Language Models on Multiple Datasets | Dongyue Li et.al. | 2505.21930 | null |
| 2025-05-28 | Test-Time Adaptation of Vision-Language Models for Open-Vocabulary Semantic Segmentation | Mehrdad Noori et.al. | 2505.21844 | null |
| 2025-05-27 | MedBridge: Bridging Foundation Vision-Language Models to Medical Image Diagnosis | Yitong Li et.al. | 2505.21698 | null |
| 2025-05-27 | Leveraging large language models and traditional machine learning ensembles for ADHD detection from narrative transcripts | Yuxin Zhu et.al. | 2505.21324 | null |
| 2025-05-27 | Making Every Event Count: Balancing Data Efficiency and Accuracy in Event Camera Subsampling | Hesam Araghi et.al. | 2505.21187 | null |
| 2025-05-27 | Information-Theoretic Complementary Prompts for Improved Continual Text Classification | Duzhen Zhang et.al. | 2505.20933 | null |
| 2025-05-27 | Evidential Deep Active Learning for Semi-Supervised Classification | Shenkai Zhao et.al. | 2505.20691 | null |
| 2025-05-26 | UORA: Uniform Orthogonal Reinitialization Adaptation in Parameter-Efficient Fine-Tuning of Large Models | Xueyan Zhang et.al. | 2505.20154 | null |
| 2025-05-26 | Improvement Strategies for Few-Shot Learning in OCT Image Classification of Rare Retinal Diseases | Cheng-Yu Tai et.al. | 2505.20149 | null |
| 2025-05-26 | Differential Privacy Analysis of Decentralized Gossip Averaging under Varying Threat Models | Antti Koskela et.al. | 2505.19969 | null |
| 2025-05-26 | Task-Oriented Low-Label Semantic Communication With Self-Supervised Learning | Run Gu et.al. | 2505.19940 | null |
| 2025-05-26 | Advancements in Medical Image Classification through Fine-Tuning Natural Domain Foundation Models | Mobina Mansoori et.al. | 2505.19779 | link |
| 2025-05-26 | Mosaic: Data-Free Knowledge Distillation via Mixture-of-Experts for Heterogeneous Distributed Environments | Junming Liu et.al. | 2505.19699 | null |
| 2025-05-26 | Diagnosing and Mitigating Modality Interference in Multimodal Large Language Models | Rui Cai et.al. | 2505.19616 | link |
| 2025-05-26 | Applications and Effect Evaluation of Generative Adversarial Networks in Semi-Supervised Learning | Jiyu Hu et.al. | 2505.19522 | null |
| 2025-05-26 | DiSa: Directional Saliency-Aware Prompt Learning for Generalizable Vision-Language Models | Niloufar Alipour Talemi et.al. | 2505.19373 | null |
| 2025-05-25 | Remote Sensing Image Classification with Decoupled Knowledge Distillation | Yaping He et.al. | 2505.19111 | null |
| 2025-05-24 | MoMBS: Mixed-order minibatch sampling enhances model training from diverse-quality images | Han Li et.al. | 2505.18741 | null |
| 2025-05-23 | SemSegBench & DetecBench: Benchmarking Reliability and Generalization Beyond Classification | Shashank Agnihotri et.al. | 2505.18015 | null |
| 2025-05-23 | KITINet: Kinetics Theory Inspired Network Architectures with PDE Simulation Approaches | Mingquan Feng et.al. | 2505.17919 | null |
| 2025-05-23 | Ownership Verification of DNN Models Using White-Box Adversarial Attacks with Specified Probability Manipulation | Teruki Sano et.al. | 2505.17579 | null |
| 2025-05-23 | Scaling Up Biomedical Vision-Language Models: Fine-Tuning, Instruction Tuning, and Multi-Modal Learning | Cheng Peng et.al. | 2505.17436 | null |
| 2025-05-23 | EVM-Fusion: An Explainable Vision Mamba Architecture with Neural Algorithmic Fusion | Zichuan Yang et.al. | 2505.17367 | null |
| 2025-05-22 | Extending Dataset Pruning to Object Detection: A Variance-based Approach | Ryota Yagi et.al. | 2505.17245 | null |
| 2025-05-23 | TULiP: Test-time Uncertainty Estimation via Linearization and Weight Perturbation | Yuhui Zhang et.al. | 2505.16923 | null |
| 2025-05-22 | Incremental Sequence Classification with Temporal Consistency | Lucas Maystre et.al. | 2505.16548 | null |
| 2025-05-22 | Fusion of Foundation and Vision Transformer Model Features for Dermatoscopic Image Classification | Amirreza Mahbod et.al. | 2505.16338 | null |
| 2025-05-22 | Accelerating Targeted Hard-Label Adversarial Attacks in Low-Query Black-Box Settings | Arjhun Swaminathan et.al. | 2505.16313 | link |
| 2025-05-22 | Swin Transformer for Robust CGI Images Detection: Intra- and Inter-Dataset Analysis across Multiple Color Spaces | Preeti Mehta et.al. | 2505.16253 | null |
| 2025-05-22 | When VLMs Meet Image Classification: Test Sets Renovation via Missing Label Identification | Zirui Pang et.al. | 2505.16149 | null |
| 2025-05-21 | Small Language Models in the Real World: Insights from Industrial Text Classification | Lujun Li et.al. | 2505.16078 | null |
| 2025-05-21 | GradPCA: Leveraging NTK Alignment for Reliable Out-of-Distribution Detection | Mariia Seleznova et.al. | 2505.16017 | null |
| 2025-05-21 | Domain Adaptive Skin Lesion Classification via Conformal Ensemble of Vision Transformers | Mehran Zoravar et.al. | 2505.15997 | null |
| 2025-05-21 | Large Language Models as Computable Approximations to Solomonoff Induction | Jun Wan et.al. | 2505.15784 | null |
| 2025-05-21 | FragFake: A Dataset for Fine-Grained Detection of Edited Images with Vision Language Models | Zhen Sun et.al. | 2505.15644 | null |
| 2025-05-21 | SNAP: A Benchmark for Testing the Effects of Capture Conditions on Fundamental Vision Tasks | Iuliia Kotseruba et.al. | 2505.15628 | link |
| 2025-05-21 | Aligning Explanations with Human Communication | Jacopo Teneggi et.al. | 2505.15626 | null |
| 2025-05-21 | Beyond Linearity: Squeeze-and-Recalibrate Blocks for Few-Shot Whole Slide Image Classification | Conghao Xiong et.al. | 2505.15504 | null |
| 2025-05-21 | Adaptive Temperature Scaling with Conformal Prediction | Nikita Kotelevskii et.al. | 2505.15437 | null |
| 2025-05-21 | Parameter-Efficient Fine-Tuning of Multispectral Foundation Models for Hyperspectral Image Classification | Bernardin Ligan et.al. | 2505.15334 | null |
| 2025-05-21 | Multicrossmodal Automated Agent for Integrating Diverse Materials Science Data | Adib Bazgir et.al. | 2505.15132 | null |
| 2025-05-20 | Reliable Decision Support with LLMs: A Framework for Evaluating Consistency in Binary Text Classification Applications | Fadel M. Megahed et.al. | 2505.14918 | null |
| 2025-05-20 | Solving MNIST with a globally trained Mixture of Quantum Experts | Paolo Alessandro Xavier Tognini et.al. | 2505.14789 | null |
| 2025-05-20 | Guarded Query Routing for Large Language Models | Richard Šléher et.al. | 2505.14524 | null |
| 2025-05-20 | PRL: Prompts from Reinforcement Learning | Paweł Batorski et.al. | 2505.14412 | null |
| 2025-05-20 | Domain Adaptation for Multi-label Image Classification: a Discriminator-free Approach | Inder Pal Singh et.al. | 2505.14333 | link |
| 2025-05-20 | HausaNLP: Current Status, Challenges and Future Directions for Hausa Natural Language Processing | Shamsuddeen Hassan Muhammad et.al. | 2505.14311 | null |
| 2025-05-20 | Intra-class Patch Swap for Self-Distillation | Hongjun Choi et.al. | 2505.14124 | link |
| 2025-05-20 | Scaling Vision Mamba Across Resolutions via Fractal Traversal | Bo Li et.al. | 2505.14062 | null |
| 2025-05-20 | Learning Concept-Driven Logical Rules for Interpretable and Generalizable Medical Image Classification | Yibo Gao et.al. | 2505.14049 | null |
| 2025-05-20 | A Challenge to Build Neuro-Symbolic Video Agents | Sahil Shah et.al. | 2505.13851 | null |
| 2025-05-19 | Synthetic-Powered Predictive Inference | Meshi Bashari et.al. | 2505.13432 | null |
| 2025-05-20 | Unlabeled Data or Pre-trained Model: Rethinking Semi-Supervised Learning and Pretrain-Finetuning | Song-Lin Li et.al. | 2505.13317 | null |
| 2025-05-19 | A Physics-Inspired Optimizer: Velocity Regularized Adam | Pranav Vaidhyanathan et.al. | 2505.13196 | null |
| 2025-05-19 | Emergence of Fixational and Saccadic Movements in a Multi-Level Recurrent Attention Model for Vision | Pengcheng Pan et.al. | 2505.13191 | null |
| 2025-05-19 | Learning to Adapt to Position Bias in Vision Transformer Classifiers | Robert-Jan Bruintjes et.al. | 2505.13137 | link |
| 2025-05-19 | When majority rules, minority loses: bias amplification of gradient descent | François Bachoc et.al. | 2505.13122 | null |
| 2025-05-19 | Expert-Like Reparameterization of Heterogeneous Pyramid Receptive Fields in Efficient CNNs for Fair Medical Image Classification | Xiao Wu et.al. | 2505.13039 | null |
| 2025-05-19 | EPIC: Explanation of Pretrained Image Classification Networks via Prototype | Piotr Borycki et.al. | 2505.12897 | link |
| 2025-05-19 | Enhancing Transformers Through Conditioned Embedded Tokens | Hemanth Saratchandran et.al. | 2505.12789 | null |
| 2025-05-19 | An approach based on class activation maps for investigating the effects of data augmentation on neural networks for image classification | Lucas M. Dorneles et.al. | 2505.12581 | null |
| 2025-05-16 | Energy efficiency analysis of Spiking Neural Networks for space applications | Paolo Lunghi et.al. | 2505.11418 | null |
| 2025-05-16 | Harnessing Photon Indistinguishability in Quantum Extreme Learning Machines | Malo Joly et.al. | 2505.11238 | null |
| 2025-05-16 | CheX-DS: Improving Chest X-ray Image Classification with Ensemble Learning Based on DenseNet and Swin Transformer | Xinran Li et.al. | 2505.11168 | null |
| 2025-05-16 | Privacy-Aware Lifelong Learning | Ozan Özdenizci et.al. | 2505.10941 | null |
| 2025-05-16 | MCU: Improving Machine Unlearning through Mode Connectivity | Yingdan Shi et.al. | 2505.10859 | null |
| 2025-05-15 | CLIP Embeddings for AI-Generated Image Detection: A Few-Shot Study with Lightweight Classifier | Ziyang Ou et.al. | 2505.10664 | null |
| 2025-05-15 | Research of the Variational Shadow Quantum Circuit Based on the Whale Optimization Algorithm in Image Classification | Shuang Wu et.al. | 2505.09994 | null |
| 2025-05-14 | Quantum-Enhanced Parameter-Efficient Learning for Typhoon Trajectory Forecasting | Chen-Yu Liu et.al. | 2505.09395 | null |
| 2025-05-14 | Marigold: Affordable Adaptation of Diffusion-Based Image Generators for Image Analysis | Bingxin Ke et.al. | 2505.09358 | link |
| 2025-05-17 | PrePrompt: Predictive prompting for class incremental learning | Libo Huang et.al. | 2505.08586 | link |
| 2025-05-13 | Convolutional Spiking Neural Network for Image Classification | Mikhail Kiselev et.al. | 2505.08514 | null |
| 2025-05-13 | CNN and ViT Efficiency Study on Tiny ImageNet and DermaMNIST Datasets | Aidar Amangeldi et.al. | 2505.08259 | null |
| 2025-05-13 | Empowering Vision Transformers with Multi-Scale Causal Intervention for Long-Tailed Image Classification | Xiaoshuo Yan et.al. | 2505.08173 | null |
| 2025-05-13 | MoKD: Multi-Task Optimization for Knowledge Distillation | Zeeshan Hayder et.al. | 2505.08170 | null |
| 2025-05-12 | Hierarchical Sparse Attention Framework for Computationally Efficient Classification of Biological Cells | Elad Yoshai et.al. | 2505.07661 | null |
| 2025-05-12 | Synthetic Similarity Search in Automotive Production | Christoph Huber et.al. | 2505.07256 | null |
| 2025-05-12 | Discovering Fine-Grained Visual-Concept Relations by Disentangled Optimal Transport Concept Bottleneck Models | Yan Xie et.al. | 2505.07209 | null |
| 2025-05-12 | KDH-MLTC: Knowledge Distillation for Healthcare Multi-Label Text Classification | Hajar Sakai et.al. | 2505.07162 | null |
| 2025-05-11 | A Vision-Language Foundation Model for Leaf Disease Identification | Khang Nguyen Quoc et.al. | 2505.07019 | null |
| 2025-05-11 | Image Classification Using a Diffusion Model as a Pre-Training Model | Kosuke Ukita et.al. | 2505.06890 | null |
| 2025-05-11 | NeuRN: Neuro-inspired Domain Generalization for Image Classification | Hamd Jalil et.al. | 2505.06881 | null |
| 2025-05-11 | Active Learning for Multi-class Image Classification | Thien Nhan Vo et.al. | 2505.06825 | null |
| 2025-05-10 | FNBench: Benchmarking Robust Federated Learning against Noisy Labels | Xuefeng Jiang et.al. | 2505.06684 | link |
| 2025-05-10 | The Efficiency of Pre-training with Objective Masking in Pseudo Labeling for Semi-Supervised Text Classification | Arezoo Hatefi et.al. | 2505.06624 | null |
| 2025-05-09 | Adapting a Segmentation Foundation Model for Medical Image Classification | Pengfei Gu et.al. | 2505.06217 | null |
| 2025-05-09 | Towards Robust Few-Shot Text Classification Using Transformer Architectures and Dual Loss Strategies | Xu Han et.al. | 2505.06145 | null |
| 2025-05-09 | Short-circuiting Shortcuts: Mechanistic Investigation of Shortcuts in Text Classification | Leon Eshuijs et.al. | 2505.06032 | link |
| 2025-05-09 | Efficient Quantum Convolutional Neural Networks for Image Classification: Overcoming Hardware Constraints | Peter Röseler et.al. | 2505.05957 | null |
| 2025-05-09 | Achieving 3D Attention via Triplet Squeeze and Excitation Block | Maan Alhazmi et.al. | 2505.05943 | null |
| 2025-05-09 | Improving Generalizability of Kolmogorov-Arnold Networks via Error-Correcting Output Codes | Youngjoon Lee et.al. | 2505.05798 | null |
| 2025-05-09 | Variational Bayesian Logistic Tensor Regression with Application to Image Recognition | Yunzhi Jin et.al. | 2505.05730 | null |
| 2025-05-08 | V-EfficientNets: Vector-Valued Efficiently Scaled Convolutional Neural Network Models | Guilherme Vieira Neto et.al. | 2505.05659 | link |
| 2025-05-08 | KG-HTC: Integrating Knowledge Graphs into LLMs for Effective Zero-shot Hierarchical Text Classification | Qianbo Zang et.al. | 2505.05583 | link |
| 2025-05-08 | Hide & Seek: Transformer Symmetries Obscure Sharpness & Riemannian Geometry Finds It | Marvin F. da Silva et.al. | 2505.05409 | null |
| 2025-05-08 | Quantum Surrogate-Driven Image Classifier: A Gradient-Free Approach to Avoid Barren Plateaus | Yichen Xie et.al. | 2505.05249 | null |
| 2025-05-08 | Biomed-DPT: Dual Modality Prompt Tuning for Biomedical Vision-Language Models | Wei Peng et.al. | 2505.05189 | null |
| 2025-05-08 | CacheFL: Efficient Federated Cache Model Fine-Tuning for Vision-Language Models | Mengjun Yi et.al. | 2505.05130 | null |
| 2025-05-08 | Direct Image Classification from Fourier Ptychographic Microscopy Measurements without Reconstruction | Navya Sonal Agarwal et.al. | 2505.05054 | null |
| 2025-05-07 | ORXE: Orchestrating Experts for Dynamically Configurable Efficiency | Qingyuan Wang et.al. | 2505.04850 | null |
| 2025-05-07 | Label-efficient Single Photon Images Classification via Active Learning | Zili Zhang et.al. | 2505.04376 | null |
| 2025-05-07 | FRAIN to Train: A Fast-and-Reliable Solution for Decentralized Federated Learning | Sanghyeon Park et.al. | 2505.04223 | null |
| 2025-05-06 | Read My Ears! Horse Ear Movement Detection for Equine Affective State Assessment | João Alves et.al. | 2505.03554 | null |
| 2025-05-06 | Noisy HQNNs: A Comprehensive Analysis of Noise Robustness in Hybrid Quantum Neural Networks | Tasnim Ahmed et.al. | 2505.03378 | null |
| 2025-05-06 | A Vision-Language Model for Focal Liver Lesion Classification | Song Jian et.al. | 2505.03350 | null |
| 2025-05-06 | Comparative Analysis of Lightweight Deep Learning Models for Memory-Constrained Devices | Tasnim Shahriar et.al. | 2505.03303 | null |
| 2025-05-06 | Survey of Abstract Meaning Representation: Then, Now, Future | Behrooz Mansouri et.al. | 2505.03229 | null |
| 2025-05-06 | seq-JEPA: Autoregressive Predictive Learning of Invariant-Equivariant World Models | Hafez Ghaemi et.al. | 2505.03176 | null |
| 2025-05-06 | Enhancing Glass Defect Detection with Diffusion Models: Addressing Imbalanced Datasets in Manufacturing Quality Control | Sajjad Rezvani Boroujeni et.al. | 2505.03134 | null |
| 2025-05-05 | Bayesian Robust Aggregation for Federated Learning | Aleksandr Karakulev et.al. | 2505.02490 | null |
| 2025-05-06 | Adversarial Cooperative Rationalization: The Risk of Spurious Correlations in Even Clean Datasets | Wei Liu et.al. | 2505.02118 | null |
| 2025-05-03 | Backdoor Attacks Against Patch-based Mixture of Experts | Cedric Chan et.al. | 2505.01811 | null |
| 2025-05-03 | Low-Complexity Acoustic Scene Classification with Device Information in the DCASE 2025 Challenge | Florian Schmid et.al. | 2505.01747 | null |
| 2025-05-03 | CLOG-CD: Curriculum Learning based on Oscillating Granularity of Class Decomposed Medical Image Classification | Asmaa Abbas et.al. | 2505.01741 | null |
| 2025-05-02 | TActiLE: Tiny Active LEarning for wearable devices | Massimo Pavan et.al. | 2505.01160 | null |
| 2025-04-30 | Towards Improved Cervical Cancer Screening: Vision Transformer-Based Classification and Interpretability | Khoa Tuan Nguyen et.al. | 2504.21340 | null |
| 2025-04-28 | AGATE: Stealthy Black-box Watermarking for Multimodal Model Copyright Protection | Jianbo Gao et.al. | 2504.21044 | null |
| 2025-04-29 | Photonic Quantum Convolutional Neural Networks with Adaptive State Injection | Léo Monbroussou et.al. | 2504.20989 | null |
| 2025-04-30 | DS_FusionNet: Dynamic Dual-Stream Fusion with Bidirectional Knowledge Distillation for Plant Disease Recognition | Yanghui Song et.al. | 2504.20948 | link |
| 2025-04-29 | MambaMoE: Mixture-of-Spectral-Spatial-Experts State Space Model for Hyperspectral Image Classification | Yichu Xu et.al. | 2504.20509 | null |
| 2025-04-28 | DeepAndes: A Self-Supervised Vision Foundation Model for Multi-Spectral Remote Sensing Imagery of the Andes | Junlin Guo et.al. | 2504.20303 | null |
| 2025-04-28 | GenCLS++: Pushing the Boundaries of Generative Classification in LLMs Through Comprehensive SFT and RL Studies Across Diverse Datasets | Mingqian He et.al. | 2504.19898 | null |
| 2025-04-28 | Reinforcement Learning-Based Heterogeneous Multi-Task Optimization in Semantic Broadcast Communications | Zhilin Lu et.al. | 2504.19806 | null |
| 2025-04-28 | Explaining Vision GNNs: A Semantic and Visual Analysis of Graph-based Image Classification | Nikolaos Chaidos et.al. | 2504.19682 | null |
| 2025-04-28 | Hardware/Software Co-Design of RISC-V Extensions for Accelerating Sparse DNNs on FPGAs | Muhammad Sabih et.al. | 2504.19659 | null |
| 2025-04-28 | Neural network task specialization via domain constraining | Roman Malashin et.al. | 2504.19592 | null |
| 2025-04-28 | GMAR: Gradient-Driven Multi-Head Attention Rollout for Vision Transformer Interpretability | Sehyeong Jo et.al. | 2504.19414 | null |
| 2025-04-27 | Dual-Branch Residual Network for Cross-Domain Few-Shot Hyperspectral Image Classification with Refined Prototype | Anyong Qin et.al. | 2504.19074 | null |
| 2025-04-26 | Advancing Scientific Text Classification: Fine-Tuned Models with Dataset Expansion and Hard-Voting | Zhyar Rzgar K Rostam et.al. | 2504.19021 | null |
| 2025-04-26 | A Simple Ensemble Strategy for LLM Inference: Towards More Stable Text Classification | Junichiro Niimi et.al. | 2504.18884 | link |
| 2025-04-26 | IoT Botnet Detection: Application of Vision Transformer to Classification of Network Flow Traffic | Hassan Wasswa et.al. | 2504.18781 | null |
| 2025-04-25 | Examining the Impact of Optical Aberrations to Image Classification and Object Detection Models | Patrick Müller et.al. | 2504.18510 | null |
| 2025-04-25 | Pseudo-Asynchronous Local SGD: Robust and Efficient Data-Parallel Training | Hiroki Naganuma et.al. | 2504.18454 | null |
| 2025-04-25 | Passive All-Optical Nonlinear Neuron Activation via PPLN Nanophotonic Waveguides | Wujie Fu et.al. | 2504.18145 | null |
| 2025-04-25 | DMS-Net: Dual-Modal Multi-Scale Siamese Network for Binocular Fundus Image Classification | Guohao Huo et.al. | 2504.18046 | null |
| 2025-04-24 | Disaggregated Deep Learning via In-Physics Computing at Radio Frequency | Zhihui Gao et.al. | 2504.17752 | null |
| 2025-04-24 | Aerial Image Classification in Scarce and Unconstrained Environments via Conformal Prediction | Farhad Pourkamali-Anaraki et.al. | 2504.17655 | null |
| 2025-04-24 | Enhanced Sample Selection with Confidence Tracking: Identifying Correctly Labeled yet Hard-to-Learn Samples in Noisy Data | Weiran Pan et.al. | 2504.17474 | null |
| 2025-04-24 | Dual-Individual Genetic Algorithm: A Dual-Individual Approach for Efficient Training of Multi-Layer Neural Networks | Tran Thuy Nga Truong et.al. | 2504.17346 | null |
| 2025-04-24 | Evaluating and Mitigating Bias in AI-Based Medical Text Generation | Xiuying Chen et.al. | 2504.17279 | null |
| 2025-04-24 | Group Downsampling with Equivariant Anti-aliasing | Md Ashiqur Rahman et.al. | 2504.17258 | link |
| 2025-04-24 | Multi-Modal Traffic Analysis: Integrating Time-Series Forecasting, Accident Prediction, and Image Classification | Nivedita M et.al. | 2504.17232 | null |
| 2025-04-23 | A Diff-Attention Aware State Space Fusion Model for Remote Sensing Classification | Wenping Ma et.al. | 2504.16665 | null |
| 2025-04-23 | Streetscape Analysis with Generative AI (SAGAI): Vision-Language Assessment and Mapping of Urban Scenes | Joan Perez et.al. | 2504.16538 | null |
| 2025-04-24 | An Effective Gram Matrix Characterizes Generalization in Deep Networks | Rubing Yang et.al. | 2504.16450 | null |
| 2025-04-23 | FrogDogNet: Fourier frequency Retained visual prompt Output Guidance for Domain Generalization of CLIP in Remote Sensing | Hariseetharam Gunduboina et.al. | 2504.16433 | null |
| 2025-04-22 | CLIP-IT: CLIP-based Pairing for Histology Images Classification | Banafsheh Karimian et.al. | 2504.16181 | null |
| 2025-04-22 | Automated Bug Report Prioritization in Large Open-Source Projects | Riley Pierson et.al. | 2504.15912 | null |
| 2025-04-22 | Generative AI for Research Data Processing: Lessons Learnt From Three Use Cases | Modhurita Mitra et.al. | 2504.15829 | null |
| 2025-04-22 | DualOptim: Enhancing Efficacy and Stability in Machine Unlearning with Dual Optimizers | Xuyang Zhong et.al. | 2504.15827 | null |
| 2025-04-22 | HS-Mamba: Full-Field Interaction Multi-Groups Mamba for Hyperspectral Image Classification | Hongxing Peng et.al. | 2504.15612 | null |
| 2025-04-22 | LLM-based Semantic Augmentation for Harmful Content Detection | Elyas Meguellati et.al. | 2504.15548 | null |
| 2025-04-21 | Feeding LLM Annotations to BERT Classifiers at Your Own Risk | Yucheng Lu et.al. | 2504.15432 | null |
| 2025-04-21 | Dynamic 3D KAN Convolution with Adaptive Grid Optimization for Hyperspectral Image Classification | Guandong Li et.al. | 2504.15155 | null |
| 2025-04-21 | Application of Sensitivity Analysis Methods for Studying Neural Network Models | Jiaxuan Miao et.al. | 2504.15100 | null |
| 2025-04-21 | Trainable Quantum Neural Network for Multiclass Image Classification with the Power of Pre-trained Tree Tensor Networks | Keisuke Murota et.al. | 2504.14995 | null |
| 2025-04-21 | ECViT: Efficient Convolutional Vision Transformer with Local-Attention and Multi-scale Stages | Zhoujie Qian et.al. | 2504.14825 | null |
| 2025-04-21 | What Lurks Within? Concept Auditing for Shared Diffusion Models at Scale | Xiaoyong Yuan et.al. | 2504.14815 | null |
| 2025-04-21 | A Basic Evaluation of Neural Networks Trained with the Error Diffusion Learning Algorithm | Kazuhisa Fujita et.al. | 2504.14814 | null |
| 2025-04-19 | Learning from Stochastic Teacher Representations Using Student-Guided Knowledge Distillation | Muhammad Haseeb Aslam et.al. | 2504.14307 | null |
| 2025-04-19 | Exploring Modality Guidance to Enhance VFM-based Feature Fusion for UDA in 3D Semantic Segmentation | Johannes Spoecklberger et.al. | 2504.14231 | null |
| 2025-04-19 | Enhancing Multimodal In-Context Learning for Image Classification through Coreset Optimization | Huiyi Chen et.al. | 2504.14200 | null |
| 2025-04-19 | ThyroidEffi 1.0: A Cost-Effective System for High-Performance Multi-Class Thyroid Carcinoma Classification | Hai Pham-Ngoc et.al. | 2504.14139 | null |
| 2025-04-18 | Feature Alignment and Representation Transfer in Knowledge Distillation for Large Language Models | Junjie Yang et.al. | 2504.13825 | null |
| 2025-04-18 | CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning | Yang Yue et.al. | 2504.13820 | link |
| 2025-04-18 | Towards Accurate and Interpretable Neuroblastoma Diagnosis via Contrastive Multi-scale Pathological Image Analysis | Zhu Zhu et.al. | 2504.13754 | null |
| 2025-04-18 | Human-aligned Deep Learning: Explainability, Causality, and Biological Inspiration | Gianluca Carloni et.al. | 2504.13717 | null |
| 2025-04-18 | Word Embedding Techniques for Classification of Star Ratings | Hesham Abdelmotaleb et.al. | 2504.13653 | null |
| 2025-04-18 | Cross-Hierarchical Bidirectional Consistency Learning for Fine-Grained Visual Classification | Pengxiang Gao et.al. | 2504.13608 | null |
| 2025-04-18 | MAAM: A Lightweight Multi-Agent Aggregation Module for Efficient Image Classification Based on the MindSpore Framework | Zhenkai Qin et.al. | 2504.13574 | null |
| 2025-04-18 | Bayesian continual learning and forgetting in neural networks | Djohan Bonnet et.al. | 2504.13569 | null |
| 2025-04-17 | Dynamic Memory-enhanced Transformer for Hyperspectral Image Classification | Muhammad Ahmad et.al. | 2504.13242 | null |
| 2025-04-17 | Perception Encoder: The best visual embeddings are not at the output of the network | Daniel Bolya et.al. | 2504.13181 | null |
| 2025-04-17 | Expert Kernel Generation Network Driven by Contextual Mapping for Hyperspectral Image Classification | Guandong Li et.al. | 2504.13045 | null |
| 2025-04-17 | Quantum Computing Supported Adversarial Attack-Resilient Autonomous Vehicle Perception Module for Traffic Sign Classification | Reek Majumder et.al. | 2504.12644 | null |
| 2025-04-16 | GLUSE: Enhanced Channel-Wise Adaptive Gated Linear Units SE for Onboard Satellite Earth Observation Image Classification | Thanh-Dung Le et.al. | 2504.12484 | null |
| 2025-04-16 | FLIP Reasoning Challenge | Andreas Plesner et.al. | 2504.12256 | null |
| 2025-04-16 | Weakly Semi-supervised Whole Slide Image Classification by Two-level Cross Consistency Supervision | Linhao Qu et.al. | 2504.12132 | null |
| 2025-04-16 | Exploring Video-Based Driver Activity Recognition under Noisy Labels | Linjuan Fan et.al. | 2504.11966 | link |
| 2025-04-17 | Selective Attention Federated Learning: Improving Privacy and Efficiency for Clinical Text Classification | Yue Li et.al. | 2504.11793 | null |
| 2025-04-15 | The Pontryagin Maximum Principle for Training Convolutional Neural Networks | Sebastian Hofmann et.al. | 2504.11647 | null |
| 2025-04-15 | Deep Learning Approaches for Medical Imaging Under Varying Degrees of Label Availability: A Comprehensive Survey | Siteng Ma et.al. | 2504.11588 | null |
| 2025-04-15 | Diversity-Driven Learning: Tackling Spurious Correlations and Data Heterogeneity in Federated Models | Gergely D. Németh et.al. | 2504.11216 | null |
| 2025-04-15 | Embedding Radiomics into Vision Transformers for Multimodal Medical Image Classification | Zhenyu Yang et.al. | 2504.10916 | null |
| 2025-04-15 | Progressive Rock Music Classification | Arpan Nagar et.al. | 2504.10821 | null |
| 2025-04-15 | 3D Wavelet Convolutions with Extended Receptive Fields for Hyperspectral Image Classification | Guandong Li et.al. | 2504.10795 | null |
| 2025-04-14 | Quantum Image Classification: Experiments on Utility-Scale Quantum Computers | Hrant Gharibyan et.al. | 2504.10595 | null |
| 2025-04-14 | LEMUR Neural Network Dataset: Towards Seamless AutoML | Arash Torabi Goodarzi et.al. | 2504.10552 | null |
| 2025-04-13 | An Efficient Quantum Classifier Based on Hamiltonian Representations | Federico Tiblias et.al. | 2504.10542 | null |
| 2025-04-14 | Correlative and Discriminative Label Grouping for Multi-Label Visual Prompt Tuning | LeiLei Ma et.al. | 2504.09990 | null |
| 2025-04-14 | GFT: Gradient Focal Transformer | Boris Kriuk et.al. | 2504.09852 | null |
| 2025-04-13 | PCM-SAR: Physics-Driven Contrastive Mutual Learning for SAR Classification | Pengfei Wang et.al. | 2504.09502 | null |
| 2025-04-13 | InfoBound: A Provable Information-Bounds Inspired Framework for Both OoD Generalization and OoD Detection | Lin Zhu et.al. | 2504.09448 | null |
| 2025-04-13 | Sparse Deformable Mamba for Hyperspectral Image Classification | Lincoln Linlin Xu et.al. | 2504.09446 | null |
| 2025-04-12 | Cycle Training with Semi-Supervised Domain Adaptation: Bridging Accuracy and Efficiency for Real-Time Mobile Scene Detection | Huu-Phong Phan-Nguyen et.al. | 2504.09297 | null |
| 2025-04-12 | Sparse Hybrid Linear-Morphological Networks | Konstantinos Fotopoulos et.al. | 2504.09289 | null |
| 2025-04-12 | Mixture of Group Experts for Learning Invariant Representations | Lei Kang et.al. | 2504.09265 | null |
| 2025-04-12 | Langformers: Unified NLP Pipelines for Language Models | Rabindra Lamsal et.al. | 2504.09170 | null |
| 2025-04-12 | Evolved Hierarchical Masking for Self-Supervised Learning | Zhanzhou Feng et.al. | 2504.09155 | null |
| 2025-04-11 | Hypergraph Vision Transformers: Images are More than Nodes, More than Edges | Joshua Fixelle et.al. | 2504.08710 | null |
| 2025-04-11 | Integrated ensemble of BERT- and features-based models for authorship attribution in Japanese literary works | Taisei Kanda et.al. | 2504.08527 | null |
| 2025-04-11 | An Early Experience with Confidential Computing Architecture for On-Device Model Protection | Sina Abdollahi et.al. | 2504.08508 | null |
| 2025-04-11 | The inherent convolution property of quantum neural networks | Guangkai Qu et.al. | 2504.08487 | null |
| 2025-04-11 | A Hybrid Fully Convolutional CNN-Transformer Model for Inherently Interpretable Medical Image Classification | Kerol Djoumessi et.al. | 2504.08481 | null |
| 2025-04-11 | FocalLens: Instruction Tuning Enables Zero-Shot Conditional Image Representations | Cheng-Yu Hsieh et.al. | 2504.08368 | null |
| 2025-04-11 | Comparative Analysis of Different Methods for Classifying Polychromatic Sketches | Fahd Baba et.al. | 2504.08186 | null |
| 2025-04-11 | Pychop: Emulating Low-Precision Arithmetic in Numerical Methods and Neural Networks | Erin Carson et.al. | 2504.07835 | null |
| 2025-04-10 | Traversal Learning Coordination For Lossless And Efficient Distributed Learning | Erdenebileg Batbaatar et.al. | 2504.07471 | null |
| 2025-04-09 | Identifying regions of interest in whole slide images of renal cell carcinoma | Mohammed Lamine Benomar et.al. | 2504.07313 | null |
| 2025-04-09 | A new training approach for text classification in Mental Health: LatentGLoss | Korhan Sevinç et.al. | 2504.07245 | null |
| 2025-04-09 | Deep Learning for Cardiovascular Risk Assessment: Proxy Features from Carotid Sonography as Predictors of Arterial Damage | Christoph Balada et.al. | 2504.06680 | null |
| 2025-04-08 | Memory-Modular Classification: Learning to Generalize with Memory Replacement | Dahyun Kang et.al. | 2504.06021 | null |
| 2025-04-08 | Federated Unlearning Made Practical: Seamless Integration via Negated Pseudo-Gradients | Alessio Mora et.al. | 2504.05822 | null |
| 2025-04-08 | DefMamba: Deformable Visual State Space Model | Leiye Liu et.al. | 2504.05794 | null |
| 2025-04-08 | Layer-Aware Embedding Fusion for LLMs in Text Classifications | Jiho Gwak et.al. | 2504.05764 | null |
| 2025-04-07 | REEF: Relevance-Aware and Efficient LLM Adapter for Video Understanding | Sakib Reza et.al. | 2504.05491 | null |
| 2025-04-07 | Secure Diagnostics: Adversarial Robustness Meets Clinical Interpretability | Mohammad Hossein Najafi et.al. | 2504.05483 | null |
| 2025-04-07 | Explaining Low Perception Model Competency with High-Competency Counterfactuals | Sara Pohland et.al. | 2504.05254 | null |
| 2025-04-07 | Federated Learning for Medical Image Classification: A Comprehensive Benchmark | Zhekai Zhou et.al. | 2504.05238 | null |
| 2025-04-07 | Batch Aggregation: An Approach to Enhance Text Classification with Correlated Augmented Data | Charco Hui et.al. | 2504.05020 | null |
| 2025-04-07 | RS-RAG: Bridging Remote Sensing Imagery and Comprehensive Knowledge with a Multi-Modal Dataset and Retrieval-Augmented Generation Model | Congcong Wen et.al. | 2504.04988 | null |
| 2025-04-06 | Your Image Generator Is Your New Private Dataset | Nicolo Resmini et.al. | 2504.04582 | null |
| 2025-04-06 | Attributed Synthetic Data Generation for Zero-shot Domain-specific Image Classification | Shijian Wang et.al. | 2504.04510 | null |
| 2025-04-06 | Spatial-Geometry Enhanced 3D Dynamic Snake Convolutional Neural Network for Hyperspectral Image Classification | Guandong Li et.al. | 2504.04463 | null |
| 2025-04-05 | A Comparative Study of Explainable AI Methods: Model-Agnostic vs. Model-Specific Approaches | Keerthi Devireddy et.al. | 2504.04276 | null |
| 2025-04-05 | GlotEval: A Test Suite for Massively Multilingual Evaluation of Large Language Models | Hengyu Luo et.al. | 2504.04155 | null |
| 2025-04-05 | Scaling Federated Learning Solutions with Kubernetes for Synthesizing Histopathology Images | Andrei-Alexandru Preda et.al. | 2504.04130 | null |
| 2025-04-04 | Adaptive Classification of Interval-Valued Time Series | Wan Tian et.al. | 2504.03318 | null |
| 2025-04-04 | Beyond the Next Token: Towards Prompt-Robust Zero-Shot Classification via Efficient Multi-Token Prediction | Junlang Qian et.al. | 2504.03159 | null |
| 2025-04-03 | HQViT: Hybrid Quantum Vision Transformer for Image Classification | Hui Zhang et.al. | 2504.02730 | null |
| 2025-04-03 | LLM-Guided Evolution: An Autonomous Model Optimization for Object Detection | YiMing Yu et.al. | 2504.02280 | null |
| 2025-04-02 | Neural Style Transfer for Synthesising a Dataset of Ancient Egyptian Hieroglyphs | Lewis Matheson Creed et.al. | 2504.02163 | null |
| 2025-04-02 | A thorough benchmark of automatic text classification: From traditional approaches to large language models | Washington Cunha et.al. | 2504.01930 | link |
| 2025-04-02 | A Randomized Zeroth-Order Hierarchical Framework for Heterogeneous Federated Learning | Yuyang Qiu et.al. | 2504.01839 | null |
| 2025-04-02 | A Novel Approach To Implementing Knowledge Distillation In Tsetlin Machines | Calvin Kinateder et.al. | 2504.01798 | null |
| 2025-04-02 | Token Pruning in Audio Transformers: Optimizing Performance and Decoding Patch Importance | Taehan Lee et.al. | 2504.01690 | link |
| 2025-04-02 | All Patches Matter, More Patches Better: Enhance AI-Generated Image Detection via Panoptic Patch Learning | Zheng Yang et.al. | 2504.01396 | null |
| 2025-04-01 | TenAd: A Tensor-based Low-rank Black Box Adversarial Attack for Video Classification | Kimia Haghjooei et.al. | 2504.01228 | null |
| 2025-04-01 | PolygoNet: Leveraging Simplified Polygonal Representation for Effective Image Classification | Salim Khazem et.al. | 2504.01214 | link |
| 2025-04-01 | Enabling Efficient Processing of Spiking Neural Networks with On-Chip Learning on Commodity Neuromorphic Processors for Edge AI Systems | Rachmad Vidya Wicaksana Putra et.al. | 2504.00957 | null |
| 2025-04-01 | Impact of Data Duplication on Deep Neural Network-Based Image Classifiers: Robust vs. Standard Models | Alireza Aghabagherloo et.al. | 2504.00638 | null |
| 2025-04-01 | Geometric Median Matching for Robust k-Subset Selection from Noisy Data | Anish Acharya et.al. | 2504.00564 | null |
| 2025-03-31 | NoProp: Training Neural Networks without Back-propagation or Forward-propagation | Qinyu Li et.al. | 2503.24322 | null |
| 2025-03-31 | CIBR: Cross-modal Information Bottleneck Regularization for Robust CLIP Generalization | Yingrui Ji et.al. | 2503.24182 | null |
| 2025-03-31 | PixelCAM: Pixel Class Activation Mapping for Histology Image Classification and ROI Localization | Alexis Guichemerre et.al. | 2503.24135 | link |
| 2025-03-31 | Crossmodal Knowledge Distillation with WordNet-Relaxed Text Embeddings for Robust Image Classification | Chenqi Guo et.al. | 2503.24017 | null |
| 2025-03-31 | FlexiMo: A Flexible Remote Sensing Foundation Model | Xuyang Li et.al. | 2503.23844 | null |
| 2025-03-31 | Expanding-and-Shrinking Binary Neural Networks | Xulong Shi et.al. | 2503.23709 | link |
| 2025-03-31 | WHERE and WHICH: Iterative Debate for Biomedical Synthetic Data Augmentation | Zhengyi Zhao et.al. | 2503.23673 | null |
| 2025-03-30 | Efficient Dynamic Attention 3D Convolution for Hyperspectral Image Classification | Guandong Li et.al. | 2503.23472 | null |
| 2025-03-30 | KernelDNA: Dynamic Kernel Sharing via Decoupled Naive Adapters | Haiduo Huang et.al. | 2503.23379 | link |
| 2025-03-29 | Optimizing Distributed Training Approaches for Scaling Neural Networks | Vishnu Vardhan Baligodugula et.al. | 2503.23186 | null |
| 2025-03-28 | Data-Free Universal Attack by Exploiting the Intrinsic Vulnerability of Deep Models | YangTian Yan et.al. | 2503.22205 | link |
| 2025-03-28 | Route-and-Aggregate Decentralized Federated Learning Under Communication Errors | Weicai Li et.al. | 2503.22186 | null |
| 2025-03-27 | On Large Multimodal Models as Open-World Image Classifiers | Alessandro Conti et.al. | 2503.21851 | link |
| 2025-03-27 | Bayesian Pseudo Posterior Mechanism for Differentially Private Machine Learning | Robert Chew et.al. | 2503.21528 | null |
| 2025-03-27 | Retinal Fundus Multi-Disease Image Classification using Hybrid CNN-Transformer-Ensemble Architectures | Deependra Singh et.al. | 2503.21465 | link |
| 2025-03-27 | Fine-Tuning LLMs on Small Medical Datasets: Text Classification and Normalization Effectiveness on Cardiology reports and Discharge records | Noah Losch et.al. | 2503.21349 | null |
| 2025-03-27 | Improving $(α, f)$-Byzantine Resilience in Federated Learning via layerwise aggregation and cosine distance | Mario García-Márquez et.al. | 2503.21244 | link |
| 2025-03-27 | Neural Architecture Search by Learning a Hierarchical Search Space | Mehraveh Javan Roshtkhari et.al. | 2503.21061 | null |
| 2025-03-26 | TS-Inverse: A Gradient Inversion Attack Tailored for Federated Time Series Forecasting Models | Caspar Meijer et.al. | 2503.20952 | link |
| 2025-03-26 | VESTA: A Versatile SNN-Based Transformer Accelerator with Unified PEs for Multiple Computational Layers | Ching-Yao Chen et.al. | 2503.20246 | null |
| 2025-03-26 | BeLightRec: A lightweight recommender system enhanced with BERT | Manh Mai Van et.al. | 2503.20206 | null |
| 2025-03-25 | Vanishing Depth: A Depth Adapter with Positional Depth Encoding for Generalized Image Encoders | Paul Koch et.al. | 2503.19947 | null |
| 2025-03-25 | Optimizing Breast Cancer Detection in Mammograms: A Comprehensive Study of Transfer Learning, Resolution Reduction, and Multi-View Classification | Daniel G. P. Petrini et.al. | 2503.19945 | null |
| 2025-03-25 | Extensions of regret-minimization algorithm for optimal design | Youguang Chen et.al. | 2503.19874 | null |
| 2025-03-25 | VectorFit : Adaptive Singular & Bias Vector Fine-Tuning of Pre-trained Foundation Models | Suhas G Hegde et.al. | 2503.19530 | null |
| 2025-03-25 | LRSCLIP: A Vision-Language Foundation Model for Aligning Remote Sensing Image with Longer Text | Weizhi Chen et.al. | 2503.19311 | null |
| 2025-03-25 | Face Spoofing Detection using Deep Learning | Najeebullah et.al. | 2503.19223 | link |
| 2025-03-24 | Exploring the Integration of Key-Value Attention Into Pure and Hybrid Transformers for Semantic Segmentation | DeShin Hwa et.al. | 2503.18862 | null |
| 2025-03-24 | Latent Space Class Dispersion: Effective Test Data Quality Assessment for DNNs | Vivek Vekariya et.al. | 2503.18799 | null |
| 2025-03-24 | Unbiasing through Textual Descriptions: Mitigating Representation Bias in Video Benchmarks | Nina Shvetsova et.al. | 2503.18637 | null |
| 2025-03-24 | Explaining Domain Shifts in Language: Concept erasing for Interpretable Image Classification | Zequn Zeng et.al. | 2503.18483 | null |
| 2025-03-24 | Teaching LLMs for Step-Level Automatic Math Correction via Reinforcement Learning | Junsong Li et.al. | 2503.18432 | null |
| 2025-03-24 | Sun-Shine: A Large Language Model for Tibetan Culture | Cheng Huang et.al. | 2503.18288 | null |
| 2025-03-23 | Feature Learning beyond the Lazy-Rich Dichotomy: Insights from Representational Geometry | Chi-Ning Chou et.al. | 2503.18114 | null |
| 2025-03-23 | What Time Tells Us? An Explorative Study of Time Awareness Learned from Static Images | Dongheng Lin et.al. | 2503.17899 | null |
| 2025-03-21 | Spatiotemporal Learning with Context-aware Video Tubelets for Ultrasound Video Analysis | Gary Y. Li et.al. | 2503.17475 | null |
| 2025-03-21 | Leveraging Text-to-Image Generation for Handling Spurious Correlation | Aryan Yazdan Parast et.al. | 2503.17226 | null |
| 2025-03-21 | CoRLD: Contrastive Representation Learning Of Deformable Shapes In Images | Tonmoy Hossain et.al. | 2503.17162 | null |
| 2025-03-21 | Beyond Accuracy: What Matters in Designing Well-Behaved Models? | Robin Hesse et.al. | 2503.17110 | null |
| 2025-03-21 | Symbolic Audio Classification via Modal Decision Tree Learning | Enrico Marzano et.al. | 2503.17018 | null |
| 2025-03-21 | EasyRobust: A Comprehensive and Easy-to-use Toolkit for Robust and Generalized Vision | Xiaofeng Mao et.al. | 2503.16975 | null |
| 2025-03-21 | City2Scene: Improving Acoustic Scene Classification with City Features | Yiqiang Cai et.al. | 2503.16862 | null |
| 2025-03-20 | MobilePlantViT: A Mobile-friendly Hybrid ViT for Generalized Plant Disease Image Classification | Moshiur Rahman Tonmoy et.al. | 2503.16628 | null |
| 2025-03-20 | PSA-MIL: A Probabilistic Spatial Attention-Based Multiple Instance Learning for Whole Slide Image Classification | Sharon Peled et.al. | 2503.16284 | link |
| 2025-03-20 | CLS-RL: Image Classification with Rule-Based Reinforcement Learning | Ming Li et.al. | 2503.16188 | link |
| 2025-03-20 | Corrective In-Context Learning: Evaluating Self-Correction in Large Language Models | Mario Sanz-Guerrero et.al. | 2503.16022 | link |
| 2025-03-20 | Beyond the Visible: Multispectral Vision-Language Learning for Earth Observation | Clive Tinashe Marimo et.al. | 2503.15969 | link |
| 2025-03-19 | Graph-Weighted Contrastive Learning for Semi-Supervised Hyperspectral Image Classification | Yuqing Zhang et.al. | 2503.15731 | null |
| 2025-03-20 | Dynamic Bi-Elman Attention Networks (DBEAN): Dual-Directional Context-Aware Representation Learning for Enhanced Text Classification | ZhengLin Lai et.al. | 2503.15469 | link |
| 2025-03-19 | Test-Time Backdoor Detection for Object Detection Models | Hangtao Zhang et.al. | 2503.15293 | null |
| 2025-03-19 | Efficient allocation of image recognition and LLM tasks on multi-GPU system | Marcin Lawenda et.al. | 2503.15252 | null |
| 2025-03-19 | Comparing Llama3 and DeepSeekR1 on Biomedical Text Classification Tasks | Yuting Guo et.al. | 2503.15169 | null |
| 2025-03-19 | ARC: Anchored Representation Clouds for High-Resolution INR Classification | Joost Luijmes et.al. | 2503.15156 | null |
| 2025-03-19 | Ultrasound Image-to-Video Synthesis via Latent Dynamic Diffusion Models | Tingxiu Chen et.al. | 2503.14966 | null |
| 2025-03-19 | Optimal Transport Adapter Tuning for Bridging Modality Gaps in Few-Shot Remote Sensing Scene Classification | Zhong Ji et.al. | 2503.14938 | null |
| 2025-03-18 | RAT: Boosting Misclassification Detection Ability without Extra Data | Ge Yan et.al. | 2503.14783 | null |
| 2025-03-18 | LipShiFT: A Certifiably Robust Shift-based Vision Transformer | Rohan Menon et.al. | 2503.14751 | null |
| 2025-03-18 | Utilization of Neighbor Information for Image Classification with Different Levels of Supervision | Gihan Jayatilaka et.al. | 2503.14500 | null |
| 2025-03-17 | Neural Edge Histogram Descriptors for Underwater Acoustic Target Recognition | Atharva Agashe et.al. | 2503.13763 | null |
| 2025-03-17 | Micro Text Classification Based on Balanced Positive-Unlabeled Learning | Lin-Han Jia et.al. | 2503.13562 | null |
| 2025-03-17 | Escaping Plato’s Cave: Robust Conceptual Reasoning through Interpretable 3D Neural Object Volumes | Nhi Pham et.al. | 2503.13429 | link |
| 2025-03-17 | Do Vision Models Develop Human-Like Progressive Difficulty Understanding? | Zeyi Huang et.al. | 2503.13058 | null |
| 2025-03-16 | Domain Generalization for Improved Human Activity Recognition in Office Space Videos Using Adaptive Pre-processing | Partho Ghosh et.al. | 2503.12678 | null |
| 2025-03-16 | Scaling Semantic Categories: Investigating the Impact on Vision Transformer Labeling Performance | Anthony Lamelas et.al. | 2503.12617 | null |
| 2025-03-16 | Defense Against Model Stealing Based on Account-Aware Distribution Discrepancy | Jian-Ping Mei et.al. | 2503.12497 | null |
| 2025-03-16 | GeoRSMLLM: A Multimodal Large Language Model for Vision-Language Tasks in Geoscience and Remote Sensing | Zilun Zhang et.al. | 2503.12490 | null |
| 2025-03-16 | Shape Bias and Robustness Evaluation via Cue Decomposition for Image Classification and Segmentation | Edgar Heinert et.al. | 2503.12453 | null |
| 2025-03-16 | MExD: An Expert-Infused Diffusion Model for Whole-Slide Image Classification | Jianwei Zhao et.al. | 2503.12401 | null |
| 2025-03-15 | TLAC: Two-stage LMM Augmented CLIP for Zero-Shot Classification | Ans Munir et.al. | 2503.12206 | null |
| 2025-03-15 | Goal-Oriented Source Coding using LDPC Codes for Compressed-Domain Image Classification | Ahcen Aliouat et.al. | 2503.11954 | null |
| 2025-03-14 | Creating a Good Teacher for Knowledge Distillation in Acoustic Scene Classification | Tobias Morocutti et.al. | 2503.11363 | null |
| 2025-03-14 | PARIC: Probabilistic Attention Regularization for Language Guided Image Classification from Pre-trained Vision Language Models | Mayank Nautiyal et.al. | 2503.11360 | null |
| 2025-03-14 | APLA: A Simple Adaptation Method for Vision Transformers | Moein Sorkhei et.al. | 2503.11335 | link |
| 2025-03-14 | Open-Set Plankton Recognition | Joona Kareinen et.al. | 2503.11318 | null |
| 2025-03-14 | MEET: A Million-Scale Dataset for Fine-Grained Geospatial Scene Classification with Zoom-Free Remote Sensing Imagery | Yansheng Li et.al. | 2503.11219 | null |
| 2025-03-14 | Falcon: A Remote Sensing Vision-Language Foundation Model | Kelu Yao et.al. | 2503.11070 | link |
| 2025-03-13 | $(\varepsilon, δ)$ Considered Harmful: Best Practices for Reporting Differential Privacy Guarantees | Juan Felipe Gomez et.al. | 2503.10945 | null |
| 2025-03-13 | Learning Interpretable Logic Rules from Deep Vision Models | Chuqin Geng et.al. | 2503.10547 | null |
| 2025-03-13 | Extreme Learning Machines for Attention-based Multiple Instance Learning in Whole-Slide Image Classification | Rajiv Krishnakumar et.al. | 2503.10510 | null |
| 2025-03-13 | RoMA: Scaling up Mamba-based Foundation Models for Remote Sensing | Fengxiang Wang et.al. | 2503.10392 | link |
| 2025-03-13 | PS3C: An Ensemble-Based Two-Step Framework for Classification of Pap Smear Cell Images | Theo Di Piazza et.al. | 2503.10312 | link |
| 2025-03-13 | Wikipedia is Not a Dictionary, Delete! Text Classification as a Proxy for Analysing Wiki Deletion Discussions | Hsuvas Borkakoty et.al. | 2503.10294 | null |
| 2025-03-13 | A Multi-Modal Federated Learning Framework for Remote Sensing Image Classification | Barış Büyüktaş et.al. | 2503.10262 | null |
| 2025-03-13 | Interpretable Image Classification via Non-parametric Part Prototype Learning | Zhijie Zhu et.al. | 2503.10247 | null |
| 2025-03-13 | Multiplicative Learning | Han Kim et.al. | 2503.10144 | null |
| 2025-03-13 | Cognitive-Mental-LLM: Leveraging Reasoning in Large Language Models for Mental Health Prediction via Online Text | Avinash Patil et.al. | 2503.10095 | null |
| 2025-03-13 | Do We Always Need the Simplicity Bias? Looking for Optimal Inductive Biases in the Wild | Damien Teney et.al. | 2503.10065 | null |
| 2025-03-12 | Fair Federated Medical Image Classification Against Quality Shift via Inter-Client Progressive State Matching | Nannan Wu et.al. | 2503.09587 | null |
| 2025-03-12 | Double-Stage Feature-Level Clustering-Based Mixture of Experts Framework | Bakary Badjie et.al. | 2503.09504 | null |
| 2025-03-12 | ForAug: Recombining Foregrounds and Backgrounds to Improve Vision Transformer Training with Bias Mitigation | Tobias Christian Nauen et.al. | 2503.09399 | link |
| 2025-03-12 | Membership Inference Attacks fueled by Few-Shot Learning to detect privacy leakage tackling data integrity | Daniel Jiménez-López et.al. | 2503.09365 | null |
| 2025-03-12 | Deep Learning for Climate Action: Computer Vision Analysis of Visual Narratives on X | Katharina Prasse et.al. | 2503.09361 | null |
| 2025-03-12 | Bayesian Test-Time Adaptation for Vision-Language Models | Lihua Zhou et.al. | 2503.09248 | null |
| 2025-03-12 | Probing Network Decisions: Capturing Uncertainties and Unveiling Vulnerabilities Without Label Information | Youngju Joung et.al. | 2503.09068 | null |
| 2025-03-12 | Discovering Influential Neuron Path in Vision Transformers | Yifan Wang et.al. | 2503.09046 | link |
| 2025-03-11 | KAN-Mixers: a new deep learning architecture for image classification | Jorge Luiz dos Santos Canuto et.al. | 2503.08939 | null |
| 2025-03-12 | MsaMIL-Net: An End-to-End Multi-Scale Aware Multiple Instance Learning Network for Efficient Whole Slide Image Classification | Jiangping Wen et.al. | 2503.08581 | null |
| 2025-03-11 | Generalizable and Explainable Deep Learning for Medical Image Computing: An Overview | Ahmad Chaddad et.al. | 2503.08420 | null |
| 2025-03-11 | Prototype-Based Multiple Instance Learning for Gigapixel Whole Slide Image Classification | Susu Sun et.al. | 2503.08384 | null |
| 2025-03-11 | Tangentially Aligned Integrated Gradients for User-Friendly Explanations | Lachlan Simpson et.al. | 2503.08240 | null |
| 2025-03-11 | EnergyFormer: Energy Attention with Fourier Embedding for Hyperspectral Image Classification | Saad Sohail et.al. | 2503.08239 | null |
| 2025-03-11 | Identification of Star Clusters in M31 from PAndAS Images Based on Deep Learning | Baisong Zhang et.al. | 2503.08130 | null |
| 2025-03-11 | LabelCoRank: Revolutionizing Long Tail Multi-Label Classification with Co-Occurrence Reranking | Yan Yan et.al. | 2503.07968 | null |
| 2025-03-12 | Measuring directional bias amplification in image captions using predictability | Rahul Nair et.al. | 2503.07878 | null |
| 2025-03-10 | Fair Text Classification via Transferable Representations | Thibaud Leteno et.al. | 2503.07691 | null |
| 2025-03-10 | Keeping Representation Similarity in Finetuning for Medical Image Analysis | Wenqiang Zu et.al. | 2503.07399 | null |
| 2025-03-10 | Brain Inspired Adaptive Memory Dual-Net for Few-Shot Image Classification | Kexin Di et.al. | 2503.07396 | null |
| 2025-03-10 | Is My Text in Your AI Model? Gradient-based Membership Inference Test applied to LLMs | Gonzalo Mancera et.al. | 2503.07384 | null |
| 2025-03-10 | Distilling Knowledge into Quantum Vision Transformers for Biomedical Image Classification | Thomas Boucher et.al. | 2503.07294 | null |
| 2025-03-10 | A Zero-shot Learning Method Based on Large Language Models for Multi-modal Knowledge Graph Embedding | Bingchen Liu et.al. | 2503.07202 | null |
| 2025-03-10 | Understanding the Learning Dynamics of LoRA: A Gradient Flow Perspective on Low-Rank Adaptation in Matrix Factorization | Ziqing Xu et.al. | 2503.06982 | null |
| 2025-03-10 | Task Vector Quantization for Memory-Efficient Model Merging | Youngeun Kim et.al. | 2503.06921 | link |
| 2025-03-10 | MADS: Multi-Attribute Document Supervision for Zero-Shot Image Classification | Xiangyan Qu et.al. | 2503.06847 | null |
| 2025-03-09 | Enhancing Layer Attention Efficiency through Pruning Redundant Retrievals | Hanze Li et.al. | 2503.06473 | null |
| 2025-03-09 | M$^3$amba: CLIP-driven Mamba Model for Multi-modal Remote Sensing Classification | Mingxiang Cao et.al. | 2503.06446 | null |
| 2025-03-07 | Similarity-Based Domain Adaptation with LLMs | Jie He et.al. | 2503.05281 | null |
| 2025-03-07 | Spatial Context-Driven Positive Pair Sampling for Enhanced Histopathology Image Classification | Willmer Rafell Quinones Robles et.al. | 2503.05170 | null |
| 2025-03-07 | Ensemble Debiasing Across Class and Sample Levels for Fairer Prompting Accuracy | Ruixi Lin et.al. | 2503.05157 | link |
| 2025-03-07 | Grouped Sequential Optimization Strategy – the Application of Hyperparameter Importance Assessment in Deep Learning | Ruinan Wang et.al. | 2503.05106 | null |
| 2025-03-06 | HieroLM: Egyptian Hieroglyph Recovery with Next Word Prediction Language Model | Xuheng Cai et.al. | 2503.04996 | null |
| 2025-03-06 | Label Distribution Learning-Enhanced Dual-KNN for Text Classification | Bo Yuan et.al. | 2503.04869 | null |
| 2025-03-06 | Guiding LLMs to Generate High-Fidelity and High-Quality Counterfactual Explanations for Text Classification | Van Bach Nguyen et.al. | 2503.04463 | null |
| 2025-03-06 | WeakSupCon: Weakly Supervised Contrastive Learning for Encoder Pre-training | Bodong Zhang et.al. | 2503.04165 | null |
| 2025-03-04 | Measurement noise scaling laws for cellular representation learning | Gokul Gowri et.al. | 2503.02726 | null |
| 2025-03-04 | XFMamba: Cross-Fusion Mamba for Multi-View Medical Image Classification | Xiaoyu Zheng et.al. | 2503.02619 | link |
| 2025-03-04 | Remote Sensing Image Classification Using Convolutional Neural Network (CNN) and Transfer Learning Techniques | Mustafa Majeed Abd Zaid et.al. | 2503.02510 | null |
| 2025-03-06 | Union of Experts: Adapting Hierarchical Routing to Equivalently Decomposed Transformer | Yujiao Yang et.al. | 2503.02495 | link |
| 2025-03-04 | Making Better Mistakes in CLIP-Based Zero-Shot Classification with Hierarchy-Aware Language Prompts | Tong Liang et.al. | 2503.02248 | null |
| 2025-03-04 | Sharpness-Aware Minimization: General Analysis and Improved Rates | Dimitris Oikonomou et.al. | 2503.02225 | null |
| 2025-03-03 | Mathematical Foundation of Interpretable Equivariant Surrogate Models | Jacopo Joy Colombini et.al. | 2503.01942 | null |
| 2025-03-03 | Visual-RFT: Visual Reinforcement Fine-Tuning | Ziyu Liu et.al. | 2503.01785 | link |
| 2025-03-03 | Mamba base PKD for efficient knowledge compression | José Medina et.al. | 2503.01727 | null |
| 2025-03-04 | SAR-W-MixMAE: SAR Foundation Model Training Using Backscatter Power Weighting | Ali Caglayan et.al. | 2503.01181 | null |
| 2025-03-03 | Large Language Models for Healthcare Text Classification: A Systematic Review | Hajar Sakai et.al. | 2503.01159 | null |
| 2025-03-03 | Fast and Accurate Gigapixel Pathological Image Classification with Hierarchical Distillation Multi-Instance Learning | Jiuyang Dong et.al. | 2502.21130 | null |
| 2025-02-28 | Comparative study of the ansätze in quantum language models | Jordi Del Castillo et.al. | 2502.20744 | null |
| 2025-02-28 | Exploring the Impact of Temperature Scaling in Softmax for Classification and Adversarial Robustness | Hao Xuan et.al. | 2502.20604 | null |
| 2025-02-27 | In-Model Merging for Enhancing the Robustness of Medical Imaging Classification Models | Hu Wang et.al. | 2502.20516 | null |
| 2025-02-27 | Online Meta-learning for AutoML in Real-time (OnMAR) | Mia Gerber et.al. | 2502.20279 | null |
| 2025-03-03 | Gradient-Guided Annealing for Domain Generalization | Aristotelis Ballas et.al. | 2502.20162 | link |
| 2025-02-27 | QPM: Discrete Optimization for Globally Interpretable Image Classification | Thomas Norrenbrock et.al. | 2502.20130 | link |
| 2025-02-27 | ProAPO: Progressively Automatic Prompt Optimization for Visual Classification | Xiangyan Qu et.al. | 2502.19844 | link |
| 2025-02-27 | Text classification using machine learning methods | Bogdan Oancea et.al. | 2502.19801 | null |
| 2025-02-27 | InPK: Infusing Prior Knowledge into Prompt for Vision-Language Models | Shuchang Zhou et.al. | 2502.19777 | null |
| 2025-02-27 | Learning Mask Invariant Mutual Information for Masked Image Modeling | Tao Huang et.al. | 2502.19718 | null |
| 2025-02-27 | Language-Informed Hyperspectral Image Synthesis for Imbalanced-Small Sample Classification via Semi-Supervised Conditional Diffusion Model | Yimin Zhu et.al. | 2502.19700 | null |
| 2025-02-27 | Spatial-Spectral Diffusion Contrastive Representation Network for Hyperspectral Image Classification | Yimin Zhu et.al. | 2502.19699 | null |
| 2025-02-27 | A Residual Multi-task Network for Joint Classification and Regression in Medical Imaging | Junji Lin et.al. | 2502.19692 | null |
| 2025-02-26 | I Know What I Don’t Know: Improving Model Cascades Through Confidence Tuning | Stephan Rabanser et.al. | 2502.19335 | null |
| 2025-02-26 | Active Few-Shot Learning for Text Classification | Saeed Ahmadnia et.al. | 2502.18782 | null |
| 2025-02-25 | Enhancing Image Classification with Augmentation: Data Augmentation Techniques for Improved Image Classification | Saorj Kumar et.al. | 2502.18691 | null |
| 2025-02-25 | Enhancing Text Classification with a Novel Multi-Agent Collaboration Framework Leveraging BERT | Hediyeh Baban et.al. | 2502.18653 | null |
| 2025-02-25 | MedKAN: An Advanced Kolmogorov-Arnold Network for Medical Image Classification | Zhuoqin Yang et.al. | 2502.18416 | null |
| 2025-02-26 | A Fusion Model for Art Author Identification Based on Convolutional Neural Networks and Transformers | Zhenyu Wang et.al. | 2502.18083 | null |
| 2025-02-25 | MAGE: Multi-Head Attention Guided Embeddings for Low Resource Sentiment Classification | Varun Vashisht et.al. | 2502.17987 | null |
| 2025-02-25 | Dual Classification Head Self-training Network for Cross-scene Hyperspectral Image Classification | Rong Liu et.al. | 2502.17879 | null |
| 2025-02-24 | Can Score-Based Generative Modeling Effectively Handle Medical Image Classification? | Sushmita Sarker et.al. | 2502.17727 | link |
| 2025-02-24 | A Priori Generalizability Estimate for a CNN | Cito Balsells et.al. | 2502.17622 | null |
| 2025-02-24 | Neural Attention: A Novel Mechanism for Enhanced Expressive Power in Transformer Models | Andrew DiGiugno et.al. | 2502.17206 | null |
| 2025-02-24 | Disentangling Visual Transformers: Patch-level Interpretability for Image Classification | Guillaume Jeanneret et.al. | 2502.17196 | null |
| 2025-02-24 | Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment | Chenghao Fan et.al. | 2502.16894 | link |
| 2025-02-24 | Applying LLMs to Active Learning: Towards Cost-Efficient Cross-Task Text Classification without Manually Labeled Data | Yejian Zhang et.al. | 2502.16892 | null |
| 2025-02-24 | A Transformer-in-Transformer Network Utilizing Knowledge Distillation for Image Recognition | Dewan Tauhid Rahman et.al. | 2502.16762 | null |
| 2025-02-23 | AUKT: Adaptive Uncertainty-Guided Knowledge Transfer with Conformal Prediction | Rui Liu et.al. | 2502.16736 | null |
| 2025-02-22 | MOB-GCN: A Novel Multiscale Object-Based Graph Neural Network for Hyperspectral Image Classification | Tuan-Anh Yang et.al. | 2502.16289 | link |
| 2025-02-22 | A Multi-Scale Isolation Forest Approach for Real-Time Detection and Filtering of FGSM Adversarial Attacks in Video Streams of Autonomous Vehicles | Richard Abhulimhen et.al. | 2502.16044 | null |
| 2025-02-21 | MMRAG: Multi-Mode Retrieval-Augmented Generation with Large Language Models for Biomedical In-Context Learning | Zaifu Zhan et.al. | 2502.15954 | null |
| 2025-02-21 | Directional Gradient Projection for Robust Fine-Tuning of Foundation Models | Chengyue Huang et.al. | 2502.15895 | null |
| 2025-02-21 | MHQA: A Diverse, Knowledge Intensive Mental Health Question Answering Challenge for Language Models | Suraj Racha et.al. | 2502.15418 | null |
| 2025-02-21 | A Novel Riemannian Sparse Representation Learning Network for Polarimetric SAR Image Classification | Junfei Shi et.al. | 2502.15302 | null |
| 2025-02-21 | Quantum autoencoders for image classification | Hinako Asaoka et.al. | 2502.15254 | null |
| 2025-02-21 | Steganographic Embeddings as an Effective Data Augmentation | Nicholas DiSalvo et.al. | 2502.15245 | null |
| 2025-02-21 | Learning to Collaborate: A Capability Vectors-based Architecture for Adaptive Human-AI Decision Making | Renlong Jie et.al. | 2502.15196 | null |
| 2025-02-21 | TransMamba: Fast Universal Architecture Adaption from Transformers to Mamba | Xiuwei Chen et.al. | 2502.15130 | null |
| 2025-02-20 | Fundamental Survey on Neuromorphic Based Audio Classification | Amlan Basu et.al. | 2502.15056 | null |
| 2025-02-20 | Reinforcement Learning for Ultrasound Image Analysis: A Comprehensive Review of Advances and Applications | Maha Ezzelarab et.al. | 2502.14995 | null |
| 2025-02-20 | Sparse Activations as Conformal Predictors | Margarida M. Campos et.al. | 2502.14773 | link |
| 2025-02-20 | An Enhancement of Jiang, Z., et al.’s Compression-Based Classification Algorithm Applied to News Article Categorization | Sean Lester C. Benavides et.al. | 2502.14444 | null |
| 2025-02-20 | Stochastic Resonance Improves the Detection of Low Contrast Images in Deep Learning Models | Siegfried Ludwig et.al. | 2502.14442 | null |
| 2025-02-20 | Token-Level Density-Based Uncertainty Quantification Methods for Eliciting Truthfulness of Large Language Models | Artem Vazhentsev et.al. | 2502.14427 | null |
| 2025-02-20 | Reliable Explainability of Deep Learning Spatial-Spectral Classifiers for Improved Semantic Segmentation in Autonomous Driving | Jon Gutiérrez-Zaballa et.al. | 2502.14416 | null |
| 2025-02-20 | QUAD-LLM-MLTC: Large Language Models Ensemble Learning for Healthcare Text Multi-Label Classification | Hajar Sakai et.al. | 2502.14189 | null |
| 2025-02-19 | Self-Regularization with Latent Space Explanations for Controllable LLM-based Classification | Xuansheng Wu et.al. | 2502.14133 | null |
| 2025-02-19 | Medical Image Classification with KAN-Integrated Transformers and Dilated Neighborhood Attention | Omid Nejati Manzari et.al. | 2502.13693 | link |
| 2025-02-18 | Language Models Can Predict Their Own Behavior | Dhananjay Ashok et.al. | 2502.13329 | null |
| 2025-02-18 | Performance Evaluation of Sentiment Analysis on Text and Emoji Data Using End-to-End, Transfer Learning, Distributed and Explainable AI Models | Sirisha Velampalli et.al. | 2502.13278 | null |
| 2025-02-18 | Private Text Generation by Seeding Large Language Model Prompts | Supriya Nagesh et.al. | 2502.13193 | null |
| 2025-02-18 | RingFormer: Rethinking Recurrent Transformer with Adaptive Level Signals | Jaemu Heo et.al. | 2502.13181 | null |
| 2025-02-18 | Benchmarking MedMNIST dataset on real quantum hardware | Gurinder Singh et.al. | 2502.13056 | null |
| 2025-02-18 | Likelihood-Ratio Regularized Quantile Regression: Adapting Conformal Prediction to High-Dimensional Covariate Shifts | Sunay Joshi et.al. | 2502.13030 | null |
| 2025-02-18 | A Survey of Text Classification Under Class Distribution Shift | Adriana Valentina Costache et.al. | 2502.12965 | null |
| 2025-02-18 | Task-Informed Anti-Curriculum by Masking Improves Downstream Performance on Text | Andrei Jarca et.al. | 2502.12953 | null |
| 2025-02-18 | DAMamba: Vision State Space Model with Dynamic Adaptive Scan | Tanzhe Li et.al. | 2502.12627 | null |
| 2025-02-18 | When Segmentation Meets Hyperspectral Image: New Paradigm for Hyperspectral Image Classification | Weilian Zhou et.al. | 2502.12541 | null |
| 2025-02-17 | Achieving Upper Bound Accuracy of Joint Training in Continual Learning | Saleh Momeni et.al. | 2502.12388 | null |
| 2025-02-17 | OCT Data is All You Need: How Vision Transformers with and without Pre-training Benefit Imaging | Zihao Han et.al. | 2502.12379 | null |
| 2025-02-17 | AdaSplash: Adaptive Sparse Flash Attention | Nuno Gonçalves et.al. | 2502.12082 | link |
| 2025-02-17 | Masked Latent Prediction and Classification for Self-Supervised Audio Representation Learning | Aurian Quelennec et.al. | 2502.12031 | null |
| 2025-02-17 | Text Classification in the LLM Era - Where do we stand? | Sowmya Vajjala et.al. | 2502.11830 | null |
| 2025-02-17 | Variable-frame CNNLSTM for Breast Nodule Classification using Ultrasound Videos | Xiangxiang Cui et.al. | 2502.11481 | null |
| 2025-02-16 | Leveraging Conditional Mutual Information to Improve Large Language Model Fine-Tuning For Classification | Thanushon Sivakaran et.al. | 2502.11258 | null |
| 2025-02-16 | UNITE-FND: Reframing Multimodal Fake News Detection through Unimodal Scene Translation | Arka Mukherjee et.al. | 2502.11132 | null |
| 2025-02-16 | Towards Achieving Concept Completeness for Unsupervised Textual Concept Bottleneck Models | Milan Bhan et.al. | 2502.11100 | null |
| 2025-02-16 | Leveraging Large Language Models for Cybersecurity: Enhancing SMS Spam Detection with Robust and Context-Aware Text Classification | Mohsen Ahmadi et.al. | 2502.11014 | null |
| 2025-02-15 | Simulations of Common Unsupervised Domain Adaptation Algorithms for Image Classification | Ahmad Chaddad et.al. | 2502.10694 | null |
| 2025-02-15 | REAL: Realism Evaluation of Text-to-Image Generation Models for Effective Data Augmentation | Ran Li et.al. | 2502.10663 | null |
| 2025-02-14 | Simplifying DINO via Coding Rate Regularization | Ziyang Wu et.al. | 2502.10385 | null |
| 2025-02-14 | Ocular Disease Classification Using CNN with Deep Convolutional Generative Adversarial Network | Arun Kunwar et.al. | 2502.10334 | null |
| 2025-02-14 | SeWA: Selective Weight Average via Probabilistic Masking | Peng Wang et.al. | 2502.10119 | null |
| 2025-02-14 | On Space Folds of ReLU Neural Networks | Michal Lewandowski et.al. | 2502.09954 | null |
| 2025-02-13 | A CNN Approach to Automated Detection and Classification of Brain Tumors | Md. Zahid Hasan et.al. | 2502.09731 | null |
| 2025-02-13 | GAIA: A Global, Multi-modal, Multi-scale Vision-Language Dataset for Remote Sensing Image Analysis | Angelos Zavras et.al. | 2502.09598 | link |
| 2025-02-14 | Optimizing GPT for Video Understanding: Zero-Shot Performance and Prompt Engineering | Mark Beliaev et.al. | 2502.09573 | null |
| 2025-02-13 | Feature-based Graph Attention Networks Improve Online Continual Learning | Adjovi Sim et.al. | 2502.09143 | null |
| 2025-02-13 | A Hybrid Model for Few-Shot Text Classification Using Transfer and Meta-Learning | Jia Gao et.al. | 2502.09086 | null |
| 2025-02-13 | Hierarchical Vision Transformer with Prototypes for Interpretable Medical Image Classification | Luisa Gallée et.al. | 2502.08997 | null |
| 2025-02-13 | Quantum Approaches for Dysphonia Assessment in Small Speech Datasets | Ha Tran et.al. | 2502.08968 | null |
| 2025-02-12 | Measuring Diversity in Synthetic Datasets | Yuchang Zhu et.al. | 2502.08512 | null |
| 2025-02-12 | ViLa-MIL: Dual-scale Vision-Language Multiple Instance Learning for Whole Slide Image Classification | Jiangbo Shi et.al. | 2502.08391 | null |
| 2025-02-12 | Keep your distance: learning dispersed embeddings on $\mathbb{S}_d$ | Evgeniia Tokarchuk et.al. | 2502.08231 | null |
| 2025-02-12 | Riemannian Complex Hermit Positive Definite Convolution Network for Polarimetric SAR Image Classification | Junfei Shi et.al. | 2502.08137 | null |
| 2025-02-12 | Knowledge Swapping via Learning and Unlearning | Mingyu Xing et.al. | 2502.08075 | null |
| 2025-02-12 | Can Machine Learning Support the Selection of Studies for Systematic Literature Review Updates? | Marcelo Costalonga et.al. | 2502.08050 | null |
| 2025-02-11 | ESPFormer: Doubly-Stochastic Attention with Expected Sliced Transport Plans | Ashkan Shahbazi et.al. | 2502.07962 | null |
| 2025-02-11 | Optimizing Knowledge Distillation in Transformers: Enabling Multi-Head Attention without Alignment Barriers | Zhaodong Bing et.al. | 2502.07436 | null |
| 2025-02-11 | MoENAS: Mixture-of-Expert based Neural Architecture Search for jointly Accurate, Fair, and Robust Edge Deep Neural Networks | Lotfi Abdelkrim Mecharbat et.al. | 2502.07422 | null |
| 2025-02-11 | MGPATH: Vision-Language Model with Multi-Granular Prompt Learning for Few-Shot WSI Classification | Anh-Tien Nguyen et.al. | 2502.07409 | null |
| 2025-02-11 | Don’t Just Demo, Teach Me the Principles: A Principle-Based Multi-Agent Prompting Strategy for Text Classification | Peipei Wei et.al. | 2502.07165 | null |
| 2025-02-10 | From Image to Video: An Empirical Study of Diffusion Representations | Pedro Vélez et.al. | 2502.07001 | null |
| 2025-02-10 | Krum Federated Chain (KFC): Using blockchain to defend against adversarial attacks in Federated Learning | Mario García-Márquez et.al. | 2502.06917 | null |
| 2025-02-10 | Enhancing Performance of Explainable AI Models with Constrained Concept Refinement | Geyu Liang et.al. | 2502.06775 | null |
| 2025-02-10 | Efficient Scientific Full Text Classification: The Case of EICAT Impact Assessments | Marc Felix Brinner et.al. | 2502.06551 | null |
| 2025-02-10 | Hybrid State-Space and GRU-based Graph Tokenization Mamba for Hyperspectral Image Classification | Muhammad Ahmad et.al. | 2502.06427 | null |
| 2025-02-10 | Provably Near-Optimal Federated Ensemble Distillation with Negligible Overhead | Won-Jun Jang et.al. | 2502.06349 | null |
| 2025-02-10 | From Pixels to Components: Eigenvector Masking for Visual Representation Learning | Alice Bizeul et.al. | 2502.06314 | null |
| 2025-02-10 | Beyond Batch Learning: Global Awareness Enhanced Domain Adaptation | Lingkun Luo et.al. | 2502.06272 | null |
| 2025-02-10 | Multi-Scale Transformer Architecture for Accurate Medical Image Classification | Jiacheng Hu et.al. | 2502.06243 | null |
| 2025-02-10 | Low Tensor-Rank Adaptation of Kolmogorov–Arnold Networks | Yihang Gao et.al. | 2502.06153 | null |
| 2025-02-09 | Benchmarking Prompt Sensitivity in Large Language Models | Amirhossein Razavi et.al. | 2502.06065 | null |
| 2025-02-09 | ARISE: Iterative Rule Induction and Synthetic Data Generation for Text Classification | Yashwanth M. et.al. | 2502.05923 | null |
| 2025-02-07 | Training-free Neural Architecture Search through Variance of Knowledge of Deep Network Weights | Ondřej Týbl et.al. | 2502.04975 | null |
| 2025-02-07 | Enhancing Disinformation Detection with Explainable AI and Named Entity Replacement | Santiago González-Silot et.al. | 2502.04863 | null |
| 2025-02-07 | AIQViT: Architecture-Informed Post-Training Quantization for Vision Transformers | Runqing Jiang et.al. | 2502.04628 | null |
| 2025-02-06 | Augmented Conditioning Is Enough For Effective Training Image Generation | Jiahui Chen et.al. | 2502.04475 | null |
| 2025-02-06 | How does a Multilingual LM Handle Multiple Languages? | Santhosh Kakarla et.al. | 2502.04269 | null |
| 2025-02-06 | Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion | Marco Mistretta et.al. | 2502.04263 | link |
| 2025-02-06 | Expanding Training Data for Endoscopic Phenotyping of Eosinophilic Esophagitis | Juming Xiong et.al. | 2502.04199 | null |
| 2025-02-06 | Improving Natural Language Understanding for LLMs via Large-Scale Instruction Synthesis | Lin Yuan et.al. | 2502.03843 | null |
| 2025-02-06 | Self-Supervised Learning for Solar Radio Spectrum Classification | Siqi Li et.al. | 2502.03778 | null |
| 2025-02-06 | Conditional Diffusion Models are Medical Image Classifiers that Provide Explainability and Uncertainty for Free | Gian Mario Favero et.al. | 2502.03687 | null |
| 2025-02-05 | A Study in Dataset Distillation for Image Super-Resolution | Tobias Dietz et.al. | 2502.03656 | null |
| 2025-02-05 | Gompertz Linear Units: Leveraging Asymmetry for Enhanced Learning Dynamics | Indrashis Das et.al. | 2502.03654 | link |
| 2025-02-05 | Clinically-Inspired Hierarchical Multi-Label Classification of Chest X-rays with a Penalty-Based Loss Function | Mehrdad Asadi et.al. | 2502.03591 | link |
| 2025-02-05 | Optimal Task Order for Continual Learning of Multiple Tasks | Ziyan Li et.al. | 2502.03350 | null |
| 2025-02-05 | Out-of-Distribution Detection using Synthetic Data Generation | Momin Abbas et.al. | 2502.03323 | null |
| 2025-02-05 | Long-tailed Medical Diagnosis with Relation-aware Representation Learning and Iterative Classifier Calibration | Li Pan et.al. | 2502.03238 | link |
| 2025-02-05 | Adversarial Dependence Minimization | Pierre-François De Plaen et.al. | 2502.03227 | null |
| 2025-02-05 | Disentangling CLIP Features for Enhanced Localized Understanding | Samyak Rawelekar et.al. | 2502.02977 | null |
| 2025-02-05 | Slowing Learning by Erasing Simple Features | Lucia Quirke et.al. | 2502.02820 | null |
| 2025-02-04 | The Skin Game: Revolutionizing Standards for AI Dermatology Model Comparison | Łukasz Miętkiewicz et.al. | 2502.02500 | null |
| 2025-02-04 | BRIDLE: Generalized Self-supervised Learning with Quantization | Hoang M. Nguyen et.al. | 2502.02118 | null |
| 2025-02-04 | DCT-Mamba3D: Spectral Decorrelation and Spatial-Spectral Feature Extraction for Hyperspectral Image Classification | Weijia Cao et.al. | 2502.01986 | null |
| 2025-02-04 | Generative Data Mining with Longtail-Guided Diffusion | David S. Hayden et.al. | 2502.01980 | null |
| 2025-02-03 | A Multi-Scale Feature Fusion Framework Integrating Frequency Domain and Cross-View Attention for Dual-View X-ray Security Inspections | Shilong Hong et.al. | 2502.01710 | null |
| 2025-02-03 | Activation by Interval-wise Dropout: A Simple Way to Prevent Neural Networks from Plasticity Loss | Sangyeon Park et.al. | 2502.01342 | null |
| 2025-02-03 | A Framework for Double-Blind Federated Adaptation of Foundation Models | Nurbek Tastan et.al. | 2502.01289 | null |
| 2025-02-02 | Synthetic Artifact Auditing: Tracing LLM-Generated Synthetic Data Usage in Downstream Applications | Yixin Wu et.al. | 2502.00808 | null |
| 2025-02-02 | Enhanced Convolutional Neural Networks for Improved Image Classification | Xiaoran Yang et.al. | 2502.00663 | null |
| 2025-02-01 | Fast Vision Mamba: Pooling Spatial Dimensions for Accelerated Processing | Saarthak Kapse et.al. | 2502.00594 | null |
| 2025-01-31 | Redefining Machine Unlearning: A Conformal Prediction-Motivated Approach | Yingdan Shi et.al. | 2501.19403 | null |
| 2025-01-31 | An All-digital 65-nm Tsetlin Machine Image Classification Accelerator with 8.6 nJ per MNIST Frame at 60.3k Frames per Second | Svein Anders Tunheim et.al. | 2501.19347 | null |
| 2025-01-31 | Through the Looking Glass: LLM-Based Analysis of AR/VR Android Applications Privacy Policies | Abdulaziz Alghamdi et.al. | 2501.19223 | null |
| 2025-01-31 | Fairness Analysis of CLIP-Based Foundation Models for X-Ray Image Classification | Xiangyu Sun et.al. | 2501.19086 | null |
| 2025-01-31 | Memory-Efficient Fine-Tuning of Transformers via Token Selection | Antoine Simoulin et.al. | 2501.18824 | null |
| 2025-01-30 | OT-Transformer: A Continuous-time Transformer Architecture with Optimal Transport Regularization | Kelvin Kan et.al. | 2501.18793 | null |
| 2025-01-29 | Semantic Consistency Regularization with Large Language Models for Semi-supervised Sentiment Analysis | Kunrong Li et.al. | 2501.17598 | null |
| 2025-01-28 | Extending Information Bottleneck Attribution to Video Sequences | Veronika Solopova et.al. | 2501.16889 | link |
| 2025-01-28 | Misspellings in Natural Language Processing: A survey | Gianluca Sperduti et.al. | 2501.16836 | null |
| 2025-01-28 | DebugAgent: Efficient and Interpretable Error Slice Discovery for Comprehensive Model Debugging | Muxi Chen et.al. | 2501.16751 | null |
| 2025-01-28 | Toward Relative Positional Encoding in Spiking Transformers | Changze Lv et.al. | 2501.16745 | null |
| 2025-01-28 | Improving Interpretability and Accuracy in Neuro-Symbolic Rule Extraction Using Class-Specific Sparse Filters | Parth Padalkar et.al. | 2501.16677 | null |
| 2025-01-27 | Generating customized prompts for Zero-Shot Rare Event Medical Image Classification using LLM | Payal Kamboj et.al. | 2501.16481 | link |
| 2025-01-28 | SPECIAL: Zero-shot Hyperspectral Image Classification With CLIP | Li Pang et.al. | 2501.16222 | link |
| 2025-01-27 | The Linear Attention Resurrection in Vision Transformer | Chuanyang Zheng et.al. | 2501.16182 | null |
| 2025-01-27 | Enhancing the Convergence of Federated Learning Aggregation Strategies with Limited Data | Judith Sáinz-Pardo Díaz et.al. | 2501.15949 | null |
| 2025-01-26 | Quantum-Enhanced Attention Mechanism in NLP: A Hybrid Classical-Quantum Approach | S. M. Yousuf Iqbal Tomal et.al. | 2501.15630 | null |
| 2025-01-26 | Building Efficient Lightweight CNN Models | Nathan Isong et.al. | 2501.15547 | null |
| 2025-01-26 | Fuzzy-aware Loss for Source-free Domain Adaptation in Visual Emotion Recognition | Ying Zheng et.al. | 2501.15519 | null |
| 2025-01-26 | Variational Bayesian Adaptive Learning of Deep Latent Variables for Acoustic Knowledge Transfer | Hu Hu et.al. | 2501.15496 | null |
| 2025-01-25 | Pre-trained Model Guided Mixture Knowledge Distillation for Adversarial Federated Learning | Yu Qiao et.al. | 2501.15257 | null |
| 2025-01-24 | Feasible Learning | Juan Ramirez et.al. | 2501.14912 | link |
| 2025-01-24 | Rethinking Foundation Models for Medical Image Classification through a Benchmark Study on MedMNIST | Fuping Wu et.al. | 2501.14685 | null |
| 2025-01-24 | Geometric Mean Improves Loss For Few-Shot Learning | Tong Wu et.al. | 2501.14593 | null |
| 2025-01-24 | Idiom Detection in Sorani Kurdish Texts | Skala Kamaran Omer et.al. | 2501.14528 | null |
| 2025-01-24 | $SpikePack$ : Enhanced Information Flow in Spiking Neural Networks with High Hardware Compatibility | Guobin Shen et.al. | 2501.14484 | null |
| 2025-01-24 | Impact of Batch Normalization on Convolutional Network Representations | Hermanus L. Potgieter et.al. | 2501.14441 | null |
| 2025-01-24 | Quantum Neural Networks: A Comparative Analysis and Noise Robustness Evaluation | Tasnim Ahmed et.al. | 2501.14412 | null |
| 2025-01-24 | Correlation-Based Band Selection for Hyperspectral Image Classification | Dibyabha Deb et.al. | 2501.14338 | link |
| 2025-01-24 | Relative Layer-Wise Relevance Propagation: a more Robust Neural Networks eXplaination | Eric Nyiri et.al. | 2501.14322 | null |
| 2025-01-24 | A Comprehensive Framework for Semantic Similarity Detection Using Transformer Architectures and Enhanced Ensemble Techniques | Lifu Gao et.al. | 2501.14288 | null |
| 2025-01-24 | TLXML: Task-Level Explanation of Meta-Learning via Influence Functions | Yoshihiro Mitsuka et.al. | 2501.14271 | null |
| 2025-01-23 | A Study of the Plausibility of Attention between RNN Encoders in Natural Language Inference | Duc Hau Nguyen et.al. | 2501.13735 | null |
| 2025-01-23 | A Transformer-based Autoregressive Decoder Architecture for Hierarchical Text Classification | Younes Yousef et.al. | 2501.13598 | link |
| 2025-01-23 | Multi-Level Attention and Contrastive Learning for Enhanced Text Classification with an Optimized Transformer | Jia Gao et.al. | 2501.13467 | null |
| 2025-01-23 | Atmospheric Noise-Resilient Image Classification in a Real-World Scenario: Using Hybrid CNN and Pin-GTSVM | Shlok Mehendale et.al. | 2501.13422 | null |
| 2025-01-23 | AEON: Adaptive Estimation of Instance-Dependent In-Distribution and Out-of-Distribution Label Noise for Robust Learning | Arpit Garg et.al. | 2501.13389 | null |
| 2025-01-23 | Multi-aspect Knowledge Distillation with Large Language Model | Taegyeong Lee et.al. | 2501.13341 | null |
| 2025-01-22 | Revisiting Data Augmentation for Ultrasound Images | Adam Tupper et.al. | 2501.13193 | link |
| 2025-01-22 | Regularization, Semi-supervision, and Supervision for a Plausible Attention-Based Explanation | Duc Hau Nguyen et.al. | 2501.12775 | link |
| 2025-01-22 | Estimating the Conformal Prediction Threshold from Noisy Labels | Coby Penso et.al. | 2501.12749 | link |
| 2025-01-22 | Adapting OpenAI’s CLIP Model for Few-Shot Image Inspection in Manufacturing Quality Control: An Expository Case Study with Multiple Application Examples | Fadel M. Megahed et.al. | 2501.12596 | null |
| 2025-01-21 | Efficient Lung Ultrasound Severity Scoring Using Dedicated Feature Extractor | Jiaqi Guo et.al. | 2501.12524 | null |
| 2025-01-21 | CCESAR: Coastline Classification-Extraction From SAR Images Using CNN-U-Net Combination | Vidhu Arora et.al. | 2501.12384 | null |
| 2025-01-21 | CBVLM: Training-free Explainable Concept-based Large Vision Language Models for Medical Image Classification | Cristiano Patrício et.al. | 2501.12266 | null |
| 2025-01-21 | Early Detection and Classification of Breast Cancer Using Deep Learning Techniques | Mst. Mumtahina Labonno et.al. | 2501.12217 | null |
| 2025-01-21 | UAV-Assisted Real-Time Disaster Detection Using Optimized Transformer Model | Branislava Jankovic et.al. | 2501.12087 | null |
| 2025-01-20 | Communication-Efficient Federated Learning Based on Explanation-Guided Pruning for Remote Sensing Image Classification | Jonas Klotz et.al. | 2501.11493 | null |
| 2025-01-22 | QGAIC: Quantum Inspired Genetic Algorithm for Image Classification | Akhilesh Kumar Singh et.al. | 2501.11477 | null |
| 2025-01-20 | GenVidBench: A Challenging Benchmark for Detecting AI-Generated Video | Zhenliang Ni et.al. | 2501.11340 | null |
| 2025-01-20 | KPL: Training-Free Medical Knowledge Mining of Vision-Language Models | Jiaxiang Liu et.al. | 2501.11231 | link |
| 2025-01-19 | CLOFAI: A Dataset of Real And Fake Image Classification Tasks for Continual Learning | William Doherty et.al. | 2501.11140 | link |
| 2025-01-19 | Leveraging counterfactual concepts for debugging and improving CNN model performance | Syed Ali Tariq et.al. | 2501.11087 | null |
| 2025-01-17 | A Vision-Language Framework for Multispectral Scene Representation Using Language-Grounded Features | Enes Karanfil et.al. | 2501.10144 | null |
| 2025-01-17 | Classifier Ensemble for Efficient Uncertainty Calibration of Deep Neural Networks for Image Classification | Michael Schulze et.al. | 2501.10089 | null |
| 2025-01-17 | One-D-Piece: Image Tokenizer Meets Quality-Controllable Compression | Keita Miwa et.al. | 2501.10064 | null |
| 2025-01-17 | LWGANet: A Lightweight Group Attention Backbone for Remote Sensing Visual Tasks | Wei Lu et.al. | 2501.10040 | link |
| 2025-01-16 | Empirical Evaluation of Embedding Models in the Context of Text Classification in Document Review in Construction Delay Disputes | Fusheng Wei et.al. | 2501.09859 | null |
| 2025-01-16 | SRE-Conv: Symmetric Rotation Equivariant Convolution for Biomedical Image Classification | Yuexi Du et.al. | 2501.09753 | link |
| 2025-01-16 | Practical Continual Forgetting for Pre-trained Vision Models | Hongbo Zhao et.al. | 2501.09705 | link |
| 2025-01-16 | Multimodal Marvels of Deep Learning in Medical Diagnosis: A Comprehensive Review of COVID-19 Detection | Md Shofiqul Islama et.al. | 2501.09506 | link |
| 2025-01-16 | HydraMix: Multi-Image Feature Mixing for Small Data Image Classification | Christoph Reinders et.al. | 2501.09504 | null |
| 2025-01-16 | Quantum-Enhanced Transformers for Robust Acoustic Scene Classification in IoT Environments | Minh K. Quan et.al. | 2501.09394 | null |
| 2025-01-16 | Shape-Based Single Object Classification Using Ensemble Method Classifiers | Nur Shazwani Kamarudin et.al. | 2501.09311 | null |
| 2025-01-16 | Efficient Few-Shot Medical Image Analysis via Hierarchical Contrastive Vision-Language Learning | Harrison Fuller et.al. | 2501.09294 | null |
| 2025-01-16 | A Simple Graph Contrastive Learning Framework for Short Text Classification | Yonghao Liu et.al. | 2501.09219 | link |
| 2025-01-16 | Boosting Short Text Classification with Multi-Source Information Exploration and Dual-Level Contrastive Learning | Yonghao Liu et.al. | 2501.09214 | link |
| 2025-01-15 | Augmenting Human-Annotated Training Data with Large Language Model Generation and Distillation in Open-Response Assessment | Conrad Borchers et.al. | 2501.09126 | null |
| 2025-01-15 | IDEA: Image Description Enhanced CLIP-Adapter | Zhipeng Ye et.al. | 2501.08816 | null |
| 2025-01-15 | MIAFEx: An Attention-based Feature Extraction Method for Medical Image Classification | Oscar Ramos-Soto et.al. | 2501.08562 | null |
| 2025-01-14 | Towards Zero-Shot & Explainable Video Description by Reasoning over Graphs of Events in Space and Time | Mihai Masala et.al. | 2501.08460 | null |
| 2025-01-14 | Large Language Models For Text Classification: Case Study And Comprehensive Review | Arina Kostina et.al. | 2501.08457 | null |
| 2025-01-14 | READ: Reinforcement-based Adversarial Learning for Text Classification with Limited Labeled Data | Rohit Sharma et.al. | 2501.08035 | null |
| 2025-01-14 | Training Hybrid Neural Networks with Multimode Optical Nonlinearities Using Digital Twins | Ilker Oguz et.al. | 2501.07991 | null |
| 2025-01-14 | deepTerra – AI Land Classification Made Easy | Andrew Keith Wilkinson et.al. | 2501.07859 | null |
| 2025-01-14 | A Low-cost and Ultra-lightweight Binary Neural Network for Traffic Signal Recognition | Mingke Xiao et.al. | 2501.07808 | null |
| 2025-01-14 | Balance Divergence for Knowledge Distillation | Yafei Qi et.al. | 2501.07804 | null |
| 2025-01-14 | Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding | Zhaokai Wang et.al. | 2501.07783 | link |
| 2025-01-13 | Universal Training of Neural Networks to Achieve Bayes Optimal Classification Accuracy | Mohammadreza Tavasoli Naeini et.al. | 2501.07754 | null |
| 2025-01-13 | Uncertainty Guarantees on Automated Precision Weeding using Conformal Prediction | Paul Melki et.al. | 2501.07185 | null |
| 2025-01-13 | Adaptive Noise-Tolerant Network for Image Segmentation | Weizhi Li et.al. | 2501.07163 | null |
| 2025-01-12 | LarvSeg: Exploring Image Classification Data For Large Vocabulary Semantic Segmentation via Category-wise Attentive Classifier | Haojun Yu et.al. | 2501.06862 | link |
| 2025-01-12 | Rice Leaf Disease Detection: A Comparative Study Between CNN, Transformer and Non-neural Network Architectures | Samia Mehnaz et.al. | 2501.06740 | null |
| 2025-01-12 | Multi-Label Scene Classification in Remote Sensing Benefits from Image Super-Resolution | Ashitha Mudraje et.al. | 2501.06720 | null |
| 2025-01-11 | Synthetic Feature Augmentation Improves Generalization Performance of Language Models | Ashok Choudhary et.al. | 2501.06434 | null |
| 2025-01-10 | Kolmogorov-Arnold networks for metal surface defect classification | Maciej Krzywda et.al. | 2501.06389 | null |
| 2025-01-10 | Merging Feed-Forward Sublayers for Compressed Transformers | Neha Verma et.al. | 2501.06126 | link |
| 2025-01-10 | Averaged Adam accelerates stochastic optimization in the training of deep neural network approximations for partial differential equation and optimal control problems | Steffen Dereich et.al. | 2501.06081 | link |
| 2025-01-10 | Constrained Over-the-Air Model Updating for Wireless Online Federated Learning with Delayed Information | Juncheng Wang et.al. | 2501.05637 | null |
| 2025-01-10 | The Impact of Model Scaling on Seen and Unseen Language Performance | Rhitabrat Pokharel et.al. | 2501.05629 | null |
| 2025-01-09 | Vision-Language Models for Autonomous Driving: CLIP-Based Dynamic Scene Understanding | Mohammed Elhenawy et.al. | 2501.05566 | null |
| 2025-01-09 | Spatial Information Integration in Small Language Models for Document Layout Generation and Classification | Pablo Melendez et.al. | 2501.05497 | null |
| 2025-01-09 | An Empirical Study of Autoregressive Pre-training from Videos | Jathushan Rajasegaran et.al. | 2501.05453 | null |
| 2025-01-09 | A 1Mb mixed-precision quantized encoder for image classification and patch-based compression | Van Thien Nguyen et.al. | 2501.05097 | null |
| 2025-01-09 | A CT Image Classification Network Framework for Lung Tumors Based on Pre-trained MobileNetV2 Model and Transfer learning, And Its Application and Market Analysis in the Medical field | Ziyang Gao et.al. | 2501.04996 | null |
| 2025-01-09 | MambaHSI: Spatial-Spectral Mamba for Hyperspectral Image Classification | Yapeng Li et.al. | 2501.04944 | null |
| 2025-01-09 | A New Perspective on Privacy Protection in Federated Learning with Granular-Ball Computing | Guannan Lai et.al. | 2501.04940 | link |
| 2025-01-09 | ThriftLLM: On Cost-Effective Selection of Large Language Models for Classification Queries | Keke Huang et.al. | 2501.04901 | null |
| 2025-01-09 | Online Continual Learning: A Systematic Literature Review of Approaches, Challenges, and Benchmarks | Seyed Amir Bidaki et.al. | 2501.04897 | link |
| 2025-01-08 | Planarian Neural Networks: Evolutionary Patterns from Basic Bilateria Shaping Modern Artificial Neural Network Architectures | Ziyuan Huang et.al. | 2501.04700 | null |
| 2025-01-08 | Discrete Wavelet Transform-Based Capsule Network for Hyperspectral Image Classification | Zhiqiang Gao et.al. | 2501.04643 | null |
| 2025-01-08 | Enhancing Scene Classification in Cloudy Image Scenarios: A Collaborative Transfer Method with Information Regulation Mechanism using Optical Cloud-Covered and SAR Remote Sensing Images | Yuze Wang et.al. | 2501.04283 | null |
| 2025-01-08 | Comparison of Neural Models for X-ray Image Classification in COVID-19 Detection | Jimi Togni et.al. | 2501.04196 | null |
| 2025-01-07 | Temporal Feature Weaving for Neonatal Echocardiographic Viewpoint Video Classification | Satchel French et.al. | 2501.03967 | link |
| 2025-01-07 | Dolphin: Closed-loop Open-ended Auto-research through Thinking, Practice, and Feedback | Jiakang Yuan et.al. | 2501.03916 | null |
| 2025-01-07 | MedFocusCLIP : Improving few shot classification in medical datasets using pixel wise attention | Aadya Arora et.al. | 2501.03839 | null |
| 2025-01-07 | LHGNN: Local-Higher Order Graph Neural Networks For Audio Classification and Tagging | Shubhr Singh et.al. | 2501.03464 | null |
| 2025-01-06 | FTA-FTL: A Fine-Tuned Aggregation Federated Transfer Learning Scheme for Lithology Microscopic Image Classification | Keyvan RahimiZadeh et.al. | 2501.03349 | link |
| 2025-01-06 | CM3T: Framework for Efficient Multimodal Learning for Inhomogeneous Interaction Datasets | Tanay Agrawal et.al. | 2501.03332 | null |
| 2025-01-06 | Plant Leaf Disease Detection and Classification Using Deep Learning: A Review and A Proposed System on Bangladesh’s Perspective | Md. Jalal Uddin Chowdhury et.al. | 2501.03305 | null |
| 2025-01-06 | Deep-Relative-Trust-Based Diffusion for Decentralized Deep Learning | Muyun Li et.al. | 2501.03162 | null |
| 2025-01-06 | Graph-based Retrieval Augmented Generation for Dynamic Few-shot Text Classification | Yubo Wang et.al. | 2501.02844 | null |
| 2025-01-06 | TARDiS : Text Augmentation for Refining Diversity and Separability | Kyungmin Kim et.al. | 2501.02739 | null |
| 2025-01-05 | FedRSClip: Federated Learning for Remote Sensing Scene Classification Using Vision-Language Models | Hui Lin et.al. | 2501.02461 | null |
| 2025-01-04 | Exploring Secure Machine Learning Through Payload Injection and FGSM Attacks on ResNet-50 | Umesh Yadav et.al. | 2501.02147 | null |
| 2025-01-03 | A Separable Self-attention Inspired by the State Space Model for Computer Vision | Juntao Zhang et.al. | 2501.02040 | link |
| 2025-01-03 | Google is all you need: Semi-Supervised Transfer Learning Strategy For Light Multimodal Multi-Task Classification Model | Haixu Liu et.al. | 2501.01611 | null |
| 2025-01-02 | Multi-Modal Video Feature Extraction for Popularity Prediction | Haixu Liu et.al. | 2501.01422 | null |
| 2025-01-02 | A Multi-task Supervised Compression Model for Split Computing | Yoshitomo Matsubara et.al. | 2501.01420 | link |
| 2025-01-02 | Multi-Head Explainer: A General Framework to Improve Explainability in CNNs and Transformers | Bohang Sun et.al. | 2501.01311 | null |
| 2025-01-02 | FAST: Fast Audio Spectrogram Transformer | Anugunj Naman et.al. | 2501.01104 | null |
| 2025-01-01 | A Novel Approach using CapsNet and Deep Belief Network for Detection and Identification of Oral Leukopenia | Hirthik Mathesh GV et.al. | 2501.00876 | null |
| 2025-01-01 | Ensuring superior learning outcomes and data security for authorized learner | Jeongho Bang et.al. | 2501.00754 | null |
| 2024-12-31 | TSPE: Task-Specific Prompt Ensemble for Improved Zero-Shot Audio Classification | Nishit Anand et.al. | 2501.00398 | null |
| 2024-12-31 | Exploring Variability in Fine-Tuned Models for Text Classification with DistilBERT | Giuliano Lorenzoni et.al. | 2501.00241 | null |
| 2024-12-30 | The Text Classification Pipeline: Starting Shallow going Deeper | Marco Siino et.al. | 2501.00174 | null |
| 2024-12-30 | Text Classification: Neural Networks VS Machine Learning Models VS Pre-trained Models | Christos Petridis et.al. | 2412.21022 | null |
| 2024-12-30 | FPGA-based Acceleration of Neural Network for Image Classification using Vitis AI | Zhengdong Li et.al. | 2412.20974 | null |
| 2024-12-30 | Uncertainty-Aware Out-of-Distribution Detection with Gaussian Processes | Yang Chen et.al. | 2412.20918 | null |
| 2024-12-30 | UniRS: Unifying Multi-temporal Remote Sensing Tasks through Vision Language Models | Yujie Li et.al. | 2412.20742 | null |
| 2024-12-30 | Improving Acoustic Scene Classification in Low-Resource Conditions | Zhi Chen et.al. | 2412.20722 | null |
| 2024-12-29 | Hilbert Curve Based Molecular Sequence Analysis | Sarwan Ali et.al. | 2412.20616 | null |
| 2024-12-29 | A Novel FPGA-based CNN Hardware Accelerator: Optimization for Convolutional Layers using Karatsuba Ofman Multiplier | Amit Sarkar et.al. | 2412.20393 | null |
| 2024-12-29 | HindiLLM: Large Language Model for Hindi | Sanjay Chouhan et.al. | 2412.20357 | null |
| 2024-12-29 | Deep Learning in Image Classification: Evaluating VGG19’s Performance on Complex Visual Data | Weijie He et.al. | 2412.20345 | null |
| 2024-12-28 | Few-shot Algorithm Assurance | Dang Nguyen et.al. | 2412.20275 | null |
| 2024-12-27 | Asymmetrical Reciprocity-based Federated Learning for Resolving Disparities in Medical Diagnosis | Jiaqi Wang et.al. | 2412.19654 | null |
| 2024-12-27 | Enhancing Fine-grained Image Classification through Attentive Batch Training | Duy M. Le et.al. | 2412.19606 | null |
| 2024-12-27 | A Comparative Study of Machine Unlearning Techniques for Image and Text Classification Models | Omar M. Safa et.al. | 2412.19583 | null |
| 2024-12-27 | Multi-label Classification using Deep Multi-order Context-aware Kernel Networks | Mingyuan Jiu et.al. | 2412.19491 | null |
| 2024-12-27 | Residual Feature-Reutilization Inception Network for Image Classification | Yuanpeng He et.al. | 2412.19433 | null |
| 2024-12-27 | An In-Depth Analysis of Adversarial Discriminative Domain Adaptation for Digit Classification | Eugene Choi et.al. | 2412.19391 | link |
| 2024-12-26 | Assessing Pre-trained Models for Transfer Learning through Distribution of Spectral Components | Tengxue Zhang et.al. | 2412.19085 | null |
| 2024-12-26 | Let the Rule Speak: Enhancing In-context Learning Debiasing with Interpretability | Ruixi Lin et.al. | 2412.19018 | null |
| 2024-12-25 | Injecting Bias into Text Classification Models using Backdoor Attacks | A. Dilara Yavuz et.al. | 2412.18975 | null |
| 2024-12-25 | Research Experiment on Multi-Model Comparison for Chinese Text Classification Tasks | JiaCheng Li et.al. | 2412.18908 | null |
| 2024-12-24 | VisionGRU: A Linear-Complexity RNN Model for Efficient Image Analysis | Shicheng Yin et.al. | 2412.18178 | link |
| 2024-12-24 | Beyond Gradient Averaging in Parallel Optimization: Improved Robustness through Gradient Agreement Filtering | Francois Chaubard et.al. | 2412.18052 | null |
| 2024-12-23 | Explainability in Neural Networks for Natural Language Processing Tasks | Melkamu Mersha et.al. | 2412.18036 | null |
| 2024-12-23 | COBRA: COmBinatorial Retrieval Augmentation for Few-Shot Learning | Arnav M. Das et.al. | 2412.17684 | null |
| 2024-12-23 | Resource-Aware Arabic LLM Creation: Model Adaptation, Integration, and Multi-Domain Testing | Prakash Aryan et.al. | 2412.17548 | link |
| 2024-12-23 | Domain-Incremental Learning for Audio Classification | Manjunath Mulimani et.al. | 2412.17424 | null |
| 2024-12-23 | An Experimental Evaluation of Japanese Tokenizers for Sentiment-Based Text Classification | Andre Rusli et.al. | 2412.17361 | link |
| 2024-12-23 | DiffFormer: a Differential Spatial-Spectral Transformer for Hyperspectral Image Classification | Muhammad Ahmad et.al. | 2412.17350 | link |
| 2024-12-22 | Survey on Abstractive Text Summarization: Dataset, Models, and Metrics | Gospel Ozioma Nnadi et.al. | 2412.17165 | link |
| 2024-12-22 | LH-Mix: Local Hierarchy Correlation Guided Mixup over Hierarchical Prompt Tuning | Fanshuang Kong et.al. | 2412.16963 | link |
| 2024-12-22 | Predicting the Reliability of an Image Classifier under Image Distortion | Dang Nguyen et.al. | 2412.16881 | null |
| 2024-12-21 | Forget Vectors at Play: Universal Input Perturbations Driving Machine Unlearning in Image Classification | Changchang Sun et.al. | 2412.16780 | null |
| 2024-12-21 | UNEM: UNrolled Generalized EM for Transductive Few-Shot Learning | Long Zhou et.al. | 2412.16739 | link |
| 2024-12-20 | Mamba2D: A Natively Multi-Dimensional State-Space Model for Vision Tasks | Enis Baty et.al. | 2412.16146 | link |
| 2024-12-20 | Towards Interpretable Radiology Report Generation via Concept Bottlenecks using a Multi-Agentic RAG | Hasan Md Tusfiqur Alam et.al. | 2412.16086 | link |
| 2024-12-20 | A Thorough Investigation into the Application of Deep CNN for Enhancing Natural Language Processing Capabilities | Chang Weng et.al. | 2412.15900 | null |
| 2024-12-20 | Continual Learning Using a Kernel-Based Method Over Foundation Models | Saleh Momeni et.al. | 2412.15571 | link |
| 2024-12-19 | Time Will Tell: Timing Side Channels via Output Token Count in Large Language Models | Tianchen Zhang et.al. | 2412.15431 | null |
| 2024-12-19 | Till the Layers Collapse: Compressing a Deep Neural Network through the Lenses of Batch Normalization Layers | Zhu Liao et.al. | 2412.15077 | null |
| 2024-12-18 | Zero-Shot Prompting and Few-Shot Fine-Tuning: Revisiting Document Image Classification Using Large Language Models | Anna Scius-Bertrand et.al. | 2412.13859 | null |
| 2024-12-18 | Modelling Multi-modal Cross-interaction for ML-FSIC Based on Local Feature Selection | Kun Yan et.al. | 2412.13732 | null |
| 2024-12-18 | MBInception: A new Multi-Block Inception Model for Enhancing Image Processing Efficiency | Fatemeh Froughirad et.al. | 2412.13703 | null |
| 2024-12-17 | Identifying Bias in Deep Neural Networks Using Image Transforms | Sai Teja Erukude et.al. | 2412.13079 | link |
| 2024-12-17 | Token-Level Graphs for Short Text Classification | Gregor Donabauer et.al. | 2412.12754 | link |
| 2024-12-17 | Your Next State-of-the-Art Could Come from Another Domain: A Cross-Domain Analysis of Hierarchical Text Classification | Nan Li et.al. | 2412.12744 | link |
| 2024-12-17 | ShotVL: Human-Centric Highlight Frame Retrieval via Language Queries | Wangyu Xue et.al. | 2412.12675 | null |
| 2024-12-17 | Structural Pruning via Spatial-aware Information Redundancy for Semantic Segmentation | Dongyue Wu et.al. | 2412.12672 | link |
| 2024-12-19 | RemoteTrimmer: Adaptive Structural Pruning for Remote Sensing Image Classification | Guangwenjie Zou et.al. | 2412.12603 | link |
| 2024-12-17 | Addressing Small and Imbalanced Medical Image Datasets Using Generative Models: A Comparative Study of DDPM and PGGANs with Random and Greedy K Sampling | Iman Khazrak et.al. | 2412.12532 | link |
| 2024-12-16 | Gramian Multimodal Representation Learning and Alignment | Giordano Cicchetti et.al. | 2412.11959 | link |
| 2024-12-16 | The Impact of Generalization Techniques on the Interplay Among Privacy, Utility, and Fairness in Image Classification | Ahmad Hassanpour et.al. | 2412.11951 | null |
| 2024-12-16 | Does VLM Classification Benefit from LLM Description Semantics? | Pingchuan Ma et.al. | 2412.11917 | link |
| 2024-12-16 | Discrepancy-Aware Attention Network for Enhanced Audio-Visual Zero-Shot Learning | RunLin Yu et.al. | 2412.11715 | null |
| 2024-12-16 | LMM-Regularized CLIP Embeddings for Image Classification | Maria Tzelepi et.al. | 2412.11663 | null |
| 2024-12-16 | Non-Convex Optimization in Federated Learning via Variance Reduction and Adaptive Learning | Dipanwita Thakur et.al. | 2412.11660 | null |
| 2024-12-16 | CNNtention: Can CNNs do better with Attention? | Julian Glattki et.al. | 2412.11657 | link |
| 2024-12-16 | Explicit and Implicit Graduated Optimization in Deep Neural Networks | Naoki Sato et.al. | 2412.11501 | link |
| 2024-12-16 | Towards Better Multi-task Learning: A Framework for Optimizing Dataset Combinations in Large Language Models | Zaifu Zhan et.al. | 2412.11455 | null |
| 2024-12-16 | Scaled Conjugate Gradient Method for Nonconvex Optimization in Deep Neural Networks | Naoki Sato et.al. | 2412.11400 | null |
| 2024-12-13 | Robust image classification with multi-modal large language models | Francesco Villani et.al. | 2412.10353 | null |
| 2024-12-13 | MVQ: Towards Efficient DNN Compression and Acceleration with Masked Vector Quantization | Shuaiting Li et.al. | 2412.10261 | null |
| 2024-12-13 | Label-template based Few-Shot Text Classification with Contrastive Learning | Guanghua Hou et.al. | 2412.10110 | null |
| 2024-12-13 | Data Pruning Can Do More: A Comprehensive Data Pruning Approach for Object Re-identification | Zi Yang et.al. | 2412.10091 | link |
| 2024-12-13 | Low-Resource Fast Text Classification Based on Intra-Class and Inter-Class Distance Calculation | Yanxu Mao et.al. | 2412.09922 | null |
| 2024-12-12 | DQA: An Efficient Method for Deep Quantization of Deep Neural Network Activations | Wenhao Hu et.al. | 2412.09687 | null |
| 2024-12-12 | Embeddings are all you need! Achieving High Performance Medical Image Classification through Training-Free Embedding Analysis | Raj Hansini Khoiwal et.al. | 2412.09445 | null |
| 2024-12-12 | Learned Compression for Compressed Learning | Dan Jacobellis et.al. | 2412.09405 | link |
| 2024-12-12 | Advancing Attribution-Based Neural Network Explainability through Relative Absolute Magnitude Layer-Wise Relevance Propagation and Multi-Component Evaluation | Davor Vukadin et.al. | 2412.09311 | link |
| 2024-12-13 | An Efficient Framework for Enhancing Discriminative Models via Diffusion Techniques | Chunxiao Li et.al. | 2412.09063 | null |
| 2024-12-12 | STEAM: Squeeze and Transform Enhanced Attention Module | Rishabh Sabharwal et.al. | 2412.09023 | null |
| 2024-12-12 | Stochastic Learning of Non-Conjugate Variational Posterior for Image Classification | Kart-Leong Lim et.al. | 2412.08951 | null |
| 2024-12-11 | BDA: Bangla Text Data Augmentation Framework | Md. Tariquzzaman et.al. | 2412.08753 | null |
| 2024-12-11 | Advancing Single- and Multi-task Text Classification through Large Language Model Fine-tuning | Hang Zhao et.al. | 2412.08587 | null |
| 2024-12-11 | ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts | Sinan Du et.al. | 2412.08341 | null |
| 2024-12-11 | Online training and pruning of photonic neural networks | Jiawei Zhang et.al. | 2412.08184 | null |
| 2024-12-11 | Wasserstein Distance Rivals Kullback-Leibler Divergence for Knowledge Distillation | Jiaming Lv et.al. | 2412.08139 | null |
| 2024-12-11 | Concept Bottleneck Large Language Models | Chung-En Sun et.al. | 2412.07992 | link |
| 2024-12-10 | FastDDS-Based Middleware System for Remote X-Ray Image Classification Using Raspberry Pi | Omar H. Khater et.al. | 2412.07818 | null |
| 2024-12-10 | Leveraging Content and Context Cues for Low-Light Image Enhancement | Igor Morawski et.al. | 2412.07693 | link |
| 2024-12-10 | Post-Training Non-Uniform Quantization for Convolutional Neural Networks | Ahmed Luqman et.al. | 2412.07391 | null |
| 2024-12-10 | Image Classification Using Singular Value Decomposition and Optimization | Isabela M. Yepes et.al. | 2412.07288 | link |
| 2024-12-10 | An Enhancement of CNN Algorithm for Rice Leaf Disease Image Classification in Mobile Applications | Kayne Uriel K. Rodrigo et.al. | 2412.07182 | null |
| 2024-12-09 | Convolution goes higher-order: a biologically inspired mechanism empowers image classification | Simone Azeglio et.al. | 2412.06740 | null |
| 2024-12-09 | Impact of Privacy Parameters on Deep Learning Models for Image Classification | Basanta Chaulagain et.al. | 2412.06689 | null |
| 2024-12-10 | Data Quality Enhancement on the Basis of Diversity with Large Language Models for Text Classification: Uncovered, Difficult, and Noisy | Min Zeng et.al. | 2412.06575 | null |
| 2024-12-09 | How Certain are Uncertainty Estimates? Three Novel Earth Observation Datasets for Benchmarking Uncertainty Quantification in Machine Learning | Yuanyuan Wang et.al. | 2412.06451 | null |
| 2024-12-09 | Optimizing Multi-Task Learning for Enhanced Performance in Large Language Models | Zhen Qi et.al. | 2412.06249 | null |
| 2024-12-08 | Hyperspectral Image Spectral-Spatial Feature Extraction via Tensor Principal Component Analysis | Yuemei Ren et.al. | 2412.06075 | null |
| 2024-12-08 | Vision Transformer-based Semantic Communications With Importance-Aware Quantization | Joohyuk Park et.al. | 2412.06038 | null |
| 2024-12-06 | Sparse autoencoders reveal selective remapping of visual concepts during adaptation | Hyesu Lim et.al. | 2412.05276 | link |
| 2024-12-06 | MTSpark: Enabling Multi-Task Learning with Spiking Neural Networks for Generalist Agents | Avaneesh Devkota et.al. | 2412.04847 | null |
| 2024-12-05 | Grounding Descriptions in Images informs Zero-Shot Visual Recognition | Shaunak Halbe et.al. | 2412.04429 | link |
| 2024-12-05 | FedDUAL: A Dual-Strategy with Adaptive Loss and Dynamic Aggregation for Mitigating Data Heterogeneity in Federated Learning | Pranab Sahoo et.al. | 2412.04416 | link |
| 2024-12-05 | Enhancing Whole Slide Image Classification through Supervised Contrastive Domain Adaptation | Ilán Carretero et.al. | 2412.04260 | null |
| 2024-12-05 | Demonstration Selection for In-Context Learning via Reinforcement Learning | Xubin Wang et.al. | 2412.03966 | null |
| 2024-12-05 | Quantized and Interpretable Learning Scheme for Deep Neural Networks in Classification Task | Alireza Maleki et.al. | 2412.03915 | null |
| 2024-12-05 | Multisource Collaborative Domain Generalization for Cross-Scene Remote Sensing Image Classification | Zhu Han et.al. | 2412.03897 | null |
| 2024-12-05 | Dual-Branch Subpixel-Guided Network for Hyperspectral Image Classification | Zhu Han et.al. | 2412.03893 | link |
| 2024-12-04 | Language Model Meets Prototypes: Towards Interpretable Text Classification Models through Prototypical Networks | Ximing Wen et.al. | 2412.03761 | null |
| 2024-12-05 | Continual Low-Rank Scaled Dot-product Attention | Ginés Carreto Picón et.al. | 2412.03214 | null |
| 2024-12-04 | Multi-Level Correlation Network For Few-Shot Image Classification | Yunkai Dang et.al. | 2412.03159 | link |
| 2024-12-04 | Assessing the performance of CT image denoisers using Laguerre-Gauss Channelized Hotelling Observer for lesion detection | Prabhat Kc et.al. | 2412.02920 | null |
| 2024-12-04 | Higher Order Transformers: Efficient Attention Mechanism for Tensor Structured Data | Soroush Omranpour et.al. | 2412.02919 | null |
| 2024-12-03 | Synergistic Development of Perovskite Memristors and Algorithms for Robust Analog Computing | Nanyang Ye et.al. | 2412.02779 | null |
| 2024-12-03 | Mixture of Physical Priors Adapter for Parameter-Efficient Fine-Tuning | Zhaozhi Wang et.al. | 2412.02759 | null |
| 2024-12-03 | Multimodal Remote Sensing Scene Classification Using VLMs and Dual-Cross Attention Networks | Jinjin Cai et.al. | 2412.02531 | null |
| 2024-12-04 | GenMix: Effective Data Augmentation with Generative Diffusion Model Image Editing | Khawar Islam et.al. | 2412.02366 | null |
| 2024-12-03 | Multi-Granularity Tibetan Textual Adversarial Attack Method Based on Masked Language Model | Xi Cao et.al. | 2412.02343 | null |
| 2024-12-03 | Active Learning via Classifier Impact and Greedy Selection for Interactive Image Retrieval | Leah Bar et.al. | 2412.02310 | link |
| 2024-12-03 | A Classic-Quantum Hybrid Network Framework: CQH-Net | Ao Liu et.al. | 2412.02059 | null |
| 2024-12-02 | PROFIT: A PROximal FIne Tuning Optimizer for Multi-Task Learning | Anirudh S Chakravarthy et.al. | 2412.01930 | null |
| 2024-12-02 | Concept Based Continuous Prompts for Interpretable Text Classification | Qian Chen et.al. | 2412.01644 | link |
| 2024-12-02 | NYT-Connections: A Deceptively Simple Text Classification Task that Stumps System-1 Thinkers | Angel Yahir Loredo Lopez et.al. | 2412.01621 | null |
| 2024-12-02 | Explaining the Unexplained: Revealing Hidden Correlations for Better Interpretability | Wen-Dong Jiang et.al. | 2412.01365 | null |
| 2024-12-02 | Class Distance Weighted Cross Entropy Loss for Classification of Disease Severity | Gorkem Polat et.al. | 2412.01246 | null |
| 2024-11-29 | LLM Teacher-Student Framework for Text Classification With No Manually Annotated Data: A Case Study in IPTC News Topic Classification | Taja Kuzman et.al. | 2411.19638 | link |
| 2024-11-29 | FairDD: Fair Dataset Distillation via Synchronized Matching | Qihang Zhou et.al. | 2411.19623 | null |
| 2024-11-29 | Memristive Nanowire Network for Energy Efficient Audio Classification: Pre-Processing-Free Reservoir Computing with Reduced Latency | Akshaya Rajesh et.al. | 2411.19611 | null |
| 2024-11-29 | Contextual Checkerboard Denoise – A Novel Neural Network-Based Approach for Classification-Aware OCT Image Denoising | Md. Touhidul Islam et.al. | 2411.19549 | link |
| 2024-11-28 | CLIP meets DINO for Tuning Zero-Shot Classifier using Unlabeled Image Collections | Mohamed Fazli Imam et.al. | 2411.19346 | link |
| 2024-11-28 | Quantum Neural Networks in Practice: A Comparative Study with Classical Models from Standard Data Sets to Industrial Images | Daniel Basilewitsch et.al. | 2411.19276 | null |
| 2024-11-28 | Controlling Participation in Federated Learning with Feedback | Michael Cummins et.al. | 2411.19242 | null |
| 2024-11-28 | Introducing Three New Benchmark Datasets for Hierarchical Text Classification | Jaco du Toit et.al. | 2411.19119 | null |
| 2024-11-28 | MVFormer: Diversifying Feature Normalization and Token Mixing for Efficient Vision Transformers | Jongseong Bae et.al. | 2411.18995 | null |
| 2024-11-27 | Fall Leaf Adversarial Attack on Traffic Sign Classification | Anthony Etim et.al. | 2411.18776 | null |
| 2024-11-27 | Leveraging Semi-Supervised Learning to Enhance Data Mining for Image Classification under Limited Labeled Data | Aoran Shen et.al. | 2411.18622 | null |
| 2024-11-27 | Pruning Deep Convolutional Neural Network Using Conditional Mutual Information | Tien Vu-Van et.al. | 2411.18578 | null |
| 2024-11-27 | Mixture of Experts in Image Classification: What’s the Sweet Spot? | Mathurin Videau et.al. | 2411.18322 | null |
| 2024-11-27 | KANs for Computer Vision: An Experimental Study | Karthik Mohan et.al. | 2411.18224 | null |
| 2024-11-27 | Spectral-Spatial Transformer with Active Transfer Learning for Hyperspectral Image Classification | Muhammad Ahmad et.al. | 2411.18115 | link |
| 2024-11-27 | Vision Mamba Distillation for Low-resolution Fine-grained Image Classification | Yao Chen et.al. | 2411.17980 | link |
| 2024-11-27 | Optimized Tradeoffs for Private Prediction with Majority Ensembling | Shuli Jiang et.al. | 2411.17965 | null |
| 2024-11-26 | What Differentiates Educational Literature? A Multimodal Fusion Approach of Transformers and Computational Linguistics | Jordan J. Bird et.al. | 2411.17593 | null |
| 2024-11-26 | TinyViM: Frequency Decoupling for Tiny Hybrid Vision Mamba | Xiaowen Ma et.al. | 2411.17473 | link |
| 2024-11-26 | SpikeAtConv: An Integrated Spiking-Convolutional Attention Architecture for Energy-Efficient Neuromorphic Vision Processing | Wangdan Liao et.al. | 2411.17439 | null |
| 2024-11-26 | CoA: Chain-of-Action for Generative Semantic Labels | Meng Wei et.al. | 2411.17406 | link |
| 2024-11-26 | BadScan: An Architectural Backdoor Attack on Visual State Space Models | Om Suhas Deshmukh et.al. | 2411.17283 | null |
| 2024-11-26 | An In-depth Investigation of Sparse Rate Reduction in Transformer-like Models | Yunzhe Hu et.al. | 2411.17182 | null |
| 2024-11-25 | Contrastive Multi-graph Learning with Neighbor Hierarchical Sifting for Semi-supervised Text Classification | Wei Ai et.al. | 2411.16787 | null |
| 2024-11-25 | A Supervised Machine Learning Approach for Assessing Grant Peer Review Reports | Gabriel Okasa et.al. | 2411.16662 | link |
| 2024-11-25 | Debiasing Classifiers by Amplifying Bias with Latent Diffusion and Large Language Models | Donggeun Ko et.al. | 2411.16079 | null |
| 2024-11-24 | Context-Aware Detection of Mixed Critical Events using Video Classification | Filza Akhlaq et.al. | 2411.15773 | null |
| 2024-11-23 | MUNBa: Machine Unlearning via Nash Bargaining | Jing Wu et.al. | 2411.15537 | null |
| 2024-11-23 | Twin Trigger Generative Networks for Backdoor Attacks against Object Detection | Zhiying Li et.al. | 2411.15439 | null |
| 2024-11-22 | MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs | Chaoyou Fu et.al. | 2411.15296 | null |
| 2024-11-21 | CODE-CL: COnceptor-Based Gradient Projection for DEep Continual Learning | Marco Paul E. Apolinario et.al. | 2411.15235 | null |
| 2024-11-21 | BiomedCoOp: Learning to Prompt for Biomedical Vision-Language Models | Taha Koleilat et.al. | 2411.15232 | link |
| 2024-11-22 | FOCUS: Knowledge-enhanced Adaptive Visual Compression for Few-shot Whole Slide Image Classification | Zhengrui Guo et.al. | 2411.14743 | link |
| 2024-11-21 | Adaptable Embeddings Network (AEN) | Stan Loosmore et.al. | 2411.13786 | null |
| 2024-11-20 | Hierarchical Text Classification (HTC) vs. eXtreme Multilabel Classification (XML): Two Sides of the Same Medal | Nerijus Bertalis et.al. | 2411.13687 | link |
| 2024-11-20 | Combining Autoregressive and Autoencoder Language Models for Text Classification | João Gonçalves et.al. | 2411.13282 | link |
| 2024-11-20 | MEGL: Multimodal Explanation-Guided Learning | Yifei Zhang et.al. | 2411.13053 | null |
| 2024-11-19 | Problem-dependent convergence bounds for randomized linear gradient compression | Thomas Flynn et.al. | 2411.12898 | null |
| 2024-11-19 | Enhancing Multi-Class Disease Classification: Neoplasms, Cardiovascular, Nervous System, and Digestive Disorders Using Advanced LLMs | Ahmed Akib Jawad Karim et.al. | 2411.12712 | null |
| 2024-11-22 | STREAM: A Universal State-Space Model for Sparse Geometric Data | Mark Schöne et.al. | 2411.12603 | null |
| 2024-11-19 | AdaCM$^2$: On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction | Yuanbin Man et.al. | 2411.12593 | null |
| 2024-11-19 | Zero-Shot Crate Digging: DJ Tool Retrieval Using Speech Activity, Music Structure And CLAP Embeddings | Iroro Orife et.al. | 2411.12209 | link |
| 2024-11-19 | Invariant Shape Representation Learning For Image Classification | Tonmoy Hossain et.al. | 2411.12201 | link |
| 2024-11-19 | Self-Supervised Learning in Deep Networks: A Pathway to Robust Few-Shot Classification | Yuyang Xiao et.al. | 2411.12151 | null |
| 2024-11-18 | Just Leaf It: Accelerating Diffusion Classifiers with Hierarchical Class Pruning | Arundhati S. Shanbhag et.al. | 2411.12073 | link |
| 2024-11-18 | Vision Language Models Are Few-Shot Audio Spectrogram Classifiers | Satvik Dixit et.al. | 2411.12058 | null |
| 2024-11-18 | Fair Distillation: Teaching Fairness from Biased Teachers in Medical Imaging | Milad Masroor et.al. | 2411.11939 | null |
| 2024-11-18 | Exploring Emerging Trends and Research Opportunities in Visual Place Recognition | Antonios Gasteratos et.al. | 2411.11481 | null |
| 2024-11-16 | MetaLA: Unified Optimal Linear Approximation to Softmax Attention Map | Yuhong Chou et.al. | 2411.10741 | null |
| 2024-11-16 | Diagnostic Text-guided Representation Learning in Hierarchical Classification for Pathological Whole Slide Image | Jiawen Li et.al. | 2411.10709 | null |
| 2024-11-16 | Multi-perspective Contrastive Logit Distillation | Qi Wang et.al. | 2411.10693 | null |
| 2024-11-15 | Vision Eagle Attention: A New Lens for Advancing Image Classification | Mahmudul Hasan et.al. | 2411.10564 | link |
| 2024-11-15 | On the Cost of Model-Serving Frameworks: An Experimental Evaluation | Pasquale De Rosa et.al. | 2411.10337 | null |
| 2024-11-15 | Embedding Byzantine Fault Tolerance into Federated Learning via Virtual Data-Driven Consistency Scoring Plugin | Youngjoon Lee et.al. | 2411.10212 | link |
| 2024-11-15 | Outliers resistant image classification by anomaly detection | Anton Sergeev et.al. | 2411.10150 | null |
| 2024-11-15 | Adapting the Biological SSVEP Response to Artificial Neural Networks | Emirhan Böge et.al. | 2411.10084 | null |
| 2024-11-15 | Evidential Federated Learning for Skin Lesion Image Classification | Rutger Hendrix et.al. | 2411.10071 | null |
| 2024-11-14 | Adversarial Attacks Using Differentiable Rendering: A Survey | Matthew Hull et.al. | 2411.09749 | null |
| 2024-11-14 | ResidualDroppath: Enhancing Feature Reuse over Residual Connections | Sejik Park et.al. | 2411.09475 | null |
| 2024-11-14 | SAG-ViT: A Scale-Aware, High-Fidelity Patching Approach with Graph Attention for Vision Transformers | Shravan Venkatraman et.al. | 2411.09420 | link |
| 2024-11-14 | Heuristical Comparison of Vision Transformers Against Convolutional Neural Networks for Semantic Segmentation on Remote Sensing Imagery | Ashim Dahal et.al. | 2411.09101 | link |
| 2024-11-13 | Computed tomography using meta-optics | Maksym Zhelyeznuyakov et.al. | 2411.08995 | null |
| 2024-11-13 | CoCoP: Enhancing Text Classification with LLM through Code Completion Prompt | Mohammad Mahdi Mohajeri et.al. | 2411.08979 | null |
| 2024-11-13 | ScaleNet: Scale Invariance Learning in Directed Graphs | Qin Jiang et.al. | 2411.08758 | link |
| 2024-11-13 | Efficient Whole Slide Image Classification through Fisher Vector Representation | Ravi Kant Gupta et.al. | 2411.08530 | null |
| 2024-11-12 | HMIL: Hierarchical Multi-Instance Learning for Fine-Grained Whole Slide Image Classification | Cheng Jin et.al. | 2411.07660 | null |
| 2024-11-12 | Semantic segmentation on multi-resolution optical and microwave data using deep learning | Jai G Singla et.al. | 2411.07581 | null |
| 2024-11-11 | The Inherent Adversarial Robustness of Analog In-Memory Computing | Corey Lammie et.al. | 2411.07023 | null |
| 2024-11-11 | ScaleKD: Strong Vision Transformers Could Be Excellent Teachers | Jiawei Fan et.al. | 2411.06786 | link |
| 2024-11-11 | A Text Classification Model Combining Adversarial Training with Pre-trained Language Model and neural networks: A Case Study on Telecom Fraud Incident Texts | Liu Zhuoxian et.al. | 2411.06772 | null |
| 2024-11-11 | Can KAN Work? Exploring the Potential of Kolmogorov-Arnold Networks in Computer Vision | Yueyang Cang et.al. | 2411.06727 | null |
| 2024-11-10 | Deep Active Learning in the Open World | Tian Xie et.al. | 2411.06353 | null |
| 2024-11-09 | Clustering Algorithms and RAG Enhancing Semi-Supervised Text Classification with Large LLMs | Shan Zhong et.al. | 2411.06175 | null |
| 2024-11-09 | AI-Compass: A Comprehensive and Effective Multi-module Testing Tool for AI Systems | Zhiyu Zhu et.al. | 2411.06146 | null |
| 2024-11-09 | Exploring Structural Nonlinearity in Binary Polariton-Based Neuromorphic Architectures | Evgeny Sedov et.al. | 2411.06124 | null |
| 2024-11-09 | Mutual-energy inner product optimization method for constructing feature coordinates and image classification in Machine Learning | Yuanxiu Wang et.al. | 2411.06100 | null |
| 2024-11-08 | GUIDEQ: Framework for Guided Questioning for progressive informational collection and classification | Priya Mishra et.al. | 2411.05991 | link |
| 2024-11-08 | FisherMask: Enhancing Neural Network Labeling Efficiency in Image Classification Using Fisher Information | Shreen Gul et.al. | 2411.05752 | link |
| 2024-11-08 | Visual-TCAV: Concept-based Attribution and Saliency Maps for Post-hoc Explainability in Image Classification | Antonio De Santis et.al. | 2411.05698 | null |
| 2024-11-08 | Efficient Audio-Visual Fusion for Video Classification | Mahrukh Awan et.al. | 2411.05603 | null |
| 2024-11-08 | Training objective drives the consistency of representational similarity across datasets | Laure Ciernik et.al. | 2411.05561 | link |
| 2024-11-08 | Estimating the Influence of Sequentially Correlated Literary Properties in Textual Classification: A Data-Centric Hypothesis-Testing Approach | Gideon Yoffe et.al. | 2411.04950 | null |
| 2024-11-07 | Attention Masks Help Adversarial Attacks to Bypass Safety Detectors | Yunfan Shi et.al. | 2411.04772 | link |
| 2024-11-07 | Zero-Shot Temporal Resolution Domain Adaptation for Spiking Neural Networks | Sanja Karilanova et.al. | 2411.04760 | null |
| 2024-11-07 | Is network fragmentation a useful complexity measure? | Coenraad Mouton et.al. | 2411.04695 | null |
| 2024-11-07 | DISCO: DISCovering Overfittings as Causal Rules for Text Classification Models | Zijian Zhang et.al. | 2411.04649 | null |
| 2024-11-07 | Neural Fingerprints for Adversarial Attack Detection | Haim Fisher et.al. | 2411.04533 | link |
| 2024-11-06 | Multimodal Structure-Aware Quantum Data Processing | Hala Hawashin et.al. | 2411.04242 | null |
| 2024-11-06 | RaVL: Discovering and Mitigating Spurious Correlations in Fine-Tuned Vision-Language Models | Maya Varma et.al. | 2411.04097 | link |
| 2024-11-06 | Overcoming label shift in targeted federated learning | Edvin Listo Zec et.al. | 2411.03799 | null |
| 2024-11-06 | Deferred Poisoning: Making the Model More Vulnerable via Hessian Singularization | Yuhao He et.al. | 2411.03752 | null |
| 2024-11-05 | Judge Like a Real Doctor: Dual Teacher Sample Consistency Framework for Semi-supervised Medical Image Classification | Zhang Qixiang et.al. | 2411.03041 | null |
| 2024-11-06 | Confidence Calibration of Classifiers with Many Classes | Adrien LeCoz et.al. | 2411.02988 | link |
| 2024-11-05 | Domain Expansion and Boundary Growth for Open-Set Single-Source Domain Generalization | Pengkun Jiao et.al. | 2411.02920 | null |
| 2024-11-05 | ADOPT: Modified Adam Can Converge with Any $β_2$ with the Optimal Rate | Shohei Taniguchi et.al. | 2411.02853 | link |
| 2024-11-05 | Integrated lithium niobate photonic computing circuit based on efficient and high-speed electro-optic conversion | Yaowen Hu et.al. | 2411.02734 | null |
| 2024-11-06 | Wave Network: An Ultra-Small Language Model | Xin Zhang et.al. | 2411.02674 | null |
| 2024-11-04 | FUSECAPS: Investigating Feature Fusion Based Framework for Capsule Endoscopy Image Classification | Bidisha Chakraborty et.al. | 2411.02637 | null |
| 2024-11-04 | TripletCLIP: Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives | Maitreya Patel et.al. | 2411.02545 | null |
| 2024-11-04 | A Comparative Analysis of Instruction Fine-Tuning LLMs for Financial Text Classification | Sorouralsadat Fatemi et.al. | 2411.02476 | null |
| 2024-11-04 | Exploiting Contextual Uncertainty of Visual Data for Efficient Training of Deep Models | Sharat Agarwal et.al. | 2411.01925 | null |
| 2024-11-03 | Optimizing Gastrointestinal Diagnostics: A CNN-Based Model for VCE Image Classification | Vaneeta Ahlawat et.al. | 2411.01652 | null |
| 2024-11-03 | ParseCaps: An Interpretable Parsing Capsule Network for Medical Image Diagnosis | Xinyu Geng et.al. | 2411.01564 | null |
| 2024-11-03 | Efficient Deep Learning Infrastructures for Embedded Computing Systems: A Comprehensive Survey and Future Envision | Xiangzhong Luo et.al. | 2411.01431 | null |
| 2024-11-02 | Combining Financial Data and News Articles for Stock Price Movement Prediction Using Large Language Models | Ali Elahi et.al. | 2411.01368 | null |
| 2024-11-02 | Optimizing Violence Detection in Video Classification Accuracy through 3D Convolutional Neural Networks | Aarjav Kavathia et.al. | 2411.01348 | null |
| 2024-11-02 | MIC: Medical Image Classification Using Chest X-ray (COVID-19 and Pneumonia) Dataset with the Help of CNN and Customized CNN | Nafiz Fahad et.al. | 2411.01163 | null |
| 2024-11-02 | Few-Class Arena: A Benchmark for Efficient Selection of Vision Models and Dataset Difficulty Measurement | Bryan Bo Cao et.al. | 2411.01099 | link |
| 2024-11-01 | Towards Robust Text Classification: Mitigating Spurious Correlations with Causal Learning | Yuqing Zhou et.al. | 2411.01045 | null |
| 2024-11-01 | FISHing in Uncertainty: Synthetic Contrastive Learning for Genetic Aberration Detection | Simon Gutwein et.al. | 2411.01025 | link |
| 2024-10-31 | Video Token Merging for Long-form Video Understanding | Seon-Ho Lee et.al. | 2410.23782 | null |
| 2024-10-31 | Neurobench: DCASE 2020 Acoustic Scene Classification benchmark on XyloAudio 2 | Weijie Ke et.al. | 2410.23776 | null |
| 2024-10-31 | QUEST-A: Untrained Filtering with Trained Focusing led to Enhanced Quantum Architectures | Lian-Hui Yu et.al. | 2410.23560 | link |
| 2024-11-01 | Large Language Models for Patient Comments Multi-Label Classification | Hajar Sakai et.al. | 2410.23528 | null |
| 2024-10-30 | Multilingual Vision-Language Pre-training for the Remote Sensing Domain | João Daniel Silva et.al. | 2410.23370 | null |
| 2024-10-30 | Domain-decomposed image classification algorithms using linear discriminant analysis and convolutional neural networks | Axel Klawonn et.al. | 2410.23359 | null |
| 2024-10-30 | CLIPErase: Efficient Unlearning of Visual-Textual Associations in CLIP | Tianyu Yang et.al. | 2410.23330 | null |
| 2024-10-30 | Don’t Just Pay Attention, PLANT It: Transfer L2R Models to Fine-tune Attention in Extreme Multi-Label Text Classification | Debjyoti Saharoy et.al. | 2410.23066 | null |
| 2024-10-30 | Automated Trustworthiness Oracle Generation for Machine Learning Text Classifiers | Lam Nguyen Tung et.al. | 2410.22663 | null |
| 2024-10-29 | Developing Convolutional Neural Networks using a Novel Lamarckian Co-Evolutionary Algorithm | Zaniar Sharifi et.al. | 2410.22487 | null |
| 2024-10-29 | EfficientNet with Hybrid Attention Mechanisms for Enhanced Breast Histopathology Classification: A Comprehensive Approach | Naren Sengodan et.al. | 2410.22392 | null |
| 2024-10-29 | DISCERN: Decoding Systematic Errors in Natural Language for Text Classifiers | Rakesh R. Menon et.al. | 2410.22239 | null |
| 2024-10-29 | Class-Aware Contrastive Optimization for Imbalanced Text Classification | Grigorii Khvatskii et.al. | 2410.22197 | null |
| 2024-10-29 | Active Learning for Vision-Language Models | Bardia Safaei et.al. | 2410.22187 | null |
| 2024-10-29 | Multi-Level Feature Distillation of Joint Teachers Trained on Distinct Image Datasets | Adrian Iordache et.al. | 2410.22184 | link |
| 2024-10-29 | Natural Language Processing for Analyzing Electronic Health Records and Clinical Notes in Cancer Research: A Review | Muhammad Bilal et.al. | 2410.22180 | null |
| 2024-10-29 | FakeFormer: Efficient Vulnerability-Driven Transformers for Generalisable Deepfake Detection | Dat Nguyen et.al. | 2410.21964 | null |
| 2024-10-29 | Bayesian Optimization for Hyperparameters Tuning in Neural Networks | Gabriele Onorato et.al. | 2410.21886 | null |
| 2024-10-29 | Advancing Efficient Brain Tumor Multi-Class Classification – New Insights from the Vision Mamba Model in Transfer Learning | Yinyi Lai et.al. | 2410.21872 | null |
| 2024-10-28 | Audio Classification of Low Feature Spectrograms Utilizing Convolutional Neural Networks | Noel Elias et.al. | 2410.21561 | null |
| 2024-10-30 | A Novel Score-CAM based Denoiser for Spectrographic Signature Extraction without Ground Truth | Noel Elias et.al. | 2410.21557 | null |
| 2024-10-28 | Attacking Misinformation Detection Using Adversarial Examples Generated by Language Models | Piotr Przybyła et.al. | 2410.20940 | null |
| 2024-10-28 | Data-Efficient Low-Complexity Acoustic Scene Classification via Distilling and Progressive Pruning | Bing Han et.al. | 2410.20775 | null |
| 2024-10-28 | Interpretable Image Classification with Adaptive Prototype-based Vision Transformers | Chiyu Ma et.al. | 2410.20722 | null |
| 2024-10-27 | Graph Neural Networks on Discriminative Graphs of Words | Yassine Abbahaddou et.al. | 2410.20469 | null |
| 2024-10-27 | Historical Test-time Prompt Tuning for Vision Foundation Models | Jingyi Zhang et.al. | 2410.20346 | null |
| 2024-10-27 | Sequential Large Language Model-Based Hyper-Parameter Optimization | Kanan Mahammadli et.al. | 2410.20302 | link |
| 2024-10-26 | Enhancing CNN Classification with Lamarckian Memetic Algorithms and Local Search | Akhilbaran Ghosh et.al. | 2410.20234 | null |
| 2024-10-26 | Annotation Efficiency: Identifying Hard Samples via Blocked Sparse Linear Bandits | Adit Jain et.al. | 2410.20041 | null |
| 2024-10-26 | Attacks against Abstractive Text Summarization Models through Lead Bias and Influence Functions | Poojitha Thota et.al. | 2410.20019 | null |
| 2024-10-26 | Vulnerability of LLMs to Vertically Aligned Text Manipulations | Zhecheng Li et.al. | 2410.20016 | null |
| 2024-10-25 | Learning the Regularization Strength for Deep Fine-Tuning via a Data-Emphasized Variational Objective | Ethan Harvey et.al. | 2410.19675 | null |
| 2024-10-24 | Noise Adaption Network for Morse Code Image Classification | Xiaxia Wang et.al. | 2410.19180 | link |
| 2024-10-24 | Hybrid Quantum-Classical Feature Extraction approach for Image Classification using Autoencoders and Quantum SVMs | Donovan Slabbert et.al. | 2410.18814 | null |
| 2024-10-24 | Spatial-Temporal Search for Spiking Neural Networks | Kaiwei Che et.al. | 2410.18580 | null |
| 2024-10-25 | Interpretable Bilingual Multimodal Large Language Model for Diverse Biomedical Tasks | Lehan Wang et.al. | 2410.18387 | null |
| 2024-10-23 | Using Cartesian slice plots of a cosmological simulation as input of a convolutional neural network | Guillermo Arreaga-Garcia et.al. | 2410.18320 | null |
| 2024-10-25 | Backdoor in Seconds: Unlocking Vulnerabilities in Large Pre-trained Models via Model Editing | Dongliang Guo et.al. | 2410.18267 | null |
| 2024-10-23 | Future Token Prediction – Causal Language Modelling with Per-Token Semantic State Vector for Multi-Token Prediction | Nicholas Walker et.al. | 2410.18160 | null |
| 2024-10-23 | Deep Learning for Active Region Classification: A Systematic Study from Convolutional Neural Networks to Vision Transformers | Edoardo Legnaro et.al. | 2410.17816 | null |
| 2024-10-23 | New Insight in Cervical Cancer Diagnosis Using Convolution Neural Network Architecture | Ach. Khozaimi et.al. | 2410.17735 | null |
| 2024-10-24 | Advancing Interpretability in Text Classification through Prototype Learning | Bowen Wei et.al. | 2410.17546 | null |
| 2024-10-23 | Enhancing Multimodal Medical Image Classification using Cross-Graph Modal Contrastive Learning | Jun-En Ding et.al. | 2410.17494 | null |
| 2024-10-22 | Data Obfuscation through Latent Space Projection (LSP) for Privacy-Preserving AI Governance: Case Studies in Medical Diagnosis and Finance Fraud Detection | Mahesh Vaijainthymala Krishnamoorthy et.al. | 2410.17459 | null |
| 2024-10-22 | Altogether: Image Captioning via Re-aligning Alt-text | Hu Xu et.al. | 2410.17251 | null |
| 2024-10-22 | KANICE: Kolmogorov-Arnold Networks with Interactive Convolutional Elements | Md Meftahul Ferdaus et.al. | 2410.17172 | link |
| 2024-10-22 | Development of CNN Architectures using Transfer Learning Methods for Medical Image Classification | Ganga Prasad Basyal et.al. | 2410.16711 | null |
| 2024-10-21 | Efficient Neural Network Training via Subset Pretraining | Jan Spörer et.al. | 2410.16523 | null |
| 2024-10-21 | 1024m at SMM4H 2024: Tasks 3, 5 & 6 – Ensembles of Transformers and Large Language Models for Medical Text Classification | Ram Mohan Rao Kadiyala et.al. | 2410.15998 | null |
| 2024-10-21 | Visual Representation Learning Guided By Multi-modal Prior Knowledge | Hongkuan Zhou et.al. | 2410.15981 | null |
| 2024-10-21 | AutoTrain: No-code training for state-of-the-art models | Abhishek Thakur et.al. | 2410.15735 | link |
| 2024-10-21 | ViMoE: An Empirical Study of Designing Vision Mixture-of-Experts | Xumeng Han et.al. | 2410.15732 | null |
| 2024-10-21 | P-YOLOv8: Efficient and Accurate Real-Time Detection of Distracted Driving | Mohamed R. Elshamy et.al. | 2410.15602 | null |
| 2024-10-20 | Open-vocabulary vs. Closed-set: Best Practice for Few-shot Object Detection Considering Text Describability | Yusuke Hosoya et.al. | 2410.15315 | link |
| 2024-10-19 | Spatial-Mamba: Effective Visual State Space Models via Structure-Aware State Fusion | Chaodong Xiao et.al. | 2410.15091 | link |
| 2024-10-19 | PAT: Parameter-Free Audio-Text Aligner to Boost Zero-Shot Audio Classification | Ashish Seth et.al. | 2410.15062 | null |
| 2024-10-19 | Weakly-supervised diagnosis identification from Italian discharge letters | Vittorio Torri et.al. | 2410.15051 | null |
| 2024-10-19 | Reflexive Guidance: Improving OoDD in Vision-Language Models via Self-Guided Image-Adaptive Concept Generation | Seulbi Lee et.al. | 2410.14975 | null |
| 2024-10-18 | A Hybrid Feature Fusion Deep Learning Framework for Leukemia Cancer Detection in Microscopic Blood Sample Using Gated Recurrent Unit and Uncertainty Quantification | Maksuda Akter et.al. | 2410.14536 | null |
| 2024-10-18 | Unlearning Backdoor Attacks for LLMs with Weak-to-Strong Knowledge Distillation | Shuai Zhao et.al. | 2410.14425 | link |
| 2024-10-18 | A Novel Method to Metigate Demographic and Expert Bias in ICD Coding with Causal Inference | Bin Zhang et.al. | 2410.14236 | null |
| 2024-10-18 | Comparative Evaluation of Clustered Federated Learning Method | Michael Ben Ali et.al. | 2410.14212 | link |
| 2024-10-17 | Reproducibility study of “LICO: Explainable Models with Language-Image Consistency” | Luan Fletcher et.al. | 2410.13989 | link |
| 2024-10-17 | LoLDU: Low-Rank Adaptation via Lower-Diag-Upper Decomposition for Parameter-Efficient Fine-Tuning | Yiming Shi et.al. | 2410.13618 | link |
| 2024-10-17 | Augmentation Policy Generation for Image Classification Using Large Language Models | Ant Duru et.al. | 2410.13453 | null |
| 2024-10-17 | Similarity-Dissimilarity Loss with Supervised Contrastive Learning for Multi-label Classification | Guangming Huang et.al. | 2410.13439 | null |
| 2024-10-16 | Interpreting and Analyzing CLIP’s Zero-Shot Image Classification via Mutual Knowledge | Fawaz Sammani et.al. | 2410.13016 | link |
| 2024-10-16 | PND-Net: Plant Nutrition Deficiency and Disease Classification using Graph Convolutional Network | Asish Bera et.al. | 2410.12742 | null |
| 2024-10-16 | Beyond Speech and More: Investigating the Emergent Ability of Speech Foundation Models for Classifying Physiological Time-Series Signals | Orchid Chetia Phukan et.al. | 2410.12645 | null |
| 2024-10-17 | From Measurement Instruments to Data: Leveraging Theory-Driven Synthetic Training Data for Classifying Social Constructs | Lukas Birkenmaier et.al. | 2410.12622 | null |
| 2024-10-16 | Feature Augmentation for Self-supervised Contrastive Learning: A Closer Look | Yong Zhang et.al. | 2410.12396 | null |
| 2024-10-15 | Clustering doc2vec output for topic-dimensionality reduction: A MITRE ATT&CK calibration | Nathan Monnet et.al. | 2410.11573 | null |
| 2024-10-15 | LoKO: Low-Rank Kalman Optimizer for Online Fine-Tuning of Large Models | Hossein Abdi et.al. | 2410.11551 | null |
| 2024-10-15 | Reducing Labeling Costs in Sentiment Analysis via Semi-Supervised Learning | Minoo Jafarlou et.al. | 2410.11355 | null |
| 2024-10-14 | Towards a More Complete Theory of Function Preserving Transforms | Michael Painter et.al. | 2410.11038 | null |
| 2024-10-14 | Enhancing JEPAs with Spatial Conditioning: Robust and Efficient Representation Learning | Etai Littwin et.al. | 2410.10773 | null |
| 2024-10-15 | Ensemble of ConvNeXt V2 and MaxViT for Long-Tailed CXR Classification with View-Based Aggregation | Yosuke Yamagishi et.al. | 2410.10710 | link |
| 2024-10-14 | Queryable Prototype Multiple Instance Learning with Vision-Language Models for Incremental Whole Slide Image Classification | Jiaxiang Gou et.al. | 2410.10573 | null |
| 2024-10-14 | Dynamic Power Control in a Hardware Neural Network with Error-Configurable MAC Units | Maedeh Ghaderi et.al. | 2410.10545 | null |
| 2024-10-14 | Improve Meta-learning for Few-Shot Text Classification with All You Can Acquire from the Tasks | Xinyue Liu et.al. | 2410.10454 | link |
| 2024-10-14 | GlobalMamba: Global Image Serialization for Vision Mamba | Chengkun Wang et.al. | 2410.10316 | link |
| 2024-10-14 | A Multi-Task Text Classification Pipeline with Natural Language Explanations: A User-Centric Evaluation in Sentiment Analysis and Offensive Language Identification in Greek Tweets | Nikolaos Mylonas et.al. | 2410.10290 | null |
| 2024-10-14 | big.LITTLE Vision Transformer for Efficient Visual Recognition | He Guo et.al. | 2410.10267 | null |
| 2024-10-14 | SkillAggregation: Reference-free LLM-Dependent Aggregation | Guangzhi Sun et.al. | 2410.10215 | null |
| 2024-10-14 | Will the Inclusion of Generated Data Amplify Bias Across Generations in Future Image Classification Models? | Zeliang Zhang et.al. | 2410.10160 | null |
| 2024-10-11 | Efficient Hyperparameter Importance Assessment for CNNs | Ruinan Wang et.al. | 2410.08920 | null |
| 2024-10-11 | Parameter-Efficient Fine-Tuning of Large Language Models using Semantic Knowledge Tuning | Nusrat Jahan Prottasha et.al. | 2410.08598 | null |
| 2024-10-11 | DeBiFormer: Vision Transformer with Deformable Agent Bi-level Routing Attention | Nguyen Huu Bao Long et.al. | 2410.08582 | link |
| 2024-10-11 | Accelerated Distributed Stochastic Non-Convex Optimization over Time-Varying Directed Networks | Yiyue Chen et.al. | 2410.08508 | null |
| 2024-10-11 | Semantic Token Reweighting for Interpretable and Controllable Text Embeddings in CLIP | Eunji Kim et.al. | 2410.08469 | null |
| 2024-10-10 | Bilinear MLPs enable weight-based mechanistic interpretability | Michael T. Pearce et.al. | 2410.08417 | null |
| 2024-10-10 | What is Left After Distillation? How Knowledge Transfer Impacts Fairness and Bias | Aida Mohammadshahi et.al. | 2410.08407 | null |
| 2024-10-10 | Time Traveling to Defend Against Adversarial Example Attacks in Image Classification | Anthony Etim et.al. | 2410.08338 | null |
| 2024-10-10 | More Experts Than Galaxies: Conditionally-overlapping Experts With Biologically-Inspired Fixed Routing | Sagi Shaier et.al. | 2410.08003 | null |
| 2024-10-10 | When the Small-Loss Trick is Not Enough: Multi-Label Image Classification with Noisy Labels Applied to CCTV Sewer Inspections | Keryan Chelouche et.al. | 2410.07689 | null |
| 2024-10-10 | Invisibility Cloak: Disappearance under Human Pose Estimation via Backdoor Attacks | Minxing Zhang et.al. | 2410.07670 | null |
| 2024-10-10 | StablePrompt: Automatic Prompt Tuning using Reinforcement Learning for Large Language Models | Minchan Kwon et.al. | 2410.07652 | null |
| 2024-10-10 | Explainability of Deep Neural Networks for Brain Tumor Detection | S. Park et.al. | 2410.07613 | link |
| 2024-10-10 | CSA: Data-efficient Mapping of Unimodal Features to Multimodal Features | Po-han Li et.al. | 2410.07610 | null |
| 2024-10-09 | One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation | Fabian Paischer et.al. | 2410.07170 | link |
| 2024-10-09 | JPEG Inspired Deep Learning | Ahmed H. Salamah et.al. | 2410.07081 | link |
| 2024-10-09 | Optimizing Estimators of Squared Calibration Errors in Classification | Sebastian G. Gruber et.al. | 2410.07014 | null |
| 2024-10-09 | Spectral and Rhythm Features for Audio Classification with Deep Convolutional Neural Networks | Friedrich Wolf-Monheim et.al. | 2410.06927 | null |
| 2024-10-09 | QuadMamba: Learning Quadtree-based Selective Scan for Visual State Space Model | Fei Xie et.al. | 2410.06806 | null |
| 2024-10-09 | Convex Distillation: Efficient Compression of Deep Networks via Convex Optimization | Prateek Varshney et.al. | 2410.06567 | null |
| 2024-10-08 | A Comparative Study of Hybrid Models in Health Misinformation Text Classification | Mkululi Sikosana et.al. | 2410.06311 | null |
| 2024-10-08 | Conformal Structured Prediction | Botong Zhang et.al. | 2410.06296 | link |
| 2024-10-08 | TEOChat: A Large Vision-Language Assistant for Temporal Earth Observation Data | Jeremy Andrew Irvin et.al. | 2410.06234 | null |
| 2024-10-08 | Manual Verbalizer Enrichment for Few-Shot Text Classification | Quang Anh Nguyen et.al. | 2410.06173 | null |
| 2024-10-07 | LoTLIP: Improving Language-Image Pre-training for Long Text Understanding | Wei Wu et.al. | 2410.05249 | null |
| 2024-10-07 | Variable Resolution Pixel Quantization for Low Power Machine Vision Application on Edge | Senorita Deb et.al. | 2410.05189 | null |
| 2024-10-07 | IGroupSS-Mamba: Interval Group Spatial-Spectral Mamba for Hyperspectral Image Classification | Yan He et.al. | 2410.05100 | null |
| 2024-10-07 | Explanation sensitivity to the randomness of large language models: the case of journalistic text classification | Jeremie Bogaert et.al. | 2410.05085 | null |
| 2024-10-07 | Control-oriented Clustering of Visual Latent Representation | Han Qi et.al. | 2410.05063 | null |
| 2024-10-07 | SELECT: A Large-Scale Benchmark of Data Curation Strategies for Image Classification | Benjamin Feuer et.al. | 2410.05057 | link |
| 2024-10-07 | Art Forgery Detection using Kolmogorov Arnold and Convolutional Neural Networks | Sandro Boccuzzo et.al. | 2410.04866 | null |
| 2024-10-06 | MECFormer: Multi-task Whole Slide Image Classification with Expert Consultation Network | Doanh C. Bui et.al. | 2410.04507 | null |
| 2024-10-06 | Interpret Your Decision: Logical Reasoning Regularization for Generalization in Visual Classification | Zhaorui Tan et.al. | 2410.04492 | link |
| 2024-10-05 | IT$^3$: Idempotent Test-Time Training | Nikita Durasov et.al. | 2410.04201 | null |
| 2024-10-04 | Classification-Denoising Networks | Louis Thiry et.al. | 2410.03505 | null |
| 2024-10-04 | A Multimodal Framework for Deepfake Detection | Kashish Gandhi et.al. | 2410.03487 | null |
| 2024-10-04 | On Uncertainty In Natural Language Processing | Dennis Ulmer et.al. | 2410.03446 | link |
| 2024-10-04 | Comparing zero-shot self-explanations with human rationales in multilingual text classification | Stephanie Brandl et.al. | 2410.03296 | null |
| 2024-10-04 | Sm: enhanced localization in Multiple Instance Learning for medical imaging classification | Francisco M. Castro-Macías et.al. | 2410.03276 | null |
| 2024-10-04 | Selective Transformer for Hyperspectral Image Classification | Yichu Xu et.al. | 2410.03171 | null |
| 2024-10-03 | CPFD: Confidence-aware Privileged Feature Distillation for Short Video Classification | Jinghao Shi et.al. | 2410.03038 | null |
| 2024-10-03 | On Expert Estimation in Hierarchical Mixture of Experts: Beyond Softmax Gating Functions | Huy Nguyen et.al. | 2410.02935 | null |
| 2024-10-03 | Lie Algebra Canonicalization: Equivariant Neural Operators under arbitrary Lie Groups | Zakhar Shumaylov et.al. | 2410.02698 | null |
| 2024-10-03 | LoGra-Med: Long Context Multi-Graph Alignment for Medical Vision-Language Model | Duy M. H. Nguyen et.al. | 2410.02615 | null |
| 2024-10-03 | Personalized Quantum Federated Learning for Privacy Image Classification | Jinjing Shi et.al. | 2410.02547 | null |
| 2024-10-03 | BiSSL: Bilevel Optimization for Self-Supervised Pre-Training and Fine-Tuning | Gustav Wagner Zakarias et.al. | 2410.02387 | null |
| 2024-10-03 | CTARR: A fast and robust method for identifying anatomical regions on CT images via atlas registration | Thomas Buddenkotte et.al. | 2410.02316 | link |
| 2024-10-03 | Hard Negative Sample Mining for Whole Slide Image Classification | Wentao Huang et.al. | 2410.02212 | link |
| 2024-10-02 | Kolmogorov-Arnold Network Autoencoders | Mohammadamin Moradi et.al. | 2410.02077 | link |
| 2024-10-02 | Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data | Sreyan Ghosh et.al. | 2410.02056 | null |
| 2024-10-02 | FLAG: Financial Long Document Classification via AMR-based GNN | Bolun et.al. | 2410.02024 | link |
| 2024-10-02 | MONICA: Benchmarking on Long-tailed Medical Image Classification | Lie Ju et.al. | 2410.02010 | null |
| 2024-10-02 | Revisiting Hierarchical Text Classification: Inference and Metrics | Roman Plaud et.al. | 2410.01305 | link |
| 2024-10-02 | Automatic deductive coding in discourse analysis: an application of large language models in learning analytics | Lishan Zhang et.al. | 2410.01240 | null |
| 2024-10-01 | Deep Nets with Subsampling Layers Unwittingly Discard Useful Activations at Test-Time | Chiao-An Yang et.al. | 2410.01083 | link |
| 2024-10-01 | Local-to-Global Self-Supervised Representation Learning for Diabetic Retinopathy Grading | Mostafa Hajighasemloua et.al. | 2410.00779 | null |
| 2024-10-01 | NECOMIMI: Neural-Cognitive Multimodal EEG-informed Image Generation with Diffusion Models | Chi-Sheng Chen et.al. | 2410.00712 | null |
| 2024-10-01 | TikGuard: A Deep Learning Transformer-Based Solution for Detecting Unsuitable TikTok Content for Kids | Mazen Balat et.al. | 2410.00403 | null |
| 2024-09-30 | KPCA-CAM: Visual Explainability of Deep Computer Vision Models using Kernel PCA | Sachin Karmani et.al. | 2410.00267 | null |
| 2024-09-30 | A Methodology for Explainable Large Language Models with Integrated Gradients and Linguistic Analysis in Text Classification | Marina Ribeiro et.al. | 2410.00250 | null |
| 2024-09-30 | Evaluating the performance of state-of-the-art esg domain-specific pre-trained large language models in text classification against existing models and traditional machine learning techniques | Tin Yuet Chung et.al. | 2410.00207 | null |
| 2024-10-02 | Evaluating the fairness of task-adaptive pretraining on unlabeled test data before few-shot text classification | Kush Dubey et.al. | 2410.00179 | link |
| 2024-09-30 | POMONAG: Pareto-Optimal Many-Objective Neural Architecture Generator | Eugenio Lomurno et.al. | 2409.20447 | null |
| 2024-09-30 | Satellite image classification with neural quantum kernels | Pablo Rodriguez-Grasa et.al. | 2409.20356 | null |
| 2024-09-30 | All-optical autoencoder machine learning framework using diffractive processors | Peijie Feng et.al. | 2409.20346 | null |
| 2024-09-30 | Fine-Tuning Personalization in Federated Learning to Mitigate Adversarial Clients | Youssef Allouah et.al. | 2409.20329 | null |
| 2024-09-30 | Classroom-Inspired Multi-Mentor Distillation with Adaptive Learning Strategies | Shalini Sarode et.al. | 2409.20237 | null |
| 2024-09-30 | Classification of Radiological Text in Small and Imbalanced Datasets in a Non-English Language | Vincent Beliveau et.al. | 2409.20147 | null |
| 2024-09-30 | SATA: Spatial Autocorrelation Token Analysis for Enhancing the Robustness of Vision Transformers | Nick Nikzad et.al. | 2409.19850 | null |
| 2024-09-29 | Adversarial Examples for DNA Classification | Hyunwoo Yoo et.al. | 2409.19788 | null |
| 2024-09-29 | FAST: A Dual-tier Few-Shot Learning Paradigm for Whole Slide Image Classification | Kexue Fu et.al. | 2409.19720 | null |
| 2024-09-29 | Vision-Language Models are Strong Noisy Label Detectors | Tong Wei et.al. | 2409.19696 | link |
| 2024-09-27 | Unconditional stability of a recurrent neural circuit implementing divisive normalization | Shivang Rawat et.al. | 2409.18946 | null |
| 2024-09-27 | Subspace Preserving Quantum Convolutional Neural Network Architectures | Léo Monbroussou et.al. | 2409.18918 | null |
| 2024-09-27 | Med-IC: Fusing a Single Layer Involution with Convolutions for Enhanced Medical Image Classification and Segmentation | Md. Farhadul Islam et.al. | 2409.18506 | null |
| 2024-09-26 | Towards the Mitigation of Confirmation Bias in Semi-supervised Learning: a Debiased Training Perspective | Yu Wang et.al. | 2409.18316 | null |
| 2024-09-26 | Realistic Evaluation of Model Merging for Compositional Generalization | Derek Tam et.al. | 2409.18314 | null |
| 2024-09-26 | DARE: Diverse Visual Question Answering with Robustness Evaluation | Hannah Sterz et.al. | 2409.18023 | null |
| 2024-09-26 | The Lou Dataset – Exploring the Impact of Gender-Fair Language in German Text Classification | Andreas Waldis et.al. | 2409.17929 | null |
| 2024-09-26 | Cascade Prompt Learning for Vision-Language Model Adaptation | Ge Wu et.al. | 2409.17805 | null |
| 2024-09-26 | Byzantine-Robust Aggregation for Securing Decentralized Federated Learning | Diego Cajaraville-Aboy et.al. | 2409.17754 | null |
| 2024-09-26 | Let the Quantum Creep In: Designing Quantum Neural Network Models by Gradually Swapping Out Classical Components | Peiyong Wang et.al. | 2409.17583 | link |
| 2024-09-26 | Leveraging Annotator Disagreement for Text Classification | Jin Xu et.al. | 2409.17577 | null |
| 2024-09-26 | Uni-Med: A Unified Medical Generalist Foundation Model For Multi-Task Learning Via Connector-MoE | Xun Zhu et.al. | 2409.17508 | null |
| 2024-09-26 | Reducing and Exploiting Data Augmentation Noise through Meta Reweighting Contrastive Learning for Text Classification | Guanyi Mou et.al. | 2409.17474 | null |
| 2024-09-26 | Navigating the Shortcut Maze: A Comprehensive Analysis of Shortcut Learning in Text Classification by Language Models | Yuqing Zhou et.al. | 2409.17455 | null |
| 2024-09-25 | Block Expanded DINORET: Adapting Natural Domain Foundation Models for Retinal Imaging Without Catastrophic Forgetting | Jay Zoellin et.al. | 2409.17332 | null |
| 2024-09-25 | BitQ: Tailoring Block Floating Point Precision for Improved DNN Efficiency on Resource-Constrained Devices | Yongqi Xu et.al. | 2409.17093 | link |
| 2024-09-25 | Accumulator-Aware Post-Training Quantization | Ian Colbert et.al. | 2409.17092 | null |
| 2024-09-26 | HVT: A Comprehensive Vision Framework for Learning in Non-Euclidean Space | Jacob Fein-Ashley et.al. | 2409.16897 | link |
| 2024-09-25 | Shifting from endangerment to rebirth in the Artificial Intelligence Age: An Ensemble Machine Learning Approach for Hawrami Text Classification | Aram Khaksar et.al. | 2409.16884 | null |
| 2024-09-25 | Explicitly Modeling Pre-Cortical Vision with a Neuro-Inspired Front-End Improves CNN Robustness | Lucas Piper et.al. | 2409.16838 | link |
| 2024-09-24 | Unleashing the Potential of Synthetic Images: A Study on Histopathology Image Classification | Leire Benito-Del-Valle et.al. | 2409.16002 | link |
| 2024-09-24 | An ensemble framework approach of hybrid Quantum convolutional neural networks for classification of breast cancer images | Dibyasree Guha et.al. | 2409.15958 | null |
| 2024-09-24 | iGAiVA: Integrated Generative AI and Visual Analytics in a Machine Learning Workflow for Text Classification | Yuanzhe Jin et.al. | 2409.15848 | link |
| 2024-09-23 | Optimizing News Text Classification with Bi-LSTM and Attention Mechanism for Efficient Data Processing | Bingyao Liu et.al. | 2409.15576 | null |
| 2024-09-23 | Critic Loss for Image Classification | Brendan Hogan Rappazzo et.al. | 2409.15565 | null |
| 2024-09-23 | VLMine: Long-Tail Data Mining with Vision Language Models | Mao Ye et.al. | 2409.15486 | null |
| 2024-09-23 | HydroVision: LiDAR-Guided Hydrometric Prediction with Vision Transformers and Hybrid Graph Learning | Naghmeh Shafiee Roudbari et.al. | 2409.15213 | null |
| 2024-09-23 | Benchmarking Edge AI Platforms for High-Performance ML Inference | Rakshith Jayanth et.al. | 2409.14803 | null |
| 2024-09-23 | Less yet robust: crucial region selection for scene recognition | Jianqi Zhang et.al. | 2409.14741 | null |
| 2024-09-22 | Low-Light Enhancement Effect on Classification and Detection: An Empirical Study | Xu Wu et.al. | 2409.14461 | null |
| 2024-09-18 | Unraveling the Hessian: A Key to Smooth Convergence in Loss Function Landscapes | Nikita Kiselev et.al. | 2409.11995 | link |
| 2024-09-18 | Data Efficient Acoustic Scene Classification using Teacher-Informed Confusing Class Instruction | Jin Jie Sean Yeo et.al. | 2409.11964 | null |
| 2024-09-18 | Agglomerative Token Clustering | Joakim Bruslund Haurum et.al. | 2409.11923 | null |
| 2024-09-18 | Distillation-free Scaling of Large SSMs for Images and Videos | Hamid Suleman et.al. | 2409.11867 | null |
| 2024-09-18 | Community Shaping in the Digital Age: A Temporal Fusion Framework for Analyzing Discourse Fragmentation in Online Social Networks | Amirhossein Dezhboro et.al. | 2409.11665 | null |
| 2024-09-18 | Few-Shot Learning Approach on Tuberculosis Classification Based on Chest X-Ray Images | A. A. G. Yogi Pramana et.al. | 2409.11644 | null |
| 2024-09-18 | Hyperspectral Image Classification Based on Faster Residual Multi-branch Spiking Neural Network | Yang Liu et.al. | 2409.11619 | null |
| 2024-09-17 | Multi-Cohort Framework with Cohort-Aware Attention and Adversarial Mutual-Information Minimization for Whole Slide Image Classification | Sharon Peled et.al. | 2409.11119 | null |
| 2024-09-17 | Anti-ESIA: Analyzing and Mitigating Impacts of Electromagnetic Signal Injection Attacks | Denglin Kang et.al. | 2409.10922 | null |
| 2024-09-16 | Are Deep Learning Models Robust to Partial Object Occlusion in Visual Recognition Tasks? | Kaleb Kassaw et.al. | 2409.10775 | null |
| 2024-09-16 | Frequency-Guided Masking for Enhanced Vision Self-Supervised Learning | Amin Karimi Monsefi et.al. | 2409.10362 | null |
| 2024-09-16 | InfoDisent: Explainability of Image Classification Models by Information Disentanglement | Łukasz Struski et.al. | 2409.10329 | null |
| 2024-09-16 | Enhancing Image Classification in Small and Unbalanced Datasets through Synthetic Data Augmentation | Neil De La Fuente et.al. | 2409.10286 | null |
| 2024-09-15 | Finetuning CLIP to Reason about Pairwise Differences | Dylan Sam et.al. | 2409.09721 | null |
| 2024-09-15 | Compositional Audio Representation Learning | Sripathi Sridhar et.al. | 2409.09619 | null |
| 2024-09-14 | One missing piece in Vision and Language: A Survey on Comics Understanding | Emanuele Vivoli et.al. | 2409.09502 | link |
| 2024-09-14 | Real-world Adversarial Defense against Patch Attacks based on Diffusion Model | Xingxing Wei et.al. | 2409.09406 | null |
| 2024-09-14 | Turbo your multi-modal classification with contrastive learning | Zhiyu Zhang et.al. | 2409.09282 | null |
| 2024-09-14 | Leveraging Foundation Models for Efficient Federated Learning in Resource-restricted Edge Networks | S. Kawa Atapour et.al. | 2409.09273 | null |
| 2024-09-13 | ReCLAP: Improving Zero Shot Audio Classification by Describing Sounds | Sreyan Ghosh et.al. | 2409.09213 | link |
| 2024-09-13 | Pushing the boundaries of event subsampling in event-based video classification using CNNs | Hesam Araghi et.al. | 2409.08953 | link |
| 2024-09-13 | Pushing Joint Image Denoising and Classification to the Edge | Thomas C Markhorst et.al. | 2409.08943 | null |
| 2024-09-13 | Byzantine-Robust and Communication-Efficient Distributed Learning via Compressed Momentum Filtering | Changxin Liu et.al. | 2409.08640 | null |
| 2024-09-13 | Anytime Continual Learning for Open Vocabulary Classification | Zhen Zhu et.al. | 2409.08518 | link |
| 2024-09-12 | Enhancing Few-Shot Image Classification through Learnable Multi-Scale Embedding and Attention Mechanisms | Fatemeh Askari et.al. | 2409.07989 | link |
| 2024-09-12 | Microscopic-Mamba: Revealing the Secrets of Microscopic Images with Just 4M Parameters | Shun Zou et.al. | 2409.07896 | link |
| 2024-09-12 | Classifying Images with CoLaNET Spiking Neural Network – the MNIST Example | Mikhail Kiselev et.al. | 2409.07833 | null |
| 2024-09-12 | Efficient Privacy-Preserving KAN Inference Using Homomorphic Encryption | Zhizheng Lai et.al. | 2409.07751 | null |
| 2024-09-12 | DFDG: Data-Free Dual-Generator Adversarial Distillation for One-Shot Federated Learning | Kangyang Luo et.al. | 2409.07734 | null |
| 2024-09-12 | Cooperative Inference with Interleaved Operator Partitioning for CNNs | Zhibang Liu et.al. | 2409.07693 | null |
| 2024-09-11 | Token Turing Machines are Efficient Vision Models | Purvish Jajal et.al. | 2409.07613 | null |
| 2024-09-11 | Minimizing Embedding Distortion for Robust Out-of-Distribution Performance | Tom Shaked et.al. | 2409.07582 | null |
| 2024-09-11 | A Contrastive Symmetric Forward-Forward Algorithm (SFFA) for Continual Learning Tasks | Erik B. Terres-Escudero et.al. | 2409.07387 | null |
| 2024-09-11 | Optimizing Neural Network Performance and Interpretability with Diophantine Equation Encoding | Ronald Katende et.al. | 2409.07310 | null |
| 2024-09-11 | LLM-based feature generation from text for interpretable machine learning | Vojtěch Balek et.al. | 2409.07132 | null |
| 2024-09-11 | Privacy-Preserving Federated Learning with Consistency via Knowledge Distillation Using Conditional Generator | Kangyang Luo et.al. | 2409.06955 | null |
| 2024-09-10 | Dynamic Decoupling of Placid Terminal Attractor-based Gradient Descent Algorithm | Jinwei Zhao et.al. | 2409.06542 | null |
| 2024-09-10 | Seam Carving as Feature Pooling in CNN | Mohammad Imrul Jubair et.al. | 2409.06311 | null |
| 2024-09-10 | EntAugment: Entropy-Driven Adaptive Data Augmentation Framework for Image Classification | Suorong Yang et.al. | 2409.06290 | link |
| 2024-09-09 | A Small Claims Court for the NLP: Judging Legal Text Classification Strategies With Small Datasets | Mariana Yukari Noguti et.al. | 2409.05972 | null |
| 2024-09-09 | SVFit: Parameter-Efficient Fine-Tuning of Large Pre-Trained Models Using Singular Values | Chengwei Sun et.al. | 2409.05926 | null |
| 2024-09-09 | Adversarial Attacks on Data Attribution | Xinhe Wang et.al. | 2409.05657 | null |
| 2024-09-09 | Look One and More: Distilling Hybrid Order Relational Knowledge for Cross-Resolution Image Recognition | Shiming Ge et.al. | 2409.05384 | null |
| 2024-09-09 | RexUniNLU: Recursive Method with Explicit Schema Instructor for Universal NLU | Chengyuan Liu et.al. | 2409.05275 | null |
| 2024-09-09 | Scalable Frame Sampling for Video Classification: A Semi-Optimal Policy Approach with Reduced Search Space | Junho Lee et.al. | 2409.05260 | null |
| 2024-09-08 | PatchAlign: Fair and Accurate Skin Disease Image Classification by Alignment with Clinical Labels | Aayushman et.al. | 2409.04975 | link |
| 2024-09-07 | Activation Function Optimization Scheme for Image Classification | Abdur Rahman et.al. | 2409.04915 | null |
| 2024-09-07 | LoCa: Logit Calibration for Knowledge Distillation | Runming Yang et.al. | 2409.04778 | null |
| 2024-09-07 | Swin Transformer for Robust Differentiation of Real and Synthetic Images: Intra- and Inter-Dataset Analysis | Preetu Mehta et.al. | 2409.04734 | null |
| 2024-09-06 | Connectivity-Inspired Network for Context-Aware Recognition | Gianluca Carloni et.al. | 2409.04360 | null |
| 2024-09-06 | An optically accelerated extreme learning machine using hot atomic vapors | Pierre Azam et.al. | 2409.04312 | null |
| 2024-09-06 | PlantSeg: A Large-Scale In-the-wild Dataset for Plant Disease Segmentation | Tianqi Wei et.al. | 2409.04038 | null |
| 2024-09-05 | Deep Clustering of Remote Sensing Scenes through Heterogeneous Transfer Learning | Isaac Ray et.al. | 2409.03938 | null |
| 2024-09-05 | WaterMAS: Sharpness-Aware Maximization for Neural Network Watermarking | Carl De Sousa Trias et.al. | 2409.03902 | null |
| 2024-09-05 | On-board Satellite Image Classification for Earth Observation: A Comparative Study of Pre-Trained Vision Transformer Models | Thanh-Dung Le et.al. | 2409.03901 | null |
| 2024-09-05 | Have Large Vision-Language Models Mastered Art History? | Ombretta Strafforello et.al. | 2409.03521 | null |
| 2024-09-05 | Non-Uniform Illumination Attack for Fooling Convolutional Neural Networks | Akshay Jain et.al. | 2409.03458 | link |
| 2024-09-05 | Training-free Conversion of Pretrained ANNs to SNNs for Low-Power and High-Performance Applications | Tong Bu et.al. | 2409.03368 | null |
| 2024-09-05 | PEPL: Precision-Enhanced Pseudo-Labeling for Fine-Grained Image Classification in Semi-Supervised Learning | Bowen Tian et.al. | 2409.03192 | null |
| 2024-09-05 | The AdEMAMix Optimizer: Better, Faster, Older | Matteo Pagliardini et.al. | 2409.03137 | null |
| 2024-09-04 | iConFormer: Dynamic Parameter-Efficient Tuning with Input-Conditioned Adaptation | Hayeon Jo et.al. | 2409.02838 | null |
| 2024-09-03 | MedUnA: Language guided Unsupervised Adaptation of Vision-Language Models for Medical Image Classification | Umaima Rahman et.al. | 2409.02729 | null |
| 2024-09-05 | OpenFact at CheckThat! 2024: Combining Multiple Attack Methods for Effective Adversarial Text Generation | Włodzimierz Lewoniewski et.al. | 2409.02649 | null |
| 2024-09-04 | Boosting Generalizability towards Zero-Shot Cross-Dataset Single-Image Indoor Depth by Meta-Initialization | Cho-Ying Wu et.al. | 2409.02486 | null |
| 2024-09-03 | Evaluation and Comparison of Visual Language Models for Transportation Engineering Problems | Sanjita Prajapati et.al. | 2409.02278 | null |
| 2024-09-05 | Robust Clustering on High-Dimensional Data with Stochastic Quantization | Anton Kozyriev et.al. | 2409.02066 | link |
| 2024-09-03 | Compressed learning based onboard semantic compression for remote sensing platforms | Protim Bhattacharjee et.al. | 2409.01988 | null |
| 2024-09-03 | State-of-the-art Advances of Deep-learning Linguistic Steganalysis Research | Yihao Wang et.al. | 2409.01780 | null |
| 2024-09-03 | Enhancing Fine-Grained Visual Recognition in the Low-Data Regime Through Feature Magnitude Regularization | Avraham Chapman et.al. | 2409.01672 | null |
| 2024-09-03 | ReSpike: Residual Frames-based Hybrid Spiking Neural Networks for Efficient Action Recognition | Shiting Xiao et.al. | 2409.01564 | null |
| 2024-08-30 | Assessing Generative Language Models in Classification Tasks: Performance and Self-Evaluation Capabilities in the Environmental and Climate Change Domain | Francesca Grasso et.al. | 2408.17362 | link |
| 2024-08-30 | Covariance-corrected Whitening Alleviates Network Degeneration on Imbalanced Classification | Zhiwei Zhang et.al. | 2408.17197 | null |
| 2024-08-30 | Improving Extraction of Clinical Event Contextual Properties from Electronic Health Records: A Comparative Study | Shubham Agarwal et.al. | 2408.17181 | null |
| 2024-09-02 | Instant Adversarial Purification with Adversarial Consistency Distillation | Chun Tong Lei et.al. | 2408.17064 | null |
| 2024-08-30 | Generative Modeling Perspective for Control and Reasoning in Robotics | Takuma Yoneda et.al. | 2408.17041 | null |
| 2024-08-29 | Tex-ViT: A Generalizable, Robust, Texture-based dual-branch cross-attention deepfake detector | Deepak Dagar et.al. | 2408.16892 | null |
| 2024-08-29 | SODAWideNet++: Combining Attention and Convolutions for Salient Object Detection | Rohit Venkata Sai Dulam et.al. | 2408.16645 | null |
| 2024-08-29 | Android Malware Detection Based on RGB Images and Multi-feature Fusion | Zhiqiang Wang et.al. | 2408.16555 | null |
| 2024-08-29 | SAU: A Dual-Branch Network to Enhance Long-Tailed Recognition via Generative Models | Guangxi Li et.al. | 2408.16273 | link |
| 2024-08-29 | Improving Diffusion-based Data Augmentation with Inversion Spherical Interpolation | Yanghao Wang et.al. | 2408.16266 | null |
| 2024-08-29 | Low Saturation Confidence Distribution-based Test-Time Adaptation for Cross-Domain Remote Sensing Image Classification | Yu Liang et.al. | 2408.16265 | null |
| 2024-08-28 | EMP: Enhance Memory in Data Pruning | Jinying Xiao et.al. | 2408.16031 | null |
| 2024-08-28 | Local Descriptors Weighted Adaptive Threshold Filtering For Few-Shot Learning | Bingchen Yan et.al. | 2408.15924 | null |
| 2024-08-28 | ModalityMirror: Improving Audio Classification in Modality Heterogeneity Federated Learning with Multimodal Distillation | Tiantian Feng et.al. | 2408.15803 | null |
| 2024-08-28 | Visual Prompt Engineering for Medical Vision Language Models in Radiology | Stefan Denner et.al. | 2408.15802 | null |
| 2024-08-28 | Harnessing the Intrinsic Knowledge of Pretrained Language Models for Challenging Text Classification Settings | Lingyu Gao et.al. | 2408.15650 | null |
| 2024-08-27 | DCT-CryptoNets: Scaling Private Inference in the Frequency Domain | Arjun Roy et.al. | 2408.15231 | null |
| 2024-08-27 | A Review of Transformer-Based Models for Computer Vision Tasks: Capturing Global Context and Spatial Relationships | Gracile Astlin Pereira et.al. | 2408.15178 | null |
| 2024-08-28 | AnomalousPatchCore: Exploring the Use of Anomalous Samples in Industrial Anomaly Detection | Mykhailo Koshil et.al. | 2408.15113 | null |
| 2024-08-27 | Data downlink prioritization using image classification on-board a 6U CubeSat | Keenan A. A. Chatar et.al. | 2408.14865 | null |
| 2024-08-27 | Leveraging Self-supervised Audio Representations for Data-Efficient Acoustic Scene Classification | Yiqiang Cai et.al. | 2408.14862 | null |
| 2024-08-27 | Text-guided Foundation Model Adaptation for Long-Tailed Medical Image Classification | Sirui Li et.al. | 2408.14770 | null |
| 2024-08-26 | On-Chip Learning with Memristor-Based Neural Networks: Assessing Accuracy and Efficiency Under Device Variations, Conductance Errors, and Input Noise | M. Reza Eslami et.al. | 2408.14680 | null |
| 2024-08-26 | Attend-Fusion: Efficient Audio-Visual Fusion for Video Classification | Mahrukh Awan et.al. | 2408.14441 | null |
| 2024-08-26 | Uncertainties of Latent Representations in Computer Vision | Michael Kirchhof et.al. | 2408.14281 | null |
| 2024-08-26 | MSFMamba: Multi-Scale Feature Fusion State Space Model for Multi-Source Remote Sensing Image Classification | Feng Gao et.al. | 2408.14255 | null |
| 2024-08-26 | Feature Aligning Few shot Learning Method Using Local Descriptors Weighted Rules | Bingchen Yan et.al. | 2408.14192 | null |
| 2024-08-26 | GenFormer – Generated Images are All You Need to Improve Robustness of Transformers on Small Datasets | Sven Oehri et.al. | 2408.14131 | null |
| 2024-08-25 | Few-Shot Histopathology Image Classification: Evaluating State-of-the-Art Methods and Unveiling Performance Insights | Ardhendu Sekhar et.al. | 2408.13816 | null |
| 2024-08-25 | On the Robustness of Kolmogorov-Arnold Networks: An Adversarial Perspective | Tal Alter et.al. | 2408.13809 | null |
| 2024-08-25 | Enhancing Adaptive Deep Networks for Image Classification via Uncertainty-aware Decision Fusion | Xu Zhang et.al. | 2408.13744 | link |
| 2024-08-25 | 3D-RCNet: Learning from Transformer to Build a 3D Relational ConvNet for Hyperspectral Image Classification | Haizhao Jing et.al. | 2408.13728 | null |
| 2024-08-24 | Enhanced Astronomical Source Classification with Integration of Attention Mechanisms and Vision Transformers | Srinadh Reddy Bhavanam et.al. | 2408.13634 | null |
| 2024-08-23 | Domain-specific long text classification from sparse relevant information | Célia D’Cruz et.al. | 2408.13253 | null |
| 2024-08-23 | EAViT: External Attention Vision Transformer for Audio Classification | Aquib Iqbal et.al. | 2408.13201 | null |
| 2024-08-23 | A gradient system based on anisotropic monochrome image processing with orientation auto-adjustment | Harbir Antil et.al. | 2408.12847 | null |
| 2024-08-23 | Underwater SONAR Image Classification and Analysis using LIME-based Explainable Artificial Intelligence | Purushothaman Natarajan et.al. | 2408.12837 | null |
| 2024-08-23 | VALE: A Multimodal Visual and Language Explanation Framework for Image Classifiers using eXplainable AI and Language Models | Purushothaman Natarajan et.al. | 2408.12808 | null |
| 2024-08-23 | BackdoorLLM: A Comprehensive Benchmark for Backdoor Attacks on Large Language Models | Yige Li et.al. | 2408.12798 | null |
| 2024-08-23 | Semi-Supervised Variational Adversarial Active Learning via Learning to Rank and Agreement-Based Pseudo Labeling | Zongyao Lyu et.al. | 2408.12774 | null |
| 2024-08-23 | Symmetric masking strategy enhances the performance of Masked Image Modeling | Khanh-Binh Nguyen et.al. | 2408.12772 | null |
| 2024-08-22 | ssProp: Energy-Efficient Training for Convolutional Neural Networks with Scheduled Sparse Back Propagation | Lujia Zhong et.al. | 2408.12561 | link |
| 2024-08-22 | The Russian-focused embedders’ exploration: ruMTEB benchmark and Russian embedding model design | Artem Snegirev et.al. | 2408.12503 | null |
| 2024-08-22 | Enhanced Infield Agriculture with Interpretable Machine Learning Approaches for Crop Classification | Sudi Murindanyi et.al. | 2408.12426 | null |
| 2024-08-22 | AT-SNN: Adaptive Tokens for Vision Transformer on Spiking Neural Network | Donghwa Kang et.al. | 2408.12293 | null |
| 2024-08-22 | Whole Slide Image Classification of Salivary Gland Tumours | John Charlton et.al. | 2408.12275 | null |
| 2024-08-22 | Query-Efficient Video Adversarial Attack with Stylized Logo | Duoxun Tang et.al. | 2408.12099 | null |
| 2024-08-21 | Approaching Deep Learning through the Spectral Dynamics of Weights | David Yunis et.al. | 2408.11804 | link |
| 2024-08-21 | SBDet: A Symmetry-Breaking Object Detector via Relaxed Rotation-Equivariance | Zhiqiang Wu et.al. | 2408.11760 | null |
| 2024-08-21 | Improving Calibration by Relating Focal Loss, Temperature Scaling, and Properness | Viacheslav Komisarenko et.al. | 2408.11598 | link |
| 2024-08-21 | MSCPT: Few-shot Whole Slide Image Classification with Multi-scale and Context-focused Prompt Tuning | Minghao Han et.al. | 2408.11505 | null |
| 2024-08-21 | Enabling Small Models for Zero-Shot Classification through Model Label Learning | Jia Zhang et.al. | 2408.11449 | null |
| 2024-08-21 | Automatic Dataset Construction (ADC): Sample Collection, Data Curation, and Beyond | Minghao Liu et.al. | 2408.11338 | null |
| 2024-08-21 | Towards Evaluating Large Language Models on Sarcasm Understanding | Yazhou Zhang et.al. | 2408.11319 | null |
| 2024-08-20 | Privacy-preserving Universal Adversarial Defense for Black-box Models | Qiao Li et.al. | 2408.10647 | null |
| 2024-08-20 | A Tutorial on Explainable Image Classification for Dementia Stages Using Convolutional Neural Network and Gradient-weighted Class Activation Mapping | Kevin Kam Fung Yuen et.al. | 2408.10572 | null |
| 2024-08-20 | NoMatterXAI: Generating “No Matter What” Alterfactual Examples for Explaining Black-Box Text Classification Models | Tuc Nguyen et.al. | 2408.10528 | null |
| 2024-08-20 | Cervical Cancer Detection Using Multi-Branch Deep Learning Model | Tatsuhiro Baba et.al. | 2408.10498 | null |
| 2024-08-19 | HaSPeR: An Image Repository for Hand Shadow Puppet Recognition | Syed Rifat Raiyan et.al. | 2408.10360 | link |
| 2024-08-19 | Leveraging Superfluous Information in Contrastive Representation Learning | Xuechu Yu et.al. | 2408.10292 | null |
| 2024-08-19 | SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models | Anke Tang et.al. | 2408.10174 | link |
| 2024-08-19 | Towards Robust Federated Image Classification: An Empirical Study of Weight Selection Strategies in Manufacturing | Vinit Hegiste et.al. | 2408.10024 | null |
| 2024-08-19 | Detecting Adversarial Attacks in Semantic Segmentation via Uncertainty Estimation: A Deep Analysis | Kira Maag et.al. | 2408.10021 | null |
| 2024-08-19 | Active Learning for Identifying Disaster-Related Tweets: A Comparison with Keyword Filtering and Generic Fine-Tuning | David Hanny et.al. | 2408.09914 | null |
| 2024-08-19 | Ranking Generated Answers: On the Agreement of Retrieval Models with Humans on Consumer Health Questions | Sebastian Heineking et.al. | 2408.09831 | null |
| 2024-08-19 | AutoML-guided Fusion of Entity and LLM-based representations | Boshko Koloski et.al. | 2408.09794 | null |
| 2024-08-19 | Dataset Distillation for Histopathology Image Classification | Cong Cong et.al. | 2408.09709 | null |
| 2024-08-19 | A Strategy to Combine 1stGen Transformers and Open LLMs for Automatic Text Classification | Claudio M. V. de Andrade et.al. | 2408.09629 | null |
| 2024-08-18 | Attention Is Not What You Need: Revisiting Multi-Instance Learning for Whole Slide Image Classification | Xin Liu et.al. | 2408.09449 | null |
| 2024-08-17 | Narrowing the Focus: Learned Optimizers for Pretrained Models | Gus Kristiansen et.al. | 2408.09310 | null |
| 2024-08-16 | DPA: Dual Prototypes Alignment for Unsupervised Adaptation of Vision-Language Models | Eman Ali et.al. | 2408.08855 | null |
| 2024-08-16 | LEVIS: Large Exact Verifiable Input Spaces for Neural Networks | Mohamad Fares El Hajj Chehade et.al. | 2408.08824 | null |
| 2024-08-16 | Leveraging FourierKAN Classification Head for Pre-Trained Transformer-based Text Classification | Abdullah Al Imran et.al. | 2408.08803 | null |
| 2024-08-16 | Xpikeformer: Hybrid Analog-Digital Hardware Acceleration for Spiking Transformers | Zihang Song et.al. | 2408.08794 | null |
| 2024-08-16 | Quantum convolutional neural networks for jet images classification | Hala Elhag et.al. | 2408.08701 | null |
| 2024-08-16 | MM-UNet: A Mixed MLP Architecture for Improved Ophthalmic Image Segmentation | Zunjie Xiao et.al. | 2408.08600 | null |
| 2024-08-16 | Tell Codec What Worth Compressing: Semantically Disentangled Image Coding for Machine with LMMs | Jinming Liu et.al. | 2408.08575 | null |
| 2024-08-16 | Efficient Image-to-Image Diffusion Classifier for Adversarial Robustness | Hefei Mei et.al. | 2408.08502 | link |
| 2024-08-15 | Beyond Uniform Query Distribution: Key-Driven Grouped Query Attention | Zohaib Khan et.al. | 2408.08454 | null |
| 2024-08-15 | Predictive uncertainty estimation in deep learning for lung carcinoma classification in digital pathology under real dataset shifts | Abdur R. Fayjie et.al. | 2408.08432 | null |
| 2024-08-15 | SLCA++: Unleash the Power of Sequential Fine-tuning for Continual Learning with Pre-training | Gengwei Zhang et.al. | 2408.08295 | link |
| 2024-08-15 | Moving Healthcare AI-Support Systems for Visually Detectable Diseases onto Constrained Devices | Tess Watt et.al. | 2408.08215 | null |
| 2024-08-15 | Towards flexible perception with visual memory | Robert Geirhos et.al. | 2408.08172 | null |
| 2024-08-15 | Category-Prompt Refined Feature Learning for Long-Tailed Multi-Label Image Classification | Jiexuan Yan et.al. | 2408.08125 | link |
| 2024-08-15 | HAIR: Hypernetworks-based All-in-One Image Restoration | Jin Cao et.al. | 2408.08091 | link |
| 2024-08-14 | Large Language Models Prompting With Episodic Memory | Dai Do et.al. | 2408.07465 | null |
| 2024-08-14 | Leveraging Perceptual Scores for Dataset Pruning in Computer Vision Tasks | Raghavendra Singh et.al. | 2408.07243 | null |
| 2024-08-13 | Efficient Search for Customized Activation Functions with Gradient Descent | Lukas Strack et.al. | 2408.06820 | link |
| 2024-08-13 | Do Vision-Language Foundational models show Robust Visual Perception? | Shivam Chandhok et.al. | 2408.06781 | link |
| 2024-08-13 | Towards Cross-Domain Single Blood Cell Image Classification via Large-Scale LoRA-based Segment Anything Model | Yongcheng Li et.al. | 2408.06716 | link |
| 2024-08-13 | Coherence Awareness in Diffractive Neural Networks | Matan Kleiner et.al. | 2408.06681 | null |
| 2024-08-12 | Is it a work or leisure travel? Applying text classification to identify work-related travel on social networks | Lucas Félix et.al. | 2408.06341 | null |
| 2024-08-12 | Audio Enhancement for Computer Audition – An Iterative Training Paradigm Using Sample Importance | Manuel Milling et.al. | 2408.06264 | null |
| 2024-08-12 | Deep Learning System Boundary Testing through Latent Space Style Mixing | Amr Abdellatif et.al. | 2408.06258 | null |
| 2024-08-12 | Global-to-Local Support Spectrums for Language Model Explainability | Lucas Agussurja et.al. | 2408.05976 | null |
| 2024-08-12 | A Simple Task-aware Contrastive Local Descriptor Selection Strategy for Few-shot Learning between inter class and intra class | Qian Qiao et.al. | 2408.05953 | null |
| 2024-08-12 | Classifier Guidance Enhances Diffusion-based Adversarial Purification by Preserving Predictive Information | Mingkun Zhang et.al. | 2408.05900 | null |
| 2024-08-11 | HiLight: A Hierarchy-aware Light Global Model with Hierarchical Local ConTrastive Learning | Zhijian Chen et.al. | 2408.05786 | null |
| 2024-08-11 | PRECISe: Prototype-Reservation for Explainable Classification under Imbalanced and Scarce-Data Settings | Vaibhav Ganatra et.al. | 2408.05754 | null |
| 2024-08-11 | Disposable-key-based image encryption for collaborative learning of Vision Transformer | Rei Aso et.al. | 2408.05737 | null |
| 2024-08-11 | A Novel Momentum-Based Deep Learning Techniques for Medical Image Classification and Segmentation | Koushik Biswas et.al. | 2408.05692 | null |
| 2024-08-09 | A conformalized learning of a prediction set with applications to medical imaging classification | Roy Hirsch et.al. | 2408.05037 | null |
| 2024-08-09 | Generalisation First, Memorisation Second? Memorisation Localisation for Natural Language Classification Tasks | Verna Dankers et.al. | 2408.04965 | null |
| 2024-08-09 | LiD-FL: Towards List-Decodable Federated Learning | Hong Liu et.al. | 2408.04963 | null |
| 2024-08-09 | In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation | Dahyun Kang et.al. | 2408.04961 | link |
| 2024-08-08 | Enhanced Prototypical Part Network (EPPNet) For Explainable Image Classification Via Prototypes | Bhushan Atote et.al. | 2408.04606 | null |
| 2024-08-08 | SCENE: Evaluating Explainable AI Techniques Using Soft Counterfactuals | Haoran Zheng et.al. | 2408.04575 | null |
| 2024-08-08 | An experimental comparative study of backpropagation and alternatives for training binary neural networks for image classification | Ben Crulis et.al. | 2408.04460 | null |
| 2024-08-08 | Dual-branch PolSAR Image Classification Based on GraphMAE and Local Feature Extraction | Yuchen Wang et.al. | 2408.04294 | null |
| 2024-08-07 | FMiFood: Multi-modal Contrastive Learning for Food Image Classification | Xinyue Pan et.al. | 2408.03922 | null |
| 2024-08-07 | Leveraging Variation Theory in Counterfactual Data Augmentation for Optimized Active Learning | Simret Araya Gebreegziabher et.al. | 2408.03819 | null |
| 2024-08-07 | Intuitionistic Fuzzy Cognitive Maps for Interpretable Image Classification | Georgia Sovatzidi et.al. | 2408.03745 | null |
| 2024-08-07 | CAS-ViT: Convolutional Additive Self-attention Vision Transformers for Efficient Mobile Applications | Tianfang Zhang et.al. | 2408.03703 | link |
| 2024-08-07 | Designing Extremely Memory-Efficient CNNs for On-device Vision Tasks | Jaewook Lee et.al. | 2408.03663 | null |
| 2024-08-07 | Making Robust Generalizers Less Rigid with Soft Ascent-Descent | Matthew J. Holland et.al. | 2408.03619 | null |
| 2024-08-06 | AI Foundation Models in Remote Sensing: A Survey | Siqi Lu et.al. | 2408.03464 | null |
| 2024-08-06 | Compress and Compare: Interactively Evaluating Efficiency and Behavior Across ML Model Compression Experiments | Angie Boggust et.al. | 2408.03274 | null |
| 2024-08-06 | A Debiased Nearest Neighbors Framework for Multi-Label Text Classification | Zifeng Cheng et.al. | 2408.03202 | null |
| 2024-08-06 | Leveraging Parameter Efficient Training Methods for Low Resource Text Classification: A Case Study in Marathi | Pranita Deshmukh et.al. | 2408.03172 | null |
| 2024-08-06 | Comb, Prune, Distill: Towards Unified Pruning for Vision Model Compression | Jonas Schmitt et.al. | 2408.03046 | null |
| 2024-08-06 | L3iTC at the FinLLM Challenge Task: Quantization for Financial Text Classification & Summarization | Elvys Linhares Pontes et.al. | 2408.03033 | null |
| 2024-08-06 | Adversarial Robustness of Open-source Text Classification Models and Fine-Tuning Chains | Hao Qin et.al. | 2408.02963 | null |
| 2024-08-06 | Dual-View Pyramid Pooling in Deep Neural Networks for Improved Medical Image Classification and Confidence Calibration | Xiaoqing Zhang et.al. | 2408.02906 | null |
| 2024-08-05 | Interpretation of the Intent Detection Problem as Dynamics in a Low-dimensional Space | Eduardo Sanchez-Karhunen et.al. | 2408.02838 | null |
| 2024-08-05 | Pre-trained Encoder Inference: Revealing Upstream Encoders In Downstream Machine Learning Services | Shaopeng Fu et.al. | 2408.02814 | null |
| 2024-08-05 | FPT+: A Parameter and Memory Efficient Transfer Learning Method for High-resolution Medical Image Classification | Yijin Huang et.al. | 2408.02426 | null |
| 2024-08-05 | On the Robustness of Malware Detectors to Adversarial Samples | Muhammad Salman et.al. | 2408.02310 | null |
| 2024-08-05 | Low-Cost Self-Ensembles Based on Multi-Branch Transformation and Grouped Convolution | Hojung Lee et.al. | 2408.02307 | null |
| 2024-08-05 | Network Fission Ensembles for Low-Cost Self-Ensembles | Hojung Lee et.al. | 2408.02301 | null |
| 2024-08-04 | VidModEx: Interpretable and Efficient Black Box Model Extraction for High-Dimensional Spaces | Somnath Sendhil Kumar et.al. | 2408.02140 | link |
| 2024-08-04 | DeMansia: Mamba Never Forgets Any Tokens | Ricky Fang et.al. | 2408.01986 | null |
| 2024-08-06 | A Survey and Evaluation of Adversarial Attacks for Object Detection | Khoi Nguyen Tiet Nguyen et.al. | 2408.01934 | null |
| 2024-08-03 | Safe Semi-Supervised Contrastive Learning Using In-Distribution Data as Positive Examples | Min Gu Kwak et.al. | 2408.01872 | null |
| 2024-08-03 | LAM3D: Leveraging Attention for Monocular 3D Object Detection | Diana-Alexandra Sas et.al. | 2408.01739 | null |
| 2024-08-02 | Counterfactual Explanations for Medical Image Classification and Regression using Diffusion Autoencoder | Matan Atad et.al. | 2408.01571 | null |
| 2024-08-02 | Spatial-Spectral Morphological Mamba for Hyperspectral Image Classification | Muhammad Ahmad et.al. | 2408.01372 | link |
| 2024-08-02 | WaveMamba: Spatial-Spectral Wavelet Mamba for Hyperspectral Image Classification | Muhammad Ahmad et.al. | 2408.01231 | null |
| 2024-08-02 | Multi-head Spatial-Spectral Mamba for Hyperspectral Image Classification | Muhammad Ahmad et.al. | 2408.01224 | link |
| 2024-08-02 | Rethinking Pre-trained Feature Extractor Selection in Multiple Instance Learning for Whole Slide Image Classification | Bryan Wong et.al. | 2408.01167 | null |
| 2024-08-01 | CERT-ED: Certifiably Robust Text Classification for Edit Distance | Zhuoqun Huang et.al. | 2408.00728 | null |
| 2024-08-01 | Deep Learning in Medical Image Classification from MRI-based Brain Tumor Images | Xiaoyi Liu et.al. | 2408.00636 | null |
| 2024-08-01 | DECIDER: Leveraging Foundation Model Priors for Improved Model Failure Detection and Explanation | Rakshith Subramanyam et.al. | 2408.00331 | null |
| 2024-07-31 | Vera Verto: Multimodal Hijacking Attack | Minxing Zhang et.al. | 2408.00129 | null |
| 2024-07-31 | Learning Video Context as Interleaved Multimodal Sequences | Kevin Qinghong Lin et.al. | 2407.21757 | link |
| 2024-07-30 | Contrasting Deep Learning Models for Direct Respiratory Insufficiency Detection Versus Blood Oxygen Saturation Estimation | Marcelo Matheus Gauy et.al. | 2407.20989 | null |
| 2024-07-30 | Faithful and Plausible Natural Language Explanations for Image Classification: A Pipeline Approach | Adam Wojciechowski et.al. | 2407.20899 | null |
| 2024-08-01 | DFE-IANet: A Method for Polyp Image Classification Based on Dual-domain Feature Extraction and Interaction Attention | Wei Wang et.al. | 2407.20843 | null |
| 2024-08-01 | The Susceptibility of Example-Based Explainability Methods to Class Outliers | Ikhtiyor Nematov et.al. | 2407.20678 | null |
| 2024-07-30 | Knowledge Fused Recognition: Fusing Hierarchical Knowledge for Image Recognition through Quantitative Relativity Modeling and Deep Metric Learning | Yunfeng Zhao et.al. | 2407.20600 | null |
| 2024-07-30 | Exploring Liquid Neural Networks on Loihi-2 | Wiktoria Agata Pawlak et.al. | 2407.20590 | null |
| 2024-07-29 | Graphite: A Graph-based Extreme Multi-Label Short Text Classifier for Keyphrase Recommendation | Ashirbad Mishra et.al. | 2407.20462 | null |
| 2024-07-29 | Diffusion Feedback Helps CLIP See Better | Wenxuan Wang et.al. | 2407.20171 | link |
| 2024-07-29 | Distilling High Diagnostic Value Patches for Whole Slide Image Classification Using Attention Mechanism | Tianhang Nan et.al. | 2407.19821 | null |
| 2024-07-28 | Competition-based Adaptive ReLU for Deep Neural Networks | Junjia Chen et.al. | 2407.19441 | null |
| 2024-07-28 | Depth-Wise Convolutions in Vision Transformers for Efficient Training on Small Datasets | Tianxiao Zhang et.al. | 2407.19394 | link |
| 2024-07-27 | Inference-Time Selective Debiasing | Gleb Kuzmin et.al. | 2407.19345 | null |
| 2024-07-27 | Stellar Blend Image Classification Using Computationally Efficient Gaussian Processes | Chinedu Eleh et.al. | 2407.19297 | null |
| 2024-07-27 | Towards Robust Few-shot Class Incremental Learning in Audio Classification using Contrastive Representation | Riyansha Singh et.al. | 2407.19265 | null |
| 2024-07-27 | A Survey of Malware Detection Using Deep Learning | Ahmed Bensaoud et.al. | 2407.19153 | null |
| 2024-07-26 | UniForensics: Face Forgery Detection via General Facial Representation | Ziyuan Fang et.al. | 2407.19079 | null |
| 2024-07-26 | A Scalable Quantum Non-local Neural Network for Image Classification | Sparsh Gupta et.al. | 2407.18906 | link |
| 2024-07-26 | Unifying Visual and Semantic Feature Spaces with Diffusion Models for Enhanced Cross-Modal Alignment | Yuze Zheng et.al. | 2407.18854 | null |
| 2024-07-26 | Local Binary Pattern (LBP) Optimization for Feature Extraction | Zeinab Sedaghatjoo et.al. | 2407.18665 | null |
| 2024-07-26 | Topology Optimization of Random Memristors for Input-Aware Dynamic SNN | Bo Wang et.al. | 2407.18625 | null |
| 2024-07-26 | Content-driven Magnitude-Derivative Spectrum Complementary Learning for Hyperspectral Image Classification | Huiyan Bai et.al. | 2407.18593 | null |
| 2024-07-26 | VSSD: Vision Mamba with Non-Causal State Space Duality | Yuheng Shi et.al. | 2407.18559 | link |
| 2024-07-25 | Self-supervised pre-training with diffusion model for few-shot landmark detection in x-ray images | Roberto Di Via et.al. | 2407.18125 | null |
| 2024-07-25 | Mew: Multiplexed Immunofluorescence Image Analysis through an Efficient Multiplex Network | Sukwon Yun et.al. | 2407.17857 | link |
| 2024-07-25 | SAM-MIL: A Spatial Contextual Aware Multiple Instance Learning Approach for Whole Slide Image Classification | Heng Fang et.al. | 2407.17689 | link |
| 2024-07-26 | Unsqueeze [CLS] Bottleneck to Learn Rich Representations | Qing Su et.al. | 2407.17671 | link |
| 2024-07-24 | Explaining the Model, Protecting Your Data: Revealing and Mitigating the Data Privacy Risks of Post-Hoc Model Explanations via Membership Inference | Catherine Huang et.al. | 2407.17663 | null |
| 2024-07-23 | S-E Pipeline: A Vision Transformer (ViT) based Resilient Classification Pipeline for Medical Imaging Against Adversarial Attacks | Neha A S et.al. | 2407.17587 | null |
| 2024-07-24 | A Novel Two-Step Fine-Tuning Pipeline for Cold-Start Active Learning in Text Classification Tasks | Fabiano Belém et.al. | 2407.17284 | null |
| 2024-07-24 | Graph Neural Networks: A suitable Alternative to MLPs in Latent 3D Medical Image Classification? | Johannes Kiechle et.al. | 2407.17219 | link |
| 2024-07-24 | Quanv4EO: Empowering Earth Observation by means of Quanvolutional Neural Networks | Alessandro Sebastianelli et.al. | 2407.17108 | null |
| 2024-07-24 | An Adaptive Gradient Regularization Method | Huixiu Jiang et.al. | 2407.16944 | null |
| 2024-07-23 | Lawma: The Power of Specialization for Legal Tasks | Ricardo Dominguez-Olmedo et.al. | 2407.16615 | null |
| 2024-07-23 | Deep Bayesian segmentation for colon polyps: Well-calibrated predictions in medical imaging | Daniela L. Ramos et.al. | 2407.16608 | null |
| 2024-07-23 | Designing robust diffractive neural networks with improved transverse shift tolerance | Daniil V. Soshnikov et.al. | 2407.16456 | null |
| 2024-07-23 | Image Classification using Fuzzy Pooling in Convolutional Kolmogorov-Arnold Networks | Ayan Igali et.al. | 2407.16268 | null |
| 2024-07-23 | HSVLT: Hierarchical Scale-Aware Vision-Language Transformer for Multi-Label Image Classification | Shuyi Ouyang et.al. | 2407.16244 | null |
| 2024-07-23 | Improved Few-Shot Image Classification Through Multiple-Choice Questions | Dipika Khullar et.al. | 2407.16145 | null |
| 2024-07-22 | Pavement Fatigue Crack Detection and Severity Classification Based on Convolutional Neural Network | Zhen Wang et.al. | 2407.16021 | null |
| 2024-07-22 | AIDE: Antithetical, Intent-based, and Diverse Example-Based Explanations | Ikhtiyor Nematov et.al. | 2407.16010 | null |
| 2024-07-22 | Comprehensive Study on Performance Evaluation and Optimization of Model Compression: Bridging Traditional Deep Learning and Large Language Models | Aayush Saxena et.al. | 2407.15904 | null |
| 2024-07-22 | Beyond Size and Class Balance: Alpha as a New Dataset Quality Metric for Deep Learning | Josiah Couch et.al. | 2407.15724 | null |
| 2024-07-22 | Retinomorphic Feature Detection and Machine Vision in a Network Laser | Wai Kit Ng et.al. | 2407.15558 | null |
| 2024-07-22 | Learning deep illumination-robust features from multispectral filter array images | Anis Amziane et.al. | 2407.15472 | null |
| 2024-07-22 | Is user feedback always informative? Retrieval Latent Defending for Semi-Supervised Domain Adaptation without Source Data | Junha Song et.al. | 2407.15383 | null |
| 2024-07-22 | FMDNN: A Fuzzy-guided Multi-granular Deep Neural Network for Histopathological Image Classification | Weiping Ding et.al. | 2407.15312 | null |
| 2024-07-21 | Assessing Sample Quality via the Latent Space of Generative Models | Jingyi Xu et.al. | 2407.15171 | null |
| 2024-07-21 | A multi-level multi-label text classification dataset of 19th century Ottoman and Russian literary and critical texts | Gokcen Gokceoglu et.al. | 2407.15136 | null |
| 2024-07-20 | Toward Efficient Convolutional Neural Networks With Structured Ternary Patterns | Christos Kyrkou et.al. | 2407.14831 | link |
| 2024-07-20 | Subgraph Clustering and Atom Learning for Improved Image Classification | Aryan Singh et.al. | 2407.14772 | null |
| 2024-07-20 | A Comprehensive Review of Few-shot Action Recognition | Yuyang Wanyan et.al. | 2407.14744 | null |
| 2024-07-19 | DEPICT: Diffusion-Enabled Permutation Importance for Image Classification Tasks | Sarah Jabbour et.al. | 2407.14509 | null |
| 2024-07-19 | Enhancing Zero-shot Audio Classification using Sound Attribute Knowledge from Large Language Models | Xuenan Xu et.al. | 2407.14355 | null |
| 2024-07-19 | EmoCAM: Toward Understanding What Drives CNN-based Emotion Recognition | Youssef Doulfoukar et.al. | 2407.14314 | null |
| 2024-07-18 | CoAPT: Context Attribute words for Prompt Tuning | Gun Lee et.al. | 2407.13808 | null |
| 2024-07-18 | GroupMamba: Parameter-Efficient and Accurate Group Visual State Space Model | Abdelrahman Shaker et.al. | 2407.13772 | link |
| 2024-07-18 | Addressing Imbalance for Class Incremental Learning in Medical Image Classification | Xuze Hao et.al. | 2407.13768 | null |
| 2024-07-18 | Differential Privacy Mechanisms in Neural Tangent Kernel Regression | Jiuxiang Gu et.al. | 2407.13621 | null |
| 2024-07-18 | CycleMix: Mixing Source Domains for Domain Generalization in Style-Dependent Data | Aristotelis Ballas et.al. | 2407.13421 | link |
| 2024-07-17 | LookupViT: Compressing visual information to a limited number of tokens | Rajat Koner et.al. | 2407.12753 | null |
| 2024-07-17 | Toward INT4 Fixed-Point Training via Exploring Quantization Error for Gradients | Dohyung Kim et.al. | 2407.12637 | null |
| 2024-07-17 | Domain-specific or Uncertainty-aware models: Does it really make a difference for biomedical text classification? | Aman Sinha et.al. | 2407.12626 | null |
| 2024-07-18 | Benchmarking Robust Self-Supervised Learning Across Diverse Downstream Tasks | Antoni Kowalczuk et.al. | 2407.12588 | link |
| 2024-07-17 | Non-parametric regularization for class imbalance federated medical image classification | Jeffry Wicaksana et.al. | 2407.12446 | link |
| 2024-07-17 | FETCH: A Memory-Efficient Replay Approach for Continual Learning in Image Classification | Markus Weißflog et.al. | 2407.12375 | null |
| 2024-07-17 | Adaptive Cascading Network for Continual Test-Time Adaptation | Kien X. Nguyen et.al. | 2407.12240 | null |
| 2024-07-16 | Generalized Coverage for More Robust Low-Budget Active Learning | Wonho Bae et.al. | 2407.12212 | null |
| 2024-07-18 | A Closer Look at Benchmarking Self-Supervised Pre-training with Image Classification | Markus Marks et.al. | 2407.12210 | null |
| 2024-07-16 | Novel Artistic Scene-Centric Datasets for Effective Transfer Learning in Fragrant Spaces | Shumei Liu et.al. | 2407.11701 | null |
| 2024-07-16 | Probing the Efficacy of Federated Parameter-Efficient Fine-Tuning of Vision Transformers for Medical Image Classification | Naif Alkhunaizi et.al. | 2407.11573 | null |
| 2024-07-16 | TCFormer: Visual Recognition via Token Clustering Transformer | Wang Zeng et.al. | 2407.11321 | link |
| 2024-07-16 | PADRe: A Unifying Polynomial Attention Drop-in Replacement for Efficient Vision Transformer | Pierre-David Letourneau et.al. | 2407.11306 | null |
| 2024-07-15 | Unconstrained Open Vocabulary Image Classification: Zero-Shot Transfer from Text to Image via CLIP Inversion | Philipp Allgeuer et.al. | 2407.11211 | null |
| 2024-07-16 | DataDream: Few-shot Guided Dataset Generation | Jae Myung Kim et.al. | 2407.10910 | link |
| 2024-07-15 | Pathology-knowledge Enhanced Multi-instance Prompt Learning for Few-shot Whole Slide Image Classification | Linhao Qu et.al. | 2407.10814 | null |
| 2024-07-15 | Employing Sentence Space Embedding for Classification of Data Stream from Fake News Domain | Paweł Zyblewski et.al. | 2407.10807 | null |
| 2024-07-15 | Anticipating Future Object Compositions without Forgetting | Youssef Zahran et.al. | 2407.10723 | null |
| 2024-07-15 | GeoMix: Towards Geometry-Aware Data Augmentation | Wentao Zhao et.al. | 2407.10681 | link |
| 2024-07-15 | Learning Natural Consistency Representation for Face Forgery Video Detection | Daichi Zhang et.al. | 2407.10550 | null |
| 2024-07-15 | Improving Hyperbolic Representations via Gromov-Wasserstein Regularization | Yifei Yang et.al. | 2407.10495 | null |
| 2024-07-15 | Backdoor Attacks against Image-to-Image Networks | Wenbo Jiang et.al. | 2407.10445 | null |
| 2024-07-14 | Deep Learning Algorithms for Early Diagnosis of Acute Lymphoblastic Leukemia | Dimitris Papaioannou et.al. | 2407.10251 | null |
| 2024-07-14 | Advancing Continual Learning for Robust Deepfake Audio Classification | Feiyi Dong et.al. | 2407.10108 | null |
| 2024-07-12 | Evaluating the Adversarial Robustness of Semantic Segmentation: Trying Harder Pays Off | Levente Halmosi et.al. | 2407.09150 | link |
| 2024-07-12 | Open Vocabulary Multi-Label Video Classification | Rohit Gupta et.al. | 2407.09073 | null |
| 2024-07-12 | GPC: Generative and General Pathology Image Classifier | Anh Tien Nguyen et.al. | 2407.09035 | null |
| 2024-07-12 | CAMP: Continuous and Adaptive Learning Model in Pathology | Anh Tien Nguyen et.al. | 2407.09030 | null |
| 2024-07-12 | SlideGCD: Slide-based Graph Collaborative Training with Knowledge Distillation for Whole Slide Image Classification | Tong Shu et.al. | 2407.08968 | null |
| 2024-07-12 | Domain-Hierarchy Adaptation via Chain of Iterative Reasoning for Few-shot Hierarchical Text Classification | Ke Ji et.al. | 2407.08959 | null |
| 2024-07-11 | Local Clustering for Lung Cancer Image Classification via Sparse Solution Technique | Jackson Hamel et.al. | 2407.08800 | null |
| 2024-07-11 | Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification | Wenshuo Peng et.al. | 2407.08787 | null |
| 2024-07-11 | ElasticAST: An Audio Spectrogram Transformer for All Length and Resolutions | Jiu Feng et.al. | 2407.08691 | link |
| 2024-07-11 | Histopathological Image Classification with Cell Morphology Aware Deep Neural Networks | Andrey Ignatov et.al. | 2407.08625 | link |
| 2024-07-11 | BiasPruner: Debiased Continual Learning for Medical Image Classification | Nourhan Bayasi et.al. | 2407.08609 | link |
| 2024-07-11 | GraphMamba: An Efficient Graph Structure Learning Vision Mamba for Hyperspectral Image Classification | Aitao Yang et.al. | 2407.08255 | link |
| 2024-07-11 | Beyond Text: Leveraging Multi-Task Learning and Cognitive Appraisal Theory for Post-Purchase Intention Analysis | Gerard Christopher Yeo et.al. | 2407.08182 | null |
| 2024-07-11 | Enrich the content of the image Using Context-Aware Copy Paste | Qiushi Guo et.al. | 2407.08151 | null |
| 2024-07-10 | MambaVision: A Hybrid Mamba-Transformer Vision Backbone | Ali Hatamizadeh et.al. | 2407.08083 | link |
| 2024-07-10 | The Misclassification Likelihood Matrix: Some Classes Are More Likely To Be Misclassified Than Others | Daniel Sikar et.al. | 2407.07818 | null |
| 2024-07-11 | Trainable Highly-expressive Activation Functions | Irit Chelly et.al. | 2407.07564 | null |
| 2024-07-10 | HDKD: Hybrid Data-Efficient Knowledge Distillation Network for Medical Image Classification | Omar S. EL-Assiouti et.al. | 2407.07516 | null |
| 2024-07-10 | Towards a text-based quantitative and explainable histopathology image analysis | Anh Tien Nguyen et.al. | 2407.07360 | null |
| 2024-07-11 | FALFormer: Feature-aware Landmarks self-attention for Whole-slide Image Classification | Doanh C. Bui et.al. | 2407.07340 | link |
| 2024-07-10 | Dual-stage Hyperspectral Image Classification Model with Spectral Supertoken | Peifu Liu et.al. | 2407.07307 | link |
| 2024-07-09 | Exploring Camera Encoder Designs for Autonomous Driving Perception | Barath Lakshmanan et.al. | 2407.07276 | null |
| 2024-07-09 | CTRL-F: Pairing Convolution with Transformer for Image Classification via Multi-Level Feature Cross-Attention and Representation Learning Fusion | Hosam S. EL-Assiouti et.al. | 2407.06673 | null |
| 2024-07-09 | NoisyAG-News: A Benchmark for Addressing Instance-Dependent Noise in Text Classification | Hongfei Huang et.al. | 2407.06579 | null |
| 2024-07-08 | Hybrid Classical-Quantum architecture for vectorised image classification of hand-written sketches | Y. Cordero et.al. | 2407.06416 | null |
| 2024-07-08 | GeoWATCH for Detecting Heavy Construction in Heterogeneous Time Series of Satellite Images | Jon Crall et.al. | 2407.06337 | null |
| 2024-07-08 | Multi-Label Plant Species Classification with Self-Supervised Vision Transformers | Murilo Gustineli et.al. | 2407.06298 | link |
| 2024-07-08 | Active Label Refinement for Robust Training of Imbalanced Medical Image Classification Tasks in the Presence of High Label Noise | Bidur Khanal et.al. | 2407.05973 | null |
| 2024-07-08 | Wavelet Convolutions for Large Receptive Fields | Shahaf E. Finder et.al. | 2407.05848 | link |
| 2024-07-08 | Evaluating the Fairness of Neural Collapse in Medical Image Classification | Kaouther Mouheb et.al. | 2407.05843 | null |
| 2024-07-08 | Learning to Adapt Category Consistent Meta-Feature of CLIP for Few-Shot Classification | Jiaying Shi et.al. | 2407.05647 | null |
| 2024-07-08 | New Directions in Text Classification Research: Maximizing The Performance of Sentiment Classification from Limited Data | Surya Agustian et.al. | 2407.05627 | null |
| 2024-07-08 | Momentum Auxiliary Network for Supervised Local Learning | Junhao Su et.al. | 2407.05623 | link |
| 2024-07-08 | Open-world Multi-label Text Classification with Extremely Weak Supervision | Xintong Li et.al. | 2407.05609 | link |
| 2024-07-08 | FALIP: Visual Prompt as Foveal Attention Boosts CLIP Zero-Shot Performance | Jiedong Zhuang et.al. | 2407.05578 | null |
| 2024-07-08 | An accurate detection is not all you need to combat label noise in web-noisy datasets | Paul Albert et.al. | 2407.05528 | null |
| 2024-07-07 | Leveraging Topological Guidance for Improved Knowledge Distillation | Eun Som Jeon et.al. | 2407.05316 | link |
| 2024-07-05 | AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation | Yuhan Zhu et.al. | 2407.04603 | null |
| 2024-07-05 | AMD: Automatic Multi-step Distillation of Large-scale Vision Models | Cheng Han et.al. | 2407.04208 | null |
| 2024-07-04 | LeDNet: Localization-enabled Deep Neural Network for Multi-Label Radiography Image Classification | Lalit Pant et.al. | 2407.03931 | null |
| 2024-07-04 | DocXplain: A Novel Model-Agnostic Explainability Method for Document Image Classification | Saifullah Saifullah et.al. | 2407.03830 | null |
| 2024-07-04 | reBEN: Refined BigEarthNet Dataset for Remote Sensing Image Analysis | Kai Norman Clasen et.al. | 2407.03653 | link |
| 2024-07-04 | Resampled Datasets Are Not Enough: Mitigating Societal Bias Beyond Single Attributes | Yusuke Hirota et.al. | 2407.03623 | null |
| 2024-07-04 | Self Adaptive Threshold Pseudo-labeling and Unreliable Sample Contrastive Loss for Semi-supervised Image Classification | Xuerong Zhang et.al. | 2407.03596 | null |
| 2024-07-04 | DGR-MIL: Exploring Diverse Global Representation in Multiple Instance Learning for Whole Slide Image Classification | Wenhui Zhu et.al. | 2407.03575 | link |
| 2024-07-03 | A multicategory jet image classification framework using deep neural network | Jairo Orozco Sandoval et.al. | 2407.03524 | null |
| 2024-07-03 | Model Guidance via Explanations Turns Image Classifiers into Segmentation Models | Xiaoyan Yu et.al. | 2407.03009 | null |
| 2024-07-03 | ShiftAddAug: Augment Multiplication-Free Tiny Neural Network with Hybrid Computation | Yipin Guo et.al. | 2407.02881 | null |
| 2024-07-03 | Fine-Grained Scene Image Classification with Modality-Agnostic Adapter | Yiqun Wang et.al. | 2407.02769 | link |
| 2024-07-03 | ADFQ-ViT: Activation-Distribution-Friendly Post-Training Quantization for Vision Transformers | Yanfeng Jiang et.al. | 2407.02763 | null |
| 2024-07-02 | Spectral Graph Reasoning Network for Hyperspectral Image Classification | Huiling Wang et.al. | 2407.02647 | null |
| 2024-07-01 | CGRclust: Chaos Game Representation for Twin Contrastive Clustering of Unlabelled DNA Sequences | Fatemeh Alipour et.al. | 2407.02538 | link |
| 2024-07-02 | Exploring the Role of Transliteration in In-Context Learning for Low-resource Languages Written in Non-Latin Scripts | Chunlan Ma et.al. | 2407.02320 | null |
| 2024-07-03 | Federated Distillation for Medical Image Classification: Towards Trustworthy Computer-Aided Diagnosis | Sufen Ren et.al. | 2407.02261 | null |
| 2024-07-02 | Hybrid Feature Collaborative Reconstruction Network for Few-Shot Fine-Grained Image Classification | Shulei Qiu et.al. | 2407.02123 | null |
| 2024-07-01 | Optimized Learning for X-Ray Image Classification for Multi-Class Disease Diagnoses with Accelerated Computing Strategies | Sebastian A. Cruz Romero et.al. | 2407.01705 | null |
| 2024-07-02 | xLSTM-UNet can be an Effective 2D & 3D Medical Image Segmentation Backbone with Vision-LSTM (ViL) better than its Mamba Counterpart | Tianrun Chen et.al. | 2407.01530 | link |
| 2024-07-01 | Scarecrow monitoring system: employing mobilenet ssd for enhanced animal supervision | Balaji VS et.al. | 2407.01435 | null |
| 2024-07-01 | Semantic Compositions Enhance Vision-Language Contrastive Learning | Maxwell Aladago et.al. | 2407.01408 | null |
| 2024-07-01 | GalLoP: Learning Global and Local Prompts for Vision-Language Models | Marc Lafon et.al. | 2407.01400 | null |
| 2024-07-01 | Protecting Privacy in Classifiers by Token Manipulation | Re’em Harel et.al. | 2407.01334 | null |
| 2024-07-01 | Gradient-based Class Weighting for Unsupervised Domain Adaptation in Dense Prediction Visual Tasks | Roberto Alcover-Couso et.al. | 2407.01327 | null |
| 2024-06-28 | Extract More from Less: Efficient Fine-Grained Visual Recognition in Low-Data Regimes | Dmitry Demidov et.al. | 2406.19814 | link |
| 2024-06-27 | Fibottention: Inceptive Visual Representation Learning with Diverse Attention Across Heads | Ali Khaleghi Rahimian et.al. | 2406.19391 | link |
| 2024-06-27 | Learning Visual Conditioning Tokens to Correct Domain Shift for Fully Test-time Adaptation | Yushun Tang et.al. | 2406.19341 | null |
| 2024-06-27 | Spiking Convolutional Neural Networks for Text Classification | Changze Lv et.al. | 2406.19230 | link |
| 2024-06-27 | Adaptive Stochastic Weight Averaging | Caglar Demir et.al. | 2406.19092 | link |
| 2024-06-27 | FedMLP: Federated Multi-Label Medical Image Classification under Task Heterogeneity | Zhaobin Sun et.al. | 2406.18995 | link |
| 2024-06-26 | Detecting Machine-Generated Texts: Not Just “AI vs Humans” and Explainability is Complicated | Jiazhou Ji et.al. | 2406.18259 | null |
| 2024-06-26 | ViT-1.58b: Mobile Vision Transformers in the 1-bit Era | Zhengqing Yuan et.al. | 2406.18051 | null |
| 2024-06-25 | Benchmarking Deep Learning Models on NVIDIA Jetson Nano for Real-Time Systems: An Empirical Investigation | Tushar Prasanna Swaminathan et.al. | 2406.17749 | link |
| 2024-06-25 | Structured Unrestricted-Rank Matrices for Parameter Efficient Fine-tuning | Arijit Sehanobish et.al. | 2406.17740 | null |
| 2024-06-25 | BayTTA: Uncertainty-aware medical image classification with optimized test-time augmentation using Bayesian model averaging | Zeinab Sherkatghanad et.al. | 2406.17640 | link |
| 2024-06-26 | Mitigate the Gap: Investigating Approaches for Improving Cross-Modal Alignment in CLIP | Sedigheh Eslami et.al. | 2406.17639 | null |
| 2024-06-25 | Knowledge Distillation in Automated Annotation: Supervised Text Classification with LLM-Generated Training Labels | Nicholas Pangakis et.al. | 2406.17633 | null |
| 2024-06-25 | Retrieval-style In-Context Learning for Few-shot Hierarchical Text Classification | Huiyao Chen et.al. | 2406.17534 | link |
| 2024-06-25 | TSynD: Targeted Synthetic Data Generation for Enhanced Medical Image Classification | Joshua Niemeijer et.al. | 2406.17473 | null |
| 2024-06-25 | Dynamic Scheduling for Vehicle-to-Vehicle Communications Enhanced Federated Learning | Jintao Yan et.al. | 2406.17470 | null |
| 2024-06-25 | Implicit-Zoo: A Large-Scale Dataset of Neural Implicit Functions for 2D Images and 3D Scenes | Qi Ma et.al. | 2406.17438 | null |
| 2024-06-25 | Robustly Optimized Deep Feature Decoupling Network for Fatty Liver Diseases Detection | Peng Huang et.al. | 2406.17338 | null |
| 2024-06-24 | Evaluation of Language Models in the Medical Context Under Resource-Constrained Settings | Andrea Posada et.al. | 2406.16611 | link |
| 2024-06-24 | Improving robustness to corruptions with multiplicative weight perturbations | Trung Trinh et.al. | 2406.16540 | null |
| 2024-06-24 | UNICAD: A Unified Approach for Attack Detection, Noise Reduction and Novel Class Identification | Alvaro Lopez Pellicer et.al. | 2406.16501 | null |
| 2024-06-24 | Improving Quaternion Neural Networks with Quaternionic Activation Functions | Johannes Pöppelbaum et.al. | 2406.16481 | null |
| 2024-06-24 | Learning in Wilson-Cowan model for metapopulation | Raffaele Marino et.al. | 2406.16453 | link |
| 2024-06-24 | Context-augmented Retrieval: A Novel Framework for Fast Information Retrieval based Response Generation using Large Language Model | Sai Ganesh et.al. | 2406.16383 | null |
| 2024-06-24 | Combining Supervised Learning and Reinforcement Learning for Multi-Label Classification Tasks with Partial Labels | Zixia Jia et.al. | 2406.16293 | null |
| 2024-06-23 | Jacobian Descent for Multi-Objective Optimization | Pierre Quinton et.al. | 2406.16232 | null |
| 2024-06-23 | Learning with Noisy Ground Truth: From 2D Classification to 3D Reconstruction | Yangdi Lu et.al. | 2406.15982 | null |
| 2024-06-22 | PUDD: Towards Robust Multi-modal Prototype-based Deepfake Detection | Alvaro Lopez Pellicer et.al. | 2406.15921 | null |
| 2024-06-21 | Retrieval Augmented Zero-Shot Text Classification | Tassallah Abdullahi et.al. | 2406.15241 | null |
| 2024-06-21 | DiffExplainer: Unveiling Black Box Models Via Counterfactual Generation | Yingying Fang et.al. | 2406.15182 | null |
| 2024-06-21 | This actually looks like that: Proto-BagNets for local and global interpretability-by-design | Kerol Djoumessi et.al. | 2406.15168 | link |
| 2024-06-21 | Hierarchical thematic classification of major conference proceedings | Arsentii Kuzmin et.al. | 2406.14983 | null |
| 2024-06-21 | Demonstrating the Efficacy of Kolmogorov-Arnold Networks in Vision Tasks | Minjong Cheon et.al. | 2406.14916 | link |
| 2024-06-21 | MU-Bench: A Multitask Multimodal Benchmark for Machine Unlearning | Jiali Cheng et.al. | 2406.14796 | null |
| 2024-06-20 | Depth $F_1$: Improving Evaluation of Cross-Domain Text Classification by Measuring Semantic Generalizability | Parker Seegmiller et.al. | 2406.14695 | null |
| 2024-06-20 | Automatic Labels are as Effective as Manual Labels in Biomedical Images Classification with Deep Learning | Niccolò Marini et.al. | 2406.14351 | null |
| 2024-06-20 | Self-supervised Interpretable Concept-based Models for Text Classification | Francesco De Santis et.al. | 2406.14335 | null |
| 2024-06-20 | Adaptive Adversarial Cross-Entropy Loss for Sharpness-Aware Minimization | Tanapat Ratchatorn et.al. | 2406.14329 | null |
| 2024-06-20 | Boosting Hyperspectral Image Classification with Gate-Shift-Fuse Mechanisms in a Novel CNN-Transformer Approach | Mohamed Fadhlallah Guerri et.al. | 2406.14120 | null |
| 2024-06-20 | Seg-LSTM: Performance of xLSTM for Semantic Segmentation of Remotely Sensed Images | Qinfeng Zhu et.al. | 2406.14086 | link |
| 2024-06-21 | CMTNet: Convolutional Meets Transformer Network for Hyperspectral Images Classification | Faxu Guo et.al. | 2406.14080 | null |
| 2024-06-20 | Communication-Efficient Adaptive Batch Size Strategies for Distributed Local Gradient Methods | Tim Tsz-Kit Lau et.al. | 2406.13936 | null |
| 2024-06-19 | WATT: Weight Average Test-Time Adaptation of CLIP | David Osowiechi et.al. | 2406.13875 | link |
| 2024-06-19 | CNN Based Flank Predictor for Quadruped Animal Species | Vanessa Suessle et.al. | 2406.13588 | null |
| 2024-06-19 | Online Domain-Incremental Learning Approach to Classify Acoustic Scenes in All Locations | Manjunath Mulimani et.al. | 2406.13386 | null |
| 2024-06-18 | LayerMerge: Neural Network Depth Compression through Layer Pruning and Merging | Jinuk Kim et.al. | 2406.12837 | link |
| 2024-06-18 | Privacy Preserving Federated Learning in Medical Imaging with Uncertainty Estimation | Nikolas Koutsoubis et.al. | 2406.12815 | link |
| 2024-06-18 | Online Anchor-based Training for Image Classification Tasks | Maria Tzelepi et.al. | 2406.12662 | null |
| 2024-06-18 | Fighting Randomness with Randomness: Mitigating Optimisation Instability of Fine-Tuning using Delayed Ensemble and Noisy Interpolation | Branislav Pecher et.al. | 2406.12471 | null |
| 2024-06-18 | GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory | Haoze Wu et.al. | 2406.12375 | null |
| 2024-06-18 | What Did I Do Wrong? Quantifying LLMs’ Sensitivity and Consistency to Prompt Engineering | Federico Errica et.al. | 2406.12334 | null |
| 2024-06-18 | Unleashing the Potential of Open-set Noisy Samples Against Label Noise for Medical Image Classification | Zehui Liao et.al. | 2406.12293 | null |
| 2024-06-18 | Advancing Cross-Domain Generalizability in Face Anti-Spoofing: Insights, Design, and Metrics | Hyojin Kim et.al. | 2406.12258 | null |
| 2024-06-19 | MiSuRe is all you need to explain your image segmentation | Syed Nouman Hasany et.al. | 2406.12173 | null |
| 2024-06-17 | Enhancing Text Classification through LLM-Driven Active Learning and Human Annotation | Hamidreza Rouzegar et.al. | 2406.12114 | link |
| 2024-06-17 | Scaling the Codebook Size of VQGAN to 100,000 with a Utilization Rate of 99% | Lei Zhu et.al. | 2406.11837 | link |
| 2024-06-17 | PrAViC: Probabilistic Adaptation Framework for Real-Time Video Classification | Magdalena Trędowicz et.al. | 2406.11443 | null |
| 2024-06-17 | Cross-domain Open-world Discovery | Shuo Wen et.al. | 2406.11422 | link |
| 2024-06-17 | BaFTA: Backprop-Free Test-Time Adaptation For Zero-Shot Vision-Language Models | Xuefeng Hu et.al. | 2406.11309 | null |
| 2024-06-17 | An Empirical Investigation of Matrix Factorization Methods for Pre-trained Transformers | Ashim Gupta et.al. | 2406.11307 | null |
| 2024-06-17 | Text Grafting: Near-Distribution Weak Supervision for Minority Classes in Text Classification | Letian Peng et.al. | 2406.11115 | null |
| 2024-06-16 | Fine-grained Classes and How to Find Them | Matej Grcić et.al. | 2406.11070 | link |
| 2024-06-16 | Leveraging Foundation Models for Multi-modal Federated Learning with Incomplete Modality | Liwei Che et.al. | 2406.11048 | null |
| 2024-06-16 | Curating Stopwords in Marathi: A TF-IDF Approach for Improved Text Analysis and Information Retrieval | Rohan Chavan et.al. | 2406.11029 | link |
| 2024-06-16 | Universal Cross-Lingual Text Classification | Riya Savant et.al. | 2406.11028 | null |
| 2024-06-14 | UniAudio 1.5: Large Language Model-driven Audio Codec is A Few-shot Audio Task Learner | Dongchao Yang et.al. | 2406.10056 | null |
| 2024-06-14 | Comparison of fine-tuning strategies for transfer learning in medical image classification | Ana Davila et.al. | 2406.10050 | null |
| 2024-06-14 | Forgetting Order of Continual Learning: Examples That are Learned First are Forgotten Last | Guy Hacohen et.al. | 2406.09935 | null |
| 2024-06-13 | MirrorCheck: Efficient Adversarial Defense for Vision-Language Models | Samar Fares et.al. | 2406.09250 | null |
| 2024-06-13 | Self-Training for Sample-Efficient Active Learning for Text Classification with Pre-Trained Language Models | Christopher Schröder et.al. | 2406.09206 | null |
| 2024-06-13 | Large-Scale Evaluation of Open-Set Image Classification Techniques | Halil Bisgin et.al. | 2406.09112 | link |
| 2024-06-13 | LaCoOT: Layer Collapse through Optimal Transport | Victor Quétu et.al. | 2406.08933 | null |
| 2024-06-13 | The Penalized Inverse Probability Measure for Conformal Classification | Paul Melki et.al. | 2406.08884 | null |
| 2024-06-13 | Conceptual Learning via Embedding Approximations for Reinforcing Interpretability and Transparency | Maor Dikter et.al. | 2406.08840 | link |
| 2024-06-13 | DenoiseReID: Denoising Model for Representation Learning of Person Re-Identification | Zhengrui Xu et.al. | 2406.08773 | null |
| 2024-06-12 | Fine-Tuned ‘Small’ LLMs (Still) Significantly Outperform Zero-Shot Generative AI Models in Text Classification | Martin Juan José Bucher et.al. | 2406.08660 | null |
| 2024-06-12 | Intelligent Multi-View Test Time Augmentation | Efe Ozturk et.al. | 2406.08593 | null |
| 2024-06-12 | Transformation-Dependent Adversarial Attacks | Yaoteng Tan et.al. | 2406.08443 | null |
| 2024-06-12 | AdaNCA: Neural Cellular Automata As Adaptors For More Robust Vision Transformer | Yitao Xu et.al. | 2406.08298 | null |
| 2024-06-12 | DistilDoc: Knowledge Distillation for Visually-Rich Document Applications | Jordy Van Landeghem et.al. | 2406.08226 | null |
| 2024-06-12 | Fully Few-shot Class-incremental Audio Classification Using Expandable Dual-embedding Extractor | Yongjie Si et.al. | 2406.08122 | null |
| 2024-06-12 | Low-Complexity Acoustic Scene Classification Using Parallel Attention-Convolution Network | Yanxiong Li et.al. | 2406.08119 | null |
| 2024-06-12 | A$^{2}$-MAE: A spatial-temporal-spectral unified remote sensing pre-training method based on anchor-aware masked autoencoder | Lixian Zhang et.al. | 2406.08079 | null |
| 2024-06-12 | Adversarial Evasion Attack Efficiency against Large Language Models | João Vitorino et.al. | 2406.08050 | null |
| 2024-06-12 | Accurate Explanation Model for Image Classifiers using Class Association Embedding | Ruitao Xie et.al. | 2406.07961 | link |
| 2024-06-12 | Multi-Teacher Multi-Objective Meta-Learning for Zero-Shot Hyperspectral Band Selection | Jie Feng et.al. | 2406.07949 | null |
| 2024-06-12 | Small Scale Data-Free Knowledge Distillation | He Liu et.al. | 2406.07876 | link |
| 2024-06-11 | fKAN: Fractional Kolmogorov-Arnold Networks with trainable Jacobi basis functions | Alireza Afzal Aghaei et.al. | 2406.07456 | link |
| 2024-06-11 | Minimizing Energy Costs in Deep Learning Model Training: The Gaussian Sampling Approach | Challapalli Phanindra Revanth et.al. | 2406.07332 | null |
| 2024-06-11 | Noise-Robust Voice Conversion by Conditional Denoising Training Using Latent Variables of Recording Quality and Environment | Takuto Igarashi et.al. | 2406.07280 | null |
| 2024-06-11 | EEG-ImageNet: An Electroencephalogram Dataset and Benchmarks with Image Visual Stimuli of Multi-Granularity Labels | Shuqi Zhu et.al. | 2406.07151 | link |
| 2024-06-11 | RS-Agent: Automating Remote Sensing Tasks through Intelligent Agents | Wenjia Xu et.al. | 2406.07089 | null |
| 2024-06-11 | DualMamba: A Lightweight Spectral-Spatial Mamba-Convolution Network for Hyperspectral Image Classification | Jiamu Sheng et.al. | 2406.07050 | null |
| 2024-06-11 | Fairness-Aware Meta-Learning via Nash Bargaining | Yi Zeng et.al. | 2406.07029 | null |
| 2024-06-11 | Mitigating Boundary Ambiguity and Inherent Bias for Text Classification in the Era of Large Language Models | Zhenyi Lu et.al. | 2406.07001 | link |
| 2024-06-11 | Scaling up masked audio encoder learning for general audio classification | Heinrich Dinkel et.al. | 2406.06992 | null |
| 2024-06-10 | Multi-Objective Neural Architecture Search for In-Memory Computing | Md Hasibul Amin et.al. | 2406.06746 | null |
| 2024-06-10 | Robust Latent Representation Tuning for Image-text Classification | Hao Sun et.al. | 2406.06048 | null |
| 2024-06-09 | Contrastive Learning from Synthetic Audio Doppelgangers | Manuel Cherep et.al. | 2406.05923 | null |
| 2024-06-09 | Scaling Graph Convolutions for Mobile Vision | William Avery et.al. | 2406.05850 | link |
| 2024-06-09 | Evolution-aware VAriance (EVA) Coreset Selection for Medical Image Classification | Yuxin Hong et.al. | 2406.05677 | null |
| 2024-06-09 | Which Backbone to Use: A Resource-efficient Domain Specific Comparison for Computer Vision | Pranav Jeevan et.al. | 2406.05612 | link |
| 2024-06-08 | Aligning Human Knowledge with Visual Concepts Towards Explainable Medical Image Classification | Yunhe Gao et.al. | 2406.05596 | null |
| 2024-06-07 | The Unmet Promise of Synthetic Training Images: Using Retrieved Real Images Performs Better | Scott Geng et.al. | 2406.05184 | link |
| 2024-06-07 | A Novel Time Series-to-Image Encoding Approach for Weather Phenomena Classification | Christian Giannetti et.al. | 2406.05096 | null |
| 2024-06-07 | Classification Metrics for Image Explanations: Towards Building Reliable XAI-Evaluations | Benjamin Fresz et.al. | 2406.05068 | link |
| 2024-06-07 | REP: Resource-Efficient Prompting for On-device Continual Learning | Sungho Jeon et.al. | 2406.04772 | null |
| 2024-06-07 | AICoderEval: Improving AI Domain Code Generation of Large Language Models | Yinghui Xia et.al. | 2406.04712 | null |
| 2024-06-07 | Cooperative Meta-Learning with Gradient Augmentation | Jongyun Shin et.al. | 2406.04639 | link |
| 2024-06-06 | OCCAM: Towards Cost-Efficient and Accuracy-Aware Image Classification Inference | Dujian Ding et.al. | 2406.04508 | null |
| 2024-06-06 | Can Language Models Use Forecasting Strategies? | Sarah Pratt et.al. | 2406.04446 | null |
| 2024-06-06 | Parameter-Inverted Image Pyramid Networks | Xizhou Zhu et.al. | 2406.04330 | link |
| 2024-06-07 | BEADs: Bias Evaluation Across Domains | Shaina Raza et.al. | 2406.04220 | null |
| 2024-06-06 | What Do Language Models Learn in Context? The Structured Task Hypothesis | Jiaoda Li et.al. | 2406.04216 | null |
| 2024-06-06 | Pointer-Guided Pre-Training: Infusing Large Language Models with Paragraph-Level Contextual Awareness | Lars Hillebrand et.al. | 2406.04156 | link |
| 2024-06-07 | ReDistill: Residual Encoded Distillation for Peak Memory Reduction | Fang Chen et.al. | 2406.03744 | null |
| 2024-06-06 | LLMEmbed: Rethinking Lightweight LLM’s Genuine Function in Text Classification | Chun Liu et.al. | 2406.03725 | link |
| 2024-06-05 | Convolutional Neural Networks and Vision Transformers for Fashion MNIST Classification: A Literature Review | Sonia Bbouzidi et.al. | 2406.03478 | null |
| 2024-06-05 | IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models | David Ifeoluwa Adelani et.al. | 2406.03368 | null |
| 2024-06-05 | Audio Mamba: Bidirectional State Space Model for Audio Representation Learning | Mehmet Hamza Erol et.al. | 2406.03344 | link |
| 2024-06-05 | FusionBench: A Comprehensive Benchmark of Deep Model Fusion | Anke Tang et.al. | 2406.03280 | null |
| 2024-06-05 | VWise: A novel benchmark for evaluating scene classification for vehicular applications | Pedro Azevedo et.al. | 2406.03273 | null |
| 2024-06-05 | Tiny models from tiny data: Textual and null-text inversion for few-shot distillation | Erik Landolsi et.al. | 2406.03146 | link |
| 2024-06-05 | Exploiting LMM-based knowledge for image classification tasks | Maria Tzelepi et.al. | 2406.03071 | null |
| 2024-06-04 | Randomized Geometric Algebra Methods for Convex Neural Networks | Yifei Wang et.al. | 2406.02806 | null |
| 2024-06-04 | DL-KDD: Dual-Light Knowledge Distillation for Action Recognition in the Dark | Chi-Jui Chang et.al. | 2406.02468 | null |
| 2024-06-04 | GrootVL: Tree Topology is All You Need in State Space Model | Yicheng Xiao et.al. | 2406.02395 | link |
| 2024-06-04 | Hybrid Quantum-Classical Neural Network for LAB Color Space Image Classification | Kwokho Ng et.al. | 2406.02229 | null |
| 2024-06-03 | Few-Shot Classification of Interactive Activities of Daily Living (InteractADL) | Zane Durante et.al. | 2406.01662 | link |
| 2024-06-03 | CoLa-DCE – Concept-guided Latent Diffusion Counterfactual Explanations | Franz Motzkus et.al. | 2406.01649 | null |
| 2024-06-03 | Asynchronous Multi-Server Federated Learning for Geo-Distributed Clients | Yuncong Zuo et.al. | 2406.01439 | null |
| 2024-06-03 | Compute-Efficient Medical Image Classification with Softmax-Free Transformers and Sequence Normalization | Firas Khader et.al. | 2406.01314 | null |
| 2024-06-03 | Continuous Geometry-Aware Graph Diffusion via Hyperbolic Neural PDE | Jiaxu Liu et.al. | 2406.01282 | null |
| 2024-06-04 | MultiMax: Sparse and Multi-Modal Attention Learning | Yuxuan Zhou et.al. | 2406.01189 | link |
| 2024-06-03 | Synergizing Unsupervised and Supervised Learning: A Hybrid Approach for Accurate Natural Language Task Modeling | Wrick Talukdar et.al. | 2406.01096 | null |
| 2024-05-31 | You Only Scan Once: Efficient Multi-dimension Sequential Modeling with LightNet | Zhen Qin et.al. | 2405.21022 | null |
| 2024-05-31 | Investigating Calibration and Corruption Robustness of Post-hoc Pruned Perception CNNs: An Image Classification Benchmark Study | Pallavi Mitra et.al. | 2405.20876 | null |
| 2024-05-31 | Improving Generalization and Convergence by Enhancing Implicit Regularization | Mingze Wang et.al. | 2405.20763 | null |
| 2024-05-31 | Robust Stable Spiking Neural Networks | Jianhao Ding et.al. | 2405.20694 | null |
| 2024-05-31 | Enhancing Counterfactual Image Generation Using Mahalanobis Distance with Distribution Preferences in Feature Space | Yukai Zhang et.al. | 2405.20685 | null |
| 2024-05-31 | GenMix: Combining Generative and Mixture Data Augmentation for Medical Image Classification | Hansang Lee et.al. | 2405.20650 | null |
| 2024-05-31 | ToxVidLLM: A Multimodal LLM-based Framework for Toxicity Detection in Code-Mixed Videos | Krishanu Maity et.al. | 2405.20628 | null |
| 2024-05-30 | Mitigating the Impact of Labeling Errors on Training via Rockafellian Relaxation | Louis L. Chen et.al. | 2405.20531 | null |
| 2024-05-30 | DeMamba: AI-Generated Video Detection on Million-Scale GenVideo Benchmark | Haoxing Chen et.al. | 2405.19707 | link |
| 2024-05-30 | A Novel Approach for Automated Design Information Mining from Issue Logs | Jiuang Zhao et.al. | 2405.19623 | null |
| 2024-05-29 | I Bet You Did Not Mean That: Testing Semantic Importance via Betting | Jacopo Teneggi et.al. | 2405.19146 | link |
| 2024-05-29 | Verifiably Robust Conformal Prediction | Linus Jeary et.al. | 2405.18942 | null |
| 2024-05-29 | Leveraging Many-To-Many Relationships for Defending Against Visual-Language Adversarial Attacks | Futa Waseda et.al. | 2405.18770 | null |
| 2024-05-29 | GIST: Greedy Independent Set Thresholding for Diverse Data Summarization | Matthew Fahrbach et.al. | 2405.18754 | null |
| 2024-05-29 | LLM-based Hierarchical Concept Decomposition for Interpretable Fine-Grained Image Classification | Renyi Qu et.al. | 2405.18672 | null |
| 2024-05-28 | It’s Not a Modality Gap: Characterizing and Addressing the Contrastive Gap | Abrar Fahim et.al. | 2405.18570 | null |
| 2024-05-28 | Why are Visually-Grounded Language Models Bad at Image Classification? | Yuhui Zhang et.al. | 2405.18415 | link |
| 2024-05-28 | MSPE: Multi-Scale Patch Embedding Prompts Vision Transformers to Any Resolution | Wenzhuo Liu et.al. | 2405.18240 | null |
| 2024-05-28 | Confidence-aware multi-modality learning for eye disease screening | Ke Zou et.al. | 2405.18167 | link |
| 2024-05-28 | 4-bit Shampoo for Memory-Efficient Network Training | Sike Wang et.al. | 2405.18144 | link |
| 2024-05-28 | DMT-JEPA: Discriminative Masked Targets for Joint-Embedding Predictive Architecture | Shentong Mo et.al. | 2405.17995 | link |
| 2024-05-27 | WASH: Train your Ensemble with Communication-Efficient Weight Shuffling, then Average | Louis Fournier et.al. | 2405.17517 | null |
| 2024-05-27 | Model-Agnostic Zeroth-Order Policy Optimization for Meta-Learning of Ergodic Linear Quadratic Regulators | Yunian Pan et.al. | 2405.17370 | null |
| 2024-05-27 | On the Noise Robustness of In-Context Learning for Text Generation | Hongfu Gao et.al. | 2405.17264 | null |
| 2024-05-27 | Superpixelwise Low-rank Approximation based Partial Label Learning for Hyperspectral Image Classification | Shujun Yang et.al. | 2405.17110 | link |
| 2024-05-26 | Demystify Mamba in Vision: A Linear Attention Perspective | Dongchen Han et.al. | 2405.16605 | null |
| 2024-05-26 | AdaFisher: Adaptive Second Order Optimization via Fisher Information | Damien Martins Gomes et.al. | 2405.16397 | link |
| 2024-05-25 | ModelLock: Locking Your Model With a Spell | Yifeng Gao et.al. | 2405.16285 | null |
| 2024-05-25 | Accelerating Transformers with Spectrum-Preserving Token Merging | Hoai-Chau Tran et.al. | 2405.16148 | link |
| 2024-05-25 | Breaking the False Sense of Security in Backdoor Defense through Re-Activation Attack | Mingli Zhu et.al. | 2405.16134 | null |
| 2024-05-24 | Grounding Stylistic Domain Generalization with Quantitative Domain Shift Measures and Synthetic Scene Images | Yiran Luo et.al. | 2405.15961 | link |
| 2024-05-24 | A Neurosymbolic Framework for Bias Correction in CNNs | Parth Padalkar et.al. | 2405.15886 | null |
| 2024-05-24 | What Do You See? Enhancing Zero-Shot Image Classification with Multimodal Large Language Models | Abdelrahman Abdelhamed et.al. | 2405.15668 | link |
| 2024-05-24 | Class Machine Unlearning for Complex Data via Concepts Inference and Data Poisoning | Wenhan Chang et.al. | 2405.15662 | null |
| 2024-05-24 | Exposing Image Classifier Shortcuts with Counterfactual Frequency (CoF) Tables | James Hinns et.al. | 2405.15661 | null |
| 2024-05-24 | Harnessing Increased Client Participation with Cohort-Parallel Federated Learning | Akash Dhasade et.al. | 2405.15644 | null |
| 2024-05-24 | Transformer-based Federated Learning for Multi-Label Remote Sensing Image Classification | Barış Büyüktaş et.al. | 2405.15405 | null |
| 2024-05-24 | CLIP model is an Efficient Online Lifelong Learner | Leyuan Wang et.al. | 2405.15155 | null |
| 2024-05-24 | OptLLM: Optimal Assignment of Queries to Large Language Models | Yueyue Liu et.al. | 2405.15130 | null |
| 2024-05-23 | A Lost Opportunity for Vision-Language Models: A Comparative Study of Online Test-time Adaptation for Vision-Language Models | Mario Döbler et.al. | 2405.14977 | link |
| 2024-05-23 | Domain Wall Magnetic Tunnel Junction Reliable Integrate and Fire Neuron | Can Cui et.al. | 2405.14851 | null |
| 2024-05-23 | Explaining Black-box Model Predictions via Two-level Nested Feature Attributions with Consistency Property | Yuya Yoshikawa et.al. | 2405.14522 | null |
| 2024-05-23 | SIAVC: Semi-Supervised Framework for Industrial Accident Video Classification | Zuoyong Li et.al. | 2405.14506 | null |
| 2024-05-23 | Scalable Visual State Space Model with Fractal Scanning | Lv Tang et.al. | 2405.14480 | null |
| 2024-05-23 | Segformer++: Efficient Token-Merging Strategies for High-Resolution Semantic Segmentation | Daniel Kienzle et.al. | 2405.14467 | link |
| 2024-05-23 | Boosting Robustness by Clipping Gradients in Distributed Learning | Youssef Allouah et.al. | 2405.14432 | null |
| 2024-05-23 | Advancing Spiking Neural Networks for Sequential Modeling with Central Pattern Generators | Changze Lv et.al. | 2405.14362 | null |
| 2024-05-23 | Simple Hamiltonian dynamics is a powerful quantum processing resource | Akitada Sakurai et.al. | 2405.14245 | null |
| 2024-05-23 | ChronosLex: Time-aware Incremental Training for Temporal Generalization of Legal Classification Tasks | T. Y. S. S Santosh et.al. | 2405.14211 | null |
| 2024-05-22 | Just rotate it! Uncertainty estimation in closed-source models via multiple queries | Konstantinos Pitas et.al. | 2405.13864 | null |
| 2024-05-21 | Decentralized Federated Learning Over Imperfect Communication Channels | Weicai Li et.al. | 2405.12894 | null |
| 2024-05-21 | Multimodal Adaptive Inference for Document Image Classification with Anytime Early Exiting | Omar Hamed et.al. | 2405.12705 | null |
| 2024-05-21 | Exploration of Masked and Causal Language Modelling for Text Generation | Nicolo Micheletti et.al. | 2405.12630 | null |
| 2024-05-21 | 3DSS-Mamba: 3D-Spectral-Spatial Mamba for Hyperspectral Image Classification | Yan He et.al. | 2405.12487 | null |
| 2024-05-20 | Alzheimer’s Magnetic Resonance Imaging Classification Using Deep and Meta-Learning Models | Nida Nasir et.al. | 2405.12126 | null |
| 2024-05-20 | Mamba-in-Mamba: Centralized Mamba-Cross-Scan in Tokenized Mamba Model for Hyperspectral Image Classification | Weilian Zhou et.al. | 2405.12003 | link |
| 2024-05-20 | A Constraint-Enforcing Reward for Adversarial Attacks on Text Classifiers | Tom Roth et.al. | 2405.11904 | null |
| 2024-05-21 | A Novel Cartography-Based Curriculum Learning Method Applied on RoNLI: The First Romanian Natural Language Inference Corpus | Eduard Poesina et.al. | 2405.11877 | link |
| 2024-05-20 | SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model | Siavash Shams et.al. | 2405.11831 | link |
| 2024-05-20 | Exploring Ordinality in Text Classification: A Comparative Study of Explicit and Implicit Techniques | Siva Rajesh Kasa et.al. | 2405.11775 | null |
| 2024-05-19 | SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization | Jialong Guo et.al. | 2405.11582 | link |
| 2024-05-19 | Reproducibility Study of CDUL: CLIP-Driven Unsupervised Learning for Multi-Label Image Classification | Manan Shah et.al. | 2405.11574 | link |
| 2024-05-19 | An Invisible Backdoor Attack Based On Semantic Feature | Yangming Chen et.al. | 2405.11551 | null |
| 2024-05-19 | Verification technology for finger vein biometric | George Kumi Kyeremeh et.al. | 2405.11540 | null |
| 2024-05-17 | Reduced storage direct tensor ring decomposition for convolutional neural networks compression | Mateusz Gabor et.al. | 2405.10802 | link |
| 2024-05-17 | Benchmarking Large Language Models on CFLUE – A Chinese Financial Language Understanding Evaluation Dataset | Jie Zhu et.al. | 2405.10542 | link |
| 2024-05-17 | Smart Expert System: Large Language Models as Text Classifiers | Zhiqiang Wang et.al. | 2405.10523 | link |
| 2024-05-16 | Data-Efficient Low-Complexity Acoustic Scene Classification in the DCASE 2024 Challenge | Florian Schmid et.al. | 2405.10018 | null |
| 2024-05-16 | ROCOv2: Radiology Objects in COntext Version 2, an Updated Multimodal Image Dataset | Johannes Rückert et.al. | 2405.10004 | link |
| 2024-05-15 | Improving Label Error Detection and Elimination with Uncertainty Quantification | Johannes Jakubik et.al. | 2405.09602 | null |
| 2024-05-15 | Tackling Distribution Shifts in Task-Oriented Communication with Information Bottleneck | Hongru Li et.al. | 2405.09514 | null |
| 2024-05-15 | Feature-based Federated Transfer Learning: Communication Efficiency, Robustness and Privacy | Feng Wang et.al. | 2405.09014 | link |
| 2024-05-14 | The Pitfalls and Promise of Conformal Inference Under Adversarial Attacks | Ziquan Liu et.al. | 2405.08886 | link |
| 2024-05-14 | Harnessing the power of longitudinal medical imaging for eye disease prognosis using Transformer-based sequence modeling | Gregory Holste et.al. | 2405.08780 | null |
| 2024-05-14 | FolkTalent: Enhancing Classification and Tagging of Indian Folk Paintings | Nancy Hada et.al. | 2405.08776 | null |
| 2024-05-14 | The impact of Compositionality in Zero-shot Multi-label action recognition for Object-based tasks | Carmela Calabrese et.al. | 2405.08695 | null |
| 2024-05-14 | Achieving Fairness Through Channel Pruning for Dermatological Disease Diagnosis | Qingpeng Kong et.al. | 2405.08681 | link |
| 2024-05-14 | Investigating Design Choices in Joint-Embedding Predictive Architectures for General Audio Representation Learning | Alain Riou et.al. | 2405.08679 | null |
| 2024-05-14 | Dual-Branch Network for Portrait Image Quality Assessment | Wei Sun et.al. | 2405.08555 | link |
| 2024-05-13 | Who’s in and who’s out? A case study of multimodal CLIP-filtering in DataComp | Rachel Hong et.al. | 2405.08209 | link |
| 2024-05-14 | MambaOut: Do We Really Need Mamba for Vision? | Weihao Yu et.al. | 2405.07992 | link |
| 2024-05-13 | Constrained Exploration via Reflected Replica Exchange Stochastic Gradient Langevin Dynamics | Haoyang Zheng et.al. | 2405.07839 | link |
| 2024-05-13 | Analysis of the rate of convergence of an over-parametrized convolutional neural network image classifier learned by gradient descent | Michael Kohler et.al. | 2405.07619 | null |
| 2024-05-13 | On-device Online Learning and Semantic Management of TinyML Systems | Haoyu Ren et.al. | 2405.07601 | null |
| 2024-05-13 | GLiRA: Black-Box Membership Inference Attack via Knowledge Distillation | Andrey V. Galichin et.al. | 2405.07562 | null |
| 2024-05-13 | Fine-tuning the SwissBERT Encoder Model for Embedding Sentences and Documents | Juri Grosjean et.al. | 2405.07513 | null |
| 2024-05-13 | MoVL: Exploring Fusion Strategies for the Domain-Adaptive Application of Pretrained Models in Medical Imaging Tasks | Haijiang Tian et.al. | 2405.07411 | null |
| 2024-05-12 | Explainable Convolutional Neural Networks for Retinal Fundus Classification and Cutting-Edge Segmentation Models for Retinal Blood Vessels from Fundus Images | Fatema Tuj Johora Faria et.al. | 2405.07338 | null |
| 2024-05-12 | Differentiable Model Scaling using Differentiable Topk | Kai Liu et.al. | 2405.07194 | null |
| 2024-05-11 | A framework of text-dependent speaker verification for Chinese numerical string corpus | Litong Zheng et.al. | 2405.07029 | null |
| 2024-05-10 | Pseudo-Prompt Generating in Pre-trained Vision-Language Models for Multi-Label Medical Image Classification | Yaoqin Ye et.al. | 2405.06468 | null |
| 2024-05-10 | Multi-level Personalized Federated Learning on Heterogeneous and Long-Tailed Data | Rongyu Zhang et.al. | 2405.06413 | null |
| 2024-05-10 | SaudiBERT: A Large Language Model Pretrained on Saudi Dialect Corpora | Faisal Qarah et.al. | 2405.06239 | link |
| 2024-05-09 | Deep Multi-Task Learning for Malware Image Classification | Ahmed Bensaoud et.al. | 2405.05906 | null |
| 2024-05-09 | Enhancing Suicide Risk Detection on Social Media through Semi-Supervised Deep Label Smoothing | Matthew Squires et.al. | 2405.05795 | null |
| 2024-05-09 | CSA-Net: Channel-wise Spatially Autocorrelated Attention Networks | Nick et.al. | 2405.05755 | null |
| 2024-05-09 | How Quality Affects Deep Neural Networks in Fine-Grained Image Classification | Joseph Smith et.al. | 2405.05742 | null |
| 2024-05-09 | End-to-End Generative Semantic Communication Powered by Shared Semantic Knowledge Base | Shuling Li et.al. | 2405.05738 | null |
| 2024-05-09 | Using Machine Translation to Augment Multilingual Classification | Adam King et.al. | 2405.05478 | null |
| 2024-05-08 | AFEN: Respiratory Disease Classification using Ensemble Learning | Rahul Nadkarni et.al. | 2405.05467 | null |
| 2024-05-08 | XAMPLER: Learning to Retrieve Cross-Lingual In-Context Examples | Peiqin Lin et.al. | 2405.05116 | link |
| 2024-05-08 | Explanation as a Watermark: Towards Harmless and Multi-bit Model Ownership Verification via Watermarking Feature Attribution | Shuo Shao et.al. | 2405.04825 | null |
| 2024-05-07 | Exploring Explainable AI Techniques for Improved Interpretability in Lung and Colon Cancer Classification | Mukaffi Bin Moin et.al. | 2405.04610 | link |
| 2024-05-07 | Pragmatist Intelligence: Where the Principle of Usefulness Can Take ANNs | Antonio Bikić et.al. | 2405.04386 | null |
| 2024-05-07 | Semi-Supervised Disease Classification based on Limited Medical Image Data | Yan Zhang et.al. | 2405.04295 | null |
| 2024-05-07 | DCNN: Dual Cross-current Neural Networks Realized Using An Interactive Deep Learning Discriminator for Fine-grained Objects | Da Fu et.al. | 2405.04093 | null |
| 2024-05-07 | Feature Map Convergence Evaluation for Functional Module | Ludan Zhang et.al. | 2405.04041 | null |
| 2024-05-07 | VMambaCC: A Visual State Space Model for Crowd Counting | Hao-Yuan Ma et.al. | 2405.03978 | null |
| 2024-05-06 | On Adversarial Examples for Text Classification by Perturbing Latent Representations | Korn Sooksatra et.al. | 2405.03789 | null |
| 2024-05-06 | CICA: Content-Injected Contrastive Alignment for Zero-Shot Document Image Classification | Sankalp Sinha et.al. | 2405.03660 | null |
| 2024-05-06 | Deep Space Separable Distillation for Lightweight Acoustic Scene Classification | ShuQi Ye et.al. | 2405.03567 | null |
| 2024-05-06 | Liberating Seen Classes: Boosting Few-Shot and Zero-Shot Text Classification via Anchor Generation and Classification Reframing | Han Liu et.al. | 2405.03565 | null |
| 2024-05-06 | A Lightweight Neural Architecture Search Model for Medical Image Classification | Lunchen Xie et.al. | 2405.03462 | null |
| 2024-05-06 | Interpretable Network Visualizations: A Human-in-the-Loop Approach for Post-hoc Explainability of CNN-based Image Classification | Matteo Bianchi et.al. | 2405.03301 | null |
| 2024-05-06 | TED: Accelerate Model Training by Internal Generalization | Jinying Xiao et.al. | 2405.03228 | null |
| 2024-05-06 | Advancing Multimodal Medical Capabilities of Gemini | Lin Yang et.al. | 2405.03162 | null |
| 2024-05-05 | A scoping review of using Large Language Models (LLMs) to investigate Electronic Health Records (EHRs) | Lingyao Li et.al. | 2405.03066 | null |
| 2024-05-05 | Parameter-Efficient Fine-Tuning with Discrete Fourier Transform | Ziqi Gao et.al. | 2405.03003 | null |
| 2024-05-04 | MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning | Vishal Nedungadi et.al. | 2405.02771 | null |
| 2024-05-03 | Multi-method Integration with Confidence-based Weighting for Zero-shot Image Classification | Siqi Yin et.al. | 2405.02155 | null |
| 2024-05-03 | The Trade-off between Performance, Efficiency, and Fairness in Adapter Modules for Text Classification | Minh Duc Bui et.al. | 2405.02010 | null |
| 2024-05-03 | Which Identities Are Mobilized: Towards an automated detection of social group appeals in political texts | Felicia Riethmüller et.al. | 2405.01904 | null |
| 2024-05-02 | PVF (Parameter Vulnerability Factor): A Quantitative Metric Measuring AI Vulnerability and Resilience Against Parameter Corruptions | Xun Jiao et.al. | 2405.01741 | null |
| 2024-05-02 | Development of Skip Connection in Deep Neural Networks for Computer Vision and Medical Image Analysis: A Survey | Guoping Xu et.al. | 2405.01725 | link |
| 2024-05-02 | SOAR: Advancements in Small Body Object Detection for Aerial Imagery Using State Space Models and Programmable Gradients | Tushar Verma et.al. | 2405.01699 | null |
| 2024-05-02 | Explainable AI (XAI) in Image Segmentation in Medicine, Industry, and Beyond: A Survey | Rokas Gipiškis et.al. | 2405.01636 | null |
| 2024-05-02 | Improving Intervention Efficacy via Concept Realignment in Concept Bottleneck Models | Nishad Singhi et.al. | 2405.01531 | null |
| 2024-05-03 | Decoupling Feature Extraction and Classification Layers for Calibrated Neural Networks | Mikkel Jordahn et.al. | 2405.01196 | null |
| 2024-05-02 | Uncertainty-aware self-training with expectation maximization basis transformation | Zijia Wang et.al. | 2405.01175 | null |
| 2024-05-02 | Transformers Fusion across Disjoint Samples for Hyperspectral Image Classification | Muhammad Ahmad et.al. | 2405.01095 | null |
| 2024-05-02 | Efficient and Flexible Method for Reducing Moderate-size Deep Neural Networks with Condensation | Tianyi Chen et.al. | 2405.01041 | null |
| 2024-05-02 | Benchmarking Representations for Speech, Music, and Acoustic Events | Moreno La Quatra et.al. | 2405.00934 | link |
| 2024-05-01 | Digital-analog quantum convolutional neural networks for image classification | Anton Simen et.al. | 2405.00548 | null |
| 2024-05-03 | BiomedRAG: A Retrieval Augmented Large Language Model for Biomedicine | Mingchen Li et.al. | 2405.00465 | null |
| 2024-05-01 | Visual and audio scene classification for detecting discrepancies in video: a baseline method and experimental protocol | Konstantinos Apostolidis et.al. | 2405.00384 | null |
| 2024-05-01 | Data Augmentation Policy Search for Long-Term Forecasting | Liran Nochumsohn et.al. | 2405.00319 | null |
| 2024-04-30 | Let’s Focus: Focused Backdoor Attack against Federated Transfer Learning | Marco Arazzi et.al. | 2404.19420 | null |
| 2024-04-30 | Large Language Model Informed Patent Image Retrieval | Hao-Cheng Lo et.al. | 2404.19360 | null |
| 2024-04-30 | Enhancing Intrinsic Features for Debiasing via Investigating Class-Discerning Common Attributes in Bias-Contrastive Pair | Jeonghoon Park et.al. | 2404.19250 | null |
| 2024-04-29 | Spectral-Spatial Mamba for Hyperspectral Image Classification | Lingbo Huang et.al. | 2404.18401 | null |
| 2024-04-28 | TextGram: Towards a better domain-adaptive pretraining | Sharayu Hiwarkhedkar et.al. | 2404.18228 | null |
| 2024-04-28 | L3Cube-MahaNews: News-based Short Text and Long Document Classification Datasets in Marathi | Saloni Mittal et.al. | 2404.18216 | link |
| 2024-04-28 | S$^2$Mamba: A Spatial-spectral State Space Model for Hyperspectral Image Classification | Guanchun Wang et.al. | 2404.18213 | null |
| 2024-04-27 | Implicit Generative Prior for Bayesian Neural Networks | Yijia Liu et.al. | 2404.18008 | link |
| 2024-04-27 | Towards Privacy-Preserving Audio Classification Systems | Bhawana Chhaglani et.al. | 2404.18002 | null |
| 2024-04-27 | A Method of Moments Embedding Constraint and its Application to Semi-Supervised Learning | Michael Majurski et.al. | 2404.17978 | null |
| 2024-04-27 | Spatial, Temporal, and Geometric Fusion for Remote Sensing Images | Hessah Albanwan et.al. | 2404.17851 | null |
| 2024-04-27 | Leveraging Cross-Modal Neighbor Representation for Improved CLIP Classification | Chao Yi et.al. | 2404.17753 | link |
| 2024-04-26 | SPLICE – Streamlining Digital Pathology Image Processing | Areej Alsaafin et.al. | 2404.17704 | null |
| 2024-04-26 | SDFD: Building a Versatile Synthetic Face Image Dataset with Diverse Attributes | Georgia Baltsou et.al. | 2404.17255 | null |
| 2024-04-25 | Incorporating Lexical and Syntactic Knowledge for Unsupervised Cross-Lingual Transfer | Jianyu Zheng et.al. | 2404.16627 | link |
| 2024-04-25 | IMWA: Iterative Model Weight Averaging Benefits Class-Imbalanced Learning Tasks | Zitong Huang et.al. | 2404.16331 | null |
| 2024-04-25 | Lacunarity Pooling Layers for Plant Image Classification using Texture Analysis | Akshatha Mohan et.al. | 2404.16268 | link |
| 2024-04-24 | MiMICRI: Towards Domain-centered Counterfactual Explanations of Cardiovascular Image Classification Models | Grace Guo et.al. | 2404.16174 | null |
| 2024-04-24 | MoDE: CLIP Data Experts via Clustering | Jiawei Ma et.al. | 2404.16030 | link |
| 2024-04-26 | A Survey on Visual Mamba | Hanwei Zhang et.al. | 2404.15956 | null |
| 2024-04-24 | Vision Transformer-based Adversarial Domain Adaptation | Yahan Li et.al. | 2404.15817 | link |
| 2024-04-24 | Rethinking Model Prototyping through the MedMNIST+ Dataset Collection | Sebastian Doerrich et.al. | 2404.15786 | null |
| 2024-04-24 | Efficient Multi-Model Fusion with Adversarial Complementary Representation Learning | Zuheng Kang et.al. | 2404.15704 | null |
| 2024-04-24 | Brain Storm Optimization Based Swarm Learning for Diabetic Retinopathy Image Classification | Liang Qu et.al. | 2404.15585 | null |
| 2024-04-23 | An MRP Formulation for Supervised Learning: Generalized Temporal Difference Learning Models | Yangchen Pan et.al. | 2404.15518 | null |
| 2024-04-23 | Deep multi-prototype capsule networks | Saeid Abbassi et.al. | 2404.15445 | null |
| 2024-04-23 | A review of deep learning-based information fusion techniques for multimodal medical image classification | Yihao Li et.al. | 2404.15022 | null |
| 2024-04-23 | Social Media and Artificial Intelligence for Sustainable Cities and Societies: A Water Quality Analysis Use-case | Muhammad Asif Auyb et.al. | 2404.14977 | null |
| 2024-04-23 | Traditional to Transformers: A Survey on Current Trends and Future Prospects for Hyperspectral Image Classification | Muhammad Ahmad et.al. | 2404.14955 | link |
| 2024-04-23 | Pyramid Hierarchical Transformer for Hyperspectral Image Classification | Muhammad Ahmad et.al. | 2404.14945 | link |
| 2024-04-23 | Importance of Disjoint Sampling in Conventional and Transformer Models for Hyperspectral Image Classification | Muhammad Ahmad et.al. | 2404.14944 | link |
| 2024-04-23 | CoProNN: Concept-based Prototypical Nearest Neighbors for Explaining Vision Models | Teodor Chiaburu et.al. | 2404.14830 | link |
| 2024-04-22 | WangLab at MEDIQA-M3G 2024: Multimodal Medical Answer Generation using Large Language Models | Ronald Xie et.al. | 2404.14567 | null |
| 2024-04-22 | CKD: Contrastive Knowledge Distillation from A Sample-wise Perspective | Wencheng Zhu et.al. | 2404.14109 | null |
| 2024-04-21 | EncodeNet: A Framework for Boosting DNN Accuracy with Entropy-driven Generalized Converting Autoencoder | Hasanul Mahmud et.al. | 2404.13770 | null |
| 2024-04-21 | PEACH: Pretrained-embedding Explanation Across Contextual and Hierarchical Structure | Feiqi Cao et.al. | 2404.13645 | link |
| 2024-04-21 | I2CANSAY: Inter-Class Analogical Augmentation and Intra-Class Significance Analysis for Non-Exemplar Online Task-Free Continual Learning | Songlin Dong et.al. | 2404.13576 | null |
| 2024-04-21 | IMO: Greedy Layer-Wise Sparse Representation Learning for Out-of-Distribution Text Classification with Pre-trained Models | Tao Feng et.al. | 2404.13504 | null |
| 2024-04-20 | Nested-TNT: Hierarchical Vision Transformers with Multi-Scale Feature Processing | Yuang Liu et.al. | 2404.13434 | null |
| 2024-04-20 | Evaluating Subword Tokenization: Alien Subword Composition and OOV Generalization Challenge | Khuyagbaatar Batsuren et.al. | 2404.13292 | link |
| 2024-04-20 | 3D-Convolution Guided Spectral-Spatial Transformer for Hyperspectral Image Classification | Shyam Varahagiri et.al. | 2404.13252 | link |
| 2024-04-19 | On-board classification of underwater images using hybrid classical-quantum CNN based method | Sreeraj Rajan Warrier et.al. | 2404.13130 | null |
| 2024-04-19 | Next Generation Loss Function for Image Classification | Shakhnaz Akhmedova et.al. | 2404.12948 | null |
| 2024-04-19 | A Hybrid Generative and Discriminative PointNet on Unordered Point Sets | Yang Ye et.al. | 2404.12925 | null |
| 2024-04-19 | Transformer-Based Classification Outcome Prediction for Multimodal Stroke Treatment | Danqing Ma et.al. | 2404.12634 | null |
| 2024-04-18 | When LLMs are Unfit Use FastFit: Fast and Effective Text Classification with Many Classes | Asaf Yehudai et.al. | 2404.12365 | link |
| 2024-04-18 | Observation, Analysis, and Solution: Exploring Strong Lightweight Vision Transformers via Masked Image Modeling Pre-Training | Jin Gao et.al. | 2404.12210 | link |
| 2024-04-18 | Concept Induction using LLMs: a user experiment for assessment | Adrita Barua et.al. | 2404.11875 | null |
| 2024-04-17 | Pretraining Billion-scale Geospatial Foundational Models on Frontier | Aristeidis Tsaris et.al. | 2404.11706 | null |
| 2024-04-17 | AI-Enhanced Cognitive Behavioral Therapy: Deep Learning and Large Language Models for Extracting Cognitive Pathways from Social Media Texts | Meng Jiang et.al. | 2404.11449 | null |
| 2024-04-17 | Achieving Rotation Invariance in Convolution Operations: Shifting from Data-Driven to Mechanism-Assured | Hanlin Mo et.al. | 2404.11309 | null |
| 2024-04-17 | A Progressive Framework of Vision-language Knowledge Distillation and Alignment for Multilingual Scene | Wenbo Zhang et.al. | 2404.11249 | null |
| 2024-04-17 | A Novel ICD Coding Framework Based on Associated and Hierarchical Code Description Distillation | Bin Zhang et.al. | 2404.11132 | null |
| 2024-04-17 | Small Language Models are Good Too: An Empirical Study of Zero-Shot Classification | Pierre Lepagnol et.al. | 2404.11122 | null |
| 2024-04-18 | Supervised Contrastive Vision Transformer for Breast Histopathological Image Classification | Mohammad Shiri et.al. | 2404.11052 | null |
| 2024-04-17 | InfoMatch: Entropy Neural Estimation for Semi-Supervised Image Classification | Qi Han et.al. | 2404.11003 | link |
| 2024-04-16 | Incubating Text Classifiers Following User Instruction with Nothing but LLM | Letian Peng et.al. | 2404.10877 | link |
| 2024-04-16 | Vocabulary-free Image Classification and Semantic Segmentation | Alessandro Conti et.al. | 2404.10864 | link |
| 2024-04-16 | Assessing The Impact of CNN Auto Encoder-Based Image Denoising on Image Classification Tasks | Mohsen Hami et.al. | 2404.10664 | null |
| 2024-04-16 | Tree Bandits for Generative Bayes | Sean O’Hagan et.al. | 2404.10436 | null |
| 2024-04-16 | AudioProtoPNet: An interpretable deep learning model for bird sound classification | René Heinrich et.al. | 2404.10420 | null |
| 2024-04-16 | Lighter, Better, Faster Multi-Source Domain Adaptation with Gaussian Mixture Models and Optimal Transport | Eduardo Fernandes Montesuma et.al. | 2404.10261 | null |
| 2024-04-15 | Distributed Federated Learning-Based Deep Learning Model for Privacy MRI Brain Tumor Detection | Lisang Zhou et.al. | 2404.10026 | null |
| 2024-04-15 | Interaction as Explanation: A User Interaction-based Method for Explaining Image Classification Models | Hyeonggeun Yun et.al. | 2404.09828 | null |
| 2024-04-15 | Quantization of Large Language Models with an Overdetermined Basis | Daniil Merkulov et.al. | 2404.09737 | null |
| 2024-04-15 | Pseudo-label Learning with Calibrated Confidence Using an Energy-based Model | Masahito Toba et.al. | 2404.09585 | null |
| 2024-04-14 | Breast Cancer Image Classification Method Based on Deep Transfer Learning | Weimin Wang et.al. | 2404.09226 | null |
| 2024-04-14 | Coreset Selection for Object Detection | Hojun Lee et.al. | 2404.09161 | null |
| 2024-04-13 | Exploring Explainability in Video Action Recognition | Avinab Saha et.al. | 2404.09067 | null |
| 2024-04-13 | Fast Fishing: Approximating BAIT for Efficient and Scalable Deep Active Image Classification | Denis Huseljic et.al. | 2404.08981 | link |
| 2024-04-13 | PM2: A New Prompting Multi-modal Model Paradigm for Few-shot Medical Image Classification | Zhenwei Wang et.al. | 2404.08915 | null |
| 2024-04-12 | VertAttack: Taking advantage of Text Classifiers’ horizontal vision | Jonathan Rusert et.al. | 2404.08538 | null |
| 2024-04-12 | SpectralMamba: Efficient Mamba for Hyperspectral Image Classification | Jing Yao et.al. | 2404.08489 | null |
| 2024-04-12 | OTTER: Improving Zero-Shot Classification via Optimal Transport | Changho Shin et.al. | 2404.08461 | null |
| 2024-04-12 | A Survey of Neural Network Robustness Assessment in Image Recognition | Jie Wang et.al. | 2404.08285 | null |
| 2024-04-12 | Convolutional neural network classification of cancer cytopathology images: taking breast cancer as an example | MingXuan Xiao et.al. | 2404.08279 | null |
| 2024-04-11 | HGRN2: Gated Linear RNNs with State Expansion | Zhen Qin et.al. | 2404.07904 | link |
| 2024-04-11 | Exploiting Object-based and Segmentation-based Semantic Features for Deep Learning-based Indoor Scene Classification | Ricardo Pereira et.al. | 2404.07739 | null |
| 2024-04-11 | Contrastive-Based Deep Embeddings for Label Noise-Resilient Histopathology Image Classification | Lucas Dedieu et.al. | 2404.07605 | link |
| 2024-04-11 | Learning to Classify New Foods Incrementally Via Compressed Exemplars | Justin Yang et.al. | 2404.07507 | null |
| 2024-04-11 | Interactive Prompt Debugging with Sequence Salience | Ian Tenney et.al. | 2404.07498 | null |
| 2024-04-11 | Privacy preserving layer partitioning for Deep Neural Network models | Kishore Rajasekar et.al. | 2404.07437 | null |
| 2024-04-11 | CopilotCAD: Empowering Radiologists with Report Completion Models and Quantitative Evidence from Medical Image Foundation Models | Sheng Wang et.al. | 2404.07424 | null |
| 2024-04-11 | Improving Shift Invariance in Convolutional Neural Networks with Translation Invariant Polyphase Sampling | Sourajit Saha et.al. | 2404.07410 | null |
| 2024-04-10 | Lost in Translation: Modern Neural Networks Still Struggle With Small Realistic Image Transformations | Ofir Shifman et.al. | 2404.07153 | null |
| 2024-04-10 | Learning of deep convolutional network image classifiers via stochastic gradient descent and over-parametrization | Michael Kohler et.al. | 2404.07128 | null |
| 2024-04-10 | Accelerating Cardiac MRI Reconstruction with CMRatt: An Attention-Driven Approach | Anam Hashmi et.al. | 2404.06941 | null |
| 2024-04-10 | Multi-Label Continual Learning for the Medical Domain: A Novel Benchmark | Marina Ceccon et.al. | 2404.06859 | null |
| 2024-04-10 | Neural Optimizer Equation, Decay Function, and Learning Rate Schedule Joint Evolution | Brandon Morgan et.al. | 2404.06679 | null |
| 2024-04-09 | Variational Stochastic Gradient Descent for Deep Neural Networks | Haotian Chen et.al. | 2404.06549 | link |
| 2024-04-09 | On adversarial training and the 1 Nearest Neighbor classifier | Amir Hagai et.al. | 2404.06313 | link |
| 2024-04-09 | Audio-Visual Generalized Zero-Shot Learning using Pre-Trained Large Multi-Modal Models | David Kurzendörfer et.al. | 2404.06309 | link |
| 2024-04-09 | Counterfactual Reasoning for Multi-Label Image Classification via Patching-Based Training | Ming-Kun Xie et.al. | 2404.06287 | null |
| 2024-04-09 | Quantum Circuit $C^*$-algebra Net | Yuka Hashimoto et.al. | 2404.06218 | null |
| 2024-04-09 | VI-OOD: A Unified Representation Learning Framework for Textual Out-of-distribution Detection | Li-Ming Zhan et.al. | 2404.06217 | link |
| 2024-04-09 | Symmetry-guided gradient descent for quantum neural networks | Kaiming Bian et.al. | 2404.06108 | null |
| 2024-04-10 | Using Few-Shot Learning to Classify Primary Lung Cancer and Other Malignancy with Lung Metastasis in Cytological Imaging via Endobronchial Ultrasound Procedures | Ching-Kai Lin et.al. | 2404.06080 | null |
| 2024-04-08 | Neural Cellular Automata for Lightweight, Robust and Explainable Classification of White Blood Cell Images | Michael Deutges et.al. | 2404.05584 | null |
| 2024-04-08 | On the Convergence of Continual Learning with Adaptive Methods | Seungyub Han et.al. | 2404.05555 | null |
| 2024-04-08 | Multi-Task Learning for Features Extraction in Financial Annual Reports | Syrielle Montariol et.al. | 2404.05281 | link |
| 2024-04-08 | Allowing humans to interactively guide machines where to look does not always improve a human-AI team’s classification accuracy | Giang Nguyen et.al. | 2404.05238 | link |
| 2024-04-08 | iVPT: Improving Task-relevant Information Sharing in Visual Prompt Tuning by Cross-layer Dynamic Connection | Nan Zhou et.al. | 2404.05207 | null |
| 2024-04-08 | Semantic Stealth: Adversarial Text Attacks on NLP Using Several Methods | Roopkatha Dey et.al. | 2404.05159 | null |
| 2024-04-07 | PairAug: What Can Augmented Image-Text Pairs Do for Radiology? | Yutong Xie et.al. | 2404.04960 | link |
| 2024-04-07 | GvT: A Graph-based Vision Transformer with Talking-Heads Utilizing Sparsity, Trained from Scratch on Small Datasets | Dongjing Shan et.al. | 2404.04924 | null |
| 2024-04-06 | Focused Active Learning for Histopathological Image Classification | Arne Schmidt et.al. | 2404.04663 | null |
| 2024-04-06 | Trustless Audits without Revealing Data or Models | Suppakit Waiwitlikhit et.al. | 2404.04500 | null |
| 2024-04-05 | Evaluating Adversarial Robustness: A Comparison Of FGSM, Carlini-Wagner Attacks, And The Role of Distillation as Defense Mechanism | Trilokesh Ranjan Sarkar et.al. | 2404.04245 | null |
| 2024-04-05 | Noisy Label Processing for Classification: A Survey | Mengting Li et.al. | 2404.04159 | null |
| 2024-04-05 | Learning Correlation Structures for Vision Transformers | Manjin Kim et.al. | 2404.03924 | null |
| 2024-04-05 | LiDAR-Guided Cross-Attention Fusion for Hyperspectral Band Selection and Image Classification | Judy X Yang et.al. | 2404.03883 | null |
| 2024-04-04 | Dendrites endow artificial neural networks with accurate, robust and parameter-efficient learning | Spyridon Chavlis et.al. | 2404.03708 | null |
| 2024-04-05 | A Methodology to Study the Impact of Spiking Neural Network Parameters considering Event-Based Automotive Data | Iqra Bano et.al. | 2404.03493 | null |
| 2024-04-04 | Meta Invariance Defense Towards Generalizable Robustness to Unknown Adversarial Attacks | Lei Zhang et.al. | 2404.03340 | null |
| 2024-04-04 | Sparse Concept Bottleneck Models: Gumbel Tricks in Contrastive Learning | Andrei Semenov et.al. | 2404.03323 | link |
| 2024-04-04 | FACTUAL: A Novel Framework for Contrastive Learning Based Robust SAR Image Classification | Xu Wang et.al. | 2404.03225 | null |
| 2024-04-03 | Exploring the Trade-off Between Model Performance and Explanation Plausibility of Text Classifiers Using Human Rationales | Lucas E. Resck et.al. | 2404.03098 | link |
| 2024-04-03 | Guarantees of confidentiality via Hammersley-Chapman-Robbins bounds | Kamalika Chaudhuri et.al. | 2404.02866 | link |
| 2024-04-03 | FPT: Feature Prompt Tuning for Few-shot Readability Assessment | Ziyang Wang et.al. | 2404.02772 | link |
| 2024-04-03 | Adversarial Attacks and Dimensionality in Text Classifiers | Nandish Chattopadhyay et.al. | 2404.02660 | null |
| 2024-04-04 | Non-negative Subspace Feature Representation for Few-shot Learning in Medical Imaging | Keqiang Fan et.al. | 2404.02656 | null |
| 2024-04-03 | Adaptive Cross-lingual Text Classification through In-Context One-Shot Demonstrations | Emilio Villa-Cueva et.al. | 2404.02452 | link |
| 2024-04-03 | A Novel Approach to Breast Cancer Histopathological Image Classification Using Cross-Colour Space Feature Fusion and Quantum-Classical Stack Ensemble Method | Sambit Mallick et.al. | 2404.02447 | null |
| 2024-04-03 | Enhancing Low-Resource LLMs Classification with PEFT and Synthetic Data | Parth Patwa et.al. | 2404.02422 | null |
| 2024-04-02 | Smooth Deep Saliency | Rudolf Herdt et.al. | 2404.02282 | null |
| 2024-04-02 | Visual Concept Connectome (VCC): Open World Concept Discovery and their Interlayer Connections in Deep Models | Matthew Kowal et.al. | 2404.02233 | null |
| 2024-04-02 | ImageNot: A contrast with ImageNet preserves model rankings | Olawale Salaudeen et.al. | 2404.02112 | null |
| 2024-04-02 | Explainability in JupyterLab and Beyond: Interactive XAI Systems for Integrated and Collaborative Workflows | Grace Guo et.al. | 2404.02081 | null |
| 2024-04-02 | Ukrainian Texts Classification: Exploration of Cross-lingual Knowledge Transfer Approaches | Daryna Dementieva et.al. | 2404.02043 | null |
| 2024-04-02 | CAM-Based Methods Can See through Walls | Magamed Taimeskhanov et.al. | 2404.01964 | link |
| 2024-04-02 | Beyond Image Super-Resolution for Image Recognition with Task-Driven Perceptual Loss | Jaeha Kim et.al. | 2404.01692 | null |
| 2024-04-02 | A Universal Knowledge Embedded Contrastive Learning Framework for Hyperspectral Image Classification | Quanwei Liu et.al. | 2404.01673 | null |
| 2024-04-01 | Can Biases in ImageNet Models Explain Generalization? | Paul Gavrikov et.al. | 2404.01509 | link |
| 2024-04-01 | Parallel Proportional Fusion of Spiking Quantum Neural Network for Optimizing Image Classification | Zuyu Xu et.al. | 2404.01359 | null |
| 2024-04-01 | Bridging Remote Sensors with Multisensor Geospatial Foundation Models | Boran Han et.al. | 2404.01260 | link |
| 2024-04-01 | Diagnosis of Skin Cancer Using VGG16 and VGG19 Based Transfer Learning Models | Amir Faghihi et.al. | 2404.01160 | null |
| 2024-03-29 | Learn “No” to Say “Yes” Better: Improving Vision-Language Models via Negations | Jaisidh Singh et.al. | 2403.20312 | link |
| 2024-03-29 | MCNet: A crowd density estimation network based on integrating multiscale attention module | Qiang Guo et.al. | 2403.20173 | null |
| 2024-03-29 | Segmentation, Classification and Interpretation of Breast Cancer Medical Images using Human-in-the-Loop Machine Learning | David Vázquez-Lema et.al. | 2403.20112 | null |
| 2024-03-29 | Adverb Is the Key: Simple Text Data Augmentation with Adverb Deletion | Juhwan Choi et.al. | 2403.20015 | link |
| 2024-03-29 | Diverse Feature Learning by Self-distillation and Reset | Sejik Park et.al. | 2403.19941 | null |
| 2024-03-29 | Heterogeneous Network Based Contrastive Learning Method for PolSAR Land Cover Classification | Jianfeng Cai et.al. | 2403.19902 | link |
| 2024-03-28 | X-MIC: Cross-Modal Instance Conditioning for Egocentric Action Generalization | Anna Kukleva et.al. | 2403.19811 | link |
| 2024-03-28 | RSMamba: Remote Sensing Image Classification with State Space Model | Keyan Chen et.al. | 2403.19654 | link |
| 2024-03-28 | Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model | Zhicai Wang et.al. | 2403.19600 | link |
| 2024-03-28 | The Bad Batches: Enhancing Self-Supervised Learning in Image Classification Through Representative Batch Curation | Ozgu Goksu et.al. | 2403.19579 | null |
| 2024-03-28 | Low-Rank Rescaled Vision Transformer Fine-Tuning: A Residual Design Approach | Wei Dong et.al. | 2403.19067 | link |
| 2024-03-27 | Evaluating Large Language Models for Health-Related Text Classification Tasks with Public Social Media Data | Yuting Guo et.al. | 2403.19031 | null |
| 2024-03-27 | Robustness and Visual Explanation for Black Box Image, Video, and ECG Signal Classification with Reinforcement Learning | Soumyendu Sarkar et.al. | 2403.18985 | null |
| 2024-03-27 | The Impact of Uniform Inputs on Activation Sparsity and Energy-Latency Attacks in Computer Vision | Andreas Müller et.al. | 2403.18587 | link |
| 2024-03-27 | Uncertainty-Aware SAR ATR: Defending Against Adversarial Attacks via Bayesian Neural Networks | Tian Ye et.al. | 2403.18318 | null |
| 2024-03-27 | Multi-scale Unified Network for Image Classification | Wenzhuo Liu et.al. | 2403.18294 | null |
| 2024-03-26 | The Need for Speed: Pruning Transformers with One Recipe | Samir Khaki et.al. | 2403.17921 | link |
| 2024-03-26 | Compressed Multi-task embeddings for Data-Efficient Downstream training and inference in Earth Observation | Carlos Gomes et.al. | 2403.17886 | null |
| 2024-03-26 | PlainMamba: Improving Non-Hierarchical Mamba in Visual Recognition | Chenhongyi Yang et.al. | 2403.17695 | link |
| 2024-03-26 | Language Models for Text Classification: Is In-Context Learning Enough? | Aleksandra Edwards et.al. | 2403.17661 | null |
| 2024-03-26 | Boosting Few-Shot Learning with Disentangled Self-Supervised Learning and Meta-Learning for Medical Image Classification | Eva Pachetti et.al. | 2403.17530 | null |
| 2024-03-26 | HILL: Hierarchy-aware Information Lossless Contrastive Learning for Hierarchical Text Classification | He Zhu et.al. | 2403.17307 | link |
| 2024-03-25 | Histogram Layers for Neural Engineered Features | Joshua Peeples et.al. | 2403.17176 | link |
| 2024-03-25 | Task2Box: Box Embeddings for Modeling Asymmetric Task Relationships | Rangel Daroya et.al. | 2403.17173 | link |
| 2024-03-25 | CipherFormer: Efficient Transformer Private Inference with Low Round Complexity | Weize Wang et.al. | 2403.16860 | null |
| 2024-03-25 | Assessing the Performance of Deep Learning for Automated Gleason Grading in Prostate Cancer | Dominik Müller et.al. | 2403.16695 | null |
| 2024-03-25 | DeepGleason: a System for Automated Gleason Grading of Prostate Cancer using Deep Neural Networks | Dominik Müller et.al. | 2403.16678 | link |
| 2024-03-25 | LARA: Linguistic-Adaptive Retrieval-Augmented LLMs for Multi-Turn Intent Classification | Liu Junhua et.al. | 2403.16504 | null |
| 2024-03-24 | On machine learning analysis of atomic force microscopy images for image classification, sample surface recognition | Igor Sokolov et.al. | 2403.16230 | null |
| 2024-03-24 | Leveraging Deep Learning and Xception Architecture for High-Accuracy MRI Classification in Alzheimer Diagnosis | Shaojie Li et.al. | 2403.16212 | null |
| 2024-03-24 | Multi-Task Learning with Multi-Task Optimization | Lu Bai et.al. | 2403.16162 | null |
| 2024-03-24 | CBGT-Net: A Neuromimetic Architecture for Robust Classification of Streaming Data | Shreya Sharma et.al. | 2403.15974 | link |
| 2024-03-23 | A Deep Learning Architectures for Kidney Disease Classification | Muhammad Shoaib Farooq et.al. | 2403.15895 | null |
| 2024-03-23 | VLUE: A New Benchmark and Multi-task Knowledge Transfer Learning for Vietnamese Natural Language Understanding | Phong Nguyen-Thuan Do et.al. | 2403.15882 | null |
| 2024-03-23 | VLM-CPL: Consensus Pseudo Labels from Vision-Language Models for Human Annotation-Free Pathological Image Classification | Lanfeng Zhong et.al. | 2403.15836 | null |
| 2024-03-22 | Your Image is My Video: Reshaping the Receptive Field via Image-To-Video Differentiable AutoAugmentation and Fusion | Sofia Casarin et.al. | 2403.15194 | null |
| 2024-03-22 | Image Classification with Rotation-Invariant Variational Quantum Circuits | Paul San Sebastian et.al. | 2403.15031 | null |
| 2024-03-22 | Extracting Human Attention through Crowdsourced Patch Labeling | Minsuk Chang et.al. | 2403.15013 | null |
| 2024-03-22 | Clean-image Backdoor Attacks | Dazhong Rong et.al. | 2403.15010 | null |
| 2024-03-22 | ParFormer: Vision Transformer Baseline with Parallel Local Global Token Mixer and Convolution Attention Patch Embedding | Novendra Setyawan et.al. | 2403.15004 | null |
| 2024-03-22 | MasonTigers at SemEval-2024 Task 8: Performance Analysis of Transformer-based Models on Machine-Generated Text Detection | Sadiya Sayara Chowdhury Puspo et.al. | 2403.14989 | null |
| 2024-03-21 | Learning with SASQuaTCh: a Novel Variational Quantum Transformer Architecture with Kernel-Based Self-Attention | Ethan N. Evans et.al. | 2403.14753 | null |
| 2024-03-21 | Estimating Physical Information Consistency of Channel Data Augmentation for Remote Sensing Images | Tom Burgert et.al. | 2403.14547 | null |
| 2024-03-21 | Multi-Level Explanations for Generative Language Models | Lucas Monteiro Paes et.al. | 2403.14459 | link |
| 2024-03-21 | Tensor network compressibility of convolutional models | Sukhbinder Singh et.al. | 2403.14379 | null |
| 2024-03-21 | LayoutLLM: Large Language Model Instruction Tuning for Visually Rich Document Understanding | Masato Fujitake et.al. | 2403.14252 | null |
| 2024-03-21 | Safeguarding Medical Image Segmentation Datasets against Unauthorized Training via Contour- and Texture-Aware Perturbations | Xun Lin et.al. | 2403.14250 | null |
| 2024-03-21 | Improving Image Classification Accuracy through Complementary Intra-Class and Inter-Class Mixup | Ye Xu et.al. | 2403.14137 | link |
| 2024-03-20 | Bridge the Modality and Capacity Gaps in Vision-Language Model Selection | Chao Yi et.al. | 2403.13797 | null |
| 2024-03-20 | Leveraging feature communication in federated learning for remote sensing image classification | Anh-Kiet Duong et.al. | 2403.13575 | null |
| 2024-03-20 | MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining | Di Wang et.al. | 2403.13430 | link |
| 2024-03-20 | Building Optimal Neural Architectures using Interpretable Knowledge | Keith G. Mills et.al. | 2403.13293 | link |
| 2024-03-19 | LUWA Dataset: Learning Lithic Use-Wear Analysis on Microscopic Images | Jing Zhang et.al. | 2403.13171 | null |
| 2024-03-19 | Improved EATFormer: A Vision Transformer for Medical Image Classification | Yulong Shisu et.al. | 2403.13167 | null |
| 2024-03-19 | SIFT-DBT: Self-supervised Initialization and Fine-Tuning for Imbalanced Digital Breast Tomosynthesis Image Classification | Yuexi Du et.al. | 2403.13148 | link |
| 2024-03-19 | Using evolutionary computation to optimize task performance of unclocked, recurrent Boolean circuits in FPGAs | Raphael Norman-Tenazas et.al. | 2403.13105 | null |
| 2024-03-19 | Investigating Text Shortening Strategy in BERT: Truncation vs Summarization | Mirza Alim Mutasodirin et.al. | 2403.12799 | link |
| 2024-03-18 | Posterior Uncertainty Quantification in Neural Networks using Data Augmentation | Luhuan Wu et.al. | 2403.12729 | link |
| 2024-03-19 | SEVEN: Pruning Transformer Model by Reserving Sentinels | Jinying Xiao et.al. | 2403.12688 | link |
| 2024-03-19 | Simple Hack for Transformers against Heavy Long-Text Classification on a Time- and Memory-Limited GPU Service | Mirza Alim Mutasodirin et.al. | 2403.12563 | null |
| 2024-03-19 | Prompt-Guided Adaptive Model Transformation for Whole Slide Image Classification | Yi Lin et.al. | 2403.12537 | null |
| 2024-03-19 | CrossTune: Black-Box Few-Shot Classification with Label Enhancement | Danqing Luo et.al. | 2403.12468 | null |
| 2024-03-18 | Generalizing deep learning models for medical image classification | Matta Sarah et.al. | 2403.12167 | null |
| 2024-03-19 | Leveraging Spatial and Semantic Feature Extraction for Skin Cancer Diagnosis with Capsule Networks and Graph Neural Networks | K. P. Santoso et.al. | 2403.12009 | null |
| 2024-03-18 | High-energy physics image classification: A Survey of Jet Applications | Hamza Kheddar et.al. | 2403.11934 | null |
| 2024-03-18 | Better (pseudo-)labels for semi-supervised instance segmentation | François Porcher et.al. | 2403.11675 | null |
| 2024-03-18 | Continual Forgetting for Pre-trained Vision Models | Hongbo Zhao et.al. | 2403.11530 | link |
| 2024-03-18 | Uncertainty-Calibrated Test-Time Model Adaptation without Forgetting | Mingkui Tan et.al. | 2403.11491 | null |
| 2024-03-17 | Potential of Domain Adaptation in Machine Learning in Ecology and Hydrology to Improve Model Extrapolability | Haiyang Shi et.al. | 2403.11331 | null |
| 2024-03-17 | A Modified Word Saliency-Based Adversarial Attack on Text Classification Models | Hetvi Waghela et.al. | 2403.11297 | null |
| 2024-03-17 | Forging the Forger: An Attempt to Improve Authorship Verification via Data Augmentation | Silvia Corbara et.al. | 2403.11265 | null |
| 2024-03-17 | Multiple Teachers-Meticulous Student: A Domain Adaptive Meta-Knowledge Distillation Model for Medical Image Classification | Shahabedin Nabavi et.al. | 2403.11226 | null |
| 2024-03-16 | Forward Learning of Graph Neural Networks | Namyong Park et.al. | 2403.11004 | link |
| 2024-03-16 | Understanding Robustness of Visual State Space Models for Image Classification | Chengbin Du et.al. | 2403.10935 | null |
| 2024-03-16 | Automatic location detection based on deep learning | Anjali Karangiya et.al. | 2403.10912 | null |
| 2024-03-14 | Transformers Get Stable: An End-to-End Signal Propagation Theory for Language Models | Akhil Kedia et.al. | 2403.09635 | link |
| 2024-03-14 | XCoOp: Explainable Prompt Learning for Computer-Aided Diagnosis via Concept-guided Context Optimization | Yequan Bie et.al. | 2403.09410 | null |
| 2024-03-14 | ConDiSR: Contrastive Disentanglement and Style Regularization for Single Domain Generalization | Aleksandr Matsun et.al. | 2403.09400 | null |
| 2024-03-14 | A Hierarchical Fused Quantum Fuzzy Neural Network for Image Classification | Sheng-Yao Wu et.al. | 2403.09318 | null |
| 2024-03-14 | CLIP-EBC: CLIP Can Count Accurately through Enhanced Blockwise Classification | Yiming Ma et.al. | 2403.09281 | null |
| 2024-03-14 | Are Vision Language Models Texture or Shape Biased and Can We Steer Them? | Paul Gavrikov et.al. | 2403.09193 | link |
| 2024-03-14 | Randomized Principal Component Analysis for Hyperspectral Image Classification | Mustafa Ustuner et.al. | 2403.09117 | null |
| 2024-03-14 | CardioCaps: Attention-based Capsule Network for Class-Imbalanced Echocardiogram Classification | Hyunkyung Han et.al. | 2403.09108 | link |
| 2024-03-14 | The First to Know: How Token Distributions Reveal Hidden Knowledge in Large Vision-Language Models? | Qinyu Zhao et.al. | 2403.09037 | link |
| 2024-03-13 | PathM3: A Multimodal Multi-Task Multiple Instance Learning Framework for Whole Slide Image Classification and Captioning | Qifeng Zhou et.al. | 2403.08967 | null |
| 2024-03-13 | DAM: Dynamic Adapter Merging for Continual Video QA Learning | Feng Cheng et.al. | 2403.08755 | link |
| 2024-03-13 | Leveraging Compressed Frame Sizes For Ultra-Fast Video Classification | Yuxing Han et.al. | 2403.08580 | null |
| 2024-03-13 | HOLMES: HOLonym-MEronym based Semantic inspection for Convolutional Image Classifiers | Francesco Dibitonto et.al. | 2403.08536 | link |
| 2024-03-13 | Pig aggression classification using CNN, Transformers and Recurrent Networks | Junior Silva Souza et.al. | 2403.08528 | null |
| 2024-03-13 | Reduced Jeffries-Matusita distance: A Novel Loss Function to Improve Generalization Performance of Deep Classification Models | Mohammad Lashkari et.al. | 2403.08408 | null |
| 2024-03-13 | Iterative Online Image Synthesis via Diffusion Model for Imbalanced Classification | Shuhan Li et.al. | 2403.08407 | null |
| 2024-03-13 | Advancing Security in AI Systems: A Novel Approach to Detecting Backdoors in Deep Neural Networks | Khondoker Murad Hossain et.al. | 2403.08208 | null |
| 2024-03-13 | Multiscale Low-Frequency Memory Network for Improved Feature Extraction in Convolutional Neural Networks | Fuzhi Wu et.al. | 2403.08157 | link |
| 2024-03-12 | Harnessing Artificial Intelligence to Combat Online Hate: Exploring the Challenges and Opportunities of Large Language Models in Hate Speech Detection | Tharindu Kumarage et.al. | 2403.08035 | null |
| 2024-03-13 | Visual Decoding and Reconstruction via EEG Embeddings with Guided Diffusion | Dongyang Li et.al. | 2403.07721 | link |
| 2024-03-12 | FPT: Fine-grained Prompt Tuning for Parameter and Memory Efficient Fine Tuning in High-resolution Medical Image Classification | Yijin Huang et.al. | 2403.07576 | null |
| 2024-03-12 | Backdoor Attack with Mode Mixture Latent Modification | Hongwei Zhang et.al. | 2403.07463 | null |
| 2024-03-12 | In-context learning enables multimodal large language models to classify cancer pathology images | Dyke Ferber et.al. | 2403.07407 | null |
| 2024-03-12 | Premonition: Using Generative Models to Preempt Future Data Changes in Continual Learning | Mark D. McDonnell et.al. | 2403.07356 | null |
| 2024-03-12 | How does promoting the minority fraction affect generalization? A theoretical study of the one-hidden-layer neural network on group imbalance | Hongkang Li et.al. | 2403.07310 | null |
| 2024-03-12 | A Bayesian Approach to OOD Robustness in Image Classification | Prakhar Kaushik et.al. | 2403.07277 | link |
| 2024-03-11 | LeOCLR: Leveraging Original Images for Contrastive Learning of Visual Representations | Mohammad Alkhalefi et.al. | 2403.06813 | null |
| 2024-03-11 | Dynamic Perturbation-Adaptive Adversarial Training on Medical Image Classification | Shuai Li et.al. | 2403.06798 | null |
| 2024-03-11 | Leveraging Internal Representations of Model for Magnetic Image Classification | Adarsh N L et.al. | 2403.06797 | null |
| 2024-03-11 | Shortcut Learning in Medical Image Segmentation | Manxi Lin et.al. | 2403.06748 | null |
| 2024-03-11 | Active Generation for Image Classification | Tao Huang et.al. | 2403.06517 | null |
| 2024-03-11 | Evolving Knowledge Distillation with Large Language Models and Active Learning | Chengyuan Liu et.al. | 2403.06414 | null |
| 2024-03-11 | ‘One size doesn’t fit all’: Learning how many Examples to use for In-Context Learning for Improved Text Classification | Manish Chandra et.al. | 2403.06402 | null |
| 2024-03-10 | Probing Image Compression For Class-Incremental Learning | Justin Yang et.al. | 2403.06288 | null |
| 2024-03-10 | Bayesian Random Semantic Data Augmentation for Medical Image Classification | Yaoyao Zhu et.al. | 2403.06138 | link |
| 2024-03-10 | Universal Debiased Editing for Fair Medical Image Classification | Ruinan Jin et.al. | 2403.06104 | null |
| 2024-03-08 | Tune without Validation: Searching for Learning Rate and Weight Decay on Training Sets | Lorenzo Brigato et.al. | 2403.05532 | null |
| 2024-03-08 | Generalized Correspondence Matching via Flexible Hierarchical Refinement and Patch Descriptor Distillation | Yu Han et.al. | 2403.05388 | null |
| 2024-03-08 | The Impact of Quantization on the Robustness of Transformer-based Text Classifiers | Seyed Parsa Neshaei et.al. | 2403.05365 | null |
| 2024-03-08 | Multiple Instance Learning with random sampling for Whole Slide Image Classification | H. Keshvarikhojasteh et.al. | 2403.05351 | null |
| 2024-03-08 | Learning Expressive And Generalizable Motion Features For Face Forgery Detection | Jingyi Zhang et.al. | 2403.05172 | null |
| 2024-03-08 | Defending Against Unforeseen Failure Modes with Latent Adversarial Training | Stephen Casper et.al. | 2403.05030 | link |
| 2024-03-07 | Fooling Neural Networks for Motion Forecasting via Adversarial Attacks | Edgar Medina et.al. | 2403.04954 | null |
| 2024-03-07 | T-TAME: Trainable Attention Mechanism for Explaining Convolutional Networks and Vision Transformers | Mariano V. Ntrougkas et.al. | 2403.04523 | link |
| 2024-03-07 | Source Matters: Source Dataset Impact on Model Robustness in Medical Imaging | Dovile Juodelyte et.al. | 2403.04484 | link |
| 2024-03-07 | Advancing Biomedical Text Mining with Community Challenges | Hui Zong et.al. | 2403.04261 | null |
| 2024-03-07 | Scalable On-Chip Optical Linear Processing Unit Using a Single Thin-Film Lithium Niobate Ring Modulator | Zhaoang Deng et.al. | 2403.04216 | null |
| 2024-03-07 | Scalable and Robust Transformer Decoders for Interpretable Image Classification with Foundation Models | Evelyn Mannix et.al. | 2403.04125 | null |
| 2024-03-07 | Privacy-preserving Fine-tuning of Large Language Models through Flatness | Tiejin Chen et.al. | 2403.04124 | null |
| 2024-03-06 | MedMamba: Vision Mamba for Medical Image Classification | Yubiao Yue et.al. | 2403.03849 | link |
| 2024-03-06 | On the Effectiveness of Distillation in Mitigating Backdoors in Pre-trained Encoder | Tingxu Han et.al. | 2403.03846 | link |
| 2024-03-06 | RADIA – Radio Advertisement Detection with Intelligent Analytics | Jorge Álvarez et.al. | 2403.03538 | null |
| 2024-03-06 | Inverse-Free Fast Natural Gradient Descent Method for Deep Learning | Xinwei Ou et.al. | 2403.03473 | null |
| 2024-03-06 | Sparse Spiking Neural Network: Exploiting Heterogeneity in Timescales for Pruning Recurrent SNN | Biswadeep Chakraborty et.al. | 2403.03409 | null |
| 2024-03-05 | RulePrompt: Weakly Supervised Text Classification with Prompting PLMs and Self-Iterative Logical Rules | Miaomiao Li et.al. | 2403.02932 | link |
| 2024-03-05 | Demonstrating Mutual Reinforcement Effect through Information Flow | Chengguang Gan et.al. | 2403.02902 | null |
| 2024-03-05 | Quantum Mixed-State Self-Attention Network | Fu Chen et.al. | 2403.02871 | null |
| 2024-03-05 | SOFIM: Stochastic Optimization Using Regularized Fisher Information Matrix | Gayathri C et.al. | 2403.02833 | null |
| 2024-03-05 | SGD with Partial Hessian for Deep Neural Networks Optimization | Ying Sun et.al. | 2403.02681 | link |
| 2024-03-05 | G-EvoNAS: Evolutionary Neural Architecture Search Based on Network Growth | Juan Zou et.al. | 2403.02667 | null |
| 2024-03-05 | Remove that Square Root: A New Efficient Scale-Invariant Version of AdaGrad | Sayantan Choudhury et.al. | 2403.02648 | link |
| 2024-03-05 | Modeling Collaborator: Enabling Subjective Vision Classification With Minimal Human Effort via LLM Tool-Use | Imad Eddine Toubal et.al. | 2403.02626 | null |
| 2024-03-04 | When do Convolutional Neural Networks Stop Learning? | Sahan Ahmad et.al. | 2403.02473 | link |
| 2024-03-04 | NiNformer: A Network in Network Transformer with Token Mixing Generated Gating Function | Abdullah Nazhat Abdullah et.al. | 2403.02411 | link |
| 2024-03-02 | Can a Confident Prior Replace a Cold Posterior? | Martin Marek et.al. | 2403.01272 | link |
| 2024-03-02 | Leveraging Self-Supervised Learning for Scene Recognition in Child Sexual Abuse Imagery | Pedro H. V. Valois et.al. | 2403.01183 | null |
| 2024-03-02 | Auxiliary Tasks Enhanced Dual-affinity Learning for Weakly Supervised Semantic Segmentation | Lian Xu et.al. | 2403.01156 | null |
| 2024-03-02 | ELA: Efficient Local Attention for Deep Convolutional Neural Networks | Wei Xu et.al. | 2403.01123 | null |
| 2024-03-01 | Margin Discrepancy-based Adversarial Training for Multi-Domain Text Classification | Yuan Wu et.al. | 2403.00888 | null |
| 2024-03-01 | Text classification of column headers with a controlled vocabulary: leveraging LLMs for metadata enrichment | Margherita Martorana et.al. | 2403.00884 | null |
| 2024-03-01 | SURE: SUrvey REcipes for building reliable and robust deep networks | Yuting Li et.al. | 2403.00543 | link |
| 2024-03-01 | Invariant Test-Time Adaptation for Vision-Language Model Generalization | Huan Ma et.al. | 2403.00376 | null |
| 2024-02-29 | TELEClass: Taxonomy Enrichment and LLM-Enhanced Hierarchical Text Classification with Minimal Supervision | Yunyi Zhang et.al. | 2403.00165 | null |
| 2024-02-29 | Assessing Visually-Continuous Corruption Robustness of Neural Networks Relative to Human Performance | Huakun Shen et.al. | 2402.19401 | null |
| 2024-02-29 | Stitching Gaps: Fusing Situated Perceptual Knowledge with Vision Transformers for High-Level Image Classification | Delfina Sol Martinez Pandiani et.al. | 2402.19339 | null |
| 2024-02-29 | Generalizable Whole Slide Image Classification with Fine-Grained Visual-Semantic Interaction | Hao Li et.al. | 2402.19326 | null |
| 2024-02-29 | Decompose-and-Compose: A Compositional Approach to Mitigating Spurious Correlation | Fahimeh Hosseini Noohdani et.al. | 2402.18919 | null |
| 2024-02-29 | Utilizing Local Hierarchy with Adversarial Training for Hierarchical Text Classification | Zihan Wang et.al. | 2402.18825 | link |
| 2024-02-28 | Comparing Importance Sampling Based Methods for Mitigating the Effect of Class Imbalance | Indu Panigrahi et.al. | 2402.18742 | link |
| 2024-02-28 | Deep Neural Network Models Trained With A Fixed Random Classifier Transfer Better Across Domains | Hafiz Tiomoko Ali et.al. | 2402.18614 | null |
| 2024-02-28 | Orchid: Flexible and Data-Dependent Convolution for Sequence Modeling | Mahdi Karami et.al. | 2402.18508 | null |
| 2024-02-28 | Prompt-Driven Dynamic Object-Centric Learning for Single Domain Generalization | Deng Li et.al. | 2402.18447 | null |
| 2024-02-29 | A Modular System for Enhanced Robustness of Multimedia Understanding Networks via Deep Parametric Estimation | Francesco Barbato et.al. | 2402.18402 | null |
| 2024-02-28 | A Multimodal Handover Failure Detection Dataset and Baselines | Santosh Thoduka et.al. | 2402.18319 | null |
| 2024-02-28 | Classes Are Not Equal: An Empirical Study on Image Recognition Fairness | Jiequan Cui et.al. | 2402.18133 | null |
| 2024-02-27 | Understanding Neural Network Binarization with Forward and Backward Proximal Quantizers | Yiwei Lu et.al. | 2402.17710 | null |
| 2024-02-27 | SDF2Net: Shallow to Deep Feature Fusion Network for PolSAR Image Classification | Mohammed Q. Alkhatib et.al. | 2402.17672 | link |
| 2024-02-27 | Predict the Next Word: | Evgenia Ilia et.al. | 2402.17527 | null |
| 2024-02-27 | Scaling Supervised Local Learning with Augmented Auxiliary Networks | Chenxiang Ma et.al. | 2402.17318 | link |
| 2024-02-26 | Offline Writer Identification Using Convolutional Neural Network Activation Features | Vincent Christlein et.al. | 2402.17029 | null |
Object Detection
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2025-12-23 | Bridging Modalities and Transferring Knowledge: Enhanced Multimodal Understanding and Recognition | Gorjan Radevski et.al. | 2512.20501 | null |
| 2025-12-23 | $D^{3}$ETOR: Debate-Enhanced Pseudo Labeling and Frequency-Aware Progressive Debiasing for Weakly-Supervised Camouflaged Object Detection with Scribble Annotations | Jiawei Ge et.al. | 2512.20260 | null |
| 2025-12-23 | LiteFusion: Taming 3D Object Detectors from Vision-Based to Multi-Modal with Minimal Adaptation | Xiangxuan Ren et.al. | 2512.20217 | null |
| 2025-12-23 | Gaussian Process Assisted Meta-learning for Image Classification and Object Detection Models | Anna R. Flowers et.al. | 2512.20021 | null |
| 2025-12-23 | PaveSync: A Unified and Comprehensive Dataset for Pavement Distress Analysis and Classification | Blessing Agyei Kyem et.al. | 2512.20011 | null |
| 2025-12-22 | Photonic Spiking Graph Neural Network for Energy-Efficient Structured Data Processing | Wanting Yu et.al. | 2512.19182 | null |
| 2025-12-20 | The size of 3I/ATLAS from non-gravitational acceleration | John C. Forbes et.al. | 2512.18341 | null |
| 2025-12-20 | Pyramidal Adaptive Cross-Gating for Multimodal Detection | Zidong Gu et.al. | 2512.18291 | null |
| 2025-12-20 | Building UI/UX Dataset for Dark Pattern Detection and YOLOv12x-based Real-Time Object Recognition Detection System | Se-Young Jang et.al. | 2512.18269 | null |
| 2025-12-20 | Spectral Discrepancy and Cross-modal Semantic Consistency Learning for Object Detection in Hyperspectral Image | Xiao He et.al. | 2512.18245 | null |
| 2025-12-20 | ALIGN: Advanced Query Initialization with LiDAR-Image Guidance for Occlusion-Robust 3D Object Detection | Janghyun Baek et.al. | 2512.18187 | null |
| 2025-12-19 | YolovN-CBi: A Lightweight and Efficient Architecture for Real-Time Detection of Small UAVs | Ami Pandat et.al. | 2512.18046 | null |
| 2025-12-19 | StereoMV2D: A Sparse Temporal Stereo-Enhanced Framework for Robust Multi-View 3D Object Detection | Di Wu et.al. | 2512.17620 | null |
| 2025-12-19 | Foundation Model Priors Enhance Object Focus in Feature Space for Source-Free Object Detection | Sairam VCR et.al. | 2512.17514 | null |
| 2025-12-19 | PILAR: Personalizing Augmented Reality Interactions with LLM-based Human-Centric and Trustworthy Explanations for Daily Use Cases | Ripan Kumar Kundu et.al. | 2512.17172 | null |
| 2025-12-18 | DenseBEV: Transforming BEV Grid Cells into 3D Objects | Marius Dähling et.al. | 2512.16818 | null |
| 2025-12-18 | FlowDet: Unifying Object Detection and Generative Transport Flows | Enis Baty et.al. | 2512.16771 | null |
| 2025-12-18 | YOLO11-4K: An Efficient Architecture for Real-Time Small Object Detection in 4K Panoramic Images | Huma Hafeez et.al. | 2512.16493 | null |
| 2025-12-18 | Autoencoder-based Denoising Defense against Adversarial Attacks on Object Detection | Min Geun Song et.al. | 2512.16123 | null |
| 2025-12-18 | Auto-Vocabulary 3D Object Detection | Haomeng Zhang et.al. | 2512.16077 | null |
| 2025-12-17 | From Words to Wavelengths: VLMs for Few-Shot Multispectral Object Detection | Manuel Nkegoum et.al. | 2512.15971 | null |
| 2025-12-13 | Two-Step Data Augmentation for Masked Face Detection and Recognition: Turning Fake Masks to Real | Yan Yang et.al. | 2512.15774 | null |
| 2025-12-17 | IMKD: Intensity-Aware Multi-Level Knowledge Distillation for Camera-Radar Fusion | Shashank Mishra et.al. | 2512.15581 | null |
| 2025-12-17 | Evaluation of deep learning architectures for wildlife object detection: A comparative study of ResNet and Inception | Malach Obisa Amonga et.al. | 2512.15480 | null |
| 2025-12-17 | Vision-based module for accurately reading linear scales in a laboratory | Parvesh Saini et.al. | 2512.15327 | null |
| 2025-12-17 | EPSM: A Novel Metric to Evaluate the Safety of Environmental Perception in Autonomous Driving | Jörg Gamerdinger et.al. | 2512.15195 | null |
| 2025-12-17 | Criticality Metrics for Relevance Classification in Safety Evaluation of Object Detection in Automated Driving | Jörg Gamerdinger et.al. | 2512.15181 | null |
| 2025-12-17 | Beyond Proximity: A Keypoint-Trajectory Framework for Classifying Affiliative and Agonistic Social Networks in Dairy Cattle | Sibi Parivendan et.al. | 2512.14998 | null |
| 2025-12-16 | TUMTraf EMOT: Event-Based Multi-Object Tracking Dataset and Baseline for Traffic Scenarios | Mengyu Li et.al. | 2512.14595 | null |
| 2025-12-16 | 4D-RaDiff: Latent Diffusion for 4D Radar Point Cloud Generation | Jimmie Kwok et.al. | 2512.14235 | null |
| 2025-12-16 | CIS-BA: Continuous Interaction Space Based Backdoor Attack for Object Detection in the Real-World | Shuxin Zhao et.al. | 2512.14158 | null |
| 2025-12-16 | Neurosymbolic Inference On Foundation Models For Remote Sensing Text-to-image Retrieval With Complex Queries | Emanuele Mezzi et.al. | 2512.14102 | null |
| 2025-12-16 | Deep Learning Perspective of Scene Understanding in Autonomous Robots | Afia Maham et.al. | 2512.14020 | null |
| 2025-12-16 | Real-Time Service Subscription and Adaptive Offloading Control in Vehicular Edge Computing | Chuanchao Gao et.al. | 2512.14002 | null |
| 2025-12-16 | FocalComm: Hard Instance-Aware Multi-Agent Perception | Dereje Shenkut et.al. | 2512.13982 | null |
| 2025-12-15 | Route-DETR: Pairwise Query Routing in Transformers for Object Detection | Ye Zhang et.al. | 2512.13876 | null |
| 2025-12-15 | VajraV1 – The most accurate Real Time Object Detector of the YOLO family | Naman Balbir Singh Makkar et.al. | 2512.13834 | null |
| 2025-12-15 | Near-Field Perception for Safety Enhancement of Autonomous Mobile Robots in Manufacturing Environments | Li-Wei Shih et.al. | 2512.13561 | null |
| 2025-12-15 | On the Ability of Deep Learning to Detect Signals with Unknown Parameters | Tom Anders et.al. | 2512.13542 | null |
| 2025-12-15 | Computer vision training dataset generation for robotic environments using Gaussian splatting | Patryk Niżeniec et.al. | 2512.13411 | null |
| 2025-12-15 | Diffusion-Based Restoration for Multi-Modal 3D Object Detection in Adverse Weather | Zhijian He et.al. | 2512.13107 | null |
| 2025-12-14 | Cross-Level Sensor Fusion with Object Lists via Transformer for 3D Object Detection | Xiangzhong Liu et.al. | 2512.12884 | null |
| 2025-12-13 | INDOOR-LiDAR: Bridging Simulation and Reality for Robot-Centric 360 degree Indoor LiDAR Perception – A Robot-Centric Hybrid Dataset | Haichuan Li et.al. | 2512.12377 | null |
| 2025-12-13 | WeDetect: Fast Open-Vocabulary Object Detection as Retrieval | Shenghao Fu et.al. | 2512.12309 | null |
| 2025-12-13 | Cognitive-YOLO: LLM-Driven Architecture Synthesis from First Principles of Data for Object Detection | Jiahao Zhao et.al. | 2512.12281 | null |
| 2025-12-13 | AI-Augmented Pollen Recognition in Optical and Holographic Microscopy for Veterinary Imaging | Swarn S. Warshaneyan et.al. | 2512.12101 | null |
| 2025-12-12 | TransBridge: Boost 3D Object Detection by Scene-Level Completion with Transformer Decoder | Qinghao Meng et.al. | 2512.11926 | null |
| 2025-12-12 | Depth-Copy-Paste: Multimodal and Depth-Aware Compositing for Robust Face Detection | Qiushi Guo et.al. | 2512.11683 | null |
| 2025-12-12 | DOS: Distilling Observable Softmaps of Zipfian Prototypes for Self-Supervised Point Representation | Mohamed Abdelsamad et.al. | 2512.11465 | null |
| 2025-12-12 | Assisted Refinement Network Based on Channel Information Interaction for Camouflaged and Salient Object Detection | Kuan Wang et.al. | 2512.11369 | null |
| 2025-12-12 | Reliable Detection of Minute Targets in High-Resolution Aerial Imagery across Temporal Shifts | Mohammad Sadegh Gholizadeh et.al. | 2512.11360 | null |
| 2025-12-11 | VGent: Visual Grounding via Modular Design for Disentangling Reasoning and Prediction | Weitai Kang et.al. | 2512.11099 | null |
| 2025-12-11 | Salient Object Detection in Complex Weather Conditions via Noise Indicators | Quan Chen et.al. | 2512.10592 | null |
| 2025-12-11 | Adaptive Dual-Weighted Gravitational Point Cloud Denoising Method | Ge Zhang et.al. | 2512.10386 | null |
| 2025-12-10 | ABBSPO: Adaptive Bounding Box Scaling and Symmetric Prior based Orientation Prediction for Detecting Aerial Image Objects | Woojin Lee et.al. | 2512.10031 | null |
| 2025-12-10 | NordFKB: a fine-grained benchmark dataset for geospatial AI in Norway | Sander Riisøen Jyhne et.al. | 2512.09913 | null |
| 2025-12-10 | Hands-on Evaluation of Visual Transformers for Object Recognition and Detection | Dimitrios N. Vlachogiannis et.al. | 2512.09579 | null |
| 2025-12-10 | MODA: The First Challenging Benchmark for Multispectral Object Detection in Aerial Images | Shuaihao Han et.al. | 2512.09489 | null |
| 2025-12-10 | A Hierarchical, Model-Based System for High-Performance Humanoid Soccer | Quanyou Wang et.al. | 2512.09431 | null |
| 2025-12-10 | Identifying Bias in Machine-generated Text Detection | Kevin Stowe et.al. | 2512.09292 | null |
| 2025-12-10 | ROI-Packing: Efficient Region-Based Compression for Machine Vision | Md Eimran Hossain Eimon et.al. | 2512.09258 | null |
| 2025-12-09 | Automated Pollen Recognition in Optical and Holographic Microscopy Images | Swarn Singh Warshaneyan et.al. | 2512.08589 | null |
| 2025-12-09 | SSCATeR: Sparse Scatter-Based Convolution Algorithm with Temporal Data Recycling for Real-Time 3D Object Detection in LiDAR Point Clouds | Alexander Dow et.al. | 2512.08557 | null |
| 2025-12-09 | Distilling Future Temporal Knowledge with Masked Feature Reconstruction for 3D Object Detection | Haowen Zheng et.al. | 2512.08247 | null |
| 2025-12-09 | SOP^2: Transfer Learning with Scene-Oriented Prompt Pool on 3D Object Detection | Ching-Hung Cheng et.al. | 2512.08223 | null |
| 2025-12-09 | Metasurfaces Enable Active-Like Passive Radar | Mingyi Li et.al. | 2512.08208 | null |
| 2025-11-27 | Semi-Supervised Contrastive Learning with Orthonormal Prototypes | Huanran Li et.al. | 2512.07880 | null |
| 2025-12-08 | An AI-Powered Autonomous Underwater System for Sea Exploration and Scientific Research | Hamad Almazrouei et.al. | 2512.07652 | null |
| 2025-12-08 | Towards Robust DeepFake Detection under Unstable Face Sequences: Adaptive Sparse Graph Embedding with Order-Free Representation and Explicit Laplacian Spectral Prior | Chih-Chung Hsu et.al. | 2512.07498 | null |
| 2025-12-08 | Enhancing Small Object Detection with YOLO: A Novel Framework for Improved Accuracy and Efficiency | Mahila Moghadami et.al. | 2512.07379 | null |
| 2025-12-08 | A graph generation pipeline for critical infrastructures based on heuristics, images and depth data | Mike Diessner et.al. | 2512.07269 | null |
| 2025-12-08 | DART: Leveraging Multi-Agent Disagreement for Tool Recruitment in Multimodal Reasoning | Nithin Sivakumaran et.al. | 2512.07132 | null |
| 2025-12-08 | DFIR-DETR: Frequency Domain Enhancement and Dynamic Feature Aggregation for Cross-Scene Small Object Detection | Bo Gao et.al. | 2512.07078 | null |
| 2025-12-07 | Large Language Models and Forensic Linguistics: Navigating Opportunities and Threats in the Age of Generative AI | George Mikros et.al. | 2512.06922 | null |
| 2025-12-07 | Spatial Retrieval Augmented Autonomous Driving | Xiaosong Jia et.al. | 2512.06865 | null |
| 2025-12-07 | CoT4Det: A Chain-of-Thought Framework for Perception-Oriented Vision-Language Tasks | Yu Qi et.al. | 2512.06663 | null |
| 2025-12-07 | TextMamba: Scene Text Detector with Mamba | Qiyan Zhao et.al. | 2512.06657 | null |
| 2025-12-06 | Neural expressiveness for beyond importance model compression | Angelos-Christos Maroudis et.al. | 2512.06440 | null |
| 2025-12-06 | Are AI-Generated Driving Videos Ready for Autonomous Driving? A Diagnostic Evaluation Framework | Xinhao Xiang et.al. | 2512.06376 | null |
| 2025-12-05 | OWL: Unsupervised 3D Object Detection by Occupancy Guided Warm-up and Large Model Priors Reasoning | Xusheng Guo et.al. | 2512.05698 | null |
| 2025-12-05 | LeAD-M3D: Leveraging Asymmetric Distillation for Real-time Monocular 3D Detection | Johannes Meier et.al. | 2512.05663 | null |
| 2025-12-05 | An Integrated System for WEEE Sorting Employing X-ray Imaging, AI-based Object Detection and Segmentation, and Delta Robot Manipulation | Panagiotis Giannikos et.al. | 2512.05599 | null |
| 2025-12-05 | Concept-based Explainable Data Mining with VLM for 3D Detection | Mai Tsujimoto et.al. | 2512.05482 | null |
| 2025-12-05 | Moving object detection from multi-depth images with an attention-enhanced CNN | Masato Shibukawa et.al. | 2512.05415 | null |
| 2025-12-05 | YOLO and SGBM Integration for Autonomous Tree Branch Detection and Depth Estimation in Radiata Pine Pruning Applications | Yida Lin et.al. | 2512.05412 | null |
| 2025-12-04 | GeoPE: A Unified Geometric Positional Embedding for Structured Tensors | Yupu Yao et.al. | 2512.04963 | null |
| 2025-12-04 | You Only Train Once (YOTO): A Retraining-Free Object Detection Framework | Priyanto Hidayatullah et.al. | 2512.04888 | null |
| 2025-12-04 | DuGI-MAE: Improving Infrared Mask Autoencoders via Dual-Domain Guidance | Yinghui Xing et.al. | 2512.04511 | null |
| 2025-12-04 | Dual-Stream Spectral Decoupling Distillation for Remote Sensing Object Detection | Xiangyi Gao et.al. | 2512.04413 | null |
| 2025-12-03 | Real-time Cricket Sorting By Sex | Juan Manuel Cantarero Angulo et.al. | 2512.04311 | null |
| 2025-12-03 | Fast & Efficient Normalizing Flows and Applications of Image Generative Models | Sandeep Nagar et.al. | 2512.04039 | null |
| 2025-12-03 | MKSNet: Advanced Small Object Detection in Remote Sensing Imagery with Multi-Kernel and Dual Attention Mechanisms | Jiahao Zhang et.al. | 2512.03640 | null |
| 2025-12-03 | Real-Time Control and Automation Framework for Acousto-Holographic Microscopy | Hasan Berkay Abdioğlu et.al. | 2512.03539 | null |
| 2025-12-03 | YOLOA: Real-Time Affordance Detection via LLM Adapter | Yuqi Ji et.al. | 2512.03418 | null |
| 2025-12-02 | GraphFusion3D: Dynamic Graph Attention Convolution with Adaptive Cross-Modal Transformer for 3D Object Detection | Md Sohag Mia et.al. | 2512.02991 | null |
| 2025-12-02 | BEVDilation: LiDAR-Centric Multi-Modal Fusion for 3D Object Detection | Guowen Zhang et.al. | 2512.02972 | null |
| 2025-12-02 | MRD: Multi-resolution Retrieval-Detection Fusion for High-Resolution Image Understanding | Fan Yang et.al. | 2512.02906 | null |
| 2025-12-02 | ALDI-ray: Adapting the ALDI Framework for Security X-ray Object Detection | Omid Reza Heidari et.al. | 2512.02696 | null |
| 2025-12-02 | SAM2Grasp: Resolve Multi-modal Grasping via Prompt-conditioned Temporal Action Prediction | Shengkai Wu et.al. | 2512.02609 | null |
| 2025-12-02 | GeoDiT: A Diffusion-based Vision-Language Model for Geospatial Understanding | Jiaqi Liu et.al. | 2512.02505 | null |
| 2025-12-02 | Temporal Dynamics Enhancer for Directly Trained Spiking Object Detectors | Fan Luo et.al. | 2512.02447 | null |
| 2025-12-01 | Physical ID-Transfer Attacks against Multi-Object Tracking via Adversarial Trajectory | Chenyi Wang et.al. | 2512.01934 | null |
| 2025-12-01 | SAM3-UNet: Simplified Adaptation of Segment Anything Model 3 | Xinyu Xiong et.al. | 2512.01789 | null |
| 2025-12-01 | Bridging the Scale Gap: Balanced Tiny and General Object Detection in Remote Sensing Imagery | Zhicheng Zhao et.al. | 2512.01665 | null |
| 2025-12-01 | ViT$^3$: Unlocking Test-Time Training in Vision | Dongchen Han et.al. | 2512.01643 | null |
| 2025-12-01 | OpenBox: Annotate Any Bounding Boxes in 3D | In-Jae Lee et.al. | 2512.01352 | null |
| 2025-12-01 | FOD-S2R: A FOD Dataset for Sim2Real Transfer Learning based Object Detection | Ashish Vashist et.al. | 2512.01315 | null |
| 2025-12-01 | Supervised Contrastive Machine Unlearning of Background Bias in Sonar Image Classification with Fine-Grained Explainable AI | Kamal Basha S et.al. | 2512.01291 | null |
| 2025-12-01 | VSRD++: Autolabeling for 3D Object Detection via Instance-Aware Volumetric Silhouette Rendering | Zihua Liu et.al. | 2512.01178 | null |
| 2025-12-01 | Real-Time On-the-Go Annotation Framework Using YOLO for Automated Dataset Generation | Mohamed Abdallah Salem et.al. | 2512.01165 | null |
| 2025-11-30 | Autonomous Grasping On Quadruped Robot With Task Level Interaction | Muhtadin et.al. | 2512.01052 | null |
| 2025-11-30 | Med-CMR: A Fine-Grained Benchmark Integrating Visual Evidence and Clinical Logic for Medical Complex Multimodal Reasoning | Haozhen Gong et.al. | 2512.00818 | null |
| 2025-11-30 | DEJIMA: A Novel Large-scale Japanese Dataset for Image Captioning and Visual Question Answering | Toshiki Katsube et.al. | 2512.00773 | null |
| 2025-11-29 | MM-DETR: An Efficient Multimodal Detection Transformer with Mamba-Driven Dual-Granularity Fusion and Frequency-Aware Modality Adapters | Jianhong Han et.al. | 2512.00363 | null |
| 2025-11-28 | Hybrid Synthetic Data Generation with Domain Randomization Enables Zero-Shot Vision-Based Part Inspection Under Extreme Class Imbalance | Ruo-Syuan Mei et.al. | 2512.00125 | null |
| 2025-11-25 | Diffusion-Based Synthetic Brightfield Microscopy Images for Enhanced Single Cell Detection | Mario de Jesus da Graca et.al. | 2512.00078 | null |
| 2025-11-24 | ProvRain: Rain-Adaptive Denoising and Vehicle Detection via MobileNet-UNet and Faster R-CNN | Aswinkumar Varathakumaran et.al. | 2512.00073 | null |
| 2025-11-23 | PEFT-DML: Parameter-Efficient Fine-Tuning Deep Metric Learning for Robust Multi-Modal 3D Object Detection in Autonomous Driving | Abdolazim Rezaei et.al. | 2512.00060 | null |
| 2025-11-28 | Object-Centric Data Synthesis for Category-level Object Detection | Vikhyat Agarwal et.al. | 2511.23450 | null |
| 2025-11-28 | Toward Automatic Safe Driving Instruction: A Large-Scale Vision Language Model Approach | Haruki Sakajo et.al. | 2511.23311 | null |
| 2025-11-28 | Synthetic Industrial Object Detection: GenAI vs. Feature-Based Methods | Jose Moises Araya-Martinez et.al. | 2511.23241 | null |
| 2025-11-28 | Zero-Shot Multi-Criteria Visual Quality Inspection for Semi-Controlled Industrial Environments via Real-Time 3D Digital Twin Simulation | Jose Moises Araya-Martinez et.al. | 2511.23214 | null |
| 2025-11-28 | Bharat Scene Text: A Novel Comprehensive Dataset and Benchmark for Indian Language Scene Text Understanding | Anik De et.al. | 2511.23071 | null |
| 2025-11-28 | Barcode and QR Code Object Detection: An Experimental Study on YOLOv8 Models | Kushagra Pandya et.al. | 2511.22937 | null |
| 2025-11-28 | DM$^3$T: Harmonizing Modalities via Diffusion for Multi-Object Tracking | Weiran Li et.al. | 2511.22896 | null |
| 2025-11-27 | DocVAL: Validated Chain-of-Thought Distillation for Grounded Document VQA | Ahmad Mohammadshirazi et.al. | 2511.22521 | null |
| 2025-11-27 | Small Object Detection for Birds with Swin Transformer | Da Huo et.al. | 2511.22310 | null |
| 2025-11-27 | Simplex-Optimized Hybrid Ensemble for Large Language Model Text Detection Under Generative Distribution Drift | Sepyan Purnama Kristanto et.al. | 2511.22153 | null |
| 2025-11-27 | Bistatic Passive Tracking via CSI Power | Zhongqin Wang et.al. | 2511.22144 | null |
| 2025-11-27 | SemOD: Semantic Enabled Object Detection Network under Various Weather Conditions | Aiyinsi Zuo et.al. | 2511.22142 | null |
| 2025-11-27 | PAGen: Phase-guided Amplitude Generation for Domain-adaptive Object Detection | Shuchen Du et.al. | 2511.22029 | null |
| 2025-11-22 | A Lightweight Approach to Detection of AI-Generated Texts Using Stylometric Features | Sergey K. Aityan et.al. | 2511.21744 | null |
| 2025-11-26 | Continual Error Correction on Low-Resource Devices | Kirill Paramonov et.al. | 2511.21652 | null |
| 2025-11-26 | CanKD: Cross-Attention-based Non-local operation for Feature-based Knowledge Distillation | Shizhe Sun et.al. | 2511.21503 | null |
| 2025-11-26 | Co-Training Vision Language Models for Remote Sensing Multi-task Learning | Qingyun Li et.al. | 2511.21272 | null |
| 2025-11-26 | OVOD-Agent: A Markov-Bandit Framework for Proactive Visual Reasoning and Self-Evolving Detection | Chujie Wang et.al. | 2511.21064 | null |
| 2025-11-26 | AerialMind: Towards Referring Multi-Object Tracking in UAV Scenarios | Chenglizhao Chen et.al. | 2511.21053 | null |
| 2025-11-26 | Wavefront-Constrained Passive Obscured Object Detection | Zhiwen Zheng et.al. | 2511.20991 | null |
| 2025-11-26 | RefOnce: Distilling References into a Prototype Memory for Referring Camouflaged Object Detection | Yu-Huan Wu et.al. | 2511.20989 | null |
| 2025-11-25 | Video Object Recognition in Mobile Edge Networks: Local Tracking or Edge Detection? | Kun Guo et.al. | 2511.20716 | null |
| 2025-11-25 | MedROV: Towards Real-Time Open-Vocabulary Detection Across Diverse Medical Imaging Modalities | Tooba Tehreem Sheikh et.al. | 2511.20650 | null |
| 2025-11-25 | Zoo3D: Zero-Shot 3D Object Detection at Scene Level | Andrey Lemeshko et.al. | 2511.20253 | null |
| 2025-11-25 | Intelligent Image Search Algorithms Fusing Visual Large Models | Kehan Wang et.al. | 2511.19920 | null |
| 2025-11-24 | Maritime Small Object Detection from UAVs using Deep Learning with Altitude-Aware Dynamic Tiling | Sakib Ahmed et.al. | 2511.19728 | null |
| 2025-11-24 | Studying Maps at Scale: A Digital Investigation of Cartography and the Evolution of Figuration | Remi Petitpierre et.al. | 2511.19538 | null |
| 2025-11-24 | SAM3-Adapter: Efficient Adaptation of Segment Anything 3 for Camouflage Object Segmentation, Shadow Detection, and Medical Image Segmentation | Tianrun Chen et.al. | 2511.19425 | null |
| 2025-11-24 | IDEAL-M3D: Instance Diversity-Enriched Active Learning for Monocular 3D Detection | Johannes Meier et.al. | 2511.19301 | null |
| 2025-11-24 | SpectraNet: FFT-assisted Deep Learning Classifier for Deepfake Face Detection | Nithira Jayarathne et.al. | 2511.19187 | null |
| 2025-11-24 | MambaRefine-YOLO: A Dual-Modality Small Object Detector for UAV Imagery | Shuyu Cao et.al. | 2511.19134 | null |
| 2025-11-24 | 3M-TI: High-Quality Mobile Thermal Imaging via Calibration-free Multi-Camera Cross-Modal Diffusion | Minchong Chen et.al. | 2511.19117 | null |
| 2025-11-24 | LLMAID: Identifying AI Capabilities in Android Apps with LLMs | Pei Liu et.al. | 2511.19059 | null |
| 2025-11-24 | LAA3D: A Benchmark of Detecting and Tracking Low-Altitude Aircraft in 3D Space | Hai Wu et.al. | 2511.19057 | null |
| 2025-11-24 | Enhancing Fast Radio Transient Detection with Mask R-CNN Image Segmentation | Sergio Belmonte Diaz et.al. | 2511.19014 | null |
| 2025-11-24 | Peregrine: One-Shot Fine-Tuning for FHE Inference of General Deep CNNs | Huaming Ling et.al. | 2511.18976 | null |
| 2025-11-24 | DualGazeNet: A Biologically Inspired Dual-Gaze Query Network for Salient Object Detection | Yu Zhang et.al. | 2511.18865 | null |
| 2025-11-24 | DetAny4D: Detect Anything 4D Temporally in a Streaming RGB Video | Jiawei Hou et.al. | 2511.18814 | null |
| 2025-11-24 | StereoDETR: Stereo-based Transformer for 3D Object Detection | Shiyi Mu et.al. | 2511.18788 | null |
| 2025-11-24 | DriveFlow: Rectified Flow Adaptation for Robust 3D Object Detection in Autonomous Driving | Hongbin Lin et.al. | 2511.18713 | null |
| 2025-11-24 | Dendritic Convolution for Noise Image Recognition | Jiarui Xue et.al. | 2511.18699 | null |
| 2025-11-24 | Multimodal Real-Time Anomaly Detection and Industrial Applications | Aman Verma et.al. | 2511.18698 | null |
| 2025-11-24 | Exploring Surround-View Fisheye Camera 3D Object Detection | Changcai Li et.al. | 2511.18695 | null |
| 2025-11-23 | UniFlow: Towards Zero-Shot LiDAR Scene Flow for Autonomous Vehicles via Cross-Domain Generalization | Siyi Li et.al. | 2511.18254 | null |
| 2025-11-22 | VK-Det: Visual Knowledge Guided Prototype Learning for Open-Vocabulary Aerial Object Detection | Jianhang Yao et.al. | 2511.18075 | null |
| 2025-11-22 | Diverse Instance Generation via Diffusion Models for Enhanced Few-Shot Object Detection in Remote Sensing Images | Yanxing Liu et.al. | 2511.18031 | null |
| 2025-11-22 | State and Scene Enhanced Prototypes for Weakly Supervised Open-Vocabulary Object Detection | Jiaying Zhou et.al. | 2511.18012 | null |
| 2025-11-21 | REXO: Indoor Multi-View Radar Object Detection via 3D Bounding Box Diffusion | Ryoma Yataka et.al. | 2511.17806 | null |
| 2025-11-21 | PUCP-Metrix: An Open-source and Comprehensive Toolkit for Linguistic Analysis of Spanish Texts | Javier Alonso Villegas Luis et.al. | 2511.17402 | null |
| 2025-11-04 | In-Context Adaptation of VLMs for Few-Shot Cell Detection in Optical Microscopy | Shreyan Ganguly et.al. | 2511.05565 | null |
| 2025-11-03 | Compressing Multi-Task Model for Autonomous Driving via Pruning and Knowledge Distillation | Jiayuan Wang et.al. | 2511.05557 | null |
| 2025-11-06 | NovisVQ: A Streaming Convolutional Neural Network for No-Reference Opinion-Unaware Frame Quality Assessment | Kylie Cancilla et.al. | 2511.04628 | null |
| 2025-11-06 | Evaluating the Impact of Weather-Induced Sensor Occlusion on BEVFusion for 3D Object Detection | Sanjay Kumar et.al. | 2511.04347 | null |
| 2025-11-06 | Comparative Study of CNN Architectures for Binary Classification of Horses and Motorcycles in the VOC 2008 Dataset | Muhammad Annas Shaikh et.al. | 2511.04344 | null |
| 2025-11-06 | Deep learning-based object detection of offshore platforms on Sentinel-1 Imagery and the impact of synthetic training data | Robin Spanier et.al. | 2511.04304 | null |
| 2025-11-06 | DMSORT: An efficient parallel maritime multi-object tracking architecture for unmanned vessel platforms | Shengyu Tang et.al. | 2511.04128 | null |
| 2025-11-05 | Desert Waste Detection and Classification Using Data-Based and Model-Based Enhanced YOLOv12 DL Model | Abdulmumin Sa’ad et.al. | 2511.03888 | null |
| 2025-11-05 | ISC-Perception: A Hybrid Computer Vision Dataset for Object Detection in Novel Steel Assembly | Miftahur Rahman et.al. | 2511.03098 | null |
| 2025-11-05 | A Computer Vision Based Proxy for Political Polarization in Religious Countries: A Turkiye Case Study | Liangze Ke et.al. | 2511.03088 | null |
| 2025-11-04 | Diffusion Models are Robust Pretrainers | Mika Yagoda et.al. | 2511.02793 | null |
| 2025-11-04 | DetectiumFire: A Comprehensive Multi-modal Dataset Bridging Vision and Language for Fire Understanding | Zixuan Liu et.al. | 2511.02495 | null |
| 2025-11-04 | Object Detection as an Optional Basis: A Graph Matching Network for Cross-View UAV Localization | Tao Liu et.al. | 2511.02489 | link |
| 2025-11-04 | Facial Expression Recognition System Using DNN Accelerator with Multi-threading on FPGA | Takuto Ando et.al. | 2511.02408 | null |
| 2025-11-04 | 3D Point Cloud Object Detection on Edge Devices for Split Computing | Taisuke Noguchi et.al. | 2511.02293 | null |
| 2025-11-04 | Autobiasing Event Cameras for Flickering Mitigation | Mehdi Sefidgar Dilmaghani et.al. | 2511.02180 | null |
| 2025-11-03 | UniLION: Towards Unified Autonomous Driving Model with Linear Group RNNs | Zhe Liu et.al. | 2511.01768 | link |
| 2025-11-03 | CGF-DETR: Cross-Gated Fusion DETR for Enhanced Pneumonia Detection in Chest X-rays | Yefeng Wu et.al. | 2511.01730 | null |
| 2025-11-03 | Contrast-Guided Cross-Modal Distillation for Thermal Object Detection | SiWoo Kim et.al. | 2511.01435 | null |
| 2025-11-03 | Eyes on Target: Gaze-Aware Object Detection in Egocentric Video | Vishakha Lall et.al. | 2511.01237 | null |
| 2025-11-03 | DEER: Disentangled Mixture of Experts with Instance-Adaptive Routing for Generalizable Machine-Generated Text Detection | Guoxin Ma et.al. | 2511.01192 | null |
| 2025-11-02 | Advancing Machine-Generated Text Detection from an Easy to Hard Supervision Perspective | Chenwang Wu et.al. | 2511.00988 | null |
| 2025-11-02 | A Hybrid YOLOv5-SSD IoT-Based Animal Detection System for Durian Plantation Protection | Anis Suttan Shahrir et.al. | 2511.00777 | null |
| 2025-10-28 | Which LiDAR scanning pattern is better for roadside perception: Repetitive or Non-repetitive? | Zhiqi Qi et.al. | 2511.00060 | null |
| 2025-10-31 | Gaussian Combined Distance: A Generic Metric for Object Detection | Ziqian Guan et.al. | 2510.27649 | null |
| 2025-10-31 | Parameterized Prompt for Incremental Object Detection | Zijia An et.al. | 2510.27316 | null |
| 2025-10-31 | C-LEAD: Contrastive Learning for Enhanced Adversarial Defense | Suklav Ghosh et.al. | 2510.27249 | null |
| 2025-10-31 | M^3Detection: Multi-Frame Multi-Level Feature Fusion for Multi-Modal 3D Object Detection with Camera and 4D Imaging Radar | Xiaozhi Li et.al. | 2510.27166 | null |
| 2025-10-31 | Generating Accurate and Detailed Captions for High-Resolution Images | Hankyeol Lee et.al. | 2510.27164 | null |
| 2025-10-31 | MLPerf Automotive | Radoyeh Shojaei et.al. | 2510.27065 | null |
| 2025-10-30 | Using Salient Object Detection to Identify Manipulative Cookie Banners that Circumvent GDPR | Riley Grossman et.al. | 2510.26967 | null |
| 2025-10-30 | Improving Classification of Occluded Objects through Scene Context | Courtney M. King et.al. | 2510.26681 | null |
| 2025-10-30 | All You Need for Object Detection: From Pixels, Points, and Prompts to Next-Gen Fusion and Multimodal LLMs/VLMs in Autonomous Vehicles | Sayed Pedram Haeri Boroujeni et.al. | 2510.26641 | null |
| 2025-10-30 | PT-DETR: Small Target Detection Based on Partially-Aware Detail Focus | Bingcong Huo et.al. | 2510.26630 | null |
| 2025-10-30 | Spiking Patches: Asynchronous, Sparse, and Efficient Tokens for Event Cameras | Christoffer Koo Øhrstrøm et.al. | 2510.26614 | link |
| 2025-10-30 | Detecting Unauthorized Vehicles using Deep Learning for Smart Cities: A Case Study on Bangladesh | Sudipto Das Sukanto et.al. | 2510.26154 | null |
| 2025-10-29 | Enhancing Underwater Object Detection through Spatio-Temporal Analysis and Spatial Attention Networks | Sai Likhith Karri et.al. | 2510.25797 | null |
| 2025-10-29 | Prototype-Driven Adaptation for Few-Shot Object Detection | Yushen Huang et.al. | 2510.25318 | null |
| 2025-10-29 | GaTector+: A Unified Head-free Framework for Gaze Object and Gaze Following Prediction | Yang Jin et.al. | 2510.25301 | null |
| 2025-10-29 | RT-DETRv4: Painlessly Furthering Real-Time Object Detection with Vision Foundation Models | Zijun Liao et.al. | 2510.25257 | null |
| 2025-10-29 | Test-Time Adaptive Object Detection with Foundation Model | Yingjie Gao et.al. | 2510.25175 | null |
| 2025-10-29 | DINO-YOLO: Self-Supervised Pre-training for Data-Efficient Object Detection in Civil Engineering Applications | Malaisree P et.al. | 2510.25140 | null |
| 2025-10-28 | Pixels to Signals: A Real-Time Framework for Traffic Demand Estimation | H Mhatre et.al. | 2510.24902 | null |
| 2025-10-28 | MIC-BEV: Multi-Infrastructure Camera Bird’s-Eye-View Transformer with Relation-Aware Fusion for 3D Object Detection | Yun Zhang et.al. | 2510.24688 | link |
| 2025-10-28 | A Critical Study towards the Detection of Parkinsons Disease using ML Technologies | Vivek Chetia et.al. | 2510.24456 | null |
| 2025-10-28 | Delving into Cascaded Instability: A Lipschitz Continuity View on Image Restoration and Object Detection Synergy | Qing Zhao et.al. | 2510.24232 | null |
| 2025-10-28 | Mars-Bench: A Benchmark for Evaluating Foundation Models for Mars Science Tasks | Mirali Purohit et.al. | 2510.24010 | null |
| 2025-10-27 | A U-Net and Transformer Pipeline for Multilingual Image Translation | Siddharth Sahay et.al. | 2510.23554 | null |
| 2025-10-27 | FRBNet: Revisiting Low-Light Vision through Frequency-Domain Radial Basis Network | Fangtong Sun et.al. | 2510.23444 | null |
| 2025-10-27 | One-Timestep is Enough: Achieving High-performance ANN-to-SNN Conversion via Scale-and-Fire Neurons | Qiuyang Chen et.al. | 2510.23383 | null |
| 2025-10-27 | Spoofing resilience for simple-detection quantum illumination LIDAR | Richard J. Murchie et.al. | 2510.23228 | null |
| 2025-10-27 | AG-Fusion: adaptive gated multimodal fusion for 3d object detection in complex scenes | Sixian Liu et.al. | 2510.23151 | null |
| 2025-10-27 | DQ3D: Depth-guided Query for Transformer-Based 3D Object Detection in Traffic Scenarios | Ziyu Wang et.al. | 2510.23144 | null |
| 2025-10-27 | M$^{3}$T2IBench: A Large-Scale Multi-Category, Multi-Instance, Multi-Relation Text-to-Image Benchmark | Huixuan Zhang et.al. | 2510.23020 | null |
| 2025-10-26 | A Comprehensive Dataset for Human vs. AI Generated Text Detection | Rajarshi Roy et.al. | 2510.22874 | null |
| 2025-10-26 | A Critical Study on Tea Leaf Disease Detection using Deep Learning Techniques | Nabajyoti Borah et.al. | 2510.22647 | null |
| 2025-10-25 | 3D Roadway Scene Object Detection with LIDARs in Snowfall Conditions | Ghazal Farhani et.al. | 2510.22436 | null |
| 2025-10-25 | TrajGATFormer: A Graph-Based Transformer Approach for Worker and Obstacle Trajectory Prediction in Off-site Construction Environments | Mohammed Alduais et.al. | 2510.22205 | null |
| 2025-10-21 | Comparative Analysis of Object Detection Algorithms for Surface Defect Detection | Arpan Maity et.al. | 2510.21811 | null |
| 2025-10-24 | On Thin Ice: Towards Explainable Conservation Monitoring via Attribution and Perturbations | Jiayi Zhou et.al. | 2510.21689 | null |
| 2025-10-24 | S3OD: Towards Generalizable Salient Object Detection with Synthetic Data | Orest Kupyn et.al. | 2510.21605 | null |
| 2025-10-24 | Scalpel: Automotive Deep Learning Framework Testing via Assembling Model Components | Yinglong Zou et.al. | 2510.21451 | null |
| 2025-10-24 | Unveiling the Spatial-temporal Effective Receptive Fields of Spiking Neural Networks | Jieyuan Zhang et.al. | 2510.21403 | null |
| 2025-10-24 | WhaleVAD-BPN: Improving Baleen Whale Call Detection with Boundary Proposal Networks and Post-processing Optimisation | Christiaan M. Geldenhuys et.al. | 2510.21280 | null |
| 2025-10-23 | BioDet: Boosting Industrial Object Detection with Image Preprocessing Strategies | Jiaqi Hu et.al. | 2510.21000 | null |
| 2025-10-23 | BUSTED at AraGenEval Shared Task: A Comparative Study of Transformer-Based Models for Arabic AI-Generated Text Detection | Ali Zain et.al. | 2510.20610 | null |
| 2025-10-23 | Synthetic Data for Robust Runway Detection | Estelle Chigot et.al. | 2510.20349 | null |
| 2025-10-23 | Physics-Guided Fusion for Robust 3D Tracking of Fast Moving Small Objects | Prithvi Raj Singh et.al. | 2510.20126 | null |
| 2025-10-22 | A Unified Detection Pipeline for Robust Object Detection in Fisheye-Based Traffic Surveillance | Neema Jakisa Owor et.al. | 2510.20016 | null |
| 2025-10-22 | Can You Trust What You See? Alpha Channel No-Box Attacks on Video Object Detection | Ariana Yi et.al. | 2510.19574 | null |
| 2025-10-22 | Machine Text Detectors are Membership Inference Attacks | Ryuto Koike et.al. | 2510.19492 | link |
| 2025-10-22 | Towards Single-Source Domain Generalized Object Detection via Causal Visual Prompts | Chen Li et.al. | 2510.19487 | null |
| 2025-10-22 | Space Object Detection using Multi-frame Temporal Trajectory Completion Method | Xiaoqing Lan et.al. | 2510.19220 | null |
| 2025-10-22 | SFGFusion: Surface Fitting Guided 3D Object Detection with 4D Radar and Camera Fusion | Xiaozhi Li et.al. | 2510.19215 | null |
| 2025-10-21 | Kinematic Analysis and Integration of Vision Algorithms for a Mobile Manipulator Employed Inside a Self-Driving Laboratory | Shifa Sulaiman et.al. | 2510.19081 | null |
| 2025-10-21 | GBlobs: Local LiDAR Geometry for Improved Sensor Placement Generalization | Dušan Malić et.al. | 2510.18539 | null |
| 2025-10-21 | DWaste: Greener AI for Waste Sorting using Mobile and Edge Devices | Suman Kunwar et.al. | 2510.18513 | null |
| 2025-10-21 | Beyond Single Images: Retrieval Self-Augmented Unsupervised Camouflaged Object Detection | Ji Du et.al. | 2510.18437 | null |
| 2025-10-21 | ScaleNet: Scaling up Pretrained Neural Networks with Incremental Parameters | Zhiwei Hao et.al. | 2510.18431 | null |
| 2025-10-21 | Beyond Frequency: Scoring-Driven Debiasing for Object Detection via Blueprint-Prompted Image Synthesis | Xinhao Cai et.al. | 2510.18229 | null |
| 2025-10-20 | Accelerating Vision Transformers with Adaptive Patch Sizes | Rohan Choudhury et.al. | 2510.18091 | link |
| 2025-10-20 | Big Data, Tiny Targets: An Exploratory Study in Machine Learning-enhanced Detection of Microplastic from Filters | Paul-Tiberiu Miclea et.al. | 2510.18089 | null |
| 2025-10-15 | MUSE: Model-based Uncertainty-aware Similarity Estimation for zero-shot 2D Object Detection and Segmentation | Sungmin Cho et.al. | 2510.17866 | null |
| 2025-10-20 | Towards 3D Objectness Learning in an Open World | Taichi Liu et.al. | 2510.17686 | null |
| 2025-10-20 | On-the-Fly OVD Adaptation with FLAME: Few-shot Localization via Active Marginal-Samples Exploration | Yehonathan Refael et.al. | 2510.17670 | null |
| 2025-10-20 | DETree: DEtecting Human-AI Collaborative Texts via Tree-Structured Hierarchical Representation Learning | Yongxin He et.al. | 2510.17489 | link |
| 2025-10-20 | Split-Fuse-Transport: Annotation-Free Saliency via Dual Clustering and Optimal Transport Alignment | Muhammad Umer Ramzan et.al. | 2510.17484 | null |
| 2025-10-20 | Monitoring Horses in Stalls: From Object to Event Detection | Dmitrii Galimzianov et.al. | 2510.17409 | null |
| 2025-10-20 | Machine Vision-Based Surgical Lighting System: Design and Implementation | Amir Gharghabi et.al. | 2510.17287 | null |
| 2025-10-20 | Investigating Adversarial Robustness against Preprocessing used in Blackbox Face Recognition | Roland Croft et.al. | 2510.17169 | null |
| 2025-10-20 | Towards a Generalizable Fusion Architecture for Multimodal Object Detection | Jad Berjawi et.al. | 2510.17078 | null |
| 2025-10-19 | ArmFormer: Lightweight Transformer Architecture for Real-Time Multi-Class Weapon Segmentation and Classification | Akhila Kambhatla et.al. | 2510.16854 | null |
| 2025-10-18 | Towards Intelligent Traffic Signaling in Dhaka City Based on Vehicle Detection and Congestion Optimization | Kazi Ababil Azam et.al. | 2510.16622 | null |
| 2025-10-18 | AI-Generated Text Detection in Low-Resource Languages: A Case Study on Urdu | Muhammad Ammar et.al. | 2510.16573 | null |
| 2025-10-18 | ReviewGuard: Enhancing Deficient Peer Review Detection via LLM-Driven Data Augmentation | Haoxuan Zhang et.al. | 2510.16549 | null |
| 2025-10-18 | OOS-DSD: Improving Out-of-stock Detection in Retail Images using Auxiliary Tasks | Franko Šikić et.al. | 2510.16508 | null |
| 2025-10-18 | Enhancing Rotated Object Detection via Anisotropic Gaussian Bounding Box and Bhattacharyya Distance | Chien Thai et.al. | 2510.16445 | null |
| 2025-10-17 | Seeing Through the Brain: New Insights from Decoding Visual Stimuli with fMRI | Zheng Huang et.al. | 2510.16196 | null |
| 2025-10-17 | ObjectTransforms for Uncertainty Quantification and Reduction in Vision-Based Perception for Autonomous Vehicles | Nishad Sahu et.al. | 2510.16118 | null |
| 2025-10-17 | StripRFNet: A Strip Receptive Field and Shape-Aware Network for Road Damage Detection | Jianhan Lin et.al. | 2510.16115 | null |
| 2025-10-17 | LightsOut: Diffusion-based Outpainting for Enhanced Lens Flare Removal | Shr-Ruei Tsai et.al. | 2510.15868 | link |
| 2025-10-17 | ReCon: Region-Controllable Data Augmentation with Rectification and Alignment for Object Detection | Haowei Zhu et.al. | 2510.15783 | null |
| 2025-10-17 | Valeo Near-Field: a novel dataset for pedestrian intent detection | Antonyo Musabini et.al. | 2510.15673 | null |
| 2025-10-17 | FreqPDE: Rethinking Positional Depth Embedding for Multi-View 3D Object Detection Transformers | Haisheng Su et.al. | 2510.15385 | null |
| 2025-10-17 | Symmetric Entropy-Constrained Video Coding for Machines | Yuxiao Sun et.al. | 2510.15347 | null |
| 2025-10-16 | MOBIUS: Big-to-Mobile Universal Instance Segmentation via Multi-modal Bottleneck Fusion and Calibrated Decoder Pruning | Mattia Segu et.al. | 2510.15026 | null |
| 2025-10-16 | EdgeNavMamba: Mamba Optimized Object Detection for Energy Efficient Edge Devices | Romina Aalishah et.al. | 2510.14946 | null |
| 2025-10-16 | VLA^2: Empowering Vision-Language-Action Models with an Agentic Framework for Unseen Concept Manipulation | Han Zhao et.al. | 2510.14902 | link |
| 2025-10-16 | CoT-PL: Visual Chain-of-Thought Reasoning Meets Pseudo-Labeling for Open-Vocabulary Object Detection | Hojun Choi et.al. | 2510.14792 | null |
| 2025-10-16 | Cross-Layer Feature Self-Attention Module for Multi-Scale Object Detection | Dingzhou Xie et.al. | 2510.14726 | null |
| 2025-10-16 | Structured Universal Adversarial Attacks on Object Detection for Video Sequences | Sven Jacob et.al. | 2510.14460 | null |
| 2025-10-16 | Beat Tracking as Object Detection | Jaehoon Ahn et.al. | 2510.14391 | null |
| 2025-10-15 | How Sampling Affects the Detectability of Machine-written texts: A Comprehensive Study | Matthieu Dubois et.al. | 2510.13681 | null |
| 2025-10-15 | A Modular Object Detection System for Humanoid Robots Using YOLO | Nicolas Pottier et.al. | 2510.13625 | null |
| 2025-10-15 | Fusion Meets Diverse Conditions: A High-diversity Benchmark and Baseline for UAV-based Multimodal Object Detection with Condition Cues | Chen Chen et.al. | 2510.13620 | null |
| 2025-10-15 | Automated document processing system for government agencies using DBNET++ and BART models | Aya Kaysan Bahjat et.al. | 2510.13303 | null |
| 2025-10-15 | LLM one-shot style transfer for Authorship Attribution and Verification | Pablo Miralles-González et.al. | 2510.13302 | null |
| 2025-10-15 | What “Not” to Detect: Negation-Aware VLMs via Structured Reasoning and Token Merging | Inha Kang et.al. | 2510.13232 | null |
| 2025-10-15 | An Analytical Framework to Enhance Autonomous Vehicle Perception for Smart Cities | Jalal Khan et.al. | 2510.13230 | null |
| 2025-10-14 | Detect Anything via Next Point Prediction | Qing Jiang et.al. | 2510.12798 | link |
| 2025-10-14 | StyleDecipher: Robust and Explainable Detection of LLM-Generated Texts with Stylistic Analysis | Siyuan Li et.al. | 2510.12608 | null |
| 2025-10-14 | WaterFlow: Explicit Physics-Prior Rectified Flow for Underwater Saliency Mask Generation | Runting Li et.al. | 2510.12605 | null |
| 2025-10-14 | When Personalization Tricks Detectors: The Feature-Inversion Trap in Machine-Generated Text Detection | Lang Gao et.al. | 2510.12476 | null |
| 2025-10-14 | The Impact of Synthetic Data on Object Detection Model Performance: A Comparative Analysis with Real-World Data | Muammer Bay et.al. | 2510.12208 | null |
| 2025-10-14 | SpikePool: Event-driven Spiking Transformer with Pooling Attention | Donghyun Lee et.al. | 2510.12102 | null |
| 2025-10-14 | APGNet: Adaptive Prior-Guided for Underwater Camouflaged Object Detection | Xinxin Huang et.al. | 2510.12056 | null |
| 2025-10-13 | NV3D: Leveraging Spatial Shape Through Normal Vector-based 3D Object Detection | Krittin Chaowakarn et.al. | 2510.11632 | null |
| 2025-10-13 | Enhancing Maritime Domain Awareness on Inland Waterways: A YOLO-Based Fusion of Satellite and AIS for Vessel Characterization | Geoffery Agorku et.al. | 2510.11449 | null |
| 2025-10-13 | A Modular AIoT Framework for Low-Latency Real-Time Robotic Teleoperation in Smart Cities | Shih-Chieh Sun et.al. | 2510.11421 | null |
| 2025-10-13 | REACT3D: Recovering Articulations for Interactive Physical 3D Scenes | Zhao Huang et.al. | 2510.11340 | null |
| 2025-10-13 | When Does Supervised Training Pay Off? The Hidden Economics of Object Detection in the Era of Vision-Language Models | Samer Al-Hamadani et.al. | 2510.11302 | null |
| 2025-10-13 | A Large-Language-Model Assisted Automated Scale Bar Detection and Extraction Framework for Scanning Electron Microscopic Images | Yuxuan Chen et.al. | 2510.11260 | null |
| 2025-10-13 | Source-Free Object Detection with Detection Transformer | Huizai Yao et.al. | 2510.11090 | null |
| 2025-10-13 | Slitless Spectroscopy Source Detection Using YOLO Deep Neural Network | Xiaohan Chen et.al. | 2510.10922 | null |
| 2025-10-12 | EGD-YOLO: A Lightweight Multimodal Framework for Robust Drone-Bird Discrimination via Ghost-Enhanced YOLOv8n and EMA Attention under Adverse Condition | Sudipto Sarkar et.al. | 2510.10765 | null |
| 2025-10-12 | MRS-YOLO Railroad Transmission Line Foreign Object Detection Based on Improved YOLO11 and Channel Pruning | Siyuan Liu et.al. | 2510.10553 | null |
| 2025-10-12 | Risk-Budgeted Control Framework for Balanced Performance and Safety in Autonomous Vehicles | Pei Yu Chang et.al. | 2510.10442 | null |
| 2025-10-11 | Ordinal Scale Traffic Congestion Classification with Multi-Modal Vision-Language and Motion Analysis | Yu-Hsuan Lin et.al. | 2510.10342 | null |
| 2025-10-11 | Bridging Perspectives: Foundation Model Guided BEV Maps for 3D Object Detection and Tracking | Markus Käppeler et.al. | 2510.10287 | null |
| 2025-10-11 | MRI Brain Tumor Detection with Computer Vision | Jack Krolik et.al. | 2510.10250 | null |
| 2025-10-11 | BurstDeflicker: A Benchmark Dataset for Flicker Removal in Dynamic Scenes | Lishen Qu et.al. | 2510.09996 | null |
| 2025-10-10 | SpectralCA: Bi-Directional Cross-Attention for Next-Generation UAV Hyperspectral Vision | D. V. Brovko et.al. | 2510.09912 | null |
| 2025-10-06 | Ultralytics YOLO Evolution: An Overview of YOLO26, YOLO11, YOLOv8 and YOLOv5 Object Detectors for Computer Vision and Pattern Recognition | Ranjan Sapkota et.al. | 2510.09653 | null |
| 2025-10-10 | FSP-DETR: Few-Shot Prototypical Parasitic Ova Detection | Shubham Trehan et.al. | 2510.09583 | null |
| 2025-10-10 | PRNet: Original Information Is All You Have | PeiHuang Zheng et.al. | 2510.09531 | null |
| 2025-10-10 | Utilizing dynamic sparsity on pretrained DETR | Reza Sedghi et.al. | 2510.09380 | null |
| 2025-10-10 | TARO: Toward Semantically Rich Open-World Object Detection | Yuchen Zhang et.al. | 2510.09173 | null |
| 2025-10-10 | SOS: Synthetic Object Segments Improve Detection, Segmentation, and Grounding | Weikai Huang et.al. | 2510.09110 | null |
| 2025-10-09 | Re-Identifying Kākā with AI-Automated Video Key Frame Extraction | Paula Maddigan et.al. | 2510.08775 | null |
| 2025-10-03 | Beyond CNNs: Efficient Fine-Tuning of Multi-Modal LLMs for Object Detection on Low-Data Regimes | Nirmal Elamon et.al. | 2510.08589 | null |
| 2025-10-09 | A Multimodal Depth-Aware Method For Embodied Reference Understanding | Fevziye Irem Eyiokur et.al. | 2510.08278 | null |
| 2025-10-09 | RayFusion: Ray Fusion Enhanced Collaborative Visual Perception | Shaohong Wang et.al. | 2510.08017 | null |
| 2025-10-09 | A Large-scale Dataset for Robust Complex Anime Scene Text Detection | Ziyi Dong et.al. | 2510.07951 | null |
| 2025-10-08 | Robust Measurement of Stellar Streams Around the Milky Way: Correcting Spatially Variable Observational Selection Effects in Optical Imaging Surveys | K. Boone et.al. | 2510.07511 | null |
| 2025-10-08 | A million-solar-mass object detected at cosmological distance using gravitational imaging | D. M. Powell et.al. | 2510.07382 | null |
| 2025-10-08 | Inconsistent Affective Reaction: Sentiment of Perception and Opinion in Urban Environments | Jingfei Huang et.al. | 2510.07359 | null |
| 2025-10-07 | Enhancing Maritime Object Detection in Real-Time with RT-DETR and Data Augmentation | Nader Nemati et.al. | 2510.07346 | null |
| 2025-10-08 | Explaining raw data complexity to improve satellite onboard processing | Adrien Dorise et.al. | 2510.06858 | null |
| 2025-10-08 | Extreme Amodal Face Detection | Changlin Song et.al. | 2510.06791 | null |
| 2025-10-08 | SDQM: Synthetic Data Quality Metric for Object Detection Dataset Evaluation | Ayush Zenith et.al. | 2510.06596 | link |
| 2025-10-08 | Adaptive Stain Normalization for Cross-Domain Medical Histology | Tianyue Xu et.al. | 2510.06592 | null |
| 2025-10-06 | General and Efficient Visual Goal-Conditioned Reinforcement Learning using Object-Agnostic Masks | Fahim Shahriar et.al. | 2510.06277 | null |
| 2025-10-06 | Comparative Analysis of YOLOv5, Faster R-CNN, SSD, and RetinaNet for Motorbike Detection in Kigali Autonomous Driving Context | Ngeyen Yinkfu et.al. | 2510.04912 | null |
| 2025-10-06 | CLEAR-IR: Clarity-Enhanced Active Reconstruction of Infrared Imagery | Nathan Shankar et.al. | 2510.04883 | null |
| 2025-10-06 | SPEGNet: Synergistic Perception-Guided Network for Camouflaged Object Detection | Baber Jan et.al. | 2510.04472 | link |
| 2025-10-04 | From Filters to VLMs: Benchmarking Defogging Methods through Object Detection and Segmentation Performance | Ardalan Aryashad et.al. | 2510.03906 | null |
| 2025-10-04 | Cross-View Open-Vocabulary Object Detection in Aerial Imagery | Jyoti Kini et.al. | 2510.03858 | null |
| 2025-10-04 | Person-Centric Annotations of LAION-400M: Auditing Bias and Its Transfer to Models | Leander Girrbach et.al. | 2510.03721 | null |
| 2025-10-04 | SAMSOD: Rethinking SAM Optimization for RGB-T Salient Object Detection | Zhengyi Liu et.al. | 2510.03689 | null |
| 2025-10-03 | ALHD: A Large-Scale and Multigenre Benchmark Dataset for Arabic LLM-Generated Text Detection | Ali Khairallah et.al. | 2510.03502 | null |
| 2025-10-03 | Visual Language Model as a Judge for Object Detection in Industrial Diagrams | Sanjukta Ghosh et.al. | 2510.03376 | null |
| 2025-10-03 | Neural Posterior Estimation with Autoregressive Tiling for Detecting Objects in Astronomical Images | Jeffrey Regier et.al. | 2510.03074 | null |
| 2025-10-03 | Align Your Query: Representation Alignment for Multimodality Medical Object Detection | Ara Seo et.al. | 2510.02789 | null |
| 2025-10-02 | Multimodal Large Language Model Framework for Safe and Interpretable Grid-Integrated EVs | Jean Douglas Carvalho et.al. | 2510.02592 | null |
| 2025-10-02 | Clink! Chop! Thud! – Learning Object Sounds from Real-World Interactions | Mengyu Yang et.al. | 2510.02313 | null |
| 2025-10-02 | kabr-tools: Automated Framework for Multi-Species Behavioral Monitoring | Jenna Kline et.al. | 2510.02030 | link |
| 2025-10-02 | Automated Defect Detection for Mass-Produced Electronic Components Based on YOLO Object Detection Models | Wei-Lung Mao et.al. | 2510.01914 | null |
| 2025-10-02 | Calibrating the Full Predictive Class Distribution of 3D Object Detectors for Autonomous Driving | Cornelius Schröder et.al. | 2510.01829 | null |
| 2025-10-01 | Breaking the Code: Security Assessment of AI Code Agents Through Systematic Jailbreaking Attacks | Shoumik Saha et.al. | 2510.01359 | null |
| 2025-10-01 | Span-level Detection of AI-generated Scientific Text via Contrastive Learning and Structural Calibration | Zhen Yin et.al. | 2510.00890 | null |
| 2025-10-01 | Adaptive Event Stream Slicing for Open-Vocabulary Event-Based Object Detection via Vision-Language Knowledge Distillation | Jinchang Zhang et.al. | 2510.00681 | null |
| 2025-10-01 | Forestpest-YOLO: A High-Performance Detection Framework for Small Forestry Pests | Aoduo Li et.al. | 2510.00547 | null |
| 2025-09-30 | Looking Beyond the Known: Towards a Data Discovery Guided Open-World Object Detection | Anay Majee et.al. | 2510.00303 | null |
| 2025-09-30 | Neural Network-Based Single-Carrier Joint Communication and Sensing: Loss Design, Constellation Shaping and Precoding | Charlotte Muth et.al. | 2509.26508 | null |
| 2025-09-30 | Point2RBox-v3: Self-Bootstrapping from Point Annotations via Integrated Pseudo-Label Refinement and Utilization | Teng Zhang et.al. | 2509.26281 | null |
| 2025-09-30 | Beyond Overall Accuracy: Pose- and Occlusion-driven Fairness Analysis in Pedestrian Detection for Autonomous Driving | Mohammad Khoshkdahan et.al. | 2509.26166 | null |
| 2025-09-30 | Towards Continual Expansion of Data Coverage: Automatic Text-guided Edge-case Synthesis | Kyeongryeol Go et.al. | 2509.26158 | null |
| 2025-09-30 | Predicting Penalty Kick Direction Using Multi-Modal Deep Learning with Pose-Guided Attention | Pasindu Ranasinghe et.al. | 2509.26088 | null |
| 2025-09-30 | Geometric Learning of Canonical Parameterizations of $2D$-curves | Ioana Ciuclea et.al. | 2509.26070 | null |
| 2025-09-30 | CEAID: Benchmark of Multilingual Machine-Generated Text Detection Methods for Central European Languages | Dominik Macko et.al. | 2509.26051 | null |
| 2025-09-30 | Adapting SAM with Dynamic Similarity Graphs for Few-Shot Parameter-Efficient Small Dense Object Detection: A Case Study of Chickpea Pods in Field Conditions | Xintong Jiang et.al. | 2509.25805 | null |
| 2025-09-29 | AttentionViG: Cross-Attention-Based Dynamic Neighbor Aggregation in Vision GNNs | Hakan Emre Gedik et.al. | 2509.25570 | null |
| 2025-09-29 | Infrastructure Sensor-enabled Vehicle Data Generation using Multi-Sensor Fusion for Proactive Safety Applications at Work Zone | Suhala Rabab Saba et.al. | 2509.25452 | null |
| 2025-09-29 | YOLO26: Key Architectural Enhancements and Performance Benchmarking for Real-Time Object Detection | Ranjan Sapkota et.al. | 2509.25164 | null |
| 2025-09-29 | Who’s Your Judge? On the Detectability of LLM-Generated Judgments | Dawei Li et.al. | 2509.25154 | null |
| 2025-09-29 | Accelerating Dynamic Image Graph Construction on FPGA for Vision GNNs | Anvitha Ramachandran et.al. | 2509.25121 | null |
| 2025-09-29 | GeoVLM-R1: Reinforcement Fine-Tuning for Improved Remote Sensing Reasoning | Mustansar Fiaz et.al. | 2509.25026 | null |
| 2025-09-29 | Comprehensive Benchmarking of YOLOv11 Architectures for Scalable and Granular Peripheral Blood Cell Detection | Mohamad Abou Ali et.al. | 2509.24595 | null |
| 2025-09-29 | Talk in Pieces, See in Whole: Disentangling and Hierarchical Aggregating Representations for Language-based Object Detection | Sojung An et.al. | 2509.24192 | null |
| 2025-09-28 | Bridging the Task Gap: Multi-Task Adversarial Transferability in CLIP and Its Derivatives | Kuanrong Liu et.al. | 2509.23917 | null |
| 2025-09-28 | Learning Adaptive Pseudo-Label Selection for Semi-Supervised 3D Object Detection | Taehun Kong et.al. | 2509.23880 | null |
| 2025-09-28 | A Multi-Camera Vision-Based Approach for Fine-Grained Assembly Quality Control | Ali Nazeri et.al. | 2509.23815 | null |
| 2025-09-28 | Diff-3DCap: Shape Captioning with Diffusion Models | Zhenyu Shu et.al. | 2509.23718 | null |
| 2025-09-27 | On the Impact of LiDAR Point Cloud Compression on Remote Semantic Segmentation | Tiago de S. Fernandes et.al. | 2509.23341 | null |
| 2025-09-27 | C3-OWD: A Curriculum Cross-modal Contrastive Learning Framework for Open-World Detection | Siheng Wang et.al. | 2509.23316 | null |
| 2025-09-27 | FMC-DETR: Frequency-Decoupled Multi-Domain Coordination for Aerial-View Object Detection | Ben Liang et.al. | 2509.23056 | null |
| 2025-09-26 | SSVIF: Self-Supervised Segmentation-Oriented Visible and Infrared Image Fusion | Zixian Zhao et.al. | 2509.22450 | null |
| 2025-09-26 | $γ$-Quant: Towards Learnable Quantization for Low-bit Pattern Recognition | Mishal Fatima et.al. | 2509.22448 | null |
| 2025-09-26 | HierLight-YOLO: A Hierarchical and Lightweight Object Detection Network for UAV Photography | Defan Chen et.al. | 2509.22365 | null |
| 2025-09-26 | Mixture of Detectors: A Compact View of Machine-Generated Text Detection | Sai Teja Lekkala et.al. | 2509.22147 | null |
| 2025-09-07 | S-LAM3D: Segmentation-Guided Monocular 3D Object Detection via Feature Space Fusion | Diana-Alexandra Sas et.al. | 2509.05999 | null |
| 2025-07-31 | 3D-MOOD: Lifting 2D to 3D for Monocular Open-Set Object Detection | Yung-Hsu Yang et.al. | 2507.23567 | link |
| 2025-07-24 | Protecting Vulnerable Voices: Synthetic Dataset Generation for Self-Disclosure Detection | Shalini Jangra et.al. | 2507.22930 | null |
| 2025-07-25 | Bias Analysis for Synthetic Face Detection: A Case Study of the Impact of Facial Attributes | Asmae Lamsaf et.al. | 2507.19705 | null |
| 2025-07-25 | Co-Win: Joint Object Detection and Instance Segmentation in LiDAR Point Clouds via Collaborative Window Processing | Haichuan Li et.al. | 2507.19691 | null |
| 2025-07-25 | An OpenSource CI/CD Pipeline for Variant-Rich Software-Defined Vehicles | Matthias Weiß et.al. | 2507.19446 | null |
| 2025-07-25 | EffiComm: Bandwidth Efficient Multi Agent Communication | Melih Yazgan et.al. | 2507.19354 | null |
| 2025-07-25 | Multistream Network for LiDAR and Camera-based 3D Object Detection in Outdoor Scenes | Muhammad Ibrahim et.al. | 2507.19304 | null |
| 2025-07-25 | Cross Spatial Temporal Fusion Attention for Remote Sensing Object Detection via Image Feature Matching | Abu Sadat Mohammad Salehin Amit et.al. | 2507.19118 | null |
| 2025-07-25 | Revisiting DETR for Small Object Detection via Noise-Resilient Query Optimization | Xiaocheng Fang et.al. | 2507.19059 | null |
| 2025-07-25 | YOLO for Knowledge Extraction from Vehicle Images: A Baseline Study | Saraa Al-Saddik et.al. | 2507.18966 | null |
| 2025-07-25 | WiSE-OD: Benchmarking Robustness in Infrared Object Detection | Heitor R. Medeiros et.al. | 2507.18925 | null |
| 2025-07-25 | Synthetic-to-Real Camouflaged Object Detection | Zhihao Luo et.al. | 2507.18911 | null |
| 2025-07-24 | Towards Large Scale Geostatistical Methane Monitoring with Part-based Object Detection | Adhemar de Senneville et.al. | 2507.18513 | null |
| 2025-07-24 | Human Scanpath Prediction in Target-Present Visual Search with Semantic-Foveal Bayesian Attention | João Luzio et.al. | 2507.18503 | null |
| 2025-07-24 | A COCO-Formatted Instance-Level Dataset for Plasmodium Falciparum Detection in Giemsa-Stained Blood Smears | Frauke Wilm et.al. | 2507.18483 | null |
| 2025-07-24 | Revisiting Physically Realizable Adversarial Object Attack against LiDAR-based Detection: Clarifying Problem Formulation and Experimental Protocols | Luo Cheng et.al. | 2507.18457 | null |
| 2025-07-24 | Boosting Multi-View Indoor 3D Object Detection via Adaptive 3D Volume Construction | Runmin Zhang et.al. | 2507.18331 | link |
| 2025-07-24 | LMM-Det: Make Large Multimodal Models Excel in Object Detection | Jincheng Li et.al. | 2507.18300 | link |
| 2025-07-24 | Evaluation of facial landmark localization performance in a surgical setting | Ines Frajtag et.al. | 2507.18248 | null |
| 2025-07-24 | Real-Time Object Detection and Classification using YOLO for Edge FPGAs | Rashed Al Amin et.al. | 2507.18174 | null |
| 2025-07-24 | WaveMamba: Wavelet-Driven Mamba Fusion for RGB-Infrared Object Detection | Haodong Zhu et.al. | 2507.18173 | null |
| 2025-07-24 | OpenNav: Open-World Navigation with Multimodal Large Language Models | Mingfeng Yuan et.al. | 2507.18033 | null |
| 2025-07-23 | Bearded Dragon Activity Recognition Pipeline: An AI-Based Approach to Behavioural Monitoring | Arsen Yermukan et.al. | 2507.17987 | null |
| 2025-07-23 | FishDet-M: A Unified Large-Scale Benchmark for Robust Fish Detection and CLIP-Guided Model Selection in Diverse Aquatic Visual Domains | Muayad Abujabal et.al. | 2507.17859 | null |
| 2025-07-23 | Perspective-Invariant 3D Object Detection | Ao Liang et.al. | 2507.17665 | null |
| 2025-07-23 | Constructing Ophthalmic MLLM for Positioning-diagnosis Collaboration Through Clinical Cognitive Chain Reasoning | Xinyao Liu et.al. | 2507.17539 | link |
| 2025-07-23 | Illicit object detection in X-ray imaging using deep learning techniques: A comparative evaluation | Jorgen Cani et.al. | 2507.17508 | link |
| 2025-07-23 | Dynamic-DINO: Fine-Grained Mixture of Experts Tuning for Real-time Open-Vocabulary Object Detection | Yehao Lu et.al. | 2507.17436 | null |
| 2025-07-23 | SFUOD: Source-Free Unknown Object Detection | Keon-Hee Park et.al. | 2507.17373 | null |
| 2025-07-23 | Optimizing Delivery Logistics: Enhancing Speed and Safety with Drone Technology | Maharshi Shastri et.al. | 2507.17253 | null |
| 2025-07-23 | A Low-Cost Machine Learning Approach for Timber Diameter Estimation | Fatemeh Hasanzadeh Fard et.al. | 2507.17219 | null |
| 2025-07-22 | Few-Shot Learning in Video and 3D Object Detection: A Survey | Md Meftahul Ferdaus et.al. | 2507.17079 | null |
| 2025-07-22 | Transformer Based Building Boundary Reconstruction using Attraction Field Maps | Muhammad Kamran et.al. | 2507.17038 | null |
| 2025-07-22 | Divisive Decisions: Improving Salience-Based Training for Generalization in Binary Classification Tasks | Jacob Piland et.al. | 2507.17000 | null |
| 2025-07-22 | Task-Specific Zero-shot Quantization-Aware Training for Object Detection | Changhao Li et.al. | 2507.16782 | link |
| 2025-07-22 | Screen2AX: Vision-Based Approach for Automatic macOS Accessibility Generation | Viktor Muryn et.al. | 2507.16704 | null |
| 2025-07-22 | QRetinex-Net: Quaternion-Valued Retinex Decomposition for Low-Level Computer Vision Applications | Sos Agaian et.al. | 2507.16683 | null |
| 2025-07-22 | Benchmarking pig detection and tracking under diverse and challenging conditions | Jonathan Henrich et.al. | 2507.16639 | null |
| 2025-07-22 | A2Mamba: Attention-augmented State Space Models for Visual Recognition | Meng Lou et.al. | 2507.16624 | link |
| 2025-07-22 | PlantSAM: An Object Detection-Driven Segmentation Pipeline for Herbarium Specimens | Youcef Sklab et.al. | 2507.16506 | null |
| 2025-07-22 | Towards Railway Domain Adaptation for LiDAR-based 3D Detection: Road-to-Rail and Sim-to-Real via SynDRA-BBox | Xavier Diaz et.al. | 2507.16413 | null |
| 2025-07-22 | Scene Text Detection and Recognition “in light of” Challenging Environmental Conditions using Aria Glasses Egocentric Vision Cameras | Joseph De Mathia et.al. | 2507.16330 | null |
| 2025-07-22 | MAN++: Scaling Momentum Auxiliary Network for Supervised Local Learning in Vision Tasks | Junhao Su et.al. | 2507.16279 | null |
| 2025-07-22 | Edge-case Synthesis for Fisheye Object Detection: A Data-centric Perspective | Seunghyeon Kim et.al. | 2507.16254 | null |
| 2025-07-21 | Experimenting active and sequential learning in a medieval music manuscript | Sachin Sharma et.al. | 2507.15633 | null |
| 2025-07-21 | Few-Shot Object Detection via Spatial-Channel State Space Model | Zhimeng Xin et.al. | 2507.15308 | null |
| 2025-07-21 | Beyond Easy Wins: A Text Hardness-Aware Benchmark for LLM-generated Text Detection | Navid Ayoobi et.al. | 2507.15286 | null |
| 2025-07-20 | Event-based Graph Representation with Spatial and Motion Vectors for Asynchronous Object Detection | Aayush Atul Verma et.al. | 2507.15150 | null |
| 2025-07-20 | BleedOrigin: Dynamic Bleeding Source Localization in Endoscopic Submucosal Dissection via Dual-Stage Detection and Tracking | Mengya Xu et.al. | 2507.15094 | null |
| 2025-07-20 | InsightX Agent: An LMM-based Agentic Framework with Integrated Tools for Reliable X-ray NDT Analysis | Jiale Liu et.al. | 2507.14899 | null |
| 2025-07-20 | An Uncertainty-aware DETR Enhancement Framework for Object Detection | Xingshu Chen et.al. | 2507.14855 | null |
| 2025-07-20 | Seeing Through Deepfakes: A Human-Inspired Framework for Multi-Face Detection | Juan Hu et.al. | 2507.14807 | null |
| 2025-07-19 | GCC-Spam: Spam Detection via GAN, Contrastive Learning, and Character Similarity Networks | Zixin Xu et.al. | 2507.14679 | null |
| 2025-07-19 | Multispectral State-Space Feature Fusion: Bridging Shared and Cross-Parametric Interactions for Object Detection | Jifeng Shen et.al. | 2507.14643 | null |
| 2025-07-18 | C-DOG: Training-Free Multi-View Multi-Object Association in Dense Scenes Without Visual Feature via Connected δ-Overlap Graphs | Yung-Hong Sun et.al. | 2507.14095 | null |
| 2025-07-18 | Enhancing LiDAR Point Features with Foundation Model Priors for 3D Object Detection | Yujian Mo et.al. | 2507.13899 | null |
| 2025-07-18 | Moving Object Detection from Moving Camera Using Focus of Expansion Likelihood and Segmentation | Masahiro Ogawa et.al. | 2507.13628 | null |
| 2025-07-17 | NSF-DOE Vera C. Rubin Observatory Observations of Interstellar Comet 3I/ATLAS (C/2025 N1) | Colin Orion Chandler et.al. | 2507.13409 | null |
| 2025-07-17 | A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains | Antonio Finocchiaro et.al. | 2507.13326 | null |
| 2025-07-17 | RS-TinyNet: Stage-wise Feature Fusion Network for Detecting Tiny Objects in Remote Sensing Images | Xiaozheng Jiang et.al. | 2507.13120 | null |
| 2025-07-17 | Decoupled PROB: Decoupled Query Initialization Tasks and Objectness-Class Learning for Open World Object Detection | Riku Inoue et.al. | 2507.13085 | null |
| 2025-07-17 | Dual LiDAR-Based Traffic Movement Count Estimation at a Signalized Intersection: Deployment, Data Collection, and Preliminary Analysis | Saswat Priyadarshi Nayak et.al. | 2507.13073 | null |
| 2025-07-17 | SOD-YOLO: Enhancing YOLO-Based Detection of Small Objects in UAV Imagery | Peijun Wang et.al. | 2507.12727 | null |
| 2025-07-16 | Vision-based Perception for Autonomous Vehicles in Obstacle Avoidance Scenarios | Van-Hoang-Anh Phan et.al. | 2507.12449 | null |
| 2025-07-16 | InterpIoU: Rethinking Bounding Box Regression with Interpolation-Based IoU Optimization | Haoyuan Liu et.al. | 2507.12420 | null |
| 2025-07-16 | AutoVDC: Automated Vision Data Cleaning Using Vision-Language Models | Santosh Vasa et.al. | 2507.12414 | null |
| 2025-07-16 | OD-VIRAT: A Large-Scale Benchmark for Object Detection in Realistic Surveillance Environments | Hayat Ullah et.al. | 2507.12396 | null |
| 2025-07-16 | Improving Lightweight Weed Detection via Knowledge Distillation | Ahmet Oğuz Saltık et.al. | 2507.12344 | null |
| 2025-07-16 | SS-DC: Spatial-Spectral Decoupling and Coupling Across Visible-Infrared Gap for Domain Adaptive Object Detection | Xiwei Zhang et.al. | 2507.12017 | null |
| 2025-07-16 | Frequency-Dynamic Attention Modulation for Dense Prediction | Linwei Chen et.al. | 2507.12006 | null |
| 2025-07-15 | Tomato Multi-Angle Multi-Pose Dataset for Fine-Grained Phenotyping | Yujie Zhang et.al. | 2507.11279 | null |
| 2025-07-15 | Using Continual Learning for Real-Time Detection of Vulnerable Road Users in Complex Traffic Scenarios | Faryal Aurooj Nasir et.al. | 2507.11046 | null |
| 2025-07-15 | Combining Transformers and CNNs for Efficient Object Detection in High-Resolution Satellite Imagery | Nicolas Drapier et.al. | 2507.11040 | null |
| 2025-07-14 | A Lightweight and Robust Framework for Real-Time Colorectal Polyp Detection Using LOF-Based Preprocessing and YOLO-v11n | Saadat Behzadi et.al. | 2507.10864 | null |
| 2025-07-14 | LLM-Guided Agentic Object Detection for Open-World Understanding | Furkan Mumcu et.al. | 2507.10844 | null |
| 2025-07-14 | Versatile and Generalizable Manipulation via Goal-Conditioned Reinforcement Learning with Grounded Object Detection | Huiyi Wang et.al. | 2507.10814 | null |
| 2025-07-14 | Fine-Grained Zero-Shot Object Detection | Hongxu Ma et.al. | 2507.10358 | null |
| 2025-07-14 | BlueGlass: A Framework for Composite AI Safety | Harshal Nandigramwar et.al. | 2507.10106 | null |
| 2025-07-14 | SRG/ART-XC All-Sky X-ray Survey: Sensitivity Assessment Based on Aperture Photometry | N. Y. Tyrin et.al. | 2507.10060 | null |
| 2025-07-14 | 3DGAA: Realistic and Robust 3D Gaussian-based Adversarial Attack for Autonomous Driving | Yixun Zhang et.al. | 2507.09993 | null |
| 2025-07-14 | Measuring the Impact of Rotation Equivariance on Aerial Object Detection | Xiuyu Wu et.al. | 2507.09896 | null |
| 2025-07-14 | Secure and Efficient UAV-Based Face Detection via Homomorphic Encryption and Edge Computing | Nguyen Van Duc et.al. | 2507.09860 | null |
| 2025-07-13 | MLoRQ: Bridging Low-Rank and Quantization for Transformer Compression | Ofir Gordon et.al. | 2507.09616 | null |
| 2025-07-12 | Stereo-based 3D Anomaly Object Detection for Autonomous Driving: A New Dataset and Baseline | Shiyi Mu et.al. | 2507.09214 | link |
| 2025-07-12 | On the Fragility of Multimodal Perception to Temporal Misalignment in Autonomous Driving | Md Hasan Shahriar et.al. | 2507.09095 | null |
| 2025-07-11 | VISTA: A Visual Analytics Framework to Enhance Foundation Model-Generated Data Labels | Xiwei Xuan et.al. | 2507.09008 | null |
| 2025-07-11 | RoundaboutHD: High-Resolution Real-World Urban Environment Benchmark for Multi-Camera Vehicle Tracking | Yuqiang Lin et.al. | 2507.08729 | null |
| 2025-07-11 | DatasetAgent: A Novel Multi-Agent System for Auto-Constructing Datasets from Real-World Images | Haoran Sun et.al. | 2507.08648 | null |
| 2025-07-11 | OnlineBEV: Recurrent Temporal Fusion in Bird’s Eye View Representations for Multi-Camera 3D Perception | Junho Koh et.al. | 2507.08644 | null |
| 2025-07-11 | Smelly, dense, and spreaded: The Object Detection for Olfactory References (ODOR) dataset | Mathias Zinnen et.al. | 2507.08384 | null |
| 2025-07-11 | Spectroscopic Observations of Four Candidates for Blue Large-Amplitude Pulsators. No BLAPs at High Galactic Latitudes | P. Pietrukowicz et.al. | 2507.08372 | null |
| 2025-07-11 | Understanding Driving Risks using Large Language Models: Toward Elderly Driver Assessment | Yuki Yoshihara et.al. | 2507.08367 | null |
| 2025-07-10 | An Embedded Real-time Object Alert System for Visually Impaired: A Monocular Depth Estimation based Approach through Computer Vision | Jareen Anjom et.al. | 2507.08165 | null |
| 2025-07-10 | Rainbow Artifacts from Electromagnetic Signal Injection Attacks on Image Sensors | Youqian Zhang et.al. | 2507.07773 | null |
| 2025-07-09 | Automated Video Segmentation Machine Learning Pipeline | Johannes Merz et.al. | 2507.07242 | null |
| 2025-07-09 | Aerial Maritime Vessel Detection and Identification | Antonella Barisic Kulas et.al. | 2507.07153 | null |
| 2025-07-09 | DenoiseCP-Net: Efficient Collective Perception in Adverse Weather via Joint LiDAR-Based 3D Object Detection and Denoising | Sven Teufel et.al. | 2507.06976 | null |
| 2025-07-09 | A multi-modal dataset for insect biodiversity with imagery and DNA at the trap and individual level | Johanna Orsholm et.al. | 2507.06972 | null |
| 2025-07-09 | Dataset and Benchmark for Enhancing Critical Retained Foreign Object Detection | Yuli Wang et.al. | 2507.06937 | null |
| 2025-07-09 | Unlocking Thermal Aerial Imaging: Synthetic Enhancement of UAV Datasets | Antonella Barisic Kulas et.al. | 2507.06797 | null |
| 2025-07-09 | LOVON: Legged Open-Vocabulary Object Navigator | Daojie Peng et.al. | 2507.06747 | null |
| 2025-07-09 | EA: An Event Autoencoder for High-Speed Vision Sensing | Riadul Islam et.al. | 2507.06459 | null |
| 2025-07-08 | Hierarchical Multi-Stage Transformer Architecture for Context-Aware Temporal Action Localization | Hayat Ullah et.al. | 2507.06411 | null |
| 2025-07-08 | ECORE: Energy-Conscious Optimized Routing for Deep Learning Models at the Edge | Daghash K. Alqahtani et.al. | 2507.06011 | null |
| 2025-07-08 | R-VLM: Region-Aware Vision Language Model for Precise GUI Grounding | Joonhyung Park et.al. | 2507.05673 | null |
| 2025-07-07 | YOLO-APD: Enhancing YOLOv8 for Robust Pedestrian Detection on Complex Road Geometries | Aquino Joctum et.al. | 2507.05376 | null |
| 2025-07-07 | From a Different Star: 3I/ATLAS in the context of the Ōtautahi-Oxford interstellar object population model | Matthew J. Hopkins et.al. | 2507.05318 | null |
| 2025-07-07 | Beyond One Shot, Beyond One Perspective: Cross-View and Long-Horizon Distillation for Better LiDAR Representations | Xiang Xu et.al. | 2507.05260 | null |
| 2025-07-07 | AI Generated Text Detection Using Instruction Fine-tuned Large Language and Transformer-Based Models | Chinnappa Guggilla et.al. | 2507.05157 | null |
| 2025-07-07 | LERa: Replanning with Visual Feedback in Instruction Following | Svyatoslav Pchelintsev et.al. | 2507.05135 | null |
| 2025-07-07 | Robustifying 3D Perception through Least-Squares Multi-Agent Graphs Object Tracking | Maria Damanaki et.al. | 2507.04762 | null |
| 2025-07-07 | CVFusion: Cross-View Fusion of 4D Radar and Camera for 3D Object Detection | Hanzhi Zhong et.al. | 2507.04587 | null |
| 2025-07-06 | MambaFusion: Height-Fidelity Dense Global Fusion for Multi-modal 3D Object Detection | Hanshi Wang et.al. | 2507.04369 | null |
| 2025-07-06 | DMAT: An End-to-End Framework for Joint Atmospheric Turbulence Mitigation and Object Detection | Paul Hill et.al. | 2507.04323 | null |
| 2025-07-06 | ZERO: Multi-modal Prompt-based Visual Grounding | Sangbum Choi et.al. | 2507.04270 | null |
| 2025-07-05 | Towards Accurate and Efficient 3D Object Detection for Autonomous Driving: A Mixture of Experts Computing System on Edge | Linshen Liu et.al. | 2507.04123 | null |
| 2025-07-04 | Zero Memory Overhead Approach for Protecting Vision Transformer Parameters | Fereshteh Baradaran et.al. | 2507.03816 | null |
| 2025-07-03 | Partial Weakly-Supervised Oriented Object Detection | Mingxin Liu et.al. | 2507.02751 | null |
| 2025-07-03 | Automatic Labelling for Low-Light Pedestrian Detection | Dimitrios Bouzoulas et.al. | 2507.02513 | null |
| 2025-07-03 | Weakly-supervised Contrastive Learning with Quantity Prompts for Moving Infrared Small Target Detection | Weiwei Duan et.al. | 2507.02454 | null |
| 2025-07-03 | A Late Collaborative Perception Framework for 3D Multi-Object and Multi-Source Association and Fusion | Maryem Fadili et.al. | 2507.02430 | null |
| 2025-07-03 | PLOT: Pseudo-Labeling via Video Object Tracking for Scalable Monocular 3D Object Detection | Seokyeong Lee et.al. | 2507.02393 | null |
| 2025-07-03 | Two-Steps Neural Networks for an Automated Cerebrovascular Landmark Detection | Rafic Nader et.al. | 2507.02349 | null |
| 2025-07-03 | Perception Activator: An intuitive and portable framework for brain cognitive exploration | Le Xu et.al. | 2507.02311 | null |
| 2025-07-03 | Understanding Trade offs When Conditioning Synthetic Data | Brandon Trabucco et.al. | 2507.02217 | null |
| 2025-07-02 | How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks | Rahul Ramachandran et.al. | 2507.01955 | link |
| 2025-07-02 | Survivability of Backdoor Attacks on Unconstrained Face Recognition Systems | Quentin Le Roux et.al. | 2507.01607 | null |
| 2025-07-02 | Learning from Random Subspace Exploration: Generalized Test-Time Augmentation with Self-supervised Distillation | Andrei Jelea et.al. | 2507.01347 | null |
| 2025-07-01 | Rapid Salient Object Detection with Difference Convolutional Neural Networks | Zhuo Su et.al. | 2507.01182 | null |
| 2025-07-01 | Robust Component Detection for Flexible Manufacturing: A Deep Learning Approach to Tray-Free Object Recognition under Variable Lighting | Fatemeh Sadat Daneshmand et.al. | 2507.00852 | null |
| 2025-07-01 | UAVD-Mamba: Deformable Token Fusion Vision Mamba for Multimodal UAV Detection | Wei Li et.al. | 2507.00849 | null |
| 2025-07-01 | High-Frequency Semantics and Geometric Priors for End-to-End Detection Transformers in Challenging UAV Imagery | Hongxing Peng et.al. | 2507.00825 | null |
| 2025-07-01 | Multi-Modal Graph Convolutional Network with Sinusoidal Encoding for Robust Human Action Segmentation | Hao Xing et.al. | 2507.00752 | null |
| 2025-07-01 | UPRE: Zero-Shot Domain Adaptation for Object Detection via Unified Prompt and Representation Enhancement | Xiao Zhang et.al. | 2507.00721 | null |
| 2025-07-01 | Rectifying Magnitude Neglect in Linear Attention | Qihang Fan et.al. | 2507.00698 | link |
| 2025-06-30 | Continual Adaptation: Environment-Conditional Parameter Generation for Object Detection in Dynamic Scenarios | Deng Li et.al. | 2506.24063 | null |
| 2025-06-30 | Visual Textualization for Image Prompted Object Detection | Yongjian Wu et.al. | 2506.23785 | null |
| 2025-06-30 | PBCAT: Patch-based composite adversarial training against physically realizable attacks on object detection | Xiao Li et.al. | 2506.23581 | null |
| 2025-06-30 | Event-based Tiny Object Detection: A Benchmark Dataset and Baseline | Nuo Chen et.al. | 2506.23575 | null |
| 2025-06-30 | OcRFDet: Object-Centric Radiance Fields for Multi-View 3D Object Detection in Autonomous Driving | Mingqian Ji et.al. | 2506.23565 | null |
| 2025-06-30 | From Sight to Insight: Unleashing Eye-Tracking in Weakly Supervised Video Salient Object Detection | Qi Qin et.al. | 2506.23519 | null |
| 2025-06-30 | Improve Underwater Object Detection through YOLOv12 Architecture and Physics-informed Augmentation | Tinh Nguyen et.al. | 2506.23505 | null |
| 2025-06-29 | Detecting What Matters: A Novel Approach for Out-of-Distribution 3D Object Detection in Autonomous Vehicles | Menna Taha et.al. | 2506.23426 | null |
| 2025-06-29 | Layer Decomposition and Morphological Reconstruction for Task-Oriented Infrared Image Enhancement | Siyuan Chai et.al. | 2506.23353 | null |
| 2025-06-29 | GeoProg3D: Compositional Visual Reasoning for City-Scale 3D Language Fields | Shunsuke Yasuki et.al. | 2506.23352 | null |
| 2025-06-27 | Attention-disentangled Uniform Orthogonal Feature Space Optimization for Few-shot Object Detection | Taijin Zhao et.al. | 2506.22161 | null |
| 2025-06-27 | Evaluating Pointing Gestures for Target Selection in Human-Robot Collaboration | Noora Sassali et.al. | 2506.22116 | null |
| 2025-06-27 | CERBERUS: Crack Evaluation & Recognition Benchmark for Engineering Reliability & Urban Stability | Justin Reinman et.al. | 2506.21909 | null |
| 2025-06-27 | Visual Content Detection in Educational Videos with Transfer Learning and Dataset Enrichment | Dipayan Biswas et.al. | 2506.21903 | null |
| 2025-06-27 | Embodied Domain Adaptation for Object Detection | Xiangyu Shi et.al. | 2506.21860 | null |
| 2025-06-26 | PhotonSplat: 3D Scene Reconstruction and Colorization from SPAD Sensors | Sai Sri Teja et.al. | 2506.21680 | null |
| 2025-06-26 | Towards Reliable Detection of Empty Space: Conditional Marked Point Processes for Object Detection | Tobias J. Riedlinger et.al. | 2506.21486 | null |
| 2025-06-26 | TITAN: Query-Token based Domain Adaptive Adversarial Learning | Tajamul Ashraf et.al. | 2506.21484 | null |
| 2025-06-26 | A Comprehensive Dataset for Underground Miner Detection in Diverse Scenario | Cyrus Addy et.al. | 2506.21451 | null |
| 2025-06-26 | DuET: Dual Incremental Object Detection via Exemplar-Free Task Arithmetic | Munish Monga et.al. | 2506.21260 | null |
| 2025-06-26 | LASFNet: A Lightweight Attention-Guided Self-Modulation Feature Fusion Network for Multimodal Object Detection | Lei Hao et.al. | 2506.21018 | null |
| 2025-06-26 | ThermalDiffusion: Visual-to-Thermal Image-to-Image Translation for Autonomous Navigation | Shruti Bansal et.al. | 2506.20969 | null |
| 2025-06-25 | Lightweight Multi-Frame Integration for Robust YOLO Object Detection in Videos | Yitong Quan et.al. | 2506.20550 | null |
| 2025-06-25 | Learning-based safety lifting monitoring system for cranes on construction sites | Hao Chen et.al. | 2506.20475 | null |
| 2025-06-25 | Feature Hallucination for Self-supervised Action Recognition | Lei Wang et.al. | 2506.20342 | null |
| 2025-06-25 | From Codicology to Code: A Comparative Study of Transformer and YOLO-based Detectors for Layout Analysis in Historical Documents | Sergio Torres Aguilar et.al. | 2506.20326 | null |
| 2025-06-25 | TDiR: Transformer based Diffusion for Image Restoration Tasks | Abbas Anwar et.al. | 2506.20302 | null |
| 2025-06-25 | Integrated optomechanical ultrasonic sensors with nano-Pascal-level sensitivity | Xuening Cao et.al. | 2506.20219 | null |
| 2025-06-24 | A Survey of Multi-sensor Fusion Perception for Embodied AI: Background, Methods, Challenges and Prospects | Shulan Ruan et.al. | 2506.19769 | null |
| 2025-06-24 | Semantic Scene Graph for Ultrasound Image Explanation and Scanning Guidance | Xuesong Li et.al. | 2506.19683 | null |
| 2025-06-24 | Probabilistic modelling and safety assurance of an agriculture robot providing light-treatment | Mustafa Adam et.al. | 2506.19620 | null |
| 2025-06-24 | USIS16K: High-Quality Dataset for Underwater Salient Instance Segmentation | Lin Hong et.al. | 2506.19472 | null |
| 2025-06-23 | SpaNN: Detecting Multiple Adversarial Patches on CNNs by Spanning Saliency Thresholds | Mauricio Byrd Victorica et.al. | 2506.18591 | null |
| 2025-06-23 | Improvement on LiDAR-Camera Calibration Using Square Targets | Zhongyuan Li et.al. | 2506.18294 | null |
| 2025-06-23 | Learning Approach to Efficient Vision-based Active Tracking of a Flying Target by an Unmanned Aerial Vehicle | Jagadeswara PKV Pothuri et.al. | 2506.18264 | null |
| 2025-06-23 | Ground tracking for improved landmine detection in a GPR system | Li Tang et.al. | 2506.18258 | null |
| 2025-06-24 | Referring Expression Instance Retrieval and A Strong End-to-End Baseline | Xiangzhao Hao et.al. | 2506.18246 | null |
| 2025-06-24 | Unfolding the Past: A Comprehensive Deep Learning Approach to Analyzing Incunabula Pages | Klaudia Ropel et.al. | 2506.18069 | null |
| 2025-06-21 | YOLOv13: Real-Time Object Detection with Hypergraph-Enhanced Adaptive Visual Perception | Mengqi Lei et.al. | 2506.17733 | link |
| 2025-06-21 | CSDN: A Context-Gated Self-Adaptive Detection Network for Real-Time Object Detection | Wei Haolin et.al. | 2506.17679 | null |
| 2025-06-21 | DRAMA-X: A Fine-grained Intent Prediction and Risk Reasoning Benchmark For Driving | Mihir Godbole et.al. | 2506.17590 | null |
| 2025-06-20 | YASMOT: Yet another stereo image multi-object tracker | Ketil Malde et.al. | 2506.17186 | link |
| 2025-06-20 | Class Agnostic Instance-level Descriptor for Visual Instance Search | Qi-Ying Sun et.al. | 2506.16745 | null |
| 2025-06-20 | Cross-modal Offset-guided Dynamic Alignment and Fusion for Weakly Aligned UAV Object Detection | Liu Zongzhen et.al. | 2506.16737 | null |
| 2025-06-19 | How Hard Is Snow? A Paired Domain Adaptation Dataset for Clear and Snowy Weather: CADC+ | Mei Qi Tang et.al. | 2506.16531 | null |
| 2025-06-19 | Can AI Dream of Unseen Galaxies? Conditional Diffusion Model for Galaxy Morphology Augmentation | Chenrui Ma et.al. | 2506.16233 | null |
| 2025-06-19 | VideoGAN-based Trajectory Proposal for Automated Vehicles | Annajoyce Mariani et.al. | 2506.16209 | null |
| 2025-06-19 | BLADE: An Automated Framework for Classifying Light Curves from the Center for Near-Earth Object Studies (CNEOS) Fireball Database | Elizabeth A. Silber et.al. | 2506.16099 | null |
| 2025-06-19 | Polyline Path Masked Attention for Vision Transformer | Zhongchen Zhao et.al. | 2506.15940 | link |
| 2025-06-18 | PhantomHunter: Detecting Unseen Privately-Tuned LLM-Generated Text via Family-Aware Learning | Yuhui Shi et.al. | 2506.15683 | null |
| 2025-06-18 | BoxFusion: Reconstruction-Free Open-Vocabulary 3D Object Detection via Real-Time Multi-View Box Fusion | Yuqing Lan et.al. | 2506.15610 | null |
| 2025-06-18 | Retrospective Memory for Camouflaged Object Detection | Chenxi Zhang et.al. | 2506.15244 | null |
| 2025-06-18 | Fiber Signal Denoising Algorithm using Hybrid Deep Learning Networks | Linlin Wang et.al. | 2506.15125 | null |
| 2025-06-19 | Efficient Retail Video Annotation: A Robust Key Frame Generation Approach for Product and Customer Interaction Analysis | Varun Mannam et.al. | 2506.14854 | null |
| 2025-06-18 | YOLOv11-RGBT: Towards a Comprehensive Single-Stage Multispectral Object Detection Framework | Dahang Wan et.al. | 2506.14696 | null |
| 2025-06-17 | VisText-Mosquito: A Multimodal Dataset and Benchmark for AI-Based Mosquito Breeding Site Detection and Reasoning | Md. Adnanul Islam et.al. | 2506.14629 | link |
| 2025-06-17 | GAMORA: A Gesture Articulated Meta Operative Robotic Arm for Hazardous Material Handling in Containment-Level Environments | Farha Abdul Wasay et.al. | 2506.14513 | null |
| 2025-06-17 | Comparison of Two Methods for Stationary Incident Detection Based on Background Image | Deepak Ghimire et.al. | 2506.14256 | null |
| 2025-06-16 | A Point Cloud Completion Approach for the Grasping of Partially Occluded Objects and Its Applications in Robotic Strawberry Harvesting | Ali Abouzeid et.al. | 2506.14066 | link |
| 2025-06-16 | FindMeIfYouCan: Bringing Open Set metrics to $\textit{near}$, $\textit{far}$ and $\textit{farther}$ Out-of-Distribution Object Detection | Daniel Montoya et.al. | 2506.14008 | null |
| 2025-06-16 | How Real is CARLAs Dynamic Vision Sensor? A Study on the Sim-to-Real Gap in Traffic Object Detection | Kaiyuan Tan et.al. | 2506.13722 | null |
| 2025-06-17 | Lecture Video Visual Objects (LVVO) Dataset: A Benchmark for Visual Object Detection in Educational Videos | Dipayan Biswas et.al. | 2506.13657 | link |
| 2025-06-16 | UAV Object Detection and Positioning in a Mining Industrial Metaverse with Custom Geo-Referenced Data | Vasiliki Balaska et.al. | 2506.13505 | null |
| 2025-06-16 | Sparse Convolutional Recurrent Learning for Efficient Event-based Neuromorphic Object Detection | Shenqi Wang et.al. | 2506.13440 | null |
| 2025-06-16 | Cognitive Synergy Architecture: SEGO for Human-Centric Collaborative Robots | Jaehong Oh et.al. | 2506.13149 | null |
| 2025-06-15 | MGDFIS: Multi-scale Global-detail Feature Integration Strategy for Small Object Detection | Yuxiang Wang et.al. | 2506.12697 | null |
| 2025-06-14 | UniDet-D: A Unified Dynamic Spectral Attention Model for Object Detection under Adverse Weathers | Yuantao Wang et.al. | 2506.12324 | null |
| 2025-06-14 | MatchPlant: An Open-Source Pipeline for UAV-Based Single-Plant Detection and Data Extraction | Worasit Sangjan et.al. | 2506.12295 | link |
| 2025-06-13 | Vision-based Lifting of 2D Object Detections for Automated Driving | Hendrik Königshof et.al. | 2506.11839 | null |
| 2025-06-13 | Teleoperated Driving: a New Challenge for 3D Object Detection in Compressed Point Clouds | Filippo Bragato et.al. | 2506.11804 | null |
| 2025-06-13 | GPLQ: A General, Practical, and Lightning QAT Method for Vision Transformers | Guang Liang et.al. | 2506.11784 | null |
| 2025-06-13 | On the Natural Robustness of Vision-Language Models Against Visual Perception Attacks in Autonomous Driving | Pedram MohajerAnsari et.al. | 2506.11472 | null |
| 2025-06-12 | Teaching in adverse scenes: a statistically feedback-driven threshold and mask adjustment teacher-student framework for object detection in UAV images under adverse scenes | Hongyu Chen et.al. | 2506.11175 | null |
| 2025-06-12 | Discrete Lorenz Attractors in 3D Sinusoidal Maps | Sishu Shankar Muni et.al. | 2506.10788 | null |
| 2025-06-12 | Uncertainty-Masked Bernoulli Diffusion for Camouflaged Object Detection Refinement | Yuqi Shen et.al. | 2506.10712 | null |
| 2025-06-12 | Semantic-decoupled Spatial Partition Guided Point-supervised Oriented Object Detection | Xinyuan Liu et.al. | 2506.10601 | link |
| 2025-06-12 | Improving Medical Visual Representation Learning with Pathological-level Cross-Modal Alignment and Correlation Exploration | Jun Wang et.al. | 2506.10573 | null |
| 2025-06-12 | FSATFusion: Frequency-Spatial Attention Transformer for Infrared and Visible Image Fusion | Tianpei Zhang et.al. | 2506.10366 | link |
| 2025-06-11 | DySS: Dynamic Queries and State-Space Learning for Efficient 3D Object Detection from Multi-Camera Videos | Rajeev Yasarla et.al. | 2506.10242 | null |
| 2025-06-11 | CEM-FBGTinyDet: Context-Enhanced Foreground Balance with Gradient Tuning for tiny Objects | Tao Liu et.al. | 2506.09897 | null |
| 2025-06-11 | 3DGeoDet: General-purpose Geometry-aware Image-based 3D Object Detection | Yi Zhang et.al. | 2506.09541 | null |
| 2025-06-11 | MSSDF: Modality-Shared Self-supervised Distillation for High-Resolution Multi-modal Remote Sensing Image Learning | Tong Wang et.al. | 2506.09327 | null |
| 2025-06-10 | Efficient Edge Deployment of Quantized YOLOv4-Tiny for Aerial Emergency Object Detection on Raspberry Pi 5 | Sindhu Boddu et.al. | 2506.09300 | null |
| 2025-06-10 | Lightweight Object Detection Using Quantized YOLOv4-Tiny for Emergency Response in Aerial Imagery | Sindhu Boddu et.al. | 2506.09299 | null |
| 2025-06-10 | WD-DETR: Wavelet Denoising-Enhanced Real-Time Object Detection Transformer for Robot Perception with Event Cameras | Yangjie Cui et.al. | 2506.09098 | null |
| 2025-06-11 | Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models | Xuanchi Ren et.al. | 2506.09042 | null |
| 2025-06-10 | ADAM: Autonomous Discovery and Annotation Model using LLMs for Context-Aware Annotations | Amirreza Rouhi et.al. | 2506.08968 | null |
| 2025-06-10 | Data Augmentation For Small Object using Fast AutoAugment | DaeEun Yoon et.al. | 2506.08956 | null |
| 2025-06-11 | Gaussian2Scene: 3D Scene Representation Learning via Self-supervised Learning with 3D Gaussian Splatting | Keyi Liu et.al. | 2506.08777 | null |
| 2025-06-09 | CrosswalkNet: An Optimized Deep Learning Framework for Pedestrian Crosswalk Detection in Aerial Images with High-Performance Computing | Zubin Bhuyan et.al. | 2506.07885 | null |
| 2025-06-09 | SAM2Auto: Auto Annotation Using FLASH | Arash Rocky et.al. | 2506.07850 | null |
| 2025-06-09 | Design and Evaluation of Deep Learning-Based Dual-Spectrum Image Fusion Methods | Beining Xu et.al. | 2506.07779 | null |
| 2025-06-09 | SpikeSMOKE: Spiking Neural Networks for Monocular 3D Object Detection with Cross-Scale Gated Coding | Xuemei Chen et.al. | 2506.07737 | null |
| 2025-06-09 | Domain Randomization for Object Detection in Manufacturing Applications using Synthetic Data: A Comprehensive Study | Xiaomeng Zhu et.al. | 2506.07539 | null |
| 2025-06-09 | SpatialLM: Training Large Language Models for Structured Indoor Modeling | Yongsen Mao et.al. | 2506.07491 | link |
| 2025-06-09 | Happiness Finder: Exploring the Role of AI in Enhancing Well-Being During Four-Leaf Clover Searches | Anna Yokokubo et.al. | 2506.07393 | null |
| 2025-06-09 | Multiple Object Stitching for Unsupervised Representation Learning | Chengchao Shen et.al. | 2506.07364 | link |
| 2025-06-09 | CBAM-STN-TPS-YOLO: Enhancing Agricultural Object Detection through Spatially Adaptive Attention Mechanisms | Satvik Praveen et.al. | 2506.07357 | null |
| 2025-06-08 | UCOD-DPL: Unsupervised Camouflaged Object Detection via Dynamic Pseudo-label Learning | Weiqi Yan et.al. | 2506.07087 | null |
| 2025-06-06 | Domain-RAG: Retrieval-Guided Compositional Image Generation for Cross-Domain Few-Shot Object Detection | Yu Li et.al. | 2506.05872 | null |
| 2025-06-06 | Token Transforming: A Unified and Training-Free Token Compression Framework for Vision Transformer Acceleration | Fanhu Zeng et.al. | 2506.05709 | null |
| 2025-06-06 | Integer Binary-Range Alignment Neuron for Spiking Neural Networks | Binghao Ye et.al. | 2506.05679 | null |
| 2025-06-05 | CL-ISR: A Contrastive Learning and Implicit Stance Reasoning Framework for Misleading Text Detection on Social Media | Tianyi Huang et.al. | 2506.05107 | null |
| 2025-06-05 | Synthetic Dataset Generation for Autonomous Mobile Robots Using 3D Gaussian Splatting for Vision Training | Aneesh Deogan et.al. | 2506.05092 | null |
| 2025-06-06 | Bridging Annotation Gaps: Transferring Labels to Align Object Detection Datasets | Mikhail Kennerley et.al. | 2506.04737 | null |
| 2025-06-05 | Gen-n-Val: Agentic Image Data Generation and Validation | Jing-En Huang et.al. | 2506.04676 | null |
| 2025-06-05 | VoxDet: Rethinking 3D Semantic Occupancy Prediction as Dense Object Detection | Wuyang Li et.al. | 2506.04623 | null |
| 2025-06-04 | FALO: Fast and Accurate LiDAR 3D Object Detection on Resource-Constrained Devices | Shizhong Han et.al. | 2506.04499 | null |
| 2025-06-04 | Neural Object Detection for 4D STEM: High-Throughput Sub-Pixel Electron Diffraction Pattern Recognition | Arda Genc et.al. | 2506.04477 | null |
| 2025-06-04 | Diffusion Domain Teacher: Diffusion Guided Domain Adaptive Object Detector | Boyong He et.al. | 2506.04211 | link |
| 2025-06-04 | FSHNet: Fully Sparse Hybrid Network for 3D Object Detection | Shuai Liu et.al. | 2506.03714 | null |
| 2025-06-04 | How PARTs assemble into wholes: Learning the relative composition of images | Melika Ayoughi et.al. | 2506.03682 | null |
| 2025-06-05 | MambaNeXt-YOLO: A Hybrid State Space Model for Real-time Object Detection | Xiaochun Lei et.al. | 2506.03654 | null |
| 2025-06-04 | DiagNet: Detecting Objects using Diagonal Constraints on Adjacency Matrix of Graph Neural Network | Chong Hyun Lee et.al. | 2506.03571 | null |
| 2025-06-03 | SportMamba: Adaptive Non-Linear Multi-Object Tracking with State Space Models for Team Sports | Dheeraj Khanna et.al. | 2506.03335 | null |
| 2025-06-03 | Simulate Any Radar: Attribute-Controllable Radar Simulation via Waveform Parameter Embedding | Weiqing Xiao et.al. | 2506.03134 | null |
| 2025-06-03 | HACo-Det: A Study Towards Fine-Grained Machine-Generated Text Detection under Human-AI Coauthoring | Zhixiong Su et.al. | 2506.02959 | null |
| 2025-06-03 | Towards Auto-Annotation from Annotation Guidelines: A Benchmark through 3D LiDAR Detection | Yechi Ma et.al. | 2506.02914 | null |
| 2025-06-03 | A Dynamic Transformer Network for Vehicle Detection | Chunwei Tian et.al. | 2506.02765 | null |
| 2025-06-03 | Open-PMC-18M: A High-Fidelity Large Scale Medical Dataset for Multimodal Representation Learning | Negin Baghbanzadeh et.al. | 2506.02738 | null |
| 2025-06-03 | GeneA-SLAM2: Dynamic SLAM with AutoEncoder-Preprocessed Genetic Keypoints Resampling and Depth Variance-Guided Dynamic Region Removal | Shufan Qing et.al. | 2506.02736 | link |
| 2025-06-03 | Sight Guide: A Wearable Assistive Perception and Navigation System for the Vision Assistance Race in the Cybathlon 2024 | Patrick Pfreundschuh et.al. | 2506.02676 | null |
| 2025-06-03 | Probabilistic Online Event Downsampling | Andreu Girbau-Xalabarder et.al. | 2506.02547 | null |
| 2025-06-03 | Efficient Test-time Adaptive Object Detection via Sensitivity-Guided Pruning | Kunyu Wang et.al. | 2506.02462 | null |
| 2025-06-03 | Auto-Labeling Data for Object Detection | Brent A. Griffin et.al. | 2506.02359 | null |
| 2025-05-30 | Stress-testing Machine Generated Text Detection: Shifting Language Models Writing Style to Fool Detectors | Andrea Pedrotti et.al. | 2505.24523 | link |
| 2025-05-30 | Deformable Attention Mechanisms Applied to Object Detection, case of Remote Sensing | Anasse Boutayeb et.al. | 2505.24489 | null |
| 2025-05-30 | Leadership Assessment in Pediatric Intensive Care Unit Team Training | Liangyang Ouyang et.al. | 2505.24389 | null |
| 2025-05-30 | D2AF: A Dual-Driven Annotation and Filtering Framework for Visual Grounding | Yichi Zhang et.al. | 2505.24372 | null |
| 2025-05-29 | Conformal Object Detection by Sequential Risk Control | Léo Andéol et.al. | 2505.24038 | null |
| 2025-05-29 | Rooms from Motion: Un-posed Indoor 3D Object Detection as Localization and Mapping | Justin Lazarow et.al. | 2505.23756 | null |
| 2025-05-29 | Boosting Domain Incremental Learning: Selecting the Optimal Parameters is All You Need | Qiang Wang et.al. | 2505.23744 | link |
| 2025-05-29 | FMG-Det: Foundation Model Guided Robust Object Detection | Darryl Hannan et.al. | 2505.23726 | null |
| 2025-05-29 | CF-DETR: Coarse-to-Fine Transformer for Real-Time Object Detection | Woojin Shin et.al. | 2505.23317 | null |
| 2025-05-30 | WTEFNet: Real-Time Low-Light Object Detection for Advanced Driver Assistance Systems | Hao Wu et.al. | 2505.23201 | null |
| 2025-05-29 | Language-guided Learning for Object Detection Tackling Multiple Variations in Aerial Images | Sungjune Park et.al. | 2505.23193 | null |
| 2025-05-29 | DIP-R1: Deep Inspection and Perception with RL Looking Through and Understanding Complex Scenes | Sungjune Park et.al. | 2505.23179 | null |
| 2025-05-29 | The Meeseeks Mesh: Spatially Consistent 3D Adversarial Objects for BEV Detector | Aixuan Li et.al. | 2505.22499 | null |
| 2025-05-28 | VME: A Satellite Imagery Dataset and Benchmark for Detecting Vehicles in the Middle East and Beyond | Noora Al-Emadi et.al. | 2505.22353 | link |
| 2025-05-28 | Task-Driven Implicit Representations for Automated Design of LiDAR Systems | Nikhil Behari et.al. | 2505.22344 | null |
| 2025-05-29 | YH-MINER: Multimodal Intelligent System for Natural Ecological Reef Metric Extraction | Mingzhuang Wang et.al. | 2505.22250 | null |
| 2025-05-28 | S2AFormer: Strip Self-Attention for Efficient Vision Transformer | Guoan Xu et.al. | 2505.22195 | null |
| 2025-05-28 | Learning A Robust RGB-Thermal Detector for Extreme Modality Imbalance | Chao Tian et.al. | 2505.22154 | null |
| 2025-05-28 | Prototype Embedding Optimization for Human-Object Interaction Detection in Livestreaming | Menghui Zhang et.al. | 2505.22011 | null |
| 2025-05-28 | Cross-DINO: Cross the Deep MLP and Transformer for Small Object Detection | Guiping Cao et.al. | 2505.21868 | null |
| 2025-05-27 | Object Concepts Emerge from Motion | Haoqian Liang et.al. | 2505.21635 | null |
| 2025-05-27 | Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO | Muzhi Zhu et.al. | 2505.21457 | link |
| 2025-05-27 | Visual Product Graph: Bridging Visual Products And Composite Images For End-to-End Style Recommendations | Yue Li Du et.al. | 2505.21454 | null |
| 2025-05-27 | YOLO-SPCI: Enhancing Remote Sensing Object Detection via Selective-Perspective-Class Integration | Xinyuan Wang et.al. | 2505.21370 | null |
| 2025-05-27 | Assured Autonomy with Neuro-Symbolic Perception | R. Spencer Hallyburton et.al. | 2505.21322 | null |
| 2025-05-27 | Robust Video-Based Pothole Detection and Area Estimation for Intelligent Vehicles with Depth Map and Kalman Smoothing | Dehao Wang et.al. | 2505.21049 | null |
| 2025-05-27 | Facial Attribute Based Text Guided Face Anonymization | Mustafa İzzet Muştu et.al. | 2505.21002 | null |
| 2025-05-27 | YOLO-FireAD: Efficient Fire Detection via Attention-Guided Inverted Residual Learning and Dual-Pooling Feature Preservation | Weichao Pan et.al. | 2505.20884 | null |
| 2025-05-27 | Open-Det: An Efficient Learning Framework for Open-Ended Detection | Guiping Cao et.al. | 2505.20639 | null |
| 2025-05-27 | Roboflow100-VL: A Multi-Domain Object Detection Benchmark for Vision-Language Models | Peter Robicheaux et.al. | 2505.20612 | null |
| 2025-05-26 | From Data to Modeling: Fully Open-vocabulary Scene Graph Generation | Zuyao Chen et.al. | 2505.20106 | null |
| 2025-05-26 | Target Tracking via LiDAR-RADAR Sensor Fusion for Autonomous Racing | Marcello Cellina et.al. | 2505.20043 | null |
| 2025-05-26 | Underwater Diffusion Attention Network with Contrastive Language-Image Joint Learning for Underwater Image Enhancement | Afrah Shaahid et.al. | 2505.19895 | null |
| 2025-05-26 | ADD-SLAM: Adaptive Dynamic Dense SLAM with Gaussian Splatting | Wenhua Wu et.al. | 2505.19420 | null |
| 2025-05-26 | Neural nanophotonic object detector with ultra-wide field-of-view | Ji Chen et.al. | 2505.19379 | null |
| 2025-05-25 | What do Blind and Low-Vision People Really Want from Assistive Smart Devices? Comparison of the Literature with a Focus Study | Bhanuka Gamage et.al. | 2505.19325 | null |
| 2025-05-25 | VL-SAM-V2: Open-World Object Detection with General and Specific Query Fusion | Zhiwei Lin et.al. | 2505.18986 | null |
| 2025-05-24 | Mitigating Context Bias in Domain Adaptation for Object Detection using Mask Pooling | Hojun Son et.al. | 2505.18446 | null |
| 2025-05-23 | Sampling Strategies for Efficient Training of Deep Learning Object Detection Algorithms | Gefei Shen et.al. | 2505.18302 | null |
| 2025-05-23 | One RL to See Them All: Visual Triple Unified Reinforcement Learning | Yan Ma et.al. | 2505.18129 | link |
| 2025-05-23 | SemSegBench & DetecBench: Benchmarking Reliability and Generalization Beyond Classification | Shashank Agnihotri et.al. | 2505.18015 | null |
| 2025-05-23 | RQR3D: Reparametrizing the regression targets for BEV-based 3D object detection | Ozsel Kilinc et.al. | 2505.17732 | null |
| 2025-05-23 | Adaptive Semantic Token Communication for Transformer-based Edge Inference | Alessio Devoto et.al. | 2505.17604 | null |
| 2025-05-23 | Distance Estimation in Outdoor Driving Environments Using Phase-only Correlation Method with Event Cameras | Masataka Kobayashi et.al. | 2505.17582 | null |
| 2025-05-23 | OrionBench: A Benchmark for Chart and Human-Recognizable Object Detection in Infographics | Jiangning Zhu et.al. | 2505.17473 | null |
| 2025-05-23 | Reflectance Prediction-based Knowledge Distillation for Robust 3D Object Detection in Compressed Point Clouds | Hao Jing et.al. | 2505.17442 | null |
| 2025-05-23 | Optimizing YOLOv8 for Parking Space Detection: Comparative Analysis of Custom YOLOv8 Architecture | Apar Pokhrel et.al. | 2505.17364 | null |
| 2025-05-22 | Extending Dataset Pruning to Object Detection: A Variance-based Approach | Ryota Yagi et.al. | 2505.17245 | null |
| 2025-05-22 | Semi-Supervised State-Space Model with Dynamic Stacking Filter for Real-World Video Deraining | Shangquan Sun et.al. | 2505.16811 | null |
| 2025-05-22 | Robust Vision-Based Runway Detection through Conformal Prediction and Conformal mAP | Alya Zouzou et.al. | 2505.16740 | link |
| 2025-05-22 | CodeMerge: Codebook-Guided Model Merging for Robust Test-Time Adaptation in Autonomous Driving | Huitong Yang et.al. | 2505.16524 | null |
| 2025-05-22 | MAFE R-CNN: Selecting More Samples to Learn Category-aware Features for Small Object Detection | Yichen Li et.al. | 2505.16442 | null |
| 2025-05-22 | AdvReal: Adversarial Patch Generation Framework with Application to Adversarial Safety Evaluation of Object Detection Systems | Yuanhao Huang et.al. | 2505.16402 | link |
| 2025-05-22 | Self-Classification Enhancement and Correction for Weakly Supervised Object Detection | Yufei Yin et.al. | 2505.16294 | null |
| 2025-05-21 | MIKU-PAL: An Automated and Standardized Multi-Modal Method for Speech Paralinguistic and Affect Labeling | Cheng Yifan et.al. | 2505.15772 | null |
| 2025-05-21 | The Devil is in Fine-tuning and Long-tailed Problems: A New Benchmark for Scene Text Detection | Tianjiao Cao et.al. | 2505.15649 | link |
| 2025-05-21 | SNAP: A Benchmark for Testing the Effects of Capture Conditions on Fundamental Vision Tasks | Iuliia Kotseruba et.al. | 2505.15628 | link |
| 2025-05-21 | Detection of Underwater Multi-Targets Based on Self-Supervised Learning and Deformable Path Aggregation Feature Pyramid Network | Chang Liu et.al. | 2505.15518 | null |
| 2025-05-21 | Trends and Challenges in Authorship Analysis: A Review of ML, DL, and LLM Approaches | Nudrat Habib et.al. | 2505.15422 | null |
| 2025-05-21 | RAZER: Robust Accelerated Zero-Shot 3D Open-Vocabulary Panoptic Reconstruction with Spatio-Temporal Aggregation | Naman Patel et.al. | 2505.15373 | null |
| 2025-05-21 | AGENT-X: Adaptive Guideline-based Expert Network for Threshold-free AI-generated teXt detection | Jiatao Li et.al. | 2505.15261 | null |
| 2025-05-21 | Multispectral Detection Transformer with Infrared-Centric Sensor Fusion | Seongmin Hwang et.al. | 2505.15137 | null |
| 2025-05-20 | Colors Matter: AI-Driven Exploration of Human Feature Colors | Rama Alyoubi et.al. | 2505.14931 | link |
| 2025-05-20 | Language Models Optimized to Fool Detectors Still Have a Distinct Style (And How to Change It) | Rafael Rivera Soto et.al. | 2505.14608 | null |
| 2025-05-20 | SCAN: Semantic Document Layout Analysis for Textual and Visual Retrieval-Augmented Generation | Yuyang Dong et.al. | 2505.14381 | null |
| 2025-05-20 | FAID: Fine-grained AI-generated Text Detection using Multi-task Auxiliary and Multi-level Contrastive Learning | Minh Ngoc Ta et.al. | 2505.14271 | null |
| 2025-05-20 | Decoupling Classifier for Boosting Few-shot Object Detection and Instance Segmentation | Bin-Bin Gao et.al. | 2505.14239 | link |
| 2025-05-20 | Intra-class Patch Swap for Self-Distillation | Hongjun Choi et.al. | 2505.14124 | link |
| 2025-05-20 | Scaling Vision Mamba Across Resolutions via Fractal Traversal | Bo Li et.al. | 2505.14062 | null |
| 2025-05-20 | Automated Quality Evaluation of Cervical Cytopathology Whole Slide Images Based on Content Analysis | Lanlan Kang et.al. | 2505.13875 | null |
| 2025-05-20 | Safety2Drive: Safety-Critical Scenario Benchmark for the Evaluation of Autonomous Driving | Jingzheng Li et.al. | 2505.13872 | null |
| 2025-05-20 | Domain Gating Ensemble Networks for AI-Generated Text Detection | Arihant Tripathi et.al. | 2505.13855 | null |
| 2025-05-20 | A Challenge to Build Neuro-Symbolic Video Agents | Sahil Shah et.al. | 2505.13851 | null |
| 2025-05-19 | Dynamic Graph Induced Contour-aware Heat Conduction Network for Event-based Object Detection | Xiao Wang et.al. | 2505.12908 | link |
| 2025-05-19 | Rethinking Features-Fused-Pyramid-Neck for Object Detection | Hulin Li et.al. | 2505.12820 | link |
| 2025-05-19 | Enhancing Transformers Through Conditioned Embedded Tokens | Hemanth Saratchandran et.al. | 2505.12789 | null |
| 2025-05-19 | LiDAR MOT-DETR: A LiDAR-based Two-Stage Transformer for 3D Multiple Object Tracking | Martha Teiko Teye et.al. | 2505.12753 | null |
| 2025-05-19 | VLC Fusion: Vision-Language Conditioned Sensor Fusion for Robust Object Detection | Aditya Taparia et.al. | 2505.12715 | null |
| 2025-05-18 | LM$^2$otifs: An Explainable Framework for Machine-Generated Texts Detection | Xu Zheng et.al. | 2505.12507 | null |
| 2025-05-17 | EarthSynth: Generating Informative Earth Observation with Diffusion Models | Jiancheng Pan et.al. | 2505.12108 | null |
| 2025-05-17 | Experimental Study on Automatically Assembling Custom Catering Packages With a 3-DOF Delta Robot Using Deep Learning Methods | Reihaneh Yourdkhani et.al. | 2505.11879 | null |
| 2025-05-16 | Improving Object Detection Performance through YOLOv8: A Comprehensive Training and Evaluation Study | Rana Poureskandar et.al. | 2505.11424 | null |
| 2025-05-16 | MTevent: A Multi-Task Event Camera Dataset for 6D Pose Estimation and Moving Object Detection | Shrutarv Awasthi et.al. | 2505.11282 | link |
| 2025-05-16 | M4-SAR: A Multi-Resolution, Multi-Polarization, Multi-Scene, Multi-Source Dataset and Benchmark for Optical-SAR Fusion Object Detection | Chao Wang et.al. | 2505.10931 | link |
| 2025-05-16 | A High-Performance Thermal Infrared Object Detection Framework with Centralized Regulation | Jinke Li et.al. | 2505.10825 | null |
| 2025-05-15 | StoryReasoning Dataset: Using Chain-of-Thought for Scene Understanding and Grounded Story Generation | Daniel A. P. Oliveira et.al. | 2505.10292 | link |
| 2025-05-15 | Defect Detection in Photolithographic Patterns Using Deep Learning Models Trained on Synthetic Data | Prashant P. Shinde et.al. | 2505.10192 | null |
| 2025-05-15 | Application of YOLOv8 in monocular downward multiple Car Target detection | Shijie Lyu et.al. | 2505.10016 | null |
| 2025-05-14 | EdgeAI Drone for Autonomous Construction Site Demonstrator | Emre Girgin et.al. | 2505.09837 | link |
| 2025-05-14 | WhatsAI: Transforming Meta Ray-Bans into an Extensible Generative AI Platform for Accessibility | Nasif Zaman et.al. | 2505.09823 | null |
| 2025-05-14 | MoRAL: Motion-aware Multi-Frame 4D Radar and LiDAR Fusion for Robust 3D Object Detection | Xiangyuan Peng et.al. | 2505.09422 | null |
| 2025-05-14 | A drone that learns to efficiently find objects in agricultural fields: from simulation to the real world | Rick van Essen et.al. | 2505.09278 | null |
| 2025-05-14 | DRRNet: Macro-Micro Feature Fusion and Dual Reverse Refinement for Camouflaged Object Detection | Jianlin Sun et.al. | 2505.09168 | link |
| 2025-05-14 | Beyond General Prompts: Automated Prompt Refinement using Contrastive Class Alignment Scores for Disambiguating Objects in Vision-Language Models | Lucas Choi et.al. | 2505.09139 | null |
| 2025-05-14 | Promoting SAM for Camouflaged Object Detection via Selective Key Point-based Guidance | Guoying Liang et.al. | 2505.09123 | null |
| 2025-05-13 | Robustness Analysis against Adversarial Patch Attacks in Fully Unmanned Stores | Hyunsik Na et.al. | 2505.08835 | null |
| 2025-05-13 | Augmented Reality for RObots (ARRO): Pointing Visuomotor Policies Towards Visual Robustness | Reihaneh Mirjalili et.al. | 2505.08627 | null |
| 2025-05-14 | Thermal Detection of People with Mobility Restrictions for Barrier Reduction at Traffic Lights Controlled Intersections | Xiao Ni et.al. | 2505.08568 | link |
| 2025-05-13 | MDF: Multi-Modal Data Fusion with CNN-Based Object Detection for Enhanced Indoor Localization Using LiDAR-SLAM | Saqi Hussain Kalan et.al. | 2505.08388 | null |
| 2025-05-13 | HMPNet: A Feature Aggregation Architecture for Maritime Object Detection from a Shipborne Perspective | Yu Zhang et.al. | 2505.08231 | link |
| 2025-05-13 | Object detection in adverse weather conditions for autonomous vehicles using Instruct Pix2Pix | Unai Gurbindo et.al. | 2505.08228 | null |
| 2025-05-13 | MoKD: Multi-Task Optimization for Knowledge Distillation | Zeeshan Hayder et.al. | 2505.08170 | null |
| 2025-05-12 | LAMM-ViT: AI Face Detection via Layer-Aware Modulation of Region-Guided Attention | Jiangling Zhang et.al. | 2505.07734 | null |
| 2025-05-12 | Hybrid Spiking Vision Transformer for Object Detection with Event Cameras | Qi Xu et.al. | 2505.07715 | null |
| 2025-05-12 | Self-Supervised Event Representations: Towards Accurate, Real-Time Perception on SoC FPGAs | Kamil Jeziorek et.al. | 2505.07556 | null |
| 2025-05-12 | Automated Visual Attention Detection using Mobile Eye Tracking in Behavioral Classroom Studies | Efe Bozkir et.al. | 2505.07552 | null |
| 2025-05-12 | DepthFusion: Depth-Aware Hybrid Feature Fusion for LiDAR-Camera 3D Object Detection | Mingqian Ji et.al. | 2505.07398 | null |
| 2025-05-12 | Language-Driven Dual Style Mixing for Single-Domain Generalized Object Detection | Hongda Qin et.al. | 2505.07219 | link |
| 2025-05-11 | Differentiable NMS via Sinkhorn Matching for End-to-End Fabric Defect Detection | Zhengyang Lu et.al. | 2505.07040 | null |
| 2025-05-11 | VALISENS: A Validated Innovative Multi-Sensor System for Cooperative Automated Driving | Lei Wan et.al. | 2505.06980 | null |
| 2025-05-10 | M3CAD: Towards Generic Cooperative Autonomous Driving Benchmark | Morui Zhu et.al. | 2505.06746 | null |
| 2025-05-10 | Underwater object detection in sonar imagery with detection transformer and Zero-shot neural architecture search | XiaoTong Gu et.al. | 2505.06694 | null |
| 2025-05-09 | Camera-Only Bird’s Eye View Perception: A Neural Approach to LiDAR-Free Environmental Mapping for Autonomous Vehicles | Anupkumar Bochare et.al. | 2505.06113 | null |
| 2025-05-09 | Artificial intelligence pioneers the double-strangeness factory | Yan He et.al. | 2505.05802 | null |
| 2025-05-09 | Dome-DETR: DETR with Density-Oriented Feature-Query Manipulation for Efficient Tiny Object Detection | Zhangchi Hu et.al. | 2505.05741 | link |
| 2025-05-09 | DiGIT: Multi-Dilated Gated Encoder and Central-Adjacent Region Integrated Decoder for Temporal Action Detection Transformer | Ho-Joong Kim et.al. | 2505.05711 | link |
| 2025-05-08 | PillarMamba: Learning Local-Global Context for Roadside Point Cloud via Hybrid State Space Model | Zhang Zhang et.al. | 2505.05397 | null |
| 2025-05-08 | PaniCar: Securing the Perception of Advanced Driving Assistance Systems Against Emergency Vehicle Lighting | Elad Feldman et.al. | 2505.05183 | null |
| 2025-05-08 | Reliably Bounding False Positives: A Zero-Shot Machine-Generated Text Detection Framework via Multiscaled Conformal Prediction | Xiaowei Zhu et.al. | 2505.05084 | null |
| 2025-05-08 | FG-CLIP: Fine-Grained Visual and Textual Alignment | Chunyu Xie et.al. | 2505.05071 | link |
| 2025-05-08 | A Simple Detector with Frame Dynamics is a Strong Tracker | Chenxu Peng et.al. | 2505.04917 | null |
| 2025-05-08 | Mix-QSAM: Mixed-Precision Quantization of the Segment Anything Model | Navin Ranjan et.al. | 2505.04861 | null |
| 2025-05-07 | Lightweight RGB-D Salient Object Detection from a Speed-Accuracy Tradeoff Perspective | Songsong Duan et.al. | 2505.04758 | null |
| 2025-05-07 | Hyb-KAN ViT: Hybrid Kolmogorov-Arnold Networks Augmented Vision Transformer | Sainath Dey et.al. | 2505.04740 | null |
| 2025-05-08 | MonoCoP: Chain-of-Prediction for Monocular 3D Object Detection | Zhihao Zhang et.al. | 2505.04594 | null |
| 2025-05-07 | Edge-GPU Based Face Tracking for Face Detection and Recognition Acceleration | Asma Baobaid et.al. | 2505.04524 | null |
| 2025-05-07 | Leveraging Simultaneous Usage of Edge GPU Hardware Engines for Video Face Detection and Recognition | Asma Baobaid et.al. | 2505.04502 | null |
| 2025-05-07 | DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception | Junjie Wang et.al. | 2505.04410 | link |
| 2025-05-06 | LogisticsVLN: Vision-Language Navigation For Low-Altitude Terminal Delivery Based on Agentic UAVs | Xinyuan Zhang et.al. | 2505.03460 | null |
| 2025-05-06 | Robustness in AI-Generated Detection: Enhancing Resistance to Adversarial Attacks | Sun Haoxuan et.al. | 2505.03435 | null |
| 2025-05-06 | From Word to Sentence: A Large-Scale Multi-Instance Dataset for Open-Set Aerial Detection | Guoting Wei et.al. | 2505.03334 | null |
| 2025-05-06 | VISLIX: An XAI Framework for Validating Vision Models with Slice Discovery and Analysis | Xinyuan Yan et.al. | 2505.03132 | null |
| 2025-05-05 | Sim2Real Transfer for Vision-Based Grasp Verification | Pau Amargant et.al. | 2505.03046 | link |
| 2025-05-05 | DPNet: Dynamic Pooling Network for Tiny Object Detection | Luqi Gong et.al. | 2505.02797 | null |
| 2025-05-05 | RGBX-DiffusionDet: A Framework for Multi-Modal RGB-X Object Detection Using DiffusionDet | Eliraz Orfaig et.al. | 2505.02586 | null |
| 2025-05-05 | Point Cloud Recombination: Systematic Real Data Augmentation Using Robotic Targets for LiDAR Perception Validation | Hubert Padusinski et.al. | 2505.02476 | null |
| 2025-05-04 | Robust AI-Generated Face Detection with Imbalanced Data | Yamini Sri Krubha et.al. | 2505.02182 | link |
| 2025-05-04 | Transforming faces into video stories – VideoFace2.0 | Branko Brkljač et.al. | 2505.02060 | null |
| 2025-05-03 | DriveNetBench: An Affordable and Configurable Single-Camera Benchmarking System for Autonomous Driving Networks | Ali Al-Bustami et.al. | 2505.01893 | link |
| 2025-05-03 | OODTE: A Differential Testing Engine for the ONNX Optimizer | Nikolaos Louloudakis et.al. | 2505.01892 | null |
| 2025-05-03 | CMAWRNet: Multiple Adverse Weather Removal via a Unified Quaternion Neural Architecture | Vladimir Frants et.al. | 2505.01882 | null |
| 2025-05-03 | DualDiff: Dual-branch Diffusion Model for Autonomous Driving with Semantic Fusion | Haoteng Li et.al. | 2505.01857 | null |
| 2025-05-03 | Toward Onboard AI-Enabled Solutions to Space Object Detection for Space Sustainability | Wenxuan Zhang et.al. | 2505.01650 | null |
| 2025-05-02 | Efficient Vision-based Vehicle Speed Estimation | Andrej Macko et.al. | 2505.01203 | null |
| 2025-05-02 | CDFormer: Cross-Domain Few-Shot Object Detection Transformer Against Feature Confusion | Boyuan Meng et.al. | 2505.00938 | null |
| 2025-05-01 | Efficient On-Chip Implementation of 4D Radar-Based 3D Object Detection on Hailo-8L | Woong-Chan Byun et.al. | 2505.00757 | null |
| 2025-05-03 | Vision Mamba in Remote Sensing: A Comprehensive Survey of Techniques, Applications and Outlook | Muyi Bao et.al. | 2505.00630 | null |
| 2025-05-01 | Visual Trajectory Prediction of Vessels for Inland Navigation | Alexander Puzicha et.al. | 2505.00599 | null |
| 2025-05-01 | Synthesizing and Identifying Noise Levels in Autonomous Vehicle Camera Radar Datasets | Mathis Morales et.al. | 2505.00584 | null |
| 2025-05-01 | X-ray illicit object detection using hybrid CNN-transformer neural network architectures | Jorgen Cani et.al. | 2505.00564 | null |
| 2025-05-01 | A Robust Deep Networks based Multi-Object MultiCamera Tracking System for City Scale Traffic | Muhammad Imran Zaman et.al. | 2505.00534 | null |
| 2025-05-01 | Inconsistency-based Active Learning for LiDAR Object Detection | Esteban Rivera et.al. | 2505.00511 | null |
| 2025-05-01 | HeAL3D: Heuristical-enhanced Active Learning for 3D Object Detection | Esteban Rivera et.al. | 2505.00507 | null |
| 2025-05-01 | Quaternion Wavelet-Conditioned Diffusion Models for Image Super-Resolution | Luigi Sigillo et.al. | 2505.00334 | null |
| 2025-04-30 | V3LMA: Visual 3D-enhanced Language Model for Autonomous Driving | Jannik Lübberstedt et.al. | 2505.00156 | null |
| 2025-04-30 | LLM-Empowered Embodied Agent for Memory-Augmented Task Planning in Household Robotics | Marc Glocker et.al. | 2504.21716 | null |
| 2025-04-30 | Visual Text Processing: A Comprehensive Review and Unified Evaluation | Yan Shu et.al. | 2504.21682 | null |
| 2025-04-29 | T2ID-CAS: Diffusion Model and Class Aware Sampling to Mitigate Class Imbalance in Neck Ultrasound Anatomical Landmark Detection | Manikanta Varaganti et.al. | 2504.21231 | null |
| 2025-04-29 | FLIM-based Salient Object Detection Networks with Adaptive Decoders | Gilson Junior Soares et.al. | 2504.20872 | null |
| 2025-04-29 | A Survey on Event-based Optical Marker Systems | Nafiseh Jabbari Tofighi et.al. | 2504.20736 | null |
| 2025-04-29 | Purifying, Labeling, and Utilizing: A High-Quality Pipeline for Small Object Detection | Siwei Wang et.al. | 2504.20602 | null |
| 2025-04-29 | Style-Adaptive Detection Transformer for Single-Source Domain Generalized Object Detection | Jianhong Han et.al. | 2504.20498 | null |
| 2025-04-28 | More Clear, More Flexible, More Precise: A Comprehensive Oriented Object Detection benchmark for UAV | Kai Ye et.al. | 2504.20032 | null |
| 2025-04-28 | Lossy Source Coding with Focal Loss | Alex Dytso et.al. | 2504.19913 | null |
| 2025-04-28 | Neural network task specialization via domain constraining | Roman Malashin et.al. | 2504.19592 | null |
| 2025-04-28 | GMAR: Gradient-Driven Multi-Head Attention Rollout for Vision Transformer Interpretability | Sehyeong Jo et.al. | 2504.19414 | null |
| 2025-04-27 | Improving Small Drone Detection Through Multi-Scale Processing and Data Augmentation | Rayson Laroca et.al. | 2504.19347 | null |
| 2025-04-27 | ODExAI: A Comprehensive Object Detection Explainable AI Evaluation | Loc Phuc Truong Nguyen et.al. | 2504.19249 | null |
| 2025-04-27 | Boosting Single-domain Generalized Object Detection via Vision-Language Knowledge Interaction | Xiaoran Xu et.al. | 2504.19086 | null |
| 2025-04-26 | Federated Learning-based Semantic Segmentation for Lane and Object Detection in Autonomous Driving | Gharbi Khamis Alshammari et.al. | 2504.18939 | null |
| 2025-04-25 | Dream-Box: Object-wise Outlier Generation for Out-of-Distribution Detection | Brian K. S. Isaac-Medina et.al. | 2504.18746 | null |
| 2025-04-25 | A Review of 3D Object Detection with Vision-Language Models | Ranjan Sapkota et.al. | 2504.18738 | null |
| 2025-04-25 | Examining the Impact of Optical Aberrations to Image Classification and Object Detection Models | Patrick Müller et.al. | 2504.18510 | null |
| 2025-04-25 | Iterative Event-based Motion Segmentation by Variational Contrast Maximization | Ryo Yamaki et.al. | 2504.18447 | null |
| 2025-04-25 | A Multimodal Hybrid Late-Cascade Fusion Network for Enhanced 3D Object Detection | Carlo Sgaravatti et.al. | 2504.18419 | null |
| 2025-04-25 | A comprehensive review of classifier probability calibration metrics | Richard Oliver Lane et.al. | 2504.18278 | null |
| 2025-04-25 | LiDAR-Guided Monocular 3D Object Detection for Long-Range Railway Monitoring | Raul David Dominguez Sanchez et.al. | 2504.18203 | null |
| 2025-04-25 | Multi-Grained Compositional Visual Clue Learning for Image Intent Recognition | Yin Tang et.al. | 2504.18201 | null |
| 2025-04-25 | E-InMeMo: Enhanced Prompting for Visual In-Context Learning | Jiahao Zhang et.al. | 2504.18158 | null |
| 2025-04-25 | MASF-YOLO: An Improved YOLOv11 Network for Small Object Detection on Drone View | Liugang Lu et.al. | 2504.18136 | null |
| 2025-04-25 | Opportunistic Collaborative Planning with Large Vision Model Guided Control and Joint Query-Service Optimization | Jiayi Chen et.al. | 2504.18057 | null |
| 2025-04-25 | Direct sampling method to retrieve small objects from two-dimensional limited-aperture scattered field data | Won-Kwang Park et.al. | 2504.18036 | null |
| 2025-04-24 | DIVE: Inverting Conditional Diffusion Models for Discriminative Tasks | Yinqi Li et.al. | 2504.17253 | link |
| 2025-04-24 | Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation | Phillip Y. Lee et.al. | 2504.17207 | link |
| 2025-04-24 | AUTHENTICATION: Identifying Rare Failure Modes in Autonomous Vehicle Perception Systems using Adversarially Guided Diffusion Models | Mohammad Zarei et.al. | 2504.17179 | null |
| 2025-04-23 | Scene-Aware Location Modeling for Data Augmentation in Automotive Object Detection | Jens Petersen et.al. | 2504.17076 | null |
| 2025-04-23 | Gaussian Splatting is an Effective Data Generator for 3D Object Detection | Farhad G. Zanjani et.al. | 2504.16740 | null |
| 2025-04-23 | EHGCN: Hierarchical Euclidean-Hyperbolic Fusion via Motion-Aware GCN for Hybrid Event Stream Perception | Haosheng Chen et.al. | 2504.16616 | null |
| 2025-04-23 | Beyond Anonymization: Object Scrubbing for Privacy-Preserving 2D and 3D Vision Tasks | Murat Bilgehan Ertan et.al. | 2504.16557 | null |
| 2025-04-23 | Assessing the Feasibility of Internet-Sourced Video for Automatic Cattle Lameness Detection | Md Fahimuzzman Sohan et.al. | 2504.16404 | null |
| 2025-04-23 | Revisiting Radar Camera Alignment by Contrastive Learning for 3D Object Detection | Linhua Kong et.al. | 2504.16368 | null |
| 2025-04-22 | Vision Controlled Orthotic Hand Exoskeleton | Connor Blais et.al. | 2504.16319 | null |
| 2025-04-22 | $\pi_{0.5}$: a Vision-Language-Action Model with Open-World Generalization | Physical Intelligence et.al. | 2504.16054 | null |
| 2025-04-22 | SAGA: Semantic-Aware Gray color Augmentation for Visible-to-Thermal Domain Adaptation across Multi-View Drone and Ground-Based Vision Systems | Manjunath D et.al. | 2504.15728 | link |
| 2025-04-22 | You Sense Only Once Beneath: Ultra-Light Real-Time Underwater Object Detection | Jun Dong et.al. | 2504.15694 | null |
| 2025-04-22 | A Vision-Enabled Prosthetic Hand for Children with Upper Limb Disabilities | Md Abdul Baset Sarker et.al. | 2504.15654 | null |
| 2025-04-21 | Context Aware Grounded Teacher for Source Free Object Detection | Tajamul Ashraf et.al. | 2504.15404 | link |
| 2025-04-21 | SuoiAI: Building a Dataset for Aquatic Invertebrates in Vietnam | Tue Vo et.al. | 2504.15252 | null |
| 2025-04-21 | An Efficient Aerial Image Detection with Variable Receptive Fields | Liu Wenbin et.al. | 2504.15165 | null |
| 2025-04-19 | Balancing Privacy and Action Performance: A Penalty-Driven Approach to Image Anonymization | Nazia Aslam et.al. | 2504.14301 | null |
| 2025-04-19 | Visual Consensus Prompting for Co-Salient Object Detection | Jie Wang et.al. | 2504.14254 | link |
| 2025-04-18 | Feature Alignment and Representation Transfer in Knowledge Distillation for Large Language Models | Junjie Yang et.al. | 2504.13825 | null |
| 2025-04-18 | Lightweight LiDAR-Camera 3D Dynamic Object Detection and Multi-Class Trajectory Prediction | Yushen He et.al. | 2504.13647 | link |
| 2025-04-18 | DenSe-AdViT: A novel Vision Transformer for Dense SAR Object Detection | Yang Zhang et.al. | 2504.13638 | null |
| 2025-04-18 | HMPE: HeatMap Embedding for Efficient Transformer-Based Small Object Detection | YangChen Zeng et.al. | 2504.13469 | null |
| 2025-04-18 | Towards a Multi-Agent Vision-Language System for Zero-Shot Novel Hazardous Object Detection for Autonomous Driving Safety | Shashank Shriram et.al. | 2504.13399 | link |
| 2025-04-17 | VLLFL: A Vision-Language Model Based Lightweight Federated Learning Framework for Smart Agriculture | Long Li et.al. | 2504.13365 | null |
| 2025-04-17 | SAR Object Detection with Self-Supervised Pretraining and Curriculum-Aware Sampling | Yasin Almalioglu et.al. | 2504.13310 | null |
| 2025-04-17 | Weak Cube R-CNN: Weakly Supervised 3D Detection using only 2D Bounding Boxes | Andreas Lau Hansen et.al. | 2504.13297 | link |
| 2025-04-17 | RF-DETR Object Detection vs YOLOv12: A Study of Transformer-based and CNN-based Architectures for Single-Class and Multi-Class Greenfruit Detection in Complex Orchard Environments Under Label Ambiguity | Ranjan Sapkota et.al. | 2504.13099 | null |
| 2025-04-17 | Self-Supervised Pre-training with Combined Datasets for 3D Perception in Autonomous Driving | Shumin Wang et.al. | 2504.12709 | null |
| 2025-04-18 | RoPETR: Improving Temporal Camera-Only 3D Detection by Integrating Enhanced Rotary Position Embedding | Hang Ji et.al. | 2504.12643 | null |
| 2025-04-16 | Towards a General-Purpose Zero-Shot Synthetic Low-Light Image and Video Pipeline | Joanne Lin et.al. | 2504.12169 | null |
| 2025-04-16 | RADLER: Radar Object Detection Leveraging Semantic 3D City Models and Self-Supervised Radar-Image Learning | Yuan Luo et.al. | 2504.12167 | null |
| 2025-04-16 | pix2pockets: Shot Suggestions in 8-Ball Pool from a Single Image in the Wild | Jonas Myhre Schiøtt et.al. | 2504.12045 | null |
| 2025-04-16 | A Review of YOLOv12: Attention-Based Enhancements vs. Previous Versions | Rahima Khanam et.al. | 2504.11995 | null |
| 2025-04-16 | Multimodal Spatio-temporal Graph Learning for Alignment-free RGBT Video Object Detection | Qishun Wang et.al. | 2504.11779 | null |
| 2025-04-15 | Multi-level Cellular Automata for FLIM networks | Felipe Crispim Salvagnini et.al. | 2504.11406 | null |
| 2025-04-15 | OpenTuringBench: An Open-Model-based Benchmark and Framework for Machine-Generated Text Detection and Attribution | Lucio La Cava et.al. | 2504.11369 | null |
| 2025-04-15 | CFIS-YOLO: A Lightweight Multi-Scale Fusion Network for Edge-Deployable Wood Defect Detection | Jincheng Kang et.al. | 2504.11305 | null |
| 2025-04-15 | TSAL: Few-shot Text Segmentation Based on Attribute Learning | Chenming Li et.al. | 2504.11164 | null |
| 2025-04-15 | Flyweight FLIM Networks for Salient Object Detection in Biomedical Images | Leonardo M. Joao et.al. | 2504.11112 | null |
| 2025-04-15 | S $^2$ Teacher: Step-by-step Teacher for Sparsely Annotated Oriented Object Detection | Yu Lin et.al. | 2504.11111 | null |
| 2025-04-15 | DRIFT open dataset: A drone-derived intelligence for traffic analysis in urban environmen | Hyejin Lee et.al. | 2504.11019 | link |
| 2025-04-16 | GATE3D: Generalized Attention-based Task-synergized Estimation in 3D | Eunsoo Im et.al. | 2504.11014 | null |
| 2025-04-15 | CDUPatch: Color-Driven Universal Adversarial Patch Attack for Dual-Modal Visible-Infrared Detectors | Jiahuan Long et.al. | 2504.10888 | null |
| 2025-04-15 | Safe-Construct: Redefining Construction Safety Violation Recognition as 3D Multi-View Engagement Task | Aviral Chharia et.al. | 2504.10880 | null |
| 2025-04-14 | DiffMOD: Progressive Diffusion Point Denoising for Moving Object Detection in Remote Sensing | Jinyue Zhang et.al. | 2504.10278 | null |
| 2025-04-14 | Balancing Stability and Plasticity in Pretrained Detector: A Dual-Path Framework for Incremental Object Detection | Songze Li et.al. | 2504.10214 | null |
| 2025-04-14 | WildLive: Near Real-time Visual Wildlife Tracking onboard UAVs | Nguyen Ngoc Dat et.al. | 2504.10165 | null |
| 2025-04-14 | COUNTS: Benchmarking Object Detectors and Multimodal Large Language Models under Distribution Shifts | Jiansheng Li et.al. | 2504.10158 | null |
| 2025-04-14 | SemiETS: Integrating Spatial and Content Consistencies for Semi-Supervised End-to-end Text Spotting | Dongliang Luo et.al. | 2504.09966 | link |
| 2025-04-14 | Small Object Detection with YOLO: A Performance Analysis Across Model Versions and Hardware | Muhammad Fasih Tariq et.al. | 2504.09900 | null |
| 2025-04-14 | Density-based Object Detection in Crowded Scenes | Chenyang Zhao et.al. | 2504.09819 | null |
| 2025-04-13 | Uncertainty Guided Refinement for Fine-Grained Salient Object Detection | Yao Yuan et.al. | 2504.09666 | link |
| 2025-04-13 | Pillar-Voxel Fusion Network for 3D Object Detection in Airborne Hyperspectral Point Clouds | Yanze Jiang et.al. | 2504.09506 | null |
| 2025-04-13 | Vision-Language Model for Object Detection and Segmentation: A Review and Evaluation | Yongchao Feng et.al. | 2504.09480 | link |
| 2025-04-11 | TinyCenterSpeed: Efficient Center-Based Object Detection for Autonomous Racing | Neil Reichlin et.al. | 2504.08655 | null |
| 2025-04-11 | Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization | Jialu Li et.al. | 2504.08641 | link |
| 2025-04-10 | Enhanced Cooperative Perception Through Asynchronous Vehicle to Infrastructure Framework with Delay Mitigation for Connected and Automated Vehicles | Nithish Kumar Saravanan et.al. | 2504.08172 | null |
| 2025-04-10 | Multi-Task Learning with Multi-Annotation Triplet Loss for Improved Object Detection | Meilun Zhou et.al. | 2504.08054 | null |
| 2025-04-10 | Detect Anything 3D in the Wild | Hanxue Zhang et.al. | 2504.07958 | null |
| 2025-04-11 | Pychop: Emulating Low-Precision Arithmetic in Numerical Methods and Neural Networks | Erin Carson et.al. | 2504.07835 | link |
| 2025-04-10 | P2Object: Single Point Supervised Object Detection and Instance Segmentation | Pengfei Chen et.al. | 2504.07813 | null |
| 2025-04-10 | Nonlocal Retinex-Based Variational Model and its Deep Unfolding Twin for Low-Light Image Enhancement | Daniel Torres et.al. | 2504.07810 | null |
| 2025-04-10 | Adaptive Detection of Fast Moving Celestial Objects Using a Mixture of Experts and Physical-Inspired Neural Network | Peng Jia et.al. | 2504.07777 | null |
| 2025-04-10 | Prediction of Usage Probabilities of Shopping-Mall Corridors Using Heterogeneous Graph Neural Networks | Malik M Barakathullah et.al. | 2504.07645 | null |
| 2025-04-10 | VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model | Haozhan Shen et.al. | 2504.07615 | link |
| 2025-04-10 | RASMD: RGB And SWIR Multispectral Driving Dataset for Robust Perception in Adverse Conditions | Youngwan Jin et.al. | 2504.07603 | null |
| 2025-04-10 | WS-DETR: Robust Water Surface Object Detection through Vision-Radar Fusion with Detection Transformer | Huilin Yin et.al. | 2504.07441 | null |
| 2025-04-10 | Model Discrepancy Learning: Synthetic Faces Detection Based on Multi-Reconstruction | Qingchao Jiang et.al. | 2504.07382 | link |
| 2025-04-09 | Generalized Semantic Contrastive Learning via Embedding Side Information for Few-Shot Object Detection | Ruoyu Chen et.al. | 2504.07060 | null |
| 2025-04-09 | UAV Position Estimation using a LiDAR-based 3D Object Detection Method | Uthman Olawoye et.al. | 2504.07028 | null |
| 2025-04-09 | Towards Efficient Roadside LiDAR Deployment: A Fast Surrogate Metric Based on Entropy-Guided Visibility | Yuze Jiang et.al. | 2504.06772 | null |
| 2025-04-09 | Domain-Conditioned Scene Graphs for State-Grounded Task Planning | Jonas Herzog et.al. | 2504.06661 | null |
| 2025-04-09 | Visually Similar Pair Alignment for Robust Cross-Domain Object Detection | Onkar Krishna et.al. | 2504.06607 | null |
| 2025-04-08 | From Broadcast to Minimap: Achieving State-of-the-Art SoccerNet Game State Reconstruction | Vladimir Golovkin et.al. | 2504.06357 | null |
| 2025-04-08 | Analyzing the Impact of Low-Rank Adaptation for Cross-Domain Few-Shot Object Detection in Aerial Images | Hicham Talaoubrid et.al. | 2504.06330 | link |
| 2025-04-08 | Security Analysis of Thumbnail-Preserving Image Encryption and a New Framework | Dong Xie et.al. | 2504.06083 | null |
| 2025-04-08 | Balancing long- and short-term dynamics for the modeling of saliency in videos | Theodor Wulff et.al. | 2504.05913 | null |
| 2025-04-08 | PRIMEDrive-CoT: A Precognitive Chain-of-Thought Framework for Uncertainty-Aware Object Interaction in Driving Scene Scenario | Sriram Mandalika et.al. | 2504.05908 | null |
| 2025-04-08 | Intrinsic Saliency Guided Trunk-Collateral Network for Unsupervised Video Object Segmentation | Xiangyu Zheng et.al. | 2504.05904 | null |
| 2025-04-08 | KAN-SAM: Kolmogorov-Arnold Network Guided Segment Anything Model for RGB-T Salient Object Detection | Xingyuan Li et.al. | 2504.05878 | null |
| 2025-04-08 | DefMamba: Deformable Visual State Space Model | Leiye Liu et.al. | 2504.05794 | null |
| 2025-04-08 | Event-based Civil Infrastructure Visual Defect Detection: ev-CIVIL Dataset and Benchmark | Udayanga G. W. K. N. Gamage et.al. | 2504.05679 | null |
| 2025-04-08 | POD: Predictive Object Detection with Single-Frame FMCW LiDAR Point Cloud | Yining Shi et.al. | 2504.05649 | null |
| 2025-04-08 | AD-Det: Boosting Object Detection in UAV Images with Focused Small Objects and Balanced Tail Classes | Zhenteng Li et.al. | 2504.05601 | null |
| 2025-04-07 | SSLFusion: Scale & Space Aligned Latent Fusion Model for Multimodal 3D Object Detection | Bonan Ding et.al. | 2504.05170 | null |
| 2025-04-07 | Inland Waterway Object Detection in Multi-environment: Dataset and Approach | Shanshan Wang et.al. | 2504.04835 | null |
| 2025-04-07 | Playing Non-Embedded Card-Based Games with Reinforcement Learning | Tianyang Wu et.al. | 2504.04783 | link |
| 2025-04-07 | Feedback-Enhanced Hallucination-Resistant Vision-Language Model for Real-Time Scene Understanding | Zahir Alsulaimawi et.al. | 2504.04772 | null |
| 2025-04-07 | Inverse++: Vision-Centric 3D Semantic Occupancy Prediction Assisted with 3D Object Detection | Zhenxing Ming et.al. | 2504.04732 | null |
| 2025-04-06 | Enhance Then Search: An Augmentation-Search Strategy with Foundation Models for Cross-Domain Few-Shot Object Detection | Jiancheng Pan et.al. | 2504.04517 | link |
| 2025-04-06 | eKalibr-Stereo: Continuous-Time Spatiotemporal Calibration for Event-Based Stereo Visual Systems | Shuolong Chen et.al. | 2504.04451 | link |
| 2025-04-05 | Autoregressive High-Order Finite Difference Modulo Imaging: High-Dynamic Range for Computer Vision Applications | Brayan Monroy et.al. | 2504.04228 | null |
| 2025-04-05 | An Optimized Density-Based Lane Keeping System for A Cost-Efficient Autonomous Vehicle Platform: AurigaBot V1 | Farbod Younesi et.al. | 2504.04217 | null |
| 2025-04-05 | Learning about the Physical World through Analytic Concepts | Jianhua Sun et.al. | 2504.04170 | null |
| 2025-04-04 | VISTA-OCR: Towards generative and interactive end to end OCR models | Laziz Hamdi et.al. | 2504.03621 | null |
| 2025-04-04 | PF3Det: A Prompted Foundation Feature Assisted Visual LiDAR 3D Detector | Kaidong Li et.al. | 2504.03563 | null |
| 2025-04-04 | ZFusion: An Effective Fuser of Camera and 4D Radar for 3D Object Perception in Autonomous Driving | Sheng Yang et.al. | 2504.03438 | null |
| 2025-04-04 | Infrared bubble recognition in the Milky Way and beyond using deep learning | Shimpei Nishimoto et.al. | 2504.03367 | null |
| 2025-04-04 | Real-Time Roadway Obstacle Detection for Electric Scooters Using Deep Learning and Multi-Sensor Fusion | Zeyang Zheng et.al. | 2504.03171 | null |
| 2025-04-04 | Finding the Reflection Point: Unpadding Images to Remove Data Augmentation Artifacts in Large Open Source Image Datasets for Machine Learning | Lucas Choi et.al. | 2504.03168 | null |
| 2025-04-03 | Attention-Aware Multi-View Pedestrian Tracking | Reef Alturki et.al. | 2504.03047 | null |
| 2025-04-03 | LiDAR-based Object Detection with Real-time Voice Specifications | Anurag Kulkarni et.al. | 2504.02920 | null |
| 2025-04-03 | BOP Challenge 2024 on Model-Based and Model-Free 6D Object Pose Estimation | Van Nguyen Nguyen et.al. | 2504.02812 | link |
| 2025-04-03 | Rip Current Segmentation: A Novel Benchmark and YOLOv8 Baseline Results | Andrei Dumitriu et.al. | 2504.02558 | link |
| 2025-04-03 | Multimodal Fusion and Vision-Language Models: A Survey for Robot Vision | Xiaofeng Han et.al. | 2504.02477 | link |
| 2025-04-03 | CornerPoint3D: Look at the Nearest Corner Instead of the Center | Ruixiao Zhang et.al. | 2504.02464 | null |
| 2025-04-03 | Hyperspectral Remote Sensing Images Salient Object Detection: The First Benchmark Dataset and Baseline | Peifu Liu et.al. | 2504.02416 | null |
| 2025-04-03 | SemiISP/SemiIE: Semi-Supervised Image Signal Processor and Image Enhancement Leveraging One-to-Many Mapping sRGB-to-RAW | Masakazu Yoshimura et.al. | 2504.02345 | null |
| 2025-04-03 | Improving Harmful Text Detection with Joint Retrieval and External Knowledge | Zidong Yu et.al. | 2504.02310 | null |
| 2025-04-03 | LLM-Guided Evolution: An Autonomous Model Optimization for Object Detection | YiMing Yu et.al. | 2504.02280 | null |
| 2025-04-02 | Cat-Eye Inspired Active-Passive-Composite Aperture-Shared Sub-Terahertz Meta-Imager for Non-Interactive Concealed Object Detection | Mingshuang Hu et.al. | 2504.01473 | null |
| 2025-04-02 | CFMD: Dynamic Cross-layer Feature Fusion for Salient Object Detection | Jin Lian et.al. | 2504.01326 | null |
| 2025-04-01 | Enabling Efficient Processing of Spiking Neural Networks with On-Chip Learning on Commodity Neuromorphic Processors for Edge AI Systems | Rachmad Vidya Wicaksana Putra et.al. | 2504.00957 | null |
| 2025-04-01 | NeuRadar: Neural Radiance Fields for Automotive Radar Point Clouds | Mahan Rafidashti et.al. | 2504.00859 | null |
| 2025-04-01 | AttentiveGRU: Recurrent Spatio-Temporal Modeling for Advanced Radar-Based BEV Object Detection | Loveneet Saini et.al. | 2504.00559 | null |
| 2025-04-01 | Archival Faces: Detection of Faces in Digitized Historical Documents | Marek Vaško et.al. | 2504.00558 | null |
| 2025-04-01 | High-Quality Pseudo-Label Generation Based on Visual Prompt Assisted Cloud Model Update | Xinrun Xu et.al. | 2504.00526 | null |
| 2025-04-01 | Intrinsic-feature-guided 3D Object Detection | Wanjing Zhang et.al. | 2504.00382 | null |
| 2025-04-01 | CamoSAM2: Motion-Appearance Induced Auto-Refining Prompts for Video Camouflaged Object Detection | Xin Zhang et.al. | 2504.00375 | null |
| 2025-03-31 | Towards Precise Action Spotting: Addressing Temporal Misalignment in Labels with Dynamic Label Assignment | Masato Tamura et.al. | 2504.00149 | null |
| 2025-03-31 | SU-YOLO: Spiking Neural Network for Efficient Underwater Object Detection | Chenyang Li et.al. | 2503.24389 | link |
| 2025-03-31 | MB-ORES: A Multi-Branch Object Reasoner for Visual Grounding in Remote Sensing | Karim Radouane et.al. | 2503.24219 | link |
| 2025-03-31 | Spectral-Adaptive Modulation Networks for Visual Perception | Guhnoo Yun et.al. | 2503.23947 | null |
| 2025-03-31 | Reliable Traffic Monitoring Using Low-Cost Doppler Radar Units | Mishay Naidoo et.al. | 2503.23926 | null |
| 2025-03-31 | Expanding-and-Shrinking Binary Neural Networks | Xulong Shi et.al. | 2503.23709 | link |
| 2025-03-30 | Beyond Detection: Designing AI-Resilient Assessments with Automated Feedback Tool to Foster Critical Thinking | Muhammad Sajjad Akbar et.al. | 2503.23622 | null |
| 2025-03-30 | Re-Aligning Language to Visual Objects with an Agentic Workflow | Yuming Chen et.al. | 2503.23508 | null |
| 2025-03-30 | EagleVision: Object-level Attribute Multimodal LLM for Remote Sensing | Hongxiang Jiang et.al. | 2503.23330 | link |
| 2025-03-29 | Context in object detection: a systematic literature review | Mahtab Jamali et.al. | 2503.23249 | null |
| 2025-03-29 | Large Self-Supervised Models Bridge the Gap in Domain Adaptive Object Detection | Marc-Antoine Lavoie et.al. | 2503.23220 | null |
| 2025-03-28 | AnnoPage Dataset: Dataset of Non-Textual Elements in Documents with Fine-Grained Categorization | Martin Kišš et.al. | 2503.22526 | null |
| 2025-03-28 | Data Quality Matters: Quantifying Image Quality Impact on Machine Learning Performance | Christian Steinhauser et.al. | 2503.22375 | null |
| 2025-03-28 | ForcePose: A Deep Learning Approach for Force Calculation Based on Action Recognition Using MediaPipe Pose Estimation Combined with Object Detection | Nandakishor M et.al. | 2503.22363 | null |
| 2025-03-28 | SKDU at De-Factify 4.0: Natural Language Features for AI-Generated Text-Detection | Shrikant Malviya et.al. | 2503.22338 | link |
| 2025-03-28 | Knowledge Rectification for Camouflaged Object Detection: Unlocking Insights from Low-Quality Data | Juwei Guan et.al. | 2503.22180 | null |
| 2025-03-28 | A Survey on Remote Sensing Foundation Models: From Vision to Multimodality | Ziyue Huang et.al. | 2503.22081 | null |
| 2025-03-27 | AGILE: A Diffusion-Based Attention-Guided Image and Label Translation for Efficient Cross-Domain Plant Trait Identification | Earl Ranario et.al. | 2503.22019 | link |
| 2025-03-27 | FACETS: Efficient Once-for-all Object Detection via Constrained Iterative Search | Tony Tran et.al. | 2503.21999 | null |
| 2025-03-27 | Exponentially Weighted Instance-Aware Repeat Factor Sampling for Long-Tailed Object Detection Model Training in Unmanned Aerial Vehicles Surveillance Scenarios | Taufiq Ahmed et.al. | 2503.21893 | null |
| 2025-03-27 | Learning Class Prototypes for Unified Sparse Supervised 3D Object Detection | Yun Zhu et.al. | 2503.21099 | link |
| 2025-03-26 | SaViD: Spectravista Aesthetic Vision Integration for Robust and Discerning 3D Object Detection in Challenging Environments | Tanmoy Dam et.al. | 2503.20614 | link |
| 2025-03-26 | Small Object Detection: A Comprehensive Survey on Challenges, Techniques and Real-World Applications | Mahya Nikouei et.al. | 2503.20516 | null |
| 2025-03-25 | Gemini Robotics: Bringing AI into the Physical World | Gemini Robotics Team et.al. | 2503.20020 | null |
| 2025-03-25 | Hyperdimensional Uncertainty Quantification for Multimodal Uncertainty Fusion in Autonomous Vehicles Perception | Luke Chen et.al. | 2503.20011 | null |
| 2025-03-25 | Mind the Gap: Benchmarking Spatial Reasoning in Vision-Language Models | Ilias Stogiannidis et.al. | 2503.19707 | null |
| 2025-03-25 | BiblioPage: A Dataset of Scanned Title Pages for Bibliographic Metadata Extraction | Jan Kohút et.al. | 2503.19658 | link |
| 2025-03-25 | Single Shot AI-assisted quantification of KI-67 proliferation index in breast cancer | Deepti Madurai Muthu et.al. | 2503.19606 | null |
| 2025-03-25 | MATT-GS: Masked Attention-based 3DGS for Robot Perception and Object Detection | Jee Won Lee et.al. | 2503.19330 | null |
| 2025-03-25 | Multiscale Feature Importance-based Bit Allocation for End-to-End Feature Coding for Machines | Junle Liu et.al. | 2503.19278 | null |
| 2025-03-24 | Benchmarking Object Detectors under Real-World Distribution Shifts in Satellite Imagery | Sara Al-Emadi et.al. | 2503.19202 | link |
| 2025-03-24 | Pitch Contour Exploration Across Audio Domains: A Vision-Based Transfer Learning Approach | Jakob Abeßer et.al. | 2503.19161 | null |
| 2025-03-24 | Cooperative Control of Multi-Quadrotors for Transporting Cable-Suspended Payloads: Obstacle-Aware Planning and Event-Based Nonlinear Model Predictive Control | Tohid Kargar Tasooji et.al. | 2503.19135 | null |
| 2025-03-24 | Building Blocks for Robust and Effective Semi-Supervised Real-World Object Detection | Moussa Kassem Sbeyti et.al. | 2503.18903 | null |
| 2025-03-24 | LGI-DETR: Local-Global Interaction for UAV Object Detection | Zifa Chen et.al. | 2503.18785 | null |
| 2025-03-25 | Frequency Dynamic Convolution for Dense Image Prediction | Linwei Chen et.al. | 2503.18783 | link |
| 2025-03-24 | CQ-DINO: Mitigating Gradient Dilution via Category Queries for Vast Vocabulary Object Detection | Zhichao Sun et.al. | 2503.18430 | link |
| 2025-03-24 | Vision-Guided Loco-Manipulation with a Snake Robot | Adarsh Salagame et.al. | 2503.18308 | null |
| 2025-03-23 | Extended Visibility of Autonomous Vehicles via Optimized Cooperative Perception under Imperfect Communication | Ahmad Sarlak et.al. | 2503.18192 | null |
| 2025-03-22 | MAMAT: 3D Mamba-Based Atmospheric Turbulence Removal and its Object Detection Capability | Paul Hill et.al. | 2503.17700 | null |
| 2025-03-22 | Sense4FL: Vehicular Crowdsensing Enhanced Federated Learning for Autonomous Driving | Yanan Ma et.al. | 2503.17697 | null |
| 2025-03-21 | Should we pre-train a decoder in contrastive learning for dense prediction tasks? | Sébastien Quetin et.al. | 2503.17526 | null |
| 2025-03-21 | Event-Based Crossing Dataset (EBCD) | Joey Mulé et.al. | 2503.17499 | null |
| 2025-03-21 | An Iterative Feedback Mechanism for Improving Natural Language Class Descriptions in Open-Vocabulary Object Detection | Louis Y. Kim et.al. | 2503.17285 | null |
| 2025-03-21 | Which2comm: An Efficient Collaborative Perception Framework for 3D Object Detection | Duanrui Yu et.al. | 2503.17175 | null |
| 2025-03-21 | Hi-ALPS – An Experimental Robustness Quantification of Six LiDAR-based Object Detection Systems for Autonomous Driving | Alexandra Arzberger et.al. | 2503.17168 | null |
| 2025-03-21 | R-LiViT: A LiDAR-Visual-Thermal Dataset Enabling Vulnerable Road User Focused Roadside Perception | Jonas Mirlach et.al. | 2503.17122 | null |
| 2025-03-21 | Exploring Few-Shot Object Detection on Blood Smear Images: A Case Study of Leukocytes and Schistocytes | Davide Antonio Mura et.al. | 2503.17107 | null |
| 2025-03-21 | R2LDM: An Efficient 4D Radar Super-Resolution Framework Leveraging Diffusion Model | Boyuan Zheng et.al. | 2503.17097 | null |
| 2025-03-21 | Superpowering Open-Vocabulary Object Detectors for X-ray Vision | Pablo Garcia-Fernandez et.al. | 2503.17071 | link |
| 2025-03-21 | Scoring, Remember, and Reference: Catching Camouflaged Objects in Videos | Yuang Feng et.al. | 2503.17050 | null |
| 2025-03-21 | Salient Object Detection in Traffic Scene through the TSOD10K Dataset | Yu Qiu et.al. | 2503.16910 | null |
| 2025-03-21 | Seg2Box: 3D Object Detection by Point-Wise Semantics Supervision | Maoji Zheng et.al. | 2503.16811 | null |
| 2025-03-20 | RESFL: An Uncertainty-Aware Framework for Responsible Federated Learning by Balancing Privacy, Fairness and Utility in Autonomous Vehicles | Dawood Wasif et.al. | 2503.16251 | null |
| 2025-03-20 | MapGlue: Multimodal Remote Sensing Image Matching | Peihao Wu et.al. | 2503.16185 | null |
| 2025-03-20 | Uncertainty Meets Diversity: A Comprehensive Active Learning Framework for Indoor 3D Object Detection | Jiangyi Wang et.al. | 2503.16125 | null |
| 2025-03-20 | Semantic-Guided Global-Local Collaborative Networks for Lightweight Image Super-Resolution | Wanshu Fan et.al. | 2503.16056 | null |
| 2025-03-19 | A Context-Driven Training-Free Network for Lightweight Scene Text Segmentation and Recognition | Ritabrata Chakraborty et.al. | 2503.15639 | null |
| 2025-03-19 | DCA: Dividing and Conquering Amnesia in Incremental Object Detection | Aoting Zhang et.al. | 2503.15295 | null |
| 2025-03-19 | Test-Time Backdoor Detection for Object Detection Models | Hangtao Zhang et.al. | 2503.15293 | null |
| 2025-03-19 | GO-N3RDet: Geometry Optimized NeRF-enhanced 3D Object Detector | Zechuan Li et.al. | 2503.15211 | null |
| 2025-03-19 | UltraFlwr – An Efficient Federated Medical and Surgical Object Detection Framework | Yang Li et.al. | 2503.15161 | null |
| 2025-03-19 | An Investigation of Beam Density on LiDAR Object Detection Performance | Christoph Griesbacher et.al. | 2503.15087 | null |
| 2025-03-19 | SPADE: Systematic Prompt Framework for Automated Dialogue Expansion in Machine-Generated Text Detection | Haoyi Li et.al. | 2503.15044 | link |
| 2025-03-19 | Fine-Grained Open-Vocabulary Object Detection with Fined-Grained Prompts: Task, Dataset and Benchmark | Ying Liu et.al. | 2503.14862 | null |
| 2025-03-19 | State Space Model Meets Transformer: A New Paradigm for 3D Object Detection | Chuxin Wang et.al. | 2503.14493 | null |
| 2025-03-18 | Panoramic Distortion-Aware Tokenization for Person Detection and Localization Using Transformers in Overhead Fisheye Images | Nobuhiko Wakai et.al. | 2503.14228 | null |
| 2025-03-18 | A Revisit to the Decoder for Camouflaged Object Detection | Seung Woo Ko et.al. | 2503.14035 | null |
| 2025-03-18 | Shift, Scale and Rotation Invariant Multiple Object Detection using Balanced Joint Transform Correlator | Xi Shen et.al. | 2503.14034 | null |
| 2025-03-18 | LEGNet: Lightweight Edge-Gaussian Driven Network for Low-Quality Remote Sensing Image Object Detection | Wei Lu et.al. | 2503.14012 | link |
| 2025-03-18 | FrustumFusionNets: A Three-Dimensional Object Detection Network Based on Tractor Road Scene | Lili Yang et.al. | 2503.13951 | null |
| 2025-03-18 | Is Discretization Fusion All You Need for Collaborative Perception? | Kang Yang et.al. | 2503.13946 | link |
| 2025-03-18 | PSA-SSL: Pose and Size-aware Self-Supervised Learning on LiDAR Point Clouds | Barza Nisar et.al. | 2503.13914 | link |
| 2025-03-18 | HSOD-BIT-V2: A New Challenging Benchmark for Hyperspectral Salient Object Detection | Yuhao Qiu et.al. | 2503.13906 | null |
| 2025-03-18 | TGBFormer: Transformer-GraphFormer Blender Network for Video Object Detection | Qiang Qi et.al. | 2503.13903 | null |
| 2025-03-17 | Beyond RGB: Adaptive Parallel Processing for RAW Object Detection | Shani Gamrian et.al. | 2503.13163 | null |
| 2025-03-17 | Who Wrote This? Identifying Machine vs Human-Generated Text in Hausa | Babangida Sani et.al. | 2503.13101 | link |
| 2025-03-17 | SparseAlign: A Fully Sparse Framework for Cooperative Object Detection | Yunshuang Yuan et.al. | 2503.12982 | null |
| 2025-03-17 | Efficient Multimodal 3D Object Detector via Instance-Level Contrastive Distillation | Zhuoqun Su et.al. | 2503.12914 | null |
| 2025-03-16 | Point Cloud Based Scene Segmentation: A Survey | Dan Halperin et.al. | 2503.12595 | null |
| 2025-03-16 | GeoRSMLLM: A Multimodal Large Language Model for Vision-Language Tasks in Geoscience and Remote Sensing | Zilun Zhang et.al. | 2503.12490 | null |
| 2025-03-16 | Deepfake Detection with Optimized Hybrid Model: EAR Biometric Descriptor via Improved RCNN | Ruchika Sharma et.al. | 2503.12381 | null |
| 2025-03-15 | An Efficient Deep Learning-Based Approach to Automating Invoice Document Validation | Aziz Amari et.al. | 2503.12267 | null |
| 2025-03-15 | Minuscule Cell Detection in AS-OCT Images with Progressive Field-of-View Focusing | Boyu Chen et.al. | 2503.12249 | null |
| 2025-03-15 | SFMNet: Sparse Focal Modulation for 3D Object Detection | Oren Shrout et.al. | 2503.12093 | null |
| 2025-03-14 | FLASHμ: Fast Localizing And Sizing of Holographic Microparticles | Ayush Paliwal et.al. | 2503.11538 | null |
| 2025-03-14 | Falcon: A Remote Sensing Vision-Language Foundation Model | Kelu Yao et.al. | 2503.11070 | null |
| 2025-03-14 | FMNet: Frequency-Assisted Mamba-Like Linear Attention Network for Camouflaged Object Detection | Ming Deng et.al. | 2503.11030 | null |
| 2025-03-14 | Comparative Analysis of Advanced AI-based Object Detection Models for Pavement Marking Quality Assessment during Daytime | Gian Antariksa et.al. | 2503.11008 | null |
| 2025-03-14 | Cyclic Contrastive Knowledge Transfer for Open-Vocabulary Object Detection | Chuhan Zhang et.al. | 2503.11005 | null |
| 2025-03-14 | Enhanced Multi-View Pedestrian Detection Using Probabilistic Occupancy Volume | Reef Alturki et.al. | 2503.10982 | null |
| 2025-03-13 | The Power of One: A Single Example is All it Takes for Segmentation in VLMs | Mir Rayat Imtiaz Hossain et.al. | 2503.10779 | null |
| 2025-03-13 | HeightFormer: Learning Height Prediction in Voxel Features for Roadside Vision Centric 3D Object Detection via Transformer | Zhang Zhang et.al. | 2503.10777 | null |
| 2025-03-13 | Semantic-Supervised Spatial-Temporal Fusion for LiDAR-based 3D Object Detection | Chaoqun Wang et.al. | 2503.10579 | null |
| 2025-03-13 | RoCo-Sim: Enhancing Roadside Collaborative Perception through Foreground Simulation | Yuwen Du et.al. | 2503.10410 | link |
| 2025-03-13 | RoMA: Scaling up Mamba-based Foundation Models for Remote Sensing | Fengxiang Wang et.al. | 2503.10392 | link |
| 2025-03-13 | Object detection characteristics in a learning factory environment using YOLOv8 | Toni Schneidereit et.al. | 2503.10356 | null |
| 2025-03-13 | TARS: Traffic-Aware Radar Scene Flow Estimation | Jialong Wu et.al. | 2503.10210 | null |
| 2025-03-13 | A Hierarchical Semantic Distillation Framework for Open-Vocabulary Object Detection | Shenghao Fu et.al. | 2503.10152 | link |
| 2025-03-13 | Deep Learning-Based Direct Leaf Area Estimation using Two RGBD Datasets for Model Development | Namal Jayasuriya et.al. | 2503.10129 | null |
| 2025-03-13 | Style Evolving along Chain-of-Thought for Unknown-Domain Object Detection | Zihao Zhang et.al. | 2503.09968 | null |
| 2025-03-12 | CleverDistiller: Simple and Spatially Consistent Cross-modal Distillation | Hariprasath Govindarajan et.al. | 2503.09878 | null |
| 2025-03-12 | How good are deep learning methods for automated road safety analysis using video data? An experimental study | Qingwu Liu et.al. | 2503.09807 | null |
| 2025-03-12 | Deep Learning for Climate Action: Computer Vision Analysis of Visual Narratives on X | Katharina Prasse et.al. | 2503.09361 | null |
| 2025-03-12 | Fully-Synthetic Training for Visual Quality Inspection in Automotive Production | Christoph Huber et.al. | 2503.09354 | null |
| 2025-03-12 | DitHub: A Modular Framework for Incremental Open-Vocabulary Object Detection | Chiara Cappellino et.al. | 2503.09271 | null |
| 2025-03-12 | Polygonizing Roof Segments from High-Resolution Aerial Images Using Yolov8-Based Edge Detection | Qipeng Mei et.al. | 2503.09187 | null |
| 2025-03-12 | RFUAV: A Benchmark Dataset for Unmanned Aerial Vehicle Detection and Identification | Rui Shi et.al. | 2503.09033 | link |
| 2025-03-12 | Dual-Domain Homogeneous Fusion with Cross-Modal Mamba and Progressive Decoder for 3D Object Detection | Xuzhong Hu et.al. | 2503.08992 | null |
| 2025-03-11 | GBlobs: Explicit Local Structure via Gaussian Blobs for Improved Cross-Domain LiDAR-based 3D Object Detection | Dušan Malić et.al. | 2503.08639 | null |
| 2025-03-11 | Referring to Any Person | Qing Jiang et.al. | 2503.08507 | link |
| 2025-03-11 | SuperCap: Multi-resolution Superpixel-based Image Captioning | Henry Senior et.al. | 2503.08496 | null |
| 2025-03-13 | Learning to Detect Objects from Multi-Agent LiDAR Scans without Manual Labels | Qiming Xia et.al. | 2503.08421 | null |
| 2025-03-11 | Embodied Crowd Counting | Runling Long et.al. | 2503.08367 | null |
| 2025-03-11 | Physics-based AI methodology for Material Parameter Extraction from Optical Data | M. Koumans et.al. | 2503.08183 | null |
| 2025-03-11 | Bring Remote Sensing Object Detect Into Nature Language Model: Using SFT Method | Fei Wang et.al. | 2503.08144 | null |
| 2025-03-11 | Accelerate 3D Object Detection Models via Zero-Shot Attention Key Pruning | Lizhen Xu et.al. | 2503.08101 | link |
| 2025-03-11 | SparseVoxFormer: Sparse Voxel-based Transformer for Multi-modal 3D Object Detection | Hyeongseok Son et.al. | 2503.08092 | null |
| 2025-03-11 | Simulating Automotive Radar with Lidar and Camera Inputs | Peili Song et.al. | 2503.08068 | null |
| 2025-03-10 | YOLOE: Real-Time Seeing Anything | Ao Wang et.al. | 2503.07465 | link |
| 2025-03-10 | HGO-YOLO: Advancing Anomaly Behavior Detection with Hierarchical Features and Lightweight Optimized Detection | Qizhi Zheng et.al. | 2503.07371 | null |
| 2025-03-10 | Mitigating Hallucinations in YOLO-based Object Detection Models: A Revisit to Out-of-Distribution Detection | Weicheng He et.al. | 2503.07330 | null |
| 2025-03-10 | Semantic Communications with Computer Vision Sensing for Edge Video Transmission | Yubo Peng et.al. | 2503.07252 | null |
| 2025-03-10 | MIRAM: Masked Image Reconstruction Across Multiple Scales for Breast Lesion Risk Prediction | Hung Q. Vo et.al. | 2503.07157 | null |
| 2025-03-10 | A Light Perspective for 3D Object Detection | Marcelo Eduardo Pederiva et.al. | 2503.07133 | null |
| 2025-03-10 | SimROD: A Simple Baseline for Raw Object Detection with Global and Local Enhancements | Haiyang Xie et.al. | 2503.07101 | link |
| 2025-03-10 | RS2V-L: Vehicle-Mounted LiDAR Data Generation from Roadside Sensor Observations | Ruidan Xing et.al. | 2503.07085 | null |
| 2025-03-10 | Availability-aware Sensor Fusion via Unified Canonical Space for 4D Radar, LiDAR, and Camera | Dong-Hee Paek et.al. | 2503.07029 | null |
| 2025-03-10 | Large Language Model Guided Progressive Feature Alignment for Multimodal UAV Object Detection | Wentao Wu et.al. | 2503.06948 | null |
| 2025-03-06 | Collaborative Evaluation of Deepfake Text with Deliberation-Enhancing Dialogue Systems | Jooyoung Lee et.al. | 2503.04945 | null |
| 2025-03-06 | Fine-Tuning Florence2 for Enhanced Object Detection in Un-constructed Environments: Vision-Language Model Approach | Soumyadeep Ro et.al. | 2503.04918 | null |
| 2025-03-06 | Floxels: Fast Unsupervised Voxel Based Scene Flow Estimation | David T. Hoffmann et.al. | 2503.04718 | null |
| 2025-03-06 | DEAL-YOLO: Drone-based Efficient Animal Localization using YOLO | Aditya Prashant Naidu et.al. | 2503.04698 | null |
| 2025-03-06 | Teach YOLO to Remember: A Self-Distillation Approach for Continual Object Detection | Riccardo De Monte et.al. | 2503.04688 | null |
| 2025-03-06 | ReynoldsFlow: Exquisite Flow Estimation via Reynolds Transport Theorem | Yu-Hsi Chen et.al. | 2503.04500 | link |
| 2025-03-06 | A lightweight model FDM-YOLO for small target improvement based on YOLOv8 | Xuerui Zhang et.al. | 2503.04452 | null |
| 2025-03-06 | Shaken, Not Stirred: A Novel Dataset for Visual Understanding of Glasses in Human-Robot Bartending Tasks | Lukáš Gajdošech et.al. | 2503.04308 | null |
| 2025-03-06 | CA-W3D: Leveraging Context-Aware Knowledge for Weakly Supervised Monocular 3D Detection | Chupeng Liu et.al. | 2503.04154 | null |
| 2025-03-06 | Robust Computer-Vision based Construction Site Detection for Assistive-Technology Applications | Junchi Feng et.al. | 2503.04139 | null |
| 2025-03-06 | Fractional Correspondence Framework in Detection Transformer | Masoumeh Zareapoor et.al. | 2503.04107 | null |
| 2025-03-05 | DualDiff+: Dual-Branch Diffusion for High-Fidelity Video Generation with Reward Guidance | Zhao Yang et.al. | 2503.03689 | link |
| 2025-03-05 | 4D Radar Ground Truth Augmentation with LiDAR-to-4D Radar Data Synthesis | Woo-Jin Jung et.al. | 2503.03637 | null |
| 2025-03-05 | Feature-Level Insights into Artificial Text Detection with Sparse Autoencoders | Kristian Kuznetsov et.al. | 2503.03601 | null |
| 2025-03-05 | Simulation-Based Performance Evaluation of 3D Object Detection Methods with Deep Learning for a LiDAR Point Cloud Dataset in a SOTIF-related Use Case | Milin Patel et.al. | 2503.03548 | link |
| 2025-03-05 | AI-Driven Multi-Stage Computer Vision System for Defect Detection in Laser-Engraved Industrial Nameplates | Adhish Anitha Vilasan et.al. | 2503.03395 | null |
| 2025-03-05 | MIAdapt: Source-free Few-shot Domain Adaptive Object Detection for Microscopic Images | Nimra Dilawar et.al. | 2503.03370 | null |
| 2025-03-05 | Automated Attendee Recognition System for Large-Scale Social Events or Conference Gathering | Dhruv Motwani et.al. | 2503.03330 | null |
| 2025-03-05 | BEVMOSNet: Multimodal Fusion for BEV Moving Object Segmentation | Hiep Truong Cong et.al. | 2503.03280 | null |
| 2025-03-05 | Find Matching Faces Based On Face Parameters | Setu A. Bhatt et.al. | 2503.03204 | null |
| 2025-03-04 | Revolutionizing Traffic Management with AI-Powered Machine Vision: A Step Toward Smart Cities | Seyed Hossein Hosseini DolatAbadi et.al. | 2503.02967 | null |
| 2025-03-04 | Class-Aware PillarMix: Can Mixed Sample Data Augmentation Enhance 3D Object Detection with Radar Point Clouds? | Miao Zhang et.al. | 2503.02687 | null |
| 2025-03-04 | Exploring Model Quantization in GenAI-based Image Inpainting and Detection of Arable Plants | Sourav Modak et.al. | 2503.02420 | null |
| 2025-03-04 | Robust detection of overlapping bioacoustic sound events | Louis Mahon et.al. | 2503.02389 | null |
| 2025-03-04 | YOLO-PRO: Enhancing Instance-Specific Object Detection with Full-Channel Global Self-Attention | Lin Huang et.al. | 2503.02348 | null |
| 2025-03-04 | SSNet: Saliency Prior and State Space Model-based Network for Salient Object Detection in RGB-D Images | Gargi Panda et.al. | 2503.02270 | null |
| 2025-03-03 | Generalized Diffusion Detector: Mining Robust Features from Diffusion Models for Domain-Generalized Detection | Boyong He et.al. | 2503.02101 | null |
| 2025-03-03 | Uncertainty Representation in a SOTIF-Related Use Case with Dempster-Shafer Theory for LiDAR Sensor-Based Object Detection | Milin Patel et.al. | 2503.02087 | link |
| 2025-03-03 | Visual-RFT: Visual Reinforcement Fine-Tuning | Ziyu Liu et.al. | 2503.01785 | link |
| 2025-03-03 | Enhancing Object Detection Accuracy in Underwater Sonar Images through Deep Learning-based Denoising | Ziyu Wang et.al. | 2503.01655 | null |
| 2025-03-03 | Evaluating Stenosis Detection with Grounding DINO, YOLO, and DINO-DETR | Muhammad Musab Ansari et.al. | 2503.01601 | null |
| 2025-02-28 | The Common Objects Underwater (COU) Dataset for Robust Underwater Object Detection | Rishi Mukherjee et.al. | 2502.20651 | null |
| 2025-02-28 | RTGen: Real-Time Generative Detection Transformer | Chi Ruan et.al. | 2502.20622 | null |
| 2025-02-28 | LV-DOT: LiDAR-visual dynamic obstacle detection and tracking for autonomous robot navigation | Zhefan Xu et.al. | 2502.20607 | null |
| 2025-02-27 | Multi-Scale Neighborhood Occupancy Masked Autoencoder for Self-Supervised Learning in LiDAR Point Clouds | Mohamed Abdelsamad et.al. | 2502.20316 | null |
| 2025-02-27 | OverLoCK: An Overview-first-Look-Closely-next ConvNet with Context-Mixing Dynamic Kernels | Meng Lou et.al. | 2502.20087 | link |
| 2025-02-27 | Night-Voyager: Consistent and Efficient Nocturnal Vision-Aided State Estimation in Object Maps | Tianxiao Gao et.al. | 2502.20054 | null |
| 2025-02-27 | Learning Mask Invariant Mutual Information for Masked Image Modeling | Tao Huang et.al. | 2502.19718 | null |
| 2025-02-27 | BEVDiffuser: Plug-and-Play Diffusion Model for BEV Denoising with Ground-Truth Guidance | Xin Ye et.al. | 2502.19694 | null |
| 2025-02-26 | Ev-3DOD: Pushing the Temporal Boundaries of 3D Object Detection with Event Cameras | Hoonhee Cho et.al. | 2502.19630 | link |
| 2025-02-26 | Is Your Paper Being Reviewed by an LLM? A New Benchmark Dataset and Approach for Detecting AI Text in Peer Review | Sungduk Yu et.al. | 2502.19614 | null |
| 2025-02-23 | Rewards-based image analysis in microscopy | Kamyar Barakati et.al. | 2502.18522 | null |
| 2025-02-25 | Multi-Perspective Data Augmentation for Few-shot Object Detection | Anh-Khoa Nguyen Vu et.al. | 2502.18195 | null |
| 2025-02-25 | Progressive Local Alignment for Medical Multimodal Pre-training | Huimin Yan et.al. | 2502.18047 | null |
| 2025-02-25 | Automatic Vehicle Detection using DETR: A Transformer-Based Approach for Navigating Treacherous Roads | Istiaq Ahmed Fahad et.al. | 2502.17843 | null |
| 2025-02-24 | Semi-Supervised Weed Detection in Vegetable Fields: In-domain and Cross-domain Experiments | Boyang Deng et.al. | 2502.17673 | null |
| 2025-02-24 | Experimental validation of UAV search and detection system in real wilderness environment | Stella Dumenčić et.al. | 2502.17372 | null |
| 2025-02-24 | LCV2I: Communication-Efficient and High-Performance Collaborative Perception Framework with Low-Resolution LiDAR | Xinxin Feng et.al. | 2502.17039 | null |
| 2025-02-24 | Sarang at DEFACTIFY 4.0: Detecting AI-Generated Text Using Noised Data and an Ensemble of DeBERTa Models | Avinash Trivedi et.al. | 2502.16857 | null |
| 2025-02-23 | Geometry-Aware 3D Salient Object Detection Network | Chen Wang et.al. | 2502.16488 | null |
| 2025-02-26 | MQADet: A Plug-and-Play Paradigm for Enhancing Open-Vocabulary Object Detection via Multimodal Question Answering | Caixiong Li et.al. | 2502.16486 | null |
| 2025-02-23 | Cross-domain Few-shot Object Detection with Multi-modal Textual Enrichment | Zeyu Shangguan et.al. | 2502.16469 | null |
| 2025-02-23 | Deep learning approaches to surgical video segmentation and object detection: A Scoping Review | Devanish N. Kamtam et.al. | 2502.16459 | null |
| 2025-02-22 | FeatSharp: Your Vision Model Features, Sharper | Mike Ranzinger et.al. | 2502.16025 | link |
| 2025-02-21 | Generative AI Framework for 3D Object Generation in Augmented Reality | Majid Behravan et.al. | 2502.15869 | null |
| 2025-02-21 | Machine-generated text detection prevents language model collapse | George Drayson et.al. | 2502.15654 | link |
| 2025-02-21 | Depth-aware Fusion Method based on Image and 4D Radar Spectrum for 3D Object Detection | Yue Sun et.al. | 2502.15516 | null |
| 2025-02-21 | Q-PETR: Quant-aware Position Embedding Transformation for Multi-View 3D Object Detection | Jiangyong Yu et.al. | 2502.15488 | null |
| 2025-02-21 | PFSD: A Multi-Modal Pedestrian-Focus Scene Dataset for Rich Tasks in Semi-Structured Environments | Yueting Liu et.al. | 2502.15342 | null |
| 2025-02-20 | Synth It Like KITTI: Synthetic Data Generation for Object Detection in Driving Scenarios | Richard Marcus et.al. | 2502.15076 | null |
| 2025-02-20 | YOLOv12: A Breakdown of the Key Architectural Features | Mujadded Al Rabbani Alif et.al. | 2502.14740 | null |
| 2025-02-20 | LXLv2: Enhanced LiDAR Excluded Lean 3D Object Detection with Fusion of 4D Radar and Camera | Weiyi Xiong et.al. | 2502.14503 | null |
| 2025-02-20 | ODVerse33: Is the New YOLO Version Always Better? A Multi Domain benchmark from YOLO v5 to v11 | Tianyou Jiang et.al. | 2502.14314 | null |
| 2025-02-19 | PedDet: Adaptive Spectral Optimization for Multimodal Pedestrian Detection | Rui Zhao et.al. | 2502.14063 | link |
| 2025-02-19 | Image compositing is all you need for data augmentation | Ang Jia Ning Shermaine et.al. | 2502.13936 | null |
| 2025-02-19 | MSVCOD: A Large-Scale Multi-Scene Dataset for Video Camouflage Object Detection | Shuyong Gao et.al. | 2502.13859 | null |
| 2025-02-19 | An Overall Real-Time Mechanism for Classification and Quality Evaluation of Rice | Wanke Xia et.al. | 2502.13764 | null |
| 2025-02-18 | Multiple Distribution Shift – Aerial (MDS-A): A Dataset for Test-Time Error Detection and Model Adaptation | Noel Ngu et.al. | 2502.13289 | null |
| 2025-02-18 | RobuRCDet: Enhancing Robustness of Radar-Camera Fusion in Bird’s Eye View for 3D Object Detection | Jingtong Yue et.al. | 2502.13071 | null |
| 2025-02-18 | Task-Oriented Semantic Communication for Stereo-Vision 3D Object Detection | Zijian Cao et.al. | 2502.12735 | null |
| 2025-02-18 | Iron Sharpens Iron: Defending Against Attacks in Machine-Generated Text Detection with Adversarial Training | Yuanfan Li et.al. | 2502.12734 | null |
| 2025-02-18 | DAMamba: Vision State Space Model with Dynamic Adaptive Scan | Tanzhe Li et.al. | 2502.12627 | null |
| 2025-02-18 | Who Writes What: Unveiling the Impact of Author Roles on AI-generated Text Detection | Jiatao Li et.al. | 2502.12611 | null |
| 2025-02-18 | Gaseous Object Detection | Kailai Zhou et.al. | 2502.12415 | null |
| 2025-02-17 | AI-generated Text Detection with a GLTR-based Approach | Lucía Yan Wu et.al. | 2502.12064 | link |
| 2025-02-17 | Enhancing Transparent Object Pose Estimation: A Fusion of GDR-Net and Edge Detection | Tessa Pulli et.al. | 2502.12027 | null |
| 2025-02-17 | ExaGPT: Example-Based Machine-Generated Text Detection for Human Interpretability | Ryuto Koike et.al. | 2502.11336 | null |
| 2025-02-16 | DAViMNet: SSMs-Based Domain Adaptive Object Detection | A. Enes Doruk et.al. | 2502.11178 | null |
| 2025-02-15 | CLoCKDistill: Consistent Location-and-Context-aware Knowledge Distillation for DETRs | Qizhen Lan et.al. | 2502.10683 | null |
| 2025-02-14 | Text-guided Sparse Voxel Pruning for Efficient 3D Visual Grounding | Wenxuan Guo et.al. | 2502.10392 | link |
| 2025-02-14 | Object Detection and Tracking | Md Pranto et.al. | 2502.10310 | null |
| 2025-02-14 | Artificial Intelligence to Assess Dental Findings from Panoramic Radiographs – A Multinational Study | Yin-Chih Chelsea Wang et.al. | 2502.10277 | null |
| 2025-02-13 | Instance Segmentation of Scene Sketches Using Natural Image Priors | Mia Tang et.al. | 2502.09608 | null |
| 2025-02-13 | Wholly-WOOD: Wholly Leveraging Diversified-quality Labels for Weakly-supervised Oriented Object Detection | Yi Yu et.al. | 2502.09471 | link |
| 2025-02-13 | Mitigating the Impact of Prominent Position Shift in Drone-based RGBT Object Detection | Yan Zhang et.al. | 2502.09311 | null |
| 2025-02-13 | Billet Number Recognition Based on Test-Time Adaptation | Yuan Wei et.al. | 2502.09026 | null |
| 2025-02-12 | Uncertainty Aware Human-machine Collaboration in Camouflaged Object Detection | Ziyue Yang et.al. | 2502.08373 | link |
| 2025-02-12 | Modification and Generated-Text Detection: Achieving Dual Detection Capabilities for the Outputs of LLM by Watermark | Yuhang Cai et.al. | 2502.08332 | null |
| 2025-02-12 | Plantation Monitoring Using Drone Images: A Dataset and Performance Review | Yashwanth Karumanchi et.al. | 2502.08233 | null |
| 2025-02-12 | Take What You Need: Flexible Multi-Task Semantic Communications with Channel Adaptation | Xiang Chen et.al. | 2502.08221 | null |
| 2025-02-13 | SARChat-Bench-2M: A Multi-Task Vision-Language Benchmark for SAR Image Interpretation | Zhiming Ma et.al. | 2502.08168 | link |
| 2025-02-12 | Knowledge Swapping via Learning and Unlearning | Mingyu Xing et.al. | 2502.08075 | null |
| 2025-02-11 | Visual-based spatial audio generation system for multi-speaker environments | Xiaojing Liu et.al. | 2502.07538 | null |
| 2025-02-11 | Quantitative Analysis of Objects in Prisoner Artworks | Thea Christoffersen et.al. | 2502.07440 | null |
| 2025-02-11 | Fast-COS: A Fast One-Stage Object Detector Based on Reparameterized Attention Vision Transformer for Autonomous Driving | Novendra Setyawan et.al. | 2502.07417 | null |
| 2025-02-11 | Multi-Task-oriented Nighttime Haze Imaging Enhancer for Vision-driven Measurement Systems | Ai Chen et.al. | 2502.07351 | link |
| 2025-02-11 | SparseFormer: Detecting Objects in HRW Shots via Sparse Vision Transformer | Wenxi Li et.al. | 2502.07216 | null |
| 2025-02-11 | Dense Object Detection Based on De-homogenized Queries | Yueming Huang et.al. | 2502.07194 | null |
| 2025-02-11 | Foreign-Object Detection in High-Voltage Transmission Line Based on Improved YOLOv8m | Zhenyue Wang et.al. | 2502.07175 | null |
| 2025-02-11 | A Survey on Mamba Architecture for Vision Applications | Fady Ibrahim et.al. | 2502.07161 | null |
| 2025-02-10 | Multimodal Search on a Line | Jared Coleman et.al. | 2502.07000 | null |
| 2025-02-10 | AgilePilot: DRL-Based Drone Agent for Real-Time Motion Planning in Dynamic Environments by Leveraging Object Detection | Roohan Ahmed Khan et.al. | 2502.06725 | null |
| 2025-02-10 | EdgeMLBalancer: A Self-Adaptive Approach for Dynamic Model Switching on Resource-Constrained Edge Devices | Akhila Matathammal et.al. | 2502.06493 | null |
| 2025-02-10 | PLATTER: A Page-Level Handwritten Text Recognition System for Indic Scripts | Badri Vishal Kasuba et.al. | 2502.06172 | null |
| 2025-02-10 | Enhancing Document Key Information Localization Through Data Augmentation | Yue Dai et.al. | 2502.06132 | null |
| 2025-02-10 | Improved YOLOv5s model for key components detection of power transmission lines | Chen Chen et.al. | 2502.06127 | null |
| 2025-02-10 | A Novel Multi-Teacher Knowledge Distillation for Real-Time Object Detection using 4D Radar | Seung-Hyun Song et.al. | 2502.06114 | null |
| 2025-02-09 | Training-free Anomaly Event Detection via LLM-guided Symbolic Pattern Discovery | Yuhui Zeng et.al. | 2502.05843 | null |
| 2025-02-08 | Demystifying Catastrophic Forgetting in Two-Stage Incremental Object Detector | Qirui Wu et.al. | 2502.05540 | null |
| 2025-02-07 | Invizo: Arabic Handwritten Document Optical Character Recognition Solution | Alhossien Waly et.al. | 2502.05277 | null |
| 2025-02-07 | LP-DETR: Layer-wise Progressive Relations for Object Detection | Zhengjian Kang et.al. | 2502.05147 | null |
| 2025-02-07 | Counting Fish with Temporal Representations of Sonar Video | Kai Van Brunt et.al. | 2502.05129 | null |
| 2025-02-07 | DetVPCC: RoI-based Point Cloud Sequence Compression for 3D Object Detection | Mingxuan Yan et.al. | 2502.04804 | null |
| 2025-02-07 | MHAF-YOLO: Multi-Branch Heterogeneous Auxiliary Fusion YOLO for accurate object detection | Zhiqiang Yang et.al. | 2502.04656 | link |
| 2025-02-07 | AIQViT: Architecture-Informed Post-Training Quantization for Vision Transformers | Runqing Jiang et.al. | 2502.04628 | null |
| 2025-02-06 | An Optimized YOLOv5 Based Approach For Real-time Vehicle Detection At Road Intersections Using Fisheye Cameras | Md. Jahin Alam et.al. | 2502.04566 | null |
| 2025-02-06 | Group-Adaptive Threshold Optimization for Robust AI-Generated Text Detection | Minseok Jung et.al. | 2502.04528 | null |
| 2025-02-06 | OneTrack-M: A multitask approach to transformer-based MOT models | Luiz C. S. de Araujo et.al. | 2502.04478 | null |
| 2025-02-07 | Point2RBox-v2: Rethinking Point-supervised Oriented Object Detection with Spatial Layout Among Instances | Yi Yu et.al. | 2502.04268 | null |
| 2025-02-06 | An object detection approach for lane change and overtake detection from motion profiles | Andrea Benericetti et.al. | 2502.04244 | null |
| 2025-02-06 | YOLOv4: A Breakthrough in Real-Time Object Detection | Athulya Sundaresan Geetha et.al. | 2502.04161 | null |
| 2025-02-06 | Advanced Object Detection and Pose Estimation with Hybrid Task Cascade and High-Resolution Networks | Yuhui Jin et.al. | 2502.03877 | null |
| 2025-02-06 | Pursuing Better Decision Boundaries for Long-Tailed Object Detection via Category Information Amount | Yanbiao Ma et.al. | 2502.03852 | null |
| 2025-02-06 | Single-Domain Generalized Object Detection by Balancing Domain Diversity and Invariance | Zhenwei He et.al. | 2502.03835 | null |
| 2025-02-06 | UAV Cognitive Semantic Communications Enabled by Knowledge Graph for Robust Object Detection | Xi Song et.al. | 2502.03761 | null |
| 2025-02-06 | RAMOTS: A Real-Time System for Aerial Multi-Object Tracking based on Deep Learning and Big Data Technology | Nhat-Tan Do et.al. | 2502.03760 | null |
| 2025-02-05 | An Empirical Study of Methods for Small Object Detection from Satellite Imagery | Xiaohui Yuan et.al. | 2502.03674 | null |
| 2025-02-05 | Gompertz Linear Units: Leveraging Asymmetry for Enhanced Learning Dynamics | Indrashis Das et.al. | 2502.03654 | link |
| 2025-02-05 | RoboGrasp: A Universal Grasping Policy for Robust Robotic Control | Yiqi Huang et.al. | 2502.03072 | null |
| 2025-02-05 | Enhancing Quantum-ready QUBO-based Suppression for Object Detection with Appearance and Confidence Features | Keiichiro Yamamura et.al. | 2502.02895 | null |
| 2025-02-05 | RS-YOLOX: A High Precision Detector for Object Detection in Satellite Remote Sensing Images | Lei Yang et.al. | 2502.02850 | null |
| 2025-02-04 | Learning the RoPEs: Better 2D and 3D Position Encodings with STRING | Connor Schenck et.al. | 2502.02562 | null |
| 2025-02-04 | Uncertainty Quantification for Collaborative Object Detection Under Adversarial Attacks | Huiqun Huang et.al. | 2502.02537 | null |
| 2025-02-04 | Improving Generalization Ability for 3D Object Detection by Learning Sparsity-invariant Features | Hsin-Cheng Lu et.al. | 2502.02322 | null |
| 2025-02-04 | From Fog to Failure: How Dehazing Can Harm Clear Image Object Detection | Ashutosh Kumar et.al. | 2502.02027 | null |
| 2025-02-04 | Memory Efficient Transformer Adapter for Dense Predictions | Dong Zhang et.al. | 2502.01962 | null |
| 2025-02-04 | INTACT: Inducing Noise Tolerance through Adversarial Curriculum Training for LiDAR-based Safety-Critical Perception and Autonomy | Nastaran Darabi et.al. | 2502.01896 | null |
| 2025-02-04 | SimBEV: A Synthetic Multi-Task Multi-Sensor Driving Data Generation Tool and Dataset | Goodarz Mehr et.al. | 2502.01894 | link |
| 2025-02-03 | Reliability-Driven LiDAR-Camera Fusion for Robust 3D Object Detection | Reza Sadeghian et.al. | 2502.01856 | null |
| 2025-02-03 | GauCho: Gaussian Distributions with Cholesky Decomposition for Oriented Object Detection | Jeffri Murrugarra-LLerena et.al. | 2502.01565 | null |
| 2025-02-03 | Human Body Restoration with One-Step Diffusion Model and A New Benchmark | Jue Gong et.al. | 2502.01411 | null |
| 2025-01-31 | Let Human Sketches Help: Empowering Challenging Image Segmentation Task with Freehand Sketches | Ying Zang et.al. | 2501.19329 | null |
| 2025-01-31 | Beyond checkmate: exploring the creative chokepoints in AI text | Nafis Irtiza Tripto et.al. | 2501.19301 | link |
| 2025-01-31 | GO: The Great Outdoors Multimodal Dataset | Peng Jiang et.al. | 2501.19274 | null |
| 2025-01-31 | Adversarial Attacks on AI-Generated Text Detection Models: A Token Probability-Based Approach Using Embeddings | Ahmed K. Kadhim et.al. | 2501.18998 | null |
| 2025-01-31 | Early Diagnosis and Severity Assessment of Weligama Coconut Leaf Wilt Disease and Coconut Caterpillar Infestation using Deep Learning-based Image Processing Techniques | Samitha Vidhanaarachchi et.al. | 2501.18835 | null |
| 2025-01-30 | Tuning Event Camera Biases Heuristic for Object Detection Applications in Staring Scenarios | David El-Chai Ben-Ezra et.al. | 2501.18788 | null |
| 2025-01-30 | Adaptive Object Detection for Indoor Navigation Assistance: A Performance Evaluation of Real-Time Algorithms | Abhinav Pratap et.al. | 2501.18444 | null |
| 2025-01-29 | Real Time Scheduling Framework for Multi Object Detection via Spiking Neural Networks | Donghwa Kang et.al. | 2501.18412 | null |
| 2025-01-30 | IROAM: Improving Roadside Monocular 3D Object Detection Learning from Autonomous Vehicle Data Domain | Zhe Wang et.al. | 2501.18162 | null |
| 2025-02-03 | Efficient Feature Fusion for UAV Object Detection | Xudong Wang et.al. | 2501.17983 | null |
| 2025-01-29 | TransRAD: Retentive Vision Transformer for Enhanced Radar Object Detection | Lei Cheng et.al. | 2501.17977 | link |
| 2025-01-28 | Object Detection with Deep Learning for Rare Event Search in the GADGET II TPC | Tyler Wheeler et.al. | 2501.17892 | null |
| 2025-01-29 | Detection of Oscillation-like Patterns in Eclipsing Binary Light Curves using Neural Network-based Object Detection Algorithms | Burak Ulaş et.al. | 2501.17538 | null |
| 2025-01-30 | Assessing the Capability of YOLO- and Transformer-based Object Detectors for Real-time Weed Detection | Alicia Allmendinger et.al. | 2501.17387 | null |
| 2025-01-28 | DINOSTAR: Deep Iterative Neural Object Detector Self-Supervised Training for Roadside LiDAR Applications | Muhammad Shahbaz et.al. | 2501.17076 | null |
| 2025-01-28 | Contextual Self-paced Learning for Weakly Supervised Spatio-Temporal Video Grounding | Akash Kumar et.al. | 2501.17053 | null |
| 2025-01-28 | Approach Towards Semi-Automated Certification for Low Criticality ML-Enabled Airborne Applications | Chandrasekar Sridhar et.al. | 2501.17028 | null |
| 2025-01-28 | Modulating CNN Features with Pre-Trained ViT Representations for Open-Vocabulary Object Detection | Xiangyu Gao et.al. | 2501.16981 | null |
| 2025-01-28 | B-FPGM: Lightweight Face Detection via Bayesian-Optimized Soft FPGM Pruning | Nikolaos Kaparinos et.al. | 2501.16917 | null |
| 2025-01-28 | SSF-PAN: Semantic Scene Flow-Based Perception for Autonomous Navigation in Traffic Scenarios | Yinqi Chen et.al. | 2501.16754 | null |
| 2025-01-28 | DebugAgent: Efficient and Interpretable Error Slice Discovery for Comprehensive Model Debugging | Muxi Chen et.al. | 2501.16751 | null |
| 2025-01-28 | DFCon: Attention-Driven Supervised Contrastive Learning for Robust Deepfake Detection | MD Sadik Hossain Shanto et.al. | 2501.16704 | null |
| 2025-01-27 | Efficient Object Detection of Marine Debris using Pruned YOLO Model | Abi Aryaza et.al. | 2501.16571 | null |
| 2025-01-27 | Object Detection for Medical Image Analysis: Insights from the RT-DETR Model | Weijie He et.al. | 2501.16469 | null |
| 2025-01-27 | The Linear Attention Resurrection in Vision Transformer | Chuanyang Zheng et.al. | 2501.16182 | null |
| 2025-01-27 | Real-Time Brain Tumor Detection in Intraoperative Ultrasound Using YOLO11: From Model Training to Deployment in the Operating Room | Santiago Cepeda et.al. | 2501.15994 | null |
| 2025-01-26 | Classifying Deepfakes Using Swin Transformers | Aprille J. Xi et.al. | 2501.15656 | null |
| 2025-01-26 | A Privacy Enhancing Technique to Evade Detection by Street Video Cameras Without Using Adversarial Accessories | Jacob Shams et.al. | 2501.15653 | null |
| 2025-01-26 | Breaking the SSL-AL Barrier: A Synergistic Semi-Supervised Active Learning Framework for 3D Object Detection | Zengran Wang et.al. | 2501.15449 | null |
| 2025-01-26 | FAVbot: An Autonomous Target Tracking Micro-Robot with Frequency Actuation Control | Zhijian Hao et.al. | 2501.15426 | null |
| 2025-01-26 | Doracamom: Joint 3D Detection and Occupancy Prediction with Multi-view 4D Radars and Cameras for Omnidirectional Perception | Lianqing Zheng et.al. | 2501.15394 | null |
| 2025-01-26 | iFormer: Integrating ConvNet and Transformer for Mobile Application | Chuanyang Zheng et.al. | 2501.15369 | link |
| 2025-01-25 | Explainable YOLO-Based Dyslexia Detection in Synthetic Handwriting Data | Nora Fink et.al. | 2501.15263 | null |
| 2025-01-25 | SpikSSD: Better Extraction and Fusion for Object Detection with Spiking Neuron Networks | Yimeng Fan et.al. | 2501.15151 | link |
| 2025-01-24 | LiDAR-Based Vehicle Detection and Tracking for Autonomous Racing | Marcello Cellina et.al. | 2501.14502 | null |
| 2025-01-24 | TD-RD: A Top-Down Benchmark with Real-Time Framework for Road Damage Detection | Xi Xiao et.al. | 2501.14302 | null |
| 2025-01-24 | A Comprehensive Framework for Semantic Similarity Detection Using Transformer Architectures and Enhanced Ensemble Techniques | Lifu Gao et.al. | 2501.14288 | null |
| 2025-01-23 | Efficient Precision Control in Object Detection Models for Enhanced and Reliable Ovarian Follicle Counting | Vincent Blot et.al. | 2501.14036 | null |
| 2025-01-23 | PointOBB-v3: Expanding Performance Boundaries of Single Point-Supervised Oriented Object Detection | Peiyuan Zhang et.al. | 2501.13898 | link |
| 2025-01-23 | First Lessons Learned of an Artificial Intelligence Robotic System for Autonomous Coarse Waste Recycling Using Multispectral Imaging-Based Methods | Timo Lange et.al. | 2501.13855 | null |
| 2025-01-23 | Integrating Causality with Neurochaos Learning: Proposed Approach and Research Agenda | Nanjangud C. Narendra et.al. | 2501.13763 | null |
| 2025-01-23 | You Only Crash Once v2: Perceptually Consistent Strong Features for One-Stage Domain Adaptive Detection of Space Terrain | Timothy Chase Jr et.al. | 2501.13725 | null |
| 2025-01-23 | YOLO11-JDE: Fast and Accurate Multi-Object Tracking with Self-Supervised Re-ID | Iñaki Erregue et.al. | 2501.13710 | link |
| 2025-01-23 | Emotion estimation from video footage with LSTM | Samer Attrah et.al. | 2501.13432 | link |
| 2025-01-23 | Multi-aspect Knowledge Distillation with Large Language Model | Taegyeong Lee et.al. | 2501.13341 | link |
| 2025-01-22 | MONA: Moving Object Detection from Videos Shot by Dynamic Camera | Boxun Hu et.al. | 2501.13183 | null |
| 2025-01-21 | Large-image Object Detection for Fine-grained Recognition of Punches Patterns in Medieval Panel Painting | Josh Bruegger et.al. | 2501.12489 | link |
| 2025-01-21 | TOFFE – Temporally-binned Object Flow from Events for High-speed and Energy-Efficient Object Detection and Tracking | Adarsh Kumar Kosta et.al. | 2501.12482 | null |
| 2025-01-21 | Benchmarking Image Perturbations for Testing Automated Driving Assistance Systems | Stefano Carlo Lambertenghi et.al. | 2501.12269 | null |
| 2025-01-21 | DLEN: Dual Branch of Transformer for Low-Light Image Enhancement in Dual Domains | Junyu Xia et.al. | 2501.12235 | null |
| 2025-01-21 | SVGS-DSGAT: An IoT-Enabled Innovation in Underwater Robotic Object Detection Technology | Dongli Wu et.al. | 2501.12169 | null |
| 2025-01-21 | Co-Paced Learning Strategy Based on Confidence for Flying Bird Object Detection Model Training | Zi-Wei Sun et.al. | 2501.12071 | null |
| 2025-01-21 | SMamba: Sparse Mamba for Event-based Object Detection | Nan Yang et.al. | 2501.11971 | null |
| 2025-01-21 | LuxVeri at GenAI Detection Task 1: Inverse Perplexity Weighted Ensemble for Robust Detection of AI-Generated Text across English and Multilingual Contexts | Md Kamrujjaman Mobin et.al. | 2501.11914 | null |
| 2025-01-20 | Synthetic Data Can Mislead Evaluations: Membership Inference as Machine Text Detection | Ali Naseh et.al. | 2501.11786 | null |
| 2025-01-20 | Everyone’s Privacy Matters! An Analysis of Privacy Leakage from Real-World Facial Images on Twitter and Associated User Behaviors | Yuqi Niu et.al. | 2501.11756 | null |
| 2025-01-20 | Automatic Labelling & Semantic Segmentation with 4D Radar Tensors | Botao Sun et.al. | 2501.11351 | null |
| 2025-01-20 | Enhancing SAR Object Detection with Self-Supervised Pre-training on Masked Auto-Encoders | Xinyang Pu et.al. | 2501.11249 | null |
| 2025-01-17 | MutualForce: Mutual-Aware Enhancement for 4D Radar-LiDAR 3D Object Detection | Xiangyuan Peng et.al. | 2501.10266 | null |
| 2025-01-17 | Leveraging Confident Image Regions for Source-Free Domain-Adaptive Object Detection | Mohamed Lamine Mekhalfi et.al. | 2501.10081 | null |
| 2025-01-17 | One-D-Piece: Image Tokenizer Meets Quality-Controllable Compression | Keita Miwa et.al. | 2501.10064 | null |
| 2025-01-17 | LWGANet: A Lightweight Group Attention Backbone for Remote Sensing Visual Tasks | Wei Lu et.al. | 2501.10040 | link |
| 2025-01-17 | FLORA: Formal Language Model Enables Robust Training-free Zero-shot Object Referring Analysis | Zhe Chen et.al. | 2501.09887 | null |
| 2025-01-16 | Qwen it detect machine-generated text? | Teodor-George Marchitan et.al. | 2501.09813 | link |
| 2025-01-16 | A Simple Aerial Detection Baseline of Multimodal Language Models | Qingyun Li et.al. | 2501.09720 | link |
| 2025-01-16 | Practical Continual Forgetting for Pre-trained Vision Models | Hongbo Zhao et.al. | 2501.09705 | link |
| 2025-01-16 | Exploring AI-based System Design for Pixel-level Protected Health Information Detection in Medical Images | Tuan Truong et.al. | 2501.09552 | null |
| 2025-01-16 | Multi-task deep-learning for sleep event detection and stage classification | Adriana Anido-Alonso et.al. | 2501.09519 | link |
| 2025-01-16 | The Devil is in the Details: Simple Remedies for Image-to-LiDAR Representation Learning | Wonjun Jo et.al. | 2501.09485 | null |
| 2025-01-16 | MonoSOWA: Scalable monocular 3D Object detector Without human Annotations | Jan Skvrna et.al. | 2501.09481 | link |
| 2025-01-16 | RE-POSE: Synergizing Reinforcement Learning-Based Partitioning and Offloading for Edge Object Detection | Jianrui Shi et.al. | 2501.09465 | null |
| 2025-01-16 | On the Relation between Optical Aperture and Automotive Object Detection | Ofer Bar-Shalom et.al. | 2501.09456 | null |
| 2025-01-16 | SoccerSynth-Detection: A Synthetic Dataset for Soccer Player Detection | Haobin Qin et.al. | 2501.09281 | null |
| 2025-01-15 | GenAI Content Detection Task 3: Cross-Domain Machine-Generated Text Detection Challenge | Liam Dugan et.al. | 2501.08913 | null |
| 2025-01-15 | PACF: Prototype Augmented Compact Features for Improving Domain Adaptive Object Detection | Chenguang Liu et.al. | 2501.08605 | null |
| 2025-01-14 | Predicting Performance of Object Detection Models in Electron Microscopy Using Random Forests | Ni Li et.al. | 2501.08465 | link |
| 2025-01-14 | Bootstrapping Corner Cases: High-Resolution Inpainting for Safety Critical Detect and Avoid for Automated Flying | Jonathan Lyhs et.al. | 2501.08142 | null |
| 2025-01-14 | Learning Motion and Temporal Cues for Unsupervised Video Object Segmentation | Yunzhi Zhuge et.al. | 2501.07806 | link |
| 2025-01-14 | Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding | Zhaokai Wang et.al. | 2501.07783 | link |
| 2025-01-13 | SST-EM: Advanced Metrics for Evaluating Semantic, Spatial and Temporal Aspects in Video Editing | Varun Biyyala et.al. | 2501.07554 | link |
| 2025-01-13 | ML Mule: Mobile-Driven Context-Aware Collaborative Learning | Haoxiang Yu et.al. | 2501.07536 | null |
| 2025-01-13 | TimberVision: A Multi-Task Dataset and Framework for Log-Component Segmentation and Tracking in Autonomous Forestry Operations | Daniel Steininger et.al. | 2501.07360 | link |
| 2025-01-13 | Toward Realistic Camouflaged Object Detection: Benchmarks and Method | Zhimeng Xin et.al. | 2501.07297 | link |
| 2025-01-13 | Dual Scale-aware Adaptive Masked Knowledge Distillation for Object Detection | ZhouRui Zhang et.al. | 2501.07101 | null |
| 2025-01-11 | CoreNet: Conflict Resolution Network for Point-Pixel Misalignment and Sub-Task Suppression of 3D LiDAR-Camera Object Detection | Yiheng Li et.al. | 2501.06550 | link |
| 2025-01-11 | CPDR: Towards Highly-Efficient Salient Object Detection via Crossed Post-decoder Refinement | Yijie Li et.al. | 2501.06441 | null |
| 2025-01-11 | FocusDD: Real-World Scene Infusion for Robust Dataset Distillation | Youbing Hu et.al. | 2501.06405 | null |
| 2025-01-10 | A Holistically Point-guided Text Framework for Weakly-Supervised Camouflaged Object Detection | Tsui Qin Mok et.al. | 2501.06038 | null |
| 2025-01-10 | Minimizing Occlusion Effect on Multi-View Camera Perception in BEV with Multi-Sensor Fusion | Sanjay Kumar et.al. | 2501.05997 | null |
| 2025-01-10 | EDNet: Edge-Optimized Small Target Detection in UAV Imagery – Faster Context Attention, Better Feature Fusion, and Hardware Acceleration | Zhifan Song et.al. | 2501.05885 | null |
| 2025-01-10 | Automatic detection of single-electron regime of quantum dots and definition of virtual gates using U-Net and clustering | Yui Muto et.al. | 2501.05878 | null |
| 2025-01-10 | Zero-shot Shark Tracking and Biometrics from Aerial Imagery | Chinmay K Lalgudi et.al. | 2501.05717 | null |
| 2025-01-10 | Dark Energy Survey Year 6 Results: Synthetic-source Injection Across the Full Survey Using Balrog | D. Anbajagane et.al. | 2501.05683 | null |
| 2025-01-09 | Approximate Supervised Object Distance Estimation on Unmanned Surface Vehicles | Benjamin Kiefer et.al. | 2501.05567 | null |
| 2025-01-09 | Performance of YOLOv7 in Kitchen Safety While Handling Knife | Athulya Sundaresan Geetha et.al. | 2501.05399 | null |
| 2025-01-09 | The global consensus on the risk management of autonomous driving | Sebastian Krügel et.al. | 2501.05391 | null |
| 2025-01-09 | A Systematic Literature Review on Deep Learning-based Depth Estimation in Computer Vision | Ali Rohan et.al. | 2501.05147 | null |
| 2025-01-09 | CorrDiff: Adaptive Delay-aware Detector with Temporal Cue Inputs for Real-time Object Detection | Xiang Zhang et.al. | 2501.05132 | null |
| 2025-01-09 | AD-L-JEPA: Self-Supervised Spatial World Models with Joint Embedding Predictive Architecture for Autonomous Driving with LiDAR Data | Haoran Zhu et.al. | 2501.04969 | link |
| 2025-01-09 | Online Continual Learning: A Systematic Literature Review of Approaches, Challenges, and Benchmarks | Seyed Amir Bidaki et.al. | 2501.04897 | link |
| 2025-01-08 | Video Summarisation with Incident and Context Information using Generative AI | Ulindu De Silva et.al. | 2501.04764 | null |
| 2025-01-08 | Boosting Salient Object Detection with Knowledge Distillated from Large Foundation Models | Miaoyang He et.al. | 2501.04582 | null |
| 2025-01-08 | Combining YOLO and Visual Rhythm for Vehicle Counting | Victor Nascimento Ribeiro et.al. | 2501.04534 | link |
| 2025-01-08 | RSAR: Restricted State Angle Resolver and Rotated SAR Benchmark | Xin Zhang et.al. | 2501.04440 | link |
| 2025-01-08 | Integrating LLMs with ITS: Recent Advances, Potentials, Challenges, and Future Directions | Doaa Mahmud et.al. | 2501.04437 | null |
| 2025-01-08 | FGU3R: Fine-Grained Fusion via Unified 3D Representation for Multimodal 3D Object Detection | Guoxin Zhang et.al. | 2501.04373 | null |
| 2025-01-08 | H-MBA: Hierarchical MamBa Adaptation for Multi-Modal Video Understanding in Autonomous Driving | Siran Chen et.al. | 2501.04302 | null |
| 2025-01-08 | UPAQ: A Framework for Real-Time and Energy-Efficient 3D Object Detection in Autonomous Vehicles | Abhishek Balasubramaniam et.al. | 2501.04213 | null |
| 2025-01-07 | LargeAD: Large-Scale Cross-Sensor Data Pretraining for Autonomous Driving | Lingdong Kong et.al. | 2501.04005 | null |
| 2025-01-07 | Not all tokens are created equal: Perplexity Attention Weighted Networks for AI generated text detection | Pablo Miralles-González et.al. | 2501.03940 | null |
| 2025-01-07 | Visual question answering: from early developments to recent advances – a survey | Ngoc Dung Huynh et.al. | 2501.03939 | null |
| 2025-01-07 | SCC-YOLO: An Improved Object Detector for Assisting in Brain Tumor Diagnosis | Runci Bai et.al. | 2501.03836 | null |
| 2025-01-07 | Strip R-CNN: Large Strip Convolution for Remote Sensing Object Detection | Xinbin Yuan et.al. | 2501.03775 | link |
| 2025-01-07 | AuxDepthNet: Real-Time Monocular 3D Object Detection with Depth-Sensitive Features | Ruochen Zhang et.al. | 2501.03700 | null |
| 2025-01-07 | Anomaly Triplet-Net: Progress Recognition Model Using Deep Metric Learning Considering Occlusion for Manual Assembly Work | Takumi Kitsukawa et.al. | 2501.03533 | null |
| 2025-01-07 | SceneVTG++: Controllable Multilingual Visual Text Generation in the Wild | Jiawei Liu et.al. | 2501.02962 | null |
| 2025-01-05 | Multispectral Pedestrian Detection with Sparsely Annotated Label | Chan Lee et.al. | 2501.02640 | null |
| 2025-01-05 | Generalization-Enhanced Few-Shot Object Detection in Remote Sensing | Hui Lin et.al. | 2501.02474 | link |
| 2025-01-04 | Who Wrote This? Zero-Shot Statistical Tests for LLM-Generated Text Detection using Finite Sample Concentration Inequalities | Tara Radvand et.al. | 2501.02406 | link |
| 2025-01-04 | V2X-DGPE: Addressing Domain Gaps and Pose Errors for Robust Collaborative 3D Object Detection | Sichao Wang et.al. | 2501.02363 | link |
| 2025-01-04 | Accurate Crop Yield Estimation of Blueberries using Deep Learning and Smart Drones | Hieu D. Nguyen et.al. | 2501.02344 | null |
| 2025-01-04 | On The Causal Network Of Face-selective Regions In Human Brain During Movie Watching | Ali Bavafa et.al. | 2501.02333 | null |
| 2025-01-04 | RadarNeXt: Real-Time and Reliable 3D Object Detector Based On 4D mmWave Imaging Radar | Liye Jia et.al. | 2501.02314 | null |
| 2025-01-03 | A Separable Self-attention Inspired by the State Space Model for Computer Vision | Juntao Zhang et.al. | 2501.02040 | link |
| 2025-01-03 | UAV-DETR: Efficient End-to-End Object Detection for Unmanned Aerial Vehicle Imagery | Huaxiang Zhang et.al. | 2501.01855 | null |
| 2025-01-03 | Dual Mutual Learning Network with Global-local Awareness for RGB-D Salient Object Detection | Kang Yi et.al. | 2501.01648 | null |
| 2025-01-02 | A Multi-task Supervised Compression Model for Split Computing | Yoshitomo Matsubara et.al. | 2501.01420 | link |
| 2025-01-02 | MSC-Bench: Benchmarking and Analyzing Multi-Sensor Corruption for Driving Perception | Xiaoshuai Hao et.al. | 2501.01037 | null |
| 2025-01-01 | A Novel Approach using CapsNet and Deep Belief Network for Detection and Identification of Oral Leukopenia | Hirthik Mathesh GV et.al. | 2501.00876 | null |
| 2025-01-01 | NMM-HRI: Natural Multi-modal Human-Robot Interaction with Voice and Deictic Posture via Large Language Model | Yuzhi Lai et.al. | 2501.00785 | null |
| 2024-12-31 | Gaussian Building Mesh (GBM): Extract a Building’s 3D Mesh with Google Earth and Gaussian Splatting | Kyle Gao et.al. | 2501.00625 | null |
| 2024-12-31 | B2Net: Camouflaged Object Detection via Boundary Aware and Boundary Fusion | Junmin Cai et.al. | 2501.00426 | null |
| 2024-12-31 | Research on vehicle detection based on improved YOLOv8 network | Haocheng Guo et.al. | 2501.00300 | null |
| 2024-12-30 | TiGDistill-BEV: Multi-view BEV 3D Object Detection via Target Inner-Geometry Learning Distillation | Shaoqing Xu et.al. | 2412.20911 | link |
| 2024-12-30 | Humanoid Robot RHP Friends: Seamless Combination of Autonomous and Teleoperated Tasks in a Nursing Context | Mehdi Benallegue et.al. | 2412.20770 | null |
| 2024-12-30 | Solar Filaments Detection using Active Contours Without Edges | Sanmoy Bandyopadhyay et.al. | 2412.20749 | null |
| 2024-12-30 | Open-Set Object Detection By Aligning Known Class Representations | Hiran Sarkar et.al. | 2412.20701 | null |
| 2024-12-30 | SM3Det: A Unified Model for Multi-Modal Remote Sensing Object Detection | Yuxuan Li et.al. | 2412.20665 | link |
| 2024-12-30 | YOLO-UniOW: Efficient Universal Open-World Object Detection | Lihao Liu et.al. | 2412.20645 | link |
| 2024-12-29 | Controlling Out-of-Domain Gaps in LLMs for Genre Classification and Generated Text Detection | Dmitri Roussinov et.al. | 2412.20595 | link |
| 2024-12-29 | A Novel FPGA-based CNN Hardware Accelerator: Optimization for Convolutional Layers using Karatsuba Ofman Multiplier | Amit Sarkar et.al. | 2412.20393 | null |
| 2024-12-29 | Differential Evolution Integrated Hybrid Deep Learning Model for Object Detection in Pre-made Dishes | Lujia Lv et.al. | 2412.20370 | null |
| 2024-12-28 | Plastic Waste Classification Using Deep Learning: Insights from the WaDaBa Dataset | Suman Kunwar et.al. | 2412.20232 | null |
| 2024-12-27 | Chimera: A Block-Based Neural Architecture Search Framework for Event-Based Object Detection | Diego A. Silva et.al. | 2412.19646 | null |
| 2024-12-27 | Optimizing Helmet Detection with Hybrid YOLO Pipelines: A Detailed Analysis | Vaikunth M et.al. | 2412.19467 | null |
| 2024-12-26 | Revisiting Monocular 3D Object Detection from Scene-Level Depth Retargeting to Instance-Level Spatial Refinement | Qiude Zhang et.al. | 2412.19165 | null |
| 2024-12-26 | From Coin to Data: The Impact of Object Detection on Digital Numismatics | Rafael Cabral et.al. | 2412.19091 | null |
| 2024-12-26 | Assessing Pre-trained Models for Transfer Learning through Distribution of Spectral Components | Tengxue Zhang et.al. | 2412.19085 | null |
| 2024-12-25 | MTCAE-DFER: Multi-Task Cascaded Autoencoder for Dynamic Facial Expression Recognition | Peihao Xiang et.al. | 2412.18988 | null |
| 2024-12-25 | CGCOD: Class-Guided Camouflaged Object Detection | Chenxi Zhang et.al. | 2412.18977 | null |
| 2024-12-25 | HV-BEV: Decoupling Horizontal and Vertical Feature Sampling for Multi-View 3D Object Detection | Di Wu et.al. | 2412.18884 | null |
| 2024-12-25 | TSceneJAL: Joint Active Learning of Traffic Scenes for 3D Object Detection | Chenyang Lei et.al. | 2412.18870 | null |
| 2024-12-25 | Distortion-Aware Adversarial Attacks on Bounding Boxes of Object Detectors | Pham Phuc et.al. | 2412.18815 | link |
| 2024-12-24 | Sampling Bag of Views for Open-Vocabulary Object Detection | Hojun Choi et.al. | 2412.18273 | null |
| 2024-12-24 | Efficient Detection Framework Adaptation for Edge Computing: A Plug-and-play Neural Network Toolbox Enabling Edge Deployment | Jiaqi Wu et.al. | 2412.18230 | null |
| 2024-12-24 | SDM-Car: A Dataset for Small and Dim Moving Vehicles Detection in Satellite Videos | Zhen Zhang et.al. | 2412.18214 | link |
| 2024-12-24 | Spectrum-oriented Point-supervised Saliency Detector for Hyperspectral Images | Peifu Liu et.al. | 2412.18112 | link |
| 2024-12-24 | Multi-Point Positional Insertion Tuning for Small Object Detection | Kanoko Goto et.al. | 2412.18090 | null |
| 2024-12-24 | COMO: Cross-Mamba Interaction and Offset-Guided Fusion for Multimodal Object Detection | Chang Liu et.al. | 2412.18076 | null |
| 2024-12-23 | Comprehensive Multi-Modal Prototypes are Simple and Effective Classifiers for Vast-Vocabulary Object Detection | Yitong Chen et.al. | 2412.17800 | link |
| 2024-12-23 | Enhanced Temporal Processing in Spiking Neural Networks for Static Object Detection Using 3D Convolutions | Huaxu He et.al. | 2412.17654 | null |
| 2024-12-23 | Impact of Evidence Theory Uncertainty on Training Object Detection Models | M. Tahasanul Ibrahim et.al. | 2412.17405 | null |
| 2024-12-23 | Feature Based Methods Domain Adaptation for Object Detection: A Review Paper | Helia Mohamadi et.al. | 2412.17325 | null |
| 2024-12-23 | Towards Unsupervised Model Selection for Domain Adaptive Object Detection | Hengfu Yu et.al. | 2412.17284 | null |
| 2024-12-22 | NumbOD: A Spatial-Frequency Fusion Attack Against Object Detectors | Ziqi Zhou et.al. | 2412.16955 | link |
| 2024-12-22 | Separating Drone Point Clouds From Complex Backgrounds by Cluster Filter – Technical Report for CVPR 2024 UG2 Challenge | Hanfang Liang et.al. | 2412.16947 | null |
| 2024-12-22 | Seamless Detection: Unifying Salient Object Detection and Camouflaged Object Detection | Yi Liu et.al. | 2412.16840 | link |
| 2024-12-22 | Human-Guided Image Generation for Expanding Small-Scale Training Image Datasets | Changjian Chen et.al. | 2412.16839 | null |
| 2024-12-21 | IV-tuning: Parameter-Efficient Transfer Learning for Infrared-Visible Tasks | Yaming Zhang et.al. | 2412.16654 | link |
| 2024-12-20 | NeRF-To-Real Tester: Neural Radiance Fields as Test Image Generators for Vision of Autonomous Systems | Laura Weihl et.al. | 2412.16141 | null |
| 2024-12-20 | MR-GDINO: Efficient Open-World Continual Object Detection | Bowen Dong et.al. | 2412.15979 | link |
| 2024-12-20 | Mask-RadarNet: Enhancing Transformer With Spatial-Temporal Semantic Context for Radar Object Detection in Autonomous Driving | Yuzhi Wu et.al. | 2412.15595 | null |
| 2024-12-19 | Exploring Machine Learning Engineering for Object Detection and Tracking by Unmanned Aerial Vehicle (UAV) | Aneesha Guna et.al. | 2412.15347 | null |
| 2024-12-19 | Leveraging Color Channel Independence for Improved Unsupervised Object Detection | Bastian Jäckl et.al. | 2412.15150 | null |
| 2024-12-19 | Explainable Tampered Text Detection via Multimodal Large Models | Chenfan Qu et.al. | 2412.14816 | null |
| 2024-12-19 | Explicit Relational Reasoning Network for Scene Text Detection | Yuchen Su et.al. | 2412.14692 | null |
| 2024-12-19 | A Light-Weight Framework for Open-Set Object Detection with Decoupled Feature Alignment in Joint Space | Yonghao He et.al. | 2412.14680 | link |
| 2024-12-19 | Progressive Fine-to-Coarse Reconstruction for Accurate Low-Bit Post-Training Quantization in Vision Transformers | Rui Ding et.al. | 2412.14633 | null |
| 2024-12-19 | Alignment-Free RGB-T Salient Object Detection: A Large-scale Dataset and Progressive Correlation Network | Kunpeng Wang et.al. | 2412.14576 | link |
| 2024-12-19 | SCKD: Semi-Supervised Cross-Modality Knowledge Distillation for 4D Radar Object Detection | Ruoyu Xu et.al. | 2412.14571 | null |
| 2024-12-18 | HA-RDet: Hybrid Anchor Rotation Detector for Oriented Object Detection | Phuc D. A. Nguyen et.al. | 2412.14379 | link |
| 2024-12-18 | Joint Perception and Prediction for Autonomous Driving: A Survey | Lucas Dal’Col et.al. | 2412.14088 | link |
| 2024-12-18 | Object Style Diffusion for Generalized Object Detection in Urban Scene | Hao Li et.al. | 2412.13815 | null |
| 2024-12-18 | MMO-IG: Multi-Class and Multi-Scale Object Image Generation for Remote Sensing | Chuang Yang et.al. | 2412.13684 | null |
| 2024-12-18 | Evaluation of LLM Vulnerabilities to Being Misused for Personalized Disinformation Generation | Aneta Zugecova et.al. | 2412.13666 | null |
| 2024-12-18 | Multi-View Pedestrian Occupancy Prediction with a Novel Synthetic Dataset | Sithu Aung et.al. | 2412.13569 | null |
| 2024-12-18 | Comparative Analysis of YOLOv9, YOLOv10 and RT-DETR for Real-Time Weed Detection | Ahmet Oğuz Saltık et.al. | 2412.13490 | null |
| 2024-12-17 | Continuous Patient Monitoring with AI: Real-Time Analysis of Video in Hospital Care Settings | Paolo Gabriel et.al. | 2412.13152 | null |
| 2024-12-17 | A New Adversarial Perspective for LiDAR-based 3D Object Detection | Shijun Zheng et.al. | 2412.13017 | null |
| 2024-12-17 | What is YOLOv6? A Deep Insight into the Object Detection Model | Athulya Sundaresan Geetha et.al. | 2412.13006 | null |
| 2024-12-17 | Differential Alignment for Domain Adaptive Object Detection | Xinyu He et.al. | 2412.12830 | null |
| 2024-12-17 | RCTrans: Radar-Camera Transformer via Radar Densifier and Sequential Decoder for 3D Object Detection | Yiheng Li et.al. | 2412.12799 | link |
| 2024-12-17 | RaCFormer: Towards High-Quality 3D Object Detection via Query-based Radar-Camera Fusion | Xiaomeng Chu et.al. | 2412.12725 | null |
| 2024-12-17 | Efficient Oriented Object Detection with Enhanced Small Object Recognition in Aerial Images | Zhifei Shi et.al. | 2412.12562 | null |
| 2024-12-17 | CREST: An Efficient Conjointly-trained Spike-driven Framework for Event-based Object Detection Exploiting Spatiotemporal Dynamics | Ruixin Mao et.al. | 2412.12525 | link |
| 2024-12-17 | PromptDet: A Lightweight 3D Object Detection Framework with LiDAR Prompts | Kun Guo et.al. | 2412.12460 | link |
| 2024-12-16 | Domain Generalization in Autonomous Driving: Evaluating YOLOv8s, RT-DETR, and YOLO-NAS with the ROAD-Almaty Dataset | Madiyar Alimov et.al. | 2412.12349 | null |
| 2024-12-16 | Coconut Palm Tree Counting on Drone Images with Deep Object Detection and Synthetic Training Data | Tobias Rohe et.al. | 2412.11949 | null |
| 2024-12-16 | Sonar-based Deep Learning in Underwater Robotics: Overview, Robustness and Challenges | Martin Aubard et.al. | 2412.11840 | null |
| 2024-12-16 | CLDA-YOLO: Visual Contrastive Learning Based Domain Adaptive YOLO Detector | Tianheng Qiu et.al. | 2412.11812 | null |
| 2024-12-16 | PhysAug: A Physical-guided and Frequency-based Data Augmentation for Single-Domain Generalized Object Detection | Xiaoran Xu et.al. | 2412.11807 | link |
| 2024-12-16 | Impact of Face Alignment on Face Image Quality | Eren Onaran et.al. | 2412.11779 | null |
| 2024-12-16 | Learning UAV-based path planning for efficient localization of objects using prior knowledge | Rick van Essen et.al. | 2412.11717 | null |
| 2024-12-16 | Oriented Tiny Object Detection: A Dataset, Benchmark, and Dynamic Unbiased Learning | Chang Xu et.al. | 2412.11582 | null |
| 2024-12-16 | Glimpse: Enabling White-Box Methods to Use Proprietary Models for Zero-Shot LLM-Generated Text Detection | Guangsheng Bao et.al. | 2412.11506 | link |
| 2024-12-16 | HGSFusion: Radar-Camera Fusion with Hybrid Generation and Synchronization for 3D Object Detection | Zijian Gu et.al. | 2412.11489 | link |
| 2024-12-16 | Universal Domain Adaptive Object Detection via Dual Probabilistic Alignment | Yuanfan Zheng et.al. | 2412.11443 | link |
| 2024-12-13 | A dual contrastive framework | Yuan Sun et.al. | 2412.10348 | null |
| 2024-12-13 | MVQ: Towards Efficient DNN Compression and Acceleration with Masked Vector Quantization | Shuaiting Li et.al. | 2412.10261 | null |
| 2024-12-13 | Copy-Move Detection in Optical Microscopy: A Segmentation Network and A Dataset | Hao-Chiang Shao et.al. | 2412.10258 | null |
| 2024-12-13 | UN-DETR: Promoting Objectness Learning via Joint Supervision for Unknown Object Detection | Haomiao Liu et.al. | 2412.10176 | link |
| 2024-12-13 | HS-FPN: High Frequency and Spatial Perception FPN for Tiny Object Detection | Zican Shi et.al. | 2412.10116 | null |
| 2024-12-13 | RemDet: Rethinking Efficient Model Design for UAV Object Detection | Chen Li et.al. | 2412.10040 | link |
| 2024-12-13 | Timealign: A multi-modal object detection method for time misalignment fusing in autonomous driving | Zhihang Song et.al. | 2412.10033 | null |
| 2024-12-13 | Object-Focused Data Selection for Dense Prediction Tasks | Niclas Popp et.al. | 2412.10032 | null |
| 2024-12-13 | CP-DETR: Concept Prompt Guide DETR Toward Stronger Universal Object Detection | Qibo Chen et.al. | 2412.09799 | null |
| 2024-12-12 | FD2-Net: Frequency-Driven Feature Decomposition Network for Infrared-Visible Object Detection | Ke Li et.al. | 2412.09258 | null |
| 2024-12-12 | UADet: A Remarkably Simple Yet Effective Uncertainty-Aware Open-Set Object Detection Framework | Silin Cheng et.al. | 2412.09229 | null |
| 2024-12-12 | ContextHOI: Spatial Context Learning for Human-Object Interaction Detection | Mingda Jia et.al. | 2412.09050 | null |
| 2024-12-12 | STEAM: Squeeze and Transform Enhanced Attention Module | Rishabh Sabharwal et.al. | 2412.09023 | null |
| 2024-12-12 | Sensing for Space Safety and Sustainability: A Deep Learning Approach with Vision Transformers | Wenxuan Zhang et.al. | 2412.08913 | null |
| 2024-12-11 | DALI: Domain Adaptive LiDAR Object Detection via Distribution-level and Instance-level Pseudo Label Denoising | Xiaohu Lu et.al. | 2412.08806 | link |
| 2024-12-11 | Utilizing Multi-step Loss for Single Image Reflection Removal | Abdelrahman Elnenaey et.al. | 2412.08582 | link |
| 2024-12-11 | PointCFormer: a Relation-based Progressive Feature Extraction Network for Point Cloud Completion | Yi Zhong et.al. | 2412.08421 | null |
| 2024-12-11 | Physical Informed Driving World Model | Zhuoran Yang et.al. | 2412.08410 | null |
| 2024-12-11 | Wasserstein Distance Rivals Kullback-Leibler Divergence for Knowledge Distillation | Jiaming Lv et.al. | 2412.08139 | null |
| 2024-12-11 | DTAA: A Detect, Track and Avoid Architecture for navigation in spaces with Multiple Velocity Objects | Samuel Nordström et.al. | 2412.08121 | null |
| 2024-12-11 | THUD++: Large-Scale Dynamic Indoor Scene Dataset and Benchmark for Mobile Robots | Zeshun Li et.al. | 2412.08096 | null |
| 2024-12-11 | MAGIC: Mastering Physical Adversarial Generation in Context through Collaborative LLM Agents | Yun Xing et.al. | 2412.08014 | null |
| 2024-12-10 | Low-Latency Scalable Streaming for Event-Based Vision | Andrew Hamara et.al. | 2412.07889 | null |
| 2024-12-10 | Leveraging Content and Context Cues for Low-Light Image Enhancement | Igor Morawski et.al. | 2412.07693 | link |
| 2024-12-10 | Multimodal Contextualized Support for Enhancing Video Retrieval System | Quoc-Bao Nguyen-Le et.al. | 2412.07584 | null |
| 2024-12-10 | Making the Flow Glow – Robot Perception under Severe Lighting Conditions using Normalizing Flow Gradients | Simon Kristoffersson Lind et.al. | 2412.07565 | null |
| 2024-12-10 | Enhancing 3D Object Detection in Autonomous Vehicles Based on Synthetic Virtual Environment Analysis | Vladislav Li et.al. | 2412.07509 | null |
| 2024-12-10 | DSFEC: Efficient and Deployable Deep Radar Object Detection | Gayathri Dandugula et.al. | 2412.07411 | null |
| 2024-12-10 | Benchmarking Vision-Based Object Tracking for USVs in Complex Maritime Environments | Muhayy Ud Din et.al. | 2412.07392 | null |
| 2024-12-09 | FlexEvent: Event Camera Object Detection at Arbitrary Frequencies | Dongyue Lu et.al. | 2412.06708 | null |
| 2024-12-09 | EMOv2: Pushing 5M Vision Model Frontier | Jiangning Zhang et.al. | 2412.06674 | link |
| 2024-12-09 | Object Detection using Event Camera: A MoE Heat Conduction based Detector and A New Benchmark Dataset | Xiao Wang et.al. | 2412.06647 | null |
| 2024-12-09 | Prediction of Occluded Pedestrians in Road Scenes using Human-like Reasoning: Insights from the OccluRoads Dataset | Melo Castillo Angie Nataly et.al. | 2412.06549 | null |
| 2024-12-09 | Self-Paced Learning Strategy with Easy Sample Prior Based on Confidence for the Flying Bird Object Detection Model Training | Zi-Wei Sun et.al. | 2412.06306 | null |
| 2024-12-09 | No Annotations for Object Detection in Art through Stable Diffusion | Patrick Ramos et.al. | 2412.06286 | link |
| 2024-12-09 | DenseVLM: A Retrieval and Decoupled Alignment Framework for Open-Vocabulary Dense Prediction | Yunheng Li et.al. | 2412.06244 | null |
| 2024-12-09 | A Real-Time Defense Against Object Vanishing Adversarial Patch Attacks for Object Detection in Autonomous Vehicles | Jaden Mu et.al. | 2412.06215 | null |
| 2024-12-09 | PoLaRIS Dataset: A Maritime Object Detection and Tracking Dataset in Pohang Canal | Jiwon Choi et.al. | 2412.06192 | null |
| 2024-12-08 | Tiny Object Detection with Single Point Supervision | Haoran Zhu et.al. | 2412.05837 | null |
| 2024-12-06 | From classical techniques to convolution-based models: A review of object detection algorithms | Fnu Neha et.al. | 2412.05252 | null |
| 2024-12-06 | Towards Flexible 3D Perception: Object-Centric Occupancy Completion Augments 3D Object Detection | Chaoda Zheng et.al. | 2412.05154 | link |
| 2024-12-06 | DEYOLO: Dual-Feature-Enhancement YOLO for Cross-Modality Object Detection | Yishuo Chen et.al. | 2412.04931 | link |
| 2024-12-06 | Beyond Boxes: Mask-Guided Spatio-Temporal Feature Aggregation for Video Object Detection | Khurram Azeem Hashmi et.al. | 2412.04915 | null |
| 2024-12-05 | Cubify Anything: Scaling Indoor 3D Object Detection | Justin Lazarow et.al. | 2412.04458 | null |
| 2024-12-05 | Reflective Teacher: Semi-Supervised Multimodal 3D Object Detection in Bird’s-Eye-View via Uncertainty Measure | Saheli Hazra et.al. | 2412.04337 | null |
| 2024-12-05 | YOLO-CCA: A Context-Based Approach for Traffic Sign Detection | Linfeng Jiang et.al. | 2412.04289 | link |
| 2024-12-05 | DEIM: DETR with Improved Matching for Fast Convergence | Shihua Huang et.al. | 2412.04234 | link |
| 2024-12-05 | Frequency-Adaptive Low-Latency Object Detection Using Events and Frames | Haitian Zhang et.al. | 2412.04149 | null |
| 2024-12-05 | MVUDA: Unsupervised Domain Adaptation for Multi-view Pedestrian Detection | Erik Brorsson et.al. | 2412.04117 | link |
| 2024-12-05 | Thermal and RGB Images Work Better Together in Wind Turbine Damage Detection | Serhii Svystun et.al. | 2412.04114 | null |
| 2024-12-05 | SoRA: Singular Value Decomposed Low-Rank Adaptation for Domain Generalizable Representation Learning | Seokju Yun et.al. | 2412.04077 | null |
| 2024-12-05 | Space to Policy: Scalable Brick Kiln Detection and Automatic Compliance Monitoring with Geospatial Data | Zeel B Patel et.al. | 2412.04065 | null |
| 2024-12-05 | UNCOVER: Unknown Class Object Detection for Autonomous Vehicles in Real-time | Lars Schmarje et.al. | 2412.03986 | null |
| 2024-12-04 | Perception Tokens Enhance Visual Reasoning in Multimodal Language Models | Mahtab Bigverdi et.al. | 2412.03548 | null |
| 2024-12-04 | Data Fusion of Semantic and Depth Information in the Context of Object Detection | Md Abu Yusuf et.al. | 2412.03490 | null |
| 2024-12-04 | Task-driven Image Fusion with Learnable Fusion Loss | Haowen Bai et.al. | 2412.03240 | null |
| 2024-12-04 | ObjectFinder: Open-Vocabulary Assistive System for Interactive Object Search by Blind People | Ruiping Liu et.al. | 2412.03118 | null |
| 2024-12-04 | TREND: Unsupervised 3D Representation Learning via Temporal Forecasting for LiDAR Perception | Runjian Chen et.al. | 2412.03054 | null |
| 2024-12-04 | Assessing the performance of CT image denoisers using Laguerre-Gauss Channelized Hotelling Observer for lesion detection | Prabhat Kc et.al. | 2412.02920 | null |
| 2024-12-03 | EvRT-DETR: The Surprising Effectiveness of DETR-based Detection for Event Cameras | Dmitrii Torbunov et.al. | 2412.02890 | null |
| 2024-12-03 | Optimized CNNs for Rapid 3D Point Cloud Object Recognition | Tianyi Lyu et.al. | 2412.02855 | null |
| 2024-12-03 | Gaussian Splatting Under Attack: Investigating Adversarial Noise in 3D Objects | Abdurrahman Zeybey et.al. | 2412.02803 | null |
| 2024-12-03 | SJTU: Spatial judgments in multimodal models towards unified segmentation through coordinate detection | Joongwon Chae et.al. | 2412.02565 | null |
| 2024-12-03 | Underload: Defending against Latency Attacks for Object Detectors on Edge Devices | Tianyi Wang et.al. | 2412.02171 | null |
| 2024-12-03 | Redundant Queries in DETR-Based 3D Detection Methods: Unnecessary and Prunable | Lizhen Xu et.al. | 2412.02054 | null |
| 2024-12-02 | Smart Parking with Pixel-Wise ROI Selection for Vehicle Detection Using YOLOv8, YOLOv9, YOLOv10, and YOLOv11 | Gustavo P. C. P. da Luz et.al. | 2412.01983 | null |
| 2024-12-02 | HPRM: High-Performance Robotic Middleware for Intelligent Autonomous Systems | Jacky Kwok et.al. | 2412.01799 | null |
| 2024-12-02 | Identifying Reliable Predictions in Detection Transformers | Young-Jin Park et.al. | 2412.01782 | null |
| 2024-12-02 | FEVER-OOD: Free Energy Vulnerability Elimination for Robust Out-of-Distribution Detection | Brian K. S. Isaac-Medina et.al. | 2412.01596 | null |
| 2024-12-02 | Divide-and-Conquer: Confluent Triple-Flow Network for RGB-T Salient Object Detection | Hao Tang et.al. | 2412.01556 | null |
| 2024-12-03 | GFreeDet: Exploiting Gaussian Splatting and Foundation Models for Model-free Unseen Object Detection in the BOP Challenge 2024 | Xingyu Liu et.al. | 2412.01552 | null |
| 2024-12-02 | Improving Object Detection by Modifying Synthetic Data with Explainable AI | Nitish Mital et.al. | 2412.01477 | null |
| 2024-11-29 | SpaRC: Sparse Radar-Camera Fusion for 3D Object Detection | Philipp Wolters et.al. | 2411.19860 | null |
| 2024-11-29 | Feedback-driven object detection and iterative model improvement | Sönke Tenckhoff et.al. | 2411.19835 | link |
| 2024-11-29 | Real-Time Anomaly Detection in Video Streams | Fabien Poirier et.al. | 2411.19731 | null |
| 2024-11-29 | LDA-AQU: Adaptive Query-guided Upsampling via Local Deformable Attention | Zewen Du et.al. | 2411.19585 | link |
| 2024-11-29 | Bootstrapping Clustering of Gaussians for View-consistent 3D Scene Understanding | Wenbo Zhang et.al. | 2411.19551 | null |
| 2024-11-28 | Automatic Prompt Generation and Grounding Object Detection for Zero-Shot Image Anomaly Detection | Tsun-Hin Cheung et.al. | 2411.19220 | null |
| 2024-11-28 | Co-Learning: Towards Semi-Supervised Object Detection with Road-side Cameras | Jicheng Yuan et.al. | 2411.19143 | null |
| 2024-11-28 | On Moving Object Segmentation from Monocular Video with Transformers | Christian Homeyer et.al. | 2411.19141 | null |
| 2024-11-28 | Dynamic Attention and Bi-directional Fusion for Safety Helmet Wearing Detection | Junwei Feng et.al. | 2411.19071 | null |
| 2024-11-28 | MVFormer: Diversifying Feature Normalization and Token Mixing for Efficient Vision Transformers | Jongseong Bae et.al. | 2411.18995 | null |
| 2024-11-27 | Exploring Depth Information for Detecting Manipulated Face Videos | Haoyue Wang et.al. | 2411.18572 | null |
| 2024-11-27 | Efficient Dynamic LiDAR Odometry for Mobile Robots with Structured Point Clouds | Jonathan Lichtenfeld et.al. | 2411.18443 | link |
| 2024-11-27 | Deep Fourier-embedded Network for Bi-modal Salient Object Detection | Pengfei Lyu et.al. | 2411.18409 | link |
| 2024-11-27 | Optimizing Multispectral Object Detection: A Bag of Tricks and Comprehensive Benchmarks | Chen Zhou et.al. | 2411.18288 | link |
| 2024-11-27 | From Open Vocabulary to Open World: Teaching Vision Language Models to Detect Novel Objects | Zizhao Li et.al. | 2411.18207 | link |
| 2024-11-27 | RPEE-HEADS: A Novel Benchmark for Pedestrian Head Detection in Crowd Videos | Mohamad Abubaker et.al. | 2411.18164 | null |
| 2024-11-27 | Revisiting Misalignment in Multispectral Pedestrian Detection: A Language-Driven Approach for Cross-modal Alignment Fusion | Taeheon Kim et.al. | 2411.17995 | null |
| 2024-11-27 | ROICtrl: Boosting Instance Control for Visual Generation | Yuchao Gu et.al. | 2411.17949 | link |
| 2024-11-26 | Box for Mask and Mask for Box: weak losses for multi-task partially supervised learning | Hoàng-Ân Lê et.al. | 2411.17536 | link |
| 2024-11-26 | TinyViM: Frequency Decoupling for Tiny Hybrid Vision Mamba | Xiaowen Ma et.al. | 2411.17473 | link |
| 2024-11-26 | Communication-Efficient Cooperative SLAMMOT via Determining the Number of Collaboration Vehicles | Susu Fang et.al. | 2411.17432 | null |
| 2024-11-26 | DGNN-YOLO: Dynamic Graph Neural Networks with YOLO11 for Small Object Detection and Tracking in Traffic Surveillance | Shahriar Soudeep et.al. | 2411.17251 | null |
| 2024-11-26 | Event-based Spiking Neural Networks for Object Detection: A Review of Datasets, Architectures, Learning Rules, and Implementation | Craig Iaboni et.al. | 2411.17006 | link |
| 2024-11-25 | Online Episodic Memory Visual Query Localization with Egocentric Streaming Object Memory | Zaira Manigrasso et.al. | 2411.16934 | null |
| 2024-11-25 | Open Vocabulary Monocular 3D Object Detection | Jin Yao et.al. | 2411.16833 | link |
| 2024-11-25 | Imperceptible Adversarial Examples in the Physical World | Weilin Xu et.al. | 2411.16622 | null |
| 2024-11-25 | STDWeb: Simple Transient Detection pipeline for the Web | Sergey Karpov et.al. | 2411.16470 | null |
| 2024-11-25 | Machine Learning for the Digital Typhoon Dataset: Extensions to Multiple Basins and New Developments in Representations and Tasks | Asanobu Kitamoto et.al. | 2411.16421 | link |
| 2024-11-25 | CutS3D: Cutting Semantics in 3D for 2D Unsupervised Instance Segmentation | Leon Sick et.al. | 2411.16319 | null |
| 2024-11-25 | Diagnosis of diabetic retinopathy using machine learning & deep learning technique | Eric Shah et.al. | 2411.16250 | null |
| 2024-11-25 | Interpreting Object-level Foundation Models via Visual Precision Search | Ruoyu Chen et.al. | 2411.16198 | link |
| 2024-11-25 | Learn from Foundation Model: Fruit Detection Model without Manual Annotation | Yanan Wang et.al. | 2411.16196 | null |
| 2024-11-25 | CIA: Controllable Image Augmentation Framework Based on Stable Diffusion | Mohamed Benkedadra et.al. | 2411.16128 | null |
| 2024-11-25 | You only thermoelastically deform once: Point Absorber Detection in LIGO Test Masses with YOLO | Simon R. Goode et.al. | 2411.16104 | null |
| 2024-11-25 | Leverage Task Context for Object Affordance Ranking | Haojie Huang et.al. | 2411.16082 | null |
| 2024-11-22 | A Real-Time DETR Approach to Bangladesh Road Object Detection for Autonomous Vehicles | Irfan Nafiz Shahan et.al. | 2411.15110 | null |
| 2024-11-22 | MSSF: A 4D Radar and Camera Fusion Framework With Multi-Stage Sampling for 3D Object Detection in Autonomous Driving | Hongsi Liu et.al. | 2411.15016 | null |
| 2024-11-22 | VisionPAD: A Vision-Centric Pre-training Paradigm for Autonomous Driving | Haiming Zhang et.al. | 2411.14716 | null |
| 2024-11-21 | Unveiling the Hidden: A Comprehensive Evaluation of Underwater Image Enhancement and Its Impact on Object Detection | Ali Awad et.al. | 2411.14626 | null |
| 2024-11-21 | DINO-X: A Unified Vision Model for Open-World Object Detection and Understanding | Tianhe Ren et.al. | 2411.14347 | link |
| 2024-11-21 | AnywhereDoor: Multi-Target Backdoor Attacks on Object Detection | Jialin Lu et.al. | 2411.14243 | null |
| 2024-11-21 | Transforming Static Images Using Generative Models for Video Salient Object Detection | Suhwan Cho et.al. | 2411.13975 | link |
| 2024-11-21 | Multitask Learning for SAR Ship Detection with Gaussian-Mask Joint Segmentation | Ming Zhao et.al. | 2411.13847 | null |
| 2024-11-20 | MambaDETR: Query-based Temporal Modeling using State Space Model for Multi-View 3D Object Detection | Tong Ning et.al. | 2411.13628 | null |
| 2024-11-20 | DIS-Mine: Instance Segmentation for Disaster-Awareness in Poor-Light Condition in Underground Mines | Mizanur Rahman Jewel et.al. | 2411.13544 | null |
| 2024-11-20 | A Resource Efficient Fusion Network for Object Detection in Bird’s-Eye View using Camera and Raw Radar Data | Kavin Chandrasekaran et.al. | 2411.13311 | link |
| 2024-11-20 | VADet: Multi-frame LiDAR 3D Object Detection using Variable Aggregation | Chengjie Huang et.al. | 2411.13186 | null |
| 2024-11-20 | RAW-Diffusion: RGB-Guided Diffusion Models for High-Fidelity RAW Image Generation | Christoph Reinders et.al. | 2411.13150 | link |
| 2024-11-20 | YCB-LUMA: YCB Object Dataset with Luminance Keying for Object Localization | Thomas Pöllabauer et.al. | 2411.13149 | link |
| 2024-11-20 | Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension | Yongdong Luo et.al. | 2411.13093 | link |
| 2024-11-20 | Bounding-box Watermarking: Defense against Model Extraction Attacks on Object Detectors | Satoru Koda et.al. | 2411.13047 | null |
| 2024-11-20 | Collaborative Feature-Logits Contrastive Learning for Open-Set Semi-Supervised Object Detection | Xinhao Zhong et.al. | 2411.13001 | null |
| 2024-11-19 | Maps from Motion (MfM): Generating 2D Semantic Maps from Sparse Multi-view Images | Matteo Toso et.al. | 2411.12620 | null |
| 2024-11-19 | GaussianPretrain: A Simple Unified 3D Gaussian Representation for Visual Pre-training in Autonomous Driving | Shaoqing Xu et.al. | 2411.12452 | null |
| 2024-11-19 | Physics-Guided Detector for SAR Airplanes | Zhongling Huang et.al. | 2411.12301 | link |
| 2024-11-18 | Scaling Deep Learning Research with Kubernetes on the NRP Nautilus HyperCluster | J. Alex Hurt et.al. | 2411.12038 | null |
| 2024-11-18 | LightFFDNets: Lightweight Convolutional Neural Networks for Rapid Facial Forgery Detection | Günel Jabbarlı et.al. | 2411.11826 | null |
| 2024-11-18 | WoodYOLO: A Novel Object Detector for Wood Species Detection in Microscopic Images | Lars Nieradzik et.al. | 2411.11738 | null |
| 2024-11-18 | Exploring Emerging Trends and Research Opportunities in Visual Place Recognition | Antonios Gasteratos et.al. | 2411.11481 | null |
| 2024-11-18 | SL-YOLO: A Stronger and Lighter Drone Target Detection Model | Defan Chen et.al. | 2411.11477 | null |
| 2024-11-19 | EVT: Efficient View Transformation for Multi-Modal 3D Object Detection | Yongjin Lee et.al. | 2411.10715 | null |
| 2024-11-15 | Vision Eagle Attention: A New Lens for Advancing Image Classification | Mahmudul Hasan et.al. | 2411.10564 | link |
| 2024-11-15 | Interactive Image-Based Aphid Counting in Yellow Water Traps under Stirring Actions | Xumin Gao et.al. | 2411.10357 | null |
| 2024-11-15 | RETR: Multi-View Radar Detection Transformer for Indoor Perception | Ryoma Yataka et.al. | 2411.10293 | null |
| 2024-11-15 | Visual-Linguistic Agent: Towards Collaborative Contextual Object Reasoning | Jingru Yang et.al. | 2411.10252 | null |
| 2024-11-15 | Real-Time AI-Driven People Tracking and Counting Using Overhead Cameras | Ishrath Ahamed et.al. | 2411.10072 | null |
| 2024-11-15 | Diachronic Document Dataset for Semantic Layout Analysis | Thibault Clérice et.al. | 2411.10068 | null |
| 2024-11-14 | Adversarial Attacks Using Differentiable Rendering: A Survey | Matthew Hull et.al. | 2411.09749 | null |
| 2024-11-14 | Local-Global Attention: An Adaptive Mechanism for Multi-Scale Feature Integration | Yifan Shao et.al. | 2411.09604 | link |
| 2024-11-14 | Long-Tailed Object Detection Pre-training: Dynamic Rebalancing Contrastive Learning with Dual Reconstruction | Chen-Long Duan et.al. | 2411.09453 | null |
| 2024-11-14 | Instruction-Driven Fusion of Infrared-Visible Images: Tailoring for Diverse Downstream Tasks | Zengyi Yang et.al. | 2411.09387 | null |
| 2024-11-14 | DT-JRD: Deep Transformer based Just Recognizable Difference Prediction Model for Video Coding for Machines | Junqi Liu et.al. | 2411.09308 | null |
| 2024-11-14 | Cross-Modal Consistency in Multimodal Large Language Models | Xiang Zhang et.al. | 2411.09273 | null |
| 2024-11-14 | LEAP:D – A Novel Prompt-based Approach for Domain-Generalized Aerial Object Detection | Chanyeong Park et.al. | 2411.09180 | null |
| 2024-11-13 | Multimodal Object Detection using Depth and Image Data for Manufacturing Parts | Nazanin Mahjourian et.al. | 2411.09062 | null |
| 2024-11-13 | DART-LLM: Dependency-Aware Multi-Robot Task Decomposition and Execution using Large Language Models | Yongdong Wang et.al. | 2411.09022 | link |
| 2024-11-13 | UIFormer: A Unified Transformer-based Framework for Incremental Few-Shot Object Detection and Instance Segmentation | Chengyuan Zhang et.al. | 2411.08569 | null |
| 2024-11-13 | Methodology for a Statistical Analysis of Influencing Factors on 3D Object Detection Performance | Anton Kuznietsov et.al. | 2411.08482 | null |
| 2024-11-13 | V2X-R: Cooperative LiDAR-4D Radar Fusion for 3D Object Detection with Denoising Diffusion | Xun Huang et.al. | 2411.08402 | link |
| 2024-11-12 | Large-scale Remote Sensing Image Target Recognition and Automatic Annotation | Wuzheng Dong et.al. | 2411.07802 | link |
| 2024-11-12 | Efficient 3D Perception on Multi-Sweep Point Cloud with Gumbel Spatial Pruning | Jianhao Li et.al. | 2411.07742 | null |
| 2024-11-12 | Depthwise Separable Convolutions with Deep Residual Convolutions | Md Arid Hasan et.al. | 2411.07544 | null |
| 2024-11-11 | Transformers for Charged Particle Track Reconstruction in High Energy Physics | Samuel Van Stroud et.al. | 2411.07149 | null |
| 2024-11-11 | Multi-scale Frequency Enhancement Network for Blind Image Deblurring | Yawen Xiang et.al. | 2411.06893 | null |
| 2024-11-11 | Fast and Efficient Transformer-based Method for Bird’s Eye View Instance Prediction | Miguel Antunes-García et.al. | 2411.06851 | link |
| 2024-11-11 | AV-PedAware: Self-Supervised Audio-Visual Fusion for Dynamic Pedestrian Awareness | Yizhuo Yang et.al. | 2411.06789 | null |
| 2024-11-11 | United Domain Cognition Network for Salient Object Detection in Optical Remote Sensing Images | Yanguang Sun et.al. | 2411.06703 | link |
| 2024-11-11 | Track Any Peppers: Weakly Supervised Sweet Pepper Tracking Using VLMs | Jia Syuen Lim et.al. | 2411.06702 | null |
| 2024-11-11 | LFSamba: Marry SAM with Mamba for Light Field Salient Object Detection | Zhengyi Liu et.al. | 2411.06652 | null |
| 2024-11-09 | Robust Detection of LLM-Generated Text: A Comparative Analysis | Yongye Su et.al. | 2411.06248 | null |
| 2024-11-09 | LSSInst: Improving Geometric Modeling in LSS-Based BEV Perception with Instance Representation | Weijie Ma et.al. | 2411.06173 | link |
| 2024-11-09 | AI-Compass: A Comprehensive and Effective Multi-module Testing Tool for AI Systems | Zhiyu Zhu et.al. | 2411.06146 | null |
| 2024-11-08 | Open-set object detection: towards unified problem formulation and benchmarking | Hejer Ammar et.al. | 2411.05564 | null |
| 2024-11-08 | ZOPP: A Framework of Zero-shot Offboard Panoptic Perception for Autonomous Driving | Tao Ma et.al. | 2411.05311 | null |
| 2024-11-08 | SimpleBEV: Improved LiDAR-Camera Fusion Architecture for 3D Object Detection | Yun Zhao et.al. | 2411.05292 | null |
| 2024-11-07 | On the Inherent Robustness of One-Stage Object Detection against Out-of-Distribution Data | Aitor Martinez-Seras et.al. | 2411.04586 | null |
| 2024-11-07 | l0-Regularized Sparse Coding-based Interpretable Network for Multi-Modal Image Fusion | Gargi Panda et.al. | 2411.04519 | null |
| 2024-11-07 | Pose2Trajectory: Using Transformers on Body Pose to Predict Tennis Player’s Trajectory | Ali K. AlShami et.al. | 2411.04501 | null |
| 2024-11-07 | SuperQ-GRASP: Superquadrics-based Grasp Pose Estimation on Larger Objects for Mobile-Manipulation | Xun Tu et.al. | 2411.04386 | null |
| 2024-11-07 | UEVAVD: A Dataset for Developing UAV’s Eye View Active Object Detection | Xinhua Jiang et.al. | 2411.04348 | null |
| 2024-11-07 | GazeGen: Gaze-Driven User Interaction for Visual Content Generation | He-Yen Hsieh et.al. | 2411.04335 | null |
| 2024-11-06 | An Enhancement of Haar Cascade Algorithm Applied to Face Recognition for Gate Pass Security | Clarence A. Antipona et.al. | 2411.03831 | null |
| 2024-11-06 | Understanding the Effects of Human-written Paraphrases in LLM-generated Text Detection | Hiu Ting Lau et.al. | 2411.03806 | link |
| 2024-11-06 | Efficient Fourier Filtering Network with Contrastive Learning for UAV-based Unaligned Bi-modal Salient Object Detection | Pengfei Lyu et.al. | 2411.03728 | link |
| 2024-11-06 | Estimation of Psychosocial Work Environment Exposures Through Video Object Detection. Proof of Concept Using CCTV Footage | Claus D. Hansen et.al. | 2411.03724 | null |
| 2024-11-06 | Hybrid Attention for Robust RGB-T Pedestrian Detection in Real-World Conditions | Arunkumar Rathinam et.al. | 2411.03576 | null |
| 2024-11-05 | An Application-Agnostic Automatic Target Recognition System Using Vision Language Models | Anthony Palladino et.al. | 2411.03491 | null |
| 2024-11-05 | Self-supervised cross-modality learning for uncertainty-aware object detection and recognition in applications which lack pre-labelled training data | Irum Mehboob et.al. | 2411.03082 | null |
| 2024-11-05 | CRT-Fusion: Camera, Radar, Temporal Fusion Using Motion Information for 3D Object Detection | Jisong Kim et.al. | 2411.03013 | null |
| 2024-11-05 | Centerness-based Instance-aware Knowledge Distillation with Task-wise Mutual Lifting for Object Detection on Drone Imagery | Bowei Du et.al. | 2411.02861 | null |
| 2024-11-05 | Correlation of Object Detection Performance with Visual Saliency and Depth Estimation | Matthias Bartolo et.al. | 2411.02844 | link |
| 2024-11-05 | ERUP-YOLO: Enhancing Object Detection Robustness for Adverse Weather Condition by Unified Image-Adaptive Processing | Yuka Ogino et.al. | 2411.02799 | null |
| 2024-11-05 | Real-Time Text Detection with Similar Mask in Traffic, Industrial, and Natural Scenes | Xu Han et.al. | 2411.02794 | link |
| 2024-11-05 | Efficient Feature Aggregation and Scale-Aware Regression for Monocular 3D Object Detection | Yifan Wang et.al. | 2411.02747 | null |
| 2024-11-05 | Analysis of Multi-epoch JWST Images of $\sim 300$ Little Red Dots: Tentative Detection of Variability in a Minority of Sources | Zijian Zhang et.al. | 2411.02729 | null |
| 2024-11-04 | Intelligent Video Recording Optimization using Activity Detection for Surveillance Systems | Youssef Elmir et.al. | 2411.02632 | null |
| 2024-11-04 | SIRA: Scalable Inter-frame Relation and Association for Radar Perception | Ryoma Yataka et.al. | 2411.02220 | null |
| 2024-11-04 | Advanced computer vision for extracting georeferenced vehicle trajectories from drone imagery | Robert Fonod et.al. | 2411.02136 | link |
| 2024-11-04 | Exploiting Unlabeled Data with Multiple Expert Teachers for Open Vocabulary Aerial Object Detection and Its Orientation Adaptation | Yan Li et.al. | 2411.02057 | link |
| 2024-11-04 | V-CAS: A Realtime Vehicle Anti Collision System Using Vision Transformer on Multi-Camera Streams | Muhammad Waqas Ashraf et.al. | 2411.01963 | null |
| 2024-11-04 | Exploiting Contextual Uncertainty of Visual Data for Efficient Training of Deep Models | Sharat Agarwal et.al. | 2411.01925 | null |
| 2024-11-04 | LiDAttack: Robust Black-box Attack on LiDAR-based Object Detection | Jinyin Chen et.al. | 2411.01889 | link |
| 2024-11-03 | ROAD-Waymo: Action Awareness at Scale for Autonomous Driving | Salman Khan et.al. | 2411.01683 | null |
| 2024-11-03 | OSAD: Open-Set Aircraft Detection in SAR Images | Xiayang Xiao et.al. | 2411.01597 | null |
| 2024-11-03 | One for All: Multi-Domain Joint Training for Point Cloud Based 3D Object Detection | Zhenyu Wang et.al. | 2411.01584 | null |
| 2024-11-03 | A Visual Question Answering Method for SAR Ship: Breaking the Requirement for Multimodal Dataset Construction and Model Fine-Tuning | Fei Wang et.al. | 2411.01445 | null |
| 2024-10-31 | ImOV3D: Learning Open-Vocabulary Point Clouds 3D Object Detection from Only 2D Images | Timing Yang et.al. | 2410.24001 | link |
| 2024-10-31 | Localization, balance and affinity: a stronger multifaceted collaborative salient object detector in remote sensing images | Yakun Xie et.al. | 2410.23991 | null |
| 2024-10-31 | Uncertainty Estimation for 3D Object Detection via Evidential Learning | Nikita Durasov et.al. | 2410.23910 | null |
| 2024-10-31 | From Web Data to Real Fields: Low-Cost Unsupervised Domain Adaptation for Agricultural Robots | Vasileios Tzouras et.al. | 2410.23906 | null |
| 2024-10-31 | Open-Set 3D object detection in LiDAR data as an Out-of-Distribution problem | Louis Soum-Fontez et.al. | 2410.23767 | null |
| 2024-10-31 | DetectRL: Benchmarking LLM-Generated Text Detection in Real-World Scenarios | Junchao Wu et.al. | 2410.23746 | link |
| 2024-10-31 | GigaCheck: Detecting LLM-generated Content | Irina Tolstykh et.al. | 2410.23728 | null |
| 2024-10-31 | Context-Aware Token Selection and Packing for Enhanced Vision Transformer | Tianyi Zhang et.al. | 2410.23608 | null |
| 2024-10-30 | EMMA: End-to-End Multimodal Model for Autonomous Driving | Jyh-Jing Hwang et.al. | 2410.23262 | null |
| 2024-10-30 | S3PT: Scene Semantics and Structure Guided Clustering to Boost Self-Supervised Pre-Training for Autonomous Driving | Maciej K. Wozniak et.al. | 2410.23085 | null |
| 2024-10-30 | First Place Solution to the ECCV 2024 ROAD++ Challenge @ ROAD++ Spatiotemporal Agent Detection 2024 | Tengfei Zhang et.al. | 2410.23077 | null |
| 2024-10-30 | AdaptiveISP: Learning an Adaptive Image Signal Processor for Object Detection | Yujin Wang et.al. | 2410.22939 | null |
| 2024-10-30 | YOLOv11 for Vehicle Detection: Advancements, Performance, and Applications in Intelligent Transportation Systems | Mujadded Al Rabbani Alif et.al. | 2410.22898 | null |
| 2024-10-29 | Unified Domain Generalization and Adaptation for Multi-View 3D Object Detection | Gyusam Chang et.al. | 2410.22461 | null |
| 2024-10-29 | Lighten CARAFE: Dynamic Lightweight Upsampling with Guided Reassemble Kernels | Ruigang Fu et.al. | 2410.22139 | link |
| 2024-10-29 | Data Generation for Hardware-Friendly Post-Training Quantization | Lior Dikstein et.al. | 2410.22110 | null |
| 2024-10-29 | Cognitive Semantic Augmentation LEO Satellite Networks for Earth Observation | Hong-fu Chou et.al. | 2410.21916 | null |
| 2024-10-29 | PK-YOLO: Pretrained Knowledge Guided YOLO for Brain Tumor Detection in Multiplanar MRI Slices | Ming Kang et.al. | 2410.21822 | link |
| 2024-10-28 | MVSDet: Multi-View Indoor 3D Object Detection via Efficient Plane Sweeps | Yating Xu et.al. | 2410.21566 | link |
| 2024-10-28 | TACO: Adversarial Camouflage Optimization on Trucks to Fool Object Detectors | Adonisz Dimitriu et.al. | 2410.21443 | null |
| 2024-10-28 | Joint Audio-Visual Idling Vehicle Detection with Streamlined Input Dependencies | Xiwen Li et.al. | 2410.21170 | null |
| 2024-10-28 | Synthetica: Large Scale Synthetic Data for Robot Perception | Ritvik Singh et.al. | 2410.21153 | null |
| 2024-10-28 | DeTeCtive: Detecting AI-generated Text via Multi-Level Contrastive Learning | Xun Guo et.al. | 2410.20964 | link |
| 2024-10-28 | IndraEye: Infrared Electro-Optical UAV-based Perception Dataset for Robust Downstream Tasks | Manjunath D et.al. | 2410.20953 | null |
| 2024-10-28 | SparseTem: Boosting the Efficiency of CNN-Based Video Encoders by Exploiting Temporal Continuity | Kunyun Wang et.al. | 2410.20790 | null |
| 2024-10-27 | Sebica: Lightweight Spatial and Efficient Bidirectional Channel Attention Super Resolution Network | Chongxiao Liu et.al. | 2410.20546 | null |
| 2024-10-27 | Guidance Disentanglement Network for Optics-Guided Thermal UAV Image Super-Resolution | Zhicheng Zhao et.al. | 2410.20466 | link |
| 2024-10-27 | Open-Vocabulary Object Detection via Language Hierarchy | Jiaxing Huang et.al. | 2410.20371 | null |
| 2024-10-27 | Historical Test-time Prompt Tuning for Vision Foundation Models | Jingyi Zhang et.al. | 2410.20346 | null |
| 2024-10-25 | OReole-FM: successes and challenges toward billion-parameter foundation models for high-resolution satellite imagery | Philipe Dias et.al. | 2410.19965 | null |
| 2024-10-25 | MetaTrading: An Immersion-Aware Model Trading Framework for Vehicular Metaverse Services | Hongjia Wu et.al. | 2410.19665 | null |
| 2024-10-25 | Frozen-DETR: Enhancing DETR with Image Understanding from Frozen Foundation Models | Shenghao Fu et.al. | 2410.19635 | null |
| 2024-10-25 | MonoDGP: Monocular 3D Object Detection with Decoupled-Query and Geometry-Error Priors | Fanqi Pu et.al. | 2410.19590 | link |
| 2024-10-25 | DECADE: Towards Designing Efficient-yet-Accurate Distance Estimation Modules for Collision Avoidance in Mobile Advanced Driver Assistance Systems | Muhammad Zaeem Shahzad et.al. | 2410.19336 | null |
| 2024-10-25 | In-Simulation Testing of Deep Learning Vision Models in Autonomous Robotic Manipulators | Dmytro Humeniuk et.al. | 2410.19277 | null |
| 2024-10-24 | HUE Dataset: High-Resolution Event and Frame Sequences for Low-Light Vision | Burak Ercan et.al. | 2410.19164 | null |
| 2024-10-24 | Optimizing Edge Offloading Decisions for Object Detection | Jiaming Qiu et.al. | 2410.18919 | link |
| 2024-10-24 | You Only Look Around: Learning Illumination Invariant Feature for Low-light Object Detection | Mingbo Hong et.al. | 2410.18398 | null |
| 2024-10-24 | Thermal Chameleon: Task-Adaptive Tone-mapping for Radiometric Thermal-Infrared images | Dong-Guw Lee et.al. | 2410.18340 | link |
| 2024-10-23 | KhmerST: A Low-Resource Khmer Scene Text Detection and Recognition Benchmark | Vannkinh Nom et.al. | 2410.18277 | null |
| 2024-10-23 | Automated Defect Detection and Grading of Piarom Dates Using Deep Learning | Nasrin Azimi et.al. | 2410.18208 | null |
| 2024-10-23 | DREB-Net: Dual-stream Restoration Embedding Blur-feature Fusion Network for High-mobility UAV Object Detection | Qingpeng Li et.al. | 2410.17822 | link |
| 2024-10-23 | YOLO-Vehicle-Pro: A Cloud-Edge Collaborative Framework for Object Detection in Autonomous Driving under Adverse Weather Conditions | Xiguang Li et.al. | 2410.17734 | null |
| 2024-10-23 | YOLOv11: An Overview of the Key Architectural Enhancements | Rahima Khanam et.al. | 2410.17725 | null |
| 2024-10-23 | PlantCamo: Plant Camouflage Detection | Jinyu Yang et.al. | 2410.17598 | link |
| 2024-10-23 | OVT-B: A New Large-Scale Benchmark for Open-Vocabulary Multi-Object Tracking | Haiji Liang et.al. | 2410.17534 | link |
| 2024-10-22 | EPContrast: Effective Point-level Contrastive Learning for Large-scale Point Cloud Understanding | Zhiyi Pan et.al. | 2410.17207 | null |
| 2024-10-22 | YOLO-TS: Real-Time Traffic Sign Detection with Enhanced Accuracy Using Optimized Receptive Fields and Anchor-Free Fusion | Junzhou Chen et.al. | 2410.17144 | null |
| 2024-10-22 | FlightAR: AR Flight Assistance Interface with Multiple Video Streams and Object Detection Aimed at Immersive Drone Control | Oleg Sautenkov et.al. | 2410.16943 | null |
| 2024-10-22 | AttriPrompter: Auto-Prompting with Attribute Semantics for Zero-shot Nuclei Detection via Visual-Language Pre-trained Models | Yongjian Wu et.al. | 2410.16820 | link |
| 2024-10-22 | DSORT-MCU: Detecting Small Objects in Real-Time on Microcontroller Units | Liam Boyle et.al. | 2410.16769 | null |
| 2024-10-22 | DI-MaskDINO: A Joint Object Detection and Instance Segmentation Model | Zhixiong Nan et.al. | 2410.16707 | null |
| 2024-10-22 | Fire and Smoke Detection with Burning Intensity Representation | Xiaoyi Han et.al. | 2410.16642 | link |
| 2024-10-21 | Griffon-G: Bridging Vision-Language and Vision-Centric Tasks via Large Multimodal Models | Yufei Zhan et.al. | 2410.16163 | link |
| 2024-10-21 | Multi-Sensor Fusion for UAV Classification Based on Feature Maps of Image and Radar Data | Nikos Sakellariou et.al. | 2410.16089 | null |
| 2024-10-21 | Few-shot target-driven instance detection based on open-vocabulary object detection models | Ben Crulis et.al. | 2410.16028 | null |
| 2024-10-21 | How Important are Data Augmentations to Close the Domain Gap for Object Detection in Orbit? | Maximilian Ulmer et.al. | 2410.15766 | null |
| 2024-10-21 | P-YOLOv8: Efficient and Accurate Real-Time Detection of Distracted Driving | Mohamed R. Elshamy et.al. | 2410.15602 | null |
| 2024-10-21 | Deep Learning and Machine Learning – Object Detection and Semantic Segmentation: From Theory to Applications | Jintao Ren et.al. | 2410.15584 | null |
| 2024-10-21 | Online Pseudo-Label Unified Object Detection for Multiple Datasets Training | XiaoJun Tang et.al. | 2410.15569 | null |
| 2024-10-20 | TrackMe:A Simple and Effective Multiple Object Tracking Annotation Tool | Thinh Phan et.al. | 2410.15518 | null |
| 2024-10-20 | YOLO-RD: Introducing Relevant and Compact Explicit Knowledge to YOLO by Retriever-Dictionary | Hao-Tang Tsui et.al. | 2410.15346 | null |
| 2024-10-20 | Open-vocabulary vs. Closed-set: Best Practice for Few-shot Object Detection Considering Text Describability | Yusuke Hosoya et.al. | 2410.15315 | null |
| 2024-10-18 | MultiOrg: A Multi-rater Organoid-detection Dataset | Christina Bukas et.al. | 2410.14612 | null |
| 2024-10-18 | Beyond Binary: Towards Fine-Grained LLM-Generated Text Detection via Role Recognition and Involvement Measurement | Zihao Cheng et.al. | 2410.14259 | null |
| 2024-10-18 | Multi-Source Spatial Knowledge Understanding for Immersive Visual Text-to-Speech | Shuwei He et.al. | 2410.14101 | link |
| 2024-10-18 | Enhancing In-vehicle Multiple Object Tracking Systems with Embeddable Ising Machines | Kosuke Tatsumura et.al. | 2410.14093 | null |
| 2024-10-17 | FaceSaliencyAug: Mitigating Geographic, Gender and Stereotypical Biases via Saliency-Based Data Augmentation | Teerath Kumar et.al. | 2410.14070 | null |
| 2024-10-17 | Spatiotemporal Object Detection for Improved Aerial Vehicle Detection in Traffic Monitoring | Kristina Telegraph et.al. | 2410.13616 | null |
| 2024-10-17 | RemoteDet-Mamba: A Hybrid Mamba-CNN Network for Multi-modal Object Detection in Remote Sensing Images | Kejun Ren et.al. | 2410.13532 | null |
| 2024-10-16 | Syn2Real Domain Generalization for Underwater Mine-like Object Detection Using Side-Scan Sonar | Aayush Agrawal et.al. | 2410.12953 | null |
| 2024-10-16 | MambaBEV: An efficient 3D detection model with Mamba2 | Zihan You et.al. | 2410.12673 | null |
| 2024-10-16 | On the Risk of Evidence Pollution for Malicious Social Text Detection in the Era of LLMs | Herun Wan et.al. | 2410.12600 | null |
| 2024-10-16 | Cocoon: Robust Multi-Modal Perception with Uncertainty-Aware Sensor Fusion | Minkyoung Cho et.al. | 2410.12592 | null |
| 2024-10-16 | Feature Augmentation for Self-supervised Contrastive Learning: A Closer Look | Yong Zhang et.al. | 2410.12396 | null |
| 2024-10-16 | Real-time Stereo-based 3D Object Detection for Streaming Perception | Changcai Li et.al. | 2410.12394 | link |
| 2024-10-16 | Context-Infused Visual Grounding for Art | Selina Khan et.al. | 2410.12369 | link |
| 2024-10-16 | Fusion from Decomposition: A Self-Supervised Approach for Image Fusion and Beyond | Pengwei Liang et.al. | 2410.12274 | null |
| 2024-10-16 | Optimizing YOLOv5s Object Detection through Knowledge Distillation algorithm | Guanming Huang et.al. | 2410.12259 | null |
| 2024-10-16 | SAM-Guided Masked Token Prediction for 3D Scene Understanding | Zhimin Chen et.al. | 2410.12158 | null |
| 2024-10-16 | Unveiling the Limits of Alignment: Multi-modal Dynamic Local Fusion Network and A Benchmark for Unaligned RGBT Video Object Detection | Qishun Wang et.al. | 2410.12143 | null |
| 2024-10-15 | Fractal Calibration for long-tailed object detection | Konstantinos Panagiotis Alexandridis et.al. | 2410.11774 | null |
| 2024-10-15 | POLO – Point-based, multi-class animal detection | Giacomo May et.al. | 2410.11741 | null |
| 2024-10-15 | YOLO-ELA: Efficient Local Attention Modeling for High-Performance Real-Time Insulator Defect Detection | Olalekan Akindele et.al. | 2410.11727 | null |
| 2024-10-15 | SeaDATE: Remedy Dual-Attention Transformer with Semantic Alignment via Contrast Learning for Multimodal Object Detection | Shuhan Dong et.al. | 2410.11358 | null |
| 2024-10-15 | Open World Object Detection: A Survey | Yiming Li et.al. | 2410.11301 | null |
| 2024-10-15 | Representation Similarity: A Better Guidance of DNN Layer Sharing for Edge Computing without Training | Bryan Bo Cao et.al. | 2410.11233 | null |
| 2024-10-15 | TEOcc: Radar-camera Multi-modal Occupancy Prediction via Temporal Enhancement | Zhiwei Lin et.al. | 2410.11228 | null |
| 2024-10-15 | CVCP-Fusion: On Implicit Depth Estimation for 3D Bounding Box Prediction | Pranav Gupta et.al. | 2410.11211 | link |
| 2024-10-15 | Multiview Scene Graph | Juexiao Zhang et.al. | 2410.11187 | null |
| 2024-10-14 | UAV3D: A Large-scale 3D Perception Benchmark for Unmanned Aerial Vehicles | Hui Ye et.al. | 2410.11125 | null |
| 2024-10-14 | ROSAR: An Adversarial Re-Training Framework for Robust Side-Scan Sonar Object Detection | Martin Aubard et.al. | 2410.10554 | link |
| 2024-10-14 | Learning to Ground VLMs without Forgetting | Aritra Bhowmik et.al. | 2410.10491 | null |
| 2024-10-14 | SMART-TRACK: A Novel Kalman Filter-Guided Sensor Fusion For Robust UAV Object Tracking in Dynamic Environments | Khaled Gabr et.al. | 2410.10409 | null |
| 2024-10-14 | V2M: Visual 2-Dimensional Mamba for Image Representation Learning | Chengkun Wang et.al. | 2410.10382 | link |
| 2024-10-14 | GlobalMamba: Global Image Serialization for Vision Mamba | Chengkun Wang et.al. | 2410.10316 | link |
| 2024-10-14 | ROA-BEV: 2D Region-Oriented Attention for BEV-based 3D Object | Jiwei Chen et.al. | 2410.10298 | null |
| 2024-10-14 | Out-of-Bounding-Box Triggers: A Stealthy Approach to Cheat Object Detectors | Tao Lin et.al. | 2410.10091 | link |
| 2024-10-15 | Optimizing Waste Management with Advanced Object Detection for Garbage Classification | Everest Z. Kuang et.al. | 2410.09975 | null |
| 2024-10-13 | EITNet: An IoT-Enhanced Framework for Real-Time Basketball Action Recognition | Jingyu Liu et.al. | 2410.09954 | null |
| 2024-10-13 | LoLI-Street: Benchmarking Low-Light Image Enhancement and Beyond | Md Tanvir Islam et.al. | 2410.09831 | link |
| 2024-10-11 | DA-Ada: Learning Domain-Aware Adapter for Domain Adaptive Object Detection | Haochen Li et.al. | 2410.09004 | null |
| 2024-10-11 | LIME-Eval: Rethinking Low-light Image Enhancement Evaluation via Object Detection | Mingjia Li et.al. | 2410.08810 | null |
| 2024-10-11 | Hespi: A pipeline for automatically detecting information from hebarium specimen sheets | Robert Turnbull et.al. | 2410.08740 | null |
| 2024-10-11 | MMLF: Multi-modal Multi-class Late Fusion for Object Detection with Uncertainty Estimation | Qihang Yang et.al. | 2410.08739 | null |
| 2024-10-11 | Boosting Open-Vocabulary Object Detection by Handling Background Samples | Ruizhe Zeng et.al. | 2410.08645 | null |
| 2024-10-11 | DeBiFormer: Vision Transformer with Deformable Agent Bi-level Routing Attention | Nguyen Huu Bao Long et.al. | 2410.08582 | link |
| 2024-10-11 | VOVTrack: Exploring the Potentiality in Videos for Open-Vocabulary Object Tracking | Zekun Qian et.al. | 2410.08529 | null |
| 2024-10-10 | Are We Ready for Real-Time LiDAR Semantic Segmentation in Autonomous Driving? | Samir Abou Haidar et.al. | 2410.08365 | null |
| 2024-10-10 | PointOBB-v2: Towards Simpler, Faster, and Stronger Single Point Supervised Oriented Object Detection | Botao Ren et.al. | 2410.08210 | null |
| 2024-10-10 | Robust AI-Generated Text Detection by Restricted Embeddings | Kristian Kuznetsov et.al. | 2410.08113 | null |
| 2024-10-10 | Dynamic Object Catching with Quadruped Robot Front Legs | André Schakkal et.al. | 2410.08065 | null |
| 2024-10-10 | HeightFormer: A Semantic Alignment Monocular 3D Object Detection Method from Roadside Perspective | Pei Liu et.al. | 2410.07758 | null |
| 2024-10-10 | O1O: Grouping of Known Classes to Identify Unknown Objects as Odd-One-Out | Mısra Yavuz et.al. | 2410.07514 | null |
| 2024-10-09 | Progressive Multi-Modal Fusion for Robust 3D Object Detection | Rohit Mohan et.al. | 2410.07475 | null |
| 2024-10-09 | Self-Supervised Learning for Real-World Object Detection: a Survey | Alina Ciocarlan et.al. | 2410.07442 | null |
| 2024-10-09 | Robust infrared small target detection using self-supervised and a contrario paradigms | Alina Ciocarlan et.al. | 2410.07437 | null |
| 2024-10-09 | SurANet: Surrounding-Aware Network for Concealed Object Detection via Highly-Efficient Interactive Contrastive Learning Strategy | Yuhan Kang et.al. | 2410.06842 | link |
| 2024-10-09 | Rethinking the Evaluation of Visible and Infrared Image Fusion | Dayan Guan et.al. | 2410.06811 | link |
| 2024-10-09 | QuadMamba: Learning Quadtree-based Selective Scan for Visual State Space Model | Fei Xie et.al. | 2410.06806 | null |
| 2024-10-09 | QuadBEV: An Efficient Quadruple-Task Perception Framework via Bird’s-Eye-View Representation | Yuxin Li et.al. | 2410.06516 | null |
| 2024-10-08 | Adver-City: Open-Source Multi-Modal Dataset for Collaborative Perception Under Adverse Weather Conditions | Mateus Karvat et.al. | 2410.06380 | null |
| 2024-10-08 | Toward Scalable Image Feature Compression: A Content-Adaptive and Diffusion-Based Approach | Sha Guo et.al. | 2410.06149 | null |
| 2024-10-08 | Training-free LLM-generated Text Detection by Mining Token Probability Sequences | Yihuai Xu et.al. | 2410.06072 | null |
| 2024-10-08 | Training-Free Open-Ended Object Detection and Segmentation via Attention as Prompts | Zhiwei Lin et.al. | 2410.05963 | null |
| 2024-10-08 | Learning Gaussian Data Augmentation in Feature Space for One-shot Object Detection in Manga | Takara Taniguchi et.al. | 2410.05935 | null |
| 2024-10-08 | Unobserved Object Detection using Generative Models | Subhransu S. Bhattacharjee et.al. | 2410.05869 | null |
| 2024-10-07 | Real-Time Truly-Coupled Lidar-Inertial Motion Correction and Spatiotemporal Dynamic Object Detection | Cedric Le Gentil et.al. | 2410.05152 | null |
| 2024-10-07 | Human-in-the-loop Reasoning For Traffic Sign Detection: Collaborative Approach Yolo With Video-llava | Mehdi Azarafza et.al. | 2410.05096 | null |
| 2024-10-07 | Improving Object Detection via Local-global Contrastive Learning | Danai Triantafyllidou et.al. | 2410.05058 | null |
| 2024-10-07 | Windshield Integration of Thermal and Color Fusion for Automatic Emergency Braking in Low Visibility Conditions | Gabriel Jobert et.al. | 2410.04928 | null |
| 2024-10-07 | Improved detection of discarded fish species through BoxAL active learning | Maria Sokolova et.al. | 2410.04880 | link |
| 2024-10-06 | Learning De-Biased Representations for Remote-Sensing Imagery | Zichen Tian et.al. | 2410.04546 | link |
| 2024-10-05 | AI as Humanity’s Salieri: Quantifying Linguistic Creativity of Language Models via Systematic Attribution of Machine Text against Web Text | Ximing Lu et.al. | 2410.04265 | null |
| 2024-10-05 | ETHcavation: A Dataset and Pipeline for Panoptic Scene Understanding and Object Tracking in Dynamic Construction Environments | Lorenzo Terenzi et.al. | 2410.04250 | null |
| 2024-10-05 | Fast Object Detection with a Machine Learning Edge Device | Richard C. Rodriguez et.al. | 2410.04173 | null |
| 2024-10-05 | Robust Task-Oriented Communication Framework for Real-Time Collaborative Vision Perception | Zhengru Fang et.al. | 2410.04168 | null |
| 2024-10-04 | DRAFTS: A Deep Learning-Based Radio Fast Transient Search Pipeline | Yong-Kun Zhang et.al. | 2410.03200 | null |
| 2024-10-03 | Is Your Paper Being Reviewed by an LLM? Investigating AI Text Detectability in Peer Review | Sungduk Yu et.al. | 2410.03019 | null |
| 2024-10-04 | Learning 3D Perception from Others’ Predictions | Jinsu Yoo et.al. | 2410.02646 | null |
| 2024-10-02 | Enhancing Screen Time Identification in Children with a Multi-View Vision Language Model and Screen Time Tracker | Xinlong Hou et.al. | 2410.01966 | null |
| 2024-10-02 | 3DGS-DET: Empower 3D Gaussian Splatting with Boundary Guidance and Box-Focused Sampling for 3D Object Detection | Yang Cao et.al. | 2410.01647 | link |
| 2024-10-02 | Gaussian-Det: Learning Closed-Surface Gaussians for 3D Object Detection | Hongru Yan et.al. | 2410.01404 | null |
| 2024-10-02 | Finetuning Pre-trained Model with Limited Data for LiDAR-based 3D Object Detection by Bridging Domain Gaps | Jiyun Jang et.al. | 2410.01319 | null |
| 2024-10-02 | Panopticus: Omnidirectional 3D Object Detection on Resource-constrained Edge Devices | Jeho Lee et.al. | 2410.01270 | null |
| 2024-10-02 | High and Low Resolution Tradeoffs in Roadside Multimodal Sensing | Shaozu Ding et.al. | 2410.01250 | null |
| 2024-10-02 | Perceptual Piercing: Human Visual Cue-based Object Detection in Low Visibility Conditions | Ashutosh Kumar et.al. | 2410.01225 | link |
| 2024-10-02 | A versatile machine learning workflow for high-throughput analysis of supported metal catalyst particles | Arda Genc et.al. | 2410.01213 | link |
| 2024-10-01 | Synthetic imagery for fuzzy object detection: A comparative study | Siavash H. Khajavi et.al. | 2410.01124 | null |
| 2024-10-01 | Generating Seamless Virtual Immunohistochemical Whole Slide Images with Content and Color Consistency | Sitong Liu et.al. | 2410.01072 | null |
| 2024-10-01 | ARPOV: Expanding Visualization of Object Detection in AR with Panoramic Mosaic Stitching | Erin McGowan et.al. | 2410.01055 | null |
| 2024-09-30 | Accelerating Non-Maximum Suppression: A Graph Theory Perspective | King-Siong Si et.al. | 2409.20520 | link |
| 2024-09-30 | NUTRIVISION: A System for Automatic Diet Management in Smart Healthcare | Madhumita Veeramreddy et.al. | 2409.20508 | null |
| 2024-09-30 | Navigating Threats: A Survey of Physical Adversarial Attacks on LiDAR Perception Systems in Autonomous Vehicles | Amira Guesmi et.al. | 2409.20426 | null |
| 2024-09-30 | Training a Computer Vision Model for Commercial Bakeries with Primarily Synthetic Images | Thomas H. Schmitt et.al. | 2409.20122 | null |
| 2024-09-30 | GearTrack: Automating 6D Pose Estimation | Yu Deng et.al. | 2409.19986 | null |
| 2024-09-30 | TSdetector: Temporal-Spatial Self-correction Collaborative Learning for Colonoscopy Video Detection | Kaini Wang et.al. | 2409.19983 | null |
| 2024-09-30 | DAOcc: 3D Object Detection Assisted Multi-Sensor Fusion for 3D Occupancy Prediction | Zhen Yang et.al. | 2409.19972 | link |
| 2024-09-30 | HazyDet: Open-source Benchmark for Drone-view Object Detection with Depth-cues in Hazy Scenes | Changfeng Feng et.al. | 2409.19833 | link |
| 2024-09-29 | Applying the Lower-Biased Teacher Model in Semi-Suepervised Object Detection | Shuang Wang et.al. | 2409.19703 | null |
| 2024-09-29 | OrientedFormer: An End-to-End Transformer-Based Oriented Object Detector in Remote Sensing Images | Jiaqi Zhao et.al. | 2409.19648 | link |
| 2024-09-27 | Spectral Wavelet Dropout: Regularization in the Wavelet Domain | Rinor Cakaj et.al. | 2409.18951 | null |
| 2024-09-27 | MCUBench: A Benchmark of Tiny Object Detectors on MCUs | Sudhakar Sah et.al. | 2409.18866 | link |
| 2024-09-27 | A Novel Unified Architecture for Low-Shot Counting by Detection and Segmentation | Jer Pelhan et.al. | 2409.18686 | null |
| 2024-09-27 | Query matching for spatio-temporal action detection with query-based object detector | Shimon Hori et.al. | 2409.18408 | null |
| 2024-09-26 | Efficient Microscopic Image Instance Segmentation for Food Crystal Quality Control | Xiaoyu Ji et.al. | 2409.18291 | null |
| 2024-09-26 | Advancing Object Detection in Transportation with Multimodal Large Language Models (MLLMs): A Comprehensive Review and Empirical Testing | Huthaifa I. Ashqar et.al. | 2409.18286 | null |
| 2024-09-26 | GSON: A Group-based Social Navigation Framework with Large Multimodal Model | Shangyi Luo et.al. | 2409.18084 | null |
| 2024-09-27 | A New Dataset for Monocular Depth Estimation Under Viewpoint Shifts | Aurel Pjetri et.al. | 2409.17851 | null |
| 2024-09-26 | Scene Understanding in Pick-and-Place Tasks: Analyzing Transformations Between Initial and Final Scenes | Seraj Ghasemi et.al. | 2409.17720 | null |
| 2024-09-26 | SLO-Aware Task Offloading within Collaborative Vehicle Platoons | Boris Sedlak et.al. | 2409.17667 | null |
| 2024-09-26 | CAMOT: Camera Angle-aware Multi-Object Tracking | Felix Limanta et.al. | 2409.17533 | null |
| 2024-09-25 | Transient Adversarial 3D Projection Attacks on Object Detection in Autonomous Driving | Ce Zhou et.al. | 2409.17403 | null |
| 2024-09-25 | AgRegNet: A Deep Regression Network for Flower and Fruit Density Estimation, Localization, and Counting in Orchards | Uddhav Bhattarai et.al. | 2409.17400 | null |
| 2024-09-25 | Energy-Efficient & Real-Time Computer Vision with Intelligent Skipping via Reconfigurable CMOS Image Sensors | Md Abdullah-Al Kaiser et.al. | 2409.17341 | null |
| 2024-09-25 | BitQ: Tailoring Block Floating Point Precision for Improved DNN Efficiency on Resource-Constrained Devices | Yongqi Xu et.al. | 2409.17093 | link |
| 2024-09-25 | EventHDR: from Event to High-Speed HDR Videos and Beyond | Yunhao Zou et.al. | 2409.17029 | null |
| 2024-09-25 | Focus Entirety and Perceive Environment for Arbitrary-Shaped Text Detection | Xu Han et.al. | 2409.16827 | null |
| 2024-09-25 | XAI-guided Insulator Anomaly Detection for Imbalanced Datasets | Maximilian Andreas Hoefler et.al. | 2409.16821 | null |
| 2024-09-25 | Spotlight Text Detector: Spotlight on Candidate Regions Like a Camera | Xu Han et.al. | 2409.16820 | null |
| 2024-09-25 | Benchmarking Deep Learning Models for Object Detection on Edge Computing Devices | Daghash K. Alqahtani et.al. | 2409.16808 | null |
| 2024-09-25 | Pix2Next: Leveraging Vision Foundation Models for RGB to NIR Image Translation | Youngwan Jin et.al. | 2409.16706 | null |
| 2024-09-25 | TSBP: Improving Object Detection in Histology Images via Test-time Self-guided Bounding-box Propagation | Tingting Yang et.al. | 2409.16678 | link |
| 2024-09-25 | Source-Free Domain Adaptation for YOLO Object Detection | Simon Varailhon et.al. | 2409.16538 | null |
| 2024-09-24 | Real-Time Detection of Electronic Components in Waste Printed Circuit Boards: A Transformer-Based Approach | Muhammad Mohsin et.al. | 2409.16496 | null |
| 2024-09-24 | Tiny Robotics Dataset and Benchmark for Continual Object Detection | Francesco Pasti et.al. | 2409.16215 | link |
| 2024-09-24 | Seeing Faces in Things: A Model and Dataset for Pareidolia | Mark Hamilton et.al. | 2409.16143 | null |
| 2024-09-24 | HA-FGOVD: Highlighting Fine-grained Attributes via Explicit Linear Composition for Open-Vocabulary Object Detection | Yuqi Ma et.al. | 2409.16136 | null |
| 2024-09-24 | Neuromorphic Drone Detection: an Event-RGB Multimodal Approach | Gabriele Magrini et.al. | 2409.16099 | null |
| 2024-09-24 | Open-World Object Detection with Instance Representation Learning | Sunoh Lee et.al. | 2409.16073 | null |
| 2024-09-24 | Towards Robust Object Detection: Identifying and Removing Backdoors via Module Inconsistency Analysis | Xianda Zhang et.al. | 2409.16057 | null |
| 2024-09-24 | Zero-Shot Detection of AI-Generated Images | Davide Cozzolino et.al. | 2409.15875 | null |
| 2024-09-24 | Automated Assessment of Multimodal Answer Sheets in the STEM domain | Rajlaxmi Patil et.al. | 2409.15749 | null |
| 2024-09-24 | Real-Time Pedestrian Detection on IoT Edge Devices: A Lightweight Deep Learning Approach | Muhammad Dany Alfikri et.al. | 2409.15740 | null |
| 2024-09-24 | PDT: Uav Target Detection Dataset for Pests and Diseases Tree | Mingle Zhou et.al. | 2409.15679 | link |
| 2024-09-18 | Applications of Knowledge Distillation in Remote Sensing: A Survey | Yassine Himeur et.al. | 2409.12111 | null |
| 2024-09-18 | Agglomerative Token Clustering | Joakim Bruslund Haurum et.al. | 2409.11923 | null |
| 2024-09-18 | RockTrack: A 3D Robust Multi-Camera-Ken Multi-Object Tracking Framework | Xiaoyu Li et.al. | 2409.11749 | null |
| 2024-09-17 | Open-Set Semantic Uncertainty Aware Metric-Semantic Graph Matching | Kurran Singh et.al. | 2409.11555 | null |
| 2024-09-17 | VALO: A Versatile Anytime Framework for LiDAR-based Object Detection Deep Neural Networks | Ahmet Soyyigit et.al. | 2409.11542 | link |
| 2024-09-17 | STCMOT: Spatio-Temporal Cohesion Learning for UAV-Based Multiple Object Tracking | Jianbo Ma et.al. | 2409.11234 | link |
| 2024-09-19 | Vision foundation models: can they be applied to astrophysics data? | E. Lastufka et.al. | 2409.11175 | null |
| 2024-09-17 | UltimateDO: An Efficient Framework to Marry Occupancy Prediction with 3D Object Detection via Channel2height | Zichen Yu et.al. | 2409.11160 | null |
| 2024-09-17 | Unleashing the Potential of Mamba: Boosting a LiDAR 3D Sparse Detector by Using Cross-Model Knowledge Distillation | Rui Yu et.al. | 2409.11018 | null |
| 2024-09-17 | TrajSSL: Trajectory-Enhanced Semi-Supervised 3D Object Detection | Philip Jacobson et.al. | 2409.10901 | null |
| 2024-09-18 | Context-Dependent Interactable Graphical User Interface Element Detection for Spatial Computing Applications | Shuqing Li et.al. | 2409.10811 | null |
| 2024-09-16 | Online Learning via Memory: Retrieval-Augmented Detector Adaptation | Yanan Jian et.al. | 2409.10716 | null |
| 2024-09-16 | CoMamba: Real-time Cooperative Perception Unlocked with State Space Models | Jinlong Li et.al. | 2409.10699 | null |
| 2024-09-16 | Point2Graph: An End-to-end Point Cloud-based 3D Open-Vocabulary Scene Graph for Robot Navigation | Yifan Xu et.al. | 2409.10350 | null |
| 2024-09-16 | Performance of Human Annotators in Object Detection and Segmentation of Remotely Sensed Data | Roni Blushtein-Livnon et.al. | 2409.10272 | null |
| 2024-09-16 | Self-Updating Vehicle Monitoring Framework Employing Distributed Acoustic Sensing towards Real-World Settings | Xi Wang et.al. | 2409.10259 | null |
| 2024-09-16 | DAE-Fuse: An Adaptive Discriminative Autoencoder for Multi-Modality Image Fusion | Yuchen Guo et.al. | 2409.10080 | null |
| 2024-09-16 | Towards Physically-Realizable Adversarial Attacks in Embodied Vision Navigation | Meng Chen et.al. | 2409.10071 | link |
| 2024-09-16 | LithoHoD: A Litho Simulator-Powered Framework for IC Layout Hotspot Detection | Hao-Chiang Shao et.al. | 2409.10021 | null |
| 2024-09-16 | Comprehensive Study on Sentiment Analysis: From Rule-based to modern LLM based system | Shailja Gupta et.al. | 2409.09989 | null |
| 2024-09-15 | Tracking Virtual Meetings in the Wild: Re-identification in Multi-Participant Virtual Meetings | Oriel Perl et.al. | 2409.09841 | null |
| 2024-09-15 | Template-based Multi-Domain Face Recognition | Anirudh Nanduri et.al. | 2409.09832 | null |
| 2024-09-15 | PersonaMark: Personalized LLM watermarking for model protection and user attribution | Yuehan Zhang et.al. | 2409.09739 | null |
| 2024-09-13 | Interactive Masked Image Modeling for Multimodal Object Detection in Remote Sensing | Minh-Duc Vu et.al. | 2409.08885 | null |
| 2024-09-13 | Direct-CP: Directed Collaborative Perception for Connected and Autonomous Vehicles via Proactive Attention | Yihang Tao et.al. | 2409.08840 | null |
| 2024-09-13 | RT-DETRv3: Real-time End-to-End Object Detection with Hierarchical Dense Positive Supervision | Shuo Wang et.al. | 2409.08475 | null |
| 2024-09-12 | X-ray Fluoroscopy Guided Localization and Steering of Medical Microrobots through Virtual Enhancement | Husnu Halid Alabay et.al. | 2409.08337 | null |
| 2024-09-12 | What is YOLOv9: An In-Depth Exploration of the Internal Features of the Next-Generation Object Detector | Muhammad Yaseen et.al. | 2409.07813 | null |
| 2024-09-11 | Object Depth and Size Estimation using Stereo-vision and Integration with SLAM | Layth Hamad et.al. | 2409.07623 | null |
| 2024-09-11 | Zero-Shot Machine-Generated Text Detection Using Mixture of Large Language Models | Matthieu Dubois et.al. | 2409.07615 | null |
| 2024-09-11 | ENACT: Entropy-based Clustering of Attention Input for Improving the Computational Performance of Object Detection Transformers | Giorgos Savathrakis et.al. | 2409.07541 | link |
| 2024-09-11 | Watchlist Challenge: 3rd Open-set Face Detection and Identification | Furkan Kasım et.al. | 2409.07220 | null |
| 2024-09-11 | SCLNet: A Scale-Robust Complementary Learning Network for Object Detection in UAV Images | Xuexue Li et.al. | 2409.07024 | null |
| 2024-09-11 | ODYSSEE: Oyster Detection Yielded by Sensor Systems on Edge Electronics | Xiaomin Lin et.al. | 2409.07003 | null |
| 2024-09-11 | Brain-Inspired Stepwise Patch Merging for Vision Transformers | Yonghao Yu et.al. | 2409.06963 | null |
| 2024-09-10 | Cross-Modal Self-Supervised Learning with Effective Contrastive Units for LiDAR Point Clouds | Mu Cai et.al. | 2409.06827 | link |
| 2024-09-10 | Technical Report of Mobile Manipulator Robot for Industrial Environments | Erfan Amoozad Khalili et.al. | 2409.06693 | null |
| 2024-09-10 | A comprehensive study on Blood Cancer detection and classification using Convolutional Neural Network | Md Taimur Ahad et.al. | 2409.06689 | null |
| 2024-09-10 | When to Extract ReID Features: A Selective Approach for Improved Multiple Object Tracking | Emirhan Bayar et.al. | 2409.06617 | link |
| 2024-09-10 | Transtreaming: Adaptive Delay-aware Transformer for Real-time Streaming Perception | Xiang Zhang et.al. | 2409.06584 | null |
| 2024-09-10 | Semi-Supervised 3D Object Detection with Chanel Augmentation using Transformation Equivariance | Minju Kang et.al. | 2409.06583 | null |
| 2024-09-10 | Knowledge Distillation via Query Selection for Detection Transformer | Yi Liu et.al. | 2409.06443 | null |
| 2024-09-10 | An Attribute-Enriched Dataset and Auto-Annotated Pipeline for Open Detection | Pengfei Qi et.al. | 2409.06300 | null |
| 2024-09-09 | Replay Consolidation with Label Propagation for Continual Object Detection | Riccardo De Monte et.al. | 2409.05650 | null |
| 2024-09-09 | Renormalized Connection for Scale-preferred Object Detection in Satellite Imagery | Fan Zhang et.al. | 2409.05624 | null |
| 2024-09-09 | LEROjD: Lidar Extended Radar-Only Object Detection | Patrick Palmer et.al. | 2409.05564 | link |
| 2024-09-09 | Proto-OOD: Enhancing OOD Object Detection with Prototype Feature Similarity | Junkun Chen et.al. | 2409.05466 | null |
| 2024-09-09 | Distribution Discrepancy and Feature Heterogeneity for Active 3D Object Detection | Huang-Yu Chen et.al. | 2409.05425 | null |
| 2024-09-08 | A Low-Computational Video Synopsis Framework with a Standard Dataset | Ramtin Malekpour et.al. | 2409.05230 | link |
| 2024-09-08 | Can OOD Object Detectors Learn from Foundation Models? | Jiahui Liu et.al. | 2409.05162 | link |
| 2024-09-08 | WaterSeeker: Efficient Detection of Watermarked Segments in Large Documents | Leyi Pan et.al. | 2409.05112 | null |
| 2024-09-08 | Visual Grounding with Multi-modal Conditional Adaptation | Ruilin Yao et.al. | 2409.04999 | link |
| 2024-09-08 | Multi-V2X: A Large Scale Multi-modal Multi-penetration-rate Dataset for Cooperative Perception | Rongsong Li et.al. | 2409.04980 | null |
| 2024-09-06 | Future Does Matter: Boosting 3D Object Detection with Temporal Motion Estimation in Point Cloud Sequences | Rui Yu et.al. | 2409.04390 | null |
| 2024-09-06 | UniDet3D: Multi-dataset Indoor 3D Object Detection | Maksim Kolodiazhnyi et.al. | 2409.04234 | link |
| 2024-09-06 | Feature Compression for Cloud-Edge Multimodal 3D Object Detection | Chongzhen Tian et.al. | 2409.04123 | null |
| 2024-09-06 | D4: Text-guided diffusion model-based domain adaptive data augmentation for vineyard shoot detection | Kentaro Hirahara et.al. | 2409.04060 | null |
| 2024-09-06 | BFA-YOLO: Balanced multiscale object detection network for multi-view building facade attachments detection | Yangguang Chen et.al. | 2409.04025 | null |
| 2024-09-05 | LowFormer: Hardware Efficient Design for Convolutional Transformer Backbones | Moritz Nottebaum et.al. | 2409.03460 | link |
| 2024-09-05 | Training-free Conversion of Pretrained ANNs to SNNs for Low-Power and High-Performance Applications | Tong Bu et.al. | 2409.03368 | null |
| 2024-09-05 | YOLO-PPA based Efficient Traffic Sign Detection for Cruise Control in Autonomous Driving | Jingyu Zhang et.al. | 2409.03320 | null |
| 2024-09-05 | Gr-IoU: Ground-Intersection over Union for Robust Multi-Object Tracking with 3D Geometric Constraints | Keisuke Toida et.al. | 2409.03252 | null |
| 2024-09-04 | Boundless: Generating Photorealistic Synthetic Data for Object Detection in Urban Streetscapes | Mehmet Kerem Turkcan et.al. | 2409.03022 | link |
| 2024-09-04 | Real-Time Dynamic Scale-Aware Fusion Detection Network: Take Road Damage Detection as an example | Weichao Pan et.al. | 2409.02546 | null |
| 2024-09-04 | TP-GMOT: Tracking Generic Multiple Object by Textual Prompt with Motion-Appearance Cost (MAC) SORT | Duy Le Dinh Anh et.al. | 2409.02490 | link |
| 2024-09-04 | Rapid Automatic Multiple Moving Objects Detection Method Based on Feature Extraction from Images with Non-sidereal Tracking | Lei Wang et.al. | 2409.02405 | null |
| 2024-09-04 | Pluralistic Salient Object Detection | Xuelu Feng et.al. | 2409.02368 | null |
| 2024-09-03 | Site Selection for the Second Flyeye Telescope: A Simulation Study for Optimizing Near-Earth Object Discovery | D. Föhring et.al. | 2409.02329 | null |
| 2024-09-03 | K-Origins: Better Colour Quantification for Neural Networks | Lewis Mason et.al. | 2409.02281 | null |
| 2024-09-03 | Evaluation and Comparison of Visual Language Models for Transportation Engineering Problems | Sanjita Prajapati et.al. | 2409.02278 | null |
| 2024-09-03 | A Modern Take on Visual Relationship Reasoning for Grasp Planning | Paolo Rabino et.al. | 2409.02035 | null |
| 2024-09-03 | Latent Distillation for Continual Object Detection at the Edge | Francesco Pasti et.al. | 2409.01872 | link |
| 2024-09-03 | Real-Time Indoor Object Detection based on hybrid CNN-Transformer Approach | Salah Eddine Laidoudi et.al. | 2409.01871 | null |
| 2024-08-30 | Structuring a Training Strategy to Robustify Perception Models with Realistic Image Augmentations | Ahmed Hammam et.al. | 2408.17311 | null |
| 2024-08-30 | Hybrid Classification-Regression Adaptive Loss for Dense Object Detection | Yanquan Huang et.al. | 2408.17182 | null |
| 2024-08-30 | UTrack: Multi-Object Tracking with Uncertain Detections | Edgardo Solano-Carrillo et.al. | 2408.17098 | link |
| 2024-08-30 | PIB: Prioritized Information Bottleneck Framework for Collaborative Edge Video Analytics | Zhengru Fang et.al. | 2408.17047 | null |
| 2024-08-30 | CP-VoteNet: Contrastive Prototypical VoteNet for Few-Shot Point Cloud Object Detection | Xuejing Li et.al. | 2408.17036 | null |
| 2024-08-30 | MakeWay: Object-Aware Costmaps for Proactive Indoor Navigation Using LiDAR | Binbin Xu et.al. | 2408.17034 | null |
| 2024-08-29 | Analyzing Errors in Controlled Turret System Given Target Location Input from Artificial Intelligence Methods in Automatic Target Recognition | Matthew Karlson et.al. | 2408.16923 | null |
| 2024-08-29 | Space3D-Bench: Spatial 3D Question Answering Benchmark | Emilia Szymanska et.al. | 2408.16662 | null |
| 2024-08-29 | SODAWideNet++: Combining Attention and Convolutions for Salient Object Detection | Rohit Venkata Sai Dulam et.al. | 2408.16645 | null |
| 2024-08-29 | UAV-Based Human Body Detector Selection and Fusion for Geolocated Saliency Map Generation | Piotr Rudol et.al. | 2408.16501 | null |
| 2024-08-29 | Weakly Supervised Object Detection for Automatic Tooth-marked Tongue Recognition | Yongcun Zhang et.al. | 2408.16451 | link |
| 2024-08-29 | Enhancing Sound Source Localization via False Negative Elimination | Zengjie Song et.al. | 2408.16448 | link |
| 2024-08-29 | High-yield large-scale suspended graphene membranes over closed cavities for sensor applications | Sebastian Lukas et.al. | 2408.16408 | null |
| 2024-08-29 | FA-YOLO: Research On Efficient Feature Selection YOLO Improved Algorithm Based On FMDS and AGMF Modules | Yukang Huo et.al. | 2408.16313 | null |
| 2024-08-29 | Anno-incomplete Multi-dataset Detection | Yiran Xu et.al. | 2408.16247 | null |
| 2024-08-29 | PolarBEVDet: Exploring Polar Representation for Multi-View 3D Object Detection in Bird’s-Eye-View | Zichen Yu et.al. | 2408.16200 | null |
| 2024-08-28 | ChartEye: A Deep Learning Framework for Chart Information Extraction | Osama Mustafa et.al. | 2408.16123 | null |
| 2024-08-28 | microYOLO: Towards Single-Shot Object Detection on Microcontrollers | Mark Deutel et.al. | 2408.15865 | null |
| 2024-08-28 | What is YOLOv8: An In-Depth Exploration of the Internal Features of the Next-Generation Object Detector | Muhammad Yaseen et.al. | 2408.15857 | null |
| 2024-08-28 | Network transferability of adversarial patches in real-time object detection | Jens Bayer et.al. | 2408.15833 | link |
| 2024-08-28 | Object Detection for Vehicle Dashcams using Transformers | Osama Mustafa et.al. | 2408.15809 | null |
| 2024-08-29 | RIDE: Boosting 3D Object Detection for LiDAR Point Clouds via Rotation-Invariant Analysis | Zhaoxuan Wang et.al. | 2408.15643 | null |
| 2024-08-28 | MMDRFuse: Distilled Mini-Model with Dynamic Refresh for Multi-Modality Image Fusion | Yanglin Deng et.al. | 2408.15641 | link |
| 2024-08-28 | Semantic and goal-oriented edge computing for satellite Earth Observation | Beatriz Soret et.al. | 2408.15639 | null |
| 2024-08-28 | Transfer Learning from Simulated to Real Scenes for Monocular 3D Object Detection | Sondos Mohamed et.al. | 2408.15637 | null |
| 2024-08-28 | Can Visual Language Models Replace OCR-Based Visual Question Answering Pipelines in Production? A Case Study in Retail | Bianca Lamm et.al. | 2408.15626 | null |
| 2024-08-28 | RoboSense: Large-scale Dataset and Benchmark for Multi-sensor Low-speed Autonomous Driving | Haisheng Su et.al. | 2408.15503 | null |
| 2024-08-27 | A Review of Transformer-Based Models for Computer Vision Tasks: Capturing Global Context and Spatial Relationships | Gracile Astlin Pereira et.al. | 2408.15178 | null |
| 2024-08-27 | Adapting Segment Anything Model to Multi-modal Salient Object Detection with Semantic Feature Fusion Guidance | Kunpeng Wang et.al. | 2408.15063 | null |
| 2024-08-27 | Hierarchical Graph Interaction Transformer with Dynamic Token Clustering for Camouflaged Object Detection | Siyuan Yao et.al. | 2408.15020 | link |
| 2024-08-27 | Knowledge Discovery in Optical Music Recognition: Enhancing Information Retrieval with Instance Segmentation | Elona Shatri et.al. | 2408.15002 | null |
| 2024-08-27 | BOX3D: Lightweight Camera-LiDAR Fusion for 3D Object Detection and Localization | Mario A. V. Saucedo et.al. | 2408.14941 | null |
| 2024-08-26 | PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection | Yidi Li et.al. | 2408.14600 | null |
| 2024-08-26 | A Survey of Camouflaged Object Detection and Beyond | Fengyang Xiao et.al. | 2408.14562 | null |
| 2024-08-26 | Beyond Few-shot Object Detection: A Detailed Survey | Vishal Chudasama et.al. | 2408.14249 | null |
| 2024-08-26 | TC-PDM: Temporally Consistent Patch Diffusion Models for Infrared-to-Visible Video Translation | Anh-Dzung Doan et.al. | 2408.14227 | null |
| 2024-08-26 | EMDFNet: Efficient Multi-scale and Diverse Feature Network for Traffic Sign Detection | Pengyu Li et.al. | 2408.14189 | null |
| 2024-08-26 | More Pictures Say More: Visual Intersection Network for Open Set Object Detection | Bingcheng Dong et.al. | 2408.14032 | null |
| 2024-08-25 | Bridging the Gap between Real-world and Synthetic Images for Testing Autonomous Driving Systems | Mohammad Hossein Amini et.al. | 2408.13950 | null |
| 2024-08-25 | OpenNav: Efficient Open Vocabulary 3D Object Detection for Smart Wheelchair Navigation | Muhammad Rameez ur Rahman et.al. | 2408.13936 | link |
| 2024-08-25 | Infrared Domain Adaptation with Zero-Shot Quantization | Burak Sevsay et.al. | 2408.13925 | null |
| 2024-08-25 | TraIL-Det: Transformation-Invariant Local Feature Networks for 3D LiDAR Object Detection with Unsupervised Pre-Training | Li Li et.al. | 2408.13902 | null |
| 2024-08-25 | Selectively Dilated Convolution for Accuracy-Preserving Sparse Pillar-based Embedded 3D Object Detection | Seongmin Park et.al. | 2408.13798 | null |
| 2024-08-24 | Mean Height Aided Post-Processing for Pedestrian Detection | Jing Yuan et.al. | 2408.13646 | null |
| 2024-08-23 | MCTR: Multi Camera Tracking Transformer | Alexandru Niculescu-Mizil et.al. | 2408.13243 | null |
| 2024-08-23 | DeTPP: Leveraging Object Detection for Robust Long-Horizon Event Prediction | Ivan Karpukhin et.al. | 2408.13131 | null |
| 2024-08-23 | VFM-Det: Towards High-Performance Vehicle Detection via Large Foundation Models | Wentao Wu et.al. | 2408.13031 | link |
| 2024-08-23 | Can AI Assistance Aid in the Grading of Handwritten Answer Sheets? | Pritam Sil et.al. | 2408.12870 | null |
| 2024-08-23 | Symmetric masking strategy enhances the performance of Masked Image Modeling | Khanh-Binh Nguyen et.al. | 2408.12772 | null |
| 2024-08-22 | CatFree3D: Category-agnostic 3D Object Detection with Diffusion | Wenjing Bian et.al. | 2408.12747 | null |
| 2024-08-22 | Revisiting Cross-Domain Problem for LiDAR-based 3D Object Detection | Ruixiao Zhang et.al. | 2408.12708 | null |
| 2024-08-22 | xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations | Can Qin et.al. | 2408.12590 | link |
| 2024-08-22 | Enhanced Parking Perception by Multi-Task Fisheye Cross-view Transformers | Antonyo Musabini et.al. | 2408.12575 | null |
| 2024-08-22 | Comparing YOLOv5 Variants for Vehicle Detection: A Performance Analysis | Athulya Sundaresan Geetha et.al. | 2408.12550 | null |
| 2024-08-22 | UMAD: University of Macau Anomaly Detection Benchmark Dataset | Dong Li et.al. | 2408.12527 | link |
| 2024-08-22 | Class-balanced Open-set Semi-supervised Object Detection for Medical Images | Zhanyun Lu et.al. | 2408.12355 | null |
| 2024-08-22 | OVA-DETR: Open Vocabulary Aerial Object Detection Using Image-Text Alignment and Fusion | Guoting Wei et.al. | 2408.12246 | null |
| 2024-08-22 | On the Credibility of Backdoor Attacks Against Object Detectors in the Physical World | Bao Gia Doan et.al. | 2408.12122 | null |
| 2024-08-21 | CARLA Drone: Monocular 3D Object Detection from a Different Perspective | Johannes Meier et.al. | 2408.11958 | null |
| 2024-08-21 | SBDet: A Symmetry-Breaking Object Detector via Relaxed Rotation-Equivariance | Zhiqiang Wu et.al. | 2408.11760 | null |
| 2024-08-21 | Video-to-Text Pedestrian Monitoring (VTPM): Leveraging Computer Vision and Large Language Models for Privacy-Preserve Pedestrian Activity Monitoring at Intersections | Ahmed S. Abdelrahman et.al. | 2408.11649 | null |
| 2024-08-21 | Domain-invariant Progressive Knowledge Distillation for UAV-based Object Detection | Liang Yao et.al. | 2408.11407 | null |
| 2024-08-20 | On the Potential of Open-Vocabulary Models for Object Detection in Unusual Street Scenes | Sadia Ilyas et.al. | 2408.11221 | null |
| 2024-08-20 | Quantum Inverse Contextual Vision Transformers (Q-ICVT): A New Frontier in 3D Object Detection for AVs | Sanjay Bhargav Dharavath et.al. | 2408.11207 | link |
| 2024-08-20 | A Closer Look at Data Augmentation Strategies for Finetuning-Based Low/Few-Shot Object Detection | Vladislav Li et.al. | 2408.10940 | null |
| 2024-08-20 | Aligning Object Detector Bounding Boxes with Human Preference | Ombretta Strafforello et.al. | 2408.10844 | null |
| 2024-08-20 | LightMDETR: A Lightweight Approach for Low-Cost Open-Vocabulary Object Detection Training | Binta Sow et.al. | 2408.10787 | null |
| 2024-08-20 | Just a Hint: Point-Supervised Camouflaged Object Detection | Huafeng Chen et.al. | 2408.10777 | null |
| 2024-08-21 | Generative AI in Industrial Machine Vision – A Review | Hans Aoyang Zhou et.al. | 2408.10775 | null |
| 2024-08-20 | Detection of Intracranial Hemorrhage for Trauma Patients | Antoine P. Sanner et.al. | 2408.10768 | null |
| 2024-08-20 | SAM-COD: SAM-guided Unified Framework for Weakly-Supervised Camouflaged Object Detection | Huafeng Chen et.al. | 2408.10760 | null |
| 2024-08-20 | Leveraging Temporal Contexts to Enhance Vehicle-Infrastructure Cooperative Perception | Jiaru Zhong et.al. | 2408.10531 | null |
| 2024-08-19 | Leveraging Superfluous Information in Contrastive Representation Learning | Xuechu Yu et.al. | 2408.10292 | null |
| 2024-08-19 | SHARP: Segmentation of Hands and Arms by Range using Pseudo-Depth for Enhanced Egocentric 3D Hand Pose Estimation and Action Recognition | Wiktor Mucha et.al. | 2408.10037 | null |
| 2024-08-19 | Segment-Anything Models Achieve Zero-shot Robustness in Autonomous Driving | Jun Yan et.al. | 2408.09839 | link |
| 2024-08-19 | Latent Diffusion for Guided Document Table Generation | Syed Jawwad Haider Hamdani et.al. | 2408.09800 | null |
| 2024-08-18 | Adversarial Attacked Teacher for Unsupervised Domain Adaptive Object Detection | Kaiwen Wang et.al. | 2408.09431 | null |
| 2024-08-18 | Boundary-Recovering Network for Temporal Action Detection | Jihwan Kim et.al. | 2408.09354 | null |
| 2024-08-18 | YOLOv1 to YOLOv10: The fastest and most accurate real-time object detection systems | Chien-Yao Wang et.al. | 2408.09332 | null |
| 2024-08-17 | GSLAMOT: A Tracklet and Query Graph-based Simultaneous Locating, Mapping, and Multiple Object Tracking System | Shuo Wang et.al. | 2408.09191 | null |
| 2024-08-17 | PADetBench: Towards Benchmarking Physical Attacks against Object Detection | Jiawei Lian et.al. | 2408.09181 | link |
| 2024-08-17 | MaskBEV: Towards A Unified Framework for BEV Detection and Map Segmentation | Xiao Zhao et.al. | 2408.09122 | null |
| 2024-08-17 | Locate Anything on Earth: Advancing Open-Vocabulary Object Detection for Remote Sensing Community | Jiancheng Pan et.al. | 2408.09110 | null |
| 2024-08-16 | SAM2-UNet: Segment Anything 2 Makes Strong Encoder for Natural and Medical Image Segmentation | Xinyu Xiong et.al. | 2408.08870 | link |
| 2024-08-16 | Multimodal Relational Triple Extraction with Query-based Entity Object Transformer | Lei Hei et.al. | 2408.08709 | null |
| 2024-08-16 | Tell Codec What Worth Compressing: Semantically Disentangled Image Coding for Machine with LMMs | Jinming Liu et.al. | 2408.08575 | null |
| 2024-08-15 | 5%>100%: Breaking Performance Shackles of Full Fine-Tuning on Visual Recognition Tasks | Dongshuo Yin et.al. | 2408.08345 | link |
| 2024-08-15 | Learned Multimodal Compression for Autonomous Driving | Hadi Hadizadeh et.al. | 2408.08211 | null |
| 2024-08-16 | OC3D: Weakly Supervised Outdoor 3D Object Detection with Only Coarse Click Annotation | Qiming Xia et.al. | 2408.08092 | null |
| 2024-08-15 | CamoTeacher: Dual-Rotation Consistency Learning for Semi-Supervised Camouflaged Object Detection | Xunfa Lai et.al. | 2408.08050 | null |
| 2024-08-15 | Co-Fix3D: Enhancing 3D Object Detection with Collaborative Refinement | Wenxuan Li et.al. | 2408.07999 | null |
| 2024-08-15 | GOReloc: Graph-based Object-Level Relocalization for Visual SLAM | Yutong Wang et.al. | 2408.07917 | link |
| 2024-08-14 | See It All: Contextualized Late Aggregation for 3D Dense Captioning | Minjung Kim et.al. | 2408.07648 | null |
| 2024-08-14 | Panacea+: Panoramic and Controllable Video Generation for Autonomous Driving | Yuqing Wen et.al. | 2408.07605 | null |
| 2024-08-14 | Infra-YOLO: Efficient Neural Network Structure with Model Compression for Real-Time Infrared Small Object Detection | Zhonglin Chen et.al. | 2408.07455 | null |
| 2024-08-14 | Sign language recognition based on deep learning and low-cost handcrafted descriptors | Alvaro Leandro Cavalcante Carneiro et.al. | 2408.07244 | link |
| 2024-08-13 | Vision Language Model for Interpretable and Fine-grained Detection of Safety Compliance in Diverse Workplaces | Zhiling Chen et.al. | 2408.07146 | null |
| 2024-08-13 | Divide and Conquer: Improving Multi-Camera 3D Perception with 2D Semantic-Depth Priors and Input-Dependent Queries | Qi Song et.al. | 2408.06901 | null |
| 2024-08-13 | Integrating Saliency Ranking and Reinforcement Learning for Enhanced Object Detection | Matthias Bartolo et.al. | 2408.06803 | link |
| 2024-08-13 | Exploring Domain Shift on Radar-Based 3D Object Detection Amidst Diverse Environmental Conditions | Miao Zhang et.al. | 2408.06772 | null |
| 2024-08-13 | Unified-IoU: For High-Quality Object Detection | Xiangjie Luo et.al. | 2408.06636 | link |
| 2024-08-13 | A lightweight YOLOv5-FFM model for occlusion pedestrian detection | Xiangjie Luo et.al. | 2408.06633 | null |
| 2024-08-13 | MV-DETR: Multi-modality indoor object detection by Multi-View DEtecton TRansformers | Zichao Dong et.al. | 2408.06604 | null |
| 2024-08-12 | Latent Disentanglement for Low Light Image Enhancement | Zhihao Zheng et.al. | 2408.06245 | null |
| 2024-08-12 | MR3D-Net: Dynamic Multi-Resolution 3D Sparse Voxel Grid Fusion for LiDAR-Based Collective Perception | Sven Teufel et.al. | 2408.06137 | link |
| 2024-08-12 | DPDETR: Decoupled Position Detection Transformer for Infrared-Visible Object Detection | Junjie Guo et.al. | 2408.06123 | null |
| 2024-08-12 | Optimizing Vision Transformers with Data-Free Knowledge Transfer | Gousia Habib et.al. | 2408.05952 | null |
| 2024-08-12 | MV2DFusion: Leveraging Modality-Specific Object Semantics for Multi-Modal 3D Detection | Zitian Wang et.al. | 2408.05945 | null |
| 2024-08-12 | Multi-scale Contrastive Adaptor Learning for Segmenting Anything in Underperformed Scenes | Ke Zhou et.al. | 2408.05936 | null |
| 2024-08-12 | Weakly Supervised Video Anomaly Detection and Localization with Spatio-Temporal Prompts | Peng Wu et.al. | 2408.05905 | null |
| 2024-08-12 | Toward Pedestrian Head Tracking: A Benchmark Dataset and an Information Fusion Network | Kailai Sun et.al. | 2408.05877 | null |
| 2024-08-11 | U-DECN: End-to-End Underwater Object Detection ConvNet with Improved DeNoising Training | Zhuoyan Liu et.al. | 2408.05780 | link |
| 2024-08-11 | FADE: A Dataset for Detecting Falling Objects around Buildings in Video | Zhigang Tu et.al. | 2408.05750 | null |
| 2024-08-09 | DeepInteraction++: Multi-Modality Interaction for Autonomous Driving | Zeyu Yang et.al. | 2408.05075 | link |
| 2024-08-09 | RadarPillars: Efficient Object Detection from 4D Radar Point Clouds | Alexander Musiat et.al. | 2408.05020 | null |
| 2024-08-09 | Hyper-YOLO: When Visual Object Detection Meets Hypergraph Computation | Yifan Feng et.al. | 2408.04804 | link |
| 2024-08-08 | SOD-YOLOv8 – Enhancing YOLOv8 for Small Object Detection in Traffic Scenes | Boshra Khalili et.al. | 2408.04786 | null |
| 2024-08-08 | Data-Driven Pixel Control: Challenges and Prospects | Saurabh Farkya et.al. | 2408.04767 | null |
| 2024-08-10 | SAM2-Adapter: Evaluating & Adapting Segment Anything 2 in Downstream Tasks: Camouflage, Shadow, Medical Image Segmentation, and More | Tianrun Chen et.al. | 2408.04579 | null |
| 2024-08-07 | Impact Analysis of Data Drift Towards The Development of Safety-Critical Automotive System | Md Shahi Amran Hossain et.al. | 2408.04476 | null |
| 2024-08-08 | Detecting Car Speed using Object Detection and Depth Estimation: A Deep Learning Framework | Subhasis Dasgupta et.al. | 2408.04360 | null |
| 2024-08-08 | Multi-Scale and Detail-Enhanced Segment Anything Model for Salient Object Detection | Shixuan Gao et.al. | 2408.04326 | null |
| 2024-08-08 | LLM-DetectAIve: a Tool for Fine-Grained Machine-Generated Text Detection | Mervat Abassy et.al. | 2408.04284 | link |
| 2024-08-08 | Learning to Rewrite: Generalized LLM-Generated Text Detection | Wei Hao et.al. | 2408.04237 | null |
| 2024-08-07 | PaveCap: The First Multimodal Framework for Comprehensive Pavement Condition Assessment with Dense Captioning and PCI Estimation | Blessing Agyei Kyem et.al. | 2408.04110 | link |
| 2024-08-07 | Vision-Language Guidance for LiDAR-based Unsupervised 3D Object Detection | Christian Fruhwirth-Reisinger et.al. | 2408.03790 | null |
| 2024-08-07 | Data Generation Scheme for Thermal Modality with Edge-Guided Adversarial Conditional Diffusion Model | Guoqing Zhu et.al. | 2408.03748 | link |
| 2024-08-07 | CAS-ViT: Convolutional Additive Self-attention Vision Transformers for Efficient Mobile Applications | Tianfang Zhang et.al. | 2408.03703 | link |
| 2024-08-07 | L4DR: LiDAR-4DRadar Fusion for Weather-Robust 3D Object Detection | Xun Huang et.al. | 2408.03677 | null |
| 2024-08-07 | Designing Extremely Memory-Efficient CNNs for On-device Vision Tasks | Jaewook Lee et.al. | 2408.03663 | null |
| 2024-08-07 | Leveraging LLMs for Enhanced Open-Vocabulary 3D Scene Understanding in Autonomous Driving | Amirhosein Chahe et.al. | 2408.03516 | null |
| 2024-08-07 | GUI Element Detection Using SOTA YOLO Deep Learning Models | Seyed Shayan Daneshvar et.al. | 2408.03507 | null |
| 2024-08-06 | AI Foundation Models in Remote Sensing: A Survey | Siqi Lu et.al. | 2408.03464 | null |
| 2024-08-06 | Biomedical Image Segmentation: A Systematic Literature Review of Deep Learning Based Object Detection Methods | Fazli Wahid et.al. | 2408.03393 | null |
| 2024-08-06 | Nighttime Pedestrian Detection Based on Fore-Background Contrast Learning | He Yao et.al. | 2408.03030 | null |
| 2024-08-06 | Diverse Generation while Maintaining Semantic Coordination: A Diffusion-Based Data Augmentation Method for Object Detection | Sen Nie et.al. | 2408.02891 | null |
| 2024-08-05 | HQOD: Harmonious Quantization for Object Detection | Long Huang et.al. | 2408.02561 | null |
| 2024-08-05 | Tensorial template matching for fast cross-correlation with rotations and its application for tomography | Antonio Martinez-Sanchez et.al. | 2408.02398 | null |
| 2024-08-05 | Mixture-of-Noises Enhanced Forgery-Aware Predictor for Multi-Face Manipulation Detection and Localization | Changtao Miao et.al. | 2408.02306 | null |
| 2024-08-05 | AssemAI: Interpretable Image-Based Anomaly Detection for Manufacturing Pipelines | Renjith Prasad et.al. | 2408.02181 | null |
| 2024-08-04 | KAN-RCBEVDepth: A multi-modal fusion algorithm in object detection for autonomous driving | Zhihao Lai et.al. | 2408.02088 | null |
| 2024-08-06 | A Survey and Evaluation of Adversarial Attacks for Object Detection | Khoi Nguyen Tiet Nguyen et.al. | 2408.01934 | null |
| 2024-08-04 | CAF-YOLO: A Robust Framework for Multi-Scale Lesion Detection in Biomedical Imagery | Zilin Chen et.al. | 2408.01897 | null |
| 2024-08-03 | Supervised Image Translation from Visible to Infrared Domain for Object Detection | Prahlad Anand et.al. | 2408.01843 | null |
| 2024-08-03 | Domain penalisation for improved Out-of-Distribution Generalisation | Shuvam Jena et.al. | 2408.01746 | null |
| 2024-08-03 | LAM3D: Leveraging Attention for Monocular 3D Object Detection | Diana-Alexandra Sas et.al. | 2408.01739 | null |
| 2024-08-02 | A Robotics-Inspired Scanpath Model Reveals the Importance of Uncertainty and Semantic Object Cues for Gaze Guidance in Dynamic Scenes | Vito Mengers et.al. | 2408.01322 | null |
| 2024-08-02 | Underwater Object Detection Enhancement via Channel Stabilization | Muhammad Ali et.al. | 2408.01293 | null |
| 2024-08-02 | PGNeXt: High-Resolution Salient Object Detection via Pyramid Grafting Network | Changqun Xia et.al. | 2408.01137 | null |
| 2024-08-02 | Effect of Fog Particle Size Distribution on 3D Object Detection Under Adverse Weather Conditions | Ajinkya Shinde et.al. | 2408.01085 | null |
| 2024-08-02 | Boosting Gaze Object Prediction via Pixel-level Supervision from Vision Foundation Model | Yang Jin et.al. | 2408.01044 | null |
| 2024-08-02 | MambaST: A Plug-and-Play Cross-Spectral Spatial-Temporal Fuser for Efficient Pedestrian Detection | Xiangbo Gao et.al. | 2408.01037 | null |
| 2024-08-02 | Visible-Thermal Multiple Object Tracking: Large-scale Video Dataset and Progressive Fusion Approach | Yabin Zhu et.al. | 2408.00969 | null |
| 2024-08-01 | Joint Neural Networks for One-shot Object Recognition and Detection | Camilo J. Vargas et.al. | 2408.00701 | link |
| 2024-08-01 | Harnessing Uncertainty-aware Bounding Boxes for Unsupervised 3D Object Detection | Ruiyang Zhang et.al. | 2408.00619 | null |
| 2024-08-01 | U2UData: A Large-scale Cooperative Perception Dataset for Swarm UAVs Autonomous Flight | Tongtong Feng et.al. | 2408.00606 | null |
| 2024-08-01 | MUFASA: Multi-View Fusion and Adaptation Network with Spatial Awareness for Radar Object Detection | Xiangyuan Peng et.al. | 2408.00565 | null |
| 2024-08-01 | Focus, Distinguish, and Prompt: Unleashing CLIP for Efficient and Flexible Scene Text Retrieval | Gangyan Zeng et.al. | 2408.00441 | link |
| 2024-08-01 | MonoMM: A Multi-scale Mamba-Enhanced Network for Real-time Monocular 3D Object Detection | Youjia Fu et.al. | 2408.00438 | null |
| 2024-08-01 | DNTextSpotter: Arbitrary-Shaped Scene Text Spotting via Improved Denoising Training | Yu Xie et.al. | 2408.00355 | null |
| 2024-08-01 | A Simple Background Augmentation Method for Object Detection with Diffusion Model | Yuhang Li et.al. | 2408.00350 | null |
| 2024-08-01 | Diff3DETR:Agent-based Diffusion Model for Semi-supervised 3D Object Detection | Jiacheng Deng et.al. | 2408.00286 | null |
| 2024-08-01 | RoCo:Robust Collaborative Perception By Iterative Object Matching and Pose Adjustment | Zhe Huang et.al. | 2408.00257 | null |
| 2024-07-31 | Dynamic Object Queries for Transformer-based Incremental Object Detection | Jichuan Zhang et.al. | 2407.21687 | null |
| 2024-07-31 | Spatial Transformer Network YOLO Model for Agricultural Object Detection | Yash Zambre et.al. | 2407.21652 | null |
| 2024-07-31 | Evaluating SAM2’s Role in Camouflaged Object Detection: From SAM to SAM2 | Lv Tang et.al. | 2407.21596 | null |
| 2024-07-31 | InScope: A New Real-world 3D Infrastructure-side Collaborative Perception Dataset for Open Traffic Scenarios | Xiaofei Zhang et.al. | 2407.21581 | null |
| 2024-07-31 | Voxel Scene Graph for Intracranial Hemorrhage | Antoine P. Sanner et.al. | 2407.21580 | null |
| 2024-07-31 | MarvelOVD: Marrying Object Recognition and Vision-Language Models for Robust Open-Vocabulary Object Detection | Kuo Wang et.al. | 2407.21465 | link |
| 2024-07-31 | Generalized Tampered Scene Text Detection in the era of Generative AI | Chenfan Qu et.al. | 2407.21422 | null |
| 2024-07-30 | Candidate Distant Trans-Neptunian Objects Detected by the New Horizons Subaru TNO Survey | Wesley C. Fraser et.al. | 2407.21142 | null |
| 2024-07-30 | What is YOLOv5: A deep look into the internal features of the popular object detector | Rahima Khanam et.al. | 2407.20892 | null |
| 2024-07-30 | WARM-3D: A Weakly-Supervised Sim2Real Domain Adaptation Framework for Roadside Monocular 3D Object Detection | Xingcheng Zhou et.al. | 2407.20818 | null |
| 2024-07-31 | Integer-Valued Training and Spike-Driven Inference Spiking Neural Network for High-performance and Energy-efficient Object Detection | Xinhao Luo et.al. | 2407.20708 | link |
| 2024-07-29 | Uncertainty-Rectified YOLO-SAM for Weakly Supervised ICH Segmentation | Pascal Spiegler et.al. | 2407.20461 | null |
| 2024-07-29 | MEVDT: Multi-Modal Event-Based Vehicle Detection and Tracking Dataset | Zaid A. El Shair et.al. | 2407.20446 | null |
| 2024-07-30 | AxiomVision: Accuracy-Guaranteed Adaptive Visual Model Selection for Perspective-Aware Video Analytics | Xiangxiang Dai et.al. | 2407.20124 | link |
| 2024-07-29 | Octave-YOLO: Cross frequency detection network with octave convolution | Sangjune Shin et.al. | 2407.19746 | null |
| 2024-07-29 | Cross-Layer Feature Pyramid Transformer for Small Object Detection in Aerial Images | Zewen Du et.al. | 2407.19696 | null |
| 2024-07-29 | Practical Video Object Detection via Feature Selection and Aggregation | Yuheng Shi et.al. | 2407.19650 | link |
| 2024-07-28 | Solving Short-Term Relocalization Problems In Monocular Keyframe Visual SLAM Using Spatial And Semantic Data | Azmyin Md. Kamal et.al. | 2407.19518 | link |
| 2024-07-28 | Depth-Wise Convolutions in Vision Transformers for Efficient Training on Small Datasets | Tianxiao Zhang et.al. | 2407.19394 | link |
| 2024-07-27 | Sewer Image Super-Resolution with Depth Priors and Its Lightweight Network | Gang Pan et.al. | 2407.19271 | null |
| 2024-07-27 | Enhancing Tree Type Detection in Forest Fire Risk Assessment: Multi-Stage Approach and Color Encoding with Forest Fire Risk Evaluation Framework for UAV Imagery | Jinda Zhang et.al. | 2407.19184 | null |
| 2024-07-27 | Reducing Spurious Correlation for Federated Domain Generalization | Shuran Ma et.al. | 2407.19174 | null |
| 2024-07-27 | Robust Multimodal 3D Object Detection via Modality-Agnostic Decoding and Proximity-based Modality Ensemble | Juhan Cha et.al. | 2407.19156 | link |
| 2024-07-26 | Local Binary Pattern(LBP) Optimization for Feature Extraction | Zeinab Sedaghatjoo et.al. | 2407.18665 | null |
| 2024-07-25 | LION: Linear Group RNN for 3D Object Detection in Point Clouds | Zhe Liu et.al. | 2407.18232 | link |
| 2024-07-25 | XS-VID: An Extremely Small Video Object Detection Dataset | Jiahao Guo et.al. | 2407.18137 | null |
| 2024-07-25 | SaccadeDet: A Novel Dual-Stage Architecture for Rapid and Accurate Detection in Gigapixel Images | Wenxi Li et.al. | 2407.17956 | null |
| 2024-07-25 | A Novel Perception Entropy Metric for Optimizing Vehicle Perception with LiDAR Deployment | Yongjiang He et.al. | 2407.17942 | null |
| 2024-07-25 | Hierarchical Object Detection and Recognition Framework for Practical Plant Disease Diagnosis | Kohei Iwano et.al. | 2407.17906 | null |
| 2024-07-25 | Advancing 3D Point Cloud Understanding through Deep Transfer Learning: A Comprehensive Survey | Shahab Saquib Sohail et.al. | 2407.17877 | null |
| 2024-07-25 | Enhancing Fine-grained Object Detection in Aerial Images via Orthogonal Mapping | Haoran Zhu et.al. | 2407.17738 | link |
| 2024-07-26 | Unsqueeze [CLS] Bottleneck to Learn Rich Representations | Qing Su et.al. | 2407.17671 | link |
| 2024-07-24 | SDLNet: Statistical Deep Learning Network for Co-Occurring Object Detection and Identification | Binay Kumar Singh et.al. | 2407.17664 | null |
| 2024-07-24 | PEEKABOO: Hiding parts of an image for unsupervised object localization | Hasib Zunair et.al. | 2407.17628 | link |
| 2024-07-24 | ALPI: Auto-Labeller with Proxy Injection for 3D Object Detection using 2D Labels Only | Saad Lahlali et.al. | 2407.17197 | null |
| 2024-07-24 | DVPE: Divided View Position Embedding for Multi-View 3D Object Detection | Jiasen Wang et.al. | 2407.16955 | link |
| 2024-07-23 | What Matters in Range View 3D Object Detection | Benjamin Wilson et.al. | 2407.16789 | link |
| 2024-07-23 | A Framework for Pupil Tracking with Event Cameras | Khadija Iddrisu et.al. | 2407.16665 | null |
| 2024-07-24 | Velocity Driven Vision: Asynchronous Sensor Fusion Birds Eye View Models for Autonomous Vehicles | Seamie Hayes et.al. | 2407.16636 | null |
| 2024-07-23 | COALA: A Practical and Vision-Centric Federated Learning Platform | Weiming Zhuang et.al. | 2407.16560 | link |
| 2024-07-23 | Dynamic Retraining-Updating Mean Teacher for Source-Free Object Detection | Trinh Le Ba Khanh et.al. | 2407.16497 | link |
| 2024-07-23 | MonoWAD: Weather-Adaptive Diffusion Model for Robust Monocular 3D Object Detection | Youngmin Oh et.al. | 2407.16448 | link |
| 2024-07-23 | ESOD: Efficient Small Object Detection on High-Resolution Images | Kai Liu et.al. | 2407.16424 | null |
| 2024-07-23 | Understanding Impacts of Electromagnetic Signal Injection Attacks on Object Detection | Youqian Zhang et.al. | 2407.16327 | null |
| 2024-07-23 | DeepClean: Integrated Distortion Identification and Algorithm Selection for Rectifying Image Corruptions | Aditya Kapoor et.al. | 2407.16302 | null |
| 2024-07-23 | FoRA: Low-Rank Adaptation Model beyond Multimodal Siamese Network | Weiying Xie et.al. | 2407.16129 | link |
| 2024-07-22 | PLayerTV: Advanced Player Tracking and Identification for Automatic Soccer Highlight Clips | Håkon Maric Solberg et.al. | 2407.16076 | null |
| 2024-07-22 | Disentangling spatio-temporal knowledge for weakly supervised object detection and segmentation in surgical video | Guiqiu Liao et.al. | 2407.15794 | null |
| 2024-07-22 | Towards Open-World Object-based Anomaly Detection via Self-Supervised Outlier Synthesis | Brian K. S. Isaac-Medina et.al. | 2407.15763 | null |
| 2024-07-22 | Counter Turing Test ($CT^2$): Investigating AI-Generated Text Detection for Hindi – Ranking LLMs based on Hindi AI Detectability Index ($ADI_{hi}$) | Ishan Kavathekar et.al. | 2407.15694 | link |
| 2024-07-22 | YOLOv10 for Automated Fracture Detection in Pediatric Wrist Trauma X-rays | Ammar Ahmed et.al. | 2407.15689 | link |
| 2024-07-22 | SS-SFR: Synthetic Scenes Spatial Frequency Response on Virtual KITTI and Degraded Automotive Simulations for Object Detection | Daniel Jakab et.al. | 2407.15646 | null |
| 2024-07-22 | YOLO-pdd: A Novel Multi-scale PCB Defect Detection Method Using Deep Representations with Sequential Images | Bowen Liu et.al. | 2407.15427 | null |
| 2024-07-22 | Learning High-resolution Vector Representation from Multi-Camera Images for 3D Object Detection | Zhili Chen et.al. | 2407.15354 | null |
| 2024-07-22 | Explore the LiDAR-Camera Dynamic Adjustment Fusion for 3D Object Detection | Yiran Yang et.al. | 2407.15334 | null |
| 2024-07-21 | Weak-to-Strong Compositional Learning from Generative Models for Language-based Object Detection | Kwanyong Park et.al. | 2407.15296 | null |
| 2024-07-21 | Multiple Object Detection and Tracking in Panoramic Videos for Cycling Safety Analysis | Jingwei Guo et.al. | 2407.15199 | null |
| 2024-07-19 | Enhancing Layout Hotspot Detection Efficiency with YOLOv8 and PCA-Guided Augmentation | Dongyang Wu et.al. | 2407.14498 | null |
| 2024-07-19 | MLMT-CNN for Object Detection and Segmentation in Multi-layer and Multi-spectral Images | Majedaldein Almahasneh et.al. | 2407.14473 | null |
| 2024-07-19 | EmoCAM: Toward Understanding What Drives CNN-based Emotion Recognition | Youssef Doulfoukar et.al. | 2407.14314 | null |
| 2024-07-19 | Bucketed Ranking-based Losses for Efficient Training of Object Detectors | Feyza Yavuz et.al. | 2407.14204 | link |
| 2024-07-19 | Visual Text Generation in the Wild | Yuanzhi Zhu et.al. | 2407.14138 | link |
| 2024-07-18 | GroupMamba: Parameter-Efficient and Accurate Group Visual State Space Model | Abdelrahman Shaker et.al. | 2407.13772 | link |
| 2024-07-18 | General Geometry-aware Weakly Supervised 3D Object Detection | Guowen Zhang et.al. | 2407.13748 | link |
| 2024-07-18 | Enhancing Source-Free Domain Adaptive Object Detection with Low-confidence Pseudo Label Distillation | Ilhoon Yoon et.al. | 2407.13524 | link |
| 2024-07-18 | The use of the symmetric finite difference in the local binary pattern (symmetric LBP) | Zeinab Sedaghatjoo et.al. | 2407.13178 | null |
| 2024-07-18 | Learning Camouflaged Object Detection from Noisy Pseudo Label | Jin Zhang et.al. | 2407.13157 | null |
| 2024-07-18 | DFMSD: Dual Feature Masking Stage-wise Knowledge Distillation for Object Detection | Zhourui Zhang et.al. | 2407.13147 | null |
| 2024-07-18 | FocusDiffuser: Perceiving Local Disparities for Camouflaged Object Detection | Jianwei Zhao et.al. | 2407.13133 | null |
| 2024-07-17 | AdaLog: Post-Training Quantization for Vision Transformers with Adaptive Logarithm Quantizer | Zhuguanyu Wu et.al. | 2407.12951 | link |
| 2024-07-17 | Toward INT4 Fixed-Point Training via Exploring Quantization Error for Gradients | Dohyung Kim et.al. | 2407.12637 | null |
| 2024-07-17 | CerberusDet: Unified Multi-Task Object Detection | Irina Tolstykh et.al. | 2407.12632 | link |
| 2024-07-17 | Weighting Pseudo-Labels via High-Activation Feature Index Similarity and Object Detection for Semi-Supervised Segmentation | Prantik Howlader et.al. | 2407.12630 | link |
| 2024-07-17 | Enhancing Wrist Abnormality Detection with YOLO: Analysis of State-of-the-art Single-stage Detection Models | Ammar Ahmed et.al. | 2407.12597 | link |
| 2024-07-17 | Embracing Events and Frames with Hierarchical Feature Refinement Network for Object Detection | Hu Cao et.al. | 2407.12582 | null |
| 2024-07-17 | Close the Sim2real Gap via Physically-based Structured Light Synthetic Data Simulation | Kaixin Bai et.al. | 2407.12449 | null |
| 2024-07-17 | GLARE: Low Light Image Enhancement via Generative Latent Feature based Codebook Retrieval | Han Zhou et.al. | 2407.12431 | link |
| 2024-07-17 | Exploring Deeper! Segment Anything Model with Depth Perception for Camouflaged Object Detection | Zhenni Yu et.al. | 2407.12339 | null |
| 2024-07-16 | AFIDAF: Alternating Fourier and Image Domain Adaptive Filters as an Efficient Alternative to Attention in ViTs | Yunling Zheng et.al. | 2407.12217 | null |
| 2024-07-16 | The object detection method aids in image reconstruction evaluation and clinical interpretation of meniscal abnormalities | Natalia Konovalova et.al. | 2407.12184 | null |
| 2024-07-16 | A Case for Application-Aware Space Radiation Tolerance in Orbital Computing | Meiqi Wang et.al. | 2407.11853 | null |
| 2024-07-16 | Improving Unsupervised Video Object Segmentation via Fake Flow Generation | Suhwan Cho et.al. | 2407.11714 | link |
| 2024-07-16 | Relation DETR: Exploring Explicit Position Relation Prior for Object Detection | Xiuquan Hou et.al. | 2407.11699 | link |
| 2024-07-16 | Bridge Past and Future: Overcoming Information Asymmetry in Incremental Object Detection | Qijie Mo et.al. | 2407.11499 | null |
| 2024-07-16 | Crowd-SAM: SAM as a Smart Annotator for Object Detection in Crowded Scenes | Zhi Cai et.al. | 2407.11464 | link |
| 2024-07-16 | Generative AI Driven Task-Oriented Adaptive Semantic Communications | Yuzhou Fu et.al. | 2407.11354 | null |
| 2024-07-16 | LaMI-DETR: Open-Vocabulary Detection with Language Model Instruction | Penghui Du et.al. | 2407.11335 | link |
| 2024-07-16 | TCFormer: Visual Recognition via Token Clustering Transformer | Wang Zeng et.al. | 2407.11321 | link |
| 2024-07-16 | PADRe: A Unifying Polynomial Attention Drop-in Replacement for Efficient Vision Transformer | Pierre-David Letourneau et.al. | 2407.11306 | null |
| 2024-07-15 | OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models | Zijian Zhou et.al. | 2407.11213 | link |
| 2024-07-15 | Interpreting Hand gestures using Object Detection and Digits Classification | Sangeetha K et.al. | 2407.10902 | null |
| 2024-07-15 | RepVF: A Unified Vector Fields Representation for Multi-task 3D Perception | Chunliang Li et.al. | 2407.10876 | link |
| 2024-07-15 | OPEN: Object-wise Position Embedding for Multi-view 3D Object Detection | Jinghua Hou et.al. | 2407.10753 | link |
| 2024-07-15 | Anticipating Future Object Compositions without Forgetting | Youssef Zahran et.al. | 2407.10723 | null |
| 2024-07-15 | OVLW-DETR: Open-Vocabulary Light-Weighted Detection Transformer | Yu Wang et.al. | 2407.10655 | link |
| 2024-07-15 | Backdoor Attacks against Image-to-Image Networks | Wenbo Jiang et.al. | 2407.10445 | null |
| 2024-07-14 | Shape2Scene: 3D Scene Representation Learning Through Pre-training on Shape Data | Tuo Feng et.al. | 2407.10200 | link |
| 2024-07-14 | LabelDistill: Label-guided Cross-modal Knowledge Distillation for Camera-based 3D Object Detection | Sanmin Kim et.al. | 2407.10164 | link |
| 2024-07-14 | FSD-BEV: Foreground Self-Distillation for Multi-view 3D Object Detection | Zheng Jiang et.al. | 2407.10135 | null |
| 2024-07-14 | When Pedestrian Detection Meets Multi-Modal Learning: Generalist Model and Benchmark Dataset | Yi Zhang et.al. | 2407.10125 | null |
| 2024-07-12 | DART: An Automated End-to-End Object Detection Pipeline with Data Diversification, Open-Vocabulary Bounding Box Annotation, Pseudo-Label Review, and Model Training | Chen Xin et.al. | 2407.09174 | link |
| 2024-07-12 | Open Vocabulary Multi-Label Video Classification | Rohit Gupta et.al. | 2407.09073 | null |
| 2024-07-12 | DroneMOT: Drone-based Multi-Object Tracking Considering Detection Difficulties and Simultaneous Moving of Drones and Objects | Peng Wang et.al. | 2407.09051 | null |
| 2024-07-12 | Task-driven single-image super-resolution reconstruction of document scans | Maciej Zyrek et.al. | 2407.08993 | null |
| 2024-07-11 | OmniNOCS: A unified NOCS dataset and model for 3D lifting of 2D objects | Akshay Krishnan et.al. | 2407.08711 | null |
| 2024-07-11 | Approaching Outside: Scaling Unsupervised 3D Object Detection from 2D Scene | Ruiyang Zhang et.al. | 2407.08569 | link |
| 2024-07-11 | Projecting Points to Axes: Oriented Object Detection via Point-Axis Representation | Zeyang Zhao et.al. | 2407.08489 | link |
| 2024-07-11 | Semi-Supervised Object Detection: A Survey on Progress from CNN to Transformer | Tahira Shehzadi et.al. | 2407.08460 | null |
| 2024-07-11 | PowerYOLO: Mixed Precision Model for Hardware Efficient Object Detection with Event Data | Dominika Przewlocka-Rus et.al. | 2407.08272 | null |
| 2024-07-11 | Knowledge distillation to effectively attain both region-of-interest and global semantics from an image where multiple objects appear | Seonwhee Jin et.al. | 2407.08257 | link |
| 2024-07-11 | Enrich the content of the image Using Context-Aware Copy Paste | Qiushi Guo et.al. | 2407.08151 | null |
| 2024-07-11 | DMM: Disparity-guided Multispectral Mamba for Oriented Object Detection in Remote Sensing | Minghang Zhou et.al. | 2407.08132 | null |
| 2024-07-10 | MambaVision: A Hybrid Mamba-Transformer Vision Backbone | Ali Hatamizadeh et.al. | 2407.08083 | link |
| 2024-07-10 | Bayesian Detector Combination for Object Detection with Crowdsourced Annotations | Zhi Qin Tan et.al. | 2407.07958 | link |
| 2024-07-10 | Cross Domain Object Detection via Multi-Granularity Confidence Alignment based Mean Teacher | Jiangming Chen et.al. | 2407.07780 | null |
| 2024-07-10 | LSM: A Comprehensive Metric for Assessing the Safety of Lane Detection Systems in Autonomous Driving | Jörg Gamerdinger et.al. | 2407.07740 | null |
| 2024-07-10 | Few-Shot Domain Adaptive Object Detection for Microscopic Images | Sumayya Inayat et.al. | 2407.07633 | null |
| 2024-07-10 | Simplifying Source-Free Domain Adaptation for Object Detection: Effective Self-Training Strategies and Performance Insights | Yan Hao et.al. | 2407.07586 | link |
| 2024-07-09 | Exploring Camera Encoder Designs for Autonomous Driving Perception | Barath Lakshmanan et.al. | 2407.07276 | null |
| 2024-07-09 | ConvNLP: Image-based AI Text Detection | Suriya Prakash Jambunathan et.al. | 2407.07225 | null |
| 2024-07-09 | Category-level Object Detection, Pose Estimation and Reconstruction from Stereo Images | Chuanrui Zhang et.al. | 2407.06984 | null |
| 2024-07-09 | Cue Point Estimation using Object Detection | Giulia Argüello et.al. | 2407.06823 | link |
| 2024-07-09 | CoLA: Conditional Dropout and Language-driven Robust Dual-modal Salient Object Detection | Shuang Hao et.al. | 2407.06780 | link |
| 2024-07-09 | Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions | Yu-Guan Hsieh et.al. | 2407.06723 | link |
| 2024-07-08 | Stochastic Traveling Salesperson Problem with Neighborhoods for Object Detection | Cheng Peng et.al. | 2407.06366 | null |
| 2024-07-08 | GeoWATCH for Detecting Heavy Construction in Heterogeneous Time Series of Satellite Images | Jon Crall et.al. | 2407.06337 | null |
| 2024-07-08 | Multi-clue Consistency Learning to Bridge Gaps Between General and Oriented Object in Semi-supervised Detection | Chenxu Wang et.al. | 2407.05909 | link |
| 2024-07-08 | Boosting 3D Object Detection with Semantic-Aware Multi-Branch Framework | Hao Jing et.al. | 2407.05769 | null |
| 2024-07-08 | Short-term Object Interaction Anticipation with Disentangled Object Detection @ Ego4D Short Term Object Interaction Anticipation Challenge | Hyunjin Cho et.al. | 2407.05713 | link |
| 2024-07-08 | Weakly Supervised Test-Time Domain Adaptation for Object Detection | Anh-Dzung Doan et.al. | 2407.05607 | null |
| 2024-07-08 | Towards Reflected Object Detection: A Benchmark | Zhongtian Wang et.al. | 2407.05575 | null |
| 2024-07-08 | GMC: A General Framework of Multi-stage Context Learning and Utilization for Visual Detection Tasks | Xuan Wang et.al. | 2407.05566 | null |
| 2024-07-07 | CLAMP-ViT: Contrastive Data-Free Learning for Adaptive Post-Training Quantization of ViTs | Akshat Ramachandran et.al. | 2407.05266 | link |
| 2024-07-07 | Unlocking Textual and Visual Wisdom: Open-Vocabulary 3D Object Detection Enhanced by Comprehensive Guidance from Text and Image | Pengkun Jiao et.al. | 2407.05256 | null |
| 2024-07-06 | SCSA: Exploring the Synergistic Effects Between Spatial and Channel Attention | Yunzhong Si et.al. | 2407.05128 | null |
| 2024-07-06 | Quantizing YOLOv7: A Comprehensive Study | Mohammadamin Baghbanbashi et.al. | 2407.04943 | null |
| 2024-07-05 | SH17: A Dataset for Human Safety and Personal Protective Equipment Detection in Manufacturing Industry | Hafiz Mughees Ahmad et.al. | 2407.04590 | link |
| 2024-07-05 | Optimizing the image correction pipeline for pedestrian detection in the thermal-infrared domain | Christophe Karam et.al. | 2407.04484 | null |
| 2024-07-05 | Multi-Branch Auxiliary Fusion YOLO with Re-parameterization Heterogeneous Convolutional for accurate object detection | Zhiqiang Yang et.al. | 2407.04381 | link |
| 2024-07-05 | Towards Stable 3D Object Detection | Jiabao Wang et.al. | 2407.04305 | null |
| 2024-07-05 | Research, Applications and Prospects of Event-Based Pedestrian Detection: A Survey | Han Wang et.al. | 2407.04277 | null |
| 2024-07-04 | LiDAR-based Real-Time Object Detection and Tracking in Dynamic Environments | Wenqiang Du et.al. | 2407.04115 | null |
| 2024-07-04 | FIPGNet:Pyramid grafting network with feature interaction strategies | Ziyi Ding et.al. | 2407.04085 | null |
| 2024-07-04 | Detect Closer Surfaces that can be Seen: New Modeling and Evaluation in Cross-domain 3D Object Detection | Ruixiao Zhang et.al. | 2407.04061 | null |
| 2024-07-04 | The Solution for the GAIIC2024 RGB-TIR object detection Challenge | Xiangyu Wu et.al. | 2407.03872 | null |
| 2024-07-04 | StreamLTS: Query-based Temporal-Spatial LiDAR Fusion for Cooperative Object Detection | Yunshuang Yuan et.al. | 2407.03825 | null |
| 2024-07-03 | Visual Grounding with Attention-Driven Constraint Balancing | Weitai Kang et.al. | 2407.03243 | null |
| 2024-07-03 | Category-Aware Dynamic Label Assignment with High-Quality Oriented Proposal | Mingkui Feng et.al. | 2407.03205 | null |
| 2024-07-03 | SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding | Weitai Kang et.al. | 2407.03200 | link |
| 2024-07-03 | Global Context Modeling in YOLOv8 for Pediatric Wrist Fracture Detection | Rui-Yang Ju et.al. | 2407.03163 | link |
| 2024-07-03 | YOLOv5, YOLOv8 and YOLOv10: The Go-To Detectors for Real-time Vision | Muhammad Hussain et.al. | 2407.02988 | null |
| 2024-07-03 | Mast Kalandar at SemEval-2024 Task 8: On the Trail of Textual Origins: RoBERTa-BiLSTM Approach to Detect AI-Generated Text | Jainit Sushil Bafna et.al. | 2407.02978 | null |
| 2024-07-03 | A Pairwise DomMix Attentive Adversarial Network for Unsupervised Domain Adaptive Object Detection | Jie Shao et.al. | 2407.02835 | null |
| 2024-07-03 | ADFQ-ViT: Activation-Distribution-Friendly Post-Training Quantization for Vision Transformers | Yanfeng Jiang et.al. | 2407.02763 | null |
| 2024-07-02 | SMILe: Leveraging Submodular Mutual Information For Robust Few-Shot Object Detection | Anay Majee et.al. | 2407.02665 | null |
| 2024-07-02 | Robust ADAS: Enhancing Robustness of Machine Learning-based Advanced Driver Assistance Systems for Adverse Weather | Muhammad Zaeem Shahzad et.al. | 2407.02581 | null |
| 2024-07-02 | Similarity Distance-Based Label Assignment for Tiny Object Detection | Shuohao Shi et.al. | 2407.02394 | link |
| 2024-07-02 | OpenSlot: Mixed Open-set Recognition with Object-centric Learning | Xu Yin et.al. | 2407.02386 | null |
| 2024-07-02 | DM3D: Distortion-Minimized Weight Pruning for Lossless 3D Object Detection | Kaixin Xu et.al. | 2407.02098 | null |
| 2024-07-02 | Multi-Grained Contrast for Data-Efficient Unsupervised Representation Learning | Chengchao Shen et.al. | 2407.02014 | link |
| 2024-07-02 | Adaptive Modality Balanced Online Knowledge Distillation for Brain-Eye-Computer based Dim Object Detection | Zixing Li et.al. | 2407.01894 | link |
| 2024-07-01 | Scarecrow monitoring system: employing mobilenet ssd for enhanced animal supervision | Balaji VS et.al. | 2407.01435 | null |
| 2024-07-01 | Formal Verification of Object Detection | Avraham Raviv et.al. | 2407.01295 | null |
| 2024-07-01 | Cross-Architecture Auxiliary Feature Space Translation for Efficient Few-Shot Personalized Object Detection | Francesco Barbato et.al. | 2407.01193 | null |
| 2024-07-01 | Eliminating Position Bias of Language Models: A Mechanistic Approach | Ziqi Wang et.al. | 2407.01100 | link |
| 2024-07-01 | No More Potentially Dynamic Objects: Static Point Cloud Map Generation based on 3D Object Detection and Ground Projection | Soojin Woo et.al. | 2407.01073 | null |
| 2024-06-28 | Detecting Subtle Differences between Human and Model Languages Using Spectrum of Relative Likelihood | Yang Xu et.al. | 2406.19874 | link |
| 2024-07-01 | Mobile Robot Oriented Large-Scale Indoor Dataset for Dynamic Scene Understanding | Yifan Tang et.al. | 2406.19791 | null |
| 2024-06-28 | Basketball-SORT: An Association Method for Complex Multi-object Occlusion Problems in Basketball Multi-object Tracking | Qingrui Hu et.al. | 2406.19655 | null |
| 2024-06-27 | Robustness Testing of Black-Box Models Against CT Degradation Through Test-Time Augmentation | Jack Highton et.al. | 2406.19557 | null |
| 2024-06-27 | BOrg: A Brain Organoid-Based Mitosis Dataset for Automatic Analysis of Brain Diseases | Muhammad Awais et.al. | 2406.19556 | link |
| 2024-06-27 | Weighted Circle Fusion: Ensembling Circle Representation from Different Object Detection Results | Jialin Yue et.al. | 2406.19540 | null |
| 2024-06-27 | Stereo Vision Based Robot for Remote Monitoring with VR Support | Mohamed Fazil M. S. et.al. | 2406.19498 | null |
| 2024-06-27 | HUWSOD: Holistic Self-training for Unified Weakly Supervised Object Detection | Liujuan Cao et.al. | 2406.19394 | link |
| 2024-06-27 | STAL3D: Unsupervised Domain Adaptation for 3D Object Detection via Collaborating Self-Training and Adversarial Learning | Yanan Zhang et.al. | 2406.19362 | null |
| 2024-06-27 | Towards Reducing Data Acquisition and Labeling for Defect Detection using Simulated Data | Lukas Malte Kemeter et.al. | 2406.19175 | null |
| 2024-06-27 | FDLite: A Single Stage Lightweight Face Detector Network | Yogesh Aggarwal et.al. | 2406.19107 | null |
| 2024-06-27 | Segment Anything Model for automated image data annotation: empirical studies using text prompts from Grounding DINO | Fuseini Mumuni et.al. | 2406.19057 | null |
| 2024-06-27 | BiCo-Fusion: Bidirectional Complementary LiDAR-Camera Fusion for Semantic- and Spatial-Aware 3D Object Detection | Yang Song et.al. | 2406.19048 | null |
| 2024-06-27 | A Universal Railway Obstacle Detection System based on Semi-supervised Segmentation And Optical Flow | Qiushi Guo et.al. | 2406.18908 | null |
| 2024-06-26 | SpY: A Context-Based Approach to Spacecraft Component Detection | Trupti Mahendrakar et.al. | 2406.18709 | null |
| 2024-06-26 | Unveiling the Unknown: Conditional Evidence Decoupling for Unknown Rejection | Zhaowei Wu et.al. | 2406.18443 | link |
| 2024-06-26 | Detecting Machine-Generated Texts: Not Just “AI vs Humans” and Explainability is Complicated | Jiazhou Ji et.al. | 2406.18259 | null |
| 2024-06-26 | CTS: Sim-to-Real Unsupervised Domain Adaptation on 3D Detection | Meiying Zhang et.al. | 2406.18129 | null |
| 2024-06-26 | The Surprising Effectiveness of Multimodal Large Language Models for Video Moment Retrieval | Meinardus Boris et.al. | 2406.18113 | link |
| 2024-06-25 | Unmasking the Imposters: In-Domain Detection of Human vs. Machine-Generated Tweets | Bryan E. Tuck et.al. | 2406.17967 | null |
| 2024-06-25 | ET tu, CLIP? Addressing Common Object Errors for Unseen Environments | Ye Won Byun et.al. | 2406.17876 | null |
| 2024-06-25 | MDHA: Multi-Scale Deformable Transformer with Hybrid Anchors for Multi-View 3D Object Detection | Michelle Adeline et.al. | 2406.17654 | link |
| 2024-06-25 | Embedded event based object detection with spiking neural network | Jonathan Courtois et.al. | 2406.17617 | null |
| 2024-06-27 | Towards Open-set Camera 3D Object Detection | Zhuolin He et.al. | 2406.17297 | null |
| 2024-06-25 | Exploring Test-Time Adaptation for Object Detection in Continually Changing Environments | Shilei Cao et.al. | 2406.16439 | null |
| 2024-06-24 | Artistic-style text detector and a new Movie-Poster dataset | Aoxiang Ning et.al. | 2406.16307 | null |
| 2024-06-24 | Investigating the Influence of Prompt-Specific Shortcuts in AI Generated Text Detection | Choonghyun Park et.al. | 2406.16275 | null |
| 2024-06-23 | Review of Zero-Shot and Few-Shot AI Algorithms in The Medical Domain | Maged Badawi et.al. | 2406.16143 | null |
| 2024-06-22 | Understanding Student and Academic Staff Perceptions of AI Use in Assessment and Feedback | Jasper Roe et.al. | 2406.15808 | null |
| 2024-06-22 | Smart Feature is What You Need | Zhaoxin Hu et.al. | 2406.15805 | link |
| 2024-06-22 | MR-MLLM: Mutual Reinforcement of Multimodal Comprehension and Vision Perception | Guanqun Wang et.al. | 2406.15768 | null |
| 2024-06-21 | Towards Robust Training Datasets for Machine Learning with Ontologies: A Case Study for Emergency Road Vehicle Detection | Lynn Vonderhaar et.al. | 2406.15268 | null |
| 2024-06-21 | DiPEx: Dispersing Prompt Expansion for Class-Agnostic Object Detection | Jia Syuen Lim et.al. | 2406.14924 | link |
| 2024-06-21 | MOS: Model Synergy for Test-Time Adaptation on LiDAR-Based 3D Object Detection | Zhuoxiao Chen et.al. | 2406.14878 | null |
| 2024-06-20 | Visible-Thermal Tiny Object Detection: A Benchmark Dataset and Baselines | Xinyi Ying et.al. | 2406.14482 | link |
| 2024-06-20 | Enhanced Bank Check Security: Introducing a Novel Dataset and Transformer-Based Approach for Detection and Verification | Muhammad Saif Ullah Khan et.al. | 2406.14370 | link |
| 2024-06-20 | HoTPP Benchmark: Are We Good at the Long Horizon Events Forecasting? | Ivan Karpukhin et.al. | 2406.14341 | link |
| 2024-06-20 | LeYOLO, New Scalable and Efficient CNN Architecture for Object Detection | Lilian Hollard et.al. | 2406.14239 | link |
| 2024-06-20 | SSAD: Self-supervised Auxiliary Detection Framework for Panoramic X-ray based Dental Disease Diagnosis | Zijian Cai et.al. | 2406.13963 | link |
| 2024-06-20 | Towards the in-situ Trunk Identification and Length Measurement of Sea Cucumbers via Bézier Curve Modelling | Shuaixin Liu et.al. | 2406.13951 | link |
| 2024-06-19 | DPO: Dual-Perturbation Optimization for Test-time Adaptation in 3D Object Detection | Zhuoxiao Chen et.al. | 2406.13891 | link |
| 2024-06-19 | Semantic Enhanced Few-shot Object Detection | Zheng Wang et.al. | 2406.13498 | null |
| 2024-06-19 | Snowy Scenes, Clear Detections: A Robust Model for Traffic Light Detection in Adverse Weather Conditions | Shivank Garg et.al. | 2406.13473 | link |
| 2024-06-19 | Strengthening Layer Interaction via Dynamic Layer Attention | Kaishen Wang et.al. | 2406.13392 | link |
| 2024-06-18 | Privacy Preserving Federated Learning in Medical Imaging with Uncertainty Estimation | Nikolas Koutsoubis et.al. | 2406.12815 | link |
| 2024-06-18 | Online Anchor-based Training for Image Classification Tasks | Maria Tzelepi et.al. | 2406.12662 | null |
| 2024-06-18 | Applying Ensemble Methods to Model-Agnostic Machine-Generated Text Detection | Ivan Ong et.al. | 2406.12570 | null |
| 2024-06-18 | MultiSocial: Multilingual Benchmark of Machine-Generated Text Detection of Social-Media Texts | Dominik Macko et.al. | 2406.12549 | null |
| 2024-06-18 | ViDSOD-100: A New Dataset and a Baseline Model for RGB-D Video Salient Object Detection | Junhao Lin et.al. | 2406.12536 | link |
| 2024-06-18 | SDNIA-YOLO: A Robust Object Detection Model for Extreme Weather Conditions | Yuexiong Ding et.al. | 2406.12395 | null |
| 2024-06-18 | Competitive Learning for Achieving Content-specific Filters in Video Coding for Machines | Honglei Zhang et.al. | 2406.12367 | null |
| 2024-06-18 | Certified ML Object Detection for Surveillance Missions | Mohammed Belcaid et.al. | 2406.12362 | null |
| 2024-06-18 | DASSF: Dynamic-Attention Scale-Sequence Fusion for Aerial Object Detection | Haodong Li et.al. | 2406.12285 | null |
| 2024-06-18 | The Solution for CVPR2024 Foundational Few-Shot Object Detection Challenge | Hongpeng Pan et.al. | 2406.12225 | null |
| 2024-06-17 | V3Det Challenge 2024 on Vast Vocabulary and Open Vocabulary Object Detection: Methods and Results | Jiaqi Wang et.al. | 2406.11739 | null |
| 2024-06-17 | YOLO-FEDER FusionNet: A Novel Deep Learning Architecture for Drone Detection | Tamara R. Lenhard et.al. | 2406.11641 | null |
| 2024-06-17 | Low-power Ship Detection in Satellite Images Using Neuromorphic Hardware | Gregor Lenz et.al. | 2406.11319 | null |
| 2024-06-17 | Semi-Supervised Domain Adaptation Using Target-Oriented Domain Augmentation for 3D Object Detection | Yecheol Kim et.al. | 2406.11313 | link |
| 2024-06-17 | Syn-to-Real Unsupervised Domain Adaptation for Indoor 3D Object Detection | Yunsong Wang et.al. | 2406.11311 | null |
| 2024-06-17 | Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding | Yunsong Wang et.al. | 2406.11283 | null |
| 2024-06-17 | YOLO9tr: A Lightweight Model for Pavement Damage Detection Utilizing a Generalized Efficient Layer Aggregation Network and Attention Mechanism | Sompote Youwai et.al. | 2406.11254 | link |
| 2024-06-16 | GANmut: Generating and Modifying Facial Expressions | Maria Surani et.al. | 2406.11079 | null |
| 2024-06-16 | Exploring the Limitations of Detecting Machine-Generated Text | Jad Doughman et.al. | 2406.11073 | null |
| 2024-06-16 | Open-Vocabulary X-ray Prohibited Item Detection via Fine-tuning CLIP | Shuyang Lin et.al. | 2406.10961 | null |
| 2024-06-14 | EFM3D: A Benchmark for Measuring Progress Towards 3D Egocentric Foundation Models | Julian Straub et.al. | 2406.10224 | link |
| 2024-06-14 | YOLOv1 to YOLOv10: A comprehensive review of YOLO variants and their application in the agricultural domain | Mujadded Al Rabbani Alif et.al. | 2406.10139 | null |
| 2024-06-14 | Shelf-Supervised Multi-Modal Pre-Training for 3D Object Detection | Mehar Khurana et.al. | 2406.10115 | null |
| 2024-06-14 | Automated GIS-Based Framework for Detecting Crosswalk Changes from Bi-Temporal High-Resolution Aerial Images | Richard Boadu Antwi et.al. | 2406.09731 | null |
| 2024-06-14 | An alternate approach for estimating grain-growth kinetics | Manoj Prabakar et.al. | 2406.09653 | null |
| 2024-06-13 | Scene Graph Generation in Large-Size VHR Satellite Imagery: A Large-Scale Dataset and A Context-Aware Approach | Yansheng Li et.al. | 2406.09410 | link |
| 2024-06-13 | Towards Evaluating the Robustness of Visual State Space Models | Hashmat Shadab Malik et.al. | 2406.09407 | link |
| 2024-06-13 | Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models | Yushi Hu et.al. | 2406.09403 | null |
| 2024-06-13 | Enhanced Object Detection: A Study on Vast Vocabulary Object Detection Track for V3Det Challenge 2024 | Peixi Wu et.al. | 2406.09201 | null |
| 2024-06-13 | Navigating the Shadows: Unveiling Effective Disturbances for Modern AI Content Detectors | Ying Zhou et.al. | 2406.08922 | link |
| 2024-06-13 | Computer vision-based model for detecting turning lane features on Florida’s public roadways | Richard Boadu Antwi et.al. | 2406.08822 | null |
| 2024-06-13 | BEVSpread: Spread Voxel Pooling for Bird’s-Eye-View Representation in Vision-based Roadside 3D Object Detection | Wenjie Wang et.al. | 2406.08785 | null |
| 2024-06-12 | UnO: Unsupervised Occupancy Fields for Perception and Forecasting | Ben Agro et.al. | 2406.08691 | null |
| 2024-06-12 | Transformation-Dependent Adversarial Attacks | Yaoteng Tan et.al. | 2406.08443 | null |
| 2024-06-12 | Dataset Enhancement with Instance-Level Augmentations | Orest Kupyn et.al. | 2406.08249 | link |
| 2024-06-12 | Chemistry3D: Robotic Interaction Benchmark for Chemistry Experiments | Shoujie Li et.al. | 2406.08160 | null |
| 2024-06-12 | CT3D++: Improving 3D Object Detection with Keypoint-induced Channel-wise Transformer | Hualian Sheng et.al. | 2406.08152 | null |
| 2024-06-12 | MWIRSTD: A MWIR Small Target Detection Dataset | Nikhil Kumar et.al. | 2406.08063 | link |
| 2024-06-12 | Sense Less, Generate More: Pre-training LiDAR Perception with Masked Autoencoders for Ultra-Efficient 3D Sensing | Sina Tayebati et.al. | 2406.07833 | link |
| 2024-06-11 | A Deep Learning Approach to Detect Complete Safety Equipment For Construction Workers Based On YOLOv7 | Md. Shariful Islam et.al. | 2406.07707 | null |
| 2024-06-11 | Transforming a rare event search into a not-so-rare event search in real-time with deep learning-based object detection | J. Schueler et.al. | 2406.07538 | null |
| 2024-06-11 | Understanding Visual Concepts Across Models | Brandon Trabucco et.al. | 2406.07506 | link |
| 2024-06-11 | Minimizing Energy Costs in Deep Learning Model Training: The Gaussian Sampling Approach | Challapalli Phanindra Revanth et.al. | 2406.07332 | null |
| 2024-06-11 | Unsupervised Object Detection with Theoretical Guarantees | Marian Longa et.al. | 2406.07284 | null |
| 2024-06-11 | Advancing Grounded Multimodal Named Entity Recognition via LLM-Based Reformulation and Box-Based Segmentation | Jinyuan Li et.al. | 2406.07268 | null |
| 2024-06-11 | EFFOcc: A Minimal Baseline for EFficient Fusion-based 3D Occupancy Network | Yining Shi et.al. | 2406.07042 | link |
| 2024-06-11 | RS-DFM: A Remote Sensing Distributed Foundation Model for Diverse Downstream Tasks | Zhechao Wang et.al. | 2406.07032 | null |
| 2024-06-12 | LiSD: An Efficient Multi-Task Learning Framework for LiDAR Segmentation and Detection | Jiahua Xu et.al. | 2406.07023 | null |
| 2024-06-11 | Teaching with Uncertainty: Unleashing the Potential of Knowledge Distillation in Object Detection | Junfei Yi et.al. | 2406.06999 | null |
| 2024-06-10 | UnSupDLA: Towards Unsupervised Document Layout Analysis | Talha Uddin Sheikh et.al. | 2406.06236 | null |
| 2024-06-10 | UEMM-Air: A Synthetic Multi-modal Dataset for Unmanned Aerial Vehicle Object Detection | Fan Liu et.al. | 2406.06230 | link |
| 2024-06-10 | ReCon1M: A Large-scale Benchmark Dataset for Relation Comprehension in Remote Sensing Imagery | Xian Sun et.al. | 2406.06028 | null |
| 2024-06-10 | Solution for SMART-101 Challenge of CVPR Multi-modal Algorithmic Reasoning Task 2024 | Jinwoo Ahn et.al. | 2406.05963 | null |
| 2024-06-10 | Open-Vocabulary Part-Based Grasping | Tjeard van Oort et.al. | 2406.05951 | null |
| 2024-06-09 | Stealthy Targeted Backdoor Attacks against Image Captioning | Wenshu Fan et.al. | 2406.05874 | null |
| 2024-06-09 | Scaling Graph Convolutions for Mobile Vision | William Avery et.al. | 2406.05850 | link |
| 2024-06-09 | Mamba YOLO: SSMs-Based YOLO For Object Detection | Zeyu Wang et.al. | 2406.05835 | link |
| 2024-06-09 | ControlLoc: Physical-World Hijacking Attack on Visual Perception in Autonomous Driving | Chen Ma et.al. | 2406.05810 | null |
| 2024-06-09 | SAM-PM: Enhancing Video Camouflaged Object Detection using Spatio-Temporal Attention | Muhammad Nawfal Meeran et.al. | 2406.05802 | link |
| 2024-06-07 | Nacala-Roof-Material: Drone Imagery for Roof Detection, Classification, and Segmentation to Support Mosquito-borne Disease Risk Assessment | Venkanna Babu Guthula et.al. | 2406.04949 | null |
| 2024-06-07 | EGOR: Efficient Generated Objects Replay for incremental object detection | Zijia An et.al. | 2406.04829 | null |
| 2024-06-07 | UCDNet: Multi-UAV Collaborative 3D Object Detection Network by Reliable Feature Mapping | Pengju Tian et.al. | 2406.04648 | null |
| 2024-06-07 | UVCPNet: A UAV-Vehicle Collaborative Perception Network for 3D Object Detection | Yuchao Wang et.al. | 2406.04647 | null |
| 2024-06-06 | CORU: Comprehensive Post-OCR Parsing and Receipt Understanding Dataset | Abdelrahman Abdallah et.al. | 2406.04493 | link |
| 2024-06-06 | DeTra: A Unified Model for Object Detection and Trajectory Forecasting | Sergio Casas et.al. | 2406.04426 | null |
| 2024-06-06 | Parameter-Inverted Image Pyramid Networks | Xizhou Zhu et.al. | 2406.04330 | link |
| 2024-06-06 | LenslessFace: An End-to-End Optimized Lensless System for Privacy-Preserving Face Verification | Xin Cai et.al. | 2406.04129 | null |
| 2024-06-06 | Semmeldetector: Application of Machine Learning in Commercial Bakeries | Thomas H. Schmitt et.al. | 2406.04050 | null |
| 2024-06-06 | Frequency-based Matcher for Long-tailed Semantic Segmentation | Shan Li et.al. | 2406.03917 | link |
| 2024-06-06 | Instance Segmentation and Teeth Classification in Panoramic X-rays | Devichand Budagam et.al. | 2406.03747 | link |
| 2024-06-05 | FedPylot: Navigating Federated Learning for Real-Time Object Detection in Internet of Vehicles | Cyprien Quéméneur et.al. | 2406.03611 | link |
| 2024-06-05 | LW-DETR: A Transformer Replacement to YOLO for Real-Time Detection | Qiang Chen et.al. | 2406.03459 | link |
| 2024-06-05 | Global Clipper: Enhancing Safety and Reliability of Transformer-based Object Detection Models | Qutub Syed Sha et.al. | 2406.03229 | null |
| 2024-06-05 | Situation Monitor: Diversity-Driven Zero-Shot Out-of-Distribution Detection using Budding Ensemble Architecture for Object Detection | Qutub Syed et.al. | 2406.03188 | null |
| 2024-06-05 | Enhanced Automotive Object Detection via RGB-D Fusion in a DiffusionDet Framework | Eliraz Orfaig et.al. | 2406.03129 | null |
| 2024-06-04 | Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation | Mohamed El Amine Boudjoghra et.al. | 2406.02548 | link |
| 2024-06-04 | SatSplatYOLO: 3D Gaussian Splatting-based Virtual Object Detection Ensembles for Satellite Feature Recognition | Van Minh Nguyen et.al. | 2406.02533 | null |
| 2024-06-04 | GrootVL: Tree Topology is All You Need in State Space Model | Yicheng Xiao et.al. | 2406.02395 | link |
| 2024-06-04 | Low-Rank Adaption on Transformer-based Oriented Object Detector for Satellite Onboard Processing of Remote Sensing Images | Xinyang Pu et.al. | 2406.02385 | link |
| 2024-06-04 | Radar Spectra-Language Model for Automotive Scene Parsing | Mariia Pushkareva et.al. | 2406.02158 | null |
| 2024-06-04 | Detecting Endangered Marine Species in Autonomous Underwater Vehicle Imagery Using Point Annotations and Few-Shot Learning | Heather Doig et.al. | 2406.01932 | null |
| 2024-06-04 | GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer | Ding Jia et.al. | 2406.01210 | link |
| 2024-06-03 | Learning Adaptive Fusion Bank for Multi-modal Salient Object Detection | Kunpeng Wang et.al. | 2406.01127 | link |
| 2024-06-03 | Visual Car Brand Classification by Implementing a Synthetic Image Dataset Creation Pipeline | Jan Lippemeier et.al. | 2406.01071 | null |
| 2024-06-03 | Multi-Object Tracking based on Imaging Radar 3D Object Detection | Patrick Palmer et.al. | 2406.01011 | null |
| 2024-05-31 | Power of Cooperative Supervision: Multiple Teachers Framework for Enhanced 3D Semi-Supervised Object Detection | Jin-Hee Lee et.al. | 2405.20720 | link |
| 2024-05-30 | On Calibration of Object Detectors: Pitfalls, Evaluation and Baselines | Selim Kuzucu et.al. | 2405.20459 | link |
| 2024-05-30 | RTGen: Generating Region-Text Pairs for Open-Vocabulary Object Detection | Fangyi Chen et.al. | 2405.19854 | null |
| 2024-05-30 | Improving Object Detector Training on Synthetic Data by Starting With a Strong Baseline Methodology | Frank A. Ruis et.al. | 2405.19822 | null |
| 2024-05-30 | Towards Unified Multi-granularity Text Detection with Interactive Attention | Xingyu Wan et.al. | 2405.19765 | null |
| 2024-05-30 | Fully Test-Time Adaptation for Monocular 3D Object Detection | Hongbin Lin et.al. | 2405.19682 | link |
| 2024-05-30 | YotoR-You Only Transform One Representation | José Ignacio Díaz Villa et.al. | 2405.19629 | null |
| 2024-05-29 | Enabling Visual Recognition at Radio Frequency | Haowen Lai et.al. | 2405.19516 | null |
| 2024-05-29 | Model Agnostic Defense against Adversarial Patch Attacks on Object Detection in Unmanned Aerial Vehicles | Saurabh Pathak et.al. | 2405.19179 | null |
| 2024-05-29 | RGB-T Object Detection via Group Shuffled Multi-receptive Attention and Multi-modal Supervision | Jinzhong Wang et.al. | 2405.18955 | null |
| 2024-05-29 | SSGA-Net: Stepwise Spatial Global-local Aggregation Networks for Autonomous Driving | Yiming Cui et.al. | 2405.18857 | null |
| 2024-05-29 | PillarHist: A Quantization-aware Pillar Feature Encoder based on Height-aware Histogram | Sifan Zhou et.al. | 2405.18734 | null |
| 2024-05-28 | A Review and Implementation of Object Detection Models and Optimizations for Real-time Medical Mask Detection during the COVID-19 Pandemic | Ioanna Gogou et.al. | 2405.18387 | link |
| 2024-05-28 | Is a 3D-Tokenized LLM the Key to Reliable Autonomous Driving? | Yifan Bai et.al. | 2405.18361 | null |
| 2024-05-28 | Intent3D: 3D Object Detection in RGB-D Scans Based on Human Intention | Weitai Kang et.al. | 2405.18295 | null |
| 2024-05-28 | DMT-JEPA: Discriminative Masked Targets for Joint-Embedding Predictive Architecture | Shentong Mo et.al. | 2405.17995 | link |
| 2024-05-28 | Transformer and Hybrid Deep Learning Based Models for Machine-Generated Text Detection | Teodor-George Marchitan et.al. | 2405.17964 | null |
| 2024-05-28 | Self-supervised Pre-training for Transferable Multi-modal Perception | Xiaohao Xu et.al. | 2405.17942 | null |
| 2024-05-28 | Boosting General Trimap-free Matting in the Real-World Image | Leo Shan et.al. | 2405.17916 | null |
| 2024-05-28 | The Binary Quantized Neural Network for Dense Prediction via Specially Designed Upsampling and Attention | Xingyu Ding et.al. | 2405.17776 | null |
| 2024-05-27 | Understanding differences in applying DETR to natural and medical images | Yanqi Xu et.al. | 2405.17677 | null |
| 2024-05-27 | Hardness-Aware Scene Synthesis for Semi-Supervised 3D Object Detection | Shuai Zeng et.al. | 2405.17422 | link |
| 2024-05-27 | Tracking Small Birds by Detection Candidate Region Filtering and Detection History-aware Association | Tingwei Liu et.al. | 2405.17323 | null |
| 2024-05-27 | Enhanced Automotive Radar Collaborative Sensing By Exploiting Constructive Interference | Lifan Xu et.al. | 2405.17297 | null |
| 2024-05-27 | SCaRL- A Synthetic Multi-Modal Dataset for Autonomous Driving | Avinash Nittur Ramesh et.al. | 2405.17030 | null |
| 2024-05-27 | Collective Perception Datasets for Autonomous Driving: A Comprehensive Review | Sven Teufel et.al. | 2405.16973 | null |
| 2024-05-27 | OED: Towards One-stage End-to-End Dynamic Scene Graph Generation | Guan Wang et.al. | 2405.16925 | link |
| 2024-05-27 | ContrastAlign: Toward Robust BEV Feature Alignment via Contrastive Learning for Multi-Modal 3D Object Detection | Ziying Song et.al. | 2405.16873 | null |
| 2024-05-27 | A re-calibration method for object detection with multi-modal alignment bias in autonomous driving | Zhihang Song et.al. | 2405.16848 | null |
| 2024-05-26 | A Study on Unsupervised Anomaly Detection and Defect Localization using Generative Model in Ultrasonic Non-Destructive Testing | Yusaku Ando et.al. | 2405.16580 | null |
| 2024-05-26 | AI-Generated Text Detection and Classification Based on BERT Deep Learning Algorithm | Hao Wang et.al. | 2405.16422 | null |
| 2024-05-24 | UNION: Unsupervised 3D Object Detection using Object Appearance-based Pseudo-Classes | Ted Lentsch et.al. | 2405.15688 | link |
| 2024-05-24 | Multimodal Object Detection via Probabilistic a priori Information Integration | Hafsa El Hafyani et.al. | 2405.15596 | null |
| 2024-05-24 | Scale-Invariant Feature Disentanglement via Adversarial Learning for UAV-based Object Detection | Fan Liu et.al. | 2405.15465 | null |
| 2024-05-24 | Leveraging knowledge distillation for partial multi-task learning from multiple remote sensing datasets | Hoàng-Ân Lê et.al. | 2405.15394 | null |
| 2024-05-24 | Towards Global Optimal Visual In-Context Learning Prompt Selection | Chengming Xu et.al. | 2405.15279 | null |
| 2024-05-24 | Unbiased Faster R-CNN for Single-source Domain Generalized Object Detection | Yajing Liu et.al. | 2405.15225 | null |
| 2024-05-24 | ODGEN: Domain-specific Object Detection Data Generation with Diffusion Models | Jingyuan Zhu et.al. | 2405.15199 | null |
| 2024-05-24 | MonoDETRNext: Next-generation Accurate and Efficient Monocular 3D Object Detection Method | Pan Liao et.al. | 2405.15176 | null |
| 2024-05-23 | Learning to Detect and Segment Mobile Objects from Unlabeled Videos | Yihong Sun et.al. | 2405.14841 | null |
| 2024-05-23 | Designing A Sustainable Marine Debris Clean-up Framework without Human Labels | Raymond Wang et.al. | 2405.14815 | null |
| 2024-05-23 | Drones Help Drones: A Collaborative Framework for Multi-Drone Object Trajectory Prediction and Beyond | Zhechao Wang et.al. | 2405.14674 | null |
| 2024-05-23 | Improving Single Domain-Generalized Object Detection: A Focus on Diversification and Alignment | Muhammad Sohail Danish et.al. | 2405.14497 | null |
| 2024-05-23 | YOLOv10: Real-Time End-to-End Object Detection | Ao Wang et.al. | 2405.14458 | link |
| 2024-05-23 | Harmony: A Joint Self-Supervised and Weakly-Supervised Framework for Learning General Purpose Visual Representations | Mohammed Baharoon et.al. | 2405.14239 | null |
| 2024-05-22 | Two Heads are Better Than One: Neural Networks Quantization with 2D Hilbert Curve-based Output Representation | Mykhailo Uss et.al. | 2405.14024 | null |
| 2024-05-22 | TS40K: a 3D Point Cloud Dataset of Rural Terrain and Electrical Transmission System | Diogo Lavado et.al. | 2405.13989 | null |
| 2024-05-22 | Class-Conditional self-reward mechanism for improved Text-to-Image models | Safouane El Ghazouali et.al. | 2405.13473 | link |
| 2024-05-22 | Adaptive Wireless Image Semantic Transmission and Over-The-Air Testing | Jiarun Ding et.al. | 2405.13403 | null |
| 2024-05-21 | BiomedParse: a biomedical foundation model for image parsing of everything everywhere all at once | Theodore Zhao et.al. | 2405.12971 | null |
| 2024-05-21 | AMFD: Distillation via Adaptive Multimodal Fusion for Multispectral Pedestrian Detection | Zizhao Chen et.al. | 2405.12944 | link |
| 2024-05-21 | Predicting the Influence of Adverse Weather on Pedestrian Detection with Automotive Radar and Lidar Sensors | Daniel Weihmayr et.al. | 2405.12736 | null |
| 2024-05-21 | Spotting AI’s Touch: Identifying LLM-Paraphrased Spans in Text | Yafu Li et.al. | 2405.12689 | null |
| 2024-05-21 | Automating Attendance Management in Human Resources: A Design Science Approach Using Computer Vision and Facial Recognition | Bao-Thien Nguyen-Tat et.al. | 2405.12633 | null |
| 2024-05-21 | FFAM: Feature Factorization Activation Map for Explanation of 3D Detectors | Shuai Liu et.al. | 2405.12601 | link |
| 2024-05-21 | Dataset and Benchmark for Urdu Natural Scenes Text Detection, Recognition and Visual Question Answering | Hiba Maryam et.al. | 2405.12533 | null |
| 2024-05-21 | Active Object Detection with Knowledge Aggregation and Distillation from Large Models | Dejie Yang et.al. | 2405.12509 | null |
| 2024-05-21 | Mutual Information Analysis in Multimodal Learning Systems | Hadi Hadizadeh et.al. | 2405.12456 | null |
| 2024-05-20 | Multi-View Attentive Contextualization for Multi-View 3D Object Detection | Xianpeng Liu et.al. | 2405.12200 | null |
| 2024-05-20 | Bangladeshi Native Vehicle Detection in Wild | Bipin Saha et.al. | 2405.12150 | link |
| 2024-05-20 | Salience-guided Ground Factor for Robust Localization of Delivery Robots in Complex Urban Environments | Jooyong Park et.al. | 2405.11855 | null |
| 2024-05-20 | DATR: Unsupervised Domain Adaptive Detection Transformer with Dataset-Level Adaptation and Prototypical Alignment | Jianhong Han et.al. | 2405.11765 | link |
| 2024-05-20 | Versatile Teacher: A Class-aware Teacher-student Framework for Cross-domain Adaptation | Runou Yang et.al. | 2405.11754 | link |
| 2024-05-19 | FADet: A Multi-sensor 3D Object Detection Network based on Local Featured Attention | Ziang Guo et.al. | 2405.11682 | link |
| 2024-05-19 | SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization | Jialong Guo et.al. | 2405.11582 | link |
| 2024-05-19 | The First Swahili Language Scene Text Detection and Recognition Dataset | Fadila Wendigoundi Douamba et.al. | 2405.11437 | link |
| 2024-05-18 | InfRS: Incremental Few-Shot Object Detection in Remote Sensing Images | Wuzhou Li et.al. | 2405.11293 | null |
| 2024-05-18 | Visible and Clear: Finding Tiny Objects in Difference Map | Bing Cao et.al. | 2405.11276 | null |
| 2024-05-17 | A Versatile Framework for Analyzing Galaxy Image Data by Implanting Human-in-the-loop on a Large Vision Model | Mingxiang Fu et.al. | 2405.10890 | null |
| 2024-05-17 | DeepPavlov at SemEval-2024 Task 8: Leveraging Transfer Learning for Detecting Boundaries of Machine-Generated Texts | Anastasia Voznyuk et.al. | 2405.10629 | link |
| 2024-05-17 | DuoSpaceNet: Leveraging Both Bird’s-Eye-View and Perspective View Representations for 3D Object Detection | Zhe Huang et.al. | 2405.10577 | null |
| 2024-05-16 | Drone-type-Set: Drone types detection benchmark for drone detection and tracking | Kholoud AlDosari et.al. | 2405.10398 | null |
| 2024-05-16 | Grounded 3D-LLM with Referent Tokens | Yilun Chen et.al. | 2405.10370 | link |
| 2024-05-16 | Grounding DINO 1.5: Advance the “Edge” of Open-Set Object Detection | Tianhe Ren et.al. | 2405.10300 | link |
| 2024-05-16 | Towards Task-Compatible Compressible Representations | Anderson de Andrade et.al. | 2405.10244 | link |
| 2024-05-16 | SpecDETR: A Transformer-based Hyperspectral Point Object Detection Network | Zhaoxu Li et.al. | 2405.10148 | link |
| 2024-05-16 | SHiNe: Semantic Hierarchy Nexus for Open-vocabulary Object Detection | Mingxuan Liu et.al. | 2405.10053 | link |
| 2024-05-16 | FPDIoU Loss: A Loss Function for Efficient Bounding Box Regression of Rotated Object Detection | Siliang Ma et.al. | 2405.09942 | null |
| 2024-05-16 | Infrared Adversarial Car Stickers | Xiaopei Zhu et.al. | 2405.09924 | null |
| 2024-05-16 | PillarNeXt: Improving the 3D detector by introducing Voxel2Pillar feature encoding and extracting multi-scale features | Xusheng Li et.al. | 2405.09828 | null |
| 2024-05-16 | Size-invariance Matters: Rethinking Metrics and Losses for Imbalanced Multi-object Salient Object Detection | Feiran Li et.al. | 2405.09782 | link |
| 2024-05-15 | Synth-to-Real Unsupervised Domain Adaptation for Instance Segmentation | Guo Yachan et.al. | 2405.09682 | null |
| 2024-05-15 | Dynamic Loss Decay based Robust Oriented Object Detection on Remote Sensing Images with Noisy Labels | Guozhang Liu et.al. | 2405.09024 | null |
| 2024-05-14 | CLIP with Quality Captions: A Strong Pretraining for Vision Tasks | Pavan Kumar Anasosalu Vasu et.al. | 2405.08911 | null |
| 2024-05-14 | Open-Vocabulary Object Detection via Neighboring Region Attention Alignment | Sunyuan Qiang et.al. | 2405.08593 | null |
| 2024-05-14 | Semantic Contextualization of Face Forgery: A New Definition, Dataset, and Detection Method | Mian Zou et.al. | 2405.08487 | link |
| 2024-05-14 | RDPN6D: Residual-based Dense Point-wise Network for 6Dof Object Pose Estimation Based on RGB-D Images | Zong-Wei Hong et.al. | 2405.08483 | link |
| 2024-05-14 | Multimodal Collaboration Networks for Geospatial Vehicle Detection in Dense, Occluded, and Large-Scale Events | Xin Wu et.al. | 2405.08251 | link |
| 2024-05-13 | RAID: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors | Liam Dugan et.al. | 2405.07940 | null |
| 2024-05-13 | oTTC: Object Time-to-Contact for Motion Estimation in Autonomous Driving | Abdul Hannan Khan et.al. | 2405.07698 | null |
| 2024-05-13 | MonoMAE: Enhancing Monocular 3D Detection through Depth-Aware Masked Autoencoders | Xueying Jiang et.al. | 2405.07696 | null |
| 2024-05-13 | Quality-aware Selective Fusion Network for V-D-T Salient Object Detection | Liuxin Bao et.al. | 2405.07655 | link |
| 2024-05-13 | Fast Training Data Acquisition for Object Detection and Segmentation using Black Screen Luminance Keying | Thomas Pöllabauer et.al. | 2405.07653 | null |
| 2024-05-13 | Integrity Monitoring of 3D Object Detection in Automated Driving Systems using Raw Activation Patterns and Spatial Filtering | Hakan Yekta Yatbaz et.al. | 2405.07600 | null |
| 2024-05-13 | Environmental Matching Attack Against Unmanned Aerial Vehicles Object Detection | Dehong Kong et.al. | 2405.07595 | null |
| 2024-05-13 | Text Grouping Adapter: Adapting Pre-trained Text Detector for Layout Analysis | Tianci Bi et.al. | 2405.07481 | null |
| 2024-05-13 | Enhancing 3D Object Detection by Using Neural Network with Self-adaptive Thresholding | Houze Liu et.al. | 2405.07479 | null |
| 2024-05-12 | MAML MOT: Multiple Object Tracking based on Meta-Learning | Jiayi Chen et.al. | 2405.07272 | null |
| 2024-05-10 | How to Augment for Atmospheric Turbulence Effects on Thermal Adapted Object Detection Models? | Engin Uzun et.al. | 2405.06383 | null |
| 2024-05-10 | Precise Apple Detection and Localization in Orchards using YOLOv5 for Robotic Harvesting Systems | Jiang Ziyue et.al. | 2405.06260 | null |
| 2024-05-09 | CSA-Net: Channel-wise Spatially Autocorrelated Attention Networks | Nick et.al. | 2405.05755 | null |
| 2024-05-09 | Depth Awakens: A Depth-perceptual Attention Fusion Network for RGB-D Camouflaged Object Detection | Xinran Liu et.al. | 2405.05614 | null |
| 2024-05-09 | The object detection model uses combined extraction with KNN and RF classification | Florentina Tatrin Kurniati et.al. | 2405.05551 | null |
| 2024-05-08 | Reviewing Intelligent Cinematography: AI research for camera-based video production | Adrian Azzarelli et.al. | 2405.05039 | null |
| 2024-05-07 | A Novel Wide-Area Multiobject Detection System with High-Probability Region Searching | Xianlei Long et.al. | 2405.04589 | null |
| 2024-05-07 | DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving | Chen Min et.al. | 2405.04390 | null |
| 2024-05-07 | A New Dataset and Comparative Study for Aphid Cluster Detection and Segmentation in Sorghum Fields | Raiyan Rahman et.al. | 2405.04305 | null |
| 2024-05-07 | ViewFormer: Exploring Spatiotemporal Modeling for Multi-View 3D Occupancy Perception via View-Guided Transformers | Jinke Li et.al. | 2405.04299 | null |
| 2024-05-07 | Who Wrote This? The Key to Zero-Shot LLM-Generated Text Detection Is GECScore | Junchao Wu et.al. | 2405.04286 | null |
| 2024-05-07 | Deep Event-based Object Detection in Autonomous Driving: A Survey | Bingquan Zhou et.al. | 2405.03995 | null |
| 2024-05-06 | BadFusion: 2D-Oriented Backdoor Attacks against 3D Object Detection | Saket S. Chaturvedi et.al. | 2405.03884 | null |
| 2024-05-06 | RepVGG-GELAN: Enhanced GELAN with VGG-STYLE ConvNets for Brain Tumour Detection | Thennarasi Balakrishnan et.al. | 2405.03541 | link |
| 2024-05-06 | Low-light Object Detection | Pengpeng Li et.al. | 2405.03519 | null |
| 2024-05-06 | Salient Object Detection From Arbitrary Modalities | Nianchang Huang et.al. | 2405.03352 | null |
| 2024-05-06 | Modality Prompts for Arbitrary Modality Salient Object Detection | Nianchang Huang et.al. | 2405.03351 | null |
| 2024-05-06 | Vietnamese AI Generated Text Detection | Quang-Dan Tran et.al. | 2405.03206 | null |
| 2024-05-06 | PTQ4SAM: Post-Training Quantization for Segment Anything | Chengtao Lv et.al. | 2405.03144 | link |
| 2024-05-05 | Performance Evaluation of Real-Time Object Detection for Electric Scooters | Dong Chen et.al. | 2405.03039 | link |
| 2024-05-05 | SalFAU-Net: Saliency Fusion Attention U-Net for Salient Object Detection | Kassaw Abraham Mulat et.al. | 2405.02906 | null |
| 2024-05-07 | Adaptive Guidance Learning for Camouflaged Object Detection | Zhennan Chen et.al. | 2405.02824 | null |
| 2024-05-05 | PVTransformer: Point-to-Voxel Transformer for Scalable 3D Object Detection | Zhaoqi Leng et.al. | 2405.02811 | null |
| 2024-05-02 | Segmentation-Free Outcome Prediction in Head and Neck Cancer: Deep Learning-based Feature Extraction from Multi-Angle Maximum Intensity Projections (MA-MIPs) of PET Images | Amirhosein Toosi et.al. | 2405.01756 | null |
| 2024-05-02 | PointCompress3D – A Point Cloud Compression Framework for Roadside LiDARs in Intelligent Transportation Systems | Walter Zimmer et.al. | 2405.01750 | null |
| 2024-05-02 | Development of Skip Connection in Deep Neural Networks for Computer Vision and Medical Image Analysis: A Survey | Guoping Xu et.al. | 2405.01725 | link |
| 2024-05-02 | SOAR: Advancements in Small Body Object Detection for Aerial Imagery Using State Space Models and Programmable Gradients | Tushar Verma et.al. | 2405.01699 | null |
| 2024-05-02 | Imagine the Unseen: Occluded Pedestrian Detection via Adversarial Feature Completion | Shanshan Zhang et.al. | 2405.01311 | null |
| 2024-05-02 | Overcoming LLM Challenges using RAG-Driven Precision in Coffee Leaf Disease Remediation | Dr. Selva Kumar S et.al. | 2405.01310 | null |
| 2024-05-02 | Towards Consistent Object Detection via LiDAR-Camera Synergy | Kai Luo et.al. | 2405.01258 | link |
| 2024-05-02 | Federated Learning with Heterogeneous Data Handling for Robust Vehicular Object Detection | Ahmad Khalil et.al. | 2405.01108 | null |
| 2024-05-01 | Grains of Saliency: Optimizing Saliency-based Training of Biometric Attack Detection Models | Colton R. Crum et.al. | 2405.00650 | null |
| 2024-05-01 | Object detection under the linear subspace model with application to cryo-EM images | Amitay Eldar et.al. | 2405.00364 | null |
| 2024-04-30 | Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation | Yunhao Ge et.al. | 2404.19752 | null |
| 2024-04-30 | Quantifying Nematodes through Images: Datasets, Models, and Baselines of Deep Learning | Zhipeng Yuan et.al. | 2404.19748 | null |
| 2024-04-30 | Masked Multi-Query Slot Attention for Unsupervised Object Discovery | Rishav Pramanik et.al. | 2404.19654 | link |
| 2024-04-30 | Physical Backdoor: Towards Temperature-based Backdoor Attacks in the Physical World | Wen Yin et.al. | 2404.19417 | null |
| 2024-04-30 | UniFS: Universal Few-shot Instance Perception with Point Representations | Sheng Jin et.al. | 2404.19401 | null |
| 2024-04-30 | Pseudo Label Refinery for Unsupervised Domain Adaptation on Cross-dataset 3D Object Detection | Zhanwei Zhang et.al. | 2404.19384 | null |
| 2024-04-30 | Robust Pedestrian Detection via Constructing Versatile Pedestrian Knowledge Bank | Sungjune Park et.al. | 2404.19299 | null |
| 2024-04-29 | MiPa: Mixed Patch Infrared-Visible Modality Agnostic Object Detection | Heitor R. Medeiros et.al. | 2404.18849 | null |
| 2024-04-29 | Leveraging PointNet and PointNet++ for Lyft Point Cloud Classification Challenge | Rajat K. Doshi et.al. | 2404.18665 | null |
| 2024-04-29 | CoSense3D: an Agent-based Efficient Learning Framework for Collective Perception | Yunshuang Yuan et.al. | 2404.18617 | null |
| 2024-04-29 | Assessing Quality Metrics for Neural Reality Gap Input Mitigation in Autonomous Driving Testing | Stefano Carlo Lambertenghi et.al. | 2404.18577 | null |
| 2024-04-29 | Efficient Meta-Learning Enabled Lightweight Multiscale Few-Shot Object Detection in Remote Sensing Images | Wenbin Guan et.al. | 2404.18426 | null |
| 2024-04-29 | Multi-modal Perception Dataset of In-water Objects for Autonomous Surface Vehicles | Mingi Jeong et.al. | 2404.18411 | null |
| 2024-04-28 | FAD-SAR: A Novel Fishing Activity Detection System via Synthetic Aperture Radar Images Based on Deep Learning Method | Yanbing Bai et.al. | 2404.18245 | null |
| 2024-04-28 | RadSimReal: Bridging the Gap Between Synthetic and Real Data in Radar Object Detection With Simulation | Oded Bialer et.al. | 2404.18150 | null |
| 2024-04-27 | Reliable Student: Addressing Noise in Semi-Supervised 3D Object Detection | Farzad Nozarian et.al. | 2404.17910 | link |
| 2024-04-27 | A Hybrid Approach for Document Layout Analysis in Document images | Tahira Shehzadi et.al. | 2404.17888 | null |
| 2024-04-26 | Inhomogeneous illuminated image enhancement under extremely low visibility condition | Libang Chen et.al. | 2404.17503 | null |
| 2024-04-26 | Cost-Sensitive Uncertainty-Based Failure Recognition for Object Detection | Moussa Kassem Sbeyti et.al. | 2404.17427 | null |
| 2024-04-26 | Enhancing mmWave Radar Point Cloud via Visual-inertial Supervision | Cong Fan et.al. | 2404.17229 | null |
| 2024-04-26 | MorphText: Deep Morphology Regularized Arbitrary-shape Scene Text Detection | Chengpei Xu et.al. | 2404.17151 | null |
| 2024-04-25 | Generating Minimalist Adversarial Perturbations to Test Object-Detection Models: An Adaptive Multi-Metric Evolutionary Search Approach | Cristopher McIntyre-Garcia et.al. | 2404.17020 | link |
| 2024-04-25 | Constellation Dataset: Benchmarking High-Altitude Object Detection for an Urban Intersection | Mehmet Kerem Turkcan et.al. | 2404.16944 | link |
| 2024-04-25 | Self-Balanced R-CNN for Instance Segmentation | Leonardo Rossi et.al. | 2404.16633 | link |
| 2024-04-25 | Cross-Domain Spatial Matching for Camera and Radar Sensor Data Fusion in Autonomous Vehicle Perception System | Daniel Dworak et.al. | 2404.16548 | null |
| 2024-04-25 | Commonsense Prototype for Outdoor Unsupervised 3D Object Detection | Hai Wu et.al. | 2404.16493 | link |
| 2024-04-25 | IMWA: Iterative Model Weight Averaging Benefits Class-Imbalanced Learning Tasks | Zitong Huang et.al. | 2404.16331 | null |
| 2024-04-25 | CFMW: Cross-modality Fusion Mamba for Multispectral Object Detection under Adverse Weather Conditions | Haoyuan Li et.al. | 2404.16302 | link |
| 2024-04-24 | AutoGluon-Multimodal (AutoMM): Supercharging Multimodal AutoML with Foundation Models | Zhiqiang Tang et.al. | 2404.16233 | null |
| 2024-04-24 | Observational parameters of Blue Large-Amplitude Pulsators | P. Pietrukowicz et.al. | 2404.16089 | null |
| 2024-04-24 | A Survey on Visual Mamba | Hanwei Zhang et.al. | 2404.15956 | null |
| 2024-04-24 | Steal Now and Attack Later: Evaluating Robustness of Object Detection against Black-box Adversarial Attacks | Erh-Chung Chen et.al. | 2404.15881 | null |
| 2024-04-24 | Revisiting Out-of-Distribution Detection in LiDAR-based 3D Object Detection | Michael Kösel et.al. | 2404.15879 | link |
| 2024-04-23 | CFPFormer: Feature-pyramid like Transformer Decoder for Segmentation and Detection | Hongyi Cai et.al. | 2404.15451 | null |
| 2024-04-23 | ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning | Weifeng Chen et.al. | 2404.15449 | null |
| 2024-04-23 | Source-free Domain Adaptation for Video Object Detection Under Adverse Image Conditions | Xingguang Zhang et.al. | 2404.15252 | null |
| 2024-04-23 | Efficient Transformer Encoders for Mask2Former-style models | Manyi Yao et.al. | 2404.15244 | null |
| 2024-04-23 | Gallbladder Cancer Detection in Ultrasound Images based on YOLO and Faster R-CNN | Sara Dadjouy et.al. | 2404.15129 | null |
| 2024-04-23 | External Prompt Features Enhanced Parameter-efficient Fine-tuning for Salient Object Detection | Wen Liang et.al. | 2404.15008 | null |
| 2024-04-23 | ContextualFusion: Context-Based Multi-Sensor Fusion for 3D Object Detection in Adverse Operating Conditions | Shounak Sural et.al. | 2404.14780 | null |
| 2024-04-23 | Unified Unsupervised Salient Object Detection via Knowledge Transfer | Yao Yuan et.al. | 2404.14759 | link |
| 2024-04-22 | SemEval-2024 Task 8: Multidomain, Multimodel and Multilingual Machine-Generated Text Detection | Yuxia Wang et.al. | 2404.14183 | null |
| 2024-04-22 | Text in the Dark: Extremely Low-Light Text Image Enhancement | Che-Tsung Lin et.al. | 2404.14135 | null |
| 2024-04-22 | CKD: Contrastive Knowledge Distillation from A Sample-wise Perspective | Wencheng Zhu et.al. | 2404.14109 | null |
| 2024-04-22 | Benchmarking Multi-Modal LLMs for Testing Visual Deep Learning Systems Through the Lens of Image Mutation | Liwen Wang et.al. | 2404.13945 | null |
| 2024-04-22 | NeRF-DetS: Enhancing Multi-View 3D Object Detection with Sampling-adaptive Network of Continuous NeRF-based Representation | Chi Huang et.al. | 2404.13921 | null |
| 2024-04-22 | TeamTrack: A Dataset for Multi-Sport Multi-Object Tracking in Full-pitch Videos | Atom Scott et.al. | 2404.13868 | null |
| 2024-04-22 | Toward Robust LiDAR based 3D Object Detection via Density-Aware Adaptive Thresholding | Eunho Lee et.al. | 2404.13852 | null |
| 2024-04-21 | A Nasal Cytology Dataset for Object Detection and Deep Learning | Mauro Camporeale et.al. | 2404.13745 | null |
| 2024-04-23 | Clio: Real-time Task-Driven Open-Set 3D Scene Graphs | Dominic Maggio et.al. | 2404.13696 | null |
| 2024-04-20 | FisheyeDetNet: Object Detection on Fisheye Surround View Camera Systems for Automated Driving | Ganesh Sistu et.al. | 2404.13443 | null |
| 2024-04-19 | A comparison between single-stage and two-stage 3D tracking algorithms for greenhouse robotics | David Rapado-Rincon et.al. | 2404.12963 | null |
| 2024-04-19 | Language-Driven Active Learning for Diverse Open-Set 3D Object Detection | Ross Greer et.al. | 2404.12856 | null |
| 2024-04-19 | ECOR: Explainable CLIP for Object Recognition | Ali Rasekh et.al. | 2404.12839 | null |
| 2024-04-19 | A Point-Based Approach to Efficient LiDAR Multi-Task Perception | Christopher Lang et.al. | 2404.12798 | null |
| 2024-04-19 | ELEV-VISION-SAM: Integrated Vision Language and Foundation Model for Automated Estimation of Building Lowest Floor Elevation | Yu-Hsuan Ho et.al. | 2404.12606 | null |
| 2024-04-18 | The devil is in the object boundary: towards annotation-free instance segmentation using Foundation Models | Cheng Shi et.al. | 2404.11957 | link |
| 2024-04-18 | Simultaneous Detection and Interaction Reasoning for Object-Centric Action Recognition | Xunsong Li et.al. | 2404.11903 | null |
| 2024-04-17 | TempBEV: Improving Learned BEV Encoders with Combined Image and BEV Space Temporal Aggregation | Thomas Monninger et.al. | 2404.11803 | null |
| 2024-04-17 | Multimodal 3D Object Detection on Unseen Domains | Deepti Hegde et.al. | 2404.11764 | null |
| 2024-04-17 | Equivariant Spatio-Temporal Self-Supervision for LiDAR Object Detection | Deepti Hegde et.al. | 2404.11737 | null |
| 2024-04-17 | Multi-resolution Rescored ByteTrack for Video Object Detection on Ultra-low-power Embedded Systems | Luca Bompani et.al. | 2404.11488 | link |
| 2024-04-17 | EcoMLS: A Self-Adaptation Approach for Architecting Green ML-Enabled Systems | Meghana Tedla et.al. | 2404.11411 | null |
| 2024-04-17 | Detector Collapse: Backdooring Object Detection to Catastrophic Overload or Blindness | Hangtao Zhang et.al. | 2404.11357 | null |
| 2024-04-17 | Simple In-place Data Augmentation for Surveillance Object Detection | Munkh-Erdene Otgonbold et.al. | 2404.11226 | null |
| 2024-04-17 | Feature Corrective Transfer Learning: End-to-End Solutions to Object Detection in Non-Ideal Visual Conditions | Chuheng Wei et.al. | 2404.11214 | null |
| 2024-04-17 | GhostNetV3: Exploring the Training Strategies for Compact Models | Zhenhua Liu et.al. | 2404.11202 | link |
| 2024-04-17 | How to deal with glare for improved perception of Autonomous Vehicles | Muhammad Z. Alam et.al. | 2404.10992 | null |
| 2024-04-17 | Leveraging 3D LiDAR Sensors to Enable Enhanced Urban Safety and Public Health: Pedestrian Monitoring and Abnormal Activity Detection | Nawfal Guefrachi et.al. | 2404.10978 | null |
| 2024-04-16 | OSR-ViT: A Simple and Modular Framework for Open-Set Object Detection and Discovery | Matthew Inkawhich et.al. | 2404.10865 | null |
| 2024-04-16 | Learning Feature Inversion for Multi-class Anomaly Detection under General-purpose COCO-AD Benchmark | Jiangning Zhang et.al. | 2404.10760 | null |
| 2024-04-16 | Watch Your Step: Optimal Retrieval for Continual Learning at Scale | Truman Hickok et.al. | 2404.10758 | null |
| 2024-04-16 | Efficient optimal dispersed Haar-like filters for face detection | Zeinab Sedaghatjoo et.al. | 2404.10476 | null |
| 2024-04-16 | Camera clustering for scalable stream-based active distillation | Dani Manjah et.al. | 2404.10411 | null |
| 2024-04-15 | Low-Light Image Enhancement Framework for Improved Object Detection in Fisheye Lens Datasets | Dai Quoc Tran et.al. | 2404.10078 | link |
| 2024-04-15 | Explainable Light-Weight Deep Learning Pipeline for Improved Drought Stress | Aswini Kumar Patra et.al. | 2404.10073 | null |
| 2024-04-15 | VFMM3D: Releasing the Potential of Image by Vision Foundation Model for Monocular 3D Object Detection | Bonan Ding et.al. | 2404.09431 | null |
| 2024-04-14 | TEXT2TASTE: A Versatile Egocentric Vision System for Intelligent Reading Assistance Using Large Language Model | Wiktor Mucha et.al. | 2404.09254 | null |
| 2024-04-14 | DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection | Lewei Yao et.al. | 2404.09216 | null |
| 2024-04-14 | Coreset Selection for Object Detection | Hojun Lee et.al. | 2404.09161 | null |
| 2024-04-14 | Fusion-Mamba for Cross-modality Object Detection | Wenhao Dong et.al. | 2404.09146 | null |
| 2024-04-13 | The Snake’s Beating Heart? A Millisecond Pulsar Binary in the Galactic Center Radio Filament G359.1-0.2 | Marcus E. Lower et.al. | 2404.09098 | null |
| 2024-04-13 | BG-YOLO: A Bidirectional-Guided Method for Underwater Object Detection | Jian Zhang et.al. | 2404.08979 | null |
| 2024-04-13 | Shifting Spotlight for Co-supervision: A Simple yet Efficient Single-branch Network to See Through Camouflage | Yang Hu et.al. | 2404.08936 | null |
| 2024-04-12 | Training-free Boost for Open-Vocabulary Object Detection with Confidence Aggregation | Yanhao Zheng et.al. | 2404.08603 | link |
| 2024-04-12 | FashionFail: Addressing Failure Cases in Fashion Object Detection and Segmentation | Riza Velioglu et.al. | 2404.08582 | link |
| 2024-04-12 | Analyzing Decades-Long Environmental Changes in Namibia Using Archival Aerial Photography and Deep Learning | Girmaw Abebe Tadesse et.al. | 2404.08544 | null |
| 2024-04-12 | MambaDFuse: A Mamba-based Dual-phase Model for Multi-modality Image Fusion | Zhe Li et.al. | 2404.08406 | null |
| 2024-04-12 | Overcoming Scene Context Constraints for Object Detection in wild using Defilters | Vamshi Krishna Kancharla et.al. | 2404.08293 | null |
| 2024-04-11 | ConsistencyDet: Robust Object Detector with Denoising Paradigm of Consistency Model | Lifan Jiang et.al. | 2404.07773 | link |
| 2024-04-11 | Exploiting Object-based and Segmentation-based Semantic Features for Deep Learning-based Indoor Scene Classification | Ricardo Pereira et.al. | 2404.07739 | null |
| 2024-04-11 | Run-time Monitoring of 3D Object Detection in Automated Driving Systems Using Early Layer Neural Activation Patterns | Hakan Yekta Yatbaz et.al. | 2404.07685 | null |
| 2024-04-11 | Finding Dino: A plug-and-play framework for unsupervised detection of out-of-distribution objects using prototypes | Poulami Sinhamahapatra et.al. | 2404.07664 | null |
| 2024-04-11 | Separated Attention: An Improved Cycle GAN Based Under Water Image Enhancement Method | Tashmoy Ghosh et.al. | 2404.07649 | null |
| 2024-04-11 | GLID: Pre-training a Generalist Encoder-Decoder Vision Model | Jihao Liu et.al. | 2404.07603 | null |
| 2024-04-11 | SFSORT: Scene Features-based Simple Online Real-Time Tracker | M. M. Morsali et.al. | 2404.07553 | link |
| 2024-04-11 | The Sydney Radio Star Catalogue: properties of radio stars at megahertz to gigahertz frequencies | Laura N. Driessen et.al. | 2404.07418 | null |
| 2024-04-11 | Simplifying Two-Stage Detectors for On-Device Inference in Remote Sensing | Jaemin Kang et.al. | 2404.07405 | null |
| 2024-04-11 | A fine-tuning workflow for automatic first-break picking with deep learning | Amir Mardan et.al. | 2404.07400 | link |
| 2024-04-10 | Identification of Fine-grained Systematic Errors via Controlled Scene Generation | Valentyn Boreiko et.al. | 2404.07045 | null |
| 2024-04-10 | Accurate Tennis Court Line Detection on Amateur Recorded Matches | Sameer Agrawal et.al. | 2404.06977 | null |
| 2024-04-10 | SARA: Smart AI Reading Assistant for Reading Comprehension | Enkeleda Thaqi et.al. | 2404.06906 | null |
| 2024-04-10 | Sparse Points to Dense Clouds: Enhancing 3D Detection with Limited LiDAR Data | Aakash Kumar et.al. | 2404.06715 | null |
| 2024-04-10 | Scaling Multi-Camera 3D Object Detection through Weak-to-Strong Eliciting | Hao Lu et.al. | 2404.06700 | link |
| 2024-04-09 | Learning Embeddings with Centroid Triplet Loss for Object Identification in Robotic Grasping | Anas Gouda et.al. | 2404.06277 | link |
| 2024-04-09 | Label-Efficient 3D Object Detection For Road-Side Units | Minh-Quan Dao et.al. | 2404.06256 | null |
| 2024-04-09 | Automatic Defect Detection in Sewer Network Using Deep Learning Based Object Detector | Bach Ha et.al. | 2404.06219 | null |
| 2024-04-09 | YOLC: You Only Look Clusters for Tiny Object Detection in Aerial Images | Chenguang Liu et.al. | 2404.06180 | null |
| 2024-04-09 | Enhanced Radar Perception via Multi-Task Learning: Towards Refined Data for Sensor Fusion Applications | Huawei Sun et.al. | 2404.06165 | null |
| 2024-04-09 | Improving Facial Landmark Detection Accuracy and Efficiency with Knowledge Distillation | Zong-Wei Hong et.al. | 2404.06029 | null |
| 2024-04-08 | Retrieval-Augmented Open-Vocabulary Object Detection | Jooyeon Kim et.al. | 2404.05687 | link |
| 2024-04-08 | 3D-COCO: extension of MS-COCO dataset for image detection and 3D reconstruction modules | Maxence Bideaux et.al. | 2404.05641 | null |
| 2024-04-08 | PetKaz at SemEval-2024 Task 8: Can Linguistics Capture the Specifics of LLM-generated Text? | Kseniia Petukhova et.al. | 2404.05483 | null |
| 2024-04-08 | Detecting Every Object from Events | Haitian Zhang et.al. | 2404.05285 | link |
| 2024-04-08 | MOSE: Boosting Vision-based Roadside 3D Object Detection with Scene Cues | Xiahan Chen et.al. | 2404.05280 | null |
| 2024-04-08 | Rendering-Enhanced Automatic Image-to-Point Cloud Registration for Roadside Scenes | Yu Sheng et.al. | 2404.05164 | null |
| 2024-04-08 | Better Monocular 3D Detectors with LiDAR from the Past | Yurong You et.al. | 2404.05139 | link |
| 2024-04-07 | AirShot: Efficient Few-Shot Detection for Autonomous Exploration | Zihan Wang et.al. | 2404.05069 | link |
| 2024-04-07 | PlateSegFL: A Privacy-Preserving License Plate Detection Using Federated Segmentation Learning | Md. Shahriar Rahman Anuvab et.al. | 2404.05049 | null |
| 2024-04-07 | PathFinder: Attention-Driven Dynamic Non-Line-of-Sight Tracking with a Mobile Robot | Shenbagaraj Kannapiran et.al. | 2404.05024 | null |
| 2024-04-05 | SCAResNet: A ResNet Variant Optimized for Tiny Object Detection in Transmission and Distribution Towers | Weile Li et.al. | 2404.04179 | link |
| 2024-04-05 | Designing Robots to Help Women | Martin Cooney et.al. | 2404.04123 | null |
| 2024-04-04 | Is CLIP the main roadblock for fine-grained open-world perception? | Lorenzo Bianchi et.al. | 2404.03539 | link |
| 2024-04-04 | DQ-DETR: DETR with Dynamic Query for Tiny Object Detection | Yi-Xin Huang et.al. | 2404.03507 | link |
| 2024-04-05 | A Methodology to Study the Impact of Spiking Neural Network Parameters considering Event-Based Automotive Data | Iqra Bano et.al. | 2404.03493 | null |
| 2024-04-04 | MonoCD: Monocular 3D Object Detection with Complementary Depths | Longfei Yan et.al. | 2404.03181 | link |
| 2024-04-03 | DPFT: Dual Perspective Fusion Transformer for Camera-Radar-based Object Detection | Felix Fent et.al. | 2404.03015 | null |
| 2024-04-03 | ALOHa: A New Measure for Hallucination in Captioning Models | Suzanne Petryk et.al. | 2404.02904 | null |
| 2024-04-03 | FlightScope: A Deep Comprehensive Assessment of Aircraft Detection Algorithms in Satellite Imagery | Safouane El Ghazouali et.al. | 2404.02877 | link |
| 2024-04-03 | HENet: Hybrid Encoding for End-to-end Multi-task 3D Perception from Multi-view Cameras | Zhongyu Xia et.al. | 2404.02517 | link |
| 2024-04-04 | TE-TAD: Towards Full End-to-End Temporal Action Detection via Time-Aligned Coordinate Expression | Ho-Joong Kim et.al. | 2404.02405 | null |
| 2024-04-04 | EGTR: Extracting Graph from Transformer for Scene Graph Generation | Jinbae Im et.al. | 2404.02072 | link |
| 2024-04-03 | Cooperative Students: Navigating Unsupervised Domain Adaptation in Nighttime Object Detection | Jicheng Yuan et.al. | 2404.01988 | link |
| 2024-04-02 | Towards Enhanced Analysis of Lung Cancer Lesions in EBUS-TBNA – A Semi-Supervised Video Object Detection Method | Jyun-An Lin et.al. | 2404.01929 | null |
| 2024-04-02 | Humanizing Machine-Generated Content: Evading AI-Text Detection through Adversarial Attack | Ying Zhou et.al. | 2404.01907 | link |
| 2024-04-02 | Scene Adaptive Sparse Transformer for Event-based Object Detection | Yansong Peng et.al. | 2404.01882 | link |
| 2024-04-02 | Semi-Supervised Domain Adaptation for Wildfire Detection | JooYoung Jang et.al. | 2404.01842 | null |
| 2024-04-02 | Sparse Semi-DETR: Sparse Learnable Queries for Semi-Supervised Object Detection | Tahira Shehzadi et.al. | 2404.01819 | null |
| 2024-04-02 | Analyzing the Single Event Upset Vulnerability of Binarized Neural Networks on SRAM FPGAs | Ioanna Souvatzoglou et.al. | 2404.01757 | null |
| 2024-04-02 | Disentangled Pre-training for Human-Object Interaction Detection | Zhuolong Li et.al. | 2404.01725 | null |
| 2024-04-02 | Task Integration Distillation for Object Detectors | Hai Su et.al. | 2404.01699 | null |
| 2024-03-29 | PLoc: A New Evaluation Criterion Based on Physical Location for Autonomous Driving Datasets | Ruining Yang et.al. | 2403.19893 | null |
| 2024-03-29 | MambaMixer: Efficient Selective State Space Models with Dual Token and Channel Selection | Ali Behrouz et.al. | 2403.19888 | null |
| 2024-03-28 | DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs | Donghyun Kim et.al. | 2403.19588 | link |
| 2024-03-28 | OV-Uni3DETR: Towards Unified Open-Vocabulary 3D Object Detection via Cycle-Modality Propagation | Zhenyu Wang et.al. | 2403.19580 | null |
| 2024-03-28 | AIpom at SemEval-2024 Task 8: Detecting AI-produced Outputs in M4 | Alexander Shirnin et.al. | 2403.19354 | null |
| 2024-03-28 | Sparse Generation: Making Pseudo Labels Sparse for weakly supervision with points | Tian Ma et.al. | 2403.19306 | null |
| 2024-03-28 | CAT: Exploiting Inter-Class Dynamics for Domain Adaptive Object Detection | Mikhail Kennerley et.al. | 2403.19278 | link |
| 2024-03-28 | Algorithmic Ways of Seeing: Using Object Detection to Facilitate Art Exploration | Louie Søs Meyer et.al. | 2403.19174 | null |
| 2024-03-28 | CRKD: Enhanced Camera-Radar Object Detection with Cross-modality Knowledge Distillation | Lingjun Zhao et.al. | 2403.19104 | null |
| 2024-03-28 | A Real-Time Framework for Domain-Adaptive Underwater Object Detection with Image Enhancement | Junjie Wen et.al. | 2403.19079 | null |
| 2024-03-27 | Illicit object detection in X-ray images using Vision Transformers | Jorgen Cani et.al. | 2403.19043 | null |
| 2024-03-27 | Benchmarking Object Detectors with COCO: A New Path Forward | Shweta Singh et.al. | 2403.18819 | link |
| 2024-03-27 | PhysicsAssistant: An LLM-Powered Interactive Learning Robot for Physics Lab Investigations | Ehsan Latif et.al. | 2403.18721 | null |
| 2024-03-27 | CosalPure: Learning Concept from Group Images for Robust Co-Saliency Detection | Jiayi Zhu et.al. | 2403.18554 | null |
| 2024-03-27 | BAM: Box Abstraction Monitors for Real-time OoD Detection in Object Detection | Changshun Wu et.al. | 2403.18373 | null |
| 2024-03-27 | Ship in Sight: Diffusion Models for Ship-Image Super Resolution | Luigi Sigillo et.al. | 2403.18370 | link |
| 2024-03-27 | DODA: Diffusion for Object-detection Domain Adaptation in Agriculture | Shuai Xiang et.al. | 2403.18334 | null |
| 2024-03-27 | Tracking-Assisted Object Detection with Event Cameras | Ting-Kang Yen et.al. | 2403.18330 | null |
| 2024-03-27 | SGDM: Static-Guided Dynamic Module Make Stronger Visual Models | Wenjie Xing et.al. | 2403.18282 | null |
| 2024-03-27 | Road Obstacle Detection based on Unknown Objectness Scores | Chihiro Noguchi et.al. | 2403.18207 | null |
| 2024-03-26 | State of the art applications of deep learning within tracking and detecting marine debris: A survey | Zoe Moorton et.al. | 2403.18067 | null |
| 2024-03-26 | The Solution for the CVPR 2023 1st foundation model challenge-Track2 | Haonan Xu et.al. | 2403.17702 | null |
| 2024-03-26 | PlainMamba: Improving Non-Hierarchical Mamba in Visual Recognition | Chenhongyi Yang et.al. | 2403.17695 | link |
| 2024-03-26 | UADA3D: Unsupervised Adversarial Domain Adaptation for 3D Object Detection with Sparse LiDAR and Large Domain Gaps | Maciej K Wozniak et.al. | 2403.17633 | null |
| 2024-03-26 | SSF3D: Strict Semi-Supervised 3D Object Detection with Switching Filter | Songbur Wong et.al. | 2403.17390 | null |
| 2024-03-26 | Decoupled Pseudo-labeling for Semi-Supervised Monocular 3D Object Detection | Jiacheng Zhang et.al. | 2403.17387 | null |
| 2024-03-26 | AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving | Mingfu Liang et.al. | 2403.17373 | null |
| 2024-03-26 | Staircase Localization for Autonomous Exploration in Urban Environments | Jinrae Kim et.al. | 2403.17330 | null |
| 2024-03-25 | Co-Occurring of Object Detection and Identification towards unlabeled object discovery | Binay Kumar Singh et.al. | 2403.17223 | null |
| 2024-03-25 | Optimizing LiDAR Placements for Robust Driving Perception in Adverse Conditions | Ye Li et.al. | 2403.17009 | link |
| 2024-03-25 | Isolated Diffusion: Optimizing Multi-Concept Text-to-Image Generation Training-Freely with Isolated Diffusion Guidance | Jingyuan Zhu et.al. | 2403.16954 | null |
| 2024-03-25 | TrustAI at SemEval-2024 Task 8: A Comprehensive Analysis of Multi-domain Machine Generated Text Detection Techniques | Ashok Urlana et.al. | 2403.16592 | null |
| 2024-03-25 | RCBEVDet: Radar-camera Fusion in Bird’s Eye View for 3D Object Detection | Zhiwei Lin et.al. | 2403.16440 | link |
| 2024-03-25 | ASDF: Assembly State Detection Utilizing Late Fusion by Integrating 6D Pose Estimation | Hannah Schieber et.al. | 2403.16400 | link |
| 2024-03-25 | Impact of Video Compression Artifacts on Fisheye Camera Visual Perception Tasks | Madhumitha Sakthi et.al. | 2403.16338 | null |
| 2024-03-24 | Cross-domain Multi-modal Few-shot Object Detection via Rich Text | Zeyu Shangguan et.al. | 2403.16188 | null |
| 2024-03-24 | Semantic Is Enough: Only Semantic Information For NeRF Reconstruction | Ruibo Wang et.al. | 2403.16043 | null |
| 2024-03-23 | Adversarial Defense Teacher for Cross-Domain Object Detection under Poor Visibility Conditions | Kaiwen Wang et.al. | 2403.15786 | null |
| 2024-03-23 | EAGLE: A Domain Generalization Framework for AI-generated Text Detection | Amrita Bhattacharjee et.al. | 2403.15690 | null |
| 2024-03-25 | Point-DETR3D: Leveraging Imagery Data with Spatial Point Prior for Weakly Semi-supervised 3D Object Detection | Hongzhi Gao et.al. | 2403.15317 | null |
| 2024-03-22 | CR3DT: Camera-RADAR Fusion for 3D Detection and Tracking | Nicolas Baumann et.al. | 2403.15313 | link |
| 2024-03-22 | IS-Fusion: Instance-Scene Collaborative Fusion for Multimodal 3D Object Detection | Junbo Yin et.al. | 2403.15241 | null |
| 2024-03-22 | MSCoTDet: Language-driven Multi-modal Fusion for Improved Multispectral Pedestrian Detection | Taeheon Kim et.al. | 2403.15209 | null |
| 2024-03-22 | SFOD: Spiking Fusion Object Detector | Yimeng Fan et.al. | 2403.15192 | link |
| 2024-03-22 | CRPlace: Camera-Radar Fusion with BEV Representation for Place Recognition | Shaowei Fu et.al. | 2403.15183 | null |
| 2024-03-22 | An In-Depth Analysis of Data Reduction Methods for Sustainable Deep Learning | Víctor Toscano-Durán et.al. | 2403.15150 | null |
| 2024-03-22 | Gradient-based Sampling for Class Imbalanced Semi-supervised Object Detection | Jiaming Li et.al. | 2403.15127 | link |
| 2024-03-22 | VRSO: Visual-Centric Reconstruction for Static Object Annotation | Chenyao Yu et.al. | 2403.15026 | null |
| 2024-03-22 | Vehicle Detection Performance in Nordic Region | Hamam Mokayed et.al. | 2403.15017 | null |
| 2024-03-21 | T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy | Qing Jiang et.al. | 2403.14610 | link |
| 2024-03-21 | UAV-Assisted Maritime Search and Rescue: A Holistic Approach | Martin Messmer et.al. | 2403.14281 | null |
| 2024-03-21 | Scene-Graph ViT: End-to-End Open-Vocabulary Visual Relationship Detection | Tim Salzmann et.al. | 2403.14270 | null |
| 2024-03-21 | 3D Object Detection from Point Cloud via Voting Step Diffusion | Haoran Hou et.al. | 2403.14133 | null |
| 2024-03-20 | EcoSense: Energy-Efficient Intelligent Sensing for In-Shore Ship Detection through Edge-Cloud Collaboration | Wenjun Huang et.al. | 2403.14027 | null |
| 2024-03-20 | RAR: Retrieving And Ranking Augmented MLLMs for Visual Recognition | Ziyu Liu et.al. | 2403.13805 | link |
| 2024-03-20 | Bounding Box Stability against Feature Dropout Reflects Detector Generalization across Environments | Yang Yang et.al. | 2403.13803 | link |
| 2024-03-20 | Fostc3net: A Lightweight YOLOv5 Based On the Network Structure Optimization | Danqing Ma et.al. | 2403.13703 | null |
| 2024-03-20 | Find n’ Propagate: Open-Vocabulary 3D Object Detection in Urban Environments | Djamahl Etchegaray et.al. | 2403.13556 | null |
| 2024-03-20 | MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining | Di Wang et.al. | 2403.13430 | link |
| 2024-03-20 | Few-shot Oriented Object Detection with Memorable Contrastive Learning in Remote Sensing Images | Jiawei Zhou et.al. | 2403.13375 | null |
| 2024-03-20 | Adaptive Ensembles of Fine-Tuned Transformers for LLM-Generated Text Detection | Zhixin Lai et.al. | 2403.13335 | null |
| 2024-03-20 | DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception | Yibo Wang et.al. | 2403.13304 | null |
| 2024-03-20 | Facilitating Pornographic Text Detection for Open-Domain Dialogue Systems via Knowledge Distillation of Large Language Models | Huachuan Qiu et.al. | 2403.13250 | null |
| 2024-03-19 | SceneScript: Reconstructing Scenes With An Autoregressive Structured Language Model | Armen Avetisyan et.al. | 2403.13064 | null |
| 2024-03-19 | Wildfire danger prediction optimization with transfer learning | Spiros Maggioros et.al. | 2403.12871 | link |
| 2024-03-19 | As Firm As Their Foundations: Can open-sourced foundation models be used to create adversarial examples for downstream tasks? | Anjun Hu et.al. | 2403.12693 | null |
| 2024-03-19 | EAS-SNN: End-to-End Adaptive Sampling and Representation for Event-based Detection with Recurrent Spiking Neural Networks | Ziming Wang et.al. | 2403.12574 | null |
| 2024-03-19 | DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM | Yixuan Wu et.al. | 2403.12488 | null |
| 2024-03-19 | TransformMix: Learning Transformation and Mixing Strategies from Data | Tsz-Him Cheung et.al. | 2403.12429 | null |
| 2024-03-19 | VisionGPT: LLM-Assisted Real-Time Anomaly Detection for Safe Visual Navigation | Hao Wang et.al. | 2403.12415 | null |
| 2024-03-19 | Entity6K: A Large Open-Domain Evaluation Dataset for Real-World Entity Recognition | Jielin Qiu et.al. | 2403.12339 | null |
| 2024-03-18 | EffiPerception: an Efficient Framework for Various Perception Tasks | Xinhao Xiang et.al. | 2403.12317 | null |
| 2024-03-18 | Prototipo de un Contador Bidireccional Automático de Personas basado en sensores de visión 3D (Prototype of an Automatic Bidirectional People Counter Based on 3D Vision Sensors) | Benjamín Ojeda-Magaña et.al. | 2403.12310 | null |
| 2024-03-18 | Align and Distill: Unifying and Improving Domain Adaptive Object Detection | Justin Kay et.al. | 2403.12029 | link |
| 2024-03-18 | TrajectoryNAS: A Neural Architecture Search for Trajectory Prediction | Ali Asghar Sharifi et.al. | 2403.11695 | null |
| 2024-03-18 | Just Add $100 More: Augmenting NeRF-based Pseudo-LiDAR Point Cloud for Resolving Class-imbalance Problem | Mincheol Chang et.al. | 2403.11573 | null |
| 2024-03-18 | R2SNet: Scalable Domain Adaptation for Object Detection in Cloud-Based Robots Ecosystems via Proposal Refinement | Michele Antonazzi et.al. | 2403.11567 | null |
| 2024-03-18 | Continual Forgetting for Pre-trained Vision Models | Hongbo Zhao et.al. | 2403.11530 | link |
| 2024-03-17 | V2X-DGW: Domain Generalization for Multi-agent Perception under Adverse Weather Conditions | Baolu Li et.al. | 2403.11371 | null |
| 2024-03-17 | Advanced Knowledge Extraction of Physical Design Drawings, Translation and conversion to CAD formats using Deep Learning | Jesher Joshua M et.al. | 2403.11291 | null |
| 2024-03-17 | ManipVQA: Injecting Robotic Affordance and Physically Grounded Information into Multi-Modal Large Language Models | Siyuan Huang et.al. | 2403.11289 | null |
| 2024-03-17 | CPA-Enhancer: Chain-of-Thought Prompted Adaptive Enhancer for Object Detection under Unknown Degradations | Yuwei Zhang et.al. | 2403.11220 | link |
| 2024-03-17 | GRA: Detecting Oriented Objects through Group-wise Rotating and Attention | Jiangshan Wang et.al. | 2403.11127 | null |
| 2024-03-17 | Self-supervised co-salient object detection via feature correspondence at multiple scales | Souradeep Chakraborty et.al. | 2403.11107 | link |
| 2024-03-14 | Open-Vocabulary Object Detection with Meta Prompt Representation and Instance Contrastive Optimization | Zhao Wang et.al. | 2403.09433 | null |
| 2024-03-14 | D3T: Distinctive Dual-Domain Teacher Zigzagging Across RGB-Thermal Gap for Domain-Adaptive Object Detection | Dinh Phat Do et.al. | 2403.09359 | link |
| 2024-03-14 | Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring | Yufei Zhan et.al. | 2403.09333 | link |
| 2024-03-14 | EfficientMFD: Towards More Efficient Multimodal Synchronous Fusion Detection | Jiaqing Zhang et.al. | 2403.09323 | link |
| 2024-03-14 | Knowledge Distillation in YOLOX-ViT for Side-Scan Sonar Object Detection | Martin Aubard et.al. | 2403.09313 | link |
| 2024-03-14 | MOTPose: Multi-object 6D Pose Estimation for Dynamic Video Sequences using Attention-based Temporal Fusion | Arul Selvam Periyasamy et.al. | 2403.09309 | null |
| 2024-03-14 | CLIP-EBC: CLIP Can Count Accurately through Enhanced Blockwise Classification | Yiming Ma et.al. | 2403.09281 | null |
| 2024-03-14 | D-YOLO a robust framework for object detection in adverse weather conditions | Zihan Chu et.al. | 2403.09233 | null |
| 2024-03-14 | Improving Distant 3D Object Detection Using 2D Box Supervision | Zetong Yang et.al. | 2403.09230 | null |
| 2024-03-14 | PoIFusion: Multi-Modal 3D Object Detection via Fusion at Points of Interest | Jiajun Deng et.al. | 2403.09212 | null |
| 2024-03-13 | VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis | Enric Corona et.al. | 2403.08764 | null |
| 2024-03-13 | MIM4D: Masked Modeling with Multi-View Video for Autonomous Driving Representation Learning | Jialv Zou et.al. | 2403.08760 | link |
| 2024-03-13 | Data Augmentation in Human-Centric Vision | Wentao Jiang et.al. | 2403.08650 | null |
| 2024-03-13 | PRAGO: Differentiable Multi-View Pose Optimization From Objectness Detections | Matteo Taiana et.al. | 2403.08586 | null |
| 2024-03-13 | A Multimodal Fusion Network For Student Emotion Recognition Based on Transformer and Tensor Product | Ao Xiang et.al. | 2403.08511 | null |
| 2024-03-13 | Improved YOLOv5 Based on Attention Mechanism and FasterNet for Foreign Object Detection on Railway and Airway tracks | Zongqing Qi et.al. | 2403.08499 | null |
| 2024-03-13 | IAMCV Multi-Scenario Vehicle Interaction Dataset | Novel Certad et.al. | 2403.08455 | null |
| 2024-03-13 | Advancing Security in AI Systems: A Novel Approach to Detecting Backdoors in Deep Neural Networks | Khondoker Murad Hossain et.al. | 2403.08208 | null |
| 2024-03-12 | TaskCLIP: Extend Large Vision-Language Model for Task Oriented Object Detection | Hanning Chen et.al. | 2403.08108 | null |
| 2024-03-12 | Aedes aegypti Egg Counting with Neural Networks for Object Detection | Micheli Nayara de Oliveira Vicente et.al. | 2403.08016 | null |
| 2024-03-12 | Mondrian: On-Device High-Performance Video Analytics with Compressive Packed Inference | Changmin Jeon et.al. | 2403.07598 | null |
| 2024-03-12 | PeLK: Parameter-efficient Large Kernel ConvNets with Peripheral Convolution | Honghao Chen et.al. | 2403.07589 | null |
| 2024-03-12 | A Survey of Vision Transformers in Autonomous Driving: Current Trends and Future Directions | Quoc-Vinh Lai-Dang et.al. | 2403.07542 | null |
| 2024-03-12 | JSTR: Joint Spatio-Temporal Reasoning for Event-based Moving Object Detection | Hanyu Zhou et.al. | 2403.07436 | null |
| 2024-03-12 | Eliminating Cross-modal Conflicts in BEV Space for LiDAR-Camera 3D Object Detection | Jiahui Fu et.al. | 2403.07372 | null |
| 2024-03-12 | GPT-generated Text Detection: Benchmark Dataset and Tensor-based Detection Method | Zubair Qazi et.al. | 2403.07321 | link |
| 2024-03-12 | MENTOR: Multilingual tExt detectioN TOward leaRning by analogy | Hsin-Ju Lin et.al. | 2403.07286 | null |
| 2024-03-12 | SparseLIF: High-Performance Sparse LiDAR-Camera Fusion for 3D Object Detection | Hongcheng Zhang et.al. | 2403.07284 | null |
| 2024-03-12 | Adaptive Bounding Box Uncertainties via Two-Step Conformal Prediction | Alexander Timans et.al. | 2403.07263 | null |
| 2024-03-11 | Class Imbalance in Object Detection: An Experimental Diagnosis and Study of Mitigation Strategies | Nieves Crasto et.al. | 2403.07113 | link |
| 2024-03-11 | Real-time Transformer-based Open-Vocabulary Detection with Efficient Fusion Head | Tiancheng Zhao et.al. | 2403.06892 | null |
| 2024-03-11 | LeOCLR: Leveraging Original Images for Contrastive Learning of Visual Representations | Mohammad Alkhalefi et.al. | 2403.06813 | null |
| 2024-03-11 | Genetic Learning for Designing Sim-to-Real Data Augmentations | Bram Vanherle et.al. | 2403.06786 | null |
| 2024-03-11 | Evaluating the Energy Efficiency of Few-Shot Learning for Object Detection in Industrial Settings | Georgios Tsoumplekas et.al. | 2403.06631 | null |
| 2024-03-11 | Cross-domain and Cross-dimension Learning for Image-to-Graph Transformers | Alexander H. Berger et.al. | 2403.06601 | null |
| 2024-03-11 | SARDet-100K: Towards Open-Source Benchmark and ToolKit for Large-Scale SAR Object Detection | Yuxuan Li et.al. | 2403.06534 | link |
| 2024-03-11 | 3D Semantic Segmentation-Driven Representations for 3D Object Detection | Hayeon O et.al. | 2403.06501 | null |
| 2024-03-11 | Fine-Grained Pillar Feature Encoding Via Spatio-Temporal Virtual Grid for 3D Object Detection | Konyul Park et.al. | 2403.06433 | null |
| 2024-03-10 | Transformer based Multitask Learning for Image Captioning and Object Detection | Debolena Basak et.al. | 2403.06292 | null |
| 2024-03-10 | Poly Kernel Inception Network for Remote Sensing Detection | Xinhao Cai et.al. | 2403.06258 | link |
| 2024-03-08 | EVD4UAV: An Altitude-Sensitive Benchmark to Evade Vehicle Detection in UAV | Huiming Sun et.al. | 2403.05422 | null |
| 2024-03-08 | SIRST-5K: Exploring Massive Negatives Synthesis with Self-supervised Learning for Robust Infrared Small Target Detection | Yahao Lu et.al. | 2403.05416 | link |
| 2024-03-08 | Exploring Robust Features for Few-Shot Object Detection in Satellite Imagery | Xavier Bou et.al. | 2403.05381 | null |
| 2024-03-08 | Frequency-Adaptive Dilated Convolution for Semantic Segmentation | Linwei Chen et.al. | 2403.05369 | link |
| 2024-03-08 | VLM-PL: Advanced Pseudo Labeling approach Class Incremental Object Detection with Vision-Language Model | Junsu Kim et.al. | 2403.05346 | null |
| 2024-03-08 | Improving the Successful Robotic Grasp Detection Using Convolutional Neural Networks | Hamed Hosseini et.al. | 2403.05211 | null |
| 2024-03-08 | LanePtrNet: Revisiting Lane Detection as Point Voting and Grouping on Curves | Jiayan Cao et.al. | 2403.05155 | null |
| 2024-03-08 | RadarDistill: Boosting Radar-based Object Detection Performance via Knowledge Distillation from LiDAR Features | Geonho Bang et.al. | 2403.05061 | null |
| 2024-03-08 | ActFormer: Scalable Collaborative Perception via Active Queries | Suozhi Huang et.al. | 2403.04968 | null |
| 2024-03-07 | FriendNet: Detection-Friendly Dehazing Network | Yihua Fan et.al. | 2403.04443 | null |
| 2024-03-07 | Effectiveness Assessment of Recent Large Vision-Language Models | Yao Jiang et.al. | 2403.04306 | null |
| 2024-03-07 | ACC-ViT: Atrous Convolution’s Comeback in Vision Transformers | Nabil Ibtehaz et.al. | 2403.04200 | null |
| 2024-03-07 | CN-RMA: Combined Network with Ray Marching Aggregation for 3D Indoors Object Detection from Multi-view Images | Guanlin Shen et.al. | 2403.04198 | null |
| 2024-03-07 | Scalable and Robust Transformer Decoders for Interpretable Image Classification with Foundation Models | Evelyn Mannix et.al. | 2403.04125 | null |
| 2024-03-07 | CMDA: Cross-Modal and Domain Adversarial Adaptation for LiDAR-Based 3D Object Detection | Gyusam Chang et.al. | 2403.03721 | null |
| 2024-03-06 | Adversarial Infrared Geometry: Using Geometry to Perform Adversarial Attack against Infrared Pedestrian Detectors | Kalibinuer Tiliwalidi et.al. | 2403.03674 | null |
| 2024-03-06 | Towards Detecting AI-Generated Text within Human-AI Collaborative Hybrid Texts | Zijie Zeng et.al. | 2403.03506 | null |
| 2024-03-06 | Multi-task Learning for Real-time Autonomous Driving Leveraging Task-adaptive Attention Generator | Wonhyeok Choi et.al. | 2403.03468 | null |
| 2024-03-06 | FLAME Diffuser: Grounded Wildfire Image Synthesis using Mask Guided Diffusion | Hao Wang et.al. | 2403.03463 | null |
| 2024-03-06 | Performance Evaluation of Semi-supervised Learning Frameworks for Multi-Class Weed Detection | Jiajia Li et.al. | 2403.03390 | link |
| 2024-03-05 | Detecting Concrete Visual Tokens for Multimodal Machine Translation | Braeden Bowen et.al. | 2403.03075 | null |
| 2024-03-05 | Loss Design for Single-carrier Joint Communication and Neural Network-based Sensing | Charlotte Muth et.al. | 2403.02929 | null |
| 2024-03-05 | Are Dense Labels Always Necessary for 3D Object Detection from Point Cloud? | Chenqiang Gao et.al. | 2403.02818 | null |
| 2024-03-05 | Bootstrapping Rare Object Detection in High-Resolution Satellite Imagery | Akram Zaytar et.al. | 2403.02736 | null |
| 2024-03-05 | FastOcc: Accelerating 3D Occupancy Prediction by Fusing the 2D Bird’s-Eye View and Perspective View | Jiawei Hou et.al. | 2403.02710 | null |
| 2024-03-05 | False Positive Sampling-based Data Augmentation for Enhanced 3D Object Detection Accuracy | Jiyong Oh et.al. | 2403.02639 | null |
| 2024-03-05 | BSDP: Brain-inspired Streaming Dual-level Perturbations for Online Open World Object Detection | Yu Chen et.al. | 2403.02637 | null |
| 2024-03-04 | NiNformer: A Network in Network Transformer with Token Mixing Generated Gating Function | Abdullah Nazhat Abdullah et.al. | 2403.02411 | link |
| 2024-03-04 | COMMIT: Certifying Robustness of Multi-Sensor Fusion Systems against Semantic Attacks | Zijian Huang et.al. | 2403.02329 | null |
| 2024-03-04 | Scalable Vision-Based 3D Object Detection and Monocular Depth Estimation for Autonomous Driving | Yuxuan Liu et.al. | 2403.02037 | link |
| 2024-03-02 | TUMTraf V2X Cooperative Perception Dataset | Walter Zimmer et.al. | 2403.01316 | null |
| 2024-03-02 | Causal Mode Multiplexer: A Novel Framework for Unbiased Multispectral Pedestrian Detection | Taeheon Kim et.al. | 2403.01300 | null |
| 2024-03-02 | Run-time Introspection of 2D Object Detection in Automated Driving Systems Using Learning Representations | Hakan Yekta Yatbaz et.al. | 2403.01172 | null |
| 2024-03-02 | ELA: Efficient Local Attention for Deep Convolutional Neural Networks | Wei Xu et.al. | 2403.01123 | null |
| 2024-03-02 | Face Swap via Diffusion Model | Feifei Wang et.al. | 2403.01108 | link |
| 2024-03-02 | Beyond Night Visibility: Adaptive Multi-Scale Fusion of Infrared and Visible Images | Shufan Pei et.al. | 2403.01083 | null |
| 2024-03-01 | Learning Causal Features for Incremental Object Detection | Zhenwei He et.al. | 2403.00591 | null |
| 2024-03-01 | Abductive Ego-View Accident Video Understanding for Safe Driving Perception | Jianwu Fang et.al. | 2403.00436 | null |
| 2024-03-04 | DAMS-DETR: Dynamic Adaptive Multispectral Detection Transformer with Competitive Query Selection and Adaptive Feature Fusion | Junjie Guo et.al. | 2403.00326 | null |
| 2024-03-01 | ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting | Chen Duan et.al. | 2403.00303 | link |
| 2024-02-29 | SeMoLi: What Moves Together Belongs Together | Jenny Seidenschwarz et.al. | 2402.19463 | null |
| 2024-02-29 | Genie: Smart ROS-based Caching for Connected Autonomous Robots | Zexin Li et.al. | 2402.19410 | null |
| 2024-02-29 | ProtoP-OD: Explainable Object Detection with Prototypical Parts | Pavlos Rath-Manakidis et.al. | 2402.19142 | null |
| 2024-02-29 | Theoretically Achieving Continuous Representation of Oriented Bounding Boxes | Zikai Xiao et.al. | 2402.18975 | link |
| 2024-02-29 | Boosting Semi-Supervised Object Detection in Remote Sensing Images With Active Teaching | Boxuan Zhang et.al. | 2402.18958 | null |
| 2024-02-29 | Edge Computing Enabled Real-Time Video Analysis via Adaptive Spatial-Temporal Semantic Filtering | Xiang Chen et.al. | 2402.18927 | null |
| 2024-02-29 | A Simple yet Effective Network based on Vision Transformer for Camouflaged Object and Salient Object Detection | Chao Hao et.al. | 2402.18922 | null |
| 2024-02-29 | Privacy-Preserving Autoencoder for Collaborative Object Detection | Bardia Azizian et.al. | 2402.18864 | null |
| 2024-02-29 | Debiased Novel Category Discovering and Localization | Juexiao Feng et.al. | 2402.18821 | null |
| 2024-02-28 | Spatial Coherence Loss for Salient and Camouflaged Object Detection and Beyond | Ziyun Yang et.al. | 2402.18698 | null |
| 2024-02-28 | UniMODE: Unified Monocular 3D Object Detection | Zhuoling Li et.al. | 2402.18573 | null |
| 2024-02-28 | Detection of Micromobility Vehicles in Urban Traffic Videos | Khalil Sabri et.al. | 2402.18503 | link |
| 2024-02-28 | Sunshine to Rainstorm: Cross-Weather Knowledge Distillation for Robust 3D Object Detection | Xun Huang et.al. | 2402.18493 | null |
| 2024-02-28 | Prompt-Driven Dynamic Object-Centric Learning for Single Domain Generalization | Deng Li et.al. | 2402.18447 | null |
| 2024-02-28 | Unveiling novel insights into Kirchhoff migration for effective object detection using experimental Fresnel dataset | Won-Kwang Park et.al. | 2402.18322 | null |
| 2024-02-28 | Zero-Shot Aerial Object Detection with Visual Description Regularization | Zhengqing Zang et.al. | 2402.18233 | null |
| 2024-02-28 | VulMCI: Code Splicing-based Pixel-row Oversampling for More Continuous Vulnerability Image Generation | Tao Peng et.al. | 2402.18189 | null |
| 2024-02-27 | SDDGR: Stable Diffusion-based Deep Generative Replay for Class Incremental Object Detection | Junsu Kim et.al. | 2402.17323 | null |
| 2024-02-27 | A Vanilla Multi-Task Framework for Dense Visual Prediction Solution to 1st VCL Challenge – Multi-Task Robustness Track | Zehui Chen et.al. | 2402.17319 | null |
| 2024-02-27 | Probing Multimodal Large Language Models for Global and Local Semantic Representation | Mingxu Tao et.al. | 2402.17304 | null |
Semantic Segmentation
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2025-12-23 | BiCoR-Seg: Bidirectional Co-Refinement Framework for High-Resolution Remote Sensing Image Segmentation | Jinghao Shi et.al. | 2512.20255 | null |
| 2025-12-22 | Retrieving Objects from 3D Scenes with Box-Guided Open-Vocabulary Instance Segmentation | Khanh Nguyen et.al. | 2512.19088 | null |
| 2025-12-22 | ICP-4D: Bridging Iterative Closest Point and LiDAR Panoptic Segmentation | Gyeongrok Oh et.al. | 2512.18991 | null |
| 2025-12-22 | VOIC: Visible-Occluded Decoupling for Monocular 3D Semantic Scene Completion | Zaidao Han et.al. | 2512.18954 | null |
| 2025-12-20 | Multifaceted Exploration of Spatial Openness in Rental Housing: A Big Data Analysis in Tokyo’s 23 Wards | Takuya Oki et.al. | 2512.18226 | null |
| 2025-12-19 | Uncertainty-Gated Region-Level Retrieval for Robust Semantic Segmentation | Shreshth Rajan et.al. | 2512.18082 | null |
| 2025-12-19 | Chorus: Multi-Teacher Pretraining for Holistic 3D Gaussian Scene Encoding | Yue Li et.al. | 2512.17817 | null |
| 2025-12-19 | SAVeD: A First-Person Social Media Video Dataset for ADAS-equipped vehicle Near-Miss and Crash Event Analyses | Shaoyan Zhai et.al. | 2512.17724 | null |
| 2025-12-19 | A 28nm 0.22 μJ/token memory-compute-intensity-aware CNN-Transformer accelerator with hybrid-attention-based layer-fusion and cascaded pruning for semantic segmentation | Pingcheng Dong et.al. | 2512.17555 | null |
| 2025-12-19 | MULTIAQUA: A multimodal maritime dataset and robust training strategies for multimodal semantic segmentation | Jon Muhovič et.al. | 2512.17450 | null |
| 2025-12-19 | AIFloodSense: A Global Aerial Imagery Dataset for Semantic Segmentation and Understanding of Flooded Environments | Georgios Simantiris et.al. | 2512.17432 | null |
| 2025-12-18 | Next-Embedding Prediction Makes Strong Vision Learners | Sihan Xu et.al. | 2512.16922 | null |
| 2025-12-18 | Task-Oriented Data Synthesis and Control-Rectify Sampling for Remote Sensing Semantic Segmentation | Yunkai Yang et.al. | 2512.16740 | null |
| 2025-12-18 | Causal-Tune: Mining Causal Factors from Vision Foundation Models for Domain Generalized Semantic Segmentation | Yin Zhang et.al. | 2512.16567 | null |
| 2025-12-18 | PixelArena: A benchmark for Pixel-Precision Visual Intelligence | Feng Liang et.al. | 2512.16303 | null |
| 2025-12-17 | In Pursuit of Pixel Supervision for Visual Pre-training | Lihe Yang et.al. | 2512.15715 | null |
| 2025-12-17 | MoonSeg3R: Monocular Online Zero-Shot Segment Anything in 3D with Reconstructive Foundation Priors | Zhipeng Du et.al. | 2512.15577 | null |
| 2025-12-17 | SemanticBridge - A Dataset for 3D Semantic Segmentation of Bridges and Domain Gap Analysis | Maximilian Kellner et.al. | 2512.15369 | null |
| 2025-12-17 | Vision-based module for accurately reading linear scales in a laboratory | Parvesh Saini et.al. | 2512.15327 | null |
| 2025-12-17 | SynthSeg-Agents: Multi-Agent Synthetic Data Generation for Zero-Shot Weakly Supervised Semantic Segmentation | Wangyu Wu et.al. | 2512.15310 | null |
| 2025-12-16 | Segmental Attention Decoding With Long Form Acoustic Encodings | Pawel Swietojanski et.al. | 2512.14652 | null |
| 2025-12-16 | S2D: Sparse-To-Dense Keymask Distillation for Unsupervised Video Instance Segmentation | Leon Sick et.al. | 2512.14440 | null |
| 2025-12-16 | DriverGaze360: OmniDirectional Driver Attention with Object-Level Guidance | Shreedhar Govil et.al. | 2512.14266 | null |
| 2025-12-16 | Consistent Instance Field for Dynamic Scene Understanding | Junyi Wu et.al. | 2512.14126 | null |
| 2025-12-16 | ChartAgent: A Chart Understanding Framework with Tool Integrated Reasoning | Boran Wang et.al. | 2512.14040 | null |
| 2025-12-16 | Deep Learning Perspective of Scene Understanding in Autonomous Robots | Afia Maham et.al. | 2512.14020 | null |
| 2025-12-15 | Seeing the Whole Picture: Distribution-Guided Data-Free Distillation for Semantic Segmentation | Hongxuan Sun et.al. | 2512.13175 | null |
| 2025-12-15 | JoDiffusion: Jointly Diffusing Image with Pixel-Level Annotations for Semantic Segmentation Promotion | Haoyu Wang et.al. | 2512.13014 | null |
| 2025-12-15 | TWLR: Text-Guided Weakly-Supervised Lesion Localization and Severity Regression for Explainable Diabetic Retinopathy Grading | Xi Luo et.al. | 2512.13008 | null |
| 2025-12-13 | OMUDA: Omni-level Masking for Unsupervised Domain Adaptation in Semantic Segmentation | Yang Ou et.al. | 2512.12303 | null |
| 2025-12-12 | Enhancing deep learning performance on burned area delineation from SPOT-6/7 imagery for emergency management | Maria Rodriguez et.al. | 2512.12056 | null |
| 2025-12-09 | Generalization vs. Specialization: Evaluating Segment Anything Model (SAM3) Zero-Shot Segmentation Against Fine-Tuned YOLO Detectors | Ranjan Sapkota et.al. | 2512.11884 | null |
| 2025-12-07 | Pseudo-Label Refinement for Robust Wheat Head Segmentation via Two-Stage Hybrid Training | Jiahao Jiang et.al. | 2512.11874 | null |
| 2025-12-12 | Referring Change Detection in Remote Sensing Imagery | Yilmaz Korkmaz et.al. | 2512.11719 | null |
| 2025-12-12 | DOS: Distilling Observable Softmaps of Zipfian Prototypes for Self-Supervised Point Representation | Mohamed Abdelsamad et.al. | 2512.11465 | null |
| 2025-12-12 | Out-of-Distribution Segmentation via Wasserstein-Based Evidential Uncertainty | Arnold Brosch et.al. | 2512.11373 | null |
| 2025-12-12 | VFMF: World Modeling by Forecasting Vision Foundation Model Features | Gabrijel Boduljak et.al. | 2512.11225 | null |
| 2025-12-11 | Take a Peek: Efficient Encoder Adaptation for Few-Shot Semantic Segmentation via LoRA | Pasquale De Marinis et.al. | 2512.10521 | null |
| 2025-12-11 | Hybrid Transformer-Mamba Architecture for Weakly Supervised Volumetric Medical Segmentation | Yiheng Lyu et.al. | 2512.10353 | null |
| 2025-12-11 | ConStruct: Structural Distillation of Foundation Models for Prototype-Based Weakly Supervised Histopathology Segmentation | Khang Le et.al. | 2512.10316 | null |
| 2025-12-11 | DualProtoSeg: Simple and Efficient Design with Text- and Image-Guided Prototype Learning for Weakly Supervised Histopathology Image Segmentation | Anh M. Vu et.al. | 2512.10314 | null |
| 2025-12-10 | NordFKB: a fine-grained benchmark dataset for geospatial AI in Norway | Sander Riisøen Jyhne et.al. | 2512.09913 | null |
| 2025-12-10 | ASSIST-3D: Adapted Scene Synthesis for Class-Agnostic 3D Instance Segmentation | Shengchao Zhou et.al. | 2512.09364 | null |
| 2025-12-10 | ROI-Packing: Efficient Region-Based Compression for Machine Vision | Md Eimran Hossain Eimon et.al. | 2512.09258 | null |
| 2025-12-09 | SIP: Site in Pieces - A Dataset of Disaggregated Construction-Phase 3D Scans for Semantic Segmentation and Scene Understanding | Seongyong Kim et.al. | 2512.09062 | null |
| 2025-12-09 | Persistent Homology for Labeled Datasets: Gromov-Hausdorff Stability and Generalized Landscapes | Yaoying Fu et.al. | 2512.08794 | null |
| 2025-12-09 | SegEarth-OV3: Exploring SAM 3 for Open-Vocabulary Semantic Segmentation in Remote Sensing Images | Kaiyu Li et.al. | 2512.08730 | null |
| 2025-12-09 | Instance-Aware Test-Time Segmentation for Continual Domain Shifts | Seunghwan Lee et.al. | 2512.08569 | null |
| 2025-12-09 | Query-aware Hub Prototype Learning for Few-Shot 3D Point Cloud Semantic Segmentation | YiLin Zhou et.al. | 2512.08253 | null |
| 2025-12-08 | Restrictive Hierarchical Semantic Segmentation for Stratified Tooth Layer Detection | Ryan Banks et.al. | 2512.07984 | null |
| 2025-12-08 | Structure-Aware Feature Rectification with Region Adjacency Graphs for Training-Free Open-Vocabulary Semantic Segmentation | Qiming Huang et.al. | 2512.07360 | null |
| 2025-12-08 | Generalized Referring Expression Segmentation on Aerial Photos | Luís Marnoto et.al. | 2512.07338 | null |
| 2025-12-08 | A graph generation pipeline for critical infrastructures based on heuristics, images and depth data | Mike Diessner et.al. | 2512.07269 | null |
| 2025-12-07 | Power of Boundary and Reflection: Semantic Transparent Object Segmentation using Pyramid Vision Transformer with Transparent Cues | Tuan-Anh Vu et.al. | 2512.07034 | null |
| 2025-12-07 | Selective Masking based Self-Supervised Learning for Image Semantic Segmentation | Yuemin Wang et.al. | 2512.06981 | null |
| 2025-12-07 | Balanced Learning for Domain Adaptive Semantic Segmentation | Wangkai Li et.al. | 2512.06886 | null |
| 2025-12-07 | Hierarchical Image-Guided 3D Point Cloud Segmentation in Industrial Scenes via Multi-View Bayesian Fusion | Yu Zhu et.al. | 2512.06882 | null |
| 2025-12-07 | Towards Robust Pseudo-Label Learning in Semantic Segmentation: An Encoding Perspective | Wangkai Li et.al. | 2512.06870 | null |
| 2025-12-07 | Boosting Unsupervised Video Instance Segmentation with Automatic Quality-Guided Self-Training | Kaixuan Lu et.al. | 2512.06864 | null |
| 2025-12-07 | FedDSR: Federated Deep Supervision and Regularization Towards Autonomous Driving | Wei-Bin Kou et.al. | 2512.06676 | null |
| 2025-12-07 | Statistic-Augmented, Decoupled MoE Routing and Aggregating in Autonomous Driving | Wei-Bin Kou et.al. | 2512.06664 | null |
| 2025-12-07 | CoT4Det: A Chain-of-Thought Framework for Perception-Oriented Vision-Language Tasks | Yu Qi et.al. | 2512.06663 | null |
| 2025-12-06 | Are AI-Generated Driving Videos Ready for Autonomous Driving? A Diagnostic Evaluation Framework | Xinhao Xiang et.al. | 2512.06376 | null |
| 2025-12-03 | Fast and Flexible Robustness Certificates for Semantic Segmentation | Thomas Massena et.al. | 2512.06010 | null |
| 2025-11-30 | Stronger is not better: Better Augmentations in Contrastive Learning for Medical Image Segmentation | Azeez Idris et.al. | 2512.05992 | null |
| 2025-12-05 | LPD: Learnable Prototypes with Diversity Regularization for Weakly Supervised Histopathology Segmentation | Khang Le et.al. | 2512.05922 | null |
| 2025-12-05 | Label-Efficient Point Cloud Segmentation with Active Learning | Johannes Meyer et.al. | 2512.05759 | null |
| 2025-12-05 | DistillFSS: Synthesizing Few-Shot Knowledge into a Lightweight Segmentation Model | Pasquale De Marinis et.al. | 2512.05613 | null |
| 2025-12-01 | FlowEO: Generative Unsupervised Domain Adaptation for Earth Observation | Georges Le Bellier et.al. | 2512.05140 | null |
| 2025-12-04 | GeoPE: A Unified Geometric Positional Embedding for Structured Tensors | Yupu Yao et.al. | 2512.04963 | null |
| 2025-12-04 | MT-Depth: Multi-task Instance feature analysis for the Depth Completion | Abdul Haseeb Nizamani et.al. | 2512.04734 | null |
| 2025-12-04 | DuGI-MAE: Improving Infrared Mask Autoencoders via Dual-Domain Guidance | Yinghui Xing et.al. | 2512.04511 | null |
| 2025-12-03 | A Novel Approach to Tomato Harvesting Using a Hybrid Gripper with Semantic Segmentation and Keypoint Detection | Shahid Ansari et.al. | 2512.03684 | null |
| 2025-12-03 | OpenTrack3D: Towards Accurate and Generalizable Open-Vocabulary 3D Instance Segmentation | Zhishan Zhou et.al. | 2512.03532 | null |
| 2025-12-03 | Exploiting Domain Properties in Language-Driven Domain Generalization for Semantic Segmentation | Seogkyu Jeon et.al. | 2512.03508 | null |
| 2025-12-02 | Instant Video Models: Universal Adapters for Stabilizing Image-Based Networks | Matthew Dutson et.al. | 2512.03014 | null |
| 2025-12-02 | Enhancing Floor Plan Recognition: A Hybrid Mix-Transformer and U-Net Approach for Precise Wall Segmentation | Dmitriy Parashchuk et.al. | 2512.02413 | null |
| 2025-12-02 | Reproducing and Extending RaDelft 4D Radar with Camera-Assisted Labels | Kejia Hu et.al. | 2512.02394 | null |
| 2025-12-02 | SAGE: Style-Adaptive Generalization for Privacy-Constrained Semantic Segmentation Across Domains | Qingmei Li et.al. | 2512.02369 | null |
| 2025-12-01 | Multifractal Recalibration of Neural Networks for Medical Imaging Segmentation | Miguel L. Martins et.al. | 2512.02198 | null |
| 2025-12-01 | Evaluating SAM2 for Video Semantic Segmentation | Syed Hesham Syed Ariff et.al. | 2512.01774 | null |
| 2025-12-01 | SSR: Semantic and Spatial Rectification for CLIP-based Weakly Supervised Segmentation | Xiuli Bi et.al. | 2512.01701 | null |
| 2025-12-01 | ViT $^3$: Unlocking Test-Time Training in Vision | Dongchen Han et.al. | 2512.01643 | null |
| 2025-12-01 | Toward Content-based Indexing and Retrieval of Head and Neck CT with Abscess Segmentation | Thao Thi Phuong Dao et.al. | 2512.01589 | null |
| 2025-12-01 | ELVIS: Enhance Low-Light for Video Instance Segmentation in the Dark | Joanne Lin et.al. | 2512.01495 | null |
| 2025-12-01 | Panda: Self-distillation of Reusable Sensor-level Representations for High Energy Physics | Samuel Young et.al. | 2512.01324 | null |
| 2025-12-01 | TabletopGen: Instance-Level Interactive 3D Tabletop Scene Generation from Text or Single Image | Ziqian Wang et.al. | 2512.01204 | null |
| 2025-11-30 | Binary-Gaussian: Compact and Progressive Representation for 3D Gaussian Segmentation | An Yang et.al. | 2512.00944 | null |
| 2025-11-30 | The Outline of Deception: Physical Adversarial Attacks on Traffic Signs Using Edge Patches | Haojie Ji et.al. | 2512.00765 | null |
| 2025-11-30 | VFM-ISRefiner: Towards Better Adapting Vision Foundation Models for Interactive Segmentation of Remote Sensing Images | Deliang Wang et.al. | 2512.00718 | null |
| 2025-11-29 | Doppler-Enhanced Deep Learning: Improving Thyroid Nodule Segmentation with YOLOv5 Instance Segmentation | Mahmoud El Hussieni et.al. | 2512.00639 | null |
| 2025-11-29 | EZ-SP: Fast and Lightweight Superpoint-Based 3D Segmentation | Louis Geist et.al. | 2512.00385 | null |
| 2025-11-29 | Breaking It Down: Domain-Aware Semantic Segmentation for Retrieval Augmented Generation | Aparajitha Allamraju et.al. | 2512.00367 | null |
| 2025-11-29 | Towards aligned body representations in vision models | Andrey Gizdov et.al. | 2512.00365 | null |
| 2025-11-28 | Learning to Predict Aboveground Biomass from RGB Images with 3D Synthetic Scenes | Silvia Zuffi et.al. | 2511.23249 | null |
| 2025-11-28 | Taming the Light: Illumination-Invariant Semantic 3DGS-SLAM | Shouhe Zhang et.al. | 2511.22968 | null |
| 2025-11-28 | Do We Need Perfect Data? Leveraging Noise for Domain Generalized Segmentation | Taeyeong Kim et.al. | 2511.22948 | null |
| 2025-11-27 | GazeTrack: High-Precision Eye Tracking Based on Regularization and Spatial Computing | Xiaoyin Yang et.al. | 2511.22607 | null |
| 2025-11-27 | 3D Affordance Keypoint Detection for Robotic Manipulation | Zhiyang Liu et.al. | 2511.22195 | null |
| 2025-11-26 | OpenTwinMap: An Open-Source Digital Twin Generator for Urban Autonomous Driving | Alex Richardson et.al. | 2511.21925 | null |
| 2025-11-26 | ReSAM: Refine, Requery, and Reinforce: Self-Prompting Point-Supervised Segmentation for Remote Sensing Images | M. Naseer Subhani et.al. | 2511.21606 | null |
| 2025-11-26 | Shift-Equivariant Complex-Valued Convolutional Neural Networks | Quentin Gabot et.al. | 2511.21250 | null |
| 2025-11-25 | Open Vocabulary Compositional Explanations for Neuron Alignment | Biagio La Rosa et.al. | 2511.20931 | null |
| 2025-11-25 | Automated Monitoring of Cultural Heritage Artifacts Using Semantic Segmentation | Andrea Ranieri et.al. | 2511.20541 | null |
| 2025-11-25 | CrossEarth-Gate: Fisher-Guided Adaptive Tuning Engine for Efficient Adaptation of Cross-Domain Remote Sensing Semantic Segmentation | Shilei Cao et.al. | 2511.20302 | null |
| 2025-11-25 | SAM-MI: A Mask-Injected Framework for Enhancing Open-Vocabulary Semantic Segmentation with SAM | Lin Chen et.al. | 2511.20027 | null |
| 2025-11-25 | Supervise Less, See More: Training-free Nuclear Instance Segmentation with Prototype-Guided Prompting | Wen Zhang et.al. | 2511.19953 | null |
| 2025-11-24 | Lightweight Transformer Framework for Weakly Supervised Semantic Segmentation | Ali Torabi et.al. | 2511.19765 | null |
| 2025-11-24 | RADSeg: Unleashing Parameter and Compute Efficient Zero-Shot Open-Vocabulary Segmentation Using Agglomerative Models | Omar Alama et.al. | 2511.19704 | null |
| 2025-11-24 | Studying Maps at Scale: A Digital Investigation of Cartography and the Evolution of Figuration | Remi Petitpierre et.al. | 2511.19538 | null |
| 2025-11-24 | BackSplit: The Importance of Sub-dividing the Background in Biomedical Lesion Segmentation | Rachit Saluja et.al. | 2511.19394 | null |
| 2025-11-24 | nnActive: A Framework for Evaluation of Active Learning in 3D Biomedical Segmentation | Carsten T. Lüth et.al. | 2511.19183 | null |
| 2025-11-24 | DiffSeg30k: A Multi-Turn Diffusion Editing Benchmark for Localized AIGC Detection | Hai Ci et.al. | 2511.19111 | null |
| 2025-11-24 | SupLID: Geometrical Guidance for Out-of-Distribution Detection in Semantic Segmentation | Nimeshika Udayangani et.al. | 2511.18816 | null |
| 2025-11-24 | PartDiffuser: Part-wise 3D Mesh Generation via Discrete Diffusion | Yichen Yang et.al. | 2511.18801 | null |
| 2025-11-23 | SegSplat: Feed-forward Gaussian Splatting and Open-Set Semantic Segmentation | Peter Siegel et.al. | 2511.18386 | null |
| 2025-11-23 | UniFlow: Towards Zero-Shot LiDAR Scene Flow for Autonomous Vehicles via Cross-Domain Generalization | Siyi Li et.al. | 2511.18254 | null |
| 2025-11-22 | Matching-Based Few-Shot Semantic Segmentation Models Are Interpretable by Design | Pasquale De Marinis et.al. | 2511.18163 | link |
| 2025-11-22 | AdaPerceiver: Transformers with Adaptive Width, Depth, and Tokens | Purvish Jajal et.al. | 2511.18105 | null |
| 2025-11-18 | HSMix: Hard and Soft Mixing Data Augmentation for Medical Image Segmentation | Danyang Sun et.al. | 2511.17614 | null |
| 2025-11-21 | Improving Multimodal Distillation for 3D Semantic Segmentation under Domain Shift | Björn Michele et.al. | 2511.17455 | null |
| 2025-11-21 | REMSA: An LLM Agent for Foundation Model Selection in Remote Sensing | Binger Chen et.al. | 2511.17442 | null |
| 2025-11-21 | FisheyeGaussianLift: BEV Feature Lifting for Surround-View Fisheye Camera Perception | Shubham Sonarghare et.al. | 2511.17210 | null |
| 2025-11-20 | Late-decoupled 3D Hierarchical Semantic Segmentation with Semantic Prototype Discrimination based Bi-branch Supervision | Shuyu Cao et.al. | 2511.16650 | null |
| 2025-11-20 | Upsample Anything: A Simple and Hard to Beat Baseline for Feature Upsampling | Minseok Seo et.al. | 2511.16301 | link |
| 2025-11-20 | Target Refocusing via Attention Redistribution for Open-Vocabulary Semantic Segmentation: An Explainability Perspective | Jiahao Li et.al. | 2511.16170 | null |
| 2025-11-20 | InfoCLIP: Bridging Vision-Language Pretraining and Open-Vocabulary Semantic Segmentation via Information-Theoretic Alignment Transfer | Muyao Yuan et.al. | 2511.15967 | null |
| 2025-11-19 | Automatic Uncertainty-Aware Synthetic Data Bootstrapping for Historical Map Segmentation | Lukas Arzoumanidis et.al. | 2511.15875 | null |
| 2025-11-19 | GEO-Bench-2: From Performance to Capability, Rethinking Evaluation in Geospatial AI | Naomi Simumba et.al. | 2511.15658 | null |
| 2025-11-19 | Multi-Text Guided Few-Shot Semantic Segmentation | Qiang Jiao et.al. | 2511.15515 | null |
| 2025-11-19 | WarNav: An Autonomous Driving Benchmark for Segmentation of Navigable Zones in War Scenes | Marc-Emmanuel Coupvent des Graviers et.al. | 2511.15429 | null |
| 2025-11-19 | Controlling False Positives in Image Segmentation via Conformal Prediction | Luca Mossina et.al. | 2511.15406 | null |
| 2025-11-18 | EGSA-PT: Edge-Guided Spatial Attention with Progressive Training for Monocular Depth Estimation and Segmentation of Transparent Objects | Gbenga Omotara et.al. | 2511.14970 | null |
| 2025-11-18 | FarSLIP: Discovering Effective CLIP Adaptation for Fine-Grained Remote Sensing Understanding | Zhenshi Li et.al. | 2511.14901 | link |
| 2025-11-18 | Segmentation-Aware Latent Diffusion for Satellite Image Super-Resolution: Enabling Smallholder Farm Boundary Delineation | Aditi Agarwal et.al. | 2511.14481 | null |
| 2025-11-18 | Step by Step Network | Dongchen Han et.al. | 2511.14329 | null |
| 2025-11-18 | Orion: A Unified Visual Agent for Multimodal Perception, Advanced Visual Reasoning and Execution | N Dinesh Reddy et.al. | 2511.14210 | null |
| 2025-11-17 | Training-Free Multi-View Extension of IC-Light for Textual Position-Aware Scene Relighting | Jiangnan Ye et.al. | 2511.13684 | null |
| 2025-11-17 | Mapping the Vanishing and Transformation of Urban Villages in China | Wenyu Zhang et.al. | 2511.13507 | null |
| 2025-11-17 | Delineate Anything Flow: Fast, Country-Level Field Boundary Detection from Any Source | Mykola Lavreniuk et.al. | 2511.13417 | null |
| 2025-11-17 | DiffPixelFormer: Differential Pixel-Aware Transformer for RGB-D Indoor Scene Segmentation | Yan Gong et.al. | 2511.13047 | null |
| 2025-11-15 | FaNe: Towards Fine-Grained Cross-Modal Contrast with False-Negative Reduction and Text-Conditioned Sparse Attention | Peng Zhang et.al. | 2511.12215 | null |
| 2025-11-15 | Evaluation of Attention Mechanisms in U-Net Architectures for Semantic Segmentation of Brazilian Rock Art Petroglyphs | Leonardi Melo et.al. | 2511.11959 | null |
| 2025-11-14 | Chain-of-Generation: Progressive Latent Diffusion for Text-Guided Molecular Design | Lingxiao Li et.al. | 2511.11894 | null |
| 2025-11-14 | Advancing Annotat3D with Harpia: A CUDA-Accelerated Library For Large-Scale Volumetric Data Segmentation | Camila Machado de Araujo et.al. | 2511.11890 | null |
| 2025-11-13 | AdaptFly: Prompt-Guided Adaptation of Foundation Models for Low-Altitude UAV Networks | Jiao Chen et.al. | 2511.11720 | null |
| 2025-11-14 | Terrain Costmap Generation via Scaled Preference Conditioning | Luisa Mao et.al. | 2511.11529 | null |
| 2025-11-13 | Histology-informed tiling of whole tissue sections improves the interpretability and predictability of cancer relapse and genetic alterations | Willem Bonnaffé et.al. | 2511.10432 | null |
| 2025-11-13 | Domain Adaptation for Camera-Specific Image Characteristics using Shallow Discriminators | Maximiliane Gruber et.al. | 2511.10424 | null |
| 2025-11-13 | DBGroup: Dual-Branch Point Grouping for Weakly Supervised 3D Semantic Instance Segmentation | Xuexun Liu et.al. | 2511.10003 | null |
| 2025-11-04 | Do Street View Imagery and Public Participation GIS align: Comparative Analysis of Urban Attractiveness | Milad Malekzadeh et.al. | 2511.05570 | null |
| 2025-11-03 | Compressing Multi-Task Model for Autonomous Driving via Pruning and Knowledge Distillation | Jiayuan Wang et.al. | 2511.05557 | null |
| 2025-11-06 | An Active Learning Pipeline for Biomedical Image Instance Segmentation with Minimal Human Intervention | Shuo Zhao et.al. | 2511.04811 | null |
| 2025-11-06 | Cambrian-S: Towards Spatial Supersensing in Video | Shusheng Yang et.al. | 2511.04670 | null |
| 2025-11-06 | Vitessce Link: A Mixed Reality and 2D Display Hybrid Approach for Visual Analysis of 3D Tissue Maps | Eric Mörth et.al. | 2511.04262 | null |
| 2025-11-06 | CaRF: Enhancing Multi-View Consistency in Referring 3D Gaussian Splatting Segmentation | Yuwen Tao et.al. | 2511.03992 | null |
| 2025-11-05 | Laugh, Relate, Engage: Stylized Comment Generation for Short Videos | Xuan Ouyang et.al. | 2511.03757 | null |
| 2025-11-05 | Computational Imaging Meets LLMs: Zero-Shot IDH Mutation Prediction in Brain Gliomas | Syed Muqeem Mahmood et.al. | 2511.03376 | null |
| 2025-11-05 | Enhancing Medical Image Segmentation via Heat Conduction Equation | Rong Wu et.al. | 2511.03260 | null |
| 2025-11-05 | Diffusion-Guided Mask-Consistent Paired Mixing for Endoscopic Image Segmentation | Pengyu Jie et.al. | 2511.03219 | null |
| 2025-11-05 | Subsampled Randomized Fourier GaLore for Adapting Foundation Models in Depth-Driven Liver Landmark Segmentation | Yun-Chen Lin et.al. | 2511.03163 | null |
| 2025-11-05 | Accelerating Physical Property Reasoning for Augmented Visual Cognition | Hongbo Lan et.al. | 2511.03126 | null |
| 2025-11-04 | Learning with less: label-efficient land cover classification at very high spatial resolution using self-supervised deep learning | Dakota Hester et.al. | 2511.03004 | null |
| 2025-11-04 | Comprehensive Assessment of LiDAR Evaluation Metrics: A Comparative Study Using Simulated and Real Data | Syed Mostaquim Ali et.al. | 2511.02994 | null |
| 2025-11-04 | Digital Twin-Driven Pavement Health Monitoring and Maintenance Optimization Using Graph Neural Networks | Mohsin Mahmud Topu et.al. | 2511.02957 | null |
| 2025-11-04 | Optimizing the nnU-Net model for brain tumor (Glioma) segmentation Using a BraTS Sub-Saharan Africa (SSA) dataset | Chukwuemeka Arua Kalu et.al. | 2511.02893 | null |
| 2025-11-02 | Digitizing Spermatogenesis Lineage at Nanoscale Resolution In Tissue-Level Electron Microscopy | Li Xiao et.al. | 2511.02860 | null |
| 2025-11-04 | Densemarks: Learning Canonical Embeddings for Human Heads Images via Point Tracks | Dmitrii Pozdeev et.al. | 2511.02830 | null |
| 2025-11-04 | PercHead: Perceptual Head Model for Single-Image 3D Head Reconstruction & Editing | Antonio Oroz et.al. | 2511.02777 | null |
| 2025-11-04 | Resource-efficient Automatic Refinement of Segmentations via Weak Supervision from Light Feedback | Alix de Langlais et.al. | 2511.02576 | null |
| 2025-11-04 | ESA: Energy-Based Shot Assembly Optimization for Automatic Video Editing | Yaosen Chen et.al. | 2511.02505 | null |
| 2025-11-04 | Synthetic Crop-Weed Image Generation and its Impact on Model Generalization | Garen Boyadjian et.al. | 2511.02417 | null |
| 2025-11-04 | Revisiting put-that-there, context aware window interactions via LLMs | Riccardo Bovo et.al. | 2511.02378 | null |
| 2025-11-04 | From Instance Segmentation to 3D Growth Trajectory Reconstruction in Planktonic Foraminifera | Huahua Lin et.al. | 2511.02142 | null |
| 2025-11-03 | Terrain-Enhanced Resolution-aware Refinement Attention for Off-Road Segmentation | Seongkyu Choi et.al. | 2511.01434 | null |
| 2025-11-03 | MIQ-SAM3D: From Single-Point Prompt to Multi-Instance Segmentation via Competitive Query Refinement | Jierui Qu et.al. | 2511.01345 | null |
| 2025-11-03 | Source-Only Cross-Weather LiDAR via Geometry-Aware Point Drop | YoungJae Cheong et.al. | 2511.01250 | null |
| 2025-11-03 | CenterMamba-SAM: Center-Prioritized Scanning and Temporal Prototypes for Brain Lesion Segmentation | Yu Tian et.al. | 2511.01243 | null |
| 2025-11-03 | An Enhanced Proprioceptive Method for Soft Robots Integrating Bend Sensors and IMUs | Dong Heon Han et.al. | 2511.01165 | null |
| 2025-11-03 | MicroAUNet: Boundary-Enhanced Multi-scale Fusion with Knowledge Distillation for Colonoscopy Polyp Image Segmentation | Ziyi Wang et.al. | 2511.01143 | null |
| 2025-11-02 | URDF-Anything: Constructing Articulated Objects with 3D Multimodal Language Model | Zhe Li et.al. | 2511.00940 | null |
| 2025-11-02 | TA-LSDiff: Topology-Aware Diffusion Guided by a Level Set Energy for Pancreas Segmentation | Yue Gou et.al. | 2511.00815 | null |
| 2025-11-02 | Rhythm in the Air: Vision-based Real-Time Music Generation through Gestures | Barathi Subramanian et.al. | 2511.00793 | null |
| 2025-11-02 | Class-agnostic 3D Segmentation by Granularity-Consistent Automatic 2D Mask Tracking | Juan Wang et.al. | 2511.00785 | null |
| 2025-11-01 | Grounding Surgical Action Triplets with Instrument Instance Segmentation: A Dataset and Target-Aware Fusion Approach | Oluwatosin Alabi et.al. | 2511.00643 | null |
| 2025-11-01 | Text-guided Fine-Grained Video Anomaly Detection | Jihao Gu et.al. | 2511.00524 | null |
| 2025-11-01 | Optimization of continuous-flow over traffic networks with fundamental diagram constraints | Anqi Dong et.al. | 2511.00500 | null |
| 2025-11-01 | HumanCrafter: Synergizing Generalizable Human Reconstruction and Semantic 3D Segmentation | Panwang Pan et.al. | 2511.00468 | null |
| 2025-11-01 | Tree Training: Accelerating Agentic LLMs Training via Shared Prefix Reuse | Shaojie Wang et.al. | 2511.00413 | null |
| 2025-10-31 | Predicting the spatial distribution and demographics of commercial swine farms in the United States | Felipe E. Sanchez et.al. | 2511.00132 | null |
| 2025-10-29 | Habitat and Land Cover Change Detection in Alpine Protected Areas: A Comparison of AI Architectures | Harald Kristen et.al. | 2511.00073 | null |
| 2025-10-31 | VessShape: Few-shot 2D blood vessel segmentation by leveraging shape priors from synthetic images | Cesar H. Comin et.al. | 2510.27646 | null |
| 2025-10-31 | Context-Gated Cross-Modal Perception with Visual Mamba for PET-CT Lung Tumor Segmentation | Elena Mulero Ayllón et.al. | 2510.27508 | null |
| 2025-10-31 | Mask-to-Height: A YOLOv11-Based Architecture for Joint Building Instance Segmentation and Height Classification from Satellite Imagery | Mahmoud El Hussieni et.al. | 2510.27224 | null |
| 2025-10-31 | SpecAware: A Spectral-Content Aware Foundation Model for Unifying Multi-Sensor Learning in Hyperspectral Remote Sensing Mapping | Renjie Ji et.al. | 2510.27219 | null |
| 2025-10-31 | MLPerf Automotive | Radoyeh Shojaei et.al. | 2510.27065 | null |
| 2025-10-30 | AD-SAM: Fine-Tuning the Segment Anything Vision Foundation Model for Autonomous Driving Perception | Mario Camarena et.al. | 2510.27047 | null |
| 2025-10-30 | Photometric Redshifts in JWST Deep Fields: A Pixel-Based Alternative with DeepDISC | Grant Merz et.al. | 2510.27032 | null |
| 2025-10-30 | Surpassing state of the art on AMD area estimation from RGB fundus images through careful selection of U-Net architectures and loss functions for class imbalance | Valentyna Starodub et.al. | 2510.26778 | null |
| 2025-10-30 | Revisiting Generative Infrared and Visible Image Fusion Based on Human Cognitive Laws | Lin Guo et.al. | 2510.26268 | null |
| 2025-10-29 | BikeScenes: Online LiDAR Semantic Segmentation for Bicycles | Denniz Goren et.al. | 2510.25901 | null |
| 2025-10-29 | StreamingCoT: A Dataset for Temporal Dynamics and Multimodal Chain-of-Thought Reasoning in Streaming VideoQA | Yuhang Hu et.al. | 2510.25332 | null |
| 2025-10-29 | LangHOPS: Language Grounded Hierarchical Open-Vocabulary Part Segmentation | Yang Miao et.al. | 2510.25263 | null |
| 2025-10-29 | Mapping and Classification of Trees Outside Forests using Deep Learning | Moritz Lucas et.al. | 2510.25239 | null |
| 2025-10-29 | Classifier Enhancement Using Extended Context and Domain Experts for Semantic Segmentation | Huadong Tang et.al. | 2510.25174 | null |
| 2025-10-29 | EA3D: Online Open-World 3D Object Extraction from Streaming Videos | Xiaoyu Zhou et.al. | 2510.25146 | null |
| 2025-10-29 | Region-CAM: Towards Accurate Object Regions in Class Activation Maps for Weakly Supervised Learning Tasks | Qingdong Cai et.al. | 2510.25134 | null |
| 2025-10-28 | A Critical Study towards the Detection of Parkinsons Disease using ML Technologies | Vivek Chetia et.al. | 2510.24456 | null |
| 2025-10-28 | A Quantitative Evaluation Framework for Explainable AI in Semantic Segmentation | Reem Hammoud et.al. | 2510.24414 | null |
| 2025-10-27 | Improving Visual Discriminability of CLIP for Training-Free Open-Vocabulary Semantic Segmentation | Jinxin Zhou et.al. | 2510.23894 | null |
| 2025-10-27 | DPGLA: Bridging the Gap between Synthetic and Real Data for Unsupervised Domain Adaptation in 3D LiDAR Semantic Segmentation | Wanmeng Li et.al. | 2510.23525 | null |
| 2025-10-27 | One-Timestep is Enough: Achieving High-performance ANN-to-SNN Conversion via Scale-and-Fire Neurons | Qiuyang Chen et.al. | 2510.23383 | null |
| 2025-10-27 | Seq-DeepIPC: Sequential Sensing for End-to-End Control in Legged Robot Navigation | Oskar Natan et.al. | 2510.23057 | null |
| 2025-10-26 | WaveMAE: Wavelet decomposition Masked Auto-Encoder for Remote Sensing | Vittorio Bernuzzi et.al. | 2510.22697 | null |
| 2025-10-26 | A Critical Study on Tea Leaf Disease Detection using Deep Learning Techniques | Nabajyoti Borah et.al. | 2510.22647 | null |
| 2025-10-26 | SABlock: Semantic-Aware KV Cache Eviction with Adaptive Compression Block Size | Jinhan Chen et.al. | 2510.22556 | null |
| 2025-10-25 | Real-Time Semantic Segmentation on FPGA for Autonomous Vehicles Using LMIINet with the CGRA4ML Framework | Amir Mohammad Khadem Hosseini et.al. | 2510.22243 | null |
| 2025-10-25 | Diffusion-Driven Two-Stage Active Learning for Low-Budget Semantic Segmentation | Jeongin Kim et.al. | 2510.22229 | null |
| 2025-10-25 | Simplifying Knowledge Transfer in Pretrained Models | Siddharth Jain et.al. | 2510.22208 | null |
| 2025-10-25 | Bridging Perception and Reasoning: Dual-Pipeline Neuro-Symbolic Landing for UAVs in Cluttered Environments | Weixian Qian et.al. | 2510.22204 | null |
| 2025-10-24 | AURASeg: Attention Guided Upsampling with Residual Boundary-Assistive Refinement for Drivable-Area Segmentation | Narendhiran Vijayakumar et.al. | 2510.21536 | null |
| 2025-10-24 | Unveiling the Spatial-temporal Effective Receptive Fields of Spiking Neural Networks | Jieyuan Zhang et.al. | 2510.21403 | null |
| 2025-10-24 | Urban 3D Change Detection Using LiDAR Sensor for HD Map Maintenance and Smart Mobility | Hezam Albagami et.al. | 2510.21112 | null |
| 2025-10-24 | WaveSeg: Enhancing Segmentation Precision via High-Frequency Prior and Mamba-Driven Spectrum Decomposition | Guoan Xu et.al. | 2510.21079 | null |
| 2025-10-23 | ACS-SegNet: An Attention-Based CNN-SegFormer Segmentation Network for Tissue Segmentation in Histopathology | Nima Torbati et.al. | 2510.20754 | null |
| 2025-10-22 | Uncertainty evaluation of segmentation models for Earth observation | Melanie Rey et.al. | 2510.19586 | null |
| 2025-10-22 | Automated Morphological Analysis of Neurons in Fluorescence Microscopy Using YOLOv8 | Banan Alnemri et.al. | 2510.19455 | null |
| 2025-10-21 | ε-Seg: Sparsely Supervised Semantic Segmentation of Microscopy Data | Sheida Rahnamai Kordasiabi et.al. | 2510.18637 | null |
| 2025-10-21 | Learning to Navigate Under Imperfect Perception: Conformalised Segmentation for Safe Reinforcement Learning | Daniel Bethell et.al. | 2510.18485 | null |
| 2025-10-21 | DART: A Structured Dataset of Regulatory Drug Documents in Italian for Clinical NLP | Mariano Barone et.al. | 2510.18475 | null |
| 2025-10-20 | Accelerating Vision Transformers with Adaptive Patch Sizes | Rohan Choudhury et.al. | 2510.18091 | link |
| 2025-10-17 | 3D Weakly Supervised Semantic Segmentation via Class-Aware and Geometry-Guided Pseudo-Label Refinement | Xiaoxu Xu et.al. | 2510.17875 | null |
| 2025-10-20 | 4DSegStreamer: Streaming 4D Panoptic Segmentation via Dual Threads | Ling Liu et.al. | 2510.17664 | null |
| 2025-10-20 | Expose Camouflage in the Water: Underwater Camouflaged Instance Segmentation and Dataset | Chuhong Wang et.al. | 2510.17585 | null |
| 2025-10-20 | M2H: Multi-Task Learning with Efficient Window-Based Cross-Task Attention for Monocular Spatial Perception | U. V. B. L Udugama et.al. | 2510.17363 | null |
| 2025-10-20 | Exploring Structural Degradation in Dense Representations for Self-supervised Learning | Siran Dai et.al. | 2510.17299 | null |
| 2025-10-19 | ArmFormer: Lightweight Transformer Architecture for Real-Time Multi-Class Weapon Segmentation and Classification | Akhila Kambhatla et.al. | 2510.16854 | null |
| 2025-10-19 | Needles in the Landscape: Semi-Supervised Pseudolabeling for Archaeological Site Discovery under Label Scarcity | Simon Jaxy et.al. | 2510.16814 | null |
| 2025-10-19 | An Efficient Semantic Segmentation Decoder for In-Car or Distributed Applications | Danish Nazir et.al. | 2510.16747 | null |
| 2025-10-19 | UKANFormer: Noise-Robust Semantic Segmentation for Coral Reef Mapping via a Kolmogorov-Arnold Network-Transformer Hybrid | Tianyang Dou et.al. | 2510.16730 | null |
| 2025-10-18 | Self-Supervised Learning to Fly using Efficient Semantic Segmentation and Metric Depth Estimation for Low-Cost Autonomous UAVs | Sebastian Mocanu et.al. | 2510.16624 | null |
| 2025-10-18 | Cataract-LMM: Large-Scale, Multi-Source, Multi-Task Benchmark for Deep Learning in Surgical Video Analysis | Mohammad Javad Ahmadi et.al. | 2510.16371 | null |
| 2025-10-17 | Neuro-Symbolic Spatial Reasoning in Segmentation | Jiayi Lin et.al. | 2510.15841 | null |
| 2025-10-17 | Semantic segmentation with coarse annotations | Jort de Jong et.al. | 2510.15756 | null |
| 2025-10-17 | Semantic4Safety: Causal Insights from Zero-shot Street View Imagery Segmentation for Urban Road Safety | Huan Chen et.al. | 2510.15434 | null |
| 2025-10-17 | MARIS: Marine Open-Vocabulary Instance Segmentation with Geometric Enhancement and Semantic Alignment | Bingyu Li et.al. | 2510.15398 | null |
| 2025-10-17 | TranSimHub: A Unified Air-Ground Simulation Platform for Multi-Modal Perception and Decision-Making | Maonan Wang et.al. | 2510.15365 | null |
| 2025-10-17 | RankSEG-RMA: An Efficient Segmentation Algorithm via Reciprocal Moment Approximation | Zixun Wang et.al. | 2510.15362 | null |
| 2025-10-17 | Symmetric Entropy-Constrained Video Coding for Machines | Yuxiao Sun et.al. | 2510.15347 | null |
| 2025-10-16 | Comprehensive language-image pre-training for 3D medical image understanding | Tassilo Wald et.al. | 2510.15042 | null |
| 2025-10-16 | MOBIUS: Big-to-Mobile Universal Instance Segmentation via Multi-modal Bottleneck Fusion and Calibrated Decoder Pruning | Mattia Segu et.al. | 2510.15026 | null |
| 2025-10-16 | Multi-modal video data-pipelines for machine learning with minimal human supervision | Mihai-Cristian Pîrvu et.al. | 2510.14862 | null |
| 2025-10-15 | PoissonNet: A Local-Global Approach for Learning on Surfaces | Arman Maesumi et.al. | 2510.14146 | null |
| 2025-10-15 | Multi-Scale High-Resolution Logarithmic Grapher Module for Efficient Vision GNNs | Mustafa Munir et.al. | 2510.13740 | null |
| 2025-10-15 | Dedelayed: Deleting remote inference delay via on-device correction | Dan Jacobellis et.al. | 2510.13714 | null |
| 2025-10-15 | Novel Class Discovery for Point Cloud Segmentation via Joint Learning of Causal Representation and Reasoning | Yang Li et.al. | 2510.13307 | null |
| 2025-10-15 | FlyAwareV2: A Multimodal Cross-Domain UAV Dataset for Urban Scene Understanding | Francesco Barbato et.al. | 2510.13243 | null |
| 2025-10-14 | SPORTS: Simultaneous Panoptic Odometry, Rendering, Tracking and Segmentation for Urban Scenes Understanding | Zhiliu Yang et.al. | 2510.12749 | null |
| 2025-10-14 | Multiplicative Loss for Enhancing Semantic Segmentation in Medical and Cellular Images | Yuto Yokoi et.al. | 2510.12258 | null |
| 2025-10-14 | BEEP3D: Box-Supervised End-to-End Pseudo-Mask Generation for 3D Instance Segmentation | Youngju Yoo et.al. | 2510.12182 | null |
| 2025-10-13 | A Framework for Low-Effort Training Data Generation for Urban Semantic Segmentation | Denis Zavadski et.al. | 2510.11567 | null |
| 2025-10-13 | Building and Evaluating a Realistic Virtual World for Large Scale Urban Exploration from 360° Videos | Mizuki Takenawa et.al. | 2510.11447 | null |
| 2025-10-13 | Uncertainty-Aware ControlNet: Bridging Domain Gaps with Synthetic Image Generation | Joshua Niemeijer et.al. | 2510.11346 | null |
| 2025-10-12 | DAGLFNet: Deep Attention-Guided Global-Local Feature Fusion for Pseudo-Image Point Cloud Segmentation | Chuang Chen et.al. | 2510.10471 | null |
| 2025-10-11 | MRI Brain Tumor Detection with Computer Vision | Jack Krolik et.al. | 2510.10250 | null |
| 2025-10-11 | SparseUWSeg: Active Sparse Point-Label Augmentation for Underwater Semantic Segmentation | César Borja et.al. | 2510.10163 | null |
| 2025-10-11 | An Unsupervised Time Series Anomaly Detection Approach for Efficient Online Process Monitoring of Additive Manufacturing | Frida Cantu et.al. | 2510.09977 | null |
| 2025-10-10 | Cell Instance Segmentation: The Devil Is in the Boundaries | Peixian Liang et.al. | 2510.09848 | null |
| 2025-10-10 | A methodology for clinically driven interactive segmentation evaluation | Parhom Esmaeili et.al. | 2510.09499 | null |
| 2025-10-10 | SilvaScenes: Tree Segmentation and Species Classification from Under-Canopy Images in Natural Forests | David-Alexandre Duclos et.al. | 2510.09458 | null |
| 2025-10-10 | Instance-Aware Robust Consistency Regularization for Semi-Supervised Nuclei Instance Segmentation | Zenan Lin et.al. | 2510.09329 | null |
| 2025-10-10 | SOS: Synthetic Object Segments Improve Detection, Segmentation, and Grounding | Weikai Huang et.al. | 2510.09110 | null |
| 2025-10-10 | Exploring Single Domain Generalization of LiDAR-based Semantic Segmentation under Imperfect Labels | Weitong Kong et.al. | 2510.09035 | null |
| 2025-10-10 | Pinpointing crucial steps: Attribution-based Credit Assignment for Verifiable Reinforcement Learning | Junxi Yin et.al. | 2510.08899 | null |
| 2025-10-09 | FOLK: Fast Open-Vocabulary 3D Instance Segmentation via Label-guided Knowledge Distillation | Hongrui Wu et.al. | 2510.08849 | null |
| 2025-10-08 | Out-of-Distribution Detection in LiDAR Semantic Segmentation Using Epistemic Uncertainty from Hierarchical GMMs | Hanieh Shojaei Miandashti et.al. | 2510.08631 | null |
| 2025-10-08 | HARP-NeXt: High-Speed and Accurate Range-Point Fusion Network for 3D LiDAR Semantic Segmentation | Samir Abou Haidar et.al. | 2510.06876 | null |
| 2025-10-08 | Semantic Segmentation Algorithm Based on Light Field and LiDAR Fusion | Jie Luo et.al. | 2510.06687 | null |
| 2025-10-08 | Through the Perspective of LiDAR: A Feature-Enriched and Uncertainty-Aware Annotation Pipeline for Terrestrial Point Cloud Segmentation | Fei Zhang et.al. | 2510.06582 | null |
| 2025-10-07 | Dropping the D: RGB-D SLAM Without the Depth Sensor | Mert Kiray et.al. | 2510.06216 | link |
| 2025-10-07 | Overlap-aware segmentation for topological reconstruction of obscured objects | J. Schueler et.al. | 2510.06194 | null |
| 2025-10-07 | Shaken or Stirred? An Analysis of MetaFormer’s Token Mixing for Medical Imaging | Ron Keuth et.al. | 2510.05971 | null |
| 2025-10-07 | ALISE: Annotation-Free LiDAR Instance Segmentation for Autonomous Driving | Yongxuan Lyu et.al. | 2510.05752 | null |
| 2025-07-25 | Co-Win: Joint Object Detection and Instance Segmentation in LiDAR Point Clouds via Collaborative Window Processing | Haichuan Li et.al. | 2507.19691 | null |
| 2025-07-25 | SurgPIS: Surgical-instrument-level Instances and Part-level Semantics for Weakly-supervised Part-aware Instance Segmentation | Meng Wei et.al. | 2507.19592 | null |
| 2025-07-24 | HybridTM: Combining Transformer and Mamba for 3D Semantic Segmentation | Xinyu Wang et.al. | 2507.18575 | null |
| 2025-07-24 | Synthetic Data Augmentation for Enhanced Chicken Carcass Instance Segmentation | Yihong Feng et.al. | 2507.18558 | null |
| 2025-07-24 | Iwin Transformer: Hierarchical Vision Transformer using Interleaved Windows | Simin Huo et.al. | 2507.18405 | link |
| 2025-07-24 | GVCCS: A Dataset for Contrail Identification and Tracking on Visible Whole Sky Camera Sequences | Gabriel Jarry et.al. | 2507.18330 | null |
| 2025-07-24 | SemiSegECG: A Multi-Dataset Benchmark for Semi-Supervised Semantic Segmentation in ECG Delineation | Minje Park et.al. | 2507.18323 | link |
| 2025-07-24 | Unsupervised Domain Adaptation for 3D LiDAR Semantic Segmentation Using Contrastive Learning and Multi-Model Pseudo Labeling | Abhishek Kaushik et.al. | 2507.18176 | null |
| 2025-07-23 | AFRDA: Attentive Feature Refinement for Domain Adaptive Semantic Segmentation | Md. Al-Masrur Khan et.al. | 2507.17957 | link |
| 2025-07-23 | Exploring Spatial Diversity for Region-based Active Learning | Lile Cai et.al. | 2507.17367 | null |
| 2025-07-23 | Exploring Active Learning for Semiconductor Defect Segmentation | Lile Cai et.al. | 2507.17359 | null |
| 2025-07-23 | Swin-TUNA: A Novel PEFT Approach for Accurate Food Image Segmentation | Haotian Chen et.al. | 2507.17347 | null |
| 2025-07-23 | On Temporal Guidance and Iterative Refinement in Audio Source Separation | Tobias Morocutti et.al. | 2507.17297 | null |
| 2025-07-23 | ScSAM: Debiasing Morphology and Distributional Variability in Subcellular Semantic Segmentation | Bo Fang et.al. | 2507.17149 | null |
| 2025-07-22 | MultiTaskDeltaNet: Change Detection-based Image Segmentation for Operando ETEM with Application to Carbon Gasification Kinetics | Yushuo Niu et.al. | 2507.16803 | null |
| 2025-07-22 | A2Mamba: Attention-augmented State Space Models for Visual Recognition | Meng Lou et.al. | 2507.16624 | null |
| 2025-07-22 | Semantic Segmentation for Preoperative Planning in Transcatheter Aortic Valve Replacement | Cedric Zöllner et.al. | 2507.16573 | null |
| 2025-07-22 | Comparative validation of surgical phase recognition, instrument keypoint estimation, and instrument instance segmentation in endoscopy: Results of the PhaKIR 2024 challenge | Tobias Rueckert et.al. | 2507.16559 | null |
| 2025-07-23 | EarthCrafter: Scalable 3D Earth Generation via Dual-Sparse Latent Diffusion | Shang Liu et.al. | 2507.16535 | null |
| 2025-07-22 | Advancing Visual Large Language Model for Multi-granular Versatile Perception | Wentao Xiang et.al. | 2507.16213 | null |
| 2025-07-22 | AMMNet: An Asymmetric Multi-Modal Network for Remote Sensing Semantic Segmentation | Hui Ye et.al. | 2507.16158 | null |
| 2025-07-21 | Improved Semantic Segmentation from Ultra-Low-Resolution RGB Images Applied to Privacy-Preserving Object-Goal Navigation | Xuying Huang et.al. | 2507.16034 | null |
| 2025-07-21 | ConformalSAM: Unlocking the Potential of Foundational Segmentation Models in Semi-Supervised Semantic Segmentation with Conformal Prediction | Danhui Chen et.al. | 2507.15803 | null |
| 2025-07-21 | ObjectGS: Object-aware Scene Reconstruction and Scene Understanding via Gaussian Splatting | Ruijie Zhu et.al. | 2507.15454 | null |
| 2025-07-21 | Rethinking Occlusion in FER: A Semantic-Aware Perspective and Go Beyond | Huiyu Zhai et.al. | 2507.15401 | null |
| 2025-07-20 | Towards Geometric and Textural Consistency 3D Scene Generation via Single Image-guided Model Generation and Layout Optimization | Xiang Tang et.al. | 2507.14841 | null |
| 2025-07-20 | A Novel Downsampling Strategy Based on Information Complementarity for Medical Image Segmentation | Wenbo Yue et.al. | 2507.14790 | null |
| 2025-07-19 | GTPBD: A Fine-Grained Global Terraced Parcel and Boundary Dataset | Zhiwei Zhang et.al. | 2507.14697 | null |
| 2025-07-19 | Artificial Intelligence in the Food Industry: Food Waste Estimation based on Computer Vision, a Brief Case Study in a University Dining Hall | Shayan Rokhva et.al. | 2507.14662 | null |
| 2025-07-19 | Multispectral State-Space Feature Fusion: Bridging Shared and Cross-Parametric Interactions for Object Detection | Jifeng Shen et.al. | 2507.14643 | null |
| 2025-07-19 | DiSCO-3D: Discovering and segmenting Sub-Concepts from Open-vocabulary queries in NeRF | Doriand Petit et.al. | 2507.14596 | null |
| 2025-07-18 | Semantic Segmentation based Scene Understanding in Autonomous Vehicles | Ehsan Rassekh et.al. | 2507.14303 | null |
| 2025-07-18 | Leveraging Pathology Foundation Models for Panoptic Segmentation of Melanoma in H&E Images | Jiaqi Lv et.al. | 2507.13974 | null |
| 2025-07-17 | SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation | Shiqi Huang et.al. | 2507.12857 | null |
| 2025-07-17 | A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique | Homare Sueyoshi et.al. | 2507.12730 | null |
| 2025-07-16 | VolSegGS: Segmentation and Tracking in Dynamic Volumetric Scenes via Deformable 3D Gaussians | Siyuan Yao et.al. | 2507.12667 | null |
| 2025-07-16 | NLI4VolVis: Natural Language Interaction for Volume Visualization via LLM Multi-Agents and Editable 3D Gaussian Splatting | Kuangshi Ai et.al. | 2507.12621 | null |
| 2025-07-16 | Out-of-distribution data supervision towards biomedical semantic segmentation | Yiquan Gao et.al. | 2507.12105 | null |
| 2025-07-16 | Tree-SLAM: semantic object SLAM for efficient mapping of individual trees in orchards | David Rapado-Rincon et.al. | 2507.12093 | null |
| 2025-07-16 | Frequency-Dynamic Attention Modulation for Dense Prediction | Linwei Chen et.al. | 2507.12006 | null |
| 2025-07-16 | SAMST: A Transformer framework based on SAM pseudo label filtering for remote sensing semi-supervised semantic segmentation | Jun Yin et.al. | 2507.11994 | null |
| 2025-07-16 | Prototypical Progressive Alignment and Reweighting for Generalizable Semantic Segmentation | Yuhang Zhang et.al. | 2507.11955 | null |
| 2025-07-16 | Spatial Frequency Modulation for Semantic Segmentation | Linwei Chen et.al. | 2507.11893 | null |
| 2025-07-15 | SToFM: a Multi-scale Foundation Model for Spatial Transcriptomics | Suyuan Zhao et.al. | 2507.11588 | null |
| 2025-07-15 | Tomato Multi-Angle Multi-Pose Dataset for Fine-Grained Phenotyping | Yujie Zhang et.al. | 2507.11279 | null |
| 2025-07-15 | Personalized OVSS: Understanding Personal Concept in Open-Vocabulary Semantic Segmentation | Sunghyun Park et.al. | 2507.11030 | null |
| 2025-07-15 | Graph Aggregation Prototype Learning for Semantic Change Detection in Remote Sensing | Zhengyi Xu et.al. | 2507.10938 | null |
| 2025-07-14 | Static or Temporal? Semantic Scene Simplification to Aid Wayfinding in Immersive Simulations of Bionic Vision | Justin M. Kasowski et.al. | 2507.10813 | null |
| 2025-07-14 | rt-RISeg: Real-Time Model-Free Robot Interactive Segmentation for Active Instance-Level Object Understanding | Howard H. Qian et.al. | 2507.10776 | null |
| 2025-07-14 | FGSSNet: Feature-Guided Semantic Segmentation of Real World Floorplans | Hugo Norrby et.al. | 2507.10343 | null |
| 2025-07-14 | Transferring Styles for Reduced Texture Bias and Improved Robustness in Semantic Segmentation Networks | Ben Hamscher et.al. | 2507.10239 | null |
| 2025-07-14 | Spatial Lifting for Dense Prediction | Mingzhi Xu et.al. | 2507.10222 | null |
| 2025-07-14 | DEARLi: Decoupled Enhancement of Recognition and Localization for Semi-supervised Panoptic Segmentation | Ivan Martinović et.al. | 2507.10118 | null |
| 2025-07-13 | MLoRQ: Bridging Low-Rank and Quantization for Transformer Compression | Ofir Gordon et.al. | 2507.09616 | null |
| 2025-07-13 | Inter2Former: Dynamic Hybrid Attention for Efficient High-Precision Interactive | You Huang et.al. | 2507.09612 | null |
| 2025-07-13 | SegVec3D: A Method for Vector Embedding of 3D Objects Oriented Towards Robot manipulation | Zhihan Kang et.al. | 2507.09459 | null |
| 2025-07-11 | Multimodal HD Mapping for Intersections by Intelligent Roadside Units | Zhongzhang Chen et.al. | 2507.08903 | null |
| 2025-07-11 | Image Translation with Kernel Prediction Networks for Semantic Segmentation | Cristina Mata et.al. | 2507.08554 | null |
| 2025-07-11 | From Enhancement to Understanding: Build a Generalized Bridge for Low-light Vision via Semantically Consistent Unsupervised Fine-tuning | Sen Wang et.al. | 2507.08380 | null |
| 2025-07-11 | SurfDist: Interpretable Three-Dimensional Instance Segmentation Using Curved Surface Patches | Jackson Borchardt et.al. | 2507.08223 | null |
| 2025-07-10 | RAPS-3D: Efficient interactive segmentation for 3D radiological imaging | Théo Danielou et.al. | 2507.07730 | null |
| 2025-07-10 | LOSC: LiDAR Open-voc Segmentation Consolidator | Nermin Samet et.al. | 2507.07605 | null |
| 2025-07-10 | Diffusion-Guided Knowledge Distillation for Weakly-Supervised Low-Light Semantic Segmentation | Chunyan Wang et.al. | 2507.07578 | null |
| 2025-07-10 | Seg-Wild: Interactive Segmentation based on 3D Gaussian Splatting for Unconstrained Image Collections | Yongtang Bao et.al. | 2507.07395 | null |
| 2025-07-08 | CoPT: Unsupervised Domain Adaptive Segmentation using Domain-Agnostic Text Embeddings | Cristina Mata et.al. | 2507.07125 | null |
| 2025-07-09 | A multi-modal dataset for insect biodiversity with imagery and DNA at the trap and individual level | Johanna Orsholm et.al. | 2507.06972 | null |
| 2025-07-09 | SemRaFiner: Panoptic Segmentation in Sparse and Noisy Radar Point Clouds | Matthias Zeller et.al. | 2507.06906 | null |
| 2025-07-09 | Know Your Attention Maps: Class-specific Token Masking for Weakly Supervised Semantic Segmentation | Joelle Hanna et.al. | 2507.06848 | null |
| 2025-07-09 | Ambiguity-aware Point Cloud Segmentation by Adaptive Margin Contrastive Learning | Yang Chen et.al. | 2507.06592 | null |
| 2025-07-08 | Centralized Copy-Paste: Enhanced Data Augmentation Strategy for Wildland Fire Semantic Segmentation | Joon Tai Kim et.al. | 2507.06321 | null |
| 2025-07-08 | FineGrasp: Towards Robust Grasping for Delicate Objects | Yun Du et.al. | 2507.05978 | null |
| 2025-07-08 | Beyond Appearance: Geometric Cues for Robust Video Instance Segmentation | Quanzhu Niu et.al. | 2507.05948 | null |
| 2025-07-08 | I$^2$R: Inter and Intra-image Refinement in Few Shot Segmentation | Ourui Fu et.al. | 2507.05838 | null |
| 2025-07-09 | Empowering Bridge Digital Twins by Bridging the Data Gap with a Unified Synthesis Framework | Wang Wang et.al. | 2507.05814 | null |
| 2025-07-08 | SPADE: Spatial-Aware Denoising Network for Open-vocabulary Panoptic Scene Graph Generation with Long- and Local-range Context Reasoning | Xin Hu et.al. | 2507.05798 | null |
| 2025-07-08 | DreamGrasp: Zero-Shot 3D Multi-Object Reconstruction from Partial-View Images for Robotic Manipulation | Young Hun Kim et.al. | 2507.05627 | null |
| 2025-07-07 | OpenWorldSAM: Extending SAM2 for Universal Image Segmentation with Language Prompts | Shiting Xiao et.al. | 2507.05427 | null |
| 2025-07-07 | Beyond One Shot, Beyond One Perspective: Cross-View and Long-Horizon Distillation for Better LiDAR Representations | Xiang Xu et.al. | 2507.05260 | null |
| 2025-07-07 | All in One: Visual-Description-Guided Unified Point Cloud Segmentation | Zongyan Han et.al. | 2507.05211 | null |
| 2025-07-07 | RAM-W600: A Multi-Task Wrist Dataset and Benchmark for Rheumatoid Arthritis | Songxiao Yang et.al. | 2507.05193 | null |
| 2025-07-07 | MOSU: Autonomous Long-range Robot Navigation with Multi-modal Scene Understanding | Jing Liang et.al. | 2507.04686 | null |
| 2025-07-06 | Street design and driving behavior: evidence from a large-scale study in Milan, Amsterdam, and Dubai | Giacomo Orsi et.al. | 2507.04434 | null |
| 2025-07-06 | CLIP-RL: Surgical Scene Segmentation Using Contrastive Language-Vision Pretraining & Reinforcement Learning | Fatmaelzahraa Ali Ahmed et.al. | 2507.04317 | null |
| 2025-07-06 | Surg-SegFormer: A Dual Transformer-Based Model for Holistic Surgical Scene Segmentation | Fatimaelzahraa Ahmed et.al. | 2507.04304 | null |
| 2025-07-05 | Differentiable High-Performance Ray Tracing-Based Simulation of Radio Propagation with Point Clouds | Niklas Vaara et.al. | 2507.04021 | null |
| 2025-07-05 | NRSeg: Noise-Resilient Learning for BEV Semantic Segmentation via Driving World Models | Siyu Li et.al. | 2507.04002 | null |
| 2025-07-05 | CoT-Segmenter: Enhancing OOD Detection in Dense Road Scenes via Chain-of-Thought Reasoning | Jeonghyo Song et.al. | 2507.03984 | null |
| 2025-07-03 | LangScene-X: Reconstruct Generalizable 3D Language-Embedded Scenes with TriMap Video Diffusion | Fangfu Liu et.al. | 2507.02813 | null |
| 2025-07-03 | No time to train! Training-Free Reference-Based Instance Segmentation | Miguel Espinosa et.al. | 2507.02798 | null |
| 2025-07-03 | From Pixels to Damage Severity: Estimating Earthquake Impacts Using Semantic Segmentation of Social Media Images | Danrong Zhang et.al. | 2507.02781 | null |
| 2025-07-03 | MedFormer: Hierarchical Medical Vision Transformer with Content-Aware Dual Sparse Selection Attention | Zunhui Xia et.al. | 2507.02488 | null |
| 2025-07-03 | Continual Multiple Instance Learning with Enhanced Localization for Histopathological Whole Slide Image Analysis | Byung Hyun Lee et.al. | 2507.02395 | null |
| 2025-07-03 | Perception Activator: An intuitive and portable framework for brain cognitive exploration | Le Xu et.al. | 2507.02311 | null |
| 2025-07-02 | How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks | Rahul Ramachandran et.al. | 2507.01955 | null |
| 2025-07-02 | 3D Reconstruction and Information Fusion between Dormant and Canopy Seasons in Commercial Orchards Using Deep Learning and Fast GICP | Ranjan Sapkota et.al. | 2507.01912 | null |
| 2025-07-02 | A Gift from the Integration of Discriminative and Diffusion-based Generative Learning: Boundary Refinement Remote Sensing Semantic Segmentation | Hao Wang et.al. | 2507.01573 | null |
| 2025-07-02 | NOCTIS: Novel Object Cyclic Threshold based Instance Segmentation | Max Gandyra et.al. | 2507.01463 | null |
| 2025-07-01 | Towards Open-World Human Action Segmentation Using Graph Convolutional Networks | Hao Xing et.al. | 2507.00756 | null |
| 2025-07-01 | Rectifying Magnitude Neglect in Linear Attention | Qihang Fan et.al. | 2507.00698 | null |
| 2025-07-02 | ExPaMoE: An Expandable Parallel Mixture of Experts for Continual Test-Time Adaptation | JianChao Zhao et.al. | 2507.00502 | null |
| 2025-07-01 | Process-aware and high-fidelity microstructure generation using stable diffusion | Hoang Cuong Phan et.al. | 2507.00459 | null |
| 2025-07-01 | PlantSegNeRF: A few-shot, cross-dataset method for plant 3D instance point cloud reconstruction via joint-channel NeRF with multi-view image instance matching | Xin Yang et.al. | 2507.00371 | null |
| 2025-06-30 | SurgiSR4K: A High-Resolution Endoscopic Video Dataset for Robotic-Assisted Minimally Invasive Procedures | Fengyi Jiang et.al. | 2507.00209 | null |
| 2025-06-30 | Controllable Reference-Based Real-World Remote Sensing Image Super-Resolution with Generative Diffusion Priors | Ce Wang et.al. | 2506.23801 | null |
| 2025-06-30 | Deep Learning-Based Semantic Segmentation for Real-Time Kidney Imaging and Measurements with Augmented Reality-Assisted Ultrasound | Gijs Luijten et.al. | 2506.23721 | null |
| 2025-06-30 | PGOV3D: Open-Vocabulary 3D Semantic Segmentation with Partial-to-Global Curriculum | Shiqi Zhang et.al. | 2506.23607 | null |
| 2025-06-30 | Interactive Interface For Semantic Segmentation Dataset Synthesis | Ngoc-Do Tran et.al. | 2506.23470 | null |
| 2025-06-30 | Contrastive Learning with Diffusion Features for Weakly Supervised Medical Image Segmentation | Dewen Zeng et.al. | 2506.23460 | null |
| 2025-06-29 | Layer Decomposition and Morphological Reconstruction for Task-Oriented Infrared Image Enhancement | Siyuan Chai et.al. | 2506.23353 | null |
| 2025-06-29 | FastSeg: Efficient Training-Free Open-Vocabulary Segmentation via Hierarchical Attention Refinement Method | Quang-Huy Che et.al. | 2506.23323 | null |
| 2025-06-29 | BPD-Neo: An MRI Dataset for Lung-Trachea Segmentation with Clinical Data for Neonatal Bronchopulmonary Dysplasia | Rachit Saluja et.al. | 2506.23305 | null |
| 2025-06-29 | High-quality Pseudo-labeling for Point Cloud Segmentation with Scene-level Annotation | Lunhao Duan et.al. | 2506.23227 | null |
| 2025-06-29 | DC-TTA: Divide-and-Conquer Framework for Test-Time Adaptation of Interactive Segmentation | Jihun Kim et.al. | 2506.23104 | null |
| 2025-06-27 | Partial CLIP is Enough: Chimera-Seg for Zero-shot Semantic Segmentation | Jialei Chen et.al. | 2506.22032 | null |
| 2025-06-27 | TASeg: Text-aware RGB-T Semantic Segmentation based on Fine-tuning Vision Foundation Models | Meng Yu et.al. | 2506.21975 | null |
| 2025-06-27 | SDRNET: Stacked Deep Residual Network for Accurate Semantic Segmentation of Fine-Resolution Remotely Sensed Images | Naftaly Wambugu et.al. | 2506.21945 | null |
| 2025-06-26 | Towards Reliable Detection of Empty Space: Conditional Marked Point Processes for Object Detection | Tobias J. Riedlinger et.al. | 2506.21486 | null |
| 2025-06-26 | PanSt3R: Multi-view Consistent Panoptic Segmentation | Lojze Zust et.al. | 2506.21348 | null |
| 2025-06-26 | HieraSurg: Hierarchy-Aware Diffusion Model for Surgical Video Generation | Diego Biagini et.al. | 2506.21287 | null |
| 2025-06-27 | ReME: A Data-Centric Framework for Training-Free Open-Vocabulary Segmentation | Xiwei Xuan et.al. | 2506.21233 | null |
| 2025-06-26 | Performance improvement of spatial semantic segmentation with enriched audio features and agent-based error correction for DCASE 2025 Challenge Task 4 | Jongyeon Park et.al. | 2506.21174 | null |
| 2025-06-27 | DidSee: Diffusion-Based Depth Completion for Material-Agnostic Robotic Perception and Manipulation | Wenzhou Lyu et.al. | 2506.21034 | null |
| 2025-06-26 | TSDASeg: A Two-Stage Model with Direct Alignment for Interactive Point Cloud Segmentation | Chade Li et.al. | 2506.20991 | null |
| 2025-06-26 | Segment Anything in Pathology Images with Natural Language | Zhixuan Chen et.al. | 2506.20988 | null |
| 2025-06-25 | A Deep Learning Approach to Identify Rock Bolts in Complex 3D Point Clouds of Underground Mines Captured Using Mobile Laser Scanners | Dibyayan Patra et.al. | 2506.20464 | null |
| 2025-06-26 | Towards Scalable and Generalizable Earth Observation Data Mining via Foundation Model Composition | Man Duc Chuc et.al. | 2506.20174 | null |
| 2025-06-24 | A Survey of Multi-sensor Fusion Perception for Embodied AI: Background, Methods, Challenges and Prospects | Shulan Ruan et.al. | 2506.19769 | null |
| 2025-06-24 | USIS16K: High-Quality Dataset for Underwater Salient Instance Segmentation | Lin Hong et.al. | 2506.19472 | null |
| 2025-06-24 | A Global-Local Cross-Attention Network for Ultra-high Resolution Remote Sensing Image Semantic Segmentation | Chen Yi et.al. | 2506.19406 | null |
| 2025-06-25 | AnchorDP3: 3D Affordance Guided Sparse Diffusion Policy for Robotic Manipulation | Ziyan Zhao et.al. | 2506.19269 | null |
| 2025-06-23 | Orthogonal Projection Subspace to Aggregate Online Prior-knowledge for Continual Test-time Adaptation | Jinlong Li et.al. | 2506.19022 | null |
| 2025-06-23 | Multi-Scale Spectral Attention Module-based Hyperspectral Segmentation in Autonomous Driving Scenarios | Imad Ali Shah et.al. | 2506.18682 | null |
| 2025-06-23 | SafeClick: Error-Tolerant Interactive Segmentation of Any Medical Volumes via Hierarchical Expert Consensus | Yifan Gao et.al. | 2506.18404 | null |
| 2025-06-23 | Jet Reconstruction with Mamba Networks in Collider Events | Jinmian Li et.al. | 2506.18336 | null |
| 2025-06-22 | OSDMamba: Enhancing Oil Spill Detection from Remote Sensing Images Using Selective State Space Model | Shuaiyu Chen et.al. | 2506.18006 | null |
| 2025-06-22 | Relation3D: Enhancing Relation Modeling for Point Cloud Instance Segmentation | Jiahao Lu et.al. | 2506.17891 | null |
| 2025-06-22 | Cross-modal State Space Modeling for Real-time RGB-thermal Wild Scene Semantic Segmentation | Xiaodong Guo et.al. | 2506.17869 | null |
| 2025-06-20 | Co-Seg++: Mutual Prompt-Guided Collaborative Learning for Versatile Medical Segmentation | Qing Xu et.al. | 2506.17159 | link |
| 2025-06-20 | ForestFormer3D: A Unified Framework for End-to-End Segmentation of Forest LiDAR 3D Point Clouds | Binbin Xiang et.al. | 2506.16991 | null |
| 2025-06-20 | LunarLoc: Segment-Based Global Localization on the Moon | Annika Thomas et.al. | 2506.16940 | link |
| 2025-06-19 | From Semantic To Instance: A Semi-Self-Supervised Learning Approach | Keyhan Najafian et.al. | 2506.16563 | null |
| 2025-06-19 | Structured Semantic 3D Reconstruction (S23DR) Challenge 2025 – Winning solution | Jan Skvrna et.al. | 2506.16421 | null |
| 2025-06-19 | LBMamba: Locally Bi-directional Mamba | Jingwei Zhang et.al. | 2506.15976 | null |
| 2025-06-19 | Heterogeneous-Modal Unsupervised Domain Adaptation via Latent Space Bridging | Jiawen Yang et.al. | 2506.15971 | null |
| 2025-06-19 | Polyline Path Masked Attention for Vision Transformer | Zhongchen Zhao et.al. | 2506.15940 | null |
| 2025-06-18 | MapFM: Foundation Model-Driven HD Mapping with Multi-Task Contextual Learning | Leonid Ivanov et.al. | 2506.15313 | link |
| 2025-06-18 | Enhancing point cloud analysis via neighbor aggregation correction based on cross-stage structure correlation | Jiaqi Shi et.al. | 2506.15160 | link |
| 2025-06-17 | Scaling-Up the Pretraining of the Earth Observation Foundation Model PhilEO to the MajorTOM Dataset | Nikolaos Dionelis et.al. | 2506.14765 | null |
| 2025-06-17 | FocalClick-XL: Towards Unified and High-quality Interactive Segmentation | Xi Chen et.al. | 2506.14686 | null |
| 2025-06-17 | VisLanding: Monocular 3D Perception for UAV Safe Landing via Depth-Normal Synergy | Zhuoyue Tan et.al. | 2506.14525 | null |
| 2025-06-17 | DepthSeg: Depth prompting in remote sensing semantic segmentation | Ning Zhou et.al. | 2506.14382 | null |
| 2025-06-17 | Leader360V: The Large-scale, Real-world 360 Video Dataset for Multi-task Learning in Diverse Environment | Weiming Zhang et.al. | 2506.14271 | null |
| 2025-06-16 | HierVL: Semi-Supervised Segmentation leveraging Hierarchical Vision-Language Synergy with Dynamic Text-Spatial Query Alignment | Numair Nadeem et.al. | 2506.13925 | null |
| 2025-06-16 | A Comprehensive Survey on Video Scene Parsing: Advances, Challenges, and Prospects | Guohuan Xie et.al. | 2506.13552 | null |
| 2025-06-16 | Open-Set LiDAR Panoptic Segmentation Guided by Uncertainty-Aware Learning | Rohit Mohan et.al. | 2506.13265 | null |
| 2025-06-16 | ViewPCL: a point cloud based active learning method for multi-view segmentation | Christian Hilaire et.al. | 2506.13043 | null |
| 2025-06-15 | A large-scale, physically-based synthetic dataset for satellite pose estimation | Szabolcs Velkei et.al. | 2506.12782 | null |
| 2025-06-15 | Unleashing Diffusion and State Space Models for Medical Image Segmentation | Rong Wu et.al. | 2506.12747 | null |
| 2025-06-15 | Combining Self-attention and Dilation Convolutional for Semantic Segmentation of Coal Maceral Groups | Zhenghao Xi et.al. | 2506.12712 | null |
| 2025-06-13 | O2Former: Direction-Aware and Multi-Scale Query Enhancement for SAR Ship Instance Segmentation | F. Gao et.al. | 2506.11913 | null |
| 2025-06-13 | Prohibited Items Segmentation via Occlusion-aware Bilayer Modeling | Yunhan Ren et.al. | 2506.11661 | null |
| 2025-06-13 | A$^2$LC: Active and Automated Label Correction for Semantic Segmentation | Youjin Jeon et.al. | 2506.11599 | null |
| 2025-06-13 | OV-MAP: Open-Vocabulary Zero-Shot 3D Instance Segmentation Map for Robots | Juno Kim et.al. | 2506.11585 | null |
| 2025-06-12 | GynSurg: A Comprehensive Gynecology Laparoscopic Surgery Dataset | Sahar Nasirihaghighi et.al. | 2506.11356 | null |
| 2025-06-12 | Description and Discussion on DCASE 2025 Challenge Task 4: Spatial Semantic Segmentation of Sound Scenes | Masahiro Yasuda et.al. | 2506.10676 | link |
| 2025-06-12 | Symmetrical Flow Matching: Unified Image Generation, Segmentation, and Classification with Score-Based Generative Models | Francisco Caetano et.al. | 2506.10634 | null |
| 2025-06-12 | Improving Medical Visual Representation Learning with Pathological-level Cross-Modal Alignment and Correlation Exploration | Jun Wang et.al. | 2506.10573 | null |
| 2025-06-12 | ALBERT: Advanced Localization and Bidirectional Encoder Representations from Transformers for Automotive Damage Evaluation | Teerapong Panboonyuen et.al. | 2506.10524 | null |
| 2025-06-12 | Semantic Localization Guiding Segment Anything Model For Reference Remote Sensing Image Segmentation | Shuyang Li et.al. | 2506.10503 | null |
| 2025-06-12 | Demonstrating Multi-Suction Item Picking at Scale via Multi-Modal Learning of Pick Success | Che Wang et.al. | 2506.10359 | null |
| 2025-06-11 | Deep Semantic Segmentation for Multi-Source Localization Using Angle of Arrival Measurements | Mustafa Atahan Nuhoglu et.al. | 2506.10107 | null |
| 2025-06-11 | Leveraging Depth and Language for Open-Vocabulary Domain-Generalized Semantic Segmentation | Siyu Chen et.al. | 2506.09881 | link |
| 2025-06-11 | Accurate and efficient zero-shot 6D pose estimation with frozen foundation models | Andrea Caraffa et.al. | 2506.09784 | null |
| 2025-06-11 | The Four Color Theorem for Cell Instance Segmentation | Ye Zhang et.al. | 2506.09724 | link |
| 2025-06-11 | Enhancing Human-Robot Collaboration: A Sim2Real Domain Adaptation Algorithm for Point Cloud Segmentation in Industrial Environments | Fatemeh Mohammadi Amin et.al. | 2506.09552 | null |
| 2025-06-12 | Urban1960SatSeg: Unsupervised Semantic Segmentation of Mid-20$^{th}$ century Urban Landscapes with Satellite Imageries | Tianxiang Hao et.al. | 2506.09476 | null |
| 2025-06-11 | MSSDF: Modality-Shared Self-supervised Distillation for High-Resolution Multi-modal Remote Sensing Image Learning | Tong Wang et.al. | 2506.09327 | null |
| 2025-06-10 | WetCat: Automating Skill Assessment in Wetlab Cataract Surgery Videos | Negin Ghamsarian et.al. | 2506.08896 | null |
| 2025-06-11 | RS-MTDF: Multi-Teacher Distillation and Fusion for Remote Sensing Semi-Supervised Semantic Segmentation | Jiayi Song et.al. | 2506.08772 | null |
| 2025-06-10 | ATAS: Any-to-Any Self-Distillation for Enhanced Open-Vocabulary Dense Prediction | Juan Yeo et.al. | 2506.08678 | null |
| 2025-06-10 | ECMNet: Lightweight Semantic Segmentation with Efficient CNN-Mamba Network | Feixiang Du et.al. | 2506.08629 | null |
| 2025-06-09 | LogoSP: Local-global Grouping of Superpoints for Unsupervised Semantic Segmentation of 3D Point Clouds | Zihui Zhang et.al. | 2506.07857 | null |
| 2025-06-09 | SAM2Auto: Auto Annotation Using FLASH | Arash Rocky et.al. | 2506.07850 | null |
| 2025-06-09 | F2Net: A Frequency-Fused Network for Ultra-High Resolution Remote Sensing Segmentation | Hengzhi Chen et.al. | 2506.07847 | null |
| 2025-06-09 | Trend-Aware Fashion Recommendation with Visual Segmentation and Semantic Similarity | Mohamed Djilani et.al. | 2506.07773 | null |
| 2025-06-09 | OpenSplat3D: Open-Vocabulary 3D Instance Segmentation using Gaussian Splatting | Jens Piekenbrinck et.al. | 2506.07697 | null |
| 2025-06-09 | Adapter Naturally Serves as Decoupler for Cross-Domain Few-Shot Semantic Segmentation | Jintao Tong et.al. | 2506.07376 | null |
| 2025-06-09 | Multiple Object Stitching for Unsupervised Representation Learning | Chengchao Shen et.al. | 2506.07364 | link |
| 2025-06-08 | BRIGHT+: Upgrading the BRIGHT Benchmark with MARCUS, a Multi-Agent RAG Clean-Up Suite | Liyang Chen et.al. | 2506.07116 | null |
| 2025-06-08 | Technical Report for ICRA 2025 GOOSE 3D Semantic Segmentation Challenge: Adaptive Point Cloud Understanding for Heterogeneous Robotic Systems | Xiaoya Zhang et.al. | 2506.06995 | null |
| 2025-06-07 | Position Prediction Self-Supervised Learning for Multimodal Satellite Imagery Semantic Segmentation | John Waithaka et.al. | 2506.06852 | null |
| 2025-06-06 | Rethinking Semi-supervised Segmentation Beyond Accuracy: Reliability and Robustness | Steven Landgraf et.al. | 2506.05917 | null |
| 2025-06-06 | You Only Estimate Once: Unified, One-stage, Real-Time Category-level Articulated Object 6D Pose Estimation for Robotic Grasping | Jingshun Huang et.al. | 2506.05719 | null |
| 2025-06-05 | FRAME: Pre-Training Video Feature Representations via Anticipation and Memory | Sethuraman TV et.al. | 2506.05543 | null |
| 2025-06-05 | U-NetMN and SegNetMN: Modified U-Net and SegNet models for bimodal SAR image segmentation | Marwane Kzadri et.al. | 2506.05444 | null |
| 2025-06-05 | Point Cloud Segmentation of Agricultural Vehicles using 3D Gaussian Splatting | Alfred T. Christiansen et.al. | 2506.05009 | null |
| 2025-06-05 | Bringing SAM to new heights: Leveraging elevation data for tree crown segmentation from drone imagery | Mélisande Teng et.al. | 2506.04970 | null |
| 2025-06-05 | CzechLynx: A Dataset for Individual Identification and Pose Estimation of the Eurasian Lynx | Lukas Picek et.al. | 2506.04931 | null |
| 2025-06-05 | OpenMaskDINO3D: Reasoning 3D Segmentation via Large Language Model | Kunshen Zhang et.al. | 2506.04837 | null |
| 2025-06-05 | Gen-n-Val: Agentic Image Data Generation and Validation | Jing-En Huang et.al. | 2506.04676 | null |
| 2025-06-04 | You Only Train Once | Christos Sakaridis et.al. | 2506.04349 | null |
| 2025-06-04 | AetherVision-Bench: An Open-Vocabulary RGB-Infrared Benchmark for Multi-Angle Segmentation across Aerial and Ground Perspectives | Aniruddh Sikdar et.al. | 2506.03709 | null |
| 2025-06-04 | OV-COAST: Cost Aggregation with Optimal Transport for Open-Vocabulary Semantic Segmentation | Aditya Gandhamal et.al. | 2506.03706 | null |
| 2025-06-04 | BiXFormer: A Robust Framework for Maximizing Modality Effectiveness in Multi-Modal Semantic Segmentation | Jialei Chen et.al. | 2506.03675 | null |
| 2025-06-03 | Cross-Modal Urban Sensing: Evaluating Sound-Vision Alignment Across Street-Level and Aerial Imagery | Pengyu Chen et.al. | 2506.03388 | null |
| 2025-06-03 | Simulate Any Radar: Attribute-Controllable Radar Simulation via Waveform Parameter Embedding | Weiqing Xiao et.al. | 2506.03134 | null |
| 2025-06-03 | GeneA-SLAM2: Dynamic SLAM with AutoEncoder-Preprocessed Genetic Keypoints Resampling and Depth Variance-Guided Dynamic Region Removal | Shufan Qing et.al. | 2506.02736 | link |
| 2025-06-03 | Towards Explicit Geometry-Reflectance Collaboration for Generalized LiDAR Segmentation in Adverse Weather | Longyu Yang et.al. | 2506.02396 | null |
| 2025-06-04 | SAB3R: Semantic-Augmented Backbone in 3D Reconstruction | Xuweiyi Chen et.al. | 2506.02112 | null |
| 2025-06-02 | SEMNAV: A Semantic Segmentation-Driven Approach to Visual Semantic Navigation | Rafael Flor-Rodríguez et.al. | 2506.01418 | null |
| 2025-06-01 | Perceptual Inductive Bias Is What You Need Before Contrastive Learning | Tianqin Li et.al. | 2506.01201 | null |
| 2025-06-01 | GeoChain: Multimodal Chain-of-Thought for Geographic Reasoning | Sahiti Yerramilli et.al. | 2506.00785 | null |
| 2025-05-31 | BAGNet: A Boundary-Aware Graph Attention Network for 3D Point Cloud Semantic Segmentation | Wei Tao et.al. | 2506.00475 | null |
| 2025-05-30 | Bi-Manual Joint Camera Calibration and Scene Representation | Haozhan Tang et.al. | 2505.24819 | null |
| 2025-06-02 | NUC-Net: Non-uniform Cylindrical Partition Network for Efficient LiDAR Semantic Segmentation | Xuzhi Wang et.al. | 2505.24634 | null |
| 2025-05-30 | SPPSFormer: High-quality Superpoint-based Transformer for Roof Plane Instance Segmentation from Point Clouds | Cheng Zeng et.al. | 2505.24475 | null |
| 2025-05-30 | Revisiting Cross-Modal Knowledge Distillation: A Disentanglement Approach for RGBD Semantic Segmentation | Roger Ferrod et.al. | 2505.24361 | null |
| 2025-05-30 | Weakly-Supervised Affordance Grounding Guided by Part-Level Semantic Priors | Peiran Xu et.al. | 2505.24103 | null |
| 2025-05-29 | MaskAdapt: Unsupervised Geometry-Aware Domain Adaptation Using Multimodal Contextual Learning and RGB-Depth Masking | Numair Nadeem et.al. | 2505.24026 | null |
| 2025-05-29 | Semantics-Guided Generative Image Compression | Cheng-Lin Wu et.al. | 2505.24015 | null |
| 2025-05-29 | Point-MoE: Towards Cross-Domain Generalization in 3D Semantic Segmentation via Mixture-of-Experts | Xuweiyi Chen et.al. | 2505.23926 | null |
| 2025-05-29 | TextRegion: Text-Aligned Region Tokens from Frozen Image-Text Models | Yao Xiao et.al. | 2505.23769 | link |
| 2025-05-29 | Bridging Classical and Modern Computer Vision: PerceptiveNet for Tree Crown Semantic Segmentation | Georgios Voulgaris et.al. | 2505.23597 | null |
| 2025-05-29 | VITON-DRR: Details Retention Virtual Try-on via Non-rigid Registration | Ben Li et.al. | 2505.23439 | link |
| 2025-05-29 | Adaptive Spatial Augmentation for Semi-supervised Semantic Segmentation | Lingyan Ran et.al. | 2505.23438 | null |
| 2025-05-29 | Federated Unsupervised Semantic Segmentation | Evangelos Charalampakis et.al. | 2505.23292 | null |
| 2025-05-29 | LeMoRe: Learn More Details for Lightweight Semantic Segmentation | Mian Muhammad Naeem Abid et.al. | 2505.23093 | link |
| 2025-05-28 | ConfLUNet: Multiple sclerosis lesion instance segmentation in presence of confluent lesions | Maxence Wynen et.al. | 2505.22537 | null |
| 2025-05-28 | Universal Domain Adaptation for Semantic Segmentation | Seun-An Choe et.al. | 2505.22458 | null |
| 2025-05-28 | LiDAR Based Semantic Perception for Forklifts in Outdoor Environments | Benjamin Serfling et.al. | 2505.22258 | null |
| 2025-05-29 | YH-MINER: Multimodal Intelligent System for Natural Ecological Reef Metric Extraction | Mingzhuang Wang et.al. | 2505.22250 | null |
| 2025-05-28 | Enjoying Information Dividend: Gaze Track-based Medical Weakly Supervised Segmentation | Zhisong Wang et.al. | 2505.22230 | null |
| 2025-05-28 | A Survey on Training-free Open-Vocabulary Semantic Segmentation | Naomi Kombol et.al. | 2505.22209 | null |
| 2025-05-28 | S2AFormer: Strip Self-Attention for Efficient Vision Transformer | Guoan Xu et.al. | 2505.22195 | null |
| 2025-05-28 | LiDARDustX: A LiDAR Dataset for Dusty Unstructured Road Environments | Chenfeng Wei et.al. | 2505.21914 | null |
| 2025-05-29 | CAST: Contrastive Adaptation and Distillation for Semi-Supervised Instance Segmentation | Pardis Taghavi et.al. | 2505.21904 | null |
| 2025-05-28 | Test-Time Adaptation of Vision-Language Models for Open-Vocabulary Semantic Segmentation | Mehrdad Noori et.al. | 2505.21844 | null |
| 2025-05-27 | Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO | Muzhi Zhu et.al. | 2505.21457 | null |
| 2025-05-27 | Object-Centric Action-Enhanced Representations for Robot Visuo-Motor Policy Learning | Nikos Giannakakis et.al. | 2505.20962 | null |
| 2025-05-27 | DSOcc: Leveraging Depth Awareness and Semantic Aid to Boost Camera-Based 3D Semantic Occupancy Prediction | Naiyu Fang et.al. | 2505.20951 | null |
| 2025-05-26 | Vision-Based Risk Aware Emergency Landing for UAVs in Complex Urban Environments | Julio de la Torre-Vanegas et.al. | 2505.20423 | null |
| 2025-05-26 | A fully automated urban PV parameterization framework for improved estimation of energy production profiles | Bowen Tian et.al. | 2505.19876 | null |
| 2025-05-26 | Zero-Shot Pseudo Labels Generation Using SAM and CLIP for Semi-Supervised Semantic Segmentation | Nagito Saito et.al. | 2505.19846 | null |
| 2025-05-26 | The Missing Point in Vision Transformers for Universal Image Segmentation | Sajjad Shahabodini et.al. | 2505.19795 | null |
| 2025-05-26 | ADD-SLAM: Adaptive Dynamic Dense SLAM with Gaussian Splatting | Wenhua Wu et.al. | 2505.19420 | null |
| 2025-05-25 | A Joint Learning Framework with Feature Reconstruction and Prediction for Incomplete Satellite Image Time Series in Agricultural Semantic Segmentation | Yuze Wang et.al. | 2505.19159 | link |
| 2025-05-25 | SPARS: Self-Play Adversarial Reinforcement Learning for Segmentation of Liver Tumours | Catalina Tan et.al. | 2505.18989 | link |
| 2025-05-25 | How Do Images Align and Complement LiDAR? Towards a Harmonized Multi-modal 3D Panoptic Segmentation | Yining Pan et.al. | 2505.18956 | null |
| 2025-05-25 | LLM-Guided Taxonomy and Hierarchical Uncertainty for 3D Point Cloud Active Learning | Chenxi Li et.al. | 2505.18924 | null |
| 2025-05-24 | ThinkVideo: High-Quality Reasoning Video Segmentation with Chain of Thoughts | Shiu-hong Kao et.al. | 2505.18561 | null |
| 2025-05-23 | REN: Fast and Efficient Region Encodings from Patch-Based Image Encoders | Savya Khosla et.al. | 2505.18153 | null |
| 2025-05-23 | SemSegBench & DetecBench: Benchmarking Reliability and Generalization Beyond Classification | Shashank Agnihotri et.al. | 2505.18015 | null |
| 2025-05-23 | Semantic segmentation with reward | Xie Ting et.al. | 2505.17905 | null |
| 2025-05-23 | Hephaestus Minicubes: A Global, Multi-Modal Dataset for Volcanic Unrest Monitoring | Nikolas Papadopoulos et.al. | 2505.17782 | null |
| 2025-05-23 | EMRA-proxy: Enhancing Multi-Class Region Semantic Segmentation in Remote Sensing Images with Attention Proxy | Yichun Yu et.al. | 2505.17665 | null |
| 2025-05-22 | Deep mineralogical segmentation of thin section images based on QEMSCAN maps | Jean Pablo Vieira de Mello et.al. | 2505.17008 | link |
| 2025-05-22 | OpenSeg-R: Improving Open-Vocabulary Segmentation via Step-by-Step Visual Reasoning | Zongyan Han et.al. | 2505.16974 | link |
| 2025-05-22 | NovelSeek: When Agent Becomes the Scientist – Building Closed-Loop System from Hypothesis to Verification | NovelSeek Team et.al. | 2505.16938 | null |
| 2025-05-22 | TextureSAM: Towards a Texture Aware Foundation Model for Segmentation | Inbal Cohen et.al. | 2505.16540 | null |
| 2025-05-22 | Detailed Evaluation of Modern Machine Learning Approaches for Optic Plastics Sorting | Vaishali Maheshkar et.al. | 2505.16513 | null |
| 2025-05-22 | Sketchy Bounding-box Supervision for 3D Instance Segmentation | Qian Deng et.al. | 2505.16399 | null |
| 2025-05-22 | Style Transfer with Diffusion Models for Synthetic-to-Real Domain Adaptation | Estelle Chigot et.al. | 2505.16360 | link |
| 2025-05-22 | RE-TRIP: Reflectivity Instance Augmented Triangle Descriptor for 3D Place Recognition | Yechan Park et.al. | 2505.16165 | link |
| 2025-05-21 | VP Lab: a PEFT-Enabled Visual Prompting Laboratory for Semantic Segmentation | Niccolo Avogaro et.al. | 2505.15592 | null |
| 2025-05-21 | UWSAM: Segment Anything Model Guided Underwater Instance Segmentation and A Large-scale Benchmark Dataset | Hua Li et.al. | 2505.15581 | link |
| 2025-05-21 | seg_3D_by_PC2D: Multi-View Projection for Domain Generalization and Adaptation in 3D Semantic Segmentation | Andrew Caunes et.al. | 2505.15545 | link |
| 2025-05-21 | Spectral-Aware Global Fusion for RGB-Thermal Semantic Segmentation | Ce Zhang et.al. | 2505.15491 | null |
| 2025-05-21 | gen2seg: Generative Models Enable Generalizable Instance Segmentation | Om Khangaonkar et.al. | 2505.15263 | null |
| 2025-05-21 | Zero-Shot Gaze-based Volumetric Medical Image Segmentation | Tatyana Shmykova et.al. | 2505.15256 | null |
| 2025-05-21 | From Pixels to Images: Deep Learning Advances in Remote Sensing Image Semantic Segmentation | Quanwei Liu et.al. | 2505.15147 | null |
| 2025-05-20 | Scan, Materialize, Simulate: A Generalizable Framework for Physically Grounded Robot Planning | Amine Elhafsi et.al. | 2505.14938 | null |
| 2025-05-20 | Instance Segmentation for Point Sets | Abhimanyu Talwar et.al. | 2505.14583 | null |
| 2025-05-20 | ReservoirTTA: Prolonged Test-time Adaptation for Evolving and Recurring Domains | Guillaume Vray et.al. | 2505.14511 | null |
| 2025-05-20 | Decoupling Classifier for Boosting Few-shot Object Detection and Instance Segmentation | Bin-Bin Gao et.al. | 2505.14239 | null |
| 2025-05-20 | Intra-class Patch Swap for Self-Distillation | Hongjun Choi et.al. | 2505.14124 | link |
| 2025-05-20 | Generalizable Multispectral Land Cover Classification via Frequency-Aware Mixture of Low-Rank Token Experts | Xi Chen et.al. | 2505.14088 | null |
| 2025-05-20 | Scaling Vision Mamba Across Resolutions via Fractal Traversal | Bo Li et.al. | 2505.14062 | null |
| 2025-05-20 | EGFormer: Towards Efficient and Generalizable Multimodal Semantic Segmentation | Zelin Zhang et.al. | 2505.14014 | null |
| 2025-05-19 | Self-Supervised Learning for Image Segmentation: A Comprehensive Survey | Thangarajah Akilan et.al. | 2505.13584 | null |
| 2025-05-19 | FlowCut: Unsupervised Video Instance Segmentation via Temporal Mask Matching | Alp Eren Sari et.al. | 2505.13174 | null |
| 2025-05-20 | Industrial Synthetic Segment Pre-training | Shinichi Mae et.al. | 2505.13099 | null |
| 2025-05-19 | Robust Multimodal Segmentation with Representation Regularization and Hybrid Prototype Distillation | Jiaqi Tan et.al. | 2505.12861 | link |
| 2025-05-19 | Enhancing Transformers Through Conditioned Embedded Tokens | Hemanth Saratchandran et.al. | 2505.12789 | null |
| 2025-05-18 | Temporal-Spectral-Spatial Unified Remote Sensing Dense Prediction | Sijie Zhao et.al. | 2505.12280 | link |
| 2025-05-17 | SoftPQ: Robust Instance Segmentation Evaluation via Soft Matching and Tunable Thresholds | Ranit Karmakar et.al. | 2505.12155 | link |
| 2025-05-17 | EarthSynth: Generating Informative Earth Observation with Diffusion Models | Jiancheng Pan et.al. | 2505.12108 | null |
| 2025-05-17 | iSegMan: Interactive Segment-and-Manipulate 3D Gaussians | Yian Zhao et.al. | 2505.11934 | null |
| 2025-05-17 | Technical Report for ICRA 2025 GOOSE 2D Semantic Segmentation Challenge: Boosting Off-Road Segmentation via Photometric Distortion and Exponential Moving Average | Wonjune Kim et.al. | 2505.11769 | null |
| 2025-05-16 | DPSeg: Dual-Prompt Cost Volume Learning for Open-Vocabulary Semantic Segmentation | Ziyu Zhao et.al. | 2505.11676 | null |
| 2025-05-16 | SurgPose: Generalisable Surgical Instrument Pose Estimation using Zero-Shot Learning and Stereo Vision | Utsav Rai et.al. | 2505.11439 | null |
| 2025-05-16 | Pseudo-Label Quality Decoupling and Correction for Semi-Supervised Instance Segmentation | Jianghang Lin et.al. | 2505.11075 | null |
| 2025-05-16 | Completely Weakly Supervised Class-Incremental Learning for Semantic Segmentation | David Minkwan Kim et.al. | 2505.10781 | null |
| 2025-05-15 | Mapping Semantic Segmentation to Point Clouds Using Structure from Motion for Forest Analysis | Francisco Raverta Capua et.al. | 2505.10751 | null |
| 2025-05-15 | TartanGround: A Large-Scale Dataset for Ground Robot Perception and Navigation | Manthan Patel et.al. | 2505.10696 | null |
| 2025-05-15 | SpikeVideoFormer: An Efficient Spike-Driven Video Transformer with Hamming Attention and $\mathcal{O}(T)$ Complexity | Shihao Zou et.al. | 2505.10352 | null |
| 2025-05-15 | APCoTTA: Continual Test-Time Adaptation for Semantic Segmentation of Airborne LiDAR Point Clouds | Yuan Gao et.al. | 2505.09971 | link |
| 2025-05-14 | FedSaaS: Class-Consistency Federated Semantic Segmentation via Global Prototype Supervision and Local Adversarial Harmonization | Xiaoyang Yu et.al. | 2505.09385 | null |
| 2025-05-14 | MetaUAS: Universal Anomaly Segmentation with One-Prompt Meta-Learning | Bin-Bin Gao et.al. | 2505.09265 | null |
| 2025-05-13 | MESSI: A Multi-Elevation Semantic Segmentation Image Dataset of an Urban Environment | Barak Pinkovich et.al. | 2505.08589 | null |
| 2025-05-14 | The RaspGrade Dataset: Towards Automatic Raspberry Ripeness Grading with Deep Learning | Mohamed Lamine Mekhalfi et.al. | 2505.08537 | null |
| 2025-05-13 | Dynamic Snake Upsampling Operator and Boundary-Skeleton Weighted Loss for Tubular Structure Segmentation | Yiqi Chen et.al. | 2505.08525 | null |
| 2025-05-13 | Optimizing Retrieval-Augmented Generation: Analysis of Hyperparameter Impact on Performance and Efficiency | Adel Ammar et.al. | 2505.08445 | null |
| 2025-05-13 | GNCAF: A GNN-based Neighboring Context Aggregation Framework for Tertiary Lymphoid Structures Semantic Segmentation in WSI | Lei Su et.al. | 2505.08430 | null |
| 2025-05-12 | Vision Foundation Model Embedding-Based Semantic Anomaly Detection | Max Peter Ronecker et.al. | 2505.07998 | null |
| 2025-05-12 | Privacy Risks of Robot Vision: A User Study on Image Modalities and Resolution | Xuying Huang et.al. | 2505.07766 | null |
| 2025-05-12 | Feedback-Driven Pseudo-Label Reliability Assessment: Redefining Thresholding for Semi-Supervised Semantic Segmentation | Negin Ghamsarian et.al. | 2505.07691 | null |
| 2025-05-12 | MAIS: Memory-Attention for Interactive Segmentation | Mauricio Orbes-Arteaga et.al. | 2505.07511 | null |
| 2025-05-13 | TUM2TWIN: Introducing the Large-Scale Multimodal Urban Digital Twin Benchmark Dataset | Olaf Wysocki et.al. | 2505.07396 | null |
| 2025-05-11 | Semantic-Guided Diffusion Model for Single-Step Image Super-Resolution | Zihang Liu et.al. | 2505.07071 | link |
| 2025-05-11 | Depth-Sensitive Soft Suppression with RGB-D Inter-Modal Stylization Flow for Domain Generalization Semantic Segmentation | Binbin Wei et.al. | 2505.07050 | null |
| 2025-05-11 | Technical Report for ICRA 2025 GOOSE 2D Semantic Segmentation Challenge: Leveraging Color Shift Correction, RoPE-Swin Backbone, and Quantile-based Label Denoising Strategy for Robust Outdoor Scene Understanding | Chih-Chung Hsu et.al. | 2505.06991 | null |
| 2025-05-11 | Boosting Cross-spectral Unsupervised Domain Adaptation for Thermal Semantic Segmentation | Seokjun Kwon et.al. | 2505.06951 | null |
| 2025-05-10 | Reducing Unimodal Bias in Multi-Modal Semantic Segmentation with Multi-Scale Functional Entropy Regularization | Xu Zheng et.al. | 2505.06635 | null |
| 2025-05-10 | RESAR-BEV: An Explainable Progressive Residual Autoregressive Approach for Camera-Radar Fusion in BEV Segmentation | Zhiwen Zeng et.al. | 2505.06515 | null |
| 2025-05-09 | Brain Hematoma Marker Recognition Using Multitask Learning: SwinTransformer and Swin-Unet | Kodai Hirata et.al. | 2505.06185 | null |
| 2025-05-08 | CottonSim: Development of an autonomous visual-guided robotic cotton-picking system in the Gazebo | Thevathayarajh Thayananthan et.al. | 2505.05317 | null |
| 2025-05-08 | RepSNet: A Nucleus Instance Segmentation model based on Boundary Regression and Structural Re-parameterization | Shengchun Xiong et.al. | 2505.05073 | null |
| 2025-05-09 | UncertainSAM: Fast and Efficient Uncertainty Quantification of the Segment Anything Model | Timo Kaiser et.al. | 2505.05049 | link |
| 2025-05-08 | Split Matching for Inductive Zero-shot Semantic Segmentation | Jialei Chen et.al. | 2505.05023 | null |
| 2025-05-08 | Mix-QSAM: Mixed-Precision Quantization of the Segment Anything Model | Navin Ranjan et.al. | 2505.04861 | null |
| 2025-05-07 | Are Synthetic Corruptions A Reliable Proxy For Real-World Corruptions? | Shashank Agnihotri et.al. | 2505.04835 | link |
| 2025-05-07 | Hyb-KAN ViT: Hybrid Kolmogorov-Arnold Networks Augmented Vision Transformer | Sainath Dey et.al. | 2505.04740 | null |
| 2025-05-07 | DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception | Junjie Wang et.al. | 2505.04410 | link |
| 2025-05-07 | MFSeg: Efficient Multi-frame 3D Semantic Segmentation | Chengjie Huang et.al. | 2505.04408 | null |
| 2025-05-06 | Self-Supervised Learning for Robotic Leaf Manipulation: A Hybrid Geometric-Neural Approach | Srecharan Selvam et.al. | 2505.03702 | null |
| 2025-05-06 | CaRaFFusion: Improving 2D Semantic Segmentation with Camera-Radar Point Cloud Fusion and Zero-Shot Image Inpainting | Huawei Sun et.al. | 2505.03679 | null |
| 2025-05-06 | Panoramic Out-of-Distribution Segmentation | Mengfei Duan et.al. | 2505.03539 | link |
| 2025-05-06 | 3D Can Be Explored In 2D: Pseudo-Label Generation for LiDAR Point Clouds Using Sensor-Intensity-Based 2D Semantic Segmentation | Andrew Caunes et.al. | 2505.03300 | null |
| 2025-05-05 | Platelet enumeration in dense aggregates | H. Martin Gillis et.al. | 2505.02751 | null |
| 2025-05-04 | Benchmarking Feature Upsampling Methods for Vision Foundation Models using Interactive Segmentation | Volodymyr Havrylov et.al. | 2505.02075 | link |
| 2025-05-04 | Segment Any RGB-Thermal Model with Language-aided Distillation | Dong Xing et.al. | 2505.01950 | null |
| 2025-05-03 | OODTE: A Differential Testing Engine for the ONNX Optimizer | Nikolaos Louloudakis et.al. | 2505.01892 | null |
| 2025-05-03 | A Novel WaveInst-based Network for Tree Trunk Structure Extraction and Pattern Analysis in Forest Inventory | Chenyang Fan et.al. | 2505.01656 | null |
| 2025-05-02 | A Sensor Agnostic Domain Generalization Framework for Leveraging Geospatial Foundation Models: Enhancing Semantic Segmentation via Synergistic Pseudo-Labeling and Generative Learning | Anan Yaghmour et.al. | 2505.01558 | null |
| 2025-05-02 | Rethinking RGB-Event Semantic Segmentation with a Novel Bidirectional Motion-enhanced Event Representation | Zhen Yao et.al. | 2505.01548 | link |
| 2025-05-02 | Global Collinearity-aware Polygonizer for Polygonal Building Mapping in Remote Sensing | Fahong Zhang et.al. | 2505.01385 | null |
| 2025-05-02 | GeloVec: Higher Dimensional Geometric Smoothing for Coherent Visual Feature Extraction in Image Segmentation | Boris Kriuk et.al. | 2505.01057 | null |
| 2025-04-30 | MoSAM: Motion-Guided Segment Anything Model with Spatial-Temporal Memory Selection | Qiushi Yang et.al. | 2505.00739 | null |
| 2025-05-03 | Vision Mamba in Remote Sensing: A Comprehensive Survey of Techniques, Applications and Outlook | Muyi Bao et.al. | 2505.00630 | null |
| 2025-05-01 | Cues3D: Unleashing the Power of Sole NeRF for Consistent and Unique Instances in Open-Vocabulary 3D Panoptic Segmentation | Feng Xue et.al. | 2505.00378 | null |
| 2025-04-30 | Common3D: Self-Supervised Learning of 3D Morphable Models for Common Objects in Neural Feature Space | Leonhard Sommer et.al. | 2504.21749 | null |
| 2025-04-30 | Real Time Semantic Segmentation of High Resolution Automotive LiDAR Scans | Hannes Reichert et.al. | 2504.21602 | null |
| 2025-04-30 | Make Both Ends Meet: A Synergistic Optimization Infrared Small Target Detection with Streamlined Computational Overhead | Yuxin Jing et.al. | 2504.21581 | null |
| 2025-04-30 | ClassWise-CRF: Category-Specific Fusion for Enhanced Semantic Segmentation of Remote Sensing Imagery | Qinfeng Zhu et.al. | 2504.21491 | null |
| 2025-04-29 | DeepVoid: A Deep Learning Void Detector | Sam Kumagai et.al. | 2504.21134 | null |
| 2025-04-29 | Learning a General Model: Folding Clothing with Topological Dynamics | Yiming Liu et.al. | 2504.20720 | null |
| 2025-04-29 | OG-HFYOLO: Orientation gradient guidance and heterogeneous feature fusion for deformation table cell instance segmentation | Long Liu et.al. | 2504.20682 | null |
| 2025-04-28 | DeepAndes: A Self-Supervised Vision Foundation Model for Multi-Spectral Remote Sensing Imagery of the Andes | Junlin Guo et.al. | 2504.20303 | null |
| 2025-04-28 | Learning Streaming Video Representation via Multitask Training | Yibin Yan et.al. | 2504.20041 | null |
| 2025-04-28 | SRMF: A Data Augmentation and Multimodal Fusion Approach for Long-Tail UHR Satellite Image Segmentation | Yulong Guo et.al. | 2504.19839 | null |
| 2025-04-28 | Open-set Anomaly Segmentation in Complex Scenarios | Song Xia et.al. | 2504.19706 | null |
| 2025-04-28 | SubGrapher: Visual Fingerprinting of Chemical Structures | Lucas Morin et.al. | 2504.19695 | null |
| 2025-04-28 | BARIS: Boundary-Aware Refinement with Environmental Degradation Priors for Robust Underwater Instance Segmentation | Pin-Chi Pan et.al. | 2504.19643 | null |
| 2025-04-28 | Masked Point-Entity Contrast for Open-Vocabulary 3D Scene Understanding | Yan Wang et.al. | 2504.19500 | null |
| 2025-04-28 | GSFF-SLAM: 3D Semantic Gaussian Splatting SLAM via Feature Field | Zuxing Lu et.al. | 2504.19409 | null |
| 2025-04-27 | OpenFusion++: An Open-vocabulary Real-time Scene Understanding System | Xiaofeng Jin et.al. | 2504.19266 | null |
| 2025-04-27 | DeepSPG: Exploring Deep Semantic Prior Guidance for Low-light Image Enhancement with Multimodal Learning | Jialang Lu et.al. | 2504.19127 | null |
| 2025-04-26 | VISUALCENT: Visual Human Analysis using Dynamic Centroid Representation | Niaz Ahmad et.al. | 2504.19032 | null |
| 2025-04-25 | A Data-Centric Approach to 3D Semantic Segmentation of Railway Scenes | Nicolas Münger et.al. | 2504.18213 | null |
| 2025-04-25 | Multi-Grained Compositional Visual Clue Learning for Image Intent Recognition | Yin Tang et.al. | 2504.18201 | null |
| 2025-04-25 | What is the Added Value of UDA in the VFM Era? | Brunó B. Englert et.al. | 2504.18190 | null |
| 2025-04-25 | Back to Fundamentals: Low-Level Visual Features Guided Progressive Token Pruning | Yuanbing Ouyang et.al. | 2504.17996 | null |
| 2025-04-24 | Virtual Roads, Smarter Safety: A Digital Twin Framework for Mixed Autonomous Traffic Safety Analysis | Hao Zhang et.al. | 2504.17968 | null |
| 2025-04-24 | Masked strategies for images with small objects | H. Martin Gillis et.al. | 2504.17935 | null |
| 2025-04-24 | Occlusion-Aware Self-Supervised Monocular Depth Estimation for Weak-Texture Endoscopic Images | Zebo Huang et.al. | 2504.17582 | null |
| 2025-04-23 | Scene-Aware Location Modeling for Data Augmentation in Automotive Object Detection | Jens Petersen et.al. | 2504.17076 | null |
| 2025-04-23 | SemanticSugarBeets: A Multi-Task Framework and Dataset for Inspecting Harvest and Storage Characteristics of Sugar Beets | Gerardus Croonen et.al. | 2504.16684 | null |
| 2025-04-23 | Federated EndoViT: Pretraining Vision Transformers via Federated Learning on Endoscopic Image Collections | Max Kirchner et.al. | 2504.16612 | null |
| 2025-04-23 | SAIP-Net: Enhancing Remote Sensing Image Segmentation via Spectral Adaptive Information Propagation | Zhongtao Wang et.al. | 2504.16564 | null |
| 2025-04-23 | Beyond Anonymization: Object Scrubbing for Privacy-Preserving 2D and 3D Vision Tasks | Murat Bilgehan Ertan et.al. | 2504.16557 | null |
| 2025-04-22 | Efficient Adaptation of Deep Neural Networks for Semantic Segmentation in Space Applications | Leonardo Olivi et.al. | 2504.15991 | null |
| 2025-04-22 | DINOv2-powered Few-Shot Semantic Segmentation: A Unified Framework via Cross-Model Distillation and 4D Correlation Mining | Wei Zhuo et.al. | 2504.15669 | null |
| 2025-04-21 | Segmentation with Noisy Labels via Spatially Correlated Distributions | Ryu Tadokoro et.al. | 2504.14795 | link |
| 2025-04-20 | NVSMask3D: Hard Visual Prompting with Camera Pose Interpolation for 3D Open Vocabulary Instance Segmentation | Junyuan Fang et.al. | 2504.14638 | null |
| 2025-04-19 | Exploring Modality Guidance to Enhance VFM-based Feature Fusion for UDA in 3D Semantic Segmentation | Johannes Spoecklberger et.al. | 2504.14231 | null |
| 2025-04-19 | Segment Any Crack: Deep Semantic Segmentation Adaptation for Crack Detection | Ghodsiyeh Rostami et.al. | 2504.14138 | null |
| 2025-04-19 | Lightweight Road Environment Segmentation using Vector Quantization | Jiyong Kwag et.al. | 2504.14113 | null |
| 2025-04-18 | Occlusion-Ordered Semantic Instance Segmentation | Soroosh Baselizadeh et.al. | 2504.14054 | null |
| 2025-04-18 | HDBFormer: Efficient RGB-D Semantic Segmentation with A Heterogeneous Dual-Branch Framework | Shuobin Wei et.al. | 2504.13579 | null |
| 2025-04-18 | Learning from Noisy Pseudo-labels for All-Weather Land Cover Mapping | Wang Liu et.al. | 2504.13458 | link |
| 2025-04-18 | DADU: Dual Attention-based Deep Supervised UNet for Automated Semantic Segmentation of Cardiac Images | Racheal Mukisa et.al. | 2504.13415 | null |
| 2025-04-18 | Cardiac MRI Semantic Segmentation for Ventricles and Myocardium using Deep Learning | Racheal Mukisa et.al. | 2504.13391 | null |
| 2025-04-17 | SAR Object Detection with Self-Supervised Pretraining and Curriculum-Aware Sampling | Yasin Almalioglu et.al. | 2504.13310 | null |
| 2025-04-17 | Digital Twin Generation from Visual Data: A Survey | Andrew Melnik et.al. | 2504.13159 | null |
| 2025-04-17 | High-Fidelity Image Inpainting with Multimodal Guided GAN Inversion | Libo Zhang et.al. | 2504.12844 | null |
| 2025-04-17 | Stronger, Steadier & Superior: Geometric Consistency in Depth VFM Forges Domain Generalized Semantic Segmentation | Siyu Chen et.al. | 2504.12753 | link |
| 2025-04-17 | Parsimonious Dataset Construction for Laparoscopic Cholecystectomy Structure Segmentation | Yuning Zhou et.al. | 2504.12573 | null |
| 2025-04-17 | Privacy-Preserving Operating Room Workflow Analysis using Digital Twins | Alejandra Perez et.al. | 2504.12552 | null |
| 2025-04-16 | 3D-PointZshotS: Geometry-Aware 3D Point Cloud Zero-Shot Semantic Segmentation Narrowing the Visual-Semantic Gap | Minmin Yang et.al. | 2504.12442 | null |
| 2025-04-16 | Remote sensing colour image semantic segmentation of trails created by large herbivorous Mammals | Jose Francisco Diez-Pastor et.al. | 2504.12121 | null |
| 2025-04-17 | DC-SAM: In-Context Segment Anything in Images and Videos via Dual Consistency | Mengshi Qi et.al. | 2504.12080 | link |
| 2025-04-16 | Single-shot Star-convex Polygon-based Instance Segmentation for Spatially-correlated Biomedical Objects | Trina De et.al. | 2504.12078 | null |
| 2025-04-16 | CAGS: Open-Vocabulary 3D Scene Understanding with Context-Aware Gaussian Splatting | Wei Sun et.al. | 2504.11893 | null |
| 2025-04-15 | CAP-Net: A Unified Network for 6D Pose and Size Estimation of Categorical Articulated Parts from a Single RGB-D Image | Jingshun Huang et.al. | 2504.11230 | null |
| 2025-04-15 | Easy3D: A Simple Yet Effective Method for 3D Interactive Segmentation | Andrea Simonelli et.al. | 2504.11024 | null |
| 2025-04-15 | PraNet-V2: Dual-Supervised Reverse Attention for Medical Image Segmentation | Bo-Cheng Hu et.al. | 2504.10986 | null |
| 2025-04-15 | LightFormer: A lightweight and efficient decoder for remote sensing image segmentation | Sihang Chen et.al. | 2504.10834 | null |
| 2025-04-15 | OmniVDiff: Omni Controllable Video Diffusion for Generation and Understanding | Dianbing Xi et.al. | 2504.10825 | null |
| 2025-04-15 | Efficient and Robust Remote Sensing Image Denoising Using Randomized Approximation of Geodesics’ Gramian on the Manifold Underlying the Patch Space | Kelum Gajamannage et.al. | 2504.10820 | null |
| 2025-04-14 | Real-time Seafloor Segmentation and Mapping | Michele Grimaldi et.al. | 2504.10750 | null |
| 2025-04-14 | FLOSS: Free Lunch in Open-vocabulary Semantic Segmentation | Yasser Benigmim et.al. | 2504.10487 | null |
| 2025-04-14 | The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer | Weixian Lei et.al. | 2504.10462 | null |
| 2025-04-14 | M2S-RoAD: Multi-Modal Semantic Segmentation for Road Damage Using Camera and LiDAR Data | Tzu-Yun Tseng et.al. | 2504.10123 | null |
| 2025-04-14 | DUDA: Distilled Unsupervised Domain Adaptation for Lightweight Semantic Segmentation | Beomseok Kang et.al. | 2504.09814 | null |
| 2025-04-14 | IGL-DT: Iterative Global-Local Feature Learning with Dual-Teacher Semantic Segmentation Framework under Limited Annotation Scheme | Dinh Dai Quan Tran et.al. | 2504.09797 | null |
| 2025-04-14 | Advancing RFI-Detection in Radio Astronomy with Liquid State Machines | Nicholas J Pritchard et.al. | 2504.09796 | null |
| 2025-04-12 | Evolved Hierarchical Masking for Self-Supervised Learning | Zhanzhou Feng et.al. | 2504.09155 | null |
| 2025-04-11 | Data-Importance-Aware Power Allocation for Adaptive Real-Time Communication in Computer Vision Applications | Chunmei Xu et.al. | 2504.08922 | null |
| 2025-04-11 | Offline Reinforcement Learning using Human-Aligned Reward Labeling for Autonomous Emergency Braking in Occluded Pedestrian Crossing | Vinal Asodia et.al. | 2504.08704 | null |
| 2025-04-11 | Cut-and-Splat: Leveraging Gaussian Splatting for Synthetic Data Generation | Bram Vanherle et.al. | 2504.08473 | link |
| 2025-04-11 | SN-LiDAR: Semantic Neural Fields for Novel Space-time View LiDAR Synthesis | Yi Chen et.al. | 2504.08361 | null |
| 2025-04-11 | DSM: Building A Diverse Semantic Map for 3D Visual Grounding | Qinghongbing Xie et.al. | 2504.08307 | null |
| 2025-04-10 | ChildlikeSHAPES: Semantic Hierarchical Region Parsing for Animating Figure Drawings | Astitva Srivastava et.al. | 2504.08022 | null |
| 2025-04-10 | P2Object: Single Point Supervised Object Detection and Instance Segmentation | Pengfei Chen et.al. | 2504.07813 | null |
| 2025-04-10 | Distilling Knowledge from Heterogeneous Architectures for Semantic Segmentation | Yanglin Huang et.al. | 2504.07691 | null |
| 2025-04-10 | SydneyScapes: Image Segmentation for Australian Environments | Hongyu Lyu et.al. | 2504.07542 | null |
| 2025-04-10 | RadZero: Similarity-Based Cross-Attention for Explainable Vision-Language Alignment in Radiology with Zero-Shot Multi-Task Capability | Jonggwon Park et.al. | 2504.07416 | null |
| 2025-04-09 | RayFronts: Open-Set Semantic Ray Frontiers for Online Scene Understanding and Exploration | Omar Alama et.al. | 2504.06994 | null |
| 2025-04-09 | Wheat3DGS: In-field 3D Reconstruction, Instance Segmentation and Phenotyping of Wheat Heads with Gaussian Splatting | Daiwei Zhang et.al. | 2504.06978 | null |
| 2025-04-09 | Domain Generalization through Attenuation of Domain-Specific Information | Reiji Saito et.al. | 2504.06781 | null |
| 2025-04-08 | SemiDAViL: Semi-supervised Domain Adaptation with Vision-Language Guidance for Semantic Segmentation | Hritam Basak et.al. | 2504.06389 | null |
| 2025-04-09 | Earth-Adapter: Bridge the Geospatial Domain Gaps with Mixture of Frequency Adaptation | Xiaoxing Hu et.al. | 2504.06220 | null |
| 2025-04-08 | WoundAmbit: Bridging State-of-the-Art Semantic Segmentation and Real-World Wound Care | Vanessa Borst et.al. | 2504.06185 | null |
| 2025-04-08 | Towards Varroa destructor mite detection using a narrow spectra illumination | Samuel Bielik et.al. | 2504.06099 | null |
| 2025-04-08 | econSG: Efficient and Multi-view Consistent Open-Vocabulary 3D Semantic Gaussians | Can Zhang et.al. | 2504.06003 | null |
| 2025-04-08 | Turin3D: Evaluating Adaptation Strategies under Label Scarcity in Urban LiDAR Segmentation with Semi-Supervised Techniques | Luca Barco et.al. | 2504.05882 | null |
| 2025-04-08 | DefMamba: Deformable Visual State Space Model | Leiye Liu et.al. | 2504.05794 | null |
| 2025-04-08 | Transferable Mask Transformer: Cross-domain Semantic Segmentation with Region-adaptive Transferability Estimation | Enming Zhang et.al. | 2504.05774 | null |
| 2025-04-07 | S^4M: Boosting Semi-Supervised Instance Segmentation with SAM | Heeji Yoon et.al. | 2504.05301 | null |
| 2025-04-07 | BoxSeg: Quality-Aware and Peer-Assisted Learning for Box-supervised Instance Segmentation | Jinxiang Lai et.al. | 2504.05137 | null |
| 2025-04-07 | Balancing Robustness and Efficiency in Embedded DNNs Through Activation Function Selection | Jon Gutiérrez Zaballa et.al. | 2504.05119 | null |
| 2025-04-07 | Prior2Former – Evidential Modeling of Mask Transformers for Assumption-Free Open-World Panoptic Segmentation | Sebastian Schmidt et.al. | 2504.04841 | null |
| 2025-04-07 | DFormerv2: Geometry Self-Attention for RGBD Semantic Segmentation | Bo-Wen Yin et.al. | 2504.04701 | link |
| 2025-04-06 | Statistical Guarantees Of False Discovery Rate In Medical Instance Segmentation Tasks Based on Conformal Risk Control | Mengxia Dai et.al. | 2504.04482 | null |
| 2025-04-06 | Evaluation framework for Image Segmentation Algorithms | Tatiana Merkulova et.al. | 2504.04435 | null |
| 2025-04-05 | CoMBO: Conflict Mitigation via Branched Optimization for Class Incremental Segmentation | Kai Fang et.al. | 2504.04156 | null |
| 2025-04-05 | DocSAM: Unified Document Image Segmentation via Query Decomposition and Heterogeneous Mixed Learning | Xiao-Hui Li et.al. | 2504.04085 | null |
| 2025-04-04 | Mamba as a Bridge: Where Vision Foundation Models Meet Vision Language Models for Domain-Generalized Semantic Segmentation | Xin Zhang et.al. | 2504.03193 | link |
| 2025-04-03 | Adaptive Frequency Enhancement Network for Remote Sensing Image Semantic Segmentation | Feng Gao et.al. | 2504.02647 | null |
| 2025-04-03 | Rip Current Segmentation: A Novel Benchmark and YOLOv8 Baseline Results | Andrei Dumitriu et.al. | 2504.02558 | link |
| 2025-04-03 | Delineate Anything: Resolution-Agnostic Field Boundary Delineation on Satellite Imagery | Mykola Lavreniuk et.al. | 2504.02534 | link |
| 2025-04-03 | Semantic segmentation of forest stands using deep learning | Håkon Næss Sandum et.al. | 2504.02471 | null |
| 2025-04-03 | Taylor Series-Inspired Local Structure Fitting Network for Few-shot Point Cloud Semantic Segmentation | Changshuo Wang et.al. | 2504.02454 | null |
| 2025-04-03 | Benchmark of Segmentation Techniques for Pelvic Fracture in CT and X-ray: Summary of the PENGWIN 2024 Challenge | Yudi Sang et.al. | 2504.02382 | null |
| 2025-04-03 | APSeg: Auto-Prompt Model with Acquired and Injected Knowledge for Nuclear Instance Segmentation and Classification | Liying Xu et.al. | 2504.02222 | null |
| 2025-04-02 | Scene-Centric Unsupervised Panoptic Segmentation | Oliver Hahn et.al. | 2504.01955 | link |
| 2025-04-02 | Overlap-Aware Feature Learning for Robust Unsupervised Domain Adaptation for 3D Semantic Segmentation | Junjie Chen et.al. | 2504.01668 | null |
| 2025-04-03 | Robust Unsupervised Domain Adaptation for 3D Point Cloud Segmentation Under Source Adversarial Attacks | Haosheng Li et.al. | 2504.01659 | null |
| 2025-04-02 | ProtoGuard-guided PROPEL: Class-Aware Prototype Enhancement and Progressive Labeling for Incremental 3D Point Cloud Segmentation | Haosheng Li et.al. | 2504.01648 | null |
| 2025-04-02 | Benchmarking the Spatial Robustness of DNNs via Natural and Adversarial Localized Corruptions | Giulia Marchiori Pietrosanti et.al. | 2504.01632 | null |
| 2025-04-02 | Instance Migration Diffusion for Nuclear Instance Segmentation in Pathology | Lirui Qi et.al. | 2504.01577 | null |
| 2025-04-02 | Semi-Supervised Biomedical Image Segmentation via Diffusion Models and Teacher-Student Co-Training | Luca Ciampi et.al. | 2504.01547 | null |
| 2025-04-02 | Beyond Nearest Neighbor Interpolation in Data Augmentation | Olivier Rukundo et.al. | 2504.01527 | null |
| 2025-04-02 | Multimodal Point Cloud Semantic Segmentation With Virtual Point Enhancement | Zaipeng Duan et.al. | 2504.01449 | null |
| 2025-04-02 | v-CLR: View-Consistent Learning for Open-World Instance Segmentation | Chang-Bin Zhang et.al. | 2504.01383 | link |
| 2025-03-31 | Pre-training with 3D Synthetic Data: Learning 3D Point Cloud Instance Segmentation from 3D Synthetic Scenes | Daichi Otsuka et.al. | 2503.24229 | null |
| 2025-03-31 | Spectral-Adaptive Modulation Networks for Visual Perception | Guhnoo Yun et.al. | 2503.23947 | null |
| 2025-03-31 | Bridge the Gap Between Visual and Linguistic Comprehension for Generalized Zero-shot Semantic Segmentation | Xiaoqing Guo et.al. | 2503.23806 | null |
| 2025-03-31 | Decoupled Distillation to Erase: A General Unlearning Method for Any Class-centric Tasks | Yu Zhou et.al. | 2503.23751 | null |
| 2025-03-31 | Semantic Packet Aggregation and Repeated Transmission for Text-to-Image Generation | Seunghun Lee et.al. | 2503.23734 | null |
| 2025-03-31 | CrossFormer: Cross-Segment Semantic Fusion for Document Segmentation | Tongke Ni et.al. | 2503.23671 | null |
| 2025-03-30 | BoundMatch: Boundary detection applied to semi-supervised segmentation for urban-driving scenes | Haruya Ishikawa et.al. | 2503.23519 | null |
| 2025-03-30 | Improving underwater semantic segmentation with underwater image quality attention and muti-scale aggregation attention | Xin Zuo et.al. | 2503.23422 | null |
| 2025-03-29 | Open-Vocabulary Semantic Segmentation with Uncertainty Alignment for Robotic Scene Understanding in Indoor Building Environments | Yifan Xu et.al. | 2503.23105 | null |
| 2025-03-28 | Enhancing DeepLabV3+ to Fuse Aerial and Satellite Images for Semantic Segmentation | Anas Berka et.al. | 2503.22909 | null |
| 2025-03-28 | KEVS: Enhancing Segmentation of Visceral Adipose Tissue in Pre-Cystectomy CT with Gaussian Kernel Density Estimation | Thomas Boucher et.al. | 2503.22592 | null |
| 2025-03-28 | A Dataset for Semantic Segmentation in the Presence of Unknowns | Zakaria Laskar et.al. | 2503.22309 | null |
| 2025-03-28 | Concept-Aware LoRA for Domain-Aligned Segmentation Dataset Generation | Minho Park et.al. | 2503.22172 | null |
| 2025-03-28 | Beyond Background Shift: Rethinking Instance Replay in Continual Semantic Segmentation | Hongmei Yin et.al. | 2503.22136 | null |
| 2025-03-28 | Semantic segmentation for building houses from wooden cubes | Ivan Beleacov et.al. | 2503.22125 | null |
| 2025-03-28 | Baseline Systems and Evaluation Metrics for Spatial Semantic Segmentation of Sound Scenes | Binh Thien Nguyen et.al. | 2503.22088 | null |
| 2025-03-28 | A Deep Learning Framework for Boundary-Aware Semantic Segmentation | Tai An et.al. | 2503.22050 | null |
| 2025-03-27 | Foveated Instance Segmentation | Hongyi Zeng et.al. | 2503.21854 | null |
| 2025-03-27 | Semantic Library Adaptation: LoRA Retrieval and Fusion for Open-Vocabulary Semantic Segmentation | Reza Qorbani et.al. | 2503.21780 | link |
| 2025-03-27 | A Unified Image-Dense Annotation Generation Model for Underwater Scenes | Hongkai Lin et.al. | 2503.21771 | link |
| 2025-03-27 | Towards Generating Realistic 3D Semantic Training Data for Autonomous Driving | Lucas Nunes et.al. | 2503.21449 | link |
| 2025-03-26 | Exploring CLIP’s Dense Knowledge for Weakly Supervised Semantic Segmentation | Zhiwei Yang et.al. | 2503.20826 | link |
| 2025-03-26 | Exploiting Temporal State Space Sharing for Video Semantic Segmentation | Syed Ariff Syed Hesham et.al. | 2503.20824 | null |
| 2025-03-26 | Assessing SAM for Tree Crown Instance Segmentation from Drone Imagery | Mélisande Teng et.al. | 2503.20199 | null |
| 2025-03-25 | Hyperdimensional Uncertainty Quantification for Multimodal Uncertainty Fusion in Autonomous Vehicles Perception | Luke Chen et.al. | 2503.20011 | null |
| 2025-03-25 | The Coralscapes Dataset: Semantic Scene Understanding in Coral Reefs | Jonathan Sauder et.al. | 2503.20000 | link |
| 2025-03-25 | LPOSS: Label Propagation Over Patches and Pixels for Open-vocabulary Semantic Segmentation | Vladan Stojnić et.al. | 2503.19777 | link |
| 2025-03-25 | OpenLex3D: A New Evaluation Benchmark for Open-Vocabulary 3D Scene Representations | Christina Kassab et.al. | 2503.19764 | null |
| 2025-03-25 | Show or Tell? Effectively prompting Vision-Language Models for semantic segmentation | Niccolo Avogaro et.al. | 2503.19647 | null |
| 2025-03-25 | Exploring Textual Semantics Diversity for Image Transmission in Semantic Communication Systems using Visual Language Model | Peishan Huang et.al. | 2503.19386 | null |
| 2025-03-25 | BIMII-Net: Brain-Inspired Multi-Iterative Interactive Network for RGB-T Road Scene Semantic Segmentation | Hanshuo Qiu et.al. | 2503.19303 | null |
| 2025-03-25 | Multiscale Feature Importance-based Bit Allocation for End-to-End Feature Coding for Machines | Junle Liu et.al. | 2503.19278 | null |
| 2025-03-25 | Context-Aware Semantic Segmentation: Enhancing Pixel-Level Understanding with Large Language Models for Advanced Vision Applications | Ben Rahman et.al. | 2503.19276 | null |
| 2025-03-24 | DINO in the Room: Leveraging 2D Foundation Models for 3D Segmentation | Karim Abou Zeid et.al. | 2503.18944 | link |
| 2025-03-24 | Exploring the Integration of Key-Value Attention Into Pure and Hybrid Transformers for Semantic Segmentation | DeShin Hwa et.al. | 2503.18862 | null |
| 2025-03-24 | EgoSurgery-HTS: A Dataset for Egocentric Hand-Tool Segmentation in Open Surgery Videos | Nathan Darjana et.al. | 2503.18755 | null |
| 2025-03-24 | HiRes-FusedMIM: A High-Resolution RGB-DSM Pre-trained Model for Building-Level Remote Sensing Applications | Guneet Mutreja et.al. | 2503.18540 | null |
| 2025-03-24 | Benchmarking Multi-modal Semantic Segmentation under Sensor Failures: Missing and Noisy Modality Robustness | Chenfei Liao et.al. | 2503.18445 | link |
| 2025-03-24 | PDDM: Pseudo Depth Diffusion Model for RGB-PD Semantic Segmentation Based in Complex Indoor Scenes | Xinhua Xu et.al. | 2503.18393 | null |
| 2025-03-24 | MaSS13K: A Matting-level Semantic Segmentation Benchmark | Chenxi Xie et.al. | 2503.18364 | null |
| 2025-03-23 | PanoGS: Gaussian-based Panoptic Segmentation for 3D Open Vocabulary Scene Understanding | Hongjia Zhai et.al. | 2503.18107 | null |
| 2025-03-23 | Co-SemDepth: Fast Joint Semantic Segmentation and Depth Estimation on Aerial Images | Yara AlaaEldin et.al. | 2503.17982 | link |
| 2025-03-23 | FisherTune: Fisher-Guided Robust Tuning of Vision Foundation Models for Domain Generalized Segmentation | Dong Zhao et.al. | 2503.17940 | null |
| 2025-03-21 | Center-guided Classifier for Semantic Segmentation of Remote Sensing Images | Wei Zhang et.al. | 2503.16963 | null |
| 2025-03-21 | Seg2Box: 3D Object Detection by Point-Wise Semantics Supervision | Maoji Zheng et.al. | 2503.16811 | null |
| 2025-03-20 | SAGE: Semantic-Driven Adaptive Gaussian Splatting in Extended Reality | Chiara Schiavo et.al. | 2503.16747 | null |
| 2025-03-20 | Panoptic-CUDAL Technical Report: Rural Australia Point Cloud Dataset in Rainy Conditions | Tzu-Yun Tseng et.al. | 2503.16378 | null |
| 2025-03-20 | M2N2V2: Multi-Modal Unsupervised and Training-free Interactive Segmentation | Markus Karmann et.al. | 2503.16254 | null |
| 2025-03-20 | Controllable Segmentation-Based Text-Guided Style Editing | Jingwen Li et.al. | 2503.16129 | null |
| 2025-03-20 | No Thing, Nothing: Highlighting Safety-Critical Classes for Robust LiDAR Semantic Segmentation in Adverse Weather | Junsung Park et.al. | 2503.15910 | null |
| 2025-03-19 | High Temporal Consistency through Semantic Similarity Propagation in Semi-Supervised Video Semantic Segmentation for Autonomous Flight | Cédric Vincent et.al. | 2503.15676 | link |
| 2025-03-19 | Transport-Related Surface Detection with Machine Learning: Analyzing Temporal Trends in Madrid and Vienna | Miguel Ureña Pliego et.al. | 2503.15653 | link |
| 2025-03-19 | CAM-Seg: A Continuous-valued Embedding Approach for Semantic Image Generation | Masud Ahmed et.al. | 2503.15617 | link |
| 2025-03-19 | SUM Parts: Benchmarking Part-Level Semantic Segmentation of Urban Meshes | Weixiao Gao et.al. | 2503.15300 | null |
| 2025-03-19 | Semantic Segmentation of Transparent and Opaque Drinking Glasses with the Help of Zero-shot Learning | Annalena Blänsdorf et.al. | 2503.15004 | null |
| 2025-03-19 | USAM-Net: A U-Net-based Network for Improved Stereo Correspondence and Scene Depth Estimation using Features from a Pre-trained Image Segmentation network | Joseph Emmanuel DL Dayo et.al. | 2503.14950 | null |
| 2025-03-19 | SemanticFlow: A Self-Supervised Framework for Joint Scene Flow Prediction and Instance Segmentation in Dynamic Environments | Yinqi Chen et.al. | 2503.14837 | null |
| 2025-03-18 | Rethinking End-to-End 2D to 3D Scene Segmentation in Gaussian Splatting | Runsong Zhu et.al. | 2503.14029 | link |
| 2025-03-18 | PSA-SSL: Pose and Size-aware Self-Supervised Learning on LiDAR Point Clouds | Barza Nisar et.al. | 2503.13914 | link |
| 2025-03-18 | Exploiting Inherent Class Label: Towards Robust Scribble Supervised Semantic Segmentation | Xinliang Zhang et.al. | 2503.13895 | link |
| 2025-03-17 | SED-MVS: Segmentation-Driven and Edge-Aligned Deformation Multi-View Stereo with Depth Restoration and Occlusion Constraint | Zhenlong Yuan et.al. | 2503.13721 | null |
| 2025-03-17 | Let Synthetic Data Shine: Domain Reassembly and Soft-Fusion for Single Domain Generalization | Hao Li et.al. | 2503.13617 | null |
| 2025-03-17 | Clustering is back: Reaching state-of-the-art LiDAR instance segmentation without training | Corentin Sautier et.al. | 2503.13203 | null |
| 2025-03-17 | 3D Hierarchical Panoptic Segmentation in Real Orchard Environments Across Different Sensors | Matteo Sodano et.al. | 2503.13188 | null |
| 2025-03-17 | DehazeMamba: SAR-guided Optical Remote Sensing Image Dehazing with Adaptive State Space Model | Zhicheng Zhao et.al. | 2503.13073 | null |
| 2025-03-17 | Adaptive Transformer Attention and Multi-Scale Fusion for Spine 3D Segmentation | Yanlin Xiang et.al. | 2503.12853 | null |
| 2025-03-17 | LangDA: Building Context-Awareness via Language for Domain Adaptive Semantic Segmentation | Chang Liu et.al. | 2503.12780 | null |
| 2025-03-17 | TransDiff: Diffusion-Based Method for Manipulating Transparent Objects Using a Single RGB-D Image | Haoxiao Wang et.al. | 2503.12779 | null |
| 2025-03-16 | Point Cloud Based Scene Segmentation: A Survey | Dan Halperin et.al. | 2503.12595 | null |
| 2025-03-16 | BFANet: Revisiting 3D Semantic Segmentation with Boundary Feature Analysis | Weiguang Zhao et.al. | 2503.12539 | null |
| 2025-03-16 | SPC-GS: Gaussian Splatting with Semantic-Prompt Consistency for Indoor Open-World Free-view Synthesis from Sparse Inputs | Guibiao Liao et.al. | 2503.12535 | null |
| 2025-03-16 | Shape Bias and Robustness Evaluation via Cue Decomposition for Image Classification and Segmentation | Edgar Heinert et.al. | 2503.12453 | null |
| 2025-03-14 | COIN: Confidence Score-Guided Distillation for Annotation-Free Cell Segmentation | Sanghyun Jo et.al. | 2503.11439 | link |
| 2025-03-14 | CyclePose – Leveraging Cycle-Consistency for Annotation-Free Nuclei Segmentation in Fluorescence Microscopy | Jonas Utz et.al. | 2503.11266 | null |
| 2025-03-14 | SpaceSeg: A High-Precision Intelligent Perception Segmentation Method for Multi-Spacecraft On-Orbit Targets | Hao Liu et.al. | 2503.11133 | null |
| 2025-03-14 | A Novel Decomposed Feature-Oriented Framework for Open-Set Semantic Segmentation on LiDAR Data | Wenbang Deng et.al. | 2503.11097 | null |
| 2025-03-12 | Knowledge Consultation for Semi-Supervised Semantic Segmentation | Thuan Than et.al. | 2503.10693 | null |
| 2025-03-13 | RoMA: Scaling up Mamba-based Foundation Models for Remote Sensing | Fengxiang Wang et.al. | 2503.10392 | link |
| 2025-03-13 | OSMa-Bench: Evaluating Open Semantic Mapping Under Varying Lighting Conditions | Maxim Popov et.al. | 2503.10331 | link |
| 2025-03-12 | CleverDistiller: Simple and Spatially Consistent Cross-modal Distillation | Hariprasath Govindarajan et.al. | 2503.09878 | null |
| 2025-03-12 | Active Learning Inspired ControlNet Guidance for Augmenting Semantic Segmentation Datasets | Hannah Kniesel et.al. | 2503.09221 | null |
| 2025-03-12 | Learning Appearance and Motion Cues for Panoptic Tracking | Juana Valeria Hurtado et.al. | 2503.09191 | null |
| 2025-03-11 | SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories | Muzhi Zhu et.al. | 2503.08625 | link |
| 2025-03-11 | SAS: Segment Any 3D Scene with Integrated 2D Priors | Zhuoyuan Li et.al. | 2503.08512 | null |
| 2025-03-11 | WildSeg3D: Segment Any 3D Objects in the Wild from 2D Images | Yansong Guo et.al. | 2503.08407 | null |
| 2025-03-11 | nnInteractive: Redefining 3D Promptable Segmentation | Fabian Isensee et.al. | 2503.08373 | link |
| 2025-03-11 | SegDesicNet: Lightweight Semantic Segmentation in Remote Sensing with Geo-Coordinate Embeddings for Domain Adaptation | Sachin Verma et.al. | 2503.08290 | null |
| 2025-03-11 | Structural and Statistical Texture Knowledge Distillation and Learning for Segmentation | Deyi Ji et.al. | 2503.08043 | null |
| 2025-03-11 | DiffEGG: Diffusion-Driven Edge Generation as a Pixel-Annotation-Free Alternative for Instance Annotation | Sanghyun Jo et.al. | 2503.07982 | null |
| 2025-03-10 | Can Generative Geospatial Diffusion Models Excel as Discriminative Geospatial Foundation Models? | Yuru Jia et.al. | 2503.07890 | link |
| 2025-03-10 | REF-VLM: Triplet-Based Referring Paradigm for Unified Visual Decoding | Yan Tai et.al. | 2503.07413 | link |
| 2025-03-10 | Semantic Communications with Computer Vision Sensing for Edge Video Transmission | Yubo Peng et.al. | 2503.07252 | null |
| 2025-03-10 | OmniSAM: Omnidirectional Segment Anything Model for UDA in Panoramic Semantic Segmentation | Ding Zhong et.al. | 2503.07098 | null |
| 2025-03-10 | Approximate Size Targets Are Sufficient for Accurate Semantic Segmentation | Xingye Fan et.al. | 2503.06954 | null |
| 2025-03-10 | Aligning Instance-Semantic Sparse Representation towards Unsupervised Object Segmentation and Shape Abstraction with Repeatable Primitives | Jiaxin Li et.al. | 2503.06947 | null |
| 2025-03-10 | HierDAMap: Towards Universal Domain Adaptive BEV Mapping via Hierarchical Perspective Priors | Siyu Li et.al. | 2503.06821 | null |
| 2025-03-09 | CoDa-4DGS: Dynamic Gaussian Splatting with Context and Deformation Awareness for Autonomous Driving | Rui Song et.al. | 2503.06744 | null |
| 2025-03-09 | Continuous Online Adaptation Driven by User Interaction for Medical Image Segmentation | Wentian Xu et.al. | 2503.06717 | null |
| 2025-03-09 | MemorySAM: Memorize Modalities and Semantics with Segment Anything Model 2 for Multi-modal Semantic Segmentation | Chenfei Liao et.al. | 2503.06700 | null |
| 2025-03-09 | Asymmetric Decision-Making in Online Knowledge Distillation: Unifying Consensus and Divergence | Zhaowei Chen et.al. | 2503.06685 | null |
| 2025-03-07 | Joint 3D Point Cloud Segmentation using Real-Sim Loop: From Panels to Trees and Branches | Tian Qiu et.al. | 2503.05630 | null |
| 2025-03-07 | TomatoScanner: phenotyping tomato fruit based on only RGB image | Xiaobei Zhao et.al. | 2503.05568 | null |
| 2025-03-07 | S4M: Segment Anything with 4 Extreme Points | Adrien Meyer et.al. | 2503.05534 | null |
| 2025-03-07 | Kaiwu: A Multimodal Manipulation Dataset and Framework for Robot Learning and Human-Robot Interaction | Shuo Jiang et.al. | 2503.05231 | null |
| 2025-03-06 | EvidMTL: Evidential Multi-Task Learning for Uncertainty-Aware Semantic Surface Mapping from Monocular RGB Images | Rohit Menon et.al. | 2503.04441 | null |
| 2025-03-06 | PointsToWood: A deep learning framework for complete canopy leaf-wood segmentation of TLS data across diverse European forests | Harry J. F. Owen et.al. | 2503.04420 | null |
| 2025-03-06 | Geometry-Constrained Monocular Scale Estimation Using Semantic Segmentation for Dynamic Scenes | Hui Zhang et.al. | 2503.04235 | null |
| 2025-03-06 | MASTER: Multimodal Segmentation with Text Prompts | Fuyang Liu et.al. | 2503.04199 | null |
| 2025-03-06 | Towards Intelligent Transportation with Pedestrians and Vehicles In-the-Loop: A Surveillance Video-Assisted Federated Digital Twin Framework | Xiaolong Li et.al. | 2503.04170 | null |
| 2025-03-06 | H3O: Hyper-Efficient 3D Occupancy Prediction with Heterogeneous Supervision | Yunxiao Shi et.al. | 2503.04059 | null |
| 2025-03-06 | GaussianGraph: 3D Gaussian-based Scene Graph Generation for Open-world Scene Understanding | Xihan Wang et.al. | 2503.04034 | null |
| 2025-03-06 | DSV-LFS: Unifying LLM-Driven Semantic Cues with Visual Features for Robust Few-Shot Segmentation | Amin Karimi et.al. | 2503.04006 | null |
| 2025-03-05 | COARSE: Collaborative Pseudo-Labeling with Coarse Real Labels for Off-Road Semantic Segmentation | Aurelio Noca et.al. | 2503.03947 | null |
| 2025-03-05 | SurgiSAM2: Fine-tuning a foundational model for surgical video anatomy segmentation and detection | Devanish N. Kamtam et.al. | 2503.03942 | null |
| 2025-03-05 | Automatic Drywall Analysis for Progress Tracking and Quality Control in Construction | Mariusz Trzeciakiewicz et.al. | 2503.03422 | null |
| 2025-03-05 | Golden Cudgel Network for Real-Time Semantic Segmentation | Guoyu Yang et.al. | 2503.03325 | null |
| 2025-03-05 | Label-Efficient LiDAR Semantic Segmentation with 2D-3D Vision Transformer Adapters | Julia Hindel et.al. | 2503.03299 | null |
| 2025-03-05 | Interactive Segmentation and Report Generation for CT Images | Yannian Gu et.al. | 2503.03294 | null |
| 2025-03-05 | Car-STAGE: Automated framework for large-scale high-dimensional simulated time-series data generation based on user-defined criteria | Asma A. Almutairi et.al. | 2503.03100 | null |
| 2025-03-05 | AHCPTQ: Accurate and Hardware-Compatible Post-Training Quantization for Segment Anything Model | Wenlun Zhang et.al. | 2503.03088 | null |
| 2025-03-04 | Unveiling the Potential of Segment Anything Model 2 for RGB-Thermal Semantic Segmentation with Language Guidance | Jiayi Zhao et.al. | 2503.02581 | link |
| 2025-03-04 | MM-OR: A Large Multimodal Operating Room Dataset for Semantic Understanding of High-Intensity Surgical Environments | Ege Özsoy et.al. | 2503.02579 | link |
| 2025-03-04 | TS-CGNet: Temporal-Spatial Fusion Meets Centerline-Guided Diffusion for BEV Mapping | Xinying Hong et.al. | 2503.02578 | null |
| 2025-03-04 | Exploring Token-Level Augmentation in Vision Transformer for Semi-Supervised Semantic Segmentation | Dengke Zhang et.al. | 2503.02459 | null |
| 2025-03-04 | Label-Efficient LiDAR Panoptic Segmentation | Ahmet Selim Çanakçı et.al. | 2503.02372 | null |
| 2025-03-03 | SAGE: A Framework of Precise Retrieval for RAG | Jintao Zhang et.al. | 2503.01713 | null |
| 2025-03-04 | UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface | Hao Tang et.al. | 2503.01342 | link |
| 2025-03-03 | OnlineAnySeg: Online Zero-Shot 3D Segmentation by Visual Foundation Model Guided 2D Mask Merging | Yijie Tang et.al. | 2503.01309 | null |
| 2025-03-03 | Convex Hull-based Algebraic Constraint for Visual Quadric SLAM | Xiaolong Yu et.al. | 2503.01254 | link |
| 2025-03-03 | Identity documents recognition and detection using semantic segmentation with convolutional neural network | Mykola Kozlenko et.al. | 2503.01085 | null |
| 2025-02-28 | The Common Objects Underwater (COU) Dataset for Robust Underwater Object Detection | Rishi Mukherjee et.al. | 2502.20651 | null |
| 2025-02-27 | Multi-Scale Neighborhood Occupancy Masked Autoencoder for Self-Supervised Learning in LiDAR Point Clouds | Mohamed Abdelsamad et.al. | 2502.20316 | null |
| 2025-02-27 | OverLoCK: An Overview-first-Look-Closely-next ConvNet with Context-Mixing Dynamic Kernels | Meng Lou et.al. | 2502.20087 | link |
| 2025-02-28 | SegLocNet: Multimodal Localization Network for Autonomous Driving via Bird’s-Eye-View Segmentation | Zijie Zhou et.al. | 2502.20077 | link |
| 2025-03-03 | 3D-AffordanceLLM: Harnessing Large Language Models for Open-Vocabulary Affordance Detection in 3D Worlds | Hengshuo Chu et.al. | 2502.20041 | null |
| 2025-02-27 | Learning Mask Invariant Mutual Information for Masked Image Modeling | Tao Huang et.al. | 2502.19718 | null |
| 2025-02-28 | You Only Click Once: Single Point Weakly Supervised 3D Instance Segmentation for Autonomous Driving | Guangfeng Jiang et.al. | 2502.19698 | null |
| 2025-02-26 | Knowledge Distillation for Semantic Segmentation: A Label Space Unification Approach | Anton Backhaus et.al. | 2502.19177 | null |
| 2025-02-26 | Enhanced Neuromorphic Semantic Segmentation Latency through Stream Event | D. Hareb et.al. | 2502.18982 | null |
| 2025-02-28 | OpenFly: A Versatile Toolchain and Large-scale Benchmark for Aerial Vision-Language Navigation | Yunpeng Gao et.al. | 2502.18041 | null |
| 2025-02-25 | CAML: Collaborative Auxiliary Modality Learning for Multi-Agent Systems | Rui Liu et.al. | 2502.17821 | null |
| 2025-02-24 | CLIMB-3D: Continual Learning for Imbalanced 3D Instance Segmentation | Vishal Thengane et.al. | 2502.17429 | link |
| 2025-02-25 | DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks | Canyu Zhao et.al. | 2502.17157 | link |
| 2025-02-24 | SpecDM: Hyperspectral Dataset Synthesis with Pixel-level Semantic Annotations | Wendi Liu et.al. | 2502.17056 | null |
| 2025-02-25 | VPNeXt – Rethinking Dense Decoding for Plain Vision Transformer | Xikai Tang et.al. | 2502.16654 | null |
| 2025-02-23 | Dr. Splat: Directly Referring 3D Gaussian Splatting via Direct Language Embedding Registration | Kim Jun-Seong et.al. | 2502.16652 | null |
| 2025-02-23 | OpenVox: Real-time Instance-level Open-vocabulary Probabilistic Voxel Representation | Yinan Deng et.al. | 2502.16528 | null |
| 2025-02-23 | Deep learning approaches to surgical video segmentation and object detection: A Scoping Review | Devanish N. Kamtam et.al. | 2502.16459 | null |
| 2025-02-22 | Pointmap Association and Piecewise-Plane Constraint for Consistent and Compact 3D Gaussian Segmentation Field | Wenhao Hu et.al. | 2502.16303 | null |
| 2025-02-22 | Importance-Aware Source-Channel Coding for Multi-Modal Task-Oriented Semantic Communication | Yi Ma et.al. | 2502.16194 | null |
| 2025-02-22 | FeatSharp: Your Vision Model Features, Sharper | Mike Ranzinger et.al. | 2502.16025 | link |
| 2025-02-21 | Aligning Task- and Reconstruction-Oriented Communications for Edge Intelligence | Yufeng Diao et.al. | 2502.15472 | null |
| 2025-02-21 | DynamicGSG: Dynamic 3D Gaussian Scene Graphs for Environment Adaptation | Luzhou Ge et.al. | 2502.15309 | link |
| 2025-02-21 | Confidence-Weighted Boundary-Aware Learning for Semi-Supervised Semantic Segmentation | Ebenezer Tarubinga et.al. | 2502.15152 | link |
| 2025-02-20 | RendBEV: Semantic Novel View Synthesis for Self-Supervised Bird’s Eye View Segmentation | Henrique Piñeiro Monteagudo et.al. | 2502.14792 | null |
| 2025-02-20 | Multi-dataset synergistic in supervised learning to pre-label structural components in point clouds from shell construction scenes | Lukas Rauch et.al. | 2502.14721 | null |
| 2025-02-20 | Reliable Explainability of Deep Learning Spatial-Spectral Classifiers for Improved Semantic Segmentation in Autonomous Driving | Jon Gutiérrez-Zaballa et.al. | 2502.14416 | null |
| 2025-02-20 | Bayesian SegNet for Semantic Segmentation with Improved Interpretation of Microstructural Evolution During Irradiation of Materials | Marjolein Oostrom et.al. | 2502.14184 | null |
| 2025-02-19 | SegRet: An Efficient Design for Semantic Segmentation with Retentive Network | Zhiyuan Li et.al. | 2502.14014 | link |
| 2025-02-19 | Remote Sensing Semantic Segmentation Quality Assessment based on Vision Language Model | Huiying Shi et.al. | 2502.13990 | null |
| 2025-02-19 | MGFI-Net: A Multi-Grained Feature Integration Network for Enhanced Medical Image Segmentation | Yucheng Zeng et.al. | 2502.13808 | null |
| 2025-02-19 | CARE: Confidence-Aware Regression Estimation of building density fine-tuning EO Foundation Models | Nikolaos Dionelis et.al. | 2502.13734 | null |
| 2025-02-18 | WeedsGalore: A Multispectral and Multitemporal UAV-based Dataset for Crop and Weed Segmentation in Agricultural Maize Fields | Ekin Celikkan et.al. | 2502.13103 | link |
| 2025-02-18 | Enhancing Power Grid Inspections with Machine Learning | Diogo Lavado et.al. | 2502.13037 | null |
| 2025-02-18 | DAMamba: Vision State Space Model with Dynamic Adaptive Scan | Tanzhe Li et.al. | 2502.12627 | null |
| 2025-02-17 | From Open-Vocabulary to Vocabulary-Free Semantic Segmentation | Klara Reichard et.al. | 2502.11891 | null |
| 2025-02-16 | Leveraging Multimodal-LLMs Assisted by Instance Segmentation for Intelligent Traffic Monitoring | Murat Arda Onsu et.al. | 2502.11304 | null |
| 2025-02-16 | Text-promptable Propagation for Referring Medical Image Sequence Segmentation | Runtian Yuan et.al. | 2502.11093 | null |
| 2025-02-16 | Detecting Cadastral Boundary from Satellite Images Using U-Net model | Neda Rahimpour Anaraki et.al. | 2502.11044 | null |
| 2025-02-15 | NPSim: Nighttime Photorealistic Simulation From Daytime Images With Monocular Inverse Rendering and Ray Tracing | Shutong Zhang et.al. | 2502.10720 | null |
| 2025-02-15 | Deep Learning for Wound Tissue Segmentation: A Comprehensive Evaluation using A Novel Dataset | Muhammad Ashad Kabir et.al. | 2502.10652 | null |
| 2025-02-14 | Artificial Intelligence to Assess Dental Findings from Panoramic Radiographs – A Multinational Study | Yin-Chih Chelsea Wang et.al. | 2502.10277 | null |
| 2025-02-14 | FrGNet: A fourier-guided weakly-supervised framework for nuclear instance segmentation | Peng Ling et.al. | 2502.09874 | null |
| 2025-02-12 | Towards Fine-grained Interactive Segmentation in Images and Videos | Yuan Yao et.al. | 2502.09660 | null |
| 2025-02-13 | Instance Segmentation of Scene Sketches Using Natural Image Priors | Mia Tang et.al. | 2502.09608 | null |
| 2025-02-13 | SQ-GAN: Semantic Image Communications Using Masked Vector Quantization | Francesco Pezone et.al. | 2502.09520 | null |
| 2025-02-13 | FLARES: Fast and Accurate LiDAR Multi-Range Semantic Segmentation | Bin Yang et.al. | 2502.09274 | null |
| 2025-02-13 | Memory-based Ensemble Learning in CMR Semantic Segmentation | Yiwei Liu et.al. | 2502.09269 | link |
| 2025-02-13 | Latents of latents to delineate pixels: hybrid Matryoshka autoencoder-to-U-Net pairing for segmenting large medical images in GPU-poor and low-data regimes | Tahir Syed et.al. | 2502.08988 | null |
| 2025-02-12 | HistoSmith: Single-Stage Histology Image-Label Generation via Conditional Latent Diffusion for Enhanced Cell Segmentation and Classification | Valentina Vadori et.al. | 2502.08754 | link |
| 2025-02-12 | Generalized Class Discovery in Instance Segmentation | Cuong Manh Hoang et.al. | 2502.08149 | null |
| 2025-02-12 | Knowledge Swapping via Learning and Unlearning | Mingyu Xing et.al. | 2502.08075 | null |
| 2025-02-11 | Efficient Continuous Group Convolutions for Local SE(3) Equivariance in 3D Point Clouds | Lisa Weijler et.al. | 2502.07505 | link |
| 2025-02-11 | A Survey on Mamba Architecture for Vision Applications | Fady Ibrahim et.al. | 2502.07161 | null |
| 2025-02-09 | A Comprehensive Review of U-Net and Its Variants: Advances and Applications in Medical Image Segmentation | Wang Jiangtao et.al. | 2502.06895 | null |
| 2025-02-10 | SAMRefiner: Taming Segment Anything Model for Universal Mask Refinement | Yuqi Lin et.al. | 2502.06756 | null |
| 2025-02-10 | A Large-scale AI-generated Image Inpainting Benchmark | Paschalis Giakoumoglou et.al. | 2502.06593 | link |
| 2025-02-11 | Enhancing Ground-to-Aerial Image Matching for Visual Misinformation Detection Using Semantic Segmentation | Emanuele Mule et.al. | 2502.06288 | null |
| 2025-02-10 | Unsupervised deep learning for semantic segmentation of multispectral LiDAR forest point clouds | Lassi Ruoppa et.al. | 2502.06227 | null |
| 2025-02-09 | Traveling Waves Integrate Spatial Information Into Spectral Representations | Mozes Jacobs et.al. | 2502.06034 | null |
| 2025-02-11 | VFX Creator: Animated Visual Effect Generation with Controllable Diffusion Transformer | Xinyu Liu et.al. | 2502.05979 | null |
| 2025-02-09 | LegalSeg: Unlocking the Structure of Indian Legal Judgments Through Rhetorical Role Classification | Shubham Kumar Nigam et.al. | 2502.05836 | null |
| 2025-02-08 | Convolutional Neural Network Segmentation for Satellite Imagery Data to Identify Landforms Using U-Net Architecture | Mitul Goswami et.al. | 2502.05476 | null |
| 2025-02-08 | LMS-Net: A Learned Mumford-Shah Network For Few-Shot Medical Image Segmentation | Shengdong Zhang et.al. | 2502.05473 | link |
| 2025-02-08 | A Novel Convolutional-Free Method for 3D Medical Imaging Segmentation | Canxuan Gang et.al. | 2502.05396 | null |
| 2025-02-07 | IPSeg: Image Posterior Mitigates Semantic Drift in Class-Incremental Segmentation | Xiao Yu et.al. | 2502.04870 | null |
| 2025-02-07 | AIQViT: Architecture-Informed Post-Training Quantization for Vision Transformers | Runqing Jiang et.al. | 2502.04628 | null |
| 2025-02-05 | DILLEMA: Diffusion and Large Language Models for Multi-Modal Augmentation | Luciano Baresi et.al. | 2502.04378 | link |
| 2025-02-06 | Beyond the Final Layer: Hierarchical Query Fusion Transformer with Agent-Interpolation Initialization for 3D Instance Segmentation | Jiahao Lu et.al. | 2502.04139 | null |
| 2025-02-06 | Adaptive Margin Contrastive Learning for Ambiguity-aware 3D Semantic Segmentation | Yang Chen et.al. | 2502.04111 | null |
| 2025-02-06 | LeAP: Consistent multi-domain 3D labeling using Foundation Models | Simon Gebraad et.al. | 2502.03901 | null |
| 2025-02-06 | Optimized Unet with Attention Mechanism for Multi-Scale Semantic Segmentation | Xuan Li et.al. | 2502.03813 | null |
| 2025-02-05 | Gompertz Linear Units: Leveraging Asymmetry for Enhanced Learning Dynamics | Indrashis Das et.al. | 2502.03654 | link |
| 2025-02-05 | ZISVFM: Zero-Shot Object Instance Segmentation in Indoor Robotic Environments with Vision Foundation Models | Ying Zhang et.al. | 2502.03266 | link |
| 2025-02-05 | Disentangling CLIP Features for Enhanced Localized Understanding | Samyak Rawelekar et.al. | 2502.02977 | null |
| 2025-02-05 | From DeepSense to Open RAN: AI/ML Advancements in Dynamic Spectrum Sensing and Their Applications | Ryan Barker et.al. | 2502.02889 | null |
| 2025-02-04 | Muographic Image Upsampling with Machine Learning for Built Infrastructure Applications | William O’Donnell et.al. | 2502.02624 | null |
| 2025-02-04 | COCONut-PanCap: Joint Panoptic Segmentation and Grounded Captions for Fine-Grained Understanding and Generation | Xueqing Deng et.al. | 2502.02589 | null |
| 2025-02-04 | Mosaic3D: Foundation Dataset and Model for Open-Vocabulary 3D Segmentation | Junha Lee et.al. | 2502.02548 | null |
| 2025-02-04 | Mind the Gap: Evaluating Patch Embeddings from General-Purpose and Histopathology Foundation Models for Cell Segmentation and Classification | Valentina Vadori et.al. | 2502.02471 | null |
| 2025-02-04 | Transfer Risk Map: Mitigating Pixel-level Negative Transfer in Medical Segmentation | Shutong Duan et.al. | 2502.02340 | null |
| 2025-02-04 | UNIP: Rethinking Pre-trained Attention Patterns for Infrared Semantic Segmentation | Tao Zhang et.al. | 2502.02257 | link |
| 2025-02-04 | Deep Ensemble approach for Enhancing Brain Tumor Segmentation in Resource-Limited Settings | Jeremiah Fadugba et.al. | 2502.02179 | null |
| 2025-02-04 | Memory Efficient Transformer Adapter for Dense Predictions | Dong Zhang et.al. | 2502.01962 | null |
| 2025-02-03 | Deep Unfolding Multi-modal Image Fusion Network via Attribution Analysis | Haowen Bai et.al. | 2502.01467 | null |
| 2025-02-03 | Temporal-consistent CAMs for Weakly Supervised Video Segmentation in Waste Sorting | Andrea Marelli et.al. | 2502.01455 | null |
| 2025-02-03 | ConceptVAE: Self-Supervised Fine-Grained Concept Disentanglement from 2D Echocardiographies | Costin F. Ciusdel et.al. | 2502.01335 | null |
| 2025-01-31 | Let Human Sketches Help: Empowering Challenging Image Segmentation Task with Freehand Sketches | Ying Zang et.al. | 2501.19329 | null |
| 2025-01-31 | GO: The Great Outdoors Multimodal Dataset | Peng Jiang et.al. | 2501.19274 | null |
| 2025-01-31 | Medical Semantic Segmentation with Diffusion Pretrain | David Li et.al. | 2501.19265 | null |
| 2025-01-31 | ContextFormer: Redefining Efficiency in Semantic Segmentation | Mian Muhammad Naeem Abid et.al. | 2501.19255 | null |
| 2025-01-31 | Integrating Semi-Supervised and Active Learning for Semantic Segmentation | Wanli Ma et.al. | 2501.19227 | null |
| 2025-01-31 | Improving vision-language alignment with graph spiking hybrid Networks | Siyu Zhang et.al. | 2501.19069 | null |
| 2025-01-31 | SynthmanticLiDAR: A Synthetic Dataset for Semantic Segmentation on LiDAR Imaging | Javier Montalvo et.al. | 2501.19035 | null |
| 2025-01-31 | Project-and-Fuse: Improving RGB-D Semantic Segmentation via Graph Convolution Networks | Xiaoyan Jiang et.al. | 2501.18851 | null |
| 2025-01-30 | INT: Instance-Specific Negative Mining for Task-Generic Promptable Segmentation | Jian Hu et.al. | 2501.18753 | null |
| 2025-02-03 | Advances in Multimodal Adaptation and Generalization: From Traditional Approaches to Foundation Models | Hao Dong et.al. | 2501.18592 | link |
| 2025-01-30 | Tuning Vision Foundation Model via Test-Time Prompt-Guided Training for VFSS Segmentations | Chengxi Zeng et.al. | 2501.18474 | null |
| 2025-01-30 | Ground Awareness in Deep Learning for Large Outdoor Point Cloud Segmentation | Kevin Qiu et.al. | 2501.18246 | null |
| 2025-01-30 | ContourFormer:Real-Time Contour-Based End-to-End Instance Segmentation Transformer | Weiwei Yao et.al. | 2501.17688 | null |
| 2025-01-29 | Efficient Redundancy Reduction for Open-Vocabulary Semantic Segmentation | Lin Chen et.al. | 2501.17642 | null |
| 2025-01-29 | 3DSES: an indoor Lidar point cloud segmentation dataset with real and pseudo-labels from a 3D model | Maxime Mérizette et.al. | 2501.17534 | null |
| 2025-01-29 | Beyond-Labels: Advancing Open-Vocabulary Segmentation With Vision-Language Models | Muhammad Atta ur Rahman et.al. | 2501.16769 | null |
| 2025-01-28 | AdaSemSeg: An Adaptive Few-shot Semantic Segmentation of Seismic Facies | Surojit Saha et.al. | 2501.16760 | null |
| 2025-01-28 | SSF-PAN: Semantic Scene Flow-Based Perception for Autonomous Navigation in Traffic Scenarios | Yinqi Chen et.al. | 2501.16754 | null |
| 2025-01-27 | Cross-Domain Semantic Segmentation with Large Language Model-Assisted Descriptor Generation | Philip Hughes et.al. | 2501.16467 | null |
| 2025-01-27 | DynAlign: Unsupervised Dynamic Taxonomy Alignment for Cross-Domain Segmentation | Han Sun et.al. | 2501.16410 | null |
| 2025-01-27 | The Linear Attention Resurrection in Vision Transformer | Chuanyang Zheng et.al. | 2501.16182 | null |
| 2025-01-27 | D-PLS: Decoupled Semantic Segmentation for 4D-Panoptic-LiDAR-Segmentation | Maik Steinhauser et.al. | 2501.15870 | null |
| 2025-01-26 | iFormer: Integrating ConvNet and Transformer for Mobile Application | Chuanyang Zheng et.al. | 2501.15369 | link |
| 2025-01-25 | A Training-free Synthetic Data Selection Method for Semantic Segmentation | Hao Tang et.al. | 2501.15201 | null |
| 2025-01-24 | 3DLabelProp: Geometric-Driven Domain Generalization for LiDAR Semantic Segmentation in Autonomous Driving | Jules Sanchez et.al. | 2501.14605 | link |
| 2025-01-24 | Effective Defect Detection Using Instance Segmentation for NDI | Ashiqur Rahman et.al. | 2501.14149 | null |
| 2025-01-23 | ME-CPT: Multi-Task Enhanced Cross-Temporal Point Transformer for Urban 3D Change Detection | Luqi Zhang et.al. | 2501.14004 | link |
| 2025-01-23 | IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models | Jiayi Lei et.al. | 2501.13920 | link |
| 2025-01-23 | Pix2Cap-COCO: Advancing Visual Comprehension via Pixel-Level Captioning | Zuyao You et.al. | 2501.13893 | link |
| 2025-01-23 | Where Do You Go? Pedestrian Trajectory Prediction using Scene Features | Mohammad Ali Rezaei et.al. | 2501.13848 | null |
| 2025-01-23 | Overcoming Support Dilution for Robust Few-shot Semantic Segmentation | Wailing Tang et.al. | 2501.13529 | null |
| 2025-01-22 | Revisiting Data Augmentation for Ultrasound Images | Adam Tupper et.al. | 2501.13193 | link |
| 2025-01-22 | A Novel Scene Coupling Semantic Mask Network for Remote Sensing Image Segmentation | Xiaowen Ma et.al. | 2501.13130 | link |
| 2025-01-22 | Hybridization of Attention UNet with Repeated Atrous Spatial Pyramid Pooling for Improved Brain Tumour Segmentation | Satyaki Roy Chowdhury et.al. | 2501.13129 | null |
| 2025-01-22 | Enhancing Monocular Depth Estimation with Multi-Source Auxiliary Tasks | Alessio Quercia et.al. | 2501.12824 | null |
| 2025-01-19 | Comparative Analysis of Hand-Crafted and Machine-Driven Histopathological Features for Prostate Cancer Classification and Segmentation | Feda Bolus Al Baqain et.al. | 2501.12415 | null |
| 2025-01-21 | Benchmarking Image Perturbations for Testing Automated Driving Assistance Systems | Stefano Carlo Lambertenghi et.al. | 2501.12269 | null |
| 2025-01-21 | A margin-based replacement for cross-entropy loss | Michael W. Spratling et.al. | 2501.12191 | null |
| 2025-01-21 | Foreign object segmentation in chest x-rays through anatomy-guided shape insertion | Constantin Seibold et.al. | 2501.12022 | null |
| 2025-01-21 | Data-driven Detection and Evaluation of Damages in Concrete Structures: Using Deep Learning and Computer Vision | Saeid Ataei et.al. | 2501.11836 | null |
| 2025-01-20 | MedicoSAM: Towards foundation models for medical image segmentation | Anwai Archit et.al. | 2501.11734 | link |
| 2025-01-20 | Automatic Labelling & Semantic Segmentation with 4D Radar Tensors | Botao Sun et.al. | 2501.11351 | null |
| 2025-01-20 | Enhancing Uncertainty Estimation in Semantic Segmentation via Monte-Carlo Frequency Dropout | Tal Zeevi et.al. | 2501.11258 | link |
| 2025-01-20 | Advancing Oyster Phenotype Segmentation with Multi-Network Ensemble and Multi-Scale mechanism | Wenli Yang et.al. | 2501.11203 | null |
| 2025-01-19 | Rethinking Early-Fusion Strategies for Improved Multimodal Image Segmentation | Zhengwen Shen et.al. | 2501.10958 | null |
| 2025-01-18 | OpenEarthMap-SAR: A Benchmark Synthetic Aperture Radar Dataset for Global High-Resolution Land Cover Mapping | Junshi Xia et.al. | 2501.10891 | null |
| 2025-01-17 | Few-shot Structure-Informed Machinery Part Segmentation with Foundation Models and Graph Neural Networks | Michael Schwingshackl et.al. | 2501.10080 | link |
| 2025-01-17 | Robust Change Captioning in Remote Sensing: SECOND-CC Dataset and MModalCC Framework | Ali Can Karaca et.al. | 2501.10075 | link |
| 2025-01-17 | One-D-Piece: Image Tokenizer Meets Quality-Controllable Compression | Keita Miwa et.al. | 2501.10064 | null |
| 2025-01-17 | LWGANet: A Lightweight Group Attention Backbone for Remote Sensing Visual Tasks | Wei Lu et.al. | 2501.10040 | link |
| 2025-01-16 | The Devil is in the Details: Simple Remedies for Image-to-LiDAR Representation Learning | Wonjun Jo et.al. | 2501.09485 | null |
| 2025-01-16 | Scaling up self-supervised learning for improved surgical foundation models | Tim J. M. Jaspers et.al. | 2501.09436 | link |
| 2025-01-16 | SVIA: A Street View Image Anonymization Framework for Self-Driving Applications | Dongyu Liu et.al. | 2501.09393 | link |
| 2025-01-15 | UNIR-Net: A Novel Approach for Restoring Underwater Images with Non-Uniform Illumination Using Synthetic Data | Ezequiel Perez-Zarate et.al. | 2501.09053 | link |
| 2025-01-15 | Pseudolabel guided pixels contrast for domain adaptive semantic segmentation | Jianzi Xiang et.al. | 2501.09040 | link |
| 2025-01-14 | FLAVARS: A Multimodal Foundational Language and Vision Alignment Model for Remote Sensing | Isaac Corley et.al. | 2501.08490 | null |
| 2025-01-14 | Advancing Semantic Future Prediction through Multimodal Visual Sequence Transformers | Efstathios Karypidis et.al. | 2501.08303 | link |
| 2025-01-14 | SmartEraser: Remove Anything from Images using Masked-Region Guidance | Longtao Jiang et.al. | 2501.08279 | null |
| 2025-01-14 | A Critical Synthesis of Uncertainty Quantification and Foundation Models in Monocular Depth Estimation | Steven Landgraf et.al. | 2501.08188 | null |
| 2025-01-14 | Threshold Attention Network for Semantic Segmentation of Remote Sensing Images | Wei Long et.al. | 2501.07984 | null |
| 2025-01-14 | SkipClick: Combining Quick Responses and Low-Level Features for Interactive Segmentation in Winter Sports Contexts | Robin Schön et.al. | 2501.07960 | null |
| 2025-01-14 | Balance Divergence for Knowledge Distillation | Yafei Qi et.al. | 2501.07804 | null |
| 2025-01-13 | Kolmogorov-Arnold Network for Remote Sensing Image Semantic Segmentation | Xianping Ma et.al. | 2501.07390 | link |
| 2025-01-13 | TimberVision: A Multi-Task Dataset and Framework for Log-Component Segmentation and Tracking in Autonomous Forestry Operations | Daniel Steininger et.al. | 2501.07360 | link |
| 2025-01-13 | Toward Realistic Camouflaged Object Detection: Benchmarks and Method | Zhimeng Xin et.al. | 2501.07297 | link |
| 2025-01-13 | Skip Mamba Diffusion for Monocular 3D Semantic Scene Completion | Li Liang et.al. | 2501.07260 | link |
| 2025-01-12 | LarvSeg: Exploring Image Classification Data For Large Vocabulary Semantic Segmentation via Category-wise Attentive Classifier | Haojun Yu et.al. | 2501.06862 | link |
| 2025-01-12 | SAM-DA: Decoder Adapter for Efficient Medical Domain Adaptation | Javier Gamazo Tejero et.al. | 2501.06836 | null |
| 2025-01-12 | Static Segmentation by Tracking: A Frustratingly Label-Efficient Approach to Fine-Grained Segmentation | Zhenyang Feng et.al. | 2501.06749 | null |
| 2025-01-11 | Parking Space Detection in the City of Granada | Crespo-Orti Luis et.al. | 2501.06651 | link |
| 2025-01-06 | The 2nd Place Solution from the 3D Semantic Segmentation Track in the 2024 Waymo Open Dataset Challenge | Qing Wu et.al. | 2501.05472 | null |
| 2025-01-09 | Domain-Incremental Semantic Segmentation for Autonomous Driving under Adverse Driving Conditions | Shishir Muralidhara et.al. | 2501.05246 | null |
| 2025-01-09 | Advancing ALS Applications with Large-Scale Pre-training: Dataset Development and Downstream Assessment | Haoyi Xiu et.al. | 2501.05095 | null |
| 2025-01-08 | Test-Time Optimization for Domain Adaptive Open Vocabulary Segmentation | Ulindu De Silva et.al. | 2501.04696 | link |
| 2025-01-08 | Rapid Automated Mapping of Clouds on Titan With Instance Segmentation | Zachary Yahn et.al. | 2501.04459 | link |
| 2025-01-07 | Superpixel Boundary Correction for Weakly-Supervised Semantic Segmentation on Histopathology Images | Hongyi Wu et.al. | 2501.03891 | null |
| 2025-01-07 | AutoFish: Dataset and Benchmark for Fine-grained Analysis of Fish | Stefan Hein Bengtson et.al. | 2501.03767 | null |
| 2025-01-07 | Image Segmentation: Inducing graph-based learning | Aryan Singh et.al. | 2501.03765 | link |
| 2025-01-06 | 4D-CS: Exploiting Cluster Prior for 4D Spatio-Temporal LiDAR Semantic Segmentation | Jiexi Zhong et.al. | 2501.02937 | null |
| 2025-01-08 | GLoG-CSUnet: Enhancing Vision Transformers with Adaptable Radiomic Features for Medical Image Segmentation | Niloufar Eghbali et.al. | 2501.02788 | link |
| 2025-01-04 | Unsupervised Class Generation to Expand Semantic Segmentation Datasets | Javier Montalvo et.al. | 2501.02264 | null |
| 2025-01-03 | DreamMask: Boosting Open-vocabulary Panoptic Segmentation with Synthetic Data | Yuanpeng Tu et.al. | 2501.02048 | null |
| 2025-01-03 | Semantic Segmentation for Sequential Historical Maps by Learning from Only One Map | Yunshuang Yuan et.al. | 2501.01845 | null |
| 2025-01-03 | Dedicated Inference Engine and Binary-Weight Neural Networks for Lightweight Instance Segmentation | Tse-Wei Chen et.al. | 2501.01841 | null |
| 2025-01-03 | IAM: Enhancing RGB-D Instance Segmentation with New Benchmarks | Aecheon Jung et.al. | 2501.01685 | link |
| 2025-01-03 | Uncertainty and Energy based Loss Guided Semi-Supervised Semantic Segmentation | Rini Smita Thakur et.al. | 2501.01640 | null |
| 2025-01-02 | A Multi-task Supervised Compression Model for Split Computing | Yoshitomo Matsubara et.al. | 2501.01420 | link |
| 2025-01-02 | Leverage Cross-Attention for End-to-End Open-Vocabulary Panoptic Reconstruction | Xuan Yu et.al. | 2501.01119 | null |
| 2025-01-02 | Evidential Calibrated Uncertainty-Guided Interactive Segmentation paradigm for Ultrasound Images | Jiang Shang et.al. | 2501.01072 | null |
| 2025-01-02 | Efficient Connectivity-Preserving Instance Segmentation with Supervoxel-Based Loss Function | Anna Grim et.al. | 2501.01022 | link |
| 2025-01-03 | FGAseg: Fine-Grained Pixel-Text Alignment for Open-Vocabulary Semantic Segmentation | Bingyu Li et.al. | 2501.00877 | link |
| 2024-12-31 | Exploiting Boundary Loss for the Hierarchical Panoptic Segmentation of Plants and Leaves | Madeleine Darbyshire et.al. | 2501.00527 | link |
| 2024-12-31 | H-Net: A Multitask Architecture for Simultaneous 3D Force Estimation and Stereo Semantic Segmentation in Intracardiac Catheters | Pedram Fekri et.al. | 2501.00514 | null |
| 2024-12-31 | A Novel Shape Guided Transformer Network for Instance Segmentation in Remote Sensing Images | Dawen Yu et.al. | 2501.00360 | null |
| 2024-12-31 | PanoSLAM: Panoptic 3D Scene Reconstruction via Gaussian SLAM | Runnan Chen et.al. | 2501.00352 | null |
| 2024-12-31 | OVGaussian: Generalizable 3D Gaussian Segmentation with Open Vocabularies | Runnan Chen et.al. | 2501.00326 | link |
| 2024-12-30 | HisynSeg: Weakly-Supervised Histopathological Image Segmentation via Image-Mixing Synthesis and Consistency Regularization | Zijie Fang et.al. | 2412.20924 | link |
| 2024-12-30 | LiDAR-Camera Fusion for Video Panoptic Segmentation without Video Training | Fardin Ayar et.al. | 2412.20881 | null |
| 2024-12-29 | Image Augmentation Agent for Weakly Supervised Semantic Segmentation | Wangyu Wu et.al. | 2412.20439 | null |
| 2024-12-27 | Toward Modality Gap: Vision Prototype Learning for Weakly-supervised Semantic Segmentation with CLIP | Zhongxing Xu et.al. | 2412.19650 | null |
| 2024-12-27 | An Actionable Hierarchical Scene Representation Enhancing Autonomous Inspection Missions in Unknown Environments | Vignesh Kottayam Viswanathan et.al. | 2412.19582 | null |
| 2024-12-27 | Towards Open-Vocabulary Remote Sensing Image Semantic Segmentation | Chengyang Ye et.al. | 2412.19492 | link |
| 2024-12-26 | Impact of color and mixing proportion of synthetic point clouds on semantic segmentation | Shaojie Zhou et.al. | 2412.19145 | null |
| 2024-12-25 | Open-Vocabulary Panoptic Segmentation Using BERT Pre-Training of Vision-Language Multiway Transformer Model | Yi-Chia Chen et.al. | 2412.18917 | link |
| 2024-12-24 | AdaCo: Overcoming Visual Foundation Model Noise in 3D Semantic Segmentation via Adaptive Label Correction | Pufan Zou et.al. | 2412.18255 | null |
| 2024-12-25 | VisionGRU: A Linear-Complexity RNN Model for Efficient Image Analysis | Shicheng Yin et.al. | 2412.18178 | link |
| 2024-12-24 | UniPLV: Towards Label-Efficient Open-World 3D Scene Understanding by Regional Visual Language Supervision | Yuru Wang et.al. | 2412.18131 | null |
| 2024-12-24 | LangSurf: Language-Embedded Surface Gaussians for 3D Scene Understanding | Hao Li et.al. | 2412.17635 | null |
| 2024-12-25 | AFANet: Adaptive Frequency-Aware Network for Weakly-Supervised Few-Shot Semantic Segmentation | Jiaqi Ma et.al. | 2412.17601 | link |
| 2024-12-24 | Uncertainty-Participation Context Consistency Learning for Semi-supervised Semantic Segmentation | Jianjian Yin et.al. | 2412.17331 | link |
| 2024-12-22 | Multi-Scale Foreground-Background Confidence for Out-of-Distribution Segmentation | Samuel Marschall et.al. | 2412.16990 | null |
| 2024-12-22 | Detect Changes like Humans: Incorporating Semantic Priors for Improved Change Detection | Yuhang Gan et.al. | 2412.16918 | null |
| 2024-12-22 | MAGIC++: Efficient and Resilient Modality-Agnostic Semantic Segmentation via Hierarchical Modality Selection | Xu Zheng et.al. | 2412.16876 | null |
| 2024-12-22 | Adversarial Diffusion Model for Unsupervised Domain-Adaptive Semantic Segmentation | Jongmin Yu et.al. | 2412.16859 | null |
| 2024-12-21 | A Novel Approach to Tomato Harvesting Using a Hybrid Gripper with Semantic Segmentation and Keypoint Detection | Shahid Ansari et.al. | 2412.16755 | null |
| 2024-12-21 | IV-tuning: Parameter-Efficient Transfer Learning for Infrared-Visible Tasks | Yaming Zhang et.al. | 2412.16654 | link |
| 2024-12-21 | V"Mean"ba: Visual State Space Models only need 1 hidden dimension | Tien-Yu Chi et.al. | 2412.16602 | null |
| 2024-12-20 | SegCol Challenge: Semantic Segmentation for Tools and Fold Edges in Colonoscopy data | Xinwei Ju et.al. | 2412.16078 | null |
| 2024-12-20 | Enhancing Generalized Few-Shot Semantic Segmentation via Effective Knowledge Transfer | Xinyue Chen et.al. | 2412.15835 | link |
| 2024-12-19 | MultiverSeg: Scalable Interactive Segmentation of Biomedical Imaging Datasets with In-Context Guidance | Hallee E. Wong et.al. | 2412.15058 | link |
| 2024-12-19 | GIRAFE: Glottal Imaging Dataset for Advanced Segmentation, Analysis, and Facilitative Playbacks Evaluation | G. Andrade-Miranda et.al. | 2412.15054 | link |
| 2024-12-19 | PC-BEV: An Efficient Polar-Cartesian BEV Fusion Framework for LiDAR Semantic Segmentation | Shoumeng Qiu et.al. | 2412.14821 | link |
| 2024-12-19 | Progressive Fine-to-Coarse Reconstruction for Accurate Low-Bit Post-Training Quantization in Vision Transformers | Rui Ding et.al. | 2412.14633 | null |
| 2024-12-19 | Spike2Former: Efficient Spiking Transformer for High-performance Image Segmentation | Zhenxin Lei et.al. | 2412.14587 | null |
| 2024-12-18 | Split Learning in Computer Vision for Semantic Segmentation Delay Minimization | Nikos G. Evgenidis et.al. | 2412.14272 | null |
| 2024-12-18 | Incorporating Feature Pyramid Tokenization and Open Vocabulary Semantic Segmentation | Jianyu Zhang et.al. | 2412.14145 | null |
| 2024-12-18 | Prompt Categories Cluster for Weakly Supervised Semantic Segmentation | Wangyu Wu et.al. | 2412.13823 | null |
| 2024-12-18 | Federated Source-free Domain Adaptation for Classification: Weighted Cluster Aggregation for Unlabeled Data | Junki Mori et.al. | 2412.13757 | null |
| 2024-12-18 | Optical aberrations in autonomous driving: Physics-informed parameterized temperature scaling for neural network uncertainty calibration | Dominik Werner Wolf et.al. | 2412.13695 | null |
| 2024-12-18 | GAGS: Granularity-Aware Feature Distillation for Language Gaussian Splatting | Yuning Peng et.al. | 2412.13654 | link |
| 2024-12-18 | RelationField: Relate Anything in Radiance Fields | Sebastian Koch et.al. | 2412.13652 | null |
| 2024-12-17 | S2S2: Semantic Stacking for Robust Semantic Segmentation in Medical Imaging | Yimu Pan et.al. | 2412.13156 | null |
| 2024-12-17 | Efficient Event-based Semantic Segmentation with Spike-driven Lightweight Transformer-based Networks | Xiaxin Zhu et.al. | 2412.12843 | null |
| 2024-12-17 | ZoRI: Towards Discriminative Zero-Shot Remote Sensing Instance Segmentation | Shiqi Huang et.al. | 2412.12798 | link |
| 2024-12-17 | Open-World Panoptic Segmentation | Matteo Sodano et.al. | 2412.12740 | null |
| 2024-12-17 | SemStereo: Semantic-Constrained Stereo Matching Network for Remote Sensing | Chen Chen et.al. | 2412.12685 | link |
| 2024-12-17 | Structural Pruning via Spatial-aware Information Redundancy for Semantic Segmentation | Dongyue Wu et.al. | 2412.12672 | link |
| 2024-12-17 | Adaptive Prototype Replay for Class Incremental Semantic Segmentation | Guilin Zhu et.al. | 2412.12669 | null |
| 2024-12-17 | SEG-SAM: Semantic-Guided SAM for Unified Medical Image Segmentation | Shuangping Huang et.al. | 2412.12660 | null |
| 2024-12-16 | Exploring Semantic Consistency and Style Diversity for Domain Generalized Semantic Segmentation | Hongwei Niu et.al. | 2412.12050 | link |
| 2024-12-16 | SAMIC: Segment Anything with In-Context Spatial Prompt Engineering | Savinay Nagendra et.al. | 2412.11998 | null |
| 2024-12-16 | SegMAN: Omni-scale Context Modeling with State Space Models and Local Attention for Semantic Segmentation | Yunxiang Fu et.al. | 2412.11890 | link |
| 2024-12-16 | Towards Adversarial Robustness of Model-Level Mixture-of-Experts Architectures for Semantic Segmentation | Svetlana Pavlitska et.al. | 2412.11608 | null |
| 2024-12-16 | PyPotteryLens: An Open-Source Deep Learning Framework for Automated Digitisation of Archaeological Pottery Documentation | Lorenzo Cardarelli et.al. | 2412.11574 | null |
| 2024-12-15 | Volumetric Mapping with Panoptic Refinement via Kernel Density Estimation for Mobile Robots | Khang Nguyen et.al. | 2412.11241 | link |
| 2024-12-15 | MoRe: Class Patch Attention Needs Regularization for Weakly Supervised Semantic Segmentation | Zhiwei Yang et.al. | 2412.11076 | link |
| 2024-12-15 | Classification Drives Geographic Bias in Street Scene Segmentation | Rahul Nair et.al. | 2412.11061 | null |
| 2024-12-15 | SAM-IF: Leveraging SAM for Incremental Few-Shot Instance Segmentation | Xudong Zhou et.al. | 2412.11034 | null |
| 2024-12-14 | RapidNet: Multi-Level Dilated Convolution Based Mobile Backbone | Mustafa Munir et.al. | 2412.10995 | link |
| 2024-12-13 | A Universal Degradation-based Bridging Technique for Domain Adaptive Semantic Segmentation | Wangkai Li et.al. | 2412.10339 | null |
| 2024-12-13 | SuperGSeg: Open-Vocabulary 3D Segmentation with Structured Super-Gaussians | Siyun Liang et.al. | 2412.10231 | null |
| 2024-12-13 | SPT: Sequence Prompt Transformer for Interactive Image Segmentation | Senlin Cheng et.al. | 2412.10224 | null |
| 2024-12-13 | TSGaussian: Semantic and Depth-Guided Target-Specific Gaussian Splatting from Sparse Views | Liang Zhao et.al. | 2412.10051 | null |
| 2024-12-13 | Object-Focused Data Selection for Dense Prediction Tasks | Niclas Popp et.al. | 2412.10032 | null |
| 2024-12-12 | MaskTerial: A Foundation Model for Automated 2D Material Flake Detection | Jan-Lucas Uslu et.al. | 2412.09333 | null |
| 2024-12-12 | Towards Open-Vocabulary Video Semantic Segmentation | Xinhao Li et.al. | 2412.09329 | null |
| 2024-12-12 | FAMNet: Frequency-aware Matching Network for Cross-domain Few-shot Medical Image Segmentation | Yuntian Bo et.al. | 2412.09319 | link |
| 2024-12-12 | VLMs meet UDA: Boosting Transferability of Open Vocabulary Segmentation with Unsupervised Domain Adaptation | Roberto Alcover-Couso et.al. | 2412.09240 | null |
| 2024-12-12 | STEAM: Squeeze and Transform Enhanced Attention Module | Rishabh Sabharwal et.al. | 2412.09023 | null |
| 2024-12-11 | SegFace: Face Segmentation of Long-Tail Classes | Kartik Narayan et.al. | 2412.08647 | link |
| 2024-12-11 | EOV-Seg: Efficient Open-Vocabulary Panoptic Segmentation | Hongwei Niu et.al. | 2412.08628 | null |
| 2024-12-12 | Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning | Fan Lu et.al. | 2412.08614 | link |
| 2024-12-11 | Lightweight Method for Interactive 3D Medical Image Segmentation with Multi-Round Result Fusion | Bingzhi Shen et.al. | 2412.08315 | null |
| 2024-12-11 | Hierarchical Context Alignment with Disentangled Geometric and Temporal Modeling for Semantic Occupancy Prediction | Bohan Li et.al. | 2412.08243 | null |
| 2024-12-11 | THUD++: Large-Scale Dynamic Indoor Scene Dataset and Benchmark for Mobile Robots | Zeshun Li et.al. | 2412.08096 | null |
| 2024-12-11 | Static-Dynamic Class-level Perception Consistency in Video Semantic Segmentation | Zhigang Cen et.al. | 2412.08034 | null |
| 2024-12-10 | Balancing Shared and Task-Specific Representations: A Hybrid Approach to Depth-Aware Video Panoptic Segmentation | Kurt H. W. Stolle et.al. | 2412.07966 | link |
| 2024-12-11 | CADSpotting: Robust Panoptic Symbol Spotting on Large-Scale CAD Drawings | Jiazuo Mu et.al. | 2412.07377 | null |
| 2024-12-09 | SphereUFormer: A U-Shaped Transformer for Spherical 360 Perception | Yaniv Benny et.al. | 2412.06968 | null |
| 2024-12-10 | ContRail: A Framework for Realistic Railway Image Synthesis using ControlNet | Andrei-Robert Alexandrescu et.al. | 2412.06742 | null |
| 2024-12-09 | Active Learning with Context Sampling and One-vs-Rest Entropy for Semantic Segmentation | Fei Wu et.al. | 2412.06470 | null |
| 2024-12-09 | Open-Vocabulary High-Resolution 3D (OVHR3D) Data Segmentation and Annotation Framework | Jiuyi Xu et.al. | 2412.06268 | null |
| 2024-12-09 | GCUNet: A GNN-Based Contextual Learning Network for Tertiary Lymphoid Structure Semantic Segmentation in Whole Slide Image | Lei Su et.al. | 2412.06129 | null |
| 2024-12-08 | Efficient Semantic Splatting for Remote Sensing Multi-view Segmentation | Zipeng Qi et.al. | 2412.05969 | null |
| 2024-12-08 | CSG: A Context-Semantic Guided Diffusion Approach in De Novo Musculoskeletal Ultrasound Image Generation | Elay Dahan et.al. | 2412.05833 | null |
| 2024-12-07 | Integrating YOLO11 and Convolution Block Attention Module for Multi-Season Segmentation of Tree Trunks and Branches in Commercial Apple Orchards | Ranjan Sapkota et.al. | 2412.05728 | null |
| 2024-12-10 | RSUniVLM: A Unified Vision Language Model for Remote Sensing via Granularity-oriented Mixture of Experts | Xu Liu et.al. | 2412.05679 | link |
| 2024-12-06 | FogROS2-FT: Fault Tolerant Cloud Robotics | Kaiyuan Chen et.al. | 2412.05408 | null |
| 2024-12-06 | DreamColour: Controllable Video Colour Editing without Training | Chaitat Utintu et.al. | 2412.05180 | null |
| 2024-12-05 | Assessing and Learning Alignment of Unimodal Vision and Language Models | Le Zhang et.al. | 2412.04616 | null |
| 2024-12-05 | Towards Real-Time Open-Vocabulary Video Instance Segmentation | Bin Yan et.al. | 2412.04434 | null |
| 2024-12-05 | A Hitchhiker’s Guide to Understanding Performances of Two-Class Classifiers | Anaïs Halin et.al. | 2412.04377 | null |
| 2024-12-05 | Customize Segment Anything Model for Multi-Modal Semantic Segmentation with Mixture of LoRA Experts | Chenyang Zhu et.al. | 2412.04220 | null |
| 2024-12-05 | Text Change Detection in Multilingual Documents Using Image Comparison | Doyoung Park et.al. | 2412.04137 | null |
| 2024-12-05 | SoRA: Singular Value Decomposed Low-Rank Adaptation for Domain Generalizable Representation Learning | Seokju Yun et.al. | 2412.04077 | null |
| 2024-12-05 | Quality Control in Open-Ended Crowdsourcing: A Survey | Lei Chai et.al. | 2412.03991 | null |
| 2024-12-05 | Exact: Exploring Space-Time Perceptive Clues for Weakly Supervised Satellite Image Time Series Semantic Segmentation | Hao Zhu et.al. | 2412.03968 | link |
| 2024-12-05 | LL-ICM: Image Compression for Low-level Machine Vision via Large Vision-Language Model | Yuan Xue et.al. | 2412.03841 | null |
| 2024-12-04 | Designing DNNs for a trade-off between robustness and processing performance in embedded devices | Jon Gutiérrez-Zaballa et.al. | 2412.03682 | null |
| 2024-12-04 | FLAIR: VLM with Fine-grained Language-informed Image Representations | Rui Xiao et.al. | 2412.03561 | link |
| 2024-12-04 | Benchmarking Pretrained Attention-based Models for Real-Time Recognition in Robot-Assisted Esophagectomy | Ronald L. P. D. de Jong et.al. | 2412.03401 | null |
| 2024-12-04 | Task-driven Image Fusion with Learnable Fusion Loss | Haowen Bai et.al. | 2412.03240 | null |
| 2024-12-04 | Biologically-inspired Semi-supervised Semantic Segmentation for Biomedical Imaging | Luca Ciampi et.al. | 2412.03192 | null |
| 2024-12-04 | Is Foreground Prototype Sufficient? Few-Shot Medical Image Segmentation with Background-Fused Prototype | Song Tang et.al. | 2412.02983 | null |
| 2024-12-04 | Progressive Vision-Language Prompt for Multi-Organ Multi-Class Cell Semantic Segmentation with Single Branch | Qing Zhang et.al. | 2412.02978 | null |
| 2024-12-04 | Semantic Segmentation Prior for Diffusion-Based Real-World Super-Resolution | Jiahua Xiao et.al. | 2412.02960 | null |
| 2024-12-04 | Panoptic Diffusion Models: co-generation of images and segmentation maps | Yinghan Long et.al. | 2412.02929 | null |
| 2024-12-03 | SJTU:Spatial judgments in multimodal models towards unified segmentation through coordinate detection | Joongwon Chae et.al. | 2412.02565 | null |
| 2024-12-03 | Multi-scale and Multi-path Cascaded Convolutional Network for Semantic Segmentation of Colorectal Polyps | Malik Abdul Manan et.al. | 2412.02443 | null |
| 2024-12-03 | AH-OCDA: Amplitude-based Curriculum Learning and Hopfield Segmentation Model for Open Compound Domain Adaptation | Jaehyun Choi et.al. | 2412.02280 | null |
| 2024-12-03 | Vision Transformers for Weakly-Supervised Microorganism Enumeration | Javier Ureña Santiago et.al. | 2412.02250 | link |
| 2024-12-03 | Multi-robot autonomous 3D reconstruction using Gaussian splatting with Semantic guidance | Jing Zeng et.al. | 2412.02249 | null |
| 2024-12-02 | INSIGHT: Explainable Weakly-Supervised Medical Image Analysis | Wenbo Zhang et.al. | 2412.02012 | null |
| 2024-12-02 | Global Average Feature Augmentation for Robust Semantic Segmentation with Transformers | Alberto Gonzalo Rodriguez Salgado et.al. | 2412.01941 | null |
| 2024-12-02 | COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training | Sanghwan Kim et.al. | 2412.01814 | null |
| 2024-12-02 | Robust and Transferable Backdoor Attacks Against Deep Image Compression With Selective Frequency Prior | Yi Yu et.al. | 2412.01646 | null |
| 2024-12-02 | Epipolar Attention Field Transformers for Bird’s Eye View Semantic Segmentation | Christian Witte et.al. | 2412.01595 | null |
| 2024-11-29 | LDA-AQU: Adaptive Query-guided Upsampling via Local Deformable Attention | Zewen Du et.al. | 2411.19585 | link |
| 2024-11-29 | Bootstraping Clustering of Gaussians for View-consistent 3D Scene Understanding | Wenbo Zhang et.al. | 2411.19551 | null |
| 2024-11-29 | Retrieval-guided Cross-view Image Synthesis | Hongji Yang et.al. | 2411.19510 | null |
| 2024-11-29 | Adaptive Interactive Segmentation for Multimodal Medical Imaging via Selection Engine | Zhi Li et.al. | 2411.19447 | link |
| 2024-11-28 | GMS-VINS: Multi-category Dynamic Objects Semantic Segmentation for Enhanced Visual-Inertial Odometry Using a Promptable Foundation Model | Rui Zhou et.al. | 2411.19289 | null |
| 2024-11-28 | InstanceGaussian: Appearance-Semantic Joint Gaussian Representation for 3D Instance-Level Perception | Haijie Li et.al. | 2411.19235 | null |
| 2024-11-28 | MVFormer: Diversifying Feature Normalization and Token Mixing for Efficient Vision Transformers | Jongseong Bae et.al. | 2411.18995 | null |
| 2024-11-28 | Textured As-Is BIM via GIS-informed Point Cloud Segmentation | Mohamed S. H. Alabassy et.al. | 2411.18898 | null |
| 2024-11-27 | The Last Mile to Supervised Performance: Semi-Supervised Domain Adaptation for Semantic Segmentation | Daniel Morales-Brotons et.al. | 2411.18728 | null |
| 2024-11-27 | HoliSDiP: Image Super-Resolution via Holistic Semantics and Diffusion Prior | Li-Yuan Tsao et.al. | 2411.18662 | link |
| 2024-11-26 | Low-rank Adaptation-based All-Weather Removal for Autonomous Navigation | Sudarshan Rajagopalan et.al. | 2411.17814 | null |
| 2024-11-26 | Efficient Multi-modal Large Language Models via Visual Token Grouping | Minbin Huang et.al. | 2411.17773 | null |
| 2024-11-26 | Modality-Incremental Learning with Disjoint Relevance Mapping Networks for Image-based Semantic Segmentation | Niharika Hegde et.al. | 2411.17610 | null |
| 2024-11-26 | A Bilayer Segmentation-Recombination Network for Accurate Segmentation of Overlapping C. elegans | Mengqian Dinga et.al. | 2411.17557 | null |
| 2024-11-26 | Rapid Deployment of Domain-specific Hyperspectral Image Processors with Application to Autonomous Driving | Jon Gutiérrez-Zaballa et.al. | 2411.17543 | null |
| 2024-11-26 | Box for Mask and Mask for Box: weak losses for multi-task partially supervised learning | Hoàng-Ân Lê et.al. | 2411.17536 | link |
| 2024-11-26 | TinyViM: Frequency Decoupling for Tiny Hybrid Vision Mamba | Xiaowen Ma et.al. | 2411.17473 | link |
| 2024-11-26 | Self-supervised Video Instance Segmentation Can Boost Geographic Entity Alignment in Historical Maps | Xue Xia et.al. | 2411.17425 | null |
| 2024-11-26 | MRIFE: A Mask-Recovering and Interactive-Feature-Enhancing Semantic Segmentation Network For Relic Landslide Detection | Juefei He et.al. | 2411.17167 | null |
| 2024-11-26 | Distilling Spectral Graph for Object-Context Aware Open-Vocabulary Semantic Segmentation | Chanyoung Kim et.al. | 2411.17150 | null |
| 2024-11-26 | ΩSFormer: Dual-Modal Ω-like Super-Resolution Transformer Network for Cross-scale and High-accuracy Terraced Field Vectorization Extraction | Chang Li et.al. | 2411.17088 | null |
| 2024-11-26 | SCASeg: Strip Cross-Attention for Efficient Semantic Segmentation | Guoan Xu et.al. | 2411.17061 | null |
| 2024-11-25 | Deformable Mamba for Wide Field of View Segmentation | Jie Hu et.al. | 2411.16481 | link |
| 2024-11-25 | A Study on Unsupervised Domain Adaptation for Semantic Segmentation in the Era of Vision-Language Models | Manuel Schwonberg et.al. | 2411.16407 | null |
| 2024-11-25 | CutS3D: Cutting Semantics in 3D for 2D Unsupervised Instance Segmentation | Leon Sick et.al. | 2411.16319 | null |
| 2024-11-25 | An End-to-End Robust Point Cloud Semantic Segmentation Network with Single-Step Conditional Diffusion Models | Wentao Qu et.al. | 2411.16308 | null |
| 2024-11-25 | A Performance Increment Strategy for Semantic Segmentation of Low-Resolution Images from Damaged Roads | Rafael S. Toledo et.al. | 2411.16295 | null |
| 2024-11-25 | Weakly supervised image segmentation for defect-based grading of fresh produce | Manuel Knott et.al. | 2411.16219 | null |
| 2024-11-25 | Learn from Foundation Model: Fruit Detection Model without Manual Annotation | Yanan Wang et.al. | 2411.16196 | null |
| 2024-11-25 | Any3DIS: Class-Agnostic 3D Instance Segmentation by 2D Mask Tracking | Phuc Nguyen et.al. | 2411.16183 | null |
| 2024-11-25 | Scaling Spike-driven Transformer with Efficient Spike Firing Approximation Training | Man Yao et.al. | 2411.16061 | link |
| 2024-11-24 | Deep Learning for automated multi-scale functional field boundaries extraction using multi-date Sentinel-2 and PlanetScope imagery: Case Study of Netherlands and Pakistan | Saba Zahid et.al. | 2411.15923 | null |
| 2024-11-22 | Effective SAM Combination for Open-Vocabulary Semantic Segmentation | Minhyeok Lee et.al. | 2411.14723 | null |
| 2024-11-21 | Revisiting the Integration of Convolution and Attention for Vision Backbone | Lei Zhu et.al. | 2411.14429 | link |
| 2024-11-21 | CompetitorFormer: Competitor Transformer for 3D Instance Segmentation | Duanchu Wang et.al. | 2411.14179 | null |
| 2024-11-21 | CLIPer: Hierarchically Improving Spatial Representation of CLIP for Open-Vocabulary Semantic Segmentation | Lin Sun et.al. | 2411.13836 | link |
| 2024-11-21 | Segment Any Class (SAC): Multi-Class Few-Shot Semantic Segmentation via Class Region Proposals | Hussni Mohd Zakir et.al. | 2411.13774 | null |
| 2024-11-20 | FAST-Splat: Fast, Ambiguity-Free Semantics Transfer in Gaussian Splatting | Ola Shorinwa et.al. | 2411.13753 | null |
| 2024-11-20 | DIS-Mine: Instance Segmentation for Disaster-Awareness in Poor-Light Condition in Underground Mines | Mizanur Rahman Jewel et.al. | 2411.13544 | null |
| 2024-11-21 | Entropy Bootstrapping for Weakly Supervised Nuclei Detection | James Willoughby et.al. | 2411.13528 | null |
| 2024-11-20 | BelHouse3D: A Benchmark Dataset for Assessing Occlusion Robustness in 3D Point Cloud Semantic Segmentation | Umamaheswaran Raman Kumar et.al. | 2411.13251 | null |
| 2024-11-20 | XMask3D: Cross-modal Mask Reasoning for Open Vocabulary 3D Semantic Segmentation | Ziyi Wang et.al. | 2411.13243 | link |
| 2024-11-20 | Automating Sonologists USG Commands with AI and Voice Interface | Emad Mohamed et.al. | 2411.13006 | null |
| 2024-11-19 | Interactive Medical Image Segmentation: A Benchmark Dataset and Baseline | Junlong Cheng et.al. | 2411.12814 | link |
| 2024-11-19 | A Multimodal Approach Combining Structural and Cross-domain Textual Guidance for Weakly Supervised OCT Segmentation | Jiaqi Yang et.al. | 2411.12615 | link |
| 2024-11-19 | SAM Carries the Burden: A Semi-Supervised Approach Refining Pseudo Labels for Medical Segmentation | Ron Keuth et.al. | 2411.12602 | link |
| 2024-11-19 | ADV2E: Bridging the Gap Between Analogue Circuit and Discrete Frames in the Video-to-Events Simulator | Xiao Jiang et.al. | 2411.12250 | null |
| 2024-11-18 | ITACLIP: Boosting Training-Free Semantic Segmentation with Image, Text, and Architectural Enhancements | M. Arda Aydın et.al. | 2411.12044 | link |
| 2024-11-18 | Calibrated and Efficient Sampling-Free Confidence Estimation for LiDAR Scene Semantic Segmentation | Hanieh Shojaei Miandashti et.al. | 2411.11935 | null |
| 2024-11-18 | MGNiceNet: Unified Monocular Geometric Scene Understanding | Markus Schön et.al. | 2411.11466 | null |
| 2024-11-18 | MAIRA-Seg: Enhancing Radiology Report Generation with Segmentation-Aware Multimodal Large Language Models | Harshita Sharma et.al. | 2411.11362 | null |
| 2024-11-18 | Reducing Label Dependency for Underwater Scene Understanding: A Survey of Datasets, Techniques and Applications | Scarlett Raine et.al. | 2411.11287 | null |
| 2024-11-18 | Zero-Shot Automatic Annotation and Instance Segmentation using LLM-Generated Datasets: Eliminating Field Imaging and Manual Annotation for Deep Learning Model Development | Ranjan Sapkota et.al. | 2411.11285 | null |
| 2024-11-16 | Attention-based U-Net Method for Autonomous Lane Detection | Mohammadhamed Tangestanizadeh et.al. | 2411.10902 | null |
| 2024-11-16 | Automatic Discovery and Assessment of Interpretable Systematic Errors in Semantic Segmentation | Jaisidh Singh et.al. | 2411.10845 | null |
| 2024-11-16 | Diffusion-Based Semantic Segmentation of Lumbar Spine MRI Scans of Lower Back Pain Patients | Maria Monzon et.al. | 2411.10755 | null |
| 2024-11-15 | Repurposing Stable Diffusion Attention for Training-Free Unsupervised Interactive Segmentation | Markus Karmann et.al. | 2411.10411 | null |
| 2024-11-15 | Y-MAP-Net: Real-time depth, normals, segmentation, multi-label captioning and 2D human pose in RGB images | Ammar Qammaz et.al. | 2411.10334 | null |
| 2024-11-15 | RETR: Multi-View Radar Detection Transformer for Indoor Perception | Ryoma Yataka et.al. | 2411.10293 | null |
| 2024-11-15 | CorrCLIP: Reconstructing Correlations in CLIP with Off-the-Shelf Foundation Models for Open-Vocabulary Semantic Segmentation | Dengke Zhang et.al. | 2411.10086 | link |
| 2024-11-14 | OneNet: A Channel-Wise 1D Convolutional U-Net | Sanghyun Byun et.al. | 2411.09838 | link |
| 2024-11-14 | Instruction-Driven Fusion of Infrared-Visible Images: Tailoring for Diverse Downstream Tasks | Zengyi Yang et.al. | 2411.09387 | null |
| 2024-11-14 | Harnessing Vision Foundation Models for High-Performance, Training-Free Open Vocabulary Segmentation | Yuheng Shi et.al. | 2411.09219 | link |
| 2024-11-14 | Heuristical Comparison of Vision Transformers Against Convolutional Neural Networks for Semantic Segmentation on Remote Sensing Imagery | Ashim Dahal et.al. | 2411.09101 | link |
| 2024-11-13 | CoMiX: Cross-Modal Fusion with Deformable Convolutions for HSI-X Semantic Segmentation | Xuming Zhang et.al. | 2411.09023 | null |
| 2024-11-14 | Masked Image Modeling Boosting Semi-Supervised Semantic Segmentation | Yangyang Li et.al. | 2411.08756 | null |
| 2024-11-13 | Slender Object Scene Segmentation in Remote Sensing Image Based on Learnable Morphological Skeleton with Segment Anything Model | Jun Xie et.al. | 2411.08592 | null |
| 2024-11-13 | UIFormer: A Unified Transformer-based Framework for Incremental Few-Shot Object Detection and Instance Segmentation | Chengyuan Zhang et.al. | 2411.08569 | null |
| 2024-11-13 | Detection and classification of radio sources with deep learning | S. Riggi et.al. | 2411.08519 | null |
| 2024-11-12 | Isometric Transformations for Image Augmentation in Mueller Matrix Polarimetry | Christopher Hahne et.al. | 2411.07918 | link |
| 2024-11-12 | INTRABENCH: Interactive Radiological Benchmark | Constantin Ulrich et.al. | 2411.07885 | null |
| 2024-11-12 | Horticultural Temporal Fruit Monitoring via 3D Instance Segmentation and Re-Identification using Point Clouds | Daniel Fusaro et.al. | 2411.07799 | link |
| 2024-11-12 | Semantic segmentation on multi-resolution optical and microwave data using deep learning | Jai G Singla et.al. | 2411.07581 | null |
| 2024-11-12 | GaussianCut: Interactive segmentation via graph cut for 3D Gaussian Splatting | Umangi Jain et.al. | 2411.07555 | null |
| 2024-11-11 | Data-Centric Learning Framework for Real-Time Detection of Aiming Beam in Fluorescence Lifetime Imaging Guided Surgery | Mohamed Abul Hassan et.al. | 2411.07395 | null |
| 2024-11-11 | SAMPart3D: Segment Any Part in 3D Objects | Yunhan Yang et.al. | 2411.07184 | link |
| 2024-11-11 | SIESEF-FusionNet: Spatial Inter-correlation Enhancement and Spatially-Embedded Feature Fusion Network for LiDAR Point Cloud Semantic Segmentation | Jiale Chen et.al. | 2411.06991 | null |
| 2024-11-11 | Fast and Efficient Transformer-based Method for Bird’s Eye View Instance Prediction | Miguel Antunes-García et.al. | 2411.06851 | link |
| 2024-11-11 | Can KAN Work? Exploring the Potential of Kolmogorov-Arnold Networks in Computer Vision | Yueyang Cang et.al. | 2411.06727 | null |
| 2024-11-10 | Few-shot Semantic Learning for Robust Multi-Biome 3D Semantic Mapping in Off-Road Environments | Deegan Atha et.al. | 2411.06632 | null |
| 2024-11-09 | Pattern Integration and Enhancement Vision Transformer for Self-Supervised Learning in Remote Sensing | Kaixuan Lu et.al. | 2411.06091 | null |
| 2024-11-08 | Joint-Optimized Unsupervised Adversarial Domain Adaptation in Remote Sensing Segmentation with Prompted Foundation Model | Shuchang Lyu et.al. | 2411.05878 | link |
| 2024-11-08 | Agricultural Landscape Understanding At Country-Scale | Radhika Dua et.al. | 2411.05359 | null |
| 2024-11-08 | Revisiting Network Perturbation for Semi-Supervised Semantic Segmentation | Sien Li et.al. | 2411.05307 | link |
| 2024-11-07 | In the Era of Prompt Learning with Vision-Language Models | Ankit Jha et.al. | 2411.04892 | null |
| 2024-11-08 | ZAHA: Introducing the Level of Facade Generalization and the Large-Scale Point Cloud Facade Semantic Segmentation Benchmark Dataset | Olaf Wysocki et.al. | 2411.04865 | link |
| 2024-11-06 | Generalize or Detect? Towards Robust Semantic Segmentation Under Multiple Distribution Shifts | Zhitong Gao et.al. | 2411.03829 | link |
| 2024-11-06 | SA3DIP: Segment Any 3D Instance with Potential 3D Priors | Xi Yang et.al. | 2411.03819 | link |
| 2024-11-06 | Towards 3D Semantic Scene Completion for Autonomous Driving: A Meta-Learning Framework Empowered by Deformable Large-Kernel Attention and Mamba Model | Yansong Qu et.al. | 2411.03672 | null |
| 2024-11-05 | Enhancing Weakly Supervised Semantic Segmentation for Fibrosis via Controllable Image Generation | Zhiling Yue et.al. | 2411.03551 | null |
| 2024-11-05 | SynthSet: Generative Diffusion Model for Semantic Segmentation in Precision Agriculture | Andrew Heschl et.al. | 2411.03505 | link |
| 2024-11-05 | Rethinking Decoders for Transformer-based Semantic Segmentation: Compression is All You Need | Qishuai Wen et.al. | 2411.03033 | link |
| 2024-11-05 | Multi-modal NeRF Self-Supervision for LiDAR Semantic Segmentation | Xavier Timoneda et.al. | 2411.02969 | null |
| 2024-11-05 | Mapping Africa Settlements: High Resolution Urban and Rural Map by Deep Learning and Satellite Imagery | Mohammad Kakooei et.al. | 2411.02935 | null |
| 2024-11-05 | CIT: Rethinking Class-incremental Semantic Segmentation with a Class Independent Transformation | Jinchao Ge et.al. | 2411.02715 | null |
| 2024-11-04 | Deep Learning on 3D Semantic Segmentation: A Detailed Review | Thodoris Betsas et.al. | 2411.02104 | null |
| 2024-11-04 | Tree level change detection over Ahmedabad city using very high resolution satellite images and Deep Learning | Jai G Singla et.al. | 2411.02009 | null |
| 2024-11-04 | Exploiting Contextual Uncertainty of Visual Data for Efficient Training of Deep Models | Sharat Agarwal et.al. | 2411.01925 | null |
| 2024-11-04 | DiffuMask-Editor: A Novel Paradigm of Integration Between the Segmentation Diffusion Model and Image Editing to Improve Segmentation Ability | Bo Gao et.al. | 2411.01819 | null |
| 2024-11-04 | Toward Integrating Semantic-aware Path Planning and Reliable Localization for UAV Operations | Thanh Nguyen Canh et.al. | 2411.01816 | null |
| 2024-11-05 | MSTA3D: Multi-scale Twin-attention for 3D Instance Segmentation | Duc Dang Trung Tran et.al. | 2411.01781 | null |
| 2024-11-03 | PreCM: The Padding-based Rotation Equivariant Convolution Mode for Semantic Segmentation | Xinyu Xu et.al. | 2411.01624 | null |
| 2024-11-01 | Enhancing Question Answering Precision with Optimized Vector Retrieval and Instructions | Lixiao Yang et.al. | 2411.01039 | null |
| 2024-11-01 | Event-guided Low-light Video Semantic Segmentation | Zhen Yao et.al. | 2411.00639 | null |
| 2024-11-01 | Automated Classification of Cell Shapes: A Comparative Evaluation of Shape Descriptors | Valentina Vadori et.al. | 2411.00561 | null |
| 2024-10-31 | Federated Black-Box Adaptation for Semantic Segmentation | Jay N. Paranjape et.al. | 2410.24181 | null |
| 2024-10-31 | COSNet: A Novel Semantic Segmentation Network using Enhanced Boundaries in Cluttered Scenes | Muhammad Ali et.al. | 2410.24139 | link |
| 2024-10-31 | Text-DiFuse: An Interactive Multi-Modal Image Fusion Framework based on Text-modulated Diffusion Model | Hao Zhang et.al. | 2410.23905 | link |
| 2024-10-30 | S3PT: Scene Semantics and Structure Guided Clustering to Boost Self-Supervised Pre-Training for Autonomous Driving | Maciej K. Wozniak et.al. | 2410.23085 | null |
| 2024-10-31 | CrossEarth: Geospatial Vision Foundation Model for Domain Generalizable Remote Sensing Semantic Segmentation | Ziyang Gong et.al. | 2410.22629 | link |
| 2024-10-29 | Multimodality Helps Few-Shot 3D Point Cloud Semantic Segmentation | Zhaochong An et.al. | 2410.22489 | link |
| 2024-10-29 | Lightweight Frequency Masker for Cross-Domain Few-Shot Semantic Segmentation | Jintao Tong et.al. | 2410.22135 | null |
| 2024-10-29 | Hyperspectral Imaging-Based Perception in Autonomous Driving Scenarios: Benchmarking Baseline Semantic Segmentation Models | Imad Ali Shah et.al. | 2410.22101 | null |
| 2024-10-29 | Unsupervised Modality Adaptation with Text-to-Image Diffusion Models for Semantic Segmentation | Ruihao Xia et.al. | 2410.21708 | link |
| 2024-10-28 | Domain Adaptation with a Single Vision-Language Embedding | Mohammad Fahes et.al. | 2410.21361 | null |
| 2024-10-28 | IndraEye: Infrared Electro-Optical UAV-based Perception Dataset for Robust Downstream Tasks | Manjunath D et.al. | 2410.20953 | null |
| 2024-10-27 | A Framework for Real-Time Volcano-Seismic Event Recognition Based on Multi-Station Seismograms and Semantic Segmentation Models | Camilo Espinosa-Curilem et.al. | 2410.20595 | link |
| 2024-10-27 | Unlocking Comics: The AI4VA Dataset for Visual Understanding | Peter Grönquist et.al. | 2410.20459 | link |
| 2024-10-27 | Historical Test-time Prompt Tuning for Vision Foundation Models | Jingyi Zhang et.al. | 2410.20346 | null |
| 2024-10-25 | OReole-FM: successes and challenges toward billion-parameter foundation models for high-resolution satellite imagery | Philipe Dias et.al. | 2410.19965 | null |
| 2024-10-25 | IPPON: Common Sense Guided Informative Path Planning for Object Goal Navigation | Kaixian Qu et.al. | 2410.19697 | null |
| 2024-10-25 | Fusion-then-Distillation: Toward Cross-modal Positive Distillation for Domain Adaptive 3D Semantic Segmentation | Yao Wu et.al. | 2410.19446 | link |
| 2024-10-25 | Context-Based Visual-Language Place Recognition | Soojin Woo et.al. | 2410.19341 | link |
| 2024-10-24 | Every Component Counts: Rethinking the Measure of Success for Medical Semantic Segmentation in Multi-Instance Segmentation Tasks | Alexander Jaus et.al. | 2410.18684 | null |
| 2024-10-24 | Unsupervised semantic segmentation of urban high-density multispectral point clouds | Oona Oinonen et.al. | 2410.18520 | null |
| 2024-10-26 | CARLA2Real: a tool for reducing the sim2real gap in CARLA simulator | Stefanos Pasios et.al. | 2410.18238 | link |
| 2024-10-23 | Towards Safer Planetary Exploration: A Hybrid Architecture for Terrain Traversability Analysis in Mars Rovers | Achille Chiuchiarelli et.al. | 2410.17738 | null |
| 2024-10-23 | YOLOv11: An Overview of the Key Architectural Enhancements | Rahima Khanam et.al. | 2410.17725 | null |
| 2024-10-23 | PLGS: Robust Panoptic Lifting with 3D Gaussian Splatting | Yu Wang et.al. | 2410.17505 | null |
| 2024-10-22 | EPContrast: Effective Point-level Contrastive Learning for Large-scale Point Cloud Understanding | Zhiyi Pan et.al. | 2410.17207 | null |
| 2024-10-22 | LIMIS: Towards Language-based Interactive Medical Image Segmentation | Lena Heinemann et.al. | 2410.16939 | null |
| 2024-10-22 | DI-MaskDINO: A Joint Object Detection and Instance Segmentation Model | Zhixiong Nan et.al. | 2410.16707 | null |
| 2024-10-22 | SERN: Simulation-Enhanced Realistic Navigation for Multi-Agent Robotic Systems in Contested Environments | Jumman Hossain et.al. | 2410.16686 | null |
| 2024-10-22 | NucleiMix: Realistic Data Augmentation for Nuclei Instance Segmentation | Jiamu Wang et.al. | 2410.16671 | null |
| 2024-10-21 | PlaneSAM: Multimodal Plane Instance Segmentation Using the Segment Anything Model | Zhongchen Deng et.al. | 2410.16545 | null |
| 2024-10-21 | TIPS: Text-Image Pretraining with Spatial Awareness | Kevis-Kokitsi Maninis et.al. | 2410.16512 | link |
| 2024-10-21 | GenGMM: Generalized Gaussian-Mixture-based Domain Adaptation Model for Semantic Segmentation | Nazanin Moradinasab et.al. | 2410.16485 | null |
| 2024-10-21 | Integrated Image-Text Based on Semi-supervised Learning for Small Sample Instance Segmentation | Ruting Chi et.al. | 2410.16063 | null |
| 2024-10-21 | LiOn-XA: Unsupervised Domain Adaptation via LiDAR-Only Cross-Modal Adversarial Training | Thomas Kreutz et.al. | 2410.15833 | link |
| 2024-10-21 | TALoS: Enhancing Semantic Scene Completion via Test-time Adaptation on the Line of Sight | Hyun-Kurl Jang et.al. | 2410.15674 | link |
| 2024-10-21 | Deep Learning and Machine Learning – Object Detection and Semantic Segmentation: From Theory to Applications | Jintao Ren et.al. | 2410.15584 | null |
| 2024-10-20 | Multi-Layer Feature Fusion with Cross-Channel Attention-Based U-Net for Kidney Tumor Segmentation | Fnu Neha et.al. | 2410.15472 | null |
| 2024-10-20 | Improving 3D Medical Image Segmentation at Boundary Regions using Local Self-attention and Global Volume Mixing | Daniya Najiha Abdul Kareem et.al. | 2410.15360 | null |
| 2024-10-18 | On the Influence of Shape, Texture and Color for Learning Semantic Segmentation | Annika Mütze et.al. | 2410.14878 | null |
| 2024-10-18 | Automated Road Extraction from Satellite Imagery Integrating Dense Depthwise Dilated Separable Spatial Pyramid Pooling with DeepLabV3+ | Arpan Mahara et.al. | 2410.14836 | null |
| 2024-10-18 | Impact of imperfect annotations on CNN training and performance for instance segmentation and classification in digital pathology | Laura Gálvez Jiménez et.al. | 2410.14365 | null |
| 2024-10-17 | ARKit LabelMaker: A New Scale for Indoor 3D Scene Understanding | Guangda Ji et.al. | 2410.13924 | null |
| 2024-10-17 | Multi-style conversion for semantic segmentation of lesions in fundus images by adversarial attacks | Clément Playout et.al. | 2410.13822 | link |
| 2024-10-18 | Enhanced Prompt-leveraged Weakly Supervised Cancer Segmentation based on Segment Anything | Joonhyeon Song et.al. | 2410.13621 | link |
| 2024-10-17 | Day-Night Adaptation: An Innovative Source-free Adaptation Framework for Medical Image Segmentation | Ziyang Chen et.al. | 2410.13472 | null |
| 2024-10-17 | SiamSeg: Self-Training with Contrastive Learning for Unsupervised Domain Adaptation in Remote Sensing | Bin Wang et.al. | 2410.13471 | link |
| 2024-10-17 | Railway LiDAR semantic segmentation based on intelligent semi-automated data annotation | Florian Wulff et.al. | 2410.13383 | null |
| 2024-10-17 | LESS: Label-Efficient and Single-Stage Referring 3D Segmentation | Xuexun Liu et.al. | 2410.13294 | null |
| 2024-10-17 | Adversarial Neural Networks in Medical Imaging Advancements and Challenges in Semantic Segmentation | Houze Liu et.al. | 2410.13099 | null |
| 2024-10-16 | Task Consistent Prototype Learning for Incremental Few-shot Semantic Segmentation | Wenbo Xu et.al. | 2410.13094 | null |
| 2024-10-16 | Configurable Embodied Data Generation for Class-Agnostic RGB-D Video Segmentation | Anthony Opipari et.al. | 2410.12995 | null |
| 2024-10-16 | Risk Assessment for Autonomous Landing in Urban Environments using Semantic Segmentation | Jesús Alejandro Loera-Ponce et.al. | 2410.12988 | null |
| 2024-10-16 | VividMed: Vision Language Model with Versatile Visual Grounding for Medicine | Lingxiao Luo et.al. | 2410.12694 | null |
| 2024-10-16 | Cascade learning in multi-task encoder-decoder networks for concurrent bone segmentation and glenohumeral joint assessment in shoulder CT scans | Luca Marsilio et.al. | 2410.12641 | null |
| 2024-10-16 | Order-Aware Interactive Segmentation | Bin Wang et.al. | 2410.12214 | null |
| 2024-10-16 | SAM-Guided Masked Token Prediction for 3D Scene Understanding | Zhimin Chen et.al. | 2410.12158 | null |
| 2024-10-15 | WeatherDG: LLM-assisted Procedural Weather Generation for Domain-Generalized Semantic Segmentation | Chenghao Qian et.al. | 2410.12075 | null |
| 2024-10-15 | Development and Testing of a Wood Panels Bark Removal Equipment Based on Deep Learning | Rijun Wang et.al. | 2410.11913 | null |
| 2024-10-15 | Fractal Calibration for long-tailed object detection | Konstantinos Panagiotis Alexandridis et.al. | 2410.11774 | null |
| 2024-10-15 | RClicks: Realistic Click Simulation for Benchmarking Interactive Segmentation | Anton Antonov et.al. | 2410.11722 | link |
| 2024-10-15 | InvSeg: Test-Time Prompt Inversion for Semantic Segmentation | Jiayi Lin et.al. | 2410.11473 | null |
| 2024-10-15 | MANet: Fine-Tuning Segment Anything Model for Multimodal Remote Sensing Semantic Segmentation | Xianping Ma et.al. | 2410.11160 | link |
| 2024-10-14 | Locality Alignment Improves Vision-Language Models | Ian Covert et.al. | 2410.11087 | null |
| 2024-10-14 | Condition-Aware Multimodal Fusion for Robust Semantic Perception of Driving Scenes | Tim Broedermann et.al. | 2410.10791 | null |
| 2024-10-14 | UniMatch V2: Pushing the Limit of Semi-Supervised Semantic Segmentation | Lihe Yang et.al. | 2410.10777 | link |
| 2024-10-14 | PCF-Lift: Panoptic Lifting by Probabilistic Contrastive Fusion | Runsong Zhu et.al. | 2410.10659 | link |
| 2024-10-14 | Exploiting Local Features and Range Images for Small Data Real-Time Point Cloud Semantic Segmentation | Daniel Fusaro et.al. | 2410.10510 | link |
| 2024-10-14 | LKASeg: Remote-Sensing Image Semantic Segmentation with Large Kernel Attention and Full-Scale Skip Connections | Xuezhi Xiang et.al. | 2410.10433 | null |
| 2024-10-14 | V2M: Visual 2-Dimensional Mamba for Image Representation Learning | Chengkun Wang et.al. | 2410.10382 | link |
| 2024-10-14 | GlobalMamba: Global Image Serialization for Vision Mamba | Chengkun Wang et.al. | 2410.10316 | link |
| 2024-10-13 | UnSeg: One Universal Unlearnable Example Generator is Enough against All Image Segmentation | Ye Sun et.al. | 2410.09909 | null |
| 2024-10-13 | AM-SAM: Automated Prompting and Mask Calibration for Segment Anything Model | Yuchen Li et.al. | 2410.09714 | null |
| 2024-10-12 | An Expeditious Spatial Mean Radiant Temperature Mapping Framework using Visual SLAM and Semantic Segmentation | Wei Liang et.al. | 2410.09443 | null |
| 2024-10-11 | Parallel Watershed Partitioning: GPU-Based Hierarchical Image Segmentation | Varduhi Yeghiazaryan et.al. | 2410.08946 | null |
| 2024-10-11 | Uncertainty Estimation and Out-of-Distribution Detection for LiDAR Scene Semantic Segmentation | Hanieh Shojaei et.al. | 2410.08687 | null |
| 2024-10-11 | DeBiFormer: Vision Transformer with Deformable Agent Bi-level Routing Attention | Nguyen Huu Bao Long et.al. | 2410.08582 | link |
| 2024-10-10 | Are We Ready for Real-Time LiDAR Semantic Segmentation in Autonomous Driving? | Samir Abou Haidar et.al. | 2410.08365 | null |
| 2024-10-10 | Interactive4D: Interactive 4D LiDAR Segmentation | Ilya Fradlin et.al. | 2410.08206 | null |
| 2024-10-10 | Distribution Guidance Network for Weakly Supervised Point Cloud Semantic Segmentation | Zhiyi Pan et.al. | 2410.08091 | null |
| 2024-10-10 | Shift and matching queries for video semantic segmentation | Tsubasa Mizuno et.al. | 2410.07635 | null |
| 2024-10-10 | 3D Vision-Language Gaussian Splatting | Qucheng Peng et.al. | 2410.07577 | null |
| 2024-10-09 | Segmenting objects with Bayesian fusion of active contour models and convnet priors | Przemyslaw Polewski et.al. | 2410.07421 | null |
| 2024-10-11 | Bridge the Points: Graph-based Few-shot Segment Anything Semantically | Anqi Zhang et.al. | 2410.06964 | null |
| 2024-10-09 | Learning from Spatio-temporal Correlation for Semi-Supervised LiDAR Semantic Segmentation | Seungho Lee et.al. | 2410.06893 | null |
| 2024-10-09 | Rethinking the Evaluation of Visible and Infrared Image Fusion | Dayan Guan et.al. | 2410.06811 | link |
| 2024-10-10 | QuadMamba: Learning Quadtree-based Selective Scan for Visual State Space Model | Fei Xie et.al. | 2410.06806 | link |
| 2024-10-09 | Transesophageal Echocardiography Generation using Anatomical Models | Emmanuel Oladokun et.al. | 2410.06781 | null |
| 2024-10-09 | Evaluating the Impact of Point Cloud Colorization on Semantic Segmentation Accuracy | Qinfeng Zhu et.al. | 2410.06725 | null |
| 2024-10-09 | Open-RGBT: Open-vocabulary RGB-T Zero-shot Semantic Segmentation in Open-world Environments | Meng Yu et.al. | 2410.06626 | null |
| 2024-10-09 | Towards Natural Image Matting in the Wild via Real-Scenario Prior | Ruihao Xia et.al. | 2410.06593 | link |
| 2024-10-08 | Adver-City: Open-Source Multi-Modal Dataset for Collaborative Perception Under Adverse Weather Conditions | Mateus Karvat et.al. | 2410.06380 | null |
| 2024-10-08 | Training-Free Open-Ended Object Detection and Segmentation via Attention as Prompts | Zhiwei Lin et.al. | 2410.05963 | null |
| 2024-10-07 | Low-Rank Continual Pyramid Vision Transformer: Incrementally Segment Whole-Body Organs in CT with Light-Weighted Adaptation | Vince Zhu et.al. | 2410.04689 | null |
| 2024-10-06 | In-Place Panoptic Radiance Field Segmentation with Perceptual Prior for 3D Scene Understanding | Shenghao Li et.al. | 2410.04529 | null |
| 2024-10-05 | ETHcavation: A Dataset and Pipeline for Panoptic Scene Understanding and Object Tracking in Dynamic Construction Environments | Lorenzo Terenzi et.al. | 2410.04250 | null |
| 2024-10-04 | SpecSAR-Former: A Lightweight Transformer-based Network for Global LULC Mapping Using Integrated Sentinel-1 and Sentinel-2 | Hao Yu et.al. | 2410.03962 | null |
| 2024-10-04 | Not All Diffusion Model Activations Have Been Evaluated as Discriminative Features | Benyuan Meng et.al. | 2410.03558 | link |
| 2024-10-04 | Semantic Segmentation Based Quality Control of Histopathology Whole Slide Images | Abhijeet Patil et.al. | 2410.03289 | link |
| 2024-10-04 | HRVMamba: High-Resolution Visual State Space Model for Dense Prediction | Hao Zhang et.al. | 2410.03174 | null |
| 2024-10-03 | HiFiSeg: High-Frequency Information Enhanced Polyp Segmentation with Global-Local Vision Transformer | Jingjing Ren et.al. | 2410.02528 | null |
| 2024-10-06 | SynCo: Synthetic Hard Negatives in Contrastive Learning for Better Unsupervised Visual Representations | Nikolaos Giakoumoglou et.al. | 2410.02401 | link |
| 2024-10-04 | Unleashing the Potential of the Diffusion Model in Few-shot Semantic Segmentation | Muzhi Zhu et.al. | 2410.02369 | null |
| 2024-10-03 | ProtoSeg: A Prototype-Based Point Cloud Instance Segmentation Method | Remco Royen et.al. | 2410.02352 | null |
| 2024-10-03 | RESSCAL3D++: Joint Acquisition and Semantic Segmentation of 3D Point Clouds | Remco Royen et.al. | 2410.02323 | null |
| 2024-10-03 | Efficient Semantic Segmentation via Lightweight Multiple-Information Interaction Network | Yangyang Qiu et.al. | 2410.02224 | null |
| 2024-10-03 | Adapting Segment Anything Model to Melanoma Segmentation in Microscopy Slide Images | Qingyuan Liu et.al. | 2410.02207 | null |
| 2024-10-02 | SegEarth-OV: Towards Traning-Free Open-Vocabulary Segmentation for Remote Sensing Images | Kaiyu Li et.al. | 2410.01768 | link |
| 2024-10-02 | One-Shot Robust Imitation Learning for Long-Horizon Visuomotor Tasks from Unsegmented Demonstrations | Shaokang Wu et.al. | 2410.01630 | null |
| 2024-10-02 | Cognition Transferring and Decoupling for Text-supervised Egocentric Semantic Segmentation | Zhaofeng Shi et.al. | 2410.01341 | null |
| 2024-10-02 | VectorGraphNET: Graph Attention Networks for Accurate Segmentation of Complex Technical Drawings | Andrea Carrara et.al. | 2410.01336 | null |
| 2024-10-01 | RobustEMD: Domain Robust Matching for Cross-domain Few-shot Medical Image Segmentation | Yazhou Zhu et.al. | 2410.01110 | null |
| 2024-10-01 | Semantic Segmentation of Unmanned Aerial Vehicle Remote Sensing Images using SegFormer | Vlatko Spasev et.al. | 2410.01092 | null |
| 2024-10-01 | Deep Nets with Subsampling Layers Unwittingly Discard Useful Activations at Test-Time | Chiao-An Yang et.al. | 2410.01083 | link |
| 2024-10-01 | DeepAerialMapper: Deep Learning-based Semi-automatic HD Map Creation for Highly Automated Vehicles | Robert Krajewski et.al. | 2410.00769 | null |
| 2024-10-01 | Optimizing Drug Delivery in Smart Pharmacies: A Novel Framework of Multi-Stage Grasping Network Combined with Adaptive Robotics Mechanism | Rui Tang et.al. | 2410.00753 | null |
| 2024-10-01 | Can We Remove the Ground? Obstacle-aware Point Cloud Compression for Remote Object Detection | Pengxi Zeng et.al. | 2410.00582 | null |
| 2024-09-30 | AUCSeg: AUC-oriented Pixel-level Long-tail Semantic Segmentation | Boyu Han et.al. | 2409.20398 | null |
| 2024-09-30 | Leveraging CAM Algorithms for Explaining Medical Semantic Segmentation | Tillmann Rheude et.al. | 2409.20287 | link |
| 2024-09-30 | Erase, then Redraw: A Novel Data Augmentation Approach for Free Space Detection Using Diffusion Model | Fulong Ma et.al. | 2409.20164 | null |
| 2024-09-30 | Segmenting Wood Rot using Computer Vision Models | Roland Kammerbauer et.al. | 2409.20137 | null |
| 2024-09-30 | Towards Open-Vocabulary Semantic Segmentation Without Semantic Labels | Heeseong Shin et.al. | 2409.19846 | null |
| 2024-09-27 | ProMerge: Prompt and Merge for Unsupervised Instance Segmentation | Dylan Li et.al. | 2409.18961 | null |
| 2024-09-27 | Excavating in the Wild: The GOOSE-Ex Dataset for Semantic Segmentation | Raphael Hagmanns et.al. | 2409.18788 | null |
| 2024-09-27 | Learning from Pattern Completion: Self-supervised Controllable Generation | Zhiqiang Chen et.al. | 2409.18694 | link |
| 2024-09-27 | Reducing Semantic Ambiguity In Domain Adaptive Semantic Segmentation Via Probabilistic Prototypical Pixel Contrast | Xiaoke Hao et.al. | 2409.18543 | link |
| 2024-10-01 | Get It For Free: Radar Segmentation without Expert Labels and Its Application in Odometry and Localization | Siru Li et.al. | 2409.18434 | null |
| 2024-09-27 | Search3D: Hierarchical Open-Vocabulary 3D Segmentation | Ayca Takmaz et.al. | 2409.18431 | null |
| 2024-09-26 | Efficient Microscopic Image Instance Segmentation for Food Crystal Quality Control | Xiaoyu Ji et.al. | 2409.18291 | null |
| 2024-09-26 | Amodal Instance Segmentation with Diffusion Shape Prior Estimation | Minh Tran et.al. | 2409.18256 | null |
| 2024-09-26 | Hierarchical End-to-End Autonomous Driving: Integrating BEV Perception with Deep Reinforcement Learning | Siyi Lu et.al. | 2409.17659 | null |
| 2024-09-26 | Global-Local Medical SAM Adaptor Based on Full Adaption | Meng Wang et.al. | 2409.17486 | null |
| 2024-09-25 | VL4AD: Vision-Language Models Improve Pixel-wise Anomaly Detection | Liangyu Zhong et.al. | 2409.17330 | null |
| 2024-09-25 | 2024 BRAVO Challenge Track 1 1st Place Report: Evaluating Robustness of Vision Foundation Models for Semantic Segmentation | Tommie Kerssies et.al. | 2409.17208 | link |
| 2024-09-25 | WasteGAN: Data Augmentation for Robotic Waste Sorting through Generative Adversarial Networks | Alberto Bacchin et.al. | 2409.16999 | link |
| 2024-09-25 | Going Beyond U-Net: Assessing Vision Transformers for Semantic Segmentation in Microscopy Image Analysis | Illia Tsiporenko et.al. | 2409.16940 | null |
| 2024-09-24 | A novel open-source ultrasound dataset with deep learning benchmarks for spinal cord injury localization and anatomical segmentation | Avisha Kumar et.al. | 2409.16441 | null |
| 2024-09-24 | Instance Segmentation of Reinforced Concrete Bridges with Synthetic Point Clouds | Asad Ur Rahman et.al. | 2409.16381 | null |
| 2024-09-24 | Semantic Refocused Tuning for Open-Vocabulary Panoptic Segmentation | Yong Xien Chng et.al. | 2409.16278 | null |
| 2024-09-24 | Fields of The World: A Machine Learning Benchmark Dataset For Global Agricultural Field Boundary Segmentation | Hannah Kerner et.al. | 2409.16252 | link |
| 2024-09-24 | Deep Learning for Precision Agriculture: Post-Spraying Evaluation and Deposition Estimation | Harry Rogers et.al. | 2409.16213 | link |
| 2024-09-24 | Potential Field as Scene Affordance for Behavior Change-Based Visual Risk Object Identification | Pang-Yuan Pao et.al. | 2409.15846 | null |
| 2024-09-24 | Layer-wise Model Merging for Unsupervised Domain Adaptation in Segmentation Tasks | Roberto Alcover-Couso et.al. | 2409.15813 | null |
| 2024-09-24 | DIAL: Dense Image-text ALignment for Weakly Supervised Semantic Segmentation | Soojin Jang et.al. | 2409.15801 | null |
| 2024-09-24 | Autonomous Hiking Trail Navigation via Semantic Segmentation and Geometric Analysis | Camndon Reed et.al. | 2409.15671 | null |
| 2024-09-23 | Adapting Segment Anything Model for Unseen Object Instance Segmentation | Rui Cao et.al. | 2409.15481 | null |
| 2024-09-23 | ZeroSCD: Zero-Shot Street Scene Change Detection | Shyam Sundar Kannan et.al. | 2409.15255 | null |
| 2024-09-23 | Diffusion-based RGB-D Semantic Segmentation with Deformable Attention Transformer | Minh Bui et.al. | 2409.15117 | null |
| 2024-09-18 | Applications of Knowledge Distillation in Remote Sensing: A Survey | Yassine Himeur et.al. | 2409.12111 | null |
| 2024-09-18 | Panoptic-Depth Forecasting | Juana Valeria Hurtado et.al. | 2409.12008 | null |
| 2024-09-18 | Particle-based Instance-aware Semantic Occupancy Mapping in Dynamic Environments | Gang Chen et.al. | 2409.11975 | null |
| 2024-09-17 | Uncertainty and Prediction Quality Estimation for Semantic Segmentation via Graph Neural Networks | Edgar Heinert et.al. | 2409.11373 | null |
| 2024-09-17 | MSDNet: Multi-Scale Decoder for Few-Shot Semantic Segmentation via Transformer-Guided Prototyping | Amirreza Fateh et.al. | 2409.11316 | link |
| 2024-09-17 | Generalized Few-Shot Semantic Segmentation in Remote Sensing: Challenge and Benchmark | Clifford Broni-Bediako et.al. | 2409.11227 | link |
| 2024-09-17 | HS3-Bench: A Benchmark and Strong Baseline for Hyperspectral Semantic Segmentation in Driving Scenarios | Nick Theisen et.al. | 2409.11205 | link |
| 2024-09-16 | Are Deep Learning Models Robust to Partial Object Occlusion in Visual Recognition Tasks? | Kaleb Kassaw et.al. | 2409.10775 | null |
| 2024-09-16 | Frequency-Guided Masking for Enhanced Vision Self-Supervised Learning | Amin Karimi Monsefi et.al. | 2409.10362 | null |
| 2024-09-16 | BAFNet: Bilateral Attention Fusion Network for Lightweight Semantic Segmentation of Urban Remote Sensing Images | Wentao Wang et.al. | 2409.10269 | null |
| 2024-09-15 | Semantic2D: A Semantic Dataset for 2D Lidar Semantic Segmentation | Zhanteng Xie et.al. | 2409.09899 | null |
| 2024-09-15 | Resolving Inconsistent Semantics in Multi-Dataset Image Segmentation | Qilong Zhangli et.al. | 2409.09893 | null |
| 2024-09-15 | High Definition Map Mapping and Update: A General Overview and Future Directions | Benny Wijaya et.al. | 2409.09726 | null |
| 2024-09-14 | One missing piece in Vision and Language: A Survey on Comics Understanding | Emanuele Vivoli et.al. | 2409.09502 | link |
| 2024-09-14 | Multi-Scale Grouped Prototypes for Interpretable Semantic Segmentation | Hugo Porta et.al. | 2409.09497 | null |
| 2024-09-14 | LACOSTE: Exploiting stereo and temporal contexts for surgical instrument segmentation | Qiyuan Wang et.al. | 2409.09360 | null |
| 2024-09-16 | QueryCAD: Grounded Question Answering for CAD Models | Claudius Kienle et.al. | 2409.08704 | null |
| 2024-09-13 | AWF: Adaptive Weight Fusion for Enhanced Class Incremental Semantic Segmentation | Zechao Sun et.al. | 2409.08516 | null |
| 2024-09-13 | VistaFormer: Scalable Vision Transformers for Satellite Image Time Series Segmentation | Ezra MacDonald et.al. | 2409.08461 | link |
| 2024-09-12 | Dynamic Prompting of Frozen Text-to-Image Diffusion Models for Panoptic Narrative Grounding | Hongyu Li et.al. | 2409.08251 | null |
| 2024-09-12 | Bayesian Self-Training for Semi-Supervised 3D Segmentation | Ozan Unal et.al. | 2409.08102 | null |
| 2024-09-12 | Depth Matters: Exploring Deep Interactions of RGB-D for Semantic Segmentation in Traffic Scenes | Siyu Chen et.al. | 2409.07995 | null |
| 2024-09-12 | UNIT: Unsupervised Online Instance Segmentation through Time | Corentin Sautier et.al. | 2409.07887 | null |
| 2024-09-12 | SURGIVID: Annotation-Efficient Surgical Video Object Discovery | Çağhan Köksal et.al. | 2409.07801 | null |
| 2024-09-12 | Lagrange Duality and Compound Multi-Attention Transformer for Semi-Supervised Medical Image Segmentation | Fuchen Zheng et.al. | 2409.07793 | link |
| 2024-09-12 | ASSNet: Adaptive Semantic Segmentation Network for Microtumors and Multi-Organ Segmentation | Fuchen Zheng et.al. | 2409.07779 | link |
| 2024-09-12 | Open-Vocabulary Remote Sensing Image Semantic Segmentation | Qinglong Cao et.al. | 2409.07683 | null |
| 2024-09-11 | Token Turing Machines are Efficient Vision Models | Purvish Jajal et.al. | 2409.07613 | null |
| 2024-09-11 | AC-IND: Sparse CT reconstruction based on attenuation coefficient estimation and implicit neural distribution | Wangduo Xie et.al. | 2409.07171 | null |
| 2024-09-11 | Insight Any Instance: Promptable Instance Segmentation for Remote Sensing Images | Xuexue Li et.al. | 2409.07022 | null |
| 2024-09-11 | Brain-Inspired Stepwise Patch Merging for Vision Transformers | Yonghao Yu et.al. | 2409.06963 | null |
| 2024-09-10 | Cross-Modal Self-Supervised Learning with Effective Contrastive Units for LiDAR Point Clouds | Mu Cai et.al. | 2409.06827 | link |
| 2024-09-10 | A Semantic Segmentation Approach on Sweet Orange Leaf Diseases Detection Utilizing YOLO | Sabit Ahamed Preanto et.al. | 2409.06671 | null |
| 2024-09-10 | Towards Localizing Structural Elements: Merging Geometrical Detection with Semantic Verification in RGB-D Data | Ali Tourani et.al. | 2409.06625 | null |
| 2024-09-10 | PPMamba: A Pyramid Pooling Local Auxiliary SSM-Based Model for Remote Sensing Image Semantic Segmentation | Yin Hu et.al. | 2409.06309 | null |
| 2024-09-10 | EDADepth: Enhanced Data Augmentation for Monocular Depth Estimation | Nischal Khanal et.al. | 2409.06183 | link |
| 2024-09-09 | SVS-GAN: Leveraging GANs for Semantic Video Synthesis | Khaled M. Seyam et.al. | 2409.06074 | null |
| 2024-09-09 | Enhanced Generative Data Augmentation for Semantic Segmentation via Stronger Guidance | Quang-Huy Che et.al. | 2409.06002 | null |
| 2024-09-09 | Segmentation by Factorization: Unsupervised Semantic Segmentation for Pathology by Factorizing Foundation Model Features | Jacob Gildenblat et.al. | 2409.05697 | null |
| 2024-09-09 | ICPR 2024 Competition on Safe Segmentation of Drive Scenes in Unstructured Traffic and Adverse Weather Conditions | Furqan Ahmed Shaik et.al. | 2409.05327 | null |
| 2024-09-08 | RCBEVDet++: Toward High-accuracy Radar-Camera Fusion 3D Perception Network | Zhiwei Lin et.al. | 2409.04979 | null |
| 2024-09-06 | Train Till You Drop: Towards Stable and Robust Source-free Unsupervised 3D Domain Adaptation | Björn Michele et.al. | 2409.04409 | link |
| 2024-09-06 | Advancing SEM Based Nano-Scale Defect Analysis in Semiconductor Manufacturing for Advanced IC Nodes | Bappaditya Dey et.al. | 2409.04310 | null |
| 2024-09-06 | CISCA and CytoDArk0: a Cell Instance Segmentation and Classification method for histo(patho)logical image Analyses and a new, open, Nissl-stained dataset for brain cytoarchitecture studies | Valentina Vadori et.al. | 2409.04175 | null |
| 2024-09-05 | Foundation Model or Finetune? Evaluation of few-shot semantic segmentation for river pollution | Marga Don et.al. | 2409.03754 | link |
| 2024-09-05 | MaskVal: Simple but Effective Uncertainty Quantification for 6D Pose Estimation | Philipp Quentin et.al. | 2409.03556 | null |
| 2024-09-05 | LowFormer: Hardware Efficient Design for Convolutional Transformer Backbones | Moritz Nottebaum et.al. | 2409.03460 | link |
| 2024-09-05 | Automatic occlusion removal from 3D maps for maritime situational awareness | Felix Sattler et.al. | 2409.03451 | null |
| 2024-09-05 | Training-free Conversion of Pretrained ANNs to SNNs for Low-Power and High-Performance Applications | Tong Bu et.al. | 2409.03368 | null |
| 2024-09-05 | MouseSIS: A Frames-and-Events Dataset for Space-Time Instance Segmentation of Mice | Friedhelm Hamann et.al. | 2409.03358 | null |
| 2024-09-05 | UAV (Unmanned Aerial Vehicles): Diverse Applications of UAV Datasets in Segmentation, Classification, Detection, and Tracking | Md. Mahfuzur Rahman et.al. | 2409.03245 | null |
| 2024-09-05 | Labeled-to-Unlabeled Distribution Alignment for Partially-Supervised Multi-Organ Medical Image Segmentation | Xixi Jiang et.al. | 2409.03228 | link |
| 2024-09-05 | iSeg: An Iterative Refinement-based Framework for Training-free Segmentation | Lin Sun et.al. | 2409.03209 | link |
| 2024-09-04 | iConFormer: Dynamic Parameter-Efficient Tuning with Input-Conditioned Adaptation | Hayeon Jo et.al. | 2409.02838 | null |
| 2024-09-04 | CLDA: Collaborative Learning for Enhanced Unsupervised Domain Adaptation | Minhee Cho et.al. | 2409.02699 | null |
| 2024-09-04 | Evaluation Study on SAM 2 for Class-agnostic Instance-level Segmentation | Tiantian Zhang et.al. | 2409.02567 | null |
| 2024-09-04 | SG-MIM: Structured Knowledge Guided Efficient Pre-training for Dense Prediction | Sumin Son et.al. | 2409.02513 | null |
| 2024-09-03 | K-Origins: Better Colour Quantification for Neural Networks | Lewis Mason et.al. | 2409.02281 | null |
| 2024-09-03 | AllWeatherNet: Unified Image enhancement for autonomous driving under adverse weather and lowlight-conditions | Chenghao Qian et.al. | 2409.02045 | null |
| 2024-09-03 | MetaFood3D: Large 3D Food Object Dataset with Nutrition Values | Yuhao Chen et.al. | 2409.01966 | null |
| 2024-09-03 | Segmenting Object Affordances: Reproducibility and Sensitivity to Scale | Tommaso Apicella et.al. | 2409.01814 | link |
| 2024-09-03 | Efficiently Expanding Receptive Fields: Local Split Attention and Parallel Aggregation for Enhanced Large-scale Point Cloud Semantic Segmentation | Haodong Wang et.al. | 2409.01662 | null |
| 2024-09-02 | Semantic Segmentation from Image Labels by Reconstruction from Structured Decomposition | Xuanrui Zeng et.al. | 2409.01472 | link |
| 2024-08-30 | Generative AI Enables Medical Image Segmentation in Ultra Low-Data Regimes | Li Zhang et.al. | 2408.17421 | link |
| 2024-08-30 | Structuring a Training Strategy to Robustify Perception Models with Realistic Image Augmentations | Ahmed Hammam et.al. | 2408.17311 | null |
| 2024-08-30 | Stochastic Layer-Wise Shuffle: A Good Practice to Improve Vision Mamba Training | Zizheng Huang et.al. | 2408.17081 | link |
| 2024-08-30 | Transient Fault Tolerant Semantic Segmentation for Autonomous Driving | Leonardo Iurada et.al. | 2408.16952 | link |
| 2024-08-29 | Eigen-Cluster VIS: Improving Weakly-supervised Video Instance Segmentation by Leveraging Spatio-temporal Consistency | Farnoosh Arefi et.al. | 2408.16661 | link |
| 2024-08-29 | SODAWideNet++: Combining Attention and Convolutions for Salient Object Detection | Rohit Venkata Sai Dulam et.al. | 2408.16645 | null |
| 2024-08-29 | A Simple and Generalist Approach for Panoptic Segmentation | Nedyalko Prisadnikov et.al. | 2408.16504 | null |
| 2024-08-29 | MICDrop: Masking Image and Depth Features via Complementary Dropout for Domain-Adaptive Semantic Segmentation | Linyan Yang et.al. | 2408.16478 | null |
| 2024-08-29 | Multi-source Domain Adaptation for Panoramic Semantic Segmentation | Jing Jiang et.al. | 2408.16469 | null |
| 2024-08-29 | EvLight++: Low-Light Video Enhancement with an Event Camera: A Large-Scale Real-World Dataset, Novel Method, and More | Kanghao Chen et.al. | 2408.16254 | null |
| 2024-08-28 | InstanSeg: an embedding-based instance segmentation algorithm optimized for accurate, efficient and portable cell segmentation | Thibaut Goldsborough et.al. | 2408.15954 | link |
| 2024-08-28 | SpineMamba: Enhancing 3D Spinal Segmentation in Clinical Imaging through Residual Visual Mamba Layers and Shape Priors | Zhiqing Zhang et.al. | 2408.15887 | null |
| 2024-08-28 | DQFormer: Towards Unified LiDAR Panoptic Segmentation with Decoupled Queries | Yu Yang et.al. | 2408.15813 | null |
| 2024-08-28 | TeFF: Tracking-enhanced Forgetting-free Few-shot 3D LiDAR Semantic Segmentation | Junbao Zhou et.al. | 2408.15657 | link |
| 2024-08-27 | Handling Geometric Domain Shifts in Semantic Segmentation of Surgical RGB and Hyperspectral Images | Silvia Seidlitz et.al. | 2408.15373 | link |
| 2024-08-27 | An Investigation on The Position Encoding in Vision-Based Dynamics Prediction | Jiageng Zhu et.al. | 2408.15201 | null |
| 2024-08-27 | Knowledge Discovery in Optical Music Recognition: Enhancing Information Retrieval with Instance Segmentation | Elona Shatri et.al. | 2408.15002 | null |
| 2024-08-27 | Applying ViT in Generalized Few-shot Semantic Segmentation | Liyuan Geng et.al. | 2408.14957 | link |
| 2024-08-27 | Adversarial Manhole: Challenging Monocular Depth Estimation and Semantic Segmentation Models with Patch Attack | Naufal Suryanto et.al. | 2408.14879 | null |
| 2024-08-27 | MROVSeg: Breaking the Resolution Curse of Vision-Language Models in Open-Vocabulary Semantic Segmentation | Yuanbing Zhu et.al. | 2408.14776 | null |
| 2024-08-26 | Physically Feasible Semantic Segmentation | Shamik Basu et.al. | 2408.14672 | link |
| 2024-08-26 | A Survey of Camouflaged Object Detection and Beyond | Fengyang Xiao et.al. | 2408.14562 | null |
| 2024-08-26 | Satellite Sunroof: High-res Digital Surface Models and Roof Segmentation for Global Solar Mapping | Vishal Batchu et.al. | 2408.14400 | null |
| 2024-08-25 | OpenNav: Efficient Open Vocabulary 3D Object Detection for Smart Wheelchair Navigation | Muhammad Rameez ur Rahman et.al. | 2408.13936 | link |
| 2024-08-25 | Exploring Reliable Matching with Phase Enhancement for Night-time Semantic Segmentation | Yuwen Pan et.al. | 2408.13838 | null |
| 2024-08-25 | TripleMixer: A 3D Point Cloud Denoising Model for Adverse Weather | Xiongwei Zhao et.al. | 2408.13802 | link |
| 2024-08-25 | ICFRNet: Image Complexity Prior Guided Feature Refinement for Real-time Semantic Segmentation | Xin Zhang et.al. | 2408.13771 | null |
| 2024-08-25 | Localization and Expansion: A Decoupled Framework for Point Cloud Few-shot Semantic Segmentation | Zhaoyang Li et.al. | 2408.13752 | null |
| 2024-08-24 | ESA: Annotation-Efficient Active Learning for Semantic Segmentation | Jinchao Ge et.al. | 2408.13491 | link |
| 2024-08-23 | Accuracy Improvement of Cell Image Segmentation Using Feedback Former | Hinako Mitsuoka et.al. | 2408.12974 | null |
| 2024-08-23 | Image Segmentation in Foundation Model Era: A Survey | Tianfei Zhou et.al. | 2408.12957 | null |
| 2024-08-23 | Symmetric masking strategy enhances the performance of Masked Image Modeling | Khanh-Binh Nguyen et.al. | 2408.12772 | null |
| 2024-08-22 | Scribbles for All: Benchmarking Scribble Supervised Segmentation Across Datasets | Wolfgang Boettcher et.al. | 2408.12489 | null |
| 2024-08-22 | The 2nd Solution for LSVOS Challenge RVOS Track: Spatial-temporal Refinement for Consistent Semantic Segmentation | Tuyen Tran et.al. | 2408.12447 | null |
| 2024-08-22 | ISETHDR: A Physics-based Synthetic Radiance Dataset for High Dynamic Range Driving Scenes | Zhenyi Liu et.al. | 2408.12048 | link |
| 2024-08-21 | EmbodiedSAM: Online Segment Any 3D Thing in Real Time | Xiuwei Xu et.al. | 2408.11811 | link |
| 2024-08-21 | NuSegDG: Integration of Heterogeneous Space and Gaussian Kernel for Domain-Generalized Nuclei Segmentation | Zhenye Lou et.al. | 2408.11787 | link |
| 2024-08-21 | Open-Ended 3D Point Cloud Instance Segmentation | Phuc D. A. Nguyen et.al. | 2408.11747 | null |
| 2024-08-21 | UNetMamba: Efficient UNet-Like Mamba for Semantic Segmentation of High-Resolution Remote Sensing Images | Enze Zhu et.al. | 2408.11545 | null |
| 2024-08-22 | SAM-REF: Rethinking Image-Prompt Synergy for Refinement in Segment Anything | Chongkai Yu et.al. | 2408.11535 | null |
| 2024-08-21 | Exploring Scene Coherence for Semi-Supervised 3D Semantic Segmentation | Chuandong Liu et.al. | 2408.11280 | null |
| 2024-08-20 | An Interpretable Deep Learning Approach for Morphological Script Type Analysis | Malamatenia Vlachou-Efstathiou et.al. | 2408.11150 | null |
| 2024-08-20 | NeCo: Improving DINOv2’s spatial representations in 19 GPU hours with Patch Neighbor Consistency | Valentinos Pariza et.al. | 2408.11054 | link |
| 2024-08-20 | CO2Wounds-V2: Extended Chronic Wounds Dataset From Leprosy Patients | Karen Sanchez et.al. | 2408.10827 | null |
| 2024-08-20 | Vocabulary-Free 3D Instance Segmentation with Vision and Language Assistant | Guofeng Mei et.al. | 2408.10652 | null |
| 2024-08-20 | Rethinking Video Segmentation with Masked Video Consistency: Did the Model Learn as Intended? | Chen Liang et.al. | 2408.10627 | null |
| 2024-08-20 | Subspace Prototype Guidance for Mitigating Class Imbalance in Point Cloud Semantic Segmentation | Jiawei Han et.al. | 2408.10537 | link |
| 2024-08-21 | LSVOS Challenge 3rd Place Report: SAM2 and Cutie based VOS | Xinyu Liu et.al. | 2408.10469 | null |
| 2024-08-19 | Leveraging Superfluous Information in Contrastive Representation Learning | Xuechu Yu et.al. | 2408.10292 | null |
| 2024-08-19 | Imbalance-Aware Culvert-Sewer Defect Segmentation Using an Enhanced Feature Pyramid Network | Rasha Alshawi et.al. | 2408.10181 | null |
| 2024-08-19 | Dynamic Label Injection for Imbalanced Industrial Defect Segmentation | Emanuele Caruso et.al. | 2408.10031 | link |
| 2024-08-19 | Detecting Adversarial Attacks in Semantic Segmentation via Uncertainty Estimation: A Deep Analysis | Kira Maag et.al. | 2408.10021 | null |
| 2024-08-19 | DiscoNeRF: Class-Agnostic Object Field for 3D Object Discovery | Corentin Dumery et.al. | 2408.09928 | null |
| 2024-08-19 | 3D-Aware Instance Segmentation and Tracking in Egocentric Videos | Yash Bhalgat et.al. | 2408.09860 | null |
| 2024-08-19 | Segment-Anything Models Achieve Zero-shot Robustness in Autonomous Driving | Jun Yan et.al. | 2408.09839 | link |
| 2024-08-18 | OVOSE: Open-Vocabulary Semantic Segmentation in Event-Based Cameras | Muhammad Rameez Ur Rahman et.al. | 2408.09424 | link |
| 2024-08-18 | VrdONE: One-stage Video Visual Relation Detection | Xinjie Jiang et.al. | 2408.09408 | link |
| 2024-08-18 | Elite360M: Efficient 360 Multi-task Learning via Bi-projection Fusion and Cross-task Collaboration | Hao Ai et.al. | 2408.09336 | null |
| 2024-08-17 | Cross-Species Data Integration for Enhanced Layer Segmentation in Kidney Pathology | Junchao Zhu et.al. | 2408.09278 | link |
| 2024-08-16 | Zero-Shot Dual-Path Integration Framework for Open-Vocabulary 3D Instance Segmentation | Tri Ton et.al. | 2408.08591 | null |
| 2024-08-16 | Tuning a SAM-Based Model with Multi-Cognitive Visual Adapter to Remote Sensing Instance Segmentation | Linghao Zheng et.al. | 2408.08576 | null |
| 2024-08-16 | Tell Codec What Worth Compressing: Semantically Disentangled Image Coding for Machine with LMMs | Jinming Liu et.al. | 2408.08575 | null |
| 2024-08-15 | 5%>100%: Breaking Performance Shackles of Full Fine-Tuning on Visual Recognition Tasks | Dongshuo Yin et.al. | 2408.08345 | link |
| 2024-08-14 | MedTsLLM: Leveraging LLMs for Multimodal Medical Time Series Analysis | Nimeesha Chan et.al. | 2408.07773 | link |
| 2024-08-15 | MetaSeg: MetaFormer-based Global Contexts-aware Network for Efficient Semantic Segmentation | Beoungwoo Kang et.al. | 2408.07576 | link |
| 2024-08-15 | MagicFace: Training-free Universal-Style Human Image Customized Synthesis | Yibin Wang et.al. | 2408.07433 | null |
| 2024-08-14 | Segment Using Just One Example | Pratik Vora et.al. | 2408.07393 | null |
| 2024-08-14 | Ensemble architecture in polyp segmentation | Hao-Yun Hsu et.al. | 2408.07262 | link |
| 2024-08-14 | Leveraging Perceptual Scores for Dataset Pruning in Computer Vision Tasks | Raghavendra Singh et.al. | 2408.07243 | null |
| 2024-08-14 | Enhancing Autonomous Vehicle Perception in Adverse Weather through Image Augmentation during Semantic Segmentation Training | Ethan Kou et.al. | 2408.07239 | null |
| 2024-08-13 | ReCLIP++: Learn to Rectify the Bias of CLIP for Unsupervised Semantic Segmentation | Jingyun Wang et.al. | 2408.06747 | link |
| 2024-08-10 | Dilated Convolution with Learnable Spacings | Ismail Khalfaoui-Hassani et.al. | 2408.06383 | null |
| 2024-08-12 | Correlation Weighted Prototype-based Self-Supervised One-Shot Segmentation of Medical Images | Siladittya Manna et.al. | 2408.06235 | null |
| 2024-08-12 | A-BDD: Leveraging Data Augmentations for Safe Autonomous Driving in Adverse Weather and Lighting | Felix Assion et.al. | 2408.06071 | null |
| 2024-08-13 | ClickAttention: Click Region Similarity Guided Interactive Segmentation | Long Xu et.al. | 2408.06021 | null |
| 2024-08-12 | Enhancing 3D Transformer Segmentation Model for Medical Image with Token-level Representation Learning | Xinrong Hu et.al. | 2408.05889 | null |
| 2024-08-11 | Seg-CycleGAN: SAR-to-optical image translation guided by a downstream task | Hannuo Zhang et.al. | 2408.05777 | null |
| 2024-08-11 | MacFormer: Semantic Segmentation with Fine Object Boundaries | Guoan Xu et.al. | 2408.05699 | null |
| 2024-08-13 | Performance Evaluation of YOLOv8 Model Configurations, for Instance Segmentation of Strawberry Fruit Development Stages in an Open Field Environment | Abdul-Razak Alhassan Gamani et.al. | 2408.05661 | null |
| 2024-08-10 | Multimodal generative semantic communication based on latent diffusion model | Weiqi Fu et.al. | 2408.05455 | null |
| 2024-08-09 | PRISM Lite: A lightweight model for interactive 3D placenta segmentation in ultrasound | Hao Li et.al. | 2408.05372 | link |
| 2024-08-09 | In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation | Dahyun Kang et.al. | 2408.04961 | link |
| 2024-08-09 | ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation | Mengcheng Lan et.al. | 2408.04883 | link |
| 2024-08-09 | Extracting Signal Electron Trajectories in the COMET Phase-I Cylindrical Drift Chamber Using Deep Learning | Fumihiro Kaneko et.al. | 2408.04795 | null |
| 2024-08-08 | Embodied Uncertainty-Aware Object Segmentation | Xiaolin Fang et.al. | 2408.04760 | null |
| 2024-08-08 | SAM 2 in Robotic Surgery: An Empirical Evaluation for Robustness and Generalization in Surgical Video Segmentation | Jieming Yu et.al. | 2408.04593 | null |
| 2024-08-08 | Robust Approximate Characterization of Single-Cell Heterogeneity in Microbial Growth | Richard D. Paul et.al. | 2408.04501 | link |
| 2024-08-08 | SegXAL: Explainable Active Learning for Semantic Segmentation in Driving Scene Scenarios | Sriram Mandalika et.al. | 2408.04482 | null |
| 2024-08-08 | What could go wrong? Discovering and describing failure modes in computer vision | Gabriela Csurka et.al. | 2408.04471 | null |
| 2024-08-07 | Performance and Non-adversarial Robustness of the Segment Anything Model 2 in Surgical Video Segmentation | Yiqing Shen et.al. | 2408.04098 | null |
| 2024-08-07 | CAS-ViT: Convolutional Additive Self-attention Vision Transformers for Efficient Mobile Applications | Tianfang Zhang et.al. | 2408.03703 | link |
| 2024-08-07 | SAM2-PATH: A better segment anything model for semantic segmentation in digital pathology | Mingya Zhang et.al. | 2408.03651 | link |
| 2024-08-06 | Post-Mortem Human Iris Segmentation Analysis with Deep Learning | Afzal Hossain et.al. | 2408.03448 | null |
| 2024-08-06 | Comb, Prune, Distill: Towards Unified Pruning for Vision Model Compression | Jonas Schmitt et.al. | 2408.03046 | link |
| 2024-08-06 | Evaluation of Segment Anything Model 2: The Role of SAM2 in the Underwater Environment | Shijie Lian et.al. | 2408.02924 | link |
| 2024-08-05 | Scribble-Based Interactive Segmentation of Medical Hyperspectral Images | Zhonghao Wang et.al. | 2408.02708 | null |
| 2024-08-05 | Perception Matters: Enhancing Embodied AI with Uncertainty-Aware Semantic Segmentation | Sai Prasanna et.al. | 2408.02297 | null |
| 2024-08-05 | Cross-Domain Semantic Segmentation on Inconsistent Taxonomy using VLMs | Jeongkee Lim et.al. | 2408.02261 | null |
| 2024-08-05 | Curriculum learning based pre-training using Multi-Modal Contrastive Masked Autoencoders | Muhammad Abdullah Jamal et.al. | 2408.02245 | null |
| 2024-08-04 | Pixel-Level Domain Adaptation: A New Perspective for Enhancing Weakly Supervised Semantic Segmentation | Ye Du et.al. | 2408.02039 | null |
| 2024-08-03 | NuLite – Lightweight and Fast Model for Nuclei Instance Segmentation and Classification | Cristian Tommasino et.al. | 2408.01797 | null |
| 2024-08-03 | Bayesian Active Learning for Semantic Segmentation | Sima Didari et.al. | 2408.01694 | null |
| 2024-08-03 | A Comparative Analysis of CNN-based Deep Learning Models for Landslide Detection | Omkar Oak et.al. | 2408.01692 | null |
| 2024-08-03 | Leveraging GNSS and Onboard Visual Data from Consumer Vehicles for Robust Road Network Estimation | Balázs Opra et.al. | 2408.01640 | null |
| 2024-08-02 | Multi-Unit Floor Plan Recognition and Reconstruction Using Improved Semantic Segmentation of Raster-Wise Floor Plans | Lukas Kratochvila et.al. | 2408.01526 | null |
| 2024-08-02 | Balanced Residual Distillation Learning for 3D Point Cloud Class-Incremental Semantic Segmentation | Yuanzhi Su et.al. | 2408.01356 | null |
| 2024-08-02 | StitchFusion: Weaving Any Visual Modalities to Enhance Multimodal Semantic Segmentation | Bingyu Li et.al. | 2408.01343 | null |
| 2024-08-02 | Amodal Segmentation for Laparoscopic Surgery Video Instruments | Ruohua Shi et.al. | 2408.01067 | null |
| 2024-08-02 | Visible-Thermal Multiple Object Tracking: Large-scale Video Dataset and Progressive Fusion Approach | Yabin Zhu et.al. | 2408.00969 | null |
| 2024-08-01 | Medical SAM 2: Segment medical images as video via Segment Anything Model 2 | Jiayuan Zhu et.al. | 2408.00874 | link |
| 2024-08-01 | Leaf Angle Estimation using Mask R-CNN and LETR Vision Transformer | Venkat Margapuri et.al. | 2408.00749 | null |
| 2024-08-01 | Collaborative Vision-Text Representation Optimizing for Open-Vocabulary Segmentation | Siyu Jiao et.al. | 2408.00744 | link |
| 2024-08-01 | Synthetic dual image generation for reduction of labeling efforts in semantic segmentation of micrographs with a customized metric function | Matias Oscar Volman Stern et.al. | 2408.00707 | null |
| 2024-08-01 | AMAES: Augmented Masked Autoencoder Pretraining on Public Brain MRI Data for 3D-Native Segmentation | Asbjørn Munk et.al. | 2408.00640 | null |
| 2024-08-01 | SegStitch: Multidimensional Transformer for Robust and Efficient Medical Imaging Segmentation | Shengbo Tan et.al. | 2408.00496 | null |
| 2024-08-01 | A Simple Background Augmentation Method for Object Detection with Diffusion Model | Yuhang Li et.al. | 2408.00350 | null |
| 2024-07-31 | Con4m: Context-aware Consistency Learning Framework for Segmented Time Series Classification | Junru Chen et.al. | 2408.00041 | null |
| 2024-07-31 | Open-Vocabulary Audio-Visual Semantic Segmentation | Ruohao Guo et.al. | 2407.21721 | link |
| 2024-07-31 | MTA-CLIP: Language-Guided Semantic Segmentation with Mask-Text Alignment | Anurag Das et.al. | 2407.21654 | null |
| 2024-07-31 | MaskUno: Switch-Split Block For Enhancing Instance Segmentation | Jawad Haidar et.al. | 2407.21498 | null |
| 2024-07-31 | Small Object Few-shot Segmentation for Vision-based Industrial Inspection | Zilong Zhang et.al. | 2407.21351 | null |
| 2024-07-31 | On-the-fly Point Feature Representation for Point Clouds Analysis | Jiangyi Wang et.al. | 2407.21335 | null |
| 2024-07-31 | Fine-grained Metrics for Point Cloud Semantic Segmentation | Zhuheng Lu et.al. | 2407.21289 | null |
| 2024-07-30 | PLANesT-3D: A new annotated dataset for segmentation of 3D plant point clouds | Kerem Mertoğlu et.al. | 2407.21150 | null |
| 2024-07-30 | Learning Ordinality in Semantic Segmentation | Rafael Cristino et.al. | 2407.20959 | null |
| 2024-07-29 | Improving 2D Feature Representations by 3D-Aware Fine-Tuning | Yuanwen Yue et.al. | 2407.20229 | link |
| 2024-07-29 | Background Semantics Matter: Cross-Task Feature Exchange Network for Clustered Infrared Small Target Detection With Sky-Annotated Dataset | Yimian Dai et.al. | 2407.20078 | link |
| 2024-07-29 | Language-driven Grasp Detection with Mask-guided Attention | Tuan Van Vo et.al. | 2407.19877 | null |
| 2024-07-29 | Rethinking RGB-D Fusion for Semantic Segmentation in Surgical Datasets | Muhammad Abdullah Jamal et.al. | 2407.19714 | null |
| 2024-07-29 | ALEN: A Dual-Approach for Uniform and Non-Uniform Low-Light Image Enhancement | Ezequiel Perez-Zarate et.al. | 2407.19708 | link |
| 2024-07-28 | ASI-Seg: Audio-Driven Surgical Instrument Segmentation with Surgeon Intention Understanding | Zhen Chen et.al. | 2407.19435 | link |
| 2024-07-28 | Depth-Wise Convolutions in Vision Transformers for Efficient Training on Small Datasets | Tianxiao Zhang et.al. | 2407.19394 | link |
| 2024-07-27 | Ensembling convolutional neural networks for human skin segmentation | Patryk Kuban et.al. | 2407.19310 | null |
| 2024-07-27 | Sewer Image Super-Resolution with Depth Priors and Its Lightweight Network | Gang Pan et.al. | 2407.19271 | null |
| 2024-07-26 | Sparse Refinement for Efficient High-Resolution Semantic Segmentation | Zhijian Liu et.al. | 2407.19014 | null |
| 2024-07-26 | A Survey on Cell Nuclei Instance Segmentation and Classification: Leveraging Context and Attention | João D. Nunes et.al. | 2407.18673 | null |
| 2024-07-26 | Learning Spectral-Decomposed Tokens for Domain Generalized Semantic Segmentation | Jingjun Yi et.al. | 2407.18568 | null |
| 2024-07-25 | Taxonomy-Aware Continual Semantic Segmentation in Hyperbolic Spaces for Open-World Perception | Julia Hindel et.al. | 2407.18145 | null |
| 2024-07-25 | LKCell: Efficient Cell Nuclei Instance Segmentation with Large Convolution Kernels | Ziwei Cui et.al. | 2407.18054 | link |
| 2024-07-25 | TiCoSS: Tightening the Coupling between Semantic Segmentation and Stereo Matching within A Joint Learning Framework | Guanfeng Tang et.al. | 2407.18038 | null |
| 2024-07-25 | Segmentation-guided MRI reconstruction for meaningfully diverse reconstructions | Jan Nikolas Morshuis et.al. | 2407.18026 | link |
| 2024-07-26 | Quality Assured: Rethinking Annotation Strategies in Imaging AI | Tim Rädsch et.al. | 2407.17596 | null |
| 2024-07-24 | Embedding-Free Transformer with Inference Spatial Reduction for Efficient Semantic Segmentation | Hyunwoo Yu et.al. | 2407.17261 | link |
| 2024-07-24 | Trans2Unet: Neural fusion for Nuclei Semantic Segmentation | Dinh-Phu Tran et.al. | 2407.17181 | null |
| 2024-07-24 | PiPa++: Towards Unification of Domain Adaptive Semantic Segmentation via Self-supervised Learning | Mu Chen et.al. | 2407.17101 | null |
| 2024-07-25 | Enhancing Environmental Monitoring through Multispectral Imaging: The WasteMS Dataset for Semantic Segmentation of Lakeside Waste | Qinfeng Zhu et.al. | 2407.17028 | link |
| 2024-07-24 | Progressive Query Refinement Framework for Bird’s-Eye-View Semantic Segmentation from Surrounding Images | Dooseop Choi et.al. | 2407.17003 | link |
| 2024-07-24 | McGAN: Generating Manufacturable Designs by Embedding Manufacturing Rules into Conditional Generative Adversarial Network | Zhichao Wang et.al. | 2407.16943 | null |
| 2024-07-23 | SAM-CP: Marrying SAM with Composable Prompts for Versatile Segmentation | Pengfei Chen et.al. | 2407.16682 | null |
| 2024-07-23 | Deformable Convolution Based Road Scene Semantic Segmentation of Fisheye Images in Autonomous Driving | Anam Manzoor et.al. | 2407.16647 | null |
| 2024-07-23 | Deep Bayesian segmentation for colon polyps: Well-calibrated predictions in medical imaging | Daniela L. Ramos et.al. | 2407.16608 | null |
| 2024-07-23 | Strike a Balance in Continual Panoptic Segmentation | Jinpeng Chen et.al. | 2407.16354 | link |
| 2024-07-23 | Augmented Efficiency: Reducing Memory Footprint and Accelerating Inference for 3D Semantic Segmentation through Hybrid Vision | Aditya Krishnan et.al. | 2407.16102 | null |
| 2024-07-22 | Enhancing Cell Instance Segmentation in Scanning Electron Microscopy Images via a Deep Contour Closing Operator | Florian Robert et.al. | 2407.15817 | null |
| 2024-07-22 | MILAN: Milli-Annotations for Lidar Semantic Segmentation | Nermin Samet et.al. | 2407.15797 | null |
| 2024-07-22 | Diffusion for Out-of-Distribution Detection on Road Scenes and Beyond | Silvio Galesso et.al. | 2407.15739 | link |
| 2024-07-22 | MSSPlace: Multi-Sensor Place Recognition with Visual and Text Semantics | Alexander Melekhin et.al. | 2407.15663 | link |
| 2024-07-22 | Learning at a Glance: Towards Interpretable Data-limited Continual Semantic Segmentation via Semantic-Invariance Modelling | Bo Yuan et.al. | 2407.15429 | link |
| 2024-07-22 | Is user feedback always informative? Retrieval Latent Defending for Semi-Supervised Domain Adaptation without Source Data | Junha Song et.al. | 2407.15383 | null |
| 2024-07-21 | Point Transformer V3 Extreme: 1st Place Solution for 2024 Waymo Open Dataset Challenge in Semantic Segmentation | Xiaoyang Wu et.al. | 2407.15282 | null |
| 2024-07-20 | Downstream-Pretext Domain Knowledge Traceback for Active Learning | Beichen Zhang et.al. | 2407.14720 | null |
| 2024-07-19 | Panoptic Segmentation of Mammograms with Text-To-Image Diffusion Model | Kun Zhao et.al. | 2407.14326 | null |
| 2024-07-19 | Early Preparation Pays Off: New Classifier Pre-tuning for Class Incremental Semantic Segmentation | Zhengyuan Xie et.al. | 2407.14142 | link |
| 2024-07-19 | MC-PanDA: Mask Confidence for Panoptic Domain Adaptation | Ivan Martinović et.al. | 2407.14110 | link |
| 2024-07-19 | GaussianBeV: 3D Gaussian Representation meets Perception Models for BeV Segmentation | Florian Chabot et.al. | 2407.14108 | null |
| 2024-07-19 | Scale Disparity of Instances in Interactive Point Cloud Segmentation | Chenrui Han et.al. | 2407.14009 | null |
| 2024-07-18 | Many Perception Tasks are Highly Redundant Functions of their Input Data | Rahul Ramesh et.al. | 2407.13841 | null |
| 2024-07-18 | GroupMamba: Parameter-Efficient and Accurate Group Visual State Space Model | Abdelrahman Shaker et.al. | 2407.13772 | link |
| 2024-07-18 | SegPoint: Segment Any Point Cloud via Large Language Model | Shuting He et.al. | 2407.13761 | null |
| 2024-07-18 | MeshSegmenter: Zero-Shot Mesh Semantic Segmentation via Texture Synthesis | Ziming Zhong et.al. | 2407.13675 | link |
| 2024-07-18 | Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models | Xiaoyu Zhu et.al. | 2407.13642 | null |
| 2024-07-18 | FADE: A Task-Agnostic Upsampling Operator for Encoder-Decoder Architectures | Hao Lu et.al. | 2407.13500 | null |
| 2024-07-18 | FREST: Feature RESToration for Semantic Segmentation under Multiple Adverse Conditions | Sohyun Lee et.al. | 2407.13437 | null |
| 2024-07-18 | Lightweight Uncertainty Quantification with Simplex Semantic Segmentation for Terrain Traversability | Judith Dijk et.al. | 2407.13392 | null |
| 2024-07-18 | Learning from the Web: Language Drives Weakly-Supervised Incremental Learning for Semantic Segmentation | Chang Liu et.al. | 2407.13363 | null |
| 2024-07-18 | Make a Strong Teacher with Label Assistance: A Novel Knowledge Distillation Approach for Semantic Segmentation | Shoumeng Qiu et.al. | 2407.13254 | link |
| 2024-07-18 | OE-BevSeg: An Object Informed and Environment Aware Multimodal Framework for Bird’s-eye-view Vehicle Semantic Segmentation | Jian Sun et.al. | 2407.13137 | null |
| 2024-07-17 | FastSAM-3DSlicer: A 3D-Slicer Extension for 3D Volumetric Segment Anything Model with Uncertainty Quantification | Yiqing Shen et.al. | 2407.12658 | null |
| 2024-07-17 | Weighting Pseudo-Labels via High-Activation Feature Index Similarity and Object Detection for Semi-Supervised Segmentation | Prantik Howlader et.al. | 2407.12630 | link |
| 2024-07-17 | Instance-wise Uncertainty for Class Imbalance in Semantic Segmentation | Luís Almeida et.al. | 2407.12609 | null |
| 2024-07-17 | Benchmarking Robust Self-Supervised Learning Across Diverse Downstream Tasks | Antoni Kowalczuk et.al. | 2407.12588 | link |
| 2024-07-17 | Dual-level Adaptive Self-Labeling for Novel Class Discovery in Point Cloud Segmentation | Ruijie Xu et.al. | 2407.12489 | link |
| 2024-07-17 | Progressive Proxy Anchor Propagation for Unsupervised Semantic Segmentation | Hyun Seok Seong et.al. | 2407.12463 | null |
| 2024-07-17 | Close the Sim2real Gap via Physically-based Structured Light Synthetic Data Simulation | Kaixin Bai et.al. | 2407.12449 | null |
| 2024-07-17 | ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference | Mengcheng Lan et.al. | 2407.12442 | null |
| 2024-07-17 | Serialized Point Mamba: A Serialized Point Cloud Mamba Segmentation Model | Tao Wang et.al. | 2407.12319 | null |
| 2024-07-16 | FoodMem: Near Real-time and Precise Food Video Segmentation | Ahmad AlMughrabi et.al. | 2407.12121 | null |
| 2024-07-16 | Mitigating Background Shift in Class-Incremental Semantic Segmentation | Gilhan Park et.al. | 2407.11859 | link |
| 2024-07-16 | Stepping Stones: A Progressive Training Strategy for Audio-Visual Semantic Segmentation | Juncheng Ma et.al. | 2407.11820 | null |
| 2024-07-16 | Click-Gaussian: Interactive Segmentation to Any 3D Gaussians | Seokhun Choi et.al. | 2407.11793 | null |
| 2024-07-16 | XEdgeAI: A Human-centered Industrial Inspection Framework with Data-centric Explainable Edge AI Approach | Truong Thanh Hung Nguyen et.al. | 2407.11771 | null |
| 2024-07-16 | OAM-TCD: A globally diverse dataset of high-resolution tree cover maps | Josh Veitch-Michaelis et.al. | 2407.11743 | link |
| 2024-07-16 | SFPNet: Sparse Focal Point Network for Semantic Segmentation on General LiDAR Point Clouds | Yanbo Wang et.al. | 2407.11569 | link |
| 2024-07-16 | SGIFormer: Semantic-guided and Geometric-enhanced Interleaving Transformer for 3D Instance Segmentation | Lei Yao et.al. | 2407.11564 | link |
| 2024-07-16 | Crowd-SAM: SAM as a Smart Annotator for Object Detection in Crowded Scenes | Zhi Cai et.al. | 2407.11464 | link |
| 2024-07-16 | Leveraging Segment Anything Model in Identifying Buildings within Refugee Camps (SAM4Refugee) from Satellite Imagery for Humanitarian Operations | Yunya Gao et.al. | 2407.11381 | link |
| 2024-07-16 | Generative AI Driven Task-Oriented Adaptive Semantic Communications | Yuzhou Fu et.al. | 2407.11354 | null |
| 2024-07-15 | No Train, all Gain: Self-Supervised Gradients Improve Deep Frozen Representations | Walter Simoncini et.al. | 2407.10964 | link |
| 2024-07-15 | APC: Adaptive Patch Contrast for Weakly Supervised Semantic Segmentation | Wangyu Wu et.al. | 2407.10649 | null |
| 2024-07-15 | Automated Label Unification for Multi-Dataset Semantic Segmentation with GNNs | Rong Ma et.al. | 2407.10534 | null |
| 2024-07-14 | Shape2Scene: 3D Scene Representation Learning Through Pre-training on Shape Data | Tuo Feng et.al. | 2407.10200 | link |
| 2024-07-14 | RAPiD-Seg: Range-Aware Pointwise Distance Distribution Networks for 3D LiDAR Segmentation | Li Li et.al. | 2407.10159 | link |
| 2024-07-14 | Part2Object: Hierarchical Unsupervised 3D Instance Segmentation | Cheng Shi et.al. | 2407.10084 | link |
| 2024-07-14 | HSFusion: A high-level vision task-driven infrared and visible image fusion network via semantic and geometric domain transformation | Chengjie Jiang et.al. | 2407.10047 | null |
| 2024-07-13 | Background Adaptation with Residual Modeling for Exemplar-Free Class-Incremental Semantic Segmentation | Anqi Zhang et.al. | 2407.09838 | null |
| 2024-07-13 | Enhancing Semantic Segmentation with Adaptive Focal Loss: A Novel Approach | Md Rakibul Islam et.al. | 2407.09828 | null |
| 2024-07-13 | 3D Weakly Supervised Semantic Segmentation with 2D Vision-Language Guidance | Xiaoxu Xu et.al. | 2407.09826 | link |
| 2024-07-12 | FANet: Feature Amplification Network for Semantic Segmentation in Cluttered Background | Muhammad Ali et.al. | 2407.09379 | link |
| 2024-07-12 | WSESeg: Introducing a Dataset for the Segmentation of Winter Sports Equipment with a Baseline for Interactive Segmentation | Robin Schön et.al. | 2407.09288 | null |
| 2024-07-12 | A Fair Ranking and New Model for Panoptic Scene Graph Generation | Julian Lorenz et.al. | 2407.09216 | link |
| 2024-07-12 | Salt & Pepper Heatmaps: Diffusion-informed Landmark Detection Strategy | Julian Wyatt et.al. | 2407.09192 | null |
| 2024-07-12 | From Easy to Hard: Learning Curricular Shape-aware Features for Robust Panoptic Scene Graph Generation | Hanrong Shi et.al. | 2407.09191 | null |
| 2024-07-12 | Evaluating the Adversarial Robustness of Semantic Segmentation: Trying Harder Pays Off | Levente Halmosi et.al. | 2407.09150 | link |
| 2024-07-12 | Cs2K: Class-specific and Class-shared Knowledge Guidance for Incremental Semantic Segmentation | Wei Cong et.al. | 2407.09047 | null |
| 2024-07-12 | Textual Query-Driven Mask Transformer for Domain Generalized Segmentation | Byeonghyun Pak et.al. | 2407.09033 | link |
| 2024-07-12 | Global Attention-Guided Dual-Domain Point Cloud Feature Learning for Classification and Segmentation | Zihao Li et.al. | 2407.08994 | null |
| 2024-07-11 | SLoRD: Structural Low-Rank Descriptors for Shape Consistency in Vertebrae Segmentation | Xin You et.al. | 2407.08555 | null |
| 2024-07-11 | Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation | Tong Shao et.al. | 2407.08268 | link |
| 2024-07-11 | Enrich the content of the image Using Context-Aware Copy Paste | Qiushi Guo et.al. | 2407.08151 | null |
| 2024-07-10 | MambaVision: A Hybrid Mamba-Transformer Vision Backbone | Ali Hatamizadeh et.al. | 2407.08083 | link |
| 2024-07-10 | Interactive Segmentation Model for Placenta Segmentation from 3D Ultrasound images | Hao Li et.al. | 2407.08020 | link |
| 2024-07-10 | Satellite Image Time Series Semantic Change Detection: Novel Architecture and Analysis of Domain Shift | Elliot Vincent et.al. | 2407.07616 | link |
| 2024-07-10 | H-FCBFormer Hierarchical Fully Convolutional Branch Transformer for Occlusal Contact Segmentation with Articulating Paper | Ryan Banks et.al. | 2407.07604 | link |
| 2024-07-11 | Trainable Highly-expressive Activation Functions | Irit Chelly et.al. | 2407.07564 | null |
| 2024-07-10 | Panoptic Segmentation of Galactic Structures in LSB Images | Felix Richards et.al. | 2407.07494 | null |
| 2024-07-10 | Deformable-Heatmap-Segmentation for Automobile Visual Perception | Hongyu Jin et.al. | 2407.07493 | null |
| 2024-07-10 | Exploring the Untouched Sweeps for Conflict-Aware 3D Segmentation Pretraining | Tianfang Sun et.al. | 2407.07465 | null |
| 2024-07-11 | HAFormer: Unleashing the Power of Hierarchy-Aware Features for Lightweight Semantic Segmentation | Guoan Xu et.al. | 2407.07441 | null |
| 2024-07-10 | Unified Embedding Alignment for Open-Vocabulary Video Instance Segmentation | Hao Fang et.al. | 2407.07427 | link |
| 2024-07-09 | ItTakesTwo: Leveraging Peer Representations for Semi-supervised LiDAR Semantic Segmentation | Yuyuan Liu et.al. | 2407.07171 | link |
| 2024-07-09 | Improved Block Merging for 3D Point Cloud Instance Segmentation | Leon Denis et.al. | 2407.06991 | null |
| 2024-07-09 | Joint prototype and coefficient prediction for 3D instance segmentation | Remco Royen et.al. | 2407.06958 | null |
| 2024-07-08 | Training-free CryoET Tomogram Segmentation | Yizhou Zhao et.al. | 2407.06833 | link |
| 2024-07-09 | CycleSAM: One-Shot Surgical Scene Segmentation using Cycle-Consistent Feature Matching to Prompt SAM | Aditya Murali et.al. | 2407.06795 | null |
| 2024-07-09 | LuSNAR:A Lunar Segmentation, Navigation and Reconstruction Dataset based on Muti-sensor for Autonomous Exploration | Jiayi Liu et.al. | 2407.06512 | link |
| 2024-07-08 | Leveraging image captions for selective whole slide image annotation | Jingna Qiu et.al. | 2407.06363 | null |
| 2024-07-08 | Object-Oriented Material Classification and 3D Clustering for Improved Semantic Perception and Mapping in Mobile Robots | Siva Krishna Ravipati et.al. | 2407.06077 | null |
| 2024-07-08 | Test-time adaptation for geospatial point cloud semantic segmentation with distinct domain shifts | Puzuo Wang et.al. | 2407.06043 | null |
| 2024-07-08 | RHRSegNet: Relighting High-Resolution Night-Time Semantic Segmentation | Sarah Elmahdy et.al. | 2407.06016 | link |
| 2024-07-07 | Semantic Segmentation for Real-World and Synthetic Vehicle’s Forward-Facing Camera Images | Tuan T. Nguyen et.al. | 2407.05452 | null |
| 2024-07-07 | Self-supervised Learning via Cluster Distance Prediction for Operating Room Context Awareness | Idris Hamoud et.al. | 2407.05448 | null |
| 2024-07-06 | A Study of Test-time Contrastive Concepts for Open-world, Open-vocabulary Semantic Segmentation | Monika Wysoczańska et.al. | 2407.05061 | null |
| 2024-07-06 | BlessemFlood21: Advancing Flood Analysis with a High-Resolution Georeferenced Dataset for Humanitarian Aid Support | Vladyslav Polushko et.al. | 2407.05007 | null |
| 2024-07-05 | Explainable Metric Learning for Deflating Data Bias | Emma Andrews et.al. | 2407.04866 | null |
| 2024-07-05 | Rethinking Visual Prompting for Multimodal Large Language Models with External Knowledge | Yuanze Lin et.al. | 2407.04681 | null |
| 2024-07-05 | LMSeg: A deep graph message-passing network for efficient and accurate semantic segmentation of large-scale 3D landscape meshes | Zexian Huang et.al. | 2407.04326 | null |
| 2024-07-04 | Slice-100K: A Multimodal Dataset for Extrusion-based 3D Printing | Anushrut Jignasu et.al. | 2407.04180 | link |
| 2024-07-04 | Beyond Pixels: Semi-Supervised Semantic Segmentation with a Multi-scale Patch-based Multi-Label Classifier | Prantik Howlader et.al. | 2407.04036 | link |
| 2024-07-04 | Performance of Medical Image Fusion in High-level Analysis Tasks: A Mutual Enhancement Framework for Unaligned PAT and MRI Image Fusion | Yutian Zhong et.al. | 2407.03992 | link |
| 2024-07-04 | Relative Difficulty Distillation for Semantic Segmentation | Dong Liang et.al. | 2407.03719 | null |
| 2024-07-04 | POSTURE: Pose Guided Unsupervised Domain Adaptation for Human Body Part Segmentation | Arindam Dutta et.al. | 2407.03549 | null |
| 2024-07-03 | A Unified Framework for 3D Scene Understanding | Wei Xu et.al. | 2407.03263 | null |
| 2024-07-03 | ISWSST: Index-space-wave State Superposition Transformers for Multispectral Remotely Sensed Imagery Semantic Segmentation | Chang Li et.al. | 2407.03033 | null |
| 2024-07-03 | Context-Aware Video Instance Segmentation | Seunghun Lee et.al. | 2407.03010 | link |
| 2024-07-03 | ShiftAddAug: Augment Multiplication-Free Tiny Neural Network with Hybrid Computation | Yipin Guo et.al. | 2407.02881 | null |
| 2024-07-03 | Knowledge Transfer with Simulated Inter-Image Erasing for Weakly Supervised Semantic Segmentation | Tao Chen et.al. | 2407.02768 | null |
| 2024-07-03 | ADFQ-ViT: Activation-Distribution-Friendly Post-Training Quantization for Vision Transformers | Yanfeng Jiang et.al. | 2407.02763 | null |
| 2024-07-02 | Open Panoramic Segmentation | Junwei Zheng et.al. | 2407.02685 | link |
| 2024-07-02 | Holistically-Nested Structure-Aware Graph Neural Network for Road Extraction | Tinghuai Wang et.al. | 2407.02639 | null |
| 2024-07-02 | Rethinking Data Augmentation for Robust LiDAR Semantic Segmentation in Adverse Weather | Junsung Park et.al. | 2407.02286 | link |
| 2024-07-02 | MTMamba: Enhancing Multi-Task Dense Scene Understanding by Mamba-Based Decoders | Baijiong Lin et.al. | 2407.02228 | link |
| 2024-07-02 | Occlusion-Aware Seamless Segmentation | Yihong Cao et.al. | 2407.02182 | link |
| 2024-07-02 | VRBiom: A New Periocular Dataset for Biometric Applications of HMD | Ketan Kotwal et.al. | 2407.02150 | null |
| 2024-07-02 | HRSAM: Efficiently Segment Anything in High-Resolution Images | You Huang et.al. | 2407.02109 | null |
| 2024-07-02 | Label Anything: Multi-Class Few-Shot Semantic Segmentation with Visual Prompts | Pasquale De Marinis et.al. | 2407.02075 | link |
| 2024-07-02 | LiDAR-based HD Map Localization using Semantic Generalized ICP with Road Marking Detection | Yansong Gong et.al. | 2407.02061 | null |
| 2024-07-02 | Multi-Grained Contrast for Data-Efficient Unsupervised Representation Learning | Chengchao Shen et.al. | 2407.02014 | link |
| 2024-07-01 | Label-free Neural Semantic Image Synthesis | Jiayi Wang et.al. | 2407.01790 | null |
| 2024-07-01 | PanopticRecon: Leverage Open-vocabulary Instance Segmentation for Zero-shot Panoptic Reconstruction | Xuan Yu et.al. | 2407.01349 | null |
| 2024-06-28 | EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model | Yuxuan Zhang et.al. | 2406.20076 | link |
| 2024-07-01 | Mobile Robot Oriented Large-Scale Indoor Dataset for Dynamic Scene Understanding | Yifan Tang et.al. | 2406.19791 | null |
| 2024-06-28 | PM-VIS+: High-Performance Video Instance Segmentation without Video Annotation | Zhangjing Yang et.al. | 2406.19665 | link |
| 2024-06-28 | Precision matters: Precision-aware ensemble for weakly supervised semantic segmentation | Junsung Park et.al. | 2406.19638 | link |
| 2024-06-28 | PPTFormer: Pseudo Multi-Perspective Transformer for UAV Segmentation | Deyi Ji et.al. | 2406.19632 | null |
| 2024-06-27 | Mamba or RWKV: Exploring High-Quality and High-Efficiency Segment Anything Model | Haobo Yuan et.al. | 2406.19369 | null |
| 2024-06-27 | ProtoGMM: Multi-prototype Gaussian-Mixture-based Domain Adaptation Model for Semantic Segmentation | Nazanin Moradinasab et.al. | 2406.19225 | null |
| 2024-06-30 | Segment Anything Model for automated image data annotation: empirical studies using text prompts from Grounding DINO | Fuseini Mumuni et.al. | 2406.19057 | null |
| 2024-06-27 | Divide, Ensemble and Conquer: The Last Mile on Unsupervised Domain Adaptation for On-Board Semantic Segmentation | Tao Lian et.al. | 2406.18809 | null |
| 2024-07-01 | 3D Feature Distillation with Object-Centric Priors | Georgios Tziafas et.al. | 2406.18742 | null |
| 2024-06-26 | CAS: Confidence Assessments of classification algorithms for Semantic segmentation of EO data | Nikolaos Dionelis et.al. | 2406.18279 | null |
| 2024-06-26 | CoDA: Interactive Segmentation and Morphological Analysis of Dendroid Structures Exemplified on Stony Cold-Water Corals | Kira Schmitt et.al. | 2406.18236 | link |
| 2024-06-26 | The Surprising Effectiveness of Multimodal Large Language Models for Video Moment Retrieval | Meinardus Boris et.al. | 2406.18113 | link |
| 2024-06-26 | Few-Shot Medical Image Segmentation with High-Fidelity Prototypes | Song Tang et.al. | 2406.18074 | link |
| 2024-06-25 | Semi-supervised classification of dental conditions in panoramic radiographs using large language model and instance segmentation: A real-world dataset evaluation | Bernardo Silva et.al. | 2406.17915 | null |
| 2024-06-25 | Local-to-Global Cross-Modal Attention-Aware Fusion for HSI-X Semantic Segmentation | Xuming Zhang et.al. | 2406.17679 | null |
| 2024-06-25 | DocParseNet: Advanced Semantic Segmentation and OCR Embeddings for Efficient Scanned Document Annotation | Ahmad Mohammadshirazi et.al. | 2406.17591 | link |
| 2024-06-25 | Principal Component Clustering for Semantic Segmentation in Synthetic Data Generation | Felix Stillger et.al. | 2406.17541 | null |
| 2024-06-25 | Investigating Self-Supervised Methods for Label-Efficient Learning | Srinivasa Rao Nandam et.al. | 2406.17460 | null |
| 2024-06-25 | Pseudo Labelling for Enhanced Masked Autoencoders | Srinivasa Rao Nandam et.al. | 2406.17450 | null |
| 2024-06-25 | Mamba24/8D: Enhancing Global Interaction in Point Clouds via State Space Model | Zhuoyuan Li et.al. | 2406.17442 | null |
| 2024-06-25 | Implicit-Zoo: A Large-Scale Dataset of Neural Implicit Functions for 2D Images and 3D Scenes | Qi Ma et.al. | 2406.17438 | null |
| 2024-06-25 | Depth-Guided Semi-Supervised Instance Segmentation | Xin Chen et.al. | 2406.17413 | null |
| 2024-06-25 | XAMI – A Benchmark Dataset for Artefact Detection in XMM-Newton Optical Images | Elisabeta-Iulia Dima et.al. | 2406.17323 | link |
| 2024-06-24 | GMT: Guided Mask Transformer for Leaf Instance Segmentation | Feng Chen et.al. | 2406.17109 | null |
| 2024-06-24 | Instance Consistency Regularization for Semi-Supervised 3D Instance Segmentation | Yizheng Wu et.al. | 2406.16776 | link |
| 2024-06-24 | μ-Net: A Deep Learning-Based Architecture for μ-CT Segmentation | Pierangela Bruno et.al. | 2406.16724 | null |
| 2024-06-24 | GATSBI: An Online GTSP-Based Algorithm for Targeted Surface Bridge Inspection and Defect Detection | Harnaik Dhami et.al. | 2406.16625 | null |
| 2024-06-24 | LOGCAN++: Local-global class-aware network for semantic segmentation of remote sensing images | Xiaowen Ma et.al. | 2406.16502 | link |
| 2024-06-24 | Cascade Reward Sampling for Efficient Decoding-Time Alignment | Bolian Li et.al. | 2406.16306 | link |
| 2024-06-24 | SegNet4D: Effective and Efficient 4D LiDAR Semantic Segmentation in Autonomous Driving Environments | Neng Wang et.al. | 2406.16279 | link |
| 2024-06-23 | UDHF2-Net: An Uncertainty-diffusion-model-based High-Frequency TransFormer Network for High-accuracy Interpretation of Remotely Sensed Imagery | Pengfei Zhang et.al. | 2406.16129 | null |
| 2024-06-23 | CholecInstanceSeg: A Tool Instance Segmentation Dataset for Laparoscopic Surgery | Oluwatosin Alabi et.al. | 2406.16039 | null |
| 2024-06-22 | Fine-grained Background Representation for Weakly Supervised Semantic Segmentation | Xu Yin et.al. | 2406.15755 | null |
| 2024-06-21 | TraceNet: Segment one thing efficiently | Mingyuan Wu et.al. | 2406.14874 | null |
| 2024-06-19 | 3D Instance Segmentation Using Deep Learning on RGB-D Indoor Data | Siddiqui Muhammad Yasir et.al. | 2406.14581 | null |
| 2024-06-20 | Evaluation of Deep Learning Semantic Segmentation for Land Cover Mapping on Multispectral, Hyperspectral and High Spatial Aerial Imagery | Ilham Adi Panuntun et.al. | 2406.14220 | null |
| 2024-06-20 | Trusting Semantic Segmentation Networks | Samik Some et.al. | 2406.14201 | null |
| 2024-06-20 | EvSegSNN: Neuromorphic Semantic Segmentation for Event Data | Dalia Hareb et.al. | 2406.14178 | null |
| 2024-06-20 | Seg-LSTM: Performance of xLSTM for Semantic Segmentation of Remotely Sensed Images | Qinfeng Zhu et.al. | 2406.14086 | link |
| 2024-06-20 | 2nd Place Solution for MeViS Track in CVPR 2024 PVUW Workshop: Motion Expression guided Video Segmentation | Bin Cao et.al. | 2406.13939 | null |
| 2024-06-19 | Search-based DNN Testing and Retraining with GAN-enhanced Simulations | Mohammed Oualid Attaoui et.al. | 2406.13359 | null |
| 2024-06-19 | Deep Learning-Based 3D Instance and Semantic Segmentation: A Review | Siddiqui Muhammad Yasir et.al. | 2406.13308 | null |
| 2024-06-18 | Reparameterizable Dual-Resolution Network for Real-time Semantic Segmentation | Guoyu Yang et.al. | 2406.12496 | link |
| 2024-06-18 | Competitive Learning for Achieving Content-specific Filters in Video Coding for Machines | Honglei Zhang et.al. | 2406.12367 | null |
| 2024-06-18 | Agriculture-Vision Challenge 2024 – The Runner-Up Solution for Agricultural Pattern Recognition via Class Balancing and Model Ensemble | Wang Liu et.al. | 2406.12271 | null |
| 2024-06-17 | OoDIS: Anomaly Instance Segmentation Benchmark | Alexey Nekrasov et.al. | 2406.11835 | link |
| 2024-06-17 | Multimodal Learning To Improve Segmentation With Intraoperative CBCT & Preoperative CT | Maximilian E. Tschuchnig et.al. | 2406.11650 | null |
| 2024-06-17 | Learning from Exemplars for Interactive Image Segmentation | Kun Li et.al. | 2406.11472 | null |
| 2024-06-17 | SWCF-Net: Similarity-weighted Convolution and Local-global Fusion for Efficient Large-scale Point Cloud Semantic Segmentation | Zhenchao Lin et.al. | 2406.11441 | link |
| 2024-06-17 | Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding | Yunsong Wang et.al. | 2406.11283 | null |
| 2024-06-17 | Frozen CLIP: A Strong Backbone for Weakly Supervised Semantic Segmentation | Bingfeng Zhang et.al. | 2406.11189 | null |
| 2024-06-16 | α-SSC: Uncertainty-Aware Camera-based 3D Semantic Scene Completion | Sanbao Su et.al. | 2406.11021 | null |
| 2024-06-16 | Benchmarking Label Noise in Instance Segmentation: Spatial Noise Matters | Moshe Kimhi et.al. | 2406.10891 | link |
| 2024-06-16 | PyramidMamba: Rethinking Pyramid Feature Fusion with Selective Space State Model for Semantic Segmentation of Remote Sensing Imagery | Libo Wang et.al. | 2406.10828 | link |
| 2024-06-15 | GenMM: Geometrically and Temporally Consistent Multimodal Data Generation for Video and LiDAR | Bharat Singh et.al. | 2406.10722 | null |
| 2024-06-14 | Task-aligned Part-aware Panoptic Segmentation through Joint Object-Part Representations | Daan de Geus et.al. | 2406.10114 | null |
| 2024-06-14 | ALGM: Adaptive Local-then-Global Token Merging for Efficient Semantic Segmentation with Plain Vision Transformers | Narges Norouzi et.al. | 2406.09936 | null |
| 2024-06-14 | Label-Efficient Semantic Segmentation of LiDAR Point Clouds in Adverse Weather Conditions | Aldi Piroli et.al. | 2406.09906 | null |
| 2024-06-14 | Exploring the Benefits of Vision Foundation Models for Unsupervised Domain Adaptation | Brunó B. Englert et.al. | 2406.09896 | link |
| 2024-06-14 | Open-Vocabulary Semantic Segmentation with Image Embedding Balancing | Xiangheng Shan et.al. | 2406.09829 | link |
| 2024-06-14 | 4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities | Roman Bachmann et.al. | 2406.09406 | null |
| 2024-06-13 | Instance-level quantitative saliency in multiple sclerosis lesion segmentation | Federico Spagnolo et.al. | 2406.09335 | null |
| 2024-06-13 | APSeg: Auto-Prompt Network for Cross-Domain Few-Shot Semantic Segmentation | Weizhao He et.al. | 2406.08372 | null |
| 2024-06-12 | Dataset Enhancement with Instance-Level Augmentations | Orest Kupyn et.al. | 2406.08249 | link |
| 2024-06-12 | 2nd Place Solution for MOSE Track in CVPR 2024 PVUW workshop: Complex Video Object Segmentation | Zhensong Xu et.al. | 2406.08192 | null |
| 2024-06-13 | A²-MAE: A spatial-temporal-spectral unified remote sensing pre-training method based on anchor-aware masked autoencoder | Lixian Zhang et.al. | 2406.08079 | null |
| 2024-06-12 | OpenObj: Open-Vocabulary Object-Level Neural Radiance Fields with Fine-Grained Understanding | Yinan Deng et.al. | 2406.08009 | link |
| 2024-06-12 | SimSAM: Simple Siamese Representations Based Semantic Affinity Matrix for Unsupervised Image Segmentation | Chanda Grover Kamra et.al. | 2406.07986 | link |
| 2024-06-12 | Small Scale Data-Free Knowledge Distillation | He Liu et.al. | 2406.07876 | link |
| 2024-06-11 | Beyond Bare Queries: Open-Vocabulary Object Retrieval with 3D Scene Graph | Sergey Linok et.al. | 2406.07113 | null |
| 2024-06-11 | PanoSSC: Exploring Monocular Panoptic 3D Scene Reconstruction for Autonomous Driving | Yining Shi et.al. | 2406.07037 | null |
| 2024-06-11 | RS-DFM: A Remote Sensing Distributed Foundation Model for Diverse Downstream Tasks | Zhechao Wang et.al. | 2406.07032 | null |
| 2024-06-12 | LiSD: An Efficient Multi-Task Learning Framework for LiDAR Segmentation and Detection | Jiahua Xu et.al. | 2406.07023 | null |
| 2024-06-11 | Dual Thinking and Perceptual Analysis of Deep Learning Models using Human Adversarial Examples | Kailas Dayanandan et.al. | 2406.06967 | link |
| 2024-06-11 | UVIS: Unsupervised Video Instance Segmentation | Shuaiyi Huang et.al. | 2406.06908 | null |
| 2024-06-10 | Stable Neighbor Denoising for Source-free Domain Adaptive Segmentation | Dong Zhao et.al. | 2406.06813 | null |
| 2024-06-10 | Merlin: A Vision Language Foundation Model for 3D Computed Tomography | Louis Blankemeier et.al. | 2406.06512 | link |
| 2024-06-10 | UMAD: Unsupervised Mask-Level Anomaly Detection for Autonomous Driving | Daniel Bogdoll et.al. | 2406.06370 | null |
| 2024-06-10 | Diving into Underwater: Segment Anything Model Guided Underwater Salient Instance Segmentation and A Large-scale Dataset | Shijie Lian et.al. | 2406.06039 | link |
| 2024-06-09 | Scaling Graph Convolutions for Mobile Vision | William Avery et.al. | 2406.05850 | link |
| 2024-06-09 | Solution for CVPR 2024 UG2+ Challenge Track on All Weather Semantic Segmentation | Jun Yu et.al. | 2406.05837 | null |
| 2024-06-09 | Convolution and Attention-Free Mamba-based Cardiac Image Segmentation | Abbas Khan et.al. | 2406.05786 | null |
| 2024-06-09 | Separating the “Chirp” from the “Chat”: Self-supervised Visual Grounding of Sound and Language | Mark Hamilton et.al. | 2406.05629 | link |
| 2024-06-08 | A Two-Stage Adverse Weather Semantic Segmentation Method for WeatherProof Challenge CVPR 2024 Workshop UG2+ | Jianzhao Wang et.al. | 2406.05513 | null |
| 2024-06-08 | Layered Image Vectorization via Semantic Simplification | Zhenyu Wang et.al. | 2406.05404 | null |
| 2024-06-08 | 1st Place Winner of the 2024 Pixel-level Video Understanding in the Wild (CVPR’24 PVUW) Challenge in Video Panoptic Segmentation and Best Long Video Consistency of Video Semantic Segmentation | Qingfeng Liu et.al. | 2406.05352 | null |
| 2024-06-07 | Semantic Segmentation on VSPW Dataset through Masked Video Consistency | Chen Liang et.al. | 2406.04979 | null |
| 2024-06-07 | Nacala-Roof-Material: Drone Imagery for Roof Detection, Classification, and Segmentation to Support Mosquito-borne Disease Risk Assessment | Venkanna Babu Guthula et.al. | 2406.04949 | null |
| 2024-06-06 | Characterizing segregation in blast rock piles a deep-learning approach leveraging aerial image analysis | Chengeng Liu et.al. | 2406.04149 | null |
| 2024-06-07 | 3rd Place Solution for PVUW Challenge 2024: Video Panoptic Segmentation | Ruipu Wu et.al. | 2406.04002 | null |
| 2024-06-06 | Frequency-based Matcher for Long-tailed Semantic Segmentation | Shan Li et.al. | 2406.03917 | link |
| 2024-06-07 | Enhanced Semantic Segmentation Pipeline for WeatherProof Dataset Challenge | Nan Zhang et.al. | 2406.03799 | link |
| 2024-06-06 | Instance Segmentation and Teeth Classification in Panoramic X-rays | Devichand Budagam et.al. | 2406.03747 | link |
| 2024-06-06 | DSNet: A Novel Way to Use Atrous Convolutions in Semantic Segmentation | Zilu Guo et.al. | 2406.03702 | link |
| 2024-06-05 | Comparative Benchmarking of Failure Detection Methods in Medical Image Segmentation: Unveiling the Role of Confidence Aggregation | Maximilian Zenk et.al. | 2406.03323 | null |
| 2024-06-05 | Learning Semantic Traversability with Egocentric Video and Automated Annotation Strategy | Yunho Kim et.al. | 2406.02989 | null |
| 2024-06-04 | W-RIZZ: A Weakly-Supervised Framework for Relative Traversability Estimation in Mobile Robotics | Andre Schreiber et.al. | 2406.02822 | link |
| 2024-06-04 | Window to Wall Ratio Detection using SegFormer | Zoe De Simone et.al. | 2406.02706 | link |
| 2024-06-04 | Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation | Mohamed El Amine Boudjoghra et.al. | 2406.02548 | link |
| 2024-06-04 | Generative Active Learning for Long-tailed Instance Segmentation | Muzhi Zhu et.al. | 2406.02435 | link |
| 2024-06-04 | Detecting Endangered Marine Species in Autonomous Underwater Vehicle Imagery Using Point Annotations and Few-Shot Learning | Heather Doig et.al. | 2406.01932 | null |
| 2024-06-03 | MultiPly: Reconstruction of Multiple People from Monocular Video in the Wild | Zeren Jiang et.al. | 2406.01595 | null |
| 2024-06-03 | Towards Flexible Interactive Reflection Removal with Human Guidance | Xiao Chen et.al. | 2406.01555 | link |
| 2024-06-03 | EAGLE: Efficient Adaptive Geometry-based Learning in Cross-view Understanding | Thanh-Dat Truong et.al. | 2406.01429 | null |
| 2024-06-03 | An expert-driven data generation pipeline for histological images | Roberto Basla et.al. | 2406.01403 | link |
| 2024-06-03 | TE-NeXt: A LiDAR-Based 3D Sparse Convolutional Network for Traversability Estimation | Antonio Santo et.al. | 2406.01395 | link |
| 2024-06-03 | MP-PolarMask: A Faster and Finer Instance Segmentation for Concave Images | Ke-Lei Wang et.al. | 2406.01356 | null |
| 2024-06-03 | ARCH2S: Dataset, Benchmark and Challenges for Learning Exterior Architectural Structures from Point Clouds | Ka Lung Cheung et.al. | 2406.01337 | link |
| 2024-05-31 | Uncertainty Quantification for Bird’s Eye View Semantic Segmentation: Methods and Benchmarks | Linlin Yu et.al. | 2405.20986 | null |
| 2024-05-31 | Extreme Point Supervised Instance Segmentation | Hyeonjun Lee et.al. | 2405.20729 | null |
| 2024-05-31 | Revisiting and Maximizing Temporal Knowledge in Semi-supervised Semantic Segmentation | Wooseok Shin et.al. | 2405.20610 | link |
| 2024-05-30 | P-MSDiff: Parallel Multi-Scale Diffusion for Remote Sensing Image Segmentation | Qi Zhang et.al. | 2405.20443 | null |
| 2024-05-30 | SemFlow: Binding Semantic Segmentation and Image Synthesis via Rectified Flow | Chaoyang Wang et.al. | 2405.20282 | link |
| 2024-05-30 | MCDS-VSS: Moving Camera Dynamic Scene Video Semantic Segmentation by Filtering with Self-Supervised Geometry and Motion | Angel Villar-Corrales et.al. | 2405.19921 | link |
| 2024-05-30 | Open-Set Domain Adaptation for Semantic Segmentation | Seun-An Choe et.al. | 2405.19899 | link |
| 2024-05-30 | DenseSeg: Joint Learning for Semantic Segmentation and Landmark Detection Using Dense Image-to-Shape Representation | Ron Keuth et.al. | 2405.19746 | link |
| 2024-05-30 | Twin Deformable Point Convolutions for Point Cloud Semantic Segmentation in Remote Sensing Scenes | Yong-Qiang Mao et.al. | 2405.19735 | null |
| 2024-05-30 | CRIS: Collaborative Refinement Integrated with Segmentation for Polyp Segmentation | Ankush Gajanan Arudkar et.al. | 2405.19672 | null |
| 2024-05-29 | Organizing Background to Explore Latent Classes for Incremental Few-shot Semantic Segmentation | Lianlei Shan et.al. | 2405.19568 | null |
| 2024-05-29 | Enabling Visual Recognition at Radio Frequency | Haowen Lai et.al. | 2405.19516 | null |
| 2024-05-29 | Reasoning3D – Grounding and Reasoning in 3D: Fine-Grained Zero-Shot Open-Vocabulary 3D Reasoning Part Segmentation via Large Vision-Language Models | Tianrun Chen et.al. | 2405.19326 | null |
| 2024-05-29 | A Good Foundation is Worth Many Labels: Label-Efficient Panoptic Segmentation | Niclas Vödisch et.al. | 2405.19035 | link |
| 2024-05-29 | Parameter-efficient Fine-tuning in Hyperspherical Space for Open-vocabulary Semantic Segmentation | Zelin Peng et.al. | 2405.18840 | null |
| 2024-05-29 | FocSAM: Delving Deeply into Focused Objects in Segmenting Anything | You Huang et.al. | 2405.18706 | null |
| 2024-05-28 | Learning to Detour: Shortcut Mitigating Augmentation for Weakly Supervised Semantic Segmentation | JuneHyoung Kwon et.al. | 2405.18148 | null |
| 2024-05-28 | Edge-guided and Class-balanced Active Learning for Semantic Segmentation of Aerial Images | Lianlei Shan et.al. | 2405.18078 | null |
| 2024-05-28 | RT-GS2: Real-Time Generalizable Semantic Segmentation for 3D Gaussian Representations of Radiance Fields | Mihnea-Bogdan Jurca et.al. | 2405.18033 | null |
| 2024-05-28 | DMT-JEPA: Discriminative Masked Targets for Joint-Embedding Predictive Architecture | Shentong Mo et.al. | 2405.17995 | null |
| 2024-05-28 | Adapting Pre-Trained Vision Models for Novel Instance Detection and Segmentation | Yangxiao Lu et.al. | 2405.17859 | link |
| 2024-05-28 | The Binary Quantized Neural Network for Dense Prediction via Specially Designed Upsampling and Attention | Xingyu Ding et.al. | 2405.17776 | null |
| 2024-05-27 | Evaluation of Multi-task Uncertainties in Joint Semantic Segmentation and Monocular Depth Estimation | Steven Landgraf et.al. | 2405.17097 | null |
| 2024-05-27 | DSU-Net: Dynamic Snake U-Net for 2-D Seismic First Break Picking | Hongtao Wang et.al. | 2405.16980 | null |
| 2024-05-27 | Collective Perception Datasets for Autonomous Driving: A Comprehensive Review | Sven Teufel et.al. | 2405.16973 | null |
| 2024-05-27 | Zero-Shot Video Semantic Segmentation based on Pre-Trained Diffusion Models | Qian Wang et.al. | 2405.16947 | null |
| 2024-05-27 | A re-calibration method for object detection with multi-modal alignment bias in autonomous driving | Zhihang Song et.al. | 2405.16848 | null |
| 2024-05-26 | Understanding the Effect of using Semantically Meaningful Tokens for Visual Representation Learning | Neha Kalibhat et.al. | 2405.16401 | null |
| 2024-05-25 | Video Prediction Models as General Visual Encoders | James Maier et.al. | 2405.16382 | null |
| 2024-05-25 | BOLD: Boolean Logic Deep Learning | Van Minh Nguyen et.al. | 2405.16339 | null |
| 2024-05-25 | Improving 3D Occupancy Prediction through Class-balancing Loss and Multi-scale Representation | Huizhou Chen et.al. | 2405.16099 | null |
| 2024-05-25 | Intensity and Texture Correction of Omnidirectional Image Using Camera Images for Indirect Augmented Reality | Hakim Ikebayashi et.al. | 2405.16008 | null |
| 2024-05-24 | Visualize and Paint GAN Activations | Rudolf Herdt et.al. | 2405.15636 | null |
| 2024-05-24 | Leveraging knowledge distillation for partial multi-task learning from multiple remote sensing datasets | Hoàng-Ân Lê et.al. | 2405.15394 | null |
| 2024-05-24 | Autonomous Quilt Spreading for Caregiving Robots | Yuchun Guo et.al. | 2405.15373 | null |
| 2024-05-24 | U3M: Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation | Bingyu Li et.al. | 2405.15365 | link |
| 2024-05-24 | Cross-Domain Few-Shot Semantic Segmentation via Doubly Matching Transformation | Jiayi Chen et.al. | 2405.15265 | null |
| 2024-05-23 | Mamba-R: Vision Mamba ALSO Needs Registers | Feng Wang et.al. | 2405.14858 | null |
| 2024-05-23 | Efficient Robot Learning for Perception and Mapping | Niclas Vödisch et.al. | 2405.14688 | null |
| 2024-05-23 | Segformer++: Efficient Token-Merging Strategies for High-Resolution Semantic Segmentation | Daniel Kienzle et.al. | 2405.14467 | null |
| 2024-05-23 | MAMBA4D: Efficient Long-Sequence Point Cloud Video Understanding with Disentangled Spatial-Temporal State Space Models | Jiuming Liu et.al. | 2405.14338 | null |
| 2024-05-23 | Tuning-free Universally-Supervised Semantic Segmentation | Xiaobo Yang et.al. | 2405.14294 | null |
| 2024-05-23 | SCMix: Stochastic Compound Mixing for Open Compound Domain Adaptation in Semantic Segmentation | Kai Yao et.al. | 2405.14278 | null |
| 2024-05-23 | Harmony: A Joint Self-Supervised and Weakly-Supervised Framework for Learning General Purpose Visual Representations | Mohammed Baharoon et.al. | 2405.14239 | null |
| 2024-05-23 | Leveraging Semantic Segmentation Masks with Embeddings for Fine-Grained Form Classification | Taylor Archibald et.al. | 2405.14162 | null |
| 2024-05-23 | Skip-SCAR: A Modular Approach to ObjectGoal Navigation with Sparsity and Adaptive Skips | Yaotian Liu et.al. | 2405.14154 | null |
| 2024-05-22 | TS40K: a 3D Point Cloud Dataset of Rural Terrain and Electrical Transmission System | Diogo Lavado et.al. | 2405.13989 | null |
| 2024-05-21 | Transparency Distortion Robustness for SOTA Image Segmentation Tasks | Volker Knauthe et.al. | 2405.12864 | null |
| 2024-05-20 | A comprehensive overview of deep learning techniques for 3D point cloud classification and semantic segmentation | Sushmita Sarker et.al. | 2405.11903 | null |
| 2024-05-20 | Salience-guided Ground Factor for Robust Localization of Delivery Robots in Complex Urban Environments | Jooyong Park et.al. | 2405.11855 | null |
| 2024-05-20 | Improving the Explain-Any-Concept by Introducing Nonlinearity to the Trainable Surrogate Model | Mounes Zaval et.al. | 2405.11837 | null |
| 2024-05-20 | Universal Organizer of SAM for Unsupervised Semantic Segmentation | Tingting Li et.al. | 2405.11742 | null |
| 2024-05-19 | Interpreting a Semantic Segmentation Model for Coastline Detection | Conor O’Sullivan et.al. | 2405.11500 | null |
| 2024-05-19 | Unifying 3D Vision-Language Understanding via Promptable Queries | Ziyu Zhu et.al. | 2405.11442 | null |
| 2024-05-18 | PS6D: Point Cloud Based Symmetry-Aware 6D Object Pose Estimation in Robot Bin-Picking | Yifan Yang et.al. | 2405.11257 | null |
| 2024-05-17 | CM-UNet: Hybrid CNN-Mamba UNet for Remote Sensing Image Semantic Segmentation | Mushui Liu et.al. | 2405.10530 | link |
| 2024-05-16 | 4D Panoptic Scene Graph Generation | Jingkang Yang et.al. | 2405.10305 | link |
| 2024-05-16 | Towards Task-Compatible Compressible Representations | Anderson de Andrade et.al. | 2405.10244 | link |
| 2024-05-16 | DiverGen: Improving Instance Segmentation by Learning Wider Data Distribution with More Diverse Generative Data | Chengxiang Fan et.al. | 2405.10185 | link |
| 2024-05-16 | An Integrated Framework for Multi-Granular Explanation of Video Summarization | Konstantinos Tsigos et.al. | 2405.10082 | null |
| 2024-05-16 | A Preprocessing and Postprocessing Voxel-based Method for LiDAR Semantic Segmentation Improvement in Long Distance | Andrea Matteazzi et.al. | 2405.10046 | null |
| 2024-05-16 | Towards Realistic Incremental Scenario in Class Incremental Semantic Segmentation | Jihwan Kwak et.al. | 2405.09858 | null |
| 2024-05-15 | Synth-to-Real Unsupervised Domain Adaptation for Instance Segmentation | Guo Yachan et.al. | 2405.09682 | null |
| 2024-05-14 | CLIP with Quality Captions: A Strong Pretraining for Vision Tasks | Pavan Kumar Anasosalu Vasu et.al. | 2405.08911 | null |
| 2024-05-14 | Rethinking Scanning Strategies with Vision Mamba in Semantic Segmentation of Remote Sensing Imagery: An Experimental Study | Qinfeng Zhu et.al. | 2405.08493 | null |
| 2024-05-14 | TEDNet: Twin Encoder Decoder Neural Network for 2D Camera and LiDAR Road Detection | Martín Bayón-Gutiérrez et.al. | 2405.08429 | link |
| 2024-05-13 | IMAFD: An Interpretable Multi-stage Approach to Flood Detection from time series Multispectral Data | Ziyang Zhang et.al. | 2405.07916 | null |
| 2024-05-13 | PLUTO: Pathology-Universal Transformer | Dinkar Juyal et.al. | 2405.07905 | null |
| 2024-05-12 | PotatoGANs: Utilizing Generative Adversarial Networks, Instance Segmentation, and Explainable AI for Enhanced Potato Disease Identification and Classification | Mohammad Shafiul Alam et.al. | 2405.07332 | link |
| 2024-05-12 | Building a Strong Pre-Training Baseline for Universal 3D Large-Scale Perception | Haoming Chen et.al. | 2405.07201 | null |
| 2024-05-11 | Global Motion Understanding in Large-Scale Video Object Segmentation | Volodymyr Fedynyak et.al. | 2405.07031 | null |
| 2024-05-10 | GreedyViG: Dynamic Axial Graph Construction for Efficient Vision GNNs | Mustafa Munir et.al. | 2405.06849 | link |
| 2024-05-10 | Enhancing Weakly Supervised Semantic Segmentation with Multi-modal Foundation Models: An End-to-End Approach | Elham Ravanbakhsh et.al. | 2405.06586 | null |
| 2024-05-10 | Semantic and Spatial Adaptive Pixel-level Classifier for Semantic Segmentation | Xiaowen Ma et.al. | 2405.06525 | link |
| 2024-05-10 | Multi-Target Unsupervised Domain Adaptation for Semantic Segmentation without External Data | Yonghao Xu et.al. | 2405.06502 | null |
| 2024-05-10 | Multi-level Personalized Federated Learning on Heterogeneous and Long-Tailed Data | Rongyu Zhang et.al. | 2405.06413 | null |
| 2024-05-10 | Context-Guided Spatial Feature Reconstruction for Efficient Semantic Segmentation | Zhenliang Ni et.al. | 2405.06228 | link |
| 2024-05-10 | Zero-shot Degree of Ill-posedness Estimation for Active Small Object Change Detection | Koji Takeda et.al. | 2405.06185 | null |
| 2024-05-10 | Prior-guided Diffusion Model for Cell Segmentation in Quantitative Phase Imaging | Zhuchen Shao et.al. | 2405.06175 | null |
| 2024-05-09 | Mask-TS Net: Mask Temperature Scaling Uncertainty Calibration for Polyp Segmentation | Yudian Zhang et.al. | 2405.05830 | null |
| 2024-05-09 | CSA-Net: Channel-wise Spatially Autocorrelated Attention Networks | Nick et.al. | 2405.05755 | null |
| 2024-05-08 | OpenESS: Event-based Semantic Scene Understanding with Open Vocabularies | Lingdong Kong et.al. | 2405.05259 | link |
| 2024-05-08 | Multi-Modal Data-Efficient 3D Scene Understanding for Autonomous Driving | Lingdong Kong et.al. | 2405.05258 | link |
| 2024-05-08 | Weakly-supervised Semantic Segmentation via Dual-stream Contrastive Learning of Cross-image Contextual Information | Qi Lai et.al. | 2405.04913 | null |
| 2024-05-08 | DeepDamageNet: A two-step deep-learning model for multi-disaster building damage segmentation and classification using satellite imagery | Irene Alisjahbana et.al. | 2405.04800 | null |
| 2024-05-07 | A Self-Supervised Method for Body Part Segmentation and Keypoint Detection of Rat Images | László Kopácsi et.al. | 2405.04650 | null |
| 2024-05-07 | FRACTAL: An Ultra-Large-Scale Aerial Lidar Dataset for 3D Semantic Segmentation of Diverse Landscapes | Charles Gaydon et.al. | 2405.04634 | link |
| 2024-05-07 | AugmenTory: A Fast and Flexible Polygon Augmentation Library | Tanaz Ghahremani et.al. | 2405.04442 | null |
| 2024-05-07 | A New Dataset and Comparative Study for Aphid Cluster Detection and Segmentation in Sorghum Fields | Raiyan Rahman et.al. | 2405.04305 | null |
| 2024-05-07 | ELiTe: Efficient Image-to-LiDAR Knowledge Transfer for Semantic Segmentation | Zhibo Zhang et.al. | 2405.04121 | null |
| 2024-05-07 | Structured Click Control in Transformer-based Interactive Segmentation | Long Xu et.al. | 2405.04009 | link |
| 2024-05-06 | PTQ4SAM: Post-Training Quantization for Segment Anything | Chengtao Lv et.al. | 2405.03144 | link |
| 2024-05-04 | MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning | Vishal Nedungadi et.al. | 2405.02771 | null |
| 2024-05-04 | Few-Shot Fruit Segmentation via Transfer Learning | Jordan A. James et.al. | 2405.02556 | null |
| 2024-05-03 | Panoptic-SLAM: Visual SLAM in Dynamic Environments using Panoptic Segmentation | Gabriel Fischer Abati et.al. | 2405.02177 | null |
| 2024-05-03 | Towards general deep-learning-based tree instance segmentation models | Jonathan Henrich et.al. | 2405.02061 | null |
| 2024-05-03 | DiffMap: Enhancing Map Segmentation with Map Prior Using Diffusion Model | Peijin Jia et.al. | 2405.02008 | null |
| 2024-05-02 | Development of Skip Connection in Deep Neural Networks for Computer Vision and Medical Image Analysis: A Survey | Guoping Xu et.al. | 2405.01725 | link |
| 2024-05-02 | Explainable AI (XAI) in Image Segmentation in Medicine, Industry, and Beyond: A Survey | Rokas Gipiškis et.al. | 2405.01636 | null |
| 2024-05-02 | CromSS: Cross-modal pre-training with noisy labels for remote sensing image segmentation | Chenying Liu et.al. | 2405.01217 | null |
| 2024-05-02 | Uncertainty-aware self-training with expectation maximization basis transformation | Zijia Wang et.al. | 2405.01175 | null |
| 2024-05-01 | GraCo: Granularity-Controllable Interactive Segmentation | Yian Zhao et.al. | 2405.00587 | null |
| 2024-05-01 | Exploring Self-Supervised Vision Transformers for Deepfake Detection: A Comparative Analysis | Huy H. Nguyen et.al. | 2405.00355 | null |
| 2024-04-30 | Masked Multi-Query Slot Attention for Unsupervised Object Discovery | Rishav Pramanik et.al. | 2404.19654 | link |
| 2024-04-30 | UniFS: Universal Few-shot Instance Perception with Point Representations | Sheng Jin et.al. | 2404.19401 | null |
| 2024-04-30 | DELINE8K: A Synthetic Data Pipeline for the Semantic Segmentation of Historical Documents | Taylor Archibald et.al. | 2404.19259 | null |
| 2024-04-29 | Swin2-MoSE: A New Single Image Super-Resolution Model for Remote Sensing | Leonardo Rossi et.al. | 2404.18924 | null |
| 2024-04-29 | IPixMatch: Boost Semi-supervised Semantic Segmentation with Inter-Pixel Relation | Kebin Wu et.al. | 2404.18891 | null |
| 2024-04-29 | From Density to Geometry: YOLOv8 Instance Segmentation for Reverse Engineering of Optimized Structures | Thomas Rochefort-Beaudoin et.al. | 2404.18763 | null |
| 2024-04-29 | Towards Long-term Robotics in the Wild | Stephen Hausler et.al. | 2404.18477 | null |
| 2024-04-29 | Clicks2Line: Using Lines for Interactive Image Segmentation | Chaewon Lee et.al. | 2404.18461 | null |
| 2024-04-29 | MFP: Making Full Use of Probability Maps for Interactive Image Segmentation | Chaewon Lee et.al. | 2404.18448 | null |
| 2024-04-28 | Panoptic Segmentation and Labelling of Lumbar Spine Vertebrae using Modified Attention Unet | Rikathi Pal et.al. | 2404.18291 | null |
| 2024-04-28 | Garbage Segmentation and Attribute Analysis by Robotic Dogs | Nuo Xu et.al. | 2404.18112 | null |
| 2024-04-27 | Multi-Stream Cellular Test-Time Adaptation of Real-Time Models Evolving in Dynamic Environments | Benoît Gérin et.al. | 2404.17930 | link |
| 2024-04-27 | GLIMS: Attention-Guided Lightweight Multi-Scale Hybrid Network for Volumetric Semantic Segmentation | Ziya Ata Yazıcı et.al. | 2404.17854 | link |
| 2024-04-26 | Optimizing Universal Lesion Segmentation: State Space Model-Guided Hierarchical Networks with Feature Importance Adjustment | Kazi Shahriar Sanjid et.al. | 2404.17235 | null |
| 2024-04-25 | Calculation of Femur Caput Collum Diaphyseal angle for X-Rays images using Semantic Segmentation | Deepak Bhatia et.al. | 2404.17083 | null |
| 2024-04-25 | Boosting Unsupervised Semantic Segmentation with Principal Mask Proposals | Oliver Hahn et.al. | 2404.16818 | link |
| 2024-04-25 | Self-Balanced R-CNN for Instance Segmentation | Leonardo Rossi et.al. | 2404.16633 | link |
| 2024-04-26 | Multi-Scale Representations by Varying Window Attention for Semantic Segmentation | Haotian Yan et.al. | 2404.16573 | link |
| 2024-04-25 | 360SFUDA++: Towards Source-free UDA for Panoramic Segmentation by Learning Reliable Category Prototypes | Xu Zheng et.al. | 2404.16501 | null |
| 2024-04-25 | Semantic Segmentation Refiner for Ultrasound Applications with Zero-Shot Foundation Models | Hedda Cohen Indelman et.al. | 2404.16325 | null |
| 2024-04-25 | Style Adaptation for Domain-adaptive Semantic Segmentation | Ting Li et.al. | 2404.16301 | null |
| 2024-04-25 | A Multi-objective Optimization Benchmark Test Suite for Real-time Semantic Segmentation | Yifan Zhao et.al. | 2404.16266 | link |
| 2024-04-24 | Does SAM dream of EIG? Characterizing Interactive Segmenter Performance using Expected Information Gain | Kuan-I Chung et.al. | 2404.16155 | null |
| 2024-04-24 | 3D Freehand Ultrasound using Visual Inertial and Deep Inertial Odometry for Measuring Patellar Tracking | Russell Buchanan et.al. | 2404.15847 | null |
| 2024-04-24 | Vision Transformer-based Adversarial Domain Adaptation | Yahan Li et.al. | 2404.15817 | link |
| 2024-04-23 | PRISM: A Promptable and Robust Interactive Segmentation Model with Visual Prompts | Hao Li et.al. | 2404.15028 | link |
| 2024-04-23 | Unknown Object Grasping for Assistive Robotics | Elle Miller et.al. | 2404.15001 | null |
| 2024-04-22 | Surgical-DeSAM: Decoupling SAM for Instrument Segmentation in Robotic Surgery | Yuyang Sheng et.al. | 2404.14040 | link |
| 2024-04-22 | OccFeat: Self-supervised Occupancy Feature Prediction for Pretraining BEV Segmentation Networks | Sophia Sirko-Galouchenko et.al. | 2404.14027 | null |
| 2024-04-22 | PM-VIS: High-Performance Box-Supervised Video Instance Segmentation | Zhangjing Yang et.al. | 2404.13863 | null |
| 2024-04-21 | Semantic-Rearrangement-Based Multi-Level Alignment for Domain Generalized Segmentation | Guanlong Jiao et.al. | 2404.13701 | null |
| 2024-04-21 | PV-S3: Advancing Automatic Photovoltaic Defect Detection using Semi-Supervised Semantic Segmentation of Electroluminescence Images | Abhishek Jha et.al. | 2404.13693 | null |
| 2024-04-21 | A Complete System for Automated 3D Semantic-Geometric Mapping of Corrosion in Industrial Environments | Rui Pimentel de Figueiredo et.al. | 2404.13691 | null |
| 2024-04-21 | LMFNet: An Efficient Multimodal Fusion Approach for Semantic Segmentation in High-Resolution Remote Sensing | Tong Wang et.al. | 2404.13659 | null |
| 2024-04-21 | Towards Unified Representation of Multi-Modal Pre-training for 3D Understanding via Differentiable Rendering | Ben Fei et.al. | 2404.13619 | null |
| 2024-04-20 | FisheyeDetNet: Object Detection on Fisheye Surround View Camera Systems for Automated Driving | Ganesh Sistu et.al. | 2404.13443 | null |
| 2024-04-20 | AMMUNet: Multi-Scale Attention Map Merging for Remote Sensing Image Segmentation | Yang Yang et.al. | 2404.13408 | null |
| 2024-04-19 | Nuclei Instance Segmentation of Cryosectioned H&E Stained Histological Images using Triple U-Net Architecture | Zarif Ahmed et.al. | 2404.12986 | null |
| 2024-04-19 | FipTR: A Simple yet Effective Transformer Framework for Future Instance Prediction in Autonomous Driving | Xingtai Gui et.al. | 2404.12867 | null |
| 2024-04-19 | Foundation Model assisted Weakly Supervised LiDAR Semantic Segmentation | Yilong Chen et.al. | 2404.12861 | null |
| 2024-04-19 | COIN: Counterfactual inpainting for weakly supervised semantic segmentation for medical images | Dmytro Shvetsov et.al. | 2404.12832 | link |
| 2024-04-19 | A Point-Based Approach to Efficient LiDAR Multi-Task Perception | Christopher Lang et.al. | 2404.12798 | null |
| 2024-04-19 | Generalized Few-Shot Meets Remote Sensing: Discovering Novel Classes in Land Cover Mapping via Hybrid Semantic Segmentation Framework | Zhuohong Li et.al. | 2404.12721 | link |
| 2024-04-19 | Improving Prediction Accuracy of Semantic Segmentation Methods Using Convolutional Autoencoder Based Pre-processing Layers | Hisashi Shimodaira et.al. | 2404.12718 | null |
| 2024-04-19 | Show and Grasp: Few-shot Semantic Segmentation for Robot Grasping through Zero-shot Foundation Models | Leonardo Barcellona et.al. | 2404.12717 | null |
| 2024-04-18 | Spot-Compose: A Framework for Open-Vocabulary Object Retrieval and Drawer Manipulation in Point Clouds | Oliver Lemke et.al. | 2404.12440 | null |
| 2024-04-18 | A Perspective on Deep Vision Performance with Standard Image and Video Codecs | Christoph Reich et.al. | 2404.12330 | null |
| 2024-04-18 | Performance Evaluation of Segment Anything Model with Variational Prompting for Application to Non-Visible Spectrum Imagery | Yona Falinie A. Gaus et.al. | 2404.12285 | null |
| 2024-04-18 | Deep Gaussian mixture model for unsupervised image segmentation | Matthias Schwab et.al. | 2404.12252 | null |
| 2024-04-18 | Observation, Analysis, and Solution: Exploring Strong Lightweight Vision Transformers via Masked Image Modeling Pre-Training | Jin Gao et.al. | 2404.12210 | link |
| 2024-04-18 | How to Benchmark Vision Foundation Models for Semantic Segmentation? | Tommie Kerssies et.al. | 2404.12172 | null |
| 2024-04-17 | Mushroom Segmentation and 3D Pose Estimation from Point Clouds using Fully Convolutional Geometric Features and Implicit Pose Encoding | George Retsinas et.al. | 2404.12144 | link |
| 2024-04-18 | Tendency-driven Mutual Exclusivity for Weakly Supervised Incremental Semantic Segmentation | Chongjie Si et.al. | 2404.11981 | null |
| 2024-04-18 | The devil is in the object boundary: towards annotation-free instance segmentation using Foundation Models | Cheng Shi et.al. | 2404.11957 | link |
| 2024-04-18 | Group-On: Boosting One-Shot Segmentation with Supportive Query | Hanjing Zhou et.al. | 2404.11871 | null |
| 2024-04-17 | Visual Prompting for Generalized Few-shot Segmentation: A Multi-scale Approach | Mir Rayat Imtiaz Hossain et.al. | 2404.11732 | null |
| 2024-04-17 | A Semantic Segmentation-guided Approach for Ground-to-Aerial Image Matching | Francesco Pro et.al. | 2404.11302 | link |
| 2024-04-17 | Learning from Unlabelled Data with Transformers: Domain Adaptation for Semantic Segmentation of High Resolution Aerial Images | Nikolaos Dionelis et.al. | 2404.11299 | link |
| 2024-04-17 | Criteria for Uncertainty-based Corner Cases Detection in Instance Segmentation | Florian Heidecker et.al. | 2404.11266 | null |
| 2024-04-16 | A Concise Tiling Strategy for Preserving Spatial Context in Earth Observation Imagery | Ellianna Abrahams et.al. | 2404.10927 | link |
| 2024-04-16 | Vocabulary-free Image Classification and Semantic Segmentation | Alessandro Conti et.al. | 2404.10864 | link |
| 2024-04-16 | Gasformer: A Transformer-based Architecture for Segmenting Methane Emissions from Livestock in Optical Gas Imaging | Toqi Tahamid Sarker et.al. | 2404.10841 | link |
| 2024-04-16 | Learning Feature Inversion for Multi-class Anomaly Detection under General-purpose COCO-AD Benchmark | Jiangning Zhang et.al. | 2404.10760 | null |
| 2024-04-16 | ECLAIR: A High-Fidelity Aerial LiDAR Dataset for Semantic Segmentation | Iaroslav Melekhov et.al. | 2404.10699 | null |
| 2024-04-16 | Contextrast: Contextual Contrastive Learning for Semantic Segmentation | Changki Sung et.al. | 2404.10633 | null |
| 2024-04-16 | Label merge-and-split: A graph-colouring approach for memory-efficient brain parcellation | Aaron Kujawa et.al. | 2404.10572 | null |
| 2024-04-16 | LAECIPS: Large Vision Model Assisted Adaptive Edge-Cloud Collaboration for IoT-based Perception System | Shijing Hu et.al. | 2404.10498 | null |
| 2024-04-16 | Adversarial Identity Injection for Semantic Face Image Synthesis | Giuseppe Tarollo et.al. | 2404.10408 | null |
| 2024-04-16 | Domain-Rectifying Adapter for Cross-Domain Few-Shot Segmentation | Jiapeng Su et.al. | 2404.10322 | null |
| 2024-04-16 | Learnable Prompt for Few-Shot Semantic Segmentation in Remote Sensing Domain | Steve Andreas Immanuel et.al. | 2404.10307 | link |
| 2024-04-15 | NOISe: Nuclei-Aware Osteoclast Instance Segmentation for Mouse-to-Human Domain Transfer | Sai Kumar Reddy Manne et.al. | 2404.10130 | link |
| 2024-04-15 | Empowering Embodied Visual Tracking with Visual Foundation Models and Offline RL | Fangwei Zhong et.al. | 2404.09857 | null |
| 2024-04-15 | In-Context Translation: Towards Unifying Image Recognition, Processing, and Generation | Han Xue et.al. | 2404.09633 | null |
| 2024-04-15 | The revenge of BiSeNet: Efficient Multi-Task Image Segmentation | Gabriele Rosi et.al. | 2404.09570 | null |
| 2024-04-15 | kNN-CLIP: Retrieval Enables Training-Free Segmentation on Continually Expanding Large Vocabularies | Zhongrui Gui et.al. | 2404.09447 | null |
| 2024-04-15 | Human-in-the-Loop Segmentation of Multi-species Coral Imagery | Scarlett Raine et.al. | 2404.09406 | null |
| 2024-04-14 | Bridging Data Islands: Geographic Heterogeneity-Aware Federated Learning for Collaborative Remote Sensing Semantic Segmentation | Jieyi Tan et.al. | 2404.09292 | null |
| 2024-04-12 | Structured Model Pruning for Efficient Inference in Computational Pathology | Mohammed Adnan et.al. | 2404.08831 | null |
| 2024-04-12 | COCONut: Modernizing COCO Segmentation | Xueqing Deng et.al. | 2404.08639 | null |
| 2024-04-12 | Benchmarking the Cell Image Segmentation Models Robustness under the Microscope Optical Aberrations | Boyuan Peng et.al. | 2404.08549 | null |
| 2024-04-12 | Analyzing Decades-Long Environmental Changes in Namibia Using Archival Aerial Photography and Deep Learning | Girmaw Abebe Tadesse et.al. | 2404.08544 | null |
| 2024-04-12 | LaSagnA: Language-based Segmentation Assistant for Complex Queries | Cong Wei et.al. | 2404.08506 | link |
| 2024-04-12 | Adapting the Segment Anything Model During Usage in Novel Situations | Robin Schön et.al. | 2404.08421 | null |
| 2024-04-12 | Let It Flow: Simultaneous Optimization of 3D Flow and Object Clustering | Patrik Vacek et.al. | 2404.08363 | null |
| 2024-04-12 | AdaContour: Adaptive Contour Descriptor with Hierarchical Representation | Tianyu Ding et.al. | 2404.08292 | null |
| 2024-04-12 | Tackling Ambiguity from Perspective of Uncertainty Inference and Affinity Diversification for Weakly Supervised Semantic Segmentation | Zhiwei Yang et.al. | 2404.08195 | link |
| 2024-04-12 | Pay Attention to Your Neighbours: Training-Free Open-Vocabulary Semantic Segmentation | Sina Hajimiri et.al. | 2404.08181 | link |
| 2024-04-11 | Exploiting Object-based and Segmentation-based Semantic Features for Deep Learning-based Indoor Scene Classification | Ricardo Pereira et.al. | 2404.07739 | null |
| 2024-04-11 | OpenTrench3D: A Photogrammetric 3D Point Cloud Dataset for Semantic Segmentation of Underground Utilities | Lasse H. Hansen et.al. | 2404.07711 | link |
| 2024-04-11 | ViM-UNet: Vision Mamba for Biomedical Segmentation | Anwai Archit et.al. | 2404.07705 | link |
| 2024-04-11 | Implicit and Explicit Language Guidance for Diffusion-based Visual Perception | Hefeng Wang et.al. | 2404.07600 | null |
| 2024-04-11 | Improving Shift Invariance in Convolutional Neural Networks with Translation Invariant Polyphase Sampling | Sourajit Saha et.al. | 2404.07410 | null |
| 2024-04-10 | AI-Guided Defect Detection Techniques to Model Single Crystal Diamond Growth | Rohan Reddy Mekala et.al. | 2404.07306 | null |
| 2024-04-10 | RESSCAL3D: Resolution Scalable 3D Semantic Segmentation of Point Clouds | Remco Royen et.al. | 2404.06863 | null |
| 2024-04-10 | O2V-Mapping: Online Open-Vocabulary Mapping with Neural Implicit Representation | Muer Tie et.al. | 2404.06836 | null |
| 2024-04-10 | Convolution-based Probability Gradient Loss for Semantic Segmentation | Guohang Shan et.al. | 2404.06704 | null |
| 2024-04-09 | Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation | Luca Barsellotti et.al. | 2404.06542 | null |
| 2024-04-09 | QueSTMaps: Queryable Semantic Topological Maps for 3D Scene Understanding | Yash Mehan et.al. | 2404.06442 | null |
| 2024-04-09 | DaF-BEVSeg: Distortion-aware Fisheye Camera based Bird’s Eye View Segmentation with Occlusion Reasoning | Senthil Yogamani et.al. | 2404.06352 | null |
| 2024-04-09 | Automated National Urban Map Extraction | Hasan Nasrallah et.al. | 2404.06202 | null |
| 2024-04-09 | Hierarchical Insights: Exploiting Structural Similarities for Reliable 3D Semantic Segmentation | Mariella Dreissig et.al. | 2404.06124 | null |
| 2024-04-09 | Improving Facial Landmark Detection Accuracy and Efficiency with Knowledge Distillation | Zong-Wei Hong et.al. | 2404.06029 | null |
| 2024-04-08 | Evaluating the Efficacy of Cut-and-Paste Data Augmentation in Semantic Segmentation for Satellite Imagery | Ionut M. Motoi et.al. | 2404.05693 | null |
| 2024-04-08 | AlignZeg: Mitigating Objective Misalignment for Zero-shot Semantic Segmentation | Jiannan Ge et.al. | 2404.05667 | null |
| 2024-04-08 | Impact of LiDAR visualisations on semantic segmentation of archaeological objects | Raveerat Jaturapitpornchai et.al. | 2404.05512 | null |
| 2024-04-08 | Rethinking the Spatial Inconsistency in Classifier-Free Diffusion Guidance | Dazhong Shen et.al. | 2404.05384 | link |
| 2024-04-08 | GPS-free Autonomous Navigation in Cluttered Tree Rows with Deep Semantic Segmentation | Alessandro Navone et.al. | 2404.05338 | null |
| 2024-04-08 | Human Detection from 4D Radar Data in Low-Visibility Field Conditions | Mikael Skog et.al. | 2404.05307 | null |
| 2024-04-08 | iVPT: Improving Task-relevant Information Sharing in Visual Prompt Tuning by Cross-layer Dynamic Connection | Nan Zhou et.al. | 2404.05207 | null |
| 2024-04-08 | UniMix: Towards Domain Adaptive and Generalizable LiDAR Semantic Segmentation in Adverse Weather | Haimei Zhao et.al. | 2404.05145 | null |
| 2024-04-07 | D2SL: Decouple Defogging and Semantic Learning for Foggy Domain-Adaptive Segmentation | Xuan Sun et.al. | 2404.04807 | null |
| 2024-04-06 | HawkDrive: A Transformer-driven Visual Perception System for Autonomous Driving in Night Scene | Ziang Guo et.al. | 2404.04653 | link |
| 2024-04-05 | Sigma: Siamese Mamba Network for Multi-Modal Semantic Segmentation | Zifu Wan et.al. | 2404.04256 | null |
| 2024-04-05 | Image-Text Co-Decomposition for Text-Supervised Semantic Segmentation | Ji-Jia Wu et.al. | 2404.04231 | null |
| 2024-04-05 | MarsSeg: Mars Surface Semantic Segmentation with Multi-level Extractor and Connector | Junbo Li et.al. | 2404.04155 | null |
| 2024-04-04 | Language-Guided Instance-Aware Domain-Adaptive Panoptic Segmentation | Elham Amin Mansour et.al. | 2404.03799 | null |
| 2024-04-04 | Flattening the Parent Bias: Hierarchical Semantic Segmentation in the Poincaré Ball | Simon Weber et.al. | 2404.03778 | null |
| 2024-04-04 | OW-VISCap: Open-World Video Instance Segmentation and Captioning | Anwesa Choudhuri et.al. | 2404.03657 | null |
| 2024-04-04 | Background Noise Reduction of Attention Map for Weakly Supervised Semantic Segmentation | Izumi Fujimori et.al. | 2404.03394 | null |
| 2024-04-04 | iSeg: Interactive 3D Segmentation via Interactive Attention | Itai Lang et.al. | 2404.03219 | null |
| 2024-04-04 | CORP: A Multi-Modal Dataset for Campus-Oriented Roadside Perception Tasks | Beibei Wang et.al. | 2404.03191 | null |
| 2024-04-03 | GPU-Accelerated RSF Level Set Evolution for Large-Scale Microvascular Segmentation | Meher Niger et.al. | 2404.02813 | null |
| 2024-04-03 | RS-Mamba for Large Remote Sensing Image Dense Prediction | Sijie Zhao et.al. | 2404.02668 | link |
| 2024-04-03 | A Satellite Band Selection Framework for Amazon Forest Deforestation Detection Task | Eduardo Neto et.al. | 2404.02659 | null |
| 2024-04-03 | SG-BEV: Satellite-Guided BEV Fusion for Cross-View Semantic Segmentation | Junyan Ye et.al. | 2404.02638 | link |
| 2024-04-03 | Active learning for efficient annotation in precision agriculture: a use-case on crop-weed semantic segmentation | Bart M. van Marrewijk et.al. | 2404.02580 | null |
| 2024-04-03 | HENet: Hybrid Encoding for End-to-end Multi-task 3D Perception from Multi-view Cameras | Zhongyu Xia et.al. | 2404.02517 | link |
| 2024-04-03 | Optimizing traffic signs and lights visibility for the teleoperation of autonomous vehicles through ROI compression | I. Dror et.al. | 2404.02481 | null |
| 2024-04-03 | RS3Mamba: Visual State Space Model for Remote Sensing Images Semantic Segmentation | Xianping Ma et.al. | 2404.02457 | link |
| 2024-04-02 | Constrained Robotic Navigation on Preferred Terrains Using LLMs and Speech Instruction: Exploiting the Power of Adverbs | Faraz Lotfi et.al. | 2404.02294 | null |
| 2024-04-02 | Segment Any 3D Object with Language | Seungjun Lee et.al. | 2404.02157 | null |
| 2024-04-02 | Multi-Level Label Correction by Distilling Proximate Patterns for Semi-supervised Semantic Segmentation | Hui Xiao et.al. | 2404.02065 | null |
| 2024-04-01 | What is Point Supervision Worth in Video Instance Segmentation? | Shuaiyi Huang et.al. | 2404.01990 | null |
| 2024-04-02 | Synthetic Data for Robust Stroke Segmentation | Liam Chalcroft et.al. | 2404.01946 | link |
| 2024-04-02 | Improving Bird’s Eye View Semantic Segmentation by Task Decomposition | Tianhao Zhao et.al. | 2404.01925 | null |
| 2024-04-02 | Rethinking Annotator Simulation: Realistic Evaluation of Whole-Body PET Lesion Interactive Segmentation Methods | Zdravko Marinov et.al. | 2404.01816 | null |
| 2024-04-02 | Samba: Semantic Segmentation of Remotely Sensed Images with State Space Model | Qinfeng Zhu et.al. | 2404.01705 | null |
| 2024-04-02 | Beyond Image Super-Resolution for Image Recognition with Task-Driven Perceptual Loss | Jaeha Kim et.al. | 2404.01692 | null |
| 2024-04-02 | JRDB-PanoTrack: An Open-world Panoptic Segmentation and Tracking Robotic Dataset in Crowded Human Environments | Duy-Tho Le et.al. | 2404.01686 | null |
| 2024-04-01 | SUGAR: Pre-training 3D Visual Representations for Robotics | Shizhe Chen et.al. | 2404.01491 | null |
| 2024-03-29 | ECLIPSE: Efficient Continual Learning in Panoptic Segmentation with Visual Prompt Tuning | Beomyoung Kim et.al. | 2403.20126 | link |
| 2024-03-29 | Modeling Weather Uncertainty for Multi-weather Co-Presence Estimation | Qi Bi et.al. | 2403.20092 | null |
| 2024-03-29 | Using Images as Covariates: Measuring Curb Appeal with Deep Learning | Ardyn Nordstrom et.al. | 2403.19915 | null |
| 2024-03-29 | MambaMixer: Efficient Selective State Space Models with Dual Token and Channel Selection | Ali Behrouz et.al. | 2403.19888 | null |
| 2024-03-28 | Segmentation Re-thinking Uncertainty Estimation Metrics for Semantic Segmentation | Qitian Ma et.al. | 2403.19826 | null |
| 2024-04-01 | Efficient 3D Instance Mapping and Localization with Neural Fields | George Tang et.al. | 2403.19797 | null |
| 2024-03-28 | ENet-21: An Optimized light CNN Structure for Lane Detection | Seyed Rasoul Hosseini et.al. | 2403.19782 | null |
| 2024-03-29 | Genetic Quantization-Aware Approximation for Non-Linear Operations in Transformers | Pingcheng Dong et.al. | 2403.19591 | link |
| 2024-03-28 | DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs | Donghyun Kim et.al. | 2403.19588 | link |
| 2024-03-28 | Learning Multiple Representations with Inconsistency-Guided Detail Regularization for Mask-Guided Matting | Weihao Jiang et.al. | 2403.19213 | null |
| 2024-03-27 | Lift3D: Zero-Shot Lifting of Any 2D Vision Model to 3D | Mukund Varma T et.al. | 2403.18922 | null |
| 2024-03-27 | Annolid: Annotate, Segment, and Track Anything You Need | Chen Yang et.al. | 2403.18690 | null |
| 2024-03-27 | I2CKD : Intra- and Inter-Class Knowledge Distillation for Semantic Segmentation | Ayoub Karine et.al. | 2403.18490 | null |
| 2024-03-28 | ViTAR: Vision Transformer with Any Resolution | Qihang Fan et.al. | 2403.18361 | null |
| 2024-03-27 | Generating Diverse Agricultural Data for Vision-Based Farming Applications | Mikolaj Cieslak et.al. | 2403.18351 | null |
| 2024-03-27 | Road Obstacle Detection based on Unknown Objectness Scores | Chihiro Noguchi et.al. | 2403.18207 | null |
| 2024-03-26 | Spectral Convolutional Transformer: Harmonizing Real vs. Complex Multi-View Spectral Operators for Vision Transformer | Badri N. Patro et.al. | 2403.18063 | link |
| 2024-03-26 | The Need for Speed: Pruning Transformers with One Recipe | Samir Khaki et.al. | 2403.17921 | link |
| 2024-03-26 | Compressed Multi-task embeddings for Data-Efficient Downstream training and inference in Earth Observation | Carlos Gomes et.al. | 2403.17886 | null |
| 2024-03-26 | PlainMamba: Improving Non-Hierarchical Mamba in Visual Recognition | Chenhongyi Yang et.al. | 2403.17695 | link |
| 2024-03-26 | Integrating Mamba Sequence Model and Hierarchical Upsampling Network for Accurate Semantic Segmentation of Multiple Sclerosis Legion | Kazi Shahriar Sanjid et.al. | 2403.17432 | null |
| 2024-03-25 | Optimizing LiDAR Placements for Robust Driving Perception in Adverse Conditions | Ye Li et.al. | 2403.17009 | link |
| 2024-03-25 | DreamLIP: Language-Image Pre-training with Long Captions | Kecheng Zheng et.al. | 2403.17007 | null |
| 2024-03-25 | TwinLiteNetPlus: A Stronger Model for Real-time Drivable Area and Lane Segmentation | Quang-Huy Che et.al. | 2403.16958 | null |
| 2024-03-25 | HPL-ESS: Hybrid Pseudo-Labeling for Unsupervised Event-based Semantic Segmentation | Linglin Jing et.al. | 2403.16788 | null |
| 2024-03-25 | Clustering Propagation for Universal Medical Image Segmentation | Yuhang Ding et.al. | 2403.16646 | null |
| 2024-03-25 | SatSynth: Augmenting Image-Mask Pairs through Diffusion Models for Aerial Semantic Segmentation | Aysim Toker et.al. | 2403.16605 | null |
| 2024-03-25 | Self-Supervised Learning for Medical Image Data with Anatomy-Oriented Imaging Planes | Tianwei Zhang et.al. | 2403.16499 | null |
| 2024-03-25 | GoodSAM: Bridging Domain and Capacity Gaps via Segment Anything Model for Distortion-aware Panoramic Semantic Segmentation | Weiming Zhang et.al. | 2403.16370 | null |
| 2024-03-24 | AutoInst: Automatic Instance-Based Segmentation of LiDAR 3D Scans | Cedric Perauer et.al. | 2403.16318 | null |
| 2024-03-24 | Dual-modal Prior Semantic Guided Infrared and Visible Image Fusion for Intelligent Transportation System | Jing Li et.al. | 2403.16227 | null |
| 2024-03-24 | Segment Anything Model for Road Network Graph Extraction | Congrui Hetang et.al. | 2403.16051 | link |
| 2024-03-24 | SM2C: Boost the Semi-supervised Segmentation for Medical Image by using Meta Pseudo Labels and Mixed Images | Yifei Wang et.al. | 2403.16009 | null |
| 2024-03-22 | Semantic Gaussians: Open-Vocabulary Scene Understanding with 3D Gaussian Splatting | Jun Guo et.al. | 2403.15624 | null |
| 2024-03-22 | A2DMN: Anatomy-Aware Dilated Multiscale Network for Breast Ultrasound Semantic Segmentation | Kyle Lucke et.al. | 2403.15560 | null |
| 2024-03-22 | InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding | Yi Wang et.al. | 2403.15377 | link |
| 2024-03-22 | Anytime, Anywhere, Anyone: Investigating the Feasibility of Segment Anything Model for Crowd-Sourcing Medical Image Annotations | Pranav Kulkarni et.al. | 2403.15218 | null |
| 2024-03-22 | Your Image is My Video: Reshaping the Receptive Field via Image-To-Video Differentiable AutoAugmentation and Fusion | Sofia Casarin et.al. | 2403.15194 | null |
| 2024-03-22 | IFSENet : Harnessing Sparse Iterations for Interactive Few-shot Segmentation Excellence | Shreyas Chandgothia et.al. | 2403.15089 | null |
| 2024-03-22 | Towards a Comprehensive, Efficient and Promptable Anatomic Structure Segmentation Model using 3D Whole-body CT Scans | Heng Guo et.al. | 2403.15063 | null |
| 2024-03-22 | BSNet: Box-Supervised Simulation-assisted Mean Teacher for 3D Instance Segmentation | Jiahao Lu et.al. | 2403.15019 | null |
| 2024-03-22 | Improve Cross-domain Mixed Sampling with Guidance Training for Adaptive Segmentation | Wenlve Zhou et.al. | 2403.14995 | null |
| 2024-03-21 | WeatherProof: Leveraging Language Guidance for Semantic Segmentation in Adverse Weather | Blake Gella et.al. | 2403.14874 | null |
| 2024-03-21 | PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model | Zheng Zhang et.al. | 2403.14598 | link |
| 2024-03-21 | Learning to Project for Cross-Task Knowledge Distillation | Dylan Auty et.al. | 2403.14494 | null |
| 2024-03-21 | OA-CNNs: Omni-Adaptive Sparse CNNs for 3D Semantic Segmentation | Bohao Peng et.al. | 2403.14418 | link |
| 2024-03-21 | Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models | Pablo Marcos-Manchón et.al. | 2403.14291 | link |
| 2024-03-21 | OTSeg: Multi-prompt Sinkhorn Attention for Zero-Shot Semantic Segmentation | Kwanyoung Kim et.al. | 2403.14183 | null |
| 2024-03-21 | Evidential Semantic Mapping in Off-road Environments with Uncertainty-aware Bayesian Kernel Inference | Junyoung Kim et.al. | 2403.14138 | null |
| 2024-03-21 | Soft Masked Transformer for Point Cloud Processing with Skip Attention-Based Upsampling | Yong He et.al. | 2403.14124 | null |
| 2024-03-21 | Semantics from Space: Satellite-Guided Thermal Semantic Segmentation Annotation for Aerial Field Robots | Connor Lee et.al. | 2403.14056 | null |
| 2024-03-20 | When Cars meet Drones: Hyperbolic Federated Learning for Source-Free Domain Adaptation in Adverse Weather | Giulia Rizzoli et.al. | 2403.13762 | null |
| 2024-03-20 | Next day fire prediction via semantic segmentation | Konstantinos Alexis et.al. | 2403.13545 | null |
| 2024-03-20 | MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining | Di Wang et.al. | 2403.13430 | link |
| 2024-03-20 | AMCO: Adaptive Multimodal Coupling of Vision and Proprioception for Quadruped Robot Navigation in Outdoor Environments | Mohamed Elnoor et.al. | 2403.13235 | null |
| 2024-03-20 | Modeling the Label Distributions for Weakly-Supervised Semantic Segmentation | Linshan Wu et.al. | 2403.13225 | link |
| 2024-03-19 | Reflectivity Is All You Need!: Advancing LiDAR Semantic Segmentation | Kasi Viswanath et.al. | 2403.13188 | null |
| 2024-03-19 | As Firm As Their Foundations: Can open-sourced foundation models be used to create adversarial examples for downstream tasks? | Anjun Hu et.al. | 2403.12693 | null |
| 2024-03-19 | PCT: Perspective Cue Training Framework for Multi-Camera BEV Segmentation | Haruya Ishikawa et.al. | 2403.12530 | null |
| 2024-03-19 | Semantics, Distortion, and Style Matter: Towards Source-free UDA for Panoramic Segmentation | Xu Zheng et.al. | 2403.12505 | null |
| 2024-03-19 | CLIP-VIS: Adapting CLIP for Open-Vocabulary Video Instance Segmentation | Wenqi Zhu et.al. | 2403.12455 | link |
| 2024-03-19 | Multi-Object RANSAC: Efficient Plane Clustering Method in a Clutter | Seunghyeon Lim et.al. | 2403.12449 | null |
| 2024-03-18 | EffiPerception: an Efficient Framework for Various Perception Tasks | Xinhao Xiang et.al. | 2403.12317 | null |
| 2024-03-18 | Aerial Lifting: Neural Urban Semantic and Building Instance Lifting from Aerial Imagery | Yuqi Zhang et.al. | 2403.11812 | null |
| 2024-03-18 | Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation | Wangbo Zhao et.al. | 2403.11808 | link |
| 2024-03-18 | LSKNet: A Foundation Lightweight Backbone for Remote Sensing | Yuxuan Li et.al. | 2403.11735 | null |
| 2024-03-18 | TTT-KD: Test-Time Training for 3D Semantic Segmentation through Knowledge Distillation from Foundation Models | Lisa Weijler et.al. | 2403.11691 | null |
| 2024-03-18 | Better (pseudo-)labels for semi-supervised instance segmentation | François Porcher et.al. | 2403.11675 | null |
| 2024-03-18 | Synthesizing multi-log grasp poses | Arvid Fälldin et.al. | 2403.11623 | null |
| 2024-03-18 | OurDB: Ouroboric Domain Bridging for Multi-Target Domain Adaptive Semantic Segmentation | Seungbeom Woo et.al. | 2403.11582 | null |
| 2024-03-18 | MISS: Memory-efficient Instance Segmentation Framework By Visual Inductive Priors Flow Propagation | Chih-Chung Hsu et.al. | 2403.11576 | null |
| 2024-03-18 | Augment Before Copy-Paste: Data and Memory Efficiency-Oriented Instance Segmentation Framework for Sport-scenes | Chih-Chung Hsu et.al. | 2403.11572 | null |
| 2024-03-18 | Circle Representation for Medical Instance Object Segmentation | Juming Xiong et.al. | 2403.11507 | link |
| 2024-03-18 | MCD: Diverse Large-Scale Multi-Campus Dataset for Robot Perception | Thien-Minh Nguyen et.al. | 2403.11496 | null |
| 2024-03-18 | Uncertainty-Calibrated Test-Time Model Adaptation without Forgetting | Mingkui Tan et.al. | 2403.11491 | null |
| 2024-03-18 | ShapeFormer: Shape Prior Visible-to-Amodal Transformer-based Amodal Instance Segmentation | Minh Tran et.al. | 2403.11376 | null |
| 2024-03-14 | PosSAM: Panoptic Open-vocabulary Segment Anything | Vibashan VS et.al. | 2403.09620 | link |
| 2024-03-14 | WeakSurg: Weakly supervised surgical instrument segmentation using temporal equivariance and semantic continuity | Qiyuan Wang et.al. | 2403.09551 | null |
| 2024-03-14 | Annotation Free Semantic Segmentation with Vision Foundation Models | Soroush Seifi et.al. | 2403.09307 | null |
| 2024-03-14 | StainFuser: Controlling Diffusion for Faster Neural Style Transfer in Multi-Gigapixel Histology Images | Robert Jewsbury et.al. | 2403.09302 | link |
| 2024-03-14 | Customizing Segmentation Foundation Model via Prompt Learning for Instance Segmentation | Hyung-Il Kim et.al. | 2403.09199 | null |
| 2024-03-14 | When Semantic Segmentation Meets Frequency Aliasing | Linwei Chen et.al. | 2403.09065 | link |
| 2024-03-13 | CART: Caltech Aerial RGB-Thermal Dataset in the Wild | Connor Lee et.al. | 2403.08997 | link |
| 2024-03-13 | SLCF-Net: Sequential LiDAR-Camera Fusion for Semantic Scene Completion using a 3D Recurrent U-Net | Helin Cao et.al. | 2403.08885 | null |
| 2024-03-13 | Segmentation of Knee Bones for Osteoarthritis Assessment: A Comparative Analysis of Supervised, Few-Shot, and Zero-Shot Learning Approaches | Yun Xin Teoh et.al. | 2403.08761 | null |
| 2024-03-13 | Real-time 3D semantic occupancy prediction for autonomous vehicles using memory-efficient sparse convolution | Samuel Sze et.al. | 2403.08748 | null |
| 2024-03-13 | Semantic Segmentation of Solar Radio Spikes at Low Frequencies | Pearse C. Murphy et.al. | 2403.08546 | null |
| 2024-03-13 | Language-Driven Visual Consensus for Zero-Shot Semantic Segmentation | Zicheng Zhang et.al. | 2403.08426 | null |
| 2024-03-13 | LIX: Implicitly Infusing Spatial Geometric Prior Knowledge into Visual Semantic Segmentation for Autonomous Driving | Sicen Guo et.al. | 2403.08215 | null |
| 2024-03-13 | Multiscale Low-Frequency Memory Network for Improved Feature Extraction in Convolutional Neural Networks | Fuzhi Wu et.al. | 2403.08157 | link |
| 2024-03-12 | Mitigating the Impact of Attribute Editing on Face Recognition | Sudipta Banerjee et.al. | 2403.08092 | null |
| 2024-03-12 | Hunting Attributes: Context Prototype-Aware Learning for Weakly Supervised Semantic Segmentation | Feilong Tang et.al. | 2403.07630 | link |
| 2024-03-12 | PeLK: Parameter-efficient Large Kernel ConvNets with Peripheral Convolution | Honghao Chen et.al. | 2403.07589 | null |
| 2024-03-12 | Open-World Semantic Segmentation Including Class Similarity | Matteo Sodano et.al. | 2403.07532 | null |
| 2024-03-11 | Average Calibration Error: A Differentiable Loss for Improved Reliability in Image Segmentation | Theodore Barfoot et.al. | 2403.06759 | link |
| 2024-03-11 | Forest Inspection Dataset for Aerial Semantic Segmentation and Depth Estimation | Bianca-Cerasela-Zelia Blaga et.al. | 2403.06621 | link |
| 2024-03-11 | OMH: Structured Sparsity via Optimally Matched Hierarchy for Unsupervised Semantic Segmentation | Baran Ozaydin et.al. | 2403.06546 | null |
| 2024-03-11 | 3D Semantic Segmentation-Driven Representations for 3D Object Detection | Hayeon O et.al. | 2403.06501 | link |
| 2024-03-11 | Point Mamba: A Novel Point Cloud Backbone Based on State Space Model with Octree-Based Ordering Strategy | Jiuming Liu et.al. | 2403.06467 | link |
| 2024-03-11 | Towards the Uncharted: Density-Descending Feature Perturbation for Semi-supervised Semantic Segmentation | Xiaoyang Wang et.al. | 2403.06462 | null |
| 2024-03-11 | Refining Segmentation On-the-Fly: An Interactive Framework for Point Cloud Semantic Segmentation | Peng Zhang et.al. | 2403.06401 | null |
| 2024-03-10 | Style Blind Domain Generalized Semantic Segmentation via Covariance Alignment and Semantic Consistence Contrastive Learning | Woo-Jin Ahn et.al. | 2403.06122 | link |
| 2024-03-09 | Mask-Enhanced Segment Anything Model for Tumor Lesion Semantic Segmentation | Hairong Shi et.al. | 2403.05912 | null |
| 2024-03-09 | Segmentation Guided Sparse Transformer for Under-Display Camera Image Restoration | Jingyun Xue et.al. | 2403.05906 | null |
| 2024-03-08 | Attention-guided Feature Distillation for Semantic Segmentation | Amir M. Mansourian et.al. | 2403.05451 | link |
| 2024-03-08 | Generalized Correspondence Matching via Flexible Hierarchical Refinement and Patch Descriptor Distillation | Yu Han et.al. | 2403.05388 | null |
| 2024-03-08 | Frequency-Adaptive Dilated Convolution for Semantic Segmentation | Linwei Chen et.al. | 2403.05369 | link |
| 2024-03-08 | Embedded Deployment of Semantic Segmentation in Medicine through Low-Resolution Inputs | Erik Ostrowski et.al. | 2403.05340 | null |
| 2024-03-08 | LVIC: Multi-modality segmentation by Lifting Visual Info as Cue | Zichao Dong et.al. | 2403.05159 | null |
| 2024-03-07 | SAM-PD: How Far Can SAM Take Us in Tracking and Segmenting Anything in Videos by Prompt Denoising | Tao Zhou et.al. | 2403.04194 | link |
| 2024-03-06 | ECAP: Extensive Cut-and-Paste Augmentation for Unsupervised Domain Adaptive Semantic Segmentation | Erik Brorsson et.al. | 2403.03854 | link |
| 2024-03-06 | Multi-Grained Cross-modal Alignment for Learning Open-vocabulary Semantic Segmentation from Text Supervision | Yajie Liu et.al. | 2403.03707 | null |
| 2024-03-06 | Causal Prototype-inspired Contrast Adaptation for Unsupervised Domain Adaptive Semantic Segmentation of High-resolution Remote Sensing Imagery | Jingru Zhu et.al. | 2403.03704 | null |
| 2024-03-06 | GSNeRF: Generalizable Semantic Neural Radiance Fields with Enhanced 3D Scene Understanding | Zi-Ting Chou et.al. | 2403.03608 | null |
| 2024-03-06 | Multi-task Learning for Real-time Autonomous Driving Leveraging Task-adaptive Attention Generator | Wonhyeok Choi et.al. | 2403.03468 | null |
| 2024-03-05 | CenterDisks: Real-time instance segmentation with disk covering | Katia Jodogne-Del Litto et.al. | 2403.03296 | link |
| 2024-03-05 | Improved LiDAR Odometry and Mapping using Deep Semantic Segmentation and Novel Outliers Detection | Mohamed Afifi et.al. | 2403.03111 | null |
| 2024-03-05 | ActiveAD: Planning-Oriented Active Learning for End-to-End Autonomous Driving | Han Lu et.al. | 2403.02877 | null |
| 2024-03-05 | DDF: A Novel Dual-Domain Image Fusion Strategy for Remote Sensing Image Semantic Segmentation with Unsupervised Domain Adaptation | Lingyan Ran et.al. | 2403.02784 | null |
| 2024-03-05 | Learning without Exact Guidance: Updating Large-scale High-resolution Land Cover Maps from Low-resolution Historical Labels | Zhuohong Li et.al. | 2403.02746 | null |
| 2024-03-05 | FastOcc: Accelerating 3D Occupancy Prediction by Fusing the 2D Bird’s-Eye View and Perspective View | Jiawei Hou et.al. | 2403.02710 | null |
| 2024-03-05 | Deep Common Feature Mining for Efficient Video Semantic Segmentation | Yaoyan Zheng et.al. | 2403.02689 | null |
| 2024-03-04 | Self-Supervised Facial Representation Learning with Facial Region Awareness | Zheng Gao et.al. | 2403.02138 | null |
| 2024-03-04 | Semi-Supervised Semantic Segmentation Based on Pseudo-Labels: A Survey | Lingyan Ran et.al. | 2403.01909 | null |
| 2024-03-04 | Map-aided annotation for pole base detection | Benjamin Missaoui et.al. | 2403.01868 | null |
| 2024-03-04 | AllSpark: Reborn Labeled Features from Unlabeled in Transformer for Semi-Supervised Semantic Segmentation | Haonan Wang et.al. | 2403.01818 | link |
| 2024-03-02 | Benchmarking Segmentation Models with Mask-Preserved Attribute Editing | Zijin Yin et.al. | 2403.01231 | link |
| 2024-03-02 | Boosting Box-supervised Instance Segmentation with Pseudo Depth | Xinyi Yu et.al. | 2403.01214 | null |
| 2024-03-02 | Auxiliary Tasks Enhanced Dual-affinity Learning for Weakly Supervised Semantic Segmentation | Lian Xu et.al. | 2403.01156 | null |
| 2024-03-01 | Rethinking Few-shot 3D Point Cloud Semantic Segmentation | Zhaochong An et.al. | 2403.00592 | link |
| 2024-03-01 | Small, Versatile and Mighty: A Range-View Perception Framework | Qiang Meng et.al. | 2403.00325 | null |
| 2024-03-01 | YOLO-MED : Multi-Task Interaction Network for Biomedical Images | Suizhi Huang et.al. | 2403.00245 | null |
| 2024-02-29 | FusionVision: A comprehensive approach of 3D object reconstruction and segmentation from RGB-D cameras using YOLO and fast segment anything | Safouane El Ghazouali et.al. | 2403.00175 | link |
| 2024-02-29 | Leveraging AI Predicted and Expert Revised Annotations in Interactive Segmentation: Continual Tuning or Full Training? | Tiezheng Zhang et.al. | 2402.19423 | null |
| 2024-03-01 | PEM: Prototype-based Efficient MaskFormer for Image Segmentation | Niccolò Cavagnero et.al. | 2402.19422 | link |
| 2024-02-29 | RSAM-Seg: A SAM-based Approach with Prior Knowledge Integration for Remote Sensing Image Semantic Segmentation | Jie Zhang et.al. | 2402.19004 | null |
| 2024-02-28 | Spatial Coherence Loss for Salient and Camouflaged Object Detection and Beyond | Ziyun Yang et.al. | 2402.18698 | null |
| 2024-02-29 | Separate and Conquer: Decoupling Co-occurrence via Decomposition and Representation for Weakly Supervised Semantic Segmentation | Zhiwei Yang et.al. | 2402.18467 | link |
| 2024-02-29 | A Modular System for Enhanced Robustness of Multimedia Understanding Networks via Deep Parametric Estimation | Francesco Barbato et.al. | 2402.18402 | null |
| 2024-02-28 | Enhancing Roadway Safety: LiDAR-based Tree Clearance Analysis | Miriam Louise Carnot et.al. | 2402.18309 | null |
| 2024-02-28 | Feature Denoising For Low-Light Instance Segmentation Using Weighted Non-Local Blocks | Joanne Lin et.al. | 2402.18307 | null |
| 2024-02-28 | Self-Supervised Learning in Electron Microscopy: Towards a Foundation Model for Advanced Image Analysis | Bashir Kazimi et.al. | 2402.18286 | null |
| 2024-02-28 | PRCL: Probabilistic Representation Contrastive Learning for Semi-Supervised Semantic Segmentation | Haoyu Xie et.al. | 2402.18117 | null |
| 2024-02-28 | Spannotation: Enhancing Semantic Segmentation for Autonomous Navigation with Efficient Image Annotation | Samuel O. Folorunsho et.al. | 2402.18084 | link |
| 2024-02-27 | Weakly Supervised Co-training with Swapping Assignments for Semantic Segmentation | Xinyu Yang et.al. | 2402.17891 | link |
| 2024-02-27 | Mitigating Distributional Shift in Semantic Segmentation via Uncertainty Estimation from Unlabelled Data | David S. W. Williams et.al. | 2402.17653 | null |
| 2024-02-27 | Masked Gamma-SSL: Learning Uncertainty Estimation via Masked Image Modeling | David S. W. Williams et.al. | 2402.17622 | null |
Object Tracking
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2025-12-22 | Learning Generalizable Hand-Object Tracking from Synthetic Demonstrations | Yinhuai Wang et.al. | 2512.19583 | null |
| 2025-12-12 | A 96pJ/Frame/Pixel and 61pJ/Event Anti-UAV System with Hybrid Object Tracking Modes | Yuncheng Lu et.al. | 2512.17939 | null |
| 2025-12-17 | Tracking spatial temporal details in ultrasound long video via wavelet analysis and memory bank | Chenxiao Zhang et.al. | 2512.15066 | null |
| 2025-12-17 | Beyond Proximity: A Keypoint-Trajectory Framework for Classifying Affiliative and Agonistic Social Networks in Dairy Cattle | Sibi Parivendan et.al. | 2512.14998 | null |
| 2025-12-16 | TUMTraf EMOT: Event-Based Multi-Object Tracking Dataset and Baseline for Traffic Scenarios | Mengyu Li et.al. | 2512.14595 | null |
| 2025-12-16 | Quadratic Kalman Filter for Elliptical Extended Object Tracking based on Decoupling State Components | Simon Steuernagel et.al. | 2512.14426 | null |
| 2025-12-15 | Recurrent Video Masked Autoencoders | Daniel Zoran et.al. | 2512.13684 | null |
| 2025-12-15 | LeafTrackNet: A Deep Learning Framework for Robust Leaf Tracking in Top-Down Plant Phenotyping | Shanghua Liu et.al. | 2512.13130 | null |
| 2025-12-15 | Light Field Based 6DoF Tracking of Previously Unobserved Objects | Nikolai Goncharov et.al. | 2512.13007 | null |
| 2025-12-11 | MeViS: A Multi-Modal Dataset for Referring Motion Expression Video Segmentation | Henghui Ding et.al. | 2512.10945 | null |
| 2025-12-10 | Benchmarking SAM2-based Trackers on FMOX | Senem Aktas et.al. | 2512.09633 | null |
| 2025-12-10 | Efficient Feature Compression for Machines with Global Statistics Preservation | Md Eimran Hossain Eimon et.al. | 2512.09235 | null |
| 2025-12-08 | GorillaWatch: An Automated System for In-the-Wild Gorilla Re-Identification and Population Monitoring | Maximilian Schall et.al. | 2512.07776 | null |
| 2025-12-08 | How Far are Modern Trackers from UAV-Anti-UAV? A Million-Scale Benchmark and New Baseline | Chunhui Zhang et.al. | 2512.07385 | null |
| 2025-12-06 | NexusFlow: Unifying Disparate Tasks under Partial Supervision via Invertible Flow Networks | Fangzhou Lin et.al. | 2512.06251 | null |
| 2025-12-04 | Two-Stage Camera Calibration Method for Multi-Camera Systems Using Scene Geometry | Aleksandr Abramov et.al. | 2512.05171 | null |
| 2025-12-02 | TrackNetV5: Residual-Driven Spatio-Temporal Refinement and Motion Direction Decoupling for Fast Object Tracking | Tang Haonan et.al. | 2512.02789 | null |
| 2025-12-02 | UAUTrack: Towards Unified Multimodal Anti-UAV Visual Tracking | Qionglin Ren et.al. | 2512.02668 | null |
| 2025-12-02 | From Detection to Association: Learning Discriminative Object Embeddings for Multi-Object Tracking | Yuqing Shao et.al. | 2512.02392 | null |
| 2025-12-01 | Physical ID-Transfer Attacks against Multi-Object Tracking via Adversarial Trajectory | Chenyi Wang et.al. | 2512.01934 | null |
| 2025-12-01 | TransientTrack: Advanced Multi-Object Tracking and Classification of Cancer Cells with Transient Fluorescent Signals | Florian Bürger et.al. | 2512.01885 | null |
| 2025-12-01 | BlinkBud: Detecting Hazards from Behind via Sampled Monocular 3D Detection on a Single Earbud | Yunzhe Li et.al. | 2512.01366 | null |
| 2025-11-30 | City-Conditioned Memory for Multi-City Traffic and Mobility Forecasting | Wenzhang Du et.al. | 2512.00851 | null |
| 2025-11-28 | MANTA: Physics-Informed Generalized Underwater Object Tracking | Suhas Srinath et.al. | 2511.23405 | null |
| 2025-11-28 | DM$^3$T: Harmonizing Modalities via Diffusion for Multi-Object Tracking | Weiran Li et.al. | 2511.22896 | null |
| 2025-11-27 | Bistatic Passive Tracking via CSI Power | Zhongqin Wang et.al. | 2511.22144 | null |
| 2025-11-26 | Referring Video Object Segmentation with Cross-Modality Proxy Queries | Baoli Sun et.al. | 2511.21139 | null |
| 2025-11-26 | AerialMind: Towards Referring Multi-Object Tracking in UAV Scenarios | Chenglizhao Chen et.al. | 2511.21053 | null |
| 2025-11-25 | V$^{2}$-SAM: Marrying SAM2 with Multi-Prompt Experts for Cross-View Object Correspondence | Jiancheng Pan et.al. | 2511.20886 | null |
| 2025-11-25 | Video Object Recognition in Mobile Edge Networks: Local Tracking or Edge Detection? | Kun Guo et.al. | 2511.20716 | null |
| 2025-11-25 | StableTrack: Stabilizing Multi-Object Tracking on Low-Frequency Detections | Matvei Shelukhan et.al. | 2511.20418 | null |
| 2025-11-25 | SAFE-IMM: Robust and Lightweight Radar-Based Object Tracking on Mobile Platforms | Dnyandeep Mandaokar et.al. | 2511.20294 | null |
| 2025-11-25 | Occlusion-Aware Multi-Object Tracking via Expected Probability of Detection | Jan Krejčí et.al. | 2511.20239 | null |
| 2025-11-25 | Image Diffusion Models Exhibit Emergent Temporal Propagation in Videos | Youngseo Kim et.al. | 2511.19936 | null |
| 2025-11-22 | Tracking and Segmenting Anything in Any Modality | Tianlu Zhang et.al. | 2511.19475 | null |
| 2025-11-24 | Real-Time Object Tracking with On-Device Deep Learning for Adaptive Beamforming in Dynamic Acoustic Environments | Jorge Ortigoso-Narro et.al. | 2511.19396 | null |
| 2025-11-24 | LAA3D: A Benchmark of Detecting and Tracking Low-Altitude Aircraft in 3D Space | Hai Wu et.al. | 2511.19057 | null |
| 2025-11-24 | Stable Multi-Drone GNSS Tracking System for Marine Robots | Shuo Wen et.al. | 2511.18694 | null |
| 2025-11-23 | A Tri-Modal Dataset and a Baseline System for Tracking Unmanned Aerial Vehicles | Tianyang Xu et.al. | 2511.18344 | null |
| 2025-11-23 | SatSAM2: Motion-Constrained Video Object Tracking in Satellite Imagery using Promptable SAM2 and Kalman Priors | Ruijie Fan et.al. | 2511.18264 | null |
| 2025-11-22 | CADTrack: Learning Contextual Aggregation with Deformable Alignment for Robust RGBT Tracking | Hao Li et.al. | 2511.17967 | null |
| 2025-11-21 | Vision-Motion-Reference Alignment for Referring Multi-Object Tracking via Multi-Modal Large Language Models | Weiyi Lv et.al. | 2511.17681 | null |
| 2025-11-18 | 3D Ground Truth Reconstruction from Multi-Camera Annotations Using UKF | Linh Van Ma et.al. | 2511.17609 | null |
| 2025-10-04 | Deep Learning-based Lightweight RGB Object Tracking for Augmented Reality Devices | Alice Smith et.al. | 2511.17508 | null |
| 2025-11-21 | OmniPT: Unleashing the Potential of Large Vision Language Models for Pedestrian Tracking and Understanding | Teng Fu et.al. | 2511.17053 | null |
| 2025-11-21 | RacketVision: A Multiple Racket Sports Benchmark for Unified Ball and Racket Analysis | Linfeng Dong et.al. | 2511.17045 | null |
| 2025-11-20 | Physics-Informed Machine Learning for Efficient Sim-to-Real Data Augmentation in Micro-Object Pose Estimation | Zongcai Tan et.al. | 2511.16494 | null |
| 2025-11-20 | SwiTrack: Tri-State Switch for Cross-Modal Object Tracking | Boyue Xu et.al. | 2511.16227 | null |
| 2025-11-19 | CompTrack: Information Bottleneck-Guided Low-Rank Dynamic Token Compression for Point Cloud Tracking | Sifan Zhou et.al. | 2511.15580 | null |
| 2025-11-19 | MambaTrack3D: A State Space Model Framework for LiDAR-Based Object Tracking under High Temporal Variation | Shengjing Tian et.al. | 2511.15077 | null |
| 2025-11-17 | SAE-MCVT: A Real-Time and Scalable Multi-Camera Vehicle Tracking Framework Powered by Edge Computing | Yuqiang Lin et.al. | 2511.13904 | null |
| 2025-11-17 | A Trajectory-free Crash Detection Framework with Generative Approach and Segment Map Diffusion | Weiying Shen et.al. | 2511.13795 | null |
| 2025-11-17 | PlugTrack: Multi-Perceptive Motion Analysis for Adaptive Fusion in Multi-Object Tracking | Seungjae Kim et.al. | 2511.13105 | null |
| 2025-11-14 | SOTFormer: A Minimal Transformer for Unified Object Tracking and Trajectory Prediction | Zhongping Dong et.al. | 2511.11824 | null |
| 2025-11-14 | Rethinking Progression of Memory State in Robotic Manipulation: An Object-Centric Perspective | Nhat Chung et.al. | 2511.11478 | null |
| 2025-11-01 | Cognitively-Inspired Episodic Memory Architectures for Accurate and Efficient Character AI | Rafael Arias Gonzalez et.al. | 2511.10652 | null |
| 2025-11-13 | FastGraph: Optimized GPU-Enabled Algorithms for Fast Graph Building and Message Passing | Aarush Agarwal et.al. | 2511.10442 | null |
| 2025-11-12 | Hand Held Multi-Object Tracking Dataset in American Football | Rintaro Otsubo et.al. | 2511.09455 | null |
| 2025-11-12 | Color Multiset Codes based on Sunmao Construction | Wing Shing Wong et.al. | 2511.09070 | null |
| 2025-11-06 | A Multi-Drone Multi-View Dataset and Deep Learning Framework for Pedestrian Detection and Tracking | Kosta Dakic et.al. | 2511.08615 | null |
| 2025-11-06 | In-process 3D Deviation Mapping and Defect Monitoring (3D-DM2) in High Production-rate Robotic Additive Manufacturing | Subash Gautam et.al. | 2511.05604 | null |
| 2025-10-27 | An uncertainty-aware physics-informed neural network solution for the Black-Scholes equation: a novel framework for option pricing | Sina Kazemian et.al. | 2511.05519 | null |
| 2025-11-06 | GentleHumanoid: Learning Upper-body Compliance for Contact-rich Human and Object Interaction | Qingzhou Lu et.al. | 2511.04679 | null |
| 2025-11-06 | Tracking and Understanding Object Transformations | Yihong Sun et.al. | 2511.04678 | null |
| 2025-11-06 | MusRec: Zero-Shot Text-to-Music Editing via Rectified Flow and Diffusion Transformers | Ali Boudaghi et.al. | 2511.04376 | null |
| 2025-11-06 | AdversariaLLM: A Unified and Modular Toolbox for LLM Robustness Research | Tim Beyer et.al. | 2511.04316 | null |
| 2025-11-06 | Measuring economic outlook in the news timely and efficiently | Elliot Beck et.al. | 2511.04299 | null |
| 2025-11-06 | BFM-Zero: A Promptable Behavioral Foundation Model for Humanoid Control Using Unsupervised Reinforcement Learning | Yitang Li et.al. | 2511.04131 | null |
| 2025-11-06 | DMSORT: An efficient parallel maritime multi-object tracking architecture for unmanned vessel platforms | Shengyu Tang et.al. | 2511.04128 | null |
| 2025-11-05 | Multi-Object Tracking Retrieval with LLaVA-Video: A Training-Free Solution to MOT25-StAG Challenge | Yi Yang et.al. | 2511.03332 | null |
| 2025-11-05 | Consensus Tracking of an Underwater Vehicle Using Weighted Harmonic Mean Density | Ved Prakash Dubey et.al. | 2511.03130 | null |
| 2025-11-05 | Accelerating Physical Property Reasoning for Augmented Visual Cognition | Hongbo Lan et.al. | 2511.03126 | null |
| 2025-11-04 | Densemarks: Learning Canonical Embeddings for Human Heads Images via Point Tracks | Dmitrii Pozdeev et.al. | 2511.02830 | null |
| 2025-11-04 | Short Blocks, Fast Sensing: Finite Blocklength Tradeoffs in RIS-Assisted ISAC | Adam Umra et.al. | 2511.02673 | null |
| 2025-11-04 | Zero-Shot Multi-Animal Tracking in the Wild | Jan Frederik Meier et.al. | 2511.02591 | null |
| 2025-11-04 | Object Detection as an Optional Basis: A Graph Matching Network for Cross-View UAV Localization | Tao Liu et.al. | 2511.02489 | link |
| 2025-11-04 | DL-Based Beam Management for mmWave Vehicular Networks Exploring Temporal Correlation | Ailton Oliveira et.al. | 2511.02260 | null |
| 2025-11-04 | Pinpointing Trigger Moment for Grounded Video QA: Enhancing Spatio-temporal Grounding in Multimodal Large Language Models | Jinhwan Seo et.al. | 2511.02182 | null |
| 2025-11-04 | ScenicProver: A Framework for Compositional Probabilistic Verification of Learning-Enabled Systems | Eric Vin et.al. | 2511.02164 | null |
| 2025-11-02 | Autonomous Vehicle front steering control computation saving | Julián Salt Llobregat et.al. | 2511.01936 | null |
| 2025-11-03 | UniLION: Towards Unified Autonomous Driving Model with Linear Group RNNs | Zhe Liu et.al. | 2511.01768 | null |
| 2025-11-03 | UniSOT: A Unified Framework for Multi-Modality Single Object Tracking | Yinchao Ma et.al. | 2511.01427 | null |
| 2025-11-03 | Risk Aware Safe Control with Cooperative Sensing for Dynamic Obstacle Avoidance | Pei Yu Chang et.al. | 2511.01403 | null |
| 2025-11-03 | EREBUS: End-to-end Robust Event Based Underwater Simulation | Hitesh Kyatham et.al. | 2511.01381 | null |
| 2025-11-03 | QiMeng-NeuComBack: Self-Evolving Translation from IR to Assembly Code | Hainan Fang et.al. | 2511.01183 | null |
| 2025-11-03 | Web-Scale Collection of Video Data for 4D Animal Reconstruction | Brian Nlong Zhao et.al. | 2511.01169 | null |
| 2025-11-02 | Heuristic Step Planning for Learning Dynamic Bipedal Locomotion: A Comparative Study of Model-Based and Model-Free Approaches | William Suliman et.al. | 2511.00840 | null |
| 2025-11-02 | Active Thinking Model: A Goal-Directed Self-Improving Framework for Real-World Adaptive Intelligence | Hong Su et.al. | 2511.00758 | null |
| 2025-11-01 | RNN-based linear parameter varying adaptive model predictive control for autonomous driving | Yassine Kebbati et.al. | 2511.00610 | null |
| 2025-11-01 | OmniTrack++: Omnidirectional Multi-Object Tracking by Learning Large-FoV Trajectory Feedback | Kai Luo et.al. | 2511.00510 | null |
| 2025-11-01 | Large Language Models for Control | Adil Rasheed et.al. | 2511.00337 | null |
| 2025-10-31 | X-TRACK: Physics-Aware xLSTM for Realistic Vehicle Trajectory Prediction | Aanchal Rajesh Chugh et.al. | 2511.00266 | null |
| 2025-10-31 | A Modular and Scalable System Architecture for Heterogeneous UAV Swarms Using ROS 2 and PX4-Autopilot | Robert Pommeranz et.al. | 2510.27327 | null |
| 2025-10-31 | Mask-to-Height: A YOLOv11-Based Architecture for Joint Building Instance Segmentation and Height Classification from Satellite Imagery | Mahmoud El Hussieni et.al. | 2510.27224 | null |
| 2025-10-31 | Joint Visible Light and Backscatter Communications for Proximity-Based Indoor Asset Tracking Enabled by Energy-Neutral Devices | Boxuan Xie et.al. | 2510.27217 | null |
| 2025-10-30 | Theoretical models for the Late Thermal Pulse in post-AGB stars: the case of DY Cen | Zhongyang Liu et.al. | 2510.26250 | null |
| 2025-10-30 | Accelerating Real-World Overtaking in F1TENTH Racing Employing Reinforcement Learning Methods | Emily Steiner et.al. | 2510.26040 | null |
| 2025-10-30 | Engineering Social Optimality via Utility Shaping in Non-Cooperative Games under Incomplete Information and Imperfect Monitoring | David Smith et.al. | 2510.26033 | null |
| 2025-10-29 | RADRON: Cooperative Localization of Ionizing Radiation Sources by MAVs with Compton Cameras | Petr Stibinger et.al. | 2510.26018 | null |
| 2025-10-29 | Ensuring Outcome-Based Curriculum Coherence through Systematic CLO-PLO Alignment and Feedback Loops | Moncef Derouich et.al. | 2510.25905 | null |
| 2025-10-29 | Cross-correlating Astrometric and Timing Residuals to Constrain Stochastic Gravitational-Wave Backgrounds | Elias Fink et.al. | 2510.25646 | null |
| 2025-10-29 | TempoPFN: Synthetic Pre-training of Linear RNNs for Zero-shot Time Series Forecasting | Vladyslav Moroshan et.al. | 2510.25502 | null |
| 2025-10-29 | What Challenges Do Developers Face in AI Agent Systems? An Empirical Study on Stack Overflow | Ali Asgari et.al. | 2510.25423 | null |
| 2025-10-29 | Tackling the Algorithmic Control Crisis – the Technical, Legal, and Ethical Challenges of Research into Algorithmic Agents | B. Bodo et.al. | 2510.25337 | null |
| 2025-10-29 | Data-Enabled Predictive Control and Guidance for Autonomous Underwater Vehicles | Sebastian Zieglmeier et.al. | 2510.25309 | null |
| 2025-10-28 | Efficient License Plate Recognition via Pseudo-Labeled Supervision with Grounding DINO and YOLOv8 | Zahra Ebrahimi Vargoorani et.al. | 2510.25032 | null |
| 2025-10-28 | Hybrid Liquid Neural Network-Random Finite Set Filtering for Robust Maneuvering Object Tracking | Minti Liu et.al. | 2510.25020 | null |
| 2025-10-28 | Micro-Doppler signatures and object characterisation of space debris with radio telescopes | Guifré Molera Calvés et.al. | 2510.25004 | null |
| 2025-10-28 | A Hamilton-Jacobi Reachability Framework with Soft Constraints for Safety-Critical Systems | Chams Eddine Mballo et.al. | 2510.24933 | null |
| 2025-10-28 | Delay Tolerant Control for Autonomous Driving Using CDOB | Xincheng Cao et.al. | 2510.24898 | null |
| 2025-10-28 | What Does It Take? Developing a Smartphone App that Motivates Older Adults to be Physically Active | Sabrina Haque et.al. | 2510.24638 | null |
| 2025-10-28 | An Adaptive Inspection Planning Approach Towards Routine Monitoring in Uncertain Environments | Vignesh Kottayam Viswanathan et.al. | 2510.24554 | null |
| 2025-10-28 | A Hybrid Approach for Visual Multi-Object Tracking | Toan Van Nguyen et.al. | 2510.24410 | null |
| 2025-10-28 | GenTrack: A New Generation of Multi-Object Tracking | Toan Van Nguyen et.al. | 2510.24399 | null |
| 2025-10-28 | Distributed Stochastic Momentum Tracking with Local Updates: Achieving Optimal Communication and Iteration Complexities | Kun Huang et.al. | 2510.24155 | null |
| 2025-10-28 | OneCast: Structured Decomposition and Modular Generation for Cross-Domain Time Series Forecasting | Tingyue Pan et.al. | 2510.24028 | null |
| 2025-10-28 | The Sign Estimator: LLM Alignment in the Face of Choice Heterogeneity | Ali Aouad et.al. | 2510.23965 | null |
| 2025-10-27 | dynsight: an Open Python Platform for Simulation and Experimental Trajectory Data Analysis | Simone Martino et.al. | 2510.23493 | null |
| 2025-10-27 | PlanarTrack: A high-quality and challenging benchmark for large-scale planar object tracking | Yifan Jiao et.al. | 2510.23368 | null |
| 2025-10-27 | Intelligent Multimodal Multi-Sensor Fusion-Based UAV Identification, Localization, and Countermeasures for Safeguarding Low-Altitude Economy | Yi Tao et.al. | 2510.22947 | null |
| 2025-10-26 | Robust Atypical Mitosis Classification with DenseNet121: Stain-Aware Augmentation and Hybrid Loss for Domain Generalization | Adinath Dukre et.al. | 2510.22630 | null |
| 2025-10-26 | RoGER-SLAM: A Robust Gaussian Splatting SLAM System for Noisy and Low-light Environment Resilience | Huilin Yin et.al. | 2510.22600 | null |
| 2025-10-25 | Genetic Optimization of a Software-Defined GNSS Receiver | Laura Train et.al. | 2510.22417 | null |
| 2025-10-25 | Experimental Demonstration of Multi-Object Tracking in Integrated Sensing and Communication | Maximilian Bauhofer et.al. | 2510.22180 | null |
| 2025-10-24 | GRAP-MOT: Unsupervised Graph-based Position Weighted Person Multi-camera Multi-object Tracking in a Highly Congested Space | Marek Socha et.al. | 2510.21482 | null |
| 2025-10-23 | Video Prediction of Dynamic Physical Simulations With Pixel-Space Spatiotemporal Transformers | Dean L Slack et.al. | 2510.20807 | null |
| 2025-10-23 | Radar-Camera Fused Multi-Object Tracking: Online Calibration and Common Feature | Lei Cheng et.al. | 2510.20794 | null |
| 2025-10-22 | FutrTrack: A Camera-LiDAR Fusion Transformer for 3D Multiple Object Tracking | Martha Teiko Teye et.al. | 2510.19981 | null |
| 2025-10-22 | HAD: Hierarchical Asymmetric Distillation to Bridge Spatio-Temporal Gaps in Event-Based Object Tracking | Yao Deng et.al. | 2510.19560 | null |
| 2025-10-21 | UniHPR: Unified Human Pose Representation via Singular Value Contrastive Learning | Zhongyu Jiang et.al. | 2510.19078 | null |
| 2025-10-21 | Practical Noise Mitigation for Quantum Annealing via Dynamical Decoupling – Towards Industry-Relevant Optimization using Trapped Ions | Sebastian Nagies et.al. | 2510.19073 | null |
| 2025-10-15 | DMTrack: Deformable State-Space Modeling for UAV Multi-Object Tracking with Kalman Fusion and Uncertainty-Aware Association | Zenghuang Fu et.al. | 2510.17860 | null |
| 2025-10-20 | Monitoring Horses in Stalls: From Object to Event Detection | Dmitrii Galimzianov et.al. | 2510.17409 | null |
| 2025-10-17 | Symmetric Entropy-Constrained Video Coding for Machines | Yuxiao Sun et.al. | 2510.15347 | null |
| 2025-10-07 | GAZE: Governance-Aware pre-annotation for Zero-shot World Model Environments | Leela Krishna et.al. | 2510.14992 | null |
| 2025-10-15 | EPIPTrack: Rethinking Prompt Modeling with Explicit and Implicit Prompts for Multi-Object Tracking | Yukuan Zhang et.al. | 2510.13235 | null |
| 2025-10-14 | SVAG-Bench: A Large-Scale Benchmark for Multi-Instance Spatio-temporal Video Action Grounding | Tanveer Hannan et.al. | 2510.13016 | null |
| 2025-10-14 | MMOT: The First Challenging Benchmark for Drone-based Multispectral Multi-Object Tracking | Tianhao Li et.al. | 2510.12565 | null |
| 2025-10-10 | LTGS: Long-Term Gaussian Scene Chronology From Sparse View Updates | Minkwan Kim et.al. | 2510.09881 | null |
| 2025-10-10 | Fast Self-Supervised depth and mask aware Association for Multi-Object Tracking | Milad Khanchi et.al. | 2510.09878 | null |
| 2025-10-10 | GL-DT: Multi-UAV Detection and Tracking with Global-Local Integration | Juanqin Liu et.al. | 2510.09092 | null |
| 2025-10-08 | MSITrack: A Challenging Benchmark for Multispectral Single Object Tracking | Tao Feng et.al. | 2510.06619 | null |
| 2025-10-06 | ResMimic: From General Motion Tracking to Humanoid Whole-body Loco-Manipulation via Residual Learning | Siheng Zhao et.al. | 2510.05070 | null |
| 2025-09-30 | CHAI: Command Hijacking against embodied AI | Luis Burbano et.al. | 2510.00181 | null |
| 2025-09-29 | Infrastructure Sensor-enabled Vehicle Data Generation using Multi-Sensor Fusion for Proactive Safety Applications at Work Zone | Suhala Rabab Saba et.al. | 2509.25452 | null |
| 2025-09-29 | Collaborating Vision, Depth, and Thermal Signals for Multi-Modal Tracking: Dataset and Algorithm | Xue-Feng Zhu et.al. | 2509.24741 | null |
| 2025-09-26 | Motion-Aware Transformer for Multi-Object Tracking | Xu Yang et.al. | 2509.21715 | null |
| 2025-09-23 | Investigating Traffic Accident Detection Using Multimodal Large Language Models | Ilhan Skender et.al. | 2509.19096 | null |
| 2025-09-22 | An Analysis of Kalman Filter based Object Tracking Methods for Fast-Moving Tiny Objects | Prithvi Raj Singh et.al. | 2509.18451 | null |
| 2025-09-22 | StereoFoley: Object-Aware Stereo Audio Generation from Video | Tornike Karchkhadze et.al. | 2509.18272 | null |
| 2025-09-22 | DepTR-MOT: Unveiling the Potential of Depth-Informed Trajectory Refinement for Multi-Object Tracking | Buyin Deng et.al. | 2509.17323 | null |
| 2025-09-20 | Lattice Boltzmann Model for Learning Real-World Pixel Dynamicity | Guangze Zheng et.al. | 2509.16527 | null |
| 2025-09-17 | StableTracker: Learning to Stably Track Target via Differentiable Simulation | Fanxing Li et.al. | 2509.14147 | null |
| 2025-09-17 | VSE-MOT: Multi-Object Tracking in Low-Quality Video Scenes Guided by Visual Semantic Enhancement | Jun Du et.al. | 2509.14060 | null |
| 2025-09-17 | Distractor-Aware Memory-Based Visual Object Tracking | Jovana Videnovic et.al. | 2509.13864 | null |
| 2025-09-16 | Real-Time Detection and Tracking of Foreign Object Intrusions in Power Systems via Feature-Based Edge Intelligence | Xinan Wang et.al. | 2509.13396 | null |
| 2025-09-16 | MATTER: Multiscale Attention for Registration Error Regression | Shipeng Liu et.al. | 2509.12924 | null |
| 2025-09-16 | T-SiamTPN: Temporal Siamese Transformer Pyramid Networks for Robust and Efficient UAV Tracking | Hojat Ardi et.al. | 2509.12913 | null |
| 2025-09-15 | Multi-animal tracking in Transition: Comparative Insights into Established and Emerging Methods | Anne Marthe Sophie Ngo Bibinbe et.al. | 2509.11873 | null |
| 2025-09-15 | Seg2Track-SAM2: SAM2-based Multi-object Tracking and Segmentation for Zero-shot Generalization | Diogo Mendonça et.al. | 2509.11772 | link |
| 2025-09-14 | Beyond Frame-wise Tracking: A Trajectory-based Paradigm for Efficient Point Cloud Tracking | BaiChen Fan et.al. | 2509.11453 | null |
| 2025-09-14 | Motion Estimation for Multi-Object Tracking using KalmanNet with Semantic-Independent Encoding | Jian Song et.al. | 2509.11323 | null |
| 2025-09-12 | ISTASTrack: Bridging ANN and SNN via ISTA Adapter for RGB-Event Tracking | Siying Liu et.al. | 2509.09977 | null |
| 2025-09-12 | An HMM-based framework for identity-aware long-term multi-object tracking from sparse and uncertain identification: use case on long-term tracking in livestock | Anne Marthe Sophie Ngo Bibinbe et.al. | 2509.09962 | null |
| 2025-09-11 | Classification of Driver Behaviour Using External Observation Techniques for Autonomous Vehicles | Ian Nell et.al. | 2509.09349 | null |
| 2025-09-10 | Sparse BEV Fusion with Self-View Consistency for Multi-View Detection and Tracking | Keisuke Toida et.al. | 2509.08421 | null |
| 2025-09-10 | Hyperspectral Mamba for Hyperspectral Object Tracking | Long Gao et.al. | 2509.08265 | null |
| 2025-09-08 | Benchmarking EfficientTAM on FMO datasets | Senem Aktas et.al. | 2509.06536 | null |
| 2025-09-03 | Multi-Sensor Fusion for Extended Object Tracking Exploiting Active and Passive Radio Signals | Hong Zhu et.al. | 2509.03686 | null |
| 2025-09-03 | DeepSea MOT: A benchmark dataset for multi-object tracking on deep-sea video | Kevin Barnard et.al. | 2509.03499 | null |
| 2025-09-02 | ADVMEM: Adversarial Memory Initialization for Realistic Test-Time Adaptation via Tracklet-Based Benchmarking | Shyma Alhuwaider et.al. | 2509.02182 | null |
| 2025-09-02 | NOOUGAT: Towards Unified Online and Offline Multi-Object Tracking | Benjamin Missaoui et.al. | 2509.02111 | null |
| 2025-09-02 | See No Evil: Adversarial Attacks Against Linguistic-Visual Association in Referring Multi-Object Tracking Systems | Halima Bouzidi et.al. | 2509.02028 | null |
| 2025-09-01 | Content-Aware Foveated Camera for Multi-Target Tracking | Zihan Zang et.al. | 2509.01165 | null |
| 2025-09-01 | MVTrajecter: Multi-View Pedestrian Tracking with Trajectory Motion Cost and Trajectory Appearance Cost | Taiga Yamane et.al. | 2509.01157 | null |
| 2025-08-26 | Safe Navigation under State Uncertainty: Online Adaptation for Robust Control Barrier Functions | Ersin Das et.al. | 2508.19159 | null |
| 2025-08-20 | 6-DoF Object Tracking with Event-based Optical Flow and Frames | Zhichao Li et.al. | 2508.14776 | null |
| 2025-08-20 | SMTrack: End-to-End Trained Spiking Neural Networks for Multi-Object Tracking in RGB Videos | Pengzhi Zhong et.al. | 2508.14607 | null |
| 2025-08-20 | FastTracker: Real-Time and Accurate Visual Tracking | Hamidreza Hashempoor et.al. | 2508.14370 | link |
| 2025-08-19 | Model-based Multi-object Visual Tracking: Identification and Standard Model Limitations | Jan Krejčí et.al. | 2508.13647 | null |
| 2025-08-19 | Bridging the Gap: Doubles Badminton Analysis with Singles-Trained Models | Seungheon Baek et.al. | 2508.13507 | null |
| 2025-08-18 | Omni Survey for Multimodality Analysis in Visual Object Tracking | Zhangyong Tang et.al. | 2508.13000 | null |
| 2025-08-18 | Revisiting Functional Derivatives in Multi-object Tracking | Jan Krejčí et.al. | 2508.12982 | null |
| 2025-08-18 | SocialTrack: Multi-Object Tracking in Complex Urban Traffic Scenes Inspired by Social Behavior | Wenguang Tao et.al. | 2508.12777 | null |
| 2025-08-15 | Multi-State Tracker: Enhancing Efficient Object Tracking via Multi-State Specialization and Interaction | Shilei Wang et.al. | 2508.11531 | null |
| 2025-08-15 | Delving into Dynamic Scene Cue-Consistency for Robust 3D Multi-Object Tracking | Haonan Zhang et.al. | 2508.11323 | null |
| 2025-08-14 | Serial Over Parallel: Learning Continual Unification for Multi-Modal Visual Object Tracking and Benchmarking | Zhangyong Tang et.al. | 2508.10655 | null |
| 2025-08-14 | SpaRC-AD: A Baseline for Radar-Camera Fusion in End-to-End Autonomous Driving | Philipp Wolters et.al. | 2508.10567 | null |
| 2025-08-14 | STRIDE-QA: Visual Question Answering Dataset for Spatiotemporal Reasoning in Urban Driving Scenes | Keishi Ishihara et.al. | 2508.10427 | null |
| 2025-08-13 | MeMoSORT: Memory-Assisted Filtering and Motion-Adaptive Association Metric for Multi-Person Tracking | Yingjie Wang et.al. | 2508.09796 | null |
| 2025-08-13 | Offline Auto Labeling: BAAS | Stefan Haag et.al. | 2508.09585 | null |
| 2025-08-13 | SOI is the Root of All Evil: Quantifying and Breaking Similar Object Interference in Single Object Tracking | Yipei Wang et.al. | 2508.09524 | null |
| 2025-08-11 | SAGOnline: Segment Any Gaussians Online | Wentao Sun et.al. | 2508.08219 | null |
| 2025-08-11 | GRASPTrack: Geometry-Reasoned Association via Segmentation and Projection for Multi-Object Tracking | Xudong Han et.al. | 2508.08117 | null |
| 2025-08-10 | SUIT: Spatial-Spectral Union-Intersection Interaction Network for Hyperspectral Object Tracking | Fengchao Xiong et.al. | 2508.07250 | null |
| 2025-08-07 | MOSEv2: A More Challenging Dataset for Video Object Segmentation in Complex Scenes | Henghui Ding et.al. | 2508.05630 | link |
| 2025-08-07 | Head Anchor Enhanced Detection and Association for Crowded Pedestrian Tracking | Zewei Wu et.al. | 2508.05514 | null |
| 2025-08-07 | Multi-tracklet Tracking for Generic Targets with Adaptive Detection Clustering | Zewei Wu et.al. | 2508.05172 | null |
| 2025-08-05 | Constraint-Preserving Data Generation for Visuomotor Policy Learning | Kevin Lin et.al. | 2508.03944 | null |
| 2025-08-04 | Perception of dynamic multi-speaker auditory scenes under different modes of attention | Stephanie Graceffo et.al. | 2508.02620 | null |
| 2025-08-04 | QuaDreamer: Controllable Panoramic Video Generation for Quadruped Robots | Sheng Wu et.al. | 2508.02512 | null |
| 2025-08-04 | VDEGaussian: Video Diffusion Enhanced 4D Gaussian Splatting for Dynamic Urban Scenes Modeling | Yuru Xiao et.al. | 2508.02129 | null |
| 2025-08-04 | YOLOv1 to YOLOv11: A Comprehensive Survey of Real-Time Object Detection Innovations and Challenges | Manikanta Kotthapalli et.al. | 2508.02067 | null |
| 2025-08-03 | SoccerTrack v2: A Full-Pitch Multi-View Soccer Dataset for Game State Reconstruction | Atom Scott et.al. | 2508.01802 | null |
| 2025-08-03 | Vision transformer-based multi-camera multi-object tracking framework for dairy cow monitoring | Kumail Abbas et.al. | 2508.01752 | null |
| 2025-08-03 | Tracking the Unstable: Appearance-Guided Motion Modeling for Robust Multi-Object Tracking in UAV-Captured Videos | Jianbo Ma et.al. | 2508.01730 | null |
| 2025-08-03 | Lessons Learned from the Real-World Deployment of Multi-Sensor Fusion for Proactive Work Zone Safety Application | Minhaj Uddin Ahmad et.al. | 2508.01599 | null |
| 2025-08-01 | Stable at Any Speed: Speed-Driven Multi-Object Tracking with Learnable Kalman Filtering | Yan Gong et.al. | 2508.00358 | null |
| 2025-08-01 | Towards Robust Semantic Correspondence: A Benchmark and Insights | Wenyue Chong et.al. | 2508.00272 | null |
| 2025-07-31 | CST Anti-UAV: A Thermal Infrared Benchmark for Tiny UAV Tracking in Complex Scenes | Bin Xie et.al. | 2507.23473 | null |
| 2025-07-31 | A Deep Dive into Generic Object Tracking: A Survey | Fereshteh Aghaee Meibodi et.al. | 2507.23251 | null |
| 2025-07-30 | Efficient Spatial-Temporal Modeling for Real-Time Video Analysis: A Unified Framework for Action Recognition and Object Tracking | Shahla John et.al. | 2507.22421 | null |
| 2025-07-29 | SAMITE: Position Prompted SAM2 with Calibrated Memory for Visual Object Tracking | Qianxiong Xu et.al. | 2507.21732 | null |
| 2025-07-29 | An Angular-Temporal Interaction Network for Light Field Object Tracking in Low-Light Scenes | Mianzhao Wang et.al. | 2507.21460 | null |
| 2025-07-29 | InSituTale: Enhancing Augmented Data Storytelling with Physical Objects | Kentaro Takahira et.al. | 2507.21411 | null |
| 2025-07-26 | TrackAny3D: Transferring Pretrained 3D Models for Category-unified 3D Point Cloud Tracking | Mengmeng Wang et.al. | 2507.19908 | null |
| 2025-07-25 | CoopTrack: Exploring End-to-End Learning for Efficient Cooperative Sequential Perception | Jiaru Zhong et.al. | 2507.19239 | null |
| 2025-07-25 | HQ-SMem: Video Segmentation and Tracking Using Memory Efficient Object Embedding With Selective Update and Self-Supervised Distillation Feedback | Elham Soltani Kazemi et.al. | 2507.18921 | null |
| 2025-07-24 | DRWKV: Focusing on Object Edges for Low-Light Image Enhancement | Xuecheng Bai et.al. | 2507.18594 | null |
| 2025-07-23 | CHAMP: A Configurable, Hot-Swappable Edge Architecture for Adaptive Biometric Tasks | Joel Brogan et.al. | 2507.17793 | null |
| 2025-07-22 | Benchmarking pig detection and tracking under diverse and challenging conditions | Jonathan Henrich et.al. | 2507.16639 | null |
| 2025-07-21 | Is Tracking really more challenging in First Person Egocentric Vision? | Matteo Dunnhofer et.al. | 2507.16015 | null |
| 2025-07-19 | Depthwise-Dilated Convolutional Adapters for Medical Object Tracking and Segmentation Using the Segment Anything Model 2 | Guoping Xu et.al. | 2507.14613 | link |
| 2025-07-18 | GOSPA and T-GOSPA quasi-metrics for evaluation of multi-object tracking algorithms | Ángel F. García-Fernández et.al. | 2507.13706 | null |
| 2025-07-17 | MVA 2025 Small Multi-Object Tracking for Spotting Birds Challenge: Dataset, Methods, and Results | Yuki Kondo et.al. | 2507.12832 | null |
| 2025-07-20 | YOLOv8-SMOT: An Efficient and Robust Framework for Real-Time Small Object Tracking via Slice-Assisted Training and Adaptive Association | Xiang Yu et.al. | 2507.12087 | null |
| 2025-07-14 | OpenHuman4D: Open-Vocabulary 4D Human Parsing | Keito Suzuki et.al. | 2507.09880 | null |
| 2025-07-12 | On the Fragility of Multimodal Perception to Temporal Misalignment in Autonomous Driving | Md Hasan Shahriar et.al. | 2507.09095 | null |
| 2025-07-21 | RoundaboutHD: High-Resolution Real-World Urban Environment Benchmark for Multi-Camera Vehicle Tracking | Yuqiang Lin et.al. | 2507.08729 | link |
| 2025-07-11 | SAM2RL: Towards Reinforcement Learning Memory Control in Segment Anything Model 2 | Alen Adamyan et.al. | 2507.08548 | null |
| 2025-07-14 | HiM2SAM: Enhancing SAM2 with Hierarchical Motion Estimation and Memory Optimization towards Long-term Tracking | Ruixiang Chen et.al. | 2507.07603 | null |
| 2025-07-10 | Temporal Unlearnable Examples: Preventing Personal Video Data from Unauthorized Exploitation by Object Tracking | Qiangqiang Wu et.al. | 2507.07483 | null |
| 2025-07-08 | When Trackers Date Fish: A Benchmark and Framework for Underwater Multiple Fish Tracking | Weiran Li et.al. | 2507.06400 | link |
| 2025-07-08 | Cooperative Mapping, Localization, and Beam Management via Multi-Modal SLAM in ISAC Systems | Hang Que et.al. | 2507.05718 | null |
| 2025-07-07 | Self-Supervised Real-Time Tracking of Military Vehicles in Low-FPS UAV Footage | Markiyan Kostiv et.al. | 2507.05229 | null |
| 2025-07-15 | Robustifying 3D Perception via Least-Squares Graphs for Multi-Agent Object Tracking | Maria Damanaki et.al. | 2507.04762 | null |
| 2025-07-05 | Integrated Gaussian Processes for Robust and Adaptive Multi-Object Tracking | Fred Lydeard et.al. | 2507.04116 | null |
| 2025-07-03 | CrowdTrack: A Benchmark for Difficult Multiple Pedestrian Tracking in Real Scenarios | Teng Fu et.al. | 2507.02479 | null |
| 2025-07-03 | A Novel Tuning Method for Real-time Multiple-Object Tracking Utilizing Thermal Sensor with Complexity Motion Pattern | Duong Nguyen-Ngoc Tran et.al. | 2507.02408 | null |
| 2025-07-03 | PLOT: Pseudo-Labeling via Video Object Tracking for Scalable Monocular 3D Object Detection | Seokyeong Lee et.al. | 2507.02393 | null |
| 2025-07-02 | Real-Time Emergency Vehicle Siren Detection with Efficient CNNs on Embedded Hardware | Marco Giordano et.al. | 2507.01563 | null |
| 2025-07-02 | TrackingMiM: Efficient Mamba-in-Mamba Serialization for Real-time UAV Object Tracking | Bingxi Liu et.al. | 2507.01535 | null |
| 2025-07-01 | UMDATrack: Unified Multi-Domain Adaptive Tracking Under Adverse Weather Conditions | Siyuan Yao et.al. | 2507.00648 | null |
| 2025-06-30 | Visual and Memory Dual Adapter for Multi-Modal Object Tracking | Boyue Xu et.al. | 2506.23972 | null |
| 2025-06-30 | Mamba-FETrack V2: Revisiting State Space Model for Frame-Event based Visual Object Tracking | Shiao Wang et.al. | 2506.23783 | null |
| 2025-06-28 | Optimal Trajectory Planning for Space Object Tracking with Collision-Avoidance Constraints | Saif R. Kazi et.al. | 2506.22797 | null |
| 2025-06-27 | Improving Token-based Object Detection with Video | Abhineet Singh et.al. | 2506.22562 | null |
| 2025-07-01 | R1-Track: Direct Application of MLLMs to Visual Object Tracking via Reinforcement Learning | Biao Wang et.al. | 2506.21980 | null |
| 2025-06-24 | VideoPCDNet: Video Parsing and Prediction with Phase Correlation Networks | Noel José Rodrigues Vicente et.al. | 2506.19621 | null |
| 2025-06-24 | Trajectory Prediction in Dynamic Object Tracking: A Critical Study | Zhongping Dong et.al. | 2506.19341 | null |
| 2025-06-23 | Lightweight RGB-T Tracking with Mobile Vision Transformers | Mahdi Falaki et.al. | 2506.19154 | null |
| 2025-06-23 | USVTrack: USV-Based 4D Radar-Camera Tracking Dataset for Autonomous Driving in Inland Waterways | Shanliang Yao et.al. | 2506.18737 | null |
| 2025-06-20 | RGBTrack: Fast, Robust Depth-Free 6D Pose Estimation and Tracking | Teng Guo et.al. | 2506.17119 | link |
| 2025-06-19 | KARL: Kalman-Filter Assisted Reinforcement Learner for Dynamic Object Tracking and Grasping | Kowndinya Boyalakuntla et.al. | 2506.15945 | null |
| 2025-06-18 | Probabilistic Trajectory GOSPA: A Metric for Uncertainty-Aware Multi-Object Tracking Performance Evaluation | Yuxuan Xia et.al. | 2506.15148 | null |
| 2025-06-16 | Deep Learning-Based Multi-Object Tracking: A Comprehensive Survey from Foundations to State-of-the-Art | Momir Adžemović et.al. | 2506.13457 | null |
| 2025-06-13 | Multiple Object Tracking in Video SAR: A Benchmark and Tracking Baseline | Haoxiang Chen et.al. | 2506.12105 | null |
| 2025-06-13 | Design and Simulation of Vehicle Motion Tracking System using a Youla Controller Output Observation System | Rongfei Li et.al. | 2506.11386 | null |
| 2025-06-11 | Optimizing Cooperative Multi-Object Tracking using Graph Signal Processing | Maria Damanaki et.al. | 2506.09469 | null |
| 2025-06-10 | MOSE: A Novel Orchestration Framework for Stateful Microservice Migration at the Edge | Antonio Calagna et.al. | 2506.09159 | null |
| 2025-06-09 | SAM2Auto: Auto Annotation Using FLASH | Arash Rocky et.al. | 2506.07850 | null |
| 2025-06-05 | FRAME: Pre-Training Video Feature Representations via Anticipation and Memory | Sethuraman TV et.al. | 2506.05543 | null |
| 2025-06-04 | Contour Errors: An Ego-Centric Metric for Reliable 3D Multi-Object Tracking | Sharang Kaul et.al. | 2506.04122 | null |
| 2025-06-03 | SportMamba: Adaptive Non-Linear Multi-Object Tracking with State Space Models for Team Sports | Dheeraj Khanna et.al. | 2506.03335 | null |
| 2025-06-03 | MVTD: A Benchmark Dataset for Maritime Visual Object Tracking | Ahsan Baidar Bakht et.al. | 2506.02866 | null |
| 2025-06-02 | No Train Yet Gain: Towards Generic Multi-Object Tracking in Sports and Beyond | Tomasz Stanczyk et.al. | 2506.01373 | null |
| 2025-06-01 | Depth-Aware Scoring and Hierarchical Alignment for Multiple Object Tracking | Milad Khanchi et.al. | 2506.00774 | null |
| 2025-05-29 | Rooms from Motion: Un-posed Indoor 3D Object Detection as Localization and Mapping | Justin Lazarow et.al. | 2505.23756 | null |
| 2025-05-27 | SANSA: Unleashing the Hidden Semantics in SAM2 for Few-Shot Segmentation | Claudia Cuttano et.al. | 2505.21795 | link |
| 2025-05-27 | Fully Spiking Neural Networks for Unified Frame-Event Object Tracking | Jingjun Yang et.al. | 2505.20834 | null |
| 2025-05-26 | Video-based Direct Time Series Measurement of Along-Strike Slip on the Coseismic Surface Rupture During the 2025 Mw7.7 Myanmar Earthquake | Jianhao Gao et.al. | 2505.20494 | null |
| 2025-05-26 | ReaMOT: A Benchmark and Framework for Reasoning-based Multi-Object Tracking | Sijia Chen et.al. | 2505.20381 | link |
| 2025-05-28 | Progressive Scaling Visual Object Tracking | Jack Hong et.al. | 2505.19990 | null |
| 2025-05-24 | Distributed Expectation Propagation for Multi-Object Tracking over Sensor Networks | Qing Li et.al. | 2505.18795 | null |
| 2025-05-24 | FusionTrack: End-to-End Multi-Object Tracking in Arbitrary Multi-View Environment | Xiaohe Li et.al. | 2505.18727 | null |
| 2025-05-24 | EOTNet: Deep Memory Aided Bayesian Filter for Extended Object Tracking | Zhixing Wang et.al. | 2505.18684 | link |
| 2025-05-23 | Adapting SAM 2 for Visual Object Tracking: 1st Place Solution for MMVPR Challenge Multi-Modal Tracking | Cheng-Yen Yang et.al. | 2505.18111 | null |
| 2025-05-22 | A Framework for Multi-View Multiple Object Tracking using Single-View Multi-Object Trackers on Fish Data | Chaim Chai Elchik et.al. | 2505.17201 | null |
| 2025-05-22 | Temporal Object Captioning for Street Scene Videos from LiDAR Tracks | Vignesh Gopinathan et.al. | 2505.16594 | null |
| 2025-05-21 | Learning better representations for crowded pedestrians in offboard LiDAR-camera 3D tracking-by-detection | Shichao Li et.al. | 2505.16029 | null |
| 2025-05-21 | ViQAgent: Zero-Shot Video Question Answering via Agent with Open-Vocabulary Grounding Validation | Tony Montes et.al. | 2505.15928 | link |
| 2025-05-19 | Towards Low-Latency Event Stream-based Visual Object Tracking: A Slow-Fast Approach | Shiao Wang et.al. | 2505.12903 | link |
| 2025-05-22 | LiDAR MOT-DETR: A LiDAR-based Two-Stage Transformer for 3D Multiple Object Tracking | Martha Teiko Teye et.al. | 2505.12753 | null |
| 2025-05-19 | Diff-MM: Exploring Pre-trained Text-to-Image Generation Model for Unified Multi-modal Object Tracking | Shiyu Xuan et.al. | 2505.12606 | null |
| 2025-05-18 | DIMM: Decoupled Multi-hierarchy Kalman Filter for 3D Object Tracking | Jirong Zha et.al. | 2505.12340 | null |
| 2025-05-17 | GTR: Gaussian Splatting Tracking and Reconstruction of Unknown Objects Based on Appearance and Geometric Complexity | Takuya Ikeda et.al. | 2505.11905 | null |
| 2025-05-12 | Asynchronous Multi-Object Tracking with an Event Camera | Angus Apps et.al. | 2505.08126 | link |
| 2025-05-12 | SAEN-BGS: Energy-Efficient Spiking AutoEncoder Network for Background Subtraction | Zhixuan Zhang et.al. | 2505.07336 | null |
| 2025-05-12 | Towards Accurate State Estimation: Kalman Filter Incorporating Motion Dynamics for 3D Multi-Object Tracking | Mohamed Nagy et.al. | 2505.07254 | null |
| 2025-05-09 | CGTrack: Cascade Gating Network with Hierarchical Feature Aggregation for UAV Tracking | Weihong Li et.al. | 2505.05936 | link |
| 2025-05-08 | A Simple Detector with Frame Dynamics is a Strong Tracker | Chenxu Peng et.al. | 2505.04917 | link |
| 2025-05-07 | Edge-GPU Based Face Tracking for Face Detection and Recognition Acceleration | Asma Baobaid et.al. | 2505.04524 | null |
| 2025-05-07 | Improving Inclusivity for Emotion Recognition Based on Face Tracking | Mats Ole Ellenberg et.al. | 2505.04433 | null |
| 2025-05-11 | SMMT: Siamese Motion Mamba with Self-attention for Thermal Infrared Target Tracking | Shang Zhang et.al. | 2505.04088 | null |
| 2025-05-06 | Interactive Instance Annotation with Siamese Networks | Xiang Xu et.al. | 2505.03184 | null |
| 2025-05-02 | CAMELTrack: Context-Aware Multi-cue ExpLoitation for Online Multi-Object Tracking | Vladimir Somers et.al. | 2505.01257 | link |
| 2025-05-02 | Optimizing Indoor Farm Monitoring Efficiency Using UAV: Yield Estimation in a GNSS-Denied Cherry Tomato Greenhouse | Taewook Park et.al. | 2505.00995 | null |
| 2025-04-30 | MoSAM: Motion-Guided Segment Anything Model with Spatial-Temporal Memory Selection | Qiushi Yang et.al. | 2505.00739 | null |
| 2025-05-01 | A Robust Deep Networks based Multi-Object MultiCamera Tracking System for City Scale Traffic | Muhammad Imran Zaman et.al. | 2505.00534 | null |
| 2025-04-30 | Stereo X-ray tomography on deformed object tracking | Zhenduo Shang et.al. | 2505.00122 | null |
| 2025-04-30 | LLM-Empowered Embodied Agent for Memory-Augmented Task Planning in Household Robotics | Marc Glocker et.al. | 2504.21716 | link |
| 2025-04-30 | Enhancing Self-Supervised Fine-Grained Video Object Tracking with Dynamic Memory Prediction | Zihan Zhou et.al. | 2504.21692 | null |
| 2025-04-29 | The Mean of Multi-Object Trajectories | Tran Thien Dat Nguyen et.al. | 2504.20391 | null |
| 2025-04-28 | Improving trajectory continuity in drone-based crowd monitoring using a set of minimal-cost techniques and deep discriminative correlation filters | Bartosz Ptak et.al. | 2504.20234 | null |
| 2025-04-28 | A computer vision method to estimate ventilation rate of Atlantic salmon in sea fish farms | Lukas Folkman et.al. | 2504.19719 | null |
| 2025-04-25 | Decentralized Fusion of 3D Extended Object Tracking based on a B-Spline Shape Model | Longfei Han et.al. | 2504.18708 | null |
| 2025-04-25 | Multi-Sensor Fusion of Active and Passive Measurements for Extended Object Tracking | Hong Zhu et.al. | 2504.18301 | null |
| 2025-04-25 | PerfCam: Digital Twinning for Production Lines Using 3D Gaussian Splatting and Vision Models | Michel Gokan Khan et.al. | 2504.18165 | null |
| 2025-04-25 | S3MOT: Monocular 3D Object Tracking with Selective State Space Model | Zhuohao Yan et.al. | 2504.18068 | null |
| 2025-04-22 | SonarT165: A Large-scale Benchmark and STFTrack Framework for Acoustic Object Tracking | Yunfeng Li et.al. | 2504.15609 | null |
| 2025-04-19 | Adversarial Attack for RGB-Event based Visual Object Tracking | Qiang Chen et.al. | 2504.14423 | null |
| 2025-04-17 | Self-Supervised Pre-training with Combined Datasets for 3D Perception in Autonomous Driving | Shumin Wang et.al. | 2504.12709 | null |
| 2025-04-16 | Robust Visual Servoing under Human Supervision for Assembly Tasks | Victor Nan Fernandez-Ayala et.al. | 2504.12506 | null |
| 2025-04-13 | Intelligent driving vehicle front multi-target tracking and detection based on YOLOv5 and point cloud 3D projection | Dayong Liu et.al. | 2504.11310 | null |
| 2025-04-15 | WildLive: Near Real-time Visual Wildlife Tracking onboard UAVs | Nguyen Ngoc Dat et.al. | 2504.10165 | null |
| 2025-04-12 | PapMOT: Exploring Adversarial Patch Attack against Multiple Object Tracking | Jiahuan Long et.al. | 2504.09361 | null |
| 2025-04-12 | Text To 3D Object Generation For Scalable Room Assembly | Sonia Laguna et.al. | 2504.09328 | null |
| 2025-04-12 | ReferGPT: Towards Zero-Shot Referring Multi-Object Tracking | Tzoulio Chamiti et.al. | 2504.09195 | null |
| 2025-04-10 | GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation | Lang Lin et.al. | 2504.07962 | null |
| 2025-04-09 | Multi-Object Tracking for Collision Avoidance Using Multiple Cameras in Open RAN Networks | Jordi Serra et.al. | 2504.07163 | null |
| 2025-04-13 | VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning | Xinhao Li et.al. | 2504.06958 | null |
| 2025-04-16 | SAM2MOT: A Novel Paradigm of Multi-Object Tracking by Segmentation | Junjie Jiang et.al. | 2504.04519 | link |
| 2025-04-05 | Risk-Aware Robot Control in Dynamic Environments Using Belief Control Barrier Functions | Shaohang Han et.al. | 2504.04097 | link |
| 2025-04-04 | TQD-Track: Temporal Query Denoising for 3D Multi-Object Tracking | Shuxiao Ding et.al. | 2504.03258 | null |
| 2025-04-03 | Attention-Aware Multi-View Pedestrian Tracking | Reef Alturki et.al. | 2504.03047 | null |
| 2025-04-03 | Data-Driven Object Tracking: Integrating Modular Neural Networks into a Kalman Framework | Christian Alexander Holz et.al. | 2504.02519 | null |
| 2025-04-02 | Deep LG-Track: An Enhanced Localization-Confidence-Guided Multi-Object Tracker | Ting Meng et.al. | 2504.01457 | null |
| 2025-04-02 | COST: Contrastive One-Stage Transformer for Vision-Language Small Object Tracking | Chunhui Zhang et.al. | 2504.01321 | link |
| 2025-04-01 | IDMR: Towards Instance-Driven Precise Visual Correspondence in Multimodal Retrieval | Bangwei Liu et.al. | 2504.00954 | null |
| 2025-04-01 | Suite-IN++: A FlexiWear BodyNet Integrating Global and Local Motion Features from Apple Suite for Robust Inertial Navigation | Lan Sun et.al. | 2504.00438 | null |
| 2025-04-03 | Towards Mobile Sensing with Event Cameras on High-agility Resource-constrained Devices: A Survey | Haoyang Wang et.al. | 2503.22943 | null |
| 2025-03-28 | Hyperspectral Adapter for Object Tracking based on Hyperspectral Video | Long Gao et.al. | 2503.22199 | null |
| 2025-03-24 | TrackID3x3: A Dataset and Algorithm for Multi-Player Tracking with Identification and Pose Estimation in 3x3 Basketball Full-court Videos | Kazuhiro Yamada et.al. | 2503.18282 | null |
| 2025-03-22 | MUST: The First Dataset and Unified Framework for Multispectral UAV Single Object Tracking | Haolin Qin et.al. | 2503.17699 | link |
| 2025-03-21 | Dynamic Attention Mechanism in Spatiotemporal Memory Networks for Object Tracking | Meng Zhou et.al. | 2503.16768 | null |
| 2025-03-20 | Dynamic Point Maps: A Versatile Representation for Dynamic 3D Reconstruction | Edgar Sucar et.al. | 2503.16318 | null |
| 2025-03-17 | Real-Time Multi-Object Tracking using YOLOv8 and SORT on a SoC FPGA | Michal Danilowicz et.al. | 2503.13023 | null |
| 2025-03-17 | OptiPMB: Enhancing 3D Multi-Object Tracking with Optimized Poisson Multi-Bernoulli Filtering | Guanhua Ding et.al. | 2503.12968 | null |
| 2025-03-17 | UncTrack: Reliable Visual Object Tracking with Uncertainty-Aware Prototype Memory Network | Siyuan Yao et.al. | 2503.12888 | link |
| 2025-03-16 | History-Aware Transformation of ReID Features for Multiple Object Tracking | Ruopeng Gao et.al. | 2503.12562 | null |
| 2025-03-15 | ROS-SAM: High-Quality Interactive Segmentation for Remote Sensing Moving Object | Zhe Shan et.al. | 2503.12006 | null |
| 2025-03-14 | Cognitive Disentanglement for Referring Multi-Object Tracking | Shaofeng Liang et.al. | 2503.11496 | null |
| 2025-03-13 | 3D Extended Object Tracking based on Extruded B-Spline Side View Profiles | Longfei Han et.al. | 2503.10730 | null |
| 2025-03-18 | OVTR: End-to-End Open-Vocabulary Multiple Object Tracking with Transformer | Jinyang Li et.al. | 2503.10616 | link |
| 2025-03-13 | Target-aware Bidirectional Fusion Transformer for Aerial Object Tracking | Xinglong Sun et.al. | 2503.09951 | null |
| 2025-03-12 | How good are deep learning methods for automated road safety analysis using video data? An experimental study | Qingwu Liu et.al. | 2503.09807 | null |
| 2025-03-11 | TrackOcc: Camera-based 4D Panoptic Occupancy Tracking | Zhuoguang Chen et.al. | 2503.08471 | null |
| 2025-03-11 | HRAvatar: High-Quality and Relightable Gaussian Head Avatar | Dongbin Zhang et.al. | 2503.08224 | null |
| 2025-03-11 | Attention to Trajectory: Trajectory-Aware Open-Vocabulary Tracking | Yunhao Li et.al. | 2503.08145 | null |
| 2025-03-10 | CPAny: Couple With Any Encoder to Refer Multi-Object Tracking | Weize Li et.al. | 2503.07516 | null |
| 2025-03-06 | Omnidirectional Multi-Object Tracking | Kai Luo et.al. | 2503.04565 | link |
| 2025-03-09 | ReynoldsFlow: Exquisite Flow Estimation via Reynolds Transport Theorem | Yu-Hsi Chen et.al. | 2503.04500 | null |
| 2025-03-06 | A Modular Pipeline for 3D Object Tracking Using RGB Cameras | Lars Bredereke et.al. | 2503.04322 | link |
| 2025-03-03 | AI-Driven Relocation Tracking in Dynamic Kitchen Environments | Arash Nasr Esfahani et.al. | 2503.01547 | link |
| 2025-02-27 | ACCORD: Application Context-aware Cross-layer Optimization and Resource Design for 5G/NextG Machine-centric Applications | Azuka Chiejina et.al. | 2502.20320 | null |
| 2025-02-27 | MITracker: Multi-View Integration for Visual Object Tracking | Mengjie Xu et.al. | 2502.20111 | null |
| 2025-02-26 | Spectral-Enhanced Transformers: Leveraging Large-Scale Pretrained Models for Hyperspectral Object Tracking | Shaheer Mohamed et.al. | 2502.18748 | null |
| 2025-02-25 | UASTrack: A Unified Adaptive Selection Framework with Modality-Customization in Single Object Tracking | He Wang et.al. | 2502.18220 | null |
| 2025-02-26 | Easy-Poly: A Easy Polyhedral Framework For 3D Multi-Object Tracking | Peng Zhang et.al. | 2502.17822 | null |
| 2025-02-24 | V-HOP: Visuo-Haptic 6D Object Pose Tracking | Hongyu Li et.al. | 2502.17434 | null |
| 2025-02-24 | Enriching Physical-Virtual Interaction in AR Gaming by Tracking Identical Real Objects | Liuchuan Yu et.al. | 2502.17399 | link |
| 2025-02-24 | CRTrack: Low-Light Semi-Supervised Multi-object Tracking Based on Consistency Regularization | Zijing Zhao et.al. | 2502.16809 | null |
| 2025-02-23 | Benchmarking Online Object Trackers for Underwater Robot Position Locking Applications | Ali Safa et.al. | 2502.16569 | null |
| 2025-02-19 | MEX: Memory-efficient Approach to Referring Multi-Object Tracking | Huu-Thien Tran et.al. | 2502.13875 | null |
| 2025-02-13 | IMM-MOT: A Novel 3D Multi-object Tracking Framework with Interacting Multiple Model Filter | Xiaohong Liu et.al. | 2502.09672 | null |
| 2025-02-10 | Adaptive Perception for Unified Visual Multi-modal Object Tracking | Xiantao Hu et.al. | 2502.06583 | null |
| 2025-02-09 | Energy-Efficient Autonomous Aerial Navigation with Dynamic Vision Sensors: A Physics-Guided Neuromorphic Approach | Sourav Sanyal et.al. | 2502.05938 | null |
| 2025-02-08 | Event Stream-based Visual Object Tracking: HDETrack V2 and A High-Definition Benchmark | Shiao Wang et.al. | 2502.05574 | link |
| 2025-02-06 | OneTrack-M: A multitask approach to transformer-based MOT models | Luiz C. S. de Araujo et.al. | 2502.04478 | null |
| 2025-02-06 | RAMOTS: A Real-Time System for Aerial Multi-Object Tracking based on Deep Learning and Big Data Technology | Nhat-Tan Do et.al. | 2502.03760 | null |
| 2025-02-04 | Rethinking Vision Transformer for Object Centric Foundation Models | Manuel Traub et.al. | 2502.02763 | null |
| 2025-02-04 | INTACT: Inducing Noise Tolerance through Adversarial Curriculum Training for LiDAR-based Safety-Critical Perception and Autonomy | Nastaran Darabi et.al. | 2502.01896 | null |
| 2025-02-03 | Bayesian Approximation-Based Trajectory Prediction and Tracking with 4D Radar | Dong-In Kim et.al. | 2502.01357 | null |
| 2025-02-03 | Solgenia – A Test Vessel Toward Energy-Efficient Autonomous Water Taxi Applications | Hannes Homburger et.al. | 2502.01207 | null |
| 2025-01-28 | Overcoming Semantic Dilution in Transformer-Based Next Frame Prediction | Hy Nguyen et.al. | 2501.16753 | null |
| 2025-01-27 | Understanding Long Videos via LLM-Powered Entity Relation Graphs | Meng Chu et.al. | 2501.15953 | null |
| 2025-01-24 | Visual Localization via Semantic Structures in Autonomous Photovoltaic Power Plant Inspection | Viktor Kozák et.al. | 2501.14587 | null |
| 2025-01-23 | CSAOT: Cooperative Multi-Agent System for Active Object Tracking | Hy Nguyen et.al. | 2501.13994 | null |
| 2025-01-23 | YOLO11-JDE: Fast and Accurate Multi-Object Tracking with Self-Supervised Re-ID | Iñaki Erregue et.al. | 2501.13710 | link |
| 2025-01-22 | InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling | Yi Wang et.al. | 2501.12386 | link |
| 2025-01-21 | LASER: Lip Landmark Assisted Speaker Detection for Robustness | Le Thien Phuc Nguyen et.al. | 2501.11899 | link |
| 2025-01-31 | FaceQSORT: a Multi-Face Tracking Method based on Biometric and Appearance Features | Robert Jöchl et.al. | 2501.11741 | null |
| 2025-01-20 | PD-SORT: Occlusion-Robust Multi-Object Tracking Using Pseudo-Depth Cues | Yanchao Wang et.al. | 2501.11288 | link |
| 2025-01-17 | Spatio-temporal Graph Learning on Adaptive Mined Key Frames for High-performance Multi-Object Tracking | Futian Wang et.al. | 2501.10129 | null |
| 2025-01-13 | SST-EM: Advanced Metrics for Evaluating Semantic, Spatial and Temporal Aspects in Video Editing | Varun Biyyala et.al. | 2501.07554 | link |
| 2025-01-13 | TimberVision: A Multi-Task Dataset and Framework for Log-Component Segmentation and Tracking in Autonomous Forestry Operations | Daniel Steininger et.al. | 2501.07360 | link |
| 2025-01-13 | Robust Single Object Tracking in LiDAR Point Clouds under Adverse Weather Conditions | Xiantong Zhao et.al. | 2501.07133 | null |
| 2025-01-09 | An Empirical Study of Autoregressive Pre-training from Videos | Jathushan Rajasegaran et.al. | 2501.05453 | null |
| 2025-01-08 | Building a Mind Palace: Structuring Environment-Grounded Semantic Graphs for Effective Long Video Analysis with LLMs | Zeyi Huang et.al. | 2501.04336 | null |
| 2025-01-07 | Neuromorphic Optical Tracking and Imaging of Randomly Moving Targets through Strongly Scattering Media | Ning Zhang et.al. | 2501.03874 | null |
| 2025-01-05 | DeTrack: In-model Latent Denoising Learning for Visual Object Tracking | Xinyu Zhou et.al. | 2501.02467 | null |
| 2025-01-02 | HybridTrack: A Hybrid Approach for Robust Multi-Object Tracking | Leandro Di Bella et.al. | 2501.01275 | link |
| 2025-01-02 | Sensitivity of Room Impulse Responses in Changing Acoustic Environment | Karolina Prawda et.al. | 2501.01206 | null |
| 2025-01-01 | Less is More: Token Context-aware Learning for Object Tracking | Chenlong Xu et.al. | 2501.00758 | null |
| 2024-12-26 | SUTrack: Towards Simple and Unified Single Object Tracking | Xin Chen et.al. | 2412.19138 | link |
| 2024-12-23 | Cross-View Referring Multi-Object Tracking | Sijia Chen et.al. | 2412.17807 | link |
| 2024-12-20 | Exploiting Multimodal Spatial-temporal Patterns for Video Object Tracking | Xiantao Hu et.al. | 2412.15691 | link |
| 2024-12-19 | Scaling 4D Representations | João Carreira et.al. | 2412.15212 | null |
| 2024-12-18 | Joint Perception and Prediction for Autonomous Driving: A Survey | Lucas Dal’Col et.al. | 2412.14088 | link |
| 2024-12-18 | GraphAvatar: Compact Head Avatars with GNN-Generated 3D Gaussians | Xiaobao Wei et.al. | 2412.13983 | link |
| 2024-12-18 | MambaLCT: Boosting Tracking via Long-term Context State Space Model | Xiaohai Li et.al. | 2412.13615 | link |
| 2024-12-17 | CompactFlowNet: Efficient Real-time Optical Flow Estimation on Mobile Devices | Andrei Znobishchev et.al. | 2412.13273 | null |
| 2024-12-17 | License Plate Detection and Character Recognition Using Deep Learning and Font Evaluation | Zahra Ebrahimi Vargoorani et.al. | 2412.12572 | null |
| 2024-12-17 | Tell Me What to Track: Infusing Robust Language Guidance for Enhanced Referring Multi-Object Tracking | Wenjun Huang et.al. | 2412.12561 | null |
| 2024-12-13 | Vehicle Detection and Classification for Toll collection using YOLOv11 and Ensemble OCR | Karthik Sivakoti et.al. | 2412.12191 | null |
| 2024-12-15 | Exploring Enhanced Contextual Information for Video-Level Object Tracking | Ben Kang et.al. | 2412.11023 | link |
| 2024-12-14 | Heterogeneous Graph Transformer for Multiple Tiny Object Tracking in RGB-T Videos | Qingyu Xu et.al. | 2412.10861 | link |
| 2024-12-14 | Patch-level Sounding Object Tracking for Audio-Visual Question Answering | Zhangbin Li et.al. | 2412.10749 | null |
| 2024-12-12 | Analysis of Object Detection Models for Tiny Object in Satellite Imagery: A Dataset-Centric Approach | Kailas PS et.al. | 2412.10453 | null |
| 2024-12-13 | Visual Object Tracking across Diverse Data Modalities: A Review | Mengmeng Wang et.al. | 2412.09991 | null |
| 2024-12-12 | NormalFlow: Fast, Robust, and Accurate Contact-based Object 6DoF Pose Tracking with Vision-based Tactile Sensors | Hung-Jui Huang et.al. | 2412.09617 | link |
| 2024-12-12 | Temporal-Assisted Beamforming and Trajectory Prediction in Sensing-Enabled UAV Communications | Shengcai Zhou et.al. | 2412.09097 | null |
| 2024-12-11 | TGOSPA Metric Parameters Selection and Evaluation for Visual Multi-object Tracking | Jan Krejčí et.al. | 2412.08321 | null |
| 2024-12-11 | Post-Hoc MOTS: Exploring the Capabilities of Time-Symmetric Multi-Object Tracking | Gergely Szabó et.al. | 2412.08313 | null |
| 2024-12-11 | DTAA: A Detect, Track and Avoid Architecture for navigation in spaces with Multiple Velocity Objects | Samuel Nordström et.al. | 2412.08121 | null |
| 2024-12-10 | Balancing Shared and Task-Specific Representations: A Hybrid Approach to Depth-Aware Video Panoptic Segmentation | Kurt H. W. Stolle et.al. | 2412.07966 | link |
| 2024-12-10 | Benchmarking Vision-Based Object Tracking for USVs in Complex Maritime Environments | Muhayy Ud Din et.al. | 2412.07392 | null |
| 2024-12-10 | Optical Levitation of Arrays of Microspheres | Benjamin Siegel et.al. | 2412.07088 | null |
| 2024-12-09 | Enhanced Multi-Object Tracking Using Pose-based Virtual Markers in 3x3 Basketball | Li Yin et.al. | 2412.06258 | null |
| 2024-12-07 | Street Gaussians without 3D Object Tracker | Ruida Zhang et.al. | 2412.05548 | null |
| 2024-12-06 | HOLa: HoloLens Object Labeling | Michael Schwimmbeck et.al. | 2412.04945 | link |
| 2024-12-06 | Beyond Boxes: Mask-Guided Spatio-Temporal Feature Aggregation for Video Object Detection | Khurram Azeem Hashmi et.al. | 2412.04915 | null |
| 2024-12-04 | Distillation of Diffusion Features for Semantic Correspondence | Frank Fundel et.al. | 2412.03512 | null |
| 2024-12-03 | MVCTrack: Boosting 3D Point Cloud Tracking via Multimodal-Guided Virtual Cues | Zhaofeng Hu et.al. | 2412.02734 | link |
| 2024-12-03 | A Bidirectional Long Short Term Memory Approach for Infrastructure Health Monitoring Using On-board Vibration Response | R. R. Samani et.al. | 2412.02643 | null |
| 2024-12-03 | GSOT3D: Towards Generic 3D Single Object Tracking in the Wild | Yifan Jiao et.al. | 2412.02129 | link |
| 2024-12-02 | 6DOPE-GS: Online 6D Object Pose Estimation using Gaussian Splatting | Yufeng Jin et.al. | 2412.01543 | null |
| 2024-12-02 | A2VIS: Amodal-Aware Approach to Video Instance Segmentation | Minh Tran et.al. | 2412.01147 | null |
| 2024-12-02 | Referring Video Object Segmentation via Language-aligned Track Selection | Seongchan Kim et.al. | 2412.01136 | null |
| 2024-12-02 | Eyes on the Road: State-of-the-Art Video Question Answering Models Assessment for Traffic Monitoring Tasks | Joseph Raj Vishal et.al. | 2412.01132 | link |
| 2024-12-02 | Object Tracking in a $360^\circ$ View: A Novel Perspective on Bridging the Gap to Biomedical Advancements | Mojtaba S. Fazli et.al. | 2412.01119 | null |
| 2024-12-02 | LiDAR SLAMMOT based on Confidence-guided Data Association | Susu Fang et.al. | 2412.01041 | null |
| 2024-12-01 | BEV-SUSHI: Multi-Target Multi-Camera 3D Detection and Tracking in Bird’s-Eye View | Yizhou Wang et.al. | 2412.00692 | null |
| 2024-11-29 | Perception Test 2024: Challenge Summary and a Novel Hour-Long VideoQA Benchmark | Joseph Heyward et.al. | 2411.19941 | null |
| 2024-11-28 | HOT3D: Hand and Object Tracking in 3D from Egocentric Multi-View Videos | Prithviraj Banerjee et.al. | 2411.19167 | null |
| 2024-11-28 | Visual SLAMMOT Considering Multiple Motion Models | Peilin Tian et.al. | 2411.19134 | null |
| 2024-11-28 | CrossTracker: Robust Multi-modal 3D Multi-Object Tracking via Cross Correction | Lipeng Gu et.al. | 2411.18850 | null |
| 2024-11-27 | A comparison of extended object tracking with multi-modal sensors in indoor environment | Jiangtao Shuai et.al. | 2411.18476 | null |
| 2024-11-27 | Efficient Dynamic LiDAR Odometry for Mobile Robots with Structured Point Clouds | Jonathan Lichtenfeld et.al. | 2411.18443 | link |
| 2024-11-26 | A Distractor-Aware Memory for Visual Object Tracking with SAM2 | Jovana Videnovic et.al. | 2411.17576 | link |
| 2024-11-24 | FastTrackTr: Towards Fast Multi-Object Tracking with Transformers | Pan Liao et.al. | 2411.15811 | null |
| 2024-11-23 | How Texts Help? A Fine-grained Evaluation to Reveal the Role of Language in Vision-Language Tracking | Xuchen Li et.al. | 2411.15600 | null |
| 2024-11-23 | MambaVLT: Time-Evolving Multimodal State Space Model for Vision-Language Tracking | Xinqi Liu et.al. | 2411.15459 | null |
| 2024-11-20 | Gaze2AOI: Open Source Deep-learning Based System for Automatic Area of Interest Annotation with Eye Tracking Data | Karolina Trajkovska et.al. | 2411.13346 | null |
| 2024-11-20 | Teaching VLMs to Localize Specific Objects from In-context Examples | Sivan Doveh et.al. | 2411.13317 | link |
| 2024-11-24 | ClickTrack: Towards Real-time Interactive Single Object Tracking | Kuiran Wang et.al. | 2411.13183 | null |
| 2024-11-20 | Enhancing Thermal MOT: A Novel Box Association Method Leveraging Thermal Identity and Motion Similarity | Wassim El Ahmar et.al. | 2411.12943 | null |
| 2024-11-19 | Resolution Improvement in OFDM-based Joint Communication and Sensing through Combined Tracking and Interpolation | Charlotte Muth et.al. | 2411.12464 | null |
| 2024-11-18 | SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory | Cheng-Yen Yang et.al. | 2411.11922 | link |
| 2024-11-18 | Learning a Neural Association Network for Self-supervised Multi-Object Tracking | Shuai Li et.al. | 2411.11514 | null |
| 2024-11-15 | Real-Time AI-Driven People Tracking and Counting Using Overhead Cameras | Ishrath Ahamed et.al. | 2411.10072 | null |
| 2024-11-21 | MOT FCG++: Enhanced Representation of Spatio-temporal Motion and Appearance Features | Yanzhao Fang et.al. | 2411.10028 | null |
| 2024-11-13 | Predictive Visuo-Tactile Interactive Perception Framework for Object Properties Inference | Anirvan Dutta et.al. | 2411.09020 | null |
| 2024-11-13 | 3D Multi-Object Tracking with Semi-Supervised GRU-Kalman Filter | Xiaoxiang Wang et.al. | 2411.08433 | null |
| 2024-11-13 | DEEGITS: Deep Learning based Framework for Measuring Heterogenous Traffic State in Challenging Traffic Scenarios | Muttahirul Islam et.al. | 2411.08335 | null |
| 2024-11-12 | GTA: Global Tracklet Association for Multi-Object Tracking in Sports | Jiacheng Sun et.al. | 2411.08216 | link |
| 2024-11-11 | BuckTales: A multi-UAV dataset for multi-object tracking and re-identification of wild antelopes | Hemal Naik et.al. | 2411.06896 | null |
| 2024-11-11 | HSTrack: Bootstrap End-to-End Multi-Camera 3D Multi-object Tracking with Hybrid Supervision | Shubo Lin et.al. | 2411.06780 | null |
| 2024-11-11 | Track Any Peppers: Weakly Supervised Sweet Pepper Tracking Using VLMs | Jia Syuen Lim et.al. | 2411.06702 | null |
| 2024-11-10 | PKF: Probabilistic Data Association Kalman Filter for Multi-Object Tracking | Hanwen Cao et.al. | 2411.06378 | link |
| 2024-11-09 | Multi-object Tracking by Detection and Query: an efficient end-to-end manner | Shukun Jia et.al. | 2411.06197 | null |
| 2024-11-06 | Graph-Based Multi-Modal Sensor Fusion for Autonomous Driving | Depanshu Sani et.al. | 2411.03702 | null |
| 2024-11-04 | Enhancing Indoor Mobility with Connected Sensor Nodes: A Real-Time, Delay-Aware Cooperative Perception Approach | Minghao Ning et.al. | 2411.02624 | link |
| 2024-11-04 | SIRA: Scalable Inter-frame Relation and Association for Radar Perception | Ryoma Yataka et.al. | 2411.02220 | null |
| 2024-11-04 | Toward Integrating Semantic-aware Path Planning and Reliable Localization for UAV Operations | Thanh Nguyen Canh et.al. | 2411.01816 | null |
| 2024-11-04 | ChatTracker: Enhancing Visual Tracking Performance via Chatting with Multimodal Large Language Model | Yiming Sun et.al. | 2411.01756 | null |
| 2024-11-01 | Autobiasing Event Cameras | Mehdi Sefidgar Dilmaghani et.al. | 2411.00729 | null |
| 2024-11-01 | HopTrack: A Real-time Multi-Object Tracking System for Embedded Devices | Xiang Li et.al. | 2411.00608 | link |
| 2024-11-01 | Is Multiple Object Tracking a Matter of Specialization? | Gianluca Mancusi et.al. | 2411.00553 | null |
| 2024-10-31 | Extended Object Tracking and Classification based on Linear Splines | Matteo Tesori et.al. | 2410.24183 | null |
| 2024-10-30 | IP-MOT: Instance Prompt Learning for Cross-Domain Multi-Object Tracking | Run Luo et.al. | 2410.23907 | null |
| 2024-10-28 | Joint Audio-Visual Idling Vehicle Detection with Streamlined Input Dependencies | Xiwen Li et.al. | 2410.21170 | null |
| 2024-10-28 | Evaluating the Robustness of LiDAR Point Cloud Tracking Against Adversarial Attack | Shengjing Tian et.al. | 2410.20893 | null |
| 2024-10-27 | NT-VOT211: A Large-Scale Benchmark for Night-time Visual Object Tracking | Yu Liu et.al. | 2410.20421 | link |
| 2024-10-27 | Depth Attention for Robust RGB Tracking | Yu Liu et.al. | 2410.20395 | link |
| 2024-10-26 | SFTrack: A Robust Scale and Motion Adaptive Algorithm for Tracking Small and Fast Moving Objects | InPyo Song et.al. | 2410.20079 | null |
| 2024-10-23 | ROCKET-1: Master Open-World Interaction with Visual-Temporal Context Prompting | Shaofei Cai et.al. | 2410.17856 | link |
| 2024-10-23 | Real-time Vehicle-to-Vehicle Communication Based Network Cooperative Control System through Distributed Database and Multimodal Perception: Demonstrated in Crossroads | Xinwen Zhu et.al. | 2410.17576 | link |
| 2024-10-23 | OVT-B: A New Large-Scale Benchmark for Open-Vocabulary Multi-Object Tracking | Haiji Liang et.al. | 2410.17534 | link |
| 2024-10-22 | MPT: A Large-scale Multi-Phytoplankton Tracking Benchmark | Yang Yu et.al. | 2410.16695 | null |
| 2024-10-19 | The Solution for Single Object Tracking Task of Perception Test Challenge 2024 | Zhiqiang Zhong et.al. | 2410.16329 | null |
| 2024-10-20 | TrackMe: A Simple and Effective Multiple Object Tracking Annotation Tool | Thinh Phan et.al. | 2410.15518 | link |
| 2024-10-20 | Multiset Combinatorial Gray Codes with Application to Proximity Sensor Networks | Chung Shue Chen et.al. | 2410.15428 | null |
| 2024-10-19 | 3D Multi-Object Tracking Employing MS-GLMB Filter for Autonomous Driving | Linh Van Ma et.al. | 2410.14977 | link |
| 2024-10-18 | Enhancing In-vehicle Multiple Object Tracking Systems with Embeddable Ising Machines | Kosuke Tatsumura et.al. | 2410.14093 | null |
| 2024-10-17 | Temporal-Enhanced Multimodal Transformer for Referring Multi-Object Tracking and Segmentation | Changcheng Xiao et.al. | 2410.13437 | null |
| 2024-10-17 | TRLO: An Efficient LiDAR Odometry with 3D Dynamic Object Tracking and Removal | Yanpeng Jia et.al. | 2410.13240 | null |
| 2024-10-17 | UAV3D: A Large-scale 3D Perception Benchmark for Unmanned Aerial Vehicles | Hui Ye et.al. | 2410.11125 | null |
| 2024-10-14 | Motion-guided small MAV detection in complex and non-planar scenes | Hanqing Guo et.al. | 2410.10527 | null |
| 2024-10-14 | SMART-TRACK: A Novel Kalman Filter-Guided Sensor Fusion For Robust UAV Object Tracking in Dynamic Environments | Khaled Gabr et.al. | 2410.10409 | link |
| 2024-10-14 | DINTR: Tracking via Diffusion-based Interpolation | Pha Nguyen et.al. | 2410.10053 | null |
| 2024-10-11 | Enhanced Kalman with Adaptive Appearance Motion SORT for Grounded Generic Multiple Object Tracking | Duy Le Dinh Anh et.al. | 2410.09243 | null |
| 2024-10-11 | VideoSAM: Open-World Video Segmentation | Pinxue Guo et.al. | 2410.08781 | null |
| 2024-10-11 | Efficient Multi-Object Tracking on Edge Devices via Reconstruction-Based Channel Pruning | Jan Müller et.al. | 2410.08769 | null |
| 2024-10-11 | VOVTrack: Exploring the Potentiality in Videos for Open-Vocabulary Object Tracking | Zekun Qian et.al. | 2410.08529 | null |
| 2024-10-05 | ETHcavation: A Dataset and Pipeline for Panoptic Scene Understanding and Object Tracking in Dynamic Construction Environments | Lorenzo Terenzi et.al. | 2410.04250 | null |
| 2024-10-03 | Spatial-Temporal Multi-Cuts for Online Multiple-Camera Vehicle Tracking | Fabian Herzog et.al. | 2410.02638 | link |
| 2024-10-09 | DTVLT: A Multi-modal Diverse Text Benchmark for Visual Language Tracking Based on LLM | Xuchen Li et.al. | 2410.02492 | null |
| 2024-10-03 | Spiking Neural Network as Adaptive Event Stream Slicer | Jiahang Cao et.al. | 2410.02249 | link |
| 2024-10-10 | Tracking objects that change in appearance with phase synchrony | Sabine Muzellec et.al. | 2410.02094 | null |
| 2024-10-02 | Samba: Synchronized Set-of-Sequences Modeling for Multiple Object Tracking | Mattia Segu et.al. | 2410.01806 | null |
| 2024-10-02 | Open3DTrack: Towards Open-Vocabulary 3D Multi-Object Tracking | Ayesha Ishaq et.al. | 2410.01678 | link |
| 2024-09-29 | One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos | Zechen Bai et.al. | 2409.19603 | link |
| 2024-09-27 | Improving Visual Object Tracking through Visual Prompting | Shih-Fang Chen et.al. | 2409.18901 | link |
| 2024-09-27 | Semantic Model Component Implementation for Model-driven Semantic Communications | Haotai Liang et.al. | 2409.18704 | null |
| 2024-09-30 | An Overview of Multi-Object Estimation via Labeled Random Finite Set | Ba-Ngu Vo et.al. | 2409.18531 | null |
| 2024-09-26 | BlinkTrack: Feature Tracking over 100 FPS via Events and Images | Yichen Shen et.al. | 2409.17981 | null |
| 2024-09-26 | General Compression Framework for Efficient Transformer Object Tracking | Lingyi Hong et.al. | 2409.17564 | null |
| 2024-09-26 | CAMOT: Camera Angle-aware Multi-Object Tracking | Felix Limanta et.al. | 2409.17533 | null |
| 2024-09-25 | Walker: Self-supervised Multiple Object Tracking by Walking on Temporal Appearance Graphs | Mattia Segu et.al. | 2409.17221 | null |
| 2024-09-25 | Automated Surgical Skill Assessment in Endoscopic Pituitary Surgery using Real-time Instrument Tracking on a High-fidelity Bench-top Phantom | Adrito Das et.al. | 2409.17025 | null |
| 2024-09-25 | Towards Underwater Camouflaged Object Tracking: An Experimental Evaluation of SAM and SAM 2 | Chunhui Zhang et.al. | 2409.16902 | link |
| 2024-09-25 | Conditional Generative Denoiser for Nighttime UAV Tracking | Yucheng Wang et.al. | 2409.16834 | null |
| 2024-09-25 | Progressive Representation Learning for Real-Time UAV Tracking | Changhong Fu et.al. | 2409.16652 | link |
| 2024-09-25 | Enhancing Nighttime UAV Tracking with Light Distribution Suppression | Liangliang Yao et.al. | 2409.16631 | link |
| 2024-09-23 | MCTrack: A Unified 3D Multi-Object Tracking Framework for Autonomous Driving | Xiyang Wang et.al. | 2409.16149 | null |
| 2024-09-24 | CloudTrack: Scalable UAV Tracking with Cloud Semantics | Yannik Blei et.al. | 2409.16111 | null |
| 2024-09-22 | TrackNetV4: Enhancing Fast Sports Object Tracking with Motion Attention Maps | Arjun Raj et.al. | 2409.14543 | null |
| 2024-09-21 | Masks and Boxes: Combining the Best of Both Worlds for Multi-Object Tracking | Tomasz Stanczyk et.al. | 2409.14220 | null |
| 2024-09-18 | RockTrack: A 3D Robust Multi-Camera-Ken Multi-Object Tracking Framework | Xiaoyu Li et.al. | 2409.11749 | null |
| 2024-09-17 | SLAck: Semantic, Location, and Appearance Aware Open-Vocabulary Tracking | Siyuan Li et.al. | 2409.11235 | link |
| 2024-09-17 | STCMOT: Spatio-Temporal Cohesion Learning for UAV-Based Multiple Object Tracking | Jianbo Ma et.al. | 2409.11234 | link |
| 2024-09-17 | TrajSSL: Trajectory-Enhanced Semi-Supervised 3D Object Detection | Philip Jacobson et.al. | 2409.10901 | null |
| 2024-09-15 | Tracking Virtual Meetings in the Wild: Re-identification in Multi-Participant Virtual Meetings | Oriel Perl et.al. | 2409.09841 | null |
| 2024-09-14 | Associate Everything Detected: Facilitating Tracking-by-Detection to the Unknown | Zimeng Fang et.al. | 2409.09293 | link |
| 2024-09-12 | FACT: Feature Adaptive Continual-learning Tracker for Multiple Object Tracking | Rongzihan Song et.al. | 2409.07904 | null |
| 2024-09-10 | When to Extract ReID Features: A Selective Approach for Improved Multiple Object Tracking | Emirhan Bayar et.al. | 2409.06617 | link |
| 2024-09-08 | RCBEVDet++: Toward High-accuracy Radar-Camera Fusion 3D Perception Network | Zhiwei Lin et.al. | 2409.04979 | null |
| 2024-09-06 | LITE: A Paradigm Shift in Multi-Object Tracking with Efficient ReID Feature Integration | Jumabek Alikhanov et.al. | 2409.04187 | link |
| 2024-09-09 | Online Residual Learning from Offline Experts for Pedestrian Tracking | Anastasios Vlachos et.al. | 2409.04069 | null |
| 2024-09-05 | Gr-IoU: Ground-Intersection over Union for Robust Multi-Object Tracking with 3D Geometric Constraints | Keisuke Toida et.al. | 2409.03252 | null |
| 2024-09-04 | TP-GMOT: Tracking Generic Multiple Object by Textual Prompt with Motion-Appearance Cost (MAC) SORT | Duy Le Dinh Anh et.al. | 2409.02490 | link |
| 2024-09-01 | YOLOO: You Only Learn from Others Once | Lipeng Gu et.al. | 2409.00618 | null |
| 2024-09-10 | TrackSSM: A General Motion Predictor by State-Space Model | Bin Hu et.al. | 2409.00487 | link |
| 2024-08-31 | Fish Tracking Challenge 2024: A Multi-Object Tracking Competition with Sweetfish Schooling Data | Makoto M. Itoh et.al. | 2409.00339 | null |
| 2024-08-30 | UTrack: Multi-Object Tracking with Uncertain Detections | Edgardo Solano-Carrillo et.al. | 2408.17098 | link |
| 2024-08-29 | Mismatched: Evaluating the Limits of Image Matching Approaches and Benchmarks | Sierra Bonilla et.al. | 2408.16445 | link |
| 2024-08-29 | Estimating Dynamic Flow Features in Groups of Tracked Objects | Tanner D. Harms et.al. | 2408.16190 | null |
| 2024-08-28 | ConsistencyTrack: A Robust Multi-Object Tracker with a Generation Strategy of Consistency Model | Lifan Jiang et.al. | 2408.15548 | link |
| 2024-08-25 | Camouflaged Object Tracking: A Benchmark | Xiaoyu Guo et.al. | 2408.13877 | null |
| 2024-08-23 | MCTR: Multi Camera Tracking Transformer | Alexandru Niculescu-Mizil et.al. | 2408.13243 | null |
| 2024-08-23 | BoostTrack++: using tracklet information to detect more objects in multiple object tracking | Vukašin Stanojević et.al. | 2408.13003 | link |
| 2024-08-22 | BankTweak: Adversarial Attack against Multi-Object Trackers by Manipulating Feature Banks | Woojin Shin et.al. | 2408.12727 | null |
| 2024-08-22 | BihoT: A Large-Scale Dataset and Benchmark for Hyperspectral Camouflaged Object Tracking | Hanzheng Wang et.al. | 2408.12232 | null |
| 2024-08-21 | CHOTA: A Higher Order Accuracy Metric for Cell Tracking | Timo Kaiser et.al. | 2408.11571 | link |
| 2024-08-21 | Low-Light Object Tracking: A Benchmark | Pengzhi Zhong et.al. | 2408.11463 | null |
| 2024-08-20 | MambaEVT: Event Stream based Visual Object Tracking using State Space Model | Xiao Wang et.al. | 2408.10487 | link |
| 2024-08-17 | GSLAMOT: A Tracklet and Query Graph-based Simultaneous Locating, Mapping, and Multiple Object Tracking System | Shuo Wang et.al. | 2408.09191 | null |
| 2024-08-17 | MambaTrack: A Simple Baseline for Multiple Object Tracking with State Space Model | Changcheng Xiao et.al. | 2408.09178 | null |
| 2024-08-14 | Panacea+: Panoramic and Controllable Video Generation for Autonomous Driving | Yuqing Wen et.al. | 2408.07605 | null |
| 2024-08-14 | RTAT: A Robust Two-stage Association Tracker for Multi-Object Tracking | Song Guo et.al. | 2408.07344 | null |
| 2024-08-13 | Object Tracking Incorporating Transfer Learning into Unscented and Cubature Kalman Filters | Omar Alotaibi et.al. | 2408.07157 | null |
| 2024-08-12 | FruitNeRF: A Unified Neural Radiance Field based Fruit Counting Framework | Lukas Meyer et.al. | 2408.06190 | link |
| 2024-08-12 | Toward Pedestrian Head Tracking: A Benchmark Dataset and an Information Fusion Network | Kailai Sun et.al. | 2408.05877 | null |
| 2024-08-09 | Mesh-based Object Tracking for Dynamic Semantic 3D Scene Graphs via Ray Tracing | Lennart Niecksch et.al. | 2408.04979 | null |
| 2024-08-06 | Quantum Imaging Using Spatially Entangled Photon Pairs from a Nonlinear Metasurface | Jinyong Ma et.al. | 2408.02903 | null |
| 2024-08-05 | VoxelTrack: Exploring Voxel Representation for 3D Point Cloud Object Tracking | Yuxuan Lu et.al. | 2408.02263 | null |
| 2024-08-04 | 3D Single-object Tracking in Point Clouds with High Temporal Variation | Qiao Wu et.al. | 2408.02049 | null |
| 2024-08-03 | SiamMo: Siamese Motion-Centric 3D Object Tracking | Yuxiang Yang et.al. | 2408.01688 | link |
| 2024-08-02 | Visible-Thermal Multiple Object Tracking: Large-scale Video Dataset and Progressive Fusion Approach | Yabin Zhu et.al. | 2408.00969 | link |
| 2024-08-05 | U2UData: A Large-scale Cooperative Perception Dataset for Swarm UAVs Autonomous Flight | Tongtong Feng et.al. | 2408.00606 | null |
| 2024-08-01 | A Batch Update Using Multiplicative Noise Modelling for Extended Object Tracking | Christian Gramsch et.al. | 2408.00417 | null |
| 2024-07-30 | SharkTrack: an accurate, generalisable software for streamlining shark and ray underwater video analysis | Filippo Varini et.al. | 2407.20623 | null |
| 2024-07-29 | MEVDT: Multi-Modal Event-Based Vehicle Detection and Tracking Dataset | Zaid A. El Shair et.al. | 2407.20446 | null |
| 2024-07-28 | Progressive Domain Adaptation for Thermal Infrared Object Tracking | Qiao Li et.al. | 2407.19430 | null |
| 2024-08-05 | Leveraging Foundation Models via Knowledge Distillation in Multi-Object Tracking: Distilling DINOv2 Features to FairMOT | Niels G. Faber et.al. | 2407.18288 | null |
| 2024-07-20 | CORT: Class-Oriented Real-time Tracking for Embedded Systems | Edoardo Cittadini et.al. | 2407.17521 | null |
| 2024-07-23 | 3D-UGCN: A Unified Graph Convolutional Network for Robust 3D Human Pose Estimation from Monocular RGB Images | Jie Zhao et.al. | 2407.16137 | null |
| 2024-07-21 | Multiple Object Detection and Tracking in Panoramic Videos for Cycling Safety Analysis | Jingwei Guo et.al. | 2407.15199 | link |
| 2024-07-19 | Temporal Correlation Meets Embedding: Towards a 2nd Generation of JDE-based Real-Time Multi-Object Tracking | Yunfei Zhang et.al. | 2407.14086 | null |
| 2024-07-19 | OCTrack: Benchmarking the Open-Corpus Multi-Object Tracking | Zekun Qian et.al. | 2407.14047 | null |
| 2024-07-18 | Boosting Online 3D Multi-Object Tracking through Camera-Radar Cross Check | Sheng-Yao Kuan et.al. | 2407.13937 | null |
| 2024-07-17 | Strawberry detection and counting based on YOLOv7 pruning and information based tracking algorithm | Shiyu Liu et.al. | 2407.12614 | null |
| 2024-07-16 | VideoClusterNet: Self-Supervised and Adaptive Clustering For Videos | Devesh Walawalkar et.al. | 2407.12214 | null |
| 2024-07-15 | Effective Motion Modeling for UAV-platform Multiple Object Tracking with Re-Margin Loss | Mufeng Yao et.al. | 2407.10485 | null |
| 2024-07-16 | Lost and Found: Overcoming Detector Failures in Online Multi-Object Tracking | Lorenzo Vaquero et.al. | 2407.10151 | link |
| 2024-07-12 | DroneMOT: Drone-based Multi-Object Tracking Considering Detection Difficulties and Simultaneous Moving of Drones and Objects | Peng Wang et.al. | 2407.09051 | null |
| 2024-07-11 | Manipulating a Tetris-Inspired 3D Video Representation | Mihir Godbole et.al. | 2407.08885 | null |
| 2024-07-11 | Visual Multi-Object Tracking with Re-Identification and Occlusion Handling using Labeled Random Finite Sets | Linh Van Ma et.al. | 2407.08872 | null |
| 2024-07-11 | CommRad: Context-Aware Sensing-Driven Millimeter-Wave Networks | Ish Kumar Jain et.al. | 2407.08817 | null |
| 2024-07-10 | Deep Learning-Based Robust Multi-Object Tracking via Fusion of mmWave Radar and Camera Sensors | Lei Cheng et.al. | 2407.08049 | null |
| 2024-07-08 | GeoWATCH for Detecting Heavy Construction in Heterogeneous Time Series of Satellite Images | Jon Crall et.al. | 2407.06337 | null |
| 2024-07-07 | Addressing single object tracking in satellite imagery through prompt-engineered solutions | Athena Psalta et.al. | 2407.05518 | null |
| 2024-07-09 | P2P: Part-to-Part Motion Cues Guide a Strong Tracking Framework for LiDAR Point Clouds | Jiahao Nie et.al. | 2407.05238 | link |
| 2024-07-06 | VIPS-Odom: Visual-Inertial Odometry Tightly-coupled with Parking Slots for Autonomous Parking | Xuefeng Jiang et.al. | 2407.05017 | null |
| 2024-07-05 | TF-SASM: Training-free Spatial-aware Sparse Memory for Multi-object Tracking | Thuc Nguyen-Quang et.al. | 2407.04327 | null |
| 2024-07-08 | SSP-GNN: Learning to Track via Bilevel Optimization | Griffin Golias et.al. | 2407.04308 | null |
| 2024-07-05 | FeatureSORT: Essential Features for Effective Tracking | Hamidreza Hashempoor et.al. | 2407.04249 | null |
| 2024-07-04 | Attention Normalization Impacts Cardinality Generalization in Slot Attention | Markus Krimmel et.al. | 2407.04170 | null |
| 2024-07-04 | TrackPGD: A White-box Attack using Binary Masks against Robust Transformer Trackers | Fatemeh Nourilenjan Nokabadi et.al. | 2407.03946 | null |
| 2024-07-03 | Applying Extended Object Tracking for Self-Localization of Roadside Radar Sensors | Longfei Han et.al. | 2407.03084 | null |
| 2024-07-02 | FlowTrack: Point-level Flow Network for 3D Single Object Tracking | Shuo Li et.al. | 2407.01959 | null |
| 2024-07-02 | The Solution for the ICCV 2023 Perception Test Challenge 2023 – Task 6 – Grounded videoQA | Hailiang Zhang et.al. | 2407.01907 | null |
| 2024-06-30 | DroBoost: An Intelligent Score and Model Boosting Method for Drone Detection | Ogulcan Eryuksel et.al. | 2407.00830 | null |
| 2024-06-30 | Engineering an Efficient Object Tracker for Non-Linear Motion | Momir Adžemović et.al. | 2407.00738 | null |
| 2024-06-28 | PoliFormer: Scaling On-Policy RL with Transformers Results in Masterful Navigators | Kuo-Hao Zeng et.al. | 2406.20083 | null |
| 2024-06-28 | eMoE-Tracker: Environmental MoE-based Transformer for Robust Event-guided Object Tracking | Yucheng Chen et.al. | 2406.20024 | null |
| 2024-06-28 | StreamMOTP: Streaming and Unified Framework for Joint 3D Multi-Object Tracking and Trajectory Prediction | Jiaheng Zhuang et.al. | 2406.19844 | null |
| 2024-06-28 | Basketball-SORT: An Association Method for Complex Multi-object Occlusion Problems in Basketball Multi-object Tracking | Qingrui Hu et.al. | 2406.19655 | null |
| 2024-06-26 | BiTrack: Bidirectional Offline 3D Multi-Object Tracking Using Camera-LiDAR Data | Kemiao Huang et.al. | 2406.18414 | link |
| 2024-06-24 | POPCat: Propagation of particles for complex annotation tasks | Adam Srebrnjak Yang et.al. | 2406.17183 | null |
| 2024-06-24 | A Certifiable Algorithm for Simultaneous Shape Estimation and Object Tracking | Lorenzo Shaikewitz et.al. | 2406.16837 | link |
| 2024-06-24 | The Progression of Transformers from Language to Vision to MOT: A Literature Review on Multi-Object Tracking with Transformers | Abhi Kamboj et.al. | 2406.16784 | null |
| 2024-06-21 | LU2Net: A Lightweight Network for Real-time Underwater Image Enhancement | Haodong Yang et.al. | 2406.14973 | null |
| 2024-06-22 | Velocity Analysis of Moving Objects in Earth Observation Satellite Images Using Multi-Spectral Push Broom Scanning | Eric Keto et.al. | 2406.13710 | null |
| 2024-06-19 | Hierarchical IoU Tracking based on Interval | Yunhao Du et.al. | 2406.13271 | null |
| 2024-06-19 | Towards Robust Evaluation: A Comprehensive Taxonomy of Datasets and Metrics for Open Domain Question Answering in the Era of Large Language Models | Akchay Srivastava et.al. | 2406.13232 | null |
| 2024-06-17 | Deep HM-SORT: Enhancing Multi-Object Tracking in Sports with Deep Features, Harmonic Mean, and Expansion IOU | Matias Gran-Henriksen et.al. | 2406.12081 | null |
| 2024-06-17 | VideoVista: A Versatile Benchmark for Video Understanding and Reasoning | Yunxin Li et.al. | 2406.11303 | null |
| 2024-06-14 | Understanding Pedestrian Movement Using Urban Sensing Technologies: The Promise of Audio-based Sensors | Chaeyeon Han et.al. | 2406.09998 | null |
| 2024-06-14 | Robust compressive tracking via online weighted multiple instance learning | Sandeep Singh Sengar et.al. | 2406.09914 | null |
| 2024-06-13 | Introducing HOT3D: An Egocentric Dataset for 3D Hand and Object Tracking | Prithviraj Banerjee et.al. | 2406.09598 | null |
| 2024-06-12 | LaMOT: Language-Guided Multi-Object Tracking | Yunhao Li et.al. | 2406.08324 | link |
| 2024-06-12 | Vessel Re-identification and Activity Detection in Thermal Domain for Maritime Surveillance | Yasod Ginige et.al. | 2406.08294 | null |
| 2024-06-11 | Watching Swarm Dynamics from Above: A Framework for Advanced Object Tracking in Drone Videos | Duc Pham et.al. | 2406.07680 | null |
| 2024-06-11 | Haptic Repurposing with GenAI | Haoyu Wang et.al. | 2406.07228 | null |
| 2024-06-11 | UVIS: Unsupervised Video Instance Segmentation | Shuaiyi Huang et.al. | 2406.06908 | null |
| 2024-06-09 | ControlLoc: Physical-World Hijacking Attack on Visual Perception in Autonomous Driving | Chen Ma et.al. | 2406.05810 | null |
| 2024-06-09 | SlowPerception: Physical-World Latency Attack against Visual Perception in Autonomous Driving | Chen Ma et.al. | 2406.05800 | null |
| 2024-06-07 | Bootstrapping Referring Multi-Object Tracking | Yani Zhang et.al. | 2406.05039 | link |
| 2024-06-07 | Multi-Granularity Language-Guided Multi-Object Tracking | Yuhao Li et.al. | 2406.04844 | link |
| 2024-06-06 | Matching Anything by Segmenting Anything | Siyuan Li et.al. | 2406.04221 | link |
| 2024-06-06 | ActionReasoningBench: Reasoning about Actions with and without Ramification Constraints | Divij Handa et.al. | 2406.04046 | null |
| 2024-06-04 | UA-Track: Uncertainty-Aware End-to-End 3D Multi-Object Tracking | Lijun Zhou et.al. | 2406.02147 | null |
| 2024-06-03 | Reproducibility Study on Adversarial Attacks Against Robust Transformer Trackers | Fatemeh Nourilenjan Nokabadi et.al. | 2406.01765 | link |
| 2024-06-03 | Prototypical Transformer as Unified Motion Learners | Cheng Han et.al. | 2406.01559 | null |
| 2024-06-03 | Convolutional Unscented Kalman Filter for Multi-Object Tracking with Outliers | Shiqi Liu et.al. | 2406.01380 | null |
| 2024-06-03 | Multi-Object Tracking based on Imaging Radar 3D Object Detection | Patrick Palmer et.al. | 2406.01011 | null |
| 2024-06-01 | Learning to Approximate Particle Smoothing Trajectories via Diffusion Generative Models | Ella Tamir et.al. | 2406.00561 | null |
| 2024-06-01 | Towards Generalizable Multi-Object Tracking | Zheng Qin et.al. | 2406.00429 | link |
| 2024-05-30 | WebUOT-1M: Advancing Deep Underwater Object Tracking with A Million-Scale Benchmark | Chunhui Zhang et.al. | 2405.19818 | link |
| 2024-05-30 | FaceLift: Semi-supervised 3D Facial Landmark Localization | David Ferman et.al. | 2405.19646 | null |
| 2024-05-29 | DGD: Dynamic 3D Gaussians Distillation | Isaac Labe et.al. | 2405.19321 | null |
| 2024-05-28 | Track Initialization and Re-Identification for 3D Multi-View Multi-Object Tracking | Linh Van Ma et.al. | 2405.18606 | link |
| 2024-05-28 | Reliable Object Tracking by Multimodal Hybrid Feature Extraction and Transformer-Based Fusion | Hongze Sun et.al. | 2405.17903 | null |
| 2024-05-28 | Towards a Generalist and Blind RGB-X Tracker | Yuedong Tan et.al. | 2405.17773 | null |
| 2024-06-03 | BaboonLand Dataset: Tracking Primates in the Wild and Automating Behaviour Recognition from Drone Videos | Isla Duporge et.al. | 2405.17698 | null |
| 2024-05-27 | Tracking Small Birds by Detection Candidate Region Filtering and Detection History-aware Association | Tingwei Liu et.al. | 2405.17323 | null |
| 2024-05-24 | ETTrack: Enhanced Temporal Motion Predictor for Multi-Object Tracking | Xudong Han et.al. | 2405.15755 | null |
| 2024-05-24 | Trackastra: Transformer-based cell tracking for live-cell microscopy | Benjamin Gallusser et.al. | 2405.15700 | link |
| 2024-05-24 | An Approximate Dynamic Programming Framework for Occlusion-Robust Multi-Object Tracking | Pratyusha Musunuru et.al. | 2405.15137 | null |
| 2024-05-23 | Awesome Multi-modal Object Tracking | Chunhui Zhang et.al. | 2405.14200 | null |
| 2024-05-23 | Enhanced Object Tracking by Self-Supervised Auxiliary Depth Estimation Learning | Zhenyu Wei et.al. | 2405.14195 | null |
| 2024-05-23 | PuTR: A Pure Transformer for Decoupled and Online Multi-Object Tracking | Chongwei Liu et.al. | 2405.14119 | null |
| 2024-05-22 | Multi Player Tracking in Ice Hockey with Homographic Projections | Harish Prakash et.al. | 2405.13397 | null |
| 2024-05-20 | DTLLM-VLT: Diverse Text Generation for Visual Language Tracking Based on LLM | Xuchen Li et.al. | 2405.12139 | null |
| 2024-05-19 | Track Anything Rapter (TAR) | Tharun V. Puthanveettil et.al. | 2405.11655 | link |
| 2024-05-19 | RobMOT: Robust 3D Multi-Object Tracking by Observational Noise and State Estimation Drift Mitigation on LiDAR PointCloud | Mohamed Nagy et.al. | 2405.11536 | null |
| 2024-05-18 | City-Scale Multi-Camera Vehicle Tracking System with Improved Self-Supervised Camera Link Model | Yuqiang Lin et.al. | 2405.11345 | null |
| 2024-05-17 | Air Signing and Privacy-Preserving Signature Verification for Digital Documents | P. Sarveswarasarma et.al. | 2405.10868 | null |
| 2024-05-16 | A Novel Bounding Box Regression Method for Single Object Tracking | Omar Abdelaziz et.al. | 2405.10444 | null |
| 2024-05-16 | Beyond Traditional Single Object Tracking: A Survey | Omar Abdelaziz et.al. | 2405.10439 | null |
| 2024-05-16 | Spatial Cognition: a Wave Hypothesis | Robert Worden et.al. | 2405.10112 | null |
| 2024-05-14 | Learning Correspondence for Deformable Objects | Priya Sundaresan et.al. | 2405.08996 | null |
| 2024-05-14 | ADA-Track: End-to-End Multi-Camera 3D Multi-Object Tracking with Alternating Detection and Association | Shuxiao Ding et.al. | 2405.08909 | link |
| 2024-05-12 | MAML MOT: Multiple Object Tracking based on Meta-Learning | Jiayi Chen et.al. | 2405.07272 | null |
| 2024-05-16 | Common Corruptions for Enhancing and Evaluating Robustness in Air-to-Air Visual Object Detection | Anastasios Arsenos et.al. | 2405.06765 | null |
| 2024-05-16 | Ensuring UAV Safety: A Vision-only and Real-time Framework for Collision Avoidance Through Object Detection, Tracking, and Distance Estimation | Vasileios Karampinis et.al. | 2405.06749 | null |
| 2024-05-10 | Multi-Object Tracking in the Dark | Xinzhe Wang et.al. | 2405.06600 | link |
| 2024-05-09 | Outlier-robust Kalman Filtering through Generalised Bayes | Gerardo Duran-Martin et.al. | 2405.05646 | link |
| 2024-05-08 | MOTLEE: Collaborative Multi-Object Tracking Using Temporal Consistency for Neighboring Robot Frame Alignment | Mason B. Peterson et.al. | 2405.05210 | link |
| 2024-05-08 | TENet: Targetness Entanglement Incorporating with Multi-Scale Pooling and Mutually-Guided Fusion for RGB-E Object Tracking | Pengcheng Shao et.al. | 2405.05004 | link |
| 2024-05-07 | DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving | Chen Min et.al. | 2405.04390 | null |
| 2024-05-07 | Bayesian Simultaneous Localization and Multi-Lane Tracking Using Onboard Sensors and a SD Map | Yuxuan Xia et.al. | 2405.04290 | null |
| 2024-05-06 | Collecting Consistently High Quality Object Tracks with Minimal Human Involvement by Using Self-Supervised Learning to Detect Tracker Errors | Samreen Anjum et.al. | 2405.03643 | null |
| 2024-05-03 | Learning Robot Soccer from Egocentric Vision with Deep Reinforcement Learning | Dhruva Tirumala et.al. | 2405.02425 | null |
| 2024-05-03 | DreamScene4D: Dynamic Multi-Object Scene Generation from Monocular Videos | Wen-Hsuan Chu et.al. | 2405.02280 | link |
| 2024-05-02 | Tracking and classifying objects with DAS data along railway | Simon L. B. Fredriksen et.al. | 2405.01140 | null |
| 2024-04-29 | Innovative Integration of Visual Foundation Model with a Robotic Arm on a Mobile Platform | Shimian Zhang et.al. | 2404.18720 | null |
| 2024-04-27 | 3D Extended Object Tracking by Fusing Roadside Sparse Radar Point Clouds and Pixel Keypoints | Jiayin Deng et.al. | 2404.17903 | link |
| 2024-04-22 | 360VOTS: Visual Object Tracking and Segmentation in Omnidirectional Videos | Yinzhe Xu et.al. | 2404.13953 | null |
| 2024-04-22 | TeamTrack: A Dataset for Multi-Sport Multi-Object Tracking in Full-pitch Videos | Atom Scott et.al. | 2404.13868 | null |
| 2024-04-19 | A comparison between single-stage and two-stage 3D tracking algorithms for greenhouse robotics | David Rapado-Rincon et.al. | 2404.12963 | null |
| 2024-04-18 | Inverse Neural Rendering for Explainable Multi-Object Tracking | Julian Ost et.al. | 2404.12359 | null |
| 2024-04-24 | On Target Detection in the Presence of Clutter in Joint Communication and Sensing Cellular Networks | Julia Vinogradova et.al. | 2404.12133 | null |
| 2024-04-18 | MLS-Track: Multilevel Semantic Interaction in RMOT | Zeliang Ma et.al. | 2404.12031 | null |
| 2024-04-18 | KnotResolver: Tracking self-intersecting filaments in microscopy using directed graphs | Dhruv Khatri et.al. | 2404.12029 | link |
| 2024-04-17 | How to deal with glare for improved perception of Autonomous Vehicles | Muhammad Z. Alam et.al. | 2404.10992 | null |
| 2024-04-12 | Into the Fog: Evaluating Multiple Object Tracking Robustness | Nadezda Kirillova et.al. | 2404.10534 | link |
| 2024-04-15 | 3D Face Tracking from 2D Video through Iterative Dense UV to Image Flow | Felix Taubner et.al. | 2404.09819 | null |
| 2024-04-12 | IDD-X: A Multi-View Dataset for Ego-relative Important Object Localization and Explanation in Dense and Unstructured Traffic | Chirag Parikh et.al. | 2404.08561 | null |
| 2024-04-11 | Gaga: Group Any Gaussians via 3D-aware Memory Bank | Weijie Lyu et.al. | 2404.07977 | null |
| 2024-04-11 | SFSORT: Scene Features-based Simple Online Real-Time Tracker | M. M. Morsali et.al. | 2404.07553 | link |
| 2024-04-11 | PillarTrack: Redesigning Pillar-based Transformer Network for Single Object Tracking on Point Clouds | Weisheng Xu et.al. | 2404.07495 | link |
| 2024-04-11 | Trashbusters: Deep Learning Approach for Litter Detection and Tracking | Kashish Jain et.al. | 2404.07467 | null |
| 2024-04-09 | LRR: Language-Driven Resamplable Continuous Representation against Adversarial Tracking Attacks | Jianlang Chen et.al. | 2404.06247 | link |
| 2024-04-08 | DepthMOT: Depth Cues Lead to a Strong Multi-Object Tracker | Jiapeng Wu et.al. | 2404.05518 | link |
| 2024-04-08 | Self-Supervised Multi-Object Tracking with Path Consistency | Zijia Lu et.al. | 2404.05136 | link |
| 2024-04-07 | Spatial Cognition from Egocentric Video: Out of Sight, Not Out of Mind | Chiara Plizzari et.al. | 2404.05072 | null |
| 2024-04-03 | Ego-Motion Aware Target Prediction Module for Robust Multi-Object Tracking | Navid Mahdian et.al. | 2404.03110 | link |
| 2024-04-03 | Representation Alignment Contrastive Regularization for Multi-Object Tracking | Shujie Chen et.al. | 2404.02562 | link |
| 2024-03-29 | Bayesian Nonparametrics: An Alternative to Deep Learning | Bahman Moraffah et.al. | 2404.00085 | null |
| 2024-03-29 | MTMMC: A Large-Scale Real-World Multi-Modal Camera Tracking Benchmark | Sanghyun Woo et.al. | 2403.20225 | null |
| 2024-03-29 | SceneTracker: Long-term Scene Flow Estimation Network | Bo Wang et.al. | 2403.19924 | link |
| 2024-03-27 | Enhancing Multiple Object Tracking Accuracy via Quantum Annealing | Yasuyuki Ihara et.al. | 2403.18908 | null |
| 2024-03-27 | TAFormer: A Unified Target-Aware Transformer for Video and Motion Joint Prediction in Aerial Scenes | Liangyu Xu et.al. | 2403.18238 | null |
| 2024-03-27 | Middle Fusion and Multi-Stage, Multi-Form Prompts for Robust RGB-T Tracking | Qiming Wang et.al. | 2403.18193 | null |
| 2024-03-26 | OmniVid: A Generative Framework for Universal Video Understanding | Junke Wang et.al. | 2403.17935 | link |
| 2024-03-26 | Exploring Dynamic Transformer for Efficient Object Tracking | Jiawen Zhu et.al. | 2403.17651 | null |
| 2024-03-25 | Multiple Object Tracking as ID Prediction | Ruopeng Gao et.al. | 2403.16848 | link |
| 2024-03-25 | From Two Stream to One Stream: Efficient RGB-T Tracking via Mutual Prompt Learning and Knowledge Distillation | Yang Luo et.al. | 2403.16834 | null |
| 2024-03-29 | Elysium: Exploring Object-level Perception in Videos via MLLM | Han Wang et.al. | 2403.16558 | link |
| 2024-03-25 | Spike-NeRF: Neural Radiance Field Based On Spike Camera | Yijia Guo et.al. | 2403.16410 | null |
| 2024-03-28 | SDSTrack: Self-Distillation Symmetric Adapter Learning for Multi-Modal Visual Object Tracking | Xiaojun Hou et.al. | 2403.16002 | link |
| 2024-03-23 | Spatio-Temporal Bi-directional Cross-frame Memory for Distractor Filtering Point Cloud Single Object Tracking | Shaoyu Sun et.al. | 2403.15831 | null |
| 2024-03-23 | PNAS-MOT: Multi-Modal Object Tracking with Pareto Neural Architecture Search | Chensheng Peng et.al. | 2403.15712 | link |
| 2024-03-22 | CR3DT: Camera-RADAR Fusion for 3D Detection and Tracking | Nicolas Baumann et.al. | 2403.15313 | link |
| 2024-03-22 | Reasoning-Enhanced Object-Centric Learning for Videos | Jian Li et.al. | 2403.15245 | link |
| 2024-03-20 | Fast-Poly: A Fast Polyhedral Framework For 3D Multi-Object Tracking | Xiaoyu Li et.al. | 2403.13443 | link |
| 2024-03-19 | Lifting Multi-View Detection and Tracking to the Bird’s Eye View | Torben Teepe et.al. | 2403.12573 | link |
| 2024-03-18 | Pedestrian Tracking with Monocular Camera using Unconstrained 3D Motion Model | Jan Krejčí et.al. | 2403.11978 | null |
| 2024-03-17 | NetTrack: Tracking Highly Dynamic Objects with a Net | Guangze Zheng et.al. | 2403.11186 | null |
| 2024-03-16 | View-Centric Multi-Object Tracking with Homographic Matching in Moving UAV | Deyi Ji et.al. | 2403.10830 | null |
| 2024-03-16 | Exploring Learning-based Motion Models in Multi-Object Tracking | Hsiang-Wei Huang et.al. | 2403.10826 | null |
| 2024-03-15 | NeuFlow: Real-time, High-accuracy Optical Flow Estimation on Robots Using Edge Devices | Zhiyong Zhang et.al. | 2403.10425 | link |
| 2024-03-14 | OneTracker: Unifying Visual Object Tracking with Foundation Models and Efficient Tuning | Lingyi Hong et.al. | 2403.09634 | null |
| 2024-03-13 | Object Permanence Filter for Robust Tracking with Interactive Robots | Shaoting Peng et.al. | 2403.08231 | null |
| 2024-03-12 | Learning Data Association for Multi-Object Tracking using Only Coordinates | Mehdi Miah et.al. | 2403.08018 | null |
| 2024-03-12 | A Study on Centralised and Decentralised Swarm Robotics Architecture for Part Delivery System | Angelos Dimakos et.al. | 2403.07635 | null |
| 2024-03-12 | LiDAR Point Cloud-based Multiple Vehicle Tracking with Probabilistic Measurement-Region Association | Guanhua Ding et.al. | 2403.06423 | null |
| 2024-03-09 | SSF-Net: Spatial-Spectral Fusion Network with Spectral Angle Awareness for Hyperspectral Object Tracking | Hanzheng Wang et.al. | 2403.05852 | null |
| 2024-03-09 | Long-term Frame-Event Visual Tracking: Benchmark Dataset and Baseline | Xiao Wang et.al. | 2403.05839 | link |
| 2024-03-11 | Beyond MOT: Semantic Multi-Object Tracking | Yunhao Li et.al. | 2403.05021 | link |
| 2024-03-07 | Delving into the Trajectory Long-tail Distribution for Muti-object Tracking | Sijia Chen et.al. | 2403.04700 | link |
| 2024-03-07 | Towards learning-based planning: The nuPlan benchmark for real-world autonomous driving | Napat Karnchanachari et.al. | 2403.04133 | null |
| 2024-03-06 | Multi-Object Tracking with Camera-LiDAR Fusion for Autonomous Driving | Riccardo Pieroni et.al. | 2403.04112 | null |
| 2024-03-06 | VastTrack: Vast Category Visual Object Tracking | Liang Peng et.al. | 2403.03493 | link |
| 2024-03-05 | DeconfuseTrack: Dealing with Confusion for Multi-Object Tracking | Cheng Huang et.al. | 2403.02767 | null |
| 2024-03-04 | DiffMOT: A Real-time Diffusion-based Multiple Object Tracker with Non-linear Prediction | Weiyi Lv et.al. | 2403.02075 | null |
| 2024-03-04 | Integrating Efficient Optimal Transport and Functional Maps For Unsupervised Shape Correspondence Learning | Tung Le et.al. | 2403.01781 | null |
| 2024-03-01 | Joint Spatial-Temporal Calibration for Camera and Global Pose Sensor | Junlin Song et.al. | 2403.00976 | null |
| 2024-02-28 | Estimation of railway vehicle response for track geometry evaluation using branch Fourier neural operator | Qingjing Wang et.al. | 2402.18366 | null |
| 2024-02-28 | EchoTrack: Auditory Referring Multi-Object Tracking for Autonomous Driving | Jiacheng Lin et.al. | 2402.18302 | link |
| 2024-02-28 | Enhancing Tracking Robustness with Auxiliary Adversarial Defense Networks | Zhewei Wu et.al. | 2402.17976 | null |
| 2024-02-27 | SWTrack: Multiple Hypothesis Sliding Window 3D Multi-Object Tracking | Sandro Papais et.al. | 2402.17892 | null |
| 2024-02-27 | In Defense and Revival of Bayesian Filtering for Thermal Infrared Object Tracking | Peng Gao et.al. | 2402.17098 | null |
| 2024-02-26 | Searching a Lightweight Network Architecture for Thermal Infrared Pedestrian Tracking | Peng Gao et.al. | 2402.16570 | null |
| 2024-02-26 | SeqTrack3D: Exploring Sequence Information for Robust 3D Point Cloud Tracking | Yu Lin et.al. | 2402.16249 | null |
| 2024-02-26 | Real-Time Vehicle Detection and Urban Traffic Behavior Analysis Based on UAV Traffic Videos on Mobile Devices | Yuan Zhu et.al. | 2402.16246 | null |
| 2024-02-24 | Multi-Object Tracking by Hierarchical Visual Representations | Jinkun Cao et.al. | 2402.15895 | null |
| 2024-02-24 | Detection Is Tracking: Point Cloud Multi-Sweep Deep Learning Models Revisited | Lingji Chen et.al. | 2402.15756 | null |
Action Recognition
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2025-12-23 | Bridging Modalities and Transferring Knowledge: Enhanced Multimodal Understanding and Recognition | Gorjan Radevski et.al. | 2512.20501 | null |
| 2025-12-23 | Beyond Motion Pattern: An Empirical Study of Physical Forces for Human Motion Understanding | Anh Dao et.al. | 2512.20451 | null |
| 2025-12-23 | DETACH: Decomposed Spatio-Temporal Alignment for Exocentric Video and Ambient Sensors with Staged Learning | Junho Yoon et.al. | 2512.20409 | null |
| 2025-12-23 | Effect of Activation Function and Model Optimizer on the Performance of Human Activity Recognition System Using Various Deep Learning Models | Subrata Kumer Paula et.al. | 2512.20104 | null |
| 2025-12-23 | A Contextual Analysis of Driver-Facing and Dual-View Video Inputs for Distraction Detection in Naturalistic Driving Environments | Anthony Dontoh et.al. | 2512.20025 | null |
| 2025-12-22 | Distinguishing Visually Similar Actions: Prompt-Guided Semantic Prototype Modulation for Few-Shot Action Recognition | Xiaoyang Li et.al. | 2512.19036 | null |
| 2025-12-21 | Context-Aware Network Based on Multi-scale Spatio-temporal Attention for Action Recognition in Videos | Xiaoyang Li et.al. | 2512.18750 | null |
| 2025-12-21 | Hierarchical Bayesian Framework for Multisource Domain Adaptation | Alexander M. Glandon et.al. | 2512.18553 | null |
| 2025-12-17 | Seeing Beyond the Scene: Analyzing and Mitigating Background Bias in Action Recognition | Ellie Zhou et.al. | 2512.17953 | null |
| 2025-12-19 | Xiaomi MiMo-VL-Miloco Technical Report | Jiaze Li et.al. | 2512.17436 | null |
| 2025-12-18 | OMG-Bench: A New Challenging Benchmark for Skeleton-based Online Micro Hand Gesture Recognition | Haochen Chang et.al. | 2512.16727 | null |
| 2025-12-18 | Skeleton-Snippet Contrastive Learning with Multiscale Feature Fusion for Action Localization | Qiushuo Cheng et.al. | 2512.16504 | null |
| 2025-12-06 | Smart Surveillance: Identifying IoT Device Behaviours using ML-Powered Traffic Analysis | Reza Ryan et.al. | 2512.13709 | null |
| 2025-12-15 | Recurrent Video Masked Autoencoders | Daniel Zoran et.al. | 2512.13684 | null |
| 2025-12-14 | StegaVAR: Privacy-Preserving Video Action Recognition via Steganographic Domain Analysis | Lixin Chen et.al. | 2512.12586 | null |
| 2025-12-13 | From Human Intention to Action Prediction: A Comprehensive Benchmark for Intention-driven End-to-End Autonomous Driving | Huan Zheng et.al. | 2512.12302 | null |
| 2025-12-12 | DynaPURLS: Dynamic Refinement of Part-aware Representations for Skeleton-based Zero-Shot Action Recognition | Jingmin Zhu et.al. | 2512.11941 | null |
| 2025-12-05 | Explainable Adversarial-Robust Vision-Language-Action Model for Robotic Manipulation | Ju-Young Kim et.al. | 2512.11865 | null |
| 2025-12-12 | TSkel-Mamba: Temporal Dynamic Modeling via State Space Model for Human Skeleton-based Action Recognition | Yanan Liu et.al. | 2512.11503 | null |
| 2025-12-12 | Boosting Skeleton-based Zero-Shot Action Recognition with Training-Free Test-Time Adaptation | Jingmin Zhu et.al. | 2512.11458 | null |
| 2025-12-12 | Task-Specific Distance Correlation Matching for Few-Shot Action Recognition | Fei Long et.al. | 2512.11340 | null |
| 2025-12-12 | Breast-Rehab: A Postoperative Breast Cancer Rehabilitation Training Assessment System Based on Human Action Recognition | Zikang Chen et.al. | 2512.11245 | null |
| 2025-12-12 | Multi-task Learning with Extended Temporal Shift Module for Temporal Action Localization | Anh-Kiet Duong et.al. | 2512.11189 | null |
| 2025-12-11 | Deep Photonic Reservoir Computing with On-chip Nonlinearity | Jinlong Xiang et.al. | 2512.10626 | null |
| 2025-12-11 | Lang2Motion: Bridging Language and Motion through Joint Embedding Spaces | Bishoy Galoaa et.al. | 2512.10617 | null |
| 2025-12-11 | Lies We Can Trust: Quantifying Action Uncertainty with Inaccurate Stochastic Dynamics through Conformalized Nonholonomic Lie Groups | Luís Marques et.al. | 2512.10294 | null |
| 2025-12-10 | GLaD: Geometric Latent Distillation for Vision-Language-Action Models | Minghao Guo et.al. | 2512.09619 | null |
| 2025-12-09 | Neural Ordinary Differential Equations for Simulating Metabolic Pathway Dynamics from Time-Series Multiomics Data | Udesh Habaraduwa et.al. | 2512.08732 | null |
| 2025-12-09 | Aerial Vision-Language Navigation with a Unified Framework for Spatial, Temporal and Embodied Reasoning | Huilin Xu et.al. | 2512.08639 | null |
| 2025-12-09 | Mind to Hand: Purposeful Robotic Control via Embodied Reasoning | Peijun Tang et.al. | 2512.08580 | null |
| 2025-12-08 | A Comparative Study of EMG- and IMU-based Gesture Recognition at the Wrist and Forearm | Soroush Baghernezhad et.al. | 2512.07997 | null |
| 2025-12-08 | Improving action classification with brain-inspired deep networks | Aidas Aglinskas et.al. | 2512.07729 | null |
| 2025-12-08 | A Large-Scale Multimodal Dataset and Benchmarks for Human Activity Scene Understanding and Reasoning | Siyang Jiang et.al. | 2512.07136 | null |
| 2025-12-07 | VideoVLA: Video Generators Can Be Generalizable Robot Manipulators | Yichao Shen et.al. | 2512.06963 | null |
| 2025-12-04 | Towards Adaptive Fusion of Multimodal Deep Networks for Human Action Recognition | Novanto Yudistira et.al. | 2512.04943 | null |
| 2025-12-04 | CIG-MAE: Cross-Modal Information-Guided Masked Autoencoder for Self-Supervised WiFi Sensing | Gang Liu et.al. | 2512.04723 | null |
| 2025-12-04 | WiFi-based Cross-Domain Gesture Recognition Using Attention Mechanism | Ruijing Liu et.al. | 2512.04521 | null |
| 2025-12-03 | Heatmap Pooling Network for Action Recognition from RGB Videos | Mengyuan Liu et.al. | 2512.03837 | null |
| 2025-12-02 | SAM2Grasp: Resolve Multi-modal Grasping via Prompt-conditioned Temporal Action Prediction | Shengkai Wu et.al. | 2512.02609 | null |
| 2025-12-01 | TBT-Former: Learning Temporal Boundary Distributions for Action Localization | Thisara Rathnayaka et.al. | 2512.01298 | null |
| 2025-11-29 | Developing Fairness-Aware Task Decomposition to Improve Equity in Post-Spinal Fusion Complication Prediction | Yining Yuan et.al. | 2512.00598 | null |
| 2025-11-29 | Integrating Skeleton Based Representations for Robust Yoga Pose Classification Using Deep Learning Models | Mohammed Mohiuddin et.al. | 2512.00572 | null |
| 2025-10-14 | MOTION: ML-Assisted On-Device Low-Latency Motion Recognition | Veeramani Pugazhenthi et.al. | 2512.00008 | null |
| 2025-11-28 | LatBot: Distilling Universal Latent Actions for Vision-Language-Action Models | Zuolei Li et.al. | 2511.23034 | null |
| 2025-11-27 | SkeletonAgent: An Agentic Interaction Framework for Skeleton-based Action Recognition | Hongda Liu et.al. | 2511.22433 | null |
| 2025-11-27 | HandyLabel: Towards Post-Processing to Real-Time Annotation Using Skeleton Based Hand Gesture Recognition | Sachin Kumar Singh et.al. | 2511.22337 | null |
| 2025-11-26 | Attention-Guided Patch-Wise Sparse Adversarial Attacks on Vision-Language-Action Models | Naifu Zhang et.al. | 2511.21663 | null |
| 2025-11-26 | Active Learning for GCN-based Action Recognition | Hichem Sahbi et.al. | 2511.21625 | null |
| 2025-11-26 | Towards an Effective Action-Region Tracking Framework for Fine-grained Video Action Recognition | Baoli Sun et.al. | 2511.21202 | null |
| 2025-11-24 | Scale What Counts, Mask What Matters: Evaluating Foundation Models for Zero-Shot Cross-Domain Wi-Fi Sensing | Cheng Jiang et.al. | 2511.18792 | null |
| 2025-11-22 | ActDistill: General Action-Guided Self-Derived Distillation for Efficient Vision-Language-Action Models | Wencheng Ye et.al. | 2511.18082 | null |
| 2025-11-21 | Label-Efficient Skeleton-based Recognition with Stable-Invertible Graph Convolutional Networks | Hichem Sahbi et.al. | 2511.17345 | null |
| 2025-11-21 | Social-Media Based Personas Challenge: Hybrid Prediction of Common and Rare User Actions on Bluesky | Benjamin White et.al. | 2511.17241 | null |
| 2025-11-21 | VLA-4D: Embedding 4D Awareness into Vision-Language-Action Models for SpatioTemporally Coherent Robotic Manipulation | Hanyu Zhou et.al. | 2511.17199 | null |
| 2025-11-21 | Progress-Think: Semantic Progress Reasoning for Vision-Language Navigation | Shuo Wang et.al. | 2511.17097 | null |
| 2025-11-21 | H-GAR: A Hierarchical Interaction Framework via Goal-Driven Observation-Action Refinement for Robotic Manipulation | Yijie Zhu et.al. | 2511.17079 | null |
| 2025-11-21 | The Wireless Charger as a Gesture Sensor: A Novel Approach to Ubiquitous Interaction | Weiyi Wang et.al. | 2511.16989 | null |
| 2025-11-21 | Parts-Mamba: Augmenting Joint Context with Part-Level Scanning for Occluded Human Skeleton | Tianyi Shen et.al. | 2511.16860 | null |
| 2025-11-20 | BoxingVI: A Multi-Modal Benchmark for Boxing Action Recognition and Localization | Rahul Kumar et.al. | 2511.16524 | null |
| 2025-11-20 | FOOTPASS: A Multi-Modal Multi-Agent Tactical Context Dataset for Play-by-Play Action Spotting in Soccer Broadcast Videos | Jeremie Ochin et.al. | 2511.16183 | null |
| 2025-11-19 | Scriboora: Rethinking Human Pose Forecasting | Daniel Bermuth et.al. | 2511.15565 | null |
| 2025-11-18 | DoGCLR: Dominance-Game Contrastive Learning Network for Skeleton-Based Action Recognition | Yanshan Li et.al. | 2511.14179 | null |
| 2025-11-18 | A Machine Learning-Based Multimodal Framework for Wearable Sensor-Based Archery Action Recognition and Stress Estimation | Xianghe Liu et.al. | 2511.14057 | null |
| 2025-11-17 | Computer Vision based group activity detection and action spotting | Narthana Sivalingam et.al. | 2511.13315 | null |
| 2025-11-17 | MGCA-Net: Multi-Grained Category-Aware Network for Open-Vocabulary Temporal Action Localization | Zhenying Fang et.al. | 2511.13039 | null |
| 2025-11-17 | View-aware Cross-modal Distillation for Multi-view Action Recognition | Trung Thanh Nguyen et.al. | 2511.12870 | null |
| 2025-11-16 | Pixels or Positions? Benchmarking Modalities in Group Activity Recognition | Drishya Karki et.al. | 2511.12606 | null |
| 2025-11-15 | Locomotion in CAVE: Enhancing Immersion through Full-Body Motion | Xiaohui Li et.al. | 2511.12251 | null |
| 2025-11-14 | Rethinking Progression of Memory State in Robotic Manipulation: An Object-Centric Perspective | Nhat Chung et.al. | 2511.11478 | null |
| 2025-11-13 | SUGAR: Learning Skeleton Representation with Visual-Motion Knowledge for Action Recognition | Qilang Ye et.al. | 2511.10091 | null |
| 2025-11-12 | Revisiting Cross-Architecture Distillation: Adaptive Dual-Teacher Transfer for Lightweight Video Models | Ying Peng et.al. | 2511.09469 | null |
| 2025-11-12 | Learning by Neighbor-Aware Semantics, Deciding by Open-form Flows: Towards Robust Zero-Shot Skeleton Action Recognition | Yang Chen et.al. | 2511.09388 | null |
| 2025-11-12 | PressTrack-HMR: Pressure-Based Top-Down Multi-Person Global Human Mesh Recovery | Jiayue Yuan et.al. | 2511.09147 | null |
| 2025-11-11 | Privacy Beyond Pixels: Latent Anonymization for Privacy-Preserving Video Understanding | Joseph Fioresi et.al. | 2511.08666 | null |
| 2025-11-09 | Learning Topology-Driven Multi-Subspace Fusion for Grassmannian Deep Network | Xuan Yu et.al. | 2511.08628 | null |
| 2025-11-05 | The chanciness of time | John M. Myers et.al. | 2511.08611 | null |
| 2025-11-11 | SASG-DA: Sparse-Aware Semantic-Guided Diffusion Augmentation For Myoelectric Gesture Recognition | Chen Liu et.al. | 2511.08344 | null |
| 2025-11-10 | Achieving Effective Virtual Reality Interactions via Acoustic Gesture Recognition based on Large Language Models | Xijie Zhang et.al. | 2511.07085 | null |
| 2025-11-10 | Otter: Mitigating Background Distractions of Wide-Angle Few-Shot Action Recognition with Enhanced RWKV | Wenbo Huang et.al. | 2511.06741 | null |
| 2025-11-09 | Learning-Based Robust Bayesian Persuasion with Conformal Prediction Guarantees | Heeseung Bang et.al. | 2511.06223 | null |
| 2025-11-06 | Grounding Foundational Vision Models with 3D Human Poses for Robust Action Recognition | Nicholas Babey et.al. | 2511.05622 | null |
| 2025-11-06 | Pose-Aware Multi-Level Motion Parsing for Action Quality Assessment | Shuaikang Zhu et.al. | 2511.05611 | null |
| 2025-11-07 | Accurate online action and gesture recognition system using detectors and Deep SPD Siamese Networks | Mohamed Sanim Akremi et.al. | 2511.05250 | null |
| 2025-11-06 | Unified Multimodal Diffusion Forcing for Forceful Manipulation | Zixuan Huang et.al. | 2511.04812 | null |
| 2025-11-06 | X-Diffusion: Training Diffusion Policies on Cross-Embodiment Human Demonstrations | Maximus A. Pace et.al. | 2511.04671 | null |
| 2025-11-06 | Evo-1: Lightweight Vision-Language-Action Model with Preserved Semantic Alignment | Tao Lin et.al. | 2511.04555 | null |
| 2025-11-06 | Alternative Fairness and Accuracy Optimization in Criminal Justice | Shaolong Wu et.al. | 2511.04505 | null |
| 2025-11-06 | ThaiOCRBench: A Task-Diverse Benchmark for Vision-Language Understanding in Thai | Surapon Nonesung et.al. | 2511.04479 | null |
| 2025-11-06 | Temporal Action Selection for Action Chunking | Yueyang Weng et.al. | 2511.04421 | null |
| 2025-11-06 | ForeRobo: Unlocking Infinite Simulation Data for 3D Goal-driven Robotic Manipulation | Dexin Wang et.al. | 2511.04381 | null |
| 2025-11-06 | GraSP-VLA: Graph-based Symbolic Action Representation for Long-Horizon Planning with VLA Policies | Maëlic Neau et.al. | 2511.04357 | null |
| 2025-11-06 | RCMCL: A Unified Contrastive Learning Framework for Robust Multi-Modal (RGB-D, Skeleton, Point Cloud) Action Understanding | Hasan Akgul et.al. | 2511.04351 | null |
| 2025-11-06 | GUI-360$^\circ$: A Comprehensive Dataset and Benchmark for Computer-Using Agents | Jian Mu et.al. | 2511.04307 | null |
| 2025-11-06 | Expectation-Realization Interpretation of Quantum Superposition | Yanting Wang et.al. | 2511.04154 | null |
| 2025-11-06 | Learning from Online Videos at Inference Time for Computer-Use Agents | Yujian Liu et.al. | 2511.04137 | null |
| 2025-11-06 | Unified Effective Field Theory for Nonlinear and Quantum Optics | Xiaochen Liu et.al. | 2511.04118 | null |
| 2025-11-06 | Simple 3D Pose Features Support Human and Machine Social Scene Understanding | Wenshuo Qin et.al. | 2511.03988 | null |
| 2025-11-06 | Use of Continuous Glucose Monitoring with Machine Learning to Identify Metabolic Subphenotypes and Inform Precision Lifestyle Changes | Ahmed A. Metwally et.al. | 2511.03986 | null |
| 2025-11-06 | Temporal Zoom Networks: Distance Regression and Continuous Depth for Efficient Action Localization | Ibne Farabi Shihab et.al. | 2511.03943 | null |
| 2025-11-05 | Enhancing Q-Value Updates in Deep Q-Learning via Successor-State Prediction | Lipeng Zu et.al. | 2511.03836 | null |
| 2025-11-05 | Krylov Complexity Meets Confinement | Xuhao Jiang et.al. | 2511.03783 | null |
| 2025-11-05 | Disentangled Concepts Speak Louder Than Words: Explainable Video Action Recognition | Jongseo Lee et.al. | 2511.03725 | null |
| 2025-11-05 | A Lightweight 3D-CNN for Event-Based Human Action Recognition with Privacy-Preserving Potential | Mehdi Sefidgar Dilmaghani et.al. | 2511.03665 | null |
| 2025-11-05 | LiveTradeBench: Seeking Real-World Alpha with Large Language Models | Haofei Yu et.al. | 2511.03628 | link |
| 2025-11-05 | Learning Communication Skills in Multi-task Multi-agent Deep Reinforcement Learning | Changxi Zhu et.al. | 2511.03348 | null |
| 2025-11-05 | Multi-Object Tracking Retrieval with LLaVA-Video: A Training-Free Solution to MOT25-StAG Challenge | Yi Yang et.al. | 2511.03332 | null |
| 2025-11-04 | WorldPlanner: Monte Carlo Tree Search and MPC with Action-Conditioned Visual World Models | R. Khorrambakht et.al. | 2511.03077 | null |
| 2025-11-04 | The Curved Spacetime of Transformer Architectures | Riccardo Di Sipio et.al. | 2511.03060 | null |
| 2025-11-04 | VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation | Kevin Qinghong Lin et.al. | 2511.02778 | link |
| 2025-11-04 | Agentic World Modeling for 6G: Near-Real-Time Generative State-Space Reasoning | Farhad Rezazadeh et.al. | 2511.02748 | null |
| 2025-11-04 | Radio and Optical Flares on the dMe Flare Star EV Lac | Rachel A. Osten et.al. | 2511.02719 | null |
| 2025-11-04 | MVAFormer: RGB-based Multi-View Spatio-Temporal Action Recognition with Transformer | Taiga Yamane et.al. | 2511.02473 | null |
| 2025-11-04 | From the Laboratory to Real-World Application: Evaluating Zero-Shot Scene Interpretation on Edge Devices for Mobile Robotics | Nicolas Schuler et.al. | 2511.02427 | null |
| 2025-11-03 | Euler-Heisenberg action for fermions coupled to gauge and axial vectors: Hessian diagonalization, sector classification, and applications | Lucas Pereira de Souza et.al. | 2511.02118 | null |
| 2025-11-03 | Neural dynamics of cognitive control: Current tensions and future promise | Dale Zhou et.al. | 2511.02063 | null |
| 2025-11-03 | Path-Coordinated Continual Learning with Neural Tangent Kernel-Justified Plasticity: A Theoretical Framework with Near State-of-the-Art Performance | Rathin Chandra Shit et.al. | 2511.02025 | null |
| 2025-11-03 | Unified Diffusion VLA: Vision-Language-Action Model via Joint Discrete Denoising Diffusion Process | Jiayi Chen et.al. | 2511.01718 | null |
| 2025-11-03 | OmniVLA: Physically-Grounded Multimodal VLA with Unified Multi-Sensor Perception for Robotic Manipulation | Heyu Guo et.al. | 2511.01210 | null |
| 2025-11-02 | Rhythm in the Air: Vision-based Real-Time Music Generation through Gestures | Barathi Subramanian et.al. | 2511.00793 | null |
| 2025-10-30 | Alpamayo-R1: Bridging Reasoning and Action Prediction for Generalizable Autonomous Driving in the Long Tail | NVIDIA et.al. | 2511.00088 | null |
| 2025-10-31 | Enhancing Spatio-Temporal Zero-shot Action Recognition with Language-driven Description Attributes | Yehna Kim et.al. | 2510.27255 | null |
| 2025-10-31 | GUI-Rise: Structured Reasoning and History Summarization for GUI Navigation | Tao Liu et.al. | 2510.27210 | null |
| 2025-10-30 | Spiking Patches: Asynchronous, Sparse, and Efficient Tokens for Event Cameras | Christoffer Koo Øhrstrøm et.al. | 2510.26614 | null |
| 2025-10-29 | Enhancing Temporal Understanding in Video-LLMs through Stacked Temporal Attention in Vision Encoders | Ali Rasekh et.al. | 2510.26027 | null |
| 2025-10-29 | Informative Sample Selection Model for Skeleton-based Action Recognition with Limited Training Samples | Zhigang Tu et.al. | 2510.25345 | null |
| 2025-10-27 | Evaluation of Vision-LLMs in Surveillance Video | Pascal Benschop et.al. | 2510.23190 | null |
| 2025-10-27 | Enabling Vibration-Based Gesture Recognition on Everyday Furniture via Energy-Efficient FPGA Implementation of 1D Convolutional Networks | Koki Shibata et.al. | 2510.23156 | null |
| 2025-10-27 | Neural Recording Power Optimization Through Machine Learning Guided Resolution Reconfiguration | Aviral Pandey et.al. | 2510.22924 | null |
| 2025-10-13 | J-ORA: A Framework and Multimodal Dataset for Japanese Object Identification, Reference, Action Prediction in Robot Perception | Jesse Atuhurra et.al. | 2510.21761 | link |
| 2025-10-22 | From Forecasting to Planning: Policy World Model for Collaborative State-Action Prediction | Zhida Zhao et.al. | 2510.19654 | null |
| 2025-10-22 | Vision-Based Mistake Analysis in Procedural Activities: A Review of Advances and Challenges | Konstantinos Bacharidis et.al. | 2510.19292 | null |
| 2025-10-22 | MobiAct: Efficient MAV Action Recognition Using MobileNetV4 with Contrastive Learning and Knowledge Distillation | Zhang Nengbo et.al. | 2510.19273 | null |
| 2025-10-22 | See, Think, Act: Online Shopper Behavior Simulation with VLM Agents | Yimeng Zhang et.al. | 2510.19245 | null |
| 2025-10-21 | UniHPR: Unified Human Pose Representation via Singular Value Contrastive Learning | Zhongyu Jiang et.al. | 2510.19078 | null |
| 2025-10-21 | A Renaissance of Explicit Motion Information Mining from Transformers for Action Recognition | Peiqin Zhuang et.al. | 2510.18705 | null |
| 2025-10-21 | Biomechanically consistent real-time action recognition for human-robot interaction | Wanchen Li et.al. | 2510.18373 | null |
| 2025-10-21 | FST.ai 2.0: An Explainable AI Ecosystem for Fair, Fast, and Inclusive Decision-Making in Olympic and Paralympic Taekwondo | Keivan Shariatmadar et.al. | 2510.18193 | null |
| 2025-10-20 | Muscle Anatomy-aware Geometric Deep Learning for sEMG-based Gesture Decoding | Adyasha Dash et.al. | 2510.17660 | null |
| 2025-10-18 | MoS-VLA: A Vision-Language-Action Model with One-Shot Skill Adaptation | Ruihan Zhao et.al. | 2510.16617 | null |
| 2025-10-18 | RefAtomNet++: Advancing Referring Atomic Video Action Recognition using Semantic Retrieval based Multi-Trajectory Mamba | Kunyu Peng et.al. | 2510.16444 | null |
| 2025-10-17 | StretchySnake: Flexible SSM Training Unlocks Action Recognition Across Spatio-Temporal Scales | Nyle Siddiqui et.al. | 2510.16209 | null |
| 2025-10-17 | MAVR-Net: Robust Multi-View Learning for MAV Action Recognition with Cross-View Attention | Nengbo Zhang et.al. | 2510.15448 | null |
| 2025-10-16 | MoCom: Motion-based Inter-MAV Visual Communication Using Event Vision and Spiking Neural Networks | Zhang Nengbo et.al. | 2510.14770 | null |
| 2025-10-15 | Generalizing WiFi Gesture Recognition via Large-Model-Aware Semantic Distillation and Alignment | Feng-Qi Cui et.al. | 2510.13390 | null |
| 2025-10-14 | SVAG-Bench: A Large-Scale Benchmark for Multi-Instance Spatio-temporal Video Action Grounding | Tanveer Hannan et.al. | 2510.13016 | null |
| 2025-10-13 | FOSSIL: Harnessing Feedback on Suboptimal Samples for Data-Efficient Generalisation with Imitation Learning for Embodied Vision-and-Language Tasks | Sabrina McCallum et.al. | 2510.11307 | null |
| 2025-10-13 | Mixup Helps Understanding Multimodal Video Better | Xiaoyu Ma et.al. | 2510.10986 | null |
| 2025-10-12 | MSF-Mamba: Motion-aware State Fusion Mamba for Efficient Micro-Gesture Recognition | Deng Li et.al. | 2510.10478 | null |
| 2025-10-11 | Dejavu: Towards Experience Feedback Learning for Embodied Intelligence | Shaokai Wu et.al. | 2510.10181 | null |
| 2025-10-11 | SyncLipMAE: Contrastive Masked Pretraining for Audio-Visual Talking-Face Representation | Zeyu Ling et.al. | 2510.10069 | null |
| 2025-10-10 | Enhancing Diffusion Policy with Classifier-Free Guidance for Temporal Robotic Tasks | Yuang Lu et.al. | 2510.09786 | null |
| 2025-10-10 | Diagnosing Shoulder Disorders Using Multimodal Large Language Models and Consumer-Grade Cameras | Jindong Hong et.al. | 2510.09230 | null |
| 2025-10-09 | Video-STAR: Reinforcing Open-Vocabulary Action Recognition with Tools | Zhenlong Yuan et.al. | 2510.08480 | null |
| 2025-10-09 | MMHOI: Modeling Complex 3D Multi-Human Multi-Object Interactions | Kaen Kogashi et.al. | 2510.07828 | null |
| 2025-10-07 | Mitigating Surgical Data Imbalance with Dual-Prediction Video Diffusion Model | Danush Kumar Venkatesh et.al. | 2510.07345 | null |
| 2025-10-08 | Customer-R1: Personalized Simulation of Human Behaviors via RL-based LLM Agent in Online Shopping | Ziyi Wang et.al. | 2510.07230 | null |
| 2025-10-08 | TrackVLA++: Unleashing Reasoning and Memory Capabilities in VLA Models for Embodied Visual Tracking | Jiahang Liu et.al. | 2510.07134 | null |
| 2025-10-07 | From Captions to Keyframes: KeyScore for Multimodal Frame Scoring and Video-Language Understanding | Shih-Yao Lin et.al. | 2510.06509 | null |
| 2025-10-07 | Human Action Recognition from Point Clouds over Time | James Dickens et.al. | 2510.05506 | null |
| 2025-10-05 | Speculative Actions: A Lossless Framework for Faster Agentic Systems | Naimeng Ye et.al. | 2510.04371 | null |
| 2025-10-04 | Talking Tennis: Language Feedback from 3D Biomechanical Action Recognition | Arushi Dashore et.al. | 2510.03921 | null |
| 2025-10-04 | MonitorVLM: A Vision Language Framework for Safety Violation Detection in Mining Operations | Jiang Wu et.al. | 2510.03666 | null |
| 2025-10-03 | FocusAgent: Simple Yet Effective Ways of Trimming the Large Context of Web Agents | Imene Kerboua et.al. | 2510.03204 | null |
| 2025-09-27 | $\texttt{BluePrint}$: A Social Media User Dataset for LLM Persona Evaluation and Training | Aurélien Bück-Kaeffer et.al. | 2510.02343 | null |
| 2025-10-02 | Wearable and Ultra-Low-Power Fusion of EMG and A-Mode US for Hand-Wrist Kinematic Tracking | Giusy Spacone et.al. | 2510.02000 | null |
| 2025-10-02 | Contrastive Representation Regularization for Vision-Language-Action Models | Taeyoung Kim et.al. | 2510.01711 | null |
| 2025-10-01 | EvoStruggle: A Dataset Capturing the Evolution of Struggle across Activities and Skill Levels | Shijia Feng et.al. | 2510.01362 | null |
| 2025-10-01 | HAMLET: Switch your Vision-Language-Action Model into a History-Aware Policy | Myungkyu Koo et.al. | 2510.00695 | null |
| 2025-10-01 | Expandable Decision-Making States for Multi-Agent Deep Reinforcement Learning in Soccer Tactical Analysis | Kenjiro Ide et.al. | 2510.00480 | null |
| 2025-09-30 | Towards Intuitive Human-Robot Interaction through Embodied Gesture-Driven Control with Woven Tactile Skins | ChunPing Lam et.al. | 2509.25951 | null |
| 2025-09-22 | Six Sigma For Neural Networks: Taguchi-based optimization | Sai Varun Kodathala et.al. | 2509.25213 | null |
| 2025-09-29 | Fast Real-Time Pipeline for Robust Arm Gesture Recognition | Milán Zsolt Bagladi et.al. | 2509.25042 | null |
| 2025-09-28 | AssemblyHands-X: Modeling 3D Hand-Body Coordination for Understanding Bimanual Human Activities | Tatsuro Banno et.al. | 2509.23888 | null |
| 2025-09-27 | New Synthetic Goldmine: Hand Joint Angle-Driven EMG Data Generation Framework for Micro-Gesture Recognition | Nana Wang et.al. | 2509.23359 | null |
| 2025-09-27 | Spatiotemporal Radar Gesture Recognition with Hybrid Spiking Neural Networks: Balancing Accuracy and Efficiency | Riccardo Mazzieri et.al. | 2509.23303 | null |
| 2025-09-27 | MMeViT: Multi-Modal ensemble ViT for Post-Stroke Rehabilitation Action Recognition | Ye-eun Kim et.al. | 2509.23044 | null |
| 2025-09-27 | Disentangling Static and Dynamic Information for Reducing Static Bias in Action Recognition | Masato Kobayashi et.al. | 2509.23009 | null |
| 2025-09-26 | See, Point, Fly: A Learning-Free VLM Framework for Universal Unmanned Aerial Navigation | Chih Yao Hu et.al. | 2509.22653 | link |
| 2025-09-26 | Prompt-guided Representation Disentanglement for Action Recognition | Tianci Wu et.al. | 2509.21783 | null |
| 2025-09-25 | SlotFM: A Motion Foundation Model with Slot Attention for Diverse Downstream Tasks | Junyong Park et.al. | 2509.21673 | null |
| 2025-09-25 | Temporal vs. Spatial: Comparing DINOv3 and V-JEPA2 Feature Representations for Video Action Analysis | Sai Varun Kodathala et.al. | 2509.21595 | null |
| 2025-09-25 | EMG-UP: Unsupervised Personalization in Cross-User EMG Gesture Recognition | Nana Wang et.al. | 2509.21589 | null |
| 2025-09-24 | mmHSense: Multi-Modal and Distributed mmWave ISAC Datasets for Human Sensing | Nabeel Nisar Bhat et.al. | 2509.21396 | null |
| 2025-09-25 | Every Subtlety Counts: Fine-grained Person Independence Micro-Action Recognition via Distributionally Robust Optimization | Feng-Qi Cui et.al. | 2509.21261 | null |
| 2025-09-25 | Autoregressive End-to-End Planning with Time-Invariant Spatial Alignment and Multi-Objective Policy Refinement | Jianbo Zhao et.al. | 2509.20938 | null |
| 2025-09-25 | GenFacts-Generative Counterfactual Explanations for Multi-Variate Time Series | Sarah Seifi et.al. | 2509.20936 | null |
| 2025-09-25 | Causal Time Series Generation via Diffusion Models | Yutong Xia et.al. | 2509.20846 | null |
| 2025-09-23 | A Bimanual Gesture Interface for ROS-Based Mobile Manipulators Using TinyML and Sensor Fusion | Najeeb Ahmed Bhuiyan et.al. | 2509.19521 | null |
| 2025-09-23 | FERA: Foil Fencing Referee Assistant Using Pose-Based Multi-Label Move Recognition and Rule Reasoning | Ziwen Chen et.al. | 2509.18527 | null |
| 2025-09-22 | MoCrop: Training Free Motion Guided Cropping for Efficient Video Action Recognition | Binhua Huang et.al. | 2509.18473 | null |
| 2025-09-22 | Orcust: Stepwise-Feedback Reinforcement Learning for GUI Agent | Junyu Lu et.al. | 2509.17917 | null |
| 2025-09-22 | Trainee Action Recognition through Interaction Analysis in CCATT Mixed-Reality Training | Divya Mereddy et.al. | 2509.17888 | null |
| 2025-09-22 | A$^2$M$^2$-Net: Adaptively Aligned Multi-Scale Moment for Few-Shot Action Recognition | Zilin Gao et.al. | 2509.17638 | null |
| 2025-09-22 | UIPro: Unleashing Superior Interaction Capability For GUI Agents | Hongxin Li et.al. | 2509.17328 | null |
| 2025-09-21 | Imagine2Act: Leveraging Object-Action Motion Consistency from Imagined Goals for Robotic Manipulation | Liang Heng et.al. | 2509.17125 | null |
| 2025-09-21 | MoCLIP-Lite: Efficient Video Recognition by Fusing CLIP with Motion Vectors | Binhua Huang et.al. | 2509.17084 | null |
| 2025-09-20 | Automated Procedural Analysis via Video-Language Models for AI-assisted Nursing Skills Assessment | Shen Chang et.al. | 2509.16810 | null |
| 2025-09-19 | KRAST: Knowledge-Augmented Robotic Action Recognition with Structured Text for Vision-Language Models | Son Hai Nguyen et.al. | 2509.16452 | null |
| 2025-09-18 | RynnVLA-001: Using Human Demonstrations to Improve Robot Manipulation | Yuming Jiang et.al. | 2509.15212 | link |
| 2025-09-18 | Doppler Radiance Field-Guided Antenna Selection for Improved Generalization in Multi-Antenna Wi-Fi-based Human Activity Recognition | Navid Hasanzadeh et.al. | 2509.15129 | null |
| 2025-09-18 | LSTC-MDA: A Unified Framework for Long-Short Term Temporal Convolution and Mixed Data Augmentation in Skeleton-Based Action Recognition | Feng Ding et.al. | 2509.14619 | null |
| 2025-09-18 | ClearFairy: Capturing Creative Workflows through Decision Structuring, In-Situ Questioning, and Rationale Inference | Kihoon Son et.al. | 2509.14537 | null |
| 2025-09-15 | Domain-Adaptive Pretraining Improves Primate Behavior Recognition | Felix B. Mueller et.al. | 2509.12193 | null |
| 2025-09-15 | Open-ended Hierarchical Streaming Video Understanding with Vision Language Models | Hyolim Kang et.al. | 2509.12145 | null |
| 2025-09-15 | Gesture-Based Robot Control Integrating Mm-wave Radar and Behavior Trees | Yuqing Song et.al. | 2509.12008 | null |
| 2025-09-15 | Learning Representations in Video Game Agents with Supervised Contrastive Imitation Learning | Carlos Celemin et.al. | 2509.11880 | null |
| 2025-09-11 | Improvement of Human-Object Interaction Action Recognition Using Scene Information and Multi-Task Learning Approach | Hesham M. Shehata et.al. | 2509.09067 | null |
| 2025-09-10 | A Contextual Bandits Approach for Personalization of Hand Gesture Recognition | Duke Lin et.al. | 2509.08915 | null |
| 2025-09-10 | Diffusion-Based Action Recognition Generalizes to Untrained Domains | Rogerio Guimaraes et.al. | 2509.08908 | null |
| 2025-09-10 | Chirality in Action: Time-Aware Video Representation Learning by Latent Straightening | Piyush Bagad et.al. | 2509.08502 | null |
| 2025-09-10 | LD-ViCE: Latent Diffusion Model for Video Counterfactual Explanations | Payal Varshney et.al. | 2509.08422 | null |
| 2025-09-09 | EHWGesture – A dataset for multimodal understanding of clinical gestures | Gianluca Amprimo et.al. | 2509.07525 | null |
| 2025-09-09 | G3CN: Gaussian Topology Refinement Gated Graph Convolutional Network for Skeleton-Based Action Recognition | Haiqing Ren et.al. | 2509.07335 | null |
| 2025-08-05 | Live Demonstration: Neuromorphic Radar for Gesture Recognition | Satyapreet Singh Yadav et.al. | 2508.03324 | null |
| 2025-07-22 | Beyond Label Semantics: Language-Guided Action Anatomy for Few-shot Action Recognition | Zefeng Qian et.al. | 2507.16287 | null |
| 2025-07-22 | SPACT18: Spiking Human Action Recognition Benchmark Dataset with Complementary RGB and Thermal Modalities | Yasser Ashraf et.al. | 2507.16151 | null |
| 2025-07-20 | Light Future: Multimodal Action Frame Prediction via InstructPix2Pix | Zesen Zhong et.al. | 2507.14809 | null |
| 2025-07-17 | A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains | Antonio Finocchiaro et.al. | 2507.13326 | null |
| 2025-07-17 | Rethinking the Embodied Gap in Vision-and-Language Navigation: A Holistic Study of Physical and Visual Disparities | Liuyi Wang et.al. | 2507.13019 | null |
| 2025-07-17 | Generalist Bimanual Manipulation via Foundation Video Diffusion Models | Yao Feng et.al. | 2507.12898 | null |
| 2025-07-16 | Predicting Soccer Penalty Kick Direction Using Human Action Recognition | David Freire-Obregón et.al. | 2507.12617 | null |
| 2025-07-18 | DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition | Hayat Ullah et.al. | 2507.12426 | null |
| 2025-07-16 | Calisthenics Skills Temporal Video Segmentation | Antonio Finocchiaro et.al. | 2507.12245 | null |
| 2025-07-15 | Diffusion-Based Imaginative Coordination for Bimanual Manipulation | Huilin Xu et.al. | 2507.11296 | null |
| 2025-07-15 | Women Sport Actions Dataset for Visual Classification Using Small Scale Training Data | Palash Ray et.al. | 2507.10969 | null |
| 2025-07-14 | Hand Gesture Recognition for Collaborative Robots Using Lightweight Deep Learning in Real-Time Robotic Systems | Muhtadin et.al. | 2507.10055 | null |
| 2025-07-13 | Online Micro-gesture Recognition Using Data Augmentation and Spatial-Temporal Attention | Pengyu Liu et.al. | 2507.09512 | null |
| 2025-07-11 | MM-Gesture: Towards Precise Micro-Gesture Recognition through Multimodal Fusion | Jihao Gu et.al. | 2507.08344 | null |
| 2025-07-10 | Multimodal Framework for Explainable Autonomous Driving: Integrating Video, Sensor, and Textual Data for Enhanced Decision-Making and Transparency | Abolfazl Zarghani et.al. | 2507.07938 | null |
| 2025-07-10 | EEvAct: Early Event-Based Action Recognition with High-Rate Two-Stream Spiking Neural Networks | Michael Neumeier et.al. | 2507.07734 | null |
| 2025-07-09 | Cross-Modal Dual-Causal Learning for Long-Term Action Recognition | Xu Shaowu et.al. | 2507.06603 | null |
| 2025-07-08 | Hierarchical Multi-Stage Transformer Architecture for Context-Aware Temporal Action Localization | Hayat Ullah et.al. | 2507.06411 | null |
| 2025-07-10 | VOTE: Vision-Language-Action Optimization with Trajectory Ensemble Voting | Juyi Lin et.al. | 2507.05116 | link |
| 2025-07-07 | HV-MMBench: Benchmarking MLLMs for Human-Centric Video Understanding | Yuxuan Cai et.al. | 2507.04909 | null |
| 2025-07-06 | Visual Hand Gesture Recognition with Deep Learning: A Comprehensive Review of Methods, Datasets, Challenges and Future Research Directions | Konstantinos Foteinos et.al. | 2507.04465 | null |
| 2025-07-06 | DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge | Wenyao Zhang et.al. | 2507.04447 | link |
| 2025-07-04 | Masked Temporal Interpolation Diffusion for Procedure Planning in Instructional Videos | Yufan Zhou et.al. | 2507.03393 | link |
| 2025-07-05 | AC-DiT: Adaptive Coordination Diffusion Transformer for Mobile Manipulation | Sixiang Chen et.al. | 2507.01961 | null |
| 2025-07-02 | Variational Graph Convolutional Neural Networks | Illia Oleksiienko et.al. | 2507.01699 | null |
| 2025-07-01 | Zero-shot Skeleton-based Action Recognition with Prototype-guided Feature Alignment | Kai Zhou et.al. | 2507.00566 | null |
| 2025-06-30 | LineRetriever: Planning-Aware Observation Reduction for Web Agents | Imene Kerboua et.al. | 2507.00210 | null |
| 2025-06-30 | Online Human Action Detection during Escorting | Siddhartha Mondal et.al. | 2506.23573 | null |
| 2025-06-29 | DEL: Dense Event Localization for Multi-modal Audio-Visual Understanding | Mona Ahmadian et.al. | 2506.23196 | null |
| 2025-06-27 | Frequency-Semantic Enhanced Variational Autoencoder for Zero-Shot Skeleton-based Action Recognition | Wenhan Wu et.al. | 2506.22179 | null |
| 2025-06-26 | WorldVLA: Towards Autoregressive Action World Model | Jun Cen et.al. | 2506.21539 | link |
| 2025-06-26 | EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception | Sanjoy Chowdhury et.al. | 2506.21080 | null |
| 2025-06-25 | How do Foundation Models Compare to Skeleton-Based Approaches for Gesture Recognition in Human-Robot Interaction? | Stephanie Käs et.al. | 2506.20795 | null |
| 2025-06-25 | CARMA: Context-Aware Situational Grounding of Human-Robot Group Interactions by Combining Vision-Language Models with Object and Action Recognition | Joerg Deigmoeller et.al. | 2506.20373 | null |
| 2025-06-25 | Feature Hallucination for Self-supervised Action Recognition | Lei Wang et.al. | 2506.20342 | null |
| 2025-06-27 | ReactEMG: Zero-Shot, Low-Latency Intent Detection via sEMG | Runsheng Wang et.al. | 2506.19815 | null |
| 2025-06-24 | Self-Paced Collaborative and Adversarial Network for Unsupervised Domain Adaptation | Weichen Zhang et.al. | 2506.19267 | null |
| 2025-06-23 | Including Semantic Information via Word Embeddings for Skeleton-based Action Recognition | Dustin Aganian et.al. | 2506.18721 | null |
| 2025-06-23 | Improving Weakly Supervised Temporal Action Localization by Exploiting Multi-resolution Information in Temporal Domain | Rui Su et.al. | 2506.18261 | null |
| 2025-06-23 | Robot Tactile Gesture Recognition Based on Full-body Modular E-skin | Shuo Jiang et.al. | 2506.18256 | null |
| 2025-06-22 | Adapting Vision-Language Models for Evaluating World Models | Mariya Hendriksen et.al. | 2506.17967 | null |
| 2025-06-21 | Domain Generalization using Action Sequences for Egocentric Action Recognition | Amirshayan Nasirimajd et.al. | 2506.17685 | null |
| 2025-06-20 | Wi-Fi Sensing Tool Release: Gathering 802.11ax Channel State Information from a Commercial Wi-Fi Access Point | Zisheng Wang et.al. | 2506.16957 | null |
| 2025-06-20 | Language-driven Description Generation and Common Sense Reasoning for Video Action Recognition | Xiaodan Hu et.al. | 2506.16701 | null |
| 2025-06-19 | CLIP-MG: Guiding Semantic Attention with Skeletal Pose Features and RGB Data for Micro-Gesture Recognition on the iMiGUE Dataset | Santosh Patapati et.al. | 2506.16385 | null |
| 2025-06-18 | Accessible Gesture-Driven Augmented Reality Interaction System | Yikan Wang et.al. | 2506.15189 | null |
| 2025-06-17 | CDP: Towards Robust Autoregressive Visuomotor Policy Learning via Causal Diffusion | Jiahua Ma et.al. | 2506.14769 | null |
| 2025-06-16 | Leveraging Vision-Language Pre-training for Human Activity Recognition in Still Images | Cristina Mahanta et.al. | 2506.13458 | null |
| 2025-06-16 | Active Multimodal Distillation for Few-shot Action Recognition | Weijia Feng et.al. | 2506.13322 | null |
| 2025-06-16 | Action Dubber: Timing Audible Actions via Inflectional Flow | Wenlong Wan et.al. | 2506.13320 | null |
| 2025-06-15 | Towards Fine-Grained Emotion Understanding via Skeleton-Based Micro-Gesture Recognition | Hao Xu et.al. | 2506.12848 | null |
| 2025-06-13 | Pose Matters: Evaluating Vision Transformers and CNNs for Human Action Recognition on Small COCO Subsets | MingZe Tang et.al. | 2506.11678 | null |
| 2025-06-12 | GynSurg: A Comprehensive Gynecology Laparoscopic Surgery Dataset | Sahar Nasirihaghighi et.al. | 2506.11356 | null |
| 2025-06-12 | WaveFormer: A Lightweight Transformer Model for sEMG-based Gesture Recognition | Yanlong Chen et.al. | 2506.11168 | null |
| 2025-06-11 | SLRNet: A Real-Time LSTM-Based Sign Language Recognition System | Sharvari Kamble et.al. | 2506.11154 | link |
| 2025-06-10 | Gender Fairness of Machine Learning Algorithms for Pain Detection | Dylan Green et.al. | 2506.11132 | null |
| 2025-06-12 | Eye, Robot: Learning to Look to Act with a BC-RL Perception-Action Loop | Justin Kerr et.al. | 2506.10968 | null |
| 2025-06-11 | HopaDIFF: Holistic-Partial Aware Fourier Conditioned Diffusion for Referring Human Action Segmentation in Multi-Person Scenarios | Kunyu Peng et.al. | 2506.09650 | link |
| 2025-06-11 | Time-Unified Diffusion Policy with Action Discrimination for Robotic Manipulation | Ye Niu et.al. | 2506.09422 | null |
| 2025-06-11 | Synthetic Human Action Video Data Generation with Pose Transfer | Vaclav Knapp et.al. | 2506.09411 | null |
| 2025-06-11 | An Effective End-to-End Solution for Multimodal Action Recognition | Songping Wang et.al. | 2506.09345 | null |
| 2025-06-10 | Diver-Robot Communication Dataset for Underwater Hand Gesture Recognition | Igor Kvasić et.al. | 2506.08974 | null |
| 2025-06-09 | BridgeVLA: Input-Output Alignment for Efficient 3D Manipulation Learning with Vision-Language Models | Peiyan Li et.al. | 2506.07961 | link |
| 2025-06-08 | AugmentGest: Can Random Data Cropping Augmentation Boost Gesture Recognition Performance? | Nada Aboudeshish et.al. | 2506.07216 | null |
| 2025-06-08 | SAP-Bench: Benchmarking Multimodal Large Language Models in Surgical Action Planning | Mengya Xu et.al. | 2506.07196 | null |
| 2025-06-07 | PhysLab: A Benchmark Dataset for Multi-Granularity Visual Parsing of Physics Experiments | Minghao Zou et.al. | 2506.06631 | null |
| 2025-06-06 | Conversational Interfaces for Parametric Conceptual Architectural Design: Integrating Mixed Reality with LLM-driven Interaction | Ruochen Ji et.al. | 2506.06066 | null |
| 2025-06-06 | DriveAction: A Benchmark for Exploring Human-like Driving Decisions in VLA Models | Yuhan Hao et.al. | 2506.05667 | null |
| 2025-06-05 | Robustness Evaluation for Video Models with Reinforcement Learning | Ashwin Ramesh Babu et.al. | 2506.05431 | null |
| 2025-06-04 | Video, How Do Your Tokens Merge? | Sam Pollard et.al. | 2506.03885 | null |
| 2025-06-04 | Zero-Shot Temporal Interaction Localization for Egocentric Videos | Erhang Zhang et.al. | 2506.03662 | link |
| 2025-06-04 | Heterogeneous Skeleton-Based Action Representation Learning | Hongsong Wang et.al. | 2506.03481 | null |
| 2025-06-04 | Go Beyond Earth: Understanding Human Actions and Scenes in Microgravity Environments | Di Wen et.al. | 2506.02845 | link |
| 2025-06-03 | Technical Report for Ego4D Long-Term Action Anticipation Challenge 2025 | Qiaohui Chu et.al. | 2506.02550 | null |
| 2025-06-03 | VS-Bench: Evaluating VLMs for Strategic Reasoning and Decision-Making in Multi-Agent Environments | Zelai Xu et.al. | 2506.02387 | link |
| 2025-06-03 | Multi-level and Multi-modal Action Anticipation | Seulgi Kim et.al. | 2506.02382 | null |
| 2025-06-02 | TransAct V2: Lifelong User Action Sequence Modeling on Pinterest Recommendation | Xue Xia et.al. | 2506.02267 | null |
| 2025-06-02 | SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics | Mustafa Shukor et.al. | 2506.01844 | link |
| 2025-06-02 | Efficient Egocentric Action Recognition with Multimodal Data | Marco Calzavara et.al. | 2506.01757 | null |
| 2025-06-02 | EPFL-Smart-Kitchen-30: Densely annotated cooking dataset with 3D kinematics to challenge video and language models | Andy Bonnetto et.al. | 2506.01608 | link |
| 2025-06-02 | Sheep Facial Pain Assessment Under Weighted Graph Neural Networks | Alam Noor et.al. | 2506.01468 | null |
| 2025-06-02 | EgoBrain: Synergizing Minds and Eyes For Human Action Understanding | Nie Lin et.al. | 2506.01353 | null |
| 2025-05-30 | DiG-Net: Enhancing Quality of Life through Hyper-Range Dynamic Gesture Recognition in Assistive Robotics | Eran Bamani Beeri et.al. | 2505.24786 | null |
| 2025-05-30 | Beyond FACS: Data-driven Facial Expression Dictionaries, with Application to Predicting Autism | Evangelos Sariyanidi et.al. | 2505.24679 | null |
| 2025-05-30 | EgoExOR: An Ego-Exo-Centric Operating Room Dataset for Surgical Activity Understanding | Ege Özsoy et.al. | 2505.24287 | null |
| 2025-05-29 | Autoregressive Meta-Actions for Unified Controllable Trajectory Generation | Jianbo Zhao et.al. | 2505.23612 | null |
| 2025-05-29 | CLIP-AE: CLIP-assisted Cross-view Audio-Visual Enhancement for Unsupervised Temporal Action Localization | Rui Xia et.al. | 2505.23524 | null |
| 2025-05-29 | Spatio-Temporal Joint Density Driven Learning for Skeleton-Based Action Recognition | Shanaka Ramesh Gunasekara et.al. | 2505.23012 | link |
| 2025-05-28 | PRISM: Video Dataset Condensation with Progressive Refinement and Insertion for Sparse Motion | Jaehyun Choi et.al. | 2505.22564 | null |
| 2025-05-27 | DeepConvContext: A Multi-Scale Approach to Timeseries Classification in Human Activity Recognition | Marius Bock et.al. | 2505.20894 | link |
| 2025-05-27 | TrustSkin: A Fairness Pipeline for Trustworthy Facial Affect Analysis Across Skin Tone | Ana M. Cabanas et.al. | 2505.20637 | null |
| 2025-05-26 | Data-Free Class-Incremental Gesture Recognition with Prototype-Guided Pseudo Feature Replay | Hongsong Wang et.al. | 2505.20049 | link |
| 2025-05-26 | PHI: Bridging Domain Shift in Long-Term Action Quality Assessment via Progressive Hierarchical Instruction | Kanglei Zhou et.al. | 2505.19972 | link |
| 2025-05-26 | The Role of Video Generation in Enhancing Data-Limited Action Understanding | Wei Li et.al. | 2505.19495 | null |
| 2025-05-24 | ProphetDWM: A Driving World Model for Rolling Out Future Actions and Videos | Xiaodong Wang et.al. | 2505.18650 | null |
| 2025-05-27 | SHARDeg: A Benchmark for Skeletal Human Action Recognition in Degraded Scenarios | Simon Malzard et.al. | 2505.18048 | null |
| 2025-05-23 | 3D Face Reconstruction Error Decomposed: A Modular Benchmark for Fair and Fast Method Evaluation | Evangelos Sariyanidi et.al. | 2505.18025 | null |
| 2025-05-23 | Multi-task Learning For Joint Action and Gesture Recognition | Konstantinos Spathis et.al. | 2505.17867 | null |
| 2025-05-23 | Temporal Consistency Constrained Transferable Adversarial Attacks with Background Mixup for Action Recognition | Ping Li et.al. | 2505.17807 | link |
| 2025-05-23 | Integrating Counterfactual Simulations with Language Models for Explaining Multi-Agent Behaviour | Bálint Gyevnár et.al. | 2505.17801 | null |
| 2025-05-23 | SVL: Spike-based Vision-language Pretraining for Efficient 3D Open-world Understanding | Xuerui Qiu et.al. | 2505.17674 | null |
| 2025-05-23 | ProTAL: A Drag-and-Link Video Programming Framework for Temporal Action Localization | Yuchen He et.al. | 2505.17555 | null |
| 2025-05-22 | UAV Control with Vision-based Hand Gesture Recognition over Edge-Computing | Sousannah Abdalla et.al. | 2505.17303 | null |
| 2025-05-22 | CoMo: Learning Continuous Latent Motion from Internet Videos for Scalable Robot Learning | Jiange Yang et.al. | 2505.17006 | null |
| 2025-05-21 | Towards Zero-Shot Differential Morphing Attack Detection with Multimodal Large Language Models | Ria Shekhawat et.al. | 2505.15332 | null |
| 2025-05-21 | DiffProb: Data Pruning for Face Recognition | Eduarda Caldeira et.al. | 2505.15272 | link |
| 2025-05-21 | Leveraging Foundation Models for Multimodal Graph-Based Action Recognition | Fatemeh Ziaeetabar et.al. | 2505.15192 | null |
| 2025-05-20 | Egocentric Action-aware Inertial Localization in Point Clouds | Mingfang Zhang et.al. | 2505.14346 | link |
| 2025-05-20 | Transfer Learning from Visual Speech Recognition to Mouthing Recognition in German Sign Language | Dinh Nam Pham et.al. | 2505.13784 | link |
| 2025-05-18 | MTIL: Encoding Full History with Mamba for Temporal Imitation Learning | Yulin Zhou et.al. | 2505.12410 | link |
| 2025-05-20 | Aux-Think: Exploring Reasoning Strategies for Data-Efficient Vision-Language Navigation | Shuo Wang et.al. | 2505.11886 | null |
| 2025-05-16 | Dynam3D: Dynamic Layered 3D Tokens Empower VLM for Vision-and-Language Navigation | Zihan Wang et.al. | 2505.11383 | link |
| 2025-05-15 | NeoLightning: A Modern Reimagination of Gesture-Based Sound Design | Yonghyun Kim et.al. | 2505.10686 | link |
| 2025-05-15 | Are Spatial-Temporal Graph Convolution Networks for Human Action Recognition Over-Parameterized? | Jianyang Xie et.al. | 2505.10679 | link |
| 2025-05-14 | Mission Balance: Generating Under-represented Class Samples using Video Diffusion Models | Danush Kumar Venkatesh et.al. | 2505.09858 | link |
| 2025-05-13 | Reinforcement Learning meets Masked Video Modeling: Trajectory-Guided Adaptive Token Selection | Ayush K. Rai et.al. | 2505.08561 | null |
| 2025-05-17 | Training Strategies for Efficient Embodied Reasoning | William Chen et.al. | 2505.08243 | null |
| 2025-05-12 | H$^{\mathbf{3}}$DP: Triply-Hierarchical Diffusion Policy for Visuomotor Learning | Yiyang Lu et.al. | 2505.07819 | null |
| 2025-05-11 | DeepSORT-Driven Visual Tracking Approach for Gesture Recognition in Interactive Systems | Tong Zhang et.al. | 2505.07110 | null |
| 2025-05-10 | A Short Overview of Multi-Modal Wi-Fi Sensing | Zijian Zhao et.al. | 2505.06682 | link |
| 2025-05-09 | Context Informed Incremental Learning Improves Myoelectric Control Performance in Virtual Reality Object Manipulation Tasks | Gabriel Gagné et.al. | 2505.06064 | link |
| 2025-05-09 | Task-Adapter++: Task-specific Adaptation with Order-aware Alignment for Few-shot Action Recognition | Congqi Cao et.al. | 2505.06002 | link |
| 2025-05-07 | DetReIDX: A Stress-Test Dataset for Real-World UAV-Based Person Recognition | Kailash A. Hambarde et.al. | 2505.04793 | link |
| 2025-05-07 | Comparison of Visual Trackers for Biomechanical Analysis of Running | Luis F. Gomez et.al. | 2505.04713 | null |
| 2025-05-07 | Trajectory Entropy Reinforcement Learning for Predictable and Robust Control | Bang You et.al. | 2505.04193 | null |
| 2025-05-07 | FoodTrack: Estimating Handheld Food Portions with Egocentric Video | Ervin Wang et.al. | 2505.04055 | null |
| 2025-05-06 | Action Spotting and Precise Event Detection in Sports: Datasets, Methods, and Challenges | Hao Xu et.al. | 2505.03991 | null |
| 2025-05-03 | A Multimodal Framework for Explainable Evaluation of Soft Skills in Educational Environments | Jared D. T. Guerrero-Sosa et.al. | 2505.01794 | null |
| 2025-05-01 | Predicting Estimated Times of Restoration for Electrical Outages Using Longitudinal Tabular Transformers | Bogireddy Sai Prasanna Teja et.al. | 2505.00225 | null |
| 2025-04-30 | Direct Motion Models for Assessing Generated Videos | Kelsey Allen et.al. | 2505.00209 | null |
| 2025-04-30 | CoCoDiff: Diversifying Skeleton Action Features via Coarse-Fine Text-Co-Guided Latent Diffusion | Zhifu Zhao et.al. | 2504.21266 | null |
| 2025-04-29 | Beyond the Horizon: Decoupling UAVs Multi-View Action Recognition via Partial Order Transfer | Wenxuan Liu et.al. | 2504.20530 | null |
| 2025-04-28 | ProFi-Net: Prototype-based Feature Attention with Curriculum Augmentation for WiFi-based Gesture Recognition | Zhe Cui et.al. | 2504.20193 | null |
| 2025-04-28 | FSBench: A Figure Skating Benchmark for Advancing Artistic Sports Understanding | Rong Gao et.al. | 2504.19514 | null |
| 2025-04-26 | 3DPyranet Features Fusion for Spatio-temporal Feature Learning | Ihsan Ullah et.al. | 2504.18977 | null |
| 2025-04-25 | POET: Prompt Offset Tuning for Continual Human Action Adaptation | Prachi Garg et.al. | 2504.18059 | null |
| 2025-04-25 | RSRNav: Reasoning Spatial Relationship for Image-Goal Navigation | Zheng Qin et.al. | 2504.17991 | null |
| 2025-04-24 | Robotic Task Ambiguity Resolution via Natural Language Interaction | Eugenio Chisari et.al. | 2504.17748 | null |
| 2025-04-23 | Latent Diffusion Planning for Imitation Learning | Amber Xie et.al. | 2504.16925 | null |
| 2025-04-23 | WiFi based Human Fall and Activity Recognition using Transformer based Encoder Decoder and Graph Neural Networks | Younggeol Cho et.al. | 2504.16655 | null |
| 2025-04-23 | Advancing Radar Hand Gesture Recognition: A Hybrid Spectrum Synthetic Framework Merging Simulation with Neural Networks | Jiaqi Tang et.al. | 2504.16423 | null |
| 2025-04-21 | Bridge the Gap: From Weak to Full Supervision for Temporal Action Localization with PseudoFormer | Ziyi Liu et.al. | 2504.14860 | null |
| 2025-04-20 | Time Frequency Analysis of EMG Signal for Gesture Recognition using Fine grained Features | Parshuram N. Aarotale et.al. | 2504.14708 | null |
| 2025-04-22 | Talk is Not Always Cheap: Promoting Wireless Sensing Models with Text Prompts | Zhenkui Yang et.al. | 2504.14621 | link |
| 2025-04-19 | Balancing Privacy and Action Performance: A Penalty-Driven Approach to Image Anonymization | Nazia Aslam et.al. | 2504.14301 | null |
| 2025-04-18 | Are you SURE? Enhancing Multimodal Pretraining with Missing Modalities through Uncertainty Estimation | Duy A. Nguyen et.al. | 2504.13465 | null |
| 2025-04-23 | Chain-of-Thought Textual Reasoning for Few-shot Temporal Action Localization | Hongwei Ji et.al. | 2504.13460 | null |
| 2025-04-17 | Wearable-Derived Behavioral and Physiological Biomarkers for Classifying Unipolar and Bipolar Depression Severity | Yassine Ouzar et.al. | 2504.13331 | null |
| 2025-04-17 | PCBEAR: Pose Concept Bottleneck for Explainable Action Recognition | Jongseo Lee et.al. | 2504.13140 | null |
| 2025-04-16 | SkeletonX: Data-Efficient Skeleton-based Action Recognition via Cross-sample Feature Aggregation | Zongye Zhang et.al. | 2504.11749 | link |
| 2025-04-14 | Toward Aligning Human and Robot Actions via Multi-Modal Demonstration Learning | Azizul Zahid et.al. | 2504.11493 | link |
| 2025-04-14 | H-MoRe: Learning Human-centric Motion Representation for Action Analysis | Zhanbo Huang et.al. | 2504.10676 | null |
| 2025-04-14 | Towards Low-Latency Event-based Obstacle Avoidance on a FPGA-Drone | Pietro Bonazzi et.al. | 2504.10400 | link |
| 2025-04-14 | Hierarchical Relation-augmented Representation Generalization for Few-shot Action Recognition | Hongyu Qu et.al. | 2504.10079 | null |
| 2025-04-14 | EmbodiedAgent: A Scalable Hierarchical Approach to Overcome Practical Challenge in Multi-Robot Control | Hanwen Wan et.al. | 2504.10030 | link |
| 2025-04-14 | Hands-On: Segmenting Individual Signs from Continuous Sequences | Low Jian He et.al. | 2504.08593 | null |
| 2025-04-11 | Knowledge Distillation for Multimodal Egocentric Action Recognition Robust to Missing Modalities | Maria Santos-Villafranca et.al. | 2504.08578 | null |
| 2025-04-11 | Breaking the Barriers: Video Vision Transformers for Word-Level Sign Language Recognition | Alexander Brettmann et.al. | 2504.07792 | null |
| 2025-04-10 | Towards Micro-Action Recognition with Limited Annotations: An Asynchronous Pseudo Labeling and Training Approach | Yan Zhang et.al. | 2504.07785 | null |
| 2025-04-13 | ID-Booth: Identity-consistent Face Generation with Diffusion Models | Darian Tomašević et.al. | 2504.07392 | link |
| 2025-04-09 | Leveraging GCN-based Action Recognition for Teleoperation in Daily Activity Assistance | Thomas M. Kwok et.al. | 2504.07001 | null |
| 2025-04-09 | Efficient Deployment of Spiking Neural Networks on SpiNNaker2 for DVS Gesture Recognition Using Neuromorphic Intermediate Representation | Sirine Arfa et.al. | 2504.06748 | null |
| 2025-04-09 | Exploring Ordinal Bias in Action Recognition for Instructional Videos | Joochan Kim et.al. | 2504.06580 | link |
| 2025-04-08 | FaceCloak: Learning to Protect Face Templates | Sudipta Banerjee et.al. | 2504.06131 | null |
| 2025-04-08 | Modular Soft Wearable Glove for Real-Time Gesture Recognition and Dynamic 3D Shape Reconstruction | Huazhi Dong et.al. | 2504.05983 | null |
| 2025-04-08 | Temporal Alignment-Free Video Matching for Few-shot Action Recognition | SuBeen Lee et.al. | 2504.05956 | null |
| 2025-04-08 | SEVERE++: Evaluating Benchmark Sensitivity in Generalization of Video Representation Learning | Fida Mohammad Thoker et.al. | 2504.05706 | null |
| 2025-04-07 | SelfMAD: Enhancing Generalization and Robustness in Morphing Attack Detection via Self-Supervised Learning | Marija Ivanovska et.al. | 2504.05504 | null |
| 2025-04-06 | SnapPix: Efficient-Coding–Inspired In-Sensor Compression for Edge Vision | Weikai Lin et.al. | 2504.04535 | null |
| 2025-04-04 | An Exploration-free Method for a Linear Stochastic Bandit Driven by a Linear Gaussian Dynamical System | Jonathan Gornet et.al. | 2504.03926 | null |
| 2025-04-04 | Electromyography-Based Gesture Recognition: Hierarchical Feature Extraction for Enhanced Spatial-Temporal Dynamics | Jungpil Shin et.al. | 2504.03221 | null |
| 2025-04-02 | UAC: Uncertainty-Aware Calibration of Neural Networks for Gesture Detection | Farida Al Haddad et.al. | 2504.02895 | null |
| 2025-04-03 | Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets | Chuning Zhu et.al. | 2504.02792 | link |
| 2025-04-03 | MultiSensor-Home: A Wide-area Multi-modal Multi-view Dataset for Action Recognition and Transformer-based Sensor Fusion | Trung Thanh Nguyen et.al. | 2504.02287 | link |
| 2025-04-07 | MultiTSF: Transformer-based Sensor Fusion for Human-Centric Multi-view and Multi-modal Action Recognition | Trung Thanh Nguyen et.al. | 2504.02279 | null |
| 2025-04-03 | SocialGesture: Delving into Multi-person Gesture Understanding | Xu Cao et.al. | 2504.02244 | null |
| 2025-04-02 | LSC-ADL: An Activity of Daily Living (ADL)-Annotated Lifelog Dataset Generated via Semi-Automatic Clustering | Minh-Quan Ho-Le et.al. | 2504.02060 | null |
| 2025-04-07 | Is Temporal Prompting All We Need For Limited Labeled Action Recognition? | Shreyank N Gowda et.al. | 2504.01890 | null |
| 2025-04-01 | FDDet: Frequency-Decoupling for Boundary Refinement in Temporal Action Detection | Xinnan Zhu et.al. | 2504.00647 | null |
| 2025-04-01 | Sample-level Adaptive Knowledge Distillation for Action Recognition | Ping Li et.al. | 2504.00606 | null |
| 2025-03-30 | CA^2ST: Cross-Attention in Audio, Space, and Time for Holistic Video Recognition | Jongseo Lee et.al. | 2503.23447 | null |
| 2025-03-30 | OwlSight: A Robust Illumination Adaptation Framework for Dark Video Human Action Recognition | Shihao Cheng et.al. | 2503.23266 | null |
| 2025-03-29 | Action Recognition in Real-World Ambient Assisted Living Environment | Vincent Gbouna Zakka et.al. | 2503.23214 | link |
| 2025-03-28 | ForcePose: A Deep Learning Approach for Force Calculation Based on Action Recognition Using MediaPipe Pose Estimation Combined with Object Detection | Nandakishor M et.al. | 2503.22363 | null |
| 2025-03-30 | UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning | Zhengxi Lu et.al. | 2503.21620 | link |
| 2025-03-27 | One Snapshot is All You Need: A Generalized Method for mmWave Signal Generation | Teng Huang et.al. | 2503.21122 | null |
| 2025-03-26 | ScreenLLM: Stateful Screen Schema for Efficient Action Understanding and Prediction | Yiqiao Jin et.al. | 2503.20978 | null |
| 2025-03-26 | Siformer: Feature-isolated Transformer for Efficient Skeleton-based Sign Language Recognition | Muxin Pu et.al. | 2503.20436 | null |
| 2025-03-25 | Surg-3M: A Dataset and Foundation Model for Perception in Surgical Settings | Chengan Che et.al. | 2503.19740 | link |
| 2025-03-25 | fine-CLIP: Enhancing Zero-Shot Fine-Grained Surgical Action Recognition with Vision-Language Models | Saurav Sharma et.al. | 2503.19670 | null |
| 2025-03-24 | LLaVAction: evaluating and training multi-modal large language models for action recognition | Shaokai Ye et.al. | 2503.18712 | link |
| 2025-03-24 | Surgical Action Planning with Large Language Models | Mengya Xu et.al. | 2503.18296 | null |
| 2025-03-27 | Temporal-Guided Spiking Neural Networks for Event-Based Human Action Recognition | Siyuan Yang et.al. | 2503.17132 | null |
| 2025-03-21 | BEAC: Imitating Complex Exploration and Task-oriented Behaviors for Invisible Object Nonprehensile Manipulation | Hirotaka Tahara et.al. | 2503.16803 | null |
| 2025-03-21 | Improving mmWave based Hand Hygiene Monitoring through Beam Steering and Combining Techniques | Isura Nirmal et.al. | 2503.16764 | null |
| 2025-03-19 | A Comprehensive Survey on Architectural Advances in Deep CNNs: Challenges, Applications, and Emerging Research Directions | Saddam Hussain Khan et.al. | 2503.16546 | null |
| 2025-03-25 | Deep learning framework for action prediction reveals multi-timescale locomotor control | Wei-Chen Wang et.al. | 2503.16340 | null |
| 2025-03-19 | UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction | Shravan Nayak et.al. | 2503.15661 | null |
| 2025-03-19 | Multi-Modal Gesture Recognition from Video and Surgical Tool Pose Information via Motion Invariants | Jumanh Atoum et.al. | 2503.15647 | null |
| 2025-03-21 | Body-Hand Modality Expertized Networks with Cross-attention for Fine-grained Skeleton Action Recognition | Seungyeon Cho et.al. | 2503.14960 | null |
| 2025-03-19 | DPFlow: Adaptive Optical Flow Estimation with a Dual-Pyramid Framework | Henrique Morimitsu et.al. | 2503.14880 | link |
| 2025-03-15 | Salient Temporal Encoding for Dynamic Scene Graph Generation | Zhihao Zhu et.al. | 2503.14524 | null |
| 2025-03-17 | Towards Scalable Modeling of Compressed Videos for Efficient Action Recognition | Shristi Das Biswas et.al. | 2503.13724 | null |
| 2025-03-20 | STEP: Simultaneous Tracking and Estimation of Pose for Animals and Humans | Shashikant Verma et.al. | 2503.13344 | null |
| 2025-03-17 | Dense Policy: Bidirectional Autoregressive Learning of Actions | Yue Su et.al. | 2503.13217 | null |
| 2025-03-16 | EgoEvGesture: Gesture Recognition Based on Egocentric Event Camera | Luming Wang et.al. | 2503.12419 | link |
| 2025-03-16 | ProbDiffFlow: An Efficient Learning-Free Framework for Probabilistic Single-Image Optical Flow Estimation | Mo Zhou et.al. | 2503.12348 | null |
| 2025-03-15 | Real-Time Manipulation Action Recognition with a Factorized Graph Sequence Encoder | Enes Erdogan et.al. | 2503.12034 | null |
| 2025-03-14 | Enhancing Hand Palm Motion Gesture Recognition by Eliminating Reference Frame Bias via Frame-Invariant Similarity Measures | Arno Verduyn et.al. | 2503.11352 | null |
| 2025-03-14 | Aerial Vision-and-Language Navigation with Grid-based View Selection and Map Construction | Ganlong Zhao et.al. | 2503.11091 | null |
| 2025-03-14 | VA-AR: Learning Velocity-Aware Action Representations with Mixture of Window Attention | Jiangning Wei et.al. | 2503.11004 | null |
| 2025-03-13 | Spatial-Temporal Graph Diffusion Policy with Kinematic Modeling for Bimanual Robotic Manipulation | Qi Lv et.al. | 2503.10743 | null |
| 2025-03-11 | Open-World Skill Discovery from Unsegmented Demonstrations | Jingwen Deng et.al. | 2503.10684 | null |
| 2025-03-17 | HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model | Jiaming Liu et.al. | 2503.10631 | null |
| 2025-03-13 | SurgRAW: Multi-Agent Workflow with Chain-of-Thought Reasoning for Surgical Intelligence | Chang Han Low et.al. | 2503.10265 | null |
| 2025-03-12 | A Hybrid Neural Network with Smart Skip Connections for High-Precision, Low-Latency EMG-Based Hand Gesture Recognition | Hafsa Wazir et.al. | 2503.09041 | null |
| 2025-03-12 | Unified Locomotion Transformer with Simultaneous Sim-to-Real Transfer for Quadrupeds | Dikai Liu et.al. | 2503.08997 | null |
| 2025-03-11 | PromptGAR: Flexible Promptive Group Activity Recognition | Zhangyu Jin et.al. | 2503.08933 | null |
| 2025-03-11 | MetaFold: Language-Guided Multi-Category Garment Folding Framework via Trajectory Generation and Foundation Model | Haonan Chen et.al. | 2503.08372 | null |
| 2025-03-11 | A Survey on Wi-Fi Sensing Generalizability: Taxonomy, Techniques, Datasets, and Future Research Prospects | Fei Wang et.al. | 2503.08008 | null |
| 2025-03-10 | Helios 2.0: A Robust, Ultra-Low Power Gesture Recognition System Optimised for Event-Sensor based Wearables | Prarthana Bhattacharyya et.al. | 2503.07825 | null |
| 2025-03-10 | Elderly Activity Recognition in the Wild: Results from the EAR Challenge | Anh-Kiet Duong et.al. | 2503.07821 | null |
| 2025-03-09 | TimeLoc: A Unified End-to-End Framework for Precise Timestamp Localization in Long Videos | Chen-Lin Zhang et.al. | 2503.06526 | link |
| 2025-03-09 | SGA-INTERACT: A 3D Skeleton-based Benchmark for Group Activity Understanding in Modern Basketball Tactic | Yuchen Yang et.al. | 2503.06522 | link |
| 2025-03-07 | MPTSNet: Integrating Multiscale Periodic Local Patterns and Global Dependencies for Multivariate Time Series Classification | Yang Mu et.al. | 2503.05582 | null |
| 2025-03-07 | Multi-Grained Feature Pruning for Video-Based Human Pose Estimation | Zhigang Wang et.al. | 2503.05365 | null |
| 2025-03-06 | Maestro: A 302 GFLOPS/W and 19.8GFLOPS RISC-V Vector-Tensor Architecture for Wearable Ultrasound Edge Computing | Mattia Sinigaglia et.al. | 2503.04581 | null |
| 2025-03-06 | Gate-Shift-Pose: Enhancing Action Recognition in Sports with Skeleton Information | Edoardo Bianchi et.al. | 2503.04470 | null |
| 2025-03-06 | Spatial-Temporal Perception with Causal Inference for Naturalistic Driving Action Recognition | Qing Chang et.al. | 2503.04078 | null |
| 2025-03-06 | Social Gesture Recognition in spHRI: Leveraging Fabric-Based Tactile Sensing on Humanoid Robots | Dakarai Crowder et.al. | 2503.03234 | null |
| 2025-03-04 | Semi-Supervised Audio-Visual Video Action Recognition with Audio Source Localization Guided Mixup | Seokun Kang et.al. | 2503.02284 | null |
| 2025-03-04 | FABG: End-to-end Imitation Learning for Embodied Affective Human-Robot Interaction | Yanghai Zhang et.al. | 2503.01363 | null |
| 2025-03-04 | An Efficient 3D Convolutional Neural Network with Channel-wise, Spatial-grouped, and Temporal Convolutions | Zhe Wang et.al. | 2503.00796 | null |
| 2025-03-02 | One-Shot Gesture Recognition for Underwater Diver-To-Robot Communication | Rishikesh Joshi et.al. | 2503.00676 | null |
| 2025-03-04 | Unified Video Action Model | Shuang Li et.al. | 2503.00200 | null |
| 2025-02-28 | BST: Badminton Stroke-type Transformer for Skeleton-based Action Recognition in Racket Sports | Jing-Yuan Chang et.al. | 2502.21085 | null |
| 2025-02-27 | Learning to Generalize without Bias for Open-Vocabulary Action Recognition | Yating Yu et.al. | 2502.20158 | null |
| 2025-02-27 | QORT-Former: Query-optimized Real-time Transformer for Understanding Two Hands Manipulating Objects | Elkhan Ismayilzada et.al. | 2502.19769 | null |
| 2025-02-26 | Deep Learning For Time Series Analysis With Application On Human Motion | Ali Ismail-Fawaz et.al. | 2502.19364 | null |
| 2025-02-26 | UQABench: Evaluating User Embedding for Prompting LLMs in Personalized Question Answering | Langming Liu et.al. | 2502.19178 | null |
| 2025-02-25 | EgoSim: An Egocentric Multi-view Simulator and Real Dataset for Body-worn Cameras during Motion and Activity | Dominik Hollidt et.al. | 2502.18373 | null |
| 2025-02-25 | Edge Training and Inference with Analog ReRAM Technology for Hand Gesture Recognition | Victoria Clerico et.al. | 2502.18152 | null |
| 2025-02-23 | Trunk-branch Contrastive Network with Multi-view Deformable Aggregation for Multi-view Action Recognition | Yingyuan Yang et.al. | 2502.16493 | null |
| 2025-02-20 | Online hand gesture recognition using Continual Graph Transformers | Rim Slama et.al. | 2502.14939 | null |
| 2025-02-19 | Are Rules Meant to be Broken? Understanding Multilingual Moral Reasoning as a Computational Pipeline with UniMoral | Shivani Kumar et.al. | 2502.14083 | null |
| 2025-02-19 | PSCon: Toward Conversational Product Search | Jie Zou et.al. | 2502.13881 | null |
| 2025-02-19 | SNN-Driven Multimodal Human Action Recognition via Event Camera and Skeleton Data Fusion | Naichuan Zheng et.al. | 2502.13385 | null |
| 2025-02-18 | Beyond Timesteps: A Novel Activation-wise Membrane Potential Propagation Mechanism for Spiking Neural Networks in 3D cloud | Jian Song et.al. | 2502.12791 | null |
| 2025-02-18 | Adaptive Prototype Model for Attribute-based Multi-label Few-shot Action Recognition | Juefeng Xiao et.al. | 2502.12582 | null |
| 2025-02-25 | Duo Streamers: A Streaming Gesture Recognition Framework | Boxuan Zhu et.al. | 2502.12297 | link |
| 2025-02-17 | Can LLMs Simulate Social Media Engagement? A Study on Action-Guided Response Generation | Zhongyi Qiu et.al. | 2502.12073 | null |
| 2025-02-14 | ManiTrend: Bridging Future Generation and Action Prediction with 3D Flow for Robotic Manipulation | Yuxin He et.al. | 2502.10028 | null |
| 2025-02-14 | VicKAM: Visual Conceptual Knowledge Guided Action Map for Weakly Supervised Group Activity Recognition | Zhuming Wang et.al. | 2502.09967 | null |
| 2025-02-13 | CellFlow: Simulating Cellular Morphology Changes via Flow Matching | Yuhui Zhang et.al. | 2502.09775 | null |
| 2025-02-12 | Measuring Anxiety Levels with Head Motion Patterns in Severe Depression Population | Fouad Boualeb et.al. | 2502.08813 | null |
| 2025-02-18 | Robot Data Curation with Mutual Information Estimators | Joey Hejna et.al. | 2502.08623 | null |
| 2025-02-12 | DGSense: A Domain Generalization Framework for Wireless Sensing | Rui Zhou et.al. | 2502.08155 | null |
| 2025-02-11 | Enhancing Video Understanding: Deep Neural Networks for Spatiotemporal Analysis | Amir Hosein Fadaei et.al. | 2502.07277 | null |
| 2025-02-10 | From Image to Video: An Empirical Study of Diffusion Representations | Pedro Vélez et.al. | 2502.07001 | null |
| 2025-02-10 | Conformal Predictions for Human Action Recognition with Vision-Language Models | Bary Tim et.al. | 2502.06631 | null |
| 2025-02-10 | AppVLM: A Lightweight Vision Language Model for Online App Control | Georgios Papoudakis et.al. | 2502.06395 | null |
| 2025-02-09 | Preventing Rogue Agents Improves Multi-Agent Collaboration | Ohav Barbi et.al. | 2502.05986 | link |
| 2025-02-09 | HyLiFormer: Hyperbolic Linear Attention for Skeleton-based Human Action Recognition | Yue Li et.al. | 2502.05869 | null |
| 2025-02-11 | HAMSTER: Hierarchical Action Models For Open-World Robot Manipulation | Yi Li et.al. | 2502.05485 | null |
| 2025-02-06 | HD-EPIC: A Highly-Detailed Egocentric Video Dataset | Toby Perrett et.al. | 2502.04144 | null |
| 2025-02-06 | MD-BERT: Action Recognition in Dark Videos via Dynamic Multi-Stream Fusion and Temporal Modeling | Sharana Dharshikgan Suresh Dass et.al. | 2502.03724 | null |
| 2025-02-10 | Kronecker Mask and Interpretive Prompts are Language-Action Video Learners | Jingyi Yang et.al. | 2502.03549 | null |
| 2025-02-05 | SKI Models: Skeleton Induced Vision-Language Embeddings for Understanding Activities of Daily Living | Arkaprava Sinha et.al. | 2502.03459 | null |
| 2025-02-01 | Minimalistic Video Saliency Prediction via Efficient Decoder & Spatio Temporal Action Cues | Rohit Girmaji et.al. | 2502.00397 | null |
| 2025-01-31 | ALBAR: Adversarial Learning approach to mitigate Biases in Action Recognition | Joseph Fioresi et.al. | 2502.00156 | null |
| 2025-01-31 | From Soft Materials to Controllers with NeuroTouch: A Neuromorphic Tactile Sensor for Real-Time Gesture Recognition | Victor Hoffmann et.al. | 2501.19174 | null |
| 2025-01-31 | XRF V2: A Dataset for Action Summarization with Wi-Fi Signals, and IMUs in Phones, Watches, Earbuds, and Glasses | Bo Lan et.al. | 2501.19034 | link |
| 2025-02-03 | Advances in Multimodal Adaptation and Generalization: From Traditional Approaches to Foundation Models | Hao Dong et.al. | 2501.18592 | link |
| 2025-01-29 | Action Recognition Using Temporal Shift Module and Ensemble Learning | Anh-Kiet Duong et.al. | 2501.17550 | link |
| 2025-01-28 | Bones of Contention: Exploring Query-Efficient Attacks Against Skeleton Recognition Systems | Yuxin Cao et.al. | 2501.16843 | null |
| 2025-01-27 | A Low-Cost, High-Precision Human-Machine Interaction Solution Based on Multi-Coil Wireless Charging Pads | Bojun Zhang et.al. | 2501.15885 | null |
| 2025-01-25 | Recognize Any Surgical Object: Unleashing the Power of Weakly-Supervised Data | Jiajie Li et.al. | 2501.15326 | null |
| 2025-01-27 | ACT-JEPA: Joint-Embedding Predictive Architecture Improves Policy Representation Learning | Aleksandar Vujinovic et.al. | 2501.14622 | null |
| 2025-01-24 | Optimizing Human Pose Estimation Through Focused Human and Joint Regions | Yingying Jiao et.al. | 2501.14439 | null |
| 2025-01-24 | Human Activity Recognition with a 6.5 GHz Reconfigurable Intelligent Surface for Wi-Fi 6E | Nuno Paulino et.al. | 2501.14423 | null |
| 2025-01-23 | MV-GMN: State Space Model for Multi-View Action Recognition | Yuhui Lin et.al. | 2501.13829 | null |
| 2025-01-23 | EgoHand: Ego-centric Hand Pose Estimation and Gesture Recognition with Head-mounted Millimeter-wave Radar and IMUs | Yizhe Lv et.al. | 2501.13805 | link |
| 2025-01-22 | SMART-Vision: Survey of Modern Action Recognition Techniques in Vision | Ali K. AlShami et.al. | 2501.13066 | null |
| 2025-01-22 | Can masking background and object reduce static bias for zero-shot action recognition? | Takumi Fukuzawa et.al. | 2501.12681 | null |
| 2025-01-21 | BlanketGen2-Fit3D: Synthetic Blanket Augmentation Towards Improving Real-World In-Bed Blanket Occluded Human Pose Estimation | Tamás Karácsony et.al. | 2501.12318 | null |
| 2025-01-21 | InsTALL: Context-aware Instructional Task Assistance with Multi-modal Large Language Models | Pha Nguyen et.al. | 2501.12231 | null |
| 2025-01-21 | DSTSA-GCN: Advancing Skeleton-Based Gesture Recognition with Semantic-Aware Spatio-Temporal Topology Modeling | Hu Cui et.al. | 2501.12086 | null |
| 2025-01-21 | Survey on Hand Gesture Recognition from Visual Input | Manousos Linardakis et.al. | 2501.11992 | null |
| 2025-01-19 | Rethinking Pseudo-Label Guided Learning for Weakly Supervised Temporal Action Localization from the Perspective of Noise Correction | Quan Zhang et.al. | 2501.11124 | null |
| 2025-01-23 | HFGCN: Hypergraph Fusion Graph Convolutional Networks for Skeleton-Based Action Recognition | Pengcheng Dong et.al. | 2501.11007 | null |
| 2025-01-18 | BAP v2: An Enhanced Task Framework for Instruction Following in Minecraft Dialogues | Prashant Jayannavar et.al. | 2501.10836 | null |
| 2025-01-15 | Visual WetlandBirds Dataset: Bird Species Identification and Behavior Recognition in Videos | Javier Rodriguez-Juan et.al. | 2501.08931 | link |
| 2025-01-13 | Collaborative Learning for 3D Hand-Object Reconstruction and Compositional Action Recognition from Egocentric RGB Videos Using Superquadrics | Tze Ho Elden Tse et.al. | 2501.07100 | null |
| 2025-01-12 | DRDT3: Diffusion-Refined Decision Test-Time Training Model | Xingshuai Huang et.al. | 2501.06718 | null |
| 2025-01-07 | Language and Planning in Robotic Navigation: A Multilingual Evaluation of State-of-the-Art Models | Malak Mansour et.al. | 2501.05478 | null |
| 2025-01-09 | Improving Skeleton-based Action Recognition with Interactive Object Information | Hao Wen et.al. | 2501.05066 | link |
| 2025-01-08 | Video Summarisation with Incident and Context Information using Generative AI | Ulindu De Silva et.al. | 2501.04764 | null |
| 2025-01-08 | Assessing the Acceptance of a Mid-Air Gesture Syntax for Smart Space Interaction: An Empirical Study | Ana M. Bernardos et.al. | 2501.04464 | null |
| 2025-01-07 | Extraction Of Cumulative Blobs From Dynamic Gestures | Rishabh Naulakha et.al. | 2501.04002 | null |
| 2025-01-06 | Large Language Models for Video Surveillance Applications | Ulindu De Silva et.al. | 2501.02850 | null |
| 2025-01-05 | Evolving Skeletons: Motion Dynamics in Action Recognition | Jushang Qiu et.al. | 2501.02593 | null |
| 2025-01-02 | SeFAR: Semi-supervised Fine-grained Action Recognition with Temporal Perturbation and Learning Stabilization | Yongle Huang et.al. | 2501.01245 | link |
| 2025-01-02 | Event Masked Autoencoder: Point-wise Action Recognition with Event-Based Cameras | Jingkai Sun et.al. | 2501.01040 | null |
| 2025-01-01 | Multiscaled Multi-Head Attention-based Video Transformer Network for Hand Gesture Recognition | Mallika Garg et.al. | 2501.00935 | null |
| 2025-01-01 | Multimodal Large Models Are Effective Action Anticipators | Binglu Wang et.al. | 2501.00795 | link |
| 2024-12-31 | M2I2: Learning Efficient Multi-Agent Communication via Masked State Modeling and Intention Inference | Chuxiong Sun et.al. | 2501.00312 | null |
| 2024-12-30 | A Large-Scale Study on Video Action Dataset Condensation | Yang Chen et.al. | 2412.21197 | null |
| 2024-12-30 | Frequency-aware Event Cloud Network | Hongwei Ren et.al. | 2412.20803 | null |
| 2024-12-29 | FreqMixFormerV2: Lightweight Frequency-aware Mixed Transformer for Human Skeleton Action Recognition | Wenhan Wu et.al. | 2412.20621 | link |
| 2024-12-29 | Exploiting Aggregation and Segregation of Representations for Domain Adaptive Human Pose Estimation | Qucheng Peng et.al. | 2412.20538 | link |
| 2024-12-29 | Improving Vision-Language-Action Models via Chain-of-Affordance | Jinming Li et.al. | 2412.20451 | null |
| 2024-12-28 | DAVE: Diverse Atomic Visual Elements Dataset with High Representation of Vulnerable Road Users in Complex and Unpredictable Environments | Xijun Wang et.al. | 2412.20042 | null |
| 2024-12-27 | Generalized Uncertainty-Based Evidential Fusion with Hybrid Multi-Head Attention for Weak-Supervised Temporal Action Localization | Yuanpeng He et.al. | 2412.19418 | link |
| 2024-12-25 | SWAG: Long-term Surgical Workflow Prediction with Generative-based Anticipation | Maxence Boels et.al. | 2412.18849 | null |
| 2024-12-25 | Skeleton-based Action Recognition with Non-linear Dependency Modeling and Hilbert-Schmidt Independence Criterion | Yuheng Yang et.al. | 2412.18780 | link |
| 2024-12-24 | Computer Vision-Driven Gesture Recognition: Toward Natural and Intuitive Human-Computer | Fenghua Shao et.al. | 2412.18321 | null |
| 2024-12-23 | HumanVBench: Exploring Human-Centric Video Understanding Capabilities of MLLMs with Synthetic Benchmark Data | Ting Zhou et.al. | 2412.17574 | null |
| 2024-12-22 | Video Domain Incremental Learning for Human Action Recognition in Home Environments | Yuanda Hu et.al. | 2412.16946 | null |
| 2024-12-21 | Optical Wireless Communications: Enabling the Next Generation Network of Networks | Aravindh Krishnamoorthy et.al. | 2412.16798 | null |
| 2024-12-21 | FACTS: Fine-Grained Action Classification for Tactical Sports | Christopher Lai et.al. | 2412.16454 | null |
| 2024-12-20 | iRadar: Synthesizing Millimeter-Waves from Wearable Inertial Inputs for Human Gesture Sensing | Huanqi Yang et.al. | 2412.15980 | null |
| 2024-12-19 | Synchronized and Fine-Grained Head for Skeleton-Based Ambiguous Action Recognition | Hao Huang et.al. | 2412.14833 | null |
| 2024-12-19 | Prototypical Calibrating Ambiguous Samples for Micro-Action Recognition | Kun Li et.al. | 2412.14719 | link |
| 2024-12-24 | Towards Generalist Robot Policies: What Matters in Building Vision-Language-Action Models | Xinghang Li et.al. | 2412.14058 | link |
| 2024-12-18 | Do Language Models Understand Time? | Xi Ding et.al. | 2412.13845 | link |
| 2024-12-17 | CompactFlowNet: Efficient Real-time Optical Flow Estimation on Mobile Devices | Andrei Znobishchev et.al. | 2412.13273 | null |
| 2024-12-20 | Future Aspects in Human Action Recognition: Exploring Emerging Techniques and Ethical Influences | Antonios Gasteratos et.al. | 2412.12990 | null |
| 2024-12-16 | Designing Semi-Structured Pruning of Graph Convolutional Networks for Skeleton-based Recognition | Hichem Sahbi et.al. | 2412.11813 | null |
| 2024-12-13 | TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies | Ruijie Zheng et.al. | 2412.10345 | null |
| 2024-12-13 | Building a Multi-modal Spatiotemporal Expert for Zero-shot Action Recognition with CLIP | Yating Yu et.al. | 2412.09895 | link |
| 2024-12-14 | USDRL: Unified Skeleton-Based Dense Representation Learning with Multi-Grained Feature Decorrelation | Wanjiang Weng et.al. | 2412.09220 | link |
| 2024-12-13 | Temporal Action Localization with Cross Layer Task Decoupling and Refinement | Qiang Li et.al. | 2412.09202 | link |
| 2024-12-12 | Goal-Conditioned Supervised Learning for Multi-Objective Recommendation | Shijun Li et.al. | 2412.08911 | null |
| 2024-12-10 | SAT: Spatial Aptitude Training for Multimodal Language Models | Arijit Ray et.al. | 2412.07755 | link |
| 2024-12-10 | Manta: Enhancing Mamba for Few-Shot Action Recognition of Long Sub-Sequence | Wenbo Huang et.al. | 2412.07481 | null |
| 2024-12-09 | Mining Limited Data Sufficiently: A BERT-inspired Approach for CSI Time Series Application in Wireless Communication and Sensing | Zijian Zhao et.al. | 2412.06861 | link |
| 2024-12-09 | Exploring the Impact of Synthetic Data on Human Gesture Recognition Tasks Using GANs | George Kontogiannis et.al. | 2412.06389 | null |
| 2024-12-07 | Action Recognition based Industrial Safety Violation Detection | Surya N Reddy et.al. | 2412.05531 | null |
| 2024-12-06 | CCS: Continuous Learning for Customized Incremental Wireless Sensing Services | Qunhang Fu et.al. | 2412.04821 | null |
| 2024-12-06 | KNN-MMD: Cross Domain Wi-Fi Sensing Based on Local Distribution Alignment | Zijian Zhao et.al. | 2412.04783 | link |
| 2024-12-03 | Proximal Control of UAVs with Federated Learning for Human-Robot Collaborative Domains | Lucas Nogueira Nobrega et.al. | 2412.02863 | null |
| 2024-12-03 | Planning-Guided Diffusion Policy Learning for Generalizable Contact-Rich Bimanual Manipulation | Xuanlin Li et.al. | 2412.02676 | null |
| 2024-12-02 | Human-Machine Interfaces for Subsea Telerobotics: From Soda-straw to Natural Language Interactions | Adnan Abdullah et.al. | 2412.01753 | null |
| 2024-12-02 | HaGRIDv2: 1M Images for Static and Dynamic Hand Gesture Recognition | Anton Nuzhdin et.al. | 2412.01508 | link |
| 2024-12-02 | EdgeOAR: Real-time Online Action Recognition On Edge Devices | Wei Luo et.al. | 2412.01267 | null |
| 2024-11-29 | CogACT: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation | Qixiu Li et.al. | 2411.19650 | null |
| 2024-11-29 | SkelMamba: A State Space Model for Efficient Skeleton Action Recognition of Neurological Disorders | Niki Martinel et.al. | 2411.19544 | null |
| 2024-11-29 | Hierarchical Framework for Retrosynthesis Prediction with Enhanced Reaction Center Localization | Seongeun Yun et.al. | 2411.19503 | null |
| 2024-11-28 | TAMT: Temporal-Aware Model Tuning for Cross-Domain Few-Shot Action Recognition | Yilong Wang et.al. | 2411.19041 | null |
| 2024-11-28 | Revealing Key Details to See Differences: A Novel Prototypical Perspective for Skeleton-based Action Recognition | Hongda Liu et.al. | 2411.18941 | link |
| 2024-11-27 | Robust Dynamic Gesture Recognition at Ultra-Long Distances | Eran Bamani Beeri et.al. | 2411.18413 | null |
| 2024-11-27 | EventCrab: Harnessing Frame and Point Synergy for Event-based Action Recognition and Beyond | Meiqi Cao et.al. | 2411.18328 | null |
| 2024-11-27 | An End-to-End Two-Stream Network Based on RGB Flow and Representation Flow for Human Action Recognition | Song-Jiang Lai et.al. | 2411.18002 | null |
| 2024-11-26 | Pre-training for Action Recognition with Automatically Generated Fractal Datasets | Davyd Svyezhentsev et.al. | 2411.17584 | link |
| 2024-11-26 | Real-Time Multimodal Signal Processing for HRI in RoboCup: Understanding a Human Referee | Filippo Ansalone et.al. | 2411.17347 | null |
| 2024-11-22 | TSkips: Efficiency Through Explicit Temporal Delay Connections in Spiking Neural Networks | Prajna G. Malettira et.al. | 2411.16711 | null |
| 2024-11-24 | OccludeNet: A Causal Journey into Mixed-View Actor-Centric Video Action Recognition under Occlusions | Guanyu Zhou et.al. | 2411.15729 | link |
| 2024-11-23 | Machine Learning-based sEMG Signal Classification for Hand Gesture Recognition | Parshuram N. Aarotale et.al. | 2411.15655 | null |
| 2024-11-23 | Optimizing Gesture Recognition for Seamless UI Interaction Using Convolutional Neural Networks | Qi Sun et.al. | 2411.15598 | null |
| 2024-11-22 | When Spatial meets Temporal in Action Recognition | Huilin Chen et.al. | 2411.15284 | null |
| 2024-11-22 | Adaptive Hyper-Graph Convolution Network for Skeleton-based Human Action Recognition with Virtual Connections | Youwei Zhou et.al. | 2411.14796 | null |
| 2024-11-22 | Aim My Robot: Precision Local Navigation to Any Object | Xiangyun Meng et.al. | 2411.14770 | null |
| 2024-11-21 | Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning | Jiange Yang et.al. | 2411.14519 | null |
| 2024-11-18 | Enhancing Bidirectional Sign Language Communication: Integrating YOLOv8 and NLP for Real-Time Gesture Recognition & Translation | Hasnat Jamil Bhuiyan et.al. | 2411.13597 | null |
| 2024-11-23 | AzSLD: Azerbaijani Sign Language Dataset for Fingerspelling, Word, and Sentence Translation with Baseline Software | Nigar Alishzade et.al. | 2411.12865 | null |
| 2024-11-20 | Topological Symmetry Enhanced Graph Convolution for Skeleton-Based Action Recognition | Zeyu Liang et.al. | 2411.12560 | link |
| 2024-11-19 | Rethinking Top Probability from Multi-view for Distracted Driver Behaviour Localization | Quang Vinh Nguyen et.al. | 2411.12525 | null |
| 2024-11-18 | Video-to-Task Learning via Motion-Guided Attention for Few-Shot Action Recognition | Hanyu Guo et.al. | 2411.11335 | null |
| 2024-11-18 | Neuron: Learning Context-Aware Evolving Representations for Zero-Shot Skeleton Action Recognition | Yang Chen et.al. | 2411.11288 | null |
| 2024-11-18 | Efficient Transfer Learning for Video-language Foundation Models | Haoxing Chen et.al. | 2411.11223 | link |
| 2024-11-16 | TDSM: Triplet Diffusion for Skeleton-Text Matching in Zero-Shot Action Recognition | Jeonghyeok Do et.al. | 2411.10745 | link |
| 2024-11-15 | KuaiFormer: Transformer-Based Retrieval at Kuaishou | Chi Liu et.al. | 2411.10057 | null |
| 2024-11-14 | Towards Scalable Handwriting Communication via EEG Decoding and Latent Embedding Integration | Jun-Young Kim et.al. | 2411.09170 | null |
| 2024-11-14 | VidMan: Exploiting Implicit Dynamics from Video Diffusion Model for Effective Robot Manipulation | Youpeng Wen et.al. | 2411.09153 | null |
| 2024-11-13 | Can MLLMs Guide Weakly-Supervised Temporal Action Localization Tasks? | Quan Zhang et.al. | 2411.08466 | null |
| 2024-11-13 | Generative AI for Data Augmentation in Wireless Networks: Analysis, Applications, and Case Study | Jinbo Wen et.al. | 2411.08341 | null |
| 2024-11-12 | LapGSR: Laplacian Reconstructive Network for Guided Thermal Super-Resolution | Aditya Kasliwal et.al. | 2411.07750 | null |
| 2024-11-12 | OWLed: Outlier-weighed Layerwise Pruning for Efficient Autonomous Driving Framework | Jiaxi Li et.al. | 2411.07711 | null |
| 2024-11-11 | ConvMixFormer- A Resource-efficient Convolution Mixer for Transformer-based Dynamic Hand Gesture Recognition | Mallika Garg et.al. | 2411.07118 | link |
| 2024-11-10 | Extended multi-stream temporal-attention module for skeleton-based human action recognition (HAR) | Faisal Mehmood et.al. | 2411.06553 | null |
| 2024-11-10 | SuperResolution Radar Gesture Recognition | Netanel Blumenfeld et.al. | 2411.06410 | null |
| 2024-11-08 | Video RWKV: Video Action Recognition Based RWKV | Zhuowen Yin et.al. | 2411.05636 | null |
| 2024-11-06 | Object Recognition in Human Computer Interaction:- A Comparative Analysis | Kaushik Ranade et.al. | 2411.04263 | null |
| 2024-11-06 | Explaining Human Activity Recognition with SHAP: Validating Insights with Perturbation and Quantitative Measures | Felix Tempel et.al. | 2411.03714 | link |
| 2024-11-05 | One-Stage-TFS: Thai One-Stage Fingerspelling Dataset for Fingerspelling Recognition Frameworks | Siriwiwat Lata et.al. | 2411.02768 | null |
| 2024-11-04 | TI-PREGO: Chain of Thought and In-Context Learning for Online Mistake Detection in PRocedural EGOcentric Videos | Leonardo Plini et.al. | 2411.02570 | null |
| 2024-11-04 | AM Flow: Adapters for Temporal Processing in Action Recognition | Tanay Agrawal et.al. | 2411.02065 | null |
| 2024-11-04 | ARN-LSTM: A Multi-Stream Attention-Based Model for Action Recognition with Temporal Dynamics | Chuanchuan Wang et.al. | 2411.01769 | null |
| 2024-10-31 | Technical Report for ActivityNet Challenge 2022 – Temporal Action Localization | Shimin Chen et.al. | 2411.00883 | null |
| 2024-10-30 | A Simple and Effective Temporal Grounding Pipeline for Basketball Broadcast Footage | Levi Harris et.al. | 2411.00862 | null |
| 2024-11-01 | STAA: Spatio-Temporal Attention Attribution for Real-Time Interpreting Transformer-based Video Models | Zerui Wang et.al. | 2411.00630 | link |
| 2024-11-01 | Human Action Recognition (HAR) Using Skeleton-based Spatial Temporal Relative Transformer Network: ST-RTR | Faisal Mehmood et.al. | 2410.23806 | null |
| 2024-10-31 | Recovering Complete Actions for Cross-dataset Skeleton Action Recognition | Hanchao Liu et.al. | 2410.23641 | null |
| 2024-10-30 | Keypoint Abstraction using Large Models for Object-Relative Imitation Learning | Xiaolin Fang et.al. | 2410.23254 | null |
| 2024-10-30 | AtGCN: A Graph Convolutional Network For Ataxic Gait Detection | Karan Bania et.al. | 2410.22862 | null |
| 2024-10-29 | ProMQA: Question Answering Dataset for Multimodal Procedural Activity Understanding | Kimihiro Hasegawa et.al. | 2410.22211 | link |
| 2024-10-29 | Multi-Level Feature Distillation of Joint Teachers Trained on Distinct Image Datasets | Adrian Iordache et.al. | 2410.22184 | link |
| 2024-10-28 | Enhancing Action Recognition by Leveraging the Hierarchical Structure of Actions and Textual Context | Manuel Benavent-Lledo et.al. | 2410.21275 | link |
| 2024-10-28 | One-Step Diffusion Policy: Fast Visuomotor Policies via Diffusion Distillation | Zhendong Wang et.al. | 2410.21257 | null |
| 2024-10-28 | Zero-Shot Action Recognition in Surveillance Videos | Joao Pereira et.al. | 2410.21113 | null |
| 2024-10-28 | LiGAR: LiDAR-Guided Hierarchical Transformer for Multi-Modal Group Activity Recognition | Naga Venkata Sai Raviteja Chappa et.al. | 2410.21108 | null |
| 2024-10-27 | Exocentric To Egocentric Transfer For Action Recognition: A Short Survey | Anirudh Thatipelli et.al. | 2410.20621 | null |
| 2024-10-27 | Idempotent Unsupervised Representation Learning for Skeleton-Based Action Recognition | Lilang Lin et.al. | 2410.20349 | null |
| 2024-10-28 | x-RAGE: eXtended Reality – Action & Gesture Events Dataset | Vivek Parmar et.al. | 2410.19486 | null |
| 2024-10-24 | Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms | Zhangheng Li et.al. | 2410.18967 | link |
| 2024-10-24 | Research on gesture recognition method based on SEDCNN-SVM | Mingjin Zhang et.al. | 2410.18557 | null |
| 2024-10-23 | Unsupervised Domain Adaptation for Action Recognition via Self-Ensembling and Conditional Embedding Alignment | Indrajeet Ghosh et.al. | 2410.17489 | link |
| 2024-10-22 | Are Visual-Language Models Effective in Action Recognition? A Comparative Study | Mahmoud Ali et.al. | 2410.17149 | null |
| 2024-10-22 | Masked Differential Privacy | David Schneider et.al. | 2410.17098 | null |
| 2024-10-22 | SpikMamba: When SNN meets Mamba in Event-based Human Action Recognition | Jiaqi Chen et.al. | 2410.16746 | link |
| 2024-10-21 | Improving the Multi-label Atomic Activity Recognition by Robust Visual Feature and Advanced Attention @ ROAD++ Atomic Activity Recognition 2024 | Jiamin Cao et.al. | 2410.16037 | null |
| 2024-10-19 | CAGE: Causal Attention Enables Data-Efficient Generalizable Robotic Manipulation | Shangning Xia et.al. | 2410.14974 | null |
| 2024-10-18 | DFlow: Diverse Dialogue Flow Simulation with Large Language Models | Wanyu Du et.al. | 2410.14853 | null |
| 2024-10-18 | Storyboard guided Alignment for Fine-grained Video Action Recognition | Enqi Liu et.al. | 2410.14238 | null |
| 2024-10-17 | SimpleToM: Exposing the Gap between Explicit ToM Inference and Implicit ToM Application in LLMs | Yuling Gu et.al. | 2410.13648 | null |
| 2024-10-16 | In-Context Learning Enables Robot Action Prediction in LLMs | Yida Yin et.al. | 2410.12782 | null |
| 2024-10-14 | Continual Learning Improves Zero-Shot Action Recognition | Shreyank N Gowda et.al. | 2410.10497 | null |
| 2024-10-16 | PIVOT-R: Primitive-Driven Waypoint-Aware World Model for Robotic Manipulation | Kaidong Zhang et.al. | 2410.10394 | null |
| 2024-10-13 | EITNet: An IoT-Enhanced Framework for Real-Time Basketball Action Recognition | Jingyu Liu et.al. | 2410.09954 | null |
| 2024-10-13 | Multi class activity classification in videos using Motion History Image generation | Senthilkumar Gopal et.al. | 2410.09902 | link |
| 2024-10-12 | Advanced Gesture Recognition in Autism: Integrating YOLOv7, Video Augmentation and VideoMAE for Video Analysis | Amit Kumar Singh et.al. | 2410.09339 | null |
| 2024-10-11 | Aerial Vision-and-Language Navigation via Semantic-Topo-Metric Representation Guided LLM Reasoning | Yunpeng Gao et.al. | 2410.08500 | null |
| 2024-10-10 | Human Stone Toolmaking Action Grammar (HSTAG): A Challenging Benchmark for Fine-grained Motor Behavior Recognition | Cheng Liu et.al. | 2410.08410 | null |
| 2024-10-10 | Understanding Spatio-Temporal Relations in Human-Object Interaction using Pyramid Graph Convolutional Network | Hao Xing et.al. | 2410.07912 | null |
| 2024-10-09 | CHASE: Learning Convex Hull Adaptive Shift for Skeleton-based Multi-Entity Action Recognition | Yuhang Wen et.al. | 2410.07153 | link |
| 2024-10-09 | Fourier-based Action Recognition for Wildlife Behavior Quantification with Event Cameras | Friedhelm Hamann et.al. | 2410.06698 | null |
| 2024-10-08 | GR-2: A Generative Video-Language-Action Model with Web-Scale Knowledge for Robot Manipulation | Chi-Lam Cheang et.al. | 2410.06158 | null |
| 2024-10-10 | ActionAtlas: A VideoQA Benchmark for Domain-specialized Action Recognition | Mohammadreza Salehi et.al. | 2410.05774 | null |
| 2024-10-07 | Exploring Gestural Interaction with a Cushion Interface for Smart Home Control | Yuri Suzuki et.al. | 2410.04730 | null |
| 2024-10-05 | TR-LLM: Integrating Trajectory Data for Scene-Aware LLM-Based Human Action Prediction | Kojiro Takeyama et.al. | 2410.03993 | null |
| 2024-10-04 | Shadow Augmentation for Handwashing Action Recognition: from Synthetic to Real Datasets | Shengtai Ju et.al. | 2410.03984 | null |
| 2024-10-04 | Action Selection Learning for Multi-label Multi-view Action Recognition | Trung Thanh Nguyen et.al. | 2410.03302 | link |
| 2024-10-03 | DivScene: Benchmarking LVLMs for Object Navigation with Diverse Scenes and Objects | Zhaowei Wang et.al. | 2410.02730 | link |
| 2024-10-03 | An Evaluation of Large Pre-Trained Models for Gesture Recognition using Synthetic Videos | Arun Reddy et.al. | 2410.02152 | null |
| 2024-10-02 | Language Supervised Human Action Recognition with Salient Fusion: Construction Worker Action Recognition as a Use Case | Mohammad Mahdavian et.al. | 2410.01962 | null |
| 2024-10-02 | Sparse Covariance Neural Networks | Andrea Cavallo et.al. | 2410.01669 | link |
| 2024-10-02 | Towards Generalizable Vision-Language Robotic Manipulation: A Benchmark and LLM-guided 3D Policy | Ricardo Garcia et.al. | 2410.01345 | link |
| 2024-10-01 | Dynamic Planning for LLM-based Graphical User Interface Automation | Shaoqing Zhang et.al. | 2410.00467 | link |
| 2024-09-30 | SurgPETL: Parameter-Efficient Image-to-Surgical-Video Transfer Learning for Surgical Phase Recognition | Shu Yang et.al. | 2409.20083 | null |
| 2024-09-28 | Gesture Recognition for Feedback Based Mixed Reality and Robotic Fabrication: A Case Study of the UnLog Tower | Alexander Htet Kyaw et.al. | 2409.19281 | null |
| 2024-09-26 | SOAR: Self-supervision Optimized UAV Action Recognition with Efficient Object-Aware Pretraining | Ruiqi Xian et.al. | 2409.18300 | null |
| 2024-09-26 | Spatial Hierarchy and Temporal Attention Guided Cross Masking for Self-supervised Skeleton-based Action Recognition | Xinpeng Yin et.al. | 2409.17951 | link |
| 2024-09-26 | EAGLE: Egocentric AGgregated Language-video Engine | Jing Bi et.al. | 2409.17523 | null |
| 2024-09-25 | Path-adaptive Spatio-Temporal State Space Model for Event-based Recognition with Arbitrary Duration | Jiazhou Zhou et.al. | 2409.16953 | null |
| 2024-09-25 | Dynamic Obstacle Avoidance through Uncertainty-Based Adaptive Planning with Diffusion | Vineet Punyamoorty et.al. | 2409.16950 | null |
| 2024-09-24 | Hand Gesture Classification Based on Forearm Ultrasound Video Snippets Using 3D Convolutional Neural Networks | Keshav Bimbraw et.al. | 2409.16431 | null |
| 2024-09-22 | Zero-Shot Skeleton-based Action Recognition with Dual Visual-Text Alignment | Jidong Kuang et.al. | 2409.14336 | null |
| 2024-09-21 | Egocentric zone-aware action recognition across environments | Simone Alberto Peirone et.al. | 2409.14205 | null |
| 2024-09-19 | Interpretable Action Recognition on Hard to Classify Actions | Anastasia Anichenko et.al. | 2409.13091 | null |
| 2024-09-18 | Distillation-free Scaling of Large SSMs for Images and Videos | Hamid Suleman et.al. | 2409.11867 | null |
| 2024-09-17 | Mamba Fusion: Learning Actions Through Questioning | Zhikang Dong et.al. | 2409.11513 | link |
| 2024-09-16 | Forearm Ultrasound based Gesture Recognition on Edge | Keshav Bimbraw et.al. | 2409.09915 | null |
| 2024-09-15 | Integrating Audio Narrations to Strengthen Domain Generalization in Multimodal First-Person Action Recognition | Cagri Gungor et.al. | 2409.09611 | null |
| 2024-09-14 | MulCPred: Learning Multi-modal Concepts for Explainable Pedestrian Action Prediction | Yan Feng et.al. | 2409.09446 | link |
| 2024-09-14 | KAN-HyperpointNet for Point Cloud Sequence-Based 3D Human Action Recognition | Zhaoyu Chen et.al. | 2409.09444 | null |
| 2024-09-14 | ChildPlay-Hand: A Dataset of Hand Manipulations in the Wild | Arya Farkhondeh et.al. | 2409.09319 | link |
| 2024-09-13 | Using The Concept Hierarchy for Household Action Recognition | Andrei Costinescu et.al. | 2409.08853 | null |
| 2024-09-12 | Customized Mid-Air Gestures for Accessibility: A $B Recognizer for Multi-Dimensional Biosignal Gestures | Momona Yamagami et.al. | 2409.08402 | null |
| 2024-09-12 | Spatial Adaptation Layer: Interpretable Domain Adaptation For Biosignal Sensor Array Applications | Joao Pereira et.al. | 2409.08058 | null |
| 2024-09-16 | InterACT: Inter-dependency Aware Action Chunking with Hierarchical Attention Transformers for Bimanual Manipulation | Andrew Lee et.al. | 2409.07914 | null |
| 2024-09-11 | 2D bidirectional gated recurrent unit convolutional Neural networks for end-to-end violence detection In videos | Abdarahmane Traoré et.al. | 2409.07588 | null |
| 2024-09-10 | Data Collection-free Masked Video Modeling | Yuchi Ishikawa et.al. | 2409.06665 | null |
| 2024-09-10 | Advancements in Gesture Recognition Techniques and Machine Learning for Enhanced Human-Robot Interaction: A Comprehensive Review | Sajjad Hussain et.al. | 2409.06503 | null |
| 2024-09-10 | Learning Generative Interactive Environments By Trained Agent Exploration | Naser Kazemi et.al. | 2409.06445 | link |
| 2024-09-09 | ReL-SAR: Representation Learning for Skeleton Action Recognition with Convolutional Transformers and BYOL | Safwen Naimi et.al. | 2409.05749 | null |
| 2024-09-11 | Real-Time Human Action Recognition on Embedded Platforms | Ruiqi Wang et.al. | 2409.05662 | null |
| 2024-09-06 | Self-Supervised Contrastive Learning for Videos using Differentiable Local Alignment | Keyne Oei et.al. | 2409.04607 | null |
| 2024-09-05 | MVTN: A Multiscale Video Transformer Network for Hand Gesture Recognition | Mallika Garg et.al. | 2409.03890 | link |
| 2024-09-05 | UAV (Unmanned Aerial Vehicles): Diverse Applications of UAV Datasets in Segmentation, Classification, Detection, and Tracking | Md. Mahfuzur Rahman et.al. | 2409.03245 | null |
| 2024-09-04 | SITAR: Semi-supervised Image Transformer for Action Recognition | Owais Iqbal et.al. | 2409.02910 | null |
| 2024-09-04 | TASAR: Transferable Attack on Skeletal Action Recognition | Yunfeng Diao et.al. | 2409.02483 | link |
| 2024-09-04 | Unified Framework with Consistency across Modalities for Human Activity Recognition | Tuyen Tran et.al. | 2409.02385 | null |
| 2024-09-07 | Unfolding Videos Dynamics via Taylor Expansion | Siyi Chen et.al. | 2409.02371 | null |
| 2024-09-03 | ADHD diagnosis based on action characteristics recorded in videos using machine learning | Yichun Li et.al. | 2409.02274 | null |
| 2024-09-03 | Action-Based ADHD Diagnosis in Video | Yichun Li et.al. | 2409.02261 | null |
| 2024-09-03 | ReSpike: Residual Frames-based Hybrid Spiking Neural Networks for Efficient Action Recognition | Shiting Xiao et.al. | 2409.01564 | null |
| 2024-09-02 | FinePseudo: Improving Pseudo-Labelling through Temporal-Alignablity for Semi-Supervised Fine-Grained Action Recognition | Ishan Rajendrakumar Dave et.al. | 2409.01448 | null |
| 2024-09-01 | Fisher Information guided Purification against Backdoor Attacks | Nazmul Karim et.al. | 2409.00863 | link |
| 2024-09-01 | A Critical Analysis on Machine Learning Techniques for Video-based Human Activity Recognition of Surveillance Systems: A Review | Shahriar Jahan et.al. | 2409.00731 | null |
| 2024-09-03 | Open-vocabulary Temporal Action Localization using VLMs | Naoki Wake et.al. | 2408.17422 | null |
| 2024-08-29 | Text-Enhanced Zero-Shot Action Recognition: A training-free approach | Massimo Bosetti et.al. | 2408.16412 | null |
| 2024-08-28 | DEAR: Depth-Enhanced Action Recognition | Sadegh Rahmaniboldaji et.al. | 2408.15679 | link |
| 2024-08-28 | Online pre-training with long-form videos | Itsuki Kato et.al. | 2408.15651 | null |
| 2024-09-04 | Hand1000: Generating Realistic Hands from Text with Only 1,000 Images | Haozhuo Zhang et.al. | 2408.15461 | null |
| 2024-08-26 | Comparative Analysis: Violence Recognition from Videos using Transfer Learning | Dursun Dashdamirov et.al. | 2408.14659 | link |
| 2024-08-25 | Towards Completeness: A Generalizable Action Proposal Generator for Zero-Shot Temporal Action Localization | Jia-Run Du et.al. | 2408.13777 | link |
| 2024-08-25 | FMI-TAL: Few-shot Multiple Instances Temporal Action Localization by Probability Distribution Learning and Interval Cluster Refinement | Fengshun Wang et.al. | 2408.13765 | link |
| 2024-08-25 | EMG-Based Hand Gesture Recognition through Diverse Domain Feature Enhancement and Machine Learning-Based Approach | Abu Saleh Musa Miah et.al. | 2408.13723 | null |
| 2024-08-24 | HabitAction: A Video Dataset for Human Habitual Behavior Recognition | Hongwu Li et.al. | 2408.13463 | null |
| 2024-08-23 | N-DriverMotion: Driver motion learning and prediction using an event-based camera and directly trained spiking neural networks | Hyo Jong Chung et.al. | 2408.13379 | null |
| 2024-08-23 | Energy-Efficient Spiking Recurrent Neural Network for Gesture Recognition on Embedded GPUs | Marzieh Hassanshahi Varposhti et.al. | 2408.12978 | null |
| 2024-08-21 | Data-Free Class Incremental Gesture Recognition via Synthetic Feature Sampling | Zhenyu Lu et.al. | 2408.12629 | null |
| 2024-08-22 | Frame Order Matters: A Temporal Sequence-Aware Model for Few-Shot Action Recognition | Bozheng Li et.al. | 2408.12475 | null |
| 2024-08-23 | TWLV-I: Analysis and Insights from Holistic Evaluation on Video Foundation Models | Hyeongmin Lee et.al. | 2408.11318 | link |
| 2024-08-21 | CrossFi: A Cross Domain Wi-Fi Sensing Framework Based on Siamese Network | Zijian Zhao et.al. | 2408.10919 | link |
| 2024-08-20 | TDS-CLIP: Temporal Difference Side Network for Image-to-Video Transfer Learning | Bin Wang et.al. | 2408.10688 | link |
| 2024-08-19 | Narrowing the Gap between Vision and Action in Navigation | Yue Zhang et.al. | 2408.10388 | link |
| 2024-08-19 | SHARP: Segmentation of Hands and Arms by Range using Pseudo-Depth for Enhanced Egocentric 3D Hand Pose Estimation and Action Recognition | Wiktor Mucha et.al. | 2408.10037 | link |
| 2024-08-19 | Event Stream based Human Action Recognition: A High-Definition Benchmark Dataset and Algorithms | Xiao Wang et.al. | 2408.09764 | link |
| 2024-08-18 | Joint Temporal Pooling for Improving Skeleton-based Action Recognition | Shanaka Ramesh Gunasekara et.al. | 2408.09356 | null |
| 2024-08-17 | Intuitive Human-Robot Interface: A 3-Dimensional Action Recognition and UAV Collaboration Framework | Akash Chaudhary et.al. | 2408.09232 | null |
| 2024-08-17 | Flatten: Video Action Recognition is an Image Classification task | Junlin Chen et.al. | 2408.09220 | null |
| 2024-08-17 | Temporal Reversed Training for Spiking Neural Networks with Generalized Spatio-Temporal Representation | Lin Zuo et.al. | 2408.09108 | null |
| 2024-08-16 | Towards Physical World Backdoor Attacks against Skeleton Action Recognition | Qichen Zheng et.al. | 2408.08671 | null |
| 2024-08-15 | An Advanced Deep Learning Based Three-Stream Hybrid Model for Dynamic Hand Gesture Recognition | Md Abdur Rahim et.al. | 2408.08035 | null |
| 2024-08-12 | HAT: History-Augmented Anchor Transformer for Online Temporal Action Localization | Sakib Reza et.al. | 2408.06437 | link |
| 2024-08-12 | Probabilistic Vision-Language Representation for Weakly Supervised Temporal Action Localization | Geuntaek Lim et.al. | 2408.05955 | link |
| 2024-08-10 | A Methodological and Structural Review of Hand Gesture Recognition Across Diverse Data Modalities | Jungpil Shin et.al. | 2408.05436 | null |
| 2024-08-10 | EPAM-Net: An Efficient Pose-driven Attention-guided Multimodal Network for Video Action Recognition | Ahmed Abdelkawy et.al. | 2408.05421 | link |
| 2024-08-06 | Prototype Learning for Micro-gesture Classification | Guoliang Chen et.al. | 2408.03097 | null |
| 2024-08-06 | Online Temporal Action Localization with Memory-Augmented Transformer | Youngkil Song et.al. | 2408.02957 | null |
| 2024-08-05 | From Recognition to Prediction: Leveraging Sequence Reasoning for Action Anticipation | Xin Liu et.al. | 2408.02769 | null |
| 2024-08-04 | Enhancing Human Action Recognition and Violence Detection Through Deep Learning Audiovisual Fusion | Pooya Janani et.al. | 2408.02033 | null |
| 2024-08-03 | MultiFuser: Multimodal Fusion Transformer for Enhanced Driver Action Recognition | Ruoyu Wang et.al. | 2408.01766 | null |
| 2024-08-03 | Signal-SGN: A Spiking Graph Convolutional Network for Skeletal Action Recognition via Learning Temporal-Frequency Dynamics | Naichuan Zheng et.al. | 2408.01701 | null |
| 2024-08-01 | Text-Guided Video Masked Autoencoder | David Fan et.al. | 2408.00759 | null |
| 2024-08-01 | How Effective are Self-Supervised Models for Contact Identification in Videos | Malitha Gunawardhana et.al. | 2408.00498 | null |
| 2024-08-01 | Task-Adapter: Task-specific Adaptation of Image Models for Few-shot Action Recognition | Congqi Cao et.al. | 2408.00249 | null |
| 2024-07-31 | Explainable Artificial Intelligence for Quantifying Interfering and High-Risk Behaviors in Autism Spectrum Disorder in a Real-World Classroom Environment Using Privacy-Preserving Video Analysis | Barun Das et.al. | 2407.21691 | null |
| 2024-07-31 | Skeleton-Based Action Recognition with Spatial-Structural Graph Convolution | Jingyao Wang et.al. | 2407.21525 | null |
| 2024-07-31 | Dynamic Gesture Recognition in Ultra-Range Distance for Effective Human-Robot Interaction | Eran Bamani Beeri et.al. | 2407.21374 | null |
| 2024-07-29 | Adversarial Robustness in RGB-Skeleton Action Recognition: Leveraging Attention Modality Reweighter | Chao Liu et.al. | 2407.19981 | null |
| 2024-07-29 | ActivityCLIP: Enhancing Group Activity Recognition by Mining Complementary Information from Text to Supplement Image Modality | Guoliang Xu et.al. | 2407.19820 | null |
| 2024-07-29 | PredIN: Towards Open-Set Gesture Recognition via Prediction Inconsistency | Chen Liu et.al. | 2407.19753 | null |
| 2024-07-28 | Skeleton-based Group Activity Recognition via Spatial-Temporal Panoramic Graph | Zhengcen Li et.al. | 2407.19497 | link |
| 2024-07-25 | MARINE: A Computer Vision Model for Detecting Rare Predator-Prey Interactions in Animal Videos | Zsófia Katona et.al. | 2407.18289 | null |
| 2024-07-25 | Trajectory-aligned Space-time Tokens for Few-shot Action Recognition | Pulkit Kumar et.al. | 2407.18249 | null |
| 2024-07-26 | Harnessing Temporal Causality for Advanced Temporal Action Detection | Shuming Liu et.al. | 2407.17792 | link |
| 2024-07-23 | Fusion and Cross-Modal Transfer for Zero-Shot Human Action Recognition | Abhi Kamboj et.al. | 2407.16803 | null |
| 2024-07-23 | PLM-Net: Perception Latency Mitigation Network for Vision-Based Lateral Control of Autonomous Vehicles | Aws Khalil et.al. | 2407.16740 | link |
| 2024-07-24 | SOAP: Enhancing Spatio-Temporal Relation and Motion Information Capturing for Few-Shot Action Recognition | Wenbo Huang et.al. | 2407.16344 | link |
| 2024-07-22 | Efficient and generalizable prediction of molecular alterations in multiple cancer cohorts using H&E whole slide images | Kshitij Ingale et.al. | 2407.15816 | null |
| 2024-07-25 | Multi-Modality Co-Learning for Efficient Skeleton-based Action Recognition | Jinfu Liu et.al. | 2407.15706 | link |
| 2024-07-21 | Semi-Supervised Pipe Video Temporal Defect Interval Localization | Zhu Huang et.al. | 2407.15170 | null |
| 2024-07-20 | Automated Patient Positioning with Learned 3D Hand Gestures | Zhongpai Gao et.al. | 2407.14903 | null |
| 2024-07-20 | Can VLMs be used on videos for action recognition? LLMs are Visual Reasoning Coordinators | Harsh Lunia et.al. | 2407.14834 | null |
| 2024-07-20 | Decoupled Prompt-Adapter Tuning for Continual Activity Recognition | Di Fu et.al. | 2407.14811 | null |
| 2024-07-20 | A Comprehensive Review of Few-shot Action Recognition | Yuyang Wanyan et.al. | 2407.14744 | null |
| 2024-07-19 | LORTSAR: Low-Rank Transformer for Skeleton-based Action Recognition | Soroush Oraki et.al. | 2407.14655 | null |
| 2024-07-19 | Fine-grained Knowledge Graph-driven Video-Language Learning for Action Recognition | Rui Zhang et.al. | 2407.14146 | null |
| 2024-07-19 | Zero-Shot Underwater Gesture Recognition | Sandipan Sarma et.al. | 2407.14103 | link |
| 2024-07-18 | Pose-guided multi-task video transformer for driver action recognition | Ricardo Pizarro et.al. | 2407.13750 | null |
| 2024-07-18 | SA-DVAE: Improving Zero-Shot Skeleton-Based Action Recognition by Disentangled Variational Autoencoders | Sheng-Wei Li et.al. | 2407.13460 | link |
| 2024-07-18 | QuIIL at T3 challenge: Towards Automation in Life-Saving Intervention Procedures from First-Person View | Trinh T. L. Vuong et.al. | 2407.13216 | link |
| 2024-07-18 | Enhancing Temporal Action Localization: Advanced S6 Modeling with Recurrent Mechanism | Sangyoun Lee et.al. | 2407.13078 | link |
| 2024-07-17 | ActionSwitch: Class-agnostic Detection of Simultaneous Actions in Streaming Videos | Hyolim Kang et.al. | 2407.12987 | link |
| 2024-07-17 | NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models | Gengze Zhou et.al. | 2407.12366 | link |
| 2024-07-17 | Frequency Guidance Matters: Skeletal Action Recognition by Frequency-Aware Mixed Transformer | Wenhan Wu et.al. | 2407.12322 | null |
| 2024-07-17 | Shap-Mix: Shapley Value Guided Mixing for Long-Tailed Skeleton Based Action Recognition | Jiahang Zhang et.al. | 2407.12312 | null |
| 2024-07-16 | Enhancing Split Computing and Early Exit Applications through Predefined Sparsity | Luigi Capogrosso et.al. | 2407.11763 | link |
| 2024-07-10 | Exploring the Boundaries of On-Device Inference: When Tiny Falls Short, Go Hierarchical | Adarsh Prasad Behera et.al. | 2407.11061 | null |
| 2024-07-15 | STARS: Self-supervised Tuning for 3D Action Recognition in Skeleton Sequences | Soroush Mehraban et.al. | 2407.10935 | null |
| 2024-07-15 | Human-Centric Transformer for Domain Adaptive Action Recognition | Kun-Yu Lin et.al. | 2407.10860 | null |
| 2024-07-17 | Augmented Neural Fine-Tuning for Efficient Backdoor Purification | Nazmul Karim et.al. | 2407.10052 | link |
| 2024-07-13 | Region-aware Image-based Human Action Retrieval with Transformers | Hongsong Wang et.al. | 2407.09924 | null |
| 2024-07-16 | OmniRace: 6D Hand Pose Estimation for Intuitive Guidance of Racing Drone | Valerii Serpiva et.al. | 2407.09841 | link |
| 2024-07-12 | Full-Stage Pseudo Label Quality Enhancement for Weakly-supervised Temporal Action Localization | Qianhan Feng et.al. | 2407.08971 | link |
| 2024-07-11 | Boosting Adversarial Transferability for Skeleton-based Action Recognition via Exploring the Model Posterior Space | Yunfeng Diao et.al. | 2407.08572 | null |
| 2024-07-12 | Towards Adaptive Pseudo-label Learning for Semi-Supervised Temporal Action Localization | Feixiang Zhou et.al. | 2407.07673 | null |
| 2024-07-10 | EA-VTR: Event-Aware Video-Text Retrieval | Zongyang Ma et.al. | 2407.07478 | null |
| 2024-07-09 | Exploring Scalability of Self-Training for Open-Vocabulary Temporal Action Localization | Jeongseok Hyun et.al. | 2407.07024 | link |
| 2024-07-09 | Rethinking Image-to-Video Adaptation: An Object-centric Perspective | Rui Qian et.al. | 2407.06871 | null |
| 2024-07-09 | Masked Video and Body-worn IMU Autoencoder for Egocentric Action Recognition | Mingfang Zhang et.al. | 2407.06628 | null |
| 2024-07-08 | Noise-Free Explanation for Driving Action Prediction | Hongbo Zhu et.al. | 2407.06339 | link |
| 2024-07-08 | C2C: Component-to-Composition Learning for Zero-Shot Compositional Action Recognition | Rongchang Li et.al. | 2407.06113 | link |
| 2024-07-08 | DMSD-CDFSAR: Distillation from Mixed-Source Domain for Cross-Domain Few-shot Action Recognition | Fei Guo et.al. | 2407.05657 | null |
| 2024-07-11 | Helios: An extremely low power event-based gesture recognition for always-on smart eyewear | Prarthana Bhattacharyya et.al. | 2407.05206 | null |
| 2024-07-06 | DailyDVS-200: A Comprehensive Benchmark Dataset for Event-Based Action Recognition | Qi Wang et.al. | 2407.05106 | link |
| 2024-07-05 | AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation | Yuhan Zhu et.al. | 2407.04603 | null |
| 2024-07-05 | TF-SASM: Training-free Spatial-aware Sparse Memory for Multi-object Tracking | Thuc Nguyen-Quang et.al. | 2407.04327 | null |
| 2024-07-05 | Computer Vision for Clinical Gait Analysis: A Gait Abnormality Video Dataset | Rahm Ranjan et.al. | 2407.04190 | link |
| 2024-07-04 | Robust Policy Learning for Multi-UAV Collision Avoidance with Causal Feature Selection | Jiafan Zhuang et.al. | 2407.04056 | null |
| 2024-07-04 | On-Device Training Empowered Transfer Learning For Human Activity Recognition | Pixi Kang et.al. | 2407.03644 | null |
| 2024-07-03 | Motion meets Attention: Video Motion Prompts | Qixiang Chen et.al. | 2407.03179 | null |
| 2024-07-02 | Advancing Compressed Video Action Recognition through Progressive Knowledge Distillation | Efstathia Soufleri et.al. | 2407.02713 | link |
| 2024-07-02 | Novel Human Machine Interface via Robust Hand Gesture Recognition System using Channel Pruned YOLOv5s Model | Abir Sen et.al. | 2407.02585 | null |
| 2024-07-02 | Referring Atomic Video Action Recognition | Kunyu Peng et.al. | 2407.01872 | link |
| 2024-07-01 | Mask and Compress: Efficient Skeleton-based Action Recognition in Continual Learning | Matteo Mosconi et.al. | 2407.01397 | link |
| 2024-06-30 | Graph in Graph Neural Network | Jiongshu Wang et.al. | 2407.00696 | link |
| 2024-06-29 | Diving Deeper Into Pedestrian Behavior Understanding: Intention Estimation, Action Prediction, and Event Risk Assessment | Amir Rasouli et.al. | 2407.00446 | link |
| 2024-06-29 | PerAct2: A Perceiver Actor Framework for Bimanual Manipulation Tasks | Markus Grotz et.al. | 2407.00278 | null |
| 2024-06-27 | VideoMambaPro: A Leap Forward for Mamba in Video Understanding | Hui Lu et.al. | 2406.19006 | link |
| 2024-06-28 | CSI4Free: GAN-Augmented mmWave CSI for Improved Pose Classification | Nabeel Nisar Bhat et.al. | 2406.18684 | null |
| 2024-06-26 | The Surprising Effectiveness of Multimodal Large Language Models for Video Moment Retrieval | Meinardus Boris et.al. | 2406.18113 | link |
| 2024-07-01 | EgoVideo: Exploring Egocentric Foundation Model and Downstream Adaptation | Baoqi Pei et.al. | 2406.18070 | link |
| 2024-06-26 | Expressive Keypoints for Skeleton-based Action Recognition via Skeleton Transformation | Yijie Yang et.al. | 2406.18011 | link |
| 2024-06-25 | Using joint angles based on the international biomechanical standards for human action recognition and related tasks | Kevin Schlegel et.al. | 2406.17443 | null |
| 2024-06-21 | Open-Vocabulary Temporal Action Localization using Multimodal Guidance | Akshita Gupta et.al. | 2406.15556 | null |
| 2024-06-21 | SVFormer: A Direct Training Spiking Transformer for Efficient Video Action Recognition | Liutao Yu et.al. | 2406.15034 | null |
| 2024-06-21 | Real-Time Hand Gesture Recognition: Integrating Skeleton-Based Data Fusion and Multi-Stream CNN | Oluwaleke Yusuf et.al. | 2406.15003 | link |
| 2024-06-20 | Self-supervised Multi-actor Social Activity Understanding in Streaming Videos | Shubham Trehan et.al. | 2406.14472 | null |
| 2024-06-19 | An Efficient yet High-Performance Method for Precise Radar-Based Imaging of Human Hand Poses | Johanna Bräunig et.al. | 2406.13464 | null |
| 2024-06-19 | Part-aware Unified Representation of Language and Skeleton for Zero-shot Action Recognition | Anqi Zhu et.al. | 2406.13327 | link |
| 2024-06-21 | Underwater Human-Robot and Human-Swarm Interaction: A Review and Perspective | Sara Aldhaheri et.al. | 2406.12473 | null |
| 2024-06-18 | Deep self-supervised learning with visualisation for automatic gesture recognition | Fabien Allemand et.al. | 2406.12440 | null |
| 2024-06-17 | Brain-inspired Computational Modeling of Action Recognition with Recurrent Spiking Neural Networks Equipped with Reinforcement Delay Learning | Alireza Nadafian et.al. | 2406.11778 | null |
| 2024-06-18 | CM2-Net: Continual Cross-Modal Mapping Network for Driver Action Recognition | Ruoyu Wang et.al. | 2406.11340 | null |
| 2024-06-17 | Expanding the Design Space of Computer Vision-based Interactive Systems for Group Dance Practice | Soohwan Lee et.al. | 2406.11236 | null |
| 2024-06-14 | Nymeria: A Massive Collection of Multimodal Egocentric Daily Motion in the Wild | Lingni Ma et.al. | 2406.09905 | null |
| 2024-06-12 | Enhancing End-to-End Autonomous Driving with Latent World Model | Yingyan Li et.al. | 2406.08481 | link |
| 2024-06-09 | ALGO: Object-Grounded Visual Commonsense Reasoning for Open-World Egocentric Action Recognition | Sanjoy Kundu et.al. | 2406.05722 | null |
| 2024-06-07 | SMART: Scene-motion-aware human action recognition framework for mental disorder group | Zengyuan Lai et.al. | 2406.04649 | link |
| 2024-06-06 | Enhancing Sign Language Detection through Mediapipe and Convolutional Neural Networks (CNN) | Aditya Raj Verma et.al. | 2406.03729 | null |
| 2024-06-05 | The Logarithmic Memristor-Based Bayesian Machine | Clément Turck et.al. | 2406.03492 | null |
| 2024-06-05 | FILS: Self-Supervised Video Feature Prediction In Semantic Language Space | Mona Ahmadian et.al. | 2406.03447 | null |
| 2024-06-05 | Self-Supervised Skeleton Action Representation Learning: A Benchmark and Beyond | Jiahang Zhang et.al. | 2406.02978 | null |
| 2024-06-04 | Contrastive Language Video Time Pre-training | Hengyue Liu et.al. | 2406.02631 | null |
| 2024-06-04 | DL-KDD: Dual-Light Knowledge Distillation for Action Recognition in the Dark | Chi-Jui Chang et.al. | 2406.02468 | null |
| 2024-06-04 | A Generalized Apprenticeship Learning Framework for Modeling Heterogeneous Student Pedagogical Strategies | Md Mirajul Islam et.al. | 2406.02450 | null |
| 2024-06-04 | Analyzing the Feature Extractor Networks for Face Image Synthesis | Erdi Sarıtaş et.al. | 2406.02153 | link |
| 2024-06-04 | Analyzing the Effect of Combined Degradations on Face Recognition | Erdi Sarıtaş et.al. | 2406.02142 | link |
| 2024-06-03 | ELSA: Evaluating Localization of Social Activities in Urban Streets | Maryam Hosseini et.al. | 2406.01551 | null |
| 2024-06-03 | HHMR: Holistic Hand Mesh Recovery by Enhancing the Multimodal Controllability of Graph Diffusion Models | Mengcheng Li et.al. | 2406.01334 | null |
| 2024-06-03 | Augmented Commonsense Knowledge for Remote Object Grounding | Bahram Mohammadi et.al. | 2406.01256 | link |
| 2024-06-03 | Understanding the Cross-Domain Capabilities of Video-Based Few-Shot Action Recognition Models | Georgia Markham et.al. | 2406.01073 | null |
| 2024-06-02 | An Information Compensation Framework for Zero-Shot Skeleton-based Action Recognition | Haojun Xu et.al. | 2406.00639 | null |
| 2024-05-31 | Action-OOD: An End-to-End Skeleton-Based Model for Robust Out-of-Distribution Human Action Detection | Jing Xu et.al. | 2405.20633 | link |
| 2024-05-31 | Vision-Language Meets the Skeleton: Progressively Distillation with Cross-Modal Knowledge for 3D Action Representation Learning | Yang Chen et.al. | 2405.20606 | null |
| 2024-05-30 | ENTIRe-ID: An Extensive and Diverse Dataset for Person Re-Identification | Serdar Yildiz et.al. | 2405.20465 | null |
| 2024-05-30 | From Forest to Zoo: Great Ape Behavior Recognition with ChimpBehave | Michael Fuchs et.al. | 2405.20025 | null |
| 2024-05-31 | Multimodal Cross-Domain Few-Shot Learning for Egocentric Action Recognition | Masashi Hatano et.al. | 2405.19917 | null |
| 2024-05-30 | EgoSurgery-Phase: A Dataset of Surgical Phase Recognition from Egocentric Open Surgery Videos | Ryo Fujii et.al. | 2405.19644 | link |
| 2024-05-30 | SAM-E: Leveraging Visual Foundation Model with Sequence Imitation for Embodied Manipulation | Junjie Zhang et.al. | 2405.19586 | null |
| 2024-05-29 | Matrix Manifold Neural Networks++ | Xuan Son Nguyen et.al. | 2405.19206 | null |
| 2024-05-29 | Exploring AI-based Anonymization of Industrial Image and Video Data in the Context of Feature Preservation | Sabrina Cynthia Triess et.al. | 2405.19173 | null |
| 2024-05-28 | Flow-Assisted Motion Learning Network for Weakly-Supervised Group Activity Recognition | Muhammad Adi Nugroho et.al. | 2405.18012 | null |
| 2024-05-30 | Benchmarking Skeleton-based Motion Encoder Models for Clinical Applications: Estimating Parkinson’s Disease Severity in Walking Sequences | Vida Adeli et.al. | 2405.17817 | link |
| 2024-05-28 | Hierarchical Action Recognition: A Contrastive Video-Language Approach with Hierarchical Interactions | Rui Zhang et.al. | 2405.17729 | null |
| 2024-05-28 | EgoNCE++: Do Egocentric Video-Language Models Really Understand Hand-Object Interactions? | Boshen Xu et.al. | 2405.17719 | link |
| 2024-05-27 | Advancements in Tactile Hand Gesture Recognition for Enhanced Human-Machine Interaction | Chiara Fumelli et.al. | 2405.17038 | null |
| 2024-05-27 | A Cross-Dataset Study for Text-based 3D Human Motion Retrieval | Léore Bensabath et.al. | 2405.16909 | null |
| 2024-05-26 | Flow Snapshot Neurons in Action: Deep Neural Networks Generalize to Biological Motion Perception | Shuangpeng Han et.al. | 2405.16493 | null |
| 2024-05-25 | Application of Artificial Intelligence in Hand Gesture Recognition with Virtual Reality: Survey and Analysis of Hand Gesture Hardware Selection | Jindi Wang et.al. | 2405.16264 | null |
| 2024-05-22 | From CNNs to Transformers in Multimodal Human Action Recognition: A Survey | Muhammad Bilal Shaikh et.al. | 2405.15813 | null |
| 2024-05-24 | V-Zen: Efficient GUI Understanding and Precise Grounding With A Novel Multimodal LLM | Abdur Rahman et.al. | 2405.15341 | link |
| 2024-05-23 | Enhanced Spatiotemporal Prediction Using Physical-guided And Frequency-enhanced Recurrent Neural Networks | Xuanle Zhao et.al. | 2405.14504 | null |
| 2024-05-23 | SpGesture: Source-Free Domain-adaptive sEMG-based Gesture Recognition with Jaccard Attentive Spiking Neural Network | Weiyu Guo et.al. | 2405.14398 | null |
| 2024-05-23 | MAMBA4D: Efficient Long-Sequence Point Cloud Video Understanding with Disentangled Spatial-Temporal State Space Models | Jiuming Liu et.al. | 2405.14338 | null |
| 2024-05-22 | Counterfactual Gradients-based Quantification of Prediction Trust in Neural Networks | Mohit Prabhushankar et.al. | 2405.13758 | null |
| 2024-05-21 | Identity-free Artificial Emotional Intelligence via Micro-Gesture Understanding | Rong Gao et.al. | 2405.13206 | null |
| 2024-05-22 | Building Temporal Kernels with Orthogonal Polynomials | Yan Ru Pei et.al. | 2405.12179 | link |
| 2024-05-18 | GestFormer: Multiscale Wavelet Pooling Transformer Network for Dynamic Hand Gesture Recognition | Mallika Garg et.al. | 2405.11180 | link |
| 2024-05-17 | Air Signing and Privacy-Preserving Signature Verification for Digital Documents | P. Sarveswarasarma et.al. | 2405.10868 | null |
| 2024-05-17 | MC-GPT: Empowering Vision-and-Language Navigation with Memory Map and Reasoning Chains | Zhaohuan Zhan et.al. | 2405.10620 | null |
| 2024-05-06 | MEET: Mixture of Experts Extra Tree-Based sEMG Hand Gesture Identification | Naveen Gehlot et.al. | 2405.09562 | null |
| 2024-05-14 | Wearable Sensor-Based Few-Shot Continual Learning on Hand Gestures for Motor-Impaired Individuals via Latent Embedding Exploitation | Riyad Bin Rafiq et.al. | 2405.08969 | link |
| 2024-05-14 | The impact of Compositionality in Zero-shot Multi-label action recognition for Object-based tasks | Carmela Calabrese et.al. | 2405.08695 | null |
| 2024-05-15 | POWQMIX: Weighted Value Factorization with Potentially Optimal Joint Actions Recognition for Cooperative Multi-Agent Reinforcement Learning | Chang Huang et.al. | 2405.08036 | null |
| 2024-05-13 | Coarse or Fine? Recognising Action End States without Labels | Davide Moltisanti et.al. | 2405.07723 | link |
| 2024-05-11 | PRENet: A Plane-Fit Redundancy Encoding Point Cloud Sequence Network for Real-Time 3D Action Recognition | Shenglin He et.al. | 2405.06929 | null |
| 2024-05-10 | CasCalib: Cascaded Calibration for Motion Capture from Sparse Unsynchronized Cameras | James Tang et.al. | 2405.06845 | link |
| 2024-05-09 | A Survey on Backbones for Deep Video Action Recognition | Zixuan Tang et.al. | 2405.05584 | null |
| 2024-05-06 | OmniActions: Predicting Digital Actions in Response to Real-World Multimodal Sensory Inputs with LLMs | Jiahao Nick Li et.al. | 2405.03901 | null |
| 2024-05-05 | JOSENet: A Joint Stream Embedding Network for Violence Detection in Surveillance Videos | Pietro Nardelli et.al. | 2405.02961 | null |
| 2024-05-03 | On the Utility of External Agent Intention Predictor for Human-AI Coordination | Chenxu Wang et.al. | 2405.02229 | null |
| 2024-05-11 | MVP-Shot: Multi-Velocity Progressive-Alignment Framework for Few-Shot Action Recognition | Hongyu Qu et.al. | 2405.02077 | null |
| 2024-05-03 | Enhancing Micro Gesture Recognition for Emotion Understanding via Context-aware Visual-Text Contrastive Learning | Deng Li et.al. | 2405.01885 | link |
| 2024-05-02 | Multi-view Action Recognition via Directed Gromov-Wasserstein Discrepancy | Hoang-Quan Nguyen et.al. | 2405.01337 | null |
| 2024-05-07 | Towards Inclusive Face Recognition Through Synthetic Ethnicity Alteration | Praveen Kumar Chandaliya et.al. | 2405.01273 | null |
| 2024-04-30 | One-Stage Open-Vocabulary Temporal Action Detection Leveraging Temporal Multi-scale and Action Label Features | Trung Thanh Nguyen et.al. | 2404.19542 | link |
| 2024-04-30 | Cross-Block Fine-Grained Semantic Cascade for Skeleton-Based Sports Action Recognition | Zhendong Liu et.al. | 2404.19383 | null |
| 2024-04-28 | Enhancing Action Recognition from Low-Quality Skeleton Data via Part-Level Knowledge Distillation | Cuiwei Liu et.al. | 2404.18206 | null |
| 2024-04-26 | SDFD: Building a Versatile Synthetic Face Image Dataset with Diverse Attributes | Georgia Baltsou et.al. | 2404.17255 | null |
| 2024-04-25 | Learning Discriminative Spatio-temporal Representations for Semi-supervised Action Recognition | Yu Wang et.al. | 2404.16416 | null |
| 2024-04-25 | An Improved Graph Pooling Network for Skeleton-Based Action Recognition | Cong Wu et.al. | 2404.16359 | null |
| 2024-04-24 | Unimodal and Multimodal Sensor Fusion for Wearable Activity Recognition | Hymalai Bello et.al. | 2404.16005 | null |
| 2024-04-24 | 3D Face Morphing Attack Generation using Non-Rigid Registration | Jag Mohan Singh et.al. | 2404.15765 | null |
| 2024-04-25 | HDBN: A Novel Hybrid Dual-branch Network for Robust Skeleton-based Action Recognition | Jinfu Liu et.al. | 2404.15719 | link |
| 2024-04-23 | Combating Missing Modalities in Egocentric Videos at Test Time | Merey Ramazanova et.al. | 2404.15161 | null |
| 2024-04-23 | G3R: Generating Rich and Fine-grained mmWave Radar Data from 2D Videos for Generalized Gesture Recognition | Kaikai Deng et.al. | 2404.14934 | null |
| 2024-04-23 | Driver Activity Classification Using Generalizable Representations from Vision-Language Models | Ross Greer et.al. | 2404.14906 | null |
| 2024-04-23 | DENOISER: Rethinking the Robustness for Open-Vocabulary Action Recognition | Haozhe Cheng et.al. | 2404.14890 | null |
| 2024-04-22 | 1st Place Solution to the 1st SkatingVerse Challenge | Tao Sun et.al. | 2404.14032 | null |
| 2024-04-22 | CoFInAl: Enhancing Action Quality Assessment with Coarse-to-Fine Instruction Alignment | Kanglei Zhou et.al. | 2404.13999 | link |
| 2024-04-21 | Attack on Scene Flow using Point Clouds | Haniyeh Ehsani Oskouie et.al. | 2404.13621 | null |
| 2024-04-20 | STAT: Towards Generalizable Temporal Action Localization | Yangcen Liu et.al. | 2404.13311 | null |
| 2024-04-19 | Ring-a-Pose: A Ring for Continuous Hand Pose Tracking | Tianhong Catherine Yu et.al. | 2404.12980 | null |
| 2024-04-19 | VoxAtnNet: A 3D Point Clouds Convolutional Neural Network for Generalizable Face Presentation Attack Detection | Raghavendra Ramachandra et.al. | 2404.12680 | null |
| 2024-04-18 | DeepLocalization: Using change point detection for Temporal Action Localization | Mohammed Shaiqur Rahman et.al. | 2404.12258 | null |
| 2024-04-18 | Aligning Actions and Walking to LLM-Generated Textual Descriptions | Radu Chivereanu et.al. | 2404.12192 | link |
| 2024-04-18 | Simultaneous Detection and Interaction Reasoning for Object-Centric Action Recognition | Xunsong Li et.al. | 2404.11903 | null |
| 2024-04-18 | sEMG-based Fine-grained Gesture Recognition via Improved LightGBM Model | Xiupeng Qiao et.al. | 2404.11861 | null |
| 2024-04-17 | VG4D: Vision-Language Model Goes 4D Video Recognition | Zhichao Deng et.al. | 2404.11605 | link |
| 2024-04-17 | A Data-Driven Representation for Sign Language Production | Harry Walsh et.al. | 2404.11499 | link |
| 2024-04-17 | Lower Limb Movements Recognition Based on Feature Recursive Elimination and Backpropagation Neural Network | Yongkai Ma et.al. | 2404.11383 | null |
| 2024-04-17 | Revisiting Noise Resilience Strategies in Gesture Recognition: Short-Term Enhancement in Surface Electromyographic Signal Analysis | Weiyu Guo et.al. | 2404.11213 | null |
| 2024-04-17 | Kathakali Hand Gesture Recognition With Minimal Data | Kavitha Raju et.al. | 2404.11205 | null |
| 2024-04-16 | HumMUSS: Human Motion Understanding using State Space Models | Arnab Kumar Mondal et.al. | 2404.10880 | null |
| 2024-04-17 | Learning to Score Sign Language with Two-stage Method | Hongli Wen et.al. | 2404.10383 | null |
| 2024-04-16 | MK-SGN: A Spiking Graph Convolutional Network with Multimodal Fusion and Knowledge Distillation for Skeleton-based Action Recognition | Naichuan Zheng et.al. | 2404.10210 | null |
| 2024-04-15 | Design and Analysis of Efficient Attention in Transformers for Social Group Activity Recognition | Masato Tamura et.al. | 2404.09964 | null |
| 2024-04-15 | A Diffusion-based Data Generator for Training Object Recognition Models in Ultra-Range Distance | Eran Bamani et.al. | 2404.09846 | null |
| 2024-04-15 | Leveraging Temporal Contextualization for Video Action Recognition | Minji Kim et.al. | 2404.09490 | link |
| 2024-04-14 | In My Perspective, In My Hands: Accurate Egocentric 2D Hand Pose and Action Recognition | Wiktor Mucha et.al. | 2404.09308 | null |
| 2024-04-13 | Exploring Explainability in Video Action Recognition | Avinab Saha et.al. | 2404.09067 | null |
| 2024-04-12 | MSSTNet: A Multi-Scale Spatio-Temporal CNN-Transformer Network for Dynamic Facial Expression Recognition | Linhuang Wang et.al. | 2404.08433 | null |
| 2024-04-11 | Graph Integrated Language Transformers for Next Action Prediction in Complex Phone Calls | Amin Hosseiny Marani et.al. | 2404.08155 | null |
| 2024-04-11 | Simba: Mamba augmented U-ShiftGCN for Skeletal Action Recognition in Videos | Soumyabrata Chaudhuri et.al. | 2404.07645 | null |
| 2024-04-15 | Fine-Grained Side Information Guided Dual-Prompts for Zero-Shot Skeleton Action Recognition | Yang Chen et.al. | 2404.07487 | null |
| 2024-04-10 | O-TALC: Steps Towards Combating Oversegmentation within Online Action Segmentation | Matthew Kent Myers et.al. | 2404.06894 | null |
| 2024-04-10 | An Animation-based Augmentation Approach for Action Recognition from Discontinuous Video | Xingyu Song et.al. | 2404.06741 | null |
| 2024-04-07 | X-VARS: Introducing Explainability in Football Refereeing with Multi-Modal Large Language Model | Jan Held et.al. | 2404.06332 | null |
| 2024-04-10 | Algorithms for Caching and MTS with reduced number of predictions | Karim Abdel Sadek et.al. | 2404.06280 | null |
| 2024-04-09 | ActNetFormer: Transformer-ResNet Hybrid Method for Semi-Supervised Action Recognition in Videos | Sharana Dharshikgan Suresh Dass et.al. | 2404.06243 | link |
| 2024-04-08 | Localizing Moments of Actions in Untrimmed Videos of Infants with Autism Spectrum Disorder | Halil Ismail Helvaci et.al. | 2404.05849 | null |
| 2024-04-09 | TIM: A Time Interval Machine for Audio-Visual Action Recognition | Jacob Chalk et.al. | 2404.05559 | link |
| 2024-04-11 | Test-Time Zero-Shot Temporal Action Localization | Benedetta Liberatori et.al. | 2404.05426 | link |
| 2024-04-09 | SDFR: Synthetic Data for Face Recognition Competition | Hatef Otroshi Shahreza et.al. | 2404.04580 | null |
| 2024-04-05 | PhysPT: Physics-aware Pretrained Transformer for Estimating Human Dynamics from Monocular Videos | Yufei Zhang et.al. | 2404.04430 | null |
| 2024-04-05 | Koala: Key frame-conditioned long video-LLM | Reuben Tan et.al. | 2404.04346 | null |
| 2024-04-04 | UniAV: Unified Audio-Visual Perception for Multi-Task Video Localization | Tiantian Geng et.al. | 2404.03179 | null |
| 2024-04-03 | Optimizing the Deployment of Tiny Transformers on Low-Power MCUs | Victor J. B. Jung et.al. | 2404.02945 | link |
| 2024-04-03 | Multi-Scale Spatial-Temporal Self-Attention Graph Convolutional Networks for Skeleton-based Action Recognition | Ikuo Nakamura et.al. | 2404.02624 | null |
| 2024-04-02 | PREGO: online mistake detection in PRocedural EGOcentric videos | Alessandro Flaborea et.al. | 2404.01933 | link |
| 2024-04-02 | Disentangled Pre-training for Human-Object Interaction Detection | Zhuolong Li et.al. | 2404.01725 | link |
| 2024-04-02 | Language Model Guided Interpretable Video Action Reasoning | Ning Wang et.al. | 2404.01591 | null |
| 2024-04-02 | Leveraging YOLO-World and GPT-4V LMMs for Zero-Shot Person Detection and Action Recognition in Drone Imagery | Christian Limberg et.al. | 2404.01571 | null |
| 2024-04-01 | LoSA: Long-Short-range Adapter for Scaling End-to-End Temporal Action Localization | Akshita Gupta et.al. | 2404.01282 | null |
| 2024-03-31 | LLMs are Good Action Recognizers | Haoxuan Qu et.al. | 2404.00532 | null |
| 2024-03-29 | Latent Embedding Clustering for Occlusion Robust Head Pose Estimation | José Celestino et.al. | 2403.20251 | null |
| 2024-03-29 | A Unified Framework for Human-centric Point Cloud Video Understanding | Yiteng Xu et.al. | 2403.20031 | null |
| 2024-03-28 | Zero-shot Prompt-based Video Encoder for Surgical Gesture Recognition | Mingxing Rao et.al. | 2403.19786 | link |
| 2024-03-28 | Hypergraph-based Multi-View Action Recognition using Event Cameras | Yue Gao et.al. | 2403.19316 | null |
| 2024-03-27 | PLOT-TAL – Prompt Learning with Optimal Transport for Few-Shot Temporal Action Localization | Edward Fish et.al. | 2403.18915 | null |
| 2024-03-27 | iFace: Hand-Over-Face Gesture Recognition Leveraging Impedance Sensing | Mengxi Liu et.al. | 2403.18433 | null |
| 2024-03-27 | An Evolutionary Network Architecture Search Framework with Adaptive Multimodal Fusion for Hand Gesture Recognition | Yizhang Xia et.al. | 2403.18208 | null |
| 2024-03-26 | OmniVid: A Generative Framework for Universal Video Understanding | Junke Wang et.al. | 2403.17935 | link |
| 2024-03-25 | Understanding Long Videos in One Multimodal Language Model Pass | Kanchana Ranasinghe et.al. | 2403.16998 | link |
| 2024-03-25 | Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects | Zicong Fan et.al. | 2403.16428 | null |
| 2024-03-24 | Emotion Recognition from the perspective of Activity Recognition | Savinay Nagendra et.al. | 2403.16263 | null |
| 2024-03-22 | InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding | Yi Wang et.al. | 2403.15377 | link |
| 2024-03-22 | Gesture-Controlled Aerial Robot Formation for Human-Swarm Interaction in Safety Monitoring Applications | Vít Krátký et.al. | 2403.15333 | null |
| 2024-03-22 | GCN-DevLSTM: Path Development for Skeleton-Based Action Recognition | Lei Jiang et.al. | 2403.15212 | link |
| 2024-03-21 | Transfer Learning for Cross-dataset Isolated Sign Language Recognition in Under-Resourced Datasets | Ahmet Alp Kindiroglu et.al. | 2403.14534 | link |
| 2024-03-20 | Hierarchical NeuroSymbolic Approach for Action Quality Assessment | Lauren Okamoto et.al. | 2403.13798 | link |
| 2024-03-19 | Selective, Interpretable, and Motion Consistent Privacy Attribute Obfuscation for Action Recognition | Filip Ilic et.al. | 2403.12710 | null |
| 2024-03-19 | ExACT: Language-guided Conceptual Reasoning and Uncertainty Estimation for Event-based Action Recognition and More | Jiazhou Zhou et.al. | 2403.12534 | null |
| 2024-03-19 | VideoBadminton: A Video Dataset for Badminton Action Recognition | Qi Li et.al. | 2403.12385 | null |
| 2024-03-19 | Multi-View Video-Based Learning: Leveraging Weak Labels for Frame-Level Perception | Vijay John et.al. | 2403.11616 | null |
| 2024-03-19 | VIHE: Virtual In-Hand Eye Transformer for 3D Robotic Manipulation | Weiyao Wang et.al. | 2403.11461 | null |
| 2024-03-17 | A Lie Group Approach to Riemannian Batch Normalization | Ziheng Chen et.al. | 2403.11261 | link |
| 2024-03-17 | Boosting Semi-Supervised Temporal Action Localization by Learning from Non-Target Classes | Kun Xia et.al. | 2403.11189 | null |
| 2024-03-16 | CoPlay: Audio-agnostic Cognitive Scaling for Acoustic Sensing | Yin Li et.al. | 2403.10796 | null |
| 2024-03-15 | CrossGLG: LLM Guides One-shot Skeleton-based 3D Action Recognition in a Cross-level Manner | Tingbing Yan et.al. | 2403.10082 | null |
| 2024-03-15 | Skeleton-Based Human Action Recognition with Noisy Labels | Yi Xu et.al. | 2403.09975 | null |
| 2024-03-14 | On the Utility of 3D Hand Poses for Action Recognition | Md Salman Shamil et.al. | 2403.09805 | null |
| 2024-03-14 | 3D-VLA: A 3D Vision-Language-Action Generative World Model | Haoyu Zhen et.al. | 2403.09631 | link |
| 2024-03-14 | SkateFormer: Skeletal-Temporal Transformer for Human Action Recognition | Jeonghyeok Do et.al. | 2403.09508 | link |
| 2024-03-14 | EventRPG: Event Data Augmentation with Relevance Propagation Guidance | Mingyuan Sun et.al. | 2403.09274 | link |
| 2024-03-14 | Leveraging Foundation Model Automatic Data Augmentation Strategies and Skeletal Points for Hands Action Recognition in Industrial Assembly Lines | Liang Wu et.al. | 2403.09056 | null |
| 2024-03-13 | Low-Cost and Real-Time Industrial Human Action Recognitions Based on Large-Scale Foundation Models | Wensheng Liang et.al. | 2403.08420 | null |
| 2024-03-13 | NaturalVLM: Leveraging Fine-grained Natural Language for Affordance-Guided Visual Manipulation | Ran Xu et.al. | 2403.08355 | null |
| 2024-03-13 | ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation | Guanxing Lu et.al. | 2403.08321 | link |
| 2024-03-12 | NavCoT: Boosting LLM-Based Vision-and-Language Navigation via Learning Disentangled Reasoning | Bingqian Lin et.al. | 2403.07376 | link |
| 2024-03-12 | BID: Boundary-Interior Decoding for Unsupervised Temporal Action Localization Pre-Training | Qihang Fang et.al. | 2403.07354 | null |
| 2024-03-11 | Attention Prompt Tuning: Parameter-efficient Adaptation of Pre-trained Models for Spatiotemporal Modeling | Wele Gedara Chaminda Bandara et.al. | 2403.06978 | link |
| 2024-03-11 | Deep Learning Approaches for Human Action Recognition in Video Data | Yufei Xie et.al. | 2403.06810 | null |
| 2024-03-11 | Real-Time Multimodal Cognitive Assistant for Emergency Medical Services | Keshara Weerasinghe et.al. | 2403.06734 | null |
| 2024-03-11 | Multimodal Transformers for Real-Time Surgical Activity Prediction | Keshara Weerasinghe et.al. | 2403.06705 | link |
| 2024-03-11 | epsilon-Mesh Attack: A Surface-based Adversarial Point Cloud Attack for Facial Expression Recognition | Batuhan Cengiz et.al. | 2403.06661 | null |
| 2024-03-11 | Density-Guided Label Smoothing for Temporal Localization of Driving Actions | Tunc Alkanat et.al. | 2403.06616 | null |
| 2024-03-11 | Transformer-based Fusion of 2D-pose and Spatio-temporal Embeddings for Distracted Driver Action Recognition | Erkut Akdag et.al. | 2403.06577 | null |
| 2024-03-10 | Coherent Temporal Synthesis for Incremental Action Segmentation | Guodong Ding et.al. | 2403.06102 | null |
| 2024-03-09 | Dissecting Deep RL with High Update Ratios: Combatting Value Overestimation and Divergence | Marcel Hussing et.al. | 2403.05996 | null |
| 2024-03-08 | Benchmarking Micro-action Recognition: Dataset, Methods, and Applications | Dan Guo et.al. | 2403.05234 | link |
| 2024-03-06 | Video Relationship Detection Using Mixture of Experts | Ala Shaabana et.al. | 2403.03994 | link |
| 2024-03-05 | Behavior Generation with Latent Actions | Seungjae Lee et.al. | 2403.03181 | link |
| 2024-03-05 | Learning to Use Tools via Cooperative and Interactive Agents | Zhengliang Shi et.al. | 2403.03031 | null |
| 2024-03-04 | Gesture recognition with Brownian reservoir computing using geometrically confined skyrmion dynamics | Grischa Beneke et.al. | 2403.01877 | null |
| 2024-03-04 | A Simple Baseline for Efficient Hand Mesh Reconstruction | Zhishan Zhou et.al. | 2403.01813 | null |
| 2024-03-03 | A Unified Model Selection Technique for Spectral Clustering Based Motion Segmentation | Yuxiang Huang et.al. | 2403.01606 | null |
| 2024-03-03 | Rethinking CLIP-based Video Learners in Cross-Domain Open-Vocabulary Action Recognition | Kun-Yu Lin et.al. | 2403.01560 | link |
| 2024-03-02 | Dynamic 3D Point Cloud Sequences as 2D Videos | Yiming Zeng et.al. | 2403.01129 | null |
| 2024-02-29 | On the Design of Human-Robot Collaboration Gestures | Anas Shrinah et.al. | 2402.19058 | null |
| 2024-02-23 | Multimodal Transformer With a Low-Computational-Cost Guarantee | Sungjin Park et.al. | 2402.15096 | null |
| 2024-02-17 | Implementation of a Model of the Cortex Basal Ganglia Loop | Naoya Arakawa et.al. | 2402.13275 | null |
| 2024-02-20 | Radar-Based Recognition of Static Hand Gestures in American Sign Language | Christian Schuessler et.al. | 2402.12800 | null |
| 2024-02-20 | Learning Domain-Invariant Temporal Dynamics for Few-Shot Action Recognition | Yuke Li et.al. | 2402.12706 | null |
| 2024-02-19 | Comprehensive Cognitive LLM Agent for Smartphone GUI Automation | Xinbei Ma et.al. | 2402.11941 | link |
| 2024-02-15 | Hand Shape and Gesture Recognition using Multiscale Template Matching, Background Subtraction and Binary Image Analysis | Ketan Suhaas Saichandran et.al. | 2402.09663 | null |
| 2024-02-14 | TikTokActions: A TikTok-Derived Video Dataset for Human Action Recognition | Yang Qian et.al. | 2402.08875 | null |
| 2024-02-13 | BdSLW60: A Word-Level Bangla Sign Language Dataset | Husne Ara Rubaiyeat et.al. | 2402.08635 | link |
| 2024-02-13 | Vision-Based Hand Gesture Customization from a Single Demonstration | Soroush Shahi et.al. | 2402.08420 | null |
| 2024-02-12 | PBADet: A One-Stage Anchor-Free Approach for Part-Body Association | Zhongpai Gao et.al. | 2402.07814 | null |
Pose Estimation
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2025-12-23 | AlignPose: Generalizable 6D Pose Estimation via Multi-view Feature-metric Alignment | Anna Šárová Mikeštíková et.al. | 2512.20538 | null |
| 2025-12-23 | SirenPose: Dynamic Scene Reconstruction via Geometric Supervision | Kaitong Cai et.al. | 2512.20531 | null |
| 2025-12-23 | Differentially Private Feature Release for Wireless Sensing: Adaptive Privacy Budget Allocation on CSI Spectrograms | Ipek Sena Yilmaz et.al. | 2512.20323 | null |
| 2025-12-23 | Enhancing annotations for 5D apple pose estimation through 3D Gaussian Splatting (3DGS) | Robert van de Ven et.al. | 2512.20148 | null |
| 2025-12-23 | milliMamba: Specular-Aware Human Pose Estimation via Dual mmWave Radar with Multi-Frame Mamba Fusion | Niraj Prakash Kini et.al. | 2512.20128 | null |
| 2025-12-22 | Trifocal Tensor and Relative Pose Estimation with Known Vertical Direction | Tao Li et.al. | 2512.19110 | null |
| 2025-12-22 | 6DAttack: Backdoor Attacks in the 6DoF Pose Estimation | Jihui Guo et.al. | 2512.19058 | null |
| 2025-12-20 | A two-stream network with global-local feature fusion for bone age assessment | Qiong Lou et.al. | 2512.18331 | null |
| 2025-12-19 | SurgiPose: Estimating Surgical Tool Kinematics from Monocular Video for Surgical Robot Learning | Juo-Tung Chen et.al. | 2512.18068 | null |
| 2025-12-19 | G3Splat: Geometrically Consistent Generalizable Gaussian Splatting | Mehdi Hosseinzadeh et.al. | 2512.17547 | null |
| 2025-12-19 | Adaptive Covariance and Quaternion-Focused Hybrid Error-State EKF/UKF for Visual-Inertial Odometry | Ufuk Asil et.al. | 2512.17505 | null |
| 2025-12-19 | VAIR: Visual Analytics for Injury Risk Exploration in Sports | Chunggi Lee et.al. | 2512.17446 | null |
| 2025-12-19 | Globally Optimal Solution to the Generalized Relative Pose Estimation Problem using Affine Correspondences | Zhenbao Yu et.al. | 2512.17188 | null |
| 2025-12-19 | InstructDubber: Instruction-based Alignment for Zero-shot Movie Dubbing | Zhedong Zhang et.al. | 2512.17154 | null |
| 2025-12-18 | PoseMoE: Mixture-of-Experts Network for Monocular 3D Human Pose Estimation | Mengyuan Liu et.al. | 2512.16494 | null |
| 2025-12-18 | Avatar4D: Synthesizing Domain-Specific 4D Humans for Real-World Pose Estimation | Jerrin Bright et.al. | 2512.16199 | null |
| 2025-12-18 | LAPX: Lightweight Hourglass Network with Global Context | Haopeng Zhao et.al. | 2512.16089 | null |
| 2025-12-17 | Robust Multi-view Camera Calibration from Dense Matches | Johannes Hägerlind et.al. | 2512.15608 | null |
| 2025-12-17 | BLANKET: Anonymizing Faces in Infant Video Recordings | Ditmar Hadera et.al. | 2512.15542 | null |
| 2025-12-17 | Off The Grid: Detection of Primitives for Feed-Forward 3D Gaussian Splatting | Arthur Moreau et.al. | 2512.15508 | null |
| 2025-12-17 | RUMPL: Ray-Based Transformers for Universal Multi-View 2D to 3D Human Pose Lifting | Seyed Abolfazl Ghasemzadeh et.al. | 2512.15488 | null |
| 2025-12-17 | See It Before You Grab It: Deep Learning-based Action Anticipation in Basketball | Arnau Barrera Roy et.al. | 2512.15386 | null |
| 2025-12-17 | NAP3D: NeRF Assisted 3D-3D Pose Alignment for Autonomous Vehicles | Gaurav Bansal et.al. | 2512.15080 | null |
| 2025-12-16 | Isolated Sign Language Recognition with Segmentation and Pose Estimation | Daniel Perkins et.al. | 2512.14876 | null |
| 2025-12-16 | FastDDHPose: Towards Unified, Efficient, and Disentangled 3D Human Pose Estimation | Qingyuan Cai et.al. | 2512.14162 | null |
| 2025-12-15 | LASER: Layer-wise Scale Alignment for Training-Free Streaming 4D Reconstruction | Tianye Ding et.al. | 2512.13680 | null |
| 2025-12-13 | Audio-Visual Camera Pose Estimation with Passive Scene Sounds and In-the-Wild Video | Daniel Adebi et.al. | 2512.12165 | null |
| 2025-12-10 | mmWEAVER: Environment-Specific mmWave Signal Synthesis from a Photo and Activity Description | Mahathir Monjur et.al. | 2512.11894 | null |
| 2025-12-12 | A Multi-Mode Structured Light 3D Imaging System with Multi-Source Information Fusion for Underwater Pipeline Detection | Qinghan Hu et.al. | 2512.11354 | null |
| 2025-12-11 | SceneMaker: Open-set 3D Scene Generation with Decoupled De-occlusion and Pose Estimation Model | Yukai Shi et.al. | 2512.10957 | null |
| 2025-12-11 | E-RayZer: Self-supervised 3D Reconstruction as Spatial Visual Pre-training | Qitao Zhao et.al. | 2512.10950 | null |
| 2025-12-11 | PoseGAM: Robust Unseen Object Pose Estimation via Geometry-Aware Multi-View Reasoning | Jianqi Chen et.al. | 2512.10840 | null |
| 2025-12-11 | Geo6DPose: Fast Zero-Shot 6D Object Pose Estimation via Geometry-Filtered Feature Matching | Javier Villena Toro et.al. | 2512.10674 | null |
| 2025-12-11 | Mr. Virgil: Learning Multi-robot Visual-range Relative Localization | Si Wang et.al. | 2512.10540 | null |
| 2025-12-11 | An M-Health Algorithmic Approach to Identify and Assess Physiotherapy Exercises in Real Time | Stylianos Kandylakis et.al. | 2512.10437 | null |
| 2025-12-11 | Point2Pose: A Generative Framework for 3D Human Pose Estimation with Multi-View Point Cloud Dataset | Hyunsoo Lee et.al. | 2512.10321 | null |
| 2025-12-11 | THE-Pose: Topological Prior with Hybrid Graph Fusion for Estimating Category-Level 6D Object Pose | Eunho Lee et.al. | 2512.10251 | null |
| 2025-12-10 | FastPose-ViT: A Vision Transformer for Real-Time Spacecraft Pose Estimation | Pierre Ancey et.al. | 2512.09792 | null |
| 2025-12-10 | Development and Testing for Perception Based Autonomous Landing of a Long-Range QuadPlane | Ashik E Rasul et.al. | 2512.09343 | null |
| 2025-12-09 | ConceptPose: Training-Free Zero-Shot Object Pose Estimation using Concept Vectors | Liming Kuang et.al. | 2512.09056 | null |
| 2025-12-09 | Selfi: Self Improving Reconstruction Engine via 3D Geometric Feature Alignment | Youming Deng et.al. | 2512.08930 | null |
| 2025-12-09 | SDT-6D: Fully Sparse Depth-Transformer for Staged End-to-End 6D Pose Estimation in Industrial Multi-View Bin Picking | Nico Leuze et.al. | 2512.08430 | null |
| 2025-12-09 | Zero-Splat TeleAssist: A Zero-Shot Pose Estimation Framework for Semantic Teleoperation | Srijan Dokania et.al. | 2512.08271 | null |
| 2025-12-08 | UltrasODM: A Dual Stream Optical Flow Mamba Network for 3D Freehand Ultrasound Reconstruction | Mayank Anand et.al. | 2512.07756 | null |
| 2025-12-08 | UnCageNet: Tracking and Pose Estimation of Caged Animal | Sayak Dutta et.al. | 2512.07712 | null |
| 2025-12-08 | VFM-VLM: Vision Foundation Model and Vision Language Model based Visual Comparison for 3D Pose Estimation | Md Selim Sarowar et.al. | 2512.07215 | null |
| 2025-12-08 | Object Pose Distribution Estimation for Determining Revolution and Reflection Uncertainty in Point Clouds | Frederik Hagelskjær et.al. | 2512.07211 | null |
| 2025-12-07 | Dynamic Visual SLAM using a General 3D Prior | Xingguang Zhong et.al. | 2512.06868 | null |
| 2025-12-07 | Physics Informed Human Posture Estimation Based on 3D Landmarks from Monocular RGB-Videos | Tobias Leuthold et.al. | 2512.06783 | null |
| 2025-12-06 | GNC-Pose: Geometry-Aware GNC-PnP for Accurate 6D Pose Estimation | Xiujin Liu et.al. | 2512.06565 | null |
| 2025-12-06 | Exploiting Spatiotemporal Properties for Efficient Event-Driven Human Pose Estimation | Haoxian Zhou et.al. | 2512.06306 | null |
| 2025-12-05 | GuideNav: User-Informed Development of a Vision-Only Robotic Navigation Assistant For Blind Travelers | Hochul Hwang et.al. | 2512.06147 | null |
| 2025-12-03 | Training-Free Robot Pose Estimation using Off-the-Shelf Foundational Models | Laurence Liang et.al. | 2512.06017 | null |
| 2025-12-05 | Deep Learning-Based Real-Time Sequential Facial Expression Analysis Using Geometric Features | Talha Enes Koksal et.al. | 2512.05669 | null |
| 2025-12-04 | Age-Inclusive 3D Human Mesh Recovery for Action-Preserving Data Anonymization | Georgios Chatzichristodoulou et.al. | 2512.05259 | null |
| 2025-12-04 | Equivariant symmetry-aware head pose estimation for fetal MRI | Ramya Muthukrishnan et.al. | 2512.04890 | null |
| 2025-12-04 | Contact-Aware Refinement of Human Pose Pseudo-Ground Truth via Bioimpedance Sensing | Maria-Paola Forte et.al. | 2512.04862 | null |
| 2025-12-03 | SpaceTools: Tool-Augmented Spatial Reasoning via Double Interactive RL | Siyi Chen et.al. | 2512.04069 | null |
| 2025-12-03 | MSG-Loc: Multi-Label Likelihood-based Semantic Graph Matching for Object-Level Global Localization | Gihyeon Lee et.al. | 2512.03522 | null |
| 2025-12-03 | AfroBeats Dance Movement Analysis Using Computer Vision: A Proof-of-Concept Framework Combining YOLO and Segment Anything Model | Kwaku Opoku-Ware et.al. | 2512.03509 | null |
| 2025-12-02 | DynamicVerse: A Physically-Aware Multimodal Framework for 4D World Modeling | Kairun Wen et.al. | 2512.03000 | null |
| 2025-12-02 | DF-Mamba: Deformable State Space Modeling for 3D Hand Pose Estimation in Interactions | Yifan Zhou et.al. | 2512.02727 | null |
| 2025-12-01 | Is Image-based Object Pose Estimation Ready to Support Grasping? | Eric C. Joyce et.al. | 2512.01856 | null |
| 2025-11-29 | CC-FMO: Camera-Conditioned Zero-Shot Single Image to 3D Scene Generation with Foundation Model Orchestration | Boshi Tang et.al. | 2512.00493 | null |
| 2025-11-03 | Learning from Watching: Scalable Extraction of Manipulation Trajectories from Human Videos | X. Hu et.al. | 2512.00024 | null |
| 2025-11-28 | Zero-Shot Multi-Criteria Visual Quality Inspection for Semi-Controlled Industrial Environments via Real-Time 3D Digital Twin Simulation | Jose Moises Araya-Martinez et.al. | 2511.23214 | null |
| 2025-11-28 | DiskChunGS: Large-Scale 3D Gaussian SLAM Through Chunk-Based Memory Management | Casimir Feldmann et.al. | 2511.23030 | null |
| 2025-11-28 | Threat-Aware UAV Dodging of Human-Thrown Projectiles with an RGB-D Camera | Yuying Zhang et.al. | 2511.22847 | null |
| 2025-11-27 | Emergent Extreme-View Geometry in 3D Foundation Models | Yiwen Zhang et.al. | 2511.22686 | null |
| 2025-11-27 | UAV-MM3D: A Large-Scale Synthetic Benchmark for 3D Perception of Unmanned Aerial Vehicles with Multi-Modal Data | Longkun Zou et.al. | 2511.22404 | null |
| 2025-11-27 | ColonAdapter: Geometry Estimation Through Foundation Model Adaptation for Colonoscopy | Zhiyi Jiang et.al. | 2511.22250 | null |
| 2025-11-26 | Seeing without Pixels: Perception from Camera Trajectories | Zihui Xue et.al. | 2511.21681 | null |
| 2025-11-26 | Uncertainty Quantification for Visual Object Pose Estimation | Lorenzo Shaikewitz et.al. | 2511.21666 | null |
| 2025-11-26 | Enhanced Landmark Detection Model in Pelvic Fluoroscopy using 2D/3D Registration Loss | Chou Mo et.al. | 2511.21575 | null |
| 2025-11-25 | Metric, inertially aligned monocular state estimation via kinetodynamic priors | Jiaxin Liu et.al. | 2511.20496 | null |
| 2025-11-25 | Dance Style Classification using Laban-Inspired and Frequency-Domain Motion Features | Ben Hamscher et.al. | 2511.20469 | null |
| 2025-11-25 | VGGT4D: Mining Motion Cues in Visual Geometry Transformers for 4D Scene Reconstruction | Yu Hu et.al. | 2511.19971 | null |
| 2025-11-24 | The Determinant Ratio Matrix Approach to Solving 3D Matching and 2D Orthographic Projection Alignment Tasks | Andrew J. Hanson et.al. | 2511.19511 | null |
| 2025-11-18 | PuzzlePoles: Cylindrical Fiducial Markers Based on the PuzzleBoard Pattern | Juri Zach et.al. | 2511.19448 | null |
| 2025-11-24 | Graph-based 3D Human Pose Estimation using WiFi Signals | Jichao Chen et.al. | 2511.19105 | null |
| 2025-11-24 | Analysis of Deep-Learning Methods in an ISO/TS 15066-Compliant Human-Robot Safety Framework | David Bricher et.al. | 2511.19094 | null |
| 2025-11-24 | LAA3D: A Benchmark of Detecting and Tracking Low-Altitude Aircraft in 3D Space | Hai Wu et.al. | 2511.19057 | null |
| 2025-11-24 | Robust Long-term Test-Time Adaptation for 3D Human Pose Estimation through Motion Discretization | Yilin Wen et.al. | 2511.18851 | null |
| 2025-11-24 | CNN-Based Camera Pose Estimation and Localisation of Scan Images for Aircraft Visual Inspection | Xueyan Oh et.al. | 2511.18702 | null |
| 2025-11-23 | Expanding the Workspace of Electromagnetic Navigation Systems Using Dynamic Feedback for Single- and Multi-agent Control | Jasan Zughaibi et.al. | 2511.18486 | null |
| 2025-11-22 | Muskie: Multi-view Masked Image Modeling for 3D Vision Pre-training | Wenyu Li et.al. | 2511.18115 | null |
| 2025-11-21 | NoPe-NeRF++: Local-to-Global Optimization of NeRF with No Pose Prior | Dongbo Shi et.al. | 2511.17322 | null |
| 2025-11-21 | MuM: Multi-View Masked Image Modeling for 3D Vision | David Nordström et.al. | 2511.17309 | null |
| 2025-11-21 | BiFingerPose: Bimodal Finger Pose Estimation for Touch Devices | Xiongjun Guan et.al. | 2511.17306 | null |
| 2025-11-21 | RacketVision: A Multiple Racket Sports Benchmark for Unified Ball and Racket Analysis | Linfeng Dong et.al. | 2511.17045 | null |
| 2025-11-21 | MobileOcc: A Human-Aware Semantic Occupancy Dataset for Mobile Robots | Junseo Kim et.al. | 2511.16949 | null |
| 2025-11-20 | BOP-ASK: Object-Interaction Reasoning for Vision-Language Models | Vineet Bhat et.al. | 2511.16857 | null |
| 2025-11-20 | NoPo-Avatar: Generalizable and Animatable Avatars from Sparse Inputs without Human Poses | Jing Wen et.al. | 2511.16673 | null |
| 2025-11-20 | EOGS++: Earth Observation Gaussian Splatting with Internal Camera Refinement and Direct Panchromatic Rendering | Pierrick Bournez et.al. | 2511.16542 | null |
| 2025-11-20 | Physics-Informed Machine Learning for Efficient Sim-to-Real Data Augmentation in Micro-Object Pose Estimation | Zongcai Tan et.al. | 2511.16494 | null |
| 2025-11-20 | End-to-End Motion Capture from Rigid Body Markers with Geodesic Loss | Hai Lan et.al. | 2511.16418 | null |
| 2025-11-19 | Box6D: Zero-shot Category-level 6D Pose Estimation of Warehouse Boxes | Yintao Ma et.al. | 2511.15884 | null |
| 2025-11-19 | WALDO: Where Unseen Model-based 6D Pose Estimation Meets Occlusion | Sajjad Pakdamansavoji et.al. | 2511.15874 | null |
| 2025-11-19 | Scriboora: Rethinking Human Pose Forecasting | Daniel Bermuth et.al. | 2511.15565 | null |
| 2025-11-18 | RocSync: Millisecond-Accurate Temporal Synchronization for Heterogeneous Camera Systems | Jaro Meyer et.al. | 2511.14948 | null |
| 2025-11-18 | A Quantitative Method for Shoulder Presentation Evaluation in Biometric Identity Documents | Alfonso Pedro Ridao et.al. | 2511.14376 | null |
| 2025-11-18 | Simultaneous Localization and 3D-Semi Dense Mapping for Micro Drones Using Monocular Camera and Inertial Sensors | Jeryes Danial et.al. | 2511.14335 | null |
| 2025-11-18 | LSP-YOLO: A Lightweight Single-Stage Network for Sitting Posture Recognition on Embedded Devices | Nanjun Li et.al. | 2511.14322 | null |
| 2025-11-18 | iGaussian: Real-Time Camera Pose Estimation via Feed-Forward 3D Gaussian Splatting Inversion | Hao Wang et.al. | 2511.14149 | null |
| 2025-11-17 | GRLoc: Geometric Representation Regression for Visual Localization | Changyang Li et.al. | 2511.13864 | null |
| 2025-11-17 | RSPose: Ranking Based Losses for Human Pose Estimation | Muhammed Can Keles et.al. | 2511.13857 | null |
| 2025-11-17 | GeoX-Bench: Benchmarking Cross-View Geo-Localization and Pose Estimation Capabilities of Large Multimodal Models | Yushuo Zheng et.al. | 2511.13259 | null |
| 2025-11-17 | GaRLILEO: Gravity-aligned Radar-Leg-Inertial Enhanced Odometry | Chiyun Noh et.al. | 2511.13216 | null |
| 2025-11-17 | End-to-End Multi-Person Pose Estimation with Pose-Aware Video Transformer | Yonghui Yu et.al. | 2511.13208 | null |
| 2025-11-17 | CapeNext: Rethinking and Refining Dynamic Support Information for Category-Agnostic Pose Estimation | Yu Zhu et.al. | 2511.13102 | null |
| 2025-11-17 | PFAvatar: Pose-Fusion 3D Personalized Avatar Reconstruction from Real-World Outfit-of-the-Day Photos | Dianbing Xi et.al. | 2511.12935 | null |
| 2025-11-17 | CoordAR: One-Reference 6D Pose Estimation of Novel Objects via Autoregressive Coordinate Map Generation | Dexin Zuo et.al. | 2511.12919 | null |
| 2025-11-16 | OPFormer: Object Pose Estimation leveraging foundation model with geometric encoding | Artem Moroz et.al. | 2511.12614 | null |
| 2025-11-16 | Visible Structure Retrieval for Lightweight Image-Based Relocalisation | Fereidoon Zangeneh et.al. | 2511.12503 | null |
| 2025-11-15 | Changes in Real Time: Online Scene Change Detection with Multi-View Fusion | Chamuditha Jayanga Galappaththige et.al. | 2511.12370 | null |
| 2025-11-15 | AURA: Development and Validation of an Augmented Unplanned Removal Alert System using Synthetic ICU Videos | Junhyuk Seo et.al. | 2511.12241 | null |
| 2025-11-15 | VPHO: Joint Visual-Physical Cue Learning and Aggregation for Hand-Object Pose Estimation | Jun Zhou et.al. | 2511.12030 | null |
| 2025-11-12 | Understanding the Representation of Older Adults in Motion Capture Locomotion Datasets | Yunkai Yu et.al. | 2511.11713 | null |
| 2025-11-14 | YCB-Ev SD: Synthetic event-vision dataset for 6DoF object pose estimation | Pavel Rojtberg et.al. | 2511.11344 | null |
| 2025-11-14 | 6D Strawberry Pose Estimation: Real-time and Edge AI Solutions Using Purely Synthetic Training Data | Saptarshi Neil Sinha et.al. | 2511.11307 | null |
| 2025-11-13 | Depth Anything 3: Recovering the Visual Space from Any Views | Haotong Lin et.al. | 2511.10647 | link |
| 2025-11-13 | OmniVGGT: Omni-Modality Driven Visual Geometry Grounded Transformer | Haosong Peng et.al. | 2511.10560 | null |
| 2025-11-12 | STORM: Segment, Track, and Object Re-Localization from a Single Image | Yu Deng et.al. | 2511.09771 | null |
| 2025-11-12 | DreamPose3D: Hallucinative Diffusion with Prompt Learning for 3D Human Pose Estimation | Jerrin Bright et.al. | 2511.09502 | null |
| 2025-11-12 | SMF-VO: Direct Ego-Motion Estimation via Sparse Motion Fields | Sangheon Yang et.al. | 2511.09072 | null |
| 2025-11-12 | RadHARSimulator V2: Video to Doppler Generator | Weicheng Gao et.al. | 2511.09022 | null |
| 2025-11-12 | SasMamba: A Lightweight Structure-Aware Stride State Space Model for 3D Human Pose Estimation | Hu Cui et.al. | 2511.08872 | null |
| 2025-11-11 | Adaptive graph Kolmogorov-Arnold network for 3D human pose estimation | Abu Taib Mohammed Shahjahan et.al. | 2511.08809 | null |
| 2025-11-11 | RAPTR: Radar-based 3D Pose Estimation using Transformer | Sorachi Kato et.al. | 2511.08387 | null |
| 2025-11-11 | SkelSplat: Robust Multi-view 3D Human Pose Estimation with Differentiable Gaussian Rendering | Laura Bragagnolo et.al. | 2511.08294 | null |
| 2025-11-11 | An Image-Based Path Planning Algorithm Using a UAV Equipped with Stereo Vision | Selim Ahmet Iz et.al. | 2511.07928 | null |
| 2025-11-10 | LeCoT: revisiting network architecture for two-view correspondence pruning | Luanyuan Dai et.al. | 2511.07078 | null |
| 2025-11-10 | Robust and High-Fidelity 3D Gaussian Splatting: Fusing Pose Priors and Geometry Constraints for Texture-Deficient Outdoor Scenes | Meijun Guo et.al. | 2511.06765 | null |
| 2025-11-10 | Semi-distributed Cross-modal Air-Ground Relative Localization | Weining Lu et.al. | 2511.06749 | null |
| 2025-11-09 | VDNeRF: Vision-only Dynamic Neural Radiance Field for Urban Scenes | Zhengyu Zou et.al. | 2511.06408 | null |
| 2025-11-07 | Pedicle Screw Pairing and Registration for Screw Pose Estimation from Dual C-arm Images Using CAD Models | Yehyun Suh et.al. | 2511.05702 | null |
| 2025-11-07 | Multi-modal Loop Closure Detection with Foundation Models in Severely Unstructured Environments | Laura Alejandra Encinar Gonzalez et.al. | 2511.05404 | null |
| 2025-11-07 | No Pose Estimation? No Problem: Pose-Agnostic and Instance-Aware Test-Time Adaptation for Monocular Depth Estimation | Mingyu Sung et.al. | 2511.05055 | null |
| 2025-11-06 | Synchronous Observer Design for Landmark-Inertial SLAM with Almost-Global Convergence | Arkadeep Saha et.al. | 2511.04531 | null |
| 2025-11-06 | A Two-stage Adaptive Lifting PINN Framework for Solving Viscous Approximations to Hyperbolic Conservation Laws | Yameng Zhu et.al. | 2511.04490 | null |
| 2025-11-06 | Deep Dictionary-Free Method for Identifying Linear Model of Nonlinear System with Input Delay | Patrik Valábek et.al. | 2511.04451 | null |
| 2025-11-06 | MedSapiens: Taking a Pose to Rethink Medical Imaging Landmark Detection | Marawan Elbatel et.al. | 2511.04255 | null |
| 2025-11-06 | DMSORT: An efficient parallel maritime multi-object tracking architecture for unmanned vessel platforms | Shengyu Tang et.al. | 2511.04128 | null |
| 2025-11-06 | Simple 3D Pose Features Support Human and Machine Social Scene Understanding | Wenshuo Qin et.al. | 2511.03988 | null |
| 2025-11-05 | CORE - A Cell-Level Coarse-to-Fine Image Registration Engine for Multi-stain Image Alignment | Esha Sadia Nasir et.al. | 2511.03826 | null |
| 2025-11-05 | FusionDP: Foundation Model-Assisted Differentially Private Learning for Partially Sensitive Features | Linghui Zeng et.al. | 2511.03806 | null |
| 2025-10-30 | Electric Vehicle Charging Load Modeling: A Survey, Trends, Challenges and Opportunities | Xiachong Lin et.al. | 2511.03741 | null |
| 2025-10-21 | AI-Enhanced Wi-Fi Sensing Through Single Transceiver Pair | Yuxuan Liu et.al. | 2511.02845 | null |
| 2025-11-04 | Densemarks: Learning Canonical Embeddings for Human Heads Images via Point Tracks | Dmitrii Pozdeev et.al. | 2511.02830 | null |
| 2025-11-04 | Object Detection as an Optional Basis: A Graph Matching Network for Cross-View UAV Localization | Tao Liu et.al. | 2511.02489 | link |
| 2025-11-04 | A New Perspective on Precision and Recall for Generative Models | Benjamin Sykes et.al. | 2511.02414 | null |
| 2025-11-04 | Cycle-Sync: Robust Global Camera Pose Estimation through Enhanced Cycle-Consistent Synchronization | Shaohan Li et.al. | 2511.02329 | null |
| 2025-11-04 | Are Euler angles a useful rotation parameterisation for pose estimation with Normalizing Flows? | Giorgos Sfikas et.al. | 2511.02277 | null |
| 2025-11-04 | A Joint Variational Framework for Multimodal X-ray Ptychography and Fluorescence Reconstruction | Eric Zou et.al. | 2511.02153 | null |
| 2025-11-04 | A new approach for the analysis of evolution partial differential equations on a finite interval | Türker Özsarı et.al. | 2511.02145 | null |
| 2025-11-03 | HGFreNet: Hop-hybrid GraphFomer for 3D Human Pose Estimation with Trajectory Consistency in Frequency Domain | Kai Zhai et.al. | 2511.01756 | null |
| 2025-11-03 | Clutter Suppression in Bistatic ISAC with Joint Angle and Doppler Estimation | M. Ertug Pihtili et.al. | 2511.01599 | null |
| 2025-11-03 | Defining Energy Indicators for Impact Identification on Aerospace Composites: A Physics-Informed Machine Learning Perspective | Natália Ribeiro Marinho et.al. | 2511.01592 | null |
| 2025-11-03 | SE(3)-PoseFlow: Estimating 6D Pose Distributions for Uncertainty-Aware Robotic Manipulation | Yufeng Jin et.al. | 2511.01501 | null |
| 2025-11-03 | Floor Plan-Guided Visual Navigation Incorporating Depth and Directional Cues | Wei Huang et.al. | 2511.01493 | null |
| 2025-11-03 | Tackling the Kidnapped Robot Problem via Sparse Feasible Hypothesis Sampling and Reliable Batched Multi-Stage Inference | Muhua Zhang et.al. | 2511.01219 | null |
| 2025-11-03 | LiDAR-VGGT: Cross-Modal Coarse-to-Fine Fusion for Globally Consistent and Metric-Scale Dense Mapping | Lijie Wang et.al. | 2511.01186 | null |
| 2025-11-03 | Web-Scale Collection of Video Data for 4D Animal Reconstruction | Brian Nlong Zhao et.al. | 2511.01169 | null |
| 2025-11-01 | Active learning-based variance reduction for Monte Carlo simulations: A feasibility study for the nanodosimetry around a gold nanoparticle | Leo Thomas et.al. | 2511.00563 | null |
| 2025-10-31 | Residual Balancing for Non-Linear Outcome Models in High Dimensions | Isaac Meza et.al. | 2511.00324 | null |
| 2025-10-31 | On the well-posedness of the intermediate nonlinear Schrödinger equation on the line | Andreia Chapouto et.al. | 2511.00302 | null |
| 2025-10-31 | VLM6D: VLM based 6Dof Pose Estimation based on RGB-D Images | Md Selim Sarowar et.al. | 2511.00120 | null |
| 2025-10-31 | FedAdamW: A Communication-Efficient Optimizer with Convergence and Generalization Guarantees for Federated Large Models | Junkang Liu et.al. | 2510.27486 | null |
| 2025-10-31 | Improved refined bilinear estimates and well-posedness for generalized KdV type equations on $\mathbb{R}$ | Luc Molinet et.al. | 2510.27461 | null |
| 2025-10-30 | Cooperative Integrated Estimation-Guidance for Simultaneous Interception of Moving Targets | Lohitvel Gopikannan et.al. | 2510.26948 | null |
| 2025-10-30 | Graph Guided Modulo Recovery of EEG Signals | Soujanya Hazra et.al. | 2510.26756 | null |
| 2025-10-30 | Orbital Optimization and Neural-Network-Assisted Configuration Interaction Calculations of Rydberg States | Gianluca Levi et.al. | 2510.26751 | null |
| 2025-10-30 | Tight Differentially Private PCA via Matrix Coherence | Tommaso d’Orsi et.al. | 2510.26679 | null |
| 2025-10-30 | Statistical Inference for Matching Decisions via Matrix Completion under Dependent Missingness | Congyuan Duan et.al. | 2510.26478 | null |
| 2025-10-30 | Transcending Sparse Measurement Limits: Operator-Learning-Driven Data Super-Resolution for Inverse Source Problem | Guanyu Pan et.al. | 2510.26227 | null |
| 2025-10-30 | Sketch2PoseNet: Efficient and Generalized Sketch to 3D Human Pose Prediction | Li Wang et.al. | 2510.26196 | null |
| 2025-10-30 | JOGS: Joint Optimization of Pose Estimation and 3D Gaussian Splatting | Yuxuan Li et.al. | 2510.26117 | null |
| 2025-10-29 | STITCH 2.0: Extending Augmented Suturing with EKF Needle Estimation and Thread Management | Kush Hari et.al. | 2510.25768 | null |
| 2025-10-29 | Inverse-free quantum state estimation with Heisenberg scaling | Kean Chen et.al. | 2510.25750 | null |
| 2025-10-29 | LieSolver: A PDE-constrained solver for IBVPs using Lie symmetries | René P. Klausen et.al. | 2510.25731 | null |
| 2025-10-29 | Seeing Clearly and Deeply: An RGBD Imaging Approach with a Bio-inspired Monocentric Design | Zongxi Yu et.al. | 2510.25314 | null |
| 2025-10-29 | Non-Invasive Calibration Of A Stewart Platform By Photogrammetry | Sourabh Karmakar et.al. | 2510.25072 | null |
| 2025-10-28 | A Black Box Variational Inference Scheme for Inverse Problems with Demanding Physics-Based Models | G. Robalo Rei et.al. | 2510.25038 | null |
| 2025-10-28 | Understanding Multi-View Transformers | Michal Stary et.al. | 2510.24907 | null |
| 2025-10-28 | Greedy Sampling Is Provably Efficient for RLHF | Di Wu et.al. | 2510.24700 | null |
| 2025-10-28 | GeVI-SLAM: Gravity-Enhanced Stereo Visual Inertial SLAM for Underwater Robots | Yuan Shen et.al. | 2510.24533 | null |
| 2025-10-28 | Contributions to Semialgebraic-Set-Based Stability Verification of Dynamical Systems with Neural-Network-Based Controllers | Alvaro Detailleur et.al. | 2510.24391 | null |
| 2025-10-28 | Global-State-Free Obstacle Avoidance for Quadrotor Control in Air-Ground Cooperation | Baozhe Zhang et.al. | 2510.24315 | null |
| 2025-10-26 | Policies over Poses: Reinforcement Learning based Distributed Pose-Graph Optimization for Multi-Robot SLAM | Sai Krishna Ghanta et.al. | 2510.22740 | null |
| 2025-10-26 | Cross-Species Transfer Learning in Agricultural AI: Evaluating ZebraPose Adaptation for Dairy Cattle Pose Estimation | Mackenzie Tapp et.al. | 2510.22618 | null |
| 2025-10-26 | DynaPose4D: High-Quality 4D Dynamic Content Generation via Pose Alignment Loss | Jing Yang et.al. | 2510.22473 | null |
| 2025-10-25 | Breaking the Static Assumption: A Dynamic-Aware LIO Framework Via Spatio-Temporal Normal Analysis | Chen Zhiqiang et.al. | 2510.22313 | null |
| 2025-10-18 | Multi-Agent Pose Uncertainty: A Differentiable Rendering Cramér-Rao Bound | Arun Muthukkumar et.al. | 2510.21785 | null |
| 2025-10-24 | Group Inertial Poser: Multi-Person Pose and Global Translation from Sparse Inertial Sensors and Ultra-Wideband Ranging | Ying Xue et.al. | 2510.21654 | null |
| 2025-10-23 | BioDet: Boosting Industrial Object Detection with Image Preprocessing Strategies | Jiaqi Hu et.al. | 2510.21000 | null |
| 2025-10-23 | ROPES: Robotic Pose Estimation via Score-Based Causal Representation Learning | Pranamya Kulkarni et.al. | 2510.20884 | null |
| 2025-10-23 | Monocular Visual 8D Pose Estimation for Articulated Bicycles and Cyclists | Eduardo R. Corral-Soto et.al. | 2510.20158 | null |
| 2025-10-22 | AI Pose Analysis and Kinematic Profiling of Range-of-Motion Variations in Resistance Training | Adam Diamant et.al. | 2510.20012 | null |
| 2025-10-22 | PoseCrafter: Extreme Pose Estimation with Hybrid Video Synthesis | Qing Mao et.al. | 2510.19527 | null |
| 2025-10-22 | PRGCN: A Graph Memory Network for Cross-Sequence Pattern Reuse in 3D Human Pose Estimation | Zhuoyang Xie et.al. | 2510.19475 | null |
| 2025-10-21 | Kinematic Analysis and Integration of Vision Algorithms for a Mobile Manipulator Employed Inside a Self-Driving Laboratory | Shifa Sulaiman et.al. | 2510.19081 | null |
| 2025-10-21 | UniHPR: Unified Human Pose Representation via Singular Value Contrastive Learning | Zhongyu Jiang et.al. | 2510.19078 | null |
| 2025-10-21 | PLANA3R: Zero-shot Metric Planar 3D Reconstruction via Feed-Forward Planar Splatting | Changkun Liu et.al. | 2510.18714 | null |
| 2025-10-21 | RayPose: Ray Bundling Diffusion for Template Views in Unseen 6D Object Pose Estimation | Junwen Huang et.al. | 2510.18521 | null |
| 2025-10-20 | Adapting Stereo Vision From Objects To 3D Lunar Surface Reconstruction with the StereoLunar Dataset | Clementine Grethen et.al. | 2510.18172 | null |
| 2025-10-20 | Raindrop GS: A Benchmark for 3D Gaussian Splatting under Raindrop Conditions | Zhiqiang Teng et.al. | 2510.17719 | null |
| 2025-10-20 | PAGE-4D: Disentangled Pose and Geometry Estimation for 4D Perception | Kaichen Zhou et.al. | 2510.17568 | null |
| 2025-10-20 | KineDiff3D: Kinematic-Aware Diffusion for Category-Level Articulated Object Shape Reconstruction and Generation | WenBo Xu et.al. | 2510.17137 | null |
| 2025-10-19 | How Universal Are SAM2 Features? | Masoud Khairi Atani et.al. | 2510.17051 | null |
| 2025-10-19 | GS2POSE: Marry Gaussian Splatting to 6D Object Pose Estimation | Junbo Li et.al. | 2510.16777 | null |
| 2025-10-18 | SPLite Hand: Sparsity-Aware Lightweight 3D Hand Pose Estimation | Yeh Keng Hao et.al. | 2510.16396 | null |
| 2025-10-17 | Proactive Scene Decomposition and Reconstruction | Baicheng Li et.al. | 2510.16272 | null |
| 2025-10-17 | Valeo Near-Field: a novel dataset for pedestrian intent detection | Antonyo Musabini et.al. | 2510.15673 | null |
| 2025-10-17 | Freehand 3D Ultrasound Imaging: Sim-in-the-Loop Probe Pose Optimization via Visual Servoing | Yameng Zhang et.al. | 2510.15668 | null |
| 2025-10-17 | MRASfM: Multi-Camera Reconstruction and Aggregation through Structure-from-Motion in Driving Scenes | Lingfeng Xuan et.al. | 2510.15467 | null |
| 2025-10-17 | PFGS: Pose-Fused 3D Gaussian Splatting for Complete Multi-Pose Object Reconstruction | Ting-Yu Yen et.al. | 2510.15386 | null |
| 2025-10-17 | Proto-Former: Unified Facial Landmark Detection by Prototype Transformer | Shengkai Hu et.al. | 2510.15338 | null |
| 2025-10-17 | CuSfM: CUDA-Accelerated Structure-from-Motion | Jingrui Yu et.al. | 2510.15271 | null |
| 2025-10-17 | LVI-Q: Robust LiDAR-Visual-Inertial-Kinematic Odometry for Quadruped Robots Using Tightly-Coupled and Efficient Alternating Optimization | Kevin Christiansen Marsim et.al. | 2510.15220 | null |
| 2025-10-16 | C4D: 4D Made from 3D through Dual Correspondences | Shizun Wang et.al. | 2510.14960 | null |
| 2025-10-16 | Spatially anchored Tactile Awareness for Robust Dexterous Manipulation | Jialei Huang et.al. | 2510.14647 | null |
| 2025-10-15 | DAMM-LOAM: Degeneracy Aware Multi-Metric LiDAR Odometry and Mapping | Nishant Chandna et.al. | 2510.13287 | null |
| 2025-10-15 | Convergence, design and training of continuous-time dropout as a random batch method | Antonio Álvarez-López et.al. | 2510.13134 | null |
| 2025-10-15 | True Self-Supervised Novel View Synthesis is Transferable | Thomas W. Mitchel et.al. | 2510.13063 | null |
| 2025-10-14 | SPORTS: Simultaneous Panoptic Odometry, Rendering, Tracking and Segmentation for Urban Scenes Understanding | Zhiliu Yang et.al. | 2510.12749 | null |
| 2025-10-14 | On the Use of Hierarchical Vision Foundation Models for Low-Cost Human Mesh Recovery and Pose Estimation | Shuhei Tarashima et.al. | 2510.12660 | null |
| 2025-10-13 | Lightweight Facial Landmark Detection in Thermal Images via Multi-Level Cross-Modal Knowledge Transfer | Qiyi Tong et.al. | 2510.11128 | null |
| 2025-10-13 | High-Resolution Spatiotemporal Modeling with Global-Local State Space Models for Video-Based Human Pose Estimation | Runyang Feng et.al. | 2510.11017 | null |
| 2025-10-13 | DKPMV: Dense Keypoints Fusion from Multi-View RGB Frames for 6D Pose Estimation of Textureless Objects | Jiahong Chen et.al. | 2510.10933 | null |
| 2025-10-12 | MonoSE(3)-Diffusion: A Monocular SE(3) Diffusion Framework for Robust Camera-to-Robot Pose Estimation | Kangjian Zhu et.al. | 2510.10434 | null |
| 2025-10-11 | HccePose(BF): Predicting Front & Back Surfaces to Construct Ultra-Dense 2D-3D Correspondences for Pose Estimation | Yulin Wang et.al. | 2510.10177 | null |
| 2025-10-11 | Gesplat: Robust Pose-Free 3D Reconstruction via Geometry-Guided Gaussian Splatting | Jiahui Lu et.al. | 2510.10097 | null |
| 2025-10-11 | FORM: Fixed-Lag Odometry with Reparative Mapping utilizing Rotating LiDAR Sensors | Easton R. Potokar et.al. | 2510.09966 | null |
| 2025-10-10 | An uncertainty-aware framework for data-efficient multi-view animal pose estimation | Lenny Aharon et.al. | 2510.09903 | null |
| 2025-10-10 | Cross-Sensor Touch Generation | Samanta Rodriguez et.al. | 2510.09817 | null |
| 2025-10-10 | mmJoints: Expanding Joint Representations Beyond (x,y,z) in mmWave-Based 3D Pose Estimation | Zhenyu Wang et.al. | 2510.08970 | null |
| 2025-10-09 | ARTDECO: Towards Efficient and High-Fidelity On-the-Fly 3D Reconstruction with Structured Scene Representation | Guanghao Li et.al. | 2510.08551 | null |
| 2025-10-09 | DexMan: Learning Bimanual Dexterous Manipulation from Human and Generated Videos | Jhen Hsieh et.al. | 2510.08475 | null |
| 2025-10-09 | GraphEnet: Event-driven Human Pose Estimation with a Graph Neural Network | Gaurvi Goyal et.al. | 2510.07990 | null |
| 2025-10-08 | TIGeR: Tool-Integrated Geometric Reasoning in Vision-Language Models for Robotics | Yi Han et.al. | 2510.07181 | null |
| 2025-10-07 | Human3R: Everyone Everywhere All at Once | Yue Chen et.al. | 2510.06219 | null |
| 2025-10-07 | DeLTa: Demonstration and Language-Guided Novel Transparent Object Manipulation | Taeyeop Lee et.al. | 2510.05662 | null |
| 2025-10-07 | Correlation-Aware Dual-View Pose and Velocity Estimation for Dynamic Robotic Manipulation | Mahboubeh Zarei et.al. | 2510.05536 | null |
| 2025-10-05 | Joint Learning of Pose Regression and Denoising Diffusion with Score Scaling Sampling for Category-level 6D Pose Estimation | Seunghyun Lee et.al. | 2510.04125 | null |
| 2025-10-04 | TCB-VIO: Tightly-Coupled Focal-Plane Binary-Enhanced Visual Inertial Odometry | Matthew Lisondra et.al. | 2510.03919 | null |
| 2025-10-04 | Adaptively Sampling-Reusing-Mixing Decomposed Gradients to Speed Up Sharpness Aware Minimization | Jiaxin Deng et.al. | 2510.03763 | null |
| 2025-10-03 | Efficient Surgical Robotic Instrument Pose Reconstruction in Real World Conditions Using Unified Feature Detection | Zekai Liang et.al. | 2510.03532 | null |
| 2025-10-02 | Visual Odometry with Transformers | Vladimir Yugay et.al. | 2510.03348 | null |
| 2025-10-03 | Geometry Meets Vision: Revisiting Pretrained Semantics in Distilled Fields | Zhiting Mei et.al. | 2510.03104 | null |
| 2025-10-03 | VERNIER: an open-source software pushing marker pose estimation down to the micrometer and nanometer scales | Patrick Sandoz et.al. | 2510.02791 | null |
| 2025-10-02 | PhysHMR: Learning Humanoid Control Policies from Vision for Physically Plausible Human Motion Reconstruction | Qiao Feng et.al. | 2510.02566 | null |
| 2025-10-02 | Paving the Way Towards Kinematic Assessment Using Monocular Video: A Preclinical Benchmark of State-of-the-Art Deep-Learning-Based 3D Human Pose Estimators Against Inertial Sensors in Daily Living Activities | Mario Medrano-Paredes et.al. | 2510.02264 | null |
| 2025-10-02 | Zero-shot Human Pose Estimation using Diffusion-based Inverse solvers | Sahil Bhandary Karnoor et.al. | 2510.02043 | null |
| 2025-10-02 | An Efficient Deep Template Matching and In-Plane Pose Estimation Method via Template-Aware Dynamic Convolution | Ke Jia et.al. | 2510.01678 | null |
| 2025-10-01 | Pose Estimation of a Thruster-Driven Bioinspired Multi-Link Robot | Nicholas B. Andrews et.al. | 2510.01485 | null |
| 2025-10-01 | Temporal Score Rescaling for Temperature Sampling in Diffusion and Flow Models | Yanbo Xu et.al. | 2510.01184 | null |
| 2025-10-01 | Enabling High-Frequency Cross-Modality Visual Positioning Service for Accurate Drone Landing | Haoyang Wang et.al. | 2510.00646 | null |
| 2025-10-01 | Cascaded Diffusion Framework for Probabilistic Coarse-to-Fine Hand Pose Estimation | Taeyun Woo et.al. | 2510.00527 | null |
| 2025-10-01 | Affordance-Guided Diffusion Prior for 3D Hand Reconstruction | Naru Suzuki et.al. | 2510.00506 | null |
| 2025-09-30 | TTT3R: 3D Reconstruction as Test-Time Training | Xingyu Chen et.al. | 2509.26645 | link |
| 2025-09-30 | A Multi-purpose Tracking Framework for Salmon Welfare Monitoring in Challenging Environments | Espen Uri Høgstedt et.al. | 2509.25969 | null |
| 2025-09-30 | Physics-Informed Learning for Human Whole-Body Kinematics Prediction via Sparse IMUs | Cheng Guo et.al. | 2509.25704 | null |
| 2025-09-29 | Robust Visual Localization in Compute-Constrained Environments by Salient Edge Rendering and Weighted Hamming Similarity | Tu-Hoa Pham et.al. | 2509.25520 | null |
| 2025-09-29 | VGGT-X: When VGGT Meets Dense Novel View Synthesis | Yang Liu et.al. | 2509.25191 | link |
| 2025-09-29 | PAD3R: Pose-Aware Dynamic 3D Reconstruction from Casual Videos | Ting-Hsuan Liao et.al. | 2509.25183 | null |
| 2025-09-29 | SDPose: Exploiting Diffusion Priors for Out-of-Domain and Robust Pose Estimation | Shuang Liang et.al. | 2509.24980 | link |
| 2025-09-29 | PoseDiff: A Unified Diffusion Model Bridging Robot Pose Estimation and Video-to-Action Control | Haozhuo Zhang et.al. | 2509.24591 | null |
| 2025-09-29 | SCOPE: Semantic Conditioning for Sim2Real Category-Level Object Pose Estimation in Robotics | Peter Hönig et.al. | 2509.24572 | null |
| 2025-09-28 | GRS-SLAM3R: Real-Time Dense SLAM with Gated Recurrent State | Guole Shen et.al. | 2509.23737 | null |
| 2025-09-28 | Color-Pair Guided Robust Zero-Shot 6D Pose Estimation and Tracking of Cluttered Objects on Edge Devices | Xingjian Yang et.al. | 2509.23647 | null |
| 2025-09-27 | 3DPCNet: Pose Canonicalization for Robust Viewpoint-Invariant 3D Kinematic Analysis from Monocular RGB cameras | Tharindu Ekanayake et.al. | 2509.23455 | null |
| 2025-09-27 | Generative Modeling of Shape-Dependent Self-Contact Human Poses | Takehiko Ohkawa et.al. | 2509.23393 | null |
| 2025-09-27 | UniPose: Unified Cross-modality Pose Prior Propagation towards RGB-D data for Weakly Supervised 3D Human Pose Estimation | Jinghong Zheng et.al. | 2509.23376 | null |
| 2025-09-27 | GeLoc3r: Enhancing Relative Camera Pose Regression with Geometric Consistency Regularization | Jingxing Li et.al. | 2509.23038 | null |
| 2025-09-26 | Good Weights: Proactive, Adaptive Dead Reckoning Fusion for Continuous and Robust Visual SLAM | Yanwei Du et.al. | 2509.22910 | null |
| 2025-09-26 | ControlEvents: Controllable Synthesis of Event Camera Data with Foundational Prior from Image Diffusion Models | Yixuan Hu et.al. | 2509.22864 | null |
| 2025-09-26 | An Adaptive ICP LiDAR Odometry Based on Reliable Initial Pose | Qifeng Wang et.al. | 2509.22058 | null |
| 2025-09-26 | SingRef6D: Monocular Novel Object Pose Estimation with a Single RGB Reference | Jiahui Wang et.al. | 2509.21927 | null |
| 2025-09-24 | mmHSense: Multi-Modal and Distributed mmWave ISAC Datasets for Human Sensing | Nabeel Nisar Bhat et.al. | 2509.21396 | null |
| 2025-09-25 | Finding 3D Positions of Distant Objects from Noisy Camera Movement and Semantic Segmentation Sequences | Julius Pesonen et.al. | 2509.20906 | null |
| 2025-09-25 | AI-Enabled Crater-Based Navigation for Lunar Mapping | Sofia McLeod et.al. | 2509.20748 | null |
| 2025-09-25 | EEG-Driven AR-Robot System for Zero-Touch Grasping Manipulation | Junzhe Wang et.al. | 2509.20656 | null |
| 2025-09-24 | Reflect3r: Single-View 3D Stereo Reconstruction Aided by Mirror Reflections | Jing Wu et.al. | 2509.20607 | null |
| 2025-09-24 | AJAHR: Amputated Joint Aware 3D Human Mesh Recovery | Hyunjin Cho et.al. | 2509.19939 | null |
| 2025-09-23 | Category-Level Object Shape and Pose Estimation in Less Than a Millisecond | Lorenzo Shaikewitz et.al. | 2509.18979 | null |
| 2025-09-23 | Towards Robust LiDAR Localization: Deep Learning-based Uncertainty Estimation | Minoo Dolatabadi et.al. | 2509.18954 | null |
| 2025-09-23 | Human-Interpretable Uncertainty Explanations for Point Cloud Registration | Johannes A. Gaus et.al. | 2509.18786 | null |
| 2025-09-23 | SINGER: An Onboard Generalist Vision-Language Navigation Policy for Drones | Maximilian Adang et.al. | 2509.18610 | null |
| 2025-09-22 | Selecting Optimal Camera Views for Gait Analysis: A Multi-Metric Assessment of 2D Projections | Dong Chen et.al. | 2509.17805 | null |
| 2025-09-22 | Evict3R: Training-Free Token Eviction for Memory-Bounded Streaming Visual Geometry Transformers | Soroush Mahdi et.al. | 2509.17650 | null |
| 2025-09-22 | VideoArtGS: Building Digital Twins of Articulated Objects from Monocular Video | Yu Liu et.al. | 2509.17647 | null |
| 2025-09-22 | Pose Estimation of a Cable-Driven Serpentine Manipulator Utilizing Intrinsic Dynamics via Physical Reservoir Computing | Kazutoshi Tanaka et.al. | 2509.17308 | null |
| 2025-09-21 | SPFSplatV2: Efficient Self-Supervised Pose-Free 3D Gaussian Splatting from Sparse Views | Ranran Huang et.al. | 2509.17246 | null |
| 2025-09-21 | Leveraging RGB Images for Pre-Training of Event-Based Hand Pose Estimation | Ruicong Liu et.al. | 2509.16949 | null |
| 2025-09-19 | UniTac2Pose: A Unified Approach Learned in Simulation for Category-level Visuotactile In-hand Pose Estimation | Mingdong Wu et.al. | 2509.15934 | null |
| 2025-09-19 | Self-Supervised Cross-Modal Learning for Image-to-Point Cloud Registration | Xingmei Wang et.al. | 2509.15882 | null |
| 2025-09-19 | STARC: See-Through-Wall Augmented Reality Framework for Human-Robot Collaboration in Emergency Response | Shenghai Yuan et.al. | 2509.15507 | null |
| 2025-09-18 | NeRF-based Visualization of 3D Cues Supporting Data-Driven Spacecraft Pose Estimation | Antoine Legrand et.al. | 2509.14890 | null |
| 2025-09-17 | SWA-PF: Semantic-Weighted Adaptive Particle Filter for Memory-Efficient 4-DoF UAV Localization in GNSS-Denied Environments | Jiayu Yuan et.al. | 2509.13795 | null |
| 2025-09-17 | Bridging the Synthetic-Real Gap: Supervised Domain Adaptation for Robust Spacecraft 6-DoF Pose Estimation | Inder Pal Singh et.al. | 2509.13792 | null |
| 2025-09-17 | UM-Depth: Uncertainty Masked Self-Supervised Monocular Depth Estimation with Visual Odometry | Tae-Wook Um et.al. | 2509.13713 | null |
| 2025-09-17 | Gaussian Alignment for Relative Camera Pose Estimation via Single-View Reconstruction | Yumin Li et.al. | 2509.13652 | null |
| 2025-09-16 | Object Pose Estimation through Dexterous Touch | Amir-Hossein Shahidzadeh et.al. | 2509.13591 | null |
| 2025-09-16 | Using Visual Language Models to Control Bionic Hands: Assessment of Object Perception and Grasp Inference | Ozan Karaali et.al. | 2509.13572 | null |
| 2025-09-16 | ROOM: A Physics-Based Continuum Robot Simulator for Photorealistic Medical Datasets Generation | Salvatore Esposito et.al. | 2509.13177 | link |
| 2025-09-15 | 3D Human Pose and Shape Estimation from LiDAR Point Clouds: A Review | Salma Galaaoui et.al. | 2509.12197 | null |
| 2025-09-15 | Robust Fetal Pose Estimation across Gestational Ages via Cross-Population Augmentation | Sebastian Diaz et.al. | 2509.12062 | null |
| 2025-09-15 | Segmentation-Driven Initialization for Sparse-view 3D Gaussian Splatting | Yi-Hsin Li et.al. | 2509.11853 | null |
| 2025-09-15 | IMD: A 6-DoF Pose Estimation Benchmark for Industrial Metallic Objects | Ruimin Ma et.al. | 2509.11680 | null |
| 2025-09-14 | ActivePose: Active 6D Object Pose Estimation and Tracking for Robotic Manipulation | Sheng Liu et.al. | 2509.11364 | null |
| 2025-09-13 | AutoOEP – A Multi-modal Framework for Online Exam Proctoring | Aryan Kashyap Naveen et.al. | 2509.10887 | null |
| 2025-09-09 | HiLWS: A Human-in-the-Loop Weak Supervision Framework for Curating Clinical and Home Video Data for Neurological Assessment | Atefeh Irani et.al. | 2509.10557 | null |
| 2025-09-12 | Self-supervised Learning Of Visual Pose Estimation Without Pose Labels By Classifying LED States | Nicholas Carlotti et.al. | 2509.10405 | null |
| 2025-09-11 | MimicDroid: In-Context Learning for Humanoid Robot Manipulation from Human Play Videos | Rutav Shah et.al. | 2509.09769 | null |
| 2025-09-10 | MultimodalHugs: Enabling Sign Language Processing in Hugging Face | Gerard Sant et.al. | 2509.09729 | null |
| 2025-09-09 | Australian Supermarket Object Set (ASOS): A Benchmark Dataset of Physical Objects and 3D Models for Robotics and Computer Vision | Akansel Cosgun et.al. | 2509.09720 | null |
| 2025-09-10 | iMatcher: Improve matching in point cloud registration via local-to-global geometric consistency learning | Karim Slimani et.al. | 2509.08982 | null |
| 2025-09-10 | PianoVAM: A Multimodal Piano Performance Dataset | Yonghyun Kim et.al. | 2509.08800 | null |
| 2025-09-10 | Deep Visual Odometry for Stereo Event Cameras | Sheng Zhong et.al. | 2509.08235 | null |
| 2025-09-09 | SVN-ICP: Uncertainty Estimation of ICP-based LiDAR Odometry using Stein Variational Newton | Shiping Ma et.al. | 2509.08069 | null |
| 2025-09-09 | One View, Many Worlds: Single-Image to 3D Object Meets Generative Domain Randomization for One-Shot 6D Pose Estimation | Zheng Geng et.al. | 2509.07978 | null |
| 2025-09-09 | Parse Graph-Based Visual-Language Interaction for Human Pose Estimation | Shibang Liu et.al. | 2509.07385 | null |
| 2025-09-08 | H$_{2}$OT: Hierarchical Hourglass Tokenizer for Efficient Video Pose Transformers | Wenhao Li et.al. | 2509.06956 | null |
| 2025-09-08 | Musculoskeletal simulation of limb movement biomechanics in Drosophila melanogaster | Pembe Gizem Özdil et.al. | 2509.06426 | null |
| 2025-09-07 | DVLO4D: Deep Visual-Lidar Odometry with Sparse Spatial-temporal Fusion | Mengmeng Liu et.al. | 2509.06023 | null |
| 2025-09-07 | Motion Aware ViT-based Framework for Monocular 6-DoF Spacecraft Pose Estimation | Jose Sosa et.al. | 2509.06000 | null |
| 2025-09-06 | Multi-LVI-SAM: A Robust LiDAR-Visual-Inertial Odometry for Multiple Fisheye Cameras | Xinyu Zhang et.al. | 2509.05740 | null |
| 2025-09-05 | WinT3R: Window-Based Streaming Reconstruction with Camera Token Pool | Zizun Li et.al. | 2509.05296 | link |
| 2025-09-04 | Odometry Calibration and Pose Estimation of a 4WIS4WID Mobile Wall Climbing Robot | Branimir Ćaran et.al. | 2509.04016 | null |
| 2025-09-03 | SmartPoser: Arm Pose Estimation with a Smartphone and Smartwatch Using UWB and IMU Data | Nathan DeVrio et.al. | 2509.03451 | null |
| 2025-09-03 | Towards Realistic Hand-Object Interaction with Gravity-Field Based Diffusion Bridge | Miao Xu et.al. | 2509.03114 | null |
| 2025-09-03 | IL-SLAM: Intelligent Line-assisted SLAM Based on Feature Awareness for Dynamic Environments | Haolan Zhang et.al. | 2509.02972 | null |
| 2025-09-02 | Robotic 3D Flower Pose Estimation for Small-Scale Urban Farms | Harsh Muriki et.al. | 2509.02870 | null |
| 2025-09-02 | Generalizing Unsupervised Lidar Odometry Model from Normal to Snowy Weather Conditions | Beibei Zhou et.al. | 2509.02011 | null |
| 2025-09-02 | Doctoral Thesis: Geometric Deep Learning For Camera Pose Prediction, Registration, Depth Estimation, and 3D Reconstruction | Xueyang Kang et.al. | 2509.01873 | null |
| 2025-09-01 | FGO-SLAM: Enhancing Gaussian SLAM with Globally Consistent Opacity Radiance Field | Fan Zhu et.al. | 2509.01547 | null |
| 2025-09-01 | Learning Correlation-aware Aleatoric Uncertainty for 3D Hand Pose Estimation | Lee Chae-Yeon et.al. | 2509.01242 | null |
| 2025-09-01 | SR-SLAM: Scene-reliability Based RGB-D SLAM in Diverse Environments | Haolan Zhang et.al. | 2509.01111 | null |
| 2025-09-01 | An End-to-End Framework for Video Multi-Person Pose Estimation | Zhihong Wei et.al. | 2509.01095 | null |
| 2025-08-31 | UPGS: Unified Pose-aware Gaussian Splatting for Dynamic Scene Deblurring | Zhijing Wu et.al. | 2509.00831 | null |
| 2025-08-31 | DyPho-SLAM: Real-time Photorealistic SLAM in Dynamic Environments | Yi Liu et.al. | 2509.00741 | null |
| 2025-08-31 | MV-SSM: Multi-View State Space Modeling for 3D Human Pose Estimation | Aviral Chharia et.al. | 2509.00649 | null |
| 2025-08-30 | Generative Visual Foresight Meets Task-Agnostic Pose Estimation in Robotic Table-Top Manipulation | Chuye Zhang et.al. | 2509.00361 | null |
| 2025-08-24 | Performance is not All You Need: Sustainability Considerations for Algorithms | Xiang Li et.al. | 2509.00045 | null |
| 2025-08-29 | Efficient Diffusion-Based 3D Human Pose Estimation with Hierarchical Temporal Pruning | Yuquan Bi et.al. | 2508.21363 | null |
| 2025-08-28 | PHD: Personalized 3D Human Body Fitting with Point Diffusion | Hsuan-I Ho et.al. | 2508.21257 | null |
| 2025-08-27 | ROBUST-MIPS: A Combined Skeletal Pose and Instance Segmentation Dataset for Laparoscopic Surgical Instruments | Zhe Han et.al. | 2508.21096 | null |
| 2025-08-28 | COMETH: Convex Optimization for Multiview Estimation and Tracking of Humans | Enrico Martini et.al. | 2508.20920 | null |
| 2025-08-28 | Estimating 2D Keypoints of Surgical Tools Using Vision-Language Models with Low-Rank Adaptation | Krit Duangprom et.al. | 2508.20830 | null |
| 2025-08-27 | WEBEYETRACK: Scalable Eye-Tracking for the Browser via On-Device Few-Shot Personalization | Eduardo Davalos et.al. | 2508.19544 | null |
| 2025-08-21 | PriorFormer: A Transformer for Real-time Monocular 3D Human Pose Estimation with Versatile Geometric Priors | Mohamed Adjel et.al. | 2508.18238 | null |
| 2025-08-25 | SAIL-Recon: Large SfM by Augmenting Scene Regression with Localization | Junyuan Deng et.al. | 2508.17972 | null |
| 2025-08-25 | Camera Pose Refinement via 3D Gaussian Splatting | Lulu Hao et.al. | 2508.17876 | null |
| 2025-08-25 | DroneKey: Drone 3D Pose Estimation in Image Sequences using Gated Key-representation and Pose-adaptive Learning | Seo-Bin Hwang et.al. | 2508.17746 | null |
| 2025-08-25 | IDU: Incremental Dynamic Update of Existing 3D Virtual Environments with New Imagery Data | Meida Chen et.al. | 2508.17579 | null |
| 2025-08-24 | PersPose: 3D Human Pose Estimation with Perspective Encoding and Perspective Rotation | Xiaoyang Hao et.al. | 2508.17239 | link |
| 2025-08-23 | Fiducial Marker Splatting for High-Fidelity Robotics Simulations | Diram Tabaa et.al. | 2508.17012 | null |
| 2025-08-22 | An Investigation of Visual Foundation Models Robustness | Sandeep Gupta et.al. | 2508.16225 | null |
| 2025-08-21 | UnPose: Uncertainty-Guided Diffusion Priors for Zero-Shot Pose Estimation | Zhaodong Jiang et.al. | 2508.15972 | null |
| 2025-08-21 | MExECON: Multi-view Extended Explicit Clothed humans Optimized via Normal integration | Fulden Ece Uğur et.al. | 2508.15500 | null |
| 2025-08-21 | Lang2Lift: A Framework for Language-Guided Pallet Detection and Pose Estimation Integrated in Autonomous Outdoor Forklift Operation | Huy Hoang Nguyen et.al. | 2508.15427 | null |
| 2025-08-20 | A Vision-Based Shared-Control Teleoperation Scheme for Controlling the Robotic Arm of a Four-Legged Robot | Murilo Vinicius da Silva et.al. | 2508.14994 | null |
| 2025-08-20 | You Only Pose Once: A Minimalist’s Detection Transformer for Monocular RGB Category-level 9D Multi-Object Pose Estimation | Hakjin Lee et.al. | 2508.14965 | null |
| 2025-08-19 | Heatmap Regression without Soft-Argmax for Facial Landmark Detection | Chiao-An Yang et.al. | 2508.14929 | null |
| 2025-08-20 | 6-DoF Object Tracking with Event-based Optical Flow and Frames | Zhichao Li et.al. | 2508.14776 | null |
| 2025-08-20 | Fusing Monocular RGB Images with AIS Data to Create a 6D Pose Estimation Dataset for Marine Vessels | Fabian Holst et.al. | 2508.14767 | null |
| 2025-08-20 | GeMS: Efficient Gaussian Splatting for Extreme Motion Blur | Gopi Raju Matta et.al. | 2508.14682 | null |
| 2025-08-20 | Consistent Pose Estimation of Unmanned Ground Vehicles through Terrain-Aided Multi-Sensor Fusion on Geometric Manifolds | Alexander Raab et.al. | 2508.14661 | null |
| 2025-08-20 | From Slices to Structures: Unsupervised 3D Reconstruction of Female Pelvic Anatomy from Freehand Transvaginal Ultrasound | Max Krähenmann et.al. | 2508.14552 | null |
| 2025-08-20 | HyperDiff: Hypergraph Guided Diffusion Model for 3D Human Pose Estimation | Bing Han et.al. | 2508.14431 | null |
| 2025-08-20 | Learning Point Cloud Representations with Pose Continuity for Depth-Based Category-Level 6D Object Pose Estimation | Zhujun Li et.al. | 2508.14358 | null |
| 2025-08-20 | D$^2$-LIO: Enhanced Optimization for LiDAR-IMU Odometry Considering Directional Degeneracy | Guodong Yao et.al. | 2508.14355 | null |
| 2025-08-19 | LongSplat: Robust Unposed 3D Gaussian Splatting for Casual Long Videos | Chin-Yang Lin et.al. | 2508.14041 | link |
| 2025-08-19 | MR6D: Benchmarking 6D Pose Estimation for Mobile Robots | Anas Gouda et.al. | 2508.13775 | null |
| 2025-08-19 | RCGNet: RGB-based Category-Level 6D Object Pose Estimation with Geometric Guidance | Sheng Yu et.al. | 2508.13623 | null |
| 2025-08-18 | Physically Plausible Data Augmentations for Wearable IMU-based Human Activity Recognition Using Physics Simulation | Nobuyuki Oishi et.al. | 2508.13284 | null |
| 2025-08-18 | Stable Diffusion-Based Approach for Human De-Occlusion | Seung Young Noh et.al. | 2508.12663 | null |
| 2025-08-15 | Unifying Scale-Aware Depth Prediction and Perceptual Priors for Monocular Endoscope Pose Estimation and Tissue Reconstruction | Muzammil Khan et.al. | 2508.11282 | null |
| 2025-08-15 | A Coarse-to-Fine Human Pose Estimation Method based on Two-stage Distillation and Progressive Graph Neural Network | Zhangjian Ji et.al. | 2508.11212 | null |
| 2025-08-12 | ViPE: Video Pose Engine for 3D Geometric Perception | Jiahui Huang et.al. | 2508.10934 | null |
| 2025-08-14 | The SET Perceptual Factors Framework: Towards Assured Perception for Autonomous Systems | Troi Williams et.al. | 2508.10798 | null |
| 2025-08-14 | Lameness detection in dairy cows using pose estimation and bidirectional LSTMs | Helena Russello et.al. | 2508.10643 | null |
| 2025-08-14 | EgoMusic-driven Human Dance Motion Estimation with Skeleton Mamba | Quang Nguyen et.al. | 2508.10522 | null |
| 2025-08-14 | eMamba: Efficient Acceleration Framework for Mamba Models in Edge Computing | Jiyong Kim et.al. | 2508.10370 | null |
| 2025-08-13 | Predictive Uncertainty for Runtime Assurance of a Real-Time Computer Vision-Based Landing System | Romeo Valentin et.al. | 2508.09732 | null |
| 2025-08-13 | Enhancing Monocular 3D Hand Reconstruction with Learned Texture Priors | Giorgos Karvounas et.al. | 2508.09629 | null |
| 2025-08-12 | DiffPose-Animal: A Language-Conditioned Diffusion Framework for Animal Pose Estimation | Tianyu Xiong et.al. | 2508.08783 | null |
| 2025-08-12 | QoE-Aware Service Provision for Mobile AR Rendering: An Agent-Driven Approach | Conghao Zhou et.al. | 2508.08627 | null |
| 2025-08-11 | Forecasting Continuous Non-Conservative Dynamical Systems in SO(3) | Lennart Bastian et.al. | 2508.07775 | null |
| 2025-08-10 | Generic Calibration: Pose Ambiguity/Linear Solution and Parametric-hybrid Pipeline | Yuqi Han et.al. | 2508.07217 | null |
| 2025-08-09 | AugLift: Boosting Generalization in Lifting-based 3D Human Pose Estimation | Nikolai Warner et.al. | 2508.07112 | null |
| 2025-08-09 | VOccl3D: A Video Benchmark Dataset for 3D Human Pose and Shape Estimation under real Occlusions | Yash Garg et.al. | 2508.06757 | null |
| 2025-08-08 | DiffCap: Diffusion-based Real-time Human Motion Capture using Sparse IMUs and a Monocular Camera | Shaohua Pan et.al. | 2508.06139 | null |
| 2025-08-06 | Surf3R: Rapid Surface Reconstruction from Sparse RGB Views in Seconds | Haodong Zhu et.al. | 2508.04508 | null |
| 2025-08-06 | RiemanLine: Riemannian Manifold Representation of 3D Lines for Factor Graph Optimization | Yanyan Li et.al. | 2508.04335 | null |
| 2025-08-05 | OmniShape: Zero-Shot Multi-Hypothesis Shape and Pose Estimation in the Real World | Katherine Liu et.al. | 2508.03669 | null |
| 2025-08-05 | FPG-NAS: FLOPs-Aware Gated Differentiable Neural Architecture Search for Efficient 6DoF Pose Estimation | Nassim Ali Ousalah et.al. | 2508.03618 | null |
| 2025-08-05 | RadProPoser: A Framework for Human Pose Estimation with Uncertainty Quantification from Raw Radar Data | Jonas Leo Mueller et.al. | 2508.03578 | null |
| 2025-08-05 | Vision-based Perception System for Automated Delivery Robot-Pedestrians Interactions | Ergi Tushe et.al. | 2508.03541 | null |
| 2025-08-05 | Semantic Mosaicing of Histo-Pathology Image Fragments using Visual Foundation Models | Stefan Brandstätter et.al. | 2508.03524 | null |
| 2025-08-05 | BaroPoser: Real-time Human Motion Tracking from IMUs and Barometers in Everyday Devices | Libo Zhang et.al. | 2508.03313 | null |
| 2025-08-05 | MVTOP: Multi-View Transformer-based Object Pose-Estimation | Lukas Ranftl et.al. | 2508.03243 | null |
| 2025-08-05 | COFFEE: A Shadow-Resilient Real-Time Pose Estimator for Unknown Tumbling Asteroids using Sparse Neural Networks | Arion Zimmermann et.al. | 2508.03132 | null |
| 2025-08-04 | PyCAT4: A Hierarchical Vision Transformer-based Framework for 3D Human Pose Estimation | Zongyou Yang et.al. | 2508.02806 | null |
| 2025-08-04 | PMGS: Reconstruction of Projectile Motion across Large Spatiotemporal Spans via 3D Gaussian Splatting | Yijun Xu et.al. | 2508.02660 | null |
| 2025-08-04 | SGAD: Semantic and Geometric-aware Descriptor for Local Feature Matching | Xiangzeng Liu et.al. | 2508.02278 | null |
| 2025-08-04 | Unified Category-Level Object Detection and Pose Estimation from RGB Images using 3D Prototypes | Tom Fischer et.al. | 2508.02157 | null |
| 2025-08-04 | YOLOv1 to YOLOv11: A Comprehensive Survey of Real-Time Object Detection Innovations and Challenges | Manikanta Kotthapalli et.al. | 2508.02067 | null |
| 2025-08-04 | StarPose: 3D Human Pose Estimation via Spatial-Temporal Autoregressive Diffusion | Haoxin Yang et.al. | 2508.02056 | link |
| 2025-08-03 | CVD-SfM: A Cross-View Deep Front-end Structure-from-Motion System for Sparse Localization in Multi-Altitude Scenes | Yaxuan Li et.al. | 2508.01936 | null |
| 2025-08-03 | IMUCoCo: Enabling Flexible On-Body IMU Placement for Human Pose Estimation and Activity Recognition | Haozhe Zhou et.al. | 2508.01894 | null |
| 2025-08-03 | ChairPose: Pressure-based Chair Morphology Grounded Sitting Pose Estimation through Simulation-Assisted Training | Lala Shakti Swarup Ray et.al. | 2508.01850 | null |
| 2025-08-02 | No Pose at All: Self-Supervised Pose-Free 3D Gaussian Splatting from Sparse Views | Ranran Huang et.al. | 2508.01171 | null |
| 2025-08-01 | CoProU-VO: Combining Projected Uncertainty for End-to-End Unsupervised Monocular Visual Odometry | Jingchao Xie et.al. | 2508.00568 | null |
| 2025-07-31 | Mitigating Resolution-Drift in Federated Learning: Case of Keypoint Detection | Taeheon Lim et.al. | 2507.23461 | null |
| 2025-07-31 | FASTopoWM: Fast-Slow Lane Segment Topology Reasoning with Latent World Models | Yiming Yang et.al. | 2507.23325 | null |
| 2025-07-30 | From Sharp to Blur: Unsupervised Domain Adaptation for 2D Human Pose Estimation Under Extreme Motion Blur Using Event Cameras | Youngho Kim et.al. | 2507.22438 | null |
| 2025-07-29 | LiteFat: Lightweight Spatio-Temporal Graph Learning for Real-Time Driver Fatigue Detection | Jing Ren et.al. | 2507.21756 | null |
| 2025-07-29 | Adaptive Prior Scene-Object SLAM for Dynamic Environments | Haolan Zhang et.al. | 2507.21709 | null |
| 2025-07-28 | PixelNav: Towards Model-based Vision-Only Navigation with Topological Graphs | Sergey Bakulin et.al. | 2507.20892 | null |
| 2025-07-28 | Beyond Line-of-Sight: Cooperative Localization Using Vision and V2X Communication | Annika Wong et.al. | 2507.20772 | null |
| 2025-07-28 | KASportsFormer: Kinematic Anatomy Enhanced Transformer for 3D Human Pose Estimation on Short Sports Scene Video | Zhuoer Yin et.al. | 2507.20763 | null |
| 2025-07-28 | Automated 3D-GS Registration and Fusion via Skeleton Alignment and Gaussian-Adaptive Features | Shiyang Liu et.al. | 2507.20480 | null |
| 2025-07-26 | A Structure-aware and Motion-adaptive Framework for 3D Human Pose Estimation with Mamba | Ye Lu et.al. | 2507.19852 | null |
| 2025-07-25 | Efficient Lines Detection for Robot Soccer | João G. Melo et.al. | 2507.19469 | null |
| 2025-07-25 | Fast Learning of Non-Cooperative Spacecraft 3D Models through Primitive Initialization | Pol Francesch Huc et.al. | 2507.19459 | null |
| 2025-07-24 | Unposed 3DGS Reconstruction with Probabilistic Procrustes Mapping | Chong Cheng et.al. | 2507.18541 | null |
| 2025-07-24 | NLML-HPE: Head Pose Estimation with Limited Data via Manifold Learning | Mahdi Ghafourian et.al. | 2507.18429 | null |
| 2025-07-24 | AF-RLIO: Adaptive Fusion of Radar-LiDAR-Inertial Information for Robust Odometry in Challenging Environments | Chenglong Qian et.al. | 2507.18317 | null |
| 2025-07-24 | Evaluation of facial landmark localization performance in a surgical setting | Ines Frajtag et.al. | 2507.18248 | null |
| 2025-07-24 | Emotion Recognition from Skeleton Data: A Comprehensive Survey | Haifeng Lu et.al. | 2507.18026 | null |
| 2025-07-23 | RemixFusion: Residual-based Mixed Representation for Large-scale Online RGB-D Reconstruction | Yuqing Lan et.al. | 2507.17594 | null |
| 2025-07-23 | Physics-based Human Pose Estimation from a Single Moving RGB Camera | Ayce Idil Aytekin et.al. | 2507.17406 | null |
| 2025-07-21 | Toward a Real-Time Framework for Accurate Monocular 3D Human Pose Estimation with Geometric Priors | Mohamed Adjel et.al. | 2507.16850 | null |
| 2025-07-22 | Adaptive Relative Pose Estimation Framework with Dual Noise Tuning for Safe Approaching Maneuvers | Batu Candan et.al. | 2507.16214 | null |
| 2025-07-21 | TONUS: Neuromorphic human pose estimation for artistic sound co-creation | Jules Lecomte et.al. | 2507.15734 | null |
| 2025-07-21 | Hi^2-GSLoc: Dual-Hierarchical Gaussian-Specific Visual Relocalization for Remote Sensing | Boni Hu et.al. | 2507.15683 | null |
| 2025-07-21 | Dense-depth map guided deep Lidar-Visual Odometry with Sparse Point Clouds and Images | JunYing Huang et.al. | 2507.15496 | null |
| 2025-07-20 | 3-Dimensional CryoEM Pose Estimation and Shift Correction Pipeline | Kaishva Chintan Shah et.al. | 2507.14924 | null |
| 2025-07-20 | An Evaluation of DUSt3R/MASt3R/VGGT 3D Reconstruction on Photogrammetric Aerial Blocks | Xinyi Wu et.al. | 2507.14798 | null |
| 2025-07-22 | AI-Enhanced Precision in Sport Taekwondo: Increasing Fairness, Speed, and Trust in Competition (FST.ai) | Keivan Shariatmadar et.al. | 2507.14657 | null |
| 2025-07-18 | C-DOG: Training-Free Multi-View Multi-Object Association in Dense Scenes Without Visual Feature via Connected δ-Overlap Graphs | Yung-Hong Sun et.al. | 2507.14095 | null |
| 2025-07-21 | PCR-GS: COLMAP-Free 3D Gaussian Splatting via Pose Co-Regularizations | Yu Wei et.al. | 2507.13891 | null |
| 2025-07-18 | MaskHOI: Robust 3D Hand-Object Interaction Estimation via Masked Pre-training | Yuechen Xie et.al. | 2507.13673 | null |
| 2025-07-17 | $\pi^3$: Scalable Permutation-Equivariant Visual Geometry Learning | Yifan Wang et.al. | 2507.13347 | link |
| 2025-07-17 | Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark | Junsu Kim et.al. | 2507.13314 | null |
| 2025-07-17 | DINO-VO: A Feature-based Visual Odometry Leveraging a Visual Foundation Model | Maulana Bisyir Azhari et.al. | 2507.13145 | null |
| 2025-07-17 | AthleticsPose: Authentic Sports Motion Dataset on Athletic Field and Evaluation of Monocular 3D Pose Estimation Ability | Tomohiro Suzuki et.al. | 2507.12905 | null |
| 2025-07-17 | From Neck to Head: Bio-Impedance Sensing for Head Pose Estimation | Mengxi Liu et.al. | 2507.12884 | null |
| 2025-07-19 | SpatialTrackerV2: 3D Point Tracking Made Easy | Yuxi Xiao et.al. | 2507.12462 | link |
| 2025-07-16 | Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation | Antonio Finocchiaro et.al. | 2507.12292 | null |
| 2025-07-16 | UniLGL: Learning Uniform Place Recognition for FOV-limited/Panoramic LiDAR Global Localization | Hongming Shen et.al. | 2507.12194 | null |
| 2025-07-16 | BRUM: Robust 3D Vehicle Reconstruction from 360 Sparse Images | Davide Di Nucci et.al. | 2507.12095 | null |
| 2025-07-16 | SGLoc: Semantic Localization System for Camera Pose Estimation from 3D Gaussian Splatting Representation | Beining Xu et.al. | 2507.12027 | null |
| 2025-07-16 | SEPose: A Synthetic Event-based Human Pose Estimation Dataset for Pedestrian Monitoring | Kaustav Chanda et.al. | 2507.11910 | null |
| 2025-07-15 | GKNet: Graph-based Keypoints Network for Monocular Pose Estimation of Non-cooperative Spacecraft | Weizhao Ma et.al. | 2507.11077 | null |
| 2025-07-15 | Joint angle model based learning to refine kinematic human pose estimation | Chang Peng et.al. | 2507.11075 | null |
| 2025-07-14 | Raci-Net: Ego-vehicle Odometry Estimation in Adverse Weather Conditions | Mohammadhossein Talebi et.al. | 2507.10376 | null |
| 2025-07-14 | Kaleidoscopic Background Attack: Disrupting Pose Estimation with Multi-Fold Radial Symmetry Textures | Xinlong Ding et.al. | 2507.10265 | null |
| 2025-07-14 | ProGait: A Multi-Purpose Video Dataset and Benchmark for Transfemoral Prosthesis Users | Xiangyu Yin et.al. | 2507.10223 | link |
| 2025-07-13 | VST-Pose: A Velocity-Integrated Spatiotemporal Attention Network for Human WiFi Pose Estimation | Xinyu Zhang et.al. | 2507.09672 | null |
| 2025-07-13 | EHPE: A Segmented Architecture for Enhanced Hand Pose Estimation | Bolun Zheng et.al. | 2507.09560 | null |
| 2025-07-13 | Self-supervised pretraining of vision transformers for animal behavioral analysis and neural encoding | Yanchen Wang et.al. | 2507.09513 | null |
| 2025-07-12 | PoseLLM: Enhancing Language-Guided Human Pose Estimation with MLP Alignment | Dewen Zhang et.al. | 2507.09139 | null |
| 2025-07-10 | RegGS: Unposed Sparse Views Gaussian Splatting with 3DGS Registration | Chong Cheng et.al. | 2507.08136 | null |
| 2025-07-10 | SCREP: Scene Coordinate Regression and Evidential Learning-based Perception-Aware Trajectory Generation | Juyeop Han et.al. | 2507.07467 | null |
| 2025-07-09 | g2o vs. Ceres: Optimizing Scan Matching in Cartographer SLAM | Quanjie Qiu et.al. | 2507.07142 | null |
| 2025-07-09 | Smartphone Exergames with Real-Time Markerless Motion Capture: Challenges and Trade-offs | Mathieu Phosanarack et.al. | 2507.06669 | null |
| 2025-07-09 | MK-Pose: Category-Level Object Pose Estimation via Multimodal-Based Keypoint Learning | Yifan Yang et.al. | 2507.06662 | null |
| 2025-07-09 | Mask6D: Masked Pose Priors For 6D Object Pose Estimation | Yuechen Xie et.al. | 2507.06486 | null |
| 2025-07-08 | SenseShift6D: Multimodal RGB-D Benchmarking for Robust 6D Pose Estimation across Environment and Sensor Variations | Yegyu Han et.al. | 2507.05751 | null |
| 2025-07-08 | Event-RGB Fusion for Spacecraft Pose Estimation Under Harsh Lighting | Mohsi Jawaid et.al. | 2507.05698 | null |
| 2025-07-07 | W2W: A Simulated Exploration of IMU Placement Across the Human Body for Designing Smarter Wearable | Lala Shakti Swarup Ray et.al. | 2507.05532 | null |
| 2025-07-07 | UDF-GMA: Uncertainty Disentanglement and Fusion for General Movement Assessment | Zeqi Luo et.al. | 2507.04814 | null |
| 2025-07-06 | Thousand-Brains Systems: Sensorimotor Intelligence for Rapid, Robust Learning and Inference | Niels Leadholm et.al. | 2507.04494 | null |
| 2025-07-09 | Gaussian-LIC2: LiDAR-Inertial-Camera Gaussian Splatting SLAM | Xiaolei Lang et.al. | 2507.04004 | null |
| 2025-07-05 | Accurate Pose Estimation Using Contact Manifold Sampling for Safe Peg-in-Hole Insertion of Complex Geometries | Abhay Negi et.al. | 2507.03925 | null |
| 2025-07-02 | Markerless Stride Length estimation in Athletic using Pose Estimation with monocular vision | Patryk Skorupski et.al. | 2507.03016 | null |
| 2025-07-03 | Reconstructing Close Human Interaction with Appearance and Proxemics Reasoning | Buzhen Huang et.al. | 2507.02565 | null |
| 2025-07-03 | IMASHRIMP: Automatic White Shrimp (Penaeus vannamei) Biometrical Analysis from Laboratory Images Using Computer Vision and Deep Learning | Abiam Remache González et.al. | 2507.02519 | null |
| 2025-07-03 | 3D Heart Reconstruction from Sparse Pose-agnostic 2D Echocardiographic Slices | Zhurong Chen et.al. | 2507.02411 | null |
| 2025-07-03 | LMPNet for Weakly-supervised Keypoint Discovery | Pei Guo et.al. | 2507.02308 | null |
| 2025-07-02 | What does really matter in image goal navigation? | Gianluca Monaci et.al. | 2507.01667 | null |
| 2025-07-01 | 2024 NASA SUITS Report: LLM-Driven Immersive Augmented Reality User Interface for Robotics and Space Exploration | Kathy Zhuang et.al. | 2507.01206 | null |
| 2025-07-01 | Multi-Modal Graph Convolutional Network with Sinusoidal Encoding for Robust Human Action Segmentation | Hao Xing et.al. | 2507.00752 | null |
| 2025-07-01 | LoD-Loc v2: Aerial Visual Localization over Low Level-of-Detail City Models using Explicit Silhouette Alignment | Juelin Zhu et.al. | 2507.00659 | null |
| 2025-06-30 | Computer Vision for Objects used in Group Work: Challenges and Opportunities | Changsoo Jung et.al. | 2507.00224 | null |
| 2025-06-30 | Validation of AI-Based 3D Human Pose Estimation in a Cyber-Physical Environment | Lisa Marie Otto et.al. | 2506.23739 | null |
| 2025-06-30 | MGPRL: Distributed Multi-Gaussian Processes for Wi-Fi-based Multi-Robot Relative Localization in Large Indoor Environments | Sai Krishna Ghanta et.al. | 2506.23514 | null |
| 2025-06-29 | TVG-SLAM: Robust Gaussian Splatting SLAM with Tri-view Geometric Constraints | Zhen Tan et.al. | 2506.23207 | null |
| 2025-06-28 | Deterministic Object Pose Confidence Region Estimation | Jinghao Wang et.al. | 2506.22720 | null |
| 2025-06-27 | Evaluating Pointing Gestures for Target Selection in Human-Robot Collaboration | Noora Sassali et.al. | 2506.22116 | null |
| 2025-06-27 | Single-Scanline Relative Pose Estimation for Rolling Shutter Cameras | Petr Hruby et.al. | 2506.22069 | null |
| 2025-06-24 | ICP-3DGS: SfM-free 3D Gaussian Splatting for Large-scale Unbounded Scenes | Chenhao Zhang et.al. | 2506.21629 | link |
| 2025-06-26 | EndoFlow-SLAM: Real-Time Endoscopic SLAM with Flow-Constrained Gaussian Splatting | Taoyu Wu et.al. | 2506.21420 | null |
| 2025-06-26 | CURL-SLAM: Continuous and Compact LiDAR Mapping | Kaicheng Zhang et.al. | 2506.21077 | null |
| 2025-06-27 | DidSee: Diffusion-Based Depth Completion for Material-Agnostic Robotic Perception and Manipulation | Wenzhou Lyu et.al. | 2506.21034 | null |
| 2025-06-25 | How do Foundation Models Compare to Skeleton-Based Approaches for Gesture Recognition in Human-Robot Interaction? | Stephanie Käs et.al. | 2506.20795 | null |
| 2025-06-26 | Consensus-Driven Uncertainty for Robotic Grasping based on RGB Perception | Eric C. Joyce et.al. | 2506.20045 | null |
| 2025-06-24 | Systematic Comparison of Projection Methods for Monocular 3D Human Pose Estimation on Fisheye Images | Stephanie Käs et.al. | 2506.19747 | null |
| 2025-06-23 | RAG-6DPose: Retrieval-Augmented 6D Pose Estimation via Leveraging CAD as Knowledge Base | Kuanning Wang et.al. | 2506.18856 | null |
| 2025-06-19 | Reproducible Evaluation of Camera Auto-Exposure Methods in the Field: Platform, Benchmark and Lessons Learned | Olivier Gamache et.al. | 2506.18844 | null |
| 2025-06-23 | SViP: Sequencing Bimanual Visuomotor Policies with Object-Centric Motion Primitives | Yizhou Chen et.al. | 2506.18825 | null |
| 2025-06-20 | RGBTrack: Fast, Robust Depth-Free 6D Pose Estimation and Tracking | Teng Guo et.al. | 2506.17119 | link |
| 2025-06-20 | Monocular One-Shot Metric-Depth Alignment for RGB-Based Robot Grasping | Teng Guo et.al. | 2506.17110 | null |
| 2025-06-20 | LunarLoc: Segment-Based Global Localization on the Moon | Annika Thomas et.al. | 2506.16940 | link |
| 2025-06-19 | ControlVLA: Few-shot Object-centric Adaptation for Pre-trained Vision-Language-Action Models | Puhao Li et.al. | 2506.16211 | null |
| 2025-06-19 | STAR-Pose: Efficient Low-Resolution Video Human Pose Estimation via Spatial-Temporal Adaptive Super-Resolution | Yucheng Jin et.al. | 2506.16061 | null |
| 2025-06-19 | KARL: Kalman-Filter Assisted Reinforcement Learner for Dynamic Object Tracking and Grasping | Kowndinya Boyalakuntla et.al. | 2506.15945 | null |
| 2025-06-19 | Beyond Audio and Pose: A General-Purpose Framework for Video Synchronization | Yosub Shin et.al. | 2506.15937 | null |
| 2025-06-18 | Improving Robotic Manipulation: Techniques for Object Pose Estimation, Accommodating Positional Uncertainty, and Disassembly Tasks from Examples | Viral Rasik Galaiya et.al. | 2506.15865 | null |
| 2025-06-18 | PRISM-Loc: a Lightweight Long-range LiDAR Localization in Urban Environments with Topological Maps | Kirill Muravyev et.al. | 2506.15849 | null |
| 2025-06-18 | Human Motion Capture from Loose and Sparse Inertial Sensors with Garment-aware Diffusion Models | Andela Ilic et.al. | 2506.15290 | null |
| 2025-06-18 | RA-NeRF: Robust Neural Radiance Field Reconstruction with Accurate Camera Pose Estimation under Complex Trajectories | Qingsong Yan et.al. | 2506.15242 | null |
| 2025-06-17 | PoseGRAF: Geometric-Reinforced Adaptive Fusion for Monocular 3D Human Pose Estimation | Ming Xu et.al. | 2506.14596 | null |
| 2025-06-17 | MOL: Joint Estimation of Micro-Expression, Optical Flow, and Landmark via Transformer-Graph-Style Convolution | Zhiwen Shao et.al. | 2506.14511 | null |
| 2025-06-17 | Non-Overlap-Aware Egocentric Pose Estimation for Collaborative Perception in Connected Autonomy | Hong Huang et.al. | 2506.14180 | null |
| 2025-06-17 | TACS-Graphs: Traversability-Aware Consistent Scene Graphs for Ground Robot Indoor Localization and Mapping | Jeewon Kim et.al. | 2506.14178 | null |
| 2025-06-16 | Diffusion-based Inverse Observation Model for Artificial Skin | Ante Maric et.al. | 2506.13986 | null |
| 2025-06-16 | PF-LHM: 3D Animatable Avatar Reconstruction from Pose-free Articulated Human Images | Lingteng Qiu et.al. | 2506.13766 | null |
| 2025-06-16 | JENGA: Object selection and pose estimation for robotic grasping from a stack | Sai Srinivas Jeevanandam et.al. | 2506.13425 | null |
| 2025-06-16 | Automatic Multi-View X-Ray/CT Registration Using Bone Substructure Contours | Roman Flepp et.al. | 2506.13292 | null |
| 2025-06-16 | DETRPose: Real-time end-to-end transformer model for multi-person pose estimation | Sebastian Janampa et.al. | 2506.13027 | link |
| 2025-06-15 | A large-scale, physically-based synthetic dataset for satellite pose estimation | Szabolcs Velkei et.al. | 2506.12782 | null |
| 2025-06-13 | ViTaSCOPE: Visuo-tactile Implicit Representation for In-hand Pose and Extrinsic Contact Estimation | Jayjun Lee et.al. | 2506.12239 | null |
| 2025-06-10 | Monocular 3D Hand Pose Estimation with Implicit Camera Alignment | Christos Pantazopoulos et.al. | 2506.11133 | null |
| 2025-06-12 | Occlusion-Aware 3D Hand-Object Pose Estimation with Masked AutoEncoders | Hui Yang et.al. | 2506.10816 | null |
| 2025-06-12 | In-Hand Object Pose Estimation via Visual-Tactile Fusion | Felix Nonnengießer et.al. | 2506.10787 | null |
| 2025-06-11 | Fluoroscopic Shape and Pose Tracking of Catheters with Custom Radiopaque Markers | Jared Lawson et.al. | 2506.09934 | null |
| 2025-06-11 | EquiCaps: Predictor-Free Pose-Aware Pre-Trained Capsule Networks | Athinoulla Konstantinou et.al. | 2506.09895 | link |
| 2025-06-11 | Accurate and efficient zero-shot 6D pose estimation with frozen foundation models | Andrea Caraffa et.al. | 2506.09784 | null |
| 2025-06-11 | CHIP: A multi-sensor dataset for 6D pose estimation of chairs in industrial settings | Mattia Nardon et.al. | 2506.09699 | null |
| 2025-06-10 | Princeton365: A Diverse Dataset with Accurate Camera Pose | Karhan Kayan et.al. | 2506.09035 | null |
| 2025-06-10 | ArrowPose: Segmentation, Detection, and 5 DoF Pose Estimation Network for Colorless Point Clouds | Frederik Hagelskjaer et.al. | 2506.08699 | null |
| 2025-06-09 | UA-Pose: Uncertainty-Aware 6D Object Pose Estimation and Online Object Completion with Partial References | Ming-Feng Li et.al. | 2506.07996 | null |
| 2025-06-09 | Hierarchical Scoring with 3D Gaussian Splatting for Instance Image-Goal Navigation | Yijie Deng et.al. | 2506.07338 | null |
| 2025-06-10 | From Generation to Generalization: Emergent Few-Shot Learning in Video Diffusion Models | Pablo Acuaviva et.al. | 2506.07280 | null |
| 2025-06-08 | GoTrack: Generic 6DoF Object Pose Refinement and Tracking | Van Nguyen Nguyen et.al. | 2506.07155 | link |
| 2025-06-08 | UNO: Unified Self-Supervised Monocular Odometry for Platform-Agnostic Deployment | Wentao Zhao et.al. | 2506.07013 | null |
| 2025-06-07 | Deep Inertial Pose: A deep learning approach for human pose estimation | Sara M. Cerqueira et.al. | 2506.06850 | null |
| 2025-06-06 | Dy3DGS-SLAM: Monocular 3D Gaussian Splatting SLAM for Dynamic Environments | Mingrui Li et.al. | 2506.05965 | null |
| 2025-06-06 | SurGSplat: Progressive Geometry-Constrained Gaussian Splatting for Surgical Scene Reconstruction | Yuchao Zheng et.al. | 2506.05935 | null |
| 2025-06-06 | CryoFastAR: Fast Cryo-EM Ab Initio Reconstruction Made Easy | Jiakai Zhang et.al. | 2506.05864 | null |
| 2025-06-06 | You Only Estimate Once: Unified, One-stage, Real-Time Category-level Articulated Object 6D Pose Estimation for Robotic Grasping | Jingshun Huang et.al. | 2506.05719 | null |
| 2025-06-05 | On-the-fly Reconstruction for Large-Scale Novel View Synthesis from Unposed Images | Andreas Meuleman et.al. | 2506.05558 | null |
| 2025-06-05 | Rectified Point Flow: Generic Point Cloud Pose Estimation | Tao Sun et.al. | 2506.05282 | link |
| 2025-06-05 | Realizing Text-Driven Motion Generation on NAO Robot: A Reinforcement Learning-Optimized Control Pipeline | Zihan Xu et.al. | 2506.05117 | link |
| 2025-06-05 | CzechLynx: A Dataset for Individual Identification and Pose Estimation of the Eurasian Lynx | Lukas Picek et.al. | 2506.04931 | null |
| 2025-06-05 | SupeRANSAC: One RANSAC to Rule Them All | Daniel Barath et.al. | 2506.04803 | null |
| 2025-06-05 | LGM-Pose: A Lightweight Global Modeling Network for Real-time Human Pose Estimation | Biao Guo et.al. | 2506.04561 | null |
| 2025-06-04 | Photoreal Scene Reconstruction from an Egocentric Device | Zhaoyang Lv et.al. | 2506.04444 | link |
| 2025-06-04 | cuVSLAM: CUDA accelerated visual odometry | Alexander Korovko et.al. | 2506.04359 | null |
| 2025-06-04 | Voyager: Long-Range and World-Consistent Video Diffusion for Explorable 3D Scene Generation | Tianyu Huang et.al. | 2506.04225 | null |
| 2025-06-04 | Accelerating SfM-based Pose Estimation with Dominating Set | Joji Joseph et.al. | 2506.03667 | null |
| 2025-06-03 | OpenFace 3.0: A Lightweight Multitask System for Comprehensive Facial Behavior Analysis | Jiewen Hu et.al. | 2506.02891 | null |
| 2025-06-03 | Learning Pyramid-structured Long-range Dependencies for 3D Human Pose Estimation | Mingjie Wei et.al. | 2506.02853 | null |
| 2025-06-03 | GeneA-SLAM2: Dynamic SLAM with AutoEncoder-Preprocessed Genetic Keypoints Resampling and Depth Variance-Guided Dynamic Region Removal | Shufan Qing et.al. | 2506.02736 | link |
| 2025-06-02 | Rig3R: Rig-Aware Conditioning for Learned 3D Reconstruction | Samuel Li et.al. | 2506.02265 | null |
| 2025-06-02 | E3D-Bench: A Benchmark for End-to-End 3D Geometric Foundation Models | Wenyan Cong et.al. | 2506.01933 | null |
| 2025-06-02 | SteerPose: Simultaneous Extrinsic Camera Calibration and Matching from Articulation | Sang-Eun Lee et.al. | 2506.01691 | null |
| 2025-06-02 | Sheep Facial Pain Assessment Under Weighted Graph Neural Networks | Alam Noor et.al. | 2506.01468 | null |
| 2025-06-01 | TIGeR: Text-Instructed Generation and Refinement for Template-Free Hand-Object Interaction | Yiyao Huang et.al. | 2506.00953 | null |
| 2025-05-31 | XYZ-IBD: High-precision Bin-picking Dataset for Object 6D Pose Estimation Capturing Real-world Industrial Complexity | Junwen Huang et.al. | 2506.00599 | null |
| 2025-05-30 | Lazy Heuristic Search for Solving POMDPs with Expensive-to-Compute Belief Transitions | Muhammad Suhail Saleem et.al. | 2506.00285 | null |
| 2025-05-30 | 6D Pose Estimation on Point Cloud Data through Prior Knowledge Integration: A Case Study in Autonomous Disassembly | Chengzhi Wu et.al. | 2505.24669 | null |
| 2025-05-30 | Category-Level 6D Object Pose Estimation in Agricultural Settings Using a Lattice-Deformation Framework and Diffusion-Augmented Synthetic Data | Marios Glytsos et.al. | 2505.24636 | null |
| 2025-05-30 | PCIE_Pose Solution for EgoExo4D Pose and Proficiency Estimation Challenge | Feng Chen et.al. | 2505.24411 | null |
| 2025-05-29 | Pose-free 3D Gaussian splatting via shape-ray estimation | Youngju Na et.al. | 2505.22978 | null |
| 2025-05-28 | TwinTrack: Bridging Vision and Contact Physics for Real-Time Tracking of Unknown Dynamic Objects | Wen Yang et.al. | 2505.22882 | null |
| 2025-05-28 | 4DTAM: Non-Rigid Tracking and Mapping via Dynamic Surface Gaussians | Hidenobu Matsuki et.al. | 2505.22859 | null |
| 2025-05-28 | MultiFormer: A Multi-Person Pose Estimation System Based on CSI and Attention Mechanism | Yanyi Qu et.al. | 2505.22555 | null |
| 2025-05-28 | Event-based Egocentric Human Pose Estimation in Dynamic Environment | Wataru Ikeda et.al. | 2505.22007 | null |
| 2025-05-27 | Spectral Compression Transformer with Line Pose Graph for Monocular 3D Human Pose Estimation | Zenghao Zheng et.al. | 2505.21309 | null |
| 2025-05-29 | ReassembleNet: Learnable Keypoints and Diffusion for 2D Fresco Reconstruction | Adeela Islam et.al. | 2505.21117 | null |
| 2025-05-27 | HS-SLAM: A Fast and Hybrid Strategy-Based SLAM Approach for Low-Speed Autonomous Driving | Bingxiang Kang et.al. | 2505.20906 | null |
| 2025-05-27 | Mamba-Driven Topology Fusion for Monocular 3-D Human Pose Estimation | Zenghao Zheng et.al. | 2505.20611 | null |
| 2025-05-28 | HAND Me the Data: Fast Robot Adaptation via Hand Path Retrieval | Matthew Hong et.al. | 2505.20455 | null |
| 2025-05-25 | Learning the Contact Manifold for Accurate Pose Estimation During Peg-in-Hole Insertion of Complex Geometries | Abhay Negi et.al. | 2505.19215 | null |
| 2025-05-24 | Why Not Replace? Sustaining Long-Term Visual Localization via Handcrafted-Learned Feature Collaboration on CPU | Yicheng Lin et.al. | 2505.18652 | null |
| 2025-05-24 | An Inertial Sequence Learning Framework for Vehicle Speed Estimation via Smartphone IMU | Xuan Xiao et.al. | 2505.18490 | null |
| 2025-05-23 | Pose Splatter: A 3D Gaussian Splatting Model for Quantifying Animal Pose and Appearance | Jack Goffinet et.al. | 2505.18342 | null |
| 2025-05-23 | To Glue or Not to Glue? Classical vs Learned Image Matching for Mobile Mapping Cameras to Textured Semantic 3D Building Models | Simone Gaisbauer et.al. | 2505.17973 | null |
| 2025-05-23 | Towards Dynamic 3D Reconstruction of Hand-Instrument Interaction in Ophthalmic Surgery | Ming Hu et.al. | 2505.17677 | null |
| 2025-05-23 | PoseBH: Prototypical Multi-Dataset Training Beyond Human Pose Estimation | Uyoung Jeong et.al. | 2505.17475 | link |
| 2025-05-22 | Towards Texture- And Shape-Independent 3D Keypoint Estimation in Birds | Valentin Schmuker et.al. | 2505.16633 | null |
| 2025-05-22 | GMatch: Geometry-Constrained Feature Matching for RGB-D Object Pose Estimation | Ming Yang et.al. | 2505.16144 | null |
| 2025-05-21 | Object-Focus Actor for Data-efficient Robot Generalization Dexterous Manipulation | Yihang Li et.al. | 2505.15098 | null |
| 2025-05-20 | UPTor: Unified 3D Human Pose Dynamics and Trajectory Prediction for Human-Robot Interaction | Nisarga Nilavadi et.al. | 2505.14866 | null |
| 2025-05-19 | Recollection from Pensieve: Novel View Synthesis via Learning from Uncalibrated Videos | Ruoyu Wang et.al. | 2505.13440 | link |
| 2025-05-19 | KinTwin: Imitation Learning with Torque and Muscle Driven Biomechanical Models Enables Precise Replication of Able-Bodied and Impaired Movement from Markerless Motion Capture | R. James Cotton et.al. | 2505.13436 | null |
| 2025-05-19 | The Way Up: A Dataset for Hold Usage Detection in Sport Climbing | Anna Maschek et.al. | 2505.12854 | null |
| 2025-05-17 | Keypoints as Dynamic Centroids for Unified Human Pose and Segmentation | Niaz Ahmad et.al. | 2505.12130 | null |
| 2025-05-17 | Black-box Adversaries from Latent Space: Unnoticeable Attacks on Human Pose and Shape Estimation | Zhiying Li et.al. | 2505.12009 | null |
| 2025-05-17 | ElderFallGuard: Real-Time IoT and Computer Vision-Based Fall Detection System for Elderly Safety | Tasrifur Riahi et.al. | 2505.11845 | null |
| 2025-05-16 | SurgPose: Generalisable Surgical Instrument Pose Estimation using Zero-Shot Learning and Stereo Vision | Utsav Rai et.al. | 2505.11439 | null |
| 2025-05-16 | MTevent: A Multi-Task Event Camera Dataset for 6D Pose Estimation and Moving Object Detection | Shrutarv Awasthi et.al. | 2505.11282 | null |
| 2025-05-16 | PoseBench3D: A Cross-Dataset Analysis Framework for 3D Human Pose Estimation | Saad Manzur et.al. | 2505.10888 | null |
| 2025-05-16 | RefPose: Leveraging Reference Geometric Correspondences for Accurate 6D Pose Estimation of Unseen Objects | Jaeguk Kim et.al. | 2505.10841 | null |
| 2025-05-14 | UMotion: Uncertainty-driven Human Motion Estimation from Inertial and Ultra-wideband Units | Huakun Liu et.al. | 2505.09393 | link |
| 2025-05-14 | APR-Transformer: Initial Pose Estimation for Localization in Complex Environments through Absolute Pose Regression | Srinivas Ravuri et.al. | 2505.09356 | link |
| 2025-05-13 | Real-time Capable Learning-based Visual Tool Pose Correction via Differentiable Simulation | Shuyuan Yang et.al. | 2505.08875 | null |
| 2025-05-12 | Sleep Position Classification using Transfer Learning for Bed-based Pressure Sensors | Olivier Papillon et.al. | 2505.08111 | null |
| 2025-05-12 | Enabling Privacy-Aware AI-Based Ergonomic Analysis | Sander De Coninck et.al. | 2505.07306 | null |
| 2025-05-13 | Human Motion Prediction via Test-domain-aware Adaptation with Easily-available Human Motions Estimated from Videos | Katsuki Shimbo et.al. | 2505.07301 | null |
| 2025-05-12 | When Dance Video Archives Challenge Computer Vision | Philippe Colantoni et.al. | 2505.07249 | null |
| 2025-05-10 | CompSLAM: Complementary Hierarchical Multi-Modal Localization and Mapping for Robot Autonomy in Underground Environments | Shehryar Khattak et.al. | 2505.06483 | null |
| 2025-05-09 | Active Perception for Tactile Sensing: A Task-Agnostic Attention-Based Approach | Tim Schneider et.al. | 2505.06182 | null |
| 2025-05-08 | Semantic Style Transfer for Enhancing Animal Facial Landmark Detection | Anadil Hussein et.al. | 2505.05640 | null |
| 2025-05-08 | Progressive Inertial Poser: Progressive Real-Time Kinematic Chain Estimation for 3D Full-Body Pose from Three IMU Sensors | Zunjie Zhu et.al. | 2505.05336 | null |
| 2025-05-08 | Improving Global Motion Estimation in Sparse IMU-based Motion Capture with Physics | Xinyu Yi et.al. | 2505.05010 | null |
| 2025-05-08 | An Efficient Method for Accurate Pose Estimation and Error Correction of Cuboidal Objects | Utsav Rai et.al. | 2505.04962 | null |
| 2025-05-07 | Comparison of Visual Trackers for Biomechanical Analysis of Running | Luis F. Gomez et.al. | 2505.04713 | null |
| 2025-05-07 | Do We Still Need to Work on Odometry for Autonomous Driving? | Cedric Le Gentil et.al. | 2505.04438 | null |
| 2025-05-07 | HDiffTG: A Lightweight Hybrid Diffusion-Transformer-GCN Architecture for 3D Human Pose Estimation | Yajie Fu et.al. | 2505.04276 | link |
| 2025-05-07 | One2Any: One-Reference 6D Pose Estimation for Any Object | Mengya Liu et.al. | 2505.04109 | null |
| 2025-05-06 | Polar Coordinate-Based 2D Pose Prior with Neural Distance Field | Qi Gan et.al. | 2505.03445 | null |
| 2025-05-06 | LiftFeat: 3D Geometry-Aware Local Feature Matching | Yepeng Liu et.al. | 2505.03422 | link |
| 2025-05-06 | Artificial Behavior Intelligence: Technology, Challenges, and Future Directions | Kanghyun Jo et.al. | 2505.03315 | null |
| 2025-05-05 | Dance of Fireworks: An Interactive Broadcast Gymnastics Training System Based on Pose Estimation | Haotian Chen et.al. | 2505.02690 | null |
| 2025-05-05 | Corr2Distrib: Making Ambiguous Correspondences an Ally to Predict Reliable 6D Pose Distributions | Asma Brazi et.al. | 2505.02501 | null |
| 2025-05-05 | Finger Pose Estimation for Under-screen Fingerprint Sensor | Xiongjun Guan et.al. | 2505.02481 | link |
| 2025-05-05 | 6D Pose Estimation on Spoons and Hands | Kevin Tan et.al. | 2505.02335 | null |
| 2025-05-04 | Continuous Normalizing Flows for Uncertainty-Aware Human Pose Estimation | Shipeng Liu et.al. | 2505.02287 | null |
| 2025-05-04 | A Birotation Solution for Relative Pose Problems | Hongbo Zhao et.al. | 2505.02025 | null |
| 2025-05-03 | Near-field 5D Pose Estimation using Reconfigurable Intelligent Surfaces | Srikar Sharma Sadhu et.al. | 2505.01829 | null |
| 2025-05-03 | AquaGS: Fast Underwater Scene Reconstruction with SfM-Free Gaussian Splatting | Junhao Shi et.al. | 2505.01799 | null |
| 2025-05-03 | PosePilot: Steering Camera Pose for Generative World Models with Self-supervised Depth | Bu Jin et.al. | 2505.01729 | null |
| 2025-05-02 | T-Graph: Enhancing Sparse-view Camera Pose Estimation by Pairwise Translation Graph | Qingyu Xian et.al. | 2505.01207 | null |
| 2025-05-02 | 3D Human Pose Estimation via Spatial Graph Order Attention and Temporal Body Aware Transformer | Kamel Aouaidjia et.al. | 2505.01003 | null |
| 2025-05-01 | Are Minimal Radial Distortion Solvers Really Necessary for Relative Pose Estimation? | Viktor Kocur et.al. | 2505.00866 | null |
| 2025-05-01 | P2P-Insole: Human Pose Estimation Using Foot Pressure Distribution and Motion Sensors | Atsuya Watanabe et.al. | 2505.00755 | null |
| 2025-05-01 | Dietary Intake Estimation via Continuous 3D Reconstruction of Food | Wallace Lee et.al. | 2505.00606 | null |
| 2025-05-02 | InterLoc: LiDAR-based Intersection Localization using Road Segmentation with Automated Evaluation Method | Nguyen Hoang Khoi Tran et.al. | 2505.00512 | null |
| 2025-04-30 | Self-Supervised Monocular Visual Drone Model Identification through Improved Occlusion Handling | Stavrow A. Bahnam et.al. | 2504.21695 | null |
| 2025-04-29 | Dance Style Recognition Using Laban Movement Analysis | Muhammad Turab et.al. | 2504.21166 | null |
| 2025-04-29 | Adept: Annotation-Denoising Auxiliary Tasks with Discrete Cosine Transform Map and Keypoint for Human-Centric Pretraining | Weizhen He et.al. | 2504.20800 | null |
| 2025-04-29 | A Survey on Event-based Optical Marker Systems | Nafiseh Jabbari Tofighi et.al. | 2504.20736 | null |
| 2025-04-29 | Large-scale visual SLAM for in-the-wild videos | Shuo Sun et.al. | 2504.20496 | null |
| 2025-05-01 | GSFeatLoc: Visual Localization Using Feature Correspondence on 3D Gaussian Splatting | Jongwon Lee et.al. | 2504.20379 | null |
| 2025-05-01 | PRISM-DP: Spatial Pose-based Observations for Diffusion-Policies via Segmentation, Mesh Generation, and Pose Tracking | Xiatao Sun et.al. | 2504.20359 | null |
| 2025-04-28 | Transformation & Translation Occupancy Grid Mapping: 2-Dimensional Deep Learning Refined SLAM | Leon Davies et.al. | 2504.19654 | null |
| 2025-04-28 | GAN-SLAM: Real-Time GAN Aided Floor Plan Creation Through SLAM | Leon Davies et.al. | 2504.19653 | null |
| 2025-04-28 | Category-Level and Open-Set Object Pose Estimation for Robotics | Peter Hönig et.al. | 2504.19572 | null |
| 2025-04-25 | Certifiably-Correct Mapping for Safe Navigation Despite Odometry Drift | Devansh R. Agrawal et.al. | 2504.18713 | null |
| 2025-04-25 | SSD-Poser: Avatar Pose Estimation with State Space Duality from Sparse Observations | Shuting Zhao et.al. | 2504.18332 | null |
| 2025-04-25 | S3MOT: Monocular 3D Object Tracking with Selective State Space Model | Zhuohao Yan et.al. | 2504.18068 | null |
| 2025-04-22 | SmallGS: Gaussian Splatting-based Camera Pose Estimation for Small-Baseline Videos | Yuxin Yao et.al. | 2504.17810 | null |
| 2025-04-24 | Dynamic Camera Poses and Where to Find Them | Chris Rockwell et.al. | 2504.17788 | null |
| 2025-04-24 | A Guide to Structureless Visual Localization | Vojtech Panek et.al. | 2504.17636 | null |
| 2025-04-24 | Object Pose Estimation by Camera Arm Control Based on the Next Viewpoint Estimation | Tomoki Mizuno et.al. | 2504.17424 | null |
| 2025-04-24 | Bias-Eliminated PnP for Stereo Visual Odometry: Provably Consistent and Large-Scale Localization | Guangyang Zeng et.al. | 2504.17410 | null |
| 2025-04-23 | WiFi based Human Fall and Activity Recognition using Transformer based Encoder Decoder and Graph Neural Networks | Younggeol Cho et.al. | 2504.16655 | null |
| 2025-04-23 | Assessing the Feasibility of Internet-Sourced Video for Automatic Cattle Lameness Detection | Md Fahimuzzman Sohan et.al. | 2504.16404 | null |
| 2025-04-22 | SignX: The Foundation Model for Sign Recognition | Sen Fang et.al. | 2504.16315 | null |
| 2025-04-22 | GADS: A Super Lightweight Model for Head Pose Estimation | Menan Velayuthan et.al. | 2504.15751 | null |
| 2025-04-21 | Field Report on Ground Penetrating Radar for Localization at the Mars Desert Research Station | Anja Sheppard et.al. | 2504.15455 | null |
| 2025-04-21 | Vision6D: 3D-to-2D Interactive Visualization and Annotation Tool for 6D Pose Estimation | Yike Zhang et.al. | 2504.15329 | null |
| 2025-04-21 | Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs | Chun-Hsiao Yeh et.al. | 2504.15280 | link |
| 2025-04-21 | Instance-Adaptive Keypoint Learning with Local-to-Global Geometric Aggregation for Category-Level Object Pose Estimation | Xiao Zhang et.al. | 2504.15134 | null |
| 2025-04-20 | Back on Track: Bundle Adjustment for Dynamic Scene Reconstruction | Weirong Chen et.al. | 2504.14516 | null |
| 2025-04-20 | SG-Reg: Generalizable and Efficient Scene Graph Registration | Chuhao Liu et.al. | 2504.14440 | link |
| 2025-04-18 | Imitation Learning with Precisely Labeled Human Demonstrations | Yilong Song et.al. | 2504.13803 | null |
| 2025-04-18 | Mono3R: Exploiting Monocular Cues for Geometric 3D Reconstruction | Wenyu Li et.al. | 2504.13419 | null |
| 2025-04-17 | ViTa-Zero: Zero-shot Visuotactile Object 6D Pose Estimation | Hongyu Li et.al. | 2504.13179 | null |
| 2025-04-18 | ODHSR: Online Dense 3D Reconstruction of Humans and Scenes from Monocular Videos | Zetong Zhang et.al. | 2504.13167 | null |
| 2025-04-17 | Unsupervised Cross-Domain 3D Human Pose Estimation via Pseudo-Label-Guided Global Transforms | Jingjing Liu et.al. | 2504.12699 | null |
| 2025-04-16 | MobilePoser: Real-Time Full-Body Pose Estimation and 3D Human Translation from IMUs in Mobile Consumer Devices | Vasco Xu et.al. | 2504.12492 | link |
| 2025-04-16 | Diffusion Based Robust LiDAR Place Recognition | Benjamin Krummenacher et.al. | 2504.12412 | null |
| 2025-04-16 | Regist3R: Incremental Registration with Stereo Foundation Model | Sidun Liu et.al. | 2504.12356 | null |
| 2025-04-16 | CoMotion: Concurrent Multi-person 3D Motion | Alejandro Newell et.al. | 2504.12186 | link |
| 2025-04-16 | No Fuss, Just Function – A Proposal for Non-Intrusive Full Body Tracking in XR for Meaningful Spatial Interactions | Elisabeth Mayer et.al. | 2504.11987 | null |
| 2025-04-16 | An Online Adaptation Method for Robust Depth Estimation and Visual Odometry in the Open World | Xingwu Ji et.al. | 2504.11698 | link |
| 2025-04-17 | CAP-Net: A Unified Network for 6D Pose and Size Estimation of Categorical Articulated Parts from a Single RGB-D Image | Jingshun Huang et.al. | 2504.11230 | null |
| 2025-04-15 | DMAGaze: Gaze Estimation Based on Feature Disentanglement and Multi-Scale Attention | Haohan Chen et.al. | 2504.11160 | null |
| 2025-04-14 | MonoDiff9D: Monocular Category-Level 9D Object Pose Estimation via Diffusion Model | Jian Liu et.al. | 2504.10433 | null |
| 2025-04-14 | Benchmarking 3D Human Pose Estimation Models Under Occlusions | Filipa Lino et.al. | 2504.10350 | null |
| 2025-04-15 | Differentially Private 2D Human Pose Estimation | Kaushik Bhargav Sivangi et.al. | 2504.10190 | null |
| 2025-04-14 | TT3D: Table Tennis 3D Reconstruction | Thomas Gossard et.al. | 2504.10035 | null |
| 2025-04-14 | Efficient 2D to Full 3D Human Pose Uplifting including Joint Rotations | Katja Ludwig et.al. | 2504.09953 | null |
| 2025-04-14 | NeRF-Based Transparent Object Grasping Enhanced by Shape Priors | Yi Han et.al. | 2504.09868 | null |
| 2025-04-13 | EasyREG: Easy Depth-Based Markerless Registration and Tracking using Augmented Reality Device for Surgical Guidance | Yue Yang et.al. | 2504.09498 | null |
| 2025-04-12 | SCFlow2: Plug-and-Play Object Pose Refiner with Shape-Constraint Scene Flow | Qingyuan Wang et.al. | 2504.09160 | null |
| 2025-04-12 | A Constrained Optimization Approach for Gaussian Splatting from Coarsely-posed Images and Noisy Lidar Point Clouds | Jizong Peng et.al. | 2504.09129 | null |
| 2025-04-12 | BIGS: Bimanual Category-agnostic Interaction Reconstruction from Monocular Videos via 3D Gaussian Splatting | Jeongwan On et.al. | 2504.09097 | null |
| 2025-04-11 | The Invisible EgoHand: 3D Hand Forecasting through EgoBody Pose Estimation | Masashi Hatano et.al. | 2504.08654 | null |
| 2025-04-11 | MBE-ARI: A Multimodal Dataset Mapping Bi-directional Engagement in Animal-Robot Interaction | Ian Noronha et.al. | 2504.08646 | null |
| 2025-04-11 | Hardware, Algorithms, and Applications of the Neuromorphic Vision Sensor: a Review | Claudio Cimarelli et.al. | 2504.08588 | null |
| 2025-04-11 | Multi-person Physics-based Pose Estimation for Combat Sports | Hossein Feiz et.al. | 2504.08175 | null |
| 2025-04-10 | Towards Unconstrained 2D Pose Estimation of the Human Spine | Muhammad Saif Ullah Khan et.al. | 2504.08110 | link |
| 2025-04-10 | BoxDreamer: Dreaming Box Corners for Generalizable Object Pose Estimation | Yuanhong Yu et.al. | 2504.07955 | link |
| 2025-04-09 | DLTPose: 6DoF Pose Estimation From Accurate Dense Surface Point Estimates | Akash Jadhav et.al. | 2504.07335 | null |
| 2025-04-09 | Two by Two: Learning Multi-Task Pairwise Objects Assembly for Generalizable Robot Manipulation | Yu Qi et.al. | 2504.06961 | null |
| 2025-04-09 | GraspClutter6D: A Large-scale Real-world Dataset for Robust Perception and Grasping in Cluttered Scenes | Seunghyeok Back et.al. | 2504.06866 | link |
| 2025-04-09 | Setup-Invariant Augmented Reality for Teaching by Demonstration with Surgical Robots | Alexandre Banks et.al. | 2504.06677 | link |
| 2025-04-09 | HGMamba: Enhancing 3D Human Pose Estimation with a HyperGCN-Mamba Network | Hu Cui et.al. | 2504.06638 | null |
| 2025-04-08 | Leveraging Synthetic Adult Datasets for Unsupervised Infant Pose Estimation | Sarosij Bose et.al. | 2504.05789 | null |
| 2025-04-08 | SAP-CoPE: Social-Aware Planning using Cooperative Pose Estimation with Infrastructure Sensor Nodes | Minghao Ning et.al. | 2504.05727 | link |
| 2025-04-08 | POMATO: Marrying Pointmap Matching with Temporal Motion for Dynamic 3D Reconstruction | Songyan Zhang et.al. | 2504.05692 | link |
| 2025-04-10 | Learning Affine Correspondences by Integrating Geometric Constraints | Pengju Sun et.al. | 2504.04834 | link |
| 2025-04-06 | A Convex and Global Solution for the PnP Problem in 2D Forward-Looking Sonar | Jiayi Su et.al. | 2504.04445 | null |
| 2025-04-05 | 3R-GS: Best Practice in Optimizing Camera Poses Along with 3DGS | Zhisheng Huang et.al. | 2504.04294 | null |
| 2025-04-02 | A Geometric Approach For Pose and Velocity Estimation Using IMU and Inertial/Body-Frame Measurements | Sifeddine Benahmed et.al. | 2504.03764 | null |
| 2025-04-04 | Robust Human Registration with Body Part Segmentation on Noisy Point Clouds | Kai Lascheit et.al. | 2504.03602 | null |
| 2025-04-04 | Endo3R: Unified Online Reconstruction from Dynamic Monocular Endoscopic Video | Jiaxin Guo et.al. | 2504.03198 | null |
| 2025-04-03 | Cooperative Inference for Real-Time 3D Human Pose Estimation in Multi-Device Edge Networks | Hyun-Ho Choi et.al. | 2504.03052 | null |
| 2025-04-03 | BOP Challenge 2024 on Model-Based and Model-Free 6D Object Pose Estimation | Van Nguyen Nguyen et.al. | 2504.02812 | link |
| 2025-04-03 | PicoPose: Progressive Pixel-to-Pixel Correspondence Learning for Novel Object Pose Estimation | Lihua Liu et.al. | 2504.02617 | null |
| 2025-04-02 | Dual-stream Transformer-GCN Model with Contextualized Representations Learning for Monocular 3D Human Pose Estimation | Mingrui Ye et.al. | 2504.01764 | link |
| 2025-04-02 | ForestVO: Enhancing Visual Odometry in Forest Environments through ForestGlue | Thomas Pritchard et.al. | 2504.01261 | link |
| 2025-04-01 | AP-CAP: Advancing High-Quality Data Synthesis for Animal Pose Estimation via a Controllable Image Generation Pipeline | Lei Wang et.al. | 2504.00394 | null |
| 2025-03-31 | Easi3R: Estimating Disentangled Motion from DUSt3R Without Training | Xingyu Chen et.al. | 2503.24391 | link |
| 2025-03-31 | LiM-Loc: Visual Localization with Dense and Accurate 3D Reference Maps Directly Corresponding 2D Keypoints to 3D LiDAR Point Clouds | Masahiko Tsuji et.al. | 2503.23664 | null |
| 2025-03-30 | PhysPose: Refining 6D Object Poses with Physical Constraints | Martin Malenický et.al. | 2503.23587 | null |
| 2025-03-30 | Improving Indoor Localization Accuracy by Using an Efficient Implicit Neural Map Representation | Haofei Kuang et.al. | 2503.23480 | link |
| 2025-03-30 | SparseLoc: Sparse Open-Set Landmark-based Global Localization for Autonomous Navigation | Pranjal Paul et.al. | 2503.23465 | null |
| 2025-03-30 | HiPART: Hierarchical Pose AutoRegressive Transformer for Occluded 3D Human Pose Estimation | Hongwei Zheng et.al. | 2503.23331 | null |
| 2025-03-29 | Incorporating GNSS Information with LIDAR-Inertial Odometry for Accurate Land-Vehicle Localization | Jintao Cheng et.al. | 2503.23199 | null |
| 2025-03-28 | ForcePose: A Deep Learning Approach for Force Calculation Based on Action Recognition Using MediaPipe Pose Estimation Combined with Object Detection | Nandakishor M et.al. | 2503.22363 | null |
| 2025-03-28 | GCRayDiffusion: Pose-Free Surface Reconstruction via Geometric Consistent Ray Diffusion | Li-Heng Chen et.al. | 2503.22349 | null |
| 2025-03-27 | NeRF-based Point Cloud Reconstruction using a Stationary Camera for Agricultural Applications | Kibon Ku et.al. | 2503.21958 | null |
| 2025-03-27 | Uni4D: Unifying Visual Foundation Models for 4D Modeling from a Single Video | David Yifan Yao et.al. | 2503.21761 | link |
| 2025-03-27 | Reconstructing Humans with a Biomechanically Accurate Skeleton | Yan Xia et.al. | 2503.21751 | link |
| 2025-03-27 | OccRobNet : Occlusion Robust Network for Accurate 3D Interacting Hand-Object Pose Estimation | Mallika Garg et.al. | 2503.21723 | null |
| 2025-03-27 | RapidPoseTriangulation: Multi-view Multi-person Whole-body Human Pose Triangulation in a Millisecond | Daniel Bermuth et.al. | 2503.21692 | null |
| 2025-03-27 | STAMICS: Splat, Track And Map with Integrated Consistency and Semantics for Dense RGB-D SLAM | Yongxu Wang et.al. | 2503.21425 | null |
| 2025-03-27 | Lidar-only Odometry based on Multiple Scan-to-Scan Alignments over a Moving Window | Aaron Kurda et.al. | 2503.21293 | null |
| 2025-03-27 | Recurrent Feature Mining and Keypoint Mixup Padding for Category-Agnostic Pose Estimation | Junjie Chen et.al. | 2503.21140 | link |
| 2025-03-26 | DINeMo: Learning Neural Mesh Models with no 3D Annotations | Weijie Guo et.al. | 2503.20220 | link |
| 2025-03-25 | Zero-Shot Human-Object Interaction Synthesis with Multimodal Priors | Yuke Lou et.al. | 2503.20118 | null |
| 2025-03-25 | Vanishing Depth: A Depth Adapter with Positional Depth Encoding for Generalized Image Encoders | Paul Koch et.al. | 2503.19947 | null |
| 2025-03-25 | Visuo-Tactile Object Pose Estimation for a Multi-Finger Robot Hand with Low-Resolution In-Hand Tactile Sensing | Lukas Mack et.al. | 2503.19893 | null |
| 2025-03-25 | Semi-SD: Semi-Supervised Metric Depth Estimation via Surrounding Cameras for Autonomous Driving | Yusen Xie et.al. | 2503.19713 | null |
| 2025-03-25 | DynOPETs: A Versatile Benchmark for Dynamic Object Pose Estimation and Tracking in Moving Camera Scenarios | Xiangting Meng et.al. | 2503.19625 | null |
| 2025-03-25 | Pose-Based Fall Detection System: Efficient Monitoring on Standard CPUs | Vinayak Mali et.al. | 2503.19501 | null |
| 2025-03-25 | Multi-modal 3D Pose and Shape Estimation with Computed Tomography | Mingxiao Tu et.al. | 2503.19405 | null |
| 2025-03-25 | From Sparse to Dense: Camera Relocalization with Scene-Specific Detector from Feature Gaussian Splatting | Zhiwei Huang et.al. | 2503.19358 | null |
| 2025-03-25 | Analyzing the Synthetic-to-Real Domain Gap in 3D Hand Pose Estimation | Zhuoran Zhao et.al. | 2503.19307 | link |
| 2025-03-25 | Any6D: Model-free 6D Pose Estimation of Novel Objects | Taeyeop Lee et.al. | 2503.18673 | link |
| 2025-03-24 | Structure-Aware Correspondence Learning for Relative Pose Estimation | Yihan Chen et.al. | 2503.18671 | null |
| 2025-03-24 | TrackID3x3: A Dataset and Algorithm for Multi-Player Tracking with Identification and Pose Estimation in 3x3 Basketball Full-court Videos | Kazuhiro Yamada et.al. | 2503.18282 | null |
| 2025-03-23 | Selecting and Pruning: A Differentiable Causal Sequentialized State-Space Model for Two-View Correspondence Learning | Xiang Fang et.al. | 2503.17938 | null |
| 2025-03-22 | Co-op: Correspondence-based Novel Object Pose Estimation | Sungphill Moon et.al. | 2503.17731 | null |
| 2025-03-21 | Image as an IMU: Estimating Camera Motion from a Single Motion-Blurred Image | Jerred Chen et.al. | 2503.17358 | null |
| 2025-03-21 | Pow3R: Empowering Unconstrained 3D Reconstruction with Camera and Scene Priors | Wonbong Jang et.al. | 2503.17316 | null |
| 2025-03-20 | ContactFusion: Stochastic Poisson Surface Maps from Visual and Contact Sensing | Aditya Kamireddypalli et.al. | 2503.16592 | null |
| 2025-03-19 | A Comprehensive Survey on Architectural Advances in Deep CNNs: Challenges, Applications, and Emerging Research Directions | Saddam Hussain Khan et.al. | 2503.16546 | null |
| 2025-03-20 | Probabilistic Prompt Distribution Learning for Animal Pose Estimation | Jiyong Rao et.al. | 2503.16120 | link |
| 2025-03-20 | Automating 3D Dataset Generation with Neural Radiance Fields | P. Schulz et.al. | 2503.15997 | link |
| 2025-03-20 | Learning to Efficiently Adapt Foundation Models for Self-Supervised Endoscopic 3D Scene Reconstruction from Any Cameras | Beilei Cui et.al. | 2503.15917 | null |
| 2025-03-19 | EdgeRegNet: Edge Feature-based Multimodal Registration Network between Images and LiDAR Point Clouds | Yuanchao Yue et.al. | 2503.15284 | null |
| 2025-03-20 | GIVEPose: Gradual Intra-class Variation Elimination for RGB-based Category-Level Object Pose Estimation | Zinqin Huang et.al. | 2503.15110 | link |
| 2025-03-20 | Distilling 3D distinctive local descriptors for 6D pose estimation | Amir Hamza et.al. | 2503.15106 | link |
| 2025-03-18 | Validation of Human Pose Estimation and Human Mesh Recovery for Extracting Clinically Relevant Motion Data from Videos | Kai Armstrong et.al. | 2503.14760 | null |
| 2025-03-18 | SIR-DIFF: Sparse Image Sets Restoration with Multi-View Diffusion Model | Yucheng Mao et.al. | 2503.14463 | null |
| 2025-03-18 | SCJD: Sparse Correlation and Joint Distillation for Efficient 3D Human Pose Estimation | Weihong Chen et.al. | 2503.14097 | null |
| 2025-03-18 | Foundation Feature-Driven Online End-Effector Pose Estimation: A Marker-Free and Learning-Free Approach | Tianshu Wu et.al. | 2503.14051 | null |
| 2025-03-19 | Learning Shape-Independent Transformation via Spherical Representations for Category-Level Object Pose Estimation | Huan Ren et.al. | 2503.13926 | null |
| 2025-03-17 | STEP: Simultaneous Tracking and Estimation of Pose for Animals and Humans | Shashikant Verma et.al. | 2503.13344 | null |
| 2025-03-17 | UniHOPE: A Unified Approach for Hand-Only and Hand-Object Pose Estimation | Yinqiao Wang et.al. | 2503.13303 | null |
| 2025-03-17 | Uncertainty-Aware Knowledge Distillation for Compact and Efficient 6DoF Pose Estimation | Nassim Ali Ousalah et.al. | 2503.13053 | null |
| 2025-03-17 | PoseSyn: Synthesizing Diverse 3D Pose Data from In-the-Wild 2D Data | ChangHee Yang et.al. | 2503.13025 | null |
| 2025-03-15 | Gun Detection Using Combined Human Pose and Weapon Appearance | Amulya Reddy Maligireddy et.al. | 2503.12215 | null |
| 2025-03-15 | TACO: Taming Diffusion for in-the-wild Video Amodal Completion | Ruijie Lu et.al. | 2503.12049 | link |
| 2025-03-14 | Bring Your Rear Cameras for Egocentric 3D Human Pose Estimation | Hiroyasu Akada et.al. | 2503.11652 | null |
| 2025-03-14 | Online Test-time Adaptation for 3D Human Pose Estimation: A Practical Perspective with Estimated 2D Poses | Qiuxia Lin et.al. | 2503.11194 | null |
| 2025-03-14 | Fast and Robust Localization for Humanoid Soccer Robot via Iterative Landmark Matching | Ruochen Hou et.al. | 2503.11020 | null |
| 2025-03-13 | Clothes-Changing Person Re-identification Based On Skeleton Dynamics | Asaf Joseph et.al. | 2503.10759 | null |
| 2025-03-13 | Consistent multi-animal pose estimation in cattle using dynamic Kalman filter based tracking | Maarten Perneel et.al. | 2503.10450 | null |
| 2025-03-13 | 6D Object Pose Tracking in Internet Videos for Robotic Manipulation | Georgy Ponimatkin et.al. | 2503.10307 | null |
| 2025-03-13 | VicaSplat: A Single Run is All You Need for 3D Gaussian Splatting and Camera Estimation from Unposed Video Frames | Zhiqi Li et.al. | 2503.10286 | link |
| 2025-03-12 | Physics-Aware Human-Object Rendering from Sparse Views via 3D Gaussian Splatting | Weiquan Wang et.al. | 2503.09640 | null |
| 2025-03-12 | GenHPE: Generative Counterfactuals for 3D Human Pose Estimation with Radio Frequency Signals | Shuokang Huang et.al. | 2503.09537 | null |
| 2025-03-12 | MonoSLAM: Robust Monocular SLAM with Global Structure Optimization | Bingzheng Jiang et.al. | 2503.09296 | null |
| 2025-03-12 | Better Together: Unified Motion Capture and 3D Avatar Reconstruction | Arthur Moreau et.al. | 2503.09293 | null |
| 2025-03-11 | Acoustic Neural 3D Reconstruction Under Pose Drift | Tianxiang Lin et.al. | 2503.08930 | null |
| 2025-03-11 | Keypoint Semantic Integration for Improved Feature Matching in Outdoor Agricultural Environments | Rajitha de Silva et.al. | 2503.08843 | null |
| 2025-03-11 | Keypoint Detection and Description for Raw Bayer Images | Jiakai Lin et.al. | 2503.08673 | null |
| 2025-03-11 | SGNetPose+: Stepwise Goal-Driven Networks with Pose Information for Trajectory Prediction in Autonomous Driving | Akshat Ghiya et.al. | 2503.08016 | null |
| 2025-03-10 | Better Pose Initialization for Fast and Robust 2D/3D Pelvis Registration | Yehyun Suh et.al. | 2503.07767 | null |
| 2025-03-10 | HumanMM: Global Human Motion Recovery from Multi-shot Videos | Yuhong Zhang et.al. | 2503.07597 | link |
| 2025-03-11 | AthletePose3D: A Benchmark Dataset for 3D Human Pose Estimation and Kinematic Validation in Athletic Movements | Calvin Yeung et.al. | 2503.07499 | null |
| 2025-03-10 | Multi-Robot System for Cooperative Exploration in Unknown Environments: A Survey | Chuqi Wang et.al. | 2503.07278 | null |
| 2025-03-10 | Endo-FASt3r: Endoscopic Foundation model Adaptation for Structure from motion | Mona Sheikh Zeinoddin et.al. | 2503.07204 | null |
| 2025-03-10 | Multi-Modal 3D Mesh Reconstruction from Images and Text | Melvin Reka et.al. | 2503.07190 | null |
| 2025-03-11 | PoseLess: Depth-Free Vision-to-Joint Control via Direct Image Mapping with VLM | Alan Dao et.al. | 2503.07111 | link |
| 2025-03-09 | AxisPose: Model-Free Matching-Free Single-Shot 6D Object Pose Estimation via Axis Generation | Yang Zou et.al. | 2503.06660 | null |
| 2025-03-08 | NeuraLoc: Visual Localization in Neural Implicit Map with Dual Complementary Features | Hongjia Zhai et.al. | 2503.06117 | null |
| 2025-03-08 | Fish2Mesh Transformer: 3D Human Mesh Recovery from Egocentric Vision | David C. Jeong et.al. | 2503.06089 | null |
| 2025-03-08 | ReJSHand: Efficient Real-Time Hand Pose Estimation and Mesh Reconstruction Using Refined Joint and Skeleton Features | Shan An et.al. | 2503.05995 | link |
| 2025-03-07 | Differentiable Rendering-based Pose Estimation for Surgical Robotic Instruments | Zekai Liang et.al. | 2503.05953 | null |
| 2025-03-07 | Novel Object 6D Pose Estimation with a Single Reference View | Jian Liu et.al. | 2503.05578 | null |
| 2025-03-07 | Multi-Grained Feature Pruning for Video-Based Human Pose Estimation | Zhigang Wang et.al. | 2503.05365 | null |
| 2025-03-07 | Persistent Object Gaussian Splat (POGS) for Tracking Human and Robot Manipulation of Irregularly Shaped Objects | Justin Yu et.al. | 2503.05189 | null |
| 2025-03-07 | SplatPose: Geometry-Aware 6-DoF Pose Estimation from Single RGB Image via 3D Gaussian Splatting | Linqi Yang et.al. | 2503.05174 | null |
| 2025-03-07 | GaussianCAD: Robust Self-Supervised CAD Reconstruction from Three Orthographic Views Using 3D Gaussian Splatting | Zheng Zhou et.al. | 2503.05161 | null |
| 2025-03-06 | MarsLGPR: Mars Rover Localization with Ground Penetrating Radar | Anja Sheppard et.al. | 2503.04944 | null |
| 2025-03-06 | ReynoldsFlow: Exquisite Flow Estimation via Reynolds Transport Theorem | Yu-Hsi Chen et.al. | 2503.04500 | null |
| 2025-03-05 | Active 6D Pose Estimation for Textureless Objects using Multi-View RGB Frames | Jun Yang et.al. | 2503.03726 | null |
| 2025-03-05 | Machine Learning in Biomechanics: Key Applications and Limitations in Walking, Running, and Sports Movements | Carlo Dindorf et.al. | 2503.03717 | null |
| 2025-03-05 | Improving 6D Object Pose Estimation of metallic Household and Industry Objects | Thomas Pöllabauer et.al. | 2503.03655 | null |
| 2025-03-05 | Tiny Lidars for Manipulator Self-Awareness: Sensor Characterization and Initial Localization Experiments | Giammarco Caroleo et.al. | 2503.03449 | null |
| 2025-03-05 | Direct Sparse Odometry with Continuous 3D Gaussian Maps for Indoor Environments | Jie Deng et.al. | 2503.03373 | null |
| 2025-03-05 | Supervised Visual Docking Network for Unmanned Surface Vehicles Using Auto-labeling in Real-world Water Environments | Yijie Chu et.al. | 2503.03282 | null |
| 2025-03-05 | SCORE: Saturated Consensus Relocalization in Semantic Line Maps | Haodong Jiang et.al. | 2503.03254 | null |
| 2025-03-04 | Monocular Person Localization under Camera Ego-motion | Yu Zhan et.al. | 2503.02916 | null |
| 2025-03-04 | PIDLoc: Cross-View Pose Optimization Network Inspired by PID Controllers | Wooju Lee et.al. | 2503.02388 | null |
| 2025-03-04 | DQO-MAP: Dual Quadrics Multi-Object mapping with Gaussian Splatting | Haoyuan Li et.al. | 2503.02223 | null |
| 2025-03-04 | Zero-Shot Sim-to-Real Visual Quadrotor Control with Hard Constraints | Yan Miao et.al. | 2503.02198 | null |
| 2025-03-03 | Constraint-Based Modeling of Dynamic Entities in 3D Scene Graphs for Robust SLAM | Marco Giberna et.al. | 2503.02050 | null |
| 2025-03-03 | Category-level Meta-learned NeRF Priors for Efficient Object Mapping | Saad Ejaz et.al. | 2503.01582 | null |
| 2025-03-03 | RUSSO: Robust Underwater SLAM with Sonar Optimization against Visual Degradation | Shu Pan et.al. | 2503.01434 | null |
| 2025-03-03 | ecg2o: A Seamless Extension of g2o for Equality-Constrained Factor Graph Optimization | Anas Abdelkarim et.al. | 2503.01311 | null |
| 2025-03-03 | Convex Hull-based Algebraic Constraint for Visual Quadric SLAM | Xiaolong Yu et.al. | 2503.01254 | link |
| 2025-03-04 | Floorplan-SLAM: A Real-Time, High-Accuracy, and Long-Term Multi-Session Point-Plane SLAM for Efficient Floorplan Reconstruction | Haolin Wang et.al. | 2503.00397 | null |
| 2025-03-01 | BGM2Pose: Active 3D Human Pose Estimation with Non-Stationary Sounds | Yuto Shibata et.al. | 2503.00389 | null |
| 2025-02-28 | BST: Badminton Stroke-type Transformer for Skeleton-based Action Recognition in Racket Sports | Jing-Yuan Chang et.al. | 2502.21085 | null |
| 2025-02-28 | Two-Stream Spatial-Temporal Transformer Framework for Person Identification via Natural Conversational Keypoints | Masoumeh Chapariniya et.al. | 2502.20803 | null |
| 2025-02-27 | Cutting-edge 3D reconstruction solutions for underwater coral reef images: A review and comparison | Jiageng Zhong et.al. | 2502.20154 | null |
| 2025-02-27 | BEV-DWPVO: BEV-based Differentiable Weighted Procrustes for Low Scale-drift Monocular Visual Odometry on Ground | Yufei Wei et.al. | 2502.20078 | null |
| 2025-02-28 | SegLocNet: Multimodal Localization Network for Autonomous Driving via Bird’s-Eye-View Segmentation | Zijie Zhou et.al. | 2502.20077 | link |
| 2025-02-27 | RUBIK: A Structured Benchmark for Image Matching across Geometric Challenges | Thibaut Loiseau et.al. | 2502.19955 | null |
| 2025-02-27 | QORT-Former: Query-optimized Real-time Transformer for Understanding Two Hands Manipulating Objects | Elkhan Ismayilzada et.al. | 2502.19769 | null |
| 2025-02-27 | Accurate Pose Estimation for Flight Platforms based on Divergent Multi-Aperture Imaging System | Shunkun Liang et.al. | 2502.19708 | null |
| 2025-02-26 | Increasing the Task Flexibility of Heavy-Duty Manipulators Using Visual 6D Pose Estimation of Objects | Petri Mäkinen et.al. | 2502.19169 | null |
| 2025-02-25 | EgoSim: An Egocentric Multi-view Simulator and Real Dataset for Body-worn Cameras during Motion and Activity | Dominik Hollidt et.al. | 2502.18373 | null |
| 2025-02-25 | Learning Structure-Supporting Dependencies via Keypoint Interactive Transformer for General Mammal Pose Estimation | Tianyang Xu et.al. | 2502.18214 | link |
| 2025-02-24 | V-HOP: Visuo-Haptic 6D Object Pose Tracking | Hongyu Li et.al. | 2502.17434 | null |
| 2025-02-23 | Orchestrating Joint Offloading and Scheduling for Low-Latency Edge SLAM | Yao Zhang et.al. | 2502.16495 | null |
| 2025-02-23 | DeProPose: Deficiency-Proof 3D Human Pose Estimation via Adaptive Multi-View Fusion | Jianbin Jiao et.al. | 2502.16419 | link |
| 2025-02-21 | RGB-Only Gaussian Splatting SLAM for Unbounded Outdoor Scenes | Sicheng Yu et.al. | 2502.15633 | null |
| 2025-02-21 | SiMHand: Mining Similar Hands for Large-Scale 3D Hand Pose Pre-training | Nie Lin et.al. | 2502.15251 | null |
| 2025-02-21 | Nonlinear Dynamical Systems for Automatic Face Annotation in Head Tracking and Pose Estimation | Thoa Thieu et.al. | 2502.15179 | null |
| 2025-02-20 | Design of a Visual Pose Estimation Algorithm for Moon Landing | Atakan Süslü et.al. | 2502.14942 | null |
| 2025-02-20 | Hier-SLAM++: Neuro-Symbolic Semantic SLAM with a Hierarchically Categorical Gaussian Splatting | Boying Li et.al. | 2502.14931 | null |
| 2025-02-19 | EfficientPose 6D: Scalable and Efficient 6D Object Pose Estimation | Zixuan Fang et.al. | 2502.14061 | null |
| 2025-02-19 | Active Illumination for Visual Ego-Motion Estimation in the Dark | Francesco Crocetti et.al. | 2502.13708 | null |
| 2025-02-19 | Object-Pose Estimation With Neural Population Codes | Heiko Hoffmann et.al. | 2502.13403 | null |
| 2025-02-18 | Spatiotemporal Multi-Camera Calibration using Freely Moving People | Sang-Eun Lee et.al. | 2502.12546 | null |
| 2025-02-18 | Learning Transformation-Isomorphic Latent Space for Accurate Hand Pose Estimation | Kaiwen Ren et.al. | 2502.12535 | null |
| 2025-02-19 | FLARE: Feed-forward Geometry, Appearance and Camera Estimation from Uncalibrated Sparse Views | Shangzhan Zhang et.al. | 2502.12138 | null |
| 2025-02-17 | Enhancing Transparent Object Pose Estimation: A Fusion of GDR-Net and Edge Detection | Tessa Pulli et.al. | 2502.12027 | null |
| 2025-02-17 | SurgPose: a Dataset for Articulated Robotic Surgical Tool Pose Estimation and Tracking | Zijian Wu et.al. | 2502.11534 | null |
| 2025-02-18 | VarGes: Improving Variation in Co-Speech 3D Gesture Generation via StyleCLIPS | Ming Meng et.al. | 2502.10729 | link |
| 2025-02-15 | Semantics-aware Test-time Adaptation for 3D Human Pose Estimation | Qiuxia Lin et.al. | 2502.10724 | null |
| 2025-02-15 | Learning semantical dynamics and spatiotemporal collaboration for human pose estimation in video | Runyang Feng et.al. | 2502.10616 | null |
| 2025-02-14 | HIPPo: Harnessing Image-to-3D Priors for Model-free Zero-shot 6D Pose Estimation | Yibo Liu et.al. | 2502.10606 | null |
| 2025-02-14 | Manual2Skill: Learning to Read Manuals and Acquire Robotic Skills for Furniture Assembly Using Vision-Language Models | Chenrui Tie et.al. | 2502.10090 | null |
| 2025-02-13 | Metamorphic Testing for Pose Estimation Systems | Matias Duran et.al. | 2502.09460 | null |
| 2025-02-13 | BevSplat: Resolving Height Ambiguity via Feature-Based Gaussian Primitives for Weakly-Supervised Cross-View Localization | Qiwei Wang et.al. | 2502.09080 | null |
| 2025-02-14 | Siren Song: Manipulating Pose Estimation in XR Headsets Using Acoustic Attacks | Zijian Huang et.al. | 2502.08865 | null |
| 2025-02-12 | LIR-LIVO: A Lightweight, Robust LiDAR/Vision/Inertial Odometry with Illumination-Resilient Deep Features | Shujie Zhou et.al. | 2502.08676 | link |
| 2025-02-12 | CordViP: Correspondence-based Visuomotor Policy for Dexterous Manipulation in Real-World | Yankai Fu et.al. | 2502.08449 | null |
| 2025-02-11 | GaRLIO: Gravity enhanced Radar-LiDAR-Inertial Odometry | Chiyun Noh et.al. | 2502.07703 | link |
| 2025-02-11 | Matrix3D: Large Photogrammetry Model All-in-One | Yuanxun Lu et.al. | 2502.07685 | null |
| 2025-02-08 | Vision-in-the-loop Simulation for Deep Monocular Pose Estimation of UAV in Ocean Environment | Maneesha Wickramasuriya et.al. | 2502.05409 | null |
| 2025-02-06 | Measuring Physical Plausibility of 3D Human Poses Using Physics Simulation | Nathan Louis et.al. | 2502.04483 | link |
| 2025-02-06 | GCE-Pose: Global Context Enhancement for Category-level Object Pose Estimation | Weihang Li et.al. | 2502.04293 | null |
| 2025-02-06 | Advanced Object Detection and Pose Estimation with Hybrid Task Cascade and High-Resolution Networks | Yuhui Jin et.al. | 2502.03877 | null |
| 2025-02-05 | Mapping and Localization Using LiDAR Fiducial Markers | Yibo Liu et.al. | 2502.03510 | null |
| 2025-02-04 | Diff9D: Diffusion-Based Domain-Generalized Category-Level 9-DoF Object Pose Estimation | Jian Liu et.al. | 2502.02525 | link |
| 2025-02-03 | CleanPose: Category-Level Object Pose Estimation via Causal Learning and Knowledge Distillation | Xiao Lin et.al. | 2502.01312 | null |
| 2025-02-03 | Enhancing Feature Tracking Reliability for Visual Navigation using Real-Time Safety Filter | Dabin Kim et.al. | 2502.01092 | null |
| 2025-02-03 | ZeroBP: Learning Position-Aware Correspondence for Zero-shot 6D Pose Estimation in Bin-Picking | Jianqiu Chen et.al. | 2502.01004 | null |
| 2025-01-31 | A Direct Semi-Exhaustive Search Method for Robust, Partial-to-Full Point Cloud Registration | Richard Cheng et.al. | 2502.00115 | null |
| 2025-01-31 | XRF V2: A Dataset for Action Summarization with Wi-Fi Signals, and IMUs in Phones, Watches, Earbuds, and Glasses | Bo Lan et.al. | 2501.19034 | link |
| 2025-01-30 | SimpleDepthPose: Fast and Reliable Human Pose Estimation with RGBD-Images | Daniel Bermuth et.al. | 2501.18478 | null |
| 2025-01-29 | Online Trajectory Replanner for Dynamically Grasping Irregular Objects | Minh Nhat Vu et.al. | 2501.17968 | null |
| 2025-01-28 | DebugAgent: Efficient and Interpretable Error Slice Discovery for Comprehensive Model Debugging | Muxi Chen et.al. | 2501.16751 | null |
| 2025-01-27 | Toward Efficient Generalization in 3D Human Pose Estimation via a Canonical Domain Approach | Hoosang Lee et.al. | 2501.16146 | null |
| 2025-01-27 | NanoHTNet: Nano Human Topology Network for Efficient 3D Human Pose Estimation | Jialun Cai et.al. | 2501.15763 | null |
| 2025-01-25 | Towards Better Robustness: Progressively Joint Pose-3DGS Learning for Arbitrarily Long Videos | Zhen-Hui Dong et.al. | 2501.15096 | null |
| 2025-01-25 | SpatioTemporal Learning for Human Pose Estimation in Sparsely-Labeled Videos | Yingying Jiao et.al. | 2501.15073 | null |
| 2025-01-24 | 3D/2D Registration of Angiograms using Silhouette-based Differentiable Rendering | Taewoong Lee et.al. | 2501.14918 | link |
| 2025-01-24 | Light3R-SfM: Towards Feed-forward Structure-from-Motion | Sven Elflein et.al. | 2501.14914 | null |
| 2025-01-24 | Glissando-Net: Deep sinGLe vIew category level poSe eStimation ANd 3D recOnstruction | Bo Sun et.al. | 2501.14896 | null |
| 2025-01-24 | Optimizing Grasping Precision for Industrial Pick-and-Place Tasks Through a Novel Visual Servoing Approach | Khairidine Benali et.al. | 2501.14557 | null |
| 2025-01-24 | LiDAR-Based Vehicle Detection and Tracking for Autonomous Racing | Marcello Cellina et.al. | 2501.14502 | null |
| 2025-01-24 | Optimizing Human Pose Estimation Through Focused Human and Joint Regions | Yingying Jiao et.al. | 2501.14439 | null |
| 2025-01-24 | Causal-Inspired Multitask Learning for Video-Based Human Pose Estimation | Haipeng Chen et.al. | 2501.14356 | null |
| 2025-01-24 | HAMMER: Heterogeneous, Multi-Robot Semantic Gaussian Splatting | Javier Yu et.al. | 2501.14147 | null |
| 2025-01-23 | Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass | Jianing Yang et.al. | 2501.13928 | null |
| 2025-01-23 | EgoHand: Ego-centric Hand Pose Estimation and Gesture Recognition with Head-mounted Millimeter-wave Radar and IMUs | Yizhe Lv et.al. | 2501.13805 | link |
| 2025-01-23 | VIGS SLAM: IMU-based Large-Scale 3D Gaussian Splatting SLAM | Gyuhyeon Pak et.al. | 2501.13402 | null |
| 2025-01-22 | Deep Learning-Based Image Recovery and Pose Estimation for Resident Space Objects | Louis Aberdeen et.al. | 2501.13009 | null |
| 2025-01-21 | BlanketGen2-Fit3D: Synthetic Blanket Augmentation Towards Improving Real-World In-Bed Blanket Occluded Human Pose Estimation | Tamás Karácsony et.al. | 2501.12318 | null |
| 2025-01-19 | Refinement Module based on Parse Graph of Feature Map for Human Pose Estimation | Shibang Liu et.al. | 2501.11069 | null |
| 2025-01-17 | landmarker: a Toolkit for Anatomical Landmark Localization in 2D/3D Images | Jef Jonkers et.al. | 2501.10098 | link |
| 2025-01-16 | A New Teacher-Reviewer-Student Framework for Semi-supervised 2D Human Pose Estimation | Wulian Yun et.al. | 2501.09565 | null |
| 2025-01-21 | Towards Robust and Realistic Human Pose Estimation via WiFi Signals | Yang Chen et.al. | 2501.09411 | link |
| 2025-01-16 | RoboReflect: Robotic Reflective Reasoning for Grasping Ambiguous-Condition Objects | Zhen Luo et.al. | 2501.09307 | null |
| 2025-01-16 | BRIGHT-VO: Brightness-Guided Hybrid Transformer for Visual Odometry with Multi-modality Refinement Module | Dongzhihan Wang et.al. | 2501.08659 | null |
| 2025-01-14 | Poseidon: A ViT-based Architecture for Multi-Frame Pose Estimation with Adaptive Frame Weighting and Multi-Scale Feature Fusion | Cesare Davide Pace et.al. | 2501.08446 | link |
| 2025-01-14 | Leveraging 2D Masked Reconstruction for Domain Adaptation of 3D Pose Estimation | Hansoo Park et.al. | 2501.08408 | null |
| 2025-01-14 | Predicting 4D Hand Trajectory from Monocular Videos | Yufei Ye et.al. | 2501.08329 | null |
| 2025-01-14 | A Critical Synthesis of Uncertainty Quantification and Foundation Models in Monocular Depth Estimation | Steven Landgraf et.al. | 2501.08188 | null |
| 2025-01-14 | AgentPose: Progressive Distribution Alignment via Feature Agent for Human Pose Distillation | Feng Zhang et.al. | 2501.08088 | null |
| 2025-01-14 | Robust Low-Light Human Pose Estimation through Illumination-Texture Modulation | Feng Zhang et.al. | 2501.08038 | null |
| 2025-01-14 | BioPose: Biomechanically-accurate 3D Pose Estimation from Monocular Videos | Farnoosh Koleini et.al. | 2501.07800 | null |
| 2025-01-13 | Fixing the Scale and Shift in Monocular Depth For Camera Pose Estimation | Yaqing Ding et.al. | 2501.07742 | link |
| 2025-01-13 | Efficiently Closing Loops in LiDAR-Based SLAM Using Point Cloud Density Maps | Saurabh Gupta et.al. | 2501.07399 | null |
| 2025-01-13 | Collaborative Learning for 3D Hand-Object Reconstruction and Compositional Action Recognition from Egocentric RGB Videos Using Superquadrics | Tze Ho Elden Tse et.al. | 2501.07100 | null |
| 2025-01-10 | eKalibr: Dynamic Intrinsic Calibration for Event Cameras From First Principles of Events | Shuolong Chen et.al. | 2501.05688 | null |
| 2025-01-09 | Relative Pose Estimation through Affine Corrections of Monocular Depth Priors | Yifan Yu et.al. | 2501.05446 | link |
| 2025-01-09 | From Simple to Complex Skills: The Case of In-Hand Object Reorientation | Haozhi Qi et.al. | 2501.05439 | null |
| 2025-01-11 | Towards Balanced Continual Multi-Modal Learning in Human Pose Estimation | Jiaxuan Peng et.al. | 2501.05264 | null |
| 2025-01-08 | KN-LIO: Geometric Kinematics and Neural Field Coupled LiDAR-Inertial Odometry | Zhong Wang et.al. | 2501.04263 | null |
| 2025-01-10 | MC-VTON: Minimal Control Virtual Try-On Diffusion Transformer | Junsheng Luan et.al. | 2501.03630 | null |
| 2025-01-07 | TexHOI: Reconstructing Textures of 3D Unknown Objects in Monocular Hand-Object Interaction Scenes | Alakh Aggarwal et.al. | 2501.03525 | link |
| 2025-01-06 | Mobile Augmented Reality Framework with Fusional Localization and Pose Estimation | Songlin Hou et.al. | 2501.03336 | null |
| 2025-01-06 | SurgRIPE challenge: Benchmark of Surgical Robot Instrument Pose Estimation | Haozheng Xu et.al. | 2501.02990 | null |
| 2025-01-06 | HaWoR: World-Space Hand Motion Reconstruction from Egocentric Videos | Jinglei Zhang et.al. | 2501.02973 | null |
| 2025-01-06 | Spiking monocular event based 6D pose estimation for space application | Jonathan Courtois et.al. | 2501.02916 | null |
| 2025-01-06 | Universal Features Guided Zero-Shot Category-Level Object Pose Estimation | Wentian Qu et.al. | 2501.02831 | null |
| 2025-01-06 | Unsupervised Domain Adaptation for Occlusion Resilient Human Pose Estimation | Arindam Dutta et.al. | 2501.02773 | null |
| 2025-01-06 | WorldPose: A World Cup Dataset for Global 3D Human Pose Estimation | Tianjian Jiang et.al. | 2501.02771 | null |
| 2025-01-05 | LP-ICP: General Localizability-Aware Point Cloud Registration for Robust Localization in Extreme Unstructured Environments | Haosong Yue et.al. | 2501.02580 | null |
| 2025-01-04 | ROLO-SLAM: Rotation-Optimized LiDAR-Only SLAM in Uneven Terrain with Ground Vehicle | Yinchuan Wang et.al. | 2501.02166 | link |
| 2025-01-03 | TCPFormer: Learning Temporal Correlation with Implicit Pose Proxy for 3D Human Pose Estimation | Jiajie Liu et.al. | 2501.01770 | null |
| 2025-01-03 | Laparoscopic Scene Analysis for Intraoperative Visualisation of Gamma Probe Signals in Minimally Invasive Cancer Surgery | Baoru Huang et.al. | 2501.01752 | null |
| 2025-01-02 | On Unifying Video Generation and Camera Pose Estimation | Chun-Hao Paul Huang et.al. | 2501.01409 | null |
| 2025-01-02 | L3D-Pose: Lifting Pose for 3D Avatars from a Single Camera in the Wild | Soumyaratna Debnath et.al. | 2501.01174 | null |
| 2024-12-31 | Relative Pose Observability Analysis Using Dual Quaternions | Nicholas B. Andrews et.al. | 2501.00657 | null |
| 2024-12-31 | VinT-6D: A Large-Scale Object-in-hand Dataset from Vision, Touch and Proprioception | Zhaoliang Wan et.al. | 2501.00510 | null |
| 2024-12-30 | Hierarchical Pose Estimation and Mapping with Multi-Scale Neural Feature Fields | Evgenii Kruzhkov et.al. | 2412.20976 | null |
| 2024-12-30 | ReFlow6D: Refraction-Guided Transparent Object 6D Pose Estimation via Intermediate Representation Learning | Hrishikesh Gupta et.al. | 2412.20830 | link |
| 2024-12-30 | Frequency-aware Event Cloud Network | Hongwei Ren et.al. | 2412.20803 | null |
| 2024-12-30 | KeyGS: A Keyframe-Centric Gaussian Splatting Method for Monocular Image Sequences | Keng-Wei Chang et.al. | 2412.20767 | null |
| 2024-12-30 | Towards nation-wide analytical healthcare infrastructures: A privacy-preserving augmented knee rehabilitation case study | Boris Bačić et.al. | 2412.20733 | null |
| 2024-12-29 | Exploiting Aggregation and Segregation of Representations for Domain Adaptive Human Pose Estimation | Qucheng Peng et.al. | 2412.20538 | link |
| 2024-12-28 | MambaVO: Deep Visual Odometry Based on Sequential Matching Refinement and Training Smoothing | Shuo Wang et.al. | 2412.20082 | null |
| 2024-12-28 | GSplatLoc: Ultra-Precise Camera Localization via 3D Gaussian Splatting | Atticus J. Zeller et.al. | 2412.20056 | link |
| 2024-12-27 | Optimizing Local-Global Dependencies for Accurate 3D Human Pose Estimation | Guangsheng Xu et.al. | 2412.19676 | link |
| 2024-12-27 | Dust to Tower: Coarse-to-Fine Photo-Realistic Scene Reconstruction from Sparse Uncalibrated Images | Xudong Cai et.al. | 2412.19518 | null |
| 2024-12-26 | Humans as a Calibration Pattern: Dynamic 3D Scene Reconstruction from Unsynchronized and Uncalibrated Videos | Changwoon Choi et.al. | 2412.19089 | null |
| 2024-12-23 | Reconstructing People, Places, and Cameras | Lea Müller et.al. | 2412.17806 | null |
| 2024-12-22 | Leveraging Consistent Spatio-Temporal Correspondence for Robust Visual Odometry | Zhaoxing Zhang et.al. | 2412.16923 | null |
| 2024-12-21 | EasyVis2: A Real Time Multi-view 3D Visualization for Laparoscopic Surgery Training Enhanced by a Deep Neural Network YOLOv8-Pose | Yung-Hong Sun et.al. | 2412.16742 | null |
| 2024-12-21 | FACTS: Fine-Grained Action Classification for Tactical Sports | Christopher Lai et.al. | 2412.16454 | null |
| 2024-12-20 | Can Generative Video Models Help Pose Estimation? | Ruojin Cai et.al. | 2412.16155 | null |
| 2024-12-20 | Monkey Transfer Learning Can Improve Human Pose Estimation | Bradley Scott et.al. | 2412.15966 | null |
| 2024-12-19 | Scaling 4D Representations | João Carreira et.al. | 2412.15212 | null |
| 2024-12-13 | IMPROVE: Impact of Mobile Phones on Remote Online Virtual Education | Roberto Daza et.al. | 2412.14195 | link |
| 2024-12-18 | Level-Set Parameters: Novel Representation for 3D Shape Analysis | Huan Lei et.al. | 2412.13502 | null |
| 2024-12-18 | Pre-training a Density-Aware Pose Transformer for Robust LiDAR-based 3D Human Pose Estimation | Xiaoqi An et.al. | 2412.13454 | null |
| 2024-12-17 | ORFormer: Occlusion-Robust Transformer for Accurate Facial Landmark Detection | Jui-Che Chiang et.al. | 2412.13174 | link |
| 2024-12-17 | CondiMen: Conditional Multi-Person Mesh Recovery | Brégier Romain et.al. | 2412.13058 | null |
| 2024-12-17 | ShotVL: Human-Centric Highlight Frame Retrieval via Language Queries | Wangyu Xue et.al. | 2412.12675 | null |
| 2024-12-16 | Category Level 6D Object Pose Estimation from a Single RGB Image using Diffusion | Adam Bethell et.al. | 2412.11420 | null |
| 2024-12-13 | ExeChecker: Where Did I Go Wrong? | Yiwen Gu et.al. | 2412.10573 | null |
| 2024-12-11 | CUPS: Improving Human Pose-Shape Estimators with Conformalized Deep Uncertainty | Harry Zhang et.al. | 2412.10431 | null |
| 2024-12-13 | RP-SLAM: Real-time Photorealistic SLAM with Efficient 3D Gaussian Splatting | Lizhi Bai et.al. | 2412.09868 | null |
| 2024-12-12 | Stereo4D: Learning How Things Move in 3D from Internet Stereo Videos | Linyi Jin et.al. | 2412.09621 | link |
| 2024-12-12 | FreeSplatter: Pose-free Gaussian Splatting for Sparse-view 3D Reconstruction | Jiale Xu et.al. | 2412.09573 | link |
| 2024-12-11 | BLADE: Single-view Body Mesh Learning through Accurate Depth Estimation | Shengze Wang et.al. | 2412.08640 | null |
| 2024-12-12 | Drift-free Visual SLAM using Digital Twins | Roxane Merat et.al. | 2412.08496 | null |
| 2024-12-11 | Reloc3r: Large-Scale Training of Relative Camera Pose Regression for Generalizable, Fast, and Accurate Visual Localization | Siyan Dong et.al. | 2412.08376 | link |
| 2024-12-10 | LoRA3D: Low-Rank Self-Calibration of 3D Geometric Foundation Models | Ziqi Lu et.al. | 2412.07746 | null |
| 2024-12-09 | MV-DUSt3R+: Single-Stage Scene Reconstruction from Sparse Views In 2 Seconds | Zhenggang Tang et.al. | 2412.06974 | link |
| 2024-12-09 | An Efficient Scene Coordinate Encoding and Relocalization Method | Kuan Xu et.al. | 2412.06488 | link |
| 2024-12-09 | Attention-Enhanced Lightweight Hourglass Network for Human Pose Estimation | Marsha Mariya Kappan et.al. | 2412.06227 | null |
| 2024-12-06 | CCS: Continuous Learning for Customized Incremental Wireless Sensing Services | Qunhang Fu et.al. | 2412.04821 | null |
| 2024-12-05 | ProPLIKS: Probabilistic 3D human body pose estimation | Karthik Shetty et.al. | 2412.04665 | null |
| 2024-12-05 | DualPM: Dual Posed-Canonical Point Maps for 3D Shape and Pose Reconstruction | Ben Kaye et.al. | 2412.04464 | link |
| 2024-12-05 | Targeted Hard Sample Synthesis Based on Estimated Pose and Occlusion Error for Improved Object Pose Estimation | Alan Li et.al. | 2412.04279 | null |
| 2024-12-04 | Sparse-view Pose Estimation and Reconstruction via Analysis by Generative Synthesis | Qitao Zhao et.al. | 2412.03570 | null |
| 2024-12-06 | NVComposer: Boosting Generative Novel View Synthesis with Multiple Sparse and Unposed Images | Lingen Li et.al. | 2412.03517 | link |
| 2024-12-05 | A Bidirectional Siamese Recurrent Neural Network for Accurate Gait Recognition Using Body Landmarks | Proma Hossain Progga et.al. | 2412.03498 | null |
| 2024-12-04 | MCVO: A Generic Visual Odometry for Arbitrarily Arranged Multi-Cameras | Huai Yu et.al. | 2412.03146 | link |
| 2024-12-04 | An indoor DSO-based ceiling-vision odometry system for indoor industrial environments | Abdelhak Bougouffa et.al. | 2412.02950 | null |
| 2024-12-03 | EgoCast: Forecasting Egocentric Human Pose in the Wild | Maria Escobar et.al. | 2412.02903 | null |
| 2024-12-02 | emg2pose: A Large and Diverse Benchmark for Surface Electromyographic Hand Pose Estimation | Sasha Salter et.al. | 2412.02725 | null |
| 2024-12-03 | ProbPose: A Probabilistic Approach to 2D Human Pose Estimation | Miroslav Purkrabek et.al. | 2412.02254 | link |
| 2024-12-03 | Cascaded Multi-Scale Attention for Enhanced Multi-Scale Feature Extraction and Interaction with Low-Resolution Images | Xiangyong Lu et.al. | 2412.02197 | link |
| 2024-12-03 | CLERF: Contrastive LEaRning for Full Range Head Pose Estimation | Ting-Ruen Wei et.al. | 2412.02066 | null |
| 2024-12-02 | Detection, Pose Estimation and Segmentation for Multiple Bodies: Closing the Virtuous Circle | Miroslav Purkrabek et.al. | 2412.01562 | link |
| 2024-12-02 | 6DOPE-GS: Online 6D Object Pose Estimation using Gaussian Splatting | Yufeng Jin et.al. | 2412.01543 | null |
| 2024-12-02 | HandOS: 3D Hand Reconstruction in One Stage | Xingyu Chen et.al. | 2412.01537 | null |
| 2024-12-02 | SF-Loc: A Visual Mapping and Geo-Localization System based on Sparse Visual Structure Frames | Yuxuan Zhou et.al. | 2412.01500 | null |
| 2024-12-02 | MamKPD: A Simple Mamba Baseline for Real-Time 2D Keypoint Detection | Yonghao Dang et.al. | 2412.01422 | null |
| 2024-12-02 | Cross-Modal Visual Relocalization in Prior LiDAR Maps Utilizing Intensity Textures | Qiyuan Shen et.al. | 2412.01299 | null |
| 2024-12-02 | CRISP: Object Pose and Shape Estimation with Test-Time Adaptation | Jingnan Shi et.al. | 2412.01052 | null |
| 2024-11-29 | Diorama: Unleashing Zero-shot Single-view 3D Scene Modeling | Qirui Wu et.al. | 2411.19492 | null |
| 2024-11-29 | Multiview Equivariance Improves 3D Correspondence Understanding with Minimal Feature Finetuning | Yang You et.al. | 2411.19458 | link |
| 2024-11-28 | GMS-VINS: Multi-category Dynamic Objects Semantic Segmentation for Enhanced Visual-Inertial Odometry Using a Promptable Foundation Model | Rui Zhou et.al. | 2411.19289 | null |
| 2024-11-28 | HOT3D: Hand and Object Tracking in 3D from Egocentric Multi-View Videos | Prithviraj Banerjee et.al. | 2411.19167 | null |
| 2024-11-28 | Lost & Found: Updating Dynamic 3D Scene Graphs from Egocentric Observations | Tjark Behrens et.al. | 2411.19162 | null |
| 2024-11-28 | Distributed Dual Quaternion Extended Kalman Filtering for Spacecraft Pose Estimation | Mathias Hudoba de Badyn et.al. | 2411.19033 | null |
| 2024-11-28 | Waterfall Transformer for Multi-person Pose Estimation | Navin Ranjan et.al. | 2411.18944 | null |
| 2024-12-02 | AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers | Sherwin Bahmani et.al. | 2411.18673 | null |
| 2024-11-27 | XR-MBT: Multi-modal Full Body Tracking for XR through Self-Supervision with Learned Depth Point Cloud Registration | Denys Rozumnyi et.al. | 2411.18377 | null |
| 2024-11-26 | Self-supervised Monocular Depth and Pose Estimation for Endoscopy with Generative Latent Priors | Ziang Xu et.al. | 2411.17790 | null |
| 2024-11-26 | Geometric Point Attention Transformer for 3D Shape Reassembly | Jiahan Li et.al. | 2411.17788 | null |
| 2024-11-26 | RoboPEPP: Vision-Based Robot Pose and Joint Angle Estimation through Embedding Predictive Pre-Training | Raktim Gautam Goswami et.al. | 2411.17662 | null |
| 2024-11-26 | Communication-Efficient Cooperative SLAMMOT via Determining the Number of Collaboration Vehicles | Susu Fang et.al. | 2411.17432 | null |
| 2024-11-26 | Boost 3D Reconstruction using Diffusion-based Monocular Camera Calibration | Junyuan Deng et.al. | 2411.17240 | link |
| 2024-11-27 | SelfSplat: Pose-Free and 3D Prior-Free Generalizable 3D Gaussian Splatting | Gyeongjin Kang et.al. | 2411.17190 | link |
| 2024-11-26 | GMFlow: Global Motion-Guided Recurrent Flow for 6D Object Pose Estimation | Xin Liu et.al. | 2411.17174 | null |
| 2024-11-25 | Diffusion Features for Zero-Shot 6DoF Object Pose Estimation | Bernd Von Gimborn et.al. | 2411.16668 | null |
| 2024-11-25 | Edge Weight Prediction For Category-Agnostic Pose Estimation | Or Hirschorn et.al. | 2411.16665 | link |
| 2024-11-25 | SplatFlow: Multi-View Rectified Flow Model for 3D Gaussian Splatting Synthesis | Hyojun Go et.al. | 2411.16443 | link |
| 2024-11-25 | One Diffusion to Generate Them All | Duong H. Le et.al. | 2411.16318 | link |
| 2024-11-25 | UNOPose: Unseen Object Pose Estimation with an Unposed RGB-D Reference Image | Xingyu Liu et.al. | 2411.16106 | link |
| 2024-11-24 | Generalizable Single-view Object Pose Estimation by Two-side Generating and Matching | Yujing Sun et.al. | 2411.15860 | link |
| 2024-11-24 | PEnG: Pose-Enhanced Geo-Localisation | Tavis Shore et.al. | 2411.15742 | link |
| 2024-11-22 | Personalization of Wearable Sensor-Based Joint Kinematic Estimation Using Computer Vision for Hip Exoskeleton Applications | Changseob Song et.al. | 2411.15366 | null |
| 2024-11-22 | mmWave Radar for Sit-to-Stand Analysis: A Comparative Study with Wearables and Kinect | Shuting Hu et.al. | 2411.14656 | null |
| 2024-11-21 | DINO-X: A Unified Vision Model for Open-World Object Detection and Understanding | Tianhe Ren et.al. | 2411.14347 | link |
| 2024-11-21 | SEMPose: A Single End-to-end Network for Multi-object Pose Estimation | Xin Liu et.al. | 2411.14002 | null |
| 2024-11-21 | Dehazing-aided Multi-Rate Multi-Modal Pose Estimation Framework for Mitigating Visual Disturbances in Extreme Underwater Domain | Vidya Sudevan et.al. | 2411.13988 | null |
| 2024-11-21 | Hybrid-Neuromorphic Approach for Underwater Robotics Applications: A Conceptual Framework | Vidya Sudevan et.al. | 2411.13962 | null |
| 2024-11-20 | Developing Normative Gait Cycle Parameters for Clinical Analysis Using Human Pose Estimation | Rahm Ranjan et.al. | 2411.13716 | null |
| 2024-11-20 | Robust SG-NeRF: Robust Scene Graph Aided Neural Surface Reconstruction | Yi Gu et.al. | 2411.13620 | null |
| 2024-11-19 | VioPose: Violin Performance 4D Pose Estimation by Hierarchical Audiovisual Inference | Seong Jong Yoo et.al. | 2411.13607 | link |
| 2024-11-20 | DATAP-SfM: Dynamic-Aware Tracking Any Point for Robust Structure from Motion in the Wild | Weicai Ye et.al. | 2411.13291 | null |
| 2024-11-20 | X as Supervision: Contending with Depth Ambiguity in Unsupervised Monocular 3D Pose Estimation | Yuchen Yang et.al. | 2411.13026 | link |
| 2024-11-19 | IoT-Based 3D Pose Estimation and Motion Optimization for Athletes: Application of C3D and OpenPose | Fei Ren et.al. | 2411.12676 | null |
| 2024-11-15 | SPARS3R: Semantic Prior Alignment and Regularization for Sparse 3D Reconstruction | Yutao Tang et.al. | 2411.12592 | link |
| 2024-11-19 | GLOVER: Generalizable Open-Vocabulary Affordance Reasoning for Task-Oriented Grasping | Teli Ma et.al. | 2411.12286 | null |
| 2024-11-18 | IKEA Manuals at Work: 4D Grounding of Assembly Instructions on Internet Videos | Yunong Liu et.al. | 2411.11409 | link |
| 2024-11-15 | USP-Gaussian: Unifying Spike-based Image Reconstruction, Pose Correction and Gaussian Splatting | Kang Chen et.al. | 2411.10504 | link |
| 2024-11-13 | ReMP: Reusable Motion Prior for Multi-domain 3D Human Pose Estimation and Motion Inbetweening | Hojun Jang et.al. | 2411.09435 | null |
| 2024-11-13 | Generalized Pose Space Embeddings for Training In-the-Wild using Anaylis-by-Synthesis | Dominik Borer et.al. | 2411.08603 | null |
| 2024-11-13 | DG-SLAM: Robust Dynamic Gaussian Splatting SLAM with Hybrid Pose Optimization | Yueming Xu et.al. | 2411.08373 | null |
| 2024-11-16 | RINO: Accurate, Robust Radar-Inertial Odometry with Non-Iterative Estimation | Shuocheng Yang et.al. | 2411.07699 | link |
| 2024-11-12 | Human Arm Pose Estimation with a Shoulder-worn Force-Myography Device for Human-Robot Interaction | Rotem Atari et.al. | 2411.07644 | null |
| 2024-11-12 | Towards Seamless Integration of Magnetic Tracking into Fluoroscopy-guided Interventions | Shuwei Xing et.al. | 2411.07495 | null |
| 2024-11-08 | Acoustic-based 3D Human Pose Estimation Robust to Human Position | Yusuke Oumi et.al. | 2411.07165 | null |
| 2024-11-11 | CapeLLM: Support-Free Category-Agnostic Pose Estimation with Multimodal Large Language Models | Junho Kim et.al. | 2411.06869 | null |
| 2024-11-11 | GenZ-ICP: Generalizable and Degeneracy-Robust LiDAR Odometry Using an Adaptive Weighting | Daehan Lee et.al. | 2411.06766 | null |
| 2024-11-11 | GTA-Net: An IoT-Integrated 3D Human Pose Estimation System for Real-Time Adolescent Sports Posture Correction | Shizhe Yuan et.al. | 2411.06725 | null |
| 2024-11-10 | Magnetic Field Aided Vehicle Localization with Acceleration Correction | Mrunmayee Deshpande et.al. | 2411.06543 | null |
| 2024-11-10 | Visuotactile-Based Learning for Insertion with Compliant Hands | Osher Azulay et.al. | 2411.06408 | null |
| 2024-11-08 | Poze: Sports Technique Feedback under Data Constraints | Agamdeep Singh et.al. | 2411.05734 | null |
| 2024-11-08 | DeepArUco++: Improved detection of square fiducial markers in challenging lighting conditions | Rafael Berral-Soler et.al. | 2411.05552 | link |
| 2024-11-08 | Tightly-Coupled, Speed-aided Monocular Visual-Inertial Localization in Topological Map | Chanuk Yang et.al. | 2411.05497 | null |
| 2024-11-08 | Relative Pose Estimation for Nonholonomic Robot Formation with UWB-IO Measurements | Kunrui Ze et.al. | 2411.05481 | null |
| 2024-11-07 | Social EgoMesh Estimation | Luca Scofano et.al. | 2411.04598 | link |
| 2024-11-07 | Pose2Trajectory: Using Transformers on Body Pose to Predict Tennis Player’s Trajectory | Ali K. AlShami et.al. | 2411.04501 | null |
| 2024-11-07 | SuperQ-GRASP: Superquadrics-based Grasp Pose Estimation on Larger Objects for Mobile-Manipulation | Xun Tu et.al. | 2411.04386 | null |
| 2024-11-08 | GS2Pose: Two-stage 6D Object Pose Estimation Guided by Gaussian Splatting | Jilan Mei et.al. | 2411.03807 | null |
| 2024-11-06 | Estimation of Psychosocial Work Environment Exposures Through Video Object Detection. Proof of Concept Using CCTV Footage | Claus D. Hansen et.al. | 2411.03724 | null |
| 2024-11-05 | Estimating Ego-Body Pose from Doubly Sparse Egocentric Video Data | Seunggeun Chi et.al. | 2411.03561 | null |
| 2024-11-05 | HFGaussian: Learning Generalizable Gaussian Human with Integrated Human Features | Arnab Dey et.al. | 2411.03086 | null |
| 2024-11-04 | Semantic Masking and Visual Feature Matching for Robust Localization | Luisa Mao et.al. | 2411.01804 | null |
| 2024-11-03 | Activating Self-Attention for Multi-Scene Absolute Pose Regression | Miso Lee et.al. | 2411.01443 | link |
| 2024-11-04 | 3D Equivariant Pose Regression via Direct Wigner-D Harmonics Prediction | Jongmin Lee et.al. | 2411.00543 | null |
| 2024-10-31 | Whole-Herd Elephant Pose Estimation from Drone Data for Collective Behavior Analysis | Brody McNutt et.al. | 2411.00196 | null |
| 2024-10-31 | No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images | Botao Ye et.al. | 2410.24207 | link |
| 2024-11-06 | SceneComplete: Open-World 3D Scene Completion in Complex Real World Environments for Robot Manipulation | Aditya Agarwal et.al. | 2410.23643 | null |
| 2024-10-30 | SCRREAM: SCan, Register, REnder And Map: A Framework for Annotating Accurate and Dense 3D Indoor Scenes with a Benchmark | HyunJun Jung et.al. | 2410.22715 | null |
| 2024-10-29 | LiVisSfM: Accurate and Robust Structure-from-Motion with LiDAR and Visual Cues | Hanqing Jiang et.al. | 2410.22213 | null |
| 2024-10-29 | PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting | Sunghwan Hong et.al. | 2410.22128 | link |
| 2024-10-29 | HRPVT: High-Resolution Pyramid Vision Transformer for medium and small-scale human pose estimation | Zhoujie Xu et.al. | 2410.22079 | null |
| 2024-10-29 | EI-Nexus: Towards Unmediated and Flexible Inter-Modality Local Feature Extraction and Matching for Event-Image Data | Zhonghua Yi et.al. | 2410.21743 | null |
| 2024-10-28 | Synthetica: Large Scale Synthetic Data for Robot Perception | Ritvik Singh et.al. | 2410.21153 | null |
| 2024-10-29 | BLAPose: Enhancing 3D Human Pose Estimation with Bone Length Adjustment | Chih-Hsiang Hsu et.al. | 2410.20731 | link |
| 2024-11-01 | RopeTP: Global Human Motion Recovery via Integrating Robust Pose Estimation with Diffusion Trajectory Prior | Mingjiang Liang et.al. | 2410.20358 | null |
| 2024-10-27 | Harmony4D: A Video Dataset for In-The-Wild Close Human Interactions | Rawal Khirodkar et.al. | 2410.20294 | null |
| 2024-10-26 | Neural Fields in Robotics: A Survey | Muhammad Zubair Irshad et.al. | 2410.20220 | link |
| 2024-10-25 | DECADE: Towards Designing Efficient-yet-Accurate Distance Estimation Modules for Collision Avoidance in Mobile Advanced Driver Assistance Systems | Muhammad Zaeem Shahzad et.al. | 2410.19336 | null |
| 2024-10-24 | Where Am I and What Will I See: An Auto-Regressive Model for Spatial Localization and View Prediction | Junyi Chen et.al. | 2410.18962 | null |
| 2024-10-24 | VoxelKeypointFusion: Generalizable Multi-View Multi-Person Pose Estimation | Daniel Bermuth et.al. | 2410.18723 | null |
| 2024-10-23 | Robust Two-View Geometry Estimation with Implicit Differentiation | Vladislav Pyatov et.al. | 2410.17983 | link |
| 2024-10-23 | YOLOv11: An Overview of the Key Architectural Enhancements | Rahima Khanam et.al. | 2410.17725 | null |
| 2024-10-21 | Assisted Physical Interaction: Autonomous Aerial Robots with Neural Network Detection, Navigation, and Safety Layers | Andrea Berra et.al. | 2410.15802 | null |
| 2024-10-21 | ARTS: Semi-Analytical Regressor using Disentangled Skeletal Representations for Human Mesh Recovery from Videos | Tao Tang et.al. | 2410.15582 | link |
| 2024-10-20 | Neural Active Structure-from-Motion in Dark and Textureless Environment | Kazuto Ichimaru et.al. | 2410.15378 | null |
| 2024-10-20 | POSE: Pose estimation Of virtual Sync Exhibit system | Hao-Tang Tsui et.al. | 2410.15343 | link |
| 2024-10-18 | Graph Optimality-Aware Stochastic LiDAR Bundle Adjustment with Progressive Spatial Smoothing | Jianping Li et.al. | 2410.14565 | null |
| 2024-10-18 | Multi-modal Pose Diffuser: A Multimodal Generative Conditional Pose Prior | Calvin-Khang Ta et.al. | 2410.14540 | null |
| 2024-10-18 | Sim2real Cattle Joint Estimation in 3D point clouds | Okour Mohammad et.al. | 2410.14419 | null |
| 2024-10-18 | Unlabeled Action Quality Assessment Based on Multi-dimensional Adaptive Constrained Dynamic Time Warping | Renguang Chen et.al. | 2410.14161 | null |
| 2024-10-15 | From Real Artifacts to Virtual Reference: A Robust Framework for Translating Endoscopic Images | Junyang Wu et.al. | 2410.13896 | null |
| 2024-10-17 | DualQuat-LOAM: LiDAR Odometry and Mapping parametrized on Dual Quaternions | Edison P. Velasco-Sánchez et.al. | 2410.13541 | null |
| 2024-10-17 | Object Pose Estimation Using Implicit Representation For Transparent Objects | Varun Burde et.al. | 2410.13465 | null |
| 2024-10-16 | Optimizing Multi-Task Learning for Accurate Spacecraft Pose Estimation | Francesco Evangelisti et.al. | 2410.12679 | null |
| 2024-10-15 | Contrastive Touch-to-Touch Pretraining | Samanta Rodriguez et.al. | 2410.11834 | null |
| 2024-10-18 | X-Fi: A Modality-Invariant Foundation Model for Multimodal Human Sensing | Xinyan Chen et.al. | 2410.10167 | null |
| 2024-10-13 | Occluded Human Pose Estimation based on Limb Joint Augmentation | Gangtao Han et.al. | 2410.09885 | null |
| 2024-10-15 | POPoS: Improving Efficient and Robust Facial Landmark Detection with Parallel Optimal Position Search | Chong-Yang Xiang et.al. | 2410.09583 | null |
| 2024-10-12 | Enhancing Single Image to 3D Generation using Gaussian Splatting and Hybrid Diffusion Priors | Hritam Basak et.al. | 2410.09467 | null |
| 2024-10-12 | Towards Multi-Modal Animal Pose Estimation: An In-Depth Analysis | Qianyi Deng et.al. | 2410.09312 | link |
| 2024-10-11 | CVAM-Pose: Conditional Variational Autoencoder for Multi-Object Monocular Pose Estimation | Jianyu Zhao et.al. | 2410.09010 | link |
| 2024-10-11 | Look Gauss, No Pose: Novel View Synthesis using Gaussian Splatting without Accurate Pose Initialization | Christian Schmidt et.al. | 2410.08743 | link |
| 2024-10-10 | Generalizing Stochastic Smoothing for Differentiation and Gradient Estimation | Felix Petersen et.al. | 2410.08125 | null |
| 2024-10-10 | Robotic framework for autonomous manipulation of laboratory equipment with different degrees of transparency via 6D pose estimation | Maria Makarova et.al. | 2410.07801 | null |
| 2024-10-10 | Optimal-State Dynamics Estimation for Physics-based Human Motion Capture from Videos | Cuong Le et.al. | 2410.07795 | link |
| 2024-10-10 | Autonomous Driving in Unstructured Environments: How Far Have We Come? | Chen Min et.al. | 2410.07701 | null |
| 2024-10-10 | Invisibility Cloak: Disappearance under Human Pose Estimation via Backdoor Attacks | Minxing Zhang et.al. | 2410.07670 | null |
| 2024-10-09 | OmniPose6D: Towards Short-Term Object Pose Tracking in Dynamic Scenes from Monocular RGB | Yunzhi Lin et.al. | 2410.06694 | null |
| 2024-10-08 | Toward Scalable Image Feature Compression: A Content-Adaptive and Diffusion-Based Approach | Sha Guo et.al. | 2410.06149 | null |
| 2024-10-08 | SpecTrack: Learned Multi-Rotation Tracking via Speckle Imaging | Ziyang Chen et.al. | 2410.06028 | null |
| 2024-10-08 | AIVIO: Closed-loop, Object-relative Navigation of UAVs with AI-aided Visual Inertial Odometry | Thomas Jantos et.al. | 2410.05996 | null |
| 2024-10-08 | Are Minimal Radial Distortion Solvers Necessary for Relative Pose Estimation? | Charalambos Tzamos et.al. | 2410.05984 | link |
| 2024-10-08 | FürElise: Capturing and Physically Synthesizing Hand Motions of Piano Performance | Ruocheng Wang et.al. | 2410.05791 | null |
| 2024-10-07 | Comparison of marker-less 2D image-based methods for infant pose estimation | Lennart Jahn et.al. | 2410.04980 | null |
| 2024-10-06 | Enhancing 3D Human Pose Estimation Amidst Severe Occlusion with Dual Transformer Fusion | Mehwish Ghafoor et.al. | 2410.04574 | link |
| 2024-10-06 | LiteVLoc: Map-Lite Visual Localization for Image Goal Navigation | Jianhao Jiao et.al. | 2410.04419 | null |
| 2024-10-05 | Test-Time Adaptation for Keypoint-Based Spacecraft Pose Estimation Based on Predicted-View Synthesis | Juan Ignacio Bravo Pérez-Villar et.al. | 2410.04298 | link |
| 2024-10-05 | A Framework for Reproducible Benchmarking and Performance Diagnosis of SLAM Systems | Nikola Radulov et.al. | 2410.04242 | link |
| 2024-10-04 | Unsupervised Prior Learning: Discovering Categorical Pose Priors from Videos | Ziyu Wang et.al. | 2410.03858 | null |
| 2024-10-04 | Universal Global State Estimation for Inertial Navigation Systems | Sifeddine Benahmed et.al. | 2410.03846 | null |
| 2024-10-04 | MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion | Junyi Zhang et.al. | 2410.03825 | link |
| 2024-10-04 | Dessie: Disentanglement for Articulated 3D Horse Shape and Pose Estimation from Images | Ci Li et.al. | 2410.03438 | null |
| 2024-10-04 | HRVMamba: High-Resolution Visual State Space Model for Dense Prediction | Hao Zhang et.al. | 2410.03174 | link |
| 2024-10-04 | CLIP-Clique: Graph-based Correspondence Matching Augmented by Vision Language Models for Object-based Global Localization | Shigemichi Matsuzaki et.al. | 2410.03054 | null |
| 2024-10-03 | Why Sample Space Matters: Keyframe Sampling Optimization for LiDAR-based Place Recognition | Nikolaos Stathoulopoulos et.al. | 2410.02643 | null |
| 2024-10-03 | Key-Grid: Unsupervised 3D Keypoints Detection using Grid Heatmap Features | Chengkai Hou et.al. | 2410.02237 | null |
| 2024-10-02 | SGBA: Semantic Gaussian Mixture Model-Based LiDAR Bundle Adjustment | Xingyu Ji et.al. | 2410.01618 | null |
| 2024-10-02 | SurgeoNet: Realtime 3D Pose Estimation of Articulated Surgical Instruments from Stereo Images using a Synthetically-trained Network | Ahmed Tawfik Aboukhadra et.al. | 2410.01293 | null |
| 2024-10-01 | Pose Estimation of Buried Deep-Sea Objects using 3D Vision Deep Learning Models | Jerry Yan et.al. | 2410.01061 | null |
| 2024-10-01 | RAD: A Dataset and Benchmark for Real-Life Anomaly Detection with Robotic Observations | Kaichen Zhou et.al. | 2410.00713 | link |
| 2024-10-01 | GERA: Geometric Embedding for Efficient Point Registration Analysis | Geng Li et.al. | 2410.00589 | null |
| 2024-09-30 | Continual Human Pose Estimation for Incremental Integration of Keypoints and Pose Variations | Muhammad Saif Ullah Khan et.al. | 2409.20469 | link |
| 2024-09-30 | Classroom-Inspired Multi-Mentor Distillation with Adaptive Learning Strategies | Shalini Sarode et.al. | 2409.20237 | link |
| 2024-09-30 | PuzzleBoard: A New Camera Calibration Pattern with Position Encoding | Peer Stelldinger et.al. | 2409.20127 | link |
| 2024-09-30 | Robust Gaussian Splatting SLAM by Leveraging Loop Closure | Zunjie Zhu et.al. | 2409.20111 | null |
| 2024-09-30 | GearTrack: Automating 6D Pose Estimation | Yu Deng et.al. | 2409.19986 | null |
| 2024-09-29 | PPLNs: Parametric Piecewise Linear Networks for Event-Based Temporal Modeling and Beyond | Chen Song et.al. | 2409.19772 | null |
| 2024-09-29 | GelSlim 4.0: Focusing on Touch and Reproducibility | Andrea Sipos et.al. | 2409.19770 | null |
| 2024-09-27 | Robust Proximity Operations using Probabilistic Markov Models | Deep Parikh et.al. | 2409.19062 | null |
| 2024-09-27 | Exploiting Motion Prior for Accurate Pose Estimation of Dashboard Cameras | Yipeng Lu et.al. | 2409.18673 | null |
| 2024-09-27 | DynaWeightPnP: Toward global real-time 3D-2D solver in PnP without correspondences | Jingwei Song et.al. | 2409.18457 | null |
| 2024-09-26 | Omni6D: Large-Vocabulary 3D Object Dataset for Category-Level 6D Object Pose Estimation | Mengchen Zhang et.al. | 2409.18261 | link |
| 2024-09-26 | AI-Powered Augmented Reality for Satellite Assembly, Integration and Test | Alvaro Patricio et.al. | 2409.18101 | null |
| 2024-09-27 | Leveraging Anthropometric Measurements to Improve Human Mesh Estimation and Ensure Consistent Body Shapes | Katja Ludwig et.al. | 2409.17671 | null |
| 2024-09-25 | Safe Leaf Manipulation for Accurate Shape and Pose Estimation of Occluded Fruits | Shaoxiong Yao et.al. | 2409.17389 | null |
| 2024-09-25 | Hierarchical Tri-manual Planning for Vision-assisted Fruit Harvesting with Quadrupedal Robots | Zhichao Liu et.al. | 2409.17116 | null |
| 2024-09-25 | Self-Sensing for Proprioception and Contact Detection in Soft Robots Using Shape Memory Alloy Artificial Muscles | Ran Jing et.al. | 2409.17111 | null |
| 2024-09-25 | Online 6DoF Pose Estimation in Forests using Cross-View Factor Graph Optimisation and Deep Learned Re-localisation | Lucas Carvalho de Lima et.al. | 2409.16680 | null |
| 2024-09-25 | FAFA: Frequency-Aware Flow-Aided Self-Supervision for Underwater Object Pose Estimation | Jingyi Tang et.al. | 2409.16600 | null |
| 2024-09-25 | Robo-Platform: A Robotic System for Recording Sensors and Controlling Robots | Masoud Dayani Najafabadi et.al. | 2409.16595 | null |
| 2024-09-24 | PseudoNeg-MAE: Self-Supervised Point Cloud Learning using Conditional Pseudo-Negative Embeddings | Sutharsan Mahendren et.al. | 2409.15832 | null |
| 2024-09-24 | LaPose: Laplacian Mixture Shape Modeling for RGB-Based Category-Level Object Pose Estimation | Ruida Zhang et.al. | 2409.15727 | null |
| 2024-09-23 | Framework for Robust Localization of UUVs and Mapping of Net Pens | David Botta et.al. | 2409.15475 | null |
| 2024-09-23 | FisheyeDepth: A Real Scale Self-Supervised Depth Estimation Model for Fisheye Camera | Guoyang Zhao et.al. | 2409.15054 | link |
| 2024-09-23 | BranchPoseNet: Characterizing tree branching with a deep learning-based pose estimation approach | Stefano Puliti et.al. | 2409.14755 | link |
| 2024-09-23 | ERPoT: Effective and Reliable Pose Tracking for Mobile Robots Based on Lightweight and Compact Polygon Maps | Haiming Gao et.al. | 2409.14723 | null |
| 2024-09-22 | Tactile Functasets: Neural Implicit Representations of Tactile Datasets | Sikai Li et.al. | 2409.14592 | null |
| 2024-09-22 | AR Overlay: Training Image Pose Estimation on Curved Surface in a Synthetic Way | Sining Huang et.al. | 2409.14577 | null |
| 2024-09-22 | DROP: Dexterous Reorientation via Online Planning | Albert H. Li et.al. | 2409.14562 | null |
| 2024-09-21 | Combining Absolute and Semi-Generalized Relative Poses for Visual Localization | Vojtech Panek et.al. | 2409.14269 | null |
| 2024-09-18 | SpotLight: Robotic Scene Understanding through Interaction and Affordance Detection | Tim Engelbracht et.al. | 2409.11870 | null |
| 2024-09-18 | End-to-End Probabilistic Geometry-Guided Regression for 6DoF Object Pose Estimation | Thomas Pöllabauer et.al. | 2409.11819 | null |
| 2024-09-18 | Bridging Domain Gap for Flight-Ready Spaceborne Vision | Tae Ha Park et.al. | 2409.11661 | null |
| 2024-09-17 | Good Grasps Only: A data engine for self-supervised fine-tuning of pose estimation using grasp poses for verification | Frederik Hagelskjær et.al. | 2409.11512 | null |
| 2024-09-17 | Training Datasets Generation for Machine Learning: Application to Vision Based Navigation | Jérémy Lebreton et.al. | 2409.11383 | null |
| 2024-09-17 | OmniGen: Unified Image Generation | Shitao Xiao et.al. | 2409.11340 | link |
| 2024-09-17 | ULOC: Learning to Localize in Complex Large-Scale Environments with Ultra-Wideband Ranges | Thien-Minh Nguyen et.al. | 2409.11122 | link |
| 2024-09-17 | Depth-based Privileged Information for Boosting 3D Human Pose Estimation on RGB | Alessandro Simoni et.al. | 2409.11104 | null |
| 2024-09-21 | HGSLoc: 3DGS-based Heuristic Camera Pose Refinement | Zhongyan Niu et.al. | 2409.10925 | null |
| 2024-09-17 | Pose estimation of CubeSats via sensor fusion and Error-State Extended Kalman Filter | Deep Parikh et.al. | 2409.10815 | null |
| 2024-09-16 | CtRNet-X: Camera-to-Robot Pose Estimation in Real-world Conditions Using a Single Camera | Jingpei Lu et.al. | 2409.10441 | null |
| 2024-09-16 | HiFi-CS: Towards Open Vocabulary Visual Grounding For Robotic Grasping Using Vision-Language Models | Vineet Bhat et.al. | 2409.10419 | link |
| 2024-09-16 | 2D or not 2D: How Does the Dimensionality of Gesture Representation Affect 3D Co-Speech Gesture Generation? | Téo Guichoux et.al. | 2409.10357 | null |
| 2024-09-16 | Human Insights Driven Latent Space for Different Driving Perspectives: A Unified Encoder for Efficient Multi-Task Inference | Huy-Dung Nguyen et.al. | 2409.10095 | null |
| 2024-09-15 | Precise Pick-and-Place using Score-Based Diffusion Networks | Shih-Wei Guo et.al. | 2409.09725 | null |
| 2024-09-15 | Pre-Training for 3D Hand Pose Estimation with Contrastive Learning on Large-Scale Hand Images in the Wild | Nie Lin et.al. | 2409.09714 | null |
| 2024-09-15 | Proximity operations of CubeSats via sensor fusion of ultra-wideband range measurements with rate gyroscopes, accelerometers and monocular vision | Deep Parikh et.al. | 2409.09665 | null |
| 2024-09-15 | A Scalable Tabletop Satellite Automation Testbed: Design And Experiments | Deep Parikh et.al. | 2409.09633 | null |
| 2024-09-14 | MAC-VO: Metrics-aware Covariance for Learning-based Stereo Visual Odometry | Yuheng Qiu et.al. | 2409.09479 | null |
| 2024-09-14 | Distributed Invariant Kalman Filter for Object-level Multi-robot Pose SLAM | Haoying Li et.al. | 2409.09410 | null |
| 2024-09-13 | Causal Transformer for Fusion and Pose Estimation in Deep Visual Inertial Odometry | Yunus Bilge Kurt et.al. | 2409.08769 | link |
| 2024-09-13 | WheelPoser: Sparse-IMU Based Body Pose Estimation for Wheelchair Users | Yunzhi Li et.al. | 2409.08494 | null |
| 2024-09-12 | Bayesian Inverse Graphics for Few-Shot Concept Learning | Octavio Arriaga et.al. | 2409.08351 | null |
| 2024-09-12 | Touch2Touch: Cross-Modal Tactile Generation for Object Manipulation | Samanta Rodriguez et.al. | 2409.08269 | null |
| 2024-09-12 | Covariance Intersection-based Invariant Kalman Filtering(DInCIKF) for Distributed Pose Estimation | Haoying Li et.al. | 2409.07933 | null |
| 2024-09-12 | GateAttentionPose: Enhancing Pose Estimation with Agent Attention and Improved Gated Convolutions | Liang Feng et.al. | 2409.07798 | null |
| 2024-09-12 | GatedUniPose: A Novel Approach for Pose Estimation Combining UniRepLKNet and Gated Convolution | Liang Feng et.al. | 2409.07752 | null |
| 2024-09-11 | FaVoR: Features via Voxel Rendering for Camera Relocalization | Vincenzo Polizzi et.al. | 2409.07571 | null |
| 2024-09-11 | Benchmarking 2D Egocentric Hand Pose Datasets | Olga Taran et.al. | 2409.07337 | null |
| 2024-09-11 | iKalibr-RGBD: Partially-Specialized Target-Free Visual-Inertial Spatiotemporal Calibration For RGBDs via Continuous-Time Velocity Estimation | Shuolong Chen et.al. | 2409.07116 | link |
| 2024-09-11 | Equivariant Filter for Tightly Coupled LiDAR-Inertial Odometry | Anbo Tao et.al. | 2409.06948 | null |
| 2024-09-10 | A Bayesian framework for active object recognition, pose estimation and shape transfer learning through touch | Haodong Zheng et.al. | 2409.06912 | null |
| 2024-09-11 | Alignist: CAD-Informed Orientation Distribution Estimation by Fusing Shape and Correspondences | Shishir Reddy Vutukur et.al. | 2409.06683 | null |
| 2024-09-10 | PoseEmbroider: Towards a 3D, Visual, Semantic-aware Human Pose Representation | Ginger Delmas et.al. | 2409.06535 | null |
| 2024-09-10 | Test-Time Certifiable Self-Supervision to Bridge the Sim2Real Gap in Event-Based Satellite Pose Estimation | Mohsi Jawaid et.al. | 2409.06240 | null |
| 2024-09-09 | From Words to Poses: Enhancing Novel Object Pose Estimation with Vision Language Models | Tessa Pulli et.al. | 2409.05413 | null |
| 2024-09-08 | HelmetPoser: A Helmet-Mounted IMU Dataset for Data-Driven Estimation of Human Head Motion in Diverse Conditions | Jianping Li et.al. | 2409.05006 | null |
| 2024-09-06 | Casper DPM: Cascaded Perceptual Dynamic Projection Mapping onto Hands | Yotam Erel et.al. | 2409.04397 | null |
| 2024-09-06 | GST: Precise 3D Human Body from a Single Image with Gaussian Splatting Transformers | Lorenza Prospero et.al. | 2409.04196 | null |
| 2024-09-06 | Dense Hand-Object(HO) GraspNet with Full Grasping Taxonomy and Dynamics | Woojin Cho et.al. | 2409.04033 | null |
| 2024-09-06 | Matched Filtering based LiDAR Place Recognition for Urban and Natural Environments | Therese Joseph et.al. | 2409.03998 | null |
| 2024-09-09 | The Influence of Faulty Labels in Data Sets on Human Pose Estimation | Arnold Schwarz et.al. | 2409.03887 | null |
| 2024-09-05 | MaskVal: Simple but Effective Uncertainty Quantification for 6D Pose Estimation | Philipp Quentin et.al. | 2409.03556 | null |
| 2024-09-05 | UAV (Unmanned Aerial Vehicles): Diverse Applications of UAV Datasets in Segmentation, Classification, Detection, and Tracking | Md. Mahfuzur Rahman et.al. | 2409.03245 | null |
| 2024-09-01 | Recoverable Anonymization for Pose Estimation: A Privacy-Enhancing Approach | Wenjun Huang et.al. | 2409.02715 | null |
| 2024-09-04 | Object Gaussian for Monocular 6D Pose Estimation from Sparse Views | Luqing Luo et.al. | 2409.02581 | null |
| 2024-09-03 | EgoPressure: A Dataset for Hand Pressure and Pose Estimation in Egocentric Vision | Yiming Zhao et.al. | 2409.02224 | null |
| 2024-09-03 | Deep learning for objective estimation of Parkinsonian tremor severity | Felipe Duque-Quiceno et.al. | 2409.02011 | null |
| 2024-09-03 | SPiKE: 3D Human Pose from Point Cloud Sequences | Irene Ballester et.al. | 2409.01879 | link |
| 2024-09-02 | Kalman Filtering for Precise Indoor Position and Orientation Estimation Using IMU and Acoustics on Riemannian Manifolds | Mohammed H. AlSharif et.al. | 2409.01002 | null |
| 2024-09-01 | Detection, Recognition and Pose Estimation of Tabletop Objects | Sanjuksha Nirgude et.al. | 2409.00869 | null |
| 2024-09-01 | DSLO: Deep Sequence LiDAR Odometry Based on Inconsistent Spatio-temporal Propagation | Huixin Zhang et.al. | 2409.00744 | link |
| 2024-09-01 | MoManifold: Learning to Measure 3D Human Motion via Decoupled Joint Acceleration Manifolds | Ziqiang Dang et.al. | 2409.00736 | null |
| 2024-08-31 | ActionPose: Pretraining 3D Human Pose Estimation with the Dark Knowledge of Action | Longyun Liao et.al. | 2409.00449 | null |
| 2024-09-02 | Augmented Reality without Borders: Achieving Precise Localization Without Maps | Albert Gassol Puigjaner et.al. | 2408.17373 | null |
| 2024-08-30 | BOP-D: Revisiting 6D Pose Estimation Benchmark for Better Evaluation under Visual Ambiguities | Boris Meden et.al. | 2408.17297 | null |
| 2024-08-30 | EMHI: A Multimodal Egocentric Human Motion Dataset with HMD and Body-Worn IMUs | Zhen Fan et.al. | 2408.17168 | null |
| 2024-09-01 | Generic Objects as Pose Probes for Few-Shot View Synthesis | Zhirui Gao et.al. | 2408.16690 | null |
| 2024-08-29 | OP-Align: Object-level and Part-level Alignment for Self-supervised Category-level Articulated Object Pose Estimation | Yuchen Che et.al. | 2408.16547 | link |
| 2024-08-29 | GRPose: Learning Graph Relations for Human Image Generation with Pose Priors | Xiangchen Yin et.al. | 2408.16540 | null |
| 2024-08-28 | Are Pose Estimators Ready for the Open World? STAGE: Synthetic Data Generation Toolkit for Auditing 3D Human Pose Estimators | Nikita Kister et.al. | 2408.16536 | null |
| 2024-08-28 | Multi-view Pose Fusion for Occlusion-Aware 3D Human Pose Estimation | Laura Bragagnolo et.al. | 2408.15810 | link |
| 2024-08-30 | Addressing the challenges of loop detection in agricultural environments | Nicolás Soncini et.al. | 2408.15761 | link |
| 2024-08-28 | Str-L Pose: Integrating Point and Structured Line for Relative Pose Estimation in Dual-Graph | Zherong Zhang et.al. | 2408.15750 | null |
| 2024-08-28 | Benchmarking ML Approaches to UWB-Based Range-Only Posture Recognition for Human Robot-Interaction | Salma Salimi et.al. | 2408.15717 | null |
| 2024-08-26 | Bengali Sign Language Recognition through Hand Pose Estimation using Multi-Branch Spatial-Temporal Attention Model | Abu Saleh Musa Miah et.al. | 2408.14111 | null |
| 2024-08-25 | InterTrack: Tracking Human Object Interaction without Object Templates | Xianghui Xie et.al. | 2408.13953 | null |
| 2024-08-24 | Temporally-consistent 3D Reconstruction of Birds | Johannes Hägerlind et.al. | 2408.13629 | null |
| 2024-08-24 | Explainable Convolutional Networks for Crater Detection and Lunar Landing Navigation | Jianing Song et.al. | 2408.13587 | null |
| 2024-08-27 | Sapiens: Foundation for Human Vision Models | Rawal Khirodkar et.al. | 2408.12569 | link |
| 2024-08-20 | GSLoc: Efficient Camera Pose Refinement via 3D Gaussian Splatting | Changkun Liu et.al. | 2408.11085 | null |
| 2024-08-20 | ZebraPose: Zebra Detection and Pose Estimation using only Synthetic Data | Elia Bonetto et.al. | 2408.10831 | null |
| 2024-08-20 | MPL: Lifting 3D Human Pose from Multi-view 2D Poses | Seyed Abolfazl Ghasemzadeh et.al. | 2408.10805 | link |
| 2024-08-19 | RUMI: Rummaging Using Mutual Information | Sheng Zhong et.al. | 2408.10450 | null |
| 2024-08-19 | SpaRP: Fast 3D Object Reconstruction and Pose Estimation from Sparse Views | Chao Xu et.al. | 2408.10195 | null |
| 2024-08-19 | SHARP: Segmentation of Hands and Arms by Range using Pseudo-Depth for Enhanced Egocentric 3D Hand Pose Estimation and Action Recognition | Wiktor Mucha et.al. | 2408.10037 | link |
| 2024-08-19 | Pose-GuideNet: Automatic Scanning Guidance for Fetal Head Ultrasound from Pose Estimation | Qianhui Men et.al. | 2408.09931 | null |
| 2024-08-18 | OPPH: A Vision-Based Operator for Measuring Body Movements for Personal Healthcare | Chen Long-fei et.al. | 2408.09409 | null |
| 2024-08-17 | An Open-Source American Sign Language Fingerspell Recognition and Semantic Pose Retrieval Interface | Kevin Jose Thomas et.al. | 2408.09311 | link |
| 2024-08-16 | ADen: Adaptive Density Representations for Sparse-view Camera Pose Estimation | Hao Tang et.al. | 2408.09042 | null |
| 2024-08-16 | Correspondence-Guided SfM-Free 3D Gaussian Splatting for NVS | Wei Sun et.al. | 2408.08723 | null |
| 2024-08-16 | SketchRef: A Benchmark Dataset and Evaluation Metrics for Automated Sketch Synthesis | Xingyue Lin et.al. | 2408.08623 | null |
| 2024-08-15 | HyperTaxel: Hyper-Resolution for Taxel-Based Tactile Signals Through Contrastive Learning | Hongyu Li et.al. | 2408.08312 | null |
| 2024-08-15 | Comparative Evaluation of 3D Reconstruction Methods for Object Pose Estimation | Varun Burde et.al. | 2408.08234 | link |
| 2024-08-15 | Towards Practical Human Motion Prediction with LiDAR Point Clouds | Xiao Han et.al. | 2408.08202 | null |
| 2024-08-15 | Your Turn: Real-World Turning Angle Estimation for Parkinson’s Disease Severity Assessment | Qiushuo Cheng et.al. | 2408.08182 | null |
| 2024-08-15 | Polaris: Open-ended Interactive Robotic Manipulation via Syn2Real Visual Grounding and Large Language Models | Tianyu Wang et.al. | 2408.07975 | null |
| 2024-08-15 | GOReloc: Graph-based Object-Level Relocalization for Visual SLAM | Yutong Wang et.al. | 2408.07917 | link |
| 2024-08-13 | A Miniature Vision-Based Localization System for Indoor Blimps | Shicong Ma et.al. | 2408.06648 | null |
| 2024-08-12 | UniT: Unified Tactile Representation for Robot Learning | Zhengtong Xu et.al. | 2408.06481 | link |
| 2024-08-12 | Moo-ving Beyond Tradition: Revolutionizing Cattle Behavioural Phenotyping with Pose Estimation Techniques | Navid Ghassemi et.al. | 2408.06336 | null |
| 2024-08-12 | CAD-Mesher: A Convenient, Accurate, Dense Mesh-based Mapping Module in SLAM for Dynamic Environments | Yanpeng Jia et.al. | 2408.05981 | null |
| 2024-08-12 | PAFormer: Part Aware Transformer for Person Re-identification | Hyeono Jung et.al. | 2408.05918 | null |
| 2024-08-11 | SABER-6D: Shape Representation Based Implicit Object Pose Estimation | Shishir Reddy Vutukur et.al. | 2408.05867 | null |
| 2024-08-11 | Real-Time Drowsiness Detection Using Eye Aspect Ratio and Facial Landmark Detection | Varun Shiva Krishna Rupani et.al. | 2408.05836 | null |
| 2024-08-10 | Visual SLAM with 3D Gaussian Primitives and Depth Priors Enabling Novel View Synthesis | Zhongche Qu et.al. | 2408.05635 | null |
| 2024-08-10 | Anticipation through Head Pose Estimation: a preliminary study | Federico Figari Tomenotti et.al. | 2408.05516 | null |
| 2024-08-09 | Mesh-based Object Tracking for Dynamic Semantic 3D Scene Graphs via Ray Tracing | Lennart Niecksch et.al. | 2408.04979 | null |
| 2024-08-07 | PoseMamba: Monocular 3D Human Pose Estimation with Bidirectional Global-Local Spatio-Temporal State Space Model | Yunlong Huang et.al. | 2408.03540 | null |
| 2024-08-06 | Line-based 6-DoF Object Pose Estimation and Tracking With an Event Camera | Zibin Liu et.al. | 2408.03225 | link |
| 2024-08-06 | Training on the Fly: On-device Self-supervised Learning aboard Nano-drones within 20 mW | Elia Cereda et.al. | 2408.03168 | null |
| 2024-08-06 | BodySLAM: A Generalized Monocular Visual SLAM Framework for Surgical Applications | G. Manni et.al. | 2408.03078 | link |
| 2024-08-07 | Pose Magic: Efficient and Temporally Consistent Human Pose Estimation with a Hybrid Mamba-GCN Network | Xinyi Zhang et.al. | 2408.02922 | null |
| 2024-08-05 | Analyzing Data Efficiency and Performance of Machine Learning Algorithms for Assessing Low Back Pain Physical Rehabilitation Exercises | Aleksa Marusic et.al. | 2408.02855 | null |
| 2024-08-05 | Joint-Motion Mutual Learning for Pose Estimation in Videos | Sifan Wu et.al. | 2408.02285 | null |
| 2024-08-04 | AvatarPose: Avatar-guided 3D Pose Estimation of Close Human Interaction from Sparse Multi-view Videos | Feichi Lu et.al. | 2408.02110 | null |
| 2024-08-04 | Generalized Maximum Likelihood Estimation for Perspective-n-Point Problem | Tian Zhan et.al. | 2408.01945 | null |
| 2024-08-03 | MotionTrace: IMU-based Field of View Prediction for Smartphone AR Interactions | Rahul Islam et.al. | 2408.01850 | null |
| 2024-08-03 | BEVPlace++: Fast, Robust, and Lightweight LiDAR Global Localization for Unmanned Ground Vehicles | Lun Luo et.al. | 2408.01841 | link |
| 2024-08-03 | E$^3$NeRF: Efficient Event-Enhanced Neural Radiance Fields from Blurry Images | Yunshan Qi et.al. | 2408.01840 | null |
| 2024-08-03 | Survey on Emotion Recognition through Posture Detection and the possibility of its application in Virtual Reality | Leina Elansary et.al. | 2408.01728 | null |
| 2024-08-03 | Stimulating Imagination: Towards General-purpose Object Rearrangement | Jianyang Wu et.al. | 2408.01655 | null |
| 2024-08-02 | Full-range Head Pose Geometric Data Augmentations | Huei-Chung Hu et.al. | 2408.01566 | null |
| 2024-07-31 | Adapting Skills to Novel Grasps: A Self-Supervised Approach | Georgios Papagiannis et.al. | 2408.00178 | null |
| 2024-07-31 | Certifying Robustness of Learning-Based Keypoint Detection and Pose Estimation Methods | Xusheng Luo et.al. | 2408.00117 | null |
| 2024-07-30 | HandDAGT: A Denoising Adaptive Graph Transformer for 3D Hand Pose Estimation | Wencan Cheng et.al. | 2407.20542 | link |
| 2024-07-30 | Markers Identification for Relative Pose Estimation of an Uncooperative Target | Batu Candan et.al. | 2407.20515 | null |
| 2024-07-29 | BaseBoostDepth: Exploiting Larger Baselines For Self-supervised Monocular Depth Estimation | Kieran Saunders et.al. | 2407.20437 | null |
| 2024-07-28 | Skeleton-based Group Activity Recognition via Spatial-Temporal Panoramic Graph | Zhengcen Li et.al. | 2407.19497 | link |
| 2024-07-26 | Flexible graph convolutional network for 3D human pose estimation | Abu Taib Mohammed Shahjahan et.al. | 2407.19077 | null |
| 2024-07-26 | From 2D to 3D: AISG-SLA Visual Localization Challenge | Jialin Gao et.al. | 2407.18590 | null |
| 2024-07-28 | HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation | Zhenzhi Wang et.al. | 2407.17438 | link |
| 2024-07-24 | Active Loop Closure for OSM-guided Robotic Mapping in Large-Scale Urban Environments | Wei Gao et.al. | 2407.17078 | null |
| 2024-07-30 | DreamCar: Leveraging Car-specific Prior for in-the-wild 3D Car Reconstruction | Xiaobiao Du et.al. | 2407.16988 | link |
| 2024-07-24 | Pose Estimation from Camera Images for Underwater Inspection | Luyuan Peng et.al. | 2407.16961 | null |
| 2024-07-23 | COALA: A Practical and Vision-Centric Federated Learning Platform | Weiming Zhuang et.al. | 2407.16560 | link |
| 2024-07-23 | Probabilistic Parameter Estimators and Calibration Metrics for Pose Estimation from Image Features | Romeo Valentin et.al. | 2407.16223 | null |
| 2024-07-23 | Optimal camera-robot pose estimation in linear time from points and lines | Guangyang Zeng et.al. | 2407.16151 | null |
| 2024-07-23 | 3D-UGCN: A Unified Graph Convolutional Network for Robust 3D Human Pose Estimation from Monocular RGB Images | Jie Zhao et.al. | 2407.16137 | null |
| 2024-07-21 | CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models | Zheng Chong et.al. | 2407.15886 | link |
| 2024-07-22 | RADA: Robust and Accurate Feature Learning with Domain Adaptation | Jingtai He et.al. | 2407.15791 | null |
| 2024-07-22 | Local Occupancy-Enhanced Object Grasping with Multiple Triplanar Projection | Kangqi Ma et.al. | 2407.15771 | null |
| 2024-07-22 | 6DGS: 6D Pose Estimation from a Single Image and a 3D Gaussian Splatting Model | Matteo Bortolon et.al. | 2407.15484 | null |
| 2024-07-23 | Domain-Adaptive 2D Human Pose Estimation via Dual Teachers in Extremely Low-Light Conditions | Yihao Ai et.al. | 2407.15451 | null |
| 2024-07-22 | avaTTAR: Table Tennis Stroke Training with On-body and Detached Visualization in Augmented Reality | Dizhi Ma et.al. | 2407.15373 | null |
| 2024-07-20 | From Underground Mines to Offices: A Versatile and Robust Framework for Range-Inertial SLAM | Lorenzo Montano-Oliván et.al. | 2407.14797 | null |
| 2024-07-19 | ESCAPE: Energy-based Selective Adaptive Correction for Out-of-distribution 3D Human Pose Estimation | Luke Bidulka et.al. | 2407.14605 | null |
| 2024-07-19 | 6DoF Head Pose Estimation through Explicit Bidirectional Interaction with Face Geometry | Sungho Chun et.al. | 2407.14136 | link |
| 2024-07-18 | RT-Pose: A 4D Radar Tensor-based 3D Human Pose Estimation and Localization Benchmark | Yuan-Hao Ho et.al. | 2407.13930 | null |
| 2024-07-19 | GlobalPointer: Large-Scale Plane Adjustment with Bi-Convex Relaxation | Bangyan Liao et.al. | 2407.13537 | null |
| 2024-07-18 | SCAPE: A Simple and Strong Category-Agnostic Pose Estimator | Yujia Liang et.al. | 2407.13483 | link |
| 2024-07-17 | SG-NeRF: Neural Surface Reconstruction with Scene Graph Optimization | Yiyang Chen et.al. | 2407.12667 | link |
| 2024-07-17 | Invertible Neural Warp for NeRF | Shin-Fang Chng et.al. | 2407.12354 | null |
| 2024-07-16 | NeuSurfEmb: A Complete Pipeline for Dense Correspondence-based 6D Object Pose Estimation without CAD Models | Francesco Milano et.al. | 2407.12207 | link |
| 2024-07-16 | Monocular pose estimation of articulated surgical instruments in open surgery | Robert Spektor et.al. | 2407.12138 | null |
| 2024-07-17 | GV-Bench: Benchmarking Local Feature Matching for Geometric Verification of Long-term Loop Closure Detection | Jingwen Yu et.al. | 2407.11736 | link |
| 2024-07-16 | TCFormer: Visual Recognition via Token Clustering Transformer | Wang Zeng et.al. | 2407.11321 | link |
| 2024-07-15 | A BlueROV2-based platform for underwater mapping experiments | Tudor Alinei-Poiana et.al. | 2407.10901 | null |
| 2024-07-15 | LVCP: LiDAR-Vision Tightly Coupled Collaborative Real-time Relative Positioning | Zhuozhu Jian et.al. | 2407.10782 | null |
| 2024-07-15 | Domain Generalization for 6D Pose Estimation Through NeRF-based Image Synthesis | Antoine Legrand et.al. | 2407.10762 | null |
| 2024-07-16 | GTPT: Group-based Token Pruning Transformer for Efficient Human Pose Estimation | Haonan Wang et.al. | 2407.10756 | null |
| 2024-07-15 | Learning to Estimate the Pose of a Peer Robot in a Camera Image by Predicting the States of its LEDs | Nicholas Carlotti et.al. | 2407.10661 | null |
| 2024-07-15 | Deep-Learning-Based Markerless Pose Estimation Systems in Gait Analysis: DeepLabCut Custom Training and the Refinement Function | Giulia Panconi et.al. | 2407.10590 | null |
| 2024-07-14 | 3D Foundation Models Enable Simultaneous Geometry and Pose Estimation of Grasped Objects | Weiming Zhi et.al. | 2407.10331 | null |
| 2024-07-16 | psifx – Psychological and Social Interactions Feature Extraction Package | Guillaume Rochette et.al. | 2407.10266 | null |
| 2024-07-14 | Efficient Facial Landmark Detection for Embedded Systems | Ji-Jia Wu et.al. | 2407.10228 | null |
| 2024-07-14 | PAFUSE: Part-based Diffusion for 3D Whole-Body Pose Estimation | Nermin Samet et.al. | 2407.10220 | null |
| 2024-07-12 | iNeMo: Incremental Neural Mesh Models for Robust Class-Incremental Learning | Tom Fischer et.al. | 2407.09271 | link |
| 2024-07-12 | HUP-3D: A 3D multi-view synthetic dataset for assisted-egocentric hand-ultrasound pose estimation | Manuel Birlo et.al. | 2407.09215 | null |
| 2024-07-12 | KGpose: Keypoint-Graph Driven End-to-End Multi-Object 6D Pose Estimation via Point-Wise Pose Voting | Andrew Jeong et.al. | 2407.08909 | null |
| 2024-07-11 | RTMW: Real-Time Multi-Person 2D and 3D Whole-body Pose Estimation | Tao Jiang et.al. | 2407.08634 | link |
| 2024-07-11 | SRPose: Two-view Relative Pose Estimation with Sparse Keypoints | Rui Yin et.al. | 2407.08199 | link |
| 2024-07-11 | SGLC: Semantic Graph-Guided Coarse-Fine-Refine Full Loop Closing for LiDAR SLAM | Neng Wang et.al. | 2407.08106 | link |
| 2024-07-10 | RoCap: A Robotic Data Collection Pipeline for the Pose Estimation of Appearance-Changing Objects | Jiahao Nick Li et.al. | 2407.08081 | null |
| 2024-07-10 | Hybrid Structure-from-Motion and Camera Relocalization for Enhanced Egocentric Localization | Jinjie Mai et.al. | 2407.08023 | link |
| 2024-07-10 | Greit-HRNet: Grouped Lightweight High-Resolution Network for Human Pose Estimation | Junjia Han et.al. | 2407.07389 | null |
| 2024-07-09 | Category-level Object Detection, Pose Estimation and Reconstruction from Stereo Images | Chuanrui Zhang et.al. | 2407.06984 | null |
| 2024-07-09 | Computer vision tasks for intelligent aerospace missions: An overview | Huilin Chen et.al. | 2407.06513 | null |
| 2024-07-08 | GeoNLF: Geometry guided Pose-Free Neural LiDAR Fields | Weiyi Xue et.al. | 2407.05597 | null |
| 2024-07-10 | On the power of data augmentation for head pose estimation | Michael Welter et.al. | 2407.05357 | null |
| 2024-07-07 | SCIPaD: Incorporating Spatial Clues into Unsupervised Pose-Depth Joint Learning | Yi Feng et.al. | 2407.05283 | link |
| 2024-07-05 | Unsupervised Learning of Category-Level 3D Pose from Object-Centric Videos | Leonhard Sommer et.al. | 2407.04384 | link |
| 2024-07-04 | Towards Cross-View-Consistent Self-Supervised Surround Depth Estimation | Laiyan Ding et.al. | 2407.04041 | link |
| 2024-07-04 | Markerless Multi-view 3D Human Pose Estimation: a survey | Ana Filipa Rodrigues Nogueira et.al. | 2407.03817 | null |
| 2024-07-04 | A Fast Dynamic Point Detection Method for LiDAR-Inertial Odometry in Driving Scenarios | Zikang Yuan et.al. | 2407.03590 | null |
| 2024-07-03 | Graph and Skipped Transformer: Exploiting Spatial and Temporal Modeling Capacities for Efficient 3D Human Pose Estimation | Mengmeng Cui et.al. | 2407.02990 | null |
| 2024-07-03 | Free-SurGS: SfM-Free 3D Gaussian Splatting for Surgical Scene Reconstruction | Jiaxin Guo et.al. | 2407.02918 | link |
| 2024-07-02 | SUPER: Seated Upper Body Pose Estimation using mmWave Radars | Bo Zhang et.al. | 2407.02455 | null |
| 2024-07-02 | ReliaAvatar: A Robust Real-Time Avatar Animator with Integrated Motion Prediction | Bo Qian et.al. | 2407.02129 | null |
| 2024-07-02 | Joint-Dataset Learning and Cross-Consistent Regularization for Text-to-Motion Retrieval | Nicola Messina et.al. | 2407.02104 | null |
| 2024-07-01 | Active Human Pose Estimation via an Autonomous UAV Agent | Jingxi Chen et.al. | 2407.01811 | null |
| 2024-07-01 | RoDyn-SLAM: Robust Dynamic Dense RGB-D SLAM with Neural Radiance Fields | Haochen Jiang et.al. | 2407.01303 | null |
| 2024-07-01 | Collaborative Graph Exploration with Reduced Pose-SLAM Uncertainty via Submodular Optimization | Ruofei Bai et.al. | 2407.01013 | null |
| 2024-06-30 | Ego-to-Exo: Interfacing Third Person Visuals from Egocentric Views in Real-time for Improved ROV Teleoperation | Adnan Abdullah et.al. | 2407.00848 | null |
| 2024-06-29 | When Robots Get Chatty: Grounding Multimodal Human-Robot Conversation and Collaboration | Philipp Allgeuer et.al. | 2407.00518 | null |
| 2024-06-28 | Assistive Image Annotation Systems with Deep Learning and Natural Language Capabilities: A Review | Moseli Mots’oehli et.al. | 2407.00252 | null |
| 2024-06-28 | EPOCH: Jointly Estimating the 3D Pose of Cameras and Humans | Nicola Garau et.al. | 2406.19726 | null |
| 2024-06-28 | CLOi-Mapper: Consistent, Lightweight, Robust, and Incremental Mapper With Embedded Systems for Commercial Robot Services | DongKi Noh et.al. | 2406.19634 | null |
| 2024-06-27 | Multimodal Visual-haptic pose estimation in the presence of transient occlusion | Michael Zechmair et.al. | 2406.19323 | null |
| 2024-06-27 | Human Modelling and Pose Estimation Overview | Pawel Knap et.al. | 2406.19290 | null |
| 2024-06-26 | Towards Human-Level 3D Relative Pose Estimation: Generalizable, Training-Free, with Single Reference | Yuan Gao et.al. | 2406.18453 | link |
| 2024-06-27 | Automatic infant 2D pose estimation from videos: comparing seven deep neural network methods | Filipe Gama et.al. | 2406.17382 | null |
| 2024-06-24 | High-resolution open-vocabulary object 6D pose estimation | Jaime Corsetti et.al. | 2406.16384 | null |
| 2024-06-23 | Breaking the Frame: Image Retrieval by Visual Overlap Prediction | Tong Wei et.al. | 2406.16204 | link |
| 2024-06-21 | Efficient Human Pose Estimation: Leveraging Advanced Techniques with MediaPipe | Sandeep Singh Sengar et.al. | 2406.15649 | link |
| 2024-06-24 | Investigating the impact of 2D gesture representation on co-speech gesture generation | Teo Guichoux et.al. | 2406.15111 | null |
| 2024-06-20 | Benchmarking Monocular 3D Dog Pose Estimation Using In-The-Wild Motion Capture Data | Moira Shooter et.al. | 2406.14412 | null |
| 2024-06-20 | PoseBench: Benchmarking the Robustness of Pose Estimation Models under Corruptions | Sihan Ma et.al. | 2406.14367 | null |
| 2024-06-19 | NeRF-Feat: 6D Object Pose Estimation using Feature Rendering | Shishir Reddy Vutukur et.al. | 2406.13796 | null |
| 2024-06-19 | CNN Based Flank Predictor for Quadruped Animal Species | Vanessa Suessle et.al. | 2406.13588 | null |
| 2024-06-19 | MVSBoost: An Efficient Point Cloud-based 3D Reconstruction | Umair Haroon et.al. | 2406.13515 | null |
| 2024-06-19 | An Efficient yet High-Performance Method for Precise Radar-Based Imaging of Human Hand Poses | Johanna Bräunig et.al. | 2406.13464 | null |
| 2024-06-18 | Head Pose Estimation and 3D Neural Surface Reconstruction via Monocular Camera in situ for Navigation and Safe Insertion into Natural Openings | Ruijie Tang et.al. | 2406.13048 | null |
| 2024-06-17 | Matching Query Image Against Selected NeRF Feature for Efficient and Scalable Localization | Huaiji Zhou et.al. | 2406.11766 | null |
| 2024-06-17 | Domain Generalization for In-Orbit 6D Pose Estimation | Antoine Legrand et.al. | 2406.11743 | null |
| 2024-06-17 | SeamPose: Repurposing Seams as Capacitive Sensors in a Shirt for Upper-Body Pose Tracking | Tianhong Catherine Yu et.al. | 2406.11645 | null |
| 2024-06-14 | Galibr: Targetless LiDAR-Camera Extrinsic Calibration Method via Ground Plane Initialization | Wonho Song et.al. | 2406.11599 | null |
| 2024-06-15 | MMVR: Millimeter-wave Multi-View Radar Dataset and Benchmark for Indoor Perception | M. Mahbubur Rahman et.al. | 2406.10708 | null |
| 2024-06-15 | Improving Ab-Initio Cryo-EM Reconstruction with Semi-Amortized Pose Inference | Shayan Shekarforoush et.al. | 2406.10455 | null |
| 2024-06-14 | The BabyView dataset: High-resolution egocentric videos of infants’ and young children’s everyday experiences | Bria Long et.al. | 2406.10447 | null |
| 2024-06-14 | OpenCapBench: A Benchmark to Bridge Pose Estimation and Biomechanics | Yoni Gozlan et.al. | 2406.09788 | null |
| 2024-06-13 | ImageNet3D: Towards General-Purpose Object-Level 3D Understanding | Wufei Ma et.al. | 2406.09613 | link |
| 2024-06-13 | Deep Transformer Network for Monocular Pose Estimation of Ship-Based UAV | Maneesha Wickramasuriya et.al. | 2406.09260 | link |
| 2024-06-14 | Language-Driven Closed-Loop Grasping with Model-Predictive Trajectory Replanning | Huy Hoang Nguyen et.al. | 2406.09039 | null |
| 2024-06-14 | VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks | Jiannan Wu et.al. | 2406.08394 | link |
| 2024-06-12 | Asymptotic Unbiased Sample Sampling to Speed Up Sharpness-Aware Minimization | Jiaxin Deng et.al. | 2406.08001 | null |
| 2024-06-12 | IFTD: Image Feature Triangle Descriptor for Loop Detection in Driving Scenes | Fengtian Lang et.al. | 2406.07937 | link |
| 2024-06-12 | From Variance to Veracity: Unbundling and Mitigating Gradient Variance in Differentiable Bundle Adjustment Layers | Swaminathan Gurumurthy et.al. | 2406.07785 | link |
| 2024-06-12 | SPIN: Spacecraft Imagery for Navigation | Javier Montalvo et.al. | 2406.07500 | link |
| 2024-06-11 | Realistic Data Generation for 6D Pose Estimation of Surgical Instruments | Juan Antonio Barragan et.al. | 2406.07328 | link |
| 2024-06-11 | SignMusketeers: An Efficient Multi-Stream Approach for Sign Language Translation at Scale | Shester Gueuwou et.al. | 2406.06907 | null |
| 2024-06-10 | Multicam-SLAM: Non-overlapping Multi-camera SLAM for Indirect Visual Localization and Navigation | Shenghao Li et.al. | 2406.06374 | link |
| 2024-06-08 | A preprocessing-based planning framework for utilizing contacts in high-precision insertion tasks | Muhammad Suhail Saleem et.al. | 2406.05522 | null |
| 2024-06-06 | GLACE: Global Local Accelerated Coordinate Encoding | Fangjinhua Wang et.al. | 2406.04340 | link |
| 2024-06-06 | Omni6DPose: A Benchmark and Model for Universal 6D Object Pose Estimation and Tracking | Jiyao Zhang et.al. | 2406.04316 | null |
| 2024-06-05 | Hi5: 2D Hand Pose Estimation with Zero Human Annotation | Masum Hasan et.al. | 2406.03599 | null |
| 2024-06-05 | Sparse Color-Code Net: Real-Time RGB-Based 6D Object Pose Estimation on Edge Devices | Xingjian Yang et.al. | 2406.02977 | null |
| 2024-06-04 | CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation | Dejia Xu et.al. | 2406.02509 | null |
| 2024-06-04 | HPE-CogVLM: New Head Pose Grounding Task Exploration on Vision Language Model | Yu Tian et.al. | 2406.01914 | null |
| 2024-06-03 | A Robust Filter for Marker-less Multi-person Tracking in Human-Robot Interaction Scenarios | Enrico Martini et.al. | 2406.01832 | link |
| 2024-06-01 | Equivariant amortized inference of poses for cryo-EM | Larissa de Ruijter et.al. | 2406.01630 | null |
| 2024-06-03 | 3D WholeBody Pose Estimation based on Semantic Graph Attention Network and Distance Information | Sihan Wen et.al. | 2406.01196 | null |
| 2024-06-01 | CapeX: Category-Agnostic Pose Estimation from Textual Point Explanation | Matan Rusanovsky et.al. | 2406.00384 | link |
| 2024-05-30 | Infinite 3D Landmarks: Improving Continuous 2D Facial Landmark Detection | Prashanth Chandran et.al. | 2405.20117 | null |
| 2024-05-30 | Estimating Human Poses Across Datasets: A Unified Skeleton and Multi-Teacher Distillation Approach | Muhammad Saif Ullah Khan et.al. | 2405.20084 | null |
| 2024-05-30 | TAMBRIDGE: Bridging Frame-Centered Tracking and 3D Gaussian Splatting for Enhanced SLAM | Peifeng Jiang et.al. | 2405.19614 | null |
| 2024-05-29 | Real-Time Dynamic Robot-Assisted Hand-Object Interaction via Motion Primitives | Mingqi Yuan et.al. | 2405.19531 | null |
| 2024-05-29 | Exploring AI-based Anonymization of Industrial Image and Video Data in the Context of Feature Preservation | Sabrina Cynthia Triess et.al. | 2405.19173 | null |
| 2024-05-28 | World Models for General Surgical Grasping | Hongbin Lin et.al. | 2405.17940 | null |
| 2024-05-27 | MoSca: Dynamic Gaussian Fusion from Casual Videos via 4D Motion Scaffolds | Jiahui Lei et.al. | 2405.17421 | null |
| 2024-05-27 | Occlusion Handling in 3D Human Pose Estimation with Perturbed Positional Encoding | Niloofar Azizi et.al. | 2405.17397 | null |
| 2024-05-27 | $\text{Di}^2\text{Pose}$: Discrete Diffusion Model for Occluded 3D Human Pose Estimation | Weiquan Wang et.al. | 2405.17016 | null |
| 2024-05-27 | Clustering-based Learning for UAV Tracking and Pose Estimation | Jiaping Xiao et.al. | 2405.16867 | null |
| 2024-05-26 | Multi-Modal UAV Detection, Classification and Tracking Algorithm – Technical Report for CVPR 2024 UG2 Challenge | Tianchen Deng et.al. | 2405.16464 | link |
| 2024-05-25 | Intensity and Texture Correction of Omnidirectional Image Using Camera Images for Indirect Augmented Reality | Hakim Ikebayashi et.al. | 2405.16008 | null |
| 2024-05-23 | CoPeD-Advancing Multi-Robot Collaborative Perception: A Comprehensive Dataset in Real-World Environments | Yang Zhou et.al. | 2405.14731 | link |
| 2024-05-23 | Segformer++: Efficient Token-Merging Strategies for High-Resolution Semantic Segmentation | Daniel Kienzle et.al. | 2405.14467 | null |
| 2024-05-21 | Geometric Transformation Uncertainty for Improving 3D Fetal Brain Pose Prediction from Freehand 2D Ultrasound Videos | Jayroop Ramesh et.al. | 2405.13235 | null |
| 2024-05-21 | Leveraging Neural Radiance Fields for Pose Estimation of an Unknown Space Object during Proximity Operations | Antoine Legrand et.al. | 2405.12728 | null |
| 2024-05-21 | PoseGravity: Pose Estimation from Points and Lines with Axis Prior | Akshay Chandrasekhar et.al. | 2405.12646 | link |
| 2024-05-19 | Focus on Low-Resolution Information: Multi-Granular Information-Lossless Model for Low-Resolution Human Pose Estimation | Zejun Gu et.al. | 2405.12247 | null |
| 2024-05-20 | AutoSoccerPose: Automated 3D posture Analysis of Soccer Shot Movements | Calvin Yeung et.al. | 2405.12070 | link |
| 2024-05-19 | Advancing 6-DoF Instrument Pose Estimation in Variable X-Ray Imaging Geometries | Christiaan G. A. Viviers et.al. | 2405.11677 | link |
| 2024-05-19 | Cross-Domain Knowledge Distillation for Low-Resolution Human Pose Estimation | Zejun Gu et.al. | 2405.11448 | null |
| 2024-05-18 | PS6D: Point Cloud Based Symmetry-Aware 6D Object Pose Estimation in Robot Bin-Picking | Yifan Yang et.al. | 2405.11257 | null |
| 2024-05-18 | MotionGS: Compact Gaussian Splatting SLAM by Motion Filter | Xinli Guo et.al. | 2405.11129 | link |
| 2024-05-17 | Resolving Symmetry Ambiguity in Correspondence-based Methods for Instance-level Object Pose Estimation | Yongliang Lin et.al. | 2405.10557 | null |
| 2024-05-16 | Diversity-Aware Sign Language Production through a Pose Encoding Variational Autoencoder | Mohamed Ilyes Lakhal et.al. | 2405.10423 | null |
| 2024-05-17 | Toon3D: Seeing Cartoons from a New Perspective | Ethan Weber et.al. | 2405.10320 | null |
| 2024-05-15 | Task-adaptive Q-Face | Haomiao Sun et.al. | 2405.09059 | null |
| 2024-05-14 | RDPN6D: Residual-based Dense Point-wise Network for 6Dof Object Pose Estimation Based on RGB-D Images | Zong-Wei Hong et.al. | 2405.08483 | link |
| 2024-05-14 | TP3M: Transformer-based Pseudo 3D Image Matching with Reference | Liming Han et.al. | 2405.08434 | null |
| 2024-05-13 | Deep Learning-Based Object Pose Estimation: A Comprehensive Survey | Jian Liu et.al. | 2405.07801 | link |
| 2024-05-13 | JointLoc: A Real-time Visual Localization Framework for Planetary UAVs Based on Joint Relative and Absolute Pose Estimation | Xubo Luo et.al. | 2405.07429 | link |
| 2024-05-11 | TD-NeRF: Novel Truncated Depth Prior for Joint Camera Pose and Neural Radiance Field Optimization | Zhen Tan et.al. | 2405.07027 | null |
| 2024-05-11 | AHPPEBot: Autonomous Robot for Tomato Harvesting based on Phenotyping and Pose Estimation | Xingxu Li et.al. | 2405.06959 | null |
| 2024-05-10 | CasCalib: Cascaded Calibration for Motion Capture from Sparse Unsynchronized Cameras | James Tang et.al. | 2405.06845 | link |
| 2024-05-10 | MGS-SLAM: Monocular Sparse Tracking and Gaussian Mapping with Depth Smooth Regularization | Pengcheng Zhu et.al. | 2405.06241 | null |
| 2024-05-10 | Free-Moving Object Reconstruction and Pose Estimation with Virtual Camera | Haixin Shi et.al. | 2405.05858 | null |
| 2024-05-09 | Semi-Autonomous Laparoscopic Robot Docking with Learned Hand-Eye Information Fusion | Huanyu Tian et.al. | 2405.05817 | null |
| 2024-05-09 | NeuRSS: Enhancing AUV Localization and Bathymetric Mapping with Neural Rendering for Sidescan SLAM | Yiping Xie et.al. | 2405.05807 | null |
| 2024-05-09 | Benchmarking Neural Radiance Fields for Autonomous Robots: An Overview | Yuhang Ming et.al. | 2405.05526 | null |
| 2024-05-08 | Adversary-Guided Motion Retargeting for Skeleton Anonymization | Thomas Carr et.al. | 2405.05428 | null |
| 2024-05-08 | FinePOSE: Fine-Grained Prompt-Driven 3D Human Pose Estimation via Diffusion Models | Jinglin Xu et.al. | 2405.05216 | link |
| 2024-05-08 | ProbRadarM3F: mmWave Radar based Human Skeletal Pose Estimation with Probability Map Guided Multi-Format Feature Fusion | Bing Zhu et.al. | 2405.05164 | null |
| 2024-05-08 | GISR: Geometric Initialization and Silhouette-based Refinement for Single-View Robot Pose and Configuration Estimation | Ivan Bilić et.al. | 2405.04890 | null |
| 2024-05-07 | Learning Distributional Demonstration Spaces for Task-Specific Cross-Pose Estimation | Jenny Wang et.al. | 2405.04609 | null |
| 2024-05-07 | Speak the Same Language: Global LiDAR Registration on BIM Using Pose Hough Transform | Zhijian Qiao et.al. | 2405.03969 | null |
| 2024-05-07 | Joint Estimation of Identity Verification and Relative Pose for Partial Fingerprints | Xiongjun Guan et.al. | 2405.03959 | null |
| 2024-05-06 | Pose Priors from Language Models | Sanjay Subramanian et.al. | 2405.03689 | null |
| 2024-05-06 | Optimizing Hand Region Detection in MediaPipe Holistic Full-Body Pose Estimation to Improve Accuracy and Avoid Downstream Errors | Amit Moryossef et.al. | 2405.03545 | link |
| 2024-05-05 | Multi-hop graph transformer network for 3D human pose estimation | Zaedul Islam et.al. | 2405.03055 | null |
| 2024-05-05 | Blending Distributed NeRFs with Tri-stage Robust Pose Optimization | Baijun Ye et.al. | 2405.02880 | null |
| 2024-05-03 | WeightedPose: Generalizable Cross-Pose Estimation via Weighted SVD | Xuxin Cheng et.al. | 2405.02241 | null |
| 2024-05-03 | Probablistic Restoration with Adaptive Noise Sampling for 3D Human Pose Estimation | Xianzhou Zeng et.al. | 2405.02114 | link |
| 2024-05-03 | An Onboard Framework for Staircases Modeling Based on Point Clouds | Chun Qing et.al. | 2405.01918 | null |
| 2024-05-06 | ShadowNav: Autonomous Global Localization for Lunar Navigation in Darkness | Deegan Atha et.al. | 2405.01673 | null |
| 2024-05-02 | IntervenGen: Interventional Data Generation for Robust and Data-Efficient Robot Imitation Learning | Ryan Hoque et.al. | 2405.01472 | null |
| 2024-05-02 | Behavior Imitation for Manipulator Control and Grasping with Deep Reinforcement Learning | Liu Qiyuan et.al. | 2405.01284 | null |
| 2024-05-02 | Sports Analysis and VR Viewing System Based on Player Tracking and Pose Estimation with Multimodal and Multiview Sensors | Wenxuan Guo et.al. | 2405.01112 | null |
| 2024-05-02 | CoViS-Net: A Cooperative Visual Spatial Foundation Model for Multi-Robot Applications | Jan Blumenkamp et.al. | 2405.01107 | null |
| 2024-05-04 | HandSSCA: 3D Hand Mesh Reconstruction with State Space Channel Attention from RGB images | Zixun Jiao et.al. | 2405.01066 | null |
| 2024-05-01 | Radar-Based Localization For Autonomous Ground Vehicles In Suburban Neighborhoods | Andrew J. Kramer et.al. | 2405.00600 | null |
| 2024-04-30 | Ultra Inertial Poser: Scalable Motion Capture and Tracking from Sparse Inertial Sensors and Ultra-Wideband Ranging | Rayan Armani et.al. | 2404.19541 | link |
| 2024-04-30 | UniFS: Universal Few-shot Instance Perception with Point Representations | Sheng Jin et.al. | 2404.19401 | null |
| 2024-04-30 | Quater-GCN: Enhancing 3D Human Pose Estimation with Orientation and Semi-supervised Training | Xingyu Song et.al. | 2404.19279 | null |
| 2024-04-30 | XFeat: Accelerated Features for Lightweight Image Matching | Guilherme Potje et.al. | 2404.19174 | null |
| 2024-04-29 | Self-Avatar Animation in Virtual Reality: Impact of Motion Signals Artifacts on the Full-Body Pose Reconstruction | Antoine Maiorca et.al. | 2404.18628 | null |
| 2024-04-29 | Mesh-based Photorealistic and Real-time 3D Mapping for Robust Visual Perception of Autonomous Underwater Vehicle | Jungwoo Lee et.al. | 2404.18395 | null |
| 2024-04-29 | Reconstructing Satellites in 3D from Amateur Telescope Images | Zhiming Chang et.al. | 2404.18394 | null |
| 2024-04-27 | Hybrid 3D Human Pose Estimation with Monocular Video and Sparse IMUs | Yiming Bao et.al. | 2404.17837 | null |
| 2024-04-26 | Localization Through Particle Filter Powered Neural Network Estimated Monocular Camera Poses | Yi Shen et.al. | 2404.17685 | null |
| 2024-04-26 | SLAM for Indoor Mapping of Wide Area Construction Environments | Vincent Ress et.al. | 2404.17215 | null |
| 2024-04-25 | WheelPose: Data Synthesis Techniques to Improve Pose Estimation Performance on Wheelchair Users | William Huang et.al. | 2404.17063 | link |
| 2024-04-25 | Transformer-Based Local Feature Matching for Multimodal Image Registration | Remi Delaunay et.al. | 2404.16802 | null |
| 2024-04-25 | DeepKalPose: An Enhanced Deep-Learning Kalman Filter for Temporally Consistent Monocular Vehicle Pose Estimation | Leandro Di Bella et.al. | 2404.16558 | null |
| 2024-04-25 | Efficient Solution of Point-Line Absolute Pose | Petr Hruby et.al. | 2404.16552 | link |
| 2024-04-25 | COBRA – COnfidence score Based on shape Regression Analysis for method-independent quality assessment of object pose estimation from single images | Panagiotis Sapoutzoglou et.al. | 2404.16471 | link |
| 2024-04-25 | MegaParticles: Range-based 6-DoF Monte Carlo Localization with GPU-Accelerated Stein Particle Filter | Kenji Koide et.al. | 2404.16370 | null |
| 2024-04-24 | 3D Human Pose Estimation with Occlusions: Introducing BlendMimic3D Dataset and GCN Refinement | Filipa Lino et.al. | 2404.16136 | null |
| 2024-04-23 | SMPLer: Taming Transformers for Monocular 3D Human Shape and Pose Estimation | Xiangyu Xu et.al. | 2404.15276 | link |
| 2024-04-25 | Domain adaptive pose estimation via multi-level alignment | Yugan Chen et.al. | 2404.14885 | link |
| 2024-04-23 | Semi-supervised 2D Human Pose Estimation via Adaptive Keypoint Masking | Kexin Meng et.al. | 2404.14835 | null |
| 2024-04-23 | UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues | Vandad Davoodnia et.al. | 2404.14634 | null |
| 2024-04-22 | DHRNet: A Dual-Path Hierarchical Relation Network for Multi-Person Pose Estimation | Yonghao Dang et.al. | 2404.14025 | null |
| 2024-04-23 | CT-NeRF: Incremental Optimizing Neural Radiance Field and Poses with Complex Trajectory | Yunlong Ran et.al. | 2404.13896 | null |
| 2024-04-21 | Resampling-free Particle Filters in High-dimensions | Akhilan Boopathy et.al. | 2404.13698 | null |
| 2024-04-20 | EC-SLAM: Real-time Dense Neural RGB-D SLAM System with Effectively Constrained Global Bundle Adjustment | Guanghao Li et.al. | 2404.13346 | link |
| 2024-04-18 | Spot-Compose: A Framework for Open-Vocabulary Object Retrieval and Drawer Manipulation in Point Clouds | Oliver Lemke et.al. | 2404.12440 | null |
| 2024-04-18 | Gait Recognition from Highly Compressed Videos | Andrei Niculae et.al. | 2404.12183 | null |
| 2024-04-17 | Mushroom Segmentation and 3D Pose Estimation from Point Clouds using Fully Convolutional Geometric Features and Implicit Pose Encoding | George Retsinas et.al. | 2404.12144 | link |
| 2024-04-17 | Kathakali Hand Gesture Recognition With Minimal Data | Kavitha Raju et.al. | 2404.11205 | null |
| 2024-04-17 | GeoReF: Geometric Alignment Across Shape Variation for Category-level Object Pose Refinement | Linfang Zheng et.al. | 2404.11139 | null |
| 2024-04-17 | CorrNet+: Sign Language Recognition and Translation via Spatial-Temporal Correlation | Lianyu Hu et.al. | 2404.11111 | link |
| 2024-04-16 | HumMUSS: Human Motion Understanding using State Space Models | Arnab Kumar Mondal et.al. | 2404.10880 | null |
| 2024-04-16 | Invariant Kalman Filtering with Noise-Free Pseudo-Measurements | Sven Goffin et.al. | 2404.10687 | null |
| 2024-04-16 | The Unreasonable Effectiveness of Pre-Trained Features for Camera Pose Refinement | Gabriele Trivigno et.al. | 2404.10438 | null |
| 2024-04-16 | GaitPoint+: A Gait Recognition Network Incorporating Point Cloud Analysis and Recycling | Huantao Ren et.al. | 2404.10213 | null |
| 2024-04-16 | LWIRPOSE: A novel LWIR Thermal Image Dataset and Benchmark | Avinash Upadhyay et.al. | 2404.10212 | link |
| 2024-04-15 | LetsGo: Large-Scale Garage Modeling and Rendering via LiDAR-Assisted Gaussian Primitives | Jiadi Cui et.al. | 2404.09748 | null |
| 2024-04-14 | In My Perspective, In My Hands: Accurate Egocentric 2D Hand Pose and Action Recognition | Wiktor Mucha et.al. | 2404.09308 | null |
| 2024-04-13 | DeDoDe v2: Analyzing and Improving the DeDoDe Keypoint Detector | Johan Edstedt et.al. | 2404.08928 | link |
| 2024-04-16 | 3D Human Scan With A Moving Event Camera | Kai Kohyama et.al. | 2404.08504 | null |
| 2024-04-11 | Separated Attention: An Improved Cycle GAN Based Under Water Image Enhancement Method | Tashmoy Ghosh et.al. | 2404.07649 | null |
| 2024-04-11 | GLID: Pre-training a Generalist Encoder-Decoder Vision Model | Jihao Liu et.al. | 2404.07603 | null |
| 2024-04-10 | Measuring proximity to standard planes during fetal brain ultrasound scanning | Chiara Di Vece et.al. | 2404.07124 | null |
| 2024-04-10 | MoCap-to-Visual Domain Adaptation for Efficient Human Mesh Estimation from 2D Keypoints | Bedirhan Uguz et.al. | 2404.07094 | null |
| 2024-04-10 | Gaussian-LIC: Photo-realistic LiDAR-Inertial-Camera SLAM with 3D Gaussian Splatting | Xiaolei Lang et.al. | 2404.06926 | null |
| 2024-04-09 | Matching 2D Images in 3D: Metric Relative Pose from Metric Correspondences | Axel Barroso-Laguna et.al. | 2404.06337 | link |
| 2024-04-09 | Incremental Joint Learning of Depth, Pose and Implicit Scene Representation on Monocular Camera in Large-scale Scenes | Tianchen Deng et.al. | 2404.06050 | null |
| 2024-04-09 | Improving Facial Landmark Detection Accuracy and Efficiency with Knowledge Distillation | Zong-Wei Hong et.al. | 2404.06029 | null |
| 2024-04-08 | Learning 3D-Aware GANs from Unposed Images with Template Feature Field | Xinya Chen et.al. | 2404.05705 | null |
| 2024-04-08 | Learning a Category-level Object Pose Estimator without Pose Annotations | Fengrui Tian et.al. | 2404.05626 | null |
| 2024-04-08 | DepthMOT: Depth Cues Lead to a Strong Multi-Object Tracker | Jiapeng Wu et.al. | 2404.05518 | link |
| 2024-04-08 | Two Hands Are Better Than One: Resolving Hand to Hand Intersections via Occupancy Networks | Maksym Ivashechkin et.al. | 2404.05414 | null |
| 2024-04-08 | STITCH: Augmented Dexterity for Suture Throws Including Thread Coordination and Handoffs | Kush Hari et.al. | 2404.05151 | null |
| 2024-04-05 | ToolEENet: Tool Affordance 6D Pose Estimation | Yunlong Wang et.al. | 2404.04193 | null |
| 2024-04-04 | SDPose: Tokenized Pose Estimation via Circulation-Guide Self-Distillation | Sichen Chen et.al. | 2404.03518 | link |
| 2024-04-04 | Multi Positive Contrastive Learning with Pose-Consistent Generated Images | Sho Inayoshi et.al. | 2404.03256 | null |
| 2024-04-04 | HandDiff: 3D Hand Pose Estimation with Diffusion on Image-Point Cloud | Wencan Cheng et.al. | 2404.03159 | link |
| 2024-04-03 | Fusing Multi-sensor Input with State Information on TinyML Brains for Autonomous Nano-drones | Luca Crupi et.al. | 2404.02567 | null |
| 2024-04-03 | Semi-Supervised Unconstrained Head Pose Estimation in the Wild | Huayi Zhou et.al. | 2404.02544 | link |
| 2024-04-02 | 3D Congealing: 3D-Aware Image Alignment in the Wild | Yunzhi Zhang et.al. | 2404.02125 | null |
| 2024-04-02 | SelfPose3d: Self-Supervised Multi-Person Multi-View 3d Pose Estimation | Vinkle Srivastav et.al. | 2404.02041 | null |
| 2024-04-01 | Marrying NeRF with Feature Matching for One-step Pose Estimation | Ronghan Chen et.al. | 2404.00891 | null |
| 2024-03-31 | Graph-Based vs. Error State Kalman Filter-Based Fusion Of 5G And Inertial Data For MAV Indoor Pose Estimation | Meisam Kabiri et.al. | 2404.00691 | null |
| 2024-03-31 | OmniLocalRF: Omnidirectional Local Radiance Fields from Dynamic Videos | Dongyoung Choi et.al. | 2404.00676 | null |
| 2024-04-02 | KTPFormer: Kinematics and Trajectory Prior Knowledge-Enhanced Transformer for 3D Human Pose Estimation | Jihua Peng et.al. | 2404.00658 | link |
| 2024-03-29 | FetalDiffusion: Pose-Controllable 3D Fetal MRI Synthesis with Conditional Diffusion Model | Molin Zhang et.al. | 2404.00132 | null |
| 2024-03-29 | Latent Embedding Clustering for Occlusion Robust Head Pose Estimation | José Celestino et.al. | 2403.20251 | null |
| 2024-03-29 | A Unified Framework for Human-centric Point Cloud Video Understanding | Yiteng Xu et.al. | 2403.20031 | null |
| 2024-04-01 | Video-Based Human Pose Regression via Decoupled Space-Time Aggregation | Jijie He et.al. | 2403.19926 | link |
| 2024-03-28 | Instance-Adaptive and Geometric-Aware Keypoint Learning for Category-Level 6D Object Pose Estimation | Xiao Lin et.al. | 2403.19527 | link |
| 2024-03-27 | Object Pose Estimation via the Aggregation of Diffusion Features | Tianfu Wang et.al. | 2403.18791 | link |
| 2024-03-27 | RoboKeyGen: Robot Pose and Joint Angles Estimation via Diffusion-based 3D Keypoint Generation | Yang Tian et.al. | 2403.18259 | null |
| 2024-03-26 | Mathematical Foundation and Corrections for Full Range Head Pose Estimation | Huei-Chung Hu et.al. | 2403.18104 | null |
| 2024-03-26 | EgoPoseFormer: A Simple Baseline for Egocentric 3D Human Pose Estimation | Chenhongyi Yang et.al. | 2403.18080 | null |
| 2024-03-26 | A Survey on 3D Egocentric Human Pose Estimation | Md Mushfiqur Azam et.al. | 2403.17893 | null |
| 2024-03-26 | GTA-HDR: A Large-Scale Synthetic Dataset for HDR Image Reconstruction | Hrishav Bakul Barua et.al. | 2403.17837 | link |
| 2024-03-26 | DiffH2O: Diffusion-Based Synthesis of Hand-Object Interactions from Textual Descriptions | Sammy Christen et.al. | 2403.17827 | null |
| 2024-03-26 | System Calibration of a Field Phenotyping Robot with Multiple High-Precision Profile Laser Scanners | Felix Esser et.al. | 2403.17788 | null |
| 2024-03-25 | Animal Avatars: Reconstructing Animatable 3D Animals from Casual Videos | Remy Sabathier et.al. | 2403.17103 | null |
| 2024-03-25 | Characterisation of the Intel RealSense D415 Stereo Depth Camera for Motion-Corrected CT Perfusion Imaging | Mahdieh Dashtbani Moghari et.al. | 2403.16490 | null |
| 2024-03-25 | Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects | Zicong Fan et.al. | 2403.16428 | null |
| 2024-03-25 | A Geometric Perspective on Fusing Gaussian Distributions on Lie Groups | Yixiao Ge et.al. | 2403.16411 | null |
| 2024-03-25 | ASDF: Assembly State Detection Utilizing Late Fusion by Integrating 6D Pose Estimation | Hannah Schieber et.al. | 2403.16400 | null |
| 2024-03-24 | KITchen: A Real-World Benchmark and Dataset for 6D Object Pose Estimation in Kitchen Environments | Abdelrahman Younes et.al. | 2403.16238 | null |
| 2024-03-24 | Diffusion Model is a Good Pose Estimator from 3D RF-Vision | Junqiao Fan et.al. | 2403.16198 | null |
| 2024-03-23 | UPNeRF: A Unified Framework for Monocular 3D Object Reconstruction and Pose Estimation | Yuliang Guo et.al. | 2403.15705 | null |
| 2024-03-22 | InterFusion: Text-Driven Generation of 3D Human-Object Interaction | Sisi Dai et.al. | 2403.15612 | null |
| 2024-03-22 | Augmented Reality Warnings in Roadway Work Zones: Evaluating the Effect of Modality on Worker Reaction Times | Sepehr Sabeti et.al. | 2403.15571 | null |
| 2024-03-22 | Gesture-Controlled Aerial Robot Formation for Human-Swarm Interaction in Safety Monitoring Applications | Vít Krátký et.al. | 2403.15333 | null |
| 2024-03-22 | WSCLoc: Weakly-Supervised Sparse-View Camera Relocalization | Jialu Wang et.al. | 2403.15272 | null |
| 2024-03-22 | DITTO: Demonstration Imitation by Trajectory Transformation | Nick Heppert et.al. | 2403.15203 | null |
| 2024-03-22 | Cartoon Hallucinations Detection: Pose-aware In Context Visual Learning | Bumsoo Kim et.al. | 2403.15048 | null |
| 2024-03-22 | Trajectory Regularization Enhances Self-Supervised Geometric Representation | Jiayun Wang et.al. | 2403.14973 | null |
| 2024-03-21 | VURF: A General-purpose Reasoning and Self-refinement Framework for Video Understanding | Ahmad Mahmood et.al. | 2403.14743 | null |
| 2024-03-21 | Visibility-Aware Keypoint Localization for 6DoF Object Pose Estimation | Ruyi Lian et.al. | 2403.14559 | null |
| 2024-03-21 | Exploring 3D Human Pose Estimation and Forecasting from the Robot’s Perspective: The HARPER Dataset | Andrea Avogaro. Andrea Toaiari et.al. | 2403.14447 | null |
| 2024-03-21 | Evaluation and Deployment of LiDAR-based Place Recognition in Dense Forests | Haedam Oh et.al. | 2403.14326 | null |
| 2024-03-21 | Zero123-6D: Zero-shot Novel View Synthesis for RGB Category-level 6D Pose Estimation | Francesco Di Felice et.al. | 2403.14279 | null |
| 2024-03-20 | DVMNet: Computing Relative Pose for Unseen Objects Beyond Hypotheses | Chen Zhao et.al. | 2403.13683 | link |
| 2024-03-20 | Meta-Point Learning and Refining for Category-Agnostic Pose Estimation | Junjie Chen et.al. | 2403.13647 | link |
| 2024-03-20 | Advancing 6D Pose Estimation in Augmented Reality – Overcoming Projection Ambiguity with Uncontrolled Imagery | Mayura Manawadu et.al. | 2403.13434 | null |
| 2024-03-20 | DOR3D-Net: Dense Ordinal Regression Network for 3D Hand Pose Estimation | Yamin Mao et.al. | 2403.13405 | null |
| 2024-03-20 | ManiPose: A Comprehensive Benchmark for Pose-aware Object Manipulation in Robotics | Qiaojun Yu et.al. | 2403.13365 | null |
| 2024-03-20 | MULAN-WC: Multi-Robot Localization Uncertainty-aware Active NeRF with Wireless Coordination | Weiying Wang et.al. | 2403.13348 | null |
| 2024-03-19 | FaceXFormer: A Unified Transformer for Facial Analysis | Kartik Narayan et.al. | 2403.12960 | null |
| 2024-03-19 | WHAC: World-grounded Humans and Cameras | Wanqi Yin et.al. | 2403.12959 | null |
| 2024-03-19 | Diffusion-Driven Self-Supervised Learning for Shape Reconstruction and Pose Estimation | Jingtao Sun et.al. | 2403.12728 | link |
| 2024-03-19 | IFFNeRF: Initialisation Free and Fast 6DoF pose estimation from a single image and a NeRF model | Matteo Bortolon et.al. | 2403.12682 | null |
| 2024-03-19 | In-Hand Following of Deformable Linear Objects Using Dexterous Fingers with Tactile Sensing | Mingrui Yu et.al. | 2403.12676 | null |
| 2024-03-19 | Self-learning Canonical Space for Multi-view 3D Human Pose Estimation | Xiaoben Li et.al. | 2403.12440 | null |
| 2024-03-19 | Human Mesh Recovery from Arbitrary Multi-view Images | Xiaoben Li et.al. | 2403.12434 | null |
| 2024-03-19 | XPose: eXplainable Human Pose Estimation | Luyu Qiu et.al. | 2403.12370 | null |
| 2024-03-18 | HOIDiffusion: Generating Realistic 3D Hand-Object Interaction Data | Mengqi Zhang et.al. | 2403.12011 | null |
| 2024-03-18 | Normalized Validity Scores for DNNs in Regression based Eye Feature Extraction | Wolfgang Fuhl et.al. | 2403.11665 | null |
| 2024-03-18 | An Accurate and Real-time Relative Pose Estimation from Triple Point-line Images by Decoupling Rotation and Translation | Zewen Xu et.al. | 2403.11639 | null |
| 2024-03-18 | LoRA-Composer: Leveraging Low-Rank Adaptation for Multi-Concept Customization in Training-Free Diffusion Models | Yang Yang et.al. | 2403.11627 | link |
| 2024-03-18 | GenFlow: Generalizable Recurrent Flow for 6D Pose Refinement of Novel Objects | Sungphill Moon et.al. | 2403.11510 | null |
| 2024-03-17 | A Dual-Augmentor Framework for Domain Generalization in 3D Human Pose Estimation | Qucheng Peng et.al. | 2403.11310 | null |
| 2024-03-17 | Compact 3D Gaussian Splatting For Dense Visual SLAM | Tianchen Deng et.al. | 2403.11247 | null |
| 2024-03-16 | Robotic Task Success Evaluation Under Multi-modal Non-Parametric Object Pose Uncertainty | Lakshadeep Naik et.al. | 2403.10874 | null |
| 2024-03-16 | DPPE: Dense Pose Estimation in a Plenoxels Environment using Gradient Approximation | Christopher Kolios et.al. | 2403.10773 | null |
| 2024-03-15 | GS-Pose: Cascaded Framework for Generalizable Segmentation-based 6D Object Pose Estimation | Dingding Cai et.al. | 2403.10683 | null |
| 2024-03-15 | CLOSURE: Fast Quantification of Pose Uncertainty Sets | Yihuai Gao et.al. | 2403.09990 | null |
| 2024-03-14 | Scalable Autonomous Drone Flight in the Forest with Visual-Inertial SLAM and Dense Submaps Built without LiDAR | Sebastián Barbas Laina et.al. | 2403.09596 | null |
| 2024-03-14 | Improving Real-Time Omnidirectional 3D Multi-Person Human Pose Estimation with People Matching and Unsupervised 2D-3D Lifting | Pawel Knap et.al. | 2403.09437 | null |
| 2024-03-14 | LM2D: Lyrics- and Music-Driven Dance Synthesis | Wenjie Yin et.al. | 2403.09407 | null |
| 2024-03-14 | SD-Net: Symmetric-Aware Keypoint Prediction and Domain Adaptation for 6D Pose Estimation In Bin-picking Scenarios | Ding-Tao Huang et.al. | 2403.09317 | link |
| 2024-03-14 | MOTPose: Multi-object 6D Pose Estimation for Dynamic Video Sequences using Attention-based Temporal Fusion | Arul Selvam Periyasamy et.al. | 2403.09309 | null |
| 2024-03-13 | Data Augmentation in Human-Centric Vision | Wentao Jiang et.al. | 2403.08650 | null |
| 2024-03-13 | PRAGO: Differentiable Multi-View Pose Optimization From Objectness Detections | Matteo Taiana et.al. | 2403.08586 | null |
| 2024-03-13 | NeRF-Supervised Feature Point Detection and Description | Ali Youssef et.al. | 2403.08156 | null |
| 2024-03-12 | Q-SLAM: Quadric Representations for Monocular SLAM | Chensheng Peng et.al. | 2403.08125 | null |
| 2024-03-12 | MRC-Net: 6-DoF Pose Estimation with MultiScale Residual Correlation | Yuelong Li et.al. | 2403.08019 | null |
| 2024-03-12 | Uncertainty Quantification with Deep Ensembles for 6D Object Pose Estimation | Kira Wursthorn et.al. | 2403.07741 | null |
| 2024-03-12 | Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving | JunDa Cheng et.al. | 2403.07535 | null |
| 2024-03-12 | Category-Agnostic Pose Estimation for Point Clouds | Bowen Liu et.al. | 2403.07437 | null |
| 2024-03-12 | Monocular Microscope to CT Registration using Pose Estimation of the Incus for Augmented Reality Cochlear Implant Surgery | Yike Zhang et.al. | 2403.07219 | null |
| 2024-03-11 | Real-Time Simulated Avatar from Head-Mounted Sensors | Zhengyi Luo et.al. | 2403.06862 | null |
| 2024-03-11 | Transformer-based Fusion of 2D-pose and Spatio-temporal Embeddings for Distracted Driver Action Recognition | Erkut Akdag et.al. | 2403.06577 | null |
| 2024-03-10 | Platypose: Calibrated Zero-Shot Multi-Hypothesis 3D Human Motion Estimation | Paweł A. Pierzchlewicz et.al. | 2403.06164 | link |
| 2024-03-10 | Diffusion Models Trained with Large Data Are Transferable Visual Models | Guangkai Xu et.al. | 2403.06090 | null |
| 2024-03-08 | Prepared for the Worst: A Learning-Based Adversarial Attack for Resilience Analysis of the ICP Algorithm | Ziyu Zhang et.al. | 2403.05666 | null |
| 2024-03-11 | Exploiting polar symmetry in designing equivariant observers for vision-based motion estimation | Tarek Bouazza et.al. | 2403.05450 | null |
| 2024-03-07 | Real-Time Planning Under Uncertainty for AUVs Using Virtual Maps | Ivana Collado-Gonzalez et.al. | 2403.04936 | null |
| 2024-03-07 | That’s My Point: Compact Object-centric LiDAR Pose Estimation for Large-scale Outdoor Localisation | Georgi Pramatarov et.al. | 2403.04755 | null |
| 2024-03-07 | Disentangled Diffusion-Based 3D Human Pose Estimation with Hierarchical Spatial and Temporal Denoiser | Qingyuan Cai et.al. | 2403.04444 | null |
| 2024-03-09 | Single-to-Dual-View Adaptation for Egocentric 3D Hand Pose Estimation | Ruicong Liu et.al. | 2403.04381 | null |
| 2024-03-05 | FAR: Flexible, Accurate and Robust 6DoF Relative Camera Pose Estimation | Chris Rockwell et.al. | 2403.03221 | null |
| 2024-03-05 | NRDF: Neural Riemannian Distance Fields for Learning Articulated Pose Priors | Yannan He et.al. | 2403.03122 | null |
| 2024-03-05 | Improved LiDAR Odometry and Mapping using Deep Semantic Segmentation and Novel Outliers Detection | Mohamed Afifi et.al. | 2403.03111 | null |
| 2024-03-05 | Splat-Nav: Safe Real-Time Robot Navigation in Gaussian Splatting Maps | Timothy Chen et.al. | 2403.02751 | null |
| 2024-03-04 | PowerSkel: A Device-Free Framework Using CSI Signal for Human Skeleton Estimation in Power Station | Cunyi Yin et.al. | 2403.01913 | link |
| 2024-03-04 | A Simple Baseline for Efficient Hand Mesh Reconstruction | Zhishan Zhou et.al. | 2403.01813 | null |
| 2024-03-03 | MatchU: Matching Unseen Objects for 6D Pose Estimation from RGB-D Images | Junwen Huang et.al. | 2403.01517 | null |
| 2024-03-02 | Single-image camera calibration with model-free distortion correction | Katia Genovese et.al. | 2403.01263 | null |
| 2024-03-02 | Grid-based Fast and Structural Visual Odometry | Zhang Zhihe et.al. | 2403.01110 | null |
| 2024-03-01 | Optimal Robot Formations: Balancing Range-Based Observability and User-Defined Configurations | Syed Shabbir Ahmed et.al. | 2403.00988 | null |
| 2024-03-04 | TEXterity – Tactile Extrinsic deXterity: Simultaneous Tactile Estimation and Control for Extrinsic Dexterity | Sangwoon Kim et.al. | 2403.00049 | null |
| 2024-03-01 | Graph Convolutional Neural Networks for Automated Echocardiography View Recognition: A Holistic Approach | Sarina Thomas et.al. | 2402.19062 | null |
| 2024-02-29 | Deep Learning for 3D Human Pose Estimation and Mesh Recovery: A Survey | Yang Liu et.al. | 2402.18844 | link |
| 2024-02-28 | Attention-Propagation Network for Egocentric Heatmap to 3D Pose Lifting | Taeho Kang et.al. | 2402.18330 | link |
| 2024-02-28 | Location-guided Head Pose Estimation for Fisheye Image | Bing Li et.al. | 2402.18320 | null |
| 2024-02-28 | NToP: NeRF-Powered Large-scale Dataset Generation for 2D and 3D Human Pose Estimation in Top-View Fisheye Images | Jingrui Yu et.al. | 2402.18196 | null |
| 2024-02-28 | Six-Point Method for Multi-Camera Systems with Reduced Solution Space | Banglei Guan et.al. | 2402.18066 | null |
| 2024-02-27 | Real-Time Estimation of Relative Pose for UAVs Using a Dual-Channel Feature Association | Zhaoying Wang et.al. | 2402.17504 | null |
| 2024-02-26 | HOISDF: Constraining 3D Hand-Object Pose Estimation with Global Signed Distance Fields | Haozhe Qi et.al. | 2402.17062 | link |
| 2024-02-26 | DRSI-Net: Dual-Residual Spatial Interaction Network for Multi-Person Pose Estimation | Shang Wu et.al. | 2402.16640 | null |
| 2024-02-26 | GEA: Reconstructing Expressive 3D Gaussian Avatar from Monocular Video | Xinqi Liu et.al. | 2402.16607 | null |
| 2024-02-26 | DreamUp3D: Object-Centric Generative Models for Single-View 3D Scene Understanding and Real-to-Sim Transfer | Yizhe Wu et.al. | 2402.16308 | null |
| 2024-02-25 | XAI-based gait analysis of patients walking with Knee-Ankle-Foot orthosis using video cameras | Arnav Mishra et.al. | 2402.16175 | null |
Image Generation
LLM
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2025-12-23 | Making Large Language Models Efficient Dense Retrievers | Yibin Lei et.al. | 2512.20612 | null |
| 2025-12-23 | MoE-DiffuSeq: Enhancing Long-Document Diffusion Models with Sparse Attention and Mixture of Experts | Alexandros Christoforos et.al. | 2512.20604 | null |
| 2025-12-23 | Cube Bench: A Benchmark for Spatial Visual Reasoning in MLLMs | Dhruv Anand et.al. | 2512.20595 | null |
| 2025-12-23 | Automated stereotactic radiosurgery planning using a human-in-the-loop reasoning large language model agent | Humza Nusrat et.al. | 2512.20586 | null |
| 2025-12-23 | Can LLMs Predict Their Own Failures? Self-Awareness via Internal Circuits | Amirhosein Ghasemabadi et.al. | 2512.20578 | null |
| 2025-12-23 | Fail Fast, Win Big: Rethinking the Drafting Strategy in Speculative Decoding via Diffusion LLMs | Rui Pan et.al. | 2512.20573 | null |
| 2025-12-23 | LLM-Based Authoring of Agent-Based Narratives through Scene Descriptions | Vinayak Regmi et.al. | 2512.20550 | null |
| 2025-12-23 | Advancing Multimodal Teacher Sentiment Analysis: The Large-Scale T-MED Dataset & The Effective AAM-TSA Model | Zhiyi Duan et.al. | 2512.20548 | null |
| 2025-12-23 | Benchmarking LLMs for Predictive Applications in the Intensive Care Units | Chehak Malhotra et.al. | 2512.20520 | null |
| 2025-12-23 | Coherence in the brain unfolds across separable temporal regimes | Davide Stauba et.al. | 2512.20481 | null |
| 2025-12-23 | UTDesign: A Unified Framework for Stylized Text Editing and Generation in Graphic Design Images | Yiming Zhao et.al. | 2512.20479 | null |
| 2025-12-23 | Laser: Governing Long-Horizon Agentic Search via Structured Protocol and Context Register | Shuting Wang et.al. | 2512.20458 | null |
| 2025-12-23 | Topic-informed dynamic mixture model for occupational heterogeneity in health risk behaviors | Lorenzo Schiavon et.al. | 2512.20408 | null |
| 2025-12-23 | ChatGPT: Excellent Paper! Accept It. Editor: Imposter Found! Review Rejected | Kanchon Gharami et.al. | 2512.20405 | null |
| 2025-12-23 | CRAFT: Continuous Reasoning and Agentic Feedback Tuning for Multimodal Text-to-Image Generation | V. Kovalev et.al. | 2512.20362 | null |
| 2025-12-23 | A DeepSeek-Powered AI System for Automated Chest Radiograph Interpretation in Clinical Practice | Yaowei Bai et.al. | 2512.20344 | null |
| 2025-12-23 | Comment Traps: How Defective Commented-out Code Augment Defects in AI-Assisted Code Generation | Yuan Huang et.al. | 2512.20334 | null |
| 2025-12-23 | SynCraft: Guiding Large Language Models to Predict Edit Sequences for Molecular Synthesizability Optimization | Junren Li et.al. | 2512.20333 | null |
| 2025-12-23 | Toward Explaining Large Language Models in Software Engineering Tasks | Antonio Vitale et.al. | 2512.20328 | null |
| 2025-12-23 | Can LLMs Solve My Grandma’s Riddle? Evaluating Multilingual Large Language Models on Reasoning Traditional Bangla Tricky Riddles | Nurul Labib Sayeedi et.al. | 2512.20324 | null |
| 2025-12-23 | TableGPT-R1: Advancing Tabular Reasoning Through Reinforcement Learning | Saisai Yang et.al. | 2512.20312 | null |
| 2025-12-23 | Structured Visualization Design Knowledge for Grounding Generative Reasoning and Situated Feedback | Péter Ferenc Gyarmati et.al. | 2512.20306 | null |
| 2025-12-23 | AprielGuard | Jaykumar Kasundra et.al. | 2512.20293 | null |
| 2025-12-23 | Synthesizing Procedural Memory: Challenges and Architectures in Automated Workflow Generation | Nishant Gaurav et.al. | 2512.20278 | null |
| 2025-12-23 | Graph-Symbolic Policy Enforcement and Control (G-SPEC): A Neuro-Symbolic Framework for Safe Agentic AI in 5G Autonomous Networks | Divya Vijay et.al. | 2512.20275 | null |
| 2025-12-23 | Memory as Resonance: A Biomimetic Architecture for Infinite Context Memory on Ergodic Phonetic Manifolds | Tarik Houichime et.al. | 2512.20245 | null |
| 2025-12-23 | MemR$^3$: Memory Retrieval via Reflective Reasoning for LLM Agents | Xingbo Du et.al. | 2512.20237 | null |
| 2025-12-23 | Quantitative Financial Modeling for Sri Lankan Markets: Approach Combining NLP, Clustering and Time-Series Forecasting | Linuk Perera et.al. | 2512.20216 | null |
| 2025-12-23 | Predictive-LoRA: A Proactive and Fragmentation-Aware Serverless Inference System for LLMs | Yinan Ni et.al. | 2512.20210 | null |
| 2025-12-23 | TongSIM: A General Platform for Simulating Intelligent Machines | Zhe Sun et.al. | 2512.20206 | null |
| 2025-12-23 | Corpus of Cross-lingual Dialogues with Minutes and Detection of Misunderstandings | Marko Čechovič et.al. | 2512.20204 | null |
| 2025-12-23 | Well Begun is Half Done: Location-Aware and Trace-Guided Iterative Automated Vulnerability Repair | Zhenlei Ye et.al. | 2512.20203 | null |
| 2025-12-23 | Designing Spatial Architectures for Sparse Attention: STAR Accelerator via Cross-Stage Tiling | Huizheng Wang et.al. | 2512.20198 | null |
| 2025-12-23 | FaithLens: Detecting and Explaining Faithfulness Hallucination | Shuzheng Si et.al. | 2512.20182 | link |
| 2025-12-23 | Optimistic TEE-Rollups: A Hybrid Architecture for Scalable and Verifiable Generative AI Inference on Blockchain | Aaron Chan et.al. | 2512.20176 | null |
| 2025-12-23 | Towards Natural Language-Based Document Image Retrieval: New Dataset and Benchmark | Hao Guo et.al. | 2512.20174 | null |
| 2025-12-23 | Learning to Reason in LLMs by Expectation Maximization | Junghyun Lee et.al. | 2512.20169 | null |
| 2025-12-23 | Odysseus: Jailbreaking Commercial Multimodal LLM-integrated Systems via Dual Steganography | Songze Li et.al. | 2512.20168 | null |
| 2025-12-23 | AI Security Beyond Core Domains: Resume Screening as a Case Study of Adversarial Vulnerabilities in Specialized LLM Applications | Honglin Mu et.al. | 2512.20164 | null |
| 2025-12-23 | Concept Generalization in Humans and Large Language Models: Insights from the Number Game | Arghavan Bazigaran et.al. | 2512.20162 | null |
| 2025-12-23 | AXIOM: Benchmarking LLM-as-a-Judge for Code via Rule-Based Perturbation and Multisource Quality Calibration | Ruiqi Wang et.al. | 2512.20159 | null |
| 2025-12-23 | Multi-hop Reasoning via Early Knowledge Alignment | Yuxin Wang et.al. | 2512.20144 | link |
| 2025-12-23 | Enhancing Zero-Shot Time Series Forecasting in Off-the-Shelf LLMs via Noise Injection | Xingyou Yin et.al. | 2512.20140 | null |
| 2025-12-23 | M$^3$KG-RAG: Multi-hop Multimodal Knowledge Graph-enhanced Retrieval-Augmented Generation | Hyeongcheol Park et.al. | 2512.20136 | null |
| 2025-12-23 | A Novel Graph-Sequence Learning Model for Inductive Text Classification | Zuo Wang et.al. | 2512.20097 | null |
| 2025-12-23 | QE-Catalytic: A Graph-Language Multimodal Base Model for Relaxed-Energy Prediction in Catalytic Adsorption | Yanjie Li et.al. | 2512.20084 | null |
| 2025-12-23 | Adaptive Financial Sentiment Analysis for NIFTY 50 via Instruction-Tuned LLMs, RAG and Reinforcement Learning Approaches | Chaithra et.al. | 2512.20082 | null |
| 2025-12-23 | Reason2Decide: Rationale-Driven Multi-Task Learning | H M Quamran Hasan et.al. | 2512.20074 | null |
| 2025-12-23 | On the Effectiveness of Instruction-Tuning Local LLMs for Identifying Software Vulnerabilities | Sangryu Park et.al. | 2512.20062 | null |
| 2025-12-23 | Scaling Reinforcement Learning for Content Moderation with Large Language Models | Hamed Firooz et.al. | 2512.20061 | null |
| 2025-12-23 | Beyond Vision: Contextually Enriched Image Captioning with Multi-Modal Retrieval | Nguyen Lam Phu Quy et.al. | 2512.20042 | null |
| 2025-12-23 | VSA: Visual-Structural Alignment for UI-to-Code | Xian Wu et.al. | 2512.20034 | null |
| 2025-12-23 | VALLR-Pin: Dual-Decoding Visual Speech Recognition for Mandarin with Pinyin-Guided LLM Refinement | Chang Sun et.al. | 2512.20032 | null |
| 2025-12-23 | LLM-Assisted Abstract Screening with OLIVER: Evaluating Calibration and Single-Model vs. Actor-Critic Configurations in Literature Reviews | Kian Godhwani et.al. | 2512.20022 | null |
| 2025-12-23 | Reliable LLM-Based Edge-Cloud-Expert Cascades for Telecom Knowledge Systems | Qiushuo Hou et.al. | 2512.20012 | null |
| 2025-12-23 | LoFT-LLM: Low-Frequency Time-Series Forecasting with Large Language Models | Jiacheng You et.al. | 2512.20002 | null |
| 2025-12-23 | Schoenfeld’s Anatomy of Mathematical Reasoning by Language Models | Ming Li et.al. | 2512.19995 | null |
| 2025-12-23 | S$^3$IT: A Benchmark for Spatially Situated Social Intelligence Test | Zhe Sun et.al. | 2512.19992 | null |
| 2025-12-23 | Bias Beneath the Tone: Empirical Characterisation of Tone Bias in LLM-Driven UX Systems | Heet Bodara et.al. | 2512.19950 | null |
| 2025-12-23 | Interpolative Decoding: Exploring the Spectrum of Personality Traits in LLMs | Eric Yeh et.al. | 2512.19937 | null |
| 2025-12-22 | Conditional Adversarial Fragility in Financial Machine Learning under Macroeconomic Stress | Samruddhi Baviskar et.al. | 2512.19935 | null |
| 2025-12-22 | PRISM: A Personality-Driven Multi-Agent Framework for Social Media Simulation | Zhixiang Lu et.al. | 2512.19933 | null |
| 2025-12-22 | Widget2Code: From Visual Widgets to UI Code via Multimodal LLMs | Houston H. Zhang et.al. | 2512.19918 | null |
| 2025-12-22 | Demystifying LLM-as-a-Judge: Analytically Tractable Model for Inference-Time Scaling | Indranil Halder et.al. | 2512.19905 | null |
| 2025-12-22 | How well do Large Language Models Recognize Instructional Moves? Establishing Baselines for Foundation Models in Educational Discourse | Kirk Vanacore et.al. | 2512.19903 | null |
| 2025-12-22 | Larger Is Not Always Better: Leveraging Structured Code Diffs for Comment Inconsistency Detection | Phong Nguyen et.al. | 2512.19883 | null |
| 2025-12-22 | Fine-Tuned In-Context Learners for Efficient Adaptation | Jorg Bornschein et.al. | 2512.19879 | null |
| 2025-12-22 | CS-Guide: Leveraging LLMs and Student Reflections to Provide Frequent, Scalable Academic Monitoring Feedback to Computer Science Students | Samuel Jacob Chacko et.al. | 2512.19866 | null |
| 2025-12-22 | HARMON-E: Hierarchical Agentic Reasoning for Multimodal Oncology Notes to Extract Structured Data | Shashi Kant Gupta et.al. | 2512.19864 | null |
| 2025-12-22 | From Indoor to Open World: Revealing the Spatial Reasoning Gap in MLLMs | Mingrui Wu et.al. | 2512.19683 | null |
| 2025-12-22 | GenEnv: Difficulty-Aligned Co-Evolution Between LLM Agents and Environment Simulators | Jiacheng Guo et.al. | 2512.19682 | null |
| 2025-12-22 | Multimodal LLMs for Historical Dataset Construction from Archival Image Scans: German Patents (1877-1918) | Niclas Griesshaber et.al. | 2512.19675 | null |
| 2025-12-22 | Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies | Yuqiao Tan et.al. | 2512.19673 | null |
| 2025-12-22 | Beyond CLIP: Knowledge-Enhanced Multimodal Transformers for Cross-Modal Alignment in Diabetic Retinopathy Diagnosis | Argha Kamal Samanta et.al. | 2512.19663 | null |
| 2025-12-22 | Exploring Zero-Shot ACSA with Unified Meaning Representation in Chain-of-Thought Prompting | Filippos Ventirozos et.al. | 2512.19651 | null |
| 2025-12-22 | Exploring the features used for summary evaluation by Human and GPT | Zahra Sadeghi et.al. | 2512.19620 | null |
| 2025-12-22 | MapTrace: Scalable Data Generation for Route Tracing on Maps | Artemis Panagopoulou et.al. | 2512.19609 | null |
| 2025-12-22 | RAPID-LLM: Resilience-Aware Performance analysis of Infrastructure for Distributed LLM Training and Inference | George Karfakis et.al. | 2512.19606 | null |
| 2025-12-22 | Increasing the Thinking Budget is Not All You Need | Ignacio Iacobacci et.al. | 2512.19585 | null |
| 2025-12-22 | The Epistemological Consequences of Large Language Models: Rethinking collective intelligence and institutional knowledge | Angjelin Hila et.al. | 2512.19570 | null |
| 2025-12-22 | Algerian Dialect | Zakaria Benmounah et.al. | 2512.19543 | null |
| 2025-12-22 | Event Extraction in Large Language Model | Bobo Li et.al. | 2512.19537 | null |
| 2025-12-22 | Learning Continuous Solvent Effects from Transient Flow Data: A Graph Neural Network Benchmark on Catechol Rearrangement | Hongsheng Xing et.al. | 2512.19530 | null |
| 2025-12-22 | Anatomy-R1: Enhancing Anatomy Reasoning in Multimodal Large Language Models via Anatomical Similarity Curriculum and Group Diversity Augmentation | Ziyang Song et.al. | 2512.19512 | null |
| 2025-12-22 | Structured Event Representation and Stock Return Predictability | Gang Li et.al. | 2512.19484 | null |
| 2025-12-22 | A Dataset and Preliminary Study of Using GPT-5 for Code-change Impact Analysis | Katharina Stengg et.al. | 2512.19481 | null |
| 2025-12-22 | A Large-Language-Model Framework for Automated Humanitarian Situation Reporting | Ivan Decostanzi et.al. | 2512.19475 | null |
| 2025-12-22 | Epistemological Fault Lines Between Human and Artificial Intelligence | Walter Quattrociocchi et.al. | 2512.19466 | null |
| 2025-12-22 | An Agentic Framework for Autonomous Materials Computation | Zeyu Xia et.al. | 2512.19458 | null |
| 2025-12-22 | Activations as Features: Probing LLMs for Generalizable Essay Scoring Representations | Jinwei Chi et.al. | 2512.19456 | null |
| 2025-12-22 | SiamGPT: Quality-First Fine-Tuning for Stable Thai Text Generation | Thittipat Pairatsuppawat et.al. | 2512.19455 | null |
| 2025-12-22 | D2Pruner: Debiased Importance and Structural Diversity for MLLM Token Pruning | Evelyn Zhang et.al. | 2512.19443 | null |
| 2025-12-22 | dMLLM-TTS: Self-Verified and Efficient Test-Time Scaling for Diffusion Multi-Modal Large Language Models | Yi Xin et.al. | 2512.19433 | null |
| 2025-12-22 | CodeSimpleQA: Scaling Factuality in Code Large Language Models | Jian Yang et.al. | 2512.19424 | null |
| 2025-12-22 | From Retrieval to Reasoning: A Framework for Cyber Threat Intelligence NER with Explicit and Adaptive Instructions | Jiaren Peng et.al. | 2512.19414 | null |
| 2025-12-22 | Brain-Grounded Axes for Reading and Steering LLM States | Sandro Andric et.al. | 2512.19399 | null |
| 2025-12-22 | HATS: High-Accuracy Triple-Set Watermarking for Large Language Models | Zhiqing Hu et.al. | 2512.19378 | null |
| 2025-12-22 | Generative vector search to improve pathology foundation models across multimodal vision-language tasks | Markus Ekvall et.al. | 2512.19360 | null |
| 2025-12-22 | ReasonCD: A Multimodal Reasoning Large Model for Implicit Change-of-Interest Semantic Mining | Zhenyang Huang et.al. | 2512.19354 | null |
| 2025-12-22 | PENDULUM: A Benchmark for Assessing Sycophancy in Multimodal Large Language Models | A. B. M. Ashikur Rahman et.al. | 2512.19350 | null |
| 2025-12-22 | VIGOR+: Iterative Confounder Generation and Validation via LLM-CEVAE Feedback Loop | JiaWei Zhu et.al. | 2512.19349 | null |
| 2025-12-22 | SafeMed-R1: Adversarial Reinforcement Learning for Generalizable and Robust Medical Reasoning in Vision-Language Models | A. A. Gde Yogi Pramana et.al. | 2512.19317 | null |
| 2025-12-22 | CienaLLM: Generative Climate-Impact Extraction from News Articles with Autoregressive LLMs | Javier Vela-Tambo et.al. | 2512.19305 | null |
| 2025-12-22 | Helios: A Foundational Language Model for Smart Energy Knowledge Reasoning and Application | Haoyu Jiang et.al. | 2512.19299 | null |
| 2025-12-22 | Causal-Guided Detoxify Backdoor Attack of Open-Weight LoRA Models | Linzhi Chen et.al. | 2512.19297 | null |
| 2025-12-22 | Auto-Prompting with Retrieval Guidance for Frame Detection in Logistics | Do Minh Duc et.al. | 2512.19247 | null |
| 2025-12-22 | ChemATP: A Training-Free Chemical Reasoning Framework for Large Language Models | Mingxu Zhang et.al. | 2512.19240 | null |
| 2025-12-22 | Identifying Features Associated with Bias Against 93 Stigmatized Groups in Language Models and Guardrail Model Safety Mitigation | Anna-Maria Gueorguieva et.al. | 2512.19238 | null |
| 2025-12-22 | Generation of Programmatic Rules for Document Forgery Detection Using Large Language Models | Valentin Schmidberger et.al. | 2512.19228 | null |
| 2025-12-22 | Observer, Not Player: Simulating Theory of Mind in LLMs through Game Observation | Jerry Wang et.al. | 2512.19210 | null |
| 2025-12-22 | MixKVQ: Query-Aware Mixed-Precision KV Cache Quantization for Long-Context Reasoning | Tao Zhang et.al. | 2512.19206 | null |
| 2025-12-22 | Configuration Work: Four Consequences of LLMs-in-use | Gabriel Alcaras et.al. | 2512.19189 | null |
| 2025-12-22 | L4: Low-Latency and Load-Balanced LLM Serving via Length-Aware Scheduling | Yitao Yuan et.al. | 2512.19179 | null |
| 2025-12-22 | OmniMoGen: Unifying Human Motion Generation via Learning from Interleaved Text-Motion Instructions | Wendong Bu et.al. | 2512.19159 | null |
| 2025-12-22 | Understanding Chain-of-Thought in Large Language Models via Topological Data Analysis | Chenghao Li et.al. | 2512.19135 | null |
| 2025-12-22 | QuCo-RAG: Quantifying Uncertainty from the Pre-training Corpus for Dynamic Retrieval-Augmented Generation | Dehai Min et.al. | 2512.19134 | null |
| 2025-12-22 | AWPO: Enhancing Tool-Use of Large Language Models through Explicit Integration of Reasoning Rewards | Zihan Lin et.al. | 2512.19126 | null |
| 2025-12-22 | Stop saying LLM: Large Discourse Models (LDM) and Artificial Discursive Agent (ADA)? | Amar Lakel et.al. | 2512.19117 | null |
| 2025-12-22 | Generative Giants, Retrieval Weaklings: Why do Multimodal Large Language Models Fail at Multimodal Retrieval? | Hengyi Feng et.al. | 2512.19115 | null |
| 2025-12-22 | HyperLoad: A Cross-Modality Enhanced Large Language Model-Based Framework for Green Data Center Cooling Load Prediction | Haoyu Jiang et.al. | 2512.19114 | null |
| 2025-12-22 | FC-MIR: A Mobile Screen Awareness Framework for Intent-Aware Recommendation based on Frame-Compressed Multimodal Trajectory Reasoning | Zhe Yang et.al. | 2512.19107 | null |
| 2025-12-22 | Tool-Augmented Hybrid Ensemble Reasoning with Distillation for Bilingual Mathematical Problem Solving | Peiqing Lu et.al. | 2512.19093 | null |
| 2025-12-22 | A Large Language Model Based Method for Complex Logical Reasoning over Knowledge Graphs | Ziyan Zhang et.al. | 2512.19092 | null |
| 2025-12-22 | Population-Evolve: a Parallel Sampling and Evolutionary Method for LLM Math Reasoning | Yanzhi Zhang et.al. | 2512.19081 | null |
| 2025-12-22 | Watch Closely: Mitigating Object Hallucinations in Large Vision-Language Models with Disentangled Decoding | Ruiqi Ma et.al. | 2512.19070 | null |
| 2025-12-22 | Can abstract concepts from LLM improve SLM performance? | Siddharth Tandon et.al. | 2512.19069 | null |
| 2025-12-22 | Finer-Personalization Rank: Fine-Grained Retrieval Examines Identity Preservation for Personalized Generation | Connor Kilrain et.al. | 2512.19026 | null |
| 2025-12-22 | The Erasure Illusion: Stress-Testing the Generalization of LLM Forgetting Evaluation | Hengrui Jia et.al. | 2512.19025 | null |
| 2025-12-22 | PEAK: A Performance Engineering AI-Assistant for GPU Kernels Powered by Natural Language Transformations | Muhammad Usman Tariq et.al. | 2512.19018 | null |
| 2025-12-22 | DREAM: Dynamic Red-teaming across Environments for AI Models | Liming Lu et.al. | 2512.19016 | null |
| 2025-12-22 | Efficient Jailbreak Mitigation Using Semantic Linear Classification in a Multi-Staged Pipeline | Akshaj Prashanth Rao et.al. | 2512.19011 | null |
| 2025-12-22 | Context-Aware Initialization for Reducing Generative Path Length in Diffusion Language Models | Tongyuan Miao et.al. | 2512.19004 | null |
| 2025-12-22 | Evaluating the Challenges of LLMs in Real-world Medical Follow-up: A Comparative Study and An Optimized Framework | Jinyan Liu et.al. | 2512.18999 | null |
| 2025-12-22 | R-GenIMA: Integrating Neuroimaging and Genetics with Interpretable Multimodal AI for Alzheimer’s Disease Progression | Kun Zhao et.al. | 2512.18986 | null |
| 2025-12-22 | Scrum Sprint Planning: LLM-based and algorithmic solutions | Yuwon Yoon et.al. | 2512.18966 | null |
| 2025-12-22 | Learning Hierarchical Procedural Memory for LLM Agents through Bayesian Selection and Contrastive Refinement | Saman Forouzandeh et.al. | 2512.18950 | null |
| 2025-12-22 | FASTRIC: Prompt Specification Language for Verifiable LLM Interactions | Wen-Long Jin et.al. | 2512.18940 | null |
| 2025-12-22 | When Less is More: 8-bit Quantization Improves Continual Learning in Large Language Models | Michael S. Zhang et.al. | 2512.18934 | null |
| 2025-12-21 | An Empirical Study of Developer-Provided Context for AI Coding Assistants in Open-Source Projects | Shaokang Jiang et.al. | 2512.18925 | null |
| 2025-12-21 | Delta-LLaVA: Base-then-Specialize Alignment for Token-Efficient Vision-Language Models | Mohamad Zamini et.al. | 2512.18910 | null |
| 2025-12-21 | Gabliteration: Adaptive Multi-Directional Neural Weight Modification for Selective Behavioral Alteration in Large Language Models | Gökdeniz Gülmez et.al. | 2512.18901 | null |
| 2025-12-21 | Can LLMs Estimate Student Struggles? Human-AI Difficulty Alignment with Proficiency Simulation for Item Difficulty Prediction | Ming Li et.al. | 2512.18880 | null |
| 2025-12-21 | CrashChat: A Multimodal Large Language Model for Multitask Traffic Crash Video Analysis | Kaidi Liang et.al. | 2512.18878 | null |
| 2025-12-21 | CORE: Concept-Oriented Reinforcement for Bridging the Definition-Application Gap in Mathematical Reasoning | Zijun Gao et.al. | 2512.18857 | null |
| 2025-12-21 | VizDefender: Unmasking Visualization Tampering through Proactive Localization and Intent Inference | Sicheng Song et.al. | 2512.18853 | null |
| 2025-12-21 | MDToC: Metacognitive Dynamic Tree of Concepts for Boosting Mathematical Problem-Solving of Large Language Models | Tung Duong Ta et.al. | 2512.18841 | null |
| 2025-12-21 | From Word to World: Can Large Language Models be Implicit Text-based World Models? | Yixia Li et.al. | 2512.18832 | null |
| 2025-12-21 | HARBOR: Holistic Adaptive Risk assessment model for BehaviORal healthcare | Aditya Siddhant et.al. | 2512.18829 | null |
| 2025-12-21 | “Even GPT Can Reject Me”: Conceptualizing Abrupt Refusal Secondary Harm (ARSH) and Reimagining Psychological AI Safety with Compassionate Completion Standard (CCS) | Yang Ni et.al. | 2512.18776 | null |
| 2025-12-21 | MEEA: Mere Exposure Effect-Driven Confrontational Optimization for LLM Jailbreaking | Jianyi Zhang et.al. | 2512.18755 | null |
| 2025-12-21 | Code2Doc: A Quality-First Curated Dataset for Code Documentation | Recep Kaan Karaman et.al. | 2512.18748 | null |
| 2025-12-21 | IPCV: Information-Preserving Compression for MLLM Visual Encoders | Yuan Chen et.al. | 2512.18747 | null |
| 2025-12-21 | MemEvolve: Meta-Evolution of Agent Memory Systems | Guibin Zhang et.al. | 2512.18746 | null |
| 2025-12-21 | Explainable and Fine-Grained Safeguarding of LLM Multi-Agent Systems via Bi-Level Graph Anomaly Detection | Junjun Pan et.al. | 2512.18733 | null |
| 2025-12-21 | A Theoretical Lens for RL-Tuned Language Models via Energy-Based Models | Zhiquan Tan et.al. | 2512.18730 | null |
| 2025-12-21 | Solver-Independent Automated Problem Formulation via LLMs for High-Cost Simulation-Driven Design | Yuchen Li et.al. | 2512.18682 | null |
| 2025-12-21 | Remoe: Towards Efficient and Low-Cost MoE Inference in Serverless Computing | Wentao Liu et.al. | 2512.18674 | null |
| 2025-12-21 | SmartSight: Mitigating Hallucination in Video-LLMs Without Compromising Video Understanding via Temporal Attention Collapse | Yiming Sun et.al. | 2512.18671 | null |
| 2025-12-21 | Tackling dataset curation challenges towards reliable machine learning: a case study on thermoelectric materials | Shoeb Athar et.al. | 2512.18653 | null |
| 2025-12-21 | LLM-CAS: Dynamic Neuron Perturbation for Real-Time Hallucination Correction | Jensen Zhang et.al. | 2512.18623 | null |
| 2025-12-21 | A Multi-agent Text2SQL Framework using Small Language Models and Execution Feedback | Thanh Dat Hoang et.al. | 2512.18622 | null |
| 2025-12-21 | A Comparative Study of Light-weight Language Models for PII Masking and their Deployment for Real Conversational Texts | Prabigya Acharya et.al. | 2512.18608 | null |
| 2025-12-21 | Reflective Confidence: Correcting Reasoning Flaws via Online Self-Correction | Qinglin Zeng et.al. | 2512.18605 | null |
| 2025-12-21 | SimpleCall: A Lightweight Image Restoration Agent in Label-Free Environments with MLLM Perceptual Feedback | Jianglin Lu et.al. | 2512.18599 | null |
| 2025-12-21 | Wireless Copilot: An AI-Powered Partner for Navigating Next-Generation Wireless Complexity | Haoxiang Luo et.al. | 2512.18582 | null |
| 2025-12-21 | ESearch-R1: Learning Cost-Aware MLLM Agents for Interactive Embodied Search via Reinforcement Learning | Weijie Zhou et.al. | 2512.18571 | null |
| 2025-12-21 | AI Code in the Wild: Measuring Security Risks and Ecosystem Shifts of AI-Generated Code in Modern Software | Bin Wang et.al. | 2512.18567 | null |
| 2025-12-21 | Vox Deorum: A Hybrid LLM Architecture for 4X / Grand Strategy Game AI – Lessons from Civilization V | John Chen et.al. | 2512.18564 | null |
| 2025-12-21 | OpenView: Empowering MLLMs with Out-of-view VQA | Qixiang Chen et.al. | 2512.18563 | link |
| 2025-12-18 | AdaTooler-V: Adaptive Tool-Use for Images and Videos | Chaoyang Wang et.al. | 2512.16918 | null |
| 2025-12-18 | Generative Adversarial Reasoner: Enhancing LLM Reasoning with Adversarial Reinforcement Learning | Qihao Liu et.al. | 2512.16917 | null |
| 2025-12-18 | Exploration v.s. Exploitation: Rethinking RLVR through Clipping, Entropy, and Spurious Reward | Peter Chen et.al. | 2512.16912 | null |
| 2025-12-18 | Impacts of Racial Bias in Historical Training Data for News AI | Rahul Bhargava et.al. | 2512.16901 | null |
| 2025-12-18 | Multimodal RewardBench 2: Evaluating Omni Reward Models for Interleaved Text and Image | Yushi Hu et.al. | 2512.16899 | null |
| 2025-12-18 | LinkedOut: Linking World Knowledge Representation Out of Video LLM for Next-Generation Video Recommendation | Haichao Zhang et.al. | 2512.16891 | null |
| 2025-12-18 | AdaSearch: Balancing Parametric Knowledge and Search in Large Language Models via Reinforcement Learning | Tzu-Han Lin et.al. | 2512.16883 | null |
| 2025-12-18 | TOGGLE: Temporal Logic-Guided Large Language Model Compression for Edge | Khurram Khalil et.al. | 2512.16855 | null |
| 2025-12-18 | Meta-RL Induces Exploration in Language Agents | Yulun Jiang et.al. | 2512.16848 | null |
| 2025-12-18 | Toward Systematic Counterfactual Fairness Evaluation of Large Language Models: The CAFFE Framework | Alessandra Parziale et.al. | 2512.16816 | null |
| 2025-12-18 | From Facts to Conclusions: Integrating Deductive Reasoning in Retrieval-Augmented LLMs | Shubham Mishra et.al. | 2512.16795 | null |
| 2025-12-18 | Inside Out: Uncovering How Comment Internalization Steers LLMs for Better or Worse | Aaron Imani et.al. | 2512.16790 | null |
| 2025-12-18 | Vision-Language-Action Models for Autonomous Driving: Past, Present, and Future | Tianshuai Hu et.al. | 2512.16760 | null |
| 2025-12-18 | Plausibility as Failure: How LLMs and Humans Co-Construct Epistemic Error | Claudia Vale Oliveira et.al. | 2512.16750 | null |
| 2025-12-18 | AI-Driven Prediction of Cancer Pain Episodes: A Hybrid Decision Support Approach | Yipeng Zhuang et.al. | 2512.16739 | null |
| 2025-12-18 | Cyber Humanism in Education: Reclaiming Agency through AI and Learning Sciences | Giovanni Adorni et.al. | 2512.16701 | null |
| 2025-12-18 | Do Multi-Agents Solve Better Than Single? Evaluating Agentic Frameworks for Diagram-Grounded Geometry Problem Solving and Reasoning | Mahbub E Sobhani et.al. | 2512.16698 | null |
| 2025-12-18 | DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI | Hao Liang et.al. | 2512.16676 | null |
| 2025-12-18 | Microsoft Academic Graph Information Retrieval for Research Recommendation and Assistance | Jacob Reiss et.al. | 2512.16661 | null |
| 2025-12-18 | Prefix Probing: Lightweight Harmful Content Detection for Large Language Models | Jirui Yang et.al. | 2512.16650 | null |
| 2025-12-18 | JustRL: Scaling a 1.5B LLM with a Simple RL Recipe | Bingxiang He et.al. | 2512.16649 | null |
| 2025-12-18 | Stackelberg Learning from Human Feedback: Preference Optimization as a Sequential Game | Barna Pásztor et.al. | 2512.16626 | null |
| 2025-12-18 | Refusal Steering: Fine-grained Control over LLM Refusal Behaviour for Sensitive Topics | Iker García-Ferrero et.al. | 2512.16602 | null |
| 2025-12-18 | Muon is Provably Faster with Momentum Variance Reduction | Xun Qian et.al. | 2512.16598 | null |
| 2025-12-18 | Sketch-in-Latents: Eliciting Unified Reasoning in MLLMs | Jintao Tong et.al. | 2512.16584 | null |
| 2025-12-18 | Non-Asymptotic Global Convergence of PPO-Clip | Yin Liu et.al. | 2512.16565 | null |
| 2025-12-18 | Needle in the Web: A Benchmark for Retrieving Targeted Web Pages in the Wild | Yumeng Wang et.al. | 2512.16553 | null |
| 2025-12-18 | A Systematic Study of Code Obfuscation Against LLM-based Vulnerability Detection | Xiao Li et.al. | 2512.16538 | null |
| 2025-12-18 | From Personalization to Prejudice: Bias and Discrimination in Memory-Enhanced AI Agents for Recruitment | Himanshu Gharat et.al. | 2512.16532 | null |
| 2025-12-18 | Scaling Laws for Energy Efficiency of Local LLMs | Ander Alvarez et.al. | 2512.16531 | null |
| 2025-12-18 | Plain language adaptations of biomedical text using LLMs: Comparison of evaluation metrics | Primoz Kocbek et.al. | 2512.16530 | null |
| 2025-12-18 | Efficient CPU-GPU Collaborative Inference for MoE-based LLMs on Memory-Limited Systems | En-Ming Huang et.al. | 2512.16473 | null |
| 2025-12-18 | cuPilot: A Strategy-Coordinated Multi-agent Framework for CUDA Kernel Evolution | Jinwu Chen et.al. | 2512.16465 | null |
| 2025-12-18 | TimeSeries2Report prompting enables adaptive large language model management of lithium-ion batteries | Jiayang Yang et.al. | 2512.16453 | null |
| 2025-12-18 | Towards AI-Supported Research: a Vision of the TIB AIssistant | Sören Auer et.al. | 2512.16447 | null |
| 2025-12-18 | Topic Modelling Black Box Optimization | Roman Akramov et.al. | 2512.16445 | null |
| 2025-12-18 | TIB AIssistant: a Platform for AI-Supported Research Across Research Life Cycles | Allard Oelen et.al. | 2512.16442 | null |
| 2025-12-18 | From Essence to Defense: Adaptive Semantic-aware Watermarking for Embedding-as-a-Service Copyright Protection | Hao Li et.al. | 2512.16439 | null |
| 2025-12-18 | Introducing ORKG ASK: an AI-driven Scholarly Literature Search and Exploration System Taking a Neuro-Symbolic Approach | Allard Oelen et.al. | 2512.16425 | null |
| 2025-12-18 | Synthelite: Chemist-aligned and feasibility-aware synthesis planning with LLMs | Nguyen Xuan-Vu et.al. | 2512.16424 | null |
| 2025-12-18 | Large Language Models as a (Bad) Security Norm in the Context of Regulation and Compliance | Kaspar Rosager Ludvigsen et.al. | 2512.16419 | null |
| 2025-12-18 | BrepLLM: Native Boundary Representation Understanding with Large Language Models | Liyuan Deng et.al. | 2512.16413 | null |
| 2025-12-18 | A Network Arena for Benchmarking AI Agents on Network Troubleshooting | Zhihao Wang et.al. | 2512.16381 | null |
| 2025-12-18 | Hearing to Translate: The Effectiveness of Speech Modality Integration into LLMs | Sara Papi et.al. | 2512.16378 | null |
| 2025-12-18 | Factorized Video Generation: Decoupling Scene Construction and Temporal Synthesis in Text-to-Video Diffusion Models | Mariam Hassan et.al. | 2512.16371 | null |
| 2025-12-18 | AI Needs Physics More Than Physics Needs AI | Peter Coveney et.al. | 2512.16344 | null |
| 2025-12-18 | Design and Evaluation of Cost-Aware PoQ for Decentralized LLM Inference | Arther Tian et.al. | 2512.16317 | null |
| 2025-12-18 | Agent Tools Orchestration Leaks More: Dataset, Benchmark, and Mitigation | Yuxuan Qiao et.al. | 2512.16310 | null |
| 2025-12-18 | PixelArena: A benchmark for Pixel-Precision Visual Intelligence | Feng Liang et.al. | 2512.16303 | null |
| 2025-12-18 | Code-in-the-Loop Forensics: Agentic Tool Use for Image Forgery Detection | Fanrui Zhang et.al. | 2512.16300 | null |
| 2025-12-18 | Feature-Selective Representation Misdirection for Machine Unlearning | Taozhao Chen et.al. | 2512.16297 | null |
| 2025-12-18 | MACL: Multi-Label Adaptive Contrastive Learning Loss for Remote Sensing Image Retrieval | Amna Amir et.al. | 2512.16294 | null |
| 2025-12-18 | Ein Typenrad auf der Überholspur: Die Kult-Schreibmaschine “Erika” trifft KI | Karola Köpferl et.al. | 2512.16293 | null |
| 2025-12-18 | In-Context Probing for Membership Inference in Fine-Tuned Language Models | Zhexi Lu et.al. | 2512.16292 | null |
| 2025-12-18 | Evaluating OpenAI GPT Models for Translation of Endangered Uralic Languages: A Comparison of Reasoning and Non-Reasoning Architectures | Yehor Tereshchenko et.al. | 2512.16287 | null |
| 2025-12-18 | CKA-Guided Modular Quantization: Beyond Bit-Width to Algorithmic Diversity | Jinhao Zhang et.al. | 2512.16282 | null |
| 2025-12-18 | Love, Lies, and Language Models: Investigating AI’s Role in Romance-Baiting Scams | Gilad Gressel et.al. | 2512.16280 | null |
| 2025-12-18 | QuadSentinel: Sequent Safety for Machine-Checkable Control in Multi-agent Systems | Yiliu Yang et.al. | 2512.16279 | null |
| 2025-12-18 | Fast Collaborative Inference via Distributed Speculative Decoding | Ce Zheng et.al. | 2512.16273 | null |
| 2025-12-18 | Beyond Blind Spots: Analytic Hints for Mitigating LLM-Based Evaluation Pitfalls | Ora Nova Fandina et.al. | 2512.16272 | null |
| 2025-12-18 | Learning to Wait: Synchronizing Agents with the Physical World | Yifei She et.al. | 2512.16262 | null |
| 2025-12-18 | AMUSE: Audio-Visual Benchmark and Alignment Framework for Agentic Multi-Speaker Understanding | Sanjoy Chowdhury et.al. | 2512.16250 | null |
| 2025-12-18 | AlignMerge - Alignment-Preserving Large Language Model Merging via Fisher-Guided Geometric Constraints | Aniruddha Roy et.al. | 2512.16245 | null |
| 2025-12-18 | Coarse-to-Fine Open-Set Graph Node Classification with Large Language Models | Xueqi Ma et.al. | 2512.16244 | null |
| 2025-12-18 | Trustworthy and Controllable Professional Knowledge Utilization in Large Language Models with TEE-GPU Execution | Yifeng Cai et.al. | 2512.16238 | null |
| 2025-12-18 | The Evolution of Reranking Models in Information Retrieval: From Heuristic Methods to Large Language Models | Tejul Pandit et.al. | 2512.16236 | null |
| 2025-12-18 | LoPA: Scaling dLLM Inference via Lookahead Parallel Decoding | Chenkai Xu et.al. | 2512.16229 | null |
| 2025-12-18 | An Information-Theoretic Framework for Robust Large Language Model Editing | Qizhou Chen et.al. | 2512.16227 | null |
| 2025-12-18 | DualGuard: Dual-stream Large Language Model Watermarking Defense against Paraphrase and Spoofing Attack | Hao Li et.al. | 2512.16182 | null |
| 2025-12-18 | Ev-Trust: A Strategy Equilibrium Trust Mechanism for Evolutionary Games in LLM-Based Multi-Agent Services | Shiduo Yang et.al. | 2512.16167 | null |
| 2025-12-18 | Staggered Batch Scheduling: Co-optimizing Time-to-First-Token and Throughput for High-Efficiency LLM Inference | Jian Tian et.al. | 2512.16134 | null |
| 2025-12-18 | Scaling Text2SQL via LLM-efficient Schema Filtering with Functional Dependency Graph Rerankers | Thanh Dat Hoang et.al. | 2512.16083 | null |
| 2025-12-18 | Auto-Vocabulary 3D Object Detection | Haomeng Zhang et.al. | 2512.16077 | null |
| 2025-12-18 | LLM4Perf: Large Language Models Are Effective Samplers for Multi-Objective Performance Modeling | Xin Wang et.al. | 2512.16070 | null |
| 2025-12-18 | A Multi-Agent Large Language Model Framework for Automated Qualitative Analysis | Qidi Xu et.al. | 2512.16063 | null |
| 2025-12-18 | ContextLeak: Auditing Leakage in Private In-Context Learning Methods | Jacob Choi et.al. | 2512.16059 | null |
| 2025-12-18 | MultiPath Transfer Engine: Breaking GPU and Host-Memory Bandwidth Bottlenecks in LLM Services | Lingfeng Tang et.al. | 2512.16056 | null |
| 2025-12-17 | Topic Discovery and Classification for Responsible Generative AI Adaptation in Higher Education | Diane Myung-kyung Woodbridge et.al. | 2512.16036 | null |
| 2025-12-17 | Do Large Language Models Know What They Don’t Know? Kalshibench: A New Benchmark for Evaluating Epistemic Calibration via Prediction Markets | Lukas Nel et.al. | 2512.16030 | null |
| 2025-12-17 | Cross-Language Bias Examination in Large Language Models | Yuxuan Liang et.al. | 2512.16029 | null |
| 2025-12-17 | Conversational Time Series Foundation Models: Towards Explainable and Effective Forecasting | Defu Cao et.al. | 2512.16022 | null |
| 2025-12-17 | Few-Shot Inference of Human Perceptions of Robot Performance in Social Navigation Scenarios | Qiping Zhang et.al. | 2512.16019 | null |
| 2025-12-17 | OLAF: Towards Robust LLM-Based Annotation Framework in Empirical Software Engineering | Mia Mohammad Imran et.al. | 2512.15979 | null |
| 2025-12-17 | Dynamic Rank Reinforcement Learning for Adaptive Low-Rank Multi-Head Self Attention in Large Language Models | Caner Erden et.al. | 2512.15973 | null |
| 2025-12-17 | BRAID: Bounded Reasoning for Autonomous Inference and Decisions | Armağan Amcalar et.al. | 2512.15959 | null |
| 2025-12-17 | The Perceptual Observatory Characterizing Robustness and Grounding in MLLMs | Tejas Anvekar et.al. | 2512.15949 | null |
| 2025-12-17 | Privacy Discourse and Emotional Dynamics in Mental Health Information Interaction on Reddit | Jai Kruthunz Naveen Kumar et.al. | 2512.15945 | null |
| 2025-12-17 | Small Language Models for Efficient Agentic Tool Calling: Outperforming Large Models with Targeted Fine-tuning | Polaris Jhandi et.al. | 2512.15943 | null |
| 2025-12-17 | City Navigation in the Wild: Exploring Emergent Navigation from Web-Scale Knowledge in MLLMs | Dwip Dalal et.al. | 2512.15933 | null |
| 2025-12-17 | DSO: Direct Steering Optimization for Bias Mitigation | Lucas Monteiro Paes et.al. | 2512.15926 | null |
| 2025-12-17 | Leveraging Spreading Activation for Improved Document Retrieval in Knowledge-Graph-Based RAG Systems | Jovan Pavlović et.al. | 2512.15922 | null |
| 2025-12-17 | TabReX: Tabular Referenceless eXplainable Evaluation | Tejas Anvekar et.al. | 2512.15907 | null |
| 2025-12-17 | Darth Vecdor: An Open-Source System for Generating Knowledge Graphs Through Large Language Model Queries | Jonathan A. Handler et.al. | 2512.15906 | null |
| 2025-12-17 | PediatricAnxietyBench: Evaluating Large Language Model Safety Under Parental Anxiety and Pressure in Pediatric Consultations | Vahideh Zolfaghari et.al. | 2512.15894 | null |
| 2025-12-17 | VET Your Agent: Towards Host-Independent Autonomy via Verifiable Execution Traces | Artem Grigor et.al. | 2512.15892 | null |
| 2025-12-17 | Seeing Beyond Words: Self-Supervised Visual Learning for Multimodal Large Language Models | Davide Caffagni et.al. | 2512.15885 | null |
| 2025-12-17 | HEPTAPOD: Orchestrating High Energy Physics Workflows Towards Autonomous Agency | Tony Menzo et.al. | 2512.15867 | null |
| 2025-12-17 | Dynamic Rebatching for Efficient Early-Exit Inference with DREX | Xuting Liu et.al. | 2512.15705 | null |
| 2025-12-17 | Skyra: AI-Generated Video Detection via Grounded Artifact Reasoning | Yifei Li et.al. | 2512.15693 | null |
| 2025-12-17 | Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning | Zhenwen Liang et.al. | 2512.15687 | null |
| 2025-12-17 | Activation Oracles: Training and Evaluating LLMs as General-Purpose Activation Explainers | Adam Karvonen et.al. | 2512.15674 | null |
| 2025-12-17 | Explaining the Reasoning of Large Language Models Using Attribution Graphs | Chase Walker et.al. | 2512.15663 | null |
| 2025-12-17 | Stepwise Think-Critique: A Unified Framework for Robust and Interpretable LLM Reasoning | Jiaqi Xu et.al. | 2512.15662 | null |
| 2025-12-17 | How Much is Too Much? Exploring LoRA Rank Trade-offs for Retaining Knowledge and Domain Robustness | Darshita Rathore et.al. | 2512.15634 | null |
| 2025-12-17 | Evaluating Metrics for Safety with LLM-as-Judges | Kester Clegg et.al. | 2512.15617 | null |
| 2025-12-17 | Behavior Tokens Speak Louder: Disentangled Explainable Recommendation with Behavior Vocabulary | Xinshun Feng et.al. | 2512.15614 | null |
| 2025-12-17 | Autoregressive Language Models are Secretly Energy-Based Models: Insights into the Lookahead Capabilities of Next-Token Prediction | Mathieu Blondel et.al. | 2512.15605 | null |
| 2025-12-17 | Evaluating Large Language Models in Scientific Discovery | Zhangde Song et.al. | 2512.15567 | null |
| 2025-12-17 | GRAN-TED: Generating Robust, Aligned, and Nuanced Text Embedding for Diffusion Models | Bozhou Li et.al. | 2512.15560 | null |
| 2025-12-17 | CTkvr: KV Cache Retrieval for Long-Context LLMs via Centroid then Token Indexing | Kuan Lu et.al. | 2512.15550 | null |
| 2025-12-17 | When a Nation Speaks: Machine Learning and NLP in People’s Sentiment Analysis During Bangladesh’s 2024 Mass Uprising | Md. Samiul Alim et.al. | 2512.15547 | null |
| 2025-12-17 | An Efficient and Effective Encoder Model for Vision and Language Tasks in the Remote Sensing Domain | João Daniel Silva et.al. | 2512.15531 | null |
| 2025-12-17 | EmoCaliber: Advancing Reliable Visual Emotion Comprehension via Confidence Verbalization and Calibration | Daiqing Wu et.al. | 2512.15528 | null |
| 2025-12-17 | How Do Semantically Equivalent Code Transformations Impact Membership Inference on LLMs for Code? | Hua Yang et.al. | 2512.15468 | null |
| 2025-12-17 | On Assessing the Relevance of Code Reviews Authored by Generative Models | Robert Heumüller et.al. | 2512.15466 | null |
| 2025-12-17 | Toward expert-level motivational interviewing for health behavior improvement with LLMs | Run-ze Hu et.al. | 2512.15446 | null |
| 2025-12-17 | Step-GUI Technical Report | Haolong Yan et.al. | 2512.15431 | null |
| 2025-12-17 | Can AI Generate more Comprehensive Test Scenarios? Review on Automated Driving Systems Test Scenario Generation Methods | Ji Zhou et.al. | 2512.15422 | null |
| 2025-12-17 | Bilateral Spatial Reasoning about Street Networks: Graph-based RAG with Qualitative Spatial Representations | Reinhard Moratz et.al. | 2512.15388 | null |
| 2025-12-17 | MedNuggetizer: Confidence-Based Information Nugget Extraction from Medical Documents | Gregor Donabauer et.al. | 2512.15384 | null |
| 2025-12-17 | SCOPE: Prompt Evolution for Enhancing Agent Effectiveness | Zehua Pei et.al. | 2512.15374 | null |
| 2025-12-17 | ArcBERT: An LLM-based Search Engine for Exploring Integrated Multi-Omics Metadata | Gajendra Doniparthi et.al. | 2512.15365 | null |
| 2025-12-17 | Revisiting Task-Oriented Dataset Search in the Era of Large Language Models: Challenges, Benchmark, and Solution | Zixin Wei et.al. | 2512.15363 | null |
| 2025-12-17 | Dual-Density Inference for Efficient Language Model Reasoning | Zhengyi Zhao et.al. | 2512.15358 | null |
| 2025-12-17 | Adversarial versification in Portuguese as a jailbreak operator in LLMs | Joao Queiroz et.al. | 2512.15353 | null |
| 2025-12-17 | Exploring User Acceptance and Concerns toward LLM-powered Conversational Agents in Immersive Extended Reality | Efe Bozkir et.al. | 2512.15343 | null |
| 2025-12-17 | Evaluating LLMs for Zeolite Synthesis Event Extraction (ZSEE): A Systematic Analysis of Prompting Strategies | Charan Prakash Rathore et.al. | 2512.15312 | null |
| 2025-12-17 | SynthSeg-Agents: Multi-Agent Synthetic Data Generation for Zero-Shot Weakly Supervised Semantic Segmentation | Wangyu Wu et.al. | 2512.15310 | null |
| 2025-12-17 | Towards Proactive Personalization through Profile Customization for Individual Users in Dialogues | Xiaotian Zhang et.al. | 2512.15302 | null |
| 2025-12-17 | ChatGPT and Gemini participated in the Korean College Scholastic Ability Test – Earth Science I | Seok-Hyun Ga et.al. | 2512.15298 | null |
| 2025-12-17 | Heterogeneous Model Alignment in Digital Twin | Faima Abbasi et.al. | 2512.15281 | null |
| 2025-12-17 | Bounty Hunter: Autonomous, Comprehensive Emulation of Multi-Faceted Adversaries | Louis Hackländer-Jansen et.al. | 2512.15275 | null |
| 2025-12-17 | Well Begun, Half Done: Reinforcement Learning with Prefix Optimization for LLM Reasoning | Yiliu Sun et.al. | 2512.15274 | null |
| 2025-12-17 | Gaming the Arena: AI Model Evaluation and the Viral Capture of Attention | Sam Hind et.al. | 2512.15252 | null |
| 2025-12-17 | The Moralization Corpus: Frame-Based Annotation and Analysis of Moralizing Speech Acts across Diverse Text Genres | Maria Becker et.al. | 2512.15248 | null |
| 2025-12-17 | Null-LoRA: Low-Rank Adaptation on Null Space | Yi Zhang et.al. | 2512.15233 | null |
| 2025-12-17 | CangLing-KnowFlow: A Unified Knowledge-and-Flow-fused Agent for Comprehensive Remote Sensing Applications | Zhengchao Chen et.al. | 2512.15231 | null |
| 2025-12-17 | Yes-MT’s Submission to the Low-Resource Indic Language Translation Shared Task in WMT 2024 | Yash Bhaskar et.al. | 2512.15226 | null |
| 2025-12-17 | RFKG-CoT: Relation-Driven Adaptive Hop-count Selection and Few-Shot Path Guidance for Knowledge-Aware QA | Chao Zhang et.al. | 2512.15219 | null |
| 2025-12-17 | DEER: Draft with Diffusion, Verify with Autoregressive Models | Zicong Cheng et.al. | 2512.15176 | null |
| 2025-12-17 | MCP-SafetyBench: A Benchmark for Safety Evaluation of Large Language Models with Real-World MCP Servers | Xuanjun Zong et.al. | 2512.15163 | null |
| 2025-12-17 | Offline Multi-Task Multi-Objective Data-Driven Evolutionary Algorithm with Language Surrogate Model and Implicit Q-Learning | Xian-Rong Zhang et.al. | 2512.15149 | null |
| 2025-12-17 | Aligning Academia with Industry: An Empirical Study of Industrial Needs and Academic Capabilities in AI-Driven Software Engineering | Hang Yu et.al. | 2512.15148 | null |
| 2025-12-17 | Beyond Majority Voting: Towards Fine-grained and More Reliable Reward Signal for Test-Time Reinforcement Learning | Weiqin Wang et.al. | 2512.15146 | null |
| 2025-12-17 | “I am here for you”: How relational conversational AI appeals to adolescents, especially those who are socially and emotionally vulnerable | Pilyoung Kim et.al. | 2512.15117 | null |
| 2025-12-17 | Uni-Parser Technical Report | Xi Fang et.al. | 2512.15098 | null |
| 2025-12-17 | Beyond Fast and Slow: Cognitive-Inspired Elastic Reasoning for Large Language Models | Jinwu Hu et.al. | 2512.15089 | null |
| 2025-12-17 | The Semantic Architect: How FEAML Bridges Structured Data and LLMs for Multi-Label Tasks | Wanfu Gao et.al. | 2512.15082 | null |
| 2025-12-17 | Quantifying Return on Security Controls in LLM Systems | Richard Helder Moulton et.al. | 2512.15081 | null |
| 2025-12-17 | An Exploratory Study of Bayesian Prompt Optimization for Test-Driven Code Generation with Large Language Models | Shlok Tomar et.al. | 2512.15076 | null |
| 2025-12-17 | The Meta-Prompting Protocol: Orchestrating LLMs via Adversarial Feedback Loops | Fanzhe Fu et.al. | 2512.15053 | null |
| 2025-12-17 | SGM: Safety Glasses for Multimodal Large Language Models via Neuron-Level Detoxification | Hongbo Wang et.al. | 2512.15052 | null |
| 2025-12-17 | Beyond Accuracy: A Geometric Stability Analysis of Large Language Models in Chess Evaluation | Xidan Song et.al. | 2512.15033 | null |
| 2025-12-17 | Toxicity Ahead: Forecasting Conversational Derailment on GitHub | Mia Mohammad Imran et.al. | 2512.15031 | null |
| 2025-12-17 | SeBERTis: A Framework for Producing Classifiers of Security-Related Issue Reports | Sogol Masoumzadeh et.al. | 2512.15003 | null |
| 2025-12-17 | DreamPRM-Code: Function-as-Step Process Reward Model with Label Correction for LLM Coding | Ruiyi Zhang et.al. | 2512.15000 | null |
| 2025-12-17 | Evaluating Large Language Models on Multimodal Chemistry Olympiad Exams | Yiming Cui et.al. | 2512.14989 | null |
| 2025-12-16 | EVICPRESS: Joint KV-Cache Compression and Eviction for Efficient LLM Serving | Shaoting Feng et.al. | 2512.14946 | null |
| 2025-12-16 | Parameter Efficient Multimodal Instruction Tuning for Romanian Vision Language Models | George-Andrei Dima et.al. | 2512.14926 | null |
| 2025-12-16 | Multiscale Aggregated Hierarchical Attention (MAHA): A Game Theoretic and Optimization Driven Approach to Efficient Contextual Modeling in Large Language Models | Caner Erden et.al. | 2512.14925 | null |
| 2025-12-16 | Evaluating Code Reasoning Abilities of Large Language Models Under Real-World Settings | Changshu Liu et.al. | 2512.14917 | null |
| 2025-12-16 | DrugRAG: Enhancing Pharmacy LLM Performance Through A Novel Retrieval-Augmented Generation Pipeline | Houman Kazemzadeh et.al. | 2512.14896 | null |
| 2025-12-16 | Integrating Large Language Models and Knowledge Graphs to Capture Political Viewpoints in News Media | Massimiliano Fadda et.al. | 2512.14887 | null |
| 2025-12-16 | Entropy-Reservoir Bregman Projection: An Information-Geometric Unification of Model Collapse | Jingwei Chen et.al. | 2512.14879 | null |
| 2025-12-16 | Isolated Sign Language Recognition with Segmentation and Pose Estimation | Daniel Perkins et.al. | 2512.14876 | null |
| 2025-12-16 | HERBench: A Benchmark for Multi-Evidence Integration in Video Question Answering | Dan Ben-Ami et.al. | 2512.14870 | null |
| 2025-12-16 | MALCDF: A Distributed Multi-Agent LLM Framework for Real-Time Cyber | Arth Bhardwaj et.al. | 2512.14846 | null |
| 2025-12-16 | Sharing State Between Prompts and Programs | Ellie Y. Cheng et.al. | 2512.14805 | null |
| 2025-12-16 | Incentives or Ontology? A Structural Rebuttal to OpenAI’s Hallucination Thesis | Richard Ackermann et.al. | 2512.14801 | null |
| 2025-12-16 | IaC Generation with LLMs: An Error Taxonomy and A Study on Configuration Knowledge Injection | Roman Nekrasov et.al. | 2512.14792 | null |
| 2025-12-16 | TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs | Jun Zhang et.al. | 2512.14698 | null |
| 2025-12-16 | Fast and Accurate Causal Parallel Decoding using Jacobi Forcing | Lanxiang Hu et.al. | 2512.14681 | null |
| 2025-12-16 | EVOLVE-VLA: Test-Time Training from Environment Feedback for Vision-Language-Action Models | Zechen Bai et.al. | 2512.14666 | null |
| 2025-12-16 | Enhancing Visual Sentiment Analysis via Semiotic Isotopy-Guided Dataset Construction | Marco Blanchini et.al. | 2512.14665 | null |
| 2025-12-16 | Focus: A Streaming Concentration Architecture for Efficient Vision-Language Models | Chiyue Wei et.al. | 2512.14661 | null |
| 2025-12-16 | Beyond Text-to-SQL: Autonomous Research-Driven Database Exploration with DAR | Ostap Vykhopen et.al. | 2512.14622 | null |
| 2025-12-16 | PerProb: Indirectly Evaluating Memorization in Large Language Models | Yihan Liao et.al. | 2512.14600 | null |
| 2025-12-16 | LLM-driven Knowledge Enhancement for Multimodal Cancer Survival Prediction | Chenyu Zhao et.al. | 2512.14594 | null |
| 2025-12-16 | Towards Nepali-language LLMs: Efficient GPT training with a Nepali BPE tokenizer | Adarsha Shrestha et.al. | 2512.14585 | null |
| 2025-12-16 | Pairwise Comparison for Bias Identification and Quantification | Fabian Haak et.al. | 2512.14565 | null |
| 2025-12-16 | Polypersona: Persona-Grounded LLM for Synthetic Survey Responses | Tejaswani Dash et.al. | 2512.14562 | null |
| 2025-12-16 | Agreement Between Large Language Models and Human Raters in Essay Scoring: A Research Synthesis | Hongli Li et.al. | 2512.14561 | null |
| 2025-12-16 | CLNet: Cross-View Correspondence Makes a Stronger Geo-Localizationer | Xianwei Cao et.al. | 2512.14560 | null |
| 2025-12-16 | VLegal-Bench: Cognitively Grounded Benchmark for Vietnamese Legal Reasoning of Large Language Models | Nguyen Tien Dong et.al. | 2512.14554 | null |
| 2025-12-16 | VersatileFFN: Achieving Parameter Efficiency in LLMs via Adaptive Wide-and-Deep Reuse | Ying Nie et.al. | 2512.14531 | null |
| 2025-12-16 | RecGPT-V2 Technical Report | Chao Yi et.al. | 2512.14503 | null |
| 2025-12-16 | C-ing Clearly: Enhanced Binary Code Explanations using C code | Teodor Poncu et.al. | 2512.14500 | null |
| 2025-12-16 | SASQ: Static Activation Scaling for Quantization-Aware Training in Large Language Models | Shizhuo Mao et.al. | 2512.14481 | null |
| 2025-12-16 | Model-First Reasoning LLM Agents: Reducing Hallucinations through Explicit Problem Modeling | Annu Rana et.al. | 2512.14474 | null |
| 2025-12-16 | Reasoning-Style Poisoning of LLM Agents via Stealthy Style Transfer: Process-Level Attacks and Runtime Monitoring in RSV Space | Xingfu Zhou et.al. | 2512.14448 | null |
| 2025-12-16 | Seismology modeling agent: A smart assistant for geophysical researchers | Yukun Ren et.al. | 2512.14429 | null |
| 2025-12-16 | Effect of Document Packing on the Latent Multi-Hop Reasoning Capabilities of Large Language Models | Gabriele Prato et.al. | 2512.14427 | null |
| 2025-12-16 | DISCODE: Distribution-Aware Score Decoder for Robust Automatic Evaluation of Image Captioning | Nakamasa Inoue et.al. | 2512.14420 | null |
| 2025-12-16 | PortAgent: LLM-driven Vehicle Dispatching Agent for Port Terminals | Jia Hu et.al. | 2512.14417 | null |
| 2025-12-16 | Massive Editing for Large Language Models Based on Dynamic Weight Generation | Wentao Wan et.al. | 2512.14395 | null |
| 2025-12-16 | RePo: Language Models with Context Re-Positioning | Huayang Li et.al. | 2512.14391 | null |
| 2025-12-16 | Multi-Agent Medical Decision Consensus Matrix System: An Intelligent Collaborative Framework for Oncology MDT Consultations | Xudong Han et.al. | 2512.14321 | null |
| 2025-12-16 | Semantic Mismatch and Perceptual Degradation: A New Perspective on Image Editing Immunity | Shuai Dong et.al. | 2512.14320 | null |
| 2025-12-16 | Inflation Attitudes of Large Language Models | Nikoleta Anesti et.al. | 2512.14306 | null |
| 2025-12-16 | Leveraging LLMs for Collaborative Ontology Engineering in Parkinson Disease Monitoring and Alerting | Georgios Bouchouras et.al. | 2512.14288 | null |
| 2025-12-16 | The Trust in AI-Generated Health Advice (TAIGHA) Scale and Short Version (TAIGHA-S): Development and Validation Study | Marvin Kopka et.al. | 2512.14278 | null |
| 2025-12-16 | SPARQL-LLM: Real-Time SPARQL Query Generation from Natural Language Questions | Panayiotis Smeros et.al. | 2512.14277 | null |
| 2025-12-16 | Enhancing Visual Programming for Visual Reasoning via Probabilistic Graphs | Wentao Wan et.al. | 2512.14257 | null |
| 2025-12-16 | TEMP: A Memory Efficient Physical-aware Tensor Partition-Mapping Framework on Wafer-scale Chips | Huizheng Wang et.al. | 2512.14256 | null |
| 2025-12-16 | From Context to EDUs: Faithful and Structured Context Compression via Elementary Discourse Unit Decomposition | Yiqing Zhou et.al. | 2512.14244 | null |
| 2025-12-16 | Two CFG Nahuatl for automatic corpora expansion | Juan-José Guzmán-Landa et.al. | 2512.14239 | null |
| 2025-12-16 | Ladder Up, Memory Down: Low-Cost Fine-Tuning With Side Nets | Estelle Zheng et.al. | 2512.14237 | null |
| 2025-12-16 | PentestEval: Benchmarking LLM-based Penetration Testing with Modular and Stage-Level Design | Ruozhao Yang et.al. | 2512.14233 | null |
| 2025-12-16 | Georeferencing complex relative locality descriptions with large language models | Aneesha Fernando et.al. | 2512.14228 | null |
| 2025-12-16 | Estimating problem difficulty without ground truth using Large Language Model comparisons | Marthe Ballon et.al. | 2512.14220 | null |
| 2025-12-16 | IntentMiner: Intent Inversion Attack via Tool Call Analysis in the Model Context Protocol | Yunhao Yao et.al. | 2512.14166 | null |
| 2025-12-16 | Adaptive Cache Pollution Control for Large Language Model Inference Workloads Using Temporal CNN-Based Prediction and Priority-Aware Replacement | Songze Liu et.al. | 2512.14151 | null |
| 2025-12-16 | Astraea: A State-Aware Scheduling Engine for LLM-Powered Agents | Hongqiu Ni et.al. | 2512.14142 | null |
| 2025-12-16 | TorchTraceAP: A New Benchmark Dataset for Detecting Performance Anti-Patterns in Computer Vision Models | Hanning Chen et.al. | 2512.14141 | null |
| 2025-12-16 | LAPPI: Interactive Optimization with LLM-Assisted Preference-Based Problem Instantiation | So Kuroki et.al. | 2512.14138 | null |
| 2025-12-16 | SportsGPT: An LLM-driven Framework for Interpretable Sports Motion Assessment and Training Guidance | Wenbo Tian et.al. | 2512.14121 | null |
| 2025-12-16 | CogMem: A Cognitive Memory Architecture for Sustained Multi-Turn Reasoning in Large Language Models | Yiran Zhang et.al. | 2512.14118 | null |
| 2025-12-16 | Neurosymbolic Inference On Foundation Models For Remote Sensing Text-to-image Retrieval With Complex Queries | Emanuele Mezzi et.al. | 2512.14102 | null |
| 2025-12-16 | A First-Order Logic-Based Alternative to Reward Models in RLHF | Chunjin Jian et.al. | 2512.14100 | null |
| 2025-12-16 | Cornserve: Efficiently Serving Any-to-Any Multimodal Models | Jeff J. Ma et.al. | 2512.14098 | null |
| 2025-12-16 | A Unified Sparse Attention via Multi-Granularity Compression | Siran Liu et.al. | 2512.14082 | null |
| 2025-12-16 | From Obfuscated to Obvious: A Comprehensive JavaScript Deobfuscation Tool for Security Analysis | Dongchao Zhou et.al. | 2512.14070 | null |
| 2025-12-16 | RADAR: Accelerating Large Language Model Inference With RL-Based Dynamic Draft Trees | Junjie Ma et.al. | 2512.14069 | null |
| 2025-12-16 | What Affects the Effective Depth of Large Language Models? | Yi Hu et.al. | 2512.14064 | null |
| 2025-12-16 | HyperVL: An Efficient and Dynamic Multimodal Large Language Model for Edge Devices | HyperAI Team et.al. | 2512.14052 | null |
| 2025-12-16 | OpenDataArena: A Fair and Open Arena for Benchmarking Post-Training Dataset Value | Mengzhang Cai et.al. | 2512.14051 | null |
| 2025-12-16 | Intention Chain-of-Thought Prompting with Dynamic Routing for Code Generation | Shen Li et.al. | 2512.14048 | null |
| 2025-12-16 | Evaluating Small Language Models for Agentic On-Farm Decision Support Systems | Enhong Liu et.al. | 2512.14043 | null |
| 2025-12-16 | ChartAgent: A Chart Understanding Framework with Tool Integrated Reasoning | Boran Wang et.al. | 2512.14040 | null |
| 2025-12-16 | PerfCoder: Large Language Models for Interpretable Code Performance Optimization | Jiuding Yang et.al. | 2512.14018 | null |
| 2025-12-16 | KFS-Bench: Comprehensive Evaluation of Key Frame Sampling in Long Video Understanding | Zongyao Li et.al. | 2512.14017 | null |
| 2025-12-16 | Sparsity-Controllable Dynamic Top-p MoE for Large Foundation Model Pre-training | Can Jin et.al. | 2512.13996 | null |
| 2025-12-16 | Structure-Aware Decoding Mechanisms for Complex Entity Extraction with Large-Scale Language Models | Zhimin Qiu et.al. | 2512.13980 | null |
| 2025-12-16 | ReflCtrl: Controlling LLM Reflection via Representation Engineering | Ge Yan et.al. | 2512.13979 | null |
| 2025-12-16 | Evaluating Frontier LLMs on PhD-Level Mathematical Reasoning: A Benchmark on a Textbook in Theoretical Computer Science about Randomized Algorithms | Yang Cao et.al. | 2512.13978 | null |
| 2025-12-16 | Autonomous Construction-Site Safety Inspection Using Mobile Robots: A Multilayer VLM-LLM Pipeline | Hossein Naderi et.al. | 2512.13974 | null |
| 2025-12-15 | Informing Acquisition Functions via Foundation Models for Molecular Discovery | Qi Chen et.al. | 2512.13935 | null |
| 2025-12-15 | Hierarchical Multi-agent Large Language Model Reasoning for Autonomous Functional Materials Discovery | Samuel Rothfarb et.al. | 2512.13930 | null |
| 2025-12-15 | Context Branching for LLM Conversations: A Version Control Approach to Exploratory Programming | Bhargav Chickmagalur Nanjundappa et.al. | 2512.13914 | null |
| 2025-12-15 | FiNERweb: Datasets and Artifacts for Scalable Multilingual Named Entity Recognition | Jonas Golde et.al. | 2512.13884 | null |
| 2025-12-15 | Verification-Guided Context Optimization for Tool Calling via Hierarchical LLMs-as-Editors | Henger Li et.al. | 2512.13860 | null |
| 2025-12-15 | EvoLattice: Persistent Internal-Population Evolution through Multi-Alternative Quality-Diversity Graph Representations for LLM-Guided Program Discovery | Kamer Ali Yuksel et.al. | 2512.13857 | null |
| 2025-12-15 | Practitioner Insights on Fairness Requirements in the AI Development Life Cycle: An Interview Study | Chaima Boufaied et.al. | 2512.13830 | null |
| 2025-12-15 | The Double Life of Code World Models: Provably Unmasking Malicious Behavior Through Execution Traces | Subramanyam Sahoo et.al. | 2512.13821 | null |
| 2025-12-15 | State-Dependent Refusal and Learned Incapacity in RLHF-Aligned Language Models | TK Lee et.al. | 2512.13762 | null |
| 2025-12-15 | A Scientific Reasoning Model for Organic Synthesis Procedure Generation | Guoqing Liu et.al. | 2512.13668 | null |
| 2025-12-15 | Embedding-Based Rankings of Educational Resources based on Learning Outcome Alignment: Benchmarking, Expert Validation, and Learner Performance | Mohammadreza Molavi et.al. | 2512.13658 | null |
| 2025-12-15 | Comparative Analysis of LLM Abliteration Methods: A Cross-Architecture Evaluation | Richard J. Young et.al. | 2512.13655 | null |
| 2025-12-15 | Large-Language Memorization During the Classification of United States Supreme Court Cases | John E. Ortega et.al. | 2512.13654 | null |
| 2025-12-15 | MindDrive: A Vision-Language-Action Model for Autonomous Driving via Online Reinforcement Learning | Haoyu Fu et.al. | 2512.13636 | null |
| 2025-12-15 | Temporal Tokenization Strategies for Event Sequence Modeling with Large Language Models | Zefang Liu et.al. | 2512.13618 | null |
| 2025-12-15 | Textual Gradients are a Flawed Metaphor for Automatic Prompt Optimization | Daniel Melcer et.al. | 2512.13598 | null |
| 2025-12-15 | ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding | Jia-Nan Li et.al. | 2512.13586 | null |
| 2025-12-15 | MMhops-R1: Multimodal Multi-hop Reasoning | Tao Zhang et.al. | 2512.13573 | null |
| 2025-12-15 | PrahokBART: A Pre-trained Sequence-to-Sequence Model for Khmer Natural Language Generation | Hour Kaing et.al. | 2512.13552 | null |
| 2025-12-15 | Fine-tuned LLM-based Code Migration Framework | Oleg Grynets et.al. | 2512.13515 | null |
| 2025-12-15 | MedCEG: Reinforcing Verifiable Medical Reasoning with Critical Evidence Graph | Linjie Mu et.al. | 2512.13510 | null |
| 2025-12-15 | SkipCat: Rank-Maximized Low-Rank Compression of Large Language Models via Shared Projection and Block Skipping | Yu-Chen Lu et.al. | 2512.13494 | null |
| 2025-12-15 | From Zipf’s Law to Neural Scaling through Heaps’ Law and Hilberg’s Hypothesis | Łukasz Dębowski et.al. | 2512.13491 | null |
| 2025-12-15 | neuralFOMO: Can LLMs Handle Being Second Best? Measuring Envy-Like Preferences in Multi-Agent Settings | Ojas Pungalia et.al. | 2512.13481 | null |
| 2025-12-15 | Non-Resolution Reasoning (NRR): A Computational Framework for Contextual Identity and Ambiguity Preservation | Kei Saito et.al. | 2512.13478 | null |
| 2025-12-15 | Scaling Laws for Code: Every Programming Language Matters | Jian Yang et.al. | 2512.13472 | null |
| 2025-12-15 | Large language models are not about natural language | Johan J. Bolhuis et.al. | 2512.13441 | null |
| 2025-12-15 | From User Interface to Agent Interface: Efficiency Optimization of UI Representations for LLM Agents | Dezhi Ran et.al. | 2512.13438 | null |
| 2025-12-15 | Behavior and Representation in Large Language Models for Combinatorial Optimization: From Feature Extraction to Algorithm Selection | Francesca Da Ros et.al. | 2512.13374 | null |
| 2025-12-15 | Detecting Emotion Drift in Mental Health Text Using Pre-Trained Transformers | Shibani Sankpal et.al. | 2512.13363 | null |
| 2025-12-15 | UCRBench: Benchmarking LLMs on Use Case Recovery | Shuyuan Xiao et.al. | 2512.13360 | null |
| 2025-12-15 | On the Effectiveness of Membership Inference in Targeted Data Extraction from Large Language Models | Ali Al Sahili et.al. | 2512.13352 | null |
| 2025-12-15 | FROC: A Unified Framework with Risk-Optimized Control for Machine Unlearning in LLMs | Si Qi Goh et.al. | 2512.13337 | null |
| 2025-12-15 | FIN-bench-v2: A Unified and Robust Benchmark Suite for Evaluating Finnish Large Language Models | Joona Kytöniemi et.al. | 2512.13330 | null |
| 2025-12-15 | Security and Detectability Analysis of Unicode Text Watermarking Methods Against Large Language Models | Malte Hellmeier et.al. | 2512.13325 | null |
| 2025-12-15 | KlingAvatar 2.0 Technical Report | Kling Team et.al. | 2512.13313 | null |
| 2025-12-15 | MiniLingua: A Small Open-Source LLM for European Languages | Anna Aksenova et.al. | 2512.13298 | null |
| 2025-12-15 | AutoTool: Dynamic Tool Selection and Integration for Agentic Reasoning | Jiaru Zou et.al. | 2512.13278 | null |
| 2025-12-15 | CogniEdit: Dense Gradient Flow Optimization for Fine-Grained Image Editing | Yan Li et.al. | 2512.13276 | null |
| 2025-12-15 | Toward Ambulatory Vision: Learning Visually-Grounded Active View Selection | Juil Koo et.al. | 2512.13250 | null |
| 2025-12-15 | Ego-EXTRA: video-language Egocentric Dataset for EXpert-TRAinee assistance | Francesco Ragusa et.al. | 2512.13238 | null |
| 2025-12-15 | Efficient Adaptive Rejection Sampling for Accelerating Speculative Decoding in Large Language Models | Chendong Sun et.al. | 2512.13194 | null |
| 2025-12-15 | Integrated Semantic and Temporal Alignment for Interactive Video Retrieval | Thanh-Danh Luu et.al. | 2512.13169 | null |
| 2025-12-15 | A Semantically Enhanced Generative Foundation Model Improves Pathological Image Synthesis | Xianchao Guan et.al. | 2512.13164 | null |
| 2025-12-15 | Can AI Understand What We Cannot Say? Measuring Multilevel Alignment Through Abortion Stigma Across Cognitive, Interpersonal, and Structural Levels | Anika Sharma et.al. | 2512.13142 | null |
| 2025-12-15 | Uncovering the Role of Initial Saliency in U-Shaped Attention Bias: Scaling Initial Token Weight for Enhanced Long-Text Processing | Zewen Qiang et.al. | 2512.13109 | null |
| 2025-12-15 | Socratic Students: Teaching Language Models to Learn by Asking Questions | Rajeev Bhatt Ambati et.al. | 2512.13102 | null |
| 2025-12-15 | A Simple and Effective Framework for Symmetric Consistent Indexing in Large-Scale Dense Retrieval | Huimu Wang et.al. | 2512.13074 | null |
| 2025-12-15 | M-GRPO: Stabilizing Self-Supervised Reinforcement Learning for Large Language Models with Momentum-Anchored Policy Optimization | Bizhe Bai et.al. | 2512.13070 | null |
| 2025-12-15 | LLM Rationalis? Measuring Bargaining Capabilities of AI Negotiators | Cheril Shah et.al. | 2512.13063 | null |
| 2025-12-15 | An Open and Reproducible Deep Research Agent for Long-Form Question Answering | Ikuya Yamada et.al. | 2512.13059 | null |
| 2025-12-15 | Sharpen the Spec, Cut the Code: A Case for Generative File System with SYSSPEC | Qingyuan Liu et.al. | 2512.13047 | null |
| 2025-12-15 | Understanding Structured Financial Data with LLMs: A Case Study on Fraud Detection | Xuwei Tan et.al. | 2512.13040 | null |
| 2025-12-15 | Large Language Models for Power System Applications: A Comprehensive Literature Survey | Muhammad Sarwar et.al. | 2512.13004 | null |
| 2025-12-15 | Are Large Language Models Really Effective for Training-Free Cold-Start Recommendation? | Genki Kusano et.al. | 2512.13001 | null |
| 2025-12-15 | Reveal Hidden Pitfalls and Navigate Next Generation of Vector Similarity Search from Task-Centric Views | Tingyang Chen et.al. | 2512.12980 | null |
| 2025-12-15 | Do Reviews Matter for Recommendations in the Era of Large Language Models? | Chee Heng Tan et.al. | 2512.12978 | null |
| 2025-12-15 | Authors Should Annotate | Marcus Ma et.al. | 2512.12976 | null |
| 2025-12-15 | Database Research needs an Abstract Relational Query Language | Wolfgang Gatterbauer et.al. | 2512.12957 | null |
| 2025-12-15 | Building from Scratch: A Multi-Agent Framework with Human-in-the-Loop for Multilingual Legal Terminology Mapping | Lingyi Meng et.al. | 2512.12950 | null |
| 2025-12-15 | SPAR: Session-based Pipeline for Adaptive Retrieval on Legacy File Systems | Duy A. Nguyen et.al. | 2512.12938 | null |
| 2025-12-15 | PROSERVE: Unified Multi-Priority Request Scheduling for LLM Serving | Weizhe Huang et.al. | 2512.12928 | null |
| 2025-12-15 | Interpretable Hypothesis-Driven Trading: A Rigorous Walk-Forward Validation Framework for Market Microstructure Signals | Gagan Deep et.al. | 2512.12924 | null |
| 2025-12-15 | LLM-based Personalized Portfolio Recommender: Integrating Large Language Models and Reinforcement Learning for Intelligent Investment Strategy Optimization | Bangyu Li et.al. | 2512.12922 | null |
| 2025-12-15 | Cisco Integrated AI Security and Safety Framework Report | Amy Chang et.al. | 2512.12921 | null |
| 2025-12-15 | CTIGuardian: A Few-Shot Framework for Mitigating Privacy Leakage in Fine-Tuned LLMs | Shashie Dilhara Batan Arachchige et.al. | 2512.12914 | null |
| 2025-12-14 | SignRAG: A Retrieval-Augmented System for Scalable Zero-Shot Road Sign Recognition | Minghao Zhu et.al. | 2512.12885 | null |
| 2025-12-14 | ERA-IT: Aligning Semantic Models with Revealed Economic Preference for Real-Time and Explainable Patent Valuation | Yoo Yongmin et.al. | 2512.12869 | null |
| 2025-12-14 | Counting Clues: A Lightweight Probabilistic Baseline Can Match an LLM | Furong Jia et.al. | 2512.12868 | null |
| 2025-12-14 | Information-Consistent Language Model Recommendations through Group Relative Policy Optimization | Sonal Prabhune et.al. | 2512.12858 | null |
| 2025-12-14 | Does Tone Change the Answer? Evaluating Prompt Politeness Effects on Modern LLMs: GPT, Gemini, LLaMA | Hanyu Cai et.al. | 2512.12812 | null |
| 2025-12-14 | Fault-Tolerant Sandboxing for AI Coding Agents: A Transactional Approach to Safe Autonomous Execution | Boyang Yan et.al. | 2512.12806 | null |
| 2025-12-14 | A Disproof of Large Language Model Consciousness: The Necessity of Continual Learning for Consciousness | Erik Hoel et.al. | 2512.12802 | null |
| 2025-12-14 | Fine-Grained Energy Prediction For Parallellized LLM Inference With PIE-P | Anurag Dutt et.al. | 2512.12801 | null |
| 2025-12-14 | DrivePI: Spatial-aware 4D MLLM for Unified Autonomous Driving Understanding, Perception, Prediction and Planning | Zhe Liu et.al. | 2512.12799 | null |
| 2025-12-14 | A Rule-Aware Prompt Framework for Structured Numeric Reasoning in Cyber-Physical Systems | Yichen Liu et.al. | 2512.12794 | null |
| 2025-12-14 | Beyond Task Completion: An Assessment Framework for Evaluating Agentic AI Systems | Sreemaee Akshathala et.al. | 2512.12791 | null |
| 2025-12-14 | State over Tokens: Characterizing the Role of Reasoning Tokens | Mosh Levy et.al. | 2512.12777 | null |
| 2025-12-14 | Persistent Personas? Role-Playing, Instruction Following, and Safety in Extended Interactions | Pedro Henrique Luz de Araujo et.al. | 2512.12775 | null |
| 2025-12-14 | JointAVBench: A Benchmark for Joint Audio-Visual Reasoning Evaluation | Jianghan Chao et.al. | 2512.12772 | null |
| 2025-12-14 | Adaptive Edge-Cloud Inference for Speech-to-Action Systems Using ASR and Large Language Models (ASTA) | Mohammad Jalili Torkamani et.al. | 2512.12769 | null |
| 2025-12-14 | Intelligent Scientific Literature Explorer using Machine Learning (ISLE) | Sina Jani et.al. | 2512.12760 | null |
| 2025-12-14 | FysicsWorld: A Unified Full-Modality Benchmark for Any-to-Any Understanding, Generation, and Reasoning | Yue Jiang et.al. | 2512.12756 | null |
| 2025-12-14 | Resting Neurons, Active Insights: Improving Input Sparsification for Large Language Models | Haotian Xu et.al. | 2512.12744 | null |
| 2025-12-14 | CoDA: A Context-Decoupled Hierarchical Agent with Reinforcement Learning | Xuanzhang Liu et.al. | 2512.12716 | null |
| 2025-12-14 | Synergizing Code Coverage and Gameplay Intent: Coverage-Aware Game Playtesting with LLM-Guided Reinforcement Learning | Enhong Mu et.al. | 2512.12706 | null |
| 2025-12-14 | Hybrid Retrieval-Augmented Generation for Robust Multilingual Document Question Answering | Anthony Mudet et.al. | 2512.12694 | null |
| 2025-12-14 | Memoria: A Scalable Agentic Memory Framework for Personalized Conversational AI | Samarth Sarin et.al. | 2512.12686 | null |
| 2025-12-14 | Fine-Tuning Causal LLMs for Text Classification: Embedding-Based vs. Instruction-Based Approaches | Amirhossein Yousefiramandi et.al. | 2512.12677 | null |
| 2025-12-14 | LexRel: Benchmarking Legal Relation Extraction for Chinese Civil Cases | Yida Cai et.al. | 2512.12643 | null |
| 2025-12-14 | DiG: Differential Grounding for Enhancing Fine-Grained Perception in Multimodal Large Language Model | Zhou Tao et.al. | 2512.12633 | null |
| 2025-12-14 | ORIBA: Exploring LLM-Driven Role-Play Chatbot as a Creativity Support Tool for Original Character Artists | Yuqian Sun et.al. | 2512.12630 | null |
| 2025-12-14 | Reasoning Within the Mind: Dynamic Multimodal Interleaving in Latent Space | Chengzhi Liu et.al. | 2512.12623 | null |
| 2025-12-14 | Understanding Syllogistic Reasoning in LLMs from Formal and Natural Language Perspectives | Aheli Poddar et.al. | 2512.12620 | null |
| 2025-12-14 | Patch-wise Retrieval: A Bag of Practical Techniques for Instance-level Matching | Wonseok Choi et.al. | 2512.12610 | null |
| 2025-12-14 | Human-Inspired Learning for Large Language Models via Obvious Record and Maximum-Entropy Method Discovery | Hong Su et.al. | 2512.12608 | null |
| 2025-12-14 | Vision-Enhanced Large Language Models for High-Resolution Image Synthesis and Multimodal Data Interpretation | Karthikeya KV et.al. | 2512.12595 | null |
| 2025-12-14 | Beyond Static Scoring: Enhancing Assessment Validity via AI-Generated Interactive Verification | Tom Lee et.al. | 2512.12592 | null |
| 2025-12-14 | StreamingAssistant: Efficient Visual Token Pruning for Accelerating Online Video Understanding | Xinqi Jin et.al. | 2512.12560 | null |
| 2025-12-14 | Large Language Newsvendor: Decision Biases and Cognitive Mechanisms | Jifei Liu et.al. | 2512.12552 | null |
| 2025-12-14 | HyperEdit: Unlocking Instruction-based Text Editing in LLMs via Hypernetworks | Yiming Zeng et.al. | 2512.12544 | null |
| 2025-12-14 | NagaNLP: Bootstrapping NLP for Low-Resource Nagamese Creole with Human-in-the-Loop Synthetic Data | Agniva Maiti et.al. | 2512.12537 | null |
| 2025-12-14 | Diverse LLMs vs. Vulnerabilities: Who Detects and Fixes Them Better? | Arastoo Zibaeirad et.al. | 2512.12536 | null |
| 2025-12-14 | ATLAS: Automated Tree-based Language Analysis System for C and C++ source programs | Jaid Monwar Chowdhury et.al. | 2512.12507 | null |
| 2025-12-14 | KidsArtBench: Multi-Dimensional Children’s Art Evaluation with Attribute-Aware MLLMs | Mingrui Ye et.al. | 2512.12503 | null |
| 2025-12-14 | Explainable AI as a Double-Edged Sword in Dermatology: The Impact on Clinicians versus The Public | Xuhai Xu et.al. | 2512.12500 | null |
| 2025-12-13 | The American Ghost in the Machine: How language models align culturally and the effects of cultural prompting | James Luther et.al. | 2512.12488 | null |
| 2025-12-13 | HetRL: Efficient Reinforcement Learning for LLMs in Heterogeneous Environments | Yongjun He et.al. | 2512.12476 | null |
| 2025-12-13 | Large language models have learned to use language | Gary Lupyan et.al. | 2512.12447 | null |
| 2025-12-13 | Can GPT replace human raters? Validity and reliability of machine-generated norms for metaphors | Veronica Mangiaterra et.al. | 2512.12444 | null |
| 2025-12-11 | Towards Efficient and Effective Multi-Camera Encoding for End-to-End Driving | Jiawei Yang et.al. | 2512.10947 | null |
| 2025-12-11 | FoundationMotion: Auto-Labeling and Reasoning about Spatial Movement in Videos | Yulu Gan et.al. | 2512.10927 | link |
| 2025-12-11 | SparseSwaps: Tractable LLM Pruning Mask Refinement at Scale | Max Zimmer et.al. | 2512.10922 | null |
| 2025-12-11 | CompanionCast: A Multi-Agent Conversational AI Framework with Spatial Audio for Social Co-Viewing Experiences | Yiyang Wang et.al. | 2512.10918 | null |
| 2025-12-11 | Multi-Granular Node Pruning for Circuit Discovery | Muhammad Umair Haider et.al. | 2512.10903 | null |
| 2025-12-11 | LLMs Can Assist with Proposal Selection at Large User Facilities | Lijie Ding et.al. | 2512.10895 | null |
| 2025-12-11 | Computational emotion analysis with multimodal LLMs: Current evidence on an emerging methodological opportunity | Hauke Licht et.al. | 2512.10882 | null |
| 2025-12-11 | Quantifying Emotional Tone in Tolkien’s The Hobbit: Dialogue Sentiment Analysis with RegEx, NRC-VAD, and Python | Lilin Qiu et.al. | 2512.10865 | null |
| 2025-12-11 | Large Language Models for Superconductor Discovery | Suman Itani et.al. | 2512.10847 | null |
| 2025-12-11 | LabelFusion: Learning to Fuse LLMs and Transformer Classifiers for Robust Text Classification | Michael Schlee et.al. | 2512.10793 | null |
| 2025-12-11 | The FACTS Leaderboard: A Comprehensive Benchmark for Large Language Model Factuality | Aileen Cheng et.al. | 2512.10791 | null |
| 2025-12-11 | Natural Language Interface for Firewall Configuration | F. Taghiyev et.al. | 2512.10789 | null |
| 2025-12-11 | Developing and Evaluating a Large Language Model-Based Automated Feedback System Grounded in Evidence-Centered Design for Supporting Physics Problem Solving | Holger Maus et.al. | 2512.10785 | null |
| 2025-12-11 | Script Gap: Evaluating LLM Triage on Indian Languages in Native vs Roman Scripts in a Real World Setting | Manurag Khullar et.al. | 2512.10780 | null |
| 2025-12-11 | OPV: Outcome-based Process Verifier for Efficient Long Chain-of-Thought Verification | Zijian Wu et.al. | 2512.10756 | null |
| 2025-12-11 | LDP: Parameter-Efficient Fine-Tuning of Multimodal LLM for Medical Report Generation | Tianyu Zhou et.al. | 2512.10750 | null |
| 2025-12-11 | Echoes of Automation: How Bots Shaped Political Discourse in Brazil | Merve Ipek Bal et.al. | 2512.10749 | null |
| 2025-12-11 | TRIDENT: A Redundant Architecture for Caribbean-Accented Emergency Speech Triage | Elroy Galbraith et.al. | 2512.10741 | null |
| 2025-12-11 | Long-horizon Reasoning Agent for Olympiad-Level Mathematical Problem Solving | Songyang Gao et.al. | 2512.10739 | null |
| 2025-12-11 | Textual Data Bias Detection and Mitigation - An Extensible Pipeline with Experimental Evaluation | Rebekka Görge et.al. | 2512.10734 | null |
| 2025-12-11 | IRG-MotionLLM: Interleaving Motion Generation, Assessment and Refinement for Text-to-Motion Generation | Yuan-Ming Li et.al. | 2512.10730 | link |
| 2025-12-11 | Beyond the Black Box: Identifiable Interpretation and Control in Generative Models via Causal Minimality | Lingjing Kong et.al. | 2512.10720 | null |
| 2025-12-11 | PACIFIC: a framework for generating benchmarks to check Precise Automatically Checked Instruction Following In Code | Itay Dreyfuss et.al. | 2512.10713 | null |
| 2025-12-11 | COMPARE: Clinical Optimization with Modular Planning and Assessment via RAG-Enhanced AI-OCT: Superior Decision Support for Percutaneous Coronary Intervention Compared to ChatGPT-5 and Junior Operators | Wei Fang et.al. | 2512.10702 | null |
| 2025-12-11 | Remember Me, Refine Me: A Dynamic Procedural Memory Framework for Experience-Driven Agent Evolution | Zouying Cao et.al. | 2512.10696 | null |
| 2025-12-11 | Challenges of Evaluating LLM Safety for User Welfare | Manon Kempermann et.al. | 2512.10687 | null |
| 2025-12-11 | On the Dynamics of Multi-Agent LLM Communities Driven by Value Diversity | Muhua Huang et.al. | 2512.10665 | null |
| 2025-12-11 | Token Sample Complexity of Attention | Léa Bohbot et.al. | 2512.10656 | null |
| 2025-12-11 | TriDF: Evaluating Perception, Detection, and Hallucination for Interpretable DeepFake Detection | Jian-Yu Jiang-Lin et.al. | 2512.10652 | null |
| 2025-12-11 | From Data Scarcity to Data Care: Reimagining Language Technologies for Serbian and other Low-Resource Languages | Smiljana Antonijevic Ubois et.al. | 2512.10630 | null |
| 2025-12-11 | AgriGPT-Omni: A Unified Speech-Vision-Text Framework for Multilingual Agricultural Intelligence | Bo Yang et.al. | 2512.10624 | null |
| 2025-12-11 | Phythesis: Physics-Guided Evolutionary Scene Synthesis for Energy-Efficient Data Center Design via LLMs | Minghao LI et.al. | 2512.10611 | null |
| 2025-12-11 | Multi-Objective Reward and Preference Optimization: Theory and Algorithms | Akhil Agnihotri et.al. | 2512.10601 | null |
| 2025-12-11 | Beyond Pixels: A Training-Free, Text-to-Text Framework for Remote Sensing Image Retrieval | J. Xiao et.al. | 2512.10596 | null |
| 2025-12-11 | RoleRMBench & RoleRM: Towards Reward Modeling for Profile-Based Role Play in Dialogue Systems | Hang Ding et.al. | 2512.10575 | null |
| 2025-12-11 | NormCode: A Semi-Formal Language for Context-Isolated AI Planning | Xin Guan et.al. | 2512.10563 | null |
| 2025-12-11 | Causal Reasoning Favors Encoders: On The Limits of Decoder-Only Models | Amartya Roy et.al. | 2512.10561 | null |
| 2025-12-11 | Grounding Everything in Tokens for Multimodal Large Language Models | Xiangxuan Ren et.al. | 2512.10554 | null |
| 2025-12-11 | LLM-Auction: Generative Auction towards LLM-Native Advertising | Chujie Zhao et.al. | 2512.10551 | null |
| 2025-12-11 | Blink: Dynamic Visual Token Resolution for Enhanced Multimodal Understanding | Yuchen Feng et.al. | 2512.10548 | null |
| 2025-12-11 | Unlocking the Address Book: Dissecting the Sparse Semantic Structure of LLM Key-Value Caches via Sparse Autoencoders | Qingsen Ma et.al. | 2512.10547 | null |
| 2025-12-11 | XDoGE: Multilingual Data Reweighting to Enhance Language Inclusivity in LLMs | Iñaki Lacunza et.al. | 2512.10545 | null |
| 2025-12-11 | Achieving Olympia-Level Geometry Large Language Model Agent via Complexity Boosting Reinforcement Learning | Haiteng Zhao et.al. | 2512.10534 | null |
| 2025-12-11 | Zero-shot 3D Map Generation with LLM Agents: A Dual-Agent Architecture for Procedural Content Generation | Lim Chien Her et.al. | 2512.10501 | null |
| 2025-12-11 | Decoding Human-LLM Collaboration in Coding: An Empirical Study of Multi-Turn Conversations in the Wild | Binquan Zhang et.al. | 2512.10493 | null |
| 2025-12-11 | LLM-Assisted AHP for Explainable Cyber Range Evaluation | Vyron Kampourakis et.al. | 2512.10487 | null |
| 2025-12-11 | From Lab to Reality: A Practical Evaluation of Deep Learning Models and LLMs for Vulnerability Detection | Chaomeng Lu et.al. | 2512.10485 | null |
| 2025-12-11 | Grammaticality Judgments in Humans and Language Models: Revisiting Generative Grammar with LLMs | Lars G. B. Johnsen et.al. | 2512.10453 | null |
| 2025-12-11 | When Reject Turns into Accept: Quantifying the Vulnerability of LLM-Based Scientific Reviewers to Indirect Prompt Injection | Devanshu Sahoo et.al. | 2512.10449 | null |
| 2025-12-11 | Decoding Student Minds: Leveraging Conversational Agents for Psychological and Learning Analysis | Nour El Houda Ben Chaabene et.al. | 2512.10441 | null |
| 2025-12-11 | Enhancing Next-Generation Language Models with Knowledge Graphs: Extending Claude, Mistral IA, and GPT-4 via KG-BERT | Nour El Houda Ben Chaabene et.al. | 2512.10440 | null |
| 2025-12-11 | Semantic Reconstruction of Adversarial Plagiarism: A Context-Aware Framework for Detecting and Restoring “Tortured Phrases” in Scientific Literature | Agniva Maiti et.al. | 2512.10435 | null |
| 2025-12-11 | Cooperative Retrieval-Augmented Generation for Question Answering: Mutual Information Exchange and Ranking by Contrasting Layers | Youmin Ko et.al. | 2512.10422 | null |
| 2025-12-11 | How to Trick Your AI TA: A Systematic Study of Academic Jailbreaking in LLM Code Evaluation | Devanshu Sahoo et.al. | 2512.10415 | null |
| 2025-12-11 | Sliding Window Attention Adaptation | Yijiong Yu et.al. | 2512.10411 | null |
| 2025-12-11 | RoboNeuron: A Modular Framework Linking Foundation Models and ROS for Embodied AI | Weifan Guan et.al. | 2512.10394 | null |
| 2025-12-11 | GPG: Generalized Policy Gradient Theorem for Transformer-based Policies | Hangyu Mao et.al. | 2512.10365 | null |
| 2025-12-11 | Visual Funnel: Resolving Contextual Blindness in Multimodal Large Language Models | Woojun Jung et.al. | 2512.10362 | null |
| 2025-12-11 | Tool-Augmented Spatiotemporal Reasoning for Streamlining Video Question Answering Task | Sunqi Fan et.al. | 2512.10359 | null |
| 2025-12-11 | Dynamics of Agentic Loops in Large Language Models: A Geometric Theory of Trajectories | Nicolas Tacheny et.al. | 2512.10350 | null |
| 2025-12-11 | EchoingPixels: Cross-Modal Adaptive Token Reduction for Efficient Audio-Visual LLMs | Chao Gong et.al. | 2512.10324 | null |
| 2025-12-11 | EpiPlanAgent: Agentic Automated Epidemic Response Planning | Kangkun Mao et.al. | 2512.10313 | null |
| 2025-12-11 | Efficient-VLN: A Training-Efficient Vision-Language Navigation Model | Duo Zheng et.al. | 2512.10310 | null |
| 2025-12-11 | Reverse Thinking Enhances Missing Information Detection in Large Language Models | Yuxin Liu et.al. | 2512.10273 | null |
| 2025-12-11 | VLM-NCD: Novel Class Discovery with Vision-Based Large Language Models | Yuetong Su et.al. | 2512.10262 | null |
| 2025-12-11 | Reject or Not?: A Benchmark for Voice Assistant Query Rejection in Smart Home Scenario and an Improved Method Based on LLMs | Huichao Men et.al. | 2512.10257 | null |
| 2025-12-11 | InFerActive: Towards Scalable Human Evaluation of Large Language Models through Interactive Inference | Junhyeong Hwangbo et.al. | 2512.10234 | null |
| 2025-12-11 | Adaptive Information Routing for Multimodal Time Series Forecasting | Jun Seo et.al. | 2512.10229 | null |
| 2025-12-11 | Does SWE-Bench-Verified Test Agent Ability or Model Memory? | Thanosan Prathifkumar et.al. | 2512.10218 | null |
| 2025-12-11 | CP-Env: Evaluating Large Language Models on Clinical Pathways in a Controllable Hospital Environment | Yakun Zhu et.al. | 2512.10206 | null |
| 2025-12-11 | AutoMedic: An Automated Evaluation Framework for Clinical Conversational Agents with Medical Dataset Grounding | Gyutaek Oh et.al. | 2512.10195 | null |
| 2025-12-11 | CIEGAD: Cluster-Conditioned Interpolative and Extrapolative Framework for Geometry-Aware and Domain-Aligned Data Augmentation | Keito Inoshita et.al. | 2512.10178 | null |
| 2025-12-11 | ATLAS: Automated Toolkit for Large-Scale Verified Code Synthesis | Mantas Baksys et.al. | 2512.10173 | null |
| 2025-12-11 | Offscript: Automated Auditing of Instruction Adherence in LLMs | Nicholas Clark et.al. | 2512.10172 | null |
| 2025-12-10 | Enhancing Large Language Models for End-to-End Circuit Analysis Problem Solving | Liangliang Chen et.al. | 2512.10159 | null |
| 2025-12-10 | Unforgotten Safety: Preserving Safety Alignment of Large Language Models with Continual Learning | Lama Alssum et.al. | 2512.10150 | null |
| 2025-12-10 | PARAN: Persona-Augmented Review ANswering system on Food Delivery Review Dataset | Moonsoo Park et.al. | 2512.10148 | null |
| 2025-12-10 | Workflow is All You Need: Escaping the “Statistical Smoothing Trap” via High-Entropy Information Foraging and Adversarial Pacing | Zhongjie Jiang et.al. | 2512.10121 | null |
| 2025-12-10 | AgriRegion: Region-Aware Retrieval for High-Fidelity Agricultural Advice | Mesafint Fanuel et.al. | 2512.10114 | null |
| 2025-12-10 | Generate-Then-Validate: A Novel Question Generation Approach Using Small Language Models | Yumou Wei et.al. | 2512.10110 | null |
| 2025-12-10 | LLM-PEA: Leveraging Large Language Models Against Phishing Email Attacks | Najmul Hassan et.al. | 2512.10104 | null |
| 2025-12-10 | What Kind of Reasoning (if any) is an LLM actually doing? On the Stochastic Nature and Abductive Appearance of Large Language Models | Luciano Floridi et.al. | 2512.10080 | null |
| 2025-12-10 | Independent Density Estimation | Jiahao Liu et.al. | 2512.10067 | null |
| 2025-12-10 | Linear socio-demographic representations emerge in Large Language Models from indirect cues | Paul Bouchaud et.al. | 2512.10065 | null |
| 2025-12-10 | Text2Graph: Combining Lightweight LLMs and GNNs for Efficient Text Classification in Label-Scarce Scenarios | João Lucas Luz Lima Sarcinelli et.al. | 2512.10061 | null |
| 2025-12-10 | Parallel Decoder Transformer: Model-Internal Parallel Decoding with Speculative Invariance via Note Conditioning | Logan Robbins et.al. | 2512.10054 | null |
| 2025-12-10 | Detailed balance in large language model-driven agents | Zhuo-Yang Song et.al. | 2512.10047 | null |
| 2025-12-10 | Local LLM Ensembles for Zero-shot Portuguese Named Entity Recognition | João Lucas Luz Lima Sarcinelli et.al. | 2512.10043 | null |
| 2025-12-10 | Intelligently Weighting Multiple Reference Models for Direct Preference Optimization of LLMs | Skyler Wu et.al. | 2512.10040 | null |
| 2025-12-10 | Exploring LLMs for Scientific Information Extraction Using The SciEx Framework | Sha Li et.al. | 2512.10004 | null |
| 2025-12-10 | SCOPE: Language Models as One-Time Teacher for Hierarchical Planning in Text Environments | Haoye Lu et.al. | 2512.09897 | null |
| 2025-12-10 | Benchmarking Document Parsers on Mathematical Formula Extraction from PDFs | Pius Horn et.al. | 2512.09874 | link |
| 2025-12-10 | FlipLLM: Efficient Bit-Flip Attacks on Multimodal LLMs using Reinforcement Learning | Khurram Khalil et.al. | 2512.09872 | null |
| 2025-12-10 | MedForget: Hierarchy-Aware Multimodal Unlearning Testbed for Medical AI | Fengli Wu et.al. | 2512.09867 | null |
| 2025-12-10 | UniUGP: Unifying Understanding, Generation, and Planing For End-to-end Autonomous Driving | Hao Lu et.al. | 2512.09864 | null |
| 2025-12-10 | Mitigating Social Bias in English and Urdu Language Models Using PRM-Guided Candidate Selection and Sequential Refinement | Muneeb Ur Raheem Khan et.al. | 2512.09854 | null |
| 2025-12-10 | ChronusOmni: Improving Time Awareness of Omni Large Language Models | Yijing Chen et.al. | 2512.09841 | null |
| 2025-12-10 | LLMs in Interpreting Legal Documents | Simone Corbo et.al. | 2512.09830 | null |
| 2025-12-10 | RIFT: A Scalable Methodology for LLM Accelerator Fault Assessment using Reinforcement Learning | Khurram Khalil et.al. | 2512.09829 | null |
| 2025-12-10 | DeepSeek’s WEIRD Behavior: The cultural alignment of Large Language Models and the effects of prompt language and cultural prompting | James Luther et.al. | 2512.09772 | null |
| 2025-12-10 | Defining Cost Function of Steganography with Large Language Models | Hanzhou Wu et.al. | 2512.09769 | null |
| 2025-12-10 | Towards Language Model Guided TLA+ Proof Automation | Yuhao Zhou et.al. | 2512.09758 | null |
| 2025-12-10 | Knowledge Graph Enrichment and Reasoning for Nobel Laureates | Thanh-Lam T. Nguyen et.al. | 2512.09707 | null |
| 2025-12-10 | Exqutor: Extended Query Optimizer for Vector-augmented Analytical Queries | Hyunjoon Kim et.al. | 2512.09695 | null |
| 2025-12-10 | Understanding Chain-of-Thought Effectiveness in Code Generation: An Empirical and Information-Theoretic Analysis | Naizhu Jin et.al. | 2512.09679 | null |
| 2025-12-10 | The Ky Fan Norms and Beyond: Dual Norms and Combinations for Matrix Optimization | Alexey Kravatskiy et.al. | 2512.09678 | null |
| 2025-12-10 | d-TreeRPO: Towards More Reliable Policy Optimization for Diffusion Language Models | Leyi Pan et.al. | 2512.09675 | null |
| 2025-12-10 | IF-Bench: Benchmarking and Enhancing MLLMs for Infrared Images with Generative Visual Prompting | Tao Zhang et.al. | 2512.09663 | link |
| 2025-12-10 | Can LLMs Evaluate What They Cannot Annotate? Revisiting LLM Reliability in Hate Speech Detection | Paloma Piot et.al. | 2512.09662 | null |
| 2025-12-10 | Measuring Corruption from Text Data | Arieda Muço et.al. | 2512.09652 | null |
| 2025-12-10 | MentraSuite: Post-Training Large Language Models for Mental Health Reasoning and Assessment | Mengxi Xiao et.al. | 2512.09636 | link |
| 2025-12-10 | Creation of the Estonian Subjectivity Dataset: Assessing the Degree of Subjectivity on a Scale | Karl Gustav Gailit et.al. | 2512.09634 | null |
| 2025-12-10 | An End-to-end Planning Framework with Agentic LLMs and PDDL | Emanuele La Malfa et.al. | 2512.09629 | null |
| 2025-12-10 | LogICL: Distilling LLM Reasoning to Bridge the Semantic Gap in Cross-Domain Log Anomaly Detection | Jingwei Ye et.al. | 2512.09627 | null |
| 2025-12-10 | Rethinking Chain-of-Thought Reasoning for Videos | Yiwu Zhong et.al. | 2512.09616 | link |
| 2025-12-10 | ImageTalk: Designing a Multimodal AAC Text Generation System Driven by Image Recognition and Natural Language Generation | Boyin Yang et.al. | 2512.09610 | null |
| 2025-12-10 | Investigate the Low-level Visual Perception in Vision-Language based Image Quality Assessment | Yuan Li et.al. | 2512.09573 | null |
| 2025-12-10 | System Report for CCL25-Eval Task 10: Prompt-Driven Large Language Model Merge for Fine-Grained Chinese Hate Speech Detection | Binglin Wu et.al. | 2512.09563 | null |
| 2025-12-10 | Systematic Framework of Application Methods for Large Language Models in Language Sciences | Kun Sun et.al. | 2512.09552 | null |
| 2025-12-10 | Chasing Shadows: Pitfalls in LLM Security Research | Jonathan Evertz et.al. | 2512.09549 | null |
| 2025-12-10 | Supporting Dynamic Agentic Workloads: How Data and Agents Interact | Ioana Giurgiu et.al. | 2512.09548 | null |
| 2025-12-10 | Don’t Throw Away Your Beams: Improving Consistency-based Uncertainties in LLMs via Beam Search | Ekaterina Fadeeva et.al. | 2512.09538 | null |
| 2025-12-10 | CNFinBench: A Benchmark for Safety and Compliance of Large Language Models in Finance | Jinru Ding et.al. | 2512.09506 | null |
| 2025-12-10 | RouteRAG: Efficient Retrieval-Augmented Generation from Text and Graph via Reinforcement Learning | Yucan Guo et.al. | 2512.09487 | null |
| 2025-12-10 | Advancing LLM-Based Security Automation with Customized Group Relative Policy Optimization for Zero-Touch Networks | Xinye Cao et.al. | 2512.09485 | null |
| 2025-12-10 | An Efficient Interaction Human-AI Synergy System Bridging Visual Awareness and Large Language Model for Intensive Care Units | Yibowen Zhao et.al. | 2512.09473 | null |
| 2025-12-10 | WarmServe: Enabling One-for-Many GPU Prewarming for Multi-LLM Serving | Chiheng Lou et.al. | 2512.09472 | null |
| 2025-12-10 | Advancing Text Classification with Large Language Models and Neural Attention Mechanisms | Ning Lyu et.al. | 2512.09444 | null |
| 2025-12-10 | Advancing Research via Human-AI Interactive Theorem Proving | Chenyi Li et.al. | 2512.09443 | null |
| 2025-12-10 | Knowledge-Augmented Large Language Model Agents for Explainable Financial Decision-Making | Qingyuan Zhang et.al. | 2512.09440 | null |
| 2025-12-10 | ODMA: On-Demand Memory Allocation Framework for LLM Serving on LPDDR-Class Accelerators | Guoqiang Zou et.al. | 2512.09427 | null |
| 2025-12-10 | Black-Box Behavioral Distillation Breaks Safety Alignment in Medical LLMs | Sohely Jahan et.al. | 2512.09403 | null |
| 2025-12-10 | Optimizing Data Extraction from Materials Science Literature: A Study of Tools Using Large Language Models | Wenkai Ning et.al. | 2512.09370 | null |
| 2025-12-10 | Are Hypervectors Enough? Single-Call LLM Reasoning over Knowledge Graphs | Yezi Liu et.al. | 2512.09369 | null |
| 2025-12-10 | Video-QTR: Query-Driven Temporal Reasoning Framework for Lightweight Video Understanding | Xinkui Zhao et.al. | 2512.09354 | null |
| 2025-12-10 | Self Distillation Fine-Tuning of Protein Language Models Improves Versatility in Protein Design | Amin Tavakoli et.al. | 2512.09329 | null |
| 2025-12-10 | RACAM: Enhancing DRAM with Reuse-Aware Computation and Automated Mapping for ML Inference | Siyuan Ma et.al. | 2512.09304 | null |
| 2025-12-10 | Identifying Bias in Machine-generated Text Detection | Kevin Stowe et.al. | 2512.09292 | null |
| 2025-12-10 | LongT2IBench: A Benchmark for Evaluating Long Text-to-Image Generation with Graph-structured Annotations | Zhichao Yang et.al. | 2512.09271 | null |
| 2025-12-10 | From Forecast to Action: Uncertainty-Aware UAV Deployment for Ocean Drifter Recovery | Jingeun Kim et.al. | 2512.09260 | null |
| 2025-12-10 | The Illusion of Rationality: Tacit Bias and Strategic Dominance in Frontier LLM Negotiation Games | Manuel S. Ríos et.al. | 2512.09254 | null |
| 2025-12-10 | GLACIA: Instance-Aware Positional Reasoning for Glacial Lake Segmentation via Multimodal Large Language Model | Lalit Maurya et.al. | 2512.09251 | link |
| 2025-12-10 | Training-free Context-adaptive Attention for Efficient Long Context Modeling | Zeng You et.al. | 2512.09238 | null |
| 2025-12-10 | CORE: A Conceptual Reasoning Layer for Large Language Models | Vishwas Hegde et.al. | 2512.09222 | null |
| 2025-12-10 | Targeting Misalignment: A Conflict-Aware Framework for Reward-Model-based LLM Alignment | Zixuan Liu et.al. | 2512.09212 | null |
| 2025-12-09 | LLMs for Analog Circuit Design Continuum (ACDC) | Yasaman Esfandiari et.al. | 2512.09199 | null |
| 2025-12-09 | TritonForge: Profiling-Guided Framework for Automated Triton Kernel Optimization | Haonan Li et.al. | 2512.09196 | null |
| 2025-12-09 | WOLF: Werewolf-based Observations for LLM Deception and Falsehoods | Mrinal Agarwal et.al. | 2512.09187 | null |
| 2025-12-09 | MindShift: Analyzing Language Models’ Reactions to Psychological Prompts | Anton Vasiliuk et.al. | 2512.09149 | null |
| 2025-12-09 | Detecting Hallucinations in Graph Retrieval-Augmented Generation via Attention Patterns and Semantic Alignment | Shanghao Li et.al. | 2512.09148 | null |
| 2025-12-09 | Knowledge-Guided Large Language Model for Automatic Pediatric Dental Record Understanding and Safe Antibiotic Recommendation | Zihan Han et.al. | 2512.09127 | null |
| 2025-12-09 | A Categorical Analysis of Large Language Models and Why LLMs Circumvent the Symbol Grounding Problem | Luciano Floridi et.al. | 2512.09117 | null |
| 2025-12-09 | Evolving Excellence: Automated Optimization of LLM-based Agents | Paul Brookes et.al. | 2512.09108 | null |
| 2025-12-09 | Learning Unmasking Policies for Diffusion Language Models | Metod Jazbec et.al. | 2512.09106 | null |
| 2025-12-09 | Explaining the Unseen: Multimodal Vision-Language Reasoning for Situational Awareness in Underground Mining Disasters | Mizanur Rahman Jewel et.al. | 2512.09092 | null |
| 2025-12-09 | Calibrated Trust in Dealing with LLM Hallucinations: A Qualitative Study | Adrian Ryser et.al. | 2512.09088 | null |
| 2025-12-09 | AgentComp: From Agentic Reasoning to Compositional Mastery in Text-to-Image Models | Arman Zarei et.al. | 2512.09081 | null |
| 2025-12-09 | Llama-based source code vulnerability detection: Prompt engineering vs Fine tuning | Dyna Soumhane Ouchebara et.al. | 2512.09006 | null |
| 2025-12-09 | Same Content, Different Answers: Cross-Modal Inconsistency in MLLMs | Angela van Sprang et.al. | 2512.08923 | null |
| 2025-12-09 | Revisiting the Scaling Properties of Downstream Metrics in Large Language Model Training | Jakub Krajewski et.al. | 2512.08894 | null |
| 2025-12-09 | Toward Faithful Retrieval-Augmented Generation with Sparse Autoencoders | Guangzhi Xiong et.al. | 2512.08892 | null |
| 2025-12-09 | AI Didn’t Start the Fire: Examining the Stack Exchange Moderator and Contributor Strike | Yiwei Wu et.al. | 2512.08884 | null |
| 2025-12-09 | When Tables Leak: Attacking String Memorization in LLM-Based Tabular Data Generation | Joshua Ward et.al. | 2512.08875 | null |
| 2025-12-09 | Siamese-Driven Optimization for Low-Resolution Image Latent Embedding in Image Captioning | Jing Jie Tan et.al. | 2512.08873 | null |
| 2025-12-09 | SimpleDevQA: Benchmarking Large Language Models on Development Knowledge QA | Jing Zhang et.al. | 2512.08867 | null |
| 2025-12-09 | Ask, Answer, and Detect: Role-Playing LLMs for Personality Detection with Question-Conditioned Mixture-of-Experts | Yifan Lyu et.al. | 2512.08814 | null |
| 2025-12-09 | PrivTune: Efficient and Privacy-Preserving Fine-Tuning of Large Language Models via Device-Cloud Collaboration | Yi Liu et.al. | 2512.08809 | null |
| 2025-12-09 | A Systematic Evaluation of Preference Aggregation in Federated RLHF for Pluralistic Alignment of LLMs | Mahmoud Srewa et.al. | 2512.08786 | null |
| 2025-12-09 | A Practical Guide for Designing, Developing, and Deploying Production-Grade Agentic AI Workflows | Eranga Bandara et.al. | 2512.08769 | null |
| 2025-12-09 | Financial News Summarization: Can extractive methods still offer a true alternative to LLMs? | Nicolas Reche et.al. | 2512.08764 | null |
| 2025-12-09 | Towards Foundation Models with Native Multi-Agent Intelligence | Shuyue Hu et.al. | 2512.08743 | null |
| 2025-12-09 | LaMoSys3.5D: Enabling 3.5D-IC-Based Large Language Model Inference Serving Systems via Hardware/Software Co-Design | Qipan Wang et.al. | 2512.08731 | null |
| 2025-12-09 | Exposing Hidden Biases in Text-to-Image Models via Automated Prompt Search | Manos Plitsis et.al. | 2512.08724 | null |
| 2025-12-09 | Multi-Agent Intelligence for Multidisciplinary Decision-Making in Gastrointestinal Oncology | Rongzhao Zhang et.al. | 2512.08674 | null |
| 2025-12-09 | An Agentic AI System for Multi-Framework Communication Coding | Bohao Yang et.al. | 2512.08659 | null |
| 2025-12-09 | QSTN: A Modular Framework for Robust Questionnaire Inference with Large Language Models | Maximilian Kreutner et.al. | 2512.08646 | null |
| 2025-12-09 | Chain-of-Image Generation: Toward Monitorable and Controllable Image Generation | Young Kyung Kim et.al. | 2512.08645 | null |
| 2025-12-09 | See-Control: A Multimodal Agent Framework for Smartphone Interaction with a Robotic Arm | Haoyu Zhao et.al. | 2512.08629 | null |
| 2025-12-09 | HealthcareNLP: where are we and what is next? | Lifeng Han et.al. | 2512.08617 | null |
| 2025-12-09 | CogMCTS: A Novel Cognitive-Guided Monte Carlo Tree Search Framework for Iterative Heuristic Evolution with Large Language Models | Hui Wang et.al. | 2512.08609 | null |
| 2025-12-09 | Bridging Scale Discrepancies in Robotic Control via Language-Based Action Representations | Yuchi Zhang et.al. | 2512.08548 | null |
| 2025-12-09 | Curriculum Guided Massive Multi Agent System Solving For Robust Long Horizon Tasks | Indrajit Kar et.al. | 2512.08545 | null |
| 2025-12-09 | Principles2Plan: LLM-Guided System for Operationalising Ethical Principles into Plans | Tammy Zhong et.al. | 2512.08536 | null |
| 2025-12-09 | Autonomous Issue Resolver: Towards Zero-Touch Code Maintenance | Aliaksei Kaliutau et.al. | 2512.08492 | null |
| 2025-12-09 | Soft Inductive Bias Approach via Explicit Reasoning Perspectives in Inappropriate Utterance Detection Using Large Language Models | Ju-Young Kim et.al. | 2512.08480 | null |
| 2025-12-09 | A Multi-Agent LLM Framework for Design Space Exploration in Autonomous Driving Systems | Po-An Shih et.al. | 2512.08476 | null |
| 2025-12-09 | Biothreat Benchmark Generation Framework for Evaluating Frontier AI Models III: Implementing the Bacterial Biothreat Benchmark (B3) Dataset | Gary Ackerman et.al. | 2512.08459 | null |
| 2025-12-09 | Biothreat Benchmark Generation Framework for Evaluating Frontier AI Models II: Benchmark Generation Process | Gary Ackerman et.al. | 2512.08451 | null |
| 2025-12-09 | What Triggers my Model? Contrastive Explanations Inform Gender Choices by Translation Models | Janiça Hackenbuchner et.al. | 2512.08440 | null |
| 2025-12-09 | Attention is All You Need to Defend Against Indirect Prompt Injection Attacks in LLMs | Yinan Zhong et.al. | 2512.08417 | null |
| 2025-12-09 | Towards Effective and Efficient Long Video Understanding of Multimodal Large Language Models via One-shot Clip Retrieval | Tao Chen et.al. | 2512.08410 | null |
| 2025-12-09 | DFALLM: Achieving Generalizable Multitask Deepfake Detection by Optimizing Audio LLM Components | Yupei Li et.al. | 2512.08403 | null |
| 2025-12-09 | The Unseen Bias: How Norm Discrepancy in Pre-Norm MLLMs Leads to Visual Information Loss | Bozhou Li et.al. | 2512.08374 | null |
| 2025-12-09 | Reflecting with Two Voices: A Co-Adaptive Dual-Strategy Framework for LLM-Based Agent Decision Making | Wentao Zhang et.al. | 2512.08366 | null |
| 2025-12-09 | The High Cost of Incivility: Quantifying Interaction Inefficiency via Multi-Agent Monte Carlo Simulations | Benedikt Mangold et.al. | 2512.08345 | null |
| 2025-12-09 | Argus: A Multi-Agent Sensitive Information Leakage Detection Framework Based on Hierarchical Reference Relationships | Bin Wang et.al. | 2512.08326 | null |
| 2025-12-09 | rSIM: Incentivizing Reasoning Capabilities of LLMs via Reinforced Strategy Injection | Sijia Chen et.al. | 2512.08300 | null |
| 2025-12-09 | Systematization of Knowledge: Security and Safety in the Model Context Protocol Ecosystem | Shiva Gaire et.al. | 2512.08290 | null |
| 2025-12-09 | Empowering smart app development with SolidGPT: an edge-cloud hybrid AI agent framework | Liao Hu et.al. | 2512.08286 | null |
| 2025-12-09 | AgentEval: Generative Agents as Reliable Proxies for Human Evaluation of AI-Generated Content | Thanh Vu et.al. | 2512.08273 | null |
| 2025-12-09 | Reasoning Models Ace the CFA Exams | Jaisal Patel et.al. | 2512.08270 | null |
| 2025-12-09 | Token Sugar: Making Source Code Sweeter for LLMs through Token-Efficient Shorthand | Zhensu Sun et.al. | 2512.08266 | null |
| 2025-12-09 | Beyond Traditional Diagnostics: Transforming Patient-Side Information into Predictive Insights with Knowledge Graphs and Prototypes | Yibowen Zhao et.al. | 2512.08261 | null |
| 2025-12-09 | Chopper: A Multi-Level GPU Characterization Tool & Derived Insights Into LLM Training Inefficiency | Marco Kurzynski et.al. | 2512.08242 | null |
| 2025-12-09 | SOP^2: Transfer Learning with Scene-Oriented Prompt Pool on 3D Object Detection | Ching-Hung Cheng et.al. | 2512.08223 | null |
| 2025-12-09 | Secure or Suspect? Investigating Package Hallucinations of Shell Command in Original and Quantized LLMs | Md Nazmul Haque et.al. | 2512.08213 | null |
| 2025-12-09 | MobileFineTuner: A Unified End-to-End Framework for Fine-Tuning LLMs on Mobile Phones | Jiaxiang Geng et.al. | 2512.08211 | null |
| 2025-12-09 | ClinicalTrialsHub: Bridging Registries and Literature for Comprehensive Clinical Trial Access | Jiwoo Park et.al. | 2512.08193 | null |
| 2025-12-09 | A Practical Framework for Evaluating Medical AI Security: Reproducible Assessment of Jailbreaking and Privacy Vulnerabilities Across Clinical Specialties | Jinghao Wang et.al. | 2512.08185 | null |
| 2025-12-09 | Framing Climate Change on YouTube: North-South Divides in Narratives and Public Engagement | Sanika Damle et.al. | 2512.08183 | null |
| 2025-12-09 | Chat with UAV – Human-UAV Interaction Based on Large Language Models | Haoran Wang et.al. | 2512.08145 | null |
| 2025-12-09 | PolyLingua: Margin-based Inter-class Transformer for Robust Cross-domain Language Detection | Ali Lotfi Rezaabad et.al. | 2512.08143 | null |
| 2025-12-09 | Biothreat Benchmark Generation Framework for Evaluating Frontier AI Models I: The Task-Query Architecture | Gary Ackerman et.al. | 2512.08130 | null |
| 2025-12-09 | Universal Adversarial Suffixes Using Calibrated Gumbel-Softmax Relaxation | Sampriti Soor et.al. | 2512.08123 | null |
| 2025-12-08 | Evolutionary perspective of large language models on shaping research insights into healthcare disparities | David An et.al. | 2512.08122 | null |
| 2025-12-08 | Balanced Accuracy: The Right Metric for Evaluating LLM Judges – Explained through Youden’s J statistic | Stephane Collot et.al. | 2512.08121 | null |
| 2025-12-08 | Detecting Ambiguity Aversion in Cyberattack Behavior to Inform Cognitive Defense Strategies | Stephan Carney et.al. | 2512.08107 | null |
| 2025-12-08 | AgentCrypt: Advancing Privacy and (Secure) Computation in AI Agent Collaboration | Harish Karthikeyan et.al. | 2512.08104 | null |
| 2025-12-08 | Training LLMs for Honesty via Confessions | Manas Joglekar et.al. | 2512.08093 | null |
| 2025-12-08 | Adaptation of Embedding Models to Financial Filings via LLM Distillation | Eliot Brenner et.al. | 2512.08088 | null |
| 2025-12-08 | Exploiting the Randomness of Large Language Models (LLM) in Text Classification Tasks: Locating Privileged Documents in Legal Matters | Keith Huffman et.al. | 2512.08083 | null |
| 2025-12-08 | Short-Context Dominance: How Much Local Context Natural Language Actually Needs? | Vala Vakilian et.al. | 2512.08082 | null |
| 2025-12-08 | Leveraging Machine Learning and Large Language Models for Automated Image Clustering and Description in Legal Discovery | Qiang Mao et.al. | 2512.08079 | null |
| 2025-12-08 | A Comparative Study of Retrieval Methods in Azure AI Search | Qiang Mao et.al. | 2512.08078 | null |
| 2025-12-08 | Unveiling Latent Knowledge in Chemistry Language Models through Sparse Autoencoders | Jaron Cohen et.al. | 2512.08077 | null |
| 2025-12-08 | Large Language Models for Education and Research: An Empirical and User Survey-based Analysis | Md Mostafizer Rahman et.al. | 2512.08057 | null |
| 2025-12-08 | CLARITY: Medical World Model for Guiding Treatment Decisions by Modeling Context-Aware Disease Trajectories in Latent Space | Tianxingjian Ding et.al. | 2512.08029 | null |
| 2025-12-08 | Toward an AI Reasoning-Enabled System for Patient-Clinical Trial Matching | Caroline N. Leach et.al. | 2512.08026 | null |
| 2025-12-08 | FRIEDA: Benchmarking Multi-Step Cartographic Reasoning in Vision-Language Models | Jiyoon Pyo et.al. | 2512.08016 | null |
| 2025-12-08 | Bridging the Clinical Expertise Gap: Development of a Web-Based Platform for Accessible Time Series Forecasting and Analysis | Aaron D. Mullen et.al. | 2512.07992 | null |
| 2025-12-08 | DeepCode: Open Agentic Coding | Zongwei Li et.al. | 2512.07921 | link |
| 2025-12-08 | Relational Visual Similarity | Thao Nguyen et.al. | 2512.07833 | link |
| 2025-12-08 | Do Generalisation Results Generalise? | Matteo Boglioni et.al. | 2512.07832 | null |
| 2025-12-08 | Understanding Privacy Risks in Code Models Through Training Dynamics: A Causal Approach | Hua Yang et.al. | 2512.07814 | null |
| 2025-12-08 | LLM Use for Mental Health: Crowdsourcing Users’ Sentiment-based Perspectives and Values from Social Discussions | Lingyao Li et.al. | 2512.07797 | null |
| 2025-12-08 | Large Causal Models from Large Language Models | Sridhar Mahadevan et.al. | 2512.07796 | null |
| 2025-12-08 | ReasonBENCH: Benchmarking the (In)Stability of LLM Reasoning | Nearchos Potamitis et.al. | 2512.07795 | link |
| 2025-12-08 | Automating High Energy Physics Data Analysis with LLM-Powered Agents | Eli Gendreau-Distler et.al. | 2512.07785 | null |
| 2025-12-08 | Mary, the Cheeseburger-Eating Vegetarian: Do LLMs Recognize Incoherence in Narratives? | Karin de Langis et.al. | 2512.07777 | null |
| 2025-12-08 | RL-MTJail: Reinforcement Learning for Automated Black-Box Multi-Turn Jailbreaking of Large Language Models | Xiqiao Xiong et.al. | 2512.07761 | null |
| 2025-12-08 | SpatialDreamer: Incentivizing Spatial Reasoning via Active Mental Imagery | Meng Cao et.al. | 2512.07733 | null |
| 2025-12-08 | SAVE: Sparse Autoencoder-Driven Visual Information Enhancement for Mitigating Object Hallucination | Sangha Park et.al. | 2512.07730 | null |
| 2025-12-08 | Privacy Practices of Browser Agents | Alisha Ukani et.al. | 2512.07725 | null |
| 2025-12-08 | In-Context and Few-Shots Learning for Forecasting Time Series Data based on Large Language Models | Saroj Gopali et.al. | 2512.07705 | null |
| 2025-12-08 | HalluShift++: Bridging Language and Vision through Internal Representation Shifts for Hierarchical Hallucinations in MLLMs | Sujoy Nath et.al. | 2512.07687 | null |
| 2025-12-08 | When Large Language Models Do Not Work: Online Incivility Prediction through Graph Neural Networks | Zihan Chen et.al. | 2512.07684 | null |
| 2025-12-08 | Depth-Wise Activation Steering for Honest Language Models | Gracjan Góral et.al. | 2512.07667 | null |
| 2025-12-08 | Bridging Code Graphs and Large Language Models for Better Code Understanding | Zeqi Chen et.al. | 2512.07666 | null |
| 2025-12-08 | Reliable agent engineering should integrate machine-compatible organizational principles | R. Patrick Xian et.al. | 2512.07665 | null |
| 2025-12-08 | An AI-Powered Autonomous Underwater System for Sea Exploration and Scientific Research | Hamad Almazrouei et.al. | 2512.07652 | null |
| 2025-12-08 | PCMind-2.1-Kaiyuan-2B Technical Report | Kairong Luo et.al. | 2512.07612 | null |
| 2025-12-08 | Comparative Analysis and Parametric Tuning of PPO, GRPO, and DAPO for LLM Reasoning Enhancement | Yongsheng Lian et.al. | 2512.07611 | null |
| 2025-12-08 | Metric-Fair Prompting: Treating Similar Samples Similarly | Jing Wang et.al. | 2512.07608 | null |
| 2025-12-08 | Complementary Learning Approach for Text Classification using Large Language Models | Navid Asgari et.al. | 2512.07583 | null |
| 2025-12-08 | All You Need Are Random Visual Tokens? Demystifying Token Pruning in VLLMs | Yahong Wang et.al. | 2512.07580 | null |
| 2025-12-08 | A Simple Method to Enhance Pre-trained Language Models with Speech Tokens for Classification | Nicolas Calbucura et.al. | 2512.07571 | null |
| 2025-12-08 | MoCoRP: Modeling Consistent Relations between Persona and Response for Persona-based Dialogue | Kyungro Lee et.al. | 2512.07544 | null |
| 2025-12-08 | SwissGov-RSD: A Human-annotated, Cross-lingual Benchmark for Token-level Recognition of Semantic Differences Between Related Documents | Michelle Wastl et.al. | 2512.07538 | null |
| 2025-12-08 | Beyond Real: Imaginary Extension of Rotary Position Embeddings for Long-Context LLMs | Xiaoran Liu et.al. | 2512.07525 | link |
| 2025-12-08 | AutoICE: Automatically Synthesizing Verifiable C Code via LLM-driven Evolution | Weilin Luo et.al. | 2512.07501 | null |
| 2025-12-08 | How Do LLMs Fail In Agentic Scenarios? A Qualitative Analysis of Success and Failure Scenarios of Various LLMs in Agentic Simulations | JV Roig et.al. | 2512.07497 | null |
| 2025-12-08 | Enhancing Agentic RL with Progressive Reward Shaping and Value-based Sampling Policy Optimization | Zhuoran Zhuang et.al. | 2512.07478 | null |
| 2025-12-08 | Understanding LLM Agent Behaviours via Game Theory: Strategy Recognition, Biases and Multi-Agent Dynamics | Trung-Kiet Huynh et.al. | 2512.07462 | null |
| 2025-12-08 | Native Parallel Reasoner: Reasoning in Parallelism via Self-Distilled Reinforcement Learning | Tong Wu et.al. | 2512.07461 | link |
| 2025-12-08 | Persian-Phi: Efficient Cross-Lingual Adaptation of Compact LLMs via Curriculum Learning | Amir Mohammad Akhlaghi et.al. | 2512.07454 | null |
| 2025-12-08 | From Show Programmes to Data: Designing a Workflow to Make Performing Arts Ephemera Accessible Through Language Models | Clarisse Bardiot et.al. | 2512.07452 | null |
| 2025-12-08 | MIDG: Mixture of Invariant Experts with knowledge injection for Domain Generalization in Multimodal Sentiment Analysis | Yangle Li et.al. | 2512.07430 | null |
| 2025-12-08 | Revolutionizing Mixed Precision Quantization: Towards Training-free Automatic Proxy Discovery via Large Language Models | Haidong Kang et.al. | 2512.07419 | null |
| 2025-12-08 | Do LLMs Trust the Code They Write? | Francisco Ribeiro et.al. | 2512.07404 | null |
| 2025-12-08 | LUNE: Efficient LLM Unlearning via LoRA Fine-Tuning with Negative Examples | Yezi Liu et.al. | 2512.07375 | null |
| 2025-12-08 | Communication-Efficient Serving for Video Diffusion Models with Latent Parallelism | Zhiyuan Wu et.al. | 2512.07350 | null |
| 2025-12-08 | Generalized Referring Expression Segmentation on Aerial Photos | Luís Marnoto et.al. | 2512.07338 | link |
| 2025-12-08 | DCO: Dynamic Cache Orchestration for LLM Accelerators through Predictive Management | Zhongchun Zhou et.al. | 2512.07312 | null |
| 2025-12-08 | Exact Synthetic Populations for Scalable Societal and Market Modeling | Thierry Petit et.al. | 2512.07306 | null |
| 2025-12-08 | Towards Accurate UAV Image Perception: Guiding Vision-Language Models with Stronger Task Prompts | Mingning Guo et.al. | 2512.07302 | null |
| 2025-12-08 | Investigating Training and Generalization in Faithful Self-Explanations of Large Language Models | Tomoki Doi et.al. | 2512.07288 | null |
| 2025-12-08 | Automatic Syntax Error Repair for Discrete Controller Synthesis using Large Language Model | Yusei Ishimizu et.al. | 2512.07261 | null |
| 2025-12-08 | Ensembling LLM-Induced Decision Trees for Explainable and Robust Error Detection | Mengqi Wang et.al. | 2512.07246 | null |
| 2025-12-08 | NeSTR: A Neuro-Symbolic Abductive Framework for Temporal Reasoning in Large Language Models | Feng Liang et.al. | 2512.07218 | null |
| 2025-12-08 | MMRPT: MultiModal Reinforcement Pre-Training via Masked Vision-Dependent Reasoning | Xuhui Zheng et.al. | 2512.07203 | null |
| 2025-12-08 | Generating Storytelling Images with Rich Chains-of-Reasoning | Xiujie Song et.al. | 2512.07198 | null |
| 2025-12-08 | START: Spatial and Textual Learning for Chart Understanding | Zhuoming Liu et.al. | 2512.07186 | link |
| 2025-12-08 | ContextualSHAP : Enhancing SHAP Explanations Through Contextual Language Generation | Latifa Dwiyanti et.al. | 2512.07178 | null |
| 2025-12-08 | SPACE: Noise Contrastive Estimation Stabilizes Self-Play Fine-Tuning for Large Language Models | Yibo Wang et.al. | 2512.07175 | null |
| 2025-12-08 | Improving the Throughput of Diffusion-based Large Language Models via a Training-Free Confidence-Aware Calibration | Jucheng Shen et.al. | 2512.07173 | null |
| 2025-12-08 | When Privacy Meets Recovery: The Overlooked Half of Surrogate-Driven Privacy Preservation for MLLM Editing | Siyuan Xu et.al. | 2512.07166 | null |
| 2025-12-08 | A Large-Scale Multimodal Dataset and Benchmarks for Human Activity Scene Understanding and Reasoning | Siyang Jiang et.al. | 2512.07136 | null |
| 2025-12-08 | DART: Leveraging Multi-Agent Disagreement for Tool Recruitment in Multimodal Reasoning | Nithin Sivakumaran et.al. | 2512.07132 | null |
| 2025-12-08 | RisConFix: LLM-based Automated Repair of Risk-Prone Drone Configurations | Liping Han et.al. | 2512.07122 | null |
| 2025-12-08 | FOAM: Blocked State Folding for Memory-Efficient LLM Training | Ziqing Wen et.al. | 2512.07112 | null |
| 2025-12-08 | The Geometry of Persona: Disentangling Personality from Reasoning in Large Language Models | Zhixiang Wang et.al. | 2512.07092 | null |
| 2025-12-08 | Leveraging KV Similarity for Online Structured Pruning in LLMs | Jungmin Lee et.al. | 2512.07090 | null |
| 2025-12-08 | ThinkTrap: Denial-of-Service Attacks against Black-box LLM Services via Infinite Thinking | Yunzhe Li et.al. | 2512.07086 | null |
| 2025-12-08 | Do Large Language Models Truly Understand Cross-cultural Differences? | Shiwei Guo et.al. | 2512.07075 | null |
| 2025-12-08 | Replicating TEMPEST at Scale: Multi-Turn Adversarial Attacks Against Trillion-Parameter Frontier Models | Richard Young et.al. | 2512.07059 | null |
| 2025-12-07 | Reformulate, Retrieve, Localize: Agents for Repository-Level Bug Localization | Genevieve Caumartin et.al. | 2512.07022 | null |
| 2025-12-07 | Latency-Response Theory Model: Evaluating Large Language Models via Response Accuracy and Chain-of-Thought Length | Zhiyu Xu et.al. | 2512.07019 | null |
| 2025-12-07 | FVA-RAG: Falsification-Verification Alignment for Mitigating Sycophantic Hallucinations | Mayank Ravishankara et.al. | 2512.07015 | null |
| 2025-12-07 | Block Sparse Flash Attention | Daniel Ohayon et.al. | 2512.07011 | null |
| 2025-12-07 | Singing Timbre Popularity Assessment Based on Multimodal Large Foundation Model | Zihao Wang et.al. | 2512.06999 | null |
| 2025-12-07 | Prompting-in-a-Series: Psychology-Informed Contents and Embeddings for Personality Recognition With Decoder-Only Models | Jing Jie Tan et.al. | 2512.06991 | null |
| 2025-12-07 | Progress Ratio Embeddings: An Impatience Signal for Robust Length Control in Neural Text Generation | Ivanhoé Botcazou et.al. | 2512.06938 | null |
| 2025-12-07 | Large Language Models and Forensic Linguistics: Navigating Opportunities and Threats in the Age of Generative AI | George Mikros et.al. | 2512.06922 | null |
| 2025-12-07 | NeuroABench: A Multimodal Evaluation Benchmark for Neurosurgical Anatomy Identification | Ziyang Song et.al. | 2512.06921 | null |
| 2025-12-07 | SoK: Trust-Authorization Mismatch in LLM Agent Interactions | Guanquan Shi et.al. | 2512.06914 | null |
| 2025-12-07 | Robots with Attitudes: Influence of LLM-Driven Robot Personalities on Motivation and Performance | Dennis Becker et.al. | 2512.06910 | null |
| 2025-12-07 | BabelCoder: Agentic Code Translation with Specification Alignment | Fazle Rabbi et.al. | 2512.06902 | null |
| 2025-12-07 | An Analysis of Large Language Models for Simulating User Responses in Surveys | Ziyun Yu et.al. | 2512.06874 | null |
| 2025-12-07 | Rhea: Role-aware Heuristic Episodic Attention for Conversational LLMs | Wanyang Hong et.al. | 2512.06869 | null |
| 2025-12-07 | Do Persona-Infused LLMs Affect Performance in a Strategic Reasoning Game? | John Licato et.al. | 2512.06867 | null |
| 2025-12-07 | Less Is More, but Where? Dynamic Token Compression via LLM-Guided Keyframe Prior | Yulin Li et.al. | 2512.06866 | null |
| 2025-12-07 | Spatial Retrieval Augmented Autonomous Driving | Xiaosong Jia et.al. | 2512.06865 | null |
| 2025-12-07 | JT-DA: Enhancing Data Analysis with Tool-Integrated Table Reasoning Large Language Models | Ce Chi et.al. | 2512.06859 | null |
| 2025-12-07 | Formal that “Floats” High: Formal Verification of Floating Point Arithmetic | Hansa Mohanty et.al. | 2512.06850 | null |
| 2025-12-07 | CKG-LLM: LLM-Assisted Detection of Smart Contract Access Control Vulnerabilities Based on Knowledge Graphs | Xiaoqi Li et.al. | 2512.06846 | null |
| 2025-12-07 | Leveraging LLMs to support co-evolution between definitions and instances of textual DSLs | Weixing Zhang et.al. | 2512.06836 | null |
| 2025-12-07 | Large Language Model-Based Generation of Discharge Summaries | Tiago Rodrigues et.al. | 2512.06812 | null |
| 2025-12-07 | MMDuet2: Enhancing Proactive Interaction of Video MLLMs with Multi-Turn Reinforcement Learning | Yueqian Wang et.al. | 2512.06810 | null |
| 2025-12-07 | Optimal and Diffusion Transports in Machine Learning | Gabriel Peyré et.al. | 2512.06797 | null |
| 2025-12-07 | LLM4SFC: Sequential Function Chart Generation via Large Language Models | Ofek Glick et.al. | 2512.06787 | null |
| 2025-12-07 | From Description to Score: Can LLMs Quantify Vulnerabilities? | Sima Jafarikhah et.al. | 2512.06781 | null |
| 2025-12-07 | From Next-Token to Next-Block: A Principled Adaptation Path for Diffusion LLMs | Yuchuan Tian et.al. | 2512.06776 | link |
| 2025-12-07 | Becoming Experienced Judges: Selective Test-Time Learning for Evaluators | Seungyeon Jwa et.al. | 2512.06751 | null |
| 2025-12-07 | DoVer: Intervention-Driven Auto Debugging for LLM Multi-Agent Systems | Ming Ma et.al. | 2512.06749 | null |
| 2025-12-07 | PrivLLMSwarm: Privacy-Preserving LLM-Driven UAV Swarms for Secure IoT Surveillance | Jifar Wakuma Ayana et.al. | 2512.06747 | null |
| 2025-12-07 | A Patient-Doctor-NLP-System to contest inequality for less privileged | Subrit Dikshit et.al. | 2512.06734 | null |
| 2025-12-07 | “The Dentist is an involved parent, the bartender is not”: Revealing Implicit Biases in QA with Implicit BBQ | Aarushi Wagh et.al. | 2512.06732 | null |
| 2025-12-07 | KV-CAR: KV Cache Compression using Autoencoders and KV Reuse in Large Language Models | Sourjya Roy et.al. | 2512.06727 | null |
| 2025-12-07 | The Role of Entropy in Visual Grounding: Analysis and Optimization | Shuo Li et.al. | 2512.06726 | null |
| 2025-12-07 | ProAgent: Harnessing On-Demand Sensory Contexts for Proactive LLM Agent Systems | Bufang Yang et.al. | 2512.06721 | null |
| 2025-12-07 | Cognitive Control Architecture (CCA): A Lifecycle Supervision Framework for Robustly Aligned AI Agents | Zhibo Liang et.al. | 2512.06716 | null |
| 2025-11-06 | Trained on Tokens, Calibrated on Concepts: The Emergence of Semantic Calibration in LLMs | Preetum Nakkiran et.al. | 2511.04869 | null |
| 2025-11-06 | Software Defined Vehicle Code Generation: A Few-Shot Prompting Approach | Quang-Dung Nguyen et.al. | 2511.04849 | null |
| 2025-11-06 | Grounded Test-Time Adaptation for LLM Agents | Arthur Chen et.al. | 2511.04847 | null |
| 2025-11-06 | Explore Data Left Behind in Reinforcement Learning for Reasoning Language Models | Chenxi Liu et.al. | 2511.04800 | null |
| 2025-11-06 | ReGen: Generative Robot Simulation via Inverse Design | Phat Nguyen et.al. | 2511.04769 | null |
| 2025-11-06 | Surprisal reveals diversity gaps in image captioning and different scorers change the story | Nikolai Ilinykh et.al. | 2511.04754 | null |
| 2025-11-06 | Trustworthiness Calibration Framework for Phishing Email Detection Using Large Language Models | Daniyal Ganiuly et.al. | 2511.04728 | null |
| 2025-11-06 | IndicVisionBench: Benchmarking Cultural and Multilingual Understanding in VLMs | Ali Faraz et.al. | 2511.04727 | null |
| 2025-11-06 | Learning to reason about rare diseases through retrieval-augmented agents | Ha Young Kim et.al. | 2511.04720 | null |
| 2025-11-06 | Benchmark Designers Should “Train on the Test Set” to Expose Exploitable Non-Visual Shortcuts | Ellis Brown et.al. | 2511.04655 | null |
| 2025-11-06 | Logit-Entropy Adaptive Stopping Heuristic for Efficient Chain-of-Thought Reasoning | Mohammad Atif Quamar et.al. | 2511.04654 | null |
| 2025-11-06 | Optimal Inference Schedules for Masked Diffusion Models | Sitan Chen et.al. | 2511.04647 | null |
| 2025-11-06 | When retrieval outperforms generation: Dense evidence retrieval for scalable fake news detection | Alamgir Munir Qazi et.al. | 2511.04643 | link |
| 2025-11-06 | PixCLIP: Achieving Fine-grained Visual Language Understanding via Any-granularity Pixel-Text Alignment Learning | Yicheng Xiao et.al. | 2511.04601 | null |
| 2025-11-06 | Question the Questions: Auditing Representation in Online Deliberative Processes | Soham De et.al. | 2511.04588 | null |
| 2025-11-06 | ARETE: an R package for Automated REtrieval from TExt with large language models | Vasco V. Branco et.al. | 2511.04573 | null |
| 2025-11-06 | Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm | Jingqi Tong et.al. | 2511.04570 | link |
| 2025-11-06 | LLM-as-a-Judge: Toward World Models for Slate Recommendation Systems | Baptiste Bonin et.al. | 2511.04541 | null |
| 2025-11-06 | From Model to Breach: Towards Actionable LLM-Generated Vulnerabilities Reporting | Cyril Vallez et.al. | 2511.04538 | null |
| 2025-11-06 | Large Language Models for Cyber Security | Raunak Somani et.al. | 2511.04508 | null |
| 2025-11-06 | RAGalyst: Automated Human-Aligned Agentic Evaluation for Domain-Specific RAG | Joshua Gao et.al. | 2511.04502 | null |
| 2025-11-06 | Large language models replicate and predict human cooperation across experiments in game theory | Andrea Cera Palatsi et.al. | 2511.04500 | null |
| 2025-11-06 | Decoding Emergent Big Five Traits in Large Language Models: Temperature-Dependent Expression and Architectural Clustering | Christos-Nikolaos Zacharopoulos et.al. | 2511.04499 | null |
| 2025-11-06 | RUST-BENCH: Benchmarking LLM Reasoning on Unstructured Text within Structured Tables | Nikhil Abhyankar et.al. | 2511.04491 | null |
| 2025-11-06 | Perceptions of AI Bad Behavior: Variations on Discordant Non-Performance | Jaime Banks et.al. | 2511.04487 | null |
| 2025-11-06 | Promoting Sustainable Web Agents: Benchmarking and Estimating Energy Consumption through Empirical and Theoretical Analysis | Lars Krupp et.al. | 2511.04481 | null |
| 2025-11-06 | Enabling Dynamic Sparsity in Quantized LLM Inference | Rongxiang Wang et.al. | 2511.04477 | null |
| 2025-11-06 | Beyond Shortest Path: Agentic Vehicular Routing with Semantic Context | Carnot Braun et.al. | 2511.04464 | null |
| 2025-11-06 | Speed at the Cost of Quality? The Impact of LLM Agent Assistance on Software Development | Hao He et.al. | 2511.04427 | null |
| 2025-11-06 | The Illusion of Certainty: Uncertainty quantification for LLMs fails under ambiguity | Tim Tomov et.al. | 2511.04418 | null |
| 2025-11-06 | Post-Training LLMs as Better Decision-Making Agents: A Regret-Minimization Approach | Chanwoo Park et.al. | 2511.04393 | null |
| 2025-11-06 | Multi-Task Learning for Visually Grounded Reasoning in Gastrointestinal VQA | Itbaan Safwan et.al. | 2511.04384 | null |
| 2025-11-06 | HPC-Vis: A Visual Analytic System for Interactive Exploration of Historical Painter Cohorts | Yingping Yang et.al. | 2511.04383 | null |
| 2025-11-06 | Towards Aligning Multimodal LLMs with Human Experts: A Focus on Parent-Child Interaction | Weiyan Shi et.al. | 2511.04366 | null |
| 2025-11-06 | Where Do LLMs Still Struggle? An In-Depth Analysis of Code Generation Benchmarks | Amir Molzam Sharifloo et.al. | 2511.04355 | null |
| 2025-11-06 | Differentially Private In-Context Learning with Nearest Neighbor Search | Antti Koskela et.al. | 2511.04332 | null |
| 2025-11-06 | RxSafeBench: Identifying Medication Safety Issues of Large Language Models in Simulated Consultation | Jiahao Zhao et.al. | 2511.04328 | null |
| 2025-11-06 | AdversariaLLM: A Unified and Modular Toolbox for LLM Robustness Research | Tim Beyer et.al. | 2511.04316 | null |
| 2025-11-06 | Measuring economic outlook in the news timely and efficiently | Elliot Beck et.al. | 2511.04299 | null |
| 2025-11-06 | Robustness of Minimum-Volume Nonnegative Matrix Factorization under an Expanded Sufficiently Scattered Condition | Giovanni Barbarino et.al. | 2511.04291 | null |
| 2025-11-06 | A Tool for Benchmarking Large Language Models’ Robustness in Assessing the Realism of Driving Scenarios | Jiahui Wu et.al. | 2511.04267 | null |
| 2025-11-06 | SSPO: Subsentence-level Policy Optimization | Kun Yang et.al. | 2511.04256 | null |
| 2025-11-06 | Efficient Topic Extraction via Graph-Based Labeling: A Lightweight Alternative to Deep Models | Salma Mekaoui et.al. | 2511.04248 | null |
| 2025-11-06 | Reusing Pre-Training Data at Test Time is a Compute Multiplier | Alex Fang et.al. | 2511.04234 | null |
| 2025-11-06 | Black-Box Guardrail Reverse-engineering Attack | Hongwei Yao et.al. | 2511.04215 | null |
| 2025-11-06 | Block Rotation is All You Need for MXFP4 Quantization | Yuantian Shao et.al. | 2511.04214 | null |
| 2025-11-06 | Can we trust LLMs as a tutor for our students? Evaluating the Quality of LLM-generated Feedback in Statistics Exams | Markus Herklotz et.al. | 2511.04213 | null |
| 2025-11-06 | LLM-as-a-Judge is Bad, Based on AI Attempting the Exam Qualifying for the Member of the Polish National Board of Appeal | Michał Karp et.al. | 2511.04205 | null |
| 2025-11-06 | Computational Turing Test Reveals Systematic Differences Between Human and AI Language | Nicolò Pagan et.al. | 2511.04195 | null |
| 2025-11-06 | Explaining Software Vulnerabilities with Large Language Models | Oshando Johnson et.al. | 2511.04179 | null |
| 2025-11-06 | Transforming Mentorship: An AI Powered Chatbot Approach to University Guidance | Mashrur Rahman et.al. | 2511.04172 | null |
| 2025-11-06 | Are We Aligned? A Preliminary Investigation of the Alignment of Responsible AI Values between LLMs and Human Judgment | Asma Yamani et.al. | 2511.04157 | null |
| 2025-11-06 | BAPPA: Benchmarking Agents, Plans, and Pipelines for Automated Text-to-SQL Generation | Fahim Ahmed et.al. | 2511.04153 | null |
| 2025-11-06 | Implementation of transformer-based LLMs with large-scale optoelectronic neurons on a CMOS image sensor platform | Neil Na et.al. | 2511.04136 | null |
| 2025-11-06 | Exploring the Feasibility of End-to-End Large Language Model as a Compiler | Hongbin Zhang et.al. | 2511.04132 | null |
| 2025-11-06 | RIDE: Difficulty Evolving Perturbation with Item Response Theory for Mathematical Reasoning | Xinyuan Li et.al. | 2511.04120 | null |
| 2025-11-06 | How Natural Language Proficiency Shapes GenAI Code for Software Engineering Tasks | Ruksit Rojpaisarnkit et.al. | 2511.04115 | null |
| 2025-11-06 | Batch Prompting Suppresses Overthinking Reasoning Under Constraint: How Batch Prompting Suppresses Overthinking in Reasoning Models | Wenmo Qiu et.al. | 2511.04108 | null |
| 2025-11-06 | KGFR: A Foundation Retriever for Generalized Knowledge Graph Question Answering | Yuanning Cui et.al. | 2511.04093 | null |
| 2025-11-06 | E-CARE: An Efficient LLM-based Commonsense-Augmented Framework for E-Commerce | Ge Zhang et.al. | 2511.04087 | null |
| 2025-11-06 | Caption Injection for Optimization in Generative Search Engine | Xiaolu Chen et.al. | 2511.04080 | null |
| 2025-11-06 | The truth is no diaper: Human and AI-generated associations to emotional words | Špela Vintar et.al. | 2511.04077 | null |
| 2025-11-06 | Agentmandering: A Game-Theoretic Framework for Fair Redistricting via Large Language Model Agents | Hao Li et.al. | 2511.04076 | null |
| 2025-11-06 | Plan of Knowledge: Retrieval-Augmented Large Language Models for Temporal Knowledge Graph Question Answering | Xinying Qian et.al. | 2511.04072 | null |
| 2025-11-06 | TXL Fusion: A Hybrid Machine Learning Framework Integrating Chemical Heuristics and Large Language Models for Topological Materials Discovery | Arif Ullah et.al. | 2511.04068 | null |
| 2025-11-06 | DartQuant: Efficient Rotational Distribution Calibration for LLM Quantization | Yuantian Shao et.al. | 2511.04063 | null |
| 2025-11-06 | Interpreting Multi-Attribute Confounding through Numerical Attributes in Large Language Models | Hirohane Takagi et.al. | 2511.04053 | null |
| 2025-11-06 | An LLM-based Framework for Human-Swarm Teaming Cognition in Disaster Search and Rescue | Kailun Ji et.al. | 2511.04042 | null |
| 2025-11-06 | PICNIC: Silicon Photonic Interconnected Chiplets with Computational Network and In-memory Computing for LLM Inference Acceleration | Yue Jiet Chong et.al. | 2511.04036 | null |
| 2025-11-06 | Detecting Silent Failures in Multi-Agentic AI Trajectories | Divya Pathak et.al. | 2511.04032 | null |
| 2025-11-06 | Abductive Inference in Retrieval-Augmented Language Models: Generating and Validating Missing Premises | Shiyin Lin et.al. | 2511.04020 | null |
| 2025-11-06 | Specification-Guided Vulnerability Detection with Large Language Models | Hao Zhu et.al. | 2511.04014 | link |
| 2025-11-06 | PSD2Code: Automated Front-End Code Generation from Design Files via Multimodal Large Language Models | Yongxi Chen et.al. | 2511.04012 | null |
| 2025-11-06 | Memory- and Latency-Constrained Inference of Large Language Models via Adaptive Split Computing | Mingyu Sung et.al. | 2511.04002 | null |
| 2025-11-06 | Hybrid Fuzzing with LLM-Guided Input Mutation and Semantic Feedback | Shiyin Lin et.al. | 2511.03995 | null |
| 2025-11-06 | TwIST: Rigging the Lottery in Transformers with Independent Subnetwork Training | Michael Menezes et.al. | 2511.03983 | null |
| 2025-11-06 | LLMs and Cultural Values: the Impact of Prompt Language and Explicit Cultural Framing | Bram Bulté et.al. | 2511.03980 | null |
| 2025-11-06 | Direct Semantic Communication Between Large Language Models via Vector Translation | Fu-Chun Yang et.al. | 2511.03945 | null |
| 2025-11-06 | MIDI-LLM: Adapting Large Language Models for Text-to-MIDI Music Generation | Shih-Lun Wu et.al. | 2511.03942 | null |
| 2025-11-06 | RLHF: A comprehensive Survey for Cultural, Multimodal and Low Latency Alignment Methods | Raghav Sharma et.al. | 2511.03939 | null |
| 2025-11-06 | SynQuE: Estimating Synthetic Dataset Quality Without Annotations | Arthur Chen et.al. | 2511.03928 | null |
| 2025-11-06 | Collaborative Agents for Automated Program Repair in Ruby | Nikta Akbarpour et.al. | 2511.03925 | null |
| 2025-11-05 | The Human Flourishing Geographic Index: A County-Level Dataset for the United States, 2013–2023 | Stefano M. Iacus et.al. | 2511.03915 | null |
| 2025-11-05 | GRAD: Graph-Retrieved Adaptive Decoding for Hallucination Mitigation | Manh Nguyen et.al. | 2511.03900 | null |
| 2025-11-05 | Secure Code Generation at Scale with Reflexion | Arup Datta et.al. | 2511.03898 | null |
| 2025-11-05 | KnowThyself: An Agentic Assistant for LLM Interpretability | Suraj Prasai et.al. | 2511.03878 | null |
| 2025-11-05 | OMPILOT: Harnessing Transformer Models for Auto Parallelization to Shared Memory Computing Paradigms | Arijit Bhattacharjee et.al. | 2511.03866 | null |
| 2025-11-05 | GAIA: Geothermal Analytics and Intelligent Agent | Randy Harsuko et.al. | 2511.03852 | null |
| 2025-11-05 | To See or To Read: User Behavior Reasoning in Multimodal LLMs | Tianning Dong et.al. | 2511.03845 | null |
| 2025-11-05 | ASAP: an Agentic Solution to Auto-optimize Performance of Large-Scale LLM Training | Yuran Ding et.al. | 2511.03844 | null |
| 2025-11-05 | Divide, Cache, Conquer: Dichotomic Prompting for Efficient Multi-Label LLM-Based Classification | Mikołaj Langner et.al. | 2511.03830 | null |
| 2025-11-05 | STARS: Segment-level Token Alignment with Rejection Sampling in Large Language Models | Mohammad Atif Quamar et.al. | 2511.03827 | null |
| 2025-11-05 | How Different Tokenization Algorithms Impact LLMs and Transformer Models for Binary Code Analysis | Ahmed Mostafa et.al. | 2511.03825 | null |
| 2025-11-05 | PLLuM: A Family of Polish Large Language Models | Jan Kocoń et.al. | 2511.03823 | null |
| 2025-11-05 | Expert Evaluation of LLM World Models: A High-$T_c$ Superconductivity Case Study | Haoyu Guo et.al. | 2511.03782 | null |
| 2025-11-05 | Scaling Agent Learning via Experience Synthesis | Zhaorun Chen et.al. | 2511.03773 | null |
| 2025-11-05 | Disentangled Concepts Speak Louder Than Words: Explainable Video Action Recognition | Jongseo Lee et.al. | 2511.03725 | null |
| 2025-11-05 | Outbidding and Outbluffing Elite Humans: Mastering Liar’s Poker via Self-Play and Reinforcement Learning | Richard Dewey et.al. | 2511.03724 | null |
| 2025-11-05 | LLM-enhanced Air Quality Monitoring Interface via Model Context Protocol | Yu-Erh Pan et.al. | 2511.03706 | null |
| 2025-11-05 | Do Androids Dream of Unseen Puppeteers? Probing for a Conspiracy Mindset in Large Language Models | Francesco Corso et.al. | 2511.03699 | null |
| 2025-11-05 | AnaFlow: Agentic LLM-based Workflow for Reasoning-Driven Explainable and Sample-Efficient Analog Circuit Sizing | Mohsen Ahmadzadeh et.al. | 2511.03697 | null |
| 2025-11-05 | Whisper Leak: a side-channel attack on Large Language Models | Geoff McDonald et.al. | 2511.03675 | null |
| 2025-11-05 | Watermarking Large Language Models in Europe: Interpreting the AI Act in Light of Technology | Thomas Souverain et.al. | 2511.03641 | null |
| 2025-11-05 | Towards Transparent Stance Detection: A Zero-Shot Approach Using Implicit and Explicit Interpretability | Apoorva Upadhyaya et.al. | 2511.03635 | null |
| 2025-11-05 | LiveTradeBench: Seeking Real-World Alpha with Large Language Models | Haofei Yu et.al. | 2511.03628 | null |
| 2025-11-05 | PerfDojo: Automated ML Library Generation for Heterogeneous Architectures | Andrei Ivanov et.al. | 2511.03586 | null |
| 2025-11-05 | ASVRI-Legal: Fine-Tuning LLMs with Retrieval Augmented Generation for Enhanced Legal Regulation | One Octadion et.al. | 2511.03563 | null |
| 2025-11-05 | MultiZebraLogic: A Multilingual Logical Reasoning Benchmark | Sofie Helene Bruun et.al. | 2511.03553 | null |
| 2025-11-05 | Uncovering Code Insights: Leveraging GitHub Artifacts for Deeper Code Understanding | Ziv Nevo et.al. | 2511.03549 | null |
| 2025-11-05 | U2F: Encouraging SWE-Agent to Seize Novelty without Losing Feasibility | Wencheng Ye et.al. | 2511.03517 | null |
| 2025-11-05 | One Battle After Another: Probing LLMs’ Limits on Multi-Turn Instruction Following with a Benchmark Evolving Framework | Qi Jia et.al. | 2511.03508 | null |
| 2025-11-05 | BanglaSTEM: A Parallel Corpus for Technical Domain Bangla-English Translation | Kazi Reyazul Hasan et.al. | 2511.03498 | null |
| 2025-11-05 | RAGBoost: Efficient Retrieval-Augmented Generation with Accuracy-Preserving Context Reuse | Yinsicheng Jiang et.al. | 2511.03475 | null |
| 2025-11-05 | Towards Scalable Web Accessibility Audit with MLLMs as Copilots | Ming Gu et.al. | 2511.03471 | null |
| 2025-11-05 | CareMedEval dataset: Evaluating Critical Appraisal and Reasoning in the Biomedical Field | Doria Bonzi et.al. | 2511.03441 | null |
| 2025-11-05 | Light over Heavy: Automated Performance Requirements Quantification with Linguistic Inducement | Shihai Wang et.al. | 2511.03421 | null |
| 2025-11-05 | Knowledge-Augmented Question Error Correction for Chinese Question Answer System with QuestionRAG | Longpeng Qiu et.al. | 2511.03410 | null |
| 2025-11-05 | Efficient Reasoning via Thought-Training and Thought-Free Inference | Canhui Wu et.al. | 2511.03408 | null |
| 2025-11-05 | Towards Realistic Project-Level Code Generation via Multi-Agent Collaboration and Semantic Architecture Modeling | Qianhui Zhao et.al. | 2511.03404 | null |
| 2025-11-05 | GUIDES: Guidance Using Instructor-Distilled Embeddings for Pre-trained Robot Policy Enhancement | Minquan Gao et.al. | 2511.03400 | null |
| 2025-11-05 | Computational Imaging Meets LLMs: Zero-Shot IDH Mutation Prediction in Brain Gliomas | Syed Muqeem Mahmood et.al. | 2511.03376 | null |
| 2025-11-05 | LFC-DA: Logical Formula-Controlled Data Augmentation for Enhanced Logical Reasoning | Shenghao Li et.al. | 2511.03372 | null |
| 2025-11-05 | EQ-Negotiator: Dynamic Emotional Personas Empower Small Language Models for Edge-Deployable Credit Negotiation | Yunbo Long et.al. | 2511.03370 | null |
| 2025-11-05 | Silenced Biases: The Dark Side LLMs Learned to Refuse | Rom Himelstein et.al. | 2511.03369 | null |
| 2025-11-05 | A Modular, Data-Free Pipeline for Multi-Label Intention Recognition in Transportation Agentic AI Applications | Xiaocai Zhang et.al. | 2511.03363 | null |
| 2025-11-05 | Multi-Object Tracking Retrieval with LLaVA-Video: A Training-Free Solution to MOT25-StAG Challenge | Yi Yang et.al. | 2511.03332 | null |
| 2025-11-05 | Benchmarking the Thinking Mode of Multimodal Large Language Models in Clinical Tasks | Jindong Hong et.al. | 2511.03328 | null |
| 2025-11-05 | SurgViVQA: Temporally-Grounded Video Question Answering for Surgical Scene Understanding | Mauro Orazio Drago et.al. | 2511.03325 | null |
| 2025-11-05 | TASU: Text-Only Alignment for Speech Understanding | Jing Peng et.al. | 2511.03310 | null |
| 2025-11-05 | How to Evaluate Speech Translation with Source-Aware Neural MT Metrics | Mauro Cettolo et.al. | 2511.03295 | null |
| 2025-11-05 | UMDAM: A Unified Data Layout and DRAM Address Mapping for Heterogenous NPU-PIM | Hai Huang et.al. | 2511.03293 | null |
| 2025-11-05 | Let the Bees Find the Weak Spots: A Path Planning Perspective on Multi-Turn Jailbreak Attacks against LLMs | Yize Liu et.al. | 2511.03271 | null |
| 2025-11-05 | SCALE: Upscaled Continual Learning of Large Language Models | Jin-woo Lee et.al. | 2511.03270 | null |
| 2025-11-05 | Comparing the Performance of LLMs in RAG-based Question-Answering: A Case Study in Computer Science Literature | Ranul Dayarathne et.al. | 2511.03261 | null |
| 2025-11-05 | Auditing M-LLMs for Privacy Risks: A Synthetic Benchmark and Evaluation Framework | Junhao Li et.al. | 2511.03248 | null |
| 2025-11-05 | Death by a Thousand Prompts: Open Model Vulnerability Analysis | Amy Chang et.al. | 2511.03247 | null |
| 2025-11-05 | IndicSuperTokenizer: An Optimized Tokenizer for Indic Multilingual LLMs | Souvik Rana et.al. | 2511.03237 | null |
| 2025-11-05 | From Five Dimensions to Many: Large Language Models as Precise and Interpretable Psychological Profilers | Yi-Fei Liu et.al. | 2511.03235 | null |
| 2025-11-05 | Multimodal-Wireless: A Large-Scale Dataset for Sensing and Communication | Tianhao Mao et.al. | 2511.03220 | null |
| 2025-11-05 | Hybrid Fact-Checking that Integrates Knowledge Graphs, Large Language Models, and Search-Based Retrieval Agents Improves Interpretable Claim Verification | Shaghayegh Kolli et.al. | 2511.03217 | null |
| 2025-11-05 | LGM: Enhancing Large Language Models with Conceptual Meta-Relations and Iterative Retrieval | Wenchang Lei et.al. | 2511.03214 | null |
| 2025-11-05 | QG-CoC: Question-Guided Chain-of-Captions for Large Multimodal Models | Kuei-Chun Kao et.al. | 2511.03206 | null |
| 2025-11-05 | Large Language Models as Information Sources: Distinctive Characteristics and Types of Low-Quality Information | Jiawei Zhou et.al. | 2511.03198 | null |
| 2025-11-05 | Understanding Robustness of Model Editing in Code LLMs: An Empirical Study | Vinaik Chhetri et.al. | 2511.03182 | null |
| 2025-11-05 | Learning-based Cooperative Robotic Paper Wrapping: A Unified Control Policy with Residual Force Control | Rewida Ali et.al. | 2511.03181 | null |
| 2025-11-05 | BengaliMoralBench: A Benchmark for Auditing Moral Reasoning in Large Language Models within Bengali Language and Culture | Shahriyar Zaman Ridoy et.al. | 2511.03180 | null |
| 2025-11-05 | Toward Autonomous Engineering Design: A Knowledge-Guided Multi-Agent Framework | Varun Kumar et.al. | 2511.03179 | null |
| 2025-11-05 | SurgAnt-ViVQA: Learning to Anticipate Surgical Events through GRU-Driven Temporal Cross-Attention | Shreyas C. Dhake et.al. | 2511.03178 | null |
| 2025-11-05 | AI as We Describe It: How Large Language Models and Their Applications in Health are Represented Across Channels of Public Discourse | Jiawei Zhou et.al. | 2511.03174 | null |
| 2025-11-05 | Measuring Aleatoric and Epistemic Uncertainty in LLMs: Empirical Evaluation on ID and OOD QA Tasks | Kevin Wang et.al. | 2511.03166 | null |
| 2025-11-05 | RefAgent: A Multi-agent LLM-based Framework for Automatic Software Refactoring | Khouloud Oueslati et.al. | 2511.03153 | null |
| 2025-11-05 | From Measurement to Expertise: Empathetic Expert Adapters for Context-Based Empathy in Conversational AI Agents | Erfan Shayegani et.al. | 2511.03143 | null |
| 2025-11-05 | A Proprietary Model-Based Safety Response Framework for AI Agents | Qi Li et.al. | 2511.03138 | null |
| 2025-11-05 | Using Multi-modal Large Language Model to Boost Fireworks Algorithm’s Ability in Settling Challenging Optimization Tasks | Shipeng Cen et.al. | 2511.03137 | null |
| 2025-11-05 | From Insight to Exploit: Leveraging LLM Collaboration for Adaptive Adversarial Text Generation | Najrin Sultana et.al. | 2511.03128 | null |
| 2025-11-05 | Control Barrier Function for Aligning Large Language Models | Yuya Miyaoka et.al. | 2511.03121 | null |
| 2025-11-05 | Large language models require a new form of oversight: capability-based monitoring | Katherine C. Kellogg et.al. | 2511.03106 | null |
| 2025-11-05 | CARMA: Comprehensive Automatically-annotated Reddit Mental Health Dataset for Arabic | Saad Mankarious et.al. | 2511.03102 | null |
| 2025-11-05 | ALAS: Transactional and Dynamic Multi-Agent LLM Planning | Longling Geng et.al. | 2511.03094 | null |
| 2025-11-05 | SnapStream: Efficient Long Sequence Decoding on Dataflow Accelerators | Jonathan Li et.al. | 2511.03092 | null |
| 2025-11-05 | PolyNorm: Few-Shot LLM-Based Text Normalization for Text-to-Speech | Michel Wong et.al. | 2511.03080 | null |
| 2025-11-04 | A Collaborative Reasoning Framework for Anomaly Diagnostics in Underwater Robotics | Markus Buchholz et.al. | 2511.03075 | null |
| 2025-11-04 | Epidemiology of Large Language Models: A Benchmark for Observational Distribution Knowledge | Drago Plecko et.al. | 2511.03070 | null |
| 2025-11-04 | Reading Between the Lines: The One-Sided Conversation Problem | Victoria Ebert et.al. | 2511.03056 | null |
| 2025-11-04 | No-Human in the Loop: Agentic Evaluation at Scale for Recommendation | Tao Zhang et.al. | 2511.03051 | null |
| 2025-11-04 | ROBoto2: An Interactive System and Dataset for LLM-assisted Clinical Trial Risk of Bias Assessment | Anthony Hevia et.al. | 2511.03048 | null |
| 2025-11-04 | Unsupervised Evaluation of Multi-Turn Objective-Driven Interactions | Emi Soroka et.al. | 2511.03047 | null |
| 2025-11-04 | Data-Efficient Adaptation and a Novel Evaluation Method for Aspect-based Sentiment Analysis | Yan Cathy Hua et.al. | 2511.03034 | null |
| 2025-11-04 | PublicAgent: Multi-Agent Design Principles From an LLM-Based Open Data Analysis Framework | Sina Montazeri et.al. | 2511.03023 | null |
| 2025-11-04 | LEGO-Eval: Towards Fine-Grained Evaluation on Synthesizing 3D Embodied Environments with Tool Augmentation | Gyeom Hwangbo et.al. | 2511.03001 | null |
| 2025-11-04 | Zero-shot data citation function classification using transformer-based large language models (LLMs) | Neil Byers et.al. | 2511.02936 | null |
| 2025-11-04 | Cache Mechanism for Agent RAG Systems | Shuhang Lin et.al. | 2511.02919 | null |
| 2025-11-04 | Adaptive and Robust Data Poisoning Detection and Sanitization in Wearable IoT Systems using Large Language Models | W. K. M Mithsara et.al. | 2511.02894 | null |
| 2025-11-04 | Agent-Omni: Test-Time Multimodal Reasoning via Model Coordination for Understanding Anything | Huawei Lin et.al. | 2511.02834 | null |
| 2025-11-04 | Can LLMs subtract numbers? | Mayank Jobanputra et.al. | 2511.02795 | null |
| 2025-11-04 | When One Modality Sabotages the Others: A Diagnostic Lens on Multimodal Reasoning | Chenyu Zhang et.al. | 2511.02794 | null |
| 2025-11-04 | When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought | Yiyang Zhou et.al. | 2511.02779 | null |
| 2025-11-04 | ConMeZO: Adaptive Descent-Direction Sampling for Gradient-Free Finetuning of Large Language Models | Lejs Deen Behric et.al. | 2511.02757 | null |
| 2025-11-04 | Controlling Performance and Budget of a Centralized Multi-agent LLM System with Reinforcement Learning | Bowen Jin et.al. | 2511.02755 | null |
| 2025-11-04 | AI Diffusion in Low Resource Language Countries | Amit Misra et.al. | 2511.02752 | null |
| 2025-11-04 | Agentic World Modeling for 6G: Near-Real-Time Generative State-Space Reasoning | Farhad Rezazadeh et.al. | 2511.02748 | null |
| 2025-11-04 | CostBench: Evaluating Multi-Turn Cost-Optimal Planning and Adaptation in Dynamic Environments for LLM Tool-Use Agents | Jiayu Liu et.al. | 2511.02734 | null |
| 2025-11-04 | LLEXICORP: End-user Explainability of Convolutional Neural Networks | Vojtěch Kůr et.al. | 2511.02720 | null |
| 2025-11-04 | ReleaseEval: A Benchmark for Evaluating Language Models in Automated Release Note Generation | Qianru Meng et.al. | 2511.02713 | null |
| 2025-11-04 | VidEmo: Affective-Tree Reasoning for Emotion-Centric Video Foundation Models | Zhicheng Zhang et.al. | 2511.02712 | null |
| 2025-11-04 | Curriculum Design for Trajectory-Constrained Agent: Compressing Chain-of-Thought Tokens in LLMs | Georgios Tzannetos et.al. | 2511.02690 | null |
| 2025-11-04 | Optimal Singular Damage: Efficient LLM Inference in Low Storage Regimes | Mohammadsajad Alipour et.al. | 2511.02681 | null |
| 2025-11-04 | EasyTUS: A Comprehensive Framework for Fast and Accurate Table Union Search across Data Lakes | Tim Otto et.al. | 2511.02674 | null |
| 2025-11-04 | Apriel-H1: Towards Efficient Enterprise Reasoning Models | Oleksiy Ostapenko et.al. | 2511.02651 | null |
| 2025-11-04 | Federated Attention: A Distributed Paradigm for Collaborative LLM Inference over Edge Networks | Xiumei Deng et.al. | 2511.02647 | null |
| 2025-11-04 | DecompSR: A dataset for decomposed analyses of compositional multihop spatial reasoning | Lachlan McPheat et.al. | 2511.02627 | null |
| 2025-11-04 | Understanding New-Knowledge-Induced Factual Hallucinations in LLMs: Analysis, Solution, and Interpretation | Renfei Dang et.al. | 2511.02626 | null |
| 2025-11-04 | The Realignment Problem: When Right becomes Wrong in LLMs | Aakash Sen Sharma et.al. | 2511.02623 | null |
| 2025-11-04 | Verifying LLM Inference to Prevent Model Weight Exfiltration | Roy Rinberg et.al. | 2511.02620 | null |
| 2025-11-04 | UniChange: Unifying Change Detection with Multimodal Large Language Model | Xu Zhang et.al. | 2511.02607 | null |
| 2025-11-04 | CGES: Confidence-Guided Early Stopping for Efficient and Accurate Self-Consistency | Ehsan Aghazadeh et.al. | 2511.02603 | null |
| 2025-11-04 | Next Token Knowledge Tracing: Exploiting Pretrained LLM Representations to Decode Student Behaviour | Max Norris et.al. | 2511.02599 | null |
| 2025-11-04 | A Large Language Model for Corporate Credit Scoring | Chitro Majumdar et.al. | 2511.02593 | null |
| 2025-11-04 | The ORCA Benchmark: Evaluating Real-World Calculation Accuracy in Large Language Models | Claudia Herambourg et.al. | 2511.02589 | null |
| 2025-11-04 | Smart-Hiring: An Explainable end-to-end Pipeline for CV Information Extraction and Job Matching | Kenza Khelkhal et.al. | 2511.02537 | null |
| 2025-11-04 | Knowledge Graph-enhanced Large Language Model for Incremental Game PlayTesting | Enhong Mu et.al. | 2511.02534 | null |
| 2025-11-04 | Causal Graph Neural Networks for Healthcare | Munib Mesinovic et.al. | 2511.02531 | null |
| 2025-11-04 | Large Lemma Miners: Can LLMs do Induction Proofs for Hardware? | Romy Peled et.al. | 2511.02521 | null |
| 2025-11-04 | ESA: Energy-Based Shot Assembly Optimization for Automatic Video Editing | Yaosen Chen et.al. | 2511.02505 | null |
| 2025-11-04 | BRAINS: A Retrieval-Augmented System for Alzheimer’s Detection and Monitoring | Rajan Das Gupta et.al. | 2511.02490 | null |
| 2025-11-04 | Object Detection as an Optional Basis: A Graph Matching Network for Cross-View UAV Localization | Tao Liu et.al. | 2511.02489 | link |
| 2025-11-04 | Modeling Hawkish-Dovish Latent Beliefs in Multi-Agent Debate-Based LLMs for Monetary Policy Decision Classification | Kaito Takano et.al. | 2511.02469 | null |
| 2025-11-04 | Auditable-choice reframing unlocks RL-based verification for open-ended tasks | Mengyu Zhang et.al. | 2511.02463 | null |
| 2025-11-04 | Prompting for Policy: Forecasting Macroeconomic Scenarios with Synthetic LLM Personas | Giulia Iadisernia et.al. | 2511.02458 | null |
| 2025-11-04 | Who’s Who? LLM-assisted Software Traceability with Architecture Entity Recognition | Dominik Fuchß et.al. | 2511.02434 | null |
| 2025-11-04 | Can Conversational AI Counsel for Change? A Theory-Driven Approach to Supporting Dietary Intentions in Ambivalent Individuals | Michelle Bak et.al. | 2511.02428 | null |
| 2025-11-04 | From the Laboratory to Real-World Application: Evaluating Zero-Shot Scene Interpretation on Edge Devices for Mobile Robotics | Nicolas Schuler et.al. | 2511.02427 | null |
| 2025-11-04 | ReAcTree: Hierarchical LLM Agent Trees with Control Flow for Long-Horizon Task Planning | Jae-Woo Choi et.al. | 2511.02424 | null |
| 2025-11-04 | LLM4PG: Adapting Large Language Model for Pathloss Map Generation via Synesthesia of Machines | Mingran Sun et.al. | 2511.02423 | null |
| 2025-11-04 | ChartM$^3$: A Multi-Stage Code-Driven Pipeline for Constructing Multi-Dimensional and Multi-Step Visual Reasoning Data in Chart Comprehension | Duo Xu et.al. | 2511.02415 | null |
| 2025-11-04 | EvoDev: An Iterative Feature-Driven Framework for End-to-End Software Development with LLM-based Agents | Junwei Liu et.al. | 2511.02399 | null |
| 2025-11-04 | RxnCaption: Reformulating Reaction Diagram Parsing as Visual Prompt Guided Captioning | Jiahe Song et.al. | 2511.02384 | null |
| 2025-11-04 | Revisiting put-that-there, context aware window interactions via LLMs | Riccardo Bovo et.al. | 2511.02378 | null |
| 2025-11-04 | AutoAdv: Automated Adversarial Prompting for Multi-Turn Jailbreaking of Large Language Models | Aashray Reddy et.al. | 2511.02376 | null |
| 2025-11-04 | AyurParam: A State-of-the-Art Bilingual Language Model for Ayurveda | Mohd Nauman et.al. | 2511.02374 | null |
| 2025-11-04 | LUMA-RAG: Lifelong Multimodal Agents with Provably Stable Streaming Alignment | Rohan Wandre et.al. | 2511.02371 | null |
| 2025-11-04 | An LLM-powered MILP modelling engine for workforce scheduling guided by expert knowledge | Qingyang Li et.al. | 2511.02364 | null |
| 2025-11-04 | Let Multimodal Embedders Learn When to Augment Query via Adaptive Query Augmentation | Wongyu Kim et.al. | 2511.02358 | null |
| 2025-11-04 | An Automated Framework for Strategy Discovery, Retrieval, and Evolution in LLM Jailbreak Attacks | Xu Liu et.al. | 2511.02356 | null |
| 2025-11-04 | LTD-Bench: Evaluating Large Language Models by Letting Them Draw | Liuhao Lin et.al. | 2511.02347 | null |
| 2025-11-04 | Unlocking the Power of Multi-Agent LLM for Reasoning: From Lazy Agents to Deliberation | Zhiwei Zhang et.al. | 2511.02303 | null |
| 2025-11-04 | VFocus: Better Verilog Generation from Large Language Model via Focused Reasoning | Zhuorui Zhao et.al. | 2511.02285 | null |
| 2025-11-04 | SAIL-RL: Guiding MLLMs in When and How to Think via Dual-Reward RL Tuning | Fangxun Shu et.al. | 2511.02280 | link |
| 2025-11-04 | LA-MARRVEL: A Knowledge-Grounded and Language-Aware LLM Reranker for AI-MARRVEL in Rare Disease Diagnosis | Jaeyeon Lee et.al. | 2511.02263 | null |
| 2025-11-04 | When Modalities Conflict: How Unimodal Reasoning Uncertainty Governs Preference Dynamics in MLLMs | Zhuoran Zhang et.al. | 2511.02243 | null |
| 2025-11-04 | Deep Ideation: Designing LLM Agents to Generate Novel Research Ideas on Scientific Concept Network | Keyu Zhao et.al. | 2511.02238 | null |
| 2025-11-04 | An Evaluation of Interleaved Instruction Tuning on Semantic Reasoning Performance in an Audio MLLM | Jiawei Liu et.al. | 2511.02234 | null |
| 2025-11-04 | Quantitative Risk Assessment in Radiation Oncology via LLM-Powered Root Cause Analysis of Incident Reports | Yuntao Wang et.al. | 2511.02223 | null |
| 2025-11-04 | TabDSR: Decompose, Sanitize, and Reason for Complex Numerical Reasoning in Tabular Data | Changjiang Jiang et.al. | 2511.02219 | null |
| 2025-11-04 | IG-Pruning: Input-Guided Block Pruning for Large Language Models | Kangyu Qiao et.al. | 2511.02213 | null |
| 2025-11-04 | Language-Enhanced Generative Modeling for PET Synthesis from MRI and Blood Biomarkers | Zhengjie Zhang et.al. | 2511.02206 | null |
| 2025-11-04 | LLMs as Judges: Toward The Automatic Review of GSN-compliant Assurance Cases | Gerhard Yu et.al. | 2511.02203 | null |
| 2025-11-04 | Optimal-Agent-Selection: State-Aware Routing Framework for Efficient Multi-Agent Collaboration | Jingbo Wang et.al. | 2511.02200 | null |
| 2025-11-04 | Open the Oyster: Empirical Evaluation and Improvement of Code Reasoning Confidence in LLMs | Shufan Wang et.al. | 2511.02197 | null |
| 2025-11-04 | Personalized Decision Modeling: Utility Optimization or Textualized-Symbolic Reasoning | Yibo Zhao et.al. | 2511.02194 | null |
| 2025-11-04 | Pinpointing Trigger Moment for Grounded Video QA: Enhancing Spatio-temporal Grounding in Multimodal Large Language Models | Jinhwan Seo et.al. | 2511.02182 | null |
| 2025-11-04 | Eliminating Multi-GPU Performance Taxes: A Systems Approach to Efficient Distributed LLMs | Octavian Alexandru Trifan et.al. | 2511.02168 | null |
| 2025-11-03 | Rethinking LLM Human Simulation: When a Graph is What You Need | Joseph Suh et.al. | 2511.02135 | null |
| 2025-11-03 | InsurAgent: A Large Language Model-Empowered Agent for Simulating Individual Behavior in Purchasing Flood Insurance | Ziheng Geng et.al. | 2511.02119 | null |
| 2025-11-03 | Deep Value Benchmark: Measuring Whether Models Generalize Deep values or Shallow Preferences | Joshua Ashkinaze et.al. | 2511.02109 | null |
| 2025-11-03 | Metamorphic Testing of Large Language Models for Natural Language Processing | Steven Cho et.al. | 2511.02108 | null |
| 2025-11-03 | LLM Probing with Contrastive Eigenproblems: Improving Understanding and Applicability of CCS | Stefan F. Schouten et.al. | 2511.02089 | null |
| 2025-11-03 | Watermarking Discrete Diffusion Language Models | Avi Bagchi et.al. | 2511.02083 | null |
| 2025-10-10 | A Unified Biomedical Named Entity Recognition Framework with Large Language Models | Tengxiao Lv et.al. | 2510.08902 | null |
| 2025-09-25 | SCRA-VQA: Summarized Caption-Rerank for Augmented Large Language Models in Visual Question Answering | Yan Zhang et.al. | 2509.20871 | null |
| 2025-08-12 | LLaMA-Based Models for Aspect-Based Sentiment Analysis | Jakub Šmíd et.al. | 2508.08649 | null |
| 2025-07-23 | BetterCheck: Towards Safeguarding VLMs for Automotive Perception Systems | Malsha Ashani Mahawatta Dona et.al. | 2507.17722 | null |
| 2025-07-23 | AI Telephone Surveying: Automating Quantitative Data Collection with an AI Interviewer | Danny D. Leybzon et.al. | 2507.17718 | null |
| 2025-07-23 | HydraOpt: Navigating the Efficiency-Performance Trade-off of Adapter Merging | Taha Ceritli et.al. | 2507.17706 | null |
| 2025-07-23 | Towards Greater Leverage: Scaling Laws for Efficient Mixture-of-Experts Language Models | Changxin Tian et.al. | 2507.17702 | null |
| 2025-07-23 | Thinking Isn’t an Illusion: Overcoming the Limitations of Reasoning Models via Tool Augmentations | Zhao Song et.al. | 2507.17699 | null |
| 2025-07-23 | Symbiotic Agents: A Novel Paradigm for Trustworthy AGI-driven Networks | Ilias Chatzistefanidis et.al. | 2507.17695 | null |
| 2025-07-23 | Simulating multiple human perspectives in socio-ecological systems using large language models | Yongchao Zeng et.al. | 2507.17680 | null |
| 2025-07-23 | See the Forest and the Trees: A Synergistic Reasoning Framework for Knowledge-Based Visual Question Answering | Junjie Wang et.al. | 2507.17659 | null |
| 2025-07-23 | Who Attacks, and Why? Using LLMs to Identify Negative Campaigning in 18M Tweets across 19 Countries | Victor Hartman et.al. | 2507.17636 | null |
| 2025-07-23 | A Hybrid Early-Exit Algorithm for Large Language Models Based on Space Alignment Decoding (SPADE) | Bowen Zheng et.al. | 2507.17618 | null |
| 2025-07-22 | LingBench++: A Linguistically-Informed Benchmark and Reasoning Framework for Multi-Step and Cross-Cultural Inference with LLMs | Da-Chen Lian et.al. | 2507.16809 | null |
| 2025-07-22 | Rethinking LLM-Based RTL Code Optimization Via Timing Logic Metamorphosis | Zhihao Xu et.al. | 2507.16808 | null |
| 2025-07-22 | Agentar-Fin-R1: Enhancing Financial Intelligence through Domain Expertise, Training Efficiency, and Advanced Reasoning | Yanjun Zheng et.al. | 2507.16802 | null |
| 2025-07-23 | Test-Time-Matching: Decouple Personality, Memory, and Linguistic Style in LLM-based Role-Playing Language Agent | Xiaoyu Zhan et.al. | 2507.16799 | null |
| 2025-07-22 | Steering Out-of-Distribution Generalization with Concept Ablation Fine-Tuning | Helena Casademunt et.al. | 2507.16795 | null |
| 2025-07-22 | ChatChecker: A Framework for Dialogue System Testing and Evaluation Through Non-cooperative User Simulation | Roman Mayr et.al. | 2507.16792 | null |
| 2025-07-22 | Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning | Hongyin Luo et.al. | 2507.16784 | null |
| 2025-07-22 | Cooling Matters: Benchmarking Large Language Models and Vision-Language Models on Liquid-Cooled Versus Air-Cooled H100 GPU Systems | Imran Latif et.al. | 2507.16781 | null |
| 2025-07-22 | When LLMs Copy to Think: Uncovering Copy-Guided Attacks in Reasoning LLMs | Yue Li et.al. | 2507.16773 | null |
| 2025-07-22 | WGRAMMAR: Leverage Prior Knowledge to Accelerate Structured Decoding | Ran Wang et.al. | 2507.16768 | null |
| 2025-07-21 | Diffusion Beats Autoregressive in Data-Constrained Settings | Mihir Prabhudesai et.al. | 2507.15857 | null |
| 2025-07-21 | Gemini 2.5 Pro Capable of Winning Gold at IMO 2025 | Yichen Huang et.al. | 2507.15855 | null |
| 2025-07-21 | The Other Mind: How Language Models Exhibit Human Temporal Cognition | Lingyu Li et.al. | 2507.15851 | null |
| 2025-07-21 | 3LM: Bridging Arabic, STEM, and Code through Benchmarking | Basma El Amel Boussaha et.al. | 2507.15850 | null |
| 2025-07-21 | The Impact of Language Mixing on Bilingual LLM Reasoning | Yihao Li et.al. | 2507.15849 | null |
| 2025-07-21 | FASTGEN: Fast and Cost-Effective Synthetic Tabular Data Generation with LLMs | Anh Nguyen et.al. | 2507.15839 | null |
| 2025-07-21 | Just Ask for Music (JAM): Multimodal and Personalized Natural Language Music Recommendation | Alessandro B. Melchiorre et.al. | 2507.15826 | null |
| 2025-07-21 | ACS: An interactive framework for conformal selection | Yu Gui et.al. | 2507.15825 | null |
| 2025-07-21 | Do AI models help produce verified bug fixes? | Li Huang et.al. | 2507.15822 | null |
| 2025-07-21 | LLM Economist: Large Population Models and Mechanism Design in Multi-Agent Generative Simulacra | Seth Karten et.al. | 2507.15815 | null |
| 2025-07-18 | CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning | Xiaoya Li et.al. | 2507.14111 | link |
| 2025-07-18 | Automated Interpretation of Non-Destructive Evaluation Contour Maps Using Large Language Models for Bridge Condition Assessment | Viraj Nishesh Darji et.al. | 2507.14107 | null |
| 2025-07-18 | Generative AI-Driven High-Fidelity Human Motion Simulation | Hari Iyer et.al. | 2507.14097 | null |
| 2025-07-18 | Lessons from the TREC Plain Language Adaptation of Biomedical Abstracts (PLABA) track | Brian Ondov et.al. | 2507.14096 | null |
| 2025-07-18 | DPMT: Dual Process Multi-scale Theory of Mind Framework for Real-time Human-AI Collaboration | Xiyun Li et.al. | 2507.14088 | null |
| 2025-07-18 | The Emotion-Memory Link: Do Memorability Annotations Matter for Intelligent Systems? | Maria Tsfasman et.al. | 2507.14084 | null |
| 2025-07-18 | DENSE: Longitudinal Progress Note Generation with Temporal Modeling of Heterogeneous Clinical Notes Across Hospital Visits | Garapati Keerthana et.al. | 2507.14079 | null |
| 2025-07-18 | Evaluating the Effectiveness of Cost-Efficient Large Language Models in Benchmark Biomedical Tasks | Israt Jahan et.al. | 2507.14045 | null |
| 2025-07-18 | Architecting Human-AI Cocreation for Technical Services – Interaction Modes and Contingency Factors | Jochen Wulf et.al. | 2507.14034 | null |
| 2025-07-18 | KROMA: Ontology Matching with Knowledge Retrieval and Large Language Models | Lam Nguyen et.al. | 2507.14032 | null |
| 2025-07-17 | VideoITG: Multimodal Video Understanding with Instructed Temporal Grounding | Shihao Wang et.al. | 2507.13353 | null |
| 2025-07-17 | Comparing Apples to Oranges: A Dataset & Analysis of LLM Humour Understanding from Traditional Puns to Topical Jokes | Tyler Loakman et.al. | 2507.13335 | null |
| 2025-07-17 | A Survey of Context Engineering for Large Language Models | Lingrui Mei et.al. | 2507.13334 | link |
| 2025-07-17 | The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner | Zhouqi Hua et.al. | 2507.13332 | null |
| 2025-07-17 | GeoReg: Weight-Constrained Few-Shot Regression for Socio-Economic Estimation using LLM | Kyeongjin Ahn et.al. | 2507.13323 | null |
| 2025-07-17 | Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark | Junsu Kim et.al. | 2507.13314 | null |
| 2025-07-17 | The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations | Carlos Arriaga et.al. | 2507.13302 | null |
| 2025-07-17 | AbGen: Evaluating Large Language Models in Ablation Study Design and Evaluation for Scientific Research | Yilun Zhao et.al. | 2507.13300 | link |
| 2025-07-17 | Overview of the TalentCLEF 2025: Skill and Job Title Intelligence for Human Capital Management | Luis Gasco et.al. | 2507.13275 | null |
| 2025-07-17 | Automating Steering for Safe Multimodal Large Language Models | Lyucheng Wu et.al. | 2507.13255 | null |
| 2025-07-16 | Mitigating Object Hallucinations via Sentence-Level Early Intervention | Shangpin Peng et.al. | 2507.12455 | link |
| 2025-07-16 | S2WTM: Spherical Sliced-Wasserstein Autoencoder for Topic Modeling | Suman Adhya et.al. | 2507.12451 | null |
| 2025-07-16 | Describe Anything Model for Visual Question Answering on Text-rich Images | Yen-Linh Vu et.al. | 2507.12441 | link |
| 2025-07-16 | Can We Predict Alignment Before Models Finish Thinking? Towards Monitoring Misaligned Reasoning Models | Yik Siu Chan et.al. | 2507.12428 | null |
| 2025-07-16 | Advancing Retrieval-Augmented Generation for Structured Enterprise and Internal Data | Chandana Cheerla et.al. | 2507.12425 | link |
| 2025-07-16 | QuRe: Query-Relevant Retrieval through Hard Negative Sampling in Composed Image Retrieval | Jaehyun Kwak et.al. | 2507.12416 | null |
| 2025-07-16 | SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories? | Xinyi He et.al. | 2507.12415 | link |
| 2025-07-16 | Assessing the Value of Visual Input: A Benchmark of Multimodal Large Language Models for Robotic Path Planning | Jacinto Colan et.al. | 2507.12391 | null |
| 2025-07-16 | Web-Browsing LLMs Can Access Social Media Profiles and Infer User Demographics | Meysam Alizadeh et.al. | 2507.12372 | null |
| 2025-07-16 | Beyond Single Models: Enhancing LLM Detection of Ambiguity in Requests through Debate | Ana Davila et.al. | 2507.12370 | null |
| 2025-07-15 | Streaming 4D Visual Geometry Transformer | Dong Zhuo et.al. | 2507.11539 | link |
| 2025-07-15 | DrafterBench: Benchmarking Large Language Models for Tasks Automation in Civil Engineering | Yinsheng Li et.al. | 2507.11527 | link |
| 2025-07-15 | LLM-based ambiguity detection in natural language instructions for collaborative surgical robots | Ana Davila et.al. | 2507.11525 | null |
| 2025-07-15 | AirLLM: Diffusion Policy-based Adaptive LoRA for Remote Fine-Tuning of LLM over the Air | Shiyi Yang et.al. | 2507.11515 | null |
| 2025-07-15 | LRMR: LLM-Driven Relational Multi-node Ranking for Lymph Node Metastasis Assessment in Rectal Cancer | Yaoxian Dong et.al. | 2507.11457 | null |
| 2025-07-15 | Reasoning Strategies in Large Language Models: Can They Follow, Prefer, and Optimize? | Yanjian Zhang et.al. | 2507.11423 | null |
| 2025-07-15 | Quantifying the Energy Consumption and Carbon Emissions of LLM Inference via Simulations | Miray Özcan et.al. | 2507.11417 | null |
| 2025-07-15 | Seq vs Seq: An Open Suite of Paired Encoders and Decoders | Orion Weller et.al. | 2507.11412 | link |
| 2025-07-15 | KisMATH: Do LLMs Have Knowledge of Implicit Structures in Mathematical Reasoning? | Soumadeep Saha et.al. | 2507.11408 | null |
| 2025-07-15 | EXAONE 4.0: Unified Large Language Models Integrating Non-reasoning and Reasoning Modes | LG AI Research et.al. | 2507.11407 | null |
| 2025-07-14 | Fusing LLM Capabilities with Routing Data | Tao Feng et.al. | 2507.10540 | null |
| 2025-07-14 | CodeJudgeBench: Benchmarking LLM-as-a-Judge for Coding Tasks | Hongchao Jiang et.al. | 2507.10535 | null |
| 2025-07-14 | Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination | Mingqi Wu et.al. | 2507.10532 | link |
| 2025-07-14 | Chat with AI: The Surprising Turn of Real-time Video Communication from Human to AI | Jiangkai Wu et.al. | 2507.10510 | null |
| 2025-07-14 | Scene-Aware Conversational ADAS with Generative AI for Real-Time Driver Assistance | Kyungtae Han et.al. | 2507.10500 | null |
| 2025-07-14 | Can You Detect the Difference? | İsmail Tarım et.al. | 2507.10475 | null |
| 2025-07-14 | GT-Loc: Unifying When and Where in Images Through a Joint Embedding Space | David G. Shatwell et.al. | 2507.10473 | null |
| 2025-07-14 | MLAR: Multi-layer Large Language Model-based Robotic Process Automation Applicant Tracking | Mohamed T. Younes et.al. | 2507.10472 | null |
| 2025-07-14 | An Empirical Evaluation of AI-Powered Non-Player Characters’ Perceived Realism and Performance in Virtual Reality Environments | Mikko Korkiakoski et.al. | 2507.10469 | null |
| 2025-07-14 | Logic layer Prompt Control Injection (LPCI): A Novel Security Vulnerability Class in Agentic Systems | Hammad Atta et.al. | 2507.10457 | null |
| 2025-07-11 | Lumos-1: On Autoregressive Video Generation from a Unified Model Perspective | Hangjie Yuan et.al. | 2507.08801 | link |
| 2025-07-11 | One Token to Fool LLM-as-a-Judge | Yulai Zhao et.al. | 2507.08794 | null |
| 2025-07-11 | BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity | Chenyang Song et.al. | 2507.08771 | link |
| 2025-07-11 | Multilingual Multimodal Software Developer for Code Generation | Linzheng Chai et.al. | 2507.08719 | null |
| 2025-07-11 | KG-Attention: Knowledge Graph-Guided Attention at Test-Time via Bidirectional Information Aggregation | Songlin Zhai et.al. | 2507.08704 | null |
| 2025-07-11 | ByDeWay: Boost Your multimodal LLM with DEpth prompting in a Training-Free Way | Rajarshi Roy et.al. | 2507.08679 | null |
| 2025-07-11 | LLMCup: Ranking-Enhanced Comment Updating with LLMs | Hua Ge et.al. | 2507.08671 | null |
| 2025-07-11 | KELPS: A Framework for Verified Multi-Language Autoformalization via Semantic-Syntactic Alignment | Jiyao Zhang et.al. | 2507.08665 | null |
| 2025-07-11 | Introspection of Thought Helps AI Agents | Haoran Sun et.al. | 2507.08664 | null |
| 2025-07-11 | Leanabell-Prover-V2: Verifier-integrated Reasoning for Formal Theorem Proving via Reinforcement Learning | Xingguang Ji et.al. | 2507.08649 | link |
| 2025-07-10 | Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology | Haochen Wang et.al. | 2507.07999 | link |
| 2025-07-10 | Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs | Ziyue Li et.al. | 2507.07996 | null |
| 2025-07-10 | Multigranular Evaluation for Brain Visual Decoding | Weihao Xia et.al. | 2507.07993 | null |
| 2025-07-10 | Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs | Jeongseok Hyun et.al. | 2507.07990 | link |
| 2025-07-10 | Automating Expert-Level Medical Reasoning Evaluation of Large Language Models | Shuang Zhou et.al. | 2507.07988 | null |
| 2025-07-10 | OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding | JingLi Lin et.al. | 2507.07984 | link |
| 2025-07-10 | Performance and Practical Considerations of Large and Small Language Models in Clinical Decision Support in Rheumatology | Sabine Felde et.al. | 2507.07983 | null |
| 2025-07-10 | Defending Against Prompt Injection With a Few DefensiveTokens | Sizhe Chen et.al. | 2507.07974 | null |
| 2025-07-10 | Can Large Language Models Improve Phishing Defense? A Large-Scale Controlled Experiment on Warning Dialogue Explanations | Federico Maria Cau et.al. | 2507.07916 | null |
| 2025-07-10 | DTECT: Dynamic Topic Explorer & Context Tracker | Suman Adhya et.al. | 2507.07910 | null |
| 2025-07-09 | Towards Multimodal Understanding via Stable Diffusion as a Task-Aware Feature Extractor | Vatsal Agarwal et.al. | 2507.07106 | null |
| 2025-07-09 | Vision-Language-Vision Auto-Encoder: Scalable Knowledge Distillation from Diffusion Models | Tiezheng Zhang et.al. | 2507.07104 | link |
| 2025-07-09 | Evaluating Attribute Confusion in Fashion Text-to-Image Generation | Ziyue Liu et.al. | 2507.07079 | null |
| 2025-07-09 | 5C Prompt Contracts: A Minimalist, Creative-Friendly, Token-Efficient Design Framework for Individual and SME LLM Usage | Ugur Ari et.al. | 2507.07045 | null |
| 2025-07-09 | UniConv: Unifying Retrieval and Response Generation for Large Language Models in Conversations | Fengran Mo et.al. | 2507.07030 | null |
| 2025-07-09 | First Return, Entropy-Eliciting Explore | Tianyu Zheng et.al. | 2507.07017 | null |
| 2025-07-09 | GNN-ViTCap: GNN-Enhanced Multiple Instance Learning with Vision Transformers for Whole Slide Image Classification and Captioning | S M Taslim Uddin Raju et.al. | 2507.07006 | null |
| 2025-07-09 | Learning Deliberately, Acting Intuitively: Unlocking Test-Time Reasoning in Multimodal LLMs | Yahan Yu et.al. | 2507.06999 | null |
| 2025-07-09 | MCA-RG: Enhancing LLMs with Medical Concept Alignment for Radiology Report Generation | Qilong Xing et.al. | 2507.06992 | null |
| 2025-07-09 | Are They All Good? Evaluating the Quality of CoTs in LLM-based Code Generation | Binquan Zhang et.al. | 2507.06980 | null |
| 2025-07-08 | Efficiency-Effectiveness Reranking FLOPs for LLM-based Rerankers | Zhiyuan Peng et.al. | 2507.06223 | link |
| 2025-07-08 | A Survey on Latent Reasoning | Rui-Jie Zhu et.al. | 2507.06203 | link |
| 2025-07-08 | UQLM: A Python Package for Uncertainty Quantification in Large Language Models | Dylan Bouchard et.al. | 2507.06196 | null |
| 2025-07-08 | SQLBarber: A System Leveraging Large Language Models to Generate Customized and Realistic SQL Workloads | Jiale Lao et.al. | 2507.06192 | null |
| 2025-07-08 | Hidden Prompts in Manuscripts Exploit AI-Assisted Peer Review | Zhicheng Lin et.al. | 2507.06185 | null |
| 2025-07-08 | Enhancing Scientific Visual Question Answering through Multimodal Reasoning and Ensemble Modeling | Prahitha Movva et.al. | 2507.06183 | null |
| 2025-07-08 | Data-Semantics-Aware Recommendation of Diverse Pivot Tables | Whanhee Cho et.al. | 2507.06171 | null |
| 2025-07-09 | Skywork-R1V3 Technical Report | Wei Shen et.al. | 2507.06167 | link |
| 2025-07-08 | Evaluation of Habitat Robotics using Large Language Models | William Li et.al. | 2507.06157 | null |
| 2025-07-08 | Large Language Models Predict Human Well-being – But Not Equally Everywhere | Pat Pataranutaporn et.al. | 2507.06141 | null |
| 2025-07-07 | Beyond Simple Edits: X-Planner for Complex Instruction-Based Image Editing | Chun-Hsiao Yeh et.al. | 2507.05259 | null |
| 2025-07-07 | Spatio-Temporal LLM: Reasoning about Environments and Actions | Haozhen Zheng et.al. | 2507.05258 | null |
| 2025-07-07 | Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions | Yuanzhe Hu et.al. | 2507.05257 | link |
| 2025-07-07 | Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning | Yana Wei et.al. | 2507.05255 | link |
| 2025-07-07 | Response Attack: Exploiting Contextual Priming to Jailbreak Large Language Models | Ziqi Miao et.al. | 2507.05248 | null |
| 2025-07-07 | StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling | Meng Wei et.al. | 2507.05240 | link |
| 2025-07-07 | All in One: Visual-Description-Guided Unified Point Cloud Segmentation | Zongyan Han et.al. | 2507.05211 | null |
| 2025-07-07 | CREW-WILDFIRE: Benchmarking Agentic Multi-Agent Collaborations at Scale | Jonathan Hyun et.al. | 2507.05178 | null |
| 2025-07-07 | OpenS2S: Advancing Open-Source End-to-End Empathetic Large Speech Language Model | Chen Wang et.al. | 2507.05177 | null |
| 2025-07-07 | AI Generated Text Detection Using Instruction Fine-tuned Large Language and Transformer-Based Models | Chinnappa Guggilla et.al. | 2507.05157 | null |
| 2025-07-03 | Bootstrapping Grounded Chain-of-Thought in Multimodal LLMs for Data-Efficient Model Adaptation | Jiaer Xia et.al. | 2507.02859 | null |
| 2025-07-03 | Requirements Elicitation Follow-Up Question Generation | Yuchen Shen et.al. | 2507.02858 | null |
| 2025-07-03 | MOTIF: Modular Thinking via Reinforcement Fine-tuning in LLMs | Purbesh Mitra et.al. | 2507.02851 | link |
| 2025-07-03 | Visual Contextual Attack: Jailbreaking MLLMs with Image-Driven Context Injection | Ziqi Miao et.al. | 2507.02844 | link |
| 2025-07-03 | LLM-Driven Treatment Effect Estimation Under Inference Time Text Confounding | Yuchen Ma et.al. | 2507.02843 | null |
| 2025-07-03 | StepHint: Multi-level Stepwise Hints Enhance Reinforcement Learning to Reason | Kaiyi Zhang et.al. | 2507.02841 | null |
| 2025-07-03 | ExPO: Unlocking Hard Reasoning with Self-Explanation-Guided Reinforcement Learning | Ruiyang Zhou et.al. | 2507.02834 | null |
| 2025-07-03 | SynapseRoute: An Auto-Route Switching Framework on Dual-State Large Language Model | Wencheng Zhang et.al. | 2507.02822 | null |
| 2025-07-03 | Multimodal Mathematical Reasoning with Diverse Solving Perspective | Wenhao Shi et.al. | 2507.02804 | null |
| 2025-07-03 | Is Reasoning All You Need? Probing Bias in the Age of Reasoning Language Models | Riccardo Cantini et.al. | 2507.02799 | null |
| 2025-07-02 | Kwai Keye-VL Technical Report | Kwai Keye Team et.al. | 2507.01949 | link |
| 2025-07-02 | SpecCLIP: Aligning and Translating Spectroscopic Measurements for Stars | Xiaosheng Zhao et.al. | 2507.01939 | null |
| 2025-07-02 | The Thin Line Between Comprehension and Persuasion in LLMs | Adrian de Wynter et.al. | 2507.01936 | null |
| 2025-07-02 | Large Language Model-Driven Closed-Loop UAV Operation with Semantic Observations | Wenhao Wang et.al. | 2507.01930 | null |
| 2025-07-03 | Decision-Oriented Text Evaluation | Yu-Shiang Huang et.al. | 2507.01923 | null |
| 2025-07-02 | Gradient-Adaptive Policy Optimization: Towards Multi-Objective Alignment of Large Language Models | Chengao Li et.al. | 2507.01915 | null |
| 2025-07-02 | Reasoning to Edit: Hypothetical Instruction-Based Image Editing with Visual Reasoning | Qingdong He et.al. | 2507.01908 | null |
| 2025-07-02 | AI4Research: A Survey of Artificial Intelligence for Scientific Research | Qiguang Chen et.al. | 2507.01903 | null |
| 2025-07-02 | High-Layer Attention Pruning with Rescaling | Songtao Liu et.al. | 2507.01900 | null |
| 2025-07-02 | MiCoTA: Bridging the Learnability Gap with Intermediate CoT and Teacher Assistants | Dongyi Ding et.al. | 2507.01887 | null |
| 2025-07-01 | Teaching Time Series to See and Speak: Forecasting with Aligned Visual and Textual Perspectives | Sixun Dong et.al. | 2506.24124 | link |
| 2025-06-30 | Calligrapher: Freestyle Text Image Customization | Yue Ma et.al. | 2506.24123 | link |
| 2025-06-30 | Data Uniformity Improves Training Efficiency and More, with a Convergence Framework Beyond the NTK Regime | Yuqing Wang et.al. | 2506.24120 | null |
| 2025-06-30 | DenseWorld-1M: Towards Detailed Dense Grounded Caption in the Real World | Xiangtai Li et.al. | 2506.24102 | link |
| 2025-06-30 | Logit-Gap Steering: Efficient Short-Suffix Jailbreaks for Aligned Large Language Models | Tung-Ling Li et.al. | 2506.24056 | null |
| 2025-06-30 | Agent.xpu: Efficient Scheduling of Agentic LLM Workloads on Heterogeneous SoC | Xinming Wei et.al. | 2506.24045 | null |
| 2025-06-30 | A Survey on Vision-Language-Action Models for Autonomous Driving | Sicong Jiang et.al. | 2506.24044 | link |
| 2025-06-30 | EXPERT: An Explainable Image Captioning Evaluation Metric with Structured Explanations | Hyunjong Kim et.al. | 2506.24016 | null |
| 2025-06-30 | Large Language Models Don’t Make Sense of Word Problems. A Scoping Review from a Mathematics Education Perspective | Anselm R. Strohmaier et.al. | 2506.24006 | null |
| 2025-06-30 | Auto-TA: Towards Scalable Automated Thematic Analysis (TA) via Multi-Agent Large Language Models with Reinforcement Learning | Seungjun Yi et.al. | 2506.23998 | null |
| 2025-06-27 | The Automated LLM Speedrunning Benchmark: Reproducing NanoGPT Improvements | Bingchen Zhao et.al. | 2506.22419 | null |
| 2025-06-27 | HyperCLOVA X THINK Technical Report | NAVER Cloud HyperCLOVA X Team et.al. | 2506.22403 | null |
| 2025-06-27 | Refining Czech GEC: Insights from a Multi-Experiment Approach | Petr Pechman et.al. | 2506.22402 | link |
| 2025-06-27 | QuickSilver – Speeding up LLM Inference through Dynamic Token Halting, KV Skipping, Contextual Token Fusion, and Adaptive Matryoshka Quantization | Danush Khanna et.al. | 2506.22396 | null |
| 2025-06-27 | What Makes ChatGPT Effective for Software Issue Resolution? An Empirical Study of Developer-ChatGPT Conversations in GitHub | Ramtin Ehsani et.al. | 2506.22390 | null |
| 2025-06-27 | Can Video Large Multimodal Models Think Like Doubters-or Double-Down: A Study on Defeasible Video Entailment | Yue Zhang et.al. | 2506.22385 | null |
| 2025-06-27 | Probabilistic Optimality for Inference-time Scaling | Youkang Wang et.al. | 2506.22376 | null |
| 2025-06-27 | Towards Fair Rankings: Leveraging LLMs for Gender Bias Detection and Measurement | Maryam Mousavian et.al. | 2506.22372 | null |
| 2025-06-27 | Can Large Language Models Help Students Prove Software Correctness? An Experimental Study with Dafny | Carolina Carreira et.al. | 2506.22370 | null |
| 2025-06-27 | Concept-Level AI for Telecom: Moving Beyond Large Language Models | Viswanath Kumarskandpriya et.al. | 2506.22359 | null |
| 2025-06-26 | Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test | Ziyue Li et.al. | 2506.21551 | null |
| 2025-06-26 | mTSBench: Benchmarking Multivariate Time Series Anomaly Detection and Model Selection at Scale | Xiaona Zhou et.al. | 2506.21550 | null |
| 2025-06-26 | PsyLite Technical Report | Fangjun Ding et.al. | 2506.21536 | null |
| 2025-06-26 | Exploring the Design Space of 3D MLLMs for CT Report Generation | Mohammed Baharoon et.al. | 2506.21535 | null |
| 2025-06-26 | “What’s Up, Doc?”: Analyzing How Users Seek Health Information in Large-Scale Conversational AI Datasets | Akshay Paruchuri et.al. | 2506.21532 | null |
| 2025-06-26 | Potemkin Understanding in Large Language Models | Marina Mancoridis et.al. | 2506.21521 | null |
| 2025-06-26 | Mitigating Hallucination of Large Vision-Language Models via Dynamic Logits Calibration | Jiahe Chen et.al. | 2506.21509 | null |
| 2025-06-26 | Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge | Boyu Gou et.al. | 2506.21506 | null |
| 2025-06-26 | Bridging Offline and Online Reinforcement Learning for LLMs | Jack Lanchantin et.al. | 2506.21495 | null |
| 2025-06-26 | Efficient and Reuseable Cloud Configuration Search Using Discovery Spaces | Michael Johnston et.al. | 2506.21467 | null |
| 2025-06-25 | The Decrypto Benchmark for Multi-Agent Reasoning and Theory of Mind | Andrei Lupu et.al. | 2506.20664 | null |
| 2025-06-25 | Memento: Note-Taking for Your Future Self | Chao Wan et.al. | 2506.20642 | null |
| 2025-06-25 | Towards Community-Driven Agents for Machine Learning Engineering | Sijie Li et.al. | 2506.20640 | null |
| 2025-06-25 | DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation | Shansan Gong et.al. | 2506.20639 | null |
| 2025-06-25 | AI Assistants to Enhance and Exploit the PETSc Knowledge Base | Barry Smith et.al. | 2506.20608 | null |
| 2025-06-25 | Model Editing as a Double-Edged Sword: Steering Agent Ethical Behavior Toward Beneficence or Harm | Baixiang Huang et.al. | 2506.20606 | null |
| 2025-06-25 | Video Perception Models for 3D Scene Synthesis | Rui Huang et.al. | 2506.20601 | null |
| 2025-06-25 | HRIBench: Benchmarking Vision-Language Models for Real-Time Human Perception in Human-Robot Interaction | Zhonghao Shi et.al. | 2506.20566 | null |
| 2025-06-25 | Large Language Model-Driven Code Compliance Checking in Building Information Modeling | Soumya Madireddy et.al. | 2506.20551 | null |
| 2025-06-25 | When Life Gives You Samples: The Benefits of Scaling up Inference Compute for Multilingual LLMs | Ammar Khairi et.al. | 2506.20544 | null |
| 2025-06-24 | ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing | Long Xing et.al. | 2506.19848 | null |
| 2025-06-24 | JoyAgents-R1: Joint Evolution Dynamics for Versatile Multi-LLM Agents with Reinforcement Learning | Ai Han et.al. | 2506.19846 | null |
| 2025-06-24 | MAM: Modular Multi-Agent Framework for Multi-Modal Medical Diagnosis via Role-Specialized Collaboration | Yucheng Zhou et.al. | 2506.19835 | null |
| 2025-06-24 | Curating art exhibitions using machine learning | Eurico Covas et.al. | 2506.19813 | null |
| 2025-06-24 | KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality | Baochang Ren et.al. | 2506.19807 | null |
| 2025-06-24 | LLM-Based Social Simulations Require a Boundary | Zengqing Wu et.al. | 2506.19806 | null |
| 2025-06-24 | KnowML: Improving Generalization of ML-NIDS with Attack Knowledge Graphs | Xin Fan Guo et.al. | 2506.19802 | null |
| 2025-06-24 | Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic Empirical Study | Yuqi Zhu et.al. | 2506.19794 | null |
| 2025-06-24 | SAGE: Strategy-Adaptive Generation Engine for Query Rewriting | Teng Wang et.al. | 2506.19783 | null |
| 2025-06-24 | SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning | Yuqian Fu et.al. | 2506.19767 | null |
| 2025-06-23 | jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval | Michael Günther et.al. | 2506.18902 | null |
| 2025-06-23 | Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations | Jiaming Han et.al. | 2506.18898 | null |
| 2025-06-23 | ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs | Jiaru Zou et.al. | 2506.18896 | null |
| 2025-06-23 | Universal Video Temporal Grounding with Generative Multi-modal Large Language Models | Zeqian Li et.al. | 2506.18883 | null |
| 2025-06-23 | CommVQ: Commutative Vector Quantization for KV Cache Compression | Junyan Li et.al. | 2506.18879 | null |
| 2025-06-23 | OmniGen2: Exploration to Advanced Multimodal Generation | Chenyuan Wu et.al. | 2506.18871 | null |
| 2025-06-23 | TAMMs: Temporal-Aware Multimodal Model for Satellite Image Change Understanding and Forecasting | Zhongbin Guo et.al. | 2506.18862 | null |
| 2025-06-23 | LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning | Yuhao Wu et.al. | 2506.18841 | null |
| 2025-06-23 | STU-PID: Steering Token Usage via PID Controller for Efficient Large Language Model Reasoning | Aryasomayajula Ram Bharadwaj et.al. | 2506.18831 | null |
| 2025-06-23 | Understanding Software Engineering Agents: A Study of Thought-Action-Result Trajectories | Islem Bouzenia et.al. | 2506.18824 | null |
| 2025-06-20 | VLN-R1: Vision-Language Navigation via Reinforcement Fine-Tuning | Zhangyang Qi et.al. | 2506.17221 | null |
| 2025-06-20 | No Free Lunch: Rethinking Internal Feedback for LLM Reasoning | Yanzhi Zhang et.al. | 2506.17219 | null |
| 2025-06-20 | Fine-Tuning Lowers Safety and Disrupts Evaluation Consistency | Kathleen C. Fraser et.al. | 2506.17209 | null |
| 2025-06-20 | Dissecting the SWE-Bench Leaderboards: Profiling Submitters and Architectures of LLM- and Agent-Based Repair Systems | Matias Martinez et.al. | 2506.17208 | null |
| 2025-06-20 | Confidence Scoring for LLM-Generated SQL in Supply Chain Data Extraction | Jiekai Ma et.al. | 2506.17203 | null |
| 2025-06-20 | Detecting LLM-Generated Short Answers and Effects on Learner Performance | Shambhavi Bhushan et.al. | 2506.17196 | null |
| 2025-06-20 | The MedPerturb Dataset: What Non-Content Perturbations Reveal About Human and Clinical LLM Decision Making | Abinitha Gourabathina et.al. | 2506.17163 | null |
| 2025-06-20 | Do We Need Large VLMs for Spotting Soccer Actions? | Ritabrata Chakraborty et.al. | 2506.17144 | null |
| 2025-06-20 | Large Language Model Unlearning for Source Code | Xue Jiang et.al. | 2506.17125 | null |
| 2025-06-20 | When Can Model-Free Reinforcement Learning be Enough for Thinking? | Josiah P. Hanna et.al. | 2506.17124 | null |
| 2025-06-18 | PhantomHunter: Detecting Unseen Privately-Tuned LLM-Generated Text via Family-Aware Learning | Yuhui Shi et.al. | 2506.15683 | null |
| 2025-06-18 | GenRecal: Generation after Recalibration from Large to Small Vision-Language Models | Byung-Kwan Lee et.al. | 2506.15681 | null |
| 2025-06-18 | SwarmAgentic: Towards Fully Automated Agentic System Generation via Swarm Intelligence | Yao Zhang et.al. | 2506.15672 | null |
| 2025-06-18 | CC-LEARN: Cohort-based Consistency Learning | Xiao Ye et.al. | 2506.15662 | null |
| 2025-06-18 | PhishDebate: An LLM-Based Multi-Agent Framework for Phishing Website Detection | Wenhao Li et.al. | 2506.15656 | null |
| 2025-06-18 | deepSURF: Detecting Memory Safety Vulnerabilities in Rust Through Fuzzing LLM-Augmented Harnesses | Georgios Androutsopoulos et.al. | 2506.15648 | null |
| 2025-06-18 | Demystifying the Visual Quality Paradox in Multimodal Large Language Models | Shuo Xing et.al. | 2506.15645 | null |
| 2025-06-18 | Revisiting Compositional Generalization Capability of Large Language Models Considering Instruction Following Ability | Yusuke Sakai et.al. | 2506.15629 | null |
| 2025-06-18 | The Effect of State Representation on LLM Agent Behavior in Dynamic Routing Games | Lyle Goodyear et.al. | 2506.15624 | null |
| 2025-06-18 | The Compositional Architecture of Regret in Large Language Models | Xiangxiang Cui et.al. | 2506.15617 | null |
| 2025-06-17 | A Variational Framework for Improving Naturalness in Generative Spoken Language Models | Li-Wei Chen et.al. | 2506.14767 | null |
| 2025-06-17 | ASCD: Attention-Steerable Contrastive Decoding for Reducing Hallucination in MLLM | Yujun Wang et.al. | 2506.14766 | null |
| 2025-06-17 | Large Language Models – the Future of Fundamental Physics? | Caroline Heneka et.al. | 2506.14757 | null |
| 2025-06-17 | Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs | Ring Team et.al. | 2506.14731 | null |
| 2025-06-17 | AgentDistill: Training-Free Agent Distillation with Generalizable MCP Boxes | Jiahao Qiu et.al. | 2506.14728 | null |
| 2025-06-17 | HARMONY: A Scalable Distributed Vector Database for High-Throughput Approximate Nearest Neighbor Search | Qian Xu et.al. | 2506.14707 | null |
| 2025-06-17 | Capacity Matters: a Proof-of-Concept for Transformer Memorization on Real-World Data | Anton Changalidis et.al. | 2506.14704 | null |
| 2025-06-17 | Unified Software Engineering agent as AI Software Engineer | Leonhard Applis et.al. | 2506.14683 | null |
| 2025-06-17 | AIRTBench: Measuring Autonomous AI Red Teaming Capabilities in Language Models | Ads Dawson et.al. | 2506.14682 | null |
| 2025-06-17 | Massive Supervised Fine-tuning Experiments Reveal How Data, Layer, and Training Factors Shape LLM Alignment Quality | Yuto Harada et.al. | 2506.14681 | null |
| 2025-06-16 | Steering LLM Thinking with Budget Guidance | Junyan Li et.al. | 2506.13752 | null |
| 2025-06-16 | Evaluating Large Language Models for Phishing Detection, Self-Consistency, Faithfulness, and Explainability | Shova Kuikel et.al. | 2506.13746 | null |
| 2025-06-16 | Instruction Following by Boosting Attention of Large Language Models | Vitoria Guardieiro et.al. | 2506.13734 | null |
| 2025-06-16 | Attribution-guided Pruning for Compression, Circuit Discovery, and Targeted Correction in LLMs | Sayed Mohammad Vakilzadeh Hatefi et.al. | 2506.13727 | null |
| 2025-06-16 | Weakest Link in the Chain: Security Vulnerabilities in Advanced Reasoning Models | Arjun Krishna et.al. | 2506.13726 | null |
| 2025-06-16 | TimeMaster: Training Time-Series Multimodal LLMs to Reason via Reinforcement Learning | Junru Zhang et.al. | 2506.13705 | null |
| 2025-06-16 | Balancing Knowledge Delivery and Emotional Comfort in Healthcare Conversational Systems | Shang-Chi Tsai et.al. | 2506.13692 | null |
| 2025-06-16 | What Happens During the Loss Plateau? Understanding Abrupt Learning in Transformers | Pulkit Gopalani et.al. | 2506.13688 | null |
| 2025-06-16 | An LLM’s Apology: Outsourcing Awkwardness in the Age of AI | Twm Stone et.al. | 2506.13685 | null |
| 2025-06-16 | Prefix-Tuning+: Modernizing Prefix-Tuning through Attention Independent Prefix Data | Haonan Wang et.al. | 2506.13674 | null |
| 2025-06-13 | code_transformed: The Influence of Large Language Models on Code | Yuliang Xu et.al. | 2506.12014 | null |
| 2025-06-13 | Tracing LLM Reasoning Processes with Strategic Games: A Framework for Planning, Revision, and Resource-Constrained Decision Making | Xiaopeng Yuan et.al. | 2506.12012 | null |
| 2025-06-13 | VGR: Visual Grounded Reasoning | Jiacong Wang et.al. | 2506.11991 | null |
| 2025-06-13 | How Visual Representations Map to Language Feature Space in Multimodal LLMs | Constantin Venhoff et.al. | 2506.11976 | null |
| 2025-06-13 | Improving Large Language Model Safety with Contrastive Representation Learning | Samuel Simko et.al. | 2506.11938 | null |
| 2025-06-13 | Temporal Dynamics of Emotions in Italian Online Soccer Fandoms | Salvatore Citraro et.al. | 2506.11934 | null |
| 2025-06-13 | LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming? | Zihan Zheng et.al. | 2506.11928 | null |
| 2025-06-13 | Beyond Homogeneous Attention: Memory-Efficient LLMs via Fourier-Approximated KV Cache | Xiaoran Liu et.al. | 2506.11886 | null |
| 2025-06-13 | Addressing Bias in LLMs: Strategies and Application to Fair AI-based Recruitment | Alejandro Peña et.al. | 2506.11880 | null |
| 2025-06-13 | A Short Survey on Formalising Software Requirements using Large Language Models | Arshad Beg et.al. | 2506.11874 | null |
| 2025-06-12 | AutoMind: Adaptive Knowledgeable Agent for Automated Data Science | Yixin Ou et.al. | 2506.10974 | null |
| 2025-06-12 | Farseer: A Refined Scaling Law in Large Language Models | Houyi Li et.al. | 2506.10972 | null |
| 2025-06-12 | Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs | Qizhe Zhang et.al. | 2506.10967 | null |
| 2025-06-12 | ChineseHarm-Bench: A Chinese Harmful Content Detection Benchmark | Kangwei Liu et.al. | 2506.10960 | null |
| 2025-06-12 | SWE-Factory: Your Automated Factory for Issue Resolution Training Data and Evaluation Benchmarks | Lianghong Guo et.al. | 2506.10954 | link |
| 2025-06-12 | Build the web for agents, not agents for the web | Xing Han Lù et.al. | 2506.10953 | null |
| 2025-06-12 | Execution Guided Line-by-Line Code Generation | Boaz Lavon et.al. | 2506.10948 | null |
| 2025-06-12 | GUARD: Guided Unlearning and Retention via Data Attribution for Large Language Models | Evelyn Ma et.al. | 2506.10946 | null |
| 2025-06-12 | Self-Adapting Language Models | Adam Zweiger et.al. | 2506.10943 | null |
| 2025-06-12 | Building a Media Ecosystem Observatory from Scratch: Infrastructure, Methodology, and Insights | Zeynep Pehlivan et.al. | 2506.10942 | null |
| 2025-06-11 | Flipping Against All Odds: Reducing LLM Coin Flip Bias via Verbalized Rejection Sampling | Tim Z. Xiao et.al. | 2506.09998 | null |
| 2025-06-11 | From Judgment to Interference: Early Stopping LLM Harmful Outputs via Streaming Content Monitoring | Yang Li et.al. | 2506.09996 | null |
| 2025-06-11 | Large Language Models for Toxic Language Detection in Low-Resource Balkan Languages | Amel Muminovic et.al. | 2506.09992 | null |
| 2025-06-11 | Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation | Xinyu Yang et.al. | 2506.09991 | null |
| 2025-06-11 | V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning | Mido Assran et.al. | 2506.09985 | null |
| 2025-06-11 | Step-by-step Instructions and a Simple Tabular Output Format Improve the Dependency Parsing Accuracy of LLMs | Hiroshi Matsuda et.al. | 2506.09983 | null |
| 2025-06-11 | SRLAgent: Enhancing Self-Regulated Learning Skills through Gamification and LLM Assistance | Wentao Ge et.al. | 2506.09968 | null |
| 2025-06-11 | Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing | Junfei Wu et.al. | 2506.09965 | null |
| 2025-06-11 | Kvasir-VQA-x1: A Multimodal Dataset for Medical Reasoning and Robust MedVQA in Gastrointestinal Endoscopy | Sushant Gautam et.al. | 2506.09958 | null |
| 2025-06-11 | LLMail-Inject: A Dataset from a Realistic Adaptive Prompt Injection Challenge | Sahar Abdelnabi et.al. | 2506.09956 | null |
| 2025-06-09 | GUI-Reflection: Empowering Multimodal GUI Models with Self-Reflection Behavior | Penghao Wu et.al. | 2506.08012 | null |
| 2025-06-09 | Play to Generalize: Learning to Reason Through Game Play | Yunfei Xie et.al. | 2506.08011 | link |
| 2025-06-09 | Reinforcement Pre-Training | Qingxiu Dong et.al. | 2506.08007 | link |
| 2025-06-09 | Reparameterized LLM Training via Orthogonal Equivalence Transformation | Zeju Qiu et.al. | 2506.08001 | null |
| 2025-06-09 | Supporting Construction Worker Well-Being with a Multi-Agent Conversational AI System | Fan Yang et.al. | 2506.07997 | null |
| 2025-06-09 | $τ^2$-Bench: Evaluating Conversational Agents in a Dual-Control Environment | Victor Barres et.al. | 2506.07982 | null |
| 2025-06-09 | HeuriGym: An Agentic Benchmark for LLM-Crafted Heuristics in Combinatorial Optimization | Hongzheng Chen et.al. | 2506.07972 | null |
| 2025-06-09 | CyberV: Cybernetics for Test-time Scaling in Video Understanding | Jiahao Meng et.al. | 2506.07971 | null |
| 2025-06-09 | SpaCE-10: A Comprehensive Benchmark for Multimodal Large Language Models in Compositional Spatial Intelligence | Ziyang Gong et.al. | 2506.07966 | null |
| 2025-06-09 | Reinforcing Multimodal Understanding and Generation with Dual Self-rewards | Jixiang Hong et.al. | 2506.07963 | null |
| 2025-06-06 | Eigenspectrum Analysis of Neural Networks without Aspect Ratio Bias | Yuanzhe Hu et.al. | 2506.06280 | null |
| 2025-06-06 | CoMemo: LVLMs Need Image Context with Image Memory | Shi Liu et.al. | 2506.06279 | link |
| 2025-06-06 | AdvSumm: Adversarial Training for Bias Mitigation in Text Summarization | Mukur Gupta et.al. | 2506.06273 | null |
| 2025-06-06 | Cartridges: Lightweight and general-purpose long context representations via self-study | Sabri Eyuboglu et.al. | 2506.06266 | link |
| 2025-06-06 | PersonaAgent: When Large Language Model Agents Meet Personalization at Test Time | Weizhi Zhang et.al. | 2506.06254 | null |
| 2025-06-06 | DesignBench: A Comprehensive Benchmark for MLLM-based Front-end Code Generation | Jingyu Xiao et.al. | 2506.06251 | link |
| 2025-06-06 | Visual Graph Arena: Evaluating Visual Conceptualization of Vision and Multimodal Large Language Models | Zahra Babaiee et.al. | 2506.06242 | null |
| 2025-06-06 | Bridging External and Parametric Knowledge: Mitigating Hallucination of LLMs with Shared-Private Semantic Synergy in Dual-Stream Knowledge | Yi Sui et.al. | 2506.06240 | null |
| 2025-06-06 | CompilerGPT: Leveraging Large Language Models for Analyzing and Acting on Compiler Optimization Reports | Peter Pirkelbauer et.al. | 2506.06227 | null |
| 2025-06-06 | PROVSYN: Synthesizing Provenance Graphs for Data Augmentation in Intrusion Detection Systems | Yi Huang et.al. | 2506.06226 | null |
| 2025-06-05 | Why LLM Safety Guardrails Collapse After Fine-tuning: A Similarity Analysis Between Alignment and Fine-tuning Datasets | Lei Hsiung et.al. | 2506.05346 | null |
| 2025-06-05 | SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs | Jiahui Wang et.al. | 2506.05344 | link |
| 2025-06-05 | Direct Numerical Layout Generation for 3D Indoor Scene Synthesis via Spatial Reasoning | Xingjian Ran et.al. | 2506.05341 | null |
| 2025-06-05 | VideoMolmo: Spatio-Temporal Grounding Meets Pointing | Ghazi Shazan Ahmad et.al. | 2506.05336 | link |
| 2025-06-05 | Search Arena: Analyzing Search-Augmented LLMs | Mihran Miroyan et.al. | 2506.05334 | link |
| 2025-06-05 | MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning | Xinyan Chen et.al. | 2506.05331 | link |
| 2025-06-05 | Improving Data Efficiency for LLM Reinforcement Fine-tuning Through Difficulty-targeted Online Data Selection and Rollout Replay | Yifan Sun et.al. | 2506.05316 | null |
| 2025-06-05 | Constrained Entropic Unlearning: A Primal-Dual Framework for Large Language Models | Taha Entesari et.al. | 2506.05314 | null |
| 2025-06-05 | ProRefine: Inference-time Prompt Refinement with Textual Feedback | Deepak Pandita et.al. | 2506.05305 | null |
| 2025-06-05 | Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos | Weifeng Lin et.al. | 2506.05302 | null |
| 2025-06-04 | Language-Image Alignment with Fixed Text Encoders | Jingfeng Yang et.al. | 2506.04209 | link |
| 2025-06-04 | Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning | Shuang Chen et.al. | 2506.04207 | link |
| 2025-06-04 | EPiC: Towards Lossless Speedup for Reasoning Training through Edge-Preserving CoT Condensation | Jinghan Jia et.al. | 2506.04205 | null |
| 2025-06-04 | Cascadia: A Cascade Serving System for Large Language Models | Youhe Jiang et.al. | 2506.04203 | null |
| 2025-06-04 | TracLLM: A Generic Framework for Attributing Long Context LLMs | Yanting Wang et.al. | 2506.04202 | link |
| 2025-06-04 | R-Search: Empowering LLM Reasoning with Search via Multi-Reward Reinforcement Learning | Qingfei Zhao et.al. | 2506.04185 | link |
| 2025-06-04 | SuperWriter: Reflection-Driven Long-Form Generation with Large Language Models | Yuhao Wu et.al. | 2506.04180 | link |
| 2025-06-04 | SkipGPT: Dynamic Layer Pruning Reinvented with Token Awareness and Module Decoupling | Anhao Zhao et.al. | 2506.04179 | null |
| 2025-06-04 | Does Prompt Design Impact Quality of Data Imputation by LLMs? | Shreenidhi Srinivasan et.al. | 2506.04172 | null |
| 2025-06-04 | VISCA: Inferring Component Abstractions for Automated End-to-End Testing | Parsa Alian et.al. | 2506.04161 | null |
| 2025-06-03 | Entity-Augmented Neuroscience Knowledge Retrieval Using Ontology and Semantic Understanding Capability of LLM | Pralaypati Ta et.al. | 2506.03145 | null |
| 2025-06-03 | Not All Tokens Are Meant to Be Forgotten | Xiangyu Zhou et.al. | 2506.03142 | null |
| 2025-06-03 | SVGenius: Benchmarking LLMs in SVG Understanding, Editing and Generation | Siqi Chen et.al. | 2506.03139 | link |
| 2025-06-03 | Native-Resolution Image Synthesis | Zidong Wang et.al. | 2506.03131 | link |
| 2025-06-03 | AnimeShooter: A Multi-Shot Animation Dataset for Reference-Guided Video Generation | Lu Qiu et.al. | 2506.03126 | link |
| 2025-06-03 | AUTOCIRCUIT-RL: Reinforcement Learning-Driven LLM for Automated Circuit Topology Generation | Prashanth Vijayaraghavan et.al. | 2506.03122 | null |
| 2025-06-03 | Critique-GRPO: Advancing LLM Reasoning with Natural Language and Numerical Feedback | Xiaoying Zhang et.al. | 2506.03106 | link |
| 2025-06-03 | TalkingMachines: Real-Time Audio-Driven FaceTime-Style Video via Autoregressive Diffusion Models | Chetwin Low et.al. | 2506.03099 | link |
| 2025-06-03 | EDITOR: Effective and Interpretable Prompt Inversion for Text-to-Image Diffusion Models | Mingzhe Li et.al. | 2506.03067 | null |
| 2025-06-03 | Facts Do Care About Your Language: Assessing Answer Quality of Multilingual LLMs | Yuval Kansal et.al. | 2506.03051 | null |
| 2025-05-30 | MoDoMoDo: Multi-Domain Data Mixtures for Multimodal LLM Reinforcement Learning | Yiqing Liang et.al. | 2505.24871 | link |
| 2025-05-30 | SiLVR: A Simple Language-based Video Reasoning Framework | Ce Zhang et.al. | 2505.24869 | link |
| 2025-05-30 | ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models | Mingjie Liu et.al. | 2505.24864 | null |
| 2025-05-30 | MiCRo: Mixture Modeling and Context-aware Routing for Personalized Preference Learning | Jingyan Shen et.al. | 2505.24846 | null |
| 2025-05-30 | Chameleon: A Flexible Data-mixing Framework for Language Model Pretraining and Finetuning | Wanyun Xie et.al. | 2505.24844 | null |
| 2025-05-30 | Vision LLMs Are Bad at Hierarchical Visual Understanding, and LLMs Are the Bottleneck | Yuwen Tan et.al. | 2505.24840 | null |
| 2025-05-30 | VideoCAD: A Large-Scale Video Dataset for Learning UI Interactions and 3D Reasoning from CAD Software | Brandon Man et.al. | 2505.24838 | link |
| 2025-05-30 | Improving Reliability and Explainability of Medical Question Answering through Atomic Fact Checking in Retrieval-Augmented LLMs | Juraj Vladika et.al. | 2505.24830 | null |
| 2025-05-30 | LegalEval-Q: A New Benchmark for The Quality Evaluation of LLM-Generated Legal Text | Li yunhan et.al. | 2505.24826 | null |
| 2025-05-30 | PhySense: Principle-Based Physics Reasoning Benchmarking for Large Language Models | Yinggan Xu et.al. | 2505.24823 | null |
| 2025-05-29 | Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought | Yunze Man et.al. | 2505.23766 | null |
| 2025-05-29 | From Chat Logs to Collective Insights: Aggregative Question Answering | Wentao Zhang et.al. | 2505.23765 | null |
| 2025-05-29 | MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence | Sihan Yang et.al. | 2505.23764 | null |
| 2025-05-29 | Sketch Down the FLOPs: Towards Efficient Networks for Human Sketch | Aneeshan Sain et.al. | 2505.23763 | null |
| 2025-05-29 | Puzzled by Puzzles: When Vision-Language Models Can’t Take a Hint | Heekyung Lee et.al. | 2505.23759 | link |
| 2025-05-29 | DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning | Ziyin Zhang et.al. | 2505.23754 | link |
| 2025-05-29 | ThinkGeo: Evaluating Tool-Augmented Agents for Remote Sensing Tasks | Akashah Shabbir et.al. | 2505.23752 | link |
| 2025-05-29 | Distortion of AI Alignment: Does Preference Optimization Optimize for Preferences? | Paul Gölz et.al. | 2505.23749 | null |
| 2025-05-29 | Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence | Diankun Wu et.al. | 2505.23747 | link |
| 2025-05-29 | Bounded Rationality for LLMs: Satisficing Alignment at Inference-Time | Mohamad Chehade et.al. | 2505.23729 | null |
| 2025-05-28 | Zero-Shot Vision Encoder Grafting via LLM Surrogates | Kaiyu Yue et.al. | 2505.22664 | link |
| 2025-05-28 | AutoL2S: Auto Long-Short Reasoning for Efficient Large Language Models | Feng Luo et.al. | 2505.22662 | null |
| 2025-05-28 | GuessArena: Guess Who I Am? A Self-Adaptive Framework for Evaluating LLMs in Domain-Specific Knowledge and Reasoning | Qingchen Yu et.al. | 2505.22661 | link |
| 2025-05-28 | 3DLLM-Mem: Long-Term Spatial-Temporal Memory for Embodied 3D Large Language Model | Wenbo Hu et.al. | 2505.22657 | null |
| 2025-05-28 | Position: Uncertainty Quantification Needs Reassessment for Large-language Model Agents | Michael Kirchhof et.al. | 2505.22655 | null |
| 2025-05-28 | The Climb Carves Wisdom Deeper Than the Summit: On the Noisy Rewards in Learning to Reason | Ang Lv et.al. | 2505.22653 | link |
| 2025-05-28 | Characterizing Bias: Benchmarking Large Language Models in Simplified versus Traditional Chinese | Hanjia Lyu et.al. | 2505.22645 | link |
| 2025-05-28 | Learning Composable Chains-of-Thought | Fangcong Yin et.al. | 2505.22635 | null |
| 2025-05-28 | Spatial Knowledge Graph-Guided Multimodal Synthesis | Yida Xue et.al. | 2505.22633 | null |
| 2025-05-28 | Stochastic Chameleons: Irrelevant Context Hallucinations Reveal Class-Based (Mis)Generalization in LLMs | Ziling Cheng et.al. | 2505.22630 | null |
| 2025-05-27 | Silence is Not Consensus: Disrupting Agreement Bias in Multi-Agent LLMs via Catfish Agent for Clinical Decision Making | Yihan Wang et.al. | 2505.21503 | null |
| 2025-05-27 | Adversarial Attacks against Closed-Source MLLMs via Feature Optimal Alignment | Xiaojun Jia et.al. | 2505.21494 | link |
| 2025-05-27 | Reinforcing General Reasoning without Verifiers | Xiangxin Zhou et.al. | 2505.21493 | link |
| 2025-05-27 | Robust Hypothesis Generation: LLM-Automated Language Bias for Inductive Logic Programming | Yang Yang et.al. | 2505.21486 | null |
| 2025-05-27 | Are Language Models Consequentialist or Deontological Moral Reasoners? | Keenan Samway et.al. | 2505.21479 | null |
| 2025-05-27 | Policy Optimized Text-to-Image Pipeline Design | Uri Gadot et.al. | 2505.21478 | null |
| 2025-05-27 | Scaling External Knowledge Input Beyond Context Windows of LLMs via Multi-Agent Collaboration | Zijun Liu et.al. | 2505.21471 | link |
| 2025-05-27 | Do LLMs Need to Think in One Language? Correlation between Latent Language and Task Performance | Shintaro Ozaki et.al. | 2505.21458 | null |
| 2025-05-27 | Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO | Muzhi Zhu et.al. | 2505.21457 | link |
| 2025-05-27 | Can Large Reasoning Models Self-Train? | Sheikh Shafayat et.al. | 2505.21444 | null |
| 2025-05-26 | Pangu Light: Weight Re-Initialization for Pruning and Accelerating LLMs | Hanting Chen et.al. | 2505.20155 | null |
| 2025-05-26 | UORA: Uniform Orthogonal Reinitialization Adaptation in Parameter-Efficient Fine-Tuning of Large Models | Xueyan Zhang et.al. | 2505.20154 | null |
| 2025-05-26 | MineAnyBuild: Benchmarking Spatial Planning for Open-world AI Agents | Ziming Wei et.al. | 2505.20148 | null |
| 2025-05-26 | FUDOKI: Discrete Flow-based Unified Understanding and Generation via Kinetic-Optimal Velocities | Jin Wang et.al. | 2505.20147 | null |
| 2025-05-26 | StructEval: Benchmarking LLMs’ Capabilities to Generate Structural Outputs | Jialin Yang et.al. | 2505.20139 | link |
| 2025-05-26 | Iterative Self-Incentivization Empowers Large Language Models as Agentic Searchers | Zhengliang Shi et.al. | 2505.20128 | null |
| 2025-05-26 | Agentic AI Process Observability: Discovering Behavioral Variability | Fabiana Fournier et.al. | 2505.20127 | null |
| 2025-05-26 | TrojanStego: Your Language Model Can Secretly Be A Steganographic Privacy Leaking Agent | Dominik Meier et.al. | 2505.20118 | link |
| 2025-05-26 | Named Entity Recognition in Historical Italian: The Case of Giacomo Leopardi’s Zibaldone | Cristian Santini et.al. | 2505.20113 | null |
| 2025-05-26 | ResSVD: Residual Compensated SVD for Large Language Model Compression | Haolei Bai et.al. | 2505.20112 | null |
| 2025-05-26 | Language-Agnostic Suicidal Risk Detection Using Large Language Models | June-Woo Kim et.al. | 2505.20109 | null |
| 2025-05-26 | Adaptive Deep Reasoning: Triggering Deep Thinking When Needed | Yunhao Wang et.al. | 2505.20101 | null |
| 2025-05-23 | Fann or Flop: A Multigenre, Multiera Benchmark for Arabic Poetry Understanding in LLMs | Wafa Alghallabi et.al. | 2505.18152 | link |
| 2025-05-23 | First Finish Search: Efficient Test-Time Scaling in Large Language Models | Aradhye Agarwal et.al. | 2505.18149 | link |
| 2025-05-23 | Lost in the Haystack: Smaller Needles are More Difficult for LLMs to Find | Owen Bianchi et.al. | 2505.18148 | link |
| 2025-05-23 | Gaming Tool Preferences in Agentic LLMs | Kazem Faghih et.al. | 2505.18135 | link |
| 2025-05-23 | Reward Model Overoptimisation in Iterated RLHF | Lorenz Wolf et.al. | 2505.18126 | null |
| 2025-05-23 | UNJOIN: Enhancing Multi-Table Text-to-SQL Generation via Schema Simplification | Poojah Ganesan et.al. | 2505.18122 | null |
| 2025-05-23 | ProgRM: Build Better GUI Agents with Progress Rewards | Danyang Zhang et.al. | 2505.18121 | null |
| 2025-05-23 | Bidirectional Knowledge Distillation for Enhancing Sequential Recommendation with Large Language Models | Jiongran Wu et.al. | 2505.18120 | null |
| 2025-05-23 | Watch and Listen: Understanding Audio-Visual-Speech Moments with Multimodal LLM | Zinuo Li et.al. | 2505.18110 | null |
| 2025-05-23 | ManuSearch: Democratizing Deep Search in Large Language Models with a Transparent and Open Multi-Agent Framework | Lisheng Huang et.al. | 2505.18105 | link |
| 2025-05-22 | CrossLMM: Decoupling Long Video Sequences from LMMs via Dual Cross-Attention Mechanisms | Shilin Yan et.al. | 2505.17020 | link |
| 2025-05-22 | Let Androids Dream of Electric Sheep: A Human-like Image Implication Understanding and Reasoning Framework | Chenhao Zhang et.al. | 2505.17019 | link |
| 2025-05-22 | SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward | Kaixuan Fan et.al. | 2505.17018 | link |
| 2025-05-22 | Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO | Chengzhuo Tong et.al. | 2505.17017 | link |
| 2025-05-22 | Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models | Runsen Xu et.al. | 2505.17015 | link |
| 2025-05-22 | SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding | Haoning Wu et.al. | 2505.17012 | link |
| 2025-05-22 | R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning | Huatong Song et.al. | 2505.17005 | link |
| 2025-05-22 | Do Large Language Models Excel in Complex Logical Reasoning with Formal Language? | Jin Jiang et.al. | 2505.16998 | link |
| 2025-05-22 | DecoupledESC: Enhancing Emotional Support Generation via Strategy-Response Decoupled Preference Optimization | Chao Zhang et.al. | 2505.16995 | null |
| 2025-05-22 | Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel Decoding | Runpeng Yu et.al. | 2505.16990 | link |
| 2025-05-21 | The Atlas of In-Context Learning: How Attention Heads Shape In-Context Retrieval Augmentation | Patrick Kahardipraja et.al. | 2505.15807 | null |
| 2025-05-21 | Keep Security! Benchmarking Security Policy Preservation in Large Language Model Contexts Against Indirect Attacks in Question Answering | Hwan Chang et.al. | 2505.15805 | link |
| 2025-05-21 | STAR-R1: Spatial TrAnsformation Reasoning by Reinforcing Multimodal LLMs | Zongzhao Li et.al. | 2505.15804 | link |
| 2025-05-21 | VerifyBench: Benchmarking Reference-based Reward Systems for Large Language Models | Yuchen Yan et.al. | 2505.15801 | link |
| 2025-05-21 | Reverse Engineering Human Preferences with Reinforcement Learning | Lisa Alazraki et.al. | 2505.15795 | null |
| 2025-05-21 | HCRMP: A LLM-Hinted Contextual Reinforcement Learning Framework for Autonomous Driving | Zhiwen Chen et.al. | 2505.15793 | null |
| 2025-05-21 | Large Language Models as Computable Approximations to Solomonoff Induction | Jun Wan et.al. | 2505.15784 | null |
| 2025-05-21 | ConvSearch-R1: Enhancing Query Reformulation for Conversational Search with Reasoning via Reinforcement Learning | Changtai Zhu et.al. | 2505.15776 | link |
| 2025-05-21 | Beyond Hard and Soft: Hybrid Context Compression for Balancing Local and Global Information Retention | Huanxuan Liao et.al. | 2505.15774 | null |
| 2025-05-21 | MIKU-PAL: An Automated and Standardized Multi-Modal Method for Speech Paralinguistic and Affect Labeling | Cheng Yifan et.al. | 2505.15772 | null |
| 2025-05-20 | Mind the Gap: Bridging Thought Leap for Improved Chain-of-Thought Tuning | Haolei Xu et.al. | 2505.14684 | link |
| 2025-05-20 | UniGen: Enhanced Training & Test-Time Strategies for Unified Multimodal Understanding and Generation | Rui Tian et.al. | 2505.14682 | null |
| 2025-05-20 | UltraEdit: Training-, Subject-, and Memory-Free Lifelong Editing in Large Language Models | Xiaojie Gu et.al. | 2505.14679 | link |
| 2025-05-20 | Visionary-R1: Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning | Jiaer Xia et.al. | 2505.14677 | link |
| 2025-05-20 | Reward Reasoning Model | Jiaxin Guo et.al. | 2505.14674 | null |
| 2025-05-20 | Quartet: Native FP4 Training Can Be Optimal for Large Language Models | Roberto L. Castro et.al. | 2505.14669 | link |
| 2025-05-20 | ContextAgent: Context-Aware Proactive LLM Agents with Open-World Sensory Perceptions | Bufang Yang et.al. | 2505.14668 | link |
| 2025-05-20 | Beyond Words: Multimodal LLM Knows When to Speak | Zikai Liao et.al. | 2505.14654 | null |
| 2025-05-20 | General-Reasoner: Advancing LLM Reasoning Across All Domains | Xueguang Ma et.al. | 2505.14652 | link |
| 2025-05-20 | Think Only When You Need with Large Hybrid-Reasoning Models | Lingjie Jiang et.al. | 2505.14631 | null |
| 2025-05-19 | CIE: Controlling Language Model Text Generations Using Continuous Signals | Vinay Samuel et.al. | 2505.13448 | link |
| 2025-05-19 | Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards | Xiaoyuan Liu et.al. | 2505.13445 | link |
| 2025-05-19 | Optimizing Anytime Reasoning via Budget Relative Policy Optimization | Penghui Qi et.al. | 2505.13438 | link |
| 2025-05-19 | SMOTExT: SMOTE meets Large Language Models | Mateusz Bystroński et.al. | 2505.13434 | null |
| 2025-05-19 | Fine-tuning Quantized Neural Networks with Zeroth-order Optimization | Sifeng Shang et.al. | 2505.13430 | link |
| 2025-05-19 | Understanding Complexity in VideoQA via Visual Program Generation | Cristobal Eyzaguirre et.al. | 2505.13429 | null |
| 2025-05-19 | MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision | Lingxiao Du et.al. | 2505.13427 | link |
| 2025-05-19 | Learnware of Language Models: Specialized Small Language Models Can Do Big | Zhi-Hao Tan et.al. | 2505.13425 | null |
| 2025-05-19 | Make Still Further Progress: Chain of Thoughts for Tabular Data Leaderboard | Si-Yang Liu et.al. | 2505.13421 | null |
| 2025-05-19 | FEALLM: Advancing Facial Emotion Analysis in Multimodal Large Language Models with Emotional Synergy and Reasoning | Zhuozhao Hu et.al. | 2505.13419 | link |
| 2025-05-16 | Modeling cognitive processes of natural reading with transformer-based Language Models | Bruno Bianchi et.al. | 2505.11485 | null |
| 2025-05-16 | msf-CNN: Patch-based Multi-Stage Fusion with Convolutional Neural Networks for TinyML | Zhaolan Huang et.al. | 2505.11483 | null |
| 2025-05-16 | Improving Assembly Code Performance with Large Language Models via Reinforcement Learning | Anjiang Wei et.al. | 2505.11480 | null |
| 2025-05-16 | HelpSteer3-Preference: Open Human-Annotated Preference Data across Diverse Tasks and Languages | Zhilin Wang et.al. | 2505.11475 | null |
| 2025-05-16 | Disentangling Reasoning and Knowledge in Medical Large Language Models | Rahul Thapa et.al. | 2505.11462 | null |
| 2025-05-16 | ProxyPrompt: Securing System Prompts against Prompt Extraction Attacks | Zhixiong Zhuang et.al. | 2505.11459 | null |
| 2025-05-16 | HumaniBench: A Human-Centric Framework for Large Multimodal Models Evaluation | Shaina Raza et.al. | 2505.11454 | link |
| 2025-05-16 | LLMs unlock new paths to monetizing exploits | Nicholas Carlini et.al. | 2505.11449 | null |
| 2025-05-16 | Is Compression Really Linear with Code Intelligence? | Xianzhen Luo et.al. | 2505.11441 | null |
| 2025-05-16 | GODBench: A Benchmark for Multimodal Large Language Models in Video Comment Art | Chenkai Zhang et.al. | 2505.11436 | null |
| 2025-05-15 | End-to-End Vision Tokenizer Tuning | Wenxuan Wang et.al. | 2505.10562 | null |
| 2025-05-15 | Neural Thermodynamic Laws for Large Language Model Training | Ziming Liu et.al. | 2505.10559 | null |
| 2025-05-15 | MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning | Ke Wang et.al. | 2505.10557 | link |
| 2025-05-15 | Does Feasibility Matter? Understanding the Impact of Feasibility on Synthetic Training Data | Yiwen Liu et.al. | 2505.10551 | link |
| 2025-05-15 | Towards a Deeper Understanding of Reasoning Capabilities in Large Language Models | Annie Wong et.al. | 2505.10543 | link |
| 2025-05-15 | Exploring Implicit Visual Misunderstandings in Multimodal Large Language Models through Attention Analysis | Pengfei Wang et.al. | 2505.10541 | link |
| 2025-05-15 | S3C2 Summit 2024-09: Industry Secure Software Supply Chain Summit | Imranur Rahman et.al. | 2505.10538 | null |
| 2025-05-15 | RouteNator: A Router-Based Multi-Modal Architecture for Generating Synthetic Training Data for Function Calling LLMs | Vibha Belavadi et.al. | 2505.10495 | null |
| 2025-05-15 | Can You Really Trust Code Copilots? Evaluating Large Language Models from a Code Security Perspective | Yutao Mou et.al. | 2505.10494 | link |
| 2025-05-15 | CL-RAG: Bridging the Gap in Retrieval-Augmented Generation with Curriculum Learning | Shaohan Wang et.al. | 2505.10493 | null |
| 2025-05-14 | Customizing a Large Language Model for VHDL Design of High-Performance Microprocessors | Nicolas Dupuis et.al. | 2505.09610 | null |
| 2025-05-14 | Adversarial Suffix Filtering: a Defense Pipeline for LLMs | David Khachaturov et.al. | 2505.09602 | null |
| 2025-05-14 | How Hungry is AI? Benchmarking Energy, Water, and Carbon Footprint of LLM Inference | Nidhal Jegham et.al. | 2505.09598 | null |
| 2025-05-14 | WorldView-Bench: A Benchmark for Evaluating Global Cultural Perspectives in Large Language Models | Abdullah Mushtaq et.al. | 2505.09595 | null |
| 2025-05-14 | Variational Visual Question Answering | Tobias Jan Wieczorek et.al. | 2505.09591 | null |
| 2025-05-14 | Beyond Likes: How Normative Feedback Complements Engagement Signals on Social Media | Yuchen Wu et.al. | 2505.09583 | null |
| 2025-05-14 | Ethics and Persuasion in Reinforcement Learning from Human Feedback: A Procedural Rhetorical Approach | Shannon Lodoen et.al. | 2505.09576 | null |
| 2025-05-14 | MIGRATION-BENCH: Repository-Level Code Migration Benchmark from Java 8 | Linbo Liu et.al. | 2505.09569 | link |
| 2025-05-14 | PT-MoE: An Efficient Finetuning Framework for Integrating Mixture-of-Experts into Prompt Tuning | Zongqian Li et.al. | 2505.09519 | null |
| 2025-05-14 | Layered Unlearning for Adversarial Relearning | Timothy Qian et.al. | 2505.09500 | link |
| 2025-05-13 | CodePDE: An Inference Framework for LLM-driven PDE Solver Generation | Shanda Li et.al. | 2505.08783 | link |
| 2025-05-13 | HealthBench: Evaluating Large Language Models Towards Improved Human Health | Rahul K. Arora et.al. | 2505.08775 | link |
| 2025-05-14 | Towards Autonomous UAV Visual Object Search in City Space: Benchmark and Agentic Methodology | Yatai Ji et.al. | 2505.08765 | null |
| 2025-05-13 | AC-Reason: Towards Theory-Guided Actual Causality Reasoning with Large Language Models | Yanxi Zhang et.al. | 2505.08750 | link |
| 2025-05-13 | DeepMath-Creative: A Benchmark for Evaluating Mathematical Creativity of Large Language Models | Xiaoyang Chen et.al. | 2505.08744 | link |
| 2025-05-13 | Probability Consistency in Large Language Models: Theoretical Foundations Meet Empirical Discrepancies | Xiaoliang Luo et.al. | 2505.08739 | null |
| 2025-05-13 | NurValues: Real-World Nursing Values Evaluation for Large Language Models in Clinical Context | Ben Yao et.al. | 2505.08734 | null |
| 2025-05-13 | Securing RAG: A Risk Assessment and Mitigation Framework | Lukas Ammann et.al. | 2505.08728 | null |
| 2025-05-13 | PWC-MoE: Privacy-Aware Wireless Collaborative Mixture of Experts | Yang Su et.al. | 2505.08719 | null |
| 2025-05-13 | LLM-based Prompt Ensemble for Reliable Medical Entity Recognition from EHRs | K M Sajjadul Islam et.al. | 2505.08704 | null |
| 2025-05-12 | A Comparative Analysis of Static Word Embeddings for Hungarian | Máté Gedeon et.al. | 2505.07809 | link |
| 2025-05-12 | Learning Dynamics in Continual Pre-Training for Large Language Models | Xingjin Wang et.al. | 2505.07796 | link |
| 2025-05-12 | Domain Regeneration: How well do LLMs match syntactic properties of text domains? | Da Ju et.al. | 2505.07784 | null |
| 2025-05-12 | Relative Overfitting and Accept-Reject Framework | Yanxin Liu et.al. | 2505.07783 | null |
| 2025-05-12 | MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering | Rushi Qiang et.al. | 2505.07782 | link |
| 2025-05-12 | Must Read: A Systematic Survey of Computational Persuasion | Nimet Beyza Bozdag et.al. | 2505.07775 | null |
| 2025-05-12 | Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving | Xinji Mai et.al. | 2505.07773 | link |
| 2025-05-12 | Enhancing Code Generation via Bidirectional Comment-Level Mutual Grounding | Yifeng Di et.al. | 2505.07768 | null |
| 2025-05-12 | Assessing the Chemical Intelligence of Large Language Models | Nicholas T. Runcie et.al. | 2505.07735 | null |
| 2025-05-12 | Spoken Language Understanding on Unseen Tasks With In-Context Learning | Neeraj Agrawal et.al. | 2505.07731 | null |
| 2025-05-09 | From Millions of Tweets to Actionable Insights: Leveraging LLMs for User Profiling | Vahid Rahimzadeh et.al. | 2505.06184 | null |
| 2025-05-09 | A Large Language Model-Enhanced Q-learning for Capacitated Vehicle Routing Problem with Time Windows | Linjiang Cao et.al. | 2505.06178 | null |
| 2025-05-09 | MonetGPT: Solving Puzzles Enhances MLLMs’ Image Retouching Skills | Niladri Shekhar Dutt et.al. | 2505.06176 | link |
| 2025-05-09 | Turbo-ICL: In-Context Learning-Based Turbo Equalization | Zihang Song et.al. | 2505.06175 | null |
| 2025-05-09 | A Scaling Law for Token Efficiency in LLM Fine-Tuning Under Fixed Compute Budgets | Ryan Lagasse et.al. | 2505.06150 | null |
| 2025-05-09 | Can Prompting LLMs Unlock Hate Speech Detection across Languages? A Zero-shot and Few-shot Study | Faeze Ghorbanpour et.al. | 2505.06149 | null |
| 2025-05-09 | LLMs Get Lost In Multi-Turn Conversation | Philippe Laban et.al. | 2505.06120 | link |
| 2025-05-09 | Multimodal Sentiment Analysis on CMU-MOSEI Dataset using Transformer-based Models | Jugal Gajjar et.al. | 2505.06110 | null |
| 2025-05-09 | LLMs Outperform Experts on Challenging Biology Benchmarks | Lennart Justen et.al. | 2505.06108 | null |
| 2025-05-09 | Free and Fair Hardware: A Pathway to Copyright Infringement-Free Verilog Generation using LLMs | Sam Bush et.al. | 2505.06096 | null |
| 2025-05-08 | Mogao: An Omni Foundation Model for Interleaved Multi-Modal Generation | Chao Liao et.al. | 2505.05472 | null |
| 2025-05-08 | Flow-GRPO: Training Flow Matching Models via Online RL | Jie Liu et.al. | 2505.05470 | link |
| 2025-05-08 | Generating Physically Stable and Buildable LEGO Designs from Text | Ava Pun et.al. | 2505.05469 | link |
| 2025-05-08 | StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant | Haibo Wang et.al. | 2505.05467 | null |
| 2025-05-08 | ComPO: Preference Alignment via Comparison Oracles | Peter Chen et.al. | 2505.05465 | link |
| 2025-05-08 | Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging | Shiqi Chen et.al. | 2505.05464 | link |
| 2025-05-08 | UKElectionNarratives: A Dataset of Misleading Narratives Surrounding Recent UK General Elections | Fatima Haouari et.al. | 2505.05459 | null |
| 2025-05-08 | SITE: towards Spatial Intelligence Thorough Evaluation | Wenqi Wang et.al. | 2505.05456 | null |
| 2025-05-08 | Conversational Process Model Redesign | Nataliia Klievtsova et.al. | 2505.05453 | null |
| 2025-05-08 | clem:todd: A Framework for the Systematic Benchmarking of LLM-Based Task-Oriented Dialogue System Realisations | Chalamalasetti Kranti et.al. | 2505.05445 | null |
| 2025-05-07 | EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning | Zhenghao Xing et.al. | 2505.04623 | link |
| 2025-05-07 | On Path to Multimodal Generalist: General-Level and General-Bench | Hao Fei et.al. | 2505.04620 | link |
| 2025-05-07 | OmniGIRL: A Multilingual and Multimodal Benchmark for GitHub Issue Resolution | Lianghong Guo et.al. | 2505.04606 | link |
| 2025-05-08 | MonoCoP: Chain-of-Prediction for Monocular 3D Object Detection | Zhihao Zhang et.al. | 2505.04594 | null |
| 2025-05-07 | ZeroSearch: Incentivize the Search Capability of LLMs without Searching | Hao Sun et.al. | 2505.04588 | link |
| 2025-05-07 | SlideItRight: Using AI to Find Relevant Slides and Provide Feedback for Open-Ended Questions | Chloe Qianhui Zhao et.al. | 2505.04584 | null |
| 2025-05-07 | Fight Fire with Fire: Defending Against Malicious RL Fine-Tuning via Reward Neutralization | Wenjun Cao et.al. | 2505.04578 | null |
| 2025-05-07 | Comparative Analysis of Carbon Footprint in Manual vs. LLM-Assisted Code Development | Kuen Sum Cheung et.al. | 2505.04521 | null |
| 2025-05-07 | Pangu Ultra MoE: How to Train Your Big MoE on Ascend NPUs | Yehui Tang et.al. | 2505.04519 | null |
| 2025-05-07 | CAD-Llama: Leveraging Large Language Models for Computer-Aided Design Parametric 3D Model Generation | Jiahao Li et.al. | 2505.04481 | null |
| 2025-05-06 | VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model | Zuwei Long et.al. | 2505.03739 | link |
| 2025-05-06 | Graph Drawing for LLMs: An Empirical Evaluation | Walter Didimo et.al. | 2505.03678 | null |
| 2025-05-06 | Binding threshold units with artificial oscillatory neurons | Vladimir Fanaskov et.al. | 2505.03648 | null |
| 2025-05-06 | PhysLLM: Harnessing Large Language Models for Cross-Modal Remote Physiological Sensing | Yiping Xie et.al. | 2505.03621 | null |
| 2025-05-06 | A Unifying Bias-aware Multidisciplinary Framework for Investigating Socio-Technical Issues | Sacha Hasan et.al. | 2505.03593 | null |
| 2025-05-06 | BCause: Human-AI collaboration to improve hybrid mapping and ideation in argumentation-grounded deliberation | Lucas Anastasiou et.al. | 2505.03584 | null |
| 2025-05-06 | DyGEnc: Encoding a Sequence of Textual Scene Graphs to Reason and Answer Questions in Dynamic Scenes | Sergey Linok et.al. | 2505.03581 | link |
| 2025-05-06 | LlamaFirewall: An open source guardrail system for building secure AI agents | Sahana Chennabasappa et.al. | 2505.03574 | null |
| 2025-05-06 | Say It Another Way: A Framework for User-Grounded Paraphrasing | Cléa Chataigner et.al. | 2505.03563 | null |
| 2025-05-06 | A Comprehensive Survey of Large AI Models for Future Communications: Foundations, Applications and Challenges | Feibo Jiang et.al. | 2505.03556 | link |
| 2025-05-05 | Scenethesis: A Language and Vision Agentic Framework for 3D Scene Generation | Lu Ling et.al. | 2505.02836 | null |
| 2025-05-05 | R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning | Yi-Fan Zhang et.al. | 2505.02835 | null |
| 2025-05-05 | ReplaceMe: Network Simplification via Layer Pruning and Linear Transformations | Dmitriy Shopkhoev et.al. | 2505.02819 | null |
| 2025-05-05 | Towards Quantifying the Hessian Structure of Neural Networks | Zhaorui Dong et.al. | 2505.02809 | null |
| 2025-05-05 | Generating HomeAssistant Automations Using an LLM-based Chatbot | Mathyas Giudici et.al. | 2505.02802 | null |
| 2025-05-05 | HSplitLoRA: A Heterogeneous Split Parameter-Efficient Fine-Tuning Framework for Large Language Models | Zheng Lin et.al. | 2505.02795 | null |
| 2025-05-05 | Beyond the Monitor: Mixed Reality Visualization and AI for Enhanced Digital Pathology Workflow | Jai Prakash Veerla et.al. | 2505.02780 | null |
| 2025-05-05 | Giving Simulated Cells a Voice: Evolving Prompt-to-Intervention Models for Cellular Control | Nam H. Le et.al. | 2505.02766 | null |
| 2025-05-05 | Bye-bye, Bluebook? Automating Legal Procedure with Large Language Models | Matthew Dahl et.al. | 2505.02763 | null |
| 2025-05-05 | Knowledge Graphs for Enhancing Large Language Models in Entity Disambiguation | Pons Gerard et.al. | 2505.02737 | null |
| 2025-05-02 | Helping Big Language Models Protect Themselves: An Enhanced Filtering and Summarization System | Sheikh Samit Muhaimin et.al. | 2505.01315 | null |
| 2025-05-02 | Enhancing SPARQL Query Rewriting for Complex Ontology Alignments | Anicet Lepetit Ondo et.al. | 2505.01309 | null |
| 2025-05-02 | Document Retrieval Augmented Fine-Tuning (DRAFT) for safety-critical software assessments | Regan Bolton et.al. | 2505.01307 | null |
| 2025-05-02 | FlowDubber: Movie Dubbing with LLM-based Semantic-aware Learning and Flow Matching based Voice Enhancing | Gaoxiang Cong et.al. | 2505.01263 | null |
| 2025-05-02 | Digital Pathway Curation (DPC): a comparative pipeline to assess the reproducibility, consensus and accuracy across Gemini, PubMed, and scientific reviewers in biomedical research | Flavio Lichtenstein et.al. | 2505.01259 | null |
| 2025-05-02 | CaReAQA: A Cardiac and Respiratory Audio Question Answering Model for Open-Ended Diagnostic Reasoning | Tsai-Ning Wang et.al. | 2505.01199 | null |
| 2025-05-02 | LLM Security: Vulnerabilities, Attacks, Defenses, and Countermeasures | Francisco Aguilera-Martínez et.al. | 2505.01177 | null |
| 2025-05-02 | Methodological Foundations for AI-Driven Survey Question Generation | Ted K. Mburu et.al. | 2505.01150 | null |
| 2025-05-02 | Retrieval-Augmented Generation in Biomedicine: A Survey of Technologies, Datasets, and Clinical Applications | Jiawei He et.al. | 2505.01146 | null |
| 2025-05-02 | MateICL: Mitigating Attention Dispersion in Large-Scale In-Context Learning | Murtadha Ahmed et.al. | 2505.01110 | null |
| 2025-05-01 | T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT | Dongzhi Jiang et.al. | 2505.00703 | null |
| 2025-05-01 | Steering Large Language Models with Register Analysis for Arbitrary Style Transfer | Xinchen Yang et.al. | 2505.00679 | null |
| 2025-05-01 | Rethinking Memory in AI: Taxonomy, Operations, Topics, and Future Directions | Yiming Du et.al. | 2505.00675 | null |
| 2025-05-01 | DeepCritic: Deliberate Critique with Large Language Models | Wenkai Yang et.al. | 2505.00662 | null |
| 2025-05-01 | On the generalization of language models from in-context learning and finetuning: a controlled study | Andrew K. Lampinen et.al. | 2505.00661 | null |
| 2025-05-01 | Large Language Models Understanding: an Inherent Ambiguity Barrier | Daniel N. Nissani et.al. | 2505.00654 | null |
| 2025-05-01 | Open-Source LLM-Driven Federated Transformer for Predictive IoV Management | Yazan Otoum et.al. | 2505.00651 | null |
| 2025-05-01 | Investigating Task Arithmetic for Zero-Shot Information Retrieval | Marco Braga et.al. | 2505.00649 | null |
| 2025-05-01 | The Illusion of Role Separation: Hidden Shortcuts in LLM Role Learning (and How to Fix Them) | Zihao Wang et.al. | 2505.00626 | null |
| 2025-05-01 | FineScope : Precision Pruning for Domain-Specialized Large Language Models Using SAE-Guided Self-Data Cultivation | Chaitali Bhattacharyya et.al. | 2505.00624 | null |
| 2025-04-30 | TRUST: An LLM-Based Dialogue System for Trauma Understanding and Structured Assessments | Sichang Tu et.al. | 2504.21851 | null |
| 2025-04-30 | COMPACT: COMPositional Atomic-to-Complex Visual Capability Tuning | Xindi Wu et.al. | 2504.21850 | null |
| 2025-04-30 | An Empirical Study on the Effectiveness of Large Language Models for Binary Code Understanding | Xiuwei Shang et.al. | 2504.21803 | null |
| 2025-04-30 | DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition | Z. Z. Ren et.al. | 2504.21801 | null |
| 2025-04-30 | MAC-Tuning: LLM Multi-Compositional Problem Reasoning with Enhanced Knowledge Boundary Awareness | Junsheng Huang et.al. | 2504.21773 | null |
| 2025-04-30 | LASHED: LLMs And Static Hardware Analysis for Early Detection of RTL Bugs | Baleegh Ahmad et.al. | 2504.21770 | null |
| 2025-04-30 | LLM-based Interactive Imitation Learning for Robotic Manipulation | Jonas Werner et.al. | 2504.21769 | null |
| 2025-04-30 | Investigating Literary Motifs in Ancient and Medieval Novels with Large Language Models | Emelie Hallenberg et.al. | 2504.21742 | null |
| 2025-04-30 | TheraQuest: A Gamified, LLM-Powered Simulation for Massage Therapy Training | Shengqian Wang et.al. | 2504.21735 | null |
| 2025-04-30 | XBreaking: Explainable Artificial Intelligence for Jailbreaking LLMs | Marco Arazzi et.al. | 2504.21700 | null |
| 2025-04-29 | YoChameleon: Personalized Vision and Language Generation | Thao Nguyen et.al. | 2504.20998 | null |
| 2025-04-29 | Toward Efficient Exploration by Large Language Model Agents | Dilip Arumugam et.al. | 2504.20997 | null |
| 2025-04-29 | X-Fusion: Introducing New Modality to Frozen Large Language Models | Sicheng Mo et.al. | 2504.20996 | null |
| 2025-04-29 | ACE: A Security Architecture for LLM-Integrated App Systems | Evan Li et.al. | 2504.20984 | null |
| 2025-04-29 | Real-Time Wayfinding Assistant for Blind and Low-Vision Users | Dabbrata Das et.al. | 2504.20976 | null |
| 2025-04-29 | SetKE: Knowledge Editing for Knowledge Elements Overlap | Yifan Wei et.al. | 2504.20972 | null |
| 2025-04-29 | OSVBench: Benchmarking LLMs on Specification Generation Tasks for Operating System Verification | Shangyu Li et.al. | 2504.20964 | null |
| 2025-04-29 | Information Gravity: A Field-Theoretic Model for Token Selection in Large Language Models | Maryna Vyshnyvetska et.al. | 2504.20951 | null |
| 2025-04-29 | Trace-of-Thought: Enhanced Arithmetic Problem Solving via Reasoning Distillation From Large to Small Language Models | Tyler McDonald et.al. | 2504.20946 | null |
| 2025-04-29 | ChestX-Reasoner: Advancing Radiology Foundation Models with Reasoning through Step-by-Step Verification | Ziqing Fan et.al. | 2504.20930 | null |
| 2025-04-28 | AutoJudge: Judge Decoding Without Manual Annotation | Roman Garipov et.al. | 2504.20039 | null |
| 2025-04-28 | SpatialReasoner: Towards Explicit and Generalizable 3D Spatial Reasoning | Wufei Ma et.al. | 2504.20024 | null |
| 2025-04-28 | Better To Ask in English? Evaluating Factual Accuracy of Multilingual LLMs in English and Low-Resource Languages | Pritika Rohera et.al. | 2504.20022 | null |
| 2025-04-28 | Modular Machine Learning: An Indispensable Path towards New-Generation Large Language Models | Xin Wang et.al. | 2504.20020 | null |
| 2025-04-28 | LLM-Generated Fake News Induces Truth Decay in News Ecosystem: A Case Study on Neural News Recommendation | Beizhe Hu et.al. | 2504.20013 | null |
| 2025-04-28 | Towards Automated Scoping of AI for Social Good Projects | Jacob Emmerson et.al. | 2504.20010 | null |
| 2025-04-28 | Knowledge Distillation of Domain-adapted LLMs for Question-Answering in Telecom | Rishika Sen et.al. | 2504.20000 | null |
| 2025-04-28 | TD-EVAL: Revisiting Task-Oriented Dialogue Evaluation by Combining Turn-Level Precision with Dialogue-Level Comparisons | Emre Can Acikgoz et.al. | 2504.19982 | null |
| 2025-04-28 | Accurate and Diverse LLM Mathematical Reasoning via Automated PRM-Guided GFlowNets | Adam Younsi et.al. | 2504.19981 | null |
| 2025-04-29 | From Concept to Practice: an Automated LLM-aided UVM Machine for RTL Verification | Junhao Ye et.al. | 2504.19959 | null |
| 2025-04-25 | TRACE Back from the Future: A Probabilistic Reasoning Approach to Controllable Language Generation | Gwen Yidou Weng et.al. | 2504.18535 | null |
| 2025-04-25 | Eval3D: Interpretable and Fine-grained Evaluation for 3D Generation | Shivam Duggal et.al. | 2504.18509 | null |
| 2025-04-25 | TopSpace: spatial topic modeling for unsupervised discovery of multicellular spatial tissue structures in multiplex imaging | Junsouk Choi et.al. | 2504.18495 | null |
| 2025-04-25 | Investigating Co-Constructive Behavior of Large Language Models in Explanation Dialogues | Leandra Fichtel et.al. | 2504.18483 | null |
| 2025-04-25 | Generative Induction of Dialogue Task Schemas with Streaming Refinement and Simulated Interactions | James D. Finch et.al. | 2504.18474 | null |
| 2025-04-25 | Reason Like a Radiologist: Chain-of-Thought and Reinforcement Learning for Verifiable Report Generation | Peiyuan Jing et.al. | 2504.18453 | null |
| 2025-04-25 | LLMpatronous: Harnessing the Power of LLMs For Vulnerability Detection | Rajesh Yarra et.al. | 2504.18423 | null |
| 2025-04-25 | BitNet v2: Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs | Hongyu Wang et.al. | 2504.18415 | null |
| 2025-04-25 | An Empirical Study of Evaluating Long-form Question Answering | Ning Xian et.al. | 2504.18413 | null |
| 2025-04-25 | Expressing stigma and inappropriate responses prevents LLMs from safely replacing mental health providers | Jared Moore et.al. | 2504.18412 | link |
| 2025-04-24 | Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models | Xu Ma et.al. | 2504.17789 | null |
| 2025-04-24 | Replay to Remember: Retaining Domain Knowledge in Streaming Language Models | Sneh Pillai et.al. | 2504.17780 | null |
| 2025-04-24 | Conversational Assistants to support Heart Failure Patients: comparing a Neurosymbolic Architecture with ChatGPT | Anuja Tayal et.al. | 2504.17753 | null |
| 2025-04-24 | Towards Robust LLMs: an Adversarial Robustness Measurement Framework | Natan Levy et.al. | 2504.17723 | null |
| 2025-04-24 | Multilingual Performance Biases of Large Language Models in Education | Vansh Gupta et.al. | 2504.17720 | null |
| 2025-04-24 | Ensemble Bayesian Inference: Leveraging Small Language Models to Achieve LLM-level Accuracy in Profile Matching Tasks | Haru-Tada Sato et.al. | 2504.17685 | null |
| 2025-04-24 | INSIGHT: Bridging the Student-Teacher Gap in Times of Large Language Models | Jarne Thys et.al. | 2504.17677 | null |
| 2025-04-24 | Energy Considerations of Large Language Model Inference and Efficiency Optimizations | Jared Fernandez et.al. | 2504.17674 | null |
| 2025-04-24 | Cross-region Model Training with Communication-Computation Overlapping and Delay Compensation | Ying Zhu et.al. | 2504.17672 | null |
| 2025-04-24 | Data-Driven Calibration of Prediction Sets in Large Vision-Language Models Based on Inductive Conformal Prediction | Yuanchang Ye et.al. | 2504.17671 | null |
| 2025-04-23 | IberBench: LLM Evaluation on Iberian Languages | José Ángel González et.al. | 2504.16921 | null |
| 2025-04-23 | Do Large Language Models know who did what to whom? | Joseph M. Denning et.al. | 2504.16884 | null |
| 2025-04-23 | Enhancing Critical Thinking with AI: A Tailored Warning System for RAG Models | Xuyang Zhu et.al. | 2504.16883 | null |
| 2025-04-23 | Context-Enhanced Vulnerability Detection Based on Large Language Model | Yixin Yang et.al. | 2504.16877 | null |
| 2025-04-23 | Exploring How LLMs Capture and Represent Domain-Specific Knowledge | Mirian Hipolito Garcia et.al. | 2504.16871 | null |
| 2025-04-23 | Emo Pillars: Knowledge Distillation to Support Fine-Grained Context-Aware and Context-Less Emotion Classification | Alexander Shvets et.al. | 2504.16856 | null |
| 2025-04-23 | Monte Carlo Planning with Large Language Model for Text-Based Game Agents | Zijing Shi et.al. | 2504.16855 | null |
| 2025-04-23 | Improving Significant Wave Height Prediction Using Chronos Models | Yilin Zhai et.al. | 2504.16834 | null |
| 2025-04-23 | LRASGen: LLM-based RESTful API Specification Generation | Sida Deng et.al. | 2504.16833 | null |
| 2025-04-23 | GreenMind: A Next-Generation Vietnamese Large Language Model for Structured and Logical Reasoning | Luu Quy Tung et.al. | 2504.16832 | null |
| 2025-04-22 | TTRL: Test-Time Reinforcement Learning | Yuxin Zuo et.al. | 2504.16084 | link |
| 2025-04-22 | From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning | Le Zhuo et.al. | 2504.16080 | link |
| 2025-04-22 | LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities | Thomas Schmied et.al. | 2504.16078 | null |
| 2025-04-22 | PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models | Shi Qiu et.al. | 2504.16074 | link |
| 2025-04-22 | A Python Tool for Reconstructing Full News Text from GDELT | A. Fronzetti Colladon et.al. | 2504.16063 | null |
| 2025-04-22 | Vision language models are unreliable at trivial spatial cognition | Sangeet Khemlani et.al. | 2504.16061 | null |
| 2025-04-22 | Automated Static Vulnerability Detection via a Holistic Neuro-symbolic Approach | Penghui Li et.al. | 2504.16057 | null |
| 2025-04-22 | Honey, I Shrunk the Language Model: Impact of Knowledge Distillation Methods on Performance and Explainability | Daniel Hendriks et.al. | 2504.16056 | null |
| 2025-04-22 | Certified Mitigation of Worst-Case LLM Copyright Infringement | Jingyu Zhang et.al. | 2504.16046 | null |
| 2025-04-22 | LLMs meet Federated Learning for Scalable and Secure IoT Management | Yazan Otoum et.al. | 2504.16032 | null |
| 2025-04-21 | Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs | Chun-Hsiao Yeh et.al. | 2504.15280 | link |
| 2025-04-21 | VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models | Weiye Xu et.al. | 2504.15279 | link |
| 2025-04-21 | Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning | Jie Cheng et.al. | 2504.15275 | link |
| 2025-04-21 | Interpretable Locomotion Prediction in Construction Using a Memory-Driven LLM Agent With Chain-of-Thought Reasoning | Ehsan Ahmadi et.al. | 2504.15263 | null |
| 2025-04-21 | CRUST-Bench: A Comprehensive Benchmark for C-to-safe-Rust Transpilation | Anirudh Khatry et.al. | 2504.15254 | link |
| 2025-04-21 | Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators | Yilun Zhou et.al. | 2504.15253 | link |
| 2025-04-21 | MR. Guard: Multilingual Reasoning Guardrail using Curriculum Learning | Yahan Yang et.al. | 2504.15241 | null |
| 2025-04-21 | Fully Bayesian Approaches to Topics over Time | Julián Cendrero et.al. | 2504.15220 | null |
| 2025-04-21 | EvalAgent: Discovering Implicit Evaluation Criteria from the Web | Manya Wadhwa et.al. | 2504.15219 | null |
| 2025-04-21 | Integrating Symbolic Execution into the Fine-Tuning of Code-Generating LLMs | Marina Sakharova et.al. | 2504.15210 | null |
| 2025-04-18 | Generative AI Act II: Test Time Scaling Drives Cognition Engineering | Shijie Xia et.al. | 2504.13828 | link |
| 2025-04-18 | Feature Alignment and Representation Transfer in Knowledge Distillation for Large Language Models | Junjie Yang et.al. | 2504.13825 | null |
| 2025-04-18 | Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning | Yixuan Even Xu et.al. | 2504.13818 | null |
| 2025-04-18 | BadApex: Backdoor Attack Based on Adaptive Optimization Mechanism of Black-box Large Language Models | Zhengxian Wu et.al. | 2504.13775 | null |
| 2025-04-18 | DP2Unlearning: An Efficient and Guaranteed Unlearning Framework for LLMs | Tamim Al Mahmud et.al. | 2504.13774 | null |
| 2025-04-18 | Detecting Malicious Source Code in PyPI Packages with LLMs: Does RAG Come in Handy? | Motunrayo Ibiyo et.al. | 2504.13769 | null |
| 2025-04-18 | Scaling sparse feature circuit finding for in-context learning | Dmitrii Kharlapenko et.al. | 2504.13756 | null |
| 2025-04-18 | Controlled Territory and Conflict Tracking (CONTACT): (Geo-)Mapping Occupied Territory from Open Source Intelligence | Paul K. Mandal et.al. | 2504.13730 | null |
| 2025-04-18 | OpenDeception: Benchmarking and Investigating AI Deceptive Behaviors via Open-ended Interaction Simulation | Yichen Wu et.al. | 2504.13707 | null |
| 2025-04-18 | Exploring Multimodal Prompt for Visualization Authoring with Large Language Models | Zhen Wen et.al. | 2504.13700 | null |
| 2025-04-17 | SemCORE: A Semantic-Enhanced Generative Cross-Modal Retrieval Framework with MLLMs | Haoxuan Li et.al. | 2504.13172 | null |
| 2025-04-17 | Sleep-time Compute: Beyond Inference Scaling at Test-time | Kevin Lin et.al. | 2504.13171 | link |
| 2025-04-17 | Exploring Expert Failures Improves LLM Agent Tuning | Li-Cheng Lan et.al. | 2504.13145 | null |
| 2025-04-17 | Syntactic and Semantic Control of Large Language Models via Sequential Monte Carlo | João Loula et.al. | 2504.13139 | null |
| 2025-04-17 | Energy-Based Reward Models for Robust Language Model Alignment | Anamika Lochab et.al. | 2504.13134 | null |
| 2025-04-17 | LLMs Meet Finance: Fine-Tuning Foundation Models for the Open FinLLM Leaderboard | Varun Rao et.al. | 2504.13125 | null |
| 2025-04-17 | Low-hallucination Synthetic Captions for Large-Scale Vision-Language Model Pre-training | Xinsong Zhang et.al. | 2504.13123 | null |
| 2025-04-17 | VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models | Haojian Huang et.al. | 2504.13122 | link |
| 2025-04-17 | Hadamard product in deep learning: Introduction, Advances and Challenges | Grigorios G Chrysos et.al. | 2504.13112 | null |
| 2025-04-17 | Uncertainty-Aware Trajectory Prediction via Rule-Regularized Heteroscedastic Deep Classification | Kumar Manas et.al. | 2504.13111 | null |
| 2025-04-16 | BitNet b1.58 2B4T Technical Report | Shuming Ma et.al. | 2504.12285 | link |
| 2025-04-16 | HLS-Eval: A Benchmark and Framework for Evaluating LLMs on High-Level Synthesis Design Tasks | Stefan Abi-Karam et.al. | 2504.12268 | null |
| 2025-04-16 | FLIP Reasoning Challenge | Andreas Plesner et.al. | 2504.12256 | link |
| 2025-04-16 | AnomalyGen: An Automated Semantic Log Sequence Generation Framework with LLM for Anomaly Detection | Xinyu Li et.al. | 2504.12250 | null |
| 2025-04-16 | MOS: Towards Effective Smart Contract Vulnerability Detection through Mixture-of-Experts Tuning of Large Language Models | Hang Yuan et.al. | 2504.12234 | null |
| 2025-04-16 | Watermarking Needs Input Repetition Masking | David Khachaturov et.al. | 2504.12229 | null |
| 2025-04-16 | d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning | Siyan Zhao et.al. | 2504.12216 | null |
| 2025-04-16 | What Do Large Language Models Know? Tacit Knowledge as a Potential Causal-Explanatory Structure | Céline Budding et.al. | 2504.12187 | null |
| 2025-04-16 | SALAD: Improving Robustness and Generalization through Contrastive Learning with Structure-Aware and LLM-Driven Augmented Data | Suyoung Bae et.al. | 2504.12185 | null |
| 2025-04-16 | Trusting CHATGPT: how minor tweaks in the prompts lead to major differences in sentiment classification | Jaime E. Cuellar et.al. | 2504.12180 | null |
| 2025-04-15 | TextArena | Leon Guertler et.al. | 2504.11442 | link |
| 2025-04-15 | TADACap: Time-series Adaptive Domain-Aware Captioning | Elizabeth Fons et.al. | 2504.11441 | null |
| 2025-04-15 | Masculine Defaults via Gendered Discourse in Podcasts and Large Language Models | Maria Teleki et.al. | 2504.11431 | null |
| 2025-04-15 | A Dual-Space Framework for General Knowledge Distillation of Large Language Models | Xue Zhang et.al. | 2504.11426 | null |
| 2025-04-15 | Reinforcing Compositional Retrieval: Retrieving Step-by-Step for Composing Informative Contexts | Quanyu Long et.al. | 2504.11420 | null |
| 2025-04-15 | DataDecide: How to Predict Best Pretraining Data with Small Experiments | Ian Magnusson et.al. | 2504.11393 | null |
| 2025-04-15 | RankAlign: A Ranking View of the Generator-Validator Gap in Large Language Models | Juan Diego Rodriguez et.al. | 2504.11381 | null |
| 2025-04-15 | Cancer-Myth: Evaluating AI Chatbot on Patient Questions with False Presuppositions | Wang Bill Zhu et.al. | 2504.11373 | link |
| 2025-04-15 | OpenTuringBench: An Open-Model-based Benchmark and Framework for Machine-Generated Text Detection and Attribution | Lucio La Cava et.al. | 2504.11369 | null |
| 2025-04-15 | Teaching Large Language Models to Reason through Learning and Forgetting | Tianwei Ni et.al. | 2504.11364 | null |
| 2025-04-14 | InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models | Jinguo Zhu et.al. | 2504.10479 | null |
| 2025-04-14 | MIEB: Massive Image Embedding Benchmark | Chenghao Xiao et.al. | 2504.10471 | link |
| 2025-04-14 | Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding | Tao Zhang et.al. | 2504.10465 | link |
| 2025-04-14 | The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer | Weixian Lei et.al. | 2504.10462 | link |
| 2025-04-14 | GUI-R1 : A Generalist R1-Style Vision-Language Action Model For GUI Agents | Xiaobo Xia et.al. | 2504.10458 | link |
| 2025-04-14 | M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models | Junxiong Wang et.al. | 2504.10449 | link |
| 2025-04-14 | Multimodal Long Video Modeling Based on Temporal Dynamic Context | Haoran Hao et.al. | 2504.10443 | link |
| 2025-04-14 | LLM Can be a Dangerous Persuader: Empirical Study of Persuasion Safety in Large Language Models | Minqian Liu et.al. | 2504.10430 | null |
| 2025-04-14 | Can We Edit LLMs for Long-Tail Biomedical Knowledge? | Xinhao Yi et.al. | 2504.10421 | null |
| 2025-04-14 | Unchecked and Overlooked: Addressing the Checkbox Blind Spot in Large Language Models with CheckboxQA | Michał Turski et.al. | 2504.10419 | link |
| 2025-04-11 | Quantum Large Language Model Fine-Tuning | Sang Hyub Kim et.al. | 2504.08732 | null |
| 2025-04-11 | DocAgent: A Multi-Agent System for Automated Code Documentation Generation | Dayu Yang et.al. | 2504.08725 | link |
| 2025-04-11 | Hypergraph Vision Transformers: Images are More than Nodes, More than Edges | Joshua Fixelle et.al. | 2504.08710 | null |
| 2025-04-11 | SWE-PolyBench: A multi-language benchmark for repository level evaluation of coding agents | Muhammad Shihab Rashid et.al. | 2504.08703 | link |
| 2025-04-11 | Large Language Models as Span Annotators | Zdeněk Kasner et.al. | 2504.08697 | null |
| 2025-04-11 | TP-RAG: Benchmarking Retrieval-Augmented Large Language Model Agents for Spatiotemporal-Aware Travel Planning | Hang Ni et.al. | 2504.08694 | null |
| 2025-04-11 | Fast-Slow-Thinking: Complex Task Solving with Large Language Models | Yiliu Sun et.al. | 2504.08690 | null |
| 2025-04-11 | Voice Interaction With Conversational AI Could Facilitate Thoughtful Reflection and Substantive Revision in Writing | Jiho Kim et.al. | 2504.08687 | null |
| 2025-04-11 | Variability-Driven User-Story Generation using LLM and Triadic Concept Analysis | Alexandre Bazin et.al. | 2504.08666 | null |
| 2025-04-11 | Quality evaluation of Tabby coding assistant using real source code snippets | Marta Borek et.al. | 2504.08650 | null |
| 2025-04-10 | C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing | Zhongyang Li et.al. | 2504.07964 | link |
| 2025-04-10 | GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation | Lang Lin et.al. | 2504.07962 | link |
| 2025-04-10 | MM-IFEngine: Towards Multimodal Instruction Following | Shengyuan Ding et.al. | 2504.07957 | link |
| 2025-04-10 | VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning | Yukun Qi et.al. | 2504.07956 | link |
| 2025-04-10 | Beyond the Frame: Generating 360° Panoramic Videos from Perspective Videos | Rundong Luo et.al. | 2504.07940 | null |
| 2025-04-10 | Porting an LLM based Application from ChatGPT to an On-Premise Environment | Teemu Paloniemi et.al. | 2504.07907 | null |
| 2025-04-10 | Redefining Machine Translation on Social Network Services with Large Language Models | Hongcheng Guo et.al. | 2504.07901 | link |
| 2025-04-10 | How do Large Language Models Understand Relevance? A Mechanistic Interpretability Perspective | Qi Liu et.al. | 2504.07898 | link |
| 2025-04-10 | Benchmarking Adversarial Robustness to Bias Elicitation in Large Language Models: Scalable Automated Assessment with LLM-as-a-Judge | Riccardo Cantini et.al. | 2504.07887 | link |
| 2025-04-10 | Token Level Routing Inference System for Edge Devices | Jianshu She et.al. | 2504.07878 | null |
| 2025-04-09 | Sculpting Subspaces: Constrained Full Fine-Tuning in LLMs for Continual Learning | Nikhil Shivakumar Nayak et.al. | 2504.07097 | link |
| 2025-04-09 | KG-LLM-Bench: A Scalable Benchmark for Evaluating LLM Reasoning on Textualized Knowledge Graphs | Elan Markowitz et.al. | 2504.07087 | null |
| 2025-04-09 | DeduCE: Deductive Consistency as a Framework to Evaluate LLM Reasoning | Atharva Pandey et.al. | 2504.07080 | null |
| 2025-04-09 | A Survey on Personalized and Pluralistic Preference Alignment in Large Language Models | Zhouhang Xie et.al. | 2504.07070 | null |
| 2025-04-09 | HalluciNot: Hallucination Detection Through Context and Common Knowledge Verification | Bibek Paudel et.al. | 2504.07069 | null |
| 2025-04-09 | TASTE: Text-Aligned Speech Tokenization and Embedding for Spoken Language Modeling | Liang-Hsuan Tseng et.al. | 2504.07053 | link |
| 2025-04-09 | To Backtrack or Not to Backtrack: When Sequential Search Limits Model Reasoning | Tian Qin et.al. | 2504.07052 | null |
| 2025-04-09 | Evaluating Retrieval Augmented Generative Models for Document Queries in Transportation Safety | Chad Melton et.al. | 2504.07022 | null |
| 2025-04-09 | LLM-IFT: LLM-Powered Information Flow Tracking for Secure Hardware | Nowfel Mashnoor et.al. | 2504.07015 | null |
| 2025-04-09 | Towards LLMs Robustness to Changes in Prompt Format Styles | Lilian Ngweta et.al. | 2504.06969 | null |
| 2025-04-08 | GOLLuM: Gaussian Process Optimized LLMs – Reframing LLM Finetuning through Bayesian Optimization | Bojana Ranković et.al. | 2504.06265 | null |
| 2025-04-08 | Hogwild! Inference: Parallel LLM Generation via Concurrent Attention | Gleb Rodionov et.al. | 2504.06261 | link |
| 2025-04-08 | FEABench: Evaluating Language Models on Multiphysics Reasoning Ability | Nayantara Mudur et.al. | 2504.06260 | link |
| 2025-04-08 | Transfer between Modalities with MetaQueries | Xichen Pan et.al. | 2504.06256 | null |
| 2025-04-08 | LExT: Towards Evaluating Trustworthiness of Natural Language Explanations | Krithi Shailya et.al. | 2504.06227 | null |
| 2025-04-08 | Encoder-Decoder Gemma: Improving the Quality-Efficiency Trade-Off via Adaptation | Biao Zhang et.al. | 2504.06225 | null |
| 2025-04-08 | Can Performant LLMs Be Ethical? Quantifying the Impact of Web Crawling Opt-Outs | Dongyang Fan et.al. | 2504.06219 | null |
| 2025-04-08 | From 128K to 4M: Efficient Training of Ultra-Long Context Large Language Models | Chejian Xu et.al. | 2504.06214 | null |
| 2025-04-08 | TxGemma: Efficient and Agentic LLMs for Therapeutics | Eric Wang et.al. | 2504.06196 | null |
| 2025-04-08 | Assessing how hyperparameters impact Large Language Models’ sarcasm detection performance | Montgomery Gole et.al. | 2504.06166 | null |
| 2025-04-07 | URECA: Unique Region Caption Anything | Sangbeom Lim et.al. | 2504.05305 | null |
| 2025-04-07 | Truthful or Fabricated? Using Causal Attribution to Mitigate Reward Hacking in Explanations | Pedro Ferreira et.al. | 2504.05294 | null |
| 2025-04-07 | The challenge of uncertainty quantification of large language models in medicine | Zahra Atf et.al. | 2504.05278 | null |
| 2025-04-07 | Enhancing LLM-Based Short Answer Grading with Retrieval-Augmented Generation | Yucheng Chu et.al. | 2504.05276 | null |
| 2025-04-07 | Do PhD-level LLMs Truly Grasp Elementary Addition? Probing Rule Learning vs. Memorization in Large Language Models | Yang Yan et.al. | 2504.05262 | null |
| 2025-04-07 | Learning to Reason Over Time: Timeline Self-Reflection for Improved Temporal Reasoning in Language Models | Adrián Bazaga et.al. | 2504.05258 | null |
| 2025-04-07 | Explaining Low Perception Model Competency with High-Competency Counterfactuals | Sara Pohland et.al. | 2504.05254 | null |
| 2025-04-07 | LLM-based Automated Grading with Human-in-the-Loop | Hang Li et.al. | 2504.05239 | null |
| 2025-04-08 | Leveraging LLMs for Utility-Focused Annotation: Reducing Manual Effort for Retrieval and RAG | Hengran Zhang et.al. | 2504.05220 | null |
| 2025-04-07 | Unleashing the Power of LLMs in Dense Retrieval with Query Likelihood Modeling | Hengran Zhang et.al. | 2504.05216 | null |
| 2025-04-04 | Do Larger Language Models Imply Better Reasoning? A Pretraining Scaling Law for Reasoning | Xinyi Wang et.al. | 2504.03635 | null |
| 2025-04-04 | Align to Structure: Aligning Large Language Models with Structural Information | Zae Myung Kim et.al. | 2504.03622 | null |
| 2025-04-04 | VISTA-OCR: Towards generative and interactive end to end OCR models | Laziz Hamdi et.al. | 2504.03621 | null |
| 2025-04-04 | Multilingual Retrieval-Augmented Generation for Knowledge-Intensive Task | Leonardo Ranaldi et.al. | 2504.03616 | null |
| 2025-04-04 | AIR: A Systematic Analysis of Annotations, Instructions, and Response Pairs in Preference Dataset | Bingxiang He et.al. | 2504.03612 | null |
| 2025-04-04 | EnrichIndex: Using LLMs to Enrich Retrieval Indices Offline | Peter Baile Chen et.al. | 2504.03598 | null |
| 2025-04-04 | Agentic Knowledgeable Self-awareness | Shuofei Qiao et.al. | 2504.03553 | link |
| 2025-04-04 | Neutralizing the Narrative: AI-Powered Debiasing of Online News Articles | Chen Wei Kuo et.al. | 2504.03520 | null |
| 2025-04-04 | LLMSched: Uncertainty-Aware Workload Scheduling for Compound LLM Applications | Botao Zhu et.al. | 2504.03444 | null |
| 2025-04-04 | Know What You do Not Know: Verbalized Uncertainty Estimation Robustness on Corrupted Images in Vision-Language Models | Mirko Borszukovszki et.al. | 2504.03440 | null |
| 2025-04-03 | STING-BEE: Towards Vision-Language Model for Real-World X-ray Baggage Security Inspection | Divya Velayudhan et.al. | 2504.02823 | null |
| 2025-04-03 | Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models | Mateusz Pach et.al. | 2504.02821 | link |
| 2025-04-03 | Generative Evaluation of Complex Reasoning in Large Language Models | Haowei Lin et.al. | 2504.02810 | link |
| 2025-04-03 | MegaMath: Pushing the Limits of Open Math Corpora | Fan Zhou et.al. | 2504.02807 | link |
| 2025-04-04 | A Survey of Large Language Models in Mental Health Disorder Detection on Social Media | Zhuohan Ge et.al. | 2504.02800 | null |
| 2025-04-03 | A Framework for Robust Cognitive Evaluation of LLMs | Karin de Langis et.al. | 2504.02789 | null |
| 2025-04-03 | From Consumption to Collaboration: Measuring Interaction Patterns to Augment Human Cognition in Open-Ended Tasks | Joshua Holstein et.al. | 2504.02780 | null |
| 2025-04-03 | BT-ACTION: A Test-Driven Approach for Modular Understanding of User Instruction Leveraging Behaviour Trees and LLMs | Alexander Leszczynski et.al. | 2504.02779 | null |
| 2025-04-03 | How Deep Do Large Language Models Internalize Scientific Literature and Citation Practices? | Andres Algaba et.al. | 2504.02767 | null |
| 2025-04-03 | Enhancing LLM Robustness to Perturbed Instructions: An Empirical Study | Aryan Agrawal et.al. | 2504.02733 | link |
| 2025-04-02 | Towards Unified Referring Expression Segmentation Across Omni-Level Visual Target Granularities | Jing Liu et.al. | 2504.01954 | null |
| 2025-04-02 | The LLM Wears Prada: Analysing Gender Bias and Stereotypes through Online Shopping Data | Massimiliano Luca et.al. | 2504.01951 | null |
| 2025-04-02 | OpenCodeReasoning: Advancing Data Distillation for Competitive Coding | Wasi Uddin Ahmad et.al. | 2504.01943 | null |
| 2025-04-02 | Critical Thinking: Which Kinds of Complexity Govern Optimal Reasoning Length? | Celine Lee et.al. | 2504.01935 | null |
| 2025-04-02 | A thorough benchmark of automatic text classification: From traditional approaches to large language models | Washington Cunha et.al. | 2504.01930 | null |
| 2025-04-02 | Gen-C: Populating Virtual Worlds with Generative Crowds | Andreas Panayiotou et.al. | 2504.01924 | null |
| 2025-04-02 | Bridging the Linguistic Divide: A Survey on Leveraging Large Language Models for Machine Translation | Baban Gain et.al. | 2504.01919 | null |
| 2025-04-02 | Advancing AI-Scientist Understanding: Making LLM Think Like a Physicist with Interpretable Reasoning | Yinggan Xu et.al. | 2504.01911 | null |
| 2025-04-02 | GMAI-VL-R1: Harnessing Reinforcement Learning for Multimodal Medical Reasoning | Yanzhou Su et.al. | 2504.01886 | link |
| 2025-04-02 | TransientTables: Evaluating LLMs’ Reasoning on Temporally Evolving Semi-structured Tables | Abhilash Shankarampeta et.al. | 2504.01879 | null |
| 2025-03-31 | Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation | Shengqiong Wu et.al. | 2503.24379 | link |
| 2025-03-31 | Harnessing the Reasoning Economy: A Survey of Efficient Reasoning for Large Language Models | Rui Wang et.al. | 2503.24377 | link |
| 2025-03-31 | Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1 | Yi Chen et.al. | 2503.24376 | link |
| 2025-03-31 | Effectively Controlling Reasoning Models through Thinking Intervention | Tong Wu et.al. | 2503.24370 | null |
| 2025-03-31 | ORAL: Prompting Your Large-Scale LoRAs via Conditional Recurrent Diffusion | Rana Muhammad Shahroz Khan et.al. | 2503.24354 | null |
| 2025-03-31 | BEATS: Bias Evaluation and Assessment Test Suite for Large Language Models | Alok Abhishek et.al. | 2503.24310 | null |
| 2025-03-31 | A Systematic Evaluation of LLM Strategies for Mental Health Text Analysis: Fine-tuning vs. Prompt Engineering vs. RAG | Arshia Kermani et.al. | 2503.24307 | null |
| 2025-03-31 | Rec-R1: Bridging Generative Large Language Models and User-Centric Recommendation Systems via Reinforcement Learning | Jiacheng Lin et.al. | 2503.24289 | link |
| 2025-03-31 | Evaluating and Designing Sparse Autoencoders by Approximating Quasi-Orthogonality | Sewoong Lee et.al. | 2503.24277 | link |
| 2025-03-31 | Enhancing Large Language Models (LLMs) for Telecommunications using Knowledge Graphs and Retrieval-Augmented Generation | Dun Yuan et.al. | 2503.24245 | null |
| 2025-03-28 | Q-Insight: Understanding Image Quality via Visual Reinforcement Learning | Weiqi Li et.al. | 2503.22679 | link |
| 2025-03-28 | QuestBench: Can LLMs ask the right question to acquire information in reasoning tasks? | Belinda Z. Li et.al. | 2503.22674 | link |
| 2025-03-28 | Exploring the Effectiveness of Multi-stage Fine-tuning for Cross-encoder Re-rankers | Francesca Pezzuti et.al. | 2503.22672 | link |
| 2025-03-28 | Unicorn: Text-Only Data Synthesis for Vision Language Model Training | Xiaomin Yu et.al. | 2503.22655 | link |
| 2025-03-28 | Sentiment Classification of Thai Central Bank Press Releases Using Supervised Learning | Stefano Grassi et.al. | 2503.22629 | null |
| 2025-03-28 | Evaluating Multimodal Language Models as Visual Assistants for Visually Impaired Users | Antonia Karamolegkou et.al. | 2503.22610 | null |
| 2025-03-28 | On the Alignment of Post-Publication Reviews & Bibliometric and Altmetric Impact – A Case Study on Expert Statements from the Science Media Center Germany | Dirk Tunger et.al. | 2503.22594 | null |
| 2025-03-28 | LLM-enabled Instance Model Generation | Fengjunjie Pan et.al. | 2503.22587 | null |
| 2025-03-28 | Historical Ink: Exploring Large Language Models for Irony Detection in 19th-Century Spanish | Kevin Cohen et.al. | 2503.22585 | link |
| 2025-03-28 | Beyond Vanilla Fine-Tuning: Leveraging Multistage, Multilingual, and Domain-Specific Methods for Low-Resource Machine Translation | Sarubi Thillainathan et.al. | 2503.22582 | null |
| 2025-03-27 | Video-R1: Reinforcing Video Reasoning in MLLMs | Kaituo Feng et.al. | 2503.21776 | link |
| 2025-03-27 | LOCORE: Image Re-ranking with Long-Context Sequence Modeling | Zilin Xiao et.al. | 2503.21772 | link |
| 2025-03-27 | MemInsight: Autonomous Memory Augmentation for LLM Agents | Rana Salama et.al. | 2503.21760 | null |
| 2025-03-27 | Fwd2Bot: LVLM Visual Token Compression with Double Forward Bottleneck | Adrian Bulat et.al. | 2503.21757 | null |
| 2025-03-27 | LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis | Shitian Zhao et.al. | 2503.21749 | link |
| 2025-03-27 | CTRL-O: Language-Controllable Object-Centric Visual Representation Learning | Aniket Didolkar et.al. | 2503.21747 | null |
| 2025-03-27 | GateLens: A Reasoning-Enhanced LLM Agent for Automotive Software Release Analytics | Arsham Gholamzadeh Khoee et.al. | 2503.21735 | null |
| 2025-03-27 | Effective Skill Unlearning through Intervention and Abstention | Yongce Li et.al. | 2503.21730 | link |
| 2025-03-27 | Collab: Controlled Decoding using Mixture of Agents for LLM Alignment | Souradip Chakraborty et.al. | 2503.21720 | null |
| 2025-03-27 | Enhancing Repository-Level Software Repair via Repository-Aware Knowledge Graphs | Boyang Yang et.al. | 2503.21710 | null |
| 2025-03-26 | Mobile-MMLU: A Mobile Intelligence Language Understanding Benchmark | Sondos Mahmoud Bsharat et.al. | 2503.20786 | link |
| 2025-03-26 | Feature4X: Bridging Any Monocular Video to 4D Agentic AI with Versatile Gaussian Feature Fields | Shijie Zhou et.al. | 2503.20776 | null |
| 2025-03-26 | MATHGLANCE: Multimodal Large Language Models Do Not Know Where to Look in Mathematical Diagrams | Yanpeng Sun et.al. | 2503.20745 | null |
| 2025-03-26 | Dynamic Motion Blending for Versatile Motion Editing | Nan Jiang et.al. | 2503.20724 | null |
| 2025-03-26 | From Annotation to Adaptation: Metrics, Synthetic Data, and Aspect Extraction for Aspect-Based Sentiment Analysis with Large Language Models | Nikita Neveditsin et.al. | 2503.20715 | null |
| 2025-03-27 | Mitigating Low-Level Visual Hallucinations Requires Self-Awareness: Database, Model and Training Strategy | Yinan Sun et.al. | 2503.20673 | null |
| 2025-03-26 | TAMA: A Human-AI Collaborative Thematic Analysis Framework Using Multi-Agent LLMs for Clinical Interviews | Huimin Xu et.al. | 2503.20666 | null |
| 2025-03-26 | Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging | Han Wu et.al. | 2503.20641 | link |
| 2025-03-26 | Collaborative Storytelling and LLM: A Linguistic Analysis of Automatically-Generated Role-Playing Game Sessions | Alessandro Maisto et.al. | 2503.20623 | null |
| 2025-03-26 | What to Retrieve for Effective Retrieval-Augmented Code Generation? An Empirical Study and Beyond | Wenchao Gu et.al. | 2503.20589 | null |
| 2025-03-25 | CoLLM: A Large Language Model for Composed Image Retrieval | Chuong Huynh et.al. | 2503.19910 | link |
| 2025-03-25 | A Multi-Agent Framework Integrating Large Language Models and Generative AI for Accelerated Metamaterial Design | Jie Tian et.al. | 2503.19889 | null |
| 2025-03-25 | CausalRAG: Integrating Causal Graphs into Retrieval-Augmented Generation | Nengbo Wang et.al. | 2503.19878 | null |
| 2025-03-25 | SLA-Awareness for AI-assisted coding | Kishanthan Thangarajah et.al. | 2503.19876 | null |
| 2025-03-25 | Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time Thinking | Xiaoyu Tian et.al. | 2503.19855 | link |
| 2025-03-25 | Towards Online Multi-Modal Social Interaction Understanding | Xinpeng Li et.al. | 2503.19851 | null |
| 2025-03-25 | FALCONEye: Finding Answers and Localizing Content in ONE-hour-long videos with multi-modal LLMs | Carlos Plou et.al. | 2503.19850 | null |
| 2025-03-25 | A Comparative Analysis of Word Segmentation, Part-of-Speech Tagging, and Named Entity Recognition for Historical Chinese Sources, 1900-1950 | Zhao Fang et.al. | 2503.19844 | null |
| 2025-03-25 | SemEval-2025 Task 9: The Food Hazard Detection Challenge | Korbinian Randl et.al. | 2503.19800 | null |
| 2025-03-25 | PAVE: Patching and Adapting Video Large Language Models | Zhuoming Liu et.al. | 2503.19794 | link |
| 2025-03-24 | SlowFast-LLaVA-1.5: A Family of Token-Efficient Video Large Language Models for Long-Form Video Understanding | Mingze Xu et.al. | 2503.18943 | null |
| 2025-03-24 | Video-T1: Test-Time Scaling for Video Generation | Fangfu Liu et.al. | 2503.18942 | link |
| 2025-03-24 | Exploring Training and Inference Scaling Laws in Generative Retrieval | Hongru Cai et.al. | 2503.18941 | null |
| 2025-03-24 | Trajectory Balance with Asynchrony: Decoupling Exploration and Learning for Fast, Scalable LLM Post-Training | Brian R. Bartoldson et.al. | 2503.18929 | link |
| 2025-03-24 | FFN Fusion: Rethinking Sequential Computation in Large Language Models | Akhiad Bercovich et.al. | 2503.18908 | null |
| 2025-03-24 | xKV: Cross-Layer SVD for KV-Cache Compression | Chi-Chih Chang et.al. | 2503.18893 | link |
| 2025-03-24 | AgentDropout: Dynamic Agent Elimination for Token-Efficient and High-Performance LLM-Based Multi-Agent Collaboration | Zhexuan Wang et.al. | 2503.18891 | null |
| 2025-03-24 | Toward building next-generation Geocoding systems: a systematic review | Zhengcong Yin et.al. | 2503.18888 | null |
| 2025-03-24 | I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders | Andrey Galichin et.al. | 2503.18878 | link |
| 2025-03-24 | Reimagining Memory Access for LLM Inference: Compression-Aware Memory Controller Design | Rui Xie et.al. | 2503.18869 | null |
| 2025-03-21 | Dancing with Critiques: Enhancing LLM Reasoning with Stepwise Natural Language Self-Critique | Yansi Li et.al. | 2503.17363 | null |
| 2025-03-21 | OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement | Yihe Deng et.al. | 2503.17352 | link |
| 2025-03-21 | Capturing Individual Human Preferences with Reward Features | André Barreto et.al. | 2503.17338 | null |
| 2025-03-21 | Efficient Intent-Based Filtering for Multi-Party Conversations Using Knowledge Distillation from LLMs | Reem Gody et.al. | 2503.17336 | null |
| 2025-03-21 | CVE-Bench: A Benchmark for AI Agents’ Ability to Exploit Real-World Web Application Vulnerabilities | Yuxuan Zhu et.al. | 2503.17332 | link |
| 2025-03-21 | LLM+MAP: Bimanual Robot Task Planning using Large Language Models and Planning Domain Definition Language | Kun Chu et.al. | 2503.17309 | null |
| 2025-03-21 | Bugdar: AI-Augmented Secure Code Review for GitHub Pull Requests | John Naulty et.al. | 2503.17302 | null |
| 2025-03-21 | CASE – Condition-Aware Sentence Embeddings for Conditional Semantic Textual Similarity Measurement | Gaifan Zhang et.al. | 2503.17279 | null |
| 2025-03-21 | SafeMERGE: Preserving Safety Alignment in Fine-Tuned Large Language Models via Selective Layer-Wise Model Merging | Aladin Djuhera et.al. | 2503.17239 | null |
| 2025-03-21 | FactSelfCheck: Fact-Level Black-Box Hallucination Detection for LLMs | Albert Sawczyn et.al. | 2503.17229 | null |
| 2025-03-20 | Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models | Yang Sui et.al. | 2503.16419 | link |
| 2025-03-20 | The Emperor’s New Clothes in Benchmarking? A Rigorous Examination of Mitigation Strategies for LLM Benchmark Data Contamination | Yifan Sun et.al. | 2503.16402 | null |
| 2025-03-20 | Exploring the Hidden Reasoning Process of Large Language Models by Misleading Them | Guanyu Chen et.al. | 2503.16401 | null |
| 2025-03-20 | Deconstructing Long Chain-of-Thought: A Structured Reasoning Optimization Framework for Long CoT Distillation | Yijia Luo et.al. | 2503.16385 | link |
| 2025-03-20 | LaPIG: Cross-Modal Generation of Paired Thermal and Visible Facial Images | Leyang Wang et.al. | 2503.16376 | null |
| 2025-03-20 | CaKE: Circuit-aware Editing Enables Generalizable Knowledge Learners | Yunzhi Yao et.al. | 2503.16356 | link |
| 2025-03-20 | LLM Braces: Straightening Out LLM Predictions with Relevant Sub-Updates | Ying Shen et.al. | 2503.16334 | null |
| 2025-03-20 | OmniGeo: Towards a Multimodal Large Language Models for Geospatial Artificial Intelligence | Long Yuan et.al. | 2503.16326 | null |
| 2025-03-20 | Bridging Technology and Humanities: Evaluating the Impact of Large Language Models on Social Sciences Research with DeepSeek-R1 | Peiran Gu et.al. | 2503.16304 | null |
| 2025-03-20 | Uni-3DAR: Unified 3D Generation and Understanding via Autoregression on Compressed Spatial Tokens | Shuqi Lu et.al. | 2503.16278 | link |
| 2025-03-19 | SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks | Yifei Zhou et.al. | 2503.15478 | null |
| 2025-03-19 | Cube: A Roblox View of 3D Intelligence | Foundation AI Team et.al. | 2503.15475 | null |
| 2025-03-19 | From 1,000,000 Users to Every User: Scaling Up Personalized Preference for User-level Alignment | Jia-Nan Li et.al. | 2503.15463 | null |
| 2025-03-19 | Visual Position Prompt for MLLM based Visual Grounding | Wei Tang et.al. | 2503.15426 | null |
| 2025-03-19 | Probing the topology of the space of tokens with structured prompts | Michael Robinson et.al. | 2503.15421 | null |
| 2025-03-19 | EfficientLLaVA: Generalizable Auto-Pruning for Large Vision-language Models | Yinan Liang et.al. | 2503.15369 | null |
| 2025-03-19 | SemEval-2025 Task 1: AdMIRe – Advancing Multimodal Idiomaticity Representation | Thomas Pickard et.al. | 2503.15358 | null |
| 2025-03-19 | SPILL: Domain-Adaptive Intent Clustering based on Selection and Pooling with Large Language Models | I-Fan Lin et.al. | 2503.15351 | null |
| 2025-03-19 | TruthLens: A Training-Free Paradigm for DeepFake Detection | Ritabrata Chakraborty et.al. | 2503.15342 | null |
| 2025-03-19 | Uncertainty-Guided Chain-of-Thought for Code Generation with LLMs | Yuqi Zhu et.al. | 2503.15341 | null |
| 2025-03-18 | Aligning Multimodal LLM with Human Preference: A Survey | Tao Yu et.al. | 2503.14504 | link |
| 2025-03-18 | Engineering Scientific Assistants using Interactive Structured Induction of Programs | Shraddha Surana et.al. | 2503.14488 | null |
| 2025-03-18 | Gricean Norms as a Basis for Effective Collaboration | Fardin Saad et.al. | 2503.14484 | null |
| 2025-03-18 | Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM | Xinyu Fang et.al. | 2503.14478 | link |
| 2025-03-18 | EnvBench: A Benchmark for Automated Environment Setup | Aleksandra Eliseeva et.al. | 2503.14443 | link |
| 2025-03-18 | LLM-FE: Automated Feature Engineering for Tabular Data with LLMs as Evolutionary Optimizers | Nikhil Abhyankar et.al. | 2503.14434 | link |
| 2025-03-18 | PLAY2PROMPT: Zero-shot Tool Instruction Optimization for LLM Agents via Tool Play | Wei Fang et.al. | 2503.14432 | null |
| 2025-03-18 | Unifying Text Semantics and Graph Structures for Temporal Text-attributed Graphs with Large Language Models | Siwei Zhang et.al. | 2503.14411 | null |
| 2025-03-18 | Large Language Models for Virtual Human Gesture Selection | Parisa Ghanad Torshizi et.al. | 2503.14408 | null |
| 2025-03-18 | From “Hallucination” to “Suture”: Insights from Language Philosophy to Enhance Large Language Models | Qiantong Wang et.al. | 2503.14392 | null |
| 2025-03-17 | MetaScale: Test-Time Scaling with Evolving Meta-Thoughts | Qin Liu et.al. | 2503.13447 | null |
| 2025-03-17 | Faithfulness of LLM Self-Explanations for Commonsense Tasks: Larger Is Better, and Instruction-Tuning Allows Trade-Offs but Not Pareto Dominance | Noah Y. Siegel et.al. | 2503.13445 | null |
| 2025-03-17 | VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning | Ye Liu et.al. | 2503.13444 | link |
| 2025-03-17 | xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference | Maximilian Beck et.al. | 2503.13427 | null |
| 2025-03-17 | A Comprehensive Survey on Multi-Agent Cooperative Decision-Making: Scenarios, Approaches, Challenges and Perspectives | Weiqiang Jin et.al. | 2503.13415 | null |
| 2025-03-17 | DLPO: Towards a Robust, Efficient, and Generalizable Prompt Optimization Framework from a Deep-Learning Perspective | Dengyun Peng et.al. | 2503.13413 | null |
| 2025-03-17 | Using the Tools of Cognitive Science to Understand Large Language Models at Different Levels of Analysis | Alexander Ku et.al. | 2503.13401 | null |
| 2025-03-17 | MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research | James Burgess et.al. | 2503.13399 | link |
| 2025-03-17 | Scale Efficient Training for Large Datasets | Qing Zhou et.al. | 2503.13385 | link |
| 2025-03-17 | Cream of the Crop: Harvesting Rich, Scalable and Transferable Multi-Modal Data for Instruction Fine-Tuning | Mengyao Lyu et.al. | 2503.13383 | null |
| 2025-03-14 | ASMA-Tune: Unlocking LLMs’ Assembly Code Comprehension via Structural-Semantic Instruction Tuning | Xinyi Wang et.al. | 2503.11617 | null |
| 2025-03-14 | Broaden your SCOPE! Efficient Multi-turn Conversation Planning for LLMs using Semantic Space | Zhiliang Chen et.al. | 2503.11586 | null |
| 2025-03-14 | Synthesizing Access Control Policies using Large Language Models | Adarsh Vatsa et.al. | 2503.11573 | null |
| 2025-03-14 | Implicit Bias-Like Patterns in Reasoning Models | Messi H. J. Lee et.al. | 2503.11572 | null |
| 2025-03-14 | VERIFY: A Benchmark of Visual Explanation and Reasoning for Investigating Multimodal Reasoning Fidelity | Jing Bi et.al. | 2503.11557 | null |
| 2025-03-14 | Potential of large language model-powered nudges for promoting daily water and energy conservation | Zonghan Li et.al. | 2503.11531 | null |
| 2025-03-14 | HiTVideo: Hierarchical Tokenizers for Enhancing Text-to-Video Generation with Autoregressive Large Language Models | Ziqin Zhou et.al. | 2503.11513 | null |
| 2025-03-14 | V-STaR: Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning | Zixu Cheng et.al. | 2503.11495 | link |
| 2025-03-14 | A Review of DeepSeek Models’ Key Innovative Techniques | Chengen Wang et.al. | 2503.11486 | null |
| 2025-03-14 | T2I-FineEval: Fine-Grained Compositional Metric for Text-to-Image Evaluation | Seyed Mohammad Hadi Hosseini et.al. | 2503.11481 | link |
| 2025-03-13 | GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing | Rongyao Fang et.al. | 2503.10639 | link |
| 2025-03-13 | HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model | Jiaming Liu et.al. | 2503.10631 | null |
| 2025-03-13 | UniGoal: Towards Universal Zero-shot Goal-oriented Navigation | Hang Yin et.al. | 2503.10630 | null |
| 2025-03-13 | DriveLMM-o1: A Step-by-Step Reasoning Dataset and Large Multimodal Model for Driving Scenario Understanding | Ayesha Ishaq et.al. | 2503.10621 | link |
| 2025-03-13 | From TOWER to SPIRE: Adding the Speech Modality to a Text-Only LLM | Kshitij Ambilduke et.al. | 2503.10620 | link |
| 2025-03-13 | Siege: Autonomous Multi-Turn Jailbreaking of Large Language Models with Tree Search | Andy Zhou et.al. | 2503.10619 | null |
| 2025-03-13 | Compositional Subspace Representation Fine-tuning for Adaptive Large Language Models | Andy Zhou et.al. | 2503.10617 | null |
| 2025-03-13 | R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization | Yi Yang et.al. | 2503.10615 | link |
| 2025-03-13 | CoSTA$\ast$: Cost-Sensitive Toolpath Agent for Multi-turn Image Editing | Advait Gupta et.al. | 2503.10613 | link |
| 2025-03-13 | TruthPrInt: Mitigating LVLM Object Hallucination Via Latent Truthful-Guided Pre-Intervention | Jinhao Duan et.al. | 2503.10602 | link |
| 2025-03-12 | MoC: Mixtures of Text Chunking Learners for Retrieval-Augmented Generation System | Jihao Zhao et.al. | 2503.09600 | link |
| 2025-03-12 | How to Protect Yourself from 5G Radiation? Investigating LLM Responses to Implicit Misinformation | Ruohao Guo et.al. | 2503.09598 | link |
| 2025-03-12 | SimLingo: Vision-Only Closed-Loop Autonomous Driving with Language-Action Alignment | Katrin Renz et.al. | 2503.09594 | null |
| 2025-03-12 | BIMBA: Selective-Scan Compression for Long-Range Video Question Answering | Md Mohaiminul Islam et.al. | 2503.09590 | link |
| 2025-03-12 | Cost-Optimal Grouped-Query Attention for Long-Context LLMs | Yingfa Chen et.al. | 2503.09579 | link |
| 2025-03-12 | Plan-and-Act: Improving Planning of Agents for Long-Horizon Tasks | Lutfi Eren Erdogan et.al. | 2503.09572 | null |
| 2025-03-12 | Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models | Qiguang Chen et.al. | 2503.09567 | null |
| 2025-03-12 | Large Language Models for Multi-Facility Location Mechanism Design | Nguyen Thach et.al. | 2503.09533 | null |
| 2025-03-12 | Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning | Bowen Jin et.al. | 2503.09516 | link |
| 2025-03-12 | ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning | Ziyu Wan et.al. | 2503.09501 | link |
| 2025-03-11 | Randomness, Not Representation: The Unreliability of Evaluating Cultural Alignment in LLMs | Ariba Khan et.al. | 2503.08688 | link |
| 2025-03-11 | OmniMamba: Efficient and Unified Multimodal Understanding and Generation via State Space Models | Jialv Zou et.al. | 2503.08686 | link |
| 2025-03-11 | Self-Taught Self-Correction for Small Language Models | Viktor Moskvoretskii et.al. | 2503.08681 | link |
| 2025-03-11 | Exploring the Word Sense Disambiguation Capabilities of Large Language Models | Pierpaolo Basile et.al. | 2503.08662 | null |
| 2025-03-11 | LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization | Xianfeng Wu et.al. | 2503.08619 | link |
| 2025-03-11 | EMMOE: A Comprehensive Benchmark for Embodied Mobile Manipulation in Open Environments | Dongping Li et.al. | 2503.08604 | link |
| 2025-03-11 | NSF-SciFy: Mining the NSF Awards Database for Scientific Claims | Delip Rao et.al. | 2503.08600 | null |
| 2025-03-11 | HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding | Shehreen Azad et.al. | 2503.08585 | null |
| 2025-03-11 | RAG-Adapter: A Plug-and-Play RAG-enhanced Framework for Long Video Understanding | Xichen Tan et.al. | 2503.08576 | null |
| 2025-03-11 | DeepReview: Improving LLM-based Paper Review with Human-like Deep Thinking Process | Minjun Zhu et.al. | 2503.08569 | null |
| 2025-03-10 | Robusto-1 Dataset: Comparing Humans and VLMs on real out-of-distribution Autonomous Driving VQA from Peru | Dunant Cusipuma et.al. | 2503.07587 | null |
| 2025-03-10 | Talking to GDELT Through Knowledge Graphs | Audun Myers et.al. | 2503.07584 | null |
| 2025-03-10 | AutoSpatial: Visual-Language Reasoning for Social Robot Navigation through Efficient Spatial Reasoning Learning | Yangzhe Kong et.al. | 2503.07557 | null |
| 2025-03-10 | Junior Software Developers’ Perspectives on Adopting LLMs for Software Engineering: a Systematic Literature Review | Samuel Ferino et.al. | 2503.07556 | link |
| 2025-03-10 | KSOD: Knowledge Supplement for LLMs On Demand | Haoran Li et.al. | 2503.07550 | null |
| 2025-03-10 | Bi-Directional Mental Model Reconciliation for Human-Robot Interaction with Large Language Models | Nina Moorman et.al. | 2503.07547 | null |
| 2025-03-10 | Queueing, Predictions, and LLMs: Challenges and Open Problems | Michael Mitzenmacher et.al. | 2503.07545 | null |
| 2025-03-10 | XIFBench: Evaluating Large Language Models on Multilingual Instruction Following | Zhenyu Li et.al. | 2503.07539 | null |
| 2025-03-10 | TokenButler: Token Importance is Predictable | Yash Akhauri et.al. | 2503.07518 | link |
| 2025-03-10 | Language Models Fail to Introspect About Their Knowledge of Language | Siyuan Song et.al. | 2503.07513 | null |
| 2025-03-10 | LLaVA-RadZ: Can Multimodal Large Language Models Effectively Tackle Zero-shot Radiology Recognition? | Bangyan Li et.al. | 2503.07487 | null |
| 2025-03-10 | GenAIReading: Augmenting Human Cognition with Interactive Digital Textbooks Using Large Language Models and Image Generation Models | Ryugo Morita et.al. | 2503.07463 | null |
| 2025-03-10 | MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning | Xiangru Tang et.al. | 2503.07459 | link |
| 2025-03-10 | LLMs syntactically adapt their language use to their conversational partner | Florian Kandra et.al. | 2503.07457 | null |
| 2025-03-10 | From Idea to Implementation: Evaluating the Influence of Large Language Models in Software Development – An Opinion Paper | Sargam Yadav et.al. | 2503.07450 | null |
| 2025-03-10 | From Text to Visuals: Using LLMs to Generate Math Diagrams with Vector Graphics | Jaewook Lee et.al. | 2503.07429 | null |
| 2025-03-10 | RePO: ReLU-based Preference Optimization | Junkang Wu et.al. | 2503.07426 | link |
| 2025-03-10 | REF-VLM: Triplet-Based Referring Paradigm for Unified Visual Decoding | Yan Tai et.al. | 2503.07413 | link |
| 2025-03-10 | Revisiting Noise in Natural Language Processing for Computational Social Science | Nadav Borenstein et.al. | 2503.07395 | null |
| 2025-03-10 | Process-Supervised LLM Recommenders via Flow-guided Tuning | Chongming Gao et.al. | 2503.07377 | link |
| 2025-03-07 | Understanding the Limits of Lifelong Knowledge Editing in LLMs | Lukas Thede et.al. | 2503.05683 | null |
| 2025-03-07 | A Survey of Large Language Model Empowered Agents for Recommendation and Search: Towards Next-Generation Information Retrieval | Yu Zhang et.al. | 2503.05659 | null |
| 2025-03-07 | Learning LLM Preference over Intra-Dialogue Pairs: A Framework for Utterance-level Understandings | Xuanqing Liu et.al. | 2503.05620 | null |
| 2025-03-07 | A Survey on Sparse Autoencoders: Interpreting the Internal Mechanisms of Large Language Models | Dong Shu et.al. | 2503.05613 | null |
| 2025-03-07 | R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning | Huatong Song et.al. | 2503.05592 | null |
| 2025-03-07 | Evaluating open-source Large Language Models for automated fact-checking | Nicolò Fontana et.al. | 2503.05565 | null |
| 2025-03-07 | Revitalizing Saturated Benchmarks: A Weighted Metric Approach for Differentiating Large Language Model Performance | Bryan Etzine et.al. | 2503.05551 | null |
| 2025-03-07 | Leveraging Approximate Caching for Faster Retrieval-Augmented Generation | Shai Bergman et.al. | 2503.05530 | null |
| 2025-03-07 | PoSSUM: A Protocol for Surveying Social-media Users with Multimodal LLMs | Roberto Cerina et.al. | 2503.05529 | null |
| 2025-03-07 | Cognitive Bias Detection Using Advanced Prompt Engineering | Frederic Lemieux et.al. | 2503.05516 | null |
| 2025-03-06 | L$^2$M: Mutual Information Scaling Law for Long-Context Language Modeling | Zhuo Chen et.al. | 2503.04725 | link |
| 2025-03-06 | Shifting Long-Context LLMs Research from Input to Output | Yuhao Wu et.al. | 2503.04723 | null |
| 2025-03-06 | Enough Coin Flips Can Make LLMs Act Bayesian | Ritwik Gupta et.al. | 2503.04722 | null |
| 2025-03-06 | Predictable Scale: Part I – Optimal Hyperparameter Scaling Law in Large Language Model Pretraining | Houyi Li et.al. | 2503.04715 | link |
| 2025-03-06 | Universality of Layer-Level Entropy-Weighted Quantization Beyond Model Architecture and Size | Alireza Behtash et.al. | 2503.04704 | null |
| 2025-03-06 | UIPE: Enhancing LLM Unlearning by Removing Knowledge Related to Forgetting Targets | Wenyu Wang et.al. | 2503.04693 | null |
| 2025-03-06 | Quantifying the Reasoning Abilities of LLMs on Real-world Clinical Cases | Pengcheng Qiu et.al. | 2503.04691 | link |
| 2025-03-06 | LLM-guided Plan and Retrieval: A Strategic Alignment for Interpretable User Satisfaction Estimation in Dialogue | Sangyeop Kim et.al. | 2503.04675 | null |
| 2025-03-06 | RadIR: A Scalable Framework for Multi-Grained Medical Image Retrieval via Radiology Report Mining | Tengfei Zhang et.al. | 2503.04653 | null |
| 2025-03-06 | Implicit Cross-Lingual Rewarding for Efficient Multilingual Preference Alignment | Wen Yang et.al. | 2503.04647 | null |
| 2025-03-05 | The MASK Benchmark: Disentangling Honesty From Accuracy in AI Systems | Richard Ren et.al. | 2503.03750 | null |
| 2025-03-05 | Process-based Self-Rewarding Language Models | Shimao Zhang et.al. | 2503.03746 | null |
| 2025-03-05 | Towards Understanding Distilled Reasoning Models: A Representational Approach | David D. Baek et.al. | 2503.03730 | null |
| 2025-03-05 | Improving LLM Safety Alignment with Dual-Objective Optimization | Xuandong Zhao et.al. | 2503.03710 | link |
| 2025-03-05 | Effective LLM Knowledge Learning via Model Generalization | Mingkang Zhu et.al. | 2503.03705 | null |
| 2025-03-05 | A Practical Memory Injection Attack against LLM Agents | Shen Dong et.al. | 2503.03704 | null |
| 2025-03-05 | Developing and Utilizing a Large-Scale Cantonese Dataset for Multi-Tasking in Large Language Models | Jiyue Jiang et.al. | 2503.03702 | null |
| 2025-03-05 | Addressing Overprescribing Challenges: Fine-Tuning Large Language Models for Medication Recommendation Tasks | Zihao Zhao et.al. | 2503.03687 | null |
| 2025-03-05 | Attentive Reasoning Queries: A Systematic Method for Optimizing Instruction-Following in Large Language Models | Bar Karov et.al. | 2503.03669 | null |
| 2025-03-05 | Analogical Reasoning Inside Large Language Models: Concept Vectors and the Limits of Abstraction | Gustaw Opiełka et.al. | 2503.03666 | null |
| 2025-03-04 | Wikipedia in the Era of LLMs: Evolution and Risks | Siming Huang et.al. | 2503.02879 | null |
| 2025-03-04 | The First Few Tokens Are All You Need: An Efficient and Effective Unsupervised Prefix Fine-Tuning Method for Reasoning Models | Ke Ji et.al. | 2503.02875 | null |
| 2025-03-04 | Prompting Generative AI with Interaction-Augmented Instructions | Leixian Shen et.al. | 2503.02874 | null |
| 2025-03-04 | FairSense-AI: Responsible AI Meets Sustainability | Shaina Raza et.al. | 2503.02865 | null |
| 2025-03-04 | Calibrating LLM Confidence with Semantic Steering: A Multi-Prompt Aggregation Framework | Ziang Zhou et.al. | 2503.02863 | null |
| 2025-03-04 | Privacy and Accuracy-Aware AI/ML Model Deduplication | Hong Guan et.al. | 2503.02862 | null |
| 2025-03-04 | Shakespearean Sparks: The Dance of Hallucination and Creativity in LLMs’ Decoding Layers | Zicong He et.al. | 2503.02851 | null |
| 2025-03-04 | Mask-DPO: Generalizable Fine-grained Factuality Alignment of LLMs | Yuzhe Gu et.al. | 2503.02846 | null |
| 2025-03-04 | AlignDistil: Token-Level Language Model Alignment as Adaptive Policy Distillation | Songming Zhang et.al. | 2503.02832 | null |
| 2025-03-04 | Q-Filters: Leveraging QK Geometry for Efficient KV Cache Compression | Nathan Godey et.al. | 2503.02812 | null |
| 2025-02-28 | LLM Post-Training: A Deep Dive into Reasoning Large Language Models | Komal Kumar et.al. | 2502.21321 | null |
| 2025-02-28 | FANformer: Improving Large Language Models Through Effective Periodicity Modeling | Yihong Dong et.al. | 2502.21309 | null |
| 2025-02-28 | Contextualizing biological perturbation experiments through language | Menghua Wu et.al. | 2502.21290 | null |
| 2025-02-28 | Adaptive Keyframe Sampling for Long Video Understanding | Xi Tang et.al. | 2502.21271 | null |
| 2025-02-28 | Token-level Ensembling of Models with Different Vocabularies | Rachel Wicks et.al. | 2502.21265 | null |
| 2025-02-28 | RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete | Yuheng Ji et.al. | 2502.21257 | null |
| 2025-02-28 | Semantic Volume: Quantifying and Detecting both External and Internal Uncertainty in LLMs | Xiaomin Li et.al. | 2502.21239 | null |
| 2025-02-28 | Transforming Tuberculosis Care: Optimizing Large Language Models For Enhanced Clinician-Patient Communication | Daniil Filienko et.al. | 2502.21236 | null |
| 2025-02-28 | ByteScale: Efficient Scaling of LLM Training with a 2048K Context Length on More Than 12,000 GPUs | Hao Ge et.al. | 2502.21231 | null |
| 2025-03-03 | ECLeKTic: a Novel Challenge Set for Evaluation of Cross-Lingual Knowledge Transfer | Omer Goldman et.al. | 2502.21228 | null |
| 2025-02-27 | R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts | Zhongyang Li et.al. | 2502.20395 | null |
| 2025-02-27 | Why Are Web AI Agents More Vulnerable Than Standalone LLMs? A Security Analysis | Jeffrey Yang Fan Chiang et.al. | 2502.20383 | null |
| 2025-02-27 | Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers | Shalev Lifshitz et.al. | 2502.20379 | null |
| 2025-02-27 | PhantomWiki: On-Demand Datasets for Reasoning and Retrieval Evaluation | Albert Gong et.al. | 2502.20377 | null |
| 2025-02-27 | Bridging Legal Knowledge and AI: Retrieval-Augmented Generation with Vector Stores, Knowledge Graphs, and Hierarchical Non-negative Matrix Factorization | Ryan C. Barron et.al. | 2502.20364 | null |
| 2025-02-27 | Bridging the Creativity Understanding Gap: Small-Scale Human Alignment Enables Expert-Level Humor Ranking in LLMs | Kuan Lok Zhou et.al. | 2502.20356 | null |
| 2025-02-27 | KEDRec-LM: A Knowledge-distilled Explainable Drug Recommendation Large Language Model | Kai Zhang et.al. | 2502.20350 | null |
| 2025-02-27 | Sparse Auto-Encoder Interprets Linguistic Features in Large Language Models | Yi Jing et.al. | 2502.20344 | null |
| 2025-02-27 | Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners | Daniele Paliotta et.al. | 2502.20339 | null |
| 2025-02-27 | Expertise Is What We Want | Alan Ashworth et.al. | 2502.20335 | null |
| 2025-02-26 | Norm Growth and Stability Challenges in Localized Sequential Knowledge Editing | Akshat Gupta et.al. | 2502.19416 | null |
| 2025-02-26 | Code to Think, Think to Code: A Survey on Code-Enhanced Reasoning and Reasoning-Driven Code Intelligence in LLMs | Dayu Yang et.al. | 2502.19411 | null |
| 2025-02-26 | Less or More: Towards Glanceable Explanations for LLM Recommendations Using Ultra-Small Devices | Xinru Wang et.al. | 2502.19410 | null |
| 2025-02-26 | ImageChain: Advancing Sequential Image-to-Text Reasoning in Multimodal Large Language Models | Danae Sánchez Villegas et.al. | 2502.19409 | null |
| 2025-02-26 | Learning Code-Edit Embedding to Model Student Debugging Behavior | Hasnain Heickal et.al. | 2502.19407 | null |
| 2025-02-26 | General Reasoning Requires Learning to Reason from the Get-go | Seungwook Han et.al. | 2502.19402 | null |
| 2025-02-26 | TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding | Max Ku et.al. | 2502.19400 | null |
| 2025-02-26 | Residual Speech Embeddings for Tone Classification: Removing Linguistic Content to Enhance Paralinguistic Analysis | Hamdan Al Ahbabi et.al. | 2502.19387 | null |
| 2025-02-26 | DataMan: Data Manager for Pre-training Large Language Models | Ru Peng et.al. | 2502.19363 | null |
| 2025-02-26 | Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning? | Yancheng He et.al. | 2502.19361 | null |
| 2025-02-25 | DRAMA: Diverse Augmentation from Large Language Models to Smaller Dense Retrievers | Xueguang Ma et.al. | 2502.18460 | null |
| 2025-02-25 | LLM-Based Design Pattern Detection | Christian Schindler et.al. | 2502.18458 | null |
| 2025-02-25 | FRIDA to the Rescue! Analyzing Synthetic Data Effectiveness in Object-Based Common Sense Reasoning for Disaster Response | Mollie Shichman et.al. | 2502.18452 | null |
| 2025-02-25 | SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution | Yuxiang Wei et.al. | 2502.18449 | null |
| 2025-02-25 | MAPoRL: Multi-Agent Post-Co-Training for Collaborative Large Language Models with Reinforcement Learning | Chanwoo Park et.al. | 2502.18439 | null |
| 2025-02-25 | TextGames: Learning to Self-Play Text-Based Puzzle Games via Language Model Reasoning | Frederikus Hudi et.al. | 2502.18431 | null |
| 2025-02-25 | OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference | Xiangyu Zhao et.al. | 2502.18411 | null |
| 2025-02-25 | Monte Carlo Temperature: a robust sampling strategy for LLM’s uncertainty quantification methods | Nicola Cecere et.al. | 2502.18389 | null |
| 2025-02-25 | How Far are LLMs from Real Search? A Comprehensive Study on Efficiency, Completeness, and Inherent Capabilities | Minhua Lin et.al. | 2502.18387 | null |
| 2025-02-25 | MindMem: Multimodal for Predicting Advertisement Memorability Using LLMs and Deep Learning | Sepehr Asgarian et.al. | 2502.18371 | null |
| 2025-02-24 | Introducing Visual Perception Token into Multimodal Large Language Model | Runpeng Yu et.al. | 2502.17425 | link |
| 2025-02-24 | MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs | Jiarui Zhang et.al. | 2502.17422 | link |
| 2025-02-24 | LongSpec: Long-Context Speculative Decoding with Efficient Drafting and Verification | Penghui Yang et.al. | 2502.17421 | link |
| 2025-02-24 | The Geometry of Refusal in Large Language Models: Concept Cones and Representational Independence | Tom Wollschläger et.al. | 2502.17420 | null |
| 2025-02-24 | From System 1 to System 2: A Survey of Reasoning Large Language Models | Zhong-Zhi Li et.al. | 2502.17419 | link |
| 2025-02-24 | Reasoning with Latent Thoughts: On the Power of Looped Transformers | Nikunj Saunshi et.al. | 2502.17416 | null |
| 2025-02-24 | COSMOS: A Hybrid Adaptive Optimizer for Memory-Efficient Training of LLMs | Liming Liu et.al. | 2502.17410 | link |
| 2025-02-24 | Large Language Models are Powerful EHR Encoders | Stefan Hegselmann et.al. | 2502.17403 | null |
| 2025-02-24 | DIS-CO: Discovering Copyrighted Content in VLMs Training Data | André V. Duarte et.al. | 2502.17358 | link |
| 2025-02-24 | On Relation-Specific Neurons in Large Language Models | Yihong Liu et.al. | 2502.17355 | link |
| 2025-02-21 | ELIP: Enhanced Visual-Language Foundation Models for Image Retrieval | Guanqi Zhan et.al. | 2502.15682 | null |
| 2025-02-21 | Privacy Ripple Effects from Adding or Removing Personal Information in Language Model Training | Jaydeep Borkar et.al. | 2502.15680 | null |
| 2025-02-21 | FLEKE: Federated Locate-then-Edit Knowledge Editing | Zongkai Zhao et.al. | 2502.15677 | null |
| 2025-02-21 | AutoToM: Automated Bayesian Inverse Planning and Model Discovery for Open-ended Theory of Mind | Zhining Zhang et.al. | 2502.15676 | null |
| 2025-02-21 | Almost AI, Almost Human: The Challenge of Detecting AI-Polished Writing | Shoumik Saha et.al. | 2502.15666 | null |
| 2025-02-21 | Machine-generated text detection prevents language model collapse | George Drayson et.al. | 2502.15654 | null |
| 2025-02-21 | Empowering LLMs with Logical Reasoning: A Comprehensive Survey | Fengxiang Cheng et.al. | 2502.15652 | null |
| 2025-02-21 | Steering into New Embedding Spaces: Analyzing Cross-Lingual Alignment Induced by Model Interventions in Multilingual Language Models | Anirudh Sundar et.al. | 2502.15639 | null |
| 2025-02-21 | The Relationship Between Reasoning and Performance in Large Language Models – o3 (mini) Thinks Harder, Not Longer | Marthe Ballon et.al. | 2502.15631 | null |
| 2025-02-21 | Probe Pruning: Accelerating LLMs through Dynamic Pruning via Model-Probing | Qi Le et.al. | 2502.15618 | null |
| 2025-02-20 | LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention | Shang Yang et.al. | 2502.14866 | null |
| 2025-02-20 | Aligning LLMs to Ask Good Questions A Case Study in Clinical Reasoning | Shuyue Stella Li et.al. | 2502.14860 | null |
| 2025-02-20 | FR-Spec: Accelerating Large-Vocabulary Language Models via Frequency-Ranked Speculative Sampling | Weilin Zhao et.al. | 2502.14856 | null |
| 2025-02-20 | Prompt-to-Leaderboard | Evan Frick et.al. | 2502.14855 | null |
| 2025-02-20 | GATE: Graph-based Adaptive Tool Evolution Across Diverse Tasks | Jianwen Luo et.al. | 2502.14848 | null |
| 2025-02-20 | Red-Teaming LLM Multi-Agent Systems via Communication Attacks | Pengfei He et.al. | 2502.14847 | null |
| 2025-02-20 | Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation | Yue Yang et.al. | 2502.14846 | null |
| 2025-02-20 | Revealing and Mitigating Over-Attention in Knowledge Editing | Pinzheng Wang et.al. | 2502.14838 | null |
| 2025-02-20 | Middle-Layer Representation Alignment for Cross-Lingual Transfer in Fine-Tuned LLMs | Danni Liu et.al. | 2502.14830 | null |
| 2025-02-20 | Exploring Advanced Techniques for Visual Question Answering: A Comprehensive Comparison | Aiswarya Baby et.al. | 2502.14827 | null |
| 2025-02-19 | Where’s the Bug? Attention Probing for Scalable Fault Localization | Adam Stein et.al. | 2502.13966 | null |
| 2025-02-19 | Autellix: An Efficient Serving Engine for LLM Agents as General Programs | Michael Luo et.al. | 2502.13965 | null |
| 2025-02-19 | MuDAF: Long-Context Multi-Document Attention Focusing through Contrastive Learning on Attention Heads | Weihao Liu et.al. | 2502.13963 | null |
| 2025-02-19 | Is That Your Final Answer? Test-Time Scaling Improves Selective Question Answering | William Jurayj et.al. | 2502.13962 | null |
| 2025-02-19 | LIDDIA: Language-based Intelligent Drug Discovery Agent | Reza Averly et.al. | 2502.13959 | null |
| 2025-02-19 | Neurosymbolic artificial intelligence via large language models and coherence-driven inference | Steve Huntsman et.al. | 2502.13953 | null |
| 2025-02-19 | Why Safeguarded Ships Run Aground? Aligned Large Language Models’ Safety Mechanisms Tend to Be Anchored in The Template Region | Chak Tou Leong et.al. | 2502.13946 | null |
| 2025-02-19 | A Chain-of-Thought Subspace Meta-Learning for Few-shot Image Captioning with Large Vision and Language Models | Hao Huang et.al. | 2502.13942 | null |
| 2025-02-19 | LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization | Guanzheng Chen et.al. | 2502.13922 | link |
| 2025-02-19 | Exploring Code Language Models for Automated HLS-based Hardware Generation: Benchmark, Infrastructure and Analysis | Jiahao Gai et.al. | 2502.13921 | null |
| 2025-02-18 | Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization | Shuo Xing et.al. | 2502.13146 | null |
| 2025-02-18 | Multimodal Mamba: Decoder-only Multimodal State Space Model via Quadratic to Linear Distillation | Bencheng Liao et.al. | 2502.13145 | null |
| 2025-02-18 | UniGuardian: A Unified Defense for Detecting Prompt Injection, Backdoor Attacks and Adversarial Attacks in Large Language Models | Huawei Lin et.al. | 2502.13141 | null |
| 2025-02-18 | Towards Quantum Tensor Decomposition in Biomedical Applications | Myson Burch et.al. | 2502.13140 | null |
| 2025-02-18 | AIDE: AI-Driven Exploration in the Space of Code | Zhengyao Jiang et.al. | 2502.13138 | null |
| 2025-02-18 | Theorem Prover as a Judge for Synthetic Data Generation | Joshua Ong Jun Leang et.al. | 2502.13137 | null |
| 2025-02-18 | Learning to Defer for Causal Discovery with Imperfect Experts | Oscar Clivio et.al. | 2502.13132 | null |
| 2025-02-18 | Facilitating Long Context Understanding via Supervised Chain-of-Thought Reasoning | Jingyang Lin et.al. | 2502.13127 | null |
| 2025-02-18 | RuozhiBench: Evaluating LLMs with Logical Fallacies and Misleading Premises | Zenan Zhai et.al. | 2502.13125 | null |
| 2025-02-18 | Adapting Psycholinguistic Research for LLMs: Gender-inclusive Language in a Coreference Context | Marion Bartl et.al. | 2502.13120 | null |
| 2025-02-17 | Idiosyncrasies in Large Language Models | Mingjie Sun et.al. | 2502.12150 | null |
| 2025-02-17 | HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation | Ling Yang et.al. | 2502.12148 | null |
| 2025-02-17 | Fast or Better? Balancing Accuracy and Cost in Retrieval-Augmented Generation with Flexible User Control | Jinyan Su et.al. | 2502.12145 | null |
| 2025-02-17 | Small Models Struggle to Learn from Strong Reasoners | Yuetai Li et.al. | 2502.12143 | null |
| 2025-02-17 | SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs | Yige Xu et.al. | 2502.12134 | null |
| 2025-02-17 | Transformer Dynamics: A neuroscientific approach to interpretability of large language models | Jesseba Fernando et.al. | 2502.12131 | null |
| 2025-02-17 | Scaling Autonomous Agents via Automatic Reward Modeling And Planning | Zhenfang Chen et.al. | 2502.12130 | null |
| 2025-02-17 | Minimal Ranks, Maximum Confidence: Parameter-efficient Uncertainty Quantification for LoRA | Patryk Marszałek et.al. | 2502.12122 | null |
| 2025-02-17 | LLMs on the Line: Data Determines Loss-to-Loss Scaling Laws | Prasanna Mayilvahanan et.al. | 2502.12120 | null |
| 2025-02-17 | PRISM: Self-Pruning Intrinsic Selection Method for Training-Free Multimodal Data Selection | Jinhe Bi et.al. | 2502.12119 | null |
| 2025-02-14 | MM-RLHF: The Next Step Forward in Multimodal LLM Alignment | Yi-Fan Zhang et.al. | 2502.10391 | link |
| 2025-02-14 | Aspect-Oriented Summarization for Psychiatric Short-Term Readmission Prediction | WonJin Yoon et.al. | 2502.10388 | null |
| 2025-02-14 | Enhancing Multilingual LLM Pretraining with Model-Based Data Selection | Bettina Messmer et.al. | 2502.10361 | null |
| 2025-02-14 | Organize the Web: Constructing Domains Enhances Pre-Training Data Curation | Alexander Wettig et.al. | 2502.10341 | null |
| 2025-02-14 | Evaluating the Meta- and Object-Level Reasoning of Large Language Models for Question Answering | Nick Ferguson et.al. | 2502.10338 | null |
| 2025-02-14 | LLM-Powered Preference Elicitation in Combinatorial Assignment | Ermis Soumalias et.al. | 2502.10308 | null |
| 2025-02-14 | Open-Source AI-Powered Optimization in Scalene: Advancing Python Performance Profiling with DeepSeek-R1 and LLaMA 3.2 | Saem Hasan et.al. | 2502.10299 | null |
| 2025-02-14 | Are Large Language Models the future crowd workers of Linguistics? | Iris Ferrazzo et.al. | 2502.10266 | null |
| 2025-02-14 | Large Language Models and Synthetic Data for Monitoring Dataset Mentions in Research Papers | Aivin V. Solatorio et.al. | 2502.10263 | null |
| 2025-02-14 | VisCon-100K: Leveraging Contextual Web Data for Fine-tuning Vision Language Models | Gokul Karthik Kumar et.al. | 2502.10250 | null |
| 2025-02-13 | Theoretical Benefit and Limitation of Diffusion Language Model | Guhao Feng et.al. | 2502.09622 | null |
| 2025-02-13 | MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency | Dongzhi Jiang et.al. | 2502.09621 | link |
| 2025-02-13 | Exploring the Potential of Encoder-free Architectures in 3D LMMs | Yiwen Tang et.al. | 2502.09620 | link |
| 2025-02-13 | Human-LLM Coevolution: Evidence from Academic Writing | Mingmeng Geng et.al. | 2502.09606 | null |
| 2025-02-13 | SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models | Yung-Sung Chuang et.al. | 2502.09604 | link |
| 2025-02-13 | GAIA: A Global, Multi-modal, Multi-scale Vision-Language Dataset for Remote Sensing Image Analysis | Angelos Zavras et.al. | 2502.09598 | link |
| 2025-02-13 | Do LLMs Recognize Your Preferences? Evaluating Personalized Preference Following in LLMs | Siyan Zhao et.al. | 2502.09597 | link |
| 2025-02-13 | KIMAs: A Configurable Knowledge Integrated Multi-Agent System | Zitao Li et.al. | 2502.09596 | null |
| 2025-02-13 | Logical forms complement probability in understanding language model (and human) performance | Yixuan Wang et.al. | 2502.09589 | null |
| 2025-02-13 | Polymind: Parallel Visual Diagramming with Large Language Models to Support Prewriting Through Microtasks | Qian Wan et.al. | 2502.09577 | null |
| 2025-02-12 | Examining Multilingual Embedding Models Cross-Lingually Through LLM-Generated Adversarial Examples | Andrianos Michail et.al. | 2502.08638 | null |
| 2025-02-12 | Ensemble based approach to quantifying uncertainty of LLM based classifications | Srijith Rajamohan et.al. | 2502.08631 | null |
| 2025-02-12 | Commercial LLM Agents Are Already Vulnerable to Simple Yet Dangerous Attacks | Ang Li et.al. | 2502.08586 | null |
| 2025-02-12 | QA-Expand: Multi-Question Answer Generation for Enhanced Query Expansion in Information Retrieval | Wonduk Seo et.al. | 2502.08557 | null |
| 2025-02-12 | Fostering Appropriate Reliance on Large Language Models: The Role of Explanations, Sources, and Inconsistencies | Sunnie S. Y. Kim et.al. | 2502.08554 | null |
| 2025-02-12 | LLMs can implicitly learn from mistakes in-context | Lisa Alazraki et.al. | 2502.08550 | null |
| 2025-02-12 | LLM Pretraining with Continuous Concepts | Jihoon Tack et.al. | 2502.08524 | link |
| 2025-02-12 | The Paradox of Stochasticity: Limited Creativity and Computational Decoupling in Temperature-Varied LLM Outputs of Structured Fictional Data | Evgenii Evstafev et.al. | 2502.08515 | null |
| 2025-02-12 | Faithful, Unfaithful or Ambiguous? Multi-Agent Debate with Initial Stance for Summary Evaluation | Mahnaz Koupaee et.al. | 2502.08514 | null |
| 2025-02-12 | Measuring Diversity in Synthetic Datasets | Yuchang Zhu et.al. | 2502.08512 | null |
| 2025-02-11 | DarwinLM: Evolutionary Structured Pruning of Large Language Models | Shengkun Tang et.al. | 2502.07780 | link |
| 2025-02-11 | Auditing Prompt Caching in Language Model APIs | Chenchen Gu et.al. | 2502.07776 | link |
| 2025-02-11 | Automatic Robot Task Planning by Integrating Large Language Model with Genetic Programming | Azizjon Kobilov et.al. | 2502.07772 | null |
| 2025-02-11 | Great Power Brings Great Responsibility: Personalizing Conversational AI for Diverse Problem-Solvers | Italo Santos et.al. | 2502.07763 | null |
| 2025-02-11 | Scalable Fingerprinting of Large Language Models | Anshul Nasery et.al. | 2502.07760 | link |
| 2025-02-11 | Towards Efficient Optimizer Design for LLM via Structured Fisher Approximation with a Low-Rank Extension | Wenbo Gong et.al. | 2502.07752 | null |
| 2025-02-11 | WHODUNIT: Evaluation benchmark for culprit detection in mystery stories | Kshitij Gupta et.al. | 2502.07747 | link |
| 2025-02-11 | The Economics of Large Language Models: Token Allocation, Fine-Tuning, and Optimal Pricing | Dirk Bergemann et.al. | 2502.07736 | null |
| 2025-02-11 | Economics of Sourcing Human Data | Sebastin Santy et.al. | 2502.07732 | null |
| 2025-02-11 | Verifying LLM-Generated Code in the Context of Software Verification with Ada/SPARK | Marcos Cramer et.al. | 2502.07728 | null |
| 2025-02-10 | Rationalization Models for Text-to-SQL | Gaetano Rossiello et.al. | 2502.06759 | null |
| 2025-02-10 | Gradient Multi-Normalization for Stateless and Scalable LLM Training | Meyer Scetbon et.al. | 2502.06742 | null |
| 2025-02-10 | VersaPRM: Multi-Domain Process Reward Model via Synthetic Reasoning Data | Thomas Zeng et.al. | 2502.06737 | null |
| 2025-02-10 | Dynamic Loss-Based Sample Reweighting for Improved Large Language Model Pretraining | Daouda Sow et.al. | 2502.06733 | null |
| 2025-02-10 | Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling | Runze Liu et.al. | 2502.06703 | link |
| 2025-02-10 | Boosting Self-Efficacy and Performance of Large Language Models via Verbal Efficacy Stimulations | Rui Chen et.al. | 2502.06669 | null |
| 2025-02-10 | Automatic Evaluation of Healthcare LLMs Beyond Question-Answering | Anna Arias-Duart et.al. | 2502.06666 | null |
| 2025-02-10 | On the Limitations of Combining Sentiment Analysis Tools in a Cross-Platform Setting | Martin Obaidi et.al. | 2502.06665 | null |
| 2025-02-10 | EfficientLLM: Scalable Pruning-Aware Pretraining for Architecture-Agnostic Edge Language Models | Xingrun Xing et.al. | 2502.06663 | link |
| 2025-02-10 | Unbiased Evaluation of Large Language Models from a Causal Perspective | Meilin Chen et.al. | 2502.06655 | null |
| 2025-02-07 | Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy | Yunhang Shen et.al. | 2502.05177 | link |
| 2025-02-07 | NoLiMa: Long-Context Evaluation Beyond Literal Matching | Ali Modarressi et.al. | 2502.05167 | link |
| 2025-02-07 | DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails | Yihe Deng et.al. | 2502.05163 | link |
| 2025-02-07 | A Lightweight Method to Disrupt Memorized Sequences in LLM | Parjanya Prajakta Prashant et.al. | 2502.05159 | null |
| 2025-02-07 | Hummingbird: High Fidelity Image Generation via Multimodal Context Alignment | Minh-Quan Le et.al. | 2502.05153 | link |
| 2025-02-07 | Transforming Science with Large Language Models: A Survey on AI-assisted Scientific Discovery, Experimentation, Content Generation, and Evaluation | Steffen Eger et.al. | 2502.05151 | link |
| 2025-02-07 | CodeSCM: Causal Analysis for Multi-Modal Code Generation | Mukur Gupta et.al. | 2502.05150 | null |
| 2025-02-07 | An Annotated Reading of ‘The Singer of Tales’ in the LLM Era | Kush R. Varshney et.al. | 2502.05148 | null |
| 2025-02-07 | Refining Integration-by-Parts Reduction of Feynman Integrals with Machine Learning | Matt von Hippel et.al. | 2502.05121 | null |
| 2025-02-07 | Flexible and Efficient Grammar-Constrained Decoding | Kanghee Park et.al. | 2502.05111 | null |
| 2025-02-06 | Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment | Zuyan Liu et.al. | 2502.04328 | link |
| 2025-02-06 | Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions | Yik Siu Chan et.al. | 2502.04322 | link |
| 2025-02-06 | ChamaleonLLM: Batch-Aware Dynamic Low-Rank Adaptation via Inference-Time Clusters | Kamer Ali Yuksel et.al. | 2502.04315 | null |
| 2025-02-06 | ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization | Yinjie Wang et.al. | 2502.04306 | link |
| 2025-02-06 | Beyond Prompt Content: Enhancing LLM Performance via Content-Format Integrated Prompt Optimization | Yuanye Liu et.al. | 2502.04295 | link |
| 2025-02-06 | PILAF: Optimal Human Preference Sampling for Reward Modeling | Yunzhen Feng et.al. | 2502.04270 | null |
| 2025-02-06 | How does a Multilingual LM Handle Multiple Languages? | Santhosh Kakarla et.al. | 2502.04269 | null |
| 2025-02-06 | Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion | Marco Mistretta et.al. | 2502.04263 | link |
| 2025-02-06 | TriNER: A Series of Named Entity Recognition Models For Hindi, Bengali & Marathi | Mohammed Amaan Dhamaskar et.al. | 2502.04245 | null |
| 2025-02-06 | MAGA: MAssive Genre-Audience Reformulation to Pretraining Corpus Expansion | Xintong Hao et.al. | 2502.04235 | null |
| 2025-02-05 | Do Large Language Model Benchmarks Test Reliability? | Joshua Vendrow et.al. | 2502.03461 | link |
| 2025-02-05 | Adapt-Pruner: Adaptive Structural Pruning for Efficient Small Language Model Training | Boyao Wang et.al. | 2502.03460 | link |
| 2025-02-05 | A Schema-Guided Reason-while-Retrieve framework for Reasoning on Scene Graphs with Large-Language-Models (LLMs) | Yiye Chen et.al. | 2502.03450 | null |
| 2025-02-05 | BFS-Prover: Scalable Best-First Tree Search for LLM-based Automatic Theorem Proving | Ran Xin et.al. | 2502.03438 | null |
| 2025-02-05 | On Fairness of Unified Multimodal Large Language Model for Image Generation | Ming Liu et.al. | 2502.03429 | null |
| 2025-02-05 | Harnessing Large Language Models for Curated Code Reviews | Oussama Ben Sghaier et.al. | 2502.03425 | null |
| 2025-02-05 | Investigating Corporate Social Responsibility Initiatives: Examining the case of corporate Covid-19 response | Meheli Basu et.al. | 2502.03421 | null |
| 2025-02-05 | Think or Step-by-Step? UnZIPping the Black Box in Zero-Shot Prompts | Nikta Gohari Sadr et.al. | 2502.03418 | null |
| 2025-02-05 | SPRI: Aligning Large Language Models with Context-Situated Principles | Hongli Zhan et.al. | 2502.03397 | link |
| 2025-02-05 | LIMO: Less is More for Reasoning | Yixin Ye et.al. | 2502.03387 | link |
| 2025-02-04 | COCONut-PanCap: Joint Panoptic Segmentation and Grounded Captions for Fine-Grained Understanding and Generation | Xueqing Deng et.al. | 2502.02589 | link |
| 2025-02-04 | A comparison of translation performance between DeepL and Supertext | Alex Flückiger et.al. | 2502.02577 | link |
| 2025-02-04 | Are Language Models Up to Sequential Optimization Problems? From Evaluation to a Hegelian-Inspired Enhancement | Soheil Abbasloo et.al. | 2502.02573 | null |
| 2025-02-04 | Learning the RoPEs: Better 2D and 3D Position Encodings with STRING | Connor Schenck et.al. | 2502.02562 | null |
| 2025-02-04 | LLMs for Generation of Architectural Components: An Exploratory Empirical Study in the Serverless World | Shrikara Arun et.al. | 2502.02539 | null |
| 2025-02-04 | Adaptive Self-improvement LLM Agentic System for ML Library Development | Genghan Zhang et.al. | 2502.02534 | null |
| 2025-02-04 | Multi-Agent Design: Optimizing Agents with Better Prompts and Topologies | Han Zhou et.al. | 2502.02533 | null |
| 2025-02-04 | Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search | Maohao Shen et.al. | 2502.02508 | null |
| 2025-02-04 | EasySpec: Layer-Parallel Speculative Decoding for Efficient Multi-GPU Utilization | Yize Wu et.al. | 2502.02493 | null |
| 2025-02-04 | Multilingual Machine Translation with Open Large Language Models at Practical Scale: An Empirical Study | Menglong Cui et.al. | 2502.02481 | link |
| 2025-01-31 | Vintix: Action Model via In-Context Reinforcement Learning | Andrey Polubarov et.al. | 2501.19400 | link |
| 2025-01-31 | Do LLMs Strategically Reveal, Conceal, and Infer Information? A Theoretical and Empirical Analysis in The Chameleon Game | Mustafa O. Karabag et.al. | 2501.19398 | link |
| 2025-01-31 | Cache Me If You Must: Adaptive Key-Value Quantization for Large Language Models | Alina Shutova et.al. | 2501.19392 | null |
| 2025-01-31 | Federated Sketching LoRA: On-Device Collaborative Fine-Tuning of Large Language Models | Wenzhi Fang et.al. | 2501.19389 | link |
| 2025-02-03 | SELMA: A Speech-Enabled Language Model for Virtual Assistant Interactions | Dominik Wagner et.al. | 2501.19377 | null |
| 2025-01-31 | We’re Different, We’re the Same: Creative Homogeneity Across LLMs | Emily Wenger et.al. | 2501.19361 | null |
| 2025-01-31 | Mechanical Properties of the Meninges: Large Language Model Assisted Systematic Review of over 25,000 Studies | Brandon P. Chelstrom et.al. | 2501.19359 | null |
| 2025-01-31 | The Energy Loss Phenomenon in RLHF: A New Perspective on Mitigating Reward Hacking | Yuchun Miao et.al. | 2501.19358 | null |
| 2025-01-31 | Do Large Multimodal Models Solve Caption Generation for Scientific Figures? Lessons Learned from SCICAP Challenge 2023 | Ting-Yao E. Hsu et.al. | 2501.19353 | null |
| 2025-01-31 | Towards Adaptive Self-Improvement for Smarter Energy Systems | Alexander Sommer et.al. | 2501.19340 | null |
| 2025-01-30 | Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs | Yue Wang et.al. | 2501.18585 | null |
| 2025-01-30 | Token-Hungry, Yet Precise: DeepSeek R1 Highlights the Need for Multi-Step Reasoning Over Speed in MATH | Evgenii Evstafev et.al. | 2501.18576 | null |
| 2025-01-30 | BounTCHA: A CAPTCHA Utilizing Boundary Identification in AI-extended Videos | Lehao Lin et.al. | 2501.18565 | null |
| 2025-01-30 | Semantic Web and Creative AI – A Technical Report from ISWS 2023 | Raia Abu Ahmad et.al. | 2501.18542 | null |
| 2025-01-30 | Illusions of Relevance: Using Content Injection Attacks to Deceive Retrievers, Rerankers, and LLM Judges | Manveer Singh Tamber et.al. | 2501.18536 | link |
| 2025-01-30 | Differentially Private Steering for Large Language Model Alignment | Anmol Goel et.al. | 2501.18532 | link |
| 2025-01-30 | Learn from the Past: Language-conditioned Object Rearrangement with Large Language Models | Guanqun Cao et.al. | 2501.18516 | null |
| 2025-01-30 | Streaming DiLoCo with overlapping communication: Towards a Distributed Free Lunch | Arthur Douillard et.al. | 2501.18512 | null |
| 2025-01-30 | CLEAR: Cue Learning using Evolution for Accurate Recognition Applied to Sustainability Data Extraction | Peter J. Bentley et.al. | 2501.18504 | null |
| 2025-01-30 | A Tool for In-depth Analysis of Code Execution Reasoning of Large Language Models | Changshu Liu et.al. | 2501.18482 | null |
| 2025-01-29 | Learning Beyond the Surface: How Far Can Continual Pre-Training with LoRA Enhance LLMs’ Domain-Specific Insight Learning? | Pouya Pezeshkpour et.al. | 2501.17840 | link |
| 2025-01-29 | Leveraging Multimodal LLM for Inspirational User Interface Search | Seokhyeon Park et.al. | 2501.17799 | link |
| 2025-01-29 | BreezyVoice: Adapting TTS for Taiwanese Mandarin with Enhanced Polyphone Disambiguation – Challenges and Insights | Chan-Jan Hsu et.al. | 2501.17790 | null |
| 2025-01-29 | AdditiveLLM: Large Language Models Predict Defects in Additive Manufacturing | Peter Pak et.al. | 2501.17784 | null |
| 2025-01-29 | 2SSP: A Two-Stage Framework for Structured Pruning of LLMs | Fabrizio Sandri et.al. | 2501.17771 | link |
| 2025-01-29 | Hybrid Graphs for Table-and-Text based Question Answering using LLMs | Ankush Agarwal et.al. | 2501.17767 | null |
| 2025-01-29 | On the Partitioning of GPU Power among Multi-Instances | Tirth Vamja et.al. | 2501.17752 | null |
| 2025-01-29 | Early External Safety Testing of OpenAI’s o3-mini: Insights from the Pre-Deployment Evaluation | Aitor Arrieta et.al. | 2501.17749 | link |
| 2025-01-29 | Using Code Generation to Solve Open Instances of Combinatorial Design Problems | Christopher D. Rosin et.al. | 2501.17725 | link |
| 2025-01-29 | RICoTA: Red-teaming of In-the-wild Conversation with Test Attempts | Eujeong Choi et.al. | 2501.17715 | link |
| 2025-01-28 | Cultural Differences and Perverse Incentives in Science Create a Bad Mix: Exploring Country-Level Publication Bias in Select ACM Conferences | Aksheytha Chelikavada et.al. | 2501.17150 | null |
| 2025-01-28 | FactCG: Enhancing Fact Checkers with Graph-Based Multi-Hop Data | Deren Lei et.al. | 2501.17144 | link |
| 2025-01-28 | ASTRAL: Automated Safety Testing of Large Language Models | Miriam Ugarte et.al. | 2501.17132 | null |
| 2025-01-28 | Optimizing Large Language Model Training Using FP4 Quantization | Ruizhe Wang et.al. | 2501.17116 | null |
| 2025-01-28 | Unlocking Transparent Alignment Through Enhanced Inverse Constitutional AI for Principle Extraction | Carl-Leander Henneking et.al. | 2501.17112 | null |
| 2025-01-28 | Token-by-Token Regeneration and Domain Biases: A Benchmark of LLMs on Advanced Mathematical Problem-Solving | Evgenii Evstafev et.al. | 2501.17084 | null |
| 2025-01-28 | Enhanced Retrieval of Long Documents: Leveraging Fine-Grained Block Representations with Large Language Models | Minghan Li et.al. | 2501.17039 | null |
| 2025-01-28 | Challenges in Ensuring AI Safety in DeepSeek-R1 Models: The Shortcomings of Reinforcement Learning Strategies | Manojkumar Parmar et.al. | 2501.17030 | null |
| 2025-01-28 | Automated Refactoring of Non-Idiomatic Python Code: A Differentiated Replication with LLMs | Alessandro Midolo et.al. | 2501.17024 | null |
| 2025-01-28 | Mobile Manipulation Instruction Generation from Multiple Images with Automatic Metric Enhancement | Kei Katsumata et.al. | 2501.17022 | null |
| 2025-01-27 | Evaluating The Performance of Using Large Language Models to Automate Summarization of CT Simulation Orders in Radiation Oncology | Meiyun Cao et.al. | 2501.16309 | null |
| 2025-01-27 | RAPID: Retrieval-Augmented Parallel Inference Drafting for Text-Based Video Event Retrieval | Long Nguyen et.al. | 2501.16303 | null |
| 2025-01-27 | Matryoshka Re-Ranker: A Flexible Re-Ranking Architecture With Configurable Depth and Width | Zheng Liu et.al. | 2501.16302 | null |
| 2025-01-27 | Large Models in Dialogue for Active Perception and Anomaly Detection | Tzoulio Chamiti et.al. | 2501.16300 | null |
| 2025-01-27 | FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers | Renshan Zhang et.al. | 2501.16297 | null |
| 2025-01-27 | Brain-Adapter: Enhancing Neurological Disorder Analysis with Adapter-Tuning Multimodal Large Language Models | Jing Zhang et.al. | 2501.16282 | null |
| 2025-01-27 | Do LLMs Have Visualization Literacy? An Evaluation on Modified Visualizations to Test Generalization in Data Interpretation | Jiayi Hong et.al. | 2501.16277 | null |
| 2025-01-27 | URAG: Implementing a Unified Hybrid RAG for Precise Answers in University Admission Chatbots – A Case Study at HCMUT | Long Nguyen et.al. | 2501.16276 | null |
| 2025-01-27 | A foundation model for human-AI collaboration in medical literature mining | Zifeng Wang et.al. | 2501.16255 | null |
| 2025-01-27 | Multi-Agent Geospatial Copilots for Remote Sensing Workflows | Chaehong Lee et.al. | 2501.16254 | null |
| 2025-01-24 | HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation | Xin Zhou et.al. | 2501.14729 | link |
| 2025-01-24 | Do LLMs Provide Consistent Answers to Health-Related Questions across Languages? | Ipek Baris Schlicht et.al. | 2501.14719 | null |
| 2025-01-24 | Towards Better Understanding Table Instruction Tuning: Decoupling the Effects from Data versus Models | Naihao Deng et.al. | 2501.14717 | null |
| 2025-01-24 | FlexiGPT: Pruning and Extending Large Language Models with Low-Rank Weight Sharing | James Seale Smith et.al. | 2501.14713 | null |
| 2025-01-24 | The Karp Dataset | Mason DiCicco et.al. | 2501.14705 | null |
| 2025-01-24 | Rethinking Table Instruction Tuning | Naihao Deng et.al. | 2501.14693 | null |
| 2025-01-24 | An Empirical Study on LLM-based Classification of Requirements-related Provisions in Food-safety Regulations | Shabnam Hassani et.al. | 2501.14683 | null |
| 2025-01-24 | Diffusion based Text-to-Music Generation with Global and Local Text based Conditioning | Jisi Zhang et.al. | 2501.14680 | null |
| 2025-01-24 | MedAgentBench: Dataset for Benchmarking LLMs as Agents in Medical Applications | Yixing Jiang et.al. | 2501.14654 | link |
| 2025-01-24 | Investigating the (De)Composition Capabilities of Large Language Models in Natural-to-Formal Language Conversion | Ziyao Xu et.al. | 2501.14649 | link |
| 2025-01-23 | CRPO: Confidence-Reward Driven Preference Optimization for Machine Translation | Guofeng Cui et.al. | 2501.13927 | null |
| 2025-01-23 | Analysis of Indic Language Capabilities in LLMs | Aatman Vaidya et.al. | 2501.13912 | null |
| 2025-01-23 | Privacy-Preserving Personalized Federated Prompt Learning for Multimodal Large Language Models | Linh Tran et.al. | 2501.13904 | null |
| 2025-01-23 | Exploring Finetuned Audio-LLM on Heart Murmur Features | Adrian Florea et.al. | 2501.13884 | null |
| 2025-01-23 | The machine learning platform for developers of large systems | Alexey Naikov et.al. | 2501.13881 | null |
| 2025-01-23 | A RAG-Based Institutional Assistant | Gustavo Kuratomi et.al. | 2501.13880 | null |
| 2025-01-23 | Large Vision-Language Models for Knowledge-Grounded Data Annotation of Memes | Shiling Deng et.al. | 2501.13851 | link |
| 2025-01-23 | On the Reasoning Capacity of AI Models and How to Quantify It | Santosh Kumar Radha et.al. | 2501.13833 | null |
| 2025-01-23 | Predicting Compact Phrasal Rewrites with Large Language Models for ASR Post Editing | Hao Zhang et.al. | 2501.13831 | null |
| 2025-01-23 | Hallucinations Can Improve Large Language Models in Drug Discovery | Shuzhou Yuan et.al. | 2501.13824 | null |
| 2025-01-22 | A Rate-Distortion Framework for Summarization | Enes Arda et.al. | 2501.13100 | null |
| 2025-01-22 | Refining Input Guardrails: Enhancing LLM-as-a-Judge Efficiency Through Chain-of-Thought Fine-Tuning and Alignment | Melissa Kazemi Rad et.al. | 2501.13080 | null |
| 2025-01-22 | Does Table Source Matter? Benchmarking and Improving Multimodal Scientific Table Understanding and Reasoning | Bohao Yang et.al. | 2501.13042 | link |
| 2025-01-22 | Pairwise RM: Perform Best-of-N Sampling with Knockout Tournament | Yantao Liu et.al. | 2501.13007 | link |
| 2025-01-22 | Large Language Model-Based Semantic Communication System for Image Transmission | Soheyb Ribouh et.al. | 2501.12988 | null |
| 2025-01-22 | LLM4WM: Adapting LLM for Wireless Multi-Tasking | Xuanyu Liu et.al. | 2501.12983 | null |
| 2025-01-22 | OnionEval: An Unified Evaluation of Fact-conflicting Hallucination for Small-Large Language Models | Chongren Sun et.al. | 2501.12975 | link |
| 2025-01-22 | Accessible Smart Contracts Verification: Synthesizing Formal Models with Tamed LLMs | Jan Corazza et.al. | 2501.12972 | null |
| 2025-01-22 | It’s complicated. The relationship of algorithmic fairness and non-discrimination regulations in the EU AI Act | Kristof Meding et.al. | 2501.12962 | null |
| 2025-01-22 | Efficient Prompt Compression with Evaluator Heads for Long-Context Transformer Inference | Weizhi Fei et.al. | 2501.12959 | null |
| 2025-01-21 | InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling | Yi Wang et.al. | 2501.12386 | link |
| 2025-01-21 | Expertise elevates AI usage: experimental evidence comparing laypeople and professional artists | Thomas F. Eisenmann et.al. | 2501.12374 | link |
| 2025-01-21 | Is Long Context All You Need? Leveraging LLM’s Extended Context for NL2SQL | Yeounoh Chung et.al. | 2501.12372 | null |
| 2025-01-21 | Automatic Labelling with Open-source LLMs using Dynamic Label Schema Integration | Thomas Walshe et.al. | 2501.12332 | null |
| 2025-01-21 | VARGPT: Unified Understanding and Generation in a Visual Autoregressive Multimodal Large Language Model | Xianwei Zhuang et.al. | 2501.12327 | link |
| 2025-01-21 | LLM-Assisted Knowledge Graph Completion for Curriculum and Domain Modelling in Personalized Higher Education Recommendations | Hasan Abu-Rasheed et.al. | 2501.12300 | null |
| 2025-01-21 | MoGERNN: An Inductive Traffic Predictor for Unobserved Locations in Dynamic Sensing Networks | Qishen Zhou et.al. | 2501.12281 | link |
| 2025-01-21 | Condor: Enhance LLM Alignment with Knowledge-Driven Data Synthesis and Refinement | Maosong Cao et.al. | 2501.12273 | null |
| 2025-01-21 | FOCUS: First Order Concentrated Updating Scheme | Yizhou Liu et.al. | 2501.12243 | null |
| 2025-01-21 | InsTALL: Context-aware Instructional Task Assistance with Multi-modal Large Language Models | Pha Nguyen et.al. | 2501.12231 | null |
| 2025-01-17 | FaceXBench: Evaluating Multimodal LLMs on Face Understanding | Kartik Narayan et.al. | 2501.10360 | link |
| 2025-01-17 | Agent4Edu: Generating Learner Response Data by Generative Agents for Intelligent Education Systems | Weibo Gao et.al. | 2501.10332 | null |
| 2025-01-17 | Large language models for automated scholarly paper review: A survey | Zhenzhen Zhuang et.al. | 2501.10326 | null |
| 2025-01-17 | HiMix: Reducing Computational Complexity in Large Vision-Language Models | Xuange Zhang et.al. | 2501.10318 | null |
| 2025-01-17 | Towards Preventing Overreliance on Task-Oriented Conversational AI Through Accountability Modeling | Suvodip Dey et.al. | 2501.10316 | link |
| 2025-01-17 | Addressing Popularity Bias in Third-Party Library Recommendations Using LLMs | Claudio Di Sipio et.al. | 2501.10313 | null |
| 2025-01-17 | Computational Protein Science in the Era of Large Language Models (LLMs) | Wenqi Fan et.al. | 2501.10282 | null |
| 2025-01-17 | Test Wars: A Comparative Study of SBST, Symbolic Execution, and LLM-Based Approaches to Unit Test Generation | Azat Abdullin et.al. | 2501.10200 | null |
| 2025-01-17 | Generative Artificial Intelligence: Implications for Biomedical and Health Professions Education | William Hersh et.al. | 2501.10186 | null |
| 2025-01-17 | Multi-stage Training of Bilingual Islamic LLM for Neural Passage Retrieval | Vera Pavlova et.al. | 2501.10175 | null |
| 2025-01-16 | Distilling Multi-modal Large Language Models for Autonomous Driving | Deepti Hegde et.al. | 2501.09757 | null |
| 2025-01-16 | Lost in Translation, Found in Context: Sign Language Translation with Contextual Cues | Youngjoon Jang et.al. | 2501.09754 | null |
| 2025-01-16 | OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking | Zekun Xi et.al. | 2501.09751 | null |
| 2025-01-16 | Enhancing Lexicon-Based Text Embeddings with Large Language Models | Yibin Lei et.al. | 2501.09749 | null |
| 2025-01-16 | Suggesting Code Edits in Interactive Machine Learning Notebooks Using Large Language Models | Bihui Jin et.al. | 2501.09745 | null |
| 2025-01-16 | KU AIGEN ICL EDI@BC8 Track 3: Advancing Phenotype Named Entity Recognition and Normalization for Dysmorphology Physical Examination Reports | Hajung Kim et.al. | 2501.09744 | null |
| 2025-01-16 | Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps | Nanye Ma et.al. | 2501.09732 | null |
| 2025-01-16 | A Simple Aerial Detection Baseline of Multimodal Language Models | Qingyun Li et.al. | 2501.09720 | link |
| 2025-01-16 | CyberMentor: AI Powered Learning Tool Platform to Address Diverse Student Needs in Cybersecurity Education | Tianyu Wang et.al. | 2501.09709 | null |
| 2025-01-16 | Domain Adaptation of Foundation LLMs for e-Commerce | Christian Herold et.al. | 2501.09706 | null |
| 2025-01-15 | Aegis2.0: A Diverse AI Safety Dataset and Risks Taxonomy for Alignment of LLM Guardrails | Shaona Ghosh et.al. | 2501.09004 | null |
| 2025-01-15 | Vision Foundation Models for Computed Tomography | Suraj Pai et.al. | 2501.09001 | null |
| 2025-01-15 | Development and Validation of the Provider Documentation Summarization Quality Instrument for Large Language Models | Emma Croxford et.al. | 2501.08977 | null |
| 2025-01-15 | Learning to Extract Cross-Domain Aspects and Understanding Sentiments Using Large Language Models | Karukriti Kaushik Ghosh et.al. | 2501.08974 | null |
| 2025-01-15 | Analyzing the Ethical Logic of Six Large Language Models | W. Russell Neuman et.al. | 2501.08951 | null |
| 2025-01-15 | Applying General Turn-taking Models to Conversational Human-Robot Interaction | Gabriel Skantze et.al. | 2501.08946 | null |
| 2025-01-15 | Disentangling Exploration of Large Language Models by Optimal Exploitation | Tim Grams et.al. | 2501.08925 | null |
| 2025-01-15 | GenAI Content Detection Task 3: Cross-Domain Machine-Generated Text Detection Challenge | Liam Dugan et.al. | 2501.08913 | null |
| 2025-01-15 | Leveraging Large Language Models as Knowledge-Driven Agents for Reliable Retrosynthesis Planning | Qinyu Ma et.al. | 2501.08897 | null |
| 2025-01-15 | XMusic: Towards a Generalized and Controllable Symbolic Music Generation Framework | Sida Tian et.al. | 2501.08809 | null |
| 2025-01-14 | PokerBench: Training Large Language Models to become Professional Poker Players | Richard Zhuang et.al. | 2501.08328 | link |
| 2025-01-14 | Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks | Miran Heo et.al. | 2501.08326 | null |
| 2025-01-14 | ADAM-1: AI and Bioinformatics for Alzheimer’s Detection and Microbiome-Clinical Data Integrations | Ziyuan Huang et.al. | 2501.08324 | null |
| 2025-01-14 | Exploring Robustness of Multilingual LLMs on Real-World Noisy Data | Amirhossein Aliakbarzadeh et.al. | 2501.08322 | link |
| 2025-01-14 | Enhancing Automated Interpretability with Output-Centric Feature Descriptions | Yoav Gur-Arieh et.al. | 2501.08319 | link |
| 2025-01-14 | HALoGEN: Fantastic LLM Hallucinations and Where to Find Them | Abhilasha Ravichander et.al. | 2501.08292 | null |
| 2025-01-14 | LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding | Hongyu Li et.al. | 2501.08282 | link |
| 2025-01-14 | Exploring Robustness of LLMs to Sociodemographically-Conditioned Paraphrasing | Pulkit Arora et.al. | 2501.08276 | null |
| 2025-01-14 | TriMod Fusion for Multimodal Named Entity Recognition in Social Media | Mosab Alfaqeeh et.al. | 2501.08267 | null |
| 2025-01-14 | Addressing the sustainable AI trilemma: a case study on LLM agents and RAG | Hui Wu et.al. | 2501.08262 | null |
| 2025-01-13 | Imagine while Reasoning in Space: Multimodal Visualization-of-Thought | Chengzu Li et.al. | 2501.07542 | null |
| 2025-01-13 | ML Mule: Mobile-Driven Context-Aware Collaborative Learning | Haoxiang Yu et.al. | 2501.07536 | null |
| 2025-01-13 | Investigating Large Language Models in Inferring Personality Traits from User Conversations | Jianfeng Zhu et.al. | 2501.07532 | null |
| 2025-01-13 | RadAlign: Advancing Radiology Report Generation with Vision-Language Concept Alignment | Difei Gu et.al. | 2501.07525 | link |
| 2025-01-13 | Parallel Key-Value Cache Fusion for Position Invariant RAG | Philhoon Oh et.al. | 2501.07523 | null |
| 2025-01-13 | Exploring and Mitigating Adversarial Manipulation of Voting-Based Leaderboards | Yangsibo Huang et.al. | 2501.07493 | null |
| 2025-01-13 | TiEBe: A Benchmark for Assessing the Current Knowledge of Large Language Models | Thales Sales Almeida et.al. | 2501.07482 | null |
| 2025-01-13 | A Survey of Embodied AI in Healthcare: Techniques, Applications, and Opportunities | Yihao Liu et.al. | 2501.07468 | null |
| 2025-01-13 | Understanding and Benchmarking Artificial Intelligence: OpenAI’s o3 Is Not AGI | Rolf Pfister et.al. | 2501.07458 | null |
| 2025-01-13 | Enhancing LLM’s Ability to Generate More Repository-Aware Unit Tests Through Precise Contextual Information Injection | Xin Yin et.al. | 2501.07425 | null |
| 2025-01-10 | LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs | Omkar Thawakar et.al. | 2501.06186 | link |
| 2025-01-10 | PEACE: Empowering Geologic Map Holistic Understanding with MLLMs | Yangyu Huang et.al. | 2501.06184 | null |
| 2025-01-10 | Multilingual Performance of a Multimodal Artificial Intelligence System on Multisubject Physics Concept Inventories | Gerd Kortemeyer et.al. | 2501.06143 | null |
| 2025-01-10 | Supervision policies can shape long-term risk management in general-purpose AI models | Manuel Cebrian et.al. | 2501.06137 | link |
| 2025-01-10 | Contextual ASR Error Handling with LLMs Augmentation for Goal-Oriented Conversational AI | Yuya Asano et.al. | 2501.06129 | null |
| 2025-01-10 | Fleurs-SLU: A Massively Multilingual Benchmark for Spoken Language Understanding | Fabian David Schmidt et.al. | 2501.06117 | null |
| 2025-01-10 | From Conversation to Automation: Leveraging Large Language Models to Analyze Strategies in Problem Solving Therapy | Elham Aghakhani et.al. | 2501.06101 | null |
| 2025-01-10 | How to Tune a Multilingual Encoder Model for Germanic Languages: A Study of PEFT, Full Fine-Tuning, and Language Adapters | Romina Oji et.al. | 2501.06025 | link |
| 2025-01-10 | Addressing speaker gender bias in large scale speech translation systems | Shubham Bansal et.al. | 2501.05989 | null |
| 2025-01-10 | Exploring LLMs for Automated Pre-Testing of Cross-Cultural Surveys | Divya Mani Adhikari et.al. | 2501.05985 | null |
| 2025-01-09 | ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding | Xingyu Fu et.al. | 2501.05452 | null |
| 2025-01-09 | Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark | Yunzhuo Hao et.al. | 2501.05444 | null |
| 2025-01-09 | A survey of textual cyber abuse detection using cutting-edge language models and large language models | Jose A. Diaz-Garcia et.al. | 2501.05443 | null |
| 2025-01-09 | Using LLMs to Infer Non-Binary COVID-19 Sentiments of Chinese Micro-bloggers | Jerry Chongyi Hu et.al. | 2501.05423 | null |
| 2025-01-09 | FairCode: Evaluating Social Bias of LLMs in Code Generation | Yongkang Du et.al. | 2501.05396 | link |
| 2025-01-09 | Large Physics Models: Towards a collaborative approach with Large Language Models and Foundation Models | Kristian G. Barman et.al. | 2501.05382 | null |
| 2025-01-09 | Accelerated Diffusion Models via Speculative Sampling | Valentin De Bortoli et.al. | 2501.05370 | null |
| 2025-01-09 | Stream Aligner: Efficient Sentence-Level Alignment via Distribution Induction | Hantao Lou et.al. | 2501.05336 | link |
| 2025-01-09 | “What’s Happening”- A Human-centered Multimodal Interpreter Explaining the Actions of Autonomous Vehicles | Xuewen Luo et.al. | 2501.05322 | null |
| 2025-01-09 | CallNavi: A Study and Challenge on Function Calling Routing and Invocation in Large Language Models | Yewei Song et.al. | 2501.05255 | null |
| 2025-01-08 | Re-ranking the Context for Multimodal Retrieval Augmented Generation | Matin Mortaheb et.al. | 2501.04695 | null |
| 2025-01-08 | URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics | Ruilin Luo et.al. | 2501.04686 | null |
| 2025-01-08 | Enhancing Financial VQA in Vision Language Models using Intermediate Structured Representations | Archita Srivastava et.al. | 2501.04675 | null |
| 2025-01-08 | Assessing Language Comprehension in Large Language Models Using Construction Grammar | Wesley Scivetti et.al. | 2501.04661 | null |
| 2025-01-08 | Multi-task retriever fine-tuning for domain-specific and efficient RAG | Patrice Béchard et.al. | 2501.04652 | null |
| 2025-01-08 | FlairGPT: Repurposing LLMs for Interior Designs | Gabrielle Littlefair et.al. | 2501.04648 | null |
| 2025-01-08 | Knowledge Retrieval Based on Generative AI | Te-Lun Yang et.al. | 2501.04635 | null |
| 2025-01-08 | “Can you be my mum?”: Manipulating Social Robots in the Large Language Models Era | Giulio Antonio Abbo et.al. | 2501.04633 | null |
| 2025-01-08 | Quantum-inspired Embeddings Projection and Similarity Metrics for Representation Learning | Ivan Kankeu et.al. | 2501.04591 | null |
| 2025-01-08 | InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection | Yuhang Liu et.al. | 2501.04575 | link |
| 2025-01-07 | Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos | Haobo Yuan et.al. | 2501.04001 | null |
| 2025-01-07 | RAG-Check: Evaluating Multimodal Retrieval Augmented Generation Performance | Matin Mortaheb et.al. | 2501.03995 | null |
| 2025-01-07 | Influences on LLM Calibration: A Study of Response Agreement, Loss Functions, and Prompt Styles | Yuxi Xia et.al. | 2501.03991 | null |
| 2025-01-07 | (De)-Indexing and the Right to be Forgotten | Salvatore Vilella et.al. | 2501.03989 | null |
| 2025-01-07 | VLM-driven Behavior Tree for Context-aware Task Planning | Naoki Wake et.al. | 2501.03968 | null |
| 2025-01-07 | Vision Language Models as Values Detectors | Giulio Antonio Abbo et.al. | 2501.03957 | null |
| 2025-01-07 | Localizing AI: Evaluating Open-Weight Language Models for Languages of Baltic States | Jurgita Kapočiūtė-Dzikienė et.al. | 2501.03952 | null |
| 2025-01-07 | Not all tokens are created equal: Perplexity Attention Weighted Networks for AI generated text detection | Pablo Miralles-González et.al. | 2501.03940 | null |
| 2025-01-07 | Visual question answering: from early developments to recent advances – a survey | Ngoc Dung Huynh et.al. | 2501.03939 | null |
| 2025-01-07 | Exploring the Potential of Large Language Models in Public Transportation: San Antonio Case Study | Ramya Jonnala et.al. | 2501.03904 | null |
| 2025-01-06 | BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning | Beichen Zhang et.al. | 2501.03226 | link |
| 2025-01-06 | Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation | Yuhui Zhang et.al. | 2501.03225 | link |
| 2025-01-06 | Leveraging Explainable AI for LLM Text Attribution: Differentiating Human-Written and Multiple LLMs-Generated Text | Ayat Najjar et.al. | 2501.03212 | null |
| 2025-01-06 | Detecting AI-Generated Text in Educational Content: Leveraging Machine Learning and Explainable AI for Academic Integrity | Ayat A. Najjar et.al. | 2501.03203 | null |
| 2025-01-06 | CLIX: Cross-Lingual Explanations of Idiomatic Expressions | Aaron Gluck et.al. | 2501.03191 | null |
| 2025-01-06 | GLiREL – Generalist Model for Zero-Shot Relation Extraction | Jack Boylan et.al. | 2501.03172 | null |
| 2025-01-06 | Semantic Captioning: Benchmark Dataset and Graph-Aware Few-Shot In-Context Learning for SQL2Text | Ali Al-Lawati et.al. | 2501.03166 | link |
| 2025-01-06 | Large language models for artificial general intelligence (AGI): A survey of foundational principles and approaches | Alhassan Mumuni et.al. | 2501.03151 | null |
| 2025-01-06 | VicSim: Enhancing Victim Simulation with Emotional and Linguistic Fidelity | Yerong Li et.al. | 2501.03139 | null |
| 2025-01-06 | PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models | Mingyang Song et.al. | 2501.03124 | link |
| 2025-01-03 | VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction | Chaoyou Fu et.al. | 2501.01957 | link |
| 2025-01-03 | Cold-Start Recommendation towards the Era of Large Language Models (LLMs): A Comprehensive Survey and Roadmap | Weizhi Zhang et.al. | 2501.01945 | null |
| 2025-01-03 | Abstractive Text Summarization for Contemporary Sanskrit Prose: Issues and Challenges | Shagun Sinha et.al. | 2501.01933 | null |
| 2025-01-03 | Mitigating Hallucination for Large Vision Language Model by Inter-Modality Correlation Calibration Decoding | Jiaming Li et.al. | 2501.01926 | null |
| 2025-01-03 | Virgo: A Preliminary Exploration on Reproducing o1-like MLLM | Yifan Du et.al. | 2501.01904 | link |
| 2025-01-03 | Turning Logic Against Itself: Probing Model Defenses Through Contrastive Questions | Rachneet Sachdeva et.al. | 2501.01872 | link |
| 2025-01-03 | Multi-Agent Conversational Online Learning for Adaptive LLM Response Identification | Xiangxiang Dai et.al. | 2501.01849 | null |
| 2025-01-03 | MoColl: Agent-Based Specific and General Model Collaboration for Image Captioning | Pu Yang et.al. | 2501.01834 | null |
| 2025-01-03 | Time Series Language Model for Descriptive Caption Generation | Mohamed Trabelsi et.al. | 2501.01832 | null |
| 2025-01-03 | Auto-RT: Automatic Jailbreak Strategy Exploration for Red-Teaming Large Language Models | Yanjiang Liu et.al. | 2501.01830 | link |
| 2025-01-02 | Unifying Specialized Visual Encoders for Video Language Models | Jihoon Chung et.al. | 2501.01426 | link |
| 2025-01-02 | Multi-Modal Video Feature Extraction for Popularity Prediction | Haixu Liu et.al. | 2501.01422 | null |
| 2025-01-02 | Deep Discrete Encoders: Identifiable Deep Generative Models for Rich Data with Discrete Latent Layers | Seunghyun Lee et.al. | 2501.01414 | null |
| 2025-01-02 | OmniChat: Enhancing Spoken Dialogue Systems with Scalable Synthetic Data for Diverse Scenarios | Xize Cheng et.al. | 2501.01384 | null |
| 2025-01-02 | CLIP-UP: CLIP-Based Unanswerable Problem Detection for Visual Question Answering | Ben Vardi et.al. | 2501.01371 | null |
| 2025-01-02 | Embedding-based Approaches to Hyperpartisan News Detection | Karthik Mohan et.al. | 2501.01370 | null |
| 2025-01-02 | Aligning Large Language Models for Faithful Integrity Against Opposing Argument | Yong Zhao et.al. | 2501.01336 | null |
| 2025-01-02 | CySecBench: Generative AI-based CyberSecurity-focused Prompt Dataset for Benchmarking Large Language Models | Johan Wahréus et.al. | 2501.01335 | link |
| 2025-01-02 | Decoding Knowledge in Large Language Models: A Framework for Categorization and Comprehension | Yanbo Fang et.al. | 2501.01332 | null |
| 2025-01-02 | The Prompt Alchemist: Automated LLM-Tailored Prompt Optimization for Test Case Generation | Shuzheng Gao et.al. | 2501.01329 | null |
| 2024-12-30 | Distributed Mixture-of-Agents for Edge Inference with Large Language Models | Purbesh Mitra et.al. | 2412.21200 | link |
| 2024-12-31 | HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation | Zhaojian Yu et.al. | 2412.21199 | link |
| 2024-12-30 | Facilitating large language model Russian adaptation with Learned Embedding Propagation | Mikhail Tikhomirov et.al. | 2412.21140 | link |
| 2024-12-30 | ExpShield: Safeguarding Web Text from Unauthorized Crawling and Language Modeling Exploitation | Ruixuan Liu et.al. | 2412.21123 | null |
| 2024-12-30 | Toward Intelligent and Secure Cloud: Large Language Model Empowered Proactive Defense | Yuyang Zhou et.al. | 2412.21051 | link |
| 2024-12-30 | TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization | Chia-Yu Hung et.al. | 2412.21037 | link |
| 2024-12-30 | GePBench: Evaluating Fundamental Geometric Perception for Multimodal Large Language Models | Shangyu Xing et.al. | 2412.21036 | null |
| 2024-12-30 | Automated Robustness Testing for LLM-based NLP Software | Mingxuan Xiao et.al. | 2412.21016 | link |
| 2024-12-30 | MapQaTor: A System for Efficient Annotation of Map Query Datasets | Mahir Labib Dihan et.al. | 2412.21015 | link |
| 2024-12-31 | Verbosity-Aware Rationale Reduction: Effective Reduction of Redundant Rationale via Principled Criteria | Joonwon Jang et.al. | 2412.21006 | null |
| 2024-12-27 | Can AI Help with Your Personal Finances? | Oudom Hean et.al. | 2412.19784 | null |
| 2024-12-27 | Machine Learning for Sentiment Analysis of Imported Food in Trinidad and Tobago | Cassandra Daniels et.al. | 2412.19781 | null |
| 2024-12-27 | Fortran2CPP: Automating Fortran-to-C++ Migration using LLMs via Multi-Turn Dialogue and Dual-Agent Integration | Le Chen et.al. | 2412.19770 | link |
| 2024-12-27 | Can Large Language Models Adapt to Other Agents In-Context? | Matthew Riemer et.al. | 2412.19726 | null |
| 2024-12-27 | Text2Insight: Transform natural language text into insights seamlessly using multi-model architecture | Pradeep Sain et.al. | 2412.19718 | null |
| 2024-12-27 | Toward Adaptive Reasoning in Large Language Models with Thought Rollback | Sijia Chen et.al. | 2412.19707 | link |
| 2024-12-27 | A Large-scale Interpretable Multi-modality Benchmark for Facial Image Forgery Localization | Jingchun Lian et.al. | 2412.19685 | null |
| 2024-12-27 | Boosting Private Domain Understanding of Efficient MLLMs: A Tuning-free, Adaptive, Universal Prompt Optimization Framework | Jiang Liu et.al. | 2412.19684 | null |
| 2024-12-27 | CAD-GPT: Synthesising CAD Construction Sequence with Spatial Reasoning-Enhanced Multimodal LLMs | Siyu Wang et.al. | 2412.19663 | link |
| 2024-12-27 | FreStega: A Plug-and-Play Method for Boosting Imperceptibility and Capacity in Generative Linguistic Steganography for Real-World Scenarios | Kaiyi Pang et.al. | 2412.19652 | null |
| 2024-12-24 | Decentralized Intelligence in GameFi: Embodied AI Agents and the Convergence of DeFi and Virtual Ecosystems | Fernando Jia et.al. | 2412.18601 | link |
| 2024-12-24 | A Paragraph is All It Takes: Rich Robot Behaviors from Interacting, Trusted LLMs | OpenMind et.al. | 2412.18588 | null |
| 2024-12-24 | Exploring Embedding Priors in Prompt-Tuning for Improved Interpretability and Control | Sergey Sedov et.al. | 2412.18582 | null |
| 2024-12-24 | Zero-resource Speech Translation and Recognition with LLMs | Karel Mundnich et.al. | 2412.18566 | null |
| 2024-12-24 | Distilling Fine-grained Sentiment Understanding from Large Language Models | Yice Zhang et.al. | 2412.18552 | link |
| 2024-12-24 | Token-Budget-Aware LLM Reasoning | Tingxu Han et.al. | 2412.18547 | link |
| 2024-12-24 | PLD-Tree: Persistent Laplacian Decision Tree for Protein-Protein Binding Free Energy Prediction | Xingjian Xu et.al. | 2412.18541 | null |
| 2024-12-24 | Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation | Derong Xu et.al. | 2412.18537 | link |
| 2024-12-24 | Automated Code Review In Practice | Umut Cihan et.al. | 2412.18531 | null |
| 2024-12-24 | Large Language Model guided Deep Reinforcement Learning for Decision Making in Autonomous Driving | Hao Pang et.al. | 2412.18511 | null |
| 2024-12-23 | ChatGarment: Garment Estimation, Generation and Editing via Large Language Models | Siyuan Bian et.al. | 2412.17811 | link |
| 2024-12-23 | Cross-Lingual Text-Rich Visual Comprehension: An Information Theory Perspective | Xinmiao Yu et.al. | 2412.17787 | null |
| 2024-12-23 | ResearchTown: Simulator of Human Research Community | Haofei Yu et.al. | 2412.17767 | link |
| 2024-12-23 | Survey of Large Multimodal Model Datasets, Application Categories and Taxonomy | Priyaranjan Pattnayak et.al. | 2412.17759 | null |
| 2024-12-23 | ADC: Enhancing Function Calling Via Adversarial Datasets and Code Line-Level Feedback | Wei Zhang et.al. | 2412.17754 | null |
| 2024-12-23 | Deliberation in Latent Space via Differentiable Cache Augmentation | Luyang Liu et.al. | 2412.17747 | null |
| 2024-12-23 | YuLan-Mini: An Open Data-efficient Language Model | Yiwen Hu et.al. | 2412.17743 | link |
| 2024-12-23 | Reasoning to Attend: Try to Understand How | Rui Qian et.al. | 2412.17741 | link |
| 2024-12-23 | Knowledge Editing through Chain-of-Thought | Changyue Wang et.al. | 2412.17727 | link |
| 2024-12-23 | Understanding the Logic of Direct Preference Alignment through Logic | Kyle Richardson et.al. | 2412.17696 | null |
| 2024-12-20 | HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding | Chenxin Tao et.al. | 2412.16158 | null |
| 2024-12-20 | Offline Reinforcement Learning for LLM Multi-Step Reasoning | Huaijie Wang et.al. | 2412.16145 | link |
| 2024-12-20 | Can LLMs Obfuscate Code? A Systematic Analysis of Large Language Models into Assembly Code Obfuscation | Seyedreza Mohseni et.al. | 2412.16135 | link |
| 2024-12-20 | Data-Driven Mechanism Design: Jointly Eliciting Preferences and Information | Dirk Bergemann et.al. | 2412.16132 | null |
| 2024-12-20 | PromptOptMe: Error-Aware Prompt Compression for LLM-based MT Evaluation Metrics | Daniil Larionov et.al. | 2412.16120 | null |
| 2024-12-20 | Deciphering the Underserved: Benchmarking LLM OCR for Low-Resource Scripts | Muhammad Abdullah Sohail et.al. | 2412.16119 | link |
| 2024-12-20 | PruneVid: Visual Token Pruning for Efficient Video Large Language Models | Xiaohu Huang et.al. | 2412.16117 | link |
| 2024-12-20 | The Content Moderator’s Dilemma: Removal of Toxic Content and Distortions to Online Discourse | Mahyar Habibi et.al. | 2412.16114 | null |
| 2024-12-20 | Logical Consistency of Large Language Models in Fact-checking | Bishwamittra Ghosh et.al. | 2412.16100 | null |
| 2024-12-20 | The Evolution of LLM Adoption in Industry Data Curation Practices | Crystal Qian et.al. | 2412.16089 | null |
| 2024-12-19 | UIP2P: Unsupervised Instruction-based Image Editing via Cycle Edit Consistency | Enis Simsar et.al. | 2412.15216 | null |
| 2024-12-19 | Flowing from Words to Pixels: A Framework for Cross-Modality Evolution | Qihao Liu et.al. | 2412.15213 | link |
| 2024-12-19 | OpenEMMA: Open-Source Multimodal Model for End-to-End Autonomous Driving | Shuo Xing et.al. | 2412.15208 | link |
| 2024-12-19 | AutoTrust: Benchmarking Trustworthiness in Large Vision Language Models for Autonomous Driving | Shuo Xing et.al. | 2412.15206 | link |
| 2024-12-19 | MMLU-CF: A Contamination-free Multi-task Language Understanding Benchmark | Qihao Zhao et.al. | 2412.15194 | link |
| 2024-12-19 | LlamaFusion: Adapting Pretrained Language Models for Multimodal Generation | Weijia Shi et.al. | 2412.15188 | null |
| 2024-12-19 | Data for Mathematical Copilots: Better Ways of Presenting Proofs for Machine Learning | Simon Frieder et.al. | 2412.15184 | null |
| 2024-12-19 | HPC-Coder-V2: Studying Code LLMs Across Low-Resource Parallel Languages | Aman Chaturvedi et.al. | 2412.15178 | null |
| 2024-12-19 | Critical-Questions-of-Thought: Steering LLM reasoning with Argumentative Querying | Federico Castagna et.al. | 2412.15177 | link |
| 2024-12-19 | Rethinking Uncertainty Estimation in Natural Language Generation | Lukas Aichberger et.al. | 2412.15176 | null |
| 2024-12-18 | Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces | Jihan Yang et.al. | 2412.14171 | link |
| 2024-12-18 | TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks | Frank F. Xu et.al. | 2412.14161 | link |
| 2024-12-18 | Advanced Reasoning and Transformation Engine for Multi-Step Insight Synthesis in Data Analytics with Large Language Models | Atin Sakkeer Hussain et.al. | 2412.14146 | null |
| 2024-12-18 | LLMs can realize combinatorial creativity: generating creative ideas via LLMs for scientific research | Tianyang Gu et.al. | 2412.14141 | null |
| 2024-12-18 | Design choices made by LLM-based test generators prevent them from finding bugs | Noble Saji Mathews et.al. | 2412.14137 | null |
| 2024-12-18 | Adversarial Hubness in Multi-Modal Retrieval | Tingwei Zhang et.al. | 2412.14113 | link |
| 2024-12-18 | Alignment faking in large language models | Ryan Greenblatt et.al. | 2412.14093 | link |
| 2024-12-18 | Future Research Avenues for Artificial Intelligence in Digital Gaming: An Exploratory Report | Markus Dablander et.al. | 2412.14085 | null |
| 2024-12-18 | Rango: Adaptive Retrieval-Augmented Proving for Automated Software Verification | Kyle Thompson et.al. | 2412.14063 | null |
| 2024-12-18 | Understanding and Evaluating Trust in Generative AI and Large Language Models for Spreadsheets | Simon Thorne et.al. | 2412.14062 | null |
| 2024-12-17 | SafeAgentBench: A Benchmark for Safe Task Planning of Embodied LLM Agents | Sheng Yin et.al. | 2412.13178 | link |
| 2024-12-17 | DnDScore: Decontextualization and Decomposition for Factuality Verification in Long-Form Text Generation | Miriam Wanner et.al. | 2412.13175 | null |
| 2024-12-17 | Algorithmic Fidelity of Large Language Models in Generating Synthetic German Public Opinions: A Case Study | Bolei Ma et.al. | 2412.13169 | link |
| 2024-12-17 | C-FedRAG: A Confidential Federated Retrieval-Augmented Generation System | Parker Addison et.al. | 2412.13163 | null |
| 2024-12-17 | BanglishRev: A Large-Scale Bangla-English and Code-mixed Dataset of Product Reviews in E-Commerce | Mohammad Nazmush Shamael et.al. | 2412.13161 | null |
| 2024-12-17 | SWAN: Preprocessing SGD Enables Adam-Level Performance On LLM Training With Significant Memory Reduction | Chao Ma et.al. | 2412.13148 | null |
| 2024-12-17 | Are Your LLMs Capable of Stable Reasoning? | Junnan Liu et.al. | 2412.13147 | link |
| 2024-12-17 | AI PERSONA: Towards Life-long Personalization of LLMs | Tiannan Wang et.al. | 2412.13103 | null |
| 2024-12-17 | AIR-Bench: Automated Heterogeneous Information Retrieval Benchmark | Jianlyu Chen et.al. | 2412.13102 | link |
| 2024-12-17 | Modality-Inconsistent Continual Learning of Multimodal Large Language Models | Weiguo Pian et.al. | 2412.13050 | null |
| 2024-12-16 | SepLLM: Accelerate Large Language Models by Compressing One Segment into One Separator | Guoxuan Chen et.al. | 2412.12094 | link |
| 2024-12-16 | Instruction-based Image Manipulation by Watching How Things Move | Mingdeng Cao et.al. | 2412.12087 | null |
| 2024-12-16 | CPath-Omni: A Unified Multimodal Foundation Model for Patch and Whole Slide Image Analysis in Computational Pathology | Yuxuan Sun et.al. | 2412.12077 | null |
| 2024-12-16 | CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding | Guo Chen et.al. | 2412.12075 | null |
| 2024-12-16 | Making FETCH! Happen: Finding Emergent Dog Whistles Through Common Habitats | Kuleen Sasse et.al. | 2412.12072 | link |
| 2024-12-16 | How Private are Language Models in Abstractive Summarization? | Anthony Hughes et.al. | 2412.12040 | null |
| 2024-12-16 | Can LLM Prompting Serve as a Proxy for Static Analysis in Vulnerability Detection | Ira Ceka et.al. | 2412.12039 | null |
| 2024-12-16 | SpeechPrune: Context-aware Token Pruning for Speech Information Retrieval | Yueqian Lin et.al. | 2412.12009 | null |
| 2024-12-16 | Agentic AI-Driven Technical Troubleshooting for Enterprise Systems: A Novel Weighted Retrieval-Augmented Generation Paradigm | Rajat Khanda et.al. | 2412.12006 | null |
| 2024-12-16 | The Open Source Advantage in Large Language Models (LLMs) | Jiya Manchanda et.al. | 2412.12004 | null |
| 2024-12-13 | UniMed-CLIP: Towards a Unified Image-Text Pretraining Paradigm for Diverse Medical Imaging Modalities | Muhammad Uzair Khattak et.al. | 2412.10372 | link |
| 2024-12-13 | Robust image classification with multi-modal large language models | Francesco Villani et.al. | 2412.10353 | null |
| 2024-12-13 | COMET: Benchmark for Comprehensive Biological Multi-omics Evaluation Tasks and Language Models | Yuchen Ren et.al. | 2412.10347 | null |
| 2024-12-13 | Iris: Breaking GUI Complexity with Adaptive Focus and Self-Refining | Zhiqi Ge et.al. | 2412.10342 | null |
| 2024-12-13 | AdvPrefix: An Objective for Nuanced LLM Jailbreaks | Sicheng Zhu et.al. | 2412.10321 | null |
| 2024-12-13 | BrushEdit: All-In-One Image Inpainting and Editing | Yaowei Li et.al. | 2412.10316 | link |
| 2024-12-13 | DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding | Zhiyu Wu et.al. | 2412.10302 | link |
| 2024-12-13 | Buzz to Broadcast: Predicting Sports Viewership Using Social Media Engagement | Anakin Trotter et.al. | 2412.10298 | link |
| 2024-12-13 | Still “Talking About Large Language Models”: Some Clarifications | Murray Shanahan et.al. | 2412.10291 | null |
| 2024-12-13 | One world, one opinion? The superstar effect in LLM responses | Sofie Goethals et.al. | 2412.10281 | null |
| 2024-12-12 | Doe-1: Closed-Loop Autonomous Driving with Large World Model | Wenzhao Zheng et.al. | 2412.09627 | link |
| 2024-12-12 | EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM | Zhuofan Zong et.al. | 2412.09618 | null |
| 2024-12-12 | Olympus: A Universal Task Router for Computer Vision Tasks | Yuanze Lin et.al. | 2412.09612 | link |
| 2024-12-12 | SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding | Hao Li et.al. | 2412.09604 | null |
| 2024-12-12 | Do Multimodal Large Language Models See Like Humans? | Jiaying Lin et.al. | 2412.09603 | null |
| 2024-12-12 | InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions | Pan Zhang et.al. | 2412.09596 | link |
| 2024-12-12 | OpenNER 1.0: Standardized Open-Access Named Entity Recognition Datasets in 50+ Languages | Chester Palen-Michel et.al. | 2412.09587 | null |
| 2024-12-12 | DISHONEST: Dissecting misInformation Spread using Homogeneous sOcial NEtworks and Semantic Topic classification | Caleb Stam et.al. | 2412.09578 | null |
| 2024-12-12 | DiverseAgentEntropy: Quantifying Black-Box LLM Uncertainty through Diverse Perspectives and Multi-Agent Interaction | Yu Feng et.al. | 2412.09572 | null |
| 2024-12-12 | Does Representation Matter? Exploring Intermediate Layers in Large Language Models | Oscar Skean et.al. | 2412.09563 | null |
| 2024-12-11 | Generative Semantic Communication: Architectures, Technologies, and Applications | Jinke Ren et.al. | 2412.08642 | null |
| 2024-12-11 | Fast Prompt Alignment for Text-to-Image Generation | Khalil Mrini et.al. | 2412.08639 | link |
| 2024-12-11 | Multimodal Latent Language Modeling with Next-Token Diffusion | Yutao Sun et.al. | 2412.08635 | link |
| 2024-12-11 | Synthetic Vision: Training Vision-Language Models to Understand Physics | Vahid Balazadeh et.al. | 2412.08619 | null |
| 2024-12-11 | Image Retrieval Methods in the Dissimilarity Space | Madhu Kiran et.al. | 2412.08618 | null |
| 2024-12-11 | Exploiting the Index Gradients for Optimization-Based Jailbreaking on Large Language Models | Jiahui Li et.al. | 2412.08615 | link |
| 2024-12-11 | Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning | Fan Lu et.al. | 2412.08614 | link |
| 2024-12-11 | Preference Discerning with LLM-Enhanced Generative Retrieval | Fabian Paischer et.al. | 2412.08604 | null |
| 2024-12-11 | Empirical Measurements of AI Training Power Demand on a GPU-Accelerated Node | Imran Latif et.al. | 2412.08602 | null |
| 2024-12-11 | Leveraging Graph-RAG and Prompt Engineering to Enhance LLM-Based Automated Requirement Traceability and Compliance Checks | Arsalan Masoudifard et.al. | 2412.08593 | null |
| 2024-12-10 | BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities | Sahal Shaji Mullappilly et.al. | 2412.07769 | link |
| 2024-12-10 | Bayesian Optimization of Antibodies Informed by a Generative Model of Evolving Sequences | Alan Nawzad Amin et.al. | 2412.07763 | link |
| 2024-12-10 | Zero-Shot ATC Coding with Large Language Models for Clinical Assessments | Zijian Chen et.al. | 2412.07743 | null |
| 2024-12-10 | Image Retrieval with Intra-Sweep Representation Learning for Neck Ultrasound Scanning Guidance | Wanwen Chen et.al. | 2412.07741 | null |
| 2024-12-10 | Granite Guardian | Inkit Padhi et.al. | 2412.07724 | link |
| 2024-12-10 | DriveMM: All-in-One Large Multimodal Model for Autonomous Driving | Zhijian Huang et.al. | 2412.07689 | link |
| 2024-12-10 | Privacy-Preserving Customer Support: A Framework for Secure and Scalable Interactions | Anant Prakash Awasthi et.al. | 2412.07687 | null |
| 2024-12-10 | TRIM: Token Reduction and Inference Modeling for Cost-Effective Language Generation | Alfredo Garrachón Ruiz et.al. | 2412.07682 | null |
| 2024-12-10 | Ask Humans or AI? Exploring Their Roles in Visualization Troubleshooting | Shuyu Shen et.al. | 2412.07673 | null |
| 2024-12-10 | FlexLLM: Exploring LLM Customization for Moving Target Defense on Black-Box LLMs Against Jailbreak Attacks | Bocheng Chen et.al. | 2412.07672 | null |
| 2024-12-09 | Training Large Language Models to Reason in a Continuous Latent Space | Shibo Hao et.al. | 2412.06769 | null |
| 2024-12-09 | Why Do Developers Engage with ChatGPT in Issue-Tracker? Investigating Usage and Reliance on ChatGPT-Generated Code | Joy Krishan Das et.al. | 2412.06757 | null |
| 2024-12-09 | Refusal Tokens: A Simple Way to Calibrate Refusals in Large Language Models | Neel Jain et.al. | 2412.06748 | null |
| 2024-12-09 | JAPAGEN: Efficient Few/Zero-shot Learning via Japanese Training Dataset Generation with LLM | Takuro Fujii et.al. | 2412.06738 | null |
| 2024-12-09 | AutoDCWorkflow: LLM-based Data Cleaning Workflow Auto-Generation and Benchmark | Lan Li et.al. | 2412.06724 | null |
| 2024-12-09 | DEEPER: Dense Electroencephalography Passage Retrieval | Niall McGuire et.al. | 2412.06695 | null |
| 2024-12-09 | OmniEvalKit: A Modular, Lightweight Toolbox for Evaluating Large Language Model and its Omni-Extensions | Yi-Kai Zhang et.al. | 2412.06693 | null |
| 2024-12-09 | Exploring Critical Testing Scenarios for Decision-Making Policies: An LLM Approach | Weichao Xu et.al. | 2412.06684 | null |
| 2024-12-09 | Toward LLM-Agent-Based Modeling of Transportation Systems: A Conceptual Framework | Tianming Liu et.al. | 2412.06681 | null |
| 2024-12-09 | I Don’t Know: Explicit Modeling of Uncertainty with an [IDK] Token | Roi Cohen et.al. | 2412.06676 | link |
| 2024-12-06 | Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling | Zhe Chen et.al. | 2412.05271 | null |
| 2024-12-06 | APOLLO: SGD-like Memory, AdamW-level Performance | Hanqing Zhu et.al. | 2412.05270 | link |
| 2024-12-06 | CompCap: Improving Multimodal Large Language Models with Composite Captions | Xiaohui Chen et.al. | 2412.05243 | null |
| 2024-12-06 | MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale | Jarvis Guo et.al. | 2412.05237 | link |
| 2024-12-06 | BEExformer: A Fast Inferencing Transformer Architecture via Binarization with Multiple Early Exits | Wazib Ansar et.al. | 2412.05225 | null |
| 2024-12-06 | 100% Hallucination Elimination Using Acurai | Michael C. Wood et.al. | 2412.05223 | null |
| 2024-12-06 | Evaluating and Aligning CodeLLMs on Human Preference | Jian Yang et.al. | 2412.05210 | link |
| 2024-12-06 | A Survey of Large Language Model-Based Generative AI for Text-to-SQL: Benchmarks, Applications, Use Cases, and Challenges | Aditi Singh et.al. | 2412.05208 | null |
| 2024-12-06 | Are Frontier Large Language Models Suitable for Q&A in Science Centres? | Jacob Watson et.al. | 2412.05200 | null |
| 2024-12-06 | SurgBox: Agent-Driven Operating Room Sandbox with Surgery Copilot | Jinlin Wu et.al. | 2412.05187 | link |
| 2024-12-05 | p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay | Jun Zhang et.al. | 2412.04449 | link |
| 2024-12-05 | EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios | Lu Qiu et.al. | 2412.04447 | null |
| 2024-12-05 | Moto: Latent Motion Token as the Bridging Language for Robot Manipulation | Yi Chen et.al. | 2412.04445 | link |
| 2024-12-05 | Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation | Yuying Ge et.al. | 2412.04432 | link |
| 2024-12-05 | Grounding Descriptions in Images informs Zero-Shot Visual Recognition | Shaunak Halbe et.al. | 2412.04429 | link |
| 2024-12-05 | Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion | Jiuhai Chen et.al. | 2412.04424 | link |
| 2024-12-05 | Targeting the Core: A Simple and Effective Method to Attack RAG-based Agents via Direct LLM Manipulation | Xuying Li et.al. | 2412.04415 | null |
| 2024-12-05 | Retrieval-Augmented Machine Translation with Unstructured Knowledge | Jiaan Wang et.al. | 2412.04342 | link |
| 2024-12-05 | Liquid: Language Models are Scalable Multi-modal Generators | Junfeng Wu et.al. | 2412.04332 | link |
| 2024-12-05 | The Hyperfitting Phenomenon: Sharpening and Stabilizing LLMs for Open-Ended Text Generation | Fredrik Carlsson et.al. | 2412.04318 | null |
| 2024-12-04 | From Individual to Society: A Survey on Social Simulation Driven by Large Language Model-based Agents | Xinyi Mou et.al. | 2412.03563 | link |
| 2024-12-04 | SPICE: Smart Projection Interface for Cooking Enhancement | Vera Prohaska et.al. | 2412.03551 | null |
| 2024-12-04 | Evaluating Gender Bias Transfer between Pre-trained and Prompt-Adapted Language Models | Natalie Mackraz et.al. | 2412.03537 | null |
| 2024-12-04 | A Review on Scientific Knowledge Extraction using Large Language Models in Biomedical Sciences | Gabriel Lino Garcia et.al. | 2412.03531 | null |
| 2024-12-04 | FANAL – Financial Activity News Alerting Language Modeling Framework | Urjitkumar Patel et.al. | 2412.03527 | null |
| 2024-12-04 | You’re (Not) My Type – Can LLMs Generate Feedback of Specific Types for Introductory Programming Tasks? | Dominic Lohr et.al. | 2412.03516 | null |
| 2024-12-04 | Flow Matching with General Discrete Paths: A Kinetic-Optimal Perspective | Neta Shaul et.al. | 2412.03487 | null |
| 2024-12-04 | Training-Free Mitigation of Language Reasoning Degradation After Multimodal Instruction Tuning | Neale Ratzlaff et.al. | 2412.03467 | null |
| 2024-12-04 | From Words to Workflows: Automating Business Processes | Laura Minkova et.al. | 2412.03446 | null |
| 2024-12-04 | RedStone: Curating General, Code, Math, and QA Data for Large Language Models | Yaoyao Chang et.al. | 2412.03398 | null |
| 2024-12-03 | T-REG: Preference Optimization with Token-Level Reward Regularization | Wenxuan Zhou et.al. | 2412.02685 | link |
| 2024-12-03 | Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models | Yuda Song et.al. | 2412.02674 | null |
| 2024-12-03 | LLM-Enhanced Path Planning: Safe and Efficient Autonomous Navigation with Instructional Inputs | Pranav Doma et.al. | 2412.02655 | null |
| 2024-12-03 | Time-Reversal Provides Unsupervised Feedback to LLMs | Yerram Varun et.al. | 2412.02626 | null |
| 2024-12-03 | Improving Dynamic Object Interactions in Text-to-Video Generation with AI Feedback | Hiroki Furuta et.al. | 2412.02617 | null |
| 2024-12-03 | AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information? | Kaixiong Gong et.al. | 2412.02611 | link |
| 2024-12-03 | Interpretable Company Similarity with Sparse Autoencoders | Marco Molinari et.al. | 2412.02605 | null |
| 2024-12-03 | CEGI: Measuring the trade-off between efficiency and carbon emissions for SLMs and VLMs | Abhas Kumar et.al. | 2412.02602 | null |
| 2024-12-03 | PrefixLLM: LLM-aided Prefix Circuit Design | Weihua Xiao et.al. | 2412.02594 | null |
| 2024-12-03 | OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation | Junyuan Zhang et.al. | 2412.02592 | link |
| 2024-12-02 | T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs | Shukang Yin et.al. | 2411.19951 | link |
| 2024-12-02 | Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM’s Reasoning Capability | Zicheng Lin et.al. | 2411.19943 | link |
| 2024-11-29 | VLSBench: Unveiling Visual Leakage in Multimodal Safety | Xuhao Hu et.al. | 2411.19939 | link |
| 2024-11-29 | On Domain-Specific Post-Training for Multimodal Large Language Models | Daixuan Cheng et.al. | 2411.19930 | link |
| 2024-11-29 | SIMS: Simulating Human-Scene Interactions with Real World Script Planning | Wenjia Wang et.al. | 2411.19921 | null |
| 2024-11-29 | PDDLFuse: A Tool for Generating Diverse Planning Domains | Vedant Khandelwal et.al. | 2411.19886 | null |
| 2024-12-02 | LUMIA: Linear probing for Unimodal and MultiModal Membership Inference Attacks leveraging internal LLM states | Luis Ibanez-Lissen et.al. | 2411.19876 | null |
| 2024-11-29 | AIDetx: a compression-based method for identification of machine-learning generated text | Leonardo Almeida et.al. | 2411.19869 | link |
| 2024-11-29 | Reverse Thinking Makes LLMs Stronger Reasoners | Justin Chih-Yao Chen et.al. | 2411.19865 | null |
| 2024-11-29 | Cross-Domain Recommendation Meets Large Language Models | Ajay Krishna Vajjala et.al. | 2411.19862 | link |
| 2024-11-27 | Cross-modal Information Flow in Multimodal Large Language Models | Zhi Zhang et.al. | 2411.18620 | null |
| 2024-11-27 | Automated Literature Review Using NLP Techniques and LLM-Based Retrieval-Augmented Generation | Nurshat Fateh Ali et.al. | 2411.18583 | null |
| 2024-11-27 | Challenges in Adapting Multilingual LLMs to Low-Resource Languages using LoRA PEFT Tuning | Omkar Khade et.al. | 2411.18571 | null |
| 2024-11-27 | A Pipeline of Neural-Symbolic Integration to Enhance Spatial Reasoning in Large Language Models | Rong Wang et.al. | 2411.18564 | null |
| 2024-11-27 | DexDiffuser: Interaction-aware Diffusion Planning for Adaptive Dexterous Manipulation | Zhixuan Liang et.al. | 2411.18562 | null |
| 2024-11-27 | Retrofitting (Large) Language Models with Dynamic Tokenization | Darius Feher et.al. | 2411.18553 | null |
| 2024-11-27 | Emergence of Self-Identity in AI: A Mathematical Framework and Empirical Study with Generative Large Language Models | Minhyeok Lee et.al. | 2411.18530 | link |
| 2024-11-27 | LLM-ABBA: Understand time series via symbolic approximation | Erin Carson et.al. | 2411.18506 | null |
| 2024-11-27 | GATE OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation | Pengfei Zhou et.al. | 2411.18499 | link |
| 2024-11-27 | Beyond Examples: High-level Automated Reasoning Paradigm in In-Context Learning via MCTS | Jinyang Wu et.al. | 2411.18478 | link |
| 2024-11-26 | Adaptive Deployment of Untrusted LLMs Reduces Distributed Threats | Jiaxin Wen et.al. | 2411.17693 | null |
| 2024-11-26 | Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens | Xu Ouyang et.al. | 2411.17691 | null |
| 2024-11-26 | Rethinking Token Reduction in MLLMs: Towards a Unified Paradigm for Training-Free Acceleration | Yuhang Han et.al. | 2411.17686 | null |
| 2024-11-26 | Enhancing Character-Level Understanding in LLMs through Token Internal Structure Learning | Zhu Xu et.al. | 2411.17679 | null |
| 2024-11-26 | Push the Limit of Multi-modal Emotion Recognition by Prompting LLMs with Receptive-Field-Aware Attention Weighting | Liyun Zhang et.al. | 2411.17674 | null |
| 2024-11-26 | SketchAgent: Language-Driven Sequential Sketch Generation | Yael Vinker et.al. | 2411.17673 | null |
| 2024-11-26 | Synthetic Data Generation with LLM for Improved Depression Prediction | Andrea Kang et.al. | 2411.17672 | null |
| 2024-11-26 | BERT or FastText? A Comparative Analysis of Contextual as well as Non-Contextual Embeddings | Abhay Shanbhag et.al. | 2411.17661 | null |
| 2024-11-26 | Toward High-Performance LLM Serving: A Simulation-Based Approach for Identifying Optimal Parallelism | Yi-Chien Lin et.al. | 2411.17651 | null |
| 2024-11-26 | On Limitations of LLM as Annotator for Low Resource Languages | Suramya Jadhav et.al. | 2411.17637 | null |
| 2024-11-25 | Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts? | Sohee Yang et.al. | 2411.16679 | null |
| 2024-11-25 | DreamRunner: Fine-Grained Storytelling Video Generation with Retrieval-Augmented Motion Adaptation | Zun Wang et.al. | 2411.16657 | null |
| 2024-11-25 | Self-Generated Critiques Boost Reward Modeling for Language Models | Yue Yu et.al. | 2411.16646 | null |
| 2024-11-25 | Preventing Jailbreak Prompts as Malicious Tools for Cybercriminals: A Cyber Defense Perspective | Jean Marie Tshimula et.al. | 2411.16642 | null |
| 2024-11-25 | Chat2SVG: Vector Graphics Generation with Large Language Models and Image Diffusion Models | Ronghuan Wu et.al. | 2411.16602 | null |
| 2024-11-25 | From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge | Dawei Li et.al. | 2411.16594 | link |
| 2024-11-25 | Large Language Model-based Decision-making for COLREGs and the Control of Autonomous Surface Vehicles | Klinsmann Agyei et.al. | 2411.16587 | null |
| 2024-11-25 | MarketGPT: Developing a Pre-trained transformer (GPT) for Modeling Financial Time Series | Aaron Wheeler et.al. | 2411.16585 | null |
| 2024-11-25 | Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision | Zhiheng Xi et.al. | 2411.16579 | null |
| 2024-11-25 | Predictive Power of LLMs in Financial Markets | Jerick Shi et.al. | 2411.16569 | null |
| 2024-11-22 | Measuring Bullshit in the Language Games played by ChatGPT | Alessandro Trevisan et.al. | 2411.15129 | null |
| 2024-11-22 | AttriBoT: A Bag of Tricks for Efficiently Approximating Leave-One-Out Context Attribution | Fengyuan Liu et.al. | 2411.15102 | link |
| 2024-11-22 | XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models | Yixin Dong et.al. | 2411.15100 | null |
| 2024-11-22 | Locating the Leading Edge of Cultural Change | Sarah Griebel et.al. | 2411.15068 | link |
| 2024-11-22 | mR$^2$AG: Multimodal Retrieval-Reflection-Augmented Generation for Knowledge-Based VQA | Tao Zhang et.al. | 2411.15041 | null |
| 2024-11-22 | One to rule them all: natural language to bind communication, perception and action | Simone Colombani et.al. | 2411.15033 | null |
| 2024-11-22 | Time is on my sight: scene graph filtering for dynamic environment perception in an LLM-driven robot | Simone Colombani et.al. | 2411.15027 | null |
| 2024-11-22 | DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models | Keda Tao et.al. | 2411.15024 | null |
| 2024-11-22 | FTA generation using GenAI with an Autonomy sensor Usecase | Sneha Sudhir Shetiya et.al. | 2411.15007 | null |
| 2024-11-22 | ScribeAgent: Towards Specialized Web Agents Using Production-Scale Workflow Data | Junhong Shen et.al. | 2411.15004 | link |
| 2024-11-21 | Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models | Yuhao Dong et.al. | 2411.14432 | link |
| 2024-11-21 | Beyond Training: Dynamic Token Merging for Zero-Shot Video Understanding | Yiming Zhang et.al. | 2411.14401 | null |
| 2024-11-21 | Lightweight Safety Guardrails Using Fine-tuned BERT Embeddings | Aaron Zheng et.al. | 2411.14398 | null |
| 2024-11-21 | UnifiedCrawl: Aggregated Common Crawl for Affordable Adaptation of LLMs on Low-Resource Languages | Bethel Melesse Tessema et.al. | 2411.14343 | link |
| 2024-11-21 | Velocitune: A Velocity-based Dynamic Domain Reweighting Method for Continual Pre-training | Zheheng Luo et.al. | 2411.14318 | null |
| 2024-11-21 | Automated Generation of Code Debugging Exercises | Victor-Alexandru Pădurean et.al. | 2411.14303 | null |
| 2024-11-21 | Auto-SPICE: Leveraging LLMs for Dataset Creation via Automated SPICE Netlist Extraction from Analog Circuit Diagrams | Jitendra Bhandari et.al. | 2411.14299 | null |
| 2024-11-21 | Efficient Aspect-Based Summarization of Climate Change Reports with Small Language Models | Iacopo Ghinassi et.al. | 2411.14272 | link |
| 2024-11-21 | Knowledge Graphs, Large Language Models, and Hallucinations: An NLP Perspective | Ernests Lavrinovics et.al. | 2411.14258 | null |
| 2024-11-21 | Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models | Javier Ferrando et.al. | 2411.14257 | null |
| 2024-11-20 | SpecTool: A Benchmark for Characterizing Errors in Tool-Use LLMs | Shirley Kokane et.al. | 2411.13547 | null |
| 2024-11-20 | BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games | Davide Paglieri et.al. | 2411.13543 | null |
| 2024-11-20 | Metacognition for Unknown Situations and Environments (MUSE) | Rodolfo Valiente et.al. | 2411.13537 | null |
| 2024-11-20 | Advancing Complex Medical Communication in Arabic with Sporo AraSum: Surpassing Existing Large Language Models | Chanseo Lee et.al. | 2411.13518 | null |
| 2024-11-20 | Disentangling Memory and Reasoning Ability in Large Language Models | Mingyu Jin et.al. | 2411.13504 | link |
| 2024-11-20 | Utilizing Large Language Models to Synthesize Product Desirability Datasets | John D. Hastings et.al. | 2411.13485 | null |
| 2024-11-20 | PatentEdits: Framing Patent Novelty as Textual Entailment | Ryan Lee et.al. | 2411.13477 | null |
| 2024-11-20 | When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training | Haonan Wang et.al. | 2411.13476 | link |
| 2024-11-20 | SoK: A Systems Perspective on Compound AI Threats and Countermeasures | Sarbartha Banerjee et.al. | 2411.13459 | null |
| 2024-11-20 | AdaptAgent: Adapting Multimodal Web Agents with Few-Shot Learning from Human Demonstrations | Gaurav Verma et.al. | 2411.13451 | null |
| 2024-11-19 | ACING: Actor-Critic for Instruction Learning in Black-Box Large Language Models | Salma Kharrat et.al. | 2411.12736 | link |
| 2024-11-19 | Information Theory of Meaningful Communication | Doron Sivan et.al. | 2411.12728 | null |
| 2024-11-19 | CATCH: Complementary Adaptive Token-level Contrastive Decoding to Mitigate Hallucinations in LVLMs | Zhehan Kan et.al. | 2411.12713 | null |
| 2024-11-19 | Strengthening Fake News Detection: Leveraging SVM and Sophisticated Text Vectorization Techniques. Defying BERT? | Ahmed Akib Jawad Karim et.al. | 2411.12703 | null |
| 2024-11-19 | When Backdoors Speak: Understanding LLM Backdoor Attacks Through Model-Generated Explanations | Huaizhi Ge et.al. | 2411.12701 | null |
| 2024-11-19 | SparseInfer: Training-free Prediction of Activation Sparsity for Fast LLM Inference | Jiho Shin et.al. | 2411.12692 | null |
| 2024-11-19 | Neurosymbolic Graph Enrichment for Grounded World Models | Stefano De Giorgis et.al. | 2411.12671 | null |
| 2024-11-19 | DLBacktrace: A Model Agnostic Explainability for any Deep Learning Models | Vinay Kumar Sankarapu et.al. | 2411.12643 | link |
| 2024-11-19 | Improving Controllability and Editability for Pretrained Text-to-Music Generation Models | Yixiao Zhang et.al. | 2411.12641 | null |
| 2024-11-19 | AdaCM$^2$: On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction | Yuanbin Man et.al. | 2411.12593 | null |
| 2024-11-18 | Bi-Mamba: Towards Accurate 1-Bit State Space Models | Shengkun Tang et.al. | 2411.11843 | null |
| 2024-11-18 | Tackling prediction tasks in relational databases with LLMs | Marek Wydmuch et.al. | 2411.11829 | null |
| 2024-11-18 | Exploring adversarial robustness of JPEG AI: methodology, comparison and new methods | Egor Kovalev et.al. | 2411.11795 | null |
| 2024-11-18 | LLM-IE: A Python Package for Generative Information Extraction with Large Language Models | Enshuo Hsu et.al. | 2411.11779 | null |
| 2024-11-18 | The Power of Many: Multi-Agent Multimodal Models for Cultural Image Captioning | Longju Bai et.al. | 2411.11758 | link |
| 2024-11-18 | sMoRe: Enhancing Object Manipulation and Organization in Mixed Reality Spaces with LLMs and Generative AI | Yunhao Xing et.al. | 2411.11752 | null |
| 2024-11-18 | BitMoD: Bit-serial Mixture-of-Datatype LLM Acceleration | Yuzong Chen et.al. | 2411.11745 | link |
| 2024-11-18 | Moral Persuasion in Large Language Models: Evaluating Susceptibility and Ethical Alignment | Allison Huang et.al. | 2411.11731 | null |
| 2024-11-18 | Semantic-Geometric-Physical-Driven Robot Manipulation Skill Transfer via Skill Library and Tactile Representation | Mingchao Qi et.al. | 2411.11714 | link |
| 2024-11-18 | FedCoLLM: A Parameter-Efficient Federated Co-tuning Framework for Large and Small Language Models | Tao Fan et.al. | 2411.11707 | null |
| 2024-11-15 | Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization | Weiyun Wang et.al. | 2411.10442 | link |
| 2024-11-15 | LLaVA-o1: Let Vision Language Models Reason Step-by-Step | Guowei Xu et.al. | 2411.10440 | link |
| 2024-11-15 | MARS: Unleashing the Power of Variance Reduction for Training Large Models | Huizhuo Yuan et.al. | 2411.10438 | link |
| 2024-11-15 | Mitigating Hallucination in Multimodal Large Language Model via Hallucination-targeted Direct Preference Optimization | Yuhan Fu et.al. | 2411.10436 | null |
| 2024-11-15 | Evaluating Creativity and Deception in Large Language Models: A Simulation Framework for Multi-Agent Balderdash | Parsa Hejabi et.al. | 2411.10422 | link |
| 2024-11-15 | Interactive Cycle Model – The Linkage Combination among Automatic Speech Recognition, Large Language Models and Smart Glasses | Libo Wang et.al. | 2411.10362 | null |
| 2024-11-15 | Bias Unveiled: Investigating Social Bias in LLM-Generated Code | Lin Ling et.al. | 2411.10351 | null |
| 2024-11-15 | On the Cost of Model-Serving Frameworks: An Experimental Evaluation | Pasquale De Rosa et.al. | 2411.10337 | null |
| 2024-11-15 | Number it: Temporal Grounding Videos like Flipping Manga | Yongliang Wu et.al. | 2411.10332 | link |
| 2024-11-15 | Modification Takes Courage: Seamless Image Stitching via Reference-Driven Inpainting | Ziqi Xie et.al. | 2411.10309 | link |
| 2024-11-14 | MagicQuill: An Intelligent Interactive Image Editing System | Zichen Liu et.al. | 2411.09703 | link |
| 2024-11-14 | Advancing Fine-Grained Visual Understanding with Multi-Scale Alignment in Multi-Modal Models | Wei Wang et.al. | 2411.09691 | null |
| 2024-11-14 | Squeezed Attention: Accelerating Long Context Length LLM Inference | Coleman Hooper et.al. | 2411.09688 | link |
| 2024-11-14 | Towards a Classification of Open-Source ML Models and Datasets for Software Engineering | Alexandra González et.al. | 2411.09683 | null |
| 2024-11-14 | Med-Bot: An AI-Powered Assistant to Provide Accurate and Reliable Medical Information | Ahan Bhatt et.al. | 2411.09648 | null |
| 2024-11-14 | Local deployment of large-scale music AI models on commodity hardware | Xun Zhou et.al. | 2411.09625 | null |
| 2024-11-14 | PTR: Precision-Driven Tool Recommendation for Large Language Models | Hang Gao et.al. | 2411.09613 | null |
| 2024-11-14 | The Moral Foundations Weibo Corpus | Renjie Cao et.al. | 2411.09612 | null |
| 2024-11-14 | Initial Nugget Evaluation Results for the TREC 2024 RAG Track with the AutoNuggetizer Framework | Ronak Pradeep et.al. | 2411.09607 | null |
| 2024-11-14 | Accelerating Knowledge Graph and Ontology Engineering with Large Language Models | Cogan Shimizu et.al. | 2411.09601 | null |
| 2024-11-13 | The Limited Impact of Medical Adaptation of Large Language and Vision-Language Models | Daniel P. Jeong et.al. | 2411.08870 | null |
| 2024-11-13 | LLMStinger: Jailbreaking LLMs using RL fine-tuned LLMs | Piyush Jha et.al. | 2411.08862 | null |
| 2024-11-13 | Multimodal Instruction Tuning with Hybrid State Space Models | Jianing Zhou et.al. | 2411.08840 | null |
| 2024-11-13 | FinRobot: AI Agent for Equity Research and Valuation with Large Language Models | Tianyu Zhou et.al. | 2411.08804 | link |
| 2024-11-13 | Evaluating World Models with LLM for Decision Making | Chang Yang et.al. | 2411.08794 | null |
| 2024-11-13 | Can sparse autoencoders be used to decompose and interpret steering vectors? | Harry Mayne et.al. | 2411.08790 | link |
| 2024-11-13 | Separating Tongue from Thought: Activation Patching Reveals Language-Agnostic Concept Representations in Transformers | Clément Dumas et.al. | 2411.08745 | link |
| 2024-11-13 | A Comparative Study of Discrete Speech Tokens for Semantic-Related Tasks with Large Language Models | Dingdong Wang et.al. | 2411.08742 | null |
| 2024-11-13 | Dynamic Rewarding with Prompt Optimization Enables Tuning-free Self-Alignment of Language Models | Somanshu Singla et.al. | 2411.08733 | link |
| 2024-11-13 | Polymetis: Large Language Modeling for Multiple Material Domains | Chao Huang et.al. | 2411.08728 | null |
| 2024-11-12 | Learning with Less: Knowledge Distillation from Large Language Models via Unlabeled Data | Juanhui Li et.al. | 2411.08028 | null |
| 2024-11-12 | LLMPhy: Complex Physical Reasoning Using Large Language Models and World Models | Anoop Cherian et.al. | 2411.08027 | null |
| 2024-11-12 | Language Models as Causal Effect Generators | Lucius E. J. Bynum et.al. | 2411.08019 | link |
| 2024-11-12 | ExpressivityArena: Can LLMs Express Information Implicitly? | Joshua Tint et.al. | 2411.08010 | null |
| 2024-11-12 | Can adversarial attacks by large language models be attributed? | Manuel Cebrian et.al. | 2411.08003 | null |
| 2024-11-12 | Derivational Morphology Reveals Analogical Generalization in Large Language Models | Valentin Hofmann et.al. | 2411.07990 | null |
| 2024-11-12 | JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation | Yiyang Ma et.al. | 2411.07975 | link |
| 2024-11-12 | From General to Specific: Utilizing General Hallucination to Automatically Measure the Role Relationship Fidelity for Specific Role-Play Agents | Chuyi Kong et.al. | 2411.07965 | null |
| 2024-11-12 | Towards Low-bit Communication for Tensor Parallel LLM Inference | Harry Dong et.al. | 2411.07942 | null |
| 2024-11-12 | Leveraging Multimodal Models for Enhanced Neuroimaging Diagnostics in Alzheimer’s Disease | Francesco Chiumento et.al. | 2411.07871 | null |
| 2024-11-11 | UTMath: Math Evaluation with Unit Test via Reasoning-to-Coding Thoughts | Bo Yang et.al. | 2411.07240 | link |
| 2024-11-11 | OpenThaiGPT 1.5: A Thai-Centric Open Source Large Language Model | Sumeth Yuenyong et.al. | 2411.07238 | null |
| 2024-11-11 | Tooling or Not Tooling? The Impact of Tools on Language Agents for Chemistry Problem Solving | Botao Yu et.al. | 2411.07228 | null |
| 2024-11-11 | Comparing Bottom-Up and Top-Down Steering Approaches on In-Context Learning Tasks | Madeline Brumley et.al. | 2411.07213 | null |
| 2024-11-11 | DLCR: A Generative Data Expansion Framework via Diffusion for Clothes-Changing Person Re-ID | Nyle Siddiqui et.al. | 2411.07205 | link |
| 2024-11-11 | The Super Weight in Large Language Models | Mengxia Yu et.al. | 2411.07191 | link |
| 2024-11-11 | NatureLM-audio: an Audio-Language Foundation Model for Bioacoustics | David Robinson et.al. | 2411.07186 | null |
| 2024-11-11 | Gradual Fine-Tuning with Graph Routing for Multi-Source Unsupervised Domain Adaptation | Yao Ma et.al. | 2411.07185 | null |
| 2024-11-11 | Continual Memorization of Factoids in Large Language Models | Howard Chen et.al. | 2411.07175 | link |
| 2024-11-11 | A Domain-Agnostic Neurosymbolic Approach for Big Social Data Analysis: Evaluating Mental Health Sentiment on Social Media during COVID-19 | Vedant Khandelwal et.al. | 2411.07163 | null |
| 2024-11-08 | Recycled Attention: Efficient inference for long-context language models | Fangyuan Xu et.al. | 2411.05787 | link |
| 2024-11-08 | Fact or Fiction? Can LLMs be Reliable Annotators for Political Truths? | Veronica Chatrath et.al. | 2411.05775 | null |
| 2024-11-08 | Multi-hop Evidence Pursuit Meets the Web: Team Papelo at FEVER 2024 | Christopher Malon et.al. | 2411.05762 | null |
| 2024-11-08 | Image2Text2Image: A Novel Framework for Label-Free Evaluation of Image-to-Text Generation with Text-to-Image Diffusion Models | Jia-Hong Huang et.al. | 2411.05706 | null |
| 2024-11-08 | Unmasking the Limits of Large Language Models: A Systematic Evaluation of Masked Text Processing Ability through MskQA and MskCal | Fuka Matsuzaki et.al. | 2411.05665 | link |
| 2024-11-08 | The influence of persona and conversational task on social interactions with a LLM-controlled embodied conversational agent | Leon O. H. Kroczek et.al. | 2411.05653 | null |
| 2024-11-08 | LightVA: Lightweight Visual Analytics with LLM Agent-Based Task Planning and Execution | Yuheng Zhao et.al. | 2411.05651 | null |
| 2024-11-08 | Evaluating Large Language Model Capability in Vietnamese Fact-Checking Data Generation | Long Truong To et.al. | 2411.05641 | null |
| 2024-11-08 | Assessing Open-Source Large Language Models on Argumentation Mining Subtasks | Mohammad Yeghaneh Abkenar et.al. | 2411.05639 | null |
| 2024-11-08 | A Two-Step Concept-Based Approach for Enhanced Interpretability and Trust in Skin Lesion Diagnosis | Cristiano Patrício et.al. | 2411.05609 | null |
| 2024-11-07 | SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models | Muyang Li et.al. | 2411.05007 | link |
| 2024-11-07 | Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks? | Jonathan Roberts et.al. | 2411.05000 | link |
| 2024-11-07 | LLM2CLIP: Powerful Language Model Unlock Richer Visual Representation | Weiquan Huang et.al. | 2411.04997 | link |
| 2024-11-07 | Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models | Weixin Liang et.al. | 2411.04996 | link |
| 2024-11-07 | Rethinking Bradley-Terry Models in Preference-Based Reward Modeling: Foundations, Theory, and Alternatives | Hao Sun et.al. | 2411.04991 | link |
| 2024-11-07 | Enhancing Reverse Engineering: Investigating and Benchmarking Large Language Models for Vulnerability Analysis in Decompiled Binaries | Dylan Manuel et.al. | 2411.04981 | null |
| 2024-11-07 | SuffixDecoding: A Model-Free Approach to Speeding Up Large Language Model Inference | Gabriele Oliaro et.al. | 2411.04975 | link |
| 2024-11-07 | BitNet a4.8: 4-bit Activations for 1-bit LLMs | Hongyu Wang et.al. | 2411.04965 | link |
| 2024-11-07 | Position Paper On Diagnostic Uncertainty Estimation from Large Language Models: Next-Word Probability Is Not Pre-test Probability | Yanjun Gao et.al. | 2411.04962 | null |
| 2024-11-07 | CAD-MLLM: Unifying Multimodality-Conditioned CAD Generation With MLLM | Jingwei Xu et.al. | 2411.04954 | link |
| 2024-11-06 | Medical Adaptation of Large Language and Vision-Language Models: Are We Making Progress? | Daniel P. Jeong et.al. | 2411.04118 | null |
| 2024-11-06 | How Transformers Solve Propositional Logic Problems: A Mechanistic Analysis | Guan Zhe Hong et.al. | 2411.04105 | null |
| 2024-11-06 | Textual Decomposition Then Sub-motion-space Scattering for Open-Vocabulary Motion Generation | Ke Fan et.al. | 2411.04079 | null |
| 2024-11-06 | Beemo: Benchmark of Expert-edited Machine-generated Outputs | Ekaterina Artemova et.al. | 2411.04032 | link |
| 2024-11-06 | Prompt Engineering Using GPT for Word-Level Code-Mixed Language Identification in Low-Resource Dravidian Languages | Aniket Deroy et.al. | 2411.04025 | null |
| 2024-11-06 | Select2Plan: Training-Free ICL-Based Planning through VQA and Memory Retrieval | Davide Buoso et.al. | 2411.04006 | null |
| 2024-11-06 | Customized Multiple Clustering via Multi-Modal Subspace Proxy Learning | Jiawei Yao et.al. | 2411.03978 | null |
| 2024-11-06 | What Really is Commonsense Knowledge? | Quyet V. Do et.al. | 2411.03964 | null |
| 2024-11-06 | How Does A Text Preprocessing Pipeline Affect Ontology Syntactic Matching? | Zhangcheng Qiang et.al. | 2411.03962 | null |
| 2024-11-06 | Fine-Grained Guidance for Retrievers: Leveraging LLMs’ Feedback in Retrieval-Augmented Generation | Yuhang Liu et.al. | 2411.03957 | null |
| 2024-11-05 | MME-Finance: A Multimodal Finance Benchmark for Expert-level Understanding and Reasoning | Ziliang Gan et.al. | 2411.03314 | null |
| 2024-11-05 | LLMs for Domain Generation Algorithm Detection | Reynier Leyva La O et.al. | 2411.03307 | null |
| 2024-11-05 | VERITAS: A Unified Approach to Reliability Evaluation | Rajkumar Ramamurthy et.al. | 2411.03300 | null |
| 2024-11-05 | Examining Human-AI Collaboration for Co-Writing Constructive Comments Online | Farhana Shahid et.al. | 2411.03295 | null |
| 2024-11-05 | Interaction2Code: How Far Are We From Automatic Interactive Webpage Generation? | Jingyu Xiao et.al. | 2411.03292 | null |
| 2024-11-05 | The Future of Intelligent Healthcare: A Systematic Analysis and Discussion on the Integration and Impact of Robots Using Large Language Models for Healthcare | Souren Pashangpour et.al. | 2411.03287 | null |
| 2024-11-05 | SMoA: Improving Multi-agent Large Language Models with Sparse Mixture-of-Agents | Dawei Li et.al. | 2411.03284 | link |
| 2024-11-05 | Spontaneous Emergence of Agent Individuality through Social Interactions in LLM-Based Communities | Ryosuke Takata et.al. | 2411.03252 | null |
| 2024-11-05 | DiffLM: Controllable Synthetic Data Generation via Diffusion Language Models | Ying Zhou et.al. | 2411.03250 | null |
| 2024-11-05 | From Pen to Prompt: How Creative Writers Integrate AI into their Writing Practice | Alicia Guo et.al. | 2411.03137 | null |
| 2024-11-04 | Training-free Regional Prompting for Diffusion Transformers | Anthony Chen et.al. | 2411.02395 | link |
| 2024-11-04 | Adaptive Length Image Tokenization via Recurrent Allocation | Shivam Duggal et.al. | 2411.02393 | link |
| 2024-11-04 | Improving Scientific Hypothesis Generation with Knowledge Grounded Large Language Models | Guangzhi Xiong et.al. | 2411.02382 | null |
| 2024-11-04 | Addressing Uncertainty in LLMs to Enhance Reliability in Generative AI | Ramneet Kaur et.al. | 2411.02381 | null |
| 2024-11-04 | DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution | Yang Yue et.al. | 2411.02359 | link |
| 2024-11-04 | “Give Me BF16 or Give Me Death”? Accuracy-Performance Trade-Offs in LLM Quantization | Eldar Kurtic et.al. | 2411.02355 | null |
| 2024-11-04 | Social-RAG: Retrieving from Group Interactions to Socially Ground Proactive AI Generation to Group Preferences | Ruotong Wang et.al. | 2411.02353 | null |
| 2024-11-04 | Can Large Language Models generalize analogy solving like people can? | Claire E. Stevenson et.al. | 2411.02348 | null |
| 2024-11-04 | WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning | Zehan Qi et.al. | 2411.02337 | link |
| 2024-11-04 | Sparsing Law: Towards Large Language Models with Greater Activation Sparsity | Yuqi Luo et.al. | 2411.02335 | link |
| 2024-10-31 | P-Masking: Power Law Masking Improves Multi-attribute Controlled Generation | Mohamed Elgaar et.al. | 2410.24201 | null |
| 2024-11-01 | SelfCodeAlign: Self-Alignment for Code Generation | Yuxiang Wei et.al. | 2410.24198 | link |
| 2024-10-31 | Constraint Back-translation Improves Complex Instruction Following of Large Language Models | Yunjia Qi et.al. | 2410.24175 | link |
| 2024-10-31 | Thought Space Explorer: Navigating and Expanding Thought Space for Large Language Model Reasoning | Jinghan Zhang et.al. | 2410.24155 | null |
| 2024-10-31 | Language-Driven Policy Distillation for Cooperative Driving in Multi-Agent Reinforcement Learning | Jiaqi Liu et.al. | 2410.24152 | null |
| 2024-10-31 | Exploring Vision Language Models for Facial Attribute Recognition: Emotion, Race, Gender, and Age | Nouar AlDahoul et.al. | 2410.24148 | null |
| 2024-11-01 | Multi-environment Topic Models | Dominic Sobhani et.al. | 2410.24126 | null |
| 2024-10-31 | Leveraging Large Language Models for Code Translation and Software Development in Scientific Computing | Akash Dhruv et.al. | 2410.24119 | link |
| 2024-10-31 | Repository-Level Compositional Code Translation and Validation | Ali Reza Ibrahimzada et.al. | 2410.24117 | null |
| 2024-10-31 | Nearest Neighbor Normalization Improves Multimodal Retrieval | Neil Chowdhury et.al. | 2410.24114 | link |
| 2024-10-30 | EMMA: End-to-End Multimodal Model for Autonomous Driving | Jyh-Jing Hwang et.al. | 2410.23262 | null |
| 2024-10-30 | Evaluating Cultural and Social Awareness of LLM Web Agents | Haoyi Qiu et.al. | 2410.23252 | null |
| 2024-10-30 | Carrot and Stick: Eliciting Comparison Data and Beyond | Yiling Chen et.al. | 2410.23243 | null |
| 2024-10-30 | A little less conversation, a little more action, please: Investigating the physical common-sense of LLMs in a 3D embodied environment | Matteo G. Mecattaf et.al. | 2410.23242 | null |
| 2024-10-30 | EMOTION: Expressive Motion Sequence Generation for Humanoid Robots with In-Context Learning | Peide Huang et.al. | 2410.23234 | null |
| 2024-10-31 | Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval | Sheryl Hsu et.al. | 2410.23214 | null |
| 2024-10-30 | Reliability of Topic Modeling | Kayla Schroeder et.al. | 2410.23186 | null |
| 2024-10-30 | ProTransformer: Robustify Transformers via Plug-and-Play Paradigm | Zhichao Hou et.al. | 2410.23182 | null |
| 2024-10-30 | ReasoningRec: Bridging Personalized Recommendations and Human-Interpretable Explanations through LLM Reasoning | Millennium Bismay et.al. | 2410.23180 | link |
| 2024-10-30 | SciPIP: An LLM-based Scientific Paper Idea Proposer | Wenxiao Wang et.al. | 2410.23166 | link |
| 2024-10-29 | Enhancing Code Annotation Reliability: Generative AI’s Role in Comment Quality Assessment Models | Seetharam Killivalavan et.al. | 2410.22323 | null |
| 2024-10-29 | Online Detecting LLM-Generated Texts via Sequential Hypothesis Testing by Betting | Can Chen et.al. | 2410.22318 | link |
| 2024-10-29 | Natural Language Inference Improves Compositionality in Vision-Language Models | Paola Cascante-Bonilla et.al. | 2410.22315 | null |
| 2024-10-29 | GPT-4o reads the mind in the eyes | James W. A. Strachan et.al. | 2410.22309 | null |
| 2024-10-29 | SVIP: Towards Verifiable Inference of Open-source Large Language Models | Yifan Sun et.al. | 2410.22307 | null |
| 2024-10-29 | Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning | Yihe Deng et.al. | 2410.22304 | null |
| 2024-10-29 | LLMs are Highly-Constrained Biophysical Sequence Optimizers | Angelica Chen et.al. | 2410.22296 | null |
| 2024-10-29 | Fine-Tuning LLMs for Code Mutation: A New Era of Cyber Threats | Mohammad Setak et.al. | 2410.22293 | null |
| 2024-10-29 | Embedding-based classifiers can detect prompt injection attacks | Md. Ahsan Ayub et.al. | 2410.22284 | link |
| 2024-10-29 | Whose ChatGPT? Unveiling Real-World Educational Inequalities Introduced by Large Language Models | Renzhe Yu et.al. | 2410.22282 | null |
| 2024-10-28 | Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics | Yaniv Nikankin et.al. | 2410.21272 | link |
| 2024-10-28 | LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior | Hanyu Wang et.al. | 2410.21264 | link |
| 2024-10-28 | AutoBench-V: Can Large Vision-Language Models Benchmark Themselves? | Han Bao et.al. | 2410.21259 | link |
| 2024-10-28 | LongReward: Improving Long-context Large Language Models with AI Feedback | Jiajie Zhang et.al. | 2410.21252 | link |
| 2024-10-28 | Zero-Shot Dense Retrieval with Embeddings from Relevance Feedback | Nour Jedidi et.al. | 2410.21242 | null |
| 2024-10-28 | Hierarchical Knowledge Graph Construction from Images for Scalable E-Commerce | Zhantao Yang et.al. | 2410.21237 | null |
| 2024-10-28 | Flaming-hot Initiation with Regular Execution Sampling for Large Language Models | Weizhe Chen et.al. | 2410.21236 | null |
| 2024-10-28 | LoRA vs Full Fine-tuning: An Illusion of Equivalence | Reece Shuttleworth et.al. | 2410.21228 | null |
| 2024-10-28 | Lifting the Veil on the Large Language Model Supply Chain: Composition, Risks, and Mitigations | Kaifeng Huang et.al. | 2410.21218 | null |
| 2024-10-28 | BongLLaMA: LLaMA for Bangla Language | Abdullah Khan Zehady et.al. | 2410.21200 | null |
| 2024-10-25 | The Potential and Value of AI Chatbot in Personalized Cognitive Training | Zilong Wang et.al. | 2410.19733 | null |
| 2024-10-25 | Counting Ability of Large Language Models and Impact of Tokenization | Xiang Zhang et.al. | 2410.19730 | null |
| 2024-10-25 | FISHNET: Financial Intelligence from Sub-querying, Harmonizing, Neural-Conditioning, Expert Swarms, and Task Planning | Nicole Cho et.al. | 2410.19727 | null |
| 2024-10-25 | 2D-DPO: Scaling Direct Preference Optimization with 2-Dimensional Supervision | Shilong Li et.al. | 2410.19720 | null |
| 2024-10-25 | TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning | Xiangyu Zeng et.al. | 2410.19702 | null |
| 2024-10-25 | IPPON: Common Sense Guided Informative Path Planning for Object Goal Navigation | Kaixian Qu et.al. | 2410.19697 | null |
| 2024-10-25 | Less is More: Extreme Gradient Boost Rank-1 Adaption for Efficient Finetuning of LLMs | Yifei Zhang et.al. | 2410.19694 | null |
| 2024-10-25 | APRICOT: Active Preference Learning and Constraint-Aware Task Planning with LLMs | Huaxiaoyue Wang et.al. | 2410.19656 | null |
| 2024-10-25 | Take Caution in Using LLMs as Human Surrogates: Scylla Ex Machina | Yuan Gao et.al. | 2410.19599 | null |
| 2024-10-25 | Diverse Sign Language Translation | Xin Shen et.al. | 2410.19586 | null |
| 2024-10-24 | Unbounded: A Generative Infinite Game of Character Life Simulation | Jialu Li et.al. | 2410.18975 | null |
| 2024-10-24 | Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms | Zhangheng Li et.al. | 2410.18967 | null |
| 2024-10-24 | Does Data Contamination Detection Work (Well) for LLMs? A Survey and Evaluation on Detection Assumptions | Yujuan Fu et.al. | 2410.18966 | null |
| 2024-10-24 | OSCAR: Operating System Control via State-Aware Reasoning and Re-Planning | Xiaoqiang Wang et.al. | 2410.18963 | null |
| 2024-10-24 | Bridge-Coder: Unlocking LLMs’ Potential to Overcome Language Gaps in Low-Resource Code | Jipeng Zhang et.al. | 2410.18957 | null |
| 2024-10-24 | BioMistral-NLU: Towards More Generalizable Medical Language Understanding through Instruction Tuning | Yujuan Velvin Fu et.al. | 2410.18955 | null |
| 2024-10-24 | Dynamic Vocabulary Pruning in Early-Exit LLMs | Jort Vincenti et.al. | 2410.18952 | link |
| 2024-10-24 | SafeBench: A Safety Evaluation Framework for Multimodal Large Language Models | Zonghao Ying et.al. | 2410.18927 | null |
| 2024-10-24 | From Blind Solvers to Logical Thinkers: Benchmarking LLMs’ Logical Integrity on Faulty Mathematical Problems | A M Muntasir Rahman et.al. | 2410.18921 | null |
| 2024-10-24 | A Survey on Speech Large Language Models | Jing Peng et.al. | 2410.18908 | null |
| 2024-10-23 | TP-Eval: Tap Multimodal LLMs’ Potential in Evaluation by Customizing Prompts | Yuxuan Xie et.al. | 2410.18071 | null |
| 2024-10-23 | LongRAG: A Dual-Perspective Retrieval-Augmented Generation Paradigm for Long-Context Question Answering | Qingfei Zhao et.al. | 2410.18050 | link |
| 2024-10-23 | Key Algorithms for Keyphrase Generation: Instruction-Based LLMs for Russian Scientific Keyphrases | Anna Glazkova et.al. | 2410.18040 | null |
| 2024-10-23 | MiLoRA: Efficient Mixture of Low-Rank Adaptation for Large Language Models Fine-tuning | Jingfan Zhang et.al. | 2410.18035 | null |
| 2024-10-23 | GraphTeam: Facilitating Large Language Model-based Graph Analysis via Multi-Agent Collaboration | Xin Li et.al. | 2410.18032 | link |
| 2024-10-23 | MiniFed: Integrating LLM-based Agentic-Workflow for Simulating FOMC Meeting | Sungil Seok et.al. | 2410.18012 | null |
| 2024-10-23 | Benchmarking Foundation Models on Exceptional Cases: Dataset Creation and Validation | Suho Kang et.al. | 2410.18001 | link |
| 2024-10-23 | Zeitenwenden: Detecting changes in the German political discourse | Kai-Robin Lange et.al. | 2410.17960 | null |
| 2024-10-23 | ExpertFlow: Optimized Expert Activation and Token Allocation for Efficient Mixture-of-Experts Inference | Xin He et.al. | 2410.17954 | null |
| 2024-10-23 | SimRAG: Self-Improving Retrieval-Augmented Generation for Adapting Large Language Models to Specialized Domains | Ran Xu et.al. | 2410.17952 | null |
| 2024-10-22 | Altogether: Image Captioning via Re-aligning Alt-text | Hu Xu et.al. | 2410.17251 | null |
| 2024-10-22 | Large Language Models Empowered Personalized Web Agents | Hongru Cai et.al. | 2410.17236 | null |
| 2024-10-22 | Automated Spinal MRI Labelling from Reports Using a Large Language Model | Robin Y. Park et.al. | 2410.17235 | link |
| 2024-10-22 | Fine-Tuning Large Language Models to Appropriately Abstain with Semantic Entropy | Benedict Aaron Tjandra et.al. | 2410.17234 | null |
| 2024-10-22 | Few-shot In-Context Preference Learning Using Large Language Models | Chao Yu et.al. | 2410.17233 | null |
| 2024-10-22 | Context-aware Prompt Tuning: Advancing In-Context Learning with Adversarial Methods | Tsachi Blau et.al. | 2410.17222 | null |
| 2024-10-22 | Exploring Possibilities of AI-Powered Legal Assistance in Bangladesh through Large Language Modeling | Azmine Toushik Wasi et.al. | 2410.17210 | link |
| 2024-10-22 | VoiceBench: Benchmarking LLM-Based Voice Assistants | Yiming Chen et.al. | 2410.17196 | link |
| 2024-10-22 | Language Model Non-myopic Generation for Reasoning and Planning | Chang Ma et.al. | 2410.17195 | null |
| 2024-10-22 | From Attention to Activation: Unravelling the Enigmas of Large Language Models | Prannay Kaul et.al. | 2410.17174 | null |
| 2024-10-21 | Reflection-Bench: probing AI intelligence with reflection | Lingyu Li et.al. | 2410.16270 | link |
| 2024-10-21 | Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance | Zhangwei Gao et.al. | 2410.16261 | link |
| 2024-10-21 | Elucidating the design space of language models for image generation | Xuantong Liu et.al. | 2410.16257 | null |
| 2024-10-21 | CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution | Maosong Cao et.al. | 2410.16256 | link |
| 2024-10-21 | Can Knowledge Editing Really Correct Hallucinations? | Baixiang Huang et.al. | 2410.16251 | link |
| 2024-10-21 | Analyzing Context Contributions in LLM-based Machine Translation | Emmanouil Zaranis et.al. | 2410.16246 | null |
| 2024-10-21 | IBGP: Imperfect Byzantine Generals Problem for Zero-Shot Robustness in Communicative Multi-Agent Systems | Yihuan Mao et.al. | 2410.16237 | null |
| 2024-10-21 | LLaVA-KD: A Framework of Distilling Multimodal Large Language Models | Yuxuan Cai et.al. | 2410.16236 | null |
| 2024-10-21 | ToW: Thoughts of Words Improve Reasoning in Large Language Models | Zhikun Xu et.al. | 2410.16235 | null |
| 2024-10-21 | Building A Coding Assistant via the Retrieval-Augmented Language Model | Xinze Li et.al. | 2410.16229 | null |
| 2024-10-18 | Are AI Detectors Good Enough? A Survey on Quality of Datasets With Machine-Generated Texts | German Gritsai et.al. | 2410.14677 | null |
| 2024-10-18 | SudoLM: Learning Access Control of Parametric Knowledge with Authorization Alignment | Qin Liu et.al. | 2410.14676 | null |
| 2024-10-18 | Enhancing Large Language Models’ Situated Faithfulness to External Contexts | Yukun Huang et.al. | 2410.14675 | link |
| 2024-10-18 | NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples | Baiqi Li et.al. | 2410.14669 | null |
| 2024-10-18 | MiCEval: Unveiling Multimodal Chain of Thought’s Quality via Image Description and Reasoning Steps | Xiongtao Zhou et.al. | 2410.14668 | link |
| 2024-10-18 | A Large Language Model-Driven Reward Design Framework via Dynamic Feedback for Reinforcement Learning | Shengjie Sun et.al. | 2410.14660 | null |
| 2024-10-18 | EvoPress: Towards Optimal Dynamic Model Compression via Evolutionary Search | Oliver Sieberling et.al. | 2410.14649 | null |
| 2024-10-18 | Distance between Relevant Information Pieces Causes Bias in Long-Context LLMs | Runchu Tian et.al. | 2410.14641 | link |
| 2024-10-18 | GenEOL: Harnessing the Generative Power of LLMs for Training-Free Sentence Embeddings | Raghuveer Thirukovalluru et.al. | 2410.14635 | null |
| 2024-10-18 | You Shall Know a Tool by the Traces it Leaves: The Predictability of Sentiment Analysis Tools | Daniel Baumartz et.al. | 2410.14626 | null |
| 2024-10-17 | Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens | Lijie Fan et.al. | 2410.13863 | null |
| 2024-10-17 | PUMA: Empowering Unified MLLM with Multi-granular Visual Generation | Rongyao Fang et.al. | 2410.13861 | link |
| 2024-10-17 | $\gamma$-MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models | Yaxin Luo et.al. | 2410.13859 | null |
| 2024-10-17 | How Numerical Precision Affects Mathematical Reasoning Capabilities of LLMs | Guhao Feng et.al. | 2410.13857 | null |
| 2024-10-17 | Can MLLMs Understand the Deep Implication Behind Chinese Images? | Chenhao Zhang et.al. | 2410.13854 | link |
| 2024-10-17 | Retrospective Learning from Interactions | Zizhao Chen et.al. | 2410.13852 | null |
| 2024-10-17 | SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction | Xuan Zhang et.al. | 2410.13846 | link |
| 2024-10-17 | Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs | Tianyu Guo et.al. | 2410.13835 | null |
| 2024-10-17 | AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents | Ke Yang et.al. | 2410.13825 | null |
| 2024-10-17 | Harnessing Webpage UIs for Text-Rich Visual Understanding | Junpeng Liu et.al. | 2410.13824 | null |
| 2024-10-16 | Context is Key(NMF): Modelling Topical Information Dynamics in Chinese Diaspora Media | Ross Deans Kristensen-McLachlan et.al. | 2410.12791 | null |
| 2024-10-16 | Meta-Chunking: Learning Efficient Text Segmentation via Logical Perception | Jihao Zhao et.al. | 2410.12788 | null |
| 2024-10-16 | In-Context Learning Enables Robot Action Prediction in LLMs | Yida Yin et.al. | 2410.12782 | null |
| 2024-10-16 | Identifying Task Groupings for Multi-Task Learning Using Pointwise V-Usable Information | Yingya Li et.al. | 2410.12774 | null |
| 2024-10-16 | StyleDistance: Stronger Content-Independent Style Embeddings with Synthetic Parallel Examples | Ajay Patel et.al. | 2410.12757 | null |
| 2024-10-16 | Comparative Analysis of Extrinsic Factors for NER in French | Grace Yang et.al. | 2410.12750 | null |
| 2024-10-16 | CREAM: Consistency Regularized Self-Rewarding Language Models | Zhaoyang Wang et.al. | 2410.12735 | null |
| 2024-10-16 | FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression | Zhenheng Tang et.al. | 2410.12707 | null |
| 2024-10-16 | WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines | Genta Indra Winata et.al. | 2410.12705 | null |
| 2024-10-16 | Sarcasm Detection in a Less-Resourced Language | Lazar Đoković et.al. | 2410.12704 | null |
| 2024-10-15 | GaVaMoE: Gaussian-Variational Gated Mixture of Experts for Explainable Recommendation | Fei Tang et.al. | 2410.11841 | null |
| 2024-10-15 | MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding | Yue Cao et.al. | 2410.11829 | link |
| 2024-10-15 | SGEdit: Bridging LLM with Text2Image Generative Model for Scene Graph-based Image Editing | Zhiyuan Zhang et.al. | 2410.11815 | null |
| 2024-10-15 | NesTools: A Dataset for Evaluating Nested Tool Learning Abilities of Large Language Models | Han Han et.al. | 2410.11805 | null |
| 2024-10-15 | FoundTS: Comprehensive and Unified Benchmarking of Foundation Models for Time Series Forecasting | Zhe Li et.al. | 2410.11802 | null |
| 2024-10-15 | Selection-p: Self-Supervised Task-Agnostic Prompt Compression for Faithfulness and Transferability | Tsz Ting Chung et.al. | 2410.11786 | null |
| 2024-10-15 | G-Designer: Architecting Multi-agent Communication Topologies via Graph Neural Networks | Guibin Zhang et.al. | 2410.11782 | null |
| 2024-10-15 | Language Models Encode Numbers Using Digit Representations in Base 10 | Amit Arnold Levy et.al. | 2410.11781 | null |
| 2024-10-15 | MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation | Chenxi Wang et.al. | 2410.11779 | link |
| 2024-10-15 | Layer-wise Importance Matters: Less Memory for Better Performance in Parameter-efficient Fine-tuning of Large Language Models | Kai Yao et.al. | 2410.11772 | link |
| 2024-10-14 | DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads | Guangxuan Xiao et.al. | 2410.10819 | link |
| 2024-10-14 | TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models | Mu Cai et.al. | 2410.10818 | null |
| 2024-10-14 | Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free | Ziyue Li et.al. | 2410.10814 | null |
| 2024-10-14 | LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory | Di Wu et.al. | 2410.10813 | link |
| 2024-10-14 | Local and Global Decoding in Text Generation | Daniel Gareev et.al. | 2410.10810 | link |
| 2024-10-14 | Mix Data or Merge Models? Optimizing for Diverse Multi-Task Learning | Aakanksha et.al. | 2410.10801 | null |
| 2024-10-14 | Towards Foundation Models for 3D Vision: How Close Are We? | Yiming Zuo et.al. | 2410.10799 | null |
| 2024-10-14 | MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling | Jian Yang et.al. | 2410.10798 | null |
| 2024-10-14 | Context-Parametric Inversion: Why Instruction Finetuning May Not Actually Improve Context Reliance | Sachin Goyal et.al. | 2410.10796 | link |
| 2024-10-14 | LiveXiv – A Multi-Modal Live Benchmark Based on Arxiv Papers Content | Nimrod Shabtay et.al. | 2410.10783 | link |
| 2024-10-11 | MiRAGeNews: Multimodal Realistic AI-Generated News Detection | Runsheng Huang et.al. | 2410.09045 | null |
| 2024-10-11 | AttnGCG: Enhancing Jailbreaking Attacks on LLMs with Attention Manipulation | Zijun Wang et.al. | 2410.09040 | link |
| 2024-10-11 | Semi-Supervised Learning of Noisy Mixture of Experts Models | Oh-Ran Kwon et.al. | 2410.09039 | null |
| 2024-10-11 | SimpleStrat: Diversifying Language Model Generation with Stratification | Justin Wong et.al. | 2410.09038 | null |
| 2024-10-11 | Mentor-KD: Making Small Language Models Better Multi-step Reasoners | Hojae Lee et.al. | 2410.09037 | link |
| 2024-10-11 | PEAR: A Robust and Flexible Automation Framework for Ptychography Enabled by Multiple Large Language Model Agents | Xiangyu Yin et.al. | 2410.09034 | null |
| 2024-10-11 | The Impact of Visual Information in Chinese Characters: Evaluating Large Models’ Ability to Recognize and Utilize Radicals | Xiaofeng Wu et.al. | 2410.09013 | null |
| 2024-10-11 | Software Engineering and Foundation Models: Insights from Industry Blogs Using a Jury of Foundation Models | Hao Li et.al. | 2410.09012 | null |
| 2024-10-11 | SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights | Ling Yang et.al. | 2410.09008 | link |
| 2024-10-11 | From Interaction to Impact: Towards Safer AI Agents Through Understanding and Evaluating UI Operation Impacts | Zhuohao Jerry Zhang et.al. | 2410.09006 | null |
| 2024-10-10 | Emerging Pixel Grounding in Large Multimodal Models Without Grounding Supervision | Shengcao Cao et.al. | 2410.08209 | null |
| 2024-10-10 | Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training | Gen Luo et.al. | 2410.08202 | null |
| 2024-10-10 | From Exploration to Mastery: Enabling LLMs to Master Tools via Self-Driven Interactions | Changle Qu et.al. | 2410.08197 | link |
| 2024-10-10 | MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code | Zimu Lu et.al. | 2410.08196 | link |
| 2024-10-10 | GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-time Alignment | Yuancheng Xu et.al. | 2410.08193 | null |
| 2024-10-10 | Sample then Identify: A General Framework for Risk Control and Assessment in Multimodal Large Language Models | Qingni Wang et.al. | 2410.08174 | null |
| 2024-10-10 | On the Evaluation of Generative Robotic Simulations | Feng Chen et.al. | 2410.08172 | null |
| 2024-10-10 | Agent S: An Open Agentic Framework that Uses Computers Like a Human | Saaket Agashe et.al. | 2410.08164 | link |
| 2024-10-10 | Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning | Amrith Setlur et.al. | 2410.08146 | null |
| 2024-10-10 | Insight Over Sight? Exploring the Vision-Knowledge Conflicts in Multimodal LLMs | Xiaoyuan Liu et.al. | 2410.08145 | null |
| 2024-10-09 | Astute RAG: Overcoming Imperfect Retrieval Augmentation and Knowledge Conflicts for Large Language Models | Fei Wang et.al. | 2410.07176 | null |
| 2024-10-09 | Do better language models have crisper vision? | Jona Ruthardt et.al. | 2410.07173 | null |
| 2024-10-09 | Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate | Qidong Huang et.al. | 2410.07167 | link |
| 2024-10-09 | Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making | Manling Li et.al. | 2410.07166 | link |
| 2024-10-09 | Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning | Chongyu Fan et.al. | 2410.07163 | null |
| 2024-10-09 | Trans4D: Realistic Geometry-Aware Transition for Compositional Text-to-4D Synthesis | Bohan Zeng et.al. | 2410.07155 | link |
| 2024-10-09 | Mental Disorders Detection in the Era of Large Language Models | Gleb Kuzmin et.al. | 2410.07129 | null |
| 2024-10-09 | Personalized Visual Instruction Tuning | Renjie Pi et.al. | 2410.07113 | null |
| 2024-10-09 | I Want to Break Free! Anti-Social Behavior and Persuasion Ability of LLMs in Multi-Agent Settings with Social Hierarchy | Gian Maria Campedelli et.al. | 2410.07109 | null |
| 2024-10-09 | Unleashing Multi-Hop Reasoning Potential in Large Language Models through Repetition of Misordered Context | Sangwon Yu et.al. | 2410.07103 | null |
| 2024-10-07 | Data Advisor: Dynamic Data Curation for Safety Alignment of Large Language Models | Fei Wang et.al. | 2410.05269 | null |
| 2024-10-07 | PrefixQuant: Static Quantization Beats Dynamic through Prefixed Outliers in LLMs | Mengzhao Chen et.al. | 2410.05265 | link |
| 2024-10-07 | TurtleBench: Evaluating Top Language Models via Real-World Yes/No Puzzles | Qingchen Yu et.al. | 2410.05262 | link |
| 2024-10-07 | Differential Transformer | Tianzhu Ye et.al. | 2410.05258 | null |
| 2024-10-07 | GLEE: A Unified Framework and Benchmark for Language-based Economic Environments | Eilam Shapira et.al. | 2410.05254 | link |
| 2024-10-07 | Causal Micro-Narratives | Mourad Heddaya et.al. | 2410.05252 | null |
| 2024-10-07 | LoTLIP: Improving Language-Image Pre-training for Long Text Understanding | Wei Wu et.al. | 2410.05249 | null |
| 2024-10-07 | SFTMix: Elevating Language Model Instruction Tuning with Mixup Recipe | Yuxin Xiao et.al. | 2410.05248 | null |
| 2024-10-07 | Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents | Boyu Gou et.al. | 2410.05243 | null |
| 2024-10-07 | GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models | Iman Mirzadeh et.al. | 2410.05229 | null |
| 2024-10-04 | Enhance Reasoning by Learning from Mistakes: Peer-Review Knowledge Distillation from Multiple Large Language Models | Zhuochun Li et.al. | 2410.03663 | link |
| 2024-10-04 | RAFT: Realistic Attacks to Fool Text Detectors | James Wang et.al. | 2410.03658 | null |
| 2024-10-04 | Aligning LLMs with Individual Preferences via Interaction | Shujin Wu et.al. | 2410.03642 | link |
| 2024-10-04 | Large Language Model Performance Benchmarking on Mobile Platforms: A Thorough Evaluation | Jie Xiao et.al. | 2410.03613 | null |
| 2024-10-04 | TICKing All the Boxes: Generated Checklists Improve LLM Evaluation and Generation | Jonathan Cook et.al. | 2410.03608 | null |
| 2024-10-04 | Efficiently Identifying Watermarked Segments in Mixed-Source Texts | Xuandong Zhao et.al. | 2410.03600 | null |
| 2024-10-04 | Understanding Reasoning in Chain-of-Thought from the Hopfieldian View | Lijie Hu et.al. | 2410.03595 | null |
| 2024-10-04 | Explicit, Implicit, and Scattered: Revisiting Event Extraction to Capture Complex Arguments | Omar Sharif et.al. | 2410.03594 | null |
| 2024-10-04 | Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal Large Language Models | Xin Zou et.al. | 2410.03577 | null |
| 2024-10-04 | Towards Linguistically-Aware and Language-Independent Tokenization for Large Language Models (LLMs) | Abrar Rahman et.al. | 2410.03568 | null |
| 2024-10-03 | FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models | Zhipei Xu et.al. | 2410.02761 | link |
| 2024-10-03 | Loong: Generating Minute-level Long Videos with Autoregressive Language Models | Yuqing Wang et.al. | 2410.02757 | null |
| 2024-10-03 | SIEVE: General Purpose Data Filtering System Matching GPT-4o Accuracy at 1% the Cost | Jifan Zhang et.al. | 2410.02755 | null |
| 2024-10-03 | Training Language Models on Synthetic Edit Sequences Improves Code Synthesis | Ulyana Piterbarg et.al. | 2410.02749 | link |
| 2024-10-03 | CriSPO: Multi-Aspect Critique-Suggestion-guided Automatic Prompt Optimization for Text Generation | Han He et.al. | 2410.02748 | null |
| 2024-10-03 | Contrastive Localized Language-Image Pre-Training | Hong-You Chen et.al. | 2410.02746 | null |
| 2024-10-03 | Neutral residues: revisiting adapters for model extension | Franck Signe Talla et.al. | 2410.02744 | null |
| 2024-10-03 | MA-RLHF: Reinforcement Learning from Human Feedback with Macro Actions | Yekun Chai et.al. | 2410.02743 | link |
| 2024-10-03 | Grounding Large Language Models In Embodied Environment With Imperfect World Models | Haolan Liu et.al. | 2410.02742 | null |
| 2024-10-03 | Salient Information Prompting to Steer Content in Prompt-based Abstractive Summarization | Lei Xu et.al. | 2410.02741 | null |
| 2024-10-02 | Locret: Enhancing Eviction in Long-Context LLM Inference with Trained Retaining Heads | Yuxiang Huang et.al. | 2410.01805 | link |
| 2024-10-02 | Efficient $1$-bit tensor approximations | Alex W. Neal Riasanovsky et.al. | 2410.01799 | null |
| 2024-10-02 | Knowledge-Driven Feature Selection and Engineering for Genotype Data with Large Language Models | Joseph Lee et.al. | 2410.01795 | link |
| 2024-10-02 | When a language model is optimized for reasoning, does it still show embers of autoregression? An analysis of OpenAI o1 | R. Thomas McCoy et.al. | 2410.01792 | null |
| 2024-10-02 | Investigating on RLHF methodology | Alexey Kutalev et.al. | 2410.01789 | null |
| 2024-10-02 | OmniGenBench: Automating Large-scale in-silico Benchmarking for Genomic Foundation Models | Heng Yang et.al. | 2410.01784 | link |
| 2024-10-02 | Open-RAG: Enhanced Retrieval-Augmented Reasoning with Open-Source Large Language Models | Shayekh Bin Islam et.al. | 2410.01782 | null |
| 2024-10-02 | Quantifying Generalization Complexity for Large Language Models | Zhenting Qi et.al. | 2410.01769 | link |
| 2024-10-02 | LEOPARD: A Vision Language Model For Text-Rich Multi-Image Tasks | Mengzhao Jia et.al. | 2410.01744 | link |
| 2024-10-02 | VitaGlyph: Vitalizing Artistic Typography with Flexible Dual-branch Diffusion Models | Kailai Feng et.al. | 2410.01738 | link |
| 2024-09-30 | MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning | Haotian Zhang et.al. | 2409.20566 | null |
| 2024-09-30 | Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos | Md Mohaiminul Islam et.al. | 2409.20557 | null |
| 2024-09-30 | LLM Hallucinations in Practical Code Generation: Phenomena, Mechanism, and Mitigation | Ziyao Zhang et.al. | 2409.20550 | null |
| 2024-09-30 | Robi Butler: Remote Multimodal Interactions with Household Robot Assistant | Anxing Xiao et.al. | 2409.20548 | null |
| 2024-09-30 | Uncertainty-Informed Screening for Safer Solvents Used in the Synthesis of Perovskite via Language Models | Arpan Mukherjee et.al. | 2409.20512 | null |
| 2024-09-30 | COLLAGE: Collaborative Human-Agent Interaction Generation using Hierarchical Latent Diffusion and Language Models | Divyanshu Daiya et.al. | 2409.20502 | null |
| 2024-10-02 | Linear Projections of Teacher Embeddings for Few-Class Distillation | Noel Loo et.al. | 2409.20449 | null |
| 2024-10-01 | Instance-adaptive Zero-shot Chain-of-Thought Prompting | Xiaosong Yuan et.al. | 2409.20441 | null |
| 2024-09-30 | HELPD: Mitigating Hallucination of LVLMs by Hierarchical Feedback Learning with Vision-enhanced Penalty Decoding | Fan Yuan et.al. | 2409.20429 | link |
| 2024-09-30 | World to Code: Multi-modal Data Generation via Self-Instructed Compositional Captioning and Filtering | Jiacong Wang et.al. | 2409.20424 | link |
| 2024-09-27 | LML: Language Model Learning a Dataset for Data-Augmented Prediction | Praneeth Vadlapati et.al. | 2409.18957 | link |
| 2024-09-27 | Ruler: A Model-Agnostic Method to Control Generated Length for Large Language Models | Jiaming Li et.al. | 2409.18943 | link |
| 2024-09-27 | From Seconds to Hours: Reviewing MultiModal Large Language Models on Comprehensive Long Video Understanding | Heqing Zou et.al. | 2409.18938 | link |
| 2024-09-27 | AIPatient: Simulating Patients with EHRs and LLM Powered Agentic Workflow | Huizi Yu et.al. | 2409.18924 | null |
| 2024-09-27 | Soft Measures for Extracting Causal Collective Intelligence | Maryam Berijanian et.al. | 2409.18911 | link |
| 2024-09-27 | Multi-Source Hard and Soft Information Fusion Approach for Accurate Cryptocurrency Price Movement Prediction | Saeed Mohammadi Dashtaki et.al. | 2409.18895 | null |
| 2024-09-27 | HM3: Hierarchical Multi-Objective Model Merging for Pretrained Models | Yu Zhou et.al. | 2409.18893 | null |
| 2024-09-27 | IDGen: Item Discrimination Induced Prompt Generation for LLM Evaluation | Fan Lin et.al. | 2409.18892 | null |
| 2024-09-27 | Predicting and analyzing memorization within fine-tuned Large Language Models | Jérémie Dentan et.al. | 2409.18858 | null |
| 2024-09-27 | Mitigating Selection Bias with Node Pruning and Auxiliary Options | Hyeong Kyu Choi et.al. | 2409.18857 | null |
| 2024-09-26 | EgoLM: Multi-Modal Language Model of Egocentric Motions | Fangzhou Hong et.al. | 2409.18127 | null |
| 2024-09-26 | Multi-View and Multi-Scale Alignment for Contrastive Language-Image Pre-training in Mammography | Yuexi Du et.al. | 2409.18119 | link |
| 2024-09-26 | E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding | Ye Liu et.al. | 2409.18111 | link |
| 2024-09-26 | Inferring Alt-text For UI Icons With Large Language Models During App Development | Sabrina Haque et.al. | 2409.18060 | null |
| 2024-09-26 | DualAD: Dual-Layer Planning for Reasoning in Autonomous Driving | Dingrui Wang et.al. | 2409.18053 | null |
| 2024-09-26 | IFCap: Image-like Retrieval and Frequency-based Entity Filtering for Zero-shot Captioning | Soeun Lee et.al. | 2409.18046 | null |
| 2024-09-26 | Unveiling the Role of Pretraining in Direct Speech Translation | Belen Alastruey et.al. | 2409.18044 | null |
| 2024-09-26 | EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions | Kai Chen et.al. | 2409.18042 | link |
| 2024-09-26 | Compositional Hardness of Code in Large Language Models – A Probabilistic Perspective | Yotam Wolf et.al. | 2409.18028 | null |
| 2024-09-26 | An Adversarial Perspective on Machine Unlearning for AI Safety | Jakub Łucki et.al. | 2409.18025 | null |
| 2024-09-25 | Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models | Matt Deitke et.al. | 2409.17146 | link |
| 2024-09-25 | Attention Prompting on Image for Large Vision-Language Models | Runpeng Yu et.al. | 2409.17143 | link |
| 2024-09-25 | FineZip: Pushing the Limits of Large Language Models for Practical Lossless Text Compression | Fazal Mittu et.al. | 2409.17141 | link |
| 2024-09-25 | Turn Every Application into an Agent: Towards Efficient Human-Agent-Computer Interaction with API-First LLM-Based Agents | Junting Lu et.al. | 2409.17140 | null |
| 2024-09-25 | Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale | Fan Zhou et.al. | 2409.17115 | link |
| 2024-09-25 | Accumulator-Aware Post-Training Quantization | Ian Colbert et.al. | 2409.17092 | null |
| 2024-09-25 | VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models | Yifei Liu et.al. | 2409.17066 | link |
| 2024-09-25 | Using LLM for Real-Time Transcription and Summarization of Doctor-Patient Interactions into ePuskesmas in Indonesia | Azmul Asmar Irfan et.al. | 2409.17054 | null |
| 2024-09-25 | How to Connect Speech Foundation Models and Large Language Models? What Matters and What Does Not | Francesco Verdini et.al. | 2409.17044 | null |
| 2024-09-25 | Counterfactual Token Generation in Large Language Models | Ivi Chatzi et.al. | 2409.17027 | link |
| 2024-09-24 | MonoFormer: One Transformer for Both Diffusion and Autoregression | Chuyang Zhao et.al. | 2409.16280 | link |
| 2024-09-24 | A fast and sound tagging method for discontinuous named-entity recognition | Caio Corro et.al. | 2409.16243 | null |
| 2024-09-24 | LLM Echo Chamber: personalized and automated disinformation | Tony Ma et.al. | 2409.16241 | link |
| 2024-09-24 | Towards Enhancing Linked Data Retrieval in Conversational UIs using Large Language Models | Omar Mussa et.al. | 2409.16220 | null |
| 2024-09-24 | LLMCount: Enhancing Stationary mmWave Detection with Multimodal-LLM | Boyan Li et.al. | 2409.16209 | null |
| 2024-09-25 | CJEval: A Benchmark for Assessing Large Language Models Using Chinese Junior High School Exam Data | Qian-Wen Zhang et.al. | 2409.16202 | link |
| 2024-09-24 | HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models | Haoran Que et.al. | 2409.16191 | link |
| 2024-09-24 | Expert-level vision-language foundation model for real-world radiology and comprehensive evaluation | Xiaohong Liu et.al. | 2409.16183 | null |
| 2024-09-24 | Cyber Knowledge Completion Using Large Language Models | Braden K Webb et.al. | 2409.16176 | null |
| 2024-09-24 | Merging LoRAs like Playing LEGO: Pushing the Modularity of LoRA to Extremes Through Rank-Wise Clustering | Ziyu Zhao et.al. | 2409.16167 | null |
| 2024-09-20 | Gender Representation and Bias in Indian Civil Service Mock Interviews | Somonnoy Banerjee et.al. | 2409.12194 | null |
| 2024-09-18 | To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning | Zayne Sprague et.al. | 2409.12183 | link |
| 2024-09-18 | Finetuning Language Models to Emit Linguistic Expressions of Uncertainty | Arslan Chaudhry et.al. | 2409.12180 | null |
| 2024-09-18 | Decoding Style: Efficient Fine-Tuning of LLMs for Image-Guided Outfit Recommendation with Preference | Najmeh Forouzandehmehr et.al. | 2409.12150 | null |
| 2024-09-18 | MAgICoRe: Multi-Agent, Iterative, Coarse-to-Fine Refinement for Reasoning | Justin Chih-Yao Chen et.al. | 2409.12147 | link |
| 2024-09-18 | Experimental Evidence That Conversational Artificial Intelligence Can Steer Consumer Behavior Without Detection | Tobias Werner et.al. | 2409.12143 | null |
| 2024-09-18 | MoRAG – Multi-Fusion Retrieval Augmented Generation for Human Motion | Kalakonda Sai Shashank et.al. | 2409.12140 | link |
| 2024-09-24 | Takin: A Cohort of Superior Quality Zero-shot Speech Generation Models | Sijing Chen et.al. | 2409.12139 | null |
| 2024-09-18 | Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement | An Yang et.al. | 2409.12122 | null |
| 2024-09-18 | Low Frame-rate Speech Codec: a Codec Designed for Fast High-quality Speech LLM Training and Inference | Edresson Casanova et.al. | 2409.12117 | null |
| 2024-09-17 | AraDiCE: Benchmarks for Dialectal and Cultural Capabilities in LLMs | Basel Mousi et.al. | 2409.11404 | null |
| 2024-09-17 | NVLM: Open Frontier-Class Multimodal LLMs | Wenliang Dai et.al. | 2409.11402 | null |
| 2024-09-17 | Says Who? Effective Zero-Shot Annotation of Focalization | Rebecca M. M. Hicke et.al. | 2409.11390 | null |
| 2024-09-17 | Diversify and Conquer: Diversity-Centric Data Selection with Iterative Refinement | Simon Yu et.al. | 2409.11378 | link |
| 2024-09-17 | Towards Time Series Reasoning with LLMs | Winnie Chow et.al. | 2409.11376 | null |
| 2024-09-17 | Multi-OCT-SelfNet: Integrating Self-Supervised Learning with Multi-Source Data Fusion for Enhanced Multi-Class Retinal Disease Classification | Fatema-E-Jannat et.al. | 2409.11375 | null |
| 2024-09-17 | CoCA: Regaining Safety-awareness of Multimodal Large Language Models with Constitutional Calibration | Jiahui Gao et.al. | 2409.11365 | null |
| 2024-09-17 | AI Suggestions Homogenize Writing Toward Western Styles and Diminish Cultural Nuances | Dhruv Agarwal et.al. | 2409.11360 | null |
| 2024-09-17 | THaMES: An End-to-End Tool for Hallucination Mitigation and Evaluation in Large Language Models | Mengfei Liang et.al. | 2409.11353 | null |
| 2024-09-18 | Zero-resource Hallucination Detection for Text Generation via Graph-based Contextual Knowledge Triples Modeling | Xinyue Fang et.al. | 2409.11283 | null |
| 2024-09-16 | RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval | Di Liu et.al. | 2409.10516 | null |
| 2024-09-16 | Context-aware Code Segmentation for C-to-Rust Translation using Large Language Models | Momoko Shiraishi et.al. | 2409.10506 | null |
| 2024-09-16 | DILA: Dictionary Label Attention for Mechanistic Interpretability in High-dimensional Multi-label Medical Coding Prediction | John Wu et.al. | 2409.10504 | null |
| 2024-09-16 | Causal Language Modeling Can Elicit Search and Reasoning Capabilities on Logic Puzzles | Kulin Shah et.al. | 2409.10502 | link |
| 2024-09-16 | Code Vulnerability Detection: A Comparative Analysis of Emerging Large Language Models | Shaznin Sultana et.al. | 2409.10490 | null |
| 2024-09-16 | XLM for Autonomous Driving Systems: A Comprehensive Review | Sonda Fourati et.al. | 2409.10484 | null |
| 2024-09-16 | Schrodinger’s Memory: Large Language Models | Wei Wang et.al. | 2409.10482 | null |
| 2024-09-16 | LLM as BT-Planner: Leveraging LLMs for Behavior Tree Generation in Robot Task Planning | Jicong Ao et.al. | 2409.10444 | link |
| 2024-09-16 | A Large-Scale Privacy Assessment of Android Third-Party SDKs | Mark Huasong Meng et.al. | 2409.10411 | null |
| 2024-09-17 | Learnings from a Large-Scale Deployment of an LLM-Powered Expert-in-the-Loop Healthcare Chatbot | Bhuvan Sachdeva et.al. | 2409.10354 | null |
| 2024-09-13 | Agents in Software Engineering: Survey, Landscape, and Vision | Yanxian Huang et.al. | 2409.09030 | link |
| 2024-09-13 | Contri(e)ve: Context + Retrieve for Scholarly Question Answering | Kanchan Shivashankar et.al. | 2409.09010 | null |
| 2024-09-13 | Safeguarding Decentralized Social Media: LLM Agents for Automating Community Rule Compliance | Lucio La Cava et.al. | 2409.08963 | null |
| 2024-09-13 | Emerging Reliance Behaviors in Human-AI Text Generation: Hallucinations, Data Quality Assessment, and Cognitive Forcing Functions | Zahra Ashktorab et.al. | 2409.08937 | null |
| 2024-09-13 | SynSUM – Synthetic Benchmark with Structured and Unstructured Medical Records | Paloma Rabaey et.al. | 2409.08936 | link |
| 2024-09-13 | LLM-based Weak Supervision Framework for Query Intent Classification in Video Search | Farnoosh Javadi et.al. | 2409.08931 | null |
| 2024-09-13 | AnyBipe: An End-to-End Framework for Training and Deploying Bipedal Robots Guided by Large Language Models | Yifei Yao et.al. | 2409.08904 | null |
| 2024-09-13 | A Market for Lemons? Strategic Directions for a Vigilant Application of Artificial Intelligence in Entrepreneurship Research | Martin Obschonka et.al. | 2409.08890 | null |
| 2024-09-13 | Exploring Graph Structure Comprehension Ability of Multimodal Large Language Models: Case Studies | Zhiqiang Zhong et.al. | 2409.08864 | null |
| 2024-09-13 | FP-VEC: Fingerprinting Large Language Models via Efficient Vector Addition | Zhenhua Xu et.al. | 2409.08846 | null |
| 2024-09-12 | DreamHOI: Subject-Driven Generation of 3D Human-Object Interactions with Diffusion Priors | Thomas Hanwen Zhu et.al. | 2409.08278 | null |
| 2024-09-12 | Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale | Rogerio Bonatti et.al. | 2409.08264 | link |
| 2024-09-12 | OmniQuery: Contextually Augmenting Captured Multimodal Memory to Enable Personal Question Answering | Jiahao Nick Li et.al. | 2409.08250 | null |
| 2024-09-12 | Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources | Alisia Lupidi et.al. | 2409.08239 | null |
| 2024-09-12 | LLM Honeypot: Leveraging Large Language Models as Advanced Interactive Honeypot Systems | Hakan T. Otal et.al. | 2409.08234 | link |
| 2024-09-12 | What Makes a Maze Look Like a Maze? | Joy Hsu et.al. | 2409.08202 | null |
| 2024-09-12 | Fine-tuning Large Language Models for Entity Matching | Aaron Steiner et.al. | 2409.08185 | link |
| 2024-09-12 | Faster Speech-LLaMA Inference with Multi-token Prediction | Desh Raj et.al. | 2409.08148 | null |
| 2024-09-12 | LLM-POTUS Score: A Framework of Analyzing Presidential Debates with Large Language Models | Zhengliang Liu et.al. | 2409.08147 | null |
| 2024-09-12 | WhisperNER: Unified Open Named Entity and Speech Recognition | Gil Ayache et.al. | 2409.08107 | null |
| 2024-09-11 | “My Grade is Wrong!”: A Contestable AI Framework for Interactive Feedback in Evaluating Student Essays | Shengxin Hong et.al. | 2409.07453 | null |
| 2024-09-11 | SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories | Ben Bogin et.al. | 2409.07440 | link |
| 2024-09-11 | CLNX: Bridging Code and Natural Language for C/C++ Vulnerability-Contributing Commits Identification | Zeqing Qin et.al. | 2409.07407 | null |
| 2024-09-11 | AdaCAD: Adaptively Decoding to Balance Conflicts between Contextual and Parametric Knowledge | Han Wang et.al. | 2409.07394 | link |
| 2024-09-11 | Recent Trends of Multimodal Affective Computing: A Survey from NLP Perspective | Guimin Hu et.al. | 2409.07388 | null |
| 2024-09-11 | Demo: SGCode: A Flexible Prompt-Optimizing System for Secure Generation of Code | Khiem Ton et.al. | 2409.07368 | null |
| 2024-09-11 | Think Together and Work Better: Combining Humans’ and LLMs’ Think-Aloud Outcomes for Effective Text Evaluation | SeongYeub Chu et.al. | 2409.07355 | link |
| 2024-09-11 | Securing Vision-Language Models with a Robust Encoder Against Jailbreak and Adversarial Attacks | Md Zarif Hossain et.al. | 2409.07353 | link |
| 2024-09-11 | Learning to Compress Contexts for Efficient Knowledge-based Visual Question Answering | Weixi Weng et.al. | 2409.07331 | null |
| 2024-09-11 | MEDIC: Towards a Comprehensive Framework for Evaluating LLMs in Clinical Applications | Praveen K Kanithi et.al. | 2409.07314 | null |
| 2024-09-10 | E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning | Zihan Liao et.al. | 2409.06679 | link |
| 2024-09-10 | LLaMA-Omni: Seamless Speech Interaction with Large Language Models | Qingkai Fang et.al. | 2409.06666 | link |
| 2024-09-10 | Human Perception of LLM-generated Text Content in Social Media Environments | Kristina Radivojevic et.al. | 2409.06653 | null |
| 2024-09-10 | Optimal Workload Placement on Multi-Instance GPUs | Bekir Turkkan et.al. | 2409.06646 | null |
| 2024-09-10 | EyeCLIP: A visual-language foundation model for multi-modal ophthalmic image analysis | Danli Shi et.al. | 2409.06644 | null |
| 2024-09-10 | MoWE-Audio: Multitask AudioLLMs with Mixture of Weak Encoders | Wenyu Zhang et.al. | 2409.06635 | null |
| 2024-09-10 | A Practice of Post-Training on Llama-3 70B with Optimal Selection of Additional Language Mixture Ratio | Ningyuan Xi et.al. | 2409.06624 | null |
| 2024-09-10 | Alleviating Hallucinations in Large Language Models with Scepticism Modeling | Yetao Wu et.al. | 2409.06601 | null |
| 2024-09-10 | GroUSE: A Benchmark to Evaluate Evaluators in Grounded Question Answering | Sacha Muller et.al. | 2409.06595 | link |
| 2024-09-10 | MAPS: Energy-Reliability Tradeoff Management in Autonomous Vehicles Through LLMs Penetrated Science | Mahdieh Aliazam et.al. | 2409.06558 | null |
| 2024-09-09 | MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct | Run Luo et.al. | 2409.05840 | null |
| 2024-09-09 | Are Large Language Models a Threat to Programming Platforms? An Exploratory Study | Md Mustakim Billah et.al. | 2409.05824 | null |
| 2024-09-09 | Benchmarking Chinese Knowledge Rectification in Large Language Models | Tianhe Lu et.al. | 2409.05806 | link |
| 2024-09-09 | Breaking Neural Network Scaling Laws with Modularity | Akhilan Boopathy et.al. | 2409.05780 | null |
| 2024-09-09 | Evidence from fMRI Supports a Two-Phase Abstraction Process in Language Models | Emily Cheng et.al. | 2409.05771 | null |
| 2024-09-09 | Model Input Verification of Large Scale Simulations | Rumyana Neykova et.al. | 2409.05768 | null |
| 2024-09-09 | A Novel Idea Generation Tool using a Structured Conversational AI (CAI) System | B. Sankar et.al. | 2409.05747 | null |
| 2024-09-09 | LLMs Will Always Hallucinate, and We Need to Live With This | Sourav Banerjee et.al. | 2409.05746 | null |
| 2024-09-09 | A System and Benchmark for LLM-based Q&A on Heterogeneous Data | Achille Fokoue et.al. | 2409.05735 | null |
| 2024-09-09 | Towards Democratizing Multilingual Large Language Models For Medicine Through A Two-Stage Instruction Fine-tuning Approach | Meng Zhou et.al. | 2409.05732 | link |
| 2024-09-06 | RLPF: Reinforcement Learning from Prediction Feedback for User Summarization with LLMs | Jiaxing Wu et.al. | 2409.04421 | null |
| 2024-09-06 | Question-Answering Dense Video Events | Hangyu Qin et.al. | 2409.04388 | null |
| 2024-09-06 | Learning vs Retrieval: The Role of In-Context Examples in Regression with LLMs | Aliakbar Nafar et.al. | 2409.04318 | null |
| 2024-09-06 | An optically accelerated extreme learning machine using hot atomic vapors | Pierre Azam et.al. | 2409.04312 | null |
| 2024-09-06 | Using Large Language Models to Generate Authentic Multi-agent Knowledge Work Datasets | Desiree Heim et.al. | 2409.04286 | null |
| 2024-09-06 | Advancing Automated Knowledge Transfer in Evolutionary Multitasking via Large Language Models | Yuxiao Huang et.al. | 2409.04270 | null |
| 2024-09-06 | GALLa: Graph Aligned Large Language Models for Improved Source Code Understanding | Ziyin Zhang et.al. | 2409.04183 | null |
| 2024-09-06 | Combining LLMs and Knowledge Graphs to Reduce Hallucinations in Question Answering | Larissa Pusch et.al. | 2409.04181 | null |
| 2024-09-06 | From Calculation to Adjudication: Examining LLM judges on Mathematical Reasoning Tasks | Andreas Stephan et.al. | 2409.04168 | null |
| 2024-09-06 | Can OpenSource beat ChatGPT? – A Comparative Study of Large Language Models for Text-to-Code Generation | Luis Mayer et.al. | 2409.04164 | null |
| 2024-09-05 | Attention Heads of Large Language Models: A Survey | Zifan Zheng et.al. | 2409.03752 | link |
| 2024-09-05 | LLM-CI: Assessing Contextual Integrity Norms in Language Models | Yan Shvartzshnaider et.al. | 2409.03735 | null |
| 2024-09-05 | Safety vs. Performance: How Multi-Objective Learning Reduces Barriers to Market Entry | Meena Jagadeesan et.al. | 2409.03734 | null |
| 2024-09-05 | Planning In Natural Language Improves LLM Search For Code Generation | Evan Wang et.al. | 2409.03733 | null |
| 2024-09-05 | RAG based Question-Answering for Contextual Response Prediction System | Sriram Veturi et.al. | 2409.03708 | null |
| 2024-09-05 | TRACE-cs: Trustworthy Reasoning for Contrastive Explanations in Course Scheduling Problems | Stylianos Loukas Vasileiou et.al. | 2409.03671 | null |
| 2024-09-05 | A Fused Large Language Model for Predicting Startup Success | Abdurahman Maarouf et.al. | 2409.03668 | null |
| 2024-09-05 | The representation landscape of few-shot learning and fine-tuning in large language models | Diego Doimo et.al. | 2409.03662 | link |
| 2024-09-06 | LLM-based multi-agent poetry generation in non-cooperative environments | Ran Zhang et.al. | 2409.03659 | link |
| 2024-09-05 | From MOOC to MAIC: Reshaping Online Teaching and Learning through LLM-driven Agents | Jifan Yu et.al. | 2409.03512 | null |
| 2024-09-04 | RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins (early version) | Yao Mu et.al. | 2409.02920 | null |
| 2024-09-05 | LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA | Jiajie Zhang et.al. | 2409.02897 | link |
| 2024-09-04 | LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture | Xidong Wang et.al. | 2409.02889 | link |
| 2024-09-04 | Historical German Text Normalization Using Type- and Token-Based Language Modeling | Anton Ehrmanntraut et.al. | 2409.02841 | null |
| 2024-09-04 | Exploring Sentiment Dynamics and Predictive Behaviors in Cryptocurrency Discussions by Few-Shot Learning with Large Language Models | Moein Shahiki Tash et.al. | 2409.02836 | null |
| 2024-09-04 | CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models | Wentao Liu et.al. | 2409.02834 | link |
| 2024-09-04 | ExpLLM: Towards Chain of Thought for Facial Expression Recognition | Xing Lan et.al. | 2409.02828 | link |
| 2024-09-04 | Design Contradictions: Help or Hindrance? | Aron E. Owen et.al. | 2409.02823 | null |
| 2024-09-04 | Language Understanding as a Constraint on Consensus Size in LLM Societies | Giordano De Marzo et.al. | 2409.02822 | null |
| 2024-09-04 | Towards a Unified View of Preference Learning for Large Language Models: A Survey | Bofei Gao et.al. | 2409.02795 | null |
| 2024-08-30 | SYNTHEVAL: Hybrid Behavioral Testing of NLP Models with Synthetic CheckLists | Raoyuan Zhao et.al. | 2408.17437 | link |
| 2024-08-30 | Advancing Multi-talker ASR Performance with Large Language Models | Mohan Shi et.al. | 2408.17431 | null |
| 2024-08-30 | CLOCR-C: Context Leveraging OCR Correction with Pre-trained Language Models | Jonathan Bourne et.al. | 2408.17428 | null |
| 2024-08-30 | Getting Inspiration for Feature Elicitation: App Store- vs. LLM-based Approach | Jialiang Wei et.al. | 2408.17404 | link |
| 2024-08-30 | NDP: Next Distribution Prediction as a More Broad Target | Junhao Ruan et.al. | 2408.17377 | null |
| 2024-08-30 | Look, Learn and Leverage (L$^3$): Mitigating Visual-Domain Shift and Discovering Intrinsic Relations via Symbolic Alignment | Hanchen Xie et.al. | 2408.17363 | null |
| 2024-08-30 | Assessing Generative Language Models in Classification Tasks: Performance and Self-Evaluation Capabilities in the Environmental and Climate Change Domain | Francesca Grasso et.al. | 2408.17362 | link |
| 2024-08-30 | Forget to Flourish: Leveraging Machine-Unlearning on Pretrained Language Models for Privacy Leakage | Md Rafi Ur Rashid et.al. | 2408.17354 | null |
| 2024-08-30 | Bridging Domain Knowledge and Process Discovery Using Large Language Models | Ali Norouzifar et.al. | 2408.17316 | link |
| 2024-08-30 | Flexible and Effective Mixing of Large Language Models into a Mixture of Domain Experts | Rhui Dih Lee et.al. | 2408.17280 | null |
| 2024-08-29 | How Far Can Cantonese NLP Go? Benchmarking Cantonese Capabilities of Large Language Models | Jiyue Jiang et.al. | 2408.16756 | link |
| 2024-08-29 | Reinforcement Learning without Human Feedback for Last Mile Fine-Tuning of Large Language Models | Alec Solway et.al. | 2408.16753 | null |
| 2024-08-29 | Assessing Large Language Models for Online Extremism Research: Identification, Explanation, and New Knowledge | Beidi Dong et.al. | 2408.16749 | null |
| 2024-08-29 | Theoretical and Methodological Framework for Studying Texts Produced by Large Language Models | Jiří Milička et.al. | 2408.16740 | null |
| 2024-08-29 | GradBias: Unveiling Word Influence on Bias in Text-to-Image Generative Models | Moreno D’Incà et.al. | 2408.16700 | link |
| 2024-08-29 | Entropic Distribution Matching in Supervised Fine-tuning of LLMs: Less Overfitting and Better Diversity | Ziniu Li et.al. | 2408.16673 | null |
| 2024-08-29 | Examination of Code generated by Large Language Models | Robin Beer et.al. | 2408.16601 | link |
| 2024-08-29 | Enhancing Dialogue Generation in Werewolf Game Through Situation Analysis and Persuasion Strategies | Zhiyang Qi et.al. | 2408.16586 | null |
| 2024-08-29 | CNIMA: A Universal Evaluation Framework and Automated Approach for Assessing Second Language Dialogues | Rena Gao et.al. | 2408.16518 | null |
| 2024-08-29 | LLMs vs Established Text Augmentation Techniques for Classification: When do the Benefits Outweight the Costs? | Jan Cegin et.al. | 2408.16502 | null |
| 2024-08-28 | Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders | Min Shi et.al. | 2408.15998 | link |
| 2024-08-28 | BattleAgentBench: A Benchmark for Evaluating Cooperation and Competition Capabilities of Language Models in Multi-Agent Systems | Wei Wang et.al. | 2408.15971 | null |
| 2024-08-28 | More Text, Less Point: Towards 3D Data-Efficient Point-Language Understanding | Yuan Tang et.al. | 2408.15966 | link |
| 2024-08-28 | Atari-GPT: Investigating the Capabilities of Multimodal Large Language Models as Low-Level Policies for Atari Games | Nicholas R. Waytowich et.al. | 2408.15950 | null |
| 2024-08-28 | Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models | Yuncheng Yang et.al. | 2408.15915 | link |
| 2024-08-28 | Decentralized LLM Inference over Edge Networks with Energy Harvesting | Aria Khoshsirat et.al. | 2408.15907 | null |
| 2024-08-28 | LLM-Based Multi-Hop Question Answering with Knowledge Graph Integration in Evolving Environments | Ruirui Chen et.al. | 2408.15903 | null |
| 2024-08-28 | Nexus: Specialization meets Adaptability for Efficiently Training Mixture of Experts | Nikolas Gritsch et.al. | 2408.15901 | null |
| 2024-08-28 | Bias in LLMs as Annotators: The Effect of Party Cues on Labelling Decision by Large Language Models | Sebastian Vallejo Vera et.al. | 2408.15895 | null |
| 2024-08-28 | Persuasion Games using Large Language Models | Ganesh Prasath Ramani et.al. | 2408.15879 | null |
| 2024-08-27 | Generative Verifiers: Reward Modeling as Next-Token Prediction | Lunjun Zhang et.al. | 2408.15240 | null |
| 2024-08-27 | LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet | Nathaniel Li et.al. | 2408.15221 | null |
| 2024-08-27 | Investigating Coverage Criteria in Large Language Models: An In-Depth Study Through Jailbreak Attacks | Shide Zhou et.al. | 2408.15207 | null |
| 2024-08-27 | Leveraging Hallucinations to Reduce Manual Prompt Dependency in Promptable Segmentation | Jian Hu et.al. | 2408.15205 | link |
| 2024-08-27 | Can Unconfident LLM Annotations Be Used for Confident Conclusions? | Kristina Gligorić et.al. | 2408.15204 | link |
| 2024-08-27 | Unlocking Potential in Pre-Trained Music Language Models for Versatile Multi-Track Music Arrangement | Longshen Ou et.al. | 2408.15176 | null |
| 2024-08-27 | X-Reflect: Cross-Reflection Prompting for Multimodal Recommendation | Hanjia Lyu et.al. | 2408.15172 | null |
| 2024-08-27 | Measuring text summarization factuality using atomic facts entailment metrics in the context of retrieval augmented generation | N. E. Kriman et.al. | 2408.15171 | null |
| 2024-08-27 | BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Competitive Large Language Model Baseline | Guosheng Dong et.al. | 2408.15079 | null |
| 2024-08-27 | Constraining Participation: Affordances of Feedback Features in Interfaces to Large Language Models | Ned Cooper et.al. | 2408.15066 | null |
| 2024-08-27 | Step-by-Step Unmasking for Parameter-Efficient Fine-tuning of Large Language Models | Aradhye Agarwal et.al. | 2408.14470 | null |
| 2024-08-26 | Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos | Qirui Chen et.al. | 2408.14469 | link |
| 2024-08-26 | Explicit Inductive Inference using Large Language Models | Tianyang Liu et.al. | 2408.14467 | null |
| 2024-08-26 | Evaluating Large Language Models on Spatial Tasks: A Multi-Task Benchmarking Study | Liuchang Xu et.al. | 2408.14438 | null |
| 2024-08-26 | CHARTOM: A Visual Theory-of-Mind Benchmark for Multimodal Large Language Models | Shubham Bharti et.al. | 2408.14419 | null |
| 2024-08-26 | MEDSAGE: Enhancing Robustness of Medical Dialogue Summarization to ASR Errors with LLM-generated Synthetic Dialogues | Kuluhan Binici et.al. | 2408.14418 | null |
| 2024-08-26 | Language-specific Calibration for Pruning Multilingual Language Models | Simon Kurz et.al. | 2408.14398 | null |
| 2024-08-26 | Reprogramming Foundational Large Language Models(LLMs) for Enterprise Adoption for Spatio-Temporal Forecasting Applications: Unveiling a New Era in Copilot-Guided Cross-Modal Time Series Representation Learning | Sakhinana Sagar Srinivas et.al. | 2408.14387 | null |
| 2024-08-26 | Probing Causality Manipulation of Large Language Models | Chenyang Zhang et.al. | 2408.14380 | link |
| 2024-08-26 | SWE-bench-java: A GitHub Issue Resolving Benchmark for Java | Daoguang Zan et.al. | 2408.14354 | link |
| 2024-08-23 | MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? | Yi-Fan Zhang et.al. | 2408.13257 | null |
| 2024-08-23 | Domain-specific long text classification from sparse relevant information | Célia D’Cruz et.al. | 2408.13253 | null |
| 2024-08-23 | Foundational Model for Electron Micrograph Analysis: Instruction-Tuning Small-Scale Language-and-Vision Assistant for Enterprise Adoption | Sakhinana Sagar Srinivas et.al. | 2408.13248 | null |
| 2024-08-23 | Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time | Yingyu Liang et.al. | 2408.13233 | null |
| 2024-08-23 | EUR-USD Exchange Rate Forecasting Based on Information Fusion with Large Language Models and Deep Learning Methods | Hongcheng Ding et.al. | 2408.13214 | null |
| 2024-08-23 | DOMAINEVAL: An Auto-Constructed Benchmark for Multi-Domain Code Generation | Qiming Zhu et.al. | 2408.13204 | null |
| 2024-08-23 | Instruct-DeBERTa: A Hybrid Approach for Aspect-based Sentiment Analysis on Textual Reviews | Dineth Jayakody et.al. | 2408.13202 | null |
| 2024-08-23 | Can LLM be a Good Path Planner based on Prompt Engineering? Mitigating the Hallucination for Path Planning | Hourui Deng et.al. | 2408.13184 | null |
| 2024-08-23 | IntelliCare: Improving Healthcare Analysis with Variance-Controlled Patient-Level Knowledge from Large Language Models | Zhihao Yu et.al. | 2408.13073 | null |
| 2024-08-23 | Guiding IoT-Based Healthcare Alert Systems with Large Language Models | Yulan Gao et.al. | 2408.13071 | null |
| 2024-08-22 | Controllable Text Generation for Large Language Models: A Survey | Xun Liang et.al. | 2408.12599 | link |
| 2024-08-22 | RuleAlign: Making Large Language Models Better Physicians with Diagnostic Rule Alignment | Xiaohan Wang et.al. | 2408.12579 | null |
| 2024-08-22 | Jamba-1.5: Hybrid Transformer-Mamba Models at Scale | Jamba Team et.al. | 2408.12570 | link |
| 2024-08-22 | ssProp: Energy-Efficient Training for Convolutional Neural Networks with Scheduled Sparse Back Propagation | Lujia Zhong et.al. | 2408.12561 | link |
| 2024-08-22 | Towards Evaluating and Building Versatile Large Language Models for Medicine | Chaoyi Wu et.al. | 2408.12547 | link |
| 2024-08-22 | Show-o: One Single Transformer to Unify Multimodal Understanding and Generation | Jinheng Xie et.al. | 2408.12528 | link |
| 2024-08-22 | MEDCO: Medical Education Copilots Based on A Multi-Agent Framework | Hao Wei et.al. | 2408.12496 | null |
| 2024-08-22 | GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models | Kunsheng Tang et.al. | 2408.12494 | link |
| 2024-08-22 | Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese | Khang T. Doan et.al. | 2408.12480 | null |
| 2024-08-22 | Frame Order Matters: A Temporal Sequence-Aware Model for Few-Shot Action Recognition | Bozheng Li et.al. | 2408.12475 | null |
| 2024-08-21 | SEA: Supervised Embedding Alignment for Token-Level Visual-Textual Integration in MLLMs | Yuanyang Yin et.al. | 2408.11813 | null |
| 2024-08-21 | Story3D-Agent: Exploring 3D Storytelling Visualization with Large Language Models | Yuzhou Huang et.al. | 2408.11801 | null |
| 2024-08-21 | PermitQA: A Benchmark for Retrieval Augmented Generation in Wind Siting and Permitting domain | Rounak Meyur et.al. | 2408.11800 | null |
| 2024-08-21 | EE-MLLM: A Data-Efficient and Compute-Efficient Multimodal Large Language Model | Feipeng Ma et.al. | 2408.11795 | null |
| 2024-08-21 | Leveraging Chemistry Foundation Models to Facilitate Structure Focused Retrieval Augmented Generation in Multi-Agent Workflows for Catalyst and Materials Design | Nathaniel H. Park et.al. | 2408.11793 | null |
| 2024-08-21 | Critique-out-Loud Reward Models | Zachary Ankner et.al. | 2408.11791 | link |
| 2024-08-21 | DreamFactory: Pioneering Multi-Scene Long Video Generation with a Multi-Agent Framework | Zhifei Xie et.al. | 2408.11788 | null |
| 2024-08-21 | Personality Alignment of Large Language Models | Minjun Zhu et.al. | 2408.11779 | link |
| 2024-08-21 | Leveraging Fine-Tuned Retrieval-Augmented Generation with Long-Context Support: For 3GPP Standards | Omar Erak et.al. | 2408.11775 | link |
| 2024-08-21 | Against All Odds: Overcoming Typology, Script, and Language Confusion in Multilingual Embedding Inversion Attacks | Yiyi Chen et.al. | 2408.11749 | null |
| 2024-08-20 | Revisiting VerilogEval: Newer LLMs, In-Context Learning, and Specification-to-RTL Tasks | Nathaniel Pinckney et.al. | 2408.11053 | null |
| 2024-08-20 | FLAME: Learning to Navigate with Multimodal LLM in Urban Environments | Yunzhe Xu et.al. | 2408.11051 | link |
| 2024-08-20 | MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding | Jian Chen et.al. | 2408.11049 | link |
| 2024-08-20 | Reconciling Methodological Paradigms: Employing Large Language Models as Novice Qualitative Research Assistants in Talent Management Research | Sreyoshi Bhaduri et.al. | 2408.11043 | null |
| 2024-08-20 | Scaling Law with Learning Rate Annealing | Howe Tissue et.al. | 2408.11029 | null |
| 2024-08-20 | Athena: Safe Autonomous Agents with Verbal Contrastive Learning | Tanmana Sadhu et.al. | 2408.11021 | null |
| 2024-08-20 | While GitHub Copilot Excels at Coding, Does It Ensure Responsible Output? | Wen Cheng et.al. | 2408.11006 | link |
| 2024-08-20 | CTP-LLM: Clinical Trial Phase Transition Prediction Using Large Language Models | Michael Reinisch et.al. | 2408.10995 | null |
| 2024-08-20 | Dr.Academy: A Benchmark for Evaluating Questioning Capability in Education for Large Language Models | Yuyan Chen et.al. | 2408.10947 | null |
| 2024-08-20 | Large Language Model Driven Recommendation | Anton Korikov et.al. | 2408.10946 | null |
| 2024-08-19 | Demystifying the Communication Characteristics for Distributed Transformer Models | Quentin Anthony et.al. | 2408.10197 | null |
| 2024-08-19 | SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models | Anke Tang et.al. | 2408.10174 | link |
| 2024-08-19 | Customizing Language Models with Instance-wise LoRA for Sequential Recommendation | Xiaoyu Kong et.al. | 2408.10159 | null |
| 2024-08-19 | Multilingual Needle in a Haystack: Investigating Long-Context Behavior of Multilingual Large Language Models | Amey Hengle et.al. | 2408.10151 | null |
| 2024-08-19 | In-Context Learning with Representations: Contextual Generalization of Trained Transformers | Tong Yang et.al. | 2408.10147 | null |
| 2024-08-19 | Instruction Finetuning for Leaderboard Generation from Empirical AI Research | Salomon Kabongo et.al. | 2408.10141 | null |
| 2024-08-19 | Molecular Graph Representation Learning Integrating Large Language Models with Domain-specific Small Models | Tianyu Zhang et.al. | 2408.10124 | link |
| 2024-08-20 | PLUTUS: A Well Pre-trained Large Unified Transformer can Unveil Financial Time Series Regularities | Yuanjian Xu et.al. | 2408.10111 | null |
| 2024-08-19 | Recent Surge in Public Interest in Transportation: Sentiment Analysis of Baidu Apollo Go Using Weibo Data | Shiqi Wang et.al. | 2408.10088 | link |
| 2024-08-19 | ARMADA: Attribute-Based Multimodal Data Augmentation | Xiaomeng Jin et.al. | 2408.10086 | null |
| 2024-08-16 | PEDAL: Enhancing Greedy Decoding with Large Language Models using Diverse Exemplars | Sumanth Prabhu et.al. | 2408.08869 | null |
| 2024-08-16 | Visual Agents as Fast and Slow Thinkers | Guangyan Sun et.al. | 2408.08862 | null |
| 2024-08-16 | ECG-Chat: A Large ECG-Language Model for Cardiac Disease Diagnosis | Yubao Zhao et.al. | 2408.08849 | null |
| 2024-08-16 | PsychoLex: Unveiling the Psychological Mind of Large Language Models | Mohammad Amin Abbasi et.al. | 2408.08848 | null |
| 2024-08-16 | FLEXTAF: Enhancing Table Reasoning with Flexible Tabular Formats | Xuanliang Zhang et.al. | 2408.08841 | link |
| 2024-08-16 | Artificial Intelligence and Strategic Decision-Making: Evidence from Entrepreneurs and Investors | Felipe A. Csaszar et.al. | 2408.08811 | null |
| 2024-08-16 | Constructing Domain-Specific Evaluation Sets for LLM-as-a-judge | Ravi Raju et.al. | 2408.08808 | null |
| 2024-08-16 | EmoDynamiX: Emotional Support Dialogue Strategy Prediction by Modelling MiXed Emotions and Discourse Dynamics | Chenwei Wan et.al. | 2408.08782 | link |
| 2024-08-16 | Large Language Models Might Not Care What You Are Saying: Prompt Format Beats Descriptions | Chenming Tang et.al. | 2408.08780 | null |
| 2024-08-16 | DAC: Decomposed Automation Correction for Text-to-SQL | Dingzirui Wang et.al. | 2408.08779 | link |
| 2024-08-15 | Can Large Language Models Understand Symbolic Graphics Programs? | Zeju Qiu et.al. | 2408.08313 | null |
| 2024-08-15 | ScalingFilter: Assessing Data Quality through Inverse Utilization of Scaling Laws | Ruihang Li et.al. | 2408.08310 | null |
| 2024-08-15 | Benchmarking the Capabilities of Large Language Models in Transportation System Engineering: Accuracy, Consistency, and Reasoning Behaviors | Usman Syed et.al. | 2408.08302 | null |
| 2024-08-15 | HELP: Hierarchical Embeddings-based Log Parsing | Andy Xu et.al. | 2408.08300 | null |
| 2024-08-15 | The ShareLM Collection and Plugin: Contributing Human-Model Chats for the Benefit of the Community | Shachar Don-Yehiya et.al. | 2408.08291 | null |
| 2024-08-15 | Autonomous Behavior Planning For Humanoid Loco-manipulation Through Grounded Language Model | Jin Wang et.al. | 2408.08282 | null |
| 2024-08-15 | BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts | Qizhen Zhang et.al. | 2408.08274 | null |
| 2024-08-15 | DaRec: A Disentangled Alignment Framework for Large Language Model and Recommender System | Xihong Yang et.al. | 2408.08231 | null |
| 2024-08-15 | RED-CT: A Systems Design Methodology for Using LLM-labeled Data to Train and Deploy Edge Classifiers for Computational Social Science | David Farr et.al. | 2408.08217 | null |
| 2024-08-15 | Does Reasoning Emerge? Examining the Probabilities of Causation in Large Language Models | Javier González et.al. | 2408.08210 | null |
| 2024-08-14 | The Death of Schema Linking? Text-to-SQL in the Age of Well-Reasoned Language Models | Karime Maamari et.al. | 2408.07702 | null |
| 2024-08-15 | Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities | Enneng Yang et.al. | 2408.07666 | link |
| 2024-08-14 | Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models | Yi-Cheng Lin et.al. | 2408.07665 | link |
| 2024-08-14 | Alignment-Enhanced Decoding:Defending via Token-Level Adaptive Refining of Probability Distributions | Quan Liu et.al. | 2408.07663 | link |
| 2024-08-14 | WeKnow-RAG: An Adaptive Approach for Retrieval-Augmented Generation Integrating Web Search and Knowledge Graphs | Weijian Xie et.al. | 2408.07611 | null |
| 2024-08-14 | Transformers and Large Language Models for Efficient Intrusion Detection Systems: A Comprehensive Survey | Hamza Kheddar et.al. | 2408.07583 | null |
| 2024-08-15 | MathScape: Evaluating MLLMs in multimodal Math Scenarios through a Hierarchical Benchmark | Minxuan Zhou et.al. | 2408.07543 | link |
| 2024-08-14 | Usefulness of data flow diagrams and large language models for security threat validation: a registered report | Winnie Bahati Mbaka et.al. | 2408.07537 | null |
| 2024-08-14 | Development of a Multi-Agent Clinical Decision Support System for Korean Triage and Acuity Scale (KTAS)-Based Triage and Treatment Planning in Emergency Departments | Seungjun Han et.al. | 2408.07531 | null |
| 2024-08-14 | Large Language Models Know What Makes Exemplary Contexts | Quanyu Long et.al. | 2408.07505 | null |
| 2024-08-13 | Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents | Kexun Zhang et.al. | 2408.07060 | link |
| 2024-08-13 | LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs | Yushi Bai et.al. | 2408.07055 | link |
| 2024-08-13 | PathInsight: Instruction Tuning of Multimodal Datasets and Models for Intelligence Assisted Diagnosis in Histopathology | Xiaomin Wu et.al. | 2408.07037 | null |
| 2024-08-13 | Casper: Prompt Sanitization for Protecting User Privacy in Web-Based Large Language Models | Chun Jie Chong et.al. | 2408.07004 | null |
| 2024-08-13 | Generative AI for automatic topic labelling | Diego Kozlowski et.al. | 2408.07003 | null |
| 2024-08-13 | LLMs can Schedule | Henrik Abgaryan et.al. | 2408.06993 | link |
| 2024-08-13 | OpenResearcher: Unleashing AI for Accelerated Scientific Research | Yuxiang Zheng et.al. | 2408.06941 | link |
| 2024-08-13 | Evaluating Cultural Adaptability of a Large Language Model via Simulation of Synthetic Personas | Louis Kwok et.al. | 2408.06929 | null |
| 2024-08-13 | Re-TASK: Revisiting LLM Tasks from Capability, Skill, and Knowledge Perspectives | Zhihu Wang et.al. | 2408.06904 | link |
| 2024-08-13 | Leveraging Language Models for Emotion and Behavior Analysis in Education | Kaito Tanaka et.al. | 2408.06874 | null |
| 2024-08-12 | Animate, or Inanimate, That is the Question for Large Language Models | Leonardo Ranaldi et.al. | 2408.06332 | null |
| 2024-08-12 | Can We Rely on LLM Agents to Draft Long-Horizon Plans? Let’s Take TravelPlanner as an Example | Yanan Chen et.al. | 2408.06318 | null |
| 2024-08-12 | Long-Form Answers to Visual Questions from Blind and Low Vision People | Mina Huh et.al. | 2408.06303 | null |
| 2024-08-12 | The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery | Chris Lu et.al. | 2408.06292 | link |
| 2024-08-12 | MovieSum: An Abstractive Summarization Dataset for Movie Screenplays | Rohit Saxena et.al. | 2408.06281 | link |
| 2024-08-12 | Review-driven Personalized Preference Reasoning with Large Language Models for Recommendation | Jieyong Kim et.al. | 2408.06276 | null |
| 2024-08-12 | FuxiTranyu: A Multilingual Large Language Model Trained with Balanced Data | Haoran Sun et.al. | 2408.06273 | link |
| 2024-08-12 | A RAG-Based Question-Answering Solution for Cyber-Attack Investigation and Attribution | Sampath Rajapaksha et.al. | 2408.06272 | null |
| 2024-08-12 | Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment | Karel D’Oosterlinck et.al. | 2408.06266 | link |
| 2024-08-12 | On Effects of Steering Latent Representation for Large Language Model Unlearning | Dang Huu-Tien et.al. | 2408.06223 | null |
| 2024-08-10 | Preserving Privacy in Large Language Models: A Survey on Current Threats and Solutions | Michele Miranda et.al. | 2408.05212 | link |
| 2024-08-09 | VITA: Towards Open-Source Interactive Omni Multimodal LLM | Chaoyou Fu et.al. | 2408.05211 | null |
| 2024-08-09 | Evaluating the capability of large language models to personalize science texts for diverse middle-school-age learners | Michael Vaccaro Jr et.al. | 2408.05204 | null |
| 2024-08-09 | TaSL: Task Skill Localization and Consolidation for Language Model Continual Learning | Yujie Feng et.al. | 2408.05200 | null |
| 2024-08-09 | AttackER: Towards Enhancing Cyber-Attack Attribution with a Named Entity Recognition Dataset | Pritam Deka et.al. | 2408.05149 | null |
| 2024-08-09 | A Hybrid RAG System with Comprehensive Enhancement on Complex Reasoning | Ye Yuan et.al. | 2408.05141 | null |
| 2024-08-09 | Is ChatGPT a Good Software Librarian? An Exploratory Study on the Use of ChatGPT for Software Library Recommendations | Jasmine Latendresse et.al. | 2408.05128 | null |
| 2024-08-09 | Large Language Models and Thematic Analysis: Human-AI Synergy in Researching Hate Speech on Social Media | Petre Breazu et.al. | 2408.05126 | null |
| 2024-08-09 | Sportify: Question Answering with Embedded Visualizations and Personified Narratives for Sports Video | Chunggi Lee et.al. | 2408.05123 | null |
| 2024-08-09 | A Survey of NL2SQL with Large Language Models: Where are we, and where are we going? | Xinyu Liu et.al. | 2408.05109 | link |
| 2024-08-08 | Transformer Explainer: Interactive Learning of Text-Generative Models | Aeree Cho et.al. | 2408.04619 | link |
| 2024-08-08 | Better Alignment with Instruction Back-and-Forth Translation | Thao Nguyen et.al. | 2408.04614 | null |
| 2024-08-08 | Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models | Qirui Jiao et.al. | 2408.04594 | link |
| 2024-08-08 | Towards Resilient and Efficient LLMs: A Comparative Study of Efficiency, Performance, and Adversarial Robustness | Xiaojing Fan et.al. | 2408.04585 | null |
| 2024-08-08 | SCENE: Evaluating Explainable AI Techniques Using Soft Counterfactuals | Haoran Zheng et.al. | 2408.04575 | null |
| 2024-08-08 | Learning Fine-Grained Grounded Citations for Attributed Large Language Models | Lei Huang et.al. | 2408.04568 | link |
| 2024-08-08 | Bias-Aware Low-Rank Adaptation: Mitigating Catastrophic Inheritance of Large Language Models | Yupeng Chang et.al. | 2408.04556 | link |
| 2024-08-08 | Compromesso! Italian Many-Shot Jailbreaks Undermine the Safety of Large Language Models | Fabio Pernisi et.al. | 2408.04522 | null |
| 2024-08-08 | What You Need is What You Get: Theory of Mind for an LLM-Based Code Understanding Assistant | Jonan Richards et.al. | 2408.04477 | null |
| 2024-08-08 | Can LLMs Beat Humans in Debating? A Dynamic Multi-agent Framework for Competitive Debate | Yiqun Zhang et.al. | 2408.04472 | link |
| 2024-08-07 | How Well Can Vision Language Models See Image Details? | Chenhui Gou et.al. | 2408.03940 | null |
| 2024-08-07 | SLIM-RAFT: A Novel Fine-Tuning Approach to Improve Cross-Linguistic Performance for Mercosur Common Nomenclature | Vinícius Di Oliveira et.al. | 2408.03936 | null |
| 2024-08-07 | CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases | Xiangyan Liu et.al. | 2408.03910 | link |
| 2024-08-07 | Decoding Biases: Automated Methods and LLM Judges for Gender Bias Detection in Language Models | Shachi H Kumar et.al. | 2408.03907 | null |
| 2024-08-07 | From Data to Story: Towards Automatic Animated Data Video Creation with LLM-based Multi-Agent Systems | Leixian Shen et.al. | 2408.03876 | null |
| 2024-08-07 | PackMamba: Efficient Processing of Variable-Length Sequences in Mamba training | Haoran Xu et.al. | 2408.03865 | null |
| 2024-08-07 | GAIA – A Large Language Model for Advanced Power Dispatch | Yuheng Cheng et.al. | 2408.03847 | null |
| 2024-08-07 | MaxMind: A Memory Loop Network to Enhance Software Productivity based on Large Language Models | Yuchen Dong et.al. | 2408.03841 | null |
| 2024-08-07 | WalledEval: A Comprehensive Safety Evaluation Toolkit for Large Language Models | Prannaya Gupta et.al. | 2408.03837 | link |
| 2024-08-07 | Target Prompting for Information Extraction with Vision Language Model | Dipankar Medhi et.al. | 2408.03834 | null |
| 2024-08-06 | Pre-training and in-context learning IS Bayesian inference a la De Finetti | Naimeng Ye et.al. | 2408.03307 | null |
| 2024-08-06 | TextIM: Part-aware Interactive Motion Synthesis from Text | Siyuan Fan et.al. | 2408.03302 | null |
| 2024-08-06 | KaPO: Knowledge-aware Preference Optimization for Controllable Knowledge Selection in Retrieval-Augmented Language Models | Ruizhe Zhang et.al. | 2408.03297 | null |
| 2024-08-06 | AMES: Asymmetric and Memory-Efficient Similarity Estimation for Instance-level Retrieval | Pavel Suma et.al. | 2408.03282 | null |
| 2024-08-07 | StructEval: Deepen and Broaden Large Language Model Assessment via Structured Evaluation | Boxi Cao et.al. | 2408.03281 | link |
| 2024-08-06 | Synthesizing Text-to-SQL Data from Weak and Strong LLMs | Jiaxi Yang et.al. | 2408.03256 | null |
| 2024-08-06 | Unveiling Factual Recall Behaviors of Large Language Models through Knowledge Neurons | Yifei Wang et.al. | 2408.03247 | link |
| 2024-08-06 | Leveraging Parameter Efficient Training Methods for Low Resource Text Classification: A Case Study in Marathi | Pranita Deshmukh et.al. | 2408.03172 | null |
| 2024-08-06 | Conditioning LLMs with Emotion in Neural Machine Translation | Charles Brazier et.al. | 2408.03150 | null |
| 2024-08-06 | Inference Optimizations for Large Language Models: Effects, Challenges, and Practical Considerations | Leo Donisch et.al. | 2408.03130 | null |
| 2024-08-05 | Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining | Dongyang Liu et.al. | 2408.02657 | link |
| 2024-08-05 | Can Reinforcement Learning Unlock the Hidden Dangers in Aligned Large Language Models? | Mohammad Bahrami Karkevandi et.al. | 2408.02651 | null |
| 2024-08-05 | SEAS: Self-Evolving Adversarial Safety Optimization for Large Language Models | Muxi Diao et.al. | 2408.02632 | null |
| 2024-08-05 | Language Model Can Listen While Speaking | Ziyang Ma et.al. | 2408.02622 | null |
| 2024-08-05 | Progressively Selective Label Enhancement for Language Model Alignment | Biao Liu et.al. | 2408.02599 | null |
| 2024-08-05 | Modelling Visual Semantics via Image Captioning to extract Enhanced Multi-Level Cross-Modal Semantic Incongruity Representation with Attention for Multimodal Sarcasm Detection | Sajal Aggarwal et.al. | 2408.02595 | null |
| 2024-08-05 | Leveraging the Power of LLMs: A Fine-Tuning Approach for High-Quality Aspect-Based Summarization | Ankan Mullick et.al. | 2408.02584 | null |
| 2024-08-05 | Evaluating and Enhancing LLMs Agent based on Theory of Mind in Guandan: A Multi-Player Cooperative Game under Imperfect Information | Yauwai Yim et.al. | 2408.02559 | null |
| 2024-08-05 | Generative AI as a Service in 6G Edge-Cloud: Generation Task Offloading by In-context Learning | Hao Zhou et.al. | 2408.02549 | null |
| 2024-08-05 | RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation | Daniel Fleischer et.al. | 2408.02545 | link |
| 2024-08-02 | Prompt Recursive Search: A Living Framework with Adaptive Growth in LLM Auto-Prompting | Xiangyu Zhao et.al. | 2408.01423 | null |
| 2024-08-02 | Mission Impossible: A Statistical Perspective on Jailbreaking LLMs | Jingtong Su et.al. | 2408.01420 | null |
| 2024-08-02 | DebateQA: Evaluating Question Answering on Debatable Knowledge | Rongwu Xu et.al. | 2408.01419 | null |
| 2024-08-02 | Talk Less, Interact Better: Evaluating In-context Conversational Adaptation in Multimodal LLMs | Yilun Hua et.al. | 2408.01417 | null |
| 2024-08-02 | Coalitions of Large Language Models Increase the Robustness of AI Agents | Prattyush Mangal et.al. | 2408.01380 | null |
| 2024-08-02 | Toward Automatic Relevance Judgment using Vision–Language Models for Image–Text Retrieval Evaluation | Jheng-Hong Yang et.al. | 2408.01363 | null |
| 2024-08-02 | Hallu-PI: Evaluating Hallucination in Multi-modal Large Language Models within Perturbed Inputs | Peng Ding et.al. | 2408.01355 | null |
| 2024-08-02 | MCGMark: An Encodable and Robust Online Watermark for LLM-Generated Malicious Code | Kaiwen Ning et.al. | 2408.01354 | null |
| 2024-08-02 | Prompt Refinement or Fine-tuning? Best Practices for using LLMs in Computational Social Science Tasks | Anders Giovanni Møller et.al. | 2408.01346 | null |
| 2024-08-02 | A Backbone for Long-Horizon Robot Task Understanding | Xiaoshuai Chen et.al. | 2408.01334 | null |
| 2024-08-01 | AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation | Mengkang Hu et.al. | 2408.00764 | link |
| 2024-08-01 | Tamper-Resistant Safeguards for Open-Weight LLMs | Rishub Tamirisa et.al. | 2408.00761 | null |
| 2024-08-01 | DynamoLLM: Designing LLM Inference Clusters for Performance and Energy Efficiency | Jovan Stojkovic et.al. | 2408.00741 | null |
| 2024-08-01 | Improving Retrieval-Augmented Generation in Medicine with Iterative Follow-up Questions | Guangzhi Xiong et.al. | 2408.00727 | null |
| 2024-08-01 | An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models | Yangzhen Wu et.al. | 2408.00724 | link |
| 2024-08-01 | Pathway to Secure and Trustworthy 6G for LLMs: Attacks, Defense, and Opportunities | Sunder Ali Khowaja et.al. | 2408.00722 | null |
| 2024-08-01 | Improving Text Embeddings for Smaller Language Models Using Contrastive Fine-tuning | Trapoom Ukarapol et.al. | 2408.00690 | link |
| 2024-08-01 | Can Developers Prompt? A Controlled Experiment for Code Documentation Generation | Hans-Alexander Kruse et.al. | 2408.00686 | null |
| 2024-08-01 | AutoM3L: An Automated Multimodal Machine Learning Framework with Large Language Models | Daqin Luo et.al. | 2408.00665 | null |
| 2024-08-01 | Disentangling Dense Embeddings with Sparse Autoencoders | Charles O’Neill et.al. | 2408.00657 | null |
| 2024-07-31 | Vision-Language Model Based Handwriting Verification | Mihir Chauhan et.al. | 2407.21788 | null |
| 2024-07-31 | Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs | Shi Liu et.al. | 2407.21771 | null |
| 2024-07-31 | ReplanVLM: Replanning Robotic Tasks with Visual Language Models | Aoran Mei et.al. | 2407.21762 | null |
| 2024-07-31 | Adaptive Retrieval-Augmented Generation for Conversational Systems | Xi Wang et.al. | 2407.21712 | null |
| 2024-07-31 | CEAR: Automatic construction of a knowledge graph of chemical entities and roles from scientific literature | Stefan Langer et.al. | 2407.21708 | null |
| 2024-07-31 | TransferTOD: A Generalizable Chinese Multi-Domain Task-Oriented Dialogue System with Transfer Capabilities | Ming Zhang et.al. | 2407.21693 | null |
| 2024-07-31 | Synth-Empathy: Towards High-Quality Synthetic Empathy Data | Hao Liang et.al. | 2407.21669 | link |
| 2024-07-31 | LLM-for-X: Application-agnostic Integration of Large Language Models to Support Personal Writing Workflows | Lukas Teufelberger et.al. | 2407.21593 | null |
| 2024-07-31 | A Performance Study of LLM-Generated Code on Leetcode | Tristan Coignion et.al. | 2407.21579 | null |
| 2024-07-31 | PMoE: Progressive Mixture of Experts with Asymmetric Transformer for Continual Learning | Min Jae Jung et.al. | 2407.21571 | null |
| 2024-07-30 | ThinK: Thinner Key Cache by Query-Driven Pruning | Yuhui Xu et.al. | 2407.21018 | link |
| 2024-07-30 | CLEFT: Language-Image Contrastive Learning with Efficient Large Language Model and Prompt Fine-Tuning | Yuexi Du et.al. | 2407.21011 | link |
| 2024-07-30 | The Dual-Edged Sword of Technical Debt: Benefits and Issues Analyzed Through Developer Discussions | Xiaozhou Li et.al. | 2407.21007 | null |
| 2024-07-30 | MoFO: Momentum-Filtered Optimizer for Mitigating Forgetting in LLM Fine-Tuning | Yupeng Chen et.al. | 2407.20999 | null |
| 2024-07-30 | From Feature Importance to Natural Language Explanations Using LLMs with RAG | Sule Tekkesinoglu et.al. | 2407.20990 | null |
| 2024-07-30 | Large Language Models (LLMs) for Semantic Communication in Edge-based IoT Networks | Alakesh Kalita et.al. | 2407.20970 | null |
| 2024-07-30 | Automated Review Generation Method Based on Large Language Models | Shican Wu et.al. | 2407.20906 | link |
| 2024-07-30 | ThinkRepair: Self-Directed Automated Program Repair | Xin Yin et.al. | 2407.20898 | link |
| 2024-07-30 | Effective Black Box Testing of Sentiment Analysis Classification Networks | Parsa Karbasizadeh et.al. | 2407.20884 | null |
| 2024-07-30 | Breaking Agents: Compromising Autonomous LLM Agents Through Malfunction Amplification | Boyang Zhang et.al. | 2407.20859 | null |
| 2024-07-29 | Specify and Edit: Overcoming Ambiguity in Text-Based Image Editing | Ekaterina Iakovleva et.al. | 2407.20232 | null |
| 2024-07-29 | Can Editing LLMs Inject Harm? | Canyu Chen et.al. | 2407.20224 | link |
| 2024-07-29 | QAEA-DR: A Unified Text Augmentation Framework for Dense Retrieval | Hongming Tan et.al. | 2407.20207 | null |
| 2024-07-29 | MindSearch: Mimicking Human Minds Elicits Deep AI Searcher | Zehui Chen et.al. | 2407.20183 | link |
| 2024-07-29 | Advancing Multimodal Large Language Models in Chart Question Answering with Visualization-Referenced Instruction Tuning | Xingchen Zeng et.al. | 2407.20174 | link |
| 2024-07-29 | Diffusion Feedback Helps CLIP See Better | Wenxuan Wang et.al. | 2407.20171 | link |
| 2024-07-29 | Language-Conditioned Offline RL for Multi-Robot Navigation | Steven Morad et.al. | 2407.20164 | null |
| 2024-07-29 | rLLM: Relational Table Learning with LLMs | Weichen Li et.al. | 2407.20157 | link |
| 2024-07-29 | ByteCheckpoint: A Unified Checkpointing System for LLM Development | Borui Wan et.al. | 2407.20143 | null |
| 2024-07-29 | Orca: Ocean Significant Wave Height Estimation with Spatio-temporally Aware Large Language Models | Zhe Li et.al. | 2407.20053 | null |
| 2024-07-26 | Small Molecule Optimization with Large Language Models | Philipp Guevorguian et.al. | 2407.18897 | link |
| 2024-07-26 | Human-artificial intelligence teaming for scientific information extraction from data-driven additive manufacturing research using large language models | Mutahar Safdar et.al. | 2407.18827 | null |
| 2024-07-26 | Automatic Detection of Moral Values in Music Lyrics | Vjosa Preniqi et.al. | 2407.18787 | link |
| 2024-07-26 | The power of Prompts: Evaluating and Mitigating Gender Bias in MT with LLMs | Aleix Sant et.al. | 2407.18786 | null |
| 2024-07-26 | TAGIFY: LLM-powered Tagging Interface for Improved Data Findability on OGD portals | Kevin Kliimask et.al. | 2407.18764 | null |
| 2024-07-26 | Knowledge Graph Structure as Prompt: Improving Small Language Models Capabilities for Knowledge-based Causal Discovery | Yuni Susanti et.al. | 2407.18752 | link |
| 2024-07-26 | Towards Effective and Efficient Continual Pre-training of Large Language Models | Jie Chen et.al. | 2407.18743 | link |
| 2024-07-26 | Towards Generalized Offensive Language Identification | Alphaeus Dmonte et.al. | 2407.18738 | null |
| 2024-07-26 | LLASP: Fine-tuning Large Language Models for Answer Set Programming | Erica Coppolillo et.al. | 2407.18723 | null |
| 2024-07-26 | Neurosymbolic AI for Enhancing Instructability in Generative AI | Amit Sheth et.al. | 2407.18722 | null |
| 2024-07-25 | Recursive Introspection: Teaching Language Model Agents How to Self-Improve | Yuxiao Qu et.al. | 2407.18219 | null |
| 2024-07-25 | Exploring Scaling Trends in LLM Robustness | Nikolaus Howe et.al. | 2407.18213 | null |
| 2024-07-25 | Unlocking Tokens as Data Points for Generalization Bounds on Larger Language Models | Sanae Lotfi et.al. | 2407.18158 | null |
| 2024-07-25 | Dallah: A Dialect-Aware Multimodal Large Language Model for Arabic | Fakhraddin Alwajih et.al. | 2407.18129 | null |
| 2024-07-25 | Fine-Tuning Large Language Models for Stock Return Prediction Using Newsflow | Tian Guo et.al. | 2407.18103 | null |
| 2024-07-25 | PEFT-U: Parameter-Efficient Fine-Tuning for User Personalization | Christopher Clarke et.al. | 2407.18078 | link |
| 2024-07-25 | C2P: Featuring Large Language Models with Causal Reasoning | Abdolmahdi Bagheri et.al. | 2407.18069 | null |
| 2024-07-25 | ComPeer: A Generative Conversational Agent for Proactive Peer Support | Tianjian Liu et.al. | 2407.18064 | null |
| 2024-07-25 | Audio Entailment: Assessing Deductive Reasoning for Audio Understanding | Soham Deshmukh et.al. | 2407.18062 | link |
| 2024-07-25 | Difficulty Estimation and Simplification of French Text Using LLMs | Henri Jamet et.al. | 2407.18061 | null |
| 2024-07-24 | I Could’ve Asked That: Reformulating Unanswerable Questions | Wenting Zhao et.al. | 2407.17469 | link |
| 2024-07-24 | WildHallucinations: Evaluating Long-form Factuality in LLMs with Real-World Entity Queries | Wenting Zhao et.al. | 2407.17468 | null |
| 2024-07-24 | CMR Scaling Law: Predicting Critical Mixture Ratios for Continual Pre-training of Language Models | Jiawei Gu et.al. | 2407.17467 | null |
| 2024-07-24 | $VILA^2$: VILA Augmented VILA | Yunhao Fang et.al. | 2407.17453 | null |
| 2024-07-24 | Generative AI in Evidence-Based Software Engineering: A White Paper | Matteo Esposito et.al. | 2407.17440 | null |
| 2024-07-24 | Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data? | Michael-Andrei Panaitescu-Liess et.al. | 2407.17417 | null |
| 2024-07-24 | (PASS) Visual Prompt Locates Good Structure Sparsity through a Recurrent HyperNetwork | Tianjin Huang et.al. | 2407.17412 | null |
| 2024-07-24 | Grammar-based Game Description Generation using Large Language Models | Tsunehiko Tanaka et.al. | 2407.17404 | null |
| 2024-07-24 | 3D Question Answering for City Scene Understanding | Penglei Sun et.al. | 2407.17398 | null |
| 2024-07-24 | ViPer: Visual Personalization of Generative Models via Individual Preference Learning | Sogand Salehi et.al. | 2407.17365 | null |
| 2024-07-23 | Can Large Language Models Automatically Jailbreak GPT-4V? | Yuanwei Wu et.al. | 2407.16686 | null |
| 2024-07-23 | RedAgent: Red Teaming Large Language Models with Context-aware Autonomous Language Agent | Huiyu Xu et.al. | 2407.16667 | null |
| 2024-07-23 | Course-Correction: Safety Alignment Using Synthetic Preferences | Rongwu Xu et.al. | 2407.16637 | link |
| 2024-07-23 | Lawma: The Power of Specialization for Legal Tasks | Ricardo Dominguez-Olmedo et.al. | 2407.16615 | null |
| 2024-07-23 | Shared Imagination: LLMs Hallucinate Alike | Yilun Zhou et.al. | 2407.16604 | null |
| 2024-07-23 | Exploring Automatic Cryptographic API Misuse Detection in the Era of LLMs | Yifan Xia et.al. | 2407.16576 | null |
| 2024-07-23 | Retrieve, Generate, Evaluate: A Case Study for Medical Paraphrases Generation with Small Language Models | Ioana Buhnila et.al. | 2407.16565 | null |
| 2024-07-23 | Patched RTC: evaluating LLMs for diverse software development tasks | Asankhaya Sharma et.al. | 2407.16557 | link |
| 2024-07-24 | MicroEmo: Time-Sensitive Multimodal Emotion Recognition with Micro-Expression Dynamics in Video Dialogues | Liyun Zhang et.al. | 2407.16552 | null |
| 2024-07-23 | Imperfect Vision Encoders: Efficient and Robust Tuning for Vision-Language Models | Aristeidis Panos et.al. | 2407.16526 | null |
| 2024-07-22 | AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description | Junyu Xie et.al. | 2407.15850 | link |
| 2024-07-22 | LLMmap: Fingerprinting For Large Language Models | Dario Pasquini et.al. | 2407.15847 | null |
| 2024-07-22 | SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models | Mingze Xu et.al. | 2407.15841 | link |
| 2024-07-22 | MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity | Yangzhou Liu et.al. | 2407.15838 | link |
| 2024-07-22 | dMel: Speech Tokenization made Simple | He Bai et.al. | 2407.15835 | link |
| 2024-07-22 | Accelerating Pre-training of Multimodal LLMs via Chain-of-Sight | Ziyuan Huang et.al. | 2407.15819 | null |
| 2024-07-22 | Extracting Structured Insights from Financial News: An Augmented LLM Driven Approach | Rian Dolphin et.al. | 2407.15788 | null |
| 2024-07-22 | MoRSE: Bridging the Gap in Cybersecurity Expertise with Retrieval Augmented Generation | Marco Simoni et.al. | 2407.15748 | null |
| 2024-07-22 | OMoS-QA: A Dataset for Cross-Lingual Extractive Question Answering in a German Migration Context | Steffen Kleinle et.al. | 2407.15736 | null |
| 2024-07-22 | TaskGen: A Task-Based, Memory-Infused Agentic Framework using StrictJSON | John Chong Min Tan et.al. | 2407.15734 | link |
| 2024-07-19 | Internal Consistency and Self-Feedback in Large Language Models: A Survey | Xun Liang et.al. | 2407.14507 | link |
| 2024-07-19 | On Pre-training of Multimodal Language Models Customized for Chart Understanding | Wan-Cyuan Fan et.al. | 2407.14506 | null |
| 2024-07-19 | Evaluating the Reliability of Self-Explanations in Large Language Models | Korbinian Randl et.al. | 2407.14487 | link |
| 2024-07-19 | Contrastive Learning with Counterfactual Explanations for Radiology Report Generation | Mingjie Li et.al. | 2407.14474 | null |
| 2024-07-19 | Check-Eval: A Checklist-based Approach for Evaluating Text Quality | Jayr Pereira et.al. | 2407.14467 | null |
| 2024-07-19 | Undermining Mental Proof: How AI Can Make Cooperation Harder by Making Thinking Easier | Zachary Wojtowicz et.al. | 2407.14452 | null |
| 2024-07-19 | From Instruction to Insight: Exploring the Functional and Semantic Roles of Text in Interactive Dashboards | Nicole Sultanum et.al. | 2407.14451 | null |
| 2024-07-19 | Token-level Correlation-guided Compression for Efficient Multimodal Document Understanding | Renshan Zhang et.al. | 2407.14439 | link |
| 2024-07-19 | The Vision of Autonomic Computing: Can LLMs Make It a Reality? | Zhiyang Zhang et.al. | 2407.14402 | null |
| 2024-07-19 | Open Artificial Knowledge | Vadim Borisov et.al. | 2407.14371 | null |
| 2024-07-18 | Visual Haystacks: Answering Harder Questions About Sets of Images | Tsung-Han Wu et.al. | 2407.13766 | null |
| 2024-07-18 | SegPoint: Segment Any Point Cloud via Large Language Model | Shuting He et.al. | 2407.13761 | null |
| 2024-07-18 | Black-Box Opinion Manipulation Attacks to Retrieval-Augmented Generation of Large Language Models | Zhuo Chen et.al. | 2407.13757 | null |
| 2024-07-18 | CellularLint: A Systematic Approach to Identify Inconsistent Behavior in Cellular Network Specifications | Mirza Masfiqur Rahman et.al. | 2407.13742 | null |
| 2024-07-18 | Baba Is AI: Break the Rules to Beat the Benchmark | Nathan Cloos et.al. | 2407.13729 | null |
| 2024-07-18 | CoDefeater: Using LLMs To Find Defeaters in Assurance Cases | Usman Gohar et.al. | 2407.13717 | null |
| 2024-07-18 | Understanding Reference Policies in Direct Preference Optimization | Yixin Liu et.al. | 2407.13709 | link |
| 2024-07-18 | A Comprehensive Review of Recommender Systems: Transitioning from Theory to Practice | Shaina Raza et.al. | 2407.13699 | null |
| 2024-07-18 | Prover-Verifier Games improve legibility of LLM outputs | Jan Hendrik Kirchner et.al. | 2407.13692 | link |
| 2024-07-18 | COMCAT: Leveraging Human Judgment to Improve Automatic Documentation and Summarization | Skyler Grandel et.al. | 2407.13648 | null |
| 2024-07-17 | LookupViT: Compressing visual information to a limited number of tokens | Rajat Koner et.al. | 2407.12753 | null |
| 2024-07-17 | EchoSight: Advancing Visual-Language Models with Wiki Knowledge | Yibin Yan et.al. | 2407.12735 | null |
| 2024-07-17 | NL2Contact: Natural Language Guided 3D Hand-Object Contact Modeling with Diffusion Model | Zhongqun Zhang et.al. | 2407.12727 | null |
| 2024-07-17 | Is Sarcasm Detection A Step-by-Step Reasoning Process in Large Language Models? | Ben Yao et.al. | 2407.12725 | null |
| 2024-07-17 | The Future of Learning: Large Language Models through the Lens of Students | He Zhang et.al. | 2407.12723 | null |
| 2024-07-17 | MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models | Leyang Shen et.al. | 2407.12709 | link |
| 2024-07-17 | Patch-Level Training for Large Language Models | Chenze Shao et.al. | 2407.12665 | link |
| 2024-07-17 | Zero-shot Text-guided Infinite Image Synthesis with LLM guidance | Soyeong Kwon et.al. | 2407.12642 | null |
| 2024-07-17 | Harnessing the Power of Artificial Intelligence to Vitalize Endangered Indigenous Languages: Technologies and Experiences | Claudio Pinhanez et.al. | 2407.12620 | null |
| 2024-07-17 | AudienceView: AI-Assisted Interpretation of Audience Feedback in Journalism | William Brannon et.al. | 2407.12613 | link |
| 2024-07-16 | UrbanWorld: An Urban World Model for 3D City Generation | Yu Shang et.al. | 2407.11965 | null |
| 2024-07-16 | NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window? | Mo Li et.al. | 2407.11963 | link |
| 2024-07-16 | Code Documentation and Analysis to Secure Software Development | Paul Attie et.al. | 2407.11934 | null |
| 2024-07-16 | What’s Wrong? Refining Meeting Summaries with LLM Feedback | Frederic Kirstein et.al. | 2407.11919 | null |
| 2024-07-16 | Ascend-CC: Confidential Computing on Heterogeneous NPU for Emerging Generative AI Workloads | Aritra Dhar et.al. | 2407.11888 | null |
| 2024-07-16 | Schema Matching with Large Language Models: an Experimental Study | Marcel Parciak et.al. | 2407.11852 | link |
| 2024-07-16 | LoFTI: Localization and Factuality Transfer to Indian Locales | Sona Elza Simon et.al. | 2407.11833 | link |
| 2024-07-16 | GPT Assisted Annotation of Rhetorical and Linguistic Features for Interpretable Propaganda Technique Detection in News Text | Kyle Hamilton et.al. | 2407.11827 | null |
| 2024-07-16 | PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation | Branden Butler et.al. | 2407.11798 | null |
| 2024-07-16 | Large Language Models as Misleading Assistants in Conversation | Betty Li Hou et.al. | 2407.11789 | null |
| 2024-07-15 | VGBench: Evaluating Large Language Models on Vector Graphics Understanding and Generation | Bocheng Zou et.al. | 2407.10972 | link |
| 2024-07-15 | Q-Sparse: All Large Language Models can be Fully Sparsely-Activated | Hongyu Wang et.al. | 2407.10969 | null |
| 2024-07-15 | No Train, all Gain: Self-Supervised Gradients Improve Deep Frozen Representations | Walter Simoncini et.al. | 2407.10964 | link |
| 2024-07-15 | Fast Matrix Multiplications for Lookup Table-Quantized LLMs | Han Guo et.al. | 2407.10960 | null |
| 2024-07-15 | MMM: Multilingual Mutual Reinforcement Effect Mix Datasets & Test with Open-domain Information Extraction Large Language Models | Chengguang Gan et.al. | 2407.10953 | null |
| 2024-07-15 | Can Textual Semantics Mitigate Sounding Object Segmentation Preference? | Yaoting Wang et.al. | 2407.10947 | link |
| 2024-07-15 | GRUtopia: Dream General Robots in a City at Scale | Hanqing Wang et.al. | 2407.10943 | link |
| 2024-07-15 | Benchmarking Vision Language Models for Cultural Understanding | Shravan Nayak et.al. | 2407.10920 | null |
| 2024-07-15 | FinDKG: Dynamic Knowledge Graphs with Large Language Models for Detecting Global Trends in Financial Markets | Xiaohui Victor Li et.al. | 2407.10909 | null |
| 2024-07-15 | Hey, That’s My Model! Introducing Chain & Hash, An LLM Fingerprinting Technique | Mark Russinovich et.al. | 2407.10887 | null |
| 2024-07-12 | FairyLandAI: Personalized Fairy Tales utilizing ChatGPT and DALLE-3 | Georgios Makridis et.al. | 2407.09467 | null |
| 2024-07-12 | Human-like Episodic Memory for Infinite Context LLMs | Zafeirios Fountas et.al. | 2407.09450 | null |
| 2024-07-12 | ASTPrompter: Weakly Supervised Automated Language Model Red-Teaming to Identify Likely Toxic Prompts | Amelia F. Hardy et.al. | 2407.09447 | null |
| 2024-07-12 | MUSCLE: A Model Update Strategy for Compatible LLM Evolution | Jessica Echterhoff et.al. | 2407.09435 | null |
| 2024-07-12 | Open (Clinical) LLMs are Sensitive to Instruction Phrasings | Alberto Mario Ceballos Arroyo et.al. | 2407.09429 | null |
| 2024-07-12 | TelecomGPT: A Framework to Build Telecom-Specfic Large Language Models | Hang Zou et.al. | 2407.09424 | null |
| 2024-07-12 | Mitigating Entity-Level Hallucination in Large Language Models | Weihang Su et.al. | 2407.09417 | link |
| 2024-07-12 | SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers | Shraman Pramanick et.al. | 2407.09413 | link |
| 2024-07-12 | PersonaRAG: Enhancing Retrieval-Augmented Generation Systems with User-Centric Agents | Saber Zerhoudi et.al. | 2407.09394 | null |
| 2024-07-12 | GAVEL: Generating Games Via Evolution and Language Models | Graham Todd et.al. | 2407.09388 | null |
| 2024-07-11 | MAVIS: Mathematical Visual Instruction Tuning | Renrui Zhang et.al. | 2407.08739 | link |
| 2024-07-11 | Real-Time Anomaly Detection and Reactive Planning with Large Language Models | Rohan Sinha et.al. | 2407.08735 | null |
| 2024-07-11 | Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist | Zihao Zhou et.al. | 2407.08733 | null |
| 2024-07-11 | A Taxonomy for Data Contamination in Large Language Models | Medha Palavalli et.al. | 2407.08716 | null |
| 2024-07-11 | GTA: A Benchmark for General Tool Agents | Jize Wang et.al. | 2407.08713 | link |
| 2024-07-11 | Extracting Training Data from Document-Based VQA Models | Francesco Pinto et.al. | 2407.08707 | null |
| 2024-07-11 | Live2Diff: Live Stream Translation via Uni-directional Attention in Video Diffusion Models | Zhening Xing et.al. | 2407.08701 | null |
| 2024-07-11 | Mitigating Catastrophic Forgetting in Language Transfer via Model Merging | Anton Alexandrov et.al. | 2407.08699 | null |
| 2024-07-11 | Cloud Atlas: Efficient Fault Localization for Cloud Systems using Language Models and Causal Insight | Zhiqiang Xie et.al. | 2407.08694 | null |
| 2024-07-11 | SEED-Story: Multimodal Long Story Generation with Large Language Model | Shuai Yang et.al. | 2407.08683 | link |
| 2024-07-10 | Training on the Test Task Confounds Evaluation and Emergence | Ricardo Dominguez-Olmedo et.al. | 2407.07890 | link |
| 2024-07-10 | Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization | Junkang Wu et.al. | 2407.07880 | link |
| 2024-07-10 | FACTS About Building Retrieval Augmented Generation-based Chatbots | Rama Akkiraju et.al. | 2407.07858 | null |
| 2024-07-10 | OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training | Sami Jaghouar et.al. | 2407.07852 | null |
| 2024-07-10 | Natural Language Mechanisms via Self-Resolution with Foundation Models | Nicolas Della Penna et.al. | 2407.07845 | null |
| 2024-07-10 | Transformer Alignment in Large Language Models | Murdock Aubry et.al. | 2407.07810 | null |
| 2024-07-10 | Attribute or Abstain: Large Language Models as Long Document Assistants | Jan Buchmann et.al. | 2407.07799 | link |
| 2024-07-11 | Evaluating Large Language Models with Grid-Based Game Competitions: An Extensible LLM Benchmark and Leaderboard | Oguzhan Topsakal et.al. | 2407.07796 | link |
| 2024-07-10 | Flooding Spread of Manipulated Knowledge in LLM-Based Multi-Agent Communities | Tianjie Ju et.al. | 2407.07791 | null |
| 2024-07-10 | WorldAPIs: The World Is Worth How Many APIs? A Thought Experiment | Jiefu Ou et.al. | 2407.07778 | null |
| 2024-07-09 | AnyTaskTune: Advanced Domain-Specific Solutions through Task-Fine-Tuning | Jiaxi Cui et.al. | 2407.07094 | link |
| 2024-07-09 | FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation | Liqun Ma et.al. | 2407.07093 | link |
| 2024-07-09 | Hypothetical Minds: Scaffolding Theory of Mind for Multi-Agent Tasks with Large Language Models | Logan Cross et.al. | 2407.07086 | link |
| 2024-07-09 | Adapting LLMs to Hebrew: Unveiling DictaLM 2.0 with Enhanced Vocabulary and Instruction Capabilities | Shaltiel Shmidman et.al. | 2407.07080 | null |
| 2024-07-09 | Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps | Yung-Sung Chuang et.al. | 2407.07071 | link |
| 2024-07-09 | Prompting Techniques for Secure Code Generation: A Systematic Investigation | Catherine Tony et.al. | 2407.07064 | null |
| 2024-07-09 | Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence | Weize Chen et.al. | 2407.07061 | link |
| 2024-07-09 | Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model | Wenqi Zhang et.al. | 2407.07053 | link |
| 2024-07-09 | CorMulT: A Semi-supervised Modality Correlation-aware Multimodal Transformer for Sentiment Analysis | Yangmin Li et.al. | 2407.07046 | null |
| 2024-07-09 | Using Large Language Models for Generating Smart Contracts for Health Insurance from Textual Policies | Inwon Kang et.al. | 2407.07019 | null |
| 2024-07-08 | Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision | Orr Zohar et.al. | 2407.06189 | link |
| 2024-07-08 | CrowdMoGen: Zero-Shot Text-Driven Collective Motion Generation | Xinying Guo et.al. | 2407.06188 | null |
| 2024-07-08 | On Speeding Up Language Model Evaluation | Jin Peng Zhou et.al. | 2407.06172 | null |
| 2024-07-08 | What’s Wrong with Your Code Generated by Large Language Models? An Extensive Study | Shihan Dou et.al. | 2407.06153 | null |
| 2024-07-08 | Using Grammar Masking to Ensure Syntactic Validity in LLM-based Modeling Tasks | Lukas Netz et.al. | 2407.06146 | null |
| 2024-07-08 | ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation | Ethan Chern et.al. | 2407.06135 | link |
| 2024-07-08 | Evaluating the Semantic Profiling Abilities of LLMs for Natural Language Utterances in Data Visualization | Hannah K. Bako et.al. | 2407.06129 | link |
| 2024-07-08 | Depression Detection and Analysis using Large Language Models on Textual and Audio-Visual Modalities | Avinash Anand et.al. | 2407.06125 | null |
| 2024-07-08 | Artificial Intuition: Efficient Classification of Scientific Abstracts | Harsh Sakhrani et.al. | 2407.06093 | null |
| 2024-07-08 | Merge, Ensemble, and Cooperate! A Survey on Collaborative Strategies in the Era of Large Language Models | Jinliang Lu et.al. | 2407.06089 | null |
| 2024-07-05 | Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs | Rudolf Laine et.al. | 2407.04694 | null |
| 2024-07-05 | ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models | Yuzhe Gu et.al. | 2407.04693 | null |
| 2024-07-05 | Rethinking Visual Prompting for Multimodal Large Language Models with External Knowledge | Yuanze Lin et.al. | 2407.04681 | null |
| 2024-07-05 | Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition | Ye Bai et.al. | 2407.04675 | null |
| 2024-07-05 | Lazarus: Resilient and Elastic Training of Mixture-of-Experts Models with Adaptive Expert Placement | Yongji Wu et.al. | 2407.04656 | null |
| 2024-07-05 | Entity Decomposition with Filtering: A Zero-Shot Clinical Named Entity Recognition Framework | Reza Averly et.al. | 2407.04629 | null |
| 2024-07-05 | On scalable oversight with weak LLMs judging strong LLMs | Zachary Kenton et.al. | 2407.04622 | null |
| 2024-07-05 | Leveraging Large Language Models for Integrated Satellite-Aerial-Terrestrial Networks: Recent Advances and Future Directions | Shumaila Javaid et.al. | 2407.04581 | null |
| 2024-07-05 | VRSD: Rethinking Similarity and Diversity for Retrieval in Large Language Models | Hang Gao et.al. | 2407.04573 | null |
| 2024-07-05 | PoPreRo: A New Dataset for Popularity Prediction of Romanian Reddit Posts | Ana-Cristina Rogoz et.al. | 2407.04541 | link |
| 2024-07-03 | BACON: Supercharge Your VLM with Bag-of-Concept Graph to Mitigate Hallucinations | Zhantao Yang et.al. | 2407.03314 | null |
| 2024-07-03 | Universal Length Generalization with Turing Programs | Kaiying Hou et.al. | 2407.03310 | null |
| 2024-07-03 | Large Language Models for JSON Schema Discovery | Michael J. Mior et.al. | 2407.03286 | null |
| 2024-07-03 | LLM Internal States Reveal Hallucination Risk Faced With a Query | Ziwei Ji et.al. | 2407.03282 | null |
| 2024-07-03 | Improving Retrieval-augmented Text-to-SQL with AST-based Ranking and Schema Pruning | Zhili Shen et.al. | 2407.03227 | null |
| 2024-07-03 | How Does Quantization Affect Multilingual LLMs? | Kelly Marchisio et.al. | 2407.03211 | null |
| 2024-07-03 | TheoremLlama: Transforming General-Purpose LLMs into Lean4 Experts | Ruida Wang et.al. | 2407.03203 | link |
| 2024-07-03 | Fine-Tuning with Divergent Chains of Thought Boosts Reasoning Through Self-Correction in Language Models | Haritz Puerto et.al. | 2407.03181 | link |
| 2024-07-03 | Investigating Decoder-only Large Language Models for Speech-to-text Translation | Chao-Wei Huang et.al. | 2407.03169 | null |
| 2024-07-03 | SOS! Soft Prompt Attack Against Open-Source Large Language Models | Ziqing Yang et.al. | 2407.03160 | null |
| 2024-07-02 | MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention | Huiqiang Jiang et.al. | 2407.02490 | link |
| 2024-07-02 | Neurocache: Efficient Vector Retrieval for Long-range Language Modeling | Ali Safaya et.al. | 2407.02486 | link |
| 2024-07-02 | RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs | Yue Yu et.al. | 2407.02485 | null |
| 2024-07-02 | MMedAgent: Learning to Use Medical Tools with Multi-modal Agent | Binxu Li et.al. | 2407.02483 | null |
| 2024-07-02 | Understanding Alignment in Multimodal LLMs: A Comprehensive Study | Elmira Amirloo et.al. | 2407.02477 | null |
| 2024-07-02 | Open Scene Graphs for Open World Object-Goal Navigation | Joel Loo et.al. | 2407.02473 | null |
| 2024-07-02 | Reliable Confidence Intervals for Information Retrieval Evaluation Using Generative A.I. | Harrie Oosterhuis et.al. | 2407.02464 | null |
| 2024-07-02 | Predicting vs. Acting: A Trade-off Between World Modeling & Agent Modeling | Margaret Li et.al. | 2407.02446 | null |
| 2024-07-02 | Video Watermarking: Safeguarding Your Video from (Unauthorized) Annotations by Video-based LLMs | Jinmin Li et.al. | 2407.02411 | null |
| 2024-07-02 | CEB: Compositional Evaluation Benchmark for Fairness in Large Language Models | Song Wang et.al. | 2407.02408 | null |
| 2024-06-28 | Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs | Sukmin Yun et.al. | 2406.20098 | link |
| 2024-06-28 | LLaRA: Supercharging Robot Learning Data for Vision-Language Policy | Xiang Li et.al. | 2406.20095 | link |
| 2024-06-28 | Scaling Synthetic Data Creation with 1,000,000,000 Personas | Xin Chan et.al. | 2406.20094 | link |
| 2024-06-28 | LLaVolta: Efficient Multi-modal Models via Stage-wise Visual Context Compression | Jieneng Chen et.al. | 2406.20092 | link |
| 2024-06-28 | ProgressGym: Alignment with a Millennium of Moral Progress | Tianyi Qiu et.al. | 2406.20087 | link |
| 2024-06-28 | Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language | Yicheng Chen et.al. | 2406.20085 | null |
| 2024-06-28 | Molecular Facts: Desiderata for Decontextualization in LLM Fact Verification | Anisha Gunjal et.al. | 2406.20079 | link |
| 2024-06-28 | Applying RLAIF for Code Generation with API-usage in Lightweight LLMs | Sujan Dutta et.al. | 2406.20060 | null |
| 2024-07-01 | BMW Agents – A Framework For Task Automation Through Multi-Agent Collaboration | Noel Crawford et.al. | 2406.20041 | null |
| 2024-06-28 | BioMNER: A Dataset for Biomedical Method Entity Recognition | Chen Tang et.al. | 2406.20038 | null |
| 2024-06-27 | ReXTime: A Benchmark Suite for Reasoning-Across-Time in Videos | Jr-Jen Chen et.al. | 2406.19392 | link |
| 2024-06-27 | The Remarkable Robustness of LLMs: Stages of Inference? | Vedang Lad et.al. | 2406.19384 | link |
| 2024-06-27 | Suri: Multi-constraint Instruction Following for Long-form Text Generation | Chau Minh Pham et.al. | 2406.19371 | link |
| 2024-06-27 | The Model Arena for Cross-lingual Sentiment Analysis: A Comparative Study in the Era of Large Language Models | Xiliang Zhu et.al. | 2406.19358 | null |
| 2024-06-27 | DiVERT: Distractor Generation with Variational Errors Represented as Text for Math Multiple-choice Questions | Nigel Fernandez et.al. | 2406.19356 | null |
| 2024-06-27 | IndoToxic2024: A Demographically-Enriched Dataset of Hate Speech and Toxicity Types for Indonesian Language | Lucky Susanto et.al. | 2406.19349 | null |
| 2024-06-27 | Jump Starting Bandits with LLM-Generated Prior Knowledge | Parand A. Alamdari et.al. | 2406.19317 | null |
| 2024-06-27 | Enhancing Continual Learning in Visual Question Answering with Modality-Aware Feature Distillation | Malvina Nikandrou et.al. | 2406.19297 | null |
| 2024-06-27 | From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data | Zheyang Xiong et.al. | 2406.19292 | link |
| 2024-06-27 | PhysioLLM: Supporting Personalized Health Insights with Wearables and Large Language Models | Cathy Mengying Fang et.al. | 2406.19283 | null |
| 2024-06-26 | Symbolic Learning Enables Self-Evolving Agents | Wangchunshu Zhou et.al. | 2406.18532 | link |
| 2024-06-26 | PrExMe! Large Scale Prompt Exploration of Open Source LLMs for Machine Translation and Summarization Evaluation | Christoph Leiter et.al. | 2406.18528 | null |
| 2024-06-26 | CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs | Zirui Wang et.al. | 2406.18521 | null |
| 2024-06-26 | “Is ChatGPT a Better Explainer than My Professor?”: Evaluating the Explanation Capabilities of LLMs in Conversation Compared to a Human Baseline | Grace Li et.al. | 2406.18512 | null |
| 2024-06-26 | Mental Modeling of Reinforcement Learning Agents by Language Models | Wenhao Lu et.al. | 2406.18505 | null |
| 2024-06-26 | Is In-Context Learning a Type of Gradient-Based Learning? Evidence from the Inverse Frequency Effect in Structural Priming | Zhenghao Zhou et.al. | 2406.18501 | null |
| 2024-06-26 | Role-Play Zero-Shot Prompting with Large Language Models for Open-Domain Human-Machine Conversation | Ahmed Njifenjou et.al. | 2406.18460 | null |
| 2024-06-26 | Cascading Large Language Models for Salient Event Graph Generation | Xingwei Tan et.al. | 2406.18449 | null |
| 2024-06-26 | New intelligent empowerment for digital transformation | Peng Yifeng et.al. | 2406.18440 | null |
| 2024-06-26 | IRCAN: Mitigating Knowledge Conflicts in LLM Generation via Identifying and Reweighting Context-Aware Neurons | Dan Shi et.al. | 2406.18406 | null |
| 2024-06-25 | Text-Animator: Controllable Visual Text Video Generation | Lin Liu et.al. | 2406.17777 | null |
| 2024-06-25 | MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning | Xiangyu Zhao et.al. | 2406.17770 | link |
| 2024-06-25 | BMIKE-53: Investigating Cross-Lingual Knowledge Editing with In-Context Learning | Ercong Nie et.al. | 2406.17764 | link |
| 2024-06-25 | CaLMQA: Exploring culturally specific long-form question answering across 23 languages | Shane Arora et.al. | 2406.17761 | link |
| 2024-06-25 | Accelerating Clinical Evidence Synthesis with Large Language Models | Zifeng Wang et.al. | 2406.17755 | null |
| 2024-06-25 | Measuring and Benchmarking Large Language Models’ Capabilities to Generate Persuasive Language | Amalie Brogaard Pauli et.al. | 2406.17753 | null |
| 2024-06-25 | LLM Targeted Underperformance Disproportionately Impacts Vulnerable Users | Elinor Poole-Dayan et.al. | 2406.17737 | null |
| 2024-06-25 | FedBiOT: LLM Local Fine-tuning in Federated Learning without Full Model | Feijie Wu et.al. | 2406.17706 | null |
| 2024-06-25 | From Distributional to Overton Pluralism: Investigating Large Language Model Alignment | Thom Lake et.al. | 2406.17692 | link |
| 2024-06-25 | VarBench: Robust Language Model Benchmarking Through Dynamic Variable Perturbation | Kun Qian et.al. | 2406.17681 | null |
| 2024-06-24 | EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees | Yuhui Li et.al. | 2406.16858 | null |
| 2024-06-24 | From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models | Sean Welleck et.al. | 2406.16838 | null |
| 2024-06-24 | USDC: A Dataset of $\underline{U}$ser $\underline{S}$tance and $\underline{D}$ogmatism in Long $\underline{C}$onversations | Mounika Marreddy et.al. | 2406.16833 | null |
| 2024-06-24 | Ragnarök: A Reusable RAG Framework and Baselines for TREC 2024 Retrieval-Augmented Generation Track | Ronak Pradeep et.al. | 2406.16828 | null |
| 2024-06-24 | GPT-4V Explorations: Mining Autonomous Driving | Zixuan Li et.al. | 2406.16817 | null |
| 2024-06-24 | RES-Q: Evaluating Code-Editing Large Language Model Systems at the Repository Scale | Beck LaBash et.al. | 2406.16801 | link |
| 2024-06-24 | Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs | Ashwinee Panda et.al. | 2406.16797 | link |
| 2024-06-24 | M2Lingual: Enhancing Multilingual, Multi-Turn Instruction Alignment in Large Language Models | Rishabh Maheshwary et.al. | 2406.16783 | null |
| 2024-06-24 | It Is Not About What You Say, It Is About How You Say It: A Surprisingly Simple Approach for Improving Reading Comprehension | Sagi Shaier et.al. | 2406.16779 | null |
| 2024-06-24 | Blending LLMs into Cascaded Speech Translation: KIT’s Offline Speech Translation System for IWSLT 2024 | Sai Koneru et.al. | 2406.16777 | null |
| 2024-06-21 | GenoTEX: A Benchmark for Evaluating LLM-Based Exploration of Gene Expression Data in Alignment with Bioinformaticians | Haoyang Liu et.al. | 2406.15341 | link |
| 2024-06-21 | Gradient-Mask Tuning Elevates the Upper Limits of LLM Performance | Haoling Li et.al. | 2406.15330 | null |
| 2024-06-21 | An End-to-End, Segmentation-Free, Arabic Handwritten Recognition Model on KHATT | Sondos Aabed et.al. | 2406.15329 | null |
| 2024-06-21 | Bug In the Code Stack: Can LLMs Find Bugs in Large Python Code Stacks | Hokyung Lee et.al. | 2406.15325 | null |
| 2024-06-21 | Towards Fine-Grained Citation Evaluation in Generated Text: A Comparative Analysis of Faithfulness Metrics | Weijia Zhang et.al. | 2406.15264 | null |
| 2024-06-21 | Detecting Synthetic Lyrics with Few-Shot Inference | Yanis Labrak et.al. | 2406.15231 | null |
| 2024-06-21 | A LLM-Based Ranking Method for the Evaluation of Automatic Counter-Narrative Generation | Irune Zubiaga et.al. | 2406.15227 | null |
| 2024-06-21 | Unsupervised Extraction of Dialogue Policies from Conversations | Makesh Narsimhan Sreedhar et.al. | 2406.15214 | null |
| 2024-06-21 | Prompting Whisper for QA-driven Zero-shot End-to-end Spoken Language Understanding | Mohan Li et.al. | 2406.15209 | null |
| 2024-06-21 | Exploring the Efficacy of Robotic Assistants with ChatGPT and Claude in Enhancing ADHD Therapy: Innovating Treatment Paradigms | Santiago Berrezueta-Guzman et.al. | 2406.15198 | null |
| 2024-06-20 | Model Merging and Safety Alignment: One Bad Model Spoils the Bunch | Hasan Abed Al Kader Hammoud et.al. | 2406.14563 | null |
| 2024-06-20 | Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities | Sachit Menon et.al. | 2406.14562 | null |
| 2024-06-20 | Asynchronous Large Language Model Enhanced Planner for Autonomous Driving | Yuan Chen et.al. | 2406.14556 | link |
| 2024-06-20 | GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models | Shilong Li et.al. | 2406.14550 | null |
| 2024-06-20 | Uncovering Latent Memories: Assessing Data Leakage and Memorization Patterns in Large Language Models | Sunny Duan et.al. | 2406.14549 | null |
| 2024-06-20 | Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data | Johannes Treutlein et.al. | 2406.14546 | link |
| 2024-06-20 | Unmasking Database Vulnerabilities: Zero-Knowledge Schema Inference Attacks in Text-to-SQL Systems | Đorđe Klisura et.al. | 2406.14545 | null |
| 2024-06-20 | Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs | Yuxuan Qiao et.al. | 2406.14544 | link |
| 2024-06-20 | Are LLMs Naturally Good at Synthetic Tabular Data Generation? | Shengzhe Xu et.al. | 2406.14541 | link |
| 2024-06-20 | PostMark: A Robust Blackbox Watermark for Large Language Models | Yapei Chang et.al. | 2406.14517 | link |
| 2024-06-18 | DrVideo: Document Retrieval Based Long Video Understanding | Ziyu Ma et.al. | 2406.12846 | null |
| 2024-06-18 | Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts | Haoxiang Wang et.al. | 2406.12845 | link |
| 2024-06-18 | Synergizing Foundation Models and Federated Learning: A Survey | Shenghui Li et.al. | 2406.12844 | null |
| 2024-06-18 | LaMDA: Large Model Fine-Tuning via Spectrally Decomposed Low-Dimensional Adaptation | Seyedarmin Azizi et.al. | 2406.12832 | link |
| 2024-06-18 | Is It Good Data for Multilingual Instruction Tuning or Just Bad Multilingual Evaluation for Large Language Models? | Pinzhen Chen et.al. | 2406.12822 | null |
| 2024-06-18 | Can Large Language Models Always Solve Easy Problems if They Can Solve Harder Ones? | Zhe Yang et.al. | 2406.12809 | null |
| 2024-06-18 | Identifying Performance-Sensitive Configurations in Software Systems through Code Analysis with LLM Agents | Zehao Wang et.al. | 2406.12806 | null |
| 2024-06-18 | Supporting Human Raters with the Detection of Harmful Content using Large Language Models | Kurt Thomas et.al. | 2406.12800 | null |
| 2024-06-18 | ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools | Team GLM et.al. | 2406.12793 | null |
| 2024-06-18 | Generating Educational Materials with Different Levels of Readability using LLMs | Chieh-Yang Huang et.al. | 2406.12787 | null |
| 2024-06-17 | LLaNA: Large Language and NeRF Assistant | Andrea Amaduzzi et.al. | 2406.11840 | null |
| 2024-06-17 | mDPO: Conditional Preference Optimization for Multimodal Large Language Models | Fei Wang et.al. | 2406.11839 | link |
| 2024-06-17 | Unveiling Encoder-Free Vision-Language Models | Haiwen Diao et.al. | 2406.11832 | link |
| 2024-06-17 | Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models | Bingqi Ma et.al. | 2406.11831 | null |
| 2024-06-17 | WPO: Enhancing RLHF with Weighted Preference Optimization | Wenxuan Zhou et.al. | 2406.11827 | link |
| 2024-06-17 | Composing Object Relations and Attributes for Image-Text Matching | Khoi Pham et.al. | 2406.11820 | null |
| 2024-06-17 | Embodied Instruction Following in Unknown Environments | Zhenyu Wu et.al. | 2406.11818 | null |
| 2024-06-17 | VideoLLM-online: Online Video Large Language Model for Streaming Video | Joya Chen et.al. | 2406.11816 | null |
| 2024-06-17 | LLARVA: Vision-Action Instruction Tuning Enhances Robot Learning | Dantong Niu et.al. | 2406.11815 | null |
| 2024-06-17 | How Do Large Language Models Acquire Factual Knowledge During Pretraining? | Hoyeon Chang et.al. | 2406.11813 | link |
| 2024-06-14 | Quantifying Variance in Evaluation Benchmarks | Lovish Madaan et.al. | 2406.10229 | null |
| 2024-06-14 | Semantic Membership Inference Attack against Large Language Models | Hamid Mozaffari et.al. | 2406.10218 | null |
| 2024-06-14 | Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs | Rui Yang et.al. | 2406.10216 | link |
| 2024-06-14 | Be like a Goldfish, Don’t Memorize! Mitigating Memorization in Generative LLMs | Abhimanyu Hans et.al. | 2406.10209 | link |
| 2024-06-14 | A Fundamental Trade-off in Aligned Language Models and its Relation to Sampling Adaptors | Naaman Tan et.al. | 2406.10203 | null |
| 2024-06-14 | TRIP-PAL: Travel Planning with Guarantees by Combining Large Language Models and Automated Planners | Tomas de la Rosa et.al. | 2406.10196 | null |
| 2024-06-14 | Detecting and Evaluating Medical Hallucinations in Large Vision Language Models | Jiawei Chen et.al. | 2406.10185 | null |
| 2024-06-14 | Practical offloading for fine-tuning LLM on commodity GPU via learned subspace projectors | Siyuan Chen et.al. | 2406.10181 | null |
| 2024-06-14 | Datasets for Multilingual Answer Sentence Selection | Matteo Gabburo et.al. | 2406.10172 | null |
| 2024-06-14 | Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models | Carson Denison et.al. | 2406.10162 | link |
| 2024-06-13 | VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding | Muhammad Maaz et.al. | 2406.09418 | link |
| 2024-06-13 | Explore the Limits of Omni-modal Pretraining at Scale | Yiyuan Zhang et.al. | 2406.09412 | link |
| 2024-06-13 | Yo’LLaVA: Your Personalized Language and Vision Assistant | Thao Nguyen et.al. | 2406.09400 | link |
| 2024-06-13 | Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms | Miaosen Zhang et.al. | 2406.09397 | null |
| 2024-06-13 | Too Many Frames, not all Useful: Efficient Strategies for Long-Form Video QA | Jongwoo Park et.al. | 2406.09396 | link |
| 2024-06-13 | Improving Autoregressive Training with Dynamic Oracles | Jianing Yang et.al. | 2406.09393 | null |
| 2024-06-13 | Towards Vision-Language Geo-Foundation Model: A Survey | Yue Zhou et.al. | 2406.09385 | link |
| 2024-06-13 | Needle In A Video Haystack: A Scalable Synthetic Framework for Benchmarking Video MLLMs | Zijia Zhao et.al. | 2406.09367 | link |
| 2024-06-13 | ElicitationGPT: Text Elicitation Mechanisms via Language Models | Yifan Wu et.al. | 2406.09363 | null |
| 2024-06-13 | DiscreteSLU: A Large Language Model with Self-Supervised Discrete Speech Units for Spoken Language Understanding | Suwon Shon et.al. | 2406.09345 | null |
| 2024-06-12 | Improving LLMs for Recommendation with Out-Of-Vocabulary Tokens | Ting-Ji Huang et.al. | 2406.08477 | null |
| 2024-06-12 | Real2Code: Reconstruct Articulated Objects via Code Generation | Zhao Mandi et.al. | 2406.08474 | null |
| 2024-06-12 | Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing | Zhangchen Xu et.al. | 2406.08464 | link |
| 2024-06-12 | ConceptHash: Interpretable Fine-Grained Hashing via Concept Discovery | Kam Woh Ng et.al. | 2406.08457 | link |
| 2024-06-12 | TasTe: Teaching Large Language Models to Translate through Self-Reflection | Yutong Wang et.al. | 2406.08434 | link |
| 2024-06-12 | Next-Generation Database Interfaces: A Survey of LLM-based Text-to-SQL | Zijin Hong et.al. | 2406.08426 | null |
| 2024-06-12 | OmniCorpus: An Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text | Qingyun Li et.al. | 2406.08418 | link |
| 2024-06-12 | Discovering Preference Optimization Algorithms with and for Large Language Models | Chris Lu et.al. | 2406.08414 | link |
| 2024-06-12 | Memory Is All You Need: An Overview of Compute-in-Memory Architectures for Accelerating Large Language Model Inference | Christopher Wolters et.al. | 2406.08413 | null |
| 2024-06-12 | Understanding Sounds, Missing the Questions: The Challenge of Object Hallucination in Large Audio-Language Models | Chun-Yi Kuan et.al. | 2406.08402 | link |
| 2024-06-11 | Open-LLM-Leaderboard: From Multi-choice to Open-style Questions for LLMs Evaluation, Benchmark, and Arena | Aidar Myrzakhan et.al. | 2406.07545 | link |
| 2024-06-11 | QuickLLaMA: Query-aware Inference Acceleration for Large Language Models | Jingyao Li et.al. | 2406.07528 | link |
| 2024-06-11 | Beyond Model Collapse: Scaling Up with Synthesized Data Requires Reinforcement | Yunzhen Feng et.al. | 2406.07515 | null |
| 2024-06-11 | THaLLE: Text Hyperlocally Augmented Large Language Extension – Technical Report | KBTG Labs et.al. | 2406.07505 | null |
| 2024-06-11 | Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions | Renjie Pi et.al. | 2406.07502 | link |
| 2024-06-11 | TextGrad: Automatic “Differentiation” via Text | Mert Yuksekgonul et.al. | 2406.07496 | link |
| 2024-06-11 | CADS: A Systematic Literature Review on the Challenges of Abstractive Dialogue Summarization | Frederic Kirstein et.al. | 2406.07494 | null |
| 2024-06-11 | PITCH: Productivity and Mental Well-being Coaching through Daily Conversational Interaction | Adnan Abbas et.al. | 2406.07485 | null |
| 2024-06-11 | Advancing Annotation of Stance in Social Media Posts: A Comparative Analysis of Large Language Models and Crowd Sourcing | Mao Li et.al. | 2406.07483 | null |
| 2024-06-11 | VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs | Zesen Cheng et.al. | 2406.07476 | link |
| 2024-06-10 | Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation | Peize Sun et.al. | 2406.06525 | link |
| 2024-06-10 | UMBRELA: UMbrela is the (Open-Source Reproduction of the) Bing RELevance Assessor | Shivani Upadhyay et.al. | 2406.06519 | link |
| 2024-06-10 | NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative | Asmar Nadeem et.al. | 2406.06499 | null |
| 2024-06-10 | Towards a Personal Health Large Language Model | Justin Cosentino et.al. | 2406.06474 | null |
| 2024-06-10 | AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction | Zhen Xing et.al. | 2406.06465 | null |
| 2024-06-10 | Transforming Wearable Data into Health Insights using Large Language Model Agents | Mike A. Merrill et.al. | 2406.06464 | null |
| 2024-06-10 | VCR: Visual Caption Restoration | Tianyu Zhang et.al. | 2406.06462 | link |
| 2024-06-10 | Reasoning in Token Economies: Budget-Aware Evaluation of LLM Reasoning Strategies | Junlin Wang et.al. | 2406.06461 | null |
| 2024-06-10 | Evaluating the Retrieval Component in LLM-Based Question Answering Systems | Ashkan Alinejad et.al. | 2406.06458 | null |
| 2024-06-10 | A Large Language Model Pipeline for Breast Cancer Oncology | Tristen Pool et.al. | 2406.06455 | null |
| 2024-06-07 | 3D-GRAND: Towards Better Grounding and Less Hallucination for 3D-LLMs | Jianing Yang et.al. | 2406.05132 | null |
| 2024-06-07 | An Empirical Study on Parameter-Efficient Fine-Tuning for MultiModal Large Language Models | Xiongtao Zhou et.al. | 2406.05130 | null |
| 2024-06-07 | Towards Semantic Equivalence of Tokenization in Multimodal LLM | Shengqiong Wu et.al. | 2406.05127 | null |
| 2024-06-07 | Categorizing Sources of Information for Explanations in Conversational AI Systems for Older Adults Aging in Place | Niharika Mathur et.al. | 2406.05111 | null |
| 2024-06-07 | LINX: A Language Driven Generative System for Goal-Oriented Automated Data Exploration | Tavor Lipman et.al. | 2406.05107 | null |
| 2024-06-07 | Multi-Head RAG: Solving Multi-Aspect Problems with LLMs | Maciej Besta et.al. | 2406.05085 | link |
| 2024-06-07 | Are Large Language Models More Empathetic than Humans? | Anuradha Welivita et.al. | 2406.05063 | null |
| 2024-06-07 | Robustness Assessment of Mathematical Reasoning in the Presence of Missing and Contradictory Conditions | Shi-Yu Tian et.al. | 2406.05055 | null |
| 2024-06-07 | Hints-In-Browser: Benchmarking Language Models for Programming Feedback Generation | Nachiket Kotalwar et.al. | 2406.05053 | null |
| 2024-06-07 | Bootstrapping Referring Multi-Object Tracking | Yani Zhang et.al. | 2406.05039 | null |
| 2024-06-06 | Verbalized Machine Learning: Revisiting Machine Learning with Language Models | Tim Z. Xiao et.al. | 2406.04344 | null |
| 2024-06-06 | RoboMamba: Multimodal State Space Model for Efficient Robot Reasoning and Manipulation | Jiaming Liu et.al. | 2406.04339 | null |
| 2024-06-06 | Coherent Zero-Shot Visual Instruction Generation | Quynh Phung et.al. | 2406.04337 | null |
| 2024-06-06 | DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs | Lingchen Meng et.al. | 2406.04334 | null |
| 2024-06-06 | PaCE: Parsimonious Concept Engineering for Large Language Models | Jinqi Luo et.al. | 2406.04331 | link |
| 2024-06-06 | Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step | Zhanhao Liang et.al. | 2406.04314 | null |
| 2024-06-06 | Semantically Diverse Language Generation for Uncertainty Estimation in Language Models | Lukas Aichberger et.al. | 2406.04306 | link |
| 2024-06-06 | Text-to-Drive: Diverse Driving Behavior Synthesis via Large Language Models | Phat Nguyen et.al. | 2406.04300 | null |
| 2024-06-06 | What Languages are Easy to Language-Model? A Perspective from Learning Probabilistic Regular Languages | Nadav Borenstein et.al. | 2406.04289 | null |
| 2024-06-06 | Characterizing Similarities and Divergences in Conversational Tones in Humans and LLMs by Sampling with People | Dun-Ming Huang et.al. | 2406.04278 | link |
| 2024-06-05 | Wings: Learning Multimodal LLMs without Text-only Forgetting | Yi-Kai Zhang et.al. | 2406.03496 | null |
| 2024-06-05 | Seq1F1B: Efficient Sequence-Level Pipeline Parallelism for Large Language Model Training | Sun Ao et.al. | 2406.03488 | null |
| 2024-06-05 | Analyzing LLM Behavior in Dialogue Summarization: Unveiling Circumstantial Hallucination Trends | Sanjana Ramprasad et.al. | 2406.03487 | null |
| 2024-06-05 | BIPED: Pedagogically Informed Tutoring System for ESL Education | Soonwoo Kwon et.al. | 2406.03486 | null |
| 2024-06-05 | Does your data spark joy? Performance gains from domain upsampling at the end of training | Cody Blakeney et.al. | 2406.03476 | null |
| 2024-06-05 | AD-H: Autonomous Driving with Hierarchical Agents | Zaibin Zhang et.al. | 2406.03474 | null |
| 2024-06-05 | What is the Best Way for ChatGPT to Translate Poetry? | Shanshan Wang et.al. | 2406.03450 | null |
| 2024-06-05 | Pre-trained Large Language Models Use Fourier Features to Compute Addition | Tianyi Zhou et.al. | 2406.03445 | null |
| 2024-06-05 | Investigating the Relationship Between User Specialization and Toxicity on Reddit: A Sentiment Analysis Approach | Abi Oppenheim et.al. | 2406.03443 | null |
| 2024-06-05 | Cycles of Thought: Measuring LLM Confidence through Stable Explanations | Evan Becker et.al. | 2406.03441 | null |
| 2024-06-04 | Learning to grok: Emergence of in-context learning and skill composition in modular arithmetic tasks | Tianyu He et.al. | 2406.02550 | link |
| 2024-06-04 | Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning | Alex Jinpeng Wang et.al. | 2406.02547 | link |
| 2024-06-04 | To Believe or Not to Believe Your LLM | Yasin Abbasi Yadkori et.al. | 2406.02543 | null |
| 2024-06-04 | Loki: Low-Rank Keys for Efficient Sparse Attention | Prajwal Singhania et.al. | 2406.02542 | null |
| 2024-06-04 | Parrot: Multilingual Visual Instruction Tuning | Hai-Long Sun et.al. | 2406.02539 | null |
| 2024-06-04 | Mitigate Position Bias in Large Language Models via Scaling a Single Dimension | Yijiong Yu et.al. | 2406.02536 | null |
| 2024-06-04 | SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices | Ruslan Svirschevski et.al. | 2406.02532 | null |
| 2024-06-04 | Scalable MatMul-free Language Modeling | Rui-Jie Zhu et.al. | 2406.02528 | link |
| 2024-06-04 | CheckEmbed: Effective Verification of LLM Solutions to Open-Ended Tasks | Maciej Besta et.al. | 2406.02524 | null |
| 2024-06-04 | RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots | Soroush Nasiriany et.al. | 2406.02523 | null |
| 2024-05-31 | Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis | Chaoyou Fu et.al. | 2405.21075 | null |
| 2024-05-31 | Grammar-Aligned Decoding | Kanghee Park et.al. | 2405.21047 | null |
| 2024-05-31 | Direct Alignment of Language Models via Quality-Aware Self-Refinement | Runsheng Yu et.al. | 2405.21040 | null |
| 2024-05-31 | Standards for Belief Representations in LLMs | Daniel A. Herrmann et.al. | 2405.21030 | null |
| 2024-05-31 | LACIE: Listener-Aware Finetuning for Confidence Calibration in Large Language Models | Elias Stengel-Eskin et.al. | 2405.21028 | link |
| 2024-05-31 | Improved Techniques for Optimization-Based Jailbreaking on Large Language Models | Xiaojun Jia et.al. | 2405.21018 | link |
| 2024-05-31 | DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models | Linli Yao et.al. | 2405.20985 | null |
| 2024-05-31 | Enhancing Noise Robustness of Retrieval-Augmented Language Models with Adaptive Adversarial Training | Feiteng Fang et.al. | 2405.20978 | null |
| 2024-05-31 | SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales | Tianyang Xu et.al. | 2405.20974 | link |
| 2024-05-31 | LCQ: Low-Rank Codebook based Quantization for Large Language Models | Wen-Pu Cai et.al. | 2405.20973 | null |
| 2024-05-30 | MotionLLM: Understanding Human Behaviors from Human Motions and Videos | Ling-Hao Chen et.al. | 2405.20340 | null |
| 2024-05-30 | Visual Perception by Large Language Model’s Weights | Feipeng Ma et.al. | 2405.20339 | null |
| 2024-05-30 | Xwin-LM: Strong and Scalable Alignment Practice for LLMs | Bolin Ni et.al. | 2405.20335 | link |
| 2024-05-31 | ParSEL: Parameterized Shape Editing with Language | Aditya Ganeshan et.al. | 2405.20319 | null |
| 2024-05-30 | CausalQuest: Collecting Natural Causal Questions for AI Agents | Roberto Ceraolo et.al. | 2405.20318 | link |
| 2024-05-30 | ANAH: Analytical Annotation of Hallucinations in Large Language Models | Ziwei Ji et.al. | 2405.20315 | link |
| 2024-05-30 | Sequence-Augmented SE(3)-Flow Matching For Conditional Protein Backbone Generation | Guillaume Huguet et.al. | 2405.20313 | null |
| 2024-05-30 | Large Language Models Can Self-Improve At Web Agent Tasks | Ajay Patel et.al. | 2405.20309 | null |
| 2024-05-30 | Group Robust Preference Optimization in Reward-free RLHF | Shyam Sundhar Ramesh et.al. | 2405.20304 | link |
| 2024-05-30 | Who Writes the Review, Human or AI? | Panagiotis C. Theocharopoulos et.al. | 2405.20285 | null |
| 2024-05-29 | X-VILA: Cross-Modality Alignment for Large Language Model | Hanrong Ye et.al. | 2405.19335 | null |
| 2024-05-29 | LLMs Meet Multimodal Generation and Editing: A Survey | Yingqing He et.al. | 2405.19334 | link |
| 2024-05-29 | Multi-Modal Generative Embedding Model | Feipeng Ma et.al. | 2405.19333 | null |
| 2024-05-29 | Self-Exploring Language Models: Active Preference Elicitation for Online Alignment | Shenao Zhang et.al. | 2405.19332 | link |
| 2024-05-29 | Normative Modules: A Generative Agent Architecture for Learning Norms that Supports Multi-Agent Cooperation | Atrisha Sarkar et.al. | 2405.19328 | null |
| 2024-05-29 | MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series | Ge Zhang et.al. | 2405.19327 | null |
| 2024-05-29 | Reasoning3D – Grounding and Reasoning in 3D: Fine-Grained Zero-Shot Open-Vocabulary 3D Reasoning Part Segmentation via Large Vision-Language Models | Tianrun Chen et.al. | 2405.19326 | null |
| 2024-05-29 | Nearest Neighbor Speculative Decoding for LLM Generation and Attribution | Minghan Li et.al. | 2405.19325 | null |
| 2024-05-29 | Are Large Language Models Chameleons? | Mingmeng Geng et.al. | 2405.19323 | null |
| 2024-05-29 | Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF | Shicong Cen et.al. | 2405.19320 | null |
| 2024-05-28 | Don’t Forget to Connect! Improving RAG with Graph-based Reranking | Jialin Dong et.al. | 2405.18414 | null |
| 2024-05-28 | Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass | Ethan Shen et.al. | 2405.18400 | link |
| 2024-05-28 | Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning | Yixiao Zhang et.al. | 2405.18386 | link |
| 2024-05-28 | OwLore: Outlier-weighed Layerwise Sampled Low-Rank Projection for Memory-Efficient LLM Fine-tuning | Pengxiang Li et.al. | 2405.18380 | link |
| 2024-05-28 | LLaMA-NAS: Efficient Neural Architecture Search for Large Language Models | Anthony Sarah et.al. | 2405.18377 | null |
| 2024-05-28 | Empowering Source-Free Domain Adaptation with MLLM-driven Curriculum Learning | Dongjie Chen et.al. | 2405.18376 | link |
| 2024-05-28 | Thai Winograd Schemas: A Benchmark for Thai Commonsense Reasoning | Phakphum Artkaew et.al. | 2405.18375 | null |
| 2024-05-28 | PromptWizard: Task-Aware Agent-driven Prompt Optimization Framework | Eshaan Agarwal et.al. | 2405.18369 | null |
| 2024-05-28 | Is a 3D-Tokenized LLM the Key to Reliable Autonomous Driving? | Yifan Bai et.al. | 2405.18361 | null |
| 2024-05-28 | Bridging the Gap: Dynamic Learning Strategies for Improving Multilingual Performance in LLMs | Somnath Kumar et.al. | 2405.18359 | null |
| 2024-05-27 | Matryoshka Multimodal Models | Mu Cai et.al. | 2405.17430 | null |
| 2024-05-27 | NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models | Chankyu Lee et.al. | 2405.17428 | null |
| 2024-05-27 | Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model | Kuan-Chih Huang et.al. | 2405.17427 | link |
| 2024-05-27 | LARM: Large Auto-Regressive Model for Long-Horizon Embodied Intelligence | Zhuoling Li et.al. | 2405.17424 | null |
| 2024-05-27 | Self-Corrected Multimodal Large Language Model for End-to-End Robot Manipulation | Jiaming Liu et.al. | 2405.17418 | null |
| 2024-05-27 | THREAD: Thinking Deeper with Recursive Spawning | Philip Schroeder et.al. | 2405.17402 | null |
| 2024-05-27 | MindMerger: Efficient Boosting LLM Reasoning in non-English Languages | Zixian Huang et.al. | 2405.17386 | null |
| 2024-05-27 | ReMoDetect: Reward Models Recognize Aligned LLM’s Generations | Hyunseok Lee et.al. | 2405.17382 | null |
| 2024-05-27 | RTL-Repo: A Benchmark for Evaluating LLMs on Large-Scale RTL Design Projects | Ahmed Allam et.al. | 2405.17378 | null |
| 2024-05-27 | Navigating the Safety Landscape: Measuring Risks in Finetuning Large Language Models | ShengYun Peng et.al. | 2405.17374 | null |
| 2024-05-24 | Scaling Laws for Discriminative Classification in Large Language Models | Dean Wyatte et.al. | 2405.15765 | null |
| 2024-05-24 | Large Language Models Reflect Human Citation Patterns with a Heightened Citation Bias | Andres Algaba et.al. | 2405.15739 | null |
| 2024-05-24 | More Insight from Being More Focused: Analysis of Clustered Market Apps | Maleknaz Nayebi et.al. | 2405.15737 | null |
| 2024-05-24 | LM4LV: A Frozen Large Language Model for Low-level Vision Tasks | Boyang Zheng et.al. | 2405.15734 | null |
| 2024-05-24 | Optimizing Large Language Models for OpenAPI Code Completion | Bohdan Petryshyn et.al. | 2405.15729 | null |
| 2024-05-24 | Prompt-Aware Adapter: Towards Learning Adaptive Visual Tokens for Multimodal Large Language Models | Yue Zhang et.al. | 2405.15684 | null |
| 2024-05-24 | What Do You See? Enhancing Zero-Shot Image Classification with Multimodal Large Language Models | Abdelrahman Abdelhamed et.al. | 2405.15668 | null |
| 2024-05-24 | Class Machine Unlearning for Complex Data via Concepts Inference and Data Poisoning | Wenhan Chang et.al. | 2405.15662 | null |
| 2024-05-24 | $\mathbf{L^2\cdot M = C^2}$ Large Language Models as Covert Channels… a Systematic Analysis | Simen Gaure et.al. | 2405.15652 | null |
| 2024-05-24 | LLM-based Robot Task Planning with Exceptional Handling for General Purpose Service Robots | Ruoyu Wang et.al. | 2405.15646 | null |
| 2024-05-23 | A Nurse is Blue and Elephant is Rugby: Cross Domain Alignment in Large Language Models Reveal Human-like Patterns | Asaf Yehudai et.al. | 2405.14863 | null |
| 2024-05-23 | Bitune: Bidirectional Instruction-Tuning | Dawid J. Kopiczko et.al. | 2405.14862 | null |
| 2024-05-23 | PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression | Vladimir Malinovskii et.al. | 2405.14852 | null |
| 2024-05-23 | HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models | Bernal Jiménez Gutiérrez et.al. | 2405.14831 | null |
| 2024-05-23 | Can LLMs Solve longer Math Word Problems Better? | Xin Xu et.al. | 2405.14804 | null |
| 2024-05-23 | Lessons from the Trenches on Reproducible Evaluation of Language Models | Stella Biderman et.al. | 2405.14782 | null |
| 2024-05-23 | WISE: Rethinking the Knowledge Memory for Lifelong Model Editing of Large Language Models | Peng Wang et.al. | 2405.14768 | link |
| 2024-05-23 | FinRobot: An Open-Source AI Agent Platform for Financial Applications using Large Language Models | Hongyang Yang et.al. | 2405.14767 | link |
| 2024-05-23 | Evaluating Large Language Models for Public Health Classification and Extraction Tasks | Joshua Harris et.al. | 2405.14766 | null |
| 2024-05-23 | Large language models can be zero-shot anomaly detectors for time series? | Sarah Alnegheimish et.al. | 2405.14755 | null |
| 2024-05-21 | Reducing Transformer Key-Value Cache Size with Cross-Layer Attention | William Brandon et.al. | 2405.12981 | null |
| 2024-05-21 | Energy Rank Alignment: Using Preference Optimization to Search Chemical Space at Scale | Shriram Chennakesavalu et.al. | 2405.12961 | null |
| 2024-05-21 | Aggregation of Reasoning: A Hierarchical Framework for Enhancing Answer Selection in Large Language Models | Zhangyue Yin et.al. | 2405.12939 | null |
| 2024-05-21 | Skin-in-the-Game: Decision Making via Multi-Stakeholder Alignment in LLMs | Bilgehan Sel et.al. | 2405.12933 | null |
| 2024-05-21 | Code-mixed Sentiment and Hate-speech Prediction | Anjali Yadav et.al. | 2405.12929 | null |
| 2024-05-21 | Streamlining Software Reviews: Efficient Predictive Modeling with Minimal Examples | Tim Menzies et.al. | 2405.12920 | null |
| 2024-05-21 | G-DIG: Towards Gradient-based DIverse and hiGh-quality Instruction Data Selection for Machine Translation | Xingyuan Pan et.al. | 2405.12915 | null |
| 2024-05-21 | An Empirical Study and Analysis of Text-to-Image Generation Using Large Language Model-Powered Textual Representation | Zhiyu Tan et.al. | 2405.12914 | null |
| 2024-05-21 | Topic Modelling Case Law Using a Large Language Model and a New Taxonomy for UK Law: AI Insights into Summary Judgment | Holli Sargeant et.al. | 2405.12910 | link |
| 2024-05-21 | Adversarial DPO: Harnessing Harmful Data for Reducing Toxicity with Minimal Impact on Coherence and Evasiveness in Dialogue Agents | San Kim et.al. | 2405.12900 | null |
| 2024-05-20 | Adapting Large Multimodal Models to Distribution Shifts: The Role of In-Context Learning | Guanglin Zhou et.al. | 2405.12217 | link |
| 2024-05-20 | MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark | Hongwei Liu et.al. | 2405.12209 | link |
| 2024-05-20 | Developers’ Perceptions on the Impact of ChatGPT in Software Development: A Survey | Thiago S. Vaillant et.al. | 2405.12195 | null |
| 2024-05-20 | CT-Eval: Benchmarking Chinese Text-to-Table Performance in Large Language Models | Haoxiang Shi et.al. | 2405.12174 | null |
| 2024-05-20 | Fennec: Fine-grained Language Model Evaluation and Correction Extended through Branching and Bridging | Xiaobo Liang et.al. | 2405.12163 | link |
| 2024-05-20 | Eliciting Problem Specifications via Large Language Models | Robert E. Wray et.al. | 2405.12147 | null |
| 2024-05-20 | DTLLM-VLT: Diverse Text Generation for Visual Language Tracking Based on LLM | Xuchen Li et.al. | 2405.12139 | null |
| 2024-05-20 | MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning | Ting Jiang et.al. | 2405.12130 | link |
| 2024-05-20 | Reindex-Then-Adapt: Improving Large Language Models for Conversational Recommendation | Zhankui He et.al. | 2405.12119 | null |
| 2024-05-20 | Imp: Highly Capable Large Multimodal Models for Mobile Devices | Zhenwei Shao et.al. | 2405.12107 | link |
| 2024-05-17 | A Survey on Large Language Models with Multilingualism: Recent Advances and New Frontiers | Kaiyu Huang et.al. | 2405.10936 | link |
| 2024-05-17 | The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks | Lucius Bushnaq et.al. | 2405.10928 | null |
| 2024-05-17 | COGNET-MD, an evaluation framework and dataset for Large Language Model benchmarks in the medical domain | Dimitrios P. Panagoulias et.al. | 2405.10893 | null |
| 2024-05-17 | Application of Artificial Intelligence in Schizophrenia Rehabilitation Management: Systematic Literature Review | Hongyi Yang et.al. | 2405.10883 | null |
| 2024-05-17 | The Future of Large Language Model Pre-training is Federated | Lorenzo Sani et.al. | 2405.10853 | null |
| 2024-05-17 | Large Language Model (LLM) for Telecommunications: A Comprehensive Survey on Principles, Key Techniques, and Opportunities | Hao Zhou et.al. | 2405.10825 | null |
| 2024-05-17 | Modeling Supply Chain Interaction and Disruption: Insights from Real-world Data and Complex Adaptive System | Jiawei Feng et.al. | 2405.10818 | null |
| 2024-05-17 | ActiveLLM: Large Language Model-based Active Learning for Textual Few-Shot Scenarios | Markus Bayer et.al. | 2405.10808 | null |
| 2024-05-17 | Empowering Small-Scale Knowledge Graphs: A Strategy of Leveraging General-Purpose Knowledge Graphs for Enriched Embeddings | Albert Sawczyn et.al. | 2405.10745 | null |
| 2024-05-17 | Efficient Multimodal Large Language Models: A Survey | Yizhang Jin et.al. | 2405.10739 | link |
| 2024-05-16 | UniRAG: Universal Retrieval Augmentation for Multi-Modal Large Language Models | Sahel Sharifymoghaddam et.al. | 2405.10311 | null |
| 2024-05-16 | 4D Panoptic Scene Graph Generation | Jingkang Yang et.al. | 2405.10305 | link |
| 2024-05-16 | HW-GPT-Bench: Hardware-Aware Architecture Benchmark for Language Models | Rhea Sanjay Sukthanker et.al. | 2405.10299 | link |
| 2024-05-16 | Timeline-based Sentence Decomposition with In-Context Learning for Temporal Fact Extraction | Jianhao Chen et.al. | 2405.10288 | null |
| 2024-05-16 | FFF: Fixing Flawed Foundations in contrastive pre-training results in very strong Vision-Language models | Adrian Bulat et.al. | 2405.10286 | null |
| 2024-05-16 | Revisiting OPRO: The Limitations of Small-Scale LLMs as Optimizers | Tuo Zhang et.al. | 2405.10276 | null |
| 2024-05-16 | Keep It Private: Unsupervised Privatization of Online Text | Calvin Bao et.al. | 2405.10260 | link |
| 2024-05-16 | When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models | Xianzheng Ma et.al. | 2405.10255 | null |
| 2024-05-16 | A Systematic Evaluation of Large Language Models for Natural Language Generation Tasks | Xuanfan Ni et.al. | 2405.10251 | null |
| 2024-05-16 | IntelliExplain: Enhancing Interactive Code Generation through Natural Language Explanations for Non-Professional Programmers | Hao Yan et.al. | 2405.10250 | null |
| 2024-05-15 | Modeling Bilingual Sentence Processing: Evaluating RNN and Transformer Architectures for Cross-Language Structural Priming | Bushi Xiao et.al. | 2405.09508 | null |
| 2024-05-15 | ParaNames 1.0: Creating an Entity Name Corpus for 400+ Languages using Wikidata | Jonne Sälevä et.al. | 2405.09496 | null |
| 2024-05-15 | Beyond Flesch-Kincaid: Prompt-based Metrics Improve Difficulty Classification of Educational Texts | Donya Rooein et.al. | 2405.09482 | null |
| 2024-05-15 | Tell Me Why: Explainable Public Health Fact-Checking with Large Language Models | Majid Zarharan et.al. | 2405.09454 | link |
| 2024-05-15 | Facilitating Opinion Diversity through Hybrid NLP Approaches | Michiel van der Meer et.al. | 2405.09439 | null |
| 2024-05-15 | MicroPython Testbed for Federated Learning Algorithms | Miroslav Popovic et.al. | 2405.09423 | null |
| 2024-05-15 | Matching domain experts by training from scratch on domain knowledge | Xiaoliang Luo et.al. | 2405.09395 | null |
| 2024-05-15 | PolygloToxicityPrompts: Multilingual Evaluation of Neural Toxic Degeneration in Large Language Models | Devansh Jain et.al. | 2405.09373 | null |
| 2024-05-15 | Large Language Model Bias Mitigation from the Perspective of Knowledge Editing | Ruizhe Chen et.al. | 2405.09341 | null |
| 2024-05-15 | Prompting-based Synthetic Data Generation for Few-Shot Question Answering | Maximilian Schmidt et.al. | 2405.09335 | null |
| 2024-05-14 | Towards Enhanced RAC Accessibility: Leveraging Datasets and LLMs | Edison Jair Bejarano Sepulveda et.al. | 2405.08792 | null |
| 2024-05-14 | Incorporating Clinical Guidelines through Adapting Multi-modal Large Language Model for Prostate Cancer PI-RADS Scoring | Tiantian Zhang et.al. | 2405.08786 | null |
| 2024-05-14 | Is the Pope Catholic? Yes, the Pope is Catholic. Generative Evaluation of Intent Resolution in LLMs | Akhila Yerukola et.al. | 2405.08760 | link |
| 2024-05-14 | Distributed Threat Intelligence at the Edge Devices: A Large Language Model-Driven Approach | Syed Mhamudul Hasan et.al. | 2405.08755 | null |
| 2024-05-14 | Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding | Zhimin Li et.al. | 2405.08748 | link |
| 2024-05-14 | ALMol: Aligned Language-Molecule Translation LLMs through Offline Preference Contrastive Optimisation | Dimitris Gkoumas et.al. | 2405.08619 | null |
| 2024-05-14 | A Comprehensive Survey of Large Language Models and Multimodal Large Language Models in Medicine | Hanguang Xiao et.al. | 2405.08603 | null |
| 2024-05-14 | EVDA: Evolving Deepfake Audio Detection Continual Learning Benchmark | Xiaohui Zhang et.al. | 2405.08596 | null |
| 2024-05-14 | Falcon 7b for Software Mention Detection in Scholarly Documents | AmeerAli Khan et.al. | 2405.08514 | null |
| 2024-05-14 | Archimedes-AUEB at SemEval-2024 Task 5: LLM explains Civil Procedure | Odysseas S. Chlapanis et.al. | 2405.08502 | null |
| 2024-05-13 | Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots | Chengyue Wu et.al. | 2405.07990 | link |
| 2024-05-13 | A Generalist Learner for Multifaceted Medical Image Interpretation | Hong-Yu Zhou et.al. | 2405.07988 | null |
| 2024-05-13 | PyZoBot: A Platform for Conversational Information Extraction and Synthesis from Curated Zotero Reference Libraries through Advanced Retrieval-Augmented Generation | Suad Alshammari et.al. | 2405.07963 | null |
| 2024-05-13 | AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments | Samuel Schmidgall et.al. | 2405.07960 | null |
| 2024-05-13 | EconLogicQA: A Question-Answering Benchmark for Evaluating Large Language Models in Economic Sequential Reasoning | Yinzhu Quan et.al. | 2405.07938 | link |
| 2024-05-13 | PARDEN, Can You Repeat That? Defending against Jailbreaks via Repetition | Ziyang Zhang et.al. | 2405.07932 | link |
| 2024-05-13 | Can Better Text Semantics in Prompt Tuning Improve VLM Generalization? | Hari Chandana Kuchibhotla et.al. | 2405.07921 | null |
| 2024-05-13 | A Systematic Investigation of Distilling Large Language Models into Cross-Encoders for Passage Re-ranking | Ferdinand Schlatt et.al. | 2405.07920 | link |
| 2024-05-13 | Russian-Language Multimodal Dataset for Automatic Summarization of Scientific Papers | Alena Tsanda et.al. | 2405.07886 | null |
| 2024-05-13 | Reproducing the Metric-Based Evaluation of a Set of Controllable Text Generation Techniques | Michela Lorandi et.al. | 2405.07875 | null |
| 2024-05-10 | Linearizing Large Language Models | Jean Mercat et.al. | 2405.06640 | link |
| 2024-05-10 | Value Augmented Sampling for Language Model Alignment and Personalization | Seungwook Han et.al. | 2405.06639 | link |
| 2024-05-10 | Federated Document Visual Question Answering: A Pilot Study | Khanh Nguyen et.al. | 2405.06636 | null |
| 2024-05-10 | Characterizing the Accuracy - Efficiency Trade-off of Low-rank Decomposition in Language Models | Chakshu Moar et.al. | 2405.06626 | null |
| 2024-05-10 | What Can Natural Language Processing Do for Peer Review? | Ilia Kuznetsov et.al. | 2405.06563 | null |
| 2024-05-10 | Mitigating Hallucinations in Large Language Models via Self-Refinement-Enhanced Knowledge Retrieval | Mengjia Niu et.al. | 2405.06545 | null |
| 2024-05-10 | Prompting Large Language Models with Knowledge Graphs for Question Answering Involving Long-tail Facts | Wenyu Huang et.al. | 2405.06524 | null |
| 2024-05-10 | UniDM: A Unified Framework for Data Manipulation with Large Language Models | Yichen Qian et.al. | 2405.06510 | null |
| 2024-05-10 | Aspect-based Sentiment Evaluation of Chess Moves (ASSESS): an NLP-based Method for Evaluating Chess Strategies from Textbooks | Haifa Alrdahi et.al. | 2405.06499 | null |
| 2024-05-10 | Storypark: Leveraging Large Language Models to Enhance Children Story Learning Through Child-AI collaboration Storytelling | Lyumanshan Ye et.al. | 2405.06495 | null |
| 2024-05-09 | Natural Language Processing RELIES on Linguistics | Juri Opitz et.al. | 2405.05966 | null |
| 2024-05-09 | OpenBA-V2: Reaching 77.3% High Compression Ratio with Fast Multi-Stage Pruning | Dan Qiao et.al. | 2405.05957 | link |
| 2024-05-09 | Probing Multimodal LLMs as World Models for Driving | Shiva Sreeram et.al. | 2405.05956 | link |
| 2024-05-09 | Smurfs: Leveraging Multiple Proficiency Agents with Context-Efficiency for Tool Planning | Junzhi Chen et.al. | 2405.05955 | null |
| 2024-05-09 | CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts | Jiachen Li et.al. | 2405.05949 | link |
| 2024-05-09 | Trustworthy AI-Generative Content in Intelligent 6G Network: Adversarial, Privacy, and Fairness | Siyuan Li et.al. | 2405.05930 | null |
| 2024-05-09 | Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations? | Zorik Gekhman et.al. | 2405.05904 | null |
| 2024-05-09 | Co-driver: VLM-based Autonomous Driving Assistant with Human-like Behavior and Understanding for Complex Road Scenes | Ziang Guo et.al. | 2405.05885 | link |
| 2024-05-09 | FlockGPT: Guiding UAV Flocking with Linguistic Orchestration | Artem Lykov et.al. | 2405.05872 | null |
| 2024-05-09 | Robots Can Feel: LLM-based Framework for Robot Ethical Reasoning | Artem Lykov et.al. | 2405.05824 | link |
| 2024-05-08 | You Only Cache Once: Decoder-Decoder Architectures for Language Models | Yutao Sun et.al. | 2405.05254 | link |
| 2024-05-08 | Open Source Language Models Can Provide Feedback: Evaluating LLMs’ Ability to Help Students Using GPT-4-As-A-Judge | Charles Koutcheme et.al. | 2405.05253 | link |
| 2024-05-09 | LLMs with Personalities in Multi-issue Negotiation Games | Sean Noh et.al. | 2405.05248 | null |
| 2024-05-08 | SuFIA: Language-Guided Augmented Dexterity for Robotic Surgical Assistants | Masoud Moghani et.al. | 2405.05226 | null |
| 2024-05-08 | Conv-Basis: A New Paradigm for Efficient Attention Inference and Gradient Computation in Transformers | Jiuxiang Gu et.al. | 2405.05219 | null |
| 2024-05-08 | MIDGARD: Self-Consistency Using Minimum Description Length for Structured Commonsense Reasoning | Inderjeet Nair et.al. | 2405.05189 | null |
| 2024-05-08 | Air Gap: Protecting Privacy-Conscious Conversational Agents | Eugene Bagdasaryan et.al. | 2405.05175 | null |
| 2024-05-08 | XAMPLER: Learning to Retrieve Cross-Lingual In-Context Examples | Peiqin Lin et.al. | 2405.05116 | null |
| 2024-05-08 | QFMTS: Generating Query-Focused Summaries over Multi-Table Inputs | Weijia Zhang et.al. | 2405.05109 | null |
| 2024-05-08 | Concerns on Bias in Large Language Models when Creating Synthetic Personae | Helena A. Haxvig et.al. | 2405.05080 | null |
| 2024-05-07 | ChatHuman: Language-driven 3D Human Understanding with Retrieval-Augmented Tool Reasoning | Jing Lin et.al. | 2405.04533 | null |
| 2024-05-07 | QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving | Yujun Lin et.al. | 2405.04532 | link |
| 2024-05-07 | NaturalCodeBench: Examining Coding Performance Mismatch on HumanEval and Natural User Prompts | Shudan Zhang et.al. | 2405.04520 | link |
| 2024-05-07 | xLSTM: Extended Long Short-Term Memory | Maximilian Beck et.al. | 2405.04517 | link |
| 2024-05-07 | A Transformer with Stack Attention | Jiaoda Li et.al. | 2405.04515 | link |
| 2024-05-08 | Unveiling Disparities in Web Task Handling Between Human and Web Agent | Kihoon Son et.al. | 2405.04497 | null |
| 2024-05-07 | Toward In-Context Teaching: Adapting Examples to Students’ Misconceptions | Alexis Ross et.al. | 2405.04495 | null |
| 2024-05-07 | The Silicone Ceiling: Auditing GPT’s Race and Gender Biases in Hiring | Lena Armstrong et.al. | 2405.04412 | null |
| 2024-05-07 | Learning To See But Forgetting To Follow: Visual Instruction Tuning Makes LLMs More Prone To Jailbreak Attacks | Georgios Pantazopoulos et.al. | 2405.04403 | link |
| 2024-05-07 | Large Language Models Cannot Explain Themselves | Advait Sarkar et.al. | 2405.04382 | null |
| 2024-05-06 | Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs | Muhammad Uzair Khattak et.al. | 2405.03690 | null |
| 2024-05-06 | Large Language Models Reveal Information Operation Goals, Tactics, and Narrative Frames | Keith Burghardt et.al. | 2405.03688 | null |
| 2024-05-06 | Language-Image Models with 3D Understanding | Jang Hyun Cho et.al. | 2405.03685 | null |
| 2024-05-06 | AtomGPT: Atomistic Generative Pre-trained Transformer for Forward and Inverse Materials Design | Kamal Choudhary et.al. | 2405.03680 | null |
| 2024-05-06 | A New Robust Partial $p$-Wasserstein-Based Metric for Comparing Distributions | Sharath Raghvendra et.al. | 2405.03664 | null |
| 2024-05-06 | When LLMs Meet Cybersecurity: A Systematic Literature Review | Jie Zhang et.al. | 2405.03644 | null |
| 2024-05-06 | A Controlled Experiment on the Energy Efficiency of the Source Code Generated by Code Llama | Vlad-Andrei Cursaru et.al. | 2405.03616 | null |
| 2024-05-06 | Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment | Abhinav Agarwalla et.al. | 2405.03594 | null |
| 2024-05-06 | AlphaMath Almost Zero: process Supervision without process | Guoxin Chen et.al. | 2405.03553 | null |
| 2024-05-06 | MAmmoTH2: Scaling Instructions from the Web | Xiang Yue et.al. | 2405.03548 | null |
| 2024-05-03 | Leveraging Large Language Models to Enhance Domain Expert Inclusion in Data Science Workflows | Jasmine Y. Shih et.al. | 2405.02260 | null |
| 2024-05-03 | What matters when building vision-language models? | Hugo Laurençon et.al. | 2405.02246 | null |
| 2024-05-03 | REASONS: A benchmark for REtrieval and Automated citationS Of scieNtific Sentences using Public and Proprietary LLMs | Deepa Tilwani et.al. | 2405.02228 | null |
| 2024-05-03 | Fair Risk Control: A Generalized Framework for Calibrating Multi-group Fairness Risks | Lujing Zhang et.al. | 2405.02225 | null |
| 2024-05-03 | FairEvalLLM. A Comprehensive Framework for Benchmarking Fairness in Large Language Model Recommender Systems | Yashar Deldjoo et.al. | 2405.02219 | null |
| 2024-05-03 | Automatic Programming: Large Language Models and Beyond | Michael R. Lyu et.al. | 2405.02213 | null |
| 2024-05-03 | Assessing and Verifying Task Utility in LLM-Powered Applications | Negar Arabzadeh et.al. | 2405.02178 | null |
| 2024-05-03 | The AI Review Lottery: Widespread AI-Assisted Peer Reviews Boost Paper Scores and Acceptance Rates | Giuseppe Russo Latona et.al. | 2405.02150 | null |
| 2024-05-03 | MedReadMe: A Systematic Study for Fine-grained Sentence Readability in Medical Domain | Chao Jiang et.al. | 2405.02144 | null |
| 2024-05-03 | Optimising Calls to Large Language Models with Uncertainty-Based Two-Tier Selection | Guillem Ramírez et.al. | 2405.02134 | null |
| 2024-05-02 | Plan-Seq-Learn: Language Model Guided RL for Solving Long Horizon Robotics Tasks | Murtaza Dalal et.al. | 2405.01534 | null |
| 2024-05-02 | OmniDrive: A Holistic LLM-Agent Framework for Autonomous Driving with 3D Perception, Reasoning and Planning | Shihao Wang et.al. | 2405.01533 | link |
| 2024-05-02 | FLAME: Factuality-Aware Alignment for Large Language Models | Sheng-Chieh Lin et.al. | 2405.01525 | null |
| 2024-05-02 | Transformer-Aided Semantic Communications | Matin Mortaheb et.al. | 2405.01521 | null |
| 2024-05-02 | Analyzing the Role of Semantic Representations in the Era of Large Language Models | Zhijing Jin et.al. | 2405.01502 | link |
| 2024-05-02 | Supporting Business Document Workflows via Collection-Centric Information Foraging with Large Language Models | Raymond Fok et.al. | 2405.01501 | null |
| 2024-05-02 | Controllable Text Generation in the Instruction-Tuning Era | Dhananjay Ashok et.al. | 2405.01490 | null |
| 2024-05-02 | NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment | Gerald Shen et.al. | 2405.01481 | link |
| 2024-05-02 | V-FLUTE: Visual Figurative Language Understanding with Textual Explanations | Arkadiy Saakyan et.al. | 2405.01474 | link |
| 2024-05-02 | Advancing human-centric AI for robust X-ray analysis through holistic self-supervised learning | Théo Moutakanni et.al. | 2405.01469 | null |
| 2024-05-01 | Is Bigger Edit Batch Size Always Better? – An Empirical Study on Model Editing with Llama-3 | Junsang Yoon et.al. | 2405.00664 | link |
| 2024-05-01 | HalluVault: A Novel Logic Programming-aided Metamorphic Testing Framework for Detecting Fact-Conflicting Hallucinations in Large Language Models | Ningke Li et.al. | 2405.00648 | null |
| 2024-05-01 | When Quantization Affects Confidence of Large Language Models? | Irina Proskurina et.al. | 2405.00632 | link |
| 2024-05-01 | “I’m Not Sure, But…”: Examining the Impact of Large Language Models’ Uncertainty Expression on User Reliance and Trust | Sunnie S. Y. Kim et.al. | 2405.00623 | null |
| 2024-05-01 | Addressing Topic Granularity and Hallucination in Large Language Models for Topic Modelling | Yida Mu et.al. | 2405.00611 | null |
| 2024-05-01 | Investigating Automatic Scoring and Feedback using Large Language Models | Gloria Ashiya Katuka et.al. | 2405.00602 | null |
| 2024-05-01 | Are Models Biased on Text without Gender-related Language? | Catarina G Belém et.al. | 2405.00588 | link |
| 2024-05-01 | The Real, the Better: Aligning Large Language Models with Online Human Behaviors | Guanying Jiang et.al. | 2405.00578 | null |
| 2024-05-01 | EALD-MLLM: Emotion Analysis in Long-sequential and De-identity videos with Multi-modal Large Language Model | Deng Li et.al. | 2405.00574 | null |
| 2024-05-01 | Spherical Linear Interpolation and Text-Anchoring for Zero-shot Composed Image Retrieval | Young Kyun Jang et.al. | 2405.00571 | null |
| 2024-04-30 | DOCCI: Descriptions of Connected and Contrasting Images | Yasumasa Onoe et.al. | 2404.19753 | null |
| 2024-04-30 | Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation | Yunhao Ge et.al. | 2404.19752 | null |
| 2024-04-30 | PrivComp-KG : Leveraging Knowledge Graph and Large Language Models for Privacy Policy Compliance Verification | Leon Garza et.al. | 2404.19744 | null |
| 2024-04-30 | Better & Faster Large Language Models via Multi-token Prediction | Fabian Gloeckle et.al. | 2404.19737 | null |
| 2024-04-30 | A Framework for Leveraging Human Computation Gaming to Enhance Knowledge Graphs for Accuracy Critical Generative AI Applications | Steph Buongiorno et.al. | 2404.19729 | null |
| 2024-04-30 | PANGeA: Procedural Artificial Narrative using Generative AI for Turn-Based Video Games | Steph Buongiorno et.al. | 2404.19721 | null |
| 2024-04-30 | Assessing LLMs in Malicious Code Deobfuscation of Real-world Malware Campaigns | Constantinos Patsakis et.al. | 2404.19715 | null |
| 2024-04-30 | Automated Generation of High-Quality Medical Simulation Scenarios Through Integration of Semi-Structured Data and Large Language Models | Scott Sumpter et.al. | 2404.19713 | null |
| 2024-04-30 | When to Retrieve: Teaching LLMs to Utilize Information Retrieval Effectively | Tiziano Labruna et.al. | 2404.19705 | link |
| 2024-04-30 | Naturally Supervised 3D Visual Grounding with Language-Regularized Concept Learners | Chun Feng et.al. | 2404.19696 | null |
| 2024-04-29 | Hallucination of Multimodal Large Language Models: A Survey | Zechen Bai et.al. | 2404.18930 | link |
| 2024-04-29 | DPO Meets PPO: Reinforced Token Optimization for RLHF | Han Zhong et.al. | 2404.18922 | link |
| 2024-04-29 | TheaterGen: Character Management with LLM for Consistent Multi-turn Image Generation | Junhao Cheng et.al. | 2404.18919 | null |
| 2024-04-29 | Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting | Fangcheng Liu et.al. | 2404.18911 | link |
| 2024-04-29 | Human-in-the-Loop Synthetic Text Data Inspection with Provenance Tracking | Hong Jin Kang et.al. | 2404.18881 | link |
| 2024-04-29 | More RLHF, More Trust? On The Impact of Human Preference Alignment On Language Model Trustworthiness | Aaron J. Li et.al. | 2404.18870 | link |
| 2024-04-29 | Truth-value judgment in language models: belief directions are context sensitive | Stefan F. Schouten et.al. | 2404.18865 | null |
| 2024-04-29 | Performance-Aligned LLMs for Generating Fast Code | Daniel Nichols et.al. | 2404.18864 | null |
| 2024-04-29 | VERT: Verified Equivalent Rust Transpilation with Few-Shot Learning | Aidan Z. H. Yang et.al. | 2404.18852 | null |
| 2024-04-29 | It’s Difficult to be Neutral – Human and LLM-based Sentiment Annotation of Patient Comments | Petter Mæhlum et.al. | 2404.18832 | null |
| 2024-04-26 | Probabilistic Inference in Language Models via Twisted Sequential Monte Carlo | Stephen Zhao et.al. | 2404.17546 | link |
| 2024-04-26 | Large Language Model Agent as a Mechanical Designer | Yayati Jadhav et.al. | 2404.17525 | null |
| 2024-04-26 | On the Use of Large Language Models to Generate Capability Ontologies | Luis Miguel Vieira da Silva et.al. | 2404.17524 | null |
| 2024-04-26 | Enhancing Legal Compliance and Regulation Analysis with Large Language Models | Shabnam Hassani et.al. | 2404.17522 | null |
| 2024-04-26 | A Comprehensive Evaluation on Event Reasoning of Large Language Models | Zhengwei Tao et.al. | 2404.17513 | link |
| 2024-04-26 | Learning text-to-video retrieval from image captioning | Lucas Ventura et.al. | 2404.17498 | null |
| 2024-04-26 | CEval: A Benchmark for Evaluating Counterfactual Text Generation | Van Bach Nguyen et.al. | 2404.17475 | link |
| 2024-04-26 | Ruffle&Riley: Insights from Designing and Evaluating a Large Language Model-Based Conversational Tutoring System | Robin Schmucker et.al. | 2404.17460 | null |
| 2024-04-26 | “ChatGPT Is Here to Help, Not to Replace Anybody” – An Evaluation of Students’ Opinions On Integrating ChatGPT In CS Courses | Bruno Pereira Cipriano et.al. | 2404.17443 | null |
| 2024-04-26 | InspectorRAGet: An Introspection Platform for RAG Evaluation | Kshitij Fadnis et.al. | 2404.17347 | link |
| 2024-04-25 | Make-it-Real: Unleashing Large Multimodal Model’s Ability for Painting 3D Objects with Realistic Materials | Ye Fang et.al. | 2404.16829 | null |
| 2024-04-25 | How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites | Zhe Chen et.al. | 2404.16821 | link |
| 2024-04-25 | IndicGenBench: A Multilingual Benchmark to Evaluate Generation Capabilities of LLMs on Indic Languages | Harman Singh et.al. | 2404.16816 | null |
| 2024-04-25 | Make Your LLM Fully Utilize the Context | Shengnan An et.al. | 2404.16811 | link |
| 2024-04-25 | Improving Diversity of Commonsense Generation by Large Language Models via In-Context Learning | Tianhui Zhang et.al. | 2404.16807 | null |
| 2024-04-25 | Weak-to-Strong Extrapolation Expedites Alignment | Chujie Zheng et.al. | 2404.16792 | link |
| 2024-04-25 | SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension | Bohao Li et.al. | 2404.16790 | link |
| 2024-04-25 | Continual Learning of Large Language Models: A Comprehensive Survey | Haizhou Shi et.al. | 2404.16789 | link |
| 2024-04-25 | Prefix Text as a Yarn: Eliciting Non-English Alignment in Foundation Language Model | Runzhe Zhan et.al. | 2404.16766 | null |
| 2024-04-25 | RadGenome-Chest CT: A Grounded Vision-Language Dataset for Chest CT Analysis | Xiaoman Zhang et.al. | 2404.16754 | null |
| 2024-04-24 | Hybrid LLM/Rule-based Approaches to Business Insights Generation from Structured Data | Aliaksei Vertsel et.al. | 2404.15604 | null |
| 2024-04-24 | ImplicitAVE: An Open-Source Dataset and Multimodal LLMs Benchmark for Implicit Attribute Value Extraction | Henry Peng Zou et.al. | 2404.15592 | link |
| 2024-04-24 | Can Foundational Large Language Models Assist with Conducting Pharmaceuticals Manufacturing Investigations? | Hossein Salami et.al. | 2404.15578 | null |
| 2024-04-23 | PRISM: Patient Records Interpretation for Semantic Clinical Trial Matching using Large Language Models | Shashi Kant Gupta et.al. | 2404.15549 | null |
| 2024-04-23 | Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models | Mihir Parmar et.al. | 2404.15522 | link |
| 2024-04-23 | Visual Delta Generator with Large Multi-modal Models for Semi-supervised Composed Image Retrieval | Young Kyun Jang et.al. | 2404.15516 | null |
| 2024-04-23 | ToM-LM: Delegating Theory Of Mind Reasoning to External Symbolic Executors in Large Language Models | Weizhi Tang et.al. | 2404.15515 | null |
| 2024-04-23 | GeoLLM-Engine: A Realistic Environment for Building Geospatial Copilots | Simranjit Singh et.al. | 2404.15500 | null |
| 2024-04-23 | IryoNLP at MEDIQA-CORR 2024: Tackling the Medical Error Detection & Correction Task On the Shoulders of Medical Agents | Jean-Philippe Corbeil et.al. | 2404.15488 | link |
| 2024-04-23 | Large Language Models Spot Phishing Emails with Surprising Accuracy: A Comparative Analysis of Performance | Het Patel et.al. | 2404.15485 | null |
| 2024-04-23 | Aligning LLM Agents by Learning Latent Preference from User Edits | Ge Gao et.al. | 2404.15269 | null |
| 2024-04-23 | XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts | Yifeng Ding et.al. | 2404.15247 | link |
| 2024-04-23 | Revisiting Unnaturalness for Automated Program Repair in the Era of Large Language Models | Aidan Z. H. Yang et.al. | 2404.15236 | null |
| 2024-04-23 | Re-Thinking Inverse Graphics With Large Language Models | Peter Kulits et.al. | 2404.15228 | null |
| 2024-04-23 | Setting up the Data Printer with Improved English to Ukrainian Machine Translation | Yurii Paniv et.al. | 2404.15196 | null |
| 2024-04-23 | Regressive Side Effects of Training Language Models to Mimic Student Misconceptions | Shashank Sonkar et.al. | 2404.15156 | null |
| 2024-04-23 | Bias patterns in the application of LLMs for clinical decision support: A comprehensive study | Raphael Poulain et.al. | 2404.15149 | null |
| 2024-04-23 | Rethinking LLM Memorization through the Lens of Adversarial Compression | Avi Schwarzschild et.al. | 2404.15146 | null |
| 2024-04-23 | MedDr: Diagnosis-Guided Bootstrapping for Large-Scale Medical Vision-Language Learning | Sunan He et.al. | 2404.15127 | null |
| 2024-04-23 | Multimodal Large Language Model is a Human-Aligned Annotator for Text-to-Image Generation | Xun Wu et.al. | 2404.15100 | null |
| 2024-04-22 | AutoAD III: The Prequel – Back to the Pixels | Tengda Han et.al. | 2404.14412 | null |
| 2024-04-22 | SpaceByte: Towards Deleting Tokenization from Large Language Modeling | Kevin Slagle et.al. | 2404.14408 | link |
| 2024-04-22 | RTP-LX: Can LLMs Evaluate Toxicity in Multilingual Scenarios? | Adrian de Wynter et.al. | 2404.14397 | null |
| 2024-04-22 | A Survey on Self-Evolution of Large Language Models | Zhengwei Tao et.al. | 2404.14387 | null |
| 2024-04-22 | Beyond Scaling: Predicting Patent Approval with Domain-specific Fine-grained Claim Dependency Graph | Xiaochen Kev Gao et.al. | 2404.14372 | link |
| 2024-04-22 | Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data | Fahim Tajwar et.al. | 2404.14367 | link |
| 2024-04-22 | Better Synthetic Data by Retrieving and Transforming Existing Datasets | Saumya Gandhi et.al. | 2404.14361 | link |
| 2024-04-22 | Rethinking Legal Compliance Automation: Opportunities with Large Language Models | Shabnam Hassani et.al. | 2404.14356 | null |
| 2024-04-22 | Automated Long Answer Grading with RiceChem Dataset | Shashank Sonkar et.al. | 2404.14316 | null |
| 2024-04-22 | Explaining Arguments’ Strength: Unveiling the Role of Attacks and Supports (Technical Report) | Xiang Yin et.al. | 2404.14304 | null |
| 2024-04-19 | MoVA: Adapting Mixture of Vision Experts to Multimodal Context | Zhuofan Zong et.al. | 2404.13046 | link |
| 2024-04-19 | Unified Scene Representation and Reconstruction for 3D Large Language Models | Tao Chu et.al. | 2404.13044 | null |
| 2024-04-19 | Data Alignment for Zero-Shot Concept Generation in Dermatology AI | Soham Gadgil et.al. | 2404.13043 | null |
| 2024-04-19 | LaPA: Latent Prompt Assist Model For Medical Visual Question Answering | Tiancheng Gu et.al. | 2404.13039 | link |
| 2024-04-19 | Sample Design Engineering: An Empirical Study of What Makes Good Downstream Fine-Tuning Samples for LLMs | Biyang Guo et.al. | 2404.13033 | link |
| 2024-04-19 | When Life gives you LLMs, make LLM-ADE: Large Language Models with Adaptive Data Engineering | Stephen Choi et.al. | 2404.13028 | null |
| 2024-04-19 | Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models | Chuofan Ma et.al. | 2404.13013 | null |
| 2024-04-19 | Rethinking the Evaluation of Dialogue Systems: Effects of User Feedback on Crowdworkers and LLMs | Clemencia Siro et.al. | 2404.12994 | link |
| 2024-04-19 | RedactBuster: Entity Type Recognition from Redacted Documents | Mirco Beltrame et.al. | 2404.12991 | null |
| 2024-04-19 | FineRec: Exploring Fine-grained Sequential Recommendation | Xiaokun Zhang et.al. | 2404.12975 | null |
| 2024-04-18 | BLINK: Multimodal Large Language Models Can See but Not Perceive | Xingyu Fu et.al. | 2404.12390 | null |
| 2024-04-18 | MedThink: Explaining Medical Visual Question Answering via Multimodal Decision-Making Rationale | Xiaotang Gai et.al. | 2404.12372 | null |
| 2024-04-18 | When LLMs are Unfit Use FastFit: Fast and Effective Text Classification with Many Classes | Asaf Yehudai et.al. | 2404.12365 | null |
| 2024-04-18 | Towards a Foundation Model for Partial Differential Equation: Multi-Operator Learning and Extrapolation | Jingmin Sun et.al. | 2404.12355 | link |
| 2024-04-18 | V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning | Hang Hua et.al. | 2404.12353 | null |
| 2024-04-18 | Large Language Models in Targeted Sentiment Analysis | Nicolay Rusnachenko et.al. | 2404.12342 | link |
| 2024-04-18 | Normative Requirements Operationalization with Large Language Models | Nick Feng et.al. | 2404.12335 | null |
| 2024-04-18 | Large Language Models for Synthetic Participatory Planning of Shared Automated Electric Mobility Systems | Jiangbo Yu et.al. | 2404.12317 | null |
| 2024-04-18 | Simultaneous Interpretation Corpus Construction by Large Language Models in Distant Language Pair | Yusuke Sakai et.al. | 2404.12299 | null |
| 2024-04-18 | Augmenting emotion features in irony detection with Large language modeling | Yucheng Lin et.al. | 2404.12291 | null |
| 2024-04-17 | A Deep Dive into Large Language Models for Automated Bug Localization and Repair | Soneya Binta Hossain et.al. | 2404.11595 | null |
| 2024-04-17 | Related Work and Citation Text Generation: A Survey | Xiangci Li et.al. | 2404.11588 | null |
| 2024-04-17 | LLMTune: Accelerate Database Knob Tuning with Large Language Models | Xinmei Huang et.al. | 2404.11581 | null |
| 2024-04-17 | MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation | Kuan-Chieh et.al. | 2404.11565 | null |
| 2024-04-17 | Quantifying Multilingual Performance of Large Language Models Across Languages | Zihao Li et.al. | 2404.11553 | null |
| 2024-04-17 | Evaluating Span Extraction in Generative Paradigm: A Reflection on Aspect-Based Sentiment Analysis | Soyoung Yang et.al. | 2404.11539 | null |
| 2024-04-17 | Pack of LLMs: Model Fusion at Test-Time via Perplexity Optimization | Costas Mavromatis et.al. | 2404.11531 | null |
| 2024-04-17 | Embedding Privacy in Computational Social Science and Artificial Intelligence Research | Keenan Jones et.al. | 2404.11515 | null |
| 2024-04-17 | Towards Coarse-to-Fine Evaluation of Inference Efficiency for Large Language Models | Yushuo Chen et.al. | 2404.11502 | link |
| 2024-04-17 | Paraphrase and Solve: Exploring and Exploiting the Impact of Surface Form on Mathematical Reasoning in Large Language Models | Yue Zhou et.al. | 2404.11500 | link |
| 2024-04-16 | Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback | Qiwei Di et.al. | 2404.10776 | null |
| 2024-04-16 | LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation? | Yuchi Wang et.al. | 2404.10763 | link |
| 2024-04-16 | Deep Learning and LLM-based Methods Applied to Stellar Lightcurve Classification | Yu-Yang Li et.al. | 2404.10757 | null |
| 2024-04-16 | Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study | Shusheng Xu et.al. | 2404.10719 | null |
| 2024-04-16 | An empirical study on code review activity prediction in practice | Doriane Olewicki et.al. | 2404.10703 | null |
| 2024-04-16 | Automating REST API Postman Test Cases Using LLM | S Deepika Sri et.al. | 2404.10678 | null |
| 2024-04-16 | ViTextVQA: A Large-Scale Visual Question Answering Dataset for Evaluating Vietnamese Text Comprehension in Images | Quan Van Nguyen et.al. | 2404.10652 | link |
| 2024-04-16 | Self-playing Adversarial Language Game Enhances LLM Reasoning | Pengyu Cheng et.al. | 2404.10642 | link |
| 2024-04-16 | HLAT: High-quality Large Language Model Pre-trained on AWS Trainium | Haozheng Fan et.al. | 2404.10630 | null |
| 2024-04-16 | Private Attribute Inference from Images with Vision-Language Models | Batuhan Tömekçe et.al. | 2404.10618 | null |
| 2024-04-15 | Personalized Collaborative Fine-Tuning for On-Device Large Language Models | Nicolas Wagner et.al. | 2404.09753 | null |
| 2024-04-15 | Quantization of Large Language Models with an Overdetermined Basis | Daniil Merkulov et.al. | 2404.09737 | null |
| 2024-04-15 | Unveiling Imitation Learning: Exploring the Impact of Data Falsity to Large Language Model | Hyunsoo Cho et.al. | 2404.09717 | null |
| 2024-04-15 | Enhancing Robot Explanation Capabilities through Vision-Language Models: a Preliminary Study by Interpreting Visual Inputs for Improved Human-Robot Interaction | David Sobrín-Hidalgo et.al. | 2404.09705 | null |
| 2024-04-15 | Generative AI for Game Theory-based Mobile Networking | Long He et.al. | 2404.09699 | null |
| 2024-04-15 | Are Large Language Models Reliable Argument Quality Annotators? | Nailia Mirzakhmedova et.al. | 2404.09696 | null |
| 2024-04-15 | LoRAP: Transformer Sub-Layers Deserve Differentiated Structured Compression for Large Language Models | Guangyan Li et.al. | 2404.09695 | null |
| 2024-04-15 | Multi-News+: Cost-efficient Dataset Cleansing via LLM-based Data Annotation | Juhwan Choi et.al. | 2404.09682 | null |
| 2024-04-15 | Do LLMs Understand Visual Anomalies? Uncovering LLM Capabilities in Zero-shot Anomaly Detection | Jiaqi Zhu et.al. | 2404.09654 | null |
| 2024-04-15 | Bridging Vision and Language Spaces with Assignment Prediction | Jungin Park et.al. | 2404.09632 | link |
| 2024-04-12 | Enhancing Visual Question Answering through Question-Driven Image Captions as Prompts | Övgü Özdemir et.al. | 2404.08589 | link |
| 2024-04-12 | Enhancing Autonomous Vehicle Training with Language Model Integration and Critical Scenario Generation | Hanlin Tian et.al. | 2404.08570 | null |
| 2024-04-12 | RLHF Deciphered: A Critical Analysis of Reinforcement Learning from Human Feedback for LLMs | Shreyas Chaudhari et.al. | 2404.08555 | null |
| 2024-04-12 | Online Safety Analysis for LLMs: a Benchmark, an Assessment, and a Path Forward | Xuan Xie et.al. | 2404.08517 | null |
| 2024-04-12 | Efficient Interactive LLM Serving with Proxy Model-based Sequence Length Prediction | Haoran Qiu et.al. | 2404.08509 | link |
| 2024-04-12 | LaSagnA: Language-based Segmentation Assistant for Complex Queries | Cong Wei et.al. | 2404.08506 | link |
| 2024-04-12 | Strategic Interactions between Large Language Models-based Agents in Beauty Contests | Siting Lu et.al. | 2404.08492 | null |
| 2024-04-12 | Thematic Analysis with Large Language Models: does it work with languages other than English? A targeted test in Italian | Stefano De Paoli et.al. | 2404.08488 | null |
| 2024-04-12 | Comparing Apples to Oranges: LLM-powered Multimodal Intention Prediction in an Object Categorization Task | Hassan Ali et.al. | 2404.08424 | null |
| 2024-04-12 | AdapterSwap: Continuous Training of LLMs with Data Removal and Access-Control Guarantees | William Fleshman et.al. | 2404.08417 | null |
| 2024-04-11 | OpenBias: Open-set Bias Detection in Text-to-Image Generative Models | Moreno D’Incà et.al. | 2404.07990 | null |
| 2024-04-11 | View Selection for 3D Captioning via Diffusion Ranking | Tiange Luo et.al. | 2404.07984 | null |
| 2024-04-11 | Manipulating Large Language Models to Increase Product Visibility | Aounon Kumar et.al. | 2404.07981 | link |
| 2024-04-11 | LLoCO: Learning Long Contexts Offline | Sijun Tan et.al. | 2404.07979 | link |
| 2024-04-11 | Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models | Haotian Zhang et.al. | 2404.07973 | null |
| 2024-04-11 | Leveraging Large Language Models (LLMs) to Support Collaborative Human-AI Online Risk Data Annotation | Jinkyung Park et.al. | 2404.07926 | null |
| 2024-04-11 | LaVy: Vietnamese Multimodal Large Language Model | Chi Tran et.al. | 2404.07922 | null |
| 2024-04-11 | AmpleGCG: Learning a Universal and Transferable Generative Model of Adversarial Suffixes for Jailbreaking Both Open and Closed LLMs | Zeyi Liao et.al. | 2404.07921 | link |
| 2024-04-11 | DesignQA: A Multimodal Benchmark for Evaluating Large Language Models’ Understanding of Engineering Documentation | Anna C. Doris et.al. | 2404.07917 | link |
| 2024-04-11 | High-Dimension Human Value Representation in Large Language Models | Samuel Cahyawijaya et.al. | 2404.07900 | null |
| 2024-04-10 | UMBRAE: Unified Multimodal Decoding of Brain Signals | Weihao Xia et.al. | 2404.07202 | null |
| 2024-04-10 | Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention | Tsendsuren Munkhdalai et.al. | 2404.07143 | null |
| 2024-04-11 | Semantically-correlated memories in a dense associative model | Thomas F Burns et.al. | 2404.07123 | null |
| 2024-04-10 | Continuous Language Model Interpolation for Dynamic and Controllable Text Generation | Sara Kangaslahti et.al. | 2404.07117 | null |
| 2024-04-11 | From Model-centered to Human-Centered: Revision Distance as a Metric for Text Evaluation in LLMs-based Applications | Yongqiang Ma et.al. | 2404.07108 | null |
| 2024-04-10 | Graph Chain-of-Thought: Augmenting Large Language Models by Reasoning on Graphs | Bowen Jin et.al. | 2404.07103 | null |
| 2024-04-10 | Dynamic Generation of Personalities with Large Language Models | Jianzhi Liu et.al. | 2404.07084 | null |
| 2024-04-10 | VLLMs Provide Better Context for Emotion Understanding Through Common Sense Reasoning | Alexandros Xenos et.al. | 2404.07078 | link |
| 2024-04-10 | Exploring Concept Depth: How Large Language Models Acquire Knowledge at Different Layers? | Mingyu Jin et.al. | 2404.07066 | link |
| 2024-04-10 | Groundedness in Retrieval-augmented Long-form Generation: An Empirical Study | Alessandro Stolfo et.al. | 2404.07060 | null |
| 2024-04-09 | Pitfalls of Conversational LLMs on News Debiasing | Ipek Baris Schlicht et.al. | 2404.06488 | null |
| 2024-04-09 | Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks | Chonghua Wang et.al. | 2404.06480 | link |
| 2024-04-09 | Automated Federated Pipeline for Parameter-Efficient Fine-Tuning of Large Language Models | Zihan Fang et.al. | 2404.06448 | null |
| 2024-04-09 | Large Language Models to the Rescue: Deadlock Resolution in Multi-Robot Systems | Kunal Garg et.al. | 2404.06413 | null |
| 2024-04-09 | AgentQuest: A Modular Benchmark Framework to Measure Progress and Improve LLM Agents | Luca Gioacchini et.al. | 2404.06411 | link |
| 2024-04-09 | Take a Look at it! Rethinking How to Evaluate Language Model Jailbreak | Hongyu Cai et.al. | 2404.06407 | link |
| 2024-04-09 | Apprentices to Research Assistants: Advancing Research with Large Language Models | M. Namvarpour et.al. | 2404.06404 | null |
| 2024-04-09 | MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies | Shengding Hu et.al. | 2404.06395 | link |
| 2024-04-09 | MuPT: A Generative Symbolic Music Pretrained Transformer | Xingwei Qu et.al. | 2404.06393 | null |
| 2024-04-09 | Latent Distance Guided Alignment Training for Large Language Models | Haotian Luo et.al. | 2404.06390 | null |
| 2024-04-08 | MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding | Bo He et.al. | 2404.05726 | null |
| 2024-04-08 | Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs | Keen You et.al. | 2404.05719 | null |
| 2024-04-08 | Comprehensive Study on German Language Models for Clinical and Biomedical Text Understanding | Ahmad Idrissi-Yaghir et.al. | 2404.05694 | null |
| 2024-04-08 | Evaluating Mathematical Reasoning Beyond Accuracy | Shijie Xia et.al. | 2404.05692 | link |
| 2024-04-08 | Retrieval-Augmented Open-Vocabulary Object Detection | Jooyeon Kim et.al. | 2404.05687 | link |
| 2024-04-08 | MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation | Kunpeng Song et.al. | 2404.05674 | null |
| 2024-04-08 | CoReS: Orchestrating the Dance of Reasoning and Segmentation | Xiaoyi Bao et.al. | 2404.05673 | null |
| 2024-04-08 | Fighting crime with Transformers: Empirical analysis of address parsing methods in payment data | Haitham Hammami et.al. | 2404.05632 | link |
| 2024-04-08 | LTNER: Large Language Model Tagging for Named Entity Recognition with Contextualized Entity Marking | Faren Yan et.al. | 2404.05624 | null |
| 2024-04-08 | MedExpQA: Multilingual Benchmarking of Large Language Models for Medical Question Answering | Iñigo Alonso et.al. | 2404.05590 | null |
| 2024-04-05 | Physical Property Understanding from Language-Embedded Feature Fields | Albert J. Zhai et.al. | 2404.04242 | null |
| 2024-04-05 | Cleared for Takeoff? Compositional & Conditional Reasoning may be the Achilles Heel to (Flight-Booking) Language Agents | Harsh Kohli et.al. | 2404.04237 | null |
| 2024-04-05 | Benchmarking and Improving Compositional Generalization of Multi-aspect Controllable Text Generation | Tianqi Zhong et.al. | 2404.04232 | link |
| 2024-04-05 | Social Skill Training with Large Language Models | Diyi Yang et.al. | 2404.04204 | null |
| 2024-04-05 | Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model | Xinrun Du et.al. | 2404.04167 | null |
| 2024-04-05 | Large language models as oracles for instantiating ontologies with domain-specific knowledge | Giovanni Ciatto et.al. | 2404.04108 | link |
| 2024-04-05 | Improving Factual Accuracy of Neural Table-to-Text Output by Addressing Input Problems in ToTTo | Barkavi Sundararajan et.al. | 2404.04103 | link |
| 2024-04-05 | Robust Preference Optimization with Provable Noise Tolerance for LLMs | Xize Liang et.al. | 2404.04102 | null |
| 2024-04-05 | Assessing the quality of information extraction | Filip Seitl et.al. | 2404.04068 | null |
| 2024-04-05 | CLUE: A Clinical Language Understanding Evaluation for LLMs | Amin Dada et.al. | 2404.04067 | null |
| 2024-04-04 | CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching | Dongzhi Jiang et.al. | 2404.03653 | link |
| 2024-04-04 | AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent | Hanyu Lai et.al. | 2404.03648 | link |
| 2024-04-04 | Capabilities of Large Language Models in Control Engineering: A Benchmark Study on GPT-4, Claude 3 Opus, and Gemini 1.0 Ultra | Darioush Kevian et.al. | 2404.03647 | null |
| 2024-04-04 | Training LLMs over Neurally Compressed Text | Brian Lester et.al. | 2404.03626 | null |
| 2024-04-04 | Unveiling LLMs: The Evolution of Latent Representations in a Temporal Knowledge Graph | Marco Bronzini et.al. | 2404.03623 | null |
| 2024-04-04 | Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models | Wenshan Wu et.al. | 2404.03622 | null |
| 2024-04-04 | DeViDe: Faceted medical knowledge for improved medical vision-language pre-training | Haozhe Luo et.al. | 2404.03618 | null |
| 2024-04-04 | Sailor: Open Language Models for South-East Asia | Longxu Dou et.al. | 2404.03608 | link |
| 2024-04-04 | Evaluating LLMs at Detecting Errors in LLM Responses | Ryo Kamoi et.al. | 2404.03602 | link |
| 2024-04-04 | Intent Detection and Entity Extraction from BioMedical Literature | Ankan Mullick et.al. | 2404.03598 | link |
| 2024-04-03 | ALOHa: A New Measure for Hallucination in Captioning Models | Suzanne Petryk et.al. | 2404.02904 | null |
| 2024-04-03 | MatAtlas: Text-driven Consistent Geometry Texturing and Material Assignment | Duygu Ceylan et.al. | 2404.02899 | null |
| 2024-04-03 | ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline | Yifan Xu et.al. | 2404.02893 | null |
| 2024-04-03 | Integrating Explanations in Learning LTL Specifications from Demonstrations | Ashutosh Gupta et.al. | 2404.02872 | null |
| 2024-04-03 | Toward Inference-optimal Mixture-of-Expert Large Language Models | Longfei Yun et.al. | 2404.02852 | null |
| 2024-04-03 | I-Design: Personalized LLM Interior Designer | Ata Çelen et.al. | 2404.02838 | null |
| 2024-04-03 | Cherry on Top: Parameter Heterogeneity and Quantization in Large Language Models | Wanyun Cui et.al. | 2404.02837 | null |
| 2024-04-03 | Retrieving Examples from Memory for Retrieval Augmented Neural Machine Translation: A Systematic Comparison | Maxime Bouthors et.al. | 2404.02835 | null |
| 2024-04-03 | Empowering Biomedical Discovery with AI Agents | Shanghua Gao et.al. | 2404.02831 | null |
| 2024-04-03 | BAdam: A Memory Efficient Full Parameter Training Method for Large Language Models | Qijun Luo et.al. | 2404.02827 | link |
| 2024-04-02 | Topic-based Watermarks for LLM-Generated Text | Alexander Nemecek et.al. | 2404.02138 | null |
| 2024-04-02 | Exploring Automated Distractor Generation for Math Multiple-choice Questions via Large Language Models | Wanyong Feng et.al. | 2404.02124 | null |
| 2024-04-02 | GINopic: Topic Modeling with Graph Isomorphism Network | Suman Adhya et.al. | 2404.02115 | link |
| 2024-04-02 | CLAPNQ: Cohesive Long-form Answers from Passages in Natural Questions for RAG systems | Sara Rosenthal et.al. | 2404.02103 | link |
| 2024-04-02 | Advancing LLM Reasoning Generalists with Preference Trees | Lifan Yuan et.al. | 2404.02078 | link |
| 2024-04-02 | Digital Forgetting in Large Language Models: A Survey of Unlearning Methods | Alberto Blanco-Justicia et.al. | 2404.02062 | null |
| 2024-04-02 | Long-context LLMs Struggle with Long In-context Learning | Tianle Li et.al. | 2404.02060 | link |
| 2024-04-02 | Deconstructing In-Context Learning: Understanding Prompts via Corruption | Namrata Shivagunde et.al. | 2404.02054 | link |
| 2024-04-02 | BERTopic-Driven Stock Market Predictions: Unraveling Sentiment Insights | Enmin Zhu et.al. | 2404.02053 | null |
| 2024-04-02 | A Survey on Large Language Model-Based Game Agents | Sihao Hu et.al. | 2404.02039 | link |
| 2024-03-29 | Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models | Atsuyuki Miyai et.al. | 2403.20331 | link |
| 2024-03-29 | Gecko: Versatile Text Embeddings Distilled from Large Language Models | Jinhyuk Lee et.al. | 2403.20327 | null |
| 2024-03-29 | Convolutional Prompting meets Language Models for Continual Learning | Anurag Roy et.al. | 2403.20317 | null |
| 2024-03-29 | Towards Greener LLMs: Bringing Energy-Efficiency to the Forefront of LLM Inference | Jovan Stojkovic et.al. | 2403.20306 | null |
| 2024-03-29 | Can LLMs Correct Physicians, Yet? Investigating Effective Interaction Methods in the Medical Domain | Burcu Sayin et.al. | 2403.20288 | null |
| 2024-03-29 | LUQ: Long-text Uncertainty Quantification for LLMs | Caiqi Zhang et.al. | 2403.20279 | null |
| 2024-04-01 | Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want | Weifeng Lin et.al. | 2403.20271 | link |
| 2024-03-29 | Latxa: An Open Language Model and Evaluation Suite for Basque | Julen Etxaniz et.al. | 2403.20266 | link |
| 2024-03-29 | ELITR-Bench: A Meeting Assistant Benchmark for Long-Context Language Models | Thibaut Thonet et.al. | 2403.20262 | null |
| 2024-03-29 | Using LLMs to Model the Beliefs and Preferences of Targeted Populations | Keiichi Namikoshi et.al. | 2403.20252 | null |
| 2024-03-28 | InterDreamer: Zero-Shot Text to 3D Dynamic Human-Object Interaction | Sirui Xu et.al. | 2403.19652 | null |
| 2024-03-28 | MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions | Kai Zhang et.al. | 2403.19651 | link |
| 2024-03-28 | Change-Agent: Towards Interactive Comprehensive Change Interpretation and Analysis from Change Detection and Change Captioning | Chenyang Liu et.al. | 2403.19646 | link |
| 2024-03-28 | Retrieval-Enhanced Knowledge Editing for Multi-Hop Question Answering in Language Models | Yucheng Shi et.al. | 2403.19631 | null |
| 2024-03-28 | Semantic Map-based Generation of Navigation Instructions | Chengzu Li et.al. | 2403.19603 | link |
| 2024-03-28 | LocCa: Visual Pretraining with Location-aware Captioners | Bo Wan et.al. | 2403.19596 | null |
| 2024-03-28 | Img2Loc: Revisiting Image Geolocalization using Multi-modality Foundation Models and Image-based Retrieval-Augmented Generation | Zhongliang Zhou et.al. | 2403.19584 | null |
| 2024-03-28 | WaterJudge: Quality-Detection Trade-off when Watermarking Large Language Models | Piotr Molenda et.al. | 2403.19548 | null |
| 2024-03-28 | LLMs as Academic Reading Companions: Extending HCI Through Synthetic Personae | Celia Chen et.al. | 2403.19506 | null |
| 2024-03-28 | Evolving Assembly Code in an Adversarial Environment | Irina Maliukov et.al. | 2403.19489 | null |
| 2024-03-27 | Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models | Yanwei Li et.al. | 2403.18814 | link |
| 2024-03-27 | ECoDepth: Effective Conditioning of Diffusion Models for Monocular Depth Estimation | Suraj Patni et.al. | 2403.18807 | link |
| 2024-03-27 | Is Modularity Transferable? A Case Study through the Lens of Knowledge Distillation | Mateusz Klimaszewski et.al. | 2403.18804 | null |
| 2024-03-27 | Long-form factuality in large language models | Jerry Wei et.al. | 2403.18802 | link |
| 2024-03-27 | 3P-LLM: Probabilistic Path Planning using Large Language Model for Autonomous Robot Navigation | Ehsan Latif et.al. | 2403.18778 | null |
| 2024-03-27 | CheckEval: Robust Evaluation Framework using Large Language Model via Checklist | Yukyung Lee et.al. | 2403.18771 | null |
| 2024-03-27 | MLDT: Multi-Level Decomposition for Complex Long-Horizon Robotic Task Planning with Open-Source Large Language Model | Yike Wu et.al. | 2403.18760 | null |
| 2024-03-27 | Understanding the Learning Dynamics of Alignment with Human Feedback | Shawn Im et.al. | 2403.18742 | link |
| 2024-03-27 | PhysicsAssistant: An LLM-Powered Interactive Learning Robot for Physics Lab Investigations | Ehsan Latif et.al. | 2403.18721 | null |
| 2024-03-27 | NL-ITI: Optimizing Probing and Intervention for Improvement of ITI Method | Jakub Hoscilowicz et.al. | 2403.18680 | link |
| 2024-03-26 | MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue Resolution | Wei Tao et.al. | 2403.17927 | link |
| 2024-03-26 | LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning | Rui Pan et.al. | 2403.17919 | null |
| 2024-03-26 | Addressing Social Misattributions of Large Language Models: An HCXAI-based Approach | Andrea Ferrario et.al. | 2403.17873 | null |
| 2024-03-26 | Exploring LLMs as a Source of Targeted Synthetic Textual Data to Minimize High Confidence Misclassifications | Philip Lippmann et.al. | 2403.17860 | null |
| 2024-03-26 | ChroniclingAmericaQA: A Large-scale Question Answering Dataset based on Historical American Newspaper Pages | Bhawna Piryani et.al. | 2403.17859 | link |
| 2024-03-26 | Verbing Weirds Language (Models): Evaluation of English Zero-Derivation in Five LLMs | David R. Mortensen et.al. | 2403.17856 | null |
| 2024-03-26 | ArabicaQA: A Comprehensive Dataset for Arabic Question Answering | Abdelrahman Abdallah et.al. | 2403.17848 | link |
| 2024-03-26 | Assessment of Multimodal Large Language Models in Alignment with Human Values | Zhelun Shi et.al. | 2403.17830 | null |
| 2024-03-26 | Accelerating Radio Spectrum Regulation Workflows with Large Language Models (LLMs) | Amir Ghasemi et.al. | 2403.17819 | null |
| 2024-03-26 | Are Compressed Language Models Less Subgroup Robust? | Leonidas Gee et.al. | 2403.17811 | link |
| 2024-03-25 | Towards Human-AI Deliberation: Design and Evaluation of LLM-Empowered Deliberative AI for AI-Assisted Decision-Making | Shuai Ma et.al. | 2403.16812 | null |
| 2024-03-25 | An LLM-Based Digital Twin for Optimizing Human-in-the Loop Systems | Hanqing Yang et.al. | 2403.16809 | null |
| 2024-03-25 | Iterative Refinement of Project-Level Code Context for Precise Code Generation with Compiler Feedback | Zhangqian Bi et.al. | 2403.16792 | null |
| 2024-03-25 | All Artificial, Less Intelligence: GenAI through the Lens of Formal Verification | Deepak Narayan Gadde et.al. | 2403.16750 | null |
| 2024-03-25 | Synapse: Learning Preferential Concepts from Visual Demonstrations | Sadanand Modak et.al. | 2403.16689 | null |
| 2024-03-25 | Investigation of the effectiveness of applying ChatGPT in Dialogic Teaching Using Electroencephalography | Jiayue Zhang et.al. | 2403.16687 | null |
| 2024-03-25 | ToXCL: A Unified Framework for Toxic Speech Detection and Explanation | Nhat M. Hoang et.al. | 2403.16685 | link |
| 2024-03-25 | RU22Fact: Optimizing Evidence for Multilingual Explainable Fact-Checking on Russia-Ukraine Conflict | Yirong Zeng et.al. | 2403.16662 | link |
| 2024-03-25 | Grammatical vs Spelling Error Correction: An Investigation into the Responsiveness of Transformer-based Language Models using BART and MarianMT | Rohit Raju et.al. | 2403.16655 | null |
| 2024-03-25 | CLHA: A Simple yet Effective Contrastive Learning Framework for Human Alignment | Feiteng Fang et.al. | 2403.16649 | link |
| 2024-03-25 | Virtual Co-Pilot: Multimodal Large Language Model-enabled Quick-access Procedures for Single Pilot Operations | Fan Li et.al. | 2403.16645 | null |
| 2024-03-25 | Conversational Grounding: Annotation and Analysis of Grounding Acts and Grounding Units | Biswesh Mohapatra et.al. | 2403.16609 | null |
| 2024-03-25 | TrustAI at SemEval-2024 Task 8: A Comprehensive Analysis of Multi-domain Machine Generated Text Detection Techniques | Ashok Urlana et.al. | 2403.16592 | null |
| 2024-03-25 | Can Large Language Models (or Humans) Distill Text? | Nicolas Audinet de Pieuchon et.al. | 2403.16584 | link |
| 2024-03-22 | LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models | Yuzhang Shang et.al. | 2403.15388 | null |
| 2024-03-22 | Long-CLIP: Unlocking the Long-Text Capability of CLIP | Beichen Zhang et.al. | 2403.15378 | link |
| 2024-03-22 | Can large language models explore in-context? | Akshay Krishnamurthy et.al. | 2403.15371 | null |
| 2024-03-22 | CoLLEGe: Concept Embedding Generation for Large Language Models | Ryan Teehan et.al. | 2403.15362 | null |
| 2024-03-22 | Multi-Review Fusion-in-Context | Aviv Slobodkin et.al. | 2403.15351 | null |
| 2024-03-22 | CO-Fun: A German Dataset on Company Outsourcing in Fund Prospectuses for Named Entity Recognition and Relation Extraction | Neda Foroutan et.al. | 2403.15322 | null |
| 2024-03-22 | Sphere Neural-Networks for Rational Reasoning | Tiansi Dong et.al. | 2403.15297 | null |
| 2024-03-22 | Measuring Gender and Racial Biases in Large Language Models | Jiafu An et.al. | 2403.15281 | null |
| 2024-03-22 | Bioinformatics and Biomedical Informatics with ChatGPT: Year One Review | Jinge Wang et.al. | 2403.15274 | null |
| 2024-03-22 | Event Temporal Relation Extraction based on Retrieval-Augmented on LLMs | Xiaobin Zhang et.al. | 2403.15273 | null |
| 2024-03-21 | MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems? | Renrui Zhang et.al. | 2403.14624 | null |
| 2024-03-21 | Language Repository for Long Video Understanding | Kumara Kahatapitiya et.al. | 2403.14622 | link |
| 2024-03-21 | Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey | Zeyu Han et.al. | 2403.14608 | null |
| 2024-03-21 | MyVLM: Personalizing VLMs for User-Specific Queries | Yuval Alaluf et.al. | 2403.14599 | null |
| 2024-03-21 | Large Language Models for Multi-Choice Question Classification of Medical Subjects | Víctor Ponce-López et.al. | 2403.14582 | null |
| 2024-03-21 | RAmBLA: A Framework for Evaluating the Reliability of LLMs as Assistants in the Biomedical Domain | William James Bolton et.al. | 2403.14578 | link |
| 2024-03-21 | A Chain-of-Thought Prompting Approach with LLMs for Evaluating Students’ Formative Assessment Responses in Science | Clayton Cohn et.al. | 2403.14565 | null |
| 2024-03-21 | EDT: Improving Large Language Models’ Generation by Entropy-based Dynamic Temperature Sampling | Shimao Zhang et.al. | 2403.14541 | null |
| 2024-03-21 | Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference | Han Zhao et.al. | 2403.14520 | link |
| 2024-03-21 | The Ethics of ChatGPT in Medicine and Healthcare: A Systematic Review on Large Language Models (LLMs) | Joschka Haltaufderheide et.al. | 2403.14473 | null |
| 2024-03-20 | RAR: Retrieving And Ranking Augmented MLLMs for Visual Recognition | Ziyu Liu et.al. | 2403.13805 | null |
| 2024-03-20 | Learning from Models and Data for Visual Grounding | Ruozhen He et.al. | 2403.13804 | null |
| 2024-03-20 | Reverse Training to Nurse the Reversal Curse | Olga Golovneva et.al. | 2403.13799 | null |
| 2024-03-20 | Chain-of-Interaction: Enhancing Large Language Models for Psychiatric Behavior Understanding by Dyadic Contexts | Guangzeng Han et.al. | 2403.13786 | null |
| 2024-03-20 | Leveraging High-Resolution Features for Improved Deep Hashing-based Image Retrieval | Aymene Berriche et.al. | 2403.13747 | null |
| 2024-03-20 | EthioLLM: Multilingual Large Language Models for Ethiopian Languages with Task Evaluation | Atnafu Lambebo Tonja et.al. | 2403.13737 | null |
| 2024-03-20 | Large Language Models meet Network Slicing Management and Orchestration | Abdulhalim Dandoush et.al. | 2403.13721 | null |
| 2024-03-20 | RoleInteract: Evaluating the Social Interaction of Role-Playing Agents | Hongzhan Chen et.al. | 2403.13679 | null |
| 2024-03-20 | Do Not Worry if You Do Not Have Data: Building Pretrained Language Models Using Translationese | Meet Doshi et.al. | 2403.13638 | null |
| 2024-03-20 | VL-Mamba: Exploring State Space Models for Multimodal Learning | Yanyuan Qiao et.al. | 2403.13600 | null |
| 2024-03-19 | Dated Data: Tracing Knowledge Cutoffs in Large Language Models | Jeffrey Cheng et.al. | 2403.12958 | link |
| 2024-03-19 | Automatic Information Extraction From Employment Tribunal Judgements Using Large Language Models | Joana Ribeiro de Faria et.al. | 2403.12936 | null |
| 2024-03-19 | Rapid AIdeation: Generating Ideas With the Self and in Collaboration With Large Language Models | Gionnieve Lim et.al. | 2403.12928 | null |
| 2024-03-19 | Supporting Energy Policy Research with Large Language Models | Grant Buster et.al. | 2403.12924 | null |
| 2024-03-19 | Semantic Layering in Room Segmentation via LLMs | Taehyeon Kim et.al. | 2403.12920 | null |
| 2024-03-19 | Toward Sustainable GenAI using Generation Directives for Carbon-Friendly Large Language Model Inference | Baolin Li et.al. | 2403.12900 | null |
| 2024-03-19 | mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding | Anwen Hu et.al. | 2403.12895 | link |
| 2024-03-19 | MEDBind: Unifying Language and Multimodal Medical Data Embeddings | Yuan Gao et.al. | 2403.12894 | null |
| 2024-03-19 | HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning | Fucai Ke et.al. | 2403.12884 | link |
| 2024-03-19 | Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models | Zehui Chen et.al. | 2403.12881 | link |
| 2024-03-18 | HDLdebugger: Streamlining HDL debugging with Large Language Models | Xufeng Yao et.al. | 2403.11671 | null |
| 2024-03-18 | Let’s Focus on Neuron: Neuron-Level Supervised Fine-tuning for Large Language Model | Haoyun Xu et.al. | 2403.11621 | null |
| 2024-03-18 | Linguacodus: A Synergistic Framework for Transformative Code Generation in Machine Learning Pipelines | Ekaterina Trofimova et.al. | 2403.11585 | null |
| 2024-03-18 | Reinforcement Learning with Token-level Feedback for Controllable Text Generation | Wendi Li et.al. | 2403.11558 | null |
| 2024-03-18 | LLM^3: Large Language Model-based Task and Motion Planning with Motion Failure Reasoning | Shu Wang et.al. | 2403.11552 | link |
| 2024-03-18 | TARN-VIST: Topic Aware Reinforcement Network for Visual Storytelling | Weiran Chen et.al. | 2403.11550 | null |
| 2024-03-18 | DEE: Dual-stage Explainable Evaluation Method for Text Generation | Shenyu Zhang et.al. | 2403.11509 | null |
| 2024-03-18 | Can LLMs Generate Human-Like Wayfinding Instructions? Towards Platform-Agnostic Embodied Instruction Synthesis | Vishnu Sashank Dorbala et.al. | 2403.11487 | null |
| 2024-03-18 | VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding | Yue Fan et.al. | 2403.11481 | null |
| 2024-03-18 | HateCOT: An Explanation-Enhanced Dataset for Generalizable Offensive Speech Detection via Large Language Models | Huy Nghiem et.al. | 2403.11456 | link |
| 2024-03-14 | Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference | Piotr Nawrot et.al. | 2403.09636 | null |
| 2024-03-14 | 3D-VLA: A 3D Vision-Language-Action Generative World Model | Haoyu Zhen et.al. | 2403.09631 | link |
| 2024-03-14 | MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training | Brandon McKinzie et.al. | 2403.09611 | null |
| 2024-03-14 | Large Language Models and Causal Inference in Collaboration: A Comprehensive Survey | Xiaoyu Liu et.al. | 2403.09606 | null |
| 2024-03-14 | Logical Discrete Graphical Models Must Supplement Large Language Models for Information Synthesis | Gregory Coppola et.al. | 2403.09599 | null |
| 2024-03-14 | ExploRLLM: Guiding Exploration in Reinforcement Learning with Large Language Models | Runyu Ma et.al. | 2403.09583 | null |
| 2024-03-14 | Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation | Yunhao Gou et.al. | 2403.09572 | null |
| 2024-03-14 | Enhancing Trust in Autonomous Agents: An Architecture for Accountability and Explainability through Blockchain and Large Language Models | Laura Fernández-Becerra et.al. | 2403.09567 | null |
| 2024-03-14 | Welcome Your New AI Teammate: On Safety Analysis by Leashing Large Language Models | Ali Nouri et.al. | 2403.09565 | null |
| 2024-03-14 | Less is More: Data Value Estimation for Visual Instruction Tuning | Zikang Liu et.al. | 2403.09559 | null |
| 2024-03-13 | Simple and Scalable Strategies to Continually Pre-train Large Language Models | Adam Ibrahim et.al. | 2403.08763 | link |
| 2024-03-13 | Steering LLMs Towards Unbiased Responses: A Causality-Guided Debiasing Framework | Jingling Li et.al. | 2403.08743 | null |
| 2024-03-13 | The Garden of Forking Paths: Observing Dynamic Parameters Distribution in Large Language Models | Carlo Nicolini et.al. | 2403.08739 | null |
| 2024-03-13 | Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization | Renjie Pi et.al. | 2403.08730 | null |
| 2024-03-14 | SOTOPIA-$π$: Interactive Learning of Socially Intelligent Language Agents | Ruiyi Wang et.al. | 2403.08715 | link |
| 2024-03-13 | Review of Generative AI Methods in Cybersecurity | Yagmur Yigit et.al. | 2403.08701 | null |
| 2024-03-13 | TeaMs-RL: Teaching LLMs to Teach Themselves Better Instructions via Reinforcement Learning | Shangding Gu et.al. | 2403.08694 | link |
| 2024-03-13 | Token Alignment via Character Matching for Subword Completion | Ben Athiwaratkun et.al. | 2403.08688 | null |
| 2024-03-13 | Zero-shot and Few-shot Generation Strategies for Artificial Clinical Records | Erlend Frayling et.al. | 2403.08664 | null |
| 2024-03-13 | Human Alignment of Large Language Models through Online Preference Optimisation | Daniele Calandriello et.al. | 2403.08635 | null |
| 2024-03-12 | Beyond Text: Frozen Large Language Models in Visual Signal Comprehension | Lei Zhu et.al. | 2403.07874 | link |
| 2024-03-12 | Rethinking Generative Large Language Model Evaluation for Semantic Comprehension | Fangyun Wei et.al. | 2403.07872 | null |
| 2024-03-12 | Exploring Safety Generalization Challenges of Large Language Models via Code | Qibing Ren et.al. | 2403.07865 | null |
| 2024-03-12 | DeliGrasp: Inferring Object Mass, Friction, and Compliance with LLMs for Adaptive and Minimally Deforming Grasp Policies | William Xie et.al. | 2403.07832 | null |
| 2024-03-12 | The Missing Piece in Model Editing: A Deep Dive into the Hidden Damage Brought By Model Editing | Jianchen Wang et.al. | 2403.07825 | null |
| 2024-03-12 | Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM | Sainbayar Sukhbaatar et.al. | 2403.07816 | link |
| 2024-03-12 | Fine-tuning Large Language Models with Sequential Instructions | Hanxu Hu et.al. | 2403.07794 | link |
| 2024-03-12 | Transforming Competition into Collaboration: The Revolutionary Role of Multi-Agent Systems and Language Models in Modern Organizations | Carlos Jose Xavier Cruz et.al. | 2403.07769 | link |
| 2024-03-12 | Synth$^2$: Boosting Visual-Language Models with Synthetic Captions and Image Embeddings | Sahand Sharifzadeh et.al. | 2403.07750 | null |
| 2024-03-12 | FineMath: A Fine-Grained Mathematical Evaluation Benchmark for Chinese Large Language Models | Yan Liu et.al. | 2403.07747 | null |
| 2024-03-11 | Hybrid Human-LLM Corpus Construction and LLM Evaluation for Rare Linguistic Phenomena | Leonie Weissweiler et.al. | 2403.06965 | null |
| 2024-03-11 | Materials science in the era of large language models: a perspective | Ge Lei et.al. | 2403.06949 | null |
| 2024-03-11 | Naming, Describing, and Quantifying Visual Objects in Humans and LLMs | Alberto Testoni et.al. | 2403.06935 | null |
| 2024-03-11 | ERA-CoT: Improving Chain-of-Thought through Entity Relationship Analysis | Yanming Liu et.al. | 2403.06932 | link |
| 2024-03-11 | MEND: Meta dEmonstratioN Distillation for Efficient and Effective In-Context Learning | Yichuan Li et.al. | 2403.06914 | link |
| 2024-03-11 | Exploring Large Language Models and Hierarchical Frameworks for Classification of Large Unstructured Legal Documents | Nishchal Prasad et.al. | 2403.06872 | null |
| 2024-03-11 | Development of a Reliable and Accessible Caregiving Language Model (CaLM) | Bambang Parmanto et.al. | 2403.06857 | null |
| 2024-03-11 | DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation | Guosheng Zhao et.al. | 2403.06845 | null |
| 2024-03-11 | RA-ISF: Learning to Answer and Understand from Retrieval Augmentation via Iterative Self-Feedback | Yanming Liu et.al. | 2403.06840 | link |
| 2024-03-11 | ACFIX: Guiding LLMs with Mined Common RBAC Practices for Context-Aware Repair of Access Control Vulnerabilities in Smart Contracts | Lyuye Zhang et.al. | 2403.06838 | null |
| 2024-03-08 | Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context | Machel Reid et.al. | 2403.05530 | null |
| 2024-03-08 | GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM | Hao Kang et.al. | 2403.05527 | link |
| 2024-03-08 | Beyond Finite Data: Towards Data-free Out-of-distribution Generalization via Extrapola | Yijiang Li et.al. | 2403.05523 | null |
| 2024-03-08 | Will GPT-4 Run DOOM? | Adrian de Wynter et.al. | 2403.05468 | null |
| 2024-03-08 | Cost-Performance Optimization for Processing Low-Resource Language Tasks Using Commercial LLMs | Arijit Nag et.al. | 2403.05434 | null |
| 2024-03-08 | Explaining Pre-Trained Language Models with Attribution Scores: An Analysis in Low-Resource Settings | Wei Zhou et.al. | 2403.05338 | null |
| 2024-03-08 | ChatASU: Evoking LLM’s Reflexion to Truly Understand Aspect Sentiment in Dialogues | Yiding Liu et.al. | 2403.05326 | null |
| 2024-03-08 | RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation | Zihao Wang et.al. | 2403.05313 | link |
| 2024-03-08 | Tapilot-Crossing: Benchmarking and Evolving LLMs Towards Interactive Data Analysis Agents | Jinyang Li et.al. | 2403.05307 | link |
| 2024-03-08 | ACLSum: A New Dataset for Aspect-based Summarization of Scientific Publications | Sotaro Takeshita et.al. | 2403.05303 | link |
| 2024-03-07 | Efficient LoFTR: Semi-Dense Local Feature Matching with Sparse-Like Speed | Yifan Wang et.al. | 2403.04765 | link |
| 2024-03-07 | iScore: Visual Analytics for Interpreting How Language Models Automatically Score Summaries | Adam Coscia et.al. | 2403.04760 | link |
| 2024-03-07 | KnowledgeVIS: Interpreting Language Models by Comparing Fill-in-the-Blank Prompts | Adam Coscia et.al. | 2403.04758 | link |
| 2024-03-07 | LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error | Boshi Wang et.al. | 2403.04746 | link |
| 2024-03-07 | SnapNTell: Enhancing Entity-Centric Visual Question Answering with Retrieval Augmented Multimodal LLM | Jielin Qiu et.al. | 2403.04735 | null |
| 2024-03-07 | ObjectCompose: Evaluating Resilience of Vision-Based Models on Object-to-Background Compositional Changes | Hashmat Shadab Malik et.al. | 2403.04701 | null |
| 2024-03-07 | Fact-Checking the Output of Large Language Models via Token-Level Uncertainty Quantification | Ekaterina Fadeeva et.al. | 2403.04696 | null |
| 2024-03-07 | PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation | Junsong Chen et.al. | 2403.04692 | null |
| 2024-03-07 | Telecom Language Models: Must They Be Large? | Nicola Piovesan et.al. | 2403.04666 | null |
| 2024-03-07 | QAQ: Quality Adaptive Quantization for LLM KV Cache | Shichen Dong et.al. | 2403.04643 | link |
| 2024-03-06 | Bridging Language and Items for Retrieval and Recommendation | Yupeng Hou et.al. | 2403.03952 | link |
| 2024-03-06 | Did Translation Models Get More Robust Without Anyone Even Noticing? | Ben Peters et.al. | 2403.03923 | null |
| 2024-03-06 | Fuzzing BusyBox: Leveraging LLM and Crash Reuse for Embedded Bug Unearthing | Asmita et.al. | 2403.03897 | null |
| 2024-03-06 | SaulLM-7B: A pioneering Large Language Model for Law | Pierre Colombo et.al. | 2403.03883 | null |
| 2024-03-06 | Learning to Decode Collaboratively with Multiple Language Models | Shannon Zejiang Shen et.al. | 2403.03870 | link |
| 2024-03-06 | On the Origins of Linear Representations in Large Language Models | Yibo Jiang et.al. | 2403.03867 | null |
| 2024-03-06 | KIWI: A Dataset of Knowledge-Intensive Writing Instructions for Answering Research Questions | Fangyuan Xu et.al. | 2403.03866 | null |
| 2024-03-06 | Are Language Models Puzzle Prodigies? Algorithmic Puzzles Unveil Serious Challenges in Multimodal Reasoning | Deepanway Ghosal et.al. | 2403.03864 | link |
| 2024-03-06 | X-Shot: A Unified System to Handle Frequent, Few-shot and Zero-shot Learning Simultaneously in Classification | Hanzi Xu et.al. | 2403.03863 | link |
| 2024-03-06 | Emojinize: Enriching Any Text with Emoji Translations | Lars Henning Klein et.al. | 2403.03857 | null |
| 2024-03-05 | The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning | Nathaniel Li et.al. | 2403.03218 | link |
| 2024-03-05 | CLEVR-POC: Reasoning-Intensive Visual Question Answering in Partially Observable Environments | Savitha Sam Abraham et.al. | 2403.03203 | null |
| 2024-03-05 | Towards Democratized Flood Risk Management: An Advanced AI Assistant Enabled by GPT-4 for Enhanced Interpretability and Public Engagement | Rafaela Martelo et.al. | 2403.03188 | link |
| 2024-03-05 | MOKA: Open-Vocabulary Robotic Manipulation through Mark-Based Visual Prompting | Fangchen Liu et.al. | 2403.03174 | null |
| 2024-03-05 | SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection | Peng Qi et.al. | 2403.03170 | null |
| 2024-03-05 | PARADISE: Evaluating Implicit Planning Skills of Language Models with Procedural Warnings and Tips Dataset | Arda Uzunoğlu et.al. | 2403.03167 | link |
| 2024-03-05 | Quantum Many-Body Physics Calculations with Large Language Models | Haining Pan et.al. | 2403.03154 | null |
| 2024-03-05 | Language Guided Exploration for RL Agents in Text Environments | Hitesh Golchha et.al. | 2403.03141 | null |
| 2024-03-05 | Angry Men, Sad Women: Large Language Models Reflect Gendered Stereotypes in Emotion Attribution | Flor Miriam Plaza-del-Arco et.al. | 2403.03121 | null |
| 2024-03-05 | “In Dialogues We Learn”: Towards Personalized Dialogue Without Pre-defined Profiles through In-Dialogue Learning | Chuanqi Cheng et.al. | 2403.03102 | null |
| 2024-03-02 | LM4OPT: Unveiling the Potential of Large Language Models in Formulating Mathematical Optimization Problems | Tasnim Ahmed et.al. | 2403.01342 | null |
| 2024-03-02 | Chaining thoughts and LLMs to learn DNA structural biophysics | Tyler D. Ross et.al. | 2403.01332 | null |
| 2024-03-02 | VNLP: Turkish NLP Package | Meliksah Turker et.al. | 2403.01309 | null |
| 2024-03-02 | VBART: The Turkish LLM | Meliksah Turker et.al. | 2403.01308 | null |
| 2024-03-02 | ICC: Quantifying Image Caption Concreteness for Multimodal Dataset Curation | Moran Yanuka et.al. | 2403.01306 | link |
| 2024-03-02 | Improving the Validity of Automatically Generated Feedback via Reinforcement Learning | Alexander Scarlatos et.al. | 2403.01304 | link |
| 2024-03-02 | NoMAD-Attention: Efficient LLM Inference on CPUs Through Multiply-add-free Attention | Tianyi Zhang et.al. | 2403.01273 | null |
| 2024-03-02 | Employing LLMs for Incident Response Planning and Review | Sam Hays et.al. | 2403.01271 | null |
| 2024-03-02 | A comprehensive cross-language framework for harmful content detection with the aid of sentiment analysis | Mohammad Dehghani et.al. | 2403.01270 | null |
| 2024-03-02 | Dissecting Language Models: Machine Unlearning via Selective Pruning | Nicholas Pochinkov et.al. | 2403.01267 | null |
| 2024-02-29 | The All-Seeing Project V2: Towards General Relation Comprehension of the Open World | Weiyun Wang et.al. | 2402.19474 | link |
| 2024-02-29 | Loose LIPS Sink Ships: Asking Questions in Battleship with Language-Informed Program Sampling | Gabriel Grand et.al. | 2402.19471 | null |
| 2024-02-29 | Towards Tracing Trustworthiness Dynamics: Revisiting Pre-training Period of Large Language Models | Chen Qian et.al. | 2402.19465 | link |
| 2024-02-29 | Curiosity-driven Red-teaming for Large Language Models | Zhang-Wei Hong et.al. | 2402.19464 | link |
| 2024-02-29 | ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL | Yifei Zhou et.al. | 2402.19446 | link |
| 2024-02-29 | Compositional API Recommendation for Library-Oriented Code Generation | Zexiong Ma et.al. | 2402.19431 | null |
| 2024-02-29 | Crafting Knowledge: Exploring the Creative Mechanisms of Chat-Based Search Engines | Lijia Ma et.al. | 2402.19421 | null |
| 2024-02-29 | On the Scaling Laws of Geographical Representation in Language Models | Nathan Godey et.al. | 2402.19406 | null |
| 2024-02-29 | Entity-Aware Multimodal Alignment Framework for News Image Captioning | Junzhe Zhang et.al. | 2402.19404 | null |
| 2024-02-29 | Wisdom of the Silicon Crowd: LLM Ensemble Prediction Capabilities Match Human Crowd Accuracy | Philipp Schoenegger et.al. | 2402.19379 | null |
| 2024-02-28 | Arithmetic Control of LLMs for Diverse User Preferences: Directional Preference Alignment with Multi-Objective Rewards | Haoxiang Wang et.al. | 2402.18571 | link |
| 2024-02-28 | A Categorization of Complexity Classes for Information Retrieval and Synthesis Using Natural Logic | Gregory Coppola et.al. | 2402.18566 | null |
| 2024-02-28 | Implicit Bias of Next-Token Prediction | Christos Thrampoulidis et.al. | 2402.18551 | null |
| 2024-02-28 | Few-Shot Fairness: Unveiling LLM’s Potential for Fairness-Aware Classification | Garima Chhikara et.al. | 2402.18502 | null |
| 2024-02-28 | Take It, Leave It, or Fix It: Measuring Productivity and Trust in Human-AI Collaboration | Crystal Qian et.al. | 2402.18498 | null |
| 2024-02-28 | Language Models Represent Beliefs of Self and Others | Wentao Zhu et.al. | 2402.18496 | null |
| 2024-02-28 | Meta-Task Prompting Elicits Embedding from Large Language Models | Yibin Lei et.al. | 2402.18458 | null |
| 2024-02-28 | Beyond Natural Language: LLMs Leveraging Alternative Formats for Enhanced Reasoning and Communication | Weize Chen et.al. | 2402.18439 | link |
| 2024-02-28 | Unsupervised Cross-Domain Image Retrieval via Prototypical Optimal Transport | Bin Li et.al. | 2402.18411 | link |
| 2024-02-28 | A Cognitive Evaluation Benchmark of Image Reasoning and Description for Large Vision Language Models | Xiujie Song et.al. | 2402.18409 | null |
Scene Understanding
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2025-12-22 | CoDrone: Autonomous Drone Navigation Assisted by Edge and Cloud Foundation Models | Pengyu Chen et.al. | 2512.19083 | null |
| 2025-12-22 | VOIC: Visible-Occluded Decoupling for Monocular 3D Semantic Scene Completion | Zaidao Han et.al. | 2512.18954 | null |
| 2025-12-20 | LLaViDA: A Large Language Vision Driving Assistant for Explicit Reasoning and Enhanced Trajectory Planning | Yudong Liu et.al. | 2512.18211 | null |
| 2025-12-19 | InsertAnywhere: Bridging 4D Scene Geometry and Diffusion Models for Realistic Video Object Insertion | Hoiyeong Jin et.al. | 2512.17504 | null |
| 2025-12-18 | MomaGraph: State-Aware Unified Scene Graphs with Vision-Language Model for Embodied Task Planning | Yuanchen Ju et.al. | 2512.16909 | null |
| 2025-12-18 | SNOW: Spatio-Temporal Scene Understanding with World Knowledge for Open-World Embodied Reasoning | Tin Stribor Sohn et.al. | 2512.16461 | null |
| 2025-12-18 | Privacy-Aware Sharing of Raw Spatial Sensor Data for Cooperative Perception | Bangya Liu et.al. | 2512.16265 | null |
| 2025-12-16 | Unified Semantic Transformer for 3D Scene Understanding | Sebastian Koch et.al. | 2512.14364 | null |
| 2025-12-16 | Consistent Instance Field for Dynamic Scene Understanding | Junyi Wu et.al. | 2512.14126 | null |
| 2025-12-16 | Deep Learning Perspective of Scene Understanding in Autonomous Robots | Afia Maham et.al. | 2512.14020 | null |
| 2025-12-15 | I-Scene: 3D Instance Models are Implicit Generalizable Spatial Learners | Lu Ling et.al. | 2512.13683 | null |
| 2025-12-15 | MMDrive: Interactive Scene Understanding Beyond Vision with Multi-representational Fusion | Minghui Hou et.al. | 2512.13177 | null |
| 2025-12-15 | DePT3R: Joint Dense Point Tracking and 3D Reconstruction of Dynamic Scenes in a Single Forward Pass | Vivek Alumootil et.al. | 2512.13122 | null |
| 2025-12-15 | SLIM-VDB: A Real-Time 3D Probabilistic Semantic Mapping Framework | Anja Sheppard et.al. | 2512.12945 | null |
| 2025-12-13 | INDOOR-LiDAR: Bridging Simulation and Reality for Robot-Centric 360 degree Indoor LiDAR Perception – A Robot-Centric Hybrid Dataset | Haichuan Li et.al. | 2512.12377 | null |
| 2025-12-13 | MRD: Using Physically Based Differentiable Rendering to Probe Vision Models for 3D Scene Understanding | Benjamin Beilharz et.al. | 2512.12307 | null |
| 2025-12-13 | A Multi-Year Urban Streetlight Imagery Dataset for Visual Monitoring and Spatio-Temporal Drift Detection | Peizheng Li et.al. | 2512.12205 | null |
| 2025-12-13 | Audio-Visual Camera Pose Estimation with Passive Scene Sounds and In-the-Wild Video | Daniel Adebi et.al. | 2512.12165 | null |
| 2025-12-12 | Evaluating Foundation Models’ 3D Understanding Through Multi-View Correspondence Analysis | Valentina Lilova et.al. | 2512.11574 | null |
| 2025-12-12 | Reconstruction as a Bridge for Event-Based Visual Question Answering | Hanyue Lou et.al. | 2512.11510 | null |
| 2025-12-12 | VLM2GeoVec: Toward Universal Multimodal Embeddings for Remote Sensing | Emanuel Sánchez Aimar et.al. | 2512.11490 | null |
| 2025-12-10 | LISN: Language-Instructed Social Navigation with VLM-based Controller Modulating | Junting Chen et.al. | 2512.09920 | null |
| 2025-12-09 | SIP: Site in Pieces - A Dataset of Disaggregated Construction-Phase 3D Scans for Semantic Segmentation and Scene Understanding | Seongyong Kim et.al. | 2512.09062 | null |
| 2025-12-09 | LapFM: A Laparoscopic Segmentation Foundation Model via Hierarchical Concept Evolving Pre-training | Qing Xu et.al. | 2512.08439 | null |
| 2025-12-09 | CVP: Central-Peripheral Vision-Inspired Multimodal Model for Spatial Reasoning | Zeyuan Chen et.al. | 2512.08135 | null |
| 2025-12-08 | SpatialDreamer: Incentivizing Spatial Reasoning via Active Mental Imagery | Meng Cao et.al. | 2512.07733 | null |
| 2025-12-08 | STRinGS: Selective Text Refinement in Gaussian Splatting | Abhinav Raundhal et.al. | 2512.07230 | null |
| 2025-12-08 | A Large-Scale Multimodal Dataset and Benchmarks for Human Activity Scene Understanding and Reasoning | Siyang Jiang et.al. | 2512.07136 | null |
| 2025-12-05 | Physics-Grounded Attached Shadow Detection Using Approximate 3D Geometry and Light Direction | Shilin Hu et.al. | 2512.06179 | null |
| 2025-12-05 | BeLLA: End-to-End Birds Eye View Large Language Assistant for Autonomous Driving | Karthik Mohan et.al. | 2512.06096 | null |
| 2025-12-05 | Distilling Expert Surgical Knowledge: How to train local surgical VLMs for anatomy explanation in Complete Mesocolic Excision | Lennart Maack et.al. | 2512.05740 | null |
| 2025-12-05 | Fast SceneScript: Accurate and Efficient Structured Language Model via Multi-Token Prediction | Ruihong Yin et.al. | 2512.05597 | null |
| 2025-12-05 | VOST-SGG: VLM-Aided One-Stage Spatio-Temporal Scene Graph Generation | Chinthani Sugandhika et.al. | 2512.05524 | null |
| 2025-12-04 | 4DLangVGGT: 4D Language-Visual Geometry Grounded Transformer | Xianfeng Wu et.al. | 2512.05060 | null |
| 2025-12-03 | C3G: Learning Compact 3D Representations with 2K Gaussians | Honggyu An et.al. | 2512.04021 | null |
| 2025-12-03 | Motion4D: Learning 3D-Consistent Motion and Semantics for 4D Scene Understanding | Haoran Zhou et.al. | 2512.03601 | null |
| 2025-12-03 | What Is The Best 3D Scene Representation for Robotics? From Geometric to Foundation Models | Tianchen Deng et.al. | 2512.03422 | null |
| 2025-12-03 | ShelfGaussian: Shelf-Supervised Open-Vocabulary Gaussian-based 3D Scene Understanding | Lingjun Zhao et.al. | 2512.03370 | null |
| 2025-12-02 | SpatialReasoner: Active Perception for Large-Scale 3D Scene Understanding | Hongpei Zheng et.al. | 2512.03284 | null |
| 2025-11-29 | When Harmful Content Gets Camouflaged: Unveiling Perception Failure of LVLMs with CamHarmTI | Yanhui Li et.al. | 2512.03087 | null |
| 2025-12-02 | Layout Anything: One Transformer for Universal Room Layout Estimation | Md Sohag Mia et.al. | 2512.02952 | null |
| 2025-12-02 | Masking Matters: Unlocking the Spatial Reasoning Capabilities of LLMs for 3D Scene-Language Understanding | Yerim Jeon et.al. | 2512.02487 | null |
| 2025-12-02 | HouseLayout3D: A Benchmark and Training-Free Baseline for 3D Layout Estimation in the Wild | Valentin Bieri et.al. | 2512.02450 | null |
| 2025-12-01 | ManualVLA: A Unified VLA Model for Chain-of-Thought Manual Generation and Robotic Manipulation | Chenyang Gu et.al. | 2512.02013 | null |
| 2025-12-01 | OpenREAD: Reinforced Open-Ended Reasoning for End-to-End Autonomous Driving with LLM-as-Critic | Songyan Zhang et.al. | 2512.01830 | null |
| 2025-12-01 | IGen: Scalable Data Generation for Robot Learning from Open-World Images | Chenghao Gu et.al. | 2512.01773 | null |
| 2025-12-01 | SPARK: Sim-ready Part-level Articulated Reconstruction with VLM Knowledge | Yumeng He et.al. | 2512.01629 | null |
| 2025-12-01 | MDiff4STR: Mask Diffusion Model for Scene Text Recognition | Yongkun Du et.al. | 2512.01422 | null |
| 2025-12-01 | VSRD++: Autolabeling for 3D Object Detection via Instance-Aware Volumetric Silhouette Rendering | Zihua Liu et.al. | 2512.01178 | null |
| 2025-11-30 | FOM-Nav: Frontier-Object Maps for Object Goal Navigation | Thomas Chabal et.al. | 2512.01009 | null |
| 2025-11-30 | Smol-GS: Compact Representations for Abstract 3D Gaussian Splatting | Haishan Wang et.al. | 2512.00850 | null |
| 2025-11-29 | Describe Anything Anywhere At Any Moment | Nicolas Gorlo et.al. | 2512.00565 | null |
| 2025-11-29 | Words into World: A Task-Adaptive Agent for Language-Guided Spatial Retrieval in AR | Lixing Guo et.al. | 2512.00294 | null |
| 2025-11-28 | DenseScan: Advancing 3D Scene Understanding with 2D Dense Annotation | Zirui Wang et.al. | 2512.00226 | null |
| 2025-10-28 | A Comprehensive Survey on Surgical Digital Twin | Afsah Sharaf Khan et.al. | 2512.00019 | null |
| 2025-11-28 | DualCamCtrl: Dual-Branch Diffusion Model for Geometry-Aware Camera-Controlled Video Generation | Hongfei Zhang et.al. | 2511.23127 | null |
| 2025-11-28 | Bharat Scene Text: A Novel Comprehensive Dataset and Benchmark for Indian Language Scene Text Understanding | Anik De et.al. | 2511.23071 | null |
| 2025-11-28 | HMR3D: Hierarchical Multimodal Representation for 3D Scene Understanding with Large Vision-Language Model | Chen Li et.al. | 2511.22961 | null |
| 2025-11-28 | See, Rank, and Filter: Important Word-Aware Clip Filtering via Scene Understanding for Moment Retrieval and Highlight Detection | YuEun Lee et.al. | 2511.22906 | null |
| 2025-11-27 | GeoZero: Incentivizing Reasoning from Scratch on Geospatial Scenes | Di Wang et.al. | 2511.22645 | null |
| 2025-11-27 | CoT4AD: A Vision-Language-Action Model with Explicit Chain-of-Thought Reasoning for Autonomous Driving | Zhaohui Wang et.al. | 2511.22532 | null |
| 2025-11-27 | RoadSceneBench: A Lightweight Benchmark for Mid-Level Road Scene Understanding | Xiyan Liu et.al. | 2511.22466 | null |
| 2025-11-26 | SurgMLLMBench: A Multimodal Large Language Model Benchmark Dataset for Surgical Scene Understanding | Tae-Min Choi et.al. | 2511.21339 | null |
| 2025-11-26 | Scenes as Tokens: Multi-Scale Normal Distributions Transform Tokenizer for General 3D Vision-Language Understanding | Yutao Tang et.al. | 2511.21191 | null |
| 2025-11-26 | Scaling Foundation Models for Radar Scene Understanding | Pushkal Mishra et.al. | 2511.21105 | null |
| 2025-11-25 | 3D-Aware Multi-Task Learning with Cross-View Correlations for Dense Scene Understanding | Xiaoye Wang et.al. | 2511.20646 | null |
| 2025-11-25 | CropVLM: Learning to Zoom for Fine-Grained Vision-Language Perception | Miguel Carvalho et.al. | 2511.19820 | null |
| 2025-11-24 | Perceptual Taxonomy: Evaluating and Guiding Hierarchical Scene Reasoning in Vision-Language Models | Jonathan Lee et.al. | 2511.19526 | null |
| 2025-11-24 | Percept-WAM: Perception-Enhanced World-Awareness-Action Model for Robust End-to-End Autonomous Driving | Jianhua Han et.al. | 2511.19221 | null |
| 2025-11-24 | AIRHILT: A Human-in-the-Loop Testbed for Multimodal Conflict Detection in Aviation | Omar Garib et.al. | 2511.18718 | null |
| 2025-11-24 | Autonomous Surface Selection For Manipulator-Based UV Disinfection In Hospitals Using Foundation Models | Xueyan Oh et.al. | 2511.18709 | null |
| 2025-11-23 | Gaze Beyond the Frame: Forecasting Egocentric 3D Visual Span | Heeseung Yun et.al. | 2511.18470 | null |
| 2025-11-22 | Plan-X: Instruct Video Generation via Semantic Planning | Lun Huang et.al. | 2511.17986 | null |
| 2025-11-21 | CORA: Consistency-Guided Semi-Supervised Framework for Reasoning Segmentation | Prantik Howlader et.al. | 2511.17755 | null |
| 2025-11-18 | Unified Low-Light Traffic Image Enhancement via Multi-Stage Illumination Recovery and Adaptive Noise Suppression | Siddiqua Namrah et.al. | 2511.17612 | null |
| 2025-11-21 | SuperQuadricOcc: Multi-Layer Gaussian Approximation of Superquadrics for Real-Time Self-Supervised Occupancy Estimation | Seamie Hayes et.al. | 2511.17361 | null |
| 2025-11-21 | Robot Confirmation Generation and Action Planning Using Long-context Q-Former Integrated with Multimodal LLM | Chiori Hori et.al. | 2511.17335 | null |
| 2025-11-20 | POMA-3D: The Point Map Way to 3D Scene Understanding | Ye Mao et.al. | 2511.16567 | null |
| 2025-11-20 | LLaVA$^3$: Representing 3D Scenes like a Cubist Painter to Boost 3D Scene Understanding of VLMs | Doriand Petit et.al. | 2511.16454 | null |
| 2025-11-20 | Building temporally coherent 3D maps with VGGT for memory-efficient Semantic SLAM | Gergely Dinya et.al. | 2511.16282 | null |
| 2025-11-20 | How Robot Dogs See the Unseeable | Oliver Bimber et.al. | 2511.16262 | null |
| 2025-11-20 | Real-Time 3D Object Detection with Inference-Aligned Learning | Chenyu Zhao et.al. | 2511.16140 | null |
| 2025-11-20 | Click2Graph: Interactive Panoptic Video Scene Graphs from a Single Click | Raphael Ruschel et.al. | 2511.15948 | null |
| 2025-11-19 | WALDO: Where Unseen Model-based 6D Pose Estimation Meets Occlusion | Sajjad Pakdamansavoji et.al. | 2511.15874 | null |
| 2025-11-19 | ShelfOcc: Native 3D Supervision beyond LiDAR for Vision-Based Occupancy Estimation | Simon Boeder et.al. | 2511.15396 | null |
| 2025-11-19 | Look, Zoom, Understand: The Robotic Eyeball for Embodied Perception | Jiashu Yang et.al. | 2511.15279 | null |
| 2025-11-18 | RocSync: Millisecond-Accurate Temporal Synchronization for Heterogeneous Camera Systems | Jaro Meyer et.al. | 2511.14948 | null |
| 2025-11-18 | Multi-view Phase-aware Pedestrian-Vehicle Incident Reasoning Framework with Vision-Language Models | Hao Zhen et.al. | 2511.14120 | null |
| 2025-11-18 | Error-Driven Scene Editing for 3D Grounding in Large Language Models | Yue Zhang et.al. | 2511.14086 | null |
| 2025-11-18 | RISE: Single Static Radar-based Indoor Scene Understanding | Kaichen Zhou et.al. | 2511.14019 | null |
| 2025-11-17 | VLMs Guided Interpretable Decision Making for Autonomous Driving | Xin Hu et.al. | 2511.13881 | null |
| 2025-11-17 | Is your VLM Sky-Ready? A Comprehensive Spatial Intelligence Benchmark for UAV Navigation | Lingfeng Zhang et.al. | 2511.13269 | null |
| 2025-11-17 | Decoupling Scene Perception and Ego Status: A Multi-Context Fusion Approach for Enhanced Generalization in End-to-End Autonomous Driving | Jiacheng Tang et.al. | 2511.13079 | null |
| 2025-11-17 | Visual Room 2.0: Seeing is Not Understanding for MLLMs | Haokun Li et.al. | 2511.12928 | null |
| 2025-11-16 | RoboAfford++: A Generative AI-Enhanced Dataset for Multimodal Affordance Learning in Robotic Manipulation and Navigation | Xiaoshuai Hao et.al. | 2511.12436 | null |
| 2025-11-14 | Large Language Models and 3D Vision for Intelligent Robotic Perception and Autonomy | Vinit Mehta et.al. | 2511.11777 | null |
| 2025-11-13 | ExpertAD: Enhancing Autonomous Driving Systems with Mixture of Experts | Haowen Jiang et.al. | 2511.11740 | null |
| 2025-11-14 | AirCopBench: A Benchmark for Multi-drone Collaborative Embodied Perception and Reasoning | Jirong Zha et.al. | 2511.11025 | null |
| 2025-11-13 | DBGroup: Dual-Branch Point Grouping for Weakly Supervised 3D Semantic Instance Segmentation | Xuexun Liu et.al. | 2511.10003 | null |
| 2025-11-12 | Spatio-Temporal Data Enhanced Vision-Language Model for Traffic Scene Understanding | Jingtian Ma et.al. | 2511.08978 | null |
| 2025-11-11 | RS-Net: Context-Aware Relation Scoring for Dynamic Scene Graph Generation | Hae-Won Jo et.al. | 2511.08651 | null |
| 2025-11-05 | Case Study: Transformer-Based Solution for the Automatic Digitization of Gas Plants | I. Bailo et.al. | 2511.08609 | null |
| 2025-11-11 | OTSNet: A Neurocognitive-Inspired Observation-Thinking-Spelling Pipeline for Scene Text Recognition | Lixu Sun et.al. | 2511.08133 | null |
| 2025-11-11 | HD$^2$-SSC: High-Dimension High-Density Semantic Scene Completion for Autonomous Driving | Zhiwen Yang et.al. | 2511.07925 | null |
| 2025-11-11 | Sparse3DPR: Training-Free 3D Hierarchical Scene Parsing and Task-Adaptive Subgraph Reasoning from Sparse RGB Views | Haida Feng et.al. | 2511.07813 | null |
| 2025-11-10 | Inference-Time Scaling of Diffusion Models for Infrared Data Generation | Kai A. Horstmann et.al. | 2511.07362 | null |
| 2025-11-10 | PlanT 2.0: Exposing Biases and Structural Flaws in Closed-Loop Driving | Simon Gerstenecker et.al. | 2511.07292 | null |
| 2025-11-10 | Omni-View: Unlocking How Generation Facilitates Understanding in Unified 3D Model based on Multiview images | JiaKui Hu et.al. | 2511.07222 | null |
| 2025-11-10 | TrueCity: Real and Simulated Urban Data for Cross-Domain 3D Scene Understanding | Duc Nguyen et.al. | 2511.07007 | null |
| 2025-11-10 | PanoNav: Mapless Zero-Shot Object Navigation with Panoramic Scene Parsing and Dynamic Memory | Qunchao Jin et.al. | 2511.06840 | null |
| 2025-11-09 | Video Dataset for Surgical Phase, Keypoint, and Instrument Recognition in Laparoscopic Surgery (PhaKIR) | Tobias Rueckert et.al. | 2511.06549 | null |
| 2025-11-08 | Interaction-Centric Knowledge Infusion and Transfer for Open-Vocabulary Scene Graph Generation | Lin Li et.al. | 2511.05935 | null |
| 2025-11-08 | Open-World 3D Scene Graph Generation for Retrieval-Augmented Reasoning | Fei Yu et.al. | 2511.05894 | null |
| 2025-11-07 | Lite VLA: Efficient Vision-Language-Action Control on CPU-Bound Edge Robots | Justin Williams et.al. | 2511.05642 | null |
| 2025-11-06 | Grounding Foundational Vision Models with 3D Human Poses for Robust Action Recognition | Nicholas Babey et.al. | 2511.05622 | null |
| 2025-10-30 | Token Is All You Need: Cognitive Planning through Belief-Intent Co-Evolution | Shiyao Sang et.al. | 2511.05540 | null |
| 2025-11-06 | GraSP-VLA: Graph-based Symbolic Action Representation for Long-Horizon Planning with VLA Policies | Maëlic Neau et.al. | 2511.04357 | null |
| 2025-11-06 | CaRF: Enhancing Multi-View Consistency in Referring 3D Gaussian Splatting Segmentation | Yuwen Tao et.al. | 2511.03992 | null |
| 2025-11-06 | Simple 3D Pose Features Support Human and Machine Social Scene Understanding | Wenshuo Qin et.al. | 2511.03988 | null |
| 2025-11-06 | Room Envelopes: A Synthetic Dataset for Indoor Layout Reconstruction from Images | Sam Bahrami et.al. | 2511.03970 | null |
| 2025-11-05 | SILVI: Simple Interface for Labeling Video Interactions | Ozan Kanbertay et.al. | 2511.03819 | null |
| 2025-11-05 | SurgViVQA: Temporally-Grounded Video Question Answering for Surgical Scene Understanding | Mauro Orazio Drago et.al. | 2511.03325 | null |
| 2025-11-04 | LEGO-Eval: Towards Fine-Grained Evaluation on Synthesizing 3D Embodied Environments with Tool Augmentation | Gyeom Hwangbo et.al. | 2511.03001 | null |
| 2025-11-04 | DetectiumFire: A Comprehensive Multi-modal Dataset Bridging Vision and Language for Fire Understanding | Zixuan Liu et.al. | 2511.02495 | null |
| 2025-11-04 | Object Detection as an Optional Basis: A Graph Matching Network for Cross-View UAV Localization | Tao Liu et.al. | 2511.02489 | link |
| 2025-11-04 | From the Laboratory to Real-World Application: Evaluating Zero-Shot Scene Interpretation on Edge Devices for Mobile Robotics | Nicolas Schuler et.al. | 2511.02427 | null |
| 2025-11-03 | Text-VQA Aug: Pipelined Harnessing of Large Multimodal Models for Automated Synthesis | Soham Joshi et.al. | 2511.02046 | null |
| 2025-10-31 | The Eigenvalues Entropy as a Classifier Evaluation Measure | Doulaye Dembélé et.al. | 2511.01904 | null |
| 2025-11-03 | A Compact Model for Polar Multiple-Channel Field Effect Transistors: A Case Study in III-V Nitride Semiconductors | Aias Asteris et.al. | 2511.01699 | null |
| 2025-11-03 | Actial: Activate Spatial Reasoning Ability of Multimodal Large Language Models | Xiaoyu Zhan et.al. | 2511.01618 | null |
| 2025-11-03 | PixelVLA: Advancing Pixel-level Understanding in Vision-Language-Action Model | Wenqi Liang et.al. | 2511.01571 | null |
| 2025-11-03 | Fast and Robust Remote Two-Qubit Gates on Distributed Qubits | Yunan Li et.al. | 2511.01418 | null |
| 2025-11-03 | A Generative Adversarial Approach to Adversarial Attacks Guided by Contrastive Language-Image Pre-trained Model | Sampriti Soor et.al. | 2511.01317 | null |
| 2025-11-03 | LiDAR-VGGT: Cross-Modal Coarse-to-Fine Fusion for Globally Consistent and Metric-Scale Dense Mapping | Lijie Wang et.al. | 2511.01186 | null |
| 2025-11-02 | GauDP: Reinventing Multi-Agent Collaboration through Gaussian-Image Synergy in Diffusion Policies | Ziye Wang et.al. | 2511.00998 | null |
| 2025-11-01 | Grounding Surgical Action Triplets with Instrument Instance Segmentation: A Dataset and Target-Aware Fusion Approach | Oluwatosin Alabi et.al. | 2511.00643 | null |
| 2025-11-01 | CueBench: Advancing Unified Understanding of Context-Aware Video Anomalies in Real-World | Yating Yu et.al. | 2511.00613 | null |
| 2025-11-01 | Diff4Splat: Controllable 4D Scene Generation with Latent Dynamic Reconstruction Models | Panwang Pan et.al. | 2511.00503 | null |
| 2025-10-30 | AI Powered High Quality Text to Video Generation with Enhanced Temporal Consistency | Piyushkumar Patel et.al. | 2511.00107 | null |
| 2025-10-31 | Toward Accurate Long-Horizon Robotic Manipulation: Language-to-Action with Foundation Models via Scene Graphs | Sushil Samuel Dinesh et.al. | 2510.27558 | null |
| 2025-10-31 | NAUTILUS: A Large Multimodal Model for Underwater Scene Understanding | Wei Xu et.al. | 2510.27481 | null |
| 2025-10-31 | Understanding the Implicit User Intention via Reasoning with Large Language Model for Image Editing | Yijia Wang et.al. | 2510.27335 | null |
| 2025-10-31 | Generative Semantic Coding for Ultra-Low Bitrate Visual Communication and Analysis | Weiming Chen et.al. | 2510.27324 | null |
| 2025-10-31 | HiGS: Hierarchical Generative Scene Framework for Multi-Step Associative Semantic Spatial Composition | Jiacheng Hong et.al. | 2510.27148 | null |
| 2025-10-30 | A Multi-Modal Neuro-Symbolic Approach for Spatial Reasoning-Based Visual Grounding in Robotics | Simindokht Jahangard et.al. | 2510.27033 | null |
| 2025-10-30 | The ANUBIS detector and its sensitivity to neutral long-lived particles | ANUBIS Collaboration et.al. | 2510.26932 | null |
| 2025-10-30 | HEIR: Learning Graph-Based Motion Hierarchies | Cheng Zheng et.al. | 2510.26786 | null |
| 2025-10-30 | Dynamic Context-Aware Scene Reasoning Using Vision-Language Alignment in Zero-Shot Real-World Scenarios | Manjunath Prasad Holenarasipura Rajiv et.al. | 2510.26580 | null |
| 2025-10-30 | AgriGS-SLAM: Orchard Mapping Across Seasons via Multi-View Gaussian Splatting SLAM | Mirko Usuelli et.al. | 2510.26358 | null |
| 2025-10-30 | GLYPH-SR: Can We Achieve Both High-Quality Image Super-Resolution and High-Fidelity Text Recovery via VLM-guided Latent Diffusion Model? | Mingyu Sung et.al. | 2510.26339 | null |
| 2025-10-30 | Letter of Intent: The Forward Physics Facility | Luis A. Anchordoqui et.al. | 2510.26260 | null |
| 2025-10-30 | Exploring Object-Aware Attention Guided Frame Association for RGB-D SLAM | Ali Caglayan et.al. | 2510.26131 | null |
| 2025-10-29 | Multimodal Spatial Reasoning in the Large Model Era: A Survey and Benchmarks | Xu Zheng et.al. | 2510.25760 | link |
| 2025-10-29 | More than a Moment: Towards Coherent Sequences of Audio Descriptions | Eshika Khandelwal et.al. | 2510.25440 | null |
| 2025-10-29 | U-CAN: Unsupervised Point Cloud Denoising with Consistency-Aware Noise2Noise Matching | Junsheng Zhou et.al. | 2510.25210 | null |
| 2025-10-29 | EA3D: Online Open-World 3D Object Extraction from Streaming Videos | Xiaoyu Zhou et.al. | 2510.25146 | null |
| 2025-10-29 | Learning Spatial-Aware Manipulation Ordering | Yuxiang Yan et.al. | 2510.25138 | null |
| 2025-10-29 | Vision-Language Integration for Zero-Shot Scene Understanding in Real-World Environments | Manjunath Prasad Holenarasipura Rajiv et.al. | 2510.25070 | null |
| 2025-10-28 | VividCam: Learning Unconventional Camera Motions from Virtual Synthetic Videos | Qiucheng Wu et.al. | 2510.24904 | null |
| 2025-10-28 | Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation | Inclusion AI et.al. | 2510.24821 | link |
| 2025-10-28 | Sound Source Localization for Spatial Mapping of Surgical Actions in Dynamic Scenes | Jonas Hein et.al. | 2510.24332 | null |
| 2025-10-28 | Enhancing Vision-Language Models for Autonomous Driving through Task-Specific Prompting and Spatial Reasoning | Aodi Wu et.al. | 2510.24152 | null |
| 2025-10-27 | Optimized Loudspeaker Panning for Adaptive Sound-Field Correction and Non-stationary Listening Areas | Yuancheng Luo et.al. | 2510.23937 | null |
| 2025-10-27 | DynaStride: Dynamic Stride Windowing with MMCoT for Instructional Multi-Scene Captioning | Eddison Pham et.al. | 2510.23907 | null |
| 2025-10-27 | Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations | Yujia Zhang et.al. | 2510.23607 | null |
| 2025-10-27 | PixelRefer: A Unified Framework for Spatio-Temporal Object Referring with Arbitrary Granularity | Yuqian Yuan et.al. | 2510.23603 | link |
| 2025-10-27 | InFlux: A Benchmark for Self-Calibration of Dynamic Intrinsics of Video Cameras | Erich Liang et.al. | 2510.23589 | null |
| 2025-10-27 | Localising under the drape: proprioception in the era of distributed surgical robotic system | Martin Huber et.al. | 2510.23512 | null |
| 2025-10-27 | UrbanIng-V2X: A Large-Scale Multi-Vehicle, Multi-Infrastructure Dataset Across Multiple Intersections for Cooperative Perception | Karthikeyan Chandra Sekaran et.al. | 2510.23478 | null |
| 2025-10-27 | Evaluation of Spherical Wavelet Framework in Comparsion with Ambisonics | Ş. Ekmen et.al. | 2510.23403 | null |
| 2025-10-27 | Evaluation of Vision-LLMs in Surveillance Video | Pascal Benschop et.al. | 2510.23190 | null |
| 2025-10-27 | Adapting Interleaved Encoders with PPO for Language-Guided Reinforcement Learning in BabyAI | Aryan Mathur et.al. | 2510.23148 | null |
| 2025-10-27 | SceneDecorator: Towards Scene-Oriented Story Generation with Scene Planning and Scene Consistency | Quanjian Song et.al. | 2510.22994 | null |
| 2025-10-27 | Charting the Design Space of Neural Graph Representations for Subgraph Matching | Vaibhav Raj et.al. | 2510.22897 | null |
| 2025-10-26 | IGGT: Instance-Grounded Geometry Transformer for Semantic 3D Reconstruction | Hao Li et.al. | 2510.22706 | link |
| 2025-10-26 | Look and Tell: A Dataset for Multimodal Grounding Across Egocentric and Exocentric Views | Anna Deichler et.al. | 2510.22672 | null |
| 2025-10-25 | BLIP-FusePPO: A Vision-Language Deep Reinforcement Learning Framework for Lane Keeping in Autonomous Vehicles | Seyed Ahmad Hosseini Miangoleh et.al. | 2510.22370 | null |
| 2025-10-25 | Bridging Perception and Reasoning: Dual-Pipeline Neuro-Symbolic Landing for UAVs in Cluttered Environments | Weixian Qian et.al. | 2510.22204 | null |
| 2025-10-25 | MOGRAS: Human Motion with Grasping in 3D Scenes | Kunal Bhosikar et.al. | 2510.22199 | null |
| 2025-10-25 | LOC: A General Language-Guided Framework for Open-Set 3D Occupancy Prediction | Yuhang Gao et.al. | 2510.22141 | null |
| 2025-10-25 | CogStereo: Neural Stereo Matching with Implicit Spatial Cognition Embedding | Lihuang Fang et.al. | 2510.22119 | null |
| 2025-10-07 | Avi: Action from Volumetric Inference | Harris Song et.al. | 2510.21746 | null |
| 2025-10-24 | OpenHype: Hyperbolic Embeddings for Hierarchical Open-Vocabulary Radiance Fields | Lisa Weijler et.al. | 2510.21441 | null |
| 2025-10-24 | ZING-3D: Zero-shot Incremental 3D Scene Graphs via Vision-Language Models | Pranav Saxena et.al. | 2510.21069 | null |
| 2025-10-22 | Uncertainty evaluation of segmentation models for Earth observation | Melanie Rey et.al. | 2510.19586 | null |
| 2025-10-22 | Exploring Scale Shift in Crowd Localization under the Context of Domain Generalization | Juncheng Wang et.al. | 2510.19330 | null |
| 2025-10-21 | Event-Grounding Graph: Unified Spatio-Temporal Scene Graph from Robotic Observations | Phuoc Nguyen et.al. | 2510.18697 | null |
| 2025-10-21 | MoTVLA: A Vision-Language-Action Model with Unified Fast-Slow Reasoning | Wenhui Huang et.al. | 2510.18337 | null |
| 2025-10-21 | UWBench: A Comprehensive Vision-Language Benchmark for Underwater Understanding | Da Zhang et.al. | 2510.18262 | null |
| 2025-10-21 | OpenInsGaussian: Open-vocabulary Instance Gaussian Segmentation with Context-aware Cross-view Fusion | Tianyu Huang et.al. | 2510.18253 | null |
| 2025-10-20 | Enhanced Motion Forecasting with Plug-and-Play Multimodal Large Language Models | Katie Luo et.al. | 2510.17274 | null |
| 2025-10-19 | SceneCOT: Eliciting Grounded Chain-of-Thought Reasoning in 3D Scenes | Xiongkun Linghu et.al. | 2510.16714 | null |
| 2025-10-18 | Structured Interfaces for Automated Reasoning with 3D Scene Graphs | Aaron Ray et.al. | 2510.16643 | null |
| 2025-10-11 | ESCA: Contextualizing Embodied Agents via Scene-Graph Generation | Jiani Huang et.al. | 2510.15963 | null |
| 2025-10-07 | GAZE: Governance-Aware pre-annotation for Zero-shot World Model Environments | Leela Krishna et.al. | 2510.14992 | null |
| 2025-10-16 | QuASH: Using Natural-Language Heuristics to Query Visual-Language Robotic Maps | Matti Pekkanen et.al. | 2510.14546 | null |
| 2025-10-15 | Efficient Few-Shot Learning in Remote Sensing: Fusing Vision and Vision-Language Models | Jia Yun Chua et.al. | 2510.13993 | null |
| 2025-10-15 | SWIR-LightFusion: Multi-spectral Semantic Fusion of Synthetic SWIR with Thermal IR (LWIR/MWIR) and RGB | Muhammad Ishfaq Hussain et.al. | 2510.13404 | null |
| 2025-10-15 | FlyAwareV2: A Multimodal Cross-Domain UAV Dataset for Urban Scene Understanding | Francesco Barbato et.al. | 2510.13243 | null |
| 2025-10-14 | VLURes: Benchmarking VLM Visual and Linguistic Understanding in Low-Resource Languages | Jesse Atuhurra et.al. | 2510.12845 | null |
| 2025-10-14 | SPORTS: Simultaneous Panoptic Odometry, Rendering, Tracking and Segmentation for Urban Scenes Understanding | Zhiliu Yang et.al. | 2510.12749 | null |
| 2025-10-13 | PanoTPS-Net: Panoramic Room Layout Estimation via Thin Plate Spline Transformation | Hatem Ibrahem et.al. | 2510.11992 | null |
| 2025-10-13 | PhySIC: Physically Plausible 3D Human-Scene Interaction and Contact from a Single Image | Pradyumna Yalandur Muralidhar et.al. | 2510.11649 | null |
| 2025-10-13 | A Framework for Low-Effort Training Data Generation for Urban Semantic Segmentation | Denis Zavadski et.al. | 2510.11567 | null |
| 2025-10-13 | mmWalk: Towards Multi-modal Multi-view Walking Assistance | Kedi Ying et.al. | 2510.11520 | null |
| 2025-10-13 | REACT3D: Recovering Articulations for Interactive Physical 3D Scenes | Zhao Huang et.al. | 2510.11340 | null |
| 2025-10-12 | Real2USD: Scene Representations in Universal Scene Description Language | Christopher D. Hsu et.al. | 2510.10778 | null |
| 2025-10-11 | B2N3D: Progressive Learning from Binary to N-ary Relationships for 3D Object Grounding | Feng Xiao et.al. | 2510.10194 | null |
| 2025-10-10 | CFVBench: A Comprehensive Video Benchmark for Fine-grained Multimodal Retrieval-Augmented Generation | Kaiwen Wei et.al. | 2510.09266 | null |
| 2025-10-08 | Out-of-Distribution Detection in LiDAR Semantic Segmentation Using Epistemic Uncertainty from Hierarchical GMMs | Hanieh Shojaei Miandashti et.al. | 2510.08631 | null |
| 2025-10-03 | Beyond CNNs: Efficient Fine-Tuning of Multi-Modal LLMs for Object Detection on Low-Data Regimes | Nirmal Elamon et.al. | 2510.08589 | null |
| 2025-10-09 | The impact of abstract and object tags on image privacy classification | Darya Baranouskaya et.al. | 2510.07976 | null |
| 2025-10-09 | CVD-STORM: Cross-View Video Diffusion with Spatial-Temporal Reconstruction Model for Autonomous Driving | Tianrui Zhang et.al. | 2510.07944 | null |
| 2025-10-09 | An End-to-End Room Geometry Constrained Depth Estimation Framework for Indoor Panorama Images | Kanglin Ning et.al. | 2510.07817 | null |
| 2025-10-07 | Mitigating Surgical Data Imbalance with Dual-Prediction Video Diffusion Model | Danush Kumar Venkatesh et.al. | 2510.07345 | null |
| 2025-10-08 | Semantic Segmentation Algorithm Based on Light Field and LiDAR Fusion | Jie Luo et.al. | 2510.06687 | null |
| 2025-10-07 | When and How to Cut Classical Concerts? A Multimodal Automated Video Editing Approach | Daniel Gonzálbez-Biosca et.al. | 2510.05661 | null |
| 2025-10-07 | HoloScene: Simulation-Ready Interactive 3D Worlds from a Single Video | Hongchi Xia et.al. | 2510.05560 | null |
| 2025-10-06 | Progressive Gaussian Transformer with Anisotropy-aware Sampling for Open Vocabulary Occupancy Prediction | Chi Yan et.al. | 2510.04759 | null |
| 2025-10-02 | LadderMoE: Ladder-Side Mixture of Experts Adapters for Bronze Inscription Recognition | Rixin Zhou et.al. | 2510.01651 | null |
| 2025-10-01 | VL-KnG: Visual Scene Understanding for Navigation Goal Identification using Spatiotemporal Knowledge Graphs | Mohamad Al Mdfaa et.al. | 2510.01483 | null |
| 2025-09-30 | Attention over Scene Graphs: Indoor Scene Representations Toward CSAI Classification | Artur Barros et.al. | 2509.26457 | null |
| 2025-09-30 | Neighbor-aware informal settlement mapping with graph convolutional networks | Thomas Hallopeau et.al. | 2509.26171 | null |
| 2025-09-30 | Human-MME: A Holistic Evaluation Benchmark for Human-Centric Multimodal Large Language Models | Yuansen Liu et.al. | 2509.26165 | null |
| 2025-09-30 | EasyOcc: 3D Pseudo-Label Supervision for Fully Self-Supervised Semantic Occupancy Prediction Models | Seamie Hayes et.al. | 2509.26087 | null |
| 2025-09-30 | VLM-FO1: Bridging the Gap Between High-Level Reasoning and Fine-Grained Perception in VLMs | Peng Liu et.al. | 2509.25916 | null |
| 2025-09-29 | PAD3R: Pose-Aware Dynamic 3D Reconstruction from Casual Videos | Ting-Hsuan Liao et.al. | 2509.25183 | null |
| 2025-09-29 | Vision-and-Language Navigation with Analogical Textual Descriptions in LLMs | Yue Zhang et.al. | 2509.25139 | null |
| 2025-09-29 | Social 3D Scene Graphs: Modeling Human Actions and Relations for Interactive Service Robots | Ermanno Bartoli et.al. | 2509.24966 | null |
| 2025-09-29 | CORE-3D: Context-aware Open-vocabulary Retrieval by Embeddings in 3D | Mohamad Amin Mirzaei et.al. | 2509.24528 | null |
| 2025-09-29 | PhysiAgent: An Embodied Agent Framework in Physical World | Zhihao Wang et.al. | 2509.24524 | null |
| 2025-09-29 | Vid-LLM: A Compact Video-based 3D Multimodal LLM with Reconstruction-Reasoning Synergy | Haijier Chen et.al. | 2509.24385 | null |
| 2025-09-29 | Robust Partial 3D Point Cloud Registration via Confidence Estimation under Global Context | Yongqiang Wang et.al. | 2509.24275 | null |
| 2025-09-28 | FUSAR-KLIP: Towards Multimodal Foundation Models for Remote Sensing | Yi Yang et.al. | 2509.23927 | null |
| 2025-09-28 | Uni4D-LLM: A Unified SpatioTemporal-Aware VLM for 4D Understanding and Generation | Hanyu Zhou et.al. | 2509.23828 | null |
| 2025-09-28 | From Static to Dynamic: a Survey of Topology-Aware Perception in Autonomous Driving | Yixiao Chen et.al. | 2509.23641 | null |
| 2025-09-28 | From Fields to Splats: A Cross-Domain Survey of Real-Time Neural Scene Representations | Javed Ahmad et.al. | 2509.23555 | null |
| 2025-09-26 | Good Weights: Proactive, Adaptive Dead Reckoning Fusion for Continuous and Robust Visual SLAM | Yanwei Du et.al. | 2509.22910 | null |
| 2025-09-20 | Learning Hyperspectral Images with Curated Text Prompts for Efficient Multimodal Alignment | Abhiroop Chatterjee et.al. | 2509.22697 | null |
| 2025-09-26 | UrbanFeel: A Comprehensive Benchmark for Temporal and Perceptual Understanding of City Scenes through Human Perspective | Jun He et.al. | 2509.22228 | null |
| 2025-09-26 | Lightweight Structured Multimodal Reasoning for Clinical Scene Understanding in Robotics | Saurav Jha et.al. | 2509.22014 | null |
| 2025-09-26 | Spatial Reasoning in Foundation Models: Benchmarking Object-Centric Spatial Understanding | Vahid Mirjalili et.al. | 2509.21922 | null |
| 2025-09-25 | Real-Time Indoor Object SLAM with LLM-Enhanced Priors | Yang Jiao et.al. | 2509.21602 | null |
| 2025-09-25 | Residual Vector Quantization For Communication-Efficient Multi-Agent Perception | Dereje Shenkut et.al. | 2509.21464 | null |
| 2025-09-23 | TUN3D: Towards Real-World Scene Understanding from Unposed Images | Anton Konushin et.al. | 2509.21388 | null |
| 2025-09-25 | DENet: Dual-Path Edge Network with Global-Local Attention for Infrared Small Target Detection | Jiayi Zuo et.al. | 2509.20701 | null |
| 2025-09-23 | SGAligner++: Cross-Modal Language-Aided 3D Scene Graph Alignment | Binod Singh et.al. | 2509.20401 | null |
| 2025-09-24 | Queryable 3D Scene Representation: A Multi-Modal Framework for Semantic Reasoning and Robotic Task Planning | Xun Li et.al. | 2509.20077 | null |
| 2025-09-24 | OmniScene: Attention-Augmented Multimodal 4D Scene Understanding for Autonomous Driving | Pei Liu et.al. | 2509.19973 | null |
| 2025-09-23 | Category-Level Object Shape and Pose Estimation in Less Than a Millisecond | Lorenzo Shaikewitz et.al. | 2509.18979 | null |
| 2025-09-23 | Eva-VLA: Evaluating Vision-Language-Action Models’ Robustness Under Real-World Physical Variations | Hanqing Liu et.al. | 2509.18953 | null |
| 2025-09-23 | Surgical Video Understanding with Label Interpolation | Garam Kim et.al. | 2509.18802 | null |
| 2025-09-23 | MV-UMI: A Scalable Multi-View Interface for Cross-Embodiment Learning | Omar Rayyan et.al. | 2509.18757 | null |
| 2025-09-23 | PIE: Perception and Interaction Enhanced End-to-End Motion Planning for Autonomous Driving | Chengran Yuan et.al. | 2509.18609 | null |
| 2025-09-22 | Multi-scale Temporal Prediction via Incremental Generation and Multi-agent Collaboration | Zhitao Zeng et.al. | 2509.17429 | null |
| 2025-09-20 | Text-Scene: A Scene-to-Language Parsing Framework for 3D Scene Understanding | Haoyuan Li et.al. | 2509.16721 | null |
| 2025-09-20 | ST-GS: Vision-Based 3D Semantic Occupancy Prediction with Spatial-Temporal Gaussian Splatting | Xiaoyang Yan et.al. | 2509.16552 | null |
| 2025-09-19 | Towards Sharper Object Boundaries in Self-Supervised Depth Estimation | Aurélien Cecille et.al. | 2509.15987 | null |
| 2025-09-19 | RangeSAM: Leveraging Visual Foundation Models for Range-View repesented LiDAR segmentation | Paul Julius Kühn et.al. | 2509.15886 | null |
| 2025-09-19 | SAMPO: Scale-wise Autoregression with Motion PrOmpt for generative world models | Sen Wang et.al. | 2509.15536 | null |
| 2025-09-18 | Evil Vizier: Vulnerabilities of LLM-Integrated XR Systems | Yicheng Zhang et.al. | 2509.15213 | null |
| 2025-09-18 | SPATIALGEN: Layout-guided 3D Indoor Scene Generation | Chuan Fang et.al. | 2509.14981 | link |
| 2025-09-16 | Semantic 3D Reconstructions with SLAM for Central Airway Obstruction | Ayberk Acar et.al. | 2509.13541 | null |
| 2025-09-16 | ColonCrafter: A Depth Estimation Model for Colonoscopy Videos Using Diffusion Priors | Romain Hardy et.al. | 2509.13525 | null |
| 2025-09-16 | 3D Aware Region Prompted Vision Language Model | An-Chieh Cheng et.al. | 2509.13317 | null |
| 2025-09-16 | Weakly and Self-Supervised Class-Agnostic Motion Prediction for Autonomous Driving | Ruibo Li et.al. | 2509.13116 | null |
| 2025-09-16 | Beyond Averages: Open-Vocabulary 3D Scene Understanding with Gaussian Splatting and Bag of Embeddings | Abdalla Arafa et.al. | 2509.12938 | null |
| 2025-09-16 | MEJO: MLLM-Engaged Surgical Triplet Recognition via Inter- and Intra-Task Joint Optimization | Yiyi Zhang et.al. | 2509.12893 | null |
| 2025-09-15 | RailSafeNet: Visual Scene Understanding for Tram Safety | Ondřej Valach et.al. | 2509.12125 | link |
| 2025-09-15 | Microsurgical Instrument Segmentation for Robot-Assisted Surgery | Tae Kyeong Jeong et.al. | 2509.11727 | null |
| 2025-09-15 | See What I Mean? Mobile Eye-Perspective Rendering for Optical See-through Head-mounted Displays | Gerlinde Emsenhuber et.al. | 2509.11653 | null |
| 2025-09-14 | Modality-Aware Infrared and Visible Image Fusion with Target-Aware Supervision | Tianyao Sun et.al. | 2509.11476 | null |
| 2025-09-14 | DreamNav: A Trajectory-Based Imaginative Framework for Zero-Shot Vision-and-Language Navigation | Yunheng Wang et.al. | 2509.11197 | null |
| 2025-09-14 | 3DAeroRelief: The first 3D Benchmark UAV Dataset for Post-Disaster Assessment | Nhut Le et.al. | 2509.11097 | null |
| 2025-09-13 | OpenUrban3D: Annotation-Free Open-Vocabulary Semantic Segmentation of Large-Scale Urban Point Clouds | Chongyu Wang et.al. | 2509.10842 | null |
| 2025-09-12 | Multimodal SAM-adapter for Semantic Segmentation | Iacopo Curti et.al. | 2509.10408 | null |
| 2025-09-10 | SocialNav-SUB: Benchmarking VLMs for Scene Understanding in Social Robot Navigation | Michael J. Munje et.al. | 2509.08757 | link |
| 2025-09-09 | OmniMap: A General Mapping Framework Integrating Optics, Geometry, and Semantics | Yinan Deng et.al. | 2509.07500 | null |
| 2025-09-09 | DepthVision: Robust Vision-Language Understanding through GAN-Based LiDAR-to-RGB Synthesis | Sven Kirchner et.al. | 2509.07463 | null |
| 2025-09-08 | Synesthesia of Machines (SoM)-Aided LiDAR Point Cloud Transmission for Collaborative Perception | Ensong Liu et.al. | 2509.06506 | null |
| 2025-09-07 | UNO: Unifying One-stage Video Scene Graph Generation via Object-Centric Visual Representation Learning | Huy Le et.al. | 2509.06165 | null |
| 2025-09-06 | Depth-Aware Super-Resolution via Distance-Adaptive Variational Formulation | Tianhao Guo et.al. | 2509.05746 | null |
| 2025-09-05 | SGS-3D: High-Fidelity 3D Instance Segmentation via Reliable Semantic Mask Splitting and Growing | Chaolei Wang et.al. | 2509.05144 | null |
| 2025-09-03 | Reg3D: Reconstructive Geometry Instruction Tuning for 3D Scene Understanding | Hongpei Zheng et.al. | 2509.03635 | null |
| 2025-09-03 | Rashomon in the Streets: Explanation Ambiguity in Scene Understanding | Helge Spieker et.al. | 2509.03169 | null |
| 2025-09-02 | Generalizable Skill Learning for Construction Robots with Crowdsourced Natural Language Instructions, Composable Skills Standardization, and Large Language Model | Hongrui Yu et.al. | 2509.02876 | null |
| 2025-09-02 | SynthGenNet: a self-supervised approach for test-time generalization using synthetic multi-source domain mixing of street view images | Pushpendra Dhakara et.al. | 2509.02287 | null |
| 2025-09-02 | Omnidirectional Spatial Modeling from Correlated Panoramas | Xinshen Zhang et.al. | 2509.02164 | null |
| 2025-09-02 | AI-Driven Marine Robotics: Emerging Trends in Underwater Perception and Ecosystem Monitoring | Scarlett Raine et.al. | 2509.01878 | null |
| 2025-09-01 | Articulated Object Estimation in the Wild | Abdelrhman Werby et.al. | 2509.01708 | null |
| 2025-09-01 | Measuring Image-Relation Alignment: Reference-Free Evaluation of VLMs and Synthetic Pre-training for Open-Vocabulary Scene Graph Generation | Maëlic Neau et.al. | 2509.01209 | null |
| 2025-08-31 | SWAGSplatting: Semantic-guided Water-scene Augmented Gaussian Splatting | Zhuodong Jiang et.al. | 2509.00800 | null |
| 2025-08-31 | OmniReason: A Temporal-Guided Vision-Language-Action Framework for Autonomous Driving | Pei Liu et.al. | 2509.00789 | null |
| 2025-08-30 | ConceptBot: Enhancing Robot’s Autonomy through Task Decomposition with Large Language Models and Knowledge Graph | Alessandro Leanza et.al. | 2509.00570 | null |
| 2025-08-29 | Beyond Pixels: Introducing Geometric-Semantic World Priors for Video-based Embodied Models via Spatio-temporal Alignment | Jinzhou Tang et.al. | 2509.00210 | null |
| 2025-08-18 | 2COOOL: 2nd Workshop on the Challenge Of Out-Of-Label Hazards in Autonomous Driving | Ali K. AlShami et.al. | 2508.21080 | null |
| 2025-08-27 | Hyperspectral Sensors and Autonomous Driving: Technologies, Limitations, and Opportunities | Imad Ali Shah et.al. | 2508.19905 | null |
| 2025-08-27 | Context-Aware Risk Estimation in Home Environments: A Probabilistic Framework for Service Robots | Sena Ishii et.al. | 2508.19788 | null |
| 2025-08-27 | LabelGS: Label-Aware 3D Gaussian Splatting for 3D Scene Segmentation | Yupeng Zhang et.al. | 2508.19699 | link |
| 2025-08-27 | Scalable Object Detection in the Car Interior With Vision Foundation Models | Bálint Mészáros et.al. | 2508.19651 | null |
| 2025-08-25 | ArgusCogito: Chain-of-Thought for Cross-Modal Synergy and Omnidirectional Reasoning in Camouflaged Object Segmentation | Jianwen Tan et.al. | 2508.18050 | null |
| 2025-08-25 | HLG: Comprehensive 3D Room Construction via Hierarchical Layout Generation | Xiping Wang et.al. | 2508.17832 | null |
| 2025-08-24 | Investigating Domain Gaps for Indoor 3D Object Detection | Zijing Zhao et.al. | 2508.17439 | null |
| 2025-08-24 | An LLM-LVLM Driven Agent for Iterative and Fine-Grained Image Editing | Zihan Liang et.al. | 2508.17435 | null |
| 2025-08-24 | SEER-VAR: Semantic Egocentric Environment Reasoner for Vehicle Augmented Reality | Yuzhi Lai et.al. | 2508.17255 | null |
| 2025-08-24 | Multi-Agent Visual-Language Reasoning for Comprehensive Highway Scene Understanding | Yunxiang Yang et.al. | 2508.17205 | null |
| 2025-08-23 | PVNet: Point-Voxel Interaction LiDAR Scene Upsampling Via Diffusion Models | Xianjing Cheng et.al. | 2508.17050 | null |
| 2025-08-22 | HAMSt3R: Human-Aware Multi-view Stereo 3D Reconstruction | Sara Rojas et.al. | 2508.16433 | null |
| 2025-08-21 | ASCMamba: Multimodal Time-Frequency Mamba for Acoustic Scene Classification | Bochao Sun et.al. | 2508.15632 | null |
| 2025-08-19 | Hybrelighter: Combining Deep Anisotropic Diffusion and Scene Reconstruction for On-device Real-time Relighting in Mixed Reality | Hanwen Zhao et.al. | 2508.14930 | null |
| 2025-08-20 | MoVieDrive: Multi-Modal Multi-View Urban Scene Video Generation | Guile Wu et.al. | 2508.14327 | null |
| 2025-08-19 | GALA: Guided Attention with Language Alignment for Open Vocabulary Gaussian Splatting | Elena Alegret et.al. | 2508.14278 | null |
| 2025-08-19 | ROVR-Open-Dataset: A Large-Scale Depth Dataset for Autonomous Driving | Xianda Guo et.al. | 2508.13977 | null |
| 2025-08-19 | Structured Prompting and Multi-Agent Knowledge Distillation for Traffic Video Interpretation and Risk Inference | Yunxiang Yang et.al. | 2508.13439 | null |
| 2025-08-17 | PreSem-Surf: RGB-D Surface Reconstruction with Progressive Semantic Modeling and SG-MLP Pre-Rendering Mechanism | Yuyan Ye et.al. | 2508.13228 | null |
| 2025-08-17 | LMAD: Integrated End-to-End Vision-Language Model for Explainable Autonomous Driving | Nan Song et.al. | 2508.12404 | null |
| 2025-08-17 | Splat Feature Solver | Butian Xiong et.al. | 2508.12216 | null |
| 2025-08-16 | InstDrive: Instance-Aware 3D Gaussian Splatting for Driving Scenes | Hongyuan Liu et.al. | 2508.12015 | null |
| 2025-08-14 | Advancing 3D Scene Understanding with MV-ScanQA Multi-View Reasoning Evaluation and TripAlign Pre-training Dataset | Wentao Mo et.al. | 2508.11058 | link |
| 2025-08-13 | Semantic-aware DropSplat: Adaptive Pruning of Redundant Gaussians for 3D Aerial-View Segmentation | Xu Tang et.al. | 2508.09626 | null |
| 2025-08-12 | Revisiting Efficient Semantic Segmentation: Learning Offsets for Better Spatial and Class Feature Alignment | Shi-Chen Zhang et.al. | 2508.08811 | null |
| 2025-08-11 | SAGOnline: Segment Any Gaussians Online | Wentao Sun et.al. | 2508.08219 | null |
| 2025-08-11 | TrackOR: Towards Personalized Intelligent Operating Rooms Through Robust Tracking | Tony Danjun Wang et.al. | 2508.07968 | null |
| 2025-08-11 | DoorDet: Semi-Automated Multi-Class Door Detection Dataset via Object Detection and Large Language Models | Licheng Zhang et.al. | 2508.07714 | null |
| 2025-08-10 | Understanding Dynamic Scenes in Ego Centric 4D Point Clouds | Junsheng Huang et.al. | 2508.07251 | null |
| 2025-08-05 | Statistical Confidence Rescoring for Robust 3D Scene Graph Generation from Multi-View Images | Qi Xun Yeo et.al. | 2508.06546 | null |
| 2025-08-07 | VISTA: Vision-Language Imitation of Situational Thinking and Attention for Human-Like Driver Focus in Dynamic Environments | Kaiser Hamid et.al. | 2508.05852 | null |
| 2025-08-07 | Point cloud segmentation for 3D Clothed Human Layering | Davide Garavaso et.al. | 2508.05531 | null |
| 2025-08-07 | EndoMatcher: Generalizable Endoscopic Image Matcher via Multi-Domain Pre-training for Robot-Assisted Surgery | Bingyu Yang et.al. | 2508.05205 | null |
| 2025-08-07 | A Study of the Framework and Real-World Applications of Language Embedding for 3D Scene Understanding | Mahmoud Chick Zaouali et.al. | 2508.05064 | null |
| 2025-08-07 | TRKT: Weakly Supervised Dynamic Scene Graph Generation with Temporal-enhanced Relation-aware Knowledge Transferring | Zhu Xu et.al. | 2508.04943 | null |
| 2025-08-06 | PixCuboid: Room Layout Estimation from Multi-view Featuremetric Alignment | Gustav Hanning et.al. | 2508.04659 | link |
| 2025-08-05 | SAVER: Mitigating Hallucinations in Large Vision-Language Models via Style-Aware Visual Early Revision | Zhaoxu Li et.al. | 2508.03177 | null |
| 2025-08-05 | CHARM: Collaborative Harmonization across Arbitrary Modalities for Modality-agnostic Semantic Segmentation | Lekang Wen et.al. | 2508.03060 | null |
| 2025-08-04 | FedVLA: Federated Vision-Language-Action Learning with Dual Gating Mixture-of-Experts for Robotic Manipulation | Cui Miao et.al. | 2508.02190 | null |
| 2025-08-04 | GaussianCross: Cross-modal Self-supervised 3D Representation Learning via Gaussian Splatting | Lei Yao et.al. | 2508.02172 | link |
| 2025-08-03 | DiffSemanticFusion: Semantic Raster BEV Fusion for Autonomous Driving via Online HD Map Diffusion | Zhigang Sun et.al. | 2508.01778 | link |
| 2025-08-03 | AG$^2$aussian: Anchor-Graph Structured Gaussian Splatting for Instance-Level 3D Scene Understanding and Editing | Zhaonan Wang et.al. | 2508.01740 | null |
| 2025-08-03 | Dynamic Robot-Assisted Surgery with Hierarchical Class-Incremental Semantic Segmentation | Julia Hindel et.al. | 2508.01713 | null |
| 2025-08-02 | TEACH: Text Encoding as Curriculum Hints for Scene Text Recognition | Xiahan Yang et.al. | 2508.01153 | null |
| 2025-08-02 | OpenGS-Fusion: Open-Vocabulary Dense Mapping with Hybrid 3D Gaussian Splatting for Refined Object-Level Understanding | Dianyi Yang et.al. | 2508.01150 | null |
| 2025-08-01 | 3D Reconstruction via Incremental Structure From Motion | Muhammad Zeeshan et.al. | 2508.01019 | null |
| 2025-08-01 | Cooperative Perception: A Resource-Efficient Framework for Multi-Drone 3D Scene Reconstruction Using Federated Diffusion and NeRF | Massoud Pourmandi et.al. | 2508.00967 | null |
| 2025-07-31 | Robust 3D Object Detection using Probabilistic Point Clouds from Single-Photon LiDARs | Bhavya Goyal et.al. | 2508.00169 | link |
| 2025-07-31 | 3D-R1: Enhancing Reasoning in 3D VLMs for Unified Scene Understanding | Ting Huang et.al. | 2507.23478 | link |
| 2025-07-31 | FASTopoWM: Fast-Slow Lane Segment Topology Reasoning with Latent World Models | Yiming Yang et.al. | 2507.23325 | null |
| 2025-07-31 | FastDriveVLA: Efficient End-to-End Driving via Plug-and-Play Reconstruction-based Token Pruning | Jiajun Cao et.al. | 2507.23318 | null |
| 2025-07-30 | DepR: Depth Guided Single-view Scene Reconstruction with Instance-level Diffusion | Qingcheng Zhao et.al. | 2507.22825 | null |
| 2025-07-30 | UAVScenes: A Multi-Modal Dataset for UAVs | Sijie Wang et.al. | 2507.22412 | link |
| 2025-07-29 | EIFNet: Leveraging Event-Image Fusion for Robust Semantic Segmentation | Zhijiang Li et.al. | 2507.21971 | null |
| 2025-07-28 | GTAD: Global Temporal Aggregation Denoising Learning for 3D Semantic Occupancy Prediction | Tianhao Li et.al. | 2507.20963 | null |
| 2025-07-28 | Compositional Video Synthesis by Temporal Object-Centric Learning | Adil Kaan Akan et.al. | 2507.20855 | null |
| 2025-07-27 | VESPA: Towards un(Human)supervised Open-World Pointcloud Labeling for Autonomous Driving | Levente Tempfli et.al. | 2507.20397 | null |
| 2025-07-27 | Solving Scene Understanding for Autonomous Navigation in Unstructured Environments | Naveen Mathews Renji et.al. | 2507.20389 | null |
| 2025-07-26 | FROSS: Faster-than-Real-Time Online 3D Semantic Scene Graph Generation from RGB-D Images | Hao-Yu Hou et.al. | 2507.19993 | link |
| 2025-07-26 | UniCT Depth: Event-Image Fusion Based Monocular Depth Estimation with Convolution-Compensated ViT Dual SA Block | Luoxi Jing et.al. | 2507.19948 | null |
| 2025-07-26 | RaGS: Unleashing 3D Gaussian Splatting from 4D Radar and Monocular Cues for 3D Object Detection | Xiaokai Bai et.al. | 2507.19856 | null |
| 2025-07-26 | Taking Language Embedded 3D Gaussian Splatting into the Wild | Yuze Wang et.al. | 2507.19830 | null |
| 2025-07-25 | Co-Win: Joint Object Detection and Instance Segmentation in LiDAR Point Clouds via Collaborative Window Processing | Haichuan Li et.al. | 2507.19691 | null |
| 2025-07-25 | VisHall3D: Monocular Semantic Scene Completion from Reconstructing the Visible Regions to Hallucinating the Invisible Regions | Haoang Lu et.al. | 2507.19188 | null |
| 2025-07-24 | Towards Scalable Spatial Intelligence via 2D-to-3D Data Lifting | Xingyu Miao et.al. | 2507.18678 | null |
| 2025-07-23 | From Scan to Action: Leveraging Realistic Scans for Embodied Scene Understanding | Anna-Maria Halacheva et.al. | 2507.17585 | null |
| 2025-07-23 | IndoorBEV: Joint Detection and Footprint Completion of Objects via Mask-based Prediction in Indoor Scenarios for Bird’s-Eye View Perception | Haichuan Li et.al. | 2507.17445 | null |
| 2025-07-22 | ReMeREC: Relation-aware and Multi-entity Referring Expression Comprehension | Yizhi Hu et.al. | 2507.16877 | null |
| 2025-07-22 | Comparative validation of surgical phase recognition, instrument keypoint estimation, and instrument instance segmentation in endoscopy: Results of the PhaKIR 2024 challenge | Tobias Rueckert et.al. | 2507.16559 | null |
| 2025-07-22 | Optimization of DNN-based HSI Segmentation FPGA-based SoC for ADS: A Practical Approach | Jon Gutiérrez-Zaballa et.al. | 2507.16556 | null |
| 2025-07-22 | DenseSR: Image Shadow Removal as Dense Prediction | Yu-Fan Lin et.al. | 2507.16472 | null |
| 2025-07-21 | Label tree semantic losses for rich multi-class medical image segmentation | Junwen Wang et.al. | 2507.15777 | null |
| 2025-07-21 | Towards Holistic Surgical Scene Graph | Jongmin Shin et.al. | 2507.15541 | null |
| 2025-07-21 | ObjectGS: Object-aware Scene Reconstruction and Scene Understanding via Gaussian Splatting | Ruijie Zhu et.al. | 2507.15454 | null |
| 2025-07-21 | VLM-UDMC: VLM-Enhanced Unified Decision-Making and Motion Control for Urban Autonomous Driving | Haichao Liu et.al. | 2507.15266 | null |
| 2025-07-19 | DiSCO-3D: Discovering and segmenting Sub-Concepts from Open-vocabulary queries in NeRF | Doriand Petit et.al. | 2507.14596 | null |
| 2025-07-19 | Descrip3D: Enhancing Large Language Model-based 3D Scene Understanding with Object-Level Text Descriptions | Jintang Xue et.al. | 2507.14555 | null |
| 2025-07-19 | Multimodal AI for Gastrointestinal Diagnostics: Tackling VQA in MEDVQA-GI 2025 | Sujata Gaihre et.al. | 2507.14544 | null |
| 2025-07-19 | CRAFT: A Neuro-Symbolic Framework for Visual Functional Affordance Grounding | Zhou Chen et.al. | 2507.14426 | null |
| 2025-07-18 | Semantic Segmentation based Scene Understanding in Autonomous Vehicles | Ehsan Rassekh et.al. | 2507.14303 | null |
| 2025-07-18 | Moving Object Detection from Moving Camera Using Focus of Expansion Likelihood and Segmentation | Masahiro Ogawa et.al. | 2507.13628 | null |
| 2025-07-17 | Advancing Complex Wide-Area Scene Understanding with Hierarchical Coresets Selection | Jingyao Wang et.al. | 2507.13061 | null |
| 2025-07-17 | Argus: Leveraging Multiview Images for Improved 3-D Scene Understanding With Large Language Models | Yifan Xu et.al. | 2507.12916 | null |
| 2025-07-17 | City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning | Penglei Sun et.al. | 2507.12795 | null |
| 2025-07-16 | Funnel-HOI: Top-Down Perception for Zero-Shot HOI Detection | Sandipan Sarma et.al. | 2507.12628 | null |
| 2025-07-15 | Seeing the Signs: A Survey of Edge-Deployable OCR Models for Billboard Visibility Analysis | Maciej Szankin et.al. | 2507.11730 | null |
| 2025-07-15 | Tactical Decision for Multi-UGV Confrontation with a Vision-Language Model-Based Commander | Li Wang et.al. | 2507.11079 | null |
| 2025-07-15 | Learning to Tune Like an Expert: Interpretable and Scene-Aware Navigation via MLLM Reasoning and CVAE-Based Adaptation | Yanbo Wang et.al. | 2507.11001 | null |
| 2025-07-14 | Static or Temporal? Semantic Scene Simplification to Aid Wayfinding in Immersive Simulations of Bionic Vision | Justin M. Kasowski et.al. | 2507.10813 | null |
| 2025-07-14 | EmbRACE-3K: Embodied Reasoning and Action in Complex Environments | Mingxian Lin et.al. | 2507.10548 | null |
| 2025-07-13 | VRU-Accident: A Vision-Language Benchmark for Video Question Answering and Dense Captioning for Accident Scene Understanding | Younggun Kim et.al. | 2507.09815 | null |
| 2025-07-13 | Self-supervised Pretraining for Integrated Prediction and Planning of Automated Vehicles | Yangang Ren et.al. | 2507.09537 | null |
| 2025-07-12 | Fast3D: Accelerating 3D Multi-modal Large Language Models for Efficient 3D Scene Understanding | Wencan Huang et.al. | 2507.09334 | null |
| 2025-07-12 | THYME: Temporal Hierarchical-Cyclic Interactivity Modeling for Video Scene Graphs in Aerial Footage | Trong-Thuan Nguyen et.al. | 2507.09200 | null |
| 2025-07-12 | Towards Spatial Audio Understanding via Question Answering | Parthasaarathy Sudarsanam et.al. | 2507.09195 | null |
| 2025-07-12 | On the Fragility of Multimodal Perception to Temporal Misalignment in Autonomous Driving | Md Hasan Shahriar et.al. | 2507.09095 | null |
| 2025-07-10 | OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding | JingLi Lin et.al. | 2507.07984 | null |
| 2025-07-10 | MUVOD: A Novel Multi-view Video Object Segmentation Dataset and A Benchmark for 3D Segmentation | Bangning Wei et.al. | 2507.07519 | null |
| 2025-07-09 | SemRaFiner: Panoptic Segmentation in Sparse and Noisy Radar Point Clouds | Matthias Zeller et.al. | 2507.06906 | null |
| 2025-07-09 | Token Bottleneck: One Token to Remember Dynamics | Taekyung Kim et.al. | 2507.06543 | null |
| 2025-07-09 | What Demands Attention in Urban Street Scenes? From Scene Understanding towards Road Safety: A Survey of Vision-driven Datasets and Studies | Yaoqi Huang et.al. | 2507.06513 | null |
| 2025-07-08 | Feed-Forward SceneDINO for Unsupervised Semantic Scene Completion | Aleksandar Jevtić et.al. | 2507.06230 | null |
| 2025-07-08 | SPADE: Spatial-Aware Denoising Network for Open-vocabulary Panoptic Scene Graph Generation with Long- and Local-range Context Reasoning | Xin Hu et.al. | 2507.05798 | null |
| 2025-07-07 | All in One: Visual-Description-Guided Unified Point Cloud Segmentation | Zongyan Han et.al. | 2507.05211 | null |
| 2025-07-07 | MOSU: Autonomous Long-range Robot Navigation with Multi-modal Scene Understanding | Jing Liang et.al. | 2507.04686 | null |
| 2025-07-05 | Move to Understand a 3D Scene: Bridging Visual Grounding and Exploration for Efficient and Versatile Embodied Navigation | Ziyu Zhu et.al. | 2507.04047 | null |
| 2025-07-05 | Habitat Classification from Ground-Level Imagery Using Deep Neural Networks | Hongrui Shi et.al. | 2507.04017 | null |
| 2025-07-04 | Radar Velocity Transformer: Single-scan Moving Object Segmentation in Noisy Radar Point Clouds | Matthias Zeller et.al. | 2507.03463 | null |
| 2025-07-03 | LiteReality: Graphics-Ready 3D Scene Reconstruction from RGB-D Scans | Zhening Huang et.al. | 2507.02861 | null |
| 2025-07-03 | LangScene-X: Reconstruct Generalizable 3D Language-Embedded Scenes with TriMap Video Diffusion | Fangfu Liu et.al. | 2507.02813 | null |
| 2025-07-03 | SIU3R: Simultaneous Scene Understanding and 3D Reconstruction Beyond Feature Alignment | Qi Xu et.al. | 2507.02705 | null |
| 2025-07-04 | Team RAS in 9th ABAW Competition: Multimodal Compound Expression Recognition Approach | Elena Ryumina et.al. | 2507.02205 | null |
| 2025-07-02 | ESTR-CoT: Towards Explainable and Accurate Event Stream based Scene Text Recognition with Chain-of-Thought Reasoning | Xiao Wang et.al. | 2507.02200 | null |
| 2025-07-02 | ECCV 2024 W-CODA: 1st Workshop on Multimodal Perception and Comprehension of Corner Cases in Autonomous Driving | Kai Chen et.al. | 2507.01735 | null |
| 2025-07-01 | GaussianVLM: Scene-centric 3D Vision-Language Models using Language-aligned Gaussian Splats for Embodied Reasoning and Beyond | Anna-Maria Halacheva et.al. | 2507.00886 | null |
| 2025-07-01 | BEV-VAE: Multi-view Image Generation with Spatial Consistency for Autonomous Driving | Zeming Chen et.al. | 2507.00707 | null |
| 2025-06-29 | IR3D-Bench: Evaluating Vision-Language Model Scene Understanding as Agentic Inverse Rendering | Parker Liu et.al. | 2506.23329 | null |
| 2025-07-01 | SurgTPGS: Semantic 3D Surgical Scene Understanding with Text Promptable Gaussian Splatting | Yiming Huang et.al. | 2506.23309 | null |
| 2025-06-29 | Enhancing Spatial Reasoning in Multimodal Large Language Models through Reasoning-based Segmentation | Zhenhua Ning et.al. | 2506.23120 | null |
| 2025-06-28 | Unleashing the Multi-View Fusion Potential: Noise Correction in VLM for Open-Vocabulary 3D Scene Understanding | Xingyilang Yin et.al. | 2506.22817 | null |
| 2025-06-28 | VoteSplat: Hough Voting Gaussian Splatting for 3D Scene Understanding | Minchao Jiang et.al. | 2506.22799 | null |
| 2025-06-26 | CAT-SG: A Large Dynamic Scene Graph Dataset for Fine-Grained Understanding of Cataract Surgery | Felix Holm et.al. | 2506.21813 | null |
| 2025-06-24 | FrankenBot: Brain-Morphic Modular Orchestration for Robotic Manipulation with Vision-Language Models | Shiyi Wang et.al. | 2506.21627 | null |
| 2025-06-26 | CoPa-SG: Dense Scene Graphs with Parametric and Proto-Relations | Julian Lorenz et.al. | 2506.21357 | null |
| 2025-06-27 | ReME: A Data-Centric Framework for Training-Free Open-Vocabulary Segmentation | Xiwei Xuan et.al. | 2506.21233 | null |
| 2025-06-25 | IPFormer: Visual 3D Panoptic Scene Completion with Context-Adaptive Instance Proposals | Markus Gross et.al. | 2506.20671 | null |
| 2025-06-25 | Case-based Reasoning Augmented Large Language Model Framework for Decision Making in Realistic Safety-Critical Driving Scenarios | Wenbin Gan et.al. | 2506.20531 | null |
| 2025-06-25 | DreamAnywhere: Object-Centric Panoramic 3D Scene Generation | Edoardo Alberto Dominici et.al. | 2506.20367 | null |
| 2025-06-24 | HOIverse: A Synthetic Scene Graph Dataset With Human Object Interactions | Mrunmai Vivek Phatak et.al. | 2506.19639 | null |
| 2025-06-24 | Fake or Real, Can Robots Tell? Evaluating Embodied Vision-Language Models on Real and 3D-Printed Objects | Federico Tavella et.al. | 2506.19579 | null |
| 2025-06-24 | Surgery-R1: Advancing Surgical-VQLA with Reasoning Multimodal Large Language Model via Reinforcement Learning | Pengfei Hao et.al. | 2506.19469 | null |
| 2025-06-24 | Segment Any 3D-Part in a Scene from a Sentence | Hongyu Wu et.al. | 2506.19331 | null |
| 2025-06-24 | Da Yu: Towards USV-Based Image Captioning for Waterway Surveillance and Scene Understanding | Runwei Guan et.al. | 2506.19288 | null |
| 2025-06-24 | Object-aware Sound Source Localization via Audio-Visual Scene Understanding | Sung Jin Um et.al. | 2506.18557 | null |
| 2025-06-23 | DIP: Unsupervised Dense In-Context Post-training of Visual Representations | Sophia Sirko-Galouchenko et.al. | 2506.18463 | null |
| 2025-06-22 | TEM^3-Learning: Time-Efficient Multimodal Multi-Task Learning for Advanced Assistive Driving | Wenzhuo Liu et.al. | 2506.18084 | null |
| 2025-06-22 | Feedback Driven Multi Stereo Vision System for Real-Time Event Analysis | Mohamed Benkedadra et.al. | 2506.17910 | null |
| 2025-06-21 | Optimization-Free Patch Attack on Stereo Depth Estimation | Hangcheng Liu et.al. | 2506.17632 | null |
| 2025-06-21 | Scene-R1: Video-Grounded Large Language Models for 3D Scene Reasoning without 3D Annotations | Zhihao Yuan et.al. | 2506.17545 | null |
| 2025-06-17 | Leader360V: The Large-scale, Real-world 360 Video Dataset for Multi-task Learning in Diverse Environment | Weiming Zhang et.al. | 2506.14271 | null |
| 2025-06-17 | Unified Representation Space for 3D Visual Grounding | Yinuo Zheng et.al. | 2506.14238 | null |
| 2025-06-17 | SceneAware: Scene-Constrained Pedestrian Trajectory Prediction with LLM-Guided Walkability | Juho Bai et.al. | 2506.14144 | null |
| 2025-06-17 | Image Segmentation with Large Language Models: A Survey with Perspectives for Intelligent Transportation Systems | Sanjeda Akter et.al. | 2506.14096 | null |
| 2025-06-16 | FreeQ-Graph: Free-form Querying with Semantic Consistent Scene Graph for 3D Scene Understanding | Chenlu Zhan et.al. | 2506.13629 | null |
| 2025-06-16 | A Comprehensive Survey on Video Scene Parsing: Advances, Challenges, and Prospects | Guohuan Xie et.al. | 2506.13552 | null |
| 2025-06-14 | A Spatial Relationship Aware Dataset for Robotics | Peng Wang et.al. | 2506.12525 | link |
| 2025-06-14 | Understanding and Benchmarking the Trustworthiness in Multimodal LLMs for Video Understanding | Youze Wang et.al. | 2506.12336 | null |
| 2025-06-12 | GynSurg: A Comprehensive Gynecology Laparoscopic Surgery Dataset | Sahar Nasirihaghighi et.al. | 2506.11356 | null |
| 2025-06-12 | SceneCompleter: Dense 3D Scene Completion for Generative Novel View Synthesis | Weiliang Chen et.al. | 2506.10981 | null |
| 2025-06-13 | SemanticSplat: Feed-Forward 3D Scene Understanding with Language-Aware Gaussian Fields | Qijing Li et.al. | 2506.09565 | null |
| 2025-06-11 | ODG: Occupancy Prediction Using Dual Gaussians | Yunxiao Shi et.al. | 2506.09417 | null |
| 2025-06-10 | SceneSplat++: A Large Dataset and Comprehensive Benchmark for Language Gaussian Splatting | Mengjiao Ma et.al. | 2506.08710 | null |
| 2025-06-10 | PhyBlock: A Progressive Benchmark for Physical Understanding and Planning via 3D Block Assembly | Liang Ma et.al. | 2506.08708 | null |
| 2025-06-10 | From Pixels to Graphs: using Scene and Knowledge Graphs for HD-EPIC VQA Challenge | Agnese Taluzzi et.al. | 2506.08553 | null |
| 2025-06-10 | Robust Visual Localization via Semantic-Guided Multi-Scale Transformer | Zhongtao Tian et.al. | 2506.08526 | null |
| 2025-06-09 | Open World Scene Graph Generation using Vision Language Models | Amartya Dutta et.al. | 2506.08189 | link |
| 2025-06-09 | Design and Evaluation of Deep Learning-Based Dual-Spectrum Image Fusion Methods | Beining Xu et.al. | 2506.07779 | null |
| 2025-06-09 | OpenSplat3D: Open-Vocabulary 3D Instance Segmentation using Gaussian Splatting | Jens Piekenbrinck et.al. | 2506.07697 | null |
| 2025-06-09 | Taking Flight with Dialogue: Enabling Natural Language Control for PX4-based Drone Agent | Shoon Kit Lim et.al. | 2506.07509 | link |
| 2025-06-09 | SpatialLM: Training Large Language Models for Structured Indoor Modeling | Yongsen Mao et.al. | 2506.07491 | null |
| 2025-06-08 | BePo: Leveraging Birds Eye View and Sparse Points for Efficient and Accurate 3D Occupancy Prediction | Yunxiao Shi et.al. | 2506.07002 | null |
| 2025-06-07 | IRS: Instance-Level 3D Scene Graphs via Room Prior Guided LiDAR-Camera Fusion | Hongming Chen et.al. | 2506.06804 | null |
| 2025-06-07 | PhysLab: A Benchmark Dataset for Multi-Granularity Visual Parsing of Physics Experiments | Minghao Zou et.al. | 2506.06631 | null |
| 2025-06-06 | Towards Terrain-Aware Task-Driven 3D Scene Graph Generation in Outdoor Environments | Chad R Samuelson et.al. | 2506.06562 | null |
| 2025-06-06 | Enhancing Situational Awareness in Underwater Robotics with Multi-modal Spatial Perception | Pushyami Kaveti et.al. | 2506.06476 | null |
| 2025-06-06 | Challenging Vision-Language Models with Surgical Data: A New Dataset and Broad Benchmarking Study | Leon Mayer et.al. | 2506.06232 | null |
| 2025-06-06 | STSBench: A Spatio-temporal Scenario Benchmark for Multi-modal Large Language Models in Autonomous Driving | Christian Fruhwirth-Reisinger et.al. | 2506.06218 | null |
| 2025-06-06 | Rethinking Semi-supervised Segmentation Beyond Accuracy: Reliability and Robustness | Steven Landgraf et.al. | 2506.05917 | null |
| 2025-06-06 | HMVLM: Multistage Reasoning-Enhanced Vision-Language Model for Long-Tailed Driving Scenarios | Daming Wang et.al. | 2506.05883 | null |
| 2025-06-06 | Pts3D-LLM: Studying the Impact of Token Structure for 3D Scene Understanding With Large Language Models | Hugues Thomas et.al. | 2506.05689 | null |
| 2025-06-06 | Hallucinate, Ground, Repeat: A Framework for Generalized Visual Relationship Detection | Shanmukha Vellamcheti et.al. | 2506.05651 | null |
| 2025-06-05 | SIV-Bench: A Video Benchmark for Social Interaction Understanding and Reasoning | Fanqi Kong et.al. | 2506.05425 | null |
| 2025-06-06 | Does Your 3D Encoder Really Work? When Pretrain-SFT from 2D VLMs Meets 3D VLMs | Haoyuan Li et.al. | 2506.05318 | null |
| 2025-06-06 | ProJo4D: Progressive Joint Optimization for Sparse-View Inverse Physics Estimation | Daniel Rho et.al. | 2506.05317 | null |
| 2025-06-04 | OWMM-Agent: Open World Mobile Manipulation With Multi-modal Agentic Data Synthesis | Junting Chen et.al. | 2506.04217 | link |
| 2025-06-04 | BiXFormer: A Robust Framework for Maximizing Modality Effectiveness in Multi-Modal Semantic Segmentation | Jialei Chen et.al. | 2506.03675 | null |
| 2025-06-04 | Analyzing Transformer Models and Knowledge Distillation Approaches for Image Captioning on Edge AI | Wing Man Casca Kwok et.al. | 2506.03607 | null |
| 2025-06-03 | Trajectory Prediction Meets Large Language Models: A Survey | Yi Xu et.al. | 2506.03408 | null |
| 2025-06-04 | Go Beyond Earth: Understanding Human Actions and Scenes in Microgravity Environments | Di Wen et.al. | 2506.02845 | link |
| 2025-06-03 | PhysGaia: A Physics-Aware Dataset of Multi-Body Interactions for Dynamic Novel View Synthesis | Mijeong Kim et.al. | 2506.02794 | null |
| 2025-06-03 | Large-scale Self-supervised Video Foundation Model for Intelligent Surgery | Shu Yang et.al. | 2506.02692 | null |
| 2025-06-03 | Sight Guide: A Wearable Assistive Perception and Navigation System for the Vision Assistance Race in the Cybathlon 2024 | Patrick Pfreundschuh et.al. | 2506.02676 | null |
| 2025-06-03 | Hierarchical Question-Answering for Driving Scene Understanding Using Vision-Language Models | Safaa Abdullahi Moallim Mohamud et.al. | 2506.02615 | null |
| 2025-06-03 | Sign Language: Towards Sign Understanding for Robot Autonomy | Ayush Agrawal et.al. | 2506.02556 | null |
| 2025-06-02 | MLLMs Need 3D-Aware Representation Supervision for Scene Understanding | Xiaohu Huang et.al. | 2506.01946 | null |
| 2025-06-02 | SAM2-LOVE: Segment Anything Model 2 in Language-aided Audio-Visual Scenes | Yuji Wang et.al. | 2506.01558 | null |
| 2025-06-02 | FDSG: Forecasting Dynamic Scene Graphs | Yi Yang et.al. | 2506.01487 | null |
| 2025-06-02 | Learning Sparsity for Effective and Efficient Music Performance Question Answering | Xingjian Diao et.al. | 2506.01319 | null |
| 2025-05-30 | Tackling View-Dependent Semantics in 3D Language Gaussian Splatting | Jiazhong Cen et.al. | 2505.24746 | null |
| 2025-05-30 | Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors | Duo Zheng et.al. | 2505.24625 | link |
| 2025-05-30 | EgoExOR: An Ego-Exo-Centric Operating Room Dataset for Surgical Activity Understanding | Ege Özsoy et.al. | 2505.24287 | null |
| 2025-05-29 | ConversAR: Exploring Embodied LLM-Powered Group Conversations in Augmented Reality for Second Language Learners | Jad Bendarkawi et.al. | 2505.24000 | null |
| 2025-05-29 | A Reverse Causal Framework to Mitigate Spurious Correlations for Debiasing Scene Graph Generation | Shuzhou Sun et.al. | 2505.23451 | null |
| 2025-05-29 | SeG-SR: Integrating Semantic Knowledge into Remote Sensing Image Super-Resolution via Vision-Language Model | Bowen Chen et.al. | 2505.23010 | null |
| 2025-05-28 | On Geometry-Enhanced Parameter-Efficient Fine-Tuning for 3D Scene Segmentation | Liyao Tang et.al. | 2505.22444 | null |
| 2025-05-28 | LiDAR Based Semantic Perception for Forklifts in Outdoor Environments | Benjamin Serfling et.al. | 2505.22258 | null |
| 2025-05-28 | 3D Question Answering via only 2D Vision-Language Models | Fengyun Wang et.al. | 2505.22143 | null |
| 2025-05-29 | DORAEMON: Decentralized Ontology-aware Reliable Agent with Enhanced Memory Oriented Navigation | Tianjun Gu et.al. | 2505.21969 | null |
| 2025-05-28 | Towards Comprehensive Scene Understanding: Integrating First and Third-Person Views for LVLMs | Insu Lee et.al. | 2505.21955 | null |
| 2025-05-27 | A Graph Completion Method that Jointly Predicts Geometry and Topology Enables Effective Molecule Assembly | Rohan V. Koodli et.al. | 2505.21833 | null |
| 2025-05-29 | Compositional Scene Understanding through Inverse Generative Modeling | Yanbo Wang et.al. | 2505.21780 | null |
| 2025-05-30 | Right Side Up? Disentangling Orientation Understanding in MLLMs with Fine-grained Multi-axis Perception Tasks | Keanu Nichols et.al. | 2505.21649 | null |
| 2025-05-27 | Assured Autonomy with Neuro-Symbolic Perception | R. Spencer Hallyburton et.al. | 2505.21322 | null |
| 2025-05-27 | Occlusion Boundary and Depth: Mutual Enhancement via Multi-Task Learning | Lintao Xu et.al. | 2505.21231 | null |
| 2025-05-27 | Uni3D-MoE: Scalable Multimodal 3D Scene Understanding via Mixture of Experts | Yue Zhang et.al. | 2505.21079 | null |
| 2025-05-27 | OccLE: Label-Efficient 3D Semantic Occupancy Prediction | Naiyu Fang et.al. | 2505.20617 | null |
| 2025-05-27 | OmniIndoor3D: Comprehensive Indoor 3D Reconstruction | Xiaobao Wei et.al. | 2505.20610 | null |
| 2025-05-26 | From Data to Modeling: Fully Open-vocabulary Scene Graph Generation | Zuyao Chen et.al. | 2505.20106 | null |
| 2025-05-26 | DepthMatch: Semi-Supervised RGB-D Scene Parsing through Depth-Guided Regularization | Jianxin Huang et.al. | 2505.20041 | null |
| 2025-05-26 | Underwater Diffusion Attention Network with Contrastive Language-Image Joint Learning for Underwater Image Enhancement | Afrah Shaahid et.al. | 2505.19895 | null |
| 2025-05-26 | LLM Meets Scene Graph: Can Large Language Models Understand and Generate Scene Graphs? A Benchmark and Empirical Study | Dongil Yang et.al. | 2505.19510 | link |
| 2025-05-25 | FHGS: Feature-Homogenized Gaussian Splatting | Q. G. Duan et.al. | 2505.19154 | null |
| 2025-05-25 | Co-AttenDWG: Co-Attentive Dimension-Wise Gating and Expert Fusion for Multi-Modal Offensive Content Detection | Md. Mithun Hossain et.al. | 2505.19010 | null |
| 2025-05-24 | Self-Supervised and Generalizable Tokenization for CLIP-Based 3D Understanding | Guofeng Mei et.al. | 2505.18819 | null |
| 2025-05-24 | Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps | Sicheng Feng et.al. | 2505.18675 | link |
| 2025-05-23 | SafeMVDrive: Multi-view Safety-Critical Driving Video Synthesis in the Real World Domain | Jiawei Zhou et.al. | 2505.17727 | null |
| 2025-05-23 | From Flight to Insight: Semantic 3D Reconstruction for Aerial Inspection via Gaussian Splatting and Language-Guided Segmentation | Mahmoud Chick Zaouali et.al. | 2505.17402 | null |
| 2025-05-22 | Assessing the generalization performance of SAM for ureteroscopy scene understanding | Martin Villagrana et.al. | 2505.17210 | null |
| 2025-05-22 | CoNav: Collaborative Cross-Modal Reasoning for Embodied Navigation | Haihong Hao et.al. | 2505.16663 | link |
| 2025-05-21 | SCENIR: Visual Semantic Clarity through Unsupervised Scene Graph Retrieval | Nikolaos Chaidos et.al. | 2505.15867 | link |
| 2025-05-21 | HAMF: A Hybrid Attention-Mamba Framework for Joint Scene Context Understanding and Future Motion Representation Learning | Xiaodong Mei et.al. | 2505.15703 | null |
| 2025-05-21 | Robo2VLM: Visual Question Answering from Large-Scale In-the-Wild Robot Manipulation Datasets | Kaiyuan Chen et.al. | 2505.15517 | link |
| 2025-05-21 | RAZER: Robust Accelerated Zero-Shot 3D Open-Vocabulary Panoptic Reconstruction with Spatio-Temporal Aggregation | Naman Patel et.al. | 2505.15373 | null |
| 2025-05-21 | DC-Scene: Data-Centric Learning for 3D Scene Understanding | Ting Huang et.al. | 2505.15232 | link |
| 2025-05-19 | ORQA: A Benchmark and Foundation Model for Holistic Operating Room Modeling | Ege Özsoy et.al. | 2505.12890 | null |
| 2025-05-19 | AdaToken-3D: Dynamic Spatial Gating for Efficient 3D Large Multimodal-Models Reasoning | Kai Zhang et.al. | 2505.12782 | null |
| 2025-05-19 | Predicting Reaction Time to Comprehend Scenes with Foveated Scene Understanding Maps | Ziqi Wen et.al. | 2505.12660 | null |
| 2025-05-18 | LLaVA-4D: Embedding SpatioTemporal Prompt into LMMs for 4D Scene Understanding | Hanyu Zhou et.al. | 2505.12253 | null |
| 2025-05-18 | SEPT: Standard-Definition Map Enhanced Scene Perception and Topology Reasoning for Autonomous Driving | Muleilan Pei et.al. | 2505.12246 | null |
| 2025-05-18 | Can Large Multimodal Models Understand Agricultural Scenes? Benchmarking with AgroMind | Qingmei Li et.al. | 2505.12207 | link |
| 2025-05-18 | Spatial-LLaVA: Enhancing Large Language Models with Spatial Referring Expressions for Visual Understanding | Xuefei Sun et.al. | 2505.12194 | null |
| 2025-05-17 | TinyRS-R1: Compact Multimodal Language Model for Remote Sensing | Aybora Koksal et.al. | 2505.12099 | null |
| 2025-05-15 | StoryReasoning Dataset: Using Chain-of-Thought for Scene Understanding and Grounded Story Generation | Daniel A. P. Oliveira et.al. | 2505.10292 | link |
| 2025-05-15 | APCoTTA: Continual Test-Time Adaptation for Semantic Segmentation of Airborne LiDAR Point Clouds | Yuan Gao et.al. | 2505.09971 | link |
| 2025-05-14 | DRRNet: Macro-Micro Feature Fusion and Dual Reverse Refinement for Camouflaged Object Detection | Jianlin Sun et.al. | 2505.09168 | link |
| 2025-05-14 | Seeing Beyond the Scene: Enhancing Vision-Language Models with Interactional Reasoning | Dayong Liang et.al. | 2505.09118 | null |
| 2025-05-13 | Extending Large Vision-Language Model for Diverse Interactive Tasks in Autonomous Driving | Zongchuang Zhao et.al. | 2505.08725 | link |
| 2025-05-12 | Deep Learning Advances in Vision-Based Traffic Accident Anticipation: A Comprehensive Review of Methods, Datasets, and Future Directions | Yi Zhang et.al. | 2505.07611 | null |
| 2025-05-11 | Technical Report for ICRA 2025 GOOSE 2D Semantic Segmentation Challenge: Leveraging Color Shift Correction, RoPE-Swin Backbone, and Quantile-based Label Denoising Strategy for Robust Outdoor Scene Understanding | Chih-Chung Hsu et.al. | 2505.06991 | null |
| 2025-05-11 | Boosting Cross-spectral Unsupervised Domain Adaptation for Thermal Semantic Segmentation | Seokjun Kwon et.al. | 2505.06951 | null |
| 2025-05-09 | Camera Control at the Edge with Language Models for Scene Understanding | Alexiy Buynitsky et.al. | 2505.06402 | null |
| 2025-05-09 | Camera-Only Bird’s Eye View Perception: A Neural Approach to LiDAR-Free Environmental Mapping for Autonomous Vehicles | Anupkumar Bochare et.al. | 2505.06113 | null |
| 2025-05-08 | Hearing and Seeing Through CLIP: A Framework for Self-Supervised Sound Source Localization | Sooyoung Park et.al. | 2505.05343 | link |
| 2025-05-08 | PADriver: Towards Personalized Autonomous Driving | Genghua Kou et.al. | 2505.05240 | null |
| 2025-05-08 | Does CLIP perceive art the same way we do? | Andrea Asperti et.al. | 2505.05229 | null |
| 2025-05-07 | GSsplat: Generalizable Semantic Gaussian Splatting for Novel-view Synthesis in 3D Scenes | Feng Xiao et.al. | 2505.04659 | link |
| 2025-05-07 | RAFT: Robust Augmentation of FeaTures for Image Segmentation | Edward Humes et.al. | 2505.04529 | null |
| 2025-05-03 | Beyond Recognition: Evaluating Visual Perspective Taking in Vision Language Models | Gracjan Góral et.al. | 2505.03821 | null |
| 2025-05-06 | MCCD: Multi-Agent Collaboration-based Compositional Diffusion for Complex Text-to-Image Generation | Mingcheng Li et.al. | 2505.02648 | null |
| 2025-05-04 | Benchmarking Feature Upsampling Methods for Vision Foundation Models using Interactive Segmentation | Volodymyr Havrylov et.al. | 2505.02075 | link |
| 2025-05-04 | Segment Any RGB-Thermal Model with Language-aided Distillation | Dong Xing et.al. | 2505.01950 | null |
| 2025-05-02 | Embracing Diffraction: A Paradigm Shift in Wireless Sensing and Communication | Anurag Pallaprolu et.al. | 2505.01625 | null |
| 2025-04-30 | V3LMA: Visual 3D-enhanced Language Model for Autonomous Driving | Jannik Lübberstedt et.al. | 2505.00156 | null |
| 2025-04-30 | LLM-Empowered Embodied Agent for Memory-Augmented Task Planning in Household Robotics | Marc Glocker et.al. | 2504.21716 | link |
| 2025-04-30 | ImaginateAR: AI-Assisted In-Situ Authoring in Augmented Reality | Jaewook Lee et.al. | 2504.21360 | null |
| 2025-04-28 | Category-Level and Open-Set Object Pose Estimation for Robotics | Peter Hönig et.al. | 2504.19572 | null |
| 2025-04-28 | Masked Point-Entity Contrast for Open-Vocabulary 3D Scene Understanding | Yan Wang et.al. | 2504.19500 | null |
| 2025-04-27 | Beyond Physical Reach: Comparing Head- and Cane-Mounted Cameras for Last-Mile Navigation by Blind Users | Apurv Varshney et.al. | 2504.19345 | null |
| 2025-04-27 | OpenFusion++: An Open-vocabulary Real-time Scene Understanding System | Xiaofeng Jin et.al. | 2504.19266 | null |
| 2025-04-27 | CARL: Camera-Agnostic Representation Learning for Spectral Image Analysis | Alexander Baumann et.al. | 2504.19223 | null |
| 2025-04-27 | Segmenting Objectiveness and Task-awareness Unknown Region for Autonomous Driving | Mi Zheng et.al. | 2504.19183 | null |
| 2025-04-23 | TraveLLaMA: Facilitating Multi-modal Large Language Models to Understand Urban Scenes and Provide Travel Assistance | Meng Chu et.al. | 2504.16505 | null |
| 2025-04-21 | Multimodal Large Language Models for Enhanced Traffic Safety: A Comprehensive Review and Future Trends | Mohammad Abu Tami et.al. | 2504.16134 | null |
| 2025-04-22 | Vision language models are unreliable at trivial spatial cognition | Sangeet Khemlani et.al. | 2504.16061 | null |
| 2025-04-20 | Relation-R1: Cognitive Chain-of-Thought Guided Reinforcement Learning for Unified Relational Comprehension | Lin Li et.al. | 2504.14642 | null |
| 2025-04-20 | RoboOcc: Enhancing the Geometric and Semantic Scene Understanding for Robots | Zhang Zhang et.al. | 2504.14604 | null |
| 2025-04-20 | Are Vision LLMs Road-Ready? A Comprehensive Benchmark for Safety-Critical Driving Video Understanding | Tong Zeng et.al. | 2504.14526 | link |
| 2025-04-20 | Vision-Centric Representation-Efficient Fine-Tuning for Robust Universal Foreground Segmentation | Guoyi Zhang et.al. | 2504.14481 | null |
| 2025-04-18 | HAECcity: Open-Vocabulary Scene Understanding of City-Scale Point Clouds with Superpoint Graph Clustering | Alexander Rusnak et.al. | 2504.13590 | null |
| 2025-04-18 | Leveraging Automatic CAD Annotations for Supervised Learning in 3D Scene Understanding | Yuchen Rao et.al. | 2504.13580 | link |
| 2025-04-18 | Temporal Propagation of Asymmetric Feature Pyramid for Surgical Scene Segmentation | Cheng Yuan et.al. | 2504.13440 | null |
| 2025-04-17 | Training-Free Hierarchical Scene Understanding for Gaussian Splatting with Superpoint Graphs | Shaohui Dai et.al. | 2504.13153 | link |
| 2025-04-17 | Explainable Scene Understanding with Qualitative Representations and Graph Neural Networks | Nassim Belmecheri et.al. | 2504.12817 | null |
| 2025-04-17 | Robo-SGG: Exploiting Layout-Oriented Normalization and Restitution for Robust Scene Graph Generation | Changsheng Lv et.al. | 2504.12606 | null |
| 2025-04-16 | Generalized Visual Relation Detection with Diffusion Models | Kaifeng Gao et.al. | 2504.12100 | null |
| 2025-04-17 | DC-SAM: In-Context Segment Anything in Images and Videos via Dual Consistency | Mengshi Qi et.al. | 2504.12080 | link |
| 2025-04-16 | CAGS: Open-Vocabulary 3D Scene Understanding with Context-Aware Gaussian Splatting | Wei Sun et.al. | 2504.11893 | null |
| 2025-04-15 | Single-Input Multi-Output Model Merging: Leveraging Foundation Models for Dense Multi-Task Learning | Juan Garcia Giraldo et.al. | 2504.11268 | null |
| 2025-04-14 | Foundation Models for Remote Sensing: An Analysis of MLLMs for Object Localization | Darryl Hannan et.al. | 2504.10727 | null |
| 2025-04-14 | SoccerNet-v3D: Leveraging Sports Broadcast Replays for 3D Scene Understanding | Marc Gutiérrez-Pérez et.al. | 2504.10106 | link |
| 2025-04-12 | Text To 3D Object Generation For Scalable Room Assembly | Sonia Laguna et.al. | 2504.09328 | null |
| 2025-04-11 | FindAnything: Open-Vocabulary and Object-Centric Mapping for Robot Exploration in Any Environment | Sebastián Barbas Laina et.al. | 2504.08603 | null |
| 2025-04-11 | FMLGS: Fast Multilevel Language Embedded Gaussians for Part-level Interactive Agents | Xin Tan et.al. | 2504.08581 | null |
| 2025-04-11 | DSM: Building A Diverse Semantic Map for 3D Visual Grounding | Qinghongbing Xie et.al. | 2504.08307 | null |
| 2025-04-10 | SAMJAM: Zero-Shot Video Scene Graph Generation for Egocentric Kitchen Videos | Joshua Li et.al. | 2504.07867 | null |
| 2025-04-10 | DGOcc: Depth-aware Global Query-based Network for Monocular 3D Occupancy Prediction | Xu Zhao et.al. | 2504.07524 | null |
| 2025-04-09 | RayFronts: Open-Set Semantic Ray Frontiers for Online Scene Understanding and Exploration | Omar Alama et.al. | 2504.06994 | null |
| 2025-04-09 | Audio-visual Event Localization on Portrait Mode Short Videos | Wuyang Liu et.al. | 2504.06884 | null |
| 2025-04-09 | MovSAM: A Single-image Moving Object Segmentation Framework Based on Deep Thinking | Chang Nie et.al. | 2504.06863 | null |
| 2025-04-09 | Masked Scene Modeling: Narrowing the Gap Between Supervised and Self-Supervised Learning in 3D Scene Understanding | Pedro Hermosilla et.al. | 2504.06719 | link |
| 2025-04-09 | Domain-Conditioned Scene Graphs for State-Grounded Task Planning | Jonas Herzog et.al. | 2504.06661 | null |
| 2025-04-09 | Attributes-aware Visual Emotion Representation Learning | Rahul Singh Maharjan et.al. | 2504.06578 | null |
| 2025-04-08 | CamContextI2V: Context-aware Controllable Video Generation | Luis Denninger et.al. | 2504.06022 | link |
| 2025-04-08 | AEGIS: Human Attention-based Explainable Guidance for Intelligent Vehicle Systems | Zhuoli Zhuang et.al. | 2504.05950 | null |
| 2025-04-08 | PRIMEDrive-CoT: A Precognitive Chain-of-Thought Framework for Uncertainty-Aware Object Interaction in Driving Scene Scenario | Sriram Mandalika et.al. | 2504.05908 | null |
| 2025-04-08 | InvNeRF-Seg: Fine-Tuning a Pre-Trained NeRF for 3D Object Segmentation | Jiangsan Zhao et.al. | 2504.05751 | null |
| 2025-04-07 | RS-RAG: Bridging Remote Sensing Imagery and Comprehensive Knowledge with a Multi-Modal Dataset and Retrieval-Augmented Generation Model | Congcong Wen et.al. | 2504.04988 | null |
| 2025-04-07 | Feedback-Enhanced Hallucination-Resistant Vision-Language Model for Real-Time Scene Understanding | Zahir Alsulaimawi et.al. | 2504.04772 | null |
| 2025-04-07 | DFormerv2: Geometry Self-Attention for RGBD Semantic Segmentation | Bo-Wen Yin et.al. | 2504.04701 | link |
| 2025-04-06 | Planning Safety Trajectories with Dual-Phase, Physics-Informed, and Transportation Knowledge-Driven Large Language Models | Rui Gan et.al. | 2504.04562 | null |
| 2025-04-04 | 3D Scene Understanding Through Local Random Access Sequence Modeling | Wanhee Lee et.al. | 2504.03875 | link |
| 2025-04-07 | NuScenes-SpatialQA: A Spatial Understanding and Reasoning Benchmark for Vision-Language Models in Autonomous Driving | Kexin Tian et.al. | 2504.03164 | null |
| 2025-04-03 | F-ViTA: Foundation Model Guided Visible to Thermal Translation | Jay N. Paranjape et.al. | 2504.02801 | link |
| 2025-04-03 | Multimodal Fusion and Vision-Language Models: A Survey for Robot Vision | Xiaofeng Han et.al. | 2504.02477 | link |
| 2025-04-02 | Scene-Centric Unsupervised Panoptic Segmentation | Oliver Hahn et.al. | 2504.01955 | link |
| 2025-04-02 | Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness | Haochen Wang et.al. | 2504.01901 | null |
| 2025-04-02 | CoMatcher: Multi-View Collaborative Feature Matching | Jintao Zhang et.al. | 2504.01872 | null |
| 2025-04-02 | TransforMerger: Transformer-based Voice-Gesture Fusion for Robust Human-Robot Communication | Petr Vanc et.al. | 2504.01708 | null |
| 2025-04-02 | Overlap-Aware Feature Learning for Robust Unsupervised Domain Adaptation for 3D Semantic Segmentation | Junjie Chen et.al. | 2504.01668 | null |
| 2025-04-01 | WikiVideo: Article Generation from Multiple Videos | Alexander Martin et.al. | 2504.00939 | link |
| 2025-04-01 | Zero-Shot 4D Lidar Panoptic Segmentation | Yushan Zhang et.al. | 2504.00848 | null |
| 2025-04-01 | PRISM-0: A Predicate-Rich Scene Graph Generation Framework for Zero-Shot Open-Vocabulary Tasks | Abdelrahman Elskhawy et.al. | 2504.00844 | null |
| 2025-04-01 | Context-Aware Human Behavior Prediction Using Multimodal Large Language Models: Challenges and Insights | Yuchen Liu et.al. | 2504.00839 | null |
| 2025-03-30 | PhysPose: Refining 6D Object Poses with Physical Constraints | Martin Malenický et.al. | 2503.23587 | null |
| 2025-03-30 | Boosting Omnidirectional Stereo Matching with a Pre-trained Depth Foundation Model | Jannik Endres et.al. | 2503.23502 | link |
| 2025-03-29 | Can DeepSeek-V3 Reason Like a Surgeon? An Empirical Evaluation for Vision-Language Understanding in Robotic-Assisted Surgery | Boyi Ma et.al. | 2503.23130 | null |
| 2025-03-29 | Evaluating Compositional Scene Understanding in Multimodal Generative Models | Shuhao Fu et.al. | 2503.23125 | link |
| 2025-03-29 | Open-Vocabulary Semantic Segmentation with Uncertainty Alignment for Robotic Scene Understanding in Indoor Building Environments | Yifan Xu et.al. | 2503.23105 | null |
| 2025-03-29 | Empowering Large Language Models with 3D Situation Awareness | Zhihao Yuan et.al. | 2503.23024 | null |
| 2025-03-28 | Evaluating Multimodal Language Models as Visual Assistants for Visually Impaired Users | Antonia Karamolegkou et.al. | 2503.22610 | null |
| 2025-03-28 | Next-Best-Trajectory Planning of Robot Manipulators for Effective Observation and Exploration | Heiko Renz et.al. | 2503.22588 | null |
| 2025-03-28 | NuGrounding: A Multi-View 3D Visual Grounding Framework in Autonomous Driving | Fuhao Li et.al. | 2503.22436 | null |
| 2025-03-28 | Endo-TTAP: Robust Endoscopic Tissue Tracking via Multi-Facet Guided Attention and Hybrid Flow-point Supervision | Rulin Zhou et.al. | 2503.22394 | null |
| 2025-03-28 | A Dataset for Semantic Segmentation in the Presence of Unknowns | Zakaria Laskar et.al. | 2503.22309 | null |
| 2025-03-28 | Mitigating Trade-off: Stream and Query-guided Aggregation for Efficient and Effective 3D Occupancy Prediction | Seokha Moon et.al. | 2503.22087 | null |
| 2025-03-27 | Visual Jenga: Discovering Object Dependencies via Counterfactual Inpainting | Anand Bhattad et.al. | 2503.21770 | null |
| 2025-03-27 | uLayout: Unified Room Layout Estimation for Perspective and Panoramic Images | Jonathan Lee et.al. | 2503.21562 | link |
| 2025-03-27 | Towards Generating Realistic 3D Semantic Training Data for Autonomous Driving | Lucas Nunes et.al. | 2503.21449 | link |
| 2025-03-26 | DINeMo: Learning Neural Mesh Models with no 3D Annotations | Weijie Guo et.al. | 2503.20220 | link |
| 2025-03-25 | The Coralscapes Dataset: Semantic Scene Understanding in Coral Reefs | Jonathan Sauder et.al. | 2503.20000 | link |
| 2025-03-25 | SuperFlow++: Enhanced Spatiotemporal Consistency for Cross-Modal Data Pretraining | Xiang Xu et.al. | 2503.19912 | link |
| 2025-03-25 | OpenLex3D: A New Evaluation Benchmark for Open-Vocabulary 3D Scene Representations | Christina Kassab et.al. | 2503.19764 | null |
| 2025-03-26 | COB-GS: Clear Object Boundaries in 3DGS Segmentation Based on Boundary-Adaptive Gaussian Splitting | Jiaxin Zhang et.al. | 2503.19443 | link |
| 2025-03-25 | Divide-and-Conquer: Dual-Hierarchical Optimization for Semantic 4D Gaussian Spatting | Zhiying Yan et.al. | 2503.19332 | null |
| 2025-03-25 | BIMII-Net: Brain-Inspired Multi-Iterative Interactive Network for RGB-T Road Scene Semantic Segmentation | Hanshuo Qiu et.al. | 2503.19303 | null |
| 2025-03-24 | Efficient and Accurate Scene Text Recognition with Cascaded-Transformers | Savas Ozkan et.al. | 2503.18883 | null |
| 2025-03-24 | Linguistics-aware Masked Image Modeling for Self-supervised Scene Text Recognition | Yifei Zhang et.al. | 2503.18746 | null |
| 2025-03-24 | Predicting the Road Ahead: A Knowledge Graph based Foundation Model for Scene Understanding in Autonomous Driving | Hongkuan Zhou et.al. | 2503.18730 | null |
| 2025-03-23 | MLLM-For3D: Adapting Multimodal Large Language Model for 3D Reasoning Segmentation | Jiaxin Huang et.al. | 2503.18135 | null |
| 2025-03-23 | PanoGS: Gaussian-based Panoptic Segmentation for 3D Open Vocabulary Scene Understanding | Hongjia Zhai et.al. | 2503.18107 | null |
| 2025-03-23 | PanopticSplatting: End-to-End Panoptic Gaussian Splatting | Yuxuan Xie et.al. | 2503.18073 | null |
| 2025-03-23 | PolarFree: Polarization-based Reflection-free Imaging | Mingde Yao et.al. | 2503.18055 | link |
| 2025-03-23 | SceneSplat: Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining | Yue Li et.al. | 2503.18052 | link |
| 2025-03-23 | Geometric Constrained Non-Line-of-Sight Imaging | Xueying Liu et.al. | 2503.17992 | null |
| 2025-03-22 | A Causal Adjustment Module for Debiasing Scene Graph Generation | Li Liu et.al. | 2503.17862 | null |
| 2025-03-21 | Neuro-Symbolic Scene Graph Conditioning for Synthetic Image Dataset Generation | Giacomo Savazzi et.al. | 2503.17224 | null |
| 2025-03-21 | ExCap3D: Expressive 3D Scene Understanding via Object Captioning with Varying Detail | Chandan Yeshwanth et.al. | 2503.17044 | null |
| 2025-03-21 | Seg2Box: 3D Object Detection by Point-Wise Semantics Supervision | Maoji Zheng et.al. | 2503.16811 | null |
| 2025-03-21 | OpenCity3D: What do Vision-Language Models know about Urban Environments? | Valentin Bieri et.al. | 2503.16776 | link |
| 2025-03-20 | Cross-Modal and Uncertainty-Aware Agglomeration for Open-Vocabulary 3D Scene Understanding | Jinlong Li et.al. | 2503.16707 | link |
| 2025-03-20 | ContactFusion: Stochastic Poisson Surface Maps from Visual and Contact Sensing | Aditya Kamireddypalli et.al. | 2503.16592 | null |
| 2025-03-20 | From Monocular Vision to Autonomous Action: Guiding Tumor Resection via 3D Reconstruction | Ayberk Acar et.al. | 2503.16263 | null |
| 2025-03-20 | Accurate Scene Text Recognition with Efficient Model Scaling and Cloze Self-Distillation | Andrea Maracani et.al. | 2503.16184 | null |
| 2025-03-20 | What can Off-the-Shelves Large Multi-Modal Models do for Dynamic Scene Graph Generation? | Xuanming Cui et.al. | 2503.15846 | null |
| 2025-03-19 | A Context-Driven Training-Free Network for Lightweight Scene Text Segmentation and Recognition | Ritabrata Chakraborty et.al. | 2503.15639 | null |
| 2025-03-19 | Learning 4D Panoptic Scene Graph Generation from Rich 2D Visual Scene | Shengqiong Wu et.al. | 2503.15019 | null |
| 2025-03-19 | Universal Scene Graph Generation | Shengqiong Wu et.al. | 2503.15005 | null |
| 2025-03-19 | SemanticFlow: A Self-Supervised Framework for Joint Scene Flow Prediction and Instance Segmentation in Dynamic Environments | Yinqi Chen et.al. | 2503.14837 | null |
| 2025-03-20 | These Magic Moments: Differentiable Uncertainty Quantification of Radiance Field Models | Parker Ewen et.al. | 2503.14665 | null |
| 2025-03-17 | Learning-based 3D Reconstruction in Autonomous Driving: A Comprehensive Survey | Liewen Liao et.al. | 2503.14537 | null |
| 2025-03-18 | DIFFVSGG: Diffusion-Driven Online Video Scene Graph Generation | Mu Chen et.al. | 2503.13957 | link |
| 2025-03-18 | Conformal Prediction and MLLM aided Uncertainty Quantification in Scene Graph Generation | Sayak Nag et.al. | 2503.13947 | null |
| 2025-03-18 | ChatBEV: A Visual Language Model that Understands BEV Maps | Qingyao Xu et.al. | 2503.13938 | null |
| 2025-03-18 | PSA-SSL: Pose and Size-aware Self-Supervised Learning on LiDAR Point Clouds | Barza Nisar et.al. | 2503.13914 | link |
| 2025-03-17 | Clustering is back: Reaching state-of-the-art LiDAR instance segmentation without training | Corentin Sautier et.al. | 2503.13203 | null |
| 2025-03-17 | Crab: A Unified Audio-Visual Scene Understanding Model with Explicit Cooperation | Henghui Du et.al. | 2503.13068 | null |
| 2025-03-17 | InsightDrive: Insight Scene Representation for End-to-End Autonomous Driving | Ruiqi Song et.al. | 2503.13047 | null |
| 2025-03-17 | HIS-GPT: Towards 3D Human-In-Scene Multimodal Understanding | Jiahe Zhao et.al. | 2503.12955 | link |
| 2025-03-17 | NuPlanQA: A Large-Scale Dataset and Benchmark for Multi-View Driving Scene Understanding in Multi-Modal Large Language Models | Sung-Yeon Park et.al. | 2503.12772 | link |
| 2025-03-16 | Logic-RAG: Augmenting Large Multimodal Models with Visual-Spatial Knowledge for Road Scene Understanding | Imran Kabir et.al. | 2503.12663 | null |
| 2025-03-16 | Car-1000: A New Large Scale Fine-Grained Visual Categorization Dataset | Yutao Hu et.al. | 2503.12385 | null |
| 2025-03-15 | TACO: Taming Diffusion for in-the-wild Video Amodal Completion | Ruijie Lu et.al. | 2503.12049 | null |
| 2025-03-14 | Human-in-the-Loop Local Corrections of 3D Scene Layouts via Infilling | Christopher Xie et.al. | 2503.11806 | null |
| 2025-03-14 | EgoSplat: Open-Vocabulary Egocentric Scene Understanding with Language Embedded 3D Gaussian Splatting | Di Li et.al. | 2503.11345 | null |
| 2025-03-14 | Road Rage Reasoning with Vision-language Models (VLMs): Task Definition and Evaluation Dataset | Yibing Weng et.al. | 2503.11342 | null |
| 2025-03-13 | Graph-Grounded LLMs: Leveraging Graphical Function Calling to Minimize LLM Hallucinations | Piyush Gupta et.al. | 2503.10941 | null |
| 2025-03-11 | MaskAttn-UNet: A Mask Attention-Driven Framework for Universal Low-Resolution Image Segmentation | Anzhe Cheng et.al. | 2503.10686 | null |
| 2025-03-13 | TARS: Traffic-Aware Radar Scene Flow Estimation | Jialong Wu et.al. | 2503.10210 | null |
| 2025-03-13 | TGP: Two-modal occupancy prediction with 3D Gaussian and sparse points for 3D Environment Awareness | Mu Chen et.al. | 2503.09941 | null |
| 2025-03-12 | Object-Aware DINO (Oh-A-Dino): Enhancing Self-Supervised Representations for Multi-Object Instance Retrieval | Stefan Sylvius Wagner et.al. | 2503.09867 | null |
| 2025-03-11 | Language-Depth Navigated Thermal and Visible Image Fusion | Jinchang Zhang et.al. | 2503.08676 | null |
| 2025-03-11 | Generating Robot Constitutions & Benchmarks for Semantic Safety | Pierre Sermanet et.al. | 2503.08663 | null |
| 2025-03-11 | Collaborative Dynamic 3D Scene Graphs for Open-Vocabulary Urban Scene Understanding | Tim Steinke et.al. | 2503.08474 | null |
| 2025-03-11 | TrackOcc: Camera-based 4D Panoptic Occupancy Tracking | Zhuoguang Chen et.al. | 2503.08471 | link |
| 2025-03-11 | Ev-Layout: A Large-scale Event-based Multi-modal Dataset for Indoor Layout Estimation and Tracking | Xucheng Guo et.al. | 2503.08370 | null |
| 2025-03-11 | DIV-FF: Dynamic Image-Video Feature Fields For Environment Understanding in Egocentric Videos | Lorenzo Mur-Labadia et.al. | 2503.08344 | null |
| 2025-03-11 | Talk2PC: Enhancing 3D Visual Grounding through LiDAR and Radar Point Clouds Fusion for Autonomous Driving | Runwei Guan et.al. | 2503.08336 | link |
| 2025-03-11 | General-Purpose Aerial Intelligent Agents Empowered by Large Language Models | Ji Zhao et.al. | 2503.08302 | null |
| 2025-03-10 | FunGraph: Functionality Aware 3D Scene Graphs for Language-Prompted Scene Interaction | Dennis Rotondi et.al. | 2503.07909 | null |
| 2025-03-10 | Chameleon: Fast-slow Neuro-symbolic Lane Topology Extraction | Zongzheng Zhang et.al. | 2503.07485 | null |
| 2025-03-10 | CoT-Drive: Efficient Motion Forecasting for Autonomous Driving with LLMs and Chain-of-Thought Prompting | Haicheng Liao et.al. | 2503.07234 | null |
| 2025-03-10 | A Data-Centric Revisit of Pre-Trained Vision Models for Robot Learning | Xin Wen et.al. | 2503.06960 | link |
| 2025-03-10 | LLaFEA: Frame-Event Complementary Fusion for Fine-Grained Spatiotemporal Understanding in LMMs | Hanyu Zhou et.al. | 2503.06934 | null |
| 2025-03-08 | SplatTalk: 3D VQA with Gaussian Splatting | Anh Thai et.al. | 2503.06271 | null |
| 2025-03-08 | Segment Anything, Even Occluded | Wei-En Tai et.al. | 2503.06261 | null |
| 2025-03-08 | VLScene: Vision-Language Guidance Distillation for Camera-Based 3D Semantic Scene Completion | Meng Wang et.al. | 2503.06219 | null |
| 2025-03-08 | Attention on the Wires (AttWire): A Foundation Model for Detecting Devices and Catheters in X-ray Fluoroscopic Images | YingLiang Ma et.al. | 2503.06190 | null |
| 2025-03-08 | Feature-EndoGaussian: Feature Distilled Gaussian Splatting in Surgical Deformable Scene Reconstruction | Kai Li et.al. | 2503.06161 | null |
| 2025-03-08 | Towards Ambiguity-Free Spatial Foundation Model: Rethinking and Decoupling Depth Ambiguity | Xiaohao Xu et.al. | 2503.06014 | null |
| 2025-03-07 | HexPlane Representation for 3D Semantic Scene Understanding | Zeren Chen et.al. | 2503.05127 | null |
| 2025-03-06 | Extracting Symbolic Sequences from Visual Representations via Self-Supervised Learning | Victor Sebastian Martinez Pozos et.al. | 2503.04900 | null |
| 2025-03-06 | EvidMTL: Evidential Multi-Task Learning for Uncertainty-Aware Semantic Surface Mapping from Monocular RGB Images | Rohit Menon et.al. | 2503.04441 | null |
| 2025-03-06 | An Egocentric Vision-Language Model based Portable Real-time Smart Assistant | Yifei Huang et.al. | 2503.04250 | null |
| 2025-03-06 | H3O: Hyper-Efficient 3D Occupancy Prediction with Heterogeneous Supervision | Yunxiao Shi et.al. | 2503.04059 | null |
| 2025-03-06 | GaussianGraph: 3D Gaussian-based Scene Graph Generation for Open-world Scene Understanding | Xihan Wang et.al. | 2503.04034 | null |
| 2025-03-05 | SurgiSAM2: Fine-tuning a foundational model for surgical video anatomy segmentation and detection | Devanish N. Kamtam et.al. | 2503.03942 | null |
| 2025-03-05 | Vision-Language Models Struggle to Align Entities across Modalities | Iñigo Alonso et.al. | 2503.03854 | null |
| 2025-03-05 | Improving 6D Object Pose Estimation of metallic Household and Industry Objects | Thomas Pöllabauer et.al. | 2503.03655 | null |
| 2025-03-04 | MM-OR: A Large Multimodal Operating Room Dataset for Semantic Understanding of High-Intensity Surgical Environments | Ege Özsoy et.al. | 2503.02579 | link |
| 2025-03-04 | Label-Efficient LiDAR Panoptic Segmentation | Ahmet Selim Çanakçı et.al. | 2503.02372 | null |
| 2025-03-04 | SSNet: Saliency Prior and State Space Model-based Network for Salient Object Detection in RGB-D Images | Gargi Panda et.al. | 2503.02270 | null |
| 2025-03-03 | vS-Graphs: Integrating Visual SLAM and Situational Graphs through Multi-level Scene Understanding | Ali Tourani et.al. | 2503.01783 | link |
| 2025-03-03 | OpenGS-SLAM: Open-Set Dense Semantic SLAM with 3D Gaussian Splatting for Object-Level Scene Understanding | Dianyi Yang et.al. | 2503.01646 | null |
| 2025-03-03 | Every SAM Drop Counts: Embracing Semantic Priors for Multi-Modality Image Fusion and Beyond | Guanyao Wu et.al. | 2503.01210 | link |
| 2025-03-03 | Semi-Supervised 360 Layout Estimation with Panoramic Collaborative Perturbations | Junsong Zhang et.al. | 2503.01114 | null |
| 2025-03-01 | Unbiased Video Scene Graph Generation via Visual and Semantic Dual Debiasing | Yanjun Li et.al. | 2503.00548 | null |
| 2025-03-01 | Inst3D-LMM: Instance-Aware 3D Scene Understanding with Multi-modal Instruction Tuning | Hanxun Yu et.al. | 2503.00513 | link |
| 2025-03-04 | Floorplan-SLAM: A Real-Time, High-Accuracy, and Long-Term Multi-Session Point-Plane SLAM for Efficient Floorplan Reconstruction | Haolin Wang et.al. | 2503.00397 | null |
| 2025-02-28 | Vibrotactile information coding strategies for a body-worn vest to aid robot-human collaboration | Adrian Vecina Tercero et.al. | 2502.21056 | null |
| 2025-02-27 | Towards Statistical Factuality Guarantee for Large Vision-Language Models | Zhuohang Li et.al. | 2502.20560 | null |
| 2025-02-26 | Distill Any Depth: Distillation Creates a Stronger Monocular Depth Estimator | Xiankang He et.al. | 2502.19204 | link |
| 2025-02-25 | VLM-E2E: Enhancing End-to-End Autonomous Driving with Multimodal Driver Attention Fusion | Pei Liu et.al. | 2502.18042 | null |
| 2025-02-24 | AAD-LLM: Neural Attention-Driven Auditory Scene Understanding | Xilin Jiang et.al. | 2502.16794 | null |
| 2025-02-28 | Unposed Sparse Views Room Layout Reconstruction in the Age of Pretrain Model | Yaxuan Huang et.al. | 2502.16779 | link |
| 2025-02-23 | Dr. Splat: Directly Referring 3D Gaussian Splatting via Direct Language Embedding Registration | Kim Jun-Seong et.al. | 2502.16652 | null |
| 2025-02-21 | Weakly Supervised Video Scene Graph Generation via Natural Language Supervision | Kibum Kim et.al. | 2502.15370 | link |
| 2025-02-21 | DynamicGSG: Dynamic 3D Gaussian Scene Graphs for Environment Adaptation | Luzhou Ge et.al. | 2502.15309 | link |
| 2025-02-21 | Hierarchical Context Transformer for Multi-level Semantic Scene Understanding | Luoying Hao et.al. | 2502.15184 | link |
| 2025-02-20 | CrossOver: 3D Scene Cross-Modal Alignment | Sayan Deb Sarkar et.al. | 2502.15011 | link |
| 2025-02-20 | Hier-SLAM++: Neuro-Symbolic Semantic SLAM with a Hierarchically Categorical Gaussian Splatting | Boying Li et.al. | 2502.14931 | null |
| 2025-02-19 | Sce2DriveX: A Generalized MLLM Framework for Scene-to-Drive Learning | Rui Zhao et.al. | 2502.14917 | null |
| 2025-02-16 | Surgical Scene Understanding in the Era of Foundation AI Models: A Comprehensive Review | Ufaq Khan et.al. | 2502.14886 | null |
| 2025-02-21 | AVD2: Accident Video Diffusion for Accident Video Description | Cheng Li et.al. | 2502.14801 | null |
| 2025-02-18 | Spiking Vision Transformer with Saccadic Attention | Shuai Wang et.al. | 2502.12677 | null |
| 2025-02-16 | NavRAG: Generating User Demand Instructions for Embodied Navigation through Retrieval-Augmented LLM | Zihan Wang et.al. | 2502.11142 | link |
| 2025-02-15 | Occlusion-aware Non-Rigid Point Cloud Registration via Unsupervised Neural Deformation Correntropy | Mingyang Zhao et.al. | 2502.10704 | link |
| 2025-02-14 | Leveraging V2X for Collaborative HD Maps Construction Using Scene Graph Generation | Gamal Elghazaly et.al. | 2502.10127 | null |
| 2025-02-13 | FLARES: Fast and Accurate LiDAR Multi-Range Semantic Segmentation | Bin Yang et.al. | 2502.09274 | null |
| 2025-02-13 | Billet Number Recognition Based on Test-Time Adaptation | Yuan Wei et.al. | 2502.09026 | null |
| 2025-02-13 | EventSTR: A Benchmark Dataset and Baselines for Event Stream based Scene Text Recognition | Xiao Wang et.al. | 2502.09020 | link |
| 2025-02-13 | 3D-Grounded Vision-Language Framework for Robotic Task Planning: Automated Prompt Synthesis and Supervised Reasoning | Guoqin Tang et.al. | 2502.08903 | null |
| 2025-02-10 | Fully Exploiting Vision Foundation Model’s Profound Prior Knowledge for Generalizable RGB-Depth Driving Scene Parsing | Sicen Guo et.al. | 2502.06219 | null |
| 2025-02-08 | Content-based Video Retrieval in Traffic Videos using Latent Dirichlet Allocation Topic Model | Mohammad Kianpisheh et.al. | 2502.05457 | null |
| 2025-02-06 | sshELF: Single-Shot Hierarchical Extrapolation of Latent Features for 3D Reconstruction from Sparse-Views | Eyvaz Najafli et.al. | 2502.04318 | null |
| 2025-02-06 | Taking A Closer Look at Interacting Objects: Interaction-Aware Open Vocabulary Scene Graph Generation | Lin Li et.al. | 2502.03856 | null |
| 2025-02-05 | EnVisionVR: A Scene Interpretation Tool for Visual Accessibility in Virtual Reality | Junlong Chen et.al. | 2502.03564 | null |
| 2025-02-04 | Mosaic3D: Foundation Dataset and Model for Open-Vocabulary 3D Segmentation | Junha Lee et.al. | 2502.02548 | null |
| 2025-02-04 | Event-aided Semantic Scene Completion | Shangwei Guo et.al. | 2502.02334 | link |
| 2025-02-03 | AquaticCLIP: A Vision-Language Foundation Model for Underwater Scene Analysis | Basit Alawode et.al. | 2502.01785 | null |
| 2025-01-30 | Integrating LMM Planners and 3D Skill Policies for Generalizable Manipulation | Yuelei Li et.al. | 2501.18733 | null |
| 2025-01-30 | Efficient Interactive 3D Multi-Object Removal | Jingcheng Ni et.al. | 2501.17636 | null |
| 2025-02-04 | Contextual Self-paced Learning for Weakly Supervised Spatio-Temporal Video Grounding | Akash Kumar et.al. | 2501.17053 | null |
| 2025-01-29 | PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding | Wei Chow et.al. | 2501.16411 | null |
| 2025-01-26 | Ocean-OCR: Towards General OCR Application via a Vision-Language Model | Song Chen et.al. | 2501.15558 | link |
| 2025-01-26 | Unveiling the Potential of iMarkers: Invisible Fiducial Markers for Advanced Robotics | Ali Tourani et.al. | 2501.15505 | link |
| 2025-01-24 | HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation | Xin Zhou et.al. | 2501.14729 | link |
| 2025-01-24 | Scene Understanding Enabled Semantic Communication with Open Channel Coding | Zhe Xiang et.al. | 2501.14520 | null |
| 2025-01-23 | GeomGS: LiDAR-Guided Geometry-Aware Gaussian Splatting for Robot Localization | Jaewon Lee et.al. | 2501.13417 | null |
| 2025-01-22 | Neural Radiance Fields for the Real World: A Survey | Wenhui Xiao et.al. | 2501.13104 | null |
| 2025-01-22 | PSGSL: A Probabilistic Framework Integrating Semantic Scene Understanding and Gas Sensing for Gas Source Localization | Pepe Ojeda et.al. | 2501.12812 | null |
| 2025-01-20 | Dynamic Scene Understanding from Vision-Language Representations | Shahaf Pruss et.al. | 2501.11653 | null |
| 2025-01-20 | EndoChat: Grounded Multimodal Large Language Model for Endoscopic Surgery | Guankun Wang et.al. | 2501.11347 | link |
| 2025-01-20 | A Survey of World Models for Autonomous Driving | Tuo Feng et.al. | 2501.11260 | null |
| 2025-01-17 | A Vision-Language Framework for Multispectral Scene Representation Using Language-Grounded Features | Enes Karanfil et.al. | 2501.10144 | null |
| 2025-01-16 | CrossModalityDiffusion: Multi-Modal Novel View Synthesis with Unified Intermediate Representation | Alex Berian et.al. | 2501.09838 | link |
| 2025-01-16 | YETI (YET to Intervene) Proactive Interventions by Multimodal AI Agents in Augmented Reality Tasks | Saptarashmi Bandyopadhyay et.al. | 2501.09355 | null |
| 2025-01-15 | Embodied Scene Understanding for Vision Language Models via MetaVQA | Weizhen Wang et.al. | 2501.09167 | null |
| 2025-01-15 | GOTLoc: General Outdoor Text-based Localization Using Scene Graph Retrieval with OpenStreetMap | Donghwi Jung et.al. | 2501.08575 | link |
| 2025-01-14 | 3UR-LLM: An End-to-End Multimodal Large Language Model for 3D Scene Understanding | Haomiao Xiong et.al. | 2501.07819 | link |
| 2025-01-13 | Zero-Shot Scene Understanding for Automatic Target Recognition Using Large Vision-Language Models | Yasiru Ranasinghe et.al. | 2501.07396 | null |
| 2025-01-13 | Hierarchical Superpixel Segmentation via Structural Information Theory | Minhui Xie et.al. | 2501.07069 | link |
| 2025-01-12 | Application of Vision-Language Model to Pedestrians Behavior and Scene Understanding in Autonomous Driving | Haoxiang Gao et.al. | 2501.06680 | null |
| 2025-01-08 | NextStop: An Improved Tracker For Panoptic LIDAR Segmentation Data | Nirit Alkalay et.al. | 2501.06235 | null |
| 2025-01-10 | Self-Supervised Partial Cycle-Consistency for Multi-View Matching | Fedor Taggenbrock et.al. | 2501.06000 | link |
| 2025-01-10 | UniQ: Unified Decoder with Task-specific Queries for Efficient Scene Graph Generation | Xinyao Liao et.al. | 2501.05687 | null |
| 2025-01-09 | Vision-Language Models for Autonomous Driving: CLIP-Based Dynamic Scene Understanding | Mohammed Elhenawy et.al. | 2501.05566 | null |
| 2025-01-09 | A Systematic Literature Review on Deep Learning-based Depth Estimation in Computer Vision | Ali Rohan et.al. | 2501.05147 | null |
| 2025-01-08 | TADFormer: Task-Adaptive Dynamic Transformer for Efficient Multi-Task Learning | Seungmin Baek et.al. | 2501.04293 | null |
| 2025-01-07 | A Bayesian Modeling Framework for Estimation and Ground Segmentation of Cluttered Staircases | Prasanna Sriganesh et.al. | 2501.04170 | null |
| 2025-01-07 | LargeAD: Large-Scale Cross-Sensor Data Pretraining for Autonomous Driving | Lingdong Kong et.al. | 2501.04005 | null |
| 2025-01-07 | CL3DOR: Contrastive Learning for 3D Large Multimodal Models via Odds Ratio on High-Resolution Point Clouds | Keonwoo Kim et.al. | 2501.03879 | null |
| 2025-01-07 | Advancing the Understanding of Fine-Grained 3D Forest Structures using Digital Cousins and Simulation-to-Reality: Methods and Datasets | Jing Liu et.al. | 2501.03637 | null |
| 2025-01-03 | VideoLifter: Lifting Videos to 3D with Fast Hierarchical Stereo Alignment | Wenyan Cong et.al. | 2501.01949 | null |
| 2025-01-03 | IAM: Enhancing RGB-D Instance Segmentation with New Benchmarks | Aecheon Jung et.al. | 2501.01685 | link |
| 2025-01-09 | GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models | Zhangyang Qi et.al. | 2501.01428 | null |
| 2025-01-02 | 3D-LLaVA: Towards Generalist 3D LMMs with Omni Superpoint Transformer | Jiajun Deng et.al. | 2501.01163 | null |
| 2025-01-02 | Leverage Cross-Attention for End-to-End Open-Vocabulary Panoptic Reconstruction | Xuan Yu et.al. | 2501.01119 | null |
| 2024-12-31 | STORM: Spatio-Temporal Reconstruction Model for Large-Scale Outdoor Scenes | Jiawei Yang et.al. | 2501.00602 | null |
| 2024-12-31 | Embodied VideoAgent: Persistent Memory from Egocentric Videos and Embodied Sensors Enables Dynamic Scene Understanding | Yue Fan et.al. | 2501.00358 | null |
| 2024-12-31 | OVGaussian: Generalizable 3D Gaussian Segmentation with Open Vocabularies | Runnan Chen et.al. | 2501.00326 | link |
| 2024-12-30 | Text-to-Image GAN with Pretrained Representations | Xiaozhou You et.al. | 2501.00116 | null |
| 2024-12-30 | 4D Gaussian Splatting: Modeling Dynamic Scenes with Native 4D Primitives | Zeyu Yang et.al. | 2412.20720 | null |
| 2024-12-27 | An Actionable Hierarchical Scene Representation Enhancing Autonomous Inspection Missions in Unknown Environments | Vignesh Kottayam Viswanathan et.al. | 2412.19582 | null |
| 2024-12-27 | xFLIE: Leveraging Actionable Hierarchical Scene Representations for Autonomous Semantic-Aware Inspection Missions | Vignesh Kottayam Viswanathan et.al. | 2412.19571 | link |
| 2024-12-27 | MLLM-SUL: Multimodal Large Language Model for Semantic Scene Understanding and Localization in Traffic Scenarios | Jiaqi Fan et.al. | 2412.19406 | null |
| 2024-12-26 | Relation-aware Hierarchical Prompt for Open-vocabulary Scene Graph Generation | Tao Liu et.al. | 2412.19021 | null |
| 2024-12-25 | 3DGraphLLM: Combining Semantic Graphs and Large Language Models for 3D Scene Understanding | Tatiana Zemskova et.al. | 2412.18450 | link |
| 2024-12-24 | MR-COGraphs: Communication-efficient Multi-Robot Open-vocabulary Mapping System via 3D Scene Graphs | Qiuyi Gu et.al. | 2412.18381 | null |
| 2024-12-24 | Parallel Neural Computing for Scene Understanding from LiDAR Perception in Autonomous Racing | Suwesh Prasad Sah et.al. | 2412.18165 | link |
| 2024-12-24 | UniPLV: Towards Label-Efficient Open-World 3D Scene Understanding by Regional Visual Language Supervision | Yuru Wang et.al. | 2412.18131 | null |
| 2024-12-24 | LangSurf: Language-Embedded Surface Gaussians for 3D Scene Understanding | Hao Li et.al. | 2412.17635 | null |
| 2024-12-21 | Application of Multimodal Large Language Models in Autonomous Driving | Md Robiul Islam et.al. | 2412.16410 | null |
| 2024-12-20 | Improving Object Detection for Time-Lapse Imagery Using Temporal Features in Wildlife Monitoring | Marcus Jenkins et.al. | 2412.16329 | link |
| 2024-12-19 | AutoTrust: Benchmarking Trustworthiness in Large Vision Language Models for Autonomous Driving | Shuo Xing et.al. | 2412.15206 | link |
| 2024-12-19 | ObjVariantEnsemble: Advancing Point Cloud LLM Evaluation in Challenging Scenes with Subtly Distinguished Objects | Qihang Cao et.al. | 2412.14837 | null |
| 2024-12-19 | PC-BEV: An Efficient Polar-Cartesian BEV Fusion Framework for LiDAR Semantic Segmentation | Shoumeng Qiu et.al. | 2412.14821 | link |
| 2024-12-18 | GAGS: Granularity-Aware Feature Distillation for Language Gaussian Splatting | Yuning Peng et.al. | 2412.13654 | null |
| 2024-12-18 | RelationField: Relate Anything in Radiance Fields | Sebastian Koch et.al. | 2412.13652 | null |
| 2024-12-18 | Multi-View Pedestrian Occupancy Prediction with a Novel Synthetic Dataset | Sithu Aung et.al. | 2412.13569 | null |
| 2024-12-17 | RA-SGG: Retrieval-Augmented Scene Graph Generation Framework via Multi-Prototype Learning | Kanghoon Yoon et.al. | 2412.12788 | link |
| 2024-12-18 | Dense Audio-Visual Event Localization under Cross-Modal Consistency and Multi-Temporal Granularity Collaboration | Ziheng Zhou et.al. | 2412.12628 | null |
| 2024-12-17 | Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning | Qi Sun et.al. | 2412.11974 | link |
| 2024-12-16 | DINO-Foresight: Looking into the Future with DINO | Efstathios Karypidis et.al. | 2412.11673 | link |
| 2024-12-16 | An Enhanced Classification Method Based on Adaptive Multi-Scale Fusion for Long-tailed Multispectral Point Clouds | TianZhu Liu et.al. | 2412.11407 | null |
| 2024-12-15 | SceneLLM: Implicit Language Reasoning in LLM for Dynamic Scene Graph Generation | Hang Zhang et.al. | 2412.11026 | null |
| 2024-12-13 | SuperGSeg: Open-Vocabulary 3D Segmentation with Structured Super-Gaussians | Siyun Liang et.al. | 2412.10231 | null |
| 2024-12-13 | Arbitrary Reading Order Scene Text Spotter with Local Semantics Guidance | Jiahao Lyu et.al. | 2412.10159 | null |
| 2024-12-17 | WiseAD: Knowledge Augmented End-to-End Autonomous Driving with Vision-Language Model | Songyan Zhang et.al. | 2412.09951 | link |
| 2024-12-12 | LIVE-GS: LLM Powers Interactive VR by Enhancing Gaussian Splatting | Haotian Mao et.al. | 2412.09176 | null |
| 2024-12-11 | SLGaussian: Fast Language Gaussian Splatting in Sparse Views | Kangjie Chen et.al. | 2412.08331 | null |
| 2024-12-11 | TGOSPA Metric Parameters Selection and Evaluation for Visual Multi-object Tracking | Jan Krejčí et.al. | 2412.08321 | null |
| 2024-12-11 | THUD++: Large-Scale Dynamic Indoor Scene Dataset and Benchmark for Mobile Robots | Zeshun Li et.al. | 2412.08096 | null |
| 2024-12-11 | MAGIC: Mastering Physical Adversarial Generation in Context through Collaborative LLM Agents | Yun Xing et.al. | 2412.08014 | null |
| 2024-12-10 | Motion-aware Contrastive Learning for Temporal Panoptic Scene Graph Generation | Thong Thanh Nguyen et.al. | 2412.07160 | null |
| 2024-12-11 | ProVision: Programmatically Scaling Vision-centric Instruction Data for Multimodal Language Models | Jieyu Zhang et.al. | 2412.07012 | link |
| 2024-12-07 | Timely reliable Bayesian decision-making enabled using memristors | Lekai Song et.al. | 2412.06838 | null |
| 2024-12-09 | Visual Lexicon: Rich Image Features in Language Space | XuDong Wang et.al. | 2412.06774 | null |
| 2024-12-09 | LLaVA-SpaceSGG: Visual Instruct Tuning for Open-vocabulary Scene Graph Generation with Enhanced Spatial Relations | Mingjie Xu et.al. | 2412.06322 | link |
| 2024-12-09 | Event fields: Capturing light fields at high speed, resolution, and dynamic range | Ziyuan Qu et.al. | 2412.06191 | null |
| 2024-12-07 | TB-HSU: Hierarchical 3D Scene Understanding with Contextual Affordances | Wenting Xu et.al. | 2412.05596 | null |
| 2024-12-06 | Stag-1: Towards Realistic 4D Driving Simulation with Video Generation Model | Lening Wang et.al. | 2412.05280 | link |
| 2024-12-06 | EmbodiedOcc: Embodied 3D Occupancy Prediction for Vision-based Online Scene Understanding | Yuqi Wu et.al. | 2412.04380 | link |
| 2024-12-04 | Designing DNNs for a trade-off between robustness and processing performance in embedded devices | Jon Gutiérrez-Zaballa et.al. | 2412.03682 | null |
| 2024-12-04 | Assessing the performance of CT image denoisers using Laguerre-Gauss Channelized Hotelling Observer for lesion detection | Prabhat Kc et.al. | 2412.02920 | null |
| 2024-12-03 | BYE: Build Your Encoder with One Sequence of Exploration Data for Long-Term Dynamic Scene Understanding | Chenguang Huang et.al. | 2412.02449 | null |
| 2024-12-04 | SparseLGS: Sparse View Language Embedded Gaussian Splatting | Jun Hu et.al. | 2412.02245 | null |
| 2024-12-02 | Occam’s LGS: A Simple Approach for Language Gaussian Splatting | Jiahuan Cheng et.al. | 2412.01807 | null |
| 2024-12-02 | Holistic Understanding of 3D Scenes as Universal Scene Description | Anna-Maria Halacheva et.al. | 2412.01398 | null |
| 2024-12-02 | LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences | Hongyan Zhi et.al. | 2412.01292 | null |
| 2024-12-02 | A Semantic Communication System for Real-time 3D Reconstruction Tasks | Jiaxing Zhang et.al. | 2412.01191 | null |
| 2024-12-02 | TextSSR: Diffusion-based Data Synthesis for Scene Text Recognition | Xingsong Ye et.al. | 2412.01137 | link |
| 2024-12-01 | ChatSplat: 3D Conversational Gaussian Splatting | Hanlin Chen et.al. | 2412.00734 | null |
| 2024-11-30 | Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding | Duo Zheng et.al. | 2412.00493 | null |
| 2024-11-29 | SIMS: Simulating Human-Scene Interactions with Real World Script Planning | Wenjia Wang et.al. | 2411.19921 | null |
| 2024-11-29 | Quantifying the synthetic and real domain gap in aerial scene understanding | Alina Marcu et.al. | 2411.19913 | null |
| 2024-11-29 | Bootstraping Clustering of Gaussians for View-consistent 3D Scene Understanding | Wenbo Zhang et.al. | 2411.19551 | null |
| 2024-11-28 | GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks | Muhammad Sohail Danish et.al. | 2411.19325 | link |
| 2024-11-28 | On-chip Hyperspectral Image Segmentation with Fully Convolutional Networks for Scene Understanding in Autonomous Driving | Jon Gutiérrez-Zaballa et.al. | 2411.19274 | null |
| 2024-11-28 | InstanceGaussian: Appearance-Semantic Joint Gaussian Representation for 3D Instance-Level Perception | Haijie Li et.al. | 2411.19235 | null |
| 2024-11-27 | Reconstructing Animals and the Wild | Peter Kulits et.al. | 2411.18807 | null |
| 2024-11-27 | Grid-augumented vision: A simple yet effective approach for enhanced spatial understanding in multi-modal agents | Joongwon Chae et.al. | 2411.18270 | null |
| 2024-11-27 | HyperGLM: HyperGraph for Video Scene Graph Generation and Anticipation | Trong-Thuan Nguyen et.al. | 2411.18042 | null |
| 2024-11-26 | Box for Mask and Mask for Box: weak losses for multi-task partially supervised learning | Hoàng-Ân Lê et.al. | 2411.17536 | link |
| 2024-11-26 | HSI-Drive v2.0: More Data for New Challenges in Scene Understanding for Autonomous Driving | Jon Gutiérrez-Zaballa et.al. | 2411.17530 | null |
| 2024-11-25 | RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics | Chan Hee Song et.al. | 2411.16537 | null |
| 2024-11-27 | An End-to-End Robust Point Cloud Semantic Segmentation Network with Single-Step Conditional Diffusion Models | Wentao Qu et.al. | 2411.16308 | link |
| 2024-11-25 | Open-Vocabulary Octree-Graph for 3D Scene Understanding | Zhigang Wang et.al. | 2411.16253 | null |
| 2024-11-24 | SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition | Yongkun Du et.al. | 2411.15858 | link |
| 2024-11-24 | ROOT: VLM based System for Indoor Scene Understanding and Beyond | Yonghui Wang et.al. | 2411.15714 | link |
| 2024-11-23 | Comparative Analysis of Resource-Efficient CNN Architectures for Brain Tumor Classification | Md Ashik Khan et.al. | 2411.15596 | null |
| 2024-11-23 | Boosting Semi-Supervised Scene Text Recognition via Viewing and Summarizing | Yadong Qu et.al. | 2411.15585 | null |
| 2024-11-22 | UniGaussian: Driving Scene Reconstruction from Multiple Camera Models via Unified Gaussian Representations | Yuan Ren et.al. | 2411.15355 | null |
| 2024-11-21 | Multimodal 3D Reasoning Segmentation with Complex Scenes | Xueying Jiang et.al. | 2411.13927 | null |
| 2024-11-20 | Unbiased Scene Graph Generation by Type-Aware Message Passing on Heterogeneous and Dual Graphs | Guanglu Sun et.al. | 2411.13287 | null |
| 2024-11-20 | Towards Unbiased and Robust Spatio-Temporal Scene Graph Generation and Anticipation | Rohith Peddi et.al. | 2411.13059 | null |
| 2024-11-19 | GaussianPretrain: A Simple Unified 3D Gaussian Representation for Visual Pre-training in Autonomous Driving | Shaoqing Xu et.al. | 2411.12452 | link |
| 2024-11-19 | Classification of Geographical Land Structure Using Convolution Neural Network and Transfer Learning | Mustafa M. Abd Zaid et.al. | 2411.12415 | null |
| 2024-11-18 | Calibrated and Efficient Sampling-Free Confidence Estimation for LiDAR Scene Semantic Segmentation | Hanieh Shojaei Miandashti et.al. | 2411.11935 | null |
| 2024-11-18 | MGNiceNet: Unified Monocular Geometric Scene Understanding | Markus Schön et.al. | 2411.11466 | null |
| 2024-11-18 | The ADUULM-360 Dataset – A Multi-Modal Dataset for Depth Estimation in Adverse Weather | Markus Schön et.al. | 2411.11455 | null |
| 2024-11-18 | Reducing Label Dependency for Underwater Scene Understanding: A Survey of Datasets, Techniques and Applications | Scarlett Raine et.al. | 2411.11287 | null |
| 2024-11-19 | Relational Contrastive Learning and Masked Image Modeling for Scene Text Recognition | Tiancheng Lin et.al. | 2411.11219 | link |
| 2024-11-17 | Memory-Augmented Multimodal LLMs for Surgical VQA via Self-Contained Inquiry | Wenjun Hou et.al. | 2411.10937 | null |
| 2024-11-16 | MetricGold: Leveraging Text-To-Image Latent Diffusion Models for Metric Depth Estimation | Ansh Shah et.al. | 2411.10886 | link |
| 2024-11-16 | Large Language Models (LLMs) as Traffic Control Systems at Urban Intersections: A New Paradigm | Sari Masri et.al. | 2411.10869 | null |
| 2024-11-15 | TESGNN: Temporal Equivariant Scene Graph Neural Networks for Efficient and Robust Multi-View 3D Scene Understanding | Quang P. M. Pham et.al. | 2411.10509 | null |
| 2024-11-15 | Content-Aware Preserving Image Generation | Giang H. Le et.al. | 2411.09871 | null |
| 2024-11-13 | Voxeland: Probabilistic Instance-Aware Semantic Mapping with Evidence-based Uncertainty Quantification | Jose-Luis Matez-Bandera et.al. | 2411.08727 | link |
| 2024-11-11 | $SE(3)$ Equivariant Ray Embeddings for Implicit Multi-View Depth Estimation | Yinshuang Xu et.al. | 2411.07326 | null |
| 2024-11-06 | Graph-Based Multi-Modal Sensor Fusion for Autonomous Driving | Depanshu Sani et.al. | 2411.03702 | null |
| 2024-11-05 | VLA-3D: A Dataset for 3D Semantic Scene Understanding and Navigation | Haochen Zhang et.al. | 2411.03540 | link |
| 2024-11-05 | OLAF: A Plug-and-Play Framework for Enhanced Multi-object Multi-part Scene Parsing | Pranav Gupta et.al. | 2411.02858 | null |
| 2024-11-04 | Modeling Uncertainty in 3D Gaussian Splatting through Continuous Semantic Splatting | Joey Wilson et.al. | 2411.02547 | null |
| 2024-11-04 | Multi-task Geometric Estimation of Depth and Surface Normal from Monocular 360° Images | Kun Huang et.al. | 2411.01749 | link |
| 2024-11-03 | VQ-Map: Bird’s-Eye-View Map Layout Estimation in Tokenized Discrete Space via Vector Quantization | Yiwei Zhang et.al. | 2411.01618 | link |
| 2024-11-01 | On Deep Learning for Geometric and Semantic Scene Understanding Using On-Vehicle 3D LiDAR | Li Li et.al. | 2411.00600 | link |
| 2024-11-01 | Federated Voxel Scene Graph for Intracranial Hemorrhage | Antoine P. Sanner et.al. | 2411.00578 | null |
| 2024-10-30 | UniRiT: Towards Few-Shot Non-Rigid Point Cloud Registration | Geng Li et.al. | 2410.22909 | null |
| 2024-10-30 | Situational Scene Graph for Structured Human-centric Situation Understanding | Chinthani Sugandhika et.al. | 2410.22829 | null |
| 2024-10-30 | Symbolic Graph Inference for Compound Scene Understanding | FNU Aryan et.al. | 2410.22626 | null |
| 2024-10-29 | Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving | Bo Jiang et.al. | 2410.22313 | link |
| 2024-10-26 | Towards Robust Algorithms for Surgical Phase Recognition via Digital Twin-based Scene Representation | Hao Ding et.al. | 2410.20026 | null |
| 2024-10-23 | Surgical Scene Segmentation by Transformer With Asymmetric Feature Enhancement | Cheng Yuan et.al. | 2410.17642 | link |
| 2024-10-22 | PerspectiveNet: Multi-View Perception for Dynamic Scene Understanding | Vinh Nguyen et.al. | 2410.16824 | null |
| 2024-10-20 | Scene Graph Generation with Role-Playing Large Language Models | Guikun Chen et.al. | 2410.15364 | null |
| 2024-10-20 | Large Language Models for Autonomous Driving (LLM4AD): Concept, Benchmark, Simulation, and Real-Vehicle Experiment | Can Cui et.al. | 2410.15281 | null |
| 2024-10-19 | Semantically Safe Robot Manipulation: From Semantic Scene Understanding to Motion Safeguards | Lukas Brunke et.al. | 2410.15185 | null |
| 2024-10-19 | Part-Whole Relational Fusion Towards Multi-Modal Scene Understanding | Yi Liu et.al. | 2410.14944 | link |
| 2024-10-17 | ARKit LabelMaker: A New Scale for Indoor 3D Scene Understanding | Guangda Ji et.al. | 2410.13924 | link |
| 2024-10-17 | VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding | Runsen Xu et.al. | 2410.13860 | link |
| 2024-10-16 | 3D Gaussian Splatting in Robotics: A Survey | Siting Zhu et.al. | 2410.12262 | null |
| 2024-10-17 | SAM-Guided Masked Token Prediction for 3D Scene Understanding | Zhimin Chen et.al. | 2410.12158 | null |
| 2024-10-16 | Leveraging Large Vision Language Model For Better Automatic Web GUI Testing | Siyi Wang et.al. | 2410.12157 | null |
| 2024-10-15 | MCTBench: Multimodal Cognition towards Text-Rich Visual Scenes Benchmark | Bin Shan et.al. | 2410.11538 | link |
| 2024-10-14 | 3DArticCyclists: Generating Simulated Dynamic 3D Cyclists for Human-Object Interaction (HOI) and Autonomous Driving Applications | Eduardo R. Corral-Soto et.al. | 2410.10782 | null |
| 2024-10-17 | Stratified Domain Adaptation: A Progressive Self-Training Approach for Scene Text Recognition | Kha Nhat Le et.al. | 2410.09913 | null |
| 2024-10-13 | LoLI-Street: Benchmarking Low-Light Image Enhancement and Beyond | Md Tanvir Islam et.al. | 2410.09831 | link |
| 2024-10-12 | Enhancing Single Image to 3D Generation using Gaussian Splatting and Hybrid Diffusion Priors | Hritam Basak et.al. | 2410.09467 | null |
| 2024-10-11 | Dual-AEB: Synergizing Rule-Based and Multimodal Large Language Models for Effective Emergency Braking | Wei Zhang et.al. | 2410.08616 | null |
| 2024-10-10 | A transition towards virtual representations of visual scenes | Américo Pereira et.al. | 2410.07987 | null |
| 2024-10-10 | RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation | Songming Liu et.al. | 2410.07864 | null |
| 2024-10-11 | Test-Time Intensity Consistency Adaptation for Shadow Detection | Leyi Zhu et.al. | 2410.07695 | null |
| 2024-10-10 | 3D Vision-Language Gaussian Splatting | Qucheng Peng et.al. | 2410.07577 | null |
| 2024-10-09 | Evaluating the Impact of Point Cloud Colorization on Semantic Segmentation Accuracy | Qinfeng Zhu et.al. | 2410.06725 | null |
| 2024-10-09 | Open-RGBT: Open-vocabulary RGB-T Zero-shot Semantic Segmentation in Open-world Environments | Meng Yu et.al. | 2410.06626 | null |
| 2024-10-08 | BoxMap: Efficient Structural Mapping and Navigation | Zili Wang et.al. | 2410.06263 | null |
| 2024-10-08 | OrionNav: Online Planning for Robot Autonomy with Context-Aware LLM and Open-Vocabulary Semantic Scene Graphs | Venkata Naren Devarakonda et.al. | 2410.06239 | null |
| 2024-10-07 | Resource-Efficient Multiview Perception: Integrating Semantic Masking with Masked Autoencoders | Kosta Dakic et.al. | 2410.04817 | null |
| 2024-10-07 | Diffusion Models in 3D Vision: A Survey | Zhen Wang et.al. | 2410.04738 | null |
| 2024-10-06 | In-Place Panoptic Radiance Field Segmentation with Perceptual Prior for 3D Scene Understanding | Shenghao Li et.al. | 2410.04529 | null |
| 2024-10-05 | ETHcavation: A Dataset and Pipeline for Panoptic Scene Understanding and Object Tracking in Dynamic Construction Environments | Lorenzo Terenzi et.al. | 2410.04250 | null |
| 2024-10-05 | Fast Object Detection with a Machine Learning Edge Device | Richard C. Rodriguez et.al. | 2410.04173 | null |
| 2024-10-04 | SPARTUN3D: Situated Spatial Understanding of 3D World in Large Language Models | Yue Zhang et.al. | 2410.03878 | null |
| 2024-10-03 | RESSCAL3D++: Joint Acquisition and Semantic Segmentation of 3D Point Clouds | Remco Royen et.al. | 2410.02323 | link |
| 2024-10-01 | A Critical Assessment of Visual Sound Source Localization Models Including Negative Audio | Xavier Juanola et.al. | 2410.01020 | link |
| 2024-09-30 | Class-Agnostic Visio-Temporal Scene Sketch Semantic Segmentation | Aleyna Kütük et.al. | 2410.00266 | null |
| 2024-09-30 | Procedure-Aware Surgical Video-language Pretraining with Hierarchical Knowledge Augmentation | Kun Yuan et.al. | 2410.00263 | link |
| 2024-09-30 | You Only Speak Once to See | Wenhao Yang et.al. | 2409.18372 | null |
| 2024-09-26 | LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness | Chenming Zhu et.al. | 2409.18125 | null |
| 2024-09-26 | Text Image Generation for Low-Resource Languages with Dual Translation Learning | Chihiro Noguchi et.al. | 2409.17747 | null |
| 2024-09-26 | Scene Understanding in Pick-and-Place Tasks: Analyzing Transformations Between Initial and Final Scenes | Seraj Ghasemi et.al. | 2409.17720 | null |
| 2024-10-02 | BehAV: Behavioral Rule Guided Autonomy Using VLMs for Robot Navigation in Outdoor Scenes | Kasun Weerakoon et.al. | 2409.16484 | null |
| 2024-09-24 | Open-World Object Detection with Instance Representation Learning | Sunoh Lee et.al. | 2409.16073 | null |
| 2024-09-24 | Learning Multiple Probabilistic Decisions from Latent World Model in Autonomous Driving | Lingyu Xiao et.al. | 2409.15730 | link |
| 2024-09-27 | Diffusion-based RGB-D Semantic Segmentation with Deformable Attention Transformer | Minh Bui et.al. | 2409.15117 | null |
| 2024-09-23 | An Adverse Weather-Immune Scheme with Unfolded Regularization and Foundation Model Knowledge Distillation for Street Scene Understanding | Wei-Bin Kou et.al. | 2409.14737 | null |
| 2024-09-22 | One Model for Two Tasks: Cooperatively Recognizing and Recovering Low-Resolution Scene Text Images by Iterative Mutual Guidance | Minyi Zhao et.al. | 2409.14483 | null |
| 2024-09-22 | Scene-Text Grounding for Text-Based Video Question Answering | Sheng Zhou et.al. | 2409.14319 | null |
| 2024-09-21 | MOSE: Monocular Semantic Reconstruction Using NeRF-Lifted Noisy Priors | Zhenhua Du et.al. | 2409.14019 | null |
| 2024-09-21 | Relevance-driven Decision Making for Safer and More Efficient Human Robot Collaboration | Xiaotong Zhang et.al. | 2409.13998 | null |
| 2024-09-21 | Enhanced Semantic Segmentation for Large-Scale and Imbalanced Point Clouds | Haoran Gong et.al. | 2409.13983 | null |
| 2024-09-19 | CLAIR-A: Leveraging Large Language Models to Judge Audio Captions | Tsung-Han Wu et.al. | 2409.12962 | link |
| 2024-09-18 | Towards Global Localization using Multi-Modal Object-Instance Re-Identification | Aneesh Chavan et.al. | 2409.12002 | null |
| 2024-09-18 | SpotLight: Robotic Scene Understanding through Interaction and Affordance Detection | Tim Engelbracht et.al. | 2409.11870 | null |
| 2024-09-18 | VL-Reader: Vision and Language Reconstructor is an Effective Scene Text Recognizer | Humen Zhong et.al. | 2409.11656 | null |
| 2024-09-18 | DAF-Net: A Dual-Branch Feature Decomposition Fusion Network with Domain Adaptive for Infrared and Visible Image Fusion | Jian Xu et.al. | 2409.11642 | link |
| 2024-09-16 | Video Token Sparsification for Efficient Multimodal LLMs in Autonomous Driving | Yunsheng Ma et.al. | 2409.11182 | null |
| 2024-09-16 | Point2Graph: An End-to-end Point Cloud-based 3D Open-Vocabulary Scene Graph for Robot Navigation | Yifan Xu et.al. | 2409.10350 | null |
| 2024-09-16 | Hydra-SGG: Hybrid Relation Assignment for One-stage Scene Graph Generation | Minghan Chen et.al. | 2409.10262 | null |
| 2024-09-15 | Semantic2D: A Semantic Dataset for 2D Lidar Semantic Segmentation | Zhanteng Xie et.al. | 2409.09899 | null |
| 2024-09-12 | LED: Light Enhanced Depth Estimation at Night | Simon de Moreau et.al. | 2409.08031 | link |
| 2024-09-12 | Relevance for Human Robot Collaboration | Xiaotong Zhang et.al. | 2409.07753 | null |
| 2024-09-10 | Towards Localizing Structural Elements: Merging Geometrical Detection with Semantic Verification in RGB-D Data | Ali Tourani et.al. | 2409.06625 | null |
| 2024-09-10 | Loss Distillation via Gradient Matching for Point Cloud Completion with Weighted Chamfer Distance | Fangzhou Lin et.al. | 2409.06171 | link |
| 2024-09-09 | Online 3D reconstruction and dense tracking in endoscopic videos | Michel Hayoz et.al. | 2409.06037 | link |
| 2024-09-08 | TanDepth: Leveraging Global DEMs for Metric Monocular Depth Estimation in UAVs | Horatiu Florea et.al. | 2409.05142 | null |
| 2024-09-06 | Future Does Matter: Boosting 3D Object Detection with Temporal Motion Estimation in Point Cloud Sequences | Rui Yu et.al. | 2409.04390 | null |
| 2024-09-06 | RCNet: Deep Recurrent Collaborative Network for Multi-View Low-Light Image Enhancement | Hao Luo et.al. | 2409.04363 | link |
| 2024-09-05 | Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding | Yunze Man et.al. | 2409.03757 | link |
| 2024-09-05 | Optimizing 3D Gaussian Splatting for Sparse Viewpoint Scene Reconstruction | Shen Chen et.al. | 2409.03213 | null |
| 2024-09-04 | Can LVLMs Obtain a Driver’s License? A Benchmark Towards Reliable AGI for Autonomous Driving | Yuhang Lu et.al. | 2409.02914 | null |
| 2024-09-03 | Unveiling Deep Shadows: A Survey on Image and Video Shadow Detection, Removal, and Generation in the Era of Deep Learning | Xiaowei Hu et.al. | 2409.02108 | link |
| 2024-09-03 | EPRecon: An Efficient Framework for Real-Time Panoptic 3D Reconstruction from Monocular Video | Zhen Zhou et.al. | 2409.01807 | link |
| 2024-09-03 | GaussianPU: A Hybrid 2D-3D Upsampling Framework for Enhancing Color Point Clouds via 3D Gaussian Splatting | Zixuan Guo et.al. | 2409.01581 | null |
| 2024-08-31 | Leaky Wave Antenna-Equipped RF Chipless Tags for Orientation Estimation | Onel L. A. López et.al. | 2409.00501 | null |
| 2024-08-30 | UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios | Baichuan Zhou et.al. | 2408.17267 | link |
| 2024-08-30 | AdaptVision: Dynamic Input Scaling in MLLMs for Versatile Scene Understanding | Yonghui Wang et.al. | 2408.16986 | link |
| 2024-08-29 | DriveGenVLM: Real-world Video Generation for Vision Language Model based Autonomous Driving | Yongjie Fu et.al. | 2408.16647 | null |
| 2024-08-28 | Str-L Pose: Integrating Point and Structured Line for Relative Pose Estimation in Dual-Graph | Zherong Zhang et.al. | 2408.15750 | null |
| 2024-08-28 | RoboSense: Large-scale Dataset and Benchmark for Multi-sensor Low-speed Autonomous Driving | Haisheng Su et.al. | 2408.15503 | link |
| 2024-08-27 | Handling Geometric Domain Shifts in Semantic Segmentation of Surgical RGB and Hyperspectral Images | Silvia Seidlitz et.al. | 2408.15373 | link |
| 2024-08-27 | MTMamba++: Enhancing Multi-Task Dense Scene Understanding via Mamba-Based Decoders | Baijiong Lin et.al. | 2408.15101 | link |
| 2024-08-27 | Interactive Occlusion Boundary Estimation through Exploitation of Synthetic Data | Lintao Xu et.al. | 2408.15038 | null |
| 2024-08-27 | BOX3D: Lightweight Camera-LiDAR Fusion for 3D Object Detection and Localization | Mario A. V. Saucedo et.al. | 2408.14941 | null |
| 2024-08-27 | Platypus: A Generalized Specialist Model for Reading Text in Various Forms | Peng Wang et.al. | 2408.14805 | link |
| 2024-08-27 | RSTeller: Scaling Up Visual Language Modeling in Remote Sensing with Rich Linguistic Semantics from Openly Available Data and Large Language Models | Junyao Ge et.al. | 2408.14744 | link |
| 2024-08-26 | Ensemble Predicate Decoding for Unbiased Scene Graph Generation | Jiasong Feng et.al. | 2408.14187 | null |
| 2024-08-26 | FusionSAM: Latent Space driven Segment Anything Model for Multimodal Fusion and Segmentation | Daixun Li et.al. | 2408.13980 | null |
| 2024-08-25 | Making Large Language Models Better Planners with Reasoning-Decision Alignment | Zhijian Huang et.al. | 2408.13890 | null |
| 2024-08-25 | 3D-VirtFusion: Synthetic 3D Data Augmentation through Generative Diffusion Models and Controllable Editing | Shichao Dong et.al. | 2408.13788 | null |
| 2024-08-25 | Extremely Fine-Grained Visual Classification over Resembling Glyphs in the Wild | Fares Bougourzi et.al. | 2408.13774 | link |
| 2024-08-25 | SeeBelow: Sub-dermal 3D Reconstruction of Tumors with Surgical Robotic Palpation and Tactile Exploration | Raghava Uppuluri et.al. | 2408.13699 | null |
| 2024-08-21 | Exploring Scene Coherence for Semi-Supervised 3D Semantic Segmentation | Chuandong Liu et.al. | 2408.11280 | null |
| 2024-08-20 | OpenScan: A Benchmark for Generalized Open-Vocabulary 3D Scene Understanding | Youjun Zhao et.al. | 2408.11030 | link |
| 2024-08-19 | 3D-Aware Instance Segmentation and Tracking in Egocentric Videos | Yash Bhalgat et.al. | 2408.09860 | null |
| 2024-08-16 | Zero-Shot Dual-Path Integration Framework for Open-Vocabulary 3D Instance Segmentation | Tri Ton et.al. | 2408.08591 | null |
| 2024-08-15 | Towards Flexible Visual Relationship Segmentation | Fangrui Zhu et.al. | 2408.08305 | null |
| 2024-08-13 | SpectralGaussians: Semantic, spectral 3D Gaussian splatting for multi-spectral scene representation, visualization and analysis | Saptarshi Neil Sinha et.al. | 2408.06975 | null |
| 2024-08-13 | SceneGPT: A Language Model for 3D Scene Understanding | Shivam Chandhok et.al. | 2408.06926 | null |
| 2024-08-12 | HeLiMOS: A Dataset for Moving Object Segmentation in 3D Point Clouds From Heterogeneous LiDAR Sensors | Hyungtae Lim et.al. | 2408.06328 | null |
| 2024-08-11 | Decoder Pre-Training with only Text for Scene Text Recognition | Shuai Zhao et.al. | 2408.05706 | link |
| 2024-08-09 | Spherical World-Locking for Audio-Visual Localization in Egocentric Videos | Heeseung Yun et.al. | 2408.05364 | null |
| 2024-08-15 | DeepInteraction++: Multi-Modality Interaction for Autonomous Driving | Zeyu Yang et.al. | 2408.05075 | link |
| 2024-08-09 | Mesh-based Object Tracking for Dynamic Semantic 3D Scene Graphs via Ray Tracing | Lennart Niecksch et.al. | 2408.04979 | null |
| 2024-08-09 | Manipulable Semantic Components: a Computational Representation of Data Visualization Scenes | Zhicheng Liu et.al. | 2408.04798 | null |
| 2024-08-07 | Leveraging LLMs for Enhanced Open-Vocabulary 3D Scene Understanding in Autonomous Driving | Amirhosein Chahe et.al. | 2408.03516 | null |
| 2024-08-04 | LEGO: Self-Supervised Representation Learning for Scene Text Images | Yujin Ren et.al. | 2408.02036 | null |
| 2024-07-31 | RoadFormer+: Delivering RGB-X Scene Parsing through Scale-Aware Information Decoupling and Advanced Heterogeneous Feature Fusion | Jianxin Huang et.al. | 2407.21631 | null |
| 2024-07-31 | Voxel Scene Graph for Intracranial Hemorrhage | Antoine P. Sanner et.al. | 2407.21580 | null |
| 2024-07-31 | A Plug-and-Play Method for Rare Human-Object Interactions Detection by Bridging Domain Gap | Lijun Zhang et.al. | 2407.21438 | link |
| 2024-07-31 | DEF-oriCORN: efficient 3D scene understanding for robust language-directed manipulation without demonstrations | Dongwon Son et.al. | 2407.21267 | null |
| 2024-07-30 | From Feature Importance to Natural Language Explanations Using LLMs with RAG | Sule Tekkesinoglu et.al. | 2407.20990 | null |
| 2024-07-30 | Dynamic Scene Understanding through Object-Centric Voxelization and Neural Rendering | Yanpeng Zhao et.al. | 2407.20908 | link |
| 2024-07-30 | NIS-SLAM: Neural Implicit Semantic RGB-D SLAM for 3D Consistent Scene Understanding | Hongjia Zhai et.al. | 2407.20853 | null |
| 2024-07-29 | SANGRIA: Surgical Video Scene Graph Optimization for Surgical Workflow Prediction | Çağhan Köksal et.al. | 2407.20214 | null |
| 2024-07-29 | Rethinking RGB-D Fusion for Semantic Segmentation in Surgical Datasets | Muhammad Abdullah Jamal et.al. | 2407.19714 | null |
| 2024-07-28 | ASI-Seg: Audio-Driven Surgical Instrument Segmentation with Surgeon Intention Understanding | Zhen Chen et.al. | 2407.19435 | link |
| 2024-07-27 | GP-VLS: A general-purpose vision language model for surgery | Samuel Schmidgall et.al. | 2407.19305 | null |
| 2024-07-27 | Fine-Grained Scene Graph Generation via Sample-Level Bias Prediction | Yansheng Li et.al. | 2407.19259 | null |
| 2024-07-26 | BCTR: Bidirectional Conditioning Transformer for Scene Graph Generation | Peng Hao et.al. | 2407.18715 | null |
| 2024-07-26 | MOoSE: Multi-Orientation Sharing Experts for Open-set Scene Text Recognition | Chang Liu et.al. | 2407.18616 | link |
| 2024-07-26 | Answerability Fields: Answerable Location Estimation via Diffusion Models | Daichi Azuma et.al. | 2407.18497 | null |
| 2024-07-24 | 3D Question Answering for City Scene Understanding | Penglei Sun et.al. | 2407.17398 | null |
| 2024-07-23 | Augmented Efficiency: Reducing Memory Footprint and Accelerating Inference for 3D Semantic Segmentation through Hybrid Vision | Aditya Krishnan et.al. | 2407.16102 | null |
| 2024-07-25 | Semantic Diversity-aware Prototype-based Learning for Unbiased Scene Graph Generation | Jaehyeong Jeon et.al. | 2407.15396 | link |
| 2024-07-21 | VideoGameBunny: Towards vision assistants for video games | Mohammad Reza Taesiri et.al. | 2407.15295 | null |
| 2024-07-21 | Self-training Room Layout Estimation via Geometry-aware Ray-casting | Bolivar Solarte et.al. | 2407.15041 | null |
| 2024-07-19 | A New Lightweight Hybrid Graph Convolutional Neural Network – CNN Scheme for Scene Classification using Object Detection Inference | Ayman Beghdadi et.al. | 2407.14658 | null |
| 2024-07-19 | OpenSU3D: Open World 3D Scene Understanding using Foundation Models | Rafay Mohiuddin et.al. | 2407.14279 | null |
| 2024-07-19 | MC-PanDA: Mask Confidence for Panoptic Domain Adaptation | Ivan Martinović et.al. | 2407.14110 | link |
| 2024-07-19 | GaussianBeV: 3D Gaussian Representation meets Perception Models for BeV Segmentation | Florian Chabot et.al. | 2407.14108 | null |
| 2024-07-18 | Training-Free Model Merging for Multi-target Domain Adaptation | Wenyi Li et.al. | 2407.13771 | null |
| 2024-07-18 | General Geometry-aware Weakly Supervised 3D Object Detection | Guowen Zhang et.al. | 2407.13748 | link |
| 2024-07-18 | Open Vocabulary 3D Scene Understanding via Geometry Guided Self-Distillation | Pengfei Wang et.al. | 2407.13362 | null |
| 2024-07-17 | InfoNorm: Mutual Information Shaping of Normals for Sparse-View Reconstruction | Xulong Wang et.al. | 2407.12661 | link |
| 2024-07-17 | Out of Length Text Recognition with Sub-String Matching | Yongkun Du et.al. | 2407.12317 | link |
| 2024-07-17 | Dual-Hybrid Attention Network for Specular Highlight Removal | Xiaojiao Guo et.al. | 2407.12255 | null |
| 2024-07-16 | Disentangled Acoustic Fields For Multimodal Physical Scene Understanding | Jie Yin et.al. | 2407.11333 | null |
| 2024-07-15 | OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models | Zijian Zhou et.al. | 2407.11213 | null |
| 2024-07-15 | No Train, all Gain: Self-Supervised Gradients Improve Deep Frozen Representations | Walter Simoncini et.al. | 2407.10964 | link |
| 2024-07-18 | Benchmarking Vision Language Models for Cultural Understanding | Shravan Nayak et.al. | 2407.10920 | null |
| 2024-07-14 | Shape2Scene: 3D Scene Representation Learning Through Pre-training on Shape Data | Tuo Feng et.al. | 2407.10200 | link |
| 2024-07-13 | Dense Multimodal Alignment for Open-Vocabulary 3D Scene Understanding | Ruihuang Li et.al. | 2407.09781 | null |
| 2024-07-12 | A Fair Ranking and New Model for Panoptic Scene Graph Generation | Julian Lorenz et.al. | 2407.09216 | null |
| 2024-07-12 | From Easy to Hard: Learning Curricular Shape-aware Features for Robust Panoptic Scene Graph Generation | Hanrong Shi et.al. | 2407.09191 | null |
| 2024-07-11 | BLOS-BEV: Navigation Map Enhanced Lane Segmentation Network, Beyond Line of Sight | Hang Wu et.al. | 2407.08526 | null |
| 2024-07-10 | Pareto Low-Rank Adapters: Efficient Multi-Task Learning with Preferences | Nikolaos Dimitriadis et.al. | 2407.08056 | null |
| 2024-07-10 | Swiss DINO: Efficient and Versatile Vision Framework for On-device Personal Object Search | Kirill Paramonov et.al. | 2407.07541 | null |
| 2024-07-09 | Joint prototype and coefficient prediction for 3D instance segmentation | Remco Royen et.al. | 2407.06958 | null |
| 2024-07-09 | LVLM-empowered Multi-modal Representation Learning for Visual Place Recognition | Teng Wang et.al. | 2407.06730 | null |
| 2024-07-08 | Focus on the Whole Character: Discriminative Character Modeling for Scene Text Recognition | Bangbang Zhou et.al. | 2407.05562 | link |
| 2024-07-07 | Self-supervised Learning via Cluster Distance Prediction for Operating Room Context Awareness | Idris Hamoud et.al. | 2407.05448 | null |
| 2024-07-05 | Hybrid Primal Sketch: Combining Analogy, Qualitative Representations, and Computer Vision for Scene Understanding | Kenneth D. Forbus et.al. | 2407.04859 | null |
| 2024-07-03 | A Unified Framework for 3D Scene Understanding | Wei Xu et.al. | 2407.03263 | null |
| 2024-07-11 | Open Panoramic Segmentation | Junwei Zheng et.al. | 2407.02685 | link |
| 2024-07-02 | MTMamba: Enhancing Multi-Task Dense Scene Understanding by Mamba-Based Decoders | Baijiong Lin et.al. | 2407.02228 | link |
| 2024-07-02 | Multi-Grained Contrast for Data-Efficient Unsupervised Representation Learning | Chengchao Shen et.al. | 2407.02014 | link |
| 2024-07-01 | PanopticRecon: Leverage Open-vocabulary Instance Segmentation for Zero-shot Panoptic Reconstruction | Xuan Yu et.al. | 2407.01349 | null |
| 2024-06-30 | ESGNN: Towards Equivariant Scene Graph Neural Network for 3D Scene Understanding | Quang P. M. Pham et.al. | 2407.00609 | null |
| 2024-06-28 | EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting | Daiwei Zhang et.al. | 2406.19811 | null |
| 2024-07-01 | Mobile Robot Oriented Large-Scale Indoor Dataset for Dynamic Scene Understanding | Yifan Tang et.al. | 2406.19791 | null |
| 2024-06-28 | PPTFormer: Pseudo Multi-Perspective Transformer for UAV Segmentation | Deyi Ji et.al. | 2406.19632 | null |
| 2024-06-27 | Enhanced Data Transfer Cooperating with Artificial Triplets for Scene Graph Generation | KuanChao Chu et.al. | 2406.19316 | null |
| 2024-06-26 | 3D-MVP: 3D Multiview Pretraining for Robotic Manipulation | Shengyi Qian et.al. | 2406.18158 | null |
| 2024-06-24 | GPT-4V Explorations: Mining Autonomous Driving | Zixuan Li et.al. | 2406.16817 | null |
| 2024-06-25 | AudioBench: A Universal Benchmark for Audio Large Language Models | Bin Wang et.al. | 2406.16020 | link |
| 2024-06-20 | EvSegSNN: Neuromorphic Semantic Segmentation for Event Data | Dalia Hareb et.al. | 2406.14178 | null |
| 2024-06-19 | StableSemantics: A Synthetic Language-Vision Dataset of Semantic Representations in Naturalistic Images | Rushikesh Zawar et.al. | 2406.13735 | null |
| 2024-06-17 | DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features | Letian Wang et.al. | 2406.12095 | null |
| 2024-06-17 | Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding | Yunsong Wang et.al. | 2406.11283 | null |
| 2024-06-15 | PIG: Prompt Images Guidance for Night-Time Scene Parsing | Zhifeng Xie et.al. | 2406.10531 | link |
| 2024-06-14 | MapVision: CVPR 2024 Autonomous Grand Challenge Mapless Driving Tech Report | Zhongyu Yang et.al. | 2406.10125 | null |
| 2024-06-14 | SkySenseGPT: A Fine-Grained Instruction Tuning Dataset and Model for Remote Sensing Vision-Language Understanding | Junwei Luo et.al. | 2406.10100 | link |
| 2024-06-14 | A Two-Stage Masked Autoencoder Based Network for Indoor Depth Completion | Kailai Sun et.al. | 2406.09792 | link |
| 2024-06-13 | MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding | Fei Wang et.al. | 2406.09411 | null |
| 2024-06-13 | Scene Graph Generation in Large-Size VHR Satellite Imagery: A Large-Scale Dataset and A Context-Aware Approach | Yansheng Li et.al. | 2406.09410 | link |
| 2024-06-12 | Category-level Neural Field for Reconstruction of Partially Observed Objects in Indoor Environment | Taekbeom Lee et.al. | 2406.08176 | null |
| 2024-06-13 | A3VLM: Actionable Articulation-Aware Vision Language Model | Siyuan Huang et.al. | 2406.07549 | link |
| 2024-06-10 | ReCon1M:A Large-scale Benchmark Dataset for Relation Comprehension in Remote Sensing Imagery | Xian Sun et.al. | 2406.06028 | null |
| 2024-06-11 | LOP-Field: Brain-inspired Layout-Object-Position Fields for Robotic Scene Understanding | Jiawei Hou et.al. | 2406.05985 | null |
| 2024-06-08 | 1st Place Winner of the 2024 Pixel-level Video Understanding in the Wild (CVPR’24 PVUW) Challenge in Video Panoptic Segmentation and Best Long Video Consistency of Video Semantic Segmentation | Qingfeng Liu et.al. | 2406.05352 | null |
| 2024-06-06 | Semantic Similarity Score for Measuring Visual Similarity at Semantic Level | Senran Fan et.al. | 2406.03865 | null |
| 2024-06-04 | Radar Spectra-Language Model for Automotive Scene Parsing | Mariia Pushkareva et.al. | 2406.02158 | null |
| 2024-06-04 | Leveraging Predicate and Triplet Learning for Scene Graph Generation | Jiankai Li et.al. | 2406.02038 | link |
| 2024-06-04 | FastLGS: Speeding up Language Embedded Gaussians with Feature Grid Mapping | Yuzhou Ji et.al. | 2406.01916 | null |
| 2024-06-04 | PlanAgent: A Multi-modal Large Language Agent for Closed-loop Vehicle Motion Planning | Yupeng Zheng et.al. | 2406.01587 | null |
| 2024-06-03 | EAGLE: Efficient Adaptive Geometry-based Learning in Cross-view Understanding | Thanh-Dat Truong et.al. | 2406.01429 | null |
| 2024-06-03 | Object Aware Egocentric Online Action Detection | Joungbin An et.al. | 2406.01079 | null |
| 2024-06-03 | CYCLO: Cyclic Graph Transformer Approach to Multi-Object Relationship Modeling in Aerial Videos | Trong-Thuan Nguyen et.al. | 2406.01029 | null |
| 2024-06-02 | Compositional 4D Dynamic Scenes Understanding with Physics Priors for Video Question Answering | Xingrui Wang et.al. | 2406.00622 | null |
| 2024-06-02 | Semi-supervised Video Semantic Segmentation Using Unreliable Pseudo Labels for PVUW2024 | Biao Wu et.al. | 2406.00587 | null |
| 2024-05-30 | Learning 3D Robotics Perception using Inductive Priors | Muhammad Zubair Irshad et.al. | 2405.20364 | null |
| 2024-05-30 | SAM-E: Leveraging Visual Foundation Model with Sequence Imitation for Embodied Manipulation | Junjie Zhang et.al. | 2405.19586 | null |
| 2024-05-29 | Kestrel: Point Grounding Multimodal LLM for Part-Aware 3D Vision-Language Understanding | Junjie Fei et.al. | 2405.18937 | null |
| 2024-05-27 | GOI: Find 3D Gaussians of Interest with an Optimizable Open-vocabulary Semantic-space Hyperplane | Yansong Qu et.al. | 2405.17596 | null |
| 2024-05-27 | OED: Towards One-stage End-to-End Dynamic Scene Graph Generation | Guan Wang et.al. | 2405.16925 | link |
| 2024-05-25 | Real-Time Scene Graph Generation | Maëlic Neau et.al. | 2405.16116 | link |
| 2024-05-24 | Open-Vocabulary SAM3D: Understand Any 3D Scene | Hanchen Tai et.al. | 2405.15580 | null |
| 2024-05-23 | Generative Camera Dolly: Extreme Monocular Dynamic Novel View Synthesis | Basile Van Hoorick et.al. | 2405.14868 | null |
| 2024-05-23 | CoPeD-Advancing Multi-Robot Collaborative Perception: A Comprehensive Dataset in Real-World Environments | Yang Zhou et.al. | 2405.14731 | link |
| 2024-05-23 | Efficient Robot Learning for Perception and Mapping | Niclas Vödisch et.al. | 2405.14688 | null |
| 2024-05-24 | Transformers for Image-Goal Navigation | Nikhilanj Pelluri et.al. | 2405.14128 | null |
| 2024-05-22 | TS40K: a 3D Point Cloud Dataset of Rural Terrain and Electrical Transmission System | Diogo Lavado et.al. | 2405.13989 | null |
| 2024-05-22 | A General Framework for Jersey Number Recognition in Sports Video | Maria Koshkina et.al. | 2405.13896 | link |
| 2024-05-22 | GameVLM: A Decision-making Framework for Robotic Task Planning Based on Visual Language Models and Zero-sum Games | Aoran Mei et.al. | 2405.13751 | null |
| 2024-05-21 | Anticipating Object State Changes | Victoria Manousaki et.al. | 2405.12789 | null |
| 2024-05-21 | Scene Graph Generation Strategy with Co-occurrence Knowledge and Learnable Term Frequency | Hyeongjin Kim et.al. | 2405.12648 | null |
| 2024-05-20 | MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering | Jingqun Tang et.al. | 2405.11985 | null |
| 2024-05-19 | The First Swahili Language Scene Text Detection and Recognition Dataset | Fadila Wendigoundi Douamba et.al. | 2405.11437 | link |
| 2024-05-16 | Grounded 3D-LLM with Referent Tokens | Yilun Chen et.al. | 2405.10370 | link |
| 2024-05-16 | 4D Panoptic Scene Graph Generation | Jingkang Yang et.al. | 2405.10305 | link |
| 2024-05-16 | When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models | Xianzheng Ma et.al. | 2405.10255 | null |
| 2024-05-16 | A Preprocessing and Postprocessing Voxel-based Method for LiDAR Semantic Segmentation Improvement in Long Distance | Andrea Matteazzi et.al. | 2405.10046 | null |
| 2024-05-15 | BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation | Yunhao Ge et.al. | 2405.09546 | null |
| 2024-05-15 | HAAP: Vision-context Hierarchical Attention Autoregressive with Adaptive Permutation for Scene Text Recognition | Honghui Chen et.al. | 2405.09125 | null |
| 2024-05-15 | 3D Shape Augmentation with Content-Aware Shape Resizing | Mingxiang Chen et.al. | 2405.09050 | null |
| 2024-05-09 | Pre-trained Text-to-Image Diffusion Models Are Versatile Representation Learners for Control | Gunshi Gupta et.al. | 2405.05852 | link |
| 2024-05-11 | Self-Supervised Pre-training with Symmetric Superimposition Modeling for Scene Text Recognition | Zuan Gao et.al. | 2405.05841 | null |
| 2024-05-09 | Benchmarking Neural Radiance Fields for Autonomous Robots: An Overview | Yuhang Ming et.al. | 2405.05526 | null |
| 2024-05-09 | DTCLMapper: Dual Temporal Consistent Learning for Vectorized HD Map Construction | Siyu Li et.al. | 2405.05518 | null |
| 2024-05-08 | OpenESS: Event-based Semantic Scene Understanding with Open Vocabularies | Lingdong Kong et.al. | 2405.05259 | link |
| 2024-05-08 | Multi-Modal Data-Efficient 3D Scene Understanding for Autonomous Driving | Lingdong Kong et.al. | 2405.05258 | link |
| 2024-05-07 | DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving | Chen Min et.al. | 2405.04390 | null |
| 2024-05-07 | Choose What You Need: Disentangled Representation Learning for Scene Text Recognition, Removal and Editing | Boqiang Zhang et.al. | 2405.04377 | null |
| 2024-05-06 | An Empty Room is All We Want: Automatic Defurnishing of Indoor Panoramas | Mira Slavcheva et.al. | 2405.03682 | null |
| 2024-05-04 | Few-Shot Fruit Segmentation via Transfer Learning | Jordan A. James et.al. | 2405.02556 | link |
| 2024-04-29 | Q-GroundCAM: Quantifying Grounding in Vision Language Models via GradCAM | Navid Rajabi et.al. | 2404.19128 | null |
| 2024-04-29 | Compositional Factorization of Visual Scenes with Convolutional Sparse Coding and Resonator Networks | Christopher J. Kymn et.al. | 2404.19126 | null |
| 2024-04-24 | Seeing Beyond Classes: Zero-Shot Grounded Situation Recognition via Language Explainer | Jiaming Lei et.al. | 2404.15785 | null |
| 2024-04-22 | CloudFort: Enhancing Robustness of 3D Point Cloud Classification Against Backdoor Attacks via Spatial Partitioning and Ensemble Prediction | Wenhao Lan et.al. | 2404.14042 | null |
| 2024-04-22 | On Support Relations Inference and Scene Hierarchy Graph Construction from Point Cloud in Clustered Environments | Gang Ma et.al. | 2404.13842 | null |
| 2024-04-29 | Clio: Real-time Task-Driven Open-Set 3D Scene Graphs | Dominic Maggio et.al. | 2404.13696 | link |
| 2024-04-19 | BACS: Background Aware Continual Semantic Segmentation | Mostafa ElAraby et.al. | 2404.13148 | link |
| 2024-04-19 | Unified Scene Representation and Reconstruction for 3D Large Language Models | Tao Chu et.al. | 2404.13044 | null |
| 2024-04-18 | SPIdepth: Strengthened Pose Information for Self-supervised Monocular Depth Estimation | Mykola Lavreniuk et.al. | 2404.12501 | null |
| 2024-04-19 | AccidentBlip2: Accident Detection With Multi-View MotionBlip2 | Yihua Shao et.al. | 2404.12149 | link |
| 2024-04-17 | Multimodal 3D Object Detection on Unseen Domains | Deepti Hegde et.al. | 2404.11764 | null |
| 2024-04-16 | ECLAIR: A High-Fidelity Aerial LiDAR Dataset for Semantic Segmentation | Iaroslav Melekhov et.al. | 2404.10699 | link |
| 2024-04-16 | PyTorchGeoNodes: Enabling Differentiable Shape Programs for 3D Shape Reconstruction | Sinisa Stekovic et.al. | 2404.10620 | null |
| 2024-04-16 | PreGSU-A Generalized Traffic Scene Understanding Model for Autonomous Driving based on Pre-trained Graph Attention Network | Yuning Wang et.al. | 2404.10263 | null |
| 2024-04-15 | No More Ambiguity in 360° Room Layout via Bi-Layout Estimation | Yu-Ju Tsai et.al. | 2404.09993 | null |
| 2024-04-15 | A Review and Efficient Implementation of Scene Graph Generation Metrics | Julian Lorenz et.al. | 2404.09616 | null |
| 2024-04-14 | Tri-modal Confluence with Temporal Dynamics for Scene Graph Generation in Operating Rooms | Diandian Guo et.al. | 2404.09231 | null |
| 2024-04-11 | Gaga: Group Any Gaussians via 3D-aware Memory Bank | Weijie Lyu et.al. | 2404.07977 | null |
| 2024-04-11 | AUG: A New Dataset and An Efficient Model for Aerial Image Urban Scene Graph Generation | Yansheng Li et.al. | 2404.07788 | null |
| 2024-04-11 | Depth Estimation using Weighted-loss and Transfer Learning | Muhammad Adeel Hafeez et.al. | 2404.07686 | null |
| 2024-04-11 | Mitigating Object Dependencies: Improving Point Cloud Self-Supervised Learning through Object Exchange | Yanhao Wu et.al. | 2404.07504 | null |
| 2024-04-10 | Incorporating Explanations into Human-Machine Interfaces for Trust and Situation Awareness in Autonomous Vehicles | Shahin Atakishiyev et.al. | 2404.07383 | null |
| 2024-04-10 | ORacle: Large Vision-Language Models for Knowledge-Guided Holistic OR Domain Modeling | Ege Özsoy et.al. | 2404.07031 | null |
| 2024-04-10 | O2V-Mapping: Online Open-Vocabulary Mapping with Neural Implicit Representation | Muer Tie et.al. | 2404.06836 | null |
| 2024-04-09 | QueSTMaps: Queryable Semantic Topological Maps for 3D Scene Understanding | Yash Mehan et.al. | 2404.06442 | null |
| 2024-04-09 | DaF-BEVSeg: Distortion-aware Fisheye Camera based Bird’s Eye View Segmentation with Occlusion Reasoning | Senthil Yogamani et.al. | 2404.06352 | null |
| 2024-04-09 | JSTR: Judgment Improves Scene Text Recognition | Masato Fujitake et.al. | 2404.05967 | null |
| 2024-04-06 | Panoptic Perception: A Novel Task and Fine-grained Dataset for Universal Remote Sensing Image Interpretation | Danpei Zhao et.al. | 2404.04608 | null |
| 2024-04-06 | SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos | Tao Wu et.al. | 2404.04565 | null |
| 2024-04-05 | Sigma: Siamese Mamba Network for Multi-Modal Semantic Segmentation | Zifu Wan et.al. | 2404.04256 | link |
| 2024-04-06 | HAPNet: Toward Superior RGB-Thermal Scene Parsing via Hybrid, Asymmetric, and Progressive Heterogeneous Feature Fusion | Jiahang Li et.al. | 2404.03527 | link |
| 2024-04-04 | You Only Scan Once: A Dynamic Scene Reconstruction Pipeline for 6-DoF Robotic Grasping of Novel Objects | Lei Zhou et.al. | 2404.03462 | null |
| 2024-04-03 | Weakly-Supervised 3D Scene Graph Generation via Visual-Linguistic Assisted Pseudo-labeling | Xu Wang et.al. | 2404.02527 | null |
| 2024-04-05 | EGTR: Extracting Graph from Transformer for Scene Graph Generation | Jinbae Im et.al. | 2404.02072 | link |
| 2024-04-01 | NeRF-MAE : Masked AutoEncoders for Self Supervised 3D representation Learning for Neural Radiance Fields | Muhammad Zubair Irshad et.al. | 2404.01300 | null |
| 2024-04-08 | 360+x: A Panoptic Multi-modal Scene Understanding Dataset | Hao Chen et.al. | 2404.00989 | null |
| 2024-04-01 | Improving Visual Recognition with Hyperbolical Visual Hierarchy Mapping | Hyeongjun Kwon et.al. | 2404.00974 | link |
| 2024-04-01 | GOV-NeSF: Generalizable Open-Vocabulary Neural Semantic Fields | Yunsong Wang et.al. | 2404.00931 | link |
| 2024-04-01 | MM3DGS SLAM: Multi-modal 3D Gaussian Splatting for SLAM Using Vision, Depth, and Inertial Measurements | Lisong C. Sun et.al. | 2404.00923 | null |
| 2024-04-01 | From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models | Rongjie Li et.al. | 2404.00906 | null |
| 2024-03-31 | Adapting to Length Shift: FlexiLength Network for Trajectory Prediction | Yi Xu et.al. | 2404.00742 | null |
| 2024-03-31 | Neural Radiance Field-based Visual Rendering: A Comprehensive Review | Mingyuan Yao et.al. | 2404.00714 | null |
| 2024-03-29 | VSRD: Instance-Aware Volumetric Silhouette Rendering for Weakly Supervised 3D Object Detection | Zihua Liu et.al. | 2404.00149 | null |
| 2024-03-29 | HGS-Mapping: Online Dense Mapping Using Hybrid Gaussian Representation in Urban Scenes | Ke Wu et.al. | 2403.20159 | null |
| 2024-04-01 | Efficient 3D Instance Mapping and Localization with Neural Fields | George Tang et.al. | 2403.19797 | null |
| 2024-03-27 | Object Pose Estimation via the Aggregation of Diffusion Features | Tianfu Wang et.al. | 2403.18791 | link |
| 2024-03-25 | Calib3D: Calibrating Model Preferences for Reliable 3D Scene Understanding | Lingdong Kong et.al. | 2403.17010 | link |
| 2024-03-25 | Towards Trustworthy Automated Driving through Qualitative Scene Understanding and Explanations | Nassim Belmecheri et.al. | 2403.16908 | null |
| 2024-03-25 | DOCTR: Disentangled Object-Centric Transformer for Point Scene Understanding | Xiaoxuan Yu et.al. | 2403.16431 | link |
| 2024-03-24 | AutoInst: Automatic Instance-Based Segmentation of LiDAR 3D Scans | Cedric Perauer et.al. | 2403.16318 | null |
| 2024-03-24 | Improving Scene Graph Generation with Relation Words’ Debiasing in Vision-Language Models | Yuxuan Wang et.al. | 2403.16184 | null |
| 2024-03-24 | Multi-Task Learning with Multi-Task Optimization | Lu Bai et.al. | 2403.16162 | null |
| 2024-03-24 | Semantic Is Enough: Only Semantic Information For NeRF Reconstruction | Ruibo Wang et.al. | 2403.16043 | null |
| 2024-03-22 | Semantic Gaussians: Open-Vocabulary Scene Understanding with 3D Gaussian Splatting | Jun Guo et.al. | 2403.15624 | null |
| 2024-03-22 | DiffusionMTL: Learning Multi-Task Denoising Diffusion Model from Partially Annotated Data | Hanrong Ye et.al. | 2403.15389 | null |
| 2024-03-21 | DSGG: Dense Relation Transformer for an End-to-end Scene Graph Generation | Zeeshan Hayder et.al. | 2403.14886 | null |
| 2024-03-21 | Evaluating Panoramic 3D Estimation in Indoor Lighting Analysis | Zining Cheng et.al. | 2403.14836 | null |
| 2024-03-21 | SurroundSDF: Implicit 3D Scene Understanding Based on Signed Distance Field | Lizhe Liu et.al. | 2403.14366 | null |
| 2024-03-21 | Exosense: A Vision-Centric Scene Understanding System For Safe Exoskeleton Navigation | Jianeng Wang et.al. | 2403.14320 | null |
| 2024-03-21 | Volumetric Environment Representation for Vision-Language Navigation | Rui Liu et.al. | 2403.14158 | null |
| 2024-03-21 | 3D Object Detection from Point Cloud via Voting Step Diffusion | Haoran Hou et.al. | 2403.14133 | null |
| 2024-03-20 | Efficient scene text image super-resolution with semantic guidance | LeoWu TomyEnrique et.al. | 2403.13330 | link |
| 2024-03-19 | SceneScript: Reconstructing Scenes With An Autoregressive Structured Language Model | Armen Avetisyan et.al. | 2403.13064 | null |
| 2024-03-19 | HUGS: Holistic Urban 3D Scene Understanding via Gaussian Splatting | Hongyu Zhou et.al. | 2403.12722 | null |
| 2024-03-19 | M2DA: Multi-Modal Fusion Transformer Incorporating Driver Attention for Autonomous Driving | Dongyang Xu et.al. | 2403.12552 | null |
| 2024-03-19 | Multi-Object RANSAC: Efficient Plane Clustering Method in a Clutter | Seunghyeon Lim et.al. | 2403.12449 | null |
| 2024-03-19 | Geometric Constraints in Deep Learning Frameworks: A Survey | Vibhas K Vats et.al. | 2403.12431 | null |
| 2024-03-18 | R3DS: Reality-linked 3D Scenes for Panoramic Scene Understanding | Qirui Wu et.al. | 2403.12301 | null |
| 2024-03-18 | HiKER-SGG: Hierarchical Knowledge Enhanced Robust Scene Graph Generation | Ce Zhang et.al. | 2403.12033 | link |
| 2024-03-18 | Agent3D-Zero: An Agent for Zero-shot 3D Understanding | Sha Zhang et.al. | 2403.11835 | null |
| 2024-03-18 | OpenOcc: Open Vocabulary 3D Scene Reconstruction via Occupancy Representation | Haochen Jiang et.al. | 2403.11796 | null |
| 2024-03-19 | Urban Scene Diffusion through Semantic Occupancy Map | Junge Zhang et.al. | 2403.11697 | null |
| 2024-03-18 | Hierarchical Spatial Proximity Reasoning for Vision-and-Language Navigation | Ming Xu et.al. | 2403.11541 | link |
| 2024-03-18 | Beyond Uncertainty: Risk-Aware Active View Acquisition for Safe Robot Navigation and 3D Scene Understanding with FisherRF | Guangyi Liu et.al. | 2403.11396 | null |
| 2024-03-17 | Omni-Recon: Towards General-Purpose Neural Radiance Fields for Versatile 3D Applications | Yonggan Fu et.al. | 2403.11131 | null |
| 2024-03-16 | N2F2: Hierarchical Scene Understanding with Nested Neural Feature Fields | Yash Bhalgat et.al. | 2403.10997 | null |
| 2024-03-16 | Segment Any Object Model (SAOM): Real-to-Simulation Fine-Tuning Strategy for Multi-Class Multi-Instance Segmentation | Mariia Khan et.al. | 2403.10780 | null |
| 2024-03-15 | Robust Shape Fitting for 3D Scene Abstraction | Florian Kluger et.al. | 2403.10452 | link |
| 2024-03-15 | Do Visual-Language Maps Capture Latent Semantics? | Matti Pekkanen et.al. | 2403.10117 | null |
| 2024-03-15 | Enhancing Human-Centered Dynamic Scene Understanding via Multiple LLMs Collaborated Reasoning | Hang Zhang et.al. | 2403.10107 | null |
| 2024-03-14 | GroupContrast: Semantic-aware Self-supervised Representation Learning for 3D Understanding | Chengyao Wang et.al. | 2403.09639 | link |
| 2024-03-12 | IndicSTR12: A Dataset for Indic Scene Text Recognition | Harsh Lunia et.al. | 2403.08007 | null |
| 2024-03-12 | Efficient Global Navigational Planning in 3D Structures based on Point Cloud Tomography | Bowen Yang et.al. | 2403.07631 | link |
| 2024-03-12 | Open-Vocabulary Scene Text Recognition via Pseudo-Image Labeling and Margin Loss | Xuhua Ren et.al. | 2403.07518 | null |
| 2024-03-12 | MoAI: Mixture of All Intelligence for Large Language and Vision Models | Byung-Kwan Lee et.al. | 2403.07508 | link |
| 2024-03-11 | Mapping High-level Semantic Regions in Indoor Environments without Object Recognition | Roberto Bigazzi et.al. | 2403.07076 | null |
| 2024-03-11 | Optimizing Latent Graph Representations of Surgical Scenes for Zero-Shot Domain Transfer | Siddhant Satyanaik et.al. | 2403.06953 | null |
| 2024-03-08 | Stealing Stable Diffusion Prior for Robust Monocular Depth Estimation | Yifan Mao et.al. | 2403.05056 | link |
| 2024-03-07 | Towards Scene Graph Anticipation | Rohith Peddi et.al. | 2403.04899 | null |
| 2024-03-07 | Embodied Understanding of Driving Scenarios | Yunsong Zhou et.al. | 2403.04593 | link |
| 2024-03-07 | Out of the Room: Generalizing Event-Based Dynamic Motion Segmentation for Complex Scenes | Stamatios Georgoulis et.al. | 2403.04562 | null |
| 2024-03-06 | GSNeRF: Generalizable Semantic Neural Radiance Fields with Enhanced 3D Scene Understanding | Zi-Ting Chou et.al. | 2403.03608 | null |
| 2024-03-05 | OORD: The Oxford Offroad Radar Dataset | Matthew Gadd et.al. | 2403.02845 | link |
| 2024-03-05 | HUNTER: Unsupervised Human-centric 3D Detection via Transferring Knowledge from Synthetic Instances to Real Scenes | Yichen Yao et.al. | 2403.02769 | null |
| 2024-02-29 | FusionVision: A comprehensive approach of 3D object reconstruction and segmentation from RGB-D cameras using YOLO and fast segment anything | Safouane El Ghazouali et.al. | 2403.00175 | link |
| 2024-02-29 | One model to use them all: Training a segmentation model with complementary datasets | Alexander C. Jenke et.al. | 2402.19340 | link |
| 2024-02-29 | Feature boosting with efficient attention for scene parsing | Vivek Singh et.al. | 2402.19250 | null |
| 2024-02-29 | PCDepth: Pattern-based Complementary Learning for Monocular Depth Estimation by Best of Both Worlds | Haotian Liu et.al. | 2402.18925 | null |
| 2024-02-28 | Windowed-FourierMixer: Enhancing Clutter-Free Room Modeling with Fourier Transform | Bruno Henriques et.al. | 2402.18287 | null |
| 2024-02-27 | LiveHPS: LiDAR-based Scene-level Human Pose and Shape Estimation in Free Environment | Yiming Ren et.al. | 2402.17171 | null |
| 2024-02-27 | Efficiently Leveraging Linguistic Priors for Scene Text Spotting | Nguyen Nguyen et.al. | 2402.17134 | null |
| 2024-02-26 | DreamUp3D: Object-Centric Generative Models for Single-View 3D Scene Understanding and Real-to-Sim Transfer | Yizhe Wu et.al. | 2402.16308 | null |
| 2024-02-24 | Sequential Visual and Semantic Consistency for Semi-supervised Text Recognition | Mingkun Yang et.al. | 2402.15806 | null |
| 2024-02-23 | OpenSUN3D: 1st Workshop Challenge on Open-Vocabulary 3D Scene Understanding | Francis Engelmann et.al. | 2402.15321 | null |
| 2024-02-22 | S^2Former-OR: Single-Stage Bimodal Transformer for Scene Graph Generation in OR | Jialun Pei et.al. | 2402.14461 | null |
| 2024-02-22 | Swin3D++: Effective Multi-Source Pretraining for 3D Indoor Scene Understanding | Yu-Qi Yang et.al. | 2402.14215 | link |
| 2024-02-21 | Class-Aware Mask-Guided Feature Refinement for Scene Text Recognition | Mingkun Yang et.al. | 2402.13643 | link |
| 2024-02-25 | DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models | Xiaoyu Tian et.al. | 2402.12289 | null |
Depth Estimation
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2025-12-22 | CoDrone: Autonomous Drone Navigation Assisted by Edge and Cloud Foundation Models | Pengyu Chen et.al. | 2512.19083 | null |
| 2025-12-22 | CETCAM: Camera-Controllable Video Generation via Consistent and Extensible Tokenization | Zelin Zhao et.al. | 2512.19020 | null |
| 2025-12-21 | A Study of Finetuning Video Transformers for Multi-view Geometry Tasks | Huimin Wu et.al. | 2512.18684 | null |
| 2025-12-20 | EndoStreamDepth: Temporally Consistent Monocular Depth Estimation for Endoscopic Video Streams | Hao Li et.al. | 2512.18159 | null |
| 2025-12-17 | A Modular Framework for Single-View 3D Reconstruction of Indoor Environments | Yuxiao Li et.al. | 2512.17955 | null |
| 2025-12-19 | Re-Depth Anything: Test-Time Depth Refinement via Self-Supervised Re-lighting | Ananta R. Bhattarai et.al. | 2512.17908 | null |
| 2025-12-19 | Long-Range depth estimation using learning based Hybrid Distortion Model for CCTV cameras | Ami Pandat et.al. | 2512.17784 | null |
| 2025-12-19 | SAVeD: A First-Person Social Media Video Dataset for ADAS-equipped vehicle Near-Miss and Crash Event Analyses | Shaoyan Zhai et.al. | 2512.17724 | null |
| 2025-12-18 | Infinite-Homography as Robust Conditioning for Camera-Controlled Video Generation | Min-Jung Kim et.al. | 2512.17040 | null |
| 2025-12-18 | Depth Any Panoramas: A Foundation Model for Panoramic Depth Estimation | Xin Lin et.al. | 2512.16913 | null |
| 2025-12-18 | N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models | Yuxin Wang et.al. | 2512.16561 | null |
| 2025-12-17 | In Pursuit of Pixel Supervision for Visual Pre-training | Lihe Yang et.al. | 2512.15715 | null |
| 2025-12-16 | DASP: Self-supervised Nighttime Monocular Depth Estimation with Domain Adaptation of Spatiotemporal Priors | Yiheng Huang et.al. | 2512.14536 | null |
| 2025-12-16 | Elastic3D: Controllable Stereo Video Conversion with Guided Latent Decoding | Nando Metzger et.al. | 2512.14236 | null |
| 2025-12-16 | Robust Single-shot Structured Light 3D Imaging via Neural Feature Decoding | Jiaheng Li et.al. | 2512.14028 | null |
| 2025-12-16 | Deep Learning Perspective of Scene Understanding in Autonomous Robots | Afia Maham et.al. | 2512.14020 | null |
| 2025-12-15 | StarryGazer: Leveraging Monocular Depth Estimation Models for Domain-Agnostic Single Depth Image Completion | Sangmin Hong et.al. | 2512.13147 | null |
| 2025-12-13 | BokehDepth: Enhancing Monocular Depth Estimation through Bokeh Generation | Hangwei Zhang et.al. | 2512.12425 | null |
| 2025-12-12 | ProbeMDE: Uncertainty-Guided Active Proprioception for Monocular Depth Estimation in Surgical Robotics | Britton Jordan et.al. | 2512.11773 | null |
| 2025-12-11 | Empowering Dynamic Urban Navigation with Stereo and Mid-Level Vision | Wentao Zhou et.al. | 2512.10956 | null |
| 2025-12-11 | Video Depth Propagation | Luigi Piccinelli et.al. | 2512.10725 | null |
| 2025-12-11 | SpaceDrive: Infusing Spatial Awareness into VLM-based Autonomous Driving | Peizheng Li et.al. | 2512.10719 | null |
| 2025-12-11 | Robust Shape from Focus via Multiscale Directional Dilated Laplacian and Recurrent Network | Khurram Ashfaq et.al. | 2512.10498 | null |
| 2025-12-09 | Scale-invariant and View-relational Representation Learning for Full Surround Monocular Depth | Kyumin Hwang et.al. | 2512.08700 | null |
| 2025-12-09 | Development & first Performance evaluation of multi-element monolithic HPGe detector for X-ray spectroscopy | N. Goyal et.al. | 2512.08389 | null |
| 2025-12-09 | Accuracy Does Not Guarantee Human-Likeness in Monocular Depth Estimators | Yuki Kubota et.al. | 2512.08163 | null |
| 2025-12-08 | More than Segmentation: Benchmarking SAM 3 for Segmentation, 3D Perception, and Reconstruction in Robotic Surgery | Wenzhen Dong et.al. | 2512.07596 | null |
| 2025-12-07 | CoT4Det: A Chain-of-Thought Framework for Perception-Oriented Vision-Language Tasks | Yu Qi et.al. | 2512.06663 | null |
| 2025-12-06 | HuPrior3R: Incorporating Human Priors for Better 3D Dynamic Reconstruction from Monocular Videos | Weitao Xiong et.al. | 2512.06368 | null |
| 2025-12-05 | See in Depth: Training-Free Surgical Scene Segmentation with Monocular Depth Priors | Kunyi Yang et.al. | 2512.05529 | null |
| 2025-12-05 | YOLO and SGBM Integration for Autonomous Tree Branch Detection and Depth Estimation in Radiata Pine Pruning Applications | Yida Lin et.al. | 2512.05412 | null |
| 2025-12-03 | Gamma-from-Mono: Road-Relative, Metric, Self-Supervised Monocular Geometry for Vehicular Applications | Gasser Elazab et.al. | 2512.04303 | null |
| 2025-12-03 | Unique Lives, Shared World: Learning from Single-Life Videos | Tengda Han et.al. | 2512.04085 | null |
| 2025-12-03 | SpaceTools: Tool-Augmented Spatial Reasoning via Double Interactive RL | Siyi Chen et.al. | 2512.04069 | null |
| 2025-12-03 | MDE-AgriVLN: Agricultural Vision-and-Language Navigation with Monocular Depth Estimation | Xiaobei Zhao et.al. | 2512.03958 | null |
| 2025-12-03 | Generalization Evaluation of Deep Stereo Matching Methods for UAV-Based Forestry Applications | Yida Lin et.al. | 2512.03427 | null |
| 2025-12-02 | DynamicVerse: A Physically-Aware Multimodal Framework for 4D World Modeling | Kairun Wen et.al. | 2512.03000 | null |
| 2025-12-02 | BEVDilation: LiDAR-Centric Multi-Modal Fusion for 3D Object Detection | Guowen Zhang et.al. | 2512.02972 | null |
| 2025-12-01 | DepthScape: Authoring 2.5D Designs via Depth Estimation, Semantic Understanding, and Geometry Extraction | Xia Su et.al. | 2512.02263 | null |
| 2025-12-01 | BlinkBud: Detecting Hazards from Behind via Sampled Monocular 3D Detection on a Single Earbud | Yunzhe Li et.al. | 2512.01366 | null |
| 2025-11-30 | Lotus-2: Advancing Geometric Dense Prediction with Powerful Image Generative Model | Jing He et.al. | 2512.01030 | null |
| 2025-11-30 | EAG3R: Event-Augmented 3D Geometry Estimation for Dynamic and Extreme-Lighting Scenes | Xiaoshan Wu et.al. | 2512.00771 | null |
| 2025-11-26 | Multi-modal On-Device Learning for Monocular Depth Estimation on Ultra-low-power MCUs | Davide Nadalini et.al. | 2512.00086 | null |
| 2025-11-28 | Geometry-Consistent 4D Gaussian Splatting for Sparse-Input Dynamic View Synthesis | Yiwei Li et.al. | 2511.23044 | null |
| 2025-11-27 | Advances in electromagnetic techniques for subsurface infrastructure detection: A comprehensive review of methods, challenges, and innovations | Arasti Afrasiabi et.al. | 2511.22673 | null |
| 2025-11-27 | IE-SRGS: An Internal-External Knowledge Fusion Framework for High-Fidelity 3D Gaussian Splatting Super-Resolution | Xiang Feng et.al. | 2511.22233 | null |
| 2025-11-25 | MODEST: Multi-Optics Depth-of-Field Stereo Dataset | Nisarg K. Trivedi et.al. | 2511.20853 | null |
| 2025-11-25 | 3D-Aware Multi-Task Learning with Cross-View Correlations for Dense Scene Understanding | Xiaoye Wang et.al. | 2511.20646 | null |
| 2025-11-25 | DeLightMono: Enhancing Self-Supervised Monocular Depth Estimation in Endoscopy by Decoupling Uneven Illumination | Mingyang Ou et.al. | 2511.20058 | null |
| 2025-11-24 | Real-Time Object Tracking with On-Device Deep Learning for Adaptive Beamforming in Dynamic Acoustic Environments | Jorge Ortigoso-Narro et.al. | 2511.19396 | null |
| 2025-11-24 | DensifyBeforehand: LiDAR-assisted Content-aware Densification for Efficient and Quality 3D Gaussian Splatting | Phurtivilai Patt et.al. | 2511.19294 | null |
| 2025-11-24 | Understanding Task Transfer in Vision-Language Models | Bhuvan Sachdeva et.al. | 2511.18787 | null |
| 2025-11-22 | AdaPerceiver: Transformers with Adaptive Width, Depth, and Tokens | Purvish Jajal et.al. | 2511.18105 | null |
| 2025-11-21 | Vision-Guided Optic Flow Navigation for Small Lunar Missions | Sean Cowan et.al. | 2511.17720 | null |
| 2025-11-21 | DepthFocus: Controllable Depth Estimation for See-Through Scenes | Junhong Min et.al. | 2511.16993 | null |
| 2025-11-20 | Dexterity from Smart Lenses: Multi-Fingered Robot Manipulation with In-the-Wild Human Demonstrations | Irmak Guzey et.al. | 2511.16661 | null |
| 2025-11-20 | Lite Any Stereo: Efficient Zero-Shot Stereo Matching | Junpeng Jing et.al. | 2511.16555 | null |
| 2025-11-20 | CylinderDepth: Cylindrical Spatial Attention for Multi-View Consistent Self-Supervised Surround Depth Estimation | Samer Abualhanud et.al. | 2511.16428 | null |
| 2025-11-20 | Upsample Anything: A Simple and Hard to Beat Baseline for Feature Upsampling | Minseok Seo et.al. | 2511.16301 | null |
| 2025-11-19 | Learning Depth from Past Selves: Self-Evolution Contrast for Robust Depth Estimation | Jing Cao et.al. | 2511.15167 | null |
| 2025-11-18 | EGSA-PT: Edge-Guided Spatial Attention with Progressive Training for Monocular Depth Estimation and Segmentation of Transparent Objects | Gbenga Omotara et.al. | 2511.14970 | null |
| 2025-11-18 | Cheating Stereo Matching in Full-scale: Physical Adversarial Attack against Binocular Depth Estimation in Autonomous Driving | Kangqiao Zhao et.al. | 2511.14386 | null |
| 2025-11-18 | Enhancing Generalization of Depth Estimation Foundation Model via Weakly-Supervised Adaptation with Regularization | Yan Huang et.al. | 2511.14238 | null |
| 2025-11-18 | RTS-Mono: A Real-Time Self-Supervised Monocular Depth Estimation Method for Real-World Deployment | Zeyu Cheng et.al. | 2511.14107 | null |
| 2025-11-17 | Towards Metric-Aware Multi-Person Mesh Recovery by Jointly Optimizing Human Crowd in Camera Space | Kaiwen Wang et.al. | 2511.13282 | null |
| 2025-11-17 | Difficulty-Aware Label-Guided Denoising for Monocular 3D Object Detection | Soyul Lee et.al. | 2511.13195 | null |
| 2025-11-13 | Depth Anything 3: Recovering the Visual Space from Any Views | Haotong Lin et.al. | 2511.10647 | null |
| 2025-11-13 | OmniVGGT: Omni-Modality Driven Visual Geometry Grounded Transformer | Haosong Peng et.al. | 2511.10560 | null |
| 2025-11-13 | Depth-Consistent 3D Gaussian Splatting via Physical Defocus Modeling and Multi-View Geometric Supervision | Yu Deng et.al. | 2511.10316 | null |
| 2025-11-13 | RobIA: Robust Instance-aware Continual Test-time Adaptation for Deep Stereo | Jueun Ko et.al. | 2511.10107 | null |
| 2025-11-12 | PALMS+: Modular Image-Based Floor Plan Localization Leveraging Depth Foundation Model | Yunqian Cheng et.al. | 2511.09724 | null |
| 2025-11-12 | PIFF: A Physics-Informed Generative Flow Model for Real-Time Flood Depth Mapping | ChunLiang Wu et.al. | 2511.09130 | null |
| 2025-11-11 | WEDepth: Efficient Adaptation of World Knowledge for Monocular Depth Estimation | Gongshu Wang et.al. | 2511.08036 | null |
| 2025-11-11 | Visual Bridge: Universal Visual Perception Representations Generating | Yilin Gao et.al. | 2511.07877 | null |
| 2025-11-10 | FlowFeat: Pixel-Dense Embedding of Motion Profiles | Nikita Araslanov et.al. | 2511.07696 | null |
| 2025-11-09 | How Wide and How Deep? Mitigating Over-Squashing of GNNs via Channel Capacity Constrained Estimation | Zinuo You et.al. | 2511.06443 | null |
| 2025-11-09 | Temporal-Guided Visual Foundation Models for Event-Based Vision | Ruihao Xia et.al. | 2511.06238 | null |
| 2025-11-08 | Light-Field Dataset for Disparity Based Depth Estimation | Suresh Nehra et.al. | 2511.05866 | null |
| 2025-11-06 | FiCABU: A Fisher-Based, Context-Adaptive Machine Unlearning Processor for Edge AI | Eun-Su Cho et.al. | 2511.05605 | null |
| 2025-11-07 | No Pose Estimation? No Problem: Pose-Agnostic and Instance-Aware Test-Time Adaptation for Monocular Depth Estimation | Mingyu Sung et.al. | 2511.05055 | null |
| 2025-11-06 | Machine Learning-Driven Analysis of kSZ Maps to Predict CMB Optical Depth $τ$ | Farshid Farhadi Khouzani et.al. | 2511.04770 | null |
| 2025-11-06 | Asymptotics of constrained $M$-estimation under convexity | Victor-Emmanuel Brunel et.al. | 2511.04612 | null |
| 2025-11-06 | Annual net community production and carbon exports in the central Sargasso Sea from autonomous underwater glider observations | Ruth G. Curry et.al. | 2511.04544 | null |
| 2025-11-06 | BoRe-Depth: Self-supervised Monocular Depth Estimation with Boundary Refinement for Embedded Systems | Chang Liu et.al. | 2511.04388 | null |
| 2025-11-06 | Simple 3D Pose Features Support Human and Machine Social Scene Understanding | Wenshuo Qin et.al. | 2511.03988 | null |
| 2025-11-05 | Thermodynamic Probes of Multipartite Entanglement in Strongly Interacting Quantum Systems | Harsh Sharma et.al. | 2511.03266 | null |
| 2025-11-05 | Quantum Sensing of Copper-Phthalocyanine Electron Spins via NV Relaxometry | Boning Li et.al. | 2511.03200 | null |
| 2025-11-05 | Exploring the spectral characteristics of the periodic burster 4U 1323-62: Type-I X-ray burst and persistent emission | Mahasweta Bhattacharya et.al. | 2511.03172 | null |
| 2025-11-04 | EvtSlowTV – A Large and Diverse Dataset for Event-Based Depth Estimation | Sadiq Layi Macaulay et.al. | 2511.02953 | null |
| 2025-11-04 | Classical shadows for sample-efficient measurements of gauge-invariant observables | Jacob Bringewatt et.al. | 2511.02904 | null |
| 2025-11-04 | Hydrogen site-dependent physical properties of hydrous magnesium silicates: implications for water storage and transport in the mantle transition zone | Zifan Wang et.al. | 2511.02416 | null |
| 2025-11-04 | Monocular absolute depth estimation from endoscopy via domain-invariant feature learning and latent consistency | Hao Li et.al. | 2511.02247 | null |
| 2025-11-04 | Bayesian spatio-temporal weighted regression for integrating missing and misaligned environmental data | Yovna Junglee et.al. | 2511.02149 | null |
| 2025-11-03 | Opto-Electronic Convolutional Neural Network Design Via Direct Kernel Optimization | Ali Almuallem et.al. | 2511.02065 | null |
| 2025-11-03 | Dynamic Reconstruction of Ultrasound-Derived Flow Fields With Physics-Informed Neural Fields | Viraj Patel et.al. | 2511.01804 | null |
| 2025-11-03 | HGFreNet: Hop-hybrid GraphFomer for 3D Human Pose Estimation with Trajectory Consistency in Frequency Domain | Kai Zhai et.al. | 2511.01756 | null |
| 2025-11-03 | Discriminately Treating Motion Components Evolves Joint Depth and Ego-Motion Learning | Mengtan Zhang et.al. | 2511.01502 | null |
| 2025-11-03 | Floor Plan-Guided Visual Navigation Incorporating Depth and Directional Cues | Wei Huang et.al. | 2511.01493 | null |
| 2025-11-03 | Fast End-to-End Framework for Cosmological Parameter Inference from CMB Data Using Machine Learning | Larissa Santos et.al. | 2511.01291 | null |
| 2025-11-03 | Contextual Relevance and Adaptive Sampling for LLM-Based Document Reranking | Jerry Huang et.al. | 2511.01208 | null |
| 2025-10-31 | VLM6D: VLM based 6Dof Pose Estimation based on RGB-D Images | Md Selim Sarowar et.al. | 2511.00120 | null |
| 2025-10-31 | MoRE: 3D Visual Geometry Reconstruction Meets Mixture-of-Experts | Jingnan Gao et.al. | 2510.27234 | null |
| 2025-10-30 | FlowQ-Net: A Generative Framework for Automated Quantum Circuit Design | Jun Dai et.al. | 2510.26688 | null |
| 2025-10-30 | Interstellar Comet 3I/ATLAS: Evidence for Galactic Cosmic Ray Processing | R. Maggiolo et.al. | 2510.26308 | null |
| 2025-10-29 | Quantum simulation of actinide chemistry: towards scalable algorithms on trapped ion quantum computers | Kesha Sorathia et.al. | 2510.25675 | null |
| 2025-10-29 | Continuous subsurface property retrieval from sparse radar observations using physics informed neural networks | Ishfaq Aziz et.al. | 2510.25648 | null |
| 2025-10-29 | SPADE: Sparsity Adaptive Depth Estimator for Zero-Shot, Real-Time, Monocular Depth Estimation in Underwater Environments | Hongjie Zhang et.al. | 2510.25463 | null |
| 2025-10-29 | Seeing Clearly and Deeply: An RGBD Imaging Approach with a Bio-inspired Monocentric Design | Zongxi Yu et.al. | 2510.25314 | null |
| 2025-10-28 | GeVI-SLAM: Gravity-Enhanced Stereo Visual Inertial SLAM for Underwater Robots | Yuan Shen et.al. | 2510.24533 | null |
| 2025-10-27 | The case for an Astrometric Mission Extension of Euclid. Extending Gaia by 6 magnitudes with Euclid covering one-third of the sky | Luigi “Rolly” BEDIN et.al. | 2510.23694 | null |
| 2025-10-27 | More Than Generation: Unifying Generation and Depth Estimation via Text-to-Image Diffusion Models | Hongkai Lin et.al. | 2510.23574 | null |
| 2025-10-27 | Group-Level and Personalized Optimization for the Insula and Hippocampus Focal Electric Field in Transcranial Temporal Interferential Stimulation: A Computational Study | Taiga Inoue et.al. | 2510.23290 | null |
| 2025-10-27 | Precise Time Delay Measurement and Compensation for Tightly Coupled Underwater SINS/piUSBL Navigation | Jin Huang et.al. | 2510.23286 | null |
| 2025-10-27 | Development of the Reconstruction Procedure of the Fluorescence detector Array of Single-pixel Telescopes for measuring Ultra-High Energy Cosmic Rays | Fraser Bradfield et.al. | 2510.23219 | null |
| 2025-10-27 | Resource analysis of Shor’s elliptic curve algorithm with an improved quantum adder on a two-dimensional lattice | Quan Gu et.al. | 2510.23212 | null |
| 2025-10-27 | Seq-DeepIPC: Sequential Sensing for End-to-End Control in Legged Robot Navigation | Oskar Natan et.al. | 2510.23057 | null |
| 2025-10-26 | LVD-GS: Gaussian Splatting SLAM for Dynamic Scenes via Hierarchical Explicit-Implicit Representation Collaboration Rendering | Wenkai Zhu et.al. | 2510.22669 | null |
| 2025-10-26 | qc-kmeans: A Quantum Compressive K-Means Algorithm for NISQ Devices | Pedro Chumpitaz-Flores et.al. | 2510.22540 | null |
| 2025-10-25 | EndoSfM3D: Learning to 3D Reconstruct Any Endoscopic Surgery Scene using Self-supervised Foundation Model | Changhao Zhang et.al. | 2510.22359 | null |
| 2025-10-25 | I2-NeRF: Learning Neural Radiance Fields Under Physically-Grounded Media Interactions | Shuhong Liu et.al. | 2510.22161 | null |
| 2025-10-25 | CogStereo: Neural Stereo Matching with Implicit Spatial Cognition Embedding | Lihuang Fang et.al. | 2510.22119 | null |
| 2025-10-25 | Impact of Charge Transfer Inefficiency on transit light-curves: A correction strategy for PLATO | Shaunak Mishra et.al. | 2510.22092 | null |
| 2025-10-24 | An Hα Transit of HD 189733b to Assess Stellar Activity Across the Transit Chord Close to JWST Observations | Kingsley E. Ehrich et.al. | 2510.21703 | null |
| 2025-10-24 | MedAlign: A Synergistic Framework of Multimodal Preference Optimization and Federated Meta-Cognitive Reasoning | Siyong Chen et.al. | 2510.21093 | null |
| 2025-10-23 | PPMStereo: Pick-and-Play Memory Construction for Consistent Dynamic Stereo Matching | Yun Wang et.al. | 2510.20178 | null |
| 2025-10-22 | Projecting Hurricane Risk in Atlantic Canada under Climate Change | Saeed Saviz Naeini et.al. | 2510.20074 | null |
| 2025-10-22 | Toward A Better Understanding of Monocular Depth Evaluation | Siyang Wu et.al. | 2510.19814 | null |
| 2025-10-22 | FAUST. XXVIII. High-Resolution ALMA Observations of Class 0/I Disks: Structure, Optical Depths, and Temperatures | M. J. Maureira et.al. | 2510.19635 | null |
| 2025-10-22 | Insights into the Unknown: Federated Data Diversity Analysis on Molecular Data | Markus Bujotzek et.al. | 2510.19535 | null |
| 2025-10-22 | PRGCN: A Graph Memory Network for Cross-Sequence Pattern Reuse in 3D Human Pose Estimation | Zhuoyang Xie et.al. | 2510.19475 | null |
| 2025-10-22 | Seabed-Net: A multi-task network for joint bathymetry estimation and seabed classification from remote sensing imagery in shallow waters | Panagiotis Agrafiotis et.al. | 2510.19329 | null |
| 2025-10-22 | SFGFusion: Surface Fitting Guided 3D Object Detection with 4D Radar and Camera Fusion | Xiaozhi Li et.al. | 2510.19215 | null |
| 2025-10-21 | Kinematic Analysis and Integration of Vision Algorithms for a Mobile Manipulator Employed Inside a Self-Driving Laboratory | Shifa Sulaiman et.al. | 2510.19081 | null |
| 2025-10-21 | Adaptive hyperviscosity stabilisation for the RBF-FD method in solving advection-dominated transport equations | Miha Rot et.al. | 2510.18772 | null |
| 2025-10-21 | PLANA3R: Zero-shot Metric Planar 3D Reconstruction via Feed-Forward Planar Splatting | Changkun Liu et.al. | 2510.18714 | null |
| 2025-10-21 | GeoDiff: Geometry-Guided Diffusion for Metric Depth Estimation | Tuan Pham et.al. | 2510.18291 | null |
| 2025-10-20 | Believe It or Not: How Deeply do LLMs Believe Implanted Facts? | Stewart Slocum et.al. | 2510.17941 | null |
| 2025-10-20 | PAGE-4D: Disentangled Pose and Geometry Estimation for VGGT-4D Perception | Kaichen Zhou et.al. | 2510.17568 | null |
| 2025-10-20 | M2H: Multi-Task Learning with Efficient Window-Based Cross-Task Attention for Monocular Spatial Perception | U. V. B. L Udugama et.al. | 2510.17363 | null |
| 2025-10-20 | Capturing Head Avatar with Hand Contacts from a Monocular Video | Haonan He et.al. | 2510.17181 | null |
| 2025-10-19 | How Universal Are SAM2 Features? | Masoud Khairi Atani et.al. | 2510.17051 | null |
| 2025-10-19 | A Low-Complexity View Synthesis Distortion Estimation Method for 3D Video with Large Baseline Considerations | Chongyuan Bi et.al. | 2510.17037 | null |
| 2025-10-19 | Prediction-Augmented Trees for Reliable Statistical Inference | Vikram Kher et.al. | 2510.16937 | null |
| 2025-10-18 | Self-Supervised Learning to Fly using Efficient Semantic Segmentation and Metric Depth Estimation for Low-Cost Autonomous UAVs | Sebastian Mocanu et.al. | 2510.16624 | null |
| 2025-10-18 | OOS-DSD: Improving Out-of-stock Detection in Retail Images using Auxiliary Tasks | Franko Šikić et.al. | 2510.16508 | null |
| 2025-10-15 | Decision-focused Sensing and Forecasting for Adaptive and Rapid Flood Response: An Implicit Learning Approach | Qian Sun et.al. | 2510.16015 | null |
| 2025-10-17 | FIDDLE: Reinforcement Learning for Quantum Fidelity Enhancement | Hoang M. Ngo et.al. | 2510.15833 | null |
| 2025-10-17 | Adaptive time Compressed QITE (ACQ) and its geometrical interpretation | Alberto Acevedo Meléndez et.al. | 2510.15781 | null |
| 2025-10-16 | SaLon3R: Structure-aware Long-term Generalizable 3D Reconstruction from Unposed Images | Jiaxin Guo et.al. | 2510.15072 | null |
| 2025-10-16 | C4D: 4D Made from 3D through Dual Correspondences | Shizun Wang et.al. | 2510.14960 | null |
| 2025-10-16 | Multi-modal video data-pipelines for machine learning with minimal human supervision | Mihai-Cristian Pîrvu et.al. | 2510.14862 | null |
| 2025-10-15 | XD-RCDepth: Lightweight Radar-Camera Depth Estimation with Explainability-Aligned and Distribution-Aware Distillation | Huawei Sun et.al. | 2510.13565 | null |
| 2025-10-15 | FlyAwareV2: A Multimodal Cross-Domain UAV Dataset for Urban Scene Understanding | Francesco Barbato et.al. | 2510.13243 | null |
| 2025-10-14 | E-MoFlow: Learning Egomotion and Optical Flow from Event Data via Implicit Regularization | Wenpu Li et.al. | 2510.12753 | null |
| 2025-10-14 | Spatial Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model | Fuhao Li et.al. | 2510.12276 | link |
| 2025-10-13 | Evaluating the effects of preprocessing, method selection, and hyperparameter tuning on SAR-based flood mapping and water depth estimation | Jean-Paul Travert et.al. | 2510.11305 | null |
| 2025-10-11 | Gesplat: Robust Pose-Free 3D Reconstruction via Geometry-Guided Gaussian Splatting | Jiahui Lu et.al. | 2510.10097 | null |
| 2025-10-10 | Fast Self-Supervised depth and mask aware Association for Multi-Object Tracking | Milad Khanchi et.al. | 2510.09878 | null |
| 2025-10-10 | Hybrid-grained Feature Aggregation with Coarse-to-fine Language Guidance for Self-supervised Monocular Depth Estimation | Wenyao Zhang et.al. | 2510.09320 | link |
| 2025-10-10 | Online Video Depth Anything: Temporally-Consistent Depth Prediction with Low Memory Consumption | Johann-Friedrich Feiden et.al. | 2510.09182 | null |
| 2025-10-08 | Into the Rabbit Hull: From Task-Relevant Concepts in DINO to Minkowski Geometry | Thomas Fel et.al. | 2510.08638 | null |
| 2025-10-09 | RayFusion: Ray Fusion Enhanced Collaborative Visual Perception | Shaohong Wang et.al. | 2510.08017 | null |
| 2025-10-09 | CVD-STORM: Cross-View Video Diffusion with Spatial-Temporal Reconstruction Model for Autonomous Driving | Tianrui Zhang et.al. | 2510.07944 | link |
| 2025-10-09 | An End-to-End Room Geometry Constrained Depth Estimation Framework for Indoor Panorama Images | Kanglin Ning et.al. | 2510.07817 | null |
| 2025-10-08 | Pixel-Perfect Depth with Semantics-Prompted Diffusion Transformers | Gangwei Xu et.al. | 2510.07316 | null |
| 2025-10-08 | MV-Performer: Taming Video Diffusion Model for Faithful and Synchronized Multi-view Performer Synthesis | Yihao Zhi et.al. | 2510.07190 | null |
| 2025-10-07 | Human3R: Everyone Everywhere All at Once | Yue Chen et.al. | 2510.06219 | link |
| 2025-10-07 | EgoNight: Towards Egocentric Vision Understanding at Night with a Challenging Benchmark | Deheng Zhang et.al. | 2510.06218 | null |
| 2025-10-07 | Dropping the D: RGB-D SLAM Without the Depth Sensor | Mert Kiray et.al. | 2510.06216 | link |
| 2025-10-07 | DeLTa: Demonstration and Language-Guided Novel Transparent Object Manipulation | Taeyeop Lee et.al. | 2510.05662 | null |
| 2025-10-07 | Human Action Recognition from Point Clouds over Time | James Dickens et.al. | 2510.05506 | null |
| 2025-10-06 | HybridFlow: Quantification of Aleatoric and Epistemic Uncertainty with a Single Hybrid Model | Peter Van Katwyk et.al. | 2510.05054 | null |
| 2025-10-06 | Benchmark on Monocular Metric Depth Estimation in Wildlife Setting | Niccolò Niccoli et.al. | 2510.04723 | null |
| 2025-10-04 | Evaluating High-Resolution Piano Sustain Pedal Depth Estimation with Musically Informed Metrics | Hanwen Zhang et.al. | 2510.03750 | null |
| 2025-10-03 | Whisker-based Tactile Flight for Tiny Drones | Chaoxiang Ye et.al. | 2510.03119 | null |
| 2025-10-02 | Non-Rigid Structure-from-Motion via Differential Geometry with Recoverable Conformal Scale | Yongbo Chen et.al. | 2510.01665 | null |
| 2025-10-01 | Temporal Score Rescaling for Temperature Sampling in Diffusion and Flow Models | Yanbo Xu et.al. | 2510.01184 | null |
| 2025-09-30 | DA$^{2}$: Depth Anything in Any Direction | Haodong Li et.al. | 2509.26618 | link |
| 2025-09-30 | DEPTHOR++: Robust Depth Enhancement from a Real-World Lightweight dToF and RGB Guidance | Jijun Xiang et.al. | 2509.26498 | null |
| 2025-09-30 | EasyOcc: 3D Pseudo-Label Supervision for Fully Self-Supervised Semantic Occupancy Prediction Models | Seamie Hayes et.al. | 2509.26087 | null |
| 2025-09-30 | PFDepth: Heterogeneous Pinhole-Fisheye Joint Depth Estimation via Distortion-aware Gaussian-Splatted Volumetric Fusion | Zhiwei Zhang et.al. | 2509.26008 | null |
| 2025-09-29 | DepthLM: Metric Depth From Vision Language Models | Zhipeng Cai et.al. | 2509.25413 | link |
| 2025-09-29 | Fast Feature Field ($\text{F}^3$): A Predictive Representation of Events | Richeek Das et.al. | 2509.25146 | null |
| 2025-09-29 | BRIDGE – Building Reinforcement-Learning Depth-to-Image Data Generation Engine for Monocular Depth Estimation | Dingning Liu et.al. | 2509.25077 | link |
| 2025-09-29 | HBSplat: Robust Sparse-View Gaussian Reconstruction with Hybrid-Loss Guided Depth and Bidirectional Warping | Yu Ma et.al. | 2509.24893 | null |
| 2025-09-28 | RPG360: Robust 360 Depth Estimation with Perspective Foundation Models and Graph Optimization | Dongki Jung et.al. | 2509.23991 | null |
| 2025-09-28 | FastViDAR: Real-Time Omnidirectional Depth Estimation via Alternative Hierarchical Attention | Hangtian Zhao et.al. | 2509.23733 | link |
| 2025-09-28 | Efficient Domain-Adaptive Multi-Task Dense Prediction with Vision Foundation Models | Beomseok Kang et.al. | 2509.23626 | null |
| 2025-09-26 | CCNeXt: An Effective Self-Supervised Stereo Depth Estimation Approach | Alexandre Lopes et.al. | 2509.22627 | link |
| 2025-09-26 | EfficientDepth: A Fast and Detail-Preserving Monocular Depth Estimation Model | Andrii Litvynchuk et.al. | 2509.22527 | null |
| 2025-09-26 | DualFocus: Depth from Focus with Spatio-Focal Dual Variational Constraints | Sungmin Woo et.al. | 2509.21992 | null |
| 2025-09-25 | Finding 3D Positions of Distant Objects from Noisy Camera Movement and Semantic Segmentation Sequences | Julius Pesonen et.al. | 2509.20906 | null |
| 2025-09-24 | Shared Neural Space: Unified Precomputed Feature Encoding for Multi-Task and Cross Domain Vision | Jing Li et.al. | 2509.20481 | null |
| 2025-09-24 | BiTAA: A Bi-Task Adversarial Attack for Object Detection and Depth Estimation via 3D Gaussian Splatting | Yixun Zhang et.al. | 2509.19793 | null |
| 2025-09-24 | VIMD: Monocular Visual-Inertial Motion and Depth Estimation | Saimouli Katragadda et.al. | 2509.19713 | null |
| 2025-09-24 | Enhancing Transformer-Based Vision Models: Addressing Feature Map Anomalies Through Novel Optimization Strategies | Sumit Mamtani et.al. | 2509.19687 | null |
| 2025-09-23 | An on-chip Pixel Processing Approach with 2.4μs latency for Asynchronous Read-out of SPAD-based dToF Flash LiDARs | Yiyang Liu et.al. | 2509.19192 | null |
| 2025-09-23 | RS3DBench: A Comprehensive Benchmark for 3D Spatial Perception in Remote Sensing | Jiayu Wang et.al. | 2509.18897 | null |
| 2025-09-23 | Zero-shot Monocular Metric Depth for Endoscopic Images | Nicolas Toussaint et.al. | 2509.18642 | null |
| 2025-09-18 | URNet: Uncertainty-aware Refinement Network for Event-based Stereo Depth Estimation | Yifeng Cheng et.al. | 2509.18184 | null |
| 2025-09-22 | RadarSFD: Single-Frame Diffusion with Pretrained Priors for Radar Point Clouds | Bin Zhao et.al. | 2509.18068 | null |
| 2025-09-22 | Predicting Depth Maps from Single RGB Images and Addressing Missing Information in Depth Estimation | Mohamad Mofeed Chaar et.al. | 2509.17686 | null |
| 2025-09-22 | Evict3R: Training-Free Token Eviction for Memory-Bounded Streaming Visual Geometry Transformers | Soroush Mahdi et.al. | 2509.17650 | null |
| 2025-09-22 | GPS Denied IBVS-Based Navigation and Collision Avoidance of UAV Using a Low-Cost RGB Camera | Xiaoyu Wang et.al. | 2509.17435 | null |
| 2025-09-21 | ConfidentSplat: Confidence-Weighted Depth Fusion for Accurate 3D Gaussian Splatting SLAM | Amanuel T. Dufera et.al. | 2509.16863 | null |
| 2025-09-19 | 3D Gaussian Flats: Hybrid 2D/3D Photometric Scene Reconstruction | Maria Taktasheva et.al. | 2509.16423 | null |
| 2025-09-19 | StereoAdapter: Adapting Stereo Depth Estimation to Underwater Scenes | Zhengri Wu et.al. | 2509.16415 | link |
| 2025-09-19 | Towards Sharper Object Boundaries in Self-Supervised Depth Estimation | Aurélien Cecille et.al. | 2509.15987 | null |
| 2025-09-19 | Shedding Light on Depth: Explainability Assessment in Monocular Depth Estimation | Lorenzo Cirillo et.al. | 2509.15980 | null |
| 2025-09-19 | MS-GS: Multi-Appearance Sparse-View 3D Gaussian Splatting in the Wild | Deming Li et.al. | 2509.15548 | null |
| 2025-09-18 | Depth AnyEvent: A Cross-Modal Distillation Paradigm for Event-Based Monocular Depth Estimation | Luca Bartolomei et.al. | 2509.15224 | null |
| 2025-09-18 | Lightweight and Accurate Multi-View Stereo with Confidence-Aware Diffusion Model | Fangjinhua Wang et.al. | 2509.15220 | null |
| 2025-09-18 | UCorr: Wire Detection and Depth Estimation for Autonomous Drones | Benedikt Kolbeinsson et.al. | 2509.14989 | null |
| 2025-09-18 | MapAnything: Mapping Urban Assets using Single Street-View Images | Miriam Louise Carnot et.al. | 2509.14839 | null |
| 2025-09-16 | Gen2Real: Towards Demo-Free Dexterous Manipulation by Harnessing Generated Video | Kai Ye et.al. | 2509.14178 | null |
| 2025-09-17 | UM-Depth: Uncertainty Masked Self-Supervised Monocular Depth Estimation with Visual Odometry | Tae-Wook Um et.al. | 2509.13713 | null |
| 2025-09-17 | Gaussian Alignment for Relative Camera Pose Estimation via Single-View Reconstruction | Yumin Li et.al. | 2509.13652 | null |
| 2025-09-16 | ColonCrafter: A Depth Estimation Model for Colonoscopy Videos Using Diffusion Priors | Romain Hardy et.al. | 2509.13525 | null |
| 2025-09-16 | MINGLE: VLMs for Semantically Complex Region Detection in Urban Scenes | Liu Liu et.al. | 2509.13484 | null |
| 2025-09-16 | MapAnything: Universal Feed-Forward Metric 3D Reconstruction | Nikhil Keetha et.al. | 2509.13414 | null |
| 2025-09-16 | ROOM: A Physics-Based Continuum Robot Simulator for Photorealistic Medical Datasets Generation | Salvatore Esposito et.al. | 2509.13177 | link |
| 2025-09-15 | BREA-Depth: Bronchoscopy Realistic Airway-geometric Depth Estimation | Francis Xiatian Zhang et.al. | 2509.11885 | null |
| 2025-09-14 | In-Vivo Skin 3-D Surface Reconstruction and Wrinkle Depth Estimation using Handheld High Resolution Tactile Sensing | Akhil Padmanabha et.al. | 2509.11385 | null |
| 2025-09-14 | The System Description of CPS Team for Track on Driving with Language of CVPR 2024 Autonomous Grand Challenge | Jinghan Peng et.al. | 2509.11071 | null |
| 2025-09-12 | Self-supervised Learning Of Visual Pose Estimation Without Pose Labels By Classifying LED States | Nicholas Carlotti et.al. | 2509.10405 | null |
| 2025-09-10 | Computational Imaging for Enhanced Computer Vision | Humera Shaikh et.al. | 2509.08712 | null |
| 2025-09-10 | Deep Visual Odometry for Stereo Event Cameras | Sheng Zhong et.al. | 2509.08235 | null |
| 2025-09-09 | Zero-Shot Metric Depth Estimation via Monocular Visual-Inertial Rescaling for Autonomous Aerial Navigation | Steven Yang et.al. | 2509.08159 | null |
| 2025-09-09 | MCTED: A Machine-Learning-Ready Dataset for Digital Elevation Model Generation From Mars Imagery | Rafał Osadnik et.al. | 2509.08027 | null |
| 2025-09-08 | Event Spectroscopy: Event-based Multispectral and Depth Sensing using Structured Light | Christian Geckeler et.al. | 2509.06741 | null |
| 2025-09-08 | VIM-GS: Visual-Inertial Monocular Gaussian Splatting via Object-level Guidance in Large Scenes | Shengkai Zhang et.al. | 2509.06685 | null |
| 2025-09-07 | S-LAM3D: Segmentation-Guided Monocular 3D Object Detection via Feature Space Fusion | Diana-Alexandra Sas et.al. | 2509.05999 | null |
| 2025-09-06 | MonoGlass3D: Monocular 3D Glass Detection with Plane Regression and Adaptive Feature Fusion | Kai Zhang et.al. | 2509.05599 | null |
| 2025-09-05 | FloodVision: Urban Flood Depth Estimation Using Foundation Vision-Language Models and Domain Knowledge Graph | Zhangding Liu et.al. | 2509.04772 | null |
| 2025-09-03 | Uncertainty-aware Test-Time Training (UT$^3$) for Efficient On-the-fly Domain Adaptive Dense Regression | Uddeshya Upadhyay et.al. | 2509.03012 | null |
| 2025-09-03 | DUViN: Diffusion-Based Underwater Visual Navigation via Knowledge-Transferred Depth Features | Jinghe Yang et.al. | 2509.02983 | null |
| 2025-09-02 | Physics-Informed Machine Learning with Adaptive Grids for Optical Microrobot Depth Estimation | Lan Wei et.al. | 2509.02343 | null |
| 2025-09-02 | Doctoral Thesis: Geometric Deep Learning For Camera Pose Prediction, Registration, Depth Estimation, and 3D Reconstruction | Xueyang Kang et.al. | 2509.01873 | null |
| 2025-09-01 | EndoGMDE: Generalizable Monocular Depth Estimation with Mixture of Low-Rank Experts for Diverse Endoscopic Scenes | Liangjing Shao et.al. | 2509.01206 | null |
| 2025-08-31 | ER-LoRA: Effective-Rank Guided Adaptation for Weather-Generalized Depth Estimation | Weilong Yan et.al. | 2509.00665 | null |
| 2025-08-23 | ARTPS: Depth-Enhanced Hybrid Anomaly Detection and Learnable Curiosity Score for Autonomous Rover Target Prioritization | Poyraz Baydemir et.al. | 2509.00042 | null |
| 2025-08-28 | Enhancing Pseudo-Boxes via Data-Level LiDAR-Camera Fusion for Unsupervised 3D Object Detection | Mingqian Ji et.al. | 2508.20530 | null |
| 2025-08-27 | OpenM3D: Open Vocabulary Multi-view Indoor 3D Object Detection without Human Annotations | Peng-Hao Hsu et.al. | 2508.20063 | null |
| 2025-08-26 | SoccerNet 2025 Challenges Results | Silvio Giancola et.al. | 2508.19182 | null |
| 2025-08-25 | Impact of Target and Tool Visualization on Depth Perception and Usability in Optical See-Through AR | Yue Yang et.al. | 2508.18481 | null |
| 2025-08-25 | EndoUFM: Utilizing Foundation Models for Monocular depth estimation of endoscopic images | Xinning Yao et.al. | 2508.17916 | null |
| 2025-08-23 | Balanced Sharpness-Aware Minimization for Imbalanced Regression | Yahao Liu et.al. | 2508.16973 | null |
| 2025-08-20 | FOCUS: Frequency-Optimized Conditioning of DiffUSion Models for mitigating catastrophic forgetting during Test-Time Adaptation | Gabriel Tjio et.al. | 2508.14437 | null |
| 2025-08-19 | ROVR-Open-Dataset: A Large-Scale Depth Dataset for Autonomous Driving | Xianda Guo et.al. | 2508.13977 | null |
| 2025-08-18 | Batching-Aware Joint Model Onloading and Offloading for Hierarchical Multi-Task Inference | Seohyeon Cha et.al. | 2508.13380 | null |
| 2025-08-18 | DMS: Diffusion-Based Multi-Baseline Stereo Generation for Improving Self-Supervised Depth Estimation | Zihua Liu et.al. | 2508.13091 | null |
| 2025-08-15 | DashCam Video: A complementary low-cost data stream for on-demand forest-infrastructure system monitoring | Durga Joshi et.al. | 2508.11591 | null |
| 2025-08-15 | Unifying Scale-Aware Depth Prediction and Perceptual Priors for Monocular Endoscope Pose Estimation and Tissue Reconstruction | Muzammil Khan et.al. | 2508.11282 | null |
| 2025-08-15 | CHARM3R: Towards Unseen Camera Height Robust Monocular 3D Detector | Abhinav Kumar et.al. | 2508.11185 | null |
| 2025-08-12 | Vision-Only Gaussian Splatting for Collaborative Semantic Occupancy Prediction | Cheng Chen et.al. | 2508.10936 | null |
| 2025-08-14 | SC-Lane: Slope-aware and Consistent Road Height Estimation Framework for 3D Lane Detection | Chaesong Park et.al. | 2508.10411 | null |
| 2025-08-12 | A new dataset and comparison for multi-camera frame synthesis | Conall Daly et.al. | 2508.09068 | null |
| 2025-08-12 | Deep Spectral Epipolar Representations for Dense Light Field Reconstruction | Noor Islam S. Mohammad et.al. | 2508.08900 | null |
| 2025-08-11 | GRASPTrack: Geometry-Reasoned Association via Segmentation and Projection for Multi-Object Tracking | Xudong Han et.al. | 2508.08117 | null |
| 2025-08-11 | TRIDE: A Text-assisted Radar-Image weather-aware fusion network for Depth Estimation | Huawei Sun et.al. | 2508.08038 | null |
| 2025-08-11 | Autonomous Navigation of Cloud-Controlled Quadcopters in Confined Spaces Using Multi-Modal Perception and LLM-Driven High Semantic Reasoning | Shoaib Ahmmad et.al. | 2508.07885 | null |
| 2025-08-10 | MonoMPC: Monocular Vision Based Navigation with Learned Collision Model and Risk-Aware Model Predictive Control | Basant Sharma et.al. | 2508.07387 | null |
| 2025-08-10 | DIP-GS: Deep Image Prior For Gaussian Splatting Sparse View Recovery | Rajaei Khatib et.al. | 2508.07372 | null |
| 2025-08-10 | Similarity Matters: A Novel Depth-guided Network for Image Restoration and A New Dataset | Junyi He et.al. | 2508.07211 | null |
| 2025-08-10 | Acoustic source depth estimation method based on a single hydrophone in Arctic underwater | Jinbao Weng et.al. | 2508.07157 | null |
| 2025-08-09 | AugLift: Boosting Generalization in Lifting-based 3D Human Pose Estimation | Nikolai Warner et.al. | 2508.07112 | null |
| 2025-08-08 | Neural Field Representations of Mobile Computational Photography | Ilya Chugunov et.al. | 2508.05907 | null |
| 2025-08-07 | Propagating Sparse Depth via Depth Foundation Model for Out-of-Distribution Depth Completion | Shenglun Chen et.al. | 2508.04984 | null |
| 2025-08-06 | Extending Foundational Monocular Depth Estimators to Fisheye Cameras with Calibration Tokens | Suchisrit Gangopadhyay et.al. | 2508.04928 | null |
| 2025-08-06 | BridgeDepth: Bridging Monocular and Stereo Reasoning with Latent Alignment | Tongfan Guan et.al. | 2508.04611 | link |
| 2025-08-06 | Pseudo Depth Meets Gaussian: A Feed-forward RGB SLAM Baseline | Linqing Zhao et.al. | 2508.04597 | null |
| 2025-08-06 | MuGS: Multi-Baseline Generalizable Gaussian Splatting Reconstruction | Yaopeng Lou et.al. | 2508.04297 | null |
| 2025-08-06 | DET-GS: Depth- and Edge-Aware Regularization for High-Fidelity 3D Gaussian Splatting | Zexu Huang et.al. | 2508.04099 | null |
| 2025-08-05 | Monocular Depth Estimation with Global-Aware Discretization and Local Context Modeling | Heng Wu et.al. | 2508.03186 | null |
| 2025-08-04 | VRSight: An AI-Driven Scene Description System to Improve Virtual Reality Accessibility for Blind People | Daniel Killough et.al. | 2508.02958 | null |
| 2025-08-04 | Elucidating the Role of Feature Normalization in IJEPA | Adam Colton et.al. | 2508.02829 | null |
| 2025-08-04 | Rethinking Transparent Object Grasping: Depth Completion with Monocular Depth Estimation and Instance Mask | Yaofeng Cheng et.al. | 2508.02507 | null |
| 2025-08-02 | 3DRot: 3D Rotation Augmentation for RGB-Based 3D Tasks | Shitian Yang et.al. | 2508.01423 | null |
| 2025-08-02 | A Coarse-to-Fine Approach to Multi-Modality 3D Occupancy Grounding | Zhan Shi et.al. | 2508.01197 | link |
| 2025-07-29 | A Survey on Deep Multi-Task Learning in Connected Autonomous Vehicles | Jiayuan Wang et.al. | 2508.00917 | null |
| 2025-07-29 | TESPEC: Temporally-Enhanced Self-Supervised Pretraining for Event Cameras | Mohammad Mohammadi et.al. | 2508.00913 | null |
| 2025-07-28 | Sparse 3D Perception for Rose Harvesting Robots: A Two-Stage Approach Bridging Simulation and Real-World Applications | Taha Samavati et.al. | 2508.00900 | null |
| 2025-08-01 | Can Large Pretrained Depth Estimation Models Help With Image Dehazing? | Hongfei Zhang et.al. | 2508.00698 | null |
| 2025-07-31 | Stereo 3D Gaussian Splatting SLAM for Outdoor Urban Scenes | Xiaohan Li et.al. | 2507.23677 | null |
| 2025-07-30 | A Dual-Feature Extractor Framework for Accurate Back Depth and Spine Morphology Estimation from Monocular RGB Images | Yuxin Wei et.al. | 2507.22691 | null |
| 2025-07-30 | UAVScenes: A Multi-Modal Dataset for UAVs | Sijie Wang et.al. | 2507.22412 | link |
| 2025-07-29 | PanoSplatt3R: Leveraging Perspective Pretraining for Generalized Unposed Wide-Baseline Panorama Reconstruction | Jiahui Ren et.al. | 2507.21960 | null |
| 2025-07-25 | Event-Based De-Snowing for Autonomous Driving | Manasi Muglikar et.al. | 2507.20901 | null |
| 2025-07-28 | Endoscopic Depth Estimation Based on Deep Learning: A Survey | Ke Niu et.al. | 2507.20881 | null |
| 2025-07-26 | UniCT Depth: Event-Image Fusion Based Monocular Depth Estimation with Convolution-Compensated ViT Dual SA Block | Luoxi Jing et.al. | 2507.19948 | null |
| 2025-07-24 | Towards Scalable Spatial Intelligence via 2D-to-3D Data Lifting | Xingyu Miao et.al. | 2507.18678 | null |
| 2025-07-24 | DepthDark: Robust Monocular Depth Estimation for Low-Light Environments | Longjian Zeng et.al. | 2507.18243 | null |
| 2025-07-24 | BokehDiff: Neural Lens Blur with One-Step Diffusion | Chengxuan Zhu et.al. | 2507.18060 | null |
| 2025-07-23 | Monocular Semantic Scene Completion via Masked Recurrent Networks | Xuzhi Wang et.al. | 2507.17661 | null |
| 2025-07-22 | SDGOCC: Semantic and Depth-Guided Bird’s-Eye View Transformation for 3D Multimodal Occupancy Prediction | Zaipeng Duan et.al. | 2507.17083 | null |
| 2025-07-21 | DAViD: Data-efficient and Accurate Vision Models from Synthetic Data | Fatemeh Saleh et.al. | 2507.15365 | link |
| 2025-07-21 | BenchDepth: Are We on the Right Way to Evaluate Depth Foundation Models? | Zhenyu Li et.al. | 2507.15321 | null |
| 2025-07-20 | Region-aware Depth Scale Adaptation with Sparse Measurements | Rizhao Fan et.al. | 2507.14879 | null |
| 2025-07-20 | Training Self-Supervised Depth Completion Using Sparse Measurements and a Single Image | Rizhao Fan et.al. | 2507.14845 | null |
| 2025-07-19 | DCHM: Depth-Consistent Human Modeling for Multiview Detection | Jiahao Ma et.al. | 2507.14505 | null |
| 2025-07-19 | Motion Segmentation and Egomotion Estimation from Event-Based Normal Flow | Zhiyuan Hua et.al. | 2507.14500 | null |
| 2025-07-18 | Depth3DLane: Fusing Monocular 3D Lane Detection with Self-Supervised Monocular Depth Estimation | Max van den Hoven et.al. | 2507.13857 | null |
| 2025-07-18 | Augmented Reality in Cultural Heritage: A Dual-Model Pipeline for 3D Artwork Reconstruction | Daniele Pannone et.al. | 2507.13719 | null |
| 2025-07-17 | $π^3$: Scalable Permutation-Equivariant Visual Geometry Learning | Yifan Wang et.al. | 2507.13347 | link |
| 2025-07-17 | $S^2M^2$: Scalable Stereo Matching Model for Reliable Depth Estimation | Junhong Min et.al. | 2507.13229 | null |
| 2025-07-16 | Vision-based Perception for Autonomous Vehicles in Obstacle Avoidance Scenarios | Van-Hoang-Anh Phan et.al. | 2507.12449 | null |
| 2025-07-16 | Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation | Antonio Finocchiaro et.al. | 2507.12292 | null |
| 2025-07-15 | Towards Depth Foundation Model: Recent Trends in Vision-Based Depth Estimation | Zhen Xu et.al. | 2507.11540 | null |
| 2025-07-15 | MonoMVSNet: Monocular Priors Guided Multi-View Stereo Network | Jianfei Jiang et.al. | 2507.11333 | null |
| 2025-07-15 | Uncertainty Aware Mapping for Vision-Based Underwater Robots | Abhimanyu Bhowmik et.al. | 2507.10991 | null |
| 2025-07-14 | Static or Temporal? Semantic Scene Simplification to Aid Wayfinding in Immersive Simulations of Bionic Vision | Justin M. Kasowski et.al. | 2507.10813 | null |
| 2025-07-14 | Cameras as Relative Positional Encoding | Ruilong Li et.al. | 2507.10496 | null |
| 2025-07-14 | Spatial Lifting for Dense Prediction | Mingzhi Xu et.al. | 2507.10222 | null |
| 2025-07-13 | Prompt2DEM: High-Resolution DEMs for Urban and Open Environments from Global Prompts Using a Monocular Foundation Model | Osher Rafaeli et.al. | 2507.09681 | null |
| 2025-07-11 | ByDeWay: Boost Your multimodal LLM with DEpth prompting in a Training-Free Way | Rajarshi Roy et.al. | 2507.08679 | null |
| 2025-07-10 | An Embedded Real-time Object Alert System for Visually Impaired: A Monocular Depth Estimation based Approach through Computer Vision | Jareen Anjom et.al. | 2507.08165 | null |
| 2025-07-10 | Tree-Mamba: A Tree-Aware Mamba for Underwater Monocular Depth Estimation | Peixian Zhuang et.al. | 2507.07687 | null |
| 2025-07-10 | HOTA: Hierarchical Overlap-Tiling Aggregation for Large-Area 3D Flood Mapping | Wenfeng Jia et.al. | 2507.07585 | null |
| 2025-07-08 | LighthouseGS: Indoor Structure-aware 3D Gaussian Splatting for Panorama-Style Mobile Captures | Seungoh Han et.al. | 2507.06109 | null |
| 2025-07-14 | Beyond Appearance: Geometric Cues for Robust Video Instance Segmentation | Quanzhu Niu et.al. | 2507.05948 | link |
| 2025-07-07 | The Generalization Ridge: Information Flow in Natural Language Generation | Ruidi Chang et.al. | 2507.05387 | null |
| 2025-07-10 | VOTE: Vision-Language-Action Optimization with Trajectory Ensemble Voting | Juyi Lin et.al. | 2507.05116 | link |
| 2025-07-07 | Estimating Object Physical Properties from RGB-D Vision and Depth Robot Sensors Using Deep Learning | Ricardo Cardoso et.al. | 2507.05029 | null |
| 2025-07-06 | A View-consistent Sampling Method for Regularized Training of Neural Radiance Fields | Aoxiang Fan et.al. | 2507.04408 | null |
| 2025-07-06 | High-Resolution Sustain Pedal Depth Estimation from Piano Audio Across Room Acoustics | Kun Fang et.al. | 2507.04230 | null |
| 2025-07-03 | From Pixels to Damage Severity: Estimating Earthquake Impacts Using Semantic Segmentation of Social Media Images | Danrong Zhang et.al. | 2507.02781 | null |
| 2025-07-02 | Underwater Monocular Metric Depth Estimation: Real-World Benchmarks and Synthetic Fine-Tuning | Zijie Cai et.al. | 2507.02148 | null |
| 2025-07-02 | RobuSTereo: Robust Zero-Shot Stereo Matching under Adverse Weather | Yuran Wang et.al. | 2507.01653 | null |
| 2025-07-02 | Depth Anything at Any Condition | Boyuan Sun et.al. | 2507.01634 | link |
| 2025-07-02 | DepthSync: Diffusion Guidance-Based Depth Synchronization for Scale- and Geometry-Consistent Video Depth Estimation | Yue-Jiang Dong et.al. | 2507.01603 | null |
| 2025-07-02 | Evaluating Robustness of Monocular Depth Estimation with Procedural Scene Perturbations | Jack Nugent et.al. | 2507.00981 | null |
| 2025-06-30 | SurgiSR4K: A High-Resolution Endoscopic Video Dataset for Robotic-Assisted Minimally Invasive Procedures | Fengyi Jiang et.al. | 2507.00209 | null |
| 2025-06-30 | OcRFDet: Object-Centric Radiance Fields for Multi-View 3D Object Detection in Autonomous Driving | Mingqian Ji et.al. | 2506.23565 | null |
| 2025-06-26 | ThermalDiffusion: Visual-to-Thermal Image-to-Image Translation for Autonomous Navigation | Shruti Bansal et.al. | 2506.20969 | null |
| 2025-06-25 | THIRDEYE: Cue-Aware Monocular Depth Estimation via Brain-Inspired Multi-Stage Fusion | Calin Teodor Ioan et.al. | 2506.20877 | null |
| 2025-06-30 | StereoDiff: Stereo-Diffusion Synergy for Video Depth Estimation | Haodong Li et.al. | 2506.20756 | null |
| 2025-06-24 | Look to Locate: Vision-Based Multisensory Navigation with 3-D Digital Maps for GNSS-Challenged Environments | Ola Elmaghraby et.al. | 2506.19827 | null |
| 2025-06-23 | SOF: Sorted Opacity Fields for Fast Unbounded Surface Reconstruction | Lukas Radl et.al. | 2506.19139 | null |
| 2025-06-23 | BulletGen: Improving 4D Reconstruction with Bullet-Time Generation | Denys Rozumnyi et.al. | 2506.18601 | null |
| 2025-06-21 | Optimization-Free Patch Attack on Stereo Depth Estimation | Hangcheng Liu et.al. | 2506.17632 | null |
| 2025-06-20 | DreamCube: 3D Panorama Generation via Multi-plane Synchronization | Yukun Huang et.al. | 2506.17206 | link |
| 2025-06-20 | RGBTrack: Fast, Robust Depth-Free 6D Pose Estimation and Tracking | Teng Guo et.al. | 2506.17119 | link |
| 2025-06-20 | Monocular One-Shot Metric-Depth Alignment for RGB-Based Robot Grasping | Teng Guo et.al. | 2506.17110 | null |
| 2025-06-20 | DepthVanish: Optimizing Adversarial Interval Structures for Stereo-Depth-Invisible Patches | Yun Xing et.al. | 2506.16690 | null |
| 2025-06-19 | EndoMUST: Monocular Depth Estimation for Robotic Endoscopy via End-to-end Multi-step Self-supervised Training | Liangjing Shao et.al. | 2506.16017 | link |
| 2025-06-18 | RaCalNet: Radar Calibration Network for Sparse-Supervised Metric Depth Estimation | Xingrui Qin et.al. | 2506.15560 | null |
| 2025-06-17 | Time-Optimized Safe Navigation in Unstructured Environments through Learning Based Depth Completion | Jeffrey Mao et.al. | 2506.14975 | null |
| 2025-06-17 | DiFuse-Net: RGB and Dual-Pixel Depth Estimation using Window Bi-directional Parallax Attention and Cross-modal Transfer Learning | Kunal Swami et.al. | 2506.14709 | null |
| 2025-06-16 | Test3R: Learning to Reconstruct 3D at Test Time | Yuheng Yuan et.al. | 2506.13750 | link |
| 2025-06-16 | Multiview Geometric Regularization of Gaussian Splatting for Accurate Radiance Fields | Jungeon Kim et.al. | 2506.13508 | null |
| 2025-06-17 | Self-Supervised Enhancement for Depth from a Lightweight ToF Sensor with Monocular Images | Laiyan Ding et.al. | 2506.13444 | null |
| 2025-06-16 | TR2M: Transferring Monocular Relative Depth to Metric Depth with Language Descriptions and Scale-Oriented Contrast | Beilei Cui et.al. | 2506.13387 | link |
| 2025-06-17 | 3D Hand Mesh-Guided AI-Generated Malformed Hand Refinement with Hand Pose Transformation via Diffusion Model | Chen-Bin Feng et.al. | 2506.12680 | null |
| 2025-06-12 | Leveraging 6DoF Pose Foundation Models For Mapping Marine Sediment Burial | Jerry Yan et.al. | 2506.10386 | link |
| 2025-06-11 | DCIRNet: Depth Completion with Iterative Refinement for Dexterous Grasping of Transparent and Reflective Objects | Guanghu Xie et.al. | 2506.09491 | null |
| 2025-06-11 | MSSDF: Modality-Shared Self-supervised Distillation for High-Resolution Multi-modal Remote Sensing Image Learning | Tong Wang et.al. | 2506.09327 | null |
| 2025-06-10 | AVA-Bench: Atomic Visual Ability Benchmark for Vision Foundation Models | Zheda Mai et.al. | 2506.09082 | null |
| 2025-06-10 | One Patch to Rule Them All: Transforming Static Patches into Dynamic Attacks in the Physical World | Xingshuo Han et.al. | 2506.08482 | null |
| 2025-06-09 | Jamais Vu: Exposing the Generalization Gap in Supervised Semantic Correspondence | Octave Mariotti et.al. | 2506.08220 | null |
| 2025-06-09 | Hidden in plain sight: VLMs overlook their visual representations | Stephanie Fu et.al. | 2506.08008 | null |
| 2025-06-09 | EgoM2P: Egocentric Multimodal Multitask Pretraining | Gen Li et.al. | 2506.07886 | null |
| 2025-06-09 | Flow-Anything: Learning Real-World Optical Flow Estimation from Large-Scale Single-view Images | Yingping Liang et.al. | 2506.07740 | null |
| 2025-06-07 | Dark Channel-Assisted Depth-from-Defocus from a Single Image | Moushumi Medhi et.al. | 2506.06643 | null |
| 2025-06-06 | NTIRE 2025 Challenge on HR Depth from Images of Specular and Transparent Surfaces | Pierluigi Zama Ramirez et.al. | 2506.05815 | null |
| 2025-06-06 | Advancement and Field Evaluation of a Dual-arm Apple Harvesting Robot | Keyi Zhu et.al. | 2506.05714 | null |
| 2025-06-06 | Token Transforming: A Unified and Training-Free Token Compression Framework for Vision Transformer Acceleration | Fanhu Zeng et.al. | 2506.05709 | null |
| 2025-06-06 | Aerial Multi-View Stereo via Adaptive Depth Range Inference and Normal Cues | Yimei Liu et.al. | 2506.05655 | null |
| 2025-06-03 | Attacking Attention of Foundation Models Disrupts Downstream Tasks | Hondamunige Prasanna Silva et.al. | 2506.05394 | null |
| 2025-06-09 | Structure-Aware Radar-Camera Depth Estimation | Fuyi Zhang et.al. | 2506.05008 | null |
| 2025-06-05 | Generating Synthetic Stereo Datasets using 3D Gaussian Splatting and Expert Knowledge Transfer | Filip Slezak et.al. | 2506.04908 | null |
| 2025-06-05 | Toward Better SSIM Loss for Unsupervised Monocular Depth Estimation | Yijun Cao et.al. | 2506.04758 | null |
| 2025-06-04 | JointSplat: Probabilistic Joint Flow-Depth Optimization for Sparse-View Gaussian Splatting | Yang Xiao et.al. | 2506.03872 | null |
| 2025-06-03 | ViT-Split: Unleashing the Power of Vision Foundation Models via Efficient Splitting Heads | Yifan Li et.al. | 2506.03433 | null |
| 2025-06-02 | E3D-Bench: A Benchmark for End-to-End 3D Geometric Foundation Models | Wenyan Cong et.al. | 2506.01933 | null |
| 2025-06-01 | Perceptual Inductive Bias Is What You Need Before Contrastive Learning | Tianqin Li et.al. | 2506.01201 | null |
| 2025-05-31 | XYZ-IBD: High-precision Bin-picking Dataset for Object 6D Pose Estimation Capturing Real-world Industrial Complexity | Junwen Huang et.al. | 2506.00599 | null |
| 2025-05-31 | Improving Optical Flow and Stereo Depth Estimation by Leveraging Uncertainty-Based Learning Difficulties | Jisoo Jeong et.al. | 2506.00324 | null |
| 2025-05-30 | Harnessing Foundation Models for Robust and Generalizable 6-DOF Bronchoscopy Localization | Qingyao Tian et.al. | 2505.24249 | null |
| 2025-05-29 | Ultrafast High-Flux Single-Photon LiDAR Simulator via Neural Mapping | Weijian Zhang et.al. | 2505.23992 | null |
| 2025-05-29 | Bridging Geometric and Semantic Foundation Models for Generalized Monocular Depth Estimation | Sanggyun Ma et.al. | 2505.23400 | null |
| 2025-05-29 | GeoMan: Temporally Consistent Human Geometry Estimation using Image-to-Video Diffusion | Gwanghyun Kim et.al. | 2505.23085 | null |
| 2025-05-28 | Depth to magnetic source estimation using TDX contour | Hammed Oyekan et.al. | 2505.22780 | null |
| 2025-05-27 | Object Concepts Emerge from Motion | Haoqian Liang et.al. | 2505.21635 | null |
| 2025-05-23 | EvidenceMoE: A Physics-Guided Mixture-of-Experts with Evidential Critics for Advancing Fluorescence Light Detection and Ranging in Scattering Media | Ismail Erbas et.al. | 2505.21532 | null |
| 2025-05-27 | Occlusion Boundary and Depth: Mutual Enhancement via Multi-Task Learning | Lintao Xu et.al. | 2505.21231 | null |
| 2025-05-27 | Robust Video-Based Pothole Detection and Area Estimation for Intelligent Vehicles with Depth Map and Kalman Smoothing | Dehao Wang et.al. | 2505.21049 | null |
| 2025-05-27 | Spatial RoboGrasp: Generalized Robotic Grasping Control Policy | Yiqi Huang et.al. | 2505.20814 | null |
| 2025-05-26 | SpikeStereoNet: A Brain-Inspired Framework for Stereo Depth Estimation from Spike Streams | Zhuoheng Gao et.al. | 2505.19487 | null |
| 2025-05-25 | From Single Images to Motion Policies via Video-Generation Environment Representations | Weiming Zhi et.al. | 2505.19306 | null |
| 2025-05-23 | Repurposing Marigold for Zero-Shot Metric Depth Estimation via Defocus Blur Cues | Chinmay Talegaonkar et.al. | 2505.17358 | null |
| 2025-05-22 | MEgoHand: Multimodal Egocentric Hand-Object Interaction Motion Generation | Bohan Zhou et.al. | 2505.16602 | null |
| 2025-05-22 | BadDepth: Backdoor Attacks Against Monocular Depth Estimation in the Physical World | Ji Guo et.al. | 2505.16154 | null |
| 2025-05-21 | RadarRGBD A Multi-Sensor Fusion Dataset for Perception with RGB-D and mmWave Radar | Tieshuai Song et.al. | 2505.15860 | null |
| 2025-05-20 | M3Depth: Wavelet-Enhanced Depth Estimation on Mars via Mutual Boosting of Dual-Modal Data | Junjie Li et.al. | 2505.14159 | null |
| 2025-05-20 | Multi-Label Stereo Matching for Transparent Scene Depth Estimation | Zhidan Liu et.al. | 2505.14008 | link |
| 2025-05-20 | Event-Driven Dynamic Scene Depth Completion | Zhiqiang Yan et.al. | 2505.13279 | null |
| 2025-05-19 | DB3D-L: Depth-aware BEV Feature Transformation for Accurate 3D Lane Detection | Yehao Liu et.al. | 2505.13266 | null |
| 2025-05-24 | 3D Visual Illusion Depth Estimation | Chengtang Yao et.al. | 2505.13061 | link |
| 2025-05-19 | IA-MVS: Instance-Focused Adaptive Depth Sampling for Multi-View Stereo | Yinzhe Wang et.al. | 2505.12714 | null |
| 2025-05-18 | Depth Transfer: Learning to See Like a Simulator for Real-World Drone Navigation | Hang Yu et.al. | 2505.12428 | null |
| 2025-05-18 | Always Clear Depth: Robust Monocular Depth Estimation under Adverse Weather | Kui Jiang et.al. | 2505.12199 | link |
| 2025-05-17 | MonoMobility: Zero-Shot 3D Mobility Analysis from Monocular Videos | Hongyi Zhou et.al. | 2505.11868 | null |
| 2025-05-16 | SurgPose: Generalisable Surgical Instrument Pose Estimation using Zero-Shot Learning and Stereo Vision | Utsav Rai et.al. | 2505.11439 | null |
| 2025-05-16 | Attention on the Sphere | Boris Bonev et.al. | 2505.11157 | null |
| 2025-05-15 | Depth Anything with Any Prior | Zehan Wang et.al. | 2505.10565 | null |
| 2025-05-15 | JointDistill: Adaptive Multi-Task Distillation for Joint Depth Estimation and Scene Segmentation | Tiancong Cheng et.al. | 2505.10057 | null |
| 2025-05-14 | Marigold: Affordable Adaptation of Diffusion-Based Image Generators for Image Analysis | Bingxin Ke et.al. | 2505.09358 | link |
| 2025-05-13 | Boosting Zero-shot Stereo Matching using Large-scale Mixed Images Sources in the Real World | Yuran Wang et.al. | 2505.08607 | null |
| 2025-05-12 | Some insights into depth estimators for location and scatter in the multivariate setting | Jorge G. Adrover et.al. | 2505.07383 | null |
| 2025-05-11 | Reinforcement Learning-Based Monocular Vision Approach for Autonomous UAV Landing | Tarik Houichime et.al. | 2505.06963 | null |
| 2025-05-10 | ElectricSight: 3D Hazard Monitoring for Power Lines Using Low-Cost Sensors | Xingchen Li et.al. | 2505.06573 | null |
| 2025-05-09 | Camera-Only Bird’s Eye View Perception: A Neural Approach to LiDAR-Free Environmental Mapping for Autonomous Vehicles | Anupkumar Bochare et.al. | 2505.06113 | null |
| 2025-05-09 | MonoCoP: Chain-of-Prediction for Monocular 3D Object Detection | Zhihao Zhang et.al. | 2505.04594 | null |
| 2025-05-13 | Self-Supervised Learning for Robotic Leaf Manipulation: A Hybrid Geometric-Neural Approach | Srecharan Selvam et.al. | 2505.03702 | null |
| 2025-05-06 | LiftFeat: 3D Geometry-Aware Local Feature Matching | Yepeng Liu et.al. | 2505.03422 | link |
| 2025-05-06 | VGLD: Visually-Guided Linguistic Disambiguation for Monocular Depth Scale Recovery | Bojin Wu et.al. | 2505.02704 | link |
| 2025-05-05 | DELTA: Dense Depth from Events and LiDAR using Transformer’s Attention | Vincent Brebion et.al. | 2505.02593 | null |
| 2025-05-03 | PosePilot: Steering Camera Pose for Generative World Models with Self-supervised Depth | Bu Jin et.al. | 2505.01729 | null |
| 2025-05-02 | LMDepth: Lightweight Mamba-based Monocular Depth Estimation for Real-World Deployment | Jiahuan Long et.al. | 2505.00980 | null |
| 2025-05-01 | JointDiT: Enhancing RGB-Depth Joint Modeling with Diffusion Transformers | Kwon Byung-Ki et.al. | 2505.00482 | null |
| 2025-04-30 | HoloTime: Taming Video Diffusion Models for Panoramic 4D Scene Generation | Haiyang Zhou et.al. | 2504.21650 | null |
| 2025-04-30 | eNCApsulate: NCA for Precision Diagnosis on Capsule Endoscopes | Henry John Krumb et.al. | 2504.21562 | null |
| 2025-04-29 | Real-Time Wayfinding Assistant for Blind and Low-Vision Users | Dabbrata Das et.al. | 2504.20976 | null |
| 2025-04-29 | Large-scale visual SLAM for in-the-wild videos | Shuo Sun et.al. | 2504.20496 | null |
| 2025-04-28 | Joint Optimization of Neural Radiance Fields and Continuous Camera Motion from a Monocular Video | Hoang Chuong Nguyen et.al. | 2504.19819 | null |
| 2025-04-27 | Leveraging Multi-Modal Saliency and Fusion for Gaze Target Detection | Athul M. Mathew et.al. | 2504.19271 | null |
| 2025-04-26 | Depth as Points: Center Point-based Depth Estimation | Zhiheng Tu et.al. | 2504.18773 | null |
| 2025-04-25 | LaRI: Layered Ray Intersections for Single-view 3D Geometric Reasoning | Rui Li et.al. | 2504.18424 | null |
| 2025-04-25 | Dense Geometry Supervision for Underwater Depth Estimation | Wenxiang Gua et.al. | 2504.18233 | null |
| 2025-04-25 | LiDAR-Guided Monocular 3D Object Detection for Long-Range Railway Monitoring | Raul David Dominguez Sanchez et.al. | 2504.18203 | null |
| 2025-04-24 | The Fourth Monocular Depth Estimation Challenge | Anton Obukhov et.al. | 2504.17787 | null |
| 2025-04-24 | Occlusion-Aware Self-Supervised Monocular Depth Estimation for Weak-Texture Endoscopic Images | Zebo Huang et.al. | 2504.17582 | null |
| 2025-04-24 | Invasion depth estimation of gastric cancer in early stage using circularly polarized light scattering: Phantom studies | Mike R. Maskey et.al. | 2504.17161 | null |
| 2025-04-23 | PPS-Ctrl: Controllable Sim-to-Real Translation for Colonoscopy Depth Estimation | Xinqi Xiong et.al. | 2504.17067 | null |
| 2025-04-23 | Helping Blind People Grasp: Enhancing a Tactile Bracelet with an Automated Hand Navigation System | Marcin Furtak et.al. | 2504.16502 | null |
| 2025-04-21 | MonoTher-Depth: Enhancing Thermal Depth Estimation via Confidence-Aware Distillation | Xingxing Zuo et.al. | 2504.16127 | null |
| 2025-04-22 | DERD-Net: Learning Depth from Event-based Ray Densities | Diego de Oliveira Hitzges et.al. | 2504.15863 | null |
| 2025-04-22 | VistaDepth: Frequency Modulation With Bias Reweighting For Enhanced Long-Range Depth Estimation | Mingxia Zhan et.al. | 2504.15095 | null |
| 2025-04-20 | Seurat: From Moving Points to Depth | Seokju Cho et.al. | 2504.14687 | link |
| 2025-04-18 | Occlusion-Ordered Semantic Instance Segmentation | Soroosh Baselizadeh et.al. | 2504.14054 | null |
| 2025-04-18 | Enhancing Pothole Detection and Characterization: Integrated Segmentation and Depth Estimation in Road Anomaly Systems | Uthman Baroudi et.al. | 2504.13648 | null |
| 2025-04-17 | Perception Encoder: The best visual embeddings are not at the output of the network | Daniel Bolya et.al. | 2504.13181 | null |
| 2025-04-17 | TSGS: Improving Gaussian Splatting for Transparent Surface Reconstruction via Normal and De-lighting Priors | Mingwei Li et.al. | 2504.12799 | null |
| 2025-04-17 | Privacy-Preserving Operating Room Workflow Analysis using Digital Twins | Alejandra Perez et.al. | 2504.12552 | null |
| 2025-04-16 | Metric-Solver: Sliding Anchored Metric Depth Estimation from a Single Image | Tao Wen et.al. | 2504.12103 | null |
| 2025-04-16 | TacoDepth: Towards Efficient Radar-Camera Depth Estimation with One-stage Fusion | Yiran Wang et.al. | 2504.11773 | null |
| 2025-04-16 | An Online Adaptation Method for Robust Depth Estimation and Visual Odometry in the Open World | Xingwu Ji et.al. | 2504.11698 | link |
| 2025-04-15 | Aligning Generative Denoising with Discriminative Objectives Unleashes Diffusion for Visual Perception | Ziqi Pang et.al. | 2504.11457 | link |
| 2025-04-16 | DeepWheel: Generating a 3D Synthetic Wheel Dataset for Design and Performance Evaluation | Soyoung Yoo et.al. | 2504.11347 | null |
| 2025-04-13 | TextSplat: Text-Guided Semantic Fusion for Generalizable Gaussian Splatting | Zhicong Wu et.al. | 2504.09588 | null |
| 2025-04-12 | Text To 3D Object Generation For Scalable Room Assembly | Sonia Laguna et.al. | 2504.09328 | null |
| 2025-04-11 | Cut-and-Splat: Leveraging Gaussian Splatting for Synthetic Data Generation | Bram Vanherle et.al. | 2504.08473 | link |
| 2025-04-10 | Geo4D: Leveraging Video Generators for Geometric 4D Scene Reconstruction | Zeren Jiang et.al. | 2504.07961 | null |
| 2025-04-09 | FlashDepth: Real-time Streaming Video Depth Estimation at 2K Resolution | Gene Chou et.al. | 2504.07093 | null |
| 2025-04-08 | POMATO: Marrying Pointmap Matching with Temporal Motion for Dynamic 3D Reconstruction | Songyan Zhang et.al. | 2504.05692 | link |
| 2025-04-07 | Stereo-LiDAR Fusion by Semi-Global Matching With Discrete Disparity-Matching Cost and Semidensification | Yasuhiro Yao et.al. | 2504.05148 | link |
| 2025-04-04 | 3D Scene Understanding Through Local Random Access Sequence Modeling | Wanhee Lee et.al. | 2504.03875 | null |
| 2025-04-04 | RingMoE: Mixture-of-Modality-Experts Multi-Modal Foundation Models for Universal Remote Sensing Image Interpretation | Hanbo Bi et.al. | 2504.03166 | null |
| 2025-04-02 | FreSca: Unveiling the Scaling Space in Diffusion Models | Chao Huang et.al. | 2504.02154 | null |
| 2025-04-03 | Toward Real-world BEV Perception: Depth Uncertainty Estimation via Gaussian Splatting | Shu-Wei Lu et.al. | 2504.01957 | null |
| 2025-04-02 | A novel gesture interaction control method for rehabilitation lower extremity exoskeleton | Shuang Qiu et.al. | 2504.01888 | null |
| 2025-04-02 | DEPTHOR: Depth Enhancement from a Practical Light-Weight dToF Sensor and RGB Image | Jijun Xiang et.al. | 2504.01596 | null |
| 2025-04-01 | GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors | Tian-Xing Xu et.al. | 2504.01016 | null |
| 2025-04-01 | Monocular and Generalizable Gaussian Talking Head Animation | Shengjie Gong et.al. | 2504.00665 | null |
| 2025-03-31 | ExScene: Free-View 3D Scene Reconstruction with Gaussian Splatting from a Single Image | Tianyi Gong et.al. | 2503.23881 | null |
| 2025-03-31 | Detail-aware multi-view stereo network for depth estimation | Haitao Tian et.al. | 2503.23684 | null |
| 2025-03-30 | Blurry-Edges: Photon-Limited Depth Estimation from Defocused Boundaries | Wei Xu et.al. | 2503.23606 | null |
| 2025-03-30 | Boosting Omnidirectional Stereo Matching with a Pre-trained Depth Foundation Model | Jannik Endres et.al. | 2503.23502 | link |
| 2025-03-28 | SemAlign3D: Semantic Correspondence between RGB-Images through Aligning 3D Object-Class Representations | Krispin Wandel et.al. | 2503.22462 | null |
| 2025-03-28 | EndoLRMGS: Complete Endoscopic Scene Reconstruction combining Large Reconstruction Modelling and Gaussian Splatting | Xu Wang et.al. | 2503.22437 | link |
| 2025-03-28 | MVSAnywhere: Zero-Shot Multi-View Stereo | Sergio Izquierdo et.al. | 2503.22430 | null |
| 2025-03-28 | One Look is Enough: A Novel Seamless Patchwise Refinement for Zero-Shot Monocular Depth Estimation Models on High-Resolution Images | Byeongjun Kwon et.al. | 2503.22351 | null |
| 2025-03-28 | Intrinsic Image Decomposition for Robust Self-supervised Monocular Depth Estimation on Reflective Surfaces | Wonhyeok Choi et.al. | 2503.22209 | null |
| 2025-03-28 | Deep Depth Estimation from Thermal Image: Dataset, Benchmark, and Challenges | Ukcheol Shin et.al. | 2503.22060 | link |
| 2025-03-27 | A Unified Image-Dense Annotation Generation Model for Underwater Scenes | Hongkai Lin et.al. | 2503.21771 | link |
| 2025-03-27 | ICG-MVSNet: Learning Intra-view and Cross-view Relationships for Guidance in Multi-View Stereo | Yuxi Hu et.al. | 2503.21525 | null |
| 2025-03-26 | Synthetic-to-Real Self-supervised Robust Depth Estimation via Learning with Motion and Structure Priors | Weilong Yan et.al. | 2503.20211 | link |
| 2025-03-26 | FUSE: Label-Free Image-Event Joint Monocular Depth Estimation via Frequency-Decoupled Alignment and Degradation-Robust Fusion | Pihai Sun et.al. | 2503.19739 | link |
| 2025-03-25 | Semi-SD: Semi-Supervised Metric Depth Estimation via Surrounding Cameras for Autonomous Driving | Yusen Xie et.al. | 2503.19713 | link |
| 2025-03-25 | StableGS: A Floater-Free Framework for 3D Gaussian Splatting | Luchao Wang et.al. | 2503.18458 | null |
| 2025-03-24 | PDDM: Pseudo Depth Diffusion Model for RGB-PD Semantic Segmentation Based in Complex Indoor Scenes | Xinhua Xu et.al. | 2503.18393 | null |
| 2025-03-23 | Co-SemDepth: Fast Joint Semantic Segmentation and Depth Estimation on Aerial Images | Yara AlaaEldin et.al. | 2503.17982 | link |
| 2025-03-21 | Radar-Guided Polynomial Fitting for Metric Depth Estimation | Patrick Rim et.al. | 2503.17182 | null |
| 2025-03-21 | AnimatePainter: A Self-Supervised Rendering Framework for Reconstructing Painting Process | Junjie Hu et.al. | 2503.17029 | null |
| 2025-03-21 | Distilling Monocular Foundation Model for Fine-grained Depth Completion | Yingping Liang et.al. | 2503.16970 | null |
| 2025-03-20 | QuartDepth: Post-Training Quantization for Real-Time Depth Estimation on the Edge | Xuan Shen et.al. | 2503.16709 | null |
| 2025-03-20 | A Recipe for Generating 3D Worlds From a Single Image | Katja Schwarz et.al. | 2503.16611 | null |
| 2025-03-20 | Learning to Efficiently Adapt Foundation Models for Self-Supervised Endoscopic 3D Scene Reconstruction from Any Cameras | Beilei Cui et.al. | 2503.15917 | null |
| 2025-03-20 | Jasmine: Harnessing Diffusion Prior for Self-supervised Depth Estimation | Jiyuan Wang et.al. | 2503.15905 | null |
| 2025-03-19 | TULIP: Towards Unified Language-Image Pretraining | Zineng Tang et.al. | 2503.15485 | null |
| 2025-03-19 | EgoDTM: Towards 3D-Aware Egocentric Video-Language Pretraining | Boshen Xu et.al. | 2503.15470 | null |
| 2025-03-19 | USAM-Net: A U-Net-based Network for Improved Stereo Correspondence and Scene Depth Estimation using Features from a Pre-trained Image Segmentation network | Joseph Emmanuel DL Dayo et.al. | 2503.14950 | null |
| 2025-03-18 | Multi-view Reconstruction via SfM-guided Monocular Depth Estimation | Haoyu Guo et.al. | 2503.14483 | null |
| 2025-03-18 | DUNE: Distilling a Universal Encoder from Heterogeneous 2D and 3D Teachers | Mert Bulent Sariyildiz et.al. | 2503.14405 | null |
| 2025-03-18 | 3D Densification for Multi-Map Monocular VSLAM in Endoscopy | X. Anadón et.al. | 2503.14346 | null |
| 2025-03-17 | MonoCT: Overcoming Monocular 3D Detection Domain Shift with Consistent Teacher Models | Johannes Meier et.al. | 2503.13743 | null |
| 2025-03-17 | Improving Geometric Consistency for 360-Degree Neural Radiance Fields in Indoor Scenarios | Iryna Repinetska et.al. | 2503.13710 | null |
| 2025-03-19 | FlexWorld: Progressively Expanding 3D Scenes for Flexiable-View Synthesis | Luxi Chen et.al. | 2503.13265 | null |
| 2025-03-17 | MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs | Erik Daxberger et.al. | 2503.13111 | null |
| 2025-03-17 | TransDiff: Diffusion-Based Method for Manipulating Transparent Objects Using a Single RGB-D Image | Haoxiao Wang et.al. | 2503.12779 | null |
| 2025-03-16 | UniVG: A Generalist Diffusion Model for Unified Image Generation and Editing | Tsu-Jui Fu et.al. | 2503.12652 | null |
| 2025-03-16 | Deblur Gaussian Splatting SLAM | Francesco Girlanda et.al. | 2503.12572 | null |
| 2025-03-14 | VGGT: Visual Geometry Grounded Transformer | Jianyuan Wang et.al. | 2503.11651 | null |
| 2025-03-14 | Seeing and Seeing Through the Glass: Real and Synthetic Data for Multi-Layer Depth Estimation | Hongyu Wen et.al. | 2503.11633 | null |
| 2025-03-14 | Simulating Dual-Pixel Images From Ray Tracing For Depth Estimation | Fengchen He et.al. | 2503.11213 | null |
| 2025-03-13 | Flow-NeRF: Joint Learning of Geometry, Poses, and Dense Flow within Unified Neural Representations | Xunzhi Zheng et.al. | 2503.10464 | null |
| 2025-03-15 | WonderVerse: Extendable 3D Scene Generation with Video Generative Models | Hao Feng et.al. | 2503.09160 | null |
| 2025-03-11 | Language-Depth Navigated Thermal and Visible Image Fusion | Jinchang Zhang et.al. | 2503.08676 | null |
| 2025-03-11 | CL-MVSNet: Unsupervised Multi-view Stereo with Dual-level Contrastive Learning | Kaiqiang Xiong et.al. | 2503.08219 | null |
| 2025-03-10 | SIRE: SE(3) Intrinsic Rigidity Embeddings | Cameron Smith et.al. | 2503.07739 | null |
| 2025-03-10 | LBM: Latent Bridge Matching for Fast Image-to-Image Translation | Clément Chadebec et.al. | 2503.07535 | link |
| 2025-03-12 | Endo-FASt3r: Endoscopic Foundation model Adaptation for Structure from motion | Mona Sheikh Zeinoddin et.al. | 2503.07204 | null |
| 2025-03-11 | LightMotion: A Light and Tuning-free Method for Simulating Camera Motion in Video Generation | Quanjian Song et.al. | 2503.06508 | null |
| 2025-03-08 | Towards Ambiguity-Free Spatial Foundation Model: Rethinking and Decoupling Depth Ambiguity | Xiaohao Xu et.al. | 2503.06014 | link |
| 2025-03-07 | TomatoScanner: phenotyping tomato fruit based on only RGB image | Xiaobei Zhao et.al. | 2503.05568 | null |
| 2025-03-07 | Persistent Object Gaussian Splat (POGS) for Tracking Human and Robot Manipulation of Irregularly Shaped Objects | Justin Yu et.al. | 2503.05189 | null |
| 2025-03-05 | RTFusion: A depth estimation network based on multimodal fusion in challenging scenarios | Zelin Meng et.al. | 2503.04821 | null |
| 2025-03-06 | A Novel Solution for Drone Photogrammetry with Low-overlap Aerial Images using Monocular Depth Estimation | Jiageng Zhong et.al. | 2503.04513 | null |
| 2025-03-08 | EvidMTL: Evidential Multi-Task Learning for Uncertainty-Aware Semantic Surface Mapping from Monocular RGB Images | Rohit Menon et.al. | 2503.04441 | null |
| 2025-03-06 | H3O: Hyper-Efficient 3D Occupancy Prediction with Heterogeneous Supervision | Yunxiao Shi et.al. | 2503.04059 | null |
| 2025-03-05 | Task-Agnostic Attacks Against Vision Foundation Models | Brian Pulfer et.al. | 2503.03842 | null |
| 2025-03-05 | Multi-View Depth Consistent Image Generation Using Generative AI Models: Application on Architectural Design of University Buildings | Xusheng Du et.al. | 2503.03068 | null |
| 2025-03-04 | RGBSQGrasp: Inferring Local Superquadric Primitives from Single RGB Image for Graspability-Aware Bin Picking | Yifeng Xu et.al. | 2503.02387 | null |
| 2025-03-03 | MUSt3R: Multi-view Network for Stereo 3D Reconstruction | Yohann Cabon et.al. | 2503.01661 | null |
| 2025-03-02 | Bridging Spectral-wise and Multi-spectral Depth Estimation via Geometry-guided Contrastive Learning | Ukcheol Shin et.al. | 2503.00793 | null |
| 2025-02-28 | EndoPBR: Material and Lighting Estimation for Photorealistic Surgical Simulations via Physically-based Rendering | John J. Han et.al. | 2502.20669 | null |
| 2025-02-27 | UniDepthV2: Universal Monocular Metric Depth Estimation Made Simpler | Luigi Piccinelli et.al. | 2502.20110 | link |
| 2025-02-26 | Stellar Models Also Limit Exoplanet Atmosphere Studies in Emission | Thomas J. Fauchez et.al. | 2502.19585 | null |
| 2025-02-26 | Distill Any Depth: Distillation Creates a Stronger Monocular Depth Estimator | Xiankang He et.al. | 2502.19204 | link |
| 2025-02-26 | SLAM in the Dark: Self-Supervised Learning of Pose, Depth and Loop-Closure from Thermal Images | Yangfan Xu et.al. | 2502.18932 | null |
| 2025-02-21 | RGB-Only Gaussian Splatting SLAM for Unbounded Outdoor Scenes | Sicheng Yu et.al. | 2502.15633 | null |
| 2025-02-20 | CDGS: Confidence-Aware Depth Regularization for 3D Gaussian Splatting | Qilin Zhang et.al. | 2502.14684 | link |
| 2025-03-03 | Monocular Depth Estimation and Segmentation for Transparent Object with Iterative Semantic and Geometric Fusion | Jiangyuan Liu et.al. | 2502.14616 | link |
| 2025-02-20 | Self-supervised Monocular Depth Estimation Robust to Reflective Surface Leveraged by Triplet Mining | Wonhyeok Choi et.al. | 2502.14573 | null |
| 2025-02-20 | OrchardDepth: Precise Metric Depth Estimation of Orchard Scene from Monocular Camera Images | Zhichao Zheng et.al. | 2502.14279 | null |
| 2025-02-18 | Pre-training Auto-regressive Robotic Models with 4D Representations | Dantong Niu et.al. | 2502.13142 | null |
| 2025-02-18 | SHADeS: Self-supervised Monocular Depth Estimation Through Non-Lambertian Image Decomposition | Rema Daher et.al. | 2502.12994 | null |
| 2025-02-17 | Deep Neural Networks for Accurate Depth Estimation with Latent Space Features | Siddiqui Muhammad Yasir et.al. | 2502.11777 | null |
| 2025-02-16 | Adjust Your Focus: Defocus Deblurring From Dual-Pixel Images Using Explicit Multi-Scale Cross-Correlation | Kunal Swami et.al. | 2502.11002 | null |
| 2025-02-14 | RealCam-I2V: Real-World Image-to-Video Generation with Interactive Complex Camera Control | Teng Li et.al. | 2502.10059 | null |
| 2025-02-13 | SteROI-D: System Design and Mapping for Stereo Depth Inference on Regions of Interest | Jack Erhardt et.al. | 2502.09528 | null |
| 2025-02-17 | S$^2$-Diffusion: Generalizing from Instance-level to Category-level Skills in Robot Manipulation | Quantao Yang et.al. | 2502.09389 | null |
| 2025-02-13 | CoL3D: Collaborative Learning of Single-view Depth and Camera Intrinsics for Metric 3D Shape Recovery | Chenghao Zhang et.al. | 2502.08902 | null |
| 2025-02-13 | Visual-based spatial audio generation system for multi-speaker environments | Xiaojing Liu et.al. | 2502.07538 | null |
| 2025-02-11 | Learning Inverse Laplacian Pyramid for Progressive Depth Completion | Kun Wang et.al. | 2502.07289 | null |
| 2025-02-10 | From Image to Video: An Empirical Study of Diffusion Representations | Pedro Vélez et.al. | 2502.07001 | null |
| 2025-02-09 | Revisiting Gradient-based Uncertainty for Monocular Depth Estimation | Julia Hornauer et.al. | 2502.05964 | null |
| 2025-02-09 | SphereFusion: Efficient Panorama Depth Estimation via Gated Fusion | Qingsong Yan et.al. | 2502.05859 | null |
| 2025-02-05 | MetaFE-DE: Learning Meta Feature Embedding for Depth Estimation from Monocular Endoscopic Images | Dawei Lu et.al. | 2502.03493 | null |
| 2025-02-04 | DOC-Depth: A novel approach for dense depth ground truth generation | Simon de Moreau et.al. | 2502.02144 | null |
| 2025-02-01 | Leveraging Stable Diffusion for Monocular Depth Estimation via Image Semantic Encoding | Jingming Xia et.al. | 2502.01666 | null |
| 2025-02-01 | Exploring Representation-Aligned Latent Space for Better Generation | Wanghan Xu et.al. | 2502.00359 | null |
| 2025-02-01 | MonoDINO-DETR: Depth-Enhanced Monocular 3D Object Detection Using a Vision Foundation Model | Jihyeok Kim et.al. | 2502.00315 | null |
| 2025-01-30 | Zero-Shot Novel View and Depth Synthesis with Multi-View Geometric Diffusion | Vitor Guizilini et.al. | 2501.18804 | null |
| 2025-01-25 | Snapshot Compressed Imaging Based Single-Measurement Computer Vision for Videos | Fengpu Pan et.al. | 2501.15122 | null |
| 2025-01-24 | Rethinking Encoder-Decoder Flow Through Shared Structures | Frederik Laboyrie et.al. | 2501.14535 | null |
| 2025-01-23 | IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models | Jiayi Lei et.al. | 2501.13920 | link |
| 2025-01-23 | PromptMono: Cross Prompting Attention for Self-Supervised Monocular Depth Estimation in Challenging Environments | Changhao Wang et.al. | 2501.13796 | null |
| 2025-01-22 | Enhancing Monocular Depth Estimation with Multi-Source Auxiliary Tasks | Alessio Quercia et.al. | 2501.12824 | null |
| 2025-01-22 | Video Depth Anything: Consistent Depth Estimation for Super-Long Videos | Sili Chen et.al. | 2501.12375 | link |
| 2025-01-21 | Fast Underwater Scene Reconstruction using Multi-View Stereo and Physical Imaging | Shuyi Hu et.al. | 2501.11884 | null |
| 2025-01-21 | Survey on Monocular Metric Depth Estimation | Jiuling Zhang et.al. | 2501.11841 | null |
| 2025-01-19 | RDG-GS: Relative Depth Guidance with Gaussian Splatting for Real-time Sparse-View 3D Rendering | Chenlu Zhan et.al. | 2501.11102 | null |
| 2025-01-15 | BloomScene: Lightweight Structured 3D Gaussian Splatting for Crossmodal Scene Generation | Xiaolu Hou et.al. | 2501.10462 | null |
| 2025-01-20 | Zero-Shot Monocular Scene Flow Estimation in the Wild | Yiqing Liang et.al. | 2501.10357 | null |
| 2025-01-17 | One-D-Piece: Image Tokenizer Meets Quality-Controllable Compression | Keita Miwa et.al. | 2501.10064 | null |
| 2025-01-17 | Multi-Modal Attention Networks for Enhanced Segmentation and Depth Estimation of Subsurface Defects in Pulse Thermography | Mohammed Salah et.al. | 2501.09994 | link |
| 2025-01-21 | FoundationStereo: Zero-Shot Stereo Matching | Bowen Wen et.al. | 2501.09898 | link |
| 2025-01-16 | DEFOM-Stereo: Depth Foundation Model Based Stereo Matching | Hualie Jiang et.al. | 2501.09466 | link |
| 2025-01-15 | StereoGen: High-quality Stereo Image Generation from a Single Image | Xianqi Wang et.al. | 2501.08654 | link |
| 2025-01-15 | MonSter: Marry Monodepth to Stereo Unleashes Power | Junda Cheng et.al. | 2501.08643 | link |
| 2025-01-14 | A Critical Synthesis of Uncertainty Quantification and Foundation Models in Monocular Depth Estimation | Steven Landgraf et.al. | 2501.08188 | null |
| 2025-01-14 | Revisiting Birds Eye View Perception Models with Frozen Foundation Models: DINOv2 and Metric3Dv2 | Seamie Hayes et.al. | 2501.08118 | null |
| 2025-01-13 | Matching Free Depth Recovery from Structured Light | Zhuohang Yu et.al. | 2501.07113 | null |
| 2025-01-09 | Relative Pose Estimation through Affine Corrections of Monocular Depth Priors | Yifan Yu et.al. | 2501.05446 | link |
| 2025-01-09 | $DPF^*$: improved Depth Potential Function for scale-invariant sulcal depth estimation | Maxime Dieudonné et.al. | 2501.05436 | link |
| 2025-01-09 | A Systematic Literature Review on Deep Learning-based Depth Estimation in Computer Vision | Ali Rohan et.al. | 2501.05147 | null |
| 2025-01-07 | AuxDepthNet: Real-Time Monocular 3D Object Detection with Depth-Sensitive Features | Ruochen Zhang et.al. | 2501.03700 | null |
| 2025-01-05 | DepthMaster: Taming Diffusion Models for Monocular Depth Estimation | Ziyang Song et.al. | 2501.02576 | link |
| 2025-01-05 | Depth Any Camera: Zero-Shot Metric Depth Estimation from Any Camera | Yuliang Guo et.al. | 2501.02464 | link |
| 2025-01-03 | SafeAug: Safety-Critical Driving Data Augmentation from Naturalistic Datasets | Zhaobin Mo et.al. | 2501.02143 | null |
| 2025-01-03 | Laparoscopic Scene Analysis for Intraoperative Visualisation of Gamma Probe Signals in Minimally Invasive Cancer Surgery | Baoru Huang et.al. | 2501.01752 | null |
| 2025-01-03 | IGAF: Incremental Guided Attention Fusion for Depth Super-Resolution | Athanasios Tragakis et.al. | 2501.01723 | null |
| 2024-12-31 | Tech Report: Divide and Conquer 3D Real-Time Reconstruction for Improved IGS | Yicheng Zhu et.al. | 2501.01465 | null |
| 2025-01-02 | TexAVi: Generating Stereoscopic VR Video Clips from Text Descriptions | Vriksha Srihari et.al. | 2501.01156 | null |
| 2025-01-02 | PatchRefiner V2: Fast and Lightweight Real-Domain High-Resolution Metric Depth Estimation | Zhenyu Li et.al. | 2501.01121 | null |
| 2024-12-30 | FPGA-based Acceleration of Neural Network for Image Classification using Vitis AI | Zhengdong Li et.al. | 2412.20974 | null |
| 2024-12-29 | MetricDepth: Enhancing Monocular Depth Estimation with Deep Metric Learning | Chunpu Liu et.al. | 2412.20390 | null |
| 2024-12-28 | Multi-Modality Driven LoRA for Adverse Condition Depth Estimation | Guanglei Yang et.al. | 2412.20162 | null |
| 2024-12-28 | DepthMamba with Adaptive Fusion | Zelin Meng et.al. | 2412.19964 | null |
| 2024-12-26 | An End-to-End Depth-Based Pipeline for Selfie Image Rectification | Ahmed Alhawwary et.al. | 2412.19189 | null |
| 2024-12-26 | Revisiting Monocular 3D Object Detection from Scene-Level Depth Retargeting to Instance-Level Spatial Refinement | Qiude Zhang et.al. | 2412.19165 | null |
| 2024-12-26 | MVS-GS: High-Quality 3D Gaussian Splatting Mapping via Online Multi-View Stereo | Byeonggwon Lee et.al. | 2412.19130 | null |
| 2024-12-26 | Learning Monocular Depth from Events via Egomotion Compensation | Haitao Meng et.al. | 2412.19067 | null |
| 2024-12-24 | RSGaussian:3D Gaussian Splatting with LiDAR for Aerial Remote Sensing Novel View Synthesis | Yiling Yao et.al. | 2412.18380 | null |
| 2024-12-27 | LiRCDepth: Lightweight Radar-Camera Depth Estimation via Knowledge Distillation and Uncertainty Guidance | Huawei Sun et.al. | 2412.16380 | link |
| 2024-12-19 | Flowing from Words to Pixels: A Framework for Cross-Modality Evolution | Qihao Liu et.al. | 2412.15213 | link |
| 2024-12-19 | Scaling 4D Representations | João Carreira et.al. | 2412.15212 | link |
| 2024-12-18 | Foundation Models Meet Low-Cost Sensors: Test-Time Adaptation for Rescaling Disparity for Zero-Shot Metric Depth Estimation | Rémi Marsal et.al. | 2412.14103 | null |
| 2024-12-18 | Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation | Haotong Lin et.al. | 2412.14015 | link |
| 2024-12-18 | Marigold-DC: Zero-Shot Monocular Depth Completion with Guided Diffusion | Massimiliano Viola et.al. | 2412.13389 | link |
| 2024-12-18 | Dyn-HaMR: Recovering 4D Interacting Hand Motion from a Dynamic Camera | Zhengdi Yu et.al. | 2412.12861 | null |
| 2024-12-17 | PromptDet: A Lightweight 3D Object Detection Framework with LiDAR Prompts | Kun Guo et.al. | 2412.12460 | null |
| 2024-12-16 | V-MIND: Building Versatile Monocular Indoor 3D Detector with Diverse 2D Annotations | Jin-Cheng Jhang et.al. | 2412.11412 | null |
| 2024-12-16 | Depth-Centric Dehazing and Depth-Estimation from Real-World Hazy Driving Video | Junkai Fan et.al. | 2412.11395 | null |
| 2024-12-15 | ViPOcc: Leveraging Visual Priors from Vision Foundation Models for Single-View 3D Occupancy Prediction | Yi Feng et.al. | 2412.11210 | link |
| 2024-12-14 | MAL: Cluster-Masked and Multi-Task Pretraining for Enhanced xLSTM Vision Performance | Wenjun Huang et.al. | 2412.10730 | null |
| 2024-12-12 | Stereo4D: Learning How Things Move in 3D from Internet Stereo Videos | Linyi Jin et.al. | 2412.09621 | link |
| 2024-12-12 | T-SVG: Text-Driven Stereoscopic Video Generation | Qiao Jin et.al. | 2412.09323 | null |
| 2024-12-12 | Cross-View Completion Models are Zero-shot Correspondence Estimators | Honggyu An et.al. | 2412.09072 | null |
| 2024-12-11 | BLADE: Single-view Body Mesh Learning through Accurate Depth Estimation | Shengze Wang et.al. | 2412.08640 | null |
| 2024-12-13 | Utilizing Multi-step Loss for Single Image Reflection Removal | Abdelrahman Elnenaey et.al. | 2412.08582 | link |
| 2024-12-11 | Dense Depth from Event Focal Stack | Kenta Horikawa et.al. | 2412.08120 | null |
| 2024-12-10 | Diffusion-Based Attention Warping for Consistent 3D Scene Editing | Eyal Gomel et.al. | 2412.07984 | null |
| 2024-12-10 | Balancing Shared and Task-Specific Representations: A Hybrid Approach to Depth-Aware Video Panoptic Segmentation | Kurt H. W. Stolle et.al. | 2412.07966 | link |
| 2024-12-09 | SphereUFormer: A U-Shaped Transformer for Spherical 360 Perception | Yaniv Benny et.al. | 2412.06968 | null |
| 2024-12-09 | Driv3R: Learning Dense 4D Reconstruction for Autonomous Driving | Xin Fei et.al. | 2412.06777 | link |
| 2024-12-09 | MAtCha Gaussians: Atlas of Charts for High-Quality Geometry and Photorealism From Sparse Views | Antoine Guédon et.al. | 2412.06767 | link |
| 2024-12-09 | On-Device Self-Supervised Learning of Low-Latency Monocular Depth from Only Events | Jesse Hagenaars et.al. | 2412.06359 | null |
| 2024-12-09 | Omni-Scene: Omni-Gaussian Representation for Ego-Centric Sparse-View Scene Reconstruction | Dongxu Wei et.al. | 2412.06273 | null |
| 2024-12-09 | Event fields: Capturing light fields at high speed, resolution, and dynamic range | Ziyuan Qu et.al. | 2412.06191 | null |
| 2024-12-08 | GVDepth: Zero-Shot Monocular Depth Estimation for Ground Vehicles based on Probabilistic Cue Fusion | Karlo Koledic et.al. | 2412.06080 | null |
| 2024-12-08 | Prism: Semi-Supervised Multi-View Stereo with Monocular Structure Priors | Alex Rich et.al. | 2412.05771 | null |
| 2024-12-10 | TACO: Learning Multi-modal Action Models with Synthetic Chains-of-Thought-and-Action | Zixian Ma et.al. | 2412.05479 | link |
| 2024-12-06 | SimC3D: A Simple Contrastive 3D Pretraining Framework Using RGB Images | Jiahua Dong et.al. | 2412.05274 | null |
| 2024-12-06 | Penetrative rotating magnetoconvection subject to lateral variations in temperature gradients | Tirtharaj Barman et.al. | 2412.05235 | null |
| 2024-12-06 | PanoDreamer: 3D Panorama Synthesis from a Single Image | Avinash Paliwal et.al. | 2412.04827 | link |
| 2024-12-05 | LAA-Net: A Physical-prior-knowledge Based Network for Robust Nighttime Depth Estimation | Kebin Peng et.al. | 2412.04666 | null |
| 2024-12-05 | MegaSaM: Accurate, Fast, and Robust Structure and Motion from Casual Dynamic Videos | Zhengqi Li et.al. | 2412.04463 | link |
| 2024-12-05 | MT3DNet: Multi-Task learning Network for 3D Surgical Scene Reconstruction | Mithun Parab et.al. | 2412.03928 | null |
| 2024-12-04 | Perception Tokens Enhance Visual Reasoning in Multimodal Language Models | Mahtab Bigverdi et.al. | 2412.03548 | null |
| 2024-12-04 | Dense Scene Reconstruction from Light-Field Images Affected by Rolling Shutter | Hermes McGriff et.al. | 2412.03518 | null |
| 2024-12-04 | MultiGO: Towards Multi-level Geometry Learning for Monocular 3D Textured Human Reconstruction | Gangjian Zhang et.al. | 2412.03103 | null |
| 2024-12-05 | Align3R: Aligned Monocular Depth Estimation for Dynamic Videos | Jiahao Lu et.al. | 2412.03079 | null |
| 2024-12-03 | Single-Shot Metric Depth from Focused Plenoptic Cameras | Blanca Lasheras-Hernandez et.al. | 2412.02386 | null |
| 2024-12-03 | Dual Exposure Stereo for Extended Dynamic Range 3D Imaging | Juhyung Choi et.al. | 2412.02351 | null |
| 2024-12-03 | Amodal Depth Anything: Amodal Depth Estimation in the Wild | Zhenyu Li et.al. | 2412.02336 | null |
| 2024-12-03 | GSGTrack: Gaussian Splatting-Guided Object Pose Tracking from RGB Videos | Zhiyuan Chen et.al. | 2412.02267 | null |
| 2024-12-03 | FoveaSPAD: Exploiting Depth Priors for Adaptive and Efficient Single-Photon 3D Imaging | Justin Folden et.al. | 2412.02052 | null |
| 2024-12-02 | Mutli-View 3D Reconstruction using Knowledge Distillation | Aditya Dutt et.al. | 2412.02039 | link |
| 2024-12-02 | AVS-Net: Audio-Visual Scale Net for Self-supervised Monocular Metric Depth Estimation | Xiaohu Liu et.al. | 2412.01637 | null |
| 2024-12-02 | STATIC : Surface Temporal Affine for TIme Consistency in Video Monocular Depth Estimation | Sunghun Yang et.al. | 2412.01090 | null |
| 2024-12-01 | FiffDepth: Feed-forward Transformation of Diffusion-Based Generators for Detailed Depth Estimation | Yunpeng Bai et.al. | 2412.00671 | link |
| 2024-11-29 | SpaRC: Sparse Radar-Camera Fusion for 3D Object Detection | Philipp Wolters et.al. | 2411.19860 | null |
| 2024-11-29 | MonoPP: Metric-Scaled Self-Supervised Monocular Depth Estimation by Planar-Parallax Geometry in Automotive Applications | Gasser Elazab et.al. | 2411.19717 | null |
| 2024-11-29 | Gaussian Splashing: Direct Volumetric Rendering Underwater | Nir Mualem et.al. | 2411.19588 | null |
| 2024-11-28 | Learning Surrogate Rainfall-driven Inundation Models with Few Data | Marzieh Alireza Mirhoseini et.al. | 2411.19323 | null |
| 2024-11-28 | AGS-Mesh: Adaptive Gaussian Splatting and Meshing with Geometric Priors for Indoor Room Reconstruction Using Smartphones | Xuqian Ren et.al. | 2411.19271 | null |
| 2024-11-28 | Video Depth without Video Models | Bingxin Ke et.al. | 2411.19189 | link |
| 2024-11-28 | 360Recon: An Accurate Reconstruction Method Based on Depth Fusion from 360 Images | Zhongmiao Yan et.al. | 2411.19102 | null |
| 2024-11-27 | Helvipad: A Real-World Dataset for Omnidirectional Stereo Depth Estimation | Mehdi Zayene et.al. | 2411.18335 | link |
| 2024-11-27 | GAPartManip: A Large-scale Part-centric Dataset for Material-Agnostic Articulated Object Manipulation | Wenbo Cui et.al. | 2411.18276 | null |
| 2024-11-27 | SharpDepth: Sharpening Metric Depth Predictions Using Diffusion Distillation | Duc-Hai Pham et.al. | 2411.18229 | null |
| 2024-11-26 | Low-rank Adaptation-based All-Weather Removal for Autonomous Navigation | Sudarshan Rajagopalan et.al. | 2411.17814 | null |
| 2024-11-26 | Spatially Visual Perception for End-to-End Robotic Learning | Travis Davies et.al. | 2411.17458 | null |
| 2024-11-26 | DepthCues: Evaluating Monocular Depth Perception in Large Vision Models | Duolikun Danier et.al. | 2411.17385 | null |
| 2024-11-26 | Boost 3D Reconstruction using Diffusion-based Monocular Camera Calibration | Junyuan Deng et.al. | 2411.17240 | link |
| 2024-11-25 | G2SDF: Surface Reconstruction from Explicit Gaussians with Implicit SDFs | Kunyi Li et.al. | 2411.16898 | null |
| 2024-11-24 | PriorDiffusion: Leverage Language Prior in Diffusion Models for Monocular Depth Estimation | Ziyao Zeng et.al. | 2411.16750 | null |
| 2024-11-25 | Generative Omnimatte: Learning to Decompose Video into Layers | Yao-Chih Lee et.al. | 2411.16683 | null |
| 2024-11-25 | One Diffusion to Generate Them All | Duong H. Le et.al. | 2411.16318 | link |
| 2024-11-24 | Gaussian Scenes: Pose-Free Sparse-View Scene Reconstruction using Depth-Enhanced Diffusion Priors | Soumava Paul et.al. | 2411.15966 | link |
| 2024-11-21 | StereoCrafter-Zero: Zero-Shot Stereo Video Generation with Noisy Restart | Jian Shi et.al. | 2411.14295 | link |
| 2024-11-20 | DATAP-SfM: Dynamic-Aware Tracking Any Point for Robust Structure from Motion in the Wild | Weicai Ye et.al. | 2411.13291 | null |
| 2024-11-20 | OceanLens: An Adaptive Backscatter and Edge Correction using Deep Learning Model for Enhanced Underwater Imaging | Rajini Makam et.al. | 2411.13230 | null |
| 2024-11-15 | SPARS3R: Semantic Prior Alignment and Regularization for Sparse 3D Reconstruction | Yutao Tang et.al. | 2411.12592 | link |
| 2024-11-18 | Towards Degradation-Robust Reconstruction in Generalizable NeRF | Chan Ho Park et.al. | 2411.11691 | null |
| 2024-11-18 | MGNiceNet: Unified Monocular Geometric Scene Understanding | Markus Schön et.al. | 2411.11466 | null |
| 2024-11-18 | The ADUULM-360 Dataset – A Multi-Modal Dataset for Depth Estimation in Adverse Weather | Markus Schön et.al. | 2411.11455 | null |
| 2024-11-18 | GPS-Gaussian+: Generalizable Pixel-wise 3D Gaussian Splatting for Real-Time Human-Scene Rendering from Sparse Views | Boyao Zhou et.al. | 2411.11363 | null |
| 2024-11-18 | Scalable Autoregressive Monocular Depth Estimation | Jinhong Wang et.al. | 2411.11361 | null |
| 2024-11-16 | MetricGold: Leveraging Text-To-Image Latent Diffusion Models for Metric Depth Estimation | Ansh Shah et.al. | 2411.10886 | link |
| 2024-11-19 | EVT: Efficient View Transformation for Multi-Modal 3D Object Detection | Yongjin Lee et.al. | 2411.10715 | null |
| 2024-11-15 | Efficient Depth Estimation for Unstable Stereo Camera Systems on AR Glasses | Yongfan Liu et.al. | 2411.10013 | null |
| 2024-11-14 | Architect: Generating Vivid and Interactive 3D Scenes with Hierarchical 2D Inpainting | Yian Wang et.al. | 2411.09823 | null |
| 2024-11-14 | Adversarial Attacks Using Differentiable Rendering: A Survey | Matthew Hull et.al. | 2411.09749 | null |
| 2024-11-14 | Mono2Stereo: Monocular Knowledge Transfer for Enhanced Stereo Matching | Yuran Wang et.al. | 2411.09151 | null |
| 2024-11-13 | OSMLoc: Single Image-Based Visual Localization in OpenStreetMap with Geometric and Semantic Guidances | Youqi Liao et.al. | 2411.08665 | null |
| 2024-11-13 | Scaling Properties of Diffusion Models for Perceptual Tasks | Rahul Ravishankar et.al. | 2411.08034 | null |
| 2024-11-11 | $SE(3)$ Equivariant Ray Embeddings for Implicit Multi-View Depth Estimation | Yinshuang Xu et.al. | 2411.07326 | null |
| 2024-11-08 | Enhancing Depth Image Estimation for Underwater Robots by Combining Image Processing and Machine Learning | Quang Truong Nguyen et.al. | 2411.05344 | null |
| 2024-11-08 | SimpleBEV: Improved LiDAR-Camera Fusion Architecture for 3D Object Detection | Yun Zhao et.al. | 2411.05292 | null |
| 2024-11-07 | D$^3$epth: Self-Supervised Depth Estimation with Dynamic Mask in Dynamic Scenes | Siyu Chen et.al. | 2411.04826 | null |
| 2024-11-06 | Revisiting Disparity from Dual-Pixel Images: Physics-Informed Lightweight Depth Estimation | Teppei Kurita et.al. | 2411.04714 | null |
| 2024-11-07 | Enhancing Bronchoscopy Depth Estimation through Synthetic-to-Real Domain Adaptation | Qingyao Tian et.al. | 2411.04404 | null |
| 2024-11-04 | PMPNet: Pixel Movement Prediction Network for Monocular Depth Estimation in Dynamic Scenes | Kebin Peng et.al. | 2411.04227 | null |
| 2024-11-06 | Adaptive Stereo Depth Estimation with Multi-Spectral Images Across All Lighting Conditions | Zihan Qin et.al. | 2411.03638 | null |
| 2024-11-05 | Monocular Event-Based Vision for Obstacle Avoidance with a Quadrotor | Anish Bhattacharya et.al. | 2411.03303 | null |
| 2024-11-05 | Correlation of Object Detection Performance with Visual Saliency and Depth Estimation | Matthias Bartolo et.al. | 2411.02844 | link |
| 2024-11-05 | FewViewGS: Gaussian Splatting with Few View Matching and Multi-stage Training | Ruihong Yin et.al. | 2411.02229 | null |
| 2024-11-05 | Improving Domain Generalization in Self-supervised Monocular Depth Estimation via Stabilized Adversarial Training | Yuanqi Yao et.al. | 2411.02149 | null |
| 2024-11-01 | MultiDepth: Multi-Sample Priors for Refining Monocular Metric Depth Estimations in Indoor Scenes | Sanghyun Byun et.al. | 2411.01048 | null |
| 2024-11-01 | On Deep Learning for Geometric and Semantic Scene Understanding Using On-Vehicle 3D LiDAR | Li Li et.al. | 2411.00600 | link |
| 2024-10-31 | Optical Lens Attack on Monocular Depth Estimation for Autonomous Driving | Ce Zhou et.al. | 2411.00192 | null |
| 2024-10-31 | ImOV3D: Learning Open-Vocabulary Point Clouds 3D Object Detection from Only 2D Images | Timing Yang et.al. | 2410.24001 | link |
| 2024-10-30 | Nested ResNet: A Vision-Based Method for Detecting the Sensing Area of a Drop-in Gamma Probe | Songyu Xu et.al. | 2410.23154 | null |
| 2024-10-29 | Active Event Alignment for Monocular Distance Estimation | Nan Cai et.al. | 2410.22280 | null |
| 2024-10-29 | PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting | Sunghwan Hong et.al. | 2410.22128 | link |
| 2024-10-27 | Unlocking Comics: The AI4VA Dataset for Visual Understanding | Peter Grönquist et.al. | 2410.20459 | link |
| 2024-10-27 | Depth Attention for Robust RGB Tracking | Yu Liu et.al. | 2410.20395 | link |
| 2024-10-21 | YOLO11 and Vision Transformers based 3D Pose Estimation of Immature Green Fruits in Commercial Apple Orchards for Robotic Thinning | Ranjan Sapkota et.al. | 2410.19846 | null |
| 2024-10-25 | MonoDGP: Monocular 3D Object Detection with Decoupled-Query and Geometry-Error Priors | Fanqi Pu et.al. | 2410.19590 | link |
| 2024-10-24 | Segmentation-aware Prior Assisted Joint Global Information Aggregated 3D Building Reconstruction | Hongxin Peng et.al. | 2410.18433 | null |
| 2024-10-24 | Thermal Chameleon: Task-Adaptive Tone-mapping for Radiometric Thermal-Infrared images | Dong-Guw Lee et.al. | 2410.18340 | link |
| 2024-10-25 | UnCLe: Unsupervised Continual Learning of Depth Completion | Suchisrit Gangopadhyay et.al. | 2410.18074 | null |
| 2024-10-21 | TIPS: Text-Image Pretraining with Spatial Awareness | Kevis-Kokitsi Maninis et.al. | 2410.16512 | link |
| 2024-10-22 | DCDepth: Progressive Monocular Depth Estimation in Discrete Cosine Domain | Kun Wang et.al. | 2410.14980 | link |
| 2024-10-17 | DepthSplat: Connecting Gaussian Splatting and Depth | Haofei Xu et.al. | 2410.13862 | link |
| 2024-10-16 | DH-VTON: Deep Text-Driven Virtual Try-On via Hybrid Attention Learning | Jiabao Wei et.al. | 2410.12501 | null |
| 2024-10-16 | Depth Estimation From Monocular Images With Enhanced Encoder-Decoder Architecture | Dabbrata Das et.al. | 2410.11610 | null |
| 2024-10-16 | CVCP-Fusion: On Implicit Depth Estimation for 3D Bounding Box Prediction | Pranav Gupta et.al. | 2410.11211 | link |
| 2024-10-14 | When Does Perceptual Alignment Benefit Vision Representations? | Shobhita Sundaram et.al. | 2410.10817 | null |
| 2024-10-14 | Depth Any Video with Scalable Synthetic Data | Honghui Yang et.al. | 2410.10815 | link |
| 2024-10-15 | Improved Depth Estimation of Bayesian Neural Networks | Bart van Erp et.al. | 2410.10395 | link |
| 2024-10-10 | Color-Guided Flying Pixel Correction in Depth Images | Ekamresh Vasudevan et.al. | 2410.08084 | null |
| 2024-10-09 | Surgical Depth Anything: Depth Estimation for Surgical Scenes using Foundation Models | Ange Lou et.al. | 2410.07434 | null |
| 2024-10-09 | Structure-Centric Robust Monocular Depth Estimation via Knowledge Distillation | Runze Chen et.al. | 2410.06982 | null |
| 2024-10-09 | Analysis of different disparity estimation techniques on aerial stereo image datasets | Ishan Narayan et.al. | 2410.06711 | null |
| 2024-10-08 | Vision Transformer based Random Walk for Group Re-Identification | Guoqing Zhang et.al. | 2410.05808 | null |
| 2024-10-08 | CUBE360: Learning Cubic Field Representation for Monocular 360 Depth Estimation for Virtual Reality | Wenjie Chang et.al. | 2410.05735 | null |
| 2024-10-07 | PhotoReg: Photometrically Registering 3D Gaussian Splatting Models | Ziwen Yuan et.al. | 2410.05044 | null |
| 2024-10-10 | Hybrid NeRF-Stereo Vision: Pioneering Depth Estimation and 3D Reconstruction in Endoscopy | Pengcheng Chen et.al. | 2410.04041 | null |
| 2024-10-04 | Refinement of Monocular Depth Maps via Multi-View Differentiable Rendering | Laura Fink et.al. | 2410.03861 | null |
| 2024-10-03 | RSA: Resolving Scale Ambiguities in Monocular Depth Estimators through Language Descriptions | Ziyao Zeng et.al. | 2410.02924 | null |
| 2024-10-02 | Depth Pro: Sharp Monocular Metric Depth in Less Than a Second | Aleksei Bochkovskii et.al. | 2410.02073 | link |
| 2024-10-01 | Towards Full-parameter and Parameter-efficient Self-learning For Endoscopic Camera Depth Estimation | Shuting Zhao et.al. | 2410.00979 | null |
| 2024-10-01 | Radar Meets Vision: Robustifying Monocular Metric Depth Prediction for Mobile Robotics | Marco Job et.al. | 2410.00736 | link |
| 2024-10-06 | Drone Stereo Vision for Radiata Pine Branch Detection and Distance Measurement: Utilizing Deep Learning and YOLO Integration | Yida Lin et.al. | 2410.00503 | null |
| 2024-10-01 | Seamless Augmented Reality Integration in Arthroscopy: A Pipeline for Articular Reconstruction and Guidance | Hongchao Shu et.al. | 2410.00386 | null |
| 2024-09-30 | CCDepth: A Lightweight Self-supervised Depth Estimation Network with Enhanced Interpretability | Xi Zhang et.al. | 2409.19933 | null |
| 2024-09-30 | EndoDepth: A Benchmark for Assessing Robustness in Endoscopic Depth Prediction | Ivan Reyes-Amezcua et.al. | 2409.19930 | link |
| 2024-09-29 | fCOP: Focal Length Estimation from Category-level Object Priors | Xinyue Zhang et.al. | 2409.19641 | null |
| 2024-09-29 | KineDepth: Utilizing Robot Kinematics for Online Metric Depth Estimation | Soofiyan Atar et.al. | 2409.19490 | null |
| 2024-09-27 | Speckle-illumination spatial frequency domain imaging with a stereo laparoscope for profile-corrected optical property mapping | Anthony A. Song et.al. | 2409.19153 | null |
| 2024-09-26 | Self-supervised Monocular Depth Estimation with Large Kernel Attention | Xuezhi Xiang et.al. | 2409.17895 | null |
| 2024-09-26 | Self-Distilled Depth Refinement with Noisy Poisson Fusion | Jiaqi Li et.al. | 2409.17880 | null |
| 2024-09-27 | A New Dataset for Monocular Depth Estimation Under Viewpoint Shifts | Aurel Pjetri et.al. | 2409.17851 | null |
| 2024-09-26 | Event-based Stereo Depth Estimation: A Survey | Suman Ghosh et.al. | 2409.17680 | null |
| 2024-09-26 | CAMOT: Camera Angle-aware Multi-Object Tracking | Felix Limanta et.al. | 2409.17533 | null |
| 2024-09-25 | Optical Lens Attack on Deep Learning Based Monocular Depth Estimation | Ce Zhou et.al. | 2409.17376 | null |
| 2024-09-25 | Parameter-efficient Bayesian Neural Networks for Uncertainty-aware Depth Estimation | Richard D. Paul et.al. | 2409.17085 | null |
| 2024-09-25 | EventHDR: from Event to High-Speed HDR Videos and Beyond | Yunhao Zou et.al. | 2409.17029 | null |
| 2024-09-25 | 3DDX: Bone Surface Reconstruction from a Single Standard-Geometry Radiograph via Dual-Face Depth Estimation | Yi Gu et.al. | 2409.16702 | null |
| 2024-09-24 | MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling | Yifang Men et.al. | 2409.16160 | null |
| 2024-09-24 | Benchmarking Robustness of Endoscopic Depth Estimation with Synthetically Corrupted Data | An Wang et.al. | 2409.16063 | link |
| 2024-09-23 | FisheyeDepth: A Real Scale Self-Supervised Depth Estimation Model for Fisheye Camera | Guoyang Zhao et.al. | 2409.15054 | link |
| 2024-09-23 | DepthART: Monocular Depth Estimation as Autoregressive Refinement Task | Bulat Gabdullin et.al. | 2409.15010 | null |
| 2024-09-23 | Generalizing monocular colonoscopy image depth estimation by uncertainty-based global and local fusion network | Sijia Du et.al. | 2409.15006 | null |
| 2024-09-23 | GroCo: Ground Constraint for Metric Self-Supervised Monocular Depth | Aurélien Cecille et.al. | 2409.14850 | null |
| 2024-09-23 | Robust and Flexible Omnidirectional Depth Estimation with Multiple 360° Cameras | Ming Li et.al. | 2409.14766 | null |
| 2024-09-25 | D3RoMa: Disparity Diffusion-based Depth Sensing for Material-Agnostic Robotic Manipulation | Songlin Wei et.al. | 2409.14365 | null |
| 2024-09-21 | @Bench: Benchmarking Vision-Language Models for Human-centered Assistive Technology | Xin Jiang et.al. | 2409.14215 | null |
| 2024-09-20 | High-Resolution Flood Probability Mapping Using Generative Machine Learning with Large-Scale Synthetic Precipitation and Inundation Data | Lipai Huang et.al. | 2409.13936 | null |
| 2024-09-18 | Panoptic-Depth Forecasting | Juana Valeria Hurtado et.al. | 2409.12008 | null |
| 2024-09-17 | Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think | Gonzalo Martin Garcia et.al. | 2409.11355 | link |
| 2024-09-15 | GRIN: Zero-Shot Metric Depth with Pixel-Level Diffusion | Vitor Guizilini et.al. | 2409.09896 | null |
| 2024-09-15 | Towards Single-Lens Controllable Depth-of-Field Imaging via All-in-Focus Aberration Correction and Monocular Depth Estimation | Xiaolong Qian et.al. | 2409.09754 | link |
| 2024-09-13 | PrimeDepth: Efficient Monocular Depth Estimation with a Stable Diffusion Preimage | Denis Zavadski et.al. | 2409.09144 | link |
| 2024-09-23 | Precision Aquaculture: An Integrated Computer Vision and IoT Approach for Optimized Tilapia Feeding | Rania Hossam et.al. | 2409.08695 | link |
| 2024-09-12 | Depth on Demand: Streaming Dense Depth from a Low Frame Rate Active Sensor | Andrea Conti et.al. | 2409.08277 | null |
| 2024-09-12 | LED: Light Enhanced Depth Estimation at Night | Simon de Moreau et.al. | 2409.08031 | link |
| 2024-09-12 | Real-time Multi-view Omnidirectional Depth Estimation System for Robots and Autonomous Driving on Real Scenes | Ming Li et.al. | 2409.07843 | null |
| 2024-09-12 | Advancing Depth Anything Model for Unsupervised Monocular Depth Estimation in Endoscopy | Bojian Li et.al. | 2409.07723 | null |
| 2024-09-12 | FIReStereo: Forest InfraRed Stereo Dataset for UAS Depth Perception in Visually Degraded Environments | Devansh Dhrafani et.al. | 2409.07715 | null |
| 2024-09-10 | Deep Neural Networks: Multi-Classification and Universal Approximation | Martín Hernández et.al. | 2409.06555 | null |
| 2024-09-10 | EDADepth: Enhanced Data Augmentation for Monocular Depth Estimation | Nischal Khanal et.al. | 2409.06183 | link |
| 2024-09-11 | EndoOmni: Zero-Shot Cross-Dataset Depth Estimation in Endoscopy by Robust Self-Learning from Noisy Labels | Qingyao Tian et.al. | 2409.05442 | null |
| 2024-09-09 | Spontaneous magnetic field and disorder effects in BaPtAs$_{1-x}$Sb$_x$ with honeycomb network | T. Adachi et.al. | 2409.05266 | null |
| 2024-09-08 | TanDepth: Leveraging Global DEMs for Metric Monocular Depth Estimation in UAVs | Horatiu Florea et.al. | 2409.05142 | null |
| 2024-09-12 | Introducing a Class-Aware Metric for Monocular Depth Estimation: An Automotive Perspective | Tim Bader et.al. | 2409.04086 | link |
| 2024-09-08 | Estimating Indoor Scene Depth Maps from Ultrasonic Echoes | Junpei Honma et.al. | 2409.03336 | null |
| 2024-09-04 | iConFormer: Dynamic Parameter-Efficient Tuning with Input-Conditioned Adaptation | Hayeon Jo et.al. | 2409.02838 | null |
| 2024-09-02 | GET-UP: GEomeTric-aware Depth Estimation with Radar Points UPsampling | Huawei Sun et.al. | 2409.02720 | null |
| 2024-09-04 | Skip-and-Play: Depth-Driven Pose-Preserved Image Generation for Any Objects | Kyungmin Jo et.al. | 2409.02653 | null |
| 2024-09-04 | UniTT-Stereo: Unified Training of Transformer for Enhanced Stereo Matching | Soomin Kim et.al. | 2409.02545 | null |
| 2024-09-04 | SG-MIM: Structured Knowledge Guided Efficient Pre-training for Dense Prediction | Sumin Son et.al. | 2409.02513 | null |
| 2024-09-04 | Plane2Depth: Hierarchical Adaptive Plane Guidance for Monocular Depth Estimation | Li Liu et.al. | 2409.02494 | null |
| 2024-09-04 | Boosting Generalizability towards Zero-Shot Cross-Dataset Single-Image Indoor Depth by Meta-Initialization | Cho-Ying Wu et.al. | 2409.02486 | null |
| 2024-09-04 | GGS: Generalizable Gaussian Splatting for Lane Switching in Autonomous Driving | Huasong Han et.al. | 2409.02382 | null |
| 2024-09-03 | DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos | Wenbo Hu et.al. | 2409.02095 | null |
| 2024-09-02 | Large Language Models Can Understanding Depth from Monocular Images | Zhongyi Xia et.al. | 2409.01133 | null |
| 2024-08-30 | DARES: Depth Anything in Robotic Endoscopic Surgery with Self-supervised Vector-LoRA of the Foundation Model | Mona Sheikh Zeinoddin et.al. | 2408.17433 | null |
| 2024-08-30 | Enhancing Underwater Imaging with 4-D Light Fields: Dataset and Method | Yuji Lin et.al. | 2408.17339 | null |
| 2024-08-30 | Synthetic Lunar Terrain: A Multimodal Open Dataset for Training and Evaluating Neuromorphic Vision Algorithms | Marcus Märtens et.al. | 2408.16971 | null |
| 2024-08-29 | EvLight++: Low-Light Video Enhancement with an Event Camera: A Large-Scale Real-World Dataset, Novel Method, and More | Kanghao Chen et.al. | 2408.16254 | null |
| 2024-08-30 | Revisiting 360 Depth Estimation with PanoGabor: A New Fusion Perspective | Zhijie Shen et.al. | 2408.16227 | link |
| 2024-08-27 | Adversarial Manhole: Challenging Monocular Depth Estimation and Semantic Segmentation Models with Patch Attack | Naufal Suryanto et.al. | 2408.14879 | null |
| 2024-08-26 | NimbleD: Enhancing Self-supervised Monocular Depth Estimation with Pseudo-labels and Large-scale Video Pre-training | Albert Luginov et.al. | 2408.14177 | null |
| 2024-08-26 | Pixel-Aligned Multi-View Generation with Depth Guided Decoder | Zhenggang Tang et.al. | 2408.14016 | null |
| 2024-08-25 | TranSplat: Generalizable 3D Gaussian Splatting from Sparse Multi-View Images with Transformers | Chuanrui Zhang et.al. | 2408.13770 | null |
| 2024-08-25 | InSpaceType: Dataset and Benchmark for Reconsidering Cross-Space Type Performance in Indoor Monocular Depth | Cho-Ying Wu et.al. | 2408.13708 | null |
| 2024-08-25 | SeeBelow: Sub-dermal 3D Reconstruction of Tumors with Surgical Robotic Palpation and Tactile Exploration | Raghava Uppuluri et.al. | 2408.13699 | null |
| 2024-08-27 | Sapiens: Foundation for Human Vision Models | Rawal Khirodkar et.al. | 2408.12569 | null |
| 2024-08-21 | LiFCal: Online Light Field Camera Calibration via Bundle Adjustment | Aymeric Fleith et.al. | 2408.11682 | null |
| 2024-08-19 | Structure-preserving Image Translation for Depth Estimation in Colonoscopy Video | Shuxian Wang et.al. | 2408.10153 | null |
| 2024-08-19 | SHARP: Segmentation of Hands and Arms by Range using Pseudo-Depth for Enhanced Egocentric 3D Hand Pose Estimation and Action Recognition | Wiktor Mucha et.al. | 2408.10037 | link |
| 2024-08-19 | P3P: Pseudo-3D Pre-training for Scaling 3D Masked Autoencoders | Xuechao Chen et.al. | 2408.10007 | null |
| 2024-08-14 | Enhanced Scale-aware Depth Estimation for Monocular Endoscopic Scenes with Geometric Modeling | Ruofeng Wei et.al. | 2408.07266 | null |
| 2024-08-12 | Towards Robust Monocular Depth Estimation in Non-Lambertian Surfaces | Junrui Zhang et.al. | 2408.06083 | null |
| 2024-08-08 | Depth Any Canopy: Leveraging Depth Foundation Models for Canopy Height Estimation | Daniele Rege Cambrin et.al. | 2408.04523 | link |
| 2024-08-08 | Detecting Car Speed using Object Detection and Depth Estimation: A Deep Learning Framework | Subhasis Dasgupta et.al. | 2408.04360 | null |
| 2024-08-08 | Design and Implementation of Smart Infrastructures and Connected Vehicles in A Mini-city Platform | Daniel Vargas et.al. | 2408.04195 | null |
| 2024-08-07 | Focal Depth Estimation: A Calibration-Free, Subject- and Daytime Invariant Approach | Benedikt W. Hosp et.al. | 2408.03591 | null |
| 2024-08-06 | BodySLAM: A Generalized Monocular Visual SLAM Framework for Surgical Applications | G. Manni et.al. | 2408.03078 | link |
| 2024-08-05 | Gaussian Mixture based Evidential Learning for Stereo Matching | Weide Liu et.al. | 2408.02796 | null |
| 2024-08-05 | Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining | Dongyang Liu et.al. | 2408.02657 | link |
| 2024-08-03 | MCPDepth: Omnidirectional Depth Estimation via Stereo Matching from Multi-Cylindrical Panoramas | Feng Qiao et.al. | 2408.01653 | null |
| 2024-08-02 | Self-Supervised Depth Estimation Based on Camera Models | Jinchang Zhang et.al. | 2408.01565 | null |
| 2024-08-01 | MonoMM: A Multi-scale Mamba-Enhanced Network for Real-time Monocular 3D Object Detection | Youjia Fu et.al. | 2408.00438 | null |
| 2024-08-01 | High-Precision Self-Supervised Monocular Depth Estimation with Rich-Resource Prior | Wencheng Han et.al. | 2408.00361 | null |
| 2024-07-31 | Unifying Event-based Flow, Stereo and Depth Estimation via Feature Similarity Matching | Pengjie Zhang et.al. | 2407.21735 | null |
| 2024-07-29 | BaseBoostDepth: Exploiting Larger Baselines For Self-supervised Monocular Depth Estimation | Kieran Saunders et.al. | 2407.20437 | null |
| 2024-07-29 | Analysis and Improvement of Rank-Ordered Mean Algorithm in Single-Photon LiDAR | William C. Yau et.al. | 2407.20399 | null |
| 2024-07-29 | Improving 2D Feature Representations by 3D-Aware Fine-Tuning | Yuanwen Yue et.al. | 2407.20229 | null |
| 2024-07-27 | Revisit Self-supervised Depth Estimation with Local Structure-from-Motion | Shengjie Zhu et.al. | 2407.19166 | null |
| 2024-07-27 | RePLAy: Remove Projective LiDAR Depthmap Artifacts via Exploiting Epipolar Geometry | Shengjie Zhu et.al. | 2407.19154 | null |
| 2024-07-26 | HybridDepth: Robust Depth Fusion for Mobile AR by Leveraging Depth from Focus and Single-Image Priors | Ashkan Ganj et.al. | 2407.18443 | link |
| 2024-07-26 | Enhanced Depth Estimation and 3D Geometry Reconstruction using Bayesian Helmholtz Stereopsis with Belief Propagation | Razieh Azizi et.al. | 2407.18195 | null |
| 2024-07-25 | BetterDepth: Plug-and-Play Diffusion Refiner for Zero-Shot Monocular Depth Estimation | Xiang Zhang et.al. | 2407.17952 | null |
| 2024-07-25 | UMono: Physical Model Informed Hybrid CNN-Transformer Framework for Underwater Monocular Depth Estimation | Jian Wang et.al. | 2407.17838 | null |
| 2024-07-24 | DarSwin-Unet: Distortion Aware Encoder-Decoder Architecture | Akshaya Athwale et.al. | 2407.17328 | null |
| 2024-07-24 | Physical Adversarial Attack on Monocular Depth Estimation via Shape-Varying Patches | Chenxing Zhao et.al. | 2407.17312 | null |
| 2024-07-23 | SINDER: Repairing the Singular Defects of DINOv2 | Haoqi Wang et.al. | 2407.16826 | link |
| 2024-07-23 | Diffusion Models for Monocular Depth Estimation: Overcoming Challenging Conditions | Fabio Tosi et.al. | 2407.16698 | link |
| 2024-07-23 | ToDER: Towards Colonoscopy Depth Estimation and Reconstruction with Geometry Constraint Adaptation | Zhenhua Wu et.al. | 2407.16508 | null |
| 2024-07-19 | Mono-ViFI: A Unified Learning Framework for Self-supervised Single- and Multi-frame Monocular Depth Estimation | Jinfeng Liu et.al. | 2407.14126 | link |
| 2024-07-18 | Unveiling the purely young star formation history of the SMC’s northeastern shell from colour-magnitude diagram fitting | Joanna D. Sakowska et.al. | 2407.13876 | null |
| 2024-07-18 | Many Perception Tasks are Highly Redundant Functions of their Input Data | Rahul Ramesh et.al. | 2407.13841 | null |
| 2024-07-18 | Benchmarking Robust Self-Supervised Learning Across Diverse Downstream Tasks | Antoni Kowalczuk et.al. | 2407.12588 | link |
| 2024-07-16 | Temporally Consistent Stereo Matching | Jiaxi Zeng et.al. | 2407.11950 | link |
| 2024-07-15 | IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation | Yuanhao Zhai et.al. | 2407.10937 | link |
| 2024-07-15 | OPEN: Object-wise Position Embedding for Multi-view 3D Object Detection | Jinghua Hou et.al. | 2407.10753 | link |
| 2024-07-15 | Towards Scale-Aware Full Surround Monodepth with Transformers | Yuchen Yang et.al. | 2407.10406 | null |
| 2024-07-12 | ProDepth: Boosting Self-Supervised Multi-Frame Monocular Depth with Probabilistic Fusion | Sungmin Woo et.al. | 2407.09303 | link |
| 2024-07-11 | ScaleDepth: Decomposing Metric Depth Estimation into Scale Prediction and Relative Depth Estimation | Ruijie Zhu et.al. | 2407.08187 | link |
| 2024-07-10 | Controlling Space and Time with Diffusion Models | Daniel Watson et.al. | 2407.07860 | null |
| 2024-07-07 | SCIPaD: Incorporating Spatial Clues into Unsupervised Pose-Depth Joint Learning | Yi Feng et.al. | 2407.05283 | link |
| 2024-07-05 | A Physical Model-Guided Framework for Underwater Image Enhancement and Depth Estimation | Dazhao Du et.al. | 2407.04230 | null |
| 2024-07-04 | Towards Cross-View-Consistent Self-Supervised Surround Depth Estimation | Laiyan Ding et.al. | 2407.04041 | null |
| 2024-07-02 | Parametric Modeling and Estimation of Photon Registrations for 3D Imaging | Weijian Zhang et.al. | 2407.02712 | null |
| 2024-07-02 | Depth-Aware Endoscopic Video Inpainting | Francis Xiatian Zhang et.al. | 2407.02675 | link |
| 2024-07-04 | Camera-LiDAR Cross-modality Gait Recognition | Wenxuan Guo et.al. | 2407.02038 | null |
| 2024-07-07 | CaFNet: A Confidence-Driven Framework for Radar Camera Depth Estimation | Huawei Sun et.al. | 2407.00697 | link |
| 2024-06-28 | Deep Learning-based Depth Estimation Methods from Monocular Image and Videos: A Comprehensive Survey | Uchitha Rajapaksha et.al. | 2406.19675 | null |
| 2024-07-05 | 360 in the Wild: Dataset for Depth Prediction and View Synthesis | Kibaek Park et.al. | 2406.18898 | null |
| 2024-06-27 | Dense Monocular Motion Segmentation Using Optical Flow and Pseudo Depth Map: A Zero-Shot Approach | Yuxiang Huang et.al. | 2406.18837 | null |
| 2024-06-26 | DoubleTake: Geometry Guided Depth Estimation | Mohamed Sayed et.al. | 2406.18387 | null |
| 2024-06-25 | Depth-Guided Semi-Supervised Instance Segmentation | Xin Chen et.al. | 2406.17413 | null |
| 2024-06-20 | Uncertainty and Self-Supervision in Single-View Depth | Javier Rodriguez-Puigvert et.al. | 2406.14226 | null |
| 2024-06-19 | WaterMono: Teacher-Guided Anomaly Masking and Enhancement Boosting for Robust Underwater Self-Supervised Monocular Depth Estimation | Yilin Ding et.al. | 2406.13344 | link |
| 2024-06-18 | Depth Anywhere: Enhancing 360 Monocular Depth Estimation via Perspective Distillation and Unlabeled Data Augmentation | Ning-Hsu Wang et.al. | 2406.12849 | null |
| 2024-06-21 | GeoBench: Benchmarking and Analyzing Monocular Geometry Estimation Models | Yongtao Ge et.al. | 2406.12671 | link |
| 2024-06-17 | DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features | Letian Wang et.al. | 2406.12095 | null |
| 2024-06-17 | MEDeA: Multi-view Efficient Depth Adjustment | Mikhail Artemyev et.al. | 2406.12048 | null |
| 2024-06-16 | 3D Gaze Tracking for Studying Collaborative Interactions in Mixed-Reality Environments | Eduardo Davalos et.al. | 2406.11003 | null |
| 2024-06-15 | GenMM: Geometrically and Temporally Consistent Multimodal Data Generation for Video and LiDAR | Bharat Singh et.al. | 2406.10722 | null |
| 2024-06-14 | The BabyView dataset: High-resolution egocentric videos of infants’ and young children’s everyday experiences | Bria Long et.al. | 2406.10447 | null |
| 2024-06-14 | D-NPC: Dynamic Neural Point Clouds for Non-Rigid View Synthesis from Monocular Video | Moritz Kappel et.al. | 2406.10078 | null |
| 2024-06-14 | DurLAR: A High-fidelity 128-channel LiDAR Dataset with Panoramic Ambient and Reflectivity Imagery for Multi-modal Autonomous Driving Applications | Li Li et.al. | 2406.10068 | link |
| 2024-06-14 | Unsupervised Monocular Depth Estimation Based on Hierarchical Feature-Guided Diffusion | Runze Liu et.al. | 2406.09782 | null |
| 2024-06-13 | Depth Anything V2 | Lihe Yang et.al. | 2406.09414 | null |
| 2024-06-14 | WonderWorld: Interactive 3D Scene Generation from a Single Image | Hong-Xing Yu et.al. | 2406.09394 | null |
| 2024-06-13 | Scale-Invariant Monocular Depth Estimation via SSI Depth | S. Mahdi H. Miangoleh et.al. | 2406.09374 | null |
| 2024-06-13 | Multiple Prior Representation Learning for Self-Supervised Monocular Depth Estimation via Hybrid Transformer | Guodong Sun et.al. | 2406.08928 | link |
| 2024-06-13 | ToSA: Token Selective Attention for Efficient Vision Transformers | Manish Kumar Singh et.al. | 2406.08816 | null |
| 2024-06-11 | Back to the Color: Learning Depth to Specific Color Transformation for Unsupervised Depth Estimation | Yufan Zhu et.al. | 2406.07741 | link |
| 2024-06-11 | PLT-D3: A High-fidelity Dynamic Driving Simulation Dataset for Stereo Depth and Scene Flow | Joshua Tokarsky et.al. | 2406.07667 | null |
| 2024-06-11 | RS-DFM: A Remote Sensing Distributed Foundation Model for Diverse Downstream Tasks | Zhechao Wang et.al. | 2406.07032 | null |
| 2024-06-10 | PatchRefiner: Leveraging Synthetic Data for Real-Domain High-Resolution Monocular Metric Depth Estimation | Zhenyu Li et.al. | 2406.06679 | link |
| 2024-06-09 | Self-supervised Adversarial Training of Monocular Depth Estimation against Physical-World Attacks | Zhiyuan Cheng et.al. | 2406.05857 | link |
| 2024-06-09 | RefGaussian: Disentangling Reflections from 3D Gaussian Splatting for Realistic Rendering | Rui Zhang et.al. | 2406.05852 | null |
| 2024-06-07 | Normal-guided Detail-Preserving Neural Implicit Functions for High-Fidelity 3D Surface Reconstruction | Aarya Patel et.al. | 2406.04861 | null |
| 2024-06-07 | UVCPNet: A UAV-Vehicle Collaborative Perception Network for 3D Object Detection | Yuchao Wang et.al. | 2406.04647 | null |
| 2024-06-06 | MambaDepth: Enhancing Long-range Dependency for Self-Supervised Fine-Structured Monocular Depth Estimation | Ionuţ Grigore et.al. | 2406.04532 | null |
| 2024-06-06 | Flash3D: Feed-Forward Generalisable 3D Scene Reconstruction from a Single Image | Stanislaw Szymanowicz et.al. | 2406.04343 | null |
| 2024-06-06 | Neural Surface Reconstruction from Sparse Views Using Epipolar Geometry | Kaichen Zhou et.al. | 2406.04301 | null |
| 2024-06-04 | VHS: High-Resolution Iterative Stereo Matching with Visual Hull Priors | Markus Plack et.al. | 2406.02552 | null |
| 2024-06-03 | L-MAGIC: Language Model Assisted Generation of Images with Coherence | Zhipeng Cai et.al. | 2406.01843 | link |
| 2024-06-04 | Learning Temporally Consistent Video Depth from Video Diffusion Priors | Jiahao Shao et.al. | 2406.01493 | null |
| 2024-06-03 | Self-Supervised Geometry-Guided Initialization for Robust Monocular Visual Odometry | Takayuki Kanai et.al. | 2406.00929 | null |
| 2024-06-01 | MoDGS: Dynamic Gaussian Splatting from Causually-captured Monocular Videos | Qingming Liu et.al. | 2406.00434 | null |
| 2024-05-30 | Uncertainty-guided Optimal Transport in Depth Supervised Sparse-View 3D Gaussian | Wei Sun et.al. | 2405.19657 | null |
| 2024-05-28 | Hybrid Multi-Head Physics-informed Neural Network for Depth Estimation in Terahertz Imaging | Mingjun Xiang et.al. | 2405.18317 | null |
| 2024-05-27 | Consistency Regularisation for Unsupervised Domain Adaptation in Monocular Depth Estimation | Amir El-Ghoussani et.al. | 2405.17704 | null |
| 2024-05-27 | Benchmarking and Improving Bird’s Eye View Perception Robustness in Autonomous Driving | Shaoyuan Xie et.al. | 2405.17426 | link |
| 2024-05-27 | All-day Depth Completion | Vadim Ezhov et.al. | 2405.17315 | null |
| 2024-05-27 | GenWarp: Single Image to Novel Views with Semantic-Preserving Generative Warping | Junyoung Seo et.al. | 2405.17251 | link |
| 2024-05-27 | SDL-MVS: View Space and Depth Deformable Learning Paradigm for Multi-View Stereo Reconstruction in Remote Sensing | Yong-Qiang Mao et.al. | 2405.17140 | null |
| 2024-05-27 | DINO-SD: Champion Solution for ICRA 2024 RoboDepth Challenge | Yifan Mao et.al. | 2405.17102 | null |
| 2024-05-27 | Evaluation of Multi-task Uncertainties in Joint Semantic Segmentation and Monocular Depth Estimation | Steven Landgraf et.al. | 2405.17097 | null |
| 2024-05-27 | DCPI-Depth: Explicitly Infusing Dense Correspondence Prior to Unsupervised Monocular Depth Estimation | Mengtan Zhang et.al. | 2405.16960 | null |
| 2024-05-27 | ContrastAlign: Toward Robust BEV Feature Alignment via Contrastive Learning for Multi-Modal 3D Object Detection | Ziying Song et.al. | 2405.16873 | null |
| 2024-05-27 | Estimating Depth of Monocular Panoramic Image with Teacher-Student Model Fusing Equirectangular and Spherical Representations | Jingguo Liu et.al. | 2405.16858 | null |
| 2024-05-26 | Splat-SLAM: Globally Optimized RGB-only SLAM with 3D Gaussians | Erik Sandström et.al. | 2405.16544 | null |
| 2024-05-24 | Transparent Object Depth Completion | Yifan Zhou et.al. | 2405.15299 | null |
| 2024-05-24 | MonoDETRNext: Next-generation Accurate and Efficient Monocular 3D Object Detection Method | Pan Liao et.al. | 2405.15176 | null |
| 2024-05-23 | EvGGS: A Collaborative Learning Framework for Event-based Generalizable Gaussian Splatting | Jiaxu Wang et.al. | 2405.14959 | link |
| 2024-05-23 | Ghost-Stereo: GhostNet-based Cost Volume Enhancement and Aggregation for Stereo Matching Networks | Xingguang Jiang et.al. | 2405.14520 | null |
| 2024-05-23 | Enhanced Object Tracking by Self-Supervised Auxiliary Depth Estimation Learning | Zhenyu Wei et.al. | 2405.14195 | null |
| 2024-05-21 | Cross-spectral Gated-RGB Stereo Depth Estimation | Samuel Brucker et.al. | 2405.12759 | null |
| 2024-05-20 | Depth Reconstruction with Neural Signed Distance Fields in Structured Light Systems | Rukun Qiao et.al. | 2405.12006 | null |
| 2024-05-20 | Depth Prompting for Sensor-Agnostic Depth Estimation | Jin-Hwi Park et.al. | 2405.11867 | null |
| 2024-05-19 | CRF360D: Monocular 360 Depth Estimation via Spherical Fully-Connected CRFs | Zidong Cao et.al. | 2405.11564 | null |
| 2024-05-18 | Dusk Till Dawn: Self-supervised Nighttime Stereo Depth Estimation using Visual Foundation Models | Madhu Vankadari et.al. | 2405.11158 | link |
| 2024-05-17 | FA-Depth: Toward Fast and Accurate Self-supervised Monocular Depth Estimation | Fei Wang et.al. | 2405.10885 | link |
| 2024-05-17 | Accurate Training Data for Occupancy Map Prediction in Automated Driving Using Evidence Theory | Jonas Kälble et.al. | 2405.10575 | link |
| 2024-05-16 | Towards Task-Compatible Compressible Representations | Anderson de Andrade et.al. | 2405.10244 | link |
| 2024-05-16 | KPNDepth: Depth Estimation of Lane Images under Complex Rainy Environment | Zhengxu Shi et.al. | 2405.09964 | null |
| 2024-05-14 | CLIP with Quality Captions: A Strong Pretraining for Vision Tasks | Pavan Kumar Anasosalu Vasu et.al. | 2405.08911 | null |
| 2024-05-14 | The RoboDrive Challenge: Drive Anytime Anywhere in Any Condition | Lingdong Kong et.al. | 2405.08816 | null |
| 2024-05-14 | EndoDAC: Efficient Adapting Foundation Model for Self-Supervised Depth Estimation from Any Endoscopic Camera | Beilei Cui et.al. | 2405.08672 | link |
| 2024-05-13 | SceneFactory: A Workflow-centric and Unified Framework for Incremental Scene Modeling | Yijun Yuan et.al. | 2405.07847 | null |
| 2024-05-16 | Ensuring UAV Safety: A Vision-only and Real-time Framework for Collision Avoidance Through Object Detection, Tracking, and Distance Estimation | Vasileios Karampinis et.al. | 2405.06749 | null |
| 2024-05-10 | MGS-SLAM: Monocular Sparse Tracking and Gaussian Mapping with Depth Smooth Regularization | Pengcheng Zhu et.al. | 2405.06241 | null |
| 2024-04-30 | A critical appraisal of water table depth estimation: Challenges and opportunities within machine learning | Joseph Janssen et.al. | 2405.04579 | null |
| 2024-05-06 | A Construct-Optimize Approach to Sparse View Synthesis without Camera Pose | Kaiwen Jiang et.al. | 2405.03659 | null |
| 2024-05-03 | M$^2$Depth: Self-supervised Two-Frame Multi-camera Metric Depth Estimation | Yingshuang Zou et.al. | 2405.02004 | null |
| 2024-05-02 | Domain-Transferred Synthetic Data Generation for Improving Monocular Depth Estimation | Seungyeop Lee et.al. | 2405.01113 | null |
| 2024-05-13 | Depth Priors in Removal Neural Radiance Fields | Zhihao Guo et.al. | 2405.00630 | null |
| 2024-04-30 | Invisible Stitch: Generating Smooth 3D Scenes with Depth Inpainting | Paul Engstler et.al. | 2404.19758 | null |
| 2024-04-30 | Masked Spatial Propagation Network for Sparsity-Adaptive Depth Refinement | Jinyoung Jun et.al. | 2404.19294 | link |
| 2024-04-29 | Simple-RF: Regularizing Sparse Input Radiance Fields with Simpler Solutions | Nagabhushan Somraj et.al. | 2404.19015 | null |
| 2024-05-02 | Underwater Variable Zoom: Depth-Guided Perception Network for Underwater Image Enhancement | Zhixiong Huang et.al. | 2404.17883 | link |
| 2024-05-01 | A Novel Spike Transformer Network for Depth Estimation from Event Cameras via Cross-modality Knowledge Distillation | Xin Zhang et.al. | 2404.17335 | null |
| 2024-04-27 | The Third Monocular Depth Estimation Challenge | Jaime Spencer et.al. | 2404.16831 | null |
| 2024-04-25 | MonoPCC: Photometric-invariant Cycle Constraint for Monocular Depth Estimation of Endoscopic Images | Zhiwei Wang et.al. | 2404.16571 | null |
| 2024-04-25 | Promoting CNNs with Cross-Architecture Knowledge Distillation for Efficient Monocular Depth Estimation | Zhimeng Zheng et.al. | 2404.16386 | null |
| 2024-04-23 | SGFormer: Spherical Geometry Transformer for 360 Depth Estimation | Junsong Zhang et.al. | 2404.14979 | null |
| 2024-04-23 | Mining Supervision for Dynamic Regions in Self-Supervised Monocular Depth Estimation | Hoang Chuong Nguyen et.al. | 2404.14908 | null |
| 2024-04-22 | Self-Supervised Monocular Depth Estimation in the Dark: Towards Data Distribution Compensation | Haolin Yang et.al. | 2404.13854 | null |
| 2024-04-21 | GScream: Learning 3D Geometry and Feature Consistent Gaussian Splatting for Object Removal | Yuxin Wang et.al. | 2404.13679 | null |
| 2024-04-20 | High-fidelity Endoscopic Image Synthesis by Utilizing Depth-guided Neural Surfaces | Baoru Huang et.al. | 2404.13437 | null |
| 2024-04-18 | SPIdepth: Strengthened Pose Information for Self-supervised Monocular Depth Estimation | Mykola Lavreniuk et.al. | 2404.12501 | link |
| 2024-04-25 | BLINK: Multimodal Large Language Models Can See but Not Perceive | Xingyu Fu et.al. | 2404.12390 | null |
| 2024-04-17 | How to deal with glare for improved perception of Autonomous Vehicles | Muhammad Z. Alam et.al. | 2404.10992 | null |
| 2024-04-12 | Into the Fog: Evaluating Multiple Object Tracking Robustness | Nadezda Kirillova et.al. | 2404.10534 | null |
| 2024-04-17 | Digging into contrastive learning for robust depth estimation with diffusion models | Jiyuan Wang et.al. | 2404.09831 | null |
| 2024-04-15 | Virtually Enriched NYU Depth V2 Dataset for Monocular Depth Estimation: Do We Need Artificial Augmentation? | Dmitry Ignatov et.al. | 2404.09469 | link |
| 2024-04-14 | In My Perspective, In My Hands: Accurate Egocentric 2D Hand Pose and Action Recognition | Wiktor Mucha et.al. | 2404.09308 | null |
| 2024-04-12 | FusionPortableV2: A Unified Multi-Sensor Dataset for Generalized SLAM Across Diverse Platforms and Scalable Environments | Hexiang Wei et.al. | 2404.08563 | null |
| 2024-04-12 | On the Robustness of Language Guidance for Low-Level Vision Tasks: Findings from Depth Estimation | Agneet Chatterjee et.al. | 2404.08540 | link |
| 2024-04-11 | Depth Estimation using Weighted-loss and Transfer Learning | Muhammad Adeel Hafeez et.al. | 2404.07686 | null |
| 2024-04-11 | GLID: Pre-training a Generalist Encoder-Decoder Vision Model | Jihao Liu et.al. | 2404.07603 | null |
| 2024-04-11 | Implicit and Explicit Language Guidance for Diffusion-based Visual Perception | Hefeng Wang et.al. | 2404.07600 | null |
| 2024-04-11 | Stereo-LiDAR Depth Estimation with Deformable Propagation and Learned Disparity-Depth Conversion | Ang Li et.al. | 2404.07545 | null |
| 2024-04-10 | Self-supervised Monocular Depth Estimation on Water Scenes via Specular Reflection Prior | Zhengyang Lu et.al. | 2404.07176 | null |
| 2024-04-10 | MonoSelfRecon: Purely Self-Supervised Explicit Generalizable 3D Reconstruction of Indoor Scenes from Monocular RGB Views | Runfa Li et.al. | 2404.06753 | null |
| 2024-04-09 | RoadBEV: Road Surface Reconstruction in Bird’s Eye View | Tong Zhao et.al. | 2404.06605 | link |
| 2024-04-09 | ZeST: Zero-Shot Material Transfer from a Single Image | Ta-Ying Cheng et.al. | 2404.06425 | link |
| 2024-04-09 | Matching 2D Images in 3D: Metric Relative Pose from Metric Correspondences | Axel Barroso-Laguna et.al. | 2404.06337 | null |
| 2024-04-09 | Enhanced Radar Perception via Multi-Task Learning: Towards Refined Data for Sensor Fusion Applications | Huawei Sun et.al. | 2404.06165 | null |
| 2024-04-09 | Incremental Joint Learning of Depth, Pose and Implicit Scene Representation on Monocular Camera in Large-scale Scenes | Tianchen Deng et.al. | 2404.06050 | null |
| 2024-04-06 | HawkDrive: A Transformer-driven Visual Perception System for Autonomous Driving in Night Scene | Ziang Guo et.al. | 2404.04653 | null |
| 2024-04-09 | Co-Occ: Coupling Explicit Feature Fusion with Volume Rendering Regularization for Multi-Modal 3D Semantic Occupancy Prediction | Jingyi Pan et.al. | 2404.04561 | null |
| 2024-04-05 | SpatialTracker: Tracking Any 2D Pixels in 3D Space | Yuxi Xiao et.al. | 2404.04319 | null |
| 2024-04-05 | Deep Phase Coded Image Prior | Nimrod Shabtay et.al. | 2404.03906 | null |
| 2024-04-04 | Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning | Rui Li et.al. | 2404.03658 | link |
| 2024-04-04 | MVD-Fusion: Single-view 3D via Depth-consistent Multi-view Generation | Hanzhe Hu et.al. | 2404.03656 | null |
| 2024-04-05 | WorDepth: Variational Language Prior for Monocular Depth Estimation | Ziyao Zeng et.al. | 2404.03635 | link |
| 2024-04-04 | Adaptive Discrete Disparity Volume for Self-supervised Monocular Depth Estimation | Jianwei Ren et.al. | 2404.03190 | null |
| 2024-04-04 | MonoCD: Monocular 3D Object Detection with Complementary Depths | Longfei Yan et.al. | 2404.03181 | link |
| 2024-04-02 | CHOSEN: Contrastive Hypothesis Selection for Multi-View Depth Refinement | Di Qiu et.al. | 2404.02225 | null |
| 2024-04-02 | Improving Bird’s Eye View Semantic Segmentation by Task Decomposition | Tianhao Zhao et.al. | 2404.01925 | null |
| 2024-04-01 | BadPart: Unified Black-box Adversarial Patch Attacks against Pixel-wise Regression Tasks | Zhiyuan Cheng et.al. | 2404.00924 | null |
| 2024-04-01 | MM3DGS SLAM: Multi-modal 3D Gaussian Splatting for SLAM Using Vision, Depth, and Inertial Measurements | Lisong C. Sun et.al. | 2404.00923 | link |
| 2024-03-31 | OmniSDF: Scene Reconstruction using Omnidirectional Signed Distance Functions and Adaptive Binoctrees | Hakyeong Kim et.al. | 2404.00678 | null |
| 2024-03-30 | The Devil is in the Edges: Monocular Depth Estimation with Edge-aware Consistency Fusion | Pengzhi Li et.al. | 2404.00373 | null |
| 2024-03-30 | Reusable Architecture Growth for Continual Stereo Matching | Chenghao Zhang et.al. | 2404.00360 | null |
| 2024-03-30 | MaGRITTe: Manipulative and Generative 3D Realization from Image, Topview and Text | Takayuki Hara et.al. | 2404.00345 | null |
| 2024-03-29 | VSRD: Instance-Aware Volumetric Silhouette Rendering for Weakly Supervised 3D Object Detection | Zihua Liu et.al. | 2404.00149 | null |
| 2024-03-29 | NeSLAM: Neural Implicit Mapping and Self-Supervised Feature Tracking With Depth Completion and Denoising | Tianchen Deng et.al. | 2403.20034 | link |
| 2024-03-28 | SAID-NeRF: Segmentation-AIDed NeRF for Depth Completion of Transparent Objects | Avinash Ummadisingu et.al. | 2403.19607 | null |
| 2024-03-30 | GlORIE-SLAM: Globally Optimized RGB-only Implicit Encoding Point Cloud SLAM | Ganlin Zhang et.al. | 2403.19549 | link |
| 2024-03-28 | CoherentGS: Sparse Novel View Synthesis with Coherent 3D Gaussians | Avinash Paliwal et.al. | 2403.19495 | link |
| 2024-03-28 | FlowDepth: Decoupling Optical Flow for Self-Supervised Monocular Depth Estimation | Yiyang Sun et.al. | 2403.19294 | null |
| 2024-03-28 | Neural Fields for 3D Tracking of Anatomy and Surgical Instruments in Monocular Laparoscopic Video Clips | Beerend G. A. Gerats et.al. | 2403.19265 | null |
| 2024-03-27 | UniDepth: Universal Monocular Metric Depth Estimation | Luigi Piccinelli et.al. | 2403.18913 | link |
| 2024-04-01 | ECoDepth: Effective Conditioning of Diffusion Models for Monocular Depth Estimation | Suraj Patni et.al. | 2403.18807 | link |
| 2024-03-27 | ModaLink: Unifying Modalities for Efficient Image-to-PointCloud Place Recognition | Weidong Xie et.al. | 2403.18762 | link |
| 2024-03-27 | $\mathrm{F^2Depth}$: Self-supervised Indoor Monocular Depth Estimation via Optical Flow Consistency and Feature Map Synthesis | Xiaotong Guo et.al. | 2403.18443 | null |
| 2024-03-26 | Track Everything Everywhere Fast and Robustly | Yunzhou Song et.al. | 2403.17931 | null |
| 2024-03-26 | Leveraging Near-Field Lighting for Monocular Depth Estimation from Endoscopy Videos | Akshay Paruchuri et.al. | 2403.17915 | null |
| 2024-03-26 | DN-Splatter: Depth and Normal Priors for Gaussian Splatting and Meshing | Matias Turkulainen et.al. | 2403.17822 | null |
| 2024-03-27 | Physical 3D Adversarial Attacks against Monocular Depth Estimation in Autonomous Driving | Junhao Zheng et.al. | 2403.17301 | link |
| 2024-03-25 | Spike-NeRF: Neural Radiance Field Based On Spike Camera | Yijia Guo et.al. | 2403.16410 | null |
| 2024-03-25 | Elite360D: Towards Efficient 360 Depth Estimation via Semantic- and Distance-Aware Bi-Projection Fusion | Hao Ai et.al. | 2403.16376 | null |
| 2024-03-23 | Depth Estimation fusing Image and Radar Measurements with Uncertain Directions | Masaya Kotani et.al. | 2403.15787 | null |
| 2024-03-22 | Language-Based Depth Hints for Monocular Depth Estimation | Dylan Auty et.al. | 2403.15551 | null |
| 2024-03-21 | Learning to Project for Cross-Task Knowledge Distillation | Dylan Auty et.al. | 2403.14494 | null |
| 2024-03-20 | DepthFM: Fast Monocular Depth Estimation with Flow Matching | Ming Gui et.al. | 2403.13788 | null |
| 2024-03-19 | When Do We Not Need Larger Vision Models? | Baifeng Shi et.al. | 2403.13043 | link |
| 2024-03-19 | FutureDepth: Learning to Predict the Future Improves Video Depth Estimation | Rajeev Yasarla et.al. | 2403.12953 | null |
| 2024-03-19 | Geometric Constraints in Deep Learning Frameworks: A Survey | Vibhas K Vats et.al. | 2403.12431 | null |
| 2024-03-18 | GraphBEV: Towards Robust BEV Feature Alignment for Multi-Modal 3D Object Detection | Ziying Song et.al. | 2403.11848 | null |
| 2024-03-18 | SSAP: A Shape-Sensitive Adversarial Patch for Comprehensive Disruption of Monocular Depth Estimation in Autonomous Navigation Applications | Amira Guesmi et.al. | 2403.11515 | null |
| 2024-03-17 | Bilateral Propagation Network for Depth Completion | Jie Tang et.al. | 2403.11270 | null |
| 2024-03-16 | MSI-NeRF: Linking Omni-Depth with View Synthesis through Multi-Sphere Image aided Generalizable Neural Radiance Field | Dongyu Yan et.al. | 2403.10840 | null |
| 2024-03-15 | SwinMTL: A Shared Architecture for Simultaneous Depth Estimation and Semantic Segmentation from Monocular Camera Images | Pardis Taghavi et.al. | 2403.10662 | link |
| 2024-03-15 | Robust Shape Fitting for 3D Scene Abstraction | Florian Kluger et.al. | 2403.10452 | link |
| 2024-03-15 | Region-aware Distribution Contrast: A Novel Approach to Multi-Task Partially Supervised Learning | Meixuan Li et.al. | 2403.10252 | null |
| 2024-03-18 | Touch-GS: Visual-Tactile Supervised 3D Gaussian Splatting | Aiden Swann et.al. | 2403.09875 | null |
| 2024-03-14 | Improving Distant 3D Object Detection Using 2D Box Supervision | Zetong Yang et.al. | 2403.09230 | null |
| 2024-03-13 | SM4Depth: Seamless Monocular Metric Depth Estimation across Multiple Cameras and Scenes by One Model | Yihao Liu et.al. | 2403.08556 | link |
| 2024-03-13 | METER: a mobile vision transformer architecture for monocular depth estimation | L. Papa et.al. | 2403.08368 | link |
| 2024-03-12 | Q-SLAM: Quadric Representations for Monocular SLAM | Chensheng Peng et.al. | 2403.08125 | null |
| 2024-03-12 | Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving | JunDa Cheng et.al. | 2403.07535 | null |
| 2024-03-12 | D4D: An RGBD diffusion model to boost monocular depth estimation | L. Papa et.al. | 2403.07516 | link |
| 2024-03-12 | SGE: Structured Light System Based on Gray Code with an Event Camera | Xingyu Lu et.al. | 2403.07326 | null |
| 2024-03-11 | Forest Inspection Dataset for Aerial Semantic Segmentation and Depth Estimation | Bianca-Cerasela-Zelia Blaga et.al. | 2403.06621 | link |
| 2024-03-11 | HDA-LVIO: A High-Precision LiDAR-Visual-Inertial Odometry in Urban Environments with Hybrid Data Association | Jian Shi et.al. | 2403.06590 | null |
| 2024-03-11 | Confidence-Aware RGB-D Face Recognition via Virtual Depth Synthesis | Zijian Chen et.al. | 2403.06529 | null |
| 2024-03-09 | DO3D: Self-supervised Learning of Decomposed Object-aware 3D Motion and Depth from Monocular Videos | Xiuzhe Wu et.al. | 2403.05895 | null |
| 2024-03-07 | Density-Regression: Efficient and Distance-Aware Deep Regressor for Uncertainty Estimation under Distribution Shifts | Ha Manh Bui et.al. | 2403.05600 | link |
| 2024-03-08 | OccFusion: Depth Estimation Free Multi-sensor Fusion for 3D Occupancy Prediction | Ji Zhang et.al. | 2403.05329 | null |
| 2024-03-08 | Stealing Stable Diffusion Prior for Robust Monocular Depth Estimation | Yifan Mao et.al. | 2403.05056 | link |
| 2024-03-06 | Multi-task Learning for Real-time Autonomous Driving Leveraging Task-adaptive Attention Generator | Wonhyeok Choi et.al. | 2403.03468 | null |
| 2024-03-07 | Scene Depth Estimation from Traditional Oriental Landscape Paintings | Sungho Kang et.al. | 2403.03408 | null |
| 2024-03-04 | Iterative Occlusion-Aware Light Field Depth Estimation using 4D Geometrical Cues | Rui Lourenço et.al. | 2403.02043 | null |
| 2024-03-04 | Scalable Vision-Based 3D Object Detection and Monocular Depth Estimation for Autonomous Driving | Yuxuan Liu et.al. | 2403.02037 | link |
| 2024-03-04 | DD-VNB: A Depth-based Dual-Loop Framework for Real-time Visually Navigated Bronchoscopy | Qingyao Tian et.al. | 2403.01683 | null |
| 2024-03-03 | Kick Back & Relax++: Scaling Beyond Ground-Truth Depth with SlowTV & CribsTV | Jaime Spencer et.al. | 2403.01569 | link |
| 2024-03-03 | Pyramid Feature Attention Network for Monocular Depth Prediction | Yifang Xu et.al. | 2403.01440 | null |
| 2024-03-03 | Depth Estimation Algorithm Based on Transformer-Encoder and Feature Fusion | Linhan Xia et.al. | 2403.01370 | null |
| 2024-03-02 | Depth Information Assisted Collaborative Mutual Promotion Network for Single Image Dehazing | Yafei Zhang et.al. | 2403.01105 | null |
| 2024-02-29 | PCDepth: Pattern-based Complementary Learning for Monocular Depth Estimation by Best of Both Worlds | Haotian Liu et.al. | 2402.18925 | null |
| 2024-02-29 | CFDNet: A Generalizable Foggy Stereo Matching Network with Contrastive Feature Distillation | Zihua Liu et.al. | 2402.18181 | null |
| 2024-02-28 | Self-Supervised Spatially Variant PSF Estimation for Aberration-Aware Depth-from-Defocus | Zhuofeng Wu et.al. | 2402.18175 | null |
| 2024-02-28 | Passive Snapshot Coded Aperture Dual-Pixel RGB-D Imaging | Bhargav Ghanekar et.al. | 2402.18102 | null |
| 2024-02-27 | A Vanilla Multi-Task Framework for Dense Visual Prediction Solution to 1st VCL Challenge – Multi-Task Robustness Track | Zehui Chen et.al. | 2402.17319 | null |
| 2024-02-26 | Automated Floodwater Depth Estimation Using Large Multimodal Model for Rapid Flood Mapping | Temitope Akinboyewa et.al. | 2402.16684 | null |
| 2024-02-22 | GAM-Depth: Self-Supervised Indoor Depth Estimation Leveraging a Gradient-Aware Mask and Semantic Constraints | Anqi Cheng et.al. | 2402.14354 | null |
| 2024-02-22 | TIE-KD: Teacher-Independent and Explainable Knowledge Distillation for Monocular Depth Estimation | Sangwon Choi et.al. | 2402.14340 | link |
| 2024-02-21 | Zero-BEV: Zero-shot Projection of Any First-Person Modality to BEV Maps | Gianluca Monaci et.al. | 2402.13848 | null |
| 2024-02-19 | An Endoscopic Chisel: Intraoperative Imaging Carves 3D Anatomical Models | Jan Emily Mangulabnan et.al. | 2402.11840 | null |
| 2024-02-19 | Unveiling the Depths: A Multi-Modal Fusion Framework for Challenging Scenarios | Jialei Xu et.al. | 2402.11826 | null |
Audio Processing
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2025-12-23 | QuarkAudio Technical Report | Chengwei Liu et.al. | 2512.20151 | null |
| 2025-12-23 | VALLR-Pin: Dual-Decoding Visual Speech Recognition for Mandarin with Pinyin-Guided LLM Refinement | Chang Sun et.al. | 2512.20032 | null |
| 2025-12-22 | From Speech to Subtitles: Evaluating ASR Models in Subtitling Italian Television Programs | Alessandro Lucca et.al. | 2512.19161 | null |
| 2025-12-22 | Enhancing Fully Formatted End-to-End Speech Recognition with Knowledge Distillation via Multi-Codebook Vector Quantization | Jian You et.al. | 2512.18967 | null |
| 2025-12-21 | Speaker Recognition – Wavelet Packet Based Multiresolution Feature Extraction Approach | Saurabh Bhardwaj et.al. | 2512.18902 | null |
| 2025-12-21 | Task Vector in TTS: Toward Emotionally Expressive Dialectal Speech Synthesis | Pengchao Feng et.al. | 2512.18699 | null |
| 2025-12-20 | Phoneme-based speech recognition driven by large language models and sampling marginalization | Te Ma et.al. | 2512.18371 | null |
| 2025-12-20 | TICL+: A Case Study On Speech In-Context Learning for Children’s Speech Recognition | Haolong Zheng et.al. | 2512.18263 | null |
| 2025-12-19 | SAM Audio: Segment Anything in Audio | Bowen Shi et.al. | 2512.18099 | null |
| 2025-11-27 | Supplementary Resources and Analysis for Automatic Speech Recognition Systems Trained on the Loquacious Dataset | Nick Rossenbach et.al. | 2512.17915 | null |
| 2025-12-19 | Peeking Into The Future For Contextual Biasing | Ramaneswaran Selvakumar et.al. | 2512.17657 | null |
| 2025-12-19 | When De-noising Hurts: A Systematic Study of Speech Enhancement Effects on Modern Medical ASR Systems | Sujal Chondhekar et.al. | 2512.17562 | null |
| 2025-12-19 | Zero-Shot Recognition of Dysarthric Speech Using Commercial Automatic Speech Recognition and Multimodal Large Language Models | Ali Alsayegh et.al. | 2512.17474 | null |
| 2025-12-19 | Incorporating Error Level Noise Embedding for Improving LLM-Assisted Robustness in Persian Speech Recognition | Zahra Rahmani et.al. | 2512.17247 | null |
| 2025-12-18 | Navigating the Reality Gap: Privacy-Preserving Adaptation of ASR for Challenging Low-Resource Domains | Darshil Chauhan et.al. | 2512.16401 | null |
| 2025-12-16 | ComMark: Covert and Robust Black-Box Model Watermarking with Compressed Samples | Yunfei Yang et.al. | 2512.15641 | null |
| 2025-12-16 | Adapting Speech Language Model to Singing Voice Synthesis | Yiwen Zhao et.al. | 2512.14657 | null |
| 2025-12-16 | MuseCPBench: an Empirical Study of Music Editing Methods through Music Context Preservation | Yash Vishe et.al. | 2512.14629 | null |
| 2025-12-16 | GLM-TTS Technical Report | Jiayan Cui et.al. | 2512.14291 | null |
| 2025-12-16 | Scalable Frameworks for Real-World Audio-Visual Speech Recognition | Sungnyun Kim et.al. | 2512.14083 | null |
| 2025-12-15 | Reproducing and Dissecting Denoising Language Models for Speech Recognition | Dorian Koch et.al. | 2512.13576 | null |
| 2025-12-15 | DisCo-Speech: Controllable Zero-Shot Speech Generation with A Disentangled Speech Codec | Tao Li et.al. | 2512.13251 | null |
| 2025-12-14 | BUT Systems for WildSpoof Challenge: SASV in the Wild | Junyi Peng et.al. | 2512.12851 | null |
| 2025-12-14 | Procedural Music Generation Systems in Games | Shangxuan Luo et.al. | 2512.12834 | null |
| 2025-12-14 | Adaptive Edge-Cloud Inference for Speech-to-Action Systems Using ASR and Large Language Models | Mohammad Jalili Torkamani et.al. | 2512.12769 | null |
| 2025-12-13 | System X: A Mobile Voice-Based AI System for EMR Generation and Clinical Decision Support in Low-Resource Maternal Healthcare | Maryam Mustafa et.al. | 2512.12240 | null |
| 2025-12-13 | A comparative study of generative models for child voice conversion | Protima Nomo Sudro et.al. | 2512.12129 | null |
| 2025-12-12 | All-in-One ASR: Unifying Encoder-Decoder Models of CTC, Attention, and Transducer in Dual-Mode ASR | Takafumi Moriya et.al. | 2512.11543 | null |
| 2025-12-12 | PhraseVAE and PhraseLDM: Latent Diffusion for Full-Song Multitrack Symbolic Music Generation | Longshen Ou et.al. | 2512.11348 | null |
| 2025-12-12 | The Affective Bridge: Unifying Feature Representations for Speech Deepfake Detection | Yupei Li et.al. | 2512.11241 | null |
| 2025-12-11 | The TCG CREST – RKMVERI Submission for the NCIIPC Startup India AI Grand Challenge | Nikhil Raghav et.al. | 2512.11009 | null |
| 2025-11-30 | Benchmarking Automatic Speech Recognition Models for African Languages | Alvin Nahabwe et.al. | 2512.10968 | null |
| 2025-11-30 | ASR Under the Stethoscope: Evaluating Biases in Clinical Speech Recognition across Indian Languages | Subham Kumar et.al. | 2512.10967 | null |
| 2025-12-11 | CompanionCast: A Multi-Agent Conversational AI Framework with Spatial Audio for Social Co-Viewing Experiences | Yiyang Wang et.al. | 2512.10918 | null |
| 2025-12-11 | TRIDENT: A Redundant Architecture for Caribbean-Accented Emergency Speech Triage | Elroy Galbraith et.al. | 2512.10741 | null |
| 2025-12-11 | MR-FlowDPO: Multi-Reward Direct Preference Optimization for Flow-Matching Text-to-Music Generation | Alon Ziv et.al. | 2512.10264 | null |
| 2025-12-10 | Robust Speech Activity Detection in the Presence of Singing Voice | Philipp Grundhuber et.al. | 2512.09713 | null |
| 2025-12-09 | LG Uplus System with Multi-Speaker IDs and Discriminator-based Sub-Judges for the WildSpoof Challenge | Jinyoung Park et.al. | 2512.09000 | null |
| 2025-12-02 | Enhancing Automatic Speech Recognition Through Integrated Noise Detection Architecture | Karamvir Singh et.al. | 2512.08973 | null |
| 2025-12-09 | Emovectors: assessing emotional content in jazz improvisations for creativity evaluation | Anna Jordanous et.al. | 2512.08812 | null |
| 2025-12-08 | A Simple Method to Enhance Pre-trained Language Models with Speech Tokens for Classification | Nicolas Calbucura et.al. | 2512.07571 | null |
| 2025-12-08 | Efficient ASR for Low-Resource Languages: Leveraging Cross-Lingual Unlabeled Data | Srihari Bandarupalli et.al. | 2512.07277 | null |
| 2025-12-06 | Sanvaad: A Multimodal Accessibility Framework for ISL Recognition and Voice-Based Interaction | Kush Revankar et.al. | 2512.06485 | null |
| 2025-12-06 | Degrading Voice: A Comprehensive Overview of Robust Voice Conversion Through Input Manipulation | Xining Song et.al. | 2512.06304 | null |
| 2025-12-01 | KidSpeak: A General Multi-purpose LLM for Kids’ Speech Recognition and Screening | Rohan Sharma et.al. | 2512.05994 | null |
| 2025-11-23 | SyncVoice: Towards Video Dubbing with Vision-Augmented Pretrained TTS Model | Kaidi Wang et.al. | 2512.05126 | null |
| 2025-12-04 | YingMusic-SVC: Real-World Robust Zero-Shot Singing Voice Conversion with Flow-GRPO and Singing-Specific Inductive Biases | Gongyu Chen et.al. | 2512.04793 | null |
| 2025-12-04 | M3-TTS: Multi-modal DiT Alignment & Mel-latent for Zero-shot High-fidelity Speech Synthesis | Xiaopeng Wang et.al. | 2512.04720 | null |
| 2025-12-02 | Comparing Unsupervised and Supervised Semantic Speech Tokens: A Case Study of Child ASR | Mohan Shi et.al. | 2512.03301 | null |
| 2025-12-02 | DAWZY: A New Addition to AI powered “Human in the Loop” Music Co-creation | Aaron C Elkins et.al. | 2512.03289 | null |
| 2025-12-02 | Bangla Hate Speech Classification with Fine-tuned Transformer Models | Yalda Keivan Jafari et.al. | 2512.02845 | null |
| 2025-12-01 | Swivuriso: The South African Next Voices Multilingual Speech Dataset | Vukosi Marivatee et.al. | 2512.02201 | null |
| 2025-12-01 | Story2MIDI: Emotionally Aligned Music Generation from Text | Mohammad Shokri et.al. | 2512.02192 | null |
| 2025-11-18 | On the Difficulty of Token-Level Modeling of Dysfluency and Fluency Shaping Artifacts | Kashaf Gulzar et.al. | 2512.02027 | null |
| 2025-12-01 | MAC-SLU: Multi-Intent Automotive Cabin Spoken Language Understanding Benchmark | Yuezhang Peng et.al. | 2512.01603 | null |
| 2025-12-01 | ZO-ASR: Zeroth-Order Fine-Tuning of Speech Foundation Models without Back-Propagation | Yuezhang Peng et.al. | 2512.01267 | null |
| 2025-11-29 | Melody or Machine: Detecting Synthetic Music with Dual-Stream Contrastive Learning | Arnesh Batra et.al. | 2512.00621 | null |
| 2025-11-28 | OmniFusion: Simultaneous Multilingual Multimodal Translations via Modular Fusion | Sai Koneru et.al. | 2512.00234 | null |
| 2025-11-27 | Art2Music: Generating Music for Art Images with Multi-modal Feeling Alignment | Jiaying Hong et.al. | 2512.00120 | null |
| 2025-11-28 | Scaling HuBERT for African Languages: From Base to Large and XL | Antoine Caubrière et.al. | 2511.23370 | null |
| 2025-11-28 | HPSU: A Benchmark for Human-Level Perception in Real-World Spoken Speech Understanding | Chen Li et.al. | 2511.23178 | null |
| 2025-11-28 | Group-Aware Partial Model Merging for Children’s Automatic Speech Recognition | Thomas Rolland et.al. | 2511.23098 | null |
| 2025-11-27 | Modeling Romanized Hindi and Bengali: Dataset Creation and Multilingual LLM Integration | Kanchon Gharami et.al. | 2511.22769 | null |
| 2025-11-27 | Do You See What I Say? Generalizable Deepfake Detection based on Visual Speech Recognition | Maheswar Bora et.al. | 2511.22443 | null |
| 2025-11-27 | GLA-Grad++: An Improved Griffin-Lim Guided Diffusion Model for Speech Synthesis | Teysir Baoueb et.al. | 2511.22293 | null |
| 2025-11-16 | On the Cross-lingual Transferability of Pre-trained wav2vec2-based Models | Jonatas Grosman et.al. | 2511.21704 | null |
| 2025-11-26 | ASR Error Correction in Low-Resource Burmese with Alignment-Enhanced Transformers using Phonetic Features | Ye Bhone Lin et.al. | 2511.21088 | null |
| 2025-11-26 | CartoonSing: Unifying Human and Nonhuman Timbres in Singing Generation | Jionghao Han et.al. | 2511.21045 | null |
| 2025-11-26 | Towards Audio Token Compression in Large Audio Language Models | Saurabhchand Bhati et.al. | 2511.20973 | null |
| 2025-11-26 | SingingSDS: A Singing-Capable Spoken Dialogue System for Conversational Roleplay Applications | Jionghao Han et.al. | 2511.20972 | null |
| 2025-11-25 | Bridging the Language Gap: Synthetic Voice Diversity via Latent Mixup for Equitable Speech Recognition | Wesley Bian et.al. | 2511.20534 | null |
| 2025-11-25 | Modular Deep Learning Framework for Assistive Perception: Gaze, Affect, and Speaker Identification | Akshit Pramod Anchan et.al. | 2511.20474 | null |
| 2025-11-25 | Mispronunciation Detection and Diagnosis Without Model Training: A Retrieval-Based Approach | Huu Tuong Tu et.al. | 2511.20107 | null |
| 2025-11-25 | Continual Audio Deepfake Detection via Universal Adversarial Perturbation | Wangjie Li et.al. | 2511.19974 | null |
| 2025-11-24 | Explicit Tonal Tension Conditioning via Dual-Level Beam Search for Symbolic Music Generation | Maral Ebrahimzadeh et.al. | 2511.19342 | null |
| 2025-11-24 | Neural Architecture Search for Quantum Autoencoders | Hibah Agha et.al. | 2511.19246 | null |
| 2025-11-24 | Context-Aware Whisper for Arabic ASR Under Linguistic Varieties | Bashar Talafha et.al. | 2511.18774 | null |
| 2025-11-24 | AIRHILT: A Human-in-the-Loop Testbed for Multimodal Conflict Detection in Aviation | Omar Garib et.al. | 2511.18718 | null |
| 2025-11-23 | InstructAudio: Unified speech and music generation with natural language instruction | Chunyu Qiang et.al. | 2511.18487 | null |
| 2025-11-23 | A Multimodal Conversational Agent for Tabular Data Analysis | Mohammad Nour Al Awad et.al. | 2511.18405 | null |
| 2025-11-21 | Point of Order: Action-Aware LLM Persona Modeling for Realistic Civic Simulation | Scott Merrill et.al. | 2511.17813 | null |
| 2025-11-12 | Speech Recognition Model Improves Text-to-Speech Synthesis using Fine-Grained Reward | Guansu Wang et.al. | 2511.17555 | null |
| 2025-11-21 | MusicAIR: A Multimodal AI Music Generation Framework Powered by an Algorithm-Driven Core | Callie C. Liao et.al. | 2511.17323 | null |
| 2025-11-20 | Codec2Vec: Self-Supervised Speech Representation Learning Using Neural Speech Codecs | Wei-Cheng Tseng et.al. | 2511.16639 | null |
| 2025-11-20 | WER is Unaware: Assessing How ASR Errors Distort Clinical Understanding in Patient Facing Dialogue | Zachary Ellis et.al. | 2511.16544 | null |
| 2025-11-20 | SceneGuard: Training-Time Voice Protection with Scene-Consistent Audible Background Noise | Rui Sang et.al. | 2511.16114 | null |
| 2025-11-20 | Train Short, Infer Long: Speech-LLM Enables Zero-Shot Streamable Joint ASR and Diarization on Long Audio | Mohan Shi et.al. | 2511.16046 | null |
| 2025-11-19 | LargeSHS: A large-scale dataset of music adaptation | Chih-Pin Tan et.al. | 2511.15270 | null |
| 2025-11-19 | Aligning Generative Music AI with Human Preferences: Methods and Challenges | Dorien Herremans et.al. | 2511.15038 | null |
| 2025-11-06 | The Impact of Prosodic Segmentation on Speech Synthesis of Spontaneous Speech | Julio Cesar Galdino et.al. | 2511.14779 | null |
| 2025-11-18 | A Controllable Perceptual Feature Generative Model for Melody Harmonization via Conditional Variational Autoencoder | Dengyun Huang et.al. | 2511.14600 | null |
| 2025-11-18 | TTA: Transcribe, Translate and Alignment for Cross-lingual Speech Representation | Wei Liu et.al. | 2511.14410 | null |
| 2025-11-18 | AfriSpeech-MultiBench: A Verticalized Multidomain Multicountry Benchmark Suite for African Accented English ASR | Gabrial Zencha Ashungafac et.al. | 2511.14255 | null |
| 2025-11-18 | Listen Like a Teacher: Mitigating Whisper Hallucinations using Adaptive Layer Attention and Knowledge Distillation | Kumud Tripathi et.al. | 2511.14219 | null |
| 2025-11-17 | Human-centric Maintenance Process Through Integration of AI, Speech, and AR | Parul Khanna et.al. | 2511.13918 | null |
| 2025-11-05 | Emotion Recognition in Multi-Speaker Conversations through Speaker Identification, Knowledge Distillation, and Hierarchical Fusion | Xiao Li et.al. | 2511.13731 | null |
| 2025-11-17 | Alpha Divergence Losses for Biometric Verification | Dimitrios Koutsianos et.al. | 2511.13621 | null |
| 2025-11-17 | Toward Conversational Hungarian Speech Recognition: Introducing the BEA-Large and BEA-Dialogue Datasets | Máté Gedeon et.al. | 2511.13529 | null |
| 2025-11-17 | Spatial Blind Spot: Auditory Motion Perception Deficits in Audio LLMs | Zhe Sun et.al. | 2511.13273 | null |
| 2025-11-17 | Distinguishing Repetition Disfluency from Morphological Reduplication in Bangla ASR Transcripts: A Novel Corpus and Benchmarking Analysis | Zaara Zabeen Arpa et.al. | 2511.13159 | null |
| 2025-11-16 | Hi-Reco: High-Fidelity Real-Time Conversational Digital Humans | Hongbin Huang et.al. | 2511.12662 | null |
| 2025-11-15 | VoiceCraft-X: Unifying Multilingual, Voice-Cloning Speech Synthesis and Speech Editing | Zhisheng Zheng et.al. | 2511.12347 | null |
| 2025-11-15 | How Far Do SSL Speech Models Listen for Tone? Temporal Focus of Tone Representation under Low-resource Transfer | Minu Kim et.al. | 2511.12285 | null |
| 2025-11-15 | Fusionista2.0: Efficiency Retrieval System for Large-Scale Datasets | Huy M. Le et.al. | 2511.12255 | null |
| 2025-11-12 | Tighter Truncated Rectangular Prism Approximation for RNN Robustness Verification | Xingqi Lin et.al. | 2511.11699 | null |
| 2025-11-14 | Speech-Aware Long Context Pruning and Integration for Contextualized Automatic Speech Recognition | Yiming Rong et.al. | 2511.11139 | null |
| 2025-11-13 | Curved Worlds, Clear Boundaries: Generalizing Speech Deepfake Detection using Hyperbolic and Spherical Geometry Spaces | Farhan Sheth et.al. | 2511.10793 | null |
| 2025-11-13 | TEDxTN: A Three-way Speech Translation Corpus for Code-Switched Tunisian Arabic - English | Fethi Bougares et.al. | 2511.10780 | null |
| 2025-11-09 | Towards Fine-Grained Code-Switch Speech Translation with Semantic Space Alignment | Yan Gao et.al. | 2511.10670 | null |
| 2025-11-13 | VocalNet-M2: Advancing Low-Latency Spoken Language Modeling via Integrated Multi-Codebook Tokenization and Multi-Token Prediction | Yuhao Wang et.al. | 2511.10232 | null |
| 2025-11-13 | FabasedVC: Enhancing Voice Conversion with Text Modality Fusion and Phoneme-Level SSL Features | Wenyu Wang et.al. | 2511.10112 | null |
| 2025-11-13 | Time-Layer Adaptive Alignment for Speaker Similarity in Flow-Matching Based Zero-Shot TTS | Haoyu Li et.al. | 2511.09995 | null |
| 2025-11-12 | Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages | Omnilingual ASR team et.al. | 2511.09690 | null |
| 2025-11-12 | Video Echoed in Music: Semantic, Temporal, and Rhythmic Alignment for Video-to-Music Generation | Xinyi Tong et.al. | 2511.09585 | null |
| 2025-11-12 | End-to-end Contrastive Language-Speech Pretraining Model For Long-form Spoken Question Answering | Jiliang Hu et.al. | 2511.09282 | null |
| 2025-11-12 | Diff-V2M: A Hierarchical Conditional Diffusion Model with Explicit Rhythmic Modeling for Video-to-Music Generation | Shulei Ji et.al. | 2511.09090 | null |
| 2025-11-12 | Context-Aware Dynamic Chunking for Streaming Tibetan Speech Recognition | Chao Wang et.al. | 2511.09085 | null |
| 2025-11-12 | Towards Effective and Efficient Non-autoregressive decoders for Conformer and LLM-based ASR using Block-based Attention Mask | Tianzi Wang et.al. | 2511.09084 | null |
| 2025-11-11 | HQ-SVC: Towards High-Quality Zero-Shot Singing Voice Conversion in Low-Resource Scenarios | Bingsong Bai et.al. | 2511.08496 | null |
| 2025-11-11 | Melodia: Training-Free Music Editing Guided by Attention Probing in Diffusion Models | Yi Yang et.al. | 2511.08252 | null |
| 2025-11-11 | Quantizing Whisper-small: How design choices affect ASR performance | Arthur Söhler et.al. | 2511.08093 | null |
| 2025-11-11 | SpeechJudge: Towards Human-Level Judgment for Speech Naturalness | Xueyao Zhang et.al. | 2511.07931 | null |
| 2025-11-10 | Surgical Agent Orchestration Platform for Voice-directed Patient Data Interaction | Hyeryun Park et.al. | 2511.07392 | null |
| 2025-11-10 | Generating Piano Music with Transformers: A Comparative Study of Scale, Data, and Metrics | Jonathan Lehmkuhl et.al. | 2511.07268 | null |
| 2025-11-10 | Omni-AVSR: Towards Unified Multimodal Speech Recognition with Large Language Models | Umberto Cappellazzo et.al. | 2511.07253 | null |
| 2025-11-10 | Improving Remote Patient Monitoring Systems Using a Fog-based IoT Platform with Speech Recognition | Marc Jayson Baucas et.al. | 2511.07189 | null |
| 2025-11-10 | Conditional Diffusion as Latent Constraints for Controllable Symbolic Music Generation | Matteo Pettenó et.al. | 2511.07156 | null |
| 2025-11-10 | Generating Novel and Realistic Speakers for Voice Conversion | Meiying Melissa Chen et.al. | 2511.07135 | null |
| 2025-11-10 | On the Joint Minimization of Regularization Loss Functions in Deep Variational Bayesian Methods for Attribute-Controlled Symbolic Music Generation | Matteo Pettenó et.al. | 2511.07118 | null |
| 2025-11-10 | E2E-VGuard: Adversarial Prevention for Production LLM-based End-To-End Speech Synthesis | Zhisheng Zhang et.al. | 2511.07099 | null |
| 2025-11-10 | Metric Analysis for Spatial Semantic Segmentation of Sound Scenes | Mayank Mishra et.al. | 2511.07075 | null |
| 2025-11-10 | CLiFT-ASR: A Cross-Lingual Fine-Tuning Framework for Low-Resource Taiwanese Hokkien Speech Recognition | Hung-Yang Sung et.al. | 2511.06860 | null |
| 2025-11-07 | Persian Musical Instruments Classification Using Polyphonic Data Augmentation | Diba Hadi Esfangereh et.al. | 2511.05717 | null |
| 2025-11-02 | Factual and Musical Evaluation Metrics for Music Language Models | Daniel Chenyu Lin et.al. | 2511.05550 | null |
| 2025-11-06 | PromptSep: Generative Audio Separation via Multimodal Prompting | Yutong Wen et.al. | 2511.04623 | null |
| 2025-11-06 | MusRec: Zero-Shot Text-to-Music Editing via Rectified Flow and Diffusion Transformers | Ali Boudaghi et.al. | 2511.04376 | null |
| 2025-11-06 | Robustness of Minimum-Volume Nonnegative Matrix Factorization under an Expanded Sufficiently Scattered Condition | Giovanni Barbarino et.al. | 2511.04291 | null |
| 2025-11-06 | CantoASR: Prosody-Aware ASR-LALM Collaboration for Low-Resource Cantonese | Dazhong Chen et.al. | 2511.04139 | null |
| 2025-11-06 | Testing the Testers: Human-Driven Quality Assessment of Voice AI Testing Platforms | Miguel E. Andres et.al. | 2511.04133 | null |
| 2025-11-06 | WST: Weakly Supervised Transducer for Automatic Speech Recognition | Dongji Gao et.al. | 2511.04035 | null |
| 2025-11-06 | Accelerating scientific discovery with the common task framework | J. Nathan Kutz et.al. | 2511.04001 | null |
| 2025-11-06 | MIDI-LLM: Adapting Large Language Models for Text-to-MIDI Music Generation | Shih-Lun Wu et.al. | 2511.03942 | null |
| 2025-11-05 | SyMuPe: Affective and Controllable Symbolic Music Performance | Ilya Borovik et.al. | 2511.03425 | null |
| 2025-11-05 | Seeing What You Say: Expressive Image Generation from Speech | Jiyoung Lee et.al. | 2511.03423 | null |
| 2025-11-05 | Open Source State-Of-the-Art Solution for Romanian Speech Recognition | Gabriel Pirlogeanu et.al. | 2511.03361 | null |
| 2025-11-05 | TASU: Text-Only Alignment for Speech Understanding | Jing Peng et.al. | 2511.03310 | null |
| 2025-11-05 | How to Evaluate Speech Translation with Source-Aware Neural MT Metrics | Mauro Cettolo et.al. | 2511.03295 | null |
| 2025-11-04 | An unscented Kalman filter method for real time input-parameter-state estimation | Marios Impraimakis et.al. | 2511.02717 | null |
| 2025-11-04 | Augmenting Open-Vocabulary Dysarthric Speech Assessment with Human Perceptual Supervision | Kaimeng Jia et.al. | 2511.02270 | null |
| 2025-11-04 | Energy-Efficient Hardware Acceleration of Whisper ASR on a CGLA | Takuto Ando et.al. | 2511.02269 | null |
| 2025-11-03 | ADNAC: Audio Denoiser using Neural Audio Codec | Daniel Jimon et.al. | 2511.01773 | null |
| 2025-11-03 | SeaLLMs-Audio: Large Audio-Language Models for Southeast Asia | Chaoqun Liu et.al. | 2511.01670 | null |
| 2025-11-03 | The Ghost in the Keys: A Disklavier Demo for Human-AI Musical Co-Creativity | Louis Bradshaw et.al. | 2511.01663 | null |
| 2025-11-02 | WhisperVC: Target Speaker-Controllable Mandarin Whisper-to-Speech Conversion | Dong Liu et.al. | 2511.01056 | null |
| 2025-11-02 | MULTI-Bench: A Multi-Turn Interactive Benchmark for Assessing Emotional Intelligence ability of Spoken Dialogue Models | Yayue Deng et.al. | 2511.00850 | null |
| 2025-11-02 | Rhythm in the Air: Vision-based Real-Time Music Generation through Gestures | Barathi Subramanian et.al. | 2511.00793 | null |
| 2025-11-01 | More Than A Shortcut: A Hyperbolic Approach To Early-Exit Networks | Swapnil Bhosale et.al. | 2511.00641 | null |
| 2025-11-01 | On Improvisation and Open-Endedness: Insights for Experiential AI | Botao ‘Amber’ Hu et.al. | 2511.00529 | null |
| 2025-11-01 | Emotion Detection in Speech Using Lightweight and Transformer-Based Models: A Comparative and Ablation Study | Lucky Onyekwelu-Udoka et.al. | 2511.00402 | null |
| 2025-10-31 | NaturalVoices: A Large-Scale, Spontaneous and Emotional Podcast Dataset for Voice Conversion | Zongyang Du et.al. | 2511.00256 | null |
| 2025-10-31 | Holographic equation of state matched with hadron gas equation as a tool for the study of the quark-gluon plasma evolution | A. V. Anufriev et.al. | 2510.27541 | null |
| 2025-10-31 | Referee: Reference-aware Audiovisual Deepfake Detection | Hyemin Boo et.al. | 2510.27475 | null |
| 2025-10-31 | Pairwise and Attribute-Aware Decision Tree-Based Preference Elicitation for Cold-Start Recommendation | Alireza Gharahighehi et.al. | 2510.27342 | null |
| 2025-10-31 | Reconstructing Unseen Sentences from Speech-related Biosignals for Open-vocabulary Neural Communication | Deok-Seon Kim et.al. | 2510.27247 | null |
| 2025-10-31 | Reference Microphone Selection for Guided Source Separation based on the Normalized L-p Norm | Anselm Lohmann et.al. | 2510.27198 | null |
| 2025-10-31 | Expressive Range Characterization of Open Text-to-Audio Models | Jonathan Morse et.al. | 2510.27102 | null |
| 2025-10-30 | Are Online Sports Fan Communities Becoming More Offensive? A Quantitative Review of Topics, Trends, and Toxicity of r/PremierLeague | Muhammad Zeeshan Mazhar et.al. | 2510.27003 | null |
| 2025-10-30 | Overview of the MEDIQA-OE 2025 Shared Task on Medical Order Extraction from Doctor-Patient Consultations | Jean-Philippe Corbeil et.al. | 2510.26974 | null |
| 2025-10-29 | Multi-Representation Attention Framework for Underwater Bioacoustic Denoising and Recognition | Amine Razig et.al. | 2510.26838 | null |
| 2025-10-29 | Audio-Visual Speech Enhancement In Complex Scenarios With Separation And Dereverberation Joint Modeling | Jiarong Du et.al. | 2510.26825 | null |
| 2025-10-28 | Cross-Corpus Validation of Speech Emotion Recognition in Urdu using Domain-Knowledge Acoustic Features | Unzela Talpur et.al. | 2510.26823 | null |
| 2025-10-28 | See the Speaker: Crafting High-Resolution Talking Faces from Speech with Prior Guidance and Region Refinement | Jinting Wang et.al. | 2510.26819 | null |
| 2025-10-28 | GACA-DiT: Diffusion-based Dance-to-Music Generation with Genre-Adaptive Rhythm and Context-Aware Alignment | Jinting Wang et.al. | 2510.26818 | null |
| 2025-10-30 | HMM for short independent sequences: Multiple sequence Baum-Welch application | Margarita Cabrera-Bean et.al. | 2510.26532 | null |
| 2025-10-30 | UniTok-Audio: A Unified Audio Generation Framework via Generative Modeling on Discrete Codec Tokens | Chengwei Liu et.al. | 2510.26372 | null |
| 2025-10-30 | Language Models Are Borrowing-Blind: A Multilingual Evaluation of Loanword Identification across 10 Languages | Mérilin Sousa Silva et.al. | 2510.26254 | null |
| 2025-10-29 | Efficient Vocal Source Separation Through Windowed Sink Attention | Christodoulos Benetatos et.al. | 2510.25745 | null |
| 2025-10-29 | Lost in Phonation: Voice Quality Variation as an Evaluation Dimension for Speech Foundation Models | Harm Lameris et.al. | 2510.25577 | null |
| 2025-10-29 | Learning Disentangled Speech- and Expression-Driven Blendshapes for 3D Talking Face Animation | Yuxiang Mao et.al. | 2510.25234 | null |
| 2025-10-27 | SFMS-ALR: Script-First Multilingual Speech Synthesis with Adaptive Locale Resolution | Dharma Teja Donepudi et.al. | 2510.25178 | null |
| 2025-10-29 | Joint Analysis of Acoustic Scenes and Sound Events Based on Semi-Supervised Training of Sound Events With Partial Labels | Keisuke Imoto et.al. | 2510.25075 | null |
| 2025-10-29 | Evaluating Emotion Recognition in Spoken Language Models on Emotionally Incongruent Speech | Pedro Corrêa et.al. | 2510.25054 | null |
| 2025-10-28 | POWSM: A Phonetic Open Whisper-Style Speech Foundation Model | Chin-Jou Li et.al. | 2510.24992 | null |
| 2025-10-28 | The Narrative Continuity Test: A Conceptual Framework for Evaluating Identity Persistence in AI Systems | Stefano Natangelo et.al. | 2510.24831 | null |
| 2025-10-28 | Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation | Inclusion AI et.al. | 2510.24821 | null |
| 2025-10-28 | BEST-RQ-Based Self-Supervised Learning for Whisper Domain Adaptation | Raphaël Bagat et.al. | 2510.24570 | null |
| 2025-10-28 | Levée d’ambiguïtés par grammaires locales | Eric G. C. Laporte et.al. | 2510.24530 | null |
| 2025-10-28 | Audio Signal Processing Using Time Domain Mel-Frequency Wavelet Coefficient | Rinku Sebastian et.al. | 2510.24519 | null |
| 2025-10-28 | Sound Source Localization for Spatial Mapping of Surgical Actions in Dynamic Scenes | Jonas Hein et.al. | 2510.24332 | null |
| 2025-10-28 | V-SAT: Video Subtitle Annotation Tool | Arpita Kundu et.al. | 2510.24180 | null |
| 2025-10-28 | RegSpeech12: A Regional Corpus of Bengali Spontaneous Speech Across Dialects | Md. Rezuwan Hassan et.al. | 2510.24096 | null |
| 2025-10-27 | A Neural Model for Contextual Biasing Score Learning and Filtering | Wanting Huang et.al. | 2510.23849 | null |
| 2025-10-27 | Learning Interpretable Features in Audio Latent Spaces via Sparse Autoencoders | Nathan Paek et.al. | 2510.23802 | null |
| 2025-10-27 | SoulX-Podcast: Towards Realistic Long-form Podcasts with Dialectal and Paralinguistic Diversity | Hanke Xie et.al. | 2510.23541 | null |
| 2025-10-27 | LibriConvo: Simulating Conversations from Read Literature for ASR and Diarization | Máté Gedeon et.al. | 2510.23320 | null |
| 2025-10-27 | Arabic Little STT: Arabic Children Speech Recognition Dataset | Mouhand Alkadri et.al. | 2510.23319 | null |
| 2025-10-27 | Are ASR foundation models generalized enough to capture features of regional dialects for low-resource languages? | Tawsif Tashwar Dipto et.al. | 2510.23252 | null |
| 2025-10-27 | Treble10: A high-quality dataset for far-field speech recognition, dereverberation, and enhancement | Sarabeth S. Mullins et.al. | 2510.23141 | null |
| 2025-10-27 | Adapting Speech Foundation Models with Large Language Models for Unified Speech Recognition | Jing-Xuan Zhang et.al. | 2510.22961 | null |
| 2025-10-26 | LRW-Persian: Lip-reading in the Wild Dataset for Persian Language | Zahra Taghizadeh et.al. | 2510.22716 | null |
| 2025-10-26 | Mitigating Attention Sinks and Massive Activations in Audio-Visual Speech Recognition with LLMs | Anand et.al. | 2510.22603 | null |
| 2025-10-26 | UltraVoice: Scaling Fine-Grained Style-Controlled Speech Conversations for Spoken Dialogue Models | Wenming Tu et.al. | 2510.22588 | null |
| 2025-10-26 | A Sociophonetic Analysis of Racial Bias in Commercial ASR Systems Using the Pacific Northwest English Corpus | Michael Scott et.al. | 2510.22495 | null |
| 2025-10-26 | The Tonogenesis Continuum in Tibetan: A Computational Investigation | Siyu Liang et.al. | 2510.22485 | null |
| 2025-10-25 | M-CIF: Multi-Scale Alignment For CIF-Based Non-Autoregressive ASR | Ruixiang Mao et.al. | 2510.22172 | null |
| 2025-10-25 | Streaming Generation for Music Accompaniment | Yusong Wu et.al. | 2510.22105 | null |
| 2025-10-23 | GuitarFlow: Realistic Electric Guitar Synthesis From Tablatures via Flow Matching and Style Transfer | Jackson Loth et.al. | 2510.21872 | null |
| 2025-10-24 | StylePitcher: Generating Style-Following and Expressive Pitch Curves for Versatile Singing Tasks | Jingyue Huang et.al. | 2510.21685 | null |
| 2025-10-23 | ReFESS-QI: Reference-Free Evaluation For Speech Separation With Joint Quality And Intelligibility Scoring | Ari Frummer et.al. | 2510.21014 | null |
| 2025-10-21 | Can large audio language models understand child stuttering speech? speech summarization, and source separation | Chibuzor Okocha et.al. | 2510.20850 | null |
| 2025-10-23 | R2-SVC: Towards Real-World Robust and Expressive Zero-shot Singing Voice Conversion | Junjie Zheng et.al. | 2510.20677 | null |
| 2025-10-23 | Speaking Clearly: A Simplified Whisper-Based Codec for Low-Bitrate Speech Coding | Xin Zhang et.al. | 2510.20504 | null |
| 2025-10-23 | Vox-Evaluator: Enhancing Stability and Fidelity for Zero-shot TTS with A Multi-Level Evaluator | Hualei Wang et.al. | 2510.20210 | null |
| 2025-10-23 | SpeechAgent: An End-to-End Mobile Infrastructure for Speech Impairment Assistance | Haowei Lou et.al. | 2510.20113 | null |
| 2025-10-22 | Re-evaluating Minimum Bayes Risk Decoding for Automatic Speech Recognition | Yuu Jinnai et.al. | 2510.19471 | null |
| 2025-10-22 | FLASH Viterbi: Fast and Adaptive Viterbi Decoding for Modern Data Systems | Ziheng Deng et.al. | 2510.19301 | null |
| 2025-10-22 | Tibetan Language and AI: A Comprehensive Survey of Resources, Methods and Challenges | Cheng Huang et.al. | 2510.19144 | null |
| 2025-10-21 | Steering Autoregressive Music Generation with Recursive Feature Machines | Daniel Zhao et.al. | 2510.19127 | null |
| 2025-10-21 | StutterZero and StutterFormer: End-to-End Speech Conversion for Stuttering Transcription and Correction | Qianheng Xu et.al. | 2510.18938 | null |
| 2025-10-21 | RIR-Mega: a large-scale simulated room impulse response dataset for machine learning and room acoustics modeling | Mandip Goswami et.al. | 2510.18917 | null |
| 2025-10-21 | MLMA: Towards Multilingual ASR With Mamba-based Architectures | Mohamed Nabih Ali et.al. | 2510.18684 | null |
| 2025-10-21 | Noise-Conditioned Mixture-of-Experts Framework for Robust Speaker Verification | Bin Gu et.al. | 2510.18533 | null |
| 2025-10-21 | A Stage-Wise Learning Strategy with Fixed Anchors for Robust Speaker Verification | Bin Gu et.al. | 2510.18530 | null |
| 2025-10-20 | DELULU: Discriminative Embedding Learning Using Latent Units for Speaker-Aware Self-Supervised Speech Foundational Model | Massa Baali et.al. | 2510.17662 | null |
| 2025-10-19 | U-Codec: Ultra Low Frame-rate Neural Speech Codec for Fast High-fidelity Speech Generation | Xusheng Yang et.al. | 2510.16718 | null |
| 2025-10-19 | Zero- and One-Shot Data Augmentation for Sentence-Level Dysarthric Speech Recognition in Constrained Scenarios | Shiyao Wang et.al. | 2510.16700 | null |
| 2025-10-18 | Hallucination Benchmark for Speech Foundation Models | Alkis Koudounas et.al. | 2510.16567 | null |
| 2025-10-18 | Interpreting the Dimensions of Speaker Embedding Space | Mark Huckvale et.al. | 2510.16489 | null |
| 2025-10-18 | Probing the Hidden Talent of ASR Foundation Models for L2 English Oral Assessment | Fu-An Chao et.al. | 2510.16387 | null |
| 2025-10-18 | MuseTok: Symbolic Music Tokenization for Generation and Semantic Understanding | Jingyue Huang et.al. | 2510.16273 | null |
| 2025-10-17 | SpeechLLMs for Large-scale Contextualized Zero-shot Slot Filling | Kadri Hacioglu et.al. | 2510.15851 | null |
| 2025-10-17 | SpikeVox: Towards Energy-Efficient Speech Therapy Framework with Spike-driven Generative Language Models | Rachmad Vidya Wicaksana Putra et.al. | 2510.15566 | null |
| 2025-10-16 | RLAIF-SPA: Optimizing LLM-based Emotional Speech Synthesis via RLAIF | Qing Yang et.al. | 2510.14628 | null |
| 2025-10-16 | Do Joint Language-Audio Embeddings Encode Perceptual Timbre Semantics? | Qixin Deng et.al. | 2510.14249 | null |
| 2025-10-15 | Do Slides Help? Multi-modal Context for Automatic Transcription of Conference Talks | Supriti Sinhamahapatra et.al. | 2510.13979 | null |
| 2025-10-15 | Closing the Gap Between Text and Speech Understanding in LLMs | Santiago Cuervo et.al. | 2510.13632 | null |
| 2025-10-15 | UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE | Zhenyu Liu et.al. | 2510.13344 | null |
| 2025-10-15 | Two Heads Are Better Than One: Audio-Visual Speech Error Correction with Dual Hypotheses | Sungnyun Kim et.al. | 2510.13281 | null |
| 2025-10-14 | Continuous-Token Diffusion for Speaker-Referenced TTS in Multimodal LLMs | Xinlu He et.al. | 2510.12995 | null |
| 2025-10-14 | VCTR: A Transformer-Based Model for Non-parallel Voice Conversion | Maharnab Saikia et.al. | 2510.12964 | null |
| 2025-10-14 | A Critical Review of the Need for Knowledge-Centric Evaluation of Quranic Recitation | Mohammed Hilal Al-Kharusi et.al. | 2510.12858 | null |
| 2025-10-14 | Adaptive vector steering: A training-free, layer-wise intervention for hallucination mitigation in large audio and multimodal models | Tsung-En Lin et.al. | 2510.12851 | null |
| 2025-10-11 | Automatic Speech Recognition in the Modern Era: Architectures, Training, and Evaluation | Md. Nayeem et.al. | 2510.12827 | null |
| 2025-10-14 | Structured Sparsity and Weight-adaptive Pruning for Memory and Compute efficient Whisper models | Prasenjit K Mudi et.al. | 2510.12666 | null |
| 2025-10-13 | BridgeCode: A Dual Speech Representation Paradigm for Autoregressive Zero-Shot Text-to-Speech Synthesis | Jingyuan Xing et.al. | 2510.11646 | null |
| 2025-10-13 | Perturbation Self-Supervised Representations for Cross-Lingual Emotion TTS: Stage-Wise Modeling of Emotion and Speaker | Cheng Gong et.al. | 2510.11124 | null |
| 2025-10-13 | VCB Bench: An Evaluation Benchmark for Audio-Grounded Large Language Model Conversational Agents | Jiliang Hu et.al. | 2510.11098 | null |
| 2025-10-12 | ParsVoice: A Large-Scale Multi-Speaker Persian Speech Corpus for Text-to-Speech Synthesis | Mohammad Javad Ranjbar Kalahroodi et.al. | 2510.10774 | null |
| 2025-10-12 | End-to-end Speech Recognition with similar length speech and text | Peng Fan et.al. | 2510.10453 | null |
| 2025-10-12 | MRSAudio: A Large-Scale Multimodal Recorded Spatial Audio Dataset with Refined Annotations | Wenxiang Guo et.al. | 2510.10396 | null |
| 2025-10-11 | End-to-end Automatic Speech Recognition and Speech Translation: Integration of Speech Foundational Models and LLMs | Nam Luu et.al. | 2510.10329 | null |
| 2025-10-11 | ProGress: Structured Music Generation via Graph Diffusion and Hierarchical Music Analysis | Stephen Ni-Hahn et.al. | 2510.10249 | null |
| 2025-10-11 | SyncLipMAE: Contrastive Masked Pretraining for Audio-Visual Talking-Face Representation | Zeyu Ling et.al. | 2510.10069 | null |
| 2025-10-10 | Accent-Invariant Automatic Speech Recognition via Saliency-Driven Spectrogram Masking | Mohammad Hossein Sameti et.al. | 2510.09528 | null |
| 2025-10-10 | WildElder: A Chinese Elderly Speech Dataset from the Wild with Fine-Grained Manual Annotations | Hui Wang et.al. | 2510.09344 | null |
| 2025-10-10 | SynthVC: Leveraging Synthetic Data for End-to-End Low Latency Streaming Voice Conversion | Zhao Guo et.al. | 2510.09245 | null |
| 2025-10-10 | Effects of automotive microphone frequency response characteristics and noise conditions on speech and ASR quality – an experimental evaluation | Michele Buccoli et.al. | 2510.09236 | null |
| 2025-10-10 | FLToP CTC: Frame-Level Token Pruning via Relative Threshold for Efficient and Memory-Saving Decoding on Diverse Platforms | Atul Shree et.al. | 2510.09085 | null |
| 2025-10-10 | O_O-VC: Synthetic Data-Driven One-to-One Alignment for Any-to-Any Voice Conversion | Huu Tuong Tu et.al. | 2510.09061 | null |
| 2025-10-08 | Look before Transcription: End-to-End SlideASR with Visually-Anchored Policy Optimization | Rui Hu et.al. | 2510.08618 | null |
| 2025-10-09 | MeanVC: Lightweight and Streaming Zero-Shot Voice Conversion via Mean Flows | Guobin Ma et.al. | 2510.08392 | null |
| 2025-10-09 | DialoSpeech: Dual-Speaker Dialogue Generation with LLM and Flow Matching | Hanke Xie et.al. | 2510.08373 | null |
| 2025-10-09 | Pseudo2Real: Task Arithmetic for Pseudo-Label Correction in Automatic Speech Recognition | Yi-Cheng Lin et.al. | 2510.08047 | null |
| 2025-10-09 | IntMeanFlow: Few-step Speech Generation with Integral Velocity Distillation | Wei Wang et.al. | 2510.07979 | null |
| 2025-10-09 | VoiceAgentBench: Are Voice Assistants ready for agentic tasks? | Dhruv Jain et.al. | 2510.07978 | null |
| 2025-10-09 | Bloodroot: When Watermarking Turns Poisonous For Stealthy Backdoor | Kuan-Yu Chen et.al. | 2510.07909 | null |
| 2025-10-08 | How much speech data is necessary for ASR in African languages? An evaluation of data scaling in Kinyarwanda and Kikuyu | Benjamin Akera et.al. | 2510.07221 | null |
| 2025-10-08 | Making Machines Sound Sarcastic: LLM-Enhanced and Retrieval-Guided Sarcastic Speech Synthesis | Zhu Li et.al. | 2510.07096 | null |
| 2025-10-08 | Open ASR Leaderboard: Towards Reproducible and Transparent Multilingual and Long-Form Speech Recognition Evaluation | Vaibhav Srivastav et.al. | 2510.06961 | null |
| 2025-08-26 | A Framework for Robust Speaker Verification in Highly Noisy Environments Leveraging Both Noisy and Enhanced Audio | Adam Katav et.al. | 2508.18913 | null |
| 2025-08-20 | Improving Resource-Efficient Speech Enhancement via Neural Differentiable DSP Vocoder Refinement | Heitor R. Guimarães et.al. | 2508.14709 | null |
| 2025-08-18 | Integrating Feedback Loss from Bi-modal Sarcasm Detector for Sarcastic Speech Synthesis | Zhu Li et.al. | 2508.13028 | null |
| 2025-08-15 | EmoSSLSphere: Multilingual Emotional Speech Synthesis with Spherical Vectors and Discrete Speech Tokens | Joonyong Park et.al. | 2508.11273 | null |
| 2025-08-12 | Multi-Target Backdoor Attacks Against Speaker Recognition | Alexandrine Fortier et.al. | 2508.08559 | null |
| 2025-07-23 | AI Telephone Surveying: Automating Quantitative Data Collection with an AI Interviewer | Danny D. Leybzon et.al. | 2507.17718 | null |
| 2025-07-23 | Synthetic Voice Data for Automatic Speech Recognition in African Languages | Brian DeRenzi et.al. | 2507.17578 | null |
| 2025-07-23 | BoSS: Beyond-Semantic Speech | Qing Wang et.al. | 2507.17563 | null |
| 2025-07-23 | Clustering-based hard negative sampling for supervised contrastive speaker verification | Piotr Masztalski et.al. | 2507.17540 | null |
| 2025-07-23 | Application of Whisper in Clinical Practice: the Post-Stroke Speech Assessment during a Naming Task | Milena Davudova et.al. | 2507.17326 | null |
| 2025-07-23 | On Temporal Guidance and Iterative Refinement in Audio Source Separation | Tobias Morocutti et.al. | 2507.17297 | null |
| 2025-07-23 | Triple X: A LLM-Based Multilingual Speech Recognition System for the INTERSPEECH2025 MLC-SLM Challenge | Miaomiao Gao et.al. | 2507.17288 | null |
| 2025-07-22 | SplitMeanFlow: Interval Splitting Consistency in Few-Step Generative Modeling | Yi Guo et.al. | 2507.16884 | null |
| 2025-07-22 | Step-Audio 2 Technical Report | Boyong Wu et.al. | 2507.16632 | null |
| 2025-07-22 | An approach to measuring the performance of Automatic Speech Recognition (ASR) models in the context of Large Language Model (LLM) powered applications | Sujith Pulikodan et.al. | 2507.16456 | null |
| 2025-07-21 | Beyond Rate Coding: Surrogate Gradients Enable Spike Timing Learning in Spiking Neural Networks | Ziqiao Yu et.al. | 2507.16043 | null |
| 2025-07-21 | Mixture to Beamformed Mixture: Leveraging Beamformed Mixture as Weak-Supervision for Speech Enhancement and Noise-Robust ASR | Zhong-Qiu Wang et.al. | 2507.15229 | null |
| 2025-07-21 | EchoVoices: Preserving Generational Voices and Memories for Seniors and Children | Haiying Xu et.al. | 2507.15221 | null |
| 2025-07-21 | Exploiting Context-dependent Duration Features for Voice Anonymization Attack Systems | Natalia Tomashenko et.al. | 2507.15214 | null |
| 2025-07-20 | DMOSpeech 2: Reinforcement Learning for Duration Prediction in Metric-Optimized Speech Synthesis | Yinghao Aaron Li et.al. | 2507.14988 | null |
| 2025-07-19 | Conan: A Chunkwise Online Network for Zero-Shot Adaptive Voice Conversion | Yu Zhang et.al. | 2507.14534 | null |
| 2025-07-19 | Adapting Whisper for Lightweight and Efficient Automatic Speech Recognition of Children for On-device Edge Applications | Satwik Dutta et.al. | 2507.14451 | null |
| 2025-07-18 | Open Automatic Speech Recognition Models for Classical and Modern Standard Arabic | Lilit Grigoryan et.al. | 2507.13977 | null |
| 2025-07-18 | Optimizing ASR for Catalan-Spanish Code-Switching: A Comparative Analysis of Methodologies | Carlos Mena et.al. | 2507.13875 | null |
| 2025-07-17 | A Data-Centric Framework for Addressing Phonetic and Prosodic Challenges in Russian Speech Generative Models | Kirill Borodin et.al. | 2507.13563 | null |
| 2025-07-17 | Reading Between the Lines: Combining Pause Dynamics and Semantic Coherence for Automated Assessment of Thought Disorder | Feng Chen et.al. | 2507.13551 | null |
| 2025-07-18 | Automatically assessing oral narratives of Afrikaans and isiXhosa children | Retief Louw et.al. | 2507.13205 | null |
| 2025-07-17 | SHIELD: A Secure and Highly Enhanced Integrated Learning for Robust Deepfake Detection against Adversarial Attacks | Kutub Uddin et.al. | 2507.13170 | null |
| 2025-07-17 | NonverbalTTS: A Public English Corpus of Text-Aligned Nonverbal Vocalizations with Emotion Annotations for Text-to-Speech | Maksim Borisov et.al. | 2507.13155 | null |
| 2025-07-17 | UniSLU: Unified Spoken Language Understanding from Heterogeneous Cross-Task Datasets | Zhichao Sheng et.al. | 2507.12951 | null |
| 2025-07-17 | Enkidu: Universal Frequential Perturbation for Real-Time Audio Privacy Protection against Voice Deepfakes | Zhou Feng et.al. | 2507.12932 | null |
| 2025-07-17 | AudioJudge: Understanding What Works in Large Audio Model Based Speech Evaluation | Potsawee Manakul et.al. | 2507.12705 | null |
| 2025-07-17 | Task-Specific Audio Coding for Machines: Machine-Learned Latent Features Are Codes for That Machine | Anastasia Kuznetsova et.al. | 2507.12701 | null |
| 2025-07-16 | Improving Contextual ASR via Multi-grained Fusion with Large Language Models | Shilin Zhou et.al. | 2507.12252 | null |
| 2025-07-16 | EME-TTS: Unlocking the Emphasis and Emotion Link in Speech Synthesis | Haoxun Li et.al. | 2507.12015 | null |
| 2025-07-15 | Towards Scalable AASIST: Refining Graph Attention for Speech Deepfake Detection | Ivan Viakhirev et.al. | 2507.11777 | null |
| 2025-07-15 | FasTUSS: Faster Task-Aware Unified Source Separation | Francesco Paissan et.al. | 2507.11435 | null |
| 2025-07-15 | Towards Reliable Objective Evaluation Metrics for Generative Singing Voice Separation Models | Paul A. Bereuter et.al. | 2507.11427 | null |
| 2025-07-14 | WhisperKit: On-device Real-time ASR with Billion-Scale Transformers | Atila Orhon et.al. | 2507.10860 | null |
| 2025-07-14 | Supporting SENĆOTEN Language Documentation Efforts with Automatic Speech Recognition | Mengzhe Geng et.al. | 2507.10827 | null |
| 2025-07-14 | WildFX: A DAW-Powered Pipeline for In-the-Wild Audio FX Graph Modeling | Qihui Yang et.al. | 2507.10534 | null |
| 2025-07-14 | DQLoRA: A Lightweight Domain-Aware Denoising ASR via Adapter-guided Distillation | Yiru Yang et.al. | 2507.10313 | null |
| 2025-07-13 | The DKU System for Multi-Speaker Automatic Speech Recognition in MLC-SLM Challenge | Yuke Lin et.al. | 2507.09499 | null |
| 2025-07-12 | Voice Conversion for Lombard Speaking Style with Implicit and Explicit Acoustic Feature Conditioning | Dominika Woszczyk et.al. | 2507.09310 | null |
| 2025-07-12 | Can We Really Repurpose Multi-Speaker ASR Corpus for Speaker Diarization? | Shota Horiguchi et.al. | 2507.09226 | null |
| 2025-07-15 | Mixture of LoRA Experts with Multi-Modal and Multi-Granularity LLM Generative Error Correction for Accented Speech Recognition | Bingshen Mu et.al. | 2507.09116 | null |
| 2025-07-11 | SemAlignVC: Enhancing zero-shot timbre conversion using semantic alignment | Shivam Mehta et.al. | 2507.09070 | null |
| 2025-07-11 | The Impact of Automatic Speech Transcription on Speaker Attribution | Cristina Aggazzotti et.al. | 2507.08660 | null |
| 2025-07-11 | Unlocking Speech Instruction Data Potential with Query Rewriting | Yonghua Hei et.al. | 2507.08603 | null |
| 2025-07-11 | ILT-Iterative LoRA Training through Focus-Feedback-Fix for Multilingual Speech Recognition | Qingliang Meng et.al. | 2507.08477 | null |
| 2025-07-11 | Active Learning for Text-to-Speech Synthesis with Informative Sample Collection | Kentaro Seki et.al. | 2507.08319 | null |
| 2025-07-11 | RawTFNet: A Lightweight CNN Architecture for Speech Anti-spoofing | Yang Xiao et.al. | 2507.08227 | null |
| 2025-07-10 | DARAS: Dynamic Audio-Room Acoustic Synthesis for Blind Room Impulse Response Estimation | Chunxi Wang et.al. | 2507.08135 | null |
| 2025-07-10 | Modèle physique variationnel pour l’estimation de réponses impulsionnelles de salles | Louis Lalay et.al. | 2507.08051 | null |
| 2025-07-10 | Edge-ASR: Towards Low-Bit Quantization of Automatic Speech Recognition Models | Chen Feng et.al. | 2507.07877 | null |
| 2025-07-10 | SecureSpeech: Prompt-based Speaker and Content Protection | Belinda Soh Hui Hui et.al. | 2507.07799 | null |
| 2025-07-10 | Code-Switching in End-to-End Automatic Speech Recognition: A Systematic Literature Review | Maha Tufail Agro et.al. | 2507.07741 | null |
| 2025-07-08 | Deep Feed-Forward Neural Network for Bangla Isolated Speech Recognition | Dipayan Bhadra et.al. | 2507.07068 | null |
| 2025-07-09 | Speech Tokenizer is Key to Consistent Representation | Wonjin Jung et.al. | 2507.06802 | null |
| 2025-07-09 | Exploring State-Space-Model based Language Model in Music Generation | Wei-Jaw Lee et.al. | 2507.06674 | null |
| 2025-07-09 | Learning Japanese with Jouzu: Interaction Outcomes with Stylized Dialogue Fictional Agents | Zackary Rackauckas et.al. | 2507.06483 | null |
| 2025-07-08 | Speech Quality Assessment Model Based on Mixture of Experts: System-Level Performance Enhancement and Utterance-Level Challenge Analysis | Xintong Hu et.al. | 2507.06116 | null |
| 2025-07-08 | VisualSpeaker: Visually-Guided 3D Avatar Lip Synthesis | Alexandre Symeonidis-Herzig et.al. | 2507.06060 | null |
| 2025-07-08 | MusiScene: Leveraging MU-LLaMA for Scene Imagination and Enhanced Video Background Music Generation | Fathinah Izzati et.al. | 2507.05894 | null |
| 2025-07-08 | How to Evaluate Automatic Speech Recognition: Comparing Different Performance and Bias Measures | Tanvina Patel et.al. | 2507.05885 | null |
| 2025-07-08 | ContextASR-Bench: A Massive Contextual Speech Recognition Benchmark | He Wang et.al. | 2507.05727 | null |
| 2025-07-08 | Omni-Router: Sharing Routing Decisions in Sparse Mixture-of-Experts for Speech Recognition | Zijin Gu et.al. | 2507.05724 | null |
| 2025-07-07 | EXPOTION: Facial Expression and Motion Control for Multimodal Music Generation | Fathinah Izzati et.al. | 2507.04955 | null |
| 2025-07-07 | Adaptive Slimming for Scalable and Efficient Speech Enhancement | Riccardo Miccini et.al. | 2507.04879 | null |
| 2025-07-07 | Fast-VGAN: Lightweight Voice Conversion with Explicit Control of F0 and Duration Parameters | Mathilde Abrassart et.al. | 2507.04817 | null |
| 2025-07-07 | Multi-Step Prediction and Control of Hierarchical Emotion Distribution in Text-to-Speech Synthesis | Sho Inoue et.al. | 2507.04598 | null |
| 2025-07-06 | TTS-CtrlNet: Time varying emotion aligned text-to-speech generation with ControlNet | Jaeseok Jeong et.al. | 2507.04349 | null |
| 2025-07-05 | Prosody Labeling with Phoneme-BERT and Speech Foundation Models | Tomoki Koriyama et.al. | 2507.03912 | null |
| 2025-07-04 | Improving Low-Resource Dialect Classification Using Retrieval-based Voice Conversion | Lea Fischbach et.al. | 2507.03641 | null |
| 2025-07-04 | MusGO: A Community-Driven Framework For Assessing Openness in Music-Generative AI | Roser Batlle-Roca et.al. | 2507.03599 | null |
| 2025-07-08 | SHNU Multilingual Conversational Speech Recognition System for INTERSPEECH 2025 MLC-SLM Challenge | Yuxiang Mei et.al. | 2507.03343 | null |
| 2025-07-03 | DeepGesture: A conversational gesture synthesis system based on emotions and semantics | Thanh Hoang-Minh et.al. | 2507.03147 | null |
| 2025-07-03 | Multi-agent Auditory Scene Analysis | Caleb Rascon et.al. | 2507.02755 | null |
| 2025-07-03 | Open-Source System for Multilingual Translation and Cloned Speech Synthesis | Mateo Cámara et.al. | 2507.02530 | null |
| 2025-07-03 | A Cookbook for Community-driven Data Collection of Impaired Speech in LowResource Languages | Sumaya Ahmed Salihs et.al. | 2507.02428 | null |
| 2025-07-03 | Benchmarking Akan ASR Models Across Domain-Specific Datasets: A Comparative Evaluation of Performance, Scalability, and Adaptability | Mark Atta Mensah et.al. | 2507.02407 | null |
| 2025-07-02 | Analyzing and Improving Speaker Similarity Assessment for Speech Synthesis | Marc-André Carbonneau et.al. | 2507.02176 | null |
| 2025-07-02 | Pronunciation Editing for Finnish Speech using Phonetic Posteriorgrams | Zirui Li et.al. | 2507.02115 | null |
| 2025-07-02 | Adaptability of ASR Models on Low-Resource Language: A Comparative Study of Whisper and Wav2Vec-BERT on Bangla | Md Sazzadul Islam Ridoy et.al. | 2507.01931 | null |
| 2025-07-02 | First Steps Towards Voice Anonymization for Code-Switching Speech | Sarina Meyer et.al. | 2507.01765 | null |
| 2025-07-02 | PERTINENCE: Input-based Opportunistic Neural Network Dynamic Execution | Omkar Shende et.al. | 2507.01695 | null |
| 2025-07-02 | Voice Conversion for Likability Control via Automated Rating of Speech Synthesis Corpora | Hitoshi Suda et.al. | 2507.01356 | null |
| 2025-07-02 | Learning from Random Subspace Exploration: Generalized Test-Time Augmentation with Self-supervised Distillation | Andrei Jelea et.al. | 2507.01347 | null |
| 2025-07-02 | AI Meets Maritime Training: Precision Analytics for Enhanced Safety and Performance | Vishakha Lall et.al. | 2507.01274 | null |
| 2025-07-01 | MambAttention: Mamba with Multi-Head Attention for Generalizable Single-Channel Speech Enhancement | Nikolai Lund Kühne et.al. | 2507.00966 | link |
| 2025-07-02 | Multi-interaction TTS toward professional recording reproduction | Hiroki Kanagawa et.al. | 2507.00808 | null |
| 2025-07-01 | Rectifying Magnitude Neglect in Linear Attention | Qihang Fan et.al. | 2507.00698 | link |
| 2025-07-01 | Audio-3DVG: Unified Audio - Point Cloud Fusion for 3D Visual Grounding | Duc Cao-Dinh et.al. | 2507.00669 | null |
| 2025-06-29 | You Sound a Little Tense: L2 Tailored Clear TTS Using Durational Vowel Properties | Paige Tuttösí et.al. | 2506.23367 | null |
| 2025-06-29 | The Florence Price Art Song Dataset and Piano Accompaniment Generator | Tao-Tao He et.al. | 2506.23130 | null |
| 2025-06-29 | TOMI: Transforming and Organizing Music Ideas for Multi-Track Compositions with Full-Song Structure | Qi He et.al. | 2506.23094 | null |
| 2025-06-29 | Research on Comprehensive Classroom Evaluation System Based on Multiple AI Models | Cong Xie et.al. | 2506.23079 | null |
| 2025-06-28 | Mind the Gap: Entity-Preserved Context-Aware ASR Structured Transcriptions | Duygu Altinok et.al. | 2506.22858 | null |
| 2025-06-28 | Boosting CTC-Based ASR Using LLM-Based Intermediate Loss Regularization | Duygu Altinok et.al. | 2506.22846 | null |
| 2025-06-28 | A Self-Training Approach for Whisper to Enhance Long Dysarthric Speech Recognition | Shiyao Wang et.al. | 2506.22810 | null |
| 2025-06-27 | Speaker Targeting via Self-Speaker Adaptation for Multi-talker ASR | Weiqing Wang et.al. | 2506.22646 | null |
| 2025-06-27 | Cross-lingual Data Selection Using Clip-level Acoustic Similarity for Enhancing Low-resource Automatic Speech Recognition | Shunsuke Mitsumori et.al. | 2506.22194 | null |
| 2025-06-27 | SAGE: Spliced-Audio Generated Data for Enhancing Foundational Models in Low-Resource Arabic-English Code-Switched Speech Recognition | Muhammad Umar Farooq et.al. | 2506.22143 | null |
| 2025-06-27 | Evaluating Pointing Gestures for Target Selection in Human-Robot Collaboration | Noora Sassali et.al. | 2506.22116 | null |
| 2025-06-27 | Robust and Efficient Autoregressive Speech Synthesis with Dynamic Chunk-wise Prediction Policy | Bohan Li et.al. | 2506.22023 | null |
| 2025-06-27 | Analyzing and Fine-Tuning Whisper Models for Multilingual Pilot Speech Transcription in the Cockpit | Kartheek Kumar Reddy Nareddy et.al. | 2506.21990 | null |
| 2025-06-26 | Exploring Adapter Design Tradeoffs for Low Resource Music Generation | Atharva Mehta et.al. | 2506.21298 | null |
| 2025-06-26 | A Multi-Stage Framework for Multimodal Controllable Speech Synthesis | Rui Niu et.al. | 2506.20945 | null |
| 2025-06-25 | Multimodal Representation Learning and Fusion | Qihang Jin et.al. | 2506.20494 | null |
| 2025-06-25 | Lightweight Target-Speaker-Based Overlap Transcription for Practical Streaming ASR | Aleš Pražák et.al. | 2506.20288 | null |
| 2025-06-24 | Accurate, fast, cheap: Choose three. Replacing Multi-Head-Attention with Bidirectional Recurrent Attention for Long-Form ASR | Martin Ratajczak et.al. | 2506.19761 | null |
| 2025-06-23 | A Fourier Explanation of AI-music Artifacts | Darius Afchar et.al. | 2506.19108 | null |
| 2025-06-23 | Benchmarking Music Generation Models and Metrics via Human Preference Studies | Florian Grötschla et.al. | 2506.19085 | null |
| 2025-06-23 | Let Your Video Listen to Your Music! | Xinyu Zhang et.al. | 2506.18881 | null |
| 2025-06-24 | MuseControlLite: Multifunctional Music Generation with Lightweight Conditioners | Fang-Duo Tsai et.al. | 2506.18729 | link |
| 2025-06-23 | Context Biasing for Pronunciations-Orthography Mismatch in Automatic Speech Recognition | Christian Huber et.al. | 2506.18703 | null |
| 2025-06-23 | Evaluating Multichannel Speech Enhancement Algorithms at the Phoneme Scale Across Genders | Nasser-Eddine Monir et.al. | 2506.18691 | null |
| 2025-06-23 | End-to-End Spoken Grammatical Error Correction | Mengjie Qian et.al. | 2506.18532 | null |
| 2025-06-23 | AI-Generated Song Detection via Lyrics Transcripts | Markus Frohmann et.al. | 2506.18488 | null |
| 2025-06-23 | Selecting N-lowest scores for training MOS prediction models | Yuto Kondo et.al. | 2506.18326 | null |
| 2025-06-23 | Large-Scale Training Data Attribution for Music Generative Models via Unlearning | Woosung Choi et.al. | 2506.18312 | null |
| 2025-06-23 | Rethinking Mean Opinion Scores in Speech Quality Assessment: Aggregation through Quantized Distribution Fitting | Yuto Kondo et.al. | 2506.18307 | null |
| 2025-06-23 | JIS: A Speech Corpus of Japanese Idol Speakers with Various Speaking Styles | Yuto Kondo et.al. | 2506.18296 | null |
| 2025-06-20 | Simultaneous Translation with Offline Speech and LLM Models in CUNI Submission to IWSLT 2025 | Dominik Macháček et.al. | 2506.17077 | null |
| 2025-06-20 | Instituto de Telecomunicações at IWSLT 2025: Aligning Small-Scale Speech and Language Models for Speech-to-Text Learning | Giuseppe Attanasio et.al. | 2506.17019 | null |
| 2025-06-20 | State-Space Models in Efficient Whispered and Multi-dialect Speech Recognition | Aref Farhadipour et.al. | 2506.16969 | null |
| 2025-06-20 | Hybrid-Sep: Language-queried audio source separation via pre-trained Model Fusion and Adversarial Diffusion Training | Jianyuan Feng et.al. | 2506.16833 | null |
| 2025-06-20 | RapFlow-TTS: Rapid and High-Fidelity Text-to-Speech with Improved Consistency Flow Matching | Hyun Joon Park et.al. | 2506.16741 | link |
| 2025-06-20 | LM-SPT: LM-Aligned Semantic Distillation for Speech Tokenization | Daejin Jo et.al. | 2506.16738 | null |
| 2025-06-20 | V-CASS: Vision-context-aware Expressive Speech Synthesis for Enhancing User Understanding of Videos | Qixin Wang et.al. | 2506.16716 | null |
| 2025-06-19 | Weight Factorization and Centralization for Continual Learning in Speech Recognition | Enes Yavuz Ugan et.al. | 2506.16574 | null |
| 2025-06-19 | Automatic Speech Recognition Biases in Newcastle English: an Error Analysis | Dana Serditova et.al. | 2506.16558 | null |
| 2025-06-19 | InstructTTSEval: Benchmarking Complex Natural-Language Instruction Following in Text-to-Speech Systems | Kexin Huang et.al. | 2506.16381 | link |
| 2025-06-18 | Diff-TONE: Timestep Optimization for iNstrument Editing in Text-to-Music Diffusion Models | Teysir Baoueb et.al. | 2506.15530 | null |
| 2025-06-18 | Exploiting Music Source Separation for Automatic Lyrics Transcription with Whisper | Jaza Syed et.al. | 2506.15514 | link |
| 2025-06-18 | Foundation of Affective Computing and Interaction | Changzeng Fu et.al. | 2506.15497 | null |
| 2025-06-18 | An accurate and revised version of optical character recognition-based speech synthesis using LabVIEW | Prateek Mehta et.al. | 2506.15029 | null |
| 2025-06-17 | A Comparative Evaluation of Deep Learning Models for Speech Enhancement in Real-World Noisy Environments | Md Jahangir Alam Khondkar et.al. | 2506.15000 | link |
| 2025-06-17 | Thinking in Directivity: Speech Large Language Model for Multi-Talker Directional Speech Recognition | Jiamin Xie et.al. | 2506.14973 | null |
| 2025-06-17 | Unifying Streaming and Non-streaming Zipformer-based ASR | Bidisha Sharma et.al. | 2506.14434 | null |
| 2025-06-17 | Investigation of Zero-shot Text-to-Speech Models for Enhancing Short-Utterance Speaker Verification | Yiyang Zhao et.al. | 2506.14226 | null |
| 2025-06-17 | Improving Practical Aspects of End-to-End Multi-Talker Speech Recognition for Online and Offline Scenarios | Aswin Shanmugam Subramanian et.al. | 2506.14204 | null |
| 2025-06-17 | AsyncSwitch: Asynchronous Text-Speech Adaptation for Code-Switched ASR | Tuan Nguyen et.al. | 2506.14190 | null |
| 2025-06-17 | Pushing the Performance of Synthetic Speech Detection with Kolmogorov-Arnold Networks and Self-Supervised Learning Models | Tuan Dat Phuong et.al. | 2506.14153 | null |
| 2025-06-16 | Qwen vs. Gemma Integration with Whisper: A Comparative Study in Multilingual SpeechLLM Systems | Tuan Nguyen et.al. | 2506.13596 | null |
| 2025-06-16 | From Flat to Feeling: A Feasibility and Impact Study on Dynamic Facial Emotions in AI-Generated Avatars | Pegah Salehi et.al. | 2506.13477 | null |
| 2025-06-16 | BUT System for the MLC-SLM Challenge | Alexander Polok et.al. | 2506.13414 | link |
| 2025-06-16 | Bi-directional Context-Enhanced Speech Large Language Models for Multilingual Conversational ASR | Yizhou Peng et.al. | 2506.13396 | null |
| 2025-06-16 | NTU Speechlab LLM-Based Multilingual ASR System for Interspeech MLC-SLM Challenge 2025 | Yizhou Peng et.al. | 2506.13339 | null |
| 2025-06-16 | Seewo’s Submission to MLC-SLM: Lessons learned from Speech Reasoning Language Models | Bo Li et.al. | 2506.13300 | null |
| 2025-06-16 | Personalizable Long-Context Symbolic Music Infilling with MIDI-RWKV | Christian Zhou-Zheng et.al. | 2506.13001 | link |
| 2025-06-15 | SC-SOT: Conditioning the Decoder on Diarized Speaker Information for End-to-End Overlapped Speech Recognition | Yuta Hirano et.al. | 2506.12672 | null |
| 2025-06-14 | Video-Guided Text-to-Music Generation Using Public Domain Movie Collections | Haven Kim et.al. | 2506.12573 | null |
| 2025-06-14 | Mitigating Non-Target Speaker Bias in Guided Speaker Embedding | Shota Horiguchi et.al. | 2506.12500 | null |
| 2025-06-13 | Enabling automatic transcription of child-centered audio recordings from real-world environments | Daniil Kocharov et.al. | 2506.11747 | null |
| 2025-06-13 | Lightweight and Robust Multi-Channel End-to-End Speech Recognition with Spherical Harmonic Transform | Xiangzhu Kong et.al. | 2506.11630 | null |
| 2025-06-13 | (SimPhon Speech Test): A Data-Driven Method for In Silico Design and Validation of a Phonetically Balanced Speech Test | Stefan Bleeck et.al. | 2506.11620 | null |
| 2025-06-13 | Machine Unlearning for Robust DNNs: Attribution-Guided Partitioning and Neuron Pruning in Noisy Environments | Deliang Jin et.al. | 2506.11615 | null |
| 2025-06-12 | Advances in Small-Footprint Keyword Spotting: A Comprehensive Review of Efficient Models and Algorithms | Soumen Garai et.al. | 2506.11169 | null |
| 2025-06-12 | Improving Named Entity Transcription with Contextual LLM-based Revision | Viet Anh Trinh et.al. | 2506.10779 | null |
| 2025-06-12 | BNMusic: Blending Environmental Noises into Personalized Music | Chi Zuo et.al. | 2506.10754 | null |
| 2025-06-12 | FairASR: Fair Audio Contrastive Learning for Automatic Speech Recognition | Jongsuk Kim et.al. | 2506.10747 | null |
| 2025-06-12 | Joint ASR and Speaker Role Tagging with Serialized Output Training | Anfeng Xu et.al. | 2506.10349 | null |
| 2025-06-12 | RT-VC: Real-Time Zero-Shot Voice Conversion with Speech Articulatory Coding | Yisi Liu et.al. | 2506.10289 | null |
| 2025-06-11 | Fine-Grained control over Music Generation with Activation Steering | Dipanshu Panda et.al. | 2506.10225 | null |
| 2025-06-11 | UmbraTTS: Adapting Text-to-Speech to Environmental Contexts with Flow Matching | Neta Glazer et.al. | 2506.09874 | null |
| 2025-06-11 | Regularizing Learnable Feature Extraction for Automatic Speech Recognition | Peter Vieting et.al. | 2506.09804 | null |
| 2025-06-11 | Training-Free Voice Conversion with Factorized Optimal Transport | Alexander Lobashev et.al. | 2506.09709 | link |
| 2025-06-11 | You Are What You Say: Exploiting Linguistic Content for VoicePrivacy Attacks | Ünal Ege Gaznepoglu et.al. | 2506.09521 | null |
| 2025-06-11 | OWSM-Biasing: Contextualizing Open Whisper-Style Speech Models for Automatic Speech Recognition with Dynamic Vocabulary | Yui Sudo et.al. | 2506.09448 | null |
| 2025-06-11 | CoLMbo: Speaker Language Model for Descriptive Profiling | Massa Baali et.al. | 2506.09375 | null |
| 2025-06-11 | OmniDRCA: Parallel Speech-Text Foundation Model via Dual-Resolution Speech Representations and Contrastive Alignment | Chao-Hong Tan et.al. | 2506.09349 | null |
| 2025-06-10 | SimClass: A Classroom Speech Dataset Generated via Game Engine Simulation For Automatic Speech Recognition Research | Ahmed Adel Attia et.al. | 2506.09206 | null |
| 2025-06-10 | FROST-EMA: Finnish and Russian Oral Speech Dataset of Electromagnetic Articulography Measurements with L1, L2 and Imitated L2 Accents | Satu Hopponen et.al. | 2506.08981 | null |
| 2025-06-10 | Step-Audio-AQAA: a Fully End-to-End Expressive Large Audio Language Model | Ailin Huang et.al. | 2506.08967 | null |
| 2025-06-09 | Uncovering the Functional Roles of Nonlinearity in Memory | Manuel Brenner et.al. | 2506.07919 | null |
| 2025-06-09 | Unified Semi-Supervised Pipeline for Automatic Speech Recognition | Nune Tadevosyan et.al. | 2506.07659 | null |
| 2025-06-09 | Transcript-Prompted Whisper with Dictionary-Enhanced Decoding for Japanese Speech Annotation | Rui Hu et.al. | 2506.07646 | null |
| 2025-06-09 | SongBloom: Coherent Song Generation via Interleaved Autoregressive Sketching and Diffusion Refinement | Chenyu Yang et.al. | 2506.07634 | link |
| 2025-06-09 | Bayesian Learning for Domain-Invariant Speaker Verification and Anti-Spoofing | Jin Li et.al. | 2506.07536 | null |
| 2025-06-09 | LeVo: High-Quality Song Generation with Multi-Preference Alignment | Shun Lei et.al. | 2506.07520 | link |
| 2025-06-09 | Speaker-Distinguishable CTC: Learning Speaker Distinction Using CTC for Multi-Talker Speech Recognition | Asahi Sakuma et.al. | 2506.07515 | null |
| 2025-06-09 | DeRAGEC: Denoising Named Entity Candidates with Synthetic Rationale for ASR Error Correction | Solee Im et.al. | 2506.07510 | null |
| 2025-06-09 | Towards Energy-Efficient and Low-Latency Voice-Controlled Smart Homes: A Proposal for Offline Speech Recognition and IoT Integration | Peng Huang et.al. | 2506.07494 | null |
| 2025-06-08 | Speech Recognition on TV Series with Video-guided Post-Correction | Haoyuan Yang et.al. | 2506.07323 | null |
| 2025-06-06 | Lightweight Prompt Biasing for Contextualized End-to-End ASR Systems | Bo Ren et.al. | 2506.06252 | null |
| 2025-06-06 | Phonetically-Augmented Discriminative Rescoring for Voice Search Error Correction | Christophe Van Gysel et.al. | 2506.06117 | null |
| 2025-06-06 | CO-VADA: A Confidence-Oriented Voice Augmentation Debiasing Approach for Fair Speech Emotion Recognition | Yun-Shao Tsai et.al. | 2506.06071 | null |
| 2025-06-06 | Diarization-Aware Multi-Speaker Automatic Speech Recognition via Large Language Models | Yuke Lin et.al. | 2506.05796 | null |
| 2025-06-06 | Bridging the Modality Gap: Softly Discretizing Audio Representation for LLM-based Automatic Speech Recognition | Mu Yang et.al. | 2506.05706 | null |
| 2025-06-06 | Low-Resource Domain Adaptation for Speech LLMs via Text-Only Fine-Tuning | Yangui Fang et.al. | 2506.05671 | null |
| 2025-06-05 | Improving AI-generated music with user-guided training | Vishwa Mohan Singh et.al. | 2506.04852 | null |
| 2025-06-05 | LLM-based phoneme-to-grapheme for phoneme-based speech recognition | Te Ma et.al. | 2506.04711 | null |
| 2025-06-05 | ViCocktail: Automated Multi-Modal Data Collection for Vietnamese Audio-Visual Speech Recognition | Thai-Binh Nguyen et.al. | 2506.04635 | null |
| 2025-06-05 | LESS: Large Language Model Enhanced Semi-Supervised Learning for Speech Foundational Models | Wen Ding et.al. | 2506.04586 | null |
| 2025-06-04 | French Listening Tests for the Assessment of Intelligibility, Quality, and Identity of Body-Conducted Speech Enhancement | Thomas Joubaud et.al. | 2506.04495 | null |
| 2025-06-04 | Effects of Speaker Count, Duration, and Accent Diversity on Zero-Shot Accent Robustness in Low-Resource ASR | Zheng-Xin Yong et.al. | 2506.04364 | null |
| 2025-06-04 | HiFiTTS-2: A Large-Scale High Bandwidth Speech Dataset | Ryan Langman et.al. | 2506.04152 | null |
| 2025-06-04 | A Novel Data Augmentation Approach for Automatic Speaking Assessment on Opinion Expressions | Chung-Chun Wang et.al. | 2506.04077 | null |
| 2025-06-04 | Towards Better Disentanglement in Non-Autoregressive Zero-Shot Expressive Voice Conversion | Seymanur Akti et.al. | 2506.04013 | null |
| 2025-06-04 | MFLA: Monotonic Finite Look-ahead Attention for Streaming Speech Recognition | Yinfeng Xia et.al. | 2506.03722 | null |
| 2025-06-04 | Comparative Analysis of Fast and High-Fidelity Neural Vocoders for Low-Latency Streaming Synthesis in Resource-Constrained Environments | Reo Yoneyama et.al. | 2506.03554 | null |
| 2025-06-04 | Local Equivariance Error-Based Metrics for Evaluating Sampling-Frequency-Independent Property of Neural Network | Kanami Imamura et.al. | 2506.03550 | null |
| 2025-06-03 | Controllable Text-to-Speech Synthesis with Masked-Autoencoded Style-Rich Representation | Yongqi Wang et.al. | 2506.02997 | null |
| 2025-06-03 | A Multi-Dialectal Dataset for German Dialect ASR and Dialect-to-Standard Speech Translation | Verena Blaschke et.al. | 2506.02894 | link |
| 2025-06-03 | CapSpeech: Enabling Downstream Applications in Style-Captioned Text-to-Speech | Helin Wang et.al. | 2506.02863 | link |
| 2025-06-05 | DGMO: Training-Free Audio Source Separation through Diffusion-Guided Mask Optimization | Geonyoung Lee et.al. | 2506.02858 | null |
| 2025-06-03 | On the influence of language similarity in non-target speaker verification trials | Paul M. Reuter et.al. | 2506.02777 | null |
| 2025-06-03 | Prompt-Unseen-Emotion: Zero-shot Expressive Speech Synthesis with Prompt-LLM Contextual Knowledge for Mixed Emotions | Xiaoxue Gao et.al. | 2506.02742 | null |
| 2025-06-03 | Overcoming Data Scarcity in Multi-Dialectal Arabic ASR via Whisper Fine-Tuning | Ömer Tarik Özyilmaz et.al. | 2506.02627 | null |
| 2025-06-03 | On the Language and Gender Biases in PSTN, VoIP and Neural Audio Codecs | Kemal Altwlkany et.al. | 2506.02545 | null |
| 2025-06-03 | DnR-nonverbal: Cinematic Audio Source Separation Dataset Containing Non-Verbal Sounds | Takuya Hasumi et.al. | 2506.02499 | null |
| 2025-06-03 | SOVA-Bench: Benchmarking the Speech Conversation Ability for LLM-based Voice Assistant | Yixuan Hou et.al. | 2506.02457 | null |
| 2025-05-30 | Running Conventional Automatic Speech Recognition on Memristor Hardware: A Simulated Approach | Nick Rossenbach et.al. | 2505.24721 | null |
| 2025-05-30 | Voice Conversion Improves Cross-Domain Robustness for Spoken Arabic Dialect Identification | Badr M. Abdullah et.al. | 2505.24713 | link |
| 2025-06-02 | MSDA: Combining Pseudo-labeling and Self-Supervision for Unsupervised Domain Adaptation in ASR | Dimitrios Damianos et.al. | 2505.24656 | null |
| 2025-05-30 | Pretraining Multi-Speaker Identification for Neural Speaker Diarization | Shota Horiguchi et.al. | 2505.24545 | null |
| 2025-05-30 | SuPseudo: A Pseudo-supervised Learning Method for Neural Speech Enhancement in Far-field Speech Recognition | Longjie Luo et.al. | 2505.24450 | null |
| 2025-05-30 | Pseudo Labels-based Neural Speech Enhancement for the AVSR Task in the MISP-Meeting Challenge | Longjie Luo et.al. | 2505.24446 | null |
| 2025-05-30 | Fewer Hallucinations, More Verification: A Three-Stage LLM-Based Framework for ASR Error Correction | Yangui Fang et.al. | 2505.24347 | null |
| 2025-05-30 | When Humans Growl and Birds Speak: High-Fidelity Voice Conversion from Human to Animal and Designed Sounds | Minsu Kang et.al. | 2505.24336 | null |
| 2025-05-30 | A Perception-Based L2 Speech Intelligibility Indicator: Leveraging a Rater’s Shadowing and Sequence-to-sequence Voice Conversion | Haopeng Geng et.al. | 2505.24304 | null |
| 2025-05-30 | Discl-VC: Disentangled Discrete Tokens and In-Context Learning for Controllable Zero-Shot Voice Conversion | Kaidi Wang et.al. | 2505.24291 | null |
| 2025-05-29 | Prompting Whisper for Improved Verbatim Transcription and End-to-end Miscue Detection | Griffin Dietz Smith et.al. | 2505.23627 | null |
| 2025-05-29 | ZeroSep: Separate Anything in Audio with Zero Training | Chao Huang et.al. | 2505.23625 | link |
| 2025-05-29 | MGE-LDM: Joint Latent Diffusion for Simultaneous Music Generation and Source Extraction | Yunkee Chae et.al. | 2505.23305 | null |
| 2025-05-29 | Contextualized Automatic Speech Recognition with Dynamic Vocabulary Prediction and Activation | Zhennan Lin et.al. | 2505.23077 | null |
| 2025-05-29 | AISHELL-5: The First Open-Source In-Car Multi-Channel Multi-Speaker Speech Dataset for Automatic Speech Diarization and Recognition | Yuhang Dai et.al. | 2505.23036 | link |
| 2025-05-28 | BinauralFlow: A Causal and Streamable Approach for High-Quality Binaural Speech Synthesis with Flow Matching Models | Susan Liang et.al. | 2505.22865 | null |
| 2025-05-28 | NGPU-LM: GPU-Accelerated N-Gram Language Model for Context-Biasing in Greedy ASR Decoding | Vladimir Bataev et.al. | 2505.22857 | null |
| 2025-05-28 | Evaluation of LLMs in Speech is Often Flawed: Test Set Contamination in Large Language Models for Speech Recognition | Yuan Tseng et.al. | 2505.22251 | null |
| 2025-05-28 | Advancing Hearing Assessment: An ASR-Based Frequency-Specific Speech Test for Diagnosing Presbycusis | Stefan Bleeck et.al. | 2505.22231 | null |
| 2025-05-28 | On-the-fly Routing for Zero-shot MoE Speaker Adaptation of Speech Foundation Models for Dysarthric Speech Recognition | Shujie HU et.al. | 2505.22072 | null |
| 2025-05-28 | Weakly Supervised Data Refinement and Flexible Sequence Compression for Efficient Thai LLM-based ASR | Mingchen Shao et.al. | 2505.22063 | null |
| 2025-05-28 | Overlap-Adaptive Hybrid Speaker Diarization and ASR-Aware Observation Addition for MISP 2025 Challenge | Shangkun Huang et.al. | 2505.22013 | null |
| 2025-05-28 | Leveraging LLM for Stuttering Speech: A Unified Architecture Bridging Recognition and Event Detection | Shangkun Huang et.al. | 2505.22005 | null |
| 2025-05-27 | GMU Systems for the IWSLT 2025 Low-Resource Speech Translation Shared Task | Chutong Meng et.al. | 2505.21781 | null |
| 2025-05-27 | VoxAging: Continuously Tracking Speaker Aging with a Large-Scale Longitudinal Dataset in English and Mandarin | Zhiqi Ai et.al. | 2505.21445 | null |
| 2025-05-27 | Towards One-bit ASR: Extremely Low-bit Conformer Quantization Using Co-training and Stochastic Precision | Zhaoqing Li et.al. | 2505.21245 | null |
| 2025-05-27 | PSRB: A Comprehensive Benchmark for Evaluating Persian ASR Systems | Nima Sedghiyeh et.al. | 2505.21230 | null |
| 2025-05-27 | Topological Deep Learning for Speech Data | Zhiwang Yu et.al. | 2505.21173 | null |
| 2025-05-27 | Leveraging LLM and Self-Supervised Training Models for Speech Recognition in Chinese Dialects: A Comparative Analysis | Tianyi Xu et.al. | 2505.21138 | null |
| 2025-05-27 | Text-Queried Audio Source Separation via Hierarchical Modeling | Xinlei Yin et.al. | 2505.21025 | null |
| 2025-05-27 | VibE-SVC: Vibrato Extraction with High-frequency F0 Contour for Singing Voice Conversion | Joon-Seung Choi et.al. | 2505.20794 | null |
| 2025-05-27 | REWIND: Speech Time Reversal for Enhancing Speaker Representations in Diffusion-based Voice Conversion | Ishan D. Biyani et.al. | 2505.20756 | null |
| 2025-05-27 | PromptEVC: Controllable Emotional Voice Conversion with Natural Language Prompts | Tianhua Qi et.al. | 2505.20678 | null |
| 2025-05-27 | Towards Pretraining Robust ASR Foundation Model with Acoustic-Aware Data Augmentation | Dancheng Liu et.al. | 2505.20606 | null |
| 2025-05-26 | Towards Video to Piano Music Generation with Chain-of-Perform Support Benchmarks | Chang Liu et.al. | 2505.20038 | null |
| 2025-05-26 | Mixture of LoRA Experts for Low-Resourced Multi-Accent Automatic Speech Recognition | Raphaël Bagat et.al. | 2505.20006 | null |
| 2025-05-26 | Novel Loss-Enhanced Universal Adversarial Patches for Sustainable Speaker Privacy | Elvir Karimov et.al. | 2505.19951 | null |
| 2025-05-26 | DiEmo-TTS: Disentangled Emotion Representations via Self-Supervised Distillation for Cross-Speaker Emotion Transfer in Text-to-Speech | Deok-Hyeon Cho et.al. | 2505.19687 | null |
| 2025-05-26 | KIT’s Low-resource Speech Translation Systems for IWSLT2025: System Enhancement with Synthetic Data and Model Regularization | Zhaolin Li et.al. | 2505.19679 | null |
| 2025-05-26 | Zero-Shot Streaming Text to Speech Synthesis with Transducer and Auto-Regressive Modeling | Haiyang Sun et.al. | 2505.19669 | null |
| 2025-05-26 | Languages in Multilingual Speech Foundation Models Align Both Phonetically and Semantically | Ryan Soh-Eun Shim et.al. | 2505.19606 | null |
| 2025-05-26 | Training-Free Multi-Step Audio Source Separation | Yongyi Zang et.al. | 2505.19534 | null |
| 2025-05-26 | Beyond Manual Transcripts: The Potential of Automated Speech Recognition Errors in Improving Alzheimer’s Disease Detection | Yin-Long Liu et.al. | 2505.19448 | null |
| 2025-05-26 | GSA-TTS : Toward Zero-Shot Speech Synthesis based on Gradual Style Adaptor | Seokgi Lee et.al. | 2505.19384 | null |
| 2025-05-23 | Daily-Omni: Towards Audio-Visual Reasoning with Temporal Alignment across Modalities | Ziwei Zhou et.al. | 2505.17862 | link |
| 2025-05-23 | CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training | Zhihao Du et.al. | 2505.17589 | null |
| 2025-05-23 | Private kNN-VC: Interpretable Anonymization of Converted Speech | Carlos Franzreb et.al. | 2505.17584 | link |
| 2025-05-23 | Swedish Whispers; Leveraging a Massive Speech Corpus for Swedish Speech Recognition | Leonora Vesterbacka et.al. | 2505.17538 | null |
| 2025-05-23 | Speechless: Speech Instruction Training Without Speech for Low Resource Languages | Alan Dao et.al. | 2505.17417 | link |
| 2025-05-23 | LLM-based Generative Error Correction for Rare Words with Synthetic Data and Phonetic Context | Natsuo Yamashita et.al. | 2505.17410 | null |
| 2025-05-23 | An End-to-End Approach for Child Reading Assessment in the Xhosa Language | Sergio Chevtchenko et.al. | 2505.17371 | null |
| 2025-05-22 | An Effective Training Framework for Light-Weight Automatic Speech Recognition Models | Abdul Hannan et.al. | 2505.16991 | null |
| 2025-05-22 | From Tens of Hours to Tens of Thousands: Scaling Back-Translation for Speech Recognition | Tianduo Wang et.al. | 2505.16972 | link |
| 2025-05-23 | EZ-VC: Easy Zero-shot Any-to-Any Voice Conversion | Advait Joglekar et.al. | 2505.16691 | link |
| 2025-05-22 | SoccerChat: Integrating Multimodal Data for Enhanced Soccer Game Understanding | Sushant Gautam et.al. | 2505.16630 | link |
| 2025-05-22 | HPP-Voice: A Large-Scale Evaluation of Speech Embeddings for Multi-Phenotypic Classification | David Krongauz et.al. | 2505.16490 | null |
| 2025-05-22 | X-ARES: A Comprehensive Framework for Assessing Audio Encoder Performance | Junbo Zhang et.al. | 2505.16369 | link |
| 2025-05-22 | Large Language Models based ASR Error Correction for Child Conversations | Anfeng Xu et.al. | 2505.16212 | null |
| 2025-05-22 | Differentiable K-means for Fully-optimized Discrete Token-based ASR | Kentaro Onda et.al. | 2505.16207 | null |
| 2025-05-22 | Prosodically Enhanced Foreign Accent Simulation by Discrete Token-based Resynthesis Only with Native Speech Corpora | Kentaro Onda et.al. | 2505.16191 | null |
| 2025-05-22 | Selective Invocation for Multilingual ASR: A Cost-effective Approach Adapting to Speech Recognition Difficulty | Hongfei Xue et.al. | 2505.16168 | null |
| 2025-05-21 | MIKU-PAL: An Automated and Standardized Multi-Modal Method for Speech Paralinguistic and Affect Labeling | Cheng Yifan et.al. | 2505.15772 | null |
| 2025-05-21 | Word Level Timestamp Generation for Automatic Speech Recognition and Translation | Ke Hu et.al. | 2505.15646 | null |
| 2025-05-21 | Moonbeam: A MIDI Foundation Model Using Both Absolute and Relative Music Attributes | Zixun Guo et.al. | 2505.15559 | null |
| 2025-05-21 | Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models | Zirui Song et.al. | 2505.15406 | link |
| 2025-05-21 | Prosody-Adaptable Audio Codecs for Zero-Shot Voice Conversion via In-Context Learning | Junchuan Zhao et.al. | 2505.15402 | null |
| 2025-05-21 | Accelerating Autoregressive Speech Synthesis Inference With Speech Speculative Decoding | Zijian Lin et.al. | 2505.15380 | null |
| 2025-05-21 | Voice-ENHANCE: Speech Restoration using a Diffusion-based Voice Conversion Framework | Kyungguen Byun et.al. | 2505.15254 | null |
| 2025-05-20 | In-Context Learning Boosts Speech Recognition via Human-like Adaptation to Speakers and Language Varieties | Nathan Roll et.al. | 2505.14887 | link |
| 2025-05-20 | Towards Inclusive ASR: Investigating Voice Conversion for Dysarthric Speech Recognition in Low-Resource Languages | Chin-Jou Li et.al. | 2505.14874 | null |
| 2025-05-20 | Vox-Profile: A Speech Foundation Model Benchmark for Characterizing Diverse Speaker and Speech Traits | Tiantian Feng et.al. | 2505.14648 | link |
| 2025-05-20 | Dual Precision Quantization for Efficient and Accurate Deep Neural Networks Inference | Tomer Gafni et.al. | 2505.14638 | null |
| 2025-05-20 | SSPS: Self-Supervised Positive Sampling for Robust Self-Supervised Speaker Verification | Theo Lepage et.al. | 2505.14561 | null |
| 2025-05-20 | Pairwise Evaluation of Accent Similarity in Speech Synthesis | Jinzuomu Zhong et.al. | 2505.14410 | null |
| 2025-05-20 | PersonaTAB: Predicting Personality Traits using Textual, Acoustic, and Behavioral Cues in Fully-Duplex Speech Dialogs | Sho Inoue et.al. | 2505.14356 | null |
| 2025-05-20 | FMSD-TTS: Few-shot Multi-Speaker Multi-Dialect Text-to-Speech Synthesis for Ü-Tsang, Amdo and Kham Speech Dataset Generation | Yutong Liu et.al. | 2505.14351 | null |
| 2025-05-20 | Scaling and Enhancing LLM-based AVSR: A Sparse Mixture of Projectors Approach | Umberto Cappellazzo et.al. | 2505.14336 | null |
| 2025-05-20 | HausaNLP: Current Status, Challenges and Future Directions for Hausa Natural Language Processing | Shamsuddeen Hassan Muhammad et.al. | 2505.14311 | null |
| 2025-05-20 | Source Verification for Speech Deepfakes | Viola Negroni et.al. | 2505.14188 | null |
| 2025-05-20 | The Multimodal Information Based Speech Processing (MISP) 2025 Challenge: Audio-Visual Diarization and Recognition | Ming Gao et.al. | 2505.13971 | null |
| 2025-05-19 | Granary: Speech Recognition and Translation Dataset in 25 European Languages | Nithin Rao Koluguri et.al. | 2505.13404 | null |
| 2025-05-19 | Efficient Speech Language Modeling via Energy Distance in Continuous Latent Space | Zhengrui Ma et.al. | 2505.13181 | link |
| 2025-05-19 | Cross-modal Knowledge Transfer Learning as Graph Matching Based on Optimal Transport for ASR | Xugang Lu et.al. | 2505.13079 | null |
| 2025-05-19 | KIT’s Offline Speech Translation and Instruction Following Submission for IWSLT 2025 | Sai Koneru et.al. | 2505.13036 | link |
| 2025-05-19 | Personalized Fine-Tuning with Controllable Synthetic Speech from LLM-Generated Transcripts for Dysarthric Speech Recognition | Dominik Wagner et.al. | 2505.12991 | null |
| 2025-05-19 | Calm-Whisper: Reduce Whisper Hallucination On Non-Speech By Calming Crazy Heads Down | Yingzhi Wang et.al. | 2505.12969 | null |
| 2025-05-19 | Unified Cross-modal Translation of Score Images, Symbolic Music, and Performance Audio | Jongmin Jung et.al. | 2505.12863 | null |
| 2025-05-19 | OZSpeech: One-step Zero-shot Speech Synthesis with Learned-Prior-Conditioned Flow Matching | Hieu-Nghia Huynh-Nguyen et.al. | 2505.12800 | null |
| 2025-05-19 | RoVo: Robust Voice Protection Against Unauthorized Speech Synthesis with Embedding-Level Perturbations | Seungmin Kim et.al. | 2505.12686 | null |
| 2025-05-19 | Text2midi-InferAlign: Improving Symbolic Music Generation with Inference-Time Alignment | Abhinaba Roy et.al. | 2505.12669 | link |
| 2025-05-16 | LipDiffuser: Lip-to-Speech Generation with Conditional Diffusion Models | Danilo de Oliveira et.al. | 2505.11391 | null |
| 2025-05-16 | LegoSLM: Connecting LLM with Speech Encoder using CTC Posteriors | Rao Ma et.al. | 2505.11352 | null |
| 2025-05-16 | Survey of End-to-End Multi-Speaker Automatic Speech Recognition for Monaural Audio | Xinlu He et.al. | 2505.10975 | null |
| 2025-05-16 | Multi-Stage Speaker Diarization for Noisy Classrooms | Ali Sartaz Khan et.al. | 2505.10879 | null |
| 2025-05-15 | UDDETTS: Unifying Discrete and Dimensional Emotions for Controllable Emotional Text-to-Speech | Jiaxuan Liu et.al. | 2505.10599 | null |
| 2025-05-15 | Inclusivity of AI Speech in Healthcare: A Decade Look Back | Retno Larasati et.al. | 2505.10596 | null |
| 2025-05-15 | Quantized Approximate Signal Processing (QASP): Towards Homomorphic Encryption for audio | Tu Duyen Nguyen et.al. | 2505.10500 | null |
| 2025-05-14 | GlobalMood: A cross-cultural benchmark for music emotion recognition | Harin Lee et.al. | 2505.09539 | null |
| 2025-05-14 | SingNet: Towards a Large-Scale, Diverse, and In-the-Wild Singing Voice Dataset | Yicheng Gu et.al. | 2505.09325 | null |
| 2025-05-14 | DPN-GAN: Inducing Periodic Activations in Generative Adversarial Networks for High-Fidelity Audio Synthesis | Zeeshan Ahmad et.al. | 2505.09091 | null |
| 2025-05-13 | Inference Attacks for X-Vector Speaker Anonymization | Luke Bauer et.al. | 2505.08978 | null |
| 2025-05-13 | Investigating self-supervised features for expressive, multilingual voice conversion | Álvaro Martín-Cortinas et.al. | 2505.08278 | null |
| 2025-05-13 | Not that Groove: Zero-Shot Symbolic Music Editing | Li Zhang et.al. | 2505.08203 | null |
| 2025-05-12 | Lightweight End-to-end Text-to-speech Synthesis for low resource on-device applications | Biel Tura Vecino et.al. | 2505.07701 | null |
| 2025-05-12 | Full simulation on the dynamics of auditory synaptic fusion: Strong clustering of calcium channel might be the origin of the coherent release in the auditory hair cells | Jaeyun Yoo et.al. | 2505.07273 | null |
| 2025-05-09 | Remote Rowhammer Attack using Adversarial Observations on Federated Learning Clients | Jinsheng Yuan et.al. | 2505.06335 | null |
| 2025-05-08 | Teochew-Wild: The First In-the-wild Teochew Dataset with Orthographic Annotations | Linrong Pan et.al. | 2505.05056 | null |
| 2025-05-08 | A Multi-Agent AI Framework for Immersive Audiobook Production through Spatial Audio and Neural Narration | Shaja Arul Selvamani et.al. | 2505.04885 | null |
| 2025-05-07 | Score Distillation Sampling for Audio: Source Separation, Synthesis, and Beyond | Jessie Richter-Powell et.al. | 2505.04621 | null |
| 2025-05-07 | SwinLip: An Efficient Visual Speech Encoder for Lip Reading Using Swin Transformer | Young-Hu Park et.al. | 2505.04394 | null |
| 2025-05-07 | Discrete Optimal Transport and Voice Conversion | Anton Selitskiy et.al. | 2505.04382 | null |
| 2025-05-07 | Robust Speech Recognition with Schrödinger Bridge-Based Speech Enhancement | Rauf Nasretdinov et.al. | 2505.04237 | null |
| 2025-05-06 | VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model | Zuwei Long et.al. | 2505.03739 | link |
| 2025-05-06 | Fairness of Automatic Speech Recognition in Cleft Lip and Palate Speech | Susmita Bhattacharjee et.al. | 2505.03697 | null |
| 2025-05-06 | Mamba-Diffusion Model with Learnable Wavelet for Controllable Symbolic Music Generation | Jincheng Zhang et.al. | 2505.03314 | link |
| 2025-05-06 | SepALM: Audio Language Models Are Error Correctors for Robust Speech Separation | Zhaoxi Mu et.al. | 2505.03273 | null |
| 2025-05-06 | SonicRAG : High Fidelity Sound Effects Synthesis Based on Retrival Augmented Generation | Yu-Ren Guo et.al. | 2505.03244 | null |
| 2025-05-06 | MGFF-TDNN: A Multi-Granularity Feature Fusion TDNN Model with Depth-Wise Separable Module for Speaker Verification | Ya Li et.al. | 2505.03228 | link |
| 2025-05-06 | CoGenAV: Versatile Audio-Visual Representation Learning via Contrastive-Generative Synchronization | Detao Bai et.al. | 2505.03186 | link |
| 2025-05-05 | Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play | Yemin Shi et.al. | 2505.02707 | link |
| 2025-05-05 | LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis | Qingkai Fang et.al. | 2505.02625 | link |
| 2025-05-04 | Transforming faces into video stories – VideoFace2.0 | Branko Brkljač et.al. | 2505.02060 | null |
| 2025-05-04 | A Synergistic Framework of Nonlinear Acoustic Computing and Reinforcement Learning for Real-World Human-Robot Interaction | Xiaoliang Chen et.al. | 2505.01998 | null |
| 2025-05-02 | Transfer Learning-Based Deep Residual Learning for Speech Recognition in Clean and Noisy Environments | Noussaiba Djeffal et.al. | 2505.01632 | null |
| 2025-05-01 | Scaling On-Device GPU Inference for Large Generative Models | Jiuqiang Tang et.al. | 2505.00232 | null |
| 2025-04-30 | BERSting at the Screams: A Benchmark for Distanced, Emotional and Shouted Speech Recognition | Paige Tuttösí et.al. | 2505.00059 | link |
| 2025-04-30 | From Aesthetics to Human Preferences: Comparative Perspectives of Evaluating Text-to-Music Systems | Huan Zhang et.al. | 2504.21815 | null |
| 2025-04-30 | Retrieval-Enhanced Few-Shot Prompting for Speech Event Extraction | Máté Gedeon et.al. | 2504.21372 | null |
| 2025-04-29 | AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation | Jeongsoo Choi et.al. | 2504.20629 | null |
| 2025-05-02 | Towards Flow-Matching-based TTS without Classifier-Free Guidance | Yuzhe Liang et.al. | 2504.20334 | null |
| 2025-04-28 | A Comprehensive Part-of-Speech Tagging to Standardize Central-Kurdish Language: A Research Guide for Kurdish Natural Language Processing Tasks | Shadan Shukr Sabr et.al. | 2504.19645 | null |
| 2025-04-27 | Generative Adversarial Network based Voice Conversion: Techniques, Challenges, and Recent Advancements | Sandipan Dhar et.al. | 2504.19197 | null |
| 2025-04-25 | Kimi-Audio Technical Report | KimiTeam et.al. | 2504.18425 | link |
| 2025-04-28 | Augmenting Captions with Emotional Cues: An AR Interface for Real-Time Accessible Communication | Sunday David Ubur et.al. | 2504.17171 | null |
| 2025-04-23 | SMART: Tuning a symbolic music generation system with an audio domain aesthetic reward | Nicolas Jonason et.al. | 2504.16839 | null |
| 2025-04-22 | TinyML for Speech Recognition | Andrew Barovic et.al. | 2504.16213 | null |
| 2025-04-22 | LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale | Joya Chen et.al. | 2504.16030 | link |
| 2025-04-22 | Quantifying Source Speaker Leakage in One-to-One Voice Conversion | Scott Wellington et.al. | 2504.15822 | null |
| 2025-04-22 | Development and evaluation of a deep learning algorithm for German word recognition from lip movements | Dinh Nam Pham et.al. | 2504.15792 | null |
| 2025-04-22 | FADEL: Uncertainty-aware Fake Audio Detection with Evidential Deep Learning | Ju Yeon Kang et.al. | 2504.15663 | null |
| 2025-04-22 | A Multi-Agent Framework for Automated Qinqiang Opera Script Generation Using Large Language Models | Gengxian Cao et.al. | 2504.15552 | null |
| 2025-04-21 | Chinese-LiPS: A Chinese audio-visual speech recognition dataset with Lip-reading and Presentation Slides | Jinghua Zhao et.al. | 2504.15066 | null |
| 2025-04-21 | SOLIDO: A Robust Watermarking Method for Speech Synthesis via Low-Rank Adaptation | Yue Li et.al. | 2504.15035 | null |
| 2025-04-21 | Speaker Fuzzy Fingerprints: Benchmarking Text-Based Identification in Multiparty Dialogues | Rui Ribeiro et.al. | 2504.14963 | null |
| 2025-04-21 | StableQuant: Layer Adaptive Post-Training Quantization for Speech Foundation Models | Yeona Hong et.al. | 2504.14915 | null |
| 2025-04-20 | DialogueAgents: A Hybrid Agent-Based Speech Synthesis Framework for Multi-Party Dialogue | Xiang Li et.al. | 2504.14482 | link |
| 2025-04-19 | The First VoicePrivacy Attacker Challenge | Natalia Tomashenko et.al. | 2504.14183 | null |
| 2025-04-18 | Collective Learning Mechanism based Optimal Transport Generative Adversarial Network for Non-parallel Voice Conversion | Sandipan Dhar et.al. | 2504.13791 | null |
| 2025-04-18 | MusFlow: Multimodal Music Generation via Conditional Flow Matching | Jiahao Song et.al. | 2504.13535 | null |
| 2025-04-17 | Acoustic to Articulatory Inversion of Speech; Data Driven Approaches, Challenges, Applications, and Future Scope | Leena G Pillai et.al. | 2504.13308 | null |
| 2025-04-16 | Dysarthria Normalization via Local Lie Group Transformations for Robust ASR | Mikhail Osipov et.al. | 2504.12279 | null |
| 2025-04-16 | Advancing Arabic Speech Recognition Through Large-Scale Weakly Supervised Learning | Mahmoud Salhab et.al. | 2504.12254 | null |
| 2025-04-16 | Voice Conversion with Diverse Intonation using Conditional Variational Auto-Encoder | Soobin Suh et.al. | 2504.12005 | null |
| 2025-04-15 | Dopamine Audiobook: A Training-free MLLM Agent for Emotional and Human-like Audiobook Generation | Yan Rong et.al. | 2504.11002 | null |
| 2025-04-15 | Real-Time Word-Level Temporal Segmentation in Streaming Speech Recognition | Naoto Nishida et.al. | 2504.10849 | null |
| 2025-04-15 | Generalized Audio Deepfake Detection Using Frame-level Latent Information Entropy | Botao Zhao et.al. | 2504.10819 | null |
| 2025-04-14 | Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis | Yifan Yang et.al. | 2504.10352 | null |
| 2025-04-14 | AutoStyle-TTS: Retrieval-Augmented Generation based Automatic Style Matching Text-to-Speech Synthesis | Dan Luo et.al. | 2504.10309 | null |
| 2025-04-14 | SafeSpeech: Robust and Universal Voice Protection Against Malicious Speech Synthesis | Zhisheng Zhang et.al. | 2504.09839 | link |
| 2025-04-12 | AMNet: An Acoustic Model Network for Enhanced Mandarin Speech Synthesis | Yubing Cao et.al. | 2504.09225 | null |
| 2025-04-11 | Spatial Audio Processing with Large Language Model on Wearable Devices | Ayushi Mishra et.al. | 2504.08907 | null |
| 2025-04-11 | Mitigating Timbre Leakage with Universal Semantic Mapping Residual Block for Voice Conversion | Na Li et.al. | 2504.08524 | null |
| 2025-04-10 | From Speech to Summary: A Comprehensive Survey of Speech Summarization | Fabian Retkowski et.al. | 2504.08024 | null |
| 2025-04-10 | Empowering Global Voices: A Data-Efficient, Phoneme-Tone Adaptive Approach to High-Fidelity Speech Synthesis | Yizhong Geng et.al. | 2504.07858 | null |
| 2025-04-10 | SlimSpeech: Lightweight and Efficient Text-to-Speech with Slim Rectified Flow | Kaidi Wang et.al. | 2504.07776 | null |
| 2025-04-10 | Extending Visual Dynamics for Video-to-Music Generation | Xiaohao Liu et.al. | 2504.07594 | null |
| 2025-04-09 | Visual-Aware Speech Recognition for Noisy Scenarios | Lakshmipathi Balaji et.al. | 2504.07229 | null |
| 2025-04-09 | RNN-Transducer-based Losses for Speech Recognition on Noisy Targets | Vladimir Bataev et.al. | 2504.06963 | null |
| 2025-04-08 | AVENet: Disentangling Features by Approximating Average Features for Voice Conversion | Wenyu Wang et.al. | 2504.05833 | null |
| 2025-04-08 | kNN-SVC: Robust Zero-Shot Singing Voice Conversion with Additive Synthesis and Concatenation Smoothness Optimization | Keren Shao et.al. | 2504.05686 | null |
| 2025-04-07 | Of All StrIPEs: Investigating Structure-informed Positional Encoding for Efficient Music Generation | Manvi Agarwal et.al. | 2504.05364 | null |
| 2025-04-07 | DoCIA: An Online Document-Level Context Incorporation Agent for Speech Translation | Xinglin Lyu et.al. | 2504.05122 | null |
| 2025-04-06 | Trainable Adaptive Score Normalization for Automatic Speaker Verification | Jeong-Hwan Choi et.al. | 2504.04512 | null |
| 2025-04-06 | Public speech recognition transcripts as a configuring parameter | Damien Rudaz et.al. | 2504.04488 | null |
| 2025-04-06 | Activation Patching for Interpretable Steering in Music Generation | Simone Facchiano et.al. | 2504.04479 | null |
| 2025-04-08 | LoopGen: Training-Free Loopable Music Generation | Davide Marincione et.al. | 2504.04466 | null |
| 2025-04-06 | Selective Masking Adversarial Attack on Automatic Speech Recognition Systems | Zheng Fang et.al. | 2504.04394 | null |
| 2025-04-04 | An Efficient GPU-based Implementation for Noise Robust Sound Source Localization | Zirui Lin et.al. | 2504.03373 | null |
| 2025-04-04 | A Human Digital Twin Architecture for Knowledge-based Interactions and Context-Aware Conversations | Abdul Mannan Mohammed et.al. | 2504.03147 | null |
| 2025-04-03 | LinTO Audio and Textual Datasets to Train and Evaluate Automatic Speech Recognition in Tunisian Arabic Dialect | Hedi Naouara et.al. | 2504.02604 | null |
| 2025-04-03 | Deep learning for music generation. Four approaches and their comparative evaluation | Razvan Paroiu et.al. | 2504.02586 | null |
| 2025-04-03 | F5R-TTS: Improving Flow Matching based Text-to-Speech with Group Relative Policy Optimization | Xiaohui Sun et.al. | 2504.02407 | null |
| 2025-04-03 | VoiceCraft-Dub: Automated Video Dubbing with Neural Codec Language Models | Kim Sung-Bin et.al. | 2504.02386 | null |
| 2025-04-02 | Chain of Correction for Full-text Speech Recognition with Large Language Models | Zhiyuan Tang et.al. | 2504.01519 | null |
| 2025-04-01 | Whispering Under the Eaves: Protecting User Privacy Against Commercial and LLM-powered Automatic Speech Recognition Systems | Weifei Jin et.al. | 2504.00858 | link |
| 2025-04-01 | A Survey on Music Generation from Single-Modal, Cross-Modal, and Multi-Modal Perspectives: Data, Methods, and Challenges | Shuyu Li et.al. | 2504.00837 | null |
| 2025-03-31 | Can Diffusion Models Disentangle? A Theoretical Perspective | Liming Wang et.al. | 2504.00220 | null |
| 2025-03-31 | SVLA: A Unified Speech-Vision-Language Assistant with Multimodal Reasoning and Speech Generation | Ngoc Dung Huynh et.al. | 2503.24164 | null |
| 2025-04-02 | TeleAntiFraud-28k: An Audio-Text Slow-Thinking Dataset for Telecom Fraud Detection | Zhiming Ma et.al. | 2503.24115 | link |
| 2025-03-31 | SpeechDialogueFactory: Generating High-Quality Speech Dialogue Data to Accelerate Your Speech-LLM Development | Minghan Wang et.al. | 2503.23848 | link |
| 2025-03-30 | The Impact of Code-switched Synthetic Data Quality is Task Dependent: Insights from MT and ASR | Injy Hamed et.al. | 2503.23576 | null |
| 2025-03-30 | Whisper-LM: Improving ASR Models with Language Models for Low-Resource Languages | Xabier de Zuazo et.al. | 2503.23542 | link |
| 2025-03-30 | Scaling Auditory Cognition via Test-Time Compute in Audio Language Models | Ting Dang et.al. | 2503.23395 | null |
| 2025-03-29 | SupertonicTTS: Towards Highly Scalable and Efficient Text-to-Speech System | Hyeongju Kim et.al. | 2503.23108 | null |
| 2025-03-28 | Enhancing Dance-to-Music Generation via Negative Conditioning Latent Diffusion Model | Changchang Sun et.al. | 2503.22138 | null |
| 2025-03-27 | VALLR: Visual ASR Language Model for Lip Reading | Marshall Thomas et.al. | 2503.21408 | null |
| 2025-03-27 | A 71.2-μW Speech Recognition Accelerator with Recurrent Spiking Neural Network | Chih-Chyau Yang et.al. | 2503.21337 | null |
| 2025-03-27 | Vision-to-Music Generation: A Survey | Zhaokai Wang et.al. | 2503.21254 | link |
| 2025-03-26 | Improving Speech Recognition Accuracy Using Custom Language Models with the Vosk Toolkit | Aniket Abhishek Soni et.al. | 2503.21025 | null |
| 2025-03-26 | Text-Driven Voice Conversion via Latent State-Space Modeling | Wen Li et.al. | 2503.20999 | null |
| 2025-03-26 | FinAudio: A Benchmark for Audio Large Language Models in Financial Applications | Yupeng Cao et.al. | 2503.20990 | null |
| 2025-03-26 | Dolphin: A Large-Scale Automatic Speech Recognition Model for Eastern Languages | Yangyang Meng et.al. | 2503.20212 | link |
| 2025-03-25 | Contextual Metric Meta-Evaluation by Measuring Local Metric Accuracy | Athiya Deviyani et.al. | 2503.19828 | null |
| 2025-03-25 | Analyzable Chain-of-Musical-Thought Prompting for High-Fidelity Music Generation | Max W. Y. Lam et.al. | 2503.19611 | null |
| 2025-03-25 | Boosting the Transferability of Audio Adversarial Examples with Acoustic Representation Optimization | Weifei Jin et.al. | 2503.19591 | null |
| 2025-03-25 | Design of Seamless Multi-modal Interaction Framework for Intelligent Virtual Agents in Wearable Mixed Reality Environment | Ghazanfar Ali et.al. | 2503.19334 | null |
| 2025-03-22 | A Survey on Structured State Space Sequence (S4) Models | Shriyank Somvanshi et.al. | 2503.18970 | link |
| 2025-03-24 | Towards Responsible AI Music: an Investigation of Trustworthy Features for Creative Systems | Jacopo de Berardinis et.al. | 2503.18814 | null |
| 2025-03-24 | Whispering in Amharic: Fine-tuning Whisper for Low-resource Language | Dawit Ketema Gete et.al. | 2503.18485 | null |
| 2025-03-23 | Elevating Robust Multi-Talker ASR by Decoupling Speaker Separation and Speech Recognition | Yufeng Yang et.al. | 2503.17886 | null |
| 2025-03-22 | LZMidi: Compression-Based Symbolic Music Generation | Connor Ding et.al. | 2503.17654 | null |
| 2025-03-21 | Your voice is your voice: Supporting Self-expression through Speech Generation and LLMs in Augmented and Alternative Communication | Yiwen Xu et.al. | 2503.17479 | null |
| 2025-03-21 | From Faces to Voices: Learning Hierarchical Representations for High-quality Video-to-Speech | Ji-Hoon Kim et.al. | 2503.16956 | null |
| 2025-03-20 | CAARMA: Class Augmentation with Adversarial Mixup Regularization | Massa Baali et.al. | 2503.16718 | null |
| 2025-03-20 | WaveFM: A High-Fidelity and Efficient Vocoder Based on Flow Matching | Tianze Luo et.al. | 2503.16689 | null |
| 2025-03-20 | SeniorTalk: A Chinese Conversation Dataset with Rich Annotations for Super-Aged Seniors | Yang Chen et.al. | 2503.16578 | null |
| 2025-03-19 | A Comprehensive Survey on Architectural Advances in Deep CNNs: Challenges, Applications, and Emerging Research Directions | Saddam Hussain Khan et.al. | 2503.16546 | null |
| 2025-03-19 | Evaluating ASR Confidence Scores for Automated Error Detection in User-Assisted Correction Interfaces | Korbinian Kuhn et.al. | 2503.15124 | null |
| 2025-03-19 | Communication Access Real-Time Translation Through Collaborative Correction of Automatic Speech Recognition | Korbinian Kuhn et.al. | 2503.15120 | null |
| 2025-03-19 | MoonCast: High-Quality Zero-Shot Podcast Generation | Zeqian Ju et.al. | 2503.14345 | link |
| 2025-03-18 | InnerSelf: Designing Self-Deepfaked Voice for Emotional Well-being | Guang Dai et.al. | 2503.14257 | null |
| 2025-03-17 | Halving transcription time: A fast, user-friendly and GDPR-compliant workflow to create AI-assisted transcripts for content analysis | Jakob Sponholz et.al. | 2503.13031 | null |
| 2025-03-14 | MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens | Jeong Hun Yeo et.al. | 2503.11315 | link |
| 2025-03-13 | AudioX: Diffusion Transformer for Anything-to-Audio Generation | Zeyue Tian et.al. | 2503.10522 | link |
| 2025-03-13 | Whisper Speaker Identification: Leveraging Pre-Trained Multilingual Transformers for Robust Speaker Embeddings | Jakaria Islam Emon et.al. | 2503.10446 | link |
| 2025-03-14 | Proceedings of the ISCA/ITG Workshop on Diversity in Large Speech and Language Models | Sebastian Möller et.al. | 2503.10298 | null |
| 2025-03-12 | ValSub: Subsampling Validation Data to Mitigate Forgetting during ASR Personalization | Haaris Mehmood et.al. | 2503.09906 | null |
| 2025-03-12 | Quantization for OpenAI’s Whisper Models: A Comparative Analysis | Allison Andreyev et.al. | 2503.09905 | link |
| 2025-03-12 | Everything Can Be Described in Words: A Simple Unified Multi-Modal Framework with Semantic and Temporal Alignment | Xiaowei Bi et.al. | 2503.09081 | null |
| 2025-03-11 | An Exhaustive Evaluation of TTS- and VC-based Data Augmentation for ASR | Sewade Ogun et.al. | 2503.08954 | null |
| 2025-03-11 | YuE: Scaling Open Foundation Models for Long-Form Music Generation | Ruibin Yuan et.al. | 2503.08638 | link |
| 2025-03-11 | Prompt2LVideos: Exploring Prompts for Understanding Long-Form Multimodal Videos | Soumya Shamarao Jahagirdar et.al. | 2503.08335 | null |
| 2025-03-11 | FilmComposer: LLM-Driven Music Production for Silent Film Clips | Zhifeng Xie et.al. | 2503.08147 | link |
| 2025-03-11 | Boundary Regression for Leitmotif Detection in Music Audio | Sihun Lee et.al. | 2503.07977 | null |
| 2025-03-10 | Building English ASR model with regional language support | Purvi Agrawal et.al. | 2503.07522 | null |
| 2025-03-10 | Impact of Microphone Array Mismatches to Learning-based Replay Speech Detection | Michael Neri et.al. | 2503.07357 | null |
| 2025-03-10 | Automatic Speech Recognition for Non-Native English: Accuracy and Disfluency Handling | Michael McGuire et.al. | 2503.06924 | null |
| 2025-03-09 | Adaptive Audio-Visual Speech Recognition via Matryoshka-Based Multimodal LLMs | Umberto Cappellazzo et.al. | 2503.06362 | null |
| 2025-03-08 | Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech Representations | Jeong Hun Yeo et.al. | 2503.06273 | link |
| 2025-03-08 | A Noise-Robust Turn-Taking System for Real-World Dialogue Robots: A Field Experiment | Koji Inoue et.al. | 2503.06241 | null |
| 2025-03-07 | DiVISe: Direct Visual-Input Speech Synthesis Preserving Speaker Characteristics And Intelligibility | Yifan Liu et.al. | 2503.05223 | null |
| 2025-03-06 | From Voice to Safety: Language AI Powered Pilot-ATC Communication Understanding for Airport Surface Movement Collision Risk Assessment | Yutian Pang et.al. | 2503.04974 | null |
| 2025-03-04 | Normalization through Fine-tuning: Understanding Wav2vec 2.0 Embeddings for Phonetic Analysis | Yiming Wang et.al. | 2503.04814 | null |
| 2025-03-06 | LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM | Sambal Shikhar et.al. | 2503.04724 | link |
| 2025-03-06 | Self-Supervised Models for Phoneme Recognition: Applications in Children’s Speech for Reading Learning | Lucas Block Medin et.al. | 2503.04710 | null |
| 2025-03-05 | Good practices for evaluation of synthesized speech | Erica Cooper et.al. | 2503.03250 | null |
| 2025-03-03 | Fine-Tuning Whisper for Inclusive Prosodic Stress Analysis | Samuel S. Sohn et.al. | 2503.02907 | null |
| 2025-03-04 | Go Beyond Your Means: Unlearning with Per-Sample Gradient Orthogonalization | Aviv Shamsian et.al. | 2503.02312 | null |
| 2025-03-05 | Pruning Deep Neural Networks via a Combination of the Marchenko-Pastur Distribution and Regularization | Leonid Berlyand et.al. | 2503.01922 | null |
| 2025-03-03 | Augmenting Online Meetings with Context-Aware Real-time Music Generation | Haruki Suzawa et.al. | 2503.01354 | null |
| 2025-03-03 | Voice Cloning for Dysarthric Speech Synthesis: Addressing Data Scarcity in Speech-Language Pathology | Birger Moell et.al. | 2503.01266 | null |
| 2025-03-03 | DiffRhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion | Ziqian Ning et.al. | 2503.01183 | null |
| 2025-03-02 | Unveiling Biases while Embracing Sustainability: Assessing the Dual Challenges of Automatic Speech Recognition Systems | Ajinkya Kulkarni et.al. | 2503.00907 | null |
| 2025-03-02 | UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation | Alexander H. Liu et.al. | 2503.00733 | null |
| 2025-03-01 | PodAgent: A Comprehensive Framework for Podcast Generation | Yujia Xiao et.al. | 2503.00455 | null |
| 2025-02-28 | InspireMusic: Integrating Super Resolution and Large Language Model for High-Fidelity Long-Form Music Generation | Chong Zhang et.al. | 2503.00084 | null |
| 2025-02-27 | LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation | Keisuke Kamahori et.al. | 2502.20583 | link |
| 2025-02-27 | Adapting Automatic Speech Recognition for Accented Air Traffic Control Communications | Marcus Yu Zhe Wee et.al. | 2502.20311 | null |
| 2025-02-27 | CleanMel: Mel-Spectrogram Enhancement for Improving Both Speech Quality and ASR | Nian Shao et.al. | 2502.20040 | null |
| 2025-02-27 | DiffCSS: Diverse and Expressive Conversational Speech Synthesis with Diffusion Models | Weihao wu et.al. | 2502.19924 | null |
| 2025-02-26 | Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis | Ziyue Jiang et.al. | 2502.18924 | null |
| 2025-02-26 | CS-Dialogue: A 104-Hour Dataset of Spontaneous Mandarin-English Code-Switching Dialogues for Speech Recognition | Jiaming Zhou et.al. | 2502.18913 | null |
| 2025-02-25 | Exploring Gender Disparities in Automatic Speech Recognition Technology | Hend ElGhazaly et.al. | 2502.18434 | null |
| 2025-02-27 | NotaGen: Advancing Musicality in Symbolic Music Generation with Large Language Model Training Paradigms | Yashan Wang et.al. | 2502.18008 | link |
| 2025-02-25 | Silent Speech Sentence Recognition with Six-Axis Accelerometers using Conformer and CTC Algorithm | Yudong Xie et.al. | 2502.17829 | null |
| 2025-02-26 | Low-Rank and Sparse Model Merging for Multi-Lingual Speech Recognition and Translation | Qiuming Zhao et.al. | 2502.17380 | null |
| 2025-02-24 | Improving the Inclusivity of Dutch Speech Recognition by Fine-tuning Whisper on the JASMIN-CGN Corpus | Golshid Shekoufandeh et.al. | 2502.17284 | link |
| 2025-02-24 | Balancing Speech Understanding and Generation Using Continual Pre-training for Codec-based Speech LLM | Jiatong Shi et.al. | 2502.16897 | null |
| 2025-02-22 | Understanding Zero-shot Rare Word Recognition Improvements Through LLM Integration | Haoxuan Wang et.al. | 2502.16142 | null |
| 2025-02-21 | The Esethu Framework: Reimagining Sustainable Dataset Governance and Curation for Low-Resource Languages | Jenalea Rajab et.al. | 2502.15916 | null |
| 2025-02-21 | Retrieval-Augmented Speech Recognition Approach for Domain Challenges | Peng Shen et.al. | 2502.15264 | null |
| 2025-02-21 | Enhancing Speech Large Language Models with Prompt-Aware Mixture of Audio Encoders | Weiqiao Shan et.al. | 2502.15178 | null |
| 2025-02-21 | Improving Streaming Speech Recognition With Time-Shifted Contextual Attention And Dynamic Right Context Masking | Khanh Le et.al. | 2502.15158 | null |
| 2025-02-20 | WavRAG: Audio-Integrated Retrieval Augmented Generation for Spoken Dialogue Models | Yifu Chen et.al. | 2502.14727 | null |
| 2025-02-20 | SegAug: CTC-Aligned Segmented Augmentation For Robust RNN-Transducer Based Speech Recognition | Khanh Le et.al. | 2502.14685 | null |
| 2025-02-20 | Moshi Moshi? A Model Selection Hijacking Adversarial Attack | Riccardo Petrucci et.al. | 2502.14586 | null |
| 2025-02-19 | On the application of Visibility Graphs in the Spectral Domain for Speaker Recognition | Hernan Bocaccio et.al. | 2502.14110 | null |
| 2025-02-18 | Gesture-Aware Zero-Shot Speech Recognition for Patients with Language Disorders | Seungbae Kim et.al. | 2502.13983 | null |
| 2025-02-19 | Measuring the Effect of Transcription Noise on Downstream Language Understanding Tasks | Ori Shapira et.al. | 2502.13645 | link |
| 2025-02-21 | VLAS: Vision-Language-Action Model With Speech Instructions For Customized Robot Manipulation | Wei Zhao et.al. | 2502.13508 | link |
| 2025-02-19 | Adopting Whisper for Confidence Estimation | Vaibhav Aggarwal et.al. | 2502.13446 | null |
| 2025-02-18 | AV-Flow: Transforming Text to Audio-Visual Human-like Interactions | Aggelina Chatziagapi et.al. | 2502.13133 | null |
| 2025-02-18 | Neuro-oscillatory models of cortical speech processing | Olesia Dogonasheva et.al. | 2502.12935 | null |
| 2025-02-18 | High-Fidelity Music Vocoder using Neural Audio Codecs | Luca A. Lanzendörfer et.al. | 2502.12759 | null |
| 2025-02-18 | Playing with Voices: Tabletop Role-Playing Game Recordings as a Diarization Challenge | Lian Remme et.al. | 2502.12714 | null |
| 2025-02-18 | A Comprehensive Survey on Generative AI for Video-to-Music Generation | Shulei Ji et.al. | 2502.12489 | null |
| 2025-02-18 | Lost in Transcription, Found in Distribution Shift: Demystifying Hallucination in Speech Foundation Models | Hanin Atwany et.al. | 2502.12414 | null |
| 2025-02-18 | On the Robust Approximation of ASR Metrics | Abdul Waheed et.al. | 2502.12408 | null |
| 2025-02-17 | A Survey on Bridging EEG Signals and Generative AI: From Image and Text to Beyond | Shreya Shukla et.al. | 2502.12048 | null |
| 2025-02-17 | NaturalL2S: End-to-End High-quality Multispeaker Lip-to-Speech Synthesis with Differential Digital Signal Processing | Yifan Liang et.al. | 2502.12002 | null |
| 2025-02-17 | Can you pass that tool?: Implications of Indirect Speech in Physical Human-Robot Collaboration | Yan Zhang et.al. | 2502.11720 | null |
| 2025-02-17 | Training-Free Guidance Beyond Differentiability: Scalable Path Steering with Tree Search in Diffusion and Flow Models | Yingqing Guo et.al. | 2502.11420 | null |
| 2025-02-16 | FELLE: Autoregressive Speech Synthesis with Token-Wise Coarse-to-Fine Flow Matching | Hui Wang et.al. | 2502.11128 | null |
| 2025-02-16 | In Situ Optimization of an Optoelectronic Reservoir Computer with Digital Delayed Feedback | Fyodor Morozko et.al. | 2502.11126 | null |
| 2025-02-16 | DuplexMamba: Enhancing Real-time Speech Conversations with Duplex and Streaming Capabilities | Xiangyu Lu et.al. | 2502.11123 | null |
| 2025-02-14 | Enhancing Age-Related Robustness in Children Speaker Verification | Vishwas M. Shetty et.al. | 2502.10511 | null |
| 2025-02-14 | OWLS: Scaling Laws for Multilingual Speech Recognition and Translation Models | William Chen et.al. | 2502.10373 | null |
| 2025-02-14 | VocalCrypt: Novel Active Defense Against Deepfake Voice Based on Masking Effect | Qingyuan Fei et.al. | 2502.10329 | null |
| 2025-02-14 | Video Soundtrack Generation by Aligning Emotions and Temporal Boundaries | Serkan Sulun et.al. | 2502.10154 | null |
| 2025-02-14 | MTLM: an Innovative Language Model Training Paradigm for ASR | Qingliang Meng et.al. | 2502.10058 | null |
| 2025-02-14 | A Preliminary Exploration with GPT-4o Voice Mode | Yu-Xiang Lin et.al. | 2502.09940 | null |
| 2025-02-14 | Microphone Array Geometry Independent Multi-Talker Distant ASR: NTT System for the DASR Task of the CHiME-8 Challenge | Naoyuki Kamo et.al. | 2502.09859 | null |
| 2025-02-13 | SyntheticPop: Attacking Speaker Verification Systems With Synthetic VoicePops | Eshaq Jamdar et.al. | 2502.09553 | null |
| 2025-02-13 | Shortcut Learning Susceptibility in Vision Classifiers | Pirzada Suhail et.al. | 2502.09150 | null |
| 2025-02-13 | Quantum Approaches for Dysphonia Assessment in Small Speech Datasets | Ha Tran et.al. | 2502.08968 | null |
| 2025-02-13 | TokenSynth: A Token-based Neural Synthesizer for Instrument Cloning and Text-to-Instrument | Kyungsu Kim et.al. | 2502.08939 | link |
| 2025-02-13 | ASVspoof 5: Design, Collection and Validation of Resources for Spoofing, Deepfake, and Adversarial Attack Detection Using Crowdsourced Speech | Xin Wang et.al. | 2502.08857 | null |
| 2025-02-12 | Causal Analysis of ASR Errors for Children: Quantifying the Impact of Physiological, Cognitive, and Extrinsic Factors | Vishwanath Pratap Singh et.al. | 2502.08587 | null |
| 2025-02-11 | LoRP-TTS: Low-Rank Personalized Text-To-Speech | Łukasz Bondaruk et.al. | 2502.07562 | null |
| 2025-02-12 | Music for All: Exploring Multicultural Representations in Music Generation Models | Atharva Mehta et.al. | 2502.07328 | link |
| 2025-02-11 | Vevo: Controllable Zero-Shot Voice Imitation with Self-Supervised Disentanglement | Xueyao Zhang et.al. | 2502.07243 | null |
| 2025-02-11 | VINP: Variational Bayesian Inference with Neural Speech Prior for Joint ASR-Effective Speech Dereverberation and Blind RIR Identification | Pengyu Wang et.al. | 2502.07205 | link |
| 2025-02-10 | A Comparative Study of ASR Implementations in Resource-Constrained Wireless Sensor Networks for Real-Time Voice Communication | Qutaiba I. Ali et.al. | 2502.06969 | null |
| 2025-02-10 | Automatic Identification of Samples in Hip-Hop Music via Multi-Loss Training and an Artificial Dataset | Huw Cheston et.al. | 2502.06364 | null |
| 2025-02-09 | Speech to Speech Translation with Translatotron: A State of the Art Review | Jules R. Kala et.al. | 2502.05980 | null |
| 2025-02-09 | Audio-Visual Representation Learning via Knowledge Distillation from Speech Foundation Models | Jing-Xuan Zhang et.al. | 2502.05766 | null |
| 2025-02-09 | Non-invasive electromyographic speech neuroprosthesis: a geometric perspective | Harshavardhana T. Gowda et.al. | 2502.05762 | null |
| 2025-02-09 | BnTTS: Few-Shot Speaker Adaptation in Low-Resource Setting | Mohammad Jahid Ibna Basher et.al. | 2502.05729 | null |
| 2025-02-08 | Gender Bias in Instruction-Guided Speech Synthesis Models | Chun-Yi Kuan et.al. | 2502.05649 | null |
| 2025-02-08 | Enhancing Expressive Voice Conversion with Discrete Pitch-Conditioned Flow Matching Model | Jialong Zuo et.al. | 2502.05471 | null |
| 2025-02-07 | Evaluating Standard and Dialectal Frisian ASR: Multilingual Fine-tuning and Language Identification for Improved Low-resource Performance | Reihaneh Amooie et.al. | 2502.04883 | null |
| 2025-02-07 | Lightweight Operations for Visual Speech Recognition | Iason Ioannis Panagos et.al. | 2502.04834 | null |
| 2025-02-07 | Singing Voice Conversion with Accompaniment Using Self-Supervised Representation-Based Melody Features | Wei Chen et.al. | 2502.04722 | null |
| 2025-02-06 | ImprovNet: Generating Controllable Musical Improvisations with Iterative Corruption Refinement | Keshav Bhandari et.al. | 2502.04522 | link |
| 2025-02-06 | GenVC: Self-Supervised Zero-Shot Voice Conversion | Zexin Cai et.al. | 2502.04519 | null |
| 2025-02-06 | FocalCodec: Low-Bitrate Speech Coding via Focal Modulation Networks | Luca Della Libera et.al. | 2502.04465 | link |
| 2025-02-06 | Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis | Zhen Ye et.al. | 2502.04128 | link |
| 2025-02-06 | Afrispeech-Dialog: A Benchmark Dataset for Spontaneous English Conversations in Healthcare and Beyond | Mardhiyah Sanni et.al. | 2502.03945 | null |
| 2025-02-06 | Rule-Based Modeling of Low-Dimensional Data with PCA and Binary Particle Swarm Optimization (BPSO) in ANFIS | Afnan Al-Ali et.al. | 2502.03895 | null |
| 2025-02-05 | Integrating automatic speech recognition into remote healthcare interpreting: A pilot study of its impact on interpreting quality | Shiyi Tan et.al. | 2502.03381 | null |
| 2025-02-05 | Leveraging Broadcast Media Subtitle Transcripts for Automatic Speech Recognition and Subtitling | Jakob Poncelet et.al. | 2502.03212 | link |
| 2025-02-05 | Metis: A Foundation Speech Generation Model with Masked Generative Pre-training | Yuancheng Wang et.al. | 2502.03128 | null |
| 2025-02-04 | Developing multilingual speech synthesis system for Ojibwe, Mi’kmaq, and Maliseet | Shenran Wang et.al. | 2502.02703 | null |
| 2025-02-03 | CTC-DRO: Robust Optimization for Reducing Language Disparities in Speech Recognition | Martijn Bartelds et.al. | 2502.01777 | null |
| 2025-02-03 | Adapter-Based Multi-Agent AVSR Extension for Pre-Trained ASR Models | Christopher Simic et.al. | 2502.01709 | null |
| 2025-02-03 | A Differentiable Alignment Framework for Sequence-to-Sequence Modeling via Optimal Transport | Yacouba Kaloga et.al. | 2502.01588 | null |
| 2025-02-03 | mWhisper-Flamingo for Multilingual Audio-Visual Noise-Robust Speech Recognition | Andrew Rouditchenko et.al. | 2502.01547 | link |
| 2025-02-03 | Gradient Norm-based Fine-Tuning for Backdoor Defense in Automatic Speech Recognition | Nanjun Zhou et.al. | 2502.01152 | null |
| 2025-02-03 | Continuous Autoregressive Modeling with Stochastic Monotonic Alignment for Speech Synthesis | Weiwei Lin et.al. | 2502.01084 | null |
| 2025-02-01 | Data-Driven Mispronunciation Pattern Discovery for Robust Speech Recognition | Anna Seo Gyeong Choi et.al. | 2502.00583 | null |
| 2025-02-01 | Evaluation of End-to-End Continuous Spanish Lipreading in Different Data Conditions | David Gimeno-Gómez et.al. | 2502.00464 | null |
| 2025-02-01 | Sagalee: an Open Source Automatic Speech Recognition Dataset for Oromo Language | Turi Abu et.al. | 2502.00421 | link |
| 2025-02-01 | When End-to-End is Overkill: Rethinking Cascaded Speech-to-Text Translation | Anna Min et.al. | 2502.00377 | null |
| 2025-02-03 | SELMA: A Speech-Enabled Language Model for Virtual Assistant Interactions | Dominik Wagner et.al. | 2501.19377 | null |
| 2025-01-31 | Language Bias in Self-Supervised Learning For Automatic Speech Recognition | Edward Storey et.al. | 2501.19321 | null |
| 2025-02-03 | DyPCL: Dynamic Phoneme-level Contrastive Learning for Dysarthric Speech Recognition | Wonjun Lee et.al. | 2501.19010 | null |
| 2025-01-30 | AGAV-Rater: Adapting Large Multimodal Model for AI-Generated Audio-Visual Quality Assessment | Yuqin Cao et.al. | 2501.18314 | null |
| 2025-01-29 | Self-Supervised Frameworks for Speaker Verification via Bootstrapped Positive Sampling | Theo Lepage et.al. | 2501.17772 | null |
| 2025-01-29 | Cross-lingual Embedding Clustering for Hierarchical Softmax in Low-Resource Multilingual Speech Recognition | Zhengdong Yang et.al. | 2501.17615 | null |
| 2025-01-29 | VoicePrompter: Robust Zero-Shot Voice Conversion with Voice Prompt and Conditional Flow Matching | Ha-Yeong Choi et.al. | 2501.17612 | null |
| 2025-01-28 | Compact Neural TTS Voices for Accessibility | Kunal Jain et.al. | 2501.17332 | null |
| 2025-01-28 | RDMM: Fine-Tuned LLM Models for On-Device Robotic Decision Making with Enhanced Contextual Awareness in Specific Domains | Shady Nasrat et.al. | 2501.16899 | link |
| 2025-01-28 | AVE Speech Dataset: A Comprehensive Benchmark for Multi-Modal Speech Recognition Integrating Audio, Visual, and Electromyographic Signals | Dongliang Zhou et.al. | 2501.16780 | null |
| 2025-01-28 | SCDiar: a streaming diarization system based on speaker change detection and speech recognition | Naijun Zheng et.al. | 2501.16641 | null |
| 2025-01-27 | UniPET-SPK: A Unified Framework for Parameter-Efficient Tuning of Pre-trained Speech Models for Robust Speaker Verification | Mufan Sang et.al. | 2501.16542 | null |
| 2025-01-27 | Optimized Self-supervised Training with BEST-RQ for Speech Recognition | Ilja Baumann et.al. | 2501.16131 | null |
| 2025-01-27 | Classification Error Bound for Low Bayes Error Conditions in Machine Learning | Zijian Yang et.al. | 2501.15977 | null |
| 2025-01-26 | Stepback: Enhanced Disentanglement for Voice Conversion via Multi-Task Learning | Qian Yang et.al. | 2501.15613 | null |
| 2025-01-26 | End-to-End Target Speaker Speech Recognition Using Context-Aware Attention Mechanisms for Challenging Enrollment Scenario | Mohsen Ghane et.al. | 2501.15466 | null |
| 2025-01-26 | Overview of the Amphion Toolkit (v0.2) | Jiaqi Li et.al. | 2501.15442 | link |
| 2025-01-25 | The Multicultural Medical Assistant: Can LLMs Improve Medical ASR Errors Across Borders? | Ayo Adedeji et.al. | 2501.15310 | null |
| 2025-01-25 | Music Generation using Human-In-The-Loop Reinforcement Learning | Aju Ani Justus et.al. | 2501.15304 | null |
| 2025-01-25 | Speech Translation Refinement using Large Language Models | Huaixia Dou et.al. | 2501.15090 | link |
| 2025-01-25 | Robust Cross-Etiology and Speaker-Independent Dysarthric Speech Recognition | Satwinder Singh et.al. | 2501.14994 | null |
| 2025-01-27 | Diffusion based Text-to-Music Generation with Global and Local Text based Conditioning | Jisi Zhang et.al. | 2501.14680 | null |
| 2025-01-24 | FireRedASR: Open-Source Industrial-Grade Mandarin Speech Recognition Models from Encoder-Decoder to LLM Integration | Kai-Tuo Xu et.al. | 2501.14350 | link |
| 2025-01-24 | Characteristic-Specific Partial Fine-Tuning for Efficient Emotion and Speaker Adaptation in Codec Language Text-to-Speech Models | Tianrui Wang et.al. | 2501.14273 | null |
| 2025-01-24 | Generalizable Audio Deepfake Detection via Latent Space Refinement and Augmentation | Wen Huang et.al. | 2501.14240 | null |
| 2025-01-24 | LoCoML: A Framework for Real-World ML Inference Pipelines | Kritin Maddireddy et.al. | 2501.14165 | null |
| 2025-01-23 | Integrating Persian Lip Reading in Surena-V Humanoid Robot for Human-Robot Interaction | Ali Farshian Abbasi et.al. | 2501.13996 | null |
| 2025-01-23 | Predicting Compact Phrasal Rewrites with Large Language Models for ASR Post Editing | Hao Zhang et.al. | 2501.13831 | null |
| 2025-01-23 | Learning-based A Posteriori Speech Presence Probability Estimation and Applications | Shuai Tao et.al. | 2501.13642 | null |
| 2025-01-23 | DQ-Data2vec: Decoupling Quantization for Multilingual Speech Recognition | Qijie Shao et.al. | 2501.13497 | null |
| 2025-01-23 | Generative Data Augmentation Challenge: Zero-Shot Speech Synthesis for Personalized Speech Enhancement | Jae-Sung Bae et.al. | 2501.13372 | null |
| 2025-01-23 | OSUM: Advancing Open Speech Understanding Models with Limited Resources in Academia | Xuelong Geng et.al. | 2501.13306 | link |
| 2025-01-22 | Let SSMs be ConvNets: State-space Modeling with Optimal Tensor Contractions | Yan Ru Pei et.al. | 2501.13230 | link |
| 2025-01-22 | FlanEC: Exploring Flan-T5 for Post-ASR Error Correction | Moreno La Quatra et.al. | 2501.12979 | link |
| 2025-01-21 | A Domain Adaptation Framework for Speech Recognition Systems with Only Synthetic data | Minh Tran et.al. | 2501.12501 | null |
| 2025-01-21 | DOTA-ME-CS: Daily Oriented Text Audio-Mandarin English-Code Switching Dataset | Yupei Li et.al. | 2501.12122 | null |
| 2025-01-20 | Investigation of Whisper ASR Hallucinations Induced by Non-Speech Audio | Mateusz Barański et.al. | 2501.11378 | null |
| 2025-01-20 | SEF-PNet: Speaker Encoder-Free Personalized Speech Enhancement with Local and Global Contexts Aggregation | Ziling Huang et.al. | 2501.11274 | null |
| 2025-01-19 | Enhancing Neural Spoken Language Recognition: An Exploration with Multilingual Datasets | Or Haim Anidjar et.al. | 2501.11065 | null |
| 2025-01-18 | A Benchmark of French ASR Systems Based on Error Severity | Antoine Tholly et.al. | 2501.10879 | null |
| 2025-01-18 | GEC-RAG: Improving Generative Error Correction via Retrieval-Augmented Generation for Automatic Speech Recognition Systems | Amin Robatian et.al. | 2501.10734 | null |
| 2025-01-17 | Unsupervised Rhythm and Voice Conversion of Dysarthric to Healthy Speech for ASR | Karl El Hajal et.al. | 2501.10256 | null |
| 2025-01-17 | Automatic Speech Recognition for Sanskrit with Transfer Learning | Bidit Sadhukhan et.al. | 2501.10024 | null |
| 2025-01-17 | GVMGen: A General Video-to-Music Generation Model with Hierarchical Attentions | Heda Zuo et.al. | 2501.09972 | null |
| 2025-01-21 | PIER: A Novel Metric for Evaluating What Matters in Code-Switching | Enes Yavuz Ugan et.al. | 2501.09512 | link |
| 2025-01-16 | Teaching Wav2Vec2 the Language of the Brain | Tobias Fiedler et.al. | 2501.09459 | link |
| 2025-01-16 | Delayed Fusion: Integrating Large Language Models into First-Pass Decoding in End-to-end Speech Recognition | Takaaki Hori et.al. | 2501.09258 | null |
| 2025-01-17 | persoDA: Personalized Data Augmentation for Personalized ASR | Pablo Peso Parada et.al. | 2501.09113 | null |
| 2025-01-15 | A Non-autoregressive Model for Joint STT and TTS | Vishal Sunder et.al. | 2501.09104 | null |
| 2025-01-13 | Discrimination loss vs. SRT: A model-based approach towards harmonizing speech test interpretations | Mareike Buhl et.al. | 2501.08921 | null |
| 2025-01-15 | XMusic: Towards a Generalized and Controllable Symbolic Music Generation Framework | Sida Tian et.al. | 2501.08809 | null |
| 2025-01-15 | Speech Synthesis along Perceptual Voice Quality Dimensions | Frederik Rautenberg et.al. | 2501.08791 | null |
| 2025-01-15 | Adaptive Data Augmentation with NaturalSpeech3 for Far-field Speaker Verification | Li Zhang et.al. | 2501.08691 | null |
| 2025-01-15 | Adapting Whisper for Regional Dialects: Enhancing Public Services for Vulnerable Populations in the United Kingdom | Melissa Torgbi et.al. | 2501.08502 | null |
| 2025-01-14 | Selective Attention Merging for low resource tasks: A case study of Child ASR | Natarajan Balaji Shankar et.al. | 2501.08468 | link |
| 2025-01-14 | Loudspeaker Beamforming to Enhance Speech Recognition Performance of Voice Driven Applications | Dimme de Groot et.al. | 2501.08104 | null |
| 2025-01-13 | Exploring the encoding of linguistic representations in the Fully-Connected Layer of generative CNNs for Speech | Bruno Ferenc Šegedin et.al. | 2501.07726 | null |
| 2025-01-13 | Joint Automatic Speech Recognition And Structure Learning For Better Speech Understanding | Jiliang Hu et.al. | 2501.07329 | null |
| 2025-01-13 | Audio-CoT: Exploring Chain-of-Thought Reasoning in Large Audio Language Model | Ziyang Ma et.al. | 2501.07246 | null |
| 2025-01-13 | AdaCS: Adaptive Normalization for Enhanced Code-Switching ASR | The Chuong Chu et.al. | 2501.07102 | null |
| 2025-01-11 | Discrete Speech Unit Extraction via Independent Component Analysis | Tomohiko Nakamura et.al. | 2501.06562 | link |
| 2025-01-11 | A Survey on Spoken Italian Datasets and Corpora | Marco Giordano et.al. | 2501.06557 | null |
| 2025-01-11 | Speech Recognition for Automatically Assessing Afrikaans and isiXhosa Preschool Oral Narratives | Christiaan Jacobs et.al. | 2501.06478 | null |
| 2025-01-11 | Retrieval-Augmented Dialogue Knowledge Aggregation for Expressive Conversational Speech Synthesis | Rui Liu et.al. | 2501.06467 | null |
| 2025-01-10 | TTS-Transducer: End-to-End Speech Synthesis with Neural Transducer | Vladimir Bataev et.al. | 2501.06320 | null |
| 2025-01-10 | Contextual ASR Error Handling with LLMs Augmentation for Goal-Oriented Conversational AI | Yuya Asano et.al. | 2501.06129 | null |
| 2025-01-10 | Fleurs-SLU: A Massively Multilingual Benchmark for Spoken Language Understanding | Fabian David Schmidt et.al. | 2501.06117 | link |
| 2025-01-10 | Benchmarking Rotary Position Embeddings for Automatic Speech Recognition | Shucong Zhang et.al. | 2501.06051 | null |
| 2025-01-10 | Comparing Self-Supervised Learning Models Pre-Trained on Human Speech and Animal Vocalizations for Bioacoustics Processing | Eklavya Sarkar et.al. | 2501.05987 | link |
| 2025-01-10 | Low-Resource Text-to-Speech Synthesis Using Noise-Augmented Training of ForwardTacotron | Kishor Kayyar Lakshminarayana et.al. | 2501.05976 | null |
| 2025-01-10 | Universal-2-TF: Robust All-Neural Text Formatting for ASR | Yash Khare et.al. | 2501.05948 | null |
| 2025-01-10 | ExPO: Explainable Phonetic Trait-Oriented Network for Speaker Verification | Yi Ma et.al. | 2501.05729 | link |
| 2025-01-09 | FreeSVC: Towards Zero-shot Multilingual Singing Voice Conversion | Alef Iury Siqueira Ferreira et.al. | 2501.05586 | link |
| 2025-01-09 | Probing Speaker-specific Features in Speaker Representations | Aemon Yat Fei Chiu et.al. | 2501.05310 | null |
| 2025-01-09 | DiffAttack: Diffusion-based Timbre-reserved Adversarial Attack in Speaker Identification | Qing Wang et.al. | 2501.05127 | null |
| 2025-01-09 | JELLY: Joint Emotion Recognition and Context Reasoning with LLMs for Conversational Speech Synthesis | Jun-Hyeok Cha et.al. | 2501.04904 | null |
| 2025-01-08 | FleSpeech: Flexibly Controllable Speech Generation with Various Prompts | Hanzhao Li et.al. | 2501.04644 | null |
| 2025-01-09 | OpenOmni: Large Language Models Pivot Zero-shot Omnimodal Alignment across Language with Real-time Self-Aware Emotional Speech Synthesis | Run Luo et.al. | 2501.04561 | null |
| 2025-01-09 | Right Label Context in End-to-End Training of Time-Synchronous ASR Models | Tina Raissi et.al. | 2501.04521 | null |
| 2025-01-08 | PolInterviews – A Dataset of German Politician Public Broadcast Interviews | Lukas Birkenmaier et.al. | 2501.04484 | null |
| 2025-01-08 | ZSVC: Zero-shot Style Voice Conversion with Disentangled Latent Diffusion Models and Adversarial Training | Xinfa Zhu et.al. | 2501.04416 | null |
| 2025-01-08 | Phone-purity Guided Discrete Tokens for Dysarthric Speech Recognition | Huimeng Wang et.al. | 2501.04379 | null |
| 2025-01-08 | DrawSpeech: Expressive Speech Synthesis Using Prosodic Sketches as Control Conditions | Weidong Chen et.al. | 2501.04256 | null |
| 2025-01-08 | LipGen: Viseme-Guided Lip Video Generation for Enhancing Visual Speech Recognition | Bowen Hao et.al. | 2501.04204 | null |
| 2025-01-07 | Spectral-Aware Low-Rank Adaptation for Speaker Verification | Zhe Li et.al. | 2501.03829 | link |
| 2025-01-07 | NeuroIncept Decoder for High-Fidelity Speech Reconstruction from Neural Activity | Owais Mujtaba Khanday et.al. | 2501.03757 | null |
| 2025-01-07 | Universal Speaker Embedding Free Target Speaker Extraction and Personal Voice Activity Detection | Bang Zeng et.al. | 2501.03612 | null |
| 2025-01-07 | Towards a Generalizable Speech Marker for Parkinson’s Disease Diagnosis | Maksim Siniukov et.al. | 2501.03581 | null |
| 2025-01-07 | Deep Learning for Pathological Speech: A Survey | Shakeel A. Sheikh et.al. | 2501.03536 | null |
| 2025-01-02 | FaceSpeak: Expressive and High-Quality Speech Synthesis from Human Portraits of Different Styles | Tian-Hao Zhang et.al. | 2501.03181 | null |
| 2025-01-06 | SYKI-SVC: Advancing Singing Voice Conversion with Post-Processing Innovations and an Open-Source Professional Testset | Yiquan Zhou et.al. | 2501.02953 | null |
| 2025-01-07 | Samba-ASR: State-Of-The-Art Speech Recognition Leveraging Structured State-Space Models | Syed Abdul Gaffar Shakhadri et.al. | 2501.02832 | null |
| 2025-01-05 | Reducing the Gap Between Pretrained Speech Enhancement and Recognition Models Using a Real Speech-Trained Bridging Module | Zhongjian Cui et.al. | 2501.02452 | null |
| 2025-01-03 | Improving Transducer-Based Spoken Language Understanding with Self-Conditioned CTC and Knowledge Transfer | Vishal Sunder et.al. | 2501.01936 | null |
| 2025-01-03 | CycleFlow: Leveraging Cycle Consistency in Flow Matching for Speaker Style Adaptation | Ziqi Liang et.al. | 2501.01861 | null |
| 2025-01-03 | MusicGen-Stem: Multi-stem music generation and edition through autoregressive modeling | Simon Rouard et.al. | 2501.01757 | null |
| 2025-01-03 | Controlling your Attributes in Voice | Xuyuan Li et.al. | 2501.01674 | null |
| 2025-01-03 | AdaptVC: High Quality Voice Conversion with Adaptive Learning | Jaehun Kim et.al. | 2501.01347 | null |
| 2025-01-02 | Advancing Singlish Understanding: Bridging the Gap with Datasets and Multimodal Models | Bin Wang et.al. | 2501.01034 | link |
| 2025-01-01 | Incremental Dialogue Management: Survey, Discussion, and Implications for HRI | Casey Kennington et.al. | 2501.00953 | null |
| 2025-01-01 | Large Language Models Are Read/Write Policy-Makers for Simultaneous Generation | Shoutao Guo et.al. | 2501.00868 | link |
| 2025-01-01 | Automatic Text Pronunciation Correlation Generation and Application for Contextual Biasing | Gaofeng Cheng et.al. | 2501.00804 | null |
| 2024-12-31 | Fotheidil: an Automatic Transcription System for the Irish Language | Liam Lonergan et.al. | 2501.00509 | null |
| 2024-12-31 | Unrolled Creative Adversarial Network For Generating Novel Musical Pieces | Pratik Nag et.al. | 2501.00452 | null |
| 2024-12-31 | Whisper Turns Stronger: Augmenting Wav2Vec 2.0 for Superior ASR in Low-Resource Languages | Or Haim Anidjar et.al. | 2501.00425 | null |
| 2024-12-30 | Takeaways from Applying LLM Capabilities to Multiple Conversational Avatars in a VR Pilot Study | Mykola Maslych et.al. | 2501.00168 | null |
| 2024-12-30 | DiCoW: Diarization-Conditioned Whisper for Target Speaker Automatic Speech Recognition | Alexander Polok et.al. | 2501.00114 | null |
| 2024-12-29 | EmoReg: Directional Latent Vector Modeling for Emotional Intensity Regularization in Diffusion-based Voice Conversion | Ashishkumar Gudmalwar et.al. | 2412.20359 | null |
| 2024-12-28 | Stable-TTS: Stable Speaker-Adaptive Text-to-Speech Synthesis via Prosody Prompting | Wooseok Han et.al. | 2412.20155 | null |
| 2024-12-28 | CrossSpeech++: Cross-lingual Speech Synthesis with Decoupled Language and Speaker Generation | Ji-Hoon Kim et.al. | 2412.20048 | null |
| 2024-12-27 | Enhancing Whisper’s Accuracy and Speed for Indian Languages through Prompt-Tuning and Tokenization | Kumud Tripathi et.al. | 2412.19785 | null |
| 2024-12-26 | Towards a Single ASR Model That Generalizes to Disordered Speech | Jimmy Tobin et.al. | 2412.19315 | null |
| 2024-12-26 | VoiceDiT: Dual-Condition Diffusion Transformer for Environment-Aware Speech Synthesis | Jaemin Jung et.al. | 2412.19259 | null |
| 2024-12-26 | Attacking Voice Anonymization Systems with Augmented Feature and Speaker Identity Difference | Yanzhe Zhang et.al. | 2412.19068 | null |
| 2024-12-26 | Enhancing Audiovisual Speech Recognition through Bifocal Preference Optimization | Yihan Wu et.al. | 2412.19005 | link |
| 2024-12-25 | MRI2Speech: Speech Synthesis from Articulatory Movements Recorded by Real-time MRI | Neil Shah et.al. | 2412.18836 | null |
| 2024-12-25 | Structured Speaker-Deficiency Adaptation of Foundation Models for Dysarthric and Elderly Speech Recognition | Shujie Hu et.al. | 2412.18832 | null |
| 2024-12-25 | Zema Dataset: A Comprehensive Study of Yaredawi Zema with a Focus on Horologium Chants | Mequanent Argaw Muluneh et.al. | 2412.18784 | null |
| 2024-12-25 | Intra- and Inter-modal Context Interaction Modeling for Conversational Speech Synthesis | Zhenqi Jia et.al. | 2412.18733 | null |
| 2024-12-24 | Zero-resource Speech Translation and Recognition with LLMs | Karel Mundnich et.al. | 2412.18566 | null |
| 2024-12-23 | Trading Devil RL: Backdoor attack via Stock market, Bayesian Optimization and Reinforcement Learning | Orson Mengara et.al. | 2412.17908 | null |
| 2024-12-23 | Investigating Prosodic Signatures via Speech Pre-Trained Models for Audio Deepfake Source Attribution | Orchid Chetia Phukan et.al. | 2412.17796 | null |
| 2024-12-23 | VERSA: A Versatile Evaluation Toolkit for Speech, Audio, and Music | Jiatong Shi et.al. | 2412.17667 | link |
| 2024-12-23 | UME: Upcycling Mixture-of-Experts for Scalable and Efficient Automatic Speech Recognition | Li Fu et.al. | 2412.17507 | null |
| 2024-12-23 | Deep Learning in Proteomics Informatics: Applications, Challenges, and Future Directions | Yindan Luo et.al. | 2412.17349 | null |
| 2024-12-23 | Friends-MMC: A Dataset for Multi-modal Multi-party Conversation Understanding | Yueqian Wang et.al. | 2412.17295 | link |
| 2024-12-22 | Analysis of Speech Temporal Dynamics in the Context of Speaker Verification and Voice Anonymization | Natalia Tomashenko et.al. | 2412.17164 | null |
| 2024-12-22 | Tandem spoofing-robust automatic speaker verification based on time-domain embeddings | Avishai Weizman et.al. | 2412.17133 | null |
| 2024-12-22 | Uncovering the Visual Contribution in Audio-Visual Speech Recognition | Zhaofeng Lin et.al. | 2412.17129 | null |
| 2024-12-22 | Incremental Disentanglement for Environment-Aware Zero-Shot Text-to-Speech Synthesis | Ye-Xin Lu et.al. | 2412.16977 | null |
| 2024-12-22 | Autoregressive Speech Synthesis with Next-Distribution Prediction | Xinfa Zhu et.al. | 2412.16846 | null |
| 2024-12-20 | MathSpeech: Leveraging Small LMs for Accurate Conversion in Mathematical Speech-to-Formula | Sieun Hyeon et.al. | 2412.15655 | link |
| 2024-12-20 | TouchASP: Elastic Automatic Speech Perception that Everyone Can Touch | Xingchen Song et.al. | 2412.15622 | null |
| 2024-12-19 | Transcribing and Translating, Fast and Slow: Joint Speech Translation and Recognition | Niko Moritz et.al. | 2412.15415 | null |
| 2024-12-19 | LAMA-UT: Language Agnostic Multilingual ASR through Orthography Unification and Language-Specific Transliteration | Sangmin Lee et.al. | 2412.15299 | null |
| 2024-12-17 | Deep Speech Synthesis from Multimodal Articulatory Representations | Peter Wu et.al. | 2412.13387 | null |
| 2024-12-17 | CAMEL: Cross-Attention Enhanced Mixture-of-Experts and Language Bias for Code-Switching Speech Recognition | He Wang et.al. | 2412.12760 | null |
| 2024-12-17 | Streaming Keyword Spotting Boosted by Cross-layer Discrimination Consistency | Yu Xi et.al. | 2412.12635 | null |
| 2024-12-17 | Hierarchical Control of Emotion Rendering in Speech Synthesis | Sho Inoue et.al. | 2412.12498 | link |
| 2024-12-17 | Speak & Improve Corpus 2025: an L2 English Speech Corpus for Language Assessment and Feedback | Kate Knill et.al. | 2412.11986 | null |
| 2024-12-17 | Speak & Improve Challenge 2025: Tasks and Baseline Systems | Mengjie Qian et.al. | 2412.11985 | null |
| 2024-12-19 | ProsodyFM: Unsupervised Phrasing and Intonation Control for Intelligible Speech Synthesis | Xiangheng He et.al. | 2412.11795 | null |
| 2024-12-16 | Region-Based Optimization in Continual Learning for Audio Deepfake Detection | Yujie Chen et.al. | 2412.11551 | link |
| 2024-12-16 | Towards a Speech Foundation Model for Singapore and Beyond | Muhammad Huzaifah et.al. | 2412.11538 | null |
| 2024-12-15 | Transliterated Zero-Shot Domain Adaptation for Automatic Speech Recognition | Han Zhu et.al. | 2412.11185 | null |
| 2024-12-14 | MASV: Speaker Verification with Global and Local Context Mamba | Yang Liu et.al. | 2412.10989 | null |
| 2024-12-14 | Robust Recognition of Persian Isolated Digits in Speech using Deep Neural Network | Ali Nasr-Esfahani et.al. | 2412.10857 | null |
| 2024-12-14 | Efficient Adaptation of Multilingual Models for Japanese ASR | Mark Bajo et.al. | 2412.10705 | null |
| 2024-12-16 | Efficient Generative Modeling with Residual Vector Quantization-Based Tokens | Jaehyeon Kim et.al. | 2412.10208 | null |
| 2024-12-13 | CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models | Zhihao Du et.al. | 2412.10117 | null |
| 2024-12-13 | AMuSeD: An Attentive Deep Neural Network for Multimodal Sarcasm Detection Incorporating Bi-modal Data Augmentation | Xiyuan Gao et.al. | 2412.10103 | null |
| 2024-12-13 | CSL-L2M: Controllable Song-Level Lyric-to-Melody Generation Based on Conditional Transformer with Fine-Grained Lyric and Musical Controls | Li Chai et.al. | 2412.09887 | null |
| 2024-12-13 | MERaLiON-AudioLLM: Technical Report | Yingxu He et.al. | 2412.09818 | null |
| 2024-12-12 | Multimodal Music Generation with Explicit Bridges and Retrieval Augmentation | Baisen Wang et.al. | 2412.09428 | link |
| 2024-12-12 | Interpreting Graphic Notation with MusicLDM: An AI Improvisation of Cornelius Cardew’s Treatise | Tornike Karchkhadze et.al. | 2412.08944 | null |
| 2024-12-11 | Multimodal Latent Language Modeling with Next-Token Diffusion | Yutao Sun et.al. | 2412.08635 | link |
| 2024-12-12 | Watermarking Training Data of Music Generation Models | Pascal Epple et.al. | 2412.08549 | null |
| 2024-12-11 | Bilevel Joint Unsupervised and Supervised Training for Automatic Speech Recognition | Xiaodong Cui et.al. | 2412.08548 | null |
| 2024-12-11 | Zero-Shot Mono-to-Binaural Speech Synthesis | Alon Levkovitch et.al. | 2412.08356 | null |
| 2024-12-11 | A Unified Model For Voice and Accent Conversion In Speech and Singing using Self-Supervised Learning and Feature Extraction | Sowmya Cheripally et.al. | 2412.08312 | null |
| 2024-12-10 | Frechet Music Distance: A Metric For Generative Symbolic Music Evaluation | Jan Retkowski et.al. | 2412.07948 | null |
| 2024-12-10 | Style-agnostic evaluation of ASR using multiple reference transcripts | Quinten McNamara et.al. | 2412.07937 | null |
| 2024-12-09 | Effective Text Adaptation for LLM-based ASR through Soft Prompt Fine-Tuning | Yingyi Ma et.al. | 2412.06967 | null |
| 2024-12-09 | MuMu-LLaMA: Multi-modal Music Understanding and Generation via Large Language Models | Shansong Liu et.al. | 2412.06660 | null |
| 2024-12-09 | Towards Controllable Speech Synthesis in the Era of Large Language Models: A Survey | Tianxin Xie et.al. | 2412.06602 | link |
| 2024-12-09 | Not All Errors Are Equal: Investigation of Speech Recognition Errors in Alzheimer’s Disease Detection | Jiawen Kang et.al. | 2412.06332 | null |
| 2024-12-09 | VidMusician: Video-to-Music Generation with Semantic-Rhythmic Alignment via Hierarchical Visual Features | Sifei Li et.al. | 2412.06296 | null |
| 2024-12-09 | Leveraging Prompt Learning and Pause Encoding for Alzheimer’s Disease Detection | Yin-Long Liu et.al. | 2412.06259 | null |
| 2024-12-07 | SQ-Whisper: Speaker-Querying based Whisper Model for Target-Speaker ASR | Pengcheng Guo et.al. | 2412.05589 | null |
| 2024-12-06 | Adaptive Dropout for Pruning Conformers | Yotaro Kubo et.al. | 2412.04836 | null |
| 2024-12-10 | StableVC: Style Controllable Zero-Shot Voice Conversion with Conditional Flow Matching | Jixun Yao et.al. | 2412.04724 | null |
| 2024-12-05 | Missing Melodies: AI Music Generation and its “Nearly” Complete Omission of the Global South | Atharva Mehta et.al. | 2412.04100 | null |
| 2024-12-05 | Comprehensive Audio Query Handling System with Integrated Expert Models and Contextual Understanding | Vakada Naveen et.al. | 2412.03980 | null |
| 2024-12-05 | Speech Recognition-based Feature Extraction for Enhanced Automatic Severity Classification in Dysarthric Speech | Yerin Choi et.al. | 2412.03784 | null |
| 2024-12-04 | ASR-EC Benchmark: Evaluating Large Language Models on Chinese ASR Error Correction | Victor Junqiu Wei et.al. | 2412.03075 | null |
| 2024-12-04 | Analytic Study of Text-Free Speech Synthesis for Raw Audio using a Self-Supervised Learning Model | Joonyong Park et.al. | 2412.03074 | null |
| 2024-12-03 | GLM-4-Voice: Towards Intelligent and Human-Like End-to-End Spoken Chatbot | Aohan Zeng et.al. | 2412.02612 | link |
| 2024-12-01 | Late fusion ensembles for speech recognition on diverse input audio representations | Marin Jezidžić et.al. | 2412.01861 | null |
| 2024-12-02 | Memory-Efficient Training for Deep Speaker Embedding Learning in Speaker Verification | Bei Liu et.al. | 2412.01195 | null |
| 2024-12-01 | Automating Feedback Analysis in Surgical Training: Detection, Categorization, and Assessment | Firdavs Nasriddinov et.al. | 2412.00760 | link |
| 2024-12-04 | A Comparative Study of LLM-based ASR and Whisper in Low Resource and Code Switching Scenario | Zheshu Song et.al. | 2412.00721 | null |
| 2024-11-30 | From Audio Deepfake Detection to AI-Generated Music Detection – A Pathway and Overview | Yupei Li et.al. | 2412.00571 | null |
| 2024-11-30 | Sample adaptive data augmentation with progressive scheduling | Hongxuan Lu et.al. | 2412.00415 | null |
| 2024-11-30 | Empowering the Deaf and Hard of Hearing Community: Enhancing Video Captions Using Large Language Models | Nadeen Fathallah et.al. | 2412.00342 | null |
| 2024-11-30 | MusicGen-Chord: Advancing Music Generation through Chord Progressions and Interactive Web-UI | Jongmin Jung et.al. | 2412.00325 | null |
| 2024-11-30 | Improving speaker verification robustness with synthetic emotional utterances | Nikhil Kumar Koditala et.al. | 2412.00319 | null |
| 2024-11-29 | Noro: A Noise-Robust One-shot Voice Conversion System with Hidden Speaker Representation Capabilities | Haorui He et.al. | 2411.19770 | null |
| 2024-11-29 | Memristive Nanowire Network for Energy Efficient Audio Classification: Pre-Processing-Free Reservoir Computing with Reduced Latency | Akshaya Rajesh et.al. | 2411.19611 | null |
| 2024-12-02 | CoDiff-VC: A Codec-Assisted Diffusion Model for Zero-shot Voice Conversion | Yuke Li et.al. | 2411.18918 | null |
| 2024-11-28 | ArEEG_Words: Dataset for Envisioned Speech Recognition using EEG for Arabic Words | Hazem Darwish et.al. | 2411.18888 | null |
| 2024-11-27 | EEG-Based Analysis of Brain Responses in Multi-Modal Human-Robot Interaction: Modulating Engagement | Suzanne Oliver et.al. | 2411.18587 | null |
| 2024-11-27 | AMPS: ASR with Multimodal Paraphrase Supervision | Amruta Parulekar et.al. | 2411.18368 | null |
| 2024-11-27 | Continual Learning in Machine Speech Chain Using Gradient Episodic Memory | Geoffrey Tyndall et.al. | 2411.18320 | null |
| 2024-11-27 | Aligning Pre-trained Models for Spoken Language Translation | Šimon Sedláček et.al. | 2411.18294 | null |
| 2024-11-27 | Efficient Nonlinear Function Approximation in Analog Resistive Crossbars for Recurrent Neural Networks | Junyi Yang et.al. | 2411.18271 | null |
| 2024-11-27 | How to Learn a New Language? An Efficient Solution for Self-Supervised Learning Models Unseen Languages Adaption in Low-Resource Scenario | Shih-Heng Wang et.al. | 2411.18217 | null |
| 2024-11-27 | Machine Unlearning reveals that the Gender-based Violence Victim Condition can be detected from Speech in a Speaker-Agnostic Setting | Emma Reyner-Fuentes et.al. | 2411.18177 | null |
| 2024-11-27 | MSA-ASR: Efficient Multilingual Speaker Attribution with frozen ASR Models | Thai-Binh Nguyen et.al. | 2411.18152 | null |
| 2024-11-27 | SALMONN-omni: A Codec-free LLM for Full-duplex Speech Understanding and Generation | Wenyi Yu et.al. | 2411.18138 | null |
| 2024-11-27 | Fusion of Discrete Representations and Self-Augmented Representations for Multilingual Automatic Speech Recognition | Shih-heng Wang et.al. | 2411.18107 | null |
| 2024-11-26 | Visatronic: A Multimodal Decoder-Only Model for Speech Synthesis | Akshita Gupta et.al. | 2411.17690 | null |
| 2024-11-26 | Scaling Speech-Text Pre-training with Synthetic Interleaved Data | Aohan Zeng et.al. | 2411.17607 | null |
| 2024-11-26 | Towards Maximum Likelihood Training for Transducer-based Streaming Speech Recognition | Hyeonseung Lee et.al. | 2411.17537 | null |
| 2024-11-26 | Comparative Analysis of ASR Methods for Speech Deepfake Detection | Davide Salvi et.al. | 2411.17349 | null |
| 2024-11-26 | k2SSL: A Faster and Better Framework for Self-Supervised Speech Representation Learning | Yifan Yang et.al. | 2411.17100 | null |
| 2024-11-25 | Synthesising Handwritten Music with GANs: A Comprehensive Evaluation of CycleWGAN, ProGAN, and DCGAN | Elona Shatri et.al. | 2411.16405 | null |
| 2024-11-25 | The SVASR System for Text-dependent Speaker Verification (TdSV) AAIC Challenge 2024 | Mohammadreza Molavi et.al. | 2411.16276 | null |
| 2024-11-25 | SKQVC: One-Shot Voice Conversion by K-Means Quantization with Self-Supervised Speech Representations | Youngjun Sim et.al. | 2411.16147 | null |
| 2024-11-24 | A Training-Free Approach for Music Style Transfer with Latent Diffusion Models | Sooyoung Kim et.al. | 2411.15913 | null |
| 2024-11-22 | Transforming NLU with Babylon: A Case Study in Development of Real-time, Edge-Efficient, Multi-Intent Translation System for Automated Drive-Thru Ordering | Mostafa Varzaneh et.al. | 2411.15372 | null |
| 2024-11-22 | Towards Speaker Identification with Minimal Dataset and Constrained Resources using 1D-Convolution Neural Network | Irfan Nafiz Shahan et.al. | 2411.15082 | link |
| 2024-11-22 | VQalAttent: a Transparent Speech Generation Pipeline based on Transformer-learned VQ-VAE Latent Space | Armani Rodriguez et.al. | 2411.14642 | null |
| 2024-11-21 | Generative AI for Music and Audio | Hao-Wen Dong et.al. | 2411.14627 | null |
| 2024-11-20 | From Statistical Methods to Pre-Trained Models; A Survey on Automatic Speech Recognition for Resource Scarce Urdu Language | Muhammad Sharif et.al. | 2411.14493 | null |
| 2024-11-21 | Tiny-Align: Bridging Automatic Speech Recognition and Large Language Model on the Edge | Ruiyang Qin et.al. | 2411.13766 | null |
| 2024-11-18 | A Novel Speech Analysis and Correction Tool for Arabic-Speaking Children | Lamia Berriche et.al. | 2411.13592 | null |
| 2024-11-20 | CAFE A Novel Code switching Dataset for Algerian Dialect French and English | Houssam Eddine-Othman Lachemat et.al. | 2411.13424 | null |
| 2024-11-20 | I2TTS: Image-indicated Immersive Text-to-speech Synthesis with Spatial Perception | Jiawei Zhang et.al. | 2411.13314 | null |
| 2024-11-20 | Hard-Synth: Synthesizing Diverse Hard Samples for ASR using Zero-Shot TTS and LLM | Jiawei Yu et.al. | 2411.13159 | null |
| 2024-11-21 | Improving Controllability and Editability for Pretrained Text-to-Music Generation Models | Yixiao Zhang et.al. | 2411.12641 | null |
| 2024-11-19 | Whisper Finetuning on Nepali Language | Sanjay Rijal et.al. | 2411.12587 | null |
| 2024-11-18 | An Investigation of Reprogramming for Cross-Language Adaptation in Speaker Verification Systems | Jingyu Li et.al. | 2411.11353 | null |
| 2024-11-18 | Study of the Performance of CEEMDAN in Underdetermined Speech Separation | Rawad Melhem et.al. | 2411.11312 | null |
| 2024-11-18 | SAMOS: A Neural MOS Prediction Model Leveraging Semantic Representations and Acoustic Features | Yu-Fei Shi et.al. | 2411.11232 | null |
| 2024-11-17 | Inter-linguistic Phonetic Composition (IPC): A Theoretical and Computational Approach to Enhance Second Language Pronunciation | Jisang Park et.al. | 2411.10927 | null |
| 2024-11-16 | BanglaDialecto: An End-to-End AI-Powered Regional Speech Standardization | Md. Nazmus Sadat Samin et.al. | 2411.10879 | link |
| 2024-11-16 | Bilingual Text-dependent Speaker Verification with Pre-trained Models for TdSV Challenge 2024 | Seyed Ali Farokh et.al. | 2411.10828 | null |
| 2024-11-15 | SmoothCache: A Universal Inference Acceleration Technique for Diffusion Transformers | Joseph Liu et.al. | 2411.10510 | link |
| 2024-11-15 | Interactive Cycle Model – The Linkage Combination among Automatic Speech Recognition, Large Language Models and Smart Glasses | Libo Wang et.al. | 2411.10362 | null |
| 2024-11-15 | Systolic Arrays and Structured Pruning Co-design for Efficient Transformers in Edge Systems | Pedro Palacios et.al. | 2411.10285 | null |
| 2024-11-15 | DiMoDif: Discourse Modality-information Differentiation for Audio-visual Deepfake Detection and Localization | Christos Koutlis et.al. | 2411.10193 | null |
| 2024-11-15 | XLSR-Mamba: A Dual-Column Bidirectional State Space Model for Spoofing Attack Detection | Yang Xiao et.al. | 2411.10027 | null |
| 2024-11-15 | Zero-shot Voice Conversion with Diffusion Transformers | Songting Liu et.al. | 2411.09943 | null |
| 2024-11-14 | Everyone deserves their voice to be heard: Analyzing Predictive Gender Bias in ASR Models Applied to Dutch Speech Data | Rik Raes et.al. | 2411.09431 | null |
| 2024-11-14 | Transferable Adversarial Attacks against ASR | Xiaoxue Gao et.al. | 2411.09220 | null |
| 2024-11-14 | Robust AI-Synthesized Speech Detection Using Feature Decomposition Learning and Synthesizer Feature Augmentation | Kuiyuan Zhang et.al. | 2411.09167 | null |
| 2024-11-13 | Language Models for Music Medicine Generation | Emmanouil Nikolakakis et.al. | 2411.09080 | null |
| 2024-11-14 | Evaluating Synthetic Command Attacks on Smart Voice Assistants | Zhengxian He et.al. | 2411.08316 | null |
| 2024-11-13 | PerceiverS: A Multi-Scale Perceiver with Effective Segmentation for Long-Term Expressive Symbolic Music Generation | Yungang Yi et.al. | 2411.08307 | null |
| 2024-11-11 | Mamba-based Decoder-Only Approach with Bidirectional Speech Modeling for Speech Recognition | Yoshiki Masuyama et.al. | 2411.06968 | link |
| 2024-11-11 | DCF-DS: Deep Cascade Fusion of Diarization and Separation for Speech Recognition under Realistic Single-Channel Conditions | Shu-Tong Niu et.al. | 2411.06667 | null |
| 2024-11-10 | Debatts: Zero-Shot Debating Text-to-Speech Synthesis | Yiqiao Huang et.al. | 2411.06540 | null |
| 2024-11-10 | CTC-Assisted LLM-Based Contextual ASR | Guanrou Yang et.al. | 2411.06437 | link |
| 2024-11-07 | Dialectal Coverage And Generalization in Arabic Speech Recognition | Amirbek Djanibekov et.al. | 2411.05872 | null |
| 2024-11-07 | Sentiment Analysis of Spanish Political Party Tweets Using Pre-trained Language Models | Chuqiao Song et.al. | 2411.04862 | null |
| 2024-11-07 | Multistage Fine-tuning Strategies for Automatic Speech Recognition in Low-resource Languages | Leena G Pillai et.al. | 2411.04573 | null |
| 2024-11-06 | Long-Form Text-to-Music Generation with Adaptive Prompts: A Case of Study in Tabletop Role-Playing Games Soundtracks | Felipe Marra et.al. | 2411.03948 | null |
| 2024-11-04 | Unified Speech Recognition: A Single Model for Auditory, Visual, and Audiovisual Inputs | Alexandros Haliassos et.al. | 2411.02256 | link |
| 2024-11-04 | Complete reconstruction of the tongue contour through acoustic to articulatory inversion using real-time MRI data | Sofiane Azzouz et.al. | 2411.02037 | null |
| 2024-11-04 | CTEFM-VC: Zero-Shot Voice Conversion Based on Content-Aware Timbre Ensemble Modeling and Flow Matching | Yu Pan et.al. | 2411.02026 | null |
| 2024-11-04 | MoMu-Diffusion: On Learning Long-Term Motion-Music Synchronization and Correspondence | Fuming You et.al. | 2411.01805 | null |
| 2024-11-03 | SPES: Spectrogram Perturbation for Explainable Speech-to-Text Generation | Dennis Fucci et.al. | 2411.01710 | null |
| 2024-11-02 | Leveraging LLM and Text-Queried Separation for Noise-Robust Sound Event Detection | Han Yin et.al. | 2411.01174 | link |
| 2024-11-02 | Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis | Shijia Liao et.al. | 2411.01156 | link |
| 2024-11-01 | Enhancing AAC Software for Dysarthric Speakers in e-Health Settings: An Evaluation Using TORGO | Macarious Hui et.al. | 2411.00980 | null |
| 2024-11-04 | Optimizing Contextual Speech Recognition Using Vector Quantization for Efficient Retrieval | Nikolaos Flemotomos et.al. | 2411.00664 | null |
| 2024-10-31 | IO Transformer: Evaluating SwinV2-Based Reward Models for Computer Vision | Maxwell Meyer et.al. | 2411.00252 | null |
| 2024-10-31 | Speech is More Than Words: Do Speech-to-Text Translation Systems Leverage Prosody? | Ioannis Tsiamas et.al. | 2410.24019 | null |
| 2024-10-31 | Task-Aware Unified Source Separation | Kohei Saijo et.al. | 2410.23987 | null |
| 2024-10-30 | Lina-Speech: Gated Linear Attention is a Fast and Parameter-Efficient Learner for text-to-speech synthesis | Théodor Lemerle et.al. | 2410.23320 | link |
| 2024-10-30 | Augmenting Polish Automatic Speech Recognition System With Synthetic Data | Łukasz Bondaruk et.al. | 2410.22903 | null |
| 2024-10-30 | Run-Time Adaptation of Neural Beamforming for Robust Speech Dereverberation and Denoising | Yoto Fujita et.al. | 2410.22805 | null |
| 2024-10-29 | Emotion-Guided Image to Music Generation | Souraja Kundu et.al. | 2410.22299 | null |
| 2024-10-29 | Fast and High-Quality Auto-Regressive Speech Synthesis via Speculative Decoding | Bohan Li et.al. | 2410.21951 | null |
| 2024-10-29 | Joint Beamforming and Speaker-Attributed ASR for Real Distant-Microphone Meeting Transcription | Can Cui et.al. | 2410.21849 | null |
| 2024-10-28 | Asynchronous Tool Usage for Real-Time Agents | Antonio A. Ginart et.al. | 2410.21620 | null |
| 2024-10-28 | Enhancing TTS Stability in Hebrew using Discrete Semantic Units | Ella Zeldes et.al. | 2410.21502 | null |
| 2024-10-28 | Mitigating Unauthorized Speech Synthesis for Voice Protection | Zhisheng Zhang et.al. | 2410.20742 | link |
| 2024-10-27 | Using Confidence Scores to Improve Eyes-free Detection of Speech Recognition Errors | Sadia Nowrin et.al. | 2410.20564 | null |
| 2024-10-27 | Symbotunes: unified hub for symbolic music generative models | Paweł Skierś et.al. | 2410.20515 | link |
| 2024-10-27 | MusicFlow: Cascaded Flow Matching for Text Guided Music Generation | K R Prajwal et.al. | 2410.20478 | null |
| 2024-10-27 | Get Large Language Models Ready to Speak: A Late-fusion Approach for Speech Generation | Maohao Shen et.al. | 2410.20336 | null |
| 2024-10-27 | Improving Speech-based Emotion Recognition with Contextual Utterance Analysis and LLMs | Enshi Zhang et.al. | 2410.20334 | null |
| 2024-10-26 | emg2qwerty: A Large Dataset with Baselines for Touch Typing using Surface Electromyography | Viswanath Sivakumar et.al. | 2410.20081 | link |
| 2024-10-24 | Making Social Platforms Accessible: Emotion-Aware Speech Generation with Integrated Text Analysis | Suparna De et.al. | 2410.19199 | null |
| 2024-10-25 | A Survey on Speech Large Language Models | Jing Peng et.al. | 2410.18908 | null |
| 2024-10-24 | We Augmented Whisper With kNN and You Won’t Believe What Came Next | Maya K. Nachesa et.al. | 2410.18850 | null |
| 2024-10-24 | STTATTS: Unified Speech-To-Text And Text-To-Speech Model | Hawau Olamide Toyin et.al. | 2410.18607 | null |
| 2024-10-24 | Evaluating and Improving Automatic Speech Recognition Systems for Korean Meteorological Experts | ChaeHun Park et.al. | 2410.18444 | null |
| 2024-10-24 | Contextual Biasing to Improve Domain-specific Custom Vocabulary Audio Transcription without Explicit Fine-Tuning of Whisper Model | Vishakha Lall et.al. | 2410.18363 | null |
| 2024-10-23 | Music102: An $D_{12}$-equivariant transformer for chord progression accompaniment | Weiliang Luo et.al. | 2410.18151 | link |
| 2024-10-23 | ELAICHI: Enhancing Low-resource TTS by Addressing Infrequent and Low-frequency Character Bigrams | Srija Anand et.al. | 2410.17901 | null |
| 2024-10-23 | OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation | Qinglin Zhang et.al. | 2410.17799 | link |
| 2024-10-23 | Exploring Tokenization Methods for Multitrack Sheet Music Generation | Yashan Wang et.al. | 2410.17584 | null |
| 2024-10-23 | VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning | Yifan Peng et.al. | 2410.17485 | null |
| 2024-10-22 | mmWave-Whisper: Phone Call Eavesdropping and Transcription Using Millimeter-Wave Radar | Suryoday Basak et.al. | 2410.17457 | null |
| 2024-10-22 | Improving Automatic Speech Recognition with Decoder-Centric Regularisation in Encoder-Decoder Models | Alexander Polok et.al. | 2410.17437 | null |
| 2024-10-22 | VoiceBench: Benchmarking LLM-Based Voice Assistants | Yiming Chen et.al. | 2410.17196 | link |
| 2024-10-22 | Prototype and Instance Contrastive Learning for Unsupervised Domain Adaptation in Speaker Verification | Wen Huang et.al. | 2410.17033 | null |
| 2024-10-22 | Enhancing Low-Resource ASR through Versatile TTS: Bridging the Data Gap | Guanrou Yang et.al. | 2410.16726 | null |
| 2024-10-22 | DENOASR: Debiasing ASRs through Selective Denoising | Anand Kumar Rai et.al. | 2410.16712 | null |
| 2024-10-21 | AlignVSR: Audio-Visual Cross-Modal Alignment for Visual Speech Recognition | Zehua Liu et.al. | 2410.16438 | link |
| 2024-10-21 | Neural Scoring, Not Embedding: A Novel Framework for Robust Speaker Verification | Wan Lin et.al. | 2410.16428 | null |
| 2024-10-21 | Continuous Speech Synthesis using per-token Latent Diffusion | Arnon Turetzky et.al. | 2410.16048 | null |
| 2024-10-21 | LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec | Yiwei Guo et.al. | 2410.15764 | null |
| 2024-10-21 | Acoustic Model Optimization over Multiple Data Sources: Merging and Valuation | Victor Junqiu Wei et.al. | 2410.15620 | null |
| 2024-10-21 | Interventional Speech Noise Injection for ASR Generalizable Spoken Language Understanding | Yeonjoon Jung et.al. | 2410.15609 | null |
| 2024-10-21 | Moonshine: Speech Recognition for Live Transcription and Voice Commands | Nat Jeffries et.al. | 2410.15608 | null |
| 2024-10-20 | Anonymising Elderly and Pathological Speech: Voice Conversion Using DDSP and Query-by-Example | Suhita Ghosh et.al. | 2410.15500 | link |
| 2024-10-20 | Improving Voice Quality in Speech Anonymization With Just Perception-Informed Losses | Suhita Ghosh et.al. | 2410.15499 | null |
| 2024-10-20 | Ichigo: Mixed-Modal Early-Fusion Realtime Voice Assistant | Alan Dao et.al. | 2410.15316 | link |
| 2024-10-19 | Enhancing Multimodal Sentiment Analysis for Missing Modality through Self-Distillation and Unified Modality Cross-Attention | Yuzhe Weng et.al. | 2410.15029 | link |
| 2024-10-18 | AC-Mix: Self-Supervised Adaptation for Low-Resource Automatic Speech Recognition using Agnostic Contrastive Mixup | Carlos Carvalho et.al. | 2410.14910 | null |
| 2024-10-18 | A Unified Framework for Collecting Text-to-Speech Synthesis Datasets for 22 Indian Languages | Sujitha Sathiyamoorthy et.al. | 2410.14197 | null |
| 2024-10-17 | Accelerating Codec-based Speech Synthesis with Multi-Token Prediction and Speculative Decoding | Tan Dat Nguyen et.al. | 2410.13839 | null |
| 2024-10-17 | Parameter-efficient Adaptation of Multilingual Multimodal Models for Low-resource ASR | Abhishek Gupta et.al. | 2410.13445 | null |
| 2024-10-17 | MeloTrans: A Text to Symbolic Music Generation Model Following Human Composition Habit | Yutian Wang et.al. | 2410.13419 | null |
| 2024-10-17 | DART: Disentanglement of Accent and Speaker Representation in Multispeaker Text-to-Speech | Jan Melechovsky et.al. | 2410.13342 | null |
| 2024-10-17 | Computational Approaches to Arabic-English Code-Switching | Caroline Sabty et.al. | 2410.13318 | null |
| 2024-10-17 | DurIAN-E 2: Duration Informed Attention Network with Adaptive Variational Autoencoder and Adversarial Learning for Expressive Text-to-Speech Synthesis | Yu Gu et.al. | 2410.13288 | null |
| 2024-10-17 | Roadmap towards Superhuman Speech Understanding using Large Language Models | Fan Bu et.al. | 2410.13268 | null |
| 2024-10-17 | Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation | Sreyan Ghosh et.al. | 2410.13198 | null |
| 2024-10-17 | EH-MAM: Easy-to-Hard Masked Acoustic Modeling for Self-Supervised Speech Representation Learning | Ashish Seth et.al. | 2410.13179 | link |
| 2024-10-17 | Deep Learning-based Software Engineering: Progress, Challenges, and Opportunities | Xiangping Chen et.al. | 2410.13110 | null |
| 2024-10-16 | Beyond Oversmoothing: Evaluating DDPM and MSE for Scalable Speech Synthesis in ASR | Christoph Minixhofer et.al. | 2410.12279 | null |
| 2024-10-16 | Guided Speaker Embedding | Shota Horiguchi et.al. | 2410.12182 | null |
| 2024-10-15 | A Framework for Adapting Human-Robot Interaction to Diverse User Groups | Theresa Pekarek Rosin et.al. | 2410.11377 | null |
| 2024-10-15 | Investigation of Speaker Representation for Target-Speaker Speech Processing | Takanori Ashihara et.al. | 2410.11243 | null |
| 2024-10-14 | DMDSpeech: Distilled Diffusion Model Surpassing The Teacher in Zero-shot Speech Synthesis via Direct Metric Optimization | Yinghao Aaron Li et.al. | 2410.11097 | null |
| 2024-10-14 | Character-aware audio-visual subtitling in context | Jaesung Huh et.al. | 2410.11068 | null |
| 2024-10-14 | Do we need more complex representations for structure? A comparison of note duration representation for Music Transformers | Gabriel Souza et.al. | 2410.10515 | null |
| 2024-10-14 | Everyday Speech in the Indian Subcontinent | Utkarsh Pathak et.al. | 2410.10508 | null |
| 2024-10-14 | In-Materia Speech Recognition | Mohamadreza Zolfagharinejad et.al. | 2410.10434 | null |
| 2024-10-13 | State of NLP in Kenya: A Survey | Cynthia Jayne Amol et.al. | 2410.09948 | null |
| 2024-10-13 | M2M-Gen: A Multimodal Framework for Automated Background Music Generation in Japanese Manga Using Large Language Models | Megha Sharma et.al. | 2410.09928 | null |
| 2024-10-12 | SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and CLAP-Refine through LLMs | Wenxi Chen et.al. | 2410.09503 | null |
| 2024-10-12 | Automatic Speech Recognition with BERT and CTC Transformers: A Review | Noussaiba Djeffal et.al. | 2410.09456 | null |
| 2024-10-11 | UniGlyph: A Seven-Segment Script for Universal Language Representation | G. V. Bency Sherin et.al. | 2410.08974 | null |
| 2024-10-14 | Enhancing Indonesian Automatic Speech Recognition: Evaluating Multilingual Models with Diverse Speech Variabilities | Aulia Adila et.al. | 2410.08828 | null |
| 2024-10-11 | Small Tunes Transformer: Exploring Macro & Micro-Level Hierarchies for Skeleton-Conditioned Melody Generation | Yishan Lv et.al. | 2410.08626 | null |
| 2024-10-11 | Symbolic Music Generation with Fine-grained Interactive Textural Guidance | Tingyu Zhu et.al. | 2410.08435 | null |
| 2024-10-10 | SoundScape: A Human-AI Co-Creation System Making Your Memories Heard | Chongjun Zhong et.al. | 2410.08136 | null |
| 2024-10-10 | Full-Rank No More: Low-Rank Weight Training for Modern Speech Recognition Models | Adriana Fernandez-Lopez et.al. | 2410.07771 | null |
| 2024-10-09 | The First VoicePrivacy Attacker Challenge Evaluation Plan | Natalia Tomashenko et.al. | 2410.07428 | link |
| 2024-10-09 | Advocating Character Error Rate for Multilingual ASR Evaluation | Thennal D K et.al. | 2410.07400 | null |
| 2024-10-09 | Efficient training strategies for natural sounding speech synthesis and speaker adaptation based on FastPitch | Teodora Răgman et.al. | 2410.06787 | null |
| 2024-10-09 | Bahasa Harmony: A Comprehensive Dataset for Bahasa Text-to-Speech Synthesis with Discrete Codec Modeling of EnGen-TTS | Onkar Kishor Susladkar et.al. | 2410.06608 | null |
| 2024-10-08 | Diversity-Rewarded CFG Distillation | Geoffrey Cideron et.al. | 2410.06084 | null |
| 2024-10-08 | The USTC-NERCSLIP Systems for the CHiME-8 MMCSG Challenge | Ya Jiang et.al. | 2410.05986 | null |
| 2024-10-08 | Improving Data Augmentation-based Cross-Speaker Style Transfer for TTS with Singing Voice, Style Filtering, and F0 Matching | Leonardo B. de M. M. Marques et.al. | 2410.05620 | link |
| 2024-10-07 | Incorporating Talker Identity Aids With Improving Speech Recognition in Adversarial Environments | Sagarika Alavilli et.al. | 2410.05423 | null |
| 2024-10-07 | Presto! Distilling Steps and Layers for Accelerating Music Generation | Zachary Novack et.al. | 2410.05167 | null |
| 2024-10-07 | Editing Music with Melody and Text: Using ControlNet for Diffusion Transformer | Siyuan Hou et.al. | 2410.05151 | null |
| 2024-10-07 | Enhancing Job Interview Preparation Through Immersive Experiences Using Photorealistic, AI-powered Metahuman Avatars | Navid Ashrafi et.al. | 2410.05131 | null |
| 2024-10-07 | CR-CTC: Consistency regularization on CTC for improved speech recognition | Zengwei Yao et.al. | 2410.05101 | null |
| 2024-10-07 | Improving Speaker Representations Using Contrastive Losses on Multi-scale Features | Satvik Dixit et.al. | 2410.05037 | null |
| 2024-10-06 | Punctuation Prediction for Polish Texts using Transformers | Jakub Pokrywka et.al. | 2410.04621 | null |
| 2024-10-06 | Casablanca: Data and Models for Multidialectal Arabic Speech Recognition | Bashar Talafha et.al. | 2410.04527 | null |
| 2024-10-06 | HALL-E: Hierarchical Neural Codec Language Model for Minute-Long Zero-Shot Text-to-Speech Synthesis | Yuto Nishimura et.al. | 2410.04380 | null |
| 2024-10-06 | SONAR: A Synthetic AI-Audio Detection Framework and Benchmark | Xiang Li et.al. | 2410.04324 | link |
| 2024-10-05 | Efficient and Robust Long-Form Speech Recognition with Hybrid H3-Conformer | Tomoki Honda et.al. | 2410.04159 | link |
| 2024-10-04 | Generative Semantic Communication for Text-to-Speech Synthesis | Jiahao Zheng et.al. | 2410.03459 | null |
| 2024-10-04 | Multi-Dialect Vietnamese: Task, Dataset, Baseline Models and Challenges | Nguyen Van Dinh et.al. | 2410.03458 | null |
| 2024-10-04 | Team MTS @ AutoMin 2021: An Overview of Existing Summarization Approaches and Comparison to Unsupervised Summarization Techniques | Olga Iakovenko et.al. | 2410.03412 | null |
| 2024-10-04 | MultiVerse: Efficient and Expressive Zero-Shot Multi-Task Text-to-Speech | Taejun Bak et.al. | 2410.03192 | null |
| 2024-10-03 | Disentangling Textual and Acoustic Features of Neural Speech Representations | Hosein Mohebbi et.al. | 2410.03037 | null |
| 2024-10-03 | Three-in-One: Fast and Accurate Transducer for Hybrid-Autoregressive ASR | Hainan Xu et.al. | 2410.02597 | null |
| 2024-10-04 | Convolutional Variational Autoencoders for Spectrogram Compression in Automatic Speech Recognition | Olga Iakovenko et.al. | 2410.02560 | null |
| 2024-10-03 | Algorithms For Automatic Accentuation And Transcription Of Russian Texts In Speech Recognition Systems | Olga Iakovenko et.al. | 2410.02538 | null |
| 2024-10-03 | State-of-the-art Embeddings with Video-free Segmentation of the Source VoxCeleb Data | Sara Barahona et.al. | 2410.02364 | null |
| 2024-10-03 | A Pilot Study of Applying Sequence-to-Sequence Voice Conversion to Evaluate the Intelligibility of L2 Speech Using a Native Speaker’s Shadowings | Haopeng Geng et.al. | 2410.02239 | null |
| 2024-10-02 | Generating Symbolic Music from Natural Language Prompts using an LLM-Enhanced Dataset | Weihan Xu et.al. | 2410.02084 | null |
| 2024-10-02 | Spoken Grammar Assessment Using LLM | Sunil Kumar Kopparapu et.al. | 2410.01579 | null |
| 2024-10-02 | Takin-VC: Zero-shot Voice Conversion via Jointly Hybrid Content and Memory-Augmented Context-Aware Timbre Modeling | Yuguang Yang et.al. | 2410.01350 | null |
| 2024-10-01 | MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages | Marco Gaido et.al. | 2410.01036 | link |
| 2024-10-01 | Automatic Speech Recognition for the Ika Language | Uchenna Nzenwata et.al. | 2410.00940 | null |
| 2024-10-01 | Do Music Generation Models Encode Music Theory? | Megan Wei et.al. | 2410.00872 | null |
| 2024-10-01 | VHASR: A Multimodal Speech Recognition System With Vision Hotwords | Jiliang Hu et.al. | 2410.00822 | link |
| 2024-10-01 | Improving curriculum learning for target speaker extraction with synthetic speakers | Yun Liu et.al. | 2410.00811 | null |
| 2024-10-01 | End-to-End Speech Recognition with Pre-trained Masked Language Model | Yosuke Higuchi et.al. | 2410.00528 | null |
| 2024-10-02 | Integrating Text-to-Music Models with Language Models: Composing Long Structured Music Pieces | Lilac Atassi et.al. | 2410.00344 | null |
| 2024-10-01 | EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control | Haozhe Chen et.al. | 2410.00316 | null |
| 2024-09-30 | Boosting Hybrid Autoregressive Transducer-based ASR with Internal Acoustic Model Training and Dual Blank Thresholding | Takafumi Moriya et.al. | 2409.20313 | null |
| 2024-09-30 | Alignment-Free Training for Transducer-based Multi-Talker ASR | Takafumi Moriya et.al. | 2409.20301 | null |
| 2024-09-30 | AfriHuBERT: A self-supervised speech representation model for African languages | Jesujoba O. Alabi et.al. | 2409.20201 | null |
| 2024-09-30 | Melody Is All You Need For Music Generation | Shaopeng Wei et.al. | 2409.20196 | link |
| 2024-09-30 | Predictive Speech Recognition and End-of-Utterance Detection Towards Spoken Dialog Systems | Oswald Zink et.al. | 2409.19990 | null |
| 2024-09-30 | HDMoLE: Mixture of LoRA Experts with Hierarchical Routing and Dynamic Thresholds for Fine-Tuning LLM-based ASR Models | Bingshen Mu et.al. | 2409.19878 | null |
| 2024-09-29 | Fine-Tuning Automatic Speech Recognition for People with Parkinson’s: An Effective Strategy for Enhancing Speech Technology Accessibility | Xiuwen Zheng et.al. | 2409.19818 | null |
| 2024-09-29 | Efficient Long-Form Speech Recognition for General Speech In-Context Learning | Hao Yen et.al. | 2409.19757 | null |
| 2024-09-29 | Quantitative Analysis of Audio-Visual Tasks: An Information-Theoretic Perspective | Chen Chen et.al. | 2409.19575 | null |
| 2024-09-29 | CoT-ST: Enhancing LLM-based Speech Translation with Multimodal Chain-of-Thought | Yexing Du et.al. | 2409.19510 | link |
| 2024-09-27 | Speech-Mamba: Long-Context Speech Recognition with Selective State Spaces Models | Xiaoxue Gao et.al. | 2409.18654 | null |
| 2024-09-27 | ChildMandarin: A Comprehensive Mandarin Speech Dataset for Young Children Aged 3-5 | Jiaming Zhou et.al. | 2409.18584 | null |
| 2024-09-27 | EmoPro: A Prompt Selection Strategy for Emotional Expression in LM-based Speech Synthesis | Haoyu Wang et.al. | 2409.18512 | null |
| 2024-09-27 | Improving Multilingual ASR in the Wild Using Simple N-best Re-ranking | Brian Yan et.al. | 2409.18428 | null |
| 2024-09-26 | Unveiling the Role of Pretraining in Direct Speech Translation | Belen Alastruey et.al. | 2409.18044 | null |
| 2024-09-26 | Are Transformers in Pre-trained LM A Good ASR Encoder? An Empirical Study | Keyu An et.al. | 2409.17750 | null |
| 2024-09-26 | Paraformer-v2: An improved non-autoregressive transformer for noise-robust speech recognition | Keyu An et.al. | 2409.17746 | null |
| 2024-09-26 | Deep CLAS: Deep Contextual Listen, Attend and Spell | Shifu Xiong et.al. | 2409.17603 | null |
| 2024-09-25 | Enhancing Polyglot Voices by Leveraging Cross-Lingual Fine-Tuning in Any-to-One Voice Conversion | Giuseppe Ruggiero et.al. | 2409.17387 | null |
| 2024-09-25 | Exploring synthetic data for cross-speaker style transfer in style representation based TTS | Lucas H. Ueda et.al. | 2409.17364 | null |
| 2024-09-25 | How to Connect Speech Foundation Models and Large Language Models? What Matters and What Does Not | Francesco Verdini et.al. | 2409.17044 | null |
| 2024-09-25 | MT2KD: Towards A General-Purpose Encoder for Speech, Speaker, and Audio Events | Xiaoyu Yang et.al. | 2409.17010 | null |
| 2024-09-25 | Weighted Cross-entropy for Low-Resource Languages in Multilingual Speech Recognition | Andrés Piñeiro-Martín et.al. | 2409.16954 | null |
| 2024-09-25 | Semi-Supervised Cognitive State Classification from Speech with Multi-View Pseudo-Labeling | Yuanchao Li et.al. | 2409.16937 | link |
| 2024-09-25 | Speech Recognition Rescoring with Large Speech-Text Foundation Models | Prashanth Gurunath Shivakumar et.al. | 2409.16654 | null |
| 2024-09-24 | Spelling Correction through Rewriting of Non-Autoregressive ASR Lattices | Leonid Velikovich et.al. | 2409.16469 | null |
| 2024-09-24 | FastTalker: Jointly Generating Speech and Conversational Gestures from Text | Zixin Guo et.al. | 2409.16404 | null |
| 2024-09-24 | Revisiting Acoustic Features for Robust ASR | Muhammad A. Shah et.al. | 2409.16399 | null |
| 2024-09-24 | Facial Expression-Enhanced TTS: Combining Face Representation and Emotion Intensity for Adaptive Speech | Yunji Chu et.al. | 2409.16203 | null |
| 2024-09-24 | ComiCap: A VLMs pipeline for dense captioning of Comic Panels | Emanuele Vivoli et.al. | 2409.16159 | link |
| 2024-09-24 | Bridging Speech and Text: Enhancing ASR with Pinyin-to-Character Pre-training in LLMs | Yang Yuhang et.al. | 2409.16005 | null |
| 2024-09-24 | Disentangling Age and Identity with a Mutual Information Minimization Approach for Cross-Age Speaker Verification | Fengrun Zhang et.al. | 2409.15974 | null |
| 2024-09-24 | Boosting Code-Switching ASR with Mixture of Experts Enhanced Speech-Conditioned LLM | Fengrun Zhang et.al. | 2409.15905 | null |
| 2024-09-24 | Exploring VQ-VAE with Prosody Parameters for Speaker Anonymization | Sotheara Leang et.al. | 2409.15882 | null |
| 2024-09-24 | WeSep: A Scalable and Flexible Toolkit Towards Generalizable Target Speaker Extraction | Shuai Wang et.al. | 2409.15799 | null |
| 2024-09-24 | M-Vec: Matryoshka Speaker Embeddings with Flexible Dimensions | Shuai Wang et.al. | 2409.15782 | null |
| 2024-09-24 | Enhancing Open-Set Speaker Identification through Rapid Tuning with Speaker Reciprocal Points and Negative Sample | Zhiyong Chen et.al. | 2409.15742 | null |
| 2024-09-24 | StyleFusion TTS: Multimodal Style-control and Enhanced Feature Fusion for Zero-shot Text-to-speech Synthesis | Zhiyong Chen et.al. | 2409.15741 | null |
| 2024-09-19 | WeHelp: A Shared Autonomy System for Wheelchair Users | Abulikemu Abuduweili et.al. | 2409.12159 | link |
| 2024-09-18 | ASR Benchmarking: Need for a More Representative Conversational Dataset | Gaurav Maheshwari et.al. | 2409.12042 | link |
| 2024-09-18 | Mixture of Experts Fusion for Fake Audio Detection Using Frozen wav2vec 2.0 | Zhiyong Wang et.al. | 2409.11909 | null |
| 2024-09-18 | M2R-Whisper: Multi-stage and Multi-scale Retrieval Augmentation for Enhancing Whisper | Jiaming Zhou et.al. | 2409.11889 | null |
| 2024-09-18 | METEOR: Melody-aware Texture-controllable Symbolic Orchestral Music Generation | Dinh-Viet-Toan Le et.al. | 2409.11753 | link |
| 2024-09-19 | Simulating Native Speaker Shadowing for Nonnative Speech Assessment with Latent Speech Representations | Haopeng Geng et.al. | 2409.11742 | null |
| 2024-09-17 | Discrete Unit based Masking for Improving Disentanglement in Voice Conversion | Philip H. Lee et.al. | 2409.11560 | null |
| 2024-09-17 | Chain-of-Thought Prompting for Speech Translation | Ke Hu et.al. | 2409.11538 | null |
| 2024-09-17 | M-BEST-RQ: A Multi-Channel Speech Foundation Model for Smart Glasses | Yufeng Yang et.al. | 2409.11494 | null |
| 2024-09-17 | Bio-Inspired Mamba: Temporal Locality and Bioplausible Learning in Selective State Space Models | Jiahao Qin et.al. | 2409.11263 | null |
| 2024-09-17 | WER We Stand: Benchmarking Urdu ASR Models | Samee Arif et.al. | 2409.11252 | null |
| 2024-09-17 | Ideal-LLM: Integrating Dual Encoders and Language-Adapted LLM for Multilingual Speech-to-Text | Hongfei Xue et.al. | 2409.11214 | null |
| 2024-09-17 | Zero Shot Text to Speech Augmentation for Automatic Speech Recognition on Low-Resource Accented Speech Corpora | Francesco Nespoli et.al. | 2409.11107 | null |
| 2024-09-17 | Single-stage TTS with Masked Audio Token Modeling and Semantic Knowledge Distillation | Gerard I. Gállego et.al. | 2409.11003 | null |
| 2024-09-17 | Enhancing Low-Resource Language and Instruction Following Capabilities of Audio Language Models | Potsawee Manakul et.al. | 2409.10999 | null |
| 2024-09-17 | Enhancing Multilingual Speech Generation and Recognition Abilities in LLMs with Constructed Code-switched Data | Jing Xu et.al. | 2409.10969 | null |
| 2024-09-17 | Speech Recognition for Analysis of Police Radio Communication | Tejes Srivastava et.al. | 2409.10858 | null |
| 2024-09-17 | PDMX: A Large-Scale Public Domain MusicXML Dataset for Symbolic Music Processing | Phillip Long et.al. | 2409.10831 | null |
| 2024-09-16 | Speaker-IPL: Unsupervised Learning of Speaker Characteristics with i-Vector based Pseudo-Labels | Zakaria Aldeneh et.al. | 2409.10791 | null |
| 2024-09-16 | An Efficient Self-Learning Framework For Interactive Spoken Dialog Systems | Hitesh Tulsiani et.al. | 2409.10515 | null |
| 2024-09-16 | Meta-Whisper: Speech-Based Meta-ICL for ASR on Low-Resource Languages | Ming-Hao Hsu et.al. | 2409.10429 | null |
| 2024-09-16 | Voice control interface for surgical robot assistants | Ana Davila et.al. | 2409.10225 | null |
| 2024-09-16 | Augmenting Automatic Speech Recognition Models with Disfluency Detection | Robin Amann et.al. | 2409.10177 | null |
| 2024-09-16 | Emo-DPO: Controllable Emotional Speech Synthesis through Direct Preference Optimization | Xiaoxue Gao et.al. | 2409.10157 | null |
| 2024-09-16 | Optimizing Dysarthria Wake-Up Word Spotting: An End-to-End Approach for SLT 2024 LRDWWS Challenge | Shuiyun Liu et.al. | 2409.10076 | null |
| 2024-09-16 | Speaker Contrastive Learning for Source Speaker Tracing | Qing Wang et.al. | 2409.10072 | null |
| 2024-09-16 | StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion | Yinghao Aaron Li et.al. | 2409.10058 | null |
| 2024-09-16 | A Study on Zero-shot Non-intrusive Speech Assessment using Large Language Models | Ryandhimas E. Zezario et.al. | 2409.09914 | null |
| 2024-09-15 | Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition | Chao-Han Huck Yang et.al. | 2409.09785 | null |
| 2024-09-13 | Clean Label Attacks against SLU Systems | Henry Li Xinyuan et.al. | 2409.08985 | null |
| 2024-09-13 | HLTCOE JHU Submission to the Voice Privacy Challenge 2024 | Henry Li Xinyuan et.al. | 2409.08913 | null |
| 2024-09-13 | Exploring the Impact of Data Quantity on ASR in Extremely Low-resource Languages | Yao-Fei Cheng et.al. | 2409.08872 | null |
| 2024-09-13 | Exploring SSL Discrete Tokens for Multilingual ASR | Mingyu Cui et.al. | 2409.08805 | null |
| 2024-09-13 | Text-To-Speech Synthesis In The Wild | Jee-weon Jung et.al. | 2409.08711 | null |
| 2024-09-13 | NEST-RQ: Next Token Prediction for Speech Self-Supervised Pre-Training | Minglun Han et.al. | 2409.08680 | null |
| 2024-09-13 | LA-RAG: Enhancing LLM-based ASR Accuracy with Retrieval-Augmented Generation | Shaojun Li et.al. | 2409.08597 | null |
| 2024-09-13 | Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions | Lingwei Meng et.al. | 2409.08596 | null |
| 2024-09-13 | LHQ-SVC: Lightweight and High Quality Singing Voice Conversion Modeling | Yubo Huang et.al. | 2409.08583 | null |
| 2024-09-13 | LLM-Powered Grapheme-to-Phoneme Conversion: Benchmark and Case Study | Mahta Fetrat Qharabagh et.al. | 2409.08554 | null |
| 2024-09-12 | Hierarchical Symbolic Pop Music Generation with Graph Neural Networks | Wen Qing Lim et.al. | 2409.08155 | null |
| 2024-09-12 | Faster Speech-LLaMA Inference with Multi-token Prediction | Desh Raj et.al. | 2409.08148 | null |
| 2024-09-12 | WhisperNER: Unified Open Named Entity and Speech Recognition | Gil Ayache et.al. | 2409.08107 | null |
| 2024-09-12 | The Faetar Benchmark: Speech Recognition in a Very Under-Resourced Language | Michael Ong et.al. | 2409.08103 | null |
| 2024-09-12 | Zero-Shot Sing Voice Conversion: built upon clustering-based phoneme representations | Wangjin Zhou et.al. | 2409.08039 | null |
| 2024-09-12 | Auto-Landmark: Acoustic Landmark Dataset and Open-Source Toolkit for Landmark Extraction | Xiangyu Zhang et.al. | 2409.07969 | null |
| 2024-09-12 | Detecting and Defending Against Adversarial Attacks on Automatic Speech Recognition via Diffusion Models | Nikolai L. Kühne et.al. | 2409.07936 | null |
| 2024-09-12 | Tidal MerzA: Combining affective modelling and autonomous code generation through Reinforcement Learning | Elizabeth Wilson et.al. | 2409.07918 | null |
| 2024-09-12 | Bridging Paintings and Music – Exploring Emotion based Music Generation through Paintings | Tanisha Hisariya et.al. | 2409.07827 | null |
| 2024-09-12 | Full-text Error Correction for Chinese Speech Recognition with Large Language Model | Zhiyuan Tang et.al. | 2409.07790 | null |
| 2024-09-11 | VMAS: Video-to-Music Generation via Semantic Alignment in Web Music Videos | Yan-Bo Lin et.al. | 2409.07450 | null |
| 2024-09-11 | D-CAPTCHA++: A Study of Resilience of Deepfake CAPTCHA under Transferable Imperceptible Adversarial Attack | Hong-Hanh Nguyen-Le et.al. | 2409.07390 | null |
| 2024-09-11 | Rethinking Mamba in Speech Processing by Self-Supervised Models | Xiangyu Zhang et.al. | 2409.07273 | null |
| 2024-09-11 | ManaTTS Persian: a recipe for creating TTS datasets for lower resource languages | Mahta Fetrat Qharabagh et.al. | 2409.07259 | null |
| 2024-09-11 | Enhancing CTC-Based Visual Speech Recognition | Hendrik Laux et.al. | 2409.07210 | null |
| 2024-09-11 | Linear Time Complexity Conformers with SummaryMixing for Streaming Speech Recognition | Titouan Parcollet et.al. | 2409.07165 | null |
| 2024-09-11 | The VoiceMOS Challenge 2024: Beyond Speech Quality Prediction | Wen-Chin Huang et.al. | 2409.07001 | null |
| 2024-09-10 | An Effective Context-Balanced Adaptation Approach for Long-Tailed Speech Recognition | Yi-Cheng Wang et.al. | 2409.06468 | null |
| 2024-09-10 | What happens to diffusion model likelihood when your model is conditional? | Mattias Cross et.al. | 2409.06364 | null |
| 2024-09-10 | VoiceWukong: Benchmarking Deepfake Voice Detection | Ziwei Yan et.al. | 2409.06348 | null |
| 2024-09-10 | Spoofing-Aware Speaker Verification Robust Against Domain and Channel Mismatches | Chang Zeng et.al. | 2409.06327 | null |
| 2024-09-10 | Keyword-Aware ASR Error Augmentation for Robust Dialogue State Tracking | Jihyun Lee et.al. | 2409.06263 | null |
| 2024-09-10 | RobustSVC: HuBERT-based Melody Extractor and Adversarial Learning for Robust Singing Voice Conversion | Wei Chen et.al. | 2409.06237 | null |
| 2024-09-10 | Advancing Topic Segmentation of Broadcasted Speech with Multilingual Semantic Embeddings | Sakshi Deo Shukla et.al. | 2409.06222 | null |
| 2024-09-10 | Multi-Source Music Generation with Latent Diffusion | Zhongweiyang Xu et.al. | 2409.06190 | link |
| 2024-09-10 | VC-ENHANCE: Speech Restoration with Integrated Noise Suppression and Voice Conversion | Kyungguen Byun et.al. | 2409.06126 | null |
| 2024-09-09 | Retrieval Augmented Correction of Named Entity Speech Recognition Errors | Ernest Pusateri et.al. | 2409.06062 | null |
| 2024-09-09 | PDAF: A Phonetic Debiasing Attention Framework For Speaker Verification | Massa Baali et.al. | 2409.05799 | null |
| 2024-09-09 | Consensus-based Distributed Quantum Kernel Learning for Speech Recognition | Kuan-Cheng Chen et.al. | 2409.05770 | null |
| 2024-09-09 | A Toolkit for Joint Speaker Diarization and Identification with Application to Speaker-Attributed ASR | Giovanni Morrone et.al. | 2409.05750 | null |
| 2024-09-09 | AS-Speech: Adaptive Style For Speech Synthesis | Zhipeng Li et.al. | 2409.05730 | null |
| 2024-09-09 | Evaluation of real-time transcriptions using end-to-end ASR models | Carlos Arriaga et.al. | 2409.05674 | null |
| 2024-09-09 | Longer is (Not Necessarily) Stronger: Punctuated Long-Sequence Training for Enhanced Speech Recognition and Translation | Nithin Rao Koluguri et.al. | 2409.05601 | null |
| 2024-09-09 | An investigation of modularity for noise robustness in conformer-based ASR | Louise Coppieters de Gibson et.al. | 2409.05589 | null |
| 2024-09-09 | NTT Multi-Speaker ASR System for the DASR Task of CHiME-8 Challenge | Naoyuki Kamo et.al. | 2409.05554 | null |
| 2024-09-09 | Findings of the 2024 Mandarin Stuttering Event Detection and Automatic Speech Recognition Challenge | Hongfei Xue et.al. | 2409.05430 | null |
| 2024-09-08 | Exploring WavLM Back-ends for Speech Spoofing and Deepfake Detection | Theophile Stourbe et.al. | 2409.05032 | null |
| 2024-09-05 | Privacy versus Emotion Preservation Trade-offs in Emotion-Preserving Speaker Anonymization | Zexin Cai et.al. | 2409.03655 | null |
| 2024-09-05 | DiffEVC: Any-to-Any Emotion Voice Conversion with Expressive Guidance | Hsing-Hang Chou et.al. | 2409.03636 | null |
| 2024-09-05 | Speaker and Style Disentanglement of Speech Based on Contrastive Predictive Coding Supported Factorized Variational Autoencoder | Yuying Xie et.al. | 2409.03520 | null |
| 2024-09-04 | Probing self-attention in self-supervised speech models for cross-linguistic differences | Sai Gopinath et.al. | 2409.03115 | null |
| 2024-09-04 | Quantification of stylistic differences in human- and ASR-produced transcripts of African American English | Annika Heuser et.al. | 2409.03059 | null |
| 2024-09-04 | SymPAC: Scalable Symbolic Music Generation With Prompts And Constraints | Haonan Chen et.al. | 2409.03055 | null |
| 2024-09-04 | Multi-Track MusicLDM: Towards Versatile Music Generation with Latent Diffusion Model | Tornike Karchkhadze et.al. | 2409.02845 | null |
| 2024-09-04 | Efficient Extraction of Noise-Robust Discrete Units from Self-Supervised Speech Models | Jakob Poncelet et.al. | 2409.02565 | null |
| 2024-09-04 | Parameter estimation of hidden Markov models: comparison of EM and quasi-Newton methods with a new hybrid algorithm | Sidonie Foulon et.al. | 2409.02477 | null |
| 2024-09-04 | Fast, High-Quality and Parameter-Efficient Articulatory Synthesis using Differentiable DSP | Yisi Liu et.al. | 2409.02451 | null |
| 2024-09-04 | What is lost in Normalization? Exploring Pitfalls in Multilingual ASR Model Evaluations | Kavya Manohar et.al. | 2409.02449 | null |
| 2024-09-04 | MusicMamba: A Dual-Feature Modeling Approach for Generating Chinese Traditional Music with Modal Precision | Jiatao Chen et.al. | 2409.02421 | null |
| 2024-09-03 | FastVoiceGrad: One-step Diffusion-Based Voice Conversion with Adversarial Conditional Diffusion Distillation | Takuhiro Kaneko et.al. | 2409.02245 | null |
| 2024-09-03 | Temporal Order Preserved Optimal Transport-based Cross-modal Knowledge Transfer Learning for ASR | Xugang Lu et.al. | 2409.02239 | null |
| 2024-09-03 | Enhancing Code-Switching Speech Recognition with LID-Based Collaborative Mixture of Experts Model | Hukai Huang et.al. | 2409.02050 | null |
| 2024-09-03 | The USTC-NERCSLIP Systems for the CHiME-8 NOTSOFAR-1 Challenge | Shutong Niu et.al. | 2409.02041 | null |
| 2024-08-30 | Advancing Multi-talker ASR Performance with Large Language Models | Mohan Shi et.al. | 2408.17431 | null |
| 2024-08-30 | AASIST3: KAN-Enhanced AASIST Speech Deepfake Detection using SSL Features and Additional Regularization for the ASVspoof 2024 Challenge | Kirill Borodin et.al. | 2408.17352 | null |
| 2024-08-30 | Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model | Zhen Ye et.al. | 2408.17175 | link |
| 2024-08-30 | Recursive Attentive Pooling for Extracting Speaker Embeddings from Multi-Speaker Recordings | Shota Horiguchi et.al. | 2408.17142 | null |
| 2024-08-30 | Generative Modeling Perspective for Control and Reasoning in Robotics | Takuma Yoneda et.al. | 2408.17041 | null |
| 2024-08-30 | Utilizing Speaker Profiles for Impersonation Audio Detection | Hao Gu et.al. | 2408.17009 | null |
| 2024-08-30 | Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming | Zhifei Xie et.al. | 2408.16725 | link |
| 2024-08-29 | CrisperWhisper: Accurate Timestamps on Verbatim Speech Transcriptions | Laurin Wagner et.al. | 2408.16589 | null |
| 2024-08-29 | Human-Inspired Audio-Visual Speech Recognition: Spike Activity, Cueing Interaction and Causal Processing | Qianhui Liu et.al. | 2408.16564 | null |
| 2024-08-29 | RAVE for Speech: Efficient Voice Conversion at High Sampling Rates | Anders R. Bargum et.al. | 2408.16546 | null |
| 2024-08-29 | Enabling Beam Search for Language Model-Based Text-to-Speech Synthesis | Zehai Tu et.al. | 2408.16373 | null |
| 2024-08-29 | Measuring the Accuracy of Automatic Speech Recognition Solutions | Korbinian Kuhn et.al. | 2408.16287 | link |
| 2024-08-29 | Revisit Micro-batch Clipping: Adaptive Data Pruning via Gradient Manipulation | Lun Wang et.al. | 2408.16204 | null |
| 2024-08-29 | Benchmarking Japanese Speech Recognition on ASR-LLM Setups with Multi-Pass Augmented Generative Error Correction | Yuka Ko et.al. | 2408.16180 | null |
| 2024-08-28 | Spoofing-Robust Speaker Verification Using Parallel Embedding Fusion: BTU Speech Group’s Approach for ASVspoof5 Challenge | Oğuzhan Kurnaz et.al. | 2408.15877 | null |
| 2024-08-28 | VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling | Yixuan Zhou et.al. | 2408.15676 | null |
| 2024-08-28 | Beyond Levenshtein: Leveraging Multiple Algorithms for Robust Word Error Rate Computations And Granular Error Classifications | Korbinian Kuhn et.al. | 2408.15616 | link |
| 2024-08-28 | Whisper-PMFA: Partial Multi-Scale Feature Aggregation for Speaker Verification using Whisper Models | Yiyang Zhao et.al. | 2408.15585 | null |
| 2024-08-28 | EmoAttack: Utilizing Emotional Voice Conversion for Speech Backdoor Attacks on Deep Speech Classification Models | Wenhan Yao et.al. | 2408.15508 | null |
| 2024-08-27 | Unlocking Potential in Pre-Trained Music Language Models for Versatile Multi-Track Music Arrangement | Longshen Ou et.al. | 2408.15176 | null |
| 2024-08-27 | Speech Recognition Transformers: Topological-lingualism Perspective | Shruti Singh et.al. | 2408.14991 | null |
| 2024-08-27 | Literary and Colloquial Dialect Identification for Tamil using Acoustic Features | M. Nanmalar et.al. | 2408.14887 | null |
| 2024-08-27 | The VoxCeleb Speaker Recognition Challenge: A Retrospective | Jaesung Huh et.al. | 2408.14886 | null |
| 2024-08-27 | MaskCycleGAN-based Whisper to Normal Speech Conversion | K. Rohith Gupta et.al. | 2408.14797 | null |
| 2024-08-26 | MEDSAGE: Enhancing Robustness of Medical Dialogue Summarization to ASR Errors with LLM-generated Synthetic Dialogues | Kuluhan Binici et.al. | 2408.14418 | null |
| 2024-08-26 | Self-supervised Speech Representations Still Struggle with African American Vernacular English | Kalvin Chang et.al. | 2408.14262 | link |
| 2024-08-26 | Automatic recognition and detection of aphasic natural speech | Mara Barberis et.al. | 2408.14082 | null |
| 2024-08-26 | Research Advances and New Paradigms for Biology-inspired Spiking Neural Networks | Tianyu Zheng et.al. | 2408.13996 | null |
| 2024-08-26 | Anonymization of Voices in Spaces for Civic Dialogue: Measuring Impact on Empathy, Trust, and Feeling Heard | Wonjune Kang et.al. | 2408.13970 | null |
| 2024-08-25 | Literary and Colloquial Tamil Dialect Identification | M. Nanmalar et.al. | 2408.13739 | null |
| 2024-08-24 | Studying the Effect of Audio Filters in Pre-Trained Models for Environmental Sound Classification | Aditya Dawn et.al. | 2408.13644 | null |
| 2024-08-24 | As Biased as You Measure: Methodological Pitfalls of Bias Evaluations in Speaker Verification Research | Wiebke Hutiri et.al. | 2408.13614 | null |
| 2024-08-24 | SpeechCraft: A Fine-grained Expressive Speech Dataset with Natural Language Description | Zeyu Jin et.al. | 2408.13608 | null |
| 2024-08-23 | Toward Improving Synthetic Audio Spoofing Detection Robustness via Meta-Learning and Disentangled Training With Adversarial Examples | Zhenyu Wang et.al. | 2408.13341 | null |
| 2024-08-23 | Which Prosodic Features Matter Most for Pragmatics? | Nigel G. Ward et.al. | 2408.13240 | null |
| 2024-08-23 | NEST: Self-supervised Fast Conformer as All-purpose Seasoning to Speech Processing Tasks | He Huang et.al. | 2408.13106 | null |
| 2024-08-23 | Focused Discriminative Training For Streaming CTC-Trained Automatic Speech Recognition Models | Adnan Haider et.al. | 2408.13008 | null |
| 2024-08-22 | Towards measuring fairness in speech recognition: Fair-Speech dataset | Irina-Elena Veliche et.al. | 2408.12734 | null |
| 2024-08-22 | WhisperMask: A Noise Suppressive Mask-Type Microphone for Whisper Speech | Hirotaka Hiraki et.al. | 2408.12500 | null |
| 2024-08-22 | Positional Description for Numerical Normalization | Deepanshu Gupta et.al. | 2408.12430 | null |
| 2024-08-22 | LCM-SVC: Latent Diffusion Model Based Singing Voice Conversion with Inference Acceleration via Latent Consistency Distillation | Shihao Chen et.al. | 2408.12354 | null |
| 2024-08-22 | Developing vocal system impaired patient-aimed voice quality assessment approach using ASR representation-included multiple features | Shaoxiang Dang et.al. | 2408.12279 | null |
| 2024-08-21 | The State of Commercial Automatic French Legal Speech Recognition Systems and their Impact on Court Reporters et al | Nicolas Garneau et.al. | 2408.11940 | null |
| 2024-08-21 | Approaching Deep Learning through the Spectral Dynamics of Weights | David Yunis et.al. | 2408.11804 | link |
| 2024-08-22 | A Joint Noise Disentanglement and Adversarial Training Framework for Robust Speaker Verification | Xujiang Xing et.al. | 2408.11562 | null |
| 2024-08-21 | Improvement Speaker Similarity for Zero-Shot Any-to-Any Voice Conversion of Whispered and Regular Speech | Anastasia Avdeeva et.al. | 2408.11528 | null |
| 2024-08-21 | Improving Speech Recognition Error Prediction for Modern and Off-the-shelf Speech Recognizers | Prashant Serai et.al. | 2408.11258 | null |
| 2024-08-20 | BUT Systems and Analyses for the ASVspoof 5 Challenge | Johan Rohdin et.al. | 2408.11152 | null |
| 2024-08-20 | AI-Based IVR | Gassyrbek Kosherbay et.al. | 2408.10549 | null |
| 2024-08-20 | XCB: an effective contextual biasing approach to bias cross-lingual phrases in speech recognition | Xucheng Wan et.al. | 2408.10524 | null |
| 2024-08-19 | ASASVIcomtech: The Vicomtech-UGR Speech Deepfake Detection and SASV Systems for the ASVspoof5 Challenge | Juan M. Martín-Doñas et.al. | 2408.10361 | null |
| 2024-08-19 | Hear Your Face: Face-based voice conversion with F0 estimation | Jaejun Lee et.al. | 2408.09802 | null |
| 2024-08-19 | Unsupervised Composable Representations for Audio | Giovanni Bindi et.al. | 2408.09792 | null |
| 2024-08-19 | Recording for Eyes, Not Echoing to Ears: Contextualized Spoken-to-Written Conversion of ASR Transcripts | Jiaqing Liu et.al. | 2408.09688 | null |
| 2024-08-18 | A Transcription Prompt-based Efficient Audio Large Language Model for Robust Speech Recognition | Yangze Li et.al. | 2408.09491 | null |
| 2024-08-17 | Malacopula: adversarial automatic speaker verification attacks using a neural-based generalised Hammerstein model | Massimiliano Todisco et.al. | 2408.09300 | null |
| 2024-08-17 | Generating Data with Text-to-Speech and Large-Language Models for Conversational Speech Recognition | Samuele Cornell et.al. | 2408.09215 | null |
| 2024-08-14 | Supervised and Unsupervised Alignments for Spoofing Behavioral Biometrics | Thomas Thebaud et.al. | 2408.08918 | null |
| 2024-08-16 | ASVspoof 5: Crowdsourced Speech Data, Deepfakes, and Adversarial Attacks at Scale | Xin Wang et.al. | 2408.08739 | null |
| 2024-08-15 | Enhancing Large Language Model-based Speech Recognition by Contextualization for Rare and Ambiguous Words | Kento Nozawa et.al. | 2408.08027 | null |
| 2024-08-14 | SER Evals: In-domain and Out-of-domain Benchmarking for Speech Emotion Recognition | Mohamed Osman et.al. | 2408.07851 | link |
| 2024-08-14 | WavLM model ensemble for audio deepfake detection | David Combei et.al. | 2408.07414 | null |
| 2024-08-14 | DPSNN: Spiking Neural Network for Low-Latency Streaming Speech Enhancement | Tao Sun et.al. | 2408.07388 | null |
| 2024-08-13 | Play Me Something Icy: Practical Challenges, Explainability and the Semantic Gap in Generative AI Music | Jesse Allison et.al. | 2408.07224 | null |
| 2024-08-13 | VNet: A GAN-based Multi-Tier Discriminator Network for Speech Synthesis Vocoders | Yubing Cao et.al. | 2408.06906 | null |
| 2024-08-13 | SaSLaW: Dialogue Speech Corpus with Audio-visual Egocentric Information Toward Environment-adaptive Dialogue Speech Synthesis | Osamu Take et.al. | 2408.06858 | link |
| 2024-08-13 | PRESENT: Zero-Shot Text-to-Prosody Control | Perry Lam et.al. | 2408.06827 | link |
| 2024-08-13 | Deep Learning for Speaker Identification: Architectural Insights from AB-1 Corpus Analysis and Performance Evaluation | Matthias Bartolo et.al. | 2408.06804 | link |
| 2024-08-12 | Cross-Lingual Conversational Speech Summarization with Large Language Models | Max Nelson et.al. | 2408.06484 | null |
| 2024-08-12 | Audio Enhancement for Computer Audition – An Iterative Training Paradigm Using Sample Importance | Manuel Milling et.al. | 2408.06264 | null |
| 2024-08-12 | Enhancing Dialogue Speech Recognition with Robust Contextual Awareness via Noise Representation Learning | Wonjun Lee et.al. | 2408.06043 | null |
| 2024-08-12 | Controlling Surprisal in Music Generation via Information Content Curve Matching | Mathias Rose Bjare et.al. | 2408.06022 | link |
| 2024-08-11 | LI-TTA: Language Informed Test-Time Adaptation for Automatic Speech Recognition | Eunseop Yoon et.al. | 2408.05769 | null |
| 2024-08-11 | VQ-CTAP: Cross-Modal Fine-Grained Sequence Representation Learning for Speech Processing | Chunyu Qiang et.al. | 2408.05758 | null |
| 2024-08-10 | Improving Whisper’s Recognition Performance for Under-Represented Language Kazakh Leveraging Unpaired Speech and Text | Jinpeng Li et.al. | 2408.05554 | null |
| 2024-08-09 | MooER: LLM-based Speech Recognition and Translation Models from Moore Threads | Junhao Xu et.al. | 2408.05101 | null |
| 2024-08-09 | TEAdapter: Supply abundant guidance for controllable text-to-music generation | Jialing Zou et.al. | 2408.04865 | null |
| 2024-08-08 | MulliVC: Multi-lingual Voice Conversion With Cycle Consistency | Jiawei Huang et.al. | 2408.04708 | null |
| 2024-08-08 | NeuralMultiling: A Novel Neural Architecture Search for Smartphone based Multilingual Speaker Verification | Aravinda Reddy PN et.al. | 2408.04362 | null |
| 2024-08-08 | HydraFormer: One Encoder For All Subsampling Rates | Yaoxun Xu et.al. | 2408.04325 | link |
| 2024-08-08 | Preserving spoken content in voice anonymisation with character-level vocoder conditioning | Michele Panariello et.al. | 2408.04306 | null |
| 2024-08-08 | wav2graph: A Framework for Supervised Learning Knowledge Graph from Speech | Khai Le-Duc et.al. | 2408.04174 | null |
| 2024-08-07 | Speaker Adaptation for Quantised End-to-End ASR Models | Qiuming Zhao et.al. | 2408.03979 | null |
| 2024-08-06 | Central Kurdish Text-to-Speech Synthesis with Novel End-to-End Transformer Training | Hawraz A. Ahmad et.al. | 2408.03887 | null |
| 2024-08-07 | Facing the Music: Tackling Singing Voice Separation in Cinematic Audio Source Separation | Karn N. Watcharasupat et.al. | 2408.03588 | null |
| 2024-08-06 | ASR-enhanced Multimodal Representation Learning for Cross-Domain Product Retrieval | Ruixiang Zhao et.al. | 2408.02978 | null |
| 2024-08-06 | Self-Supervised Learning for Multi-Channel Neural Transducer | Atsushi Kojima et.al. | 2408.02945 | null |
| 2024-08-05 | Automatic Voice Identification after Speech Resynthesis using PPG | Thibault Gaudier et.al. | 2408.02712 | null |
| 2024-08-05 | Clustering and Mining Accented Speech for Inclusive and Fair Speech Recognition | Jaeyoung Kim et.al. | 2408.02582 | null |
| 2024-08-05 | The NPU-ASLP System Description for Visual Speech Recognition in CNVSRC 2024 | He Wang et.al. | 2408.02369 | null |
| 2024-08-05 | StreamVoice+: Evolving into End-to-end Streaming Zero-shot Voice Conversion | Zhichao Wang et.al. | 2408.02178 | null |
| 2024-08-04 | Why Perturbing Symbolic Music is Necessary: Fitting the Distribution of Never-used Notes through a Joint Probabilistic Diffusion Model | Shipei Liu et.al. | 2408.01950 | null |
| 2024-08-03 | ALIF: Low-Cost Adversarial Audio Attacks on Black-Box Speech Platforms using Linguistic Features | Peng Cheng et.al. | 2408.01808 | null |
| 2024-08-03 | Generating High-quality Symbolic Music Using Fine-grained Discriminators | Zhedong Zhang et.al. | 2408.01696 | null |
| 2024-08-02 | EmoBack: Backdoor Attacks Against Speaker Identification Using Emotional Prosody | Coen Schoof et.al. | 2408.01178 | null |
| 2024-08-01 | Expressive MIDI-format Piano Performance Generation | Jingwei Liu et.al. | 2408.00900 | null |
| 2024-08-01 | SynesLM: A Unified Approach for Audio-visual Speech Recognition and Translation via Language Model and Synthetic Data | Yichen Lu et.al. | 2408.00624 | null |
| 2024-08-01 | Bailing-TTS: Chinese Dialectal Speech Synthesis Towards Human-like Spontaneous Representation | Xinhan Di et.al. | 2408.00284 | null |
| 2024-08-01 | Sentence-wise Speech Summarization: Task, Datasets, and End-to-End Modeling with LM Knowledge Distillation | Kohei Matsuura et.al. | 2408.00205 | null |
| 2024-07-31 | Combining audio control and style transfer using latent diffusion | Nils Demerlé et.al. | 2408.00196 | null |
| 2024-07-31 | The Llama 3 Herd of Models | Abhimanyu Dubey et.al. | 2407.21783 | null |
| 2024-07-31 | Between the AI and Me: Analysing Listeners’ Perspectives on AI- and Human-Composed Progressive Metal Music | Pedro Sarmento et.al. | 2407.21615 | null |
| 2024-08-01 | Generative Expressive Conversational Speech Synthesis | Rui Liu et.al. | 2407.21491 | null |
| 2024-07-31 | On the Problem of Text-To-Speech Model Selection for Synthetic Data Generation in Automatic Speech Recognition | Nick Rossenbach et.al. | 2407.21476 | null |
| 2024-07-31 | Towards interfacing large language models with ASR systems using confidence measures and prompting | Maryam Naderi et.al. | 2407.21414 | null |
| 2024-07-30 | Self-Supervised Models in Automatic Whispered Speech Recognition | Aref Farhadipour et.al. | 2407.21211 | null |
| 2024-07-28 | ELP-Adapters: Parameter Efficient Adapter Tuning for Various Speech Processing Tasks | Nakamasa Inoue et.al. | 2407.21066 | null |
| 2024-07-30 | Emotion-driven Piano Music Generation via Two-stage Disentanglement and Functional Representation | Jingyue Huang et.al. | 2407.20955 | link |
| 2024-07-29 | Futga: Towards Fine-grained Music Understanding through Temporally-enhanced Generative Augmentation | Junda Wu et.al. | 2407.20445 | null |
| 2024-07-29 | Practical and Reproducible Symbolic Music Generation by Large Language Models with Structural Embeddings | Seungyeon Rhyu et.al. | 2407.19900 | null |
| 2024-07-26 | Dynamic Language Group-Based MoE: Enhancing Efficiency and Flexibility for Code-Switching Speech Recognition | Hukai Huang et.al. | 2407.18581 | null |
| 2024-07-29 | Speech Bandwidth Expansion Via High Fidelity Generative Adversarial Networks | Mahmoud Salhab et.al. | 2407.18571 | null |
| 2024-07-26 | Towards Improving NAM-to-Speech Synthesis Intelligibility using Self-Supervised Speech Models | Neil Shah et.al. | 2407.18541 | null |
| 2024-07-26 | VoxSim: A perceptual voice similarity dataset | Junseok Ahn et.al. | 2407.18505 | link |
| 2024-07-26 | Enhancing Dysarthric Speech Recognition for Unseen Speakers via Prototype-Based Adaptation | Shiyao Wang et.al. | 2407.18461 | link |
| 2024-07-25 | On the Effect of Purely Synthetic Training Data for Different Automatic Speech Recognition Architectures | Nick Rossenbach et.al. | 2407.17997 | null |
| 2024-07-25 | Multi-Stage Face-Voice Association Learning with Keynote Speaker Diarization | Ruijie Tao et.al. | 2407.17902 | link |
| 2024-07-25 | Improving Domain-Specific ASR with LLM-Generated Contextual Descriptions | Jiwon Suh et.al. | 2407.17874 | null |
| 2024-07-25 | Scaling A Simple Approach to Zero-Shot Speech Recognition | Jinming Zhao et.al. | 2407.17852 | link |
| 2024-07-24 | Coupling Speech Encoders with Downstream Text Models | Ciprian Chelba et.al. | 2407.17605 | null |
| 2024-07-24 | A Comparative Analysis of Bilingual and Trilingual Wav2Vec Models for Automatic Speech Recognition in Multilingual Oral History Archives | Jan Lehečka et.al. | 2407.17160 | null |
| 2024-07-24 | Long-Term, Store-Front Robotics: Interactive Music for Robotic Arm, Caxixi and Frame Drums | Richard Savery et.al. | 2407.16956 | null |
| 2024-07-23 | Quantifying the Role of Textual Predictability in Automatic Speech Recognition | Sean Robertson et.al. | 2407.16537 | null |
| 2024-07-23 | The CHiME-8 DASR Challenge for Generalizable and Array Agnostic Distant Automatic Speech Recognition and Diarization | Samuele Cornell et.al. | 2407.16447 | null |
| 2024-07-23 | Evolutionary Prompt Design for LLM-Based Post-ASR Error Correction | Rithik Sachdev et.al. | 2407.16370 | link |
| 2024-07-22 | dMel: Speech Tokenization made Simple | He Bai et.al. | 2407.15835 | link |
| 2024-07-22 | Robustness of Speech Separation Models for Similar-pitch Speakers | Bunlong Lay et.al. | 2407.15749 | null |
| 2024-07-22 | SELM: Enhancing Speech Emotion Recognition for Out-of-Domain Scenarios | Hazim Bukhari et.al. | 2407.15300 | null |
| 2024-07-21 | Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning | Shuai Wang et.al. | 2407.15188 | null |
| 2024-07-21 | MusiConGen: Rhythm and Chord Control for Transformer-Based Text-to-Music Generation | Yun-Han Lan et.al. | 2407.15060 | link |
| 2024-07-20 | Towards Realistic Emotional Voice Conversion using Controllable Emotional Intensity | Tianhua Qi et.al. | 2407.14800 | null |
| 2024-07-21 | Trading Devil Final: Backdoor attack via Stock market and Bayesian Optimization | Orson Mengara et.al. | 2407.14573 | null |
| 2024-07-19 | Towards Assessing Data Replication in Music Generation with Music Similarity Metrics on Raw Audio | Roser Batlle-Roca et.al. | 2407.14364 | link |
| 2024-07-19 | Rasa: Building Expressive Speech Synthesis Systems for Indian Languages in Low-resource Settings | Praveen Srinivasa Varadhan et.al. | 2407.14056 | link |
| 2024-07-19 | GE2E-AC: Generalized End-to-End Loss Training for Accent Classification | Chihiro Watanabe et.al. | 2407.14021 | null |
| 2024-07-19 | MSceneSpeech: A Multi-Scene Speech Dataset For Expressive Speech Synthesis | Qian Yang et.al. | 2407.14006 | null |
| 2024-07-19 | Reexamining Racial Disparities in Automatic Speech Recognition Performance: The Role of Confounding by Provenance | Changye Li et.al. | 2407.13982 | link |
| 2024-07-18 | Spontaneous Style Text-to-Speech Synthesis with Controllable Spontaneous Behaviors Based on Language Models | Weiqin Li et.al. | 2407.13509 | null |
| 2024-07-18 | Reducing Barriers to the Use of Marginalised Music Genres in AI | Nick Bryan-Kinns et.al. | 2407.13439 | null |
| 2024-07-18 | Robust ASR Error Correction with Conservative Data Filtering | Takuma Udagawa et.al. | 2407.13300 | null |
| 2024-07-18 | Low-Resourced Speech Recognition for Iu Mien Language via Weakly-Supervised Phoneme-based Multilingual Pre-training | Lukuan Dong et.al. | 2407.13292 | null |
| 2024-07-18 | How Private is Low-Frequency Speech Audio in the Wild? An Analysis of Verbal Intelligibility by Humans and Machines | Ailin Liu et.al. | 2407.13266 | null |
| 2024-07-18 | A light-weight and efficient punctuation and word casing prediction model for on-device streaming ASR | Jian You et.al. | 2407.13142 | null |
| 2024-07-17 | Audio Conditioning for Music Generation via Discrete Bottleneck Features | Simon Rouard et.al. | 2407.12563 | null |
| 2024-07-17 | Morphosyntactic Analysis for CHILDES | Houjun Liu et.al. | 2407.12389 | null |
| 2024-07-17 | Adaptive Cascading Network for Continual Test-Time Adaptation | Kien X. Nguyen et.al. | 2407.12240 | null |
| 2024-07-16 | Identifying Speakers in Dialogue Transcripts: A Text-based Approach Using Pretrained Language Models | Minh Nguyen et.al. | 2407.12094 | link |
| 2024-07-17 | Vibravox: A Dataset of French Speech Captured with Body-conduction Audio Sensors | Julien Hauret et.al. | 2407.11828 | link |
| 2024-07-16 | Investigating the Effect of Label Topology and Training Criterion on ASR Performance and Alignment Quality | Tina Raissi et.al. | 2407.11641 | null |
| 2024-07-16 | The VoicePrivacy 2022 Challenge: Progress and Perspectives in Voice Anonymisation | Michele Panariello et.al. | 2407.11516 | null |
| 2024-07-16 | VoxBlink2: A 100K+ Speaker Recognition Corpus and the Open-Set Speaker-Identification Benchmark | Yuke Lin et.al. | 2407.11510 | null |
| 2024-07-16 | Beyond Binary: Multiclass Paraphasia Detection with Generative Pretrained Transformers and End-to-End Models | Matthew Perez et.al. | 2407.11345 | null |
| 2024-07-15 | Leave No Knowledge Behind During Knowledge Distillation: Towards Practical and Effective Knowledge Distillation for Code-Switching ASR Using Realistic Data | Liang-Hsuan Tseng et.al. | 2407.10603 | null |
| 2024-07-15 | BandControlNet: Parallel Transformers-based Steerable Popular Music Generation with Fine-Grained Spatiotemporal Features | Jing Luo et.al. | 2407.10462 | link |
| 2024-07-14 | The Interpretation Gap in Text-to-Music Generation Models | Yongyi Zang et.al. | 2407.10328 | null |
| 2024-07-14 | Improving Neural Biasing for Contextual Speech Recognition by Early Context Injection and Text Perturbation | Ruizhe Huang et.al. | 2407.10303 | null |
| 2024-07-14 | CUSIDE-T: Chunking, Simulating Future and Decoding for Transducer based Streaming ASR | Wenbo Zhao et.al. | 2407.10255 | null |
| 2024-07-14 | Textless Dependency Parsing by Labeled Sequence Prediction | Shunsuke Kando et.al. | 2407.10118 | link |
| 2024-07-14 | Whisper-SV: Adapting Whisper for Low-data-resource Speaker Verification | Li Zhang et.al. | 2407.10048 | null |
| 2024-07-13 | Text-Based Detection of On-Hold Scripts in Contact Center Calls | Dmitrii Galimzianov et.al. | 2407.09849 | link |
| 2024-07-13 | Empowering Whisper as a Joint Multi-Talker and Target-Talker Speech Recognition System | Lingwei Meng et.al. | 2407.09817 | null |
| 2024-07-13 | A Streaming Multi-Channel End-to-End Speech Recognition System with Realistic Evaluations | Xiangzhu Kong et.al. | 2407.09807 | null |
| 2024-07-12 | Music Proofreading with RefinPaint: Where and How to Modify Compositions given Context | Pedro Ramoneda et.al. | 2407.09099 | link |
| 2024-07-12 | Optimization of DNN-based speaker verification model through efficient quantization technique | Yeona Hong et.al. | 2407.08991 | null |
| 2024-07-10 | Evaluating Voice Command Pipelines for Drone Control: From STT and LLM to Direct Classification and Siamese Networks | Lucca Emmanuel Pineli Simões et.al. | 2407.08658 | null |
| 2024-07-11 | Tamil Language Computing: the Present and the Future | Kengatharaiyer Sarveswaran et.al. | 2407.08618 | null |
| 2024-07-11 | Autoregressive Speech Synthesis without Vector Quantization | Lingwei Meng et.al. | 2407.08551 | null |
| 2024-07-11 | Toward accessible comics for blind and low vision readers | Christophe Rigaud et.al. | 2407.08248 | null |
| 2024-07-10 | Phonetic Richness for Improved Automatic Speaker Verification | Nicholas Klein et.al. | 2407.08017 | null |
| 2024-07-10 | Source Tracing of Audio Deepfake Systems | Nicholas Klein et.al. | 2407.08016 | null |
| 2024-07-11 | SaMoye: Zero-shot Singing Voice Conversion Based on Feature Disentanglement and Synthesis | Zihao Wang et.al. | 2407.07728 | link |
| 2024-07-10 | HebDB: a Weakly Supervised Dataset for Hebrew Speech Processing | Arnon Turetzky et.al. | 2407.07566 | null |
| 2024-07-09 | Remastering Divide and Remaster: A Cinematic Audio Source Separation Dataset with Multilingual Support | Karn N. Watcharasupat et.al. | 2407.07275 | null |
| 2024-07-09 | Speech After Gender: A Trans-Feminine Perspective on Next Steps for Speech Science and Technology | Robin Netzorg et.al. | 2407.07235 | null |
| 2024-07-09 | Listen and Speak Fairly: A Study on Semantic Gender Bias in Speech Integrated Large Language Models | Yi-Cheng Lin et.al. | 2407.06957 | link |
| 2024-07-09 | Tailored Design of Audio-Visual Speech Recognition Models using Branchformers | David Gimeno-Gómez et.al. | 2407.06606 | link |
| 2024-07-08 | Homogeneous Speaker Features for On-the-Fly Dysarthric and Elderly Speaker Adaptation | Mengzhe Geng et.al. | 2407.06310 | null |
| 2024-07-08 | Two-Path GMM-ResNet and GMM-SENet for ASV Spoofing Detection | Zhenchun Lei et.al. | 2407.05605 | null |
| 2024-07-07 | Differentiable Modal Synthesis for Physical Modeling of Planar String Sound and Motion Simulation | Jin Woo Lee et.al. | 2407.05516 | null |
| 2024-07-07 | Fine-Grained and Interpretable Neural Speech Editing | Max Morrison et.al. | 2407.05471 | null |
| 2024-07-09 | CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens | Zhihao Du et.al. | 2407.05407 | null |
| 2024-07-06 | A Reference-free Metric for Language-Queried Audio Source Separation using Contrastive Language-Audio Pretraining | Feiyang Xiao et.al. | 2407.04936 | null |
| 2024-07-05 | MUSIC-lite: Efficient MUSIC using Approximate Computing: An OFDM Radar Case Study | Rajat Bhattacharjya et.al. | 2407.04849 | null |
| 2024-07-05 | Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition | Ye Bai et.al. | 2407.04675 | null |
| 2024-07-05 | Multitaper mel-spectrograms for keyword spotting | Douglas Baptista de Souza et.al. | 2407.04662 | null |
| 2024-07-05 | Pretraining End-to-End Keyword Search with Automatically Discovered Acoustic Units | Bolaji Yusuf et.al. | 2407.04652 | link |
| 2024-07-05 | Speculative Speech Recognition by Audio-Prefixed Low-Rank Adaptation of Language Models | Bolaji Yusuf et.al. | 2407.04641 | null |
| 2024-07-05 | Written Term Detection Improves Spoken Term Detection | Bolaji Yusuf et.al. | 2407.04601 | link |
| 2024-07-05 | FA-GAN: Artifacts-free and Phase-aware High-fidelity GAN-based Vocoder | Rubing Shen et.al. | 2407.04575 | null |
| 2024-07-05 | Performance Analysis of Speech Encoders for Low-Resource SLU and ASR in Tunisian Dialect | Salima Mdhaffar et.al. | 2407.04533 | null |
| 2024-07-05 | Controlling Whisper: Universal Acoustic Adversarial Attacks to Control Speech Foundation Models | Vyas Raina et.al. | 2407.04482 | null |
| 2024-07-05 | XLSR-Transducer: Streaming ASR for Self-Supervised Pretrained Models | Shashi Kumar et.al. | 2407.04439 | null |
| 2024-07-05 | Romanization Encoding For Multilingual ASR | Wen Ding et.al. | 2407.04368 | null |
| 2024-07-03 | GMM-ResNext: Combining Generative and Discriminative Models for Speaker Verification | Hui Yan et.al. | 2407.03135 | null |
| 2024-07-03 | Qifusion-Net: Layer-adapted Stream/Non-stream Model for End-to-End Multi-Accent Speech Recognition | Jinming Chen et.al. | 2407.03026 | null |
| 2024-07-03 | Probing the Feasibility of Multilingual Speaker Anonymization | Sarina Meyer et.al. | 2407.02937 | link |
| 2024-07-02 | Towards the Next Frontier in Speech Representation Learning Using Disentanglement | Varun Krishna et.al. | 2407.02543 | null |
| 2024-07-02 | Robust Zero-Shot Text-to-Speech Synthesis with Reverse Inference Optimization | Yuchen Hu et.al. | 2407.02243 | null |
| 2024-07-02 | The USTC-NERCSLIP Systems for The ICMC-ASR Challenge | Minghui Wu et.al. | 2407.02052 | null |
| 2024-07-02 | Accompanied Singing Voice Synthesis with Fully Text-controlled Melody | Ruiqi Li et.al. | 2407.02049 | null |
| 2024-07-02 | Pinyin Regularization in Error Correction for Chinese Speech Recognition with Large Language Models | Zhiyuan Tang et.al. | 2407.01909 | link |
| 2024-07-01 | Pictures Of MIDI: Controlled Music Generation via Graphical Prompts for Image-Based Diffusion Inpainting | Scott H. Hawley et.al. | 2407.01499 | null |
| 2024-07-01 | Lightweight Zero-shot Text-to-Speech with Mixture of Adapters | Kenichi Fujita et.al. | 2407.01291 | null |
| 2024-06-30 | An Attribute Interpolation Method in Speech Synthesis by Model Merging | Masato Murata et.al. | 2407.00766 | null |
| 2024-06-30 | Less Forgetting for Better Generalization: Exploring Continual-learning Fine-tuning Methods for Speech Self-supervised Representations | Salah Zaiem et.al. | 2407.00756 | null |
| 2024-06-30 | FLY-TTS: Fast, Lightweight and High-Quality End-to-End Text-to-Speech Synthesis | Yinlin Guo et.al. | 2407.00753 | null |
| 2024-06-29 | When Robots Get Chatty: Grounding Multimodal Human-Robot Conversation and Collaboration | Philipp Allgeuer et.al. | 2407.00518 | null |
| 2024-06-28 | SAML: Speaker Adaptive Mixture of LoRA Experts for End-to-End ASR | Qiuming Zhao et.al. | 2406.19706 | null |
| 2024-06-28 | Less is More: Accurate Speech Recognition & Translation without Web-Scale Data | Krishna C. Puvvada et.al. | 2406.19674 | null |
| 2024-06-27 | Voices Unheard: NLP Resources and Models for Yorùbá Regional Dialects | Orevaoghene Ahia et.al. | 2406.19564 | null |
| 2024-06-27 | Tradition or Innovation: A Comparison of Modern ASR Methods for Forced Alignment | Rotem Rousso et.al. | 2406.19363 | null |
| 2024-06-27 | Zero-Query Adversarial Attack on Black-box Automatic Speech Recognition Systems | Zheng Fang et.al. | 2406.19311 | null |
| 2024-06-27 | Application of ASV for Voice Identification after VC and Duration Predictor Improvement in TTS Models | Borodin Kirill Nikolayevich et.al. | 2406.19243 | null |
| 2024-06-27 | DEX-TTS: Diffusion-based EXpressive Text-to-Speech with Style Modeling on Time Variability | Hyun Joon Park et.al. | 2406.19135 | link |
| 2024-06-27 | Applying LLMs for Rescoring N-best ASR Hypotheses of Casual Conversations: Effects of Domain Adaptation and Context Carry-over | Atsunori Ogawa et.al. | 2406.18972 | null |
| 2024-06-27 | Enhanced ASR Robustness to Packet Loss with a Front-End Adaptation Network | Yehoshua Dissen et.al. | 2406.18928 | null |
| 2024-06-27 | Streaming Decoder-Only Automatic Speech Recognition with Discrete Speech Units: A Pilot Study | Peikun Chen et.al. | 2406.18862 | null |
| 2024-06-26 | A Stem-Agnostic Single-Decoder System for Music Source Separation Beyond Four Stems | Karn N. Watcharasupat et.al. | 2406.18747 | link |
| 2024-06-26 | Dynamic Data Pruning for Automatic Speech Recognition | Qiao Xiao et.al. | 2406.18373 | null |
| 2024-06-26 | MSR-86K: An Evolving, Multilingual Corpus with 86,300 Hours of Transcribed Audio for Speech Recognition Research | Song Li et.al. | 2406.18301 | null |
| 2024-06-26 | Automatic Speech Recognition for Hindi | Anish Saha et.al. | 2406.18135 | null |
| 2024-06-26 | ArzEn-LLM: Code-Switched Egyptian Arabic-English Translation and Speech Recognition Using LLMs | Ahmed Heakl et.al. | 2406.18120 | link |
| 2024-06-26 | SC-MoE: Switch Conformer Mixture of Experts for Unified Streaming and Non-streaming Code-Switching ASR | Shuaishuai Ye et.al. | 2406.18021 | null |
| 2024-06-25 | Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment | Paarth Neekhara et.al. | 2406.17957 | null |
| 2024-06-25 | Sequential Editing for Lifelong Training of Speech Recognition Models | Devang Kulshreshtha et.al. | 2406.17935 | null |
| 2024-06-25 | FASA: a Flexible and Automatic Speech Aligner for Extracting High-quality Aligned Children Speech Data | Dancheng Liu et.al. | 2406.17926 | link |
| 2024-06-25 | Spatial Voice Conversion: Voice Conversion Preserving Spatial Information and Non-target Signals | Kentaro Seki et.al. | 2406.17722 | null |
| 2024-06-25 | Towards Building an End-to-End Multilingual Automatic Lyrics Transcription Model | Jiawen Huang et.al. | 2406.17618 | link |
| 2024-06-25 | MSRS: Training Multimodal Speech Recognition Models from Scratch with Sparse Mask Optimization | Adriana Fernandez-Lopez et.al. | 2406.17614 | null |
| 2024-06-25 | High Fidelity Text-to-Speech Via Discrete Tokens Using Token Transducer and Group Masked Language Model | Joun Yeop Lee et.al. | 2406.17310 | null |
| 2024-06-25 | A Comprehensive Solution to Connect Speech Encoder and Large Language Model for ASR | Van Tung Pham et.al. | 2406.17272 | null |
| 2024-06-25 | Leveraging Parameter-Efficient Transfer Learning for Multi-Lingual Text-to-Speech Adaptation | Yingting Li et.al. | 2406.17257 | null |
| 2024-06-24 | Investigating Confidence Estimation Measures for Speaker Diarization | Anurag Chowdhury et.al. | 2406.17124 | null |
| 2024-06-24 | Exploring the Capability of Mamba in Speech Applications | Koichi Miyazaki et.al. | 2406.16808 | null |
| 2024-06-24 | Blending LLMs into Cascaded Speech Translation: KIT’s Offline Speech Translation System for IWSLT 2024 | Sai Koneru et.al. | 2406.16777 | null |
| 2024-06-25 | Towards Zero-Shot Text-To-Speech for Arabic Dialects | Khai Duy Doan et.al. | 2406.16751 | null |
| 2024-06-24 | One-Class Learning with Adaptive Centroid Shift for Audio Deepfake Detection | Hyun Myung Kim et.al. | 2406.16716 | null |
| 2024-06-24 | RefXVC: Cross-Lingual Voice Conversion with Enhanced Reference Leveraging | Mingyang Zhang et.al. | 2406.16326 | null |
| 2024-06-24 | DreamVoice: Text-Guided Voice Conversion | Jiarui Hai et.al. | 2406.16314 | null |
| 2024-06-23 | Contextualized End-to-end Automatic Speech Recognition with Intermediate Biasing Loss | Muhammad Shakeel et.al. | 2406.16120 | null |
| 2024-06-23 | Decoder-only Architecture for Streaming End-to-end Speech Recognition | Emiru Tsunoo et.al. | 2406.16107 | null |
| 2024-06-22 | Acoustic Feature Mixup for Balanced Multi-aspect Pronunciation Assessment | Heejin Do et.al. | 2406.15723 | null |
| 2024-06-21 | PI-Whisper: An Adaptive and Incremental ASR Framework for Diverse and Evolving Speaker Characteristics | Amir Nassereldine et.al. | 2406.15668 | null |
| 2024-06-21 | Perception of Phonological Assimilation by Neural Speech Recognition Models | Charlotte Pouw et.al. | 2406.15265 | null |
| 2024-06-21 | InterBiasing: Boost Unseen Word Recognition through Biasing Intermediate Predictions | Yu Nakagome et.al. | 2406.14890 | null |
| 2024-06-20 | An Adapter-Based Unified Model for Multiple Spoken Language Processing Tasks | Varsha Suresh et.al. | 2406.14747 | null |
| 2024-06-21 | DASB – Discrete Audio and Speech Benchmark | Pooneh Mousavi et.al. | 2406.14294 | null |
| 2024-06-20 | Intelligent Interface: Enhancing Lecture Engagement with Didactic Activity Summaries | Anna Wróblewska et.al. | 2406.14266 | null |
| 2024-06-19 | Joint vs Sequential Speaker-Role Detection and Automatic Speech Recognition for Air-traffic Control | Alexander Blatt et.al. | 2406.13842 | null |
| 2024-06-19 | ManWav: The First Manchu ASR Model | Jean Seo et.al. | 2406.13502 | null |
| 2024-06-19 | Children’s Speech Recognition through Discrete Token Enhancement | Vrunda N. Sukhadia et.al. | 2406.13431 | null |
| 2024-06-19 | CEC: A Noisy Label Detection Method for Speaker Recognition | Yao Shen et.al. | 2406.13268 | null |
| 2024-06-18 | Articulatory Encodec: Vocal Tract Kinematics as a Codec for Speech | Cheol Jun Cho et.al. | 2406.12998 | null |
| 2024-06-18 | Bridging the Gap: Integrating Pre-trained Speech Enhancement and Recognition Models for Robust Speech Recognition | Kuan-Chen Wang et.al. | 2406.12699 | null |
| 2024-06-18 | Transcribe, Align and Segment: Creating speech datasets for low-resource languages | Taras Sereda et.al. | 2406.12674 | null |
| 2024-06-18 | Growing Trees on Sounds: Assessing Strategies for End-to-End Dependency Parsing of Speech | Adrien Pupier et.al. | 2406.12621 | null |
| 2024-06-18 | Rapid Language Adaptation for Multilingual E2E Speech Recognition Using Encoder Prompting | Yosuke Kashiwagi et.al. | 2406.12611 | null |
| 2024-06-18 | Unsupervised Online Continual Learning for Automatic Speech Recognition | Steven Vander Eeckt et.al. | 2406.12503 | null |
| 2024-06-18 | Performant ASR Models for Medical Entities in Accented Speech | Tejumade Afonja et.al. | 2406.12387 | null |
| 2024-06-18 | Finding Task-specific Subnetworks in Multi-task Spoken Language Understanding Model | Hayato Futami et.al. | 2406.12317 | null |
| 2024-06-18 | JEN-1 DreamStyler: Customized Musical Concept Learning via Pivotal Parameters Tuning | Boyu Chen et.al. | 2406.12292 | null |
| 2024-06-18 | SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization | Young Jin Ahn et.al. | 2406.12233 | null |
| 2024-06-18 | A Mel Spectrogram Enhancement Paradigm Based on CWT in Speech Synthesis | Guoqiang Hu et.al. | 2406.12164 | null |
| 2024-06-17 | 1000 African Voices: Advancing inclusive multi-speaker multi-accent speech synthesis | Sewade Ogun et.al. | 2406.11727 | null |
| 2024-06-17 | GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement | Yifan Yang et.al. | 2406.11546 | link |
| 2024-06-17 | Performance Improvement of Language-Queried Audio Source Separation Based on Caption Augmentation From Large Language Models for DCASE Challenge 2024 Task 9 | Do Hyun Lee et.al. | 2406.11248 | null |
| 2024-06-17 | Self-Distillation Prototypes Network: Learning Robust Speaker Representations without Supervision | Yafeng Chen et.al. | 2406.11169 | null |
| 2024-06-16 | Continual Test-time Adaptation for End-to-end Speech Recognition on Noisy Speech | Guan-Ting Lin et.al. | 2406.11064 | null |
| 2024-06-16 | NAST: Noise Aware Speech Tokenization for Speech Language Models | Shoval Messica et.al. | 2406.11037 | null |
| 2024-06-16 | Large Language Models for Dysfluency Detection in Stuttered Speech | Dominik Wagner et.al. | 2406.11025 | null |
| 2024-06-16 | Outlier Reduction with Gated Attention for Improved Post-training Quantization in Large Sequence-to-sequence Speech Foundation Models | Dominik Wagner et.al. | 2406.11022 | null |
| 2024-06-16 | Optimized Speculative Sampling for GPU Hardware Accelerators | Dominik Wagner et.al. | 2406.11016 | null |
| 2024-06-16 | CoSTA: Code-Switched Speech Translation using Aligned Speech-Text Interleaving | Bhavani Shankar et.al. | 2406.10993 | null |
| 2024-06-14 | Inclusive ASR for Disfluent Speech: Cascaded Large-Scale Self-Supervised Learning with Targeted Fine-Tuning and Data Augmentation | Dena Mujtaba et.al. | 2406.10177 | null |
| 2024-06-14 | On the Evaluation of Speech Foundation Models for Spoken Language Understanding | Siddhant Arora et.al. | 2406.10083 | null |
| 2024-06-14 | Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation | Andrew Rouditchenko et.al. | 2406.10082 | link |
| 2024-06-14 | Simul-Whisper: Attention-Guided Streaming Whisper with Truncation Detection | Haoyu Wang et.al. | 2406.10052 | null |
| 2024-06-14 | ROAR: Reinforcing Original to Augmented Data Ratio Dynamics for Wav2Vec2.0 Based ASR | Vishwanath Pratap Singh et.al. | 2406.09999 | null |
| 2024-06-14 | An efficient text augmentation approach for contextualized Mandarin speech recognition | Naijun Zheng et.al. | 2406.09950 | null |
| 2024-06-14 | Perceiver-Prompt: Flexible Speaker Adaptation in Whisper for Chinese Disordered Speech Recognition | Yicong Jiang et.al. | 2406.09873 | null |
| 2024-06-14 | MMM: Multi-Layer Multi-Residual Multi-Stream Discrete Speech Representation from Self-supervised Learning Model | Jiatong Shi et.al. | 2406.09869 | null |
| 2024-06-14 | Vec-Tok-VC+: Residual-enhanced Robust Zero-shot Voice Conversion with Progressive Constraints in a Dual-mode Training Strategy | Linhan Ma et.al. | 2406.09844 | null |
| 2024-06-14 | Low algorithmic delay implementation of convolutional beamformer for online joint source separation and dereverberation | Kaien Mo et.al. | 2406.09821 | null |
| 2024-06-13 | Exploring Spoken Language Identification Strategies for Automatic Transcription of Multilingual Broadcast and Institutional Speech | Martina Valente et.al. | 2406.09290 | null |
| 2024-06-13 | Language Complexity and Speech Recognition Accuracy: Orthographic Complexity Hurts, Phonological Complexity Doesn’t | Chihiro Taguchi et.al. | 2406.09202 | null |
| 2024-06-13 | LASER: Learning by Aligning Self-supervised Representations of Speech for Improving Content-related Tasks | Amit Meghanani et.al. | 2406.09153 | null |
| 2024-06-13 | ToneUnit: A Speech Discretization Approach for Tonal Language Speech Synthesis | Dehua Tao et.al. | 2406.08989 | null |
| 2024-06-13 | Transcription-Free Fine-Tuning of Speech Separation Models for Noisy and Reverberant Multi-Speaker Automatic Speech Recognition | William Ravenscroft et.al. | 2406.08914 | null |
| 2024-06-13 | AdaPTwin: Low-Cost Adaptive Compression of Product Twins in Transformers | Emil Biju et.al. | 2406.08904 | null |
| 2024-06-13 | A Single-Step Non-Autoregressive Automatic Speech Recognition Architecture with High Accuracy and Inference Speed | Ziyang Zhuang et.al. | 2406.08835 | null |
| 2024-06-13 | Generating Speakers by Prompting Listener Impressions for Pre-trained Multi-Speaker Text-to-Speech Systems | Zhengyang Chen et.al. | 2406.08812 | null |
| 2024-06-12 | ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets | Jiatong Shi et.al. | 2406.08641 | null |
| 2024-06-12 | Emotion Manipulation Through Music – A Deep Learning Interactive Visual Approach | Adel N. Abdalla et.al. | 2406.08623 | null |
| 2024-06-12 | SVSNet+: Enhancing Speaker Voice Similarity Assessment Models with Representations from Speech Foundation Models | Chun Yin et.al. | 2406.08445 | null |
| 2024-06-12 | TokSing: Singing Voice Synthesis based on Discrete Tokens | Yuning Wu et.al. | 2406.08416 | null |
| 2024-06-12 | Neural Blind Source Separation and Diarization for Distant Speech Recognition | Yoshiaki Bando et.al. | 2406.08396 | null |
| 2024-06-12 | Towards Unsupervised Speech Recognition Without Pronunciation Models | Junrui Ni et.al. | 2406.08380 | null |
| 2024-06-12 | Speech Emotion Recognition with ASR Transcripts: A Comprehensive Study on Word Error Rate and Fusion Techniques | Yuanchao Li et.al. | 2406.08353 | link |
| 2024-06-12 | Refining Self-Supervised Learnt Speech Representation using Brain Activations | Hengyu Li et.al. | 2406.08266 | null |
| 2024-06-12 | Transformer-based Model for ASR N-Best Rescoring and Rewriting | Iwen E. Kang et.al. | 2406.08207 | null |
| 2024-06-12 | FreeV: Free Lunch For Vocoders Through Pseudo Inversed Mel Filter | Yuanjun Lv et.al. | 2406.08196 | null |
| 2024-06-12 | Audio-conditioned phonemic and prosodic annotation for building text-to-speech models from unlabeled speech data | Yuma Shirahata et.al. | 2406.08111 | null |
| 2024-06-12 | Can Large Language Models Understand Spatial Audio? | Changli Tang et.al. | 2406.07914 | null |
| 2024-06-11 | Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data? | Qingkai Fang et.al. | 2406.07289 | null |
| 2024-06-11 | Noise-Robust Voice Conversion by Conditional Denoising Training Using Latent Variables of Recording Quality and Environment | Takuto Igarashi et.al. | 2406.07280 | null |
| 2024-06-11 | AS-70: A Mandarin stuttered speech dataset for automatic speech recognition and stuttering event detection | Rong Gong et.al. | 2406.07256 | null |
| 2024-06-11 | SRC4VC: Smartphone-Recorded Corpus for Voice Conversion Benchmark | Yuki Saito et.al. | 2406.07254 | null |
| 2024-06-11 | CodecFake: Enhancing Anti-Spoofing Models Against Deepfake Audios from Codec-Based Speech Synthesis Systems | Haibin Wu et.al. | 2406.07237 | null |
| 2024-06-11 | MR-RawNet: Speaker verification system with multiple temporal resolutions for variable duration utterances using raw waveforms | Seung-bin Kim et.al. | 2406.07103 | link |
| 2024-06-11 | Fast Context-Biasing for CTC and Transducer ASR models with CTC-based Word Spotter | Andrei Andrusenko et.al. | 2406.07096 | null |
| 2024-06-11 | Spoken Language Corpora Augmentation with Domain-Specific Voice-Cloned Speech | Mateusz Czyżnikiewicz et.al. | 2406.07090 | null |
| 2024-06-11 | Reading Miscue Detection in Primary School through Automatic Speech Recognition | Lingyun Gao et.al. | 2406.07060 | null |
| 2024-06-10 | Synthetic Query Generation using Large Language Models for Virtual Assistants | Sonal Sannigrahi et.al. | 2406.06729 | null |
| 2024-06-10 | Meta Learning Text-to-Speech Synthesis in over 7000 Languages | Florian Lux et.al. | 2406.06403 | link |
| 2024-06-10 | A Parameter-efficient Language Extension Framework for Multilingual ASR | Wei Liu et.al. | 2406.06329 | null |
| 2024-06-10 | Quantifying the effect of speech pathology on automatic and human speaker verification | Bence Mark Halpern et.al. | 2406.06208 | null |
| 2024-06-10 | JenGAN: Stacked Shifted Filters in GAN-Based Speech Synthesis | Hyunjae Cho et.al. | 2406.06111 | null |
| 2024-06-10 | Prompting Large Language Models with Audio for General-Purpose Speech Summarization | Wonjune Kang et.al. | 2406.05968 | link |
| 2024-06-09 | Conserving Human Creativity with Evolutionary Generative Algorithms: A Case Study in Music Generation | Justin Kilb et.al. | 2406.05873 | null |
| 2024-06-09 | Source-Free Domain Adaptation for Speaker Verification in Data-Scarce Languages and Noisy Channels | Shlomo Salo Elia et.al. | 2406.05863 | null |
| 2024-06-09 | Do Prompts Really Prompt? Exploring the Prompt Understanding Capability of Whisper | Chih-Kai Yang et.al. | 2406.05806 | null |
| 2024-06-09 | Optimizing Multi-Stuttered Speech Classification: Leveraging Whisper’s Encoder for Efficient Parameter Reduction in Automated Assessment | Huma Ameer et.al. | 2406.05784 | null |
| 2024-06-09 | SPA-SVC: Self-supervised Pitch Augmentation for Singing Voice Conversion | Bingsong Bai et.al. | 2406.05692 | null |
| 2024-06-07 | The Database and Benchmark for Source Speaker Verification Against Voice Conversion | Ze Li et.al. | 2406.04951 | null |
| 2024-06-07 | LLM-based speaker diarization correction: A generalizable approach | Georgios Efstathiadis et.al. | 2406.04927 | null |
| 2024-06-07 | Speaker-Smoothed kNN Speaker Adaptation for End-to-End ASR | Shaojun Li et.al. | 2406.04791 | null |
| 2024-06-07 | Pitch-Aware RNN-T for Mandarin Chinese Mispronunciation Detection and Diagnosis | Xintong Wang et.al. | 2406.04595 | null |
| 2024-06-07 | Neural Codec-based Adversarial Sample Detection for Speaker Verification | Xuanjun Chen et.al. | 2406.04582 | null |
| 2024-06-06 | Flexible Multichannel Speech Enhancement for Noise-Robust Frontend | Ante Jukić et.al. | 2406.04552 | null |
| 2024-06-06 | Label-Synchronous Neural Transducer for E2E Simultaneous Speech Translation | Keqi Deng et.al. | 2406.04541 | null |
| 2024-06-06 | To Distill or Not to Distill? On the Robustness of Robust Knowledge Distillation | Abdul Waheed et.al. | 2406.04512 | null |
| 2024-06-06 | Towards Naturalistic Voice Conversion: NaturalVoices Dataset with an Automatic Processing Pipeline | Ali N. Salman et.al. | 2406.04494 | null |
| 2024-06-06 | Small-E: Small Language Model with Linear Attention for Efficient Speech Synthesis | Théodor Lemerle et.al. | 2406.04467 | null |
| 2024-06-06 | VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling | Zeyue Tian et.al. | 2406.04321 | link |
| 2024-06-06 | Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enhancement | Wangyou Zhang et.al. | 2406.04269 | null |
| 2024-06-06 | Hypernetworks for Personalizing ASR to Atypical Speech | Max Mueller-Eberstein et.al. | 2406.04240 | null |
| 2024-06-06 | Helsinki Speech Challenge 2024 | Martin Ludvigsen et.al. | 2406.04123 | null |
| 2024-06-06 | BLSP-Emo: Towards Empathetic Large Speech-Language Models | Chen Wang et.al. | 2406.03872 | link |
| 2024-06-06 | Improving Zero-Shot Chinese-English Code-Switching ASR with kNN-CTC and Gated Monolingual Datastores | Jiaming Zhou et.al. | 2406.03814 | null |
| 2024-06-06 | Speed of Light Exact Greedy Decoding for RNN-T Speech Recognition Models on GPU | Daniel Galvez et.al. | 2406.03791 | null |
| 2024-06-06 | Retrieval Augmented Generation in Prompt-based Text-to-Speech Synthesis with Context-Aware Contrastive Language-Audio Pretraining | Jinlong Xue et.al. | 2406.03714 | null |
| 2024-06-06 | Improving Audio Codec-based Zero-Shot Text-to-Speech Synthesis with Multi-Modal Context and Large Language Model | Jinlong Xue et.al. | 2406.03706 | null |
| 2024-06-05 | Style Mixture of Experts for Expressive Text-To-Speech Synthesis | Ahad Jawaid et.al. | 2406.03637 | null |
| 2024-06-05 | Enhancing CTC-based speech recognition with diverse modeling units | Shiyi Han et.al. | 2406.03274 | null |
| 2024-06-05 | Error-preserving Automatic Speech Recognition of Young English Learners’ Language | Janick Michot et.al. | 2406.03235 | link |
| 2024-06-05 | StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning | Shaolei Zhang et.al. | 2406.03049 | link |
| 2024-06-05 | 4D ASR: Joint Beam Search Integrating CTC, Attention, Transducer, and Mask Predict Decoders | Yui Sudo et.al. | 2406.02950 | null |
| 2024-06-05 | SYN2REAL: Leveraging Task Arithmetic for Mitigating Synthetic-Real Discrepancies in ASR Domain Adaptation | Hsuan Su et.al. | 2406.02925 | null |
| 2024-06-05 | Text Injection for Neural Contextual Biasing | Zhong Meng et.al. | 2406.02921 | null |
| 2024-06-04 | Keyword-Guided Adaptation of Automatic Speech Recognition | Aviv Shamsian et.al. | 2406.02649 | null |
| 2024-06-04 | Self-Supervised Singing Voice Pre-Training towards Speech-to-Singing Conversion | Ruiqi Li et.al. | 2406.02429 | null |
| 2024-06-04 | An Independence-promoting Loss for Music Generation with Language Models | Jean-Marie Lemercier et.al. | 2406.02315 | null |
| 2024-06-04 | Towards Supervised Performance on Speaker Verification with Self-Supervised Learning by Leveraging Large-Scale ASR Models | Victor Miara et.al. | 2406.02285 | null |
| 2024-06-04 | ERes2NetV2: Boosting Short-Duration Speaker Verification Performance with Computational Efficiency | Yafeng Chen et.al. | 2406.02167 | null |
| 2024-06-04 | Whistle: Data-Efficient Multilingual and Crosslingual Speech Recognition via Weakly Phonetic Supervision | Saierdaer Yusuyin et.al. | 2406.02166 | link |
| 2024-06-04 | Phonetic Enhanced Language Modeling for Text-to-Speech Synthesis | Kun Zhou et.al. | 2406.02009 | null |
| 2024-06-04 | Efficiently Train ASR Models that Memorize Less and Perform Better with Per-core Clipping | Lun Wang et.al. | 2406.02004 | null |
| 2024-06-03 | TinySV: Speaker Verification in TinyML with On-device Learning | Massimo Pavan et.al. | 2406.01655 | null |
| 2024-06-03 | Enabling ASR for Low-Resource Languages: A Comprehensive Dataset Creation Approach | Ara Yeroyan et.al. | 2406.01446 | null |
| 2024-06-03 | Compute-Efficient Medical Image Classification with Softmax-Free Transformers and Sequence Normalization | Firas Khader et.al. | 2406.01314 | null |
| 2024-05-31 | Very Low Complexity Speech Synthesis Using Framewise Autoregressive GAN (FARGAN) with Pitch Prediction | Jean-Marc Valin et.al. | 2405.21069 | null |
| 2024-05-30 | DITTO-2: Distilled Diffusion Inference-Time T-Optimization for Music Generation | Zachary Novack et.al. | 2405.20289 | null |
| 2024-05-30 | Spectral Mapping of Singing Voices: U-Net-Assisted Vocal Segmentation | Adam Sorrenti et.al. | 2405.20059 | link |
| 2024-05-30 | Explainable Attribute-Based Speaker Verification | Xiaoliang Wu et.al. | 2405.19796 | null |
| 2024-05-31 | Zipper: A Multi-Tower Decoder Architecture for Fusing Modalities | Vicky Zayats et.al. | 2405.18669 | null |
| 2024-05-28 | Augmented Conversation with Embedded Speech-Driven On-the-Fly Referencing in AR | Shivesh Jadon et.al. | 2405.18537 | null |
| 2024-05-28 | Intelligent Clinical Documentation: Harnessing Generative AI for Patient-Centric Clinical Note Generation | Anjanava Biswas et.al. | 2405.18346 | null |
| 2024-05-28 | NUTS, NARS, and Speech | D. van der Sluis et.al. | 2405.17874 | null |
| 2024-05-28 | TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation | Chenyang Le et.al. | 2405.17809 | null |
| 2024-05-27 | Federating Dynamic Models using Early-Exit Architectures for Automatic Speech Recognition on Heterogeneous Clients | Mohamed Nabih Ali et.al. | 2405.17376 | null |
| 2024-05-27 | “Pass the butter”: A study on desktop-classic multitasking robotic arm based on advanced YOLOv7 and BERT | Haohua Que et.al. | 2405.17250 | null |
| 2024-05-27 | RSET: Remapping-based Sorting Method for Emotion Transfer Speech Synthesis | Haoxiang Shi et.al. | 2405.17028 | null |
| 2024-05-27 | A Variance-Preserving Interpolation Approach for Diffusion Models with Applications to Single Channel Speech Enhancement and Recognition | Zilu Guo et.al. | 2405.16952 | null |
| 2024-05-24 | Quality-aware Masked Diffusion Transformer for Enhanced Music Generation | Chang Li et.al. | 2405.15863 | null |
| 2024-05-27 | HiddenSpeaker: Generate Imperceptible Unlearnable Audios for Speaker Verification System | Zhisheng Zhang et.al. | 2405.15655 | null |
| 2024-05-24 | Denoising LM: Pushing the Limits of Error Correction Models for Speech Recognition | Zijin Gu et.al. | 2405.15216 | null |
| 2024-05-23 | Contrastive and Consistency Learning for Neural Noisy-Channel Model in Spoken Language Understanding | Suyoung Kim et.al. | 2405.15097 | null |
| 2024-05-23 | Real-Time and Accurate: Zero-shot High-Fidelity Singing Voice Conversion with Multi-Condition Flow Synthesis | Hui Li et.al. | 2405.15093 | null |
| 2024-05-23 | Reinforcement Learning for Fine-tuning Text-to-speech Diffusion Models | Jingyi Chen et.al. | 2405.14632 | null |
| 2024-05-23 | Let’s Fuse Step by Step: A Generative Fusion Decoding Algorithm with LLMs for Multi-modal Text Recognition | Chan-Jan Hsu et.al. | 2405.14259 | null |
| 2024-05-23 | Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models | Yuchen Hu et.al. | 2405.14161 | null |
| 2024-05-23 | A Survey on Vision-Language-Action Models for Embodied AI | Yueen Ma et.al. | 2405.14093 | null |
| 2024-05-22 | ST-Gait++: Leveraging spatio-temporal convolutions for gait-based emotion recognition on videos | Maria Luísa Lima et.al. | 2405.13903 | null |
| 2024-05-22 | Joint Optimization of Streaming and Non-Streaming Automatic Speech Recognition with Multi-Decoder and Knowledge Distillation | Muhammad Shakeel et.al. | 2405.13514 | null |
| 2024-05-22 | A Near-Real-Time Processing Ego Speech Filtering Pipeline Designed for Speech Interruption During Human-Robot Interaction | Yue Li et.al. | 2405.13477 | null |
| 2024-05-22 | You don’t understand me!: Comparing ASR results for L1 and L2 speakers of Swedish | Ronald Cumbal et.al. | 2405.13379 | null |
| 2024-05-22 | Contextualized Automatic Speech Recognition with Dynamic Vocabulary | Yui Sudo et.al. | 2405.13344 | null |
| 2024-05-21 | FairLENS: Assessing Fairness in Law Enforcement Speech Recognition | Yicheng Wang et.al. | 2405.13166 | null |
| 2024-05-21 | Could a Computer Architect Understand our Brain? | Valentin Puente-Varona et.al. | 2405.12815 | null |
| 2024-05-21 | SYMPLEX: Controllable Symbolic Music Generation using Simplex Diffusion with Vocabulary Priors | Nicolas Jonason et.al. | 2405.12666 | null |
| 2024-05-21 | Mamba in Speech: Towards an Alternative to Self-Attention | Xiangyu Zhang et.al. | 2405.12609 | null |
| 2024-05-20 | Neighborhood Attention Transformer with Progressive Channel Fusion for Speaker Verification | Nian Li et.al. | 2405.12031 | null |
| 2024-05-20 | Continuous Sign Language Recognition with Adapted Conformer via Unsupervised Pretraining | Neena Aloysius et.al. | 2405.12018 | null |
| 2024-05-20 | Diff-BGM: A Diffusion Model for Video Background Music Generation | Sizhe Li et.al. | 2405.11913 | null |
| 2024-05-20 | SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model | Siavash Shams et.al. | 2405.11831 | link |
| 2024-05-17 | Acoustic modeling for Overlapping Speech Recognition: JHU Chime-5 Challenge System | Vimal Manohar et.al. | 2405.11078 | null |
| 2024-05-17 | Distinctive and Natural Speaker Anonymization via Singular Value Transformation-assisted Matrix | Jixun Yao et.al. | 2405.10786 | null |
| 2024-05-16 | Speaker Verification in Agent-Generated Conversations | Yizhe Yang et.al. | 2405.10150 | null |
| 2024-05-16 | Listen Again and Choose the Right Answer: A New Paradigm for Automatic Speech Recognition with Large Language Models | Yuchen Hu et.al. | 2405.10025 | null |
| 2024-05-16 | Whole-Song Hierarchical Generation of Symbolic Music Using Cascaded Diffusion Models | Ziyu Wang et.al. | 2405.09901 | link |
| 2024-05-16 | Evaluating Text-to-Speech Synthesis from a Large Discrete Token-based Speech Language Model | Siyang Wang et.al. | 2405.09768 | null |
| 2024-05-15 | No More Mumbles: Enhancing Robot Intelligibility through Speech Adaptation | Qiaoqiao Ren et.al. | 2405.09708 | link |
| 2024-05-15 | Towards Evaluating the Robustness of Automatic Speech Recognition Systems via Audio Style Transfer | Weifei Jin et.al. | 2405.09470 | null |
| 2024-05-15 | Hierarchical Emotion Prediction and Control in Text-to-Speech Synthesis | Sho Inoue et.al. | 2405.09171 | null |
| 2024-05-15 | Speaker Embeddings With Weakly Supervised Voice Activity Detection For Efficient Speaker Diarization | Jenthe Thienpondt et.al. | 2405.09142 | null |
| 2024-05-14 | Investigating the ‘Autoencoder Behavior’ in Speech Self-Supervised Models: a focus on HuBERT’s Pretraining | Valentin Vielzeuf et.al. | 2405.08402 | null |
| 2024-05-14 | SpeechVerse: A Large-scale Generalizable Audio Language Model | Nilaksh Das et.al. | 2405.08295 | null |
| 2024-05-13 | Rene: A Pre-trained Multi-modal Architecture for Auscultation of Respiratory Diseases | Pengfei Zhang et.al. | 2405.07442 | null |
| 2024-05-12 | SoccerNet-Echoes: A Soccer Game Audio Commentary Dataset | Sushant Gautam et.al. | 2405.07354 | link |
| 2024-05-11 | Towards an Accessible and Rapidly Trainable Rhythm Sequencer Using a Generative Stacked Autoencoder | Alex Wastnidge et.al. | 2405.07034 | null |
| 2024-05-11 | A framework of text-dependent speaker verification for chinese numerical string corpus | Litong Zheng et.al. | 2405.07029 | null |
| 2024-05-10 | DP-DyLoRA: Fine-Tuning Transformer-Based Models On-Device under Differentially Private Federated Learning using Dynamic Low-Rank Adaptation | Jie Xu et.al. | 2405.06368 | null |
| 2024-05-10 | Lost in Transcription: Identifying and Quantifying the Accuracy Biases of Automatic Speech Recognition Systems Against Disfluent Speech | Dena Mujtaba et.al. | 2405.06150 | null |
| 2024-05-09 | Muting Whisper: A Universal Acoustic Adversarial Attack on Speech Foundation Models | Vyas Raina et.al. | 2405.06134 | link |
| 2024-05-09 | The RoyalFlush Automatic Speech Diarization and Recognition System for In-Car Multi-Channel Automatic Speech Recognition Challenge | Jingguang Tian et.al. | 2405.05498 | null |
| 2024-05-07 | Open Implementation and Study of BEST-RQ for Speech Processing | Ryan Whetten et.al. | 2405.04296 | link |
| 2024-05-07 | Speaker Characterization by means of Attention Pooling | Federico Costa et.al. | 2405.04096 | null |
| 2024-05-06 | Whispy: Adapting STT Whisper Models to Real-Time Environments | Antonio Bevilacqua et.al. | 2405.03484 | null |
| 2024-05-06 | MMGER: Multi-modal and Multi-granularity Generative Error Correction with LLM for Joint Accent and Speech Recognition | Bingshen Mu et.al. | 2405.03152 | null |
| 2024-05-06 | Determined Multichannel Blind Source Separation with Clustered Source Model | Jianyu Wang et.al. | 2405.03118 | null |
| 2024-05-11 | Analysis about Theoretical Foundations for Method to Enhancing ASR Performance using OCR Word Frequency Differences | Kyudan Jung et.al. | 2405.02995 | null |
| 2024-05-07 | Mozart’s Touch: A Lightweight Multi-modal Music Generation Framework Based on Pre-Trained Large Models | Tianze Xu et.al. | 2405.02801 | link |
| 2024-05-04 | Mixat: A Data Set of Bilingual Emirati-English Speech | Maryam Al Ali et.al. | 2405.02578 | link |
| 2024-05-06 | Training-Free Deepfake Voice Recognition by Leveraging Large-Scale Pre-Trained Models | Alessandro Pianese et.al. | 2405.02179 | null |
| 2024-05-06 | Unveiling the Potential of LLM-Based ASR on Chinese Open-Source Datasets | Xuelong Geng et.al. | 2405.02132 | null |
| 2024-05-02 | Converting Anyone’s Voice: End-to-End Expressive Voice Conversion with a Conditional Diffusion Model | Zongyang Du et.al. | 2405.01730 | null |
| 2024-05-01 | Efficient Sample-Specific Encoder Perturbations | Yassir Fathullah et.al. | 2405.01601 | null |
| 2024-05-02 | Low-resource speech recognition and dialect identification of Irish in a multi-task framework | Liam Lonergan et.al. | 2405.01293 | null |
| 2024-05-02 | Improving Membership Inference in ASR Model Auditing with Perturbed Loss Features | Francisco Teixeira et.al. | 2405.01207 | null |
| 2024-05-02 | Deep Learning Models in Speech Recognition: Measuring GPU Energy Consumption, Impact of Noise and Model Quantization for Edge Deployment | Aditya Chakravarty et.al. | 2405.01004 | link |
| 2024-05-02 | Efficient Compression of Multitask Multilingual Speech Models | Thomas Palmeira Ferraz et.al. | 2405.00966 | null |
| 2024-05-02 | MAIN-VC: Lightweight Speech Representation Disentanglement for One-shot Voice Conversion | Pengcheng Li et.al. | 2405.00930 | null |
| 2024-05-01 | Learning Expressive Disentangled Speech Representations with Soft Speech Units and Adversarial Style Augmentation | Yimin Deng et.al. | 2405.00603 | null |
| 2024-05-01 | Active Learning with Task Adaptation Pre-training for Speech Emotion Recognition | Dongyuan Li et.al. | 2405.00307 | link |
| 2024-04-30 | Who is Authentic Speaker | Qiang Huang et.al. | 2405.00248 | null |
| 2024-04-30 | ConFides: A Visual Analytics Solution for Automated Speech Recognition Analysis and Exploration | Sunwoo Ha et.al. | 2405.00223 | null |
| 2024-04-30 | Expressivity and Speech Synthesis | Andreas Triantafyllopoulos et.al. | 2404.19363 | null |
| 2024-04-30 | Does Whisper understand Swiss German? An automatic, qualitative, and human evaluation | Eyal Liron Dolev et.al. | 2404.19310 | null |
| 2024-04-30 | EfficientASR: Speech Recognition Network Compression via Attention Redundancy and Chunk-Level FFN Optimization | Jianzong Wang et.al. | 2404.19214 | null |
| 2024-04-30 | EAD-VC: Enhancing Speech Auto-Disentanglement for Voice Conversion with IFUB Estimator and Joint Text-Guided Consistent Learning | Ziqi Liang et.al. | 2404.19212 | null |
| 2024-04-29 | Towards Dog Bark Decoding: Leveraging Human Speech Processing for Automated Bark Classification | Artem Abzaliev et.al. | 2404.18739 | null |
| 2024-04-29 | MM-TTS: A Unified Framework for Multimodal, Prompt-Induced Emotional Text-to-Speech Synthesis | Xiang Li et.al. | 2404.18398 | link |
| 2024-04-30 | ComposerX: Multi-Agent Symbolic Music Composition with LLMs | Qixin Deng et.al. | 2404.18081 | link |
| 2024-04-27 | A Comparison of Differential Performance Metrics for the Evaluation of Automatic Speaker Verification Fairness | Oubaida Chouchane et.al. | 2404.17810 | null |
| 2024-04-26 | An RFP dataset for Real, Fake, and Partially fake audio detection | Abdulazeez AlAli et.al. | 2404.17721 | null |
| 2024-04-26 | A Semi-Automatic Approach to Create Large Gender- and Age-Balanced Speaker Corpora: Usefulness of Speaker Diarization & Identification | Rémi Uro et.al. | 2404.17552 | null |
| 2024-04-26 | Child Speech Recognition in Human-Robot Interaction: Problem Solved? | Ruben Janssens et.al. | 2404.17394 | null |
| 2024-04-26 | Device Feature based on Graph Fourier Transformation with Logarithmic Processing For Detection of Replay Speech Attacks | Mingrui He et.al. | 2404.17280 | null |
| 2024-04-29 | COCOLA: Coherence-Oriented Contrastive Learning of Musical Audio Representations | Ruben Ciranni et.al. | 2404.16969 | null |
| 2024-04-26 | Automatic Speech Recognition System-Independent Word Error Rate Estimation | Chanho Park et.al. | 2404.16743 | null |
| 2024-04-25 | Developing Acoustic Models for Automatic Speech Recognition in Swedish | Giampiero Salvi et.al. | 2404.16547 | null |
| 2024-04-25 | U2++ MoE: Scaling 4.7x parameters with minimal impact on RTF | Xingchen Song et.al. | 2404.16407 | null |
| 2024-04-24 | Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modelling: Methods, Applications, and Challenges | Badri Narayana Patro et.al. | 2404.16112 | link |
| 2024-04-24 | Efficient Multi-Model Fusion with Adversarial Complementary Representation Learning | Zuheng Kang et.al. | 2404.15704 | null |
| 2024-04-24 | HybridVC: Efficient Voice Style Conversion with Text and Audio Prompts | Xinlei Niu et.al. | 2404.15637 | null |
| 2024-04-23 | Killkan: The Automatic Speech Recognition Dataset for Kichwa with Morphosyntactic Information | Chihiro Taguchi et.al. | 2404.15501 | link |
| 2024-04-23 | Additive Margin in Contrastive Self-Supervised Frameworks to Learn Discriminative Speaker Representations | Theo Lepage et.al. | 2404.14913 | null |
| 2024-04-23 | Rethinking Processing Distortions: Disentangling the Impact of Speech Enhancement Errors on Speech Recognition Performance | Tsubasa Ochiai et.al. | 2404.14860 | null |
| 2024-04-25 | FlashSpeech: Efficient Zero-Shot Speech Synthesis | Zhen Ye et.al. | 2404.14700 | null |
| 2024-04-22 | Assessment of Sign Language-Based versus Touch-Based Input for Deaf Users Interacting with Intelligent Personal Assistants | Nina Tran et.al. | 2404.14605 | null |
| 2024-04-22 | Exploring neural oscillations during speech perception via surrogate gradient spiking neural networks | Alexandre Bittar et.al. | 2404.14024 | null |
| 2024-04-23 | Retrieval-Augmented Audio Deepfake Detection | Zuheng Kang et.al. | 2404.13892 | null |
| 2024-04-23 | Parameter Efficient Fine Tuning: A Comprehensive Analysis Across Applications | Charith Chandra Sai Balne et.al. | 2404.13506 | null |
| 2024-04-20 | Text-dependent Speaker Verification (TdSV) Challenge 2024: Challenge Evaluation Plan | Zeinali Hossein et.al. | 2404.13428 | null |
| 2024-04-20 | Semantically Corrected Amharic Automatic Speech Recognition | Samuael Adnew et.al. | 2404.13362 | link |
| 2024-04-20 | Music Consistency Models | Zhengcong Fei et.al. | 2404.13358 | null |
| 2024-04-20 | Track Role Prediction of Single-Instrumental Sequences | Changheon Han et.al. | 2404.13286 | null |
| 2024-04-19 | Learn2Talk: 3D Talking Face Learns from 2D Talking Face | Yixiang Zhuang et.al. | 2404.12888 | null |
| 2024-04-19 | Efficient infusion of self-supervised representations in Automatic Speech Recognition | Darshan Prabhu et.al. | 2404.12628 | null |
| 2024-04-18 | TIMIT Speaker Profiling: A Comparison of Multi-task learning and Single-task learning Approaches | Rong Wang et.al. | 2404.12077 | null |
| 2024-04-18 | Large Language Models: From Notes to Musical Form | Lilac Atassi et.al. | 2404.11976 | null |
| 2024-04-17 | Jointly Recognizing Speech and Singing Voices Based on Multi-Task Audio Source Separation | Ye Bai et.al. | 2404.11275 | null |
| 2024-04-16 | Teaching a Multilingual Large Language Model to Understand Multilingual Speech via Multi-Instructional Training | Pavel Denisov et.al. | 2404.10922 | link |
| 2024-04-16 | Long-form music generation with latent diffusion | Zach Evans et.al. | 2404.10301 | null |
| 2024-04-16 | Anatomy of Industrial Scale Multilingual ASR | Francis McCann Ramirez et.al. | 2404.09841 | null |
| 2024-04-15 | Resilience of Large Language Models for Noisy Instructions | Bin Wang et.al. | 2404.09754 | null |
| 2024-04-16 | Text-to-Song: Towards Controllable Music Generation Incorporating Vocals and Accompaniment | Zhiqing Hong et.al. | 2404.09313 | null |
| 2024-04-12 | Comparing Apples to Oranges: LLM-powered Multimodal Intention Prediction in an Object Categorization Task | Hassan Ali et.al. | 2404.08424 | null |
| 2024-04-12 | ASR advancements for indigenous languages: Quechua, Guarani, Bribri, Kotiria, and Wa’ikhana | Monica Romero et.al. | 2404.08368 | null |
| 2024-04-10 | An inclusive review on deep learning techniques and their scope in handwriting recognition | Sukhdeep Singh et.al. | 2404.08011 | null |
| 2024-04-12 | An Effective Automated Speaking Assessment Approach to Mitigating Data Scarcity and Imbalanced Distribution | Tien-Hong Lo et.al. | 2404.07575 | null |
| 2024-04-12 | Conformer-1: Robust ASR via Large-Scale Semisupervised Bootstrapping | Kevin Zhang et.al. | 2404.07341 | null |
| 2024-04-12 | Llama-VITS: Enhancing TTS Synthesis with Semantic Awareness | Xincan Feng et.al. | 2404.06714 | link |
| 2024-04-10 | MuPT: A Generative Symbolic Music Pretrained Transformer | Xingwei Qu et.al. | 2404.06393 | null |
| 2024-04-10 | The X-LANCE Technical Report for Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge | Yiwei Guo et.al. | 2404.06079 | null |
| 2024-04-06 | A Novel Bi-LSTM And Transformer Architecture For Generating Tabla Music | Roopa Mayya et.al. | 2404.05765 | null |
| 2024-04-08 | VietMed: A Dataset and Benchmark for Automatic Speech Recognition of Vietnamese in the Medical Domain | Khai Le-Duc et.al. | 2404.05659 | link |
| 2024-04-07 | Gull: A Generative Multifunctional Audio Codec | Yi Luo et.al. | 2404.04947 | null |
| 2024-04-07 | Safeguarding Voice Privacy: Harnessing Near-Ultrasonic Interference To Protect Against Unauthorized Audio Recording | Forrest McKee et.al. | 2404.04769 | null |
| 2024-04-06 | HyperTTS: Parameter Efficient Adaptation in Text to Speech using Hypernetworks | Yingting Li et.al. | 2404.04645 | link |
| 2024-04-05 | The NES Video-Music Database: A Dataset of Symbolic Video Game Music Paired with Gameplay Videos | Igor Cardoso et.al. | 2404.04420 | null |
| 2024-04-04 | Transducers with Pronunciation-aware Embeddings for Automatic Speech Recognition | Hainan Xu et.al. | 2404.04295 | null |
| 2024-04-05 | Open vocabulary keyword spotting through transfer learning from speech synthesis | Kesavaraj V et.al. | 2404.03914 | null |
| 2024-04-06 | RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis | Detai Xin et.al. | 2404.03204 | null |
| 2024-04-03 | Mai Ho’omāuna i ka ‘Ai: Language Models Improve Automatic Speech Recognition in Hawaiian | Kaavya Chaparala et.al. | 2404.03073 | null |
| 2024-04-03 | PromptCodec: High-Fidelity Neural Speech Codec using Disentangled Representation Learning based Adaptive Feature-aware Prompt Encoders | Yu Pan et.al. | 2404.02702 | null |
| 2024-04-03 | Leveraging the Interplay Between Syntactic and Acoustic Cues for Optimizing Korean TTS Pause Formation | Yejin Jeon et.al. | 2404.02592 | null |
| 2024-04-03 | CMULAB: An Open-Source Framework for Training and Deployment of Natural Language Processing Models | Zaid Sheikh et.al. | 2404.02408 | link |
| 2024-04-02 | BRAVEn: Improving Self-Supervised Pre-training for Visual and Auditory Speech Recognition | Alexandros Haliassos et.al. | 2404.02098 | link |
| 2024-04-02 | Noise Masking Attacks and Defenses for Pretrained Speech Models | Matthew Jagielski et.al. | 2404.02052 | null |
| 2024-04-02 | Kallaama: A Transcribed Speech Dataset about Agriculture in the Three Most Widely Spoken Languages in Senegal | Elodie Gauthier et.al. | 2404.01991 | link |
| 2024-04-05 | Zero-Shot Multi-Lingual Speaker Verification in Clinical Trials | Ali Akram et.al. | 2404.01981 | null |
| 2024-04-02 | Transfer Learning from Whisper for Microscopic Intelligibility Prediction | Paul Best et.al. | 2404.01737 | null |
| 2024-03-31 | Humane Speech Synthesis through Zero-Shot Emotion and Disfluency Generation | Rohan Chaudhury et.al. | 2404.01339 | link |
| 2024-04-01 | KazEmoTTS: A Dataset for Kazakh Emotional Text-to-Speech Synthesis | Adal Abilbekov et.al. | 2404.01033 | link |
| 2024-04-01 | Voice Conversion Augmentation for Speaker Recognition on Defective Datasets | Ruijie Tao et.al. | 2404.00863 | null |
| 2024-04-01 | Removing Speaker Information from Speech Representation using Variable-Length Soft Pooling | Injune Hwang et.al. | 2404.00856 | null |
| 2024-03-31 | CM-TTS: Enhancing Real Time Text-to-Speech Synthesis Efficiency through Weighted Samplers and Consistency Models | Xiang Li et.al. | 2404.00569 | link |
| 2024-03-29 | ELITR-Bench: A Meeting Assistant Benchmark for Long-Context Language Models | Thibaut Thonet et.al. | 2403.20262 | null |
| 2024-03-29 | 3D-Speaker-Toolkit: An Open Source Toolkit for Multi-modal Speaker Verification and Diarization | Yafeng Chen et.al. | 2403.19971 | link |
| 2024-03-28 | Multi-Stage Multi-Modal Pre-Training for Automatic Speech Recognition | Yash Jain et.al. | 2403.19822 | null |
| 2024-03-28 | Asymmetric and trial-dependent modeling: the contribution of LIA to SdSV Challenge Task 2 | Pierre-Michel Bousquet et.al. | 2403.19634 | null |
| 2024-03-28 | Emotion Neural Transducer for Fine-Grained Speech Emotion Recognition | Siyuan Shen et.al. | 2403.19224 | link |
| 2024-03-28 | LV-CTC: Non-autoregressive ASR with CTC and latent variable models | Yuya Fujita et.al. | 2403.19207 | null |
| 2024-03-27 | PhysicsAssistant: An LLM-Powered Interactive Learning Robot for Physics Lab Investigations | Ehsan Latif et.al. | 2403.18721 | null |
| 2024-03-27 | ZAEBUC-Spoken: A Multilingual Multidialectal Arabic-English Speech Corpus | Injy Hamed et.al. | 2403.18182 | null |
| 2024-03-28 | DANCER: Entity Description Augmented Named Entity Corrector for Automatic Speech Recognition | Yi-Cheng Wang et.al. | 2403.17645 | null |
| 2024-03-26 | Extracting Biomedical Entities from Noisy Audio Transcripts | Nima Ebadi et.al. | 2403.17363 | null |
| 2024-03-25 | Grammatical vs Spelling Error Correction: An Investigation into the Responsiveness of Transformer-based Language Models using BART and MarianMT | Rohit Raju et.al. | 2403.16655 | null |
| 2024-03-25 | Training Generative Adversarial Network-Based Vocoder with Limited Data Using Augmentation-Conditional Discriminator | Takuhiro Kaneko et.al. | 2403.16464 | null |
| 2024-03-22 | Privacy-Preserving End-to-End Spoken Language Understanding | Yinggui Wang et.al. | 2403.15510 | null |
| 2024-03-26 | A Multimodal Approach to Device-Directed Speech Detection with Large Language Models | Dominik Wagner et.al. | 2403.14438 | null |
| 2024-03-21 | XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception | HyoJung Han et.al. | 2403.14402 | null |
| 2024-03-21 | M$^3$AV: A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset | Zhe Chen et.al. | 2403.14168 | null |
| 2024-03-21 | The NeurIPS 2023 Machine Learning for Audio Workshop: Affective Audio Benchmarks and Novel Data | Alice Baird et.al. | 2403.14048 | null |
| 2024-03-20 | Open Access NAO (OAN): a ROS2-based software framework for HRI applications with the NAO robot | Antonio Bono et.al. | 2403.13960 | null |
| 2024-03-20 | BanglaNum – A Public Dataset for Bengali Digit Recognition from Speech | Mir Sayeed Mohammad et.al. | 2403.13465 | null |
| 2024-03-20 | Advanced Long-Content Speech Recognition With Factorized Neural Transducer | Xun Gong et.al. | 2403.13423 | null |
| 2024-03-20 | KunquDB: An Attempt for Speaker Verification in the Chinese Opera Scenario | Huali Zhou et.al. | 2403.13356 | link |
| 2024-03-20 | Building speech corpus with diverse voice characteristics for its prompt-based representation | Aya Watanabe et.al. | 2403.13353 | null |
| 2024-03-20 | Polaris: A Safety-focused LLM Constellation Architecture for Healthcare | Subhabrata Mukherjee et.al. | 2403.13313 | null |
| 2024-03-19 | FlowerFormer: Empowering Neural Architecture Encoding using a Flow-aware Graph Transformer | Dongyeong Hwang et.al. | 2403.12821 | link |
| 2024-03-19 | Real-time Speech Extraction Using Spatially Regularized Independent Low-rank Matrix Analysis and Rank-constrained Spatial Covariance Matrix Estimation | Yuto Ishikawa et.al. | 2403.12477 | null |
| 2024-03-19 | An Empirical Study of Speech Language Models for Prompt-Conditioned Speech Synthesis | Yifan Peng et.al. | 2403.12402 | null |
| 2024-03-18 | Multimodal Human-Autonomous Agents Interaction Using Pre-Trained Language and Visual Foundation Models | Linus Nwankwo et.al. | 2403.12273 | null |
| 2024-03-18 | Generalized Multi-Source Inference for Text Conditioned Music Diffusion Models | Emilian Postolache et.al. | 2403.11706 | link |
| 2024-03-18 | QEAN: Quaternion-Enhanced Attention Network for Visual Dance Generation | Zhizhen Zhou et.al. | 2403.11626 | null |
| 2024-03-18 | AdaMER-CTC: Connectionist Temporal Classification with Adaptive Maximum Entropy Regularization for Automatic Speech Recognition | SooHwan Eom et.al. | 2403.11578 | null |
| 2024-03-16 | Energy-Based Models with Applications to Speech and Language Processing | Zhijian Ou et.al. | 2403.10961 | null |
| 2024-03-16 | Initial Decoding with Minimally Augmented Language Model for Improved Lattice Rescoring in Low Resource ASR | Savitha Murthy et.al. | 2403.10937 | null |
| 2024-03-15 | MusicHiFi: Fast High-Fidelity Stereo Vocoding | Ge Zhu et.al. | 2403.10493 | null |
| 2024-03-15 | Neural Networks Hear You Loud And Clear: Hearing Loss Compensation Using Deep Neural Networks | Peter Leer et.al. | 2403.10420 | null |
| 2024-03-14 | SpokeN-100: A Cross-Lingual Benchmarking Dataset for The Classification of Spoken Numbers in Different Languages | René Groh et.al. | 2403.09753 | link |
| 2024-03-14 | More than words: Advancements and challenges in speech recognition for singing | Anna Kruspe et.al. | 2403.09298 | null |
| 2024-03-13 | Skipformer: A Skip-and-Recover Strategy for Efficient Speech Recognition | Wenjing Zhu et.al. | 2403.08258 | null |
| 2024-03-13 | SpeechColab Leaderboard: An Open-Source Platform for Automatic Speech Recognition Evaluation | Jiayu Du et.al. | 2403.08196 | link |
| 2024-03-13 | Automatic Speech Recognition (ASR) for the Diagnosis of pronunciation of Speech Sound Disorders in Korean children | Taekyung Ahn et.al. | 2403.08187 | null |
| 2024-03-13 | EM-TTS: Efficiently Trained Low-Resource Mongolian Lightweight Text-to-Speech | Ziqi Liang et.al. | 2403.08164 | null |
| 2024-03-12 | Gujarati-English Code-Switching Speech Recognition using ensemble prediction of spoken language | Yash Sharma et.al. | 2403.08011 | null |
| 2024-03-12 | Motifs, Phrases, and Beyond: The Modelling of Structure in Symbolic Music Generation | Keshav Bhandari et.al. | 2403.07995 | null |
| 2024-03-11 | The evaluation of a code-switched Sepedi-English automatic speech recognition system | Amanda Phaladi et.al. | 2403.07947 | null |
| 2024-03-12 | Beyond the Labels: Unveiling Text-Dependency in Paralinguistic Speech Recognition Datasets | Jan Pešán et.al. | 2403.07767 | null |
| 2024-03-11 | Real-Time Multimodal Cognitive Assistant for Emergency Medical Services | Keshara Weerasinghe et.al. | 2403.06734 | null |
| 2024-03-11 | Towards Decoupling Frontend Enhancement and Backend Recognition in Monaural Robust ASR | Yufeng Yang et.al. | 2403.06387 | null |
| 2024-03-10 | SCORE: Self-supervised Correspondence Fine-tuning for Improved Content Representations | Amit Meghanani et.al. | 2403.06260 | null |
| 2024-03-09 | HAM-TTS: Hierarchical Acoustic Modeling for Token-Based Zero-Shot Text-to-Speech with Model and Data Scaling | Chunhui Wang et.al. | 2403.05989 | null |
| 2024-03-09 | Aligning Speech to Languages to Enhance Code-switching Speech Recognition | Hexin Liu et.al. | 2403.05887 | null |
| 2024-03-07 | Classist Tools: Social Class Correlates with Performance in NLP | Amanda Cercas Curry et.al. | 2403.04445 | null |
| 2024-03-07 | A New Benchmark for Evaluating Automatic Speech Recognition in the Arabic Call Domain | Qusai Abo Obaidah et.al. | 2403.04280 | null |
| 2024-03-07 | A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition | Yusheng Dai et.al. | 2403.04245 | link |
| 2024-03-06 | RADIA – Radio Advertisement Detection with Intelligent Analytics | Jorge Álvarez et.al. | 2403.03538 | null |
| 2024-03-06 | Non-verbal information in spontaneous speech – towards a new framework of analysis | Tirza Biron et.al. | 2403.03522 | null |
| 2024-03-05 | NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models | Zeqian Ju et.al. | 2403.03100 | link |
| 2024-03-05 | AIx Speed: Playback Speed Optimization Using Listening Comprehension of Speech Recognition Models | Kazuki Kawamura et.al. | 2403.02938 | null |
| 2024-03-05 | Single-Channel Robot Ego-Speech Filtering during Human-Robot Interaction | Yue Li et.al. | 2403.02918 | null |
| 2024-03-04 | PixIT: Joint Training of Speaker Diarization and Speech Separation from Real-world Multi-speaker Recordings | Joonas Kalda et.al. | 2403.02288 | link |
| 2024-03-04 | What has LeBenchmark Learnt about French Syntax? | Zdravko Dugonjić et.al. | 2403.02173 | null |
| 2024-03-04 | SA-SOT: Speaker-Aware Serialized Output Training for Multi-Talker ASR | Zhiyun Fan et.al. | 2403.02010 | null |
| 2024-03-04 | Language and Speech Technology for Central Kurdish Varieties | Sina Ahmadi et.al. | 2403.01983 | link |
| 2024-03-03 | PAVITS: Exploring Prosody-aware VITS for End-to-End Emotional Voice Conversion | Tianhua Qi et.al. | 2403.01494 | null |
| 2024-03-03 | A Closer Look at Wav2Vec2 Embeddings for On-Device Single-Channel Speech Enhancement | Ravi Shankar et.al. | 2403.01369 | null |
| 2024-03-03 | a-DCF: an architecture agnostic metric with application to spoofing-robust speaker verification | Hye-jin Shim et.al. | 2403.01355 | link |
| 2024-03-02 | Automatic Speech Recognition using Advanced Deep Learning Approaches: A survey | Hamza Kheddar et.al. | 2403.01255 | null |
| 2024-03-02 | Towards Accurate Lip-to-Speech Synthesis in-the-Wild | Sindhu Hegde et.al. | 2403.01087 | null |
| 2024-03-01 | VoxGenesis: Unsupervised Discovery of Latent Speaker Manifold for Speech Synthesis | Weiwei Lin et.al. | 2403.00529 | null |
| 2024-03-01 | Post-decoder Biasing for End-to-End Speech Recognition of Multi-turn Medical Interview | Heyang Liu et.al. | 2403.00370 | null |
| 2024-03-01 | Efficient Adapter Tuning of Pre-trained Speech Models for Automatic Speaker Verification | Mufan Sang et.al. | 2403.00293 | null |
| 2024-03-01 | Transcription and translation of videos using fine-tuned XLSR Wav2Vec2 on custom dataset and mBART | Aniket Tathe et.al. | 2403.00212 | null |
| 2024-02-29 | Probing the Information Encoded in Neural-based Acoustic Models of Automatic Speech Recognition Systems | Quentin Raymondaud et.al. | 2402.19443 | null |
| 2024-02-29 | Unraveling Adversarial Examples against Speaker Identification – Techniques for Attack Detection and Victim Model Classification | Sonal Joshi et.al. | 2402.19355 | null |
| 2024-02-29 | Extending Multilingual Speech Synthesis to 100+ Languages without Transcribed Data | Takaaki Saeki et.al. | 2402.18932 | null |
| 2024-02-29 | Inappropriate Pause Detection In Dysarthric Speech Using Large-Scale Speech Recognition | Jeehyun Lee et.al. | 2402.18923 | null |
| 2024-02-29 | Investigation of Adapter for Automatic Speech Recognition in Noisy Environment | Hao Shi et.al. | 2402.18275 | null |
| 2024-02-28 | Multilingual Speech Models for Automatic Speech Recognition Exhibit Gender Performance Gaps | Giuseppe Attanasio et.al. | 2402.17954 | link |
| 2024-02-24 | ByteComposer: a Human-like Melody Composition Method based on Language Model Agent | Xia Liang et.al. | 2402.17785 | null |
| 2024-02-27 | High-Fidelity Neural Phonetic Posteriorgrams | Cameron Churchwell et.al. | 2402.17735 | link |
| 2024-02-27 | Natural Language Processing Methods for Symbolic Music Generation and Information Retrieval: a Survey | Dinh-Viet-Toan Le et.al. | 2402.17467 | link |
| 2024-02-27 | An Effective Mixture-Of-Experts Approach For Code-Switching Speech Recognition Leveraging Encoder Disentanglement | Tzu-Ting Yang et.al. | 2402.17189 | null |
| 2024-02-27 | Extreme Encoder Output Frame Rate Reduction: Improving Computational Latencies of Large End-to-End Models | Rohit Prabhavalkar et.al. | 2402.17184 | null |
| 2024-02-26 | Towards Decoding Brain Activity During Passive Listening of Speech | Milán András Fodor et.al. | 2402.16996 | link |
| 2024-02-26 | Effect of utterance duration and phonetic content on speaker identification using second-order statistical methods | Ivan Magrin-Chagnolleau et.al. | 2402.16429 | null |
| 2024-02-24 | ArEEG_Chars: Dataset for Envisioned Speech Recognition using EEG for Arabic Characters | Hazem Darwish et.al. | 2402.15733 | null |
Multimodal
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2025-12-23 | Dual-Encoder Transformer-Based Multimodal Learning for Ischemic Stroke Lesion Segmentation Using Diffusion MRI | Muhammad Usman et.al. | 2512.20436 | null |
| 2025-12-23 | Generative Digital Twins: Vision-Language Simulation Models for Executable Industrial Systems | YuChe Hsu et.al. | 2512.20387 | null |
| 2025-12-23 | Retrieval-augmented Prompt Learning for Pre-trained Foundation Models | Xiang Chen et.al. | 2512.20145 | null |
| 2025-12-22 | Beyond CLIP: Knowledge-Enhanced Multimodal Transformers for Cross-Modal Alignment in Diabetic Retinopathy Diagnosis | Argha Kamal Samanta et.al. | 2512.19663 | null |
| 2025-12-22 | Non-Contrast CT Esophageal Varices Grading through Clinical Prior-Enhanced Multi-Organ Analysis | Xiaoming Zhang et.al. | 2512.19415 | null |
| 2025-12-22 | OmniMER: Indonesian Multimodal Emotion Recognition via Auxiliary-Enhanced LLM Adaptation | Xueming Yan et.al. | 2512.19379 | null |
| 2025-12-19 | STAR: Semantic-Traffic Alignment and Retrieval for Zero-Shot HTTPS Website Fingerprinting | Yifei Cheng et.al. | 2512.17667 | null |
| 2025-12-19 | PathFLIP: Fine-grained Language-Image Pretraining for Versatile Computational Pathology | Fengchun Liu et.al. | 2512.17621 | null |
| 2025-12-18 | Vision-Language-Action Models for Autonomous Driving: Past, Present, and Future | Tianshuai Hu et.al. | 2512.16760 | null |
| 2025-12-18 | Smile on the Face, Sadness in the Eyes: Bridging the Emotion Gap with a Multimodal Dataset of Eye and Facial Behaviors | Kejun Liu et.al. | 2512.16485 | null |
| 2025-12-17 | GateFusion: Hierarchical Gated Cross-Modal Fusion for Active Speaker Detection | Yu Wang et.al. | 2512.15707 | null |
| 2025-12-17 | An Efficient and Effective Encoder Model for Vision and Language Tasks in the Remote Sensing Domain | João Daniel Silva et.al. | 2512.15531 | null |
| 2025-12-16 | Visual-textual Dermatoglyphic Animal Biometrics: A First Case Study on Panthera tigris | Wenshuo Li et.al. | 2512.14878 | null |
| 2025-12-15 | STAR: STacked AutoRegressive Scheme for Unified Multimodal Learning | Jie Qin et.al. | 2512.13752 | null |
| 2025-12-15 | JoVA: Unified Multimodal Learning for Joint Video-Audio Generation | Xiaohu Huang et.al. | 2512.13677 | null |
| 2025-12-15 | A Semantically Enhanced Generative Foundation Model Improves Pathological Image Synthesis | Xianchao Guan et.al. | 2512.13164 | null |
| 2025-12-13 | EchoVLM: Measurement-Grounded Multimodal Learning for Echocardiography | Yuheng Li et.al. | 2512.12107 | null |
| 2025-12-12 | VLM2GeoVec: Toward Universal Multimodal Embeddings for Remote Sensing | Emanuel Sánchez Aimar et.al. | 2512.11490 | null |
| 2025-12-12 | Exploring MLLM-Diffusion Information Transfer with MetaCanvas | Han Lin et.al. | 2512.11464 | null |
| 2025-12-12 | AMBER: An Adaptive Multimodal Mask Transformer for Beam Prediction with Missing Modalities | Chenyiming Wen et.al. | 2512.11331 | null |
| 2025-12-02 | Agent-Based Modular Learning for Multimodal Emotion Recognition in Human-Agent Systems | Matvey Nepomnyaschiy et.al. | 2512.10975 | null |
| 2025-12-11 | Beyond Pixels: A Training-Free, Text-to-Text Framework for Remote Sensing Image Retrieval | J. Xiao et.al. | 2512.10596 | null |
| 2025-12-11 | Cross-modal Retrieval Models for Stripped Binary Analysis | Guoqiang Chen et.al. | 2512.10393 | null |
| 2025-12-05 | What Happens When: Learning Temporal Orders of Events in Videos | Daechul Ahn et.al. | 2512.08979 | null |
| 2025-12-09 | Towards Effective and Efficient Long Video Understanding of Multimodal Large Language Models via One-shot Clip Retrieval | Tao Chen et.al. | 2512.08410 | null |
| 2025-12-08 | CAMO: Causality-Guided Adversarial Multimodal Domain Generalization for Crisis Classification | Pingchuan Ma et.al. | 2512.08071 | null |
| 2025-12-08 | Unison: A Fully Automatic, Task-Universal, and Low-Cost Framework for Unified Understanding and Generation | Shihao Zhao et.al. | 2512.07747 | null |
| 2025-12-08 | VFM-VLM: Vision Foundation Model and Vision Language Model based Visual Comparison for 3D Pose Estimation | Md Selim Sarowar et.al. | 2512.07215 | null |
| 2025-12-07 | A Novel Multimodal RUL Framework for Remaining Useful Life Estimation with Layer-wise Explanations | Waleed Razzaq et.al. | 2512.06708 | null |
| 2025-12-06 | Enhancing Medical Cross-Modal Hashing Retrieval using Dropout-Voting Mixture-of-Experts Fusion | Jaewon Ahn et.al. | 2512.06449 | null |
| 2025-12-05 | Natural Language Summarization Enables Multi-Repository Bug Localization by LLMs in Microservice Architectures | Amirkia Rafiei Oskooei et.al. | 2512.05908 | null |
| 2025-12-04 | 4DLangVGGT: 4D Language-Visual Geometry Grounded Transformer | Xianfeng Wu et.al. | 2512.05060 | link |
| 2025-12-03 | Cross-Space Synergy: A Unified Framework for Multimodal Emotion Recognition in Conversation | Xiaosen Lyu et.al. | 2512.03521 | null |
| 2025-12-03 | Multi-Aspect Knowledge-Enhanced Medical Vision-Language Pretraining with Multi-Agent Data Generation | Xieji Li et.al. | 2512.03445 | null |
| 2025-12-03 | Label-Efficient Hyperspectral Image Classification via Spectral FiLM Modulation of Low-Level Pretrained Diffusion Features | Yuzhen Hu et.al. | 2512.03430 | null |
| 2025-12-02 | Learning Multimodal Embeddings for Traffic Accident Prediction and Causal Estimation | Ziniu Zhang et.al. | 2512.02920 | null |
| 2025-12-02 | Real-Time Multimodal Data Collection Using Smartwatches and Its Visualization in Education | Alvaro Becerra et.al. | 2512.02651 | null |
| 2025-12-02 | Boosting Medical Vision-Language Pretraining via Momentum Self-Distillation under Limited Computing Resources | Phuc Pham et.al. | 2512.02438 | null |
| 2025-11-30 | MM-ACT: Learn from Multimodal Parallel Generation to Act | Haotian Liang et.al. | 2512.00975 | null |
| 2025-11-29 | Describe Anything Anywhere At Any Moment | Nicolas Gorlo et.al. | 2512.00565 | null |
| 2025-11-29 | CourseTimeQA: A Lecture-Video Benchmark and a Latency-Constrained Cross-Modal Fusion Method for Timestamped QA | Vsevolod Kovalev et.al. | 2512.00360 | null |
| 2025-11-28 | Buffer replay enhances the robustness of multimodal learning under missing-modality | Hongye Zhu et.al. | 2511.23070 | null |
| 2025-11-27 | Orthogonal Disentanglement with Projected Feature Alignment for Multimodal Emotion Recognition in Conversation | Xinyi Che et.al. | 2511.22463 | null |
| 2025-11-27 | Angle-Optimized Partial Disentanglement for Multimodal Emotion Recognition in Conversation | Xinyi Che et.al. | 2511.22447 | null |
| 2025-11-27 | Bridging the Modality Gap by Similarity Standardization with Pseudo-Positive Samples | Shuhei Yamashita et.al. | 2511.22141 | null |
| 2025-11-26 | WalkCLIP: Multimodal Learning for Urban Walkability Prediction | Shilong Xiang et.al. | 2511.21947 | null |
| 2025-11-26 | Evaluating Strategies for Synthesizing Clinical Notes for Medical Multimodal AI | Niccolo Marini et.al. | 2511.21827 | null |
| 2025-11-26 | Learning Cell-Aware Hierarchical Multi-Modal Representations for Robust Molecular Modeling | Mengran Li et.al. | 2511.21120 | null |
| 2025-11-25 | A review on data fusion in multimodal learning analytics and educational data mining | Wilson Chango et.al. | 2511.20871 | null |
| 2025-11-25 | VibraVerse: A Large-Scale Geometry-Acoustics Alignment Dataset for Physically-Consistent Multimodal Learning | Bo Pang et.al. | 2511.20422 | null |
| 2025-11-25 | MajutsuCity: Language-driven Aesthetic-adaptive City Generation with Controllable 3D Assets and Layouts | Zilong Huang et.al. | 2511.20415 | null |
| 2025-11-25 | ScenarioCLIP: Pretrained Transferable Visual Language Models and Action-Genome Dataset for Natural Scene Analysis | Advik Sinha et.al. | 2511.20274 | null |
| 2025-11-24 | Medusa: Cross-Modal Transferable Adversarial Attacks on Multimodal Medical Retrieval-Augmented Generation | Yingjia Shang et.al. | 2511.19257 | null |
| 2025-11-24 | IDSplat: Instance-Decomposed 3D Gaussian Splatting for Driving Scenes | Carl Lindström et.al. | 2511.19235 | null |
| 2025-11-24 | Can Modern Vision Models Understand the Difference Between an Object and a Look-alike? | Itay Cohen et.al. | 2511.19200 | null |
| 2025-11-23 | Breaking Forgetting: Training-Free Few-Shot Class-Incremental Learning via Conditional Diffusion | Haidong Kang et.al. | 2511.18516 | null |
| 2025-11-22 | Vulnerability-Aware Robust Multimodal Adversarial Training | Junrui Zhang et.al. | 2511.18138 | null |
| 2025-11-22 | Consolidating Diffusion-Generated Video Detection with Unified Multimodal Forgery Learning | Xiaohong Liu et.al. | 2511.18104 | null |
| 2025-11-17 | Reconstruction-Driven Multimodal Representation Learning for Automated Media Understanding | Yassir Benhammou et.al. | 2511.17596 | null |
| 2025-11-21 | MCMoE: Completing Missing Modalities with Mixture of Experts for Incomplete Multimodal Action Quality Assessment | Huangbiao Xu et.al. | 2511.17397 | null |
| 2025-11-21 | UniModel: A Visual-Only Framework for Unified Multimodal Understanding and Generation | Chi Zhang et.al. | 2511.16917 | null |
| 2025-11-20 | LLaVA$^3$: Representing 3D Scenes like a Cubist Painter to Boost 3D Scene Understanding of VLMs | Doriand Petit et.al. | 2511.16454 | null |
| 2025-11-20 | Boosting Medical Visual Understanding From Multi-Granular Language Learning | Zihan Li et.al. | 2511.15943 | null |
| 2025-11-18 | Uncertainty-Resilient Multimodal Learning via Consistency-Guided Cross-Modal Transfer | Hyo-Jeong Jang et.al. | 2511.15741 | null |
| 2025-11-19 | SIGMMA: Hierarchical Graph-Based Multi-Scale Multi-modal Contrastive Alignment of Histopathology Image and Spatial Transcriptome | Dabin Jeong et.al. | 2511.15464 | null |
| 2025-11-19 | Reflexive Evidence-Based Multimodal Learning for Clean Energy Transitions: Causal Insights on Cooking Fuel Access, Urbanization, and Carbon Emissions | Shan Shan et.al. | 2511.15342 | null |
| 2025-11-19 | Towards Unbiased Cross-Modal Representation Learning for Food Image-to-Recipe Retrieval | Qing Wang et.al. | 2511.15201 | null |
| 2025-11-19 | TiCAL: Typicality-Based Consistency-Aware Learning for Multimodal Emotion Recognition | Wen Yin et.al. | 2511.15085 | null |
| 2025-11-18 | Quality-Controlled Multimodal Emotion Recognition in Conversations with Identity-Based Transfer Learning and MAMBA Fusion | Zanxu Wang et.al. | 2511.14969 | null |
| 2025-11-18 | Toward Robust and Harmonious Adaptation for Cross-modal Retrieval | Haobin Li et.al. | 2511.14416 | null |
| 2025-11-18 | Let Language Constrain Geometry: Vision-Language Models as Semantic and Spatial Critics for 3D Generation | Weimin Bai et.al. | 2511.14271 | null |
| 2025-11-18 | Online Data Curation for Object Detection via Marginal Contributions to Dataset-level Average Precision | Zitang Sun et.al. | 2511.14197 | null |
| 2025-11-14 | Adaptive Redundancy Regulation for Balanced Multimodal Information Refinement | Zhe Yang et.al. | 2511.13755 | null |
| 2025-11-17 | 3DAlign-DAER: Dynamic Attention Policy and Efficient Retrieval Strategy for Fine-grained 3D-Text Alignment at Scale | Yijia Fan et.al. | 2511.13211 | null |
| 2025-11-17 | uCLIP: Parameter-Efficient Multilingual Extension of Vision-Language Models with Unpaired Data | Dahyun Chung et.al. | 2511.13036 | null |
| 2025-11-17 | Angular Gradient Sign Method: Uncovering Vulnerabilities in Hyperbolic Networks | Minsoo Jo et.al. | 2511.12985 | null |
| 2025-11-15 | To Align or Not to Align: Strategic Multimodal Representation Alignment for Optimal Performance | Wanlong Fang et.al. | 2511.12121 | null |
| 2025-11-14 | Rethinking Efficient Mixture-of-Experts for Remote Sensing Modality-Missing Classification | Qinghao Gao et.al. | 2511.11460 | null |
| 2025-11-14 | AIonopedia: an LLM agent orchestrating multimodal learning for ionic liquid discovery | Yuqi Yin et.al. | 2511.11257 | null |
| 2025-11-14 | LEMUR: Large scale End-to-end MUltimodal Recommendation | Xintian Han et.al. | 2511.10962 | null |
| 2025-11-14 | MCN-CL: Multimodal Cross-Attention Network and Contrastive Learning for Multimodal Emotion Recognition | Feng Li et.al. | 2511.10892 | null |
| 2025-11-13 | Towards Blind and Low-Vision Accessibility of Lightweight VLMs and Custom LLM-Evals | Shruti Singh Baghel et.al. | 2511.10615 | null |
| 2025-11-13 | URaG: Unified Retrieval and Generation in Multimodal LLMs for Efficient Long Document Understanding | Yongxin Shi et.al. | 2511.10552 | null |
| 2025-11-13 | GEA: Generation-Enhanced Alignment for Text-to-Image Person Retrieval | Hao Zou et.al. | 2511.10154 | null |
| 2025-11-13 | Explicit Temporal-Semantic Modeling for Dense Video Captioning via Context-Aware Cross-Modal Interaction | Mingda Jia et.al. | 2511.10134 | null |
| 2025-11-13 | Towards Robust Multimodal Learning in the Open World | Fushuo Huo et.al. | 2511.09989 | null |
| 2025-11-12 | Baby Sophia: A Developmental Approach to Self-Exploration through Self-Touch and Hand Regard | Stelios Zarifis et.al. | 2511.09727 | null |
| 2025-11-12 | End-to-end Contrastive Language-Speech Pretraining Model For Long-form Spoken Question Answering | Jiliang Hu et.al. | 2511.09282 | null |
| 2025-11-11 | Compression then Matching: An Efficient Pre-training Paradigm for Multimodal Embedding | Da Li et.al. | 2511.08480 | null |
| 2025-11-11 | Boomda: Balanced Multi-objective Optimization for Multimodal Domain Adaptation | Jun Sun et.al. | 2511.08152 | null |
| 2025-11-11 | Semantic-Consistent Bidirectional Contrastive Hashing for Noisy Multi-Label Cross-Modal Retrieval | Likang Peng et.al. | 2511.07780 | null |
| 2025-11-11 | Cross Modal Fine-Grained Alignment via Granularity-Aware and Region-Uncertain Modeling | Jiale Liu et.al. | 2511.07710 | null |
| 2025-11-10 | A Hybrid Multimodal Deep Learning Framework for Intelligent Fashion Recommendation | Kamand Kalashi et.al. | 2511.07573 | null |
| 2025-11-10 | Integrating Epigenetic and Phenotypic Features for Biological Age Estimation in Cancer Patients via Multimodal Learning | Shuyue Jiang et.al. | 2511.07219 | null |
| 2025-11-10 | Med-SORA: Symptom to Organ Reasoning in Abdomen CT Images | You-Kyoung Na et.al. | 2511.06752 | null |
| 2025-11-09 | LLM-Driven Completeness and Consistency Evaluation for Cultural Heritage Data Augmentation in Cross-Modal Retrieval | Jian Zhang et.al. | 2511.06268 | null |
| 2025-11-09 | VLDrive: Vision-Augmented Lightweight MLLMs for Efficient Language-grounded Autonomous Driving | Ruifei Zhang et.al. | 2511.06256 | null |
| 2025-11-09 | AdaDrive: Self-Adaptive Slow-Fast System for Language-Grounded Autonomous Driving | Ruifei Zhang et.al. | 2511.06253 | null |
| 2025-11-08 | Referring Expressions as a Lens into Spatial Language Grounding in Vision-Language Models | Akshar Tumu et.al. | 2511.06146 | null |
| 2025-11-04 | Fine-Tuning Vision-Language Models for Multimodal Polymer Property Prediction | An Vuong et.al. | 2511.05577 | null |
| 2025-11-06 | DINOv2 Driven Gait Representation Learning for Video-Based Visible-Infrared Person Re-identification | Yujie Yang et.al. | 2511.04281 | null |
| 2025-11-05 | Cross-Modal Alignment via Variational Copula Modelling | Feng Wu et.al. | 2511.03196 | null |
| 2025-11-04 | SLIP: Structural-aware Language-Image Pretraining for Vision-Language Alignment | Wenbo Lu et.al. | 2511.03019 | null |
| 2025-11-04 | ProM3E: Probabilistic Masked MultiModal Embedding Model for Ecology | Srikumar Sastry et.al. | 2511.02946 | null |
| 2025-11-04 | When One Modality Sabotages the Others: A Diagnostic Lens on Multimodal Reasoning | Chenyu Zhang et.al. | 2511.02794 | null |
| 2025-11-03 | OmniFuser: Adaptive Multimodal Fusion for Service-Oriented Predictive Maintenance | Ziqi Wang et.al. | 2511.01320 | null |
| 2025-11-02 | Balanced Multimodal Learning via Mutual Information | Rongrong Xie et.al. | 2511.00987 | null |
| 2025-11-01 | LIR: The First Workshop on Late Interaction and Multi Vector Retrieval @ ECIR 2026 | Benjamin Clavié et.al. | 2511.00444 | null |
| 2025-11-01 | Federated Dialogue-Semantic Diffusion for Emotion Recognition under Incomplete Modalities | Xihang Qiu et.al. | 2511.00344 | null |
| 2025-10-24 | Multimodal Detection of Fake Reviews using BERT and ResNet-50 | Suhasnadh Reddy Veluru et.al. | 2511.00020 | null |
| 2025-10-04 | Multimodal Learning with Augmentation Techniques for Natural Disaster Assessment | Adrian-Dinu Urse et.al. | 2511.00004 | null |
| 2025-10-31 | MedM2T: A MultiModal Framework for Time-Aware Modeling with Electronic Health Record and Electrocardiogram Data | Yu-Chen Kuo et.al. | 2510.27321 | null |
| 2025-10-30 | Evaluating Perspectival Biases in Cross-Modal Retrieval | Teerapol Saengsukhiran et.al. | 2510.26861 | null |
| 2025-10-30 | Contribution-Guided Asymmetric Learning for Robust Multimodal Fusion under Imbalance and Noise | Zijing Xu et.al. | 2510.26289 | null |
| 2025-10-29 | Metis-SPECS: Decoupling Multimodal Learning via Self-distilled Preference-based Cold Start | Kun Chen et.al. | 2510.25801 | null |
| 2025-10-29 | LangHOPS: Language Grounded Hierarchical Open-Vocabulary Part Segmentation | Yang Miao et.al. | 2510.25263 | null |
| 2025-10-29 | H3M-SSMoEs: Hypergraph-based Multimodal Learning with LLM Reasoning and Style-Structured Mixture of Experts | Peilin Tan et.al. | 2510.25091 | null |
| 2025-10-29 | Vision-Language Integration for Zero-Shot Scene Understanding in Real-World Environments | Manjunath Prasad Holenarasipura Rajiv et.al. | 2510.25070 | null |
| 2025-10-28 | Modality-Aware SAM: Sharpness-Aware-Minimization Driven Gradient Modulation for Harmonized Multimodal Learning | Hossein R. Nowdeh et.al. | 2510.24919 | null |
| 2025-10-28 | MCIHN: A Hybrid Network Model Based on Multi-path Cross-modal Interaction for Multimodal Emotion Recognition | Haoyang Zhang et.al. | 2510.24827 | null |
| 2025-10-24 | Towards Fine-Grained Human Motion Video Captioning | Guorui Song et.al. | 2510.24767 | null |
| 2025-10-27 | Toward Clinically Grounded Foundation Models in Pathology | Hamid R. Tizhoosh et.al. | 2510.23807 | null |
| 2025-10-27 | Emotion-Coherent Reasoning for Multimodal LLMs via Emotional Rationale Verifier | Hyeongseop Rha et.al. | 2510.23506 | null |
| 2025-10-27 | Evaluation of Vision-LLMs in Surveillance Video | Pascal Benschop et.al. | 2510.23190 | null |
| 2025-10-21 | Unifying Inductive, Cross-Domain, and Multimodal Learning for Robust and Generalizable Recommendation | Chanyoung Chung et.al. | 2510.21812 | null |
| 2025-10-07 | Avi: Action from Volumetric Inference | Harris Song et.al. | 2510.21746 | null |
| 2025-10-24 | CXR-LanIC: Language-Grounded Interpretable Classifier for Chest X-Ray Diagnosis | Yiming Tang et.al. | 2510.21464 | null |
| 2025-10-24 | Bridging the gap to real-world language-grounded visual concept learning | Whie Jung et.al. | 2510.21412 | null |
| 2025-10-23 | Multimodal Negative Learning | Baoquan Gong et.al. | 2510.20877 | null |
| 2025-10-23 | Amplifying Prominent Representations in Multimodal Learning via Variational Dirichlet Process | Tsai Hor Chan et.al. | 2510.20736 | null |
| 2025-10-23 | Calibrating Multimodal Consensus for Emotion Recognition | Guowei Zhong et.al. | 2510.20256 | null |
| 2025-10-22 | Learning Noise-Resilient and Transferable Graph-Text Alignment via Dynamic Quality Assessment | Yuhang Liu et.al. | 2510.19384 | null |
| 2025-10-22 | FrogDeepSDM: Improving Frog Counting and Occurrence Prediction Using Multimodal Data and Pseudo-Absence Imputation | Chirag Padubidri et.al. | 2510.19305 | null |
| 2025-10-21 | Grounding or Guessing? Visual Signals for Detecting Hallucinations in Sign Language Translation | Yasser Hamidullah et.al. | 2510.18439 | null |
| 2025-10-20 | Transformer Redesign for Late Fusion of Audio-Text Features on Ultra-Low-Power Edge Hardware | Stavros Mitsis et.al. | 2510.18036 | null |
| 2025-10-20 | MILES: Modality-Informed Learning Rate Scheduler for Balancing Multimodal Learning | Alejandro Guerra-Manzanares et.al. | 2510.17394 | null |
| 2025-10-19 | Graph4MM: Weaving Multimodal Learning with Structural Information | Xuying Ning et.al. | 2510.16990 | null |
| 2025-10-19 | ProtoMol: Enhancing Molecular Property Prediction via Prototype-Guided Multimodal Learning | Yingxu Wang et.al. | 2510.16824 | null |
| 2025-10-19 | Pursuing Minimal Sufficiency in Spatial Reasoning | Yejie Guo et.al. | 2510.16688 | null |
| 2025-10-18 | Safire: Similarity Framework for Visualization Retrieval | Huyen N. Nguyen et.al. | 2510.16662 | null |
| 2025-10-18 | Structured Interfaces for Automated Reasoning with 3D Scene Graphs | Aaron Ray et.al. | 2510.16643 | null |
| 2025-10-09 | Lyapunov-Stable Adaptive Control for Multimodal Concept Drift | Tianyu Bell Pan et.al. | 2510.15944 | null |
| 2025-10-17 | Towards Relaxed Multimodal Inputs for Gait-based Parkinson’s Disease Assessment | Minlin Zeng et.al. | 2510.15748 | null |
| 2025-10-16 | From Language to Locomotion: Retargeting-free Humanoid Control via Motion Latent Guidance | Zhe Li et.al. | 2510.14952 | null |
| 2025-10-16 | Revisit Modality Imbalance at the Decision Layer | Xiaoyu Ma et.al. | 2510.14411 | null |
| 2025-10-15 | A Multimodal Approach to Heritage Preservation in the Context of Climate Change | David Roqui et.al. | 2510.14136 | null |
| 2025-10-15 | Grounding Long-Context Reasoning with Contextual Normalization for Retrieval-Augmented Generation | Jiamin Chen et.al. | 2510.13191 | null |
| 2025-10-15 | Information-Theoretic Criteria for Knowledge Distillation in Multimodal Learning | Rongrong Xie et.al. | 2510.13182 | null |
| 2025-10-14 | A Text-Image Fusion Method with Data Augmentation Capabilities for Referring Medical Image Segmentation | Shurong Chai et.al. | 2510.12482 | null |
| 2025-10-14 | Ground Stratification for a Logic of Definitions with Induction | Nathan Guermond et.al. | 2510.12297 | null |
| 2025-10-14 | IL3D: A Large-Scale Indoor Layout Dataset for LLM-Driven 3D Scene Generation | Wenxu Zhou et.al. | 2510.12095 | null |
| 2025-10-13 | Task-Specific Dual-Model Framework for Comprehensive Traffic Safety Video Description and Analysis | Blessing Agyei Kyem et.al. | 2510.11907 | null |
| 2025-10-10 | Cattle-CLIP: A Multimodal Framework for Cattle Behaviour Recognition | Huimin Liu et.al. | 2510.09203 | null |
| 2025-10-09 | Provably Robust Adaptation for Language-Empowered Foundation Models | Yuni Lai et.al. | 2510.08659 | null |
| 2025-10-07 | Centering Emotion Hotspots: Multimodal Local-Global Fusion and Cross-Modal Alignment for Emotion Recognition in Conversations | Yu Liu et.al. | 2510.08606 | null |
| 2025-10-09 | Looking to Learn: Token-wise Dynamic Gating for Low-Resource Vision-Language Modelling | Bianca-Mihaela Ganescu et.al. | 2510.08470 | null |
| 2025-10-08 | FLEET: Formal Language-Grounded Scheduling for Heterogeneous Robot Teams | Corban Rivera et.al. | 2510.07417 | null |
| 2025-09-30 | MultiFair: Multimodal Balanced Fairness-Aware Medical Classification with Dual-Level Gradient Modulation | Md Zubair et.al. | 2510.07328 | null |
| 2025-10-08 | TalkCuts: A Large-Scale Dataset for Multi-Shot Human Speech Video Generation | Jiaben Chen et.al. | 2510.07249 | null |
| 2025-10-08 | Expressive and Scalable Quantum Fusion for Multimodal Learning | Tuyen Nguyen et.al. | 2510.06938 | null |
| 2025-10-07 | Deforming Videos to Masks: Flow Matching for Referring Video Segmentation | Zanyi Wang et.al. | 2510.06139 | null |
| 2025-10-04 | Towards Unsupervised Speech Recognition at the Syllable-Level | Liming Wang et.al. | 2510.03639 | null |
| 2025-09-25 | Towards Multimodal Active Learning: Efficient Learning with Limited Paired Data | Jiancheng Zhang et.al. | 2510.03247 | null |
| 2025-10-02 | Latency-aware Multimodal Federated Learning over UAV Networks | Shaba Shaon et.al. | 2510.01717 | null |
| 2025-10-01 | PhraseStereo: The First Open-Vocabulary Stereo Image Segmentation Dataset | Thomas Campagnolo et.al. | 2510.00818 | null |
| 2025-09-30 | MIDAS: Misalignment-based Data Augmentation Strategy for Imbalanced Multimodal Learning | Seong-Hyeon Hwang et.al. | 2509.25831 | null |
| 2025-09-29 | FishNet++: Analyzing the capabilities of Multimodal Large Language Models in marine biology | Faizan Farooq Khan et.al. | 2509.25564 | null |
| 2025-09-29 | MAESTRO: Adaptive Sparse Attention and Robust Learning for Multimodal Dynamic Time Series | Payal Mohapatra et.al. | 2509.25278 | null |
| 2025-09-29 | A TRIANGLE Enables Multimodal Alignment Beyond Cosine Similarity | Giordano Cicchetti et.al. | 2509.24734 | null |
| 2025-09-29 | Multimodal Large Language Models Meet Multimodal Emotion Recognition and Reasoning: A Survey | Yuntao Shou et.al. | 2509.24322 | null |
| 2025-09-28 | Contrastive Learning Enhances Language Model Based Cell Embeddings for Low-Sample Single Cell Transcriptomics | Luxuan Zhang et.al. | 2509.23543 | null |
| 2025-09-26 | RefAM: Attention Magnets for Zero-Shot Referral Segmentation | Anna Kukleva et.al. | 2509.22650 | null |
| 2025-09-26 | HELIOS: Hierarchical Exploration for Language-grounded Interaction in Open Scenes | Katrina Ashton et.al. | 2509.22498 | null |
| 2025-09-26 | From Watch to Imagine: Steering Long-horizon Manipulation via Human Demonstration and Future Envisionment | Ke Ye et.al. | 2509.22205 | null |
| 2025-09-26 | VLBiMan: Vision-Language Anchored One-Shot Demonstration Enables Generalizable Bimanual Robotic Manipulation | Huayi Zhou et.al. | 2509.21723 | null |
| 2025-09-14 | LibEMER: A novel benchmark and algorithms library for EEG-based Multimodal Emotion Recognition | Zejun Liu et.al. | 2509.19330 | null |
| 2025-09-10 | Advancing Few-Shot Pediatric Arrhythmia Classification with a Novel Contrastive Loss and Multimodal Learning | Yiqiao Chen et.al. | 2509.19315 | null |
| 2025-09-23 | Single-Branch Network Architectures to Close the Modality Gap in Multimodal Recommendation | Christian Ganhör et.al. | 2509.18807 | null |
| 2025-09-23 | M4SER: Multimodal, Multirepresentation, Multitask, and Multistrategy Learning for Speech Emotion Recognition | Jiajun He et.al. | 2509.18706 | null |
| 2025-09-22 | Learning Contrastive Multimodal Fusion with Improved Modality Dropout for Disease Detection and Prediction | Yi Gu et.al. | 2509.18284 | null |
| 2025-09-22 | ClassMind: Scaling Classroom Observation and Instructional Feedback with Multimodal AI | Ao Qu et.al. | 2509.18020 | null |
| 2025-09-22 | M3ET: Efficient Vision-Language Learning for Robotics based on Multimodal Mamba-Enhanced Transformer | Yanxin Zhang et.al. | 2509.18005 | null |
| 2025-09-22 | Trainee Action Recognition through Interaction Analysis in CCATT Mixed-Reality Training | Divya Mereddy et.al. | 2509.17888 | null |
| 2025-09-20 | Self-organized epithelial reticulum inhibits cell proliferation | Liav Daraf et.al. | 2509.16661 | null |
| 2025-09-19 | Vision-Language Models as Differentiable Semantic and Spatial Rewards for Text-to-3D Generation | Weimin Bai et.al. | 2509.15772 | null |
| 2025-09-19 | Multimodal Learning for Fake News Detection in Short Videos Using Linguistically Verified Data and Heterogeneous Modality Fusion | Shanghong Li et.al. | 2509.15578 | null |
| 2025-09-19 | Beyond Words: Enhancing Desire, Emotion, and Sentiment Recognition with Non-Verbal Cues | Wei Chen et.al. | 2509.15540 | null |
| 2025-09-17 | Exploring the Capabilities of LLM Encoders for Image-Text Retrieval in Chest X-rays | Hanbin Ko et.al. | 2509.15234 | null |
| 2025-09-17 | VocSegMRI: Multimodal Learning for Precise Vocal Tract Segmentation in Real-time MRI | Daiqi Liu et.al. | 2509.13767 | null |
| 2025-09-15 | Evaluating Robustness of Vision-Language Models Under Noisy Conditions | Purushoth et.al. | 2509.12492 | null |
| 2025-09-15 | OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling | Yang Zhou et.al. | 2509.12201 | null |
| 2025-09-15 | Enriched text-guided variational multimodal knowledge distillation network (VMD) for automated diagnosis of plaque vulnerability in 3D carotid artery MRI | Bo Cao et.al. | 2509.11924 | null |
| 2025-09-14 | GLaVE-Cap: Global-Local Aligned Video Captioning with Vision Expert Integration | Wan Xu et.al. | 2509.11360 | null |
| 2025-09-14 | DMLDroid: Deep Multimodal Fusion Framework for Android Malware Detection with Resilience to Code Obfuscation and Adversarial Perturbations | Doan Minh Trung et.al. | 2509.11187 | null |
| 2025-09-14 | Filling the Gaps: A Multitask Hybrid Multiscale Generative Framework for Missing Modality in Remote Sensing Semantic Segmentation | Nhi Kieu et.al. | 2509.11102 | null |
| 2025-09-13 | Why Bonds Fail Differently? Explainable Multimodal Learning for Multi-Class Default Prediction | Yi Lu et.al. | 2509.10802 | null |
| 2025-09-11 | Modality-Agnostic Input Channels Enable Segmentation of Brain lesions in Multimodal MRI with Sequences Unavailable During Training | Anthony P. Addison et.al. | 2509.09290 | null |
| 2025-09-09 | Enhancing Online Learning by Integrating Biosensors and Multimodal Learning Analytics for Detecting and Predicting Student Behavior: A Review | Alvaro Becerra et.al. | 2509.07742 | null |
| 2025-09-08 | Prototype-Aware Multimodal Alignment for Open-Vocabulary Visual Grounding | Jiangnan Xie et.al. | 2509.06291 | null |
| 2025-09-06 | GraMFedDHAR: Graph Based Multimodal Differentially Private Federated HAR | Labani Halder et.al. | 2509.05671 | null |
| 2025-09-06 | Causal Debiasing Medical Multimodal Representation Learning with Missing Modalities | Xiaoguang Zhu et.al. | 2509.05615 | null |
| 2025-09-04 | Vehicle-to-Infrastructure Collaborative Spatial Perception via Multimodal Large Language Models | Kimia Ehsani et.al. | 2509.03837 | null |
| 2025-09-03 | Designing Gaze Analytics for ELA Instruction: A User-Centered Dashboard with Conversational AI Support | Eduardo Davalos et.al. | 2509.03741 | null |
| 2025-09-03 | Robult: Leveraging Redundancy and Modality Specific Features for Robust Multimodal Learning | Duy A. Nguyen et.al. | 2509.03477 | null |
| 2025-09-03 | Multimodal learning of melt pool dynamics in laser powder bed fusion | Satyajit Mojumder et.al. | 2509.03029 | null |
| 2025-09-03 | Resilient Multimodal Industrial Surface Defect Detection with Uncertain Sensors Availability | Shuai Jiang et.al. | 2509.02962 | null |
| 2025-09-02 | Language-Guided Long Horizon Manipulation with LLM-based Planning and Visual Perception | Changshi Zhou et.al. | 2509.02324 | null |
| 2025-09-02 | Balanced Multimodal Learning: An Unidirectional Dynamic Interaction Perspective | Shijie Wang et.al. | 2509.02281 | null |
| 2025-09-02 | Content and Engagement Trends in COVID-19 YouTube Videos: Evidence from the Late Pandemic | Nirmalya Thakur et.al. | 2509.01954 | null |
| 2025-09-01 | OpenVision 2: A Family of Generative Pretrained Visual Encoders for Multimodal Learning | Yanqing Liu et.al. | 2509.01644 | link |
| 2025-09-01 | Identity-Preserving Text-to-Video Generation via Training-Free Prompt, Image, and Guidance Enhancement | Jiayi Gao et.al. | 2509.01362 | null |
| 2025-08-29 | Integrating Pathology and CT Imaging for Personalized Recurrence Risk Prediction in Renal Cancer | Daniël Boeke et.al. | 2508.21581 | null |
| 2025-08-27 | Bangla-Bayanno: A 52K-Pair Bengali Visual Question Answering Dataset with LLM-Assisted Translation Refinement | Mohammed Rakibul Hasan et.al. | 2508.19887 | null |
| 2025-08-27 | AIM: Adaptive Intra-Network Modulation for Balanced Multimodal Learning | Shu Shen et.al. | 2508.19769 | null |
| 2025-08-25 | BTW: A Non-Parametric Variance Stabilization Framework for Multimodal Model Integration | Jun Hou et.al. | 2508.18551 | null |
| 2025-08-22 | Can VLMs Recall Factual Associations From Visual References? | Dhananjay Ashok et.al. | 2508.18297 | null |
| 2025-08-20 | Human-like Content Analysis for Generative AI with Language-Grounded Sparse Encoders | Yiming Tang et.al. | 2508.18236 | null |
| 2025-08-24 | Social-MAE: A Transformer-Based Multimodal Autoencoder for Face and Voice | Hugo Bohy et.al. | 2508.17502 | link |
| 2025-08-24 | Multimodal Representation Learning Conditioned on Semantic Relations | Yang Qiao et.al. | 2508.17497 | null |
| 2025-08-24 | SEER-VAR: Semantic Egocentric Environment Reasoner for Vehicle Augmented Reality | Yuzhi Lai et.al. | 2508.17255 | null |
| 2025-08-10 | An Embodied AR Navigation Agent: Integrating BIM with Retrieval-Augmented Generation for Language Guidance | Hsuan-Kung Yang et.al. | 2508.16602 | null |
| 2025-08-22 | Disentangled Multi-modal Learning of Histology and Transcriptomics for Cancer Characterization | Yupei Zhang et.al. | 2508.16479 | null |
| 2025-08-22 | A Multimodal-Multitask Framework with Cross-modal Relation and Hierarchical Interactive Attention for Semantic Comprehension | Mohammad Zia Ur Rehman et.al. | 2508.16300 | null |
| 2025-08-21 | Lang2Lift: A Framework for Language-Guided Pallet Detection and Pose Estimation Integrated in Autonomous Outdoor Forklift Operation | Huy Hoang Nguyen et.al. | 2508.15427 | null |
| 2025-08-21 | DesignCLIP: Multimodal Learning with CLIP for Design Patent Understanding | Zhu Wang et.al. | 2508.15297 | null |
| 2025-08-20 | MoEcho: Exploiting Side-Channel Attacks to Compromise User Privacy in Mixture-of-Experts LLMs | Ruyi Ding et.al. | 2508.15036 | null |
| 2025-08-19 | Beyond Simple Edits: Composed Video Retrieval with Dense Modifications | Omkar Thawakar et.al. | 2508.14039 | link |
| 2025-08-19 | CrafterDojo: A Suite of Foundation Models for Building Open-Ended Embodied Agents in Crafter | Junyeong Park et.al. | 2508.13530 | null |
| 2025-08-19 | CAST: Counterfactual Labels Improve Instruction Following in Vision-Language-Action Models | Catherine Glossop et.al. | 2508.13446 | null |
| 2025-08-18 | SPANER: Shared Prompt Aligner for Multimodal Semantic Representation | Thye Shan Ng et.al. | 2508.13387 | null |
| 2025-08-18 | Eyes on the Image: Gaze Supervised Multimodal Learning for Chest X-ray Diagnosis and Report Generation | Tanjim Islam Riju et.al. | 2508.13068 | null |
| 2025-08-17 | Inverse-LLaVA: Eliminating Alignment Pre-training Through Text-to-Vision Mapping | Xuhui Zhan et.al. | 2508.12466 | link |
| 2025-08-16 | MOVER: Multimodal Optimal Transport with Volume-based Embedding Regularization | Haochen You et.al. | 2508.12149 | null |
| 2025-08-16 | ExploreVLM: Closed-Loop Robot Exploration Task Planning with Vision-Language Models | Zhichen Lou et.al. | 2508.11918 | null |
| 2025-08-13 | MANGO: Multimodal Attention-based Normalizing Flow Approach to Fusion Learning | Thanh-Dat Truong et.al. | 2508.10133 | null |
| 2025-08-13 | Empowering Morphing Attack Detection using Interpretable Image-Text Foundation Model | Sushrut Patwardhan et.al. | 2508.10110 | null |
| 2025-08-12 | LPGNet: A Lightweight Network with Parallel Attention and Gated Fusion for Multimodal Emotion Recognition | Zhining He et.al. | 2508.08925 | null |
| 2025-08-12 | Multimodal learning enables instant ionizing radiation alerts on unmodified mobile phones for real-world emergency response | Yanfeng Xie et.al. | 2508.08541 | null |
| 2025-08-11 | BadPromptFL: A Novel Backdoor Threat to Prompt-based Federated Learning in Multimodal Models | Maozhen Zhang et.al. | 2508.08040 | null |
| 2025-08-11 | A Trustworthy Method for Multimodal Emotion Recognition | Junxiao Xue et.al. | 2508.07625 | null |
| 2025-08-10 | Invert4TVG: A Temporal Video Grounding Framework with Inversion Tasks for Enhanced Action Understanding | Zhaoyu Chen et.al. | 2508.07388 | null |
| 2025-08-10 | FLUID: Flow-Latent Unified Integration via Token Distillation for Expert Specialization in Multimodal Learning | Van Duc Cuong et.al. | 2508.07264 | null |
| 2025-08-09 | Can Multitask Learning Enhance Model Explainability? | Hiba Najjar et.al. | 2508.06966 | null |
| 2025-08-09 | Intrinsic Explainability of Multimodal Learning for Crop Yield Prediction | Hiba Najjar et.al. | 2508.06939 | null |
| 2025-08-09 | Hardness-Aware Dynamic Curriculum Learning for Robust Multimodal Emotion Recognition with Missing Modalities | Rui Liu et.al. | 2508.06800 | null |
| 2025-08-08 | Early Detection of Pancreatic Cancer Using Multimodal Learning on Electronic Health Records | Mosbah Aouad et.al. | 2508.06627 | null |
| 2025-08-07 | Surformer v1: Transformer-Based Surface Classification Using Tactile and Vision Features | Manish Kansana et.al. | 2508.06566 | null |
| 2025-08-06 | Grounding Emotion Recognition with Visual Prototypes: VEGA – Revisiting CLIP in MERC | Guanyu Hu et.al. | 2508.06564 | null |
| 2025-08-08 | Text as Any-Modality for Zero-Shot Classification by Consistent Prompt Tuning | Xiangyu Wu et.al. | 2508.06382 | null |
| 2025-08-08 | ECMF: Enhanced Cross-Modal Fusion for Multimodal Emotion Recognition in MER-SEMI Challenge | Juewen Hu et.al. | 2508.05991 | null |
| 2025-08-07 | Analyzing the Impact of Multimodal Perception on Sample Complexity and Optimization Landscapes in Imitation Learning | Luai Abuelsamen et.al. | 2508.05077 | null |
| 2025-08-07 | MAG-Nav: Language-Driven Object Navigation Leveraging Memory-Reserved Active Grounding | Weifan Zhang et.al. | 2508.05021 | null |
| 2025-08-06 | Decoding the Multimodal Maze: A Systematic Review on the Adoption of Explainability in Multimodal Attention-based Models | Md Raisul Kibria et.al. | 2508.04427 | null |
| 2025-08-06 | Length Matters: Length-Aware Transformer for Temporal Sentence Grounding | Yifan Wang et.al. | 2508.04299 | null |
| 2025-08-06 | SVC 2025: the First Multimodal Deception Detection Challenge | Xun Lin et.al. | 2508.04129 | null |
| 2025-07-29 | Multimodal Video Emotion Recognition with Reliable Reasoning Priors | Zhepeng Wang et.al. | 2508.03722 | null |
| 2025-08-05 | T2UE: Generating Unlearnable Examples from Text Descriptions | Xingjun Ma et.al. | 2508.03091 | null |
| 2025-08-04 | MonoDream: Monocular Vision-Language Navigation with Panoramic Dreaming | Shuo Wang et.al. | 2508.02549 | null |
| 2025-08-04 | Hierarchical MoE: Continuous Multimodal Emotion Recognition with Incomplete and Asynchronous Inputs | Yitong Zhu et.al. | 2508.02133 | null |
| 2025-08-04 | “Harmless to You, Hurtful to Me!”: Investigating the Detection of Toxic Languages Grounded in the Perspective of Youth | Yaqiong Li et.al. | 2508.02094 | null |
| 2025-08-03 | DRKF: Decoupled Representations with Knowledge Fusion for Multimodal Emotion Recognition | Peiyuan Jiang et.al. | 2508.01644 | null |
| 2025-08-02 | A Large-Scale Benchmark of Cross-Modal Learning for Histology and Gene Expression in Spatial Transcriptomics | Rushin H. Gindra et.al. | 2508.01490 | null |
| 2025-08-02 | AffectGPT-R1: Leveraging Reinforcement Learning for Open-Vocabulary Emotion Recognition | Zheng Lian et.al. | 2508.01318 | null |
| 2025-07-29 | SmartCLIP: Modular Vision-language Alignment with Identification Guarantees | Shaoan Xie et.al. | 2507.22264 | null |
| 2025-07-29 | MAGE: Multimodal Alignment and Generation Enhancement via Bridging Visual and Semantic Spaces | Shaojun E et.al. | 2507.21741 | link |
| 2025-07-29 | Sync-TVA: A Graph-Attention Framework for Multimodal Emotion Recognition with Cross-Modal Fusion | Zeyu Deng et.al. | 2507.21395 | null |
| 2025-07-28 | On the Limits of Hierarchically Embedded Logic in Classical Neural Networks | Bill Cochran et.al. | 2507.20960 | null |
| 2025-07-28 | TransPrune: Token Transition Pruning for Efficient Large Vision-Language Model | Ao Li et.al. | 2507.20630 | link |
| 2025-07-25 | Enhancing Speech Emotion Recognition Leveraging Aligning Timestamps of ASR Transcripts and Speaker Diarization | Hsuan-Yu Wang et.al. | 2507.19356 | null |
| 2025-07-25 | SimMLM: A Simple Framework for Multi-modal Learning with Missing Modality | Sijie Li et.al. | 2507.19264 | null |
| 2025-07-24 | Deep Learning for Blood-Brain Barrier Permeability Prediction | Zihan Yang et.al. | 2507.18557 | null |
| 2025-07-23 | RoadBench: A Vision-Language Foundation Model and Benchmark for Road Damage Understanding | Xi Xiao et.al. | 2507.17353 | null |
| 2025-07-22 | VL-CLIP: Enhancing Multimodal Recommendations via Visual Grounding and LLM-Augmented CLIP Embeddings | Ramin Giahi et.al. | 2507.17080 | null |
| 2025-07-20 | TD-Interpreter: Enhancing the Understanding of Timing Diagrams with Visual-Language Learning | Jie He et.al. | 2507.16844 | null |
| 2025-07-21 | Applying multimodal learning to Classify transient Detections Early (AppleCiDEr) I: Data set, methods, and infrastructure | Alexandra Junell et.al. | 2507.16088 | null |
| 2025-07-21 | MEETI: A Multimodal ECG Dataset from MIMIC-IV-ECG with Signals, Images, Features and Interpretations | Deyun Zhang et.al. | 2507.15255 | null |
| 2025-07-20 | LeAdQA: LLM-Driven Context-Aware Temporal Grounding for Video Question Answering | Xinxin Dong et.al. | 2507.14784 | null |
| 2025-07-18 | MaskHOI: Robust 3D Hand-Object Interaction Estimation via Masked Pre-training | Yuechen Xie et.al. | 2507.13673 | null |
| 2025-07-17 | City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning | Penglei Sun et.al. | 2507.12795 | null |
| 2025-07-17 | A Comprehensive Survey of Electronic Health Record Modeling: From Deep Learning Approaches to Large Language Models | Weijieying Ren et.al. | 2507.12774 | null |
| 2025-07-15 | Partitioner Guided Modal Learning Framework | Guimin Hu et.al. | 2507.11661 | null |
| 2025-07-15 | A Robust Incomplete Multimodal Low-Rank Adaptation Approach for Emotion Recognition | Xinkui Zhao et.al. | 2507.11202 | null |
| 2025-07-14 | Ground-Compose-Reinforce: Tasking Reinforcement Learning Agents through Formal Language | Andrew C. Li et.al. | 2507.10741 | null |
| 2025-07-14 | Boosting Multimodal Learning via Disentangled Gradient Learning | Shicai Wei et.al. | 2507.10213 | null |
| 2025-07-21 | Improving Multimodal Learning via Imbalanced Learning | Shicai Wei et.al. | 2507.10203 | null |
| 2025-07-13 | HMID-Net: An Exploration of Masked Image Modeling and Knowledge Distillation in Hyperbolic Space | Changli Wang et.al. | 2507.09487 | null |
| 2025-07-09 | Robust Multimodal Learning Framework For Intake Gesture Detection Using Contactless Radar and Wearable IMU Sensors | Chunzhuo Wang et.al. | 2507.07261 | null |
| 2025-07-09 | Explainable Artificial Intelligence in Biomedical Image Analysis: A Comprehensive Survey | Getamesay Haile Dagnaw et.al. | 2507.07148 | null |
| 2025-07-08 | Enhancing Synthetic CT from CBCT via Multimodal Fusion and End-To-End Registration | Maximilian Tschuchnig et.al. | 2507.06067 | null |
| 2025-07-08 | Graph Learning | Feng Xia et.al. | 2507.05636 | null |
| 2025-07-07 | Can Video LLMs Refuse to Answer? Alignment for Answerability in Video Large Language Models | Eunseop Yoon et.al. | 2507.04976 | null |
| 2025-07-07 | From Vision To Language through Graph of Events in Space and Time: An Explainable Self-supervised Approach | Mihai Masala et.al. | 2507.04815 | null |
| 2025-07-07 | MODA: MOdular Duplex Attention for Multimodal Perception, Cognition, and Emotion Understanding | Zhicheng Zhang et.al. | 2507.04635 | null |
| 2025-07-10 | DMER-Ranker: Learning to Rank Emotion Descriptions in the Absence of Ground Truth | Zheng Lian et.al. | 2507.04278 | null |
| 2025-07-05 | Unlocking Compositional Control: Self-Supervision for LVLM-Based Image Generation | Fernando Gabriela Garcia et.al. | 2507.04151 | null |
| 2025-07-03 | Intelligent Histology for Tumor Neurosurgery | Xinhai Hou et.al. | 2507.03037 | null |
| 2025-07-01 | Gated Recursive Fusion: A Stateful Approach to Scalable Multimodal Transformers | Yusuf Shihata et.al. | 2507.02985 | null |
| 2025-07-02 | TAGF: Time-aware Gated Fusion for Multimodal Valence-Arousal Estimation | Yubeen Lee et.al. | 2507.02080 | null |
| 2025-06-27 | XxaCT-NN: Structure Agnostic Multimodal Learning for Materials Science | Jithendaraa Subramanian et.al. | 2507.01054 | null |
| 2025-06-27 | Test-Time Consistency in Vision Language Models | Shih-Han Chou et.al. | 2506.22395 | null |
| 2025-06-27 | Sheaf-Based Decentralized Multimodal Learning for Next-Generation Wireless Communication Systems | Abdulmomen Ghalkha et.al. | 2506.22374 | null |
| 2025-06-26 | ImplicitQA: Going beyond frames towards Implicit Video Reasoning | Sirnam Swetha et.al. | 2506.21742 | null |
| 2025-06-28 | G$^{2}$D: Boosting Multimodal Learning with Gradient-Guided Distillation | Mohammed Rakib et.al. | 2506.21514 | null |
| 2025-06-26 | V2X-REALM: Vision-Language Model-Based Robust End-to-End Cooperative Autonomous Driving with Adaptive Long-Tail Modeling | Junwei You et.al. | 2506.21041 | null |
| 2025-06-26 | TRIDENT: Tri-Modal Molecular Representation Learning with Taxonomic Annotations and Local Correspondence | Feng Jiang et.al. | 2506.21028 | null |
| 2025-06-26 | Where is AIED Headed? Key Topics and Emerging Frontiers (2020-2024) | Shihui Feng et.al. | 2506.20971 | null |
| 2025-06-24 | Emergence of Text Readability in Vision Language Models | Jaeyoo Park et.al. | 2506.19389 | null |
| 2025-06-27 | Haptic-ACT – Pseudo Oocyte Manipulation by a Robot Using Multimodal Information and Action Chunking with Transformers | Pedro Miguel Uriguen Eljuri et.al. | 2506.18212 | null |
| 2025-06-21 | Can Generated Images Serve as a Viable Modality for Text-Centric Multimodal Learning? | Yuesheng Huang et.al. | 2506.17623 | null |
| 2025-06-24 | AI-based Multimodal Biometrics for Detecting Smartphone Distractions: Application to Online Learning | Alvaro Becerra et.al. | 2506.17364 | null |
| 2025-06-20 | With Limited Data for Multimodal Alignment, Let the STRUCTURE Guide You | Fabian Gröger et.al. | 2506.16895 | null |
| 2025-06-18 | A Strong View-Free Baseline Approach for Single-View Image Guided Point Cloud Completion | Fangzhou Lin et.al. | 2506.15747 | null |
| 2025-06-18 | Foundation of Affective Computing and Interaction | Changzeng Fu et.al. | 2506.15497 | null |
| 2025-06-18 | video-SALMONN 2: Captioning-Enhanced Audio-Visual Large Language Models | Changli Tang et.al. | 2506.15220 | null |
| 2025-06-17 | Can Pretrained Vision-Language Embeddings Alone Guide Robot Navigation? | Nitesh Subedi et.al. | 2506.14507 | link |
| 2025-06-16 | Comparison of ConvNeXt and Vision-Language Models for Breast Density Assessment in Screening Mammography | Yusdivia Molina-Román et.al. | 2506.13964 | null |
| 2025-06-16 | A Survey on World Models Grounded in Acoustic Physical Information | Xiaoliang Chen et.al. | 2506.13833 | link |
| 2025-06-16 | A Survey on Imitation Learning for Contact-Rich Tasks in Robotics | Toshiaki Tsuji et.al. | 2506.13498 | null |
| 2025-06-16 | Fatigue-Aware Adaptive Interfaces for Wearable Devices Using Deep Learning | Yikan Wang et.al. | 2506.13203 | null |
| 2025-06-15 | Learning to Fuse: Modality-Aware Adaptive Scheduling for Robust Multimodal Foundation Models | Liam Bennett et.al. | 2506.12733 | null |
| 2025-06-14 | Advances in LLMs with Focus on Reasoning, Adaptability, Efficiency and Ethics | Asifullah khan et.al. | 2506.12365 | null |
| 2025-06-14 | GSDNet: Revisiting Incomplete Multimodal-Diffusion from Graph Spectrum Perspective for Conversation Emotion Recognition | Yuntao Shou et.al. | 2506.12325 | null |
| 2025-06-16 | Improving Multimodal Learning Balance and Sufficiency through Data Remixing | Xiaoyu Ma et.al. | 2506.11550 | null |
| 2025-06-13 | RollingQ: Reviving the Cooperation Dynamics in Multimodal Transformer | Haotian Ni et.al. | 2506.11465 | null |
| 2025-06-12 | Combining Log Data and Collaborative Dialogue Features to Predict Project Quality in Middle School AI Education | Conrad Borchers et.al. | 2506.11326 | null |
| 2025-06-12 | Developing a High-performance Framework for Speech Emotion Recognition in Naturalistic Conditions Challenge for Emotional Attribute Prediction | Thanathai Lertpetchpun et.al. | 2506.10930 | null |
| 2025-06-12 | Towards Robust Multimodal Emotion Recognition under Missing Modalities and Distribution Shifts | Guowei Zhong et.al. | 2506.10452 | link |
| 2025-06-09 | Segment Any Architectural Facades (SAAF): An automatic segmentation model for building facades, walls and windows based on multimodal semantics guidance | Peilin Li et.al. | 2506.09071 | null |
| 2025-06-10 | Enhancing Synthetic CT from CBCT via Multimodal Fusion: A Study on the Impact of CBCT Quality and Alignment | Maximilian Tschuchnig et.al. | 2506.08716 | null |
| 2025-06-10 | MOSAIC-F: A Framework for Enhancing Students’ Oral Presentation Skills through Personalized Feedback | Alvaro Becerra et.al. | 2506.08634 | null |
| 2025-06-09 | Language-Grounded Hierarchical Planning and Execution with Multi-Robot 3D Scene Graphs | Jared Strader et.al. | 2506.07454 | null |
| 2025-06-08 | A Narrative Review on Large AI Models in Lung Cancer Screening, Diagnosis, and Treatment Planning | Jiachen Zhong et.al. | 2506.07236 | null |
| 2025-06-08 | Hallucination at a Glance: Controlled Visual Edits and Fine-Grained Multimodal Learning | Tianyi Bai et.al. | 2506.07227 | null |
| 2025-06-08 | A Layered Self-Supervised Knowledge Distillation Framework for Efficient Multimodal Learning on the Edge | Tarique Dahri et.al. | 2506.07055 | null |
| 2025-06-06 | Astra: Toward General-Purpose Mobile Robots via Hierarchical Multimodal Learning | Sheng Chen et.al. | 2506.06205 | null |
| 2025-06-06 | Bridging Perception and Action: Spatially-Grounded Mid-Level Representations for Robot Generalization | Jonathan Yang et.al. | 2506.06196 | null |
| 2025-06-06 | MoralCLIP: Contrastive Alignment of Vision-and-Language Representations with Moral Foundations Theory | Ana Carolina Condez et.al. | 2506.05696 | null |
| 2025-06-03 | Attention-based transformer models for image captioning across languages: An in-depth survey and evaluation | Israa A. Albadarneh et.al. | 2506.05399 | null |
| 2025-06-05 | Towards Language-Augmented Multi-Agent Deep Reinforcement Learning | Maxime Toquebiau et.al. | 2506.05236 | null |
| 2025-06-05 | Quantifying Cross-Modality Memorization in Vision-Language Models | Yuxin Wen et.al. | 2506.05198 | null |
| 2025-06-05 | A Survey on Vietnamese Document Analysis and Recognition: Challenges and Future Directions | Anh Le et.al. | 2506.05061 | null |
| 2025-06-04 | EmoArt: A Multidimensional Dataset for Emotion-Aware Artistic Generation | Cheng Zhang et.al. | 2506.03652 | null |
| 2025-06-03 | Enriching Location Representation with Detailed Semantic Information | Junyuan Liu et.al. | 2506.02744 | null |
| 2025-06-02 | Entity Image and Mixed-Modal Image Retrieval Datasets | Cristian-Ioan Blaga et.al. | 2506.02291 | null |
| 2025-06-02 | Confidence-Aware Self-Distillation for Multimodal Sentiment Analysis with Incomplete Modalities | Yanxi Luo et.al. | 2506.01490 | null |
| 2025-06-02 | Towards Scalable Video Anomaly Retrieval: A Synthetic Video-Text Benchmark | Shuyu Yang et.al. | 2506.01466 | null |
| 2025-06-02 | Agentic Episodic Control | Xidong Yang et.al. | 2506.01442 | null |
| 2025-06-01 | Leveraging CLIP Encoder for Multimodal Emotion Recognition | Yehun Song et.al. | 2506.00903 | null |
| 2025-06-01 | GIA-MIC: Multimodal Emotion Recognition with Gated Interactive Attention and Modality-Invariant Learning Constraints | Jiajun He et.al. | 2506.00865 | null |
| 2025-06-01 | TIME: TabPFN-Integrated Multimodal Engine for Robust Tabular-Image Learning | Jiaqi Luo et.al. | 2506.00813 | null |
| 2025-05-30 | Beyond Atomic Geometry Representations in Materials Science: A Human-in-the-Loop Multimodal Framework | Can Polat et.al. | 2506.00302 | null |
| 2025-05-30 | Mixpert: Mitigating Multimodal Learning Conflicts with Efficient Mixture-of-Vision-Experts | Xin He et.al. | 2505.24541 | null |
| 2025-05-29 | Towards disentangling the contributions of articulation and acoustics in multimodal phoneme recognition | Sean Foley et.al. | 2505.24059 | null |
| 2025-06-02 | Jigsaw-R1: A Study of Rule-based Visual Reinforcement Learning with Jigsaw Puzzles | Zifu Wang et.al. | 2505.23590 | link |
| 2025-05-29 | OmniEarth-Bench: Towards Holistic Evaluation of Earth’s Six Spheres and Cross-Spheres Interactions with Multimodal Observational Earth Data | Fengxiang Wang et.al. | 2505.23522 | null |
| 2025-05-29 | Bidirectional predictive coding | Gaspard Oliviers et.al. | 2505.23415 | null |
| 2025-05-29 | Deep Modeling and Optimization of Medical Image Classification | Yihang Wu et.al. | 2505.23040 | link |
| 2025-05-30 | EmotionTalk: An Interactive Chinese Multimodal Emotion Dataset With Rich Annotations | Haoqin Sun et.al. | 2505.23018 | link |
| 2025-05-27 | A Cross Modal Knowledge Distillation & Data Augmentation Recipe for Improving Transcriptomics Representations through Morphological Features | Ihab Bendidi et.al. | 2505.21317 | null |
| 2025-05-26 | Multimodal Emotion Recognition in Conversations: A Survey of Methods, Trends, Challenges and Prospects | Chengyan Wu et.al. | 2505.20511 | null |
| 2025-05-25 | PDFBench: A Benchmark for De novo Protein Design from Function | Jiahao Kuang et.al. | 2505.20346 | null |
| 2025-05-26 | Learning Optimal Multimodal Information Bottleneck Representations | Qilong Wu et.al. | 2505.19996 | null |
| 2025-05-26 | ALAS: Measuring Latent Speech-Text Alignment For Spoken Language Understanding In Multimodal LLMs | Pooneh Mousavi et.al. | 2505.19937 | null |
| 2025-05-26 | Multiplicity is an Inevitable and Inherent Challenge in Multimodal Learning | Sanghyuk Chun et.al. | 2505.19614 | null |
| 2025-05-26 | Rethinking Gating Mechanism in Sparse MoE: Handling Arbitrary Modality Inputs with Confidence-Guided Gate | Liangwei Nathan Zheng et.al. | 2505.19525 | null |
| 2025-05-25 | Where Paths Collide: A Comprehensive Survey of Classic and Learning-Based Multi-Agent Pathfinding | Shiyue Wang et.al. | 2505.19219 | null |
| 2025-05-25 | I2MoE: Interpretable Multimodal Interaction-aware Mixture-of-Experts | Jiayi Xin et.al. | 2505.19190 | link |
| 2025-05-23 | Segment Anyword: Mask Prompt Inversion for Open-Set Grounded Segmentation | Zhihua Liu et.al. | 2505.17994 | null |
| 2025-05-23 | HoloLLM: Multisensory Foundation Model for Language-Grounded Human Sensing and Reasoning | Chuhao Zhou et.al. | 2505.17645 | null |
| 2025-05-23 | RoHyDR: Robust Hybrid Diffusion Recovery for Incomplete Multimodal Emotion Recognition | Yuehan Jin et.al. | 2505.17501 | null |
| 2025-05-21 | NeSyGeo: A Neuro-Symbolic Framework for Multimodal Geometric Reasoning Data Generation | Weiming Wu et.al. | 2505.17121 | null |
| 2025-05-22 | ICYM2I: The illusion of multimodal informativeness under missingness | Young Sang Choi et.al. | 2505.16953 | link |
| 2025-05-22 | Grounding Chest X-Ray Visual Question Answering with Generated Radiology Reports | Francesco Dalla Serra et.al. | 2505.16624 | null |
| 2025-05-22 | Multimodal Online Federated Learning with Modality Missing in Internet of Things | Heqiang Wang et.al. | 2505.16138 | null |
| 2025-05-21 | Robust Multimodal Learning via Entropy-Gated Contrastive Fusion | Leon Chlon et.al. | 2505.15417 | null |
| 2025-05-21 | EndoVLA: Dual-Phase Vision-Language-Action Model for Autonomous Tracking in Endoscopy | Chi Kit Ng et.al. | 2505.15206 | null |
| 2025-05-21 | Graph Foundation Models: A Comprehensive Survey | Zehong Wang et.al. | 2505.15116 | link |
| 2025-05-19 | HR-VILAGE-3K3M: A Human Respiratory Viral Immunization Longitudinal Gene Expression Dataset for Systems Immunity | Xuejun Sun et.al. | 2505.14725 | link |
| 2025-05-20 | Spiking Neural Networks with Temporal Attention-Guided Adaptive Fusion for imbalanced Multi-modal Learning | Jiangrong Shen et.al. | 2505.14535 | null |
| 2025-05-20 | Multimodal Mixture of Low-Rank Experts for Sentiment Analysis and Emotion Recognition | Shuo Zhang et.al. | 2505.14143 | null |
| 2025-05-20 | LoVR: A Benchmark for Long Video Retrieval in Multimodal Contexts | Qifeng Cai et.al. | 2505.13928 | link |
| 2025-05-17 | Beyond Retrieval: Joint Supervision and Multimodal Document Ranking for Textbook Question Answering | Hessa Alawwad et.al. | 2505.13520 | null |
| 2025-05-19 | AdaToken-3D: Dynamic Spatial Gating for Efficient 3D Large Multimodal-Models Reasoning | Kai Zhang et.al. | 2505.12782 | null |
| 2025-05-19 | PLAICraft: Large-Scale Time-Aligned Vision-Speech-Action Dataset for Embodied AI | Yingchen He et.al. | 2505.12707 | null |
| 2025-05-17 | Understanding the Capabilities of Molecular Graph Neural Networks in Materials Science Through Multimodal Learning and Physical Context Encoding | Can Polat et.al. | 2505.12137 | null |
| 2025-05-17 | SafeVid: Toward Safety Aligned Video Large Multimodal Models | Yixu Wang et.al. | 2505.11926 | null |
| 2025-05-16 | GeoMM: On Geodesic Perspective for Multi-modal Learning | Shibin Mei et.al. | 2505.11216 | null |
| 2025-05-15 | Incorporating brain-inspired mechanisms for multimodal learning in artificial intelligence | Xiang He et.al. | 2505.10176 | link |
| 2025-05-14 | VTLA: Vision-Tactile-Language-Action Model with Preference Learning for Insertion Manipulation | Chaofan Zhang et.al. | 2505.09577 | null |
| 2025-05-16 | Grounding Synthetic Data Evaluations of Language Models in Unsupervised Document Corpora | Michael Majurski et.al. | 2505.08905 | link |
| 2025-05-13 | Decoupled Multimodal Prototypes for Visual Recognition with Missing Modalities | Jueqing Lu et.al. | 2505.08283 | null |
| 2025-05-11 | MMiC: Mitigating Modality Incompleteness in Clustered Federated Learning | Lishan Yang et.al. | 2505.06911 | null |
| 2025-05-10 | Batch Augmentation with Unimodal Fine-tuning for Multimodal Learning | H M Dipu Kabir et.al. | 2505.06592 | link |
| 2025-05-10 | TACFN: Transformer-based Adaptive Cross-modal Fusion Network for Multimodal Emotion Recognition | Feng Liu et.al. | 2505.06536 | link |
| 2025-05-09 | NSF-MAP: Neurosymbolic Multimodal Fusion for Robust and Interpretable Anomaly Prediction in Assembly Pipelines | Chathurangi Shyalika et.al. | 2505.06333 | link |
| 2025-05-09 | Multimodal Sentiment Analysis on CMU-MOSEI Dataset using Transformer-based Models | Jugal Gajjar et.al. | 2505.06110 | null |
| 2025-05-09 | Why Are You Wrong? Counterfactual Explanations for Language Grounding with 3D Objects | Tobias Preintner et.al. | 2505.06030 | link |
| 2025-05-08 | The Moon’s Many Faces: A Single Unified Transformer for Multimodal Lunar Reconstruction | Tom Sander et.al. | 2505.05644 | null |
| 2025-05-07 | OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning | Xianhang Li et.al. | 2505.04601 | null |
| 2025-05-02 | Mapping the Climate Change Landscape on TikTok | Alessia Galdeman et.al. | 2505.03813 | null |
| 2025-05-06 | Reinforced Correlation Between Vision and Language for Precise Medical AI Assistant | Haonan Wang et.al. | 2505.03380 | null |
| 2025-05-06 | A Vision-Language Model for Focal Liver Lesion Classification | Song Jian et.al. | 2505.03350 | null |
| 2025-05-06 | SonicRAG : High Fidelity Sound Effects Synthesis Based on Retrival Augmented Generation | Yu-Ren Guo et.al. | 2505.03244 | null |
| 2025-05-05 | The Multimodal Paradox: How Added and Missing Modalities Shape Bias and Performance in Multimodal AI | Kishore Sampath et.al. | 2505.03020 | null |
| 2025-05-02 | Aggregation of Dependent Expert Distributions in Multimodal Variational Autoencoders | Rogelio A Mancisidor et.al. | 2505.01134 | null |
| 2025-04-30 | Investigating Zero-Shot Diagnostic Pathology in Vision-Language Models with Efficient Prompt Design | Vasudev Sharma et.al. | 2505.00134 | null |
| 2025-04-28 | DEEMO: De-identity Multimodal Emotion Recognition and Reasoning | Deng Li et.al. | 2504.19549 | null |
| 2025-04-27 | Platonic Grounding for Efficient Multimodal Language Models | Moulik Choraria et.al. | 2504.19327 | null |
| 2025-04-27 | DeepSPG: Exploring Deep Semantic Prior Guidance for Low-light Image Enhancement with Multimodal Learning | Jialang Lu et.al. | 2504.19127 | null |
| 2025-04-23 | A multi-scale vision transformer-based multimodal GeoAI model for mapping Arctic permafrost thaw | Wenwen Li et.al. | 2504.17822 | null |
| 2025-04-23 | Monte Carlo Planning with Large Language Model for Text-Based Game Agents | Zijing Shi et.al. | 2504.16855 | null |
| 2025-04-23 | Towards Explainable AI: Multi-Modal Transformer for Video-based Image Description Generation | Lakshita Agarwal et.al. | 2504.16788 | null |
| 2025-04-23 | PsyCounAssist: A Full-Cycle AI-Powered Psychological Counseling Assistant System | Xianghe Liu et.al. | 2504.16573 | null |
| 2025-04-22 | CLIP-IT: CLIP-based Pairing for Histology Images Classification | Banafsheh Karimian et.al. | 2504.16181 | null |
| 2025-04-22 | SAGA: Semantic-Aware Gray color Augmentation for Visible-to-Thermal Domain Adaptation across Multi-View Drone and Ground-Based Vision Systems | Manjunath D et.al. | 2504.15728 | null |
| 2025-04-21 | Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models | Guo Chen et.al. | 2504.15271 | null |
| 2025-04-21 | IoT-AMLHP: Aligned Multimodal Learning of Header-Payload Representations for Resource-Efficient Malicious IoT Traffic Classification | Fengyuan Nie et.al. | 2504.14833 | null |
| 2025-04-19 | Text-Audio-Visual-conditioned Diffusion Model for Video Saliency Prediction | Li Yu et.al. | 2504.14267 | null |
| 2025-04-19 | PEFT A2Z: Parameter-Efficient Fine-Tuning Survey for Large Language and Vision Models | Nusrat Jahan Prottasha et.al. | 2504.14117 | null |
| 2025-04-18 | Are you SURE? Enhancing Multimodal Pretraining with Missing Modalities through Uncertainty Estimation | Duy A. Nguyen et.al. | 2504.13465 | null |
| 2025-04-17 | A Survey on Cross-Modal Interaction Between Music and Multimodal Data | Sifei Li et.al. | 2504.12796 | null |
| 2025-04-16 | An Algebraic Extension of Intuitionistic Linear Logic: The $L_!^S$-Calculus and Its Categorical Model | Alejandro Díaz-Caro et.al. | 2504.12128 | null |
| 2025-04-16 | FedEPA: Enhancing Personalization and Modality Alignment in Multimodal Federated Learning | Yu Zhang et.al. | 2504.12025 | null |
| 2025-04-15 | Leveraging multimodal explanatory annotations for video interpretation with Modality Specific Dataset | Elisa Ancarani et.al. | 2504.11232 | null |
| 2025-04-14 | Improving Multimodal Hateful Meme Detection Exploiting LMM-Generated Knowledge | Maria Tzelepi et.al. | 2504.09914 | null |
| 2025-04-13 | Automatic Detection of Intro and Credits in Video using CLIP and Multihead Attention | Vasilii Korolkov et.al. | 2504.09738 | null |
| 2025-04-13 | Vision-Language Model for Object Detection and Segmentation: A Review and Evaluation | Yongchao Feng et.al. | 2504.09480 | link |
| 2025-04-09 | Zeus: Zero-shot LLM Instruction for Union Segmentation in Multimodal Medical Imaging | Siyuan Dai et.al. | 2504.07336 | null |
| 2025-04-07 | Resource-Efficient Beam Prediction in mmWave Communications with Multimodal Realistic Simulation Framework | Yu Min Park et.al. | 2504.05187 | null |
| 2025-04-07 | Leveraging Label Potential for Enhanced Multimodal Emotion Recognition | Xuechun Shao et.al. | 2504.05158 | null |
| 2025-04-06 | FluentLip: A Phonemes-Based Two-stage Approach for Audio-Driven Lip Synthesis with Optical Flow Consistency | Shiyan Liu et.al. | 2504.04427 | null |
| 2025-04-04 | Interpretable Multimodal Learning for Tumor Protein-Metal Binding: Progress, Challenges, and Perspectives | Xiaokun Liu et.al. | 2504.03847 | null |
| 2025-04-04 | DML-RAM: Deep Multimodal Learning Framework for Robotic Arm Manipulation using Pre-trained Models | Sathish Kumar et.al. | 2504.03423 | null |
| 2025-04-02 | Towards Unified Referring Expression Segmentation Across Omni-Level Visual Target Granularities | Jing Liu et.al. | 2504.01954 | null |
| 2025-04-02 | Deep Learning-Driven Protein Structure Prediction and Design: Key Model Developments by Nobel Laureates and Multi-Domain Applications | Wanqing Yang et.al. | 2504.01490 | null |
| 2025-03-31 | Grounding Agent Reasoning in Image Schemas: A Neurosymbolic Approach to Embodied Cognition | François Olivier et.al. | 2503.24110 | null |
| 2025-03-31 | DANTE-AD: Dual-Vision Attention Network for Long-Term Audio Description | Adrienne Deganutti et.al. | 2503.24096 | null |
| 2025-03-31 | BeMERC: Behavior-Aware MLLM-based Framework for Multimodal Emotion Recognition in Conversation | Yumeng Fu et.al. | 2503.23990 | null |
| 2025-03-31 | Unimodal-driven Distillation in Multimodal Emotion Recognition with Dynamic Fusion | Jiagen Li et.al. | 2503.23721 | null |
| 2025-03-31 | HOIGen-1M: A Large-scale Dataset for Human-Object Interaction Video Generation | Kun Liu et.al. | 2503.23715 | null |
| 2025-03-27 | Graph-to-Vision: Multi-graph Understanding and Reasoning using Vision-Language Models | Ruizhou Li et.al. | 2503.21435 | null |
| 2025-03-27 | UGen: Unified Autoregressive Multimodal Model with Progressive Vocabulary Learning | Hongxuan Tang et.al. | 2503.21193 | null |
| 2025-03-27 | AdaMHF: Adaptive Multimodal Hierarchical Fusion for Survival Prediction | Shuaiyu Zhang et.al. | 2503.21124 | link |
| 2025-03-26 | GatedxLSTM: A Multimodal Affective Computing Approach for Emotion Recognition in Conversations | Yupei Li et.al. | 2503.20919 | null |
| 2025-03-26 | An Encoding of Interaction Nets in OCaml | Nikolaus Huber et.al. | 2503.20463 | null |
| 2025-03-27 | RGB-Th-Bench: A Dense benchmark for Visual-Thermal Understanding of Vision Language Models | Mehdi Moshtaghi et.al. | 2503.19654 | null |
| 2025-03-25 | VGAT: A Cancer Survival Analysis Framework Transitioning from Generative Visual Question Answering to Genomic Reconstruction | Zizhi Chen et.al. | 2503.19367 | link |
| 2025-03-25 | LRSCLIP: A Vision-Language Foundation Model for Aligning Remote Sensing Image with Longer Text | Weizhi Chen et.al. | 2503.19311 | link |
| 2025-03-24 | Adaptive Unimodal Regulation for Balanced Multimodal Information Acquisition | Chengxiang Huang et.al. | 2503.18595 | link |
| 2025-03-21 | Feature-Based Dual Visual Feature Extraction Model for Compound Multimodal Emotion Recognition | Ran Liu et.al. | 2503.17453 | link |
| 2025-03-21 | MTBench: A Multimodal Time Series Benchmark for Temporal Reasoning and Question Answering | Jialin Chen et.al. | 2503.16858 | null |
| 2025-03-20 | EVA-MED: An Enhanced Valence-Arousal Multimodal Emotion Dataset for Emotion Recognition | Xin Huang et.al. | 2503.16584 | null |
| 2025-03-18 | Do Multimodal Large Language Models Understand Welding? | Grigorii Khvatskii et.al. | 2503.16537 | null |
| 2025-03-19 | EarthScape: A Multimodal Dataset for Surficial Geologic Mapping and Earth Surface Analysis | Matthew Massey et.al. | 2503.15625 | link |
| 2025-03-19 | Optimal Transport Adapter Tuning for Bridging Modality Gaps in Few-Shot Remote Sensing Scene Classification | Zhong Ji et.al. | 2503.14938 | null |
| 2025-03-18 | HySurvPred: Multimodal Hyperbolic Embedding with Angle-Aware Hierarchical Contrastive Learning and Uncertainty Constraints for Survival Prediction | Jiaqi Yang et.al. | 2503.13862 | null |
| 2025-03-17 | Exploring 3D Activity Reasoning and Planning: From Implicit Human Intentions to Route-Aware Planning | Xueying Jiang et.al. | 2503.12974 | null |
| 2025-03-16 | BREEN: Bridge Data-Efficient Encoder-Free Multimodal Learning with Learnable Queries | Tianle Li et.al. | 2503.12446 | null |
| 2025-03-15 | Handling Weak Complementary Relationships for Audio-Visual Emotion Recognition | R. Gnana Praveen et.al. | 2503.12261 | null |
| 2025-03-14 | Cross-Modal Learning for Music-to-Music-Video Description Generation | Zhuoyuan Mao et.al. | 2503.11190 | null |
| 2025-03-20 | Unifying 2D and 3D Vision-Language Understanding | Ayush Jain et.al. | 2503.10745 | null |
| 2025-03-11 | TLA: Tactile-Language-Action Model for Contact-Rich Manipulation | Peng Hao et.al. | 2503.08548 | null |
| 2025-03-10 | Federated Multimodal Learning with Dual Adapters and Selective Pruning for Communication and Computational Efficiency | Duy Phuong Nguyen et.al. | 2503.07552 | link |
| 2025-03-10 | A Multimodal Benchmark Dataset and Model for Crop Disease Diagnosis | Xiang Liu et.al. | 2503.06973 | link |
| 2025-03-10 | HiSTF Mamba: Hierarchical Spatiotemporal Fusion with Multi-Granular Body-Spatial Modeling for High-Fidelity Text-to-Motion Generation | Xingzu Zhan et.al. | 2503.06897 | null |
| 2025-03-10 | Towards Generalization of Tactile Image Generation: Reference-Free Evaluation in a Leakage-Free Setting | Cagri Gungor et.al. | 2503.06860 | null |
| 2025-03-09 | Multimodal Emotion Recognition and Sentiment Analysis in Multi-Party Conversation Contexts | Aref Farhadipour et.al. | 2503.06805 | null |
| 2025-03-13 | DynCIM: Dynamic Curriculum for Imbalanced Multimodal Learning | Chengxuan Qian et.al. | 2503.06456 | link |
| 2025-03-05 | Beyond H&E: Unlocking Pathological Insights with Polarization via Self-supervised Learning | Yao Du et.al. | 2503.05933 | null |
| 2025-03-10 | R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcement Learning | Jiaxing Zhao et.al. | 2503.05379 | null |
| 2025-03-07 | Robust Multimodal Learning for Ophthalmic Disease Grading via Disentangled Representation | Xinkun Wang et.al. | 2503.05319 | null |
| 2025-03-06 | Large Language Models in Bioinformatics: A Survey | Zhenyu Wang et.al. | 2503.04490 | null |
| 2025-03-05 | Rebalanced Multimodal Learning with Data-aware Unimodal Sampling | Qingyuan Jiang et.al. | 2503.03792 | null |
| 2025-03-04 | Multimodal Deep Learning for Subtype Classification in Breast Cancer Using Histopathological Images and Gene Expression Data | Amin Honarmandi Shandiz et.al. | 2503.02849 | null |
| 2025-03-04 | Multimodal AI predicts clinical outcomes of drug combinations from preclinical data | Yepeng Huang et.al. | 2503.02781 | null |
| 2025-03-03 | Abn-BLIP: Abnormality-aligned Bootstrapping Language-Image Pre-training for Pulmonary Embolism Diagnosis and Report Generation from CTPA | Zhusi Zhong et.al. | 2503.02034 | null |
| 2025-03-03 | DeepSuM: Deep Sufficient Modality Learning Framework | Zhe Gao et.al. | 2503.01728 | null |
| 2025-03-03 | Dementia Insights: A Context-Based MultiModal Approach | Sahar Sinene Mehdoui et.al. | 2503.01226 | null |
| 2025-03-03 | HOP: Heterogeneous Topology-based Multimodal Entanglement for Co-Speech Gesture Generation | Hongye Cheng et.al. | 2503.01175 | null |
| 2025-02-28 | Foundation-Model-Boosted Multimodal Learning for fMRI-based Neuropathic Pain Drug Response Prediction | Wenrui Fan et.al. | 2503.00210 | null |
| 2025-02-28 | PathVG: A New Benchmark and Dataset for Pathology Visual Grounding | Chunlin Zhong et.al. | 2502.20869 | null |
| 2025-02-28 | Multimodal Learning for Just-In-Time Software Defect Prediction in Autonomous Driving Systems | Faisal Mohammad et.al. | 2502.20806 | null |
| 2025-02-27 | VideoA11y: Method and Dataset for Accessible Video Description | Chaoyu Li et.al. | 2502.20480 | null |
| 2025-02-27 | LIFT-GS: Cross-Scene Render-Supervised Distillation for 3D Language Grounding | Ang Cao et.al. | 2502.20389 | null |
| 2025-02-27 | Rethinking Multimodal Learning from the Perspective of Mitigating Classification Ability Disproportion | QingYuan Jiang et.al. | 2502.20120 | null |
| 2025-02-27 | MICINet: Multi-Level Inter-Class Confusing Information Removal for Reliable Multimodal Classification | Tong Zhang et.al. | 2502.19674 | null |
| 2025-02-25 | CPVis: Evidence-based Multimodal Learning Analytics for Evaluation in Collaborative Programming | Gefei Zhang et.al. | 2502.17835 | null |
| 2025-02-24 | Shakti-VLMs: Scalable Vision-Language Models for Enterprise AI | Syed Abdul Gaffar Shakhadri et.al. | 2502.17092 | null |
| 2025-02-24 | DUNIA: Pixel-Sized Embeddings via Cross-Modal Alignment for Earth Observation Applications | Ibrahim Fayad et.al. | 2502.17066 | null |
| 2025-02-23 | Category-Selective Neurons in Deep Networks: Comparing Purely Visual and Visual-Language Models | Zitong Lu et.al. | 2502.16456 | null |
| 2025-02-23 | A Survey on Industrial Anomalies Synthesis | Xichen Xu et.al. | 2502.16412 | link |
| 2025-02-22 | Understanding the Emergence of Multimodal Representation Alignment | Megan Tjandrasuwita et.al. | 2502.16282 | link |
| 2025-02-21 | M2LADS Demo: A System for Generating Multimodal Learning Analytics Dashboards | Alvaro Becerra et.al. | 2502.15363 | null |
| 2025-02-20 | FetalCLIP: A Visual-Language Foundation Model for Fetal Ultrasound Image Analysis | Fadillah Maani et.al. | 2502.14807 | link |
| 2025-02-21 | AVD2: Accident Video Diffusion for Accident Video Description | Cheng Li et.al. | 2502.14801 | null |
| 2025-02-19 | Latent Distribution Decoupling: A Probabilistic Framework for Uncertainty-Aware Multimodal Emotion Recognition | Jingwang Huang et.al. | 2502.13954 | link |
| 2025-02-22 | Grounding LLM Reasoning with Knowledge Graphs | Alfonso Amayuelas et.al. | 2502.13247 | null |
| 2025-02-18 | SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation | Zekun Qi et.al. | 2502.13143 | link |
| 2025-02-18 | Robust Disentangled Counterfactual Learning for Physical Audiovisual Commonsense Reasoning | Mengshi Qi et.al. | 2502.12425 | link |
| 2025-02-16 | AudioSpa: Spatializing Sound Events with Text | Linfeng Feng et.al. | 2502.11219 | null |
| 2025-02-18 | BalanceBenchmark: A Survey for Imbalanced Learning | Shaoxuan Xu et.al. | 2502.10816 | link |
| 2025-02-17 | Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation | Mohammad Mahdi Abootorabi et.al. | 2502.08826 | link |
| 2025-02-12 | A Novel Approach to for Multimodal Emotion Recognition : Multimodal semantic information fusion | Wei Dai et.al. | 2502.08573 | null |
| 2025-02-17 | What Is That Talk About? A Video-to-Text Summarization Dataset for Scientific Presentations | Dongqi Liu et.al. | 2502.08279 | link |
| 2025-02-11 | Enhancing Video Understanding: Deep Neural Networks for Spatiotemporal Analysis | Amir Hosein Fadaei et.al. | 2502.07277 | null |
| 2025-02-10 | Generative Distribution Prediction: A Unified Approach to Multimodal Learning | Xinyu Tian et.al. | 2502.07090 | null |
| 2025-02-06 | CAST: Cross Attention based multimodal fusion of Structure and Text for materials property prediction | Jaewan Lee et.al. | 2502.06836 | null |
| 2025-02-10 | Learning Musical Representations for Music Performance Question Answering | Xingjian Diao et.al. | 2502.06710 | null |
| 2025-02-04 | Exploring Spatial Language Grounding Through Referring Expressions | Akshar Tumu et.al. | 2502.04359 | null |
| 2025-02-03 | Efficiently Integrate Large Language Models with Visual Perception: A Survey from the Training Paradigm Perspective | Xiaorui Ma et.al. | 2502.01524 | null |
| 2025-02-03 | MIND: Modality-Informed Knowledge Distillation Framework for Multimodal Clinical Prediction Tasks | Alejandro Guerra-Manzanares et.al. | 2502.01158 | null |
| 2025-02-01 | Milmer: a Framework for Multiple Instance Learning based Multimodal Emotion Recognition | Zaitian Wang et.al. | 2502.00547 | link |
| 2025-01-29 | U2A: Unified Unimodal Adaptation for Robust and Efficient Multimodal Learning | Md Kaykobad Reza et.al. | 2501.17823 | null |
| 2025-01-28 | Molecular-driven Foundation Model for Oncologic Pathology | Anurag Vaidya et.al. | 2501.16652 | null |
| 2025-01-27 | AffectGPT: A New Dataset, Model, and Benchmark for Emotion Understanding with Multimodal Large Language Models | Zheng Lian et.al. | 2501.16566 | null |
| 2025-01-25 | Inductive Biases for Zero-shot Systematic Generalization in Language-informed Reinforcement Learning | Negin Hashemi Dijujin et.al. | 2501.15270 | null |
| 2025-01-25 | Deep Multimodal Learning for Real-Time DDoS Attacks Detection in Internet of Vehicles | Mohamed Ababsa et.al. | 2501.15252 | link |
| 2025-01-25 | Cross-modal Context Fusion and Adaptive Graph Convolutional Network for Multimodal Conversational Emotion Recognition | Junwei Feng et.al. | 2501.15063 | null |
| 2025-01-23 | Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge | Haomiao Xiong et.al. | 2501.13468 | link |
| 2025-01-22 | EmoTech: A Multi-modal Speech Emotion Recognition Using Multi-source Low-level Information with Hybrid Recurrent Network | Shamin Bin Habib Avro et.al. | 2501.12674 | null |
| 2025-01-21 | Compositional Instruction Following with Language Models and Reinforcement Learning | Vanya Cohen et.al. | 2501.12539 | null |
| 2025-01-21 | Multi-stage intermediate fusion for multimodal learning to classify non-small cell lung cancer subtypes from CT and PET | Fatih Aksu et.al. | 2501.12425 | null |
| 2025-01-20 | LLM supervised Pre-training for Multimodal Emotion Recognition in Conversations | Soumya Dutta et.al. | 2501.11468 | null |
| 2025-01-20 | ITCFN: Incomplete Triple-Modal Co-Attention Fusion Network for Mild Cognitive Impairment Conversion Prediction | Xiangyang Hu et.al. | 2501.11276 | link |
| 2025-01-18 | Fake Advertisements Detection Using Automated Multimodal Learning: A Case Study for Vietnamese Real Estate Data | Duy Nguyen et.al. | 2501.10848 | null |
| 2025-01-17 | A Vision-Language Framework for Multispectral Scene Representation Using Language-Grounded Features | Enes Karanfil et.al. | 2501.10144 | null |
| 2025-01-17 | TeamVision: An AI-powered Learning Analytics System for Supporting Reflection in Team-based Healthcare Simulation | Vanessa Echeverria et.al. | 2501.09930 | null |
| 2025-01-19 | IDEA: Image Description Enhanced CLIP-Adapter | Zhipeng Ye et.al. | 2501.08816 | link |
| 2025-01-14 | Towards Zero-Shot & Explainable Video Description by Reasoning over Graphs of Events in Space and Time | Mihai Masala et.al. | 2501.08460 | null |
| 2025-01-12 | SCOT: Self-Supervised Contrastive Pretraining For Zero-Shot Compositional Retrieval | Bhavin Jawade et.al. | 2501.08347 | null |
| 2025-01-17 | Tarsier2: Advancing Large Vision-Language Models from Detailed Video Description to Comprehensive Video Understanding | Liping Yuan et.al. | 2501.07888 | null |
| 2025-01-13 | Exploring the Use of Contrastive Language-Image Pre-Training for Human Posture Classification: Insights from Yoga Pose Analysis | Andrzej D. Dobrzycki et.al. | 2501.07221 | null |
| 2025-01-12 | 3DCoMPaT200: Language-Grounded Compositional Understanding of Parts and Materials of 3D Shapes | Mahmoud Ahmed et.al. | 2501.06785 | link |
| 2025-01-14 | Beyond Sight: Finetuning Generalist Robot Policies with Heterogeneous Sensors via Language Grounding | Joshua Jones et.al. | 2501.04693 | link |
| 2025-01-06 | CM3T: Framework for Efficient Multimodal Learning for Inhomogeneous Interaction Datasets | Tanay Agrawal et.al. | 2501.03332 | null |
| 2025-01-06 | MVP: Multimodal Emotion Recognition based on Video and Physiological Signals | Valeriya Strizhkova et.al. | 2501.03103 | null |
| 2025-01-02 | Asymmetric Reinforcing against Multi-modal Representation Bias | Xiyuan Gao et.al. | 2501.01240 | link |
| 2025-01-02 | Retrieval-Augmented Dynamic Prompt Tuning for Incomplete Multimodal Learning | Jian Lang et.al. | 2501.01120 | link |
| 2024-12-30 | Aviary: training language agents on challenging scientific tasks | Siddharth Narayanan et.al. | 2412.21154 | link |
| 2024-12-30 | Hierarchical Banzhaf Interaction for General Video-Language Representation Learning | Peng Jin et.al. | 2412.20964 | link |
| 2024-12-30 | Enhancing Multimodal Emotion Recognition through Multi-Granularity Cross-Modal Alignment | Xuechen Wang et.al. | 2412.20821 | null |
| 2024-12-29 | Diff4MMLiTS: Advanced Multimodal Liver Tumor Segmentation via Diffusion-Based Image Synthesis and Alignment | Shiyun Chen et.al. | 2412.20418 | null |
| 2024-12-26 | Multi-Head Attention Driven Dynamic Visual-Semantic Embedding for Enhanced Image-Text Matching | Wenjing Chen et.al. | 2412.19184 | null |
| 2024-12-26 | CLIP-GS: Unifying Vision-Language Representation with 3D Gaussian Splatting | Siyu Jiao et.al. | 2412.19142 | null |
| 2024-12-24 | MixMAS: A Framework for Sampling-Based Mixer Architecture Search for Multimodal Fusion and Learning | Abdelmadjid Chergui et.al. | 2412.18437 | link |
| 2024-12-23 | Multimodal Learning with Uncertainty Quantification based on Discounted Belief Fusion | Grigor Bezirganyan et.al. | 2412.18024 | link |
| 2024-12-23 | A Multimodal Emotion Recognition System: Integrating Facial Expressions, Body Movement, Speech, and Spoken Language | Kris Kraack et.al. | 2412.17907 | null |
| 2024-12-18 | Constraint-Based Model in Multimodal Learning to Improve Ventricular Arrhythmia Prediction | Evariste Njomgue Fotso et.al. | 2412.17840 | null |
| 2024-12-23 | Survey of Large Multimodal Model Datasets, Application Categories and Taxonomy | Priyaranjan Pattnayak et.al. | 2412.17759 | null |
| 2024-12-23 | EPE-P: Evidence-based Parameter-efficient Prompting for Multimodal Learning with Missing Modalities | Zhe Chen et.al. | 2412.17677 | link |
| 2024-12-23 | V$^2$-SfMLearner: Learning Monocular Depth and Ego-motion for Multimodal Wireless Capsule Endoscopy | Long Bai et.al. | 2412.17595 | null |
| 2024-12-22 | COVID-19 on YouTube: A Data-Driven Analysis of Sentiment, Toxicity, and Content Recommendations | Vanessa Su et.al. | 2412.17180 | null |
| 2024-12-17 | DoPTA: Improving Document Layout Analysis using Patch-Text Alignment | Nikitha SR et.al. | 2412.12902 | null |
| 2024-12-17 | Implicit Location-Caption Alignment via Complementary Masking for Weakly-Supervised Dense Video Captioning | Shiping Ge et.al. | 2412.12791 | link |
| 2024-12-17 | PBVS 2024 Solution: Self-Supervised Learning and Sampling Strategies for SAR Classification in Extreme Long-Tail Distribution | Yuhyun Kim et.al. | 2412.12565 | null |
| 2024-12-16 | Gramian Multimodal Representation Learning and Alignment | Giordano Cicchetti et.al. | 2412.11959 | link |
| 2024-12-10 | Explaining and Mitigating the Modality Gap in Contrastive Multimodal Learning | Can Yaras et.al. | 2412.07909 | null |
| 2024-12-07 | WavFusion: Towards wav2vec 2.0 Multimodal Speech Emotion Recognition | Feng Li et.al. | 2412.05558 | null |
| 2024-12-05 | Lattice Lingo: Effect of Textual Detail on Multimodal Learning for Property Prediction of Crystals | Mrigi Munjal et.al. | 2412.04670 | null |
| 2024-12-04 | Training-Free Mitigation of Language Reasoning Degradation After Multimodal Instruction Tuning | Neale Ratzlaff et.al. | 2412.03467 | null |
| 2024-12-04 | Grounded Language Design for Lightweight Diagramming for Formal Methods | Siddhartha Prasad et.al. | 2412.03310 | null |
| 2024-12-04 | Dynamic Graph Neural Ordinary Differential Equation Network for Multi-modal Emotion Recognition in Conversation | Yuntao Shou et.al. | 2412.02935 | null |
| 2024-12-03 | Initial Study On Improving Segmentation By Combining Preoperative CT And Intraoperative CBCT Using Synthetic Data | Maximilian E. Tschuchnig et.al. | 2412.02294 | null |
| 2024-12-02 | Occam’s LGS: A Simple Approach for Language Gaussian Splatting | Jiahuan Cheng et.al. | 2412.01807 | null |
| 2024-11-30 | Approximate Fiber Product: A Preliminary Algebraic-Geometric Perspective on Multimodal Embedding Alignment | Dongfang Zhao et.al. | 2412.00373 | null |
| 2024-11-29 | SDR-GNN: Spectral Domain Reconstruction Graph Neural Network for Incomplete Multimodal Learning in Conversational Emotion Recognition | Fangze Fu et.al. | 2411.19822 | null |
| 2024-11-26 | Grounding-IQA: Multimodal Language Grounding Model for Image Quality Assessment | Zheng Chen et.al. | 2411.17237 | link |
| 2024-11-26 | Learning Robust Anymodal Segmentor with Unimodal and Cross-modal Distillation | Xu Zheng et.al. | 2411.17141 | link |
| 2024-11-26 | Relations, Negations, and Numbers: Looking for Logic in Generative Text-to-Image Models | Colin Conwell et.al. | 2411.17066 | link |
| 2024-11-26 | Multimodal Alignment and Fusion: A Survey | Songtao Li et.al. | 2411.17040 | null |
| 2024-11-25 | Language Driven Occupancy Prediction | Zhu Yu et.al. | 2411.16072 | link |
| 2024-11-23 | From Complexity to Parsimony: Integrating Latent Class Analysis to Uncover Multimodal Learning Patterns in Collaborative Learning | Lixiang Yan et.al. | 2411.15590 | null |
| 2024-11-23 | Botfip-LLM: An Enhanced Multimodal Scientific Computing Framework Leveraging Knowledge Distillation from Large Language Models | Tianhao Chen et.al. | 2411.15525 | null |
| 2024-11-22 | PRIMUS: Pretraining IMU Encoders with Multimodal Self-Supervision | Arnav M. Das et.al. | 2411.15127 | null |
| 2024-11-21 | Generative AI for Music and Audio | Hao-Wen Dong et.al. | 2411.14627 | null |
| 2024-11-21 | Multimodal 3D Reasoning Segmentation with Complex Scenes | Xueying Jiang et.al. | 2411.13927 | null |
| 2024-11-12 | Public Health Advocacy Dataset: A Dataset of Tobacco Usage Videos from Social Media | Naga VS Raviteja Chappa et.al. | 2411.13572 | null |
| 2024-11-20 | I Can Tell What I am Doing: Toward Real-World Natural Language Grounding of Robot Experiences | Zihan Wang et.al. | 2411.12960 | null |
| 2024-11-18 | MMBind: Unleashing the Potential of Distributed and Heterogeneous Data for Multimodal Learning in IoT | Xiaomin Ouyang et.al. | 2411.12126 | null |
| 2024-11-19 | SoK: Unifying Cybersecurity and Cybersafety of Multimodal Foundation Models with an Information Theory Approach | Ruoxi Sun et.al. | 2411.11195 | null |
| 2024-11-15 | Everything is a Video: Unifying Modalities through Next-Frame Prediction | G. Thomas Hudson et.al. | 2411.10503 | null |
| 2024-11-15 | Weakly-Supervised Multimodal Learning on MIMIC-CXR | Andrea Agostini et.al. | 2411.10356 | null |
| 2024-11-15 | CMATH: Cross-Modality Augmented Transformer with Hierarchical Variational Distillation for Multimodal Emotion Recognition in Conversation | Xiaofei Zhu et.al. | 2411.10060 | null |
| 2024-11-21 | Instruction-Guided Editing Controls for Images and Multimedia: A Survey in LLM era | Thanh Tam Nguyen et.al. | 2411.09955 | link |
| 2024-11-14 | SmartInv: Multimodal Learning for Smart Contract Invariant Inference | Sally Junsong Wang et.al. | 2411.09217 | null |
| 2024-11-12 | NL-SLAM for OC-VLN: Natural Language Grounded SLAM for Object-Centric VLN | Sonia Raychaudhuri et.al. | 2411.07848 | null |
| 2024-11-11 | Multimodal Fusion Balancing Through Game-Theoretic Regularization | Konstantinos Kontras et.al. | 2411.07335 | link |
| 2024-11-11 | StoryTeller: Improving Long Video Description through Global Audio-Visual Character Identification | Yichen He et.al. | 2411.07076 | link |
| 2024-11-08 | Smile upon the Face but Sadness in the Eyes: Emotion Recognition based on Facial Expressions and Eye Behaviors | Yuanyuan Liu et.al. | 2411.05879 | null |
| 2024-11-06 | AutoGameUI: Constructing High-Fidelity Game UIs via Multimodal Learning and Interactive Web-Based Tool | Zhongliang Tang et.al. | 2411.03709 | null |
| 2024-11-05 | STEER: Flexible Robotic Manipulation via Dense Language Grounding | Laura Smith et.al. | 2411.03409 | null |
| 2024-11-05 | Grounding Natural Language to SQL Translation with Data-Based Self-Explanations | Yuankai Fan et.al. | 2411.02948 | link |
| 2024-11-04 | Grounding Emotional Descriptions to Electrovibration Haptic Signals | Guimin Hu et.al. | 2411.02118 | null |
| 2024-11-03 | Classifier-guided Gradient Modulation for Enhanced Multimodal Learning | Zirun Guo et.al. | 2411.01409 | link |
| 2024-11-01 | Text2Freq: Learning Series Patterns from Text via Frequency Domain | Ming-Chih Lo et.al. | 2411.00929 | null |
| 2024-10-29 | EEG-based Multimodal Representation Learning for Emotion Recognition | Kang Yin et.al. | 2411.00822 | null |
| 2024-11-01 | Analyzing Multimodal Integration in the Variational Autoencoder from an Information-Theoretic Perspective | Carlotta Langer et.al. | 2411.00522 | null |
| 2024-10-30 | PV-VTT: A Privacy-Centric Dataset for Mission-Specific Anomaly Detection and Natural Language Interpretation | Ryozo Masukawa et.al. | 2410.22623 | null |
| 2024-10-28 | IndraEye: Infrared Electro-Optical UAV-based Perception Dataset for Robust Downstream Tasks | Manjunath D et.al. | 2410.20953 | link |
| 2024-10-25 | TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning | Xiangyu Zeng et.al. | 2410.19702 | link |
| 2024-10-24 | UGotMe: An Embodied System for Affective Human-Robot Interaction | Peizhen Li et.al. | 2410.18373 | link |
| 2024-10-22 | EVC-MF: End-to-end Video Captioning Network with Multi-scale Features | Tian-Zi Niu et.al. | 2410.16624 | null |
| 2024-10-22 | MoRE: Multi-Modal Contrastive Pre-training with Transformers on X-Rays, ECGs, and Diagnostic Report | Samrajya Thapa et.al. | 2410.16239 | link |
| 2024-10-21 | Multimodal Learning for Embryo Viability Prediction in Clinical IVF | Junsik Kim et.al. | 2410.15581 | null |
| 2024-10-20 | Can LVLMs Describe Videos like Humans? A Five-in-One Video Annotations Benchmark for Better Human-Machine Comparison | Shiyu Hu et.al. | 2410.15270 | null |
| 2024-10-15 | CtrlSynth: Controllable Image Text Synthesis for Data-Efficient Multimodal Learning | Qingqing Cao et.al. | 2410.11963 | null |
| 2024-10-15 | Generalizable Spacecraft Trajectory Generation via Multimodal Learning with Transformers | Davide Celestini et.al. | 2410.11723 | null |
| 2024-10-15 | On-the-fly Modulation for Balanced Multimodal Learning | Yake Wei et.al. | 2410.11582 | link |
| 2024-10-14 | MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models | Peng Xia et.al. | 2410.10139 | link |
| 2024-10-10 | Flex-MoE: Modeling Arbitrary Modality Combination via the Flexible Mixture-of-Experts | Sukwon Yun et.al. | 2410.08245 | link |
| 2024-10-11 | Enhancing Multimodal LLM for Detailed and Accurate Video Captioning using Multi-Round Preference Optimization | Changli Tang et.al. | 2410.06682 | null |
| 2024-10-08 | Multimodal Representation Learning using Adaptive Graph Construction | Weichen Huang et.al. | 2410.06395 | null |
| 2024-10-07 | Patch is Enough: Naturalistic Adversarial Patch against Vision-Language Pre-training Models | Dehong Kong et.al. | 2410.04884 | null |
| 2024-10-07 | MMP: Towards Robust Multi-Modal Learning with Masked Modality Projection | Niki Nezakati et.al. | 2410.03010 | null |
| 2024-10-02 | Anchors Aweigh! Sail for Optimal Unified Multi-Modal Representations | Minoh Jeong et.al. | 2410.02086 | null |
| 2024-10-02 | Open-vocabulary Multimodal Emotion Recognition: Dataset, Metric, and Benchmark | Zheng Lian et.al. | 2410.01495 | null |
| 2024-10-04 | VideoCLIP-XL: Advancing Long Description Understanding for Video CLIP Models | Jiapeng Wang et.al. | 2410.00741 | null |
| 2024-09-30 | Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning | Weitai Kang et.al. | 2410.00255 | link |
| 2024-09-30 | Towards Robust Multimodal Sentiment Analysis with Incomplete Data | Haoyu Zhang et.al. | 2409.20012 | link |
| 2024-10-02 | CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling | Jihai Zhang et.al. | 2409.19291 | link |
| 2024-09-26 | Infer Human’s Intentions Before Following Natural Language Instructions | Yanming Wan et.al. | 2409.18073 | link |
| 2024-09-26 | A Multimodal Single-Branch Embedding Network for Recommendation in Cold-Start and Missing Modality Scenarios | Christian Ganhör et.al. | 2409.17864 | null |
| 2024-09-26 | Harnessing Shared Relations via Multimodal Mixup Contrastive Learning for Multimodal Classification | Raja Kumar et.al. | 2409.17777 | null |
| 2024-09-25 | Language Grounded Multi-agent Communication for Ad-hoc Teamwork | Huao Li et.al. | 2409.17348 | null |
| 2024-09-24 | CLSP: High-Fidelity Contrastive Language-State Pre-training for Agent State Representation | Fuxian Huang et.al. | 2409.15806 | null |
| 2024-09-18 | All-in-one foundational models learning across quantum chemical levels | Yuxinxin Chen et.al. | 2409.12015 | link |
| 2024-09-13 | Hierarchical Hypercomplex Network for Multimodal Emotion Recognition | Eleonora Lopez et.al. | 2409.09194 | link |
| 2024-09-13 | Interactive Masked Image Modeling for Multimodal Object Detection in Remote Sensing | Minh-Duc Vu et.al. | 2409.08885 | null |
| 2024-09-13 | A Multimodal Approach for Fluid Overload Prediction: Integrating Lung Ultrasound and Clinical Data | Tianqi Yang et.al. | 2409.08790 | null |
| 2024-09-13 | A Comprehensive Survey on Deep Multimodal Learning with Missing Modality | Renjie Wu et.al. | 2409.07825 | null |
| 2024-09-11 | What to align in multimodal contrastive learning? | Benoit Dufumier et.al. | 2409.07402 | null |
| 2024-09-11 | Recent Trends of Multimodal Affective Computing: A Survey from NLP Perspective | Guimin Hu et.al. | 2409.07388 | link |
| 2024-09-11 | Multimodal Emotion Recognition with Vision-language Prompting and Modality Dropout | Anbin QI et.al. | 2409.07078 | null |
| 2024-09-11 | A Survey of Multimodal Composite Editing and Retrieval | Suyan Li et.al. | 2409.05405 | link |
| 2024-09-09 | Diagnostic Reasoning in Natural Language: Computational Model and Application | Nils Dycke et.al. | 2409.05367 | null |
| 2024-09-10 | Improving Multimodal Emotion Recognition by Leveraging Acoustic Adaptation and Visual Alignment | Zhixian Zhao et.al. | 2409.05015 | null |
| 2024-08-31 | Comparative Analysis of Modality Fusion Approaches for Audio-Visual Person Identification and Verification | Aref Farhadipour et.al. | 2409.00562 | null |
| 2024-08-29 | Toward Robust Early Detection of Alzheimer’s Disease via an Integrated Multimodal Learning Approach | Yifei Chen et.al. | 2408.16343 | link |
| 2024-08-28 | Meta-Learn Unimodal Signals with Weak Supervision for Multimodal Sentiment Analysis | Sijie Mai et.al. | 2408.16029 | null |
| 2024-08-28 | ModalityMirror: Improving Audio Classification in Modality Heterogeneity Federated Learning with Multimodal Distillation | Tiantian Feng et.al. | 2408.15803 | null |
| 2024-08-28 | Visual Prompt Engineering for Medical Vision Language Models in Radiology | Stefan Denner et.al. | 2408.15802 | null |
| 2024-08-27 | The Benefits of Balance: From Information Projections to Variance Reduction | Lang Liu et.al. | 2408.15065 | null |
| 2024-08-27 | NeuralOOD: Improving Out-of-Distribution Generalization Performance with Brain-machine Fusion Learning Framework | Shuangchen Zhao et.al. | 2408.14950 | null |
| 2024-09-03 | Foundation Models for Music: A Survey | Yinghao Ma et.al. | 2408.14340 | link |
| 2024-09-06 | Quantum Multimodal Contrastive Learning Framework | Chi-Sheng Chen et.al. | 2408.13919 | null |
| 2024-08-25 | Multimodal Ensemble with Conditional Feature Fusion for Dysgraphia Diagnosis in Children from Handwriting Samples | Jayakanth Kunhoth et.al. | 2408.13754 | null |
| 2024-08-24 | R2G: Reasoning to Ground in 3D Scenes | Yixuan Li et.al. | 2408.13499 | null |
| 2024-08-23 | Ada2I: Enhancing Modality Balance for Multimodal Conversational Emotion Recognition | Cam-Van Thi Nguyen et.al. | 2408.12895 | null |
| 2024-08-23 | Has Multimodal Learning Delivered Universal Intelligence in Healthcare? A Comprehensive Survey | Qika Lin et.al. | 2408.12880 | link |
| 2024-08-23 | Grounding Fallacies Misrepresenting Scientific Publications in Evidence | Max Glockner et.al. | 2408.12812 | null |
| 2024-08-22 | Assessing Modality Bias in Video Question Answering Benchmarks with Multimodal Large Language Models | Jean Park et.al. | 2408.12763 | null |
| 2024-08-22 | Mental-Perceiver: Audio-Textual Multimodal Learning for Mental Health Assessment | Jinghui Qin et.al. | 2408.12088 | null |
| 2024-08-22 | Video Emotion Open-vocabulary Recognition Based on Multimodal Large Language Model | Mengying Ge et.al. | 2408.11286 | null |
| 2024-08-21 | SZTU-CMU at MER2024: Improving Emotion-LLaMA with Conv-Attention for Multimodal Emotion Recognition | Zebang Cheng et.al. | 2408.10500 | link |
| 2024-08-19 | Kubrick: Multimodal Agent Collaborations for Synthetic Video Generation | Liu He et.al. | 2408.10453 | null |
| 2024-08-18 | Enhancing Modal Fusion by Alignment and Label Matching for Multimodal Emotion Recognition | Qifei Li et.al. | 2408.09438 | link |
| 2024-08-16 | Multi Teacher Privileged Knowledge Distillation for Multimodal Expression Recognition | Muhammad Haseeb Aslam et.al. | 2408.09035 | link |
| 2024-08-14 | Modality Invariant Multimodal Learning to Handle Missing Modalities: A Single-Branch Approach | Muhammad Saad Saeed et.al. | 2408.07445 | null |
| 2024-08-14 | Robust Semi-supervised Multimodal Medical Image Segmentation via Cross Modality Collaboration | Xiaogen Zhon et.al. | 2408.07341 | link |
| 2024-08-14 | Enhancing Visual Question Answering through Ranking-Based Hybrid Training and Multimodal Fusion | Peiyuan Chen et.al. | 2408.07303 | null |
| 2024-08-13 | Prioritizing Modalities: Flexible Importance Scheduling in Federated Multimodal Learning | Jieming Bian et.al. | 2408.06549 | null |
| 2024-08-04 | Distribution-Level Memory Recall for Continual Learning: Preserving Knowledge and Avoiding Confusion | Shaoxu Cheng et.al. | 2408.02695 | null |
| 2024-08-06 | Infusing Environmental Captions for Long-Form Video Language Grounding | Hyogun Lee et.al. | 2408.02336 | null |
| 2024-08-05 | REVISION: Rendering Tools Enable Spatial Fidelity in Vision-Language Models | Agneet Chatterjee et.al. | 2408.02231 | null |
| 2024-08-04 | CACE-Net: Co-guidance Attention and Contrastive Enhancement for Effective Audio-Visual Event Localization | Xiang He et.al. | 2408.01952 | link |
| 2024-08-02 | Multimodal Fusion via Hypergraph Autoencoder and Contrastive Learning for Emotion Recognition in Conversation | Zijian Yi et.al. | 2408.00970 | link |
| 2024-08-01 | The Monetisation of Toxicity: Analysing YouTube Content Creators and Controversy-Driven Engagement | Thales Bertaglia et.al. | 2408.00534 | null |
| 2024-07-31 | Tracing Intricate Cues in Dialogue: Joint Graph Structure and Sentiment Dynamics for Multimodal Emotion Recognition | Jiang Li et.al. | 2407.21536 | null |
| 2024-07-31 | DEF-oriCORN: efficient 3D scene understanding for robust language-directed manipulation without demonstrations | Dongwon Son et.al. | 2407.21267 | null |
| 2024-07-30 | HyperMM : Robust Multimodal Learning with Varying-sized Inputs | Hava Chaptoukaev et.al. | 2407.20768 | null |
| 2024-07-29 | ML-Mamba: Efficient Multi-Modal Large Language Model Utilizing Mamba-2 | Wenjun Huang et.al. | 2407.19832 | null |
| 2024-08-02 | XLIP: Cross-modal Attention Masked Modelling for Medical Language-Image Pre-Training | Biao Wu et.al. | 2407.19546 | link |
| 2024-07-28 | Detached and Interactive Multimodal Learning | Yunfeng Fan et.al. | 2407.19514 | link |
| 2024-07-26 | Unifying Visual and Semantic Feature Spaces with Diffusion Models for Enhanced Cross-Modal Alignment | Yuze Zheng et.al. | 2407.18854 | null |
| 2024-07-26 | Multimodal Emotion Recognition using Audio-Video Transformer Fusion with Cross Attention | Joe Dhanith P R et.al. | 2407.18552 | null |
| 2024-07-25 | $\mathbb{X}$-Sample Contrastive Loss: Improving Contrastive Learning with Sample Similarity Graphs | Vlad Sobal et.al. | 2407.18134 | null |
| 2024-07-25 | Cross-Vendor Reproducibility of Radiomics-based Machine Learning Models for Computer-aided Diagnosis | Jatin Chaudhary et.al. | 2407.18060 | null |
| 2024-07-23 | Masked Graph Learning with Recurrent Alignment for Multimodal Emotion Recognition in Conversation | Tao Meng et.al. | 2407.16714 | null |
| 2024-07-24 | MicroEmo: Time-Sensitive Multimodal Emotion Recognition with Micro-Expression Dynamics in Video Dialogues | Liyun Zhang et.al. | 2407.16552 | null |
| 2024-07-23 | Chameleon: Images Are What You Need For Multimodal Learning Robust To Missing Modalities | Muhammad Irzam Liaqat et.al. | 2407.16243 | null |
| 2024-07-22 | Resource-Efficient Federated Multimodal Learning via Layer-wise and Progressive Training | Ye Lin Tun et.al. | 2407.15426 | null |
| 2024-07-17 | Text- and Feature-based Models for Compound Multimodal Emotion Recognition in the Wild | Nicolas Richet et.al. | 2407.12927 | link |
| 2024-07-17 | Missing Modality Prediction for Unpaired Multimodal Learning via Joint Embedding of Unimodal Models | Donggeun Kim et.al. | 2407.12616 | null |
| 2024-07-12 | Diagnosing and Re-learning for Balanced Multimodal Learning | Yake Wei et.al. | 2407.09705 | link |
| 2024-07-12 | Enhancing Emotion Recognition in Incomplete Data: A Novel Cross-Modal Alignment, Reconstruction, and Refinement Framework | Haoqin Sun et.al. | 2407.09029 | null |
| 2024-07-10 | AffectGPT: Dataset and Framework for Explainable Multimodal Emotion Recognition | Zheng Lian et.al. | 2407.07653 | link |
| 2024-07-06 | Completed Feature Disentanglement Learning for Multimodal MRIs Analysis | Tianling Liu et.al. | 2407.04916 | null |
| 2024-07-05 | Multimodal Classification via Modal-Aware Interactive Enhancement | Qing-Yuan Jiang et.al. | 2407.04587 | null |
| 2024-07-05 | Robust Multimodal Learning via Representation Decoupling | Shicai Wei et.al. | 2407.04458 | null |
| 2024-07-05 | Smart Vision-Language Reasoners | Denisa Roberts et.al. | 2407.04212 | link |
| 2024-07-04 | ADAPT: Multimodal Learning for Detecting Physiological Changes under Missing Modalities | Julie Mordacq et.al. | 2407.03836 | link |
| 2024-07-02 | Multi-Peptide: Multimodality Leveraged Language-Graph Learning of Peptide Properties | Srivathsan Badrinarayanan et.al. | 2407.03380 | link |
| 2024-07-05 | Multi-Task Domain Adaptation for Language Grounding with 3D Objects | Penglei Sun et.al. | 2407.02846 | null |
| 2024-07-01 | Ground Every Sentence: Improving Retrieval-Augmented LLMs with Interleaved Reference-Claim Generation | Sirui Xia et.al. | 2407.01796 | null |
| 2024-06-30 | Tarsier: Recipes for Training and Evaluating Large Video Description Models | Jiawei Wang et.al. | 2407.00634 | link |
| 2024-06-28 | Multimodal Learning and Cognitive Processes in Radiology: MedGaze for Chest X-ray Scanpath Prediction | Akash Awasthi et.al. | 2407.00129 | null |
| 2024-06-27 | From Efficient Multimodal Models to World Models: A Survey | Xinji Mai et.al. | 2407.00118 | null |
| 2024-06-27 | Enhancing Video-Language Representations with Structural Spatio-Temporal Alignment | Hao Fei et.al. | 2406.19255 | null |
| 2024-06-27 | RAVEN: Multitask Retrieval Augmented Vision-Language Learning | Varun Nagaraj Rao et.al. | 2406.19150 | null |
| 2024-06-26 | Speech2UnifiedExpressions: Synchronous Synthesis of Co-Speech Affective Face and Body Expressions from Affordable Inputs | Uttaran Bhattacharya et.al. | 2406.18068 | null |
| 2024-06-25 | Data curation via joint example selection further accelerates multimodal learning | Talfan Evans et.al. | 2406.17711 | null |
| 2024-06-23 | LiveScene: Language Embedding Interactive Radiance Fields for Physical Scene Rendering and Control | Delin Qu et.al. | 2406.16038 | null |
| 2024-06-20 | Knowledge-driven Subspace Fusion and Gradient Coordination for Multi-modal Learning | Yupei Zhang et.al. | 2406.13979 | link |
| 2024-06-19 | VisualRWKV: Exploring Recurrent Neural Networks for Visual Language Models | Haowen Hou et.al. | 2406.13362 | link |
| 2024-06-18 | Language and Multimodal Models in Sports: A Survey of Datasets and Applications | Haotian Xia et.al. | 2406.12252 | null |
| 2024-07-01 | Multimodal Learning With Intraoperative CBCT & Variably Aligned Preoperative CT Data To Improve Segmentation | Maximilian E. Tschuchnig et.al. | 2406.11650 | null |
| 2024-06-17 | Relational Learning in Pre-Trained Models: A Theory from Hypergraph Recovery Perspective | Yang Chen et.al. | 2406.11249 | null |
| 2024-06-17 | Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning | Zebang Cheng et.al. | 2406.11161 | link |
| 2024-06-13 | Explore the Limits of Omni-modal Pretraining at Scale | Yiyuan Zhang et.al. | 2406.09412 | link |
| 2024-06-13 | OpenVLA: An Open-Source Vision-Language-Action Model | Moo Jin Kim et.al. | 2406.09246 | null |
| 2024-06-13 | Zoom and Shift are All You Need | Jiahao Qin et.al. | 2406.08866 | null |
| 2024-06-11 | Embedding-based Multimodal Learning on Pan-Squamous Cell Carcinomas for Improved Survival Outcomes | Asim Waqas et.al. | 2406.08521 | null |
| 2024-06-16 | A Labelled Dataset for Sentiment Analysis of Videos on YouTube, TikTok, and Other Sources about the 2024 Outbreak of Measles | Nirmalya Thakur et.al. | 2406.07693 | null |
| 2024-06-11 | Situational Awareness Matters in 3D Vision Language Reasoning | Yunze Man et.al. | 2406.07544 | null |
| 2024-06-11 | Unified Modeling Enhanced Multimodal Learning for Precision Neuro-Oncology | Huahui Yi et.al. | 2406.07078 | link |
| 2024-06-10 | NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative | Asmar Nadeem et.al. | 2406.06499 | null |
| 2024-06-10 | Vript: A Video Is Worth Thousands of Words | Dongjie Yang et.al. | 2406.06040 | link |
| 2024-06-09 | Stealthy Targeted Backdoor Attacks against Image Captioning | Wenshu Fan et.al. | 2406.05874 | null |
| 2024-06-07 | Predictive Dynamic Fusion | Bing Cao et.al. | 2406.04802 | link |
| 2024-06-07 | AICoderEval: Improving AI Domain Code Generation of Large Language Models | Yinghui Xia et.al. | 2406.04712 | null |
| 2024-06-02 | Multimodal Deep Learning for Low-Resource Settings: A Vector Embedding Alignment Approach for Healthcare Applications | David Restrepo et.al. | 2406.02601 | null |
| 2024-06-04 | Dealing with All-stage Missing Modality: Towards A Universal Model with Robust Reconstruction and Personalization | Yunpeng Zhao et.al. | 2406.01987 | null |
| 2024-06-03 | Automatic Fused Multimodal Deep Learning for Plant Identification | Alfreds Lapkovskis et.al. | 2406.01455 | link |
| 2024-06-05 | Pulmonary Embolism Mortality Prediction Using Multimodal Learning Based on Computed Tomography Angiography and Clinical Data | Zhusi Zhong et.al. | 2406.01302 | null |
| 2024-06-02 | Learning Multimodal Behaviors from Scratch with Diffusion Policy Gradient | Zechu Li et.al. | 2406.00681 | null |
| 2024-05-31 | Ovis: Structural Embedding Alignment for Multimodal Large Language Model | Shiyin Lu et.al. | 2405.20797 | null |
| 2024-05-31 | Visual Attention Analysis in Online Learning | Miriam Navarro et.al. | 2405.20091 | null |
| 2024-05-29 | Thermodynamically Informed Multimodal Learning of High-Dimensional Free Energy Models in Molecular Coarse Graining | Blake R. Duschatko et.al. | 2405.19386 | null |
| 2024-05-29 | LLMs Meet Multimodal Generation and Editing: A Survey | Yingqing He et.al. | 2405.19334 | link |
| 2024-05-29 | Exploring Exotic Decays of the Higgs Boson to Multi-Photons at the LHC via Multimodal Learning Approaches | A. Hammad et.al. | 2405.18834 | null |
| 2024-05-28 | RACCooN: Remove, Add, and Change Video Content with Auto-Generated Narratives | Jaehong Yoon et.al. | 2405.18406 | link |
| 2024-05-28 | MMPareto: Boosting Multimodal Learning with Innocent Unimodal Assistance | Yake Wei et.al. | 2405.17730 | link |
| 2024-05-27 | Mitigating Noisy Correspondence by Geometrical Structure Consistency Learning | Zihua Zhao et.al. | 2405.16996 | null |
| 2024-05-27 | Multilingual Diversity Improves Vision-Language Representations | Thao Nguyen et.al. | 2405.16915 | null |
| 2024-05-27 | Hawk: Learning to Understand Open-World Video Anomalies | Jiaqi Tang et.al. | 2405.16886 | link |
| 2024-05-24 | Shopping Queries Image Dataset (SQID): An Image-Enriched ESCI Dataset for Exploring Multimodal Learning in Product Search | Marie Al Ghossein et.al. | 2405.15190 | link |
| 2024-05-23 | TIGER: Text-Instructed 3D Gaussian Retrieval and Coherent Editing | Teng Xu et.al. | 2405.14455 | null |
| 2024-05-22 | Grounding Toxicity in Real-World Events across Languages | Wondimagegnhue Tsegaye Tufa et.al. | 2405.13754 | link |
| 2024-05-21 | A Survey of Robotic Language Grounding: Tradeoffs Between Symbols and Embeddings | Vanya Cohen et.al. | 2405.13245 | null |
| 2024-05-21 | Inconsistency-Aware Cross-Attention for Audio-Visual Fusion in Dimensional Emotion Recognition | R Gnana Praveen et.al. | 2405.12853 | null |
| 2024-05-21 | Scientific discourse on YouTube: Motivations for citing research in comments | Sören Striewski et.al. | 2405.12798 | null |
| 2024-05-21 | Amplifying Academic Research through YouTube: Engagement Metrics as Predictors of Citation Impact | Olga Zagovora et.al. | 2405.12734 | null |
| 2024-05-21 | A Multimodal Learning-based Approach for Autonomous Landing of UAV | Francisco Neves et.al. | 2405.12681 | null |
| 2024-05-21 | Mutual Information Analysis in Multimodal Learning Systems | Hadi Hadizadeh et.al. | 2405.12456 | null |
| 2024-05-16 | Grounded 3D-LLM with Referent Tokens | Yilun Chen et.al. | 2405.10370 | link |
| 2024-05-13 | Improving Multimodal Learning with Multi-Loss Gradient Modulation | Konstantinos Kontras et.al. | 2405.07930 | link |
| 2024-05-13 | Generating Human Motion in 3D Scenes from Text Descriptions | Zhi Cen et.al. | 2405.07784 | null |
| 2024-05-13 | An Efficient Multimodal Learning Framework to Comprehend Consumer Preferences Using BERT and Cross-Attention | Junichiro Niimi et.al. | 2405.07435 | null |
| 2024-05-10 | A First Step in Using Machine Learning Methods to Enhance Interaction Analysis for Embodied Learning Environments | Joyce Fonteles et.al. | 2405.06203 | null |
| 2024-05-09 | Prompt When the Animal is: Temporal Animal Behavior Grounding with Positional Recovery Training | Sheng Yan et.al. | 2405.05523 | null |
| 2024-05-08 | Empathy Through Multimodality in Conversational Interfaces | Mahyar Abbasian et.al. | 2405.04777 | null |
| 2024-05-08 | All in One Framework for Multimodal Re-identification in the Wild | He Li et.al. | 2405.04741 | null |
| 2024-05-07 | Interpretable Tensor Fusion | Saurabh Varshneya et.al. | 2405.04671 | null |
| 2024-04-27 | MediFact at MEDIQA-M3G 2024: Medical Question Answering in Dermatology with Multimodal Learning | Nadia Saeed et.al. | 2405.01583 | null |
| 2024-04-29 | 3AM: An Ambiguity-Aware Multi-Modal Machine Translation Dataset | Xinyu Ma et.al. | 2404.18413 | link |
| 2024-04-28 | LEGENT: Open Platform for Embodied Agents | Zhili Cheng et.al. | 2404.18243 | null |
| 2024-05-03 | Revisiting Multimodal Emotion Recognition in Conversation from the Perspective of Graph Spectrum | Tao Meng et.al. | 2404.17862 | null |
| 2024-04-29 | MER 2024: Semi-Supervised Learning, Noise Robustness, and Open-Vocabulary Multimodal Emotion Recognition | Zheng Lian et.al. | 2404.17113 | link |
| 2024-04-30 | AutoGluon-Multimodal (AutoMM): Supercharging Multimodal AutoML with Foundation Models | Zhiqiang Tang et.al. | 2404.16233 | null |
| 2024-04-23 | Hidden in Plain Sight: Exploring the Intersections of Mental Health, Eating Disorders, and Content Moderation on TikTok | Charles Bickham et.al. | 2404.15457 | null |
| 2024-04-14 | A Survey on Multimodal Wearable Sensor-based Human Action Recognition | Jianyuan Ni et.al. | 2404.15349 | null |
| 2024-04-23 | Between Flat-Earthers and Fitness Coaches: Who is Citing Scientific Publications in YouTube Video Descriptions? | Olga Zagovora et.al. | 2404.15083 | null |
| 2024-04-19 | Cooperative Sentiment Agents for Multimodal Sentiment Analysis | Shanmin Wang et.al. | 2404.12642 | link |
| 2024-04-18 | Dynamic Modality and View Selection for Multimodal Emotion Recognition with Missing Modalities | Luciana Trinkaus Menon et.al. | 2404.12251 | null |
| 2024-04-19 | TC-OCR: TableCraft OCR for Efficient Detection & Recognition of Table Structure & Content | Avinash Anand et.al. | 2404.10305 | null |
| 2024-04-15 | AIGeN: An Adversarial Approach for Instruction Generation in VLN | Niyati Rawal et.al. | 2404.10054 | null |
| 2024-04-22 | Neuro-Inspired Information-Theoretic Hierarchical Perception for Multimodal Learning | Xiongye Xiao et.al. | 2404.09403 | link |
| 2024-04-14 | TrafficVLM: A Controllable Visual Language Model for Traffic Video Captioning | Quang Minh Dinh et.al. | 2404.09275 | link |
| 2024-04-13 | MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild | Kateryna Chumachenko et.al. | 2404.09010 | link |
| 2024-04-12 | OmniSat: Self-Supervised Modality Fusion for Earth Observation | Guillaume Astruc et.al. | 2404.08351 | link |
| 2024-04-11 | Multimodal Emotion Recognition by Fusing Video Semantic in MOOC Learning Scenarios | Yuan Zhang et.al. | 2404.07484 | null |
| 2024-04-07 | X-VARS: Introducing Explainability in Football Refereeing with Multi-Modal Large Language Model | Jan Held et.al. | 2404.06332 | null |
| 2024-04-07 | A Data-to-Product Multimodal Conceptual Framework to Achieve Automated Software Evolution for Context-rich Intelligent Applications | Songhui Yue et.al. | 2404.04821 | null |
| 2024-04-06 | Interpretable Multimodal Learning for Cardiovascular Hemodynamics Assessment | Prasun C Tripathi et.al. | 2404.04718 | link |
| 2024-04-05 | Mitigating Heterogeneity in Federated Multimodal Learning with Biomedical Vision-Language Pre-training | Zitao Shuai et.al. | 2404.03854 | null |
| 2024-04-02 | On Stronger Computational Separations Between Multimodal and Unimodal Machine Learning | Ari Karchmer et.al. | 2404.02254 | null |
| 2024-04-01 | iMD4GC: Incomplete Multimodal Data Integration to Advance Precise Treatment Response Prediction and Survival Analysis for Gastric Cancer | Fengtao Zhou et.al. | 2404.01192 | link |
| 2024-04-11 | MIPS at SemEval-2024 Task 3: Multimodal Emotion-Cause Pair Extraction in Conversations with Multimodal Language Models | Zebang Cheng et.al. | 2404.00511 | link |
| 2024-03-30 | UniMEEC: Towards Unified Multimodal Emotion Recognition and Emotion Cause | Guimin Hu et.al. | 2404.00403 | null |
| 2024-03-28 | IVLMap: Instance-Aware Visual Language Grounding for Consumer Robot Navigation | Jiacui Huang et.al. | 2403.19336 | null |
| 2024-03-26 | Hierarchical Open-Vocabulary 3D Scene Graphs for Language-Grounded Robot Navigation | Abdelrhman Werby et.al. | 2403.17846 | null |
| 2024-03-26 | Project MOSLA: Recording Every Moment of Second Language Acquisition | Masato Hagiwara et.al. | 2403.17314 | null |
| 2024-03-17 | A Survey of IMU Based Cross-Modal Transfer Learning in Human Activity Recognition | Abhi Kamboj et.al. | 2403.15444 | null |
| 2024-03-22 | Contrastive Learning on Multimodal Analysis of Electronic Health Records | Tianxi Cai et.al. | 2403.14926 | null |
| 2024-03-20 | Grounding Spatial Relations in Text-Only Language Models | Gorka Azkune et.al. | 2403.13666 | link |
| 2024-04-02 | Recursive Joint Cross-Modal Attention for Multimodal Fusion in Dimensional Emotion Recognition | R. Gnana Praveen et.al. | 2403.13659 | null |
| 2024-03-20 | VL-Mamba: Exploring State Space Models for Multimodal Learning | Yanyuan Qiao et.al. | 2403.13600 | null |
| 2024-03-17 | From Pixels to Predictions: Spectrogram and Vision Transformer for Better Time Series Forecasting | Zhen Zeng et.al. | 2403.11047 | null |
| 2024-03-26 | Borrowing Treasures from Neighbors: In-Context Learning for Multimodal Learning with Missing Modalities and Data Scarcity | Zhuo Zhi et.al. | 2403.09428 | link |
| 2024-03-14 | Language-Grounded Dynamic Scene Graphs for Interactive Object Search with Mobile Manipulation | Daniel Honerkamp et.al. | 2403.08605 | link |
| 2024-03-12 | A Multimodal Intermediate Fusion Network with Manifold Learning for Stress Detection | Morteza Bodaghi et.al. | 2403.08077 | null |
| 2024-03-10 | WorldGPT: A Sora-Inspired Video AI Agent as Rich World Models from Text and Image Inputs | Deshun Yang et.al. | 2403.07944 | null |
| 2024-03-25 | FocusCLIP: Multimodal Subject-Level Guidance for Zero-Shot Transfer in Human-Centric Tasks | Muhammad Saif Ullah Khan et.al. | 2403.06904 | null |
| 2024-03-11 | DiaLoc: An Iterative Approach to Embodied Dialog Localization | Chao Zhang et.al. | 2403.06846 | null |
| 2024-03-11 | Zero-Shot ECG Classification with Multimodal Learning and Test-time Clinical Knowledge Enhancement | Che Liu et.al. | 2403.06659 | link |
| 2024-03-07 | A Modular End-to-End Multimodal Learning Method for Structured and Unstructured Data | Marco D Alessandro et.al. | 2403.04866 | link |
| 2024-03-05 | JMI at SemEval 2024 Task 3: Two-step approach for multimodal ECAC using in-context learning with GPT and instruction-tuned Llama models | Arefa et.al. | 2403.04798 | link |
| 2024-03-07 | CLIP the Bias: How Useful is Balancing Data in Multimodal Learning? | Ibrahim Alabdulmohsin et.al. | 2403.04547 | null |
| 2024-03-04 | Reactive Programming without Functions | Bjarno Oeyen et.al. | 2403.02296 | null |
| 2024-03-03 | Hyperspectral Image Analysis in Single-Modal and Multimodal setting using Deep Learning Techniques | Shivam Pande et.al. | 2403.01546 | null |
| 2024-03-02 | ICC: Quantifying Image Caption Concreteness for Multimodal Dataset Curation | Moran Yanuka et.al. | 2403.01306 | link |
| 2024-03-02 | Adversarial Testing for Visual Grounding via Image-Aware Property Reduction | Zhiyuan Chang et.al. | 2403.01118 | null |
| 2024-02-29 | Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers | Tsai-Shien Chen et.al. | 2402.19479 | null |
| 2024-02-29 | FATE in MMLA: A Student-Centred Exploration of Fairness, Accountability, Transparency, and Ethics in Multimodal Learning Analytics | Yueqiao Jin et.al. | 2402.19071 | null |
| 2024-02-28 | Grounding Language Models for Visual Entity Recognition | Zilin Xiao et.al. | 2402.18695 | link |
| 2024-02-28 | Multimodal Learning To Improve Cardiac Late Mechanical Activation Detection From Cine MR Images | Jiarui Xing et.al. | 2402.18507 | null |
| 2024-02-28 | DecisionNCE: Embodied Multimodal Representations via Implicit Preference Learning | Jianxiong Li et.al. | 2402.18137 | null |
| 2024-02-27 | Multimodal Learned Sparse Retrieval with Probabilistic Expansion Control | Thong Nguyen et.al. | 2402.17535 | link |
| 2024-02-27 | Curriculum Learning Meets Directed Acyclic Graph for Multimodal Emotion Recognition | Cam-Van Thi Nguyen et.al. | 2402.17269 | null |
| 2024-02-26 | GROUNDHOG: Grounding Large Language Models to Holistic Segmentation | Yichi Zhang et.al. | 2402.16846 | null |
| 2024-02-26 | Gradient-Guided Modality Decoupling for Missing-Modality Robustness | Hao Wang et.al. | 2402.16318 | null |
| 2024-02-24 | FedMM: Federated Multi-Modal Learning with Modality Heterogeneity in Computational Pathology | Yuanzhe Peng et.al. | 2402.15858 | null |
| 2024-02-20 | GRAFFORD: A Benchmark Dataset for Testing the Knowledge of Object Affordances of Language and Vision Models | Sayantan Adak et.al. | 2402.12881 | link |
| 2024-02-19 | Multimodal Emotion Recognition from Raw Audio with Sinc-convolution | Xiaohui Zhang et.al. | 2402.11954 | null |
| 2024-02-18 | Efficient Multimodal Learning from Data-centric Perspective | Muyang He et.al. | 2402.11530 | link |
Anomaly Detection
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2025-12-23 | High Dimensional Data Decomposition for Anomaly Detection of Textured Images | Ji Song et.al. | 2512.20432 | null |
| 2025-12-23 | Chain-of-Anomaly Thoughts with Large Vision-Language Models | Pedro Domingos et.al. | 2512.20417 | null |
| 2025-12-23 | Machine-learning techniques for model-independent searches in dijet final states | CMS Collaboration et.al. | 2512.20395 | null |
| 2025-12-23 | Population Protocols Revisited: Parity and Beyond | Leszek Gąsieniec et.al. | 2512.20163 | null |
| 2025-12-23 | Spatio-Temporal Graphs Beyond Grids: Benchmark for Maritime Anomaly Detection | Jeehong Kim et.al. | 2512.20086 | null |
| 2025-12-22 | Lightweight Intrusion Detection in IoT via SHAP-Guided Feature Pruning and Knowledge-Distilled Kronecker Networks | Hafsa Benaddi et.al. | 2512.19488 | null |
| 2025-12-22 | Real-Time Machine Learning for Embedded Anomaly Detection | Abdelmadjid Benmachiche et.al. | 2512.19383 | null |
| 2025-12-22 | Evaluating MCC for Low-Frequency Cyberattack Detection in Imbalanced Intrusion Detection Data | Prameshwar Thiyagarajan et.al. | 2512.19203 | null |
| 2025-12-22 | Fraud Detection Through Large-Scale Graph Clustering with Heterogeneous Link Transformation | Chi Liu et.al. | 2512.19061 | null |
| 2025-12-22 | Elevating Intrusion Detection and Security Fortification in Intelligent Networks through Cutting-Edge Machine Learning Paradigms | Md Minhazul Islam Munna et.al. | 2512.19037 | null |
| 2025-12-22 | Outlier detection in mixed-attribute data: a semi-supervised approach with fuzzy approximations and relative entropy | Baiyang Chen et.al. | 2512.18978 | null |
| 2025-12-22 | Consistency-guided semi-supervised outlier detection in heterogeneous data using fuzzy rough sets | Baiyang Chen et.al. | 2512.18977 | null |
| 2025-12-21 | Hyperbolic Graph Embeddings: a Survey and an Evaluation on Anomaly Detection | Souhail Abdelmouaiz Sadat et.al. | 2512.18826 | null |
| 2025-12-21 | Label-Informed Outlier Detection Based on Granule Density | Baiyang Chen et.al. | 2512.18774 | null |
| 2025-12-21 | Explainable and Fine-Grained Safeguarding of LLM Multi-Agent Systems via Bi-Level Graph Anomaly Detection | Junjun Pan et.al. | 2512.18733 | null |
| 2025-12-21 | Improving Pattern Recognition of Scheduling Anomalies through Structure-Aware and Semantically-Enhanced Graphs | Ning Lyu et.al. | 2512.18673 | null |
| 2025-12-20 | Cyber Threat Detection Enabled by Quantum Computing | Zisheng Chen et.al. | 2512.18493 | null |
| 2025-12-20 | Towards Scalable Visual Data Wrangling via Direct Manipulation | El Kindi Rezig et.al. | 2512.18405 | null |
| 2025-12-20 | Unsupervised Anomaly Detection with an Enhanced Teacher for Student-Teacher Feature Pyramid Matching | Mohammad Zolfaghari et.al. | 2512.18219 | null |
| 2025-12-20 | PROVEX: Enhancing SOC Analyst Trust with Explainable Provenance-Based IDS | Devang Dhanuka et.al. | 2512.18199 | null |
| 2025-12-19 | Grad: Guided Relation Diffusion Generation for Graph Augmentation in Graph Fraud Detection | Jie Yang et.al. | 2512.18133 | null |
| 2025-12-19 | AI Assisted Next Gen Outdoor Optical Networks: Camera Sensing for Monitoring and User Localization | Meysam Ghanbari et.al. | 2512.18087 | null |
| 2025-12-19 | Fraud detection in credit card transactions using Quantum-Assisted Restricted Boltzmann Machines | João Marcos Cavalcanti de Albuquerque Neto et.al. | 2512.17660 | null |
| 2025-12-19 | HeadHunt-VAD: Hunting Robust Anomaly-Sensitive Heads in MLLM for Tuning-Free Video Anomaly Detection | Zhaolin Cai et.al. | 2512.17601 | null |
| 2025-12-18 | Another Fit Bites the Dust: Conformal Prediction as a Calibration Standard for Machine Learning in High-Energy Physics | Jack Y. Araz et.al. | 2512.17048 | null |
| 2025-12-18 | New Theoretical Insights and Algorithmic Solutions for Reconstructing Score Sequences from Tournament Score Sets | Bowen Liu et.al. | 2512.16961 | null |
| 2025-12-18 | TimeSeries2Report prompting enables adaptive large language model management of lithium-ion batteries | Jiayang Yang et.al. | 2512.16453 | null |
| 2025-12-18 | Coarse-to-Fine Open-Set Graph Node Classification with Large Language Models | Xueqi Ma et.al. | 2512.16244 | null |
| 2025-12-17 | Explainable AI in Big Data Fraud Detection | Ayush Jain et.al. | 2512.16037 | null |
| 2025-12-17 | Reusable theory representations for colliders: a demonstrator SMEFT foundation model | Supratim Das Bakshi et.al. | 2512.15862 | null |
| 2025-12-14 | Adversarial Robustness in Financial Machine Learning: Defenses, Economic Impact, and Governance Evidence | Samruddhi Baviskar et.al. | 2512.15780 | null |
| 2025-12-14 | Hyperparameter Tuning-Based Optimized Performance Analysis of Machine Learning Algorithms for Network Intrusion Detection | Sudhanshu Sekhar Tripathy et.al. | 2512.15779 | null |
| 2025-12-12 | PHANTOM: Progressive High-fidelity Adversarial Network for Threat Object Modeling | Jamal Al-Karaki et.al. | 2512.15768 | null |
| 2025-12-06 | Bayesian Modeling for Uncertainty Management in Financial Risk Forecasting and Compliance | Sharif Al Mamun et.al. | 2512.15739 | null |
| 2025-12-17 | A Multivariate Statistical Framework for Detection, Classification and Pre-localization of Anomalies in Water Distribution Networks | Oleg Melnikov et.al. | 2512.15685 | null |
| 2025-12-17 | Online Partitioned Local Depth for semi-supervised applications | John D. Foley et.al. | 2512.15436 | null |
| 2025-12-17 | A Masked Reverse Knowledge Distillation Method Incorporating Global and Local Information for Image Anomaly Detection | Yuxin Jiang et.al. | 2512.15326 | null |
| 2025-12-17 | MECAD: A multi-expert architecture for continual anomaly detection | Malihe Dahmardeh et.al. | 2512.15323 | null |
| 2025-12-17 | Prototypical Learning Guided Context-Aware Segmentation Network for Few-Shot Anomaly Detection | Yuxin Jiang et.al. | 2512.15319 | null |
| 2025-12-17 | Quantum Machine Learning for Cybersecurity: A Taxonomy and Future Directions | Siva Sai et.al. | 2512.15286 | null |
| 2025-12-17 | Bounty Hunter: Autonomous, Comprehensive Emulation of Multi-Faceted Adversaries | Louis Hackländer-Jansen et.al. | 2512.15275 | null |
| 2025-12-17 | Accelerating High-Throughput Catalyst Screening by Direct Generation of Equilibrium Adsorption Structures | Songze Huo et.al. | 2512.15228 | null |
| 2025-12-16 | Intrusion Detection in Internet of Vehicles Using Machine Learning | Hop Le et.al. | 2512.14958 | null |
| 2025-12-12 | Quantum-Augmented AI/ML for O-RAN: Hierarchical Threat Detection with Synergistic Intelligence and Interpretability (Technical Report) | Tan Le et.al. | 2512.14742 | null |
| 2025-12-12 | How Deep Does Your Dependency Tree Go? An Empirical Study of Dependency Amplification Across 10 Package Ecosystems | Jahidul Arafat et.al. | 2512.14739 | null |
| 2025-12-08 | SGEMAS: A Self-Growing Ephemeral Multi-Agent System for Unsupervised Online Anomaly Detection via Entropic Homeostasis | Mustapha Hamdi et.al. | 2512.14708 | null |
| 2025-12-16 | Hierarchical Persistence Velocity for Network Anomaly Detection: Theory and Applications to Cryptocurrency Markets | Omid Khormali et.al. | 2512.14615 | null |
| 2025-12-16 | Fusion of Cellular ISAC and Passive RF Sensing for UAV Detection and Tracking | Cole Dickerson et.al. | 2512.14608 | null |
| 2025-12-16 | LLmFPCA-detect: LLM-powered Multivariate Functional PCA for Anomaly Detection in Sparse Longitudinal Texts | Prasanjit Dubey et.al. | 2512.14604 | null |
| 2025-12-16 | Hybrid Ensemble Method for Detecting Cyber-Attacks in Water Distribution Systems Using the BATADAL Dataset | Waqas Ahmed et.al. | 2512.14422 | null |
| 2025-12-16 | FusAD: Time-Frequency Fusion with Adaptive Denoising for General Time Series Analysis | Da Zhang et.al. | 2512.14078 | null |
| 2025-12-15 | Look everywhere effects in anomaly detection | Marie Hein et.al. | 2512.13787 | null |
| 2025-12-14 | DARTs: A Dual-Path Robust Framework for Anomaly Detection in High-Dimensional Multivariate Time Series | Xuechun Liu et.al. | 2512.13735 | null |
| 2025-12-15 | AgentIAD: Tool-Augmented Single-Agent for Industrial Anomaly Detection | Junwen Miao et.al. | 2512.13671 | null |
| 2025-12-15 | 3D Human-Human Interaction Anomaly Detection | Shun Maeda et.al. | 2512.13560 | null |
| 2025-12-15 | Behavior-Aware and Generalizable Defense Against Black-Box Adversarial Attacks for ML-Based IDS | Sabrine Ennaji et.al. | 2512.13501 | null |
| 2025-12-15 | On-Device Continual Learning for Unsupervised Visual Anomaly Detection in Dynamic Manufacturing | Haoyu Ren et.al. | 2512.13497 | null |
| 2025-12-15 | Understanding Structured Financial Data with LLMs: A Case Study on Fraud Detection | Xuwei Tan et.al. | 2512.13040 | null |
| 2025-12-14 | A Rule-Aware Prompt Framework for Structured Numeric Reasoning in Cyber-Physical Systems | Yichen Liu et.al. | 2512.12794 | null |
| 2025-12-14 | FiD-QAE: A Fidelity-Driven Quantum Autoencoder for Credit Card Fraud Detection | Mansour El Alami et.al. | 2512.12689 | null |
| 2025-12-13 | Sleep pattern profiling using a finite mixture of contaminated multivariate skew-normal distributions on incomplete data | Jason Pillay et.al. | 2512.12464 | null |
| 2025-12-13 | Robust Outlier Detection and Low-Latency Concept Drift Adaptation for Data Stream Regression: A Dual-Channel Architecture | Bingbing Wang et.al. | 2512.12289 | null |
| 2025-12-13 | A Multi-Year Urban Streetlight Imagery Dataset for Visual Monitoring and Spatio-Temporal Drift Detection | Peizheng Li et.al. | 2512.12205 | null |
| 2025-12-12 | Rethinking Jailbreak Detection of Large Vision Language Models with Representational Contrastive Scoring | Peichun Hua et.al. | 2512.12069 | null |
| 2025-12-12 | Log Anomaly Detection with Large Language Models via Knowledge-Enriched Fusion | Anfeng Peng et.al. | 2512.11997 | null |
| 2025-12-12 | A Multi-Criteria Automated MLOps Pipeline for Cost-Effective Cloud-Based Classifier Retraining in Response to Data Distribution Shifts | Emmanuel K. Katalay et.al. | 2512.11541 | null |
| 2025-12-12 | Collaborative Reconstruction and Repair for Multi-class Industrial Anomaly Detection | Qishan Wang et.al. | 2512.11401 | null |
| 2025-12-12 | RcAE: Recursive Reconstruction Framework for Unsupervised Industrial Anomaly Detection | Rongcheng Wu et.al. | 2512.11284 | null |
| 2025-12-11 | Q-BAR: Blogger Anomaly Recognition via Quantum-enhanced Manifold Learning | Maida Wang et.al. | 2512.11071 | null |
| 2025-12-11 | An Efficient Variant of One-Class SVM with Lifelong Online Learning Guarantees | Joe Suk et.al. | 2512.11052 | null |
| 2025-12-11 | DCFO: Density-Based Counterfactuals for Outliers - Additional Material | Tommaso Amico et.al. | 2512.10659 | null |
| 2025-12-11 | Adaptive Intrusion Detection System Leveraging Dynamic Neural Models with Adversarial Learning for 5G/6G Networks | Neha et.al. | 2512.10637 | null |
| 2025-12-11 | Stealth and Evasion in Rogue AP Attacks: An Analysis of Modern Detection and Bypass Techniques | Kaleb Bacztub et.al. | 2512.10470 | null |
| 2025-12-11 | Semantic Reconstruction of Adversarial Plagiarism: A Context-Aware Framework for Detecting and Restoring “Tortured Phrases” in Scientific Literature | Agniva Maiti et.al. | 2512.10435 | null |
| 2025-12-10 | LogICL: Distilling LLM Reasoning to Bridge the Semantic Gap in Cross-Domain Log Anomaly Detection | Jingwei Ye et.al. | 2512.09627 | null |
| 2025-12-10 | Defect-aware Hybrid Prompt Optimization via Progressive Tuning for Zero-Shot Multi-type Anomaly Detection and Segmentation | Nadeem Nazer et.al. | 2512.09446 | null |
| 2025-12-09 | Natural Geometry of Robust Data Attribution: From Convex Models to Deep Networks | Shihao Li et.al. | 2512.09103 | null |
| 2025-12-09 | Explainable Anomaly Detection for Industrial IoT Data Streams | Ana Rita Paupério et.al. | 2512.08885 | null |
| 2025-12-09 | Reusability in MLOps: Leveraging Ports and Adapters to Build a Microservices Architecture for the Maritime Domain | Renato Cordeiro Ferreira et.al. | 2512.08657 | null |
| 2025-12-09 | The SMART+ Framework for AI Systems | Laxmiraju Kandikatla et.al. | 2512.08592 | null |
| 2025-12-09 | Labeled Delegated PSI and its Applications in the Public Sector | Kristof Verslype et.al. | 2512.08558 | null |
| 2025-12-09 | Developing a Strong CPS Defender: An Evolutionary Approach | Qingyuan Hu et.al. | 2512.08320 | null |
| 2025-12-09 | FedLAD: A Modular and Adaptive Testbed for Federated Log Anomaly Detection | Yihan Liao et.al. | 2512.08277 | null |
| 2025-12-09 | Improving the Sensitivity of Backdoor Detectors via Class Subspace Orthogonalization | Guangmingmei Yang et.al. | 2512.08129 | null |
| 2025-12-08 | The interstellar signature: A computational framework for open source interstellar tracking | Pancha Narayan Sahu et.al. | 2512.07910 | null |
| 2025-12-08 | Agentic Artificial Intelligence for Ethical Cybersecurity in Uganda: A Reinforcement Learning Framework for Threat Detection in Resource-Constrained Environments | Ibrahim Adabara et.al. | 2512.07909 | null |
| 2025-11-26 | Pattern Recognition of Ozone-Depleting Substance Exports in Global Trade Data | Muhammad Sukri Bin Ramli et.al. | 2512.07864 | null |
| 2025-11-26 | SetAD: Semi-Supervised Anomaly Learning in Contextual Sets | Jianling Gao et.al. | 2512.07863 | null |
| 2025-12-08 | An Adaptive Multi-Layered Honeynet Architecture for Threat Behavior Analysis via Deep Learning | Lukas Johannes Möller et.al. | 2512.07827 | null |
| 2025-12-08 | WaggleNet: A LoRa and MQTT-Based Monitoring System for Internal and External Beehive Conditions | Minju Jeon et.al. | 2512.07408 | null |
| 2025-12-07 | MINES: Explainable Anomaly Detection through Web API Invariant Inference | Wenjie Zhang et.al. | 2512.06906 | null |
| 2025-12-07 | Pseudo Anomalies Are All You Need: Diffusion-Based Generation for Weakly-Supervised Video Anomaly Detection | Satoshi Hashimoto et.al. | 2512.06845 | null |
| 2025-12-07 | CADE: Continual Weakly-supervised Video Anomaly Detection with Ensembles | Satoshi Hashimoto et.al. | 2512.06840 | null |
| 2025-12-07 | Financial Fraud Identification and Interpretability Study for Listed Companies Based on Convolutional Neural Network | Xiao Li et.al. | 2512.06648 | null |
| 2025-12-06 | AgenticCyber: A GenAI-Powered Multi-Agent System for Multimodal Threat Detection and Adaptive Response in Cybersecurity | Shovan Roy et.al. | 2512.06396 | null |
| 2025-12-06 | The Bag-and-Whisker Plot: A New Bagplot for Bivariate Data | Shenghao Qin et.al. | 2512.06314 | null |
| 2025-12-05 | AIMNET: An IoT-Empowered Digital Twin for Continuous Gas Emission Monitoring and Early Hazard Detection | Zifan Zhou et.al. | 2512.06148 | null |
| 2025-11-25 | Autoencoder-based time series anomaly detection for ATLAS Liquid Argon calorimeter data quality monitoring | Vilius Čepaitis et.al. | 2512.05977 | null |
| 2025-12-05 | A Unified AI System For Data Quality Control and DataOps Management in Regulated Environments | Devender Saini et.al. | 2512.05559 | null |
| 2025-12-05 | SCoNE: Spherical Consistent Neighborhoods Ensemble for Effective and Efficient Multi-View Anomaly Detection | Yang Xu et.al. | 2512.05540 | null |
| 2025-12-05 | IDK-S: Incremental Distributional Kernel for Streaming Anomaly Detection | Yang Xu et.al. | 2512.05531 | null |
| 2025-12-05 | Concept-based Explainable Data Mining with VLM for 3D Detection | Mai Tsujimoto et.al. | 2512.05482 | null |
| 2025-12-04 | Hybrid Quantum-Classical Autoencoders for Unsupervised Network Intrusion Detection | Mohammad Arif Rasyidi et.al. | 2512.05069 | null |
| 2025-12-04 | Logic-Driven Cybersecurity: A Novel Framework for System Log Anomaly Detection using Answer Set Programming | Fang Li et.al. | 2512.04908 | null |
| 2025-12-04 | A Novel Trust-Based DDoS Cyberattack Detection Model for Smart Business Environments | Oghenetejiri Okporokpo et.al. | 2512.04855 | null |
| 2025-12-04 | Optimal Transport Event Representation for Anomaly Detection | Aditya Bhargava et.al. | 2512.04839 | null |
| 2025-12-04 | Federated Learning for Anomaly Detection in Maritime Movement Data | Anita Graser et.al. | 2512.04635 | null |
| 2025-12-04 | Exploiting ftrace’s function_graph Tracer Features for Machine Learning: A Case Study on Encryption Detection | Kenan Begovic et.al. | 2512.04590 | null |
| 2025-12-04 | LLM-SrcLog: Towards Proactive and Unified Log Template Extraction via Large Language Models | Jiaqi Sun et.al. | 2512.04474 | null |
| 2025-12-04 | AutoGuard: A Self-Healing Proactive Security Layer for DevSecOps Pipelines Using Reinforcement Learning | Praveen Anugula et.al. | 2512.04368 | null |
| 2025-12-03 | Inference-time Stochastic Refinement of GRU-Normalizing Flow for Real-time Video Motion Transfer | Tasmiah Haque et.al. | 2512.04282 | null |
| 2025-12-03 | TARA Test-by-Adaptive-Ranks for Quantum Anomaly Detection with Conformal Prediction Guarantees | Davut Emre Tasar et.al. | 2512.04016 | null |
| 2025-12-03 | Quantum Topological Graph Neural Networks for Detecting Complex Fraud Patterns | Mohammad Doost et.al. | 2512.03696 | null |
| 2025-12-03 | Federated Learning and Trajectory Compression for Enhanced AIS Coverage | Thomas Gräupl et.al. | 2512.03584 | null |
| 2025-11-30 | A Hybrid Deep Learning and Anomaly Detection Framework for Real-Time Malicious URL Classification | Berkani Khaled et.al. | 2512.03462 | null |
| 2025-12-03 | MAGE-ID: A Multimodal Generative Framework for Intrusion Detection Systems | Mahdi Arab Loodaricheh et.al. | 2512.03375 | null |
| 2025-12-03 | Hierarchical Attention for Sparse Volumetric Anomaly Detection in Subclinical Keratoconus | Lynn Kandakji et.al. | 2512.03346 | null |
| 2025-12-02 | Novelty detection on path space | Ioannis Gasteratos et.al. | 2512.03243 | null |
| 2025-12-02 | Deteccion de intrusiones en redes mediante algoritmos de aprendizaje automatico: Un estudio multiclase sobre el conjunto de datos NSL-KDD | Luis Miguel Osco Vasquez et.al. | 2512.03200 | null |
| 2025-12-02 | Neighborhood density estimation using space-partitioning based hashing schemes | Aashi Jindal et.al. | 2512.03187 | null |
| 2025-12-02 | AGENTSAFE: A Unified Framework for Ethical Assurance and Governance in Agentic AI | Rafflesia Khan et.al. | 2512.03180 | null |
| 2025-12-02 | Temporal Graph Neural Networks for Early Anomaly Detection and Performance Prediction via PV System Monitoring Data | Srijani Mukherjee et.al. | 2512.03114 | null |
| 2025-12-01 | ALARM: Automated MLLM-Based Anomaly Detection in Complex-EnviRonment Monitoring with Uncertainty Quantification | Congjing Zhang et.al. | 2512.03101 | null |
| 2025-12-02 | AuditCopilot: Leveraging LLMs for Fraud Detection in Double-Entry Bookkeeping | Md Abdul Kadir et.al. | 2512.02726 | null |
| 2025-12-02 | FGC-Comp: Adaptive Neighbor-Grouped Attribute Completion for Graph-based Anomaly Detection | Junpeng Wu et.al. | 2512.02705 | null |
| 2025-12-02 | ClimaOoD: Improving Anomaly Segmentation via Physically Realistic Synthetic Data | Yuxing Liu et.al. | 2512.02686 | null |
| 2025-12-02 | On the Problem of Consistent Anomalies in Zero-Shot Anomaly Detection | Tai Le-Gia et.al. | 2512.02520 | null |
| 2025-12-02 | Breast Cell Segmentation Under Extreme Data Constraints: Quantum Enhancement Meets Adaptive Loss Stabilization | Varun Kumar Dasoju et.al. | 2512.02302 | null |
| 2025-12-01 | Intrusion Detection on Resource-Constrained IoT Devices with Hardware-Aware ML and DL | Ali Diab et.al. | 2512.02272 | null |
| 2025-12-01 | AI-Driven Cybersecurity Testbed for Nuclear Infrastructure: Comprehensive Evaluation Using METL Operational Data | Benjamin Blakely et.al. | 2512.01727 | null |
| 2025-12-01 | ICAD-LLM: One-for-All Anomaly Detection via In-Context Learning with Large Language Models | Zhongyuan Wu et.al. | 2512.01672 | null |
| 2025-12-01 | Deep Unsupervised Anomaly Detection in Brain Imaging: Large-Scale Benchmarking and Bias Analysis | Alexander Frotscher et.al. | 2512.01534 | null |
| 2025-12-01 | Winning Solutions for the Rayan AI Contest: Compositional Retrieval, Zero-Shot Anomaly Detection, and Backdoor Detection | Ali Nafisi et.al. | 2512.01498 | null |
| 2025-12-01 | Modeling Wavelet Transformed Quantum Support Vector for Network Intrusion Detection | Swati Kumari et.al. | 2512.01365 | null |
| 2025-12-01 | The Dynamical Model Representation of Convolution-Generated Spatio-Temporal Gaussian Processes and Its Applications | Yutong Zhang et.al. | 2512.01279 | null |
| 2025-11-30 | Opportunities and Challenges for Data Quality in the Era of Quantum Computing | Sven Groppe et.al. | 2512.00870 | null |
| 2025-11-30 | FC-ADL: Efficient Microservice Anomaly Detection and Localisation Through Functional Connectivity | Giles Winchester et.al. | 2512.00844 | null |
| 2025-11-29 | Pushing the Boundaries of Interpretability: Incremental Enhancements to the Explainable Boosting Machine | Isara Liyanage et.al. | 2512.00528 | null |
| 2025-11-29 | Introducing AI-Driven IoT Energy Management Framework | Shivani Mruthyunjaya et.al. | 2512.00321 | null |
| 2025-11-29 | ART-ASyn: Anatomy-aware Realistic Texture-based Anomaly Synthesis Framework for Chest X-Rays | Qinyi Cao et.al. | 2512.00310 | null |
| 2025-11-28 | SD-CGAN: Conditional Sinkhorn Divergence GAN for DDoS Anomaly Detection in IoT Networks | Henry Onyeka et.al. | 2512.00251 | null |
| 2025-11-28 | TIE: A Training-Inversion-Exclusion Framework for Visually Interpretable and Uncertainty-Guided Out-of-Distribution Detection | Pirzada Suhail et.al. | 2512.00229 | null |
| 2025-11-28 | A Trainable Centrality Framework for Modern Data | Minh Duc Vu et.al. | 2511.22959 | null |
| 2025-11-27 | An Efficient Privacy-preserving Intrusion Detection Scheme for UAV Swarm Networks | Kanchon Gharami et.al. | 2511.22791 | null |
| 2025-11-27 | AnoRefiner: Anomaly-Aware Group-Wise Refinement for Zero-Shot Industrial Anomaly Detection | Dayou Huang et.al. | 2511.22595 | null |
| 2025-11-27 | ABounD: Adversarial Boundary-Driven Few-Shot Learning for Multi-Class Anomaly Detection | Runzhi Deng et.al. | 2511.22436 | null |
| 2025-11-27 | ARES: Anomaly Recognition Model For Edge Streams | Simone Mungari et.al. | 2511.22078 | null |
| 2025-11-27 | A Catalogue of Mid-infrared Variable Sources from unTimely | Zihan Kang et.al. | 2511.22071 | null |
| 2025-11-26 | Modeling Quantum Autoencoder Trainable Kernel for IoT Anomaly Detection | Swathi Chandrasekhar et.al. | 2511.21932 | null |
| 2025-11-26 | New Physics Searches at the LHC through Event-based Anomaly Detection and Development of ADFilter Web-tool | Wasikul Islam et.al. | 2511.21869 | null |
| 2025-11-26 | Unsupervised Anomaly Detection for Smart IoT Devices: Performance and Resource Comparison | Md. Sad Abdullah Sami et.al. | 2511.21842 | null |
| 2025-11-26 | Advanced Data Collection Techniques in Cloud Security: A Multi-Modal Deep Learning Autoencoder Approach | Aamiruddin Syed et.al. | 2511.21795 | null |
| 2025-11-26 | TAGFN: A Text-Attributed Graph Dataset for Fake News Detection in the Age of LLMs | Kay Liu et.al. | 2511.21624 | null |
| 2025-11-26 | Anomaly Detection with Adaptive and Aggressive Rejection for Contaminated Training Data | Jungi Lee et.al. | 2511.21378 | null |
| 2025-11-26 | Evaluation of Large Language Models for Numeric Anomaly Detection in Power Systems | Yichen Liu et.al. | 2511.21371 | null |
| 2025-11-26 | Hybrid SIFT-SNN for Efficient Anomaly Detection of Traffic Flow-Control Infrastructure | Munish Rathee et.al. | 2511.21337 | null |
| 2025-11-26 | I-GLIDE: Input Groups for Latent Health Indicators in Degradation Estimation | Lucas Thil et.al. | 2511.21208 | link |
| 2025-11-25 | Securing the Model Context Protocol (MCP): Risks, Controls, and Governance | Herman Errico et.al. | 2511.20920 | null |
| 2025-11-25 | Ranking-Enhanced Anomaly Detection Using Active Learning-Assisted Attention Adversarial Dual AutoEncoders | Sidahmed Benabderrahmane et.al. | 2511.20480 | null |
| 2025-11-25 | DRL-Guided Neural Batch Sampling for Semi-Supervised Pixel-Level Anomaly Detection | Amirhossein Khadivi Noghredeh et.al. | 2511.20270 | null |
| 2025-11-25 | ADNet: A Large-Scale and Extensible Multi-Domain Benchmark for Anomaly Detection Across 380 Real-World Categories | Hai Ling et.al. | 2511.20169 | null |
| 2025-11-25 | An experimental study of existing tools for outlier detection and cleaning in trajectories | Mariana M Garcez Duarte et.al. | 2511.20139 | null |
| 2025-11-25 | Explainable Visual Anomaly Detection via Concept Bottleneck Models | Arianna Stropeni et.al. | 2511.20088 | null |
| 2025-11-24 | IRSDA: An Agent-Orchestrated Framework for Enterprise Intrusion Response | Damodar Panigrahi et.al. | 2511.19644 | null |
| 2025-11-23 | The Generalized Proximity Forest | Ben Shaw et.al. | 2511.19487 | null |
| 2025-11-22 | Pistachio: Towards Synthetic, Balanced, and Long-Form Video Anomaly Benchmarks | Jie Li et.al. | 2511.19474 | null |
| 2025-11-24 | Neural Architecture Search for Quantum Autoencoders | Hibah Agha et.al. | 2511.19246 | null |
| 2025-11-24 | Unsupervised Multi-View Visual Anomaly Detection via Progressive Homography-Guided Alignment | Xintao Chen et.al. | 2511.18766 | null |
| 2025-11-24 | Evaluation of Real-Time Mitigation Techniques for Cyber Security in IEC 61850 / IEC 62351 Substations | Akila Herath et.al. | 2511.18748 | null |
| 2025-11-24 | A Problem-Oriented Taxonomy of Evaluation Metrics for Time Series Anomaly Detection | Kaixiang Yang et.al. | 2511.18739 | null |
| 2025-11-24 | Multimodal Real-Time Anomaly Detection and Industrial Applications | Aman Verma et.al. | 2511.18698 | null |
| 2025-11-23 | Functional Localization Enforced Deep Anomaly Detection Using Fundus Images | Jan Benedikt Ruhland et.al. | 2511.18627 | null |
| 2025-11-23 | Algorithmic detection of false data injection attacks in cyber-physical systems | Souvik Das et.al. | 2511.18588 | null |
| 2025-11-23 | Carbon-Aware Intrusion Detection: A Comparative Study of Supervised and Unsupervised DRL for Sustainable IoT Edge Gateways | Saeid Jamshidi et.al. | 2511.18240 | null |
| 2025-11-23 | Lightweight Autoencoder-Isolation Forest Anomaly Detection for Green IoT Edge Gateways | Saeid Jamshidi et.al. | 2511.18235 | null |
| 2025-11-23 | Think Fast: Real-Time IoT Intrusion Reasoning Using IDS and LLMs at the Edge Gateway | Saeid Jamshidi et.al. | 2511.18230 | null |
| 2025-11-22 | A Novel and Practical Universal Adversarial Perturbations against Deep Reinforcement Learning based Intrusion Detection Systems | H. Zhang et.al. | 2511.18223 | null |
| 2025-11-22 | MEDIC: a network for monitoring data quality in collider experiments | Juvenal Bassa et.al. | 2511.18172 | null |
| 2025-11-22 | PromptMoE: Generalizable Zero-Shot Anomaly Detection via Visually-Guided Prompt Mixtures | Yuheng Shao et.al. | 2511.18116 | null |
| 2025-11-22 | Federated Anomaly Detection and Mitigation for EV Charging Forecasting Under Cyberattacks | Oluleke Babayomi et.al. | 2511.17978 | null |
| 2025-11-21 | StealthCup: Realistic, Multi-Stage, Evasion-Focused CTF for Benchmarking IDS | Manuel Kern et.al. | 2511.17761 | null |
| 2025-11-21 | DelTriC: A Novel Clustering Method with Accurate Outlier | Tomas Javurek et.al. | 2511.17219 | null |
| 2025-11-21 | Modeling Anomaly Detection in Cloud Services: Analysis of the Properties that Impact Latency and Resource Consumption | Gabriel Job Antunes Grabher et.al. | 2511.17119 | null |
| 2025-11-21 | AutoGraphAD: A novel approach using Variational Graph Autoencoders for anomalous network flow detection | Georgios Anyfantis et.al. | 2511.17113 | null |
| 2025-11-21 | Sparse Reasoning is Enough: Biological-Inspired Framework for Video Anomaly Detection with Large Pre-trained Models | He Huang et.al. | 2511.17094 | null |
| 2025-11-21 | CroTad: A Contrastive Reinforcement Learning Framework for Online Trajectory Anomaly Detection | Rui Xue et.al. | 2511.16929 | null |
| 2025-11-20 | A streaming algorithm and hardware accelerator for top-K flow detection in network traffic | Carolina Gallardo-Pavesi et.al. | 2511.16797 | null |
| 2025-11-20 | PersonaDrift: A Benchmark for Temporal Anomaly Detection in Language-Based Dementia Monitoring | Joy Lai et.al. | 2511.16445 | null |
| 2025-11-04 | M2S2L: Mamba-based Multi-Scale Spatial-temporal Learning for Video Anomaly Detection | Yang Liu et.al. | 2511.05564 | null |
| 2025-11-06 | Knowledge-based anomaly detection for identifying network-induced shape artifacts | Rucha Deshpande et.al. | 2511.04729 | null |
| 2025-11-06 | ARETE: an R package for Automated REtrieval from TExt with large language models | Vasco V. Branco et.al. | 2511.04573 | null |
| 2025-11-06 | Large Language Models for Cyber Security | Raunak Somani et.al. | 2511.04508 | null |
| 2025-11-06 | Fraud-Proof Revenue Division on Subscription Platforms | Abheek Ghosh et.al. | 2511.04465 | null |
| 2025-11-06 | PUL-SLAM: Path-Uncertainty Co-Optimization with Lightweight Stagnation Detection for Efficient Robotic Exploration | Yizhen Yin et.al. | 2511.04180 | null |
| 2025-11-06 | Automated and Explainable Denial of Service Analysis for AI-Driven Intrusion Detection Systems | Paul Badu Yakubu et.al. | 2511.04114 | null |
| 2025-11-06 | DeNoise: Learning Robust Graph Representations for Unsupervised Graph-Level Anomaly Detection | Qingfeng Chen et.al. | 2511.04086 | null |
| 2025-11-06 | Detecting Silent Failures in Multi-Agentic AI Trajectories | Divya Pathak et.al. | 2511.04032 | null |
| 2025-11-06 | Multiscale Astrocyte Network Calcium Dynamics for Biologically Plausible Intelligence in Anomaly Detection | Berk Iskar et.al. | 2511.03993 | null |
| 2025-11-06 | Design and Detection of Covert Man-in-the-Middle Cyberattacks on Water Treatment Plants | Victor Mattos et.al. | 2511.03971 | null |
| 2025-11-05 | I Detect What I Don’t Know: Incremental Anomaly Learning with Stochastic Weight Averaging-Gaussian for Oracle-Free Medical Imaging | Nand Kumar Yadav et.al. | 2511.03912 | null |
| 2025-11-05 | Temporal Analysis Framework for Intrusion Detection Systems: A Novel Taxonomy for Time-Aware Cybersecurity | Tatiana S. Parlanti et.al. | 2511.03799 | null |
| 2025-11-05 | Magnetism and Peierls distortion in Dirac semimetal CaMnBi$_2$ | Aashish Sapkota et.al. | 2511.03721 | null |
| 2025-11-05 | SHIELD: Securing Healthcare IoT with Efficient Machine Learning Techniques for Anomaly Detection | Mahek Desai et.al. | 2511.03661 | null |
| 2025-11-05 | Model order reduction via Lie groups | Yannik P. Wotte et.al. | 2511.03520 | null |
| 2025-11-05 | Analytical Queries for Unstructured Data | Daniel Kang et.al. | 2511.03489 | null |
| 2025-11-05 | Graph Neural AI with Temporal Dynamics for Comprehensive Anomaly Detection in Microservices | Qingyuan Zhang et.al. | 2511.03285 | null |
| 2025-11-05 | IEC3D-AD: A 3D Dataset of Industrial Equipment Components for Unsupervised Point Cloud Anomaly Detection | Bingyang Guo et.al. | 2511.03267 | null |
| 2025-11-05 | RKUM: An R Package for Robust Kernel Unsupervised Methods | Md Ashad Alam et.al. | 2511.03216 | null |
| 2025-11-05 | Who Sees the Risk? Stakeholder Conflicts and Explanatory Policies in LLM-based Risk Assessment | Srishti Yadav et.al. | 2511.03152 | null |
| 2025-11-05 | Sparse, self-organizing ensembles of local kernels detect rare statistical anomalies | Gaia Grosso et.al. | 2511.03095 | null |
| 2025-11-04 | A Collaborative Reasoning Framework for Anomaly Diagnostics in Underwater Robotics | Markus Buchholz et.al. | 2511.03075 | null |
| 2025-11-04 | Adaptive and Robust Data Poisoning Detection and Sanitization in Wearable IoT Systems using Large Language Models | W. K. M Mithsara et.al. | 2511.02894 | null |
| 2025-11-04 | AI-Generated Image Detection: An Empirical Study and Future Research Directions | Nusrat Tasnim et.al. | 2511.02791 | null |
| 2025-11-04 | Unsupervised Learning for Industrial Defect Detection: A Case Study on Shearographic Data | Jessica Plassmann et.al. | 2511.02541 | null |
| 2025-11-04 | Anomaly Detection-Based UE-Centric Inter-Cell Interference Suppression | Kwonyeol Park et.al. | 2511.02320 | null |
| 2025-11-04 | Federated Quantum Kernel Learning for Anomaly Detection in Multivariate IoT Time-Series | Kuan-Cheng Chen et.al. | 2511.02301 | null |
| 2025-11-04 | DOD: Detection of outliers in high dimensional data with distance of distances | Seong-ho Lee et.al. | 2511.02199 | null |
| 2025-11-04 | DoFlow: Causal Generative Flows for Interventional and Counterfactual Time-Series Prediction | Dongze Wu et.al. | 2511.02137 | null |
| 2025-11-03 | A short blanket for cosmology: the CMB lensing anomaly behind the preference for a negative neutrino mass | Andrea Cozzumbo et.al. | 2511.01967 | null |
| 2025-10-31 | Rapid Inference of Logic Gate Neural Networks for Anomaly Detection in High Energy Physics | Lino Gerlach et.al. | 2511.01908 | null |
| 2025-11-03 | Machine and Deep Learning for Indoor UWB Jammer Localization | Hamed Fard et.al. | 2511.01819 | null |
| 2025-11-03 | An Open-Access Benchmark of Statistical and Machine-Learning Anomaly Detection Methods for Battery Applications | Mei-Chin Pang et.al. | 2511.01745 | null |
| 2025-11-03 | Detailed spectroscopic and photometric analysis of the remarkable planet-hosting wide binary system HD 202772A/B | Emiliano Jofré et.al. | 2511.01595 | null |
| 2025-11-03 | Example-Based Feature Painting on Textures | Andrei-Timotei Ardelean et.al. | 2511.01513 | null |
| 2025-11-03 | Characterizing Build Compromises Through Vulnerability Disclosure Analysis | Maimouna Tamah Diao et.al. | 2511.01395 | null |
| 2025-11-03 | An Interdisciplinary and Cross-Task Review on Missing Data Imputation | Jicong Fan et.al. | 2511.01196 | null |
| 2025-11-02 | SliceVision-F2I: A Synthetic Feature-to-Image Dataset for Visual Pattern Representation on Network Slices | Md. Abid Hasan Rafi et.al. | 2511.01087 | null |
| 2025-11-02 | Hydra: Dual Exponentiated Memory for Multivariate Time Series Analysis | Asal Meskin et.al. | 2511.00989 | null |
| 2025-11-02 | A Unified Reasoning Framework for Holistic Zero-Shot Video Anomaly Analysis | Dongheng Lin et.al. | 2511.00962 | null |
| 2025-11-02 | Towards Ultra-Low Latency: Binarized Neural Network Architectures for In-Vehicle Network Intrusion Detection | Huiyao Dong et.al. | 2511.00828 | null |
| 2025-11-01 | TRACES: Temporal Recall with Contextual Embeddings for Real-Time Video Anomaly Detection | Yousuf Ahmed Siddiqui et.al. | 2511.00580 | null |
| 2025-11-01 | Real-IAD Variety: Pushing Industrial Anomaly Detection Dataset to a Modern Era | Wenbing Zhu et.al. | 2511.00540 | null |
| 2025-11-01 | Text-guided Fine-Grained Video Anomaly Detection | Jihao Gu et.al. | 2511.00524 | null |
| 2025-11-01 | An Efficient Anomaly Detection Framework for Wireless Sensor Networks Using Markov Process | Rahul Mishra et.al. | 2511.00481 | null |
| 2025-11-01 | Deep Learning Approach to Anomaly Detection in Enterprise ETL Processes with Autoencoders | Xin Chen et.al. | 2511.00462 | null |
| 2025-11-01 | Mind the Gap: Missing Cyber Threat Coverage in NIDS Datasets for the Energy Sector | Adrita Rahman Tory et.al. | 2511.00360 | null |
| 2025-10-31 | Mist-Assisted Federated Learning for Intrusion Detection in Heterogeneous IoT Networks | Saadat Izadi et.al. | 2511.00271 | null |
| 2025-10-31 | Feature Importance Guided Random Forest Learning with Simulated Annealing Based Hyperparameter Tuning | Kowshik Balasubramanian et.al. | 2511.00133 | null |
| 2025-10-30 | A generative adversarial network optimization method for damage detection and digital twinning by deep AI fault learning: Z24 Bridge structural health monitoring benchmark validation | Marios Impraimakis et.al. | 2511.00099 | null |
| 2025-10-28 | Adoption of AI-Driven Fraud Detection System in the Nigerian Banking Sector: An Analysis of Cost, Compliance, and Competency | Stephen Alaba John et.al. | 2511.00061 | null |
| 2025-10-28 | DynBERG: Dynamic BERT-based Graph neural network for financial fraud detection | Omkar Kulkarni et.al. | 2511.00047 | null |
| 2025-10-31 | Rethinking Telemetry Design for Fine-Grained Anomaly Detection in 5G User Planes | Niloy Saha et.al. | 2510.27664 | null |
| 2025-10-31 | Information theory for hypergraph similarity | Helcio Felippe et.al. | 2510.27411 | null |
| 2025-10-31 | Binary Anomaly Detection in Streaming IoT Traffic under Concept Drift | Rodrigo Matos Carnier et.al. | 2510.27304 | null |
| 2025-10-31 | Functional Analysis of Loss-development Patterns in P&C Insurance | Arthur Charpentier et.al. | 2510.27204 | null |
| 2025-10-31 | SERVIMON: AI-Driven Predictive Maintenance and Real-Time Monitoring for Astronomical Observatories | Emilio Mastriani et.al. | 2510.27146 | null |
| 2025-10-31 | Conditional variational autoencoders for cosmological model discrimination and anomaly detection in cosmic microwave background power spectra | Tian-Yang Sun et.al. | 2510.27086 | null |
| 2025-10-30 | Robust fuzzy clustering for high-dimensional multivariate time series with outlier detection | Ziling Ma et.al. | 2510.26982 | null |
| 2025-10-30 | The Impact of Data Compression in Real-Time and Historical Data Acquisition Systems on the Accuracy of Analytical Solutions | Reham Faqehi et.al. | 2510.26868 | null |
| 2025-10-30 | Process Integrated Computer Vision for Real-Time Failure Prediction in Steel Rolling Mill | Vaibhav Kurrey et.al. | 2510.26684 | null |
| 2025-10-30 | MSAD: A Deep Dive into Model Selection for Time series Anomaly Detection | Emmanouil Sylligardos et.al. | 2510.26643 | null |
| 2025-10-30 | Enhancing ECG Classification Robustness with Lightweight Unsupervised Anomaly Detection Filters | Mustafa Fuad Rifet Ibrahim et.al. | 2510.26501 | null |
| 2025-10-30 | Quantum Gated Recurrent GAN with Gaussian Uncertainty for Network Anomaly Detection | Wajdi Hammami et.al. | 2510.26487 | null |
| 2025-10-30 | Towards Fine-Grained Vision-Language Alignment for Few-Shot Anomaly Detection | Yuanting Fan et.al. | 2510.26464 | null |
| 2025-10-30 | A Survey of Heterogeneous Graph Neural Networks for Cybersecurity Anomaly Detection | Laura Jiang et.al. | 2510.26307 | null |
| 2025-10-30 | Segmentation over Complexity: Evaluating Ensemble and Hybrid Approaches for Anomaly Detection in Industrial Time Series | Emilio Mastriani et.al. | 2510.26159 | null |
| 2025-10-29 | CAVE: Detecting and Explaining Commonsense Anomalies in Visual Environments | Rishika Bhagwatkar et.al. | 2510.26006 | null |
| 2025-10-29 | A Critical Roadmap to Driver Authentication via CAN Bus: Dataset Review, Introduction of the Kidmose CANid Dataset (KCID), and Proof of Concept | Brooke Elizabeth Kidmose et.al. | 2510.25856 | null |
| 2025-10-29 | Flex-GAD: Flexible Graph Anomaly Detection | Apu Chakraborty et.al. | 2510.25809 | null |
| 2025-10-29 | Attention Augmented GNN RNN-Attention Models for Advanced Cybersecurity Intrusion Detection | Jayant Biradar et.al. | 2510.25802 | null |
| 2025-10-29 | Bridging the Divide: End-to-End Sequence-Graph Learning | Yuen Chen et.al. | 2510.25126 | null |
| 2025-10-28 | Semi-supervised and unsupervised learning for health indicator extraction from guided waves in aerospace composite structures | James Josep Perry et.al. | 2510.24614 | null |
| 2025-10-28 | ARIMA_PLUS: Large-scale, Accurate, Automatic and Interpretable In-Database Time Series Forecasting and Anomaly Detection in Google BigQuery | Xi Cheng et.al. | 2510.24452 | null |
| 2025-10-28 | OmniLearned: A Foundation Model Framework for All Tasks Involving Jet Physics | Wahid Bhimji et.al. | 2510.24066 | null |
| 2025-10-28 | Localized Kernel Projection Outlyingness: A Two-Stage Approach for Multi-Modal Outlier Detection | Akira Tamamori et.al. | 2510.24043 | null |
| 2025-10-28 | LLMLogAnalyzer: A Clustering-Based Log Analysis Chatbot using Large Language Models | Peng Cai et.al. | 2510.24031 | null |
| 2025-10-27 | In Search of the Unknown Unknowns: A Multi-Metric Distance Ensemble for Out of Distribution Anomaly Detection in Astronomical Surveys | Siddharth Chaini et.al. | 2510.23702 | null |
| 2025-10-27 | GRAD: Real-Time Gated Recurrent Anomaly Detection in Autonomous Vehicle Sensors Using Reinforced EMA and Multi-Stage Sliding Window Techniques | Mohammad Hossein Jafari Naeimi et.al. | 2510.23327 | null |
| 2025-10-27 | Network Intrusion Detection: Evolution from Conventional Approaches to LLM Collaboration and Emerging Risks | Yaokai Feng et.al. | 2510.23313 | null |
| 2025-10-27 | Approaching Domain Generalization with Embeddings for Robust Discrimination and Recognition of RF Communication Signals | Lukas Henneke et.al. | 2510.23186 | null |
| 2025-10-27 | A method for outlier detection based on cluster analysis and visual expert criteria | Juan A. Lara et.al. | 2510.23136 | null |
| 2025-10-27 | Reliable Robotic Task Execution in the Face of Anomalies | Bharath Santhanam et.al. | 2510.23121 | null |
| 2025-10-27 | Sentinel: Dynamic Knowledge Distillation for Personalized Federated Intrusion Detection in Heterogeneous IoT Networks | Gurpreet Singh et.al. | 2510.23019 | null |
| 2025-10-27 | CodeAD: Synthesize Code of Rules for Log-based Anomaly Detection with LLMs | Junjie Huang et.al. | 2510.22986 | null |
| 2025-10-27 | Diffuse to Detect: A Generalizable Framework for Anomaly Detection with Diffusion Models Applications to UAVs and Beyond | Mingze Gong et.al. | 2510.22928 | null |
| 2025-10-26 | A Theory of the Mechanics of Information: Generalization Through Measurement of Uncertainty (Learning is Measuring) | Christopher J. Hazard et.al. | 2510.22809 | null |
| 2025-10-26 | VADTree: Explainable Training-Free Video Anomaly Detection via Hierarchical Granularity-Aware Tree | Wenlong Li et.al. | 2510.22693 | null |
| 2025-10-26 | CLEANet: Robust and Efficient Anomaly Detection in Contaminated Multivariate Time Series | Songhan Zhang et.al. | 2510.22619 | null |
| 2025-10-26 | Doubly Smoothed Density Estimation with Application on Miners’ Unsafe Act Detection | Qianhan Zeng et.al. | 2510.22482 | null |
| 2025-10-25 | Optimal Spatial Anomaly Detection | Baiyu Wang et.al. | 2510.22330 | null |
| 2025-10-25 | Adapting Noise-Driven PUF and AI for Secure WBG ICS: A Proof-of-Concept Study | Devon A. Kelly et.al. | 2510.22283 | null |
| 2025-10-24 | Human-Centric Anomaly Detection in Surveillance Videos Using YOLO-World and Spatio-Temporal Deep Learning | Mohammad Ali Etemadi Naeen et.al. | 2510.22056 | null |
| 2025-10-24 | Input Adaptive Bayesian Model Averaging | Yuli Slavutsky et.al. | 2510.22054 | null |
| 2025-10-24 | AutoSciDACT: Automated Scientific Discovery through Contrastive Embedding and Hypothesis Testing | Samuel Bright-Thonney et.al. | 2510.21935 | null |
| 2025-10-22 | Training data membership inference via Gaussian process meta-modeling: a post-hoc analysis approach | Yongchao Huang et.al. | 2510.21846 | null |
| 2025-10-22 | Quantum Autoencoders for Anomaly Detection in Cybersecurity | Rohan Senthil et.al. | 2510.21837 | null |
| 2025-10-24 | Automated Quality Control for Language Documentation: Detecting Phonotactic Inconsistencies in a Kokborok Wordlist | Kellen Parker van Dam et.al. | 2510.21584 | null |
| 2025-10-24 | FrameShield: Adversarially Robust Video Anomaly Detection | Mojtaba Nafez et.al. | 2510.21532 | null |
| 2025-10-24 | Actionable Cybersecurity Notifications for Smart Homes: A User Study on the Role of Length and Complexity | Victor Jüttner et.al. | 2510.21508 | null |
| 2025-10-24 | MoniTor: Exploiting Large Language Models with Instruction for Online Video Anomaly Detection | Shengtian Yang et.al. | 2510.21449 | null |
| 2025-10-24 | REMONI: An Autonomous System Integrating Wearables and Multimodal Large Language Models for Enhanced Remote Health Monitoring | Thanh Cong Ho et.al. | 2510.21445 | null |
| 2025-10-24 | An Evidence-Based Post-Hoc Adjustment Framework for Anomaly Detection Under Data Contamination | Sukanya Patra et.al. | 2510.21296 | null |
| 2025-10-24 | TokenCLIP: Token-wise Prompt Learning for Zero-shot Anomaly Detection | Qihang Zhou et.al. | 2510.21171 | null |
| 2025-10-23 | Security Logs to ATT&CK Insights: Leveraging LLMs for High-Level Threat Understanding and Cognitive Trait Inference | Soham Hans et.al. | 2510.20930 | null |
| 2025-10-23 | Unsupervised Anomaly Prediction with N-BEATS and Graph Neural Network in Multi-variate Semiconductor Process Time Series | Daniel Sorensen et.al. | 2510.20718 | null |
| 2025-10-23 | Capability of using the normalizing flows for extraction rare gamma events in the TAIGA experiment | A. P. Kryukov et.al. | 2510.20334 | null |
| 2025-10-23 | GMFVAD: Using Grained Multi-modal Feature to Improve Video Anomaly Detection | Guangyu Dai et.al. | 2510.20268 | null |
| 2025-10-23 | Unifying Boxplots: A Multiple Testing Perspective | Bowen Gang et.al. | 2510.20259 | null |
| 2025-10-23 | Physics-Guided Fusion for Robust 3D Tracking of Fast Moving Small Objects | Prithvi Raj Singh et.al. | 2510.20126 | null |
| 2025-10-23 | Human-Centered LLM-Agent System for Detecting Anomalous Digital Asset Transactions | Gyuyeon Na et.al. | 2510.20102 | null |
| 2025-10-22 | The Temporal Graph of Bitcoin Transactions | Vahid Jalili et.al. | 2510.20028 | null |
| 2025-10-22 | Machine Learning-Based Localization Accuracy of RFID Sensor Networks via RSSI Decision Trees and CAD Modeling for Defense Applications | Curtis Lee Shull et.al. | 2510.20019 | null |
| 2025-10-21 | Cyberattack Detection in Critical Infrastructure and Supply Chains | Smita Khapre et.al. | 2510.19859 | null |
| 2025-10-22 | Exploring the Effect of DNN Depth on Adversarial Attacks in Network Intrusion Detection Systems | Mohamed ElShehaby et.al. | 2510.19761 | null |
| 2025-10-22 | Network-Centric Anomaly Filtering and Spoofer localization for 5G-NR Localization in LAWNs | Zexin Fang et.al. | 2510.19521 | null |
| 2025-10-22 | AegisMCP: Online Graph Intrusion Detection for Tool-Augmented LLMs on Edge Devices | Zhonghao Zhan et.al. | 2510.19462 | null |
| 2025-10-22 | Neuromorphic computing for anomaly detection in a laser powder bed fusion process | Shreyan Banerjee et.al. | 2510.19309 | null |
| 2025-10-22 | Reliability and Resilience of AI-Driven Critical Network Infrastructure under Cyber-Physical Threats | Konstantinos A. Lizos et.al. | 2510.19295 | null |
| 2025-10-22 | Brain-Inspired Perspective on Configurations: Unsupervised Similarity and Early Cognition | Juntang Wang et.al. | 2510.19229 | null |
| 2025-10-21 | Securing IoT Communications via Anomaly Traffic Detection: Synergy of Genetic Algorithm and Ensemble Method | Behnam Seyedi et.al. | 2510.19121 | null |
| 2025-10-21 | Fusion of Machine Learning and Blockchain-based Privacy-Preserving Approach for Health Care Data in the Internet of Things | Behnam Rezaei Bezanjani et.al. | 2510.19026 | null |
| 2025-10-21 | An Encode-then-Decompose Approach to Unsupervised Time Series Anomaly Detection on Contaminated Training Data–Extended Version | Buang Zhang et.al. | 2510.18998 | null |
| 2025-10-21 | Dimensionality Reduction for Remote Sensing Data Analysis: A Systematic Review of Methods and Applications | Nathan Mankovich et.al. | 2510.18935 | null |
| 2025-10-21 | Rebellious Student: A Complementary Learning Framework for Background Feature Enhancement in Hyperspectral Anomaly Detection | Wenping Jin et.al. | 2510.18781 | null |
| 2025-10-21 | Privacy-Preserving Healthcare Data in IoT: A Synergistic Approach with Deep Learning and Blockchain | Behnam Rezaei Bezanjani et.al. | 2510.18568 | null |
| 2025-10-21 | Microsecond Federated SVD on Grassmann Manifold for Real-time IoT Intrusion Detection | Tung-Anh Nguyen et.al. | 2510.18501 | null |
| 2025-10-14 | BeSTAD: Behavior-Aware Spatio-Temporal Anomaly Detection for Human Mobility Data | Junyi Xie et.al. | 2510.12076 | null |
| 2025-10-06 | Interpreting anomaly detection of SDSS spectra | Edgar Ortiz Manrique et.al. | 2510.05235 | null |
| 2025-10-02 | Unlocking Vision-Language Models for Video Anomaly Detection via Fine-Grained Prompting | Shu Zou et.al. | 2510.02155 | null |
| 2025-09-28 | FraudTransformer: Time-Aware GPT for Transaction Fraud Detection | Gholamali Aminian et.al. | 2509.23712 | null |
| 2025-09-16 | LIGHT-HIDS: A Lightweight and Effective Machine Learning-Based Framework for Robust Host Intrusion Detection | Onat Gungor et.al. | 2509.13464 | null |
| 2025-09-01 | An Efficient Intrusion Detection System for Safeguarding Radiation Detection Systems | Nathanael Coolidge et.al. | 2509.01599 | null |
| 2025-08-26 | CITADEL: Continual Anomaly Detection for Enhanced Learning in IoT Intrusion Detection | Elvin Li et.al. | 2508.19450 | null |
| 2025-07-23 | Unsupervised anomaly detection using Bayesian flow networks: application to brain FDG PET in the context of Alzheimer’s disease | Hugues Roy et.al. | 2507.17486 | null |
| 2025-07-23 | HiProbe-VAD: Video Anomaly Detection via Hidden States Probing in Tuning-Free Multimodal LLMs | Zhaolin Cai et.al. | 2507.17394 | null |
| 2025-07-23 | Tabular Diffusion based Actionable Counterfactual Explanations for Network Intrusion Detection | Vinura Galwaduge et.al. | 2507.17161 | null |
| 2025-07-23 | Auto-scaling Approaches for Cloud-native Applications: A Survey and Taxonomy | Minxian Xu et.al. | 2507.17128 | null |
| 2025-07-22 | Evaluating Ensemble and Deep Learning Models for Static Malware Detection with Dimensionality Reduction Using the EMBER Dataset | Md Min-Ha-Zul Abedin et.al. | 2507.16952 | null |
| 2025-07-22 | Toward Long-Tailed Online Anomaly Detection through Class-Agnostic Concepts | Chiao-An Yang et.al. | 2507.16946 | null |
| 2025-07-22 | Improving Predictions on Highly Unbalanced Data Using Open Source Synthetic Data Upsampling | Ivona Krchova et.al. | 2507.16419 | null |
| 2025-07-22 | eX-NIDS: A Framework for Explainable Network Intrusion Detection Leveraging Large Language Models | Paul R. B. Houssel et.al. | 2507.16241 | null |
| 2025-07-22 | DP2Guard: A Lightweight and Byzantine-Robust Privacy-Preserving Federated Learning Scheme for Industrial IoT | Baofu Han et.al. | 2507.16134 | null |
| 2025-07-21 | Stop-band Energy Constraint for Orthogonal Tunable Wavelet Units in Convolutional Neural Networks for Computer Vision problems | An D. Le et.al. | 2507.16114 | null |
| 2025-07-21 | RightTyper: Effective and Efficient Type Annotation for Python | Juan Altmayer Pizzorno et.al. | 2507.16051 | null |
| 2025-07-21 | Foundation Models and Transformers for Anomaly Detection: A Survey | Mouïn Ben Ammar et.al. | 2507.15905 | null |
| 2025-07-21 | Explainable Anomaly Detection for Electric Vehicles Charging Stations | Matteo Cederle et.al. | 2507.15718 | null |
| 2025-07-21 | Towards Explainable Anomaly Detection in Shared Mobility Systems | Elnur Isgandarov et.al. | 2507.15643 | null |
| 2025-07-21 | Multi-Stage Prompt Inference Attacks on Enterprise LLM Systems | Andrii Balashov et.al. | 2507.15613 | null |
| 2025-07-21 | We Need to Rethink Benchmarking in Anomaly Detection | Philipp Röchner et.al. | 2507.15584 | null |
| 2025-07-21 | An aerial color image anomaly dataset for search missions in complex forested terrain | Rakesh John Amala Arokia Nathan et.al. | 2507.15492 | null |
| 2025-07-21 | ExDD: Explicit Dual Distribution Learning for Surface Defect Detection via Diffusion Synthesis | Muhammad Aqeel et.al. | 2507.15335 | null |
| 2025-07-20 | Time-RA: Towards Time Series Reasoning for Anomaly with LLM Feedback | Yiyuan Yang et.al. | 2507.15066 | null |
| 2025-07-20 | Deep Generative Models in Condition and Structural Health Monitoring: Opportunities, Limitations and Future Outlook | Xin Yang et.al. | 2507.15026 | null |
| 2025-07-20 | A Comparative Analysis of Statistical and Machine Learning Models for Outlier Detection in Bitcoin Limit Order Books | Ivan Letteri et.al. | 2507.14960 | null |
| 2025-07-20 | Mayura: Exploiting Similarities in Motifs for Temporal Co-Mining | Sanjay Sri Vallabh Singapuram et.al. | 2507.14813 | null |
| 2025-07-18 | Unmasking Performance Gaps: A Comparative Study of Human Anonymization and Its Effects on Video Anomaly Detection | Sara Abdulaziz et.al. | 2507.14083 | null |
| 2025-07-18 | Robust Anomaly Detection with Graph Neural Networks using Controllability | Yifan Wei et.al. | 2507.13954 | null |
| 2025-07-18 | Conformal Data Contamination Tests for Trading or Sharing of Data | Martin V. Vejling et.al. | 2507.13835 | null |
| 2025-07-18 | Efficient and Scalable Self-Healing Databases Using Meta-Learning and Dependency-Driven Recovery | Joydeep Chandra et.al. | 2507.13757 | null |
| 2025-07-18 | Kolmogorov-Arnold Networks-based GRU and LSTM for Loan Default Early Prediction | Yue Yang et.al. | 2507.13685 | null |
| 2025-07-17 | Theory-informed neural networks for particle physics | Barry M. Dillon et.al. | 2507.13447 | null |
| 2025-07-17 | A Crowdsensing Intrusion Detection Dataset For Decentralized Federated Learning Models | Chao Feng et.al. | 2507.13313 | null |
| 2025-07-17 | 3DKeyAD: High-Resolution 3D Point Cloud Anomaly Detection via Keypoint-Guided Point Clustering | Zi Wang et.al. | 2507.13110 | null |
| 2025-07-17 | Fault detection and diagnosis for the engine electrical system of a space launcher based on a temporal convolutional autoencoder and calibrated classifiers | Luis Basora et.al. | 2507.13022 | null |
| 2025-07-17 | A Semi-Supervised Learning Method for the Identification of Bad Exposures in Large Imaging Surveys | Yufeng Luo et.al. | 2507.12784 | null |
| 2025-07-16 | Text-ADBench: Text Anomaly Detection Benchmark based on LLMs Embedding | Feng Xiao et.al. | 2507.12295 | link |
| 2025-07-16 | A Privacy-Preserving Framework for Advertising Personalization Incorporating Federated Learning and Differential Privacy | Xiang Li et.al. | 2507.12098 | null |
| 2025-07-16 | MoViAD: Modular Visual Anomaly Detection | Manuel Barusco et.al. | 2507.12049 | null |
| 2025-07-16 | Can LLMs Find Fraudsters? Multi-level LLM Enhanced Graph Fraud Detection | Tairan Huang et.al. | 2507.11997 | null |
| 2025-07-16 | d-DQIVAR: Data-centric Visual Analytics and Reasoning for Data Quality Improvement | Hyein Hong et.al. | 2507.11960 | null |
| 2025-07-15 | How To Mitigate And Defend Against DDoS Attacks In IoT Devices | Ifiyemi Leigha et.al. | 2507.11772 | null |
| 2025-07-15 | Learning Representations of Event Time Series with Sparse Autoencoders for Anomaly Detection, Similarity Search, and Unsupervised Classification | Steven Dillmann et.al. | 2507.11620 | null |
| 2025-07-15 | FlexCAST: Enabling Flexible Scientific Data Analyses | Benjamin Nachman et.al. | 2507.11528 | null |
| 2025-07-15 | A Mathematical Optimization Approach to Multisphere Support Vector Data Description | Víctor Blanco et.al. | 2507.11106 | null |
| 2025-07-15 | LogTinyLLM: Tiny Large Language Models Based Contextual Log Anomaly Detection | Isaiah Thompson Ocansey et.al. | 2507.11071 | null |
| 2025-07-15 | Bridge Feature Matching and Cross-Modal Alignment with Mutual-filtering for Zero-shot Anomaly Detection | Yuhu Bai et.al. | 2507.11003 | null |
| 2025-07-15 | Class-Proportional Coreset Selection for Difficulty-Separable Data | Elisa Tsai et.al. | 2507.10904 | null |
| 2025-07-15 | From Alerts to Intelligence: A Novel LLM-Aided Framework for Host-based Intrusion Detection | Danyu Sun et.al. | 2507.10873 | null |
| 2025-07-14 | REAL-IoT: Characterizing GNN Intrusion Detection Robustness under Practical Adversarial Attack | Zhonghao Zhan et.al. | 2507.10836 | null |
| 2025-07-14 | Contrastive-KAN: A Semi-Supervised Intrusion Detection Framework for Cybersecurity with scarce Labeled Data | Mohammad Alikhani et.al. | 2507.10808 | null |
| 2025-07-14 | Real-time, Adaptive Radiological Anomaly Detection and Isotope Identification Using Non-negative Matrix Factorization | Chandler Jones et.al. | 2507.10715 | null |
| 2025-07-14 | BenchReAD: A systematic benchmark for retinal anomaly detection | Chenyu Lian et.al. | 2507.10492 | null |
| 2025-07-13 | Causality-informed Anomaly Detection in Partially Observable Sensor Networks: Moving beyond Correlations | Xiaofeng Xiao et.al. | 2507.09742 | null |
| 2025-07-12 | Adversarial Activation Patching: A Framework for Detecting and Mitigating Emergent Deception in Safety-Aligned Transformers | Santhosh Kumar Ravindran et.al. | 2507.09406 | null |
| 2025-07-12 | Credit Card Fraud Detection Using RoFormer Model With Relative Distance Rotating Encoding | Kevin Reyes et.al. | 2507.09385 | null |
| 2025-07-12 | Robust Spatiotemporal Epidemic Modeling with Integrated Adaptive Outlier Detection | Haoming Shi et.al. | 2507.09380 | null |
| 2025-07-12 | Simplifying Traffic Anomaly Detection with Video Foundation Models | Svetlana Orlova et.al. | 2507.09338 | link |
| 2025-07-12 | Stereo-based 3D Anomaly Object Detection for Autonomous Driving: A New Dataset and Baseline | Shiyi Mu et.al. | 2507.09214 | link |
| 2025-07-12 | Proactive AI-and-RAN Workload Orchestration in O-RAN Architectures for 6G Networks | Syed Danial Ali Shah et.al. | 2507.09124 | null |
| 2025-07-11 | RoundaboutHD: High-Resolution Real-World Urban Environment Benchmark for Multi-Camera Vehicle Tracking | Yuqiang Lin et.al. | 2507.08729 | link |
| 2025-07-11 | InferLog: Accelerating LLM Inference for Online Log Parsing via ICL-oriented Prefix Caching | Yilun Wang et.al. | 2507.08523 | null |
| 2025-07-11 | Data Depth as a Risk | Arturo Castellanos et.al. | 2507.08518 | null |
| 2025-07-10 | Rethinking Spatio-Temporal Anomaly Detection: A Vision for Causality-Driven Cybersecurity | Arun Vignesh Malarkkan et.al. | 2507.08177 | null |
| 2025-07-10 | HybridQC: Machine Learning-Augmented Quality Control for Single-Cell RNA-seq Data | Kaitao Lai et.al. | 2507.08058 | null |
| 2025-07-10 | SAGE: A Visual Language Model for Anomaly Detection via Fact Enhancement and Entropy-aware Alignment | Guoxin Zang et.al. | 2507.07939 | null |
| 2025-07-11 | LISTEN: Lightweight Industrial Sound-representable Transformer for Edge Notification | Changheon Han et.al. | 2507.07879 | null |
| 2025-07-10 | 3D-ADAM: A Dataset for 3D Anomaly Detection in Advanced Manufacturing | Paul McHard et.al. | 2507.07838 | null |
| 2025-07-10 | Adaptive Gaussian Mixture Models-based Anomaly Detection for under-constrained Cable-Driven Parallel Robots | Julio Garrido et.al. | 2507.07714 | null |
| 2025-07-10 | NexViTAD: Few-shot Unsupervised Cross-Domain Defect Detection via Vision Foundation Models and Multi-Task Learning | Tianwei Mu et.al. | 2507.07579 | null |
| 2025-07-10 | Real-Time Decorrelation-Based Anomaly Detection for Multivariate Time Series | Amirhossein Sadough et.al. | 2507.07559 | null |
| 2025-07-10 | Towards High-Resolution 3D Anomaly Detection: A Scalable Dataset and Real-Time Framework for Subtle Industrial Defects | Yuqi Cheng et.al. | 2507.07435 | null |
| 2025-07-10 | Hybrid LLM-Enhanced Intrusion Detection for Zero-Day Threats in IoT Networks | Mohammad F. Al-Hammouri et.al. | 2507.07413 | null |
| 2025-07-09 | MADPOT: Medical Anomaly Detection with CLIP Adaptation and Partial Optimal Transport | Mahshid Shiri et.al. | 2507.06733 | null |
| 2025-07-09 | UniOD: A Universal Model for Outlier Detection across Diverse Domains | Dazhi Fu et.al. | 2507.06624 | null |
| 2025-07-09 | IMPACT: Industrial Machine Perception via Acoustic Cognitive Transformer | Changheon Han et.al. | 2507.06481 | null |
| 2025-07-09 | Mitigating Message Imbalance in Fraud Detection with Dual-View Graph Representation Learning | Yudan Song et.al. | 2507.06469 | null |
| 2025-07-08 | seMCD: Sequentially implemented Monte Carlo depth computation with statistical guarantees | Felix Gnettner et.al. | 2507.06227 | null |
| 2025-07-08 | What ZTF Saw Where Rubin Looked: Anomaly Hunting in DR23 | Maria V. Pruzhinskaya et.al. | 2507.06217 | null |
| 2025-07-08 | Universal Embeddings of Tabular Data | Astrid Franz et.al. | 2507.05904 | null |
| 2025-07-08 | Hyperspectral Anomaly Detection Methods: A Survey and Comparative Study | Aayushma Pant et.al. | 2507.05730 | null |
| 2025-07-08 | Area-based epigraph and hypograph indices for functional outlier detection | Belen Pulido et.al. | 2507.05701 | null |
| 2025-07-08 | Graph Learning | Feng Xia et.al. | 2507.05636 | null |
| 2025-07-08 | Quantum Machine Learning for Identifying Transient Events in X-ray Light Curves | Taiki Kawamuro et.al. | 2507.05589 | null |
| 2025-07-08 | iThermTroj: Exploiting Intermittent Thermal Trojans in Multi-Processor System-on-Chips | Mehdi Elahi et.al. | 2507.05576 | null |
| 2025-07-07 | PROTEAN: Federated Intrusion Detection in Non-IID Environments through Prototype-Based Knowledge Sharing | Sara Chennoufi et.al. | 2507.05524 | null |
| 2025-07-09 | Silent Failures in Stateless Systems: Rethinking Anomaly Detection for Serverless Computing | Chanh Nguyen et.al. | 2507.04969 | null |
| 2025-07-07 | Large Language Models for Network Intrusion Detection Systems: Foundations, Implementations, and Future Directions | Shuo Yang et.al. | 2507.04752 | null |
| 2025-07-06 | A Data-Driven Novelty Score for Diverse In-Vehicle Data Recording | Philipp Reis et.al. | 2507.04529 | null |
| 2025-07-06 | Dealing with Uncertainty in Contextual Anomaly Detection | Luca Bindini et.al. | 2507.04490 | null |
| 2025-07-06 | Anomalous Decision Discovery using Inverse Reinforcement Learning | Ashish Bastola et.al. | 2507.04464 | null |
| 2025-07-06 | Normalizing Flow to Augmented Posterior: Conditional Density Estimation with Interpretable Dimension Reduction for High Dimensional Data | Cheng Zeng et.al. | 2507.04216 | null |
| 2025-07-06 | ML-Enhanced AES Anomaly Detection for Real-Time Embedded Security | Nishant Chinnasami et.al. | 2507.04197 | null |
| 2025-07-05 | Specific heat and density anomalies in the Hubbard model | M. A. Habitzreuter et.al. | 2507.04041 | null |
| 2025-07-05 | Fast Re-Trainable Attention Autoencoder for Liquid Sensor Anomaly Detection at the Edge | Seongyun Choi et.al. | 2507.03995 | null |
| 2025-07-05 | An Efficient Detector for Faulty GNSS Measurements Detection With Non-Gaussian Noises | Penggao Yan et.al. | 2507.03987 | null |
| 2025-07-03 | Alleviating Attack Data Scarcity: SCANIA’s Experience Towards Enhancing In-Vehicle Cyber Security Measures | Frida Sundfeldt et.al. | 2507.02607 | null |
| 2025-07-03 | CyberRAG: An agentic RAG cyber attack classification and reporting tool | Francesco Blefari et.al. | 2507.02424 | null |
| 2025-07-03 | Evaluating Language Models For Threat Detection in IoT Security Logs | Jorge J. Tejero-Fernández et.al. | 2507.02390 | null |
| 2025-07-02 | Can Artificial Intelligence solve the blockchain oracle problem? Unpacking the Challenges and Possibilities | Giulio Caldarelli et.al. | 2507.02125 | null |
| 2025-07-02 | SpecCLIP: Aligning and Translating Spectroscopic Measurements for Stars | Xiaosheng Zhao et.al. | 2507.01939 | null |
| 2025-07-02 | Exploring a Hybrid Deep Learning Approach for Anomaly Detection in Mental Healthcare Provider Billing: Addressing Label Scarcity through Semi-Supervised Anomaly Detection | Samirah Bakker et.al. | 2507.01924 | null |
| 2025-07-02 | Towards Foundation Auto-Encoders for Time-Series Anomaly Detection | Gastón García González et.al. | 2507.01875 | null |
| 2025-07-02 | Graph Representation-based Model Poisoning on Federated LLMs in CyberEdge Networks | Hanlin Cai et.al. | 2507.01694 | null |
| 2025-07-02 | BioMARS: A Multi-Agent Robotic System for Autonomous Biological Experiments | Yibo Qiu et.al. | 2507.01485 | null |
| 2025-07-02 | OoDDINO: A Multi-level Framework for Anomaly Segmentation on Complex Road Scenes | Yuxing Liu et.al. | 2507.01455 | null |
| 2025-07-02 | Reasoner for Real-World Event Detection: Scaling Reinforcement Learning via Adaptive Perplexity-Aware Sampling Strategy | Xiaoyun Zhang et.al. | 2507.01327 | null |
| 2025-07-01 | Deep Learning-Based Intrusion Detection for Automotive Ethernet: Evaluating & Optimizing Fast Inference Techniques for Deployment on Low-Cost Platform | Pedro R. X. Carmo et.al. | 2507.01208 | null |
| 2025-07-01 | Good Enough to Learn: LLM-based Anomaly Detection in ECU Logs without Reliable Labels | Bogdan Bogdan et.al. | 2507.01077 | null |
| 2025-07-01 | Biorthogonal Tunable Wavelet Unit with Lifting Scheme in Convolutional Neural Network | An Le et.al. | 2507.00739 | null |
| 2025-06-30 | Learning Constraints Directly from Network Data | Hongyu Hè et.al. | 2506.23964 | null |
| 2025-06-30 | MadCLIP: Few-shot Medical Anomaly Detection with CLIP | Mahshid Shiri et.al. | 2506.23810 | null |
| 2025-06-30 | Adaptive Out-of-Control Point Pattern Detection in Sequential Random Finite Set Observations | Konstantinos Bourazas et.al. | 2506.23802 | null |
| 2025-06-30 | MReg: A Novel Regression Model with MoE-based Video Feature Mining for Mitral Regurgitation Diagnosis | Zhe Liu et.al. | 2506.23648 | null |
| 2025-06-30 | StackCLIP: Clustering-Driven Stacked Prompt in Zero-Shot Industrial Anomaly Detection | Yanning Hou et.al. | 2506.23577 | null |
| 2025-06-30 | Reconciling Attribute and Structural Anomalies for Improved Graph Anomaly Detection | Chunjing Xiao et.al. | 2506.23469 | null |
| 2025-06-30 | Enhancing Insider Threat Detection Using User-Based Sequencing and Transformer Encoders | Mohamed Elbasheer et.al. | 2506.23446 | null |
| 2025-06-29 | GaussMaster: An LLM-based Database Copilot System | Wei Zhou et.al. | 2506.23322 | null |
| 2025-06-29 | Autoregressive Denoising Score Matching is a Good Video Anomaly Detector | Hanwen Zhang et.al. | 2506.23282 | null |
| 2025-06-28 | Kernel Outlier Detection | Can Hakan Dağıdır et.al. | 2506.22994 | null |
| 2025-06-27 | Data-Driven Intrusion Detection in Vehicles: Integrating Unscented Kalman Filter (UKF) with Machine Learning | Shuhao Bian et.al. | 2506.22404 | null |
| 2025-06-27 | A Self-scaled Approximate $\ell_0$ Regularization Robust Model for Outlier Detection | Pengyang Song et.al. | 2506.22277 | null |
| 2025-06-27 | Autonomic Microservice Management via Agentic AI and MAPE-K Integration | Matteo Esposito et.al. | 2506.22185 | null |
| 2025-06-27 | Proof-of-Behavior: Behavior-Driven Consensus for Trustworthy Decentralized Finance | Ailiya Borjigin et.al. | 2506.22171 | null |
| 2025-06-27 | Explainable anomaly detection for sound spectrograms using pooling statistics with quantile differences | Nicolas Thewes et.al. | 2506.21921 | null |
| 2025-06-26 | mTSBench: Benchmarking Multivariate Time Series Anomaly Detection and Model Selection at Scale | Xiaona Zhou et.al. | 2506.21550 | null |
| 2025-06-26 | SiM3D: Single-instance Multiview Multimodal and Multisetup 3D Anomaly Detection Benchmark | Alex Costanzino et.al. | 2506.21549 | null |
| 2025-06-26 | Process mining-driven modeling and simulation to enhance fault diagnosis in cyber-physical systems | Francesco Vitale et.al. | 2506.21502 | null |
| 2025-06-26 | IDGraphs: Intrusion Detection and Analysis Using Stream Compositing | Pin Ren et.al. | 2506.21425 | null |
| 2025-06-26 | FastRef: Fast Prototype Refinement for Few-Shot Industrial Anomaly Detection | Long Tian et.al. | 2506.21398 | null |
| 2025-06-26 | Temporal-Aware Graph Attention Network for Cryptocurrency Transaction Fraud Detection | Zhi Zheng et.al. | 2506.21382 | null |
| 2025-06-26 | GANet-Seg: Adversarial Learning for Brain Tumor Segmentation with Hybrid Generative Models | Qifei Cui et.al. | 2506.21245 | null |
| 2025-06-26 | Generative Adversarial Evasion and Out-of-Distribution Detection for UAV Cyber-Attacks | Deepak Kumar Panda et.al. | 2506.21142 | null |
| 2025-06-25 | Poster: Enhancing GNN Robustness for Network Intrusion Detection via Agent-based Analysis | Zhonghao Zhan et.al. | 2506.20806 | null |
| 2025-06-25 | Joint attitude estimation and 3D neural reconstruction of non-cooperative space objects | Clément Forray et.al. | 2506.20638 | null |
| 2025-06-25 | Benchmarking Unsupervised Strategies for Anomaly Detection in Multivariate Time Series | Laura Boggia et.al. | 2506.20574 | null |
| 2025-06-24 | Predicting wide binaries and deviations from standard gravity using machine learning algorithms | Amoy Ashesh et.al. | 2506.19942 | null |
| 2025-06-24 | A Hybrid Intrusion Detection System with a New Approach to Protect the Cybersecurity of Cloud Computing | Maryam Mahdi Al-Husseini et.al. | 2506.19934 | null |
| 2025-06-24 | Graph theory inspired anomaly detection at the LHC | Jack Y. Araz et.al. | 2506.19920 | null |
| 2025-06-24 | Exact Matrix Seriation through Mathematical Optimization: Stress and Effectiveness-Based Models | Víctor Blanco et.al. | 2506.19821 | null |
| 2025-06-24 | KnowML: Improving Generalization of ML-NIDS with Attack Knowledge Graphs | Xin Fan Guo et.al. | 2506.19802 | null |
| 2025-06-24 | UltraAD: Fine-Grained Ultrasound Anomaly Classification via Few-Shot CLIP Adaptation | Yue Zhou et.al. | 2506.19694 | null |
| 2025-06-24 | Experimental Assessment of Neural 3D Reconstruction for Small UAV-based Applications | Genís Castillo Gómez-Raya et.al. | 2506.19491 | null |
| 2025-06-24 | Behavioral Anomaly Detection in Distributed Systems via Federated Contrastive Learning | Renzi Meng et.al. | 2506.19246 | null |
| 2025-06-24 | Quantitative Benchmarking of Anomaly Detection Methods in Digital Pathology | Can Cui et.al. | 2506.19234 | null |
| 2025-06-23 | Multimodal Anomaly Detection with a Mixture-of-Experts | Christoph Willibald et.al. | 2506.19077 | null |
| 2025-06-23 | 3D Arena: An Open Platform for Generative 3D Evaluation | Dylan Ebert et.al. | 2506.18787 | null |
| 2025-06-23 | Trustworthy Prediction with Gaussian Process Knowledge Scores | Kurt Butler et.al. | 2506.18630 | null |
| 2025-06-23 | Normality Prior Guided Multi-Semantic Fusion Network for Unsupervised Image Anomaly Detection | Muhao Xu et.al. | 2506.18544 | null |
| 2025-06-23 | Sequential keypoint density estimator: an overlooked baseline of skeleton-based video anomaly detection | Anja Delić et.al. | 2506.18368 | null |
| 2025-06-23 | Learning High-Quality Latent Representations for Anomaly Detection and Signal Integrity Enhancement in High-Speed Signals | Muhammad Usama et.al. | 2506.18288 | null |
| 2025-06-23 | Cross-Architecture Knowledge Distillation (KD) for Retinal Fundus Image Anomaly Detection on NVIDIA Jetson Nano | Berk Yilmaz et.al. | 2506.18220 | null |
| 2025-06-22 | Dynamic Temporal Positional Encodings for Early Intrusion Detection in IoT | Ioannis Panopoulos et.al. | 2506.18114 | null |
| 2025-06-22 | TAB: Unified Benchmarking of Time Series Anomaly Detection Methods | Xiangfei Qiu et.al. | 2506.18046 | null |
| 2025-06-21 | Quantum-Hybrid Support Vector Machines for Anomaly Detection in Industrial Control Systems | Tyler Cultice et.al. | 2506.17824 | null |
| 2025-06-21 | The Blind Spot of BGP Anomaly Detection: Why LSTM Autoencoders Fail on Real-World Outages | Samuel Oluwafemi Adebayo et.al. | 2506.17821 | null |
| 2025-06-20 | Searching for a Hidden Markov Anomaly over Multiple Processes | Levli Citron et.al. | 2506.17108 | null |
| 2025-06-20 | MAWIFlow Benchmark: Realistic Flow-Based Evaluation for Network Intrusion Detection | Joshua Schraven et.al. | 2506.17041 | link |
| 2025-06-20 | Anomaly Detection in Event-triggered Traffic Time Series via Similarity Learning | Shaoyu Dou et.al. | 2506.16855 | null |
| 2025-06-20 | Robust Group Anomaly Detection for Quasi-Periodic Network Time Series | Kai Yang et.al. | 2506.16815 | null |
| 2025-06-19 | Few-Shot Learning-Based Cyber Incident Detection with Augmented Context Intelligence | Fei Zuo et.al. | 2506.16626 | null |
| 2025-06-19 | On Continuous Monitoring of Risk Violations under Unknown Shift | Alexander Timans et.al. | 2506.16416 | null |
| 2025-06-19 | Classification of Cattle Behavior and Detection of Heat (Estrus) using Sensor Data | Druva Dhakshinamoorthy et.al. | 2506.16380 | null |
| 2025-06-19 | Signatures to help interpretability of anomalies | Emmanuel Gangler et.al. | 2506.16314 | null |
| 2025-06-19 | On the Efficient Discovery of Maximum $k$-Defective Biclique | Donghang Cui et.al. | 2506.16121 | null |
| 2025-06-19 | CRIA: A Cross-View Interaction and Instance-Adapted Pre-training Framework for Generalizable EEG Representations | Puchun Liu et.al. | 2506.16056 | null |
| 2025-06-18 | Real-Time Initialization of Unknown Anchors for UWB-aided Navigation | Giulio Delama et.al. | 2506.15518 | null |
| 2025-06-18 | Time-dependent density estimation using binary classifiers | Agnimitra Dasgupta et.al. | 2506.15505 | null |
| 2025-06-18 | Semi-supervised Graph Anomaly Detection via Robust Homophily Learning | Guoguo Ai et.al. | 2506.15448 | null |
| 2025-06-18 | Evaluation Pipeline for systematically searching for Anomaly Detection Systems | Florian Rokohl et.al. | 2506.15388 | null |
| 2025-06-18 | Human-Centred AI in FinTech: Developing a User Experience (UX) Research Point of View (PoV) Playbook | Festus Adedoyin et.al. | 2506.15325 | null |
| 2025-06-17 | Determinação Automática de Limiar de Detecção de Ataques em Redes de Computadores Utilizando Autoencoders | Luan Gonçalves Miranda et.al. | 2506.14937 | null |
| 2025-06-17 | Explain First, Trust Later: LLM-Augmented Explanations for Graph-Based Crypto Anomaly Detection | Adriana Watson et.al. | 2506.14933 | link |
| 2025-06-17 | Latent Anomaly Detection: Masked VQ-GAN for Unsupervised Segmentation in Medical CBCT | Pengwei Wang et.al. | 2506.14209 | null |
| 2025-06-17 | $β$-integrated local depth and corresponding partitioned local depth representation | Siyi Wang et.al. | 2506.14108 | null |
| 2025-06-16 | Bridging Unsupervised and Semi-Supervised Anomaly Detection: A Theoretically-Grounded and Practical Framework with Synthetic Anomalies | Matthew Lau et.al. | 2506.13955 | null |
| 2025-06-15 | Hybrid Meta-Learning Framework for Anomaly Forecasting in Nonlinear Dynamical Systems via Physics-Inspired Simulation and Deep Ensembles | Abdullah Burkan Bereketoglu et.al. | 2506.13828 | null |
| 2025-06-16 | Anomaly Object Segmentation with Vision-Language Models for Steel Scrap Recycling | Daichi Tanaka et.al. | 2506.13282 | null |
| 2025-06-16 | Polyra Swarms: A Shape-Based Approach to Machine Learning | Simon Klüttermann et.al. | 2506.13217 | null |
| 2025-06-16 | Pro-AD: Learning Comprehensive Prototypes with Prototype-based Constraint for Multi-class Unsupervised Anomaly Detection | Ziqing Zhou et.al. | 2506.13097 | null |
| 2025-06-16 | Learning Event Completeness for Weakly Supervised Video Anomaly Detection | Yu Wang et.al. | 2506.13095 | null |
| 2025-06-16 | Condition Monitoring with Machine Learning: A Data-Driven Framework for Quantifying Wind Turbine Energy Loss | Emil Marcus Buchberg et.al. | 2506.13012 | null |
| 2025-06-15 | SmartHome-Bench: A Comprehensive Benchmark for Video Anomaly Detection in Smart Homes Using Multi-Modal Large Language Models | Xinyi Zhao et.al. | 2506.12992 | null |
| 2025-06-15 | Probing Deep into Temporal Profile Makes the Infrared Small Target Detector Much Better | Ruojing Li et.al. | 2506.12766 | null |
| 2025-06-13 | Temporal cross-validation impacts multivariate time series subsequence anomaly detection evaluation | Steven C. Hespeler et.al. | 2506.12183 | null |
| 2025-06-13 | Diffusion-Based Electrocardiography Noise Quantification via Anomaly Detection | Tae-Seong Han et.al. | 2506.11815 | null |
| 2025-06-13 | A retrospective on DISPEED – Leveraging heterogeneity in a drone swarm for IDS execution | Vincent Lannurien et.al. | 2506.11800 | null |
| 2025-06-13 | Self-supervised Learning of Echocardiographic Video Representations via Online Cluster Distillation | Divyanshu Mishra et.al. | 2506.11777 | null |
| 2025-06-13 | CLIP Meets Diffusion: A Synergistic Approach to Anomaly Detection | Byeongchan Lee et.al. | 2506.11772 | null |
| 2025-06-13 | Deep Symmetric Autoencoders from the Eckart-Young-Schmidt Perspective | Simone Brivio et.al. | 2506.11641 | null |
| 2025-06-13 | FAA Framework: A Large Language Model-Based Approach for Credit Card Fraud Investigations | Shaun Shuster et.al. | 2506.11635 | null |
| 2025-06-13 | Prioritizing Alignment Paradigms over Task-Specific Model Customization in Time-Series LLMs | Wei Li et.al. | 2506.11512 | null |
| 2025-06-12 | Advanced fraud detection using machine learning models: enhancing financial transaction security | Nudrat Fariha et.al. | 2506.10842 | null |
| 2025-06-13 | IQE-CLIP: Instance-aware Query Embedding for Zero-/Few-shot Anomaly Detection in Medical Domain | Hong Huang et.al. | 2506.10730 | link |
| 2025-06-12 | Anomaly Detection for Sensing Security | Stefan Roth et.al. | 2506.10718 | null |
| 2025-06-12 | Assessing the Resilience of Automotive Intrusion Detection Systems to Adversarial Manipulation | Stefano Longari et.al. | 2506.10620 | null |
| 2025-06-12 | Data Driven Diagnosis for Large Cyber-Physical-Systems with Minimal Prior Information | Henrik Sebastian Steude et.al. | 2506.10613 | null |
| 2025-06-11 | Conditional diffusion models for guided anomaly detection in brain images using fluid-driven anomaly randomization | Ana Lawry Aguila et.al. | 2506.10233 | null |
| 2025-06-11 | TrioXpert: An automated incident management framework for microservice system | Yongqian Sun et.al. | 2506.10043 | null |
| 2025-06-11 | Microservices and Real-Time Processing in Retail IT: A Review of Open-Source Toolchains and Deployment Strategies | Aaditaa Vashisht et.al. | 2506.09938 | null |
| 2025-06-11 | Wavelet Scattering Transform and Fourier Representation for Offline Detection of Malicious Clients in Federated Learning | Alessandro Licciardi et.al. | 2506.09674 | null |
| 2025-06-11 | Securing Open RAN: A Survey of Cryptographic Challenges and Emerging Solutions for 5G | Ryan Barker et.al. | 2506.09418 | null |
| 2025-06-11 | Anomaly Detection and Generation with Diffusion Models: A Survey | Yang Liu et.al. | 2506.09368 | null |
| 2025-06-11 | ContextBuddy: AI-Enhanced Contextual Insights for Security Alert Investigation (Applied to Intrusion Detection) | Ronal Singh et.al. | 2506.09365 | null |
| 2025-06-10 | PatchGuard: Adversarially Robust Anomaly Detection and Localization through Vision Transformers and Pseudo Anomalies | Mojtaba Nafez et.al. | 2506.09237 | null |
| 2025-06-10 | Integrating Asynchronous AdaBoost into Federated Learning: Five Real World Applications | Arthur Oghlukyan et.al. | 2506.09090 | null |
| 2025-06-10 | HomographyAD: Deep Anomaly Detection Using Self Homography Learning | Jongyub Seok et.al. | 2506.08784 | null |
| 2025-06-10 | EDINET-Bench: Evaluating LLMs on Complex Financial Tasks using Japanese Financial Statements | Issa Sugiura et.al. | 2506.08762 | null |
| 2025-06-10 | MAMBO: High-Resolution Generative Approach for Mammography Images | Milica Škipina et.al. | 2506.08677 | null |
| 2025-06-09 | Evaluating explainable AI for deep learning-based network intrusion detection system alert classification | Rajesh Kalakoti et.al. | 2506.07882 | null |
| 2025-06-09 | Are Trees Really Green? A Detection Approach of IoT Malware Attacks | Silvia Lucia Sanna et.al. | 2506.07836 | null |
| 2025-06-09 | Explainable AI for Enhancing IDS Against Advanced Persistent Kill Chain | Bassam Noori Shaker et.al. | 2506.07480 | null |
| 2025-06-09 | Anomaly Detection and Early Warning Mechanism for Intelligent Monitoring Systems in Multi-Cloud Environments Based on LLM | Yihong Jin et.al. | 2506.07407 | null |
| 2025-06-09 | Enhanced Consistency Bi-directional GAN (CBiGAN) for Malware Anomaly Detection | Thesath Wijayasiri et.al. | 2506.07372 | null |
| 2025-06-08 | Towards Physics-informed Diffusion for Anomaly Detection in Trajectories | Arun Sharma et.al. | 2506.06999 | null |
| 2025-06-07 | ARGOS: Anomaly Recognition and Guarding through O-RAN Sensing | Stavros Dimou et.al. | 2506.06916 | null |
| 2025-06-07 | Harnessing Vision-Language Models for Time Series Anomaly Detection | Zelin He et.al. | 2506.06836 | null |
| 2025-06-07 | Cross-Entropy Games for Language Models: From Implicit Knowledge to General Capability Measures | Clément Hongler et.al. | 2506.06832 | null |
| 2025-06-07 | LADSG: Label-Anonymized Distillation and Similar Gradient Substitution for Label Privacy in Vertical Federated Learning | Zeyu Yan et.al. | 2506.06742 | null |
| 2025-06-06 | PROVSYN: Synthesizing Provenance Graphs for Data Augmentation in Intrusion Detection Systems | Yi Huang et.al. | 2506.06226 | null |
| 2025-06-06 | MLOps with Microservices: A Case Study on the Maritime Domain | Renato Cordeiro Ferreira et.al. | 2506.06202 | null |
| 2025-06-06 | Tensor-to-Tensor Models with Fast Iterated Sum Features | Joscha Diehl et.al. | 2506.06041 | null |
| 2025-06-06 | $\text{C}^{2}\text{BNVAE}$: Dual-Conditional Deep Generation of Network Traffic Data for Network Intrusion Detection System Balancing | Yifan Zeng et.al. | 2506.05844 | null |
| 2025-06-06 | FIST: A Structured Threat Modeling Framework for Fraud Incidents | Yu-Chen Dai et.al. | 2506.05740 | null |
| 2025-06-05 | Track Any Anomalous Object: A Granular Video Anomaly Detection Pipeline | Yuzhi Huang et.al. | 2506.05175 | null |
| 2025-06-05 | Federated Isolation Forest for Efficient Anomaly Detection on Edge IoT Systems | Pavle Vasiljevic et.al. | 2506.05138 | null |
| 2025-06-05 | Noise-Driven AI Sensors: Secure Healthcare Monitoring with PUFs | Christiana Chamon et.al. | 2506.05135 | null |
| 2025-06-05 | Low noise flux estimate and data quality control monitoring in EUCLID-NISP cosmological survey | B. Kubik et.al. | 2506.05024 | null |
| 2025-06-05 | Evaluating the Impact of Privacy-Preserving Federated Learning on CAN Intrusion Detection | Gabriele Digregorio et.al. | 2506.04978 | null |
| 2025-06-05 | KPIRoot+: An Efficient Integrated Framework for Anomaly Detection and Root Cause Analysis in Large-Scale Cloud Systems | Wenwei Gu et.al. | 2506.04569 | null |
| 2025-06-04 | Neurosymbolic Artificial Intelligence for Robust Network Intrusion Detection: From Scratch to Transfer Learning | Huynh T. T. Tran et.al. | 2506.04454 | null |
| 2025-06-04 | An AI-Based Public Health Data Monitoring System | Ananya Joshi et.al. | 2506.04429 | null |
| 2025-06-04 | How to Use Graph Data in the Wild to Help Graph Anomaly Detection? | Yuxuan Cao et.al. | 2506.04190 | null |
| 2025-06-04 | Causality-Aware Contrastive Learning for Robust Multivariate Time-Series Anomaly Detection | HyunGi Kim et.al. | 2506.03964 | null |
| 2025-06-04 | Lower Ricci Curvature for Hypergraphs | Shiyi Yang et.al. | 2506.03943 | null |
| 2025-06-04 | INP-Former++: Advancing Universal Anomaly Detection via Intrinsic Normal Prototypes and Residual Learning | Wei Luo et.al. | 2506.03660 | null |
| 2025-06-04 | SubSearch: Robust Estimation and Outlier Detection for Stochastic Block Models via Subgraph Search | Leonardo Martins Bianco et.al. | 2506.03657 | null |
| 2025-06-03 | Investigating Mask-aware Prototype Learning for Tabular Anomaly Detection | Ruiying Lu et.al. | 2506.02757 | null |
| 2025-06-03 | Data Leakage and Deceptive Performance: A Critical Examination of Credit Card Fraud Detection Methodologies | Khizar Hayat et.al. | 2506.02703 | null |
| 2025-06-04 | MemoryOut: Learning Principal Features via Multimodal Sparse Filtering Network for Semi-supervised Video Anomaly Detection | Juntong Li et.al. | 2506.02535 | link |
| 2025-06-03 | A Review of Various Datasets for Machine Learning Algorithm-Based Intrusion Detection System: Advances and Challenges | Sudhanshu Sekhar Tripathy et.al. | 2506.02438 | null |
| 2025-06-02 | RATFM: Retrieval-augmented Time Series Foundation Model for Anomaly Detection | Chihiro Maru et.al. | 2506.02081 | null |
| 2025-06-02 | Enhancing Interpretability of Quantum-Assisted Blockchain Clustering via AI Agent-Based Qualitative Analysis | Yun-Cheng Tsai et.al. | 2506.02068 | null |
| 2025-06-02 | Federated Gaussian Mixture Models | Sophia Zhang Pettersson et.al. | 2506.01780 | null |
| 2025-06-02 | ShaTS: A Shapley-based Explainability Method for Time Series Artificial Intelligence Models applied to Anomaly Detection in Industrial Internet of Things | Manuel Franco de la Peña et.al. | 2506.01450 | null |
| 2025-06-02 | System Calls for Malware Detection and Classification: Methodologies and Applications | Bishwajit Prasad Gond et.al. | 2506.01412 | null |
| 2025-06-01 | Continual-MEGA: A Large-scale Benchmark for Generalizable Continual Anomaly Detection | Geonu Lee et.al. | 2506.00956 | link |
| 2025-05-30 | Harnessing Large Language Models for Scientific Novelty Detection | Yan Liu et.al. | 2505.24615 | null |
| 2025-05-30 | HLSAD: Hodge Laplacian-based Simplicial Anomaly Detection | Florian Frantzen et.al. | 2505.24534 | null |
| 2025-05-30 | Diversify and Conquer: Open-set Disagreement for Robust Semi-supervised Learning with Outliers | Heejo Kong et.al. | 2505.24443 | link |
| 2025-05-30 | Bridging 3D Anomaly Localization and Repair via High-Quality Continuous Geometric Representation | Bozhong Zheng et.al. | 2505.24431 | null |
| 2025-05-30 | Anomaly Detection and Improvement of Clusters using Enhanced K-Means Algorithm | Vardhan Shorewala et.al. | 2505.24365 | null |
| 2025-05-30 | KairosAD: A SAM-Based Model for Industrial Anomaly Detection on Embedded Devices | Uzair Khan et.al. | 2505.24334 | null |
| 2025-05-30 | INSIGHT: A Survey of In-Network Systems for Intelligent, High-Efficiency AI and Topology Optimization | Aleksandr Algazinov et.al. | 2505.24269 | null |
| 2025-05-30 | SentinelAgent: Graph-based Anomaly Detection in Multi-Agent Systems | Xu He et.al. | 2505.24201 | null |
| 2025-05-29 | Machine Learning-Based Anomaly Detection of Correlated Sensor Data: An Integrated Principal Component Analysis-Autoencoder Approach | Tanish Baranwal et.al. | 2505.24044 | null |
| 2025-05-29 | An Advanced Cyber-Physical System Security Testbed for Substation Automation | Akila Herath et.al. | 2505.24021 | null |
| 2025-05-29 | Distributed Federated Learning for Vehicular Network Security: Anomaly Detection Benefits and Multi-Domain Attack Threats | Utku Demir et.al. | 2505.23706 | null |
| 2025-05-29 | VAU-R1: Advancing Video Anomaly Understanding via Reinforcement Fine-Tuning | Liyun Zhu et.al. | 2505.23504 | link |
| 2025-05-29 | Sentinel: Scheduling Live Streams with Proactive Anomaly Detection in Crowdsourced Cloud-Edge Platforms | Yuting Li et.al. | 2505.23347 | null |
| 2025-05-29 | FreRA: A Frequency-Refined Augmentation for Contrastive Learning on Time Series Classification | Tian Tian et.al. | 2505.23181 | link |
| 2025-05-28 | Anomalies by Synthesis: Anomaly Detection using Generative Diffusion Models for Off-Road Navigation | Siddharth Ancha et.al. | 2505.22805 | null |
| 2025-05-28 | Evaluating Supervised Learning Models for Fraud Detection: A Comparative Study of Classical and Deep Architectures on Imbalanced Transaction Data | Chao Wang et.al. | 2505.22521 | null |
| 2025-05-28 | Does Johnny Get the Message? Evaluating Cybersecurity Notifications for Everyday Users | Victor Jüttner et.al. | 2505.22435 | null |
| 2025-05-28 | Domain Adaptation of Attention Heads for Zero-shot Anomaly Detection | Kiyoon Jeong et.al. | 2505.22259 | null |
| 2025-05-28 | OmniAD: Detect and Understand Industrial Anomaly via Multimodal Reasoning | Shifang Zhao et.al. | 2505.22039 | null |
| 2025-05-28 | A2Seek: Towards Reasoning-Centric Benchmark for Aerial Anomaly Understanding | Mengjingcheng Mo et.al. | 2505.21962 | null |
| 2025-05-27 | Mentor3AD: Feature Reconstruction-based 3D Anomaly Detection via Multi-modality Mentor Learning | Jinbao Wang et.al. | 2505.21420 | null |
| 2025-05-27 | Learnable Kernel Density Estimation for Graphs | Xudong Wang et.al. | 2505.21285 | null |
| 2025-05-27 | Is Hyperbolic Space All You Need for Medical Anomaly Detection? | Alvaro Gonzalez-Jimenez et.al. | 2505.21228 | null |
| 2025-05-27 | RoBiS: Robust Binary Segmentation for High-Resolution Industrial Images | Xurui Li et.al. | 2505.21152 | link |
| 2025-05-27 | Robust and Explainable Detector of Time Series Anomaly via Augmenting Multiclass Pseudo-Anomalies | Kohei Obata et.al. | 2505.20765 | null |
| 2025-05-26 | Byzantine-Resilient Distributed P2P Energy Trading via Spatial-Temporal Anomaly Detection | Junhong Liu et.al. | 2505.20567 | null |
| 2025-05-26 | Cellwise and Casewise Robust Covariance in High Dimensions | Fabio Centofanti et.al. | 2505.19925 | null |
| 2025-05-26 | Vad-R1: Towards Video Anomaly Reasoning via Perception-to-Cognition Chain-of-Thought | Chao Huang et.al. | 2505.19877 | link |
| 2025-05-26 | Multi-Agent Reinforcement Learning in Cybersecurity: From Fundamentals to Applications | Christoph R. Landolt et.al. | 2505.19837 | null |
| 2025-05-27 | SuperAD: A Training-free Anomaly Classification and Segmentation Method for CVPR 2025 VAND 3.0 Workshop Challenge Track 1: Adapt & Detect | Huaiyuan Zhang et.al. | 2505.19750 | null |
| 2025-05-26 | ICS for complex data with application to outlier detection for density data | Camille Mondon et.al. | 2505.19403 | link |
| 2025-05-25 | Co-evolutionary Dynamics of Attack and Defence in Cybersecurity | Adeela Bashir et.al. | 2505.19338 | null |
| 2025-05-25 | Rethinking Metrics and Benchmarks of Video Anomaly Detection | Zihao Liu et.al. | 2505.19022 | null |
| 2025-05-25 | Chi-Square Wavelet Graph Neural Networks for Heterogeneous Graph Anomaly Detection | Xiping Li et.al. | 2505.18934 | null |
| 2025-05-25 | Words as Geometric Features: Estimating Homography using Optical Character Recognition as Compressed Image Representation | Ross Greer et.al. | 2505.18925 | null |
| 2025-05-24 | Anomaly detection in radio galaxy data with trainable COSFIRE filters | Steven Ndung’u et.al. | 2505.18643 | null |
| 2025-05-23 | Rethinking Contrastive Learning in Graph Anomaly Detection: A Clean-View Perspective | Di Jin et.al. | 2505.18002 | null |
| 2025-05-23 | Hyperspectral Anomaly Detection Fused Unified Nonconvex Tensor Ring Factors Regularization | Wenjin Qin et.al. | 2505.17881 | null |
| 2025-05-23 | ViP$^2$-CLIP: Visual-Perception Prompting with Unified Alignment for Zero-Shot Anomaly Detection | Ziteng Yang et.al. | 2505.17692 | null |
| 2025-05-23 | Automating Versatile Time-Series Analysis with Tiny Transformers on Embedded FPGAs | Tianheng Ling et.al. | 2505.17662 | null |
| 2025-05-23 | Large Language Models in the IoT Ecosystem – A Survey on Security Challenges and Applications | Kushal Khatiwada et.al. | 2505.17586 | null |
| 2025-05-23 | Center-aware Residual Anomaly Synthesis for Multi-class Industrial Anomaly Detection | Qiyu Chen et.al. | 2505.17551 | null |
| 2025-05-22 | Harnessing EHRs for Diffusion-based Anomaly Detection on Chest X-rays | Harim Kim et.al. | 2505.17311 | null |
| 2025-05-22 | Advancing Security with Digital Twins: A Comprehensive Survey | Blessing Airehenbuwa et.al. | 2505.17310 | null |
| 2025-05-22 | Vehicular Intrusion Detection System for Controller Area Network: A Comprehensive Survey and Evaluation | Yangyang Liu et.al. | 2505.17274 | null |
| 2025-05-22 | A Multi-Step Comparative Framework for Anomaly Detection in IoT Data Streams | Mohammed Al-Qudah et.al. | 2505.16872 | null |
| 2025-05-22 | Zero-Shot Anomaly Detection in Battery Thermal Images Using Visual Question Answering with Prior Knowledge | Marcella Astrid et.al. | 2505.16674 | null |
| 2025-05-22 | SD-MAD: Sign-Driven Few-shot Multi-Anomaly Detection in Medical Images | Kaiyu Guo et.al. | 2505.16659 | null |
| 2025-05-22 | Unsupervised Network Anomaly Detection with Autoencoders and Traffic Images | Michael Neri et.al. | 2505.16650 | link |
| 2025-05-22 | Is Your LLM-Based Multi-Agent a Reliable Real-World Planner? Exploring Fraud Detection in Travel Planning | Junchi Yao et.al. | 2505.16557 | null |
| 2025-05-22 | Privacy-Aware Cyberterrorism Network Analysis using Graph Neural Networks and Federated Learning | Anas Ali et.al. | 2505.16371 | null |
| 2025-05-22 | Learning novel representations of variable sources from multi-modal $\textit{Gaia}$ data via autoencoders | P. Huijse et.al. | 2505.16320 | null |
| 2025-05-22 | Interpretable Anomaly Detection in Encrypted Traffic Using SHAP with Machine Learning Models | Kalindi Singh et.al. | 2505.16261 | null |
| 2025-05-22 | MADCluster: Model-agnostic Anomaly Detection with Self-supervised Clustering Network | Sangyong Lee et.al. | 2505.16223 | null |
| 2025-05-22 | A Scalable Hierarchical Intrusion Detection System for Internet of Vehicles | Md Ashraf Uddin et.al. | 2505.16215 | null |
| 2025-05-21 | Federated Learning-Enhanced Blockchain Framework for Privacy-Preserving Intrusion Detection in Industrial IoT | Anas Ali et.al. | 2505.15376 | null |
| 2025-05-21 | Flashback: Memory-Driven Zero-shot, Real-time Video Anomaly Detection | Hyogun Lee et.al. | 2505.15205 | null |
| 2025-05-20 | Anomaly Detection Based on Critical Paths for Deep Neural Networks | Fangzhen Zhao et.al. | 2505.14967 | null |
| 2025-05-20 | SecCAN: An Extended CAN Controller with Embedded Intrusion Detection | Shashwat Khandelwal et.al. | 2505.14924 | null |
| 2025-05-20 | Adaptive Pruning of Deep Neural Networks for Resource-Aware Embedded Intrusion Detection on the Edge | Alexandre Broggi et.al. | 2505.14592 | null |
| 2025-05-20 | AquaSignal: An Integrated Framework for Robust Underwater Acoustic Analysis | Eirini Panteli et.al. | 2505.14285 | null |
| 2025-05-20 | Partition-wise Graph Filtering: A Unified Perspective Through the Lens of Graph Coarsening | Guoming Li et.al. | 2505.14033 | null |
| 2025-05-20 | CSAGC-IDS: A Dual-Module Deep Learning Network Intrusion Detection Model for Complex and Imbalanced Data | Yifan Zeng et.al. | 2505.14027 | null |
| 2025-05-20 | Multimodal RAG-driven Anomaly Detection and Classification in Laser Powder Bed Fusion using Large Language Models | Kiarash Naghavi Khanghah et.al. | 2505.13828 | null |
| 2025-05-19 | Unsupervised anomaly detection in MeV ultrafast electron diffraction | Mariana A. Fazio et.al. | 2505.13702 | null |
| 2025-05-19 | Sensitivity to New Physics Phenomena in Anomaly Detection: A Study of Untunable Hyperparameters | Fernando Abreu de Souza et.al. | 2505.13228 | null |
| 2025-05-19 | Just Dance with $π$! A Poly-modal Inductor for Weakly-supervised Video Anomaly Detection | Snehashis Majhi et.al. | 2505.13123 | null |
| 2025-05-19 | TSPulse: Dual Space Tiny Pre-Trained Models for Rapid Time-Series Analysis | Vijay Ekambaram et.al. | 2505.13033 | null |
| 2025-05-19 | Structure-based Anomaly Detection and Clustering | Filippo Leveni et.al. | 2505.12751 | null |
| 2025-05-19 | AD-AGENT: A Multi-agent Framework for End-to-end Anomaly Detection | Tiankai Yang et.al. | 2505.12594 | null |
| 2025-05-18 | Importance Sampling for Nonlinear Models | Prakash Palanivelu Rajmohan et.al. | 2505.12353 | link |
| 2025-05-17 | PyScrew: A Comprehensive Dataset Collection from Industrial Screw Driving Experiments | Nikolai West et.al. | 2505.11925 | null |
| 2025-05-17 | Robust outlier detection for heterogeneous distributions applicable to censoring in functional MRI | Saranjeet Singh Saluja et.al. | 2505.11806 | null |
| 2025-05-17 | Are vision language models robust to uncertain inputs? | Xi Wang et.al. | 2505.11804 | null |
| 2025-05-17 | CL-BioGAN: Biologically-Inspired Cross-Domain Continual Learning for Hyperspectral Anomaly Detection | Jianing Wang et.al. | 2505.11796 | null |
| 2025-05-16 | Anomaly Detection for Non-stationary Time Series using Recurrent Wavelet Probabilistic Neural Network | Pu Yang et.al. | 2505.11321 | null |
| 2025-05-16 | Diffusion Model in Hyperspectral Image Processing and Analysis: A Review | Xing Hu et.al. | 2505.11158 | null |
| 2025-05-16 | Fairness-aware Anomaly Detection via Fair Projection | Feng Xiao et.al. | 2505.11132 | null |
| 2025-05-16 | Visual Anomaly Detection under Complex View-Illumination Interplay: A Large-Scale Benchmark | Yunkang Cao et.al. | 2505.10996 | null |
| 2025-05-16 | Preference Isolation Forest for Structure-based Anomaly Detection | Filippo Leveni et.al. | 2505.10876 | null |
| 2025-05-16 | Hashing for Structure-based Anomaly Detection | Filippo Leveni et.al. | 2505.10873 | null |
| 2025-05-15 | PIF: Anomaly detection via preference embedding | Filippo Leveni et.al. | 2505.10441 | null |
| 2025-05-15 | A Representation Learning Approach to Feature Drift Detection in Wireless Networks | Athanasios Tziouvaras et.al. | 2505.10325 | null |
| 2025-05-15 | Financial Fraud Detection Using Explainable AI and Stacking Ensemble Methods | Fahad Almalki et.al. | 2505.10050 | null |
| 2025-05-15 | AdaptCLIP: Adapting CLIP for Universal Visual Anomaly Detection | Bin-Bin Gao et.al. | 2505.09926 | link |
| 2025-05-15 | Correlating Account on Ethereum Mixing Service via Domain-Invariant feature learning | Zheng Che et.al. | 2505.09892 | null |
| 2025-05-14 | Online Isolation Forest | Filippo Leveni et.al. | 2505.09593 | link |
| 2025-05-14 | CANTXSec: A Deterministic Intrusion Detection and Prevention System for CAN Bus Monitoring ECU Activations | Denis Donadel et.al. | 2505.09384 | null |
| 2025-05-14 | MetaUAS: Universal Anomaly Segmentation with One-Prompt Meta-Learning | Bin-Bin Gao et.al. | 2505.09265 | link |
| 2025-05-14 | Learning to Detect Multi-class Anomalies with Just One Normal Image Prompt | Bin-Bin Gao et.al. | 2505.09264 | link |
| 2025-05-14 | Few-Shot Anomaly-Driven Generation for Anomaly Classification and Segmentation | Guan Gui et.al. | 2505.09263 | link |
| 2025-05-14 | WSCIF: A Weakly-Supervised Color Intelligence Framework for Tactical Anomaly Detection in Surveillance Keyframes | Wei Meng et.al. | 2505.09129 | null |
| 2025-05-13 | Intelligent Road Anomaly Detection with Real-time Notification System for Enhanced Road Safety | Ali Almakhluk et.al. | 2505.08882 | null |
| 2025-05-13 | Exploring Scotogenic Parameter Spaces and Mapping Uncharted Dark Matter Phenomenology with Multi-Objective Search Algorithms | Fernando Abreu de Souza et.al. | 2505.08862 | null |
| 2025-05-13 | Robust Indoor Localization via Conformal Methods and Variational Bayesian Adaptive Filtering | Zhiyi Zhou et.al. | 2505.08639 | null |
| 2025-05-13 | neuralGAM: An R Package for Fitting Generalized Additive Neural Networks | Ines Ortega-Fernandez et.al. | 2505.08610 | null |
| 2025-05-13 | Isolation Forest in Novelty Detection Scenario | Adam Ulrich et.al. | 2505.08489 | null |
| 2025-05-13 | Structural-Temporal Coupling Anomaly Detection with Dynamic Graph Transformer | Chang Zong et.al. | 2505.08330 | null |
| 2025-05-13 | Deep Probabilistic Modeling of User Behavior for Anomaly Detection via Mixture Density Networks | Lu Dai et.al. | 2505.08220 | null |
| 2025-05-13 | Fault Detection Method for Power Conversion Circuits Using Thermal Image and Convolutional Autoencoder | Noboru Katayama et.al. | 2505.08150 | null |
| 2025-05-12 | Evaluating Explanation Quality in X-IDS Using Feature Alignment Metrics | Mohammed Alquliti et.al. | 2505.08006 | null |
| 2025-05-12 | Vision Foundation Model Embedding-Based Semantic Anomaly Detection | Max Peter Ronecker et.al. | 2505.07998 | null |
| 2025-05-12 | Simultaneous Intrusion Detection and Localization Using ISAC Network | Usama Shakoor et.al. | 2505.07656 | null |
| 2025-05-12 | Evaluating Modern Visual Anomaly Detection Approaches in Semiconductor Manufacturing: A Comparative Study | Manuel Barusco et.al. | 2505.07576 | null |
| 2025-05-12 | EAGLE: Contrastive Learning for Efficient Graph Anomaly Detection | Jing Ren et.al. | 2505.07508 | null |
| 2025-05-12 | Boosting Global-Local Feature Matching via Anomaly Synthesis for Multi-Class Point Cloud Anomaly Detection | Yuqi Cheng et.al. | 2505.07375 | link |
| 2025-05-12 | GAN-based synthetic FDG PET images from T1 brain MRI can serve to improve performance of deep unsupervised anomaly detection models | Daria Zotova et.al. | 2505.07364 | null |
| 2025-05-11 | Towards Scalable IoT Deployment for Visual Anomaly Detection via Efficient Compression | Arianna Stropeni et.al. | 2505.07119 | null |
| 2025-05-11 | Benign Samples Matter! Fine-tuning On Outlier Benign Samples Severely Breaks Safety | Zihan Guan et.al. | 2505.06843 | null |
| 2025-05-10 | A Contrastive Federated Semi-Supervised Learning Intrusion Detection Framework for Internet of Robotic Things | Yifan Zeng et.al. | 2505.06636 | null |
| 2025-05-10 | AI-Powered Anomaly Detection with Blockchain for Real-Time Security and Reliability in Autonomous Vehicles | Rathin Chandra Shit et.al. | 2505.06632 | null |
| 2025-05-10 | ReplayCAD: Generative Diffusion Replay for Continual Anomaly Detection | Lei Hu et.al. | 2505.06603 | link |
| 2025-05-09 | Mixtures of multivariate linear asymmetric Laplace regressions with multiple asymmetric Laplace covariates | Arnoldus F. Otto et.al. | 2505.05979 | null |
| 2025-05-09 | Adaptive Robot Localization with Ultra-wideband Novelty Detection | Umberto Albertin et.al. | 2505.05903 | null |
| 2025-05-09 | Examining the Source of Defects from a Mechanical Perspective for 3D Anomaly Detection | Hanzhe Liang et.al. | 2505.05901 | link |
| 2025-05-09 | Unsupervised Anomaly Detection for Autonomous Robots via Mahalanobis SVDD with Audio-IMU Fusion | Yizhuo Yang et.al. | 2505.05811 | link |
| 2025-05-09 | Intrusion Detection System Using Deep Learning for Network Security | Soham Chatterjee et.al. | 2505.05810 | null |
| 2025-05-08 | KPI Poisoning: An Attack in Open RAN Near Real-Time Control Loop | Hamed Alimohammadi et.al. | 2505.05537 | null |
| 2025-05-08 | QUIC-Exfil: Exploiting QUIC’s Server Preferred Address Feature to Perform Data Exfiltration Attacks | Thomas Grübl et.al. | 2505.05292 | null |
| 2025-05-08 | Research on Anomaly Detection Methods Based on Diffusion Models | Yi Chen et.al. | 2505.05137 | null |
| 2025-05-08 | Conformal Prediction with Cellwise Outliers: A Detect-then-Impute Approach | Qian Peng et.al. | 2505.04986 | null |
| 2025-05-07 | Comparison of Visual Trackers for Biomechanical Analysis of Running | Luis F. Gomez et.al. | 2505.04713 | null |
| 2025-05-07 | Hierarchical Task Decomposition for Execution Monitoring and Error Recovery: Understanding the Rationale Behind Task Demonstrations | Christoph Willibald et.al. | 2505.04565 | null |
| 2025-05-07 | Detecting Spelling and Grammatical Anomalies in Russian Poetry Texts | Ilya Koziev et.al. | 2505.04507 | null |
| 2025-05-07 | Blockchain Data Analytics: A Scoping Literature Review and Directions for Future Research | Marcel Bühlmann et.al. | 2505.04403 | null |
| 2025-05-07 | Cyber Security Data Science: Machine Learning Methods and their Performance on Imbalanced Datasets | Mateo Lopez-Ledezma et.al. | 2505.04204 | null |
| 2025-05-06 | Extending Decision Predicate Graphs for Comprehensive Explanation of Isolation Forest | Matteo Ceschin et.al. | 2505.04019 | null |
| 2025-05-06 | Using anomaly detection to search for technosignatures in Breakthrough Listen observations | Snir Pardo et.al. | 2505.03927 | null |
| 2025-05-06 | Explaining Anomalies with Tensor Networks | Hans Hohenfeld et.al. | 2505.03911 | null |
| 2025-05-06 | AnomalyMatch: Discovering Rare Objects of Interest with Semi-supervised and Active Learning | Pablo Gómez et.al. | 2505.03509 | null |
| 2025-05-06 | CXR-AD: Component X-ray Image Dataset for Industrial Anomaly Detection | Haoyu Bai et.al. | 2505.03412 | null |
| 2025-05-06 | Bridging Expertise Gaps: The Role of LLMs in Human-AI Collaboration for Cybersecurity | Shahroz Tariq et.al. | 2505.03179 | null |
| 2025-05-06 | Adversarial Sample Generation for Anomaly Detection in Industrial Control Systems | Abdul Mustafa et.al. | 2505.03120 | null |
| 2025-05-05 | An Explainable Anomaly Detection Framework for Monitoring Depression and Anxiety Using Consumer Wearable Devices | Yuezhou Zhang et.al. | 2505.03039 | null |
| 2025-05-05 | Fast and Precise Track Fitting with Machine Learning | Ryan Miller et.al. | 2505.02937 | null |
| 2025-05-05 | A robust neural determination of the source-count distribution of the Fermi-LAT sky at high latitudes | Christopher Eckner et.al. | 2505.02906 | null |
| 2025-05-05 | Detect, Classify, Act: Categorizing Industrial Anomalies with Multi-Modal Large Language Models | Sassan Mokhtar et.al. | 2505.02626 | null |
| 2025-05-05 | Lane-Wise Highway Anomaly Detection | Mei Qiu et.al. | 2505.02613 | null |
| 2025-05-05 | A probabilistic view on Riemannian machine learning models for SPD matrices | Thibault de Surrel et.al. | 2505.02402 | null |
| 2025-05-05 | Uncertainty-Weighted Image-Event Multimodal Fusion for Video Anomaly Detection | Sungheon Jeong et.al. | 2505.02393 | link |
| 2025-05-05 | Quantum-assisted anomaly detection with multivariate Gaussian distribution | Chao-Hua Yu et.al. | 2505.02316 | null |
| 2025-05-04 | ProDisc-VAD: An Efficient System for Weakly-Supervised Anomaly Detection in Video Surveillance Applications | Tao Zhu et.al. | 2505.02179 | link |
| 2025-05-04 | Spotting the Unexpected (STU): A 3D LiDAR Dataset for Anomaly Segmentation in Autonomous Driving | Alexey Nekrasov et.al. | 2505.02148 | null |
| 2025-05-04 | MC3D-AD: A Unified Geometry-aware Reconstruction Model for Multi-category 3D Anomaly Detection | Jiayi Cheng et.al. | 2505.01969 | null |
| 2025-05-03 | Runtime Anomaly Detection for Drones: An Integrated Rule-Mining and Unsupervised-Learning Approach | Ivan Tan et.al. | 2505.01947 | null |
| 2025-05-03 | Context-Aware Online Conformal Anomaly Detection with Prediction-Powered Data Acquisition | Amirmohammad Farzaneh et.al. | 2505.01783 | null |
| 2025-05-02 | Constrained Network Adversarial Attacks: Validity, Robustness, and Transferability | Anass Grini et.al. | 2505.01328 | null |
| 2025-05-02 | Secure Cluster-Based Hierarchical Federated Learning in Vehicular Networks | M. Saeid HaghighiFard et.al. | 2505.01186 | null |
| 2025-05-02 | Quantum Support Vector Regression for Robust Anomaly Detection | Kilian Tscharke et.al. | 2505.01012 | null |
| 2025-05-02 | Addressing Noise and Stochasticity in Fraud Detection for Service Networks | Wenxin Zhang et.al. | 2505.00946 | null |
| 2025-05-02 | FreCT: Frequency-augmented Convolutional Transformer for Robust Time Series Anomaly Detection | Wenxin Zhang et.al. | 2505.00941 | null |
| 2025-05-02 | MARS: Defending Unmanned Aerial Vehicles From Attacks on Inertial Sensors with Model-based Anomaly Detection and Recovery | Haocheng Meng et.al. | 2505.00924 | null |
| 2025-05-01 | CICADA: Cross-Domain Interpretable Coding for Anomaly Detection and Adaptation in Multivariate Time Series | Tian Lan et.al. | 2505.00415 | null |
| 2025-05-01 | LLM-Based Threat Detection and Prevention Framework for IoT Ecosystems | Yazan Otoum et.al. | 2505.00240 | null |
| 2025-04-30 | Toward Practical Quantum Machine Learning: A Novel Hybrid Quantum LSTM for Fraud Detection | Rushikesh Ubale et.al. | 2505.00137 | null |
| 2025-04-30 | Anomaly-Driven Approach for Enhanced Prostate Cancer Segmentation | Alessia Hu et.al. | 2504.21789 | null |
| 2025-04-30 | Overlapping data in network protocols: bridging OS and NIDS reassembly gap | Lucas Aubard et.al. | 2504.21618 | null |
| 2025-04-30 | Generative AI in Financial Institution: A Global Survey of Opportunities, Threats, and Regulation | Bikash Saha et.al. | 2504.21574 | null |
| 2025-04-30 | Enhanced Semi-Supervised Stamping Process Monitoring with Physically-Informed Feature Extraction | Jianyu Zhang et.al. | 2504.21389 | null |
| 2025-04-30 | Are Haicheng and Tangshan Earthquakes Dragon-Kings? | Jiawei Li et.al. | 2504.21310 | null |
| 2025-04-30 | Learning Multi-view Multi-class Anomaly Detection | Qianzi Yu et.al. | 2504.21294 | null |
| 2025-04-30 | Subject Information Extraction for Novelty Detection with Domain Shifts | Yangyang Qu et.al. | 2504.21247 | null |
| 2025-04-29 | Optimized Quantum Embedding: A Universal Minor-Embedding Framework for Large Complete Bipartite Graph | Salvatore Sinno et.al. | 2504.21112 | null |
| 2025-04-29 | On the Potential of Large Language Models to Solve Semantics-Aware Process Mining Tasks | Adrian Rebmann et.al. | 2504.21074 | null |
| 2025-04-29 | Leveraging Generative AI Through Prompt Engineering and Rigorous Validation to Create Comprehensive Synthetic Datasets for AI Training in Healthcare | Polycarp Nalela et.al. | 2504.20921 | null |
| 2025-04-29 | GiBy: A Giant-Step Baby-Step Classifier For Anomaly Detection In Industrial Control Systems | Sarad Venugopalan et.al. | 2504.20906 | null |
| 2025-04-29 | Evaluating Generative Models for Tabular Data: Novel Metrics and Benchmarking | Dayananda Herurkar et.al. | 2504.20900 | null |
| 2025-04-29 | Tabular Data Adapters: Improving Outlier Detection for Unlabeled Private Data | Dayananda Herurkar et.al. | 2504.20862 | null |
| 2025-04-29 | Unsupervised Surrogate Anomaly Detection | Simon Klüttermann et.al. | 2504.20733 | null |
| 2025-04-28 | The Dark Side of Digital Twins: Adversarial Attacks on AI-Driven Water Forecasting | Mohammadhossein Homaei et.al. | 2504.20295 | null |
| 2025-04-28 | Smart Water Security with AI and Blockchain-Enhanced Digital Twins | Mohammadhossein Homaei et.al. | 2504.20275 | null |
| 2025-04-28 | A Virtual Cybersecurity Department for Securing Digital Twins in Water Distribution Systems | Mohammadhossein Homaei et.al. | 2504.20266 | null |
| 2025-04-28 | Cybersecurity for Autonomous Vehicles | Sai varun reddy Bhemavarapu et.al. | 2504.20180 | null |
| 2025-04-28 | A Novel Multilevel Taxonomical Approach for Describing High-Dimensional Unlabeled Movement Data | Yashat Tavakoli et.al. | 2504.20174 | null |
| 2025-04-28 | Simplified and Secure MCP Gateways for Enterprise AI Integration | Ivo Brett et.al. | 2504.19997 | null |
| 2025-04-28 | Shopformer: Transformer-Based Framework for Detecting Shoplifting via Human Pose | Narges Rashvand et.al. | 2504.19970 | null |
| 2025-04-28 | QFDNN: A Resource-Efficient Variational Quantum Feature Deep Neural Networks for Fraud Detection and Loan Prediction | Subham Das et.al. | 2504.19632 | null |
| 2025-04-28 | LR-IAD: Mask-Free Industrial Anomaly Detection with Logical Reasoning | Peijian Zeng et.al. | 2504.19524 | null |
| 2025-04-27 | Segmenting Objectiveness and Task-awareness Unknown Region for Autonomous Driving | Mi Zheng et.al. | 2504.19183 | null |
| 2025-04-26 | Zero-Day Botnet Attack Detection in IoV: A Modular Approach Using Isolation Forests and Particle Swarm Optimization | Abdelaziz Amara korba et.al. | 2504.18814 | null |
| 2025-04-26 | Reservoir-enhanced Segment Anything Model for Subsurface Diagnosis | Xiren Zhou et.al. | 2504.18802 | null |
| 2025-04-26 | ALF: Advertiser Large Foundation Model for Multi-Modal Advertiser Understanding | Santosh Rajagopalan et.al. | 2504.18785 | null |
| 2025-04-26 | Performance of Machine Learning Classifiers for Anomaly Detection in Cyber Security Applications | Markus Haug et.al. | 2504.18771 | null |
| 2025-04-25 | Unsupervised outlier detection to improve bird audio dataset labels | Bruce Collins et.al. | 2504.18650 | null |
| 2025-04-25 | An Unsupervised Machine Learning Approach to Identify Spectral Energy Distribution Outliers: Application to the S-PLUS DR4 data | F. Quispe-Huaynasi et.al. | 2504.18491 | null |
| 2025-04-25 | Time and Frequency Domain-based Anomaly Detection in Smart Meter Data for Distribution Network Studies | Petar Labura et.al. | 2504.18231 | null |
| 2025-04-25 | Bayesian Quantum Orthogonal Neural Networks for Anomaly Detection | Natansh Mathur et.al. | 2504.18103 | null |
| 2025-04-24 | Enabling Deep Visibility into VxWorks-Based Embedded Controllers in Cyber-Physical Systems for Anomaly Detection | Prashanth Krishnamurthy et.al. | 2504.17875 | null |
| 2025-04-24 | Fault Diagnosis in New Wind Turbines using Knowledge from Existing Turbines by Generative Domain Adaptation | Stefan Jonas et.al. | 2504.17709 | null |
| 2025-04-24 | MindFlow: A Network Traffic Anomaly Detection Model Based on MindSpore | Qiuyan Xiang et.al. | 2504.17678 | null |
| 2025-04-25 | PTCL: Pseudo-Label Temporal Curriculum Learning for Label-Limited Dynamic Graph | Shengtao Zhang et.al. | 2504.17641 | null |
| 2025-04-24 | Quantum Autoencoder for Multivariate Time Series Anomaly Detection | Kilian Tscharke et.al. | 2504.17548 | null |
| 2025-04-23 | Goodness-of-fit for amplitude analysis with anomaly detection | Huoyi Hou et.al. | 2504.17494 | null |
| 2025-04-24 | Breaking the Flow and the Bank: Stealthy Cyberattacks on Water Network Hydraulics | Abdallah Alalem Albustami et.al. | 2504.17211 | null |
| 2025-04-23 | Unsupervised Time-Series Signal Analysis with Autoencoders and Vision Transformers: A Review of Architectures and Applications | Hossein Ahmadi et.al. | 2504.16972 | null |
| 2025-04-23 | CAIBA: Multicast Source Authentication for CAN Through Reactive Bit Flipping | Eric Wagner et.al. | 2504.16695 | null |
| 2025-04-23 | A Collaborative Intrusion Detection System Using Snort IDS Nodes | Tom Davies et.al. | 2504.16550 | null |
| 2025-04-23 | Almost Right: Making First-layer Kernels Nearly Orthogonal Improves Model Generalization | Colton R. Crum et.al. | 2504.16362 | null |
| 2025-04-22 | Blockchain Meets Adaptive Honeypots: A Trust-Aware Approach to Next-Gen IoT Security | Yazan Otoum et.al. | 2504.16226 | null |
| 2025-04-22 | Explainable Unsupervised Anomaly Detection with Random Forest | Joshua S. Harvey et.al. | 2504.16075 | null |
| 2025-04-22 | Adaptive PCA-Based Outlier Detection for Multi-Feature Time Series in Space Missions | Jonah Ekelund et.al. | 2504.15846 | null |
| 2025-04-22 | Bayesian Autoencoder for Medical Anomaly Detection: Uncertainty-Aware Approach for Brain MRI Analysis | Dip Roy et.al. | 2504.15562 | null |
| 2025-04-21 | Application of Deep Generative Models for Anomaly Detection in Complex Financial Transactions | Tengda Tang et.al. | 2504.15491 | null |
| 2025-04-21 | FLARE: Feature-based Lightweight Aggregation for Robust Evaluation of IoT Intrusion Detection | Bradley Boswell et.al. | 2504.15375 | null |
| 2025-04-21 | M$^2$AD: Multi-Sensor Multi-System Anomaly Detection through Global Scoring and Calibrated Thresholding | Sarah Alnegheimish et.al. | 2504.15225 | link |
| 2025-04-21 | GenCLIP: Generalizing CLIP Prompts for Zero-shot Anomaly Detection | Donghyeong Kim et.al. | 2504.14919 | null |
| 2025-04-21 | Memory-Augmented Dual-Decoder Networks for Multi-Class Unsupervised Anomaly Detection | Jingyu Xing et.al. | 2504.14884 | null |
| 2025-04-20 | Advancing Video Anomaly Detection: A Bi-Directional Hybrid Framework for Enhanced Single- and Multi-Task Approaches | Guodong Shen et.al. | 2504.14753 | null |
| 2025-04-20 | Sensor Scheduling in Intrusion Detection Games with Uncertain Payoffs | Jayanth Bhargav et.al. | 2504.14725 | null |
| 2025-04-20 | Uncovering Issues in the Radio Access Network by Looking at the Neighbors | José Suárez-Varela et.al. | 2504.14686 | null |
| 2025-04-20 | Hierarchical Robust PCA for Scalable Data Quality Monitoring in Multi-level Aggregation Pipelines | Preetam Kumar Ojha et.al. | 2504.14524 | null |
| 2025-04-20 | Application of Deep Reinforcement Learning for Intrusion Detection in Internet of Things: A Systematic Review | Saeid Jamshidi et.al. | 2504.14436 | null |
| 2025-04-19 | Balancing Privacy and Action Performance: A Penalty-Driven Approach to Image Anonymization | Nazia Aslam et.al. | 2504.14301 | null |
| 2025-04-19 | A Pre-Training and Adaptive Fine-Tuning Framework for Graph Anomaly Detection | Yunhui Liu et.al. | 2504.14250 | null |
| 2025-04-18 | Enhancing Pothole Detection and Characterization: Integrated Segmentation and Depth Estimation in Road Anomaly Systems | Uthman Baroudi et.al. | 2504.13648 | null |
| 2025-04-18 | Can Local Representation Alignment RNNs Solve Temporal Tasks? | Nikolay Manchev et.al. | 2504.13531 | null |
| 2025-04-18 | Chain-of-Thought Textual Reasoning for Few-shot Temporal Action Localization | Hongwei Ji et.al. | 2504.13460 | null |
| 2025-04-18 | Towards a Multi-Agent Vision-Language System for Zero-Shot Novel Hazardous Object Detection for Autonomous Driving Safety | Shashank Shriram et.al. | 2504.13399 | null |
| 2025-04-17 | DYNAMITE: Dynamic Defense Selection for Enhancing Machine Learning-based Intrusion Detection Against Adversarial Attacks | Jing Chen et.al. | 2504.13301 | null |
| 2025-04-17 | Weakly supervised anomaly detection with event-level variables | Liam Brennan et.al. | 2504.13249 | null |
| 2025-04-17 | Predicting BVD Re-emergence in Irish Cattle From Highly Imbalanced Herd-Level Data Using Machine Learning Algorithms | Niamh Mimnagh et.al. | 2504.13116 | null |
| 2025-04-17 | Quorum: Zero-Training Unsupervised Anomaly Detection using Quantum Autoencoders | Jason Zev Ludmir et.al. | 2504.13113 | link |
| 2025-04-17 | EventVAD: Training-Free Event-Aware Video Anomaly Detection | Yihua Shao et.al. | 2504.13092 | null |
| 2025-04-17 | MathPhys-Guided Coarse-to-Fine Anomaly Synthesis with SQE-Driven Bi-Level Optimization for Anomaly Detection | Long Qian et.al. | 2504.12970 | null |
| 2025-04-17 | Sliced-Wasserstein Distance-based Data Selection | Julien Pallage et.al. | 2504.12918 | null |
| 2025-04-17 | 3D-PNAS: 3D Industrial Surface Anomaly Synthesis with Perlin Noise | Yifeng Cheng et.al. | 2504.12856 | null |
| 2025-04-17 | LAD-Reasoner: Tiny Multimodal Models are Good Reasoners for Logical Anomaly Detection | Weijia Li et.al. | 2504.12749 | null |
| 2025-04-17 | HSS-IAD: A Heterogeneous Same-Sort Industrial Anomaly Detection Dataset | Qishan Wang et.al. | 2504.12689 | link |
| 2025-04-16 | Enhancing Sensitivity for Di-Higgs Boson Searches Using Anomaly Detection and Supervised Machine Learning Techniques | Sergei V. Chekanov et.al. | 2504.12418 | null |
| 2025-04-16 | AnomalyGen: An Automated Semantic Log Sequence Generation Framework with LLM for Anomaly Detection | Xinyu Li et.al. | 2504.12250 | null |
| 2025-04-16 | LO2: Microservice API Anomaly Dataset of Logs and Metrics | Alexander Bakhtin et.al. | 2504.12067 | null |
| 2025-04-16 | AnomalyR1: A GRPO-based End-to-end MLLM for Industrial Anomaly Detection | Yuhao Chao et.al. | 2504.11914 | null |
| 2025-04-16 | Search is All You Need for Few-shot Anomaly Detection | Qishan Wang et.al. | 2504.11895 | link |
| 2025-04-16 | Federated Spectral Graph Transformers Meet Neural Ordinary Differential Equations for Non-IID Graphs | Kishan Gurumurthy et.al. | 2504.11808 | link |
| 2025-04-16 | ACMamba: Fast Unsupervised Anomaly Detection via An Asymmetrical Consensus State Space Model | Guanchun Wang et.al. | 2504.11781 | null |
| 2025-04-15 | Possibility for Proactive Anomaly Detection | Jinsung Jeon et.al. | 2504.11623 | null |
| 2025-04-15 | Strengthening Anomaly Awareness | Adam Banda et.al. | 2504.11520 | null |
| 2025-04-15 | HeatSense: Intelligent Thermal Anomaly Detection for Securing NoC-Enabled MPSoCs | Mahdi Hasanzadeh et.al. | 2504.11421 | null |
| 2025-04-16 | A Real-time Anomaly Detection Method for Robots based on a Flexible and Sparse Latent Space | Taewook Kang et.al. | 2504.11170 | null |
| 2025-04-15 | Crane: Context-Guided Prompt Learning and Attention Refinement for Zero-Shot Anomaly Detections | Alireza Salehi et.al. | 2504.11055 | link |
| 2025-04-15 | AFiRe: Anatomy-Driven Self-Supervised Learning for Fine-Grained Representation in Radiographic Images | Yihang Liu et.al. | 2504.10972 | null |
| 2025-04-14 | Optimising Intrusion Detection Systems in Cloud-Edge Continuum with Knowledge Distillation for Privacy-Preserving and Efficient Communication | Soad Almabdy et.al. | 2504.10698 | null |
| 2025-04-14 | SlowFastVAD: Video Anomaly Detection via Integrating Simple Detector and RAG-Enhanced Vision-Language Model | Zongcan Ding et.al. | 2504.10320 | null |
| 2025-04-14 | ROSFD: Robust Online Streaming Fraud Detection with Resilience to Concept Drift in Data Streams | Vivek Yelleti et.al. | 2504.10229 | null |
| 2025-04-14 | Investigating the Role of Bilateral Symmetry for Inpainting Brain MRI | Sergey Kuznetsov et.al. | 2504.10039 | null |
| 2025-04-13 | Evaluating Machine Learning-Driven Intrusion Detection Systems in IoT: Performance and Energy Consumption | Saeid Jamshidi et.al. | 2504.09634 | null |
| 2025-04-13 | MADLLM: Multivariate Anomaly Detection via Pre-trained LLMs | Wei Tao et.al. | 2504.09504 | null |
| 2025-04-12 | Dupin: A Parallel Framework for Densest Subgraph Discovery in Fraud Detection on Massive Graphs (Technical Report) | Jiaxin Jiang et.al. | 2504.09311 | null |
| 2025-04-12 | Secure Physical Layer Communications for Low-Altitude Economy Networking: A Survey | Lingyi Cai et.al. | 2504.09153 | null |
| 2025-04-12 | CAShift: Benchmarking Log-Based Cloud Attack Detection under Normality Shift | Jiongchi Yu et.al. | 2504.09115 | null |
| 2025-04-12 | Leveraging Large Self-Supervised Time-Series Models for Transferable Diagnosis in Cross-Aircraft Type Bleed Air System | Yilin Wang et.al. | 2504.09090 | null |
| 2025-04-11 | Toward Realistic Adversarial Attacks in IDS: A Novel Feasibility Metric for Transferability | Sabrine Ennaji et.al. | 2504.08480 | null |
| 2025-04-11 | DaemonSec: Examining the Role of Machine Learning for Daemon Security in Linux Environments | Sheikh Muhammad Farjad et.al. | 2504.08227 | null |
| 2025-04-11 | Detecting Credit Card Fraud via Heterogeneous Graph Neural Networks with Graph Attention | Qiuwu Sha et.al. | 2504.08183 | null |
| 2025-04-10 | Benchmarking Suite for Synthetic Aperture Radar Imagery Anomaly Detection (SARIAD) Algorithms | Lucian Chauvina et.al. | 2504.08115 | null |
| 2025-04-10 | Dataset of artefacts for machine learning applications in astronomy | Sreevarsha Sreejith et.al. | 2504.08053 | null |
| 2025-04-10 | Patch distribution modeling framework adaptive cosine estimator (PaDiM-ACE) for anomaly detection and localization in synthetic aperture radar imagery | Angelina Ibarra et.al. | 2504.08049 | null |
| 2025-04-10 | Deep Learning-based Intrusion Detection Systems: A Survey | Zhiwei Xu et.al. | 2504.07839 | null |
| 2025-04-10 | PR-Attack: Coordinated Prompt-RAG Attacks on Retrieval-Augmented Generation in Large Language Models via Bilevel Optimization | Yang Jiao et.al. | 2504.07717 | null |
| 2025-04-10 | Adversarial Subspace Generation for Outlier Detection in High-Dimensional Data | Jose Cribeiro-Ramallo et.al. | 2504.07522 | link |
| 2025-04-10 | Intelligent DoS and DDoS Detection: A Hybrid GRU-NTM Approach to Network Security | Caroline Panggabean et.al. | 2504.07478 | null |
| 2025-04-09 | Leveraging Machine Learning Techniques in Intrusion Detection Systems for Internet of Things | Saeid Jamshidi et.al. | 2504.07220 | null |
| 2025-04-09 | Weak Signals and Heavy Tails: Machine-learning meets Extreme Value Theory | Stephan Clémençon et.al. | 2504.06984 | null |
| 2025-04-09 | MultiADS: Defect-aware Supervision for Multi-type Anomaly Detection and Segmentation in Zero-Shot Learning | Ylli Sadikaj et.al. | 2504.06740 | null |
| 2025-04-10 | AMAD: AutoMasked Attention for Unsupervised Multivariate Time Series Anomaly Detection | Tiange Huang et.al. | 2504.06643 | null |
| 2025-04-08 | TRIDENT: Tri-modal Real-time Intrusion Detection Engine for New Targets | Ildi Alla et.al. | 2504.06417 | link |
| 2025-04-08 | A Case for Network-wide Orchestration of Host-based Intrusion Detection and Response | Mark Timmons et.al. | 2504.06241 | null |
| 2025-04-08 | A Self-Supervised Framework for Space Object Behaviour Characterisation | Ian Groves et.al. | 2504.06176 | null |
| 2025-04-08 | Sherlock: A Dataset for Process-aware Intrusion Detection Research on Power Grid Networks | Eric Wagner et.al. | 2504.06102 | null |
| 2025-04-08 | MCAT: Visual Query-Based Localization of Standard Anatomical Clips in Fetal Ultrasound Videos Using Multi-Tier Class-Aware Token Transformer | Divyanshu Mishra et.al. | 2504.06088 | null |
| 2025-04-08 | Enhanced Anomaly Detection for Capsule Endoscopy Using Ensemble Learning Strategies | Julia Werner et.al. | 2504.06039 | null |
| 2025-04-08 | Autoencoder-Based Detection of Anomalous Stokes V Spectra in the Flare-Producing Active Region 13663 Using Hinode/SP Observations | Jargalmaa Batmunkh et.al. | 2504.05962 | null |
| 2025-04-08 | Addressing Class Imbalance with Probabilistic Graphical Models and Variational Inference | Yujia Lou et.al. | 2504.05758 | null |
| 2025-04-08 | Reconstruction-Free Anomaly Detection with Diffusion Models via Direct Latent Likelihood Evaluation | Shunsuke Sakai et.al. | 2504.05662 | null |
| 2025-04-07 | Towards Efficient Real-Time Video Motion Transfer via Generative Time Series Modeling | Tasmiah Haque et.al. | 2504.05537 | null |
| 2025-04-07 | On multipolar magnetic anomaly detection: multipolar signal subspaces, an analytical orthonormal basis, multipolar truncature and detection performance | Clément Chenevas-Paule et.al. | 2504.05212 | null |
| 2025-04-07 | IterMask3D: Unsupervised Anomaly Detection and Segmentation with Test-Time Iterative Mask Refinement in 3D Brain MR | Ziyun Liang et.al. | 2504.04911 | null |
| 2025-04-07 | SoK: LLM-based Log Parsing | Viktor Beck et.al. | 2504.04877 | null |
| 2025-04-06 | AVadCLIP: Audio-Visual Collaboration for Robust Video Anomaly Detection | Peng Wu et.al. | 2504.04495 | null |
| 2025-04-06 | iADCPS: Time Series Anomaly Detection for Evolving Cyber-physical Systems via Incremental Meta-learning | Jiyu Tian et.al. | 2504.04374 | null |
| 2025-04-06 | WeiDetect: Weibull Distribution-Based Defense against Poisoning Attacks in Federated Learning for Network Intrusion Detection Systems | Sameera K. M. et.al. | 2504.04367 | null |
| 2025-04-06 | AnomalyHybrid: A Domain-agnostic Generative Framework for General Anomaly Detection | Ying Zhao et.al. | 2504.04340 | null |
| 2025-04-05 | AttackLLM: LLM-based Attack Pattern Generation for an Industrial Control System | Chuadhry Mujeeb Ahmed et.al. | 2504.04187 | null |
| 2025-04-05 | Overcoming the Identity Mapping Problem in Self-Supervised Hyperspectral Anomaly Detection | Yongchuan Cui et.al. | 2504.04115 | null |
| 2025-04-05 | Foundation Models for Time Series: A Survey | Siva Rama Krishna Kottapalli et.al. | 2504.04011 | null |
| 2025-04-04 | Pyramid-based Mamba Multi-class Unsupervised Anomaly Detection | Nasar Iqbal et.al. | 2504.03442 | null |
| 2025-04-04 | Multi-Flow: Multi-View-Enriched Normalizing Flows for Industrial Anomaly Detection | Mathis Kruse et.al. | 2504.03306 | null |
| 2025-04-04 | Search for anomalous quartic gauge couplings in the process $\mu^+\mu^-\to \bar{\nu}\nu\gamma\gamma$ with a nested local outlier factor | Ke-Xin Chen et.al. | 2504.03145 | null |
| 2025-04-03 | Anomaly Detection in Time Series Data Using Reinforcement Learning, Variational Autoencoder, and Active Learning | Bahareh Golchin et.al. | 2504.02999 | null |
| 2025-04-03 | Improving log-based anomaly detection through learned adaptive filter | Yiyuan Xiong et.al. | 2504.02994 | null |
| 2025-04-03 | TailedCore: Few-Shot Sampling for Unsupervised Long-Tail Noisy Anomaly Detection | Yoon Gyo Jung et.al. | 2504.02775 | null |
| 2025-04-03 | Analytical Discovery of Manifold with Machine Learning | Yafei Shen et.al. | 2504.02511 | null |
| 2025-04-03 | ZClip: Adaptive Spike Mitigation for LLM Pre-Training | Abhay Kumar et.al. | 2504.02507 | link |
| 2025-04-03 | VISTA: Unsupervised 2D Temporal Dependency Representations for Time Series Anomaly Detection | Sinchee Chin et.al. | 2504.02498 | null |
| 2025-04-03 | Robust Randomized Low-Rank Approximation with Row-Wise Outlier Detection | Aidan Tiruvan et.al. | 2504.02432 | null |
| 2025-04-03 | Distributed Log-driven Anomaly Detection System based on Evolving Decision Making | Zhuoran Tan et.al. | 2504.02322 | null |
| 2025-04-03 | Enhancing Customer Contact Efficiency with Graph Neural Networks in Credit Card Fraud Detection Workflow | Menghao Huo et.al. | 2504.02275 | null |
| 2025-04-03 | CRC-SGAD: Conformal Risk Control for Supervised Graph Anomaly Detection | Songran Bai et.al. | 2504.02248 | null |
| 2025-04-02 | LogLSHD: Fast Log Parsing with Locality-Sensitive Hashing and Dynamic Time Warping | Shu-Wei Huang et.al. | 2504.02172 | null |
| 2025-04-03 | Accelerating IoV Intrusion Detection: Benchmarking GPU-Accelerated vs CPU-Based ML Libraries | Furkan Çolhak et.al. | 2504.01905 | null |
| 2025-04-02 | CO-DEFEND: Continuous Decentralized Federated Learning for Secure DoH-Based Threat Detection | Diego Cajaraville-Aboy et.al. | 2504.01882 | null |
| 2025-04-02 | What is AI, what is it not, how we use it in physics and how it impacts… you | Claire David et.al. | 2504.01827 | null |
| 2025-04-02 | Anomaly Detection for Hybrid Butterfly Subspecies via Probability Filtering | Bo-Kai Ruan et.al. | 2504.01671 | null |
| 2025-04-02 | The Multifractal IP Address Structure: Physical Explanation and Implications | Chris Misa et.al. | 2504.01374 | null |
| 2025-04-01 | Towards Resilient Federated Learning in CyberEdge Networks: Recent Advances and Future Trends | Kai Li et.al. | 2504.01240 | null |
| 2025-04-01 | Conformal Anomaly Detection for Functional Data with Elastic Distance Metrics | Jason Adams et.al. | 2504.01172 | null |
| 2025-04-01 | Efficient State Estimation of a Networked FlipIt Model | Brandon Collins et.al. | 2504.01096 | null |
| 2025-04-01 | Detection of Anomalous Vehicular Traffic and Sensor Failures Using Data Clustering Techniques | Davide Moretti et.al. | 2504.00881 | null |
| 2025-04-01 | FeatInsight: An Online ML Feature Management System on 4Paradigm Sage-Studio Platform | Xin Tong et.al. | 2504.00786 | null |
| 2025-04-01 | TeleAntiFraud-28k: An Audio-Text Slow-Thinking Dataset for Telecom Fraud Detection | Zhiming Ma et.al. | 2503.24115 | link |
| 2025-03-31 | Federated Structured Sparse PCA for Anomaly Detection in IoT Networks | Chenyi Huang et.al. | 2503.23981 | null |
| 2025-03-31 | Detecting Localized Density Anomalies in Multivariate Data via Coin-Flip Statistics | Sebastian Springer et.al. | 2503.23927 | link |
| 2025-03-31 | Evaluation of (Un-)Supervised Machine Learning Methods for GNSS Interference Classification with Real-World Data Discrepancies | Lucas Heublein et.al. | 2503.23775 | null |
| 2025-03-30 | Beyond Academic Benchmarks: Critical Analysis and Best Practices for Visual Industrial Anomaly Detection | Aimira Baitieva et.al. | 2503.23451 | null |
| 2025-03-29 | Unsupervised Anomaly Detection in Multivariate Time Series across Heterogeneous Domains | Vincent Jacob et.al. | 2503.23060 | link |
| 2025-03-28 | A Dataset for Semantic Segmentation in the Presence of Unknowns | Zakaria Laskar et.al. | 2503.22309 | null |
| 2025-03-28 | Federated Intrusion Detection System Based on Unsupervised Machine Learning | Maxime Gourceyraud et.al. | 2503.22065 | null |
| 2025-03-27 | AssistPDA: An Online Video Surveillance Assistant for Video Anomaly Prediction, Detection, and Analysis | Zhiwei Yang et.al. | 2503.21904 | null |
| 2025-03-27 | VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness | Dian Zheng et.al. | 2503.21755 | link |
| 2025-03-27 | The MVTec AD 2 Dataset: Advanced Scenarios for Unsupervised Anomaly Detection | Lars Heckler-Kram et.al. | 2503.21622 | null |
| 2025-03-27 | Advancing CAN Network Security through RBM-Based Synthetic Attack Data Generation for Intrusion Detection Systems | Huacheng Li et.al. | 2503.21496 | link |
| 2025-03-27 | Unveiling Latent Information in Transaction Hashes: Hypergraph Learning for Ethereum Ponzi Scheme Detection | Junhao Wu et.al. | 2503.21463 | null |
| 2025-03-27 | VADMamba: Exploring State Space Models for Fast Video Anomaly Detection | Jiahao Lyu et.al. | 2503.21169 | link |
| 2025-03-27 | A Data Balancing and Ensemble Learning Approach for Credit Card Fraud Detection | Yuhan Wang et.al. | 2503.21160 | null |
| 2025-03-28 | Omni-AD: Learning to Reconstruct Global and Local Features for Multi-class Anomaly Detection | Jiajie Quan et.al. | 2503.21125 | null |
| 2025-03-26 | Channel impulse response peak clustering using neural networks | Petr Horky et.al. | 2503.20838 | null |
| 2025-03-26 | Stabilizing Neural Likelihood Ratio Estimation | Fernando Torales Acosta et.al. | 2503.20753 | null |
| 2025-03-26 | $β$-GNN: A Robust Ensemble Approach Against Graph Structure Perturbation | Haci Ismail Aslan et.al. | 2503.20630 | null |
| 2025-03-26 | Are We There Yet? Unraveling the State-of-the-Art Graph Network Intrusion Detection Systems | Chenglong Wang et.al. | 2503.20281 | null |
| 2025-03-26 | LogicQA: Logical Anomaly Detection with Vision Language Model Generated Questions | Yejin Kwon et.al. | 2503.20252 | null |
| 2025-03-25 | Video Anomaly Detection with Contours - A Study | Mia Siemon et.al. | 2503.19588 | null |
| 2025-03-25 | Post-Hoc Calibrated Anomaly Detection | Sean Gloumeau et.al. | 2503.19577 | null |
| 2025-03-25 | Bayesian Outlier Detection for Matrix-variate Models | Monica Billio et.al. | 2503.19515 | null |
| 2025-03-25 | Social Network User Profiling for Anomaly Detection Based on Graph Neural Networks | Yiwei Zhang et.al. | 2503.19380 | null |
| 2025-03-25 | Correcting Deviations from Normality: A Reformulated Diffusion Model for Multi-Class Unsupervised Anomaly Detection | Farzad Beizaee et.al. | 2503.19357 | link |
| 2025-03-25 | Efficient IoT Intrusion Detection with an Improved Attention-Based CNN-BiLSTM Architecture | Amna Naeem et.al. | 2503.19339 | null |
| 2025-03-24 | Risk-Based Thresholding for Reliable Anomaly Detection in Concentrated Solar Power Plants | Yorick Estievenart et.al. | 2503.19146 | null |
| 2025-03-24 | Anomaly Detection Using Computer Vision: A Comparative Analysis of Class Distinction and Performance Metrics | Md. Barkat Ullah Tusher et.al. | 2503.19100 | null |
| 2025-03-24 | Unsupervised Detection of Fraudulent Transactions in E-commerce Using Contrastive Learning | Xuan Li et.al. | 2503.18841 | null |
| 2025-03-24 | CRCL: Causal Representation Consistency Learning for Anomaly Detection in Surveillance Videos | Yang Liu et.al. | 2503.18808 | null |
| 2025-03-24 | Oaken: Fast and Efficient LLM Serving with Online-Offline Hybrid KV Cache Quantization | Minsu Kim et.al. | 2503.18599 | null |
| 2025-03-24 | RoCA: Robust Contrastive One-class Time Series Anomaly Detection with Contaminated Data | Xudong Mou et.al. | 2503.18385 | null |
| 2025-03-24 | PS-EIP: Robust Photometric Stereo Based on Event Interval Profile | Kazuma Kitazawa et.al. | 2503.18341 | null |
| 2025-03-24 | Towards Training-free Anomaly Detection with Vision and Language Foundation Models | Jinjin Zhang et.al. | 2503.18325 | link |
| 2025-03-24 | Knowledge Transfer from LLMs to Provenance Analysis: A Semantic-Augmented Method for APT Detection | Fei Zuo et.al. | 2503.18316 | null |
| 2025-03-23 | Enhance GNNs with Reliable Confidence Estimation via Adversarial Calibration Learning | Yilong Wang et.al. | 2503.18235 | null |
| 2025-03-23 | Anomize: Better Open Vocabulary Video Anomaly Detection | Fei Li et.al. | 2503.18094 | null |
| 2025-03-23 | Anomaly Detection and Localization for Speech Deepfakes via Feature Pyramid Matching | Emma Coletta et.al. | 2503.18032 | null |
| 2025-03-21 | ATHENA: An In-vehicle CAN Intrusion Detection Framework Based on Physical Characteristics of Vehicle Systems | Kai Wang et.al. | 2503.17067 | null |
| 2025-03-21 | TRACE: Time SeRies PArameter EffiCient FinE-tuning | Yuze Li et.al. | 2503.16991 | null |
| 2025-03-20 | Catalog-based detection of unrecognized blends in deep optical ground based catalogs | Shuang Liang et.al. | 2503.16680 | null |
| 2025-03-20 | Benchmarking Visual Language Models on Standardized Visualization Literacy Tests | Saugat Pandey et.al. | 2503.16632 | null |
| 2025-03-20 | A Dataset of Performance Measurements and Alerts from Mozilla (Data Artifact) | Mohamed Bilel Besbes et.al. | 2503.16332 | null |
| 2025-03-20 | Multivariate Time Series Anomaly Detection in Industry 5.0 | Lorenzo Colombi et.al. | 2503.15946 | null |
| 2025-03-20 | A multi-model approach using XAI and anomaly detection to predict asteroid hazards | Amit Kumar Mondal et.al. | 2503.15901 | null |
| 2025-03-20 | WeirdFlows: Anomaly Detection in Financial Transaction Flows | Arthur Capozzi et.al. | 2503.15896 | null |
| 2025-03-19 | Reducing Communication Overhead in Federated Learning for Network Anomaly Detection with Adaptive Client Selection | William Marfo et.al. | 2503.15448 | null |
| 2025-03-19 | Automated Processing of eXplainable Artificial Intelligence Outputs in Deep Learning Models for Fault Diagnostics of Large Infrastructures | Giovanni Floreale et.al. | 2503.15415 | null |
| 2025-03-19 | Euclid Quick Data Release (Q1) Exploring galaxy properties with a multi-modal foundation model | Euclid Collaboration et.al. | 2503.15312 | null |
| 2025-03-19 | Robust Distribution Alignment for Industrial Anomaly Detection under Distribution Shift | Jingyi Liao et.al. | 2503.14910 | null |
| 2025-03-19 | FetalFlex: Anatomy-Guided Diffusion Model for Flexible Control on Fetal Ultrasound Image Synthesis | Yaofei Duan et.al. | 2503.14906 | null |
| 2025-03-19 | LogLLaMA: Transformer-based log anomaly detection with LLaMA | Zhuoyi Yang et.al. | 2503.14849 | null |
| 2025-03-19 | Pruning-Based TinyML Optimization of Machine Learning Models for Anomaly Detection in Electric Vehicle Charging Infrastructure | Fatemeh Dehrouyeh et.al. | 2503.14799 | null |
| 2025-03-18 | Entente: Cross-silo Intrusion Detection on Network Log Graphs with Federated Learning | Jiacen Xu et.al. | 2503.14284 | null |
| 2025-03-18 | EIAD: Explainable Industrial Anomaly Detection Via Multi-Modal Large Language Models | Zongyun Zhang et.al. | 2503.14162 | null |
| 2025-03-18 | Enhancing Kubernetes Resilience through Anomaly Detection and Prediction | V. Anemogiannis et.al. | 2503.14114 | null |
| 2025-03-18 | Scale-Aware Contrastive Reverse Distillation for Unsupervised Medical Anomaly Detection | Chunlei Li et.al. | 2503.13828 | null |
| 2025-03-17 | U2AD: Uncertainty-based Unsupervised Anomaly Detection Framework for Detecting T2 Hyperintensity in MRI Spinal Cord | Qi Zhang et.al. | 2503.13400 | null |
| 2025-03-17 | Highly Efficient Direct Analytics on Semantic-aware Time Series Data Compression | Guoyou Sun et.al. | 2503.13246 | null |
| 2025-03-17 | Deep Learning Advancements in Anomaly Detection: A Comprehensive Survey | Haoqi Huang et.al. | 2503.13195 | null |
| 2025-03-17 | Triad: Empowering LMM-based Anomaly Detection with Vision Expert-guided Visual Tokenizer and Manufacturing Process | Yuanze Li et.al. | 2503.13184 | link |
| 2025-03-17 | Language-guided Open-world Video Anomaly Detection | Zihao Liu et.al. | 2503.13160 | link |
| 2025-03-17 | MFP-CLIP: Exploring the Efficacy of Multi-Form Prompts for Zero-Shot Industrial Anomaly Detection | Jingyi Yuan et.al. | 2503.12910 | null |
| 2025-03-17 | UCF-Crime-DVS: A Novel Event-Based Dataset for Video Anomaly Detection with Spiking Neural Networks | Yuanbin Qian et.al. | 2503.12905 | null |
| 2025-03-16 | GraphEval: A Lightweight Graph-Based LLM Framework for Idea Evaluation | Tao Feng et.al. | 2503.12600 | null |
| 2025-03-16 | Time-EAPCR-T: A Universal Deep Learning Approach for Anomaly Detection in Industrial Equipment | Huajie Liang et.al. | 2503.12534 | null |
| 2025-03-16 | KDSelector: A Knowledge-Enhanced and Data-Efficient Model Selector Learning Framework for Time Series Anomaly Detection | Zhiyu Liang et.al. | 2503.12478 | null |
| 2025-03-14 | Contextual Similarity Distillation: Ensemble Uncertainties with a Single Model | Moritz A. Zanger et.al. | 2503.11339 | null |
| 2025-03-14 | Financial Fraud Detection with Entropy Computing | Babak Emami et.al. | 2503.11273 | null |
| 2025-03-14 | Federated Koopman-Reservoir Learning for Large-Scale Multivariate Time-Series Anomaly Detection | Long Tan Le et.al. | 2503.11255 | null |
| 2025-03-14 | A Novel Decomposed Feature-Oriented Framework for Open-Set Semantic Segmentation on LiDAR Data | Wenbang Deng et.al. | 2503.11097 | null |
| 2025-03-14 | Multi-View Industrial Anomaly Detection with Epipolar Constrained Cross-View Fusion | Yifan Liu et.al. | 2503.11088 | null |
| 2025-03-13 | OuroMamba: A Data-Free Quantization Framework for Vision Mamba Models | Akshat Ramachandran et.al. | 2503.10959 | null |
| 2025-03-13 | Hoi2Anomaly: An Explainable Anomaly Detection Approach Guided by Human-Object Interaction | Yuhan Wang et.al. | 2503.10508 | null |
| 2025-03-13 | An Open-RAN Testbed for Detecting and Mitigating Radio-Access Anomalies | Hanna Bogucka et.al. | 2503.10255 | null |
| 2025-03-13 | G$^{2}$SF-MIAD: Geometry-Guided Score Fusion for Multimodal Industrial Anomaly Detection | Chengyu Tao et.al. | 2503.10091 | null |
| 2025-03-13 | Bayesian Prompt Flow Learning for Zero-Shot Anomaly Detection | Zhen Qu et.al. | 2503.10080 | link |
| 2025-03-13 | Deep Learning Approaches for Anti-Money Laundering on Mobile Transactions: Review, Framework, and Directions | Jiani Fan et.al. | 2503.10058 | null |
| 2025-03-12 | Anomaly Detection to identify Transients in LSST Time Series Data | Miguel Crispim Romao et.al. | 2503.09699 | link |
| 2025-03-12 | Detecting and Preventing Data Poisoning Attacks on AI Models | Halima I. Kure et.al. | 2503.09302 | null |
| 2025-03-12 | Time-EAPCR: A Deep Learning-Based Novel Approach for Anomaly Detection Applied to the Environmental Field | Lei Liu et.al. | 2503.09200 | null |
| 2025-03-11 | Zero-to-One IDV: A Conceptual Model for AI-Powered Identity Verification | Aniket Vaidya et.al. | 2503.08734 | null |
| 2025-03-11 | A systematic literature review of unsupervised learning algorithms for anomalous traffic detection based on flows | Alberto Miguel-Diez et.al. | 2503.08293 | null |
| 2025-03-11 | Evidential Uncertainty Probes for Graph Neural Networks | Linlin Yu et.al. | 2503.08097 | null |
| 2025-03-11 | Adapting Large Language Models for Parameter-Efficient Log Anomaly Detection | Ying Fu Lim et.al. | 2503.08045 | null |
| 2025-03-11 | STEAD: Spatio-Temporal Efficient Anomaly Detection for Time and Compute Sensitive Applications | Andrew Gao et.al. | 2503.07942 | null |
| 2025-03-10 | Self-supervised Normality Learning and Divergence Vector-guided Model Merging for Zero-shot Congenital Heart Disease Detection in Fetal Ultrasound Videos | Pramit Saha et.al. | 2503.07799 | null |
| 2025-03-10 | A Time Series Multitask Framework Integrating a Large Language Model, Pre-Trained Time Series Model, and Knowledge Graph | Shule Hao et.al. | 2503.07682 | null |
| 2025-03-10 | Open-Set Gait Recognition from Sparse mmWave Radar Point Clouds | Riccardo Mazzieri et.al. | 2503.07435 | null |
| 2025-03-10 | ECNN: A Low-complex, Adjustable CNN for Industrial Pump Monitoring Using Vibration Data | Jonas Ney et.al. | 2503.07401 | null |
| 2025-03-10 | Probabilistic Segmentation for Robust Field of View Estimation | R. Spencer Hallyburton et.al. | 2503.07375 | null |
| 2025-03-11 | AnomalyPainter: Vision-Language-Diffusion Synergy for Zero-Shot Realistic and Diverse Industrial Anomaly Synthesis | Zhangyu Lai et.al. | 2503.07253 | null |
| 2025-03-10 | Learning Decision Trees as Amortized Structure Inference | Mohammed Mahfoud et.al. | 2503.06985 | link |
| 2025-03-09 | Task-Oriented Connectivity for Networked Robotics with Generative AI and Semantic Communications | Peizheng Li et.al. | 2503.06771 | null |
| 2025-03-09 | AA-CLIP: Enhancing Zero-shot Anomaly Detection via Anomaly-Aware CLIP | Wenxin Ma et.al. | 2503.06661 | null |
| 2025-03-09 | FaaSMT: Lightweight Serverless Framework for Intrusion Detection Using Merkle Tree and Task Inlining | Chuang Li et.al. | 2503.06532 | null |
| 2025-03-09 | StructVizor: Interactive Profiling of Semi-Structured Textual Data | Yanwei Huang et.al. | 2503.06500 | null |
| 2025-03-09 | Swift Hydra: Self-Reinforcing Generative Framework for Anomaly Detection with Multiple Mamba Models | Nguyen Do et.al. | 2503.06413 | null |
| 2025-03-07 | Removing Geometric Bias in One-Class Anomaly Detection with Adaptive Feature Perturbation | Romain Hermary et.al. | 2503.05520 | null |
| 2025-03-07 | Robust Intrusion Detection System with Explainable Artificial Intelligence | Betül Güvenç Paltun et.al. | 2503.05303 | null |
| 2025-03-07 | Spectral-Spatial Extraction through Layered Tensor Decomposition for Hyperspectral Anomaly Detection | Quan Yu et.al. | 2503.05183 | null |
| 2025-03-06 | ISP-AD: A Large-Scale Real-World Dataset for Advancing Industrial Anomaly Detection with Synthetic and Real Defects | Paul J. Krassnig et.al. | 2503.04997 | null |
| 2025-03-06 | AnyAnomaly: Zero-Shot Customizable Video Anomaly Detection with LVLM | Sunghyun Ahn et.al. | 2503.04504 | link |
| 2025-03-06 | Temporal Analysis of NetFlow Datasets for Network Intrusion Detection Systems | Majed Luay et.al. | 2503.04404 | null |
| 2025-03-06 | TRANSIT your events into a new mass: Fast background interpolation for weakly-supervised anomaly searches | Ivan Oleksiyuk et.al. | 2503.04342 | null |
| 2025-03-06 | Unsupervised anomaly detection on cybersecurity data streams: a case with BETH dataset | Evgeniy Eremin et.al. | 2503.04178 | null |
| 2025-03-06 | UniNet: A Unified Multi-granular Traffic Modeling Framework for Network Security | Binghui Wu et.al. | 2503.04174 | null |
| 2025-03-05 | DeepGrav: Anomalous Gravitational-Wave Detection Through Deep Latent Features | Jianqi Yan et.al. | 2503.03799 | null |
| 2025-03-05 | PacketCLIP: Multi-Modal Embedding of Network Traffic and Language for Cybersecurity Reasoning | Ryozo Masukawa et.al. | 2503.03747 | null |
| 2025-03-06 | Towards Visual Discrimination and Reasoning of Real-World Physical Dynamics: Physics-Grounded Anomaly Detection | Wenqiao Li et.al. | 2503.03562 | null |
| 2025-03-05 | AI-Driven Multi-Stage Computer Vision System for Defect Detection in Laser-Engraved Industrial Nameplates | Adhish Anitha Vilasan et.al. | 2503.03395 | null |
| 2025-03-05 | Enhancing Cybersecurity in Critical Infrastructure with LLM-Assisted Explainable IoT Systems | Ashutosh Ghimire et.al. | 2503.03180 | null |
| 2025-03-05 | SoK: Knowledge is All You Need: Last Mile Delivery for Automated Provenance-based Intrusion Detection with LLMs | Wenrui Cheng et.al. | 2503.03108 | null |
| 2025-03-04 | Intrusion Detection in IoT Networks Using Hyperdimensional Computing: A Case Study on the NSL-KDD Dataset | Ghazal Ghajari et.al. | 2503.03037 | null |
| 2025-03-04 | Network Anomaly Detection for IoT Using Hyperdimensional Computing on NSL-KDD | Ghazal Ghajari et.al. | 2503.03031 | null |
| 2025-03-04 | Generative Active Adaptation for Drifting and Imbalanced Network Intrusion Detection | Ragini Gupta et.al. | 2503.03022 | null |
| 2025-03-04 | RAAD-LLM: Adaptive Anomaly Detection Using LLMs and RAG Integration | Alicia Russell-Gilbert et.al. | 2503.02800 | null |
| 2025-03-04 | Memory Efficient Continual Learning for Edge-Based Visual Anomaly Detection | Manuel Barusco et.al. | 2503.02691 | null |
| 2025-03-04 | World Models for Anomaly Detection during Model-Based Reinforcement Learning Inference | Fabian Domberg et.al. | 2503.02552 | null |
| 2025-03-04 | A compact unshielded optically-pumped magnetic gradiometer | Hangfei Ye et.al. | 2503.02507 | null |
| 2025-03-04 | Robust Multi-Source Domain Adaptation under Label Shift | Congbin Xu et.al. | 2503.02506 | null |
| 2025-03-04 | Monge-Kantorovich quantiles and ranks for image data | Gauthier Thurin et.al. | 2503.02427 | null |
| 2025-03-04 | Exploring Intrinsic Normal Prototypes within a Single Image for Universal Anomaly Detection | Wei Luo et.al. | 2503.02424 | link |
| 2025-03-04 | Anomaly detection in non-stationary videos using time-recursive differencing network based prediction | Gargi V. Pillai et.al. | 2503.02234 | null |
| 2025-03-03 | Aerial Infrared Health Monitoring of Solar Photovoltaic Farms at Scale | Isaac Corley et.al. | 2503.02128 | null |
| 2025-03-03 | Building Machine Learning Challenges for Anomaly Detection in Science | Elizabeth G. Campolongo et.al. | 2503.02112 | null |
| 2025-02-28 | Enabling AutoML for Zero-Touch Network Security: Use-Case Driven Analysis | Li Yang et.al. | 2502.21286 | null |
| 2025-02-28 | TimesBERT: A BERT-Style Foundation Model for Time Series Understanding | Haoran Zhang et.al. | 2502.21245 | null |
| 2025-02-28 | Unmasking Stealthy Attacks on Nonlinear DAE Models of Power Grids | Abdallah Alalem Albustami et.al. | 2502.21146 | null |
| 2025-02-28 | Detection of anomalies in cow activity using wavelet transform based features | Valentin Guien et.al. | 2502.21051 | null |
| 2025-02-28 | When Unsupervised Domain Adaptation meets One-class Anomaly Detection: Addressing the Two-fold Unsupervised Curse by Leveraging Anomaly Scarcity | Nesryne Mejri et.al. | 2502.21022 | null |
| 2025-02-28 | FedDyMem: Efficient Federated Learning with Dynamic Memory and Memory-Reduce for Unsupervised Image Anomaly Detection | Silin Chen et.al. | 2502.21012 | null |
| 2025-02-28 | Distribution Prototype Diffusion Learning for Open-set Supervised Anomaly Detection | Fuyun Wang et.al. | 2502.20981 | null |
| 2025-02-28 | Towards Zero Touch Networks: Cross-Layer Automated Security Solutions for 6G Wireless Networks | Li Yang et.al. | 2502.20627 | null |
| 2025-02-27 | Discovering Antagonists in Networks of Systems: Robot Deployment | Ingeborg Wenger et.al. | 2502.20125 | link |
| 2025-02-27 | One-for-More: Continual Diffusion Model for Anomaly Detection | Xiaofan Li et.al. | 2502.19848 | link |
| 2025-02-26 | Retrieval Augmented Anomaly Detection (RAAD): Nimble Model Adjustment Without Retraining | Sam Pastoriza et.al. | 2502.19534 | null |
| 2025-02-26 | Anomaly Detection in Complex Dynamical Systems: A Systematic Framework Using Embedding Theory and Physics-Inspired Consistency | Michael Somma et.al. | 2502.19307 | null |
| 2025-02-26 | Corporate Fraud Detection in Rich-yet-Noisy Financial Graph | Shiqi Wang et.al. | 2502.19305 | null |
| 2025-02-26 | HDM: Hybrid Diffusion Model for Unified Image Anomaly Detection | Zekang Weng et.al. | 2502.19200 | null |
| 2025-02-26 | Towards Privacy-Preserving Anomaly-Based Intrusion Detection in Energy Communities | Zeeshan Afzal et.al. | 2502.19154 | null |
| 2025-02-26 | Random Similarity Isolation Forests | Sebastian Chwilczyński et.al. | 2502.19122 | null |
| 2025-02-26 | Sherlock: Towards Multi-scene Video Abnormal Event Extraction and Localization via a Global-local Spatial-sensitive LLM | Junxiao Ma et.al. | 2502.18863 | null |
| 2025-02-25 | Identification and Characterization for Disruptions in the U.S. National Airspace System (NAS) | Jing Xu et.al. | 2502.18687 | null |
| 2025-02-25 | Tighten The Lasso: A Convex Hull Volume-based Anomaly Detection Method | Uri Itai et.al. | 2502.18601 | null |
| 2025-02-25 | Structural Alignment Improves Graph Test-Time Adaptation | Hans Hao-Hsun Hsu et.al. | 2502.18334 | null |
| 2025-02-25 | From Vision to Sound: Advancing Audio Anomaly Detection with Vision-Based Algorithms | Manuel Barusco et.al. | 2502.18328 | link |
| 2025-02-25 | Sequential Outlier Detection in Non-Stationary Time Series | Florian Heinrichs et.al. | 2502.18038 | null |
| 2025-02-25 | Radon-Nikodým Derivative: Re-imagining Anomaly Detection from a Measure Theoretic Perspective | Shlok Mehendale et.al. | 2502.18002 | null |
| 2025-02-25 | Improved YOLOv7x-Based Defect Detection Algorithm for Power Equipment | Jin Hou et.al. | 2502.17961 | null |
| 2025-02-25 | Can Multimodal LLMs Perform Time Series Anomaly Detection? | Xiongxiao Xu et.al. | 2502.17812 | link |
| 2025-02-25 | A digital eye-fixation biomarker using a deep anomaly scheme to classify Parkisonian patterns | Juan Niño et.al. | 2502.17762 | null |
| 2025-02-24 | 1 Particle - 1 Qubit: Particle Physics Data Encoding for Quantum Machine Learning | Aritra Bal et.al. | 2502.17301 | null |
| 2025-02-24 | Using Machine Learning to Detect Fraudulent SMSs in Chichewa | Amelia Taylor et.al. | 2502.16947 | null |
| 2025-02-24 | MAD-AD: Masked Diffusion for Unsupervised Brain Anomaly Detection | Farzad Beizaee et.al. | 2502.16943 | link |
| 2025-02-23 | Enhancing sensor attack detection in supervisory control systems modeled by probabilistic automata | Parastou Fahim et.al. | 2502.16753 | null |
| 2025-02-23 | TrustChain: A Blockchain Framework for Auditing and Verifying Aggregators in Decentralized Federated Learning | Ehsan Hallaji et.al. | 2502.16406 | null |
| 2025-02-23 | An Expert Ensemble for Detecting Anomalous Scenes, Interactions, and Behaviors in Autonomous Driving | Tianchen Ji et.al. | 2502.16389 | null |
| 2025-02-22 | Machine Learning-Based Cloud Computing Compliance Process Automation | Yuqing Wang et.al. | 2502.16344 | null |
| 2025-02-22 | DiffFake: Exposing Deepfakes using Differential Anomaly Detection | Sotirios Stamnas et.al. | 2502.16247 | null |
| 2025-02-21 | Anomaly preserving contrastive neural embeddings for end-to-end model-independent searches at the LHC | Kyle Metzger et.al. | 2502.15926 | null |
| 2025-02-21 | ML-Driven Approaches to Combat Medicare Fraud: Advances in Class Imbalance Solutions, Feature Engineering, Adaptive Learning, and Business Impact | Dorsa Farahmandazad et.al. | 2502.15898 | null |
| 2025-02-21 | A Defensive Framework Against Adversarial Attacks on Machine Learning-Based Network Intrusion Detection Systems | Benyamin Tafreshian et.al. | 2502.15561 | null |
| 2025-02-21 | Pub-Guard-LLM: Detecting Fraudulent Biomedical Articles with Reliable Explanations | Lihu Chen et.al. | 2502.15429 | link |
| 2025-02-20 | CyberSentinel: An Emergent Threat Detection System for AI Security | Krti Tallam et.al. | 2502.14966 | null |
| 2025-02-20 | Outlier Detection in Mendelian Randomisation | Maximilian M Mandl et.al. | 2502.14716 | link |
| 2025-02-19 | A Method to Simultaneously Facilitate All Jet Physics Tasks | Vinicius Mikuni et.al. | 2502.14652 | null |
| 2025-02-20 | dtaianomaly: A Python library for time series anomaly detection | Louis Carpentier et.al. | 2502.14381 | link |
| 2025-02-20 | Graph Anomaly Detection via Adaptive Test-time Representation Learning across Out-of-Distribution Domains | Delaram Pirhayati et.al. | 2502.14293 | null |
| 2025-02-20 | Adaptive Sparsified Graph Learning Framework for Vessel Behavior Anomalies | Jeehong Kim et.al. | 2502.14197 | null |
| 2025-02-19 | CND-IDS: Continual Novelty Detection for Intrusion Detection Systems | Sean Fuhrman et.al. | 2502.14094 | null |
| 2025-02-19 | Isolating Unisolated Upsilons with Anomaly Detection in CMS Open Data | Rikab Gambhir et.al. | 2502.14036 | link |
| 2025-02-19 | A Synergy Scoring Filter for Unsupervised Anomaly Detection with Noisy Data | Fengjie Wang et.al. | 2502.13992 | null |
| 2025-02-19 | Unlocking Multimodal Integration in EHRs: A Prompt Learning Framework for Language and Time Series Fusion | Shuai Niu et.al. | 2502.13509 | null |
| 2025-02-19 | Flow-based generative models as iterative algorithms in probability space | Yao Xie et.al. | 2502.13394 | null |
| 2025-02-18 | VUS: Effective and Efficient Accuracy Measures for Time-Series Anomaly Detection | Paul Boniol et.al. | 2502.13318 | link |
| 2025-02-18 | A Label-Free Heterophily-Guided Approach for Unsupervised Graph Fraud Detection | Junjun Pan et.al. | 2502.13308 | null |
| 2025-02-18 | Value Gradient Sampler: Sampling as Sequential Decision Making | Sangwoong Yoon et.al. | 2502.13280 | null |
| 2025-02-18 | A Survey of Anomaly Detection in Cyber-Physical Systems | Danial Abshari et.al. | 2502.13256 | null |
| 2025-02-18 | Statistically Significant $k$NNAD by Selective Inference | Mizuki Niihori et.al. | 2502.12978 | null |
| 2025-02-18 | Fraud-R1 : A Multi-Round Benchmark for Assessing the Robustness of LLM Against Augmented Fraud and Phishing Inducements | Shu Yang et.al. | 2502.12904 | null |
| 2025-02-18 | Toward Cybersecurity Testing and Monitoring of IoT Ecosystems | Steve Taylor et.al. | 2502.12837 | null |
| 2025-02-18 | Unsupervised Anomaly Detection through Mass Repulsing Optimal Transport | Eduardo Fernandes Montesuma et.al. | 2502.12793 | null |
| 2025-02-17 | Hybrid Machine Learning Models for Intrusion Detection in IoT: Leveraging a Real-World IoT Dataset | Md Ahnaf Akif et.al. | 2502.12382 | null |
| 2025-02-17 | Positional Encoding in Transformer-Based Time Series Models: A Survey | Habib Irani et.al. | 2502.12370 | link |
| 2025-02-17 | Unifying Explainable Anomaly Detection and Root Cause Analysis in Dynamical Systems | Yue Sun et.al. | 2502.12086 | null |
| 2025-02-17 | Enhanced Anomaly Detection in IoMT Networks using Ensemble AI Models on the CICIoMT2024 Dataset | Prathamesh Chandekar et.al. | 2502.11854 | null |
| 2025-02-17 | Component-aware Unsupervised Logical Anomaly Generation for Industrial Anomaly Detection | Xuan Tong et.al. | 2502.11712 | null |
| 2025-02-17 | Towards a Trustworthy Anomaly Detection for Critical Applications through Approximated Partial AUC Loss | Arnaud Bougaham et.al. | 2502.11570 | null |
| 2025-02-17 | DCAD-2000: A Multilingual Dataset across 2000+ Languages with Data Cleaning as Anomaly Detection | Yingli Shen et.al. | 2502.11546 | null |
| 2025-02-17 | WRT-SAM: Foundation Model-Driven Segmentation for Generalized Weld Radiographic Testing | Yunyi Zhou et.al. | 2502.11338 | null |
| 2025-02-16 | Exploiting Point-Language Models with Dual-Prompts for 3D Anomaly Detection | Jiaxiang Wang et.al. | 2502.11307 | null |
| 2025-02-16 | Evaluating the Potential of Quantum Machine Learning in Cybersecurity: A Case-Study on PCA-based Intrusion Detection Systems | Armando Bellante et.al. | 2502.11173 | null |
| 2025-02-16 | Machine Learning-Based Intrusion Detection and Prevention System for IIoT Smart Metering Networks: Challenges and Solutions | Sahar Lazim et.al. | 2502.11138 | null |
| 2025-02-15 | A Computational Model for Ransomware Detection Using Cross-Domain Entropy Signatures | Michael Mannon et.al. | 2502.10711 | null |
| 2025-02-14 | Dynamic Fraud Proof | Gabriele Picco et.al. | 2502.10321 | null |
| 2025-02-14 | Anomaly Detection with LWE Encrypted Control | Rijad Alisic et.al. | 2502.10283 | null |
| 2025-02-14 | Control-flow anomaly detection by process mining-based feature extraction and dimensionality reduction | Francesco Vitale et.al. | 2502.10211 | null |
| 2025-02-14 | Enhancing anomaly detection with topology-aware autoencoders | Vishal S. Ngairangbam et.al. | 2502.10163 | null |
| 2025-02-14 | Robust Anomaly Detection via Tensor Chidori Pseudoskeleton Decomposition | Bowen Su et.al. | 2502.09926 | null |
| 2025-02-13 | Weakly supervised anomaly detection for resonant new physics in the dijet final state using proton-proton collisions at $\sqrt{s}=13$ TeV with the ATLAS detector | ATLAS Collaboration et.al. | 2502.09770 | null |
| 2025-02-13 | APT-LLM: Embedding-Based Anomaly Detection of Cyber Advanced Persistent Threats Using Large Language Models | Sidahmed Benabderrahmane et.al. | 2502.09385 | null |
| 2025-02-13 | AnomalyGFM: Graph Foundation Model for Zero/Few-shot Anomaly Detection | Hezhe Qiao et.al. | 2502.09254 | null |
| 2025-02-13 | XAInomaly: Explainable and Interpretable Deep Contractive Autoencoder for O-RAN Traffic Anomaly Detection | Osman Tugay Basaran et.al. | 2502.09194 | null |
| 2025-02-13 | Unsupervised Anomaly Detection on Implicit Shape representations for Sarcopenia Detection | Louise Piecuch et.al. | 2502.09088 | null |
| 2025-02-13 | Privacy-Preserving Hybrid Ensemble Model for Network Anomaly Detection: Balancing Security and Data Protection | Shaobo Liu et.al. | 2502.09001 | null |
| 2025-02-12 | Hierarchical Entropy Disruption for Ransomware Detection: A Computationally-Driven Framework | Hayden Srynn et.al. | 2502.08843 | null |
| 2025-02-12 | Investigation of Advanced Persistent Threats Network-based Tactics, Techniques and Procedures | Almuthanna Alageel et.al. | 2502.08830 | null |
| 2025-02-12 | CurvGAD: Leveraging Curvature for Enhanced Graph Anomaly Detection | Karish Grover et.al. | 2502.08605 | null |
| 2025-02-12 | Mapping the Landscape of Generative AI in Network Monitoring and Management | Giampaolo Bovenzi et.al. | 2502.08576 | null |
| 2025-02-12 | GenIAS: Generator for Instantiating Anomalies in time Series | Zahra Zamanzadeh Darban et.al. | 2502.08262 | null |
| 2025-02-12 | Out-of-Distribution Detection on Graphs: A Survey | Tingyi Cai et.al. | 2502.08105 | null |
| 2025-02-11 | MAAT: Mamba Adaptive Anomaly Transformer with association discrepancy for time series | Abdellah Zakaria Sellam et.al. | 2502.07858 | null |
| 2025-02-11 | Quantum-driven Zero Trust Framework with Dynamic Anomaly Detection in 7G Technology: A Neural Network Approach | Shakil Ahmed et.al. | 2502.07779 | null |
| 2025-02-11 | Advancing climate model interpretability: Feature attribution for Arctic melt anomalies | Tolulope Ale et.al. | 2502.07741 | null |
| 2025-02-11 | Methodology for Identifying Social Groups within a Transactional Graph | Maxence Morin et.al. | 2502.07694 | null |
| 2025-02-11 | Unsupervised Feature Extraction and Reconstruction Using Parameterized Quantum Circuits | Li-An Lo et.al. | 2502.07667 | link |
| 2025-02-11 | Towards Zero-Shot Anomaly Detection and Reasoning with Multimodal Large Language Models | Jiacong Xu et.al. | 2502.07601 | null |
| 2025-02-11 | FADE: Forecasting for Anomaly Detection on ECG | Paula Ruiz-Barroso et.al. | 2502.07389 | null |
| 2025-02-10 | SAFE: Self-Supervised Anomaly Detection Framework for Intrusion Detection | Elvin Li et.al. | 2502.07119 | null |
| 2025-02-10 | Leveraging GPT-4o Efficiency for Detecting Rework Anomaly in Business Processes | Mohammad Derakhshan et.al. | 2502.06918 | null |
| 2025-02-10 | Network Intrusion Datasets: A Survey, Limitations, and Recommendations | Patrik Goldschmidt et.al. | 2502.06688 | null |
| 2025-02-10 | An Efficient Security Model for Industrial Internet of Things (IIoT) System Based on Machine Learning Principles | Sahar L. Qaddoori et.al. | 2502.06502 | null |
| 2025-02-10 | Multimodal Task Representation Memory Bank vs. Catastrophic Forgetting in Anomaly Detection | You Zhou et.al. | 2502.06194 | null |
| 2025-02-10 | Fine-Tuning Federated Learning-Based Intrusion Detection Systems for Transportation IoT | Robert Akinie et.al. | 2502.06099 | null |
| 2025-02-09 | A Conditional Tabular GAN-Enhanced Intrusion Detection System for Rare Attacks in IoT Networks | Safaa Menssouri et.al. | 2502.06031 | null |
| 2025-02-09 | A 3D Multimodal Feature for Infrastructure Anomaly Detection | Yixiong Jing et.al. | 2502.05779 | null |
| 2025-02-09 | 3CAD: A Large-Scale Real-World 3C Product Dataset for Unsupervised Anomaly | Enquan Yang et.al. | 2502.05761 | null |
| 2025-02-08 | Extended Histogram-based Outlier Score (EHBOS) | Tanvir Islam et.al. | 2502.05719 | null |
| 2025-02-08 | Federated Learning with Reservoir State Analysis for Time Series Anomaly Detection | Keigo Nogami et.al. | 2502.05679 | null |
| 2025-02-08 | Dual Defense: Enhancing Privacy and Mitigating Poisoning Attacks in Federated Learning | Runhua Xu et.al. | 2502.05547 | null |
| 2025-02-07 | Federated Learning for Anomaly Detection in Energy Consumption Data: Assessing the Vulnerability to Adversarial Attacks | Yohannis Kifle Telila et.al. | 2502.05041 | null |
| 2025-02-07 | Robust Conformal Outlier Detection under Contaminated Reference Data | Meshi Bashari et.al. | 2502.04807 | null |
| 2025-02-06 | Finding Pegasus: Enhancing Unsupervised Anomaly Detection in High-Dimensional Data using a Manifold-Based Approach | R. P. Nathan et.al. | 2502.04310 | null |
| 2025-02-06 | NLP-Based .NET CLR Event Logs Analyzer | Maxim Stavtsev et.al. | 2502.04219 | null |
| 2025-02-06 | CleanSurvival: Automated data preprocessing for time-to-event models using reinforcement learning | Yousef Koka et.al. | 2502.03946 | null |
| 2025-02-06 | Technical Report: Generating the WEB-IDS23 Dataset | Eric Lanfer et.al. | 2502.03909 | null |
| 2025-02-06 | Hierarchical Entropic Diffusion for Ransomware Detection: A Probabilistic Approach to Behavioral Anomaly Isolation | Vasili Iskorohodov et.al. | 2502.03882 | null |
| 2025-02-06 | Position: Untrained Machine Learning for Anomaly Detection | Juan Du et.al. | 2502.03876 | null |
| 2025-02-05 | The Adoption of Artificial Intelligence in Different Network Security Concepts | Mamoon A. Al Jbaar et.al. | 2502.03398 | null |
| 2025-02-05 | A Structured Reasoning Framework for Unbalanced Data Classification Using Probabilistic Models | Junliang Du et.al. | 2502.03386 | null |
| 2025-02-05 | General Time-series Model for Universal Knowledge Representation of Multivariate Time-Series data | Cheng He et.al. | 2502.03264 | null |
| 2025-02-05 | Calibrated Unsupervised Anomaly Detection in Multivariate Time-series using Reinforcement Learning | Saba Sanami et.al. | 2502.03245 | null |
| 2025-02-05 | SpaceGNN: Multi-Space Graph Neural Network for Node Anomaly Detection with Extremely Limited Labels | Xiangyu Dong et.al. | 2502.03201 | null |
| 2025-02-05 | Gotham Dataset 2025: A Reproducible Large-Scale IoT Network Dataset for Intrusion Detection and Security Research | Othmane Belarbi et.al. | 2502.03134 | null |
| 2025-02-05 | Implementing Large Quantum Boltzmann Machines as Generative AI Models for Dataset Balancing | Salvatore Sinno et.al. | 2502.03086 | null |
| 2025-02-05 | Time Series Anomaly Detection in the Frequency Domain with Statistical Reliability | Akifumi Yamada et.al. | 2502.03062 | null |
| 2025-02-05 | TopoCL: Topological Contrastive Learning for Time Series | Namwoo Kim et.al. | 2502.02924 | null |
| 2025-02-04 | SHIELD: APT Detection and Intelligent Explanation Using LLM | Parth Atulbhai Gandhi et.al. | 2502.02342 | null |
| 2025-02-04 | FRAUD-RLA: A new reinforcement learning adversarial attack against credit card fraud detection | Daniele Lunghi et.al. | 2502.02290 | null |
| 2025-02-04 | LAST SToP For Modeling Asynchronous Time Series | Shubham Gupta et.al. | 2502.01922 | null |
| 2025-02-04 | Anomaly Detection via Autoencoder Composite Features and NCE | Yalin Liao et.al. | 2502.01920 | null |
| 2025-02-03 | A Poisson Process AutoDecoder for X-ray Sources | Yanke Song et.al. | 2502.01627 | null |
| 2025-02-03 | Federated Detection of Open Charge Point Protocol 1.6 Cyberattacks | Christos Dalamagkas et.al. | 2502.01569 | null |
| 2025-02-03 | Unsupervised anomaly detection in large-scale estuarine acoustic telemetry data | Siphendulwe Zaza et.al. | 2502.01543 | null |
| 2025-02-03 | Dense Subgraph Discovery Meets Strong Triadic Closure | Chamalee Wickrama Arachchi et.al. | 2502.01435 | null |
| 2025-02-03 | ConceptVAE: Self-Supervised Fine-Grained Concept Disentanglement from 2D Echocardiographies | Costin F. Ciusdel et.al. | 2502.01335 | null |
| 2025-02-03 | One-to-Normal: Anomaly Personalization for Few-shot Anomaly Detection | Yiyue Li et.al. | 2502.01201 | null |
| 2025-01-31 | Synthetic User Behavior Sequence Generation with Large Language Models for Smart Homes | Zhiyao Xu et.al. | 2501.19298 | null |
| 2025-01-31 | Transformer-Based Financial Fraud Detection with Cloud-Optimized Real-Time Streaming | Tingting Deng et.al. | 2501.19267 | null |
| 2025-01-31 | DINAMO: Dynamic and INterpretable Anomaly MOnitoring for Large-Scale Particle Physics Experiments | Arsenii Gavrikov et.al. | 2501.19237 | link |
| 2025-01-31 | Secured Communication Schemes for UAVs in 5G: CRYSTALS-Kyber and IDS | Taneya Sharma et.al. | 2501.19191 | link |
| 2025-01-31 | An Optimal Cascade Feature-Level Spatiotemporal Fusion Strategy for Anomaly Detection in CAN Bus | Mohammad Fatahi et.al. | 2501.18821 | null |
| 2025-01-30 | CryptoDNA: A Machine Learning Paradigm for DDoS Detection in Healthcare IoT, Inspired by crypto jacking prevention Models | Zag ElSayed et.al. | 2501.18549 | null |
| 2025-02-03 | Real-Time Anomaly Detection with Synthetic Anomaly Monitoring (SAM) | Emanuele Luzio et.al. | 2501.18417 | null |
| 2025-01-30 | GDformer: Going Beyond Subsequence Isolation for Multivariate Time Series Anomaly Detection | Qingxiang Liu et.al. | 2501.18196 | link |
| 2025-01-30 | Conformal novelty detection for replicate point patterns with FDR or FWER control | Christophe A. N. Biscio et.al. | 2501.18195 | link |
| 2025-01-30 | Battery State of Health Estimation Using LLM Framework | Aybars Yunusoglu et.al. | 2501.18123 | null |
| 2025-01-29 | KoopAGRU: A Koopman-based Anomaly Detection in Time-Series using Gated Recurrent Units | Issam Ait Yahia et.al. | 2501.17976 | null |
| 2025-01-29 | Unsupervised Patch-GAN with Targeted Patch Ranking for Fine-Grained Novelty Detection in Medical Imaging | Jingkun Chen et.al. | 2501.17906 | null |
| 2025-01-29 | Free Agent in Agent-Based Mixture-of-Experts Generative AI Framework | Jung-Hua Liu et.al. | 2501.17903 | null |
| 2025-01-29 | Detecting Anomalies Using Rotated Isolation Forest | Vahideh Monemizadeh et.al. | 2501.17787 | null |
| 2025-01-29 | Neural Networks for the Analysis of Traced Particles in Kinetic Plasma Simulations | Gabriel Torralba Paz et.al. | 2501.17537 | null |
| 2025-01-29 | si4onnx: A Python package for Selective Inference in Deep Learning Models | Teruyuki Katsuoka et.al. | 2501.17415 | null |
| 2025-01-28 | Anomaly Detection in Cooperative Vehicle Perception Systems under Imperfect Communication | Ashish Bastola et.al. | 2501.17329 | null |
| 2025-01-28 | A Contrastive Teacher-Student Framework for Novelty Detection under Style Shifts | Hossein Mirzaei et.al. | 2501.17289 | null |
| 2025-01-28 | Goodness of Fit for Bayesian Generative Models with Applications in Population Genetics | Guillaume Le Mailloux et.al. | 2501.17107 | link |
| 2025-01-28 | MAUCell: An Adaptive Multi-Attention Framework for Video Frame Prediction | Shreyam Gupta et.al. | 2501.16997 | null |
| 2025-01-28 | RODEO: Robust Outlier Detection via Exposing Adaptive Out-of-Distribution Samples | Hossein Mirzaei et.al. | 2501.16971 | link |
| 2025-01-28 | Enhancing Web Service Anomaly Detection via Fine-grained Multi-modal Association and Frequency Domain Analysis | Xixuan Yang et.al. | 2501.16875 | null |
| 2025-01-28 | LLM Assisted Anomaly Detection Service for Site Reliability Engineers: Enhancing Cloud Infrastructure Resilience | Nimesh Jha et.al. | 2501.16744 | null |
| 2025-01-28 | Federated Learning for Efficient Condition Monitoring and Anomaly Detection in Industrial Cyber-Physical Systems | William Marfo et.al. | 2501.16666 | null |
| 2025-01-28 | Analysis of Zero Day Attack Detection Using MLP and XAI | Ashim Dahal et.al. | 2501.16638 | null |
| 2025-01-27 | Large Models in Dialogue for Active Perception and Anomaly Detection | Tzoulio Chamiti et.al. | 2501.16300 | null |
| 2025-01-27 | Addressing Out-of-Label Hazard Detection in Dashcam Videos: Insights from the COOOL Challenge | Anh-Kiet Duong et.al. | 2501.16037 | link |
| 2025-01-27 | Can Multimodal Large Language Models be Guided to Improve Industrial Anomaly Detection? | Zhiling Chen et.al. | 2501.15795 | null |
| 2025-01-27 | Investigating Application of Deep Neural Networks in Intrusion Detection System Design | Mofe O. Jeje et.al. | 2501.15760 | null |
| 2025-01-27 | Modeling shared micromobility as a label propagation process for detecting the overlapping communities | Peng Luo et.al. | 2501.15713 | null |
| 2025-01-26 | PCAP-Backdoor: Backdoor Poisoning Generator for Network Traffic in CPS/IoT Environments | Ajesh Koyatan Chathoth et.al. | 2501.15563 | null |
| 2025-01-26 | Mitigating Spurious Negative Pairs for Robust Industrial Anomaly Detection | Hossein Mirzaei et.al. | 2501.15434 | null |
| 2025-01-26 | A Transfer Learning Framework for Anomaly Detection in Multivariate IoT Traffic Data | Mahshid Rezakhani et.al. | 2501.15365 | null |
| 2025-01-25 | Advanced Real-Time Fraud Detection Using RAG-Based LLMs | Gurjot Singh et.al. | 2501.15290 | null |
| 2025-01-25 | Killing it with Zero-Shot: Adversarially Robust Novelty Detection | Hossein Mirzaei et.al. | 2501.15271 | link |
| 2025-01-24 | Towards Automated Self-Supervised Learning for Truly Unsupervised Graph Anomaly Detection | Zhong Li et.al. | 2501.14694 | link |
| 2025-01-24 | Bi-directional Curriculum Learning for Graph Anomaly Detection: Dual Focus on Homogeneity and Heterogeneity | Yitong Hao et.al. | 2501.14197 | null |
| 2025-01-24 | Argos: Agentic Time-Series Anomaly Detection with Autonomous Rule Generation via Large Language Models | Yile Gu et.al. | 2501.14170 | null |
| 2025-01-23 | Autoencoders for Anomaly Detection are Unreliable | Roel Bouman et.al. | 2501.13864 | null |
| 2025-01-23 | The Lock Generative Adversarial Network for Medical Waveform Anomaly Detection | Wenjie Xu et.al. | 2501.13858 | null |
| 2025-01-23 | Anomaly Detection for Automated Data Quality Monitoring in the CMS Detector | Andrew Brinkerhoff et.al. | 2501.13789 | null |
| 2025-01-23 | GCAD: Anomaly Detection in Multivariate Time Series from the Perspective of Granger Causality | Zehao Liu et.al. | 2501.13493 | null |
| 2025-01-23 | Leveraging Digital Twin and Machine Learning Techniques for Anomaly Detection in Power Electronics Dominated Grid | Ildar N. Idrisov et.al. | 2501.13474 | null |
| 2025-01-23 | Unraveling Normal Anatomy via Fluid-Driven Anomaly Randomization | Peirong Liu et.al. | 2501.13370 | link |
| 2025-01-22 | Distributed Intrusion Detection in Dynamic Networks of UAVs using Few-Shot Federated Learning | Ozlem Ceviz et.al. | 2501.13213 | null |
| 2025-01-22 | Real-Time Multi-Modal Subcomponent-Level Measurements for Trustworthy System Monitoring and Malware Detection | Farshad Khorrami et.al. | 2501.13081 | null |
| 2025-01-22 | Comparison of feature extraction tools for network traffic data | Borys Lypa et.al. | 2501.13004 | null |
| 2025-01-22 | Anomaly Detection in Double-entry Bookkeeping Data by Federated Learning System with Non-model Sharing Approach | Sota Mashiko et.al. | 2501.12723 | null |
| 2025-01-22 | Improved Detection and Diagnosis of Faults in Deep Neural Networks Using Hierarchical and Explainable Classification | Sigma Jahan et.al. | 2501.12560 | null |
| 2025-01-21 | Optimizing Blockchain Analysis: Tackling Temporality and Scalability with an Incremental Approach with Metropolis-Hastings Random Walks | Junliang Luo et.al. | 2501.12491 | null |
| 2025-01-21 | SCFCRC: Simultaneously Counteract Feature Camouflage and Relation Camouflage for Fraud Detection | Xiaocheng Zhang et.al. | 2501.12430 | null |
| 2025-01-21 | Towards Accurate Unified Anomaly Segmentation | Wenxin Ma et.al. | 2501.12295 | link |
| 2025-01-21 | Score Combining for Contrastive OOD Detection | Edward T. Reehorst et.al. | 2501.12204 | null |
| 2025-01-21 | Beyond Window-Based Detection: A Graph-Centric Framework for Discrete Log Anomaly Detection | Jiaxing Qi et.al. | 2501.12166 | null |
| 2025-01-22 | Teacher Encoder-Student Decoder Denoising Guided Segmentation Network for Anomaly Detection | Shixuan Song et.al. | 2501.12104 | null |
| 2025-01-21 | Application of Machine Learning Techniques for Secure Traffic in NoC-based Manycores | Geaninne Lopes et.al. | 2501.12034 | null |
| 2025-01-21 | TAD-Bench: A Comprehensive Benchmark for Embedding-Based Text Anomaly Detection | Yang Cao et.al. | 2501.11960 | null |
| 2025-01-21 | Noise-Resilient Point-wise Anomaly Detection in Time Series Using Weak Segment Labels | Yaxuan Wang et.al. | 2501.11959 | link |
| 2025-01-20 | Towards Improving IDS Using CTF Events | Manuel Kern et.al. | 2501.11685 | null |
| 2025-01-20 | Class Imbalance in Anomaly Detection: Learning from an Exactly Solvable Model | F. S. Pezzicoli et.al. | 2501.11638 | null |
| 2025-01-20 | A Survey on Diffusion Models for Anomaly Detection | Jing Liu et.al. | 2501.11430 | null |
| 2025-01-17 | FiLo++: Zero-/Few-Shot Anomaly Detection by Fused Fine-Grained Descriptions and Deformable Localization | Zhaopeng Gu et.al. | 2501.10067 | link |
| 2025-01-16 | Ruling the Unruly: Designing Effective, Low-Noise Network Intrusion Detection Rules for Security Operations Centers | Koen T. W. Teuwen et.al. | 2501.09808 | null |
| 2025-01-16 | ComplexVAD: Detecting Interaction Anomalies in Video | Furkan Mumcu et.al. | 2501.09733 | null |
| 2025-01-16 | Sequential PatchCore: Anomaly Detection for Surface Inspection using Synthetic Impurities | Runzhou Mao et.al. | 2501.09579 | null |
| 2025-01-16 | AI-based Identity Fraud Detection: A Systematic Review | Chuo Jun Zhang et.al. | 2501.09239 | null |
| 2025-01-15 | Attention is All You Need Until You Need Retention | M. Murat Yaslioglu et.al. | 2501.09166 | null |
| 2025-01-15 | When Uncertainty Leads to Unsafety: Empirical Insights into the Role of Uncertainty in Unmanned Aerial Vehicle Safety | Sajad Khatiri et.al. | 2501.08908 | null |
| 2025-01-15 | Transformer-based Multivariate Time Series Anomaly Localization | Charalampos Shimillas et.al. | 2501.08628 | null |
| 2025-01-14 | Detecting Contextual Anomalies by Discovering Consistent Spatial Regions | Zhengye Yang et.al. | 2501.08470 | null |
| 2025-01-14 | Multiple-Input Variational Auto-Encoder for Anomaly Detection in Heterogeneous Data | Phai Vu Dinh et.al. | 2501.08149 | null |
| 2025-01-14 | PolyLUT: Ultra-low Latency Polynomial Inference with Hardware-Aware Structured Pruning | Marta Andronic et.al. | 2501.08043 | null |
| 2025-01-14 | Unsupervised Feature Construction for Anomaly Detection in Time Series – An Evaluation | Marine Hamon et.al. | 2501.07999 | null |
| 2025-01-14 | STTS-EAD: Improving Spatio-Temporal Learning Based Time Series Prediction via | Yuanyuan Liang et.al. | 2501.07814 | null |
| 2025-01-14 | A Comparative Analysis of DNN-based White-Box Explainable AI Methods in Network Security | Osvaldo Arreche et.al. | 2501.07801 | link |
| 2025-01-13 | A Novel Approach to Network Traffic Analysis: the HERA tool | Daniela Pinto et.al. | 2501.07475 | null |
| 2025-01-13 | ADKGD: Anomaly Detection in Knowledge Graphs with Dual-Channel Training | Jiayang Wu et.al. | 2501.07078 | link |
| 2025-01-13 | Detection of AI Deepfake and Fraud in Online Payments Using GAN-Based Models | Zong Ke et.al. | 2501.07033 | null |
| 2025-01-13 | TFLAG: Towards Practical APT Detection via Deviation-Aware Learning on Temporal Provenance Graph | Wenhan Jiang et.al. | 2501.06997 | link |
| 2025-01-12 | Shake-VLA: Vision-Language-Action Model-Based System for Bimanual Robotic Manipulations and Liquid Mixing | Muhamamd Haris Khan et.al. | 2501.06919 | null |
| 2025-01-12 | Driver Age and Its Effect on Key Driving Metrics: Insights from Dynamic Vehicle Data | Aparna Joshi et.al. | 2501.06918 | null |
| 2025-01-12 | Generative AI Enabled Robust Sensor Placement in Cyber-Physical Power Systems: A Graph Diffusion Approach | Changyuan Zhao et.al. | 2501.06756 | null |
| 2025-01-11 | Exploring Pose-Based Anomaly Detection for Retail Security: A Real-World Shoplifting Dataset and Benchmark | Narges Rashvand et.al. | 2501.06591 | link |
| 2025-01-11 | Active Rule Mining for Multivariate Anomaly Detection in Radio Access Networks | Ebenezer R. H. P. Isaac et.al. | 2501.06571 | null |
| 2025-01-10 | Explaining Deep Learning-based Anomaly Detection in Energy Consumption Data by Focusing on Contextually Relevant Data | Mohammad Noorchenarboo et.al. | 2501.06099 | null |
| 2025-01-10 | Facilitate Collaboration between Large Language Model and Task-specific Model for Time Series Anomaly Detection | Feiyi Chen et.al. | 2501.05675 | null |
| 2025-01-10 | Evidential Deep Learning for Uncertainty Quantification and Out-of-Distribution Detection in Jet Identification using Deep Neural Networks | Ayush Khot et.al. | 2501.05656 | link |
| 2025-01-09 | Outlyingness Scores with Cluster Catch Digraphs | Rui Shi et.al. | 2501.05530 | null |
| 2025-01-09 | EVA-S2PLoR: A Secure Element-wise Multiplication Meets Logistic Regression on Heterogeneous Database | Tianle Tao et.al. | 2501.05223 | null |
| 2025-01-10 | Learning In-Distribution Representations for Anomaly Detection | Willian T. Lunardi et.al. | 2501.05130 | link |
| 2025-01-08 | Back Home: A Machine Learning Approach to Seashell Classification and Ecosystem Restoration | Alexander Valverde et.al. | 2501.04873 | null |
| 2025-01-08 | Quantum Hybrid Support Vector Machines for Stress Detection in Older Adults | Md Saif Hassan Onim et.al. | 2501.04831 | null |
| 2025-01-08 | Planing It by Ear: Convolutional Neural Networks for Acoustic Anomaly Detection in Industrial Wood Planers | Anthony Deschênes et.al. | 2501.04819 | link |
| 2025-01-08 | Leveraging Registers in Vision Transformers for Robust Adaptation | Srikar Yellapragada et.al. | 2501.04784 | null |
| 2025-01-07 | SPECTRE: A Hybrid System for an Adaptative and Optimised Cyber Threats Detection, Response and Investigation in Volatile Memory | Arslan Tariq Syed et.al. | 2501.03898 | null |
| 2025-01-07 | KAnoCLIP: Zero-Shot Anomaly Detection through Knowledge-Driven Prompt Learning and Enhanced Cross-Modal Integration | Chengyuan Li et.al. | 2501.03786 | null |
| 2025-01-06 | On the Adversarial Robustness of Benjamini Hochberg | Louis L Chen et.al. | 2501.03402 | null |
| 2025-01-07 | CONTINUUM: Detecting APT Attacks through Spatial-Temporal Graph Neural Networks | Atmane Ayoub Mansour Bahar et.al. | 2501.02981 | null |
| 2025-01-06 | Fuzzy Granule Density-Based Outlier Detection with Multi-Scale Granular Balls | Can Gao et.al. | 2501.02975 | link |
| 2025-01-06 | Unsupervised Tomato Split Anomaly Detection using Hyperspectral Imaging and Variational Autoencoders | Mahmoud Abdulsalam et.al. | 2501.02921 | null |
| 2025-01-06 | GraphDART: Graph Distillation for Efficient Advanced Persistent Threat Detection | Saba Fathi Rabooki et.al. | 2501.02796 | null |
| 2025-01-06 | Full-conformal novelty detection: A powerful and non-random approach | Junu Lee et.al. | 2501.02703 | null |
| 2025-01-04 | Self-Supervised Learning for Detecting AI-Generated Faces as Anomalies | Mian Zou et.al. | 2501.02207 | link |
| 2025-01-03 | Counterfactual Explanation for Auto-Encoder Based Time-Series Anomaly Detection | Abhishek Srinivasan et.al. | 2501.02069 | null |
| 2025-01-03 | Dynamic Feature Fusion: Combining Global Graph Structures and Local Semantics for Blockchain Fraud Detection | Zhang Sheng et.al. | 2501.02032 | null |
| 2025-01-03 | Robust resonant anomaly detection with NPLM | Gaia Grosso et.al. | 2501.01778 | null |
| 2025-01-03 | LogicAD: Explainable Anomaly Detection via VLM-based Text Feature Extraction | Er Jin et.al. | 2501.01767 | null |
| 2025-01-03 | BARTPredict: Empowering IoT Security with LLM-Driven Cyber Threat Prediction | Alaeddine Diaf et.al. | 2501.01664 | null |
| 2025-01-03 | Multivariate Time Series Anomaly Detection using DiffGAN Model | Guangqiang Wu et.al. | 2501.01591 | null |
| 2025-01-02 | Transfer Neyman-Pearson Algorithm for Outlier Detection | Mohammadreza M. Kalan et.al. | 2501.01525 | null |
| 2025-01-02 | Quantum Computing for Partition Function Estimation of a Markov Random Field in a Radar Anomaly Detection Problem | Timothe Presles et.al. | 2501.01154 | null |
| 2025-01-02 | InDeed: Interpretable image deep decomposition with guaranteed generalizability | Sihan Wang et.al. | 2501.01127 | null |
| 2025-01-02 | SpecPT (Spectroscopy Pre-trained Transformer) Model for Extragalactic Spectroscopy: I. Architecture and Automated Redshift Measurement | Rohan Pattnaik et.al. | 2501.01070 | null |
| 2025-01-02 | An Efficient Outlier Detection Algorithm for Data Streaming | Rui Hu et.al. | 2501.01061 | null |
| 2025-01-01 | LENS-XAI: Redefining Lightweight and Explainable Network Security through Knowledge Distillation and Variational Autoencoders for Scalable Intrusion Detection in Cybersecurity | Muhammet Anil Yagiz et.al. | 2501.00790 | null |
| 2024-12-31 | METANOIA: A Lifelong Intrusion Detection and Investigation System for Mitigating Concept Drift | Jie Ying et.al. | 2501.00438 | null |
| 2024-12-31 | CNC: Cross-modal Normality Constraint for Unsupervised Multi-class Anomaly Detection | Xiaolei Wang et.al. | 2501.00346 | null |
| 2024-12-31 | Q3DE: A fault-tolerant quantum computer architecture for multi-bit burst errors by cosmic rays | Yasunari Suzuki et.al. | 2501.00331 | null |
| 2024-12-31 | Collaborative Approaches to Enhancing Smart Vehicle Cybersecurity by AI-Driven Threat Detection | Syed Atif Ali et.al. | 2501.00261 | null |
| 2024-12-30 | An Unsupervised Anomaly Detection in Electricity Consumption Using Reinforcement Learning and Time Series Forest Based Framework | Jihan Ghanim et.al. | 2501.00107 | null |
| 2024-12-30 | Galaxy Spectra Networks (GaSNet). III. Generative pre-trained network for spectrum reconstruction, redshift estimate and anomaly detection | Fucheng Zhong et.al. | 2412.21130 | link |
| 2024-12-30 | SoftPatch+: Fully Unsupervised Anomaly Classification and Segmentation | Chengjie Wang et.al. | 2412.20870 | link |
| 2024-12-30 | Blockchain-Empowered Cyber-Secure Federated Learning for Trustworthy Edge Computing | Ervin Moore et.al. | 2412.20674 | null |
| 2024-12-29 | Testing and Improving the Robustness of Amortized Bayesian Inference for Cognitive Models | Yufei Wu et.al. | 2412.20586 | link |
| 2024-12-29 | A Survey on Time-Series Distance Measures | John Paparrizos et.al. | 2412.20574 | null |
| 2024-12-29 | Dive into Time-Series Anomaly Detection: A Decade Review | Paul Boniol et.al. | 2412.20512 | null |
| 2024-12-29 | Cross-Modal Fusion and Attention Mechanism for Weakly Supervised Video Anomaly Detection | Ayush Ghadiya et.al. | 2412.20455 | null |
| 2024-12-29 | Exploring the Magnitude-Shape Plot Framework for Anomaly Detection in Crowded Video Scenes | Zuzheng Wang et.al. | 2412.20363 | null |
| 2024-12-28 | An Anomaly Detection System Based on Generative Classifiers for Controller Area Network | Chunheng Zhao et.al. | 2412.20255 | null |
| 2024-12-28 | Injecting Explainability and Lightweight Design into Weakly Supervised Video Anomaly Detection Systems | Wen-Dong Jiang et.al. | 2412.20201 | null |
| 2024-12-27 | Comparative Performance Analysis of Quantum Machine Learning Architectures for Credit Card Fraud Detection | Mansour El Alami et.al. | 2412.19441 | null |
| 2024-12-26 | Time Series Foundational Models: Their Role in Anomaly Detection and Prediction | Chathurangi Shyalika et.al. | 2412.19286 | link |
| 2024-12-26 | Graph Mixture of Experts and Memory-augmented Routers for Multivariate Time Series Anomaly Detection | Xiaoyu Huang et.al. | 2412.19108 | null |
| 2024-12-26 | Brain Ageing Prediction using Isolation Forest Technique and Residual Neural Network (ResNet) | Saadat Behzadi et.al. | 2412.19017 | null |
| 2024-12-25 | CausalTAD: Causal Implicit Generative Model for Debiased Online Trajectory Anomaly Detection | Wenbin Li et.al. | 2412.18820 | null |
| 2024-12-24 | Unveiling the Threat of Fraud Gangs to Graph Neural Networks: Multi-Target Graph Injection Attacks against GNN-Based Fraud Detectors | Jinhyeok Choi et.al. | 2412.18370 | link |
| 2024-12-24 | Quo Vadis, Anomaly Detection? LLMs and VLMs in the Spotlight | Xi Ding et.al. | 2412.18298 | link |
| 2024-12-24 | Semi-supervised Credit Card Fraud Detection via Attribute-Driven Graph Representation | Sheng Xiang et.al. | 2412.18287 | link |
| 2024-12-24 | PowerRadio: Manipulate Sensor Measurement via Power GND Radiation | Yan Jiang et.al. | 2412.18103 | null |
| 2024-12-23 | Kernel-Aware Graph Prompt Learning for Few-Shot Anomaly Detection | Fenfang Tao et.al. | 2412.17619 | link |
| 2024-12-23 | Progressive Boundary Guided Anomaly Synthesis for Industrial Anomaly Detection | Qiyu Chen et.al. | 2412.17458 | null |
| 2024-12-23 | A Temporal Convolutional Network-based Approach for Network Intrusion Detection | Rukmini Nazre et.al. | 2412.17452 | null |
| 2024-12-23 | Cech Complex Generation with Homotopy Equivalence Framework for Myocardial Infarction Diagnosis using Electrocardiogram Signals | Srikireddy Dhanunjay Reddy et.al. | 2412.17370 | null |
| 2024-12-23 | Revisiting Multimodal Fusion for 3D Anomaly Detection from an Architectural Perspective | Kaifang Long et.al. | 2412.17297 | null |
| 2024-12-23 | VarAD: Lightweight High-Resolution Image Anomaly Detection via Visual Autoregressive Modeling | Yunkang Cao et.al. | 2412.17263 | link |
| 2024-12-23 | Dual Conditioned Motion Diffusion for Pose-Based Video Anomaly Detection | Andi Xu et.al. | 2412.17210 | link |
| 2024-12-22 | A Parameter-Efficient Quantum Anomaly Detection Method on a Superconducting Quantum Processor | Maida Wang et.al. | 2412.16867 | link |
| 2024-12-21 | DCOR: Anomaly Detection in Attributed Networks via Dual Contrastive Learning Reconstruction | Hossein Rafiee Zade et.al. | 2412.16788 | null |
| 2024-12-21 | CyberSentinel: Efficient Anomaly Detection in Programmable Switch using Knowledge Distillation | Sankalp Mittal et.al. | 2412.16693 | null |
| 2024-12-20 | Applying Predictive Analytics to Occupational Health and Safety in India | Ritwik Raj Saxena et.al. | 2412.16038 | null |
| 2024-12-20 | Detection of Aerial Spoofing Attacks to LEO Satellite Systems via Deep Learning | Jos Wigchert et.al. | 2412.16008 | null |
| 2024-12-20 | Efficient Curation of Invertebrate Image Datasets Using Feature Embeddings and Automatic Size Comparison | Mikko Impiö et.al. | 2412.15844 | link |
| 2024-12-20 | Static and Dynamic Load Tests on the Bridge Vahrendorfer Stadtweg | Martin Köhncke et.al. | 2412.15713 | null |
| 2024-12-20 | A Deep Probabilistic Framework for Continuous Time Dynamic Graph Generation | Ryien Hosseini et.al. | 2412.15582 | null |
| 2024-12-19 | Cross-System Software Log-based Anomaly Detection Using Meta-Learning | Yuqing Wang et.al. | 2412.15445 | null |
| 2024-12-19 | Cruise Control: Dynamic Model Selection for ML-Based Network Traffic Analysis | Johann Hugon et.al. | 2412.15146 | null |
| 2024-12-19 | Tests for model misspecification in simulation-based inference: from local distortions to global model checks | Noemi Anau Montel et.al. | 2412.15100 | null |
| 2024-12-19 | Provably Convergent Plug-and-play Proximal Block Coordinate Descent Method for Hyperspectral Anomaly Detection | Xiaoxia Liu et.al. | 2412.14824 | null |
| 2024-12-19 | Simplicity over Complexity: An ARN-Based Intrusion Detection Method for Industrial Control Network | Ziyi Liu et.al. | 2412.14669 | null |
| 2024-12-19 | Robust PCA Based on Adaptive Weighted Least Squares and Low-Rank Matrix Factorization | Kexin Li et.al. | 2412.14629 | null |
| 2024-12-19 | Multi-Sensor Object Anomaly Detection: Unifying Appearance, Geometry, and Internal Properties | Wenqiao Li et.al. | 2412.14592 | link |
| 2024-12-19 | Global Spatio-Temporal Fusion-based Traffic Prediction Algorithm with Anomaly Aware | Chaoqun Liu et.al. | 2412.14569 | null |
| 2024-12-18 | Flow Exporter Impact on Intelligent Intrusion Detection Systems | Daniela Pinto et.al. | 2412.14021 | null |
| 2024-12-18 | A Review of the Duality of Adversarial Learning in Network Intrusion: Attacks and Countermeasures | Shalini Saini et.al. | 2412.13880 | null |
| 2024-12-18 | Do Language Models Understand Time? | Xi Ding et.al. | 2412.13845 | null |
| 2024-12-18 | Quantum Machine Learning in Log-based Anomaly Detection: Challenges and Opportunities | Jiaxing Qi et.al. | 2412.13529 | null |
| 2024-12-18 | Look Inside for More: Internal Spatial Modality Perception for 3D Anomaly Detection | Hanzhe Liang et.al. | 2412.13461 | null |
| 2024-12-17 | BadSAD: Clean-Label Backdoor Attacks against Deep Semi-Supervised Anomaly Detection | He Cheng et.al. | 2412.13324 | null |
| 2024-12-17 | Enhancing Internet of Things Security through Self-Supervised Graph Neural Networks | Safa Ben Atitallah et.al. | 2412.13240 | null |
| 2024-12-17 | Synthetic Data Generation for Anomaly Detection on Table Grapes | Ionut Marian Motoi et.al. | 2412.12949 | null |
| 2024-12-17 | Cuckoo Heavy Keeper and the balancing act of maintaining heavy-hitters in stream processing | Vinh Quang Ngo et.al. | 2412.12873 | null |
| 2024-12-17 | Boosting Fine-Grained Visual Anomaly Detection with Coarse-Knowledge-Aware Adversarial Learning | Qingqing Fang et.al. | 2412.12850 | null |
| 2024-12-17 | PO3AD: Predicting Point Offsets toward Better 3D Point Cloud Anomaly Detection | Jianan Ye et.al. | 2412.12617 | null |
| 2024-12-16 | F-RBA: A Federated Learning-based Framework for Risk-based Authentication | Hamidreza Fereidouni et.al. | 2412.12324 | null |
| 2024-12-16 | Are Large Language Models Useful for Time Series Data Analysis? | Francis Tang et.al. | 2412.12219 | null |
| 2024-12-16 | Comprehensive Survey on Adversarial Examples in Cybersecurity: Impacts, Challenges, and Mitigation Strategies | Li Li et.al. | 2412.12217 | null |
| 2024-12-16 | AMI-Net: Adaptive Mask Inpainting Network for Industrial Anomaly Detection and Localization | Wei Luo et.al. | 2412.11802 | link |
| 2024-12-16 | Counting Butterflies over Streaming Bipartite Graphs with Duplicate Edges | Lingkai Meng et.al. | 2412.11488 | null |
| 2024-12-16 | Unsupervised Anomaly Detection for Tabular Data Using Noise Evaluation | Wei Dai et.al. | 2412.11461 | null |
| 2024-12-15 | Learning Set Functions with Implicit Differentiation | Gözde Özcan et.al. | 2412.11239 | null |
| 2024-12-15 | Redefining Normal: A Novel Object-Level Approach for Multi-Object Novelty Detection | Mohammadreza Salehi et.al. | 2412.11148 | link |
| 2024-12-15 | AD-LLM: Benchmarking Large Language Models for Anomaly Detection | Tiankai Yang et.al. | 2412.11142 | link |
| 2024-12-14 | Labeling NIDS Rules with MITRE ATT&CK Techniques: Machine Learning vs. Large Language Models | Nir Daniel et.al. | 2412.10978 | null |
| 2024-12-14 | Know Unreported Roadway Incidents in Real-time: A Deep Learning Framework for Early Traffic Anomaly Detection | Haocheng Duan et.al. | 2412.10892 | null |
| 2024-12-14 | Audio-based Anomaly Detection in Industrial Machines Using Deep One-Class Support Vector Data Description | Sertac Kilickaya et.al. | 2412.10792 | null |
| 2024-12-14 | Diagnosing Unknown Attacks in Smart Homes Using Abductive Reasoning | Kushal Ramkumar et.al. | 2412.10738 | null |
| 2024-12-13 | Copy-Move Detection in Optical Microscopy: A Segmentation Network and A Dataset | Hao-Chiang Shao et.al. | 2412.10258 | null |
| 2024-12-13 | Filter or Compensate: Towards Invariant Representation from Distribution Shift for Anomaly Detection | Zining Chen et.al. | 2412.10115 | null |
| 2024-12-13 | FDM-Bench: A Comprehensive Benchmark for Evaluating Large Language Models in Additive Manufacturing Tasks | Ahmadreza Eslaminia et.al. | 2412.09819 | link |
| 2024-12-12 | Video Anomaly Detection with Motion and Appearance Guided Patch Diffusion Model | Hang Zhou et.al. | 2412.09026 | null |
| 2024-12-12 | Multimodal Industrial Anomaly Detection by Crossmodal Reverse Distillation | Xinyue Liu et.al. | 2412.08949 | link |
| 2024-12-12 | Federated Foundation Models on Heterogeneous Time Series | Shengchao Chen et.al. | 2412.08906 | link |
| 2024-12-11 | GradStop: Exploring Training Dynamics in Unsupervised Outlier Detection through Gradient Cohesion | Yuang Zhang et.al. | 2412.08501 | link |
| 2024-12-11 | Backdoor attacks on DNN and GBDT – A Case Study from the insurance domain | Robin Kühlem et.al. | 2412.08366 | null |
| 2024-12-11 | Enhancing Cybersecurity in IoT Networks: A Deep Learning Approach to Anomaly Detection | Yining Pang et.al. | 2412.08301 | null |
| 2024-12-11 | Breaking the Bias: Recalibrating the Attention of Industrial Anomaly Detection | Xin Chen et.al. | 2412.08189 | null |
| 2024-12-11 | Unsupervised Detection of Anomalous Driving Patterns Using High Resolution Telematics Time Series Data | Ian Weng Chan et.al. | 2412.08106 | null |
| 2024-12-10 | Distributed Intrusion Detection System using Semantic-based Rules for SCADA in Smart Grid | Sathya Narayana Mohan et.al. | 2412.07917 | null |
| 2024-12-10 | Unlocking the Potential of Reverse Distillation for Anomaly Detection | Xinyue Liu et.al. | 2412.07579 | link |
| 2024-12-10 | Boundary anomaly detection in two-dimensional subsystem symmetry-protected topological phases | Ke Ding et.al. | 2412.07563 | null |
| 2024-12-10 | Anomaly detection using Diffusion-based methods | Aryan Bhosale et.al. | 2412.07539 | null |
| 2024-12-10 | Impact of Sampling Techniques and Data Leakage on XGBoost Performance in Credit Card Fraud Detection | Siyaxolisa Kabane et.al. | 2412.07437 | null |
| 2024-12-09 | Facade: High-Precision Insider Threat Detection Using Deep Contextual Anomaly Detection | Alex Kantchelian et.al. | 2412.06700 | null |
| 2024-12-09 | Simulation of Multi-Stage Attack and Defense Mechanisms in Smart Grids | Omer Sen et.al. | 2412.06255 | null |
| 2024-12-09 | Applications of Positive Unlabeled (PU) and Negative Unlabeled (NU) Learning in Cybersecurity | Robert Dilworth et.al. | 2412.06203 | null |
| 2024-12-09 | Holmes-VAU: Towards Long-term Video Anomaly Understanding at Any Granularity | Huaxin Zhang et.al. | 2412.06171 | link |
| 2024-12-08 | siForest: Detecting Network Anomalies with Set-Structured Isolation Forest | Christie Djidjev et.al. | 2412.06015 | null |
| 2024-12-07 | Leveraging Time-Series Foundation Model for Subsurface Well Logs Prediction and Anomaly Detection | Ardiansyah Koeshidayatullah et.al. | 2412.05681 | null |
| 2024-12-07 | Detecting outliers by clustering algorithms | Qi Li et.al. | 2412.05669 | null |
| 2024-12-07 | Hyperedge Anomaly Detection with Hypergraph Neural Network | Md. Tanvir Alam et.al. | 2412.05641 | null |
| 2024-12-07 | Self-Supervised Masked Mesh Learning for Unsupervised Anomaly Detection on 3D Cortical Surfaces | Hao-Chun Yang et.al. | 2412.05580 | null |
| 2024-12-07 | A New Perspective on Time Series Anomaly Detection: Faster Patch-based Broad Learning System | Pengyu Li et.al. | 2412.05498 | null |
| 2024-12-06 | Automated, Unsupervised, and Auto-parameterized Inference of Data Patterns and Anomaly Detection | Qiaolin Qin et.al. | 2412.05240 | null |
| 2024-12-06 | Backdooring Outlier Detection Methods: A Novel Attack Approach | ZeinabSadat Taghavi et.al. | 2412.05010 | null |
| 2024-12-06 | ETLNet: An Efficient TCN-BiLSTM Network for Road Anomaly Detection Using Smartphone Sensors | Mohd Faiz Ansari et.al. | 2412.04990 | null |
| 2024-12-06 | On Process Awareness in Detecting Multi-stage Cyberattacks in Smart Grids | Omer Sen et.al. | 2412.04902 | null |
| 2024-12-06 | Encryption-Aware Anomaly Detection in Power Grid Communication Networks | Omer Sen et.al. | 2412.04901 | null |
| 2024-12-06 | MANTA: A Large-Scale Multi-View and Visual-Text Anomaly Detection Dataset for Tiny Objects | Lei Fan et.al. | 2412.04867 | null |
| 2024-12-06 | NLP-ADBench: NLP Anomaly Detection Benchmark | Yuangang Li et.al. | 2412.04784 | link |
| 2024-12-06 | DPGIIL: Dirichlet Process-Deep Generative Model-Integrated Incremental Learning for Clustering in Transmissibility-based Online Structural Anomaly Detection | Lin-Feng Mei et.al. | 2412.04781 | null |
| 2024-12-06 | Anomaly Detection and Classification in Knowledge Graphs | Asara Senaratne et.al. | 2412.04780 | null |
| 2024-12-06 | Revitalizing Reconstruction Models for Multi-class Anomaly Detection via Class-Aware Contrastive Learning | Lei Fan et.al. | 2412.04769 | null |
| 2024-12-05 | Towards Zero-shot 3D Anomaly Localization | Yizhou Wang et.al. | 2412.04304 | null |
| 2024-12-05 | SCADE: Scalable Command-line Anomaly Detection Engine | Vaishali Vinay et.al. | 2412.04259 | null |
| 2024-12-05 | DistB-VNET: Distributed Cluster-based Blockchain Vehicular Ad-Hoc Networks through SDN-NFV for Smart City | Anichur Rahman et.al. | 2412.04222 | null |
| 2024-12-05 | ONER: Online Experience Replay for Incremental Anomaly Detection | Yizhou Jin et.al. | 2412.03907 | null |
| 2024-12-05 | Machine Learning-based Android Intrusion Detection System | Madiha Tahreem et.al. | 2412.03894 | null |
| 2024-12-05 | Transferring self-supervised pre-trained models for SHM data anomaly detection with scarce labeled data | Mingyuan Zhou et.al. | 2412.03880 | null |
| 2024-12-05 | Training MLPs on Graphs without Supervision | Zehong Wang et.al. | 2412.03864 | link |
| 2024-12-05 | CLIP-FSAC++: Few-Shot Anomaly Classification with Anomaly Descriptor Based on CLIP | Zuo Zuo et.al. | 2412.03829 | null |
| 2024-12-04 | Model-agnostic search for dijet resonances with anomalous jet substructure in proton-proton collisions at $\sqrt{s}$ = 13 TeV | CMS Collaboration et.al. | 2412.03747 | null |
| 2024-12-04 | Convolutional Neural Networks and Mixture of Experts for Intrusion Detection in 5G Networks and beyond | Loukas Ilias et.al. | 2412.03483 | null |
| 2024-12-04 | Sifting through the haystack – efficiently finding rare animal behaviors in large-scale datasets | Shir Bar et.al. | 2412.03452 | link |
| 2024-12-04 | State Frequency Estimation for Anomaly Detection | Clinton Cao et.al. | 2412.03442 | null |
| 2024-12-04 | UniVAD: A Training-free Unified Model for Few-shot Visual Anomaly Detection | Zhaopeng Gu et.al. | 2412.03342 | null |
| 2024-12-04 | The bcc coating of Lennard-Jones crystal nuclei vanishes with a change of local structure detection algorithm | Willem Gispen et.al. | 2412.03276 | null |
| 2024-12-04 | Frequency-Guided Diffusion Model with Perturbation Training for Skeleton-Based Video Anomaly Detection | Xiaofeng Tan et.al. | 2412.03044 | link |
| 2024-12-03 | Optimized IoT Intrusion Detection using Machine Learning Technique | Muhammad Zawad Mahmud et.al. | 2412.02845 | null |
| 2024-12-03 | Graph-Powered Defense: Controller Area Network Intrusion Detection for Unmanned Aerial Vehicles | Reek Majumder et.al. | 2412.02539 | null |
| 2024-12-03 | F-SE-LSTM: A Time Series Anomaly Detection Method with Frequency Domain Information | Yi-Xiang Lu et.al. | 2412.02474 | null |
| 2024-12-03 | Leveraging Ensemble-Based Semi-Supervised Learning for Illicit Account Detection in Ethereum DeFi Transactions | Shabnam Fazliani et.al. | 2412.02408 | null |
| 2024-12-03 | An Automated Data Mining Framework Using Autoencoders for Feature Extraction and Dimensionality Reduction | Yaxin Liang et.al. | 2412.02211 | null |
| 2024-12-03 | Deep Learning, Machine Learning, Advancing Big Data Analytics and Management | Weiche Hsieh et.al. | 2412.02187 | null |
| 2024-12-02 | Network Simulation with Complex Cyber-attack Scenarios | Tiago Dias et.al. | 2412.01421 | null |
| 2024-12-02 | Representation Learning for Time-Domain High-Energy Astrophysics: Discovery of Extragalactic Fast X-ray Transient XRT 200515 | Steven Dillmann et.al. | 2412.01150 | null |
| 2024-12-02 | VERA: Explainable Video Anomaly Detection via Verbalized Learning of Vision-Language Models | Muchao Ye et.al. | 2412.01095 | null |
| 2024-12-02 | Practitioners’ Expectations on Log Anomaly Detection | Xiaoxue Ma et.al. | 2412.01066 | null |
| 2024-12-01 | TGTOD: A Global Temporal Graph Transformer for Outlier Detection at Scale | Kay Liu et.al. | 2412.00984 | link |
| 2024-11-29 | Enhanced anomaly detection in well log data through the application of ensemble GANs | Abdulrahman Al-Fakih et.al. | 2411.19875 | link |
| 2024-11-29 | Real-Time Anomaly Detection in Video Streams | Fabien Poirier et.al. | 2411.19731 | null |
| 2024-11-29 | Real-time Anomaly Detection at the L1 Trigger of CMS Experiment | Abhijith Gandrakota et.al. | 2411.19506 | null |
| 2024-11-29 | Multi-task CNN Behavioral Embedding Model For Transaction Fraud Detection | Bo Qu et.al. | 2411.19457 | null |
| 2024-11-29 | Unsupervised Learning Approach to Anomaly Detection in Gravitational Wave Data | Ammar Fayad et.al. | 2411.19450 | null |
| 2024-11-28 | Automatic Prompt Generation and Grounding Object Detection for Zero-Shot Image Anomaly Detection | Tsun-Hin Cheung et.al. | 2411.19220 | null |
| 2024-11-28 | Swarm Intelligence-Driven Client Selection for Federated Learning in Cybersecurity applications | Koffka Khan et.al. | 2411.18877 | null |
| 2024-11-27 | Optimal In-Network Distribution of Learning Functions for a Secure-by-Design Programmable Data Plane of Next-Generation Networks | Mattia Giovanni Spina et.al. | 2411.18384 | null |
| 2024-11-27 | P4-NIDS: High-Performance Network Monitoring and Intrusion Detection in P4 | Yaying Chen et.al. | 2411.17987 | null |
| 2024-11-26 | Rock the KASBA: Blazingly Fast and Accurate Time Series Clustering | Christopher Holder et.al. | 2411.17838 | null |
| 2024-11-26 | A Machine Learning-based Anomaly Detection Framework in Life Insurance Contracts | Andreas Groll et.al. | 2411.17495 | null |
| 2024-11-26 | GraphSubDetector: Time Series Subsequence Anomaly Detection via Density-Aware Adaptive Graph Neural Network | Weiqi Chen et.al. | 2411.17218 | null |
| 2024-11-25 | Unsupervised Quantum Anomaly Detection on Noisy Quantum Processors | Daniel Pranjić et.al. | 2411.16970 | null |
| 2024-11-25 | Spectroscopic Quasar Anomaly Detection (SQuAD) I: Rest-Frame UV Spectra from SDSS DR16 | Arihant Tiwari et.al. | 2411.16858 | null |
| 2024-11-25 | Revisiting DDIM Inversion for Controlling Defect Generation by Disentangling the Background | Youngjae Cho et.al. | 2411.16767 | null |
| 2024-11-25 | Anomaly Detection and RFI Classification with Unsupervised Learning in Narrowband Radio Technosignature Searches | Ben Jacobson-Bell et.al. | 2411.16556 | null |
| 2024-11-25 | Unsupervised Event Outlier Detection in Continuous Time | Somjit Nath et.al. | 2411.16427 | null |
| 2024-11-25 | FUN-AD: Fully Unsupervised Learning for Anomaly Detection with Noisy Training Data | Jiin Im et.al. | 2411.16110 | link |
| 2024-11-25 | ROADS: Robust Prompt-driven Multi-Class Anomaly Detection under Domain Shift | Hossein Kashiani et.al. | 2411.16049 | null |
| 2024-11-24 | An AutoML-based approach for Network Intrusion Detection | Nana Kankam Gyimah et.al. | 2411.15920 | null |
| 2024-11-24 | Streaming SQL Multi-Way Join Method for Long State Streams | Jinlong Hu et.al. | 2411.15835 | null |
| 2024-11-24 | Runtime-optimized Multi-way Stream Join Operator for Large-scale Streaming data | Jinlong Hu et.al. | 2411.15827 | null |
| 2024-11-23 | Circuit design in biology and machine learning. II. Anomaly detection | Steven A. Frank et.al. | 2411.15647 | null |
| 2024-11-25 | Information Extraction from Heterogeneous Documents without Ground Truth Labels using Synthetic Label Generation and Knowledge Distillation | Aniket Bhattacharyya et.al. | 2411.14957 | null |
| 2024-11-22 | Evaluating Vision Transformer Models for Visual Quality Control in Industrial Manufacturing | Miriam Alber et.al. | 2411.14953 | link |
| 2024-11-22 | Physical and Software Based Fault Injection Attacks Against TEEs in Mobile Devices: A Systemisation of Knowledge | Aaron Joy et.al. | 2411.14878 | null |
| 2024-11-22 | A Lightweight Edge-CNN-Transformer Model for Detecting Coordinated Cyber and Digital Twin Attacks in Cooperative Smart Farming | Lopamudra Praharaj et.al. | 2411.14729 | null |
| 2024-11-21 | Privacy-Preserving Video Anomaly Detection: A Survey | Jing Liu et.al. | 2411.14565 | null |
| 2024-11-21 | The importance of the clustering model to detect new types of intrusion in data traffic | Noor Saud Abd et.al. | 2411.14550 | null |
| 2024-11-21 | Are Anomaly Scores Telling the Whole Story? A Benchmark for Multilevel Anomaly Detection | Tri Cao et.al. | 2411.14515 | null |
| 2024-11-21 | End-to-End Convolutional Activation Anomaly Analysis for Anomaly Detection | Aleksander Kozłowski et.al. | 2411.14509 | null |
| 2024-11-21 | Lower Dimensional Spherical Representation of Medium Voltage Load Profiles for Visualization, Outlier Detection, and Generative Modelling | Edgar Mauricio Salazar Duque et.al. | 2411.14346 | null |
| 2024-11-21 | Adaptive Anomaly Detection for Identifying Attacks in Cyber-Physical Systems: A Systematic Literature Review | Pablo Moriano et.al. | 2411.14278 | null |
| 2024-11-21 | A Dataset for Evaluating Online Anomaly Detection Approaches for Discrete Multivariate Time Series | Lucas Correia et.al. | 2411.13951 | link |
| 2024-11-20 | Demonstrating the Suitability of Neuromorphic, Event-Based, Dynamic Vision Sensors for In Process Monitoring of Metallic Additive Manufacturing and Welding | David Mascareñas et.al. | 2411.13108 | null |
| 2024-11-19 | AI Guided Early Screening of Cervical Cancer | Dharanidharan S I et.al. | 2411.12681 | null |
| 2024-11-19 | UMGAD: Unsupervised Multiplex Graph Anomaly Detection | Xiang Li et.al. | 2411.12556 | null |
| 2024-11-20 | TSINR: Capturing Temporal Continuity via Implicit Neural Representations for Time Series Anomaly Detection | Mengxuan Li et.al. | 2411.11641 | link |
| 2024-11-18 | Feature Selection for Network Intrusion Detection | Charles Westphal et.al. | 2411.11603 | null |
| 2024-11-18 | SADDE: Semi-supervised Anomaly Detection with Dependable Explanations | Yachao Yuan et.al. | 2411.11293 | link |
| 2024-11-17 | Digital Twin for Advanced Network Planning: Tackling Interference | Juan Carlos Estrada-Jimenez et.al. | 2411.11034 | null |
| 2024-11-17 | TeG: Temporal-Granularity Method for Anomaly Detection with Attention in Smart City Surveillance | Erkut Akdag et.al. | 2411.11003 | null |
| 2024-11-17 | Anomaly Detection for People with Visual Impairments Using an Egocentric 360-Degree Camera | Inpyo Song et.al. | 2411.10945 | null |
| 2024-11-17 | LLM-assisted Physical Invariant Extraction for Cyber-Physical Systems Anomaly Detection | Danial Abshari et.al. | 2411.10918 | null |
| 2024-11-16 | Steam Turbine Anomaly Detection: An Unsupervised Learning Approach Using Enhanced Long Short-Term Memory Variational Autoencoder | Weiming Xu et.al. | 2411.10765 | null |
| 2024-11-16 | On-device Anomaly Detection in Conveyor Belt Operations | Luciano S. Martinez-Rau et.al. | 2411.10729 | null |
| 2024-11-15 | Systematically Constructing the Likelihood for Boosted $H\to gg$ Decays | Andrew J. Larkoski et.al. | 2411.10539 | null |
| 2024-11-15 | Uncertainty in Supply Chain Digital Twins: A Quantum-Classical Hybrid Approach | Abdullah Abdullah et.al. | 2411.10254 | null |
| 2024-11-15 | Outliers resistant image classification by anomaly detection | Anton Sergeev et.al. | 2411.10150 | null |
| 2024-11-15 | Early Detection of Multiwavelength Blazar Variability | Hermann Stolte et.al. | 2411.10140 | null |
| 2024-11-15 | Quantum similarity learning for anomaly detection | A. Hammad et.al. | 2411.09927 | null |
| 2024-11-14 | Deep Autoencoders for Unsupervised Anomaly Detection in Wildfire Prediction | İrem Üstek et.al. | 2411.09844 | null |
| 2024-11-14 | Adaptive Deviation Learning for Visual Anomaly Detection with Data Contamination | Anindya Sundar Das et.al. | 2411.09558 | link |
| 2024-11-14 | Exploring Zero-Shot Anomaly Detection with CLIP in Medical Imaging: Are We There Yet? | Aldo Marzullo et.al. | 2411.09310 | null |
| 2024-11-14 | Advancing Software Security and Reliability in Cloud Platforms through AI-based Anomaly Detection | Sabbir M. Saleh et.al. | 2411.09200 | null |
| 2024-11-13 | Continuous GNN-based Anomaly Detection on Edge using Efficient Adaptive Knowledge Graph Learning | Sanggeon Yun et.al. | 2411.09072 | null |
| 2024-11-13 | Anomaly Detection in Large-Scale Cloud Systems: An Industry Case and Dataset | Mohammad Saiful Islam et.al. | 2411.09047 | null |
| 2024-11-13 | Unsupervised Parameter-free Outlier Detection using HDBSCAN* Outlier Profiles | Kushankur Ghosh et.al. | 2411.08867 | null |
| 2024-11-13 | AstroM$^3$: A self-supervised multimodal model for astronomy | Mariia Rizhko et.al. | 2411.08842 | null |
| 2024-11-13 | AI-Enhanced Inverter Fault and Anomaly Detection System for Distributed Energy Resources in Microgrids | Swetha Rani Kasimalla et.al. | 2411.08761 | null |
| 2024-11-13 | Weakly-Supervised Anomaly Detection in Surveillance Videos Based on Two-Stream I3D Convolution Network | Sareh Soltani Nejad et.al. | 2411.08755 | null |
| 2024-11-13 | LogLLM: Log-based Anomaly Detection Using Large Language Models | Wei Guan et.al. | 2411.08561 | link |
| 2024-11-13 | Graph Neural Networks in Supply Chain Analytics and Optimization: Concepts, Perspectives, Dataset and Benchmarks | Azmine Toushik Wasi et.al. | 2411.08550 | null |
| 2024-11-13 | A Fuzzy Reinforcement LSTM-based Long-term Prediction Model for Fault Conditions in Nuclear Power Plants | Siwei Li et.al. | 2411.08370 | null |
| 2024-11-12 | EAPCR: A Universal Feature Extractor for Scientific Data without Explicit Feature Relation Patterns | Zhuohang Yu et.al. | 2411.08164 | null |
| 2024-11-12 | Spatially Regularized Graph Attention Autoencoder Framework for Detecting Rainfall Extremes | Mihir Agarwal et.al. | 2411.07753 | null |
| 2024-11-12 | Disentangling Tabular Data towards Better One-Class Anomaly Detection | Jianan Ye et.al. | 2411.07574 | null |
| 2024-11-12 | Contrastive Language Prompting to Ease False Positives in Medical Anomaly Detection | YeongHyeon Park et.al. | 2411.07546 | null |
| 2024-11-11 | SDN-Based Smart Cyber Switching (SCS) for Cyber Restoration of a Digital Substation | Mansi Girdhar et.al. | 2411.07433 | null |
| 2024-11-11 | Anomaly Detection in OKTA Logs using Autoencoders | Jericho Cain et.al. | 2411.07314 | null |
| 2024-11-10 | ASTD Patterns for Integrated Continuous Anomaly Detection In Data Logs | Chaymae El Jabri et.al. | 2411.07272 | null |
| 2024-11-11 | Enhancing Predictive Maintenance in Mining Mobile Machinery through a TinyML-enabled Hierarchical Inference Network | Raúl de la Fuente et.al. | 2411.07168 | null |
| 2024-11-11 | A neural-network based anomaly detection system and a safety protocol to protect vehicular network | Marco Franceschini et.al. | 2411.07013 | null |
| 2024-11-10 | UniGAD: Unifying Multi-level Graph Anomaly Detection | Yiqing Lin et.al. | 2411.06427 | link |
| 2024-11-10 | Locally Adaptive One-Class Classifier Fusion with Dynamic $\ell_p$-Norm Constraints for Robust Anomaly Detection | Sepehr Nourmohammadi et.al. | 2411.06406 | null |
| 2024-11-09 | Early Prediction of Natural Gas Pipeline Leaks Using the MKTCN Model | Xuguang Li et.al. | 2411.06214 | null |
| 2024-11-09 | IDU-Detector: A Synergistic Framework for Robust Masquerader Attack Detection | Zilin Huang et.al. | 2411.06172 | null |
| 2024-11-09 | GlocalCLIP: Object-agnostic Global-Local Prompt Learning for Zero-shot Anomaly Detection | Jiyul Ham et.al. | 2411.06071 | null |
| 2024-11-08 | Sdn Intrusion Detection Using Machine Learning Method | Muhammad Zawad Mahmud et.al. | 2411.05888 | null |
| 2024-11-08 | Differential Privacy Under Class Imbalance: Methods and Empirical Insights | Lucas Rosenblatt et.al. | 2411.05733 | null |
| 2024-11-08 | Machine learning-driven Anomaly Detection and Forecasting for Euclid Space Telescope Operations | Pablo Gómez et.al. | 2411.05596 | null |
| 2024-11-07 | Interpretable Measurement of CNN Deep Feature Density using Copula and the Generalized Characteristic Function | David Chapman et.al. | 2411.05183 | null |
| 2024-11-07 | MISGUIDE: Security-Aware Attack Analytics for Smart Grid Load Frequency Control | Nur Imtiazul Haque et.al. | 2411.04731 | null |
| 2024-11-08 | From CNN to ConvRNN: Adapting Visualization Techniques for Time-Series Anomaly Detection | Fabien Poirier et.al. | 2411.04707 | null |
| 2024-11-07 | Peri-midFormer: Periodic Pyramid Transformer for Time Series Analysis | Qiang Wu et.al. | 2411.04554 | link |
| 2024-11-07 | GPT-Guided Monte Carlo Tree Search for Symbolic Regression in Financial Fraud Detection | Prashank Kadam et.al. | 2411.04459 | null |
| 2024-11-06 | Astronomaly Protege: Discovery Through Human-Machine Collaboration | Michelle Lochner et.al. | 2411.04188 | link |
| 2024-11-06 | Synomaly Noise and Multi-Stage Diffusion: A Novel Approach for Unsupervised Anomaly Detection in Ultrasound Imaging | Yuan Bi et.al. | 2411.04004 | null |
| 2024-11-06 | Towards Resource-Efficient Federated Learning in Industrial IoT for Multivariate Time Series Analysis | Alexandros Gkillas et.al. | 2411.03996 | null |
| 2024-11-05 | Enhanced Real-Time Threat Detection in 5G Networks: A Self-Attention RNN Autoencoder Approach for Spectral Intrusion Analysis | Mohammadreza Kouchaki et.al. | 2411.03365 | null |
| 2024-11-04 | LLM-based Continuous Intrusion Detection Framework for Next-Gen Networks | Frederic Adjewa et.al. | 2411.03354 | null |
| 2024-11-05 | iAnomaly: A Toolkit for Generating Performance Anomaly Datasets in Edge-Cloud Integrated Computing Environments | Duneesha Fernando et.al. | 2411.02868 | null |
| 2024-11-05 | Brewing Vodka: Distilling Pure Knowledge for Lightweight Threat Detection in Audit Logs | Weiheng Wu et.al. | 2411.02775 | null |
| 2024-11-05 | JEL: Applying End-to-End Neural Entity Linking in JPMorgan Chase | Wanying Ding et.al. | 2411.02695 | null |
| 2024-11-04 | Visually Analyze SHAP Plots to Diagnose Misclassifications in ML-based Intrusion Detection | Maraz Mia et.al. | 2411.02670 | null |
| 2024-11-04 | See it, Think it, Sorted: Large Multimodal Models are Few-shot Time Series Anomaly Analyzers | Jiaxin Zhuang et.al. | 2411.02465 | null |
| 2024-11-04 | Advancing Cyber-Attack Detection in Power Systems: A Comparative Study of Machine Learning and Graph Neural Network Approaches | Tianzhixi Yin et.al. | 2411.02248 | null |
| 2024-11-04 | HACD: Harnessing Attribute Semantics and Mesoscopic Structure for Community Detection | Anran Zhang et.al. | 2411.01947 | link |
| 2024-11-04 | High-Pass Graph Convolutional Network for Enhanced Anomaly Detection: A Novel Approach | Shelei Li et.al. | 2411.01817 | null |
| 2024-11-04 | TabSec: A Collaborative Framework for Novel Insider Threat Detection | Zilin Huang et.al. | 2411.01779 | null |
| 2024-11-03 | Anomalous Client Detection in Federated Learning | Dipanwita Thakur et.al. | 2411.01490 | null |
| 2024-11-02 | Autoencoders for At-Source Data Reduction and Anomaly Detection in High Energy Particle Detectors | Alexander Yue et.al. | 2411.01118 | null |
| 2024-11-01 | Identify Backdoored Model in Federated Learning via Individual Unlearning | Jiahao Xu et.al. | 2411.01040 | null |
| 2024-11-01 | AAD-LLM: Adaptive Anomaly Detection Using Large Language Models | Alicia Russell-Gilbert et.al. | 2411.00914 | null |
| 2024-11-01 | PedSleepMAE: Generative Model for Multimodal Pediatric Sleep Signals | Saurav R. Pandey et.al. | 2411.00718 | null |
| 2024-11-01 | Integrating Fuzzy Logic into Deep Symbolic Regression | Wout Gerdes et.al. | 2411.00431 | null |
| 2024-10-31 | AR-Pro: Counterfactual Explanations for Anomaly Repair with Formal Properties | Xiayan Ji et.al. | 2410.24178 | null |
| 2024-10-31 | Distributing Intelligence in 6G Programmable Data Planes for Effective In-Network Deployment of an Active Intrusion Detection System | Mattia G. Spina et.al. | 2410.24013 | null |
| 2024-10-31 | Towards Convexity in Anomaly Detection: A New Formulation of SSLM with Unique Optimal Solutions | Hongying Liu et.al. | 2410.23774 | null |
| 2024-10-30 | Partial Channel Dependence with Channel Masks for Time Series Foundation Models | Seunghan Lee et.al. | 2410.23222 | null |
| 2024-10-30 | Directional anomaly detection | Oliver Urs Lenz et.al. | 2410.23158 | null |
| 2024-10-30 | Dynamic Threshold-based Two-layer Online Unsupervised Anomaly Detector | Yachao Yuan et.al. | 2410.22967 | link |
| 2024-10-30 | MIXAD: Memory-Induced Explainable Time Series Anomaly Detection | Minha Kim et.al. | 2410.22735 | link |
| 2024-10-30 | PV-VTT: A Privacy-Centric Dataset for Mission-Specific Anomaly Detection and Natural Language Interpretation | Ryozo Masukawa et.al. | 2410.22623 | null |
| 2024-10-29 | Unsupervised Multimodal Fusion of In-process Sensor Data for Advanced Manufacturing Process Monitoring | Matthew McKinney et.al. | 2410.22558 | null |
| 2024-10-29 | Hypergraph-based multi-scale spatio-temporal graph convolution network for Time-Series anomaly detection | Hongyi Xu et.al. | 2410.22256 | null |
| 2024-10-29 | A Survey on RGB, 3D, and Multimodal Approaches for Unsupervised Industrial Anomaly Detection | Yuxuan Lin et.al. | 2410.21982 | link |
| 2024-10-29 | LogSHIELD: A Graph-based Real-time Anomaly Detection Framework using Frequency Analysis | Krishna Chandra Roy et.al. | 2410.21936 | null |
| 2024-10-29 | Differentiable Inductive Logic Programming for Fraud Detection | Boris Wolfson et.al. | 2410.21928 | null |
| 2024-10-29 | SCGNet-Stacked Convolution with Gated Recurrent Unit Network for Cyber Network Intrusion Detection and Intrusion Type Classification | Rajana Akter et.al. | 2410.21873 | null |
| 2024-10-29 | Representational learning for an anomalous sound detection system with source separation model | Seunghyeon Shin et.al. | 2410.21797 | null |
| 2024-10-29 | Sliced-Wasserstein-based Anomaly Detection and Open Dataset for Localized Critical Peak Rebates | Julien Pallage et.al. | 2410.21712 | null |
| 2024-10-28 | A Generative Model Based Honeypot for Industrial OPC UA Communication | Olaf Sassnick et.al. | 2410.21574 | link |
| 2024-10-28 | A Systematic Review of Machine Learning in Sports Betting: Techniques, Challenges, and Future Directions | René Manassé Galekwa et.al. | 2410.21484 | null |
| 2024-10-28 | Topological Identification of Agent Status in Information Contagions: Application to Financial Markets | Anubha Goel et.al. | 2410.21104 | null |
| 2024-10-28 | A Review of Graph-Powered Data Quality Applications for IoT Monitoring Sensor Networks | Pau Ferrer-Cid et.al. | 2410.21006 | null |
| 2024-10-27 | SIGMA: Single Interpolated Generative Model for Anomalies | Ranit Das et.al. | 2410.20537 | null |
| 2024-10-27 | Causal Modeling in Multi-Context Systems: Distinguishing Multiple Context-Specific Causal Graphs which Account for Observational Support | Martin Rabel et.al. | 2410.20405 | null |
| 2024-10-27 | Rethinking Reconstruction-based Graph-Level Anomaly Detection: Limitations and a Simple Remedy | Sunwoo Kim et.al. | 2410.20366 | null |
| 2024-10-27 | ANOMIX: A Simple yet Effective Hard Negative Generation via Mixing for Graph Anomaly Detection | Hwan Kim et.al. | 2410.20310 | link |
| 2024-10-26 | Proactive Fraud Defense: Machine Learning’s Evolving Role in Protecting Against Online Fraud | Md Kamrul Hasan Chy et.al. | 2410.20281 | null |
| 2024-10-26 | ResAD: A Simple Framework for Class Generalizable Anomaly Detection | Xincheng Yao et.al. | 2410.20047 | link |
| 2024-10-25 | Federated Anomaly Detection for Early-Stage Diagnosis of Autism Spectrum Disorders using Serious Game Data | Nikolaos Pavlidis et.al. | 2410.20003 | null |
| 2024-10-25 | Temporal Convolution-based Hybrid Model Approach with Representation Learning for Real-Time Acoustic Anomaly Detection | Sahan Dissanayaka et.al. | 2410.19722 | null |
| 2024-10-25 | Enhanced Anomaly Detection in Industrial Control Systems aided by Machine Learning | Vegard Berge et.al. | 2410.19717 | null |
| 2024-10-25 | Neuromorphic IoT Architecture for Efficient Water Management: A Smart Village Case Study | Mugdim Bublin et.al. | 2410.19562 | null |
| 2024-10-25 | Detection of Emerging Infectious Diseases in Lung CT based on Spatial Anomaly Patterns | Branko Mitic et.al. | 2410.19535 | null |
| 2024-10-24 | Context-Aware Trajectory Anomaly Detection | Haoji Hu et.al. | 2410.19136 | null |
| 2024-10-24 | Exploring the Universe with SNAD: Anomaly Detection in Astronomy | Alina A. Volnova et.al. | 2410.18875 | null |
| 2024-10-24 | Low-Latency Video Anonymization for Crowd Anomaly Detection: Privacy vs. Performance | Mulugeta Weldezgina Asres et.al. | 2410.18717 | link |
| 2024-10-25 | NIDS Neural Networks Using Sliding Time Window Data Processing with Trainable Activations and its Generalization Capability | Anton Raskovalov et.al. | 2410.18658 | null |
| 2024-10-24 | Graph Pre-Training Models Are Strong Anomaly Detectors | Jiashun Cheng et.al. | 2410.18487 | null |
| 2024-10-24 | Harnessing PU Learning for Enhanced Cloud-based DDoS Detection: A Comparative Analysis | Robert Dilworth et.al. | 2410.18380 | null |
| 2024-10-23 | Advancing Network Security: A Comprehensive Testbed and Dataset for Machine Learning-Based Intrusion Detection | Talaya Farasat et.al. | 2410.18332 | null |
| 2024-10-23 | Real time anomalies detection on video | Fabien Poirier et.al. | 2410.18051 | null |
| 2024-10-22 | Data Obfuscation through Latent Space Projection (LSP) for Privacy-Preserving AI Governance: Case Studies in Medical Diagnosis and Finance Fraud Detection | Mahesh Vaijainthymala Krishnamoorthy et.al. | 2410.17459 | null |
| 2024-10-22 | Coniferest: a complete active anomaly detection framework | M. V. Kornilov et.al. | 2410.17142 | null |
| 2024-10-22 | OMLog: Online Log Anomaly Detection for Evolving System with Meta-learning | Jiyu Tian et.al. | 2410.16612 | null |
| 2024-10-22 | Generative AI for Overall Mission Effectiveness at the Habitable Worlds Observatory | Megan Shabram et.al. | 2410.16609 | null |
| 2024-10-21 | Spatio-temporal Multivariate Cluster Evolution Analysis for Detecting and Tracking Climate Impacts | Warren L. Davis IV et.al. | 2410.16544 | null |
| 2024-10-21 | LLM-TS Integrator: Integrating LLM for Enhanced Time Series Modeling | Can Chen et.al. | 2410.16489 | null |
| 2024-10-21 | Revisiting Deep Feature Reconstruction for Logical and Structural Industrial Anomaly Detection | Sukanya Patra et.al. | 2410.16255 | link |
| 2024-10-21 | TimeMixer++: A General Time Series Pattern Machine for Universal Predictive Analysis | Shiyu Wang et.al. | 2410.16032 | null |
| 2024-10-21 | MultiRC: Joint Learning for Time Series Anomaly Prediction and Detection with Multi-scale Reconstructive Contrast | Shiyan Hu et.al. | 2410.15997 | null |
| 2024-10-21 | Redefining Finance: The Influence of Artificial Intelligence (AI) and Machine Learning (ML) | Animesh Kumar et.al. | 2410.15951 | null |
| 2024-10-21 | Hybrid Architecture for Real-Time Video Anomaly Detection: Integrating Spatial and Temporal Analysis | Fabien Poirier et.al. | 2410.15909 | null |
| 2024-10-21 | A Comprehensive Comparative Study of Individual ML Models and Ensemble Strategies for Network Intrusion Detection Systems | Ismail Bibers et.al. | 2410.15597 | null |
| 2024-10-20 | MedDiff-FM: A Diffusion-based Foundation Model for Versatile Medical Image Applications | Yongrui Yu et.al. | 2410.15432 | null |
| 2024-10-20 | XAI-based Feature Ensemble for Enhanced Anomaly Detection in Autonomous Driving Systems | Sazid Nazat et.al. | 2410.15405 | null |
| 2024-10-19 | Controllable RANSAC-based Anomaly Detection via Hypothesis Testing | Le Hong Phong et.al. | 2410.15133 | null |
| 2024-10-19 | ReeFRAME: Reeb Graph based Trajectory Analysis Framework to Capture Top-Down and Bottom-Up Patterns of Life | Chandrakanth Gudavalli et.al. | 2410.14913 | null |
| 2024-10-18 | Towards Unsupervised Validation of Anomaly-Detection Models | Lihi Idan et.al. | 2410.14579 | null |
| 2024-10-18 | AnomalyNCD: Towards Novel Anomaly Class Discovery in Industrial Scenarios | Ziming Huang et.al. | 2410.14379 | link |
| 2024-10-18 | FedMSE: Federated learning for IoT network intrusion detection | Van Tuan Nguyen et.al. | 2410.14121 | link |
| 2024-10-17 | A Physics-Based Context-Aware Approach for Anomaly Detection in Teleoperated Driving Operations Under False Data Injection Attacks | Subhadip Ghosh et.al. | 2410.13962 | null |
| 2024-10-17 | Statistical testing on generative AI anomaly detection tools in Alzheimer’s Disease diagnosis | Rosemary He et.al. | 2410.13363 | null |
| 2024-10-17 | A Comprehensive Analysis of Routing Vulnerabilities and Defense Strategies in IoT Networks | Kim Jae-Dong et.al. | 2410.13214 | null |
| 2024-10-16 | FedCAP: Robust Federated Learning via Customized Aggregation and Personalization | Youpeng Li et.al. | 2410.13083 | link |
| 2024-10-16 | Semi-supervised Learning for Detecting Inverse Compton Emission in Galaxy Clusters | Sheng-Chieh Lin et.al. | 2410.12943 | null |
| 2024-10-17 | Automatic Mapping of Anatomical Landmarks from Free-Text Using Large Language Models: Insights from Llama-2 | Mohamad Abdi et.al. | 2410.12686 | null |
| 2024-10-16 | Improved Anomaly Detection through Conditional Latent Space VAE Ensembles | Oskar Åström et.al. | 2410.12328 | link |
| 2024-10-16 | Revisited Large Language Model for Time Series Analysis through Modality Alignment | Liangwei Nathan Zheng et.al. | 2410.12326 | null |
| 2024-10-16 | CATCH: Channel-Aware multivariate Time Series Anomaly Detection via Frequency Patching | Xingjian Wu et.al. | 2410.12261 | link |
| 2024-10-15 | SplatPose+: Real-time Image-Based Pose-Agnostic 3D Anomaly Detection | Yizhe Liu et.al. | 2410.12080 | null |
| 2024-10-15 | Federated Learning framework for LoRaWAN-enabled IIoT communication: A case study | Oscar Torres Sanchez et.al. | 2410.11612 | null |
| 2024-10-15 | PaSTe: Improving the Efficiency of Visual Anomaly Detection at the Edge | Manuel Barusco et.al. | 2410.11591 | link |
| 2024-10-15 | CONSULT: Contrastive Self-Supervised Learning for Few-shot Tumor Detection | Sin Chee Chin et.al. | 2410.11307 | null |
| 2024-10-14 | ASTM: Autonomous Smart Traffic Management System Using Artificial Intelligence CNN and LSTM | Christofel Rio Goenawan et.al. | 2410.10929 | null |
| 2024-10-14 | AI-based particle track identification in scintillating fibres read out with imaging sensors | Noemi Bührer et.al. | 2410.10519 | null |
| 2024-10-14 | WT-CFormer: High-Performance Web Traffic Anomaly Detection Using CNN and Transformer Networks | Yundi He et.al. | 2410.10327 | null |
| 2024-10-14 | Fine-grained Abnormality Prompt Learning for Zero-shot Anomaly Detection | Jiawen Zhu et.al. | 2410.10289 | link |
| 2024-10-14 | LADMIM: Logical Anomaly Detection with Masked Image Modeling in Discrete Latent Space | Shunsuke Sakai et.al. | 2410.10234 | null |
| 2024-10-14 | XAI-based Feature Selection for Improved Network Intrusion Detection Systems | Osvaldo Arreche et.al. | 2410.10050 | link |
| 2024-10-13 | Point Cloud Novelty Detection Based on Latent Representations of a General Feature Extractor | Shizuka Akahori et.al. | 2410.09861 | null |
| 2024-10-13 | DAS3D: Dual-modality Anomaly Synthesis for 3D Anomaly Detection | Kecen Li et.al. | 2410.09821 | null |
| 2024-10-12 | Timeseria: an object-oriented time series processing library | Stefano Alberto Russo et.al. | 2410.09567 | null |
| 2024-10-12 | Anomaly Detection and Inlet Pressure Prediction in Water Distribution Systems Using Machine Learning | Tran Dang Khoa et.al. | 2410.09530 | null |
| 2024-10-12 | MMAD: The First-Ever Comprehensive Benchmark for Multimodal Large Language Models in Industrial Anomaly Detection | Xi Jiang et.al. | 2410.09453 | link |
| 2024-10-11 | Transforming In-Vehicle Network Intrusion Detection: VAE-based Knowledge Distillation Meets Explainable AI | Muhammet Anil Yagiz et.al. | 2410.09043 | null |
| 2024-10-11 | Lifted Coefficient of Determination: Fast model-free prediction intervals and likelihood-free model comparison | Daniel Salnikov et.al. | 2410.08958 | null |
| 2024-10-11 | Low-complexity Attention-based Unsupervised Anomalous Sound Detection exploiting Separable Convolutions and Angular Loss | Michael Neri et.al. | 2410.08919 | null |
| 2024-10-11 | Interdependency Matters: Graph Alignment for Multivariate Time Series Anomaly Detection | Yuanyi Wang et.al. | 2410.08877 | null |
| 2024-10-11 | Towards Cross-domain Few-shot Graph Anomaly Detection | Jiazhen Chen et.al. | 2410.08629 | null |
| 2024-10-11 | A Theoretical Framework for AI-driven data quality monitoring in high-volume data environments | Nikhil Bangad et.al. | 2410.08576 | null |
| 2024-10-10 | KnowGraph: Knowledge-Enabled Anomaly Detection via Logical Reasoning on Graph Data | Andy Zhou et.al. | 2410.08390 | null |
| 2024-10-10 | Heterogeneous Graph Auto-Encoder for CreditCard Fraud Detection | Moirangthem Tiken Singh et.al. | 2410.08121 | null |
| 2024-10-09 | Spatiotemporal Modeling and Forecasting at Scale with Dynamic Generalized Linear Models | Pranay Pherwani et.al. | 2410.07161 | null |
| 2024-10-09 | Efficient Distribution Matching of Representations via Noise-Injected Deep InfoMax | Ivan Butakov et.al. | 2410.06993 | null |
| 2024-10-09 | Revisiting Multi-Permutation Equivariance through the Lens of Irreducible Representations | Yonatan Sverdlov et.al. | 2410.06665 | link |
| 2024-10-10 | Task-oriented Time Series Imputation Evaluation via Generalized Representers | Zhixian Wang et.al. | 2410.06652 | link |
| 2024-10-09 | On The Relationship between Visual Anomaly-free and Anomalous Representations | Riya Sadrani et.al. | 2410.06576 | null |
| 2024-10-09 | DiffGAD: A Diffusion-based Unsupervised Graph Anomaly Detector | Jinghan Li et.al. | 2410.06549 | link |
| 2024-10-08 | MTFL: Multi-Timescale Feature Learning for Weakly-Supervised Anomaly Detection in Surveillance Videos | Yiling Zhang et.al. | 2410.05900 | null |
| 2024-10-08 | Extreme Value Modelling of Feature Residuals for Anomaly Detection in Dynamic Graphs | Sevvandi Kandanaarachchi et.al. | 2410.05687 | null |
| 2024-10-07 | Can LLMs Understand Time Series Anomalies? | Zihao Zhou et.al. | 2410.05440 | link |
| 2024-10-07 | Neural Fourier Modelling: A Highly Compact Approach to Time-Series Analysis | Minjung Kim et.al. | 2410.04703 | link |
| 2024-10-06 | Fast Area-Weighted Peeling of Convex Hulls for Outlier Detection | Vinesh Sridhar et.al. | 2410.04544 | null |
| 2024-10-06 | Data Distribution Valuation | Xinyi Xu et.al. | 2410.04386 | link |
| 2024-10-06 | Multi Armed Bandit Algorithms Based Virtual Machine Allocation Policy for Security in Multi-Tenant Distributed Systems | Pravin Patil et.al. | 2410.04363 | null |
| 2024-10-05 | Self-Supervised Anomaly Detection in the Wild: Favor Joint Embeddings Methods | Daniel Otero et.al. | 2410.04289 | null |
| 2024-10-05 | Applying Quantum Autoencoders for Time Series Anomaly Detection | Robin Frehner et.al. | 2410.04154 | null |
| 2024-10-05 | Beyond Forecasting: Compositional Time Series Reasoning for End-to-End Task Execution | Wen Ye et.al. | 2410.04047 | null |
| 2024-10-05 | BlockFound: Customized blockchain foundation model for anomaly detection | Jiahao Yu et.al. | 2410.04039 | null |
| 2024-10-04 | Did You Hear That? Introducing AADG: A Framework for Generating Benchmark Data in Audio Anomaly Detection | Ksheeraja Raghavan et.al. | 2410.03904 | null |
| 2024-10-04 | Identification of Anomalous Geospatial Trajectories via Persistent Homology | Kyle Evans-Lee et.al. | 2410.03889 | null |
| 2024-10-04 | Selective Test-Time Adaptation for Unsupervised Anomaly Detection using Neural Implicit Representations | Sameer Ambekar et.al. | 2410.03306 | null |
| 2024-10-03 | Domain-Specific Retrieval-Augmented Generation Using Vector Stores, Knowledge Graphs, and Tensor Factorization | Ryan C. Barron et.al. | 2410.02721 | null |
| 2024-10-02 | HyperBrain: Anomaly Detection for Temporal Hypergraph Brain Networks | Sadaf Sadeghian et.al. | 2410.02087 | link |
| 2024-10-02 | RADAR: Robust Two-stage Modality-incomplete Industrial Anomaly Detection | Bingchen Miao et.al. | 2410.01737 | null |
| 2024-10-03 | LEGO: Learnable Expansion of Graph Operators for Multi-Modal Feature Fusion | Dexuan Ding et.al. | 2410.01506 | null |
| 2024-10-02 | Uncertainty-aware Human Mobility Modeling and Anomaly Detection | Haomin Wen et.al. | 2410.01281 | null |
| 2024-10-01 | Finding radio transients with anomaly detection and active learning based on volunteer classifications | Alex Andersson et.al. | 2410.01034 | null |
| 2024-10-01 | Machine Learning-Assisted Intrusion Detection for Enhancing Internet of Things Security | Mona Esmaeili et.al. | 2410.01016 | null |
| 2024-10-03 | Back to Bayesics: Uncovering Human Mobility Distributions and Anomalies with an Integrated Statistical and Neural Framework | Minxuan Duan et.al. | 2410.01011 | null |
| 2024-10-01 | Review of blockchain application with Graph Neural Networks, Graph Convolutional Networks and Convolutional Neural Networks | Amy Ancelotti et.al. | 2410.00875 | null |
| 2024-10-02 | Show Me What’s Wrong!: Combining Charts and Text to Guide Data Analysis | Beatriz Feliciano et.al. | 2410.00727 | null |
| 2024-10-01 | RAD: A Dataset and Benchmark for Real-Life Anomaly Detection with Robotic Observations | Kaichen Zhou et.al. | 2410.00713 | link |
| 2024-10-01 | ECORS: An Ensembled Clustering Approach to Eradicate The Local And Global Outlier In Collaborative Filtering Recommender System | Mahamudul Hasan et.al. | 2410.00408 | null |
| 2024-09-30 | What Information Contributes to Log-based Anomaly Detection? Insights from a Configurable Transformer-Based Approach | Xingfang Wu et.al. | 2409.20503 | link |
| 2024-09-30 | ALLO: A Photorealistic Dataset and Data Generation Pipeline for Anomaly Detection During Robotic Proximity Operations in Lunar Orbit | Selina Leveugle et.al. | 2409.20435 | link |
| 2024-09-30 | Novel machine learning applications at the LHC | Javier M. Duarte et.al. | 2409.20413 | null |
| 2024-09-30 | CableInspect-AD: An Expert-Annotated Anomaly Detection Dataset | Akshatha Arodi et.al. | 2409.20353 | link |
| 2024-09-30 | Constraining Anomaly Detection with Anomaly-Free Regions | Maximilian Toller et.al. | 2409.20208 | null |
| 2024-09-30 | VMAD: Visual-enhanced Multimodal Large Language Model for Zero-Shot Anomaly Detection | Huilin Deng et.al. | 2409.20146 | null |
| 2024-09-29 | MCDDPM: Multichannel Conditional Denoising Diffusion Model for Unsupervised Anomaly Detection in Brain MRI | Vivek Kumar Trivedi et.al. | 2409.19623 | link |
| 2024-09-28 | Efficient Federated Intrusion Detection in 5G ecosystem using optimized BERT-based model | Frederic Adjewa et.al. | 2409.19390 | null |
| 2024-09-28 | Sparse Modelling for Feature Learning in High Dimensional Data | Harish Neelam et.al. | 2409.19361 | null |
| 2024-09-27 | Semi-Supervised Bone Marrow Lesion Detection from Knee MRI Segmentation Using Mask Inpainting Models | Shihua Qin et.al. | 2409.19185 | null |
| 2024-09-27 | CESNET-TimeSeries24: Time Series Dataset for Network Traffic Anomaly Detection and Forecasting | Josef Koumar et.al. | 2409.18874 | null |
| 2024-09-27 | Adversarial Challenges in Network Intrusion Detection Systems: Research Insights and Future Prospects | Sabrine Ennaji et.al. | 2409.18736 | null |
| 2024-09-27 | Enhanced Convolution Neural Network with Optimized Pooling and Hyperparameter Tuning for Network Intrusion Detection | Ayush Kumar Sharma et.al. | 2409.18642 | link |
| 2024-09-27 | MIMII-Gen: Generative Modeling Approach for Simulated Evaluation of Anomalous Sound Detection System | Harsh Purohit et.al. | 2409.18542 | null |
| 2024-09-27 | Improved Approximation Algorithms for Relational Clustering | Aryan Esmailpour et.al. | 2409.18498 | null |
| 2024-09-27 | Review of Digital Asset Development with Graph Neural Network Unlearning | Zara Lisbon et.al. | 2409.18455 | null |
| 2024-09-27 | Neural Collaborative Filtering to Detect Anomalies in Human Semantic Trajectories | Yueyang Liu et.al. | 2409.18427 | null |
| 2024-09-26 | Machine Learning-based vs Deep Learning-based Anomaly Detection in Multivariate Time Series for Spacecraft Attitude Sensors | R. Gallon et.al. | 2409.17841 | null |
| 2024-09-26 | Invariant Coordinate Selection and Fisher discriminant subspace beyond the case of two groups | Colombe Becquart et.al. | 2409.17631 | null |
| 2024-09-26 | Appearance Blur-driven AutoEncoder and Motion-guided Memory Module for Video Anomaly Detection | Jiahao Lyu et.al. | 2409.17608 | null |
| 2024-09-26 | Revisiting Deep Ensemble Uncertainty for Enhanced Medical Anomaly Detection | Yi Gu et.al. | 2409.17485 | link |
| 2024-09-25 | VL4AD: Vision-Language Models Improve Pixel-wise Anomaly Detection | Liangyu Zhong et.al. | 2409.17330 | null |
| 2024-09-25 | Scalable quality control on processing of large diffusion-weighted and structural magnetic resonance imaging datasets | Michael E. Kim et.al. | 2409.17286 | null |
| 2024-09-25 | Conditional Testing based on Localized Conformal p-values | Xiaoyang Wu et.al. | 2409.16829 | null |
| 2024-09-25 | XAI-guided Insulator Anomaly Detection for Imbalanced Datasets | Maximilian Andreas Hoefler et.al. | 2409.16821 | null |
| 2024-09-26 | VideoPatchCore: An Effective Method to Memorize Normality for Video Anomaly Detection | Sunghyun Ahn et.al. | 2409.16225 | link |
| 2024-09-24 | Exploring the Impact of Outlier Variability on Anomaly Detection Evaluation Metrics | Minjae Ok et.al. | 2409.15986 | null |
| 2024-09-24 | Leveraging Unsupervised Learning for Cost-Effective Visual Anomaly Detection | Yunbo Long et.al. | 2409.15980 | null |
| 2024-09-24 | A sparsified Christoffel function for high-dimensional inference | Jean-Bernard Lasserre et.al. | 2409.15965 | null |
| 2024-09-24 | A Multi-Level Approach for Class Imbalance Problem in Federated Learning for Remote Industry 4.0 Applications | Razin Farhan Hussain et.al. | 2409.15802 | null |
| 2024-09-24 | Identified-and-Targeted: The First Early Evidence of the Privacy-Invasive Use of Browser Fingerprinting for Online Tracking | Zengrui Liu et.al. | 2409.15656 | null |
| 2024-09-23 | MotifDisco: Motif Causal Discovery For Time Series Motifs | Josephine Lamp et.al. | 2409.15219 | null |
| 2024-09-23 | Anomaly Detection from a Tensor Train Perspective | Alejandro Mata Ali et.al. | 2409.15030 | null |
| 2024-09-23 | VARADE: a Variational-based AutoRegressive model for Anomaly Detection on the Edge | Alessio Mascolini et.al. | 2409.14816 | null |
| 2024-09-23 | Research on Dynamic Data Flow Anomaly Detection based on Machine Learning | Liyang Wang et.al. | 2409.14796 | null |
| 2024-09-18 | Asymptotics for conformal inference | Ulysse Gazin et.al. | 2409.12019 | null |
| 2024-09-18 | Log2graphs: An Unsupervised Framework for Log Anomaly Detection with Efficient Feature Extraction | Caihong Wang et.al. | 2409.11890 | null |
| 2024-09-18 | QUBO-based SVM for credit card fraud detection on a real QPU | Ettore Canonici et.al. | 2409.11876 | null |
| 2024-09-18 | Constraint Guided AutoEncoders for Joint Optimization of Condition Indicator Estimation and Anomaly Detection in Machine Condition Monitoring | Maarten Meire et.al. | 2409.11807 | null |
| 2024-09-18 | PieClam: A Universal Graph Autoencoder Based on Overlapping Inclusive and Exclusive Communities | Daniel Zilberg et.al. | 2409.11618 | null |
| 2024-09-17 | Outlier Detection with Cluster Catch Digraphs | Rui Shi et.al. | 2409.11596 | null |
| 2024-09-17 | Unsupervised Hybrid framework for ANomaly Detection (HAND) – applied to Screening Mammogram | Zhemin Zhang et.al. | 2409.11534 | link |
| 2024-09-17 | Adaptive Anomaly Detection in Network Flows with Low-Rank Tensor Decompositions and Deep Unrolling | Lukas Schynol et.al. | 2409.11529 | null |
| 2024-09-17 | An Empirical Study of Sensitive Information in Logs | Roozbeh Aghili et.al. | 2409.11313 | null |
| 2024-09-17 | Multimodal Attention-Enhanced Feature Fusion-based Weekly Supervised Anomaly Violence Detection | Yuta Kaneko et.al. | 2409.11223 | null |
| 2024-09-17 | Fair Anomaly Detection For Imbalanced Groups | Ziwei Wu et.al. | 2409.10951 | null |
| 2024-09-16 | Real-bogus scores for active anomaly detection | T. A. Semenikhin et.al. | 2409.10256 | null |
| 2024-09-16 | Evaluating the Efficacy of Instance Incremental vs. Batch Learning in Delayed Label Environments: An Empirical Study on Tabular Data Streaming for Fraud Detection | Kodjo Mawuena Amekoe et.al. | 2409.10111 | link |
| 2024-09-16 | Enhancing Anomaly Detection via Generating Diversified and Hard-to-distinguish Synthetic Anomalies | Hyuntae Kim et.al. | 2409.10069 | null |
| 2024-09-16 | Deep Graph Anomaly Detection: A Survey and New Perspectives | Hezhe Qiao et.al. | 2409.09957 | link |
| 2024-09-15 | Dynamic Fraud Detection: Integrating Reinforcement Learning into Graph Neural Networks | Yuxin Dong et.al. | 2409.09892 | null |
| 2024-09-15 | Abnormal Event Detection In Videos Using Deep Embedding | Darshan Venkatrayappa et.al. | 2409.09804 | null |
| 2024-09-15 | Federated Learning in Adversarial Environments: Testbed Design and Poisoning Resilience in Cybersecurity | Hao Jian Huang et.al. | 2409.09794 | null |
| 2024-09-15 | Enhancing Data Quality through Self-learning on Imbalanced Financial Risk Data | Xu Sun et.al. | 2409.09792 | null |
| 2024-09-15 | Towards Multi-view Graph Anomaly Detection with Similarity-Guided Contrastive Clustering | Lecheng Zheng et.al. | 2409.09770 | null |
| 2024-09-15 | OML-AD: Online Machine Learning for Anomaly Detection in Time Series Data | Sebastian Wette et.al. | 2409.09742 | null |
| 2024-09-13 | 1D-CNN-IDS: 1D CNN-based Intrusion Detection System for IIoT | Muhammad Arslan et.al. | 2409.08529 | null |
| 2024-09-13 | Optimal Classification-based Anomaly Detection with Neural Networks: Theory and Practice | Tian-Yi Zhou et.al. | 2409.08521 | null |
| 2024-09-12 | Towards a graph-based foundation model for network traffic analysis | Louis Van Langendonck et.al. | 2409.08111 | null |
| 2024-09-12 | Cellwise outlier detection in heterogeneous populations | Giorgia Zaccaria et.al. | 2409.07881 | null |
| 2024-09-11 | Unsupervised anomaly detection in spatio-temporal stream network sensor data | Edgar Santos-Fernandez et.al. | 2409.07667 | null |
| 2024-09-11 | Ensemble Methods for Sequence Classification with Hidden Markov Models | Maxime Kawawa-Beaudan et.al. | 2409.07619 | null |
| 2024-09-11 | A Survey of Anomaly Detection in In-Vehicle Networks | Övgü Özdemir et.al. | 2409.07505 | null |
| 2024-09-11 | Introducing Perturb-ability Score (PS) to Enhance Robustness Against Evasion Adversarial Attacks on ML-NIDS | Mohamed elShehaby et.al. | 2409.07448 | null |
| 2024-09-11 | Unsupervised Novelty Detection Methods Benchmarking with Wavelet Decomposition | Ariel Priarone et.al. | 2409.07135 | link |
| 2024-09-11 | A Continual and Incremental Learning Approach for TinyML On-device Training Using Dataset Distillation and Model Size Adaption | Marcus Rüb et.al. | 2409.07114 | null |
| 2024-09-11 | Detect anomalous quartic gauge couplings at muon colliders with quantum kernel k-means | Shuai Zhang et.al. | 2409.07010 | null |
| 2024-09-10 | Atom dimension adaptation for infinite set dictionary learning | Andra Băltoiu et.al. | 2409.06831 | null |
| 2024-09-09 | Kramnik vs Nakamura: A Chess Scandal | Shiva Maharaj et.al. | 2409.06739 | null |
| 2024-09-10 | GeMuCo: Generalized Multisensory Correlational Model for Body Schema Learning | Kento Kawaharazuka et.al. | 2409.06427 | null |
| 2024-09-10 | Texture-AD: An Anomaly Detection Dataset and Benchmark for Real Algorithm Development | Tianwu Lei et.al. | 2409.06367 | null |
| 2024-09-10 | Context Enhancement with Reconstruction as Sequence for Unified Unsupervised Anomaly Detection | Hui-Yue Yang et.al. | 2409.06285 | link |
| 2024-09-09 | DetoxBench: Benchmarking Large Language Models for Multitask Fraud & Abuse Detection | Joymallya Chakraborty et.al. | 2409.06072 | null |
| 2024-09-09 | Zero-shot Outlier Detection via Prior-data Fitted Networks: Model Selection Bygone! | Yuchen Shen et.al. | 2409.05672 | link |
| 2024-09-09 | Adapted-MoE: Mixture of Experts with Test-Time Adaption for Anomaly Detection | Tianwu Lei et.al. | 2409.05611 | null |
| 2024-09-09 | A Novel Representation of Periodic Pattern and Its Application to Untrained Anomaly Detection | Peng Ye et.al. | 2409.05389 | null |
| 2024-09-09 | Deep Learning for Video Anomaly Detection: A Review | Peng Wu et.al. | 2409.05383 | null |
| 2024-09-09 | Memoryless Multimodal Anomaly Detection via Student-Teacher Network and Signed Distance Learning | Zhongbin Sun et.al. | 2409.05378 | null |
| 2024-09-09 | GDFlow: Anomaly Detection with NCDE-based Normalizing Flow for Advanced Driver Assistance System | Kangjun Lee et.al. | 2409.05346 | null |
| 2024-09-08 | NetDPSyn: Synthesizing Network Traces under Differential Privacy | Danyu Sun et.al. | 2409.05249 | null |
| 2024-09-08 | Lung-DETR: Deformable Detection Transformer for Sparse Lung Nodule Anomaly Detection | Hooman Ramezani et.al. | 2409.05200 | null |
| 2024-09-08 | 2DSig-Detect: a semi-supervised framework for anomaly detection on image data using 2D-signatures | Xinheng Xie et.al. | 2409.04982 | null |
| 2024-09-08 | Anomaly Detection for Real-World Cyber-Physical Security using Quantum Hybrid Support Vector Machines | Tyler Cultice et.al. | 2409.04935 | null |
| 2024-09-06 | Evaluating Fairness in Transaction Fraud Models: Fairness Metrics, Bias Audits, and Challenges | Parameswaran Kamalaruban et.al. | 2409.04373 | null |
| 2024-09-06 | Unmasking Covert Intrusions: Detection of Fault-Masking Cyberattacks on Differential Protection Systems | Ahmad Mohammad Saber et.al. | 2409.04242 | null |
| 2024-09-06 | Ultra-imbalanced classification guided by statistical information | Yin Jin et.al. | 2409.04101 | null |
| 2024-09-05 | Unsupervised Anomaly Detection and Localization with Generative Adversarial Networks | Khouloud Abdelli et.al. | 2409.03657 | null |
| 2024-09-05 | A Dual-Path Framework with Frequency-and-Time Excited Network for Anomalous Sound Detection | Yucong Zhang et.al. | 2409.03610 | null |
| 2024-09-05 | CTMBIDS: Convolutional Tsetlin Machine Based Intrusion Detection System for DDoS attacks in an SDN environment | Rasoul Jafari Gohari et.al. | 2409.03544 | null |
| 2024-09-05 | Unveiling Context-Related Anomalies: Knowledge Graph Empowered Decoupling of Scene and Action for Human-Related Video Anomaly Detection | Chenglizhao Chen et.al. | 2409.03236 | link |
| 2024-09-05 | Towards Autonomous Cybersecurity: An Intelligent AutoML Framework for Autonomous Intrusion Detection | Li Yang et.al. | 2409.03141 | link |
| 2024-09-04 | ADFilter – A Web Tool for New Physics Searches With Autoencoder-Based Anomaly Detection Using Deep Unsupervised Neural Networks | Sergei V. Chekanov et.al. | 2409.03065 | null |
| 2024-09-04 | Oddballness: universal anomaly detection with language models | Filip Graliński et.al. | 2409.03046 | null |
| 2024-09-04 | NUMOSIM: A Synthetic Mobility Dataset with Anomaly Detection Benchmarks | Chris Stanford et.al. | 2409.03024 | null |
| 2024-09-04 | SDOoop: Capturing Periodical Patterns and Out-of-phase Anomalies in Streaming Data Analysis | Alexander Hartl et.al. | 2409.02973 | link |
| 2024-09-04 | Anomaly Detection in Offshore Open Radio Access Network Using Long Short-Term Memory Models on a Novel Artificial Intelligence-Driven Cloud-Native Data Platform | Abdelrahim Ahmad et.al. | 2409.02849 | null |
| 2024-09-03 | TimeDiT: General-purpose Diffusion Transformers for Time Series Foundation Model | Defu Cao et.al. | 2409.02322 | null |
| 2024-09-03 | Generalized implementation of invariant coordinate selection with positive semi-definite scatter matrices | Aurore Archimbaud et.al. | 2409.02258 | null |
| 2024-09-02 | AutoEncoder Convolutional Neural Network for Pneumonia Detection | Michael Nosa-Omoruyi et.al. | 2409.02142 | null |
| 2024-09-02 | The Role of Transformer Models in Advancing Blockchain Technology: A Systematic Review | Tianxu Liu et.al. | 2409.02139 | null |
| 2024-09-03 | Synthetic Data Generation and Automated Multidimensional Data Labeling for AI/ML in General and Circular Coordinates | Alice Williams et.al. | 2409.02079 | null |
| 2024-09-03 | Activity-Guided Industrial Anomalous Sound Detection against Interferences | Yunjoo Lee et.al. | 2409.01885 | null |
| 2024-09-03 | Interpreting Outliers in Time Series Data through Decoding Autoencoder | Patrick Knab et.al. | 2409.01713 | null |
| 2024-09-03 | Improving Robustness of Spectrogram Classifiers with Neural Stochastic Differential Equations | Joel Brogan et.al. | 2409.01532 | null |
| 2024-09-02 | VQ-Flow: Taming Normalizing Flows for Multi-Class Anomaly Detection via Hierarchical Vector Quantization | Yixuan Zhou et.al. | 2409.00942 | link |
| 2024-08-30 | Semi-supervised permutation invariant particle-level anomaly detection | Gabriel Matos et.al. | 2408.17409 | null |
| 2024-08-30 | C-RADAR: A Centralized Deep Learning System for Intrusion Detection in Software Defined Networks | Osama Mustafa et.al. | 2408.17356 | null |
| 2024-08-30 | AASIST3: KAN-Enhanced AASIST Speech Deepfake Detection using SSL Features and Additional Regularization for the ASVspoof 2024 Challenge | Kirill Borodin et.al. | 2408.17352 | null |
| 2024-08-30 | AI-Driven Intrusion Detection Systems (IDS) on the ROAD dataset: A Comparative Analysis for automotive Controller Area Network (CAN) | Lorenzo Guerra et.al. | 2408.17235 | null |
| 2024-08-30 | Self-supervised Anomaly Detection Pretraining Enhances Long-tail ECG Diagnosis | Aofan Jiang et.al. | 2408.17154 | link |
| 2024-08-30 | Meta-UAD: A Meta-Learning Scheme for User-level Network Traffic Anomaly Detection | Tongtong Feng et.al. | 2408.17031 | null |
| 2024-08-29 | HLogformer: A Hierarchical Transformer for Representing Log Data | Zhichao Hou et.al. | 2408.16803 | null |
| 2024-08-30 | ARINC 429 Cyber-vulnerabilities and Voltage Data in a Hardware-in-the-Loop Simulator | Connor Trask et.al. | 2408.16714 | null |
| 2024-08-29 | Data Quality Monitoring through Transfer Learning on Anomaly Detection for the Hadron Calorimeters | Mulugeta Weldezgina Asres et.al. | 2408.16612 | null |
| 2024-08-29 | Multitask learning for improved scour detection: A dynamic wave tank study | Simon M. Brealy et.al. | 2408.16527 | link |
| 2024-08-29 | Uni-3DAD: GAN-Inversion Aided Universal 3D Anomaly Detection on Model-free Products | Jiayu Liu et.al. | 2408.16201 | null |
| 2024-08-29 | Real-Time Energy Pricing in New Zealand: An Evolving Stream Analysis | Yibin Sun et.al. | 2408.16187 | null |
| 2024-08-28 | Systematic Evaluation of Synthetic Data Augmentation for Multi-class NetFlow Traffic | Maximilian Wolf et.al. | 2408.16034 | null |
| 2024-08-28 | Efficient Slice Anomaly Detection Network for 3D Brain MRI Volume | Zeduo Zhang et.al. | 2408.15958 | null |
| 2024-08-29 | Enhancing Intrusion Detection in IoT Environments: An Advanced Ensemble Approach Using Kolmogorov-Arnold Networks | Amar Amouri et.al. | 2408.15886 | null |
| 2024-08-28 | Robust Statistical Scaling of Outlier Scores: Improving the Quality of Outlier Probabilities for Outliers (Extended Version) | Philipp Röchner et.al. | 2408.15874 | null |
| 2024-08-28 | CSAD: Unsupervised Component Segmentation for Logical Anomaly Detection | Yu-Hsuan Hsieh et.al. | 2408.15628 | link |
| 2024-08-29 | VFLIP: A Backdoor Defense for Vertical Federated Learning via Identification and Purification | Yungi Cho et.al. | 2408.15591 | link |
| 2024-08-27 | PoseWatch: A Transformer-based Architecture for Human-centric Video Anomaly Detection Using Spatio-temporal Pose Tokenization | Ghazal Alinezhad Noghre et.al. | 2408.15185 | null |
| 2024-08-28 | AnomalousPatchCore: Exploring the Use of Anomalous Samples in Industrial Anomaly Detection | Mykhailo Koshil et.al. | 2408.15113 | null |
| 2024-08-27 | ERX: A Fast Real-Time Anomaly Detection Algorithm for Hyperspectral Line-Scanning | Samuel Garske et.al. | 2408.14947 | link |
| 2024-08-28 | User-level Social Multimedia Traffic Anomaly Detection with Meta-Learning | Tongtong Feng et.al. | 2408.14884 | null |
| 2024-08-27 | Channel-wise Influence: Estimating Data Influence for Multivariate Time Series | Muyao Wang et.al. | 2408.14763 | null |
| 2024-08-27 | Training-Free Time-Series Anomaly Detection: Leveraging Image Foundation Models | Nobuo Namura et.al. | 2408.14756 | null |
| 2024-08-26 | Anomaly Detection Within Mission-Critical Call Processing | Sean Doris et.al. | 2408.14599 | null |
| 2024-08-26 | Aiding Humans in Financial Fraud Decision Making: Toward an XAI-Visualization Framework | Angelos Chatzimparmpas et.al. | 2408.14552 | null |
| 2024-08-26 | PHEVA: A Privacy-preserving Human-centric Video Anomaly Detection Dataset | Ghazal Alinezhad Noghre et.al. | 2408.14329 | link |
| 2024-08-26 | Beyond Detection: Leveraging Large Language Models for Cyber Attack Prediction in IoT Networks | Alaeddine Diaf et.al. | 2408.14045 | null |
| 2024-08-26 | Evaluating The Explainability of State-of-the-Art Machine Learning-based IoT Network Intrusion Detection Systems | Ayush Kumar et.al. | 2408.14040 | null |
| 2024-08-25 | Time Series Analysis for Education: Methods, Applications, and Future Directions | Shengzhong Mao et.al. | 2408.13960 | link |
| 2024-08-24 | Outlier Detection Bias Busted: Understanding Sources of Algorithmic Bias through Data-centric Factors | Xueying Ding et.al. | 2408.13667 | null |
| 2024-08-24 | Temporal Divide-and-Conquer Anomaly Actions Localization in Semi-Supervised Videos with Hierarchical Transformer | Nada Osman et.al. | 2408.13643 | null |
| 2024-08-24 | Robust Principal Components by Casewise and Cellwise Weighting | Fabio Centofanti et.al. | 2408.13596 | null |
| 2024-08-24 | Variational Autoencoder for Anomaly Detection: A Comparative Study | Huy Hoang Nguyen et.al. | 2408.13561 | link |
| 2024-08-24 | AnoPLe: Few-Shot Anomaly Detection via Bi-directional Prompt Learning with Only Normal Samples | Yujin Lee et.al. | 2408.13516 | link |
| 2024-08-24 | DualAnoDiff: Dual-Interrelated Diffusion Model for Few-Shot Anomaly Image Generation | Ying Jin et.al. | 2408.13509 | null |
| 2024-08-23 | Multivariate Time-Series Anomaly Detection based on Enhancing Graph Attention Networks with Topological Analysis | Zhe Liu et.al. | 2408.13082 | link |
| 2024-08-23 | RIFF: Inducing Rules for Fraud Detection from Decision Trees | João Lucas Martins et.al. | 2408.12989 | null |
| 2024-08-23 | Efficient Training Approaches for Performance Anomaly Detection Models in Edge Computing Environments | Duneesha Fernando et.al. | 2408.12855 | null |
| 2024-08-22 | UMAD: University of Macau Anomaly Detection Benchmark Dataset | Dong Li et.al. | 2408.12527 | link |
| 2024-08-22 | Multimodal Foundational Models for Unsupervised 3D General Obstacle Detection | Tamás Matuszka et.al. | 2408.12322 | null |
| 2024-08-23 | Enhanced Fine-Tuning of Lightweight Domain-Specific Q&A Model Based on Large Language Models | Shenglin Zhang et.al. | 2408.12247 | link |
| 2024-08-21 | Explainable Anomaly Detection: Counterfactual driven What-If Analysis | Logan Cummins et.al. | 2408.11935 | null |
| 2024-08-21 | RODEM Jet Datasets | Knut Zoch et.al. | 2408.11616 | null |
| 2024-08-21 | Self-Supervised Iterative Refinement for Anomaly Detection in Industrial Quality Control | Muhammad Aqeel et.al. | 2408.11561 | null |
| 2024-08-21 | Hypergraph Learning based Recommender System for Anomaly Detection, Control and Optimization | Sakhinana Sagar Srinivas et.al. | 2408.11359 | null |
| 2024-08-20 | Quantum Machine Learning Algorithms for Anomaly Detection: a Survey | Sebastiano Corli et.al. | 2408.11047 | null |
| 2024-08-20 | Universal Novelty Detection Through Adaptive Contrastive Learning | Hossein Mirzaei et.al. | 2408.10798 | link |
| 2024-08-20 | Physics-Driven AI Correction in Laser Absorption Sensing Quantification | Ruiyuan Kang et.al. | 2408.10714 | null |
| 2024-08-19 | Forecasting Attacker Actions using Alert-driven Attack Graphs | Ion Băbălău et.al. | 2408.09888 | null |
| 2024-08-19 | ALTBI: Constructing Improved Outlier Detection Models via Optimization of Inlier-Memorization Effect | Seoyoung Cho et.al. | 2408.09791 | null |
| 2024-08-19 | Simplicial complexes in network intrusion profiling | Mandala von Westenholz et.al. | 2408.09788 | null |
| 2024-08-18 | Federated Graph Learning with Structure Proxy Alignment | Xingbo Fu et.al. | 2408.09393 | link |
| 2024-08-16 | Deep Generative Classification of Blood Cell Morphology | Simon Deltadahl et.al. | 2408.08982 | link |
| 2024-08-16 | A Novel Buffered Federated Learning Framework for Privacy-Driven Anomaly Detection in IIoT | Samira Kamali Poorazad et.al. | 2408.08722 | null |
| 2024-08-15 | Efficient Data-Sketches and Fine-Tuning for Early Detection of Distributional Drift in Medical Imaging | Yusen Wu et.al. | 2408.08456 | null |
| 2024-08-15 | A Robust Multi-Stage Intrusion Detection System for In-Vehicle Network Security using Hierarchical Federated Learning | Muzun Althunayyan et.al. | 2408.08433 | null |
| 2024-08-15 | HELP: Hierarchical Embeddings-based Log Parsing | Andy Xu et.al. | 2408.08300 | null |
| 2024-08-15 | Rethinking Medical Anomaly Detection in Brain MRI: An Image Quality Assessment Perspective | Zixuan Pan et.al. | 2408.08228 | link |
| 2024-08-15 | Impact of Comprehensive Data Preprocessing on Predictive Modelling of COVID-19 Mortality | Sangita Das et.al. | 2408.08142 | link |
| 2024-08-15 | Detection and Impact of Debit/Credit Card Fraud: Victims’ Experiences | Eman Alashwali et.al. | 2408.08131 | null |
| 2024-08-14 | How Industry Tackles Anomalies during Runtime: Approaches and Key Monitoring Parameters | Monika Steidl et.al. | 2408.07816 | null |
| 2024-08-14 | MedTsLLM: Leveraging LLMs for Multimodal Medical Time Series Analysis | Nimeesha Chan et.al. | 2408.07773 | link |
| 2024-08-14 | Extending Network Intrusion Detection with Enhanced Particle Swarm Optimization Techniques | Surasit Songma et.al. | 2408.07729 | null |
| 2024-08-14 | Latent Anomaly Detection Through Density Matrices | Joseph Gallego-Mejia et.al. | 2408.07623 | null |
| 2024-08-14 | Transformers and Large Language Models for Efficient Intrusion Detection Systems: A Comprehensive Survey | Hamza Kheddar et.al. | 2408.07583 | null |
| 2024-08-14 | Attention-Guided Perturbation for Unsupervised Image Anomaly Detection | Tingfeng Huang et.al. | 2408.07490 | null |
| 2024-08-14 | A novel framework for quantifying nominal outlyingness | Efthymios Costa et.al. | 2408.07463 | null |
| 2024-08-13 | FedMADE: Robust Federated Learning for Intrusion Detection in IoT Networks Using a Dynamic Aggregation Method | Shihua Sun et.al. | 2408.07152 | null |
| 2024-08-13 | Investigation of unsupervised and supervised hyperspectral anomaly detection | Mazharul Hossain et.al. | 2408.07114 | null |
| 2024-08-13 | RW-NSGCN: A Robust Approach to Structural Attacks via Negative Sampling | Shuqi He et.al. | 2408.06665 | null |
| 2024-08-13 | Unveiling the Flaws: A Critical Analysis of Initialization Effect on Time Series Anomaly Detection | Alex Koran et.al. | 2408.06620 | null |
| 2024-08-12 | Hi-SAM: A high-scalable authentication model for satellite-ground Zero-Trust system using mean field game | Xuesong Wu et.al. | 2408.06185 | null |
| 2024-08-12 | A Methodological Report on Anomaly Detection on Dynamic Knowledge Graphs | Xiaohua Lu et.al. | 2408.06121 | null |
| 2024-08-13 | Weakly Supervised Video Anomaly Detection and Localization with Spatio-Temporal Prompts | Peng Wu et.al. | 2408.05905 | null |
| 2024-08-10 | What Matters in Autonomous Driving Anomaly Detection: A Weakly Supervised Horizon | Utkarsh Tiwari et.al. | 2408.05562 | link |
| 2024-08-10 | Detecting Masquerade Attacks in Controller Area Networks Using Graph Machine Learning | William Marfo et.al. | 2408.05427 | link |
| 2024-08-09 | Hybrid Efficient Unsupervised Anomaly Detection for Early Pandemic Case Identification | Ghazal Ghajari et.al. | 2408.05347 | null |
| 2024-08-09 | Audio-visual cross-modality knowledge transfer for machine learning-based in-situ monitoring in laser additive manufacturing | Jiarui Xie et.al. | 2408.05307 | null |
| 2024-08-09 | Cross-Domain Learning for Video Anomaly Detection with Limited Supervision | Yashika Jain et.al. | 2408.05191 | null |
| 2024-08-09 | Adversarially Robust Industrial Anomaly Detection Through Diffusion Model | Yuanpu Cao et.al. | 2408.04839 | null |
| 2024-08-09 | Performance Metric for Multiple Anomaly Score Distributions with Discrete Severity Levels | Wonjun Yi et.al. | 2408.04817 | link |
| 2024-08-08 | Counter Denial of Service for Next-Generation Networks within the Artificial Intelligence and Post-Quantum Era | Saleh Darzi et.al. | 2408.04725 | null |
| 2024-08-08 | Towards High-resolution 3D Anomaly Detection via Group-Level Feature Contrastive Learning | Hongze Zhu et.al. | 2408.04604 | null |
| 2024-08-08 | FedAD-Bench: A Unified Benchmark for Federated Unsupervised Anomaly Detection in Tabular Data | Ahmed Anwar et.al. | 2408.04442 | null |
| 2024-08-09 | Anomaly Prediction: A Novel Approach with Explicit Delay and Horizon | Jiang You et.al. | 2408.04377 | null |
| 2024-08-08 | Towards Explainable Network Intrusion Detection using Large Language Models | Paul R. B. Houssel et.al. | 2408.04342 | null |
| 2024-08-08 | Self-Supervised Contrastive Graph Clustering Network via Structural Information Fusion | Xiaoyang Ji et.al. | 2408.04339 | null |
| 2024-08-08 | AI-Driven Chatbot for Intrusion Detection in Edge Networks: Enhancing Cybersecurity with Ethical User Consent | Mugheez Asif et.al. | 2408.04281 | null |
| 2024-08-08 | Generating Fine-Grained Causality in Climate Time Series Data for Forecasting and Anomaly Detection | Dongqi Fu et.al. | 2408.04254 | null |
| 2024-08-08 | Cluster-Wide Task Slowdown Detection in Cloud System | Feiyi Chen et.al. | 2408.04236 | null |
| 2024-08-07 | Programmable Dataflows: Abstraction and Programming Model for Data Sharing | Siyuan Xia et.al. | 2408.04092 | null |
| 2024-08-07 | Dual-Modeling Decouple Distillation for Unsupervised Anomaly Detection | Xinyue Liu et.al. | 2408.03888 | null |
| 2024-08-09 | Online Model-based Anomaly Detection in Multivariate Time Series: Taxonomy, Survey, Research Challenges and Future Directions | Lucas Correia et.al. | 2408.03747 | null |
| 2024-08-07 | Unsupervised Detection of Fetal Brain Anomalies using Denoising Diffusion Models | Markus Ditlev Sjøgren Olsen et.al. | 2408.03654 | null |
| 2024-08-07 | Minimum Enclosing Ball Synthetic Minority Oversampling Technique from a Geometric Perspective | Yi-Yang Shangguan et.al. | 2408.03526 | null |
| 2024-08-06 | Can LLMs Serve As Time Series Anomaly Detectors? | Manqing Dong et.al. | 2408.03475 | null |
| 2024-08-06 | CKNN: Cleansed k-Nearest Neighbor for Unsupervised Video Anomaly Detection | Jihun Yi et.al. | 2408.03014 | null |
| 2024-08-05 | Operational range bounding of spectroscopy models with anomaly detection | Luís F. Simões et.al. | 2408.02581 | null |
| 2024-08-05 | Introducing a Comprehensive, Continuous, and Collaborative Survey of Intrusion Detection Datasets | Philipp Bönninghausen et.al. | 2408.02521 | null |
| 2024-08-05 | AssemAI: Interpretable Image-Based Anomaly Detection for Manufacturing Pipelines | Renjith Prasad et.al. | 2408.02181 | null |
| 2024-08-04 | EOL: Transductive Few-Shot Open-Set Recognition by Enhancing Outlier Logits | Mateusz Ochal et.al. | 2408.02052 | null |
| 2024-08-04 | Individualized multi-horizon MRI trajectory prediction for Alzheimer’s Disease | Rosemary He et.al. | 2408.02018 | null |
| 2024-08-04 | SR-CIS: Self-Reflective Incremental System with Decoupled Memory and Reasoning | Biqing Qi et.al. | 2408.01970 | null |
| 2024-08-04 | AnomalySD: Few-Shot Multi-Class Anomaly Detection with Stable Diffusion Model | Zhenyu Yan et.al. | 2408.01960 | null |
| 2024-08-03 | Optimizing Intrusion Detection System Performance Through Synergistic Hyperparameter Tuning and Advanced Data Processing | Samia Saidane et.al. | 2408.01792 | null |
| 2024-08-03 | IDNet: A Novel Dataset for Identity Document Analysis and Fraud Detection | Hong Guan et.al. | 2408.01690 | null |
| 2024-08-02 | Interplay of Traditional Methods and Machine Learning Algorithms for Tagging Boosted Objects | Camellia Bose et.al. | 2408.01138 | null |
| 2024-08-01 | Online Detection of Anomalies in Temporal Knowledge Graphs with Interpretability | Jiasheng Zhang et.al. | 2408.00872 | null |
| 2024-08-01 | Token Interdependency Parsing (Tipping) – Fast and Accurate Log Parsing | Shayan Hashemi et.al. | 2408.00645 | null |
| 2024-08-01 | Enhancing Ethereum Fraud Detection via Generative and Contrastive Self-supervision | Chenxiang Jin et.al. | 2408.00641 | null |
| 2024-08-01 | VecAug: Unveiling Camouflaged Frauds with Cohort Augmentation for Enhanced Detection | Fei Xiao et.al. | 2408.00513 | null |
| 2024-08-01 | Enhance the Detection of DoS and Brute Force Attacks within the MQTT Environment through Feature Engineering and Employing an Ensemble Technique | Abdulelah Al Hanif et.al. | 2408.00480 | null |
| 2024-07-31 | CT-based Anomaly Detection of Liver Tumors Using Generative Diffusion Prior | Yongyi Shi et.al. | 2408.00092 | null |
| 2024-07-31 | Generalized Out-of-Distribution Detection and Beyond in Vision Language Model Era: A Survey | Atsuyuki Miyai et.al. | 2407.21794 | null |
| 2024-07-31 | Artificial Intelligence Approaches for Energy Efficiency: A Review | Alberto Pasqualetto et.al. | 2407.21726 | null |
| 2024-07-31 | Small Object Few-shot Segmentation for Vision-based Industrial Inspection | Zilong Zhang et.al. | 2407.21351 | null |
| 2024-08-01 | Outlier Detection in Large Radiological Datasets using UMAP | Mohammad Tariqul Islam et.al. | 2407.21263 | null |
| 2024-07-30 | FCN4Flare: Fully Convolution Neural Networks for Flare Detection | Ming-Hui Jia et.al. | 2407.21240 | link |
| 2024-07-30 | Efficient Quantum One-Class Support Vector Machines for Anomaly Detection Using Randomized Measurements and Variable Subsampling | Michael Kölle et.al. | 2407.20753 | null |
| 2024-07-30 | Time Series Anomaly Detection with CNN for Environmental Sensors in Healthcare-IoT | Mirza Akhi Khatun et.al. | 2407.20695 | null |
| 2024-07-30 | DocXPand-25k: a large and diverse benchmark dataset for identity documents analysis | Julien Lerouge et.al. | 2407.20662 | link |
| 2024-07-29 | Can I trust my anomaly detection system? A case study based on explainable AI | Muhammad Rashid et.al. | 2407.19951 | link |
| 2024-07-29 | Anomalous State Sequence Modeling to Enhance Safety in Reinforcement Learning | Leen Kweider et.al. | 2407.19860 | null |
| 2024-07-29 | Normality Addition via Normality Detection in Industrial Image Anomaly Detection Models | Jihun Yi et.al. | 2407.19849 | null |
| 2024-07-29 | Detecting Unsafe Behavior in Neural Network Imitation Policies for Caregiving Robotics | Andrii Tytarenko et.al. | 2407.19819 | null |
| 2024-07-29 | Accelerating template generation in resonant anomaly detection searches with optimal transport | Matthew Leigh et.al. | 2407.19818 | null |
| 2024-07-29 | Application of Computer Technology in Financial Investment | Xinye Sha et.al. | 2407.19684 | null |
| 2024-07-29 | Foundations for Unfairness in Anomaly Detection – Case Studies in Facial Imaging Data | Michael Livanos et.al. | 2407.19646 | null |
| 2024-07-26 | HADES: Detecting Active Directory Attacks via Whole Network Provenance Analytics | Qi Liu et.al. | 2407.18858 | null |
| 2024-07-26 | Homomorphic Encryption-Enabled Federated Learning for Privacy-Preserving Intrusion Detection in Resource-Constrained IoV Networks | Bui Duc Manh et.al. | 2407.18503 | null |
| 2024-07-26 | Textile Anomaly Detection: Evaluation of the State-of-the-Art for Automated Quality Inspection of Carpet | Briony Forsberg et.al. | 2407.18450 | null |
| 2024-07-26 | Impact of Recurrent Neural Networks and Deep Learning Frameworks on Real-time Lightweight Time Series Anomaly Detection | Ming-Chang Lee et.al. | 2407.18439 | null |
| 2024-07-25 | Separating Novel Features for Logical Anomaly Detection: A Straightforward yet Effective Approach | Kangil Lee et.al. | 2407.17909 | null |
| 2024-07-24 | Large Language Models for Anomaly Detection in Computational Workflows: from Supervised Fine-Tuning to In-Context Learning | Hongwei Jin et.al. | 2407.17545 | link |
| 2024-07-23 | On the Relationship between $Λ$-poisedness in Derivative-Free Optimization and Outliers in Local Outlier Factor | Qi Zhang et.al. | 2407.17529 | null |
| 2024-07-25 | Looking at Model Debiasing through the Lens of Anomaly Detection | Vito Paolo Pastore et.al. | 2407.17449 | null |
| 2024-07-24 | Preliminary study on artificial intelligence methods for cybersecurity threat detection in computer networks based on raw data packets | Aleksander Ogonowski et.al. | 2407.17339 | null |
| 2024-07-24 | Global and Local Confidence Based Fraud Detection Graph Neural Network | Jiaxun Liu et.al. | 2407.17333 | null |
| 2024-07-24 | When Text and Images Don’t Mix: Bias-Correcting Language-Image Similarity Scores for Anomaly Detection | Adam Goodge et.al. | 2407.17083 | null |
| 2024-07-23 | Securing Tomorrow’s Smart Cities: Investigating Software Security in Internet of Vehicles and Deep Learning Technologies | Ridhi Jain et.al. | 2407.16410 | null |
| 2024-07-22 | AdaCLIP: Adapting CLIP with Hybrid Learnable Prompts for Zero-Shot Anomaly Detection | Yunkang Cao et.al. | 2407.15795 | link |
| 2024-07-22 | STAMP: Outlier-Aware Test-Time Adaptation with Stable Memory Replay | Yongcan Yu et.al. | 2407.15773 | link |
| 2024-07-22 | Towards Open-World Object-based Anomaly Detection via Self-Supervised Outlier Synthesis | Brian K. S. Isaac-Medina et.al. | 2407.15763 | null |
| 2024-07-22 | A Life-long Learning Intrusion Detection System for 6G-Enabled IoV | Abdelaziz Amara korba et.al. | 2407.15700 | null |
| 2024-07-22 | Semi-Supervised Learning for Anomaly Detection in Blockchain-based Supply Chains | Do Hai Son et.al. | 2407.15603 | link |
| 2024-07-23 | Bidirectional skip-frame prediction for video anomaly detection with intra-domain disparity-driven attention | Jiahao Lyu et.al. | 2407.15424 | null |
| 2024-07-21 | LSM-GNN: Large-scale Storage-based Multi-GPU GNN Training by Optimizing Data Transfer Scheme | Jeongmin Brian Park et.al. | 2407.15264 | null |
| 2024-07-21 | Diffusion Models for Unsupervised Anomaly Detection in Fetal Brain Ultrasound | Hanna Mykula et.al. | 2407.15119 | null |
| 2024-07-20 | Efficient Intrusion Detection: Combining $χ^2$ Feature Selection with CNN-BiLSTM on the UNSW-NB15 Dataset | Mohammed Jouhari et.al. | 2407.14945 | null |
| 2024-07-20 | A Two-Phase Visualization System for Continuous Human-AI Collaboration in Sequelae Analysis and Modeling | Yang Ouyang et.al. | 2407.14769 | null |
| 2024-07-19 | Evaluation of Provenance Serialisations for Astronomical Provenance | Michael A. C. Johnson et.al. | 2407.14290 | null |
| 2024-07-18 | Motif-Consistent Counterfactuals with Adversarial Refinement for Graph-Level Anomaly Detection | Chunjing Xiao et.al. | 2407.13251 | null |
| 2024-07-17 | INTELLECT: Adapting Cyber Threat Detection to Heterogeneous Computing Environments | Simone Magnani et.al. | 2407.13043 | null |
| 2024-07-17 | In-Situ Infrared Camera Monitoring for Defect and Anomaly Detection in Laser Powder Bed Fusion: Calibration, Data Mapping, and Feature Extraction | Shawn Hinnebusch et.al. | 2407.12682 | null |
| 2024-07-17 | A Brief Review of Quantum Machine Learning for Financial Services | Mina Doosti et.al. | 2407.12618 | null |
| 2024-07-17 | SigDLA: A Deep Learning Accelerator Extension for Signal Processing | Fangfa Fu et.al. | 2407.12565 | null |
| 2024-07-17 | Leveraging the Mahalanobis Distance to enhance Unsupervised Brain MRI Anomaly Detection | Finn Behrendt et.al. | 2407.12474 | link |
| 2024-07-17 | GraphGuard: Contrastive Self-Supervised Learning for Credit-Card Fraud Detection in Multi-Relational Dynamic Graphs | Kristófer Reynisson et.al. | 2407.12440 | null |
| 2024-07-17 | GeneralAD: Anomaly Detection Across Domains by Attending to Distorted Features | Luc P. J. Sträter et.al. | 2407.12427 | link |
| 2024-07-16 | The object detection method aids in image reconstruction evaluation and clinical interpretation of meniscal abnormalities | Natalia Konovalova et.al. | 2407.12184 | null |
| 2024-07-16 | Agglomerative Clustering of Simulation Output Distributions Using Regularized Wasserstein Distance | Mohammadmahdi Ghasemloo et.al. | 2407.12100 | null |
| 2024-07-16 | Learning Multi-view Anomaly Detection | Haoyang He et.al. | 2407.11935 | null |
| 2024-07-16 | Variance Norms for Kernelized Anomaly Detection | Thomas Cass et.al. | 2407.11873 | link |
| 2024-07-16 | An AI System for Continuous Knee Osteoarthritis Severity Grading Using Self-Supervised Anomaly Detection with Limited Data | Niamh Belton et.al. | 2407.11500 | link |
| 2024-07-16 | Detection of Global Anomalies on Distributed IoT Edges with Device-to-Device Communication | Hideya Ochiai et.al. | 2407.11308 | null |
| 2024-07-15 | CICAPT-IIOT: A provenance-based APT attack dataset for IIoT environment | Erfan Ghiasvand et.al. | 2407.11278 | null |
| 2024-07-15 | Impacts of Data Preprocessing and Hyperparameter Optimization on the Performance of Machine Learning Models Applied to Intrusion Detection Systems | Mateus Guimarães Lima et.al. | 2407.11105 | null |
| 2024-07-15 | R3D-AD: Reconstruction via Diffusion for 3D Anomaly Detection | Zheyuan Zhou et.al. | 2407.10862 | link |
| 2024-07-15 | An Autonomous Drone Swarm for Detecting and Tracking Anomalies among Dense Vegetation | Rakesh John Amala Arokia Nathan et.al. | 2407.10754 | null |
| 2024-07-15 | Omni-Dimensional Frequency Learner for General Time Series Analysis | Xianing Chen et.al. | 2407.10419 | null |
| 2024-07-14 | Follow the Rules: Reasoning for Video Anomaly Detection with Large Language Models | Yuchen Yang et.al. | 2407.10299 | null |
| 2024-07-14 | Harnessing Feature Clustering For Enhanced Anomaly Detection With Variational Autoencoder And Dynamic Threshold | Tolulope Ale et.al. | 2407.10042 | null |
| 2024-07-12 | BoBa: Boosting Backdoor Detection through Data Distribution Inference in Federated Learning | Ning Wang et.al. | 2407.09658 | null |
| 2024-07-12 | Unsupervised Anomaly Detection Using Diffusion Trend Analysis | Eunwoo Kim et.al. | 2407.09578 | null |
| 2024-07-12 | A Unified Anomaly Synthesis Strategy with Gradient Ascent for Industrial Anomaly Detection and Localization | Qiyu Chen et.al. | 2407.09359 | link |
| 2024-07-12 | Temporal M-quantile models and robust bias-corrected small area predictors | María Bugallo Porto et.al. | 2407.09062 | null |
| 2024-07-12 | Challenges of Anomaly Detection in the Object-Centric Setting: Dimensions and the Role of Domain Knowledge | Alessandro Berti et.al. | 2407.09023 | null |
| 2024-07-11 | A Survey on the Application of Generative Adversarial Networks in Cybersecurity: Prospective, Direction and Open Research Scopes | Md Mashrur Arifin et.al. | 2407.08839 | null |
| 2024-07-11 | Deep Learning for Network Anomaly Detection under Data Contamination: Evaluating Robustness and Mitigating Performance Degradation | D’Jeff K. Nkashama et.al. | 2407.08838 | null |
| 2024-07-11 | Real-Time Anomaly Detection and Reactive Planning with Large Language Models | Rohan Sinha et.al. | 2407.08735 | null |
| 2024-07-10 | Estimation and Control of Motor Core Temperature with Online Learning of Thermal Model Parameters: Application to Musculoskeletal Humanoids | Kento Kawaharazuka et.al. | 2407.08055 | null |
| 2024-07-10 | Unsupervised Beyond-Standard-Model Event Discovery at the LHC with a Novel Quantum Autoencoder | Callum Duffy et.al. | 2407.07961 | null |
| 2024-07-10 | GothX: a generator of customizable, legitimate and malicious IoT network traffic | Manuel Poisson et.al. | 2407.07456 | null |
| 2024-07-10 | Federated PCA on Grassmann Manifold for IoT Anomaly Detection | Tung-Anh Nguyen et.al. | 2407.07421 | link |
| 2024-07-09 | Integrating Ontology Design with the CRISP-DM in the context of Cyber-Physical Systems Maintenance | Milapji Singh Gill et.al. | 2407.06930 | null |
| 2024-07-09 | TeVAE: A Variational Autoencoder Approach for Discrete Online Anomaly Detection in Variable-state Multivariate Time-series Data | Lucas Correia et.al. | 2407.06849 | link |
| 2024-07-09 | PSPU: Enhanced Positive and Unlabeled Learning by Leveraging Pseudo Supervision | Chengjie Wang et.al. | 2407.06698 | null |
| 2024-07-09 | Ensembled Cold-Diffusion Restorations for Unsupervised Anomaly Detection | Sergio Naval Marimont et.al. | 2407.06635 | link |
| 2024-07-09 | Comparison of Optimizers for Fault Isolation and Diagnostics of Control Rod Drives | Ark Ifeanyi et.al. | 2407.06557 | null |
| 2024-07-09 | Advanced Financial Fraud Detection Using GNN-CL Model | Yu Cheng et.al. | 2407.06529 | null |
| 2024-07-09 | F2PAD: A General Optimization Framework for Feature-Level to Pixel-Level Anomaly Detection | Chengyu Tao et.al. | 2407.06519 | null |
| 2024-07-08 | Non-Robust Features are Not Always Useful in One-Class Classification | Matthew Lau et.al. | 2407.06372 | null |
| 2024-07-08 | Bounding Boxes and Probabilistic Graphical Models: Video Anomaly Detection Simplified | Mia Siemon et.al. | 2407.06000 | null |
| 2024-07-08 | Graph Anomaly Detection with Noisy Labels by Reinforcement Learning | Zhu Wang et.al. | 2407.05934 | null |
| 2024-07-08 | Multi-agent Reinforcement Learning-based Network Intrusion Detection System | Amine Tellache et.al. | 2407.05766 | null |
| 2024-07-08 | Deep Learning-based Anomaly Detection and Log Analysis for Computer Networks | Shuzhan Wang et.al. | 2407.05639 | null |
| 2024-07-08 | New User Event Prediction Through the Lens of Causal Inference | Henry Shaowu Yuchi et.al. | 2407.05625 | null |
| 2024-07-07 | CAV-AD: A Robust Framework for Detection of Anomalous Data and Malicious Sensors in CAV Networks | Md Sazedur Rahman et.al. | 2407.05461 | null |
| 2024-07-07 | Rethinking Unsupervised Outlier Detection via Multiple Thresholding | Zhonghang Liu et.al. | 2407.05382 | link |
| 2024-07-05 | SPINEX: Similarity-based Predictions with Explainable Neighbors Exploration for Anomaly and Outlier Detection | MZ Naser et.al. | 2407.04760 | null |
| 2024-07-05 | Feature Attenuation of Defective Representation Can Resolve Incomplete Masking on Anomaly Detection | YeongHyeon Park et.al. | 2407.04597 | null |
| 2024-07-05 | Machine Learning for Complex Systems with Abnormal Pattern by Exception Maximization Outlier Detection Method | Zhikun Zhang et.al. | 2407.04248 | null |
| 2024-07-04 | An Autoencoder Architecture for L-band Passive Microwave Retrieval of Landscape Freeze-Thaw Cycle | Divya Kumawat et.al. | 2407.04119 | link |
| 2024-07-04 | Looking for Tiny Defects via Forward-Backward Feature Transfer | Alex Costanzino et.al. | 2407.04092 | link |
| 2024-07-04 | A Critical Assessment of Interpretable and Explainable Machine Learning for Intrusion Detection | Omer Subasi et.al. | 2407.04009 | null |
| 2024-07-04 | Support Vector Based Anomaly Detection in Federated Learning | Massimo Frasson et.al. | 2407.03920 | null |
| 2024-07-04 | Seamless Monitoring of Stress Levels Leveraging a Universal Model for Time Sequences | Davide Gabrielli et.al. | 2407.03821 | null |
| 2024-07-04 | M5 – A Diverse Benchmark to Assess the Performance of Large Multimodal Models Across Multilingual and Multicultural Vision-Language Tasks | Florian Schneider et.al. | 2407.03791 | null |
| 2024-07-04 | Charging Ahead: A Hierarchical Adversarial Framework for Counteracting Advanced Cyber Threats in EV Charging Stations | Mohammed Al-Mehdhar et.al. | 2407.03729 | null |
| 2024-07-04 | SOWA: Adapting Hierarchical Frozen Window Self-Attention to Visual-Language Models for Better Anomaly Detection | Zongxiang Hu et.al. | 2407.03634 | link |
| 2024-07-03 | Anomaly-based Framework for Detecting Power Overloading Cyberattacks in Smart Grid AMI | Abdelaziz Amara Korba et.al. | 2407.03264 | null |
| 2024-07-03 | Towards Efficient Pixel Labeling for Industrial Anomaly Detection and Localization | Hanxi Li et.al. | 2407.03130 | null |
| 2024-07-03 | Federated Learning for Zero-Day Attack Detection in 5G and Beyond V2X Networks | Abdelaziz Amara korba et.al. | 2407.03070 | null |
| 2024-07-03 | Zero-X: A Blockchain-Enabled Open-Set Federated Learning Framework for Zero-Day Attack Detection in IoV | Abdelaziz Amara korba et.al. | 2407.02969 | null |
| 2024-07-03 | Unified Anomaly Detection methods on Edge Device using Knowledge Distillation and Quantization | Sushovan Jena et.al. | 2407.02968 | null |
| 2024-07-03 | Efficient IoT Devices Localization Through Wi-Fi CSI Feature Fusion and Anomaly Detection | Yan Li et.al. | 2407.02919 | null |
| 2024-07-03 | Domain-independent detection of known anomalies | Jonas Bühler et.al. | 2407.02910 | null |
| 2024-07-03 | Early-Stage Anomaly Detection: A Study of Model Performance on Complete vs. Partial Flows | Adrian Pekar et.al. | 2407.02856 | null |
| 2024-07-03 | FedPot: A Quality-Aware Collaborative and Incentivized Honeypot-Based Detector for Smart Grid Networks | Abdullatif Albaseer et.al. | 2407.02845 | null |
| 2024-07-03 | A Radiometric Correction based Optical Modeling Approach to Removing Reflection Noise in TLS Point Clouds of Urban Scenes | Li Fang et.al. | 2407.02830 | null |
| 2024-07-02 | Evaluating the Ability of LLMs to Solve Semantics-Aware Process Mining Tasks | Adrian Rebmann et.al. | 2407.02310 | link |
| 2024-07-02 | Counterfactual Data Augmentation with Denoising Diffusion for Graph Anomaly Detection | Chunjing Xiao et.al. | 2407.02143 | null |
| 2024-07-02 | HC-GLAD: Dual Hyperbolic Contrastive Learning for Unsupervised Graph-Level Anomaly Detection | Yali Fu et.al. | 2407.02057 | link |
| 2024-07-02 | Enhancing Multi-Class Anomaly Detection via Diffusion Refinement with Dual Conditioning | Jiawei Zhan et.al. | 2407.01905 | null |
| 2024-07-02 | LogEval: A Comprehensive Benchmark Suite for Large Language Models In Log Analysis | Tianyu Cui et.al. | 2407.01896 | link |
| 2024-07-01 | Science DMZ Networks: How Different are They Really? | Emily Mutter et.al. | 2407.01822 | null |
| 2024-07-01 | Optimization of Retrieval-Augmented Generation Context with Outlier Detection | Vitaly Bulgakov et.al. | 2407.01403 | null |
| 2024-07-01 | ToCoAD: Two-Stage Contrastive Learning for Industrial Anomaly Detection | Yun Liang et.al. | 2407.01312 | null |
| 2024-06-30 | Maximum Entropy Inverse Reinforcement Learning of Diffusion Models with Energy-Based Models | Sangwoong Yoon et.al. | 2407.00626 | null |
| 2024-06-29 | Infrared Computer Vision for Utility-Scale Photovoltaic Array Inspection | David F. Ramirez et.al. | 2407.00544 | null |
| 2024-06-28 | Odd-One-Out: Anomaly Detection by Comparing with Neighbors | Ankan Bhunia et.al. | 2406.20099 | link |
| 2024-06-28 | HAITCH: A Framework for Distortion and Motion Correction in Fetal Multi-Shell Diffusion-Weighted MRI | Haykel Snoussi et.al. | 2406.20042 | null |
| 2024-06-28 | NetNN: Neural Intrusion Detection System in Programmable Networks | Kamran Razavi et.al. | 2406.19990 | null |
| 2024-06-28 | Self-Supervised Spatial-Temporal Normality Learning for Time Series Anomaly Detection | Yutong Chen et.al. | 2406.19770 | null |
| 2024-06-28 | xSemAD: Explainable Semantic Anomaly Detection in Event Logs Using Sequence-to-Sequence Models | Kiran Busch et.al. | 2406.19763 | null |
| 2024-06-28 | CHASE: A Causal Heterogeneous Graph based Framework for Root Cause Analysis in Multimodal Microservice Systems | Ziming Zhao et.al. | 2406.19711 | null |
| 2024-06-27 | Looking 3D: Anomaly Detection with 2D-3D Alignment | Ankan Bhunia et.al. | 2406.19393 | link |
| 2024-06-27 | Hack Me If You Can: Aggregating AutoEncoders for Countering Persistent Access Threats Within Highly Imbalanced Data | Sidahmed Benabderrahmane et.al. | 2406.19220 | link |
| 2024-06-27 | QSketch: An Efficient Sketch for Weighted Cardinality Estimation in Streams | Yiyan Qi et.al. | 2406.19143 | null |
| 2024-06-27 | CLIP3D-AD: Extending CLIP for 3D Few-Shot Anomaly Detection with Multi-View Images Generation | Zuo Zuo et.al. | 2406.18941 | null |
| 2024-06-27 | Statistical Test for Data Analysis Pipeline by Selective Inference | Tomohiro Shiraishi et.al. | 2406.18902 | link |
| 2024-06-27 | MissionGNN: Hierarchical Multimodal GNN-based Weakly Supervised Video Anomaly Recognition with Mission-Specific Knowledge Graph Generation | Sanggeon Yun et.al. | 2406.18815 | null |
| 2024-06-26 | Universal Anomaly Detection at the LHC: Transforming Optimal Classifiers and the DDD Method | Sascha Caron et.al. | 2406.18469 | null |
| 2024-06-26 | Human-free Prompted Based Anomaly Detection: prompt optimization with Meta-guiding prompt scheme | Pi-Wei Chen et.al. | 2406.18197 | null |
| 2024-06-26 | View-Invariant Pixelwise Anomaly Detection in Multi-object Scenes with Adaptive View Synthesis | Subin Varghese et.al. | 2406.18012 | null |
| 2024-06-25 | European Space Agency Benchmark for Anomaly Detection in Satellite Telemetry | Krzysztof Kotowski et.al. | 2406.17826 | null |
| 2024-06-25 | Diffusion-based Adversarial Purification for Intrusion Detection | Mohamed Amine Merzouk et.al. | 2406.17606 | null |
| 2024-06-25 | SincVAE: a New Approach to Improve Anomaly Detection on EEG Data Using SincNet and Variational Autoencoder | Andrea Pollastro et.al. | 2406.17537 | null |
| 2024-06-24 | Robust Zero Trust Architecture: Joint Blockchain based Federated learning and Anomaly Detection based Framework | Shiva Raj Pokhrel et.al. | 2406.17172 | null |
| 2024-06-24 | Integrating Generative AI with Network Digital Twins for Enhanced Network Operations | Kassi Muhammad et.al. | 2406.17112 | null |
| 2024-06-24 | Deep Learning and Chaos: A combined Approach To Image Encryption and Decryption | Bharath V Nair et.al. | 2406.16792 | null |
| 2024-06-25 | Anomaly Detection based on Markov Data: A Statistical Depth Approach | Carlos Fernández et.al. | 2406.16759 | null |
| 2024-06-24 | Machine Learning with Real-time and Small Footprint Anomaly Detection System for In-Vehicle Gateway | Yi Wang et.al. | 2406.16369 | null |
| 2024-06-24 | Anomaly Detection of Tabular Data Using LLMs | Aodong Li et.al. | 2406.16308 | null |
| 2024-06-23 | Detecting Abnormal Operations in Concentrated Solar Power Plants from Irregular Sequences of Thermal Images | Sukanya Patra et.al. | 2406.16077 | null |
| 2024-06-22 | DABL: Detecting Semantic Anomalies in Business Processes Using Large Language Models | Wei Guan et.al. | 2406.15781 | null |
| 2024-06-21 | GenSQL: A Probabilistic Programming System for Querying Generative Models of Database Tables | Mathieu Huot et.al. | 2406.15652 | null |
| 2024-06-21 | Root Cause Analysis of Anomalies in 5G RAN Using Graph Neural Network and Transformer | Antor Hasan et.al. | 2406.15638 | null |
| 2024-06-24 | FT-AED: Benchmark Dataset for Early Freeway Traffic Anomalous Event Detection | Austin Coursey et.al. | 2406.15283 | null |
| 2024-06-21 | AI-based Anomaly Detection for Clinical-Grade Histopathological Diagnostics | Jonas Dippel et.al. | 2406.14866 | null |
| 2024-06-20 | Energy Mapping of Existing Building Stock in Cambridge using Energy Performance Certificates and Thermal Infrared Imagery | Yinglong He et.al. | 2406.14520 | null |
| 2024-06-20 | Rule-based outlier detection of AI-generated anatomy segmentations | Deepa Krishnaswamy et.al. | 2406.14486 | null |
| 2024-06-20 | ATAC-Net: Zoomed view works better for Anomaly Detection | Shaurya Gupta et.al. | 2406.14398 | null |
| 2024-06-20 | aeon: a Python toolkit for learning from time series | Matthew Middlehurst et.al. | 2406.14231 | link |
| 2024-06-21 | Image anomaly detection and prediction scheme based on SSA optimized ResNet50-BiGRU model | Qianhui Wan et.al. | 2406.13987 | null |
| 2024-06-19 | Benchmarking Unsupervised Online IDS for Masquerade Attacks in CAN | Pablo Moriano et.al. | 2406.13778 | link |
| 2024-06-19 | PPT-GNN: A Practical Pre-Trained Spatio-Temporal Graph Neural Network for Network Security | Louis Van Langendonck et.al. | 2406.13365 | null |
| 2024-06-19 | Enhancing supply chain security with automated machine learning | Haibo Wang et.al. | 2406.13166 | null |
| 2024-06-18 | Feasibility of Non-Line-of-Sight Integrated Sensing and Communication at mmWave | Paolo Tosi et.al. | 2406.12828 | null |
| 2024-06-18 | Online-Adaptive Anomaly Detection for Defect Identification in Aircraft Assembly | Siddhant Shete et.al. | 2406.12698 | null |
| 2024-06-18 | Tracking Real-time Anomalies in Cyber-Physical Systems Through Dynamic Behavioral Analysis | Prashanth Krishnamurthy et.al. | 2406.12438 | null |
| 2024-06-18 | A Cutting-Edge Deep Learning Method For Enhancing IoT Security | Nadia Ansar et.al. | 2406.12400 | null |
| 2024-06-18 | Self-Supervised Time-Series Anomaly Detection Using Learnable Data Augmentation | Kukjin Choi et.al. | 2406.12260 | null |
| 2024-06-18 | Holmes-VAD: Towards Unbiased and Explainable Video Anomaly Detection via Multi-modal LLM | Huaxin Zhang et.al. | 2406.12235 | link |
| 2024-06-17 | Prior Normality Prompt Transformer for Multi-class Industrial Image Anomaly Detection | Haiming Yao et.al. | 2406.11507 | null |
| 2024-06-17 | SEFraud: Graph-based Self-Explainable Fraud Detection via Interpretative Mask Learning | Kaidi Li et.al. | 2406.11389 | null |
| 2024-06-17 | VideoVista: A Versatile Benchmark for Video Understanding and Reasoning | Yunxin Li et.al. | 2406.11303 | null |
| 2024-06-18 | Make Your Home Safe: Time-aware Unsupervised User Behavior Anomaly Detection in Smart Homes via Loss-guided Mask | Jingyu Xiao et.al. | 2406.10928 | link |
| 2024-06-15 | Enhancing Anomaly Detection Generalization through Knowledge Exposure: The Dual Effects of Augmentation | Mohammad Akhavan Anvari et.al. | 2406.10617 | null |
| 2024-06-14 | Enhanced Intrusion Detection System for Multiclass Classification in UAV Networks | Safaa Menssouri et.al. | 2406.10417 | null |
| 2024-06-14 | VANE-Bench: Video Anomaly Evaluation Benchmark for Conversational LMMs | Rohit Bharadwaj et.al. | 2406.10326 | link |
| 2024-06-14 | Outlier detection in maritime environments using AIS data and deep recurrent architectures | Constantine Maganaris et.al. | 2406.09966 | null |
| 2024-06-14 | Unraveling Anomalies in Time: Unsupervised Discovery and Isolation of Anomalous Behavior in Bio-regenerative Life Support System Telemetry | Ferdinand Rewicki et.al. | 2406.09825 | link |
| 2024-06-14 | Explainable AI for Comparative Analysis of Intrusion Detection Models | Pap M. Corea et.al. | 2406.09684 | link |
| 2024-06-13 | Comparison Visual Instruction Tuning | Wei Lin et.al. | 2406.09240 | null |
| 2024-06-13 | Detection-Rate-Emphasized Multi-objective Evolutionary Feature Selection for Network Intrusion Detection | Zi-Hang Cheng et.al. | 2406.09180 | null |
| 2024-06-13 | Weakly-supervised anomaly detection for multimodal data distributions | Xu Tan et.al. | 2406.09147 | null |
| 2024-06-13 | Cross-Modal Learning for Anomaly Detection in Fused Magnesium Smelting Process: Methodology and Benchmark | Gaochang Wu et.al. | 2406.09016 | null |
| 2024-06-13 | Few-Shot Anomaly Detection via Category-Agnostic Registration Learning | Chaoqin Huang et.al. | 2406.08810 | link |
| 2024-06-12 | Large Language Model(LLM) assisted End-to-End Network Health Management based on Multi-Scale Semanticization | Fengxiao Tang et.al. | 2406.08305 | null |
| 2024-06-12 | Efficient Network Traffic Feature Sets for IoT Intrusion Detection | Miguel Silva et.al. | 2406.08042 | null |
| 2024-06-12 | Multivariate Log-based Anomaly Detection for Distributed Database | Lingzhe Zhang et.al. | 2406.07976 | null |
| 2024-06-11 | GLAD: Towards Better Reconstruction with Global and Local Adaptive Diffusion Models for Unsupervised Anomaly Detection | Hang Yao et.al. | 2406.07487 | null |
| 2024-06-11 | Anomaly Detection on Unstable Logs with GPT Models | Fatemeh Hadadi et.al. | 2406.07467 | null |
| 2024-06-11 | Global-Regularized Neighborhood Regression for Efficient Zero-Shot Texture Anomaly Detection | Haiming Yao et.al. | 2406.07333 | null |
| 2024-06-11 | Description and Discussion on DCASE 2024 Challenge Task 2: First-Shot Unsupervised Anomalous Sound Detection for Machine Condition Monitoring | Tomoya Nishida et.al. | 2406.07250 | null |
| 2024-06-11 | RAD: A Comprehensive Dataset for Benchmarking the Robustness of Image Anomaly Detection | Yuqi Cheng et.al. | 2406.07176 | null |
| 2024-06-11 | CARACAS: vehiCular ArchitectuRe for detAiled Can Attacks Simulation | Sadek Misto Kirdi et.al. | 2406.07125 | null |
| 2024-06-10 | Hybrid Video Anomaly Detection for Anomalous Scenarios in Autonomous Driving | Daniel Bogdoll et.al. | 2406.06423 | null |
| 2024-06-10 | UMAD: Unsupervised Mask-Level Anomaly Detection for Autonomous Driving | Daniel Bogdoll et.al. | 2406.06370 | null |
| 2024-06-10 | Federated learning in food research | Zuzanna Fendor et.al. | 2406.06202 | null |
| 2024-06-10 | Sequential Binary Classification for Intrusion Detection in Software Defined Networks | Ishan Chokshi et.al. | 2406.06099 | null |
| 2024-06-10 | fSEAD: a Composable FPGA-based Streaming Ensemble Anomaly Detection Library | Binglei Lou et.al. | 2406.05999 | link |
| 2024-06-08 | A Novel Generative AI-Based Framework for Anomaly Detection in Multicast Messages in Smart Grid Communications | Aydin Zaboli et.al. | 2406.05472 | null |
| 2024-06-08 | Novel Approach to Intrusion Detection: Introducing GAN-MSCNN-BILSTM with LIME Predictions | Asmaa Benchama et.al. | 2406.05443 | null |
| 2024-06-08 | RAPID: Robust APT Detection and Investigation Using Context-Aware Deep Learning | Yonatan Amaru et.al. | 2406.05362 | null |
| 2024-06-07 | GANetic Loss for Generative Adversarial Networks with a Focus on Medical Applications | Shakhnaz Akhmedova et.al. | 2406.05023 | link |
| 2024-06-07 | PolyLUT-Add: FPGA-based LUT Inference with Wide Inputs | Binglei Lou et.al. | 2406.04910 | link |
| 2024-06-07 | Higher-order Structure Based Anomaly Detection on Attributed Networks | Xu Yuan et.al. | 2406.04690 | null |
| 2024-06-07 | LogiCode: an LLM-Driven Framework for Logical Anomaly Detection | Yiheng Zhang et.al. | 2406.04687 | null |
| 2024-06-07 | A Recover-then-Discriminate Framework for Robust Anomaly Detection | Peng Xing et.al. | 2406.04608 | null |
| 2024-06-07 | Boosting Large-scale Parallel Training Efficiency with C4: A Communication-Driven Approach | Jianbo Dong et.al. | 2406.04594 | null |
| 2024-06-07 | Attention Fusion Reverse Distillation for Multi-Lighting Image Anomaly Detection | Yiheng Zhang et.al. | 2406.04573 | null |
| 2024-06-06 | Chimera: Effectively Modeling Multivariate Time Series with 2-Dimensional State Space Models | Ali Behrouz et.al. | 2406.04320 | null |
| 2024-06-06 | Generative AI-in-the-loop: Integrating LLMs and GPTs into the Next Generation Networks | Han Zhang et.al. | 2406.04276 | null |
| 2024-06-06 | Credit Card Fraud Detection Using Advanced Transformer Model | Chang Yu et.al. | 2406.03733 | null |
| 2024-06-06 | Meta-learning for Positive-unlabeled Classification | Atsutoshi Kumagai et.al. | 2406.03680 | null |
| 2024-06-05 | Advancing Anomaly Detection: Non-Semantic Financial Data Encoding with LLMs | Alexander Bakumenko et.al. | 2406.03614 | null |
| 2024-06-05 | Robust Prediction Model for Multidimensional and Unbalanced Datasets | Pooja Thakar et.al. | 2406.03507 | null |
| 2024-06-06 | ADer: A Comprehensive Benchmark for Multi-class Visual Anomaly Detection | Jiangning Zhang et.al. | 2406.03262 | link |
| 2024-06-05 | DA-Flow: Dual Attention Normalizing Flow for Skeleton-based Video Anomaly Detection | Ruituo Wu et.al. | 2406.02976 | null |
| 2024-06-05 | Multivariate Physics-Informed Convolutional Autoencoder for Anomaly Detection in Power Distribution Systems with High Penetration of DERs | Mehdi Jabbari Zideh et.al. | 2406.02927 | null |
| 2024-06-05 | Distilling Aggregated Knowledge for Weakly-Supervised Video Anomaly Detection | Jash Dalvi et.al. | 2406.02831 | null |
| 2024-06-04 | Feasibility of State Space Models for Network Traffic Generation | Andrew Chu et.al. | 2406.02784 | null |
| 2024-06-04 | Diagnostic Digital Twin for Anomaly Detection in Floating Offshore Wind Energy | Florian Stadtmann et.al. | 2406.02775 | null |
| 2024-06-04 | Lightweight CNN-BiLSTM based Intrusion Detection Systems for Resource-Constrained IoT Devices | Mohammed Jouhari et.al. | 2406.02768 | null |
| 2024-06-04 | Pancreatic Tumor Segmentation as Anomaly Detection in CT Images Using Denoising Diffusion Models | Reza Babaei et.al. | 2406.02653 | null |
| 2024-06-04 | PeFAD: A Parameter-Efficient Federated Framework for Time Series Anomaly Detection | Ronghui Xu et.al. | 2406.02318 | null |
| 2024-06-04 | M3DM-NR: RGB-3D Noisy-Resistant Industrial Anomaly Detection via Multimodal Denoising | Chengjie Wang et.al. | 2406.02263 | null |
| 2024-06-04 | Review of searches for new physics at CMS | Anne-Mazarine Lyon et.al. | 2406.02010 | null |
| 2024-06-04 | Can Dense Connectivity Benefit Outlier Detection? An Odyssey with NAS | Hao Fu et.al. | 2406.01975 | null |
| 2024-06-03 | Diffusion Boosted Trees | Xizewen Han et.al. | 2406.01813 | null |
| 2024-06-03 | An Origami-Inspired Endoscopic Capsule with Tactile Perception for Early Tissue Anomaly Detection | Yukun Ge et.al. | 2406.01371 | null |
| 2024-06-03 | CUT: A Controllable, Universal, and Training-Free Visual Anomaly Generation Framework | Han Sun et.al. | 2406.01078 | null |
| 2024-06-03 | Enhancing Fairness in Unsupervised Graph Anomaly Detection through Disentanglement | Wenjing Chang et.al. | 2406.00987 | null |
| 2024-06-03 | A Synergistic Approach In Network Intrusion Detection By Neurosymbolic AI | Alice Bizzarri et.al. | 2406.00938 | null |
| 2024-06-02 | Expanding the Attack Scenarios of SAE J1939: A Comprehensive Analysis of Established and Novel Vulnerabilities in Transport Protocol | Hwejae Lee et.al. | 2406.00810 | null |
| 2024-05-30 | Optimizing cnn-Bigru performance: Mish activation and comparative analysis with Relu | Asmaa Benchama et.al. | 2405.20503 | null |
| 2024-05-30 | From Zero to Hero: Cold-Start Anomaly Detection | Tal Reiss et.al. | 2405.20341 | link |
| 2024-05-30 | The Solar System Notification Alert Processing System (SNAPS): Asteroid Population Outlier Detection | Michael Gowanlock et.al. | 2405.20176 | null |
| 2024-05-30 | Deep Reinforcement Learning for Intrusion Detection in IoT: A Survey | Afrah Gueriani et.al. | 2405.20038 | null |
| 2024-05-30 | Joint Selective State Space Model and Detrending for Robust Time Series Anomaly Detection | Junqi Chen et.al. | 2405.19823 | null |
| 2024-05-30 | Performance Examination of Symbolic Aggregate Approximation in IoT Applications | Suzana Veljanovska et.al. | 2405.19817 | null |
| 2024-05-29 | Video Anomaly Detection in 10 Years: A Survey and Outlook | Moshira Abdalla et.al. | 2405.19387 | null |
| 2024-05-29 | Comparative Study of Neighbor-based Methods for Local Outlier Detection | Zhuang Qi et.al. | 2405.19247 | null |
| 2024-05-29 | Early Detection of Critical Urban Events using Mobile Phone Network Data | Pierre Lemaire et.al. | 2405.19125 | null |
| 2024-05-29 | A Mallows-like Criterion for Anomaly Detection with Random Forest Implementation | Gaoxiang Zhao et.al. | 2405.18932 | null |
| 2024-05-29 | Deep Positive-Unlabeled Anomaly Detection for Contaminated Unlabeled Data | Hiroshi Takahashi et.al. | 2405.18929 | link |
| 2024-05-29 | Anomaly Detection by Context Contrasting | Alain Ryser et.al. | 2405.18848 | null |
| 2024-05-28 | When and How Does In-Distribution Label Help Out-of-Distribution Detection? | Xuefeng Du et.al. | 2405.18635 | link |
| 2024-05-28 | Enhancing IoT Security with CNN and LSTM-Based Intrusion Detection Systems | Afrah Gueriani et.al. | 2405.18624 | null |
| 2024-05-28 | Anomaly detection for the identification of volcanic unrest in satellite imagery | Robert Gabriel Popescu et.al. | 2405.18487 | null |
| 2024-05-28 | Long Short-Term Memory Networks for Anomaly Detection in Magnet Power Supplies of Particle Accelerators | Ihar Lobach et.al. | 2405.18321 | null |
| 2024-05-28 | Learning-Based Link Anomaly Detection in Continuous-Time Dynamic Graphs | Tim Poštuvan et.al. | 2405.18050 | link |
| 2024-05-28 | On Robust Clustering of Temporal Point Process | Yuecheng Zhang et.al. | 2405.17828 | null |
| 2024-05-27 | SmoothGNN: Smoothing-based GNN for Unsupervised Node Anomaly Detection | Xiangyu Dong et.al. | 2405.17525 | null |
| 2024-05-27 | Survey of Graph Neural Network for Internet of Things and NextG Networks | Sabarish Krishna Moorthy et.al. | 2405.17309 | null |
| 2024-05-27 | Hawk: Learning to Understand Open-World Video Anomalies | Jiaqi Tang et.al. | 2405.16886 | link |
| 2024-05-27 | ARC: A Generalist Graph Anomaly Detector with In-Context Learning | Yixin Liu et.al. | 2405.16771 | null |
| 2024-05-26 | A Study on Unsupervised Anomaly Detection and Defect Localization using Generative Model in Ultrasonic Non-Destructive Testing | Yusaku Ando et.al. | 2405.16580 | null |
| 2024-05-26 | KiNETGAN: Enabling Distributed Network Intrusion Detection through Knowledge-Infused Synthetic Data Generation | Anantaa Kotal et.al. | 2405.16476 | null |
| 2024-05-25 | Qsco: A Quantum Scoring Module for Open-set Supervised Anomaly Detection | Yifeng Peng et.al. | 2405.16368 | null |
| 2024-05-25 | Acquiring Better Load Estimates by Combining Anomaly and Change-point Detection in Power Grid Time-series Measurements | Roel Bouman et.al. | 2405.16164 | link |
| 2024-05-24 | UnitNorm: Rethinking Normalization for Transformers in Time Series | Nan Huang et.al. | 2405.15903 | null |
| 2024-05-24 | Anomalous Change Point Detection Using Probabilistic Predictive Coding | Roelof G. Hup et.al. | 2405.15727 | null |
| 2024-05-24 | Large Language Models can Deliver Accurate and Interpretable Time Series Anomaly Detection | Jun Liu et.al. | 2405.15370 | null |
| 2024-05-24 | Towards a General Time Series Anomaly Detector with Adaptive Bottlenecks and Dual Adversarial Decoders | Qichao Shentu et.al. | 2405.15273 | null |
| 2024-05-23 | Large language models can be zero-shot anomaly detectors for time series? | Sarah Alnegheimish et.al. | 2405.14755 | null |
| 2024-05-23 | Applied Machine Learning to Anomaly Detection in Enterprise Purchase Processes | A. Herreros-Martínez et.al. | 2405.14754 | null |
| 2024-05-23 | AnomalyDINO: Boosting Patch-based Few-shot Anomaly Detection with DINOv2 | Simon Damm et.al. | 2405.14529 | null |
| 2024-05-23 | Dinomaly: The Less Is More Philosophy in Multi-Class Unsupervised Anomaly Detection | Jia Guo et.al. | 2405.14325 | link |
| 2024-05-22 | Uncertainty-aware Evaluation of Auxiliary Anomalies with the Expected Anomaly Posterior | Lorenzo Perini et.al. | 2405.13699 | null |
| 2024-05-22 | Challenging Gradient Boosted Decision Trees with Tabular Transformers for Fraud Detection at Booking.com | Sergei Krutikov et.al. | 2405.13692 | null |
| 2024-05-22 | GNN-based Anomaly Detection for Encoded Network Traffic | Anasuya Chattopadhyay et.al. | 2405.13670 | null |
| 2024-05-22 | LogRCA: Log-based Root Cause Analysis for Distributed Services | Thorsten Wittkopp et.al. | 2405.13599 | null |
| 2024-05-22 | Cross-Modal Distillation in Industrial Anomaly Detection: Exploring Efficient Multi-Modal IAD | Wenbo Sui et.al. | 2405.13571 | null |
| 2024-05-22 | Kinematics of Abdominal Aortic Aneurysms | Mostafa Jamshidian et.al. | 2405.13377 | null |
| 2024-05-21 | Strategic Deployment of Honeypots in Blockchain-based IoT Systems | Daniel Commey et.al. | 2405.12951 | null |
| 2024-05-21 | Spatial-aware Attention Generative Adversarial Network for Semi-supervised Anomaly Detection in Medical Image | Zerui Zhang et.al. | 2405.12872 | null |
| 2024-05-21 | Generative AI and Large Language Models for Cyber Security: All Insights You Need | Mohamed Amine Ferrag et.al. | 2405.12750 | null |
| 2024-05-21 | Multimodal video analysis for crowd anomaly detection using open access tourism cameras | Alejandro Dionis-Ros et.al. | 2405.12708 | null |
| 2024-05-21 | EntropyStop: Unsupervised Deep Outlier Detection with Loss Entropy | Yihong Huang et.al. | 2405.12502 | null |
| 2024-05-20 | Automated Anomaly Detection on European XFEL Klystrons | Antonin Sulc et.al. | 2405.12391 | null |
| 2024-05-20 | PATE: Proximity-Aware Time series anomaly Evaluation | Ramin Ghorbani et.al. | 2405.12096 | link |
| 2024-05-20 | Position-Guided Prompt Learning for Anomaly Detection in Chest X-Rays | Zhichao Sun et.al. | 2405.11976 | link |
| 2024-05-20 | Dynamic classifier auditing by unsupervised anomaly detection methods: an application in packaging industry predictive maintenance | Fernando Mateo et.al. | 2405.11960 | null |
| 2024-05-18 | MediCLIP: Adapting CLIP for Few-shot Medical Image Anomaly Detection | Ximiao Zhang et.al. | 2405.11315 | link |
| 2024-05-18 | Few-Shot API Attack Detection: Overcoming Data Scarcity with GAN-Inspired Learning | Udi Aharon et.al. | 2405.11258 | null |
| 2024-05-18 | Few-Shot API Attack Anomaly Detection in a Classification-by-Retrieval Framework | Udi Aharon et.al. | 2405.11247 | null |
| 2024-05-18 | SimAD: A Simple Dissimilarity-based Approach for Time Series Anomaly Detection | Zhijie Zhong et.al. | 2405.11238 | link |
| 2024-05-18 | OTLP: Output Thresholding Using Mixed Integer Linear Programming | Baran Koseoglu et.al. | 2405.11230 | null |
| 2024-05-18 | Enhancing Automata Learning with Statistical Machine Learning: A Network Security Case Study | Negin Ayoughi et.al. | 2405.11141 | null |
| 2024-05-17 | Safety in Graph Machine Learning: Threats and Safeguards | Song Wang et.al. | 2405.11034 | null |
| 2024-05-17 | FitNets: An Adaptive Framework to Learn Accurate Traffic Distributions | Alexander Dietmüller et.al. | 2405.10931 | null |
| 2024-05-17 | Rethinking Graph Backdoor Attacks: A Distribution-Preserving Perspective | Zhiwei Zhang et.al. | 2405.10757 | null |
| 2024-05-17 | Harnessing Collective Structure Knowledge in Data Augmentation for Graph Neural Networks | Rongrong Ma et.al. | 2405.10633 | null |
| 2024-05-17 | ECATS: Explainable-by-design concept-based anomaly detection for time series | Irene Ferfoglia et.al. | 2405.10608 | null |
| 2024-05-16 | Networking Systems for Video Anomaly Detection: A Tutorial and Survey | Jing Liu et.al. | 2405.10347 | link |
| 2024-05-16 | Applications of Quantum Machine Learning for Quantitative Finance | Piotr Mironowicz et.al. | 2405.10119 | null |
| 2024-05-16 | MiniMaxAD: A Lightweight Autoencoder for Feature-Rich Anomaly Detection | Fengjie Wang et.al. | 2405.09933 | null |
| 2024-05-15 | BARO: Robust Root Cause Analysis for Microservices via Multivariate Bayesian Online Change Point Detection | Luan Pham et.al. | 2405.09330 | link |
| 2024-05-15 | A Hierarchically Feature Reconstructed Autoencoder for Unsupervised Anomaly Detection | Honghui Chen et.al. | 2405.09148 | null |
| 2024-05-14 | Self-supervised vision-langage alignment of deep learning representations for bone X-rays analysis | Alexandre Englebert et.al. | 2405.08932 | link |
| 2024-05-14 | Incorporating Physical Priors into Weakly-Supervised Anomaly Detection | Chi Lung Cheng et.al. | 2405.08889 | null |
| 2024-05-14 | GPS-IDS: An Anomaly-based GPS Spoofing Attack Detection Framework for Autonomous Vehicles | Murad Mehrab Abrar et.al. | 2405.08359 | null |
| 2024-05-14 | Model-Free Unsupervised Anomaly detection framework in multivariate time-series of industrial dynamical systems | Mazen Alamir et.al. | 2405.08349 | null |
| 2024-05-14 | Facilitating Feature and Topology Lightweighting: An Ethereum Transaction Graph Compression Method for Malicious Account Detection | Xuanze Chen et.al. | 2405.08278 | null |
| 2024-05-13 | Enhancing Rover Mobility Monitoring: Autoencoder-driven Anomaly Detection for Curiosity | Mielad Sabzehi et.al. | 2405.07982 | null |
| 2024-05-13 | IMAFD: An Interpretable Multi-stage Approach to Flood Detection from time series Multispectral Data | Ziyang Zhang et.al. | 2405.07916 | null |
| 2024-05-13 | AnoVox: A Benchmark for Multimodal Anomaly Detection in Autonomous Driving | Daniel Bogdoll et.al. | 2405.07865 | link |
| 2024-05-13 | DeepHYDRA: Resource-Efficient Time-Series Anomaly Detection in Dynamically-Configured Systems | Franz Kevin Stehle et.al. | 2405.07749 | link |
| 2024-05-13 | AnomalyLLM: Few-shot Anomaly Edge Detection for Dynamic Graphs using Large Language Models | Shuo Liu et.al. | 2405.07626 | link |
| 2024-05-13 | RESTAD: REconstruction and Similarity based Transformer for time series Anomaly Detection | Ramin Ghorbani et.al. | 2405.07509 | link |
| 2024-05-12 | A Flow is a Stream of Packets: A Stream-Structured Data Approach for DDoS Detection | Raja Giryes et.al. | 2405.07232 | null |
| 2024-05-11 | Fractals as Pre-training Datasets for Anomaly Detection and Localization | C. I. Ugwu et.al. | 2405.06980 | null |
| 2024-05-11 | Semi-supervised Anomaly Detection via Adaptive Reinforcement Learning-Enabled Method with Causal Inference | Xiangwei Chen et.al. | 2405.06925 | null |
| 2024-05-11 | Generation of Granular-Balls for Clustering Based on the Principle of Justifiable Granularity | Zhen Zhang et.al. | 2405.06904 | null |
| 2024-05-10 | Continuous-variable Quantum Boltzmann Machine | Shikha Bangar et.al. | 2405.06580 | null |
| 2024-05-10 | Attend, Distill, Detect: Attention-aware Entropy Distillation for Anomaly Detection | Sushovan Jena et.al. | 2405.06467 | null |
| 2024-05-10 | TS3IM: Unveiling Structural Similarity in Time Series through Image Similarity Assessment Insights | Yuhan Liu et.al. | 2405.06234 | null |
| 2024-05-10 | MAPL: Memory Augmentation and Pseudo-Labeling for Semi-Supervised Anomaly Detection | Junzhuo Chen et.al. | 2405.06198 | link |
| 2024-05-10 | Anomaly Detection in Graph Structured Data: A Survey | Prabin B Lamichhane et.al. | 2405.06172 | null |
| 2024-05-09 | Advancing Anomaly Detection in Computational Workflows with Active Learning | Krishnan Raghavan et.al. | 2405.06133 | null |
| 2024-05-09 | Self-Supervised Learning of Time Series Representation via Diffusion Process and Imputation-Interpolation-Forecasting Mask | Zineb Senane et.al. | 2405.05959 | link |
| 2024-05-09 | Exploiting Autoencoder’s Weakness to Generate Pseudo Anomalies | Marcella Astrid et.al. | 2405.05886 | null |
| 2024-05-09 | PLLM-CS: Pre-trained Large Language Model (LLM) for Cyber Threat Detection in Satellite Networks | Mohammed Hassanin et.al. | 2405.05469 | null |
| 2024-05-08 | Anomaly Detection in Certificate Transparency Logs | Richard Ostertág et.al. | 2405.05206 | null |
| 2024-05-08 | Discrepancy-based Diffusion Models for Lesion Detection in Brain MRI | Keqiang Fan et.al. | 2405.04974 | null |
| 2024-05-08 | Supervised Anomaly Detection for Complex Industrial Images | Aimira Baitieva et.al. | 2405.04953 | link |
| 2024-05-08 | Persistent homology of featured time series data and its applications | Eunwoo Heo et.al. | 2405.04796 | null |
| 2024-05-08 | Dual-Image Enhanced CLIP for Zero-Shot Anomaly Detection | Zhaoxiang Zhang et.al. | 2405.04782 | null |
| 2024-05-09 | Large Language Models for Cyber Security: A Systematic Literature Review | HanXiang Xu et.al. | 2405.04760 | null |
| 2024-05-07 | Research on financial fraud algorithm based on federal learning and big data technology | Xinye Sha et.al. | 2405.03992 | null |
| 2024-05-06 | On the Influence of Data Resampling for Deep Learning-Based Log Anomaly Detection: Insights and Recommendations | Xiaoxue Ma et.al. | 2405.03489 | link |
| 2024-05-07 | A Reliable Framework for Human-in-the-Loop Anomaly Detection in Time Series | Ziquan Deng et.al. | 2405.03234 | null |
| 2024-05-06 | Braced Fourier Continuation and Regression for Anomaly Detection | Josef Sabuda et.al. | 2405.03180 | link |
| 2024-05-05 | AnoGAN for Tabular Data: A Novel Approach to Anomaly Detection | Aditya Singh et.al. | 2405.03075 | null |
| 2024-05-05 | A Model-Free Kullback-Leibler Divergence Filter for Anomaly Detection in Noisy Data Series | Ruikun Zhou et.al. | 2405.03047 | null |
| 2024-05-05 | Defense against Joint Poison and Evasion Attacks: A Case Study of DERMS | Zain ul Abdeen et.al. | 2405.02989 | null |
| 2024-05-04 | Systematic Review: Anomaly Detection in Connected and Autonomous Vehicles | J. R. V. Solaas et.al. | 2405.02731 | null |
| 2024-05-04 | Position Paper: Quo Vadis, Unsupervised Time Series Anomaly Detection? | M. Saquib Sarfraz et.al. | 2405.02678 | null |
| 2024-05-04 | Generic Multi-modal Representation Learning for Network Traffic Analysis | Luca Gioacchini et.al. | 2405.02649 | null |
| 2024-05-04 | A Data Mining-Based Dynamical Anomaly Detection Method for Integrating with an Advance Metering System | Sarit Maitra et.al. | 2405.02574 | null |
| 2024-05-03 | Subgraph2vec: A random walk-based algorithm for embedding knowledge graphs | Elika Bozorgi et.al. | 2405.02240 | null |
| 2024-05-03 | Advancing Pre-trained Teacher: Towards Robust Feature Discrepancy for Anomaly Detection | Canhui Tang et.al. | 2405.02068 | link |
| 2024-05-03 | Detecting and Deterring Manipulation in a Cognitive Hierarchy | Nitay Alon et.al. | 2405.01870 | null |
| 2024-05-02 | Language-Enhanced Latent Representations for Out-of-Distribution Detection in Autonomous Driving | Zhenjiang Mao et.al. | 2405.01691 | null |
| 2024-05-02 | GTX: A Transactional Graph Data System For HTAP Workloads | Libin Zhou et.al. | 2405.01448 | null |
| 2024-05-02 | A Framework for the Systematic Assessment of Anomaly Detectors in Time-Sensitive Automotive Networks | Philipp Meyer et.al. | 2405.01324 | null |
| 2024-05-02 | Interpretable Data-driven Anomaly Detection in Industrial Processes with ExIFFI | Davide Frizzo et.al. | 2405.01158 | null |
| 2024-05-01 | Quantum algorithms for matrix geometric means | Nana Liu et.al. | 2405.00673 | null |
| 2024-04-30 | IgCONDA-PET: Implicitly-Guided Counterfactual Diffusion for Detecting Anomalies in PET Images | Shadab Ahamed et.al. | 2405.00239 | link |
| 2024-04-30 | Uncovering What, Why and How: A Comprehensive Benchmark for Causation Understanding of Video Anomaly | Hang Du et.al. | 2405.00181 | link |
| 2024-04-30 | Rockafellian Relaxation for PDE-Constrained Optimization with Distributional Uncertainty | Harbir Antil et.al. | 2405.00176 | null |
| 2024-04-30 | Improved AutoEncoder with LSTM module and KL divergence | Wei Huang et.al. | 2404.19247 | null |
| 2024-04-29 | Enhancing IoT Security: A Novel Feature Engineering Approach for ML-Based Intrusion Detection Systems | Afsaneh Mahanipour et.al. | 2404.19114 | null |
| 2024-04-29 | A Survey on Diffusion Models for Time Series and Spatio-Temporal Data | Yiyuan Yang et.al. | 2404.18886 | link |
| 2024-04-29 | Evaluating the Effectiveness of Video Anomaly Detection in the Wild: Online Learning and Inference for Real-world Deployment | Shanle Yao et.al. | 2404.18747 | null |
| 2024-04-29 | Self-supervised learning for classifying paranasal anomalies in the maxillary sinus | Debayan Bhattacharya et.al. | 2404.18599 | link |
| 2024-04-29 | Enabling Efficient and Flexible Interpretability of Data-driven Anomaly Detection in Industrial Processes with AcME-AD | Valentina Zaccaria et.al. | 2404.18525 | link |
| 2024-04-29 | Self-supervised contrastive learning of radio data for source detection, classification and peculiar object discovery | S. Riggi et.al. | 2404.18462 | null |
| 2024-04-28 | Multi-stage Attack Detection and Prediction Using Graph Neural Networks: An IoT Feasibility Study | Hamdi Friji et.al. | 2404.18328 | null |
| 2024-04-27 | A Method of Moments Embedding Constraint and its Application to Semi-Supervised Learning | Michael Majurski et.al. | 2404.17978 | null |
| 2024-04-27 | Accurate and fast anomaly detection in industrial processes and IoT environments | Simone Tonini et.al. | 2404.17925 | null |
| 2024-04-27 | Unsupervised Anomaly Detection via Masked Diffusion Posterior Sampling | Di Wu et.al. | 2404.17900 | null |
| 2024-04-29 | Domain Adaptive and Fine-grained Anomaly Detection for Single-cell Sequencing Data and Beyond | Kaichen Xu et.al. | 2404.17454 | link |
| 2024-04-26 | Frequency-Guided Multi-Level Human Action Anomaly Detection with Normalizing Flows | Shun Maeda et.al. | 2404.17381 | null |
| 2024-04-26 | Synchronized Stepwise Control of Firing and Learning Thresholds in a Spiking Randomly Connected Neural Network toward Hardware Implementation | Kumiko Nomura et.al. | 2404.17241 | null |
| 2024-04-25 | Dr-SAM: An End-to-End Framework for Vascular Segmentation, Diameter Estimation, and Anomaly Detection on Angiography Images | Vazgen Zohranyan et.al. | 2404.17029 | null |
| 2024-04-24 | Anomaly Detection for Incident Response at Scale | Hanzhang Wang et.al. | 2404.16887 | null |
| 2024-04-25 | Guarding Graph Neural Networks for Unsupervised Graph Anomaly Detection | Yuanchen Bei et.al. | 2404.16366 | null |
| 2024-04-24 | ABCD: Trust enhanced Attention based Convolutional Autoencoder for Risk Assessment | Sarala Naidu et.al. | 2404.16183 | null |
| 2024-04-24 | S2DEVFMAP: Self-Supervised Learning Framework with Dual Ensemble Voting Fusion for Maximizing Anomaly Prediction in Timeseries | Sarala Naidu et.al. | 2404.16179 | null |
| 2024-04-24 | OmniLearn: A Method to Simultaneously Facilitate All Jet Physics Tasks | Vinicius Mikuni et.al. | 2404.16091 | link |
| 2024-04-23 | Feature Distribution Shift Mitigation with Contrastive Pretraining for Intrusion Detection | Weixing Wang et.al. | 2404.15382 | null |
| 2024-04-23 | IPAD: Industrial Process Anomaly Detection Dataset | Jinfan Liu et.al. | 2404.15033 | null |
| 2024-04-23 | Fin-Fed-OD: Federated Outlier Detection on Financial Tabular Data | Dayananda Herurkar et.al. | 2404.14933 | null |
| 2024-04-23 | A Customer Level Fraudulent Activity Detection Benchmark for Enhancing Machine Learning Model Research and Evaluation | Phoebe Jing et.al. | 2404.14746 | null |
| 2024-04-23 | Incorporating Gradients to Rules: Towards Lightweight, Adaptive Provenance-based Intrusion Detection | Lingzhi Wang et.al. | 2404.14720 | null |
| 2024-04-23 | Deep Overlapping Community Search via Subspace Embedding | Qing Sima et.al. | 2404.14692 | null |
| 2024-04-21 | A Neuro-Symbolic Explainer for Rare Events: A Case Study on Predictive Maintenance | João Gama et.al. | 2404.14455 | null |
| 2024-04-20 | Generative Subspace Adversarial Active Learning for Outlier Detection in Multiple Views of High-dimensional Data | Jose Cribeiro-Ramallo et.al. | 2404.14451 | null |
| 2024-04-22 | Explaining Arguments’ Strength: Unveiling the Role of Attacks and Supports (Technical Report) | Xiang Yin et.al. | 2404.14304 | null |
| 2024-04-21 | Detecting Compromised IoT Devices Using Autoencoders with Sequential Hypothesis Testing | Md Mainuddin et.al. | 2404.13690 | null |
| 2024-04-21 | FiLo: Zero-Shot Anomaly Detection by Fine-Grained Description and High-Quality Localization | Zhaopeng Gu et.al. | 2404.13671 | link |
| 2024-04-20 | Intrusion Detection at Scale with the Assistance of a Command-line Language Model | Jiongliang Lin et.al. | 2404.13402 | null |
| 2024-04-20 | Hyperspectral Anomaly Detection with Self-Supervised Anomaly Prior | Yidan Liu et.al. | 2404.13342 | null |
| 2024-04-20 | Multi-feature Reconstruction Network using Crossed-mask Restoration for Unsupervised Anomaly Detection | Junpu Wang et.al. | 2404.13273 | null |
| 2024-04-19 | uTRAND: Unsupervised Anomaly Detection in Traffic Trajectories | Giacomo D’Amicantonio et.al. | 2404.12712 | null |
| 2024-04-19 | Detecting Out-Of-Distribution Earth Observation Images with Diffusion Models | Georges Le Bellier et.al. | 2404.12667 | null |
| 2024-04-18 | Blind Localization and Clustering of Anomalies in Textures | Andrei-Timotei Ardelean et.al. | 2404.12246 | null |
| 2024-04-18 | Warped Time Series Anomaly Detection | Charlotte Lacoquelle et.al. | 2404.12134 | null |
| 2024-04-17 | Simulating Cloud Environments of Connected Vehicles for Anomaly Detection | M. Weiß et.al. | 2404.11740 | null |
| 2024-04-17 | Uncertainty estimation and anomaly detection in chiral effective field theory studies of key nuclear electroweak processes | Bijaya Acharya et.al. | 2404.11522 | null |
| 2024-04-19 | LogSD: Detecting Anomalies from System Logs through Self-supervised Learning and Frequency-based Masking | Yongzheng Xie et.al. | 2404.11294 | null |
| 2024-04-17 | DACAD: Domain Adaptation Contrastive Learning for Anomaly Detection in Multivariate Time Series | Zahra Zamanzadeh Darban et.al. | 2404.11269 | null |
| 2024-04-16 | Unsupervised machine learning for the detection of exotic phases in skyrmion phase diagrams | F. A. Gómez Albarracín et.al. | 2404.10943 | null |
| 2024-04-16 | Advancing Network Intrusion Detection: Integrating Graph Neural Networks with Scattering Transform and Node2Vec for Enhanced Anomaly Detection | Abdeljalil Zoubir et.al. | 2404.10800 | null |
| 2024-04-16 | Learning Feature Inversion for Multi-class Anomaly Detection under General-purpose COCO-AD Benchmark | Jiangning Zhang et.al. | 2404.10760 | link |
| 2024-04-16 | A Calibrated and Automated Simulator for Innovations in 5G | Conrado Boeira et.al. | 2404.10643 | null |
| 2024-04-16 | Community detection and anomaly prediction in dynamic networks | Hadiseh Safdari et.al. | 2404.10468 | null |
| 2024-04-16 | CARE to Compare: A real-world dataset for anomaly detection in wind turbine data | Christian Gück et.al. | 2404.10320 | null |
| 2024-04-16 | Anomaly Correction of Business Processes Using Transformer Autoencoder | Ziyou Gong et.al. | 2404.10211 | null |
| 2024-04-15 | Explainable Online Unsupervised Anomaly Detection for Cyber-Physical Systems via Causal Discovery from Time Series | Daniele Meli et.al. | 2404.09871 | null |
| 2024-04-15 | Do LLMs Understand Visual Anomalies? Uncovering LLM Capabilities in Zero-shot Anomaly Detection | Jiaqi Zhu et.al. | 2404.09654 | null |
| 2024-04-15 | Privacy-Preserving Intrusion Detection using Convolutional Neural Networks | Martin Kodys et.al. | 2404.09625 | null |
| 2024-04-14 | Machine learning-based identification of Gaia astrometric exoplanet orbits | Johannes Sahlmann et.al. | 2404.09350 | null |
| 2024-04-14 | Reap the Wild Wind: Detecting Media Storms in Large-Scale News Corpora | Dror K. Markus et.al. | 2404.09299 | null |
| 2024-04-14 | Fault Detection in Mobile Networks Using Diffusion Models | Mohamad Nabeel et.al. | 2404.09240 | null |
| 2024-04-13 | Label-free Anomaly Detection in Aerial Agricultural Images with Masked Image Modeling | Sambal Shikhar et.al. | 2404.08931 | null |
| 2024-04-12 | FastLogAD: Log Anomaly Detection with Mask-Guided Pseudo Anomaly Generation and Discrimination | Yifei Lin et.al. | 2404.08750 | link |
| 2024-04-12 | Text Prompt with Normality Guidance for Weakly Supervised Video Anomaly Detection | Zhiwei Yang et.al. | 2404.08531 | null |
| 2024-04-12 | TSLANet: Rethinking Transformers for Time Series Representation Learning | Emadeldeen Eldele et.al. | 2404.08472 | null |
| 2024-04-12 | Adaptive Anomaly Detection Disruption Prediction Starting from First Discharge | Xinkun Ai et.al. | 2404.08241 | null |
| 2024-04-12 | HCL-MTSAD: Hierarchical Contrastive Consistency Learning for Accurate Detection of Industrial Multivariate Time Series Anomalies | Haili Sun et.al. | 2404.08224 | null |
| 2024-04-11 | Anomaly Detection in Power Grids via Context-Agnostic Learning | SangWoo Park et.al. | 2404.07898 | null |
| 2024-04-11 | Context-aware Video Anomaly Detection in Long-Term Datasets | Zhengye Yang et.al. | 2404.07887 | null |
| 2024-04-11 | M-dwarf flares in the Zwicky Transient Facility data and what we can learn from them | A. S. Voloshina et.al. | 2404.07812 | null |
| 2024-04-11 | 3D-CSAD: Untrained 3D Anomaly Detection for Complex Manufacturing Surfaces | Xuanming Cao et.al. | 2404.07748 | null |
| 2024-04-11 | Multi-Image Visual Question Answering for Unsupervised Anomaly Detection | Jun Li et.al. | 2404.07622 | null |
| 2024-04-11 | Enhancing Network Intrusion Detection Performance using Generative Adversarial Networks | Xinxing Zhao et.al. | 2404.07464 | null |
| 2024-04-10 | Complete Optimal Non-Resonant Anomaly Detection | Gregor Kasieczka et.al. | 2404.07258 | null |
| 2024-04-10 | SplatPose & Detect: Pose-Agnostic 3D Anomaly Detection | Mathis Kruse et.al. | 2404.06832 | link |
| 2024-04-11 | MambaAD: Exploring State Space Models for Multi-class Unsupervised Anomaly Detection | Haoyang He et.al. | 2404.06564 | null |
| 2024-04-09 | Aggressive or Imperceptible, or Both: Network Pruning Assisted Hybrid Byzantines in Federated Learning | Emre Ozfatura et.al. | 2404.06230 | null |
| 2024-04-09 | Differential Privacy for Anomaly Detection: Analyzing the Trade-off Between Privacy and Explainability | Fatima Ezzeddine et.al. | 2404.06144 | null |
| 2024-04-09 | Supervised Contamination Detection, with Flow Cytometry Application | Solenne Gaucher et.al. | 2404.06093 | link |
| 2024-04-10 | AI-Enabled System for Efficient and Effective Cyber Incident Detection and Response in Cloud Environments | Mohammed Ashfaaq M. Farzaan et.al. | 2404.05602 | null |
| 2024-04-08 | Semi-Supervised Novelty Detection for Precise Ultra-Wideband Error Signal Prediction | Umberto Albertin et.al. | 2404.05351 | null |
| 2024-04-08 | PromptAD: Learning Prompts with only Normal Samples for Few-Shot Anomaly Detection | Xiaofan Li et.al. | 2404.05231 | link |
| 2024-04-08 | Out-of-Distribution Data: An Acquaintance of Adversarial Examples – A Survey | Naveen Karunanayake et.al. | 2404.05219 | null |
| 2024-04-07 | TimeCSL: Unsupervised Contrastive Learning of General Shapelets for Explorable Time Series Analysis | Zhiyu Liang et.al. | 2404.05057 | null |
| 2024-04-07 | Dynamic Distinction Learning: Adaptive Pseudo Anomalies for Video Anomaly Detection | Demetris Lappas et.al. | 2404.04986 | link |
| 2024-04-07 | Anomaly Detection in Electrocardiograms: Advancing Clinical Diagnosis Through Self-Supervised Learning | Aofan Jiang et.al. | 2404.04935 | null |
| 2024-04-06 | CANEDERLI: On The Impact of Adversarial Training and Transferability on CAN Intrusion Detection Systems | Francesco Marchiori et.al. | 2404.04648 | null |
| 2024-04-06 | MedIAnomaly: A comparative study of anomaly detection in medical images | Yu Cai et.al. | 2404.04518 | link |
| 2024-04-06 | Beyond the Known: Adversarial Autoencoders in Novelty Detection | Muhammad Asad et.al. | 2404.04456 | null |
| 2024-04-05 | Fusing Dictionary Learning and Support Vector Machines for Unsupervised Anomaly Detection | Paul Irofti et.al. | 2404.04064 | link |
| 2024-04-04 | A Systems Theoretic Approach to Online Machine Learning | Anli du Preez et.al. | 2404.03775 | null |
| 2024-04-04 | Test Time Training for Industrial Anomaly Segmentation | Alex Costanzino et.al. | 2404.03743 | null |
| 2024-04-04 | About Test-time training for outlier detection | Simon Klüttermann et.al. | 2404.03495 | null |
| 2024-04-03 | Transfer learning applications for anomaly detection in wind turbines | Cyriana M. A. Roelofs et.al. | 2404.03011 | null |
| 2024-04-03 | Foundation Models for Structural Health Monitoring | Luca Benfenati et.al. | 2404.02944 | link |
| 2024-04-03 | End-To-End Self-tuning Self-supervised Time Series Anomaly Detection | Boje Deforce et.al. | 2404.02865 | null |
| 2024-04-03 | QFNN-FFD: Quantum Federated Neural Network for Financial Fraud Detection | Nouhaila Innan et.al. | 2404.02595 | null |
| 2024-04-03 | Learning with errors based dynamic encryption that discloses residue signal for anomaly detection | Yeongjun Jang et.al. | 2404.02574 | null |
| 2024-04-02 | Deep Learning for AGILE Anticoincidence System’s Background Prediction from Orbital and Attitude Parameters | N. Parmiggiani et.al. | 2404.02107 | null |
| 2024-04-02 | Enhancing Functional Safety in Automotive AMS Circuits through Unsupervised Machine Learning | Ayush Arunachalam et.al. | 2404.01632 | null |
| 2024-04-02 | FLEXIS: FLEXible Frequent Subgraph Mining using Maximal Independent Sets | Akshit Sharma et.al. | 2404.01585 | null |
| 2024-04-01 | Decentralized Collaborative Learning Framework with External Privacy Leakage Analysis | Tsuyoshi Idé et.al. | 2404.01270 | null |
| 2024-04-01 | Anomaly Detection and Approximate Similarity Searches of Transients in Real-time Data Streams | P. D. Aleo et.al. | 2404.01235 | null |
| 2024-04-01 | An incremental hybrid adaptive network-based IDS in Software Defined Networks to detect stealth attacks | Abdullah H Alqahtani et.al. | 2404.01109 | null |
| 2024-04-01 | Harnessing Large Language Models for Training-free Video Anomaly Detection | Luca Zanella et.al. | 2404.01014 | null |
| 2024-04-01 | Collaborative Learning of Anomalies with Privacy (CLAP) for Unsupervised Video Anomaly Detection: A New Baseline | Anas Al-lahham et.al. | 2404.00847 | null |
| 2024-03-31 | On the True Distribution Approximation of Minimum Bayes-Risk Decoding | Atsumoto Ohashi et.al. | 2404.00752 | link |
| 2024-03-31 | Absolute-Unified Multi-Class Anomaly Detection via Class-Agnostic Distribution Alignment | Jia Guo et.al. | 2404.00724 | null |
| 2024-03-29 | Long-Tailed Anomaly Detection with Learnable Class Names | Chih-Hui Ho et.al. | 2403.20236 | null |
| 2024-03-29 | MTMMC: A Large-Scale Real-World Multi-Modal Camera Tracking Benchmark | Sanghyun Woo et.al. | 2403.20225 | null |
| 2024-03-28 | Enhancing Anomaly Detection in Financial Markets with an LLM-based Multi-Agent Framework | Taejin Park et.al. | 2403.19735 | null |
| 2024-03-28 | Quantitatively rating galaxy simulations against real observations with anomaly detection | Zehao Jin et.al. | 2403.19464 | link |
| 2024-03-28 | Genos: General In-Network Unsupervised Intrusion Detection by Rule Extraction | Ruoyu Li et.al. | 2403.19248 | link |
| 2024-03-28 | Patch Spatio-Temporal Relation Prediction for Video Anomaly Detection | Hao Shen et.al. | 2403.19111 | null |
| 2024-03-31 | Few-Shot Cross-System Anomaly Trace Classification for Microservice-based systems | Yuqing Wang et.al. | 2403.18998 | null |
| 2024-03-27 | Dealing with Imbalanced Classes in Bot-IoT Dataset | Jesse Atuhurra et.al. | 2403.18989 | null |
| 2024-03-27 | A Data-Driven Search For Mid-Infrared Excesses Among Five Million Main-Sequence FGK Stars | Gabriella Contardo et.al. | 2403.18941 | link |
| 2024-03-27 | A Transformer-Based Framework for Payload Malware Detection and Classification | Kyle Stein et.al. | 2403.18223 | null |
| 2024-03-27 | Road Obstacle Detection based on Unknown Objectness Scores | Chihiro Noguchi et.al. | 2403.18207 | null |
| 2024-03-27 | Few-shot Online Anomaly Detection and Segmentation | Shenxing Wei et.al. | 2403.18201 | null |
| 2024-03-24 | EG-ConMix: An Intrusion Detection Method based on Graph Contrastive Learning | Lijin Wu et.al. | 2403.17980 | null |
| 2024-03-26 | Practical Applications of Advanced Cloud Services and Generative AI Systems in Medical Image Analysis | Jingyu Xu et.al. | 2403.17549 | null |
| 2024-03-26 | FaultGuard: A Generative Approach to Resilient Fault Prediction in Smart Electrical Grids | Emad Efatinasab et.al. | 2403.17494 | null |
| 2024-03-27 | Expectations Versus Reality: Evaluating Intrusion Detection Systems in Practice | Jake Hesford et.al. | 2403.17458 | null |
| 2024-03-25 | The pretty bad measurement | Caleb McIrvin et.al. | 2403.17252 | null |
| 2024-03-25 | XAV: A High-Performance Regular Expression Matching Engine for Packet Processing | Jincheng Zhong et.al. | 2403.16533 | null |
| 2024-03-24 | Constricting Normal Latent Space for Anomaly Detection with Normal-only Training Data | Marcella Astrid et.al. | 2403.16270 | null |
| 2024-03-22 | Multiple-Input Auto-Encoder Guided Feature Selection for IoT Intrusion Detection Systems | Phai Vu Dinh et.al. | 2403.15511 | null |
| 2024-03-22 | Hyperbolic Metric Learning for Visual Outlier Detection | Alvaro Gonzalez-Jimenez et.al. | 2403.15260 | null |
| 2024-03-21 | A Classifier-Based Approach to Multi-Class Anomaly Detection for Astronomical Transients | Rithwik Gupta et.al. | 2403.14742 | null |
| 2024-03-21 | A task of anomaly detection for a smart satellite Internet of things system | Zilong Shao et.al. | 2403.14738 | null |
| 2024-03-21 | MULDE: Multiscale Log-Density Estimation via Denoising Score Matching for Video Anomaly Detection | Jakub Micorek et.al. | 2403.14497 | null |
| 2024-03-24 | Large Language Models for Blockchain Security: A Systematic Literature Review | Zheyuan He et.al. | 2403.14280 | null |
| 2024-03-21 | Diffusion Models with Ensembled Structure-Based Anomaly Scoring for Unsupervised Anomaly Detection | Finn Behrendt et.al. | 2403.14262 | link |
| 2024-03-21 | SoftPatch: Unsupervised Anomaly Detection with Noisy Data | Xi Jiang et.al. | 2403.14233 | link |
| 2024-03-21 | Toward Multi-class Anomaly Detection: Exploring Class-aware Unified Model against Inter-class Interference | Xi Jiang et.al. | 2403.14213 | null |
| 2024-03-21 | Deep Learning for Trajectory Data Management and Mining: A Survey and Beyond | Wei Chen et.al. | 2403.14151 | link |
| 2024-03-21 | Automatic Outlier Rectification via Optimal Transport | Jose Blanchet et.al. | 2403.14067 | null |
| 2024-03-21 | Hypothesis-Driven Deep Learning for Out of Distribution Detection | Yasith Jayawardana et.al. | 2403.14058 | null |
| 2024-03-20 | Unsupervised learning in particle physics | Jai Bardhan et.al. | 2403.13676 | null |
| 2024-03-20 | Hierarchical Gaussian Mixture Normalizing Flow Modeling for Unified Anomaly Detection | Xincheng Yao et.al. | 2403.13349 | null |
| 2024-03-19 | Wildfire danger prediction optimization with transfer learning | Spiros Maggioros et.al. | 2403.12871 | link |
| 2024-03-19 | A Comparison of Deep Learning Architectures for Spacecraft Anomaly Detection | Daniel Lakey et.al. | 2403.12864 | null |
| 2024-03-19 | Improving Interpretability of Scores in Anomaly Detection Based on Gaussian-Bernoulli Restricted Boltzmann Machine | Kaiji Sekimoto et.al. | 2403.12672 | null |
| 2024-03-19 | Real-IAD: A Real-World Multi-View Dataset for Benchmarking Versatile Industrial Anomaly Detection | Chengjie Wang et.al. | 2403.12580 | null |
| 2024-03-19 | Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images | Chaoqin Huang et.al. | 2403.12570 | link |
| 2024-03-19 | TAGS: Real-time Intrusion Detection with Tag-Propagation-based Provenance Graph Alignment on Streaming Events | Zhenyuan Li et.al. | 2403.12541 | null |
| 2024-03-19 | VisionGPT: LLM-Assisted Real-Time Anomaly Detection for Safe Visual Navigation | Hao Wang et.al. | 2403.12415 | null |
| 2024-03-19 | DMAD: Dual Memory Bank for Real-World Anomaly Detection | Jianlong Hu et.al. | 2403.12362 | null |
| 2024-03-18 | Graph-Jigsaw Conditioned Diffusion Model for Skeleton-based Video Anomaly Detection | Ali Karami et.al. | 2403.12172 | null |
| 2024-03-18 | Problem space structural adversarial attacks for Network Intrusion Detection Systems based on Graph Neural Networks | Andrea Venturi et.al. | 2403.11830 | null |
| 2024-03-18 | Binary Noise for Binary Tasks: Masked Bernoulli Diffusion for Unsupervised Anomaly Detection | Julia Wolleb et.al. | 2403.11667 | null |
| 2024-03-18 | Learning Unified Reference Representation for Unsupervised Multi-class Anomaly Detection | Liren He et.al. | 2403.11561 | null |
| 2024-03-18 | Out-of-Distribution Detection Should Use Conformal Prediction (and Vice-versa?) | Paul Novello et.al. | 2403.11532 | null |
| 2024-03-17 | Causality from Bottom to Top: A Survey | Abraham Itzhak Weinberg et.al. | 2403.11219 | null |
| 2024-03-17 | usfAD Based Effective Unknown Attack Detection Focused IDS Framework | Md. Ashraf Uddin et.al. | 2403.11180 | null |
| 2024-03-17 | Customizing Visual-Language Foundation Models for Multi-modal Anomaly Detection and Reasoning | Xiaohao Xu et.al. | 2403.11083 | link |
| 2024-03-16 | An Open-Source Experimentation Framework for the Edge Cloud Continuum | Georgios Koukis et.al. | 2403.10977 | null |
| 2024-03-16 | DTOR: Decision Tree Outlier Regressor to explain anomalies | Riccardo Crupi et.al. | 2403.10903 | link |
| 2024-03-16 | Anomaly Detection Based on Isolation Mechanisms: A Survey | Yang Cao et.al. | 2403.10802 | null |
| 2024-03-16 | Bayesian Design for Sampling Anomalous Spatio-Temporal Data | Katie Buchhorn et.al. | 2403.10791 | null |
| 2024-03-14 | Code Revert Prediction with Graph Neural Networks: A Case Study at J.P. Morgan Chase | Yulong Pei et.al. | 2403.09507 | null |
| 2024-03-14 | Anomaly Detection by Adapting a pre-trained Vision Language Model | Yuxuan Cai et.al. | 2403.09493 | null |
| 2024-03-14 | Detecting the third family of compact stars with normalizing flows | Valéria Carvalho et.al. | 2403.09398 | null |
| 2024-03-14 | Privacy Preserving Anomaly Detection on Homomorphic Encrypted Data from IoT Sensors | Anca Hangan et.al. | 2403.09322 | null |
| 2024-03-14 | Rethinking Autoencoders for Medical Anomaly Detection from A Theoretical Perspective | Yu Cai et.al. | 2403.09303 | null |
| 2024-03-14 | LAN: Learning Adaptive Neighbors for Real-Time Insider Threat Detection | Xiangrui Cai et.al. | 2403.09209 | link |
| 2024-03-14 | Spatial-temporal Memories Enhanced Graph Autoencoder for Anomaly Detection in Dynamic Graphs | Jie Liu et.al. | 2403.09039 | null |
| 2024-03-13 | Exploiting Structural Consistency of Chest Anatomy for Unsupervised Anomaly Detection in Radiography Images | Tiange Xiang et.al. | 2403.08689 | null |
| 2024-03-13 | Extracting Explanations, Justification, and Uncertainty from Black-Box Deep Neural Networks | Paul Ardis et.al. | 2403.08652 | null |
| 2024-03-13 | Caformer: Rethinking Time Series Analysis from Causal Perspective | Kexuan Zhang et.al. | 2403.08572 | null |
| 2024-03-13 | Diffusion Models with Implicit Guidance for Medical Anomaly Detection | Cosmin I. Bercea et.al. | 2403.08464 | null |
| 2024-03-13 | Validating and Exploring Large Geographic Corpora | Jonathan Dunn et.al. | 2403.08198 | null |
| 2024-03-12 | Supervised Time Series Classification for Anomaly Detection in Subsea Engineering | Ergys Çokaj et.al. | 2403.08013 | null |
| 2024-03-12 | An Interpretable Generalization Mechanism for Accurately Detecting Anomaly and Identifying Networking Intrusion Techniques | Hao-Ting Pai et.al. | 2403.07959 | null |
| 2024-03-12 | A robust SVM-based approach with feature selection and outliers detection for classification problems | Marta Baldomero-Naranjo et.al. | 2403.07753 | null |
| 2024-03-11 | Study of the Impact of the Big Data Era on Accounting and Auditing | Yuxiang Sun et.al. | 2403.07180 | null |
| 2024-03-11 | Cost-Sensitive Learning to Defer to Multiple Experts with Workload Constraints | Jean V. Alves et.al. | 2403.06906 | null |
| 2024-03-11 | Detection of Object Throwing Behavior in Surveillance Videos | Ivo P. C. Kersten et.al. | 2403.06552 | null |
| 2024-03-12 | Toward Generalist Anomaly Detection via In-context Residual Learning with Few-shot Sample Prompts | Jiawen Zhu et.al. | 2403.06495 | link |
| 2024-03-11 | When Crypto Economics Meet Graph Analytics and Learning | Bingqiao Luo et.al. | 2403.06454 | null |
| 2024-03-11 | Accelerating Sparse Tensor Decomposition Using Adaptive Linearized Representation | Jan Laukemann et.al. | 2403.06348 | null |
| 2024-03-10 | Text-Guided Variational Image Generation for Industrial Anomaly Detection and Segmentation | Mingyu Lee et.al. | 2403.06247 | null |
| 2024-03-12 | GlanceVAD: Exploring Glance Supervision for Label-efficient Video Anomaly Detection | Huaxin Zhang et.al. | 2403.06154 | link |
| 2024-03-09 | RealNet: A Feature Selection Network with Realistic Synthetic Anomaly for Anomaly Detection | Ximiao Zhang et.al. | 2403.05897 | link |
| 2024-03-08 | Learning Expressive And Generalizable Motion Features For Face Forgery Detection | Jingyi Zhang et.al. | 2403.05172 | null |
| 2024-03-08 | Simulating Battery-Powered TinyML Systems Optimised using Reinforcement Learning in Image-Based Anomaly Detection | Jared M. Ping et.al. | 2403.05106 | null |
| 2024-03-07 | Divide and Conquer: High-Resolution Industrial Anomaly Detection via Memory Efficient Tiled Ensemble | Blaž Rolih et.al. | 2403.04932 | link |
| 2024-03-07 | A Survey of Graph Neural Networks in Real world: Imbalance, Noise, Privacy and OOD Challenges | Wei Ju et.al. | 2403.04468 | null |
| 2024-03-07 | Exploring the Influence of Dimensionality Reduction on Anomaly Detection Performance in Multivariate Time Series | Mahsun Altin et.al. | 2403.04429 | link |
| 2024-03-07 | Signature Isolation Forest | Guillaume Staerman et.al. | 2403.04405 | null |
| 2024-03-07 | Effectiveness Assessment of Recent Large Vision-Language Models | Yao Jiang et.al. | 2403.04306 | null |
| 2024-03-07 | MKF-ADS: A Multi-Knowledge Fused Anomaly Detection System for Automotive | Pengzhou Cheng et.al. | 2403.04293 | null |
| 2024-03-07 | VAEMax: Open-Set Intrusion Detection based on OpenMax and Variational Autoencoder | Zhiyin Qiu et.al. | 2403.04193 | null |
| 2024-03-07 | Dual-path Frequency Discriminators for Few-shot Anomaly Detection | Yuhu Bai et.al. | 2403.04151 | null |
| 2024-03-06 | ZTRAN: Prototyping Zero Trust Security xApps for Open Radio Access Network Deployments | Aly S. Abdalla et.al. | 2403.04113 | null |
| 2024-03-06 | Three Revisits to Node-Level Graph Anomaly Detection: Outliers, Message Passing and Hyperbolic Neural Networks | Jing Gu et.al. | 2403.04010 | link |
| 2024-03-06 | Robust covariance estimation and explainable outlier detection for matrix-valued data | Marcus Mayrhofer et.al. | 2403.03975 | null |
| 2024-03-06 | Portraying the Need for Temporal Data in Flood Detection via Sentinel-1 | Xavier Bou et.al. | 2403.03671 | null |
| 2024-03-06 | Unsupervised Incremental Learning with Dual Concept Drift Detection for Identifying Anomalous Sequences | Jin Li et.al. | 2403.03576 | null |
| 2024-03-06 | Multimodal Anomaly Detection based on Deep Auto-Encoder for Object Slip Perception of Mobile Manipulation Robots | Youngjae Yoo et.al. | 2403.03563 | null |
| 2024-03-05 | Improved LiDAR Odometry and Mapping using Deep Semantic Segmentation and Novel Outliers Detection | Mohamed Afifi et.al. | 2403.03111 | null |
| 2024-03-05 | On-demand Mobility Services for Urban Resilience: A Review Towards Human-Machine Collaborative Future | Jiangbo Yu et.al. | 2403.03107 | null |
| 2024-03-05 | Self-adaptive Traffic Anomaly Detection System for IoT Smart Home Environments | Naoto Watanabe et.al. | 2403.02744 | null |
| 2024-03-05 | Interactive Continual Learning: Fast and Slow Thinking | Biqing Qi et.al. | 2403.02628 | null |
| 2024-03-04 | Towards efficient deep autoencoders for multivariate time series anomaly detection | Marcin Pietroń et.al. | 2403.02429 | null |
| 2024-03-04 | Unsupervised Distance Metric Learning for Anomaly Detection Over Multivariate Time Series | Hanyang Yuan et.al. | 2403.01895 | null |
| 2024-03-04 | CSE: Surface Anomaly Detection with Contrastively Selected Embedding | Simon Thomine et.al. | 2403.01859 | link |
| 2024-03-04 | Deployment Challenges of Industrial Intrusion Detection Systems | Konrad Wolsing et.al. | 2403.01809 | null |
| 2024-03-04 | PointCore: Efficient Unsupervised Point Cloud Anomaly Detector Using Local-Global Features | Baozhu Zhao et.al. | 2403.01804 | null |
| 2024-03-03 | Applying Self-supervised Learning to Network Intrusion Detection for Network Flows with Graph Neural Network | Renjie Xu et.al. | 2403.01501 | link |
| 2024-03-02 | AcME-AD: Accelerated Model Explanations for Anomaly Detection | Valentina Zaccaria et.al. | 2403.01245 | null |
| 2024-03-02 | Shaping Multi-Robot Patrol Performance with Heterogeneity in Individual Learning Behavior | Connor York et.al. | 2403.01181 | null |
| 2024-03-02 | Learn Suspected Anomalies from Event Prompts for Video Anomaly Detection | Chenchen Tao et.al. | 2403.01169 | null |
| 2024-03-01 | Dimensionality reduction techniques to support insider trading detection | Adele Ravagnani et.al. | 2403.00707 | null |
| 2024-03-01 | The Impact of Frequency Bands on Acoustic Anomaly Detection of Machines using Deep Learning Based Model | Tin Nguyen et.al. | 2403.00379 | null |
| 2024-03-01 | WindGP: Efficient Graph Partitioning on Heterogenous Machines | Li Zeng et.al. | 2403.00331 | null |
| 2024-02-29 | UniTS: Building a Unified Time Series Model | Shanghua Gao et.al. | 2403.00131 | link |
| 2024-02-29 | A Novel Approach to Industrial Defect Generation through Blended Latent Diffusion Model with Online Adaptation | Hanxi Li et.al. | 2402.19330 | null |
| 2024-02-29 | Anomaly Detection in Offshore Wind Turbine Structures using Hierarchical Bayesian Modelling | S. M. Smith et.al. | 2402.19295 | null |
| 2024-02-29 | A SAM-guided Two-stream Lightweight Model for Anomaly Detection | Chenghao Li et.al. | 2402.19145 | link |
| 2024-02-29 | COFT-AD: COntrastive Fine-Tuning for Few-Shot Anomaly Detection | Jingyi Liao et.al. | 2402.18998 | null |
| 2024-02-29 | Always be Pre-Training: Representation Learning for Network Intrusion Detection with GNNs | Zhengyao Gu et.al. | 2402.18986 | null |
| 2024-02-28 | Objective and Interpretable Breast Cosmesis Evaluation with Attention Guided Denoising Diffusion Anomaly Detection Model | Sangjoon Park et.al. | 2402.18362 | null |
| 2024-02-28 | Grid-Based Continuous Normal Representation for Anomaly Detection | Joo Chan Lee et.al. | 2402.18293 | link |
| 2024-02-28 | A Compact Anomaly Detection Solution for Science Instruments | Alfonso Lagares de Toledo et.al. | 2402.17961 | null |
| 2024-02-27 | Outlier-Detection for Reactive Machine Learned Potential Energy Surfaces | Luis Itza Vazquez-Salazar et.al. | 2402.17686 | null |
| 2024-02-27 | Fraud Detection with Binding Global and Local Relational Interaction | Haolin Li et.al. | 2402.17472 | null |
| 2024-02-27 | CGGM: A conditional graph generation model with adaptive sparsity for node anomaly detection in IoT networks | Xianshi Su et.al. | 2402.17363 | null |
| 2024-02-27 | Structural Teacher-Student Normality Learning for Multi-Class Anomaly Detection and Localization | Hanqiu Deng et.al. | 2402.17091 | null |
| 2024-02-26 | Deep Learning Algorithms Used in Intrusion Detection Systems – A Review | Richard Kimanzi et.al. | 2402.17020 | null |
| 2024-02-25 | An Adversarial Robustness Benchmark for Enterprise Network Intrusion Detection | João Vitorino et.al. | 2402.16912 | null |
| 2024-02-26 | Uncertainty Quantification in Anomaly Detection with Cross-Conformal $p$-Values | Oliver Hennhöfer et.al. | 2402.16388 | null |