LLM - 2026-01

Publish Date Title Authors PDF Translate Read Code
2026-01-30 FOCUS: DLLMs Know How to Tame Their Compute Bound Kaihua Liang et.al. 2601.23278 translate read null
2026-01-30 UPA: Unsupervised Prompt Agent via Tree-Based Search and Selection Siran Peng et.al. 2601.23273 translate read null
2026-01-30 TEON: Tensorized Orthonormalization Beyond Layer-Wise Muon for Large Language Model Pre-Training Ruijie Zhang et.al. 2601.23261 translate read null
2026-01-30 GrepRAG: An Empirical Study and Optimization of Grep-Like Retrieval for Code Completion Baoyi Wang et.al. 2601.23254 translate read null
2026-01-30 ShotFinder: Imagination-Driven Open-Domain Video Shot Retrieval via Web Search Tao Yu et.al. 2601.23232 translate read null
2026-01-30 Video-o3: Native Interleaved Clue Seeking for Long Video Multi-Hop Reasoning Xiangyu Zeng et.al. 2601.23224 translate read null
2026-01-30 Med-Scout: Curing MLLMs’ Geometric Blindness in Medical Perception via Geometry-Aware RL Post-Training Anglin Liu et.al. 2601.23220 translate read null
2026-01-30 High-quality generation of dynamic game content via small language models: A proof of concept Morten I. K. Munk et.al. 2601.23206 translate read null
2026-01-30 TSAQA: Time Series Analysis Question And Answering Benchmark Baoyu Jing et.al. 2601.23204 translate read null
2026-01-30 Large Language Models for Patent Classification: Strengths, Trade-offs, and the Long Tail Effect Lorenzo Emer et.al. 2601.23200 translate read null
2026-01-30 Deep Search with Hierarchical Meta-Cognitive Monitoring Inspired by Cognitive Neuroscience Zhongxiang Sun et.al. 2601.23188 translate read null
2026-01-30 ReGuLaR: Variational Latent Reasoning Guided by Rendered Chain-of-Thought Fanmeng Wang et.al. 2601.23184 translate read link
2026-01-30 TriSpec: Ternary Speculative Decoding via Lightweight Proxy Verification Haoyun Jiang et.al. 2601.23180 translate read null
2026-01-30 Make Anything Match Your Target: Universal Adversarial Perturbations against Closed-Source MLLMs via Multi-Crop Routed Meta Optimization Hui Lu et.al. 2601.23179 translate read null
2026-01-30 Probing the Trajectories of Reasoning Traces in Large Language Models Marthe Ballon et.al. 2601.23163 translate read null
2026-01-30 DIFFA-2: A Practical Diffusion Large Language Model for General Audio Understanding Jiaming Zhou et.al. 2601.23161 translate read link
2026-01-30 SPICE: Submodular Penalized Information-Conflict Selection for Efficient Large Language Model Training Powei Chang et.al. 2601.23155 translate read null
2026-01-30 Behemoth: Benchmarking Unlearning in LLMs Using Fully Synthetic Data Eugenia Iofinova et.al. 2601.23153 translate read null
2026-01-30 Hearing is Believing? Evaluating and Analyzing Audio Language Model Sycophancy with SYAUDIO Junchi Yao et.al. 2601.23149 translate read null
2026-01-30 RAudit: A Blind Auditing Protocol for Large Language Model Reasoning Edward Y. Chang et.al. 2601.23133 translate read null
2026-01-30 Secure Tool Manifest and Digital Signing Solution for Verifiable MCP and LLM Pipelines Saeid Jamshidi et.al. 2601.23132 translate read null
2026-01-30 An Automatic Deep Learning Approach for Trailer Generation through Large Language Models Roberto Balestri et.al. 2601.23121 translate read null
2026-01-30 CATTO: Balancing Preferences and Confidence in Language Models Nisarg Parikh et.al. 2601.23096 translate read null
2026-01-30 Exploring Sidewalk Sheds in New York City through Chatbot Surveys and Human Computer Interaction Junyi Li et.al. 2601.23095 translate read null
2026-01-30 WiFiPenTester: Advancing Wireless Ethical Hacking with Governed GenAI Haitham S. Al-Sinani et.al. 2601.23092 translate read null
2026-01-30 OrLog: Resolving Complex Queries with LLMs and Probabilistic Reasoning Mohanna Hoveyda et.al. 2601.23085 translate read null
2026-01-30 Character as a Latent Variable in Large Language Models: A Mechanistic Account of Emergent Misalignment and Conditional Safety Failures Yanghao Su et.al. 2601.23081 translate read null
2026-01-30 Towards Explicit Acoustic Evidence Perception in Audio LLMs for Speech Deepfake Detection Xiaoxuan Guo et.al. 2601.23066 translate read null
2026-01-30 HierLoc: Hyperbolic Entity Embeddings for Hierarchical Visual Geolocation Hari Krishna Gadi et.al. 2601.23064 translate read null
2026-01-30 Gender Disparities in StackOverflow’s Community-Based Question Answering: A Matter of Quantity versus Quality Maddalena Amendola et.al. 2601.23063 translate read null
2026-01-30 On the Impact of Code Comments for Automated Bug-Fixing: An Empirical Study Antonio Vitale et.al. 2601.23059 translate read null
2026-01-30 From Absolute to Relative: Rethinking Reward Shaping in Group-Based Reinforcement Learning Wenzhe Niu et.al. 2601.23058 translate read null
2026-01-30 From Abstract to Contextual: What LLMs Still Cannot Do in Mathematics Bowen Cao et.al. 2601.23048 translate read null
2026-01-30 Guided by Trajectories: Repairing and Rewarding Tool-Use Trajectories for Tool-Integrated Reasoning Siyu Gong et.al. 2601.23032 translate read null
2026-01-30 DimABSA: Building Multilingual and Multidomain Datasets for Dimensional Aspect-Based Sentiment Analysis Lung-Hao Lee et.al. 2601.23022 translate read null
2026-01-30 Integrating Multi-Label Classification and Generative AI for Scalable Analysis of User Feedback Sandra Loop et.al. 2601.23018 translate read null
2026-01-30 SolAgent: A Specialized Multi-Agent Framework for Solidity Code Generation Wei Chen et.al. 2601.23009 translate read null
2026-01-30 InstructDiff: Domain-Adaptive Data Selection via Differential Entropy for Efficient LLM Fine-Tuning Junyou Su et.al. 2601.23006 translate read null
2026-01-30 Bias Beyond Borders: Political Ideology Evaluation and Steering in Multilingual LLMs Afrozah Nadeem et.al. 2601.23001 translate read null
2026-01-30 Mano: Restriking Manifold Optimization for LLM Training Yufei Gu et.al. 2601.23000 translate read null
2026-01-30 Competitive Non-Clairvoyant KV-Cache Scheduling for LLM Inference Yiding Feng et.al. 2601.22996 translate read null
2026-01-30 Learnable Permutation for Structured Sparsity on Transformer Models Zekai Li et.al. 2601.22980 translate read null
2026-01-30 Quantifying Model Uniqueness in Heterogeneous AI Ecosystems Lei You et.al. 2601.22977 translate read null
2026-01-30 Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text Ximing Lu et.al. 2601.22975 translate read null
2026-01-30 MiTa: A Hierarchical Multi-Agent Collaboration Framework with Memory-integrated and Task Allocation XiaoJie Zhang et.al. 2601.22974 translate read null
2026-01-30 A Unified View of Attention and Residual Sinks: Outlier-Driven Rescaling is Essential for Transformer Training Zihan Qiu et.al. 2601.22966 translate read null
2026-01-30 SWE-Manager: Selecting and Synthesizing Golden Proposals Before Coding Boyin Tan et.al. 2601.22956 translate read null
2026-01-30 Residual Context Diffusion Language Models Yuezhou Hu et.al. 2601.22954 translate read null
2026-01-30 Sifting the Noise: A Comparative Study of LLM Agents in Vulnerability False Positive Filtering Yunpeng Xiong et.al. 2601.22952 translate read null
2026-01-30 Alignment among Language, Vision and Action Representations Nicola Milano et.al. 2601.22948 translate read null
2026-01-30 Relaxing Positional Alignment in Masked Diffusion Language Models Mengyu Ye et.al. 2601.22947 translate read null
2026-01-30 Protecting Private Code in IDE Autocomplete using Differential Privacy Evgeny Grigorenko et.al. 2601.22935 translate read null
2026-01-30 MTDrive: Multi-turn Interactive Reinforcement Learning for Autonomous Driving Xidong Li et.al. 2601.22930 translate read null
2026-01-30 LLMs Explain’t: A Post-Mortem on Semantic Interpretability in Transformer Models Alhassan Abdelhalim et.al. 2601.22928 translate read null
2026-01-30 BEAR: Towards Beam-Search-Aware Optimization for Recommendation with Large Language Models Weiqin Yang et.al. 2601.22925 translate read null
2026-01-30 Evaluating Large Language Models for Security Bug Report Prediction Farnaz Soltaniani et.al. 2601.22921 translate read null
2026-01-30 LLMDR: Large language model driven framework for missing data recovery in mixed data under low resource regime Durga Keshav et.al. 2601.22916 translate read null
2026-01-30 Game-Theoretic Co-Evolution for LLM-Based Heuristic Discovery Xinyi Ke et.al. 2601.22896 translate read null
2026-01-30 When Machines Get It Wrong: Large Language Models Perpetuate Autism Myths More Than Humans Do Eduardo C. Garrido-Merchán et.al. 2601.22893 translate read null
2026-01-30 MoVE: Mixture of Value Embeddings – A New Axis for Scaling Parametric Memory in Autoregressive Models Yangyan Li et.al. 2601.22887 translate read null
2026-01-30 Leveraging LLMs For Turkish Skill Extraction Ezgi Arslan İltüzer et.al. 2601.22885 translate read null
2026-01-30 EmoShift: Lightweight Activation Steering for Enhanced Emotion-Aware Speech Synthesis Li Zhou et.al. 2601.22873 translate read null
2026-01-30 MEnvAgent: Scalable Polyglot Environment Construction for Verifiable Software Engineering Chuanzhe Guo et.al. 2601.22859 translate read null
2026-01-30 Learning to Build Shapes by Extrusion Thor Vestergaard Christiansen et.al. 2601.22858 translate read null
2026-01-30 Hierarchical Shift Mixing – Beyond Dense Attention in Transformers Robert Forchheimer et.al. 2601.22852 translate read null
2026-01-30 When Meanings Meet: Investigating the Emergence and Quality of Shared Concept Spaces during Multilingual Language Model Training Felicia Körner et.al. 2601.22851 translate read null
2026-01-30 Hide and Seek in Embedding Space: Geometry-based Steganography and Detection in Large Language Models Charles Westphal et.al. 2601.22818 translate read null
2026-01-30 Stable Personas: Dual-Assessment of Temporal Stability in LLM-Based Human Simulation Jana Gonnermann-Müller et.al. 2601.22812 translate read null
2026-01-30 Operational Solar Flare Forecasting System Using an Explainable Large Language Model Xuebao Li et.al. 2601.22811 translate read null
2026-01-30 Clipping-Free Policy Optimization for Large Language Models Ömer Veysel Çağatan et.al. 2601.22801 translate read null
2026-01-30 Sparse or Dense? A Mechanistic Estimation of Computation Density in Transformer-based LLMs Corentin Kervadec et.al. 2601.22795 translate read null
2026-01-30 Understanding on the Edge: LLM-generated Boundary Test Explanations Sabinakhon Akbarova et.al. 2601.22791 translate read null
2026-01-30 Toward IIT-Inspired Consciousness in LLMs: A Reward-Based Learning Framework Hamid Reza Akbari et.al. 2601.22786 translate read null
2026-01-30 Compact Hypercube Embeddings for Fast Text-based Wildlife Observation Retrieval Ilyass Moummad et.al. 2601.22783 translate read null
2026-01-30 Streaming Speech Recognition with Decoder-Only Large Language Models and Latency Optimization Genshun Wan et.al. 2601.22779 translate read null
2026-01-30 RASST: Fast Cross-modal Retrieval-Augmented Simultaneous Speech Translation Jiaxuan Luo et.al. 2601.22777 translate read null
2026-01-30 TSPO: Breaking the Double Homogenization Dilemma in Multi-turn Search Policy Optimization Shichao Ma et.al. 2601.22776 translate read null
2026-01-30 How Far Can Pretrained LLMs Go in Symbolic Music? Controlled Comparisons of Supervised and Preference-based Adaptation Deepak Kumar et.al. 2601.22764 translate read null
2026-01-30 AscendCraft: Automatic Ascend NPU Kernel Generation via DSL-Guided Transcompilation Zhongzhen Wen et.al. 2601.22760 translate read null
2026-01-30 Qualitative Evaluation of LLM-Designed GUI Bartosz Sawicki et.al. 2601.22759 translate read null
2026-01-30 AutoRefine: From Trajectories to Reusable Expertise for Continual LLM Agent Refinement Libin Qiu et.al. 2601.22758 translate read null
2026-01-30 AutoMerge: Search-Based Model Merging Framework for Effective Model Reuse You Lu et.al. 2601.22748 translate read null
2026-01-30 AR-BENCH: Benchmarking Legal Reasoning with Judgment Error Detection, Classification and Correction Yifei Li et.al. 2601.22742 translate read null
2026-01-30 MM-THEBench: Do Reasoning MLLMs Think Reasonably? Zhidian Huang et.al. 2601.22735 translate read null
2026-01-30 ImgCoT: Compressing Long Chain of Thought into Compact Visual Tokens for Efficient Reasoning of Large Language Model Xiaoshu Chen et.al. 2601.22730 translate read null
2026-01-30 A Step Back: Prefix Importance Ratio Stabilizes Policy Optimization Shiye Lei et.al. 2601.22718 translate read null
2026-01-30 RealSec-bench: A Benchmark for Evaluating Secure Code Generation in Real-World Repositories Yanlin Wang et.al. 2601.22706 translate read null
2026-01-30 Models Know Models Best: Evaluation via Model-Preferred Formats Joonhak Lee et.al. 2601.22699 translate read null
2026-01-30 FNF: Functional Network Fingerprint for Large Language Models Yiheng Liu et.al. 2601.22692 translate read null
2026-01-30 Do Transformers Have the Ability for Periodicity Generalization? Huanyu Liu et.al. 2601.22690 translate read null
2026-01-30 BioModelsRAG: A Biological Modeling Assistant Using RAG (Retrieval Augmented Generation) Bhavyahshree Navaneetha Krishnan et.al. 2601.22684 translate read null
2026-01-30 VarParser: Unleashing the Neglected Power of Variables for LLM-based Log Parsing Jinrui Sun et.al. 2601.22676 translate read null
2026-01-30 VisionTrim: Unified Vision Token Compression for Training-Free MLLM Acceleration Hanxun Yu et.al. 2601.22674 translate read link
2026-01-30 Real-Time Aligned Reward Model beyond Semantics Zixuan Huang et.al. 2601.22664 translate read null
2026-01-30 Task-Aware LLM Council with Adaptive Decision Pathways for Decision Support Wei Zhu et.al. 2601.22662 translate read null
2026-01-30 UCPO: Uncertainty-Aware Policy Optimization Xianzhou Zeng et.al. 2601.22648 translate read null
2026-01-30 Beyond Medical Chatbots: Meddollina and the Rise of Continuous Clinical Intelligence Vaibhav Ram S. V. N. S et.al. 2601.22645 translate read null
2026-01-30 Pushing the Boundaries of Natural Reasoning: Interleaved Bonus from Formal-Logic Verification Chuxue Cao et.al. 2601.22642 translate read null
2026-01-30 Statistical Estimation of Adversarial Risk in Large Language Models under Best-of-N Sampling Mingqian Feng et.al. 2601.22636 translate read null
2026-01-30 MCP-Diag: A Deterministic, Protocol-Driven Architecture for AI-Native Network Diagnostics Devansh Lodha et.al. 2601.22633 translate read null
2026-01-30 DART-ing Through the Drift: Dynamic Tracing of Knowledge Neurons for Adaptive Inference-Time Pruning Abhishek Tyagi et.al. 2601.22632 translate read null
2026-01-30 Time-Annealed Perturbation Sampling: Diverse Generation for Diffusion Language Models Jingxuan Wu et.al. 2601.22629 translate read null
2026-01-30 TTCS: Test-Time Curriculum Synthesis for Self-Evolving Chengyi Yang et.al. 2601.22628 translate read link
2026-01-30 SYMPHONY: Synergistic Multi-agent Planning with Heterogeneous Language Model Assembly Wei Zhu et.al. 2601.22623 translate read null
2026-01-30 Ethical Risks of Large Language Models in Medical Consultation: An Assessment Based on Reproductive Ethics Hanhui Xu et.al. 2601.22621 translate read null
2026-01-30 Layer-wise Swapping for Generalizable Multilingual Safety Hyunseo Shin et.al. 2601.22620 translate read null
2026-01-30 Learn More with Less: Uncertainty Consistency Guided Query Selection for RLVR Hao Yi et.al. 2601.22595 translate read null
2026-01-30 Small is Beautiful: A Practical and Efficient Log Parsing Framework Minxing Wang et.al. 2601.22590 translate read null
2026-01-30 Rethinking LLM-as-a-Judge: Representation-as-a-Judge with Small Language Models via Semantic Capacity Asymmetry Zhuochun Li et.al. 2601.22588 translate read null
2026-01-30 HetCCL: Accelerating LLM Training with Heterogeneous GPUs Heehoon Kim et.al. 2601.22585 translate read null
2026-01-30 SpanNorm: Reconciling Training Stability and Performance in Deep Transformers Chao Wang et.al. 2601.22580 translate read null
2026-01-30 PhoStream: Benchmarking Real-World Streaming for Omnimodal Assistants in Mobile Scenarios Xudong Lu et.al. 2601.22575 translate read null
2026-01-30 Mitigating Hallucinations in Video Large Language Models via Spatiotemporal-Semantic Contrastive Decoding Yuansheng Gao et.al. 2601.22574 translate read null
2026-01-30 PerfGuard: A Performance-Aware Agent for Visual Content Generation Zhipeng Chen et.al. 2601.22571 translate read null
2026-01-30 Leveraging Data to Say No: Memory Augmented Plug-and-Play Selective Prediction Aditya Sarkar et.al. 2601.22570 translate read null
2026-01-30 Whispers of Wealth: Red-Teaming Google’s Agent Payments Protocol via Prompt Injection Tanusree Debi et.al. 2601.22569 translate read null
2026-01-30 Are LLM Evaluators Really Narcissists? Sanity Checking Self-Preference Evaluations Dani Roytburg et.al. 2601.22548 translate read null
2026-01-30 Towards the Holographic Characteristic of LLMs for Efficient Short-text Generation Shun Qian et.al. 2601.22546 translate read null
2026-01-30 SCaLRec: Semantic Calibration for LLM-enabled Cloud-Device Sequential Recommendation Ruiqi Zheng et.al. 2601.22543 translate read null
2026-01-30 Decoding in Geometry: Alleviating Embedding-Space Crowding for Complex Reasoning Yixin Yang et.al. 2601.22536 translate read null
2026-01-30 Darwinian Memory: A Training-Free Self-Regulating Memory System for GUI Agent Evolution Hongze Mi et.al. 2601.22528 translate read null
2026-01-30 $\rho$-$\texttt{EOS}$: Training-free Bidirectional Variable-Length Control for Masked Diffusion LLMs Jingyi Yang et.al. 2601.22527 translate read null
2026-01-30 Shattered Compositionality: Counterintuitive Learning Dynamics of Transformers for Arithmetic Xingyu Zhao et.al. 2601.22510 translate read null
2026-01-30 FraudShield: Knowledge Graph Empowered Defense for LLMs against Fraud Attacks Naen Xu et.al. 2601.22485 translate read null
2026-01-30 Head-Aware Visual Cropping: Enhancing Fine-Grained VQA with Attention-Guided Subimage Junfei Xie et.al. 2601.22483 translate read null
2026-01-30 Transform-Augmented GRPO Improves Pass@k Khiem Le et.al. 2601.22478 translate read null
2026-01-30 Unrewarded Exploration in Large Language Models Reveals Latent Learning from Psychology Jian Xiong et.al. 2601.22474 translate read null
2026-01-30 Toward Non-Expert Customized Congestion Control Mingrui Zhang et.al. 2601.22461 translate read null
2026-01-30 ScribbleSense: Generative Scribble-Based Texture Editing with Intent Prediction Yudi Zhang et.al. 2601.22455 translate read null
2026-01-30 Does My Chatbot Have an Agenda? Understanding Human and AI Agency in Human-Human-like Chatbot Interaction Bhada Yun et.al. 2601.22452 translate read null
2026-01-30 Countering the Over-Reliance Trap: Mitigating Object Hallucination for LVLMs via a Self-Validation Framework Shiyu Liu et.al. 2601.22451 translate read null
2026-01-30 Towards Resiliency in Large Language Model Serving with KevlarFlow Shangshu Qian et.al. 2601.22438 translate read null
2026-01-30 Large Language Model Agents Are Not Always Faithful Self-Evolvers Weixiang Zhao et.al. 2601.22436 translate read null
2026-01-30 When LLM meets Fuzzy-TOPSIS for Personnel Selection through Automated Profile Analysis Shahria Hoque et.al. 2601.22433 translate read null
2026-01-30 ScamPilot: Simulating Conversations with LLMs to Protect Against Online Scams Owen Hoffman et.al. 2601.22426 translate read null
2026-01-29 Bifocal Attention: Harmonizing Geometric and Spectral Positional Embeddings for Algorithmic Generalization Kanishk Awadhiya et.al. 2601.22402 translate read null
2026-01-29 Jailbreaks on Vision Language Model via Multimodal Reasoning Aarush Noheria et.al. 2601.22398 translate read null
2026-01-29 Culturally Grounded Personas in Large Language Models: Characterization and Alignment with Socio-Psychological Value Frameworks Candida M. Greco et.al. 2601.22396 translate read null
2026-01-29 Specialists or Generalists? Multi-Agent and Single-Agent LLMs for Essay Grading Jamiu Adekunle Idowu et.al. 2601.22386 translate read null
2026-01-29 Purely Agentic Black-Box Optimization for Biological Design Natalie Maus et.al. 2601.22382 translate read null
2026-01-29 Stability-Aware Prompt Optimization for Clinical Data Abstraction Arinbjörn Kolbeinsson et.al. 2601.22373 translate read null
2026-01-29 Towards Solving the Gilbert-Pollak Conjecture via Large Language Models Yisi Ke et.al. 2601.22365 translate read null
2026-01-29 Context Structure Reshapes the Representational Geometry of Language Models Eghbal A. Hosseini et.al. 2601.22364 translate read null
2026-01-29 Understanding Efficiency: Quantization, Batching, and Serving Strategies in LLM Energy Use Julien Delavande et.al. 2601.22362 translate read null
2026-01-29 MERMAID: Memory-Enhanced Retrieval and Reasoning with Multi-Agent Iterative Knowledge Grounding for Veracity Assessment Yupeng Cao et.al. 2601.22361 translate read null
2026-01-29 Small Talk, Big Impact: The Energy Cost of Thanking AI Julien Delavande et.al. 2601.22357 translate read null
2026-01-29 Sparks of Rationality: Do Reasoning LLMs Align with Human Judgment and Choice? Ala N. Tak et.al. 2601.22329 translate read null
2026-01-29 Federate the Router: Learning Language Model Routers with Sparse and Decentralized Evaluations Baris Askin et.al. 2601.22318 translate read null
2026-01-29 Gaussian Process Bandit Optimization with Machine Learning Predictions and Application to Hypothesis Generation Xin Jennifer Chen et.al. 2601.22315 translate read null
2026-01-29 Hair-Trigger Alignment: Black-Box Evaluation Cannot Guarantee Post-Update Alignment Yavuz Bakman et.al. 2601.22313 translate read null
2026-01-29 SCALAR: Quantifying Structural Hallucination, Consistency, and Reasoning Gaps in Materials Foundation Models Can Polat et.al. 2601.22312 translate read null
2026-01-29 Why Reasoning Fails to Plan: A Planning-Centric Analysis of Long-Horizon Decision Making in LLM Agents Zehong Wang et.al. 2601.22311 translate read null
2026-01-29 Prepare Reasoning Language Models for Multi-Agent Debate with Self-Debate Reinforcement Learning Chenxi Liu et.al. 2601.22297 translate read null
2026-01-29 The Six Sigma Agent: Achieving Enterprise-Grade Reliability in LLM Systems Through Consensus-Driven Decomposed Execution Khush Patel et.al. 2601.22290 translate read null
2026-01-29 FunPRM: Function-as-Step Process Reward Model with Meta Reward Correction for Code Generation Ruiyi Zhang et.al. 2601.22249 translate read null
2026-01-29 MirrorMark: A Distortion-Free Multi-Bit Watermark for Large Language Models Ya Jiang et.al. 2601.22246 translate read null
2026-01-29 A Systematic Literature Review on LLM Defenses Against Prompt Injection and Jailbreaking: Expanding NIST Taxonomy Pedro H. Barcha Correia et.al. 2601.22240 translate read null
2026-01-29 What Lies Beneath: A Call for Distribution-based Visual Question & Answer Datasets Jill P. Naiman et.al. 2601.22218 translate read null
2026-01-29 Stalled, Biased, and Confused: Uncovering Reasoning Failures in LLMs for Cloud-Based Root Cause Analysis Evelien Riddell et.al. 2601.22208 translate read null
2026-01-28 Tacit Coordination of Large Language Models Ido Aharon et.al. 2601.22184 translate read null
2026-01-29 UEval: A Benchmark for Unified Multimodal Generation Bo Li et.al. 2601.22155 translate read null
2026-01-29 DynaWeb: Model-Based Reinforcement Learning of Web Agents Hang Ding et.al. 2601.22149 translate read null
2026-01-29 FineInstructions: Scaling Synthetic Instructions to Pre-Training Scale Ajay Patel et.al. 2601.22146 translate read null
2026-01-29 Reasoning While Asking: Transforming Reasoning Large Language Models from Passive Solvers to Proactive Inquirers Xin Chen et.al. 2601.22139 translate read null
2026-01-29 Pay for Hints, Not Answers: LLM Shepherding for Cost-Efficient Inference Ziming Dong et.al. 2601.22132 translate read link
2026-01-29 World of Workflows: a Benchmark for Bringing World Models to Enterprise Systems Lakshya Gupta et.al. 2601.22130 translate read null
2026-01-29 SWE-Replay: Efficient Test-Time Scaling for Software Engineering Agents Yifeng Ding et.al. 2601.22129 translate read null
2026-01-29 The Patient is not a Moving Document: A World Model Training Paradigm for Longitudinal EHR Irsyad Adam et.al. 2601.22128 translate read null
2026-01-29 A Federated and Parameter-Efficient Framework for Large Language Model Training in Medicine Anran Li et.al. 2601.22124 translate read null
2026-01-29 ECO: Quantized Training without Full-Precision Master Weights Mahdi Nikdan et.al. 2601.22101 translate read null
2026-01-29 VTC-R1: Vision-Text Compression for Efficient Long-Context Reasoning Yibo Wang et.al. 2601.22069 translate read link
2026-01-29 Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models Wenxuan Huang et.al. 2601.22060 translate read null
2026-01-29 AIRPET: Virtual Positron Emission Tomography J. Renner et.al. 2601.22059 translate read null
2026-01-29 MetricAnything: Scaling Metric Depth Pretraining with Noisy Heterogeneous Sources Baorui Ma et.al. 2601.22054 translate read link
2026-01-29 MasalBench: A Benchmark for Contextual and Cross-Cultural Understanding of Persian Proverbs in LLMs Ghazal Kalhor et.al. 2601.22050 translate read null
2026-01-29 On the Paradoxical Interference between Instruction-Following and Task Solving Yunjia Qi et.al. 2601.22047 translate read null
2026-01-29 Per-parameter Task Arithmetic for Unlearning in Large Language Models Chengyi Cai et.al. 2601.22030 translate read null
2026-01-29 CAR-bench: Evaluating the Consistency and Limit-Awareness of LLM Agents under Real-World Uncertainty Johannes Kirmayr et.al. 2601.22027 translate read null
2026-01-29 When “Better” Prompts Hurt: Evaluation-Driven Iteration for LLM Applications Daniel Commey et.al. 2601.22025 translate read null
2026-01-29 Visual-Guided Key-Token Regularization for Multimodal Large Language Model Unlearning Chengyi Cai et.al. 2601.22020 translate read null
2026-01-29 TBDFiltering: Sample-Efficient Tree-Based Data Filtering Robert Istvan Busa-Fekete et.al. 2601.22016 translate read null
2026-01-29 SpecTran: Spectral-Aware Transformer-based Adapter for LLM-Enhanced Sequential Recommendation Yu Cui et.al. 2601.21986 translate read null
2026-01-29 Token-Guard: Towards Token-Level Hallucination Control via Self-Checking Decoding Yifan Zhu et.al. 2601.21969 translate read null
2026-01-29 Industrialized Deception: The Collateral Effects of LLM-Generated Misinformation on Digital Ecosystems Alexander Loth et.al. 2601.21963 translate read null
2026-01-29 ToolWeaver: Weaving Collaborative Semantics for Scalable Tool Use in Large Language Models Bowen Fang et.al. 2601.21947 translate read null
2026-01-29 Retrieval-Infused Reasoning Sandbox: A Benchmark for Decoupling Retrieval and Reasoning Capabilities Shuangshuang Ying et.al. 2601.21937 translate read null
2026-01-29 Learn-to-Distance: Distance Learning for Detecting LLM-Generated Text Hongyi Zhou et.al. 2601.21895 translate read null
2026-01-29 Not All Code Is Equal: A Data-Centric Study of Code Complexity and LLM Reasoning Lukas Twist et.al. 2601.21894 translate read null
2026-01-29 astra-langchain4j: Experiences Combining LLMs and Agent Programming Rem Collier et.al. 2601.21879 translate read null
2026-01-29 Evolution of Benchmark: Black-Box Optimization Benchmark Design through Large Language Model Chen Wang et.al. 2601.21877 translate read null
2026-01-29 LLM-Driven Scenario-Aware Planning for Autonomous Driving He Li et.al. 2601.21876 translate read null
2026-01-29 WebArbiter: A Principle-Guided Reasoning Process Reward Model for Web Agents Yao Zhang et.al. 2601.21872 translate read null
2026-01-29 KnowBias: Mitigating Social Bias in LLMs via Know-Bias Neuron Enhancement Jinhao Pan et.al. 2601.21864 translate read null
2026-01-29 READY: Reward Discovery for Meta-Black-Box Optimization Zechuan Huang et.al. 2601.21847 translate read null
2026-01-29 Embodied Task Planning via Graph-Informed Action Generation with Large Language Model Xiang Li et.al. 2601.21841 translate read null
2026-01-29 Test-Time Compute Games Ander Artola Velasco et.al. 2601.21839 translate read null
2026-01-29 Mil-SCORE: Benchmarking Long-Context Geospatial Reasoning and Planning in Large Language Models Aadi Palnitkar et.al. 2601.21826 translate read null
2026-01-29 DASH: Deterministic Attention Scheduling for High-throughput Reproducible LLM Training Xinwei Qiang et.al. 2601.21824 translate read null
2026-01-29 CORE: Toward Ubiquitous 6G Intelligence Through Collaborative Orchestration of Large Language Model Agents Over Hierarchical Edge Zitong Yu et.al. 2601.21822 translate read null
2026-01-29 A Judge-Aware Ranking Framework for Evaluating Large Language Models without Ground Truth Mingyuan Xu et.al. 2601.21817 translate read null
2026-01-29 Nonparametric LLM Evaluation from Preference Data Dennis Frauen et.al. 2601.21816 translate read null
2026-01-29 Distribution-Aware Reward Estimation for Test-Time Reinforcement Learning Bodong Du et.al. 2601.21804 translate read null
2026-01-29 A Unified XAI-LLM Approach for Endotracheal Suctioning Activity Recognition Hoang Khang Phan et.al. 2601.21802 translate read null
2026-01-29 CG-MLLM: Captioning and Generating 3D content via Multi-modal Large Language Models Junming Huang et.al. 2601.21798 translate read null
2026-01-29 Effective LoRA Adapter Routing using Task Representations Akash Dhasade et.al. 2601.21795 translate read null
2026-01-29 Assessing the Business Process Modeling Competences of Large Language Models Chantale Lauer et.al. 2601.21787 translate read null
2026-01-29 Zonkey: A Hierarchical Diffusion Language Model with Differentiable Tokenization and Probabilistic Attention Alon Rozental et.al. 2601.21768 translate read null
2026-01-29 Evaluating ChatGPT on Medical Information Extraction Tasks: Performance, Explainability and Beyond Wei Zhu et.al. 2601.21767 translate read null
2026-01-29 EWSJF: An Adaptive Scheduler with Hybrid Partitioning for Mixed-Workload LLM Inference Bronislav Sidik et.al. 2601.21758 translate read null
2026-01-29 Language-based Trial and Error Falls Behind in the Era of Experience Haoyu Wang et.al. 2601.21754 translate read null
2026-01-29 Temporal Guidance for Large Language Models Hong-Kai Zheng et.al. 2601.21744 translate read null
2026-01-29 MIDI-LLaMA: An Instruction-Following Multimodal LLM for Symbolic Music Understanding Meng Yang et.al. 2601.21740 translate read null
2026-01-29 CE-GOCD: Central Entity-Guided Graph Optimization for Community Detection to Augment LLM Scientific Question Answering Jiayin Lan et.al. 2601.21733 translate read null
2026-01-29 E-mem: Multi-agent based Episodic Context Reconstruction for LLM Agent Memory Kaixiang Wang et.al. 2601.21714 translate read null
2026-01-29 TACLer: Tailored Curriculum Reinforcement Learning for Efficient Reasoning Huiyuan Lai et.al. 2601.21711 translate read null
2026-01-29 Why Attention Patterns Exist: A Unifying Temporal Perspective Analysis Qingyue Yang et.al. 2601.21709 translate read link
2026-01-29 FBS: Modeling Native Parallel Reading inside a Transformer Tongxi Wang et.al. 2601.21708 translate read null
2026-01-29 Toward Culturally Aligned LLMs through Ontology-Guided Multi-Agent Reasoning Wonduk Seo et.al. 2601.21700 translate read null
2026-01-29 ChartE$^{3}$: A Comprehensive Benchmark for End-to-End Chart Editing Shuo Li et.al. 2601.21694 translate read null
2026-01-29 TCAP: Tri-Component Attention Profiling for Unsupervised Backdoor Detection in MLLM Fine-Tuning Mingzu Liu et.al. 2601.21692 translate read null
2026-01-29 Do Not Waste Your Rollouts: Recycling Search Experience for Efficient Test-Time Scaling Xinglin Wang et.al. 2601.21684 translate read null
2026-01-29 FIT: Defying Catastrophic Forgetting in Continual LLM Unlearning Xiaoyu Xu et.al. 2601.21682 translate read null
2026-01-29 LLM4Fluid: Large Language Models as Generalizable Neural Solvers for Fluid Dynamics Qisong Xiao et.al. 2601.21681 translate read null
2026-01-29 Scale-Dependent Semantic Dynamics Revealed by Allan Deviation Debayan Dasgupta et.al. 2601.21678 translate read null
2026-01-29 SONIC-O1: A Real-World Benchmark for Evaluating Multimodal Large Language Models on Audio-Video Understanding Ahmed Y. Radwan et.al. 2601.21666 translate read link
2026-01-29 AdaptBPE: From General Purpose to Specialized Tokenizers Vijini Liyanage et.al. 2601.21665 translate read null
2026-01-29 ScholarGym: Benchmarking Deep Research Workflows on Academic Literature Retrieval Hao Shen et.al. 2601.21654 translate read null
2026-01-29 ILRR: Inference-Time Steering Method for Masked Diffusion Language Models Eden Avrahami et.al. 2601.21647 translate read null
2026-01-29 RSGround-R1: Rethinking Remote Sensing Visual Grounding through Spatial Reasoning Shiqi Huang et.al. 2601.21634 translate read null
2026-01-29 LAMP: Look-Ahead Mixed-Precision Inference of Large Language Models Stanislav Budzinskiy et.al. 2601.21623 translate read null
2026-01-29 StarSD: One-for-Many Speculative Decoding Junhao He et.al. 2601.21622 translate read null
2026-01-29 Thinking Broad, Acting Fast: Latent Reasoning Distillation from Multi-Perspective Chain-of-Thought for E-Commerce Relevance Baopu Qiu et.al. 2601.21611 translate read null
2026-01-29 WMVLM: Evaluating Diffusion Model Image Watermarking via Vision-Language Models Zijin Yang et.al. 2601.21610 translate read null
2026-01-29 RecNet: Self-Evolving Preference Propagation for Agentic Recommender Systems Bingqian Li et.al. 2601.21609 translate read null
2026-01-29 Age Matters: Analyzing Age-Related Discussions in App Reviews Shashiwadana Nirmania et.al. 2601.21605 translate read null
2026-01-29 CORE: Collaborative Reasoning via Cross Teaching Kshitij Mishra et.al. 2601.21600 translate read null
2026-01-29 Beyond Imitation: Reinforcement Learning for Active Latent Planning Zhi Zheng et.al. 2601.21598 translate read null
2026-01-29 Scalable Power Sampling: Unlocking Efficient, Training-Free Reasoning for LLMs via Distribution Sharpening Xiaotong Ji et.al. 2601.21590 translate read null
2026-01-29 ICL-EVADER: Zero-Query Black-Box Evasion Attacks on In-Context Learning and Their Defenses Ningyuan He et.al. 2601.21586 translate read null
2026-01-29 Learning the Mechanism of Catastrophic Forgetting: A Perspective from Gradient Similarity Mutian Yang et.al. 2601.21577 translate read null
2026-01-29 Chain Of Thought Compression: A Theoretical Analysis Juncai Li et.al. 2601.21576 translate read null
2026-01-29 ASTRA: Automated Synthesis of agentic Trajectories and Reinforcement Arenas Xiaoyu Tian et.al. 2601.21558 translate read null
2026-01-29 Meta Context Engineering via Agentic Skill Evolution Haoran Ye et.al. 2601.21557 translate read null
2026-01-29 Note2Chat: Improving LLMs for Multi-Turn Clinical History Taking Using Medical Notes Yang Zhou et.al. 2601.21551 translate read null
2026-01-29 ShardMemo: Masked MoE Routing for Sharded Agentic LLM Memory Yang Zhao et.al. 2601.21545 translate read null
2026-01-29 Opinion Consensus Formation Among Networked Large Language Models Iris Yazici et.al. 2601.21540 translate read null
2026-01-29 More Bang for the Buck: Improving the Inference of Large Language Models at a Fixed Budget using Reset and Discard (ReD) Sagi Meir et.al. 2601.21522 translate read null
2026-01-29 HERS: Hidden-Pattern Expert Learning for Risk-Specific Vehicle Damage Adaptation in Diffusion Models Teerapong Panboonyuen et.al. 2601.21517 translate read null
2026-01-29 LLaMEA-SAGE: Guiding Automated Algorithm Design with Structural Feedback from Explainable AI Niki van Stein et.al. 2601.21511 translate read null
2026-01-29 The Effectiveness of Style Vectors for Steering Large Language Models: A Human Evaluation Diaoulé Diallo et.al. 2601.21505 translate read null
2026-01-29 MAR: Efficient Large Language Models via Module-aware Architecture Refinement Junhong Cai et.al. 2601.21503 translate read null
2026-01-29 The Path of Least Resistance: Guiding LLM Reasoning Trajectories with Prefix Consensus Ishan Jindal et.al. 2601.21494 translate read null
2026-01-29 DimStance: Multilingual Datasets for Dimensional Stance Analysis Jonas Becker et.al. 2601.21483 translate read null
2026-01-29 SOUP: Token-level Single-sample Mix-policy Reinforcement Learning for Large Language Models Lei Yang et.al. 2601.21476 translate read null
2026-01-29 Adaptive Confidence Gating in Multi-Agent Collaboration for Efficient and Optimized Code Generation Haoji Zhang et.al. 2601.21469 translate read null
2026-01-29 Topeax – An Improved Clustering Topic Model with Density Peak Detection and Lexical-Semantic Term Importance Márton Kardos et.al. 2601.21465 translate read null
2026-01-29 Conversation for Non-verifiable Learning: Self-Evolving LLMs through Meta-Evaluation Yuan Sui et.al. 2601.21464 translate read null
2026-01-29 Unifying Speech Editing Detection and Content Localization via Prior-Enhanced Audio LLMs Jun Xue et.al. 2601.21463 translate read null
2026-01-29 SAGE: Sequence-level Adaptive Gradient Evolution for Generative Recommendation Yu Xie et.al. 2601.21452 translate read null
2026-01-29 Variance & Greediness: A comparative study of metric-learning losses Donghuo Zeng et.al. 2601.21450 translate read null
2026-01-29 ChipBench: A Next-Step Benchmark for Evaluating LLM Performance in AI-Aided Chip Design Zhongkai Yu et.al. 2601.21448 translate read null
2026-01-29 The Paradox of Robustness: Decoupling Rule-Based Logic from Affective Noise in High-Stakes Decision-Making Jon Chun et.al. 2601.21439 translate read null
2026-01-29 Accurate Network Traffic Matrix Prediction via LEAD: an LLM-Enhanced Adapter-Based Conditional Diffusion Model Yu Sun et.al. 2601.21437 translate read null
2026-01-29 From Consistency to Complementarity: Aligned and Disentangled Multi-modal Learning for Time Series Understanding and Reasoning Hang Ni et.al. 2601.21436 translate read null
2026-01-29 When Prohibitions Become Permissions: Auditing Negation Sensitivity in Language Models Katherine Elkins et.al. 2601.21433 translate read null
2026-01-29 MultiModal Fine-tuning with Synthetic Captions Shohei Enomoto et.al. 2601.21426 translate read null
2026-01-29 ConceptMoE: Adaptive Token-to-Concept Compression for Implicit Compute Allocation Zihao Huang et.al. 2601.21420 translate read null
2026-01-29 Statsformer: Validated Ensemble Learning with LLM-Derived Semantic Priors Erica Zhang et.al. 2601.21410 translate read null
2026-01-29 User-Centric Evidence Ranking for Attribution and Fact Verification Guy Alt et.al. 2601.21387 translate read null
2026-01-29 Predicting Developer Acceptance of AI-Generated Code Suggestions Jing Jiang et.al. 2601.21379 translate read null
2026-01-29 TeachBench: A Syllabus-Grounded Framework for Evaluating Teaching Ability in Large Language Models Zheng Li et.al. 2601.21375 translate read null
2026-01-29 NEMO: Execution-Aware Optimization Modeling via Autonomous Coding Agents Yang Song et.al. 2601.21372 translate read null
2026-01-29 Small models, big threats: Characterizing safety challenges from low-compute AI models Prateek Puri et.al. 2601.21365 translate read null
2026-01-29 The Compliance Paradox: Semantic-Instruction Decoupling in Automated Academic Code Evaluation Devanshu Sahoo et.al. 2601.21360 translate read null
2026-01-29 Latent Chain-of-Thought as Planning: Decoupling Reasoning from Verbalization Jiecong Wang et.al. 2601.21358 translate read null
2026-01-29 Factored Causal Representation Learning for Robust Reward Modeling in RLHF Yupei Yang et.al. 2601.21350 translate read null
2026-01-29 Towards Robust Dysarthric Speech Recognition: LLM-Agent Post-ASR Correction Beyond WER Xiuwen Zheng et.al. 2601.21347 translate read null
2026-01-29 Self-Improving Pretraining: using post-trained models to pretrain better models Ellen Xiaoqing Tan et.al. 2601.21343 translate read null
2026-01-29 Ostrakon-VL: Towards Domain-Expert MLLM for Food-Service and Retail Stores Zhiyong Shen et.al. 2601.21342 translate read null
2026-01-29 EHR-RAG: Bridging Long-Horizon Structured Electronic Health Records and Large Language Models via Enhanced Retrieval-Augmented Generation Lang Cao et.al. 2601.21340 translate read null
2026-01-29 Within-Model vs Between-Prompt Variability in Large Language Models for Creative Tasks Jennifer Haase et.al. 2601.21339 translate read null
2026-01-29 White-Box Op-Amp Design via Human-Mimicking Reasoning Zihao Chen et.al. 2601.21321 translate read null
2026-01-29 Detecting Multiple Semantic Concerns in Tangled Code Commits Beomsu Koh et.al. 2601.21298 translate read null
2026-01-29 More Code, Less Reuse: Investigating Code Quality and Reviewer Sentiment towards AI-generated Pull Requests Haoming Huang et.al. 2601.21276 translate read null
2026-01-29 Reinforcement Learning from Meta-Evaluation: Aligning Language Models Without Ground-Truth Labels Micah Rentschler et.al. 2601.21268 translate read null
2026-01-29 CausalEmbed: Auto-Regressive Multi-Vector Generation in Latent Space for Visual Document Embedding Jiahao Huo et.al. 2601.21262 translate read null
2026-01-29 User-Centric Phishing Detection: A RAG and LLM-Based Approach Abrar Hamed Al Barwani et.al. 2601.21261 translate read null
2026-01-29 TIDE: Tuning-Integrated Dynamic Evolution for LLM-Based Automated Heuristic Design Chentong Chen et.al. 2601.21239 translate read null
2026-01-29 SHARP: Social Harm Analysis via Risk Profiles for Measuring Inequities in Large Language Models Alok Abhishek et.al. 2601.21235 translate read null
2026-01-29 Just Ask: Curious Code Agents Reveal System Prompts in Frontier LLMs Xiang Zheng et.al. 2601.21233 translate read null
2026-01-29 MGSM-Pro: A Simple Strategy for Robust Multilingual Mathematical Reasoning Evaluation Tianyi Xu et.al. 2601.21225 translate read null
2026-01-29 LAMP: Learning Universal Adversarial Perturbations for Multi-Image Tasks via Pre-trained Models Alvi Md Ishmam et.al. 2601.21220 translate read null
2026-01-29 Parametric Knowledge is Not All You Need: Toward Honest Large Language Models via Retrieval of Pretraining Data Christopher Adrian Kusuma et.al. 2601.21218 translate read null
2026-01-29 Scaling Reasoning Hop Exposes Weaknesses: Demystifying and Improving Hop Generalization in Large Language Models Zhaoyi Li et.al. 2601.21214 translate read null
2026-01-29 Intelli-Planner: Towards Customized Urban Planning via Large Language Model Empowered Reinforcement Learning Xixian Yong et.al. 2601.21212 translate read null
2026-01-29 Uncovering Hidden Correctness in LLM Causal Reasoning via Symbolic Verification Paul He et.al. 2601.21210 translate read null
2026-01-29 Scaling Embeddings Outperforms Scaling Experts in Language Models Hong Liu et.al. 2601.21204 translate read null
2026-01-29 ZipMoE: Efficient On-Device MoE Serving via Lossless Compression and Cache-Affinity Scheduling Yuchen Yang et.al. 2601.21198 translate read null
2026-01-29 Do Reasoning Models Enhance Embedding Models? Wun Yu Chan et.al. 2601.21192 translate read null
2026-01-29 Adaptive and Robust Cost-Aware Proof of Quality for Decentralized LLM Inference Networks Arther Tian et.al. 2601.21189 translate read null
2026-01-29 MAD: Modality-Adaptive Decoding for Mitigating Cross-Modal Hallucinations in Multimodal Large Language Models Sangyun Chung et.al. 2601.21181 translate read link
2026-01-29 Concise Geometric Description as a Bridge: Unleashing the Potential of LLM for Plane Geometry Problem Solving Jingyun Wang et.al. 2601.21164 translate read null
2026-01-29 Bridging the Arithmetic Gap: The Cognitive Complexity Benchmark and Financial-PoT for Robust Financial Reasoning Boxiang Zhao et.al. 2601.21157 translate read null
2026-01-29 Large Language Models Naively Recover Ethnicity from Individual Records Noah Dasanaike et.al. 2601.21132 translate read null
2026-01-29 Beyond a Single Reference: Training and Evaluation with Paraphrases in Sign Language Translation Václav Javorek et.al. 2601.21128 translate read null
2026-01-28 Planner-Auditor Twin: Agentic Discharge Planning with FHIR-Based LLM Planning, Guideline Recall, Optional Caching and Self-Improvement Kaiyuan Wu et.al. 2601.21113 translate read null
2026-01-28 ChunkWise LoRA: Adaptive Sequence Partitioning for Memory-Efficient Low-Rank Adaptation and Accelerated LLM Inference Ketan Thakkar et.al. 2601.21109 translate read null
2026-01-28 OpenSec: Measuring Incident Response Agent Calibration Under Adversarial Evidence Jarrod Barnes et.al. 2601.21083 translate read link
2026-01-28 LOCUS: Low-Dimensional Model Embeddings for Efficient Model Exploration, Comparison, and Selection Shivam Patel et.al. 2601.21082 translate read null
2026-01-28 Towards Comprehensive Benchmarking Infrastructure for LLMs In Software Engineering Daniel Rodriguez-Cardenas et.al. 2601.21070 translate read null
2026-01-28 Textual Equilibrium Propagation for Deep Compound AI Systems Minghui Chen et.al. 2601.21064 translate read null
2026-01-28 Human-LLM Collaborative Feature Engineering for Tabular Data Zhuoyan Li et.al. 2601.21060 translate read null
2026-01-28 Order-Aware Test-Time Adaptation: Leveraging Temporal Dynamics for Robust Streaming Inference Young Kyung Kim et.al. 2601.21012 translate read null
2026-01-28 Bayesian-LoRA: Probabilistic Low-Rank Adaptation of Large Language Models Moule Lin et.al. 2601.21003 translate read null
2026-01-28 UrduBench: An Urdu Reasoning Benchmark using Contextually Ensembled Translations with Human-in-the-Loop Muhammad Ali Shafique et.al. 2601.21000 translate read null
2026-01-28 Diversifying Toxicity Search in Large Language Models Through Speciation Onkar Shelar et.al. 2601.20981 translate read null
2026-01-28 Infusion of Blockchain to Establish Trustworthiness in AI Supported Software Evolution: A Systematic Literature Review Mohammad Naserameri et.al. 2601.20918 translate read null
2026-01-28 Noisy but Valid: Robust Statistical Evaluation of LLMs with Imperfect Judges Chen Feng et.al. 2601.20913 translate read null
2026-01-28 Non-Markov Multi-Round Conversational Image Generation with History-Conditioned MLLMs Haochen Zhang et.al. 2601.20911 translate read null
2026-01-28 TwinWeaver: An LLM-Based Foundation Model Framework for Pan-Cancer Digital Twins Nikita Makarov et.al. 2601.20906 translate read null
2026-01-28 ICON: Intent-Context Coupling for Efficient Multi-Turn Jailbreak Attack Xingwei Lin et.al. 2601.20903 translate read null
2026-01-28 Text-only adaptation in LLM-based ASR through text denoising Sergio Burdisso et.al. 2601.20900 translate read null
2026-01-28 Reducing Prompt Sensitivity in LLM-based Speech Recognition Through Learnable Projection Sergio Burdisso et.al. 2601.20898 translate read null
2026-01-28 IDE-Bench: Evaluating Large Language Models as IDE Agents on Real-World Software Engineering Tasks Spencer Mateega et.al. 2601.20886 translate read null
2026-01-27 What Hard Tokens Reveal: Exploiting Low-confidence Tokens for Membership Inference Attacks against Large Language Models Md Tasnim Jawad et.al. 2601.20885 translate read null
2026-01-28 When Flores Bloomz Wrong: Cross-Direction Contamination in Machine Translation Evaluation David Tan et.al. 2601.20858 translate read null
2026-01-28 SokoBench: Evaluating Long-Horizon Planning and Reasoning in Large Language Models Sebastiano Monti et.al. 2601.20856 translate read null
2026-01-28 Reward Models Inherit Value Biases from Pretraining Brian Christian et.al. 2601.20838 translate read null
2026-01-28 Idea2Story: An Automated Pipeline for Transforming Research Concepts into Complete Scientific Narratives Tengyue Xu et.al. 2601.20833 translate read link
2026-01-28 MemCtrl: Using MLLMs as Active Memory Controllers on Embodied Agents Vishnu Sashank Dorbala et.al. 2601.20831 translate read null
2026-01-28 Training Reasoning Models on Saturated Problems via Failure-Prefix Conditioning Minwu Kim et.al. 2601.20829 translate read link
2026-01-28 Context-Augmented Code Generation Using Programming Knowledge Graphs Shahd Seddik et.al. 2601.20810 translate read null
2026-01-28 How Disciplinary Partnerships Shape Research Landscape in U.S. Library and Information Science Schools Jiangen He et.al. 2601.20806 translate read null
2026-01-28 Reinforcement Learning via Self-Distillation Jonas Hübotter et.al. 2601.20802 translate read link
2026-01-28 Dissecting Multimodal In-Context Learning: Modality Asymmetries and Circuit Dynamics in modern Transformers Yiran Huang et.al. 2601.20796 translate read null
2026-01-28 Agentic Fog: A Policy-driven Framework for Distributed Intelligence in Fog Computing Saeed Akbar et.al. 2601.20764 translate read null
2026-01-28 Persona Prompting as a Lens on LLM Social Reasoning Jing Yang et.al. 2601.20757 translate read link
2026-01-28 ProfInfer: An eBPF-based Fine-Grained LLM Inference Profiler Bohua Zou et.al. 2601.20755 translate read null
2026-01-28 Like a Therapist, But Not: Reddit Narratives of AI in Mental Health Contexts Elham Aghakhani et.al. 2601.20747 translate read null
2026-01-28 HESTIA: A Hessian-Guided Differentiable Quantization-Aware Training Framework for Extremely Low-Bit LLMs Guoan Wang et.al. 2601.20745 translate read null
2026-01-28 Compression Tells Intelligence: Visual Coding, Visual Token Technology, and the Unification Xin Jin et.al. 2601.20742 translate read null
2026-01-28 QueerGen: How LLMs Reflect Societal Norms on Gender and Sexuality in Sentence Completion Tasks Mae Sosto et.al. 2601.20731 translate read null
2026-01-28 AgentLongBench: A Controllable Long Benchmark For Long-Contexts Agents via Environment Rollouts Shicheng Fang et.al. 2601.20730 translate read link
2026-01-28 Audit Trails for Accountability in Large Language Models Victor Ojewale et.al. 2601.20727 translate read null
2026-01-28 MedViz: An Agent-based, Visual-guided Research Assistant for Navigating Biomedical Literature Huan He et.al. 2601.20709 translate read null
2026-01-28 Beyond GEMM-Centric NPUs: Enabling Efficient Diffusion LLM Sampling Binglei Lou et.al. 2601.20706 translate read null
2026-01-28 Structurally Human, Semantically Biased: Detecting LLM-Generated References with Embeddings and GNNs Melika Mobini et.al. 2601.20704 translate read null
2026-01-28 Decoupling Perception and Calibration: Label-Efficient Image Quality Assessment Framework Xinyue Li et.al. 2601.20689 translate read null
2026-01-28 Online Density-Based Clustering for Real-Time Narrative Evolution Monitoring Ostap Vykhopen et.al. 2601.20680 translate read null
2026-01-28 ShieldedCode: Learning Robust Representations for Virtual Machine Protected Code Mingqiao Mo et.al. 2601.20679 translate read null
2026-01-28 Efficient Multimodal Planning Agent for Visual Question-Answering Zhuo Chen et.al. 2601.20676 translate read null
2026-01-28 Bi-modal Textual Prompt Learning for Vision-Language Models in Remote Sensing Pankhi Kashyap et.al. 2601.20675 translate read null
2026-01-28 Harnessing Large Language Models for Precision Querying and Retrieval-Augmented Knowledge Extraction in Clinical Data Science Juan Jose Rubio Jan et.al. 2601.20674 translate read null
2026-01-28 When Vision Meets Texts in Listwise Reranking Hongyi Cai et.al. 2601.20623 translate read null
2026-01-28 GDCNet: Generative Discrepancy Comparison Network for Multimodal Sarcasm Detection Shuguang Zhang et.al. 2601.20618 translate read null
2026-01-28 Agent Benchmarks Fail Public Sector Requirements Jonathan Rystrøm et.al. 2601.20617 translate read null
2026-01-28 DRAINCODE: Stealthy Energy Consumption Attacks on Retrieval-Augmented Code Generation via Context Poisoning Yanlin Wang et.al. 2601.20615 translate read null
2026-01-28 Dialogical Reasoning Across AI Architectures: A Multi-Model Framework for Testing AI Alignment Strategies Gray Cox et.al. 2601.20604 translate read null
2026-01-28 MeCo: Enhancing LLM-Empowered Multi-Robot Collaboration via Similar Task Memoization Baiqing Wang et.al. 2601.20577 translate read null
2026-01-28 Gen-SER: When the generative model meets speech emotion recognition Taihui Wang et.al. 2601.20573 translate read null
2026-01-28 Beyond Divergent Creativity: A Human-Based Evaluation of Creativity in Large Language Models Kumiko Nakajima et.al. 2601.20546 translate read null
2026-01-28 PathWise: Planning through World Model for Automated Heuristic Design via Self-Evolving LLMs Oguzhan Gungordu et.al. 2601.20539 translate read null
2026-01-28 Interpreting Emergent Extreme Events in Multi-Agent Systems Ling Tang et.al. 2601.20538 translate read null
2026-01-28 Context Tokens are Anchors: Understanding the Repetition Curse in dMLLMs from an Information Flow Perspective Qiyan Zhao et.al. 2601.20520 translate read null
2026-01-28 Can We Improve Educational Diagram Generation with In-Context Examples? Not if a Hallucination Spoils the Bunch Evanfiya Logacheva et.al. 2601.20476 translate read null
2026-01-28 Piloting Planetarium Visualizations with LLMs during Live Events in Science Centers Mathis Brossier et.al. 2601.20466 translate read null
2026-01-28 PEARL: Plan Exploration and Adaptive Reinforcement Learning for Multihop Tool Use Qihao Wang et.al. 2601.20439 translate read null
2026-01-28 Concept Component Analysis: A Principled Approach for Concept Extraction in LLMs Yuhang Liu et.al. 2601.20420 translate read null
2026-01-28 Beyond Accuracy: A Cognitive Load Framework for Mapping the Capability Boundaries of Tool-use Agents Qihao Wang et.al. 2601.20412 translate read null
2026-01-28 GuideAI: A Real-time Personalized Learning Solution with Adaptive Interventions Ananya Shukla et.al. 2601.20402 translate read null
2026-01-28 Eliminating Hallucination in Diffusion-Augmented Interactive Text-to-Image Retrieval Zhuocheng Zhang et.al. 2601.20391 translate read null
2026-01-28 Policy of Thoughts: Scaling LLM Reasoning via Test-time Policy Evolution Zhengbo Jiao et.al. 2601.20379 translate read null
2026-01-28 LLM-AutoDP: Automatic Data Processing via LLM Agents for Model Fine-tuning Wei Huang et.al. 2601.20375 translate read null
2026-01-28 AMA: Adaptive Memory via Multi-Agent Collaboration Weiquan Huang et.al. 2601.20352 translate read null
2026-01-28 Demonstration-Free Robotic Control via LLM Agents Brian Y. Tsui et.al. 2601.20334 translate read null
2026-01-28 PsychePass: Calibrating LLM Therapeutic Competence via Trajectory-Anchored Tournaments Zhuang Chen et.al. 2601.20330 translate read null
2026-01-28 ECG-Agent: On-Device Tool-Calling Agent for ECG Multi-Turn Dialogue Hyunseung Chung et.al. 2601.20323 translate read null
2026-01-28 Less is More: Benchmarking LLM Based Recommendation Agents Kargi Chauhan et.al. 2601.20316 translate read null
2026-01-28 DiagLink: A Dual-User Diagnostic Assistance System by Synergizing Experts with LLMs and Knowledge Graphs Zihan Zhou et.al. 2601.20311 translate read null
2026-01-28 SuperInfer: SLO-Aware Rotary Scheduling and Memory Management for LLM Inference on Superchips Jiahuan Yu et.al. 2601.20309 translate read null
2026-01-28 Truthfulness Despite Weak Supervision: Evaluating and Training LLMs Using Peer Prediction Tianyi Alex Qiu et.al. 2601.20299 translate read null
2026-01-28 Memory Retrieval in Transformers: Insights from The Encoding Specificity Principle Viet Hung Dinh et.al. 2601.20282 translate read null
2026-01-28 Eliciting Least-to-Most Reasoning for Phishing URL Detection Holly Trikilis et.al. 2601.20270 translate read null
2026-01-28 HE-SNR: Uncovering Latent Logic via Entropy for Guiding Mid-Training on SWE-BENCH Yueyang Wang et.al. 2601.20255 translate read null
2026-01-28 Efficient Evaluation of LLM Performance with Statistical Guarantees Skyler Wu et.al. 2601.20251 translate read null
2026-01-28 Large Language Models Polarize Ideologically but Moderate Affectively in Online Political Discourse Gavin Wang et.al. 2601.20238 translate read null
2026-01-28 Unit-Based Agent for Semi-Cascaded Full-Duplex Dialogue Systems Haoyuan Yu et.al. 2601.20230 translate read null
2026-01-28 Scaling Medical Reasoning Verification via Tool-Integrated Reinforcement Learning Hang Zhang et.al. 2601.20221 translate read null
2026-01-28 Spark: Strategic Policy-Aware Exploration via Dynamic Branching for Long-Horizon Agentic Learning Jinyang Wu et.al. 2601.20209 translate read null
2026-01-28 An Autonomous Agent Framework for Feature-Label Extraction from Device Dialogues and Automatic Multi-Dimensional Device Hosting Planning Based on Large Language Models Huichao Men et.al. 2601.20194 translate read null
2026-01-28 Me-Agent: A Personalized Mobile Agent with Two-Level User Habit Learning for Enhanced Interaction Shuoxin Wang et.al. 2601.20162 translate read null
2026-01-28 Large language models accurately predict public perceptions of support for climate action worldwide Nattavudh Powdthavee et.al. 2601.20141 translate read null
2026-01-27 BengaliSent140: A Large-Scale Bengali Binary Sentiment Dataset for Hate and Non-Hate Speech Classification Akif Islam et.al. 2601.20129 translate read null
2026-01-27 Rewarding Intellectual Humility Learning When Not To Answer In Large Language Models Abha Jha et.al. 2601.20126 translate read null
2026-01-27 Usage, Effects and Requirements for AI Coding Assistants in the Enterprise: An Empirical Study Maja Vukovic et.al. 2601.20112 translate read null
2026-01-27 FFE-Hallu: Hallucinations in Fixed Figurative Expressions: Benchmark of Idioms and Proverbs in the Persian Language Faezeh Hosseini et.al. 2601.20105 translate read null
2026-01-27 Dynamics of Human-AI Collective Knowledge on the Web: A Scalable Model and Insights for Sustainable Growth Buddhika Nettasinghe et.al. 2601.20099 translate read null
2026-01-27 Should I Have Expressed a Different Intent? Counterfactual Generation for LLM-Based Autonomous Control Amirmohammad Farzaneh et.al. 2601.20090 translate read null
2026-01-27 Quantization-Aware Distillation for NVFP4 Inference Accuracy Recovery Meng Xin et.al. 2601.20088 translate read null
2026-01-27 Sparse CLIP: Co-Optimizing Interpretability and Performance in Contrastive Learning Chuan Qin et.al. 2601.20075 translate read null
2026-01-23 A Scalable Measure of Loss Landscape Curvature for Analyzing the Training Dynamics of LLMs Dayal Singh Kalra et.al. 2601.16979 translate read null
2026-01-23 Auto-Regressive Masked Diffusion Models Mahdi Karami et.al. 2601.16971 translate read null
2026-01-23 Empowering Medical Equipment Sustainability in Low-Resource Settings: An AI-Powered Diagnostic and Support Platform for Biomedical Technicians Bernes Lorier Atabonfack et.al. 2601.16967 translate read null
2026-01-23 AgentDrive: An Open Benchmark Dataset for Agentic AI Reasoning with LLM-Generated Scenarios in Autonomous Systems Mohamed Amine Ferrag et.al. 2601.16964 translate read null
2026-01-23 DataStates-LLM: Scalable Checkpointing for Transformer Models Using Composable State Providers Avinash Maurya et.al. 2601.16956 translate read null
2026-01-23 Strategies for Span Labeling with Large Language Models Danil Semin et.al. 2601.16946 translate read null
2026-01-23 GRIP: Algorithm-Agnostic Machine Unlearning for Mixture-of-Experts via Geometric Router Constraints Andy Zhu et.al. 2601.16905 translate read null
2026-01-23 Reasoning Promotes Robustness in Theory of Mind Tasks Ian B. de Haan et.al. 2601.16853 translate read null
2026-01-23 Trapped in the past? Disentangling fluid and crystallized intelligence of large language models using chess Leonard S. Pleiss et.al. 2601.16823 translate read null
2026-01-23 Large Language Models as Automatic Annotators and Annotation Adjudicators for Fine-Grained Opinion Analysis Gaurav Negi et.al. 2601.16800 translate read null
2026-01-23 Persuasion Tokens for Editing Factual Knowledge in LLMs Paul Youssef et.al. 2601.16781 translate read null
2026-01-23 LLM-powered Real-time Patent Citation Recommendation for Financial Technologies Tianang Deng et.al. 2601.16775 translate read null
2026-01-23 Standardizing Longitudinal Radiology Report Evaluation via Large Language Model Annotation Xinyi Wang et.al. 2601.16753 translate read null
2026-01-23 Supporting Stakeholder Requirements Expression with LLM Revisions: An Empirical Evaluation Michael Mircea et.al. 2601.16699 translate read null
2026-01-23 AgentsEval: Clinically Faithful Evaluation of Medical Imaging Reports via Multi-Agent Reasoning Suzhong Fu et.al. 2601.16685 translate read null
2026-01-23 From Transactions to Exploits: Automated PoC Synthesis for Real-World DeFi Attacks Xing Su et.al. 2601.16681 translate read null
2026-01-23 PLawBench: A Rubric-Based Benchmark for Evaluating LLMs in Real-World Legal Practice Yuzhen Shi et.al. 2601.16669 translate read null
2026-01-23 Revisiting the Role of Natural Language Code Comments in Code Translation Monika Gupta et.al. 2601.16661 translate read null
2026-01-23 Select or Project? Evaluating Lower-dimensional Vectors for LLM Training Data Explanations Lukas Hinterleitner et.al. 2601.16651 translate read null
2026-01-23 LUMINA: Long-horizon Understanding for Multi-turn Interactive Agents Amin Rakhsha et.al. 2601.16649 translate read null
2026-01-23 MultiLexNorm++: A Unified Benchmark and a Generative Model for Lexical Normalization for Asian Languages Weerayut Buaphet et.al. 2601.16623 translate read null
2026-01-23 How Does Personalized Memory Shape LLM Behavior? Benchmarking Rational Preference Utilization in Personalized Assistants Xueyang Feng et.al. 2601.16621 translate read null
2026-01-23 PROST-LLM: Progressively Enhancing the Speech-to-Speech Translation Capability in LLMs Jing Xu et.al. 2601.16618 translate read null
2026-01-23 AuroraEdge-V-2B: A Faster And Stronger Edge Visual Large Language Model Xiang Chen et.al. 2601.16615 translate read null
2026-01-23 Attention-MoA: Enhancing Mixture-of-Agents via Inter-Agent Semantic Attention and Deep Residual Synthesis Jianyu Wen et.al. 2601.16596 translate read null
2026-01-23 X-Aligner: Composed Visual Retrieval without the Bells and Whistles Yuqian Zheng et.al. 2601.16582 translate read null
2026-01-23 Predicting Startup Success Using Large Language Models: A Novel In-Context Learning Approach Abdurahman Maarouf et.al. 2601.16568 translate read null
2026-01-23 Retrieve-Refine-Calibrate: A Framework for Complex Claim Fact-Checking Mingwei Sun et.al. 2601.16555 translate read null
2026-01-23 LLM is Not All You Need: A Systematic Evaluation of ML vs. Foundation Models for text and image based Medical Classification Meet Raval et.al. 2601.16549 translate read null
2026-01-23 CORD: Bridging the Audio-Text Reasoning Gap via Weighted On-policy Cross-modal Distillation Jing Hu et.al. 2601.16547 translate read null
2026-01-23 Do Models Hear Like Us? Probing the Representational Alignment of Audio LLMs and Naturalistic EEG Haoyun Yang et.al. 2601.16540 translate read null
2026-01-23 OnlineSI: Taming Large Language Model for Online 3D Understanding and Grounding Zixian Liu et.al. 2601.16538 translate read null
2026-01-23 W4A16 Mixed-Precision Matrix Multiplication on Decoupled Architecture: Kernel Design and Memory Bottleneck Analysis for Ascend NPUs Yuanhong He et.al. 2601.16536 translate read null
2026-01-23 Curate-Train-Refine: A Closed-Loop Agentic Framework for Zero Shot Classification Gaurav Maheshwari et.al. 2601.16530 translate read null
2026-01-23 SycoEval-EM: Sycophancy Evaluation of Large Language Models in Simulated Clinical Encounters for Emergency Care Dongshen Peng et.al. 2601.16529 translate read null
2026-01-23 TangramPuzzle: Evaluating Multimodal Large Language Models with Compositional Spatial Reasoning Daixian Liu et.al. 2601.16520 translate read null
2026-01-23 DANCE: Dynamic, Available, Neighbor-gated Condensation for Federated Text-Attributed Graphs Zekai Chen et.al. 2601.16519 translate read null
2026-01-23 Rethinking Large Language Models For Irregular Time Series Classification In Critical Care Feixiang Zheng et.al. 2601.16516 translate read null
2026-01-23 SearchLLM: Detecting LLM Paraphrased Text by Measuring the Similarity with Regeneration of the Candidate Source via Search Engine Hoang-Quoc Nguyen-Son et.al. 2601.16512 translate read null
2026-01-23 REprompt: Prompt Generation for Intelligent Software Development Guided by Requirements Engineering Junjie Shi et.al. 2601.16507 translate read null
2026-01-23 SafeThinker: Reasoning about Risk to Deepen Safety Beyond Shallow Alignment Xianya Fang et.al. 2601.16506 translate read null
2026-01-23 EvoConfig: Self-Evolving Multi-Agent Systems for Efficient Autonomous Environment Configuration Xinshuai Guo et.al. 2601.16489 translate read null
2026-01-23 Timely Machine: Awareness of Time Makes Test-Time Scaling Agentic Yichuan Ma et.al. 2601.16486 translate read null
2026-01-23 FlowSE-GRPO: Training Flow Matching Speech Enhancement via Online Reinforcement Learning Haoxu Wang et.al. 2601.16483 translate read null
2026-01-23 TL-GRPO: Turn-Level RL for Reasoning-Guided Iterative Optimization Peiji Li et.al. 2601.16480 translate read null
2026-01-23 Doc2AHP: Inferring Structured Multi-Criteria Decision Models via Semantic Trees with LLMs Hongjia Wu et.al. 2601.16479 translate read null
2026-01-23 Order from Chaos: Physical World Understanding from Glitchy Gameplay Videos Meng Cao et.al. 2601.16471 translate read null
2026-01-23 Persona Jailbreaking in Large Language Models Jivnesh Sandhan et.al. 2601.16466 translate read null
2026-01-23 Cutting the Gordian Knot: Detecting Malicious PyPI Packages via a Knowledge-Mining Framework Wenbo Guo et.al. 2601.16463 translate read null
2026-01-23 Graph-Anchored Knowledge Indexing for Retrieval-Augmented Generation Zhenghao Liu et.al. 2601.16462 translate read null
2026-01-23 Emotion-LLaMAv2 and MMEVerse: A New Framework and Benchmark for Multimodal Emotion Understanding Xiaojiang Peng et.al. 2601.16449 translate read null
2026-01-23 Mixing Expert Knowledge: Bring Human Thoughts Back To the Game of Go Yichuan Ma et.al. 2601.16447 translate read null
2026-01-23 Exploring the Effects of Alignment on Numerical Bias in Large Language Models Ayako Sato et.al. 2601.16444 translate read null
2026-01-23 iPDB – Optimizing SQL Queries with ML and LLM Predicates Udesh Kumarasinghe et.al. 2601.16432 translate read null
2026-01-23 Learning Domain Knowledge in Multimodal Large Language Models through Reinforcement Fine-Tuning Qinglong Cao et.al. 2601.16419 translate read null
2026-01-23 Gen-DBA: Generative Database Agents (Towards a Move 37 for Databases) Yeasir Rayhan et.al. 2601.16409 translate read null
2026-01-23 Jacobian Scopes: token-level causal attributions in LLMs Toni J. B. Liu et.al. 2601.16407 translate read null
2026-01-23 Towards a Theoretical Understanding to the Generalization of RLHF Zhaochun Li et.al. 2601.16403 translate read null
2026-01-23 Clarify or Answer: Reinforcement Learning for Agentic VQA with Context Under-specification Zongwan Cao et.al. 2601.16400 translate read null
2026-01-23 White-Box Sensitivity Auditing with Steering Vectors Hannah Cyberey et.al. 2601.16398 translate read null
2026-01-23 ResAgent: Entropy-based Prior Point Discovery and Visual Reasoning for Referring Expression Segmentation Yihao Wang et.al. 2601.16394 translate read null
2026-01-23 Cross-Lingual Activation Steering for Multilingual Language Models Rhitabrat Pokharel et.al. 2601.16390 translate read null
2026-01-23 PolyAgent: Large Language Model Agent for Polymer Design Vani Nigam et.al. 2601.16376 translate read null
2026-01-22 The Behavioral Fabric of LLM-Powered GUI Agents: Human Values and Interaction Outcomes Simret Araya Gebreegziabher et.al. 2601.16356 translate read null
2026-01-22 Identity, Cooperation and Framing Effects within Groups of Real and Simulated Humans Suhong Moon et.al. 2601.16355 translate read null
2026-01-22 NOIR: Privacy-Preserving Generation of Code with Open-Source LLMs Khoa Nguyen et.al. 2601.16354 translate read null
2026-01-22 Regional Bias in Large Language Models M P V S Gopinadh et.al. 2601.16349 translate read null
2026-01-22 Identifying Concurrency Bug Reports via Linguistic Patterns Shuai Shao et.al. 2601.16338 translate read null
2026-01-22 National Quantum Strategies: A Data-Driven Approach to Understanding the Quantum Ecosystem Simon Richard Goorney et.al. 2601.16329 translate read null
2026-01-22 Machine-Assisted Grading of Nationwide School-Leaving Essay Exams with LLMs and Statistical NLP Andres Karjus et.al. 2601.16314 translate read null
2026-01-22 A Longitudinal, Multinational, and Multilingual Corpus of News Coverage of the Russo-Ukrainian War Dikshya Mohanty et.al. 2601.16309 translate read null
2026-01-22 When Agents Fail to Act: A Diagnostic Framework for Tool Invocation Reliability in Multi-Agent LLM Systems Donghao Huang et.al. 2601.16280 translate read null
2026-01-22 Better as Generators Than Classifiers: Leveraging LLMs and Synthetic Data for Low-Resource Multilingual Classification Branislav Pecher et.al. 2601.16278 translate read null
2026-01-22 GameTalk: Training LLMs for Strategic Conversation Victor Conchello Vendrell et.al. 2601.16276 translate read null
2026-01-21 Algorithmic Identity Based on Metaparameters: A Path to Reliability, Auditability, and Traceability Juliao Braga et.al. 2601.16234 translate read null
2026-01-22 Provable Robustness in Multimodal Large Language Models via Feature Space Smoothing Song Xia et.al. 2601.16200 translate read null
2026-01-22 PAL*M: Property Attestation for Large Generative Models Prach Chantasantitam et.al. 2601.16199 translate read null
2026-01-22 Structured Hints for Sample-Efficient Lean Theorem Proving Zachary Burton et.al. 2601.16172 translate read null
2026-01-22 Low-altitude Multi-UAV-assisted Data Collection and Semantic Forwarding for Post-Disaster Relief Xiaoya Zheng et.al. 2601.16146 translate read null
2026-01-22 LLM Prompt Evaluation for Educational Applications Langdon Holmes et.al. 2601.16134 translate read null
2026-01-22 Improving Training Efficiency and Reducing Maintenance Costs via Language Specific Model Merging Alphaeus Dmonte et.al. 2601.16127 translate read null
2026-01-22 Rethinking Composed Image Retrieval Evaluation: A Fine-Grained Benchmark from Image Editing Tingyu Song et.al. 2601.16125 translate read null
2026-01-22 Adapter Fusion for Multilingual Text2Cypher with Linear and Learned Gating Makbule Gulcin Ozsoy et.al. 2601.16097 translate read null
2026-01-22 Controlling Long-Horizon Behavior in Language Model Agents with Explicit State Dynamics Sukesh Subaharan et.al. 2601.16087 translate read null
2026-01-22 Grounding Large Language Models in Reaction Knowledge Graphs for Synthesis Retrieval Olga Bunkova et.al. 2601.16038 translate read null
2026-01-22 Sawtooth Wavefront Reordering: Enhanced CuTile FlashAttention on NVIDIA GB10 Yifan Zhu et.al. 2601.16032 translate read null
2026-01-22 Deja Vu in Plots: Leveraging Cross-Session Evidence with Retrieval-Augmented LLMs for Live Streaming Risk Assessment Yiran Qiao et.al. 2601.16027 translate read null
2026-01-22 Timbre-Aware LLM-based Direct Speech-to-Speech Translation Extendable to Multiple Language Pairs Lalaram Arya et.al. 2601.16023 translate read null
2026-01-22 PhysicsMind: Sim and Real Mechanics Benchmarking for Physical Reasoning and Prediction in Foundational VLMs and World Models Chak-Wing Mak et.al. 2601.16007 translate read null
2026-01-22 TeNet: Text-to-Network for Compact Policy Synthesis Ariyan Bighashdel et.al. 2601.15912 translate read null
2026-01-22 Co-Constructing Alignment: A Participatory Approach to Situate AI Values Anne Arzberger et.al. 2601.15895 translate read null
2026-01-22 Stable-DiffCoder: Pushing the Frontier of Code Diffusion Large Language Model Chenghao Fan et.al. 2601.15892 translate read null
2026-01-22 Evaluating and Achieving Controllable Code Completion in Code LLM Jiajun Zhang et.al. 2601.15879 translate read null
2026-01-22 Virtual Traffic Police: Large Language Model-Augmented Traffic Signal Control for Unforeseen Incidents Shiqi Wei et.al. 2601.15816 translate read null
2026-01-22 ErrorMap and ErrorAtlas: Charting the Failure Landscape of Large Language Models Shir Ashury-Tahan et.al. 2601.15812 translate read null
2026-01-22 Attributing and Exploiting Safety Vectors through Global Optimization in Large Language Models Fengheng Chu et.al. 2601.15801 translate read null
2026-01-22 HumanLLM: Towards Personalized Understanding and Simulation of Human Nature Yuxuan Lei et.al. 2601.15793 translate read null
2026-01-22 Next Generation Active Learning: Mixture of LLMs in the Loop Yuanyuan Qi et.al. 2601.15773 translate read null
2026-01-22 Beyond Marginal Distributions: A Framework to Evaluate the Representativeness of Demographic-Aligned LLMs Tristan Williams et.al. 2601.15755 translate read null
2026-01-22 Tabular Incremental Inference Xinda Chen et.al. 2601.15751 translate read null
2026-01-22 Towards Automated Kernel Generation in the Era of LLMs Yang Yu et.al. 2601.15727 translate read null
2026-01-22 VideoThinker: Building Agentic VideoLLMs with LLM-Guided Tool Reasoning Chenglin Li et.al. 2601.15724 translate read null
2026-01-22 CoNRec: Context-Discerning Negative Recommendation with LLMs Xinda Chen et.al. 2601.15721 translate read null
2026-01-22 Beyond Visual Safety: Jailbreaking Multimodal Large Language Models for Harmful Image Generation via Semantic-Agnostic Inputs Mingyu Yu et.al. 2601.15698 translate read null
2026-01-22 From Passive Metric to Active Signal: The Evolving Role of Uncertainty Quantification in Large Language Models Jiaxin Zhang et.al. 2601.15690 translate read null
2026-01-22 Connect the Dots: Knowledge Graph-Guided Crawler Attack on Retrieval-Augmented Generation Systems Mengyu Yao et.al. 2601.15678 translate read null
2026-01-22 What Patients Really Ask: Exploring the Effect of False Assumptions in Patient Information Seeking Raymond Xiong et.al. 2601.15674 translate read null
2026-01-22 EmotionThinker: Prosody-Aware Reinforcement Learning for Explainable Speech Emotion Reasoning Dingdong Wang et.al. 2601.15668 translate read null
2026-01-22 Event-VStream: Event-Driven Real-Time Understanding for Long Video Streams Zhenghui Guo et.al. 2601.15655 translate read null
2026-01-22 Predictive Coding and Information Bottleneck for Hallucination Detection in Large Language Models Manish Bhatt et.al. 2601.15652 translate read null
2026-01-22 Towards Reliable Medical LLMs: Benchmarking and Enhancing Confidence Estimation of Large Language Models in Medical Consultation Zhiyao Ren et.al. 2601.15645 translate read null
2026-01-22 CogToM: A Comprehensive Theory of Mind Benchmark inspired by Human Cognition for Large Language Models Haibo Tong et.al. 2601.15628 translate read null
2026-01-22 Robust Tool Use via Fission-GRPO: Learning to Recover from Execution Errors Zhiwei Zhang et.al. 2601.15625 translate read null
2026-01-22 Explainable Deepfake Detection with RL Enhanced Self-Blended Images Ning Jiang et.al. 2601.15624 translate read null
2026-01-22 When Sharpening Becomes Collapse: Sampling Bias and Semantic Coupling in RL with Verifiable Rewards Mingyuan Fan et.al. 2601.15609 translate read null
2026-01-22 ToxiTwitch: Toward Emote-Aware Hybrid Moderation for Live Streaming Platforms Baktash Ansari et.al. 2601.15605 translate read null
2026-01-22 Autonomous Business System via Neuro-symbolic AI Cecil Pang et.al. 2601.15599 translate read null
2026-01-22 DeepASMR: LLM-Based Zero-Shot ASMR Speech Generation for Anyone of Any Voice Leying Zhang et.al. 2601.15596 translate read null
2026-01-22 Data-Free Privacy-Preserving for LLMs via Model Inversion and Selective Unlearning Xinjie Zhou et.al. 2601.15595 translate read null
2026-01-22 YuFeng-XGuard: A Reasoning-Centric, Interpretable, and Flexible Guardrail Model for Large Language Models Junyu Lin et.al. 2601.15588 translate read null
2026-01-22 MapViT: A Two-Stage ViT-Based Framework for Real-Time Radio Quality Map Prediction in Dynamic Environments Cyril Shih-Huan Hsu et.al. 2601.15578 translate read null
2026-01-22 From Generation to Collaboration: Using LLMs to Edit for Empathy in Healthcare Man Luo et.al. 2601.15558 translate read null
2026-01-22 LLM or Human? Perceptions of Trust and Information Quality in Research Summaries Nil-Jana Akpinar et.al. 2601.15556 translate read null
2026-01-22 VIOLA: Towards Video In-Context Learning with Minimal Annotations Ryo Fujii et.al. 2601.15549 translate read null
2026-01-21 Securing LLM-as-a-Service for Small Businesses: An Industry Case Study of a Distributed Chatbot Deployment Platform Jiazhu Xie et.al. 2601.15528 translate read null
2026-01-21 TransportAgents: a multi-agents LLM framework for traffic accident severity prediction Zhichao Yang et.al. 2601.15519 translate read null
2026-01-21 AdversaRiskQA: An Adversarial Factuality Benchmark for High-Risk Domains Adam Szelestey et.al. 2601.15511 translate read null
2026-01-21 MARS: Unleashing the Power of Speculative Decoding via Margin-Aware Verification Jingwei Song et.al. 2601.15498 translate read null
2026-01-21 Tracking the Limits of Knowledge Propagation: How LLMs Fail at Multi-Step Reasoning with Conflicting Knowledge Yiyang Feng et.al. 2601.15495 translate read null
2026-01-21 Testing Deep Learning Libraries via Neurosymbolic Constraint Learning M M Abid Naziri et.al. 2601.15493 translate read null
2026-01-21 Multi-Persona Thinking for Bias Mitigation in Large Language Models Yuxing Chen et.al. 2601.15488 translate read null
2026-01-21 A Universal Large Language Model – Drone Command and Control Interface Javier N. Ramos-Silva et.al. 2601.15486 translate read null
2026-01-21 The Rise of Large Language Models and the Direction and Impact of US Federal Research Funding Yifan Qian et.al. 2601.15485 translate read null
2026-01-21 Martingale Foresight Sampling: A Principled Approach to Inference-Time LLM Decoding Huayu Li et.al. 2601.15482 translate read null
2026-01-21 Benchmarking LLMs for Pairwise Causal Discovery in Biomedical and Multi-Domain Contexts Sydney Anuyah et.al. 2601.15479 translate read null
2026-01-21 Reliability by design: quantifying and eliminating fabrication risk in LLMs. From generative to consultative AI: a comparative analysis in the legal domain and lessons for high-stakes knowledge bases Alex Dantart et.al. 2601.15476 translate read null
2026-01-21 Chunking, Retrieval, and Re-ranking: An Empirical Evaluation of RAG Architectures for Policy Document Question Answering Anuj Maharjan et.al. 2601.15457 translate read null
2026-01-21 Exploring Implicit Perspectives on Autism in Large Language Models Through Multi-Agent Simulations Sohyeon Park et.al. 2601.15437 translate read null
2026-01-21 Not Your Typical Sycophant: The Elusive Nature of Sycophancy in Large Language Models Shahar Ben Natan et.al. 2601.15436 translate read null
2026-01-21 Domain-Specific Knowledge Graphs in RAG-Enhanced Healthcare LLMs Sydney Anuyah et.al. 2601.15429 translate read null
2026-01-21 Evaluating Multimodal Large Language Models for Heterogeneous Face Recognition Hatef Otroshi Shahreza et.al. 2601.15406 translate read null
2026-01-21 Beyond Prompting: Efficient and Robust Contextual Biasing for Speech LLMs via Logit-Space Integration (LOGIC) Peidong Wang et.al. 2601.15397 translate read null
2026-01-21 Memorization Dynamics in Knowledge Distillation for Language Models Jaydeep Borkar et.al. 2601.15394 translate read null
2026-01-21 VegaChat: A Robust Framework for LLM-Based Chart Generation and Assessment Marko Hostnik et.al. 2601.15385 translate read null
2026-01-21 OpenVision 3: A Family of Unified Visual Encoder for Both Understanding and Generation Letian Zhang et.al. 2601.15369 translate read null
2026-01-21 Q-Probe: Scaling Image Quality Assessment to High Resolution via Context-Aware Agentic Probing Xiang Li et.al. 2601.15356 translate read null
2026-01-21 A Prompt-Based Framework for Loop Vulnerability Detection Using Local LLMs Adeyemi Adeseye et.al. 2601.15352 translate read null
2026-01-21 Abusive music and song transformation using GenAI and LLMs Jiyang Choi et.al. 2601.15348 translate read null
2026-01-20 Lost in Transcription: How Speech-to-Text Errors Derail Code Understanding Jayant Havare et.al. 2601.15339 translate read null
2026-01-20 From Quotes to Concepts: Axial Coding of Political Debates with Ensemble LMs Angelina Parfenova et.al. 2601.15338 translate read null
2026-01-20 ToolCaching: Towards Efficient Caching for LLM Tool-calling Yi Zhai et.al. 2601.15335 translate read null
2026-01-20 No Reliable Evidence of Self-Reported Sentience in Small Large Language Models Caspar Kaiser et.al. 2601.15334 translate read null
2026-01-20 Empowering LLMs for Structure-Based Drug Design via Exploration-Augmented Latent Inference Xuanning Hu et.al. 2601.15333 translate read null
2026-01-20 RECAP: A Resource-Efficient Method for Adversarial Prompting in Large Language Models Rishit Chugh et.al. 2601.15331 translate read null
2026-01-20 ICPO: Illocution-Calibrated Policy Optimization for Multi-Turn Conversation Zhebo Wang et.al. 2601.15330 translate read null
2026-01-21 Towards Understanding Best Practices for Quantization of Vision-Language Models Gautom Das et.al. 2601.15287 translate read link
2026-01-21 Iterative Refinement Improves Compositional Image Generation Shantanu Jaiswal et.al. 2601.15286 translate read null
2026-01-21 MolecularIQ: Characterizing Chemical Reasoning Capabilities Through Symbolic Verification on Molecular Graphs Christoph Bartmann et.al. 2601.15279 translate read null
2026-01-21 Robust Fake News Detection using Large Language Models under Adversarial Sentiment Attacks Sahar Tahmasebi et.al. 2601.15277 translate read null
2026-01-21 Lightweight LLMs for Network Attack Detection in IoT Networks Piyumi Bhagya Sudasinghe et.al. 2601.15269 translate read null
2026-01-21 Evaluation of Large Language Models in Legal Applications: Challenges, Methods, and Future Directions Yiran Hu et.al. 2601.15267 translate read null
2026-01-21 The Effect of Scripts and Formats on LLM Numeracy Varshini Reddy et.al. 2601.15251 translate read null
2026-01-21 Metadata Conditioned Large Language Models for Localization Anjishnu Mukherjee et.al. 2601.15236 translate read null
2026-01-21 When Agents Fail: A Comprehensive Study of Bugs in LLM Agents with Automated Labeling Niful Islam et.al. 2601.15232 translate read null
2026-01-21 Deaf and Hard of Hearing Access to Intelligent Personal Assistants: Comparison of Voice-Based Options with an LLM-Powered Touch Interface Paige S. DeVries et.al. 2601.15209 translate read null
2026-01-21 Benchmarking Large Language Models for ABAP Code Generation: An Empirical Study on Iterative Improvement by Compiler Feedback Stephan Wallraven et.al. 2601.15188 translate read null
2026-01-21 Supporting Humans in Evaluating AI Summaries of Legal Depositions Naghmeh Farzi et.al. 2601.15182 translate read null
2026-01-21 The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models Zanlin Ni et.al. 2601.15165 translate read link
2026-01-21 Automated Rubrics for Reliable Evaluation of Medical Dialogue Systems Yinzhu Chen et.al. 2601.15161 translate read null
2026-01-21 Knowledge Graphs are Implicit Reward Models: Path-Derived Signals Enable Compositional Reasoning Yuval Kansal et.al. 2601.15160 translate read null
2026-01-21 How to Build AI Agents by Augmenting LLMs with Codified Human Expert Domain Knowledge? A Software Engineering Framework Choro Ulan uulu et.al. 2601.15153 translate read null
2026-01-21 CLEANER: Self-Purified Trajectories Boost Agentic Reinforcement Learning Tianshi Xu et.al. 2601.15141 translate read null
2026-01-21 Why Authors and Maintainers Link (or Don’t Link) Their PyPI Libraries to Code Repositories and Donation Platforms Alexandros Tsakpinis et.al. 2601.15139 translate read null
2026-01-21 Conversational AI for Social Good (CAI4SG): An Overview of Emerging Trends, Applications, and Challenges Yi-Chieh Lee et.al. 2601.15136 translate read null
2026-01-21 The Plausibility Trap: Using Probabilistic Engines for Deterministic Tasks Ivan Carrera et.al. 2601.15130 translate read null
2026-01-21 RSNA Large Language Model Benchmark Dataset for Chest Radiographs of Cardiothoracic Disease: Radiologist Evaluation and Validation Enhanced by AI Labels (REVEAL-CXR) Yishu Wei et.al. 2601.15129 translate read null
2026-01-21 From Who They Are to How They Act: Behavioral Traits in Generative Agent-Based Models of Social Media Valerio La Gatta et.al. 2601.15114 translate read null
2026-01-21 Parameter-Efficient Multi-Task Fine-Tuning in Code-Related Tasks Md Zahidul Haque et.al. 2601.15094 translate read null
2026-01-21 Multi-Agent Constraint Factorization Reveals Latent Invariant Solution Structure Christopher Scofield et.al. 2601.15077 translate read null
2026-01-21 The Why Behind the Action: Unveiling Internal Drivers via Agentic Attribution Chen Qian et.al. 2601.15075 translate read null
2026-01-21 SmartOracle – An Agentic Approach to Mitigate Noise in Differential Oracles Srinath Srinivasan et.al. 2601.15074 translate read null
2026-01-21 Turning Citation Networks Inside Out: Studying Science Using Content-Based Knowledge Graphs from LLM-Derived Taxonomies Seorin Kim et.al. 2601.15062 translate read null
2026-01-21 LogicScore: Fine-grained Logic Evaluation of Conciseness, Completeness, and Determinateness in Attributed Question Answering Zhichao Yan et.al. 2601.15050 translate read null
2026-01-21 Game-Theoretic Lens on LLM-based Multi-Agent Systems Jianing Hao et.al. 2601.15047 translate read null
2026-01-21 Knowledge Restoration-driven Prompt Optimization: Unlocking LLM Potential for Open-Domain Relational Triplet Extraction Xiaonan Jing et.al. 2601.15037 translate read null
2026-01-21 Visual and Cognitive Demands of a Large Language Model-Powered In-vehicle Conversational Agent Chris Monk et.al. 2601.15034 translate read null
2026-01-21 Mixture-of-Experts Models in Vision: Routing, Optimization, and Generalization Adam Rokah et.al. 2601.15021 translate read null
2026-01-21 LiViBench: An Omnimodal Benchmark for Interactive Livestream Video Understanding Xiaodong Wang et.al. 2601.15016 translate read null
2026-01-21 Obscuring Data Contamination Through Translation: Evidence from Arabic Corpora Chaymaa Abbas et.al. 2601.14994 translate read null
2026-01-21 InstructTime++: Time Series Classification with Multimodal Language Modeling via Implicit Feature Enhancement Mingyue Cheng et.al. 2601.14968 translate read null
2026-01-21 Power-Law Scaling in the Classification Performance of Small-Scale Spiking Neural Networks Zhengdi Zhang et.al. 2601.14961 translate read null
2026-01-21 CorpusQA: A 10 Million Token Benchmark for Corpus-Level Analysis and Reasoning Zhiyuan Lu et.al. 2601.14952 translate read null
2026-01-21 What Should I Cite? A RAG Benchmark for Academic Citation Prediction Leqi Zheng et.al. 2601.14949 translate read null
2026-01-21 The GDN-CC Dataset: Automatic Corpus Clarification for AI-enhanced Democratic Citizen Consultations Pierre-Antoine Lequeu et.al. 2601.14944 translate read null
2026-01-21 State of the Art of LLM-Enabled Interaction with Visualization Mathis Brossier et.al. 2601.14943 translate read null
2026-01-21 LLM-Based Repair of C++ Implicit Data Loss Compiler Warnings: An Industrial Case Study Chansong You et.al. 2601.14936 translate read null
2026-01-21 CodeDelegator: Mitigating Context Pollution via Role Separation in Code-as-Action Agents Tianxiang Fei et.al. 2601.14914 translate read null
2026-01-21 AlertGuardian: Intelligent Alert Life-Cycle Management for Large-scale Cloud Systems Guangba Yu et.al. 2601.14912 translate read null
2026-01-21 SynPerf: A Hybrid Analytical-ML Framework for GPU Performance Prediction Kaixuan Zhang et.al. 2601.14910 translate read null
2026-01-21 Comparative Study of Large Language Models on Chinese Film Script Continuation: An Empirical Analysis Based on GPT-5.2 and Qwen-Max Yuxuan Cao et.al. 2601.14826 translate read null
2026-01-21 Reflecting in the Reflection: Integrating a Socratic Questioning Framework into Automated AI-Based Question Generation Ondřej Holub et.al. 2601.14798 translate read null
2026-01-21 CI4A: Semantic Component Interfaces for Agents Empowering Web Automation Zhi Qiu et.al. 2601.14790 translate read null
2026-01-21 RECAP: Resistance Capture in Text-based Mental Health Counseling with Large Language Models Anqi Li et.al. 2601.14780 translate read null
2026-01-21 ReinPath: A Multimodal Reinforcement Learning Approach for Pathology Kangcheng Zhou et.al. 2601.14757 translate read null
2026-01-21 Render-of-Thought: Rendering Textual Chain-of-Thought as Images for Visual Latent Reasoning Yifan Wang et.al. 2601.14750 translate read link
2026-01-21 Optimizing FaaS Platforms for MCP-enabled Agentic Workflows Varad Kulkarni et.al. 2601.14735 translate read null
2026-01-21 AQAScore: Evaluating Semantic Alignment in Text-to-Audio Generation via Audio Question Answering Chun-Yi Kuan et.al. 2601.14728 translate read null
2026-01-21 HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding Haowei Zhang et.al. 2601.14724 translate read link
2026-01-21 PCL-Reasoner-V1.5: Advancing Math Reasoning with Offline Reinforcement Learning Yao Lu et.al. 2601.14716 translate read null
2026-01-21 Unified Multimodal and Multilingual Retrieval via Multi-Task Learning with NLU Integration Xinyuan Zhang et.al. 2601.14714 translate read null
2026-01-21 DARA: Few-shot Budget Allocation in Online Advertising via In-Context Decision Making with RL-Finetuned LLMs Mingxuan Song et.al. 2601.14711 translate read null
2026-01-21 LookBench: A Live and Holistic Open Benchmark for Fashion Image Retrieval Chao Gao et.al. 2601.14706 translate read null
2026-01-21 DARL: Encouraging Diverse Answers for General Reasoning without Verifiers Chongxuan Huang et.al. 2601.14700 translate read null
2026-01-21 AdaTIR: Adaptive Tool-Integrated Reasoning via Difficulty-Aware Policy Optimization Zhaiyu Fang et.al. 2601.14696 translate read null
2026-01-21 Gaming the Judge: Unfaithful Chain-of-Thought Can Undermine Agent Evaluation Muhammad Khalifa et.al. 2601.14691 translate read null
2026-01-21 IB-GRPO: Aligning LLM-based Learning Path Recommendation with Educational Objectives via Indicator-Based Group Relative Policy Optimization Shuai Wang et.al. 2601.14686 translate read null
2026-01-21 FARE: Fast-Slow Agentic Robotic Exploration Shuhao Liao et.al. 2601.14681 translate read null
2026-01-21 HCVR Scene Generation: High Compatibility Virtual Reality Environment Generation for Extended Redirected Walking Yiran Zhang et.al. 2601.14679 translate read null
2026-01-21 INFA-Guard: Mitigating Malicious Propagation via Infection-Aware Safeguarding in LLM-Based Multi-Agent Systems Yijin Zhou et.al. 2601.14667 translate read null
2026-01-21 NeuroFilter: Privacy Guardrails for Conversational LLM Agents Saswat Das et.al. 2601.14660 translate read null
2026-01-21 Say Anything but This: When Tokenizer Betrays Reasoning in LLMs Navid Ayoobi et.al. 2601.14658 translate read null
2026-01-21 MIND: Empowering Mental Health Clinicians with Multimodal Data Insights through a Narrative Dashboard Ruishi Zou et.al. 2601.14641 translate read null
2026-01-21 Forest-Chat: Adapting Vision-Language Agents for Interactive Forest Change Analysis James Brock et.al. 2601.14637 translate read null
2026-01-21 Probing Prompt Design for Socially Compliant Robot Navigation with Vision Language Models Ling Xiao et.al. 2601.14622 translate read null
2026-01-21 Seeing to Think? How Source Transparency Design Shapes Interactive Information Seeking and Evaluation in Conversational AI Jiangen He et.al. 2601.14611 translate read null
2026-01-21 An LLM Agent-based Framework for Whaling Countermeasures Daisuke Miyamoto et.al. 2601.14606 translate read null
2026-01-21 Variance-Adaptive Muon: Accelerating LLM Pretraining with NSR-Modulated and Variance-Scaled Momentum Jingru Li et.al. 2601.14603 translate read null
2026-01-21 3D Space as a Scratchpad for Editable Text-to-Image Generation Oindrila Saha et.al. 2601.14602 translate read null
2026-01-21 HELIOS: Hierarchical Graph Abstraction for Structure-Aware LLM Decompilation Yonatan Gizachew Achamyeleh et.al. 2601.14598 translate read null
2026-01-21 LFS: Learnable Frame Selector for Event-Aware and Temporally Diverse Video Captioning Lianying Chao et.al. 2601.14594 translate read null
2026-01-21 Counterfactual Modeling with Fine-Tuned LLMs for Health Intervention Design and Sensor Data Augmentation Shovito Barua Soumma et.al. 2601.14590 translate read null
2026-01-21 Social Caption: Evaluating Social Understanding in Multimodal Models Bhaavanaa Thumu et.al. 2601.14569 translate read null
2026-01-21 Rewarding How Models Think Pedagogically: Integrating Pedagogical Reasoning and Thinking Rewards for LLMs in Education Unggi Lee et.al. 2601.14560 translate read null
2026-01-21 Self-Blinding and Counterfactual Self-Simulation Mitigate Biases and Sycophancy in Large Language Models Brian Christian et.al. 2601.14553 translate read null
2026-01-20 Predicting Retrieval Utility and Answer Quality in Retrieval-Augmented Generation Fangzheng Tian et.al. 2601.14546 translate read null
2026-01-20 Report for NSF Workshop on AI for Electronic Design Automation Deming Chen et.al. 2601.14541 translate read null
2026-01-20 LLM Security and Safety: Insights from Homotopy-Inspired Prompt Obfuscation Luis Lazo et.al. 2601.14528 translate read null
2026-01-20 Large Language Model-Powered Evolutionary Code Optimization on a Phylogenetic Tree Leyi Zhao et.al. 2601.14523 translate read null
2026-01-20 Can LLM Reasoning Be Trusted? A Comparative Study: Using Human Benchmarking on Statistical Tasks Crish Nagarkar et.al. 2601.14479 translate read null
2026-01-20 Large Language Models for Large-Scale, Rigorous Qualitative Analysis in Applied Health Services Research Sasha Ronaghi et.al. 2601.14478 translate read null
2026-01-20 On the Generalization Gap in LLM Planning: Tests and Verifier-Reward RL Valerio Belcamino et.al. 2601.14456 translate read null
2026-01-20 Diffusion Large Language Models for Black-Box Optimization Ye Yuan et.al. 2601.14446 translate read null
2026-01-20 Agentic AI Meets Edge Computing in Autonomous UAV Swarms Thuan Minh Nguyen et.al. 2601.14437 translate read null
2026-01-20 CMind: An AI Agent for Localizing C Memory Bugs Chia-Yi Su et.al. 2601.14434 translate read null
2026-01-20 Measuring the State of Open Science in Transportation Using Large Language Models Junyi Ji et.al. 2601.14429 translate read null
2026-01-20 Rethinking On-Device LLM Reasoning: Why Analogical Mapping Outperforms Abstract Thinking for IoT DDoS Detection William Pan et.al. 2601.14343 translate read null
2026-01-20 Turn-Based Structural Triggers: Prompt-Free Backdoors in Multi-Turn LLMs Yiyang Lu et.al. 2601.14340 translate read null
2026-01-20 Layer-adaptive Expert Pruning for Pre-Training of Mixture-of-Experts Large Language Models YuanLab.ai et.al. 2601.14327 translate read null
2026-01-19 Tracing the Data Trail: A Survey of Data Provenance, Transparency and Traceability in LLMs Richard Hohensinner et.al. 2601.14311 translate read null
2026-01-19 CORVUS: Red-Teaming Hallucination Detectors via Internal Signal Camouflage in Large Language Models Nay Myat Min et.al. 2601.14310 translate read null
2026-01-20 XR: Cross-Modal Agents for Composed Image Retrieval Zhongyu Yang et.al. 2601.14245 translate read null
2026-01-20 Jet-RL: Enabling On-Policy FP8 Reinforcement Learning with Unified Training and Rollout Precision Flow Haocheng Xi et.al. 2601.14243 translate read null
2026-01-20 Attention-Based Offline Reinforcement Learning and Clustering for Interpretable Sepsis Treatment Punit Kumar et.al. 2601.14228 translate read null
2026-01-20 HALT: Hallucination Assessment via Latent Testing Rohan Bhatnagar et.al. 2601.14210 translate read null
2026-01-20 InT: Self-Proposed Interventions Enable Credit Assignment in LLM Reasoning Matthew Y. R. Yang et.al. 2601.14209 translate read null
2026-01-20 Toward Efficient Agents: Memory, Tool learning, and Planning Xiaofang Yang et.al. 2601.14192 translate read link
2026-01-20 ReSearch: A Multi-Stage Machine Learning Framework for Earth Science Data Discovery Youran Sun et.al. 2601.14176 translate read null
2026-01-20 Paper2Rebuttal: A Multi-Agent Framework for Transparent Author Response Assistance Qianli Ma et.al. 2601.14171 translate read link
2026-01-20 Domain-Adaptation through Synthetic Data: Fine-Tuning Large Language Models for German Law Ali Hamza Bashir et.al. 2601.14160 translate read null
2026-01-20 ConceptCaps – a Distilled Concept Dataset for Interpretability in Music Models Bruno Sienkiewicz et.al. 2601.14157 translate read null
2026-01-20 LLM Augmented Intervenable Multimodal Adaptor for Post-operative Complication Prediction in Lung Cancer Surgery Shubham Pandey et.al. 2601.14154 translate read null
2026-01-20 Lost in the Prompt Order: Revealing the Limitations of Causal Attention in Language Models Hyunjong Ok et.al. 2601.14152 translate read null
2026-01-20 The Quest for Reliable AI Accelerators: Cross-Layer Evaluation and Design Optimization Meng Li et.al. 2601.14148 translate read null
2026-01-20 CREATE: Cross-Layer Resilience Characterization and Optimization for Efficient yet Reliable Embodied AI Systems Tong Xie et.al. 2601.14140 translate read null
2026-01-20 The Side Effects of Being Smart: Safety Risks in MLLMs’ Multi-Image Reasoning Renmiao Chen et.al. 2601.14127 translate read link
2026-01-20 Style Transfer as Bias Mitigation: Diffusion Models for Synthetic Mental Health Text for Arabic Saad Mankarious et.al. 2601.14124 translate read null
2026-01-20 NewsRECON: News article REtrieval for image CONtextualization Jonathan Tonglet et.al. 2601.14121 translate read null
2026-01-20 A flexible language model-assisted electronic design automation framework Cristian Sestito et.al. 2601.14098 translate read null
2026-01-20 Zero-shot adaptable task planning for autonomous construction robots: a comparative study of lightweight single and multi-AI agent systems Hossein Naderi et.al. 2601.14091 translate read null
2026-01-20 DermaBench: A Clinician-Annotated Benchmark Dataset for Dermatology Visual Question Answering and Reasoning Abdurrahim Yilmaz et.al. 2601.14084 translate read null
2026-01-20 XCR-Bench: A Multi-Task Benchmark for Evaluating Cultural Reasoning in LLMs Mohsinul Kabir et.al. 2601.14063 translate read null
2026-01-20 Fine-Grained Zero-Shot Composed Image Retrieval with Complementary Visual-Semantic Integration Yongcong Ye et.al. 2601.14060 translate read null
2026-01-20 LLMOrbit: A Circular Taxonomy of Large Language Models - From Scaling Walls to Agentic AI Systems Badri N. Patro et.al. 2601.14053 translate read null
2026-01-20 Vision Also You Need: Navigating Out-of-Distribution Detection with Multimodal Large Language Model Haoran Xu et.al. 2601.14052 translate read null
2026-01-20 Top 10 Open Challenges Steering the Future of Diffusion Language Model and Its Variants Yunhe Wang et.al. 2601.14041 translate read null
2026-01-20 RM-Distiller: Exploiting Generative LLM for Reward Model Distillation Hongli Zhou et.al. 2601.14032 translate read null
2026-01-20 BACH-V: Bridging Abstract and Concrete Human-Values in Large Language Models Junyu Zhang et.al. 2601.14007 translate read null
2026-01-20 Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models Hengyuan Zhang et.al. 2601.14004 translate read link
2026-01-20 Auditory Brain Passage Retrieval: Cross-Sensory EEG Training for Neural Information Retrieval Niall McGuire et.al. 2601.14001 translate read null
2026-01-20 “The Whole Is Greater Than the Sum of Its Parts”: A Compatibility-Aware Multi-Teacher CoT Distillation Framework Jin Cui et.al. 2601.13992 translate read null
2026-01-20 VirtualCrime: Evaluating Criminal Potential of Large Language Models via Sandbox Simulation Yilin Tang et.al. 2601.13981 translate read null
2026-01-20 RepoGenesis: Benchmarking End-to-End Microservice Generation from Readme to Repository Zhiyuan Peng et.al. 2601.13943 translate read null
2026-01-20 Glance-or-Gaze: Incentivizing LMMs to Adaptively Focus Search via Reinforcement Learning Hongbo Bai et.al. 2601.13942 translate read null
2026-01-20 HyperWalker: Dynamic Hypergraph-Based Deep Diagnosis for Multi-Hop Clinical Modeling across EHR and X-Ray in Medical VLMs Yuezhe Yang et.al. 2601.13919 translate read null
2026-01-20 AgentEHR: Advancing Autonomous Clinical Decision-Making via Retrospective Summarization Yusheng Liao et.al. 2601.13918 translate read link
2026-01-20 Synthetic Singers: A Review of Deep-Learning-based Singing Voice Synthesis Approaches Changhao Pan et.al. 2601.13910 translate read null
2026-01-20 Multi-Objective Hierarchical Optimization with Large Language Models Andrej Schwanke et.al. 2601.13892 translate read null
2026-01-20 Human Simulation Computation: A Human-Inspired Framework for Adaptive AI Systems Hong Su et.al. 2601.13887 translate read null
2026-01-20 OpenLearnLM Benchmark: A Unified Framework for Evaluating Knowledge, Skill, and Attitude in Educational Large Language Models Unggi Lee et.al. 2601.13882 translate read null
2026-01-20 LifeAgentBench: A Multi-dimensional Benchmark and Agent for Personal Health Assistants in Digital Health Ye Tian et.al. 2601.13880 translate read null
2026-01-20 Chain-of-Thought Compression Should Not Be Blind: V-Skip for Efficient Multimodal Reasoning via Dual-Path Anchoring Dongxu Zhang et.al. 2601.13879 translate read null
2026-01-20 Pedagogical Alignment for Vision-Language-Action Models: A Comprehensive Framework for Data, Architecture, and Evaluation in Education Unggi Lee et.al. 2601.13876 translate read null
2026-01-20 HardSecBench: Benchmarking the Security Awareness of LLMs for Hardware Code Generation Qirui Chen et.al. 2601.13864 translate read null
2026-01-20 QKVQA: Question-Focused Filtering for Knowledge-based VQA Wei Ye et.al. 2601.13856 translate read null
2026-01-20 Small Models, Big Impact: Tool-Augmented AI Agents for Wireless Network Planning Yongqiang Zhang et.al. 2601.13843 translate read null
2026-01-20 DisasterVQA: A Visual Question Answering Benchmark Dataset for Disaster Scenes Aisha Al-Mohannadi et.al. 2601.13839 translate read null
2026-01-20 FutureOmni: Evaluating Future Forecasting from Omni-Modal Context for Multimodal LLMs Qian Chen et.al. 2601.13836 translate read link
2026-01-20 ELSA: Efficient LLM-Centric Split Aggregation for Privacy-Aware Hierarchical Federated Learning over Resource-Constrained Edge Networks Xiaohong Yang et.al. 2601.13824 translate read null
2026-01-20 HoverAI: An Embodied Aerial Agent for Natural Human-Drone Interaction Yuhua Jin et.al. 2601.13801 translate read null
2026-01-20 Look-Ahead-Bench: a Standardized Benchmark of Look-ahead Bias in Point-in-Time LLMs for Finance Mostapha Benhenda et.al. 2601.13770 translate read null
2026-01-20 DARC: Decoupled Asymmetric Reasoning Curriculum for LLM Evolution Shengda Fan et.al. 2601.13761 translate read link
2026-01-20 On Autopilot? An Empirical Study of Human-AI Teaming and Review Practices in Open Source Haoyu Gao et.al. 2601.13754 translate read null
2026-01-20 Pro-AI Bias in Large Language Models Benaya Trabelsi et.al. 2601.13749 translate read null
2026-01-20 Dimension-First Evaluation of Speech-to-Speech Models with Structured Acoustic Cues Arjun Chandra et.al. 2601.13742 translate read null
2026-01-20 Towards robust long-context understanding of large language model via active recap learning Chenyu Hui et.al. 2601.13734 translate read null
2026-01-20 OP-Bench: Benchmarking Over-Personalization for Memory-Augmented Personalized Conversational Agents Yulin Hu et.al. 2601.13722 translate read null
2026-01-20 GerAV: Towards New Heights in German Authorship Verification using Fine-Tuned LLMs on a New Benchmark Lotta Kiefer et.al. 2601.13711 translate read null
2026-01-20 Hidden in Plain Text: Measuring LLM Deception Quality Against Human Baselines Using Social Deduction Games Christopher Kao et.al. 2601.13709 translate read null
2026-01-20 IGAA: Intent-Driven General Agentic AI for Edge Services Scheduling using Generative Meta Learning Yan Sun et.al. 2601.13702 translate read null
2026-01-20 Uncertainty-Aware Gradient Signal-to-Noise Data Selection for Instruction Tuning Zhihang Yuan et.al. 2601.13697 translate read null
2026-01-20 Generative Intent Prediction Agentic AI empowered Edge Service Function Chain Orchestration Yan Sun et.al. 2601.13694 translate read null
2026-01-20 Dr. Assistant: Enhancing Clinical Diagnostic Inquiry via Structured Diagnostic Reasoning Data and Reinforcement Learning Yue Guo et.al. 2601.13690 translate read null
2026-01-20 CodeContests-O: Powering LLMs via Feedback-Driven Iterative Test Case Generation Jianfeng Cai et.al. 2601.13682 translate read link
2026-01-20 CommunityBench: Benchmarking Community-Level Alignment across Diverse Groups and Tasks Jiayu Lin et.al. 2601.13669 translate read null
2026-01-20 Temporal-Spatial Decouple before Act: Disentangled Representation Learning for Multimodal Sentiment Analysis Chunlei Meng et.al. 2601.13659 translate read null
2026-01-20 Beyond Known Facts: Generating Unseen Temporal Knowledge to Address Data Contamination in LLM Evaluation Arthur Amalvy et.al. 2601.13658 translate read null
2026-01-20 Why Does the LLM Stop Computing: An Empirical Study of User-Reported Failures in Open-Source LLMs Guangba Yu et.al. 2601.13655 translate read null
2026-01-20 TimeART: Towards Agentic Time Series Reasoning via Tool-Augmentation Xingjian Wu et.al. 2601.13653 translate read null
2026-01-20 Fairness or Fluency? An Investigation into Language Bias of Pairwise LLM-as-a-Judge Xiaolin Zhou et.al. 2601.13649 translate read null
2026-01-20 ContiguousKV: Accelerating LLM Prefill with Granularity-Aligned KV Cache Management Jing Zou et.al. 2601.13631 translate read null
2026-01-20 Activation-Space Anchored Access Control for Multi-Class Permission Reasoning in Large Language Models Zhaopeng Zhang et.al. 2601.13630 translate read null
2026-01-20 S$^2$Voice: Style-Aware Autoregressive Modeling with Enhanced Conditioning for Singing Style Conversion Ziqian Wang et.al. 2601.13629 translate read null
2026-01-20 PRIMAL: Processing-In-Memory Based Low-Rank Adaptation for LLM Inference Accelerator Yue Jiet Chong et.al. 2601.13628 translate read null
2026-01-20 Are Large Language Models able to Predict Highly Cited Papers? Evidence from Statistical Publications Zhanshuo Ye et.al. 2601.13627 translate read null
2026-01-20 PINA: Prompt Injection Attack against Navigation Agents Jiani Liu et.al. 2601.13612 translate read null
2026-01-20 Foundations of Global Consistency Checking with Noisy LLM Oracles Paul He et.al. 2601.13600 translate read null
2026-01-20 AI IDEs or Autonomous Agents? Measuring the Impact of Coding Agents on Software Development Shyam Agarwal et.al. 2601.13597 translate read null
2026-01-20 Vulnerability of LLMs’ Belief Systems? LLMs Belief Resistance Check Through Strategic Persuasive Conversation Interventions Fan Huang et.al. 2601.13590 translate read null
2026-01-20 TREX: Tokenizer Regression for Optimal Data Mixture Inho Won et.al. 2601.13588 translate read null
2026-01-20 SCRIPTMIND: Crime Script Inference and Cognitive Evaluation for LLM-based Social Engineering Scam Detection System Heedou Kim et.al. 2601.13581 translate read null
2026-01-20 Leveraging ChatGPT and Other NLP Methods for Identifying Risk and Protective Behaviors in MSM: Social Media and Dating apps Text Analysis Mehrab Beikzadeh et.al. 2601.13558 translate read null
2026-01-20 LogicEnvGen: Task-Logic Driven Generation of Diverse Simulated Environments for Embodied AI Jianan Wang et.al. 2601.13556 translate read null
2026-01-20 TruthTensor: Evaluating LLMs Human Imitation through Prediction Market Drift and Holistic Reasoning Shirin Shahabi et.al. 2601.13545 translate read null
2026-01-20 When Wording Steers the Evaluation: Framing Bias in LLM judges Yerin Hwang et.al. 2601.13537 translate read null
2026-01-20 CatMaster: An Agentic Autonomous System for Computational Heterogeneous Catalysis Research Honghao Chen et.al. 2601.13508 translate read null
2026-01-20 Towards Efficient and Robust Linguistic Emotion Diagnosis for Mental Health via Multi-Agent Instruction Refinement Jian Zhang et.al. 2601.13481 translate read null
2026-01-20 A Unified Variational Imputation Framework for Electric Vehicle Charging Data Using Retrieval-Augmented Language Model Jinhao Li et.al. 2601.13476 translate read null
2026-01-20 Preconditioning Benefits of Spectral Orthogonalization in Muon Jianhao Ma et.al. 2601.13474 translate read null
2026-01-19 PhysicsSolutionAgent: Towards Multimodal Explanations for Numerical Physics Problem Solving Aditya Thole et.al. 2601.13453 translate read null
2026-01-19 Explicit Cognitive Allocation: A Principle for Governed and Auditable Inference in Large Language Models Héctor Manuel Manzanilla-Granados et.al. 2601.13443 translate read null
2026-01-19 Trust Me, I’m an Expert: Decoding and Steering Authority Bias in Large Language Models Priyanka Mary Mammen et.al. 2601.13433 translate read null
2026-01-19 RLBR: Reinforcement Learning with Biasing Rewards for Contextual Speech Large Language Models Bo Ren et.al. 2601.13409 translate read null
2026-01-19 Integrating Virtual Reality and Large Language Models for Team-Based Non-Technical Skills Training and Evaluation in the Operating Room Jacob Barker et.al. 2601.13406 translate read null
2026-01-19 Beyond Memorization: Testing LLM Reasoning on Unseen Theory of Computation Tasks Shlok Shelat et.al. 2601.13392 translate read null
2026-01-19 Structured Insight from Unstructured Data: Large Language Models for SDOH-Driven Diabetes Risk Prediction Sasha Ronaghi et.al. 2601.13388 translate read null
2026-01-19 Confidence over Time: Confidence Calibration with Temporal Logic for Large Language Model Reasoning Zhenjiang Mao et.al. 2601.13387 translate read null
2026-01-19 A Lightweight Modular Framework for Constructing Autonomous Agents Driven by Large Language Models: Design, Implementation, and Applications in AgentForge Akbar Anbar Jafari et.al. 2601.13383 translate read null
2026-01-19 Bounded Minds, Generative Machines: Envisioning Conversational AI that Works with Human Heuristics and Reduces Bias Risk Jiqun Liu et.al. 2601.13376 translate read null
2026-01-19 Recurrent Confidence Chain: Temporal-Aware Uncertainty Quantification in Large Language Models Zhenjiang Mao et.al. 2601.13368 translate read null
2026-01-19 Sockpuppetting: Jailbreaking LLMs Without Optimization Through Output Prefix Injection Asen Dotsinski et.al. 2601.13359 translate read null
2026-01-19 The Geometry of Thought: How Scale Restructures Reasoning In Large Language Models Samuel Cyrenius Anderson et.al. 2601.13358 translate read null
2026-01-19 LLM-as-RNN: A Recurrent Language Model for Memory Updates and Sequence Prediction Yuxing Lu et.al. 2601.13352 translate read null
2026-01-19 FlipFlop: A Static Analysis-based Energy Optimization Framework for GPU Kernels Saurabhsingh Rajput et.al. 2601.13345 translate read null
2026-01-19 Paid Voices vs. Public Feeds: Interpretable Cross-Platform Theme Modeling of Climate Discourse Samantha Sudhoff et.al. 2601.13317 translate read null
2026-01-19 CausalSpatial: A Benchmark for Object-Centric Causal Spatial Reasoning Wenxin Ma et.al. 2601.13304 translate read null
2026-01-19 OI-Bench: An Option Injection Benchmark for Evaluating LLM Susceptibility to Directive Interference Yow-Fu Liou et.al. 2601.13300 translate read null
2026-01-19 Enginuity: Building an Open Multi-Domain Dataset of Complex Engineering Diagrams Ethan Seefried et.al. 2601.13299 translate read null
2026-01-19 The Tag is the Signal: URL-Agnostic Credibility Scoring for Messages on Telegram Yipeng Wang et.al. 2601.13294 translate read null
2026-01-19 Semantic Communication in Underwater IoT Networks for Meaning-Driven Connectivity Ruhul Amin Khalil et.al. 2601.13289 translate read null
2026-01-19 Balancing Classification and Calibration Performance in Decision-Making LLMs via Calibration Aware Reinforcement Learning Duygu Nur Yaldiz et.al. 2601.13284 translate read null
2026-01-19 Improving the Safety and Trustworthiness of Medical AI via Multi-Agent Evaluation Loops Zainab Ghafoor et.al. 2601.13268 translate read null
2026-01-19 Unlearning in LLMs: Methods, Evaluation, and Open Challenges Tyler Lizzo et.al. 2601.13264 translate read null
2026-01-19 CURE-Med: Curriculum-Informed Reinforcement Learning for Multilingual Medical Reasoning Eric Onyame et.al. 2601.13262 translate read null
2026-01-19 Stop Taking Tokenizers for Granted: They Are Core Design Decisions in Large Language Models Sawsan Alqahtani et.al. 2601.13260 translate read null
2026-01-19 Aligning Agentic World Models via Knowledgeable Experience Learning Baochang Ren et.al. 2601.13247 translate read null
2026-01-19 A Comprehensive Evaluation of LLM Reasoning: From Single-Model to Multi-Agent Paradigms Yapeng Li et.al. 2601.13243 translate read null
2026-01-19 KOCO-BENCH: Can Large Language Models Leverage Domain Knowledge in Software Development? Xue Jiang et.al. 2601.13240 translate read null
2026-01-19 GTPred: Benchmarking MLLMs for Interpretable Geo-localization and Time-of-capture Prediction Jinnao Li et.al. 2601.13207 translate read null
2026-01-19 Real-Time Deadlines Reveal Temporal Awareness Failures in LLM Strategic Dialogues Neil K. R. Sehgal et.al. 2601.13206 translate read null
2026-01-19 Scientific production in the era of Large Language Models Keigo Kusumegi et.al. 2601.13187 translate read null
2026-01-19 Prompt Injection Mitigation with Agentic AI, Nested Learning, and AI Sustainability via Semantic Caching Diego Gosmar et.al. 2601.13186 translate read null
2026-01-19 Training instability in deep learning follows low-dimensional dynamical principles Zhipeng Zhang et.al. 2601.13160 translate read null
2026-01-19 Seeing Radio: From Zero RF Priors to Explainable Modulation Recognition with Vision Language Models Hang Zou et.al. 2601.13157 translate read null
2026-01-19 Probe and Skip: Self-Predictive Token Skipping for Efficient Long-Context LLM Inference Zimeng Wu et.al. 2601.13155 translate read null
2026-01-19 FastAV: Efficient Token Pruning for Audio-Visual Large Language Model Inference Chaeyoung Jung et.al. 2601.13143 translate read null
2026-01-19 From Human to Machine Refactoring: Assessing GPT-4’s Impact on Python Class Quality and Readability Alessandro Midolo et.al. 2601.13139 translate read null
2026-01-19 Adversarial Alignment: Ensuring Value Consistency in Large Language Models for Sensitive Domains Yuan Gao et.al. 2601.13137 translate read null
2026-01-19 Guidelines to Prompt Large Language Models for Code Generation: An Empirical Characterization Alessandro Midolo et.al. 2601.13118 translate read null
2026-01-19 Agentic Conversational Search with Contextualized Reasoning via Reinforcement Learning Fengran Mo et.al. 2601.13115 translate read null
2026-01-19 Leveraging Lora Fine-Tuning and Knowledge Bases for Construction Identification Liu Kaipeng et.al. 2601.13105 translate read null
2026-01-19 Alexandria: A Multi-Domain Dialectal Arabic Machine Translation Dataset for Culturally Inclusive and Linguistically Diverse LLMs Abdellah El Mekki et.al. 2601.13099 translate read null
2026-01-19 LLM-VLM Fusion Framework for Autonomous Maritime Port Inspection using a Heterogeneous UAV-USV System Muhayy Ud Din et.al. 2601.13096 translate read null
2026-01-19 Adversarial News and Lost Profits: Manipulating Headlines in LLM-Driven Algorithmic Trading Advije Rizvani et.al. 2601.13082 translate read null
2026-01-19 What’s it like to be a chat? On the co-simulation of artificial minds in human-AI conversations Geoff Keeling et.al. 2601.13081 translate read null
2026-01-19 Profiling German Text Simplification with Interpretable Model-Fingerprints Lars Klöser et.al. 2601.13050 translate read null
2026-01-19 Tears or Cheers? Benchmarking LLMs via Culturally Elicited Distinct Affective Responses Chongyuan Dai et.al. 2601.13024 translate read null
2026-01-19 PASs-MoE: Mitigating Misaligned Co-drift among Router and Experts via Pathway Activation Subspaces for Continual Learning Zhiyan Hou et.al. 2601.13020 translate read null
2026-01-19 MeltRTL: Multi-Expert LLMs with Inference-time Intervention for RTL Code Generation Nowfel Mashnoor et.al. 2601.13015 translate read null
2026-01-19 ArchAgent: Scalable Legacy Software Architecture Recovery with LLMs Rusheng Pan et.al. 2601.13007 translate read null
2026-01-19 Graph Reasoning Paradigm: Structured and Symbolic Reasoning with Topology-Aware Reinforcement Learning for Large Language Models Runxuan Liu et.al. 2601.12995 translate read null
2026-01-19 RAGExplorer: A Visual Analytics System for the Comparative Diagnosis of RAG Systems Haoyu Tian et.al. 2601.12991 translate read null
2026-01-19 PaperGuide: Making Small Language-Model Paper-Reading Agents More Efficient Zijian Wang et.al. 2601.12988 translate read null
2026-01-19 KinGuard: Hierarchical Kinship-Aware Fingerprinting to Defend Against Large Language Model Stealing Zhenhua Xu et.al. 2601.12986 translate read null
2026-01-19 Rules, Resources, and Restrictions: A Taxonomy of Task-Based Information Request Intents Melanie A. Kilian et.al. 2601.12985 translate read null
2026-01-19 ChartAttack: Testing the Vulnerability of LLMs to Malicious Prompting in Chart Generation Jesus-German Ortiz-Barajas et.al. 2601.12983 translate read null
2026-01-19 The Bitter Lesson of Diffusion Language Models for Agentic Workflows: A Comprehensive Reality Check Qingyu Lu et.al. 2601.12979 translate read null
2026-01-19 Bridging the Knowledge-Action Gap by Evaluating LLMs in Dynamic Dental Clinical Scenarios Hongyang Ma et.al. 2601.12974 translate read null
2026-01-19 ACE-Align: Attribute Causal Effect Alignment for Cultural Values under Varying Persona Granularities Jiatang Luo et.al. 2601.12962 translate read null
2026-01-19 Beyond Accuracy: Characterizing Code Comprehension Capabilities in (Large) Language Models Felix Mächtle et.al. 2601.12951 translate read null
2026-01-19 AI-generated data contamination erodes pathological variability and diagnostic reliability Hongyu He et.al. 2601.12946 translate read null
2026-01-19 A Component-Based Survey of Interactions between Large Language Models and Multi-Armed Bandits Miao Xie et.al. 2601.12945 translate read null
2026-01-19 On the Evidentiary Limits of Membership Inference for Copyright Auditing Murat Bilgehan Ertan et.al. 2601.12937 translate read null
2026-01-19 A Benchmark for Language Models in Real-World System Building Weilin Jin et.al. 2601.12927 translate read null
2026-01-19 Dual-Stream Collaborative Transformer for Image Captioning Jun Wan et.al. 2601.12926 translate read null
2026-01-19 Injecting Knowledge from Social Science Journals to Improve Indonesian Cultural Understanding by LLMs Adimulya Kartiyasa et.al. 2601.12921 translate read null
2026-01-19 CooperLLM: Cloud-Edge-End Cooperative Federated Fine-tuning for LLMs via ZOO-based Gradient Correction He Sun et.al. 2601.12917 translate read null
2026-01-19 From Prefix Cache to Fusion RAG Cache: Accelerating LLM Inference in Retrieval-Augmented Generation Jiahao Wang et.al. 2601.12904 translate read null
2026-01-19 Efficient Code Analysis via Graph-Guided Large Language Models Hang Gao et.al. 2601.12890 translate read null
2026-01-19 Race, Ethnicity and Their Implication on Bias in Large Language Models Shiyue Hu et.al. 2601.12868 translate read null
2026-01-19 SCULPT: Constraint-Guided Pruned MCTS that Carves Efficient Paths for Mathematical Reasoning Qitong Fang et.al. 2601.12842 translate read null
2026-01-19 Do Clinical Question Answering Systems Really Need Specialised Medical Fine Tuning? Sushant Kumar Ray et.al. 2601.12812 translate read null
2026-01-19 Semi-supervised Instruction Tuning for Large Language Models on Text-Attributed Graphs Zixing Song et.al. 2601.12807 translate read null
2026-01-19 SciHorizon-GENE: Benchmarking LLM for Life Sciences Inference from Gene Knowledge to Functional Understanding Xiaohan Huang et.al. 2601.12805 translate read null
2026-01-19 VIRO: Robust and Efficient Neuro-Symbolic Reasoning with Verification for Referring Expression Comprehension Hyejin Park et.al. 2601.12781 translate read null
2026-01-19 Who Does This Name Remind You of? Nationality Prediction via Large Language Model Associative Memory Keito Inoshita et.al. 2601.12771 translate read null
2026-01-19 Spatial-VLN: Zero-Shot Vision-and-Language Navigation With Explicit Spatial Perception and Exploration Lu Yue et.al. 2601.12766 translate read null
2026-01-19 Teaching LLMs to Learn Tool Trialing and Execution through Environment Interaction Xingjie Gao et.al. 2601.12762 translate read link
2026-01-19 VISPA: Pluralistic Alignment via Automatic Value Selection and Activation Shenyan Zheng et.al. 2601.12758 translate read null
2026-01-19 PAIR-SAFE: A Paired-Agent Approach for Runtime Auditing and Refining AI-Mediated Mental Health Support Jiwon Kim et.al. 2601.12754 translate read null
2026-01-19 Towards Robust Process Reward Modeling via Noise-aware Learning Bin Xie et.al. 2601.12748 translate read null
2026-01-19 Vision Language Models for Optimization-Driven Intent Processing in Autonomous Networks Tasnim Ahmed et.al. 2601.12744 translate read null
2026-01-19 A Shared Geometry of Difficulty in Multilingual Language Models Stefano Civelli et.al. 2601.12731 translate read null
2026-01-19 Distribution-Centric Policy Optimization Dominates Exploration-Exploitation Trade-off Zhaochun Li et.al. 2601.12730 translate read link
2026-01-19 AI-exhibited Personality Traits Can Shape Human Self-concept through Conversations Jingshu Li et.al. 2601.12727 translate read null
2026-01-19 An Evolutionary Framework for Automatic Optimization Benchmark Generation via Large Language Models Yuhiro Ono et.al. 2601.12723 translate read null
2026-01-19 CellularSpecSec-Bench: A Staged Benchmark for Evidence-Grounded Interpretation and Security Reasoning over 3GPP Specifications Ke Xie et.al. 2601.12716 translate read null
2026-01-19 Neurosymbolic LoRA: Why and When to Tune Weights vs. Rewrite Prompts Kevin Wang et.al. 2601.12711 translate read null
2026-01-19 Improving Audio Question Answering with Variational Inference Haolin Chen et.al. 2601.12700 translate read null
2026-01-19 MetaToolAgent: Towards Generalizable Tool Usage in LLMs through Meta-Learning Zheng Fang et.al. 2601.12680 translate read null
2026-01-19 MedConsultBench: A Full-Cycle, Fine-Grained, Process-Aware Benchmark for Medical Consultation Agents Chuhan Qiao et.al. 2601.12661 translate read null
2026-01-19 Augmenting Question Answering with A Hybrid RAG Approach Tianyi Yang et.al. 2601.12658 translate read null
2026-01-19 Ethical Risks in Deploying Large Language Models: An Evaluation of Medical Ethics Jailbreaking Chutian Huang et.al. 2601.12652 translate read null
2026-01-19 Intelligent Documentation in Medical Education: Can AI Replace Manual Case Logging? Nafiz Imtiaz Khan et.al. 2601.12648 translate read null
2026-01-19 STEP-LLM: Generating CAD STEP Models from Natural Language with Large Language Models Xiangyu Shi et.al. 2601.12641 translate read null
2026-01-19 BioPulse-QA: A Dynamic Biomedical Question-Answering Benchmark for Evaluating Factuality, Robustness, and Bias in Large Language Models Kriti Bhattarai et.al. 2601.12632 translate read null
2026-01-16 Extractive summarization on a CMOS Ising machine Ziqing Zeng et.al. 2601.11491 translate read null
2026-01-16 Health Facility Location in Ethiopia: Leveraging LLMs to Integrate Expert Knowledge into Algorithmic Planning Yohai Trabelsi et.al. 2601.11479 translate read null
2026-01-16 Predict the Retrieval! Test time adaptation for Retrieval Augmented Generation Xin Sun et.al. 2601.11443 translate read null
2026-01-16 Hierarchical Orthogonal Residual Spread for Precise Massive Editing in Large Language Models Xiaojie Gu et.al. 2601.11441 translate read null
2026-01-16 The unreasonable effectiveness of pattern matching Gary Lupyan et.al. 2601.11432 translate read null
2026-01-16 Relational Linearity is a Predictor of Hallucinations Yuetian Lu et.al. 2601.11429 translate read null
2026-01-16 Understanding Help Seeking for Digital Privacy, Safety, and Security Kurt Thomas et.al. 2601.11398 translate read null
2026-01-16 Heterogeneous Uncertainty-Guided Composed Image Retrieval with Fine-Grained Probabilistic Learning Haomiao Tang et.al. 2601.11393 translate read null
2026-01-16 Evaluating LLM Behavior in Hiring: Implicit Weights, Fairness Across Groups, and Alignment with Human Preferences Morgane Hoffmann et.al. 2601.11379 translate read null
2026-01-16 Reward Modeling for Scientific Writing Evaluation Furkan Şahinuç et.al. 2601.11374 translate read null
2026-01-16 RITA: A Tool for Automated Requirements Classification and Specification from Online User Feedback Manjeshwar Aniruddh Mallya et.al. 2601.11362 translate read null
2026-01-16 Think-Clip-Sample: Slow-Fast Frame Selection for Video Understanding Wenhui Tan et.al. 2601.11359 translate read null
2026-01-16 AstroReason-Bench: Evaluating Unified Agentic Planning across Heterogeneous Space Planning Problems Weiyi Wang et.al. 2601.11354 translate read link
2026-01-16 How Much Would a Clinician Edit This Draft? Evaluating LLM Alignment for Patient Message Response Drafting Parker Seegmiller et.al. 2601.11344 translate read null
2026-01-16 Unlocking the Potentials of Retrieval-Augmented Generation for Diffusion Language Models Chuanyue Yu et.al. 2601.11342 translate read null
2026-01-16 Neural Chain-of-Thought Search: Searching the Optimal Reasoning Path to Enhance Large Language Models Guoming Ling et.al. 2601.11340 translate read null
2026-01-16 Idea First, Code Later: Disentangling Problem Solving from Code Generation in Evaluating LLMs for Competitive Programming Sama Hadhoud et.al. 2601.11332 translate read null
2026-01-16 Membership Inference on LLMs in the Wild Jiatong Yi et.al. 2601.11314 translate read null
2026-01-16 FORESTLLM: Large Language Models Make Random Forest Great on Few-shot Tabular Learning Zhihan Yang et.al. 2601.11311 translate read null
2026-01-16 One LLM to Train Them All: Multi-Task Learning Framework for Fact-Checking Malin Astrid Larsson et.al. 2601.11293 translate read null
2026-01-16 Knowledge is Not Enough: Injecting RL Skills for Continual Adaptation Pingzhi Tang et.al. 2601.11258 translate read null
2026-01-16 Reasoning in Trees: Improving Retrieval-Augmented Generation for Multi-Hop Question Answering Yuling Shi et.al. 2601.11255 translate read null
2026-01-16 LLM-Assisted Pseudo-Relevance Feedback David Otero et.al. 2601.11238 translate read null
2026-01-16 How DDAIR you? Disambiguated Data Augmentation for Intent Recognition Galo Castillo-López et.al. 2601.11234 translate read null
2026-01-16 FactCorrector: A Graph-Inspired Approach to Long-Form Factuality Correction of Large Language Models Javier Carnerero-Cano et.al. 2601.11232 translate read null
2026-01-16 Language of Thought Shapes Output Diversity in Large Language Models Shaoyang Xu et.al. 2601.11227 translate read null
2026-01-16 MultiCaption: Detecting disinformation using multilingual visual claims Rafael Martins Frade et.al. 2601.11220 translate read null
2026-01-16 SDFLoRA: Selective Dual-Module LoRA for Federated Fine-tuning with Heterogeneous Clients Zhikang Shen et.al. 2601.11219 translate read null
2026-01-16 FAQ: Mitigating Quantization Error via Regenerating Calibration Data with Family-Aware Quantization Haiyang Xiao et.al. 2601.11200 translate read null
2026-01-16 SD-RAG: A Prompt-Injection-Resilient Framework for Selective Disclosure in Retrieval-Augmented Generation Aiman Al Masoud et.al. 2601.11199 translate read null
2026-01-16 From Knots to Knobs: Towards Steerable Collaborative Filtering Using Sparse Autoencoders Martin Spišák et.al. 2601.11182 translate read null
2026-01-16 Do We Always Need Query-Level Workflows? Rethinking Agentic Workflow Generation for Multi-Agent Systems Zixu Wang et.al. 2601.11147 translate read null
2026-01-16 Learn Before Represent: Bridging Generative and Contrastive Learning for Domain-Specific LLM Embeddings Xiaoyu Liang et.al. 2601.11124 translate read null
2026-01-16 Optimized Algorithms for Text Clustering with LLM-Generated Constraints Chaoqi Jia et.al. 2601.11118 translate read null
2026-01-16 Differentially Private Subspace Fine-Tuning for Large Language Models Lele Zheng et.al. 2601.11113 translate read null
2026-01-16 Simple Models, Rich Representations: Visual Decoding from Primate Intracortical Neural Signals Matteo Ciferri et.al. 2601.11108 translate read null
2026-01-16 ReCreate: Reasoning and Creating Domain Agents Driven by Experience Zhezheng Hao et.al. 2601.11100 translate read null
2026-01-16 Integrity Shield: A System for Ethical AI Use & Authorship Transparency in Assessments Ashish Raj Shekhar et.al. 2601.11093 translate read null
2026-01-16 ABC-Bench: Benchmarking Agentic Backend Coding in Real-World Development Jie Yang et.al. 2601.11077 translate read link
2026-01-16 Visual question answering-based image-finding generation for pulmonary nodules on chest CT from structured annotations Maiko Nagao et.al. 2601.11075 translate read null
2026-01-16 H-AIM: Orchestrating LLMs, PDDL, and Behavior Trees for Hierarchical Multi-Robot Planning Haishan Zeng et.al. 2601.11063 translate read null
2026-01-16 Children’s Expectations, Engagement, and Evaluation of an LLM-enabled Spherical Visualization Platform in the Classroom Emelie Fälton et.al. 2601.11060 translate read null
2026-01-16 Predicting Biased Human Decision-Making with Large Language Models in Conversational Settings Stephen Pilli et.al. 2601.11049 translate read null
2026-01-16 CoG: Controllable Graph Reasoning via Relational Blueprints and Failure-Aware Refinement over Knowledge Graphs Yuanxiang Liu et.al. 2601.11047 translate read null
2026-01-16 AgencyBench: Benchmarking the Frontiers of Autonomous Agents in 1M-Token Real-World Contexts Keyu Li et.al. 2601.11044 translate read link
2026-01-16 Spectral Characterization and Mitigation of Sequential Knowledge Editing Collapse Chi Zhang et.al. 2601.11042 translate read null
2026-01-16 Budget-Aware Anytime Reasoning with LLM-Synthesized Preference Data Xuanming Zhang et.al. 2601.11038 translate read null
2026-01-16 PruneRAG: Confidence-Guided Query Decomposition Trees for Efficient Retrieval-Augmented Generation Shuguang Jiao et.al. 2601.11024 translate read null
2026-01-16 Combating Spurious Correlations in Graph Interpretability via Self-Reflection Kecheng Cai et.al. 2601.11021 translate read null
2026-01-16 Finding the Translation Switch: Discovering and Exploiting the Task-Initiation Features in LLMs Xinwei Wu et.al. 2601.11019 translate read null
2026-01-16 NAACL: Noise-AwAre Verbal Confidence Calibration for LLMs in RAG Systems Jiayu Liu et.al. 2601.11004 translate read null
2026-01-16 Redefining Machine Simultaneous Interpretation: From Incremental Translation to Human-Like Strategies Qianen Zhang et.al. 2601.11002 translate read null
2026-01-16 When Personalization Misleads: Understanding and Mitigating Hallucinations in Personalized LLMs Zhongxiang Sun et.al. 2601.11000 translate read null
2026-01-16 Data-driven Prediction of Ionic Conductivity in Solid-State Electrolytes with Machine Learning and Large Language Models Haewon Kim et.al. 2601.10997 translate read null
2026-01-16 ZPD Detector: Data Selection via Capability-Difficulty Alignment for Large Language Models Bo Yang et.al. 2601.10986 translate read null
2026-01-16 Evaluating 21st-Century Competencies in Postsecondary Curricula with Large Language Models: Performance Benchmarking and Reasoning-Based Prompting Strategies Zhen Xu et.al. 2601.10983 translate read null
2026-01-16 AJAR: Adaptive Jailbreak Architecture for Red-teaming Yipu Dou et.al. 2601.10971 translate read null
2026-01-16 Large Wireless Foundation Models: Stronger over Bigger Xiang Cheng et.al. 2601.10963 translate read null
2026-01-16 Beyond Max Tokens: Stealthy Resource Amplification via Tool Calling Chains in LLM Agents Kaiyu Zhou et.al. 2601.10955 translate read null
2026-01-16 SwiftKV: An Edge-Oriented Attention Algorithm and Multi-Head Accelerator for Fast, Efficient LLM Decoding Junming Zhang et.al. 2601.10953 translate read null
2026-01-16 Multi-Stage Patient Role-Playing Framework for Realistic Clinical Interactions Shijie Jiang et.al. 2601.10951 translate read null
2026-01-16 HOSL: Hybrid-Order Split Learning for Memory-Constrained Edge Training Aakriti et.al. 2601.10940 translate read null
2026-01-15 FrankenMotion: Part-level Human Motion Generation and Composition Chuqiao Li et.al. 2601.10909 translate read link
2026-01-15 Topic Modeling in New Physics Detection Alexandre Alves et.al. 2601.10871 translate read null
2026-01-15 Multi-Agent Taint Specification Extraction for Vulnerability Detection Jonah Ghebremichael et.al. 2601.10865 translate read null
2026-01-15 Reasoning Models Generate Societies of Thought Junsol Kim et.al. 2601.10825 translate read null
2026-01-15 Mugi: Value Level Parallelism For Efficient LLMs Daniel Price et.al. 2601.10823 translate read null
2026-01-15 Digital Metabolism: Decoupling Logic from Facts via Regenerative Unlearning – Towards a Pure Neural Logic Core Mengmeng Peng et.al. 2601.10810 translate read null
2026-01-15 A Concise Agent is Less Expert: Revealing Side Effects of Using Style Features on Conversational Agents Young-Min Cho et.al. 2601.10809 translate read null
2026-01-15 BYOL: Bring Your Own Language Into LLMs Syed Waqas Zamir et.al. 2601.10804 translate read null
2026-01-15 Bidirectional Human-Robot Communication for Physical Human-Robot Interaction Junxiang Wang et.al. 2601.10796 translate read null
2026-01-15 LogicLens: Leveraging Semantic Code Graph to explore Multi Repository large systems Niko Usai et.al. 2601.10773 translate read null
2026-01-15 Unifying Speech Recognition, Synthesis and Conversion with Autoregressive Transformers Runyuan Cai et.al. 2601.10770 translate read null
2026-01-14 Too Helpful to Be Safe: User-Mediated Attacks on Planning and Web-Use Agents Fengchao Chen et.al. 2601.10758 translate read null
2026-01-15 MatchTIR: Fine-Grained Supervision for Tool-Integrated Reasoning via Bipartite Matching Changle Qu et.al. 2601.10712 translate read link
2026-01-15 From One-to-One to Many-to-Many: Dynamic Cross-Layer Injection for Deep Vision-Language Fusion Cheng Chen et.al. 2601.10710 translate read null
2026-01-15 Grounding Agent Memory in Contextual Intent Ruozhen Yang et.al. 2601.10702 translate read null
2026-01-15 LIBERTy: A Causal Framework for Benchmarking Concept-Based Explanations of LLMs with Structural Counterfactuals Gilat Toker et.al. 2601.10700 translate read null
2026-01-15 Structure and Diversity Aware Context Bubble Construction for Enterprise Retrieval Augmented Systems Amir Khurshid et.al. 2601.10681 translate read null
2026-01-15 Are Your Reasoning Models Reasoning or Guessing? A Mechanistic Analysis of Hierarchical Reasoning Models Zirui Ren et.al. 2601.10679 translate read null
2026-01-15 Single-Stage Huffman Encoder for ML Compression Aditya Agrawal et.al. 2601.10673 translate read null
2026-01-15 Detecting Winning Arguments with Large Language Models and Persuasion Strategies Tiziano Labruna et.al. 2601.10660 translate read null
2026-01-15 PACEvolve: Enabling Long-Horizon Progress-Aware Consistent Evolution Minghao Yan et.al. 2601.10657 translate read null
2026-01-15 Influential Training Data Retrieval for Explaining Verbalized Confidence of LLMs Yuxi Xia et.al. 2601.10645 translate read null
2026-01-15 iTIMO: An LLM-empowered Synthesis Dataset for Travel Itinerary Modification Zhuoxuan Huang et.al. 2601.10609 translate read null
2026-01-15 Be Your Own Red Teamer: Safety Alignment via Self-Play and Reflective Experience Replay Hao Wang et.al. 2601.10589 translate read null
2026-01-15 From Single to Multi-Agent Reasoning: Advancing GeneGPT for Genomics QA Kimia Abedini et.al. 2601.10581 translate read null
2026-01-15 Generative AI collective behavior needs an interactionist paradigm Laura Ferrarotti et.al. 2601.10567 translate read null
2026-01-15 Defending Large Language Models Against Jailbreak Attacks via In-Decoding Safety-Awareness Probing Yinzhi Zhao et.al. 2601.10543 translate read null
2026-01-15 A Propagation Framework for Network Regression Yingying Ma et.al. 2601.10533 translate read null
2026-01-15 PERM: Psychology-grounded Empathetic Reward Modeling for Large Language Models Chengbing Wang et.al. 2601.10532 translate read null
2026-01-15 A Safety Report on GPT-5.2, Gemini 3 Pro, Qwen3-VL, Grok 4.1 Fast, Nano Banana Pro, and Seedream 4.5 Xingjun Ma et.al. 2601.10527 translate read null
2026-01-15 Diagnosing Generalization Failures in Fine-Tuned LLMs: A Cross-Architectural Study on Phishing Detection Frank Bobe et.al. 2601.10524 translate read null
2026-01-15 DR-Arena: an Automated Evaluation Framework for Deep Research Agents Yiwen Gao et.al. 2601.10504 translate read null
2026-01-15 Projected Microbatch Accumulation yields reference-free proximal policy updates for reinforcement learning Nilin Abrahamsen et.al. 2601.10498 translate read null
2026-01-15 Model See, Model Do? Exposure-Aware Evaluation of Bug-vs-Fix Preference in Code LLMs Ali Al-Kaswan et.al. 2601.10496 translate read null
2026-01-15 ChartComplete: A Taxonomy-based Inclusive Chart Dataset Ahmad Mustapha et.al. 2601.10462 translate read null
2026-01-15 Contextual StereoSet: Stress-Testing Bias Alignment Robustness in Large Language Models Abhinaba Basu et.al. 2601.10460 translate read null
2026-01-15 LangLasso: Interactive Cluster Descriptions through LLM Explanation Raphael Buchmüller et.al. 2601.10458 translate read null
2026-01-15 NSR-Boost: A Neuro-Symbolic Residual Boosting Framework for Industrial Legacy Models Ziming Dai et.al. 2601.10457 translate read null
2026-01-15 Development of Ontological Knowledge Bases by Leveraging Large Language Models Le Ngoc Luyen et.al. 2601.10436 translate read null
2026-01-15 LLMdoctor: Token-Level Flow-Guided Preference Optimization for Efficient Test-Time Alignment of Large Language Models Tiesunlong Shen et.al. 2601.10416 translate read null
2026-01-15 LADFA: A Framework of Using Large Language Models and Retrieval-Augmented Generation for Personal Data Flow Analysis in Privacy Policies Haiyue Yuan et.al. 2601.10413 translate read null
2026-01-15 Toward Ultra-Long-Horizon Agentic Science: Cognitive Accumulation for Machine Learning Engineering Xinyu Zhu et.al. 2601.10402 translate read null
2026-01-15 LatentRefusal: Latent-Signal Refusal for Unanswerable Text-to-SQL Queries Xuancheng Ren et.al. 2601.10398 translate read null
2026-01-15 The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models Christina Lu et.al. 2601.10387 translate read null
2026-01-15 Advanced Manufacturing with Renewable and Bio-based Materials: AI/ML workflows and Process Optimization Rigoberto Advincula et.al. 2601.10382 translate read null
2026-01-15 Fine-Grained Human Pose Editing Assessment via Layer-Selective MLLMs Ningyu Sun et.al. 2601.10369 translate read null
2026-01-15 Unlocking Implicit Experience: Synthesizing Tool-Use Trajectories from Text Zhihao Xu et.al. 2601.10355 translate read null
2026-01-15 SuS: Strategy-aware Surprise for Intrinsic Exploration Mark Kashirskiy et.al. 2601.10349 translate read null
2026-01-15 C-GRASP: Clinically-Grounded Reasoning for Affective Signal Processing Cheng Lin Cheng et.al. 2601.10342 translate read null
2026-01-15 Think-Then-Generate: Reasoning-Aware Text-to-Image Diffusion with LLM Encoders Siqi Kou et.al. 2601.10332 translate read null
2026-01-15 ROMA: Real-time Omni-Multimodal Assistant with Interactive Streaming Understanding Xueyun Tian et.al. 2601.10323 translate read null
2026-01-15 An Efficient Long-Context Ranking Architecture With Calibrated LLM Distillation: Application to Person-Job Fit Warren Jouanneau et.al. 2601.10321 translate read null
2026-01-15 The Straight and Narrow: Do LLMs Possess an Internal Moral Path? Luoming Hu et.al. 2601.10307 translate read null
2026-01-15 DanQing: An Up-to-Date Large-Scale Chinese Vision-Language Pre-training Dataset Hengyu Shen et.al. 2601.10305 translate read null
2026-01-15 Queueing-Aware Optimization of Reasoning Tokens for Accuracy-Latency Trade-offs in LLM Servers Emre Ozbas et.al. 2601.10274 translate read null
2026-01-15 MoST: Mixing Speech and Text with Modality-Aware Mixture of Experts Yuxuan Lou et.al. 2601.10272 translate read null
2026-01-15 In-Context Source and Channel Coding Ziqiong Wang et.al. 2601.10267 translate read null
2026-01-15 NoReGeo: Non-Reasoning Geometry Benchmark Irina Abdullaeva et.al. 2601.10254 translate read null
2026-01-15 Loop as a Bridge: Can Looped Transformers Truly Link Representation Space and Natural Language Outputs? Guanxu Chen et.al. 2601.10242 translate read null
2026-01-15 GeoSteer: Faithful Chain-of-Thought Steering via Latent Manifold Gradients Kentaro Kazama et.al. 2601.10229 translate read null
2026-01-15 Optimizing Multimodal LLMs for Egocentric Video Understanding: A Solution for the HD-EPIC VQA Challenge Sicheng Yang et.al. 2601.10228 translate read null
2026-01-15 PRL: Process Reward Learning Improves LLMs’ Reasoning Ability and Broadens the Reasoning Boundary Jiarui Yao et.al. 2601.10201 translate read null
2026-01-15 HUMANLLM: Benchmarking and Reinforcing LLM Anthropomorphism via Human Cognitive Patterns Xintao Wang et.al. 2601.10198 translate read null
2026-01-15 Autonomous Quantum Simulation through Large Language Model Agents Weitang Li et.al. 2601.10194 translate read null
2026-01-15 GFM4GA: Graph Foundation Model for Group Anomaly Detection Jiujiu Chen et.al. 2601.10193 translate read null
2026-01-15 HOMURA: Taming the Sand-Glass for Time-Constrained LLM Translation via Reinforcement Learning Ziang Cui et.al. 2601.10187 translate read null
2026-01-15 ReasAlign: Reasoning Enhanced Safety Alignment against Prompt Injection Attack Hao Li et.al. 2601.10173 translate read null
2026-01-15 Credit C-GPT: A Domain-Specialized Large Language Model for Conversational Understanding in Vietnamese Debt Collection Nhung Nguyen Thi Hong et.al. 2601.10167 translate read null
2026-01-15 Advancing Adaptive Multi-Stage Video Anomaly Reasoning: A Benchmark Dataset and Method Chao Huang et.al. 2601.10165 translate read null
2026-01-15 AWED-FiNER: Agents, Web applications, and Expert Detectors for Fine-grained Named Entity Recognition across 36 Languages for 6.6 Billion Speakers Prachuryya Kaushik et.al. 2601.10161 translate read link
2026-01-15 LOOKAT: Lookup-Optimized Key-Attention for Memory-Efficient Transformers Aryan Karmore et.al. 2601.10155 translate read null
2026-01-15 DecisionLLM: Large Language Models for Long Sequence Decision Exploration Xiaowei Lv et.al. 2601.10148 translate read null
2026-01-15 Actors, Frames and Arguments: A Multi-Decade Computational Analysis of Climate Discourse in Financial News using Large Language Models Ruiran Su et.al. 2601.10142 translate read null
2026-01-15 Understanding and Preserving Safety in Fine-Tuned LLMs Jiawen Zhang et.al. 2601.10141 translate read null
2026-01-15 Is More Context Always Better? Examining LLM Reasoning Capability for Time Interval Prediction Yanan Cao et.al. 2601.10132 translate read null
2026-01-15 M^4olGen: Multi-Agent, Multi-Stage Molecular Generation under Precise Multi-Property Constraints Yizhan Li et.al. 2601.10131 translate read null
2026-01-15 LaViT: Aligning Latent Visual Thoughts for Multi-modal Reasoning Linquan Wu et.al. 2601.10129 translate read link
2026-01-15 Role-Playing Agents Driven by Large Language Models: Current Status, Challenges, and Future Trends Ye Wang et.al. 2601.10122 translate read null
2026-01-15 Following the Teacher’s Footsteps: Scheduled Checkpoint Distillation for Domain-Specific LLMs Cheng Feng et.al. 2601.10114 translate read null
2026-01-15 SIN-Bench: Tracing Native Evidence Chains in Long-Context Multimodal Scientific Interleaved Literature Yiming Ren et.al. 2601.10108 translate read null
2026-01-15 When Personas Override Payoffs: Role Identity Bias in Multi-Agent LLM Decision-Making Viswonathan Manoranjan et.al. 2601.10102 translate read null
2026-01-15 MATRIX AS PLAN: Structured Logical Reasoning with Feedback-Driven Replanning Ke Chen et.al. 2601.10101 translate read null
2026-01-15 Multilingual-To-Multimodal (M2M): Unlocking New Languages with Monolingual Text Piyush Singh Pasi et.al. 2601.10096 translate read link
2026-01-15 State of AI: An Empirical 100 Trillion Token Study with OpenRouter Malika Aubakirova et.al. 2601.10088 translate read null
2026-01-15 CALM-IT: Generating Realistic Long-Form Motivational Interviewing Dialogues with Dual-Actor Conversational Dynamics Tracking Viet Cuong Nguyen et.al. 2601.10085 translate read null
2026-01-15 Sparse-RL: Breaking the Memory Wall in LLM Reinforcement Learning via Stable Sparse Rollouts Sijia Luo et.al. 2601.10079 translate read null
2026-01-15 Long-Chain Reasoning Distillation via Adaptive Prefix Alignment Zhenghao Liu et.al. 2601.10064 translate read null
2026-01-15 Unlabeled Data Can Provably Enhance In-Context Learning of Transformers Renpu Liu et.al. 2601.10058 translate read null
2026-01-15 Privacy Enhanced PEFT: Tensor Train Decomposition Improves Privacy Utility Tradeoffs under DP-SGD Pradip Kunwar et.al. 2601.10045 translate read null
2026-01-15 Instruction Finetuning LLaMA-3-8B Model Using LoRA for Financial Named Entity Recognition Zhiming Lian et.al. 2601.10043 translate read null
2026-01-15 EmplifAI: a Fine-grained Dataset for Japanese Empathetic Medical Dialogues in 28 Emotion Labels Wan Jou She et.al. 2601.10033 translate read null
2026-01-15 Structured Personality Control and Adaptation for LLM Agents Jinpeng Wang et.al. 2601.10025 translate read null
2026-01-15 Empowering Older Adults in Digital Technology Use with Foundation Models Hasti Sharifi et.al. 2601.10018 translate read null
2026-01-15 VERHallu: Evaluating and Mitigating Event Relation Hallucination in Video Large Language Models Zefan Zhang et.al. 2601.10010 translate read null
2026-01-15 SoK: Privacy-aware LLM in Healthcare: Threat Model, Privacy Techniques, Challenges and Recommendations Mohoshin Ara Tahera et.al. 2601.10004 translate read null
2026-01-15 Towards Native Intelligence: 6G-LLM Trained with Reinforcement Learning from NDT Feedback Zhuoran Xiao et.al. 2601.09992 translate read null
2026-01-15 Context Volume Drives Performance: Tackling Domain Shift in Extremely Low-Resource Translation via RAG David Samuel Setiawan et.al. 2601.09982 translate read null
2026-01-15 DR$^2$Seg: Decomposed Two-Stage Rollouts for Efficient Reasoning Segmentation in Multimodal Large Language Models Yulin He et.al. 2601.09981 translate read null
2026-01-15 Performance of AI agents based on reasoning language models on ALD process optimization tasks Angel Yanguas-Gil et.al. 2601.09980 translate read null
2026-01-15 SPRInG: Continual LLM Personalization via Selective Parametric Adaptation and Retrieval-Interpolated Generation Seoyeon Kim et.al. 2601.09974 translate read null
2026-01-15 Chinese Labor Law Large Language Model Benchmark Zixun Lan et.al. 2601.09972 translate read null
2026-01-15 An Exploratory Study to Repurpose LLMs to a Unified Architecture for Time Series Classification Hansen He et.al. 2601.09971 translate read null
2026-01-15 Take Out Your Calculators: Estimating the Real Difficulty of Question Items with LLM Student Simulations Christabel Acquaye et.al. 2601.09953 translate read null
2026-01-14 How Diplomacy Reshapes Online Discourse: Asymmetric Persistence in Online Framing of North Korea Hunjun Shin et.al. 2601.09942 translate read null
2026-01-14 Hallucination Detection and Mitigation in Large Language Models Ahmad Pesaranghader et.al. 2601.09929 translate read null
2026-01-14 Continuum Memory Architectures for Long-Horizon LLM Agents Joe Logan et.al. 2601.09913 translate read null
2026-01-14 Self-reflection in Automated Qualitative Coding: Improving Text Annotation through Secondary LLM Critique Zackary Okun Dunivin et.al. 2601.09905 translate read null
2026-01-14 Beyond Rule-Based Workflows: An Information-Flow-Orchestrated Multi-Agents Paradigm via Agent-to-Agent Communication from CORAL Xinxing Ren et.al. 2601.09883 translate read null
2026-01-14 MedVL-SAM2: A unified 3D medical vision-language model for multimodal reasoning and prompt-driven segmentation Yang Xing et.al. 2601.09879 translate read null
2026-01-14 Beyond Strict Rules: Assessing the Effectiveness of Large Language Models for Code Smell Detection Saymon Souza et.al. 2601.09873 translate read null
2026-01-14 A Scoping Review of the Ethical Perspectives on Anthropomorphising Large Language Model-Based Conversational Agents Andrea Ferrario et.al. 2601.09869 translate read null
2026-01-14 Advancing Model Refinement: Muon-Optimized Distillation and Quantization for LLM Deployment Jacob Sander et.al. 2601.09865 translate read null
2026-01-14 OUTLINEFORGE: Hierarchical Reinforcement Learning with Explicit States for Scientific Writing Yilin Bao et.al. 2601.09858 translate read null
2026-01-14 MedRedFlag: Investigating how LLMs Redirect Misconceptions in Real-World Health Communication Sraavya Sambara et.al. 2601.09853 translate read null
2026-01-14 Strategies of cooperation and defection in five large language models Saptarshi Pal et.al. 2601.09849 translate read null
2026-01-14 Stable and Explainable Personality Trait Evaluation in Large Language Models with Internal Activations Xiaoxu Ma et.al. 2601.09833 translate read null
2026-01-14 UniHash: Unifying Pointwise and Pairwise Hashing Paradigms for Seen and Unseen Category Retrieval Xiaoxu Ma et.al. 2601.09828 translate read null
2026-01-14 LLM-Based Agentic Systems for Software Engineering: Challenges and Opportunities Yongjian Tang et.al. 2601.09822 translate read null
2026-01-14 Antisocial behavior towards large language model users: experimental evidence Paweł Niszczota et.al. 2601.09772 translate read null
2026-01-14 Explicating Tacit Regulatory Knowledge from LLMs to Auto-Formalize Requirements for Compliance Test Case Generation Zhiyi Xue et.al. 2601.09762 translate read null
2026-01-14 Investigating Tool-Memory Conflicts in Tool-Augmented LLMs Jiali Cheng et.al. 2601.09760 translate read null
2026-01-13 Synthetic Data for Veterinary EHR De-identification: Benefits, Limits, and Safety Trade-offs Under Fixed Compute David Brundage et.al. 2601.09756 translate read null
2026-01-12 SAGE: Tool-Augmented LLM Task Solving Strategies in Scalable Multi-Agent Environments Robert K. Strehlow et.al. 2601.09750 translate read null
2026-01-14 ShortCoder: Knowledge-Augmented Syntax Optimization for Token-Efficient Code Generation Sicong Liu et.al. 2601.09703 translate read null
2026-01-14 How well LLM-based test generation techniques perform with newer LLM versions? Michael Konstantinou et.al. 2601.09695 translate read null
2026-01-14 LLMs can Compress LLMs: Adaptive Pruning by Agents Sai Varun Kodathala et.al. 2601.09694 translate read null
2026-01-14 Routing with Generated Data: Annotation-Free LLM Skill Estimation and Expert Selection Tianyi Niu et.al. 2601.09692 translate read null
2026-01-14 Disentangling Task Conflicts in Multi-Task LoRA via Orthogonal Gradient Projection Ziyu Yang et.al. 2601.09684 translate read null
2026-01-14 Automating Supply Chain Disruption Monitoring via an Agentic AI Approach Sara AlMahri et.al. 2601.09680 translate read null
2026-01-14 LLMs Got Rhythm? Hybrid Phonological Filtering for Greek Poetry Rhyme Detection and Generation Stergios Chatzikyriakidis et.al. 2601.09631 translate read null
2026-01-14 From Prompt to Protocol: Fast Charging Batteries with Large Language Models Ge Lei et.al. 2601.09626 translate read null
2026-01-14 The Promptware Kill Chain: How Prompt Injections Gradually Evolved Into a Multi-Step Malware Ben Nassi et.al. 2601.09625 translate read null
2026-01-14 DPWriter: Reinforcement Learning with Diverse Planning Branching for Creative Writing Qian Cao et.al. 2601.09609 translate read null
2026-01-14 GRCF: Two-Stage Groupwise Ranking and Calibration Framework for Multimodal Sentiment Analysis Manning Gao et.al. 2601.09606 translate read null
2026-01-14 OpenVoxel: Training-Free Grouping and Captioning Voxels for Open-Vocabulary 3D Scene Understanding Sheng-Yu Huang et.al. 2601.09575 translate read null
2026-01-14 Dialogue Telemetry: Turn-Level Instrumentation for Autonomous Information Gathering Dimitris Panagopoulos et.al. 2601.09570 translate read null
2026-01-14 Hot-Start from Pixels: Low-Resolution Visual Tokens for Chinese Language Modeling Shuyang Xiang et.al. 2601.09566 translate read null
2026-01-14 Benchmarking Post-Training Quantization of Large Language Models under Microscaling Floating Point Formats Manyi Zhang et.al. 2601.09555 translate read null
2026-01-14 Omni-R1: Towards the Unified Generative Paradigm for Multimodal Reasoning Dongjie Cheng et.al. 2601.09536 translate read link
2026-01-14 MVSS: A Unified Framework for Multi-View Structured Survey Generation Yinqi Liu et.al. 2601.09504 translate read null
2026-01-14 What Do LLM Agents Know About Their World? Task2Quiz: A Paradigm for Studying Environment Understanding Siyuan Liu et.al. 2601.09503 translate read null
2026-01-14 SlidesGen-Bench: Evaluating Slides Generation via Computational and Quantitative Metrics Yunqiao Yang et.al. 2601.09487 translate read link
2026-01-14 Bridging Semantic Understanding and Popularity Bias with LLMs Renqiang Luo et.al. 2601.09478 translate read null
2026-01-14 SimMerge: Learning to Select Merge Operators from Similarity Signals Oliver Bolton et.al. 2601.09473 translate read null
2026-01-14 Personalized Multimodal Feedback Using Multiple External Representations: Strategy Profiles and Learning in High School Physics Natalia Revenga-Lozano et.al. 2601.09470 translate read null
2026-01-14 Dissecting Judicial Reasoning in U.S. Copyright Damage Awards Pei-Chi Lo et.al. 2601.09459 translate read null
2026-01-14 Population-Aligned Audio Reproduction With LLM-Based Equalizers Ioannis Stylianou et.al. 2601.09448 translate read null
2026-01-14 Improving Symbolic Translation of Language Models for Logical Reasoning Ramya Keerthy Thatikonda et.al. 2601.09446 translate read null
2026-01-14 SC-MAS: Constructing Cost-Efficient Multi-Agent Systems with Edge-Level Heterogeneous Collaboration Di Zhao et.al. 2601.09434 translate read null
2026-01-14 Video-MSR: Benchmarking Multi-hop Spatial Reasoning Capabilities of MLLMs Rui Zhu et.al. 2601.09430 translate read null
2026-01-14 TiInsight: A SQL-based Automated Exploratory Data Analysis System through Large Language Models Jun-Peng Zhu et.al. 2601.09404 translate read null
2026-01-14 Structured Knowledge Representation through Contextual Pages for Retrieval-Augmented Generation Xinze Li et.al. 2601.09402 translate read null
2026-01-14 Ability Transfer and Recovery via Modularized Parameters Localization Songyao Jin et.al. 2601.09398 translate read null
2026-01-14 SLAM-LLM: A Modular, Open-Source Multimodal Large Language Model Framework and Best Practice for Speech, Language, Audio and Music Processing Ziyang Ma et.al. 2601.09385 translate read null
2026-01-14 Long-term Task-oriented Agent: Proactive Long-term Intent Maintenance in Dynamic Environments Qinglong Shi et.al. 2601.09382 translate read null
2026-01-14 The Imperfective Paradox in Large Language Models Bolei Ma et.al. 2601.09373 translate read null
2026-01-14 Relation Extraction Capabilities of LLMs on Clinical Text: A Bilingual Evaluation for English and Turkish Aidana Aidynkyzy et.al. 2601.09367 translate read null
2026-01-14 See More, Store Less: Memory-Efficient Resolution for Video Moment Retrieval Mingyu Jeon et.al. 2601.09350 translate read null
2026-01-14 SpatialJB: How Text Distribution Art Becomes the “Jailbreak Key” for LLM Guardrails Zhiyi Mou et.al. 2601.09321 translate read null
2026-01-14 On-Device Large Language Models for Sequential Recommendation Xin Xia et.al. 2601.09306 translate read null
2026-01-14 Multi-Modal LLM based Image Captioning in ICT: Bridging the Gap Between General and Industry Domain Lianying Chao et.al. 2601.09298 translate read null
2026-01-14 MACRO-LLM: LLM-Empowered Multi-Agent Collaborative Reasoning under Spatiotemporal Partial Observability Handi Chen et.al. 2601.09295 translate read null
2026-01-14 Enhancing Spatial Reasoning in Large Language Models for Metal-Organic Frameworks Structure Prediction Mianzhi Pan et.al. 2601.09285 translate read null
2026-01-14 Cluster Workload Allocation: Semantic Soft Affinity Using Natural Language Processing Leszek Sliwko et.al. 2601.09282 translate read null
2026-01-14 STaR: Sensitive Trajectory Regulation for Unlearning in Large Reasoning Models Jingjing Zhou et.al. 2601.09281 translate read null
2026-01-14 ReGraM: Region-First Knowledge Graph Reasoning for Medical Question Answering Chaerin Lee et.al. 2601.09280 translate read null
2026-01-14 MCGA: A Multi-task Classical Chinese Literary Genre Audio Corpus Yexing Du et.al. 2601.09270 translate read link
2026-01-14 RISER: Orchestrating Latent Reasoning Skills for Adaptive Activation Steering Wencheng Ye et.al. 2601.09269 translate read link
2026-01-14 Coordinated Pandemic Control with Large Language Model Agents as Policymaking Assistants Ziyi Shi et.al. 2601.09264 translate read null
2026-01-14 Efficient Paths and Dense Rewards: Probabilistic Flow Reasoning for Large Language Models Yan Liu et.al. 2601.09260 translate read null
2026-01-14 MAXS: Meta-Adaptive Exploration with LLM Agents Jian Zhang et.al. 2601.09259 translate read link
2026-01-14 When to Invoke: Refining LLM Fairness with Toxicity Assessment Jing Ren et.al. 2601.09250 translate read null
2026-01-14 When to Trust: A Causality-Aware Calibration Framework for Accurate Knowledge Graph Retrieval-Augmented Generation Jing Ren et.al. 2601.09241 translate read null
2026-01-14 DSA-Tokenizer: Disentangled Semantic-Acoustic Tokenization via Flow Matching-based Hierarchical Fusion Hanlin Zhang et.al. 2601.09239 translate read null
2026-01-14 Mikasa: A Character-Driven Emotional AI Companion Inspired by Japanese Oshi Culture Miki Ueno et.al. 2601.09208 translate read null
2026-01-14 ProFit: Leveraging High-Value Signals in SFT via Probability-Guided Token Selection Tao Liu et.al. 2601.09195 translate read link
2026-01-14 OrthoGeoLoRA: Geometric Parameter-Efficient Fine-Tuning for Structured Social Science Concept Retrieval on the Web Zeqiang Wang et.al. 2601.09185 translate read null
2026-01-14 $D^2$Prune: Sparsifying Large Language Models via Dual Taylor Expansion and Attention Distribution Awareness Lang Xiong et.al. 2601.09176 translate read null
2026-01-14 BalDRO: A Distributionally Robust Optimization based Framework for Large Language Model Unlearning Pengyang Shao et.al. 2601.09172 translate read null
2026-01-14 LLMs Meet Isolation Kernel: Lightweight, Learning-free Binary Embeddings for Fast Retrieval Zhibo Zhang et.al. 2601.09159 translate read null
2026-01-14 PrivacyReasoner: Can LLM Emulate a Human-like Privacy Mind? Yiwen Tu et.al. 2601.09152 translate read null
2026-01-14 Interpretable Probability Estimation with LLMs via Shapley Reconstruction Yang Nan et.al. 2601.09151 translate read null
2026-01-14 World Craft: Agentic Framework to Create Visualizable Worlds via Text Jianwen Sun et.al. 2601.09150 translate read null
2026-01-14 Identity-Robust Language Model Generation via Content Integrity Preservation Miao Zhang et.al. 2601.09141 translate read null
2026-01-14 KryptoPilot: An Open-World Knowledge-Augmented LLM Agent for Automated Cryptographic Exploitation Xiaonan Liu et.al. 2601.09129 translate read null
2026-01-14 Contrastive Bi-Encoder Models for Multi-Label Skill Extraction: Enhancing ESCO Ontology Matching with BERT and Attention Mechanisms Yongming Sun et.al. 2601.09119 translate read null
2026-01-14 The AI Hippocampus: How Far are We From Human Memory? Zixia Jia et.al. 2601.09113 translate read null
2026-01-14 Seeking Human Security Consensus: A Unified Value Scale for Generative AI Value Safety Ying He et.al. 2601.09112 translate read null
2026-01-14 DScheLLM: Enabling Dynamic Scheduling through a Fine-Tuned Dual-System Large language Model Lixiang Zhang et.al. 2601.09100 translate read null
2026-01-14 Programming over Thinking: Efficient and Robust Multi-Constraint Planning Derrick Goh Xin Deik et.al. 2601.09097 translate read null
2026-01-14 Hidden States as Early Signals: Step-level Trace Evaluation and Pruning for Efficient Test-Time Scaling Zhixiang Liang et.al. 2601.09093 translate read null
2026-01-14 SubTokenTest: A Practical Benchmark for Real-World Sub-token Understanding Shuyang Hou et.al. 2601.09089 translate read null
2026-01-14 From Symbolic to Natural-Language Relations: Rethinking Knowledge Graph Construction in the Era of Large Language Models Kanyao Han et.al. 2601.09069 translate read null
2026-01-14 Mi:dm 2.0 Korea-centric Bilingual Language Models Donghoon Shin et.al. 2601.09066 translate read null
2026-01-14 Efficient Multilingual Dialogue Processing via Translation Pipelines and Distilled Language Models Santiago Martínez Novoa et.al. 2601.09059 translate read null
2026-01-14 Evaluating local large language models for structured extraction from endometriosis-specific transvaginal ultrasound reports Haiyi Li et.al. 2601.09053 translate read null
2026-01-14 Is Grokking Worthwhile? Functional Analysis and Transferability of Generalization Circuits in Transformers Kaiyu He et.al. 2601.09049 translate read null
2026-01-14 Horseshoe Mixtures-of-Experts (HS-MoE) Nick Polson et.al. 2601.09043 translate read null
2026-01-14 Can LLMs interpret figurative language as humans do?: surface-level vs representational similarity Samhita Bollepally et.al. 2601.09041 translate read null
2026-01-14 An Information-Theoretic Perspective on LLM Tokenizers Mete Erdogan et.al. 2601.09039 translate read null
2026-01-14 SpectraQuery: A Hybrid Retrieval-Augmented Conversational Assistant for Battery Science Sreya Vangara et.al. 2601.09036 translate read null
2026-01-14 A Decompilation-Driven Framework for Malware Detection with Large Language Models Aniesh Chawla et.al. 2601.09035 translate read null
2026-01-13 The Hierarchy of Agentic Capabilities: Evaluating Frontier Models on Realistic RL Environments Logan Ritchie et.al. 2601.09032 translate read null
2026-01-13 Proactively Detecting Threats: A Novel Approach Using LLMs Aniesh Chawla et.al. 2601.09029 translate read null
2026-01-13 OpenDecoder: Open Large Language Model Decoding to Incorporate Document Quality in RAG Fengran Mo et.al. 2601.09028 translate read null
2026-01-13 Agentic AI and Machine Learning for Accelerated Materials Discovery and Applications Jihua Chen et.al. 2601.09027 translate read null
2026-01-13 Multicultural Spyfall: Assessing LLMs through Dynamic Multilingual Social Deduction Game Haryo Akbarianto Wibowo et.al. 2601.09017 translate read null
2026-01-13 Universal Dynamics of Warmup Stable Decay: understanding WSD beyond Transformers Annalisa Belloni et.al. 2601.09000 translate read null
2026-01-13 Optimising for Energy Efficiency and Performance in Machine Learning Emile Dos Santos Ferreira et.al. 2601.08991 translate read null
2026-01-13 ART: Action-based Reasoning Task Benchmarking for Medical AI Agents Ananya Mantravadi et.al. 2601.08988 translate read null
2026-01-13 Integrating APK Image and Text Data for Enhanced Threat Detection: A Multimodal Deep Learning Approach to Android Malware Md Mashrur Arifin et.al. 2601.08959 translate read null
2026-01-13 Fine Grained Evaluation of LLMs-as-Judges Sourav Saha et.al. 2601.08919 translate read null
2026-01-13 Spectral Generative Flow Models: A Physics-Inspired Replacement for Vectorized Large Language Models Andrew Kiruluta et.al. 2601.08893 translate read null
2026-01-13 Evaluating Role-Consistency in LLMs for Counselor Training Eric Rudolph et.al. 2601.08892 translate read null
2026-01-12 Bridging the Gap: Empowering Small Models in Reliable OpenACC-based Parallelization via GEPA-Optimized Prompting Samyak Jhaveri et.al. 2601.08884 translate read null
2026-01-13 Modeling LLM Agent Reviewer Dynamics in Elo-Ranked Review System Hsiang-Wei Huang et.al. 2601.08829 translate read null
2026-01-13 Reasoning Matters for 3D Visual Grounding Hsiang-Wei Huang et.al. 2601.08811 translate read null
2026-01-13 Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge Yao Tang et.al. 2601.08808 translate read link
2026-01-13 MixServe: An Automatic Distributed Serving System for MoE Models with Hybrid Parallelism Based on Fused Communication Algorithm Bowen Zhou et.al. 2601.08800 translate read null
2026-01-13 Uncovering Political Bias in Large Language Models using Parliamentary Voting Records Jieying Chen et.al. 2601.08785 translate read null
2026-01-13 Asymptotic Universal Alignment: A New Alignment Framework via Test-Time Scaling Yang Cai et.al. 2601.08777 translate read null
2026-01-13 Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs Zhiyuan Hu et.al. 2601.08763 translate read null
2026-01-13 M3CoTBench: Benchmark Chain-of-Thought of MLLMs in Medical Image Understanding Juntao Jiang et.al. 2601.08758 translate read null
2026-01-13 Inferring Latent Intentions: Attributional Natural Language Inference in LLM Agents Xin Quan et.al. 2601.08742 translate read null
2026-01-13 From Rows to Reasoning: A Retrieval-Augmented Multimodal Framework for Spreadsheet Understanding Anmol Gulati et.al. 2601.08741 translate read null
2026-01-13 PrivGemo: Privacy-Preserving Dual-Tower Graph Retrieval for Empowering LLM Reasoning with Memory Augmentation Xingyu Tan et.al. 2601.08739 translate read null
2026-01-13 TerraFormer: Automated Infrastructure-as-Code with LLMs Fine-Tuned via Policy-Guided Verifier Feedback Prithwish Jana et.al. 2601.08734 translate read null
2026-01-13 RAGShaper: Eliciting Sophisticated Agentic RAG Skills via Automated Data Synthesis Zhengwei Tao et.al. 2601.08699 translate read null
2026-01-13 Nationality and Region Prediction from Names: A Comparative Study of Neural Models and Large Language Models Keito Inoshita et.al. 2601.08692 translate read null
2026-01-13 LLMs in Code Vulnerability Analysis: A Proof of Concept Shaznin Sultana et.al. 2601.08691 translate read null
2026-01-13 All Required, In Order: Phase-Level Evaluation for AI-Human Dialogue in Healthcare and Beyond Shubham Kulkarni et.al. 2601.08690 translate read null
2026-01-13 QuantEval: A Benchmark for Financial Quantitative Tasks in Large Language Models Zhaolu Kang et.al. 2601.08689 translate read null
2026-01-13 Advancing ESG Intelligence: An Expert-level Agent and Comprehensive Benchmark for Sustainable Finance Yilei Zhao et.al. 2601.08676 translate read null
2026-01-13 Why AI Alignment Failure Is Structural: Learned Human Interaction Structures and AGI as an Endogenous Evolutionary Shock Didier Sornette et.al. 2601.08673 translate read null
2026-01-13 Analyzing Bias in False Refusal Behavior of Large Language Models for Hate Speech Detoxification Kyuri Im et.al. 2601.08668 translate read null
2026-01-13 Prism: Towards Lowering User Cognitive Load in LLMs via Complex Intent Understanding Zenghua Liao et.al. 2601.08653 translate read null
2026-01-13 Resisting Manipulative Bots in Memecoin Copy Trading: A Multi-Agent Approach with Chain-of-Thought Reasoning Yichen Luo et.al. 2601.08641 translate read null
2026-01-13 Moral Lenses, Political Coordinates: Towards Ideological Positioning of Morally Conditioned LLMs Chenchen Yuan et.al. 2601.08634 translate read null
2026-01-13 How Order-Sensitive Are LLMs? OrderProbe for Deterministic Structural Reconstruction Yingjie He et.al. 2601.08626 translate read null
2026-01-13 Efficient Maintenance of Leiden Communities in Large Dynamic Graphs Chunxu Lin et.al. 2601.08554 translate read null
2026-01-13 Learner-Tailored Program Repair: A Solution Generator with Iterative Edit-Driven Retrieval Enhancement Zhenlong Dai et.al. 2601.08545 translate read null
2026-01-13 Reducing Compute Waste in LLMs through Kernel-Level DVFS Jeffrey Spaan et.al. 2601.08539 translate read null
2026-01-13 Your Group-Relative Advantage Is Biased Fengkai Yang et.al. 2601.08521 translate read null
2026-01-13 Closed-Loop LLM Discovery of Non-Standard Channel Priors in Vision Models Tolgay Atinc Uzun et.al. 2601.08517 translate read null
2026-01-13 Robust CAPTCHA Using Audio Illusions in the Era of Large Language Models: from Evaluation to Advances Ziqi Ding et.al. 2601.08516 translate read null
2026-01-13 What If TSF: A Benchmark for Reframing Forecasting as Scenario-Guided Multimodal Forecasting Jinkwan Jang et.al. 2601.08509 translate read null
2026-01-13 It’s All About the Confidence: An Unsupervised Approach for Multilingual Historical Entity Linking using Large Language Models Cristian Santini et.al. 2601.08500 translate read null
2026-01-13 BenchOverflow: Measuring Overflow in Large Language Models via Plain-Text Prompts Erin Feiglin et.al. 2601.08490 translate read null
2026-01-13 SUMMPILOT: Bridging Efficiency and Customization for Interactive Summarization System JungMin Yun et.al. 2601.08475 translate read null
2026-01-13 sui-1: Grounded and Verifiable Long-Form Summarization Benedikt Droste et.al. 2601.08472 translate read null
2026-01-13 JudgeRLVR: Judge First, Generate Second for Efficient Reasoning Jiangshan Duo et.al. 2601.08468 translate read null
2026-01-13 M3-BENCH: Process-Aware Evaluation of LLM Agents Social Behaviors in Mixed-Motive Games Sixiong Xie et.al. 2601.08462 translate read null
2026-01-13 Beyond Linearization: Attributed Table Graphs for Table Reasoning Yuxiang Wang et.al. 2601.08444 translate read null
2026-01-13 YaPO: Learnable Sparse Activation Steering Vectors for Domain Adaptation Abdelaziz Bounhar et.al. 2601.08441 translate read null
2026-01-13 Incentivizing Cardiologist-Like Reasoning in MLLMs for Interpretable Echocardiographic Diagnosis Yi Qin et.al. 2601.08440 translate read null
2026-01-13 Fine-Mem: Fine-Grained Feedback Alignment for Long-Horizon Memory Management Weitao Ma et.al. 2601.08435 translate read null
2026-01-13 Silence the Judge: Reinforcement Learning with Self-Verifier via Latent Geometric Clustering Nonghai Zhang et.al. 2601.08427 translate read null
2026-01-13 Taxon: Hierarchical Tax Code Prediction with Semantically Aligned LLM Expert Guidance Jihang Li et.al. 2601.08418 translate read null
2026-01-13 Regulatory gray areas of LLM Terms Brittany I. Davidson et.al. 2601.08415 translate read null
2026-01-13 Hybrid Distillation with CoT Guidance for Edge-Drone Control Code Generation Yizhan Feng et.al. 2601.08412 translate read null
2026-01-13 Large Language Models to Enhance Multi-task Drone Operations in Simulated Environments Yizhan Feng et.al. 2601.08405 translate read null
2026-01-13 Owen-Shapley Policy Optimization (OSPO): A Principled RL Algorithm for Generative Search LLMs Abhijnan Nath et.al. 2601.08403 translate read null
2026-01-13 PATS: Personality-Aware Teaching Strategies with Large Language Model Tutors Donya Rooein et.al. 2601.08402 translate read null
2026-01-13 CLaS-Bench: A Cross-Lingual Alignment and Steering Benchmark Daniil Gurgurov et.al. 2601.08331 translate read null
2026-01-13 Deep Exploration of Epoch-wise Double Descent in Noisy Data: Signal Separation, Large Activation, and Benign Overfitting Tomoki Kubo et.al. 2601.08316 translate read null
2026-01-13 Enhancing Image Quality Assessment Ability of LMMs via Retrieval-Augmented Generation Kang Fu et.al. 2601.08311 translate read null
2026-01-13 Enhancing Sentiment Classification and Irony Detection in Large Language Models through Advanced Prompt Engineering Techniques Marvin Schmitt et.al. 2601.08302 translate read null
2026-01-13 Demystifying the Slash Pattern in Attention: The Role of RoPE Yuan Cheng et.al. 2601.08297 translate read null
2026-01-13 KidVis: Do Multimodal Large Language Models Possess the Visual Perceptual Capabilities of a 6-Year-Old? Xianfeng Wang et.al. 2601.08292 translate read null
2026-01-13 OpenMic: A Multi-Agent-Based Stand-Up Comedy Generation System Yuyang Wu et.al. 2601.08288 translate read null
2026-01-13 AgriLens: Semantic Retrieval in Agricultural Texts Using Topic Modeling and Language Models Heba Shakeel et.al. 2601.08283 translate read null
2026-01-13 Discovery and Reinforcement of Tool-Integrated Reasoning Chains via Rollout Trees Kun Li et.al. 2601.08274 translate read null
2026-01-13 HIPPO: Accelerating Video Large Language Models Inference via Holistic-aware Parallel Speculative Decoding Qitan Lv et.al. 2601.08273 translate read null
2026-01-13 Med-CoReasoner: Reducing Language Disparities in Medical Reasoning via Language-Informed Co-Reasoning Fan Gao et.al. 2601.08267 translate read null
2026-01-13 Unleashing Tool Engineering and Intelligence for Agentic AI in Next-Generation Communication Networks Yinqiu Liu et.al. 2601.08259 translate read null
2026-01-13 Large Artificial Intelligence Model Guided Deep Reinforcement Learning for Resource Allocation in Non Terrestrial Networks Abdikarim Mohamed Ibrahim et.al. 2601.08254 translate read null
2026-01-13 Improving Zero-shot ADL Recognition with Large Language Models through Event-based Context and Confidence Michele Fiori et.al. 2601.08241 translate read null
2026-01-13 The End of Reward Engineering: How LLMs Are Redefining Multi-Agent Coordination Haoran Su et.al. 2601.08237 translate read null
2026-01-13 DNF: Dual-Layer Nested Fingerprinting for Large Language Model Intellectual Property Protection Zhenhua Xu et.al. 2601.08223 translate read null
2026-01-13 Generation-Augmented Generation: A Plug-and-Play Framework for Private Knowledge Injection in Large Language Models Rongji Li et.al. 2601.08209 translate read null
2026-01-13 Triplets Better Than Pairs: Towards Stable and Effective Self-Play Fine-Tuning for LLMs Yibo Wang et.al. 2601.08198 translate read null
2026-01-13 Evaluating Implicit Regulatory Compliance in LLM Tool Invocation via Logic-Guided Synthesis Da Song et.al. 2601.08196 translate read null
2026-01-13 Improving LLM Reasoning with Homophily-aware Structural and Semantic Text-Attributed Graph Compression Zijun Di et.al. 2601.08187 translate read null
2026-01-13 GI-Bench: A Panoramic Benchmark Revealing the Knowledge-Experience Dissociation of Multimodal Large Language Models in Gastrointestinal Endoscopy Against Clinical Standards Yan Zhu et.al. 2601.08183 translate read null
2026-01-13 Prompt-Based Clarity Evaluation and Topic Detection in Political Question Answering Lavanya Prahallad et.al. 2601.08176 translate read null
2026-01-13 The Agent’s First Day: Benchmarking Learning, Exploration, and Scheduling in the Workplace Scenarios Daocheng Fu et.al. 2601.08173 translate read null
2026-01-13 Relational Knowledge Distillation Using Fine-tuned Function Vectors Andrea Kang et.al. 2601.08169 translate read null
2026-01-13 WISE-Flow: Workflow-Induced Structured Experience for Self-Evolving Conversational Service Agents Yuqing Zhou et.al. 2601.08158 translate read null
2026-01-13 Where Does Vision Meet Language? Understanding and Refining Visual Fusion in MLLMs via Contrastive Attention Shezheng Song et.al. 2601.08151 translate read null
2026-01-13 Enriching Semantic Profiles into Knowledge Graph for Recommender Systems Using Large Language Models Seokho Ahn et.al. 2601.08148 translate read null
2026-01-13 Qalb: Largest State-of-the-Art Urdu Large Language Model for 230M Speakers with Systematic Continued Pre-training Muhammad Taimoor Hassan et.al. 2601.08141 translate read null
2026-01-13 MirrorBench: An Extensible Framework to Evaluate User-Proxy Agents for Human-Likeness Ashutosh Hathidara et.al. 2601.08118 translate read null
2026-01-13 Coordinated Cooling and Compute Management for AI Datacenters Nardos Belay Abera et.al. 2601.08113 translate read null
2026-01-13 Debiasing Large Language Models via Adaptive Causal Prompting with Sketch-of-Thought Bowen Li et.al. 2601.08108 translate read null
2026-01-13 STO-RL: Offline RL under Sparse Rewards via LLM-Guided Subgoal Temporal Order Chengyang Gu et.al. 2601.08107 translate read null
2026-01-13 AdaJudge: Adaptive Multi-Perspective Judging for Reward Modeling Yongliang Miao et.al. 2601.08097 translate read null
2026-01-13 Q-realign: Piggybacking Realignment on Quantization for Safe and Efficient LLM Deployment Qitao Tan et.al. 2601.08089 translate read null
2026-01-12 MemoBrain: Executive Memory as an Agentic Brain for Reasoning Hongjin Qian et.al. 2601.08079 translate read null
2026-01-12 Semantic Gravity Wells: Why Negative Constraints Backfire Shailesh Rana et.al. 2601.08070 translate read null
2026-01-12 Calibration Is Not Enough: Evaluating Confidence Estimation Under Language Variations Yuxi Xia et.al. 2601.08064 translate read null
2026-01-12 Reasoning Beyond Chain-of-Thought: A Latent Computational Mode in Large Language Models Zhenghao He et.al. 2601.08058 translate read null
2026-01-12 Cognitive Biases in LLM-Assisted Software Development Xinyi Zhou et.al. 2601.08045 translate read null
2026-01-12 Towards Verifiably Safe Tool Use for LLM Agents Aarya Doshi et.al. 2601.08012 translate read null
2026-01-12 LLM Review: Enhancing Creative Writing via Blind Peer Review Feedback Weiyue Li et.al. 2601.08003 translate read null
2026-01-12 Reasoning over Precedents Alongside Statutes: Case-Augmented Deliberative Alignment for LLM Safety Can Jin et.al. 2601.08000 translate read null
2026-01-12 Is Sentiment Banana-Shaped? Exploring the Geometry and Portability of Sentiment Concept Vectors Laurits Lyngbaek et.al. 2601.07995 translate read null
2026-01-12 DYCP: Dynamic Context Pruning for Long-Form Dialogue with LLMs Nayoung Choi et.al. 2601.07994 translate read null
2026-01-12 Fake Date Tests: Can We Trust In-sample Accuracy of LLMs in Macroeconomic Forecasting? Alexander Eliseev et.al. 2601.07992 translate read null
2026-01-12 Multilingual, Multimodal Pipeline for Creating Authentic and Structured Fact-Checked Claim Dataset Z. Melce Hüsünbeyi et.al. 2601.07985 translate read null
2026-01-12 Cost and accuracy of long-term memory in Distributed Multi-Agent Systems based on Large Language Models Benedict Wolff et.al. 2601.07978 translate read null
2026-01-12 Explaining Generalization of AI-Generated Text Detectors Through Linguistic Analysis Yuxi Xia et.al. 2601.07974 translate read null
2026-01-12 Knowing But Not Doing: Convergent Morality and Divergent Action in LLMs Jen-tse Huang et.al. 2601.07972 translate read null
2026-01-12 A Human-Centric Pipeline for Aligning Large Language Models with Chinese Medical Ethics Haoan Jin et.al. 2601.07954 translate read null
2026-01-12 SECite: Analyzing and Summarizing Citations in Software Engineering Literature Shireesh Reddy Pyreddy et.al. 2601.07939 translate read null
2026-01-12 Towards Specialized Generalists: A Multi-Task MoE-LoRA Framework for Domain-Specific LLM Adaptation Yuxin Yang et.al. 2601.07935 translate read null
2026-01-12 Enhancing Large Language Models for Time-Series Forecasting via Vector-Injected In-Context Learning Jianqi Zhang et.al. 2601.07903 translate read null
2026-01-12 SecureCAI: Injection-Resilient LLM Assistants for Cybersecurity Operations Mohammed Himayath Ali et.al. 2601.07835 translate read null
2026-01-12 The Confidence Trap: Gender Bias and Predictive Certainty in LLMs Ahmed Sabir et.al. 2601.07806 translate read null
2026-01-12 Learning Through Dialogue: Unpacking the Dynamics of Human-LLM Conversations on Political Issues Shaz Furniturewala et.al. 2601.07796 translate read null
2026-01-12 Kinship Data Benchmark for Multi-hop Reasoning Tianda Sun et.al. 2601.07794 translate read null
2026-01-12 “TODO: Fix the Mess Gemini Created”: Towards Understanding GenAI-Induced Self-Admitted Technical Debt Abdullah Al Mujahid et.al. 2601.07786 translate read null
2026-01-12 Enhancing Self-Correction in Large Language Models through Multi-Perspective Reflection Mariana Costa et.al. 2601.07780 translate read null
2026-01-12 Are LLM Decisions Faithful to Verbal Confidence? Jiawei Wang et.al. 2601.07767 translate read null
2026-01-12 Structure First, Reason Next: Enhancing a Large Language Model using Knowledge Graph for Numerical Reasoning in Financial Documents Aryan Mishra et.al. 2601.07754 translate read null
2026-01-12 Evaluating the encoding competence of visual language models using uncommon actions Chen Ling et.al. 2601.07737 translate read null
2026-01-12 Is Agentic RAG worth it? An experimental comparison of RAG approaches Pietro Ferrazzi et.al. 2601.07711 translate read null
2026-01-12 Exploring the Meta-level Reasoning of Large Language Models via a Tool-based Multi-hop Tabular Question Answering Task Nick Ferguson et.al. 2601.07696 translate read null
2026-01-12 Adaptive Layer Selection for Layer-Wise Token Pruning in LLM Inference Rei Taniguchi et.al. 2601.07667 translate read null
2026-01-12 Towards Automating Blockchain Consensus Verification with IsabeLLM Elliot Jones et.al. 2601.07654 translate read null
2026-01-12 PlaM: Training-Free Plateau-Guided Model Merging for Better Visual Grounding in MLLMs Zijing Wang et.al. 2601.07645 translate read null
2026-01-12 GeoMotionGPT: Geometry-Aligned Motion Understanding with Large Language Models Zhankai Ye et.al. 2601.07632 translate read null
2026-01-12 Proof of Time: A Benchmark for Evaluating Scientific Idea Judgments Bingyang Ye et.al. 2601.07606 translate read null
2026-01-12 OODEval: Evaluating Large Language Models on Object-Oriented Design Bingxu Xiao et.al. 2601.07602 translate read null
2026-01-12 GRPO with State Mutations: Improving LLM-Based Hardware Test Plan Generation Dimple Vijay Kochar et.al. 2601.07593 translate read null
2026-01-12 Large Language Models for Physics Instrument Design Sara Zoccheddu et.al. 2601.07580 translate read null
2026-01-12 Beyond Entangled Planning: Task-Decoupled Planning for Long-Horizon Agents Yunfan Li et.al. 2601.07577 translate read null
2026-01-12 d3LLM: Ultra-Fast Diffusion LLM using Pseudo-Trajectory Distillation Yu-Yang Qian et.al. 2601.07568 translate read link
2026-01-12 A Unified Framework for Emotion Recognition and Sentiment Analysis via Expert-Guided Multimodal Fusion with Large Language Models Jiaqi Qiao et.al. 2601.07565 translate read null
2026-01-05 Heterogeneous Low-Bandwidth Pre-Training of LLMs Yazan Obeidi et.al. 2601.02360 translate read null
2026-01-05 Robust Persona-Aware Toxicity Detection with Prompt Optimization and Learned Ensembling Berk Atil et.al. 2601.02337 translate read null
2026-01-05 Estimating Text Temperature Nikolay Mikhaylovskiy et.al. 2601.02320 translate read null
2026-01-05 Project Ariadne: A Structural Causal Framework for Auditing Faithfulness in LLM Agents Sourena Khanzadeh et.al. 2601.02314 translate read null
2026-01-05 Placement Semantics for Distributed Deep Learning: A Systematic Framework for Analyzing Parallelism Strategies Deep Pankajbhai Mehta et.al. 2601.02311 translate read null
2026-01-05 Power-of-Two Quantization-Aware-Training (PoT-QAT) in Large Language Models (LLMs) Mahmoud Elgenedy et.al. 2601.02298 translate read null
2026-01-05 CD4LM: Consistency Distillation and aDaptive Decoding for Diffusion Language Models Yihao Liang et.al. 2601.02236 translate read null
2026-01-05 ELLA: Efficient Lifelong Learning for Adapters in Large Language Models Shristi Das Biswas et.al. 2601.02232 translate read null
2026-01-05 From XAI to Stories: A Factorial Study of LLM-Generated Explanation Quality Fabian Lukassen et.al. 2601.02224 translate read null
2026-01-05 CORE: Code-based Inverse Self-Training Framework with Graph Expansion for Virtual Agents Keyu Wang et.al. 2601.02201 translate read null
2026-01-05 Toward Global Large Language Models in Medicine Rui Yang et.al. 2601.02186 translate read null
2026-01-05 Confidence Estimation for LLMs in Multi-turn Interactions Caiqi Zhang et.al. 2601.02179 translate read null
2026-01-05 Streaming Hallucination Detection in Long Chain-of-Thought Reasoning Haolang Lu et.al. 2601.02170 translate read null
2026-01-05 EverMemOS: A Self-Organizing Memory Operating System for Structured Long-Horizon Reasoning Chuanrui Hu et.al. 2601.02163 translate read null
2026-01-05 Routing by Analogy: kNN-Augmented Expert Assignment for Mixture-of-Experts Boxuan Lyu et.al. 2601.02144 translate read null
2026-01-05 Towards Multi-Level Transcript Segmentation: LoRA Fine-Tuning for Table-of-Contents Generation Steffen Freisinger et.al. 2601.02128 translate read null
2026-01-05 DeCode: Decoupling Content and Delivery for Medical QA Po-Jen Ko et.al. 2601.02123 translate read null
2026-01-05 Genie Sim 3.0: A High-Fidelity Comprehensive Simulation Platform for Humanoid Robot Chenghao Yin et.al. 2601.02078 translate read null
2026-01-05 Deferred Commitment Decoding for Diffusion Language Models with Confidence-Aware Sliding Windows Yingte Shu et.al. 2601.02076 translate read null
2026-01-05 MDAgent2: Large Language Model for Code Generation and Knowledge Q&A in Molecular Dynamics Zhuofan Shi et.al. 2601.02075 translate read null
2026-01-05 FormuLLA: A Large Language Model Approach to Generating Novel 3D Printable Formulations Adeshola Okubena et.al. 2601.02071 translate read null
2026-01-05 Cost-Efficient Cross-Lingual Retrieval-Augmented Generation for Low-Resource Languages: A Case Study in Bengali Agricultural Advisory Md. Asif Hossain et.al. 2601.02065 translate read null
2026-01-05 Perish or Flourish? A Holistic Evaluation of Large Language Models for Code Generation in Functional Programming Nguyet-Anh H. Lang et.al. 2601.02060 translate read null
2026-01-05 Output Embedding Centering for Stable LLM Pretraining Felix Stollenwerk et.al. 2601.02031 translate read null
2026-01-05 Not All Needles Are Found: How Fact Distribution and Don’t Make It Up Prompts Shape Literal Extraction, Logical Inference, and Hallucination Risks in Long-Context LLMs Amirali Ebrahimzadeh et.al. 2601.02023 translate read null
2026-01-05 AgentVNE: LLM-Augmented Graph Reinforcement Learning for Affinity-Aware Multi-Agent Placement in Edge Agentic AI Runze Zheng et.al. 2601.02021 translate read null
2026-01-05 Exploring Approaches for Detecting Memorization of Recommender System Data in Large Language Models Antonio Colacicco et.al. 2601.02002 translate read null
2026-01-05 MindChat: A Privacy-preserving Large Language Model for Mental Health Support Dong Xue et.al. 2601.01993 translate read null
2026-01-05 ChaosBench-Logic: A Benchmark for Logical and Symbolic Reasoning on Chaotic Dynamical Systems Noel Thomas et.al. 2601.01982 translate read null
2026-01-05 Reporting LLM Prompting in Automated Software Engineering: A Guideline Based on Current Practices and Expectations Alexander Korn et.al. 2601.01954 translate read null
2026-01-05 MacVQA: Adaptive Memory Allocation and Global Noise Filtering for Continual Visual Question Answering Zhifei Li et.al. 2601.01926 translate read null
2026-01-05 AR-MOT: Autoregressive Multi-object Tracking Lianjie Jia et.al. 2601.01925 translate read null
2026-01-05 TalkPhoto: A Versatile Training-Free Conversational Assistant for Intelligent Image Editing Yujie Hu et.al. 2601.01915 translate read null
2026-01-05 MMP-A*: Multimodal Perception Enhanced Incremental Heuristic Search on Path Planning Minh Hieu Ha et.al. 2601.01910 translate read null
2026-01-05 Tackling the Inherent Difficulty of Noise Filtering in RAG Jingyu Liu et.al. 2601.01896 translate read null
2026-01-05 Agentic AI in Remote Sensing: Foundations, Taxonomy, and Emerging Systems Niloufar Alipour Talemi et.al. 2601.01891 translate read null
2026-01-05 Safety at One Shot: Patching Fine-Tuned LLMs with A Single Instance Jiawen Zhang et.al. 2601.01887 translate read null
2026-01-05 Agentic Memory: Learning Unified Long-Term and Short-Term Memory Management for Large Language Model Agents Yi Yu et.al. 2601.01885 translate read null
2026-01-05 Theory Trace Card: Theory-Driven Socio-Cognitive Evaluation of LLMs Farzan Karimi-Malekabadi et.al. 2601.01878 translate read null
2026-01-05 Toward Auditable Neuro-Symbolic Reasoning in Pathology: SQL as an Explicit Trace of Evidence Kewen Cao et.al. 2601.01875 translate read null
2026-01-05 CogFlow: Bridging Perception and Reasoning through Knowledge Internalization for Visual Mathematical Problem Solving Shuhang Chen et.al. 2601.01874 translate read null
2026-01-05 Entity-Guided Multi-Task Learning for Infrared and Visible Image Fusion Wenyu Shao et.al. 2601.01870 translate read null
2026-01-05 DermoGPT: Open Weights and Open Data for Morphology-Grounded Dermatological Reasoning MLLMs Jinghan Ru et.al. 2601.01868 translate read null
2026-01-05 Judging with Personality and Confidence: A Study on Personality-Conditioned LLM Relevance Assessment Nuo Chen et.al. 2601.01862 translate read null
2026-01-05 Jenius Agent: Towards Experience-Driven Accuracy Optimization in Real-World Scenarios Defei Xia et.al. 2601.01857 translate read null
2026-01-05 MORE: Multi-Objective Adversarial Attacks on Speech Recognition Xiaoxue Gao et.al. 2601.01852 translate read null
2026-01-05 Clinical Knowledge Graph Construction and Evaluation with Multi-LLMs via Retrieval-Augmented Generation Udiptaman Das et.al. 2601.01844 translate read null
2026-01-05 COMPASS: A Framework for Evaluating Organization-Specific Policy Alignment in LLMs Dasol Choi et.al. 2601.01836 translate read null
2026-01-05 Emergent Introspective Awareness in Large Language Models Jack Lindsey et.al. 2601.01828 translate read null
2026-01-05 Aspect Extraction from E-Commerce Product and Service Reviews Valiant Lance D. Dionela et.al. 2601.01827 translate read null
2026-01-05 CSCBench: A PVC Diagnostic Benchmark for Commodity Supply Chain Reasoning Yaxin Cui et.al. 2601.01825 translate read null
2026-01-05 Causality-Aware Temporal Projection for Video Understanding in Video-LLMs Zhengjian Kang et.al. 2601.01804 translate read null
2026-01-05 UnPII: Unlearning Personally Identifiable Information with Quantifiable Exposure Risk Intae Jeon et.al. 2601.01786 translate read null
2026-01-05 LIA: Supervised Fine-Tuning of Large Language Models for Automatic Issue Assignment Arsham Khosravani et.al. 2601.01780 translate read null
2026-01-05 Can Large Language Models Solve Engineering Equations? A Systematic Comparison of Direct Prediction and Solver-Assisted Approaches Sai Varun Kodathala et.al. 2601.01774 translate read null
2026-01-05 Can LLMs Track Their Output Length? A Dynamic Feedback Mechanism for Precise Length Regulation Meiman Xiao et.al. 2601.01768 translate read null
2026-01-05 A New Benchmark for the Appropriate Evaluation of RTL Code Optimization Yao Lu et.al. 2601.01765 translate read null
2026-01-05 Query-Document Dense Vectors for LLM Relevance Judgment Bias Analysis Samaneh Mohtadi et.al. 2601.01751 translate read null
2026-01-05 Yuan3.0 Flash: An Open Multimodal Large Language Model for Enterprise Applications YuanLab.ai et.al. 2601.01718 translate read null
2026-01-05 A Training-Free Large Reasoning Model-based Knowledge Tracing Framework for Unified Prediction and Prescription Unggi Lee et.al. 2601.01708 translate read null
2026-01-04 All-Optical Deep Learning with Quantum Nonlinearity Qingyi Zhou et.al. 2601.01690 translate read null
2026-01-04 Lying with Truths: Open-Channel Multi-Agent Collusion for Belief Manipulation via Generative Montage Jinwei Hu et.al. 2601.01685 translate read null
2026-01-04 Exposing Hidden Interfaces: LLM-Guided Type Inference for Reverse Engineering macOS Private Frameworks Arina Kharlamova et.al. 2601.01673 translate read null
2026-01-04 JMedEthicBench: A Multi-Turn Conversational Benchmark for Evaluating Medical Safety in Japanese Large Language Models Junyu Liu et.al. 2601.01627 translate read null
2026-01-04 Structured Decomposition for LLM Reasoning: Cross-Domain Validation and Semantic Web Integration Albert Sadowski et.al. 2601.01609 translate read null
2026-01-04 OpenRT: An Open-Source Red Teaming Framework for Multimodal LLMs Xin Wang et.al. 2601.01592 translate read null
2026-01-04 The Two-Stage Decision-Sampling Hypothesis: Understanding the Emergence of Self-Reflection in RL-Trained LLMs Zibo Zhao et.al. 2601.01580 translate read null
2026-01-04 CaveAgent: Transforming LLMs into Stateful Runtime Operators Maohao Ran et.al. 2601.01569 translate read null
2026-01-04 MOSS Transcribe Diarize: Accurate Transcription with Speaker Diarization Donghua Yu et.al. 2601.01554 translate read null
2026-01-04 HalluZig: Hallucination Detection using Zigzag Persistence Shreyas N. Samaga et.al. 2601.01552 translate read null
2026-01-04 Improving Behavioral Alignment in LLM Social Simulations via Context Formation and Navigation Letian Kong et.al. 2601.01546 translate read null
2026-01-04 Bridging the Data Gap: Creating a Hindi Text Summarization Dataset from the English XSUM Praveenkumar Katwe et.al. 2601.01543 translate read null
2026-01-04 Bayesian Orchestration of Multi-LLM Agents for Cost-Aware Sequential Decision-Making Danial Amin et.al. 2601.01522 translate read null
2026-01-04 Distortion Instead of Hallucination: The Effect of Reasoning Under Strict Constraints Junichiro Niimi et.al. 2601.01490 translate read null
2026-01-04 Can Legislation Be Made Machine-Readable in PROLEG? May-Myo Zin et.al. 2601.01477 translate read null
2026-01-04 Bridging the gap: A comparative exploration of Speech-LLM and end-to-end architecture for multilingual conversational ASR Yuxiang Mei et.al. 2601.01461 translate read null
2026-01-04 Bayesian Subspace Gradient Estimation for Zeroth-Order Optimization of Large Language Models Jian Feng et.al. 2601.01452 translate read null
2026-01-04 iFlip: Iterative Feedback-driven Counterfactual Example Refinement Yilong Wang et.al. 2601.01446 translate read null
2026-01-04 Personalizing black-box models for nonparametric regression with minimax optimality Sai Li et.al. 2601.01432 translate read null
2026-01-04 From Emotion Classification to Emotional Reasoning: Enhancing Emotional Intelligence in Large Language Models Arjhun Sreedar et.al. 2601.01407 translate read null
2026-01-04 LANCET: Neural Intervention via Structural Entropy for Mitigating Faithfulness Hallucinations in LLMs Chenxu Wang et.al. 2601.01401 translate read null
2026-01-04 EternalMath: A Living Benchmark of Frontier Mathematics that Evolves with Human Discovery Jicheng Ma et.al. 2601.01400 translate read null
2026-01-04 Empowering Small Language Models with Factual Hallucination-Aware Reasoning for Financial Classification Han Yuan et.al. 2601.01378 translate read null
2026-01-04 KGCE: Knowledge-Augmented Dual-Graph Evaluator for Cross-Platform Educational Agent Benchmarking with Multimodal Language Models Zixian Liu et.al. 2601.01366 translate read null
2026-01-04 A unified multimodal understanding and generation model for cross-disciplinary scientific research Xiaomeng Yang et.al. 2601.01363 translate read null
2026-01-04 Investigating the Multilingual Calibration Effects of Language Model Instruction-Tuning Jerry Huang et.al. 2601.01362 translate read null
2026-01-04 Towards LLM-enabled autonomous combustion research: A literature-aware agent for self-corrective modeling workflows Ke Xiao et.al. 2601.01357 translate read null
2026-01-04 Reasoning Over Recall: Evaluating the Efficacy of Generalist Architectures vs. Specialized Fine-Tunes in RAG-Based Mental Health Dialogue Systems Md Abdullah Al Kafi et.al. 2601.01341 translate read null
2026-01-04 FLOP-Efficient Training: Early Stopping Based on Test-Time Compute Awareness Hossam Amer et.al. 2601.01332 translate read null
2026-01-04 Beyond Gemini-3-Pro: Revisiting LLM Routing and Aggregation at Scale Shengji Tang et.al. 2601.01330 translate read null
2026-01-04 Digital Twin AI: Opportunities and Challenges from Large Language Models to World Models Rong Zhou et.al. 2601.01321 translate read null
2026-01-04 Adaptive Hierarchical Evaluation of LLMs and SAST tools for CWE Prediction in Python Muntasir Adnan et.al. 2601.01320 translate read null
2026-01-04 Towards a Principled Muon under $\mu\mathsf{P}$: Ensuring Spectral Conditions throughout Training John Zhao et.al. 2601.01306 translate read null
2026-01-03 Warp-Cortex: An Asynchronous, Memory-Efficient Architecture for Million-Agent Cognitive Scaling on Consumer Hardware Jorge L. Ruiz Williams et.al. 2601.01298 translate read null
2026-01-03 Aggressive Compression Enables LLM Weight Theft Davis Brown et.al. 2601.01296 translate read null
2026-01-03 LLM Collusion Shengyu Cao et.al. 2601.01279 translate read null
2026-01-03 CatchAll: Repository-Aware Exception Handling with Knowledge-Guided LLMs Qingxiao Tao et.al. 2601.01271 translate read null
2026-01-03 From Policy to Logic for Efficient and Interpretable Coverage Assessment Rhitabrat Pokharel et.al. 2601.01266 translate read null
2026-01-03 MambaFormer: Token-Level Guided Routing Mixture-of-Experts for Accurate and Efficient Clinical Assistance Hamad Khan et.al. 2601.01260 translate read null
2026-01-03 Entity-Aware and Secure Query Optimization in Database Using Named Entity Recognition Azrin Sultana et.al. 2601.01254 translate read null
2026-01-03 Racka: Efficient Hungarian LLM Adaptation on Academic Infrastructure Zsolt Csibi et.al. 2601.01244 translate read null
2026-01-03 IO-RAE: Information-Obfuscation Reversible Adversarial Example for Audio Privacy Protection Jiajie Zhu et.al. 2601.01239 translate read null
2026-01-03 Atomizer: An LLM-based Collaborative Multi-Agent Framework for Intent-Driven Commit Untangling Kangchen Zhu et.al. 2601.01233 translate read null
2026-01-03 Correctness isn't Efficiency: Runtime Memory Divergence in LLM-Generated Code Prateek Rajput et.al. 2601.01215 translate read null
2026-01-03 OrchestrRL: Dynamic Compute and Network Orchestration for Disaggregated RL Xin Tan et.al. 2601.01209 translate read null
2026-01-03 EduSim-LLM: An Educational Platform Integrating Large Language Models and Robotic Simulation for Beginners Shenqi Lu et.al. 2601.01196 translate read null
2026-01-03 Reinforcement Learning Enhanced Multi-hop Reasoning for Temporal Knowledge Question Answering Wuzhenghong Wen et.al. 2601.01195 translate read null
2026-01-03 SecureCodeRL: Security-Aware Reinforcement Learning for Code Generation with Partial-Credit Rewards Suryansh Singh Sijwali et.al. 2601.01184 translate read null
2026-01-03 Bridging the Semantic Gap for Categorical Data Clustering via Large Language Models Zihua Yang et.al. 2601.01162 translate read null
2026-01-03 DHI: Leveraging Diverse Hallucination Induction for Enhanced Contrastive Factuality Control in Large Language Models Jiani Guo et.al. 2601.01156 translate read null
2026-01-03 SongSage: A Large Musical Language Model with Lyric Generative Pre-training Jiani Guo et.al. 2601.01153 translate read null
2026-01-03 RovoDev Code Reviewer: A Large-Scale Online Evaluation of LLM-based Code Review Automation at Atlassian Kla Tantithamthavorn et.al. 2601.01129 translate read null
2026-01-03 ScienceDB AI: An LLM-Driven Agentic Recommender System for Large-Scale Scientific Data Sharing Services Qingqing Long et.al. 2601.01118 translate read null
2026-01-03 NarrativeTrack: Evaluating Video Language Models Beyond the Frame Hyeonjeong Ha et.al. 2601.01095 translate read null
2026-01-03 ks-lit-3m: A 3.1 million word kashmiri text dataset for large language model pretraining Haq Nawaz Malik et.al. 2601.01091 translate read null
2026-01-03 Harm in AI-Driven Societies: An Audit of Toxicity Adoption on Chirper.ai Erica Coppolillo et.al. 2601.01090 translate read null
2026-01-03 SPoRC-VIST: A Benchmark for Evaluating Generative Natural Narrative in Vision-Language Models Yunlin Zeng et.al. 2601.01062 translate read null
2026-01-03 A Platform for Interactive AI Character Experiences Rafael Wampfler et.al. 2601.01027 translate read null
2026-01-03 HyperJoin: LLM-augmented Hypergraph Link Prediction for Joinable Table Discovery Shiyuan Liu et.al. 2601.01015 translate read null
2026-01-02 Grain-Aware Data Transformations: Type-Level Formal Verification at Zero Computational Cost Nikos Karayannidis et.al. 2601.00995 translate read null
2026-01-02 Reliability Under Randomness: An Empirical Analysis of Sparse and Dense Language Models Across Decoding Temperatures Kabir Grover et.al. 2601.00942 translate read null
2026-01-02 Emoji-Based Jailbreaking of Large Language Models M P V S Gopinadh et.al. 2601.00936 translate read null
2026-01-02 AI-Guided Computational Design of a Room-Temperature, Ambient-Pressure Superconductor Candidate: Grokene DEARDAO DeSci Collaborative Team et.al. 2601.00931 translate read null
2026-01-02 AlignUSER: Human-Aligned LLM Agents via World Models for Recommender System Evaluation Nicolas Bougie et.al. 2601.00930 translate read null
2026-01-02 Measuring Social Media Polarization Using Large Language Models and Heuristic Rules Jawad Chowdhury et.al. 2601.00927 translate read null
2026-01-01 MACA: A Framework for Distilling Trustworthy LLMs into Efficient Retrievers Satya Swaroop Gudipudi et.al. 2601.00926 translate read null
2026-01-01 Context Collapse: In-Context Learning and Model Collapse Josef Ott et.al. 2601.00923 translate read null
2026-01-01 Attention Needs to Focus: A Unified Perspective on Attention Allocation Zichuan Fu et.al. 2601.00919 translate read null
2026-01-01 The Discovery Gap: How Product Hunt Startups Vanish in LLM Organic Discovery Queries Amit Prakash Sharma et.al. 2601.00912 translate read null
2026-01-02 Geometry of Reason: Spectral Signatures of Valid Mathematical Reasoning Valentin Noël et.al. 2601.00791 translate read null
2026-01-02 Investigating the Viability of Employing Multi-modal Large Language Models in the Context of Audio Deepfake Detection Akanksha Chuchra et.al. 2601.00777 translate read null
2026-01-02 Memory Bank Compression for Continual Adaptation of Large Language Models Thomas Katraouras et.al. 2601.00756 translate read null
2026-01-02 The Reasoning-Creativity Trade-off: Toward Creativity-Driven Problem Solving Max Ruiz Luyten et.al. 2601.00747 translate read null
2026-01-02 Materials Informatics: Emergence To Autonomous Discovery In The Age Of AI Turab Lookman et.al. 2601.00742 translate read null
2026-01-02 Exploring the Performance of Large Language Models on Subjective Span Identification Tasks Alphaeus Dmonte et.al. 2601.00736 translate read null
2026-01-02 Grading Handwritten Engineering Exams with Multimodal Large Language Models Janez Perš et.al. 2601.00730 translate read null
2026-01-02 A Vision-and-Knowledge Enhanced Large Language Model for Generalizable Pedestrian Crossing Behavior Inference Qingwen Pu et.al. 2601.00694 translate read null
2026-01-02 Human-like AI-based Auto-Field-in-Field Whole-Brain Radiotherapy Treatment Planning With Conversation Large Language Model Feedback Adnan Jafar et.al. 2601.00685 translate read null
2026-01-02 QSLM: A Performance- and Memory-aware Quantization Framework with Tiered Search Strategy for Spike-driven Language Models Rachmad Vidya Wicaksana Putra et.al. 2601.00679 translate read null
2026-01-02 Physio-DPO: Aligning Large Language Models with the Protein Energy Landscape to Eliminate Structural Hallucinations QiWei Meng et.al. 2601.00647 translate read null
2026-01-02 FlexSpec: Frozen Drafts Meet Evolving Targets in Edge-Cloud Collaborative LLM Speculative Decoding Yuchen Li et.al. 2601.00644 translate read null
2026-01-02 Probabilistic Guarantees for Reducing Contextual Hallucinations in LLMs Nils Rautenberg et.al. 2601.00641 translate read null
2026-01-02 SEMODS: A Validated Dataset of Open-Source Software Engineering Models Alexandra González et.al. 2601.00635 translate read null
2026-01-02 Do Chatbot LLMs Talk Too Much? The YapBench Benchmark Vadim Borisov et.al. 2601.00624 translate read null
2026-01-02 DA-DPO: Cost-efficient Difficulty-aware Preference Optimization for Reducing MLLM Hallucinations Longtian Qiu et.al. 2601.00623 translate read null
2026-01-02 Beyond IVR: Benchmarking Customer Support LLM Agents for Business-Adherence Sumanth Balaji et.al. 2601.00596 translate read null
2026-01-02 CSSBench: Evaluating the Safety of Lightweight LLMs against Chinese-Specific Adversarial Patterns Zhenhong Zhou et.al. 2601.00588 translate read null
2026-01-02 HFedMoE: Resource-aware Heterogeneous Federated Learning with Mixture-of-Experts Zihan Fang et.al. 2601.00583 translate read null
2026-01-02 The AI Invisibility Effect: Understanding Human-AI Interaction When Users Don’t Recognize Artificial Intelligence Obada Kraishan et.al. 2601.00579 translate read null
2026-01-02 InfoSynth: Information-Guided Benchmark Synthesis for LLMs Ishir Garg et.al. 2601.00575 translate read null
2026-01-02 Improving Scientific Document Retrieval with Academic Concept Index Jeyun Lee et.al. 2601.00567 translate read null
2026-01-02 Low Rank Comes with Low Security: Gradient Assembly Poisoning Attacks against Distributed LoRA-based LLM Systems Yueyan Dong et.al. 2601.00566 translate read null
2026-01-02 Cracking IoT Security: Can LLMs Outsmart Static Analysis Tools? Jason Quantrill et.al. 2601.00559 translate read null
2026-01-01 Improving LLM-Assisted Secure Code Generation through Retrieval-Augmented-Generation and Multi-Tool Feedback Vidyut Sriram et.al. 2601.00509 translate read null
2026-01-01 Rule-Based Approaches to Atomic Sentence Extraction Lineesha Kamana et.al. 2601.00506 translate read null
2026-01-01 MotionPhysics: Learnable Motion Distillation for Text-Guided Simulation Miaowei Wang et.al. 2601.00504 translate read null
2026-01-01 STELLAR: A Search-Based Testing Framework for Large Language Model Applications Lev Sorokin et.al. 2601.00497 translate read null
2026-01-01 Noise-Aware Named Entity Recognition for Historical VET Documents Alexander M. Esser et.al. 2601.00488 translate read null
2026-01-01 Multi-Agent Coordinated Rename Refactoring Abhiram Bellur et.al. 2601.00482 translate read null
2026-01-01 DSL or Code? Evaluating the Quality of LLM-Generated Algebraic Specifications: A Case Study in Optimization at Kinaxis Negin Ayoughi et.al. 2601.00469 translate read null
2026-01-01 Defensive M2S: Training Guardrail Models on Compressed Multi-turn Conversations Hyunjun Kim et.al. 2601.00454 translate read null
2026-01-01 Language as Mathematical Structure: Examining Semantic Field Theory Against Language Games Dimitris Vartziotis et.al. 2601.00448 translate read null
2026-01-01 Toward Better Temporal Structures for Geopolitical Events Forecasting Kian Ahrabian et.al. 2601.00430 translate read null
2026-01-01 Do LLMs Judge Distantly Supervised Named Entity Labels Well? Constructing the JudgeWEL Dataset Alistair Plum et.al. 2601.00411 translate read null
2026-01-01 Vision-Language Reasoning for Geolocalization: A Reinforcement Learning Approach Biao Wu et.al. 2601.00388 translate read null
2026-01-01 The Role of Mixed-Language Documents for Multilingual Large Language Model Pretraining Jiandong Shao et.al. 2601.00364 translate read null
2026-01-01 Robust Uncertainty Quantification for Factual Generation of Large Language Models Yuhao Zhang et.al. 2601.00348 translate read null
2026-01-01 Can Large Language Models Still Explain Themselves? Investigating the Impact of Quantization on Self-Explanations Qianli Wang et.al. 2601.00282 translate read null
2026-01-01 Making Theft Useless: Adulteration-Based Protection of Proprietary Knowledge Graphs in GraphRAG Systems Weijie Wang et.al. 2601.00274 translate read null
2026-01-01 FaithSCAN: Model-Driven Single-Pass Hallucination Detection for Faithful Visual Question Answering Chaodong Tong et.al. 2601.00269 translate read null
2026-01-01 Beyond Perfect APIs: A Comprehensive Evaluation of LLM Agents Under Real-World API Complexity Doyoung Kim et.al. 2601.00268 translate read null
2026-01-01 Parallel Universes, Parallel Languages: A Comprehensive Study on LLM-based Multilingual Counterfactual Example Generation Qianli Wang et.al. 2601.00263 translate read null
2026-01-01 TotalFM: An Organ-Separated Framework for 3D-CT Vision Foundation Models Kohei Yamamoto et.al. 2601.00260 translate read null
2026-01-01 An Empirical Evaluation of LLM-Based Approaches for Code Vulnerability Detection: RAG, SFT, and Dual-Agent Systems Md Hasan Saju et.al. 2601.00254 translate read null
2026-01-01 FlashInfer-Bench: Building the Virtuous Cycle for AI-driven LLM Systems Shanli Xing et.al. 2601.00227 translate read null
2026-01-01 Talk Less, Verify More: Improving LLM Assistants with Semantic Checks and Execution Feedback Yan Sun et.al. 2601.00224 translate read null
2026-01-01 From Evidence-Based Medicine to Knowledge Graph: Retrieval-Augmented Generation for Sports Rehabilitation and a Domain Benchmark Jinning Zhang et.al. 2601.00216 translate read null
2026-01-01 From Sight to Insight: Improving Visual Reasoning Capabilities of Multimodal Models via Reinforcement Learning Omar Sharif et.al. 2601.00215 translate read null
2026-01-01 Overlooked Safety Vulnerability in LLMs: Malicious Intelligent Optimization Algorithm Request and its Jailbreak Haoran Gu et.al. 2601.00213 translate read null
2026-01-01 Knowledge Distillation for Temporal Knowledge Graph Reasoning with Large Language Models Wang Xing et.al. 2601.00202 translate read null
2026-01-01 Pat-DEVAL: Chain-of-Legal-Thought Evaluation for Patent Description Yongmin Yoo et.al. 2601.00166 translate read null
2026-01-01 Combining datasets with different ground truths using Low-Rank Adaptation to generalize image-based CNN models for photometric redshift prediction Vikram Seenivasan et.al. 2601.00146 translate read null
