LLM - 2025-11

Publish Date Title Authors PDF Translate Read Code
2025-11-06 Trained on Tokens, Calibrated on Concepts: The Emergence of Semantic Calibration in LLMs Preetum Nakkiran et.al. 2511.04869 translate read null
2025-11-06 Software Defined Vehicle Code Generation: A Few-Shot Prompting Approach Quang-Dung Nguyen et.al. 2511.04849 translate read null
2025-11-06 Grounded Test-Time Adaptation for LLM Agents Arthur Chen et.al. 2511.04847 translate read null
2025-11-06 Explore Data Left Behind in Reinforcement Learning for Reasoning Language Models Chenxi Liu et.al. 2511.04800 translate read null
2025-11-06 ReGen: Generative Robot Simulation via Inverse Design Phat Nguyen et.al. 2511.04769 translate read null
2025-11-06 Surprisal reveals diversity gaps in image captioning and different scorers change the story Nikolai Ilinykh et.al. 2511.04754 translate read null
2025-11-06 Trustworthiness Calibration Framework for Phishing Email Detection Using Large Language Models Daniyal Ganiuly et.al. 2511.04728 translate read null
2025-11-06 IndicVisionBench: Benchmarking Cultural and Multilingual Understanding in VLMs Ali Faraz et.al. 2511.04727 translate read null
2025-11-06 Learning to reason about rare diseases through retrieval-augmented agents Ha Young Kim et.al. 2511.04720 translate read null
2025-11-06 Benchmark Designers Should “Train on the Test Set” to Expose Exploitable Non-Visual Shortcuts Ellis Brown et.al. 2511.04655 translate read null
2025-11-06 Logit-Entropy Adaptive Stopping Heuristic for Efficient Chain-of-Thought Reasoning Mohammad Atif Quamar et.al. 2511.04654 translate read null
2025-11-06 Optimal Inference Schedules for Masked Diffusion Models Sitan Chen et.al. 2511.04647 translate read null
2025-11-06 When retrieval outperforms generation: Dense evidence retrieval for scalable fake news detection Alamgir Munir Qazi et.al. 2511.04643 translate read link
2025-11-06 PixCLIP: Achieving Fine-grained Visual Language Understanding via Any-granularity Pixel-Text Alignment Learning Yicheng Xiao et.al. 2511.04601 translate read null
2025-11-06 Question the Questions: Auditing Representation in Online Deliberative Processes Soham De et.al. 2511.04588 translate read null
2025-11-06 ARETE: an R package for Automated REtrieval from TExt with large language models Vasco V. Branco et.al. 2511.04573 translate read null
2025-11-06 Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm Jingqi Tong et.al. 2511.04570 translate read link
2025-11-06 LLM-as-a-Judge: Toward World Models for Slate Recommendation Systems Baptiste Bonin et.al. 2511.04541 translate read null
2025-11-06 From Model to Breach: Towards Actionable LLM-Generated Vulnerabilities Reporting Cyril Vallez et.al. 2511.04538 translate read null
2025-11-06 Large Language Models for Cyber Security Raunak Somani et.al. 2511.04508 translate read null
2025-11-06 RAGalyst: Automated Human-Aligned Agentic Evaluation for Domain-Specific RAG Joshua Gao et.al. 2511.04502 translate read null
2025-11-06 Large language models replicate and predict human cooperation across experiments in game theory Andrea Cera Palatsi et.al. 2511.04500 translate read null
2025-11-06 Decoding Emergent Big Five Traits in Large Language Models: Temperature-Dependent Expression and Architectural Clustering Christos-Nikolaos Zacharopoulos et.al. 2511.04499 translate read null
2025-11-06 RUST-BENCH: Benchmarking LLM Reasoning on Unstructured Text within Structured Tables Nikhil Abhyankar et.al. 2511.04491 translate read null
2025-11-06 Perceptions of AI Bad Behavior: Variations on Discordant Non-Performance Jaime Banks et.al. 2511.04487 translate read null
2025-11-06 Promoting Sustainable Web Agents: Benchmarking and Estimating Energy Consumption through Empirical and Theoretical Analysis Lars Krupp et.al. 2511.04481 translate read null
2025-11-06 Enabling Dynamic Sparsity in Quantized LLM Inference Rongxiang Wang et.al. 2511.04477 translate read null
2025-11-06 Beyond Shortest Path: Agentic Vehicular Routing with Semantic Context Carnot Braun et.al. 2511.04464 translate read null
2025-11-06 Speed at the Cost of Quality? The Impact of LLM Agent Assistance on Software Development Hao He et.al. 2511.04427 translate read null
2025-11-06 The Illusion of Certainty: Uncertainty quantification for LLMs fails under ambiguity Tim Tomov et.al. 2511.04418 translate read null
2025-11-06 Post-Training LLMs as Better Decision-Making Agents: A Regret-Minimization Approach Chanwoo Park et.al. 2511.04393 translate read null
2025-11-06 Multi-Task Learning for Visually Grounded Reasoning in Gastrointestinal VQA Itbaan Safwan et.al. 2511.04384 translate read null
2025-11-06 HPC-Vis: A Visual Analytic System for Interactive Exploration of Historical Painter Cohorts Yingping Yang et.al. 2511.04383 translate read null
2025-11-06 Towards Aligning Multimodal LLMs with Human Experts: A Focus on Parent-Child Interaction Weiyan Shi et.al. 2511.04366 translate read null
2025-11-06 Where Do LLMs Still Struggle? An In-Depth Analysis of Code Generation Benchmarks Amir Molzam Sharifloo et.al. 2511.04355 translate read null
2025-11-06 Differentially Private In-Context Learning with Nearest Neighbor Search Antti Koskela et.al. 2511.04332 translate read null
2025-11-06 RxSafeBench: Identifying Medication Safety Issues of Large Language Models in Simulated Consultation Jiahao Zhao et.al. 2511.04328 translate read null
2025-11-06 AdversariaLLM: A Unified and Modular Toolbox for LLM Robustness Research Tim Beyer et.al. 2511.04316 translate read null
2025-11-06 Measuring economic outlook in the news timely and efficiently Elliot Beck et.al. 2511.04299 translate read null
2025-11-06 Robustness of Minimum-Volume Nonnegative Matrix Factorization under an Expanded Sufficiently Scattered Condition Giovanni Barbarino et.al. 2511.04291 translate read null
2025-11-06 A Tool for Benchmarking Large Language Models’ Robustness in Assessing the Realism of Driving Scenarios Jiahui Wu et.al. 2511.04267 translate read null
2025-11-06 SSPO: Subsentence-level Policy Optimization Kun Yang et.al. 2511.04256 translate read null
2025-11-06 Efficient Topic Extraction via Graph-Based Labeling: A Lightweight Alternative to Deep Models Salma Mekaoui et.al. 2511.04248 translate read null
2025-11-06 Reusing Pre-Training Data at Test Time is a Compute Multiplier Alex Fang et.al. 2511.04234 translate read null
2025-11-06 Black-Box Guardrail Reverse-engineering Attack Hongwei Yao et.al. 2511.04215 translate read null
2025-11-06 Block Rotation is All You Need for MXFP4 Quantization Yuantian Shao et.al. 2511.04214 translate read null
2025-11-06 Can we trust LLMs as a tutor for our students? Evaluating the Quality of LLM-generated Feedback in Statistics Exams Markus Herklotz et.al. 2511.04213 translate read null
2025-11-06 LLM-as-a-Judge is Bad, Based on AI Attempting the Exam Qualifying for the Member of the Polish National Board of Appeal Michał Karp et.al. 2511.04205 translate read null
2025-11-06 Computational Turing Test Reveals Systematic Differences Between Human and AI Language Nicolò Pagan et.al. 2511.04195 translate read null
2025-11-06 Explaining Software Vulnerabilities with Large Language Models Oshando Johnson et.al. 2511.04179 translate read null
2025-11-06 Transforming Mentorship: An AI Powered Chatbot Approach to University Guidance Mashrur Rahman et.al. 2511.04172 translate read null
2025-11-06 Are We Aligned? A Preliminary Investigation of the Alignment of Responsible AI Values between LLMs and Human Judgment Asma Yamani et.al. 2511.04157 translate read null
2025-11-06 BAPPA: Benchmarking Agents, Plans, and Pipelines for Automated Text-to-SQL Generation Fahim Ahmed et.al. 2511.04153 translate read null
2025-11-06 Implementation of transformer-based LLMs with large-scale optoelectronic neurons on a CMOS image sensor platform Neil Na et.al. 2511.04136 translate read null
2025-11-06 Exploring the Feasibility of End-to-End Large Language Model as a Compiler Hongbin Zhang et.al. 2511.04132 translate read null
2025-11-06 RIDE: Difficulty Evolving Perturbation with Item Response Theory for Mathematical Reasoning Xinyuan Li et.al. 2511.04120 translate read null
2025-11-06 How Natural Language Proficiency Shapes GenAI Code for Software Engineering Tasks Ruksit Rojpaisarnkit et.al. 2511.04115 translate read null
2025-11-06 Batch Prompting Suppresses Overthinking Reasoning Under Constraint: How Batch Prompting Suppresses Overthinking in Reasoning Models Wenmo Qiu et.al. 2511.04108 translate read null
2025-11-06 KGFR: A Foundation Retriever for Generalized Knowledge Graph Question Answering Yuanning Cui et.al. 2511.04093 translate read null
2025-11-06 E-CARE: An Efficient LLM-based Commonsense-Augmented Framework for E-Commerce Ge Zhang et.al. 2511.04087 translate read null
2025-11-06 Caption Injection for Optimization in Generative Search Engine Xiaolu Chen et.al. 2511.04080 translate read null
2025-11-06 The truth is no diaper: Human and AI-generated associations to emotional words Špela Vintar et.al. 2511.04077 translate read null
2025-11-06 Agentmandering: A Game-Theoretic Framework for Fair Redistricting via Large Language Model Agents Hao Li et.al. 2511.04076 translate read null
2025-11-06 Plan of Knowledge: Retrieval-Augmented Large Language Models for Temporal Knowledge Graph Question Answering Xinying Qian et.al. 2511.04072 translate read null
2025-11-06 TXL Fusion: A Hybrid Machine Learning Framework Integrating Chemical Heuristics and Large Language Models for Topological Materials Discovery Arif Ullah et.al. 2511.04068 translate read null
2025-11-06 DartQuant: Efficient Rotational Distribution Calibration for LLM Quantization Yuantian Shao et.al. 2511.04063 translate read null
2025-11-06 Interpreting Multi-Attribute Confounding through Numerical Attributes in Large Language Models Hirohane Takagi et.al. 2511.04053 translate read null
2025-11-06 An LLM-based Framework for Human-Swarm Teaming Cognition in Disaster Search and Rescue Kailun Ji et.al. 2511.04042 translate read null
2025-11-06 PICNIC: Silicon Photonic Interconnected Chiplets with Computational Network and In-memory Computing for LLM Inference Acceleration Yue Jiet Chong et.al. 2511.04036 translate read null
2025-11-06 Detecting Silent Failures in Multi-Agentic AI Trajectories Divya Pathak et.al. 2511.04032 translate read null
2025-11-06 Abductive Inference in Retrieval-Augmented Language Models: Generating and Validating Missing Premises Shiyin Lin et.al. 2511.04020 translate read null
2025-11-06 Specification-Guided Vulnerability Detection with Large Language Models Hao Zhu et.al. 2511.04014 translate read null
2025-11-06 PSD2Code: Automated Front-End Code Generation from Design Files via Multimodal Large Language Models Yongxi Chen et.al. 2511.04012 translate read null
2025-11-06 Memory- and Latency-Constrained Inference of Large Language Models via Adaptive Split Computing Mingyu Sung et.al. 2511.04002 translate read null
2025-11-06 Hybrid Fuzzing with LLM-Guided Input Mutation and Semantic Feedback Shiyin Lin et.al. 2511.03995 translate read null
2025-11-06 TwIST: Rigging the Lottery in Transformers with Independent Subnetwork Training Michael Menezes et.al. 2511.03983 translate read null
2025-11-06 LLMs and Cultural Values: the Impact of Prompt Language and Explicit Cultural Framing Bram Bulté et.al. 2511.03980 translate read null
2025-11-06 Direct Semantic Communication Between Large Language Models via Vector Translation Fu-Chun Yang et.al. 2511.03945 translate read null
2025-11-06 MIDI-LLM: Adapting Large Language Models for Text-to-MIDI Music Generation Shih-Lun Wu et.al. 2511.03942 translate read null
2025-11-06 RLHF: A comprehensive Survey for Cultural, Multimodal and Low Latency Alignment Methods Raghav Sharma et.al. 2511.03939 translate read null
2025-11-06 SynQuE: Estimating Synthetic Dataset Quality Without Annotations Arthur Chen et.al. 2511.03928 translate read null
2025-11-06 Collaborative Agents for Automated Program Repair in Ruby Nikta Akbarpour et.al. 2511.03925 translate read null
2025-11-05 The Human Flourishing Geographic Index: A County-Level Dataset for the United States, 2013–2023 Stefano M. Iacus et.al. 2511.03915 translate read null
2025-11-05 GRAD: Graph-Retrieved Adaptive Decoding for Hallucination Mitigation Manh Nguyen et.al. 2511.03900 translate read null
2025-11-05 Secure Code Generation at Scale with Reflexion Arup Datta et.al. 2511.03898 translate read null
2025-11-05 KnowThyself: An Agentic Assistant for LLM Interpretability Suraj Prasai et.al. 2511.03878 translate read null
2025-11-05 OMPILOT: Harnessing Transformer Models for Auto Parallelization to Shared Memory Computing Paradigms Arijit Bhattacharjee et.al. 2511.03866 translate read null
2025-11-05 GAIA: Geothermal Analytics and Intelligent Agent Randy Harsuko et.al. 2511.03852 translate read null
2025-11-05 To See or To Read: User Behavior Reasoning in Multimodal LLMs Tianning Dong et.al. 2511.03845 translate read null
2025-11-05 ASAP: an Agentic Solution to Auto-optimize Performance of Large-Scale LLM Training Yuran Ding et.al. 2511.03844 translate read null
2025-11-05 Divide, Cache, Conquer: Dichotomic Prompting for Efficient Multi-Label LLM-Based Classification Mikołaj Langner et.al. 2511.03830 translate read null
2025-11-05 STARS: Segment-level Token Alignment with Rejection Sampling in Large Language Models Mohammad Atif Quamar et.al. 2511.03827 translate read null
2025-11-05 How Different Tokenization Algorithms Impact LLMs and Transformer Models for Binary Code Analysis Ahmed Mostafa et.al. 2511.03825 translate read null
2025-11-05 PLLuM: A Family of Polish Large Language Models Jan Kocoń et.al. 2511.03823 translate read null
2025-11-05 Expert Evaluation of LLM World Models: A High-$T_c$ Superconductivity Case Study Haoyu Guo et.al. 2511.03782 translate read null
2025-11-05 Scaling Agent Learning via Experience Synthesis Zhaorun Chen et.al. 2511.03773 translate read link
2025-11-05 Disentangled Concepts Speak Louder Than Words: Explainable Video Action Recognition Jongseo Lee et.al. 2511.03725 translate read null
2025-11-05 Outbidding and Outbluffing Elite Humans: Mastering Liar’s Poker via Self-Play and Reinforcement Learning Richard Dewey et.al. 2511.03724 translate read null
2025-11-05 LLM-enhanced Air Quality Monitoring Interface via Model Context Protocol Yu-Erh Pan et.al. 2511.03706 translate read null
2025-11-05 Do Androids Dream of Unseen Puppeteers? Probing for a Conspiracy Mindset in Large Language Models Francesco Corso et.al. 2511.03699 translate read null
2025-11-05 AnaFlow: Agentic LLM-based Workflow for Reasoning-Driven Explainable and Sample-Efficient Analog Circuit Sizing Mohsen Ahmadzadeh et.al. 2511.03697 translate read null
2025-11-05 Whisper Leak: a side-channel attack on Large Language Models Geoff McDonald et.al. 2511.03675 translate read null
2025-11-05 Watermarking Large Language Models in Europe: Interpreting the AI Act in Light of Technology Thomas Souverain et.al. 2511.03641 translate read null
2025-11-05 Towards Transparent Stance Detection: A Zero-Shot Approach Using Implicit and Explicit Interpretability Apoorva Upadhyaya et.al. 2511.03635 translate read null
2025-11-05 LiveTradeBench: Seeking Real-World Alpha with Large Language Models Haofei Yu et.al. 2511.03628 translate read null
2025-11-05 PerfDojo: Automated ML Library Generation for Heterogeneous Architectures Andrei Ivanov et.al. 2511.03586 translate read null
2025-11-05 ASVRI-Legal: Fine-Tuning LLMs with Retrieval Augmented Generation for Enhanced Legal Regulation One Octadion et.al. 2511.03563 translate read null
2025-11-05 MultiZebraLogic: A Multilingual Logical Reasoning Benchmark Sofie Helene Bruun et.al. 2511.03553 translate read null
2025-11-05 Uncovering Code Insights: Leveraging GitHub Artifacts for Deeper Code Understanding Ziv Nevo et.al. 2511.03549 translate read null
2025-11-05 U2F: Encouraging SWE-Agent to Seize Novelty without Losing Feasibility Wencheng Ye et.al. 2511.03517 translate read null
2025-11-05 One Battle After Another: Probing LLMs’ Limits on Multi-Turn Instruction Following with a Benchmark Evolving Framework Qi Jia et.al. 2511.03508 translate read null
2025-11-05 BanglaSTEM: A Parallel Corpus for Technical Domain Bangla-English Translation Kazi Reyazul Hasan et.al. 2511.03498 translate read null
2025-11-05 RAGBoost: Efficient Retrieval-Augmented Generation with Accuracy-Preserving Context Reuse Yinsicheng Jiang et.al. 2511.03475 translate read null
2025-11-05 Towards Scalable Web Accessibility Audit with MLLMs as Copilots Ming Gu et.al. 2511.03471 translate read null
2025-11-05 CareMedEval dataset: Evaluating Critical Appraisal and Reasoning in the Biomedical Field Doria Bonzi et.al. 2511.03441 translate read null
2025-11-05 Light over Heavy: Automated Performance Requirements Quantification with Linguistic Inducement Shihai Wang et.al. 2511.03421 translate read null
2025-11-05 Knowledge-Augmented Question Error Correction for Chinese Question Answer System with QuestionRAG Longpeng Qiu et.al. 2511.03410 translate read null
2025-11-05 Efficient Reasoning via Thought-Training and Thought-Free Inference Canhui Wu et.al. 2511.03408 translate read null
2025-11-05 Towards Realistic Project-Level Code Generation via Multi-Agent Collaboration and Semantic Architecture Modeling Qianhui Zhao et.al. 2511.03404 translate read null
2025-11-05 GUIDES: Guidance Using Instructor-Distilled Embeddings for Pre-trained Robot Policy Enhancement Minquan Gao et.al. 2511.03400 translate read null
2025-11-05 Computational Imaging Meets LLMs: Zero-Shot IDH Mutation Prediction in Brain Gliomas Syed Muqeem Mahmood et.al. 2511.03376 translate read null
2025-11-05 LFC-DA: Logical Formula-Controlled Data Augmentation for Enhanced Logical Reasoning Shenghao Li et.al. 2511.03372 translate read null
2025-11-05 EQ-Negotiator: Dynamic Emotional Personas Empower Small Language Models for Edge-Deployable Credit Negotiation Yunbo Long et.al. 2511.03370 translate read null
2025-11-05 Silenced Biases: The Dark Side LLMs Learned to Refuse Rom Himelstein et.al. 2511.03369 translate read null
2025-11-05 A Modular, Data-Free Pipeline for Multi-Label Intention Recognition in Transportation Agentic AI Applications Xiaocai Zhang et.al. 2511.03363 translate read null
2025-11-05 Multi-Object Tracking Retrieval with LLaVA-Video: A Training-Free Solution to MOT25-StAG Challenge Yi Yang et.al. 2511.03332 translate read null
2025-11-05 Benchmarking the Thinking Mode of Multimodal Large Language Models in Clinical Tasks Jindong Hong et.al. 2511.03328 translate read null
2025-11-05 SurgViVQA: Temporally-Grounded Video Question Answering for Surgical Scene Understanding Mauro Orazio Drago et.al. 2511.03325 translate read null
2025-11-05 TASU: Text-Only Alignment for Speech Understanding Jing Peng et.al. 2511.03310 translate read null
2025-11-05 How to Evaluate Speech Translation with Source-Aware Neural MT Metrics Mauro Cettolo et.al. 2511.03295 translate read null
2025-11-05 UMDAM: A Unified Data Layout and DRAM Address Mapping for Heterogenous NPU-PIM Hai Huang et.al. 2511.03293 translate read null
2025-11-05 Let the Bees Find the Weak Spots: A Path Planning Perspective on Multi-Turn Jailbreak Attacks against LLMs Yize Liu et.al. 2511.03271 translate read null
2025-11-05 SCALE: Upscaled Continual Learning of Large Language Models Jin-woo Lee et.al. 2511.03270 translate read null
2025-11-05 Comparing the Performance of LLMs in RAG-based Question-Answering: A Case Study in Computer Science Literature Ranul Dayarathne et.al. 2511.03261 translate read null
2025-11-05 Auditing M-LLMs for Privacy Risks: A Synthetic Benchmark and Evaluation Framework Junhao Li et.al. 2511.03248 translate read null
2025-11-05 Death by a Thousand Prompts: Open Model Vulnerability Analysis Amy Chang et.al. 2511.03247 translate read null
2025-11-05 IndicSuperTokenizer: An Optimized Tokenizer for Indic Multilingual LLMs Souvik Rana et.al. 2511.03237 translate read null
2025-11-05 From Five Dimensions to Many: Large Language Models as Precise and Interpretable Psychological Profilers Yi-Fei Liu et.al. 2511.03235 translate read null
2025-11-05 Multimodal-Wireless: A Large-Scale Dataset for Sensing and Communication Tianhao Mao et.al. 2511.03220 translate read null
2025-11-05 Hybrid Fact-Checking that Integrates Knowledge Graphs, Large Language Models, and Search-Based Retrieval Agents Improves Interpretable Claim Verification Shaghayegh Kolli et.al. 2511.03217 translate read null
2025-11-05 LGM: Enhancing Large Language Models with Conceptual Meta-Relations and Iterative Retrieval Wenchang Lei et.al. 2511.03214 translate read null
2025-11-05 QG-CoC: Question-Guided Chain-of-Captions for Large Multimodal Models Kuei-Chun Kao et.al. 2511.03206 translate read null
2025-11-05 Large Language Models as Information Sources: Distinctive Characteristics and Types of Low-Quality Information Jiawei Zhou et.al. 2511.03198 translate read null
2025-11-05 Understanding Robustness of Model Editing in Code LLMs: An Empirical Study Vinaik Chhetri et.al. 2511.03182 translate read null
2025-11-05 Learning-based Cooperative Robotic Paper Wrapping: A Unified Control Policy with Residual Force Control Rewida Ali et.al. 2511.03181 translate read null
2025-11-05 BengaliMoralBench: A Benchmark for Auditing Moral Reasoning in Large Language Models within Bengali Language and Culture Shahriyar Zaman Ridoy et.al. 2511.03180 translate read null
2025-11-05 Toward Autonomous Engineering Design: A Knowledge-Guided Multi-Agent Framework Varun Kumar et.al. 2511.03179 translate read null
2025-11-05 SurgAnt-ViVQA: Learning to Anticipate Surgical Events through GRU-Driven Temporal Cross-Attention Shreyas C. Dhake et.al. 2511.03178 translate read null
2025-11-05 AI as We Describe It: How Large Language Models and Their Applications in Health are Represented Across Channels of Public Discourse Jiawei Zhou et.al. 2511.03174 translate read null
2025-11-05 Measuring Aleatoric and Epistemic Uncertainty in LLMs: Empirical Evaluation on ID and OOD QA Tasks Kevin Wang et.al. 2511.03166 translate read null
2025-11-05 RefAgent: A Multi-agent LLM-based Framework for Automatic Software Refactoring Khouloud Oueslati et.al. 2511.03153 translate read null
2025-11-05 From Measurement to Expertise: Empathetic Expert Adapters for Context-Based Empathy in Conversational AI Agents Erfan Shayegani et.al. 2511.03143 translate read null
2025-11-05 A Proprietary Model-Based Safety Response Framework for AI Agents Qi Li et.al. 2511.03138 translate read null
2025-11-05 Using Multi-modal Large Language Model to Boost Fireworks Algorithm’s Ability in Settling Challenging Optimization Tasks Shipeng Cen et.al. 2511.03137 translate read null
2025-11-05 From Insight to Exploit: Leveraging LLM Collaboration for Adaptive Adversarial Text Generation Najrin Sultana et.al. 2511.03128 translate read null
2025-11-05 Control Barrier Function for Aligning Large Language Models Yuya Miyaoka et.al. 2511.03121 translate read null
2025-11-05 Large language models require a new form of oversight: capability-based monitoring Katherine C. Kellogg et.al. 2511.03106 translate read null
2025-11-05 CARMA: Comprehensive Automatically-annotated Reddit Mental Health Dataset for Arabic Saad Mankarious et.al. 2511.03102 translate read null
2025-11-05 ALAS: Transactional and Dynamic Multi-Agent LLM Planning Longling Geng et.al. 2511.03094 translate read null
2025-11-05 SnapStream: Efficient Long Sequence Decoding on Dataflow Accelerators Jonathan Li et.al. 2511.03092 translate read null
2025-11-05 PolyNorm: Few-Shot LLM-Based Text Normalization for Text-to-Speech Michel Wong et.al. 2511.03080 translate read null
2025-11-04 A Collaborative Reasoning Framework for Anomaly Diagnostics in Underwater Robotics Markus Buchholz et.al. 2511.03075 translate read null
2025-11-04 Epidemiology of Large Language Models: A Benchmark for Observational Distribution Knowledge Drago Plecko et.al. 2511.03070 translate read null
2025-11-04 Reading Between the Lines: The One-Sided Conversation Problem Victoria Ebert et.al. 2511.03056 translate read null
2025-11-04 No-Human in the Loop: Agentic Evaluation at Scale for Recommendation Tao Zhang et.al. 2511.03051 translate read null
2025-11-04 ROBoto2: An Interactive System and Dataset for LLM-assisted Clinical Trial Risk of Bias Assessment Anthony Hevia et.al. 2511.03048 translate read null
2025-11-04 Unsupervised Evaluation of Multi-Turn Objective-Driven Interactions Emi Soroka et.al. 2511.03047 translate read null
2025-11-04 Data-Efficient Adaptation and a Novel Evaluation Method for Aspect-based Sentiment Analysis Yan Cathy Hua et.al. 2511.03034 translate read null
2025-11-04 PublicAgent: Multi-Agent Design Principles From an LLM-Based Open Data Analysis Framework Sina Montazeri et.al. 2511.03023 translate read null
2025-11-04 LEGO-Eval: Towards Fine-Grained Evaluation on Synthesizing 3D Embodied Environments with Tool Augmentation Gyeom Hwangbo et.al. 2511.03001 translate read null
2025-11-04 Zero-shot data citation function classification using transformer-based large language models (LLMs) Neil Byers et.al. 2511.02936 translate read null
2025-11-04 Cache Mechanism for Agent RAG Systems Shuhang Lin et.al. 2511.02919 translate read null
2025-11-04 Adaptive and Robust Data Poisoning Detection and Sanitization in Wearable IoT Systems using Large Language Models W. K. M Mithsara et.al. 2511.02894 translate read null
2025-11-04 Agent-Omni: Test-Time Multimodal Reasoning via Model Coordination for Understanding Anything Huawei Lin et.al. 2511.02834 translate read null
2025-11-04 Can LLMs subtract numbers? Mayank Jobanputra et.al. 2511.02795 translate read null
2025-11-04 When One Modality Sabotages the Others: A Diagnostic Lens on Multimodal Reasoning Chenyu Zhang et.al. 2511.02794 translate read null
2025-11-04 When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought Yiyang Zhou et.al. 2511.02779 translate read null
2025-11-04 ConMeZO: Adaptive Descent-Direction Sampling for Gradient-Free Finetuning of Large Language Models Lejs Deen Behric et.al. 2511.02757 translate read null
2025-11-04 Controlling Performance and Budget of a Centralized Multi-agent LLM System with Reinforcement Learning Bowen Jin et.al. 2511.02755 translate read null
2025-11-04 AI Diffusion in Low Resource Language Countries Amit Misra et.al. 2511.02752 translate read null
2025-11-04 Agentic World Modeling for 6G: Near-Real-Time Generative State-Space Reasoning Farhad Rezazadeh et.al. 2511.02748 translate read null
2025-11-04 CostBench: Evaluating Multi-Turn Cost-Optimal Planning and Adaptation in Dynamic Environments for LLM Tool-Use Agents Jiayu Liu et.al. 2511.02734 translate read link
2025-11-04 LLEXICORP: End-user Explainability of Convolutional Neural Networks Vojtěch Kůr et.al. 2511.02720 translate read null
2025-11-04 ReleaseEval: A Benchmark for Evaluating Language Models in Automated Release Note Generation Qianru Meng et.al. 2511.02713 translate read null
2025-11-04 VidEmo: Affective-Tree Reasoning for Emotion-Centric Video Foundation Models Zhicheng Zhang et.al. 2511.02712 translate read null
2025-11-04 Curriculum Design for Trajectory-Constrained Agent: Compressing Chain-of-Thought Tokens in LLMs Georgios Tzannetos et.al. 2511.02690 translate read null
2025-11-04 Optimal Singular Damage: Efficient LLM Inference in Low Storage Regimes Mohammadsajad Alipour et.al. 2511.02681 translate read null
2025-11-04 EasyTUS: A Comprehensive Framework for Fast and Accurate Table Union Search across Data Lakes Tim Otto et.al. 2511.02674 translate read null
2025-11-04 Apriel-H1: Towards Efficient Enterprise Reasoning Models Oleksiy Ostapenko et.al. 2511.02651 translate read null
2025-11-04 Federated Attention: A Distributed Paradigm for Collaborative LLM Inference over Edge Networks Xiumei Deng et.al. 2511.02647 translate read null
2025-11-04 DecompSR: A dataset for decomposed analyses of compositional multihop spatial reasoning Lachlan McPheat et.al. 2511.02627 translate read null
2025-11-04 Understanding New-Knowledge-Induced Factual Hallucinations in LLMs: Analysis, Solution, and Interpretation Renfei Dang et.al. 2511.02626 translate read null
2025-11-04 The Realignment Problem: When Right becomes Wrong in LLMs Aakash Sen Sharma et.al. 2511.02623 translate read null
2025-11-04 Verifying LLM Inference to Prevent Model Weight Exfiltration Roy Rinberg et.al. 2511.02620 translate read null
2025-11-04 UniChange: Unifying Change Detection with Multimodal Large Language Model Xu Zhang et.al. 2511.02607 translate read null
2025-11-04 CGES: Confidence-Guided Early Stopping for Efficient and Accurate Self-Consistency Ehsan Aghazadeh et.al. 2511.02603 translate read null
2025-11-04 Next Token Knowledge Tracing: Exploiting Pretrained LLM Representations to Decode Student Behaviour Max Norris et.al. 2511.02599 translate read null
2025-11-04 A Large Language Model for Corporate Credit Scoring Chitro Majumdar et.al. 2511.02593 translate read null
2025-11-04 The ORCA Benchmark: Evaluating Real-World Calculation Accuracy in Large Language Models Claudia Herambourg et.al. 2511.02589 translate read null
2025-11-04 Smart-Hiring: An Explainable end-to-end Pipeline for CV Information Extraction and Job Matching Kenza Khelkhal et.al. 2511.02537 translate read null
2025-11-04 Knowledge Graph-enhanced Large Language Model for Incremental Game PlayTesting Enhong Mu et.al. 2511.02534 translate read null
2025-11-04 Causal Graph Neural Networks for Healthcare Munib Mesinovic et.al. 2511.02531 translate read null
2025-11-04 Large Lemma Miners: Can LLMs do Induction Proofs for Hardware? Romy Peled et.al. 2511.02521 translate read null
2025-11-04 ESA: Energy-Based Shot Assembly Optimization for Automatic Video Editing Yaosen Chen et.al. 2511.02505 translate read null
2025-11-04 BRAINS: A Retrieval-Augmented System for Alzheimer’s Detection and Monitoring Rajan Das Gupta et.al. 2511.02490 translate read null
2025-11-04 Object Detection as an Optional Basis: A Graph Matching Network for Cross-View UAV Localization Tao Liu et.al. 2511.02489 translate read link
2025-11-04 Modeling Hawkish-Dovish Latent Beliefs in Multi-Agent Debate-Based LLMs for Monetary Policy Decision Classification Kaito Takano et.al. 2511.02469 translate read null
2025-11-04 Auditable-choice reframing unlocks RL-based verification for open-ended tasks Mengyu Zhang et.al. 2511.02463 translate read null
2025-11-04 Prompting for Policy: Forecasting Macroeconomic Scenarios with Synthetic LLM Personas Giulia Iadisernia et.al. 2511.02458 translate read null
2025-11-04 Who’s Who? LLM-assisted Software Traceability with Architecture Entity Recognition Dominik Fuchß et.al. 2511.02434 translate read null
2025-11-04 Can Conversational AI Counsel for Change? A Theory-Driven Approach to Supporting Dietary Intentions in Ambivalent Individuals Michelle Bak et.al. 2511.02428 translate read null
2025-11-04 From the Laboratory to Real-World Application: Evaluating Zero-Shot Scene Interpretation on Edge Devices for Mobile Robotics Nicolas Schuler et.al. 2511.02427 translate read null
2025-11-04 ReAcTree: Hierarchical LLM Agent Trees with Control Flow for Long-Horizon Task Planning Jae-Woo Choi et.al. 2511.02424 translate read null
2025-11-04 LLM4PG: Adapting Large Language Model for Pathloss Map Generation via Synesthesia of Machines Mingran Sun et.al. 2511.02423 translate read null
2025-11-04 ChartM$^3$: A Multi-Stage Code-Driven Pipeline for Constructing Multi-Dimensional and Multi-Step Visual Reasoning Data in Chart Comprehension Duo Xu et.al. 2511.02415 translate read null
2025-11-04 EvoDev: An Iterative Feature-Driven Framework for End-to-End Software Development with LLM-based Agents Junwei Liu et.al. 2511.02399 translate read null
2025-11-04 RxnCaption: Reformulating Reaction Diagram Parsing as Visual Prompt Guided Captioning Jiahe Song et.al. 2511.02384 translate read null
2025-11-04 Revisiting put-that-there, context aware window interactions via LLMs Riccardo Bovo et.al. 2511.02378 translate read null
2025-11-04 AutoAdv: Automated Adversarial Prompting for Multi-Turn Jailbreaking of Large Language Models Aashray Reddy et.al. 2511.02376 translate read null
2025-11-04 AyurParam: A State-of-the-Art Bilingual Language Model for Ayurveda Mohd Nauman et.al. 2511.02374 translate read null
2025-11-04 LUMA-RAG: Lifelong Multimodal Agents with Provably Stable Streaming Alignment Rohan Wandre et.al. 2511.02371 translate read null
2025-11-04 An LLM-powered MILP modelling engine for workforce scheduling guided by expert knowledge Qingyang Li et.al. 2511.02364 translate read null
2025-11-04 Let Multimodal Embedders Learn When to Augment Query via Adaptive Query Augmentation Wongyu Kim et.al. 2511.02358 translate read null
2025-11-04 An Automated Framework for Strategy Discovery, Retrieval, and Evolution in LLM Jailbreak Attacks Xu Liu et.al. 2511.02356 translate read null
2025-11-04 LTD-Bench: Evaluating Large Language Models by Letting Them Draw Liuhao Lin et.al. 2511.02347 translate read link
2025-11-04 Unlocking the Power of Multi-Agent LLM for Reasoning: From Lazy Agents to Deliberation Zhiwei Zhang et.al. 2511.02303 translate read null
2025-11-04 VFocus: Better Verilog Generation from Large Language Model via Focused Reasoning Zhuorui Zhao et.al. 2511.02285 translate read null
2025-11-04 SAIL-RL: Guiding MLLMs in When and How to Think via Dual-Reward RL Tuning Fangxun Shu et.al. 2511.02280 translate read link
2025-11-04 LA-MARRVEL: A Knowledge-Grounded and Language-Aware LLM Reranker for AI-MARRVEL in Rare Disease Diagnosis Jaeyeon Lee et.al. 2511.02263 translate read null
2025-11-04 When Modalities Conflict: How Unimodal Reasoning Uncertainty Governs Preference Dynamics in MLLMs Zhuoran Zhang et.al. 2511.02243 translate read null
2025-11-04 Deep Ideation: Designing LLM Agents to Generate Novel Research Ideas on Scientific Concept Network Keyu Zhao et.al. 2511.02238 translate read null
2025-11-04 An Evaluation of Interleaved Instruction Tuning on Semantic Reasoning Performance in an Audio MLLM Jiawei Liu et.al. 2511.02234 translate read null
2025-11-04 Quantitative Risk Assessment in Radiation Oncology via LLM-Powered Root Cause Analysis of Incident Reports Yuntao Wang et.al. 2511.02223 translate read null
2025-11-04 TabDSR: Decompose, Sanitize, and Reason for Complex Numerical Reasoning in Tabular Data Changjiang Jiang et.al. 2511.02219 translate read null
2025-11-04 IG-Pruning: Input-Guided Block Pruning for Large Language Models Kangyu Qiao et.al. 2511.02213 translate read null
2025-11-04 Language-Enhanced Generative Modeling for PET Synthesis from MRI and Blood Biomarkers Zhengjie Zhang et.al. 2511.02206 translate read null
2025-11-04 LLMs as Judges: Toward The Automatic Review of GSN-compliant Assurance Cases Gerhard Yu et.al. 2511.02203 translate read null
2025-11-04 Optimal-Agent-Selection: State-Aware Routing Framework for Efficient Multi-Agent Collaboration Jingbo Wang et.al. 2511.02200 translate read null
2025-11-04 Open the Oyster: Empirical Evaluation and Improvement of Code Reasoning Confidence in LLMs Shufan Wang et.al. 2511.02197 translate read null
2025-11-04 Personalized Decision Modeling: Utility Optimization or Textualized-Symbolic Reasoning Yibo Zhao et.al. 2511.02194 translate read null
2025-11-04 Pinpointing Trigger Moment for Grounded Video QA: Enhancing Spatio-temporal Grounding in Multimodal Large Language Models Jinhwan Seo et.al. 2511.02182 translate read null
2025-11-04 Eliminating Multi-GPU Performance Taxes: A Systems Approach to Efficient Distributed LLMs Octavian Alexandru Trifan et.al. 2511.02168 translate read null
2025-11-03 Rethinking LLM Human Simulation: When a Graph is What You Need Joseph Suh et.al. 2511.02135 translate read null
2025-11-03 InsurAgent: A Large Language Model-Empowered Agent for Simulating Individual Behavior in Purchasing Flood Insurance Ziheng Geng et.al. 2511.02119 translate read null
2025-11-03 Deep Value Benchmark: Measuring Whether Models Generalize Deep Values or Shallow Preferences Joshua Ashkinaze et.al. 2511.02109 translate read null
2025-11-03 Metamorphic Testing of Large Language Models for Natural Language Processing Steven Cho et.al. 2511.02108 translate read null
2025-11-03 LLM Probing with Contrastive Eigenproblems: Improving Understanding and Applicability of CCS Stefan F. Schouten et.al. 2511.02089 translate read null
2025-11-03 Watermarking Discrete Diffusion Language Models Avi Bagchi et.al. 2511.02083 translate read null
