LLM - 2025-11 | Paper Arxiv Daily

Publish Date	Title	Authors	PDF	Translate	Read	Code
2025-11-06	Trained on Tokens, Calibrated on Concepts: The Emergence of Semantic Calibration in LLMs	Preetum Nakkiran et.al.	2511.04869	translate	read	null
2025-11-06	Software Defined Vehicle Code Generation: A Few-Shot Prompting Approach	Quang-Dung Nguyen et.al.	2511.04849	translate	read	null
2025-11-06	Grounded Test-Time Adaptation for LLM Agents	Arthur Chen et.al.	2511.04847	translate	read	null
2025-11-06	Explore Data Left Behind in Reinforcement Learning for Reasoning Language Models	Chenxi Liu et.al.	2511.04800	translate	read	null
2025-11-06	ReGen: Generative Robot Simulation via Inverse Design	Phat Nguyen et.al.	2511.04769	translate	read	null
2025-11-06	Surprisal reveals diversity gaps in image captioning and different scorers change the story	Nikolai Ilinykh et.al.	2511.04754	translate	read	null
2025-11-06	Trustworthiness Calibration Framework for Phishing Email Detection Using Large Language Models	Daniyal Ganiuly et.al.	2511.04728	translate	read	null
2025-11-06	IndicVisionBench: Benchmarking Cultural and Multilingual Understanding in VLMs	Ali Faraz et.al.	2511.04727	translate	read	null
2025-11-06	Learning to reason about rare diseases through retrieval-augmented agents	Ha Young Kim et.al.	2511.04720	translate	read	null
2025-11-06	Benchmark Designers Should “Train on the Test Set” to Expose Exploitable Non-Visual Shortcuts	Ellis Brown et.al.	2511.04655	translate	read	null
2025-11-06	Logit-Entropy Adaptive Stopping Heuristic for Efficient Chain-of-Thought Reasoning	Mohammad Atif Quamar et.al.	2511.04654	translate	read	null
2025-11-06	Optimal Inference Schedules for Masked Diffusion Models	Sitan Chen et.al.	2511.04647	translate	read	null
2025-11-06	When retrieval outperforms generation: Dense evidence retrieval for scalable fake news detection	Alamgir Munir Qazi et.al.	2511.04643	translate	read	link
2025-11-06	PixCLIP: Achieving Fine-grained Visual Language Understanding via Any-granularity Pixel-Text Alignment Learning	Yicheng Xiao et.al.	2511.04601	translate	read	null
2025-11-06	Question the Questions: Auditing Representation in Online Deliberative Processes	Soham De et.al.	2511.04588	translate	read	null
2025-11-06	ARETE: an R package for Automated REtrieval from TExt with large language models	Vasco V. Branco et.al.	2511.04573	translate	read	null
2025-11-06	Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm	Jingqi Tong et.al.	2511.04570	translate	read	link
2025-11-06	LLM-as-a-Judge: Toward World Models for Slate Recommendation Systems	Baptiste Bonin et.al.	2511.04541	translate	read	null
2025-11-06	From Model to Breach: Towards Actionable LLM-Generated Vulnerabilities Reporting	Cyril Vallez et.al.	2511.04538	translate	read	null
2025-11-06	Large Language Models for Cyber Security	Raunak Somani et.al.	2511.04508	translate	read	null
2025-11-06	RAGalyst: Automated Human-Aligned Agentic Evaluation for Domain-Specific RAG	Joshua Gao et.al.	2511.04502	translate	read	null
2025-11-06	Large language models replicate and predict human cooperation across experiments in game theory	Andrea Cera Palatsi et.al.	2511.04500	translate	read	null
2025-11-06	Decoding Emergent Big Five Traits in Large Language Models: Temperature-Dependent Expression and Architectural Clustering	Christos-Nikolaos Zacharopoulos et.al.	2511.04499	translate	read	null
2025-11-06	RUST-BENCH: Benchmarking LLM Reasoning on Unstructured Text within Structured Tables	Nikhil Abhyankar et.al.	2511.04491	translate	read	null
2025-11-06	Perceptions of AI Bad Behavior: Variations on Discordant Non-Performance	Jaime Banks et.al.	2511.04487	translate	read	null
2025-11-06	Promoting Sustainable Web Agents: Benchmarking and Estimating Energy Consumption through Empirical and Theoretical Analysis	Lars Krupp et.al.	2511.04481	translate	read	null
2025-11-06	Enabling Dynamic Sparsity in Quantized LLM Inference	Rongxiang Wang et.al.	2511.04477	translate	read	null
2025-11-06	Beyond Shortest Path: Agentic Vehicular Routing with Semantic Context	Carnot Braun et.al.	2511.04464	translate	read	null
2025-11-06	Speed at the Cost of Quality? The Impact of LLM Agent Assistance on Software Development	Hao He et.al.	2511.04427	translate	read	null
2025-11-06	The Illusion of Certainty: Uncertainty quantification for LLMs fails under ambiguity	Tim Tomov et.al.	2511.04418	translate	read	null
2025-11-06	Post-Training LLMs as Better Decision-Making Agents: A Regret-Minimization Approach	Chanwoo Park et.al.	2511.04393	translate	read	null
2025-11-06	Multi-Task Learning for Visually Grounded Reasoning in Gastrointestinal VQA	Itbaan Safwan et.al.	2511.04384	translate	read	null
2025-11-06	HPC-Vis: A Visual Analytic System for Interactive Exploration of Historical Painter Cohorts	Yingping Yang et.al.	2511.04383	translate	read	null
2025-11-06	Towards Aligning Multimodal LLMs with Human Experts: A Focus on Parent-Child Interaction	Weiyan Shi et.al.	2511.04366	translate	read	null
2025-11-06	Where Do LLMs Still Struggle? An In-Depth Analysis of Code Generation Benchmarks	Amir Molzam Sharifloo et.al.	2511.04355	translate	read	null
2025-11-06	Differentially Private In-Context Learning with Nearest Neighbor Search	Antti Koskela et.al.	2511.04332	translate	read	null
2025-11-06	RxSafeBench: Identifying Medication Safety Issues of Large Language Models in Simulated Consultation	Jiahao Zhao et.al.	2511.04328	translate	read	null
2025-11-06	AdversariaLLM: A Unified and Modular Toolbox for LLM Robustness Research	Tim Beyer et.al.	2511.04316	translate	read	null
2025-11-06	Measuring economic outlook in the news timely and efficiently	Elliot Beck et.al.	2511.04299	translate	read	null
2025-11-06	Robustness of Minimum-Volume Nonnegative Matrix Factorization under an Expanded Sufficiently Scattered Condition	Giovanni Barbarino et.al.	2511.04291	translate	read	null
2025-11-06	A Tool for Benchmarking Large Language Models’ Robustness in Assessing the Realism of Driving Scenarios	Jiahui Wu et.al.	2511.04267	translate	read	null
2025-11-06	SSPO: Subsentence-level Policy Optimization	Kun Yang et.al.	2511.04256	translate	read	null
2025-11-06	Efficient Topic Extraction via Graph-Based Labeling: A Lightweight Alternative to Deep Models	Salma Mekaoui et.al.	2511.04248	translate	read	null
2025-11-06	Reusing Pre-Training Data at Test Time is a Compute Multiplier	Alex Fang et.al.	2511.04234	translate	read	null
2025-11-06	Black-Box Guardrail Reverse-engineering Attack	Hongwei Yao et.al.	2511.04215	translate	read	null
2025-11-06	Block Rotation is All You Need for MXFP4 Quantization	Yuantian Shao et.al.	2511.04214	translate	read	null
2025-11-06	Can we trust LLMs as a tutor for our students? Evaluating the Quality of LLM-generated Feedback in Statistics Exams	Markus Herklotz et.al.	2511.04213	translate	read	null
2025-11-06	LLM-as-a-Judge is Bad, Based on AI Attempting the Exam Qualifying for the Member of the Polish National Board of Appeal	Michał Karp et.al.	2511.04205	translate	read	null
2025-11-06	Computational Turing Test Reveals Systematic Differences Between Human and AI Language	Nicolò Pagan et.al.	2511.04195	translate	read	null
2025-11-06	Explaining Software Vulnerabilities with Large Language Models	Oshando Johnson et.al.	2511.04179	translate	read	null
2025-11-06	Transforming Mentorship: An AI Powered Chatbot Approach to University Guidance	Mashrur Rahman et.al.	2511.04172	translate	read	null
2025-11-06	Are We Aligned? A Preliminary Investigation of the Alignment of Responsible AI Values between LLMs and Human Judgment	Asma Yamani et.al.	2511.04157	translate	read	null
2025-11-06	BAPPA: Benchmarking Agents, Plans, and Pipelines for Automated Text-to-SQL Generation	Fahim Ahmed et.al.	2511.04153	translate	read	null
2025-11-06	Implementation of transformer-based LLMs with large-scale optoelectronic neurons on a CMOS image sensor platform	Neil Na et.al.	2511.04136	translate	read	null
2025-11-06	Exploring the Feasibility of End-to-End Large Language Model as a Compiler	Hongbin Zhang et.al.	2511.04132	translate	read	null
2025-11-06	RIDE: Difficulty Evolving Perturbation with Item Response Theory for Mathematical Reasoning	Xinyuan Li et.al.	2511.04120	translate	read	null
2025-11-06	How Natural Language Proficiency Shapes GenAI Code for Software Engineering Tasks	Ruksit Rojpaisarnkit et.al.	2511.04115	translate	read	null
2025-11-06	Batch Prompting Suppresses Overthinking Reasoning Under Constraint: How Batch Prompting Suppresses Overthinking in Reasoning Models	Wenmo Qiu et.al.	2511.04108	translate	read	null
2025-11-06	KGFR: A Foundation Retriever for Generalized Knowledge Graph Question Answering	Yuanning Cui et.al.	2511.04093	translate	read	null
2025-11-06	E-CARE: An Efficient LLM-based Commonsense-Augmented Framework for E-Commerce	Ge Zhang et.al.	2511.04087	translate	read	null
2025-11-06	Caption Injection for Optimization in Generative Search Engine	Xiaolu Chen et.al.	2511.04080	translate	read	null
2025-11-06	The truth is no diaper: Human and AI-generated associations to emotional words	Špela Vintar et.al.	2511.04077	translate	read	null
2025-11-06	Agentmandering: A Game-Theoretic Framework for Fair Redistricting via Large Language Model Agents	Hao Li et.al.	2511.04076	translate	read	null
2025-11-06	Plan of Knowledge: Retrieval-Augmented Large Language Models for Temporal Knowledge Graph Question Answering	Xinying Qian et.al.	2511.04072	translate	read	null
2025-11-06	TXL Fusion: A Hybrid Machine Learning Framework Integrating Chemical Heuristics and Large Language Models for Topological Materials Discovery	Arif Ullah et.al.	2511.04068	translate	read	null
2025-11-06	DartQuant: Efficient Rotational Distribution Calibration for LLM Quantization	Yuantian Shao et.al.	2511.04063	translate	read	null
2025-11-06	Interpreting Multi-Attribute Confounding through Numerical Attributes in Large Language Models	Hirohane Takagi et.al.	2511.04053	translate	read	null
2025-11-06	An LLM-based Framework for Human-Swarm Teaming Cognition in Disaster Search and Rescue	Kailun Ji et.al.	2511.04042	translate	read	null
2025-11-06	PICNIC: Silicon Photonic Interconnected Chiplets with Computational Network and In-memory Computing for LLM Inference Acceleration	Yue Jiet Chong et.al.	2511.04036	translate	read	null
2025-11-06	Detecting Silent Failures in Multi-Agentic AI Trajectories	Divya Pathak et.al.	2511.04032	translate	read	null
2025-11-06	Abductive Inference in Retrieval-Augmented Language Models: Generating and Validating Missing Premises	Shiyin Lin et.al.	2511.04020	translate	read	null
2025-11-06	Specification-Guided Vulnerability Detection with Large Language Models	Hao Zhu et.al.	2511.04014	translate	read	null
2025-11-06	PSD2Code: Automated Front-End Code Generation from Design Files via Multimodal Large Language Models	Yongxi Chen et.al.	2511.04012	translate	read	null
2025-11-06	Memory- and Latency-Constrained Inference of Large Language Models via Adaptive Split Computing	Mingyu Sung et.al.	2511.04002	translate	read	null
2025-11-06	Hybrid Fuzzing with LLM-Guided Input Mutation and Semantic Feedback	Shiyin Lin et.al.	2511.03995	translate	read	null
2025-11-06	TwIST: Rigging the Lottery in Transformers with Independent Subnetwork Training	Michael Menezes et.al.	2511.03983	translate	read	null
2025-11-06	LLMs and Cultural Values: the Impact of Prompt Language and Explicit Cultural Framing	Bram Bulté et.al.	2511.03980	translate	read	null
2025-11-06	Direct Semantic Communication Between Large Language Models via Vector Translation	Fu-Chun Yang et.al.	2511.03945	translate	read	null
2025-11-06	MIDI-LLM: Adapting Large Language Models for Text-to-MIDI Music Generation	Shih-Lun Wu et.al.	2511.03942	translate	read	null
2025-11-06	RLHF: A comprehensive Survey for Cultural, Multimodal and Low Latency Alignment Methods	Raghav Sharma et.al.	2511.03939	translate	read	null
2025-11-06	SynQuE: Estimating Synthetic Dataset Quality Without Annotations	Arthur Chen et.al.	2511.03928	translate	read	null
2025-11-06	Collaborative Agents for Automated Program Repair in Ruby	Nikta Akbarpour et.al.	2511.03925	translate	read	null
2025-11-05	The Human Flourishing Geographic Index: A County-Level Dataset for the United States, 2013–2023	Stefano M. Iacus et.al.	2511.03915	translate	read	null
2025-11-05	GRAD: Graph-Retrieved Adaptive Decoding for Hallucination Mitigation	Manh Nguyen et.al.	2511.03900	translate	read	null
2025-11-05	Secure Code Generation at Scale with Reflexion	Arup Datta et.al.	2511.03898	translate	read	null
2025-11-05	KnowThyself: An Agentic Assistant for LLM Interpretability	Suraj Prasai et.al.	2511.03878	translate	read	null
2025-11-05	OMPILOT: Harnessing Transformer Models for Auto Parallelization to Shared Memory Computing Paradigms	Arijit Bhattacharjee et.al.	2511.03866	translate	read	null
2025-11-05	GAIA: Geothermal Analytics and Intelligent Agent	Randy Harsuko et.al.	2511.03852	translate	read	null
2025-11-05	To See or To Read: User Behavior Reasoning in Multimodal LLMs	Tianning Dong et.al.	2511.03845	translate	read	null
2025-11-05	ASAP: an Agentic Solution to Auto-optimize Performance of Large-Scale LLM Training	Yuran Ding et.al.	2511.03844	translate	read	null
2025-11-05	Divide, Cache, Conquer: Dichotomic Prompting for Efficient Multi-Label LLM-Based Classification	Mikołaj Langner et.al.	2511.03830	translate	read	null
2025-11-05	STARS: Segment-level Token Alignment with Rejection Sampling in Large Language Models	Mohammad Atif Quamar et.al.	2511.03827	translate	read	null
2025-11-05	How Different Tokenization Algorithms Impact LLMs and Transformer Models for Binary Code Analysis	Ahmed Mostafa et.al.	2511.03825	translate	read	null
2025-11-05	PLLuM: A Family of Polish Large Language Models	Jan Kocoń et.al.	2511.03823	translate	read	null
2025-11-05	Expert Evaluation of LLM World Models: A High-$T_c$ Superconductivity Case Study	Haoyu Guo et.al.	2511.03782	translate	read	null
2025-11-05	Scaling Agent Learning via Experience Synthesis	Zhaorun Chen et.al.	2511.03773	translate	read	link
2025-11-05	Disentangled Concepts Speak Louder Than Words: Explainable Video Action Recognition	Jongseo Lee et.al.	2511.03725	translate	read	null
2025-11-05	Outbidding and Outbluffing Elite Humans: Mastering Liar’s Poker via Self-Play and Reinforcement Learning	Richard Dewey et.al.	2511.03724	translate	read	null
2025-11-05	LLM-enhanced Air Quality Monitoring Interface via Model Context Protocol	Yu-Erh Pan et.al.	2511.03706	translate	read	null
2025-11-05	Do Androids Dream of Unseen Puppeteers? Probing for a Conspiracy Mindset in Large Language Models	Francesco Corso et.al.	2511.03699	translate	read	null
2025-11-05	AnaFlow: Agentic LLM-based Workflow for Reasoning-Driven Explainable and Sample-Efficient Analog Circuit Sizing	Mohsen Ahmadzadeh et.al.	2511.03697	translate	read	null
2025-11-05	Whisper Leak: a side-channel attack on Large Language Models	Geoff McDonald et.al.	2511.03675	translate	read	null
2025-11-05	Watermarking Large Language Models in Europe: Interpreting the AI Act in Light of Technology	Thomas Souverain et.al.	2511.03641	translate	read	null
2025-11-05	Towards Transparent Stance Detection: A Zero-Shot Approach Using Implicit and Explicit Interpretability	Apoorva Upadhyaya et.al.	2511.03635	translate	read	null
2025-11-05	LiveTradeBench: Seeking Real-World Alpha with Large Language Models	Haofei Yu et.al.	2511.03628	translate	read	null
2025-11-05	PerfDojo: Automated ML Library Generation for Heterogeneous Architectures	Andrei Ivanov et.al.	2511.03586	translate	read	null
2025-11-05	ASVRI-Legal: Fine-Tuning LLMs with Retrieval Augmented Generation for Enhanced Legal Regulation	One Octadion et.al.	2511.03563	translate	read	null
2025-11-05	MultiZebraLogic: A Multilingual Logical Reasoning Benchmark	Sofie Helene Bruun et.al.	2511.03553	translate	read	null
2025-11-05	Uncovering Code Insights: Leveraging GitHub Artifacts for Deeper Code Understanding	Ziv Nevo et.al.	2511.03549	translate	read	null
2025-11-05	U2F: Encouraging SWE-Agent to Seize Novelty without Losing Feasibility	Wencheng Ye et.al.	2511.03517	translate	read	null
2025-11-05	One Battle After Another: Probing LLMs’ Limits on Multi-Turn Instruction Following with a Benchmark Evolving Framework	Qi Jia et.al.	2511.03508	translate	read	null
2025-11-05	BanglaSTEM: A Parallel Corpus for Technical Domain Bangla-English Translation	Kazi Reyazul Hasan et.al.	2511.03498	translate	read	null
2025-11-05	RAGBoost: Efficient Retrieval-Augmented Generation with Accuracy-Preserving Context Reuse	Yinsicheng Jiang et.al.	2511.03475	translate	read	null
2025-11-05	Towards Scalable Web Accessibility Audit with MLLMs as Copilots	Ming Gu et.al.	2511.03471	translate	read	null
2025-11-05	CareMedEval dataset: Evaluating Critical Appraisal and Reasoning in the Biomedical Field	Doria Bonzi et.al.	2511.03441	translate	read	null
2025-11-05	Light over Heavy: Automated Performance Requirements Quantification with Linguistic Inducement	Shihai Wang et.al.	2511.03421	translate	read	null
2025-11-05	Knowledge-Augmented Question Error Correction for Chinese Question Answer System with QuestionRAG	Longpeng Qiu et.al.	2511.03410	translate	read	null
2025-11-05	Efficient Reasoning via Thought-Training and Thought-Free Inference	Canhui Wu et.al.	2511.03408	translate	read	null
2025-11-05	Towards Realistic Project-Level Code Generation via Multi-Agent Collaboration and Semantic Architecture Modeling	Qianhui Zhao et.al.	2511.03404	translate	read	null
2025-11-05	GUIDES: Guidance Using Instructor-Distilled Embeddings for Pre-trained Robot Policy Enhancement	Minquan Gao et.al.	2511.03400	translate	read	null
2025-11-05	Computational Imaging Meets LLMs: Zero-Shot IDH Mutation Prediction in Brain Gliomas	Syed Muqeem Mahmood et.al.	2511.03376	translate	read	null
2025-11-05	LFC-DA: Logical Formula-Controlled Data Augmentation for Enhanced Logical Reasoning	Shenghao Li et.al.	2511.03372	translate	read	null
2025-11-05	EQ-Negotiator: Dynamic Emotional Personas Empower Small Language Models for Edge-Deployable Credit Negotiation	Yunbo Long et.al.	2511.03370	translate	read	null
2025-11-05	Silenced Biases: The Dark Side LLMs Learned to Refuse	Rom Himelstein et.al.	2511.03369	translate	read	null
2025-11-05	A Modular, Data-Free Pipeline for Multi-Label Intention Recognition in Transportation Agentic AI Applications	Xiaocai Zhang et.al.	2511.03363	translate	read	null
2025-11-05	Multi-Object Tracking Retrieval with LLaVA-Video: A Training-Free Solution to MOT25-StAG Challenge	Yi Yang et.al.	2511.03332	translate	read	null
2025-11-05	Benchmarking the Thinking Mode of Multimodal Large Language Models in Clinical Tasks	Jindong Hong et.al.	2511.03328	translate	read	null
2025-11-05	SurgViVQA: Temporally-Grounded Video Question Answering for Surgical Scene Understanding	Mauro Orazio Drago et.al.	2511.03325	translate	read	null
2025-11-05	TASU: Text-Only Alignment for Speech Understanding	Jing Peng et.al.	2511.03310	translate	read	null
2025-11-05	How to Evaluate Speech Translation with Source-Aware Neural MT Metrics	Mauro Cettolo et.al.	2511.03295	translate	read	null
2025-11-05	UMDAM: A Unified Data Layout and DRAM Address Mapping for Heterogenous NPU-PIM	Hai Huang et.al.	2511.03293	translate	read	null
2025-11-05	Let the Bees Find the Weak Spots: A Path Planning Perspective on Multi-Turn Jailbreak Attacks against LLMs	Yize Liu et.al.	2511.03271	translate	read	null
2025-11-05	SCALE: Upscaled Continual Learning of Large Language Models	Jin-woo Lee et.al.	2511.03270	translate	read	null
2025-11-05	Comparing the Performance of LLMs in RAG-based Question-Answering: A Case Study in Computer Science Literature	Ranul Dayarathne et.al.	2511.03261	translate	read	null
2025-11-05	Auditing M-LLMs for Privacy Risks: A Synthetic Benchmark and Evaluation Framework	Junhao Li et.al.	2511.03248	translate	read	null
2025-11-05	Death by a Thousand Prompts: Open Model Vulnerability Analysis	Amy Chang et.al.	2511.03247	translate	read	null
2025-11-05	IndicSuperTokenizer: An Optimized Tokenizer for Indic Multilingual LLMs	Souvik Rana et.al.	2511.03237	translate	read	null
2025-11-05	From Five Dimensions to Many: Large Language Models as Precise and Interpretable Psychological Profilers	Yi-Fei Liu et.al.	2511.03235	translate	read	null
2025-11-05	Multimodal-Wireless: A Large-Scale Dataset for Sensing and Communication	Tianhao Mao et.al.	2511.03220	translate	read	null
2025-11-05	Hybrid Fact-Checking that Integrates Knowledge Graphs, Large Language Models, and Search-Based Retrieval Agents Improves Interpretable Claim Verification	Shaghayegh Kolli et.al.	2511.03217	translate	read	null
2025-11-05	LGM: Enhancing Large Language Models with Conceptual Meta-Relations and Iterative Retrieval	Wenchang Lei et.al.	2511.03214	translate	read	null
2025-11-05	QG-CoC: Question-Guided Chain-of-Captions for Large Multimodal Models	Kuei-Chun Kao et.al.	2511.03206	translate	read	null
2025-11-05	Large Language Models as Information Sources: Distinctive Characteristics and Types of Low-Quality Information	Jiawei Zhou et.al.	2511.03198	translate	read	null
2025-11-05	Understanding Robustness of Model Editing in Code LLMs: An Empirical Study	Vinaik Chhetri et.al.	2511.03182	translate	read	null
2025-11-05	Learning-based Cooperative Robotic Paper Wrapping: A Unified Control Policy with Residual Force Control	Rewida Ali et.al.	2511.03181	translate	read	null
2025-11-05	BengaliMoralBench: A Benchmark for Auditing Moral Reasoning in Large Language Models within Bengali Language and Culture	Shahriyar Zaman Ridoy et.al.	2511.03180	translate	read	null
2025-11-05	Toward Autonomous Engineering Design: A Knowledge-Guided Multi-Agent Framework	Varun Kumar et.al.	2511.03179	translate	read	null
2025-11-05	SurgAnt-ViVQA: Learning to Anticipate Surgical Events through GRU-Driven Temporal Cross-Attention	Shreyas C. Dhake et.al.	2511.03178	translate	read	null
2025-11-05	AI as We Describe It: How Large Language Models and Their Applications in Health are Represented Across Channels of Public Discourse	Jiawei Zhou et.al.	2511.03174	translate	read	null
2025-11-05	Measuring Aleatoric and Epistemic Uncertainty in LLMs: Empirical Evaluation on ID and OOD QA Tasks	Kevin Wang et.al.	2511.03166	translate	read	null
2025-11-05	RefAgent: A Multi-agent LLM-based Framework for Automatic Software Refactoring	Khouloud Oueslati et.al.	2511.03153	translate	read	null
2025-11-05	From Measurement to Expertise: Empathetic Expert Adapters for Context-Based Empathy in Conversational AI Agents	Erfan Shayegani et.al.	2511.03143	translate	read	null
2025-11-05	A Proprietary Model-Based Safety Response Framework for AI Agents	Qi Li et.al.	2511.03138	translate	read	null
2025-11-05	Using Multi-modal Large Language Model to Boost Fireworks Algorithm’s Ability in Settling Challenging Optimization Tasks	Shipeng Cen et.al.	2511.03137	translate	read	null
2025-11-05	From Insight to Exploit: Leveraging LLM Collaboration for Adaptive Adversarial Text Generation	Najrin Sultana et.al.	2511.03128	translate	read	null
2025-11-05	Control Barrier Function for Aligning Large Language Models	Yuya Miyaoka et.al.	2511.03121	translate	read	null
2025-11-05	Large language models require a new form of oversight: capability-based monitoring	Katherine C. Kellogg et.al.	2511.03106	translate	read	null
2025-11-05	CARMA: Comprehensive Automatically-annotated Reddit Mental Health Dataset for Arabic	Saad Mankarious et.al.	2511.03102	translate	read	null
2025-11-05	ALAS: Transactional and Dynamic Multi-Agent LLM Planning	Longling Geng et.al.	2511.03094	translate	read	null
2025-11-05	SnapStream: Efficient Long Sequence Decoding on Dataflow Accelerators	Jonathan Li et.al.	2511.03092	translate	read	null
2025-11-05	PolyNorm: Few-Shot LLM-Based Text Normalization for Text-to-Speech	Michel Wong et.al.	2511.03080	translate	read	null
2025-11-04	A Collaborative Reasoning Framework for Anomaly Diagnostics in Underwater Robotics	Markus Buchholz et.al.	2511.03075	translate	read	null
2025-11-04	Epidemiology of Large Language Models: A Benchmark for Observational Distribution Knowledge	Drago Plecko et.al.	2511.03070	translate	read	null
2025-11-04	Reading Between the Lines: The One-Sided Conversation Problem	Victoria Ebert et.al.	2511.03056	translate	read	null
2025-11-04	No-Human in the Loop: Agentic Evaluation at Scale for Recommendation	Tao Zhang et.al.	2511.03051	translate	read	null
2025-11-04	ROBoto2: An Interactive System and Dataset for LLM-assisted Clinical Trial Risk of Bias Assessment	Anthony Hevia et.al.	2511.03048	translate	read	null
2025-11-04	Unsupervised Evaluation of Multi-Turn Objective-Driven Interactions	Emi Soroka et.al.	2511.03047	translate	read	null
2025-11-04	Data-Efficient Adaptation and a Novel Evaluation Method for Aspect-based Sentiment Analysis	Yan Cathy Hua et.al.	2511.03034	translate	read	null
2025-11-04	PublicAgent: Multi-Agent Design Principles From an LLM-Based Open Data Analysis Framework	Sina Montazeri et.al.	2511.03023	translate	read	null
2025-11-04	LEGO-Eval: Towards Fine-Grained Evaluation on Synthesizing 3D Embodied Environments with Tool Augmentation	Gyeom Hwangbo et.al.	2511.03001	translate	read	null
2025-11-04	Zero-shot data citation function classification using transformer-based large language models (LLMs)	Neil Byers et.al.	2511.02936	translate	read	null
2025-11-04	Cache Mechanism for Agent RAG Systems	Shuhang Lin et.al.	2511.02919	translate	read	null
2025-11-04	Adaptive and Robust Data Poisoning Detection and Sanitization in Wearable IoT Systems using Large Language Models	W. K. M Mithsara et.al.	2511.02894	translate	read	null
2025-11-04	Agent-Omni: Test-Time Multimodal Reasoning via Model Coordination for Understanding Anything	Huawei Lin et.al.	2511.02834	translate	read	null
2025-11-04	Can LLMs subtract numbers?	Mayank Jobanputra et.al.	2511.02795	translate	read	null
2025-11-04	When One Modality Sabotages the Others: A Diagnostic Lens on Multimodal Reasoning	Chenyu Zhang et.al.	2511.02794	translate	read	null
2025-11-04	When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought	Yiyang Zhou et.al.	2511.02779	translate	read	null
2025-11-04	ConMeZO: Adaptive Descent-Direction Sampling for Gradient-Free Finetuning of Large Language Models	Lejs Deen Behric et.al.	2511.02757	translate	read	null
2025-11-04	Controlling Performance and Budget of a Centralized Multi-agent LLM System with Reinforcement Learning	Bowen Jin et.al.	2511.02755	translate	read	null
2025-11-04	AI Diffusion in Low Resource Language Countries	Amit Misra et.al.	2511.02752	translate	read	null
2025-11-04	Agentic World Modeling for 6G: Near-Real-Time Generative State-Space Reasoning	Farhad Rezazadeh et.al.	2511.02748	translate	read	null
2025-11-04	CostBench: Evaluating Multi-Turn Cost-Optimal Planning and Adaptation in Dynamic Environments for LLM Tool-Use Agents	Jiayu Liu et.al.	2511.02734	translate	read	link
2025-11-04	LLEXICORP: End-user Explainability of Convolutional Neural Networks	Vojtěch Kůr et.al.	2511.02720	translate	read	null
2025-11-04	ReleaseEval: A Benchmark for Evaluating Language Models in Automated Release Note Generation	Qianru Meng et.al.	2511.02713	translate	read	null
2025-11-04	VidEmo: Affective-Tree Reasoning for Emotion-Centric Video Foundation Models	Zhicheng Zhang et.al.	2511.02712	translate	read	null
2025-11-04	Curriculum Design for Trajectory-Constrained Agent: Compressing Chain-of-Thought Tokens in LLMs	Georgios Tzannetos et.al.	2511.02690	translate	read	null
2025-11-04	Optimal Singular Damage: Efficient LLM Inference in Low Storage Regimes	Mohammadsajad Alipour et.al.	2511.02681	translate	read	null
2025-11-04	EasyTUS: A Comprehensive Framework for Fast and Accurate Table Union Search across Data Lakes	Tim Otto et.al.	2511.02674	translate	read	null
2025-11-04	Apriel-H1: Towards Efficient Enterprise Reasoning Models	Oleksiy Ostapenko et.al.	2511.02651	translate	read	null
2025-11-04	Federated Attention: A Distributed Paradigm for Collaborative LLM Inference over Edge Networks	Xiumei Deng et.al.	2511.02647	translate	read	null
2025-11-04	DecompSR: A dataset for decomposed analyses of compositional multihop spatial reasoning	Lachlan McPheat et.al.	2511.02627	translate	read	null
2025-11-04	Understanding New-Knowledge-Induced Factual Hallucinations in LLMs: Analysis, Solution, and Interpretation	Renfei Dang et.al.	2511.02626	translate	read	null
2025-11-04	The Realignment Problem: When Right becomes Wrong in LLMs	Aakash Sen Sharma et.al.	2511.02623	translate	read	null
2025-11-04	Verifying LLM Inference to Prevent Model Weight Exfiltration	Roy Rinberg et.al.	2511.02620	translate	read	null
2025-11-04	UniChange: Unifying Change Detection with Multimodal Large Language Model	Xu Zhang et.al.	2511.02607	translate	read	null
2025-11-04	CGES: Confidence-Guided Early Stopping for Efficient and Accurate Self-Consistency	Ehsan Aghazadeh et.al.	2511.02603	translate	read	null
2025-11-04	Next Token Knowledge Tracing: Exploiting Pretrained LLM Representations to Decode Student Behaviour	Max Norris et.al.	2511.02599	translate	read	null
2025-11-04	A Large Language Model for Corporate Credit Scoring	Chitro Majumdar et.al.	2511.02593	translate	read	null
2025-11-04	The ORCA Benchmark: Evaluating Real-World Calculation Accuracy in Large Language Models	Claudia Herambourg et.al.	2511.02589	translate	read	null
2025-11-04	Smart-Hiring: An Explainable end-to-end Pipeline for CV Information Extraction and Job Matching	Kenza Khelkhal et.al.	2511.02537	translate	read	null
2025-11-04	Knowledge Graph-enhanced Large Language Model for Incremental Game PlayTesting	Enhong Mu et.al.	2511.02534	translate	read	null
2025-11-04	Causal Graph Neural Networks for Healthcare	Munib Mesinovic et.al.	2511.02531	translate	read	null
2025-11-04	Large Lemma Miners: Can LLMs do Induction Proofs for Hardware?	Romy Peled et.al.	2511.02521	translate	read	null
2025-11-04	ESA: Energy-Based Shot Assembly Optimization for Automatic Video Editing	Yaosen Chen et.al.	2511.02505	translate	read	null
2025-11-04	BRAINS: A Retrieval-Augmented System for Alzheimer’s Detection and Monitoring	Rajan Das Gupta et.al.	2511.02490	translate	read	null
2025-11-04	Object Detection as an Optional Basis: A Graph Matching Network for Cross-View UAV Localization	Tao Liu et.al.	2511.02489	translate	read	link
2025-11-04	Modeling Hawkish-Dovish Latent Beliefs in Multi-Agent Debate-Based LLMs for Monetary Policy Decision Classification	Kaito Takano et.al.	2511.02469	translate	read	null
2025-11-04	Auditable-choice reframing unlocks RL-based verification for open-ended tasks	Mengyu Zhang et.al.	2511.02463	translate	read	null
2025-11-04	Prompting for Policy: Forecasting Macroeconomic Scenarios with Synthetic LLM Personas	Giulia Iadisernia et.al.	2511.02458	translate	read	null
2025-11-04	Who’s Who? LLM-assisted Software Traceability with Architecture Entity Recognition	Dominik Fuchß et.al.	2511.02434	translate	read	null
2025-11-04	Can Conversational AI Counsel for Change? A Theory-Driven Approach to Supporting Dietary Intentions in Ambivalent Individuals	Michelle Bak et.al.	2511.02428	translate	read	null
2025-11-04	From the Laboratory to Real-World Application: Evaluating Zero-Shot Scene Interpretation on Edge Devices for Mobile Robotics	Nicolas Schuler et.al.	2511.02427	translate	read	null
2025-11-04	ReAcTree: Hierarchical LLM Agent Trees with Control Flow for Long-Horizon Task Planning	Jae-Woo Choi et.al.	2511.02424	translate	read	null
2025-11-04	LLM4PG: Adapting Large Language Model for Pathloss Map Generation via Synesthesia of Machines	Mingran Sun et.al.	2511.02423	translate	read	null
2025-11-04	ChartM$^3$: A Multi-Stage Code-Driven Pipeline for Constructing Multi-Dimensional and Multi-Step Visual Reasoning Data in Chart Comprehension	Duo Xu et.al.	2511.02415	translate	read	null
2025-11-04	EvoDev: An Iterative Feature-Driven Framework for End-to-End Software Development with LLM-based Agents	Junwei Liu et.al.	2511.02399	translate	read	null
2025-11-04	RxnCaption: Reformulating Reaction Diagram Parsing as Visual Prompt Guided Captioning	Jiahe Song et.al.	2511.02384	translate	read	null
2025-11-04	Revisiting put-that-there, context aware window interactions via LLMs	Riccardo Bovo et.al.	2511.02378	translate	read	null
2025-11-04	AutoAdv: Automated Adversarial Prompting for Multi-Turn Jailbreaking of Large Language Models	Aashray Reddy et.al.	2511.02376	translate	read	null
2025-11-04	AyurParam: A State-of-the-Art Bilingual Language Model for Ayurveda	Mohd Nauman et.al.	2511.02374	translate	read	null
2025-11-04	LUMA-RAG: Lifelong Multimodal Agents with Provably Stable Streaming Alignment	Rohan Wandre et.al.	2511.02371	translate	read	null
2025-11-04	An LLM-powered MILP modelling engine for workforce scheduling guided by expert knowledge	Qingyang Li et.al.	2511.02364	translate	read	null
2025-11-04	Let Multimodal Embedders Learn When to Augment Query via Adaptive Query Augmentation	Wongyu Kim et.al.	2511.02358	translate	read	null
2025-11-04	An Automated Framework for Strategy Discovery, Retrieval, and Evolution in LLM Jailbreak Attacks	Xu Liu et.al.	2511.02356	translate	read	null
2025-11-04	LTD-Bench: Evaluating Large Language Models by Letting Them Draw	Liuhao Lin et.al.	2511.02347	translate	read	link
2025-11-04	Unlocking the Power of Multi-Agent LLM for Reasoning: From Lazy Agents to Deliberation	Zhiwei Zhang et.al.	2511.02303	translate	read	null
2025-11-04	VFocus: Better Verilog Generation from Large Language Model via Focused Reasoning	Zhuorui Zhao et.al.	2511.02285	translate	read	null
2025-11-04	SAIL-RL: Guiding MLLMs in When and How to Think via Dual-Reward RL Tuning	Fangxun Shu et.al.	2511.02280	translate	read	link
2025-11-04	LA-MARRVEL: A Knowledge-Grounded and Language-Aware LLM Reranker for AI-MARRVEL in Rare Disease Diagnosis	Jaeyeon Lee et.al.	2511.02263	translate	read	null
2025-11-04	When Modalities Conflict: How Unimodal Reasoning Uncertainty Governs Preference Dynamics in MLLMs	Zhuoran Zhang et.al.	2511.02243	translate	read	null
2025-11-04	Deep Ideation: Designing LLM Agents to Generate Novel Research Ideas on Scientific Concept Network	Keyu Zhao et.al.	2511.02238	translate	read	null
2025-11-04	An Evaluation of Interleaved Instruction Tuning on Semantic Reasoning Performance in an Audio MLLM	Jiawei Liu et.al.	2511.02234	translate	read	null
2025-11-04	Quantitative Risk Assessment in Radiation Oncology via LLM-Powered Root Cause Analysis of Incident Reports	Yuntao Wang et.al.	2511.02223	translate	read	null
2025-11-04	TabDSR: Decompose, Sanitize, and Reason for Complex Numerical Reasoning in Tabular Data	Changjiang Jiang et.al.	2511.02219	translate	read	null
2025-11-04	IG-Pruning: Input-Guided Block Pruning for Large Language Models	Kangyu Qiao et.al.	2511.02213	translate	read	null
2025-11-04	Language-Enhanced Generative Modeling for PET Synthesis from MRI and Blood Biomarkers	Zhengjie Zhang et.al.	2511.02206	translate	read	null
2025-11-04	LLMs as Judges: Toward The Automatic Review of GSN-compliant Assurance Cases	Gerhard Yu et.al.	2511.02203	translate	read	null
2025-11-04	Optimal-Agent-Selection: State-Aware Routing Framework for Efficient Multi-Agent Collaboration	Jingbo Wang et.al.	2511.02200	translate	read	null
2025-11-04	Open the Oyster: Empirical Evaluation and Improvement of Code Reasoning Confidence in LLMs	Shufan Wang et.al.	2511.02197	translate	read	null
2025-11-04	Personalized Decision Modeling: Utility Optimization or Textualized-Symbolic Reasoning	Yibo Zhao et.al.	2511.02194	translate	read	null
2025-11-04	Pinpointing Trigger Moment for Grounded Video QA: Enhancing Spatio-temporal Grounding in Multimodal Large Language Models	Jinhwan Seo et.al.	2511.02182	translate	read	null
2025-11-04	Eliminating Multi-GPU Performance Taxes: A Systems Approach to Efficient Distributed LLMs	Octavian Alexandru Trifan et.al.	2511.02168	translate	read	null
2025-11-03	Rethinking LLM Human Simulation: When a Graph is What You Need	Joseph Suh et.al.	2511.02135	translate	read	null
2025-11-03	InsurAgent: A Large Language Model-Empowered Agent for Simulating Individual Behavior in Purchasing Flood Insurance	Ziheng Geng et.al.	2511.02119	translate	read	null
2025-11-03	Deep Value Benchmark: Measuring Whether Models Generalize Deep Values or Shallow Preferences	Joshua Ashkinaze et.al.	2511.02109	translate	read	null
2025-11-03	Metamorphic Testing of Large Language Models for Natural Language Processing	Steven Cho et.al.	2511.02108	translate	read	null
2025-11-03	LLM Probing with Contrastive Eigenproblems: Improving Understanding and Applicability of CCS	Stefan F. Schouten et.al.	2511.02089	translate	read	null
2025-11-03	Watermarking Discrete Diffusion Language Models	Avi Bagchi et.al.	2511.02083	translate	read	null
