LLM - 2025-11
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2025-11-06 | Trained on Tokens, Calibrated on Concepts: The Emergence of Semantic Calibration in LLMs | Preetum Nakkiran et.al. | 2511.04869 | translate | read | null |
| 2025-11-06 | Software Defined Vehicle Code Generation: A Few-Shot Prompting Approach | Quang-Dung Nguyen et.al. | 2511.04849 | translate | read | null |
| 2025-11-06 | Grounded Test-Time Adaptation for LLM Agents | Arthur Chen et.al. | 2511.04847 | translate | read | null |
| 2025-11-06 | Explore Data Left Behind in Reinforcement Learning for Reasoning Language Models | Chenxi Liu et.al. | 2511.04800 | translate | read | null |
| 2025-11-06 | ReGen: Generative Robot Simulation via Inverse Design | Phat Nguyen et.al. | 2511.04769 | translate | read | null |
| 2025-11-06 | Surprisal reveals diversity gaps in image captioning and different scorers change the story | Nikolai Ilinykh et.al. | 2511.04754 | translate | read | null |
| 2025-11-06 | Trustworthiness Calibration Framework for Phishing Email Detection Using Large Language Models | Daniyal Ganiuly et.al. | 2511.04728 | translate | read | null |
| 2025-11-06 | IndicVisionBench: Benchmarking Cultural and Multilingual Understanding in VLMs | Ali Faraz et.al. | 2511.04727 | translate | read | null |
| 2025-11-06 | Learning to reason about rare diseases through retrieval-augmented agents | Ha Young Kim et.al. | 2511.04720 | translate | read | null |
| 2025-11-06 | Benchmark Designers Should “Train on the Test Set” to Expose Exploitable Non-Visual Shortcuts | Ellis Brown et.al. | 2511.04655 | translate | read | null |
| 2025-11-06 | Logit-Entropy Adaptive Stopping Heuristic for Efficient Chain-of-Thought Reasoning | Mohammad Atif Quamar et.al. | 2511.04654 | translate | read | null |
| 2025-11-06 | Optimal Inference Schedules for Masked Diffusion Models | Sitan Chen et.al. | 2511.04647 | translate | read | null |
| 2025-11-06 | When retrieval outperforms generation: Dense evidence retrieval for scalable fake news detection | Alamgir Munir Qazi et.al. | 2511.04643 | translate | read | link |
| 2025-11-06 | PixCLIP: Achieving Fine-grained Visual Language Understanding via Any-granularity Pixel-Text Alignment Learning | Yicheng Xiao et.al. | 2511.04601 | translate | read | null |
| 2025-11-06 | Question the Questions: Auditing Representation in Online Deliberative Processes | Soham De et.al. | 2511.04588 | translate | read | null |
| 2025-11-06 | ARETE: an R package for Automated REtrieval from TExt with large language models | Vasco V. Branco et.al. | 2511.04573 | translate | read | null |
| 2025-11-06 | Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm | Jingqi Tong et.al. | 2511.04570 | translate | read | link |
| 2025-11-06 | LLM-as-a-Judge: Toward World Models for Slate Recommendation Systems | Baptiste Bonin et.al. | 2511.04541 | translate | read | null |
| 2025-11-06 | From Model to Breach: Towards Actionable LLM-Generated Vulnerabilities Reporting | Cyril Vallez et.al. | 2511.04538 | translate | read | null |
| 2025-11-06 | Large Language Models for Cyber Security | Raunak Somani et.al. | 2511.04508 | translate | read | null |
| 2025-11-06 | RAGalyst: Automated Human-Aligned Agentic Evaluation for Domain-Specific RAG | Joshua Gao et.al. | 2511.04502 | translate | read | null |
| 2025-11-06 | Large language models replicate and predict human cooperation across experiments in game theory | Andrea Cera Palatsi et.al. | 2511.04500 | translate | read | null |
| 2025-11-06 | Decoding Emergent Big Five Traits in Large Language Models: Temperature-Dependent Expression and Architectural Clustering | Christos-Nikolaos Zacharopoulos et.al. | 2511.04499 | translate | read | null |
| 2025-11-06 | RUST-BENCH: Benchmarking LLM Reasoning on Unstructured Text within Structured Tables | Nikhil Abhyankar et.al. | 2511.04491 | translate | read | null |
| 2025-11-06 | Perceptions of AI Bad Behavior: Variations on Discordant Non-Performance | Jaime Banks et.al. | 2511.04487 | translate | read | null |
| 2025-11-06 | Promoting Sustainable Web Agents: Benchmarking and Estimating Energy Consumption through Empirical and Theoretical Analysis | Lars Krupp et.al. | 2511.04481 | translate | read | null |
| 2025-11-06 | Enabling Dynamic Sparsity in Quantized LLM Inference | Rongxiang Wang et.al. | 2511.04477 | translate | read | null |
| 2025-11-06 | Beyond Shortest Path: Agentic Vehicular Routing with Semantic Context | Carnot Braun et.al. | 2511.04464 | translate | read | null |
| 2025-11-06 | Speed at the Cost of Quality? The Impact of LLM Agent Assistance on Software Development | Hao He et.al. | 2511.04427 | translate | read | null |
| 2025-11-06 | The Illusion of Certainty: Uncertainty quantification for LLMs fails under ambiguity | Tim Tomov et.al. | 2511.04418 | translate | read | null |
| 2025-11-06 | Post-Training LLMs as Better Decision-Making Agents: A Regret-Minimization Approach | Chanwoo Park et.al. | 2511.04393 | translate | read | null |
| 2025-11-06 | Multi-Task Learning for Visually Grounded Reasoning in Gastrointestinal VQA | Itbaan Safwan et.al. | 2511.04384 | translate | read | null |
| 2025-11-06 | HPC-Vis: A Visual Analytic System for Interactive Exploration of Historical Painter Cohorts | Yingping Yang et.al. | 2511.04383 | translate | read | null |
| 2025-11-06 | Towards Aligning Multimodal LLMs with Human Experts: A Focus on Parent-Child Interaction | Weiyan Shi et.al. | 2511.04366 | translate | read | null |
| 2025-11-06 | Where Do LLMs Still Struggle? An In-Depth Analysis of Code Generation Benchmarks | Amir Molzam Sharifloo et.al. | 2511.04355 | translate | read | null |
| 2025-11-06 | Differentially Private In-Context Learning with Nearest Neighbor Search | Antti Koskela et.al. | 2511.04332 | translate | read | null |
| 2025-11-06 | RxSafeBench: Identifying Medication Safety Issues of Large Language Models in Simulated Consultation | Jiahao Zhao et.al. | 2511.04328 | translate | read | null |
| 2025-11-06 | AdversariaLLM: A Unified and Modular Toolbox for LLM Robustness Research | Tim Beyer et.al. | 2511.04316 | translate | read | null |
| 2025-11-06 | Measuring economic outlook in the news timely and efficiently | Elliot Beck et.al. | 2511.04299 | translate | read | null |
| 2025-11-06 | Robustness of Minimum-Volume Nonnegative Matrix Factorization under an Expanded Sufficiently Scattered Condition | Giovanni Barbarino et.al. | 2511.04291 | translate | read | null |
| 2025-11-06 | A Tool for Benchmarking Large Language Models’ Robustness in Assessing the Realism of Driving Scenarios | Jiahui Wu et.al. | 2511.04267 | translate | read | null |
| 2025-11-06 | SSPO: Subsentence-level Policy Optimization | Kun Yang et.al. | 2511.04256 | translate | read | null |
| 2025-11-06 | Efficient Topic Extraction via Graph-Based Labeling: A Lightweight Alternative to Deep Models | Salma Mekaoui et.al. | 2511.04248 | translate | read | null |
| 2025-11-06 | Reusing Pre-Training Data at Test Time is a Compute Multiplier | Alex Fang et.al. | 2511.04234 | translate | read | null |
| 2025-11-06 | Black-Box Guardrail Reverse-engineering Attack | Hongwei Yao et.al. | 2511.04215 | translate | read | null |
| 2025-11-06 | Block Rotation is All You Need for MXFP4 Quantization | Yuantian Shao et.al. | 2511.04214 | translate | read | null |
| 2025-11-06 | Can we trust LLMs as a tutor for our students? Evaluating the Quality of LLM-generated Feedback in Statistics Exams | Markus Herklotz et.al. | 2511.04213 | translate | read | null |
| 2025-11-06 | LLM-as-a-Judge is Bad, Based on AI Attempting the Exam Qualifying for the Member of the Polish National Board of Appeal | Michał Karp et.al. | 2511.04205 | translate | read | null |
| 2025-11-06 | Computational Turing Test Reveals Systematic Differences Between Human and AI Language | Nicolò Pagan et.al. | 2511.04195 | translate | read | null |
| 2025-11-06 | Explaining Software Vulnerabilities with Large Language Models | Oshando Johnson et.al. | 2511.04179 | translate | read | null |
| 2025-11-06 | Transforming Mentorship: An AI Powered Chatbot Approach to University Guidance | Mashrur Rahman et.al. | 2511.04172 | translate | read | null |
| 2025-11-06 | Are We Aligned? A Preliminary Investigation of the Alignment of Responsible AI Values between LLMs and Human Judgment | Asma Yamani et.al. | 2511.04157 | translate | read | null |
| 2025-11-06 | BAPPA: Benchmarking Agents, Plans, and Pipelines for Automated Text-to-SQL Generation | Fahim Ahmed et.al. | 2511.04153 | translate | read | null |
| 2025-11-06 | Implementation of transformer-based LLMs with large-scale optoelectronic neurons on a CMOS image sensor platform | Neil Na et.al. | 2511.04136 | translate | read | null |
| 2025-11-06 | Exploring the Feasibility of End-to-End Large Language Model as a Compiler | Hongbin Zhang et.al. | 2511.04132 | translate | read | null |
| 2025-11-06 | RIDE: Difficulty Evolving Perturbation with Item Response Theory for Mathematical Reasoning | Xinyuan Li et.al. | 2511.04120 | translate | read | null |
| 2025-11-06 | How Natural Language Proficiency Shapes GenAI Code for Software Engineering Tasks | Ruksit Rojpaisarnkit et.al. | 2511.04115 | translate | read | null |
| 2025-11-06 | Batch Prompting Suppresses Overthinking Reasoning Under Constraint: How Batch Prompting Suppresses Overthinking in Reasoning Models | Wenmo Qiu et.al. | 2511.04108 | translate | read | null |
| 2025-11-06 | KGFR: A Foundation Retriever for Generalized Knowledge Graph Question Answering | Yuanning Cui et.al. | 2511.04093 | translate | read | null |
| 2025-11-06 | E-CARE: An Efficient LLM-based Commonsense-Augmented Framework for E-Commerce | Ge Zhang et.al. | 2511.04087 | translate | read | null |
| 2025-11-06 | Caption Injection for Optimization in Generative Search Engine | Xiaolu Chen et.al. | 2511.04080 | translate | read | null |
| 2025-11-06 | The truth is no diaper: Human and AI-generated associations to emotional words | Špela Vintar et.al. | 2511.04077 | translate | read | null |
| 2025-11-06 | Agentmandering: A Game-Theoretic Framework for Fair Redistricting via Large Language Model Agents | Hao Li et.al. | 2511.04076 | translate | read | null |
| 2025-11-06 | Plan of Knowledge: Retrieval-Augmented Large Language Models for Temporal Knowledge Graph Question Answering | Xinying Qian et.al. | 2511.04072 | translate | read | null |
| 2025-11-06 | TXL Fusion: A Hybrid Machine Learning Framework Integrating Chemical Heuristics and Large Language Models for Topological Materials Discovery | Arif Ullah et.al. | 2511.04068 | translate | read | null |
| 2025-11-06 | DartQuant: Efficient Rotational Distribution Calibration for LLM Quantization | Yuantian Shao et.al. | 2511.04063 | translate | read | null |
| 2025-11-06 | Interpreting Multi-Attribute Confounding through Numerical Attributes in Large Language Models | Hirohane Takagi et.al. | 2511.04053 | translate | read | null |
| 2025-11-06 | An LLM-based Framework for Human-Swarm Teaming Cognition in Disaster Search and Rescue | Kailun Ji et.al. | 2511.04042 | translate | read | null |
| 2025-11-06 | PICNIC: Silicon Photonic Interconnected Chiplets with Computational Network and In-memory Computing for LLM Inference Acceleration | Yue Jiet Chong et.al. | 2511.04036 | translate | read | null |
| 2025-11-06 | Detecting Silent Failures in Multi-Agentic AI Trajectories | Divya Pathak et.al. | 2511.04032 | translate | read | null |
| 2025-11-06 | Abductive Inference in Retrieval-Augmented Language Models: Generating and Validating Missing Premises | Shiyin Lin et.al. | 2511.04020 | translate | read | null |
| 2025-11-06 | Specification-Guided Vulnerability Detection with Large Language Models | Hao Zhu et.al. | 2511.04014 | translate | read | null |
| 2025-11-06 | PSD2Code: Automated Front-End Code Generation from Design Files via Multimodal Large Language Models | Yongxi Chen et.al. | 2511.04012 | translate | read | null |
| 2025-11-06 | Memory- and Latency-Constrained Inference of Large Language Models via Adaptive Split Computing | Mingyu Sung et.al. | 2511.04002 | translate | read | null |
| 2025-11-06 | Hybrid Fuzzing with LLM-Guided Input Mutation and Semantic Feedback | Shiyin Lin et.al. | 2511.03995 | translate | read | null |
| 2025-11-06 | TwIST: Rigging the Lottery in Transformers with Independent Subnetwork Training | Michael Menezes et.al. | 2511.03983 | translate | read | null |
| 2025-11-06 | LLMs and Cultural Values: the Impact of Prompt Language and Explicit Cultural Framing | Bram Bulté et.al. | 2511.03980 | translate | read | null |
| 2025-11-06 | Direct Semantic Communication Between Large Language Models via Vector Translation | Fu-Chun Yang et.al. | 2511.03945 | translate | read | null |
| 2025-11-06 | MIDI-LLM: Adapting Large Language Models for Text-to-MIDI Music Generation | Shih-Lun Wu et.al. | 2511.03942 | translate | read | null |
| 2025-11-06 | RLHF: A comprehensive Survey for Cultural, Multimodal and Low Latency Alignment Methods | Raghav Sharma et.al. | 2511.03939 | translate | read | null |
| 2025-11-06 | SynQuE: Estimating Synthetic Dataset Quality Without Annotations | Arthur Chen et.al. | 2511.03928 | translate | read | null |
| 2025-11-06 | Collaborative Agents for Automated Program Repair in Ruby | Nikta Akbarpour et.al. | 2511.03925 | translate | read | null |
| 2025-11-05 | The Human Flourishing Geographic Index: A County-Level Dataset for the United States, 2013–2023 | Stefano M. Iacus et.al. | 2511.03915 | translate | read | null |
| 2025-11-05 | GRAD: Graph-Retrieved Adaptive Decoding for Hallucination Mitigation | Manh Nguyen et.al. | 2511.03900 | translate | read | null |
| 2025-11-05 | Secure Code Generation at Scale with Reflexion | Arup Datta et.al. | 2511.03898 | translate | read | null |
| 2025-11-05 | KnowThyself: An Agentic Assistant for LLM Interpretability | Suraj Prasai et.al. | 2511.03878 | translate | read | null |
| 2025-11-05 | OMPILOT: Harnessing Transformer Models for Auto Parallelization to Shared Memory Computing Paradigms | Arijit Bhattacharjee et.al. | 2511.03866 | translate | read | null |
| 2025-11-05 | GAIA: Geothermal Analytics and Intelligent Agent | Randy Harsuko et.al. | 2511.03852 | translate | read | null |
| 2025-11-05 | To See or To Read: User Behavior Reasoning in Multimodal LLMs | Tianning Dong et.al. | 2511.03845 | translate | read | null |
| 2025-11-05 | ASAP: an Agentic Solution to Auto-optimize Performance of Large-Scale LLM Training | Yuran Ding et.al. | 2511.03844 | translate | read | null |
| 2025-11-05 | Divide, Cache, Conquer: Dichotomic Prompting for Efficient Multi-Label LLM-Based Classification | Mikołaj Langner et.al. | 2511.03830 | translate | read | null |
| 2025-11-05 | STARS: Segment-level Token Alignment with Rejection Sampling in Large Language Models | Mohammad Atif Quamar et.al. | 2511.03827 | translate | read | null |
| 2025-11-05 | How Different Tokenization Algorithms Impact LLMs and Transformer Models for Binary Code Analysis | Ahmed Mostafa et.al. | 2511.03825 | translate | read | null |
| 2025-11-05 | PLLuM: A Family of Polish Large Language Models | Jan Kocoń et.al. | 2511.03823 | translate | read | null |
| 2025-11-05 | Expert Evaluation of LLM World Models: A High-$T_c$ Superconductivity Case Study | Haoyu Guo et.al. | 2511.03782 | translate | read | null |
| 2025-11-05 | Scaling Agent Learning via Experience Synthesis | Zhaorun Chen et.al. | 2511.03773 | translate | read | link |
| 2025-11-05 | Disentangled Concepts Speak Louder Than Words: Explainable Video Action Recognition | Jongseo Lee et.al. | 2511.03725 | translate | read | null |
| 2025-11-05 | Outbidding and Outbluffing Elite Humans: Mastering Liar’s Poker via Self-Play and Reinforcement Learning | Richard Dewey et.al. | 2511.03724 | translate | read | null |
| 2025-11-05 | LLM-enhanced Air Quality Monitoring Interface via Model Context Protocol | Yu-Erh Pan et.al. | 2511.03706 | translate | read | null |
| 2025-11-05 | Do Androids Dream of Unseen Puppeteers? Probing for a Conspiracy Mindset in Large Language Models | Francesco Corso et.al. | 2511.03699 | translate | read | null |
| 2025-11-05 | AnaFlow: Agentic LLM-based Workflow for Reasoning-Driven Explainable and Sample-Efficient Analog Circuit Sizing | Mohsen Ahmadzadeh et.al. | 2511.03697 | translate | read | null |
| 2025-11-05 | Whisper Leak: a side-channel attack on Large Language Models | Geoff McDonald et.al. | 2511.03675 | translate | read | null |
| 2025-11-05 | Watermarking Large Language Models in Europe: Interpreting the AI Act in Light of Technology | Thomas Souverain et.al. | 2511.03641 | translate | read | null |
| 2025-11-05 | Towards Transparent Stance Detection: A Zero-Shot Approach Using Implicit and Explicit Interpretability | Apoorva Upadhyaya et.al. | 2511.03635 | translate | read | null |
| 2025-11-05 | LiveTradeBench: Seeking Real-World Alpha with Large Language Models | Haofei Yu et.al. | 2511.03628 | translate | read | null |
| 2025-11-05 | PerfDojo: Automated ML Library Generation for Heterogeneous Architectures | Andrei Ivanov et.al. | 2511.03586 | translate | read | null |
| 2025-11-05 | ASVRI-Legal: Fine-Tuning LLMs with Retrieval Augmented Generation for Enhanced Legal Regulation | One Octadion et.al. | 2511.03563 | translate | read | null |
| 2025-11-05 | MultiZebraLogic: A Multilingual Logical Reasoning Benchmark | Sofie Helene Bruun et.al. | 2511.03553 | translate | read | null |
| 2025-11-05 | Uncovering Code Insights: Leveraging GitHub Artifacts for Deeper Code Understanding | Ziv Nevo et.al. | 2511.03549 | translate | read | null |
| 2025-11-05 | U2F: Encouraging SWE-Agent to Seize Novelty without Losing Feasibility | Wencheng Ye et.al. | 2511.03517 | translate | read | null |
| 2025-11-05 | One Battle After Another: Probing LLMs’ Limits on Multi-Turn Instruction Following with a Benchmark Evolving Framework | Qi Jia et.al. | 2511.03508 | translate | read | null |
| 2025-11-05 | BanglaSTEM: A Parallel Corpus for Technical Domain Bangla-English Translation | Kazi Reyazul Hasan et.al. | 2511.03498 | translate | read | null |
| 2025-11-05 | RAGBoost: Efficient Retrieval-Augmented Generation with Accuracy-Preserving Context Reuse | Yinsicheng Jiang et.al. | 2511.03475 | translate | read | null |
| 2025-11-05 | Towards Scalable Web Accessibility Audit with MLLMs as Copilots | Ming Gu et.al. | 2511.03471 | translate | read | null |
| 2025-11-05 | CareMedEval dataset: Evaluating Critical Appraisal and Reasoning in the Biomedical Field | Doria Bonzi et.al. | 2511.03441 | translate | read | null |
| 2025-11-05 | Light over Heavy: Automated Performance Requirements Quantification with Linguistic Inducement | Shihai Wang et.al. | 2511.03421 | translate | read | null |
| 2025-11-05 | Knowledge-Augmented Question Error Correction for Chinese Question Answer System with QuestionRAG | Longpeng Qiu et.al. | 2511.03410 | translate | read | null |
| 2025-11-05 | Efficient Reasoning via Thought-Training and Thought-Free Inference | Canhui Wu et.al. | 2511.03408 | translate | read | null |
| 2025-11-05 | Towards Realistic Project-Level Code Generation via Multi-Agent Collaboration and Semantic Architecture Modeling | Qianhui Zhao et.al. | 2511.03404 | translate | read | null |
| 2025-11-05 | GUIDES: Guidance Using Instructor-Distilled Embeddings for Pre-trained Robot Policy Enhancement | Minquan Gao et.al. | 2511.03400 | translate | read | null |
| 2025-11-05 | Computational Imaging Meets LLMs: Zero-Shot IDH Mutation Prediction in Brain Gliomas | Syed Muqeem Mahmood et.al. | 2511.03376 | translate | read | null |
| 2025-11-05 | LFC-DA: Logical Formula-Controlled Data Augmentation for Enhanced Logical Reasoning | Shenghao Li et.al. | 2511.03372 | translate | read | null |
| 2025-11-05 | EQ-Negotiator: Dynamic Emotional Personas Empower Small Language Models for Edge-Deployable Credit Negotiation | Yunbo Long et.al. | 2511.03370 | translate | read | null |
| 2025-11-05 | Silenced Biases: The Dark Side LLMs Learned to Refuse | Rom Himelstein et.al. | 2511.03369 | translate | read | null |
| 2025-11-05 | A Modular, Data-Free Pipeline for Multi-Label Intention Recognition in Transportation Agentic AI Applications | Xiaocai Zhang et.al. | 2511.03363 | translate | read | null |
| 2025-11-05 | Multi-Object Tracking Retrieval with LLaVA-Video: A Training-Free Solution to MOT25-StAG Challenge | Yi Yang et.al. | 2511.03332 | translate | read | null |
| 2025-11-05 | Benchmarking the Thinking Mode of Multimodal Large Language Models in Clinical Tasks | Jindong Hong et.al. | 2511.03328 | translate | read | null |
| 2025-11-05 | SurgViVQA: Temporally-Grounded Video Question Answering for Surgical Scene Understanding | Mauro Orazio Drago et.al. | 2511.03325 | translate | read | null |
| 2025-11-05 | TASU: Text-Only Alignment for Speech Understanding | Jing Peng et.al. | 2511.03310 | translate | read | null |
| 2025-11-05 | How to Evaluate Speech Translation with Source-Aware Neural MT Metrics | Mauro Cettolo et.al. | 2511.03295 | translate | read | null |
| 2025-11-05 | UMDAM: A Unified Data Layout and DRAM Address Mapping for Heterogenous NPU-PIM | Hai Huang et.al. | 2511.03293 | translate | read | null |
| 2025-11-05 | Let the Bees Find the Weak Spots: A Path Planning Perspective on Multi-Turn Jailbreak Attacks against LLMs | Yize Liu et.al. | 2511.03271 | translate | read | null |
| 2025-11-05 | SCALE: Upscaled Continual Learning of Large Language Models | Jin-woo Lee et.al. | 2511.03270 | translate | read | null |
| 2025-11-05 | Comparing the Performance of LLMs in RAG-based Question-Answering: A Case Study in Computer Science Literature | Ranul Dayarathne et.al. | 2511.03261 | translate | read | null |
| 2025-11-05 | Auditing M-LLMs for Privacy Risks: A Synthetic Benchmark and Evaluation Framework | Junhao Li et.al. | 2511.03248 | translate | read | null |
| 2025-11-05 | Death by a Thousand Prompts: Open Model Vulnerability Analysis | Amy Chang et.al. | 2511.03247 | translate | read | null |
| 2025-11-05 | IndicSuperTokenizer: An Optimized Tokenizer for Indic Multilingual LLMs | Souvik Rana et.al. | 2511.03237 | translate | read | null |
| 2025-11-05 | From Five Dimensions to Many: Large Language Models as Precise and Interpretable Psychological Profilers | Yi-Fei Liu et.al. | 2511.03235 | translate | read | null |
| 2025-11-05 | Multimodal-Wireless: A Large-Scale Dataset for Sensing and Communication | Tianhao Mao et.al. | 2511.03220 | translate | read | null |
| 2025-11-05 | Hybrid Fact-Checking that Integrates Knowledge Graphs, Large Language Models, and Search-Based Retrieval Agents Improves Interpretable Claim Verification | Shaghayegh Kolli et.al. | 2511.03217 | translate | read | null |
| 2025-11-05 | LGM: Enhancing Large Language Models with Conceptual Meta-Relations and Iterative Retrieval | Wenchang Lei et.al. | 2511.03214 | translate | read | null |
| 2025-11-05 | QG-CoC: Question-Guided Chain-of-Captions for Large Multimodal Models | Kuei-Chun Kao et.al. | 2511.03206 | translate | read | null |
| 2025-11-05 | Large Language Models as Information Sources: Distinctive Characteristics and Types of Low-Quality Information | Jiawei Zhou et.al. | 2511.03198 | translate | read | null |
| 2025-11-05 | Understanding Robustness of Model Editing in Code LLMs: An Empirical Study | Vinaik Chhetri et.al. | 2511.03182 | translate | read | null |
| 2025-11-05 | Learning-based Cooperative Robotic Paper Wrapping: A Unified Control Policy with Residual Force Control | Rewida Ali et.al. | 2511.03181 | translate | read | null |
| 2025-11-05 | BengaliMoralBench: A Benchmark for Auditing Moral Reasoning in Large Language Models within Bengali Language and Culture | Shahriyar Zaman Ridoy et.al. | 2511.03180 | translate | read | null |
| 2025-11-05 | Toward Autonomous Engineering Design: A Knowledge-Guided Multi-Agent Framework | Varun Kumar et.al. | 2511.03179 | translate | read | null |
| 2025-11-05 | SurgAnt-ViVQA: Learning to Anticipate Surgical Events through GRU-Driven Temporal Cross-Attention | Shreyas C. Dhake et.al. | 2511.03178 | translate | read | null |
| 2025-11-05 | AI as We Describe It: How Large Language Models and Their Applications in Health are Represented Across Channels of Public Discourse | Jiawei Zhou et.al. | 2511.03174 | translate | read | null |
| 2025-11-05 | Measuring Aleatoric and Epistemic Uncertainty in LLMs: Empirical Evaluation on ID and OOD QA Tasks | Kevin Wang et.al. | 2511.03166 | translate | read | null |
| 2025-11-05 | RefAgent: A Multi-agent LLM-based Framework for Automatic Software Refactoring | Khouloud Oueslati et.al. | 2511.03153 | translate | read | null |
| 2025-11-05 | From Measurement to Expertise: Empathetic Expert Adapters for Context-Based Empathy in Conversational AI Agents | Erfan Shayegani et.al. | 2511.03143 | translate | read | null |
| 2025-11-05 | A Proprietary Model-Based Safety Response Framework for AI Agents | Qi Li et.al. | 2511.03138 | translate | read | null |
| 2025-11-05 | Using Multi-modal Large Language Model to Boost Fireworks Algorithm’s Ability in Settling Challenging Optimization Tasks | Shipeng Cen et.al. | 2511.03137 | translate | read | null |
| 2025-11-05 | From Insight to Exploit: Leveraging LLM Collaboration for Adaptive Adversarial Text Generation | Najrin Sultana et.al. | 2511.03128 | translate | read | null |
| 2025-11-05 | Control Barrier Function for Aligning Large Language Models | Yuya Miyaoka et.al. | 2511.03121 | translate | read | null |
| 2025-11-05 | Large language models require a new form of oversight: capability-based monitoring | Katherine C. Kellogg et.al. | 2511.03106 | translate | read | null |
| 2025-11-05 | CARMA: Comprehensive Automatically-annotated Reddit Mental Health Dataset for Arabic | Saad Mankarious et.al. | 2511.03102 | translate | read | null |
| 2025-11-05 | ALAS: Transactional and Dynamic Multi-Agent LLM Planning | Longling Geng et.al. | 2511.03094 | translate | read | null |
| 2025-11-05 | SnapStream: Efficient Long Sequence Decoding on Dataflow Accelerators | Jonathan Li et.al. | 2511.03092 | translate | read | null |
| 2025-11-05 | PolyNorm: Few-Shot LLM-Based Text Normalization for Text-to-Speech | Michel Wong et.al. | 2511.03080 | translate | read | null |
| 2025-11-04 | A Collaborative Reasoning Framework for Anomaly Diagnostics in Underwater Robotics | Markus Buchholz et.al. | 2511.03075 | translate | read | null |
| 2025-11-04 | Epidemiology of Large Language Models: A Benchmark for Observational Distribution Knowledge | Drago Plecko et.al. | 2511.03070 | translate | read | null |
| 2025-11-04 | Reading Between the Lines: The One-Sided Conversation Problem | Victoria Ebert et.al. | 2511.03056 | translate | read | null |
| 2025-11-04 | No-Human in the Loop: Agentic Evaluation at Scale for Recommendation | Tao Zhang et.al. | 2511.03051 | translate | read | null |
| 2025-11-04 | ROBoto2: An Interactive System and Dataset for LLM-assisted Clinical Trial Risk of Bias Assessment | Anthony Hevia et.al. | 2511.03048 | translate | read | null |
| 2025-11-04 | Unsupervised Evaluation of Multi-Turn Objective-Driven Interactions | Emi Soroka et.al. | 2511.03047 | translate | read | null |
| 2025-11-04 | Data-Efficient Adaptation and a Novel Evaluation Method for Aspect-based Sentiment Analysis | Yan Cathy Hua et.al. | 2511.03034 | translate | read | null |
| 2025-11-04 | PublicAgent: Multi-Agent Design Principles From an LLM-Based Open Data Analysis Framework | Sina Montazeri et.al. | 2511.03023 | translate | read | null |
| 2025-11-04 | LEGO-Eval: Towards Fine-Grained Evaluation on Synthesizing 3D Embodied Environments with Tool Augmentation | Gyeom Hwangbo et.al. | 2511.03001 | translate | read | null |
| 2025-11-04 | Zero-shot data citation function classification using transformer-based large language models (LLMs) | Neil Byers et.al. | 2511.02936 | translate | read | null |
| 2025-11-04 | Cache Mechanism for Agent RAG Systems | Shuhang Lin et.al. | 2511.02919 | translate | read | null |
| 2025-11-04 | Adaptive and Robust Data Poisoning Detection and Sanitization in Wearable IoT Systems using Large Language Models | W. K. M Mithsara et.al. | 2511.02894 | translate | read | null |
| 2025-11-04 | Agent-Omni: Test-Time Multimodal Reasoning via Model Coordination for Understanding Anything | Huawei Lin et.al. | 2511.02834 | translate | read | null |
| 2025-11-04 | Can LLMs subtract numbers? | Mayank Jobanputra et.al. | 2511.02795 | translate | read | null |
| 2025-11-04 | When One Modality Sabotages the Others: A Diagnostic Lens on Multimodal Reasoning | Chenyu Zhang et.al. | 2511.02794 | translate | read | null |
| 2025-11-04 | When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought | Yiyang Zhou et.al. | 2511.02779 | translate | read | null |
| 2025-11-04 | ConMeZO: Adaptive Descent-Direction Sampling for Gradient-Free Finetuning of Large Language Models | Lejs Deen Behric et.al. | 2511.02757 | translate | read | null |
| 2025-11-04 | Controlling Performance and Budget of a Centralized Multi-agent LLM System with Reinforcement Learning | Bowen Jin et.al. | 2511.02755 | translate | read | null |
| 2025-11-04 | AI Diffusion in Low Resource Language Countries | Amit Misra et.al. | 2511.02752 | translate | read | null |
| 2025-11-04 | Agentic World Modeling for 6G: Near-Real-Time Generative State-Space Reasoning | Farhad Rezazadeh et.al. | 2511.02748 | translate | read | null |
| 2025-11-04 | CostBench: Evaluating Multi-Turn Cost-Optimal Planning and Adaptation in Dynamic Environments for LLM Tool-Use Agents | Jiayu Liu et.al. | 2511.02734 | translate | read | link |
| 2025-11-04 | LLEXICORP: End-user Explainability of Convolutional Neural Networks | Vojtěch Kůr et.al. | 2511.02720 | translate | read | null |
| 2025-11-04 | ReleaseEval: A Benchmark for Evaluating Language Models in Automated Release Note Generation | Qianru Meng et.al. | 2511.02713 | translate | read | null |
| 2025-11-04 | VidEmo: Affective-Tree Reasoning for Emotion-Centric Video Foundation Models | Zhicheng Zhang et.al. | 2511.02712 | translate | read | null |
| 2025-11-04 | Curriculum Design for Trajectory-Constrained Agent: Compressing Chain-of-Thought Tokens in LLMs | Georgios Tzannetos et.al. | 2511.02690 | translate | read | null |
| 2025-11-04 | Optimal Singular Damage: Efficient LLM Inference in Low Storage Regimes | Mohammadsajad Alipour et.al. | 2511.02681 | translate | read | null |
| 2025-11-04 | EasyTUS: A Comprehensive Framework for Fast and Accurate Table Union Search across Data Lakes | Tim Otto et.al. | 2511.02674 | translate | read | null |
| 2025-11-04 | Apriel-H1: Towards Efficient Enterprise Reasoning Models | Oleksiy Ostapenko et.al. | 2511.02651 | translate | read | null |
| 2025-11-04 | Federated Attention: A Distributed Paradigm for Collaborative LLM Inference over Edge Networks | Xiumei Deng et.al. | 2511.02647 | translate | read | null |
| 2025-11-04 | DecompSR: A dataset for decomposed analyses of compositional multihop spatial reasoning | Lachlan McPheat et.al. | 2511.02627 | translate | read | null |
| 2025-11-04 | Understanding New-Knowledge-Induced Factual Hallucinations in LLMs: Analysis, Solution, and Interpretation | Renfei Dang et.al. | 2511.02626 | translate | read | null |
| 2025-11-04 | The Realignment Problem: When Right becomes Wrong in LLMs | Aakash Sen Sharma et.al. | 2511.02623 | translate | read | null |
| 2025-11-04 | Verifying LLM Inference to Prevent Model Weight Exfiltration | Roy Rinberg et.al. | 2511.02620 | translate | read | null |
| 2025-11-04 | UniChange: Unifying Change Detection with Multimodal Large Language Model | Xu Zhang et.al. | 2511.02607 | translate | read | null |
| 2025-11-04 | CGES: Confidence-Guided Early Stopping for Efficient and Accurate Self-Consistency | Ehsan Aghazadeh et.al. | 2511.02603 | translate | read | null |
| 2025-11-04 | Next Token Knowledge Tracing: Exploiting Pretrained LLM Representations to Decode Student Behaviour | Max Norris et.al. | 2511.02599 | translate | read | null |
| 2025-11-04 | A Large Language Model for Corporate Credit Scoring | Chitro Majumdar et.al. | 2511.02593 | translate | read | null |
| 2025-11-04 | The ORCA Benchmark: Evaluating Real-World Calculation Accuracy in Large Language Models | Claudia Herambourg et.al. | 2511.02589 | translate | read | null |
| 2025-11-04 | Smart-Hiring: An Explainable end-to-end Pipeline for CV Information Extraction and Job Matching | Kenza Khelkhal et.al. | 2511.02537 | translate | read | null |
| 2025-11-04 | Knowledge Graph-enhanced Large Language Model for Incremental Game PlayTesting | Enhong Mu et.al. | 2511.02534 | translate | read | null |
| 2025-11-04 | Causal Graph Neural Networks for Healthcare | Munib Mesinovic et.al. | 2511.02531 | translate | read | null |
| 2025-11-04 | Large Lemma Miners: Can LLMs do Induction Proofs for Hardware? | Romy Peled et.al. | 2511.02521 | translate | read | null |
| 2025-11-04 | ESA: Energy-Based Shot Assembly Optimization for Automatic Video Editing | Yaosen Chen et.al. | 2511.02505 | translate | read | null |
| 2025-11-04 | BRAINS: A Retrieval-Augmented System for Alzheimer’s Detection and Monitoring | Rajan Das Gupta et.al. | 2511.02490 | translate | read | null |
| 2025-11-04 | Object Detection as an Optional Basis: A Graph Matching Network for Cross-View UAV Localization | Tao Liu et.al. | 2511.02489 | translate | read | link |
| 2025-11-04 | Modeling Hawkish-Dovish Latent Beliefs in Multi-Agent Debate-Based LLMs for Monetary Policy Decision Classification | Kaito Takano et.al. | 2511.02469 | translate | read | null |
| 2025-11-04 | Auditable-choice reframing unlocks RL-based verification for open-ended tasks | Mengyu Zhang et.al. | 2511.02463 | translate | read | null |
| 2025-11-04 | Prompting for Policy: Forecasting Macroeconomic Scenarios with Synthetic LLM Personas | Giulia Iadisernia et.al. | 2511.02458 | translate | read | null |
| 2025-11-04 | Who’s Who? LLM-assisted Software Traceability with Architecture Entity Recognition | Dominik Fuchß et.al. | 2511.02434 | translate | read | null |
| 2025-11-04 | Can Conversational AI Counsel for Change? A Theory-Driven Approach to Supporting Dietary Intentions in Ambivalent Individuals | Michelle Bak et.al. | 2511.02428 | translate | read | null |
| 2025-11-04 | From the Laboratory to Real-World Application: Evaluating Zero-Shot Scene Interpretation on Edge Devices for Mobile Robotics | Nicolas Schuler et.al. | 2511.02427 | translate | read | null |
| 2025-11-04 | ReAcTree: Hierarchical LLM Agent Trees with Control Flow for Long-Horizon Task Planning | Jae-Woo Choi et.al. | 2511.02424 | translate | read | null |
| 2025-11-04 | LLM4PG: Adapting Large Language Model for Pathloss Map Generation via Synesthesia of Machines | Mingran Sun et.al. | 2511.02423 | translate | read | null |
| 2025-11-04 | ChartM$^3$: A Multi-Stage Code-Driven Pipeline for Constructing Multi-Dimensional and Multi-Step Visual Reasoning Data in Chart Comprehension | Duo Xu et.al. | 2511.02415 | translate | read | null |
| 2025-11-04 | EvoDev: An Iterative Feature-Driven Framework for End-to-End Software Development with LLM-based Agents | Junwei Liu et.al. | 2511.02399 | translate | read | null |
| 2025-11-04 | RxnCaption: Reformulating Reaction Diagram Parsing as Visual Prompt Guided Captioning | Jiahe Song et.al. | 2511.02384 | translate | read | null |
| 2025-11-04 | Revisiting put-that-there, context aware window interactions via LLMs | Riccardo Bovo et.al. | 2511.02378 | translate | read | null |
| 2025-11-04 | AutoAdv: Automated Adversarial Prompting for Multi-Turn Jailbreaking of Large Language Models | Aashray Reddy et.al. | 2511.02376 | translate | read | null |
| 2025-11-04 | AyurParam: A State-of-the-Art Bilingual Language Model for Ayurveda | Mohd Nauman et.al. | 2511.02374 | translate | read | null |
| 2025-11-04 | LUMA-RAG: Lifelong Multimodal Agents with Provably Stable Streaming Alignment | Rohan Wandre et.al. | 2511.02371 | translate | read | null |
| 2025-11-04 | An LLM-powered MILP modelling engine for workforce scheduling guided by expert knowledge | Qingyang Li et.al. | 2511.02364 | translate | read | null |
| 2025-11-04 | Let Multimodal Embedders Learn When to Augment Query via Adaptive Query Augmentation | Wongyu Kim et.al. | 2511.02358 | translate | read | null |
| 2025-11-04 | An Automated Framework for Strategy Discovery, Retrieval, and Evolution in LLM Jailbreak Attacks | Xu Liu et.al. | 2511.02356 | translate | read | null |
| 2025-11-04 | LTD-Bench: Evaluating Large Language Models by Letting Them Draw | Liuhao Lin et.al. | 2511.02347 | translate | read | link |
| 2025-11-04 | Unlocking the Power of Multi-Agent LLM for Reasoning: From Lazy Agents to Deliberation | Zhiwei Zhang et.al. | 2511.02303 | translate | read | null |
| 2025-11-04 | VFocus: Better Verilog Generation from Large Language Model via Focused Reasoning | Zhuorui Zhao et.al. | 2511.02285 | translate | read | null |
| 2025-11-04 | SAIL-RL: Guiding MLLMs in When and How to Think via Dual-Reward RL Tuning | Fangxun Shu et.al. | 2511.02280 | translate | read | link |
| 2025-11-04 | LA-MARRVEL: A Knowledge-Grounded and Language-Aware LLM Reranker for AI-MARRVEL in Rare Disease Diagnosis | Jaeyeon Lee et.al. | 2511.02263 | translate | read | null |
| 2025-11-04 | When Modalities Conflict: How Unimodal Reasoning Uncertainty Governs Preference Dynamics in MLLMs | Zhuoran Zhang et.al. | 2511.02243 | translate | read | null |
| 2025-11-04 | Deep Ideation: Designing LLM Agents to Generate Novel Research Ideas on Scientific Concept Network | Keyu Zhao et.al. | 2511.02238 | translate | read | null |
| 2025-11-04 | An Evaluation of Interleaved Instruction Tuning on Semantic Reasoning Performance in an Audio MLLM | Jiawei Liu et.al. | 2511.02234 | translate | read | null |
| 2025-11-04 | Quantitative Risk Assessment in Radiation Oncology via LLM-Powered Root Cause Analysis of Incident Reports | Yuntao Wang et.al. | 2511.02223 | translate | read | null |
| 2025-11-04 | TabDSR: Decompose, Sanitize, and Reason for Complex Numerical Reasoning in Tabular Data | Changjiang Jiang et.al. | 2511.02219 | translate | read | null |
| 2025-11-04 | IG-Pruning: Input-Guided Block Pruning for Large Language Models | Kangyu Qiao et.al. | 2511.02213 | translate | read | null |
| 2025-11-04 | Language-Enhanced Generative Modeling for PET Synthesis from MRI and Blood Biomarkers | Zhengjie Zhang et.al. | 2511.02206 | translate | read | null |
| 2025-11-04 | LLMs as Judges: Toward The Automatic Review of GSN-compliant Assurance Cases | Gerhard Yu et.al. | 2511.02203 | translate | read | null |
| 2025-11-04 | Optimal-Agent-Selection: State-Aware Routing Framework for Efficient Multi-Agent Collaboration | Jingbo Wang et.al. | 2511.02200 | translate | read | null |
| 2025-11-04 | Open the Oyster: Empirical Evaluation and Improvement of Code Reasoning Confidence in LLMs | Shufan Wang et.al. | 2511.02197 | translate | read | null |
| 2025-11-04 | Personalized Decision Modeling: Utility Optimization or Textualized-Symbolic Reasoning | Yibo Zhao et.al. | 2511.02194 | translate | read | null |
| 2025-11-04 | Pinpointing Trigger Moment for Grounded Video QA: Enhancing Spatio-temporal Grounding in Multimodal Large Language Models | Jinhwan Seo et.al. | 2511.02182 | translate | read | null |
| 2025-11-04 | Eliminating Multi-GPU Performance Taxes: A Systems Approach to Efficient Distributed LLMs | Octavian Alexandru Trifan et.al. | 2511.02168 | translate | read | null |
| 2025-11-03 | Rethinking LLM Human Simulation: When a Graph is What You Need | Joseph Suh et.al. | 2511.02135 | translate | read | null |
| 2025-11-03 | InsurAgent: A Large Language Model-Empowered Agent for Simulating Individual Behavior in Purchasing Flood Insurance | Ziheng Geng et.al. | 2511.02119 | translate | read | null |
| 2025-11-03 | Deep Value Benchmark: Measuring Whether Models Generalize Deep values or Shallow Preferences | Joshua Ashkinaze et.al. | 2511.02109 | translate | read | null |
| 2025-11-03 | Metamorphic Testing of Large Language Models for Natural Language Processing | Steven Cho et.al. | 2511.02108 | translate | read | null |
| 2025-11-03 | LLM Probing with Contrastive Eigenproblems: Improving Understanding and Applicability of CCS | Stefan F. Schouten et.al. | 2511.02089 | translate | read | null |
| 2025-11-03 | Watermarking Discrete Diffusion Language Models | Avi Bagchi et.al. | 2511.02083 | translate | read | null |
(<a href=../LLM.md>back to LLM</a>)