LLM - 2026-01 | Paper Arxiv Daily

Publish Date	Title	Authors	PDF	Translate	Read	Code
2026-01-30	FOCUS: DLLMs Know How to Tame Their Compute Bound	Kaihua Liang et.al.	2601.23278	translate	read	null
2026-01-30	UPA: Unsupervised Prompt Agent via Tree-Based Search and Selection	Siran Peng et.al.	2601.23273	translate	read	null
2026-01-30	TEON: Tensorized Orthonormalization Beyond Layer-Wise Muon for Large Language Model Pre-Training	Ruijie Zhang et.al.	2601.23261	translate	read	null
2026-01-30	GrepRAG: An Empirical Study and Optimization of Grep-Like Retrieval for Code Completion	Baoyi Wang et.al.	2601.23254	translate	read	null
2026-01-30	ShotFinder: Imagination-Driven Open-Domain Video Shot Retrieval via Web Search	Tao Yu et.al.	2601.23232	translate	read	null
2026-01-30	Video-o3: Native Interleaved Clue Seeking for Long Video Multi-Hop Reasoning	Xiangyu Zeng et.al.	2601.23224	translate	read	null
2026-01-30	Med-Scout: Curing MLLMs’ Geometric Blindness in Medical Perception via Geometry-Aware RL Post-Training	Anglin Liu et.al.	2601.23220	translate	read	null
2026-01-30	High-quality generation of dynamic game content via small language models: A proof of concept	Morten I. K. Munk et.al.	2601.23206	translate	read	null
2026-01-30	TSAQA: Time Series Analysis Question And Answering Benchmark	Baoyu Jing et.al.	2601.23204	translate	read	null
2026-01-30	Large Language Models for Patent Classification: Strengths, Trade-offs, and the Long Tail Effect	Lorenzo Emer et.al.	2601.23200	translate	read	null
2026-01-30	Deep Search with Hierarchical Meta-Cognitive Monitoring Inspired by Cognitive Neuroscience	Zhongxiang Sun et.al.	2601.23188	translate	read	null
2026-01-30	ReGuLaR: Variational Latent Reasoning Guided by Rendered Chain-of-Thought	Fanmeng Wang et.al.	2601.23184	translate	read	link
2026-01-30	TriSpec: Ternary Speculative Decoding via Lightweight Proxy Verification	Haoyun Jiang et.al.	2601.23180	translate	read	null
2026-01-30	Make Anything Match Your Target: Universal Adversarial Perturbations against Closed-Source MLLMs via Multi-Crop Routed Meta Optimization	Hui Lu et.al.	2601.23179	translate	read	null
2026-01-30	Probing the Trajectories of Reasoning Traces in Large Language Models	Marthe Ballon et.al.	2601.23163	translate	read	null
2026-01-30	DIFFA-2: A Practical Diffusion Large Language Model for General Audio Understanding	Jiaming Zhou et.al.	2601.23161	translate	read	link
2026-01-30	SPICE: Submodular Penalized Information-Conflict Selection for Efficient Large Language Model Training	Powei Chang et.al.	2601.23155	translate	read	null
2026-01-30	Behemoth: Benchmarking Unlearning in LLMs Using Fully Synthetic Data	Eugenia Iofinova et.al.	2601.23153	translate	read	null
2026-01-30	Hearing is Believing? Evaluating and Analyzing Audio Language Model Sycophancy with SYAUDIO	Junchi Yao et.al.	2601.23149	translate	read	null
2026-01-30	RAudit: A Blind Auditing Protocol for Large Language Model Reasoning	Edward Y. Chang et.al.	2601.23133	translate	read	null
2026-01-30	Secure Tool Manifest and Digital Signing Solution for Verifiable MCP and LLM Pipelines	Saeid Jamshidi et.al.	2601.23132	translate	read	null
2026-01-30	An Automatic Deep Learning Approach for Trailer Generation through Large Language Models	Roberto Balestri et.al.	2601.23121	translate	read	null
2026-01-30	CATTO: Balancing Preferences and Confidence in Language Models	Nisarg Parikh et.al.	2601.23096	translate	read	null
2026-01-30	Exploring Sidewalk Sheds in New York City through Chatbot Surveys and Human Computer Interaction	Junyi Li et.al.	2601.23095	translate	read	null
2026-01-30	WiFiPenTester: Advancing Wireless Ethical Hacking with Governed GenAI	Haitham S. Al-Sinani et.al.	2601.23092	translate	read	null
2026-01-30	OrLog: Resolving Complex Queries with LLMs and Probabilistic Reasoning	Mohanna Hoveyda et.al.	2601.23085	translate	read	null
2026-01-30	Character as a Latent Variable in Large Language Models: A Mechanistic Account of Emergent Misalignment and Conditional Safety Failures	Yanghao Su et.al.	2601.23081	translate	read	null
2026-01-30	Towards Explicit Acoustic Evidence Perception in Audio LLMs for Speech Deepfake Detection	Xiaoxuan Guo et.al.	2601.23066	translate	read	null
2026-01-30	HierLoc: Hyperbolic Entity Embeddings for Hierarchical Visual Geolocation	Hari Krishna Gadi et.al.	2601.23064	translate	read	null
2026-01-30	Gender Disparities in StackOverflow’s Community-Based Question Answering: A Matter of Quantity versus Quality	Maddalena Amendola et.al.	2601.23063	translate	read	null
2026-01-30	On the Impact of Code Comments for Automated Bug-Fixing: An Empirical Study	Antonio Vitale et.al.	2601.23059	translate	read	null
2026-01-30	From Absolute to Relative: Rethinking Reward Shaping in Group-Based Reinforcement Learning	Wenzhe Niu et.al.	2601.23058	translate	read	null
2026-01-30	From Abstract to Contextual: What LLMs Still Cannot Do in Mathematics	Bowen Cao et.al.	2601.23048	translate	read	null
2026-01-30	Guided by Trajectories: Repairing and Rewarding Tool-Use Trajectories for Tool-Integrated Reasoning	Siyu Gong et.al.	2601.23032	translate	read	null
2026-01-30	DimABSA: Building Multilingual and Multidomain Datasets for Dimensional Aspect-Based Sentiment Analysis	Lung-Hao Lee et.al.	2601.23022	translate	read	null
2026-01-30	Integrating Multi-Label Classification and Generative AI for Scalable Analysis of User Feedback	Sandra Loop et.al.	2601.23018	translate	read	null
2026-01-30	SolAgent: A Specialized Multi-Agent Framework for Solidity Code Generation	Wei Chen et.al.	2601.23009	translate	read	null
2026-01-30	InstructDiff: Domain-Adaptive Data Selection via Differential Entropy for Efficient LLM Fine-Tuning	Junyou Su et.al.	2601.23006	translate	read	null
2026-01-30	Bias Beyond Borders: Political Ideology Evaluation and Steering in Multilingual LLMs	Afrozah Nadeem et.al.	2601.23001	translate	read	null
2026-01-30	Mano: Restriking Manifold Optimization for LLM Training	Yufei Gu et.al.	2601.23000	translate	read	null
2026-01-30	Competitive Non-Clairvoyant KV-Cache Scheduling for LLM Inference	Yiding Feng et.al.	2601.22996	translate	read	null
2026-01-30	Learnable Permutation for Structured Sparsity on Transformer Models	Zekai Li et.al.	2601.22980	translate	read	null
2026-01-30	Quantifying Model Uniqueness in Heterogeneous AI Ecosystems	Lei You et.al.	2601.22977	translate	read	null
2026-01-30	Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text	Ximing Lu et.al.	2601.22975	translate	read	null
2026-01-30	MiTa: A Hierarchical Multi-Agent Collaboration Framework with Memory-integrated and Task Allocation	XiaoJie Zhang et.al.	2601.22974	translate	read	null
2026-01-30	A Unified View of Attention and Residual Sinks: Outlier-Driven Rescaling is Essential for Transformer Training	Zihan Qiu et.al.	2601.22966	translate	read	null
2026-01-30	SWE-Manager: Selecting and Synthesizing Golden Proposals Before Coding	Boyin Tan et.al.	2601.22956	translate	read	null
2026-01-30	Residual Context Diffusion Language Models	Yuezhou Hu et.al.	2601.22954	translate	read	null
2026-01-30	Sifting the Noise: A Comparative Study of LLM Agents in Vulnerability False Positive Filtering	Yunpeng Xiong et.al.	2601.22952	translate	read	null
2026-01-30	Alignment among Language, Vision and Action Representations	Nicola Milano et.al.	2601.22948	translate	read	null
2026-01-30	Relaxing Positional Alignment in Masked Diffusion Language Models	Mengyu Ye et.al.	2601.22947	translate	read	null
2026-01-30	Protecting Private Code in IDE Autocomplete using Differential Privacy	Evgeny Grigorenko et.al.	2601.22935	translate	read	null
2026-01-30	MTDrive: Multi-turn Interactive Reinforcement Learning for Autonomous Driving	Xidong Li et.al.	2601.22930	translate	read	null
2026-01-30	LLMs Explain’t: A Post-Mortem on Semantic Interpretability in Transformer Models	Alhassan Abdelhalim et.al.	2601.22928	translate	read	null
2026-01-30	BEAR: Towards Beam-Search-Aware Optimization for Recommendation with Large Language Models	Weiqin Yang et.al.	2601.22925	translate	read	null
2026-01-30	Evaluating Large Language Models for Security Bug Report Prediction	Farnaz Soltaniani et.al.	2601.22921	translate	read	null
2026-01-30	LLMDR: Large language model driven framework for missing data recovery in mixed data under low resource regime	Durga Keshav et.al.	2601.22916	translate	read	null
2026-01-30	Game-Theoretic Co-Evolution for LLM-Based Heuristic Discovery	Xinyi Ke et.al.	2601.22896	translate	read	null
2026-01-30	When Machines Get It Wrong: Large Language Models Perpetuate Autism Myths More Than Humans Do	Eduardo C. Garrido-Merchán et.al.	2601.22893	translate	read	null
2026-01-30	MoVE: Mixture of Value Embeddings – A New Axis for Scaling Parametric Memory in Autoregressive Models	Yangyan Li et.al.	2601.22887	translate	read	null
2026-01-30	Leveraging LLMs For Turkish Skill Extraction	Ezgi Arslan İltüzer et.al.	2601.22885	translate	read	null
2026-01-30	EmoShift: Lightweight Activation Steering for Enhanced Emotion-Aware Speech Synthesis	Li Zhou et.al.	2601.22873	translate	read	null
2026-01-30	MEnvAgent: Scalable Polyglot Environment Construction for Verifiable Software Engineering	Chuanzhe Guo et.al.	2601.22859	translate	read	null
2026-01-30	Learning to Build Shapes by Extrusion	Thor Vestergaard Christiansen et.al.	2601.22858	translate	read	null
2026-01-30	Hierarchical Shift Mixing – Beyond Dense Attention in Transformers	Robert Forchheimer et.al.	2601.22852	translate	read	null
2026-01-30	When Meanings Meet: Investigating the Emergence and Quality of Shared Concept Spaces during Multilingual Language Model Training	Felicia Körner et.al.	2601.22851	translate	read	null
2026-01-30	Hide and Seek in Embedding Space: Geometry-based Steganography and Detection in Large Language Models	Charles Westphal et.al.	2601.22818	translate	read	null
2026-01-30	Stable Personas: Dual-Assessment of Temporal Stability in LLM-Based Human Simulation	Jana Gonnermann-Müller et.al.	2601.22812	translate	read	null
2026-01-30	Operational Solar Flare Forecasting System Using an Explainable Large Language Model	Xuebao Li et.al.	2601.22811	translate	read	null
2026-01-30	Clipping-Free Policy Optimization for Large Language Models	Ömer Veysel Çağatan et.al.	2601.22801	translate	read	null
2026-01-30	Sparse or Dense? A Mechanistic Estimation of Computation Density in Transformer-based LLMs	Corentin Kervadec et.al.	2601.22795	translate	read	null
2026-01-30	Understanding on the Edge: LLM-generated Boundary Test Explanations	Sabinakhon Akbarova et.al.	2601.22791	translate	read	null
2026-01-30	Toward IIT-Inspired Consciousness in LLMs: A Reward-Based Learning Framework	Hamid Reza Akbari et.al.	2601.22786	translate	read	null
2026-01-30	Compact Hypercube Embeddings for Fast Text-based Wildlife Observation Retrieval	Ilyass Moummad et.al.	2601.22783	translate	read	null
2026-01-30	Streaming Speech Recognition with Decoder-Only Large Language Models and Latency Optimization	Genshun Wan et.al.	2601.22779	translate	read	null
2026-01-30	RASST: Fast Cross-modal Retrieval-Augmented Simultaneous Speech Translation	Jiaxuan Luo et.al.	2601.22777	translate	read	null
2026-01-30	TSPO: Breaking the Double Homogenization Dilemma in Multi-turn Search Policy Optimization	Shichao Ma et.al.	2601.22776	translate	read	null
2026-01-30	How Far Can Pretrained LLMs Go in Symbolic Music? Controlled Comparisons of Supervised and Preference-based Adaptation	Deepak Kumar et.al.	2601.22764	translate	read	null
2026-01-30	AscendCraft: Automatic Ascend NPU Kernel Generation via DSL-Guided Transcompilation	Zhongzhen Wen et.al.	2601.22760	translate	read	null
2026-01-30	Qualitative Evaluation of LLM-Designed GUI	Bartosz Sawicki et.al.	2601.22759	translate	read	null
2026-01-30	AutoRefine: From Trajectories to Reusable Expertise for Continual LLM Agent Refinement	Libin Qiu et.al.	2601.22758	translate	read	null
2026-01-30	AutoMerge: Search-Based Model Merging Framework for Effective Model Reuse	You Lu et.al.	2601.22748	translate	read	null
2026-01-30	AR-BENCH: Benchmarking Legal Reasoning with Judgment Error Detection, Classification and Correction	Yifei Li et.al.	2601.22742	translate	read	null
2026-01-30	MM-THEBench: Do Reasoning MLLMs Think Reasonably?	Zhidian Huang et.al.	2601.22735	translate	read	null
2026-01-30	ImgCoT: Compressing Long Chain of Thought into Compact Visual Tokens for Efficient Reasoning of Large Language Model	Xiaoshu Chen et.al.	2601.22730	translate	read	null
2026-01-30	A Step Back: Prefix Importance Ratio Stabilizes Policy Optimization	Shiye Lei et.al.	2601.22718	translate	read	null
2026-01-30	RealSec-bench: A Benchmark for Evaluating Secure Code Generation in Real-World Repositories	Yanlin Wang et.al.	2601.22706	translate	read	null
2026-01-30	Models Know Models Best: Evaluation via Model-Preferred Formats	Joonhak Lee et.al.	2601.22699	translate	read	null
2026-01-30	FNF: Functional Network Fingerprint for Large Language Models	Yiheng Liu et.al.	2601.22692	translate	read	null
2026-01-30	Do Transformers Have the Ability for Periodicity Generalization?	Huanyu Liu et.al.	2601.22690	translate	read	null
2026-01-30	BioModelsRAG: A Biological Modeling Assistant Using RAG (Retrieval Augmented Generation)	Bhavyahshree Navaneetha Krishnan et.al.	2601.22684	translate	read	null
2026-01-30	VarParser: Unleashing the Neglected Power of Variables for LLM-based Log Parsing	Jinrui Sun et.al.	2601.22676	translate	read	null
2026-01-30	VisionTrim: Unified Vision Token Compression for Training-Free MLLM Acceleration	Hanxun Yu et.al.	2601.22674	translate	read	link
2026-01-30	Real-Time Aligned Reward Model beyond Semantics	Zixuan Huang et.al.	2601.22664	translate	read	null
2026-01-30	Task-Aware LLM Council with Adaptive Decision Pathways for Decision Support	Wei Zhu et.al.	2601.22662	translate	read	null
2026-01-30	UCPO: Uncertainty-Aware Policy Optimization	Xianzhou Zeng et.al.	2601.22648	translate	read	null
2026-01-30	Beyond Medical Chatbots: Meddollina and the Rise of Continuous Clinical Intelligence	Vaibhav Ram S. V. N. S et.al.	2601.22645	translate	read	null
2026-01-30	Pushing the Boundaries of Natural Reasoning: Interleaved Bonus from Formal-Logic Verification	Chuxue Cao et.al.	2601.22642	translate	read	null
2026-01-30	Statistical Estimation of Adversarial Risk in Large Language Models under Best-of-N Sampling	Mingqian Feng et.al.	2601.22636	translate	read	null
2026-01-30	MCP-Diag: A Deterministic, Protocol-Driven Architecture for AI-Native Network Diagnostics	Devansh Lodha et.al.	2601.22633	translate	read	null
2026-01-30	DART-ing Through the Drift: Dynamic Tracing of Knowledge Neurons for Adaptive Inference-Time Pruning	Abhishek Tyagi et.al.	2601.22632	translate	read	null
2026-01-30	Time-Annealed Perturbation Sampling: Diverse Generation for Diffusion Language Models	Jingxuan Wu et.al.	2601.22629	translate	read	null
2026-01-30	TTCS: Test-Time Curriculum Synthesis for Self-Evolving	Chengyi Yang et.al.	2601.22628	translate	read	link
2026-01-30	SYMPHONY: Synergistic Multi-agent Planning with Heterogeneous Language Model Assembly	Wei Zhu et.al.	2601.22623	translate	read	null
2026-01-30	Ethical Risks of Large Language Models in Medical Consultation: An Assessment Based on Reproductive Ethics	Hanhui Xu et.al.	2601.22621	translate	read	null
2026-01-30	Layer-wise Swapping for Generalizable Multilingual Safety	Hyunseo Shin et.al.	2601.22620	translate	read	null
2026-01-30	Learn More with Less: Uncertainty Consistency Guided Query Selection for RLVR	Hao Yi et.al.	2601.22595	translate	read	null
2026-01-30	Small is Beautiful: A Practical and Efficient Log Parsing Framework	Minxing Wang et.al.	2601.22590	translate	read	null
2026-01-30	Rethinking LLM-as-a-Judge: Representation-as-a-Judge with Small Language Models via Semantic Capacity Asymmetry	Zhuochun Li et.al.	2601.22588	translate	read	null
2026-01-30	HetCCL: Accelerating LLM Training with Heterogeneous GPUs	Heehoon Kim et.al.	2601.22585	translate	read	null
2026-01-30	SpanNorm: Reconciling Training Stability and Performance in Deep Transformers	Chao Wang et.al.	2601.22580	translate	read	null
2026-01-30	PhoStream: Benchmarking Real-World Streaming for Omnimodal Assistants in Mobile Scenarios	Xudong Lu et.al.	2601.22575	translate	read	null
2026-01-30	Mitigating Hallucinations in Video Large Language Models via Spatiotemporal-Semantic Contrastive Decoding	Yuansheng Gao et.al.	2601.22574	translate	read	null
2026-01-30	PerfGuard: A Performance-Aware Agent for Visual Content Generation	Zhipeng Chen et.al.	2601.22571	translate	read	null
2026-01-30	Leveraging Data to Say No: Memory Augmented Plug-and-Play Selective Prediction	Aditya Sarkar et.al.	2601.22570	translate	read	null
2026-01-30	Whispers of Wealth: Red-Teaming Google’s Agent Payments Protocol via Prompt Injection	Tanusree Debi et.al.	2601.22569	translate	read	null
2026-01-30	Are LLM Evaluators Really Narcissists? Sanity Checking Self-Preference Evaluations	Dani Roytburg et.al.	2601.22548	translate	read	null
2026-01-30	Towards the Holographic Characteristic of LLMs for Efficient Short-text Generation	Shun Qian et.al.	2601.22546	translate	read	null
2026-01-30	SCaLRec: Semantic Calibration for LLM-enabled Cloud-Device Sequential Recommendation	Ruiqi Zheng et.al.	2601.22543	translate	read	null
2026-01-30	Decoding in Geometry: Alleviating Embedding-Space Crowding for Complex Reasoning	Yixin Yang et.al.	2601.22536	translate	read	null
2026-01-30	Darwinian Memory: A Training-Free Self-Regulating Memory System for GUI Agent Evolution	Hongze Mi et.al.	2601.22528	translate	read	null
2026-01-30	$ρ$-$\texttt{EOS}$: Training-free Bidirectional Variable-Length Control for Masked Diffusion LLMs	Jingyi Yang et.al.	2601.22527	translate	read	null
2026-01-30	Shattered Compositionality: Counterintuitive Learning Dynamics of Transformers for Arithmetic	Xingyu Zhao et.al.	2601.22510	translate	read	null
2026-01-30	FraudShield: Knowledge Graph Empowered Defense for LLMs against Fraud Attacks	Naen Xu et.al.	2601.22485	translate	read	null
2026-01-30	Head-Aware Visual Cropping: Enhancing Fine-Grained VQA with Attention-Guided Subimage	Junfei Xie et.al.	2601.22483	translate	read	null
2026-01-30	Transform-Augmented GRPO Improves Pass@k	Khiem Le et.al.	2601.22478	translate	read	null
2026-01-30	Unrewarded Exploration in Large Language Models Reveals Latent Learning from Psychology	Jian Xiong et.al.	2601.22474	translate	read	null
2026-01-30	Toward Non-Expert Customized Congestion Control	Mingrui Zhang et.al.	2601.22461	translate	read	null
2026-01-30	ScribbleSense: Generative Scribble-Based Texture Editing with Intent Prediction	Yudi Zhang et.al.	2601.22455	translate	read	null
2026-01-30	Does My Chatbot Have an Agenda? Understanding Human and AI Agency in Human-Human-like Chatbot Interaction	Bhada Yun et.al.	2601.22452	translate	read	null
2026-01-30	Countering the Over-Reliance Trap: Mitigating Object Hallucination for LVLMs via a Self-Validation Framework	Shiyu Liu et.al.	2601.22451	translate	read	null
2026-01-30	Towards Resiliency in Large Language Model Serving with KevlarFlow	Shangshu Qian et.al.	2601.22438	translate	read	null
2026-01-30	Large Language Model Agents Are Not Always Faithful Self-Evolvers	Weixiang Zhao et.al.	2601.22436	translate	read	null
2026-01-30	When LLM meets Fuzzy-TOPSIS for Personnel Selection through Automated Profile Analysis	Shahria Hoque et.al.	2601.22433	translate	read	null
2026-01-30	ScamPilot: Simulating Conversations with LLMs to Protect Against Online Scams	Owen Hoffman et.al.	2601.22426	translate	read	null
2026-01-29	Bifocal Attention: Harmonizing Geometric and Spectral Positional Embeddings for Algorithmic Generalization	Kanishk Awadhiya et.al.	2601.22402	translate	read	null
2026-01-29	Jailbreaks on Vision Language Model via Multimodal Reasoning	Aarush Noheria et.al.	2601.22398	translate	read	null
2026-01-29	Culturally Grounded Personas in Large Language Models: Characterization and Alignment with Socio-Psychological Value Frameworks	Candida M. Greco et.al.	2601.22396	translate	read	null
2026-01-29	Specialists or Generalists? Multi-Agent and Single-Agent LLMs for Essay Grading	Jamiu Adekunle Idowu et.al.	2601.22386	translate	read	null
2026-01-29	Purely Agentic Black-Box Optimization for Biological Design	Natalie Maus et.al.	2601.22382	translate	read	null
2026-01-29	Stability-Aware Prompt Optimization for Clinical Data Abstraction	Arinbjörn Kolbeinsson et.al.	2601.22373	translate	read	null
2026-01-29	Towards Solving the Gilbert-Pollak Conjecture via Large Language Models	Yisi Ke et.al.	2601.22365	translate	read	null
2026-01-29	Context Structure Reshapes the Representational Geometry of Language Models	Eghbal A. Hosseini et.al.	2601.22364	translate	read	null
2026-01-29	Understanding Efficiency: Quantization, Batching, and Serving Strategies in LLM Energy Use	Julien Delavande et.al.	2601.22362	translate	read	null
2026-01-29	MERMAID: Memory-Enhanced Retrieval and Reasoning with Multi-Agent Iterative Knowledge Grounding for Veracity Assessment	Yupeng Cao et.al.	2601.22361	translate	read	null
2026-01-29	Small Talk, Big Impact: The Energy Cost of Thanking AI	Julien Delavande et.al.	2601.22357	translate	read	null
2026-01-29	Sparks of Rationality: Do Reasoning LLMs Align with Human Judgment and Choice?	Ala N. Tak et.al.	2601.22329	translate	read	null
2026-01-29	Federate the Router: Learning Language Model Routers with Sparse and Decentralized Evaluations	Baris Askin et.al.	2601.22318	translate	read	null
2026-01-29	Gaussian Process Bandit Optimization with Machine Learning Predictions and Application to Hypothesis Generation	Xin Jennifer Chen et.al.	2601.22315	translate	read	null
2026-01-29	Hair-Trigger Alignment: Black-Box Evaluation Cannot Guarantee Post-Update Alignment	Yavuz Bakman et.al.	2601.22313	translate	read	null
2026-01-29	SCALAR: Quantifying Structural Hallucination, Consistency, and Reasoning Gaps in Materials Foundation Models	Can Polat et.al.	2601.22312	translate	read	null
2026-01-29	Why Reasoning Fails to Plan: A Planning-Centric Analysis of Long-Horizon Decision Making in LLM Agents	Zehong Wang et.al.	2601.22311	translate	read	null
2026-01-29	Prepare Reasoning Language Models for Multi-Agent Debate with Self-Debate Reinforcement Learning	Chenxi Liu et.al.	2601.22297	translate	read	null
2026-01-29	The Six Sigma Agent: Achieving Enterprise-Grade Reliability in LLM Systems Through Consensus-Driven Decomposed Execution	Khush Patel et.al.	2601.22290	translate	read	null
2026-01-29	FunPRM: Function-as-Step Process Reward Model with Meta Reward Correction for Code Generation	Ruiyi Zhang et.al.	2601.22249	translate	read	null
2026-01-29	MirrorMark: A Distortion-Free Multi-Bit Watermark for Large Language Models	Ya Jiang et.al.	2601.22246	translate	read	null
2026-01-29	A Systematic Literature Review on LLM Defenses Against Prompt Injection and Jailbreaking: Expanding NIST Taxonomy	Pedro H. Barcha Correia et.al.	2601.22240	translate	read	null
2026-01-29	What Lies Beneath: A Call for Distribution-based Visual Question & Answer Datasets	Jill P. Naiman et.al.	2601.22218	translate	read	null
2026-01-29	Stalled, Biased, and Confused: Uncovering Reasoning Failures in LLMs for Cloud-Based Root Cause Analysis	Evelien Riddell et.al.	2601.22208	translate	read	null
2026-01-28	Tacit Coordination of Large Language Models	Ido Aharon et.al.	2601.22184	translate	read	null
2026-01-29	UEval: A Benchmark for Unified Multimodal Generation	Bo Li et.al.	2601.22155	translate	read	null
2026-01-29	DynaWeb: Model-Based Reinforcement Learning of Web Agents	Hang Ding et.al.	2601.22149	translate	read	null
2026-01-29	FineInstructions: Scaling Synthetic Instructions to Pre-Training Scale	Ajay Patel et.al.	2601.22146	translate	read	null
2026-01-29	Reasoning While Asking: Transforming Reasoning Large Language Models from Passive Solvers to Proactive Inquirers	Xin Chen et.al.	2601.22139	translate	read	null
2026-01-29	Pay for Hints, Not Answers: LLM Shepherding for Cost-Efficient Inference	Ziming Dong et.al.	2601.22132	translate	read	link
2026-01-29	World of Workflows: a Benchmark for Bringing World Models to Enterprise Systems	Lakshya Gupta et.al.	2601.22130	translate	read	null
2026-01-29	SWE-Replay: Efficient Test-Time Scaling for Software Engineering Agents	Yifeng Ding et.al.	2601.22129	translate	read	null
2026-01-29	The Patient is not a Moving Document: A World Model Training Paradigm for Longitudinal EHR	Irsyad Adam et.al.	2601.22128	translate	read	null
2026-01-29	A Federated and Parameter-Efficient Framework for Large Language Model Training in Medicine	Anran Li et.al.	2601.22124	translate	read	null
2026-01-29	ECO: Quantized Training without Full-Precision Master Weights	Mahdi Nikdan et.al.	2601.22101	translate	read	null
2026-01-29	VTC-R1: Vision-Text Compression for Efficient Long-Context Reasoning	Yibo Wang et.al.	2601.22069	translate	read	link
2026-01-29	Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models	Wenxuan Huang et.al.	2601.22060	translate	read	null
2026-01-29	AIRPET: Virtual Positron Emission Tomography	J. Renner et.al.	2601.22059	translate	read	null
2026-01-29	MetricAnything: Scaling Metric Depth Pretraining with Noisy Heterogeneous Sources	Baorui Ma et.al.	2601.22054	translate	read	link
2026-01-29	MasalBench: A Benchmark for Contextual and Cross-Cultural Understanding of Persian Proverbs in LLMs	Ghazal Kalhor et.al.	2601.22050	translate	read	null
2026-01-29	On the Paradoxical Interference between Instruction-Following and Task Solving	Yunjia Qi et.al.	2601.22047	translate	read	null
2026-01-29	Per-parameter Task Arithmetic for Unlearning in Large Language Models	Chengyi Cai et.al.	2601.22030	translate	read	null
2026-01-29	CAR-bench: Evaluating the Consistency and Limit-Awareness of LLM Agents under Real-World Uncertainty	Johannes Kirmayr et.al.	2601.22027	translate	read	null
2026-01-29	When “Better” Prompts Hurt: Evaluation-Driven Iteration for LLM Applications	Daniel Commey et.al.	2601.22025	translate	read	null
2026-01-29	Visual-Guided Key-Token Regularization for Multimodal Large Language Model Unlearning	Chengyi Cai et.al.	2601.22020	translate	read	null
2026-01-29	TBDFiltering: Sample-Efficient Tree-Based Data Filtering	Robert Istvan Busa-Fekete et.al.	2601.22016	translate	read	null
2026-01-29	SpecTran: Spectral-Aware Transformer-based Adapter for LLM-Enhanced Sequential Recommendation	Yu Cui et.al.	2601.21986	translate	read	null
2026-01-29	Token-Guard: Towards Token-Level Hallucination Control via Self-Checking Decoding	Yifan Zhu et.al.	2601.21969	translate	read	null
2026-01-29	Industrialized Deception: The Collateral Effects of LLM-Generated Misinformation on Digital Ecosystems	Alexander Loth et.al.	2601.21963	translate	read	null
2026-01-29	ToolWeaver: Weaving Collaborative Semantics for Scalable Tool Use in Large Language Models	Bowen Fang et.al.	2601.21947	translate	read	null
2026-01-29	Retrieval-Infused Reasoning Sandbox: A Benchmark for Decoupling Retrieval and Reasoning Capabilities	Shuangshuang Ying et.al.	2601.21937	translate	read	null
2026-01-29	Learn-to-Distance: Distance Learning for Detecting LLM-Generated Text	Hongyi Zhou et.al.	2601.21895	translate	read	null
2026-01-29	Not All Code Is Equal: A Data-Centric Study of Code Complexity and LLM Reasoning	Lukas Twist et.al.	2601.21894	translate	read	null
2026-01-29	astra-langchain4j: Experiences Combining LLMs and Agent Programming	Rem Collier et.al.	2601.21879	translate	read	null
2026-01-29	Evolution of Benchmark: Black-Box Optimization Benchmark Design through Large Language Model	Chen Wang et.al.	2601.21877	translate	read	null
2026-01-29	LLM-Driven Scenario-Aware Planning for Autonomous Driving	He Li et.al.	2601.21876	translate	read	null
2026-01-29	WebArbiter: A Principle-Guided Reasoning Process Reward Model for Web Agents	Yao Zhang et.al.	2601.21872	translate	read	null
2026-01-29	KnowBias: Mitigating Social Bias in LLMs via Know-Bias Neuron Enhancement	Jinhao Pan et.al.	2601.21864	translate	read	null
2026-01-29	READY: Reward Discovery for Meta-Black-Box Optimization	Zechuan Huang et.al.	2601.21847	translate	read	null
2026-01-29	Embodied Task Planning via Graph-Informed Action Generation with Large Language Model	Xiang Li et.al.	2601.21841	translate	read	null
2026-01-29	Test-Time Compute Games	Ander Artola Velasco et.al.	2601.21839	translate	read	null
2026-01-29	Mil-SCORE: Benchmarking Long-Context Geospatial Reasoning and Planning in Large Language Models	Aadi Palnitkar et.al.	2601.21826	translate	read	null
2026-01-29	DASH: Deterministic Attention Scheduling for High-throughput Reproducible LLM Training	Xinwei Qiang et.al.	2601.21824	translate	read	null
2026-01-29	CORE: Toward Ubiquitous 6G Intelligence Through Collaborative Orchestration of Large Language Model Agents Over Hierarchical Edge	Zitong Yu et.al.	2601.21822	translate	read	null
2026-01-29	A Judge-Aware Ranking Framework for Evaluating Large Language Models without Ground Truth	Mingyuan Xu et.al.	2601.21817	translate	read	null
2026-01-29	Nonparametric LLM Evaluation from Preference Data	Dennis Frauen et.al.	2601.21816	translate	read	null
2026-01-29	Distribution-Aware Reward Estimation for Test-Time Reinforcement Learning	Bodong Du et.al.	2601.21804	translate	read	null
2026-01-29	A Unified XAI-LLM Approach for Endotracheal Suctioning Activity Recognition	Hoang Khang Phan et.al.	2601.21802	translate	read	null
2026-01-29	CG-MLLM: Captioning and Generating 3D content via Multi-modal Large Language Models	Junming Huang et.al.	2601.21798	translate	read	null
2026-01-29	Effective LoRA Adapter Routing using Task Representations	Akash Dhasade et.al.	2601.21795	translate	read	null
2026-01-29	Assessing the Business Process Modeling Competences of Large Language Models	Chantale Lauer et.al.	2601.21787	translate	read	null
2026-01-29	Zonkey: A Hierarchical Diffusion Language Model with Differentiable Tokenization and Probabilistic Attention	Alon Rozental et.al.	2601.21768	translate	read	null
2026-01-29	Evaluating ChatGPT on Medical Information Extraction Tasks: Performance, Explainability and Beyond	Wei Zhu et.al.	2601.21767	translate	read	null
2026-01-29	EWSJF: An Adaptive Scheduler with Hybrid Partitioning for Mixed-Workload LLM Inference	Bronislav Sidik et.al.	2601.21758	translate	read	null
2026-01-29	Language-based Trial and Error Falls Behind in the Era of Experience	Haoyu Wang et.al.	2601.21754	translate	read	null
2026-01-29	Temporal Guidance for Large Language Models	Hong-Kai Zheng et.al.	2601.21744	translate	read	null
2026-01-29	MIDI-LLaMA: An Instruction-Following Multimodal LLM for Symbolic Music Understanding	Meng Yang et.al.	2601.21740	translate	read	null
2026-01-29	CE-GOCD: Central Entity-Guided Graph Optimization for Community Detection to Augment LLM Scientific Question Answering	Jiayin Lan et.al.	2601.21733	translate	read	null
2026-01-29	E-mem: Multi-agent based Episodic Context Reconstruction for LLM Agent Memory	Kaixiang Wang et.al.	2601.21714	translate	read	null
2026-01-29	TACLer: Tailored Curriculum Reinforcement Learning for Efficient Reasoning	Huiyuan Lai et.al.	2601.21711	translate	read	null
2026-01-29	Why Attention Patterns Exist: A Unifying Temporal Perspective Analysis	Qingyue Yang et.al.	2601.21709	translate	read	link
2026-01-29	FBS: Modeling Native Parallel Reading inside a Transformer	Tongxi Wang et.al.	2601.21708	translate	read	null
2026-01-29	Toward Culturally Aligned LLMs through Ontology-Guided Multi-Agent Reasoning	Wonduk Seo et.al.	2601.21700	translate	read	null
2026-01-29	ChartE$^{3}$: A Comprehensive Benchmark for End-to-End Chart Editing	Shuo Li et.al.	2601.21694	translate	read	null
2026-01-29	TCAP: Tri-Component Attention Profiling for Unsupervised Backdoor Detection in MLLM Fine-Tuning	Mingzu Liu et.al.	2601.21692	translate	read	null
2026-01-29	Do Not Waste Your Rollouts: Recycling Search Experience for Efficient Test-Time Scaling	Xinglin Wang et.al.	2601.21684	translate	read	null
2026-01-29	FIT: Defying Catastrophic Forgetting in Continual LLM Unlearning	Xiaoyu Xu et.al.	2601.21682	translate	read	null
2026-01-29	LLM4Fluid: Large Language Models as Generalizable Neural Solvers for Fluid Dynamics	Qisong Xiao et.al.	2601.21681	translate	read	null
2026-01-29	Scale-Dependent Semantic Dynamics Revealed by Allan Deviation	Debayan Dasgupta et.al.	2601.21678	translate	read	null
2026-01-29	SONIC-O1: A Real-World Benchmark for Evaluating Multimodal Large Language Models on Audio-Video Understanding	Ahmed Y. Radwan et.al.	2601.21666	translate	read	link
2026-01-29	AdaptBPE: From General Purpose to Specialized Tokenizers	Vijini Liyanage et.al.	2601.21665	translate	read	null
2026-01-29	ScholarGym: Benchmarking Deep Research Workflows on Academic Literature Retrieval	Hao Shen et.al.	2601.21654	translate	read	null
2026-01-29	ILRR: Inference-Time Steering Method for Masked Diffusion Language Models	Eden Avrahami et.al.	2601.21647	translate	read	null
2026-01-29	RSGround-R1: Rethinking Remote Sensing Visual Grounding through Spatial Reasoning	Shiqi Huang et.al.	2601.21634	translate	read	null
2026-01-29	LAMP: Look-Ahead Mixed-Precision Inference of Large Language Models	Stanislav Budzinskiy et.al.	2601.21623	translate	read	null
2026-01-29	StarSD: One-for-Many Speculative Decoding	Junhao He et.al.	2601.21622	translate	read	null
2026-01-29	Thinking Broad, Acting Fast: Latent Reasoning Distillation from Multi-Perspective Chain-of-Thought for E-Commerce Relevance	Baopu Qiu et.al.	2601.21611	translate	read	null
2026-01-29	WMVLM: Evaluating Diffusion Model Image Watermarking via Vision-Language Models	Zijin Yang et.al.	2601.21610	translate	read	null
2026-01-29	RecNet: Self-Evolving Preference Propagation for Agentic Recommender Systems	Bingqian Li et.al.	2601.21609	translate	read	null
2026-01-29	Age Matters: Analyzing Age-Related Discussions in App Reviews	Shashiwadana Nirmania et.al.	2601.21605	translate	read	null
2026-01-29	CORE: Collaborative Reasoning via Cross Teaching	Kshitij Mishra et.al.	2601.21600	translate	read	null
2026-01-29	Beyond Imitation: Reinforcement Learning for Active Latent Planning	Zhi Zheng et.al.	2601.21598	translate	read	null
2026-01-29	Scalable Power Sampling: Unlocking Efficient, Training-Free Reasoning for LLMs via Distribution Sharpening	Xiaotong Ji et.al.	2601.21590	translate	read	null
2026-01-29	ICL-EVADER: Zero-Query Black-Box Evasion Attacks on In-Context Learning and Their Defenses	Ningyuan He et.al.	2601.21586	translate	read	null
2026-01-29	Learning the Mechanism of Catastrophic Forgetting: A Perspective from Gradient Similarity	Mutian Yang et.al.	2601.21577	translate	read	null
2026-01-29	Chain Of Thought Compression: A Theoretical Analysis	Juncai Li et.al.	2601.21576	translate	read	null
2026-01-29	ASTRA: Automated Synthesis of agentic Trajectories and Reinforcement Arenas	Xiaoyu Tian et.al.	2601.21558	translate	read	null
2026-01-29	Meta Context Engineering via Agentic Skill Evolution	Haoran Ye et.al.	2601.21557	translate	read	null
2026-01-29	Note2Chat: Improving LLMs for Multi-Turn Clinical History Taking Using Medical Notes	Yang Zhou et.al.	2601.21551	translate	read	null
2026-01-29	ShardMemo: Masked MoE Routing for Sharded Agentic LLM Memory	Yang Zhao et.al.	2601.21545	translate	read	null
2026-01-29	Opinion Consensus Formation Among Networked Large Language Models	Iris Yazici et.al.	2601.21540	translate	read	null
2026-01-29	More Bang for the Buck: Improving the Inference of Large Language Models at a Fixed Budget using Reset and Discard (ReD)	Sagi Meir et.al.	2601.21522	translate	read	null
2026-01-29	HERS: Hidden-Pattern Expert Learning for Risk-Specific Vehicle Damage Adaptation in Diffusion Models	Teerapong Panboonyuen et.al.	2601.21517	translate	read	null
2026-01-29	LLaMEA-SAGE: Guiding Automated Algorithm Design with Structural Feedback from Explainable AI	Niki van Stein et.al.	2601.21511	translate	read	null
2026-01-29	The Effectiveness of Style Vectors for Steering Large Language Models: A Human Evaluation	Diaoulé Diallo et.al.	2601.21505	translate	read	null
2026-01-29	MAR: Efficient Large Language Models via Module-aware Architecture Refinement	Junhong Cai et.al.	2601.21503	translate	read	null
2026-01-29	The Path of Least Resistance: Guiding LLM Reasoning Trajectories with Prefix Consensus	Ishan Jindal et.al.	2601.21494	translate	read	null
2026-01-29	DimStance: Multilingual Datasets for Dimensional Stance Analysis	Jonas Becker et.al.	2601.21483	translate	read	null
2026-01-29	SOUP: Token-level Single-sample Mix-policy Reinforcement Learning for Large Language Models	Lei Yang et.al.	2601.21476	translate	read	null
2026-01-29	Adaptive Confidence Gating in Multi-Agent Collaboration for Efficient and Optimized Code Generation	Haoji Zhang et.al.	2601.21469	translate	read	null
2026-01-29	Topeax – An Improved Clustering Topic Model with Density Peak Detection and Lexical-Semantic Term Importance	Márton Kardos et.al.	2601.21465	translate	read	null
2026-01-29	Conversation for Non-verifiable Learning: Self-Evolving LLMs through Meta-Evaluation	Yuan Sui et.al.	2601.21464	translate	read	null
2026-01-29	Unifying Speech Editing Detection and Content Localization via Prior-Enhanced Audio LLMs	Jun Xue et.al.	2601.21463	translate	read	null
2026-01-29	SAGE: Sequence-level Adaptive Gradient Evolution for Generative Recommendation	Yu Xie et.al.	2601.21452	translate	read	null
2026-01-29	Variance & Greediness: A comparative study of metric-learning losses	Donghuo Zeng et.al.	2601.21450	translate	read	null
2026-01-29	ChipBench: A Next-Step Benchmark for Evaluating LLM Performance in AI-Aided Chip Design	Zhongkai Yu et.al.	2601.21448	translate	read	null
2026-01-29	The Paradox of Robustness: Decoupling Rule-Based Logic from Affective Noise in High-Stakes Decision-Making	Jon Chun et.al.	2601.21439	translate	read	null
2026-01-29	Accurate Network Traffic Matrix Prediction via LEAD: an LLM-Enhanced Adapter-Based Conditional Diffusion Model	Yu Sun et.al.	2601.21437	translate	read	null
2026-01-29	From Consistency to Complementarity: Aligned and Disentangled Multi-modal Learning for Time Series Understanding and Reasoning	Hang Ni et.al.	2601.21436	translate	read	null
2026-01-29	When Prohibitions Become Permissions: Auditing Negation Sensitivity in Language Models	Katherine Elkins et.al.	2601.21433	translate	read	null
2026-01-29	MultiModal Fine-tuning with Synthetic Captions	Shohei Enomoto et.al.	2601.21426	translate	read	null
2026-01-29	ConceptMoE: Adaptive Token-to-Concept Compression for Implicit Compute Allocation	Zihao Huang et.al.	2601.21420	translate	read	null
2026-01-29	Statsformer: Validated Ensemble Learning with LLM-Derived Semantic Priors	Erica Zhang et.al.	2601.21410	translate	read	null
2026-01-29	User-Centric Evidence Ranking for Attribution and Fact Verification	Guy Alt et.al.	2601.21387	translate	read	null
2026-01-29	Predicting Developer Acceptance of AI-Generated Code Suggestions	Jing Jiang et.al.	2601.21379	translate	read	null
2026-01-29	TeachBench: A Syllabus-Grounded Framework for Evaluating Teaching Ability in Large Language Models	Zheng Li et.al.	2601.21375	translate	read	null
2026-01-29	NEMO: Execution-Aware Optimization Modeling via Autonomous Coding Agents	Yang Song et.al.	2601.21372	translate	read	null
2026-01-29	Small models, big threats: Characterizing safety challenges from low-compute AI models	Prateek Puri et.al.	2601.21365	translate	read	null
2026-01-29	The Compliance Paradox: Semantic-Instruction Decoupling in Automated Academic Code Evaluation	Devanshu Sahoo et.al.	2601.21360	translate	read	null
2026-01-29	Latent Chain-of-Thought as Planning: Decoupling Reasoning from Verbalization	Jiecong Wang et.al.	2601.21358	translate	read	null
2026-01-29	Factored Causal Representation Learning for Robust Reward Modeling in RLHF	Yupei Yang et.al.	2601.21350	translate	read	null
2026-01-29	Towards Robust Dysarthric Speech Recognition: LLM-Agent Post-ASR Correction Beyond WER	Xiuwen Zheng et.al.	2601.21347	translate	read	null
2026-01-29	Self-Improving Pretraining: using post-trained models to pretrain better models	Ellen Xiaoqing Tan et.al.	2601.21343	translate	read	null
2026-01-29	Ostrakon-VL: Towards Domain-Expert MLLM for Food-Service and Retail Stores	Zhiyong Shen et.al.	2601.21342	translate	read	null
2026-01-29	EHR-RAG: Bridging Long-Horizon Structured Electronic Health Records and Large Language Models via Enhanced Retrieval-Augmented Generation	Lang Cao et.al.	2601.21340	translate	read	null
2026-01-29	Within-Model vs Between-Prompt Variability in Large Language Models for Creative Tasks	Jennifer Haase et.al.	2601.21339	translate	read	null
2026-01-29	White-Box Op-Amp Design via Human-Mimicking Reasoning	Zihao Chen et.al.	2601.21321	translate	read	null
2026-01-29	Detecting Multiple Semantic Concerns in Tangled Code Commits	Beomsu Koh et.al.	2601.21298	translate	read	null
2026-01-29	More Code, Less Reuse: Investigating Code Quality and Reviewer Sentiment towards AI-generated Pull Requests	Haoming Huang et.al.	2601.21276	translate	read	null
2026-01-29	Reinforcement Learning from Meta-Evaluation: Aligning Language Models Without Ground-Truth Labels	Micah Rentschler et.al.	2601.21268	translate	read	null
2026-01-29	CausalEmbed: Auto-Regressive Multi-Vector Generation in Latent Space for Visual Document Embedding	Jiahao Huo et.al.	2601.21262	translate	read	null
2026-01-29	User-Centric Phishing Detection: A RAG and LLM-Based Approach	Abrar Hamed Al Barwani et.al.	2601.21261	translate	read	null
2026-01-29	TIDE: Tuning-Integrated Dynamic Evolution for LLM-Based Automated Heuristic Design	Chentong Chen et.al.	2601.21239	translate	read	null
2026-01-29	SHARP: Social Harm Analysis via Risk Profiles for Measuring Inequities in Large Language Models	Alok Abhishek et.al.	2601.21235	translate	read	null
2026-01-29	Just Ask: Curious Code Agents Reveal System Prompts in Frontier LLMs	Xiang Zheng et.al.	2601.21233	translate	read	null
2026-01-29	MGSM-Pro: A Simple Strategy for Robust Multilingual Mathematical Reasoning Evaluation	Tianyi Xu et.al.	2601.21225	translate	read	null
2026-01-29	LAMP: Learning Universal Adversarial Perturbations for Multi-Image Tasks via Pre-trained Models	Alvi Md Ishmam et.al.	2601.21220	translate	read	null
2026-01-29	Parametric Knowledge is Not All You Need: Toward Honest Large Language Models via Retrieval of Pretraining Data	Christopher Adrian Kusuma et.al.	2601.21218	translate	read	null
2026-01-29	Scaling Reasoning Hop Exposes Weaknesses: Demystifying and Improving Hop Generalization in Large Language Models	Zhaoyi Li et.al.	2601.21214	translate	read	null
2026-01-29	Intelli-Planner: Towards Customized Urban Planning via Large Language Model Empowered Reinforcement Learning	Xixian Yong et.al.	2601.21212	translate	read	null
2026-01-29	Uncovering Hidden Correctness in LLM Causal Reasoning via Symbolic Verification	Paul He et.al.	2601.21210	translate	read	null
2026-01-29	Scaling Embeddings Outperforms Scaling Experts in Language Models	Hong Liu et.al.	2601.21204	translate	read	null
2026-01-29	ZipMoE: Efficient On-Device MoE Serving via Lossless Compression and Cache-Affinity Scheduling	Yuchen Yang et.al.	2601.21198	translate	read	null
2026-01-29	Do Reasoning Models Enhance Embedding Models?	Wun Yu Chan et.al.	2601.21192	translate	read	null
2026-01-29	Adaptive and Robust Cost-Aware Proof of Quality for Decentralized LLM Inference Networks	Arther Tian et.al.	2601.21189	translate	read	null
2026-01-29	MAD: Modality-Adaptive Decoding for Mitigating Cross-Modal Hallucinations in Multimodal Large Language Models	Sangyun Chung et.al.	2601.21181	translate	read	link
2026-01-29	Concise Geometric Description as a Bridge: Unleashing the Potential of LLM for Plane Geometry Problem Solving	Jingyun Wang et.al.	2601.21164	translate	read	null
2026-01-29	Bridging the Arithmetic Gap: The Cognitive Complexity Benchmark and Financial-PoT for Robust Financial Reasoning	Boxiang Zhao et.al.	2601.21157	translate	read	null
2026-01-29	Large Language Models Naively Recover Ethnicity from Individual Records	Noah Dasanaike et.al.	2601.21132	translate	read	null
2026-01-29	Beyond a Single Reference: Training and Evaluation with Paraphrases in Sign Language Translation	Václav Javorek et.al.	2601.21128	translate	read	null
2026-01-28	Planner-Auditor Twin: Agentic Discharge Planning with FHIR-Based LLM Planning, Guideline Recall, Optional Caching and Self-Improvement	Kaiyuan Wu et.al.	2601.21113	translate	read	null
2026-01-28	ChunkWise LoRA: Adaptive Sequence Partitioning for Memory-Efficient Low-Rank Adaptation and Accelerated LLM Inference	Ketan Thakkar et.al.	2601.21109	translate	read	null
2026-01-28	OpenSec: Measuring Incident Response Agent Calibration Under Adversarial Evidence	Jarrod Barnes et.al.	2601.21083	translate	read	link
2026-01-28	LOCUS: Low-Dimensional Model Embeddings for Efficient Model Exploration, Comparison, and Selection	Shivam Patel et.al.	2601.21082	translate	read	null
2026-01-28	Towards Comprehensive Benchmarking Infrastructure for LLMs In Software Engineering	Daniel Rodriguez-Cardenas et.al.	2601.21070	translate	read	null
2026-01-28	Textual Equilibrium Propagation for Deep Compound AI Systems	Minghui Chen et.al.	2601.21064	translate	read	null
2026-01-28	Human-LLM Collaborative Feature Engineering for Tabular Data	Zhuoyan Li et.al.	2601.21060	translate	read	null
2026-01-28	Order-Aware Test-Time Adaptation: Leveraging Temporal Dynamics for Robust Streaming Inference	Young Kyung Kim et.al.	2601.21012	translate	read	null
2026-01-28	Bayesian-LoRA: Probabilistic Low-Rank Adaptation of Large Language Models	Moule Lin et.al.	2601.21003	translate	read	null
2026-01-28	UrduBench: An Urdu Reasoning Benchmark using Contextually Ensembled Translations with Human-in-the-Loop	Muhammad Ali Shafique et.al.	2601.21000	translate	read	null
2026-01-28	Diversifying Toxicity Search in Large Language Models Through Speciation	Onkar Shelar et.al.	2601.20981	translate	read	null
2026-01-28	Infusion of Blockchain to Establish Trustworthiness in AI Supported Software Evolution: A Systematic Literature Review	Mohammad Naserameri et.al.	2601.20918	translate	read	null
2026-01-28	Noisy but Valid: Robust Statistical Evaluation of LLMs with Imperfect Judges	Chen Feng et.al.	2601.20913	translate	read	null
2026-01-28	Non-Markov Multi-Round Conversational Image Generation with History-Conditioned MLLMs	Haochen Zhang et.al.	2601.20911	translate	read	null
2026-01-28	TwinWeaver: An LLM-Based Foundation Model Framework for Pan-Cancer Digital Twins	Nikita Makarov et.al.	2601.20906	translate	read	null
2026-01-28	ICON: Intent-Context Coupling for Efficient Multi-Turn Jailbreak Attack	Xingwei Lin et.al.	2601.20903	translate	read	null
2026-01-28	Text-only adaptation in LLM-based ASR through text denoising	Sergio Burdisso et.al.	2601.20900	translate	read	null
2026-01-28	Reducing Prompt Sensitivity in LLM-based Speech Recognition Through Learnable Projection	Sergio Burdisso et.al.	2601.20898	translate	read	null
2026-01-28	IDE-Bench: Evaluating Large Language Models as IDE Agents on Real-World Software Engineering Tasks	Spencer Mateega et.al.	2601.20886	translate	read	null
2026-01-27	What Hard Tokens Reveal: Exploiting Low-confidence Tokens for Membership Inference Attacks against Large Language Models	Md Tasnim Jawad et.al.	2601.20885	translate	read	null
2026-01-28	When Flores Bloomz Wrong: Cross-Direction Contamination in Machine Translation Evaluation	David Tan et.al.	2601.20858	translate	read	null
2026-01-28	SokoBench: Evaluating Long-Horizon Planning and Reasoning in Large Language Models	Sebastiano Monti et.al.	2601.20856	translate	read	null
2026-01-28	Reward Models Inherit Value Biases from Pretraining	Brian Christian et.al.	2601.20838	translate	read	null
2026-01-28	Idea2Story: An Automated Pipeline for Transforming Research Concepts into Complete Scientific Narratives	Tengyue Xu et.al.	2601.20833	translate	read	link
2026-01-28	MemCtrl: Using MLLMs as Active Memory Controllers on Embodied Agents	Vishnu Sashank Dorbala et.al.	2601.20831	translate	read	null
2026-01-28	Training Reasoning Models on Saturated Problems via Failure-Prefix Conditioning	Minwu Kim et.al.	2601.20829	translate	read	link
2026-01-28	Context-Augmented Code Generation Using Programming Knowledge Graphs	Shahd Seddik et.al.	2601.20810	translate	read	null
2026-01-28	How Disciplinary Partnerships Shape Research Landscape in U.S. Library and Information Science Schools	Jiangen He et.al.	2601.20806	translate	read	null
2026-01-28	Reinforcement Learning via Self-Distillation	Jonas Hübotter et.al.	2601.20802	translate	read	link
2026-01-28	Dissecting Multimodal In-Context Learning: Modality Asymmetries and Circuit Dynamics in modern Transformers	Yiran Huang et.al.	2601.20796	translate	read	null
2026-01-28	Agentic Fog: A Policy-driven Framework for Distributed Intelligence in Fog Computing	Saeed Akbar et.al.	2601.20764	translate	read	null
2026-01-28	Persona Prompting as a Lens on LLM Social Reasoning	Jing Yang et.al.	2601.20757	translate	read	link
2026-01-28	ProfInfer: An eBPF-based Fine-Grained LLM Inference Profiler	Bohua Zou et.al.	2601.20755	translate	read	null
2026-01-28	Like a Therapist, But Not: Reddit Narratives of AI in Mental Health Contexts	Elham Aghakhani et.al.	2601.20747	translate	read	null
2026-01-28	HESTIA: A Hessian-Guided Differentiable Quantization-Aware Training Framework for Extremely Low-Bit LLMs	Guoan Wang et.al.	2601.20745	translate	read	null
2026-01-28	Compression Tells Intelligence: Visual Coding, Visual Token Technology, and the Unification	Xin Jin et.al.	2601.20742	translate	read	null
2026-01-28	QueerGen: How LLMs Reflect Societal Norms on Gender and Sexuality in Sentence Completion Tasks	Mae Sosto et.al.	2601.20731	translate	read	null
2026-01-28	AgentLongBench: A Controllable Long Benchmark For Long-Contexts Agents via Environment Rollouts	Shicheng Fang et.al.	2601.20730	translate	read	link
2026-01-28	Audit Trails for Accountability in Large Language Models	Victor Ojewale et.al.	2601.20727	translate	read	null
2026-01-28	MedViz: An Agent-based, Visual-guided Research Assistant for Navigating Biomedical Literature	Huan He et.al.	2601.20709	translate	read	null
2026-01-28	Beyond GEMM-Centric NPUs: Enabling Efficient Diffusion LLM Sampling	Binglei Lou et.al.	2601.20706	translate	read	null
2026-01-28	Structurally Human, Semantically Biased: Detecting LLM-Generated References with Embeddings and GNNs	Melika Mobini et.al.	2601.20704	translate	read	null
2026-01-28	Decoupling Perception and Calibration: Label-Efficient Image Quality Assessment Framework	Xinyue Li et.al.	2601.20689	translate	read	null
2026-01-28	Online Density-Based Clustering for Real-Time Narrative Evolution Monitoring	Ostap Vykhopen et.al.	2601.20680	translate	read	null
2026-01-28	ShieldedCode: Learning Robust Representations for Virtual Machine Protected Code	Mingqiao Mo et.al.	2601.20679	translate	read	null
2026-01-28	Efficient Multimodal Planning Agent for Visual Question-Answering	Zhuo Chen et.al.	2601.20676	translate	read	null
2026-01-28	bi-modal textual prompt learning for vision-language models in remote sensing	Pankhi Kashyap et.al.	2601.20675	translate	read	null
2026-01-28	Harnessing Large Language Models for Precision Querying and Retrieval-Augmented Knowledge Extraction in Clinical Data Science	Juan Jose Rubio Jan et.al.	2601.20674	translate	read	null
2026-01-28	When Vision Meets Texts in Listwise Reranking	Hongyi Cai et.al.	2601.20623	translate	read	null
2026-01-28	GDCNet: Generative Discrepancy Comparison Network for Multimodal Sarcasm Detection	Shuguang Zhang et.al.	2601.20618	translate	read	null
2026-01-28	Agent Benchmarks Fail Public Sector Requirements	Jonathan Rystrøm et.al.	2601.20617	translate	read	null
2026-01-28	DRAINCODE: Stealthy Energy Consumption Attacks on Retrieval-Augmented Code Generation via Context Poisoning	Yanlin Wang et.al.	2601.20615	translate	read	null
2026-01-28	Dialogical Reasoning Across AI Architectures: A Multi-Model Framework for Testing AI Alignment Strategies	Gray Cox et.al.	2601.20604	translate	read	null
2026-01-28	MeCo: Enhancing LLM-Empowered Multi-Robot Collaboration via Similar Task Memoization	Baiqing Wang et.al.	2601.20577	translate	read	null
2026-01-28	Gen-SER: When the generative model meets speech emotion recognition	Taihui Wang et.al.	2601.20573	translate	read	null
2026-01-28	Beyond Divergent Creativity: A Human-Based Evaluation of Creativity in Large Language Models	Kumiko Nakajima et.al.	2601.20546	translate	read	null
2026-01-28	PathWise: Planning through World Model for Automated Heuristic Design via Self-Evolving LLMs	Oguzhan Gungordu et.al.	2601.20539	translate	read	null
2026-01-28	Interpreting Emergent Extreme Events in Multi-Agent Systems	Ling Tang et.al.	2601.20538	translate	read	null
2026-01-28	Context Tokens are Anchors: Understanding the Repetition Curse in dMLLMs from an Information Flow Perspective	Qiyan Zhao et.al.	2601.20520	translate	read	null
2026-01-28	Can We Improve Educational Diagram Generation with In-Context Examples? Not if a Hallucination Spoils the Bunch	Evanfiya Logacheva et.al.	2601.20476	translate	read	null
2026-01-28	Piloting Planetarium Visualizations with LLMs during Live Events in Science Centers	Mathis Brossier et.al.	2601.20466	translate	read	null
2026-01-28	PEARL: Plan Exploration and Adaptive Reinforcement Learning for Multihop Tool Use	Qihao Wang et.al.	2601.20439	translate	read	null
2026-01-28	Concept Component Analysis: A Principled Approach for Concept Extraction in LLMs	Yuhang Liu et.al.	2601.20420	translate	read	null
2026-01-28	Beyond Accuracy: A Cognitive Load Framework for Mapping the Capability Boundaries of Tool-use Agents	Qihao Wang et.al.	2601.20412	translate	read	null
2026-01-28	GuideAI: A Real-time Personalized Learning Solution with Adaptive Interventions	Ananya Shukla et.al.	2601.20402	translate	read	null
2026-01-28	Eliminating Hallucination in Diffusion-Augmented Interactive Text-to-Image Retrieval	Zhuocheng Zhang et.al.	2601.20391	translate	read	null
2026-01-28	Policy of Thoughts: Scaling LLM Reasoning via Test-time Policy Evolution	Zhengbo Jiao et.al.	2601.20379	translate	read	null
2026-01-28	LLM-AutoDP: Automatic Data Processing via LLM Agents for Model Fine-tuning	Wei Huang et.al.	2601.20375	translate	read	null
2026-01-28	AMA: Adaptive Memory via Multi-Agent Collaboration	Weiquan Huang et.al.	2601.20352	translate	read	null
2026-01-28	Demonstration-Free Robotic Control via LLM Agents	Brian Y. Tsui et.al.	2601.20334	translate	read	null
2026-01-28	PsychePass: Calibrating LLM Therapeutic Competence via Trajectory-Anchored Tournaments	Zhuang Chen et.al.	2601.20330	translate	read	null
2026-01-28	ECG-Agent: On-Device Tool-Calling Agent for ECG Multi-Turn Dialogue	Hyunseung Chung et.al.	2601.20323	translate	read	null
2026-01-28	Less is More: Benchmarking LLM Based Recommendation Agents	Kargi Chauhan et.al.	2601.20316	translate	read	null
2026-01-28	DiagLink: A Dual-User Diagnostic Assistance System by Synergizing Experts with LLMs and Knowledge Graphs	Zihan Zhou et.al.	2601.20311	translate	read	null
2026-01-28	SuperInfer: SLO-Aware Rotary Scheduling and Memory Management for LLM Inference on Superchips	Jiahuan Yu et.al.	2601.20309	translate	read	null
2026-01-28	Truthfulness Despite Weak Supervision: Evaluating and Training LLMs Using Peer Prediction	Tianyi Alex Qiu et.al.	2601.20299	translate	read	null
2026-01-28	Memory Retrieval in Transformers: Insights from The Encoding Specificity Principle	Viet Hung Dinh et.al.	2601.20282	translate	read	null
2026-01-28	Eliciting Least-to-Most Reasoning for Phishing URL Detection	Holly Trikilis et.al.	2601.20270	translate	read	null
2026-01-28	HE-SNR: Uncovering Latent Logic via Entropy for Guiding Mid-Training on SWE-BENCH	Yueyang Wang et.al.	2601.20255	translate	read	null
2026-01-28	Efficient Evaluation of LLM Performance with Statistical Guarantees	Skyler Wu et.al.	2601.20251	translate	read	null
2026-01-28	Large Language Models Polarize Ideologically but Moderate Affectively in Online Political Discourse	Gavin Wang et.al.	2601.20238	translate	read	null
2026-01-28	Unit-Based Agent for Semi-Cascaded Full-Duplex Dialogue Systems	Haoyuan Yu et.al.	2601.20230	translate	read	null
2026-01-28	Scaling Medical Reasoning Verification via Tool-Integrated Reinforcement Learning	Hang Zhang et.al.	2601.20221	translate	read	null
2026-01-28	Spark: Strategic Policy-Aware Exploration via Dynamic Branching for Long-Horizon Agentic Learning	Jinyang Wu et.al.	2601.20209	translate	read	null
2026-01-28	An Autonomous Agent Framework for Feature-Label Extraction from Device Dialogues and Automatic Multi-Dimensional Device Hosting Planning Based on Large Language Models	Huichao Men et.al.	2601.20194	translate	read	null
2026-01-28	Me-Agent: A Personalized Mobile Agent with Two-Level User Habit Learning for Enhanced Interaction	Shuoxin Wang et.al.	2601.20162	translate	read	null
2026-01-28	Large language models accurately predict public perceptions of support for climate action worldwide	Nattavudh Powdthavee et.al.	2601.20141	translate	read	null
2026-01-27	BengaliSent140: A Large-Scale Bengali Binary Sentiment Dataset for Hate and Non-Hate Speech Classification	Akif Islam et.al.	2601.20129	translate	read	null
2026-01-27	Rewarding Intellectual Humility Learning When Not To Answer In Large Language Models	Abha Jha et.al.	2601.20126	translate	read	null
2026-01-27	Usage, Effects and Requirements for AI Coding Assistants in the Enterprise: An Empirical Study	Maja Vukovic et.al.	2601.20112	translate	read	null
2026-01-27	FFE-Hallu: Hallucinations in Fixed Figurative Expressions: Benchmark of Idioms and Proverbs in the Persian Language	Faezeh Hosseini et.al.	2601.20105	translate	read	null
2026-01-27	Dynamics of Human-AI Collective Knowledge on the Web: A Scalable Model and Insights for Sustainable Growth	Buddhika Nettasinghe et.al.	2601.20099	translate	read	null
2026-01-27	Should I Have Expressed a Different Intent? Counterfactual Generation for LLM-Based Autonomous Control	Amirmohammad Farzaneh et.al.	2601.20090	translate	read	null
2026-01-27	Quantization-Aware Distillation for NVFP4 Inference Accuracy Recovery	Meng Xin et.al.	2601.20088	translate	read	null
2026-01-27	Sparse CLIP: Co-Optimizing Interpretability and Performance in Contrastive Learning	Chuan Qin et.al.	2601.20075	translate	read	null
2026-01-23	A Scalable Measure of Loss Landscape Curvature for Analyzing the Training Dynamics of LLMs	Dayal Singh Kalra et.al.	2601.16979	translate	read	null
2026-01-23	Auto-Regressive Masked Diffusion Models	Mahdi Karami et.al.	2601.16971	translate	read	null
2026-01-23	Empowering Medical Equipment Sustainability in Low-Resource Settings: An AI-Powered Diagnostic and Support Platform for Biomedical Technicians	Bernes Lorier Atabonfack et.al.	2601.16967	translate	read	null
2026-01-23	AgentDrive: An Open Benchmark Dataset for Agentic AI Reasoning with LLM-Generated Scenarios in Autonomous Systems	Mohamed Amine Ferrag et.al.	2601.16964	translate	read	null
2026-01-23	DataStates-LLM: Scalable Checkpointing for Transformer Models Using Composable State Providers	Avinash Maurya et.al.	2601.16956	translate	read	null
2026-01-23	Strategies for Span Labeling with Large Language Models	Danil Semin et.al.	2601.16946	translate	read	null
2026-01-23	GRIP: Algorithm-Agnostic Machine Unlearning for Mixture-of-Experts via Geometric Router Constraints	Andy Zhu et.al.	2601.16905	translate	read	null
2026-01-23	Reasoning Promotes Robustness in Theory of Mind Tasks	Ian B. de Haan et.al.	2601.16853	translate	read	null
2026-01-23	Trapped in the past? Disentangling fluid and crystallized intelligence of large language models using chess	Leonard S. Pleiss et.al.	2601.16823	translate	read	null
2026-01-23	Large Language Models as Automatic Annotators and Annotation Adjudicators for Fine-Grained Opinion Analysis	Gaurav Negi et.al.	2601.16800	translate	read	null
2026-01-23	Persuasion Tokens for Editing Factual Knowledge in LLMs	Paul Youssef et.al.	2601.16781	translate	read	null
2026-01-23	LLM-powered Real-time Patent Citation Recommendation for Financial Technologies	Tianang Deng et.al.	2601.16775	translate	read	null
2026-01-23	Standardizing Longitudinal Radiology Report Evaluation via Large Language Model Annotation	Xinyi Wang et.al.	2601.16753	translate	read	null
2026-01-23	Supporting Stakeholder Requirements Expression with LLM Revisions: An Empirical Evaluation	Michael Mircea et.al.	2601.16699	translate	read	null
2026-01-23	AgentsEval: Clinically Faithful Evaluation of Medical Imaging Reports via Multi-Agent Reasoning	Suzhong Fu et.al.	2601.16685	translate	read	null
2026-01-23	From Transactions to Exploits: Automated PoC Synthesis for Real-World DeFi Attacks	Xing Su et.al.	2601.16681	translate	read	null
2026-01-23	PLawBench: A Rubric-Based Benchmark for Evaluating LLMs in Real-World Legal Practice	Yuzhen Shi et.al.	2601.16669	translate	read	null
2026-01-23	Revisiting the Role of Natural Language Code Comments in Code Translation	Monika Gupta et.al.	2601.16661	translate	read	null
2026-01-23	Select or Project? Evaluating Lower-dimensional Vectors for LLM Training Data Explanations	Lukas Hinterleitner et.al.	2601.16651	translate	read	null
2026-01-23	LUMINA: Long-horizon Understanding for Multi-turn Interactive Agents	Amin Rakhsha et.al.	2601.16649	translate	read	null
2026-01-23	MultiLexNorm++: A Unified Benchmark and a Generative Model for Lexical Normalization for Asian Languages	Weerayut Buaphet et.al.	2601.16623	translate	read	null
2026-01-23	How Does Personalized Memory Shape LLM Behavior? Benchmarking Rational Preference Utilization in Personalized Assistants	Xueyang Feng et.al.	2601.16621	translate	read	null
2026-01-23	PROST-LLM: Progressively Enhancing the Speech-to-Speech Translation Capability in LLMs	Jing Xu et.al.	2601.16618	translate	read	null
2026-01-23	AuroraEdge-V-2B: A Faster And Stronger Edge Visual Large Language Model	Xiang Chen et.al.	2601.16615	translate	read	null
2026-01-23	Attention-MoA: Enhancing Mixture-of-Agents via Inter-Agent Semantic Attention and Deep Residual Synthesis	Jianyu Wen et.al.	2601.16596	translate	read	null
2026-01-23	X-Aligner: Composed Visual Retrieval without the Bells and Whistles	Yuqian Zheng et.al.	2601.16582	translate	read	null
2026-01-23	Predicting Startup Success Using Large Language Models: A Novel In-Context Learning Approach	Abdurahman Maarouf et.al.	2601.16568	translate	read	null
2026-01-23	Retrieve-Refine-Calibrate: A Framework for Complex Claim Fact-Checking	Mingwei Sun et.al.	2601.16555	translate	read	null
2026-01-23	LLM is Not All You Need: A Systematic Evaluation of ML vs. Foundation Models for text and image based Medical Classification	Meet Raval et.al.	2601.16549	translate	read	null
2026-01-23	CORD: Bridging the Audio-Text Reasoning Gap via Weighted On-policy Cross-modal Distillation	Jing Hu et.al.	2601.16547	translate	read	null
2026-01-23	Do Models Hear Like Us? Probing the Representational Alignment of Audio LLMs and Naturalistic EEG	Haoyun Yang et.al.	2601.16540	translate	read	null
2026-01-23	OnlineSI: Taming Large Language Model for Online 3D Understanding and Grounding	Zixian Liu et.al.	2601.16538	translate	read	null
2026-01-23	W4A16 Mixed-Precision Matrix Multiplication on Decoupled Architecture: Kernel Design and Memory Bottleneck Analysis for Ascend NPUs	Yuanhong He et.al.	2601.16536	translate	read	null
2026-01-23	Curate-Train-Refine: A Closed-Loop Agentic Framework for Zero Shot Classification	Gaurav Maheshwari et.al.	2601.16530	translate	read	null
2026-01-23	SycoEval-EM: Sycophancy Evaluation of Large Language Models in Simulated Clinical Encounters for Emergency Care	Dongshen Peng et.al.	2601.16529	translate	read	null
2026-01-23	TangramPuzzle: Evaluating Multimodal Large Language Models with Compositional Spatial Reasoning	Daixian Liu et.al.	2601.16520	translate	read	null
2026-01-23	DANCE: Dynamic, Available, Neighbor-gated Condensation for Federated Text-Attributed Graphs	Zekai Chen et.al.	2601.16519	translate	read	null
2026-01-23	Rethinking Large Language Models For Irregular Time Series Classification In Critical Care	Feixiang Zheng et.al.	2601.16516	translate	read	null
2026-01-23	SearchLLM: Detecting LLM Paraphrased Text by Measuring the Similarity with Regeneration of the Candidate Source via Search Engine	Hoang-Quoc Nguyen-Son et.al.	2601.16512	translate	read	null
2026-01-23	REprompt: Prompt Generation for Intelligent Software Development Guided by Requirements Engineering	Junjie Shi et.al.	2601.16507	translate	read	null
2026-01-23	SafeThinker: Reasoning about Risk to Deepen Safety Beyond Shallow Alignment	Xianya Fang et.al.	2601.16506	translate	read	null
2026-01-23	EvoConfig: Self-Evolving Multi-Agent Systems for Efficient Autonomous Environment Configuration	Xinshuai Guo et.al.	2601.16489	translate	read	null
2026-01-23	Timely Machine: Awareness of Time Makes Test-Time Scaling Agentic	Yichuan Ma et.al.	2601.16486	translate	read	null
2026-01-23	FlowSE-GRPO: Training Flow Matching Speech Enhancement via Online Reinforcement Learning	Haoxu Wang et.al.	2601.16483	translate	read	null
2026-01-23	TL-GRPO: Turn-Level RL for Reasoning-Guided Iterative Optimization	Peiji Li et.al.	2601.16480	translate	read	null
2026-01-23	Doc2AHP: Inferring Structured Multi-Criteria Decision Models via Semantic Trees with LLMs	Hongjia Wu et.al.	2601.16479	translate	read	null
2026-01-23	Order from Chaos: Physical World Understanding from Glitchy Gameplay Videos	Meng Cao et.al.	2601.16471	translate	read	null
2026-01-23	Persona Jailbreaking in Large Language Models	Jivnesh Sandhan et.al.	2601.16466	translate	read	null
2026-01-23	Cutting the Gordian Knot: Detecting Malicious PyPI Packages via a Knowledge-Mining Framework	Wenbo Guo et.al.	2601.16463	translate	read	null
2026-01-23	Graph-Anchored Knowledge Indexing for Retrieval-Augmented Generation	Zhenghao Liu et.al.	2601.16462	translate	read	null
2026-01-23	Emotion-LLaMAv2 and MMEVerse: A New Framework and Benchmark for Multimodal Emotion Understanding	Xiaojiang Peng et.al.	2601.16449	translate	read	null
2026-01-23	Mixing Expert Knowledge: Bring Human Thoughts Back To the Game of Go	Yichuan Ma et.al.	2601.16447	translate	read	null
2026-01-23	Exploring the Effects of Alignment on Numerical Bias in Large Language Models	Ayako Sato et.al.	2601.16444	translate	read	null
2026-01-23	iPDB – Optimizing SQL Queries with ML and LLM Predicates	Udesh Kumarasinghe et.al.	2601.16432	translate	read	null
2026-01-23	Learning Domain Knowledge in Multimodal Large Language Models through Reinforcement Fine-Tuning	Qinglong Cao et.al.	2601.16419	translate	read	null
2026-01-23	Gen-DBA: Generative Database Agents (Towards a Move 37 for Databases)	Yeasir Rayhan et.al.	2601.16409	translate	read	null
2026-01-23	Jacobian Scopes: token-level causal attributions in LLMs	Toni J. B. Liu et.al.	2601.16407	translate	read	null
2026-01-23	Towards a Theoretical Understanding to the Generalization of RLHF	Zhaochun Li et.al.	2601.16403	translate	read	null
2026-01-23	Clarify or Answer: Reinforcement Learning for Agentic VQA with Context Under-specification	Zongwan Cao et.al.	2601.16400	translate	read	null
2026-01-23	White-Box Sensitivity Auditing with Steering Vectors	Hannah Cyberey et.al.	2601.16398	translate	read	null
2026-01-23	ResAgent: Entropy-based Prior Point Discovery and Visual Reasoning for Referring Expression Segmentation	Yihao Wang et.al.	2601.16394	translate	read	null
2026-01-23	Cross-Lingual Activation Steering for Multilingual Language Models	Rhitabrat Pokharel et.al.	2601.16390	translate	read	null
2026-01-23	PolyAgent: Large Language Model Agent for Polymer Design	Vani Nigam et.al.	2601.16376	translate	read	null
2026-01-22	The Behavioral Fabric of LLM-Powered GUI Agents: Human Values and Interaction Outcomes	Simret Araya Gebreegziabher et.al.	2601.16356	translate	read	null
2026-01-22	Identity, Cooperation and Framing Effects within Groups of Real and Simulated Humans	Suhong Moon et.al.	2601.16355	translate	read	null
2026-01-22	NOIR: Privacy-Preserving Generation of Code with Open-Source LLMs	Khoa Nguyen et.al.	2601.16354	translate	read	null
2026-01-22	Regional Bias in Large Language Models	M P V S Gopinadh et.al.	2601.16349	translate	read	null
2026-01-22	Identifying Concurrency Bug Reports via Linguistic Patterns	Shuai Shao et.al.	2601.16338	translate	read	null
2026-01-22	National Quantum Strategies: A Data-Driven Approach to Understanding the Quantum Ecosystem	Simon Richard Goorney et.al.	2601.16329	translate	read	null
2026-01-22	Machine-Assisted Grading of Nationwide School-Leaving Essay Exams with LLMs and Statistical NLP	Andres Karjus et.al.	2601.16314	translate	read	null
2026-01-22	A Longitudinal, Multinational, and Multilingual Corpus of News Coverage of the Russo-Ukrainian War	Dikshya Mohanty et.al.	2601.16309	translate	read	null
2026-01-22	When Agents Fail to Act: A Diagnostic Framework for Tool Invocation Reliability in Multi-Agent LLM Systems	Donghao Huang et.al.	2601.16280	translate	read	null
2026-01-22	Better as Generators Than Classifiers: Leveraging LLMs and Synthetic Data for Low-Resource Multilingual Classification	Branislav Pecher et.al.	2601.16278	translate	read	null
2026-01-22	GameTalk: Training LLMs for Strategic Conversation	Victor Conchello Vendrell et.al.	2601.16276	translate	read	null
2026-01-21	Algorithmic Identity Based on Metaparameters: A Path to Reliability, Auditability, and Traceability	Juliao Braga et.al.	2601.16234	translate	read	null
2026-01-22	Provable Robustness in Multimodal Large Language Models via Feature Space Smoothing	Song Xia et.al.	2601.16200	translate	read	null
2026-01-22	PAL*M: Property Attestation for Large Generative Models	Prach Chantasantitam et.al.	2601.16199	translate	read	null
2026-01-22	Structured Hints for Sample-Efficient Lean Theorem Proving	Zachary Burton et.al.	2601.16172	translate	read	null
2026-01-22	Low-altitude Multi-UAV-assisted Data Collection and Semantic Forwarding for Post-Disaster Relief	Xiaoya Zheng et.al.	2601.16146	translate	read	null
2026-01-22	LLM Prompt Evaluation for Educational Applications	Langdon Holmes et.al.	2601.16134	translate	read	null
2026-01-22	Improving Training Efficiency and Reducing Maintenance Costs via Language Specific Model Merging	Alphaeus Dmonte et.al.	2601.16127	translate	read	null
2026-01-22	Rethinking Composed Image Retrieval Evaluation: A Fine-Grained Benchmark from Image Editing	Tingyu Song et.al.	2601.16125	translate	read	null
2026-01-22	Adapter Fusion for Multilingual Text2Cypher with Linear and Learned Gating	Makbule Gulcin Ozsoy et.al.	2601.16097	translate	read	null
2026-01-22	Controlling Long-Horizon Behavior in Language Model Agents with Explicit State Dynamics	Sukesh Subaharan et.al.	2601.16087	translate	read	null
2026-01-22	Grounding Large Language Models in Reaction Knowledge Graphs for Synthesis Retrieval	Olga Bunkova et.al.	2601.16038	translate	read	null
2026-01-22	Sawtooth Wavefront Reordering: Enhanced CuTile FlashAttention on NVIDIA GB10	Yifan Zhu et.al.	2601.16032	translate	read	null
2026-01-22	Deja Vu in Plots: Leveraging Cross-Session Evidence with Retrieval-Augmented LLMs for Live Streaming Risk Assessment	Yiran Qiao et.al.	2601.16027	translate	read	null
2026-01-22	Timbre-Aware LLM-based Direct Speech-to-Speech Translation Extendable to Multiple Language Pairs	Lalaram Arya et.al.	2601.16023	translate	read	null
2026-01-22	PhysicsMind: Sim and Real Mechanics Benchmarking for Physical Reasoning and Prediction in Foundational VLMs and World Models	Chak-Wing Mak et.al.	2601.16007	translate	read	null
2026-01-22	TeNet: Text-to-Network for Compact Policy Synthesis	Ariyan Bighashdel et.al.	2601.15912	translate	read	null
2026-01-22	Co-Constructing Alignment: A Participatory Approach to Situate AI Values	Anne Arzberger et.al.	2601.15895	translate	read	null
2026-01-22	Stable-DiffCoder: Pushing the Frontier of Code Diffusion Large Language Model	Chenghao Fan et.al.	2601.15892	translate	read	null
2026-01-22	Evaluating and Achieving Controllable Code Completion in Code LLM	Jiajun Zhang et.al.	2601.15879	translate	read	null
2026-01-22	Virtual Traffic Police: Large Language Model-Augmented Traffic Signal Control for Unforeseen Incidents	Shiqi Wei et.al.	2601.15816	translate	read	null
2026-01-22	ErrorMap and ErrorAtlas: Charting the Failure Landscape of Large Language Models	Shir Ashury-Tahan et.al.	2601.15812	translate	read	null
2026-01-22	Attributing and Exploiting Safety Vectors through Global Optimization in Large Language Models	Fengheng Chu et.al.	2601.15801	translate	read	null
2026-01-22	HumanLLM: Towards Personalized Understanding and Simulation of Human Nature	Yuxuan Lei et.al.	2601.15793	translate	read	null
2026-01-22	Next Generation Active Learning: Mixture of LLMs in the Loop	Yuanyuan Qi et.al.	2601.15773	translate	read	null
2026-01-22	Beyond Marginal Distributions: A Framework to Evaluate the Representativeness of Demographic-Aligned LLMs	Tristan Williams et.al.	2601.15755	translate	read	null
2026-01-22	Tabular Incremental Inference	Xinda Chen et.al.	2601.15751	translate	read	null
2026-01-22	Towards Automated Kernel Generation in the Era of LLMs	Yang Yu et.al.	2601.15727	translate	read	null
2026-01-22	VideoThinker: Building Agentic VideoLLMs with LLM-Guided Tool Reasoning	Chenglin Li et.al.	2601.15724	translate	read	null
2026-01-22	CoNRec: Context-Discerning Negative Recommendation with LLMs	Xinda Chen et.al.	2601.15721	translate	read	null
2026-01-22	Beyond Visual Safety: Jailbreaking Multimodal Large Language Models for Harmful Image Generation via Semantic-Agnostic Inputs	Mingyu Yu et.al.	2601.15698	translate	read	null
2026-01-22	From Passive Metric to Active Signal: The Evolving Role of Uncertainty Quantification in Large Language Models	Jiaxin Zhang et.al.	2601.15690	translate	read	null
2026-01-22	Connect the Dots: Knowledge Graph-Guided Crawler Attack on Retrieval-Augmented Generation Systems	Mengyu Yao et.al.	2601.15678	translate	read	null
2026-01-22	What Patients Really Ask: Exploring the Effect of False Assumptions in Patient Information Seeking	Raymond Xiong et.al.	2601.15674	translate	read	null
2026-01-22	EmotionThinker: Prosody-Aware Reinforcement Learning for Explainable Speech Emotion Reasoning	Dingdong Wang et.al.	2601.15668	translate	read	null
2026-01-22	Event-VStream: Event-Driven Real-Time Understanding for Long Video Streams	Zhenghui Guo et.al.	2601.15655	translate	read	null
2026-01-22	Predictive Coding and Information Bottleneck for Hallucination Detection in Large Language Models	Manish Bhatt et.al.	2601.15652	translate	read	null
2026-01-22	Towards Reliable Medical LLMs: Benchmarking and Enhancing Confidence Estimation of Large Language Models in Medical Consultation	Zhiyao Ren et.al.	2601.15645	translate	read	null
2026-01-22	CogToM: A Comprehensive Theory of Mind Benchmark inspired by Human Cognition for Large Language Models	Haibo Tong et.al.	2601.15628	translate	read	null
2026-01-22	Robust Tool Use via Fission-GRPO: Learning to Recover from Execution Errors	Zhiwei Zhang et.al.	2601.15625	translate	read	null
2026-01-22	Explainable Deepfake Detection with RL Enhanced Self-Blended Images	Ning Jiang et.al.	2601.15624	translate	read	null
2026-01-22	When Sharpening Becomes Collapse: Sampling Bias and Semantic Coupling in RL with Verifiable Rewards	Mingyuan Fan et.al.	2601.15609	translate	read	null
2026-01-22	ToxiTwitch: Toward Emote-Aware Hybrid Moderation for Live Streaming Platforms	Baktash Ansari et.al.	2601.15605	translate	read	null
2026-01-22	Autonomous Business System via Neuro-symbolic AI	Cecil Pang et.al.	2601.15599	translate	read	null
2026-01-22	DeepASMR: LLM-Based Zero-Shot ASMR Speech Generation for Anyone of Any Voice	Leying Zhang et.al.	2601.15596	translate	read	null
2026-01-22	Data-Free Privacy-Preserving for LLMs via Model Inversion and Selective Unlearning	Xinjie Zhou et.al.	2601.15595	translate	read	null
2026-01-22	YuFeng-XGuard: A Reasoning-Centric, Interpretable, and Flexible Guardrail Model for Large Language Models	Junyu Lin et.al.	2601.15588	translate	read	null
2026-01-22	MapViT: A Two-Stage ViT-Based Framework for Real-Time Radio Quality Map Prediction in Dynamic Environments	Cyril Shih-Huan Hsu et.al.	2601.15578	translate	read	null
2026-01-22	From Generation to Collaboration: Using LLMs to Edit for Empathy in Healthcare	Man Luo et.al.	2601.15558	translate	read	null
2026-01-22	LLM or Human? Perceptions of Trust and Information Quality in Research Summaries	Nil-Jana Akpinar et.al.	2601.15556	translate	read	null
2026-01-22	VIOLA: Towards Video In-Context Learning with Minimal Annotations	Ryo Fujii et.al.	2601.15549	translate	read	null
2026-01-21	Securing LLM-as-a-Service for Small Businesses: An Industry Case Study of a Distributed Chatbot Deployment Platform	Jiazhu Xie et.al.	2601.15528	translate	read	null
2026-01-21	TransportAgents: a multi-agents LLM framework for traffic accident severity prediction	Zhichao Yang et.al.	2601.15519	translate	read	null
2026-01-21	AdversaRiskQA: An Adversarial Factuality Benchmark for High-Risk Domains	Adam Szelestey et.al.	2601.15511	translate	read	null
2026-01-21	MARS: Unleashing the Power of Speculative Decoding via Margin-Aware Verification	Jingwei Song et.al.	2601.15498	translate	read	null
2026-01-21	Tracking the Limits of Knowledge Propagation: How LLMs Fail at Multi-Step Reasoning with Conflicting Knowledge	Yiyang Feng et.al.	2601.15495	translate	read	null
2026-01-21	Testing Deep Learning Libraries via Neurosymbolic Constraint Learning	M M Abid Naziri et.al.	2601.15493	translate	read	null
2026-01-21	Multi-Persona Thinking for Bias Mitigation in Large Language Models	Yuxing Chen et.al.	2601.15488	translate	read	null
2026-01-21	A Universal Large Language Model – Drone Command and Control Interface	Javier N. Ramos-Silva et.al.	2601.15486	translate	read	null
2026-01-21	The Rise of Large Language Models and the Direction and Impact of US Federal Research Funding	Yifan Qian et.al.	2601.15485	translate	read	null
2026-01-21	Martingale Foresight Sampling: A Principled Approach to Inference-Time LLM Decoding	Huayu Li et.al.	2601.15482	translate	read	null
2026-01-21	Benchmarking LLMs for Pairwise Causal Discovery in Biomedical and Multi-Domain Contexts	Sydney Anuyah et.al.	2601.15479	translate	read	null
2026-01-21	Reliability by design: quantifying and eliminating fabrication risk in LLMs. From generative to consultative AI: a comparative analysis in the legal domain and lessons for high-stakes knowledge bases	Alex Dantart et.al.	2601.15476	translate	read	null
2026-01-21	Chunking, Retrieval, and Re-ranking: An Empirical Evaluation of RAG Architectures for Policy Document Question Answering	Anuj Maharjan et.al.	2601.15457	translate	read	null
2026-01-21	Exploring Implicit Perspectives on Autism in Large Language Models Through Multi-Agent Simulations	Sohyeon Park et.al.	2601.15437	translate	read	null
2026-01-21	Not Your Typical Sycophant: The Elusive Nature of Sycophancy in Large Language Models	Shahar Ben Natan et.al.	2601.15436	translate	read	null
2026-01-21	Domain-Specific Knowledge Graphs in RAG-Enhanced Healthcare LLMs	Sydney Anuyah et.al.	2601.15429	translate	read	null
2026-01-21	Evaluating Multimodal Large Language Models for Heterogeneous Face Recognition	Hatef Otroshi Shahreza et.al.	2601.15406	translate	read	null
2026-01-21	Beyond Prompting: Efficient and Robust Contextual Biasing for Speech LLMs via Logit-Space Integration (LOGIC)	Peidong Wang et.al.	2601.15397	translate	read	null
2026-01-21	Memorization Dynamics in Knowledge Distillation for Language Models	Jaydeep Borkar et.al.	2601.15394	translate	read	null
2026-01-21	VegaChat: A Robust Framework for LLM-Based Chart Generation and Assessment	Marko Hostnik et.al.	2601.15385	translate	read	null
2026-01-21	OpenVision 3: A Family of Unified Visual Encoder for Both Understanding and Generation	Letian Zhang et.al.	2601.15369	translate	read	null
2026-01-21	Q-Probe: Scaling Image Quality Assessment to High Resolution via Context-Aware Agentic Probing	Xiang Li et.al.	2601.15356	translate	read	null
2026-01-21	A Prompt-Based Framework for Loop Vulnerability Detection Using Local LLMs	Adeyemi Adeseye et.al.	2601.15352	translate	read	null
2026-01-21	Abusive music and song transformation using GenAI and LLMs	Jiyang Choi et.al.	2601.15348	translate	read	null
2026-01-20	Lost in Transcription: How Speech-to-Text Errors Derail Code Understanding	Jayant Havare et.al.	2601.15339	translate	read	null
2026-01-20	From Quotes to Concepts: Axial Coding of Political Debates with Ensemble LMs	Angelina Parfenova et.al.	2601.15338	translate	read	null
2026-01-20	ToolCaching: Towards Efficient Caching for LLM Tool-calling	Yi Zhai et.al.	2601.15335	translate	read	null
2026-01-20	No Reliable Evidence of Self-Reported Sentience in Small Large Language Models	Caspar Kaiser et.al.	2601.15334	translate	read	null
2026-01-20	Empowering LLMs for Structure-Based Drug Design via Exploration-Augmented Latent Inference	Xuanning Hu et.al.	2601.15333	translate	read	null
2026-01-20	RECAP: A Resource-Efficient Method for Adversarial Prompting in Large Language Models	Rishit Chugh et.al.	2601.15331	translate	read	null
2026-01-20	ICPO: Illocution-Calibrated Policy Optimization for Multi-Turn Conversation	Zhebo Wang et.al.	2601.15330	translate	read	null
2026-01-21	Towards Understanding Best Practices for Quantization of Vision-Language Models	Gautom Das et.al.	2601.15287	translate	read	link
2026-01-21	Iterative Refinement Improves Compositional Image Generation	Shantanu Jaiswal et.al.	2601.15286	translate	read	null
2026-01-21	MolecularIQ: Characterizing Chemical Reasoning Capabilities Through Symbolic Verification on Molecular Graphs	Christoph Bartmann et.al.	2601.15279	translate	read	null
2026-01-21	Robust Fake News Detection using Large Language Models under Adversarial Sentiment Attacks	Sahar Tahmasebi et.al.	2601.15277	translate	read	null
2026-01-21	Lightweight LLMs for Network Attack Detection in IoT Networks	Piyumi Bhagya Sudasinghe et.al.	2601.15269	translate	read	null
2026-01-21	Evaluation of Large Language Models in Legal Applications: Challenges, Methods, and Future Directions	Yiran Hu et.al.	2601.15267	translate	read	null
2026-01-21	The Effect of Scripts and Formats on LLM Numeracy	Varshini Reddy et.al.	2601.15251	translate	read	null
2026-01-21	Metadata Conditioned Large Language Models for Localization	Anjishnu Mukherjee et.al.	2601.15236	translate	read	null
2026-01-21	When Agents Fail: A Comprehensive Study of Bugs in LLM Agents with Automated Labeling	Niful Islam et.al.	2601.15232	translate	read	null
2026-01-21	Deaf and Hard of Hearing Access to Intelligent Personal Assistants: Comparison of Voice-Based Options with an LLM-Powered Touch Interface	Paige S. DeVries et.al.	2601.15209	translate	read	null
2026-01-21	Benchmarking Large Language Models for ABAP Code Generation: An Empirical Study on Iterative Improvement by Compiler Feedback	Stephan Wallraven et.al.	2601.15188	translate	read	null
2026-01-21	Supporting Humans in Evaluating AI Summaries of Legal Depositions	Naghmeh Farzi et.al.	2601.15182	translate	read	null
2026-01-21	The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models	Zanlin Ni et.al.	2601.15165	translate	read	link
2026-01-21	Automated Rubrics for Reliable Evaluation of Medical Dialogue Systems	Yinzhu Chen et.al.	2601.15161	translate	read	null
2026-01-21	Knowledge Graphs are Implicit Reward Models: Path-Derived Signals Enable Compositional Reasoning	Yuval Kansal et.al.	2601.15160	translate	read	null
2026-01-21	How to Build AI Agents by Augmenting LLMs with Codified Human Expert Domain Knowledge? A Software Engineering Framework	Choro Ulan uulu et.al.	2601.15153	translate	read	null
2026-01-21	CLEANER: Self-Purified Trajectories Boost Agentic Reinforcement Learning	Tianshi Xu et.al.	2601.15141	translate	read	null
2026-01-21	Why Authors and Maintainers Link (or Don’t Link) Their PyPI Libraries to Code Repositories and Donation Platforms	Alexandros Tsakpinis et.al.	2601.15139	translate	read	null
2026-01-21	Conversational AI for Social Good (CAI4SG): An Overview of Emerging Trends, Applications, and Challenges	Yi-Chieh Lee et.al.	2601.15136	translate	read	null
2026-01-21	The Plausibility Trap: Using Probabilistic Engines for Deterministic Tasks	Ivan Carrera et.al.	2601.15130	translate	read	null
2026-01-21	RSNA Large Language Model Benchmark Dataset for Chest Radiographs of Cardiothoracic Disease: Radiologist Evaluation and Validation Enhanced by AI Labels (REVEAL-CXR)	Yishu Wei et.al.	2601.15129	translate	read	null
2026-01-21	From Who They Are to How They Act: Behavioral Traits in Generative Agent-Based Models of Social Media	Valerio La Gatta et.al.	2601.15114	translate	read	null
2026-01-21	Parameter-Efficient Multi-Task Fine-Tuning in Code-Related Tasks	Md Zahidul Haque et.al.	2601.15094	translate	read	null
2026-01-21	Multi-Agent Constraint Factorization Reveals Latent Invariant Solution Structure	Christopher Scofield et.al.	2601.15077	translate	read	null
2026-01-21	The Why Behind the Action: Unveiling Internal Drivers via Agentic Attribution	Chen Qian et.al.	2601.15075	translate	read	null
2026-01-21	SmartOracle – An Agentic Approach to Mitigate Noise in Differential Oracles	Srinath Srinivasan et.al.	2601.15074	translate	read	null
2026-01-21	Turning Citation Networks Inside Out: Studying Science Using Content-Based Knowledge Graphs from LLM-Derived Taxonomies	Seorin Kim et.al.	2601.15062	translate	read	null
2026-01-21	LogicScore: Fine-grained Logic Evaluation of Conciseness, Completeness, and Determinateness in Attributed Question Answering	Zhichao Yan et.al.	2601.15050	translate	read	null
2026-01-21	Game-Theoretic Lens on LLM-based Multi-Agent Systems	Jianing Hao et.al.	2601.15047	translate	read	null
2026-01-21	Knowledge Restoration-driven Prompt Optimization: Unlocking LLM Potential for Open-Domain Relational Triplet Extraction	Xiaonan Jing et.al.	2601.15037	translate	read	null
2026-01-21	Visual and Cognitive Demands of a Large Language Model-Powered In-vehicle Conversational Agent	Chris Monk et.al.	2601.15034	translate	read	null
2026-01-21	Mixture-of-Experts Models in Vision: Routing, Optimization, and Generalization	Adam Rokah et.al.	2601.15021	translate	read	null
2026-01-21	LiViBench: An Omnimodal Benchmark for Interactive Livestream Video Understanding	Xiaodong Wang et.al.	2601.15016	translate	read	null
2026-01-21	Obscuring Data Contamination Through Translation: Evidence from Arabic Corpora	Chaymaa Abbas et.al.	2601.14994	translate	read	null
2026-01-21	InstructTime++: Time Series Classification with Multimodal Language Modeling via Implicit Feature Enhancement	Mingyue Cheng et.al.	2601.14968	translate	read	null
2026-01-21	Power-Law Scaling in the Classification Performance of Small-Scale Spiking Neural Networks	Zhengdi Zhang et.al.	2601.14961	translate	read	null
2026-01-21	CorpusQA: A 10 Million Token Benchmark for Corpus-Level Analysis and Reasoning	Zhiyuan Lu et.al.	2601.14952	translate	read	null
2026-01-21	What Should I Cite? A RAG Benchmark for Academic Citation Prediction	Leqi Zheng et.al.	2601.14949	translate	read	null
2026-01-21	The GDN-CC Dataset: Automatic Corpus Clarification for AI-enhanced Democratic Citizen Consultations	Pierre-Antoine Lequeu et.al.	2601.14944	translate	read	null
2026-01-21	State of the Art of LLM-Enabled Interaction with Visualization	Mathis Brossier et.al.	2601.14943	translate	read	null
2026-01-21	LLM-Based Repair of C++ Implicit Data Loss Compiler Warnings: An Industrial Case Study	Chansong You et.al.	2601.14936	translate	read	null
2026-01-21	CodeDelegator: Mitigating Context Pollution via Role Separation in Code-as-Action Agents	Tianxiang Fei et.al.	2601.14914	translate	read	null
2026-01-21	AlertGuardian: Intelligent Alert Life-Cycle Management for Large-scale Cloud Systems	Guangba Yu et.al.	2601.14912	translate	read	null
2026-01-21	SynPerf: A Hybrid Analytical-ML Framework for GPU Performance Prediction	Kaixuan Zhang et.al.	2601.14910	translate	read	null
2026-01-21	Comparative Study of Large Language Models on Chinese Film Script Continuation: An Empirical Analysis Based on GPT-5.2 and Qwen-Max	Yuxuan Cao et.al.	2601.14826	translate	read	null
2026-01-21	Reflecting in the Reflection: Integrating a Socratic Questioning Framework into Automated AI-Based Question Generation	Ondřej Holub et.al.	2601.14798	translate	read	null
2026-01-21	CI4A: Semantic Component Interfaces for Agents Empowering Web Automation	Zhi Qiu et.al.	2601.14790	translate	read	null
2026-01-21	RECAP: Resistance Capture in Text-based Mental Health Counseling with Large Language Models	Anqi Li et.al.	2601.14780	translate	read	null
2026-01-21	ReinPath: A Multimodal Reinforcement Learning Approach for Pathology	Kangcheng Zhou et.al.	2601.14757	translate	read	null
2026-01-21	Render-of-Thought: Rendering Textual Chain-of-Thought as Images for Visual Latent Reasoning	Yifan Wang et.al.	2601.14750	translate	read	link
2026-01-21	Optimizing FaaS Platforms for MCP-enabled Agentic Workflows	Varad Kulkarni et.al.	2601.14735	translate	read	null
2026-01-21	AQAScore: Evaluating Semantic Alignment in Text-to-Audio Generation via Audio Question Answering	Chun-Yi Kuan et.al.	2601.14728	translate	read	null
2026-01-21	HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding	Haowei Zhang et.al.	2601.14724	translate	read	link
2026-01-21	PCL-Reasoner-V1.5: Advancing Math Reasoning with Offline Reinforcement Learning	Yao Lu et.al.	2601.14716	translate	read	null
2026-01-21	Unified Multimodal and Multilingual Retrieval via Multi-Task Learning with NLU Integration	Xinyuan Zhang et.al.	2601.14714	translate	read	null
2026-01-21	DARA: Few-shot Budget Allocation in Online Advertising via In-Context Decision Making with RL-Finetuned LLMs	Mingxuan Song et.al.	2601.14711	translate	read	null
2026-01-21	LookBench: A Live and Holistic Open Benchmark for Fashion Image Retrieval	Chao Gao et.al.	2601.14706	translate	read	null
2026-01-21	DARL: Encouraging Diverse Answers for General Reasoning without Verifiers	Chongxuan Huang et.al.	2601.14700	translate	read	null
2026-01-21	AdaTIR: Adaptive Tool-Integrated Reasoning via Difficulty-Aware Policy Optimization	Zhaiyu Fang et.al.	2601.14696	translate	read	null
2026-01-21	Gaming the Judge: Unfaithful Chain-of-Thought Can Undermine Agent Evaluation	Muhammad Khalifa et.al.	2601.14691	translate	read	null
2026-01-21	IB-GRPO: Aligning LLM-based Learning Path Recommendation with Educational Objectives via Indicator-Based Group Relative Policy Optimization	Shuai Wang et.al.	2601.14686	translate	read	null
2026-01-21	FARE: Fast-Slow Agentic Robotic Exploration	Shuhao Liao et.al.	2601.14681	translate	read	null
2026-01-21	HCVR Scene Generation: High Compatibility Virtual Reality Environment Generation for Extended Redirected Walking	Yiran Zhang et.al.	2601.14679	translate	read	null
2026-01-21	INFA-Guard: Mitigating Malicious Propagation via Infection-Aware Safeguarding in LLM-Based Multi-Agent Systems	Yijin Zhou et.al.	2601.14667	translate	read	null
2026-01-21	NeuroFilter: Privacy Guardrails for Conversational LLM Agents	Saswat Das et.al.	2601.14660	translate	read	null
2026-01-21	Say Anything but This: When Tokenizer Betrays Reasoning in LLMs	Navid Ayoobi et.al.	2601.14658	translate	read	null
2026-01-21	MIND: Empowering Mental Health Clinicians with Multimodal Data Insights through a Narrative Dashboard	Ruishi Zou et.al.	2601.14641	translate	read	null
2026-01-21	Forest-Chat: Adapting Vision-Language Agents for Interactive Forest Change Analysis	James Brock et.al.	2601.14637	translate	read	null
2026-01-21	Probing Prompt Design for Socially Compliant Robot Navigation with Vision Language Models	Ling Xiao et.al.	2601.14622	translate	read	null
2026-01-21	Seeing to Think? How Source Transparency Design Shapes Interactive Information Seeking and Evaluation in Conversational AI	Jiangen He et.al.	2601.14611	translate	read	null
2026-01-21	An LLM Agent-based Framework for Whaling Countermeasures	Daisuke Miyamoto et.al.	2601.14606	translate	read	null
2026-01-21	Variance-Adaptive Muon: Accelerating LLM Pretraining with NSR-Modulated and Variance-Scaled Momentum	Jingru Li et.al.	2601.14603	translate	read	null
2026-01-21	3D Space as a Scratchpad for Editable Text-to-Image Generation	Oindrila Saha et.al.	2601.14602	translate	read	null
2026-01-21	HELIOS: Hierarchical Graph Abstraction for Structure-Aware LLM Decompilation	Yonatan Gizachew Achamyeleh et.al.	2601.14598	translate	read	null
2026-01-21	LFS: Learnable Frame Selector for Event-Aware and Temporally Diverse Video Captioning	Lianying Chao et.al.	2601.14594	translate	read	null
2026-01-21	Counterfactual Modeling with Fine-Tuned LLMs for Health Intervention Design and Sensor Data Augmentation	Shovito Barua Soumma et.al.	2601.14590	translate	read	null
2026-01-21	Social Caption: Evaluating Social Understanding in Multimodal Models	Bhaavanaa Thumu et.al.	2601.14569	translate	read	null
2026-01-21	Rewarding How Models Think Pedagogically: Integrating Pedagogical Reasoning and Thinking Rewards for LLMs in Education	Unggi Lee et.al.	2601.14560	translate	read	null
2026-01-21	Self-Blinding and Counterfactual Self-Simulation Mitigate Biases and Sycophancy in Large Language Models	Brian Christian et.al.	2601.14553	translate	read	null
2026-01-20	Predicting Retrieval Utility and Answer Quality in Retrieval-Augmented Generation	Fangzheng Tian et.al.	2601.14546	translate	read	null
2026-01-20	Report for NSF Workshop on AI for Electronic Design Automation	Deming Chen et.al.	2601.14541	translate	read	null
2026-01-20	LLM Security and Safety: Insights from Homotopy-Inspired Prompt Obfuscation	Luis Lazo et.al.	2601.14528	translate	read	null
2026-01-20	Large Language Model-Powered Evolutionary Code Optimization on a Phylogenetic Tree	Leyi Zhao et.al.	2601.14523	translate	read	null
2026-01-20	Can LLM Reasoning Be Trusted? A Comparative Study: Using Human Benchmarking on Statistical Tasks	Crish Nagarkar et.al.	2601.14479	translate	read	null
2026-01-20	Large Language Models for Large-Scale, Rigorous Qualitative Analysis in Applied Health Services Research	Sasha Ronaghi et.al.	2601.14478	translate	read	null
2026-01-20	On the Generalization Gap in LLM Planning: Tests and Verifier-Reward RL	Valerio Belcamino et.al.	2601.14456	translate	read	null
2026-01-20	Diffusion Large Language Models for Black-Box Optimization	Ye Yuan et.al.	2601.14446	translate	read	null
2026-01-20	Agentic AI Meets Edge Computing in Autonomous UAV Swarms	Thuan Minh Nguyen et.al.	2601.14437	translate	read	null
2026-01-20	CMind: An AI Agent for Localizing C Memory Bugs	Chia-Yi Su et.al.	2601.14434	translate	read	null
2026-01-20	Measuring the State of Open Science in Transportation Using Large Language Models	Junyi Ji et.al.	2601.14429	translate	read	null
2026-01-20	Rethinking On-Device LLM Reasoning: Why Analogical Mapping Outperforms Abstract Thinking for IoT DDoS Detection	William Pan et.al.	2601.14343	translate	read	null
2026-01-20	Turn-Based Structural Triggers: Prompt-Free Backdoors in Multi-Turn LLMs	Yiyang Lu et.al.	2601.14340	translate	read	null
2026-01-20	Layer-adaptive Expert Pruning for Pre-Training of Mixture-of-Experts Large Language Models	YuanLab.ai et.al.	2601.14327	translate	read	null
2026-01-19	Tracing the Data Trail: A Survey of Data Provenance, Transparency and Traceability in LLMs	Richard Hohensinner et.al.	2601.14311	translate	read	null
2026-01-19	CORVUS: Red-Teaming Hallucination Detectors via Internal Signal Camouflage in Large Language Models	Nay Myat Min et.al.	2601.14310	translate	read	null
2026-01-20	XR: Cross-Modal Agents for Composed Image Retrieval	Zhongyu Yang et.al.	2601.14245	translate	read	null
2026-01-20	Jet-RL: Enabling On-Policy FP8 Reinforcement Learning with Unified Training and Rollout Precision Flow	Haocheng Xi et.al.	2601.14243	translate	read	null
2026-01-20	Attention-Based Offline Reinforcement Learning and Clustering for Interpretable Sepsis Treatment	Punit Kumar et.al.	2601.14228	translate	read	null
2026-01-20	HALT: Hallucination Assessment via Latent Testing	Rohan Bhatnagar et.al.	2601.14210	translate	read	null
2026-01-20	InT: Self-Proposed Interventions Enable Credit Assignment in LLM Reasoning	Matthew Y. R. Yang et.al.	2601.14209	translate	read	null
2026-01-20	Toward Efficient Agents: Memory, Tool learning, and Planning	Xiaofang Yang et.al.	2601.14192	translate	read	link
2026-01-20	ReSearch: A Multi-Stage Machine Learning Framework for Earth Science Data Discovery	Youran Sun et.al.	2601.14176	translate	read	null
2026-01-20	Paper2Rebuttal: A Multi-Agent Framework for Transparent Author Response Assistance	Qianli Ma et.al.	2601.14171	translate	read	link
2026-01-20	Domain-Adaptation through Synthetic Data: Fine-Tuning Large Language Models for German Law	Ali Hamza Bashir et.al.	2601.14160	translate	read	null
2026-01-20	ConceptCaps – a Distilled Concept Dataset for Interpretability in Music Models	Bruno Sienkiewicz et.al.	2601.14157	translate	read	null
2026-01-20	LLM Augmented Intervenable Multimodal Adaptor for Post-operative Complication Prediction in Lung Cancer Surgery	Shubham Pandey et.al.	2601.14154	translate	read	null
2026-01-20	Lost in the Prompt Order: Revealing the Limitations of Causal Attention in Language Models	Hyunjong Ok et.al.	2601.14152	translate	read	null
2026-01-20	The Quest for Reliable AI Accelerators: Cross-Layer Evaluation and Design Optimization	Meng Li et.al.	2601.14148	translate	read	null
2026-01-20	CREATE: Cross-Layer Resilience Characterization and Optimization for Efficient yet Reliable Embodied AI Systems	Tong Xie et.al.	2601.14140	translate	read	null
2026-01-20	The Side Effects of Being Smart: Safety Risks in MLLMs’ Multi-Image Reasoning	Renmiao Chen et.al.	2601.14127	translate	read	link
2026-01-20	Style Transfer as Bias Mitigation: Diffusion Models for Synthetic Mental Health Text for Arabic	Saad Mankarious et.al.	2601.14124	translate	read	null
2026-01-20	NewsRECON: News article REtrieval for image CONtextualization	Jonathan Tonglet et.al.	2601.14121	translate	read	null
2026-01-20	A flexible language model-assisted electronic design automation framework	Cristian Sestito et.al.	2601.14098	translate	read	null
2026-01-20	Zero-shot adaptable task planning for autonomous construction robots: a comparative study of lightweight single and multi-AI agent systems	Hossein Naderi et.al.	2601.14091	translate	read	null
2026-01-20	DermaBench: A Clinician-Annotated Benchmark Dataset for Dermatology Visual Question Answering and Reasoning	Abdurrahim Yilmaz et.al.	2601.14084	translate	read	null
2026-01-20	XCR-Bench: A Multi-Task Benchmark for Evaluating Cultural Reasoning in LLMs	Mohsinul Kabir et.al.	2601.14063	translate	read	null
2026-01-20	Fine-Grained Zero-Shot Composed Image Retrieval with Complementary Visual-Semantic Integration	Yongcong Ye et.al.	2601.14060	translate	read	null
2026-01-20	LLMOrbit: A Circular Taxonomy of Large Language Models - From Scaling Walls to Agentic AI Systems	Badri N. Patro et.al.	2601.14053	translate	read	null
2026-01-20	Vision Also You Need: Navigating Out-of-Distribution Detection with Multimodal Large Language Model	Haoran Xu et.al.	2601.14052	translate	read	null
2026-01-20	Top 10 Open Challenges Steering the Future of Diffusion Language Model and Its Variants	Yunhe Wang et.al.	2601.14041	translate	read	null
2026-01-20	RM-Distiller: Exploiting Generative LLM for Reward Model Distillation	Hongli Zhou et.al.	2601.14032	translate	read	null
2026-01-20	BACH-V: Bridging Abstract and Concrete Human-Values in Large Language Models	Junyu Zhang et.al.	2601.14007	translate	read	null
2026-01-20	Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models	Hengyuan Zhang et.al.	2601.14004	translate	read	link
2026-01-20	Auditory Brain Passage Retrieval: Cross-Sensory EEG Training for Neural Information Retrieval	Niall McGuire et.al.	2601.14001	translate	read	null
2026-01-20	“The Whole Is Greater Than the Sum of Its Parts”: A Compatibility-Aware Multi-Teacher CoT Distillation Framework	Jin Cui et.al.	2601.13992	translate	read	null
2026-01-20	VirtualCrime: Evaluating Criminal Potential of Large Language Models via Sandbox Simulation	Yilin Tang et.al.	2601.13981	translate	read	null
2026-01-20	RepoGenesis: Benchmarking End-to-End Microservice Generation from Readme to Repository	Zhiyuan Peng et.al.	2601.13943	translate	read	null
2026-01-20	Glance-or-Gaze: Incentivizing LMMs to Adaptively Focus Search via Reinforcement Learning	Hongbo Bai et.al.	2601.13942	translate	read	null
2026-01-20	HyperWalker: Dynamic Hypergraph-Based Deep Diagnosis for Multi-Hop Clinical Modeling across EHR and X-Ray in Medical VLMs	Yuezhe Yang et.al.	2601.13919	translate	read	null
2026-01-20	AgentEHR: Advancing Autonomous Clinical Decision-Making via Retrospective Summarization	Yusheng Liao et.al.	2601.13918	translate	read	link
2026-01-20	Synthetic Singers: A Review of Deep-Learning-based Singing Voice Synthesis Approaches	Changhao Pan et.al.	2601.13910	translate	read	null
2026-01-20	Multi-Objective Hierarchical Optimization with Large Language Models	Andrej Schwanke et.al.	2601.13892	translate	read	null
2026-01-20	Human Simulation Computation: A Human-Inspired Framework for Adaptive AI Systems	Hong Su et.al.	2601.13887	translate	read	null
2026-01-20	OpenLearnLM Benchmark: A Unified Framework for Evaluating Knowledge, Skill, and Attitude in Educational Large Language Models	Unggi Lee et.al.	2601.13882	translate	read	null
2026-01-20	LifeAgentBench: A Multi-dimensional Benchmark and Agent for Personal Health Assistants in Digital Health	Ye Tian et.al.	2601.13880	translate	read	null
2026-01-20	Chain-of-Thought Compression Should Not Be Blind: V-Skip for Efficient Multimodal Reasoning via Dual-Path Anchoring	Dongxu Zhang et.al.	2601.13879	translate	read	null
2026-01-20	Pedagogical Alignment for Vision-Language-Action Models: A Comprehensive Framework for Data, Architecture, and Evaluation in Education	Unggi Lee et.al.	2601.13876	translate	read	null
2026-01-20	HardSecBench: Benchmarking the Security Awareness of LLMs for Hardware Code Generation	Qirui Chen et.al.	2601.13864	translate	read	null
2026-01-20	QKVQA: Question-Focused Filtering for Knowledge-based VQA	Wei Ye et.al.	2601.13856	translate	read	null
2026-01-20	Small Models, Big Impact: Tool-Augmented AI Agents for Wireless Network Planning	Yongqiang Zhang et.al.	2601.13843	translate	read	null
2026-01-20	DisasterVQA: A Visual Question Answering Benchmark Dataset for Disaster Scenes	Aisha Al-Mohannadi et.al.	2601.13839	translate	read	null
2026-01-20	FutureOmni: Evaluating Future Forecasting from Omni-Modal Context for Multimodal LLMs	Qian Chen et.al.	2601.13836	translate	read	link
2026-01-20	ELSA: Efficient LLM-Centric Split Aggregation for Privacy-Aware Hierarchical Federated Learning over Resource-Constrained Edge Networks	Xiaohong Yang et.al.	2601.13824	translate	read	null
2026-01-20	HoverAI: An Embodied Aerial Agent for Natural Human-Drone Interaction	Yuhua Jin et.al.	2601.13801	translate	read	null
2026-01-20	Look-Ahead-Bench: a Standardized Benchmark of Look-ahead Bias in Point-in-Time LLMs for Finance	Mostapha Benhenda et.al.	2601.13770	translate	read	null
2026-01-20	DARC: Decoupled Asymmetric Reasoning Curriculum for LLM Evolution	Shengda Fan et.al.	2601.13761	translate	read	link
2026-01-20	On Autopilot? An Empirical Study of Human-AI Teaming and Review Practices in Open Source	Haoyu Gao et.al.	2601.13754	translate	read	null
2026-01-20	Pro-AI Bias in Large Language Models	Benaya Trabelsi et.al.	2601.13749	translate	read	null
2026-01-20	Dimension-First Evaluation of Speech-to-Speech Models with Structured Acoustic Cues	Arjun Chandra et.al.	2601.13742	translate	read	null
2026-01-20	Towards robust long-context understanding of large language model via active recap learning	Chenyu Hui et.al.	2601.13734	translate	read	null
2026-01-20	OP-Bench: Benchmarking Over-Personalization for Memory-Augmented Personalized Conversational Agents	Yulin Hu et.al.	2601.13722	translate	read	null
2026-01-20	GerAV: Towards New Heights in German Authorship Verification using Fine-Tuned LLMs on a New Benchmark	Lotta Kiefer et.al.	2601.13711	translate	read	null
2026-01-20	Hidden in Plain Text: Measuring LLM Deception Quality Against Human Baselines Using Social Deduction Games	Christopher Kao et.al.	2601.13709	translate	read	null
2026-01-20	IGAA: Intent-Driven General Agentic AI for Edge Services Scheduling using Generative Meta Learning	Yan Sun et.al.	2601.13702	translate	read	null
2026-01-20	Uncertainty-Aware Gradient Signal-to-Noise Data Selection for Instruction Tuning	Zhihang Yuan et.al.	2601.13697	translate	read	null
2026-01-20	Generative Intent Prediction Agentic AI empowered Edge Service Function Chain Orchestration	Yan Sun et.al.	2601.13694	translate	read	null
2026-01-20	Dr. Assistant: Enhancing Clinical Diagnostic Inquiry via Structured Diagnostic Reasoning Data and Reinforcement Learning	Yue Guo et.al.	2601.13690	translate	read	null
2026-01-20	CodeContests-O: Powering LLMs via Feedback-Driven Iterative Test Case Generation	Jianfeng Cai et.al.	2601.13682	translate	read	link
2026-01-20	CommunityBench: Benchmarking Community-Level Alignment across Diverse Groups and Tasks	Jiayu Lin et.al.	2601.13669	translate	read	null
2026-01-20	Temporal-Spatial Decouple before Act: Disentangled Representation Learning for Multimodal Sentiment Analysis	Chunlei Meng et.al.	2601.13659	translate	read	null
2026-01-20	Beyond Known Facts: Generating Unseen Temporal Knowledge to Address Data Contamination in LLM Evaluation	Arthur Amalvy et.al.	2601.13658	translate	read	null
2026-01-20	Why Does the LLM Stop Computing: An Empirical Study of User-Reported Failures in Open-Source LLMs	Guangba Yu et.al.	2601.13655	translate	read	null
2026-01-20	TimeART: Towards Agentic Time Series Reasoning via Tool-Augmentation	Xingjian Wu et.al.	2601.13653	translate	read	null
2026-01-20	Fairness or Fluency? An Investigation into Language Bias of Pairwise LLM-as-a-Judge	Xiaolin Zhou et.al.	2601.13649	translate	read	null
2026-01-20	ContiguousKV: Accelerating LLM Prefill with Granularity-Aligned KV Cache Management	Jing Zou et.al.	2601.13631	translate	read	null
2026-01-20	Activation-Space Anchored Access Control for Multi-Class Permission Reasoning in Large Language Models	Zhaopeng Zhang et.al.	2601.13630	translate	read	null
2026-01-20	S$^2$Voice: Style-Aware Autoregressive Modeling with Enhanced Conditioning for Singing Style Conversion	Ziqian Wang et.al.	2601.13629	translate	read	null
2026-01-20	PRIMAL: Processing-In-Memory Based Low-Rank Adaptation for LLM Inference Accelerator	Yue Jiet Chong et.al.	2601.13628	translate	read	null
2026-01-20	Are Large Language Models able to Predict Highly Cited Papers? Evidence from Statistical Publications	Zhanshuo Ye et.al.	2601.13627	translate	read	null
2026-01-20	PINA: Prompt Injection Attack against Navigation Agents	Jiani Liu et.al.	2601.13612	translate	read	null
2026-01-20	Foundations of Global Consistency Checking with Noisy LLM Oracles	Paul He et.al.	2601.13600	translate	read	null
2026-01-20	AI IDEs or Autonomous Agents? Measuring the Impact of Coding Agents on Software Development	Shyam Agarwal et.al.	2601.13597	translate	read	null
2026-01-20	Vulnerability of LLMs’ Belief Systems? LLMs Belief Resistance Check Through Strategic Persuasive Conversation Interventions	Fan Huang et.al.	2601.13590	translate	read	null
2026-01-20	TREX: Tokenizer Regression for Optimal Data Mixture	Inho Won et.al.	2601.13588	translate	read	null
2026-01-20	SCRIPTMIND: Crime Script Inference and Cognitive Evaluation for LLM-based Social Engineering Scam Detection System	Heedou Kim et.al.	2601.13581	translate	read	null
2026-01-20	Leveraging ChatGPT and Other NLP Methods for Identifying Risk and Protective Behaviors in MSM: Social Media and Dating apps Text Analysis	Mehrab Beikzadeh et.al.	2601.13558	translate	read	null
2026-01-20	LogicEnvGen: Task-Logic Driven Generation of Diverse Simulated Environments for Embodied AI	Jianan Wang et.al.	2601.13556	translate	read	null
2026-01-20	TruthTensor: Evaluating LLMs Human Imitation through Prediction Market Drift and Holistic Reasoning	Shirin Shahabi et.al.	2601.13545	translate	read	null
2026-01-20	When Wording Steers the Evaluation: Framing Bias in LLM judges	Yerin Hwang et.al.	2601.13537	translate	read	null
2026-01-20	CatMaster: An Agentic Autonomous System for Computational Heterogeneous Catalysis Research	Honghao Chen et.al.	2601.13508	translate	read	null
2026-01-20	Towards Efficient and Robust Linguistic Emotion Diagnosis for Mental Health via Multi-Agent Instruction Refinement	Jian Zhang et.al.	2601.13481	translate	read	null
2026-01-20	A Unified Variational Imputation Framework for Electric Vehicle Charging Data Using Retrieval-Augmented Language Model	Jinhao Li et.al.	2601.13476	translate	read	null
2026-01-20	Preconditioning Benefits of Spectral Orthogonalization in Muon	Jianhao Ma et.al.	2601.13474	translate	read	null
2026-01-19	PhysicsSolutionAgent: Towards Multimodal Explanations for Numerical Physics Problem Solving	Aditya Thole et.al.	2601.13453	translate	read	null
2026-01-19	Explicit Cognitive Allocation: A Principle for Governed and Auditable Inference in Large Language Models	Héctor Manuel Manzanilla-Granados et.al.	2601.13443	translate	read	null
2026-01-19	Trust Me, I’m an Expert: Decoding and Steering Authority Bias in Large Language Models	Priyanka Mary Mammen et.al.	2601.13433	translate	read	null
2026-01-19	RLBR: Reinforcement Learning with Biasing Rewards for Contextual Speech Large Language Models	Bo Ren et.al.	2601.13409	translate	read	null
2026-01-19	Integrating Virtual Reality and Large Language Models for Team-Based Non-Technical Skills Training and Evaluation in the Operating Room	Jacob Barker et.al.	2601.13406	translate	read	null
2026-01-19	Beyond Memorization: Testing LLM Reasoning on Unseen Theory of Computation Tasks	Shlok Shelat et.al.	2601.13392	translate	read	null
2026-01-19	Structured Insight from Unstructured Data: Large Language Models for SDOH-Driven Diabetes Risk Prediction	Sasha Ronaghi et.al.	2601.13388	translate	read	null
2026-01-19	Confidence over Time: Confidence Calibration with Temporal Logic for Large Language Model Reasoning	Zhenjiang Mao et.al.	2601.13387	translate	read	null
2026-01-19	A Lightweight Modular Framework for Constructing Autonomous Agents Driven by Large Language Models: Design, Implementation, and Applications in AgentForge	Akbar Anbar Jafari et.al.	2601.13383	translate	read	null
2026-01-19	Bounded Minds, Generative Machines: Envisioning Conversational AI that Works with Human Heuristics and Reduces Bias Risk	Jiqun Liu et.al.	2601.13376	translate	read	null
2026-01-19	Recurrent Confidence Chain: Temporal-Aware Uncertainty Quantification in Large Language Models	Zhenjiang Mao et.al.	2601.13368	translate	read	null
2026-01-19	Sockpuppetting: Jailbreaking LLMs Without Optimization Through Output Prefix Injection	Asen Dotsinski et.al.	2601.13359	translate	read	null
2026-01-19	The Geometry of Thought: How Scale Restructures Reasoning In Large Language Models	Samuel Cyrenius Anderson et.al.	2601.13358	translate	read	null
2026-01-19	LLM-as-RNN: A Recurrent Language Model for Memory Updates and Sequence Prediction	Yuxing Lu et.al.	2601.13352	translate	read	null
2026-01-19	FlipFlop: A Static Analysis-based Energy Optimization Framework for GPU Kernels	Saurabhsingh Rajput et.al.	2601.13345	translate	read	null
2026-01-19	Paid Voices vs. Public Feeds: Interpretable Cross-Platform Theme Modeling of Climate Discourse	Samantha Sudhoff et.al.	2601.13317	translate	read	null
2026-01-19	CausalSpatial: A Benchmark for Object-Centric Causal Spatial Reasoning	Wenxin Ma et.al.	2601.13304	translate	read	null
2026-01-19	OI-Bench: An Option Injection Benchmark for Evaluating LLM Susceptibility to Directive Interference	Yow-Fu Liou et.al.	2601.13300	translate	read	null
2026-01-19	Enginuity: Building an Open Multi-Domain Dataset of Complex Engineering Diagrams	Ethan Seefried et.al.	2601.13299	translate	read	null
2026-01-19	The Tag is the Signal: URL-Agnostic Credibility Scoring for Messages on Telegram	Yipeng Wang et.al.	2601.13294	translate	read	null
2026-01-19	Semantic Communication in Underwater IoT Networks for Meaning-Driven Connectivity	Ruhul Amin Khalil et.al.	2601.13289	translate	read	null
2026-01-19	Balancing Classification and Calibration Performance in Decision-Making LLMs via Calibration Aware Reinforcement Learning	Duygu Nur Yaldiz et.al.	2601.13284	translate	read	null
2026-01-19	Improving the Safety and Trustworthiness of Medical AI via Multi-Agent Evaluation Loops	Zainab Ghafoor et.al.	2601.13268	translate	read	null
2026-01-19	Unlearning in LLMs: Methods, Evaluation, and Open Challenges	Tyler Lizzo et.al.	2601.13264	translate	read	null
2026-01-19	CURE-Med: Curriculum-Informed Reinforcement Learning for Multilingual Medical Reasoning	Eric Onyame et.al.	2601.13262	translate	read	null
2026-01-19	Stop Taking Tokenizers for Granted: They Are Core Design Decisions in Large Language Models	Sawsan Alqahtani et.al.	2601.13260	translate	read	null
2026-01-19	Aligning Agentic World Models via Knowledgeable Experience Learning	Baochang Ren et.al.	2601.13247	translate	read	null
2026-01-19	A Comprehensive Evaluation of LLM Reasoning: From Single-Model to Multi-Agent Paradigms	Yapeng Li et.al.	2601.13243	translate	read	null
2026-01-19	KOCO-BENCH: Can Large Language Models Leverage Domain Knowledge in Software Development?	Xue Jiang et.al.	2601.13240	translate	read	null
2026-01-19	GTPred: Benchmarking MLLMs for Interpretable Geo-localization and Time-of-capture Prediction	Jinnao Li et.al.	2601.13207	translate	read	null
2026-01-19	Real-Time Deadlines Reveal Temporal Awareness Failures in LLM Strategic Dialogues	Neil K. R. Sehgal et.al.	2601.13206	translate	read	null
2026-01-19	Scientific production in the era of Large Language Models	Keigo Kusumegi et.al.	2601.13187	translate	read	null
2026-01-19	Prompt Injection Mitigation with Agentic AI, Nested Learning, and AI Sustainability via Semantic Caching	Diego Gosmar et.al.	2601.13186	translate	read	null
2026-01-19	Training instability in deep learning follows low-dimensional dynamical principles	Zhipeng Zhang et.al.	2601.13160	translate	read	null
2026-01-19	Seeing Radio: From Zero RF Priors to Explainable Modulation Recognition with Vision Language Models	Hang Zou et.al.	2601.13157	translate	read	null
2026-01-19	Probe and Skip: Self-Predictive Token Skipping for Efficient Long-Context LLM Inference	Zimeng Wu et.al.	2601.13155	translate	read	null
2026-01-19	FastAV: Efficient Token Pruning for Audio-Visual Large Language Model Inference	Chaeyoung Jung et.al.	2601.13143	translate	read	null
2026-01-19	From Human to Machine Refactoring: Assessing GPT-4’s Impact on Python Class Quality and Readability	Alessandro Midolo et.al.	2601.13139	translate	read	null
2026-01-19	Adversarial Alignment: Ensuring Value Consistency in Large Language Models for Sensitive Domains	Yuan Gao et.al.	2601.13137	translate	read	null
2026-01-19	Guidelines to Prompt Large Language Models for Code Generation: An Empirical Characterization	Alessandro Midolo et.al.	2601.13118	translate	read	null
2026-01-19	Agentic Conversational Search with Contextualized Reasoning via Reinforcement Learning	Fengran Mo et.al.	2601.13115	translate	read	null
2026-01-19	Leveraging Lora Fine-Tuning and Knowledge Bases for Construction Identification	Liu Kaipeng et.al.	2601.13105	translate	read	null
2026-01-19	Alexandria: A Multi-Domain Dialectal Arabic Machine Translation Dataset for Culturally Inclusive and Linguistically Diverse LLMs	Abdellah El Mekki et.al.	2601.13099	translate	read	null
2026-01-19	LLM-VLM Fusion Framework for Autonomous Maritime Port Inspection using a Heterogeneous UAV-USV System	Muhayy Ud Din et.al.	2601.13096	translate	read	null
2026-01-19	Adversarial News and Lost Profits: Manipulating Headlines in LLM-Driven Algorithmic Trading	Advije Rizvani et.al.	2601.13082	translate	read	null
2026-01-19	What’s it like to be a chat? On the co-simulation of artificial minds in human-AI conversations	Geoff Keeling et.al.	2601.13081	translate	read	null
2026-01-19	Profiling German Text Simplification with Interpretable Model-Fingerprints	Lars Klöser et.al.	2601.13050	translate	read	null
2026-01-19	Tears or Cheers? Benchmarking LLMs via Culturally Elicited Distinct Affective Responses	Chongyuan Dai et.al.	2601.13024	translate	read	null
2026-01-19	PASs-MoE: Mitigating Misaligned Co-drift among Router and Experts via Pathway Activation Subspaces for Continual Learning	Zhiyan Hou et.al.	2601.13020	translate	read	null
2026-01-19	MeltRTL: Multi-Expert LLMs with Inference-time Intervention for RTL Code Generation	Nowfel Mashnoor et.al.	2601.13015	translate	read	null
2026-01-19	ArchAgent: Scalable Legacy Software Architecture Recovery with LLMs	Rusheng Pan et.al.	2601.13007	translate	read	null
2026-01-19	Graph Reasoning Paradigm: Structured and Symbolic Reasoning with Topology-Aware Reinforcement Learning for Large Language Models	Runxuan Liu et.al.	2601.12995	translate	read	null
2026-01-19	RAGExplorer: A Visual Analytics System for the Comparative Diagnosis of RAG Systems	Haoyu Tian et.al.	2601.12991	translate	read	null
2026-01-19	PaperGuide: Making Small Language-Model Paper-Reading Agents More Efficient	Zijian Wang et.al.	2601.12988	translate	read	null
2026-01-19	KinGuard: Hierarchical Kinship-Aware Fingerprinting to Defend Against Large Language Model Stealing	Zhenhua Xu et.al.	2601.12986	translate	read	null
2026-01-19	Rules, Resources, and Restrictions: A Taxonomy of Task-Based Information Request Intents	Melanie A. Kilian et.al.	2601.12985	translate	read	null
2026-01-19	ChartAttack: Testing the Vulnerability of LLMs to Malicious Prompting in Chart Generation	Jesus-German Ortiz-Barajas et.al.	2601.12983	translate	read	null
2026-01-19	The Bitter Lesson of Diffusion Language Models for Agentic Workflows: A Comprehensive Reality Check	Qingyu Lu et.al.	2601.12979	translate	read	null
2026-01-19	Bridging the Knowledge-Action Gap by Evaluating LLMs in Dynamic Dental Clinical Scenarios	Hongyang Ma et.al.	2601.12974	translate	read	null
2026-01-19	ACE-Align: Attribute Causal Effect Alignment for Cultural Values under Varying Persona Granularities	Jiatang Luo et.al.	2601.12962	translate	read	null
2026-01-19	Beyond Accuracy: Characterizing Code Comprehension Capabilities in (Large) Language Models	Felix Mächtle et.al.	2601.12951	translate	read	null
2026-01-19	AI-generated data contamination erodes pathological variability and diagnostic reliability	Hongyu He et.al.	2601.12946	translate	read	null
2026-01-19	A Component-Based Survey of Interactions between Large Language Models and Multi-Armed Bandits	Miao Xie et.al.	2601.12945	translate	read	null
2026-01-19	On the Evidentiary Limits of Membership Inference for Copyright Auditing	Murat Bilgehan Ertan et.al.	2601.12937	translate	read	null
2026-01-19	A Benchmark for Language Models in Real-World System Building	Weilin Jin et.al.	2601.12927	translate	read	null
2026-01-19	Dual-Stream Collaborative Transformer for Image Captioning	Jun Wan et.al.	2601.12926	translate	read	null
2026-01-19	Injecting Knowledge from Social Science Journals to Improve Indonesian Cultural Understanding by LLMs	Adimulya Kartiyasa et.al.	2601.12921	translate	read	null
2026-01-19	CooperLLM: Cloud-Edge-End Cooperative Federated Fine-tuning for LLMs via ZOO-based Gradient Correction	He Sun et.al.	2601.12917	translate	read	null
2026-01-19	From Prefix Cache to Fusion RAG Cache: Accelerating LLM Inference in Retrieval-Augmented Generation	Jiahao Wang et.al.	2601.12904	translate	read	null
2026-01-19	Efficient Code Analysis via Graph-Guided Large Language Models	Hang Gao et.al.	2601.12890	translate	read	null
2026-01-19	Race, Ethnicity and Their Implication on Bias in Large Language Models	Shiyue Hu et.al.	2601.12868	translate	read	null
2026-01-19	SCULPT: Constraint-Guided Pruned MCTS that Carves Efficient Paths for Mathematical Reasoning	Qitong Fang et.al.	2601.12842	translate	read	null
2026-01-19	Do Clinical Question Answering Systems Really Need Specialised Medical Fine Tuning?	Sushant Kumar Ray et.al.	2601.12812	translate	read	null
2026-01-19	Semi-supervised Instruction Tuning for Large Language Models on Text-Attributed Graphs	Zixing Song et.al.	2601.12807	translate	read	null
2026-01-19	SciHorizon-GENE: Benchmarking LLM for Life Sciences Inference from Gene Knowledge to Functional Understanding	Xiaohan Huang et.al.	2601.12805	translate	read	null
2026-01-19	VIRO: Robust and Efficient Neuro-Symbolic Reasoning with Verification for Referring Expression Comprehension	Hyejin Park et.al.	2601.12781	translate	read	null
2026-01-19	Who Does This Name Remind You of? Nationality Prediction via Large Language Model Associative Memory	Keito Inoshita et.al.	2601.12771	translate	read	null
2026-01-19	Spatial-VLN: Zero-Shot Vision-and-Language Navigation With Explicit Spatial Perception and Exploration	Lu Yue et.al.	2601.12766	translate	read	null
2026-01-19	Teaching LLMs to Learn Tool Trialing and Execution through Environment Interaction	Xingjie Gao et.al.	2601.12762	translate	read	link
2026-01-19	VISPA: Pluralistic Alignment via Automatic Value Selection and Activation	Shenyan Zheng et.al.	2601.12758	translate	read	null
2026-01-19	PAIR-SAFE: A Paired-Agent Approach for Runtime Auditing and Refining AI-Mediated Mental Health Support	Jiwon Kim et.al.	2601.12754	translate	read	null
2026-01-19	Towards Robust Process Reward Modeling via Noise-aware Learning	Bin Xie et.al.	2601.12748	translate	read	null
2026-01-19	Vision Language Models for Optimization-Driven Intent Processing in Autonomous Networks	Tasnim Ahmed et.al.	2601.12744	translate	read	null
2026-01-19	A Shared Geometry of Difficulty in Multilingual Language Models	Stefano Civelli et.al.	2601.12731	translate	read	null
2026-01-19	Distribution-Centric Policy Optimization Dominates Exploration-Exploitation Trade-off	Zhaochun Li et.al.	2601.12730	translate	read	link
2026-01-19	AI-exhibited Personality Traits Can Shape Human Self-concept through Conversations	Jingshu Li et.al.	2601.12727	translate	read	null
2026-01-19	An Evolutionary Framework for Automatic Optimization Benchmark Generation via Large Language Models	Yuhiro Ono et.al.	2601.12723	translate	read	null
2026-01-19	CellularSpecSec-Bench: A Staged Benchmark for Evidence-Grounded Interpretation and Security Reasoning over 3GPP Specifications	Ke Xie et.al.	2601.12716	translate	read	null
2026-01-19	Neurosymbolic LoRA: Why and When to Tune Weights vs. Rewrite Prompts	Kevin Wang et.al.	2601.12711	translate	read	null
2026-01-19	Improving Audio Question Answering with Variational Inference	Haolin Chen et.al.	2601.12700	translate	read	null
2026-01-19	MetaToolAgent: Towards Generalizable Tool Usage in LLMs through Meta-Learning	Zheng Fang et.al.	2601.12680	translate	read	null
2026-01-19	MedConsultBench: A Full-Cycle, Fine-Grained, Process-Aware Benchmark for Medical Consultation Agents	Chuhan Qiao et.al.	2601.12661	translate	read	null
2026-01-19	Augmenting Question Answering with A Hybrid RAG Approach	Tianyi Yang et.al.	2601.12658	translate	read	null
2026-01-19	Ethical Risks in Deploying Large Language Models: An Evaluation of Medical Ethics Jailbreaking	Chutian Huang et.al.	2601.12652	translate	read	null
2026-01-19	Intelligent Documentation in Medical Education: Can AI Replace Manual Case Logging?	Nafiz Imtiaz Khan et.al.	2601.12648	translate	read	null
2026-01-19	STEP-LLM: Generating CAD STEP Models from Natural Language with Large Language Models	Xiangyu Shi et.al.	2601.12641	translate	read	null
2026-01-19	BioPulse-QA: A Dynamic Biomedical Question-Answering Benchmark for Evaluating Factuality, Robustness, and Bias in Large Language Models	Kriti Bhattarai et.al.	2601.12632	translate	read	null
2026-01-16	Extractive summarization on a CMOS Ising machine	Ziqing Zeng et.al.	2601.11491	translate	read	null
2026-01-16	Health Facility Location in Ethiopia: Leveraging LLMs to Integrate Expert Knowledge into Algorithmic Planning	Yohai Trabelsi et.al.	2601.11479	translate	read	null
2026-01-16	Predict the Retrieval! Test time adaptation for Retrieval Augmented Generation	Xin Sun et.al.	2601.11443	translate	read	null
2026-01-16	Hierarchical Orthogonal Residual Spread for Precise Massive Editing in Large Language Models	Xiaojie Gu et.al.	2601.11441	translate	read	null
2026-01-16	The unreasonable effectiveness of pattern matching	Gary Lupyan et.al.	2601.11432	translate	read	null
2026-01-16	Relational Linearity is a Predictor of Hallucinations	Yuetian Lu et.al.	2601.11429	translate	read	null
2026-01-16	Understanding Help Seeking for Digital Privacy, Safety, and Security	Kurt Thomas et.al.	2601.11398	translate	read	null
2026-01-16	Heterogeneous Uncertainty-Guided Composed Image Retrieval with Fine-Grained Probabilistic Learning	Haomiao Tang et.al.	2601.11393	translate	read	null
2026-01-16	Evaluating LLM Behavior in Hiring: Implicit Weights, Fairness Across Groups, and Alignment with Human Preferences	Morgane Hoffmann et.al.	2601.11379	translate	read	null
2026-01-16	Reward Modeling for Scientific Writing Evaluation	Furkan Şahinuç et.al.	2601.11374	translate	read	null
2026-01-16	RITA: A Tool for Automated Requirements Classification and Specification from Online User Feedback	Manjeshwar Aniruddh Mallya et.al.	2601.11362	translate	read	null
2026-01-16	Think-Clip-Sample: Slow-Fast Frame Selection for Video Understanding	Wenhui Tan et.al.	2601.11359	translate	read	null
2026-01-16	AstroReason-Bench: Evaluating Unified Agentic Planning across Heterogeneous Space Planning Problems	Weiyi Wang et.al.	2601.11354	translate	read	link
2026-01-16	How Much Would a Clinician Edit This Draft? Evaluating LLM Alignment for Patient Message Response Drafting	Parker Seegmiller et.al.	2601.11344	translate	read	null
2026-01-16	Unlocking the Potentials of Retrieval-Augmented Generation for Diffusion Language Models	Chuanyue Yu et.al.	2601.11342	translate	read	null
2026-01-16	Neural Chain-of-Thought Search: Searching the Optimal Reasoning Path to Enhance Large Language Models	Guoming Ling et.al.	2601.11340	translate	read	null
2026-01-16	Idea First, Code Later: Disentangling Problem Solving from Code Generation in Evaluating LLMs for Competitive Programming	Sama Hadhoud et.al.	2601.11332	translate	read	null
2026-01-16	Membership Inference on LLMs in the Wild	Jiatong Yi et.al.	2601.11314	translate	read	null
2026-01-16	FORESTLLM: Large Language Models Make Random Forest Great on Few-shot Tabular Learning	Zhihan Yang et.al.	2601.11311	translate	read	null
2026-01-16	One LLM to Train Them All: Multi-Task Learning Framework for Fact-Checking	Malin Astrid Larsson et.al.	2601.11293	translate	read	null
2026-01-16	Knowledge is Not Enough: Injecting RL Skills for Continual Adaptation	Pingzhi Tang et.al.	2601.11258	translate	read	null
2026-01-16	Reasoning in Trees: Improving Retrieval-Augmented Generation for Multi-Hop Question Answering	Yuling Shi et.al.	2601.11255	translate	read	null
2026-01-16	LLM-Assisted Pseudo-Relevance Feedback	David Otero et.al.	2601.11238	translate	read	null
2026-01-16	How DDAIR you? Disambiguated Data Augmentation for Intent Recognition	Galo Castillo-López et.al.	2601.11234	translate	read	null
2026-01-16	FactCorrector: A Graph-Inspired Approach to Long-Form Factuality Correction of Large Language Models	Javier Carnerero-Cano et.al.	2601.11232	translate	read	null
2026-01-16	Language of Thought Shapes Output Diversity in Large Language Models	Shaoyang Xu et.al.	2601.11227	translate	read	null
2026-01-16	MultiCaption: Detecting disinformation using multilingual visual claims	Rafael Martins Frade et.al.	2601.11220	translate	read	null
2026-01-16	SDFLoRA: Selective Dual-Module LoRA for Federated Fine-tuning with Heterogeneous Clients	Zhikang Shen et.al.	2601.11219	translate	read	null
2026-01-16	FAQ: Mitigating Quantization Error via Regenerating Calibration Data with Family-Aware Quantization	Haiyang Xiao et.al.	2601.11200	translate	read	null
2026-01-16	SD-RAG: A Prompt-Injection-Resilient Framework for Selective Disclosure in Retrieval-Augmented Generation	Aiman Al Masoud et.al.	2601.11199	translate	read	null
2026-01-16	From Knots to Knobs: Towards Steerable Collaborative Filtering Using Sparse Autoencoders	Martin Spišák et.al.	2601.11182	translate	read	null
2026-01-16	Do We Always Need Query-Level Workflows? Rethinking Agentic Workflow Generation for Multi-Agent Systems	Zixu Wang et.al.	2601.11147	translate	read	null
2026-01-16	Learn Before Represent: Bridging Generative and Contrastive Learning for Domain-Specific LLM Embeddings	Xiaoyu Liang et.al.	2601.11124	translate	read	null
2026-01-16	Optimized Algorithms for Text Clustering with LLM-Generated Constraints	Chaoqi Jia et.al.	2601.11118	translate	read	null
2026-01-16	Differentially Private Subspace Fine-Tuning for Large Language Models	Lele Zheng et.al.	2601.11113	translate	read	null
2026-01-16	Simple Models, Rich Representations: Visual Decoding from Primate Intracortical Neural Signals	Matteo Ciferri et.al.	2601.11108	translate	read	null
2026-01-16	ReCreate: Reasoning and Creating Domain Agents Driven by Experience	Zhezheng Hao et.al.	2601.11100	translate	read	null
2026-01-16	Integrity Shield: A System for Ethical AI Use & Authorship Transparency in Assessments	Ashish Raj Shekhar et.al.	2601.11093	translate	read	null
2026-01-16	ABC-Bench: Benchmarking Agentic Backend Coding in Real-World Development	Jie Yang et.al.	2601.11077	translate	read	link
2026-01-16	Visual question answering-based image-finding generation for pulmonary nodules on chest CT from structured annotations	Maiko Nagao et.al.	2601.11075	translate	read	null
2026-01-16	H-AIM: Orchestrating LLMs, PDDL, and Behavior Trees for Hierarchical Multi-Robot Planning	Haishan Zeng et.al.	2601.11063	translate	read	null
2026-01-16	Children’s Expectations, Engagement, and Evaluation of an LLM-enabled Spherical Visualization Platform in the Classroom	Emelie Fälton et.al.	2601.11060	translate	read	null
2026-01-16	Predicting Biased Human Decision-Making with Large Language Models in Conversational Settings	Stephen Pilli et.al.	2601.11049	translate	read	null
2026-01-16	CoG: Controllable Graph Reasoning via Relational Blueprints and Failure-Aware Refinement over Knowledge Graphs	Yuanxiang Liu et.al.	2601.11047	translate	read	null
2026-01-16	AgencyBench: Benchmarking the Frontiers of Autonomous Agents in 1M-Token Real-World Contexts	Keyu Li et.al.	2601.11044	translate	read	link
2026-01-16	Spectral Characterization and Mitigation of Sequential Knowledge Editing Collapse	Chi Zhang et.al.	2601.11042	translate	read	null
2026-01-16	Budget-Aware Anytime Reasoning with LLM-Synthesized Preference Data	Xuanming Zhang et.al.	2601.11038	translate	read	null
2026-01-16	PruneRAG: Confidence-Guided Query Decomposition Trees for Efficient Retrieval-Augmented Generation	Shuguang Jiao et.al.	2601.11024	translate	read	null
2026-01-16	Combating Spurious Correlations in Graph Interpretability via Self-Reflection	Kecheng Cai et.al.	2601.11021	translate	read	null
2026-01-16	Finding the Translation Switch: Discovering and Exploiting the Task-Initiation Features in LLMs	Xinwei Wu et.al.	2601.11019	translate	read	null
2026-01-16	NAACL: Noise-AwAre Verbal Confidence Calibration for LLMs in RAG Systems	Jiayu Liu et.al.	2601.11004	translate	read	null
2026-01-16	Redefining Machine Simultaneous Interpretation: From Incremental Translation to Human-Like Strategies	Qianen Zhang et.al.	2601.11002	translate	read	null
2026-01-16	When Personalization Misleads: Understanding and Mitigating Hallucinations in Personalized LLMs	Zhongxiang Sun et.al.	2601.11000	translate	read	null
2026-01-16	Data-driven Prediction of Ionic Conductivity in Solid-State Electrolytes with Machine Learning and Large Language Models	Haewon Kim et.al.	2601.10997	translate	read	null
2026-01-16	ZPD Detector: Data Selection via Capability-Difficulty Alignment for Large Language Models	Bo Yang et.al.	2601.10986	translate	read	null
2026-01-16	Evaluating 21st-Century Competencies in Postsecondary Curricula with Large Language Models: Performance Benchmarking and Reasoning-Based Prompting Strategies	Zhen Xu et.al.	2601.10983	translate	read	null
2026-01-16	AJAR: Adaptive Jailbreak Architecture for Red-teaming	Yipu Dou et.al.	2601.10971	translate	read	null
2026-01-16	Large Wireless Foundation Models: Stronger over Bigger	Xiang Cheng et.al.	2601.10963	translate	read	null
2026-01-16	Beyond Max Tokens: Stealthy Resource Amplification via Tool Calling Chains in LLM Agents	Kaiyu Zhou et.al.	2601.10955	translate	read	null
2026-01-16	SwiftKV: An Edge-Oriented Attention Algorithm and Multi-Head Accelerator for Fast, Efficient LLM Decoding	Junming Zhang et.al.	2601.10953	translate	read	null
2026-01-16	Multi-Stage Patient Role-Playing Framework for Realistic Clinical Interactions	Shijie Jiang et.al.	2601.10951	translate	read	null
2026-01-16	HOSL: Hybrid-Order Split Learning for Memory-Constrained Edge Training	Aakriti et.al.	2601.10940	translate	read	null
2026-01-15	FrankenMotion: Part-level Human Motion Generation and Composition	Chuqiao Li et.al.	2601.10909	translate	read	link
2026-01-15	Topic Modeling in New Physics Detection	Alexandre Alves et.al.	2601.10871	translate	read	null
2026-01-15	Multi-Agent Taint Specification Extraction for Vulnerability Detection	Jonah Ghebremichael et.al.	2601.10865	translate	read	null
2026-01-15	Reasoning Models Generate Societies of Thought	Junsol Kim et.al.	2601.10825	translate	read	null
2026-01-15	Mugi: Value Level Parallelism For Efficient LLMs	Daniel Price et.al.	2601.10823	translate	read	null
2026-01-15	Digital Metabolism: Decoupling Logic from Facts via Regenerative Unlearning – Towards a Pure Neural Logic Core	Mengmeng Peng et.al.	2601.10810	translate	read	null
2026-01-15	A Concise Agent is Less Expert: Revealing Side Effects of Using Style Features on Conversational Agents	Young-Min Cho et.al.	2601.10809	translate	read	null
2026-01-15	BYOL: Bring Your Own Language Into LLMs	Syed Waqas Zamir et.al.	2601.10804	translate	read	null
2026-01-15	Bidirectional Human-Robot Communication for Physical Human-Robot Interaction	Junxiang Wang et.al.	2601.10796	translate	read	null
2026-01-15	LogicLens: Leveraging Semantic Code Graph to explore Multi Repository large systems	Niko Usai et.al.	2601.10773	translate	read	null
2026-01-15	Unifying Speech Recognition, Synthesis and Conversion with Autoregressive Transformers	Runyuan Cai et.al.	2601.10770	translate	read	null
2026-01-14	Too Helpful to Be Safe: User-Mediated Attacks on Planning and Web-Use Agents	Fengchao Chen et.al.	2601.10758	translate	read	null
2026-01-15	MatchTIR: Fine-Grained Supervision for Tool-Integrated Reasoning via Bipartite Matching	Changle Qu et.al.	2601.10712	translate	read	link
2026-01-15	From One-to-One to Many-to-Many: Dynamic Cross-Layer Injection for Deep Vision-Language Fusion	Cheng Chen et.al.	2601.10710	translate	read	null
2026-01-15	Grounding Agent Memory in Contextual Intent	Ruozhen Yang et.al.	2601.10702	translate	read	null
2026-01-15	LIBERTy: A Causal Framework for Benchmarking Concept-Based Explanations of LLMs with Structural Counterfactuals	Gilat Toker et.al.	2601.10700	translate	read	null
2026-01-15	Structure and Diversity Aware Context Bubble Construction for Enterprise Retrieval Augmented Systems	Amir Khurshid et.al.	2601.10681	translate	read	null
2026-01-15	Are Your Reasoning Models Reasoning or Guessing? A Mechanistic Analysis of Hierarchical Reasoning Models	Zirui Ren et.al.	2601.10679	translate	read	null
2026-01-15	Single-Stage Huffman Encoder for ML Compression	Aditya Agrawal et.al.	2601.10673	translate	read	null
2026-01-15	Detecting Winning Arguments with Large Language Models and Persuasion Strategies	Tiziano Labruna et.al.	2601.10660	translate	read	null
2026-01-15	PACEvolve: Enabling Long-Horizon Progress-Aware Consistent Evolution	Minghao Yan et.al.	2601.10657	translate	read	null
2026-01-15	Influential Training Data Retrieval for Explaining Verbalized Confidence of LLMs	Yuxi Xia et.al.	2601.10645	translate	read	null
2026-01-15	iTIMO: An LLM-empowered Synthesis Dataset for Travel Itinerary Modification	Zhuoxuan Huang et.al.	2601.10609	translate	read	null
2026-01-15	Be Your Own Red Teamer: Safety Alignment via Self-Play and Reflective Experience Replay	Hao Wang et.al.	2601.10589	translate	read	null
2026-01-15	From Single to Multi-Agent Reasoning: Advancing GeneGPT for Genomics QA	Kimia Abedini et.al.	2601.10581	translate	read	null
2026-01-15	Generative AI collective behavior needs an interactionist paradigm	Laura Ferrarotti et.al.	2601.10567	translate	read	null
2026-01-15	Defending Large Language Models Against Jailbreak Attacks via In-Decoding Safety-Awareness Probing	Yinzhi Zhao et.al.	2601.10543	translate	read	null
2026-01-15	A Propagation Framework for Network Regression	Yingying Ma et.al.	2601.10533	translate	read	null
2026-01-15	PERM: Psychology-grounded Empathetic Reward Modeling for Large Language Models	Chengbing Wang et.al.	2601.10532	translate	read	null
2026-01-15	A Safety Report on GPT-5.2, Gemini 3 Pro, Qwen3-VL, Grok 4.1 Fast, Nano Banana Pro, and Seedream 4.5	Xingjun Ma et.al.	2601.10527	translate	read	null
2026-01-15	Diagnosing Generalization Failures in Fine-Tuned LLMs: A Cross-Architectural Study on Phishing Detection	Frank Bobe et.al.	2601.10524	translate	read	null
2026-01-15	DR-Arena: an Automated Evaluation Framework for Deep Research Agents	Yiwen Gao et.al.	2601.10504	translate	read	null
2026-01-15	Projected Microbatch Accumulation yields reference-free proximal policy updates for reinforcement learning	Nilin Abrahamsen et.al.	2601.10498	translate	read	null
2026-01-15	Model See, Model Do? Exposure-Aware Evaluation of Bug-vs-Fix Preference in Code LLMs	Ali Al-Kaswan et.al.	2601.10496	translate	read	null
2026-01-15	ChartComplete: A Taxonomy-based Inclusive Chart Dataset	Ahmad Mustapha et.al.	2601.10462	translate	read	null
2026-01-15	Contextual StereoSet: Stress-Testing Bias Alignment Robustness in Large Language Models	Abhinaba Basu et.al.	2601.10460	translate	read	null
2026-01-15	LangLasso: Interactive Cluster Descriptions through LLM Explanation	Raphael Buchmüller et.al.	2601.10458	translate	read	null
2026-01-15	NSR-Boost: A Neuro-Symbolic Residual Boosting Framework for Industrial Legacy Models	Ziming Dai et.al.	2601.10457	translate	read	null
2026-01-15	Development of Ontological Knowledge Bases by Leveraging Large Language Models	Le Ngoc Luyen et.al.	2601.10436	translate	read	null
2026-01-15	LLMdoctor: Token-Level Flow-Guided Preference Optimization for Efficient Test-Time Alignment of Large Language Models	Tiesunlong Shen et.al.	2601.10416	translate	read	null
2026-01-15	LADFA: A Framework of Using Large Language Models and Retrieval-Augmented Generation for Personal Data Flow Analysis in Privacy Policies	Haiyue Yuan et.al.	2601.10413	translate	read	null
2026-01-15	Toward Ultra-Long-Horizon Agentic Science: Cognitive Accumulation for Machine Learning Engineering	Xinyu Zhu et.al.	2601.10402	translate	read	null
2026-01-15	LatentRefusal: Latent-Signal Refusal for Unanswerable Text-to-SQL Queries	Xuancheng Ren et.al.	2601.10398	translate	read	null
2026-01-15	The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models	Christina Lu et.al.	2601.10387	translate	read	null
2026-01-15	Advanced Manufacturing with Renewable and Bio-based Materials: AI/ML workflows and Process Optimization	Rigoberto Advincula et.al.	2601.10382	translate	read	null
2026-01-15	Fine-Grained Human Pose Editing Assessment via Layer-Selective MLLMs	Ningyu Sun et.al.	2601.10369	translate	read	null
2026-01-15	Unlocking Implicit Experience: Synthesizing Tool-Use Trajectories from Text	Zhihao Xu et.al.	2601.10355	translate	read	null
2026-01-15	SuS: Strategy-aware Surprise for Intrinsic Exploration	Mark Kashirskiy et.al.	2601.10349	translate	read	null
2026-01-15	C-GRASP: Clinically-Grounded Reasoning for Affective Signal Processing	Cheng Lin Cheng et.al.	2601.10342	translate	read	null
2026-01-15	Think-Then-Generate: Reasoning-Aware Text-to-Image Diffusion with LLM Encoders	Siqi Kou et.al.	2601.10332	translate	read	null
2026-01-15	ROMA: Real-time Omni-Multimodal Assistant with Interactive Streaming Understanding	Xueyun Tian et.al.	2601.10323	translate	read	null
2026-01-15	An Efficient Long-Context Ranking Architecture With Calibrated LLM Distillation: Application to Person-Job Fit	Warren Jouanneau et.al.	2601.10321	translate	read	null
2026-01-15	The Straight and Narrow: Do LLMs Possess an Internal Moral Path?	Luoming Hu et.al.	2601.10307	translate	read	null
2026-01-15	DanQing: An Up-to-Date Large-Scale Chinese Vision-Language Pre-training Dataset	Hengyu Shen et.al.	2601.10305	translate	read	null
2026-01-15	Queueing-Aware Optimization of Reasoning Tokens for Accuracy-Latency Trade-offs in LLM Servers	Emre Ozbas et.al.	2601.10274	translate	read	null
2026-01-15	MoST: Mixing Speech and Text with Modality-Aware Mixture of Experts	Yuxuan Lou et.al.	2601.10272	translate	read	null
2026-01-15	In-Context Source and Channel Coding	Ziqiong Wang et.al.	2601.10267	translate	read	null
2026-01-15	NoReGeo: Non-Reasoning Geometry Benchmark	Irina Abdullaeva et.al.	2601.10254	translate	read	null
2026-01-15	Loop as a Bridge: Can Looped Transformers Truly Link Representation Space and Natural Language Outputs?	Guanxu Chen et.al.	2601.10242	translate	read	null
2026-01-15	GeoSteer: Faithful Chain-of-Thought Steering via Latent Manifold Gradients	Kentaro Kazama et.al.	2601.10229	translate	read	null
2026-01-15	Optimizing Multimodal LLMs for Egocentric Video Understanding: A Solution for the HD-EPIC VQA Challenge	Sicheng Yang et.al.	2601.10228	translate	read	null
2026-01-15	PRL: Process Reward Learning Improves LLMs’ Reasoning Ability and Broadens the Reasoning Boundary	Jiarui Yao et.al.	2601.10201	translate	read	null
2026-01-15	HUMANLLM: Benchmarking and Reinforcing LLM Anthropomorphism via Human Cognitive Patterns	Xintao Wang et.al.	2601.10198	translate	read	null
2026-01-15	Autonomous Quantum Simulation through Large Language Model Agents	Weitang Li et.al.	2601.10194	translate	read	null
2026-01-15	GFM4GA: Graph Foundation Model for Group Anomaly Detection	Jiujiu Chen et.al.	2601.10193	translate	read	null
2026-01-15	HOMURA: Taming the Sand-Glass for Time-Constrained LLM Translation via Reinforcement Learning	Ziang Cui et.al.	2601.10187	translate	read	null
2026-01-15	ReasAlign: Reasoning Enhanced Safety Alignment against Prompt Injection Attack	Hao Li et.al.	2601.10173	translate	read	null
2026-01-15	Credit C-GPT: A Domain-Specialized Large Language Model for Conversational Understanding in Vietnamese Debt Collection	Nhung Nguyen Thi Hong et.al.	2601.10167	translate	read	null
2026-01-15	Advancing Adaptive Multi-Stage Video Anomaly Reasoning: A Benchmark Dataset and Method	Chao Huang et.al.	2601.10165	translate	read	null
2026-01-15	AWED-FiNER: Agents, Web applications, and Expert Detectors for Fine-grained Named Entity Recognition across 36 Languages for 6.6 Billion Speakers	Prachuryya Kaushik et.al.	2601.10161	translate	read	link
2026-01-15	LOOKAT: Lookup-Optimized Key-Attention for Memory-Efficient Transformers	Aryan Karmore et.al.	2601.10155	translate	read	null
2026-01-15	DecisionLLM: Large Language Models for Long Sequence Decision Exploration	Xiaowei Lv et.al.	2601.10148	translate	read	null
2026-01-15	Actors, Frames and Arguments: A Multi-Decade Computational Analysis of Climate Discourse in Financial News using Large Language Models	Ruiran Su et.al.	2601.10142	translate	read	null
2026-01-15	Understanding and Preserving Safety in Fine-Tuned LLMs	Jiawen Zhang et.al.	2601.10141	translate	read	null
2026-01-15	Is More Context Always Better? Examining LLM Reasoning Capability for Time Interval Prediction	Yanan Cao et.al.	2601.10132	translate	read	null
2026-01-15	M^4olGen: Multi-Agent, Multi-Stage Molecular Generation under Precise Multi-Property Constraints	Yizhan Li et.al.	2601.10131	translate	read	null
2026-01-15	LaViT: Aligning Latent Visual Thoughts for Multi-modal Reasoning	Linquan Wu et.al.	2601.10129	translate	read	link
2026-01-15	Role-Playing Agents Driven by Large Language Models: Current Status, Challenges, and Future Trends	Ye Wang et.al.	2601.10122	translate	read	null
2026-01-15	Following the Teacher’s Footsteps: Scheduled Checkpoint Distillation for Domain-Specific LLMs	Cheng Feng et.al.	2601.10114	translate	read	null
2026-01-15	SIN-Bench: Tracing Native Evidence Chains in Long-Context Multimodal Scientific Interleaved Literature	Yiming Ren et.al.	2601.10108	translate	read	null
2026-01-15	When Personas Override Payoffs: Role Identity Bias in Multi-Agent LLM Decision-Making	Viswonathan Manoranjan et.al.	2601.10102	translate	read	null
2026-01-15	MATRIX AS PLAN: Structured Logical Reasoning with Feedback-Driven Replanning	Ke Chen et.al.	2601.10101	translate	read	null
2026-01-15	Multilingual-To-Multimodal (M2M): Unlocking New Languages with Monolingual Text	Piyush Singh Pasi et.al.	2601.10096	translate	read	link
2026-01-15	State of AI: An Empirical 100 Trillion Token Study with OpenRouter	Malika Aubakirova et.al.	2601.10088	translate	read	null
2026-01-15	CALM-IT: Generating Realistic Long-Form Motivational Interviewing Dialogues with Dual-Actor Conversational Dynamics Tracking	Viet Cuong Nguyen et.al.	2601.10085	translate	read	null
2026-01-15	Sparse-RL: Breaking the Memory Wall in LLM Reinforcement Learning via Stable Sparse Rollouts	Sijia Luo et.al.	2601.10079	translate	read	null
2026-01-15	Long-Chain Reasoning Distillation via Adaptive Prefix Alignment	Zhenghao Liu et.al.	2601.10064	translate	read	null
2026-01-15	Unlabeled Data Can Provably Enhance In-Context Learning of Transformers	Renpu Liu et.al.	2601.10058	translate	read	null
2026-01-15	Privacy Enhanced PEFT: Tensor Train Decomposition Improves Privacy Utility Tradeoffs under DP-SGD	Pradip Kunwar et.al.	2601.10045	translate	read	null
2026-01-15	Instruction Finetuning LLaMA-3-8B Model Using LoRA for Financial Named Entity Recognition	Zhiming Lian et.al.	2601.10043	translate	read	null
2026-01-15	EmplifAI: a Fine-grained Dataset for Japanese Empathetic Medical Dialogues in 28 Emotion Labels	Wan Jou She et.al.	2601.10033	translate	read	null
2026-01-15	Structured Personality Control and Adaptation for LLM Agents	Jinpeng Wang et.al.	2601.10025	translate	read	null
2026-01-15	Empowering Older Adults in Digital Technology Use with Foundation Models	Hasti Sharifi et.al.	2601.10018	translate	read	null
2026-01-15	VERHallu: Evaluating and Mitigating Event Relation Hallucination in Video Large Language Models	Zefan Zhang et.al.	2601.10010	translate	read	null
2026-01-15	SoK: Privacy-aware LLM in Healthcare: Threat Model, Privacy Techniques, Challenges and Recommendations	Mohoshin Ara Tahera et.al.	2601.10004	translate	read	null
2026-01-15	Towards Native Intelligence: 6G-LLM Trained with Reinforcement Learning from NDT Feedback	Zhuoran Xiao et.al.	2601.09992	translate	read	null
2026-01-15	Context Volume Drives Performance: Tackling Domain Shift in Extremely Low-Resource Translation via RAG	David Samuel Setiawan et.al.	2601.09982	translate	read	null
2026-01-15	DR$^2$Seg: Decomposed Two-Stage Rollouts for Efficient Reasoning Segmentation in Multimodal Large Language Models	Yulin He et.al.	2601.09981	translate	read	null
2026-01-15	Performance of AI agents based on reasoning language models on ALD process optimization tasks	Angel Yanguas-Gil et.al.	2601.09980	translate	read	null
2026-01-15	SPRInG: Continual LLM Personalization via Selective Parametric Adaptation and Retrieval-Interpolated Generation	Seoyeon Kim et.al.	2601.09974	translate	read	null
2026-01-15	Chinese Labor Law Large Language Model Benchmark	Zixun Lan et.al.	2601.09972	translate	read	null
2026-01-15	An Exploratory Study to Repurpose LLMs to a Unified Architecture for Time Series Classification	Hansen He et.al.	2601.09971	translate	read	null
2026-01-15	Take Out Your Calculators: Estimating the Real Difficulty of Question Items with LLM Student Simulations	Christabel Acquaye et.al.	2601.09953	translate	read	null
2026-01-14	How Diplomacy Reshapes Online Discourse: Asymmetric Persistence in Online Framing of North Korea	Hunjun Shin et.al.	2601.09942	translate	read	null
2026-01-14	Hallucination Detection and Mitigation in Large Language Models	Ahmad Pesaranghader et.al.	2601.09929	translate	read	null
2026-01-14	Continuum Memory Architectures for Long-Horizon LLM Agents	Joe Logan et.al.	2601.09913	translate	read	null
2026-01-14	Self-reflection in Automated Qualitative Coding: Improving Text Annotation through Secondary LLM Critique	Zackary Okun Dunivin et.al.	2601.09905	translate	read	null
2026-01-14	Beyond Rule-Based Workflows: An Information-Flow-Orchestrated Multi-Agents Paradigm via Agent-to-Agent Communication from CORAL	Xinxing Ren et.al.	2601.09883	translate	read	null
2026-01-14	MedVL-SAM2: A unified 3D medical vision-language model for multimodal reasoning and prompt-driven segmentation	Yang Xing et.al.	2601.09879	translate	read	null
2026-01-14	Beyond Strict Rules: Assessing the Effectiveness of Large Language Models for Code Smell Detection	Saymon Souza et.al.	2601.09873	translate	read	null
2026-01-14	A Scoping Review of the Ethical Perspectives on Anthropomorphising Large Language Model-Based Conversational Agents	Andrea Ferrario et.al.	2601.09869	translate	read	null
2026-01-14	Advancing Model Refinement: Muon-Optimized Distillation and Quantization for LLM Deployment	Jacob Sander et.al.	2601.09865	translate	read	null
2026-01-14	OUTLINEFORGE: Hierarchical Reinforcement Learning with Explicit States for Scientific Writing	Yilin Bao et.al.	2601.09858	translate	read	null
2026-01-14	MedRedFlag: Investigating how LLMs Redirect Misconceptions in Real-World Health Communication	Sraavya Sambara et.al.	2601.09853	translate	read	null
2026-01-14	Strategies of cooperation and defection in five large language models	Saptarshi Pal et.al.	2601.09849	translate	read	null
2026-01-14	Stable and Explainable Personality Trait Evaluation in Large Language Models with Internal Activations	Xiaoxu Ma et.al.	2601.09833	translate	read	null
2026-01-14	UniHash: Unifying Pointwise and Pairwise Hashing Paradigms for Seen and Unseen Category Retrieval	Xiaoxu Ma et.al.	2601.09828	translate	read	null
2026-01-14	LLM-Based Agentic Systems for Software Engineering: Challenges and Opportunities	Yongjian Tang et.al.	2601.09822	translate	read	null
2026-01-14	Antisocial behavior towards large language model users: experimental evidence	Paweł Niszczota et.al.	2601.09772	translate	read	null
2026-01-14	Explicating Tacit Regulatory Knowledge from LLMs to Auto-Formalize Requirements for Compliance Test Case Generation	Zhiyi Xue et.al.	2601.09762	translate	read	null
2026-01-14	Investigating Tool-Memory Conflicts in Tool-Augmented LLMs	Jiali Cheng et.al.	2601.09760	translate	read	null
2026-01-13	Synthetic Data for Veterinary EHR De-identification: Benefits, Limits, and Safety Trade-offs Under Fixed Compute	David Brundage et.al.	2601.09756	translate	read	null
2026-01-12	SAGE: Tool-Augmented LLM Task Solving Strategies in Scalable Multi-Agent Environments	Robert K. Strehlow et.al.	2601.09750	translate	read	null
2026-01-14	ShortCoder: Knowledge-Augmented Syntax Optimization for Token-Efficient Code Generation	Sicong Liu et.al.	2601.09703	translate	read	null
2026-01-14	How well LLM-based test generation techniques perform with newer LLM versions?	Michael Konstantinou et.al.	2601.09695	translate	read	null
2026-01-14	LLMs can Compress LLMs: Adaptive Pruning by Agents	Sai Varun Kodathala et.al.	2601.09694	translate	read	null
2026-01-14	Routing with Generated Data: Annotation-Free LLM Skill Estimation and Expert Selection	Tianyi Niu et.al.	2601.09692	translate	read	null
2026-01-14	Disentangling Task Conflicts in Multi-Task LoRA via Orthogonal Gradient Projection	Ziyu Yang et.al.	2601.09684	translate	read	null
2026-01-14	Automating Supply Chain Disruption Monitoring via an Agentic AI Approach	Sara AlMahri et.al.	2601.09680	translate	read	null
2026-01-14	LLMs Got Rhythm? Hybrid Phonological Filtering for Greek Poetry Rhyme Detection and Generation	Stergios Chatzikyriakidis et.al.	2601.09631	translate	read	null
2026-01-14	From Prompt to Protocol: Fast Charging Batteries with Large Language Models	Ge Lei et.al.	2601.09626	translate	read	null
2026-01-14	The Promptware Kill Chain: How Prompt Injections Gradually Evolved Into a Multi-Step Malware	Ben Nassi et.al.	2601.09625	translate	read	null
2026-01-14	DPWriter: Reinforcement Learning with Diverse Planning Branching for Creative Writing	Qian Cao et.al.	2601.09609	translate	read	null
2026-01-14	GRCF: Two-Stage Groupwise Ranking and Calibration Framework for Multimodal Sentiment Analysis	Manning Gao et.al.	2601.09606	translate	read	null
2026-01-14	OpenVoxel: Training-Free Grouping and Captioning Voxels for Open-Vocabulary 3D Scene Understanding	Sheng-Yu Huang et.al.	2601.09575	translate	read	null
2026-01-14	Dialogue Telemetry: Turn-Level Instrumentation for Autonomous Information Gathering	Dimitris Panagopoulos et.al.	2601.09570	translate	read	null
2026-01-14	Hot-Start from Pixels: Low-Resolution Visual Tokens for Chinese Language Modeling	Shuyang Xiang et.al.	2601.09566	translate	read	null
2026-01-14	Benchmarking Post-Training Quantization of Large Language Models under Microscaling Floating Point Formats	Manyi Zhang et.al.	2601.09555	translate	read	null
2026-01-14	Omni-R1: Towards the Unified Generative Paradigm for Multimodal Reasoning	Dongjie Cheng et.al.	2601.09536	translate	read	link
2026-01-14	MVSS: A Unified Framework for Multi-View Structured Survey Generation	Yinqi Liu et.al.	2601.09504	translate	read	null
2026-01-14	What Do LLM Agents Know About Their World? Task2Quiz: A Paradigm for Studying Environment Understanding	Siyuan Liu et.al.	2601.09503	translate	read	null
2026-01-14	SlidesGen-Bench: Evaluating Slides Generation via Computational and Quantitative Metrics	Yunqiao Yang et.al.	2601.09487	translate	read	link
2026-01-14	Bridging Semantic Understanding and Popularity Bias with LLMs	Renqiang Luo et.al.	2601.09478	translate	read	null
2026-01-14	SimMerge: Learning to Select Merge Operators from Similarity Signals	Oliver Bolton et.al.	2601.09473	translate	read	null
2026-01-14	Personalized Multimodal Feedback Using Multiple External Representations: Strategy Profiles and Learning in High School Physics	Natalia Revenga-Lozano et.al.	2601.09470	translate	read	null
2026-01-14	Dissecting Judicial Reasoning in U.S. Copyright Damage Awards	Pei-Chi Lo et.al.	2601.09459	translate	read	null
2026-01-14	Population-Aligned Audio Reproduction With LLM-Based Equalizers	Ioannis Stylianou et.al.	2601.09448	translate	read	null
2026-01-14	Improving Symbolic Translation of Language Models for Logical Reasoning	Ramya Keerthy Thatikonda et.al.	2601.09446	translate	read	null
2026-01-14	SC-MAS: Constructing Cost-Efficient Multi-Agent Systems with Edge-Level Heterogeneous Collaboration	Di Zhao et.al.	2601.09434	translate	read	null
2026-01-14	Video-MSR: Benchmarking Multi-hop Spatial Reasoning Capabilities of MLLMs	Rui Zhu et.al.	2601.09430	translate	read	null
2026-01-14	TiInsight: A SQL-based Automated Exploratory Data Analysis System through Large Language Models	Jun-Peng Zhu et.al.	2601.09404	translate	read	null
2026-01-14	Structured Knowledge Representation through Contextual Pages for Retrieval-Augmented Generation	Xinze Li et.al.	2601.09402	translate	read	null
2026-01-14	Ability Transfer and Recovery via Modularized Parameters Localization	Songyao Jin et.al.	2601.09398	translate	read	null
2026-01-14	SLAM-LLM: A Modular, Open-Source Multimodal Large Language Model Framework and Best Practice for Speech, Language, Audio and Music Processing	Ziyang Ma et.al.	2601.09385	translate	read	null
2026-01-14	Long-term Task-oriented Agent: Proactive Long-term Intent Maintenance in Dynamic Environments	Qinglong Shi et.al.	2601.09382	translate	read	null
2026-01-14	The Imperfective Paradox in Large Language Models	Bolei Ma et.al.	2601.09373	translate	read	null
2026-01-14	Relation Extraction Capabilities of LLMs on Clinical Text: A Bilingual Evaluation for English and Turkish	Aidana Aidynkyzy et.al.	2601.09367	translate	read	null
2026-01-14	See More, Store Less: Memory-Efficient Resolution for Video Moment Retrieval	Mingyu Jeon et.al.	2601.09350	translate	read	null
2026-01-14	SpatialJB: How Text Distribution Art Becomes the “Jailbreak Key” for LLM Guardrails	Zhiyi Mou et.al.	2601.09321	translate	read	null
2026-01-14	On-Device Large Language Models for Sequential Recommendation	Xin Xia et.al.	2601.09306	translate	read	null
2026-01-14	Multi-Modal LLM based Image Captioning in ICT: Bridging the Gap Between General and Industry Domain	Lianying Chao et.al.	2601.09298	translate	read	null
2026-01-14	MACRO-LLM: LLM-Empowered Multi-Agent Collaborative Reasoning under Spatiotemporal Partial Observability	Handi Chen et.al.	2601.09295	translate	read	null
2026-01-14	Enhancing Spatial Reasoning in Large Language Models for Metal-Organic Frameworks Structure Prediction	Mianzhi Pan et.al.	2601.09285	translate	read	null
2026-01-14	Cluster Workload Allocation: Semantic Soft Affinity Using Natural Language Processing	Leszek Sliwko et.al.	2601.09282	translate	read	null
2026-01-14	STaR: Sensitive Trajectory Regulation for Unlearning in Large Reasoning Models	Jingjing Zhou et.al.	2601.09281	translate	read	null
2026-01-14	ReGraM: Region-First Knowledge Graph Reasoning for Medical Question Answering	Chaerin Lee et.al.	2601.09280	translate	read	null
2026-01-14	MCGA: A Multi-task Classical Chinese Literary Genre Audio Corpus	Yexing Du et.al.	2601.09270	translate	read	link
2026-01-14	RISER: Orchestrating Latent Reasoning Skills for Adaptive Activation Steering	Wencheng Ye et.al.	2601.09269	translate	read	link
2026-01-14	Coordinated Pandemic Control with Large Language Model Agents as Policymaking Assistants	Ziyi Shi et.al.	2601.09264	translate	read	null
2026-01-14	Efficient Paths and Dense Rewards: Probabilistic Flow Reasoning for Large Language Models	Yan Liu et.al.	2601.09260	translate	read	null
2026-01-14	MAXS: Meta-Adaptive Exploration with LLM Agents	Jian Zhang et.al.	2601.09259	translate	read	link
2026-01-14	When to Invoke: Refining LLM Fairness with Toxicity Assessment	Jing Ren et.al.	2601.09250	translate	read	null
2026-01-14	When to Trust: A Causality-Aware Calibration Framework for Accurate Knowledge Graph Retrieval-Augmented Generation	Jing Ren et.al.	2601.09241	translate	read	null
2026-01-14	DSA-Tokenizer: Disentangled Semantic-Acoustic Tokenization via Flow Matching-based Hierarchical Fusion	Hanlin Zhang et.al.	2601.09239	translate	read	null
2026-01-14	Mikasa: A Character-Driven Emotional AI Companion Inspired by Japanese Oshi Culture	Miki Ueno et.al.	2601.09208	translate	read	null
2026-01-14	ProFit: Leveraging High-Value Signals in SFT via Probability-Guided Token Selection	Tao Liu et.al.	2601.09195	translate	read	link
2026-01-14	OrthoGeoLoRA: Geometric Parameter-Efficient Fine-Tuning for Structured Social Science Concept Retrieval on the Web	Zeqiang Wang et.al.	2601.09185	translate	read	null
2026-01-14	$D^2$Prune: Sparsifying Large Language Models via Dual Taylor Expansion and Attention Distribution Awareness	Lang Xiong et.al.	2601.09176	translate	read	null
2026-01-14	BalDRO: A Distributionally Robust Optimization based Framework for Large Language Model Unlearning	Pengyang Shao et.al.	2601.09172	translate	read	null
2026-01-14	LLMs Meet Isolation Kernel: Lightweight, Learning-free Binary Embeddings for Fast Retrieval	Zhibo Zhang et.al.	2601.09159	translate	read	null
2026-01-14	PrivacyReasoner: Can LLM Emulate a Human-like Privacy Mind?	Yiwen Tu et.al.	2601.09152	translate	read	null
2026-01-14	Interpretable Probability Estimation with LLMs via Shapley Reconstruction	Yang Nan et.al.	2601.09151	translate	read	null
2026-01-14	World Craft: Agentic Framework to Create Visualizable Worlds via Text	Jianwen Sun et.al.	2601.09150	translate	read	null
2026-01-14	Identity-Robust Language Model Generation via Content Integrity Preservation	Miao Zhang et.al.	2601.09141	translate	read	null
2026-01-14	KryptoPilot: An Open-World Knowledge-Augmented LLM Agent for Automated Cryptographic Exploitation	Xiaonan Liu et.al.	2601.09129	translate	read	null
2026-01-14	Contrastive Bi-Encoder Models for Multi-Label Skill Extraction: Enhancing ESCO Ontology Matching with BERT and Attention Mechanisms	Yongming Sun et.al.	2601.09119	translate	read	null
2026-01-14	The AI Hippocampus: How Far are We From Human Memory?	Zixia Jia et.al.	2601.09113	translate	read	null
2026-01-14	Seeking Human Security Consensus: A Unified Value Scale for Generative AI Value Safety	Ying He et.al.	2601.09112	translate	read	null
2026-01-14	DScheLLM: Enabling Dynamic Scheduling through a Fine-Tuned Dual-System Large Language Model	Lixiang Zhang et.al.	2601.09100	translate	read	null
2026-01-14	Programming over Thinking: Efficient and Robust Multi-Constraint Planning	Derrick Goh Xin Deik et.al.	2601.09097	translate	read	null
2026-01-14	Hidden States as Early Signals: Step-level Trace Evaluation and Pruning for Efficient Test-Time Scaling	Zhixiang Liang et.al.	2601.09093	translate	read	null
2026-01-14	SubTokenTest: A Practical Benchmark for Real-World Sub-token Understanding	Shuyang Hou et.al.	2601.09089	translate	read	null
2026-01-14	From Symbolic to Natural-Language Relations: Rethinking Knowledge Graph Construction in the Era of Large Language Models	Kanyao Han et.al.	2601.09069	translate	read	null
2026-01-14	Mi:dm 2.0 Korea-centric Bilingual Language Models	Donghoon Shin et.al.	2601.09066	translate	read	null
2026-01-14	Efficient Multilingual Dialogue Processing via Translation Pipelines and Distilled Language Models	Santiago Martínez Novoa et.al.	2601.09059	translate	read	null
2026-01-14	Evaluating local large language models for structured extraction from endometriosis-specific transvaginal ultrasound reports	Haiyi Li et.al.	2601.09053	translate	read	null
2026-01-14	Is Grokking Worthwhile? Functional Analysis and Transferability of Generalization Circuits in Transformers	Kaiyu He et.al.	2601.09049	translate	read	null
2026-01-14	Horseshoe Mixtures-of-Experts (HS-MoE)	Nick Polson et.al.	2601.09043	translate	read	null
2026-01-14	Can LLMs interpret figurative language as humans do?: surface-level vs representational similarity	Samhita Bollepally et.al.	2601.09041	translate	read	null
2026-01-14	An Information-Theoretic Perspective on LLM Tokenizers	Mete Erdogan et.al.	2601.09039	translate	read	null
2026-01-14	SpectraQuery: A Hybrid Retrieval-Augmented Conversational Assistant for Battery Science	Sreya Vangara et.al.	2601.09036	translate	read	null
2026-01-14	A Decompilation-Driven Framework for Malware Detection with Large Language Models	Aniesh Chawla et.al.	2601.09035	translate	read	null
2026-01-13	The Hierarchy of Agentic Capabilities: Evaluating Frontier Models on Realistic RL Environments	Logan Ritchie et.al.	2601.09032	translate	read	null
2026-01-13	Proactively Detecting Threats: A Novel Approach Using LLMs	Aniesh Chawla et.al.	2601.09029	translate	read	null
2026-01-13	OpenDecoder: Open Large Language Model Decoding to Incorporate Document Quality in RAG	Fengran Mo et.al.	2601.09028	translate	read	null
2026-01-13	Agentic AI and Machine Learning for Accelerated Materials Discovery and Applications	Jihua Chen et.al.	2601.09027	translate	read	null
2026-01-13	Multicultural Spyfall: Assessing LLMs through Dynamic Multilingual Social Deduction Game	Haryo Akbarianto Wibowo et.al.	2601.09017	translate	read	null
2026-01-13	Universal Dynamics of Warmup Stable Decay: understanding WSD beyond Transformers	Annalisa Belloni et.al.	2601.09000	translate	read	null
2026-01-13	Optimising for Energy Efficiency and Performance in Machine Learning	Emile Dos Santos Ferreira et.al.	2601.08991	translate	read	null
2026-01-13	ART: Action-based Reasoning Task Benchmarking for Medical AI Agents	Ananya Mantravadi et.al.	2601.08988	translate	read	null
2026-01-13	Integrating APK Image and Text Data for Enhanced Threat Detection: A Multimodal Deep Learning Approach to Android Malware	Md Mashrur Arifin et.al.	2601.08959	translate	read	null
2026-01-13	Fine Grained Evaluation of LLMs-as-Judges	Sourav Saha et.al.	2601.08919	translate	read	null
2026-01-13	Spectral Generative Flow Models: A Physics-Inspired Replacement for Vectorized Large Language Models	Andrew Kiruluta et.al.	2601.08893	translate	read	null
2026-01-13	Evaluating Role-Consistency in LLMs for Counselor Training	Eric Rudolph et.al.	2601.08892	translate	read	null
2026-01-12	Bridging the Gap: Empowering Small Models in Reliable OpenACC-based Parallelization via GEPA-Optimized Prompting	Samyak Jhaveri et.al.	2601.08884	translate	read	null
2026-01-13	Modeling LLM Agent Reviewer Dynamics in Elo-Ranked Review System	Hsiang-Wei Huang et.al.	2601.08829	translate	read	null
2026-01-13	Reasoning Matters for 3D Visual Grounding	Hsiang-Wei Huang et.al.	2601.08811	translate	read	null
2026-01-13	Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge	Yao Tang et.al.	2601.08808	translate	read	link
2026-01-13	MixServe: An Automatic Distributed Serving System for MoE Models with Hybrid Parallelism Based on Fused Communication Algorithm	Bowen Zhou et.al.	2601.08800	translate	read	null
2026-01-13	Uncovering Political Bias in Large Language Models using Parliamentary Voting Records	Jieying Chen et.al.	2601.08785	translate	read	null
2026-01-13	Asymptotic Universal Alignment: A New Alignment Framework via Test-Time Scaling	Yang Cai et.al.	2601.08777	translate	read	null
2026-01-13	Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs	Zhiyuan Hu et.al.	2601.08763	translate	read	null
2026-01-13	M3CoTBench: Benchmark Chain-of-Thought of MLLMs in Medical Image Understanding	Juntao Jiang et.al.	2601.08758	translate	read	null
2026-01-13	Inferring Latent Intentions: Attributional Natural Language Inference in LLM Agents	Xin Quan et.al.	2601.08742	translate	read	null
2026-01-13	From Rows to Reasoning: A Retrieval-Augmented Multimodal Framework for Spreadsheet Understanding	Anmol Gulati et.al.	2601.08741	translate	read	null
2026-01-13	PrivGemo: Privacy-Preserving Dual-Tower Graph Retrieval for Empowering LLM Reasoning with Memory Augmentation	Xingyu Tan et.al.	2601.08739	translate	read	null
2026-01-13	TerraFormer: Automated Infrastructure-as-Code with LLMs Fine-Tuned via Policy-Guided Verifier Feedback	Prithwish Jana et.al.	2601.08734	translate	read	null
2026-01-13	RAGShaper: Eliciting Sophisticated Agentic RAG Skills via Automated Data Synthesis	Zhengwei Tao et.al.	2601.08699	translate	read	null
2026-01-13	Nationality and Region Prediction from Names: A Comparative Study of Neural Models and Large Language Models	Keito Inoshita et.al.	2601.08692	translate	read	null
2026-01-13	LLMs in Code Vulnerability Analysis: A Proof of Concept	Shaznin Sultana et.al.	2601.08691	translate	read	null
2026-01-13	All Required, In Order: Phase-Level Evaluation for AI-Human Dialogue in Healthcare and Beyond	Shubham Kulkarni et.al.	2601.08690	translate	read	null
2026-01-13	QuantEval: A Benchmark for Financial Quantitative Tasks in Large Language Models	Zhaolu Kang et.al.	2601.08689	translate	read	null
2026-01-13	Advancing ESG Intelligence: An Expert-level Agent and Comprehensive Benchmark for Sustainable Finance	Yilei Zhao et.al.	2601.08676	translate	read	null
2026-01-13	Why AI Alignment Failure Is Structural: Learned Human Interaction Structures and AGI as an Endogenous Evolutionary Shock	Didier Sornette et.al.	2601.08673	translate	read	null
2026-01-13	Analyzing Bias in False Refusal Behavior of Large Language Models for Hate Speech Detoxification	Kyuri Im et.al.	2601.08668	translate	read	null
2026-01-13	Prism: Towards Lowering User Cognitive Load in LLMs via Complex Intent Understanding	Zenghua Liao et.al.	2601.08653	translate	read	null
2026-01-13	Resisting Manipulative Bots in Memecoin Copy Trading: A Multi-Agent Approach with Chain-of-Thought Reasoning	Yichen Luo et.al.	2601.08641	translate	read	null
2026-01-13	Moral Lenses, Political Coordinates: Towards Ideological Positioning of Morally Conditioned LLMs	Chenchen Yuan et.al.	2601.08634	translate	read	null
2026-01-13	How Order-Sensitive Are LLMs? OrderProbe for Deterministic Structural Reconstruction	Yingjie He et.al.	2601.08626	translate	read	null
2026-01-13	Efficient Maintenance of Leiden Communities in Large Dynamic Graphs	Chunxu Lin et.al.	2601.08554	translate	read	null
2026-01-13	Learner-Tailored Program Repair: A Solution Generator with Iterative Edit-Driven Retrieval Enhancement	Zhenlong Dai et.al.	2601.08545	translate	read	null
2026-01-13	Reducing Compute Waste in LLMs through Kernel-Level DVFS	Jeffrey Spaan et.al.	2601.08539	translate	read	null
2026-01-13	Your Group-Relative Advantage Is Biased	Fengkai Yang et.al.	2601.08521	translate	read	null
2026-01-13	Closed-Loop LLM Discovery of Non-Standard Channel Priors in Vision Models	Tolgay Atinc Uzun et.al.	2601.08517	translate	read	null
2026-01-13	Robust CAPTCHA Using Audio Illusions in the Era of Large Language Models: from Evaluation to Advances	Ziqi Ding et.al.	2601.08516	translate	read	null
2026-01-13	What If TSF: A Benchmark for Reframing Forecasting as Scenario-Guided Multimodal Forecasting	Jinkwan Jang et.al.	2601.08509	translate	read	null
2026-01-13	It’s All About the Confidence: An Unsupervised Approach for Multilingual Historical Entity Linking using Large Language Models	Cristian Santini et.al.	2601.08500	translate	read	null
2026-01-13	BenchOverflow: Measuring Overflow in Large Language Models via Plain-Text Prompts	Erin Feiglin et.al.	2601.08490	translate	read	null
2026-01-13	SUMMPILOT: Bridging Efficiency and Customization for Interactive Summarization System	JungMin Yun et.al.	2601.08475	translate	read	null
2026-01-13	sui-1: Grounded and Verifiable Long-Form Summarization	Benedikt Droste et.al.	2601.08472	translate	read	null
2026-01-13	JudgeRLVR: Judge First, Generate Second for Efficient Reasoning	Jiangshan Duo et.al.	2601.08468	translate	read	null
2026-01-13	M3-BENCH: Process-Aware Evaluation of LLM Agents Social Behaviors in Mixed-Motive Games	Sixiong Xie et.al.	2601.08462	translate	read	null
2026-01-13	Beyond Linearization: Attributed Table Graphs for Table Reasoning	Yuxiang Wang et.al.	2601.08444	translate	read	null
2026-01-13	YaPO: Learnable Sparse Activation Steering Vectors for Domain Adaptation	Abdelaziz Bounhar et.al.	2601.08441	translate	read	null
2026-01-13	Incentivizing Cardiologist-Like Reasoning in MLLMs for Interpretable Echocardiographic Diagnosis	Yi Qin et.al.	2601.08440	translate	read	null
2026-01-13	Fine-Mem: Fine-Grained Feedback Alignment for Long-Horizon Memory Management	Weitao Ma et.al.	2601.08435	translate	read	null
2026-01-13	Silence the Judge: Reinforcement Learning with Self-Verifier via Latent Geometric Clustering	Nonghai Zhang et.al.	2601.08427	translate	read	null
2026-01-13	Taxon: Hierarchical Tax Code Prediction with Semantically Aligned LLM Expert Guidance	Jihang Li et.al.	2601.08418	translate	read	null
2026-01-13	Regulatory gray areas of LLM Terms	Brittany I. Davidson et.al.	2601.08415	translate	read	null
2026-01-13	Hybrid Distillation with CoT Guidance for Edge-Drone Control Code Generation	Yizhan Feng et.al.	2601.08412	translate	read	null
2026-01-13	Large Language Models to Enhance Multi-task Drone Operations in Simulated Environments	Yizhan Feng et.al.	2601.08405	translate	read	null
2026-01-13	Owen-Shapley Policy Optimization (OSPO): A Principled RL Algorithm for Generative Search LLMs	Abhijnan Nath et.al.	2601.08403	translate	read	null
2026-01-13	PATS: Personality-Aware Teaching Strategies with Large Language Model Tutors	Donya Rooein et.al.	2601.08402	translate	read	null
2026-01-13	CLaS-Bench: A Cross-Lingual Alignment and Steering Benchmark	Daniil Gurgurov et.al.	2601.08331	translate	read	null
2026-01-13	Deep Exploration of Epoch-wise Double Descent in Noisy Data: Signal Separation, Large Activation, and Benign Overfitting	Tomoki Kubo et.al.	2601.08316	translate	read	null
2026-01-13	Enhancing Image Quality Assessment Ability of LMMs via Retrieval-Augmented Generation	Kang Fu et.al.	2601.08311	translate	read	null
2026-01-13	Enhancing Sentiment Classification and Irony Detection in Large Language Models through Advanced Prompt Engineering Techniques	Marvin Schmitt et.al.	2601.08302	translate	read	null
2026-01-13	Demystifying the Slash Pattern in Attention: The Role of RoPE	Yuan Cheng et.al.	2601.08297	translate	read	null
2026-01-13	KidVis: Do Multimodal Large Language Models Possess the Visual Perceptual Capabilities of a 6-Year-Old?	Xianfeng Wang et.al.	2601.08292	translate	read	null
2026-01-13	OpenMic: A Multi-Agent-Based Stand-Up Comedy Generation System	Yuyang Wu et.al.	2601.08288	translate	read	null
2026-01-13	AgriLens: Semantic Retrieval in Agricultural Texts Using Topic Modeling and Language Models	Heba Shakeel et.al.	2601.08283	translate	read	null
2026-01-13	Discovery and Reinforcement of Tool-Integrated Reasoning Chains via Rollout Trees	Kun Li et.al.	2601.08274	translate	read	null
2026-01-13	HIPPO: Accelerating Video Large Language Models Inference via Holistic-aware Parallel Speculative Decoding	Qitan Lv et.al.	2601.08273	translate	read	null
2026-01-13	Med-CoReasoner: Reducing Language Disparities in Medical Reasoning via Language-Informed Co-Reasoning	Fan Gao et.al.	2601.08267	translate	read	null
2026-01-13	Unleashing Tool Engineering and Intelligence for Agentic AI in Next-Generation Communication Networks	Yinqiu Liu et.al.	2601.08259	translate	read	null
2026-01-13	Large Artificial Intelligence Model Guided Deep Reinforcement Learning for Resource Allocation in Non Terrestrial Networks	Abdikarim Mohamed Ibrahim et.al.	2601.08254	translate	read	null
2026-01-13	Improving Zero-shot ADL Recognition with Large Language Models through Event-based Context and Confidence	Michele Fiori et.al.	2601.08241	translate	read	null
2026-01-13	The End of Reward Engineering: How LLMs Are Redefining Multi-Agent Coordination	Haoran Su et.al.	2601.08237	translate	read	null
2026-01-13	DNF: Dual-Layer Nested Fingerprinting for Large Language Model Intellectual Property Protection	Zhenhua Xu et.al.	2601.08223	translate	read	null
2026-01-13	Generation-Augmented Generation: A Plug-and-Play Framework for Private Knowledge Injection in Large Language Models	Rongji Li et.al.	2601.08209	translate	read	null
2026-01-13	Triplets Better Than Pairs: Towards Stable and Effective Self-Play Fine-Tuning for LLMs	Yibo Wang et.al.	2601.08198	translate	read	null
2026-01-13	Evaluating Implicit Regulatory Compliance in LLM Tool Invocation via Logic-Guided Synthesis	Da Song et.al.	2601.08196	translate	read	null
2026-01-13	Improving LLM Reasoning with Homophily-aware Structural and Semantic Text-Attributed Graph Compression	Zijun Di et.al.	2601.08187	translate	read	null
2026-01-13	GI-Bench: A Panoramic Benchmark Revealing the Knowledge-Experience Dissociation of Multimodal Large Language Models in Gastrointestinal Endoscopy Against Clinical Standards	Yan Zhu et.al.	2601.08183	translate	read	null
2026-01-13	Prompt-Based Clarity Evaluation and Topic Detection in Political Question Answering	Lavanya Prahallad et.al.	2601.08176	translate	read	null
2026-01-13	The Agent’s First Day: Benchmarking Learning, Exploration, and Scheduling in the Workplace Scenarios	Daocheng Fu et.al.	2601.08173	translate	read	null
2026-01-13	Relational Knowledge Distillation Using Fine-tuned Function Vectors	Andrea Kang et.al.	2601.08169	translate	read	null
2026-01-13	WISE-Flow: Workflow-Induced Structured Experience for Self-Evolving Conversational Service Agents	Yuqing Zhou et.al.	2601.08158	translate	read	null
2026-01-13	Where Does Vision Meet Language? Understanding and Refining Visual Fusion in MLLMs via Contrastive Attention	Shezheng Song et.al.	2601.08151	translate	read	null
2026-01-13	Enriching Semantic Profiles into Knowledge Graph for Recommender Systems Using Large Language Models	Seokho Ahn et.al.	2601.08148	translate	read	null
2026-01-13	Qalb: Largest State-of-the-Art Urdu Large Language Model for 230M Speakers with Systematic Continued Pre-training	Muhammad Taimoor Hassan et.al.	2601.08141	translate	read	null
2026-01-13	MirrorBench: An Extensible Framework to Evaluate User-Proxy Agents for Human-Likeness	Ashutosh Hathidara et.al.	2601.08118	translate	read	null
2026-01-13	Coordinated Cooling and Compute Management for AI Datacenters	Nardos Belay Abera et.al.	2601.08113	translate	read	null
2026-01-13	Debiasing Large Language Models via Adaptive Causal Prompting with Sketch-of-Thought	Bowen Li et.al.	2601.08108	translate	read	null
2026-01-13	STO-RL: Offline RL under Sparse Rewards via LLM-Guided Subgoal Temporal Order	Chengyang Gu et.al.	2601.08107	translate	read	null
2026-01-13	AdaJudge: Adaptive Multi-Perspective Judging for Reward Modeling	Yongliang Miao et.al.	2601.08097	translate	read	null
2026-01-13	Q-realign: Piggybacking Realignment on Quantization for Safe and Efficient LLM Deployment	Qitao Tan et.al.	2601.08089	translate	read	null
2026-01-12	MemoBrain: Executive Memory as an Agentic Brain for Reasoning	Hongjin Qian et.al.	2601.08079	translate	read	null
2026-01-12	Semantic Gravity Wells: Why Negative Constraints Backfire	Shailesh Rana et.al.	2601.08070	translate	read	null
2026-01-12	Calibration Is Not Enough: Evaluating Confidence Estimation Under Language Variations	Yuxi Xia et.al.	2601.08064	translate	read	null
2026-01-12	Reasoning Beyond Chain-of-Thought: A Latent Computational Mode in Large Language Models	Zhenghao He et.al.	2601.08058	translate	read	null
2026-01-12	Cognitive Biases in LLM-Assisted Software Development	Xinyi Zhou et.al.	2601.08045	translate	read	null
2026-01-12	Towards Verifiably Safe Tool Use for LLM Agents	Aarya Doshi et.al.	2601.08012	translate	read	null
2026-01-12	LLM Review: Enhancing Creative Writing via Blind Peer Review Feedback	Weiyue Li et.al.	2601.08003	translate	read	null
2026-01-12	Reasoning over Precedents Alongside Statutes: Case-Augmented Deliberative Alignment for LLM Safety	Can Jin et.al.	2601.08000	translate	read	null
2026-01-12	Is Sentiment Banana-Shaped? Exploring the Geometry and Portability of Sentiment Concept Vectors	Laurits Lyngbaek et.al.	2601.07995	translate	read	null
2026-01-12	DYCP: Dynamic Context Pruning for Long-Form Dialogue with LLMs	Nayoung Choi et.al.	2601.07994	translate	read	null
2026-01-12	Fake Date Tests: Can We Trust In-sample Accuracy of LLMs in Macroeconomic Forecasting?	Alexander Eliseev et.al.	2601.07992	translate	read	null
2026-01-12	Multilingual, Multimodal Pipeline for Creating Authentic and Structured Fact-Checked Claim Dataset	Z. Melce Hüsünbeyi et.al.	2601.07985	translate	read	null
2026-01-12	Cost and accuracy of long-term memory in Distributed Multi-Agent Systems based on Large Language Models	Benedict Wolff et.al.	2601.07978	translate	read	null
2026-01-12	Explaining Generalization of AI-Generated Text Detectors Through Linguistic Analysis	Yuxi Xia et.al.	2601.07974	translate	read	null
2026-01-12	Knowing But Not Doing: Convergent Morality and Divergent Action in LLMs	Jen-tse Huang et.al.	2601.07972	translate	read	null
2026-01-12	A Human-Centric Pipeline for Aligning Large Language Models with Chinese Medical Ethics	Haoan Jin et.al.	2601.07954	translate	read	null
2026-01-12	SECite: Analyzing and Summarizing Citations in Software Engineering Literature	Shireesh Reddy Pyreddy et.al.	2601.07939	translate	read	null
2026-01-12	Towards Specialized Generalists: A Multi-Task MoE-LoRA Framework for Domain-Specific LLM Adaptation	Yuxin Yang et.al.	2601.07935	translate	read	null
2026-01-12	Enhancing Large Language Models for Time-Series Forecasting via Vector-Injected In-Context Learning	Jianqi Zhang et.al.	2601.07903	translate	read	null
2026-01-12	SecureCAI: Injection-Resilient LLM Assistants for Cybersecurity Operations	Mohammed Himayath Ali et.al.	2601.07835	translate	read	null
2026-01-12	The Confidence Trap: Gender Bias and Predictive Certainty in LLMs	Ahmed Sabir et.al.	2601.07806	translate	read	null
2026-01-12	Learning Through Dialogue: Unpacking the Dynamics of Human-LLM Conversations on Political Issues	Shaz Furniturewala et.al.	2601.07796	translate	read	null
2026-01-12	Kinship Data Benchmark for Multi-hop Reasoning	Tianda Sun et.al.	2601.07794	translate	read	null
2026-01-12	“TODO: Fix the Mess Gemini Created”: Towards Understanding GenAI-Induced Self-Admitted Technical Debt	Abdullah Al Mujahid et.al.	2601.07786	translate	read	null
2026-01-12	Enhancing Self-Correction in Large Language Models through Multi-Perspective Reflection	Mariana Costa et.al.	2601.07780	translate	read	null
2026-01-12	Are LLM Decisions Faithful to Verbal Confidence?	Jiawei Wang et.al.	2601.07767	translate	read	null
2026-01-12	Structure First, Reason Next: Enhancing a Large Language Model using Knowledge Graph for Numerical Reasoning in Financial Documents	Aryan Mishra et.al.	2601.07754	translate	read	null
2026-01-12	Evaluating the encoding competence of visual language models using uncommon actions	Chen Ling et.al.	2601.07737	translate	read	null
2026-01-12	Is Agentic RAG worth it? An experimental comparison of RAG approaches	Pietro Ferrazzi et.al.	2601.07711	translate	read	null
2026-01-12	Exploring the Meta-level Reasoning of Large Language Models via a Tool-based Multi-hop Tabular Question Answering Task	Nick Ferguson et.al.	2601.07696	translate	read	null
2026-01-12	Adaptive Layer Selection for Layer-Wise Token Pruning in LLM Inference	Rei Taniguchi et.al.	2601.07667	translate	read	null
2026-01-12	Towards Automating Blockchain Consensus Verification with IsabeLLM	Elliot Jones et.al.	2601.07654	translate	read	null
2026-01-12	PlaM: Training-Free Plateau-Guided Model Merging for Better Visual Grounding in MLLMs	Zijing Wang et.al.	2601.07645	translate	read	null
2026-01-12	GeoMotionGPT: Geometry-Aligned Motion Understanding with Large Language Models	Zhankai Ye et.al.	2601.07632	translate	read	null
2026-01-12	Proof of Time: A Benchmark for Evaluating Scientific Idea Judgments	Bingyang Ye et.al.	2601.07606	translate	read	null
2026-01-12	OODEval: Evaluating Large Language Models on Object-Oriented Design	Bingxu Xiao et.al.	2601.07602	translate	read	null
2026-01-12	GRPO with State Mutations: Improving LLM-Based Hardware Test Plan Generation	Dimple Vijay Kochar et.al.	2601.07593	translate	read	null
2026-01-12	Large Language Models for Physics Instrument Design	Sara Zoccheddu et.al.	2601.07580	translate	read	null
2026-01-12	Beyond Entangled Planning: Task-Decoupled Planning for Long-Horizon Agents	Yunfan Li et.al.	2601.07577	translate	read	null
2026-01-12	d3LLM: Ultra-Fast Diffusion LLM using Pseudo-Trajectory Distillation	Yu-Yang Qian et.al.	2601.07568	translate	read	link
2026-01-12	A Unified Framework for Emotion Recognition and Sentiment Analysis via Expert-Guided Multimodal Fusion with Large Language Models	Jiaqi Qiao et.al.	2601.07565	translate	read	null
2026-01-05	Heterogeneous Low-Bandwidth Pre-Training of LLMs	Yazan Obeidi et.al.	2601.02360	translate	read	null
2026-01-05	Robust Persona-Aware Toxicity Detection with Prompt Optimization and Learned Ensembling	Berk Atil et.al.	2601.02337	translate	read	null
2026-01-05	Estimating Text Temperature	Nikolay Mikhaylovskiy et.al.	2601.02320	translate	read	null
2026-01-05	Project Ariadne: A Structural Causal Framework for Auditing Faithfulness in LLM Agents	Sourena Khanzadeh et.al.	2601.02314	translate	read	null
2026-01-05	Placement Semantics for Distributed Deep Learning: A Systematic Framework for Analyzing Parallelism Strategies	Deep Pankajbhai Mehta et.al.	2601.02311	translate	read	null
2026-01-05	Power-of-Two Quantization-Aware-Training (PoT-QAT) in Large Language Models (LLMs)	Mahmoud Elgenedy et.al.	2601.02298	translate	read	null
2026-01-05	CD4LM: Consistency Distillation and aDaptive Decoding for Diffusion Language Models	Yihao Liang et.al.	2601.02236	translate	read	null
2026-01-05	ELLA: Efficient Lifelong Learning for Adapters in Large Language Models	Shristi Das Biswas et.al.	2601.02232	translate	read	null
2026-01-05	From XAI to Stories: A Factorial Study of LLM-Generated Explanation Quality	Fabian Lukassen et.al.	2601.02224	translate	read	null
2026-01-05	CORE: Code-based Inverse Self-Training Framework with Graph Expansion for Virtual Agents	Keyu Wang et.al.	2601.02201	translate	read	null
2026-01-05	Toward Global Large Language Models in Medicine	Rui Yang et.al.	2601.02186	translate	read	null
2026-01-05	Confidence Estimation for LLMs in Multi-turn Interactions	Caiqi Zhang et.al.	2601.02179	translate	read	null
2026-01-05	Streaming Hallucination Detection in Long Chain-of-Thought Reasoning	Haolang Lu et.al.	2601.02170	translate	read	null
2026-01-05	EverMemOS: A Self-Organizing Memory Operating System for Structured Long-Horizon Reasoning	Chuanrui Hu et.al.	2601.02163	translate	read	null
2026-01-05	Routing by Analogy: kNN-Augmented Expert Assignment for Mixture-of-Experts	Boxuan Lyu et.al.	2601.02144	translate	read	null
2026-01-05	Towards Multi-Level Transcript Segmentation: LoRA Fine-Tuning for Table-of-Contents Generation	Steffen Freisinger et.al.	2601.02128	translate	read	null
2026-01-05	DeCode: Decoupling Content and Delivery for Medical QA	Po-Jen Ko et.al.	2601.02123	translate	read	null
2026-01-05	Genie Sim 3.0: A High-Fidelity Comprehensive Simulation Platform for Humanoid Robot	Chenghao Yin et.al.	2601.02078	translate	read	null
2026-01-05	Deferred Commitment Decoding for Diffusion Language Models with Confidence-Aware Sliding Windows	Yingte Shu et.al.	2601.02076	translate	read	null
2026-01-05	MDAgent2: Large Language Model for Code Generation and Knowledge Q&A in Molecular Dynamics	Zhuofan Shi et.al.	2601.02075	translate	read	null
2026-01-05	FormuLLA: A Large Language Model Approach to Generating Novel 3D Printable Formulations	Adeshola Okubena et.al.	2601.02071	translate	read	null
2026-01-05	Cost-Efficient Cross-Lingual Retrieval-Augmented Generation for Low-Resource Languages: A Case Study in Bengali Agricultural Advisory	Md. Asif Hossain et.al.	2601.02065	translate	read	null
2026-01-05	Perish or Flourish? A Holistic Evaluation of Large Language Models for Code Generation in Functional Programming	Nguyet-Anh H. Lang et.al.	2601.02060	translate	read	null
2026-01-05	Output Embedding Centering for Stable LLM Pretraining	Felix Stollenwerk et.al.	2601.02031	translate	read	null
2026-01-05	Not All Needles Are Found: How Fact Distribution and Don’t Make It Up Prompts Shape Literal Extraction, Logical Inference, and Hallucination Risks in Long-Context LLMs	Amirali Ebrahimzadeh et.al.	2601.02023	translate	read	null
2026-01-05	AgentVNE: LLM-Augmented Graph Reinforcement Learning for Affinity-Aware Multi-Agent Placement in Edge Agentic AI	Runze Zheng et.al.	2601.02021	translate	read	null
2026-01-05	Exploring Approaches for Detecting Memorization of Recommender System Data in Large Language Models	Antonio Colacicco et.al.	2601.02002	translate	read	null
2026-01-05	MindChat: A Privacy-preserving Large Language Model for Mental Health Support	Dong Xue et.al.	2601.01993	translate	read	null
2026-01-05	ChaosBench-Logic: A Benchmark for Logical and Symbolic Reasoning on Chaotic Dynamical Systems	Noel Thomas et.al.	2601.01982	translate	read	null
2026-01-05	Reporting LLM Prompting in Automated Software Engineering: A Guideline Based on Current Practices and Expectations	Alexander Korn et.al.	2601.01954	translate	read	null
2026-01-05	MacVQA: Adaptive Memory Allocation and Global Noise Filtering for Continual Visual Question Answering	Zhifei Li et.al.	2601.01926	translate	read	null
2026-01-05	AR-MOT: Autoregressive Multi-object Tracking	Lianjie Jia et.al.	2601.01925	translate	read	null
2026-01-05	TalkPhoto: A Versatile Training-Free Conversational Assistant for Intelligent Image Editing	Yujie Hu et.al.	2601.01915	translate	read	null
2026-01-05	MMP-A*: Multimodal Perception Enhanced Incremental Heuristic Search on Path Planning	Minh Hieu Ha et.al.	2601.01910	translate	read	null
2026-01-05	Tackling the Inherent Difficulty of Noise Filtering in RAG	Jingyu Liu et.al.	2601.01896	translate	read	null
2026-01-05	Agentic AI in Remote Sensing: Foundations, Taxonomy, and Emerging Systems	Niloufar Alipour Talemi et.al.	2601.01891	translate	read	null
2026-01-05	Safety at One Shot: Patching Fine-Tuned LLMs with A Single Instance	Jiawen Zhang et.al.	2601.01887	translate	read	null
2026-01-05	Agentic Memory: Learning Unified Long-Term and Short-Term Memory Management for Large Language Model Agents	Yi Yu et.al.	2601.01885	translate	read	null
2026-01-05	Theory Trace Card: Theory-Driven Socio-Cognitive Evaluation of LLMs	Farzan Karimi-Malekabadi et.al.	2601.01878	translate	read	null
2026-01-05	Toward Auditable Neuro-Symbolic Reasoning in Pathology: SQL as an Explicit Trace of Evidence	Kewen Cao et.al.	2601.01875	translate	read	null
2026-01-05	CogFlow: Bridging Perception and Reasoning through Knowledge Internalization for Visual Mathematical Problem Solving	Shuhang Chen et.al.	2601.01874	translate	read	null
2026-01-05	Entity-Guided Multi-Task Learning for Infrared and Visible Image Fusion	Wenyu Shao et.al.	2601.01870	translate	read	null
2026-01-05	DermoGPT: Open Weights and Open Data for Morphology-Grounded Dermatological Reasoning MLLMs	Jinghan Ru et.al.	2601.01868	translate	read	null
2026-01-05	Judging with Personality and Confidence: A Study on Personality-Conditioned LLM Relevance Assessment	Nuo Chen et.al.	2601.01862	translate	read	null
2026-01-05	Jenius Agent: Towards Experience-Driven Accuracy Optimization in Real-World Scenarios	Defei Xia et.al.	2601.01857	translate	read	null
2026-01-05	MORE: Multi-Objective Adversarial Attacks on Speech Recognition	Xiaoxue Gao et.al.	2601.01852	translate	read	null
2026-01-05	Clinical Knowledge Graph Construction and Evaluation with Multi-LLMs via Retrieval-Augmented Generation	Udiptaman Das et.al.	2601.01844	translate	read	null
2026-01-05	COMPASS: A Framework for Evaluating Organization-Specific Policy Alignment in LLMs	Dasol Choi et.al.	2601.01836	translate	read	null
2026-01-05	Emergent Introspective Awareness in Large Language Models	Jack Lindsey et.al.	2601.01828	translate	read	null
2026-01-05	Aspect Extraction from E-Commerce Product and Service Reviews	Valiant Lance D. Dionela et.al.	2601.01827	translate	read	null
2026-01-05	CSCBench: A PVC Diagnostic Benchmark for Commodity Supply Chain Reasoning	Yaxin Cui et.al.	2601.01825	translate	read	null
2026-01-05	Causality-Aware Temporal Projection for Video Understanding in Video-LLMs	Zhengjian Kang et.al.	2601.01804	translate	read	null
2026-01-05	UnPII: Unlearning Personally Identifiable Information with Quantifiable Exposure Risk	Intae Jeon et.al.	2601.01786	translate	read	null
2026-01-05	LIA: Supervised Fine-Tuning of Large Language Models for Automatic Issue Assignment	Arsham Khosravani et.al.	2601.01780	translate	read	null
2026-01-05	Can Large Language Models Solve Engineering Equations? A Systematic Comparison of Direct Prediction and Solver-Assisted Approaches	Sai Varun Kodathala et.al.	2601.01774	translate	read	null
2026-01-05	Can LLMs Track Their Output Length? A Dynamic Feedback Mechanism for Precise Length Regulation	Meiman Xiao et.al.	2601.01768	translate	read	null
2026-01-05	A New Benchmark for the Appropriate Evaluation of RTL Code Optimization	Yao Lu et.al.	2601.01765	translate	read	null
2026-01-05	Query-Document Dense Vectors for LLM Relevance Judgment Bias Analysis	Samaneh Mohtadi et.al.	2601.01751	translate	read	null
2026-01-05	Yuan3.0 Flash: An Open Multimodal Large Language Model for Enterprise Applications	YuanLab.ai et.al.	2601.01718	translate	read	null
2026-01-05	A Training-Free Large Reasoning Model-based Knowledge Tracing Framework for Unified Prediction and Prescription	Unggi Lee et.al.	2601.01708	translate	read	null
2026-01-04	All-Optical Deep Learning with Quantum Nonlinearity	Qingyi Zhou et.al.	2601.01690	translate	read	null
2026-01-04	Lying with Truths: Open-Channel Multi-Agent Collusion for Belief Manipulation via Generative Montage	Jinwei Hu et.al.	2601.01685	translate	read	null
2026-01-04	Exposing Hidden Interfaces: LLM-Guided Type Inference for Reverse Engineering macOS Private Frameworks	Arina Kharlamova et.al.	2601.01673	translate	read	null
2026-01-04	JMedEthicBench: A Multi-Turn Conversational Benchmark for Evaluating Medical Safety in Japanese Large Language Models	Junyu Liu et.al.	2601.01627	translate	read	null
2026-01-04	Structured Decomposition for LLM Reasoning: Cross-Domain Validation and Semantic Web Integration	Albert Sadowski et.al.	2601.01609	translate	read	null
2026-01-04	OpenRT: An Open-Source Red Teaming Framework for Multimodal LLMs	Xin Wang et.al.	2601.01592	translate	read	null
2026-01-04	The Two-Stage Decision-Sampling Hypothesis: Understanding the Emergence of Self-Reflection in RL-Trained LLMs	Zibo Zhao et.al.	2601.01580	translate	read	null
2026-01-04	CaveAgent: Transforming LLMs into Stateful Runtime Operators	Maohao Ran et.al.	2601.01569	translate	read	null
2026-01-04	MOSS Transcribe Diarize: Accurate Transcription with Speaker Diarization	Donghua Yu et.al.	2601.01554	translate	read	null
2026-01-04	HalluZig: Hallucination Detection using Zigzag Persistence	Shreyas N. Samaga et.al.	2601.01552	translate	read	null
2026-01-04	Improving Behavioral Alignment in LLM Social Simulations via Context Formation and Navigation	Letian Kong et.al.	2601.01546	translate	read	null
2026-01-04	Bridging the Data Gap: Creating a Hindi Text Summarization Dataset from the English XSUM	Praveenkumar Katwe et.al.	2601.01543	translate	read	null
2026-01-04	Bayesian Orchestration of Multi-LLM Agents for Cost-Aware Sequential Decision-Making	Danial Amin et.al.	2601.01522	translate	read	null
2026-01-04	Distortion Instead of Hallucination: The Effect of Reasoning Under Strict Constraints	Junichiro Niimi et.al.	2601.01490	translate	read	null
2026-01-04	Can Legislation Be Made Machine-Readable in PROLEG?	May-Myo Zin et.al.	2601.01477	translate	read	null
2026-01-04	Bridging the gap: A comparative exploration of Speech-LLM and end-to-end architecture for multilingual conversational ASR	Yuxiang Mei et.al.	2601.01461	translate	read	null
2026-01-04	Bayesian Subspace Gradient Estimation for Zeroth-Order Optimization of Large Language Models	Jian Feng et.al.	2601.01452	translate	read	null
2026-01-04	iFlip: Iterative Feedback-driven Counterfactual Example Refinement	Yilong Wang et.al.	2601.01446	translate	read	null
2026-01-04	Personalizing black-box models for nonparametric regression with minimax optimality	Sai Li et.al.	2601.01432	translate	read	null
2026-01-04	From Emotion Classification to Emotional Reasoning: Enhancing Emotional Intelligence in Large Language Models	Arjhun Sreedar et.al.	2601.01407	translate	read	null
2026-01-04	LANCET: Neural Intervention via Structural Entropy for Mitigating Faithfulness Hallucinations in LLMs	Chenxu Wang et.al.	2601.01401	translate	read	null
2026-01-04	EternalMath: A Living Benchmark of Frontier Mathematics that Evolves with Human Discovery	Jicheng Ma et.al.	2601.01400	translate	read	null
2026-01-04	Empowering Small Language Models with Factual Hallucination-Aware Reasoning for Financial Classification	Han Yuan et.al.	2601.01378	translate	read	null
2026-01-04	KGCE: Knowledge-Augmented Dual-Graph Evaluator for Cross-Platform Educational Agent Benchmarking with Multimodal Language Models	Zixian Liu et.al.	2601.01366	translate	read	null
2026-01-04	A unified multimodal understanding and generation model for cross-disciplinary scientific research	Xiaomeng Yang et.al.	2601.01363	translate	read	null
2026-01-04	Investigating the Multilingual Calibration Effects of Language Model Instruction-Tuning	Jerry Huang et.al.	2601.01362	translate	read	null
2026-01-04	Towards LLM-enabled autonomous combustion research: A literature-aware agent for self-corrective modeling workflows	Ke Xiao et.al.	2601.01357	translate	read	null
2026-01-04	Reasoning Over Recall: Evaluating the Efficacy of Generalist Architectures vs. Specialized Fine-Tunes in RAG-Based Mental Health Dialogue Systems	Md Abdullah Al Kafi et.al.	2601.01341	translate	read	null
2026-01-04	FLOP-Efficient Training: Early Stopping Based on Test-Time Compute Awareness	Hossam Amer et.al.	2601.01332	translate	read	null
2026-01-04	Beyond Gemini-3-Pro: Revisiting LLM Routing and Aggregation at Scale	Shengji Tang et.al.	2601.01330	translate	read	null
2026-01-04	Digital Twin AI: Opportunities and Challenges from Large Language Models to World Models	Rong Zhou et.al.	2601.01321	translate	read	null
2026-01-04	Adaptive Hierarchical Evaluation of LLMs and SAST tools for CWE Prediction in Python	Muntasir Adnan et.al.	2601.01320	translate	read	null
2026-01-04	Towards a Principled Muon under $\mu\mathsf{P}$: Ensuring Spectral Conditions throughout Training	John Zhao et.al.	2601.01306	translate	read	null
2026-01-03	Warp-Cortex: An Asynchronous, Memory-Efficient Architecture for Million-Agent Cognitive Scaling on Consumer Hardware	Jorge L. Ruiz Williams et.al.	2601.01298	translate	read	null
2026-01-03	Aggressive Compression Enables LLM Weight Theft	Davis Brown et.al.	2601.01296	translate	read	null
2026-01-03	LLM Collusion	Shengyu Cao et.al.	2601.01279	translate	read	null
2026-01-03	CatchAll: Repository-Aware Exception Handling with Knowledge-Guided LLMs	Qingxiao Tao et.al.	2601.01271	translate	read	null
2026-01-03	From Policy to Logic for Efficient and Interpretable Coverage Assessment	Rhitabrat Pokharel et.al.	2601.01266	translate	read	null
2026-01-03	MambaFormer: Token-Level Guided Routing Mixture-of-Experts for Accurate and Efficient Clinical Assistance	Hamad Khan et.al.	2601.01260	translate	read	null
2026-01-03	Entity-Aware and Secure Query Optimization in Database Using Named Entity Recognition	Azrin Sultana et.al.	2601.01254	translate	read	null
2026-01-03	Racka: Efficient Hungarian LLM Adaptation on Academic Infrastructure	Zsolt Csibi et.al.	2601.01244	translate	read	null
2026-01-03	IO-RAE: Information-Obfuscation Reversible Adversarial Example for Audio Privacy Protection	Jiajie Zhu et.al.	2601.01239	translate	read	null
2026-01-03	Atomizer: An LLM-based Collaborative Multi-Agent Framework for Intent-Driven Commit Untangling	Kangchen Zhu et.al.	2601.01233	translate	read	null
2026-01-03	Correctness isn't Efficiency: Runtime Memory Divergence in LLM-Generated Code	Prateek Rajput et.al.	2601.01215	translate	read	null
2026-01-03	OrchestrRL: Dynamic Compute and Network Orchestration for Disaggregated RL	Xin Tan et.al.	2601.01209	translate	read	null
2026-01-03	EduSim-LLM: An Educational Platform Integrating Large Language Models and Robotic Simulation for Beginners	Shenqi Lu et.al.	2601.01196	translate	read	null
2026-01-03	Reinforcement Learning Enhanced Multi-hop Reasoning for Temporal Knowledge Question Answering	Wuzhenghong Wen et.al.	2601.01195	translate	read	null
2026-01-03	SecureCodeRL: Security-Aware Reinforcement Learning for Code Generation with Partial-Credit Rewards	Suryansh Singh Sijwali et.al.	2601.01184	translate	read	null
2026-01-03	Bridging the Semantic Gap for Categorical Data Clustering via Large Language Models	Zihua Yang et.al.	2601.01162	translate	read	null
2026-01-03	DHI: Leveraging Diverse Hallucination Induction for Enhanced Contrastive Factuality Control in Large Language Models	Jiani Guo et.al.	2601.01156	translate	read	null
2026-01-03	SongSage: A Large Musical Language Model with Lyric Generative Pre-training	Jiani Guo et.al.	2601.01153	translate	read	null
2026-01-03	RovoDev Code Reviewer: A Large-Scale Online Evaluation of LLM-based Code Review Automation at Atlassian	Kla Tantithamthavorn et.al.	2601.01129	translate	read	null
2026-01-03	ScienceDB AI: An LLM-Driven Agentic Recommender System for Large-Scale Scientific Data Sharing Services	Qingqing Long et.al.	2601.01118	translate	read	null
2026-01-03	NarrativeTrack: Evaluating Video Language Models Beyond the Frame	Hyeonjeong Ha et.al.	2601.01095	translate	read	null
2026-01-03	ks-lit-3m: A 3.1 million word kashmiri text dataset for large language model pretraining	Haq Nawaz Malik et.al.	2601.01091	translate	read	null
2026-01-03	Harm in AI-Driven Societies: An Audit of Toxicity Adoption on Chirper.ai	Erica Coppolillo et.al.	2601.01090	translate	read	null
2026-01-03	SPoRC-VIST: A Benchmark for Evaluating Generative Natural Narrative in Vision-Language Models	Yunlin Zeng et.al.	2601.01062	translate	read	null
2026-01-03	A Platform for Interactive AI Character Experiences	Rafael Wampfler et.al.	2601.01027	translate	read	null
2026-01-03	HyperJoin: LLM-augmented Hypergraph Link Prediction for Joinable Table Discovery	Shiyuan Liu et.al.	2601.01015	translate	read	null
2026-01-02	Grain-Aware Data Transformations: Type-Level Formal Verification at Zero Computational Cost	Nikos Karayannidis et.al.	2601.00995	translate	read	null
2026-01-02	Reliability Under Randomness: An Empirical Analysis of Sparse and Dense Language Models Across Decoding Temperatures	Kabir Grover et.al.	2601.00942	translate	read	null
2026-01-02	Emoji-Based Jailbreaking of Large Language Models	M P V S Gopinadh et.al.	2601.00936	translate	read	null
2026-01-02	AI-Guided Computational Design of a Room-Temperature, Ambient-Pressure Superconductor Candidate: Grokene	DEARDAO DeSci Collaborative Team et.al.	2601.00931	translate	read	null
2026-01-02	AlignUSER: Human-Aligned LLM Agents via World Models for Recommender System Evaluation	Nicolas Bougie et.al.	2601.00930	translate	read	null
2026-01-02	Measuring Social Media Polarization Using Large Language Models and Heuristic Rules	Jawad Chowdhury et.al.	2601.00927	translate	read	null
2026-01-01	MACA: A Framework for Distilling Trustworthy LLMs into Efficient Retrievers	Satya Swaroop Gudipudi et.al.	2601.00926	translate	read	null
2026-01-01	Context Collapse: In-Context Learning and Model Collapse	Josef Ott et.al.	2601.00923	translate	read	null
2026-01-01	Attention Needs to Focus: A Unified Perspective on Attention Allocation	Zichuan Fu et.al.	2601.00919	translate	read	null
2026-01-01	The Discovery Gap: How Product Hunt Startups Vanish in LLM Organic Discovery Queries	Amit Prakash Sharma et.al.	2601.00912	translate	read	null
2026-01-02	Geometry of Reason: Spectral Signatures of Valid Mathematical Reasoning	Valentin Noël et.al.	2601.00791	translate	read	null
2026-01-02	Investigating the Viability of Employing Multi-modal Large Language Models in the Context of Audio Deepfake Detection	Akanksha Chuchra et.al.	2601.00777	translate	read	null
2026-01-02	Memory Bank Compression for Continual Adaptation of Large Language Models	Thomas Katraouras et.al.	2601.00756	translate	read	null
2026-01-02	The Reasoning-Creativity Trade-off: Toward Creativity-Driven Problem Solving	Max Ruiz Luyten et.al.	2601.00747	translate	read	null
2026-01-02	Materials Informatics: Emergence To Autonomous Discovery In The Age Of AI	Turab Lookman et.al.	2601.00742	translate	read	null
2026-01-02	Exploring the Performance of Large Language Models on Subjective Span Identification Tasks	Alphaeus Dmonte et.al.	2601.00736	translate	read	null
2026-01-02	Grading Handwritten Engineering Exams with Multimodal Large Language Models	Janez Perš et.al.	2601.00730	translate	read	null
2026-01-02	A Vision-and-Knowledge Enhanced Large Language Model for Generalizable Pedestrian Crossing Behavior Inference	Qingwen Pu et.al.	2601.00694	translate	read	null
2026-01-02	Human-like AI-based Auto-Field-in-Field Whole-Brain Radiotherapy Treatment Planning With Conversation Large Language Model Feedback	Adnan Jafar et.al.	2601.00685	translate	read	null
2026-01-02	QSLM: A Performance- and Memory-aware Quantization Framework with Tiered Search Strategy for Spike-driven Language Models	Rachmad Vidya Wicaksana Putra et.al.	2601.00679	translate	read	null
2026-01-02	Physio-DPO: Aligning Large Language Models with the Protein Energy Landscape to Eliminate Structural Hallucinations	QiWei Meng et.al.	2601.00647	translate	read	null
2026-01-02	FlexSpec: Frozen Drafts Meet Evolving Targets in Edge-Cloud Collaborative LLM Speculative Decoding	Yuchen Li et.al.	2601.00644	translate	read	null
2026-01-02	Probabilistic Guarantees for Reducing Contextual Hallucinations in LLMs	Nils Rautenberg et.al.	2601.00641	translate	read	null
2026-01-02	SEMODS: A Validated Dataset of Open-Source Software Engineering Models	Alexandra González et.al.	2601.00635	translate	read	null
2026-01-02	Do Chatbot LLMs Talk Too Much? The YapBench Benchmark	Vadim Borisov et.al.	2601.00624	translate	read	null
2026-01-02	DA-DPO: Cost-efficient Difficulty-aware Preference Optimization for Reducing MLLM Hallucinations	Longtian Qiu et.al.	2601.00623	translate	read	null
2026-01-02	Beyond IVR: Benchmarking Customer Support LLM Agents for Business-Adherence	Sumanth Balaji et.al.	2601.00596	translate	read	null
2026-01-02	CSSBench: Evaluating the Safety of Lightweight LLMs against Chinese-Specific Adversarial Patterns	Zhenhong Zhou et.al.	2601.00588	translate	read	null
2026-01-02	HFedMoE: Resource-aware Heterogeneous Federated Learning with Mixture-of-Experts	Zihan Fang et.al.	2601.00583	translate	read	null
2026-01-02	The AI Invisibility Effect: Understanding Human-AI Interaction When Users Don’t Recognize Artificial Intelligence	Obada Kraishan et.al.	2601.00579	translate	read	null
2026-01-02	InfoSynth: Information-Guided Benchmark Synthesis for LLMs	Ishir Garg et.al.	2601.00575	translate	read	null
2026-01-02	Improving Scientific Document Retrieval with Academic Concept Index	Jeyun Lee et.al.	2601.00567	translate	read	null
2026-01-02	Low Rank Comes with Low Security: Gradient Assembly Poisoning Attacks against Distributed LoRA-based LLM Systems	Yueyan Dong et.al.	2601.00566	translate	read	null
2026-01-02	Cracking IoT Security: Can LLMs Outsmart Static Analysis Tools?	Jason Quantrill et.al.	2601.00559	translate	read	null
2026-01-01	Improving LLM-Assisted Secure Code Generation through Retrieval-Augmented-Generation and Multi-Tool Feedback	Vidyut Sriram et.al.	2601.00509	translate	read	null
2026-01-01	Rule-Based Approaches to Atomic Sentence Extraction	Lineesha Kamana et.al.	2601.00506	translate	read	null
2026-01-01	MotionPhysics: Learnable Motion Distillation for Text-Guided Simulation	Miaowei Wang et.al.	2601.00504	translate	read	null
2026-01-01	STELLAR: A Search-Based Testing Framework for Large Language Model Applications	Lev Sorokin et.al.	2601.00497	translate	read	null
2026-01-01	Noise-Aware Named Entity Recognition for Historical VET Documents	Alexander M. Esser et.al.	2601.00488	translate	read	null
2026-01-01	Multi-Agent Coordinated Rename Refactoring	Abhiram Bellur et.al.	2601.00482	translate	read	null
2026-01-01	DSL or Code? Evaluating the Quality of LLM-Generated Algebraic Specifications: A Case Study in Optimization at Kinaxis	Negin Ayoughi et.al.	2601.00469	translate	read	null
2026-01-01	Defensive M2S: Training Guardrail Models on Compressed Multi-turn Conversations	Hyunjun Kim et.al.	2601.00454	translate	read	null
2026-01-01	Language as Mathematical Structure: Examining Semantic Field Theory Against Language Games	Dimitris Vartziotis et.al.	2601.00448	translate	read	null
2026-01-01	Toward Better Temporal Structures for Geopolitical Events Forecasting	Kian Ahrabian et.al.	2601.00430	translate	read	null
2026-01-01	Do LLMs Judge Distantly Supervised Named Entity Labels Well? Constructing the JudgeWEL Dataset	Alistair Plum et.al.	2601.00411	translate	read	null
2026-01-01	Vision-Language Reasoning for Geolocalization: A Reinforcement Learning Approach	Biao Wu et.al.	2601.00388	translate	read	null
2026-01-01	The Role of Mixed-Language Documents for Multilingual Large Language Model Pretraining	Jiandong Shao et.al.	2601.00364	translate	read	null
2026-01-01	Robust Uncertainty Quantification for Factual Generation of Large Language Models	Yuhao Zhang et.al.	2601.00348	translate	read	null
2026-01-01	Can Large Language Models Still Explain Themselves? Investigating the Impact of Quantization on Self-Explanations	Qianli Wang et.al.	2601.00282	translate	read	null
2026-01-01	Making Theft Useless: Adulteration-Based Protection of Proprietary Knowledge Graphs in GraphRAG Systems	Weijie Wang et.al.	2601.00274	translate	read	null
2026-01-01	FaithSCAN: Model-Driven Single-Pass Hallucination Detection for Faithful Visual Question Answering	Chaodong Tong et.al.	2601.00269	translate	read	null
2026-01-01	Beyond Perfect APIs: A Comprehensive Evaluation of LLM Agents Under Real-World API Complexity	Doyoung Kim et.al.	2601.00268	translate	read	null
2026-01-01	Parallel Universes, Parallel Languages: A Comprehensive Study on LLM-based Multilingual Counterfactual Example Generation	Qianli Wang et.al.	2601.00263	translate	read	null
2026-01-01	TotalFM: An Organ-Separated Framework for 3D-CT Vision Foundation Models	Kohei Yamamoto et.al.	2601.00260	translate	read	null
2026-01-01	An Empirical Evaluation of LLM-Based Approaches for Code Vulnerability Detection: RAG, SFT, and Dual-Agent Systems	Md Hasan Saju et.al.	2601.00254	translate	read	null
2026-01-01	FlashInfer-Bench: Building the Virtuous Cycle for AI-driven LLM Systems	Shanli Xing et.al.	2601.00227	translate	read	null
2026-01-01	Talk Less, Verify More: Improving LLM Assistants with Semantic Checks and Execution Feedback	Yan Sun et.al.	2601.00224	translate	read	null
2026-01-01	From Evidence-Based Medicine to Knowledge Graph: Retrieval-Augmented Generation for Sports Rehabilitation and a Domain Benchmark	Jinning Zhang et.al.	2601.00216	translate	read	null
2026-01-01	From Sight to Insight: Improving Visual Reasoning Capabilities of Multimodal Models via Reinforcement Learning	Omar Sharif et.al.	2601.00215	translate	read	null
2026-01-01	Overlooked Safety Vulnerability in LLMs: Malicious Intelligent Optimization Algorithm Request and its Jailbreak	Haoran Gu et.al.	2601.00213	translate	read	null
2026-01-01	Knowledge Distillation for Temporal Knowledge Graph Reasoning with Large Language Models	Wang Xing et.al.	2601.00202	translate	read	null
2026-01-01	Pat-DEVAL: Chain-of-Legal-Thought Evaluation for Patent Description	Yongmin Yoo et.al.	2601.00166	translate	read	null
2026-01-01	Combining datasets with different ground truths using Low-Rank Adaptation to generalize image-based CNN models for photometric redshift prediction	Vikram Seenivasan et.al.	2601.00146	translate	read	null
