LLM - 2026-01
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2026-01-30 | FOCUS: DLLMs Know How to Tame Their Compute Bound | Kaihua Liang et.al. | 2601.23278 | translate | read | null |
| 2026-01-30 | UPA: Unsupervised Prompt Agent via Tree-Based Search and Selection | Siran Peng et.al. | 2601.23273 | translate | read | null |
| 2026-01-30 | TEON: Tensorized Orthonormalization Beyond Layer-Wise Muon for Large Language Model Pre-Training | Ruijie Zhang et.al. | 2601.23261 | translate | read | null |
| 2026-01-30 | GrepRAG: An Empirical Study and Optimization of Grep-Like Retrieval for Code Completion | Baoyi Wang et.al. | 2601.23254 | translate | read | null |
| 2026-01-30 | ShotFinder: Imagination-Driven Open-Domain Video Shot Retrieval via Web Search | Tao Yu et.al. | 2601.23232 | translate | read | null |
| 2026-01-30 | Video-o3: Native Interleaved Clue Seeking for Long Video Multi-Hop Reasoning | Xiangyu Zeng et.al. | 2601.23224 | translate | read | null |
| 2026-01-30 | Med-Scout: Curing MLLMs’ Geometric Blindness in Medical Perception via Geometry-Aware RL Post-Training | Anglin Liu et.al. | 2601.23220 | translate | read | null |
| 2026-01-30 | High-quality generation of dynamic game content via small language models: A proof of concept | Morten I. K. Munk et.al. | 2601.23206 | translate | read | null |
| 2026-01-30 | TSAQA: Time Series Analysis Question And Answering Benchmark | Baoyu Jing et.al. | 2601.23204 | translate | read | null |
| 2026-01-30 | Large Language Models for Patent Classification: Strengths, Trade-offs, and the Long Tail Effect | Lorenzo Emer et.al. | 2601.23200 | translate | read | null |
| 2026-01-30 | Deep Search with Hierarchical Meta-Cognitive Monitoring Inspired by Cognitive Neuroscience | Zhongxiang Sun et.al. | 2601.23188 | translate | read | null |
| 2026-01-30 | ReGuLaR: Variational Latent Reasoning Guided by Rendered Chain-of-Thought | Fanmeng Wang et.al. | 2601.23184 | translate | read | link |
| 2026-01-30 | TriSpec: Ternary Speculative Decoding via Lightweight Proxy Verification | Haoyun Jiang et.al. | 2601.23180 | translate | read | null |
| 2026-01-30 | Make Anything Match Your Target: Universal Adversarial Perturbations against Closed-Source MLLMs via Multi-Crop Routed Meta Optimization | Hui Lu et.al. | 2601.23179 | translate | read | null |
| 2026-01-30 | Probing the Trajectories of Reasoning Traces in Large Language Models | Marthe Ballon et.al. | 2601.23163 | translate | read | null |
| 2026-01-30 | DIFFA-2: A Practical Diffusion Large Language Model for General Audio Understanding | Jiaming Zhou et.al. | 2601.23161 | translate | read | link |
| 2026-01-30 | SPICE: Submodular Penalized Information-Conflict Selection for Efficient Large Language Model Training | Powei Chang et.al. | 2601.23155 | translate | read | null |
| 2026-01-30 | Behemoth: Benchmarking Unlearning in LLMs Using Fully Synthetic Data | Eugenia Iofinova et.al. | 2601.23153 | translate | read | null |
| 2026-01-30 | Hearing is Believing? Evaluating and Analyzing Audio Language Model Sycophancy with SYAUDIO | Junchi Yao et.al. | 2601.23149 | translate | read | null |
| 2026-01-30 | RAudit: A Blind Auditing Protocol for Large Language Model Reasoning | Edward Y. Chang et.al. | 2601.23133 | translate | read | null |
| 2026-01-30 | Secure Tool Manifest and Digital Signing Solution for Verifiable MCP and LLM Pipelines | Saeid Jamshidi et.al. | 2601.23132 | translate | read | null |
| 2026-01-30 | An Automatic Deep Learning Approach for Trailer Generation through Large Language Models | Roberto Balestri et.al. | 2601.23121 | translate | read | null |
| 2026-01-30 | CATTO: Balancing Preferences and Confidence in Language Models | Nisarg Parikh et.al. | 2601.23096 | translate | read | null |
| 2026-01-30 | Exploring Sidewalk Sheds in New York City through Chatbot Surveys and Human Computer Interaction | Junyi Li et.al. | 2601.23095 | translate | read | null |
| 2026-01-30 | WiFiPenTester: Advancing Wireless Ethical Hacking with Governed GenAI | Haitham S. Al-Sinani et.al. | 2601.23092 | translate | read | null |
| 2026-01-30 | OrLog: Resolving Complex Queries with LLMs and Probabilistic Reasoning | Mohanna Hoveyda et.al. | 2601.23085 | translate | read | null |
| 2026-01-30 | Character as a Latent Variable in Large Language Models: A Mechanistic Account of Emergent Misalignment and Conditional Safety Failures | Yanghao Su et.al. | 2601.23081 | translate | read | null |
| 2026-01-30 | Towards Explicit Acoustic Evidence Perception in Audio LLMs for Speech Deepfake Detection | Xiaoxuan Guo et.al. | 2601.23066 | translate | read | null |
| 2026-01-30 | HierLoc: Hyperbolic Entity Embeddings for Hierarchical Visual Geolocation | Hari Krishna Gadi et.al. | 2601.23064 | translate | read | null |
| 2026-01-30 | Gender Disparities in StackOverflow’s Community-Based Question Answering: A Matter of Quantity versus Quality | Maddalena Amendola et.al. | 2601.23063 | translate | read | null |
| 2026-01-30 | On the Impact of Code Comments for Automated Bug-Fixing: An Empirical Study | Antonio Vitale et.al. | 2601.23059 | translate | read | null |
| 2026-01-30 | From Absolute to Relative: Rethinking Reward Shaping in Group-Based Reinforcement Learning | Wenzhe Niu et.al. | 2601.23058 | translate | read | null |
| 2026-01-30 | From Abstract to Contextual: What LLMs Still Cannot Do in Mathematics | Bowen Cao et.al. | 2601.23048 | translate | read | null |
| 2026-01-30 | Guided by Trajectories: Repairing and Rewarding Tool-Use Trajectories for Tool-Integrated Reasoning | Siyu Gong et.al. | 2601.23032 | translate | read | null |
| 2026-01-30 | DimABSA: Building Multilingual and Multidomain Datasets for Dimensional Aspect-Based Sentiment Analysis | Lung-Hao Lee et.al. | 2601.23022 | translate | read | null |
| 2026-01-30 | Integrating Multi-Label Classification and Generative AI for Scalable Analysis of User Feedback | Sandra Loop et.al. | 2601.23018 | translate | read | null |
| 2026-01-30 | SolAgent: A Specialized Multi-Agent Framework for Solidity Code Generation | Wei Chen et.al. | 2601.23009 | translate | read | null |
| 2026-01-30 | InstructDiff: Domain-Adaptive Data Selection via Differential Entropy for Efficient LLM Fine-Tuning | Junyou Su et.al. | 2601.23006 | translate | read | null |
| 2026-01-30 | Bias Beyond Borders: Political Ideology Evaluation and Steering in Multilingual LLMs | Afrozah Nadeem et.al. | 2601.23001 | translate | read | null |
| 2026-01-30 | Mano: Restriking Manifold Optimization for LLM Training | Yufei Gu et.al. | 2601.23000 | translate | read | null |
| 2026-01-30 | Competitive Non-Clairvoyant KV-Cache Scheduling for LLM Inference | Yiding Feng et.al. | 2601.22996 | translate | read | null |
| 2026-01-30 | Learnable Permutation for Structured Sparsity on Transformer Models | Zekai Li et.al. | 2601.22980 | translate | read | null |
| 2026-01-30 | Quantifying Model Uniqueness in Heterogeneous AI Ecosystems | Lei You et.al. | 2601.22977 | translate | read | null |
| 2026-01-30 | Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text | Ximing Lu et.al. | 2601.22975 | translate | read | null |
| 2026-01-30 | MiTa: A Hierarchical Multi-Agent Collaboration Framework with Memory-integrated and Task Allocation | XiaoJie Zhang et.al. | 2601.22974 | translate | read | null |
| 2026-01-30 | A Unified View of Attention and Residual Sinks: Outlier-Driven Rescaling is Essential for Transformer Training | Zihan Qiu et.al. | 2601.22966 | translate | read | null |
| 2026-01-30 | SWE-Manager: Selecting and Synthesizing Golden Proposals Before Coding | Boyin Tan et.al. | 2601.22956 | translate | read | null |
| 2026-01-30 | Residual Context Diffusion Language Models | Yuezhou Hu et.al. | 2601.22954 | translate | read | null |
| 2026-01-30 | Sifting the Noise: A Comparative Study of LLM Agents in Vulnerability False Positive Filtering | Yunpeng Xiong et.al. | 2601.22952 | translate | read | null |
| 2026-01-30 | Alignment among Language, Vision and Action Representations | Nicola Milano et.al. | 2601.22948 | translate | read | null |
| 2026-01-30 | Relaxing Positional Alignment in Masked Diffusion Language Models | Mengyu Ye et.al. | 2601.22947 | translate | read | null |
| 2026-01-30 | Protecting Private Code in IDE Autocomplete using Differential Privacy | Evgeny Grigorenko et.al. | 2601.22935 | translate | read | null |
| 2026-01-30 | MTDrive: Multi-turn Interactive Reinforcement Learning for Autonomous Driving | Xidong Li et.al. | 2601.22930 | translate | read | null |
| 2026-01-30 | LLMs Explain’t: A Post-Mortem on Semantic Interpretability in Transformer Models | Alhassan Abdelhalim et.al. | 2601.22928 | translate | read | null |
| 2026-01-30 | BEAR: Towards Beam-Search-Aware Optimization for Recommendation with Large Language Models | Weiqin Yang et.al. | 2601.22925 | translate | read | null |
| 2026-01-30 | Evaluating Large Language Models for Security Bug Report Prediction | Farnaz Soltaniani et.al. | 2601.22921 | translate | read | null |
| 2026-01-30 | LLMDR: Large language model driven framework for missing data recovery in mixed data under low resource regime | Durga Keshav et.al. | 2601.22916 | translate | read | null |
| 2026-01-30 | Game-Theoretic Co-Evolution for LLM-Based Heuristic Discovery | Xinyi Ke et.al. | 2601.22896 | translate | read | null |
| 2026-01-30 | When Machines Get It Wrong: Large Language Models Perpetuate Autism Myths More Than Humans Do | Eduardo C. Garrido-Merchán et.al. | 2601.22893 | translate | read | null |
| 2026-01-30 | MoVE: Mixture of Value Embeddings – A New Axis for Scaling Parametric Memory in Autoregressive Models | Yangyan Li et.al. | 2601.22887 | translate | read | null |
| 2026-01-30 | Leveraging LLMs For Turkish Skill Extraction | Ezgi Arslan İltüzer et.al. | 2601.22885 | translate | read | null |
| 2026-01-30 | EmoShift: Lightweight Activation Steering for Enhanced Emotion-Aware Speech Synthesis | Li Zhou et.al. | 2601.22873 | translate | read | null |
| 2026-01-30 | MEnvAgent: Scalable Polyglot Environment Construction for Verifiable Software Engineering | Chuanzhe Guo et.al. | 2601.22859 | translate | read | null |
| 2026-01-30 | Learning to Build Shapes by Extrusion | Thor Vestergaard Christiansen et.al. | 2601.22858 | translate | read | null |
| 2026-01-30 | Hierarchical Shift Mixing – Beyond Dense Attention in Transformers | Robert Forchheimer et.al. | 2601.22852 | translate | read | null |
| 2026-01-30 | When Meanings Meet: Investigating the Emergence and Quality of Shared Concept Spaces during Multilingual Language Model Training | Felicia Körner et.al. | 2601.22851 | translate | read | null |
| 2026-01-30 | Hide and Seek in Embedding Space: Geometry-based Steganography and Detection in Large Language Models | Charles Westphal et.al. | 2601.22818 | translate | read | null |
| 2026-01-30 | Stable Personas: Dual-Assessment of Temporal Stability in LLM-Based Human Simulation | Jana Gonnermann-Müller et.al. | 2601.22812 | translate | read | null |
| 2026-01-30 | Operational Solar Flare Forecasting System Using an Explainable Large Language Model | Xuebao Li et.al. | 2601.22811 | translate | read | null |
| 2026-01-30 | Clipping-Free Policy Optimization for Large Language Models | Ömer Veysel Çağatan et.al. | 2601.22801 | translate | read | null |
| 2026-01-30 | Sparse or Dense? A Mechanistic Estimation of Computation Density in Transformer-based LLMs | Corentin Kervadec et.al. | 2601.22795 | translate | read | null |
| 2026-01-30 | Understanding on the Edge: LLM-generated Boundary Test Explanations | Sabinakhon Akbarova et.al. | 2601.22791 | translate | read | null |
| 2026-01-30 | Toward IIT-Inspired Consciousness in LLMs: A Reward-Based Learning Framework | Hamid Reza Akbari et.al. | 2601.22786 | translate | read | null |
| 2026-01-30 | Compact Hypercube Embeddings for Fast Text-based Wildlife Observation Retrieval | Ilyass Moummad et.al. | 2601.22783 | translate | read | null |
| 2026-01-30 | Streaming Speech Recognition with Decoder-Only Large Language Models and Latency Optimization | Genshun Wan et.al. | 2601.22779 | translate | read | null |
| 2026-01-30 | RASST: Fast Cross-modal Retrieval-Augmented Simultaneous Speech Translation | Jiaxuan Luo et.al. | 2601.22777 | translate | read | null |
| 2026-01-30 | TSPO: Breaking the Double Homogenization Dilemma in Multi-turn Search Policy Optimization | Shichao Ma et.al. | 2601.22776 | translate | read | null |
| 2026-01-30 | How Far Can Pretrained LLMs Go in Symbolic Music? Controlled Comparisons of Supervised and Preference-based Adaptation | Deepak Kumar et.al. | 2601.22764 | translate | read | null |
| 2026-01-30 | AscendCraft: Automatic Ascend NPU Kernel Generation via DSL-Guided Transcompilation | Zhongzhen Wen et.al. | 2601.22760 | translate | read | null |
| 2026-01-30 | Qualitative Evaluation of LLM-Designed GUI | Bartosz Sawicki et.al. | 2601.22759 | translate | read | null |
| 2026-01-30 | AutoRefine: From Trajectories to Reusable Expertise for Continual LLM Agent Refinement | Libin Qiu et.al. | 2601.22758 | translate | read | null |
| 2026-01-30 | AutoMerge: Search-Based Model Merging Framework for Effective Model Reuse | You Lu et.al. | 2601.22748 | translate | read | null |
| 2026-01-30 | AR-BENCH: Benchmarking Legal Reasoning with Judgment Error Detection, Classification and Correction | Yifei Li et.al. | 2601.22742 | translate | read | null |
| 2026-01-30 | MM-THEBench: Do Reasoning MLLMs Think Reasonably? | Zhidian Huang et.al. | 2601.22735 | translate | read | null |
| 2026-01-30 | ImgCoT: Compressing Long Chain of Thought into Compact Visual Tokens for Efficient Reasoning of Large Language Model | Xiaoshu Chen et.al. | 2601.22730 | translate | read | null |
| 2026-01-30 | A Step Back: Prefix Importance Ratio Stabilizes Policy Optimization | Shiye Lei et.al. | 2601.22718 | translate | read | null |
| 2026-01-30 | RealSec-bench: A Benchmark for Evaluating Secure Code Generation in Real-World Repositories | Yanlin Wang et.al. | 2601.22706 | translate | read | null |
| 2026-01-30 | Models Know Models Best: Evaluation via Model-Preferred Formats | Joonhak Lee et.al. | 2601.22699 | translate | read | null |
| 2026-01-30 | FNF: Functional Network Fingerprint for Large Language Models | Yiheng Liu et.al. | 2601.22692 | translate | read | null |
| 2026-01-30 | Do Transformers Have the Ability for Periodicity Generalization? | Huanyu Liu et.al. | 2601.22690 | translate | read | null |
| 2026-01-30 | BioModelsRAG: A Biological Modeling Assistant Using RAG (Retrieval Augmented Generation) | Bhavyahshree Navaneetha Krishnan et.al. | 2601.22684 | translate | read | null |
| 2026-01-30 | VarParser: Unleashing the Neglected Power of Variables for LLM-based Log Parsing | Jinrui Sun et.al. | 2601.22676 | translate | read | null |
| 2026-01-30 | VisionTrim: Unified Vision Token Compression for Training-Free MLLM Acceleration | Hanxun Yu et.al. | 2601.22674 | translate | read | link |
| 2026-01-30 | Real-Time Aligned Reward Model beyond Semantics | Zixuan Huang et.al. | 2601.22664 | translate | read | null |
| 2026-01-30 | Task-Aware LLM Council with Adaptive Decision Pathways for Decision Support | Wei Zhu et.al. | 2601.22662 | translate | read | null |
| 2026-01-30 | UCPO: Uncertainty-Aware Policy Optimization | Xianzhou Zeng et.al. | 2601.22648 | translate | read | null |
| 2026-01-30 | Beyond Medical Chatbots: Meddollina and the Rise of Continuous Clinical Intelligence | Vaibhav Ram S. V. N. S et.al. | 2601.22645 | translate | read | null |
| 2026-01-30 | Pushing the Boundaries of Natural Reasoning: Interleaved Bonus from Formal-Logic Verification | Chuxue Cao et.al. | 2601.22642 | translate | read | null |
| 2026-01-30 | Statistical Estimation of Adversarial Risk in Large Language Models under Best-of-N Sampling | Mingqian Feng et.al. | 2601.22636 | translate | read | null |
| 2026-01-30 | MCP-Diag: A Deterministic, Protocol-Driven Architecture for AI-Native Network Diagnostics | Devansh Lodha et.al. | 2601.22633 | translate | read | null |
| 2026-01-30 | DART-ing Through the Drift: Dynamic Tracing of Knowledge Neurons for Adaptive Inference-Time Pruning | Abhishek Tyagi et.al. | 2601.22632 | translate | read | null |
| 2026-01-30 | Time-Annealed Perturbation Sampling: Diverse Generation for Diffusion Language Models | Jingxuan Wu et.al. | 2601.22629 | translate | read | null |
| 2026-01-30 | TTCS: Test-Time Curriculum Synthesis for Self-Evolving | Chengyi Yang et.al. | 2601.22628 | translate | read | link |
| 2026-01-30 | SYMPHONY: Synergistic Multi-agent Planning with Heterogeneous Language Model Assembly | Wei Zhu et.al. | 2601.22623 | translate | read | null |
| 2026-01-30 | Ethical Risks of Large Language Models in Medical Consultation: An Assessment Based on Reproductive Ethics | Hanhui Xu et.al. | 2601.22621 | translate | read | null |
| 2026-01-30 | Layer-wise Swapping for Generalizable Multilingual Safety | Hyunseo Shin et.al. | 2601.22620 | translate | read | null |
| 2026-01-30 | Learn More with Less: Uncertainty Consistency Guided Query Selection for RLVR | Hao Yi et.al. | 2601.22595 | translate | read | null |
| 2026-01-30 | Small is Beautiful: A Practical and Efficient Log Parsing Framework | Minxing Wang et.al. | 2601.22590 | translate | read | null |
| 2026-01-30 | Rethinking LLM-as-a-Judge: Representation-as-a-Judge with Small Language Models via Semantic Capacity Asymmetry | Zhuochun Li et.al. | 2601.22588 | translate | read | null |
| 2026-01-30 | HetCCL: Accelerating LLM Training with Heterogeneous GPUs | Heehoon Kim et.al. | 2601.22585 | translate | read | null |
| 2026-01-30 | SpanNorm: Reconciling Training Stability and Performance in Deep Transformers | Chao Wang et.al. | 2601.22580 | translate | read | null |
| 2026-01-30 | PhoStream: Benchmarking Real-World Streaming for Omnimodal Assistants in Mobile Scenarios | Xudong Lu et.al. | 2601.22575 | translate | read | null |
| 2026-01-30 | Mitigating Hallucinations in Video Large Language Models via Spatiotemporal-Semantic Contrastive Decoding | Yuansheng Gao et.al. | 2601.22574 | translate | read | null |
| 2026-01-30 | PerfGuard: A Performance-Aware Agent for Visual Content Generation | Zhipeng Chen et.al. | 2601.22571 | translate | read | null |
| 2026-01-30 | Leveraging Data to Say No: Memory Augmented Plug-and-Play Selective Prediction | Aditya Sarkar et.al. | 2601.22570 | translate | read | null |
| 2026-01-30 | Whispers of Wealth: Red-Teaming Google’s Agent Payments Protocol via Prompt Injection | Tanusree Debi et.al. | 2601.22569 | translate | read | null |
| 2026-01-30 | Are LLM Evaluators Really Narcissists? Sanity Checking Self-Preference Evaluations | Dani Roytburg et.al. | 2601.22548 | translate | read | null |
| 2026-01-30 | Towards the Holographic Characteristic of LLMs for Efficient Short-text Generation | Shun Qian et.al. | 2601.22546 | translate | read | null |
| 2026-01-30 | SCaLRec: Semantic Calibration for LLM-enabled Cloud-Device Sequential Recommendation | Ruiqi Zheng et.al. | 2601.22543 | translate | read | null |
| 2026-01-30 | Decoding in Geometry: Alleviating Embedding-Space Crowding for Complex Reasoning | Yixin Yang et.al. | 2601.22536 | translate | read | null |
| 2026-01-30 | Darwinian Memory: A Training-Free Self-Regulating Memory System for GUI Agent Evolution | Hongze Mi et.al. | 2601.22528 | translate | read | null |
| 2026-01-30 | $ρ$-$\texttt{EOS}$ : Training-free Bidirectional Variable-Length Control for Masked Diffusion LLMs | Jingyi Yang et.al. | 2601.22527 | translate | read | null |
| 2026-01-30 | Shattered Compositionality: Counterintuitive Learning Dynamics of Transformers for Arithmetic | Xingyu Zhao et.al. | 2601.22510 | translate | read | null |
| 2026-01-30 | FraudShield: Knowledge Graph Empowered Defense for LLMs against Fraud Attacks | Naen Xu et.al. | 2601.22485 | translate | read | null |
| 2026-01-30 | Head-Aware Visual Cropping: Enhancing Fine-Grained VQA with Attention-Guided Subimage | Junfei Xie et.al. | 2601.22483 | translate | read | null |
| 2026-01-30 | Transform-Augmented GRPO Improves Pass@k | Khiem Le et.al. | 2601.22478 | translate | read | null |
| 2026-01-30 | Unrewarded Exploration in Large Language Models Reveals Latent Learning from Psychology | Jian Xiong et.al. | 2601.22474 | translate | read | null |
| 2026-01-30 | Toward Non-Expert Customized Congestion Control | Mingrui Zhang et.al. | 2601.22461 | translate | read | null |
| 2026-01-30 | ScribbleSense: Generative Scribble-Based Texture Editing with Intent Prediction | Yudi Zhang et.al. | 2601.22455 | translate | read | null |
| 2026-01-30 | Does My Chatbot Have an Agenda? Understanding Human and AI Agency in Human-Human-like Chatbot Interaction | Bhada Yun et.al. | 2601.22452 | translate | read | null |
| 2026-01-30 | Countering the Over-Reliance Trap: Mitigating Object Hallucination for LVLMs via a Self-Validation Framework | Shiyu Liu et.al. | 2601.22451 | translate | read | null |
| 2026-01-30 | Towards Resiliency in Large Language Model Serving with KevlarFlow | Shangshu Qian et.al. | 2601.22438 | translate | read | null |
| 2026-01-30 | Large Language Model Agents Are Not Always Faithful Self-Evolvers | Weixiang Zhao et.al. | 2601.22436 | translate | read | null |
| 2026-01-30 | When LLM meets Fuzzy-TOPSIS for Personnel Selection through Automated Profile Analysis | Shahria Hoque et.al. | 2601.22433 | translate | read | null |
| 2026-01-30 | ScamPilot: Simulating Conversations with LLMs to Protect Against Online Scams | Owen Hoffman et.al. | 2601.22426 | translate | read | null |
| 2026-01-29 | Bifocal Attention: Harmonizing Geometric and Spectral Positional Embeddings for Algorithmic Generalization | Kanishk Awadhiya et.al. | 2601.22402 | translate | read | null |
| 2026-01-29 | Jailbreaks on Vision Language Model via Multimodal Reasoning | Aarush Noheria et.al. | 2601.22398 | translate | read | null |
| 2026-01-29 | Culturally Grounded Personas in Large Language Models: Characterization and Alignment with Socio-Psychological Value Frameworks | Candida M. Greco et.al. | 2601.22396 | translate | read | null |
| 2026-01-29 | Specialists or Generalists? Multi-Agent and Single-Agent LLMs for Essay Grading | Jamiu Adekunle Idowu et.al. | 2601.22386 | translate | read | null |
| 2026-01-29 | Purely Agentic Black-Box Optimization for Biological Design | Natalie Maus et.al. | 2601.22382 | translate | read | null |
| 2026-01-29 | Stability-Aware Prompt Optimization for Clinical Data Abstraction | Arinbjörn Kolbeinsson et.al. | 2601.22373 | translate | read | null |
| 2026-01-29 | Towards Solving the Gilbert-Pollak Conjecture via Large Language Models | Yisi Ke et.al. | 2601.22365 | translate | read | null |
| 2026-01-29 | Context Structure Reshapes the Representational Geometry of Language Models | Eghbal A. Hosseini et.al. | 2601.22364 | translate | read | null |
| 2026-01-29 | Understanding Efficiency: Quantization, Batching, and Serving Strategies in LLM Energy Use | Julien Delavande et.al. | 2601.22362 | translate | read | null |
| 2026-01-29 | MERMAID: Memory-Enhanced Retrieval and Reasoning with Multi-Agent Iterative Knowledge Grounding for Veracity Assessment | Yupeng Cao et.al. | 2601.22361 | translate | read | null |
| 2026-01-29 | Small Talk, Big Impact: The Energy Cost of Thanking AI | Julien Delavande et.al. | 2601.22357 | translate | read | null |
| 2026-01-29 | Sparks of Rationality: Do Reasoning LLMs Align with Human Judgment and Choice? | Ala N. Tak et.al. | 2601.22329 | translate | read | null |
| 2026-01-29 | Federate the Router: Learning Language Model Routers with Sparse and Decentralized Evaluations | Baris Askin et.al. | 2601.22318 | translate | read | null |
| 2026-01-29 | Gaussian Process Bandit Optimization with Machine Learning Predictions and Application to Hypothesis Generation | Xin Jennifer Chen et.al. | 2601.22315 | translate | read | null |
| 2026-01-29 | Hair-Trigger Alignment: Black-Box Evaluation Cannot Guarantee Post-Update Alignment | Yavuz Bakman et.al. | 2601.22313 | translate | read | null |
| 2026-01-29 | SCALAR: Quantifying Structural Hallucination, Consistency, and Reasoning Gaps in Materials Foundation Models | Can Polat et.al. | 2601.22312 | translate | read | null |
| 2026-01-29 | Why Reasoning Fails to Plan: A Planning-Centric Analysis of Long-Horizon Decision Making in LLM Agents | Zehong Wang et.al. | 2601.22311 | translate | read | null |
| 2026-01-29 | Prepare Reasoning Language Models for Multi-Agent Debate with Self-Debate Reinforcement Learning | Chenxi Liu et.al. | 2601.22297 | translate | read | null |
| 2026-01-29 | The Six Sigma Agent: Achieving Enterprise-Grade Reliability in LLM Systems Through Consensus-Driven Decomposed Execution | Khush Patel et.al. | 2601.22290 | translate | read | null |
| 2026-01-29 | FunPRM: Function-as-Step Process Reward Model with Meta Reward Correction for Code Generation | Ruiyi Zhang et.al. | 2601.22249 | translate | read | null |
| 2026-01-29 | MirrorMark: A Distortion-Free Multi-Bit Watermark for Large Language Models | Ya Jiang et.al. | 2601.22246 | translate | read | null |
| 2026-01-29 | A Systematic Literature Review on LLM Defenses Against Prompt Injection and Jailbreaking: Expanding NIST Taxonomy | Pedro H. Barcha Correia et.al. | 2601.22240 | translate | read | null |
| 2026-01-29 | What Lies Beneath: A Call for Distribution-based Visual Question & Answer Datasets | Jill P. Naiman et.al. | 2601.22218 | translate | read | null |
| 2026-01-29 | Stalled, Biased, and Confused: Uncovering Reasoning Failures in LLMs for Cloud-Based Root Cause Analysis | Evelien Riddell et.al. | 2601.22208 | translate | read | null |
| 2026-01-28 | Tacit Coordination of Large Language Models | Ido Aharon et.al. | 2601.22184 | translate | read | null |
| 2026-01-29 | UEval: A Benchmark for Unified Multimodal Generation | Bo Li et.al. | 2601.22155 | translate | read | null |
| 2026-01-29 | DynaWeb: Model-Based Reinforcement Learning of Web Agents | Hang Ding et.al. | 2601.22149 | translate | read | null |
| 2026-01-29 | FineInstructions: Scaling Synthetic Instructions to Pre-Training Scale | Ajay Patel et.al. | 2601.22146 | translate | read | null |
| 2026-01-29 | Reasoning While Asking: Transforming Reasoning Large Language Models from Passive Solvers to Proactive Inquirers | Xin Chen et.al. | 2601.22139 | translate | read | null |
| 2026-01-29 | Pay for Hints, Not Answers: LLM Shepherding for Cost-Efficient Inference | Ziming Dong et.al. | 2601.22132 | translate | read | link |
| 2026-01-29 | World of Workflows: a Benchmark for Bringing World Models to Enterprise Systems | Lakshya Gupta et.al. | 2601.22130 | translate | read | null |
| 2026-01-29 | SWE-Replay: Efficient Test-Time Scaling for Software Engineering Agents | Yifeng Ding et.al. | 2601.22129 | translate | read | null |
| 2026-01-29 | The Patient is not a Moving Document: A World Model Training Paradigm for Longitudinal EHR | Irsyad Adam et.al. | 2601.22128 | translate | read | null |
| 2026-01-29 | A Federated and Parameter-Efficient Framework for Large Language Model Training in Medicine | Anran Li et.al. | 2601.22124 | translate | read | null |
| 2026-01-29 | ECO: Quantized Training without Full-Precision Master Weights | Mahdi Nikdan et.al. | 2601.22101 | translate | read | null |
| 2026-01-29 | VTC-R1: Vision-Text Compression for Efficient Long-Context Reasoning | Yibo Wang et.al. | 2601.22069 | translate | read | link |
| 2026-01-29 | Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models | Wenxuan Huang et.al. | 2601.22060 | translate | read | null |
| 2026-01-29 | AIRPET: Virtual Positron Emission Tomography | J. Renner et.al. | 2601.22059 | translate | read | null |
| 2026-01-29 | MetricAnything: Scaling Metric Depth Pretraining with Noisy Heterogeneous Sources | Baorui Ma et.al. | 2601.22054 | translate | read | link |
| 2026-01-29 | MasalBench: A Benchmark for Contextual and Cross-Cultural Understanding of Persian Proverbs in LLMs | Ghazal Kalhor et.al. | 2601.22050 | translate | read | null |
| 2026-01-29 | On the Paradoxical Interference between Instruction-Following and Task Solving | Yunjia Qi et.al. | 2601.22047 | translate | read | null |
| 2026-01-29 | Per-parameter Task Arithmetic for Unlearning in Large Language Models | Chengyi Cai et.al. | 2601.22030 | translate | read | null |
| 2026-01-29 | CAR-bench: Evaluating the Consistency and Limit-Awareness of LLM Agents under Real-World Uncertainty | Johannes Kirmayr et.al. | 2601.22027 | translate | read | null |
| 2026-01-29 | When “Better” Prompts Hurt: Evaluation-Driven Iteration for LLM Applications | Daniel Commey et.al. | 2601.22025 | translate | read | null |
| 2026-01-29 | Visual-Guided Key-Token Regularization for Multimodal Large Language Model Unlearning | Chengyi Cai et.al. | 2601.22020 | translate | read | null |
| 2026-01-29 | TBDFiltering: Sample-Efficient Tree-Based Data Filtering | Robert Istvan Busa-Fekete et.al. | 2601.22016 | translate | read | null |
| 2026-01-29 | SpecTran: Spectral-Aware Transformer-based Adapter for LLM-Enhanced Sequential Recommendation | Yu Cui et.al. | 2601.21986 | translate | read | null |
| 2026-01-29 | Token-Guard: Towards Token-Level Hallucination Control via Self-Checking Decoding | Yifan Zhu et.al. | 2601.21969 | translate | read | null |
| 2026-01-29 | Industrialized Deception: The Collateral Effects of LLM-Generated Misinformation on Digital Ecosystems | Alexander Loth et.al. | 2601.21963 | translate | read | null |
| 2026-01-29 | ToolWeaver: Weaving Collaborative Semantics for Scalable Tool Use in Large Language Models | Bowen Fang et.al. | 2601.21947 | translate | read | null |
| 2026-01-29 | Retrieval-Infused Reasoning Sandbox: A Benchmark for Decoupling Retrieval and Reasoning Capabilities | Shuangshuang Ying et.al. | 2601.21937 | translate | read | null |
| 2026-01-29 | Learn-to-Distance: Distance Learning for Detecting LLM-Generated Text | Hongyi Zhou et.al. | 2601.21895 | translate | read | null |
| 2026-01-29 | Not All Code Is Equal: A Data-Centric Study of Code Complexity and LLM Reasoning | Lukas Twist et.al. | 2601.21894 | translate | read | null |
| 2026-01-29 | astra-langchain4j: Experiences Combining LLMs and Agent Programming | Rem Collier et.al. | 2601.21879 | translate | read | null |
| 2026-01-29 | Evolution of Benchmark: Black-Box Optimization Benchmark Design through Large Language Model | Chen Wang et.al. | 2601.21877 | translate | read | null |
| 2026-01-29 | LLM-Driven Scenario-Aware Planning for Autonomous Driving | He Li et.al. | 2601.21876 | translate | read | null |
| 2026-01-29 | WebArbiter: A Principle-Guided Reasoning Process Reward Model for Web Agents | Yao Zhang et.al. | 2601.21872 | translate | read | null |
| 2026-01-29 | KnowBias: Mitigating Social Bias in LLMs via Know-Bias Neuron Enhancement | Jinhao Pan et.al. | 2601.21864 | translate | read | null |
| 2026-01-29 | READY: Reward Discovery for Meta-Black-Box Optimization | Zechuan Huang et.al. | 2601.21847 | translate | read | null |
| 2026-01-29 | Embodied Task Planning via Graph-Informed Action Generation with Large Language Model | Xiang Li et.al. | 2601.21841 | translate | read | null |
| 2026-01-29 | Test-Time Compute Games | Ander Artola Velasco et.al. | 2601.21839 | translate | read | null |
| 2026-01-29 | Mil-SCORE: Benchmarking Long-Context Geospatial Reasoning and Planning in Large Language Models | Aadi Palnitkar et.al. | 2601.21826 | translate | read | null |
| 2026-01-29 | DASH: Deterministic Attention Scheduling for High-throughput Reproducible LLM Training | Xinwei Qiang et.al. | 2601.21824 | translate | read | null |
| 2026-01-29 | CORE: Toward Ubiquitous 6G Intelligence Through Collaborative Orchestration of Large Language Model Agents Over Hierarchical Edge | Zitong Yu et.al. | 2601.21822 | translate | read | null |
| 2026-01-29 | A Judge-Aware Ranking Framework for Evaluating Large Language Models without Ground Truth | Mingyuan Xu et.al. | 2601.21817 | translate | read | null |
| 2026-01-29 | Nonparametric LLM Evaluation from Preference Data | Dennis Frauen et.al. | 2601.21816 | translate | read | null |
| 2026-01-29 | Distribution-Aware Reward Estimation for Test-Time Reinforcement Learning | Bodong Du et.al. | 2601.21804 | translate | read | null |
| 2026-01-29 | A Unified XAI-LLM Approach for Endotracheal Suctioning Activity Recognition | Hoang Khang Phan et.al. | 2601.21802 | translate | read | null |
| 2026-01-29 | CG-MLLM: Captioning and Generating 3D content via Multi-modal Large Language Models | Junming Huang et.al. | 2601.21798 | translate | read | null |
| 2026-01-29 | Effective LoRA Adapter Routing using Task Representations | Akash Dhasade et.al. | 2601.21795 | translate | read | null |
| 2026-01-29 | Assessing the Business Process Modeling Competences of Large Language Models | Chantale Lauer et.al. | 2601.21787 | translate | read | null |
| 2026-01-29 | Zonkey: A Hierarchical Diffusion Language Model with Differentiable Tokenization and Probabilistic Attention | Alon Rozental et.al. | 2601.21768 | translate | read | null |
| 2026-01-29 | Evaluating ChatGPT on Medical Information Extraction Tasks: Performance, Explainability and Beyond | Wei Zhu et.al. | 2601.21767 | translate | read | null |
| 2026-01-29 | EWSJF: An Adaptive Scheduler with Hybrid Partitioning for Mixed-Workload LLM Inference | Bronislav Sidik et.al. | 2601.21758 | translate | read | null |
| 2026-01-29 | Language-based Trial and Error Falls Behind in the Era of Experience | Haoyu Wang et.al. | 2601.21754 | translate | read | null |
| 2026-01-29 | Temporal Guidance for Large Language Models | Hong-Kai Zheng et.al. | 2601.21744 | translate | read | null |
| 2026-01-29 | MIDI-LLaMA: An Instruction-Following Multimodal LLM for Symbolic Music Understanding | Meng Yang et.al. | 2601.21740 | translate | read | null |
| 2026-01-29 | CE-GOCD: Central Entity-Guided Graph Optimization for Community Detection to Augment LLM Scientific Question Answering | Jiayin Lan et.al. | 2601.21733 | translate | read | null |
| 2026-01-29 | E-mem: Multi-agent based Episodic Context Reconstruction for LLM Agent Memory | Kaixiang Wang et.al. | 2601.21714 | translate | read | null |
| 2026-01-29 | TACLer: Tailored Curriculum Reinforcement Learning for Efficient Reasoning | Huiyuan Lai et.al. | 2601.21711 | translate | read | null |
| 2026-01-29 | Why Attention Patterns Exist: A Unifying Temporal Perspective Analysis | Qingyue Yang et.al. | 2601.21709 | translate | read | link |
| 2026-01-29 | FBS: Modeling Native Parallel Reading inside a Transformer | Tongxi Wang et.al. | 2601.21708 | translate | read | null |
| 2026-01-29 | Toward Culturally Aligned LLMs through Ontology-Guided Multi-Agent Reasoning | Wonduk Seo et.al. | 2601.21700 | translate | read | null |
| 2026-01-29 | ChartE$^{3}$: A Comprehensive Benchmark for End-to-End Chart Editing | Shuo Li et.al. | 2601.21694 | translate | read | null |
| 2026-01-29 | TCAP: Tri-Component Attention Profiling for Unsupervised Backdoor Detection in MLLM Fine-Tuning | Mingzu Liu et.al. | 2601.21692 | translate | read | null |
| 2026-01-29 | Do Not Waste Your Rollouts: Recycling Search Experience for Efficient Test-Time Scaling | Xinglin Wang et.al. | 2601.21684 | translate | read | null |
| 2026-01-29 | FIT: Defying Catastrophic Forgetting in Continual LLM Unlearning | Xiaoyu Xu et.al. | 2601.21682 | translate | read | null |
| 2026-01-29 | LLM4Fluid: Large Language Models as Generalizable Neural Solvers for Fluid Dynamics | Qisong Xiao et.al. | 2601.21681 | translate | read | null |
| 2026-01-29 | Scale-Dependent Semantic Dynamics Revealed by Allan Deviation | Debayan Dasgupta et.al. | 2601.21678 | translate | read | null |
| 2026-01-29 | SONIC-O1: A Real-World Benchmark for Evaluating Multimodal Large Language Models on Audio-Video Understanding | Ahmed Y. Radwan et.al. | 2601.21666 | translate | read | link |
| 2026-01-29 | AdaptBPE: From General Purpose to Specialized Tokenizers | Vijini Liyanage et.al. | 2601.21665 | translate | read | null |
| 2026-01-29 | ScholarGym: Benchmarking Deep Research Workflows on Academic Literature Retrieval | Hao Shen et.al. | 2601.21654 | translate | read | null |
| 2026-01-29 | ILRR: Inference-Time Steering Method for Masked Diffusion Language Models | Eden Avrahami et.al. | 2601.21647 | translate | read | null |
| 2026-01-29 | RSGround-R1: Rethinking Remote Sensing Visual Grounding through Spatial Reasoning | Shiqi Huang et.al. | 2601.21634 | translate | read | null |
| 2026-01-29 | LAMP: Look-Ahead Mixed-Precision Inference of Large Language Models | Stanislav Budzinskiy et.al. | 2601.21623 | translate | read | null |
| 2026-01-29 | StarSD: One-for-Many Speculative Decoding | Junhao He et.al. | 2601.21622 | translate | read | null |
| 2026-01-29 | Thinking Broad, Acting Fast: Latent Reasoning Distillation from Multi-Perspective Chain-of-Thought for E-Commerce Relevance | Baopu Qiu et.al. | 2601.21611 | translate | read | null |
| 2026-01-29 | WMVLM: Evaluating Diffusion Model Image Watermarking via Vision-Language Models | Zijin Yang et.al. | 2601.21610 | translate | read | null |
| 2026-01-29 | RecNet: Self-Evolving Preference Propagation for Agentic Recommender Systems | Bingqian Li et.al. | 2601.21609 | translate | read | null |
| 2026-01-29 | Age Matters: Analyzing Age-Related Discussions in App Reviews | Shashiwadana Nirmania et.al. | 2601.21605 | translate | read | null |
| 2026-01-29 | CORE: Collaborative Reasoning via Cross Teaching | Kshitij Mishra et.al. | 2601.21600 | translate | read | null |
| 2026-01-29 | Beyond Imitation: Reinforcement Learning for Active Latent Planning | Zhi Zheng et.al. | 2601.21598 | translate | read | null |
| 2026-01-29 | Scalable Power Sampling: Unlocking Efficient, Training-Free Reasoning for LLMs via Distribution Sharpening | Xiaotong Ji et.al. | 2601.21590 | translate | read | null |
| 2026-01-29 | ICL-EVADER: Zero-Query Black-Box Evasion Attacks on In-Context Learning and Their Defenses | Ningyuan He et.al. | 2601.21586 | translate | read | null |
| 2026-01-29 | Learning the Mechanism of Catastrophic Forgetting: A Perspective from Gradient Similarity | Mutian Yang et.al. | 2601.21577 | translate | read | null |
| 2026-01-29 | Chain Of Thought Compression: A Theoretical Analysis | Juncai Li et.al. | 2601.21576 | translate | read | null |
| 2026-01-29 | ASTRA: Automated Synthesis of agentic Trajectories and Reinforcement Arenas | Xiaoyu Tian et.al. | 2601.21558 | translate | read | null |
| 2026-01-29 | Meta Context Engineering via Agentic Skill Evolution | Haoran Ye et.al. | 2601.21557 | translate | read | null |
| 2026-01-29 | Note2Chat: Improving LLMs for Multi-Turn Clinical History Taking Using Medical Notes | Yang Zhou et.al. | 2601.21551 | translate | read | null |
| 2026-01-29 | ShardMemo: Masked MoE Routing for Sharded Agentic LLM Memory | Yang Zhao et.al. | 2601.21545 | translate | read | null |
| 2026-01-29 | Opinion Consensus Formation Among Networked Large Language Models | Iris Yazici et.al. | 2601.21540 | translate | read | null |
| 2026-01-29 | More Bang for the Buck: Improving the Inference of Large Language Models at a Fixed Budget using Reset and Discard (ReD) | Sagi Meir et.al. | 2601.21522 | translate | read | null |
| 2026-01-29 | HERS: Hidden-Pattern Expert Learning for Risk-Specific Vehicle Damage Adaptation in Diffusion Models | Teerapong Panboonyuen et.al. | 2601.21517 | translate | read | null |
| 2026-01-29 | LLaMEA-SAGE: Guiding Automated Algorithm Design with Structural Feedback from Explainable AI | Niki van Stein et.al. | 2601.21511 | translate | read | null |
| 2026-01-29 | The Effectiveness of Style Vectors for Steering Large Language Models: A Human Evaluation | Diaoulé Diallo et.al. | 2601.21505 | translate | read | null |
| 2026-01-29 | MAR: Efficient Large Language Models via Module-aware Architecture Refinement | Junhong Cai et.al. | 2601.21503 | translate | read | null |
| 2026-01-29 | The Path of Least Resistance: Guiding LLM Reasoning Trajectories with Prefix Consensus | Ishan Jindal et.al. | 2601.21494 | translate | read | null |
| 2026-01-29 | DimStance: Multilingual Datasets for Dimensional Stance Analysis | Jonas Becker et.al. | 2601.21483 | translate | read | null |
| 2026-01-29 | SOUP: Token-level Single-sample Mix-policy Reinforcement Learning for Large Language Models | Lei Yang et.al. | 2601.21476 | translate | read | null |
| 2026-01-29 | Adaptive Confidence Gating in Multi-Agent Collaboration for Efficient and Optimized Code Generation | Haoji Zhang et.al. | 2601.21469 | translate | read | null |
| 2026-01-29 | Topeax – An Improved Clustering Topic Model with Density Peak Detection and Lexical-Semantic Term Importance | Márton Kardos et.al. | 2601.21465 | translate | read | null |
| 2026-01-29 | Conversation for Non-verifiable Learning: Self-Evolving LLMs through Meta-Evaluation | Yuan Sui et.al. | 2601.21464 | translate | read | null |
| 2026-01-29 | Unifying Speech Editing Detection and Content Localization via Prior-Enhanced Audio LLMs | Jun Xue et.al. | 2601.21463 | translate | read | null |
| 2026-01-29 | SAGE: Sequence-level Adaptive Gradient Evolution for Generative Recommendation | Yu Xie et.al. | 2601.21452 | translate | read | null |
| 2026-01-29 | Variance & Greediness: A comparative study of metric-learning losses | Donghuo Zeng et.al. | 2601.21450 | translate | read | null |
| 2026-01-29 | ChipBench: A Next-Step Benchmark for Evaluating LLM Performance in AI-Aided Chip Design | Zhongkai Yu et.al. | 2601.21448 | translate | read | null |
| 2026-01-29 | The Paradox of Robustness: Decoupling Rule-Based Logic from Affective Noise in High-Stakes Decision-Making | Jon Chun et.al. | 2601.21439 | translate | read | null |
| 2026-01-29 | Accurate Network Traffic Matrix Prediction via LEAD: an LLM-Enhanced Adapter-Based Conditional Diffusion Model | Yu Sun et.al. | 2601.21437 | translate | read | null |
| 2026-01-29 | From Consistency to Complementarity: Aligned and Disentangled Multi-modal Learning for Time Series Understanding and Reasoning | Hang Ni et.al. | 2601.21436 | translate | read | null |
| 2026-01-29 | When Prohibitions Become Permissions: Auditing Negation Sensitivity in Language Models | Katherine Elkins et.al. | 2601.21433 | translate | read | null |
| 2026-01-29 | MultiModal Fine-tuning with Synthetic Captions | Shohei Enomoto et.al. | 2601.21426 | translate | read | null |
| 2026-01-29 | ConceptMoE: Adaptive Token-to-Concept Compression for Implicit Compute Allocation | Zihao Huang et.al. | 2601.21420 | translate | read | null |
| 2026-01-29 | Statsformer: Validated Ensemble Learning with LLM-Derived Semantic Priors | Erica Zhang et.al. | 2601.21410 | translate | read | null |
| 2026-01-29 | User-Centric Evidence Ranking for Attribution and Fact Verification | Guy Alt et.al. | 2601.21387 | translate | read | null |
| 2026-01-29 | Predicting Developer Acceptance of AI-Generated Code Suggestions | Jing Jiang et.al. | 2601.21379 | translate | read | null |
| 2026-01-29 | TeachBench: A Syllabus-Grounded Framework for Evaluating Teaching Ability in Large Language Models | Zheng Li et.al. | 2601.21375 | translate | read | null |
| 2026-01-29 | NEMO: Execution-Aware Optimization Modeling via Autonomous Coding Agents | Yang Song et.al. | 2601.21372 | translate | read | null |
| 2026-01-29 | Small models, big threats: Characterizing safety challenges from low-compute AI models | Prateek Puri et.al. | 2601.21365 | translate | read | null |
| 2026-01-29 | The Compliance Paradox: Semantic-Instruction Decoupling in Automated Academic Code Evaluation | Devanshu Sahoo et.al. | 2601.21360 | translate | read | null |
| 2026-01-29 | Latent Chain-of-Thought as Planning: Decoupling Reasoning from Verbalization | Jiecong Wang et.al. | 2601.21358 | translate | read | null |
| 2026-01-29 | Factored Causal Representation Learning for Robust Reward Modeling in RLHF | Yupei Yang et.al. | 2601.21350 | translate | read | null |
| 2026-01-29 | Towards Robust Dysarthric Speech Recognition: LLM-Agent Post-ASR Correction Beyond WER | Xiuwen Zheng et.al. | 2601.21347 | translate | read | null |
| 2026-01-29 | Self-Improving Pretraining: using post-trained models to pretrain better models | Ellen Xiaoqing Tan et.al. | 2601.21343 | translate | read | null |
| 2026-01-29 | Ostrakon-VL: Towards Domain-Expert MLLM for Food-Service and Retail Stores | Zhiyong Shen et.al. | 2601.21342 | translate | read | null |
| 2026-01-29 | EHR-RAG: Bridging Long-Horizon Structured Electronic Health Records and Large Language Models via Enhanced Retrieval-Augmented Generation | Lang Cao et.al. | 2601.21340 | translate | read | null |
| 2026-01-29 | Within-Model vs Between-Prompt Variability in Large Language Models for Creative Tasks | Jennifer Haase et.al. | 2601.21339 | translate | read | null |
| 2026-01-29 | White-Box Op-Amp Design via Human-Mimicking Reasoning | Zihao Chen et.al. | 2601.21321 | translate | read | null |
| 2026-01-29 | Detecting Multiple Semantic Concerns in Tangled Code Commits | Beomsu Koh et.al. | 2601.21298 | translate | read | null |
| 2026-01-29 | More Code, Less Reuse: Investigating Code Quality and Reviewer Sentiment towards AI-generated Pull Requests | Haoming Huang et.al. | 2601.21276 | translate | read | null |
| 2026-01-29 | Reinforcement Learning from Meta-Evaluation: Aligning Language Models Without Ground-Truth Labels | Micah Rentschler et.al. | 2601.21268 | translate | read | null |
| 2026-01-29 | CausalEmbed: Auto-Regressive Multi-Vector Generation in Latent Space for Visual Document Embedding | Jiahao Huo et.al. | 2601.21262 | translate | read | null |
| 2026-01-29 | User-Centric Phishing Detection: A RAG and LLM-Based Approach | Abrar Hamed Al Barwani et.al. | 2601.21261 | translate | read | null |
| 2026-01-29 | TIDE: Tuning-Integrated Dynamic Evolution for LLM-Based Automated Heuristic Design | Chentong Chen et.al. | 2601.21239 | translate | read | null |
| 2026-01-29 | SHARP: Social Harm Analysis via Risk Profiles for Measuring Inequities in Large Language Models | Alok Abhishek et.al. | 2601.21235 | translate | read | null |
| 2026-01-29 | Just Ask: Curious Code Agents Reveal System Prompts in Frontier LLMs | Xiang Zheng et.al. | 2601.21233 | translate | read | null |
| 2026-01-29 | MGSM-Pro: A Simple Strategy for Robust Multilingual Mathematical Reasoning Evaluation | Tianyi Xu et.al. | 2601.21225 | translate | read | null |
| 2026-01-29 | LAMP: Learning Universal Adversarial Perturbations for Multi-Image Tasks via Pre-trained Models | Alvi Md Ishmam et.al. | 2601.21220 | translate | read | null |
| 2026-01-29 | Parametric Knowledge is Not All You Need: Toward Honest Large Language Models via Retrieval of Pretraining Data | Christopher Adrian Kusuma et.al. | 2601.21218 | translate | read | null |
| 2026-01-29 | Scaling Reasoning Hop Exposes Weaknesses: Demystifying and Improving Hop Generalization in Large Language Models | Zhaoyi Li et.al. | 2601.21214 | translate | read | null |
| 2026-01-29 | Intelli-Planner: Towards Customized Urban Planning via Large Language Model Empowered Reinforcement Learning | Xixian Yong et.al. | 2601.21212 | translate | read | null |
| 2026-01-29 | Uncovering Hidden Correctness in LLM Causal Reasoning via Symbolic Verification | Paul He et.al. | 2601.21210 | translate | read | null |
| 2026-01-29 | Scaling Embeddings Outperforms Scaling Experts in Language Models | Hong Liu et.al. | 2601.21204 | translate | read | null |
| 2026-01-29 | ZipMoE: Efficient On-Device MoE Serving via Lossless Compression and Cache-Affinity Scheduling | Yuchen Yang et.al. | 2601.21198 | translate | read | null |
| 2026-01-29 | Do Reasoning Models Enhance Embedding Models? | Wun Yu Chan et.al. | 2601.21192 | translate | read | null |
| 2026-01-29 | Adaptive and Robust Cost-Aware Proof of Quality for Decentralized LLM Inference Networks | Arther Tian et.al. | 2601.21189 | translate | read | null |
| 2026-01-29 | MAD: Modality-Adaptive Decoding for Mitigating Cross-Modal Hallucinations in Multimodal Large Language Models | Sangyun Chung et.al. | 2601.21181 | translate | read | link |
| 2026-01-29 | Concise Geometric Description as a Bridge: Unleashing the Potential of LLM for Plane Geometry Problem Solving | Jingyun Wang et.al. | 2601.21164 | translate | read | null |
| 2026-01-29 | Bridging the Arithmetic Gap: The Cognitive Complexity Benchmark and Financial-PoT for Robust Financial Reasoning | Boxiang Zhao et.al. | 2601.21157 | translate | read | null |
| 2026-01-29 | Large Language Models Naively Recover Ethnicity from Individual Records | Noah Dasanaike et.al. | 2601.21132 | translate | read | null |
| 2026-01-29 | Beyond a Single Reference: Training and Evaluation with Paraphrases in Sign Language Translation | Václav Javorek et.al. | 2601.21128 | translate | read | null |
| 2026-01-28 | Planner-Auditor Twin: Agentic Discharge Planning with FHIR-Based LLM Planning, Guideline Recall, Optional Caching and Self-Improvement | Kaiyuan Wu et.al. | 2601.21113 | translate | read | null |
| 2026-01-28 | ChunkWise LoRA: Adaptive Sequence Partitioning for Memory-Efficient Low-Rank Adaptation and Accelerated LLM Inference | Ketan Thakkar et.al. | 2601.21109 | translate | read | null |
| 2026-01-28 | OpenSec: Measuring Incident Response Agent Calibration Under Adversarial Evidence | Jarrod Barnes et.al. | 2601.21083 | translate | read | link |
| 2026-01-28 | LOCUS: Low-Dimensional Model Embeddings for Efficient Model Exploration, Comparison, and Selection | Shivam Patel et.al. | 2601.21082 | translate | read | null |
| 2026-01-28 | Towards Comprehensive Benchmarking Infrastructure for LLMs In Software Engineering | Daniel Rodriguez-Cardenas et.al. | 2601.21070 | translate | read | null |
| 2026-01-28 | Textual Equilibrium Propagation for Deep Compound AI Systems | Minghui Chen et.al. | 2601.21064 | translate | read | null |
| 2026-01-28 | Human-LLM Collaborative Feature Engineering for Tabular Data | Zhuoyan Li et.al. | 2601.21060 | translate | read | null |
| 2026-01-28 | Order-Aware Test-Time Adaptation: Leveraging Temporal Dynamics for Robust Streaming Inference | Young Kyung Kim et.al. | 2601.21012 | translate | read | null |
| 2026-01-28 | Bayesian-LoRA: Probabilistic Low-Rank Adaptation of Large Language Models | Moule Lin et.al. | 2601.21003 | translate | read | null |
| 2026-01-28 | UrduBench: An Urdu Reasoning Benchmark using Contextually Ensembled Translations with Human-in-the-Loop | Muhammad Ali Shafique et.al. | 2601.21000 | translate | read | null |
| 2026-01-28 | Diversifying Toxicity Search in Large Language Models Through Speciation | Onkar Shelar et.al. | 2601.20981 | translate | read | null |
| 2026-01-28 | Infusion of Blockchain to Establish Trustworthiness in AI Supported Software Evolution: A Systematic Literature Review | Mohammad Naserameri et.al. | 2601.20918 | translate | read | null |
| 2026-01-28 | Noisy but Valid: Robust Statistical Evaluation of LLMs with Imperfect Judges | Chen Feng et.al. | 2601.20913 | translate | read | null |
| 2026-01-28 | Non-Markov Multi-Round Conversational Image Generation with History-Conditioned MLLMs | Haochen Zhang et.al. | 2601.20911 | translate | read | null |
| 2026-01-28 | TwinWeaver: An LLM-Based Foundation Model Framework for Pan-Cancer Digital Twins | Nikita Makarov et.al. | 2601.20906 | translate | read | null |
| 2026-01-28 | ICON: Intent-Context Coupling for Efficient Multi-Turn Jailbreak Attack | Xingwei Lin et.al. | 2601.20903 | translate | read | null |
| 2026-01-28 | Text-only adaptation in LLM-based ASR through text denoising | Sergio Burdisso et.al. | 2601.20900 | translate | read | null |
| 2026-01-28 | Reducing Prompt Sensitivity in LLM-based Speech Recognition Through Learnable Projection | Sergio Burdisso et.al. | 2601.20898 | translate | read | null |
| 2026-01-28 | IDE-Bench: Evaluating Large Language Models as IDE Agents on Real-World Software Engineering Tasks | Spencer Mateega et.al. | 2601.20886 | translate | read | null |
| 2026-01-27 | What Hard Tokens Reveal: Exploiting Low-confidence Tokens for Membership Inference Attacks against Large Language Models | Md Tasnim Jawad et.al. | 2601.20885 | translate | read | null |
| 2026-01-28 | When Flores Bloomz Wrong: Cross-Direction Contamination in Machine Translation Evaluation | David Tan et.al. | 2601.20858 | translate | read | null |
| 2026-01-28 | SokoBench: Evaluating Long-Horizon Planning and Reasoning in Large Language Models | Sebastiano Monti et.al. | 2601.20856 | translate | read | null |
| 2026-01-28 | Reward Models Inherit Value Biases from Pretraining | Brian Christian et.al. | 2601.20838 | translate | read | null |
| 2026-01-28 | Idea2Story: An Automated Pipeline for Transforming Research Concepts into Complete Scientific Narratives | Tengyue Xu et.al. | 2601.20833 | translate | read | link |
| 2026-01-28 | MemCtrl: Using MLLMs as Active Memory Controllers on Embodied Agents | Vishnu Sashank Dorbala et.al. | 2601.20831 | translate | read | null |
| 2026-01-28 | Training Reasoning Models on Saturated Problems via Failure-Prefix Conditioning | Minwu Kim et.al. | 2601.20829 | translate | read | link |
| 2026-01-28 | Context-Augmented Code Generation Using Programming Knowledge Graphs | Shahd Seddik et.al. | 2601.20810 | translate | read | null |
| 2026-01-28 | How Disciplinary Partnerships Shape Research Landscape in U.S. Library and Information Science Schools | Jiangen He et.al. | 2601.20806 | translate | read | null |
| 2026-01-28 | Reinforcement Learning via Self-Distillation | Jonas Hübotter et.al. | 2601.20802 | translate | read | link |
| 2026-01-28 | Dissecting Multimodal In-Context Learning: Modality Asymmetries and Circuit Dynamics in modern Transformers | Yiran Huang et.al. | 2601.20796 | translate | read | null |
| 2026-01-28 | Agentic Fog: A Policy-driven Framework for Distributed Intelligence in Fog Computing | Saeed Akbar et.al. | 2601.20764 | translate | read | null |
| 2026-01-28 | Persona Prompting as a Lens on LLM Social Reasoning | Jing Yang et.al. | 2601.20757 | translate | read | link |
| 2026-01-28 | ProfInfer: An eBPF-based Fine-Grained LLM Inference Profiler | Bohua Zou et.al. | 2601.20755 | translate | read | null |
| 2026-01-28 | Like a Therapist, But Not: Reddit Narratives of AI in Mental Health Contexts | Elham Aghakhani et.al. | 2601.20747 | translate | read | null |
| 2026-01-28 | HESTIA: A Hessian-Guided Differentiable Quantization-Aware Training Framework for Extremely Low-Bit LLMs | Guoan Wang et.al. | 2601.20745 | translate | read | null |
| 2026-01-28 | Compression Tells Intelligence: Visual Coding, Visual Token Technology, and the Unification | Xin Jin et.al. | 2601.20742 | translate | read | null |
| 2026-01-28 | QueerGen: How LLMs Reflect Societal Norms on Gender and Sexuality in Sentence Completion Tasks | Mae Sosto et.al. | 2601.20731 | translate | read | null |
| 2026-01-28 | AgentLongBench: A Controllable Long Benchmark For Long-Contexts Agents via Environment Rollouts | Shicheng Fang et.al. | 2601.20730 | translate | read | link |
| 2026-01-28 | Audit Trails for Accountability in Large Language Models | Victor Ojewale et.al. | 2601.20727 | translate | read | null |
| 2026-01-28 | MedViz: An Agent-based, Visual-guided Research Assistant for Navigating Biomedical Literature | Huan He et.al. | 2601.20709 | translate | read | null |
| 2026-01-28 | Beyond GEMM-Centric NPUs: Enabling Efficient Diffusion LLM Sampling | Binglei Lou et.al. | 2601.20706 | translate | read | null |
| 2026-01-28 | Structurally Human, Semantically Biased: Detecting LLM-Generated References with Embeddings and GNNs | Melika Mobini et.al. | 2601.20704 | translate | read | null |
| 2026-01-28 | Decoupling Perception and Calibration: Label-Efficient Image Quality Assessment Framework | Xinyue Li et.al. | 2601.20689 | translate | read | null |
| 2026-01-28 | Online Density-Based Clustering for Real-Time Narrative Evolution Monitoring | Ostap Vykhopen et.al. | 2601.20680 | translate | read | null |
| 2026-01-28 | ShieldedCode: Learning Robust Representations for Virtual Machine Protected Code | Mingqiao Mo et.al. | 2601.20679 | translate | read | null |
| 2026-01-28 | Efficient Multimodal Planning Agent for Visual Question-Answering | Zhuo Chen et.al. | 2601.20676 | translate | read | null |
| 2026-01-28 | bi-modal textual prompt learning for vision-language models in remote sensing | Pankhi Kashyap et.al. | 2601.20675 | translate | read | null |
| 2026-01-28 | Harnessing Large Language Models for Precision Querying and Retrieval-Augmented Knowledge Extraction in Clinical Data Science | Juan Jose Rubio Jan et.al. | 2601.20674 | translate | read | null |
| 2026-01-28 | When Vision Meets Texts in Listwise Reranking | Hongyi Cai et.al. | 2601.20623 | translate | read | null |
| 2026-01-28 | GDCNet: Generative Discrepancy Comparison Network for Multimodal Sarcasm Detection | Shuguang Zhang et.al. | 2601.20618 | translate | read | null |
| 2026-01-28 | Agent Benchmarks Fail Public Sector Requirements | Jonathan Rystrøm et.al. | 2601.20617 | translate | read | null |
| 2026-01-28 | DRAINCODE: Stealthy Energy Consumption Attacks on Retrieval-Augmented Code Generation via Context Poisoning | Yanlin Wang et.al. | 2601.20615 | translate | read | null |
| 2026-01-28 | Dialogical Reasoning Across AI Architectures: A Multi-Model Framework for Testing AI Alignment Strategies | Gray Cox et.al. | 2601.20604 | translate | read | null |
| 2026-01-28 | MeCo: Enhancing LLM-Empowered Multi-Robot Collaboration via Similar Task Memoization | Baiqing Wang et.al. | 2601.20577 | translate | read | null |
| 2026-01-28 | Gen-SER: When the generative model meets speech emotion recognition | Taihui Wang et.al. | 2601.20573 | translate | read | null |
| 2026-01-28 | Beyond Divergent Creativity: A Human-Based Evaluation of Creativity in Large Language Models | Kumiko Nakajima et.al. | 2601.20546 | translate | read | null |
| 2026-01-28 | PathWise: Planning through World Model for Automated Heuristic Design via Self-Evolving LLMs | Oguzhan Gungordu et.al. | 2601.20539 | translate | read | null |
| 2026-01-28 | Interpreting Emergent Extreme Events in Multi-Agent Systems | Ling Tang et.al. | 2601.20538 | translate | read | null |
| 2026-01-28 | Context Tokens are Anchors: Understanding the Repetition Curse in dMLLMs from an Information Flow Perspective | Qiyan Zhao et.al. | 2601.20520 | translate | read | null |
| 2026-01-28 | Can We Improve Educational Diagram Generation with In-Context Examples? Not if a Hallucination Spoils the Bunch | Evanfiya Logacheva et.al. | 2601.20476 | translate | read | null |
| 2026-01-28 | Piloting Planetarium Visualizations with LLMs during Live Events in Science Centers | Mathis Brossier et.al. | 2601.20466 | translate | read | null |
| 2026-01-28 | PEARL: Plan Exploration and Adaptive Reinforcement Learning for Multihop Tool Use | Qihao Wang et.al. | 2601.20439 | translate | read | null |
| 2026-01-28 | Concept Component Analysis: A Principled Approach for Concept Extraction in LLMs | Yuhang Liu et.al. | 2601.20420 | translate | read | null |
| 2026-01-28 | Beyond Accuracy: A Cognitive Load Framework for Mapping the Capability Boundaries of Tool-use Agents | Qihao Wang et.al. | 2601.20412 | translate | read | null |
| 2026-01-28 | GuideAI: A Real-time Personalized Learning Solution with Adaptive Interventions | Ananya Shukla et.al. | 2601.20402 | translate | read | null |
| 2026-01-28 | Eliminating Hallucination in Diffusion-Augmented Interactive Text-to-Image Retrieval | Zhuocheng Zhang et.al. | 2601.20391 | translate | read | null |
| 2026-01-28 | Policy of Thoughts: Scaling LLM Reasoning via Test-time Policy Evolution | Zhengbo Jiao et.al. | 2601.20379 | translate | read | null |
| 2026-01-28 | LLM-AutoDP: Automatic Data Processing via LLM Agents for Model Fine-tuning | Wei Huang et.al. | 2601.20375 | translate | read | null |
| 2026-01-28 | AMA: Adaptive Memory via Multi-Agent Collaboration | Weiquan Huang et.al. | 2601.20352 | translate | read | null |
| 2026-01-28 | Demonstration-Free Robotic Control via LLM Agents | Brian Y. Tsui et.al. | 2601.20334 | translate | read | null |
| 2026-01-28 | PsychePass: Calibrating LLM Therapeutic Competence via Trajectory-Anchored Tournaments | Zhuang Chen et.al. | 2601.20330 | translate | read | null |
| 2026-01-28 | ECG-Agent: On-Device Tool-Calling Agent for ECG Multi-Turn Dialogue | Hyunseung Chung et.al. | 2601.20323 | translate | read | null |
| 2026-01-28 | Less is More: Benchmarking LLM Based Recommendation Agents | Kargi Chauhan et.al. | 2601.20316 | translate | read | null |
| 2026-01-28 | DiagLink: A Dual-User Diagnostic Assistance System by Synergizing Experts with LLMs and Knowledge Graphs | Zihan Zhou et.al. | 2601.20311 | translate | read | null |
| 2026-01-28 | SuperInfer: SLO-Aware Rotary Scheduling and Memory Management for LLM Inference on Superchips | Jiahuan Yu et.al. | 2601.20309 | translate | read | null |
| 2026-01-28 | Truthfulness Despite Weak Supervision: Evaluating and Training LLMs Using Peer Prediction | Tianyi Alex Qiu et.al. | 2601.20299 | translate | read | null |
| 2026-01-28 | Memory Retrieval in Transformers: Insights from The Encoding Specificity Principle | Viet Hung Dinh et.al. | 2601.20282 | translate | read | null |
| 2026-01-28 | Eliciting Least-to-Most Reasoning for Phishing URL Detection | Holly Trikilis et.al. | 2601.20270 | translate | read | null |
| 2026-01-28 | HE-SNR: Uncovering Latent Logic via Entropy for Guiding Mid-Training on SWE-BENCH | Yueyang Wang et.al. | 2601.20255 | translate | read | null |
| 2026-01-28 | Efficient Evaluation of LLM Performance with Statistical Guarantees | Skyler Wu et.al. | 2601.20251 | translate | read | null |
| 2026-01-28 | Large Language Models Polarize Ideologically but Moderate Affectively in Online Political Discourse | Gavin Wang et.al. | 2601.20238 | translate | read | null |
| 2026-01-28 | Unit-Based Agent for Semi-Cascaded Full-Duplex Dialogue Systems | Haoyuan Yu et.al. | 2601.20230 | translate | read | null |
| 2026-01-28 | Scaling Medical Reasoning Verification via Tool-Integrated Reinforcement Learning | Hang Zhang et.al. | 2601.20221 | translate | read | null |
| 2026-01-28 | Spark: Strategic Policy-Aware Exploration via Dynamic Branching for Long-Horizon Agentic Learning | Jinyang Wu et.al. | 2601.20209 | translate | read | null |
| 2026-01-28 | An Autonomous Agent Framework for Feature-Label Extraction from Device Dialogues and Automatic Multi-Dimensional Device Hosting Planning Based on Large Language Models | Huichao Men et.al. | 2601.20194 | translate | read | null |
| 2026-01-28 | Me-Agent: A Personalized Mobile Agent with Two-Level User Habit Learning for Enhanced Interaction | Shuoxin Wang et.al. | 2601.20162 | translate | read | null |
| 2026-01-28 | Large language models accurately predict public perceptions of support for climate action worldwide | Nattavudh Powdthavee et.al. | 2601.20141 | translate | read | null |
| 2026-01-27 | BengaliSent140: A Large-Scale Bengali Binary Sentiment Dataset for Hate and Non-Hate Speech Classification | Akif Islam et.al. | 2601.20129 | translate | read | null |
| 2026-01-27 | Rewarding Intellectual Humility Learning When Not To Answer In Large Language Models | Abha Jha et.al. | 2601.20126 | translate | read | null |
| 2026-01-27 | Usage, Effects and Requirements for AI Coding Assistants in the Enterprise: An Empirical Study | Maja Vukovic et.al. | 2601.20112 | translate | read | null |
| 2026-01-27 | FFE-Hallu: Hallucinations in Fixed Figurative Expressions: Benchmark of Idioms and Proverbs in the Persian Language | Faezeh Hosseini et.al. | 2601.20105 | translate | read | null |
| 2026-01-27 | Dynamics of Human-AI Collective Knowledge on the Web: A Scalable Model and Insights for Sustainable Growth | Buddhika Nettasinghe et.al. | 2601.20099 | translate | read | null |
| 2026-01-27 | Should I Have Expressed a Different Intent? Counterfactual Generation for LLM-Based Autonomous Control | Amirmohammad Farzaneh et.al. | 2601.20090 | translate | read | null |
| 2026-01-27 | Quantization-Aware Distillation for NVFP4 Inference Accuracy Recovery | Meng Xin et.al. | 2601.20088 | translate | read | null |
| 2026-01-27 | Sparse CLIP: Co-Optimizing Interpretability and Performance in Contrastive Learning | Chuan Qin et.al. | 2601.20075 | translate | read | null |
| 2026-01-23 | A Scalable Measure of Loss Landscape Curvature for Analyzing the Training Dynamics of LLMs | Dayal Singh Kalra et.al. | 2601.16979 | translate | read | null |
| 2026-01-23 | Auto-Regressive Masked Diffusion Models | Mahdi Karami et.al. | 2601.16971 | translate | read | null |
| 2026-01-23 | Empowering Medical Equipment Sustainability in Low-Resource Settings: An AI-Powered Diagnostic and Support Platform for Biomedical Technicians | Bernes Lorier Atabonfack et.al. | 2601.16967 | translate | read | null |
| 2026-01-23 | AgentDrive: An Open Benchmark Dataset for Agentic AI Reasoning with LLM-Generated Scenarios in Autonomous Systems | Mohamed Amine Ferrag et.al. | 2601.16964 | translate | read | null |
| 2026-01-23 | DataStates-LLM: Scalable Checkpointing for Transformer Models Using Composable State Providers | Avinash Maurya et.al. | 2601.16956 | translate | read | null |
| 2026-01-23 | Strategies for Span Labeling with Large Language Models | Danil Semin et.al. | 2601.16946 | translate | read | null |
| 2026-01-23 | GRIP: Algorithm-Agnostic Machine Unlearning for Mixture-of-Experts via Geometric Router Constraints | Andy Zhu et.al. | 2601.16905 | translate | read | null |
| 2026-01-23 | Reasoning Promotes Robustness in Theory of Mind Tasks | Ian B. de Haan et.al. | 2601.16853 | translate | read | null |
| 2026-01-23 | Trapped in the past? Disentangling fluid and crystallized intelligence of large language models using chess | Leonard S. Pleiss et.al. | 2601.16823 | translate | read | null |
| 2026-01-23 | Large Language Models as Automatic Annotators and Annotation Adjudicators for Fine-Grained Opinion Analysis | Gaurav Negi et.al. | 2601.16800 | translate | read | null |
| 2026-01-23 | Persuasion Tokens for Editing Factual Knowledge in LLMs | Paul Youssef et.al. | 2601.16781 | translate | read | null |
| 2026-01-23 | LLM-powered Real-time Patent Citation Recommendation for Financial Technologies | Tianang Deng et.al. | 2601.16775 | translate | read | null |
| 2026-01-23 | Standardizing Longitudinal Radiology Report Evaluation via Large Language Model Annotation | Xinyi Wang et.al. | 2601.16753 | translate | read | null |
| 2026-01-23 | Supporting Stakeholder Requirements Expression with LLM Revisions: An Empirical Evaluation | Michael Mircea et.al. | 2601.16699 | translate | read | null |
| 2026-01-23 | AgentsEval: Clinically Faithful Evaluation of Medical Imaging Reports via Multi-Agent Reasoning | Suzhong Fu et.al. | 2601.16685 | translate | read | null |
| 2026-01-23 | From Transactions to Exploits: Automated PoC Synthesis for Real-World DeFi Attacks | Xing Su et.al. | 2601.16681 | translate | read | null |
| 2026-01-23 | PLawBench: A Rubric-Based Benchmark for Evaluating LLMs in Real-World Legal Practice | Yuzhen Shi et.al. | 2601.16669 | translate | read | null |
| 2026-01-23 | Revisiting the Role of Natural Language Code Comments in Code Translation | Monika Gupta et.al. | 2601.16661 | translate | read | null |
| 2026-01-23 | Select or Project? Evaluating Lower-dimensional Vectors for LLM Training Data Explanations | Lukas Hinterleitner et.al. | 2601.16651 | translate | read | null |
| 2026-01-23 | LUMINA: Long-horizon Understanding for Multi-turn Interactive Agents | Amin Rakhsha et.al. | 2601.16649 | translate | read | null |
| 2026-01-23 | MultiLexNorm++: A Unified Benchmark and a Generative Model for Lexical Normalization for Asian Languages | Weerayut Buaphet et.al. | 2601.16623 | translate | read | null |
| 2026-01-23 | How Does Personalized Memory Shape LLM Behavior? Benchmarking Rational Preference Utilization in Personalized Assistants | Xueyang Feng et.al. | 2601.16621 | translate | read | null |
| 2026-01-23 | PROST-LLM: Progressively Enhancing the Speech-to-Speech Translation Capability in LLMs | Jing Xu et.al. | 2601.16618 | translate | read | null |
| 2026-01-23 | AuroraEdge-V-2B: A Faster And Stronger Edge Visual Large Language Model | Xiang Chen et.al. | 2601.16615 | translate | read | null |
| 2026-01-23 | Attention-MoA: Enhancing Mixture-of-Agents via Inter-Agent Semantic Attention and Deep Residual Synthesis | Jianyu Wen et.al. | 2601.16596 | translate | read | null |
| 2026-01-23 | X-Aligner: Composed Visual Retrieval without the Bells and Whistles | Yuqian Zheng et.al. | 2601.16582 | translate | read | null |
| 2026-01-23 | Predicting Startup Success Using Large Language Models: A Novel In-Context Learning Approach | Abdurahman Maarouf et.al. | 2601.16568 | translate | read | null |
| 2026-01-23 | Retrieve-Refine-Calibrate: A Framework for Complex Claim Fact-Checking | Mingwei Sun et.al. | 2601.16555 | translate | read | null |
| 2026-01-23 | LLM is Not All You Need: A Systematic Evaluation of ML vs. Foundation Models for text and image based Medical Classification | Meet Raval et.al. | 2601.16549 | translate | read | null |
| 2026-01-23 | CORD: Bridging the Audio-Text Reasoning Gap via Weighted On-policy Cross-modal Distillation | Jing Hu et.al. | 2601.16547 | translate | read | null |
| 2026-01-23 | Do Models Hear Like Us? Probing the Representational Alignment of Audio LLMs and Naturalistic EEG | Haoyun Yang et.al. | 2601.16540 | translate | read | null |
| 2026-01-23 | OnlineSI: Taming Large Language Model for Online 3D Understanding and Grounding | Zixian Liu et.al. | 2601.16538 | translate | read | null |
| 2026-01-23 | W4A16 Mixed-Precision Matrix Multiplication on Decoupled Architecture: Kernel Design and Memory Bottleneck Analysis for Ascend NPUs | Yuanhong He et.al. | 2601.16536 | translate | read | null |
| 2026-01-23 | Curate-Train-Refine: A Closed-Loop Agentic Framework for Zero Shot Classification | Gaurav Maheshwari et.al. | 2601.16530 | translate | read | null |
| 2026-01-23 | SycoEval-EM: Sycophancy Evaluation of Large Language Models in Simulated Clinical Encounters for Emergency Care | Dongshen Peng et.al. | 2601.16529 | translate | read | null |
| 2026-01-23 | TangramPuzzle: Evaluating Multimodal Large Language Models with Compositional Spatial Reasoning | Daixian Liu et.al. | 2601.16520 | translate | read | null |
| 2026-01-23 | DANCE: Dynamic, Available, Neighbor-gated Condensation for Federated Text-Attributed Graphs | Zekai Chen et.al. | 2601.16519 | translate | read | null |
| 2026-01-23 | Rethinking Large Language Models For Irregular Time Series Classification In Critical Care | Feixiang Zheng et.al. | 2601.16516 | translate | read | null |
| 2026-01-23 | SearchLLM: Detecting LLM Paraphrased Text by Measuring the Similarity with Regeneration of the Candidate Source via Search Engine | Hoang-Quoc Nguyen-Son et.al. | 2601.16512 | translate | read | null |
| 2026-01-23 | REprompt: Prompt Generation for Intelligent Software Development Guided by Requirements Engineering | Junjie Shi et.al. | 2601.16507 | translate | read | null |
| 2026-01-23 | SafeThinker: Reasoning about Risk to Deepen Safety Beyond Shallow Alignment | Xianya Fang et.al. | 2601.16506 | translate | read | null |
| 2026-01-23 | EvoConfig: Self-Evolving Multi-Agent Systems for Efficient Autonomous Environment Configuration | Xinshuai Guo et.al. | 2601.16489 | translate | read | null |
| 2026-01-23 | Timely Machine: Awareness of Time Makes Test-Time Scaling Agentic | Yichuan Ma et.al. | 2601.16486 | translate | read | null |
| 2026-01-23 | FlowSE-GRPO: Training Flow Matching Speech Enhancement via Online Reinforcement Learning | Haoxu Wang et.al. | 2601.16483 | translate | read | null |
| 2026-01-23 | TL-GRPO: Turn-Level RL for Reasoning-Guided Iterative Optimization | Peiji Li et.al. | 2601.16480 | translate | read | null |
| 2026-01-23 | Doc2AHP: Inferring Structured Multi-Criteria Decision Models via Semantic Trees with LLMs | Hongjia Wu et.al. | 2601.16479 | translate | read | null |
| 2026-01-23 | Order from Chaos: Physical World Understanding from Glitchy Gameplay Videos | Meng Cao et.al. | 2601.16471 | translate | read | null |
| 2026-01-23 | Persona Jailbreaking in Large Language Models | Jivnesh Sandhan et.al. | 2601.16466 | translate | read | null |
| 2026-01-23 | Cutting the Gordian Knot: Detecting Malicious PyPI Packages via a Knowledge-Mining Framework | Wenbo Guo et.al. | 2601.16463 | translate | read | null |
| 2026-01-23 | Graph-Anchored Knowledge Indexing for Retrieval-Augmented Generation | Zhenghao Liu et.al. | 2601.16462 | translate | read | null |
| 2026-01-23 | Emotion-LLaMAv2 and MMEVerse: A New Framework and Benchmark for Multimodal Emotion Understanding | Xiaojiang Peng et.al. | 2601.16449 | translate | read | null |
| 2026-01-23 | Mixing Expert Knowledge: Bring Human Thoughts Back To the Game of Go | Yichuan Ma et.al. | 2601.16447 | translate | read | null |
| 2026-01-23 | Exploring the Effects of Alignment on Numerical Bias in Large Language Models | Ayako Sato et.al. | 2601.16444 | translate | read | null |
| 2026-01-23 | iPDB – Optimizing SQL Queries with ML and LLM Predicates | Udesh Kumarasinghe et.al. | 2601.16432 | translate | read | null |
| 2026-01-23 | Learning Domain Knowledge in Multimodal Large Language Models through Reinforcement Fine-Tuning | Qinglong Cao et.al. | 2601.16419 | translate | read | null |
| 2026-01-23 | Gen-DBA: Generative Database Agents (Towards a Move 37 for Databases) | Yeasir Rayhan et.al. | 2601.16409 | translate | read | null |
| 2026-01-23 | Jacobian Scopes: token-level causal attributions in LLMs | Toni J. B. Liu et.al. | 2601.16407 | translate | read | null |
| 2026-01-23 | Towards a Theoretical Understanding to the Generalization of RLHF | Zhaochun Li et.al. | 2601.16403 | translate | read | null |
| 2026-01-23 | Clarify or Answer: Reinforcement Learning for Agentic VQA with Context Under-specification | Zongwan Cao et.al. | 2601.16400 | translate | read | null |
| 2026-01-23 | White-Box Sensitivity Auditing with Steering Vectors | Hannah Cyberey et.al. | 2601.16398 | translate | read | null |
| 2026-01-23 | ResAgent: Entropy-based Prior Point Discovery and Visual Reasoning for Referring Expression Segmentation | Yihao Wang et.al. | 2601.16394 | translate | read | null |
| 2026-01-23 | Cross-Lingual Activation Steering for Multilingual Language Models | Rhitabrat Pokharel et.al. | 2601.16390 | translate | read | null |
| 2026-01-23 | PolyAgent: Large Language Model Agent for Polymer Design | Vani Nigam et.al. | 2601.16376 | translate | read | null |
| 2026-01-22 | The Behavioral Fabric of LLM-Powered GUI Agents: Human Values and Interaction Outcomes | Simret Araya Gebreegziabher et.al. | 2601.16356 | translate | read | null |
| 2026-01-22 | Identity, Cooperation and Framing Effects within Groups of Real and Simulated Humans | Suhong Moon et.al. | 2601.16355 | translate | read | null |
| 2026-01-22 | NOIR: Privacy-Preserving Generation of Code with Open-Source LLMs | Khoa Nguyen et.al. | 2601.16354 | translate | read | null |
| 2026-01-22 | Regional Bias in Large Language Models | M P V S Gopinadh et.al. | 2601.16349 | translate | read | null |
| 2026-01-22 | Identifying Concurrency Bug Reports via Linguistic Patterns | Shuai Shao et.al. | 2601.16338 | translate | read | null |
| 2026-01-22 | National Quantum Strategies: A Data-Driven Approach to Understanding the Quantum Ecosystem | Simon Richard Goorney et.al. | 2601.16329 | translate | read | null |
| 2026-01-22 | Machine-Assisted Grading of Nationwide School-Leaving Essay Exams with LLMs and Statistical NLP | Andres Karjus et.al. | 2601.16314 | translate | read | null |
| 2026-01-22 | A Longitudinal, Multinational, and Multilingual Corpus of News Coverage of the Russo-Ukrainian War | Dikshya Mohanty et.al. | 2601.16309 | translate | read | null |
| 2026-01-22 | When Agents Fail to Act: A Diagnostic Framework for Tool Invocation Reliability in Multi-Agent LLM Systems | Donghao Huang et.al. | 2601.16280 | translate | read | null |
| 2026-01-22 | Better as Generators Than Classifiers: Leveraging LLMs and Synthetic Data for Low-Resource Multilingual Classification | Branislav Pecher et.al. | 2601.16278 | translate | read | null |
| 2026-01-22 | GameTalk: Training LLMs for Strategic Conversation | Victor Conchello Vendrell et.al. | 2601.16276 | translate | read | null |
| 2026-01-21 | Algorithmic Identity Based on Metaparameters: A Path to Reliability, Auditability, and Traceability | Juliao Braga et.al. | 2601.16234 | translate | read | null |
| 2026-01-22 | Provable Robustness in Multimodal Large Language Models via Feature Space Smoothing | Song Xia et.al. | 2601.16200 | translate | read | null |
| 2026-01-22 | PAL*M: Property Attestation for Large Generative Models | Prach Chantasantitam et.al. | 2601.16199 | translate | read | null |
| 2026-01-22 | Structured Hints for Sample-Efficient Lean Theorem Proving | Zachary Burton et.al. | 2601.16172 | translate | read | null |
| 2026-01-22 | Low-altitude Multi-UAV-assisted Data Collection and Semantic Forwarding for Post-Disaster Relief | Xiaoya Zheng et.al. | 2601.16146 | translate | read | null |
| 2026-01-22 | LLM Prompt Evaluation for Educational Applications | Langdon Holmes et.al. | 2601.16134 | translate | read | null |
| 2026-01-22 | Improving Training Efficiency and Reducing Maintenance Costs via Language Specific Model Merging | Alphaeus Dmonte et.al. | 2601.16127 | translate | read | null |
| 2026-01-22 | Rethinking Composed Image Retrieval Evaluation: A Fine-Grained Benchmark from Image Editing | Tingyu Song et.al. | 2601.16125 | translate | read | null |
| 2026-01-22 | Adapter Fusion for Multilingual Text2Cypher with Linear and Learned Gating | Makbule Gulcin Ozsoy et.al. | 2601.16097 | translate | read | null |
| 2026-01-22 | Controlling Long-Horizon Behavior in Language Model Agents with Explicit State Dynamics | Sukesh Subaharan et.al. | 2601.16087 | translate | read | null |
| 2026-01-22 | Grounding Large Language Models in Reaction Knowledge Graphs for Synthesis Retrieval | Olga Bunkova et.al. | 2601.16038 | translate | read | null |
| 2026-01-22 | Sawtooth Wavefront Reordering: Enhanced CuTile FlashAttention on NVIDIA GB10 | Yifan Zhu et.al. | 2601.16032 | translate | read | null |
| 2026-01-22 | Deja Vu in Plots: Leveraging Cross-Session Evidence with Retrieval-Augmented LLMs for Live Streaming Risk Assessment | Yiran Qiao et.al. | 2601.16027 | translate | read | null |
| 2026-01-22 | Timbre-Aware LLM-based Direct Speech-to-Speech Translation Extendable to Multiple Language Pairs | Lalaram Arya et.al. | 2601.16023 | translate | read | null |
| 2026-01-22 | PhysicsMind: Sim and Real Mechanics Benchmarking for Physical Reasoning and Prediction in Foundational VLMs and World Models | Chak-Wing Mak et.al. | 2601.16007 | translate | read | null |
| 2026-01-22 | TeNet: Text-to-Network for Compact Policy Synthesis | Ariyan Bighashdel et.al. | 2601.15912 | translate | read | null |
| 2026-01-22 | Co-Constructing Alignment: A Participatory Approach to Situate AI Values | Anne Arzberger et.al. | 2601.15895 | translate | read | null |
| 2026-01-22 | Stable-DiffCoder: Pushing the Frontier of Code Diffusion Large Language Model | Chenghao Fan et.al. | 2601.15892 | translate | read | null |
| 2026-01-22 | Evaluating and Achieving Controllable Code Completion in Code LLM | Jiajun Zhang et.al. | 2601.15879 | translate | read | null |
| 2026-01-22 | Virtual Traffic Police: Large Language Model-Augmented Traffic Signal Control for Unforeseen Incidents | Shiqi Wei et.al. | 2601.15816 | translate | read | null |
| 2026-01-22 | ErrorMap and ErrorAtlas: Charting the Failure Landscape of Large Language Models | Shir Ashury-Tahan et.al. | 2601.15812 | translate | read | null |
| 2026-01-22 | Attributing and Exploiting Safety Vectors through Global Optimization in Large Language Models | Fengheng Chu et.al. | 2601.15801 | translate | read | null |
| 2026-01-22 | HumanLLM: Towards Personalized Understanding and Simulation of Human Nature | Yuxuan Lei et.al. | 2601.15793 | translate | read | null |
| 2026-01-22 | Next Generation Active Learning: Mixture of LLMs in the Loop | Yuanyuan Qi et.al. | 2601.15773 | translate | read | null |
| 2026-01-22 | Beyond Marginal Distributions: A Framework to Evaluate the Representativeness of Demographic-Aligned LLMs | Tristan Williams et.al. | 2601.15755 | translate | read | null |
| 2026-01-22 | Tabular Incremental Inference | Xinda Chen et.al. | 2601.15751 | translate | read | null |
| 2026-01-22 | Towards Automated Kernel Generation in the Era of LLMs | Yang Yu et.al. | 2601.15727 | translate | read | null |
| 2026-01-22 | VideoThinker: Building Agentic VideoLLMs with LLM-Guided Tool Reasoning | Chenglin Li et.al. | 2601.15724 | translate | read | null |
| 2026-01-22 | CoNRec: Context-Discerning Negative Recommendation with LLMs | Xinda Chen et.al. | 2601.15721 | translate | read | null |
| 2026-01-22 | Beyond Visual Safety: Jailbreaking Multimodal Large Language Models for Harmful Image Generation via Semantic-Agnostic Inputs | Mingyu Yu et.al. | 2601.15698 | translate | read | null |
| 2026-01-22 | From Passive Metric to Active Signal: The Evolving Role of Uncertainty Quantification in Large Language Models | Jiaxin Zhang et.al. | 2601.15690 | translate | read | null |
| 2026-01-22 | Connect the Dots: Knowledge Graph-Guided Crawler Attack on Retrieval-Augmented Generation Systems | Mengyu Yao et.al. | 2601.15678 | translate | read | null |
| 2026-01-22 | What Patients Really Ask: Exploring the Effect of False Assumptions in Patient Information Seeking | Raymond Xiong et.al. | 2601.15674 | translate | read | null |
| 2026-01-22 | EmotionThinker: Prosody-Aware Reinforcement Learning for Explainable Speech Emotion Reasoning | Dingdong Wang et.al. | 2601.15668 | translate | read | null |
| 2026-01-22 | Event-VStream: Event-Driven Real-Time Understanding for Long Video Streams | Zhenghui Guo et.al. | 2601.15655 | translate | read | null |
| 2026-01-22 | Predictive Coding and Information Bottleneck for Hallucination Detection in Large Language Models | Manish Bhatt et.al. | 2601.15652 | translate | read | null |
| 2026-01-22 | Towards Reliable Medical LLMs: Benchmarking and Enhancing Confidence Estimation of Large Language Models in Medical Consultation | Zhiyao Ren et.al. | 2601.15645 | translate | read | null |
| 2026-01-22 | CogToM: A Comprehensive Theory of Mind Benchmark inspired by Human Cognition for Large Language Models | Haibo Tong et.al. | 2601.15628 | translate | read | null |
| 2026-01-22 | Robust Tool Use via Fission-GRPO: Learning to Recover from Execution Errors | Zhiwei Zhang et.al. | 2601.15625 | translate | read | null |
| 2026-01-22 | Explainable Deepfake Detection with RL Enhanced Self-Blended Images | Ning Jiang et.al. | 2601.15624 | translate | read | null |
| 2026-01-22 | When Sharpening Becomes Collapse: Sampling Bias and Semantic Coupling in RL with Verifiable Rewards | Mingyuan Fan et.al. | 2601.15609 | translate | read | null |
| 2026-01-22 | ToxiTwitch: Toward Emote-Aware Hybrid Moderation for Live Streaming Platforms | Baktash Ansari et.al. | 2601.15605 | translate | read | null |
| 2026-01-22 | Autonomous Business System via Neuro-symbolic AI | Cecil Pang et.al. | 2601.15599 | translate | read | null |
| 2026-01-22 | DeepASMR: LLM-Based Zero-Shot ASMR Speech Generation for Anyone of Any Voice | Leying Zhang et.al. | 2601.15596 | translate | read | null |
| 2026-01-22 | Data-Free Privacy-Preserving for LLMs via Model Inversion and Selective Unlearning | Xinjie Zhou et.al. | 2601.15595 | translate | read | null |
| 2026-01-22 | YuFeng-XGuard: A Reasoning-Centric, Interpretable, and Flexible Guardrail Model for Large Language Models | Junyu Lin et.al. | 2601.15588 | translate | read | null |
| 2026-01-22 | MapViT: A Two-Stage ViT-Based Framework for Real-Time Radio Quality Map Prediction in Dynamic Environments | Cyril Shih-Huan Hsu et.al. | 2601.15578 | translate | read | null |
| 2026-01-22 | From Generation to Collaboration: Using LLMs to Edit for Empathy in Healthcare | Man Luo et.al. | 2601.15558 | translate | read | null |
| 2026-01-22 | LLM or Human? Perceptions of Trust and Information Quality in Research Summaries | Nil-Jana Akpinar et.al. | 2601.15556 | translate | read | null |
| 2026-01-22 | VIOLA: Towards Video In-Context Learning with Minimal Annotations | Ryo Fujii et.al. | 2601.15549 | translate | read | null |
| 2026-01-21 | Securing LLM-as-a-Service for Small Businesses: An Industry Case Study of a Distributed Chatbot Deployment Platform | Jiazhu Xie et.al. | 2601.15528 | translate | read | null |
| 2026-01-21 | TransportAgents: a multi-agents LLM framework for traffic accident severity prediction | Zhichao Yang et.al. | 2601.15519 | translate | read | null |
| 2026-01-21 | AdversaRiskQA: An Adversarial Factuality Benchmark for High-Risk Domains | Adam Szelestey et.al. | 2601.15511 | translate | read | null |
| 2026-01-21 | MARS: Unleashing the Power of Speculative Decoding via Margin-Aware Verification | Jingwei Song et.al. | 2601.15498 | translate | read | null |
| 2026-01-21 | Tracking the Limits of Knowledge Propagation: How LLMs Fail at Multi-Step Reasoning with Conflicting Knowledge | Yiyang Feng et.al. | 2601.15495 | translate | read | null |
| 2026-01-21 | Testing Deep Learning Libraries via Neurosymbolic Constraint Learning | M M Abid Naziri et.al. | 2601.15493 | translate | read | null |
| 2026-01-21 | Multi-Persona Thinking for Bias Mitigation in Large Language Models | Yuxing Chen et.al. | 2601.15488 | translate | read | null |
| 2026-01-21 | A Universal Large Language Model – Drone Command and Control Interface | Javier N. Ramos-Silva et.al. | 2601.15486 | translate | read | null |
| 2026-01-21 | The Rise of Large Language Models and the Direction and Impact of US Federal Research Funding | Yifan Qian et.al. | 2601.15485 | translate | read | null |
| 2026-01-21 | Martingale Foresight Sampling: A Principled Approach to Inference-Time LLM Decoding | Huayu Li et.al. | 2601.15482 | translate | read | null |
| 2026-01-21 | Benchmarking LLMs for Pairwise Causal Discovery in Biomedical and Multi-Domain Contexts | Sydney Anuyah et.al. | 2601.15479 | translate | read | null |
| 2026-01-21 | Reliability by design: quantifying and eliminating fabrication risk in LLMs. From generative to consultative AI: a comparative analysis in the legal domain and lessons for high-stakes knowledge bases | Alex Dantart et.al. | 2601.15476 | translate | read | null |
| 2026-01-21 | Chunking, Retrieval, and Re-ranking: An Empirical Evaluation of RAG Architectures for Policy Document Question Answering | Anuj Maharjan et.al. | 2601.15457 | translate | read | null |
| 2026-01-21 | Exploring Implicit Perspectives on Autism in Large Language Models Through Multi-Agent Simulations | Sohyeon Park et.al. | 2601.15437 | translate | read | null |
| 2026-01-21 | Not Your Typical Sycophant: The Elusive Nature of Sycophancy in Large Language Models | Shahar Ben Natan et.al. | 2601.15436 | translate | read | null |
| 2026-01-21 | Domain-Specific Knowledge Graphs in RAG-Enhanced Healthcare LLMs | Sydney Anuyah et.al. | 2601.15429 | translate | read | null |
| 2026-01-21 | Evaluating Multimodal Large Language Models for Heterogeneous Face Recognition | Hatef Otroshi Shahreza et.al. | 2601.15406 | translate | read | null |
| 2026-01-21 | Beyond Prompting: Efficient and Robust Contextual Biasing for Speech LLMs via Logit-Space Integration (LOGIC) | Peidong Wang et.al. | 2601.15397 | translate | read | null |
| 2026-01-21 | Memorization Dynamics in Knowledge Distillation for Language Models | Jaydeep Borkar et.al. | 2601.15394 | translate | read | null |
| 2026-01-21 | VegaChat: A Robust Framework for LLM-Based Chart Generation and Assessment | Marko Hostnik et.al. | 2601.15385 | translate | read | null |
| 2026-01-21 | OpenVision 3: A Family of Unified Visual Encoder for Both Understanding and Generation | Letian Zhang et.al. | 2601.15369 | translate | read | null |
| 2026-01-21 | Q-Probe: Scaling Image Quality Assessment to High Resolution via Context-Aware Agentic Probing | Xiang Li et.al. | 2601.15356 | translate | read | null |
| 2026-01-21 | A Prompt-Based Framework for Loop Vulnerability Detection Using Local LLMs | Adeyemi Adeseye et.al. | 2601.15352 | translate | read | null |
| 2026-01-21 | Abusive music and song transformation using GenAI and LLMs | Jiyang Choi et.al. | 2601.15348 | translate | read | null |
| 2026-01-20 | Lost in Transcription: How Speech-to-Text Errors Derail Code Understanding | Jayant Havare et.al. | 2601.15339 | translate | read | null |
| 2026-01-20 | From Quotes to Concepts: Axial Coding of Political Debates with Ensemble LMs | Angelina Parfenova et.al. | 2601.15338 | translate | read | null |
| 2026-01-20 | ToolCaching: Towards Efficient Caching for LLM Tool-calling | Yi Zhai et.al. | 2601.15335 | translate | read | null |
| 2026-01-20 | No Reliable Evidence of Self-Reported Sentience in Small Large Language Models | Caspar Kaiser et.al. | 2601.15334 | translate | read | null |
| 2026-01-20 | Empowering LLMs for Structure-Based Drug Design via Exploration-Augmented Latent Inference | Xuanning Hu et.al. | 2601.15333 | translate | read | null |
| 2026-01-20 | RECAP: A Resource-Efficient Method for Adversarial Prompting in Large Language Models | Rishit Chugh et.al. | 2601.15331 | translate | read | null |
| 2026-01-20 | ICPO: Illocution-Calibrated Policy Optimization for Multi-Turn Conversation | Zhebo Wang et.al. | 2601.15330 | translate | read | null |
| 2026-01-21 | Towards Understanding Best Practices for Quantization of Vision-Language Models | Gautom Das et.al. | 2601.15287 | translate | read | link |
| 2026-01-21 | Iterative Refinement Improves Compositional Image Generation | Shantanu Jaiswal et.al. | 2601.15286 | translate | read | null |
| 2026-01-21 | MolecularIQ: Characterizing Chemical Reasoning Capabilities Through Symbolic Verification on Molecular Graphs | Christoph Bartmann et.al. | 2601.15279 | translate | read | null |
| 2026-01-21 | Robust Fake News Detection using Large Language Models under Adversarial Sentiment Attacks | Sahar Tahmasebi et.al. | 2601.15277 | translate | read | null |
| 2026-01-21 | Lightweight LLMs for Network Attack Detection in IoT Networks | Piyumi Bhagya Sudasinghe et.al. | 2601.15269 | translate | read | null |
| 2026-01-21 | Evaluation of Large Language Models in Legal Applications: Challenges, Methods, and Future Directions | Yiran Hu et.al. | 2601.15267 | translate | read | null |
| 2026-01-21 | The Effect of Scripts and Formats on LLM Numeracy | Varshini Reddy et.al. | 2601.15251 | translate | read | null |
| 2026-01-21 | Metadata Conditioned Large Language Models for Localization | Anjishnu Mukherjee et.al. | 2601.15236 | translate | read | null |
| 2026-01-21 | When Agents Fail: A Comprehensive Study of Bugs in LLM Agents with Automated Labeling | Niful Islam et.al. | 2601.15232 | translate | read | null |
| 2026-01-21 | Deaf and Hard of Hearing Access to Intelligent Personal Assistants: Comparison of Voice-Based Options with an LLM-Powered Touch Interface | Paige S. DeVries et.al. | 2601.15209 | translate | read | null |
| 2026-01-21 | Benchmarking Large Language Models for ABAP Code Generation: An Empirical Study on Iterative Improvement by Compiler Feedback | Stephan Wallraven et.al. | 2601.15188 | translate | read | null |
| 2026-01-21 | Supporting Humans in Evaluating AI Summaries of Legal Depositions | Naghmeh Farzi et.al. | 2601.15182 | translate | read | null |
| 2026-01-21 | The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models | Zanlin Ni et.al. | 2601.15165 | translate | read | link |
| 2026-01-21 | Automated Rubrics for Reliable Evaluation of Medical Dialogue Systems | Yinzhu Chen et.al. | 2601.15161 | translate | read | null |
| 2026-01-21 | Knowledge Graphs are Implicit Reward Models: Path-Derived Signals Enable Compositional Reasoning | Yuval Kansal et.al. | 2601.15160 | translate | read | null |
| 2026-01-21 | How to Build AI Agents by Augmenting LLMs with Codified Human Expert Domain Knowledge? A Software Engineering Framework | Choro Ulan uulu et.al. | 2601.15153 | translate | read | null |
| 2026-01-21 | CLEANER: Self-Purified Trajectories Boost Agentic Reinforcement Learning | Tianshi Xu et.al. | 2601.15141 | translate | read | null |
| 2026-01-21 | Why Authors and Maintainers Link (or Don’t Link) Their PyPI Libraries to Code Repositories and Donation Platforms | Alexandros Tsakpinis et.al. | 2601.15139 | translate | read | null |
| 2026-01-21 | Conversational AI for Social Good (CAI4SG): An Overview of Emerging Trends, Applications, and Challenges | Yi-Chieh Lee et.al. | 2601.15136 | translate | read | null |
| 2026-01-21 | The Plausibility Trap: Using Probabilistic Engines for Deterministic Tasks | Ivan Carrera et.al. | 2601.15130 | translate | read | null |
| 2026-01-21 | RSNA Large Language Model Benchmark Dataset for Chest Radiographs of Cardiothoracic Disease: Radiologist Evaluation and Validation Enhanced by AI Labels (REVEAL-CXR) | Yishu Wei et.al. | 2601.15129 | translate | read | null |
| 2026-01-21 | From Who They Are to How They Act: Behavioral Traits in Generative Agent-Based Models of Social Media | Valerio La Gatta et.al. | 2601.15114 | translate | read | null |
| 2026-01-21 | Parameter-Efficient Multi-Task Fine-Tuning in Code-Related Tasks | Md Zahidul Haque et.al. | 2601.15094 | translate | read | null |
| 2026-01-21 | Multi-Agent Constraint Factorization Reveals Latent Invariant Solution Structure | Christopher Scofield et.al. | 2601.15077 | translate | read | null |
| 2026-01-21 | The Why Behind the Action: Unveiling Internal Drivers via Agentic Attribution | Chen Qian et.al. | 2601.15075 | translate | read | null |
| 2026-01-21 | SmartOracle – An Agentic Approach to Mitigate Noise in Differential Oracles | Srinath Srinivasan et.al. | 2601.15074 | translate | read | null |
| 2026-01-21 | Turning Citation Networks Inside Out: Studying Science Using Content-Based Knowledge Graphs from LLM-Derived Taxonomies | Seorin Kim et.al. | 2601.15062 | translate | read | null |
| 2026-01-21 | LogicScore: Fine-grained Logic Evaluation of Conciseness, Completeness, and Determinateness in Attributed Question Answering | Zhichao Yan et.al. | 2601.15050 | translate | read | null |
| 2026-01-21 | Game-Theoretic Lens on LLM-based Multi-Agent Systems | Jianing Hao et.al. | 2601.15047 | translate | read | null |
| 2026-01-21 | Knowledge Restoration-driven Prompt Optimization: Unlocking LLM Potential for Open-Domain Relational Triplet Extraction | Xiaonan Jing et.al. | 2601.15037 | translate | read | null |
| 2026-01-21 | Visual and Cognitive Demands of a Large Language Model-Powered In-vehicle Conversational Agent | Chris Monk et.al. | 2601.15034 | translate | read | null |
| 2026-01-21 | Mixture-of-Experts Models in Vision: Routing, Optimization, and Generalization | Adam Rokah et.al. | 2601.15021 | translate | read | null |
| 2026-01-21 | LiViBench: An Omnimodal Benchmark for Interactive Livestream Video Understanding | Xiaodong Wang et.al. | 2601.15016 | translate | read | null |
| 2026-01-21 | Obscuring Data Contamination Through Translation: Evidence from Arabic Corpora | Chaymaa Abbas et.al. | 2601.14994 | translate | read | null |
| 2026-01-21 | InstructTime++: Time Series Classification with Multimodal Language Modeling via Implicit Feature Enhancement | Mingyue Cheng et.al. | 2601.14968 | translate | read | null |
| 2026-01-21 | Power-Law Scaling in the Classification Performance of Small-Scale Spiking Neural Networks | Zhengdi Zhang et.al. | 2601.14961 | translate | read | null |
| 2026-01-21 | CorpusQA: A 10 Million Token Benchmark for Corpus-Level Analysis and Reasoning | Zhiyuan Lu et.al. | 2601.14952 | translate | read | null |
| 2026-01-21 | What Should I Cite? A RAG Benchmark for Academic Citation Prediction | Leqi Zheng et.al. | 2601.14949 | translate | read | null |
| 2026-01-21 | The GDN-CC Dataset: Automatic Corpus Clarification for AI-enhanced Democratic Citizen Consultations | Pierre-Antoine Lequeu et.al. | 2601.14944 | translate | read | null |
| 2026-01-21 | State of the Art of LLM-Enabled Interaction with Visualization | Mathis Brossier et.al. | 2601.14943 | translate | read | null |
| 2026-01-21 | LLM-Based Repair of C++ Implicit Data Loss Compiler Warnings: An Industrial Case Study | Chansong You et.al. | 2601.14936 | translate | read | null |
| 2026-01-21 | CodeDelegator: Mitigating Context Pollution via Role Separation in Code-as-Action Agents | Tianxiang Fei et.al. | 2601.14914 | translate | read | null |
| 2026-01-21 | AlertGuardian: Intelligent Alert Life-Cycle Management for Large-scale Cloud Systems | Guangba Yu et.al. | 2601.14912 | translate | read | null |
| 2026-01-21 | SynPerf: A Hybrid Analytical-ML Framework for GPU Performance Prediction | Kaixuan Zhang et.al. | 2601.14910 | translate | read | null |
| 2026-01-21 | Comparative Study of Large Language Models on Chinese Film Script Continuation: An Empirical Analysis Based on GPT-5.2 and Qwen-Max | Yuxuan Cao et.al. | 2601.14826 | translate | read | null |
| 2026-01-21 | Reflecting in the Reflection: Integrating a Socratic Questioning Framework into Automated AI-Based Question Generation | Ondřej Holub et.al. | 2601.14798 | translate | read | null |
| 2026-01-21 | CI4A: Semantic Component Interfaces for Agents Empowering Web Automation | Zhi Qiu et.al. | 2601.14790 | translate | read | null |
| 2026-01-21 | RECAP: Resistance Capture in Text-based Mental Health Counseling with Large Language Models | Anqi Li et.al. | 2601.14780 | translate | read | null |
| 2026-01-21 | ReinPath: A Multimodal Reinforcement Learning Approach for Pathology | Kangcheng Zhou et.al. | 2601.14757 | translate | read | null |
| 2026-01-21 | Render-of-Thought: Rendering Textual Chain-of-Thought as Images for Visual Latent Reasoning | Yifan Wang et.al. | 2601.14750 | translate | read | link |
| 2026-01-21 | Optimizing FaaS Platforms for MCP-enabled Agentic Workflows | Varad Kulkarni et.al. | 2601.14735 | translate | read | null |
| 2026-01-21 | AQAScore: Evaluating Semantic Alignment in Text-to-Audio Generation via Audio Question Answering | Chun-Yi Kuan et.al. | 2601.14728 | translate | read | null |
| 2026-01-21 | HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding | Haowei Zhang et.al. | 2601.14724 | translate | read | link |
| 2026-01-21 | PCL-Reasoner-V1.5: Advancing Math Reasoning with Offline Reinforcement Learning | Yao Lu et.al. | 2601.14716 | translate | read | null |
| 2026-01-21 | Unified Multimodal and Multilingual Retrieval via Multi-Task Learning with NLU Integration | Xinyuan Zhang et.al. | 2601.14714 | translate | read | null |
| 2026-01-21 | DARA: Few-shot Budget Allocation in Online Advertising via In-Context Decision Making with RL-Finetuned LLMs | Mingxuan Song et.al. | 2601.14711 | translate | read | null |
| 2026-01-21 | LookBench: A Live and Holistic Open Benchmark for Fashion Image Retrieval | Chao Gao et.al. | 2601.14706 | translate | read | null |
| 2026-01-21 | DARL: Encouraging Diverse Answers for General Reasoning without Verifiers | Chongxuan Huang et.al. | 2601.14700 | translate | read | null |
| 2026-01-21 | AdaTIR: Adaptive Tool-Integrated Reasoning via Difficulty-Aware Policy Optimization | Zhaiyu Fang et.al. | 2601.14696 | translate | read | null |
| 2026-01-21 | Gaming the Judge: Unfaithful Chain-of-Thought Can Undermine Agent Evaluation | Muhammad Khalifa et.al. | 2601.14691 | translate | read | null |
| 2026-01-21 | IB-GRPO: Aligning LLM-based Learning Path Recommendation with Educational Objectives via Indicator-Based Group Relative Policy Optimization | Shuai Wang et.al. | 2601.14686 | translate | read | null |
| 2026-01-21 | FARE: Fast-Slow Agentic Robotic Exploration | Shuhao Liao et.al. | 2601.14681 | translate | read | null |
| 2026-01-21 | HCVR Scene Generation: High Compatibility Virtual Reality Environment Generation for Extended Redirected Walking | Yiran Zhang et.al. | 2601.14679 | translate | read | null |
| 2026-01-21 | INFA-Guard: Mitigating Malicious Propagation via Infection-Aware Safeguarding in LLM-Based Multi-Agent Systems | Yijin Zhou et.al. | 2601.14667 | translate | read | null |
| 2026-01-21 | NeuroFilter: Privacy Guardrails for Conversational LLM Agents | Saswat Das et.al. | 2601.14660 | translate | read | null |
| 2026-01-21 | Say Anything but This: When Tokenizer Betrays Reasoning in LLMs | Navid Ayoobi et.al. | 2601.14658 | translate | read | null |
| 2026-01-21 | MIND: Empowering Mental Health Clinicians with Multimodal Data Insights through a Narrative Dashboard | Ruishi Zou et.al. | 2601.14641 | translate | read | null |
| 2026-01-21 | Forest-Chat: Adapting Vision-Language Agents for Interactive Forest Change Analysis | James Brock et.al. | 2601.14637 | translate | read | null |
| 2026-01-21 | Probing Prompt Design for Socially Compliant Robot Navigation with Vision Language Models | Ling Xiao et.al. | 2601.14622 | translate | read | null |
| 2026-01-21 | Seeing to Think? How Source Transparency Design Shapes Interactive Information Seeking and Evaluation in Conversational AI | Jiangen He et.al. | 2601.14611 | translate | read | null |
| 2026-01-21 | An LLM Agent-based Framework for Whaling Countermeasures | Daisuke Miyamoto et.al. | 2601.14606 | translate | read | null |
| 2026-01-21 | Variance-Adaptive Muon: Accelerating LLM Pretraining with NSR-Modulated and Variance-Scaled Momentum | Jingru Li et.al. | 2601.14603 | translate | read | null |
| 2026-01-21 | 3D Space as a Scratchpad for Editable Text-to-Image Generation | Oindrila Saha et.al. | 2601.14602 | translate | read | null |
| 2026-01-21 | HELIOS: Hierarchical Graph Abstraction for Structure-Aware LLM Decompilation | Yonatan Gizachew Achamyeleh et.al. | 2601.14598 | translate | read | null |
| 2026-01-21 | LFS: Learnable Frame Selector for Event-Aware and Temporally Diverse Video Captioning | Lianying Chao et.al. | 2601.14594 | translate | read | null |
| 2026-01-21 | Counterfactual Modeling with Fine-Tuned LLMs for Health Intervention Design and Sensor Data Augmentation | Shovito Barua Soumma et.al. | 2601.14590 | translate | read | null |
| 2026-01-21 | Social Caption: Evaluating Social Understanding in Multimodal Models | Bhaavanaa Thumu et.al. | 2601.14569 | translate | read | null |
| 2026-01-21 | Rewarding How Models Think Pedagogically: Integrating Pedagogical Reasoning and Thinking Rewards for LLMs in Education | Unggi Lee et.al. | 2601.14560 | translate | read | null |
| 2026-01-21 | Self-Blinding and Counterfactual Self-Simulation Mitigate Biases and Sycophancy in Large Language Models | Brian Christian et.al. | 2601.14553 | translate | read | null |
| 2026-01-20 | Predicting Retrieval Utility and Answer Quality in Retrieval-Augmented Generation | Fangzheng Tian et.al. | 2601.14546 | translate | read | null |
| 2026-01-20 | Report for NSF Workshop on AI for Electronic Design Automation | Deming Chen et.al. | 2601.14541 | translate | read | null |
| 2026-01-20 | LLM Security and Safety: Insights from Homotopy-Inspired Prompt Obfuscation | Luis Lazo et.al. | 2601.14528 | translate | read | null |
| 2026-01-20 | Large Language Model-Powered Evolutionary Code Optimization on a Phylogenetic Tree | Leyi Zhao et.al. | 2601.14523 | translate | read | null |
| 2026-01-20 | Can LLM Reasoning Be Trusted? A Comparative Study: Using Human Benchmarking on Statistical Tasks | Crish Nagarkar et.al. | 2601.14479 | translate | read | null |
| 2026-01-20 | Large Language Models for Large-Scale, Rigorous Qualitative Analysis in Applied Health Services Research | Sasha Ronaghi et.al. | 2601.14478 | translate | read | null |
| 2026-01-20 | On the Generalization Gap in LLM Planning: Tests and Verifier-Reward RL | Valerio Belcamino et.al. | 2601.14456 | translate | read | null |
| 2026-01-20 | Diffusion Large Language Models for Black-Box Optimization | Ye Yuan et.al. | 2601.14446 | translate | read | null |
| 2026-01-20 | Agentic AI Meets Edge Computing in Autonomous UAV Swarms | Thuan Minh Nguyen et.al. | 2601.14437 | translate | read | null |
| 2026-01-20 | CMind: An AI Agent for Localizing C Memory Bugs | Chia-Yi Su et.al. | 2601.14434 | translate | read | null |
| 2026-01-20 | Measuring the State of Open Science in Transportation Using Large Language Models | Junyi Ji et.al. | 2601.14429 | translate | read | null |
| 2026-01-20 | Rethinking On-Device LLM Reasoning: Why Analogical Mapping Outperforms Abstract Thinking for IoT DDoS Detection | William Pan et.al. | 2601.14343 | translate | read | null |
| 2026-01-20 | Turn-Based Structural Triggers: Prompt-Free Backdoors in Multi-Turn LLMs | Yiyang Lu et.al. | 2601.14340 | translate | read | null |
| 2026-01-20 | Layer-adaptive Expert Pruning for Pre-Training of Mixture-of-Experts Large Language Models | YuanLab.ai et.al. | 2601.14327 | translate | read | null |
| 2026-01-19 | Tracing the Data Trail: A Survey of Data Provenance, Transparency and Traceability in LLMs | Richard Hohensinner et.al. | 2601.14311 | translate | read | null |
| 2026-01-19 | CORVUS: Red-Teaming Hallucination Detectors via Internal Signal Camouflage in Large Language Models | Nay Myat Min et.al. | 2601.14310 | translate | read | null |
| 2026-01-20 | XR: Cross-Modal Agents for Composed Image Retrieval | Zhongyu Yang et.al. | 2601.14245 | translate | read | null |
| 2026-01-20 | Jet-RL: Enabling On-Policy FP8 Reinforcement Learning with Unified Training and Rollout Precision Flow | Haocheng Xi et.al. | 2601.14243 | translate | read | null |
| 2026-01-20 | Attention-Based Offline Reinforcement Learning and Clustering for Interpretable Sepsis Treatment | Punit Kumar et.al. | 2601.14228 | translate | read | null |
| 2026-01-20 | HALT: Hallucination Assessment via Latent Testing | Rohan Bhatnagar et.al. | 2601.14210 | translate | read | null |
| 2026-01-20 | InT: Self-Proposed Interventions Enable Credit Assignment in LLM Reasoning | Matthew Y. R. Yang et.al. | 2601.14209 | translate | read | null |
| 2026-01-20 | Toward Efficient Agents: Memory, Tool learning, and Planning | Xiaofang Yang et.al. | 2601.14192 | translate | read | link |
| 2026-01-20 | ReSearch: A Multi-Stage Machine Learning Framework for Earth Science Data Discovery | Youran Sun et.al. | 2601.14176 | translate | read | null |
| 2026-01-20 | Paper2Rebuttal: A Multi-Agent Framework for Transparent Author Response Assistance | Qianli Ma et.al. | 2601.14171 | translate | read | link |
| 2026-01-20 | Domain-Adaptation through Synthetic Data: Fine-Tuning Large Language Models for German Law | Ali Hamza Bashir et.al. | 2601.14160 | translate | read | null |
| 2026-01-20 | ConceptCaps – a Distilled Concept Dataset for Interpretability in Music Models | Bruno Sienkiewicz et.al. | 2601.14157 | translate | read | null |
| 2026-01-20 | LLM Augmented Intervenable Multimodal Adaptor for Post-operative Complication Prediction in Lung Cancer Surgery | Shubham Pandey et.al. | 2601.14154 | translate | read | null |
| 2026-01-20 | Lost in the Prompt Order: Revealing the Limitations of Causal Attention in Language Models | Hyunjong Ok et.al. | 2601.14152 | translate | read | null |
| 2026-01-20 | The Quest for Reliable AI Accelerators: Cross-Layer Evaluation and Design Optimization | Meng Li et.al. | 2601.14148 | translate | read | null |
| 2026-01-20 | CREATE: Cross-Layer Resilience Characterization and Optimization for Efficient yet Reliable Embodied AI Systems | Tong Xie et.al. | 2601.14140 | translate | read | null |
| 2026-01-20 | The Side Effects of Being Smart: Safety Risks in MLLMs’ Multi-Image Reasoning | Renmiao Chen et.al. | 2601.14127 | translate | read | link |
| 2026-01-20 | Style Transfer as Bias Mitigation: Diffusion Models for Synthetic Mental Health Text for Arabic | Saad Mankarious et.al. | 2601.14124 | translate | read | null |
| 2026-01-20 | NewsRECON: News article REtrieval for image CONtextualization | Jonathan Tonglet et.al. | 2601.14121 | translate | read | null |
| 2026-01-20 | A flexible language model-assisted electronic design automation framework | Cristian Sestito et.al. | 2601.14098 | translate | read | null |
| 2026-01-20 | Zero-shot adaptable task planning for autonomous construction robots: a comparative study of lightweight single and multi-AI agent systems | Hossein Naderi et.al. | 2601.14091 | translate | read | null |
| 2026-01-20 | DermaBench: A Clinician-Annotated Benchmark Dataset for Dermatology Visual Question Answering and Reasoning | Abdurrahim Yilmaz et.al. | 2601.14084 | translate | read | null |
| 2026-01-20 | XCR-Bench: A Multi-Task Benchmark for Evaluating Cultural Reasoning in LLMs | Mohsinul Kabir et.al. | 2601.14063 | translate | read | null |
| 2026-01-20 | Fine-Grained Zero-Shot Composed Image Retrieval with Complementary Visual-Semantic Integration | Yongcong Ye et.al. | 2601.14060 | translate | read | null |
| 2026-01-20 | LLMOrbit: A Circular Taxonomy of Large Language Models - From Scaling Walls to Agentic AI Systems | Badri N. Patro et.al. | 2601.14053 | translate | read | null |
| 2026-01-20 | Vision Also You Need: Navigating Out-of-Distribution Detection with Multimodal Large Language Model | Haoran Xu et.al. | 2601.14052 | translate | read | null |
| 2026-01-20 | Top 10 Open Challenges Steering the Future of Diffusion Language Model and Its Variants | Yunhe Wang et.al. | 2601.14041 | translate | read | null |
| 2026-01-20 | RM-Distiller: Exploiting Generative LLM for Reward Model Distillation | Hongli Zhou et.al. | 2601.14032 | translate | read | null |
| 2026-01-20 | BACH-V: Bridging Abstract and Concrete Human-Values in Large Language Models | Junyu Zhang et.al. | 2601.14007 | translate | read | null |
| 2026-01-20 | Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models | Hengyuan Zhang et.al. | 2601.14004 | translate | read | link |
| 2026-01-20 | Auditory Brain Passage Retrieval: Cross-Sensory EEG Training for Neural Information Retrieval | Niall McGuire et.al. | 2601.14001 | translate | read | null |
| 2026-01-20 | “The Whole Is Greater Than the Sum of Its Parts”: A Compatibility-Aware Multi-Teacher CoT Distillation Framework | Jin Cui et.al. | 2601.13992 | translate | read | null |
| 2026-01-20 | VirtualCrime: Evaluating Criminal Potential of Large Language Models via Sandbox Simulation | Yilin Tang et.al. | 2601.13981 | translate | read | null |
| 2026-01-20 | RepoGenesis: Benchmarking End-to-End Microservice Generation from Readme to Repository | Zhiyuan Peng et.al. | 2601.13943 | translate | read | null |
| 2026-01-20 | Glance-or-Gaze: Incentivizing LMMs to Adaptively Focus Search via Reinforcement Learning | Hongbo Bai et.al. | 2601.13942 | translate | read | null |
| 2026-01-20 | HyperWalker: Dynamic Hypergraph-Based Deep Diagnosis for Multi-Hop Clinical Modeling across EHR and X-Ray in Medical VLMs | Yuezhe Yang et.al. | 2601.13919 | translate | read | null |
| 2026-01-20 | AgentEHR: Advancing Autonomous Clinical Decision-Making via Retrospective Summarization | Yusheng Liao et.al. | 2601.13918 | translate | read | link |
| 2026-01-20 | Synthetic Singers: A Review of Deep-Learning-based Singing Voice Synthesis Approaches | Changhao Pan et.al. | 2601.13910 | translate | read | null |
| 2026-01-20 | Multi-Objective Hierarchical Optimization with Large Language Models | Andrej Schwanke et.al. | 2601.13892 | translate | read | null |
| 2026-01-20 | Human Simulation Computation: A Human-Inspired Framework for Adaptive AI Systems | Hong Su et.al. | 2601.13887 | translate | read | null |
| 2026-01-20 | OpenLearnLM Benchmark: A Unified Framework for Evaluating Knowledge, Skill, and Attitude in Educational Large Language Models | Unggi Lee et.al. | 2601.13882 | translate | read | null |
| 2026-01-20 | LifeAgentBench: A Multi-dimensional Benchmark and Agent for Personal Health Assistants in Digital Health | Ye Tian et.al. | 2601.13880 | translate | read | null |
| 2026-01-20 | Chain-of-Thought Compression Should Not Be Blind: V-Skip for Efficient Multimodal Reasoning via Dual-Path Anchoring | Dongxu Zhang et.al. | 2601.13879 | translate | read | null |
| 2026-01-20 | Pedagogical Alignment for Vision-Language-Action Models: A Comprehensive Framework for Data, Architecture, and Evaluation in Education | Unggi Lee et.al. | 2601.13876 | translate | read | null |
| 2026-01-20 | HardSecBench: Benchmarking the Security Awareness of LLMs for Hardware Code Generation | Qirui Chen et.al. | 2601.13864 | translate | read | null |
| 2026-01-20 | QKVQA: Question-Focused Filtering for Knowledge-based VQA | Wei Ye et.al. | 2601.13856 | translate | read | null |
| 2026-01-20 | Small Models, Big Impact: Tool-Augmented AI Agents for Wireless Network Planning | Yongqiang Zhang et.al. | 2601.13843 | translate | read | null |
| 2026-01-20 | DisasterVQA: A Visual Question Answering Benchmark Dataset for Disaster Scenes | Aisha Al-Mohannadi et.al. | 2601.13839 | translate | read | null |
| 2026-01-20 | FutureOmni: Evaluating Future Forecasting from Omni-Modal Context for Multimodal LLMs | Qian Chen et.al. | 2601.13836 | translate | read | link |
| 2026-01-20 | ELSA: Efficient LLM-Centric Split Aggregation for Privacy-Aware Hierarchical Federated Learning over Resource-Constrained Edge Networks | Xiaohong Yang et.al. | 2601.13824 | translate | read | null |
| 2026-01-20 | HoverAI: An Embodied Aerial Agent for Natural Human-Drone Interaction | Yuhua Jin et.al. | 2601.13801 | translate | read | null |
| 2026-01-20 | Look-Ahead-Bench: a Standardized Benchmark of Look-ahead Bias in Point-in-Time LLMs for Finance | Mostapha Benhenda et.al. | 2601.13770 | translate | read | null |
| 2026-01-20 | DARC: Decoupled Asymmetric Reasoning Curriculum for LLM Evolution | Shengda Fan et.al. | 2601.13761 | translate | read | link |
| 2026-01-20 | On Autopilot? An Empirical Study of Human-AI Teaming and Review Practices in Open Source | Haoyu Gao et.al. | 2601.13754 | translate | read | null |
| 2026-01-20 | Pro-AI Bias in Large Language Models | Benaya Trabelsi et.al. | 2601.13749 | translate | read | null |
| 2026-01-20 | Dimension-First Evaluation of Speech-to-Speech Models with Structured Acoustic Cues | Arjun Chandra et.al. | 2601.13742 | translate | read | null |
| 2026-01-20 | Towards robust long-context understanding of large language model via active recap learning | Chenyu Hui et.al. | 2601.13734 | translate | read | null |
| 2026-01-20 | OP-Bench: Benchmarking Over-Personalization for Memory-Augmented Personalized Conversational Agents | Yulin Hu et.al. | 2601.13722 | translate | read | null |
| 2026-01-20 | GerAV: Towards New Heights in German Authorship Verification using Fine-Tuned LLMs on a New Benchmark | Lotta Kiefer et.al. | 2601.13711 | translate | read | null |
| 2026-01-20 | Hidden in Plain Text: Measuring LLM Deception Quality Against Human Baselines Using Social Deduction Games | Christopher Kao et.al. | 2601.13709 | translate | read | null |
| 2026-01-20 | IGAA: Intent-Driven General Agentic AI for Edge Services Scheduling using Generative Meta Learning | Yan Sun et.al. | 2601.13702 | translate | read | null |
| 2026-01-20 | Uncertainty-Aware Gradient Signal-to-Noise Data Selection for Instruction Tuning | Zhihang Yuan et.al. | 2601.13697 | translate | read | null |
| 2026-01-20 | Generative Intent Prediction Agentic AI empowered Edge Service Function Chain Orchestration | Yan Sun et.al. | 2601.13694 | translate | read | null |
| 2026-01-20 | Dr. Assistant: Enhancing Clinical Diagnostic Inquiry via Structured Diagnostic Reasoning Data and Reinforcement Learning | Yue Guo et.al. | 2601.13690 | translate | read | null |
| 2026-01-20 | CodeContests-O: Powering LLMs via Feedback-Driven Iterative Test Case Generation | Jianfeng Cai et.al. | 2601.13682 | translate | read | link |
| 2026-01-20 | CommunityBench: Benchmarking Community-Level Alignment across Diverse Groups and Tasks | Jiayu Lin et.al. | 2601.13669 | translate | read | null |
| 2026-01-20 | Temporal-Spatial Decouple before Act: Disentangled Representation Learning for Multimodal Sentiment Analysis | Chunlei Meng et.al. | 2601.13659 | translate | read | null |
| 2026-01-20 | Beyond Known Facts: Generating Unseen Temporal Knowledge to Address Data Contamination in LLM Evaluation | Arthur Amalvy et.al. | 2601.13658 | translate | read | null |
| 2026-01-20 | Why Does the LLM Stop Computing: An Empirical Study of User-Reported Failures in Open-Source LLMs | Guangba Yu et.al. | 2601.13655 | translate | read | null |
| 2026-01-20 | TimeART: Towards Agentic Time Series Reasoning via Tool-Augmentation | Xingjian Wu et.al. | 2601.13653 | translate | read | null |
| 2026-01-20 | Fairness or Fluency? An Investigation into Language Bias of Pairwise LLM-as-a-Judge | Xiaolin Zhou et.al. | 2601.13649 | translate | read | null |
| 2026-01-20 | ContiguousKV: Accelerating LLM Prefill with Granularity-Aligned KV Cache Management | Jing Zou et.al. | 2601.13631 | translate | read | null |
| 2026-01-20 | Activation-Space Anchored Access Control for Multi-Class Permission Reasoning in Large Language Models | Zhaopeng Zhang et.al. | 2601.13630 | translate | read | null |
| 2026-01-20 | S$^2$Voice: Style-Aware Autoregressive Modeling with Enhanced Conditioning for Singing Style Conversion | Ziqian Wang et.al. | 2601.13629 | translate | read | null |
| 2026-01-20 | PRIMAL: Processing-In-Memory Based Low-Rank Adaptation for LLM Inference Accelerator | Yue Jiet Chong et.al. | 2601.13628 | translate | read | null |
| 2026-01-20 | Are Large Language Models able to Predict Highly Cited Papers? Evidence from Statistical Publications | Zhanshuo Ye et.al. | 2601.13627 | translate | read | null |
| 2026-01-20 | PINA: Prompt Injection Attack against Navigation Agents | Jiani Liu et.al. | 2601.13612 | translate | read | null |
| 2026-01-20 | Foundations of Global Consistency Checking with Noisy LLM Oracles | Paul He et.al. | 2601.13600 | translate | read | null |
| 2026-01-20 | AI IDEs or Autonomous Agents? Measuring the Impact of Coding Agents on Software Development | Shyam Agarwal et.al. | 2601.13597 | translate | read | null |
| 2026-01-20 | Vulnerability of LLMs’ Belief Systems? LLMs Belief Resistance Check Through Strategic Persuasive Conversation Interventions | Fan Huang et.al. | 2601.13590 | translate | read | null |
| 2026-01-20 | TREX: Tokenizer Regression for Optimal Data Mixture | Inho Won et.al. | 2601.13588 | translate | read | null |
| 2026-01-20 | SCRIPTMIND: Crime Script Inference and Cognitive Evaluation for LLM-based Social Engineering Scam Detection System | Heedou Kim et.al. | 2601.13581 | translate | read | null |
| 2026-01-20 | Leveraging ChatGPT and Other NLP Methods for Identifying Risk and Protective Behaviors in MSM: Social Media and Dating apps Text Analysis | Mehrab Beikzadeh et.al. | 2601.13558 | translate | read | null |
| 2026-01-20 | LogicEnvGen: Task-Logic Driven Generation of Diverse Simulated Environments for Embodied AI | Jianan Wang et.al. | 2601.13556 | translate | read | null |
| 2026-01-20 | TruthTensor: Evaluating LLMs Human Imitation through Prediction Market Drift and Holistic Reasoning | Shirin Shahabi et.al. | 2601.13545 | translate | read | null |
| 2026-01-20 | When Wording Steers the Evaluation: Framing Bias in LLM judges | Yerin Hwang et.al. | 2601.13537 | translate | read | null |
| 2026-01-20 | CatMaster: An Agentic Autonomous System for Computational Heterogeneous Catalysis Research | Honghao Chen et.al. | 2601.13508 | translate | read | null |
| 2026-01-20 | Towards Efficient and Robust Linguistic Emotion Diagnosis for Mental Health via Multi-Agent Instruction Refinement | Jian Zhang et.al. | 2601.13481 | translate | read | null |
| 2026-01-20 | A Unified Variational Imputation Framework for Electric Vehicle Charging Data Using Retrieval-Augmented Language Model | Jinhao Li et.al. | 2601.13476 | translate | read | null |
| 2026-01-20 | Preconditioning Benefits of Spectral Orthogonalization in Muon | Jianhao Ma et.al. | 2601.13474 | translate | read | null |
| 2026-01-19 | PhysicsSolutionAgent: Towards Multimodal Explanations for Numerical Physics Problem Solving | Aditya Thole et.al. | 2601.13453 | translate | read | null |
| 2026-01-19 | Explicit Cognitive Allocation: A Principle for Governed and Auditable Inference in Large Language Models | Héctor Manuel Manzanilla-Granados et.al. | 2601.13443 | translate | read | null |
| 2026-01-19 | Trust Me, I’m an Expert: Decoding and Steering Authority Bias in Large Language Models | Priyanka Mary Mammen et.al. | 2601.13433 | translate | read | null |
| 2026-01-19 | RLBR: Reinforcement Learning with Biasing Rewards for Contextual Speech Large Language Models | Bo Ren et.al. | 2601.13409 | translate | read | null |
| 2026-01-19 | Integrating Virtual Reality and Large Language Models for Team-Based Non-Technical Skills Training and Evaluation in the Operating Room | Jacob Barker et.al. | 2601.13406 | translate | read | null |
| 2026-01-19 | Beyond Memorization: Testing LLM Reasoning on Unseen Theory of Computation Tasks | Shlok Shelat et.al. | 2601.13392 | translate | read | null |
| 2026-01-19 | Structured Insight from Unstructured Data: Large Language Models for SDOH-Driven Diabetes Risk Prediction | Sasha Ronaghi et.al. | 2601.13388 | translate | read | null |
| 2026-01-19 | Confidence over Time: Confidence Calibration with Temporal Logic for Large Language Model Reasoning | Zhenjiang Mao et.al. | 2601.13387 | translate | read | null |
| 2026-01-19 | A Lightweight Modular Framework for Constructing Autonomous Agents Driven by Large Language Models: Design, Implementation, and Applications in AgentForge | Akbar Anbar Jafari et.al. | 2601.13383 | translate | read | null |
| 2026-01-19 | Bounded Minds, Generative Machines: Envisioning Conversational AI that Works with Human Heuristics and Reduces Bias Risk | Jiqun Liu et.al. | 2601.13376 | translate | read | null |
| 2026-01-19 | Recurrent Confidence Chain: Temporal-Aware Uncertainty Quantification in Large Language Models | Zhenjiang Mao et.al. | 2601.13368 | translate | read | null |
| 2026-01-19 | Sockpuppetting: Jailbreaking LLMs Without Optimization Through Output Prefix Injection | Asen Dotsinski et.al. | 2601.13359 | translate | read | null |
| 2026-01-19 | The Geometry of Thought: How Scale Restructures Reasoning In Large Language Models | Samuel Cyrenius Anderson et.al. | 2601.13358 | translate | read | null |
| 2026-01-19 | LLM-as-RNN: A Recurrent Language Model for Memory Updates and Sequence Prediction | Yuxing Lu et.al. | 2601.13352 | translate | read | null |
| 2026-01-19 | FlipFlop: A Static Analysis-based Energy Optimization Framework for GPU Kernels | Saurabhsingh Rajput et.al. | 2601.13345 | translate | read | null |
| 2026-01-19 | Paid Voices vs. Public Feeds: Interpretable Cross-Platform Theme Modeling of Climate Discourse | Samantha Sudhoff et.al. | 2601.13317 | translate | read | null |
| 2026-01-19 | CausalSpatial: A Benchmark for Object-Centric Causal Spatial Reasoning | Wenxin Ma et.al. | 2601.13304 | translate | read | null |
| 2026-01-19 | OI-Bench: An Option Injection Benchmark for Evaluating LLM Susceptibility to Directive Interference | Yow-Fu Liou et.al. | 2601.13300 | translate | read | null |
| 2026-01-19 | Enginuity: Building an Open Multi-Domain Dataset of Complex Engineering Diagrams | Ethan Seefried et.al. | 2601.13299 | translate | read | null |
| 2026-01-19 | The Tag is the Signal: URL-Agnostic Credibility Scoring for Messages on Telegram | Yipeng Wang et.al. | 2601.13294 | translate | read | null |
| 2026-01-19 | Semantic Communication in Underwater IoT Networks for Meaning-Driven Connectivity | Ruhul Amin Khalil et.al. | 2601.13289 | translate | read | null |
| 2026-01-19 | Balancing Classification and Calibration Performance in Decision-Making LLMs via Calibration Aware Reinforcement Learning | Duygu Nur Yaldiz et.al. | 2601.13284 | translate | read | null |
| 2026-01-19 | Improving the Safety and Trustworthiness of Medical AI via Multi-Agent Evaluation Loops | Zainab Ghafoor et.al. | 2601.13268 | translate | read | null |
| 2026-01-19 | Unlearning in LLMs: Methods, Evaluation, and Open Challenges | Tyler Lizzo et.al. | 2601.13264 | translate | read | null |
| 2026-01-19 | CURE-Med: Curriculum-Informed Reinforcement Learning for Multilingual Medical Reasoning | Eric Onyame et.al. | 2601.13262 | translate | read | null |
| 2026-01-19 | Stop Taking Tokenizers for Granted: They Are Core Design Decisions in Large Language Models | Sawsan Alqahtani et.al. | 2601.13260 | translate | read | null |
| 2026-01-19 | Aligning Agentic World Models via Knowledgeable Experience Learning | Baochang Ren et.al. | 2601.13247 | translate | read | null |
| 2026-01-19 | A Comprehensive Evaluation of LLM Reasoning: From Single-Model to Multi-Agent Paradigms | Yapeng Li et.al. | 2601.13243 | translate | read | null |
| 2026-01-19 | KOCO-BENCH: Can Large Language Models Leverage Domain Knowledge in Software Development? | Xue Jiang et.al. | 2601.13240 | translate | read | null |
| 2026-01-19 | GTPred: Benchmarking MLLMs for Interpretable Geo-localization and Time-of-capture Prediction | Jinnao Li et.al. | 2601.13207 | translate | read | null |
| 2026-01-19 | Real-Time Deadlines Reveal Temporal Awareness Failures in LLM Strategic Dialogues | Neil K. R. Sehgal et.al. | 2601.13206 | translate | read | null |
| 2026-01-19 | Scientific production in the era of Large Language Models | Keigo Kusumegi et.al. | 2601.13187 | translate | read | null |
| 2026-01-19 | Prompt Injection Mitigation with Agentic AI, Nested Learning, and AI Sustainability via Semantic Caching | Diego Gosmar et.al. | 2601.13186 | translate | read | null |
| 2026-01-19 | Training instability in deep learning follows low-dimensional dynamical principles | Zhipeng Zhang et.al. | 2601.13160 | translate | read | null |
| 2026-01-19 | Seeing Radio: From Zero RF Priors to Explainable Modulation Recognition with Vision Language Models | Hang Zou et.al. | 2601.13157 | translate | read | null |
| 2026-01-19 | Probe and Skip: Self-Predictive Token Skipping for Efficient Long-Context LLM Inference | Zimeng Wu et.al. | 2601.13155 | translate | read | null |
| 2026-01-19 | FastAV: Efficient Token Pruning for Audio-Visual Large Language Model Inference | Chaeyoung Jung et.al. | 2601.13143 | translate | read | null |
| 2026-01-19 | From Human to Machine Refactoring: Assessing GPT-4’s Impact on Python Class Quality and Readability | Alessandro Midolo et.al. | 2601.13139 | translate | read | null |
| 2026-01-19 | Adversarial Alignment: Ensuring Value Consistency in Large Language Models for Sensitive Domains | Yuan Gao et.al. | 2601.13137 | translate | read | null |
| 2026-01-19 | Guidelines to Prompt Large Language Models for Code Generation: An Empirical Characterization | Alessandro Midolo et.al. | 2601.13118 | translate | read | null |
| 2026-01-19 | Agentic Conversational Search with Contextualized Reasoning via Reinforcement Learning | Fengran Mo et.al. | 2601.13115 | translate | read | null |
| 2026-01-19 | Leveraging Lora Fine-Tuning and Knowledge Bases for Construction Identification | Liu Kaipeng et.al. | 2601.13105 | translate | read | null |
| 2026-01-19 | Alexandria: A Multi-Domain Dialectal Arabic Machine Translation Dataset for Culturally Inclusive and Linguistically Diverse LLMs | Abdellah El Mekki et.al. | 2601.13099 | translate | read | null |
| 2026-01-19 | LLM-VLM Fusion Framework for Autonomous Maritime Port Inspection using a Heterogeneous UAV-USV System | Muhayy Ud Din et.al. | 2601.13096 | translate | read | null |
| 2026-01-19 | Adversarial News and Lost Profits: Manipulating Headlines in LLM-Driven Algorithmic Trading | Advije Rizvani et.al. | 2601.13082 | translate | read | null |
| 2026-01-19 | What’s it like to be a chat? On the co-simulation of artificial minds in human-AI conversations | Geoff Keeling et.al. | 2601.13081 | translate | read | null |
| 2026-01-19 | Profiling German Text Simplification with Interpretable Model-Fingerprints | Lars Klöser et.al. | 2601.13050 | translate | read | null |
| 2026-01-19 | Tears or Cheers? Benchmarking LLMs via Culturally Elicited Distinct Affective Responses | Chongyuan Dai et.al. | 2601.13024 | translate | read | null |
| 2026-01-19 | PASs-MoE: Mitigating Misaligned Co-drift among Router and Experts via Pathway Activation Subspaces for Continual Learning | Zhiyan Hou et.al. | 2601.13020 | translate | read | null |
| 2026-01-19 | MeltRTL: Multi-Expert LLMs with Inference-time Intervention for RTL Code Generation | Nowfel Mashnoor et.al. | 2601.13015 | translate | read | null |
| 2026-01-19 | ArchAgent: Scalable Legacy Software Architecture Recovery with LLMs | Rusheng Pan et.al. | 2601.13007 | translate | read | null |
| 2026-01-19 | Graph Reasoning Paradigm: Structured and Symbolic Reasoning with Topology-Aware Reinforcement Learning for Large Language Models | Runxuan Liu et.al. | 2601.12995 | translate | read | null |
| 2026-01-19 | RAGExplorer: A Visual Analytics System for the Comparative Diagnosis of RAG Systems | Haoyu Tian et.al. | 2601.12991 | translate | read | null |
| 2026-01-19 | PaperGuide: Making Small Language-Model Paper-Reading Agents More Efficient | Zijian Wang et.al. | 2601.12988 | translate | read | null |
| 2026-01-19 | KinGuard: Hierarchical Kinship-Aware Fingerprinting to Defend Against Large Language Model Stealing | Zhenhua Xu et.al. | 2601.12986 | translate | read | null |
| 2026-01-19 | Rules, Resources, and Restrictions: A Taxonomy of Task-Based Information Request Intents | Melanie A. Kilian et.al. | 2601.12985 | translate | read | null |
| 2026-01-19 | ChartAttack: Testing the Vulnerability of LLMs to Malicious Prompting in Chart Generation | Jesus-German Ortiz-Barajas et.al. | 2601.12983 | translate | read | null |
| 2026-01-19 | The Bitter Lesson of Diffusion Language Models for Agentic Workflows: A Comprehensive Reality Check | Qingyu Lu et.al. | 2601.12979 | translate | read | null |
| 2026-01-19 | Bridging the Knowledge-Action Gap by Evaluating LLMs in Dynamic Dental Clinical Scenarios | Hongyang Ma et.al. | 2601.12974 | translate | read | null |
| 2026-01-19 | ACE-Align: Attribute Causal Effect Alignment for Cultural Values under Varying Persona Granularities | Jiatang Luo et.al. | 2601.12962 | translate | read | null |
| 2026-01-19 | Beyond Accuracy: Characterizing Code Comprehension Capabilities in (Large) Language Models | Felix Mächtle et.al. | 2601.12951 | translate | read | null |
| 2026-01-19 | AI-generated data contamination erodes pathological variability and diagnostic reliability | Hongyu He et.al. | 2601.12946 | translate | read | null |
| 2026-01-19 | A Component-Based Survey of Interactions between Large Language Models and Multi-Armed Bandits | Miao Xie et.al. | 2601.12945 | translate | read | null |
| 2026-01-19 | On the Evidentiary Limits of Membership Inference for Copyright Auditing | Murat Bilgehan Ertan et.al. | 2601.12937 | translate | read | null |
| 2026-01-19 | A Benchmark for Language Models in Real-World System Building | Weilin Jin et.al. | 2601.12927 | translate | read | null |
| 2026-01-19 | Dual-Stream Collaborative Transformer for Image Captioning | Jun Wan et.al. | 2601.12926 | translate | read | null |
| 2026-01-19 | Injecting Knowledge from Social Science Journals to Improve Indonesian Cultural Understanding by LLMs | Adimulya Kartiyasa et.al. | 2601.12921 | translate | read | null |
| 2026-01-19 | CooperLLM: Cloud-Edge-End Cooperative Federated Fine-tuning for LLMs via ZOO-based Gradient Correction | He Sun et.al. | 2601.12917 | translate | read | null |
| 2026-01-19 | From Prefix Cache to Fusion RAG Cache: Accelerating LLM Inference in Retrieval-Augmented Generation | Jiahao Wang et.al. | 2601.12904 | translate | read | null |
| 2026-01-19 | Efficient Code Analysis via Graph-Guided Large Language Models | Hang Gao et.al. | 2601.12890 | translate | read | null |
| 2026-01-19 | Race, Ethnicity and Their Implication on Bias in Large Language Models | Shiyue Hu et.al. | 2601.12868 | translate | read | null |
| 2026-01-19 | SCULPT: Constraint-Guided Pruned MCTS that Carves Efficient Paths for Mathematical Reasoning | Qitong Fang et.al. | 2601.12842 | translate | read | null |
| 2026-01-19 | Do Clinical Question Answering Systems Really Need Specialised Medical Fine Tuning? | Sushant Kumar Ray et.al. | 2601.12812 | translate | read | null |
| 2026-01-19 | Semi-supervised Instruction Tuning for Large Language Models on Text-Attributed Graphs | Zixing Song et.al. | 2601.12807 | translate | read | null |
| 2026-01-19 | SciHorizon-GENE: Benchmarking LLM for Life Sciences Inference from Gene Knowledge to Functional Understanding | Xiaohan Huang et.al. | 2601.12805 | translate | read | null |
| 2026-01-19 | VIRO: Robust and Efficient Neuro-Symbolic Reasoning with Verification for Referring Expression Comprehension | Hyejin Park et.al. | 2601.12781 | translate | read | null |
| 2026-01-19 | Who Does This Name Remind You of? Nationality Prediction via Large Language Model Associative Memory | Keito Inoshita et.al. | 2601.12771 | translate | read | null |
| 2026-01-19 | Spatial-VLN: Zero-Shot Vision-and-Language Navigation With Explicit Spatial Perception and Exploration | Lu Yue et.al. | 2601.12766 | translate | read | null |
| 2026-01-19 | Teaching LLMs to Learn Tool Trialing and Execution through Environment Interaction | Xingjie Gao et.al. | 2601.12762 | translate | read | link |
| 2026-01-19 | VISPA: Pluralistic Alignment via Automatic Value Selection and Activation | Shenyan Zheng et.al. | 2601.12758 | translate | read | null |
| 2026-01-19 | PAIR-SAFE: A Paired-Agent Approach for Runtime Auditing and Refining AI-Mediated Mental Health Support | Jiwon Kim et.al. | 2601.12754 | translate | read | null |
| 2026-01-19 | Towards Robust Process Reward Modeling via Noise-aware Learning | Bin Xie et.al. | 2601.12748 | translate | read | null |
| 2026-01-19 | Vision Language Models for Optimization-Driven Intent Processing in Autonomous Networks | Tasnim Ahmed et.al. | 2601.12744 | translate | read | null |
| 2026-01-19 | A Shared Geometry of Difficulty in Multilingual Language Models | Stefano Civelli et.al. | 2601.12731 | translate | read | null |
| 2026-01-19 | Distribution-Centric Policy Optimization Dominates Exploration-Exploitation Trade-off | Zhaochun Li et.al. | 2601.12730 | translate | read | link |
| 2026-01-19 | AI-exhibited Personality Traits Can Shape Human Self-concept through Conversations | Jingshu Li et.al. | 2601.12727 | translate | read | null |
| 2026-01-19 | An Evolutionary Framework for Automatic Optimization Benchmark Generation via Large Language Models | Yuhiro Ono et.al. | 2601.12723 | translate | read | null |
| 2026-01-19 | CellularSpecSec-Bench: A Staged Benchmark for Evidence-Grounded Interpretation and Security Reasoning over 3GPP Specifications | Ke Xie et.al. | 2601.12716 | translate | read | null |
| 2026-01-19 | Neurosymbolic LoRA: Why and When to Tune Weights vs. Rewrite Prompts | Kevin Wang et.al. | 2601.12711 | translate | read | null |
| 2026-01-19 | Improving Audio Question Answering with Variational Inference | Haolin Chen et.al. | 2601.12700 | translate | read | null |
| 2026-01-19 | MetaToolAgent: Towards Generalizable Tool Usage in LLMs through Meta-Learning | Zheng Fang et.al. | 2601.12680 | translate | read | null |
| 2026-01-19 | MedConsultBench: A Full-Cycle, Fine-Grained, Process-Aware Benchmark for Medical Consultation Agents | Chuhan Qiao et.al. | 2601.12661 | translate | read | null |
| 2026-01-19 | Augmenting Question Answering with A Hybrid RAG Approach | Tianyi Yang et.al. | 2601.12658 | translate | read | null |
| 2026-01-19 | Ethical Risks in Deploying Large Language Models: An Evaluation of Medical Ethics Jailbreaking | Chutian Huang et.al. | 2601.12652 | translate | read | null |
| 2026-01-19 | Intelligent Documentation in Medical Education: Can AI Replace Manual Case Logging? | Nafiz Imtiaz Khan et.al. | 2601.12648 | translate | read | null |
| 2026-01-19 | STEP-LLM: Generating CAD STEP Models from Natural Language with Large Language Models | Xiangyu Shi et.al. | 2601.12641 | translate | read | null |
| 2026-01-19 | BioPulse-QA: A Dynamic Biomedical Question-Answering Benchmark for Evaluating Factuality, Robustness, and Bias in Large Language Models | Kriti Bhattarai et.al. | 2601.12632 | translate | read | null |
| 2026-01-16 | Extractive summarization on a CMOS Ising machine | Ziqing Zeng et.al. | 2601.11491 | translate | read | null |
| 2026-01-16 | Health Facility Location in Ethiopia: Leveraging LLMs to Integrate Expert Knowledge into Algorithmic Planning | Yohai Trabelsi et.al. | 2601.11479 | translate | read | null |
| 2026-01-16 | Predict the Retrieval! Test time adaptation for Retrieval Augmented Generation | Xin Sun et.al. | 2601.11443 | translate | read | null |
| 2026-01-16 | Hierarchical Orthogonal Residual Spread for Precise Massive Editing in Large Language Models | Xiaojie Gu et.al. | 2601.11441 | translate | read | null |
| 2026-01-16 | The unreasonable effectiveness of pattern matching | Gary Lupyan et.al. | 2601.11432 | translate | read | null |
| 2026-01-16 | Relational Linearity is a Predictor of Hallucinations | Yuetian Lu et.al. | 2601.11429 | translate | read | null |
| 2026-01-16 | Understanding Help Seeking for Digital Privacy, Safety, and Security | Kurt Thomas et.al. | 2601.11398 | translate | read | null |
| 2026-01-16 | Heterogeneous Uncertainty-Guided Composed Image Retrieval with Fine-Grained Probabilistic Learning | Haomiao Tang et.al. | 2601.11393 | translate | read | null |
| 2026-01-16 | Evaluating LLM Behavior in Hiring: Implicit Weights, Fairness Across Groups, and Alignment with Human Preferences | Morgane Hoffmann et.al. | 2601.11379 | translate | read | null |
| 2026-01-16 | Reward Modeling for Scientific Writing Evaluation | Furkan Şahinuç et.al. | 2601.11374 | translate | read | null |
| 2026-01-16 | RITA: A Tool for Automated Requirements Classification and Specification from Online User Feedback | Manjeshwar Aniruddh Mallya et.al. | 2601.11362 | translate | read | null |
| 2026-01-16 | Think-Clip-Sample: Slow-Fast Frame Selection for Video Understanding | Wenhui Tan et.al. | 2601.11359 | translate | read | null |
| 2026-01-16 | AstroReason-Bench: Evaluating Unified Agentic Planning across Heterogeneous Space Planning Problems | Weiyi Wang et.al. | 2601.11354 | translate | read | link |
| 2026-01-16 | How Much Would a Clinician Edit This Draft? Evaluating LLM Alignment for Patient Message Response Drafting | Parker Seegmiller et.al. | 2601.11344 | translate | read | null |
| 2026-01-16 | Unlocking the Potentials of Retrieval-Augmented Generation for Diffusion Language Models | Chuanyue Yu et.al. | 2601.11342 | translate | read | null |
| 2026-01-16 | Neural Chain-of-Thought Search: Searching the Optimal Reasoning Path to Enhance Large Language Models | Guoming Ling et.al. | 2601.11340 | translate | read | null |
| 2026-01-16 | Idea First, Code Later: Disentangling Problem Solving from Code Generation in Evaluating LLMs for Competitive Programming | Sama Hadhoud et.al. | 2601.11332 | translate | read | null |
| 2026-01-16 | Membership Inference on LLMs in the Wild | Jiatong Yi et.al. | 2601.11314 | translate | read | null |
| 2026-01-16 | FORESTLLM: Large Language Models Make Random Forest Great on Few-shot Tabular Learning | Zhihan Yang et.al. | 2601.11311 | translate | read | null |
| 2026-01-16 | One LLM to Train Them All: Multi-Task Learning Framework for Fact-Checking | Malin Astrid Larsson et.al. | 2601.11293 | translate | read | null |
| 2026-01-16 | Knowledge is Not Enough: Injecting RL Skills for Continual Adaptation | Pingzhi Tang et.al. | 2601.11258 | translate | read | null |
| 2026-01-16 | Reasoning in Trees: Improving Retrieval-Augmented Generation for Multi-Hop Question Answering | Yuling Shi et.al. | 2601.11255 | translate | read | null |
| 2026-01-16 | LLM-Assisted Pseudo-Relevance Feedback | David Otero et.al. | 2601.11238 | translate | read | null |
| 2026-01-16 | How DDAIR you? Disambiguated Data Augmentation for Intent Recognition | Galo Castillo-López et.al. | 2601.11234 | translate | read | null |
| 2026-01-16 | FactCorrector: A Graph-Inspired Approach to Long-Form Factuality Correction of Large Language Models | Javier Carnerero-Cano et.al. | 2601.11232 | translate | read | null |
| 2026-01-16 | Language of Thought Shapes Output Diversity in Large Language Models | Shaoyang Xu et.al. | 2601.11227 | translate | read | null |
| 2026-01-16 | MultiCaption: Detecting disinformation using multilingual visual claims | Rafael Martins Frade et.al. | 2601.11220 | translate | read | null |
| 2026-01-16 | SDFLoRA: Selective Dual-Module LoRA for Federated Fine-tuning with Heterogeneous Clients | Zhikang Shen et.al. | 2601.11219 | translate | read | null |
| 2026-01-16 | FAQ: Mitigating Quantization Error via Regenerating Calibration Data with Family-Aware Quantization | Haiyang Xiao et.al. | 2601.11200 | translate | read | null |
| 2026-01-16 | SD-RAG: A Prompt-Injection-Resilient Framework for Selective Disclosure in Retrieval-Augmented Generation | Aiman Al Masoud et.al. | 2601.11199 | translate | read | null |
| 2026-01-16 | From Knots to Knobs: Towards Steerable Collaborative Filtering Using Sparse Autoencoders | Martin Spišák et.al. | 2601.11182 | translate | read | null |
| 2026-01-16 | Do We Always Need Query-Level Workflows? Rethinking Agentic Workflow Generation for Multi-Agent Systems | Zixu Wang et.al. | 2601.11147 | translate | read | null |
| 2026-01-16 | Learn Before Represent: Bridging Generative and Contrastive Learning for Domain-Specific LLM Embeddings | Xiaoyu Liang et.al. | 2601.11124 | translate | read | null |
| 2026-01-16 | Optimized Algorithms for Text Clustering with LLM-Generated Constraints | Chaoqi Jia et.al. | 2601.11118 | translate | read | null |
| 2026-01-16 | Differentially Private Subspace Fine-Tuning for Large Language Models | Lele Zheng et.al. | 2601.11113 | translate | read | null |
| 2026-01-16 | Simple Models, Rich Representations: Visual Decoding from Primate Intracortical Neural Signals | Matteo Ciferri et.al. | 2601.11108 | translate | read | null |
| 2026-01-16 | ReCreate: Reasoning and Creating Domain Agents Driven by Experience | Zhezheng Hao et.al. | 2601.11100 | translate | read | null |
| 2026-01-16 | Integrity Shield: A System for Ethical AI Use & Authorship Transparency in Assessments | Ashish Raj Shekhar et.al. | 2601.11093 | translate | read | null |
| 2026-01-16 | ABC-Bench: Benchmarking Agentic Backend Coding in Real-World Development | Jie Yang et.al. | 2601.11077 | translate | read | link |
| 2026-01-16 | Visual question answering-based image-finding generation for pulmonary nodules on chest CT from structured annotations | Maiko Nagao et.al. | 2601.11075 | translate | read | null |
| 2026-01-16 | H-AIM: Orchestrating LLMs, PDDL, and Behavior Trees for Hierarchical Multi-Robot Planning | Haishan Zeng et.al. | 2601.11063 | translate | read | null |
| 2026-01-16 | Children’s Expectations, Engagement, and Evaluation of an LLM-enabled Spherical Visualization Platform in the Classroom | Emelie Fälton et.al. | 2601.11060 | translate | read | null |
| 2026-01-16 | Predicting Biased Human Decision-Making with Large Language Models in Conversational Settings | Stephen Pilli et.al. | 2601.11049 | translate | read | null |
| 2026-01-16 | CoG: Controllable Graph Reasoning via Relational Blueprints and Failure-Aware Refinement over Knowledge Graphs | Yuanxiang Liu et.al. | 2601.11047 | translate | read | null |
| 2026-01-16 | AgencyBench: Benchmarking the Frontiers of Autonomous Agents in 1M-Token Real-World Contexts | Keyu Li et.al. | 2601.11044 | translate | read | link |
| 2026-01-16 | Spectral Characterization and Mitigation of Sequential Knowledge Editing Collapse | Chi Zhang et.al. | 2601.11042 | translate | read | null |
| 2026-01-16 | Budget-Aware Anytime Reasoning with LLM-Synthesized Preference Data | Xuanming Zhang et.al. | 2601.11038 | translate | read | null |
| 2026-01-16 | PruneRAG: Confidence-Guided Query Decomposition Trees for Efficient Retrieval-Augmented Generation | Shuguang Jiao et.al. | 2601.11024 | translate | read | null |
| 2026-01-16 | Combating Spurious Correlations in Graph Interpretability via Self-Reflection | Kecheng Cai et.al. | 2601.11021 | translate | read | null |
| 2026-01-16 | Finding the Translation Switch: Discovering and Exploiting the Task-Initiation Features in LLMs | Xinwei Wu et.al. | 2601.11019 | translate | read | null |
| 2026-01-16 | NAACL: Noise-AwAre Verbal Confidence Calibration for LLMs in RAG Systems | Jiayu Liu et.al. | 2601.11004 | translate | read | null |
| 2026-01-16 | Redefining Machine Simultaneous Interpretation: From Incremental Translation to Human-Like Strategies | Qianen Zhang et.al. | 2601.11002 | translate | read | null |
| 2026-01-16 | When Personalization Misleads: Understanding and Mitigating Hallucinations in Personalized LLMs | Zhongxiang Sun et.al. | 2601.11000 | translate | read | null |
| 2026-01-16 | Data-driven Prediction of Ionic Conductivity in Solid-State Electrolytes with Machine Learning and Large Language Models | Haewon Kim et.al. | 2601.10997 | translate | read | null |
| 2026-01-16 | ZPD Detector: Data Selection via Capability-Difficulty Alignment for Large Language Models | Bo Yang et.al. | 2601.10986 | translate | read | null |
| 2026-01-16 | Evaluating 21st-Century Competencies in Postsecondary Curricula with Large Language Models: Performance Benchmarking and Reasoning-Based Prompting Strategies | Zhen Xu et.al. | 2601.10983 | translate | read | null |
| 2026-01-16 | AJAR: Adaptive Jailbreak Architecture for Red-teaming | Yipu Dou et.al. | 2601.10971 | translate | read | null |
| 2026-01-16 | Large Wireless Foundation Models: Stronger over Bigger | Xiang Cheng et.al. | 2601.10963 | translate | read | null |
| 2026-01-16 | Beyond Max Tokens: Stealthy Resource Amplification via Tool Calling Chains in LLM Agents | Kaiyu Zhou et.al. | 2601.10955 | translate | read | null |
| 2026-01-16 | SwiftKV: An Edge-Oriented Attention Algorithm and Multi-Head Accelerator for Fast, Efficient LLM Decoding | Junming Zhang et.al. | 2601.10953 | translate | read | null |
| 2026-01-16 | Multi-Stage Patient Role-Playing Framework for Realistic Clinical Interactions | Shijie Jiang et.al. | 2601.10951 | translate | read | null |
| 2026-01-16 | HOSL: Hybrid-Order Split Learning for Memory-Constrained Edge Training | Aakriti et.al. | 2601.10940 | translate | read | null |
| 2026-01-15 | FrankenMotion: Part-level Human Motion Generation and Composition | Chuqiao Li et.al. | 2601.10909 | translate | read | link |
| 2026-01-15 | Topic Modeling in New Physics Detection | Alexandre Alves et.al. | 2601.10871 | translate | read | null |
| 2026-01-15 | Multi-Agent Taint Specification Extraction for Vulnerability Detection | Jonah Ghebremichael et.al. | 2601.10865 | translate | read | null |
| 2026-01-15 | Reasoning Models Generate Societies of Thought | Junsol Kim et.al. | 2601.10825 | translate | read | null |
| 2026-01-15 | Mugi: Value Level Parallelism For Efficient LLMs | Daniel Price et.al. | 2601.10823 | translate | read | null |
| 2026-01-15 | Digital Metabolism: Decoupling Logic from Facts via Regenerative Unlearning – Towards a Pure Neural Logic Core | Mengmeng Peng et.al. | 2601.10810 | translate | read | null |
| 2026-01-15 | A Concise Agent is Less Expert: Revealing Side Effects of Using Style Features on Conversational Agents | Young-Min Cho et.al. | 2601.10809 | translate | read | null |
| 2026-01-15 | BYOL: Bring Your Own Language Into LLMs | Syed Waqas Zamir et.al. | 2601.10804 | translate | read | null |
| 2026-01-15 | Bidirectional Human-Robot Communication for Physical Human-Robot Interaction | Junxiang Wang et.al. | 2601.10796 | translate | read | null |
| 2026-01-15 | LogicLens: Leveraging Semantic Code Graph to explore Multi Repository large systems | Niko Usai et.al. | 2601.10773 | translate | read | null |
| 2026-01-15 | Unifying Speech Recognition, Synthesis and Conversion with Autoregressive Transformers | Runyuan Cai et.al. | 2601.10770 | translate | read | null |
| 2026-01-14 | Too Helpful to Be Safe: User-Mediated Attacks on Planning and Web-Use Agents | Fengchao Chen et.al. | 2601.10758 | translate | read | null |
| 2026-01-15 | MatchTIR: Fine-Grained Supervision for Tool-Integrated Reasoning via Bipartite Matching | Changle Qu et.al. | 2601.10712 | translate | read | link |
| 2026-01-15 | From One-to-One to Many-to-Many: Dynamic Cross-Layer Injection for Deep Vision-Language Fusion | Cheng Chen et.al. | 2601.10710 | translate | read | null |
| 2026-01-15 | Grounding Agent Memory in Contextual Intent | Ruozhen Yang et.al. | 2601.10702 | translate | read | null |
| 2026-01-15 | LIBERTy: A Causal Framework for Benchmarking Concept-Based Explanations of LLMs with Structural Counterfactuals | Gilat Toker et.al. | 2601.10700 | translate | read | null |
| 2026-01-15 | Structure and Diversity Aware Context Bubble Construction for Enterprise Retrieval Augmented Systems | Amir Khurshid et.al. | 2601.10681 | translate | read | null |
| 2026-01-15 | Are Your Reasoning Models Reasoning or Guessing? A Mechanistic Analysis of Hierarchical Reasoning Models | Zirui Ren et.al. | 2601.10679 | translate | read | null |
| 2026-01-15 | Single-Stage Huffman Encoder for ML Compression | Aditya Agrawal et.al. | 2601.10673 | translate | read | null |
| 2026-01-15 | Detecting Winning Arguments with Large Language Models and Persuasion Strategies | Tiziano Labruna et.al. | 2601.10660 | translate | read | null |
| 2026-01-15 | PACEvolve: Enabling Long-Horizon Progress-Aware Consistent Evolution | Minghao Yan et.al. | 2601.10657 | translate | read | null |
| 2026-01-15 | Influential Training Data Retrieval for Explaining Verbalized Confidence of LLMs | Yuxi Xia et.al. | 2601.10645 | translate | read | null |
| 2026-01-15 | iTIMO: An LLM-empowered Synthesis Dataset for Travel Itinerary Modification | Zhuoxuan Huang et.al. | 2601.10609 | translate | read | null |
| 2026-01-15 | Be Your Own Red Teamer: Safety Alignment via Self-Play and Reflective Experience Replay | Hao Wang et.al. | 2601.10589 | translate | read | null |
| 2026-01-15 | From Single to Multi-Agent Reasoning: Advancing GeneGPT for Genomics QA | Kimia Abedini et.al. | 2601.10581 | translate | read | null |
| 2026-01-15 | Generative AI collective behavior needs an interactionist paradigm | Laura Ferrarotti et.al. | 2601.10567 | translate | read | null |
| 2026-01-15 | Defending Large Language Models Against Jailbreak Attacks via In-Decoding Safety-Awareness Probing | Yinzhi Zhao et.al. | 2601.10543 | translate | read | null |
| 2026-01-15 | A Propagation Framework for Network Regression | Yingying Ma et.al. | 2601.10533 | translate | read | null |
| 2026-01-15 | PERM: Psychology-grounded Empathetic Reward Modeling for Large Language Models | Chengbing Wang et.al. | 2601.10532 | translate | read | null |
| 2026-01-15 | A Safety Report on GPT-5.2, Gemini 3 Pro, Qwen3-VL, Grok 4.1 Fast, Nano Banana Pro, and Seedream 4.5 | Xingjun Ma et.al. | 2601.10527 | translate | read | null |
| 2026-01-15 | Diagnosing Generalization Failures in Fine-Tuned LLMs: A Cross-Architectural Study on Phishing Detection | Frank Bobe et.al. | 2601.10524 | translate | read | null |
| 2026-01-15 | DR-Arena: an Automated Evaluation Framework for Deep Research Agents | Yiwen Gao et.al. | 2601.10504 | translate | read | null |
| 2026-01-15 | Projected Microbatch Accumulation yields reference-free proximal policy updates for reinforcement learning | Nilin Abrahamsen et.al. | 2601.10498 | translate | read | null |
| 2026-01-15 | Model See, Model Do? Exposure-Aware Evaluation of Bug-vs-Fix Preference in Code LLMs | Ali Al-Kaswan et.al. | 2601.10496 | translate | read | null |
| 2026-01-15 | ChartComplete: A Taxonomy-based Inclusive Chart Dataset | Ahmad Mustapha et.al. | 2601.10462 | translate | read | null |
| 2026-01-15 | Contextual StereoSet: Stress-Testing Bias Alignment Robustness in Large Language Models | Abhinaba Basu et.al. | 2601.10460 | translate | read | null |
| 2026-01-15 | LangLasso: Interactive Cluster Descriptions through LLM Explanation | Raphael Buchmüller et.al. | 2601.10458 | translate | read | null |
| 2026-01-15 | NSR-Boost: A Neuro-Symbolic Residual Boosting Framework for Industrial Legacy Models | Ziming Dai et.al. | 2601.10457 | translate | read | null |
| 2026-01-15 | Development of Ontological Knowledge Bases by Leveraging Large Language Models | Le Ngoc Luyen et.al. | 2601.10436 | translate | read | null |
| 2026-01-15 | LLMdoctor: Token-Level Flow-Guided Preference Optimization for Efficient Test-Time Alignment of Large Language Models | Tiesunlong Shen et.al. | 2601.10416 | translate | read | null |
| 2026-01-15 | LADFA: A Framework of Using Large Language Models and Retrieval-Augmented Generation for Personal Data Flow Analysis in Privacy Policies | Haiyue Yuan et.al. | 2601.10413 | translate | read | null |
| 2026-01-15 | Toward Ultra-Long-Horizon Agentic Science: Cognitive Accumulation for Machine Learning Engineering | Xinyu Zhu et.al. | 2601.10402 | translate | read | null |
| 2026-01-15 | LatentRefusal: Latent-Signal Refusal for Unanswerable Text-to-SQL Queries | Xuancheng Ren et.al. | 2601.10398 | translate | read | null |
| 2026-01-15 | The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models | Christina Lu et.al. | 2601.10387 | translate | read | null |
| 2026-01-15 | Advanced Manufacturing with Renewable and Bio-based Materials: AI/ML workflows and Process Optimization | Rigoberto Advincula et.al. | 2601.10382 | translate | read | null |
| 2026-01-15 | Fine-Grained Human Pose Editing Assessment via Layer-Selective MLLMs | Ningyu Sun et.al. | 2601.10369 | translate | read | null |
| 2026-01-15 | Unlocking Implicit Experience: Synthesizing Tool-Use Trajectories from Text | Zhihao Xu et.al. | 2601.10355 | translate | read | null |
| 2026-01-15 | SuS: Strategy-aware Surprise for Intrinsic Exploration | Mark Kashirskiy et.al. | 2601.10349 | translate | read | null |
| 2026-01-15 | C-GRASP: Clinically-Grounded Reasoning for Affective Signal Processing | Cheng Lin Cheng et.al. | 2601.10342 | translate | read | null |
| 2026-01-15 | Think-Then-Generate: Reasoning-Aware Text-to-Image Diffusion with LLM Encoders | Siqi Kou et.al. | 2601.10332 | translate | read | null |
| 2026-01-15 | ROMA: Real-time Omni-Multimodal Assistant with Interactive Streaming Understanding | Xueyun Tian et.al. | 2601.10323 | translate | read | null |
| 2026-01-15 | An Efficient Long-Context Ranking Architecture With Calibrated LLM Distillation: Application to Person-Job Fit | Warren Jouanneau et.al. | 2601.10321 | translate | read | null |
| 2026-01-15 | The Straight and Narrow: Do LLMs Possess an Internal Moral Path? | Luoming Hu et.al. | 2601.10307 | translate | read | null |
| 2026-01-15 | DanQing: An Up-to-Date Large-Scale Chinese Vision-Language Pre-training Dataset | Hengyu Shen et.al. | 2601.10305 | translate | read | null |
| 2026-01-15 | Queueing-Aware Optimization of Reasoning Tokens for Accuracy-Latency Trade-offs in LLM Servers | Emre Ozbas et.al. | 2601.10274 | translate | read | null |
| 2026-01-15 | MoST: Mixing Speech and Text with Modality-Aware Mixture of Experts | Yuxuan Lou et.al. | 2601.10272 | translate | read | null |
| 2026-01-15 | In-Context Source and Channel Coding | Ziqiong Wang et.al. | 2601.10267 | translate | read | null |
| 2026-01-15 | NoReGeo: Non-Reasoning Geometry Benchmark | Irina Abdullaeva et.al. | 2601.10254 | translate | read | null |
| 2026-01-15 | Loop as a Bridge: Can Looped Transformers Truly Link Representation Space and Natural Language Outputs? | Guanxu Chen et.al. | 2601.10242 | translate | read | null |
| 2026-01-15 | GeoSteer: Faithful Chain-of-Thought Steering via Latent Manifold Gradients | Kentaro Kazama et.al. | 2601.10229 | translate | read | null |
| 2026-01-15 | Optimizing Multimodal LLMs for Egocentric Video Understanding: A Solution for the HD-EPIC VQA Challenge | Sicheng Yang et.al. | 2601.10228 | translate | read | null |
| 2026-01-15 | PRL: Process Reward Learning Improves LLMs’ Reasoning Ability and Broadens the Reasoning Boundary | Jiarui Yao et.al. | 2601.10201 | translate | read | null |
| 2026-01-15 | HUMANLLM: Benchmarking and Reinforcing LLM Anthropomorphism via Human Cognitive Patterns | Xintao Wang et.al. | 2601.10198 | translate | read | null |
| 2026-01-15 | Autonomous Quantum Simulation through Large Language Model Agents | Weitang Li et.al. | 2601.10194 | translate | read | null |
| 2026-01-15 | GFM4GA: Graph Foundation Model for Group Anomaly Detection | Jiujiu Chen et.al. | 2601.10193 | translate | read | null |
| 2026-01-15 | HOMURA: Taming the Sand-Glass for Time-Constrained LLM Translation via Reinforcement Learning | Ziang Cui et.al. | 2601.10187 | translate | read | null |
| 2026-01-15 | ReasAlign: Reasoning Enhanced Safety Alignment against Prompt Injection Attack | Hao Li et.al. | 2601.10173 | translate | read | null |
| 2026-01-15 | Credit C-GPT: A Domain-Specialized Large Language Model for Conversational Understanding in Vietnamese Debt Collection | Nhung Nguyen Thi Hong et.al. | 2601.10167 | translate | read | null |
| 2026-01-15 | Advancing Adaptive Multi-Stage Video Anomaly Reasoning: A Benchmark Dataset and Method | Chao Huang et.al. | 2601.10165 | translate | read | null |
| 2026-01-15 | AWED-FiNER: Agents, Web applications, and Expert Detectors for Fine-grained Named Entity Recognition across 36 Languages for 6.6 Billion Speakers | Prachuryya Kaushik et.al. | 2601.10161 | translate | read | link |
| 2026-01-15 | LOOKAT: Lookup-Optimized Key-Attention for Memory-Efficient Transformers | Aryan Karmore et.al. | 2601.10155 | translate | read | null |
| 2026-01-15 | DecisionLLM: Large Language Models for Long Sequence Decision Exploration | Xiaowei Lv et.al. | 2601.10148 | translate | read | null |
| 2026-01-15 | Actors, Frames and Arguments: A Multi-Decade Computational Analysis of Climate Discourse in Financial News using Large Language Models | Ruiran Su et.al. | 2601.10142 | translate | read | null |
| 2026-01-15 | Understanding and Preserving Safety in Fine-Tuned LLMs | Jiawen Zhang et.al. | 2601.10141 | translate | read | null |
| 2026-01-15 | Is More Context Always Better? Examining LLM Reasoning Capability for Time Interval Prediction | Yanan Cao et.al. | 2601.10132 | translate | read | null |
| 2026-01-15 | M^4olGen: Multi-Agent, Multi-Stage Molecular Generation under Precise Multi-Property Constraints | Yizhan Li et.al. | 2601.10131 | translate | read | null |
| 2026-01-15 | LaViT: Aligning Latent Visual Thoughts for Multi-modal Reasoning | Linquan Wu et.al. | 2601.10129 | translate | read | link |
| 2026-01-15 | Role-Playing Agents Driven by Large Language Models: Current Status, Challenges, and Future Trends | Ye Wang et.al. | 2601.10122 | translate | read | null |
| 2026-01-15 | Following the Teacher’s Footsteps: Scheduled Checkpoint Distillation for Domain-Specific LLMs | Cheng Feng et.al. | 2601.10114 | translate | read | null |
| 2026-01-15 | SIN-Bench: Tracing Native Evidence Chains in Long-Context Multimodal Scientific Interleaved Literature | Yiming Ren et.al. | 2601.10108 | translate | read | null |
| 2026-01-15 | When Personas Override Payoffs: Role Identity Bias in Multi-Agent LLM Decision-Making | Viswonathan Manoranjan et.al. | 2601.10102 | translate | read | null |
| 2026-01-15 | MATRIX AS PLAN: Structured Logical Reasoning with Feedback-Driven Replanning | Ke Chen et.al. | 2601.10101 | translate | read | null |
| 2026-01-15 | Multilingual-To-Multimodal (M2M): Unlocking New Languages with Monolingual Text | Piyush Singh Pasi et.al. | 2601.10096 | translate | read | link |
| 2026-01-15 | State of AI: An Empirical 100 Trillion Token Study with OpenRouter | Malika Aubakirova et.al. | 2601.10088 | translate | read | null |
| 2026-01-15 | CALM-IT: Generating Realistic Long-Form Motivational Interviewing Dialogues with Dual-Actor Conversational Dynamics Tracking | Viet Cuong Nguyen et.al. | 2601.10085 | translate | read | null |
| 2026-01-15 | Sparse-RL: Breaking the Memory Wall in LLM Reinforcement Learning via Stable Sparse Rollouts | Sijia Luo et.al. | 2601.10079 | translate | read | null |
| 2026-01-15 | Long-Chain Reasoning Distillation via Adaptive Prefix Alignment | Zhenghao Liu et.al. | 2601.10064 | translate | read | null |
| 2026-01-15 | Unlabeled Data Can Provably Enhance In-Context Learning of Transformers | Renpu Liu et.al. | 2601.10058 | translate | read | null |
| 2026-01-15 | Privacy Enhanced PEFT: Tensor Train Decomposition Improves Privacy Utility Tradeoffs under DP-SGD | Pradip Kunwar et.al. | 2601.10045 | translate | read | null |
| 2026-01-15 | Instruction Finetuning LLaMA-3-8B Model Using LoRA for Financial Named Entity Recognition | Zhiming Lian et.al. | 2601.10043 | translate | read | null |
| 2026-01-15 | EmplifAI: a Fine-grained Dataset for Japanese Empathetic Medical Dialogues in 28 Emotion Labels | Wan Jou She et.al. | 2601.10033 | translate | read | null |
| 2026-01-15 | Structured Personality Control and Adaptation for LLM Agents | Jinpeng Wang et.al. | 2601.10025 | translate | read | null |
| 2026-01-15 | Empowering Older Adults in Digital Technology Use with Foundation Models | Hasti Sharifi et.al. | 2601.10018 | translate | read | null |
| 2026-01-15 | VERHallu: Evaluating and Mitigating Event Relation Hallucination in Video Large Language Models | Zefan Zhang et.al. | 2601.10010 | translate | read | null |
| 2026-01-15 | SoK: Privacy-aware LLM in Healthcare: Threat Model, Privacy Techniques, Challenges and Recommendations | Mohoshin Ara Tahera et.al. | 2601.10004 | translate | read | null |
| 2026-01-15 | Towards Native Intelligence: 6G-LLM Trained with Reinforcement Learning from NDT Feedback | Zhuoran Xiao et.al. | 2601.09992 | translate | read | null |
| 2026-01-15 | Context Volume Drives Performance: Tackling Domain Shift in Extremely Low-Resource Translation via RAG | David Samuel Setiawan et.al. | 2601.09982 | translate | read | null |
| 2026-01-15 | DR$^2$Seg: Decomposed Two-Stage Rollouts for Efficient Reasoning Segmentation in Multimodal Large Language Models | Yulin He et.al. | 2601.09981 | translate | read | null |
| 2026-01-15 | Performance of AI agents based on reasoning language models on ALD process optimization tasks | Angel Yanguas-Gil et.al. | 2601.09980 | translate | read | null |
| 2026-01-15 | SPRInG: Continual LLM Personalization via Selective Parametric Adaptation and Retrieval-Interpolated Generation | Seoyeon Kim et.al. | 2601.09974 | translate | read | null |
| 2026-01-15 | Chinese Labor Law Large Language Model Benchmark | Zixun Lan et.al. | 2601.09972 | translate | read | null |
| 2026-01-15 | An Exploratory Study to Repurpose LLMs to a Unified Architecture for Time Series Classification | Hansen He et.al. | 2601.09971 | translate | read | null |
| 2026-01-15 | Take Out Your Calculators: Estimating the Real Difficulty of Question Items with LLM Student Simulations | Christabel Acquaye et.al. | 2601.09953 | translate | read | null |
| 2026-01-14 | How Diplomacy Reshapes Online Discourse: Asymmetric Persistence in Online Framing of North Korea | Hunjun Shin et.al. | 2601.09942 | translate | read | null |
| 2026-01-14 | Hallucination Detection and Mitigation in Large Language Models | Ahmad Pesaranghader et.al. | 2601.09929 | translate | read | null |
| 2026-01-14 | Continuum Memory Architectures for Long-Horizon LLM Agents | Joe Logan et.al. | 2601.09913 | translate | read | null |
| 2026-01-14 | Self-reflection in Automated Qualitative Coding: Improving Text Annotation through Secondary LLM Critique | Zackary Okun Dunivin et.al. | 2601.09905 | translate | read | null |
| 2026-01-14 | Beyond Rule-Based Workflows: An Information-Flow-Orchestrated Multi-Agents Paradigm via Agent-to-Agent Communication from CORAL | Xinxing Ren et.al. | 2601.09883 | translate | read | null |
| 2026-01-14 | MedVL-SAM2: A unified 3D medical vision-language model for multimodal reasoning and prompt-driven segmentation | Yang Xing et.al. | 2601.09879 | translate | read | null |
| 2026-01-14 | Beyond Strict Rules: Assessing the Effectiveness of Large Language Models for Code Smell Detection | Saymon Souza et.al. | 2601.09873 | translate | read | null |
| 2026-01-14 | A Scoping Review of the Ethical Perspectives on Anthropomorphising Large Language Model-Based Conversational Agents | Andrea Ferrario et.al. | 2601.09869 | translate | read | null |
| 2026-01-14 | Advancing Model Refinement: Muon-Optimized Distillation and Quantization for LLM Deployment | Jacob Sander et.al. | 2601.09865 | translate | read | null |
| 2026-01-14 | OUTLINEFORGE: Hierarchical Reinforcement Learning with Explicit States for Scientific Writing | Yilin Bao et.al. | 2601.09858 | translate | read | null |
| 2026-01-14 | MedRedFlag: Investigating how LLMs Redirect Misconceptions in Real-World Health Communication | Sraavya Sambara et.al. | 2601.09853 | translate | read | null |
| 2026-01-14 | Strategies of cooperation and defection in five large language models | Saptarshi Pal et.al. | 2601.09849 | translate | read | null |
| 2026-01-14 | Stable and Explainable Personality Trait Evaluation in Large Language Models with Internal Activations | Xiaoxu Ma et.al. | 2601.09833 | translate | read | null |
| 2026-01-14 | UniHash: Unifying Pointwise and Pairwise Hashing Paradigms for Seen and Unseen Category Retrieval | Xiaoxu Ma et.al. | 2601.09828 | translate | read | null |
| 2026-01-14 | LLM-Based Agentic Systems for Software Engineering: Challenges and Opportunities | Yongjian Tang et.al. | 2601.09822 | translate | read | null |
| 2026-01-14 | Antisocial behavior towards large language model users: experimental evidence | Paweł Niszczota et.al. | 2601.09772 | translate | read | null |
| 2026-01-14 | Explicating Tacit Regulatory Knowledge from LLMs to Auto-Formalize Requirements for Compliance Test Case Generation | Zhiyi Xue et.al. | 2601.09762 | translate | read | null |
| 2026-01-14 | Investigating Tool-Memory Conflicts in Tool-Augmented LLMs | Jiali Cheng et.al. | 2601.09760 | translate | read | null |
| 2026-01-13 | Synthetic Data for Veterinary EHR De-identification: Benefits, Limits, and Safety Trade-offs Under Fixed Compute | David Brundage et.al. | 2601.09756 | translate | read | null |
| 2026-01-12 | SAGE: Tool-Augmented LLM Task Solving Strategies in Scalable Multi-Agent Environments | Robert K. Strehlow et.al. | 2601.09750 | translate | read | null |
| 2026-01-14 | ShortCoder: Knowledge-Augmented Syntax Optimization for Token-Efficient Code Generation | Sicong Liu et.al. | 2601.09703 | translate | read | null |
| 2026-01-14 | How well LLM-based test generation techniques perform with newer LLM versions? | Michael Konstantinou et.al. | 2601.09695 | translate | read | null |
| 2026-01-14 | LLMs can Compress LLMs: Adaptive Pruning by Agents | Sai Varun Kodathala et.al. | 2601.09694 | translate | read | null |
| 2026-01-14 | Routing with Generated Data: Annotation-Free LLM Skill Estimation and Expert Selection | Tianyi Niu et.al. | 2601.09692 | translate | read | null |
| 2026-01-14 | Disentangling Task Conflicts in Multi-Task LoRA via Orthogonal Gradient Projection | Ziyu Yang et.al. | 2601.09684 | translate | read | null |
| 2026-01-14 | Automating Supply Chain Disruption Monitoring via an Agentic AI Approach | Sara AlMahri et.al. | 2601.09680 | translate | read | null |
| 2026-01-14 | LLMs Got Rhythm? Hybrid Phonological Filtering for Greek Poetry Rhyme Detection and Generation | Stergios Chatzikyriakidis et.al. | 2601.09631 | translate | read | null |
| 2026-01-14 | From Prompt to Protocol: Fast Charging Batteries with Large Language Models | Ge Lei et.al. | 2601.09626 | translate | read | null |
| 2026-01-14 | The Promptware Kill Chain: How Prompt Injections Gradually Evolved Into a Multi-Step Malware | Ben Nassi et.al. | 2601.09625 | translate | read | null |
| 2026-01-14 | DPWriter: Reinforcement Learning with Diverse Planning Branching for Creative Writing | Qian Cao et.al. | 2601.09609 | translate | read | null |
| 2026-01-14 | GRCF: Two-Stage Groupwise Ranking and Calibration Framework for Multimodal Sentiment Analysis | Manning Gao et.al. | 2601.09606 | translate | read | null |
| 2026-01-14 | OpenVoxel: Training-Free Grouping and Captioning Voxels for Open-Vocabulary 3D Scene Understanding | Sheng-Yu Huang et.al. | 2601.09575 | translate | read | null |
| 2026-01-14 | Dialogue Telemetry: Turn-Level Instrumentation for Autonomous Information Gathering | Dimitris Panagopoulos et.al. | 2601.09570 | translate | read | null |
| 2026-01-14 | Hot-Start from Pixels: Low-Resolution Visual Tokens for Chinese Language Modeling | Shuyang Xiang et.al. | 2601.09566 | translate | read | null |
| 2026-01-14 | Benchmarking Post-Training Quantization of Large Language Models under Microscaling Floating Point Formats | Manyi Zhang et.al. | 2601.09555 | translate | read | null |
| 2026-01-14 | Omni-R1: Towards the Unified Generative Paradigm for Multimodal Reasoning | Dongjie Cheng et.al. | 2601.09536 | translate | read | link |
| 2026-01-14 | MVSS: A Unified Framework for Multi-View Structured Survey Generation | Yinqi Liu et.al. | 2601.09504 | translate | read | null |
| 2026-01-14 | What Do LLM Agents Know About Their World? Task2Quiz: A Paradigm for Studying Environment Understanding | Siyuan Liu et.al. | 2601.09503 | translate | read | null |
| 2026-01-14 | SlidesGen-Bench: Evaluating Slides Generation via Computational and Quantitative Metrics | Yunqiao Yang et.al. | 2601.09487 | translate | read | link |
| 2026-01-14 | Bridging Semantic Understanding and Popularity Bias with LLMs | Renqiang Luo et.al. | 2601.09478 | translate | read | null |
| 2026-01-14 | SimMerge: Learning to Select Merge Operators from Similarity Signals | Oliver Bolton et.al. | 2601.09473 | translate | read | null |
| 2026-01-14 | Personalized Multimodal Feedback Using Multiple External Representations: Strategy Profiles and Learning in High School Physics | Natalia Revenga-Lozano et.al. | 2601.09470 | translate | read | null |
| 2026-01-14 | Dissecting Judicial Reasoning in U.S. Copyright Damage Awards | Pei-Chi Lo et.al. | 2601.09459 | translate | read | null |
| 2026-01-14 | Population-Aligned Audio Reproduction With LLM-Based Equalizers | Ioannis Stylianou et.al. | 2601.09448 | translate | read | null |
| 2026-01-14 | Improving Symbolic Translation of Language Models for Logical Reasoning | Ramya Keerthy Thatikonda et.al. | 2601.09446 | translate | read | null |
| 2026-01-14 | SC-MAS: Constructing Cost-Efficient Multi-Agent Systems with Edge-Level Heterogeneous Collaboration | Di Zhao et.al. | 2601.09434 | translate | read | null |
| 2026-01-14 | Video-MSR: Benchmarking Multi-hop Spatial Reasoning Capabilities of MLLMs | Rui Zhu et.al. | 2601.09430 | translate | read | null |
| 2026-01-14 | TiInsight: A SQL-based Automated Exploratory Data Analysis System through Large Language Models | Jun-Peng Zhu et.al. | 2601.09404 | translate | read | null |
| 2026-01-14 | Structured Knowledge Representation through Contextual Pages for Retrieval-Augmented Generation | Xinze Li et.al. | 2601.09402 | translate | read | null |
| 2026-01-14 | Ability Transfer and Recovery via Modularized Parameters Localization | Songyao Jin et.al. | 2601.09398 | translate | read | null |
| 2026-01-14 | SLAM-LLM: A Modular, Open-Source Multimodal Large Language Model Framework and Best Practice for Speech, Language, Audio and Music Processing | Ziyang Ma et.al. | 2601.09385 | translate | read | null |
| 2026-01-14 | Long-term Task-oriented Agent: Proactive Long-term Intent Maintenance in Dynamic Environments | Qinglong Shi et.al. | 2601.09382 | translate | read | null |
| 2026-01-14 | The Imperfective Paradox in Large Language Models | Bolei Ma et.al. | 2601.09373 | translate | read | null |
| 2026-01-14 | Relation Extraction Capabilities of LLMs on Clinical Text: A Bilingual Evaluation for English and Turkish | Aidana Aidynkyzy et.al. | 2601.09367 | translate | read | null |
| 2026-01-14 | See More, Store Less: Memory-Efficient Resolution for Video Moment Retrieval | Mingyu Jeon et.al. | 2601.09350 | translate | read | null |
| 2026-01-14 | SpatialJB: How Text Distribution Art Becomes the “Jailbreak Key” for LLM Guardrails | Zhiyi Mou et.al. | 2601.09321 | translate | read | null |
| 2026-01-14 | On-Device Large Language Models for Sequential Recommendation | Xin Xia et.al. | 2601.09306 | translate | read | null |
| 2026-01-14 | Multi-Modal LLM based Image Captioning in ICT: Bridging the Gap Between General and Industry Domain | Lianying Chao et.al. | 2601.09298 | translate | read | null |
| 2026-01-14 | MACRO-LLM: LLM-Empowered Multi-Agent Collaborative Reasoning under Spatiotemporal Partial Observability | Handi Chen et.al. | 2601.09295 | translate | read | null |
| 2026-01-14 | Enhancing Spatial Reasoning in Large Language Models for Metal-Organic Frameworks Structure Prediction | Mianzhi Pan et.al. | 2601.09285 | translate | read | null |
| 2026-01-14 | Cluster Workload Allocation: Semantic Soft Affinity Using Natural Language Processing | Leszek Sliwko et.al. | 2601.09282 | translate | read | null |
| 2026-01-14 | STaR: Sensitive Trajectory Regulation for Unlearning in Large Reasoning Models | Jingjing Zhou et.al. | 2601.09281 | translate | read | null |
| 2026-01-14 | ReGraM: Region-First Knowledge Graph Reasoning for Medical Question Answering | Chaerin Lee et.al. | 2601.09280 | translate | read | null |
| 2026-01-14 | MCGA: A Multi-task Classical Chinese Literary Genre Audio Corpus | Yexing Du et.al. | 2601.09270 | translate | read | link |
| 2026-01-14 | RISER: Orchestrating Latent Reasoning Skills for Adaptive Activation Steering | Wencheng Ye et.al. | 2601.09269 | translate | read | link |
| 2026-01-14 | Coordinated Pandemic Control with Large Language Model Agents as Policymaking Assistants | Ziyi Shi et.al. | 2601.09264 | translate | read | null |
| 2026-01-14 | Efficient Paths and Dense Rewards: Probabilistic Flow Reasoning for Large Language Models | Yan Liu et.al. | 2601.09260 | translate | read | null |
| 2026-01-14 | MAXS: Meta-Adaptive Exploration with LLM Agents | Jian Zhang et.al. | 2601.09259 | translate | read | link |
| 2026-01-14 | When to Invoke: Refining LLM Fairness with Toxicity Assessment | Jing Ren et.al. | 2601.09250 | translate | read | null |
| 2026-01-14 | When to Trust: A Causality-Aware Calibration Framework for Accurate Knowledge Graph Retrieval-Augmented Generation | Jing Ren et.al. | 2601.09241 | translate | read | null |
| 2026-01-14 | DSA-Tokenizer: Disentangled Semantic-Acoustic Tokenization via Flow Matching-based Hierarchical Fusion | Hanlin Zhang et.al. | 2601.09239 | translate | read | null |
| 2026-01-14 | Mikasa: A Character-Driven Emotional AI Companion Inspired by Japanese Oshi Culture | Miki Ueno et.al. | 2601.09208 | translate | read | null |
| 2026-01-14 | ProFit: Leveraging High-Value Signals in SFT via Probability-Guided Token Selection | Tao Liu et.al. | 2601.09195 | translate | read | link |
| 2026-01-14 | OrthoGeoLoRA: Geometric Parameter-Efficient Fine-Tuning for Structured Social Science Concept Retrieval on the Web | Zeqiang Wang et.al. | 2601.09185 | translate | read | null |
| 2026-01-14 | $D^2$Prune: Sparsifying Large Language Models via Dual Taylor Expansion and Attention Distribution Awareness | Lang Xiong et.al. | 2601.09176 | translate | read | null |
| 2026-01-14 | BalDRO: A Distributionally Robust Optimization based Framework for Large Language Model Unlearning | Pengyang Shao et.al. | 2601.09172 | translate | read | null |
| 2026-01-14 | LLMs Meet Isolation Kernel: Lightweight, Learning-free Binary Embeddings for Fast Retrieval | Zhibo Zhang et.al. | 2601.09159 | translate | read | null |
| 2026-01-14 | PrivacyReasoner: Can LLM Emulate a Human-like Privacy Mind? | Yiwen Tu et.al. | 2601.09152 | translate | read | null |
| 2026-01-14 | Interpretable Probability Estimation with LLMs via Shapley Reconstruction | Yang Nan et.al. | 2601.09151 | translate | read | null |
| 2026-01-14 | World Craft: Agentic Framework to Create Visualizable Worlds via Text | Jianwen Sun et.al. | 2601.09150 | translate | read | null |
| 2026-01-14 | Identity-Robust Language Model Generation via Content Integrity Preservation | Miao Zhang et.al. | 2601.09141 | translate | read | null |
| 2026-01-14 | KryptoPilot: An Open-World Knowledge-Augmented LLM Agent for Automated Cryptographic Exploitation | Xiaonan Liu et.al. | 2601.09129 | translate | read | null |
| 2026-01-14 | Contrastive Bi-Encoder Models for Multi-Label Skill Extraction: Enhancing ESCO Ontology Matching with BERT and Attention Mechanisms | Yongming Sun et.al. | 2601.09119 | translate | read | null |
| 2026-01-14 | The AI Hippocampus: How Far are We From Human Memory? | Zixia Jia et.al. | 2601.09113 | translate | read | null |
| 2026-01-14 | Seeking Human Security Consensus: A Unified Value Scale for Generative AI Value Safety | Ying He et.al. | 2601.09112 | translate | read | null |
| 2026-01-14 | DScheLLM: Enabling Dynamic Scheduling through a Fine-Tuned Dual-System Large Language Model | Lixiang Zhang et.al. | 2601.09100 | translate | read | null |
| 2026-01-14 | Programming over Thinking: Efficient and Robust Multi-Constraint Planning | Derrick Goh Xin Deik et.al. | 2601.09097 | translate | read | null |
| 2026-01-14 | Hidden States as Early Signals: Step-level Trace Evaluation and Pruning for Efficient Test-Time Scaling | Zhixiang Liang et.al. | 2601.09093 | translate | read | null |
| 2026-01-14 | SubTokenTest: A Practical Benchmark for Real-World Sub-token Understanding | Shuyang Hou et.al. | 2601.09089 | translate | read | null |
| 2026-01-14 | From Symbolic to Natural-Language Relations: Rethinking Knowledge Graph Construction in the Era of Large Language Models | Kanyao Han et.al. | 2601.09069 | translate | read | null |
| 2026-01-14 | Mi:dm 2.0 Korea-centric Bilingual Language Models | Donghoon Shin et.al. | 2601.09066 | translate | read | null |
| 2026-01-14 | Efficient Multilingual Dialogue Processing via Translation Pipelines and Distilled Language Models | Santiago Martínez Novoa et.al. | 2601.09059 | translate | read | null |
| 2026-01-14 | Evaluating local large language models for structured extraction from endometriosis-specific transvaginal ultrasound reports | Haiyi Li et.al. | 2601.09053 | translate | read | null |
| 2026-01-14 | Is Grokking Worthwhile? Functional Analysis and Transferability of Generalization Circuits in Transformers | Kaiyu He et.al. | 2601.09049 | translate | read | null |
| 2026-01-14 | Horseshoe Mixtures-of-Experts (HS-MoE) | Nick Polson et.al. | 2601.09043 | translate | read | null |
| 2026-01-14 | Can LLMs interpret figurative language as humans do?: surface-level vs representational similarity | Samhita Bollepally et.al. | 2601.09041 | translate | read | null |
| 2026-01-14 | An Information-Theoretic Perspective on LLM Tokenizers | Mete Erdogan et.al. | 2601.09039 | translate | read | null |
| 2026-01-14 | SpectraQuery: A Hybrid Retrieval-Augmented Conversational Assistant for Battery Science | Sreya Vangara et.al. | 2601.09036 | translate | read | null |
| 2026-01-14 | A Decompilation-Driven Framework for Malware Detection with Large Language Models | Aniesh Chawla et.al. | 2601.09035 | translate | read | null |
| 2026-01-13 | The Hierarchy of Agentic Capabilities: Evaluating Frontier Models on Realistic RL Environments | Logan Ritchie et.al. | 2601.09032 | translate | read | null |
| 2026-01-13 | Proactively Detecting Threats: A Novel Approach Using LLMs | Aniesh Chawla et.al. | 2601.09029 | translate | read | null |
| 2026-01-13 | OpenDecoder: Open Large Language Model Decoding to Incorporate Document Quality in RAG | Fengran Mo et.al. | 2601.09028 | translate | read | null |
| 2026-01-13 | Agentic AI and Machine Learning for Accelerated Materials Discovery and Applications | Jihua Chen et.al. | 2601.09027 | translate | read | null |
| 2026-01-13 | Multicultural Spyfall: Assessing LLMs through Dynamic Multilingual Social Deduction Game | Haryo Akbarianto Wibowo et.al. | 2601.09017 | translate | read | null |
| 2026-01-13 | Universal Dynamics of Warmup Stable Decay: understanding WSD beyond Transformers | Annalisa Belloni et.al. | 2601.09000 | translate | read | null |
| 2026-01-13 | Optimising for Energy Efficiency and Performance in Machine Learning | Emile Dos Santos Ferreira et.al. | 2601.08991 | translate | read | null |
| 2026-01-13 | ART: Action-based Reasoning Task Benchmarking for Medical AI Agents | Ananya Mantravadi et.al. | 2601.08988 | translate | read | null |
| 2026-01-13 | Integrating APK Image and Text Data for Enhanced Threat Detection: A Multimodal Deep Learning Approach to Android Malware | Md Mashrur Arifin et.al. | 2601.08959 | translate | read | null |
| 2026-01-13 | Fine Grained Evaluation of LLMs-as-Judges | Sourav Saha et.al. | 2601.08919 | translate | read | null |
| 2026-01-13 | Spectral Generative Flow Models: A Physics-Inspired Replacement for Vectorized Large Language Models | Andrew Kiruluta et.al. | 2601.08893 | translate | read | null |
| 2026-01-13 | Evaluating Role-Consistency in LLMs for Counselor Training | Eric Rudolph et.al. | 2601.08892 | translate | read | null |
| 2026-01-12 | Bridging the Gap: Empowering Small Models in Reliable OpenACC-based Parallelization via GEPA-Optimized Prompting | Samyak Jhaveri et.al. | 2601.08884 | translate | read | null |
| 2026-01-13 | Modeling LLM Agent Reviewer Dynamics in Elo-Ranked Review System | Hsiang-Wei Huang et.al. | 2601.08829 | translate | read | null |
| 2026-01-13 | Reasoning Matters for 3D Visual Grounding | Hsiang-Wei Huang et.al. | 2601.08811 | translate | read | null |
| 2026-01-13 | Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge | Yao Tang et.al. | 2601.08808 | translate | read | link |
| 2026-01-13 | MixServe: An Automatic Distributed Serving System for MoE Models with Hybrid Parallelism Based on Fused Communication Algorithm | Bowen Zhou et.al. | 2601.08800 | translate | read | null |
| 2026-01-13 | Uncovering Political Bias in Large Language Models using Parliamentary Voting Records | Jieying Chen et.al. | 2601.08785 | translate | read | null |
| 2026-01-13 | Asymptotic Universal Alignment: A New Alignment Framework via Test-Time Scaling | Yang Cai et.al. | 2601.08777 | translate | read | null |
| 2026-01-13 | Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs | Zhiyuan Hu et.al. | 2601.08763 | translate | read | null |
| 2026-01-13 | M3CoTBench: Benchmark Chain-of-Thought of MLLMs in Medical Image Understanding | Juntao Jiang et.al. | 2601.08758 | translate | read | null |
| 2026-01-13 | Inferring Latent Intentions: Attributional Natural Language Inference in LLM Agents | Xin Quan et.al. | 2601.08742 | translate | read | null |
| 2026-01-13 | From Rows to Reasoning: A Retrieval-Augmented Multimodal Framework for Spreadsheet Understanding | Anmol Gulati et.al. | 2601.08741 | translate | read | null |
| 2026-01-13 | PrivGemo: Privacy-Preserving Dual-Tower Graph Retrieval for Empowering LLM Reasoning with Memory Augmentation | Xingyu Tan et.al. | 2601.08739 | translate | read | null |
| 2026-01-13 | TerraFormer: Automated Infrastructure-as-Code with LLMs Fine-Tuned via Policy-Guided Verifier Feedback | Prithwish Jana et.al. | 2601.08734 | translate | read | null |
| 2026-01-13 | RAGShaper: Eliciting Sophisticated Agentic RAG Skills via Automated Data Synthesis | Zhengwei Tao et.al. | 2601.08699 | translate | read | null |
| 2026-01-13 | Nationality and Region Prediction from Names: A Comparative Study of Neural Models and Large Language Models | Keito Inoshita et.al. | 2601.08692 | translate | read | null |
| 2026-01-13 | LLMs in Code Vulnerability Analysis: A Proof of Concept | Shaznin Sultana et.al. | 2601.08691 | translate | read | null |
| 2026-01-13 | All Required, In Order: Phase-Level Evaluation for AI-Human Dialogue in Healthcare and Beyond | Shubham Kulkarni et.al. | 2601.08690 | translate | read | null |
| 2026-01-13 | QuantEval: A Benchmark for Financial Quantitative Tasks in Large Language Models | Zhaolu Kang et.al. | 2601.08689 | translate | read | null |
| 2026-01-13 | Advancing ESG Intelligence: An Expert-level Agent and Comprehensive Benchmark for Sustainable Finance | Yilei Zhao et.al. | 2601.08676 | translate | read | null |
| 2026-01-13 | Why AI Alignment Failure Is Structural: Learned Human Interaction Structures and AGI as an Endogenous Evolutionary Shock | Didier Sornette et.al. | 2601.08673 | translate | read | null |
| 2026-01-13 | Analyzing Bias in False Refusal Behavior of Large Language Models for Hate Speech Detoxification | Kyuri Im et.al. | 2601.08668 | translate | read | null |
| 2026-01-13 | Prism: Towards Lowering User Cognitive Load in LLMs via Complex Intent Understanding | Zenghua Liao et.al. | 2601.08653 | translate | read | null |
| 2026-01-13 | Resisting Manipulative Bots in Memecoin Copy Trading: A Multi-Agent Approach with Chain-of-Thought Reasoning | Yichen Luo et.al. | 2601.08641 | translate | read | null |
| 2026-01-13 | Moral Lenses, Political Coordinates: Towards Ideological Positioning of Morally Conditioned LLMs | Chenchen Yuan et.al. | 2601.08634 | translate | read | null |
| 2026-01-13 | How Order-Sensitive Are LLMs? OrderProbe for Deterministic Structural Reconstruction | Yingjie He et.al. | 2601.08626 | translate | read | null |
| 2026-01-13 | Efficient Maintenance of Leiden Communities in Large Dynamic Graphs | Chunxu Lin et.al. | 2601.08554 | translate | read | null |
| 2026-01-13 | Learner-Tailored Program Repair: A Solution Generator with Iterative Edit-Driven Retrieval Enhancement | Zhenlong Dai et.al. | 2601.08545 | translate | read | null |
| 2026-01-13 | Reducing Compute Waste in LLMs through Kernel-Level DVFS | Jeffrey Spaan et.al. | 2601.08539 | translate | read | null |
| 2026-01-13 | Your Group-Relative Advantage Is Biased | Fengkai Yang et.al. | 2601.08521 | translate | read | null |
| 2026-01-13 | Closed-Loop LLM Discovery of Non-Standard Channel Priors in Vision Models | Tolgay Atinc Uzun et.al. | 2601.08517 | translate | read | null |
| 2026-01-13 | Robust CAPTCHA Using Audio Illusions in the Era of Large Language Models: from Evaluation to Advances | Ziqi Ding et.al. | 2601.08516 | translate | read | null |
| 2026-01-13 | What If TSF: A Benchmark for Reframing Forecasting as Scenario-Guided Multimodal Forecasting | Jinkwan Jang et.al. | 2601.08509 | translate | read | null |
| 2026-01-13 | It’s All About the Confidence: An Unsupervised Approach for Multilingual Historical Entity Linking using Large Language Models | Cristian Santini et.al. | 2601.08500 | translate | read | null |
| 2026-01-13 | BenchOverflow: Measuring Overflow in Large Language Models via Plain-Text Prompts | Erin Feiglin et.al. | 2601.08490 | translate | read | null |
| 2026-01-13 | SUMMPILOT: Bridging Efficiency and Customization for Interactive Summarization System | JungMin Yun et.al. | 2601.08475 | translate | read | null |
| 2026-01-13 | sui-1: Grounded and Verifiable Long-Form Summarization | Benedikt Droste et.al. | 2601.08472 | translate | read | null |
| 2026-01-13 | JudgeRLVR: Judge First, Generate Second for Efficient Reasoning | Jiangshan Duo et.al. | 2601.08468 | translate | read | null |
| 2026-01-13 | M3-BENCH: Process-Aware Evaluation of LLM Agents Social Behaviors in Mixed-Motive Games | Sixiong Xie et.al. | 2601.08462 | translate | read | null |
| 2026-01-13 | Beyond Linearization: Attributed Table Graphs for Table Reasoning | Yuxiang Wang et.al. | 2601.08444 | translate | read | null |
| 2026-01-13 | YaPO: Learnable Sparse Activation Steering Vectors for Domain Adaptation | Abdelaziz Bounhar et.al. | 2601.08441 | translate | read | null |
| 2026-01-13 | Incentivizing Cardiologist-Like Reasoning in MLLMs for Interpretable Echocardiographic Diagnosis | Yi Qin et.al. | 2601.08440 | translate | read | null |
| 2026-01-13 | Fine-Mem: Fine-Grained Feedback Alignment for Long-Horizon Memory Management | Weitao Ma et.al. | 2601.08435 | translate | read | null |
| 2026-01-13 | Silence the Judge: Reinforcement Learning with Self-Verifier via Latent Geometric Clustering | Nonghai Zhang et.al. | 2601.08427 | translate | read | null |
| 2026-01-13 | Taxon: Hierarchical Tax Code Prediction with Semantically Aligned LLM Expert Guidance | Jihang Li et.al. | 2601.08418 | translate | read | null |
| 2026-01-13 | Regulatory gray areas of LLM Terms | Brittany I. Davidson et.al. | 2601.08415 | translate | read | null |
| 2026-01-13 | Hybrid Distillation with CoT Guidance for Edge-Drone Control Code Generation | Yizhan Feng et.al. | 2601.08412 | translate | read | null |
| 2026-01-13 | Large Language Models to Enhance Multi-task Drone Operations in Simulated Environments | Yizhan Feng et.al. | 2601.08405 | translate | read | null |
| 2026-01-13 | Owen-Shapley Policy Optimization (OSPO): A Principled RL Algorithm for Generative Search LLMs | Abhijnan Nath et.al. | 2601.08403 | translate | read | null |
| 2026-01-13 | PATS: Personality-Aware Teaching Strategies with Large Language Model Tutors | Donya Rooein et.al. | 2601.08402 | translate | read | null |
| 2026-01-13 | CLaS-Bench: A Cross-Lingual Alignment and Steering Benchmark | Daniil Gurgurov et.al. | 2601.08331 | translate | read | null |
| 2026-01-13 | Deep Exploration of Epoch-wise Double Descent in Noisy Data: Signal Separation, Large Activation, and Benign Overfitting | Tomoki Kubo et.al. | 2601.08316 | translate | read | null |
| 2026-01-13 | Enhancing Image Quality Assessment Ability of LMMs via Retrieval-Augmented Generation | Kang Fu et.al. | 2601.08311 | translate | read | null |
| 2026-01-13 | Enhancing Sentiment Classification and Irony Detection in Large Language Models through Advanced Prompt Engineering Techniques | Marvin Schmitt et.al. | 2601.08302 | translate | read | null |
| 2026-01-13 | Demystifying the Slash Pattern in Attention: The Role of RoPE | Yuan Cheng et.al. | 2601.08297 | translate | read | null |
| 2026-01-13 | KidVis: Do Multimodal Large Language Models Possess the Visual Perceptual Capabilities of a 6-Year-Old? | Xianfeng Wang et.al. | 2601.08292 | translate | read | null |
| 2026-01-13 | OpenMic: A Multi-Agent-Based Stand-Up Comedy Generation System | Yuyang Wu et.al. | 2601.08288 | translate | read | null |
| 2026-01-13 | AgriLens: Semantic Retrieval in Agricultural Texts Using Topic Modeling and Language Models | Heba Shakeel et.al. | 2601.08283 | translate | read | null |
| 2026-01-13 | Discovery and Reinforcement of Tool-Integrated Reasoning Chains via Rollout Trees | Kun Li et.al. | 2601.08274 | translate | read | null |
| 2026-01-13 | HIPPO: Accelerating Video Large Language Models Inference via Holistic-aware Parallel Speculative Decoding | Qitan Lv et.al. | 2601.08273 | translate | read | null |
| 2026-01-13 | Med-CoReasoner: Reducing Language Disparities in Medical Reasoning via Language-Informed Co-Reasoning | Fan Gao et.al. | 2601.08267 | translate | read | null |
| 2026-01-13 | Unleashing Tool Engineering and Intelligence for Agentic AI in Next-Generation Communication Networks | Yinqiu Liu et.al. | 2601.08259 | translate | read | null |
| 2026-01-13 | Large Artificial Intelligence Model Guided Deep Reinforcement Learning for Resource Allocation in Non Terrestrial Networks | Abdikarim Mohamed Ibrahim et.al. | 2601.08254 | translate | read | null |
| 2026-01-13 | Improving Zero-shot ADL Recognition with Large Language Models through Event-based Context and Confidence | Michele Fiori et.al. | 2601.08241 | translate | read | null |
| 2026-01-13 | The End of Reward Engineering: How LLMs Are Redefining Multi-Agent Coordination | Haoran Su et.al. | 2601.08237 | translate | read | null |
| 2026-01-13 | DNF: Dual-Layer Nested Fingerprinting for Large Language Model Intellectual Property Protection | Zhenhua Xu et.al. | 2601.08223 | translate | read | null |
| 2026-01-13 | Generation-Augmented Generation: A Plug-and-Play Framework for Private Knowledge Injection in Large Language Models | Rongji Li et.al. | 2601.08209 | translate | read | null |
| 2026-01-13 | Triplets Better Than Pairs: Towards Stable and Effective Self-Play Fine-Tuning for LLMs | Yibo Wang et.al. | 2601.08198 | translate | read | null |
| 2026-01-13 | Evaluating Implicit Regulatory Compliance in LLM Tool Invocation via Logic-Guided Synthesis | Da Song et.al. | 2601.08196 | translate | read | null |
| 2026-01-13 | Improving LLM Reasoning with Homophily-aware Structural and Semantic Text-Attributed Graph Compression | Zijun Di et.al. | 2601.08187 | translate | read | null |
| 2026-01-13 | GI-Bench: A Panoramic Benchmark Revealing the Knowledge-Experience Dissociation of Multimodal Large Language Models in Gastrointestinal Endoscopy Against Clinical Standards | Yan Zhu et.al. | 2601.08183 | translate | read | null |
| 2026-01-13 | Prompt-Based Clarity Evaluation and Topic Detection in Political Question Answering | Lavanya Prahallad et.al. | 2601.08176 | translate | read | null |
| 2026-01-13 | The Agent’s First Day: Benchmarking Learning, Exploration, and Scheduling in the Workplace Scenarios | Daocheng Fu et.al. | 2601.08173 | translate | read | null |
| 2026-01-13 | Relational Knowledge Distillation Using Fine-tuned Function Vectors | Andrea Kang et.al. | 2601.08169 | translate | read | null |
| 2026-01-13 | WISE-Flow: Workflow-Induced Structured Experience for Self-Evolving Conversational Service Agents | Yuqing Zhou et.al. | 2601.08158 | translate | read | null |
| 2026-01-13 | Where Does Vision Meet Language? Understanding and Refining Visual Fusion in MLLMs via Contrastive Attention | Shezheng Song et.al. | 2601.08151 | translate | read | null |
| 2026-01-13 | Enriching Semantic Profiles into Knowledge Graph for Recommender Systems Using Large Language Models | Seokho Ahn et.al. | 2601.08148 | translate | read | null |
| 2026-01-13 | Qalb: Largest State-of-the-Art Urdu Large Language Model for 230M Speakers with Systematic Continued Pre-training | Muhammad Taimoor Hassan et.al. | 2601.08141 | translate | read | null |
| 2026-01-13 | MirrorBench: An Extensible Framework to Evaluate User-Proxy Agents for Human-Likeness | Ashutosh Hathidara et.al. | 2601.08118 | translate | read | null |
| 2026-01-13 | Coordinated Cooling and Compute Management for AI Datacenters | Nardos Belay Abera et.al. | 2601.08113 | translate | read | null |
| 2026-01-13 | Debiasing Large Language Models via Adaptive Causal Prompting with Sketch-of-Thought | Bowen Li et.al. | 2601.08108 | translate | read | null |
| 2026-01-13 | STO-RL: Offline RL under Sparse Rewards via LLM-Guided Subgoal Temporal Order | Chengyang Gu et.al. | 2601.08107 | translate | read | null |
| 2026-01-13 | AdaJudge: Adaptive Multi-Perspective Judging for Reward Modeling | Yongliang Miao et.al. | 2601.08097 | translate | read | null |
| 2026-01-13 | Q-realign: Piggybacking Realignment on Quantization for Safe and Efficient LLM Deployment | Qitao Tan et.al. | 2601.08089 | translate | read | null |
| 2026-01-12 | MemoBrain: Executive Memory as an Agentic Brain for Reasoning | Hongjin Qian et.al. | 2601.08079 | translate | read | null |
| 2026-01-12 | Semantic Gravity Wells: Why Negative Constraints Backfire | Shailesh Rana et.al. | 2601.08070 | translate | read | null |
| 2026-01-12 | Calibration Is Not Enough: Evaluating Confidence Estimation Under Language Variations | Yuxi Xia et.al. | 2601.08064 | translate | read | null |
| 2026-01-12 | Reasoning Beyond Chain-of-Thought: A Latent Computational Mode in Large Language Models | Zhenghao He et.al. | 2601.08058 | translate | read | null |
| 2026-01-12 | Cognitive Biases in LLM-Assisted Software Development | Xinyi Zhou et.al. | 2601.08045 | translate | read | null |
| 2026-01-12 | Towards Verifiably Safe Tool Use for LLM Agents | Aarya Doshi et.al. | 2601.08012 | translate | read | null |
| 2026-01-12 | LLM Review: Enhancing Creative Writing via Blind Peer Review Feedback | Weiyue Li et.al. | 2601.08003 | translate | read | null |
| 2026-01-12 | Reasoning over Precedents Alongside Statutes: Case-Augmented Deliberative Alignment for LLM Safety | Can Jin et.al. | 2601.08000 | translate | read | null |
| 2026-01-12 | Is Sentiment Banana-Shaped? Exploring the Geometry and Portability of Sentiment Concept Vectors | Laurits Lyngbaek et.al. | 2601.07995 | translate | read | null |
| 2026-01-12 | DYCP: Dynamic Context Pruning for Long-Form Dialogue with LLMs | Nayoung Choi et.al. | 2601.07994 | translate | read | null |
| 2026-01-12 | Fake Date Tests: Can We Trust In-sample Accuracy of LLMs in Macroeconomic Forecasting? | Alexander Eliseev et.al. | 2601.07992 | translate | read | null |
| 2026-01-12 | Multilingual, Multimodal Pipeline for Creating Authentic and Structured Fact-Checked Claim Dataset | Z. Melce Hüsünbeyi et.al. | 2601.07985 | translate | read | null |
| 2026-01-12 | Cost and accuracy of long-term memory in Distributed Multi-Agent Systems based on Large Language Models | Benedict Wolff et.al. | 2601.07978 | translate | read | null |
| 2026-01-12 | Explaining Generalization of AI-Generated Text Detectors Through Linguistic Analysis | Yuxi Xia et.al. | 2601.07974 | translate | read | null |
| 2026-01-12 | Knowing But Not Doing: Convergent Morality and Divergent Action in LLMs | Jen-tse Huang et.al. | 2601.07972 | translate | read | null |
| 2026-01-12 | A Human-Centric Pipeline for Aligning Large Language Models with Chinese Medical Ethics | Haoan Jin et.al. | 2601.07954 | translate | read | null |
| 2026-01-12 | SECite: Analyzing and Summarizing Citations in Software Engineering Literature | Shireesh Reddy Pyreddy et.al. | 2601.07939 | translate | read | null |
| 2026-01-12 | Towards Specialized Generalists: A Multi-Task MoE-LoRA Framework for Domain-Specific LLM Adaptation | Yuxin Yang et.al. | 2601.07935 | translate | read | null |
| 2026-01-12 | Enhancing Large Language Models for Time-Series Forecasting via Vector-Injected In-Context Learning | Jianqi Zhang et.al. | 2601.07903 | translate | read | null |
| 2026-01-12 | SecureCAI: Injection-Resilient LLM Assistants for Cybersecurity Operations | Mohammed Himayath Ali et.al. | 2601.07835 | translate | read | null |
| 2026-01-12 | The Confidence Trap: Gender Bias and Predictive Certainty in LLMs | Ahmed Sabir et.al. | 2601.07806 | translate | read | null |
| 2026-01-12 | Learning Through Dialogue: Unpacking the Dynamics of Human-LLM Conversations on Political Issues | Shaz Furniturewala et.al. | 2601.07796 | translate | read | null |
| 2026-01-12 | Kinship Data Benchmark for Multi-hop Reasoning | Tianda Sun et.al. | 2601.07794 | translate | read | null |
| 2026-01-12 | “TODO: Fix the Mess Gemini Created”: Towards Understanding GenAI-Induced Self-Admitted Technical Debt | Abdullah Al Mujahid et.al. | 2601.07786 | translate | read | null |
| 2026-01-12 | Enhancing Self-Correction in Large Language Models through Multi-Perspective Reflection | Mariana Costa et.al. | 2601.07780 | translate | read | null |
| 2026-01-12 | Are LLM Decisions Faithful to Verbal Confidence? | Jiawei Wang et.al. | 2601.07767 | translate | read | null |
| 2026-01-12 | Structure First, Reason Next: Enhancing a Large Language Model using Knowledge Graph for Numerical Reasoning in Financial Documents | Aryan Mishra et.al. | 2601.07754 | translate | read | null |
| 2026-01-12 | Evaluating the encoding competence of visual language models using uncommon actions | Chen Ling et.al. | 2601.07737 | translate | read | null |
| 2026-01-12 | Is Agentic RAG worth it? An experimental comparison of RAG approaches | Pietro Ferrazzi et.al. | 2601.07711 | translate | read | null |
| 2026-01-12 | Exploring the Meta-level Reasoning of Large Language Models via a Tool-based Multi-hop Tabular Question Answering Task | Nick Ferguson et.al. | 2601.07696 | translate | read | null |
| 2026-01-12 | Adaptive Layer Selection for Layer-Wise Token Pruning in LLM Inference | Rei Taniguchi et.al. | 2601.07667 | translate | read | null |
| 2026-01-12 | Towards Automating Blockchain Consensus Verification with IsabeLLM | Elliot Jones et.al. | 2601.07654 | translate | read | null |
| 2026-01-12 | PlaM: Training-Free Plateau-Guided Model Merging for Better Visual Grounding in MLLMs | Zijing Wang et.al. | 2601.07645 | translate | read | null |
| 2026-01-12 | GeoMotionGPT: Geometry-Aligned Motion Understanding with Large Language Models | Zhankai Ye et.al. | 2601.07632 | translate | read | null |
| 2026-01-12 | Proof of Time: A Benchmark for Evaluating Scientific Idea Judgments | Bingyang Ye et.al. | 2601.07606 | translate | read | null |
| 2026-01-12 | OODEval: Evaluating Large Language Models on Object-Oriented Design | Bingxu Xiao et.al. | 2601.07602 | translate | read | null |
| 2026-01-12 | GRPO with State Mutations: Improving LLM-Based Hardware Test Plan Generation | Dimple Vijay Kochar et.al. | 2601.07593 | translate | read | null |
| 2026-01-12 | Large Language Models for Physics Instrument Design | Sara Zoccheddu et.al. | 2601.07580 | translate | read | null |
| 2026-01-12 | Beyond Entangled Planning: Task-Decoupled Planning for Long-Horizon Agents | Yunfan Li et.al. | 2601.07577 | translate | read | null |
| 2026-01-12 | d3LLM: Ultra-Fast Diffusion LLM using Pseudo-Trajectory Distillation | Yu-Yang Qian et.al. | 2601.07568 | translate | read | link |
| 2026-01-12 | A Unified Framework for Emotion Recognition and Sentiment Analysis via Expert-Guided Multimodal Fusion with Large Language Models | Jiaqi Qiao et.al. | 2601.07565 | translate | read | null |
| 2026-01-05 | Heterogeneous Low-Bandwidth Pre-Training of LLMs | Yazan Obeidi et.al. | 2601.02360 | translate | read | null |
| 2026-01-05 | Robust Persona-Aware Toxicity Detection with Prompt Optimization and Learned Ensembling | Berk Atil et.al. | 2601.02337 | translate | read | null |
| 2026-01-05 | Estimating Text Temperature | Nikolay Mikhaylovskiy et.al. | 2601.02320 | translate | read | null |
| 2026-01-05 | Project Ariadne: A Structural Causal Framework for Auditing Faithfulness in LLM Agents | Sourena Khanzadeh et.al. | 2601.02314 | translate | read | null |
| 2026-01-05 | Placement Semantics for Distributed Deep Learning: A Systematic Framework for Analyzing Parallelism Strategies | Deep Pankajbhai Mehta et.al. | 2601.02311 | translate | read | null |
| 2026-01-05 | Power-of-Two Quantization-Aware-Training (PoT-QAT) in Large Language Models (LLMs) | Mahmoud Elgenedy et.al. | 2601.02298 | translate | read | null |
| 2026-01-05 | CD4LM: Consistency Distillation and aDaptive Decoding for Diffusion Language Models | Yihao Liang et.al. | 2601.02236 | translate | read | null |
| 2026-01-05 | ELLA: Efficient Lifelong Learning for Adapters in Large Language Models | Shristi Das Biswas et.al. | 2601.02232 | translate | read | null |
| 2026-01-05 | From XAI to Stories: A Factorial Study of LLM-Generated Explanation Quality | Fabian Lukassen et.al. | 2601.02224 | translate | read | null |
| 2026-01-05 | CORE: Code-based Inverse Self-Training Framework with Graph Expansion for Virtual Agents | Keyu Wang et.al. | 2601.02201 | translate | read | null |
| 2026-01-05 | Toward Global Large Language Models in Medicine | Rui Yang et.al. | 2601.02186 | translate | read | null |
| 2026-01-05 | Confidence Estimation for LLMs in Multi-turn Interactions | Caiqi Zhang et.al. | 2601.02179 | translate | read | null |
| 2026-01-05 | Streaming Hallucination Detection in Long Chain-of-Thought Reasoning | Haolang Lu et.al. | 2601.02170 | translate | read | null |
| 2026-01-05 | EverMemOS: A Self-Organizing Memory Operating System for Structured Long-Horizon Reasoning | Chuanrui Hu et.al. | 2601.02163 | translate | read | null |
| 2026-01-05 | Routing by Analogy: kNN-Augmented Expert Assignment for Mixture-of-Experts | Boxuan Lyu et.al. | 2601.02144 | translate | read | null |
| 2026-01-05 | Towards Multi-Level Transcript Segmentation: LoRA Fine-Tuning for Table-of-Contents Generation | Steffen Freisinger et.al. | 2601.02128 | translate | read | null |
| 2026-01-05 | DeCode: Decoupling Content and Delivery for Medical QA | Po-Jen Ko et.al. | 2601.02123 | translate | read | null |
| 2026-01-05 | Genie Sim 3.0: A High-Fidelity Comprehensive Simulation Platform for Humanoid Robot | Chenghao Yin et.al. | 2601.02078 | translate | read | null |
| 2026-01-05 | Deferred Commitment Decoding for Diffusion Language Models with Confidence-Aware Sliding Windows | Yingte Shu et.al. | 2601.02076 | translate | read | null |
| 2026-01-05 | MDAgent2: Large Language Model for Code Generation and Knowledge Q&A in Molecular Dynamics | Zhuofan Shi et.al. | 2601.02075 | translate | read | null |
| 2026-01-05 | FormuLLA: A Large Language Model Approach to Generating Novel 3D Printable Formulations | Adeshola Okubena et.al. | 2601.02071 | translate | read | null |
| 2026-01-05 | Cost-Efficient Cross-Lingual Retrieval-Augmented Generation for Low-Resource Languages: A Case Study in Bengali Agricultural Advisory | Md. Asif Hossain et.al. | 2601.02065 | translate | read | null |
| 2026-01-05 | Perish or Flourish? A Holistic Evaluation of Large Language Models for Code Generation in Functional Programming | Nguyet-Anh H. Lang et.al. | 2601.02060 | translate | read | null |
| 2026-01-05 | Output Embedding Centering for Stable LLM Pretraining | Felix Stollenwerk et.al. | 2601.02031 | translate | read | null |
| 2026-01-05 | Not All Needles Are Found: How Fact Distribution and Don’t Make It Up Prompts Shape Literal Extraction, Logical Inference, and Hallucination Risks in Long-Context LLMs | Amirali Ebrahimzadeh et.al. | 2601.02023 | translate | read | null |
| 2026-01-05 | AgentVNE: LLM-Augmented Graph Reinforcement Learning for Affinity-Aware Multi-Agent Placement in Edge Agentic AI | Runze Zheng et.al. | 2601.02021 | translate | read | null |
| 2026-01-05 | Exploring Approaches for Detecting Memorization of Recommender System Data in Large Language Models | Antonio Colacicco et.al. | 2601.02002 | translate | read | null |
| 2026-01-05 | MindChat: A Privacy-preserving Large Language Model for Mental Health Support | Dong Xue et.al. | 2601.01993 | translate | read | null |
| 2026-01-05 | ChaosBench-Logic: A Benchmark for Logical and Symbolic Reasoning on Chaotic Dynamical Systems | Noel Thomas et.al. | 2601.01982 | translate | read | null |
| 2026-01-05 | Reporting LLM Prompting in Automated Software Engineering: A Guideline Based on Current Practices and Expectations | Alexander Korn et.al. | 2601.01954 | translate | read | null |
| 2026-01-05 | MacVQA: Adaptive Memory Allocation and Global Noise Filtering for Continual Visual Question Answering | Zhifei Li et.al. | 2601.01926 | translate | read | null |
| 2026-01-05 | AR-MOT: Autoregressive Multi-object Tracking | Lianjie Jia et.al. | 2601.01925 | translate | read | null |
| 2026-01-05 | TalkPhoto: A Versatile Training-Free Conversational Assistant for Intelligent Image Editing | Yujie Hu et.al. | 2601.01915 | translate | read | null |
| 2026-01-05 | MMP-A*: Multimodal Perception Enhanced Incremental Heuristic Search on Path Planning | Minh Hieu Ha et.al. | 2601.01910 | translate | read | null |
| 2026-01-05 | Tackling the Inherent Difficulty of Noise Filtering in RAG | Jingyu Liu et.al. | 2601.01896 | translate | read | null |
| 2026-01-05 | Agentic AI in Remote Sensing: Foundations, Taxonomy, and Emerging Systems | Niloufar Alipour Talemi et.al. | 2601.01891 | translate | read | null |
| 2026-01-05 | Safety at One Shot: Patching Fine-Tuned LLMs with A Single Instance | Jiawen Zhang et.al. | 2601.01887 | translate | read | null |
| 2026-01-05 | Agentic Memory: Learning Unified Long-Term and Short-Term Memory Management for Large Language Model Agents | Yi Yu et.al. | 2601.01885 | translate | read | null |
| 2026-01-05 | Theory Trace Card: Theory-Driven Socio-Cognitive Evaluation of LLMs | Farzan Karimi-Malekabadi et.al. | 2601.01878 | translate | read | null |
| 2026-01-05 | Toward Auditable Neuro-Symbolic Reasoning in Pathology: SQL as an Explicit Trace of Evidence | Kewen Cao et.al. | 2601.01875 | translate | read | null |
| 2026-01-05 | CogFlow: Bridging Perception and Reasoning through Knowledge Internalization for Visual Mathematical Problem Solving | Shuhang Chen et.al. | 2601.01874 | translate | read | null |
| 2026-01-05 | Entity-Guided Multi-Task Learning for Infrared and Visible Image Fusion | Wenyu Shao et.al. | 2601.01870 | translate | read | null |
| 2026-01-05 | DermoGPT: Open Weights and Open Data for Morphology-Grounded Dermatological Reasoning MLLMs | Jinghan Ru et.al. | 2601.01868 | translate | read | null |
| 2026-01-05 | Judging with Personality and Confidence: A Study on Personality-Conditioned LLM Relevance Assessment | Nuo Chen et.al. | 2601.01862 | translate | read | null |
| 2026-01-05 | Jenius Agent: Towards Experience-Driven Accuracy Optimization in Real-World Scenarios | Defei Xia et.al. | 2601.01857 | translate | read | null |
| 2026-01-05 | MORE: Multi-Objective Adversarial Attacks on Speech Recognition | Xiaoxue Gao et.al. | 2601.01852 | translate | read | null |
| 2026-01-05 | Clinical Knowledge Graph Construction and Evaluation with Multi-LLMs via Retrieval-Augmented Generation | Udiptaman Das et.al. | 2601.01844 | translate | read | null |
| 2026-01-05 | COMPASS: A Framework for Evaluating Organization-Specific Policy Alignment in LLMs | Dasol Choi et.al. | 2601.01836 | translate | read | null |
| 2026-01-05 | Emergent Introspective Awareness in Large Language Models | Jack Lindsey et.al. | 2601.01828 | translate | read | null |
| 2026-01-05 | Aspect Extraction from E-Commerce Product and Service Reviews | Valiant Lance D. Dionela et.al. | 2601.01827 | translate | read | null |
| 2026-01-05 | CSCBench: A PVC Diagnostic Benchmark for Commodity Supply Chain Reasoning | Yaxin Cui et.al. | 2601.01825 | translate | read | null |
| 2026-01-05 | Causality-Aware Temporal Projection for Video Understanding in Video-LLMs | Zhengjian Kang et.al. | 2601.01804 | translate | read | null |
| 2026-01-05 | UnPII: Unlearning Personally Identifiable Information with Quantifiable Exposure Risk | Intae Jeon et.al. | 2601.01786 | translate | read | null |
| 2026-01-05 | LIA: Supervised Fine-Tuning of Large Language Models for Automatic Issue Assignment | Arsham Khosravani et.al. | 2601.01780 | translate | read | null |
| 2026-01-05 | Can Large Language Models Solve Engineering Equations? A Systematic Comparison of Direct Prediction and Solver-Assisted Approaches | Sai Varun Kodathala et.al. | 2601.01774 | translate | read | null |
| 2026-01-05 | Can LLMs Track Their Output Length? A Dynamic Feedback Mechanism for Precise Length Regulation | Meiman Xiao et.al. | 2601.01768 | translate | read | null |
| 2026-01-05 | A New Benchmark for the Appropriate Evaluation of RTL Code Optimization | Yao Lu et.al. | 2601.01765 | translate | read | null |
| 2026-01-05 | Query-Document Dense Vectors for LLM Relevance Judgment Bias Analysis | Samaneh Mohtadi et.al. | 2601.01751 | translate | read | null |
| 2026-01-05 | Yuan3.0 Flash: An Open Multimodal Large Language Model for Enterprise Applications | YuanLab.ai et.al. | 2601.01718 | translate | read | null |
| 2026-01-05 | A Training-Free Large Reasoning Model-based Knowledge Tracing Framework for Unified Prediction and Prescription | Unggi Lee et.al. | 2601.01708 | translate | read | null |
| 2026-01-04 | All-Optical Deep Learning with Quantum Nonlinearity | Qingyi Zhou et.al. | 2601.01690 | translate | read | null |
| 2026-01-04 | Lying with Truths: Open-Channel Multi-Agent Collusion for Belief Manipulation via Generative Montage | Jinwei Hu et.al. | 2601.01685 | translate | read | null |
| 2026-01-04 | Exposing Hidden Interfaces: LLM-Guided Type Inference for Reverse Engineering macOS Private Frameworks | Arina Kharlamova et.al. | 2601.01673 | translate | read | null |
| 2026-01-04 | JMedEthicBench: A Multi-Turn Conversational Benchmark for Evaluating Medical Safety in Japanese Large Language Models | Junyu Liu et.al. | 2601.01627 | translate | read | null |
| 2026-01-04 | Structured Decomposition for LLM Reasoning: Cross-Domain Validation and Semantic Web Integration | Albert Sadowski et.al. | 2601.01609 | translate | read | null |
| 2026-01-04 | OpenRT: An Open-Source Red Teaming Framework for Multimodal LLMs | Xin Wang et.al. | 2601.01592 | translate | read | null |
| 2026-01-04 | The Two-Stage Decision-Sampling Hypothesis: Understanding the Emergence of Self-Reflection in RL-Trained LLMs | Zibo Zhao et.al. | 2601.01580 | translate | read | null |
| 2026-01-04 | CaveAgent: Transforming LLMs into Stateful Runtime Operators | Maohao Ran et.al. | 2601.01569 | translate | read | null |
| 2026-01-04 | MOSS Transcribe Diarize: Accurate Transcription with Speaker Diarization | Donghua Yu et.al. | 2601.01554 | translate | read | null |
| 2026-01-04 | HalluZig: Hallucination Detection using Zigzag Persistence | Shreyas N. Samaga et.al. | 2601.01552 | translate | read | null |
| 2026-01-04 | Improving Behavioral Alignment in LLM Social Simulations via Context Formation and Navigation | Letian Kong et.al. | 2601.01546 | translate | read | null |
| 2026-01-04 | Bridging the Data Gap: Creating a Hindi Text Summarization Dataset from the English XSUM | Praveenkumar Katwe et.al. | 2601.01543 | translate | read | null |
| 2026-01-04 | Bayesian Orchestration of Multi-LLM Agents for Cost-Aware Sequential Decision-Making | Danial Amin et.al. | 2601.01522 | translate | read | null |
| 2026-01-04 | Distortion Instead of Hallucination: The Effect of Reasoning Under Strict Constraints | Junichiro Niimi et.al. | 2601.01490 | translate | read | null |
| 2026-01-04 | Can Legislation Be Made Machine-Readable in PROLEG? | May-Myo Zin et.al. | 2601.01477 | translate | read | null |
| 2026-01-04 | Bridging the gap: A comparative exploration of Speech-LLM and end-to-end architecture for multilingual conversational ASR | Yuxiang Mei et.al. | 2601.01461 | translate | read | null |
| 2026-01-04 | Bayesian Subspace Gradient Estimation for Zeroth-Order Optimization of Large Language Models | Jian Feng et.al. | 2601.01452 | translate | read | null |
| 2026-01-04 | iFlip: Iterative Feedback-driven Counterfactual Example Refinement | Yilong Wang et.al. | 2601.01446 | translate | read | null |
| 2026-01-04 | Personalizing black-box models for nonparametric regression with minimax optimality | Sai Li et.al. | 2601.01432 | translate | read | null |
| 2026-01-04 | From Emotion Classification to Emotional Reasoning: Enhancing Emotional Intelligence in Large Language Models | Arjhun Sreedar et.al. | 2601.01407 | translate | read | null |
| 2026-01-04 | LANCET: Neural Intervention via Structural Entropy for Mitigating Faithfulness Hallucinations in LLMs | Chenxu Wang et.al. | 2601.01401 | translate | read | null |
| 2026-01-04 | EternalMath: A Living Benchmark of Frontier Mathematics that Evolves with Human Discovery | Jicheng Ma et.al. | 2601.01400 | translate | read | null |
| 2026-01-04 | Empowering Small Language Models with Factual Hallucination-Aware Reasoning for Financial Classification | Han Yuan et.al. | 2601.01378 | translate | read | null |
| 2026-01-04 | KGCE: Knowledge-Augmented Dual-Graph Evaluator for Cross-Platform Educational Agent Benchmarking with Multimodal Language Models | Zixian Liu et.al. | 2601.01366 | translate | read | null |
| 2026-01-04 | A unified multimodal understanding and generation model for cross-disciplinary scientific research | Xiaomeng Yang et.al. | 2601.01363 | translate | read | null |
| 2026-01-04 | Investigating the Multilingual Calibration Effects of Language Model Instruction-Tuning | Jerry Huang et.al. | 2601.01362 | translate | read | null |
| 2026-01-04 | Towards LLM-enabled autonomous combustion research: A literature-aware agent for self-corrective modeling workflows | Ke Xiao et.al. | 2601.01357 | translate | read | null |
| 2026-01-04 | Reasoning Over Recall: Evaluating the Efficacy of Generalist Architectures vs. Specialized Fine-Tunes in RAG-Based Mental Health Dialogue Systems | Md Abdullah Al Kafi et.al. | 2601.01341 | translate | read | null |
| 2026-01-04 | FLOP-Efficient Training: Early Stopping Based on Test-Time Compute Awareness | Hossam Amer et.al. | 2601.01332 | translate | read | null |
| 2026-01-04 | Beyond Gemini-3-Pro: Revisiting LLM Routing and Aggregation at Scale | Shengji Tang et.al. | 2601.01330 | translate | read | null |
| 2026-01-04 | Digital Twin AI: Opportunities and Challenges from Large Language Models to World Models | Rong Zhou et.al. | 2601.01321 | translate | read | null |
| 2026-01-04 | Adaptive Hierarchical Evaluation of LLMs and SAST tools for CWE Prediction in Python | Muntasir Adnan et.al. | 2601.01320 | translate | read | null |
| 2026-01-04 | Towards a Principled Muon under $\mu\mathsf{P}$: Ensuring Spectral Conditions throughout Training | John Zhao et.al. | 2601.01306 | translate | read | null |
| 2026-01-03 | Warp-Cortex: An Asynchronous, Memory-Efficient Architecture for Million-Agent Cognitive Scaling on Consumer Hardware | Jorge L. Ruiz Williams et.al. | 2601.01298 | translate | read | null |
| 2026-01-03 | Aggressive Compression Enables LLM Weight Theft | Davis Brown et.al. | 2601.01296 | translate | read | null |
| 2026-01-03 | LLM Collusion | Shengyu Cao et.al. | 2601.01279 | translate | read | null |
| 2026-01-03 | CatchAll: Repository-Aware Exception Handling with Knowledge-Guided LLMs | Qingxiao Tao et.al. | 2601.01271 | translate | read | null |
| 2026-01-03 | From Policy to Logic for Efficient and Interpretable Coverage Assessment | Rhitabrat Pokharel et.al. | 2601.01266 | translate | read | null |
| 2026-01-03 | MambaFormer: Token-Level Guided Routing Mixture-of-Experts for Accurate and Efficient Clinical Assistance | Hamad Khan et.al. | 2601.01260 | translate | read | null |
| 2026-01-03 | Entity-Aware and Secure Query Optimization in Database Using Named Entity Recognition | Azrin Sultana et.al. | 2601.01254 | translate | read | null |
| 2026-01-03 | Racka: Efficient Hungarian LLM Adaptation on Academic Infrastructure | Zsolt Csibi et.al. | 2601.01244 | translate | read | null |
| 2026-01-03 | IO-RAE: Information-Obfuscation Reversible Adversarial Example for Audio Privacy Protection | Jiajie Zhu et.al. | 2601.01239 | translate | read | null |
| 2026-01-03 | Atomizer: An LLM-based Collaborative Multi-Agent Framework for Intent-Driven Commit Untangling | Kangchen Zhu et.al. | 2601.01233 | translate | read | null |
| 2026-01-03 | Correctness isn't Efficiency: Runtime Memory Divergence in LLM-Generated Code | Prateek Rajput et.al. | 2601.01215 | translate | read | null |
| 2026-01-03 | OrchestrRL: Dynamic Compute and Network Orchestration for Disaggregated RL | Xin Tan et.al. | 2601.01209 | translate | read | null |
| 2026-01-03 | EduSim-LLM: An Educational Platform Integrating Large Language Models and Robotic Simulation for Beginners | Shenqi Lu et.al. | 2601.01196 | translate | read | null |
| 2026-01-03 | Reinforcement Learning Enhanced Multi-hop Reasoning for Temporal Knowledge Question Answering | Wuzhenghong Wen et.al. | 2601.01195 | translate | read | null |
| 2026-01-03 | SecureCodeRL: Security-Aware Reinforcement Learning for Code Generation with Partial-Credit Rewards | Suryansh Singh Sijwali et.al. | 2601.01184 | translate | read | null |
| 2026-01-03 | Bridging the Semantic Gap for Categorical Data Clustering via Large Language Models | Zihua Yang et.al. | 2601.01162 | translate | read | null |
| 2026-01-03 | DHI: Leveraging Diverse Hallucination Induction for Enhanced Contrastive Factuality Control in Large Language Models | Jiani Guo et.al. | 2601.01156 | translate | read | null |
| 2026-01-03 | SongSage: A Large Musical Language Model with Lyric Generative Pre-training | Jiani Guo et.al. | 2601.01153 | translate | read | null |
| 2026-01-03 | RovoDev Code Reviewer: A Large-Scale Online Evaluation of LLM-based Code Review Automation at Atlassian | Kla Tantithamthavorn et.al. | 2601.01129 | translate | read | null |
| 2026-01-03 | ScienceDB AI: An LLM-Driven Agentic Recommender System for Large-Scale Scientific Data Sharing Services | Qingqing Long et.al. | 2601.01118 | translate | read | null |
| 2026-01-03 | NarrativeTrack: Evaluating Video Language Models Beyond the Frame | Hyeonjeong Ha et.al. | 2601.01095 | translate | read | null |
| 2026-01-03 | ks-lit-3m: A 3.1 million word Kashmiri text dataset for large language model pretraining | Haq Nawaz Malik et.al. | 2601.01091 | translate | read | null |
| 2026-01-03 | Harm in AI-Driven Societies: An Audit of Toxicity Adoption on Chirper.ai | Erica Coppolillo et.al. | 2601.01090 | translate | read | null |
| 2026-01-03 | SPoRC-VIST: A Benchmark for Evaluating Generative Natural Narrative in Vision-Language Models | Yunlin Zeng et.al. | 2601.01062 | translate | read | null |
| 2026-01-03 | A Platform for Interactive AI Character Experiences | Rafael Wampfler et.al. | 2601.01027 | translate | read | null |
| 2026-01-03 | HyperJoin: LLM-augmented Hypergraph Link Prediction for Joinable Table Discovery | Shiyuan Liu et.al. | 2601.01015 | translate | read | null |
| 2026-01-02 | Grain-Aware Data Transformations: Type-Level Formal Verification at Zero Computational Cost | Nikos Karayannidis et.al. | 2601.00995 | translate | read | null |
| 2026-01-02 | Reliability Under Randomness: An Empirical Analysis of Sparse and Dense Language Models Across Decoding Temperatures | Kabir Grover et.al. | 2601.00942 | translate | read | null |
| 2026-01-02 | Emoji-Based Jailbreaking of Large Language Models | M P V S Gopinadh et.al. | 2601.00936 | translate | read | null |
| 2026-01-02 | AI-Guided Computational Design of a Room-Temperature, Ambient-Pressure Superconductor Candidate: Grokene | DEARDAO DeSci Collaborative Team et.al. | 2601.00931 | translate | read | null |
| 2026-01-02 | AlignUSER: Human-Aligned LLM Agents via World Models for Recommender System Evaluation | Nicolas Bougie et.al. | 2601.00930 | translate | read | null |
| 2026-01-02 | Measuring Social Media Polarization Using Large Language Models and Heuristic Rules | Jawad Chowdhury et.al. | 2601.00927 | translate | read | null |
| 2026-01-01 | MACA: A Framework for Distilling Trustworthy LLMs into Efficient Retrievers | Satya Swaroop Gudipudi et.al. | 2601.00926 | translate | read | null |
| 2026-01-01 | Context Collapse: In-Context Learning and Model Collapse | Josef Ott et.al. | 2601.00923 | translate | read | null |
| 2026-01-01 | Attention Needs to Focus: A Unified Perspective on Attention Allocation | Zichuan Fu et.al. | 2601.00919 | translate | read | null |
| 2026-01-01 | The Discovery Gap: How Product Hunt Startups Vanish in LLM Organic Discovery Queries | Amit Prakash Sharma et.al. | 2601.00912 | translate | read | null |
| 2026-01-02 | Geometry of Reason: Spectral Signatures of Valid Mathematical Reasoning | Valentin Noël et.al. | 2601.00791 | translate | read | null |
| 2026-01-02 | Investigating the Viability of Employing Multi-modal Large Language Models in the Context of Audio Deepfake Detection | Akanksha Chuchra et.al. | 2601.00777 | translate | read | null |
| 2026-01-02 | Memory Bank Compression for Continual Adaptation of Large Language Models | Thomas Katraouras et.al. | 2601.00756 | translate | read | null |
| 2026-01-02 | The Reasoning-Creativity Trade-off: Toward Creativity-Driven Problem Solving | Max Ruiz Luyten et.al. | 2601.00747 | translate | read | null |
| 2026-01-02 | Materials Informatics: Emergence To Autonomous Discovery In The Age Of AI | Turab Lookman et.al. | 2601.00742 | translate | read | null |
| 2026-01-02 | Exploring the Performance of Large Language Models on Subjective Span Identification Tasks | Alphaeus Dmonte et.al. | 2601.00736 | translate | read | null |
| 2026-01-02 | Grading Handwritten Engineering Exams with Multimodal Large Language Models | Janez Perš et.al. | 2601.00730 | translate | read | null |
| 2026-01-02 | A Vision-and-Knowledge Enhanced Large Language Model for Generalizable Pedestrian Crossing Behavior Inference | Qingwen Pu et.al. | 2601.00694 | translate | read | null |
| 2026-01-02 | Human-like AI-based Auto-Field-in-Field Whole-Brain Radiotherapy Treatment Planning With Conversation Large Language Model Feedback | Adnan Jafar et.al. | 2601.00685 | translate | read | null |
| 2026-01-02 | QSLM: A Performance- and Memory-aware Quantization Framework with Tiered Search Strategy for Spike-driven Language Models | Rachmad Vidya Wicaksana Putra et.al. | 2601.00679 | translate | read | null |
| 2026-01-02 | Physio-DPO: Aligning Large Language Models with the Protein Energy Landscape to Eliminate Structural Hallucinations | QiWei Meng et.al. | 2601.00647 | translate | read | null |
| 2026-01-02 | FlexSpec: Frozen Drafts Meet Evolving Targets in Edge-Cloud Collaborative LLM Speculative Decoding | Yuchen Li et.al. | 2601.00644 | translate | read | null |
| 2026-01-02 | Probabilistic Guarantees for Reducing Contextual Hallucinations in LLMs | Nils Rautenberg et.al. | 2601.00641 | translate | read | null |
| 2026-01-02 | SEMODS: A Validated Dataset of Open-Source Software Engineering Models | Alexandra González et.al. | 2601.00635 | translate | read | null |
| 2026-01-02 | Do Chatbot LLMs Talk Too Much? The YapBench Benchmark | Vadim Borisov et.al. | 2601.00624 | translate | read | null |
| 2026-01-02 | DA-DPO: Cost-efficient Difficulty-aware Preference Optimization for Reducing MLLM Hallucinations | Longtian Qiu et.al. | 2601.00623 | translate | read | null |
| 2026-01-02 | Beyond IVR: Benchmarking Customer Support LLM Agents for Business-Adherence | Sumanth Balaji et.al. | 2601.00596 | translate | read | null |
| 2026-01-02 | CSSBench: Evaluating the Safety of Lightweight LLMs against Chinese-Specific Adversarial Patterns | Zhenhong Zhou et.al. | 2601.00588 | translate | read | null |
| 2026-01-02 | HFedMoE: Resource-aware Heterogeneous Federated Learning with Mixture-of-Experts | Zihan Fang et.al. | 2601.00583 | translate | read | null |
| 2026-01-02 | The AI Invisibility Effect: Understanding Human-AI Interaction When Users Don’t Recognize Artificial Intelligence | Obada Kraishan et.al. | 2601.00579 | translate | read | null |
| 2026-01-02 | InfoSynth: Information-Guided Benchmark Synthesis for LLMs | Ishir Garg et.al. | 2601.00575 | translate | read | null |
| 2026-01-02 | Improving Scientific Document Retrieval with Academic Concept Index | Jeyun Lee et.al. | 2601.00567 | translate | read | null |
| 2026-01-02 | Low Rank Comes with Low Security: Gradient Assembly Poisoning Attacks against Distributed LoRA-based LLM Systems | Yueyan Dong et.al. | 2601.00566 | translate | read | null |
| 2026-01-02 | Cracking IoT Security: Can LLMs Outsmart Static Analysis Tools? | Jason Quantrill et.al. | 2601.00559 | translate | read | null |
| 2026-01-01 | Improving LLM-Assisted Secure Code Generation through Retrieval-Augmented-Generation and Multi-Tool Feedback | Vidyut Sriram et.al. | 2601.00509 | translate | read | null |
| 2026-01-01 | Rule-Based Approaches to Atomic Sentence Extraction | Lineesha Kamana et.al. | 2601.00506 | translate | read | null |
| 2026-01-01 | MotionPhysics: Learnable Motion Distillation for Text-Guided Simulation | Miaowei Wang et.al. | 2601.00504 | translate | read | null |
| 2026-01-01 | STELLAR: A Search-Based Testing Framework for Large Language Model Applications | Lev Sorokin et.al. | 2601.00497 | translate | read | null |
| 2026-01-01 | Noise-Aware Named Entity Recognition for Historical VET Documents | Alexander M. Esser et.al. | 2601.00488 | translate | read | null |
| 2026-01-01 | Multi-Agent Coordinated Rename Refactoring | Abhiram Bellur et.al. | 2601.00482 | translate | read | null |
| 2026-01-01 | DSL or Code? Evaluating the Quality of LLM-Generated Algebraic Specifications: A Case Study in Optimization at Kinaxis | Negin Ayoughi et.al. | 2601.00469 | translate | read | null |
| 2026-01-01 | Defensive M2S: Training Guardrail Models on Compressed Multi-turn Conversations | Hyunjun Kim et.al. | 2601.00454 | translate | read | null |
| 2026-01-01 | Language as Mathematical Structure: Examining Semantic Field Theory Against Language Games | Dimitris Vartziotis et.al. | 2601.00448 | translate | read | null |
| 2026-01-01 | Toward Better Temporal Structures for Geopolitical Events Forecasting | Kian Ahrabian et.al. | 2601.00430 | translate | read | null |
| 2026-01-01 | Do LLMs Judge Distantly Supervised Named Entity Labels Well? Constructing the JudgeWEL Dataset | Alistair Plum et.al. | 2601.00411 | translate | read | null |
| 2026-01-01 | Vision-Language Reasoning for Geolocalization: A Reinforcement Learning Approach | Biao Wu et.al. | 2601.00388 | translate | read | null |
| 2026-01-01 | The Role of Mixed-Language Documents for Multilingual Large Language Model Pretraining | Jiandong Shao et.al. | 2601.00364 | translate | read | null |
| 2026-01-01 | Robust Uncertainty Quantification for Factual Generation of Large Language Models | Yuhao Zhang et.al. | 2601.00348 | translate | read | null |
| 2026-01-01 | Can Large Language Models Still Explain Themselves? Investigating the Impact of Quantization on Self-Explanations | Qianli Wang et.al. | 2601.00282 | translate | read | null |
| 2026-01-01 | Making Theft Useless: Adulteration-Based Protection of Proprietary Knowledge Graphs in GraphRAG Systems | Weijie Wang et.al. | 2601.00274 | translate | read | null |
| 2026-01-01 | FaithSCAN: Model-Driven Single-Pass Hallucination Detection for Faithful Visual Question Answering | Chaodong Tong et.al. | 2601.00269 | translate | read | null |
| 2026-01-01 | Beyond Perfect APIs: A Comprehensive Evaluation of LLM Agents Under Real-World API Complexity | Doyoung Kim et.al. | 2601.00268 | translate | read | null |
| 2026-01-01 | Parallel Universes, Parallel Languages: A Comprehensive Study on LLM-based Multilingual Counterfactual Example Generation | Qianli Wang et.al. | 2601.00263 | translate | read | null |
| 2026-01-01 | TotalFM: An Organ-Separated Framework for 3D-CT Vision Foundation Models | Kohei Yamamoto et.al. | 2601.00260 | translate | read | null |
| 2026-01-01 | An Empirical Evaluation of LLM-Based Approaches for Code Vulnerability Detection: RAG, SFT, and Dual-Agent Systems | Md Hasan Saju et.al. | 2601.00254 | translate | read | null |
| 2026-01-01 | FlashInfer-Bench: Building the Virtuous Cycle for AI-driven LLM Systems | Shanli Xing et.al. | 2601.00227 | translate | read | null |
| 2026-01-01 | Talk Less, Verify More: Improving LLM Assistants with Semantic Checks and Execution Feedback | Yan Sun et.al. | 2601.00224 | translate | read | null |
| 2026-01-01 | From Evidence-Based Medicine to Knowledge Graph: Retrieval-Augmented Generation for Sports Rehabilitation and a Domain Benchmark | Jinning Zhang et.al. | 2601.00216 | translate | read | null |
| 2026-01-01 | From Sight to Insight: Improving Visual Reasoning Capabilities of Multimodal Models via Reinforcement Learning | Omar Sharif et.al. | 2601.00215 | translate | read | null |
| 2026-01-01 | Overlooked Safety Vulnerability in LLMs: Malicious Intelligent Optimization Algorithm Request and its Jailbreak | Haoran Gu et.al. | 2601.00213 | translate | read | null |
| 2026-01-01 | Knowledge Distillation for Temporal Knowledge Graph Reasoning with Large Language Models | Wang Xing et.al. | 2601.00202 | translate | read | null |
| 2026-01-01 | Pat-DEVAL: Chain-of-Legal-Thought Evaluation for Patent Description | Yongmin Yoo et.al. | 2601.00166 | translate | read | null |
| 2026-01-01 | Combining datasets with different ground truths using Low-Rank Adaptation to generalize image-based CNN models for photometric redshift prediction | Vikram Seenivasan et.al. | 2601.00146 | translate | read | null |