LLM - 2025-04

| Publish Date | Title | Authors | PDF | Translate | Read | Code |
|:---|:---|:---|:---|:---|:---|:---|
| 2025-04-30 | TRUST: An LLM-Based Dialogue System for Trauma Understanding and Structured Assessments | Sichang Tu et al. | 2504.21851 | translate | read | null |
| 2025-04-30 | COMPACT: COMPositional Atomic-to-Complex Visual Capability Tuning | Xindi Wu et al. | 2504.21850 | translate | read | link |
| 2025-04-30 | An Empirical Study on the Effectiveness of Large Language Models for Binary Code Understanding | Xiuwei Shang et al. | 2504.21803 | translate | read | null |
| 2025-04-30 | DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition | Z. Z. Ren et al. | 2504.21801 | translate | read | link |
| 2025-04-30 | MAC-Tuning: LLM Multi-Compositional Problem Reasoning with Enhanced Knowledge Boundary Awareness | Junsheng Huang et al. | 2504.21773 | translate | read | null |
| 2025-04-30 | LASHED: LLMs And Static Hardware Analysis for Early Detection of RTL Bugs | Baleegh Ahmad et al. | 2504.21770 | translate | read | null |
| 2025-04-30 | LLM-based Interactive Imitation Learning for Robotic Manipulation | Jonas Werner et al. | 2504.21769 | translate | read | null |
| 2025-04-30 | Investigating Literary Motifs in Ancient and Medieval Novels with Large Language Models | Emelie Hallenberg et al. | 2504.21742 | translate | read | null |
| 2025-04-30 | TheraQuest: A Gamified, LLM-Powered Simulation for Massage Therapy Training | Shengqian Wang et al. | 2504.21735 | translate | read | null |
| 2025-04-30 | XBreaking: Explainable Artificial Intelligence for Jailbreaking LLMs | Marco Arazzi et al. | 2504.21700 | translate | read | null |
| 2025-04-29 | YoChameleon: Personalized Vision and Language Generation | Thao Nguyen et al. | 2504.20998 | translate | read | link |
| 2025-04-29 | Toward Efficient Exploration by Large Language Model Agents | Dilip Arumugam et al. | 2504.20997 | translate | read | null |
| 2025-04-29 | X-Fusion: Introducing New Modality to Frozen Large Language Models | Sicheng Mo et al. | 2504.20996 | translate | read | null |
| 2025-04-29 | ACE: A Security Architecture for LLM-Integrated App Systems | Evan Li et al. | 2504.20984 | translate | read | null |
| 2025-04-29 | Real-Time Wayfinding Assistant for Blind and Low-Vision Users | Dabbrata Das et al. | 2504.20976 | translate | read | null |
| 2025-04-29 | SetKE: Knowledge Editing for Knowledge Elements Overlap | Yifan Wei et al. | 2504.20972 | translate | read | null |
| 2025-04-29 | OSVBench: Benchmarking LLMs on Specification Generation Tasks for Operating System Verification | Shangyu Li et al. | 2504.20964 | translate | read | null |
| 2025-04-29 | Information Gravity: A Field-Theoretic Model for Token Selection in Large Language Models | Maryna Vyshnyvetska et al. | 2504.20951 | translate | read | null |
| 2025-04-29 | Trace-of-Thought: Enhanced Arithmetic Problem Solving via Reasoning Distillation From Large to Small Language Models | Tyler McDonald et al. | 2504.20946 | translate | read | null |
| 2025-04-29 | ChestX-Reasoner: Advancing Radiology Foundation Models with Reasoning through Step-by-Step Verification | Ziqing Fan et al. | 2504.20930 | translate | read | link |
| 2025-04-28 | AutoJudge: Judge Decoding Without Manual Annotation | Roman Garipov et al. | 2504.20039 | translate | read | null |
| 2025-04-28 | SpatialReasoner: Towards Explicit and Generalizable 3D Spatial Reasoning | Wufei Ma et al. | 2504.20024 | translate | read | null |
| 2025-04-28 | Better To Ask in English? Evaluating Factual Accuracy of Multilingual LLMs in English and Low-Resource Languages | Pritika Rohera et al. | 2504.20022 | translate | read | null |
| 2025-04-28 | Modular Machine Learning: An Indispensable Path towards New-Generation Large Language Models | Xin Wang et al. | 2504.20020 | translate | read | null |
| 2025-04-28 | LLM-Generated Fake News Induces Truth Decay in News Ecosystem: A Case Study on Neural News Recommendation | Beizhe Hu et al. | 2504.20013 | translate | read | null |
| 2025-04-28 | Towards Automated Scoping of AI for Social Good Projects | Jacob Emmerson et al. | 2504.20010 | translate | read | null |
| 2025-04-28 | Knowledge Distillation of Domain-adapted LLMs for Question-Answering in Telecom | Rishika Sen et al. | 2504.20000 | translate | read | null |
| 2025-04-28 | TD-EVAL: Revisiting Task-Oriented Dialogue Evaluation by Combining Turn-Level Precision with Dialogue-Level Comparisons | Emre Can Acikgoz et al. | 2504.19982 | translate | read | null |
| 2025-04-28 | Accurate and Diverse LLM Mathematical Reasoning via Automated PRM-Guided GFlowNets | Adam Younsi et al. | 2504.19981 | translate | read | null |
| 2025-04-29 | From Concept to Practice: an Automated LLM-aided UVM Machine for RTL Verification | Junhao Ye et al. | 2504.19959 | translate | read | null |
| 2025-04-25 | TRACE Back from the Future: A Probabilistic Reasoning Approach to Controllable Language Generation | Gwen Yidou Weng et al. | 2504.18535 | translate | read | link |
| 2025-04-25 | Eval3D: Interpretable and Fine-grained Evaluation for 3D Generation | Shivam Duggal et al. | 2504.18509 | translate | read | null |
| 2025-04-25 | TopSpace: spatial topic modeling for unsupervised discovery of multicellular spatial tissue structures in multiplex imaging | Junsouk Choi et al. | 2504.18495 | translate | read | null |
| 2025-04-25 | Investigating Co-Constructive Behavior of Large Language Models in Explanation Dialogues | Leandra Fichtel et al. | 2504.18483 | translate | read | null |
| 2025-04-25 | Generative Induction of Dialogue Task Schemas with Streaming Refinement and Simulated Interactions | James D. Finch et al. | 2504.18474 | translate | read | null |
| 2025-04-25 | Reason Like a Radiologist: Chain-of-Thought and Reinforcement Learning for Verifiable Report Generation | Peiyuan Jing et al. | 2504.18453 | translate | read | null |
| 2025-04-25 | LLMpatronous: Harnessing the Power of LLMs For Vulnerability Detection | Rajesh Yarra et al. | 2504.18423 | translate | read | null |
| 2025-04-25 | BitNet v2: Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs | Hongyu Wang et al. | 2504.18415 | translate | read | null |
| 2025-04-25 | An Empirical Study of Evaluating Long-form Question Answering | Ning Xian et al. | 2504.18413 | translate | read | null |
| 2025-04-25 | Expressing stigma and inappropriate responses prevents LLMs from safely replacing mental health providers | Jared Moore et al. | 2504.18412 | translate | read | link |
| 2025-04-24 | Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models | Xu Ma et al. | 2504.17789 | translate | read | null |
| 2025-04-24 | Replay to Remember: Retaining Domain Knowledge in Streaming Language Models | Sneh Pillai et al. | 2504.17780 | translate | read | null |
| 2025-04-24 | Conversational Assistants to support Heart Failure Patients: comparing a Neurosymbolic Architecture with ChatGPT | Anuja Tayal et al. | 2504.17753 | translate | read | null |
| 2025-04-24 | Towards Robust LLMs: an Adversarial Robustness Measurement Framework | Natan Levy et al. | 2504.17723 | translate | read | null |
| 2025-04-24 | Multilingual Performance Biases of Large Language Models in Education | Vansh Gupta et al. | 2504.17720 | translate | read | null |
| 2025-04-24 | Ensemble Bayesian Inference: Leveraging Small Language Models to Achieve LLM-level Accuracy in Profile Matching Tasks | Haru-Tada Sato et al. | 2504.17685 | translate | read | null |
| 2025-04-24 | INSIGHT: Bridging the Student-Teacher Gap in Times of Large Language Models | Jarne Thys et al. | 2504.17677 | translate | read | null |
| 2025-04-24 | Energy Considerations of Large Language Model Inference and Efficiency Optimizations | Jared Fernandez et al. | 2504.17674 | translate | read | null |
| 2025-04-24 | Cross-region Model Training with Communication-Computation Overlapping and Delay Compensation | Ying Zhu et al. | 2504.17672 | translate | read | null |
| 2025-04-24 | Data-Driven Calibration of Prediction Sets in Large Vision-Language Models Based on Inductive Conformal Prediction | Yuanchang Ye et al. | 2504.17671 | translate | read | null |
| 2025-04-23 | IberBench: LLM Evaluation on Iberian Languages | José Ángel González et al. | 2504.16921 | translate | read | link |
| 2025-04-23 | Do Large Language Models know who did what to whom? | Joseph M. Denning et al. | 2504.16884 | translate | read | null |
| 2025-04-23 | Enhancing Critical Thinking with AI: A Tailored Warning System for RAG Models | Xuyang Zhu et al. | 2504.16883 | translate | read | null |
| 2025-04-23 | Context-Enhanced Vulnerability Detection Based on Large Language Model | Yixin Yang et al. | 2504.16877 | translate | read | null |
| 2025-04-23 | Exploring How LLMs Capture and Represent Domain-Specific Knowledge | Mirian Hipolito Garcia et al. | 2504.16871 | translate | read | null |
| 2025-04-23 | Emo Pillars: Knowledge Distillation to Support Fine-Grained Context-Aware and Context-Less Emotion Classification | Alexander Shvets et al. | 2504.16856 | translate | read | link |
| 2025-04-23 | Monte Carlo Planning with Large Language Model for Text-Based Game Agents | Zijing Shi et al. | 2504.16855 | translate | read | null |
| 2025-04-23 | Improving Significant Wave Height Prediction Using Chronos Models | Yilin Zhai et al. | 2504.16834 | translate | read | null |
| 2025-04-23 | LRASGen: LLM-based RESTful API Specification Generation | Sida Deng et al. | 2504.16833 | translate | read | null |
| 2025-04-23 | GreenMind: A Next-Generation Vietnamese Large Language Model for Structured and Logical Reasoning | Luu Quy Tung et al. | 2504.16832 | translate | read | null |
| 2025-04-22 | TTRL: Test-Time Reinforcement Learning | Yuxin Zuo et al. | 2504.16084 | translate | read | link |
| 2025-04-22 | From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning | Le Zhuo et al. | 2504.16080 | translate | read | link |
| 2025-04-22 | LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities | Thomas Schmied et al. | 2504.16078 | translate | read | null |
| 2025-04-22 | PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models | Shi Qiu et al. | 2504.16074 | translate | read | link |
| 2025-04-22 | A Python Tool for Reconstructing Full News Text from GDELT | A. Fronzetti Colladon et al. | 2504.16063 | translate | read | null |
| 2025-04-22 | Vision language models are unreliable at trivial spatial cognition | Sangeet Khemlani et al. | 2504.16061 | translate | read | null |
| 2025-04-22 | Automated Static Vulnerability Detection via a Holistic Neuro-symbolic Approach | Penghui Li et al. | 2504.16057 | translate | read | null |
| 2025-04-22 | Honey, I Shrunk the Language Model: Impact of Knowledge Distillation Methods on Performance and Explainability | Daniel Hendriks et al. | 2504.16056 | translate | read | null |
| 2025-04-22 | Certified Mitigation of Worst-Case LLM Copyright Infringement | Jingyu Zhang et al. | 2504.16046 | translate | read | null |
| 2025-04-22 | LLMs meet Federated Learning for Scalable and Secure IoT Management | Yazan Otoum et al. | 2504.16032 | translate | read | null |
| 2025-04-21 | Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs | Chun-Hsiao Yeh et al. | 2504.15280 | translate | read | link |
| 2025-04-21 | VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models | Weiye Xu et al. | 2504.15279 | translate | read | link |
| 2025-04-21 | Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning | Jie Cheng et al. | 2504.15275 | translate | read | link |
| 2025-04-21 | Interpretable Locomotion Prediction in Construction Using a Memory-Driven LLM Agent With Chain-of-Thought Reasoning | Ehsan Ahmadi et al. | 2504.15263 | translate | read | null |
| 2025-04-21 | CRUST-Bench: A Comprehensive Benchmark for C-to-safe-Rust Transpilation | Anirudh Khatry et al. | 2504.15254 | translate | read | link |
| 2025-04-21 | Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators | Yilun Zhou et al. | 2504.15253 | translate | read | link |
| 2025-04-21 | MR. Guard: Multilingual Reasoning Guardrail using Curriculum Learning | Yahan Yang et al. | 2504.15241 | translate | read | null |
| 2025-04-21 | Fully Bayesian Approaches to Topics over Time | Julián Cendrero et al. | 2504.15220 | translate | read | null |
| 2025-04-21 | EvalAgent: Discovering Implicit Evaluation Criteria from the Web | Manya Wadhwa et al. | 2504.15219 | translate | read | null |
| 2025-04-21 | Integrating Symbolic Execution into the Fine-Tuning of Code-Generating LLMs | Marina Sakharova et al. | 2504.15210 | translate | read | null |
| 2025-04-18 | Generative AI Act II: Test Time Scaling Drives Cognition Engineering | Shijie Xia et al. | 2504.13828 | translate | read | link |
| 2025-04-18 | Feature Alignment and Representation Transfer in Knowledge Distillation for Large Language Models | Junjie Yang et al. | 2504.13825 | translate | read | null |
| 2025-04-18 | Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning | Yixuan Even Xu et al. | 2504.13818 | translate | read | null |
| 2025-04-18 | BadApex: Backdoor Attack Based on Adaptive Optimization Mechanism of Black-box Large Language Models | Zhengxian Wu et al. | 2504.13775 | translate | read | null |
| 2025-04-18 | DP2Unlearning: An Efficient and Guaranteed Unlearning Framework for LLMs | Tamim Al Mahmud et al. | 2504.13774 | translate | read | null |
| 2025-04-18 | Detecting Malicious Source Code in PyPI Packages with LLMs: Does RAG Come in Handy? | Motunrayo Ibiyo et al. | 2504.13769 | translate | read | null |
| 2025-04-18 | Scaling sparse feature circuit finding for in-context learning | Dmitrii Kharlapenko et al. | 2504.13756 | translate | read | null |
| 2025-04-18 | Controlled Territory and Conflict Tracking (CONTACT): (Geo-)Mapping Occupied Territory from Open Source Intelligence | Paul K. Mandal et al. | 2504.13730 | translate | read | null |
| 2025-04-18 | OpenDeception: Benchmarking and Investigating AI Deceptive Behaviors via Open-ended Interaction Simulation | Yichen Wu et al. | 2504.13707 | translate | read | null |
| 2025-04-18 | Exploring Multimodal Prompt for Visualization Authoring with Large Language Models | Zhen Wen et al. | 2504.13700 | translate | read | null |
| 2025-04-17 | SemCORE: A Semantic-Enhanced Generative Cross-Modal Retrieval Framework with MLLMs | Haoxuan Li et al. | 2504.13172 | translate | read | null |
| 2025-04-17 | Sleep-time Compute: Beyond Inference Scaling at Test-time | Kevin Lin et al. | 2504.13171 | translate | read | link |
| 2025-04-17 | Exploring Expert Failures Improves LLM Agent Tuning | Li-Cheng Lan et al. | 2504.13145 | translate | read | null |
| 2025-04-17 | Syntactic and Semantic Control of Large Language Models via Sequential Monte Carlo | João Loula et al. | 2504.13139 | translate | read | null |
| 2025-04-17 | Energy-Based Reward Models for Robust Language Model Alignment | Anamika Lochab et al. | 2504.13134 | translate | read | null |
| 2025-04-17 | LLMs Meet Finance: Fine-Tuning Foundation Models for the Open FinLLM Leaderboard | Varun Rao et al. | 2504.13125 | translate | read | null |
| 2025-04-17 | Low-hallucination Synthetic Captions for Large-Scale Vision-Language Model Pre-training | Xinsong Zhang et al. | 2504.13123 | translate | read | null |
| 2025-04-17 | VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models | Haojian Huang et al. | 2504.13122 | translate | read | link |
| 2025-04-17 | Hadamard product in deep learning: Introduction, Advances and Challenges | Grigorios G Chrysos et al. | 2504.13112 | translate | read | null |
| 2025-04-17 | Uncertainty-Aware Trajectory Prediction via Rule-Regularized Heteroscedastic Deep Classification | Kumar Manas et al. | 2504.13111 | translate | read | null |
| 2025-04-16 | BitNet b1.58 2B4T Technical Report | Shuming Ma et al. | 2504.12285 | translate | read | link |
| 2025-04-16 | HLS-Eval: A Benchmark and Framework for Evaluating LLMs on High-Level Synthesis Design Tasks | Stefan Abi-Karam et al. | 2504.12268 | translate | read | null |
| 2025-04-16 | FLIP Reasoning Challenge | Andreas Plesner et al. | 2504.12256 | translate | read | link |
| 2025-04-16 | AnomalyGen: An Automated Semantic Log Sequence Generation Framework with LLM for Anomaly Detection | Xinyu Li et al. | 2504.12250 | translate | read | null |
| 2025-04-16 | MOS: Towards Effective Smart Contract Vulnerability Detection through Mixture-of-Experts Tuning of Large Language Models | Hang Yuan et al. | 2504.12234 | translate | read | null |
| 2025-04-16 | Watermarking Needs Input Repetition Masking | David Khachaturov et al. | 2504.12229 | translate | read | null |
| 2025-04-16 | d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning | Siyan Zhao et al. | 2504.12216 | translate | read | link |
| 2025-04-16 | What Do Large Language Models Know? Tacit Knowledge as a Potential Causal-Explanatory Structure | Céline Budding et al. | 2504.12187 | translate | read | null |
| 2025-04-16 | SALAD: Improving Robustness and Generalization through Contrastive Learning with Structure-Aware and LLM-Driven Augmented Data | Suyoung Bae et al. | 2504.12185 | translate | read | null |
| 2025-04-16 | Trusting CHATGPT: how minor tweaks in the prompts lead to major differences in sentiment classification | Jaime E. Cuellar et al. | 2504.12180 | translate | read | null |
| 2025-04-15 | TextArena | Leon Guertler et al. | 2504.11442 | translate | read | link |
| 2025-04-15 | TADACap: Time-series Adaptive Domain-Aware Captioning | Elizabeth Fons et al. | 2504.11441 | translate | read | null |
| 2025-04-15 | Masculine Defaults via Gendered Discourse in Podcasts and Large Language Models | Maria Teleki et al. | 2504.11431 | translate | read | null |
| 2025-04-15 | A Dual-Space Framework for General Knowledge Distillation of Large Language Models | Xue Zhang et al. | 2504.11426 | translate | read | null |
| 2025-04-15 | Reinforcing Compositional Retrieval: Retrieving Step-by-Step for Composing Informative Contexts | Quanyu Long et al. | 2504.11420 | translate | read | null |
| 2025-04-15 | DataDecide: How to Predict Best Pretraining Data with Small Experiments | Ian Magnusson et al. | 2504.11393 | translate | read | null |
| 2025-04-15 | RankAlign: A Ranking View of the Generator-Validator Gap in Large Language Models | Juan Diego Rodriguez et al. | 2504.11381 | translate | read | null |
| 2025-04-15 | Cancer-Myth: Evaluating AI Chatbot on Patient Questions with False Presuppositions | Wang Bill Zhu et al. | 2504.11373 | translate | read | link |
| 2025-04-15 | OpenTuringBench: An Open-Model-based Benchmark and Framework for Machine-Generated Text Detection and Attribution | Lucio La Cava et al. | 2504.11369 | translate | read | null |
| 2025-04-15 | Teaching Large Language Models to Reason through Learning and Forgetting | Tianwei Ni et al. | 2504.11364 | translate | read | null |
| 2025-04-14 | InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models | Jinguo Zhu et al. | 2504.10479 | translate | read | null |
| 2025-04-14 | MIEB: Massive Image Embedding Benchmark | Chenghao Xiao et al. | 2504.10471 | translate | read | link |
| 2025-04-14 | Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding | Tao Zhang et al. | 2504.10465 | translate | read | link |
| 2025-04-14 | The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer | Weixian Lei et al. | 2504.10462 | translate | read | link |
| 2025-04-14 | GUI-R1 : A Generalist R1-Style Vision-Language Action Model For GUI Agents | Xiaobo Xia et al. | 2504.10458 | translate | read | link |
| 2025-04-14 | M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models | Junxiong Wang et al. | 2504.10449 | translate | read | link |
| 2025-04-14 | Multimodal Long Video Modeling Based on Temporal Dynamic Context | Haoran Hao et al. | 2504.10443 | translate | read | link |
| 2025-04-14 | LLM Can be a Dangerous Persuader: Empirical Study of Persuasion Safety in Large Language Models | Minqian Liu et al. | 2504.10430 | translate | read | link |
| 2025-04-14 | Can We Edit LLMs for Long-Tail Biomedical Knowledge? | Xinhao Yi et al. | 2504.10421 | translate | read | null |
| 2025-04-14 | Unchecked and Overlooked: Addressing the Checkbox Blind Spot in Large Language Models with CheckboxQA | Michał Turski et al. | 2504.10419 | translate | read | link |
| 2025-04-11 | Quantum Large Language Model Fine-Tuning | Sang Hyub Kim et al. | 2504.08732 | translate | read | null |
| 2025-04-11 | DocAgent: A Multi-Agent System for Automated Code Documentation Generation | Dayu Yang et al. | 2504.08725 | translate | read | null |
| 2025-04-11 | Hypergraph Vision Transformers: Images are More than Nodes, More than Edges | Joshua Fixelle et al. | 2504.08710 | translate | read | null |
| 2025-04-11 | SWE-PolyBench: A multi-language benchmark for repository level evaluation of coding agents | Muhammad Shihab Rashid et al. | 2504.08703 | translate | read | null |
| 2025-04-11 | Large Language Models as Span Annotators | Zdeněk Kasner et al. | 2504.08697 | translate | read | null |
| 2025-04-11 | TP-RAG: Benchmarking Retrieval-Augmented Large Language Model Agents for Spatiotemporal-Aware Travel Planning | Hang Ni et al. | 2504.08694 | translate | read | null |
| 2025-04-11 | Fast-Slow-Thinking: Complex Task Solving with Large Language Models | Yiliu Sun et al. | 2504.08690 | translate | read | null |
| 2025-04-11 | Voice Interaction With Conversational AI Could Facilitate Thoughtful Reflection and Substantive Revision in Writing | Jiho Kim et al. | 2504.08687 | translate | read | null |
| 2025-04-11 | Variability-Driven User-Story Generation using LLM and Triadic Concept Analysis | Alexandre Bazin et al. | 2504.08666 | translate | read | null |
| 2025-04-11 | Quality evaluation of Tabby coding assistant using real source code snippets | Marta Borek et al. | 2504.08650 | translate | read | null |
| 2025-04-10 | C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing | Zhongyang Li et al. | 2504.07964 | translate | read | link |
| 2025-04-10 | GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation | Lang Lin et al. | 2504.07962 | translate | read | null |
| 2025-04-10 | MM-IFEngine: Towards Multimodal Instruction Following | Shengyuan Ding et al. | 2504.07957 | translate | read | link |
| 2025-04-10 | VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning | Yukun Qi et al. | 2504.07956 | translate | read | null |
| 2025-04-10 | Beyond the Frame: Generating 360° Panoramic Videos from Perspective Videos | Rundong Luo et al. | 2504.07940 | translate | read | null |
| 2025-04-10 | Porting an LLM based Application from ChatGPT to an On-Premise Environment | Teemu Paloniemi et al. | 2504.07907 | translate | read | null |
| 2025-04-10 | Redefining Machine Translation on Social Network Services with Large Language Models | Hongcheng Guo et al. | 2504.07901 | translate | read | null |
| 2025-04-10 | How do Large Language Models Understand Relevance? A Mechanistic Interpretability Perspective | Qi Liu et al. | 2504.07898 | translate | read | null |
| 2025-04-10 | Benchmarking Adversarial Robustness to Bias Elicitation in Large Language Models: Scalable Automated Assessment with LLM-as-a-Judge | Riccardo Cantini et al. | 2504.07887 | translate | read | link |
| 2025-04-10 | Token Level Routing Inference System for Edge Devices | Jianshu She et al. | 2504.07878 | translate | read | null |
| 2025-04-09 | Sculpting Subspaces: Constrained Full Fine-Tuning in LLMs for Continual Learning | Nikhil Shivakumar Nayak et al. | 2504.07097 | translate | read | null |
| 2025-04-09 | KG-LLM-Bench: A Scalable Benchmark for Evaluating LLM Reasoning on Textualized Knowledge Graphs | Elan Markowitz et al. | 2504.07087 | translate | read | null |
| 2025-04-09 | DeduCE: Deductive Consistency as a Framework to Evaluate LLM Reasoning | Atharva Pandey et al. | 2504.07080 | translate | read | null |
| 2025-04-09 | A Survey on Personalized and Pluralistic Preference Alignment in Large Language Models | Zhouhang Xie et al. | 2504.07070 | translate | read | null |
| 2025-04-09 | HalluciNot: Hallucination Detection Through Context and Common Knowledge Verification | Bibek Paudel et al. | 2504.07069 | translate | read | null |
| 2025-04-09 | TASTE: Text-Aligned Speech Tokenization and Embedding for Spoken Language Modeling | Liang-Hsuan Tseng et al. | 2504.07053 | translate | read | null |
| 2025-04-09 | To Backtrack or Not to Backtrack: When Sequential Search Limits Model Reasoning | Tian Qin et al. | 2504.07052 | translate | read | null |
| 2025-04-09 | Evaluating Retrieval Augmented Generative Models for Document Queries in Transportation Safety | Chad Melton et al. | 2504.07022 | translate | read | null |
| 2025-04-09 | LLM-IFT: LLM-Powered Information Flow Tracking for Secure Hardware | Nowfel Mashnoor et al. | 2504.07015 | translate | read | null |
| 2025-04-09 | Towards LLMs Robustness to Changes in Prompt Format Styles | Lilian Ngweta et al. | 2504.06969 | translate | read | null |
| 2025-04-08 | GOLLuM: Gaussian Process Optimized LLMs – Reframing LLM Finetuning through Bayesian Optimization | Bojana Ranković et al. | 2504.06265 | translate | read | null |
| 2025-04-08 | Hogwild! Inference: Parallel LLM Generation via Concurrent Attention | Gleb Rodionov et al. | 2504.06261 | translate | read | null |
| 2025-04-08 | FEABench: Evaluating Language Models on Multiphysics Reasoning Ability | Nayantara Mudur et al. | 2504.06260 | translate | read | null |
| 2025-04-08 | Transfer between Modalities with MetaQueries | Xichen Pan et al. | 2504.06256 | translate | read | null |
| 2025-04-08 | LExT: Towards Evaluating Trustworthiness of Natural Language Explanations | Krithi Shailya et al. | 2504.06227 | translate | read | null |
| 2025-04-08 | Encoder-Decoder Gemma: Improving the Quality-Efficiency Trade-Off via Adaptation | Biao Zhang et al. | 2504.06225 | translate | read | null |
| 2025-04-08 | Can Performant LLMs Be Ethical? Quantifying the Impact of Web Crawling Opt-Outs | Dongyang Fan et al. | 2504.06219 | translate | read | null |
| 2025-04-08 | From 128K to 4M: Efficient Training of Ultra-Long Context Large Language Models | Chejian Xu et al. | 2504.06214 | translate | read | null |
| 2025-04-08 | TxGemma: Efficient and Agentic LLMs for Therapeutics | Eric Wang et al. | 2504.06196 | translate | read | null |
| 2025-04-08 | Assessing how hyperparameters impact Large Language Models’ sarcasm detection performance | Montgomery Gole et al. | 2504.06166 | translate | read | null |
| 2025-04-07 | URECA: Unique Region Caption Anything | Sangbeom Lim et al. | 2504.05305 | translate | read | null |
| 2025-04-07 | Truthful or Fabricated? Using Causal Attribution to Mitigate Reward Hacking in Explanations | Pedro Ferreira et al. | 2504.05294 | translate | read | null |
| 2025-04-07 | The challenge of uncertainty quantification of large language models in medicine | Zahra Atf et al. | 2504.05278 | translate | read | null |
| 2025-04-07 | Enhancing LLM-Based Short Answer Grading with Retrieval-Augmented Generation | Yucheng Chu et al. | 2504.05276 | translate | read | null |
| 2025-04-07 | Do PhD-level LLMs Truly Grasp Elementary Addition? Probing Rule Learning vs. Memorization in Large Language Models | Yang Yan et al. | 2504.05262 | translate | read | null |
| 2025-04-07 | Learning to Reason Over Time: Timeline Self-Reflection for Improved Temporal Reasoning in Language Models | Adrián Bazaga et al. | 2504.05258 | translate | read | null |
| 2025-04-07 | Explaining Low Perception Model Competency with High-Competency Counterfactuals | Sara Pohland et al. | 2504.05254 | translate | read | null |
| 2025-04-07 | LLM-based Automated Grading with Human-in-the-Loop | Hang Li et al. | 2504.05239 | translate | read | null |
| 2025-04-08 | Leveraging LLMs for Utility-Focused Annotation: Reducing Manual Effort for Retrieval and RAG | Hengran Zhang et al. | 2504.05220 | translate | read | null |
| 2025-04-07 | Unleashing the Power of LLMs in Dense Retrieval with Query Likelihood Modeling | Hengran Zhang et al. | 2504.05216 | translate | read | null |
| 2025-04-04 | Do Larger Language Models Imply Better Reasoning? A Pretraining Scaling Law for Reasoning | Xinyi Wang et al. | 2504.03635 | translate | read | null |
| 2025-04-04 | Align to Structure: Aligning Large Language Models with Structural Information | Zae Myung Kim et al. | 2504.03622 | translate | read | null |
| 2025-04-04 | VISTA-OCR: Towards generative and interactive end to end OCR models | Laziz Hamdi et al. | 2504.03621 | translate | read | null |
| 2025-04-04 | Multilingual Retrieval-Augmented Generation for Knowledge-Intensive Task | Leonardo Ranaldi et al. | 2504.03616 | translate | read | null |
| 2025-04-04 | AIR: A Systematic Analysis of Annotations, Instructions, and Response Pairs in Preference Dataset | Bingxiang He et al. | 2504.03612 | translate | read | null |
| 2025-04-04 | EnrichIndex: Using LLMs to Enrich Retrieval Indices Offline | Peter Baile Chen et al. | 2504.03598 | translate | read | null |
| 2025-04-04 | Agentic Knowledgeable Self-awareness | Shuofei Qiao et al. | 2504.03553 | translate | read | null |
| 2025-04-04 | Neutralizing the Narrative: AI-Powered Debiasing of Online News Articles | Chen Wei Kuo et al. | 2504.03520 | translate | read | null |
| 2025-04-04 | LLMSched: Uncertainty-Aware Workload Scheduling for Compound LLM Applications | Botao Zhu et al. | 2504.03444 | translate | read | null |
| 2025-04-04 | Know What You do Not Know: Verbalized Uncertainty Estimation Robustness on Corrupted Images in Vision-Language Models | Mirko Borszukovszki et al. | 2504.03440 | translate | read | null |
| 2025-04-03 | STING-BEE: Towards Vision-Language Model for Real-World X-ray Baggage Security Inspection | Divya Velayudhan et al. | 2504.02823 | translate | read | null |
| 2025-04-03 | Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models | Mateusz Pach et al. | 2504.02821 | translate | read | link |
| 2025-04-03 | Generative Evaluation of Complex Reasoning in Large Language Models | Haowei Lin et al. | 2504.02810 | translate | read | link |
| 2025-04-03 | MegaMath: Pushing the Limits of Open Math Corpora | Fan Zhou et al. | 2504.02807 | translate | read | link |
| 2025-04-04 | A Survey of Large Language Models in Mental Health Disorder Detection on Social Media | Zhuohan Ge et al. | 2504.02800 | translate | read | null |
| 2025-04-03 | A Framework for Robust Cognitive Evaluation of LLMs | Karin de Langis et al. | 2504.02789 | translate | read | null |
| 2025-04-03 | From Consumption to Collaboration: Measuring Interaction Patterns to Augment Human Cognition in Open-Ended Tasks | Joshua Holstein et al. | 2504.02780 | translate | read | null |
| 2025-04-03 | BT-ACTION: A Test-Driven Approach for Modular Understanding of User Instruction Leveraging Behaviour Trees and LLMs | Alexander Leszczynski et al. | 2504.02779 | translate | read | null |
| 2025-04-03 | How Deep Do Large Language Models Internalize Scientific Literature and Citation Practices? | Andres Algaba et al. | 2504.02767 | translate | read | null |
| 2025-04-03 | Enhancing LLM Robustness to Perturbed Instructions: An Empirical Study | Aryan Agrawal et al. | 2504.02733 | translate | read | null |
| 2025-04-02 | Towards Unified Referring Expression Segmentation Across Omni-Level Visual Target Granularities | Jing Liu et al. | 2504.01954 | translate | read | null |
| 2025-04-02 | The LLM Wears Prada: Analysing Gender Bias and Stereotypes through Online Shopping Data | Massimiliano Luca et al. | 2504.01951 | translate | read | null |
| 2025-04-02 | OpenCodeReasoning: Advancing Data Distillation for Competitive Coding | Wasi Uddin Ahmad et al. | 2504.01943 | translate | read | null |
| 2025-04-02 | Critical Thinking: Which Kinds of Complexity Govern Optimal Reasoning Length? | Celine Lee et al. | 2504.01935 | translate | read | null |
| 2025-04-02 | A thorough benchmark of automatic text classification: From traditional approaches to large language models | Washington Cunha et al. | 2504.01930 | translate | read | null |
| 2025-04-02 | Gen-C: Populating Virtual Worlds with Generative Crowds | Andreas Panayiotou et al. | 2504.01924 | translate | read | null |
| 2025-04-02 | Bridging the Linguistic Divide: A Survey on Leveraging Large Language Models for Machine Translation | Baban Gain et al. | 2504.01919 | translate | read | null |
| 2025-04-02 | Advancing AI-Scientist Understanding: Making LLM Think Like a Physicist with Interpretable Reasoning | Yinggan Xu et al. | 2504.01911 | translate | read | null |
| 2025-04-02 | GMAI-VL-R1: Harnessing Reinforcement Learning for Multimodal Medical Reasoning | Yanzhou Su et al. | 2504.01886 | translate | read | link |
| 2025-04-02 | TransientTables: Evaluating LLMs’ Reasoning on Temporally Evolving Semi-structured Tables | Abhilash Shankarampeta et al. | 2504.01879 | translate | read | null |

(<a href=../LLM.md>back to LLM</a>)