LLM - 2025-04 | Paper Arxiv Daily

Publish Date	Title	Authors	PDF (arXiv ID)	Translate	Read	Code
2025-04-30	TRUST: An LLM-Based Dialogue System for Trauma Understanding and Structured Assessments	Sichang Tu et.al.	2504.21851	translate	read	null
2025-04-30	COMPACT: COMPositional Atomic-to-Complex Visual Capability Tuning	Xindi Wu et.al.	2504.21850	translate	read	link
2025-04-30	An Empirical Study on the Effectiveness of Large Language Models for Binary Code Understanding	Xiuwei Shang et.al.	2504.21803	translate	read	null
2025-04-30	DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition	Z. Z. Ren et.al.	2504.21801	translate	read	link
2025-04-30	MAC-Tuning: LLM Multi-Compositional Problem Reasoning with Enhanced Knowledge Boundary Awareness	Junsheng Huang et.al.	2504.21773	translate	read	null
2025-04-30	LASHED: LLMs And Static Hardware Analysis for Early Detection of RTL Bugs	Baleegh Ahmad et.al.	2504.21770	translate	read	null
2025-04-30	LLM-based Interactive Imitation Learning for Robotic Manipulation	Jonas Werner et.al.	2504.21769	translate	read	null
2025-04-30	Investigating Literary Motifs in Ancient and Medieval Novels with Large Language Models	Emelie Hallenberg et.al.	2504.21742	translate	read	null
2025-04-30	TheraQuest: A Gamified, LLM-Powered Simulation for Massage Therapy Training	Shengqian Wang et.al.	2504.21735	translate	read	null
2025-04-30	XBreaking: Explainable Artificial Intelligence for Jailbreaking LLMs	Marco Arazzi et.al.	2504.21700	translate	read	null
2025-04-29	YoChameleon: Personalized Vision and Language Generation	Thao Nguyen et.al.	2504.20998	translate	read	link
2025-04-29	Toward Efficient Exploration by Large Language Model Agents	Dilip Arumugam et.al.	2504.20997	translate	read	null
2025-04-29	X-Fusion: Introducing New Modality to Frozen Large Language Models	Sicheng Mo et.al.	2504.20996	translate	read	null
2025-04-29	ACE: A Security Architecture for LLM-Integrated App Systems	Evan Li et.al.	2504.20984	translate	read	null
2025-04-29	Real-Time Wayfinding Assistant for Blind and Low-Vision Users	Dabbrata Das et.al.	2504.20976	translate	read	null
2025-04-29	SetKE: Knowledge Editing for Knowledge Elements Overlap	Yifan Wei et.al.	2504.20972	translate	read	null
2025-04-29	OSVBench: Benchmarking LLMs on Specification Generation Tasks for Operating System Verification	Shangyu Li et.al.	2504.20964	translate	read	null
2025-04-29	Information Gravity: A Field-Theoretic Model for Token Selection in Large Language Models	Maryna Vyshnyvetska et.al.	2504.20951	translate	read	null
2025-04-29	Trace-of-Thought: Enhanced Arithmetic Problem Solving via Reasoning Distillation From Large to Small Language Models	Tyler McDonald et.al.	2504.20946	translate	read	null
2025-04-29	ChestX-Reasoner: Advancing Radiology Foundation Models with Reasoning through Step-by-Step Verification	Ziqing Fan et.al.	2504.20930	translate	read	link
2025-04-28	AutoJudge: Judge Decoding Without Manual Annotation	Roman Garipov et.al.	2504.20039	translate	read	null
2025-04-28	SpatialReasoner: Towards Explicit and Generalizable 3D Spatial Reasoning	Wufei Ma et.al.	2504.20024	translate	read	null
2025-04-28	Better To Ask in English? Evaluating Factual Accuracy of Multilingual LLMs in English and Low-Resource Languages	Pritika Rohera et.al.	2504.20022	translate	read	null
2025-04-28	Modular Machine Learning: An Indispensable Path towards New-Generation Large Language Models	Xin Wang et.al.	2504.20020	translate	read	null
2025-04-28	LLM-Generated Fake News Induces Truth Decay in News Ecosystem: A Case Study on Neural News Recommendation	Beizhe Hu et.al.	2504.20013	translate	read	null
2025-04-28	Towards Automated Scoping of AI for Social Good Projects	Jacob Emmerson et.al.	2504.20010	translate	read	null
2025-04-28	Knowledge Distillation of Domain-adapted LLMs for Question-Answering in Telecom	Rishika Sen et.al.	2504.20000	translate	read	null
2025-04-28	TD-EVAL: Revisiting Task-Oriented Dialogue Evaluation by Combining Turn-Level Precision with Dialogue-Level Comparisons	Emre Can Acikgoz et.al.	2504.19982	translate	read	null
2025-04-28	Accurate and Diverse LLM Mathematical Reasoning via Automated PRM-Guided GFlowNets	Adam Younsi et.al.	2504.19981	translate	read	null
2025-04-29	From Concept to Practice: an Automated LLM-aided UVM Machine for RTL Verification	Junhao Ye et.al.	2504.19959	translate	read	null
2025-04-25	TRACE Back from the Future: A Probabilistic Reasoning Approach to Controllable Language Generation	Gwen Yidou Weng et.al.	2504.18535	translate	read	link
2025-04-25	Eval3D: Interpretable and Fine-grained Evaluation for 3D Generation	Shivam Duggal et.al.	2504.18509	translate	read	null
2025-04-25	TopSpace: spatial topic modeling for unsupervised discovery of multicellular spatial tissue structures in multiplex imaging	Junsouk Choi et.al.	2504.18495	translate	read	null
2025-04-25	Investigating Co-Constructive Behavior of Large Language Models in Explanation Dialogues	Leandra Fichtel et.al.	2504.18483	translate	read	null
2025-04-25	Generative Induction of Dialogue Task Schemas with Streaming Refinement and Simulated Interactions	James D. Finch et.al.	2504.18474	translate	read	null
2025-04-25	Reason Like a Radiologist: Chain-of-Thought and Reinforcement Learning for Verifiable Report Generation	Peiyuan Jing et.al.	2504.18453	translate	read	null
2025-04-25	LLMpatronous: Harnessing the Power of LLMs For Vulnerability Detection	Rajesh Yarra et.al.	2504.18423	translate	read	null
2025-04-25	BitNet v2: Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs	Hongyu Wang et.al.	2504.18415	translate	read	null
2025-04-25	An Empirical Study of Evaluating Long-form Question Answering	Ning Xian et.al.	2504.18413	translate	read	null
2025-04-25	Expressing stigma and inappropriate responses prevents LLMs from safely replacing mental health providers	Jared Moore et.al.	2504.18412	translate	read	link
2025-04-24	Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models	Xu Ma et.al.	2504.17789	translate	read	null
2025-04-24	Replay to Remember: Retaining Domain Knowledge in Streaming Language Models	Sneh Pillai et.al.	2504.17780	translate	read	null
2025-04-24	Conversational Assistants to support Heart Failure Patients: comparing a Neurosymbolic Architecture with ChatGPT	Anuja Tayal et.al.	2504.17753	translate	read	null
2025-04-24	Towards Robust LLMs: an Adversarial Robustness Measurement Framework	Natan Levy et.al.	2504.17723	translate	read	null
2025-04-24	Multilingual Performance Biases of Large Language Models in Education	Vansh Gupta et.al.	2504.17720	translate	read	null
2025-04-24	Ensemble Bayesian Inference: Leveraging Small Language Models to Achieve LLM-level Accuracy in Profile Matching Tasks	Haru-Tada Sato et.al.	2504.17685	translate	read	null
2025-04-24	INSIGHT: Bridging the Student-Teacher Gap in Times of Large Language Models	Jarne Thys et.al.	2504.17677	translate	read	null
2025-04-24	Energy Considerations of Large Language Model Inference and Efficiency Optimizations	Jared Fernandez et.al.	2504.17674	translate	read	null
2025-04-24	Cross-region Model Training with Communication-Computation Overlapping and Delay Compensation	Ying Zhu et.al.	2504.17672	translate	read	null
2025-04-24	Data-Driven Calibration of Prediction Sets in Large Vision-Language Models Based on Inductive Conformal Prediction	Yuanchang Ye et.al.	2504.17671	translate	read	null
2025-04-23	IberBench: LLM Evaluation on Iberian Languages	José Ángel González et.al.	2504.16921	translate	read	link
2025-04-23	Do Large Language Models know who did what to whom?	Joseph M. Denning et.al.	2504.16884	translate	read	null
2025-04-23	Enhancing Critical Thinking with AI: A Tailored Warning System for RAG Models	Xuyang Zhu et.al.	2504.16883	translate	read	null
2025-04-23	Context-Enhanced Vulnerability Detection Based on Large Language Model	Yixin Yang et.al.	2504.16877	translate	read	null
2025-04-23	Exploring How LLMs Capture and Represent Domain-Specific Knowledge	Mirian Hipolito Garcia et.al.	2504.16871	translate	read	null
2025-04-23	Emo Pillars: Knowledge Distillation to Support Fine-Grained Context-Aware and Context-Less Emotion Classification	Alexander Shvets et.al.	2504.16856	translate	read	link
2025-04-23	Monte Carlo Planning with Large Language Model for Text-Based Game Agents	Zijing Shi et.al.	2504.16855	translate	read	null
2025-04-23	Improving Significant Wave Height Prediction Using Chronos Models	Yilin Zhai et.al.	2504.16834	translate	read	null
2025-04-23	LRASGen: LLM-based RESTful API Specification Generation	Sida Deng et.al.	2504.16833	translate	read	null
2025-04-23	GreenMind: A Next-Generation Vietnamese Large Language Model for Structured and Logical Reasoning	Luu Quy Tung et.al.	2504.16832	translate	read	null
2025-04-22	TTRL: Test-Time Reinforcement Learning	Yuxin Zuo et.al.	2504.16084	translate	read	link
2025-04-22	From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning	Le Zhuo et.al.	2504.16080	translate	read	link
2025-04-22	LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities	Thomas Schmied et.al.	2504.16078	translate	read	null
2025-04-22	PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models	Shi Qiu et.al.	2504.16074	translate	read	link
2025-04-22	A Python Tool for Reconstructing Full News Text from GDELT	A. Fronzetti Colladon et.al.	2504.16063	translate	read	null
2025-04-22	Vision language models are unreliable at trivial spatial cognition	Sangeet Khemlani et.al.	2504.16061	translate	read	null
2025-04-22	Automated Static Vulnerability Detection via a Holistic Neuro-symbolic Approach	Penghui Li et.al.	2504.16057	translate	read	null
2025-04-22	Honey, I Shrunk the Language Model: Impact of Knowledge Distillation Methods on Performance and Explainability	Daniel Hendriks et.al.	2504.16056	translate	read	null
2025-04-22	Certified Mitigation of Worst-Case LLM Copyright Infringement	Jingyu Zhang et.al.	2504.16046	translate	read	null
2025-04-22	LLMs meet Federated Learning for Scalable and Secure IoT Management	Yazan Otoum et.al.	2504.16032	translate	read	null
2025-04-21	Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs	Chun-Hsiao Yeh et.al.	2504.15280	translate	read	link
2025-04-21	VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models	Weiye Xu et.al.	2504.15279	translate	read	link
2025-04-21	Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning	Jie Cheng et.al.	2504.15275	translate	read	link
2025-04-21	Interpretable Locomotion Prediction in Construction Using a Memory-Driven LLM Agent With Chain-of-Thought Reasoning	Ehsan Ahmadi et.al.	2504.15263	translate	read	null
2025-04-21	CRUST-Bench: A Comprehensive Benchmark for C-to-safe-Rust Transpilation	Anirudh Khatry et.al.	2504.15254	translate	read	link
2025-04-21	Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators	Yilun Zhou et.al.	2504.15253	translate	read	link
2025-04-21	MR. Guard: Multilingual Reasoning Guardrail using Curriculum Learning	Yahan Yang et.al.	2504.15241	translate	read	null
2025-04-21	Fully Bayesian Approaches to Topics over Time	Julián Cendrero et.al.	2504.15220	translate	read	null
2025-04-21	EvalAgent: Discovering Implicit Evaluation Criteria from the Web	Manya Wadhwa et.al.	2504.15219	translate	read	null
2025-04-21	Integrating Symbolic Execution into the Fine-Tuning of Code-Generating LLMs	Marina Sakharova et.al.	2504.15210	translate	read	null
2025-04-18	Generative AI Act II: Test Time Scaling Drives Cognition Engineering	Shijie Xia et.al.	2504.13828	translate	read	link
2025-04-18	Feature Alignment and Representation Transfer in Knowledge Distillation for Large Language Models	Junjie Yang et.al.	2504.13825	translate	read	null
2025-04-18	Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning	Yixuan Even Xu et.al.	2504.13818	translate	read	null
2025-04-18	BadApex: Backdoor Attack Based on Adaptive Optimization Mechanism of Black-box Large Language Models	Zhengxian Wu et.al.	2504.13775	translate	read	null
2025-04-18	DP2Unlearning: An Efficient and Guaranteed Unlearning Framework for LLMs	Tamim Al Mahmud et.al.	2504.13774	translate	read	null
2025-04-18	Detecting Malicious Source Code in PyPI Packages with LLMs: Does RAG Come in Handy?	Motunrayo Ibiyo et.al.	2504.13769	translate	read	null
2025-04-18	Scaling sparse feature circuit finding for in-context learning	Dmitrii Kharlapenko et.al.	2504.13756	translate	read	null
2025-04-18	Controlled Territory and Conflict Tracking (CONTACT): (Geo-)Mapping Occupied Territory from Open Source Intelligence	Paul K. Mandal et.al.	2504.13730	translate	read	null
2025-04-18	OpenDeception: Benchmarking and Investigating AI Deceptive Behaviors via Open-ended Interaction Simulation	Yichen Wu et.al.	2504.13707	translate	read	null
2025-04-18	Exploring Multimodal Prompt for Visualization Authoring with Large Language Models	Zhen Wen et.al.	2504.13700	translate	read	null
2025-04-17	SemCORE: A Semantic-Enhanced Generative Cross-Modal Retrieval Framework with MLLMs	Haoxuan Li et.al.	2504.13172	translate	read	null
2025-04-17	Sleep-time Compute: Beyond Inference Scaling at Test-time	Kevin Lin et.al.	2504.13171	translate	read	link
2025-04-17	Exploring Expert Failures Improves LLM Agent Tuning	Li-Cheng Lan et.al.	2504.13145	translate	read	null
2025-04-17	Syntactic and Semantic Control of Large Language Models via Sequential Monte Carlo	João Loula et.al.	2504.13139	translate	read	null
2025-04-17	Energy-Based Reward Models for Robust Language Model Alignment	Anamika Lochab et.al.	2504.13134	translate	read	null
2025-04-17	LLMs Meet Finance: Fine-Tuning Foundation Models for the Open FinLLM Leaderboard	Varun Rao et.al.	2504.13125	translate	read	null
2025-04-17	Low-hallucination Synthetic Captions for Large-Scale Vision-Language Model Pre-training	Xinsong Zhang et.al.	2504.13123	translate	read	null
2025-04-17	VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models	Haojian Huang et.al.	2504.13122	translate	read	link
2025-04-17	Hadamard product in deep learning: Introduction, Advances and Challenges	Grigorios G Chrysos et.al.	2504.13112	translate	read	null
2025-04-17	Uncertainty-Aware Trajectory Prediction via Rule-Regularized Heteroscedastic Deep Classification	Kumar Manas et.al.	2504.13111	translate	read	null
2025-04-16	BitNet b1.58 2B4T Technical Report	Shuming Ma et.al.	2504.12285	translate	read	link
2025-04-16	HLS-Eval: A Benchmark and Framework for Evaluating LLMs on High-Level Synthesis Design Tasks	Stefan Abi-Karam et.al.	2504.12268	translate	read	null
2025-04-16	FLIP Reasoning Challenge	Andreas Plesner et.al.	2504.12256	translate	read	link
2025-04-16	AnomalyGen: An Automated Semantic Log Sequence Generation Framework with LLM for Anomaly Detection	Xinyu Li et.al.	2504.12250	translate	read	null
2025-04-16	MOS: Towards Effective Smart Contract Vulnerability Detection through Mixture-of-Experts Tuning of Large Language Models	Hang Yuan et.al.	2504.12234	translate	read	null
2025-04-16	Watermarking Needs Input Repetition Masking	David Khachaturov et.al.	2504.12229	translate	read	null
2025-04-16	d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning	Siyan Zhao et.al.	2504.12216	translate	read	link
2025-04-16	What Do Large Language Models Know? Tacit Knowledge as a Potential Causal-Explanatory Structure	Céline Budding et.al.	2504.12187	translate	read	null
2025-04-16	SALAD: Improving Robustness and Generalization through Contrastive Learning with Structure-Aware and LLM-Driven Augmented Data	Suyoung Bae et.al.	2504.12185	translate	read	null
2025-04-16	Trusting CHATGPT: how minor tweaks in the prompts lead to major differences in sentiment classification	Jaime E. Cuellar et.al.	2504.12180	translate	read	null
2025-04-15	TextArena	Leon Guertler et.al.	2504.11442	translate	read	link
2025-04-15	TADACap: Time-series Adaptive Domain-Aware Captioning	Elizabeth Fons et.al.	2504.11441	translate	read	null
2025-04-15	Masculine Defaults via Gendered Discourse in Podcasts and Large Language Models	Maria Teleki et.al.	2504.11431	translate	read	null
2025-04-15	A Dual-Space Framework for General Knowledge Distillation of Large Language Models	Xue Zhang et.al.	2504.11426	translate	read	null
2025-04-15	Reinforcing Compositional Retrieval: Retrieving Step-by-Step for Composing Informative Contexts	Quanyu Long et.al.	2504.11420	translate	read	null
2025-04-15	DataDecide: How to Predict Best Pretraining Data with Small Experiments	Ian Magnusson et.al.	2504.11393	translate	read	null
2025-04-15	RankAlign: A Ranking View of the Generator-Validator Gap in Large Language Models	Juan Diego Rodriguez et.al.	2504.11381	translate	read	null
2025-04-15	Cancer-Myth: Evaluating AI Chatbot on Patient Questions with False Presuppositions	Wang Bill Zhu et.al.	2504.11373	translate	read	link
2025-04-15	OpenTuringBench: An Open-Model-based Benchmark and Framework for Machine-Generated Text Detection and Attribution	Lucio La Cava et.al.	2504.11369	translate	read	null
2025-04-15	Teaching Large Language Models to Reason through Learning and Forgetting	Tianwei Ni et.al.	2504.11364	translate	read	null
2025-04-14	InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models	Jinguo Zhu et.al.	2504.10479	translate	read	null
2025-04-14	MIEB: Massive Image Embedding Benchmark	Chenghao Xiao et.al.	2504.10471	translate	read	link
2025-04-14	Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding	Tao Zhang et.al.	2504.10465	translate	read	link
2025-04-14	The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer	Weixian Lei et.al.	2504.10462	translate	read	link
2025-04-14	GUI-R1 : A Generalist R1-Style Vision-Language Action Model For GUI Agents	Xiaobo Xia et.al.	2504.10458	translate	read	link
2025-04-14	M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models	Junxiong Wang et.al.	2504.10449	translate	read	link
2025-04-14	Multimodal Long Video Modeling Based on Temporal Dynamic Context	Haoran Hao et.al.	2504.10443	translate	read	link
2025-04-14	LLM Can be a Dangerous Persuader: Empirical Study of Persuasion Safety in Large Language Models	Minqian Liu et.al.	2504.10430	translate	read	link
2025-04-14	Can We Edit LLMs for Long-Tail Biomedical Knowledge?	Xinhao Yi et.al.	2504.10421	translate	read	null
2025-04-14	Unchecked and Overlooked: Addressing the Checkbox Blind Spot in Large Language Models with CheckboxQA	Michał Turski et.al.	2504.10419	translate	read	link
2025-04-11	Quantum Large Language Model Fine-Tuning	Sang Hyub Kim et.al.	2504.08732	translate	read	null
2025-04-11	DocAgent: A Multi-Agent System for Automated Code Documentation Generation	Dayu Yang et.al.	2504.08725	translate	read	null
2025-04-11	Hypergraph Vision Transformers: Images are More than Nodes, More than Edges	Joshua Fixelle et.al.	2504.08710	translate	read	null
2025-04-11	SWE-PolyBench: A multi-language benchmark for repository level evaluation of coding agents	Muhammad Shihab Rashid et.al.	2504.08703	translate	read	null
2025-04-11	Large Language Models as Span Annotators	Zdeněk Kasner et.al.	2504.08697	translate	read	null
2025-04-11	TP-RAG: Benchmarking Retrieval-Augmented Large Language Model Agents for Spatiotemporal-Aware Travel Planning	Hang Ni et.al.	2504.08694	translate	read	null
2025-04-11	Fast-Slow-Thinking: Complex Task Solving with Large Language Models	Yiliu Sun et.al.	2504.08690	translate	read	null
2025-04-11	Voice Interaction With Conversational AI Could Facilitate Thoughtful Reflection and Substantive Revision in Writing	Jiho Kim et.al.	2504.08687	translate	read	null
2025-04-11	Variability-Driven User-Story Generation using LLM and Triadic Concept Analysis	Alexandre Bazin et.al.	2504.08666	translate	read	null
2025-04-11	Quality evaluation of Tabby coding assistant using real source code snippets	Marta Borek et.al.	2504.08650	translate	read	null
2025-04-10	C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing	Zhongyang Li et.al.	2504.07964	translate	read	link
2025-04-10	GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation	Lang Lin et.al.	2504.07962	translate	read	null
2025-04-10	MM-IFEngine: Towards Multimodal Instruction Following	Shengyuan Ding et.al.	2504.07957	translate	read	link
2025-04-10	VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning	Yukun Qi et.al.	2504.07956	translate	read	null
2025-04-10	Beyond the Frame: Generating 360° Panoramic Videos from Perspective Videos	Rundong Luo et.al.	2504.07940	translate	read	null
2025-04-10	Porting an LLM based Application from ChatGPT to an On-Premise Environment	Teemu Paloniemi et.al.	2504.07907	translate	read	null
2025-04-10	Redefining Machine Translation on Social Network Services with Large Language Models	Hongcheng Guo et.al.	2504.07901	translate	read	null
2025-04-10	How do Large Language Models Understand Relevance? A Mechanistic Interpretability Perspective	Qi Liu et.al.	2504.07898	translate	read	null
2025-04-10	Benchmarking Adversarial Robustness to Bias Elicitation in Large Language Models: Scalable Automated Assessment with LLM-as-a-Judge	Riccardo Cantini et.al.	2504.07887	translate	read	link
2025-04-10	Token Level Routing Inference System for Edge Devices	Jianshu She et.al.	2504.07878	translate	read	null
2025-04-09	Sculpting Subspaces: Constrained Full Fine-Tuning in LLMs for Continual Learning	Nikhil Shivakumar Nayak et.al.	2504.07097	translate	read	null
2025-04-09	KG-LLM-Bench: A Scalable Benchmark for Evaluating LLM Reasoning on Textualized Knowledge Graphs	Elan Markowitz et.al.	2504.07087	translate	read	null
2025-04-09	DeduCE: Deductive Consistency as a Framework to Evaluate LLM Reasoning	Atharva Pandey et.al.	2504.07080	translate	read	null
2025-04-09	A Survey on Personalized and Pluralistic Preference Alignment in Large Language Models	Zhouhang Xie et.al.	2504.07070	translate	read	null
2025-04-09	HalluciNot: Hallucination Detection Through Context and Common Knowledge Verification	Bibek Paudel et.al.	2504.07069	translate	read	null
2025-04-09	TASTE: Text-Aligned Speech Tokenization and Embedding for Spoken Language Modeling	Liang-Hsuan Tseng et.al.	2504.07053	translate	read	null
2025-04-09	To Backtrack or Not to Backtrack: When Sequential Search Limits Model Reasoning	Tian Qin et.al.	2504.07052	translate	read	null
2025-04-09	Evaluating Retrieval Augmented Generative Models for Document Queries in Transportation Safety	Chad Melton et.al.	2504.07022	translate	read	null
2025-04-09	LLM-IFT: LLM-Powered Information Flow Tracking for Secure Hardware	Nowfel Mashnoor et.al.	2504.07015	translate	read	null
2025-04-09	Towards LLMs Robustness to Changes in Prompt Format Styles	Lilian Ngweta et.al.	2504.06969	translate	read	null
2025-04-08	GOLLuM: Gaussian Process Optimized LLMs – Reframing LLM Finetuning through Bayesian Optimization	Bojana Ranković et.al.	2504.06265	translate	read	null
2025-04-08	Hogwild! Inference: Parallel LLM Generation via Concurrent Attention	Gleb Rodionov et.al.	2504.06261	translate	read	null
2025-04-08	FEABench: Evaluating Language Models on Multiphysics Reasoning Ability	Nayantara Mudur et.al.	2504.06260	translate	read	null
2025-04-08	Transfer between Modalities with MetaQueries	Xichen Pan et.al.	2504.06256	translate	read	null
2025-04-08	LExT: Towards Evaluating Trustworthiness of Natural Language Explanations	Krithi Shailya et.al.	2504.06227	translate	read	null
2025-04-08	Encoder-Decoder Gemma: Improving the Quality-Efficiency Trade-Off via Adaptation	Biao Zhang et.al.	2504.06225	translate	read	null
2025-04-08	Can Performant LLMs Be Ethical? Quantifying the Impact of Web Crawling Opt-Outs	Dongyang Fan et.al.	2504.06219	translate	read	null
2025-04-08	From 128K to 4M: Efficient Training of Ultra-Long Context Large Language Models	Chejian Xu et.al.	2504.06214	translate	read	null
2025-04-08	TxGemma: Efficient and Agentic LLMs for Therapeutics	Eric Wang et.al.	2504.06196	translate	read	null
2025-04-08	Assessing how hyperparameters impact Large Language Models’ sarcasm detection performance	Montgomery Gole et.al.	2504.06166	translate	read	null
2025-04-07	URECA: Unique Region Caption Anything	Sangbeom Lim et.al.	2504.05305	translate	read	null
2025-04-07	Truthful or Fabricated? Using Causal Attribution to Mitigate Reward Hacking in Explanations	Pedro Ferreira et.al.	2504.05294	translate	read	null
2025-04-07	The challenge of uncertainty quantification of large language models in medicine	Zahra Atf et.al.	2504.05278	translate	read	null
2025-04-07	Enhancing LLM-Based Short Answer Grading with Retrieval-Augmented Generation	Yucheng Chu et.al.	2504.05276	translate	read	null
2025-04-07	Do PhD-level LLMs Truly Grasp Elementary Addition? Probing Rule Learning vs. Memorization in Large Language Models	Yang Yan et.al.	2504.05262	translate	read	null
2025-04-07	Learning to Reason Over Time: Timeline Self-Reflection for Improved Temporal Reasoning in Language Models	Adrián Bazaga et.al.	2504.05258	translate	read	null
2025-04-07	Explaining Low Perception Model Competency with High-Competency Counterfactuals	Sara Pohland et.al.	2504.05254	translate	read	null
2025-04-07	LLM-based Automated Grading with Human-in-the-Loop	Hang Li et.al.	2504.05239	translate	read	null
2025-04-08	Leveraging LLMs for Utility-Focused Annotation: Reducing Manual Effort for Retrieval and RAG	Hengran Zhang et.al.	2504.05220	translate	read	null
2025-04-07	Unleashing the Power of LLMs in Dense Retrieval with Query Likelihood Modeling	Hengran Zhang et.al.	2504.05216	translate	read	null
2025-04-04	Do Larger Language Models Imply Better Reasoning? A Pretraining Scaling Law for Reasoning	Xinyi Wang et.al.	2504.03635	translate	read	null
2025-04-04	Align to Structure: Aligning Large Language Models with Structural Information	Zae Myung Kim et.al.	2504.03622	translate	read	null
2025-04-04	VISTA-OCR: Towards generative and interactive end to end OCR models	Laziz Hamdi et.al.	2504.03621	translate	read	null
2025-04-04	Multilingual Retrieval-Augmented Generation for Knowledge-Intensive Task	Leonardo Ranaldi et.al.	2504.03616	translate	read	null
2025-04-04	AIR: A Systematic Analysis of Annotations, Instructions, and Response Pairs in Preference Dataset	Bingxiang He et.al.	2504.03612	translate	read	null
2025-04-04	EnrichIndex: Using LLMs to Enrich Retrieval Indices Offline	Peter Baile Chen et.al.	2504.03598	translate	read	null
2025-04-04	Agentic Knowledgeable Self-awareness	Shuofei Qiao et.al.	2504.03553	translate	read	null
2025-04-04	Neutralizing the Narrative: AI-Powered Debiasing of Online News Articles	Chen Wei Kuo et.al.	2504.03520	translate	read	null
2025-04-04	LLMSched: Uncertainty-Aware Workload Scheduling for Compound LLM Applications	Botao Zhu et.al.	2504.03444	translate	read	null
2025-04-04	Know What You do Not Know: Verbalized Uncertainty Estimation Robustness on Corrupted Images in Vision-Language Models	Mirko Borszukovszki et.al.	2504.03440	translate	read	null
2025-04-03	STING-BEE: Towards Vision-Language Model for Real-World X-ray Baggage Security Inspection	Divya Velayudhan et.al.	2504.02823	translate	read	null
2025-04-03	Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models	Mateusz Pach et.al.	2504.02821	translate	read	link
2025-04-03	Generative Evaluation of Complex Reasoning in Large Language Models	Haowei Lin et.al.	2504.02810	translate	read	link
2025-04-03	MegaMath: Pushing the Limits of Open Math Corpora	Fan Zhou et.al.	2504.02807	translate	read	link
2025-04-04	A Survey of Large Language Models in Mental Health Disorder Detection on Social Media	Zhuohan Ge et.al.	2504.02800	translate	read	null
2025-04-03	A Framework for Robust Cognitive Evaluation of LLMs	Karin de Langis et.al.	2504.02789	translate	read	null
2025-04-03	From Consumption to Collaboration: Measuring Interaction Patterns to Augment Human Cognition in Open-Ended Tasks	Joshua Holstein et.al.	2504.02780	translate	read	null
2025-04-03	BT-ACTION: A Test-Driven Approach for Modular Understanding of User Instruction Leveraging Behaviour Trees and LLMs	Alexander Leszczynski et.al.	2504.02779	translate	read	null
2025-04-03	How Deep Do Large Language Models Internalize Scientific Literature and Citation Practices?	Andres Algaba et.al.	2504.02767	translate	read	null
2025-04-03	Enhancing LLM Robustness to Perturbed Instructions: An Empirical Study	Aryan Agrawal et.al.	2504.02733	translate	read	null
2025-04-02	Towards Unified Referring Expression Segmentation Across Omni-Level Visual Target Granularities	Jing Liu et.al.	2504.01954	translate	read	null
2025-04-02	The LLM Wears Prada: Analysing Gender Bias and Stereotypes through Online Shopping Data	Massimiliano Luca et.al.	2504.01951	translate	read	null
2025-04-02	OpenCodeReasoning: Advancing Data Distillation for Competitive Coding	Wasi Uddin Ahmad et.al.	2504.01943	translate	read	null
2025-04-02	Critical Thinking: Which Kinds of Complexity Govern Optimal Reasoning Length?	Celine Lee et.al.	2504.01935	translate	read	null
2025-04-02	A thorough benchmark of automatic text classification: From traditional approaches to large language models	Washington Cunha et.al.	2504.01930	translate	read	null
2025-04-02	Gen-C: Populating Virtual Worlds with Generative Crowds	Andreas Panayiotou et.al.	2504.01924	translate	read	null
2025-04-02	Bridging the Linguistic Divide: A Survey on Leveraging Large Language Models for Machine Translation	Baban Gain et.al.	2504.01919	translate	read	null
2025-04-02	Advancing AI-Scientist Understanding: Making LLM Think Like a Physicist with Interpretable Reasoning	Yinggan Xu et.al.	2504.01911	translate	read	null
2025-04-02	GMAI-VL-R1: Harnessing Reinforcement Learning for Multimodal Medical Reasoning	Yanzhou Su et.al.	2504.01886	translate	read	link
2025-04-02	TransientTables: Evaluating LLMs’ Reasoning on Temporally Evolving Semi-structured Tables	Abhilash Shankarampeta et.al.	2504.01879	translate	read	null
