LLM - 2025-04
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2025-04-30 | TRUST: An LLM-Based Dialogue System for Trauma Understanding and Structured Assessments | Sichang Tu et.al. | 2504.21851 | translate | read | null |
| 2025-04-30 | COMPACT: COMPositional Atomic-to-Complex Visual Capability Tuning | Xindi Wu et.al. | 2504.21850 | translate | read | link |
| 2025-04-30 | An Empirical Study on the Effectiveness of Large Language Models for Binary Code Understanding | Xiuwei Shang et.al. | 2504.21803 | translate | read | null |
| 2025-04-30 | DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition | Z. Z. Ren et.al. | 2504.21801 | translate | read | link |
| 2025-04-30 | MAC-Tuning: LLM Multi-Compositional Problem Reasoning with Enhanced Knowledge Boundary Awareness | Junsheng Huang et.al. | 2504.21773 | translate | read | null |
| 2025-04-30 | LASHED: LLMs And Static Hardware Analysis for Early Detection of RTL Bugs | Baleegh Ahmad et.al. | 2504.21770 | translate | read | null |
| 2025-04-30 | LLM-based Interactive Imitation Learning for Robotic Manipulation | Jonas Werner et.al. | 2504.21769 | translate | read | null |
| 2025-04-30 | Investigating Literary Motifs in Ancient and Medieval Novels with Large Language Models | Emelie Hallenberg et.al. | 2504.21742 | translate | read | null |
| 2025-04-30 | TheraQuest: A Gamified, LLM-Powered Simulation for Massage Therapy Training | Shengqian Wang et.al. | 2504.21735 | translate | read | null |
| 2025-04-30 | XBreaking: Explainable Artificial Intelligence for Jailbreaking LLMs | Marco Arazzi et.al. | 2504.21700 | translate | read | null |
| 2025-04-29 | YoChameleon: Personalized Vision and Language Generation | Thao Nguyen et.al. | 2504.20998 | translate | read | link |
| 2025-04-29 | Toward Efficient Exploration by Large Language Model Agents | Dilip Arumugam et.al. | 2504.20997 | translate | read | null |
| 2025-04-29 | X-Fusion: Introducing New Modality to Frozen Large Language Models | Sicheng Mo et.al. | 2504.20996 | translate | read | null |
| 2025-04-29 | ACE: A Security Architecture for LLM-Integrated App Systems | Evan Li et.al. | 2504.20984 | translate | read | null |
| 2025-04-29 | Real-Time Wayfinding Assistant for Blind and Low-Vision Users | Dabbrata Das et.al. | 2504.20976 | translate | read | null |
| 2025-04-29 | SetKE: Knowledge Editing for Knowledge Elements Overlap | Yifan Wei et.al. | 2504.20972 | translate | read | null |
| 2025-04-29 | OSVBench: Benchmarking LLMs on Specification Generation Tasks for Operating System Verification | Shangyu Li et.al. | 2504.20964 | translate | read | null |
| 2025-04-29 | Information Gravity: A Field-Theoretic Model for Token Selection in Large Language Models | Maryna Vyshnyvetska et.al. | 2504.20951 | translate | read | null |
| 2025-04-29 | Trace-of-Thought: Enhanced Arithmetic Problem Solving via Reasoning Distillation From Large to Small Language Models | Tyler McDonald et.al. | 2504.20946 | translate | read | null |
| 2025-04-29 | ChestX-Reasoner: Advancing Radiology Foundation Models with Reasoning through Step-by-Step Verification | Ziqing Fan et.al. | 2504.20930 | translate | read | link |
| 2025-04-28 | AutoJudge: Judge Decoding Without Manual Annotation | Roman Garipov et.al. | 2504.20039 | translate | read | null |
| 2025-04-28 | SpatialReasoner: Towards Explicit and Generalizable 3D Spatial Reasoning | Wufei Ma et.al. | 2504.20024 | translate | read | null |
| 2025-04-28 | Better To Ask in English? Evaluating Factual Accuracy of Multilingual LLMs in English and Low-Resource Languages | Pritika Rohera et.al. | 2504.20022 | translate | read | null |
| 2025-04-28 | Modular Machine Learning: An Indispensable Path towards New-Generation Large Language Models | Xin Wang et.al. | 2504.20020 | translate | read | null |
| 2025-04-28 | LLM-Generated Fake News Induces Truth Decay in News Ecosystem: A Case Study on Neural News Recommendation | Beizhe Hu et.al. | 2504.20013 | translate | read | null |
| 2025-04-28 | Towards Automated Scoping of AI for Social Good Projects | Jacob Emmerson et.al. | 2504.20010 | translate | read | null |
| 2025-04-28 | Knowledge Distillation of Domain-adapted LLMs for Question-Answering in Telecom | Rishika Sen et.al. | 2504.20000 | translate | read | null |
| 2025-04-28 | TD-EVAL: Revisiting Task-Oriented Dialogue Evaluation by Combining Turn-Level Precision with Dialogue-Level Comparisons | Emre Can Acikgoz et.al. | 2504.19982 | translate | read | null |
| 2025-04-28 | Accurate and Diverse LLM Mathematical Reasoning via Automated PRM-Guided GFlowNets | Adam Younsi et.al. | 2504.19981 | translate | read | null |
| 2025-04-29 | From Concept to Practice: an Automated LLM-aided UVM Machine for RTL Verification | Junhao Ye et.al. | 2504.19959 | translate | read | null |
| 2025-04-25 | TRACE Back from the Future: A Probabilistic Reasoning Approach to Controllable Language Generation | Gwen Yidou Weng et.al. | 2504.18535 | translate | read | link |
| 2025-04-25 | Eval3D: Interpretable and Fine-grained Evaluation for 3D Generation | Shivam Duggal et.al. | 2504.18509 | translate | read | null |
| 2025-04-25 | TopSpace: spatial topic modeling for unsupervised discovery of multicellular spatial tissue structures in multiplex imaging | Junsouk Choi et.al. | 2504.18495 | translate | read | null |
| 2025-04-25 | Investigating Co-Constructive Behavior of Large Language Models in Explanation Dialogues | Leandra Fichtel et.al. | 2504.18483 | translate | read | null |
| 2025-04-25 | Generative Induction of Dialogue Task Schemas with Streaming Refinement and Simulated Interactions | James D. Finch et.al. | 2504.18474 | translate | read | null |
| 2025-04-25 | Reason Like a Radiologist: Chain-of-Thought and Reinforcement Learning for Verifiable Report Generation | Peiyuan Jing et.al. | 2504.18453 | translate | read | null |
| 2025-04-25 | LLMpatronous: Harnessing the Power of LLMs For Vulnerability Detection | Rajesh Yarra et.al. | 2504.18423 | translate | read | null |
| 2025-04-25 | BitNet v2: Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs | Hongyu Wang et.al. | 2504.18415 | translate | read | null |
| 2025-04-25 | An Empirical Study of Evaluating Long-form Question Answering | Ning Xian et.al. | 2504.18413 | translate | read | null |
| 2025-04-25 | Expressing stigma and inappropriate responses prevents LLMs from safely replacing mental health providers | Jared Moore et.al. | 2504.18412 | translate | read | link |
| 2025-04-24 | Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models | Xu Ma et.al. | 2504.17789 | translate | read | null |
| 2025-04-24 | Replay to Remember: Retaining Domain Knowledge in Streaming Language Models | Sneh Pillai et.al. | 2504.17780 | translate | read | null |
| 2025-04-24 | Conversational Assistants to support Heart Failure Patients: comparing a Neurosymbolic Architecture with ChatGPT | Anuja Tayal et.al. | 2504.17753 | translate | read | null |
| 2025-04-24 | Towards Robust LLMs: an Adversarial Robustness Measurement Framework | Natan Levy et.al. | 2504.17723 | translate | read | null |
| 2025-04-24 | Multilingual Performance Biases of Large Language Models in Education | Vansh Gupta et.al. | 2504.17720 | translate | read | null |
| 2025-04-24 | Ensemble Bayesian Inference: Leveraging Small Language Models to Achieve LLM-level Accuracy in Profile Matching Tasks | Haru-Tada Sato et.al. | 2504.17685 | translate | read | null |
| 2025-04-24 | INSIGHT: Bridging the Student-Teacher Gap in Times of Large Language Models | Jarne Thys et.al. | 2504.17677 | translate | read | null |
| 2025-04-24 | Energy Considerations of Large Language Model Inference and Efficiency Optimizations | Jared Fernandez et.al. | 2504.17674 | translate | read | null |
| 2025-04-24 | Cross-region Model Training with Communication-Computation Overlapping and Delay Compensation | Ying Zhu et.al. | 2504.17672 | translate | read | null |
| 2025-04-24 | Data-Driven Calibration of Prediction Sets in Large Vision-Language Models Based on Inductive Conformal Prediction | Yuanchang Ye et.al. | 2504.17671 | translate | read | null |
| 2025-04-23 | IberBench: LLM Evaluation on Iberian Languages | José Ángel González et.al. | 2504.16921 | translate | read | link |
| 2025-04-23 | Do Large Language Models know who did what to whom? | Joseph M. Denning et.al. | 2504.16884 | translate | read | null |
| 2025-04-23 | Enhancing Critical Thinking with AI: A Tailored Warning System for RAG Models | Xuyang Zhu et.al. | 2504.16883 | translate | read | null |
| 2025-04-23 | Context-Enhanced Vulnerability Detection Based on Large Language Model | Yixin Yang et.al. | 2504.16877 | translate | read | null |
| 2025-04-23 | Exploring How LLMs Capture and Represent Domain-Specific Knowledge | Mirian Hipolito Garcia et.al. | 2504.16871 | translate | read | null |
| 2025-04-23 | Emo Pillars: Knowledge Distillation to Support Fine-Grained Context-Aware and Context-Less Emotion Classification | Alexander Shvets et.al. | 2504.16856 | translate | read | link |
| 2025-04-23 | Monte Carlo Planning with Large Language Model for Text-Based Game Agents | Zijing Shi et.al. | 2504.16855 | translate | read | null |
| 2025-04-23 | Improving Significant Wave Height Prediction Using Chronos Models | Yilin Zhai et.al. | 2504.16834 | translate | read | null |
| 2025-04-23 | LRASGen: LLM-based RESTful API Specification Generation | Sida Deng et.al. | 2504.16833 | translate | read | null |
| 2025-04-23 | GreenMind: A Next-Generation Vietnamese Large Language Model for Structured and Logical Reasoning | Luu Quy Tung et.al. | 2504.16832 | translate | read | null |
| 2025-04-22 | TTRL: Test-Time Reinforcement Learning | Yuxin Zuo et.al. | 2504.16084 | translate | read | link |
| 2025-04-22 | From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning | Le Zhuo et.al. | 2504.16080 | translate | read | link |
| 2025-04-22 | LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities | Thomas Schmied et.al. | 2504.16078 | translate | read | null |
| 2025-04-22 | PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models | Shi Qiu et.al. | 2504.16074 | translate | read | link |
| 2025-04-22 | A Python Tool for Reconstructing Full News Text from GDELT | A. Fronzetti Colladon et.al. | 2504.16063 | translate | read | null |
| 2025-04-22 | Vision language models are unreliable at trivial spatial cognition | Sangeet Khemlani et.al. | 2504.16061 | translate | read | null |
| 2025-04-22 | Automated Static Vulnerability Detection via a Holistic Neuro-symbolic Approach | Penghui Li et.al. | 2504.16057 | translate | read | null |
| 2025-04-22 | Honey, I Shrunk the Language Model: Impact of Knowledge Distillation Methods on Performance and Explainability | Daniel Hendriks et.al. | 2504.16056 | translate | read | null |
| 2025-04-22 | Certified Mitigation of Worst-Case LLM Copyright Infringement | Jingyu Zhang et.al. | 2504.16046 | translate | read | null |
| 2025-04-22 | LLMs meet Federated Learning for Scalable and Secure IoT Management | Yazan Otoum et.al. | 2504.16032 | translate | read | null |
| 2025-04-21 | Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs | Chun-Hsiao Yeh et.al. | 2504.15280 | translate | read | link |
| 2025-04-21 | VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models | Weiye Xu et.al. | 2504.15279 | translate | read | link |
| 2025-04-21 | Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning | Jie Cheng et.al. | 2504.15275 | translate | read | link |
| 2025-04-21 | Interpretable Locomotion Prediction in Construction Using a Memory-Driven LLM Agent With Chain-of-Thought Reasoning | Ehsan Ahmadi et.al. | 2504.15263 | translate | read | null |
| 2025-04-21 | CRUST-Bench: A Comprehensive Benchmark for C-to-safe-Rust Transpilation | Anirudh Khatry et.al. | 2504.15254 | translate | read | link |
| 2025-04-21 | Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators | Yilun Zhou et.al. | 2504.15253 | translate | read | link |
| 2025-04-21 | MR. Guard: Multilingual Reasoning Guardrail using Curriculum Learning | Yahan Yang et.al. | 2504.15241 | translate | read | null |
| 2025-04-21 | Fully Bayesian Approaches to Topics over Time | Julián Cendrero et.al. | 2504.15220 | translate | read | null |
| 2025-04-21 | EvalAgent: Discovering Implicit Evaluation Criteria from the Web | Manya Wadhwa et.al. | 2504.15219 | translate | read | null |
| 2025-04-21 | Integrating Symbolic Execution into the Fine-Tuning of Code-Generating LLMs | Marina Sakharova et.al. | 2504.15210 | translate | read | null |
| 2025-04-18 | Generative AI Act II: Test Time Scaling Drives Cognition Engineering | Shijie Xia et.al. | 2504.13828 | translate | read | link |
| 2025-04-18 | Feature Alignment and Representation Transfer in Knowledge Distillation for Large Language Models | Junjie Yang et.al. | 2504.13825 | translate | read | null |
| 2025-04-18 | Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning | Yixuan Even Xu et.al. | 2504.13818 | translate | read | null |
| 2025-04-18 | BadApex: Backdoor Attack Based on Adaptive Optimization Mechanism of Black-box Large Language Models | Zhengxian Wu et.al. | 2504.13775 | translate | read | null |
| 2025-04-18 | DP2Unlearning: An Efficient and Guaranteed Unlearning Framework for LLMs | Tamim Al Mahmud et.al. | 2504.13774 | translate | read | null |
| 2025-04-18 | Detecting Malicious Source Code in PyPI Packages with LLMs: Does RAG Come in Handy? | Motunrayo Ibiyo et.al. | 2504.13769 | translate | read | null |
| 2025-04-18 | Scaling sparse feature circuit finding for in-context learning | Dmitrii Kharlapenko et.al. | 2504.13756 | translate | read | null |
| 2025-04-18 | Controlled Territory and Conflict Tracking (CONTACT): (Geo-)Mapping Occupied Territory from Open Source Intelligence | Paul K. Mandal et.al. | 2504.13730 | translate | read | null |
| 2025-04-18 | OpenDeception: Benchmarking and Investigating AI Deceptive Behaviors via Open-ended Interaction Simulation | Yichen Wu et.al. | 2504.13707 | translate | read | null |
| 2025-04-18 | Exploring Multimodal Prompt for Visualization Authoring with Large Language Models | Zhen Wen et.al. | 2504.13700 | translate | read | null |
| 2025-04-17 | SemCORE: A Semantic-Enhanced Generative Cross-Modal Retrieval Framework with MLLMs | Haoxuan Li et.al. | 2504.13172 | translate | read | null |
| 2025-04-17 | Sleep-time Compute: Beyond Inference Scaling at Test-time | Kevin Lin et.al. | 2504.13171 | translate | read | link |
| 2025-04-17 | Exploring Expert Failures Improves LLM Agent Tuning | Li-Cheng Lan et.al. | 2504.13145 | translate | read | null |
| 2025-04-17 | Syntactic and Semantic Control of Large Language Models via Sequential Monte Carlo | João Loula et.al. | 2504.13139 | translate | read | null |
| 2025-04-17 | Energy-Based Reward Models for Robust Language Model Alignment | Anamika Lochab et.al. | 2504.13134 | translate | read | null |
| 2025-04-17 | LLMs Meet Finance: Fine-Tuning Foundation Models for the Open FinLLM Leaderboard | Varun Rao et.al. | 2504.13125 | translate | read | null |
| 2025-04-17 | Low-hallucination Synthetic Captions for Large-Scale Vision-Language Model Pre-training | Xinsong Zhang et.al. | 2504.13123 | translate | read | null |
| 2025-04-17 | VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models | Haojian Huang et.al. | 2504.13122 | translate | read | link |
| 2025-04-17 | Hadamard product in deep learning: Introduction, Advances and Challenges | Grigorios G Chrysos et.al. | 2504.13112 | translate | read | null |
| 2025-04-17 | Uncertainty-Aware Trajectory Prediction via Rule-Regularized Heteroscedastic Deep Classification | Kumar Manas et.al. | 2504.13111 | translate | read | null |
| 2025-04-16 | BitNet b1.58 2B4T Technical Report | Shuming Ma et.al. | 2504.12285 | translate | read | link |
| 2025-04-16 | HLS-Eval: A Benchmark and Framework for Evaluating LLMs on High-Level Synthesis Design Tasks | Stefan Abi-Karam et.al. | 2504.12268 | translate | read | null |
| 2025-04-16 | FLIP Reasoning Challenge | Andreas Plesner et.al. | 2504.12256 | translate | read | link |
| 2025-04-16 | AnomalyGen: An Automated Semantic Log Sequence Generation Framework with LLM for Anomaly Detection | Xinyu Li et.al. | 2504.12250 | translate | read | null |
| 2025-04-16 | MOS: Towards Effective Smart Contract Vulnerability Detection through Mixture-of-Experts Tuning of Large Language Models | Hang Yuan et.al. | 2504.12234 | translate | read | null |
| 2025-04-16 | Watermarking Needs Input Repetition Masking | David Khachaturov et.al. | 2504.12229 | translate | read | null |
| 2025-04-16 | d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning | Siyan Zhao et.al. | 2504.12216 | translate | read | link |
| 2025-04-16 | What Do Large Language Models Know? Tacit Knowledge as a Potential Causal-Explanatory Structure | Céline Budding et.al. | 2504.12187 | translate | read | null |
| 2025-04-16 | SALAD: Improving Robustness and Generalization through Contrastive Learning with Structure-Aware and LLM-Driven Augmented Data | Suyoung Bae et.al. | 2504.12185 | translate | read | null |
| 2025-04-16 | Trusting CHATGPT: how minor tweaks in the prompts lead to major differences in sentiment classification | Jaime E. Cuellar et.al. | 2504.12180 | translate | read | null |
| 2025-04-15 | TextArena | Leon Guertler et.al. | 2504.11442 | translate | read | link |
| 2025-04-15 | TADACap: Time-series Adaptive Domain-Aware Captioning | Elizabeth Fons et.al. | 2504.11441 | translate | read | null |
| 2025-04-15 | Masculine Defaults via Gendered Discourse in Podcasts and Large Language Models | Maria Teleki et.al. | 2504.11431 | translate | read | null |
| 2025-04-15 | A Dual-Space Framework for General Knowledge Distillation of Large Language Models | Xue Zhang et.al. | 2504.11426 | translate | read | null |
| 2025-04-15 | Reinforcing Compositional Retrieval: Retrieving Step-by-Step for Composing Informative Contexts | Quanyu Long et.al. | 2504.11420 | translate | read | null |
| 2025-04-15 | DataDecide: How to Predict Best Pretraining Data with Small Experiments | Ian Magnusson et.al. | 2504.11393 | translate | read | null |
| 2025-04-15 | RankAlign: A Ranking View of the Generator-Validator Gap in Large Language Models | Juan Diego Rodriguez et.al. | 2504.11381 | translate | read | null |
| 2025-04-15 | Cancer-Myth: Evaluating AI Chatbot on Patient Questions with False Presuppositions | Wang Bill Zhu et.al. | 2504.11373 | translate | read | link |
| 2025-04-15 | OpenTuringBench: An Open-Model-based Benchmark and Framework for Machine-Generated Text Detection and Attribution | Lucio La Cava et.al. | 2504.11369 | translate | read | null |
| 2025-04-15 | Teaching Large Language Models to Reason through Learning and Forgetting | Tianwei Ni et.al. | 2504.11364 | translate | read | null |
| 2025-04-14 | InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models | Jinguo Zhu et.al. | 2504.10479 | translate | read | null |
| 2025-04-14 | MIEB: Massive Image Embedding Benchmark | Chenghao Xiao et.al. | 2504.10471 | translate | read | link |
| 2025-04-14 | Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding | Tao Zhang et.al. | 2504.10465 | translate | read | link |
| 2025-04-14 | The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer | Weixian Lei et.al. | 2504.10462 | translate | read | link |
| 2025-04-14 | GUI-R1 : A Generalist R1-Style Vision-Language Action Model For GUI Agents | Xiaobo Xia et.al. | 2504.10458 | translate | read | link |
| 2025-04-14 | M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models | Junxiong Wang et.al. | 2504.10449 | translate | read | link |
| 2025-04-14 | Multimodal Long Video Modeling Based on Temporal Dynamic Context | Haoran Hao et.al. | 2504.10443 | translate | read | link |
| 2025-04-14 | LLM Can be a Dangerous Persuader: Empirical Study of Persuasion Safety in Large Language Models | Minqian Liu et.al. | 2504.10430 | translate | read | link |
| 2025-04-14 | Can We Edit LLMs for Long-Tail Biomedical Knowledge? | Xinhao Yi et.al. | 2504.10421 | translate | read | null |
| 2025-04-14 | Unchecked and Overlooked: Addressing the Checkbox Blind Spot in Large Language Models with CheckboxQA | Michał Turski et.al. | 2504.10419 | translate | read | link |
| 2025-04-11 | Quantum Large Language Model Fine-Tuning | Sang Hyub Kim et.al. | 2504.08732 | translate | read | null |
| 2025-04-11 | DocAgent: A Multi-Agent System for Automated Code Documentation Generation | Dayu Yang et.al. | 2504.08725 | translate | read | null |
| 2025-04-11 | Hypergraph Vision Transformers: Images are More than Nodes, More than Edges | Joshua Fixelle et.al. | 2504.08710 | translate | read | null |
| 2025-04-11 | SWE-PolyBench: A multi-language benchmark for repository level evaluation of coding agents | Muhammad Shihab Rashid et.al. | 2504.08703 | translate | read | null |
| 2025-04-11 | Large Language Models as Span Annotators | Zdeněk Kasner et.al. | 2504.08697 | translate | read | null |
| 2025-04-11 | TP-RAG: Benchmarking Retrieval-Augmented Large Language Model Agents for Spatiotemporal-Aware Travel Planning | Hang Ni et.al. | 2504.08694 | translate | read | null |
| 2025-04-11 | Fast-Slow-Thinking: Complex Task Solving with Large Language Models | Yiliu Sun et.al. | 2504.08690 | translate | read | null |
| 2025-04-11 | Voice Interaction With Conversational AI Could Facilitate Thoughtful Reflection and Substantive Revision in Writing | Jiho Kim et.al. | 2504.08687 | translate | read | null |
| 2025-04-11 | Variability-Driven User-Story Generation using LLM and Triadic Concept Analysis | Alexandre Bazin et.al. | 2504.08666 | translate | read | null |
| 2025-04-11 | Quality evaluation of Tabby coding assistant using real source code snippets | Marta Borek et.al. | 2504.08650 | translate | read | null |
| 2025-04-10 | C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing | Zhongyang Li et.al. | 2504.07964 | translate | read | link |
| 2025-04-10 | GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation | Lang Lin et.al. | 2504.07962 | translate | read | null |
| 2025-04-10 | MM-IFEngine: Towards Multimodal Instruction Following | Shengyuan Ding et.al. | 2504.07957 | translate | read | link |
| 2025-04-10 | VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning | Yukun Qi et.al. | 2504.07956 | translate | read | null |
| 2025-04-10 | Beyond the Frame: Generating 360° Panoramic Videos from Perspective Videos | Rundong Luo et.al. | 2504.07940 | translate | read | null |
| 2025-04-10 | Porting an LLM based Application from ChatGPT to an On-Premise Environment | Teemu Paloniemi et.al. | 2504.07907 | translate | read | null |
| 2025-04-10 | Redefining Machine Translation on Social Network Services with Large Language Models | Hongcheng Guo et.al. | 2504.07901 | translate | read | null |
| 2025-04-10 | How do Large Language Models Understand Relevance? A Mechanistic Interpretability Perspective | Qi Liu et.al. | 2504.07898 | translate | read | null |
| 2025-04-10 | Benchmarking Adversarial Robustness to Bias Elicitation in Large Language Models: Scalable Automated Assessment with LLM-as-a-Judge | Riccardo Cantini et.al. | 2504.07887 | translate | read | link |
| 2025-04-10 | Token Level Routing Inference System for Edge Devices | Jianshu She et.al. | 2504.07878 | translate | read | null |
| 2025-04-09 | Sculpting Subspaces: Constrained Full Fine-Tuning in LLMs for Continual Learning | Nikhil Shivakumar Nayak et.al. | 2504.07097 | translate | read | null |
| 2025-04-09 | KG-LLM-Bench: A Scalable Benchmark for Evaluating LLM Reasoning on Textualized Knowledge Graphs | Elan Markowitz et.al. | 2504.07087 | translate | read | null |
| 2025-04-09 | DeduCE: Deductive Consistency as a Framework to Evaluate LLM Reasoning | Atharva Pandey et.al. | 2504.07080 | translate | read | null |
| 2025-04-09 | A Survey on Personalized and Pluralistic Preference Alignment in Large Language Models | Zhouhang Xie et.al. | 2504.07070 | translate | read | null |
| 2025-04-09 | HalluciNot: Hallucination Detection Through Context and Common Knowledge Verification | Bibek Paudel et.al. | 2504.07069 | translate | read | null |
| 2025-04-09 | TASTE: Text-Aligned Speech Tokenization and Embedding for Spoken Language Modeling | Liang-Hsuan Tseng et.al. | 2504.07053 | translate | read | null |
| 2025-04-09 | To Backtrack or Not to Backtrack: When Sequential Search Limits Model Reasoning | Tian Qin et.al. | 2504.07052 | translate | read | null |
| 2025-04-09 | Evaluating Retrieval Augmented Generative Models for Document Queries in Transportation Safety | Chad Melton et.al. | 2504.07022 | translate | read | null |
| 2025-04-09 | LLM-IFT: LLM-Powered Information Flow Tracking for Secure Hardware | Nowfel Mashnoor et.al. | 2504.07015 | translate | read | null |
| 2025-04-09 | Towards LLMs Robustness to Changes in Prompt Format Styles | Lilian Ngweta et.al. | 2504.06969 | translate | read | null |
| 2025-04-08 | GOLLuM: Gaussian Process Optimized LLMs – Reframing LLM Finetuning through Bayesian Optimization | Bojana Ranković et.al. | 2504.06265 | translate | read | null |
| 2025-04-08 | Hogwild! Inference: Parallel LLM Generation via Concurrent Attention | Gleb Rodionov et.al. | 2504.06261 | translate | read | null |
| 2025-04-08 | FEABench: Evaluating Language Models on Multiphysics Reasoning Ability | Nayantara Mudur et.al. | 2504.06260 | translate | read | null |
| 2025-04-08 | Transfer between Modalities with MetaQueries | Xichen Pan et.al. | 2504.06256 | translate | read | null |
| 2025-04-08 | LExT: Towards Evaluating Trustworthiness of Natural Language Explanations | Krithi Shailya et.al. | 2504.06227 | translate | read | null |
| 2025-04-08 | Encoder-Decoder Gemma: Improving the Quality-Efficiency Trade-Off via Adaptation | Biao Zhang et.al. | 2504.06225 | translate | read | null |
| 2025-04-08 | Can Performant LLMs Be Ethical? Quantifying the Impact of Web Crawling Opt-Outs | Dongyang Fan et.al. | 2504.06219 | translate | read | null |
| 2025-04-08 | From 128K to 4M: Efficient Training of Ultra-Long Context Large Language Models | Chejian Xu et.al. | 2504.06214 | translate | read | null |
| 2025-04-08 | TxGemma: Efficient and Agentic LLMs for Therapeutics | Eric Wang et.al. | 2504.06196 | translate | read | null |
| 2025-04-08 | Assessing how hyperparameters impact Large Language Models’ sarcasm detection performance | Montgomery Gole et.al. | 2504.06166 | translate | read | null |
| 2025-04-07 | URECA: Unique Region Caption Anything | Sangbeom Lim et.al. | 2504.05305 | translate | read | null |
| 2025-04-07 | Truthful or Fabricated? Using Causal Attribution to Mitigate Reward Hacking in Explanations | Pedro Ferreira et.al. | 2504.05294 | translate | read | null |
| 2025-04-07 | The challenge of uncertainty quantification of large language models in medicine | Zahra Atf et.al. | 2504.05278 | translate | read | null |
| 2025-04-07 | Enhancing LLM-Based Short Answer Grading with Retrieval-Augmented Generation | Yucheng Chu et.al. | 2504.05276 | translate | read | null |
| 2025-04-07 | Do PhD-level LLMs Truly Grasp Elementary Addition? Probing Rule Learning vs. Memorization in Large Language Models | Yang Yan et.al. | 2504.05262 | translate | read | null |
| 2025-04-07 | Learning to Reason Over Time: Timeline Self-Reflection for Improved Temporal Reasoning in Language Models | Adrián Bazaga et.al. | 2504.05258 | translate | read | null |
| 2025-04-07 | Explaining Low Perception Model Competency with High-Competency Counterfactuals | Sara Pohland et.al. | 2504.05254 | translate | read | null |
| 2025-04-07 | LLM-based Automated Grading with Human-in-the-Loop | Hang Li et.al. | 2504.05239 | translate | read | null |
| 2025-04-08 | Leveraging LLMs for Utility-Focused Annotation: Reducing Manual Effort for Retrieval and RAG | Hengran Zhang et.al. | 2504.05220 | translate | read | null |
| 2025-04-07 | Unleashing the Power of LLMs in Dense Retrieval with Query Likelihood Modeling | Hengran Zhang et.al. | 2504.05216 | translate | read | null |
| 2025-04-04 | Do Larger Language Models Imply Better Reasoning? A Pretraining Scaling Law for Reasoning | Xinyi Wang et.al. | 2504.03635 | translate | read | null |
| 2025-04-04 | Align to Structure: Aligning Large Language Models with Structural Information | Zae Myung Kim et.al. | 2504.03622 | translate | read | null |
| 2025-04-04 | VISTA-OCR: Towards generative and interactive end to end OCR models | Laziz Hamdi et.al. | 2504.03621 | translate | read | null |
| 2025-04-04 | Multilingual Retrieval-Augmented Generation for Knowledge-Intensive Task | Leonardo Ranaldi et.al. | 2504.03616 | translate | read | null |
| 2025-04-04 | AIR: A Systematic Analysis of Annotations, Instructions, and Response Pairs in Preference Dataset | Bingxiang He et.al. | 2504.03612 | translate | read | null |
| 2025-04-04 | EnrichIndex: Using LLMs to Enrich Retrieval Indices Offline | Peter Baile Chen et.al. | 2504.03598 | translate | read | null |
| 2025-04-04 | Agentic Knowledgeable Self-awareness | Shuofei Qiao et.al. | 2504.03553 | translate | read | null |
| 2025-04-04 | Neutralizing the Narrative: AI-Powered Debiasing of Online News Articles | Chen Wei Kuo et.al. | 2504.03520 | translate | read | null |
| 2025-04-04 | LLMSched: Uncertainty-Aware Workload Scheduling for Compound LLM Applications | Botao Zhu et.al. | 2504.03444 | translate | read | null |
| 2025-04-04 | Know What You do Not Know: Verbalized Uncertainty Estimation Robustness on Corrupted Images in Vision-Language Models | Mirko Borszukovszki et.al. | 2504.03440 | translate | read | null |
| 2025-04-03 | STING-BEE: Towards Vision-Language Model for Real-World X-ray Baggage Security Inspection | Divya Velayudhan et.al. | 2504.02823 | translate | read | null |
| 2025-04-03 | Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models | Mateusz Pach et.al. | 2504.02821 | translate | read | link |
| 2025-04-03 | Generative Evaluation of Complex Reasoning in Large Language Models | Haowei Lin et.al. | 2504.02810 | translate | read | link |
| 2025-04-03 | MegaMath: Pushing the Limits of Open Math Corpora | Fan Zhou et.al. | 2504.02807 | translate | read | link |
| 2025-04-04 | A Survey of Large Language Models in Mental Health Disorder Detection on Social Media | Zhuohan Ge et.al. | 2504.02800 | translate | read | null |
| 2025-04-03 | A Framework for Robust Cognitive Evaluation of LLMs | Karin de Langis et.al. | 2504.02789 | translate | read | null |
| 2025-04-03 | From Consumption to Collaboration: Measuring Interaction Patterns to Augment Human Cognition in Open-Ended Tasks | Joshua Holstein et.al. | 2504.02780 | translate | read | null |
| 2025-04-03 | BT-ACTION: A Test-Driven Approach for Modular Understanding of User Instruction Leveraging Behaviour Trees and LLMs | Alexander Leszczynski et.al. | 2504.02779 | translate | read | null |
| 2025-04-03 | How Deep Do Large Language Models Internalize Scientific Literature and Citation Practices? | Andres Algaba et.al. | 2504.02767 | translate | read | null |
| 2025-04-03 | Enhancing LLM Robustness to Perturbed Instructions: An Empirical Study | Aryan Agrawal et.al. | 2504.02733 | translate | read | null |
| 2025-04-02 | Towards Unified Referring Expression Segmentation Across Omni-Level Visual Target Granularities | Jing Liu et.al. | 2504.01954 | translate | read | null |
| 2025-04-02 | The LLM Wears Prada: Analysing Gender Bias and Stereotypes through Online Shopping Data | Massimiliano Luca et.al. | 2504.01951 | translate | read | null |
| 2025-04-02 | OpenCodeReasoning: Advancing Data Distillation for Competitive Coding | Wasi Uddin Ahmad et.al. | 2504.01943 | translate | read | null |
| 2025-04-02 | Critical Thinking: Which Kinds of Complexity Govern Optimal Reasoning Length? | Celine Lee et.al. | 2504.01935 | translate | read | null |
| 2025-04-02 | A thorough benchmark of automatic text classification: From traditional approaches to large language models | Washington Cunha et.al. | 2504.01930 | translate | read | null |
| 2025-04-02 | Gen-C: Populating Virtual Worlds with Generative Crowds | Andreas Panayiotou et.al. | 2504.01924 | translate | read | null |
| 2025-04-02 | Bridging the Linguistic Divide: A Survey on Leveraging Large Language Models for Machine Translation | Baban Gain et.al. | 2504.01919 | translate | read | null |
| 2025-04-02 | Advancing AI-Scientist Understanding: Making LLM Think Like a Physicist with Interpretable Reasoning | Yinggan Xu et.al. | 2504.01911 | translate | read | null |
| 2025-04-02 | GMAI-VL-R1: Harnessing Reinforcement Learning for Multimodal Medical Reasoning | Yanzhou Su et.al. | 2504.01886 | translate | read | link |
| 2025-04-02 | TransientTables: Evaluating LLMs’ Reasoning on Temporally Evolving Semi-structured Tables | Abhilash Shankarampeta et.al. | 2504.01879 | translate | read | null |