LLM - 2025-07

| Publish Date | Title | Authors | PDF | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2025-07-23 | BetterCheck: Towards Safeguarding VLMs for Automotive Perception Systems | Malsha Ashani Mahawatta Dona et al. | 2507.17722 | translate | read | null |
| 2025-07-23 | AI Telephone Surveying: Automating Quantitative Data Collection with an AI Interviewer | Danny D. Leybzon et al. | 2507.17718 | translate | read | null |
| 2025-07-23 | HydraOpt: Navigating the Efficiency-Performance Trade-off of Adapter Merging | Taha Ceritli et al. | 2507.17706 | translate | read | null |
| 2025-07-23 | Towards Greater Leverage: Scaling Laws for Efficient Mixture-of-Experts Language Models | Changxin Tian et al. | 2507.17702 | translate | read | null |
| 2025-07-23 | Thinking Isn’t an Illusion: Overcoming the Limitations of Reasoning Models via Tool Augmentations | Zhao Song et al. | 2507.17699 | translate | read | null |
| 2025-07-23 | Symbiotic Agents: A Novel Paradigm for Trustworthy AGI-driven Networks | Ilias Chatzistefanidis et al. | 2507.17695 | translate | read | null |
| 2025-07-23 | Simulating multiple human perspectives in socio-ecological systems using large language models | Yongchao Zeng et al. | 2507.17680 | translate | read | null |
| 2025-07-23 | See the Forest and the Trees: A Synergistic Reasoning Framework for Knowledge-Based Visual Question Answering | Junjie Wang et al. | 2507.17659 | translate | read | null |
| 2025-07-23 | Who Attacks, and Why? Using LLMs to Identify Negative Campaigning in 18M Tweets across 19 Countries | Victor Hartman et al. | 2507.17636 | translate | read | null |
| 2025-07-23 | A Hybrid Early-Exit Algorithm for Large Language Models Based on Space Alignment Decoding (SPADE) | Bowen Zheng et al. | 2507.17618 | translate | read | null |
| 2025-07-22 | LingBench++: A Linguistically-Informed Benchmark and Reasoning Framework for Multi-Step and Cross-Cultural Inference with LLMs | Da-Chen Lian et al. | 2507.16809 | translate | read | null |
| 2025-07-22 | Rethinking LLM-Based RTL Code Optimization Via Timing Logic Metamorphosis | Zhihao Xu et al. | 2507.16808 | translate | read | null |
| 2025-07-22 | Agentar-Fin-R1: Enhancing Financial Intelligence through Domain Expertise, Training Efficiency, and Advanced Reasoning | Yanjun Zheng et al. | 2507.16802 | translate | read | link |
| 2025-07-23 | Test-Time-Matching: Decouple Personality, Memory, and Linguistic Style in LLM-based Role-Playing Language Agent | Xiaoyu Zhan et al. | 2507.16799 | translate | read | null |
| 2025-07-22 | Steering Out-of-Distribution Generalization with Concept Ablation Fine-Tuning | Helena Casademunt et al. | 2507.16795 | translate | read | link |
| 2025-07-22 | ChatChecker: A Framework for Dialogue System Testing and Evaluation Through Non-cooperative User Simulation | Roman Mayr et al. | 2507.16792 | translate | read | null |
| 2025-07-22 | Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning | Hongyin Luo et al. | 2507.16784 | translate | read | link |
| 2025-07-22 | Cooling Matters: Benchmarking Large Language Models and Vision-Language Models on Liquid-Cooled Versus Air-Cooled H100 GPU Systems | Imran Latif et al. | 2507.16781 | translate | read | null |
| 2025-07-22 | When LLMs Copy to Think: Uncovering Copy-Guided Attacks in Reasoning LLMs | Yue Li et al. | 2507.16773 | translate | read | null |
| 2025-07-22 | WGRAMMAR: Leverage Prior Knowledge to Accelerate Structured Decoding | Ran Wang et al. | 2507.16768 | translate | read | null |
| 2025-07-21 | Diffusion Beats Autoregressive in Data-Constrained Settings | Mihir Prabhudesai et al. | 2507.15857 | translate | read | link |
| 2025-07-21 | Gemini 2.5 Pro Capable of Winning Gold at IMO 2025 | Yichen Huang et al. | 2507.15855 | translate | read | null |
| 2025-07-21 | The Other Mind: How Language Models Exhibit Human Temporal Cognition | Lingyu Li et al. | 2507.15851 | translate | read | link |
| 2025-07-21 | 3LM: Bridging Arabic, STEM, and Code through Benchmarking | Basma El Amel Boussaha et al. | 2507.15850 | translate | read | null |
| 2025-07-21 | The Impact of Language Mixing on Bilingual LLM Reasoning | Yihao Li et al. | 2507.15849 | translate | read | null |
| 2025-07-21 | FASTGEN: Fast and Cost-Effective Synthetic Tabular Data Generation with LLMs | Anh Nguyen et al. | 2507.15839 | translate | read | null |
| 2025-07-21 | Just Ask for Music (JAM): Multimodal and Personalized Natural Language Music Recommendation | Alessandro B. Melchiorre et al. | 2507.15826 | translate | read | null |
| 2025-07-21 | ACS: An interactive framework for conformal selection | Yu Gui et al. | 2507.15825 | translate | read | null |
| 2025-07-21 | Do AI models help produce verified bug fixes? | Li Huang et al. | 2507.15822 | translate | read | null |
| 2025-07-21 | LLM Economist: Large Population Models and Mechanism Design in Multi-Agent Generative Simulacra | Seth Karten et al. | 2507.15815 | translate | read | link |
| 2025-07-18 | CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning | Xiaoya Li et al. | 2507.14111 | translate | read | link |
| 2025-07-18 | Automated Interpretation of Non-Destructive Evaluation Contour Maps Using Large Language Models for Bridge Condition Assessment | Viraj Nishesh Darji et al. | 2507.14107 | translate | read | null |
| 2025-07-18 | Generative AI-Driven High-Fidelity Human Motion Simulation | Hari Iyer et al. | 2507.14097 | translate | read | null |
| 2025-07-18 | Lessons from the TREC Plain Language Adaptation of Biomedical Abstracts (PLABA) track | Brian Ondov et al. | 2507.14096 | translate | read | null |
| 2025-07-18 | DPMT: Dual Process Multi-scale Theory of Mind Framework for Real-time Human-AI Collaboration | Xiyun Li et al. | 2507.14088 | translate | read | null |
| 2025-07-18 | The Emotion-Memory Link: Do Memorability Annotations Matter for Intelligent Systems? | Maria Tsfasman et al. | 2507.14084 | translate | read | null |
| 2025-07-18 | DENSE: Longitudinal Progress Note Generation with Temporal Modeling of Heterogeneous Clinical Notes Across Hospital Visits | Garapati Keerthana et al. | 2507.14079 | translate | read | null |
| 2025-07-18 | Evaluating the Effectiveness of Cost-Efficient Large Language Models in Benchmark Biomedical Tasks | Israt Jahan et al. | 2507.14045 | translate | read | null |
| 2025-07-18 | Architecting Human-AI Cocreation for Technical Services – Interaction Modes and Contingency Factors | Jochen Wulf et al. | 2507.14034 | translate | read | null |
| 2025-07-18 | KROMA: Ontology Matching with Knowledge Retrieval and Large Language Models | Lam Nguyen et al. | 2507.14032 | translate | read | null |
| 2025-07-17 | VideoITG: Multimodal Video Understanding with Instructed Temporal Grounding | Shihao Wang et al. | 2507.13353 | translate | read | null |
| 2025-07-17 | Comparing Apples to Oranges: A Dataset & Analysis of LLM Humour Understanding from Traditional Puns to Topical Jokes | Tyler Loakman et al. | 2507.13335 | translate | read | null |
| 2025-07-17 | A Survey of Context Engineering for Large Language Models | Lingrui Mei et al. | 2507.13334 | translate | read | link |
| 2025-07-17 | The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner | Zhouqi Hua et al. | 2507.13332 | translate | read | null |
| 2025-07-17 | GeoReg: Weight-Constrained Few-Shot Regression for Socio-Economic Estimation using LLM | Kyeongjin Ahn et al. | 2507.13323 | translate | read | null |
| 2025-07-17 | Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark | Junsu Kim et al. | 2507.13314 | translate | read | null |
| 2025-07-17 | The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations | Carlos Arriaga et al. | 2507.13302 | translate | read | null |
| 2025-07-17 | AbGen: Evaluating Large Language Models in Ablation Study Design and Evaluation for Scientific Research | Yilun Zhao et al. | 2507.13300 | translate | read | link |
| 2025-07-17 | Overview of the TalentCLEF 2025: Skill and Job Title Intelligence for Human Capital Management | Luis Gasco et al. | 2507.13275 | translate | read | null |
| 2025-07-17 | Automating Steering for Safe Multimodal Large Language Models | Lyucheng Wu et al. | 2507.13255 | translate | read | null |
| 2025-07-16 | Mitigating Object Hallucinations via Sentence-Level Early Intervention | Shangpin Peng et al. | 2507.12455 | translate | read | link |
| 2025-07-16 | S2WTM: Spherical Sliced-Wasserstein Autoencoder for Topic Modeling | Suman Adhya et al. | 2507.12451 | translate | read | null |
| 2025-07-16 | Describe Anything Model for Visual Question Answering on Text-rich Images | Yen-Linh Vu et al. | 2507.12441 | translate | read | link |
| 2025-07-16 | Can We Predict Alignment Before Models Finish Thinking? Towards Monitoring Misaligned Reasoning Models | Yik Siu Chan et al. | 2507.12428 | translate | read | null |
| 2025-07-16 | Advancing Retrieval-Augmented Generation for Structured Enterprise and Internal Data | Chandana Cheerla et al. | 2507.12425 | translate | read | link |
| 2025-07-16 | QuRe: Query-Relevant Retrieval through Hard Negative Sampling in Composed Image Retrieval | Jaehyun Kwak et al. | 2507.12416 | translate | read | null |
| 2025-07-16 | SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories? | Xinyi He et al. | 2507.12415 | translate | read | link |
| 2025-07-16 | Assessing the Value of Visual Input: A Benchmark of Multimodal Large Language Models for Robotic Path Planning | Jacinto Colan et al. | 2507.12391 | translate | read | null |
| 2025-07-16 | Web-Browsing LLMs Can Access Social Media Profiles and Infer User Demographics | Meysam Alizadeh et al. | 2507.12372 | translate | read | null |
| 2025-07-16 | Beyond Single Models: Enhancing LLM Detection of Ambiguity in Requests through Debate | Ana Davila et al. | 2507.12370 | translate | read | null |
| 2025-07-15 | Streaming 4D Visual Geometry Transformer | Dong Zhuo et al. | 2507.11539 | translate | read | link |
| 2025-07-15 | DrafterBench: Benchmarking Large Language Models for Tasks Automation in Civil Engineering | Yinsheng Li et al. | 2507.11527 | translate | read | link |
| 2025-07-15 | LLM-based ambiguity detection in natural language instructions for collaborative surgical robots | Ana Davila et al. | 2507.11525 | translate | read | null |
| 2025-07-15 | AirLLM: Diffusion Policy-based Adaptive LoRA for Remote Fine-Tuning of LLM over the Air | Shiyi Yang et al. | 2507.11515 | translate | read | null |
| 2025-07-15 | LRMR: LLM-Driven Relational Multi-node Ranking for Lymph Node Metastasis Assessment in Rectal Cancer | Yaoxian Dong et al. | 2507.11457 | translate | read | null |
| 2025-07-15 | Reasoning Strategies in Large Language Models: Can They Follow, Prefer, and Optimize? | Yanjian Zhang et al. | 2507.11423 | translate | read | null |
| 2025-07-15 | Quantifying the Energy Consumption and Carbon Emissions of LLM Inference via Simulations | Miray Özcan et al. | 2507.11417 | translate | read | null |
| 2025-07-15 | Seq vs Seq: An Open Suite of Paired Encoders and Decoders | Orion Weller et al. | 2507.11412 | translate | read | link |
| 2025-07-15 | KisMATH: Do LLMs Have Knowledge of Implicit Structures in Mathematical Reasoning? | Soumadeep Saha et al. | 2507.11408 | translate | read | null |
| 2025-07-15 | EXAONE 4.0: Unified Large Language Models Integrating Non-reasoning and Reasoning Modes | LG AI Research et al. | 2507.11407 | translate | read | null |
| 2025-07-14 | Fusing LLM Capabilities with Routing Data | Tao Feng et al. | 2507.10540 | translate | read | null |
| 2025-07-14 | CodeJudgeBench: Benchmarking LLM-as-a-Judge for Coding Tasks | Hongchao Jiang et al. | 2507.10535 | translate | read | null |
| 2025-07-14 | Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination | Mingqi Wu et al. | 2507.10532 | translate | read | link |
| 2025-07-14 | Chat with AI: The Surprising Turn of Real-time Video Communication from Human to AI | Jiangkai Wu et al. | 2507.10510 | translate | read | link |
| 2025-07-14 | Scene-Aware Conversational ADAS with Generative AI for Real-Time Driver Assistance | Kyungtae Han et al. | 2507.10500 | translate | read | null |
| 2025-07-14 | Can You Detect the Difference? | İsmail Tarım et al. | 2507.10475 | translate | read | null |
| 2025-07-14 | GT-Loc: Unifying When and Where in Images Through a Joint Embedding Space | David G. Shatwell et al. | 2507.10473 | translate | read | null |
| 2025-07-14 | MLAR: Multi-layer Large Language Model-based Robotic Process Automation Applicant Tracking | Mohamed T. Younes et al. | 2507.10472 | translate | read | null |
| 2025-07-14 | An Empirical Evaluation of AI-Powered Non-Player Characters’ Perceived Realism and Performance in Virtual Reality Environments | Mikko Korkiakoski et al. | 2507.10469 | translate | read | null |
| 2025-07-14 | Logic layer Prompt Control Injection (LPCI): A Novel Security Vulnerability Class in Agentic Systems | Hammad Atta et al. | 2507.10457 | translate | read | null |
| 2025-07-11 | Lumos-1: On Autoregressive Video Generation from a Unified Model Perspective | Hangjie Yuan et al. | 2507.08801 | translate | read | link |
| 2025-07-11 | One Token to Fool LLM-as-a-Judge | Yulai Zhao et al. | 2507.08794 | translate | read | null |
| 2025-07-11 | BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity | Chenyang Song et al. | 2507.08771 | translate | read | link |
| 2025-07-11 | Multilingual Multimodal Software Developer for Code Generation | Linzheng Chai et al. | 2507.08719 | translate | read | null |
| 2025-07-11 | KG-Attention: Knowledge Graph-Guided Attention at Test-Time via Bidirectional Information Aggregation | Songlin Zhai et al. | 2507.08704 | translate | read | null |
| 2025-07-11 | ByDeWay: Boost Your multimodal LLM with DEpth prompting in a Training-Free Way | Rajarshi Roy et al. | 2507.08679 | translate | read | null |
| 2025-07-11 | LLMCup: Ranking-Enhanced Comment Updating with LLMs | Hua Ge et al. | 2507.08671 | translate | read | null |
| 2025-07-11 | KELPS: A Framework for Verified Multi-Language Autoformalization via Semantic-Syntactic Alignment | Jiyao Zhang et al. | 2507.08665 | translate | read | null |
| 2025-07-11 | Introspection of Thought Helps AI Agents | Haoran Sun et al. | 2507.08664 | translate | read | null |
| 2025-07-11 | Leanabell-Prover-V2: Verifier-integrated Reasoning for Formal Theorem Proving via Reinforcement Learning | Xingguang Ji et al. | 2507.08649 | translate | read | link |
| 2025-07-10 | Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology | Haochen Wang et al. | 2507.07999 | translate | read | link |
| 2025-07-10 | Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs | Ziyue Li et al. | 2507.07996 | translate | read | null |
| 2025-07-10 | Multigranular Evaluation for Brain Visual Decoding | Weihao Xia et al. | 2507.07993 | translate | read | null |
| 2025-07-10 | Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs | Jeongseok Hyun et al. | 2507.07990 | translate | read | link |
| 2025-07-10 | Automating Expert-Level Medical Reasoning Evaluation of Large Language Models | Shuang Zhou et al. | 2507.07988 | translate | read | null |
| 2025-07-10 | OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding | JingLi Lin et al. | 2507.07984 | translate | read | link |
| 2025-07-10 | Performance and Practical Considerations of Large and Small Language Models in Clinical Decision Support in Rheumatology | Sabine Felde et al. | 2507.07983 | translate | read | null |
| 2025-07-10 | Defending Against Prompt Injection With a Few DefensiveTokens | Sizhe Chen et al. | 2507.07974 | translate | read | null |
| 2025-07-10 | Can Large Language Models Improve Phishing Defense? A Large-Scale Controlled Experiment on Warning Dialogue Explanations | Federico Maria Cau et al. | 2507.07916 | translate | read | null |
| 2025-07-10 | DTECT: Dynamic Topic Explorer & Context Tracker | Suman Adhya et al. | 2507.07910 | translate | read | null |
| 2025-07-09 | Towards Multimodal Understanding via Stable Diffusion as a Task-Aware Feature Extractor | Vatsal Agarwal et al. | 2507.07106 | translate | read | null |
| 2025-07-09 | Vision-Language-Vision Auto-Encoder: Scalable Knowledge Distillation from Diffusion Models | Tiezheng Zhang et al. | 2507.07104 | translate | read | link |
| 2025-07-09 | Evaluating Attribute Confusion in Fashion Text-to-Image Generation | Ziyue Liu et al. | 2507.07079 | translate | read | null |
| 2025-07-09 | 5C Prompt Contracts: A Minimalist, Creative-Friendly, Token-Efficient Design Framework for Individual and SME LLM Usage | Ugur Ari et al. | 2507.07045 | translate | read | null |
| 2025-07-09 | UniConv: Unifying Retrieval and Response Generation for Large Language Models in Conversations | Fengran Mo et al. | 2507.07030 | translate | read | null |
| 2025-07-09 | First Return, Entropy-Eliciting Explore | Tianyu Zheng et al. | 2507.07017 | translate | read | null |
| 2025-07-09 | GNN-ViTCap: GNN-Enhanced Multiple Instance Learning with Vision Transformers for Whole Slide Image Classification and Captioning | S M Taslim Uddin Raju et al. | 2507.07006 | translate | read | null |
| 2025-07-09 | Learning Deliberately, Acting Intuitively: Unlocking Test-Time Reasoning in Multimodal LLMs | Yahan Yu et al. | 2507.06999 | translate | read | null |
| 2025-07-09 | MCA-RG: Enhancing LLMs with Medical Concept Alignment for Radiology Report Generation | Qilong Xing et al. | 2507.06992 | translate | read | null |
| 2025-07-09 | Are They All Good? Evaluating the Quality of CoTs in LLM-based Code Generation | Binquan Zhang et al. | 2507.06980 | translate | read | null |
| 2025-07-08 | Efficiency-Effectiveness Reranking FLOPs for LLM-based Rerankers | Zhiyuan Peng et al. | 2507.06223 | translate | read | null |
| 2025-07-08 | A Survey on Latent Reasoning | Rui-Jie Zhu et al. | 2507.06203 | translate | read | null |
| 2025-07-08 | UQLM: A Python Package for Uncertainty Quantification in Large Language Models | Dylan Bouchard et al. | 2507.06196 | translate | read | null |
| 2025-07-08 | SQLBarber: A System Leveraging Large Language Models to Generate Customized and Realistic SQL Workloads | Jiale Lao et al. | 2507.06192 | translate | read | null |
| 2025-07-08 | Hidden Prompts in Manuscripts Exploit AI-Assisted Peer Review | Zhicheng Lin et al. | 2507.06185 | translate | read | null |
| 2025-07-08 | Enhancing Scientific Visual Question Answering through Multimodal Reasoning and Ensemble Modeling | Prahitha Movva et al. | 2507.06183 | translate | read | null |
| 2025-07-08 | Data-Semantics-Aware Recommendation of Diverse Pivot Tables | Whanhee Cho et al. | 2507.06171 | translate | read | null |
| 2025-07-09 | Skywork-R1V3 Technical Report | Wei Shen et al. | 2507.06167 | translate | read | null |
| 2025-07-08 | Evaluation of Habitat Robotics using Large Language Models | William Li et al. | 2507.06157 | translate | read | null |
| 2025-07-08 | Large Language Models Predict Human Well-being – But Not Equally Everywhere | Pat Pataranutaporn et al. | 2507.06141 | translate | read | null |
| 2025-07-07 | Beyond Simple Edits: X-Planner for Complex Instruction-Based Image Editing | Chun-Hsiao Yeh et al. | 2507.05259 | translate | read | null |
| 2025-07-07 | Spatio-Temporal LLM: Reasoning about Environments and Actions | Haozhen Zheng et al. | 2507.05258 | translate | read | null |
| 2025-07-07 | Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions | Yuanzhe Hu et al. | 2507.05257 | translate | read | null |
| 2025-07-07 | Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning | Yana Wei et al. | 2507.05255 | translate | read | null |
| 2025-07-07 | Response Attack: Exploiting Contextual Priming to Jailbreak Large Language Models | Ziqi Miao et al. | 2507.05248 | translate | read | null |
| 2025-07-07 | StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling | Meng Wei et al. | 2507.05240 | translate | read | null |
| 2025-07-07 | All in One: Visual-Description-Guided Unified Point Cloud Segmentation | Zongyan Han et al. | 2507.05211 | translate | read | null |
| 2025-07-07 | CREW-WILDFIRE: Benchmarking Agentic Multi-Agent Collaborations at Scale | Jonathan Hyun et al. | 2507.05178 | translate | read | null |
| 2025-07-07 | OpenS2S: Advancing Open-Source End-to-End Empathetic Large Speech Language Model | Chen Wang et al. | 2507.05177 | translate | read | null |
| 2025-07-07 | AI Generated Text Detection Using Instruction Fine-tuned Large Language and Transformer-Based Models | Chinnappa Guggilla et al. | 2507.05157 | translate | read | null |
| 2025-07-03 | Bootstrapping Grounded Chain-of-Thought in Multimodal LLMs for Data-Efficient Model Adaptation | Jiaer Xia et al. | 2507.02859 | translate | read | null |
| 2025-07-03 | Requirements Elicitation Follow-Up Question Generation | Yuchen Shen et al. | 2507.02858 | translate | read | null |
| 2025-07-03 | MOTIF: Modular Thinking via Reinforcement Fine-tuning in LLMs | Purbesh Mitra et al. | 2507.02851 | translate | read | null |
| 2025-07-03 | Visual Contextual Attack: Jailbreaking MLLMs with Image-Driven Context Injection | Ziqi Miao et al. | 2507.02844 | translate | read | null |
| 2025-07-03 | LLM-Driven Treatment Effect Estimation Under Inference Time Text Confounding | Yuchen Ma et al. | 2507.02843 | translate | read | null |
| 2025-07-03 | StepHint: Multi-level Stepwise Hints Enhance Reinforcement Learning to Reason | Kaiyi Zhang et al. | 2507.02841 | translate | read | null |
| 2025-07-03 | ExPO: Unlocking Hard Reasoning with Self-Explanation-Guided Reinforcement Learning | Ruiyang Zhou et al. | 2507.02834 | translate | read | null |
| 2025-07-03 | SynapseRoute: An Auto-Route Switching Framework on Dual-State Large Language Model | Wencheng Zhang et al. | 2507.02822 | translate | read | null |
| 2025-07-03 | Multimodal Mathematical Reasoning with Diverse Solving Perspective | Wenhao Shi et al. | 2507.02804 | translate | read | null |
| 2025-07-03 | Is Reasoning All You Need? Probing Bias in the Age of Reasoning Language Models | Riccardo Cantini et al. | 2507.02799 | translate | read | null |
| 2025-07-02 | Kwai Keye-VL Technical Report | Kwai Keye Team et al. | 2507.01949 | translate | read | null |
| 2025-07-02 | SpecCLIP: Aligning and Translating Spectroscopic Measurements for Stars | Xiaosheng Zhao et al. | 2507.01939 | translate | read | null |
| 2025-07-02 | The Thin Line Between Comprehension and Persuasion in LLMs | Adrian de Wynter et al. | 2507.01936 | translate | read | null |
| 2025-07-02 | Large Language Model-Driven Closed-Loop UAV Operation with Semantic Observations | Wenhao Wang et al. | 2507.01930 | translate | read | null |
| 2025-07-03 | Decision-Oriented Text Evaluation | Yu-Shiang Huang et al. | 2507.01923 | translate | read | null |
| 2025-07-02 | Gradient-Adaptive Policy Optimization: Towards Multi-Objective Alignment of Large Language Models | Chengao Li et al. | 2507.01915 | translate | read | null |
| 2025-07-02 | Reasoning to Edit: Hypothetical Instruction-Based Image Editing with Visual Reasoning | Qingdong He et al. | 2507.01908 | translate | read | null |
| 2025-07-02 | AI4Research: A Survey of Artificial Intelligence for Scientific Research | Qiguang Chen et al. | 2507.01903 | translate | read | null |
| 2025-07-02 | High-Layer Attention Pruning with Rescaling | Songtao Liu et al. | 2507.01900 | translate | read | null |
| 2025-07-02 | MiCoTA: Bridging the Learnability Gap with Intermediate CoT and Teacher Assistants | Dongyi Ding et al. | 2507.01887 | translate | read | null |
| 2025-07-01 | Teaching Time Series to See and Speak: Forecasting with Aligned Visual and Textual Perspectives | Sixun Dong et al. | 2506.24124 | translate | read | null |
