LLM

Publish Date Title Authors PDF Code
2025-12-18 AdaTooler-V: Adaptive Tool-Use for Images and Videos Chaoyang Wang et.al. 2512.16918 null
2025-12-18 Generative Adversarial Reasoner: Enhancing LLM Reasoning with Adversarial Reinforcement Learning Qihao Liu et.al. 2512.16917 null
2025-12-18 Exploration v.s. Exploitation: Rethinking RLVR through Clipping, Entropy, and Spurious Reward Peter Chen et.al. 2512.16912 null
2025-12-18 Impacts of Racial Bias in Historical Training Data for News AI Rahul Bhargava et.al. 2512.16901 null
2025-12-18 Multimodal RewardBench 2: Evaluating Omni Reward Models for Interleaved Text and Image Yushi Hu et.al. 2512.16899 null
2025-12-18 LinkedOut: Linking World Knowledge Representation Out of Video LLM for Next-Generation Video Recommendation Haichao Zhang et.al. 2512.16891 null
2025-12-18 AdaSearch: Balancing Parametric Knowledge and Search in Large Language Models via Reinforcement Learning Tzu-Han Lin et.al. 2512.16883 null
2025-12-18 TOGGLE: Temporal Logic-Guided Large Language Model Compression for Edge Khurram Khalil et.al. 2512.16855 null
2025-12-18 Meta-RL Induces Exploration in Language Agents Yulun Jiang et.al. 2512.16848 null
2025-12-18 Toward Systematic Counterfactual Fairness Evaluation of Large Language Models: The CAFFE Framework Alessandra Parziale et.al. 2512.16816 null
2025-12-18 From Facts to Conclusions: Integrating Deductive Reasoning in Retrieval-Augmented LLMs Shubham Mishra et.al. 2512.16795 null
2025-12-18 Inside Out: Uncovering How Comment Internalization Steers LLMs for Better or Worse Aaron Imani et.al. 2512.16790 null
2025-12-18 Vision-Language-Action Models for Autonomous Driving: Past, Present, and Future Tianshuai Hu et.al. 2512.16760 null
2025-12-18 Plausibility as Failure: How LLMs and Humans Co-Construct Epistemic Error Claudia Vale Oliveira et.al. 2512.16750 null
2025-12-18 AI-Driven Prediction of Cancer Pain Episodes: A Hybrid Decision Support Approach Yipeng Zhuang et.al. 2512.16739 null
2025-12-18 Cyber Humanism in Education: Reclaiming Agency through AI and Learning Sciences Giovanni Adorni et.al. 2512.16701 null
2025-12-18 Do Multi-Agents Solve Better Than Single? Evaluating Agentic Frameworks for Diagram-Grounded Geometry Problem Solving and Reasoning Mahbub E Sobhani et.al. 2512.16698 null
2025-12-18 DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI Hao Liang et.al. 2512.16676 null
2025-12-18 Microsoft Academic Graph Information Retrieval for Research Recommendation and Assistance Jacob Reiss et.al. 2512.16661 null
2025-12-18 Prefix Probing: Lightweight Harmful Content Detection for Large Language Models Jirui Yang et.al. 2512.16650 null
2025-12-18 JustRL: Scaling a 1.5B LLM with a Simple RL Recipe Bingxiang He et.al. 2512.16649 null
2025-12-18 Stackelberg Learning from Human Feedback: Preference Optimization as a Sequential Game Barna Pásztor et.al. 2512.16626 null
2025-12-18 Refusal Steering: Fine-grained Control over LLM Refusal Behaviour for Sensitive Topics Iker García-Ferrero et.al. 2512.16602 null
2025-12-18 Muon is Provably Faster with Momentum Variance Reduction Xun Qian et.al. 2512.16598 null
2025-12-18 Sketch-in-Latents: Eliciting Unified Reasoning in MLLMs Jintao Tong et.al. 2512.16584 null
2025-12-18 Non-Asymptotic Global Convergence of PPO-Clip Yin Liu et.al. 2512.16565 null
2025-12-18 Needle in the Web: A Benchmark for Retrieving Targeted Web Pages in the Wild Yumeng Wang et.al. 2512.16553 null
2025-12-18 A Systematic Study of Code Obfuscation Against LLM-based Vulnerability Detection Xiao Li et.al. 2512.16538 null
2025-12-18 From Personalization to Prejudice: Bias and Discrimination in Memory-Enhanced AI Agents for Recruitment Himanshu Gharat et.al. 2512.16532 null
2025-12-18 Scaling Laws for Energy Efficiency of Local LLMs Ander Alvarez et.al. 2512.16531 null
2025-12-18 Plain language adaptations of biomedical text using LLMs: Comparison of evaluation metrics Primoz Kocbek et.al. 2512.16530 null
2025-12-18 Efficient CPU-GPU Collaborative Inference for MoE-based LLMs on Memory-Limited Systems En-Ming Huang et.al. 2512.16473 null
2025-12-18 cuPilot: A Strategy-Coordinated Multi-agent Framework for CUDA Kernel Evolution Jinwu Chen et.al. 2512.16465 null
2025-12-18 TimeSeries2Report prompting enables adaptive large language model management of lithium-ion batteries Jiayang Yang et.al. 2512.16453 null
2025-12-18 Towards AI-Supported Research: a Vision of the TIB AIssistant Sören Auer et.al. 2512.16447 null
2025-12-18 Topic Modelling Black Box Optimization Roman Akramov et.al. 2512.16445 null
2025-12-18 TIB AIssistant: a Platform for AI-Supported Research Across Research Life Cycles Allard Oelen et.al. 2512.16442 null
2025-12-18 From Essence to Defense: Adaptive Semantic-aware Watermarking for Embedding-as-a-Service Copyright Protection Hao Li et.al. 2512.16439 null
2025-12-18 Introducing ORKG ASK: an AI-driven Scholarly Literature Search and Exploration System Taking a Neuro-Symbolic Approach Allard Oelen et.al. 2512.16425 null
2025-12-18 Synthelite: Chemist-aligned and feasibility-aware synthesis planning with LLMs Nguyen Xuan-Vu et.al. 2512.16424 null
2025-12-18 Large Language Models as a (Bad) Security Norm in the Context of Regulation and Compliance Kaspar Rosager Ludvigsen et.al. 2512.16419 null
2025-12-18 BrepLLM: Native Boundary Representation Understanding with Large Language Models Liyuan Deng et.al. 2512.16413 null
2025-12-18 A Network Arena for Benchmarking AI Agents on Network Troubleshooting Zhihao Wang et.al. 2512.16381 null
2025-12-18 Hearing to Translate: The Effectiveness of Speech Modality Integration into LLMs Sara Papi et.al. 2512.16378 null
2025-12-18 Factorized Video Generation: Decoupling Scene Construction and Temporal Synthesis in Text-to-Video Diffusion Models Mariam Hassan et.al. 2512.16371 null
2025-12-18 AI Needs Physics More Than Physics Needs AI Peter Coveney et.al. 2512.16344 null
2025-12-18 Design and Evaluation of Cost-Aware PoQ for Decentralized LLM Inference Arther Tian et.al. 2512.16317 null
2025-12-18 Agent Tools Orchestration Leaks More: Dataset, Benchmark, and Mitigation Yuxuan Qiao et.al. 2512.16310 null
2025-12-18 PixelArena: A benchmark for Pixel-Precision Visual Intelligence Feng Liang et.al. 2512.16303 null
2025-12-18 Code-in-the-Loop Forensics: Agentic Tool Use for Image Forgery Detection Fanrui Zhang et.al. 2512.16300 null
2025-12-18 Feature-Selective Representation Misdirection for Machine Unlearning Taozhao Chen et.al. 2512.16297 null
2025-12-18 MACL: Multi-Label Adaptive Contrastive Learning Loss for Remote Sensing Image Retrieval Amna Amir et.al. 2512.16294 null
2025-12-18 Ein Typenrad auf der Überholspur: Die Kult-Schreibmaschine “Erika” trifft KI Karola Köpferl et.al. 2512.16293 null
2025-12-18 In-Context Probing for Membership Inference in Fine-Tuned Language Models Zhexi Lu et.al. 2512.16292 null
2025-12-18 Evaluating OpenAI GPT Models for Translation of Endangered Uralic Languages: A Comparison of Reasoning and Non-Reasoning Architectures Yehor Tereshchenko et.al. 2512.16287 null
2025-12-18 CKA-Guided Modular Quantization: Beyond Bit-Width to Algorithmic Diversity Jinhao Zhang et.al. 2512.16282 null
2025-12-18 Love, Lies, and Language Models: Investigating AI’s Role in Romance-Baiting Scams Gilad Gressel et.al. 2512.16280 null
2025-12-18 QuadSentinel: Sequent Safety for Machine-Checkable Control in Multi-agent Systems Yiliu Yang et.al. 2512.16279 null
2025-12-18 Fast Collaborative Inference via Distributed Speculative Decoding Ce Zheng et.al. 2512.16273 null
2025-12-18 Beyond Blind Spots: Analytic Hints for Mitigating LLM-Based Evaluation Pitfalls Ora Nova Fandina et.al. 2512.16272 null
2025-12-18 Learning to Wait: Synchronizing Agents with the Physical World Yifei She et.al. 2512.16262 null
2025-12-18 AMUSE: Audio-Visual Benchmark and Alignment Framework for Agentic Multi-Speaker Understanding Sanjoy Chowdhury et.al. 2512.16250 null
2025-12-18 AlignMerge - Alignment-Preserving Large Language Model Merging via Fisher-Guided Geometric Constraints Aniruddha Roy et.al. 2512.16245 null
2025-12-18 Coarse-to-Fine Open-Set Graph Node Classification with Large Language Models Xueqi Ma et.al. 2512.16244 null
2025-12-18 Trustworthy and Controllable Professional Knowledge Utilization in Large Language Models with TEE-GPU Execution Yifeng Cai et.al. 2512.16238 null
2025-12-18 The Evolution of Reranking Models in Information Retrieval: From Heuristic Methods to Large Language Models Tejul Pandit et.al. 2512.16236 null
2025-12-18 LoPA: Scaling dLLM Inference via Lookahead Parallel Decoding Chenkai Xu et.al. 2512.16229 null
2025-12-18 An Information-Theoretic Framework for Robust Large Language Model Editing Qizhou Chen et.al. 2512.16227 null
2025-12-18 DualGuard: Dual-stream Large Language Model Watermarking Defense against Paraphrase and Spoofing Attack Hao Li et.al. 2512.16182 null
2025-12-18 Ev-Trust: A Strategy Equilibrium Trust Mechanism for Evolutionary Games in LLM-Based Multi-Agent Services Shiduo Yang et.al. 2512.16167 null
2025-12-18 Staggered Batch Scheduling: Co-optimizing Time-to-First-Token and Throughput for High-Efficiency LLM Inference Jian Tian et.al. 2512.16134 null
2025-12-18 Scaling Text2SQL via LLM-efficient Schema Filtering with Functional Dependency Graph Rerankers Thanh Dat Hoang et.al. 2512.16083 null
2025-12-18 Auto-Vocabulary 3D Object Detection Haomeng Zhang et.al. 2512.16077 null
2025-12-18 LLM4Perf: Large Language Models Are Effective Samplers for Multi-Objective Performance Modeling (Copy) Xin Wang et.al. 2512.16070 null
2025-12-18 A Multi-Agent Large Language Model Framework for Automated Qualitative Analysis Qidi Xu et.al. 2512.16063 null
2025-12-18 ContextLeak: Auditing Leakage in Private In-Context Learning Methods Jacob Choi et.al. 2512.16059 null
2025-12-18 MultiPath Transfer Engine: Breaking GPU and Host-Memory Bandwidth Bottlenecks in LLM Services Lingfeng Tang et.al. 2512.16056 null
2025-12-17 Topic Discovery and Classification for Responsible Generative AI Adaptation in Higher Education Diane Myung-kyung Woodbridge et.al. 2512.16036 null
2025-12-17 Do Large Language Models Know What They Don’t Know? Kalshibench: A New Benchmark for Evaluating Epistemic Calibration via Prediction Markets Lukas Nel et.al. 2512.16030 null
2025-12-17 Cross-Language Bias Examination in Large Language Models Yuxuan Liang et.al. 2512.16029 null
2025-12-17 Conversational Time Series Foundation Models: Towards Explainable and Effective Forecasting Defu Cao et.al. 2512.16022 null
2025-12-17 Few-Shot Inference of Human Perceptions of Robot Performance in Social Navigation Scenarios Qiping Zhang et.al. 2512.16019 null
2025-12-17 OLAF: Towards Robust LLM-Based Annotation Framework in Empirical Software Engineering Mia Mohammad Imran et.al. 2512.15979 null
2025-12-17 Dynamic Rank Reinforcement Learning for Adaptive Low-Rank Multi-Head Self Attention in Large Language Models Caner Erden et.al. 2512.15973 null
2025-12-17 BRAID: Bounded Reasoning for Autonomous Inference and Decisions Armağan Amcalar et.al. 2512.15959 null
2025-12-17 The Perceptual Observatory Characterizing Robustness and Grounding in MLLMs Tejas Anvekar et.al. 2512.15949 null
2025-12-17 Privacy Discourse and Emotional Dynamics in Mental Health Information Interaction on Reddit Jai Kruthunz Naveen Kumar et.al. 2512.15945 null
2025-12-17 Small Language Models for Efficient Agentic Tool Calling: Outperforming Large Models with Targeted Fine-tuning Polaris Jhandi et.al. 2512.15943 null
2025-12-17 City Navigation in the Wild: Exploring Emergent Navigation from Web-Scale Knowledge in MLLMs Dwip Dalal et.al. 2512.15933 null
2025-12-17 DSO: Direct Steering Optimization for Bias Mitigation Lucas Monteiro Paes et.al. 2512.15926 null
2025-12-17 Leveraging Spreading Activation for Improved Document Retrieval in Knowledge-Graph-Based RAG Systems Jovan Pavlović et.al. 2512.15922 null
2025-12-17 TabReX: Tabular Referenceless eXplainable Evaluation Tejas Anvekar et.al. 2512.15907 null
2025-12-17 Darth Vecdor: An Open-Source System for Generating Knowledge Graphs Through Large Language Model Queries Jonathan A. Handler et.al. 2512.15906 null
2025-12-17 PediatricAnxietyBench: Evaluating Large Language Model Safety Under Parental Anxiety and Pressure in Pediatric Consultations Vahideh Zolfaghari et.al. 2512.15894 null
2025-12-17 VET Your Agent: Towards Host-Independent Autonomy via Verifiable Execution Traces Artem Grigor et.al. 2512.15892 null
2025-12-17 Seeing Beyond Words: Self-Supervised Visual Learning for Multimodal Large Language Models Davide Caffagni et.al. 2512.15885 null
2025-12-17 HEPTAPOD: Orchestrating High Energy Physics Workflows Towards Autonomous Agency Tony Menzo et.al. 2512.15867 null
2025-12-17 Dynamic Rebatching for Efficient Early-Exit Inference with DREX Xuting Liu et.al. 2512.15705 null
2025-12-17 Skyra: AI-Generated Video Detection via Grounded Artifact Reasoning Yifei Li et.al. 2512.15693 null
2025-12-17 Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning Zhenwen Liang et.al. 2512.15687 null
2025-12-17 Activation Oracles: Training and Evaluating LLMs as General-Purpose Activation Explainers Adam Karvonen et.al. 2512.15674 null
2025-12-17 Explaining the Reasoning of Large Language Models Using Attribution Graphs Chase Walker et.al. 2512.15663 null
2025-12-17 Stepwise Think-Critique: A Unified Framework for Robust and Interpretable LLM Reasoning Jiaqi Xu et.al. 2512.15662 null
2025-12-17 How Much is Too Much? Exploring LoRA Rank Trade-offs for Retaining Knowledge and Domain Robustness Darshita Rathore et.al. 2512.15634 null
2025-12-17 Evaluating Metrics for Safety with LLM-as-Judges Kester Clegg et.al. 2512.15617 null
2025-12-17 Behavior Tokens Speak Louder: Disentangled Explainable Recommendation with Behavior Vocabulary Xinshun Feng et.al. 2512.15614 null
2025-12-17 Autoregressive Language Models are Secretly Energy-Based Models: Insights into the Lookahead Capabilities of Next-Token Prediction Mathieu Blondel et.al. 2512.15605 null
2025-12-17 Evaluating Large Language Models in Scientific Discovery Zhangde Song et.al. 2512.15567 null
2025-12-17 GRAN-TED: Generating Robust, Aligned, and Nuanced Text Embedding for Diffusion Models Bozhou Li et.al. 2512.15560 null
2025-12-17 CTkvr: KV Cache Retrieval for Long-Context LLMs via Centroid then Token Indexing Kuan Lu et.al. 2512.15550 null
2025-12-17 When a Nation Speaks: Machine Learning and NLP in People’s Sentiment Analysis During Bangladesh’s 2024 Mass Uprising Md. Samiul Alim et.al. 2512.15547 null
2025-12-17 An Efficient and Effective Encoder Model for Vision and Language Tasks in the Remote Sensing Domain João Daniel Silva et.al. 2512.15531 null
2025-12-17 EmoCaliber: Advancing Reliable Visual Emotion Comprehension via Confidence Verbalization and Calibration Daiqing Wu et.al. 2512.15528 null
2025-12-17 How Do Semantically Equivalent Code Transformations Impact Membership Inference on LLMs for Code? Hua Yang et.al. 2512.15468 null
2025-12-17 On Assessing the Relevance of Code Reviews Authored by Generative Models Robert Heumüller et.al. 2512.15466 null
2025-12-17 Toward expert-level motivational interviewing for health behavior improvement with LLMs Run-ze Hu et.al. 2512.15446 null
2025-12-17 Step-GUI Technical Report Haolong Yan et.al. 2512.15431 null
2025-12-17 Can AI Generate more Comprehensive Test Scenarios? Review on Automated Driving Systems Test Scenario Generation Methods Ji Zhou et.al. 2512.15422 null
2025-12-17 Bilateral Spatial Reasoning about Street Networks: Graph-based RAG with Qualitative Spatial Representations Reinhard Moratz et.al. 2512.15388 null
2025-12-17 MedNuggetizer: Confidence-Based Information Nugget Extraction from Medical Documents Gregor Donabauer et.al. 2512.15384 null
2025-12-17 SCOPE: Prompt Evolution for Enhancing Agent Effectiveness Zehua Pei et.al. 2512.15374 null
2025-12-17 ArcBERT: An LLM-based Search Engine for Exploring Integrated Multi-Omics Metadata Gajendra Doniparthi et.al. 2512.15365 null
2025-12-17 Revisiting Task-Oriented Dataset Search in the Era of Large Language Models: Challenges, Benchmark, and Solution Zixin Wei et.al. 2512.15363 null
2025-12-17 Dual-Density Inference for Efficient Language Model Reasoning Zhengyi Zhao et.al. 2512.15358 null
2025-12-17 Adversarial versification in portuguese as a jailbreak operator in LLMs Joao Queiroz et.al. 2512.15353 null
2025-12-17 Exploring User Acceptance and Concerns toward LLM-powered Conversational Agents in Immersive Extended Reality Efe Bozkir et.al. 2512.15343 null
2025-12-17 Evaluating LLMs for Zeolite Synthesis Event Extraction (ZSEE): A Systematic Analysis of Prompting Strategies Charan Prakash Rathore et.al. 2512.15312 null
2025-12-17 SynthSeg-Agents: Multi-Agent Synthetic Data Generation for Zero-Shot Weakly Supervised Semantic Segmentation Wangyu Wu et.al. 2512.15310 null
2025-12-17 Towards Proactive Personalization through Profile Customization for Individual Users in Dialogues Xiaotian Zhang et.al. 2512.15302 null
2025-12-17 ChatGPT and Gemini participated in the Korean College Scholastic Ability Test – Earth Science I Seok-Hyun Ga et.al. 2512.15298 null
2025-12-17 Heterogeneous Model Alignment in Digital Twin Faima Abbasi et.al. 2512.15281 null
2025-12-17 Bounty Hunter: Autonomous, Comprehensive Emulation of Multi-Faceted Adversaries Louis Hackländer-Jansen et.al. 2512.15275 null
2025-12-17 Well Begun, Half Done: Reinforcement Learning with Prefix Optimization for LLM Reasoning Yiliu Sun et.al. 2512.15274 null
2025-12-17 Gaming the Arena: AI Model Evaluation and the Viral Capture of Attention Sam Hind et.al. 2512.15252 null
2025-12-17 The Moralization Corpus: Frame-Based Annotation and Analysis of Moralizing Speech Acts across Diverse Text Genres Maria Becker et.al. 2512.15248 null
2025-12-17 Null-LoRA: Low-Rank Adaptation on Null Space Yi Zhang et.al. 2512.15233 null
2025-12-17 CangLing-KnowFlow: A Unified Knowledge-and-Flow-fused Agent for Comprehensive Remote Sensing Applications Zhengchao Chen et.al. 2512.15231 null
2025-12-17 Yes-MT’s Submission to the Low-Resource Indic Language Translation Shared Task in WMT 2024 Yash Bhaskar et.al. 2512.15226 null
2025-12-17 RFKG-CoT: Relation-Driven Adaptive Hop-count Selection and Few-Shot Path Guidance for Knowledge-Aware QA Chao Zhang et.al. 2512.15219 null
2025-12-17 DEER: Draft with Diffusion, Verify with Autoregressive Models Zicong Cheng et.al. 2512.15176 null
2025-12-17 MCP-SafetyBench: A Benchmark for Safety Evaluation of Large Language Models with Real-World MCP Servers Xuanjun Zong et.al. 2512.15163 null
2025-12-17 Offline Multi-Task Multi-Objective Data-Driven Evolutionary Algorithm with Language Surrogate Model and Implicit Q-Learning Xian-Rong Zhang et.al. 2512.15149 null
2025-12-17 Aligning Academia with Industry: An Empirical Study of Industrial Needs and Academic Capabilities in AI-Driven Software Engineering Hang Yu et.al. 2512.15148 null
2025-12-17 Beyond Majority Voting: Towards Fine-grained and More Reliable Reward Signal for Test-Time Reinforcement Learning Weiqin Wang et.al. 2512.15146 null
2025-12-17 “I am here for you”: How relational conversational AI appeals to adolescents, especially those who are socially and emotionally vulnerable Pilyoung Kim et.al. 2512.15117 null
2025-12-17 Uni-Parser Technical Report Xi Fang et.al. 2512.15098 null
2025-12-17 Beyond Fast and Slow: Cognitive-Inspired Elastic Reasoning for Large Language Models Jinwu Hu et.al. 2512.15089 null
2025-12-17 The Semantic Architect: How FEAML Bridges Structured Data and LLMs for Multi-Label Tasks Wanfu Gao et.al. 2512.15082 null
2025-12-17 Quantifying Return on Security Controls in LLM Systems Richard Helder Moulton et.al. 2512.15081 null
2025-12-17 An Exploratory Study of Bayesian Prompt Optimization for Test-Driven Code Generation with Large Language Models Shlok Tomar et.al. 2512.15076 null
2025-12-17 The Meta-Prompting Protocol: Orchestrating LLMs via Adversarial Feedback Loops Fanzhe Fu et.al. 2512.15053 null
2025-12-17 SGM: Safety Glasses for Multimodal Large Language Models via Neuron-Level Detoxification Hongbo Wang et.al. 2512.15052 null
2025-12-17 Beyond Accuracy: A Geometric Stability Analysis of Large Language Models in Chess Evaluation Xidan Song et.al. 2512.15033 null
2025-12-17 Toxicity Ahead: Forecasting Conversational Derailment on GitHub Mia Mohammad Imran et.al. 2512.15031 null
2025-12-17 SeBERTis: A Framework for Producing Classifiers of Security-Related Issue Reports Sogol Masoumzadeh et.al. 2512.15003 null
2025-12-17 DreamPRM-Code: Function-as-Step Process Reward Model with Label Correction for LLM Coding Ruiyi Zhang et.al. 2512.15000 null
2025-12-17 Evaluating Large Language Models on Multimodal Chemistry Olympiad Exams Yiming Cui et.al. 2512.14989 null
2025-12-16 EVICPRESS: Joint KV-Cache Compression and Eviction for Efficient LLM Serving Shaoting Feng et.al. 2512.14946 null
2025-12-16 Parameter Efficient Multimodal Instruction Tuning for Romanian Vision Language Models George-Andrei Dima et.al. 2512.14926 null
2025-12-16 Multiscale Aggregated Hierarchical Attention (MAHA): A Game Theoretic and Optimization Driven Approach to Efficient Contextual Modeling in Large Language Models Caner Erden et.al. 2512.14925 null
2025-12-16 Evaluating Code Reasoning Abilities of Large Language Models Under Real-World Settings Changshu Liu et.al. 2512.14917 null
2025-12-16 DrugRAG: Enhancing Pharmacy LLM Performance Through A Novel Retrieval-Augmented Generation Pipeline Houman Kazemzadeh et.al. 2512.14896 null
2025-12-16 Integrating Large Language Models and Knowledge Graphs to Capture Political Viewpoints in News Media Massimiliano Fadda et.al. 2512.14887 null
2025-12-16 Entropy-Reservoir Bregman Projection: An Information-Geometric Unification of Model Collapse Jingwei Chen et.al. 2512.14879 null
2025-12-16 Isolated Sign Language Recognition with Segmentation and Pose Estimation Daniel Perkins et.al. 2512.14876 null
2025-12-16 HERBench: A Benchmark for Multi-Evidence Integration in Video Question Answering Dan Ben-Ami et.al. 2512.14870 null
2025-12-16 MALCDF: A Distributed Multi-Agent LLM Framework for Real-Time Cyber Arth Bhardwaj et.al. 2512.14846 null
2025-12-16 Sharing State Between Prompts and Programs Ellie Y. Cheng et.al. 2512.14805 null
2025-12-16 Incentives or Ontology? A Structural Rebuttal to OpenAI’s Hallucination Thesis Richard Ackermann et.al. 2512.14801 null
2025-12-16 IaC Generation with LLMs: An Error Taxonomy and A Study on Configuration Knowledge Injection Roman Nekrasov et.al. 2512.14792 null
2025-12-16 TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs Jun Zhang et.al. 2512.14698 null
2025-12-16 Fast and Accurate Causal Parallel Decoding using Jacobi Forcing Lanxiang Hu et.al. 2512.14681 null
2025-12-16 EVOLVE-VLA: Test-Time Training from Environment Feedback for Vision-Language-Action Models Zechen Bai et.al. 2512.14666 null
2025-12-16 Enhancing Visual Sentiment Analysis via Semiotic Isotopy-Guided Dataset Construction Marco Blanchini et.al. 2512.14665 null
2025-12-16 Focus: A Streaming Concentration Architecture for Efficient Vision-Language Models Chiyue Wei et.al. 2512.14661 null
2025-12-16 Beyond Text-to-SQL: Autonomous Research-Driven Database Exploration with DAR Ostap Vykhopen et.al. 2512.14622 null
2025-12-16 PerProb: Indirectly Evaluating Memorization in Large Language Models Yihan Liao et.al. 2512.14600 null
2025-12-16 LLM-driven Knowledge Enhancement for Multimodal Cancer Survival Prediction Chenyu Zhao et.al. 2512.14594 null
2025-12-16 Towards Nepali-language LLMs: Efficient GPT training with a Nepali BPE tokenizer Adarsha Shrestha et.al. 2512.14585 null
2025-12-16 Pairwise Comparison for Bias Identification and Quantification Fabian Haak et.al. 2512.14565 null
2025-12-16 Polypersona: Persona-Grounded LLM for Synthetic Survey Responses Tejaswani Dash et.al. 2512.14562 null
2025-12-16 Agreement Between Large Language Models and Human Raters in Essay Scoring: A Research Synthesis Hongli Li et.al. 2512.14561 null
2025-12-16 CLNet: Cross-View Correspondence Makes a Stronger Geo-Localizationer Xianwei Cao et.al. 2512.14560 null
2025-12-16 VLegal-Bench: Cognitively Grounded Benchmark for Vietnamese Legal Reasoning of Large Language Models Nguyen Tien Dong et.al. 2512.14554 null
2025-12-16 VersatileFFN: Achieving Parameter Efficiency in LLMs via Adaptive Wide-and-Deep Reuse Ying Nie et.al. 2512.14531 null
2025-12-16 RecGPT-V2 Technical Report Chao Yi et.al. 2512.14503 null
2025-12-16 C-ing Clearly: Enhanced Binary Code Explanations using C code Teodor Poncu et.al. 2512.14500 null
2025-12-16 SASQ: Static Activation Scaling for Quantization-Aware Training in Large Language Models Shizhuo Mao et.al. 2512.14481 null
2025-12-16 Model-First Reasoning LLM Agents: Reducing Hallucinations through Explicit Problem Modeling Annu Rana et.al. 2512.14474 null
2025-12-16 Reasoning-Style Poisoning of LLM Agents via Stealthy Style Transfer: Process-Level Attacks and Runtime Monitoring in RSV Space Xingfu Zhou et.al. 2512.14448 null
2025-12-16 Seismology modeling agent: A smart assistant for geophysical researchers Yukun Ren et.al. 2512.14429 null
2025-12-16 Effect of Document Packing on the Latent Multi-Hop Reasoning Capabilities of Large Language Models Gabriele Prato et.al. 2512.14427 null
2025-12-16 DISCODE: Distribution-Aware Score Decoder for Robust Automatic Evaluation of Image Captioning Nakamasa Inoue et.al. 2512.14420 null
2025-12-16 PortAgent: LLM-driven Vehicle Dispatching Agent for Port Terminals Jia Hu et.al. 2512.14417 null
2025-12-16 Massive Editing for Large Language Models Based on Dynamic Weight Generation Wentao Wan et.al. 2512.14395 null
2025-12-16 RePo: Language Models with Context Re-Positioning Huayang Li et.al. 2512.14391 null
2025-12-16 Multi-Agent Medical Decision Consensus Matrix System: An Intelligent Collaborative Framework for Oncology MDT Consultations Xudong Han et.al. 2512.14321 null
2025-12-16 Semantic Mismatch and Perceptual Degradation: A New Perspective on Image Editing Immunity Shuai Dong et.al. 2512.14320 null
2025-12-16 Inflation Attitudes of Large Language Models Nikoleta Anesti et.al. 2512.14306 null
2025-12-16 Leveraging LLMs for Collaborative Ontology Engineering in Parkinson Disease Monitoring and Alerting Georgios Bouchouras et.al. 2512.14288 null
2025-12-16 The Trust in AI-Generated Health Advice (TAIGHA) Scale and Short Version (TAIGHA-S): Development and Validation Study Marvin Kopka et.al. 2512.14278 null
2025-12-16 SPARQL-LLM: Real-Time SPARQL Query Generation from Natural Language Questions Panayiotis Smeros et.al. 2512.14277 null
2025-12-16 Enhancing Visual Programming for Visual Reasoning via Probabilistic Graphs Wentao Wan et.al. 2512.14257 null
2025-12-16 TEMP: A Memory Efficient Physical-aware Tensor Partition-Mapping Framework on Wafer-scale Chips Huizheng Wang et.al. 2512.14256 null
2025-12-16 From Context to EDUs: Faithful and Structured Context Compression via Elementary Discourse Unit Decomposition Yiqing Zhou et.al. 2512.14244 null
2025-12-16 Two CFG Nahuatl for automatic corpora expansion Juan-José Guzmán-Landa et.al. 2512.14239 null
2025-12-16 Ladder Up, Memory Down: Low-Cost Fine-Tuning With Side Nets Estelle Zheng et.al. 2512.14237 null
2025-12-16 PentestEval: Benchmarking LLM-based Penetration Testing with Modular and Stage-Level Design Ruozhao Yang et.al. 2512.14233 null
2025-12-16 Georeferencing complex relative locality descriptions with large language models Aneesha Fernando et.al. 2512.14228 null
2025-12-16 Estimating problem difficulty without ground truth using Large Language Model comparisons Marthe Ballon et.al. 2512.14220 null
2025-12-16 IntentMiner: Intent Inversion Attack via Tool Call Analysis in the Model Context Protocol Yunhao Yao et.al. 2512.14166 null
2025-12-16 Adaptive Cache Pollution Control for Large Language Model Inference Workloads Using Temporal CNN-Based Prediction and Priority-Aware Replacement Songze Liu et.al. 2512.14151 null
2025-12-16 Astraea: A State-Aware Scheduling Engine for LLM-Powered Agents Hongqiu Ni et.al. 2512.14142 null
2025-12-16 TorchTraceAP: A New Benchmark Dataset for Detecting Performance Anti-Patterns in Computer Vision Models Hanning Chen et.al. 2512.14141 null
2025-12-16 LAPPI: Interactive Optimization with LLM-Assisted Preference-Based Problem Instantiation So Kuroki et.al. 2512.14138 null
2025-12-16 SportsGPT: An LLM-driven Framework for Interpretable Sports Motion Assessment and Training Guidance Wenbo Tian et.al. 2512.14121 null
2025-12-16 CogMem: A Cognitive Memory Architecture for Sustained Multi-Turn Reasoning in Large Language Models Yiran Zhang et.al. 2512.14118 null
2025-12-16 Neurosymbolic Inference On Foundation Models For Remote Sensing Text-to-image Retrieval With Complex Queries Emanuele Mezzi et.al. 2512.14102 null
2025-12-16 A First-Order Logic-Based Alternative to Reward Models in RLHF Chunjin Jian et.al. 2512.14100 null
2025-12-16 Cornserve: Efficiently Serving Any-to-Any Multimodal Models Jeff J. Ma et.al. 2512.14098 null
2025-12-16 A Unified Sparse Attention via Multi-Granularity Compression Siran Liu et.al. 2512.14082 null
2025-12-16 From Obfuscated to Obvious: A Comprehensive JavaScript Deobfuscation Tool for Security Analysis Dongchao Zhou et.al. 2512.14070 null
2025-12-16 RADAR: Accelerating Large Language Model Inference With RL-Based Dynamic Draft Trees Junjie Ma et.al. 2512.14069 null
2025-12-16 What Affects the Effective Depth of Large Language Models? Yi Hu et.al. 2512.14064 null
2025-12-16 HyperVL: An Efficient and Dynamic Multimodal Large Language Model for Edge Devices HyperAI Team et.al. 2512.14052 null
2025-12-16 OpenDataArena: A Fair and Open Arena for Benchmarking Post-Training Dataset Value Mengzhang Cai et.al. 2512.14051 null
2025-12-16 Intention Chain-of-Thought Prompting with Dynamic Routing for Code Generation Shen Li et.al. 2512.14048 null
2025-12-16 Evaluating Small Language Models for Agentic On-Farm Decision Support Systems Enhong Liu et.al. 2512.14043 null
2025-12-16 ChartAgent: A Chart Understanding Framework with Tool Integrated Reasoning Boran Wang et.al. 2512.14040 null
2025-12-16 PerfCoder: Large Language Models for Interpretable Code Performance Optimization Jiuding Yang et.al. 2512.14018 null
2025-12-16 KFS-Bench: Comprehensive Evaluation of Key Frame Sampling in Long Video Understanding Zongyao Li et.al. 2512.14017 null
2025-12-16 Sparsity-Controllable Dynamic Top-p MoE for Large Foundation Model Pre-training Can Jin et.al. 2512.13996 null
2025-12-16 Structure-Aware Decoding Mechanisms for Complex Entity Extraction with Large-Scale Language Models Zhimin Qiu et.al. 2512.13980 null
2025-12-16 ReflCtrl: Controlling LLM Reflection via Representation Engineering Ge Yan et.al. 2512.13979 null
2025-12-16 Evaluating Frontier LLMs on PhD-Level Mathematical Reasoning: A Benchmark on a Textbook in Theoretical Computer Science about Randomized Algorithms Yang Cao et.al. 2512.13978 null
2025-12-16 Autonomous Construction-Site Safety Inspection Using Mobile Robots: A Multilayer VLM-LLM Pipeline Hossein Naderi et.al. 2512.13974 null
2025-12-15 Informing Acquisition Functions via Foundation Models for Molecular Discovery Qi Chen et.al. 2512.13935 null
2025-12-15 Hierarchical Multi-agent Large Language Model Reasoning for Autonomous Functional Materials Discovery Samuel Rothfarb et.al. 2512.13930 null
2025-12-15 Context Branching for LLM Conversations: A Version Control Approach to Exploratory Programming Bhargav Chickmagalur Nanjundappa et.al. 2512.13914 null
2025-12-15 FiNERweb: Datasets and Artifacts for Scalable Multilingual Named Entity Recognition Jonas Golde et.al. 2512.13884 null
2025-12-15 Verification-Guided Context Optimization for Tool Calling via Hierarchical LLMs-as-Editors Henger Li et.al. 2512.13860 null
2025-12-15 EvoLattice: Persistent Internal-Population Evolution through Multi-Alternative Quality-Diversity Graph Representations for LLM-Guided Program Discovery Kamer Ali Yuksel et.al. 2512.13857 null
2025-12-15 Practitioner Insights on Fairness Requirements in the AI Development Life Cycle: An Interview Study Chaima Boufaied et.al. 2512.13830 null
2025-12-15 The Double Life of Code World Models: Provably Unmasking Malicious Behavior Through Execution Traces Subramanyam Sahoo et.al. 2512.13821 null
2025-12-15 State-Dependent Refusal and Learned Incapacity in RLHF-Aligned Language Models TK Lee et.al. 2512.13762 null
2025-12-15 A Scientific Reasoning Model for Organic Synthesis Procedure Generation Guoqing Liu et.al. 2512.13668 null
2025-12-15 Embedding-Based Rankings of Educational Resources based on Learning Outcome Alignment: Benchmarking, Expert Validation, and Learner Performance Mohammadreza Molavi et.al. 2512.13658 null
2025-12-15 Comparative Analysis of LLM Abliteration Methods: A Cross-Architecture Evaluation Richard J. Young et.al. 2512.13655 null
2025-12-15 Large-Language Memorization During the Classification of United States Supreme Court Cases John E. Ortega et.al. 2512.13654 null
2025-12-15 MindDrive: A Vision-Language-Action Model for Autonomous Driving via Online Reinforcement Learning Haoyu Fu et.al. 2512.13636 null
2025-12-15 Temporal Tokenization Strategies for Event Sequence Modeling with Large Language Models Zefang Liu et.al. 2512.13618 null
2025-12-15 Textual Gradients are a Flawed Metaphor for Automatic Prompt Optimization Daniel Melcer et.al. 2512.13598 null
2025-12-15 ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding Jia-Nan Li et.al. 2512.13586 null
2025-12-15 MMhops-R1: Multimodal Multi-hop Reasoning Tao Zhang et.al. 2512.13573 null
2025-12-15 PrahokBART: A Pre-trained Sequence-to-Sequence Model for Khmer Natural Language Generation Hour Kaing et.al. 2512.13552 null
2025-12-15 Fine-tuned LLM-based Code Migration Framework Oleg Grynets et.al. 2512.13515 null
2025-12-15 MedCEG: Reinforcing Verifiable Medical Reasoning with Critical Evidence Graph Linjie Mu et.al. 2512.13510 null
2025-12-15 SkipCat: Rank-Maximized Low-Rank Compression of Large Language Models via Shared Projection and Block Skipping Yu-Chen Lu et.al. 2512.13494 null
2025-12-15 From Zipf’s Law to Neural Scaling through Heaps’ Law and Hilberg’s Hypothesis Łukasz Dębowski et.al. 2512.13491 null
2025-12-15 neuralFOMO: Can LLMs Handle Being Second Best? Measuring Envy-Like Preferences in Multi-Agent Settings Ojas Pungalia et.al. 2512.13481 null
2025-12-15 Non-Resolution Reasoning (NRR): A Computational Framework for Contextual Identity and Ambiguity Preservation Kei Saito et.al. 2512.13478 null
2025-12-15 Scaling Laws for Code: Every Programming Language Matters Jian Yang et.al. 2512.13472 null
2025-12-15 Large language models are not about natural language Johan J. Bolhuis et.al. 2512.13441 null
2025-12-15 From User Interface to Agent Interface: Efficiency Optimization of UI Representations for LLM Agents Dezhi Ran et.al. 2512.13438 null
2025-12-15 Behavior and Representation in Large Language Models for Combinatorial Optimization: From Feature Extraction to Algorithm Selection Francesca Da Ros et.al. 2512.13374 null
2025-12-15 Detecting Emotion Drift in Mental Health Text Using Pre-Trained Transformers Shibani Sankpal et.al. 2512.13363 null
2025-12-15 UCRBench: Benchmarking LLMs on Use Case Recovery Shuyuan Xiao et.al. 2512.13360 null
2025-12-15 On the Effectiveness of Membership Inference in Targeted Data Extraction from Large Language Models Ali Al Sahili et.al. 2512.13352 null
2025-12-15 FROC: A Unified Framework with Risk-Optimized Control for Machine Unlearning in LLMs Si Qi Goh et.al. 2512.13337 null
2025-12-15 FIN-bench-v2: A Unified and Robust Benchmark Suite for Evaluating Finnish Large Language Models Joona Kytöniemi et.al. 2512.13330 null
2025-12-15 Security and Detectability Analysis of Unicode Text Watermarking Methods Against Large Language Models Malte Hellmeier et.al. 2512.13325 null
2025-12-15 KlingAvatar 2.0 Technical Report Kling Team et.al. 2512.13313 null
2025-12-15 MiniLingua: A Small Open-Source LLM for European Languages Anna Aksenova et.al. 2512.13298 null
2025-12-15 AutoTool: Dynamic Tool Selection and Integration for Agentic Reasoning Jiaru Zou et.al. 2512.13278 null
2025-12-15 CogniEdit: Dense Gradient Flow Optimization for Fine-Grained Image Editing Yan Li et.al. 2512.13276 null
2025-12-15 Toward Ambulatory Vision: Learning Visually-Grounded Active View Selection Juil Koo et.al. 2512.13250 null
2025-12-15 Ego-EXTRA: video-language Egocentric Dataset for EXpert-TRAinee assistance Francesco Ragusa et.al. 2512.13238 null
2025-12-15 Efficient Adaptive Rejection Sampling for Accelerating Speculative Decoding in Large Language Models Chendong Sun et.al. 2512.13194 null
2025-12-15 Integrated Semantic and Temporal Alignment for Interactive Video Retrieval Thanh-Danh Luu et.al. 2512.13169 null
2025-12-15 A Semantically Enhanced Generative Foundation Model Improves Pathological Image Synthesis Xianchao Guan et.al. 2512.13164 null
2025-12-15 Can AI Understand What We Cannot Say? Measuring Multilevel Alignment Through Abortion Stigma Across Cognitive, Interpersonal, and Structural Levels Anika Sharma et.al. 2512.13142 null
2025-12-15 Uncovering the Role of Initial Saliency in U-Shaped Attention Bias: Scaling Initial Token Weight for Enhanced Long-Text Processing Zewen Qiang et.al. 2512.13109 null
2025-12-15 Socratic Students: Teaching Language Models to Learn by Asking Questions Rajeev Bhatt Ambati et.al. 2512.13102 null
2025-12-15 A Simple and Effective Framework for Symmetric Consistent Indexing in Large-Scale Dense Retrieval Huimu Wang et.al. 2512.13074 null
2025-12-15 M-GRPO: Stabilizing Self-Supervised Reinforcement Learning for Large Language Models with Momentum-Anchored Policy Optimization Bizhe Bai et.al. 2512.13070 null
2025-12-15 LLM Rationalis? Measuring Bargaining Capabilities of AI Negotiators Cheril Shah et.al. 2512.13063 null
2025-12-15 An Open and Reproducible Deep Research Agent for Long-Form Question Answering Ikuya Yamada et.al. 2512.13059 null
2025-12-15 Sharpen the Spec, Cut the Code: A Case for Generative File System with SYSSPEC Qingyuan Liu et.al. 2512.13047 null
2025-12-15 Understanding Structured Financial Data with LLMs: A Case Study on Fraud Detection Xuwei Tan et.al. 2512.13040 null
2025-12-15 Large Language Models for Power System Applications: A Comprehensive Literature Survey Muhammad Sarwar et.al. 2512.13004 null
2025-12-15 Are Large Language Models Really Effective for Training-Free Cold-Start Recommendation? Genki Kusano et.al. 2512.13001 null
2025-12-15 Reveal Hidden Pitfalls and Navigate Next Generation of Vector Similarity Search from Task-Centric Views Tingyang Chen et.al. 2512.12980 null
2025-12-15 Do Reviews Matter for Recommendations in the Era of Large Language Models? Chee Heng Tan et.al. 2512.12978 null
2025-12-15 Authors Should Annotate Marcus Ma et.al. 2512.12976 null
2025-12-15 Database Research needs an Abstract Relational Query Language Wolfgang Gatterbauer et.al. 2512.12957 null
2025-12-15 Building from Scratch: A Multi-Agent Framework with Human-in-the-Loop for Multilingual Legal Terminology Mapping Lingyi Meng et.al. 2512.12950 null
2025-12-15 SPAR: Session-based Pipeline for Adaptive Retrieval on Legacy File Systems Duy A. Nguyen et.al. 2512.12938 null
2025-12-15 PROSERVE: Unified Multi-Priority Request Scheduling for LLM Serving Weizhe Huang et.al. 2512.12928 null
2025-12-15 Interpretable Hypothesis-Driven Trading: A Rigorous Walk-Forward Validation Framework for Market Microstructure Signals Gagan Deep et.al. 2512.12924 null
2025-12-15 LLM-based Personalized Portfolio Recommender: Integrating Large Language Models and Reinforcement Learning for Intelligent Investment Strategy Optimization Bangyu Li et.al. 2512.12922 null
2025-12-15 Cisco Integrated AI Security and Safety Framework Report Amy Chang et.al. 2512.12921 null
2025-12-15 CTIGuardian: A Few-Shot Framework for Mitigating Privacy Leakage in Fine-Tuned LLMs Shashie Dilhara Batan Arachchige et.al. 2512.12914 null
2025-12-14 SignRAG: A Retrieval-Augmented System for Scalable Zero-Shot Road Sign Recognition Minghao Zhu et.al. 2512.12885 null
2025-12-14 ERA-IT: Aligning Semantic Models with Revealed Economic Preference for Real-Time and Explainable Patent Valuation Yoo Yongmin et.al. 2512.12869 null
2025-12-14 Counting Clues: A Lightweight Probabilistic Baseline Can Match an LLM Furong Jia et.al. 2512.12868 null
2025-12-14 Information-Consistent Language Model Recommendations through Group Relative Policy Optimization Sonal Prabhune et.al. 2512.12858 null
2025-12-14 Does Tone Change the Answer? Evaluating Prompt Politeness Effects on Modern LLMs: GPT, Gemini, LLaMA Hanyu Cai et.al. 2512.12812 null
2025-12-14 Fault-Tolerant Sandboxing for AI Coding Agents: A Transactional Approach to Safe Autonomous Execution Boyang Yan et.al. 2512.12806 null
2025-12-14 A Disproof of Large Language Model Consciousness: The Necessity of Continual Learning for Consciousness Erik Hoel et.al. 2512.12802 null
2025-12-14 Fine-Grained Energy Prediction For Parallellized LLM Inference With PIE-P Anurag Dutt et.al. 2512.12801 null
2025-12-14 DrivePI: Spatial-aware 4D MLLM for Unified Autonomous Driving Understanding, Perception, Prediction and Planning Zhe Liu et.al. 2512.12799 null
2025-12-14 A Rule-Aware Prompt Framework for Structured Numeric Reasoning in Cyber-Physical Systems Yichen Liu et.al. 2512.12794 null
2025-12-14 Beyond Task Completion: An Assessment Framework for Evaluating Agentic AI Systems Sreemaee Akshathala et.al. 2512.12791 null
2025-12-14 State over Tokens: Characterizing the Role of Reasoning Tokens Mosh Levy et.al. 2512.12777 null
2025-12-14 Persistent Personas? Role-Playing, Instruction Following, and Safety in Extended Interactions Pedro Henrique Luz de Araujo et.al. 2512.12775 null
2025-12-14 JointAVBench: A Benchmark for Joint Audio-Visual Reasoning Evaluation Jianghan Chao et.al. 2512.12772 null
2025-12-14 Adaptive Edge-Cloud Inference for Speech-to-Action Systems Using ASR and Large Language Models (ASTA) Mohammad Jalili Torkamani et.al. 2512.12769 null
2025-12-14 Intelligent Scientific Literature Explorer using Machine Learning (ISLE) Sina Jani et.al. 2512.12760 null
2025-12-14 FysicsWorld: A Unified Full-Modality Benchmark for Any-to-Any Understanding, Generation, and Reasoning Yue Jiang et.al. 2512.12756 null
2025-12-14 Resting Neurons, Active Insights: Improving Input Sparsification for Large Language Models Haotian Xu et.al. 2512.12744 null
2025-12-14 CoDA: A Context-Decoupled Hierarchical Agent with Reinforcement Learning Xuanzhang Liu et.al. 2512.12716 null
2025-12-14 Synergizing Code Coverage and Gameplay Intent: Coverage-Aware Game Playtesting with LLM-Guided Reinforcement Learning Enhong Mu et.al. 2512.12706 null
2025-12-14 Hybrid Retrieval-Augmented Generation for Robust Multilingual Document Question Answering Anthony Mudet et.al. 2512.12694 null
2025-12-14 Memoria: A Scalable Agentic Memory Framework for Personalized Conversational AI Samarth Sarin et.al. 2512.12686 null
2025-12-14 Fine-Tuning Causal LLMs for Text Classification: Embedding-Based vs. Instruction-Based Approaches Amirhossein Yousefiramandi et.al. 2512.12677 null
2025-12-14 LexRel: Benchmarking Legal Relation Extraction for Chinese Civil Cases Yida Cai et.al. 2512.12643 null
2025-12-14 DiG: Differential Grounding for Enhancing Fine-Grained Perception in Multimodal Large Language Model Zhou Tao et.al. 2512.12633 null
2025-12-14 ORIBA: Exploring LLM-Driven Role-Play Chatbot as a Creativity Support Tool for Original Character Artists Yuqian Sun et.al. 2512.12630 null
2025-12-14 Reasoning Within the Mind: Dynamic Multimodal Interleaving in Latent Space Chengzhi Liu et.al. 2512.12623 null
2025-12-14 Understanding Syllogistic Reasoning in LLMs from Formal and Natural Language Perspectives Aheli Poddar et.al. 2512.12620 null
2025-12-14 Patch-wise Retrieval: A Bag of Practical Techniques for Instance-level Matching Wonseok Choi et.al. 2512.12610 null
2025-12-14 Human-Inspired Learning for Large Language Models via Obvious Record and Maximum-Entropy Method Discovery Hong Su et.al. 2512.12608 null
2025-12-14 Vision-Enhanced Large Language Models for High-Resolution Image Synthesis and Multimodal Data Interpretation Karthikeya KV et.al. 2512.12595 null
2025-12-14 Beyond Static Scoring: Enhancing Assessment Validity via AI-Generated Interactive Verification Tom Lee et.al. 2512.12592 null
2025-12-14 StreamingAssistant: Efficient Visual Token Pruning for Accelerating Online Video Understanding Xinqi Jin et.al. 2512.12560 null
2025-12-14 Large Language Newsvendor: Decision Biases and Cognitive Mechanisms Jifei Liu et.al. 2512.12552 null
2025-12-14 HyperEdit: Unlocking Instruction-based Text Editing in LLMs via Hypernetworks Yiming Zeng et.al. 2512.12544 null
2025-12-14 NagaNLP: Bootstrapping NLP for Low-Resource Nagamese Creole with Human-in-the-Loop Synthetic Data Agniva Maiti et.al. 2512.12537 null
2025-12-14 Diverse LLMs vs. Vulnerabilities: Who Detects and Fixes Them Better? Arastoo Zibaeirad et.al. 2512.12536 null
2025-12-14 ATLAS: Automated Tree-based Language Analysis System for C and C++ source programs Jaid Monwar Chowdhury et.al. 2512.12507 null
2025-12-14 KidsArtBench: Multi-Dimensional Children’s Art Evaluation with Attribute-Aware MLLMs Mingrui Ye et.al. 2512.12503 null
2025-12-14 Explainable AI as a Double-Edged Sword in Dermatology: The Impact on Clinicians versus The Public Xuhai Xu et.al. 2512.12500 null
2025-12-13 The American Ghost in the Machine: How language models align culturally and the effects of cultural prompting James Luther et.al. 2512.12488 null
2025-12-13 HetRL: Efficient Reinforcement Learning for LLMs in Heterogeneous Environments Yongjun He et.al. 2512.12476 null
2025-12-13 Large language models have learned to use language Gary Lupyan et.al. 2512.12447 null
2025-12-13 Can GPT replace human raters? Validity and reliability of machine-generated norms for metaphors Veronica Mangiaterra et.al. 2512.12444 null
2025-12-11 Towards Efficient and Effective Multi-Camera Encoding for End-to-End Driving Jiawei Yang et.al. 2512.10947 null
2025-12-11 FoundationMotion: Auto-Labeling and Reasoning about Spatial Movement in Videos Yulu Gan et.al. 2512.10927 null
2025-12-11 SparseSwaps: Tractable LLM Pruning Mask Refinement at Scale Max Zimmer et.al. 2512.10922 null
2025-12-11 CompanionCast: A Multi-Agent Conversational AI Framework with Spatial Audio for Social Co-Viewing Experiences Yiyang Wang et.al. 2512.10918 null
2025-12-11 Multi-Granular Node Pruning for Circuit Discovery Muhammad Umair Haider et.al. 2512.10903 null
2025-12-11 LLMs Can Assist with Proposal Selection at Large User Facilities Lijie Ding et.al. 2512.10895 null
2025-12-11 Computational emotion analysis with multimodal LLMs: Current evidence on an emerging methodological opportunity Hauke Licht et.al. 2512.10882 null
2025-12-11 Quantifying Emotional Tone in Tolkien’s The Hobbit: Dialogue Sentiment Analysis with RegEx, NRC-VAD, and Python Lilin Qiu et.al. 2512.10865 null
2025-12-11 Large Language Models for Superconductor Discovery Suman Itani et.al. 2512.10847 null
2025-12-11 LabelFusion: Learning to Fuse LLMs and Transformer Classifiers for Robust Text Classification Michael Schlee et.al. 2512.10793 null
2025-12-11 The FACTS Leaderboard: A Comprehensive Benchmark for Large Language Model Factuality Aileen Cheng et.al. 2512.10791 null
2025-12-11 Natural Language Interface for Firewall Configuration F. Taghiyev et.al. 2512.10789 null
2025-12-11 Developing and Evaluating a Large Language Model-Based Automated Feedback System Grounded in Evidence-Centered Design for Supporting Physics Problem Solving Holger Maus et.al. 2512.10785 null
2025-12-11 Script Gap: Evaluating LLM Triage on Indian Languages in Native vs Roman Scripts in a Real World Setting Manurag Khullar et.al. 2512.10780 null
2025-12-11 OPV: Outcome-based Process Verifier for Efficient Long Chain-of-Thought Verification Zijian Wu et.al. 2512.10756 null
2025-12-11 LDP: Parameter-Efficient Fine-Tuning of Multimodal LLM for Medical Report Generation Tianyu Zhou et.al. 2512.10750 null
2025-12-11 Echoes of Automation: How Bots Shaped Political Discourse in Brazil Merve Ipek Bal et.al. 2512.10749 null
2025-12-11 TRIDENT: A Redundant Architecture for Caribbean-Accented Emergency Speech Triage Elroy Galbraith et.al. 2512.10741 null
2025-12-11 Long-horizon Reasoning Agent for Olympiad-Level Mathematical Problem Solving Songyang Gao et.al. 2512.10739 null
2025-12-11 Textual Data Bias Detection and Mitigation - An Extensible Pipeline with Experimental Evaluation Rebekka Görge et.al. 2512.10734 null
2025-12-11 IRG-MotionLLM: Interleaving Motion Generation, Assessment and Refinement for Text-to-Motion Generation Yuan-Ming Li et.al. 2512.10730 link
2025-12-11 Beyond the Black Box: Identifiable Interpretation and Control in Generative Models via Causal Minimality Lingjing Kong et.al. 2512.10720 null
2025-12-11 PACIFIC: a framework for generating benchmarks to check Precise Automatically Checked Instruction Following In Code Itay Dreyfuss et.al. 2512.10713 null
2025-12-11 COMPARE: Clinical Optimization with Modular Planning and Assessment via RAG-Enhanced AI-OCT: Superior Decision Support for Percutaneous Coronary Intervention Compared to ChatGPT-5 and Junior Operators Wei Fang et.al. 2512.10702 null
2025-12-11 Remember Me, Refine Me: A Dynamic Procedural Memory Framework for Experience-Driven Agent Evolution Zouying Cao et.al. 2512.10696 null
2025-12-11 Challenges of Evaluating LLM Safety for User Welfare Manon Kempermann et.al. 2512.10687 null
2025-12-11 On the Dynamics of Multi-Agent LLM Communities Driven by Value Diversity Muhua Huang et.al. 2512.10665 null
2025-12-11 Token Sample Complexity of Attention Léa Bohbot et.al. 2512.10656 null
2025-12-11 TriDF: Evaluating Perception, Detection, and Hallucination for Interpretable DeepFake Detection Jian-Yu Jiang-Lin et.al. 2512.10652 null
2025-12-11 From Data Scarcity to Data Care: Reimagining Language Technologies for Serbian and other Low-Resource Languages Smiljana Antonijevic Ubois et.al. 2512.10630 null
2025-12-11 AgriGPT-Omni: A Unified Speech-Vision-Text Framework for Multilingual Agricultural Intelligence Bo Yang et.al. 2512.10624 null
2025-12-11 Phythesis: Physics-Guided Evolutionary Scene Synthesis for Energy-Efficient Data Center Design via LLMs Minghao LI et.al. 2512.10611 null
2025-12-11 Multi-Objective Reward and Preference Optimization: Theory and Algorithms Akhil Agnihotri et.al. 2512.10601 null
2025-12-11 Beyond Pixels: A Training-Free, Text-to-Text Framework for Remote Sensing Image Retrieval J. Xiao et.al. 2512.10596 null
2025-12-11 RoleRMBench & RoleRM: Towards Reward Modeling for Profile-Based Role Play in Dialogue Systems Hang Ding et.al. 2512.10575 null
2025-12-11 NormCode: A Semi-Formal Language for Context-Isolated AI Planning Xin Guan et.al. 2512.10563 null
2025-12-11 Causal Reasoning Favors Encoders: On The Limits of Decoder-Only Models Amartya Roy et.al. 2512.10561 null
2025-12-11 Grounding Everything in Tokens for Multimodal Large Language Models Xiangxuan Ren et.al. 2512.10554 null
2025-12-11 LLM-Auction: Generative Auction towards LLM-Native Advertising Chujie Zhao et.al. 2512.10551 null
2025-12-11 Blink: Dynamic Visual Token Resolution for Enhanced Multimodal Understanding Yuchen Feng et.al. 2512.10548 null
2025-12-11 Unlocking the Address Book: Dissecting the Sparse Semantic Structure of LLM Key-Value Caches via Sparse Autoencoders Qingsen Ma et.al. 2512.10547 null
2025-12-11 XDoGE: Multilingual Data Reweighting to Enhance Language Inclusivity in LLMs Iñaki Lacunza et.al. 2512.10545 null
2025-12-11 Achieving Olympia-Level Geometry Large Language Model Agent via Complexity Boosting Reinforcement Learning Haiteng Zhao et.al. 2512.10534 null
2025-12-11 Zero-shot 3D Map Generation with LLM Agents: A Dual-Agent Architecture for Procedural Content Generation Lim Chien Her et.al. 2512.10501 null
2025-12-11 Decoding Human-LLM Collaboration in Coding: An Empirical Study of Multi-Turn Conversations in the Wild Binquan Zhang et.al. 2512.10493 null
2025-12-11 LLM-Assisted AHP for Explainable Cyber Range Evaluation Vyron Kampourakis et.al. 2512.10487 null
2025-12-11 From Lab to Reality: A Practical Evaluation of Deep Learning Models and LLMs for Vulnerability Detection Chaomeng Lu et.al. 2512.10485 null
2025-12-11 Grammaticality Judgments in Humans and Language Models: Revisiting Generative Grammar with LLMs Lars G. B. Johnsen et.al. 2512.10453 null
2025-12-11 When Reject Turns into Accept: Quantifying the Vulnerability of LLM-Based Scientific Reviewers to Indirect Prompt Injection Devanshu Sahoo et.al. 2512.10449 null
2025-12-11 Decoding Student Minds: Leveraging Conversational Agents for Psychological and Learning Analysis Nour El Houda Ben Chaabene et.al. 2512.10441 null
2025-12-11 Enhancing Next-Generation Language Models with Knowledge Graphs: Extending Claude, Mistral IA, and GPT-4 via KG-BERT Nour El Houda Ben Chaabene et.al. 2512.10440 null
2025-12-11 Semantic Reconstruction of Adversarial Plagiarism: A Context-Aware Framework for Detecting and Restoring “Tortured Phrases” in Scientific Literature Agniva Maiti et.al. 2512.10435 null
2025-12-11 Cooperative Retrieval-Augmented Generation for Question Answering: Mutual Information Exchange and Ranking by Contrasting Layers Youmin Ko et.al. 2512.10422 null
2025-12-11 How to Trick Your AI TA: A Systematic Study of Academic Jailbreaking in LLM Code Evaluation Devanshu Sahoo et.al. 2512.10415 null
2025-12-11 Sliding Window Attention Adaptation Yijiong Yu et.al. 2512.10411 null
2025-12-11 RoboNeuron: A Modular Framework Linking Foundation Models and ROS for Embodied AI Weifan Guan et.al. 2512.10394 null
2025-12-11 GPG: Generalized Policy Gradient Theorem for Transformer-based Policies Hangyu Mao et.al. 2512.10365 null
2025-12-11 Visual Funnel: Resolving Contextual Blindness in Multimodal Large Language Models Woojun Jung et.al. 2512.10362 null
2025-12-11 Tool-Augmented Spatiotemporal Reasoning for Streamlining Video Question Answering Task Sunqi Fan et.al. 2512.10359 null
2025-12-11 Dynamics of Agentic Loops in Large Language Models: A Geometric Theory of Trajectories Nicolas Tacheny et.al. 2512.10350 null
2025-12-11 EchoingPixels: Cross-Modal Adaptive Token Reduction for Efficient Audio-Visual LLMs Chao Gong et.al. 2512.10324 null
2025-12-11 EpiPlanAgent: Agentic Automated Epidemic Response Planning Kangkun Mao et.al. 2512.10313 null
2025-12-11 Efficient-VLN: A Training-Efficient Vision-Language Navigation Model Duo Zheng et.al. 2512.10310 null
2025-12-11 Reverse Thinking Enhances Missing Information Detection in Large Language Models Yuxin Liu et.al. 2512.10273 null
2025-12-11 VLM-NCD: Novel Class Discovery with Vision-Based Large Language Models Yuetong Su et.al. 2512.10262 null
2025-12-11 Reject or Not?: A Benchmark for Voice Assistant Query Rejection in Smart Home Scenario and an Improved Method Based on LLMs Huichao Men et.al. 2512.10257 null
2025-12-11 InFerActive: Towards Scalable Human Evaluation of Large Language Models through Interactive Inference Junhyeong Hwangbo et.al. 2512.10234 null
2025-12-11 Adaptive Information Routing for Multimodal Time Series Forecasting Jun Seo et.al. 2512.10229 null
2025-12-11 Does SWE-Bench-Verified Test Agent Ability or Model Memory? Thanosan Prathifkumar et.al. 2512.10218 null
2025-12-11 CP-Env: Evaluating Large Language Models on Clinical Pathways in a Controllable Hospital Environment Yakun Zhu et.al. 2512.10206 null
2025-12-11 AutoMedic: An Automated Evaluation Framework for Clinical Conversational Agents with Medical Dataset Grounding Gyutaek Oh et.al. 2512.10195 null
2025-12-11 CIEGAD: Cluster-Conditioned Interpolative and Extrapolative Framework for Geometry-Aware and Domain-Aligned Data Augmentation Keito Inoshita et.al. 2512.10178 null
2025-12-11 ATLAS: Automated Toolkit for Large-Scale Verified Code Synthesis Mantas Baksys et.al. 2512.10173 null
2025-12-11 Offscript: Automated Auditing of Instruction Adherence in LLMs Nicholas Clark et.al. 2512.10172 null
2025-12-10 Enhancing Large Language Models for End-to-End Circuit Analysis Problem Solving Liangliang Chen et.al. 2512.10159 null
2025-12-10 Unforgotten Safety: Preserving Safety Alignment of Large Language Models with Continual Learning Lama Alssum et.al. 2512.10150 null
2025-12-10 PARAN: Persona-Augmented Review ANswering system on Food Delivery Review Dataset Moonsoo Park et.al. 2512.10148 null
2025-12-10 Workflow is All You Need: Escaping the “Statistical Smoothing Trap” via High-Entropy Information Foraging and Adversarial Pacing Zhongjie Jiang et.al. 2512.10121 null
2025-12-10 AgriRegion: Region-Aware Retrieval for High-Fidelity Agricultural Advice Mesafint Fanuel et.al. 2512.10114 null
2025-12-10 Generate-Then-Validate: A Novel Question Generation Approach Using Small Language Models Yumou Wei et.al. 2512.10110 null
2025-12-10 LLM-PEA: Leveraging Large Language Models Against Phishing Email Attacks Najmul Hassan et.al. 2512.10104 null
2025-12-10 What Kind of Reasoning (if any) is an LLM actually doing? On the Stochastic Nature and Abductive Appearance of Large Language Models Luciano Floridi et.al. 2512.10080 null
2025-12-10 Independent Density Estimation Jiahao Liu et.al. 2512.10067 null
2025-12-10 Linear socio-demographic representations emerge in Large Language Models from indirect cues Paul Bouchaud et.al. 2512.10065 null
2025-12-10 Text2Graph: Combining Lightweight LLMs and GNNs for Efficient Text Classification in Label-Scarce Scenarios João Lucas Luz Lima Sarcinelli et.al. 2512.10061 null
2025-12-10 Parallel Decoder Transformer: Model-Internal Parallel Decoding with Speculative Invariance via Note Conditioning Logan Robbins et.al. 2512.10054 null
2025-12-10 Detailed balance in large language model-driven agents Zhuo-Yang Song et.al. 2512.10047 null
2025-12-10 Local LLM Ensembles for Zero-shot Portuguese Named Entity Recognition João Lucas Luz Lima Sarcinelli et.al. 2512.10043 null
2025-12-10 Intelligently Weighting Multiple Reference Models for Direct Preference Optimization of LLMs Skyler Wu et.al. 2512.10040 null
2025-12-10 Exploring LLMs for Scientific Information Extraction Using The SciEx Framework Sha Li et.al. 2512.10004 null
2025-12-10 SCOPE: Language Models as One-Time Teacher for Hierarchical Planning in Text Environments Haoye Lu et.al. 2512.09897 null
2025-12-10 Benchmarking Document Parsers on Mathematical Formula Extraction from PDFs Pius Horn et.al. 2512.09874 link
2025-12-10 FlipLLM: Efficient Bit-Flip Attacks on Multimodal LLMs using Reinforcement Learning Khurram Khalil et.al. 2512.09872 null
2025-12-10 MedForget: Hierarchy-Aware Multimodal Unlearning Testbed for Medical AI Fengli Wu et.al. 2512.09867 null
2025-12-10 UniUGP: Unifying Understanding, Generation, and Planing For End-to-end Autonomous Driving Hao Lu et.al. 2512.09864 null
2025-12-10 Mitigating Social Bias in English and Urdu Language Models Using PRM-Guided Candidate Selection and Sequential Refinement Muneeb Ur Raheem Khan et.al. 2512.09854 null
2025-12-10 ChronusOmni: Improving Time Awareness of Omni Large Language Models Yijing Chen et.al. 2512.09841 null
2025-12-10 LLMs in Interpreting Legal Documents Simone Corbo et.al. 2512.09830 null
2025-12-10 RIFT: A Scalable Methodology for LLM Accelerator Fault Assessment using Reinforcement Learning Khurram Khalil et.al. 2512.09829 null
2025-12-10 DeepSeek’s WEIRD Behavior: The cultural alignment of Large Language Models and the effects of prompt language and cultural prompting James Luther et.al. 2512.09772 null
2025-12-10 Defining Cost Function of Steganography with Large Language Models Hanzhou Wu et.al. 2512.09769 null
2025-12-10 Towards Language Model Guided TLA+ Proof Automation Yuhao Zhou et.al. 2512.09758 null
2025-12-10 Knowledge Graph Enrichment and Reasoning for Nobel Laureates Thanh-Lam T. Nguyen et.al. 2512.09707 null
2025-12-10 Exqutor: Extended Query Optimizer for Vector-augmented Analytical Queries Hyunjoon Kim et.al. 2512.09695 null
2025-12-10 Understanding Chain-of-Thought Effectiveness in Code Generation: An Empirical and Information-Theoretic Analysis Naizhu Jin et.al. 2512.09679 null
2025-12-10 The Ky Fan Norms and Beyond: Dual Norms and Combinations for Matrix Optimization Alexey Kravatskiy et.al. 2512.09678 null
2025-12-10 d-TreeRPO: Towards More Reliable Policy Optimization for Diffusion Language Models Leyi Pan et.al. 2512.09675 null
2025-12-10 IF-Bench: Benchmarking and Enhancing MLLMs for Infrared Images with Generative Visual Prompting Tao Zhang et.al. 2512.09663 link
2025-12-10 Can LLMs Evaluate What They Cannot Annotate? Revisiting LLM Reliability in Hate Speech Detection Paloma Piot et.al. 2512.09662 null
2025-12-10 Measuring Corruption from Text Data Arieda Muço et.al. 2512.09652 null
2025-12-10 MentraSuite: Post-Training Large Language Models for Mental Health Reasoning and Assessment Mengxi Xiao et.al. 2512.09636 null
2025-12-10 Creation of the Estonian Subjectivity Dataset: Assessing the Degree of Subjectivity on a Scale Karl Gustav Gailit et.al. 2512.09634 null
2025-12-10 An End-to-end Planning Framework with Agentic LLMs and PDDL Emanuele La Malfa et.al. 2512.09629 null
2025-12-10 LogICL: Distilling LLM Reasoning to Bridge the Semantic Gap in Cross-Domain Log Anomaly Detection Jingwei Ye et.al. 2512.09627 null
2025-12-10 Rethinking Chain-of-Thought Reasoning for Videos Yiwu Zhong et.al. 2512.09616 link
2025-12-10 ImageTalk: Designing a Multimodal AAC Text Generation System Driven by Image Recognition and Natural Language Generation Boyin Yang et.al. 2512.09610 null
2025-12-10 Investigate the Low-level Visual Perception in Vision-Language based Image Quality Assessment Yuan Li et.al. 2512.09573 null
2025-12-10 System Report for CCL25-Eval Task 10: Prompt-Driven Large Language Model Merge for Fine-Grained Chinese Hate Speech Detection Binglin Wu et.al. 2512.09563 null
2025-12-10 Systematic Framework of Application Methods for Large Language Models in Language Sciences Kun Sun et.al. 2512.09552 null
2025-12-10 Chasing Shadows: Pitfalls in LLM Security Research Jonathan Evertz et.al. 2512.09549 null
2025-12-10 Supporting Dynamic Agentic Workloads: How Data and Agents Interact Ioana Giurgiu et.al. 2512.09548 null
2025-12-10 Don’t Throw Away Your Beams: Improving Consistency-based Uncertainties in LLMs via Beam Search Ekaterina Fadeeva et.al. 2512.09538 null
2025-12-10 CNFinBench: A Benchmark for Safety and Compliance of Large Language Models in Finance Jinru Ding et.al. 2512.09506 null
2025-12-10 RouteRAG: Efficient Retrieval-Augmented Generation from Text and Graph via Reinforcement Learning Yucan Guo et.al. 2512.09487 null
2025-12-10 Advancing LLM-Based Security Automation with Customized Group Relative Policy Optimization for Zero-Touch Networks Xinye Cao et.al. 2512.09485 null
2025-12-10 An Efficient Interaction Human-AI Synergy System Bridging Visual Awareness and Large Language Model for Intensive Care Units Yibowen Zhao et.al. 2512.09473 null
2025-12-10 WarmServe: Enabling One-for-Many GPU Prewarming for Multi-LLM Serving Chiheng Lou et.al. 2512.09472 null
2025-12-10 Advancing Text Classification with Large Language Models and Neural Attention Mechanisms Ning Lyu et.al. 2512.09444 null
2025-12-10 Advancing Research via Human-AI Interactive Theorem Proving Chenyi Li et.al. 2512.09443 null
2025-12-10 Knowledge-Augmented Large Language Model Agents for Explainable Financial Decision-Making Qingyuan Zhang et.al. 2512.09440 null
2025-12-10 ODMA: On-Demand Memory Allocation Framework for LLM Serving on LPDDR-Class Accelerators Guoqiang Zou et.al. 2512.09427 null
2025-12-10 Black-Box Behavioral Distillation Breaks Safety Alignment in Medical LLMs Sohely Jahan et.al. 2512.09403 null
2025-12-10 Optimizing Data Extraction from Materials Science Literature: A Study of Tools Using Large Language Models Wenkai Ning et.al. 2512.09370 null
2025-12-10 Are Hypervectors Enough? Single-Call LLM Reasoning over Knowledge Graphs Yezi Liu et.al. 2512.09369 null
2025-12-10 Video-QTR: Query-Driven Temporal Reasoning Framework for Lightweight Video Understanding Xinkui Zhao et.al. 2512.09354 null
2025-12-10 Self Distillation Fine-Tuning of Protein Language Models Improves Versatility in Protein Design Amin Tavakoli et.al. 2512.09329 null
2025-12-10 RACAM: Enhancing DRAM with Reuse-Aware Computation and Automated Mapping for ML Inference Siyuan Ma et.al. 2512.09304 null
2025-12-10 Identifying Bias in Machine-generated Text Detection Kevin Stowe et.al. 2512.09292 null
2025-12-10 LongT2IBench: A Benchmark for Evaluating Long Text-to-Image Generation with Graph-structured Annotations Zhichao Yang et.al. 2512.09271 null
2025-12-10 From Forecast to Action: Uncertainty-Aware UAV Deployment for Ocean Drifter Recovery Jingeun Kim et.al. 2512.09260 null
2025-12-10 The Illusion of Rationality: Tacit Bias and Strategic Dominance in Frontier LLM Negotiation Games Manuel S. Ríos et.al. 2512.09254 null
2025-12-10 GLACIA: Instance-Aware Positional Reasoning for Glacial Lake Segmentation via Multimodal Large Language Model Lalit Maurya et.al. 2512.09251 null
2025-12-10 Training-free Context-adaptive Attention for Efficient Long Context Modeling Zeng You et.al. 2512.09238 null
2025-12-10 CORE: A Conceptual Reasoning Layer for Large Language Models Vishwas Hegde et.al. 2512.09222 null
2025-12-10 Targeting Misalignment: A Conflict-Aware Framework for Reward-Model-based LLM Alignment Zixuan Liu et.al. 2512.09212 null
2025-12-09 LLMs for Analog Circuit Design Continuum (ACDC) Yasaman Esfandiari et.al. 2512.09199 null
2025-12-09 TritonForge: Profiling-Guided Framework for Automated Triton Kernel Optimization Haonan Li et.al. 2512.09196 null
2025-12-09 WOLF: Werewolf-based Observations for LLM Deception and Falsehoods Mrinal Agarwal et.al. 2512.09187 null
2025-12-09 MindShift: Analyzing Language Models’ Reactions to Psychological Prompts Anton Vasiliuk et.al. 2512.09149 null
2025-12-09 Detecting Hallucinations in Graph Retrieval-Augmented Generation via Attention Patterns and Semantic Alignment Shanghao Li et.al. 2512.09148 null
2025-12-09 Knowledge-Guided Large Language Model for Automatic Pediatric Dental Record Understanding and Safe Antibiotic Recommendation Zihan Han et.al. 2512.09127 null
2025-12-09 A Categorical Analysis of Large Language Models and Why LLMs Circumvent the Symbol Grounding Problem Luciano Floridi et.al. 2512.09117 null
2025-12-09 Evolving Excellence: Automated Optimization of LLM-based Agents Paul Brookes et.al. 2512.09108 null
2025-12-09 Learning Unmasking Policies for Diffusion Language Models Metod Jazbec et.al. 2512.09106 null
2025-12-09 Explaining the Unseen: Multimodal Vision-Language Reasoning for Situational Awareness in Underground Mining Disasters Mizanur Rahman Jewel et.al. 2512.09092 null
2025-12-09 Calibrated Trust in Dealing with LLM Hallucinations: A Qualitative Study Adrian Ryser et.al. 2512.09088 null
2025-12-09 AgentComp: From Agentic Reasoning to Compositional Mastery in Text-to-Image Models Arman Zarei et.al. 2512.09081 null
2025-12-09 Llama-based source code vulnerability detection: Prompt engineering vs Fine tuning Dyna Soumhane Ouchebara et.al. 2512.09006 null
2025-12-09 Same Content, Different Answers: Cross-Modal Inconsistency in MLLMs Angela van Sprang et.al. 2512.08923 null
2025-12-09 Revisiting the Scaling Properties of Downstream Metrics in Large Language Model Training Jakub Krajewski et.al. 2512.08894 null
2025-12-09 Toward Faithful Retrieval-Augmented Generation with Sparse Autoencoders Guangzhi Xiong et.al. 2512.08892 null
2025-12-09 AI Didn’t Start the Fire: Examining the Stack Exchange Moderator and Contributor Strike Yiwei Wu et.al. 2512.08884 null
2025-12-09 When Tables Leak: Attacking String Memorization in LLM-Based Tabular Data Generation Joshua Ward et.al. 2512.08875 null
2025-12-09 Siamese-Driven Optimization for Low-Resolution Image Latent Embedding in Image Captioning Jing Jie Tan et.al. 2512.08873 null
2025-12-09 SimpleDevQA: Benchmarking Large Language Models on Development Knowledge QA Jing Zhang et.al. 2512.08867 null
2025-12-09 Ask, Answer, and Detect: Role-Playing LLMs for Personality Detection with Question-Conditioned Mixture-of-Experts Yifan Lyu et.al. 2512.08814 null
2025-12-09 PrivTune: Efficient and Privacy-Preserving Fine-Tuning of Large Language Models via Device-Cloud Collaboration Yi Liu et.al. 2512.08809 null
2025-12-09 A Systematic Evaluation of Preference Aggregation in Federated RLHF for Pluralistic Alignment of LLMs Mahmoud Srewa et.al. 2512.08786 null
2025-12-09 A Practical Guide for Designing, Developing, and Deploying Production-Grade Agentic AI Workflows Eranga Bandara et.al. 2512.08769 null
2025-12-09 Financial News Summarization: Can extractive methods still offer a true alternative to LLMs? Nicolas Reche et.al. 2512.08764 null
2025-12-09 Towards Foundation Models with Native Multi-Agent Intelligence Shuyue Hu et.al. 2512.08743 null
2025-12-09 LaMoSys3.5D: Enabling 3.5D-IC-Based Large Language Model Inference Serving Systems via Hardware/Software Co-Design Qipan Wang et.al. 2512.08731 null
2025-12-09 Exposing Hidden Biases in Text-to-Image Models via Automated Prompt Search Manos Plitsis et.al. 2512.08724 null
2025-12-09 Multi-Agent Intelligence for Multidisciplinary Decision-Making in Gastrointestinal Oncology Rongzhao Zhang et.al. 2512.08674 null
2025-12-09 An Agentic AI System for Multi-Framework Communication Coding Bohao Yang et.al. 2512.08659 null
2025-12-09 QSTN: A Modular Framework for Robust Questionnaire Inference with Large Language Models Maximilian Kreutner et.al. 2512.08646 null
2025-12-09 Chain-of-Image Generation: Toward Monitorable and Controllable Image Generation Young Kyung Kim et.al. 2512.08645 null
2025-12-09 See-Control: A Multimodal Agent Framework for Smartphone Interaction with a Robotic Arm Haoyu Zhao et.al. 2512.08629 null
2025-12-09 HealthcareNLP: where are we and what is next? Lifeng Han et.al. 2512.08617 null
2025-12-09 CogMCTS: A Novel Cognitive-Guided Monte Carlo Tree Search Framework for Iterative Heuristic Evolution with Large Language Models Hui Wang et.al. 2512.08609 null
2025-12-09 Bridging Scale Discrepancies in Robotic Control via Language-Based Action Representations Yuchi Zhang et.al. 2512.08548 null
2025-12-09 Curriculum Guided Massive Multi Agent System Solving For Robust Long Horizon Tasks Indrajit Kar et.al. 2512.08545 null
2025-12-09 Principles2Plan: LLM-Guided System for Operationalising Ethical Principles into Plans Tammy Zhong et.al. 2512.08536 null
2025-12-09 Autonomous Issue Resolver: Towards Zero-Touch Code Maintenance Aliaksei Kaliutau et.al. 2512.08492 null
2025-12-09 Soft Inductive Bias Approach via Explicit Reasoning Perspectives in Inappropriate Utterance Detection Using Large Language Models Ju-Young Kim et.al. 2512.08480 null
2025-12-09 A Multi-Agent LLM Framework for Design Space Exploration in Autonomous Driving Systems Po-An Shih et.al. 2512.08476 null
2025-12-09 Biothreat Benchmark Generation Framework for Evaluating Frontier AI Models III: Implementing the Bacterial Biothreat Benchmark (B3) Dataset Gary Ackerman et.al. 2512.08459 null
2025-12-09 Biothreat Benchmark Generation Framework for Evaluating Frontier AI Models II: Benchmark Generation Process Gary Ackerman et.al. 2512.08451 null
2025-12-09 What Triggers my Model? Contrastive Explanations Inform Gender Choices by Translation Models Janiça Hackenbuchner et.al. 2512.08440 null
2025-12-09 Attention is All You Need to Defend Against Indirect Prompt Injection Attacks in LLMs Yinan Zhong et.al. 2512.08417 null
2025-12-09 Towards Effective and Efficient Long Video Understanding of Multimodal Large Language Models via One-shot Clip Retrieval Tao Chen et.al. 2512.08410 null
2025-12-09 DFALLM: Achieving Generalizable Multitask Deepfake Detection by Optimizing Audio LLM Components Yupei Li et.al. 2512.08403 null
2025-12-09 The Unseen Bias: How Norm Discrepancy in Pre-Norm MLLMs Leads to Visual Information Loss Bozhou Li et.al. 2512.08374 null
2025-12-09 Reflecting with Two Voices: A Co-Adaptive Dual-Strategy Framework for LLM-Based Agent Decision Making Wentao Zhang et.al. 2512.08366 null
2025-12-09 The High Cost of Incivility: Quantifying Interaction Inefficiency via Multi-Agent Monte Carlo Simulations Benedikt Mangold et.al. 2512.08345 null
2025-12-09 Argus: A Multi-Agent Sensitive Information Leakage Detection Framework Based on Hierarchical Reference Relationships Bin Wang et.al. 2512.08326 null
2025-12-09 rSIM: Incentivizing Reasoning Capabilities of LLMs via Reinforced Strategy Injection Sijia Chen et.al. 2512.08300 null
2025-12-09 Systematization of Knowledge: Security and Safety in the Model Context Protocol Ecosystem Shiva Gaire et.al. 2512.08290 null
2025-12-09 Empowering smart app development with SolidGPT: an edge-cloud hybrid AI agent framework Liao Hu et.al. 2512.08286 null
2025-12-09 AgentEval: Generative Agents as Reliable Proxies for Human Evaluation of AI-Generated Content Thanh Vu et.al. 2512.08273 null
2025-12-09 Reasoning Models Ace the CFA Exams Jaisal Patel et.al. 2512.08270 null
2025-12-09 Token Sugar: Making Source Code Sweeter for LLMs through Token-Efficient Shorthand Zhensu Sun et.al. 2512.08266 null
2025-12-09 Beyond Traditional Diagnostics: Transforming Patient-Side Information into Predictive Insights with Knowledge Graphs and Prototypes Yibowen Zhao et.al. 2512.08261 null
2025-12-09 Chopper: A Multi-Level GPU Characterization Tool & Derived Insights Into LLM Training Inefficiency Marco Kurzynski et.al. 2512.08242 null
2025-12-09 SOP^2: Transfer Learning with Scene-Oriented Prompt Pool on 3D Object Detection Ching-Hung Cheng et.al. 2512.08223 null
2025-12-09 Secure or Suspect? Investigating Package Hallucinations of Shell Command in Original and Quantized LLMs Md Nazmul Haque et.al. 2512.08213 null
2025-12-09 MobileFineTuner: A Unified End-to-End Framework for Fine-Tuning LLMs on Mobile Phones Jiaxiang Geng et.al. 2512.08211 null
2025-12-09 ClinicalTrialsHub: Bridging Registries and Literature for Comprehensive Clinical Trial Access Jiwoo Park et.al. 2512.08193 null
2025-12-09 A Practical Framework for Evaluating Medical AI Security: Reproducible Assessment of Jailbreaking and Privacy Vulnerabilities Across Clinical Specialties Jinghao Wang et.al. 2512.08185 null
2025-12-09 Framing Climate Change on YouTube: North-South Divides in Narratives and Public Engagement Sanika Damle et.al. 2512.08183 null
2025-12-09 Chat with UAV – Human-UAV Interaction Based on Large Language Models Haoran Wang et.al. 2512.08145 null
2025-12-09 PolyLingua: Margin-based Inter-class Transformer for Robust Cross-domain Language Detection Ali Lotfi Rezaabad et.al. 2512.08143 null
2025-12-09 Biothreat Benchmark Generation Framework for Evaluating Frontier AI Models I: The Task-Query Architecture Gary Ackerman et.al. 2512.08130 null
2025-12-09 Universal Adversarial Suffixes Using Calibrated Gumbel-Softmax Relaxation Sampriti Soor et.al. 2512.08123 null
2025-12-08 Evolutionary perspective of large language models on shaping research insights into healthcare disparities David An et.al. 2512.08122 null
2025-12-08 Balanced Accuracy: The Right Metric for Evaluating LLM Judges – Explained through Youden’s J statistic Stephane Collot et.al. 2512.08121 null
2025-12-08 Detecting Ambiguity Aversion in Cyberattack Behavior to Inform Cognitive Defense Strategies Stephan Carney et.al. 2512.08107 null
2025-12-08 AgentCrypt: Advancing Privacy and (Secure) Computation in AI Agent Collaboration Harish Karthikeyan et.al. 2512.08104 null
2025-12-08 Training LLMs for Honesty via Confessions Manas Joglekar et.al. 2512.08093 null
2025-12-08 Adaptation of Embedding Models to Financial Filings via LLM Distillation Eliot Brenner et.al. 2512.08088 null
2025-12-08 Exploiting the Randomness of Large Language Models (LLM) in Text Classification Tasks: Locating Privileged Documents in Legal Matters Keith Huffman et.al. 2512.08083 null
2025-12-08 Short-Context Dominance: How Much Local Context Natural Language Actually Needs? Vala Vakilian et.al. 2512.08082 null
2025-12-08 Leveraging Machine Learning and Large Language Models for Automated Image Clustering and Description in Legal Discovery Qiang Mao et.al. 2512.08079 null
2025-12-08 A Comparative Study of Retrieval Methods in Azure AI Search Qiang Mao et.al. 2512.08078 null
2025-12-08 Unveiling Latent Knowledge in Chemistry Language Models through Sparse Autoencoders Jaron Cohen et.al. 2512.08077 null
2025-12-08 Large Language Models for Education and Research: An Empirical and User Survey-based Analysis Md Mostafizer Rahman et.al. 2512.08057 null
2025-12-08 CLARITY: Medical World Model for Guiding Treatment Decisions by Modeling Context-Aware Disease Trajectories in Latent Space Tianxingjian Ding et.al. 2512.08029 null
2025-12-08 Toward an AI Reasoning-Enabled System for Patient-Clinical Trial Matching Caroline N. Leach et.al. 2512.08026 null
2025-12-08 FRIEDA: Benchmarking Multi-Step Cartographic Reasoning in Vision-Language Models Jiyoon Pyo et.al. 2512.08016 null
2025-12-08 Bridging the Clinical Expertise Gap: Development of a Web-Based Platform for Accessible Time Series Forecasting and Analysis Aaron D. Mullen et.al. 2512.07992 null
2025-12-08 DeepCode: Open Agentic Coding Zongwei Li et.al. 2512.07921 link
2025-12-08 Relational Visual Similarity Thao Nguyen et.al. 2512.07833 null
2025-12-08 Do Generalisation Results Generalise? Matteo Boglioni et.al. 2512.07832 null
2025-12-08 Understanding Privacy Risks in Code Models Through Training Dynamics: A Causal Approach Hua Yang et.al. 2512.07814 null
2025-12-08 LLM Use for Mental Health: Crowdsourcing Users’ Sentiment-based Perspectives and Values from Social Discussions Lingyao Li et.al. 2512.07797 null
2025-12-08 Large Causal Models from Large Language Models Sridhar Mahadevan et.al. 2512.07796 null
2025-12-08 ReasonBENCH: Benchmarking the (In)Stability of LLM Reasoning Nearchos Potamitis et.al. 2512.07795 null
2025-12-08 Automating High Energy Physics Data Analysis with LLM-Powered Agents Eli Gendreau-Distler et.al. 2512.07785 null
2025-12-08 Mary, the Cheeseburger-Eating Vegetarian: Do LLMs Recognize Incoherence in Narratives? Karin de Langis et.al. 2512.07777 null
2025-12-08 RL-MTJail: Reinforcement Learning for Automated Black-Box Multi-Turn Jailbreaking of Large Language Models Xiqiao Xiong et.al. 2512.07761 null
2025-12-08 SpatialDreamer: Incentivizing Spatial Reasoning via Active Mental Imagery Meng Cao et.al. 2512.07733 null
2025-12-08 SAVE: Sparse Autoencoder-Driven Visual Information Enhancement for Mitigating Object Hallucination Sangha Park et.al. 2512.07730 null
2025-12-08 Privacy Practices of Browser Agents Alisha Ukani et.al. 2512.07725 null
2025-12-08 In-Context and Few-Shots Learning for Forecasting Time Series Data based on Large Language Models Saroj Gopali et.al. 2512.07705 null
2025-12-08 HalluShift++: Bridging Language and Vision through Internal Representation Shifts for Hierarchical Hallucinations in MLLMs Sujoy Nath et.al. 2512.07687 null
2025-12-08 When Large Language Models Do Not Work: Online Incivility Prediction through Graph Neural Networks Zihan Chen et.al. 2512.07684 null
2025-12-08 Depth-Wise Activation Steering for Honest Language Models Gracjan Góral et.al. 2512.07667 null
2025-12-08 Bridging Code Graphs and Large Language Models for Better Code Understanding Zeqi Chen et.al. 2512.07666 null
2025-12-08 Reliable agent engineering should integrate machine-compatible organizational principles R. Patrick Xian et.al. 2512.07665 null
2025-12-08 An AI-Powered Autonomous Underwater System for Sea Exploration and Scientific Research Hamad Almazrouei et.al. 2512.07652 null
2025-12-08 PCMind-2.1-Kaiyuan-2B Technical Report Kairong Luo et.al. 2512.07612 null
2025-12-08 Comparative Analysis and Parametric Tuning of PPO, GRPO, and DAPO for LLM Reasoning Enhancement Yongsheng Lian et.al. 2512.07611 null
2025-12-08 Metric-Fair Prompting: Treating Similar Samples Similarly Jing Wang et.al. 2512.07608 null
2025-12-08 Complementary Learning Approach for Text Classification using Large Language Models Navid Asgari et.al. 2512.07583 null
2025-12-08 All You Need Are Random Visual Tokens? Demystifying Token Pruning in VLLMs Yahong Wang et.al. 2512.07580 null
2025-12-08 A Simple Method to Enhance Pre-trained Language Models with Speech Tokens for Classification Nicolas Calbucura et.al. 2512.07571 null
2025-12-08 MoCoRP: Modeling Consistent Relations between Persona and Response for Persona-based Dialogue Kyungro Lee et.al. 2512.07544 null
2025-12-08 SwissGov-RSD: A Human-annotated, Cross-lingual Benchmark for Token-level Recognition of Semantic Differences Between Related Documents Michelle Wastl et.al. 2512.07538 null
2025-12-08 Beyond Real: Imaginary Extension of Rotary Position Embeddings for Long-Context LLMs Xiaoran Liu et.al. 2512.07525 link
2025-12-08 AutoICE: Automatically Synthesizing Verifiable C Code via LLM-driven Evolution Weilin Luo et.al. 2512.07501 null
2025-12-08 How Do LLMs Fail In Agentic Scenarios? A Qualitative Analysis of Success and Failure Scenarios of Various LLMs in Agentic Simulations JV Roig et.al. 2512.07497 null
2025-12-08 Enhancing Agentic RL with Progressive Reward Shaping and Value-based Sampling Policy Optimization Zhuoran Zhuang et.al. 2512.07478 null
2025-12-08 Understanding LLM Agent Behaviours via Game Theory: Strategy Recognition, Biases and Multi-Agent Dynamics Trung-Kiet Huynh et.al. 2512.07462 null
2025-12-08 Native Parallel Reasoner: Reasoning in Parallelism via Self-Distilled Reinforcement Learning Tong Wu et.al. 2512.07461 link
2025-12-08 Persian-Phi: Efficient Cross-Lingual Adaptation of Compact LLMs via Curriculum Learning Amir Mohammad Akhlaghi et.al. 2512.07454 null
2025-12-08 From Show Programmes to Data: Designing a Workflow to Make Performing Arts Ephemera Accessible Through Language Models Clarisse Bardiot et.al. 2512.07452 null
2025-12-08 MIDG: Mixture of Invariant Experts with knowledge injection for Domain Generalization in Multimodal Sentiment Analysis Yangle Li et.al. 2512.07430 null
2025-12-08 Revolutionizing Mixed Precision Quantization: Towards Training-free Automatic Proxy Discovery via Large Language Models Haidong Kang et.al. 2512.07419 null
2025-12-08 Do LLMs Trust the Code They Write? Francisco Ribeiro et.al. 2512.07404 null
2025-12-08 LUNE: Efficient LLM Unlearning via LoRA Fine-Tuning with Negative Examples Yezi Liu et.al. 2512.07375 null
2025-12-08 Communication-Efficient Serving for Video Diffusion Models with Latent Parallelism Zhiyuan Wu et.al. 2512.07350 null
2025-12-08 Generalized Referring Expression Segmentation on Aerial Photos Luís Marnoto et.al. 2512.07338 link
2025-12-08 DCO: Dynamic Cache Orchestration for LLM Accelerators through Predictive Management Zhongchun Zhou et.al. 2512.07312 null
2025-12-08 Exact Synthetic Populations for Scalable Societal and Market Modeling Thierry Petit et.al. 2512.07306 null
2025-12-08 Towards Accurate UAV Image Perception: Guiding Vision-Language Models with Stronger Task Prompts Mingning Guo et.al. 2512.07302 null
2025-12-08 Investigating Training and Generalization in Faithful Self-Explanations of Large Language Models Tomoki Doi et.al. 2512.07288 null
2025-12-08 Automatic Syntax Error Repair for Discrete Controller Synthesis using Large Language Model Yusei Ishimizu et.al. 2512.07261 null
2025-12-08 Ensembling LLM-Induced Decision Trees for Explainable and Robust Error Detection Mengqi Wang et.al. 2512.07246 null
2025-12-08 NeSTR: A Neuro-Symbolic Abductive Framework for Temporal Reasoning in Large Language Models Feng Liang et.al. 2512.07218 null
2025-12-08 MMRPT: MultiModal Reinforcement Pre-Training via Masked Vision-Dependent Reasoning Xuhui Zheng et.al. 2512.07203 null
2025-12-08 Generating Storytelling Images with Rich Chains-of-Reasoning Xiujie Song et.al. 2512.07198 null
2025-12-08 START: Spatial and Textual Learning for Chart Understanding Zhuoming Liu et.al. 2512.07186 link
2025-12-08 ContextualSHAP: Enhancing SHAP Explanations Through Contextual Language Generation Latifa Dwiyanti et.al. 2512.07178 null
2025-12-08 SPACE: Noise Contrastive Estimation Stabilizes Self-Play Fine-Tuning for Large Language Models Yibo Wang et.al. 2512.07175 null
2025-12-08 Improving the Throughput of Diffusion-based Large Language Models via a Training-Free Confidence-Aware Calibration Jucheng Shen et.al. 2512.07173 null
2025-12-08 When Privacy Meets Recovery: The Overlooked Half of Surrogate-Driven Privacy Preservation for MLLM Editing Siyuan Xu et.al. 2512.07166 null
2025-12-08 A Large-Scale Multimodal Dataset and Benchmarks for Human Activity Scene Understanding and Reasoning Siyang Jiang et.al. 2512.07136 null
2025-12-08 DART: Leveraging Multi-Agent Disagreement for Tool Recruitment in Multimodal Reasoning Nithin Sivakumaran et.al. 2512.07132 null
2025-12-08 RisConFix: LLM-based Automated Repair of Risk-Prone Drone Configurations Liping Han et.al. 2512.07122 null
2025-12-08 FOAM: Blocked State Folding for Memory-Efficient LLM Training Ziqing Wen et.al. 2512.07112 null
2025-12-08 The Geometry of Persona: Disentangling Personality from Reasoning in Large Language Models Zhixiang Wang et.al. 2512.07092 null
2025-12-08 Leveraging KV Similarity for Online Structured Pruning in LLMs Jungmin Lee et.al. 2512.07090 null
2025-12-08 ThinkTrap: Denial-of-Service Attacks against Black-box LLM Services via Infinite Thinking Yunzhe Li et.al. 2512.07086 null
2025-12-08 Do Large Language Models Truly Understand Cross-cultural Differences? Shiwei Guo et.al. 2512.07075 null
2025-12-08 Replicating TEMPEST at Scale: Multi-Turn Adversarial Attacks Against Trillion-Parameter Frontier Models Richard Young et.al. 2512.07059 null
2025-12-07 Reformulate, Retrieve, Localize: Agents for Repository-Level Bug Localization Genevieve Caumartin et.al. 2512.07022 null
2025-12-07 Latency-Response Theory Model: Evaluating Large Language Models via Response Accuracy and Chain-of-Thought Length Zhiyu Xu et.al. 2512.07019 null
2025-12-07 FVA-RAG: Falsification-Verification Alignment for Mitigating Sycophantic Hallucinations Mayank Ravishankara et.al. 2512.07015 null
2025-12-07 Block Sparse Flash Attention Daniel Ohayon et.al. 2512.07011 null
2025-12-07 Singing Timbre Popularity Assessment Based on Multimodal Large Foundation Model Zihao Wang et.al. 2512.06999 null
2025-12-07 Prompting-in-a-Series: Psychology-Informed Contents and Embeddings for Personality Recognition With Decoder-Only Models Jing Jie Tan et.al. 2512.06991 null
2025-12-07 Progress Ratio Embeddings: An Impatience Signal for Robust Length Control in Neural Text Generation Ivanhoé Botcazou et.al. 2512.06938 null
2025-12-07 Large Language Models and Forensic Linguistics: Navigating Opportunities and Threats in the Age of Generative AI George Mikros et.al. 2512.06922 null
2025-12-07 NeuroABench: A Multimodal Evaluation Benchmark for Neurosurgical Anatomy Identification Ziyang Song et.al. 2512.06921 null
2025-12-07 SoK: Trust-Authorization Mismatch in LLM Agent Interactions Guanquan Shi et.al. 2512.06914 null
2025-12-07 Robots with Attitudes: Influence of LLM-Driven Robot Personalities on Motivation and Performance Dennis Becker et.al. 2512.06910 null
2025-12-07 BabelCoder: Agentic Code Translation with Specification Alignment Fazle Rabbi et.al. 2512.06902 null
2025-12-07 An Analysis of Large Language Models for Simulating User Responses in Surveys Ziyun Yu et.al. 2512.06874 null
2025-12-07 Rhea: Role-aware Heuristic Episodic Attention for Conversational LLMs Wanyang Hong et.al. 2512.06869 null
2025-12-07 Do Persona-Infused LLMs Affect Performance in a Strategic Reasoning Game? John Licato et.al. 2512.06867 null
2025-12-07 Less Is More, but Where? Dynamic Token Compression via LLM-Guided Keyframe Prior Yulin Li et.al. 2512.06866 null
2025-12-07 Spatial Retrieval Augmented Autonomous Driving Xiaosong Jia et.al. 2512.06865 null
2025-12-07 JT-DA: Enhancing Data Analysis with Tool-Integrated Table Reasoning Large Language Models Ce Chi et.al. 2512.06859 null
2025-12-07 Formal that “Floats” High: Formal Verification of Floating Point Arithmetic Hansa Mohanty et.al. 2512.06850 null
2025-12-07 CKG-LLM: LLM-Assisted Detection of Smart Contract Access Control Vulnerabilities Based on Knowledge Graphs Xiaoqi Li et.al. 2512.06846 null
2025-12-07 Leveraging LLMs to support co-evolution between definitions and instances of textual DSLs Weixing Zhang et.al. 2512.06836 null
2025-12-07 Large Language Model-Based Generation of Discharge Summaries Tiago Rodrigues et.al. 2512.06812 null
2025-12-07 MMDuet2: Enhancing Proactive Interaction of Video MLLMs with Multi-Turn Reinforcement Learning Yueqian Wang et.al. 2512.06810 null
2025-12-07 Optimal and Diffusion Transports in Machine Learning Gabriel Peyré et.al. 2512.06797 null
2025-12-07 LLM4SFC: Sequential Function Chart Generation via Large Language Models Ofek Glick et.al. 2512.06787 null
2025-12-07 From Description to Score: Can LLMs Quantify Vulnerabilities? Sima Jafarikhah et.al. 2512.06781 null
2025-12-07 From Next-Token to Next-Block: A Principled Adaptation Path for Diffusion LLMs Yuchuan Tian et.al. 2512.06776 link
2025-12-07 Becoming Experienced Judges: Selective Test-Time Learning for Evaluators Seungyeon Jwa et.al. 2512.06751 null
2025-12-07 DoVer: Intervention-Driven Auto Debugging for LLM Multi-Agent Systems Ming Ma et.al. 2512.06749 null
2025-12-07 PrivLLMSwarm: Privacy-Preserving LLM-Driven UAV Swarms for Secure IoT Surveillance Jifar Wakuma Ayana et.al. 2512.06747 null
2025-12-07 A Patient-Doctor-NLP-System to contest inequality for less privileged Subrit Dikshit et.al. 2512.06734 null
2025-12-07 “The Dentist is an involved parent, the bartender is not”: Revealing Implicit Biases in QA with Implicit BBQ Aarushi Wagh et.al. 2512.06732 null
2025-12-07 KV-CAR: KV Cache Compression using Autoencoders and KV Reuse in Large Language Models Sourjya Roy et.al. 2512.06727 null
2025-12-07 The Role of Entropy in Visual Grounding: Analysis and Optimization Shuo Li et.al. 2512.06726 null
2025-12-07 ProAgent: Harnessing On-Demand Sensory Contexts for Proactive LLM Agent Systems Bufang Yang et.al. 2512.06721 null
2025-12-07 Cognitive Control Architecture (CCA): A Lifecycle Supervision Framework for Robustly Aligned AI Agents Zhibo Liang et.al. 2512.06716 null
2025-11-06 Trained on Tokens, Calibrated on Concepts: The Emergence of Semantic Calibration in LLMs Preetum Nakkiran et.al. 2511.04869 null
2025-11-06 Software Defined Vehicle Code Generation: A Few-Shot Prompting Approach Quang-Dung Nguyen et.al. 2511.04849 null
2025-11-06 Grounded Test-Time Adaptation for LLM Agents Arthur Chen et.al. 2511.04847 null
2025-11-06 Explore Data Left Behind in Reinforcement Learning for Reasoning Language Models Chenxi Liu et.al. 2511.04800 null
2025-11-06 ReGen: Generative Robot Simulation via Inverse Design Phat Nguyen et.al. 2511.04769 null
2025-11-06 Surprisal reveals diversity gaps in image captioning and different scorers change the story Nikolai Ilinykh et.al. 2511.04754 null
2025-11-06 Trustworthiness Calibration Framework for Phishing Email Detection Using Large Language Models Daniyal Ganiuly et.al. 2511.04728 null
2025-11-06 IndicVisionBench: Benchmarking Cultural and Multilingual Understanding in VLMs Ali Faraz et.al. 2511.04727 null
2025-11-06 Learning to reason about rare diseases through retrieval-augmented agents Ha Young Kim et.al. 2511.04720 null
2025-11-06 Benchmark Designers Should “Train on the Test Set” to Expose Exploitable Non-Visual Shortcuts Ellis Brown et.al. 2511.04655 null
2025-11-06 Logit-Entropy Adaptive Stopping Heuristic for Efficient Chain-of-Thought Reasoning Mohammad Atif Quamar et.al. 2511.04654 null
2025-11-06 Optimal Inference Schedules for Masked Diffusion Models Sitan Chen et.al. 2511.04647 null
2025-11-06 When retrieval outperforms generation: Dense evidence retrieval for scalable fake news detection Alamgir Munir Qazi et.al. 2511.04643 link
2025-11-06 PixCLIP: Achieving Fine-grained Visual Language Understanding via Any-granularity Pixel-Text Alignment Learning Yicheng Xiao et.al. 2511.04601 null
2025-11-06 Question the Questions: Auditing Representation in Online Deliberative Processes Soham De et.al. 2511.04588 null
2025-11-06 ARETE: an R package for Automated REtrieval from TExt with large language models Vasco V. Branco et.al. 2511.04573 null
2025-11-06 Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm Jingqi Tong et.al. 2511.04570 link
2025-11-06 LLM-as-a-Judge: Toward World Models for Slate Recommendation Systems Baptiste Bonin et.al. 2511.04541 null
2025-11-06 From Model to Breach: Towards Actionable LLM-Generated Vulnerabilities Reporting Cyril Vallez et.al. 2511.04538 null
2025-11-06 Large Language Models for Cyber Security Raunak Somani et.al. 2511.04508 null
2025-11-06 RAGalyst: Automated Human-Aligned Agentic Evaluation for Domain-Specific RAG Joshua Gao et.al. 2511.04502 null
2025-11-06 Large language models replicate and predict human cooperation across experiments in game theory Andrea Cera Palatsi et.al. 2511.04500 null
2025-11-06 Decoding Emergent Big Five Traits in Large Language Models: Temperature-Dependent Expression and Architectural Clustering Christos-Nikolaos Zacharopoulos et.al. 2511.04499 null
2025-11-06 RUST-BENCH: Benchmarking LLM Reasoning on Unstructured Text within Structured Tables Nikhil Abhyankar et.al. 2511.04491 null
2025-11-06 Perceptions of AI Bad Behavior: Variations on Discordant Non-Performance Jaime Banks et.al. 2511.04487 null
2025-11-06 Promoting Sustainable Web Agents: Benchmarking and Estimating Energy Consumption through Empirical and Theoretical Analysis Lars Krupp et.al. 2511.04481 null
2025-11-06 Enabling Dynamic Sparsity in Quantized LLM Inference Rongxiang Wang et.al. 2511.04477 null
2025-11-06 Beyond Shortest Path: Agentic Vehicular Routing with Semantic Context Carnot Braun et.al. 2511.04464 null
2025-11-06 Speed at the Cost of Quality? The Impact of LLM Agent Assistance on Software Development Hao He et.al. 2511.04427 null
2025-11-06 The Illusion of Certainty: Uncertainty quantification for LLMs fails under ambiguity Tim Tomov et.al. 2511.04418 null
2025-11-06 Post-Training LLMs as Better Decision-Making Agents: A Regret-Minimization Approach Chanwoo Park et.al. 2511.04393 null
2025-11-06 Multi-Task Learning for Visually Grounded Reasoning in Gastrointestinal VQA Itbaan Safwan et.al. 2511.04384 null
2025-11-06 HPC-Vis: A Visual Analytic System for Interactive Exploration of Historical Painter Cohorts Yingping Yang et.al. 2511.04383 null
2025-11-06 Towards Aligning Multimodal LLMs with Human Experts: A Focus on Parent-Child Interaction Weiyan Shi et.al. 2511.04366 null
2025-11-06 Where Do LLMs Still Struggle? An In-Depth Analysis of Code Generation Benchmarks Amir Molzam Sharifloo et.al. 2511.04355 null
2025-11-06 Differentially Private In-Context Learning with Nearest Neighbor Search Antti Koskela et.al. 2511.04332 null
2025-11-06 RxSafeBench: Identifying Medication Safety Issues of Large Language Models in Simulated Consultation Jiahao Zhao et.al. 2511.04328 null
2025-11-06 AdversariaLLM: A Unified and Modular Toolbox for LLM Robustness Research Tim Beyer et.al. 2511.04316 null
2025-11-06 Measuring economic outlook in the news timely and efficiently Elliot Beck et.al. 2511.04299 null
2025-11-06 Robustness of Minimum-Volume Nonnegative Matrix Factorization under an Expanded Sufficiently Scattered Condition Giovanni Barbarino et.al. 2511.04291 null
2025-11-06 A Tool for Benchmarking Large Language Models’ Robustness in Assessing the Realism of Driving Scenarios Jiahui Wu et.al. 2511.04267 null
2025-11-06 SSPO: Subsentence-level Policy Optimization Kun Yang et.al. 2511.04256 null
2025-11-06 Efficient Topic Extraction via Graph-Based Labeling: A Lightweight Alternative to Deep Models Salma Mekaoui et.al. 2511.04248 null
2025-11-06 Reusing Pre-Training Data at Test Time is a Compute Multiplier Alex Fang et.al. 2511.04234 null
2025-11-06 Black-Box Guardrail Reverse-engineering Attack Hongwei Yao et.al. 2511.04215 null
2025-11-06 Block Rotation is All You Need for MXFP4 Quantization Yuantian Shao et.al. 2511.04214 null
2025-11-06 Can we trust LLMs as a tutor for our students? Evaluating the Quality of LLM-generated Feedback in Statistics Exams Markus Herklotz et.al. 2511.04213 null
2025-11-06 LLM-as-a-Judge is Bad, Based on AI Attempting the Exam Qualifying for the Member of the Polish National Board of Appeal Michał Karp et.al. 2511.04205 null
2025-11-06 Computational Turing Test Reveals Systematic Differences Between Human and AI Language Nicolò Pagan et.al. 2511.04195 null
2025-11-06 Explaining Software Vulnerabilities with Large Language Models Oshando Johnson et.al. 2511.04179 null
2025-11-06 Transforming Mentorship: An AI Powered Chatbot Approach to University Guidance Mashrur Rahman et.al. 2511.04172 null
2025-11-06 Are We Aligned? A Preliminary Investigation of the Alignment of Responsible AI Values between LLMs and Human Judgment Asma Yamani et.al. 2511.04157 null
2025-11-06 BAPPA: Benchmarking Agents, Plans, and Pipelines for Automated Text-to-SQL Generation Fahim Ahmed et.al. 2511.04153 null
2025-11-06 Implementation of transformer-based LLMs with large-scale optoelectronic neurons on a CMOS image sensor platform Neil Na et.al. 2511.04136 null
2025-11-06 Exploring the Feasibility of End-to-End Large Language Model as a Compiler Hongbin Zhang et.al. 2511.04132 null
2025-11-06 RIDE: Difficulty Evolving Perturbation with Item Response Theory for Mathematical Reasoning Xinyuan Li et.al. 2511.04120 null
2025-11-06 How Natural Language Proficiency Shapes GenAI Code for Software Engineering Tasks Ruksit Rojpaisarnkit et.al. 2511.04115 null
2025-11-06 Batch Prompting Suppresses Overthinking Reasoning Under Constraint: How Batch Prompting Suppresses Overthinking in Reasoning Models Wenmo Qiu et.al. 2511.04108 null
2025-11-06 KGFR: A Foundation Retriever for Generalized Knowledge Graph Question Answering Yuanning Cui et.al. 2511.04093 null
2025-11-06 E-CARE: An Efficient LLM-based Commonsense-Augmented Framework for E-Commerce Ge Zhang et.al. 2511.04087 null
2025-11-06 Caption Injection for Optimization in Generative Search Engine Xiaolu Chen et.al. 2511.04080 null
2025-11-06 The truth is no diaper: Human and AI-generated associations to emotional words Špela Vintar et.al. 2511.04077 null
2025-11-06 Agentmandering: A Game-Theoretic Framework for Fair Redistricting via Large Language Model Agents Hao Li et.al. 2511.04076 null
2025-11-06 Plan of Knowledge: Retrieval-Augmented Large Language Models for Temporal Knowledge Graph Question Answering Xinying Qian et.al. 2511.04072 null
2025-11-06 TXL Fusion: A Hybrid Machine Learning Framework Integrating Chemical Heuristics and Large Language Models for Topological Materials Discovery Arif Ullah et.al. 2511.04068 null
2025-11-06 DartQuant: Efficient Rotational Distribution Calibration for LLM Quantization Yuantian Shao et.al. 2511.04063 null
2025-11-06 Interpreting Multi-Attribute Confounding through Numerical Attributes in Large Language Models Hirohane Takagi et.al. 2511.04053 null
2025-11-06 An LLM-based Framework for Human-Swarm Teaming Cognition in Disaster Search and Rescue Kailun Ji et.al. 2511.04042 null
2025-11-06 PICNIC: Silicon Photonic Interconnected Chiplets with Computational Network and In-memory Computing for LLM Inference Acceleration Yue Jiet Chong et.al. 2511.04036 null
2025-11-06 Detecting Silent Failures in Multi-Agentic AI Trajectories Divya Pathak et.al. 2511.04032 null
2025-11-06 Abductive Inference in Retrieval-Augmented Language Models: Generating and Validating Missing Premises Shiyin Lin et.al. 2511.04020 null
2025-11-06 Specification-Guided Vulnerability Detection with Large Language Models Hao Zhu et.al. 2511.04014 null
2025-11-06 PSD2Code: Automated Front-End Code Generation from Design Files via Multimodal Large Language Models Yongxi Chen et.al. 2511.04012 null
2025-11-06 Memory- and Latency-Constrained Inference of Large Language Models via Adaptive Split Computing Mingyu Sung et.al. 2511.04002 null
2025-11-06 Hybrid Fuzzing with LLM-Guided Input Mutation and Semantic Feedback Shiyin Lin et.al. 2511.03995 null
2025-11-06 TwIST: Rigging the Lottery in Transformers with Independent Subnetwork Training Michael Menezes et.al. 2511.03983 null
2025-11-06 LLMs and Cultural Values: the Impact of Prompt Language and Explicit Cultural Framing Bram Bulté et.al. 2511.03980 null
2025-11-06 Direct Semantic Communication Between Large Language Models via Vector Translation Fu-Chun Yang et.al. 2511.03945 null
2025-11-06 MIDI-LLM: Adapting Large Language Models for Text-to-MIDI Music Generation Shih-Lun Wu et.al. 2511.03942 null
2025-11-06 RLHF: A comprehensive Survey for Cultural, Multimodal and Low Latency Alignment Methods Raghav Sharma et.al. 2511.03939 null
2025-11-06 SynQuE: Estimating Synthetic Dataset Quality Without Annotations Arthur Chen et.al. 2511.03928 null
2025-11-06 Collaborative Agents for Automated Program Repair in Ruby Nikta Akbarpour et.al. 2511.03925 null
2025-11-05 The Human Flourishing Geographic Index: A County-Level Dataset for the United States, 2013–2023 Stefano M. Iacus et.al. 2511.03915 null
2025-11-05 GRAD: Graph-Retrieved Adaptive Decoding for Hallucination Mitigation Manh Nguyen et.al. 2511.03900 null
2025-11-05 Secure Code Generation at Scale with Reflexion Arup Datta et.al. 2511.03898 null
2025-11-05 KnowThyself: An Agentic Assistant for LLM Interpretability Suraj Prasai et.al. 2511.03878 null
2025-11-05 OMPILOT: Harnessing Transformer Models for Auto Parallelization to Shared Memory Computing Paradigms Arijit Bhattacharjee et.al. 2511.03866 null
2025-11-05 GAIA: Geothermal Analytics and Intelligent Agent Randy Harsuko et.al. 2511.03852 null
2025-11-05 To See or To Read: User Behavior Reasoning in Multimodal LLMs Tianning Dong et.al. 2511.03845 null
2025-11-05 ASAP: an Agentic Solution to Auto-optimize Performance of Large-Scale LLM Training Yuran Ding et.al. 2511.03844 null
2025-11-05 Divide, Cache, Conquer: Dichotomic Prompting for Efficient Multi-Label LLM-Based Classification Mikołaj Langner et.al. 2511.03830 null
2025-11-05 STARS: Segment-level Token Alignment with Rejection Sampling in Large Language Models Mohammad Atif Quamar et.al. 2511.03827 null
2025-11-05 How Different Tokenization Algorithms Impact LLMs and Transformer Models for Binary Code Analysis Ahmed Mostafa et.al. 2511.03825 null
2025-11-05 PLLuM: A Family of Polish Large Language Models Jan Kocoń et.al. 2511.03823 null
2025-11-05 Expert Evaluation of LLM World Models: A High-$T_c$ Superconductivity Case Study Haoyu Guo et.al. 2511.03782 null
2025-11-05 Scaling Agent Learning via Experience Synthesis Zhaorun Chen et.al. 2511.03773 link
2025-11-05 Disentangled Concepts Speak Louder Than Words: Explainable Video Action Recognition Jongseo Lee et.al. 2511.03725 null
2025-11-05 Outbidding and Outbluffing Elite Humans: Mastering Liar’s Poker via Self-Play and Reinforcement Learning Richard Dewey et.al. 2511.03724 null
2025-11-05 LLM-enhanced Air Quality Monitoring Interface via Model Context Protocol Yu-Erh Pan et.al. 2511.03706 null
2025-11-05 Do Androids Dream of Unseen Puppeteers? Probing for a Conspiracy Mindset in Large Language Models Francesco Corso et.al. 2511.03699 null
2025-11-05 AnaFlow: Agentic LLM-based Workflow for Reasoning-Driven Explainable and Sample-Efficient Analog Circuit Sizing Mohsen Ahmadzadeh et.al. 2511.03697 null
2025-11-05 Whisper Leak: a side-channel attack on Large Language Models Geoff McDonald et.al. 2511.03675 null
2025-11-05 Watermarking Large Language Models in Europe: Interpreting the AI Act in Light of Technology Thomas Souverain et.al. 2511.03641 null
2025-11-05 Towards Transparent Stance Detection: A Zero-Shot Approach Using Implicit and Explicit Interpretability Apoorva Upadhyaya et.al. 2511.03635 null
2025-11-05 LiveTradeBench: Seeking Real-World Alpha with Large Language Models Haofei Yu et.al. 2511.03628 null
2025-11-05 PerfDojo: Automated ML Library Generation for Heterogeneous Architectures Andrei Ivanov et.al. 2511.03586 null
2025-11-05 ASVRI-Legal: Fine-Tuning LLMs with Retrieval Augmented Generation for Enhanced Legal Regulation One Octadion et.al. 2511.03563 null
2025-11-05 MultiZebraLogic: A Multilingual Logical Reasoning Benchmark Sofie Helene Bruun et.al. 2511.03553 null
2025-11-05 Uncovering Code Insights: Leveraging GitHub Artifacts for Deeper Code Understanding Ziv Nevo et.al. 2511.03549 null
2025-11-05 U2F: Encouraging SWE-Agent to Seize Novelty without Losing Feasibility Wencheng Ye et.al. 2511.03517 null
2025-11-05 One Battle After Another: Probing LLMs’ Limits on Multi-Turn Instruction Following with a Benchmark Evolving Framework Qi Jia et.al. 2511.03508 null
2025-11-05 BanglaSTEM: A Parallel Corpus for Technical Domain Bangla-English Translation Kazi Reyazul Hasan et.al. 2511.03498 null
2025-11-05 RAGBoost: Efficient Retrieval-Augmented Generation with Accuracy-Preserving Context Reuse Yinsicheng Jiang et.al. 2511.03475 null
2025-11-05 Towards Scalable Web Accessibility Audit with MLLMs as Copilots Ming Gu et.al. 2511.03471 null
2025-11-05 CareMedEval dataset: Evaluating Critical Appraisal and Reasoning in the Biomedical Field Doria Bonzi et.al. 2511.03441 null
2025-11-05 Light over Heavy: Automated Performance Requirements Quantification with Linguistic Inducement Shihai Wang et.al. 2511.03421 null
2025-11-05 Knowledge-Augmented Question Error Correction for Chinese Question Answer System with QuestionRAG Longpeng Qiu et.al. 2511.03410 null
2025-11-05 Efficient Reasoning via Thought-Training and Thought-Free Inference Canhui Wu et.al. 2511.03408 null
2025-11-05 Towards Realistic Project-Level Code Generation via Multi-Agent Collaboration and Semantic Architecture Modeling Qianhui Zhao et.al. 2511.03404 null
2025-11-05 GUIDES: Guidance Using Instructor-Distilled Embeddings for Pre-trained Robot Policy Enhancement Minquan Gao et.al. 2511.03400 null
2025-11-05 Computational Imaging Meets LLMs: Zero-Shot IDH Mutation Prediction in Brain Gliomas Syed Muqeem Mahmood et.al. 2511.03376 null
2025-11-05 LFC-DA: Logical Formula-Controlled Data Augmentation for Enhanced Logical Reasoning Shenghao Li et.al. 2511.03372 null
2025-11-05 EQ-Negotiator: Dynamic Emotional Personas Empower Small Language Models for Edge-Deployable Credit Negotiation Yunbo Long et.al. 2511.03370 null
2025-11-05 Silenced Biases: The Dark Side LLMs Learned to Refuse Rom Himelstein et.al. 2511.03369 null
2025-11-05 A Modular, Data-Free Pipeline for Multi-Label Intention Recognition in Transportation Agentic AI Applications Xiaocai Zhang et.al. 2511.03363 null
2025-11-05 Multi-Object Tracking Retrieval with LLaVA-Video: A Training-Free Solution to MOT25-StAG Challenge Yi Yang et.al. 2511.03332 null
2025-11-05 Benchmarking the Thinking Mode of Multimodal Large Language Models in Clinical Tasks Jindong Hong et.al. 2511.03328 null
2025-11-05 SurgViVQA: Temporally-Grounded Video Question Answering for Surgical Scene Understanding Mauro Orazio Drago et.al. 2511.03325 null
2025-11-05 TASU: Text-Only Alignment for Speech Understanding Jing Peng et.al. 2511.03310 null
2025-11-05 How to Evaluate Speech Translation with Source-Aware Neural MT Metrics Mauro Cettolo et.al. 2511.03295 null
2025-11-05 UMDAM: A Unified Data Layout and DRAM Address Mapping for Heterogenous NPU-PIM Hai Huang et.al. 2511.03293 null
2025-11-05 Let the Bees Find the Weak Spots: A Path Planning Perspective on Multi-Turn Jailbreak Attacks against LLMs Yize Liu et.al. 2511.03271 null
2025-11-05 SCALE: Upscaled Continual Learning of Large Language Models Jin-woo Lee et.al. 2511.03270 null
2025-11-05 Comparing the Performance of LLMs in RAG-based Question-Answering: A Case Study in Computer Science Literature Ranul Dayarathne et.al. 2511.03261 null
2025-11-05 Auditing M-LLMs for Privacy Risks: A Synthetic Benchmark and Evaluation Framework Junhao Li et.al. 2511.03248 null
2025-11-05 Death by a Thousand Prompts: Open Model Vulnerability Analysis Amy Chang et.al. 2511.03247 null
2025-11-05 IndicSuperTokenizer: An Optimized Tokenizer for Indic Multilingual LLMs Souvik Rana et.al. 2511.03237 null
2025-11-05 From Five Dimensions to Many: Large Language Models as Precise and Interpretable Psychological Profilers Yi-Fei Liu et.al. 2511.03235 null
2025-11-05 Multimodal-Wireless: A Large-Scale Dataset for Sensing and Communication Tianhao Mao et.al. 2511.03220 null
2025-11-05 Hybrid Fact-Checking that Integrates Knowledge Graphs, Large Language Models, and Search-Based Retrieval Agents Improves Interpretable Claim Verification Shaghayegh Kolli et.al. 2511.03217 null
2025-11-05 LGM: Enhancing Large Language Models with Conceptual Meta-Relations and Iterative Retrieval Wenchang Lei et.al. 2511.03214 null
2025-11-05 QG-CoC: Question-Guided Chain-of-Captions for Large Multimodal Models Kuei-Chun Kao et.al. 2511.03206 null
2025-11-05 Large Language Models as Information Sources: Distinctive Characteristics and Types of Low-Quality Information Jiawei Zhou et.al. 2511.03198 null
2025-11-05 Understanding Robustness of Model Editing in Code LLMs: An Empirical Study Vinaik Chhetri et.al. 2511.03182 null
2025-11-05 Learning-based Cooperative Robotic Paper Wrapping: A Unified Control Policy with Residual Force Control Rewida Ali et.al. 2511.03181 null
2025-11-05 BengaliMoralBench: A Benchmark for Auditing Moral Reasoning in Large Language Models within Bengali Language and Culture Shahriyar Zaman Ridoy et.al. 2511.03180 null
2025-11-05 Toward Autonomous Engineering Design: A Knowledge-Guided Multi-Agent Framework Varun Kumar et.al. 2511.03179 null
2025-11-05 SurgAnt-ViVQA: Learning to Anticipate Surgical Events through GRU-Driven Temporal Cross-Attention Shreyas C. Dhake et.al. 2511.03178 null
2025-11-05 AI as We Describe It: How Large Language Models and Their Applications in Health are Represented Across Channels of Public Discourse Jiawei Zhou et.al. 2511.03174 null
2025-11-05 Measuring Aleatoric and Epistemic Uncertainty in LLMs: Empirical Evaluation on ID and OOD QA Tasks Kevin Wang et.al. 2511.03166 null
2025-11-05 RefAgent: A Multi-agent LLM-based Framework for Automatic Software Refactoring Khouloud Oueslati et.al. 2511.03153 null
2025-11-05 From Measurement to Expertise: Empathetic Expert Adapters for Context-Based Empathy in Conversational AI Agents Erfan Shayegani et.al. 2511.03143 null
2025-11-05 A Proprietary Model-Based Safety Response Framework for AI Agents Qi Li et.al. 2511.03138 null
2025-11-05 Using Multi-modal Large Language Model to Boost Fireworks Algorithm’s Ability in Settling Challenging Optimization Tasks Shipeng Cen et.al. 2511.03137 null
2025-11-05 From Insight to Exploit: Leveraging LLM Collaboration for Adaptive Adversarial Text Generation Najrin Sultana et.al. 2511.03128 null
2025-11-05 Control Barrier Function for Aligning Large Language Models Yuya Miyaoka et.al. 2511.03121 null
2025-11-05 Large language models require a new form of oversight: capability-based monitoring Katherine C. Kellogg et.al. 2511.03106 null
2025-11-05 CARMA: Comprehensive Automatically-annotated Reddit Mental Health Dataset for Arabic Saad Mankarious et.al. 2511.03102 null
2025-11-05 ALAS: Transactional and Dynamic Multi-Agent LLM Planning Longling Geng et.al. 2511.03094 null
2025-11-05 SnapStream: Efficient Long Sequence Decoding on Dataflow Accelerators Jonathan Li et.al. 2511.03092 null
2025-11-05 PolyNorm: Few-Shot LLM-Based Text Normalization for Text-to-Speech Michel Wong et.al. 2511.03080 null
2025-11-04 A Collaborative Reasoning Framework for Anomaly Diagnostics in Underwater Robotics Markus Buchholz et.al. 2511.03075 null
2025-11-04 Epidemiology of Large Language Models: A Benchmark for Observational Distribution Knowledge Drago Plecko et.al. 2511.03070 null
2025-11-04 Reading Between the Lines: The One-Sided Conversation Problem Victoria Ebert et.al. 2511.03056 null
2025-11-04 No-Human in the Loop: Agentic Evaluation at Scale for Recommendation Tao Zhang et.al. 2511.03051 null
2025-11-04 ROBoto2: An Interactive System and Dataset for LLM-assisted Clinical Trial Risk of Bias Assessment Anthony Hevia et.al. 2511.03048 null
2025-11-04 Unsupervised Evaluation of Multi-Turn Objective-Driven Interactions Emi Soroka et.al. 2511.03047 null
2025-11-04 Data-Efficient Adaptation and a Novel Evaluation Method for Aspect-based Sentiment Analysis Yan Cathy Hua et.al. 2511.03034 null
2025-11-04 PublicAgent: Multi-Agent Design Principles From an LLM-Based Open Data Analysis Framework Sina Montazeri et.al. 2511.03023 null
2025-11-04 LEGO-Eval: Towards Fine-Grained Evaluation on Synthesizing 3D Embodied Environments with Tool Augmentation Gyeom Hwangbo et.al. 2511.03001 null
2025-11-04 Zero-shot data citation function classification using transformer-based large language models (LLMs) Neil Byers et.al. 2511.02936 null
2025-11-04 Cache Mechanism for Agent RAG Systems Shuhang Lin et.al. 2511.02919 null
2025-11-04 Adaptive and Robust Data Poisoning Detection and Sanitization in Wearable IoT Systems using Large Language Models W. K. M Mithsara et.al. 2511.02894 null
2025-11-04 Agent-Omni: Test-Time Multimodal Reasoning via Model Coordination for Understanding Anything Huawei Lin et.al. 2511.02834 null
2025-11-04 Can LLMs subtract numbers? Mayank Jobanputra et.al. 2511.02795 null
2025-11-04 When One Modality Sabotages the Others: A Diagnostic Lens on Multimodal Reasoning Chenyu Zhang et.al. 2511.02794 null
2025-11-04 When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought Yiyang Zhou et.al. 2511.02779 null
2025-11-04 ConMeZO: Adaptive Descent-Direction Sampling for Gradient-Free Finetuning of Large Language Models Lejs Deen Behric et.al. 2511.02757 null
2025-11-04 Controlling Performance and Budget of a Centralized Multi-agent LLM System with Reinforcement Learning Bowen Jin et.al. 2511.02755 null
2025-11-04 AI Diffusion in Low Resource Language Countries Amit Misra et.al. 2511.02752 null
2025-11-04 Agentic World Modeling for 6G: Near-Real-Time Generative State-Space Reasoning Farhad Rezazadeh et.al. 2511.02748 null
2025-11-04 CostBench: Evaluating Multi-Turn Cost-Optimal Planning and Adaptation in Dynamic Environments for LLM Tool-Use Agents Jiayu Liu et.al. 2511.02734 link
2025-11-04 LLEXICORP: End-user Explainability of Convolutional Neural Networks Vojtěch Kůr et.al. 2511.02720 null
2025-11-04 ReleaseEval: A Benchmark for Evaluating Language Models in Automated Release Note Generation Qianru Meng et.al. 2511.02713 null
2025-11-04 VidEmo: Affective-Tree Reasoning for Emotion-Centric Video Foundation Models Zhicheng Zhang et.al. 2511.02712 null
2025-11-04 Curriculum Design for Trajectory-Constrained Agent: Compressing Chain-of-Thought Tokens in LLMs Georgios Tzannetos et.al. 2511.02690 null
2025-11-04 Optimal Singular Damage: Efficient LLM Inference in Low Storage Regimes Mohammadsajad Alipour et.al. 2511.02681 null
2025-11-04 EasyTUS: A Comprehensive Framework for Fast and Accurate Table Union Search across Data Lakes Tim Otto et.al. 2511.02674 null
2025-11-04 Apriel-H1: Towards Efficient Enterprise Reasoning Models Oleksiy Ostapenko et.al. 2511.02651 null
2025-11-04 Federated Attention: A Distributed Paradigm for Collaborative LLM Inference over Edge Networks Xiumei Deng et.al. 2511.02647 null
2025-11-04 DecompSR: A dataset for decomposed analyses of compositional multihop spatial reasoning Lachlan McPheat et.al. 2511.02627 null
2025-11-04 Understanding New-Knowledge-Induced Factual Hallucinations in LLMs: Analysis, Solution, and Interpretation Renfei Dang et.al. 2511.02626 null
2025-11-04 The Realignment Problem: When Right becomes Wrong in LLMs Aakash Sen Sharma et.al. 2511.02623 null
2025-11-04 Verifying LLM Inference to Prevent Model Weight Exfiltration Roy Rinberg et.al. 2511.02620 null
2025-11-04 UniChange: Unifying Change Detection with Multimodal Large Language Model Xu Zhang et.al. 2511.02607 null
2025-11-04 CGES: Confidence-Guided Early Stopping for Efficient and Accurate Self-Consistency Ehsan Aghazadeh et.al. 2511.02603 null
2025-11-04 Next Token Knowledge Tracing: Exploiting Pretrained LLM Representations to Decode Student Behaviour Max Norris et.al. 2511.02599 null
2025-11-04 A Large Language Model for Corporate Credit Scoring Chitro Majumdar et.al. 2511.02593 null
2025-11-04 The ORCA Benchmark: Evaluating Real-World Calculation Accuracy in Large Language Models Claudia Herambourg et.al. 2511.02589 null
2025-11-04 Smart-Hiring: An Explainable end-to-end Pipeline for CV Information Extraction and Job Matching Kenza Khelkhal et.al. 2511.02537 null
2025-11-04 Knowledge Graph-enhanced Large Language Model for Incremental Game PlayTesting Enhong Mu et.al. 2511.02534 null
2025-11-04 Causal Graph Neural Networks for Healthcare Munib Mesinovic et.al. 2511.02531 null
2025-11-04 Large Lemma Miners: Can LLMs do Induction Proofs for Hardware? Romy Peled et.al. 2511.02521 null
2025-11-04 ESA: Energy-Based Shot Assembly Optimization for Automatic Video Editing Yaosen Chen et.al. 2511.02505 null
2025-11-04 BRAINS: A Retrieval-Augmented System for Alzheimer’s Detection and Monitoring Rajan Das Gupta et.al. 2511.02490 null
2025-11-04 Object Detection as an Optional Basis: A Graph Matching Network for Cross-View UAV Localization Tao Liu et.al. 2511.02489 link
2025-11-04 Modeling Hawkish-Dovish Latent Beliefs in Multi-Agent Debate-Based LLMs for Monetary Policy Decision Classification Kaito Takano et.al. 2511.02469 null
2025-11-04 Auditable-choice reframing unlocks RL-based verification for open-ended tasks Mengyu Zhang et.al. 2511.02463 null
2025-11-04 Prompting for Policy: Forecasting Macroeconomic Scenarios with Synthetic LLM Personas Giulia Iadisernia et.al. 2511.02458 null
2025-11-04 Who’s Who? LLM-assisted Software Traceability with Architecture Entity Recognition Dominik Fuchß et.al. 2511.02434 null
2025-11-04 Can Conversational AI Counsel for Change? A Theory-Driven Approach to Supporting Dietary Intentions in Ambivalent Individuals Michelle Bak et.al. 2511.02428 null
2025-11-04 From the Laboratory to Real-World Application: Evaluating Zero-Shot Scene Interpretation on Edge Devices for Mobile Robotics Nicolas Schuler et.al. 2511.02427 null
2025-11-04 ReAcTree: Hierarchical LLM Agent Trees with Control Flow for Long-Horizon Task Planning Jae-Woo Choi et.al. 2511.02424 null
2025-11-04 LLM4PG: Adapting Large Language Model for Pathloss Map Generation via Synesthesia of Machines Mingran Sun et.al. 2511.02423 null
2025-11-04 ChartM$^3$: A Multi-Stage Code-Driven Pipeline for Constructing Multi-Dimensional and Multi-Step Visual Reasoning Data in Chart Comprehension Duo Xu et.al. 2511.02415 null
2025-11-04 EvoDev: An Iterative Feature-Driven Framework for End-to-End Software Development with LLM-based Agents Junwei Liu et.al. 2511.02399 null
2025-11-04 RxnCaption: Reformulating Reaction Diagram Parsing as Visual Prompt Guided Captioning Jiahe Song et.al. 2511.02384 null
2025-11-04 Revisiting put-that-there, context aware window interactions via LLMs Riccardo Bovo et.al. 2511.02378 null
2025-11-04 AutoAdv: Automated Adversarial Prompting for Multi-Turn Jailbreaking of Large Language Models Aashray Reddy et.al. 2511.02376 null
2025-11-04 AyurParam: A State-of-the-Art Bilingual Language Model for Ayurveda Mohd Nauman et.al. 2511.02374 null
2025-11-04 LUMA-RAG: Lifelong Multimodal Agents with Provably Stable Streaming Alignment Rohan Wandre et.al. 2511.02371 null
2025-11-04 An LLM-powered MILP modelling engine for workforce scheduling guided by expert knowledge Qingyang Li et.al. 2511.02364 null
2025-11-04 Let Multimodal Embedders Learn When to Augment Query via Adaptive Query Augmentation Wongyu Kim et.al. 2511.02358 null
2025-11-04 An Automated Framework for Strategy Discovery, Retrieval, and Evolution in LLM Jailbreak Attacks Xu Liu et.al. 2511.02356 null
2025-11-04 LTD-Bench: Evaluating Large Language Models by Letting Them Draw Liuhao Lin et.al. 2511.02347 link
2025-11-04 Unlocking the Power of Multi-Agent LLM for Reasoning: From Lazy Agents to Deliberation Zhiwei Zhang et.al. 2511.02303 null
2025-11-04 VFocus: Better Verilog Generation from Large Language Model via Focused Reasoning Zhuorui Zhao et.al. 2511.02285 null
2025-11-04 SAIL-RL: Guiding MLLMs in When and How to Think via Dual-Reward RL Tuning Fangxun Shu et.al. 2511.02280 link
2025-11-04 LA-MARRVEL: A Knowledge-Grounded and Language-Aware LLM Reranker for AI-MARRVEL in Rare Disease Diagnosis Jaeyeon Lee et.al. 2511.02263 null
2025-11-04 When Modalities Conflict: How Unimodal Reasoning Uncertainty Governs Preference Dynamics in MLLMs Zhuoran Zhang et.al. 2511.02243 null
2025-11-04 Deep Ideation: Designing LLM Agents to Generate Novel Research Ideas on Scientific Concept Network Keyu Zhao et.al. 2511.02238 null
2025-11-04 An Evaluation of Interleaved Instruction Tuning on Semantic Reasoning Performance in an Audio MLLM Jiawei Liu et.al. 2511.02234 null
2025-11-04 Quantitative Risk Assessment in Radiation Oncology via LLM-Powered Root Cause Analysis of Incident Reports Yuntao Wang et.al. 2511.02223 null
2025-11-04 TabDSR: Decompose, Sanitize, and Reason for Complex Numerical Reasoning in Tabular Data Changjiang Jiang et.al. 2511.02219 null
2025-11-04 IG-Pruning: Input-Guided Block Pruning for Large Language Models Kangyu Qiao et.al. 2511.02213 null
2025-11-04 Language-Enhanced Generative Modeling for PET Synthesis from MRI and Blood Biomarkers Zhengjie Zhang et.al. 2511.02206 null
2025-11-04 LLMs as Judges: Toward The Automatic Review of GSN-compliant Assurance Cases Gerhard Yu et.al. 2511.02203 null
2025-11-04 Optimal-Agent-Selection: State-Aware Routing Framework for Efficient Multi-Agent Collaboration Jingbo Wang et.al. 2511.02200 null
2025-11-04 Open the Oyster: Empirical Evaluation and Improvement of Code Reasoning Confidence in LLMs Shufan Wang et.al. 2511.02197 null
2025-11-04 Personalized Decision Modeling: Utility Optimization or Textualized-Symbolic Reasoning Yibo Zhao et.al. 2511.02194 null
2025-11-04 Pinpointing Trigger Moment for Grounded Video QA: Enhancing Spatio-temporal Grounding in Multimodal Large Language Models Jinhwan Seo et.al. 2511.02182 null
2025-11-04 Eliminating Multi-GPU Performance Taxes: A Systems Approach to Efficient Distributed LLMs Octavian Alexandru Trifan et.al. 2511.02168 null
2025-11-03 Rethinking LLM Human Simulation: When a Graph is What You Need Joseph Suh et.al. 2511.02135 null
2025-11-03 InsurAgent: A Large Language Model-Empowered Agent for Simulating Individual Behavior in Purchasing Flood Insurance Ziheng Geng et.al. 2511.02119 null
2025-11-03 Deep Value Benchmark: Measuring Whether Models Generalize Deep values or Shallow Preferences Joshua Ashkinaze et.al. 2511.02109 null
2025-11-03 Metamorphic Testing of Large Language Models for Natural Language Processing Steven Cho et.al. 2511.02108 null
2025-11-03 LLM Probing with Contrastive Eigenproblems: Improving Understanding and Applicability of CCS Stefan F. Schouten et.al. 2511.02089 null
2025-11-03 Watermarking Discrete Diffusion Language Models Avi Bagchi et.al. 2511.02083 null
2025-10-10 A Unified Biomedical Named Entity Recognition Framework with Large Language Models Tengxiao Lv et.al. 2510.08902 null
2025-09-25 SCRA-VQA: Summarized Caption-Rerank for Augmented Large Language Models in Visual Question Answering Yan Zhang et.al. 2509.20871 null
2025-08-12 LLaMA-Based Models for Aspect-Based Sentiment Analysis Jakub Šmíd et.al. 2508.08649 null
2025-07-23 BetterCheck: Towards Safeguarding VLMs for Automotive Perception Systems Malsha Ashani Mahawatta Dona et.al. 2507.17722 null
2025-07-23 AI Telephone Surveying: Automating Quantitative Data Collection with an AI Interviewer Danny D. Leybzon et.al. 2507.17718 null
2025-07-23 HydraOpt: Navigating the Efficiency-Performance Trade-off of Adapter Merging Taha Ceritli et.al. 2507.17706 null
2025-07-23 Towards Greater Leverage: Scaling Laws for Efficient Mixture-of-Experts Language Models Changxin Tian et.al. 2507.17702 null
2025-07-23 Thinking Isn’t an Illusion: Overcoming the Limitations of Reasoning Models via Tool Augmentations Zhao Song et.al. 2507.17699 null
2025-07-23 Symbiotic Agents: A Novel Paradigm for Trustworthy AGI-driven Networks Ilias Chatzistefanidis et.al. 2507.17695 null
2025-07-23 Simulating multiple human perspectives in socio-ecological systems using large language models Yongchao Zeng et.al. 2507.17680 null
2025-07-23 See the Forest and the Trees: A Synergistic Reasoning Framework for Knowledge-Based Visual Question Answering Junjie Wang et.al. 2507.17659 null
2025-07-23 Who Attacks, and Why? Using LLMs to Identify Negative Campaigning in 18M Tweets across 19 Countries Victor Hartman et.al. 2507.17636 null
2025-07-23 A Hybrid Early-Exit Algorithm for Large Language Models Based on Space Alignment Decoding (SPADE) Bowen Zheng et.al. 2507.17618 null
2025-07-22 LingBench++: A Linguistically-Informed Benchmark and Reasoning Framework for Multi-Step and Cross-Cultural Inference with LLMs Da-Chen Lian et.al. 2507.16809 null
2025-07-22 Rethinking LLM-Based RTL Code Optimization Via Timing Logic Metamorphosis Zhihao Xu et.al. 2507.16808 null
2025-07-22 Agentar-Fin-R1: Enhancing Financial Intelligence through Domain Expertise, Training Efficiency, and Advanced Reasoning Yanjun Zheng et.al. 2507.16802 link
2025-07-23 Test-Time-Matching: Decouple Personality, Memory, and Linguistic Style in LLM-based Role-Playing Language Agent Xiaoyu Zhan et.al. 2507.16799 null
2025-07-22 Steering Out-of-Distribution Generalization with Concept Ablation Fine-Tuning Helena Casademunt et.al. 2507.16795 link
2025-07-22 ChatChecker: A Framework for Dialogue System Testing and Evaluation Through Non-cooperative User Simulation Roman Mayr et.al. 2507.16792 null
2025-07-22 Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning Hongyin Luo et.al. 2507.16784 link
2025-07-22 Cooling Matters: Benchmarking Large Language Models and Vision-Language Models on Liquid-Cooled Versus Air-Cooled H100 GPU Systems Imran Latif et.al. 2507.16781 null
2025-07-22 When LLMs Copy to Think: Uncovering Copy-Guided Attacks in Reasoning LLMs Yue Li et.al. 2507.16773 null
2025-07-22 WGRAMMAR: Leverage Prior Knowledge to Accelerate Structured Decoding Ran Wang et.al. 2507.16768 null
2025-07-21 Diffusion Beats Autoregressive in Data-Constrained Settings Mihir Prabhudesai et.al. 2507.15857 null
2025-07-21 Gemini 2.5 Pro Capable of Winning Gold at IMO 2025 Yichen Huang et.al. 2507.15855 null
2025-07-21 The Other Mind: How Language Models Exhibit Human Temporal Cognition Lingyu Li et.al. 2507.15851 link
2025-07-21 3LM: Bridging Arabic, STEM, and Code through Benchmarking Basma El Amel Boussaha et.al. 2507.15850 null
2025-07-21 The Impact of Language Mixing on Bilingual LLM Reasoning Yihao Li et.al. 2507.15849 null
2025-07-21 FASTGEN: Fast and Cost-Effective Synthetic Tabular Data Generation with LLMs Anh Nguyen et.al. 2507.15839 null
2025-07-21 Just Ask for Music (JAM): Multimodal and Personalized Natural Language Music Recommendation Alessandro B. Melchiorre et.al. 2507.15826 null
2025-07-21 ACS: An interactive framework for conformal selection Yu Gui et.al. 2507.15825 null
2025-07-21 Do AI models help produce verified bug fixes? Li Huang et.al. 2507.15822 null
2025-07-21 LLM Economist: Large Population Models and Mechanism Design in Multi-Agent Generative Simulacra Seth Karten et.al. 2507.15815 link
2025-07-18 CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning Xiaoya Li et.al. 2507.14111 null
2025-07-18 Automated Interpretation of Non-Destructive Evaluation Contour Maps Using Large Language Models for Bridge Condition Assessment Viraj Nishesh Darji et.al. 2507.14107 null
2025-07-18 Generative AI-Driven High-Fidelity Human Motion Simulation Hari Iyer et.al. 2507.14097 null
2025-07-18 Lessons from the TREC Plain Language Adaptation of Biomedical Abstracts (PLABA) track Brian Ondov et.al. 2507.14096 null
2025-07-18 DPMT: Dual Process Multi-scale Theory of Mind Framework for Real-time Human-AI Collaboration Xiyun Li et.al. 2507.14088 null
2025-07-18 The Emotion-Memory Link: Do Memorability Annotations Matter for Intelligent Systems? Maria Tsfasman et.al. 2507.14084 null
2025-07-18 DENSE: Longitudinal Progress Note Generation with Temporal Modeling of Heterogeneous Clinical Notes Across Hospital Visits Garapati Keerthana et.al. 2507.14079 null
2025-07-18 Evaluating the Effectiveness of Cost-Efficient Large Language Models in Benchmark Biomedical Tasks Israt Jahan et.al. 2507.14045 null
2025-07-18 Architecting Human-AI Cocreation for Technical Services – Interaction Modes and Contingency Factors Jochen Wulf et.al. 2507.14034 null
2025-07-18 KROMA: Ontology Matching with Knowledge Retrieval and Large Language Models Lam Nguyen et.al. 2507.14032 null
2025-07-17 VideoITG: Multimodal Video Understanding with Instructed Temporal Grounding Shihao Wang et.al. 2507.13353 null
2025-07-17 Comparing Apples to Oranges: A Dataset & Analysis of LLM Humour Understanding from Traditional Puns to Topical Jokes Tyler Loakman et.al. 2507.13335 null
2025-07-17 A Survey of Context Engineering for Large Language Models Lingrui Mei et.al. 2507.13334 null
2025-07-17 The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner Zhouqi Hua et.al. 2507.13332 null
2025-07-17 GeoReg: Weight-Constrained Few-Shot Regression for Socio-Economic Estimation using LLM Kyeongjin Ahn et.al. 2507.13323 null
2025-07-17 Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark Junsu Kim et.al. 2507.13314 null
2025-07-17 The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations Carlos Arriaga et.al. 2507.13302 null
2025-07-17 AbGen: Evaluating Large Language Models in Ablation Study Design and Evaluation for Scientific Research Yilun Zhao et.al. 2507.13300 null
2025-07-17 Overview of the TalentCLEF 2025: Skill and Job Title Intelligence for Human Capital Management Luis Gasco et.al. 2507.13275 null
2025-07-17 Automating Steering for Safe Multimodal Large Language Models Lyucheng Wu et.al. 2507.13255 null
2025-07-16 Mitigating Object Hallucinations via Sentence-Level Early Intervention Shangpin Peng et.al. 2507.12455 null
2025-07-16 S2WTM: Spherical Sliced-Wasserstein Autoencoder for Topic Modeling Suman Adhya et.al. 2507.12451 null
2025-07-16 Describe Anything Model for Visual Question Answering on Text-rich Images Yen-Linh Vu et.al. 2507.12441 null
2025-07-16 Can We Predict Alignment Before Models Finish Thinking? Towards Monitoring Misaligned Reasoning Models Yik Siu Chan et.al. 2507.12428 null
2025-07-16 Advancing Retrieval-Augmented Generation for Structured Enterprise and Internal Data Chandana Cheerla et.al. 2507.12425 null
2025-07-16 QuRe: Query-Relevant Retrieval through Hard Negative Sampling in Composed Image Retrieval Jaehyun Kwak et.al. 2507.12416 null
2025-07-16 SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories? Xinyi He et.al. 2507.12415 null
2025-07-16 Assessing the Value of Visual Input: A Benchmark of Multimodal Large Language Models for Robotic Path Planning Jacinto Colan et.al. 2507.12391 null
2025-07-16 Web-Browsing LLMs Can Access Social Media Profiles and Infer User Demographics Meysam Alizadeh et.al. 2507.12372 null
2025-07-16 Beyond Single Models: Enhancing LLM Detection of Ambiguity in Requests through Debate Ana Davila et.al. 2507.12370 null
2025-07-15 Streaming 4D Visual Geometry Transformer Dong Zhuo et.al. 2507.11539 null
2025-07-15 DrafterBench: Benchmarking Large Language Models for Tasks Automation in Civil Engineering Yinsheng Li et.al. 2507.11527 null
2025-07-15 LLM-based ambiguity detection in natural language instructions for collaborative surgical robots Ana Davila et.al. 2507.11525 null
2025-07-15 AirLLM: Diffusion Policy-based Adaptive LoRA for Remote Fine-Tuning of LLM over the Air Shiyi Yang et.al. 2507.11515 null
2025-07-15 LRMR: LLM-Driven Relational Multi-node Ranking for Lymph Node Metastasis Assessment in Rectal Cancer Yaoxian Dong et.al. 2507.11457 null
2025-07-15 Reasoning Strategies in Large Language Models: Can They Follow, Prefer, and Optimize? Yanjian Zhang et.al. 2507.11423 null
2025-07-15 Quantifying the Energy Consumption and Carbon Emissions of LLM Inference via Simulations Miray Özcan et.al. 2507.11417 null
2025-07-15 Seq vs Seq: An Open Suite of Paired Encoders and Decoders Orion Weller et.al. 2507.11412 null
2025-07-15 KisMATH: Do LLMs Have Knowledge of Implicit Structures in Mathematical Reasoning? Soumadeep Saha et.al. 2507.11408 null
2025-07-15 EXAONE 4.0: Unified Large Language Models Integrating Non-reasoning and Reasoning Modes LG AI Research et.al. 2507.11407 null
2025-07-14 Fusing LLM Capabilities with Routing Data Tao Feng et.al. 2507.10540 null
2025-07-14 CodeJudgeBench: Benchmarking LLM-as-a-Judge for Coding Tasks Hongchao Jiang et.al. 2507.10535 null
2025-07-14 Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination Mingqi Wu et.al. 2507.10532 null
2025-07-14 Chat with AI: The Surprising Turn of Real-time Video Communication from Human to AI Jiangkai Wu et.al. 2507.10510 null
2025-07-14 Scene-Aware Conversational ADAS with Generative AI for Real-Time Driver Assistance Kyungtae Han et.al. 2507.10500 null
2025-07-14 Can You Detect the Difference? İsmail Tarım et.al. 2507.10475 null
2025-07-14 GT-Loc: Unifying When and Where in Images Through a Joint Embedding Space David G. Shatwell et.al. 2507.10473 null
2025-07-14 MLAR: Multi-layer Large Language Model-based Robotic Process Automation Applicant Tracking Mohamed T. Younes et.al. 2507.10472 null
2025-07-14 An Empirical Evaluation of AI-Powered Non-Player Characters’ Perceived Realism and Performance in Virtual Reality Environments Mikko Korkiakoski et.al. 2507.10469 null
2025-07-14 Logic layer Prompt Control Injection (LPCI): A Novel Security Vulnerability Class in Agentic Systems Hammad Atta et.al. 2507.10457 null
2025-07-11 Lumos-1: On Autoregressive Video Generation from a Unified Model Perspective Hangjie Yuan et.al. 2507.08801 null
2025-07-11 One Token to Fool LLM-as-a-Judge Yulai Zhao et.al. 2507.08794 null
2025-07-11 BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity Chenyang Song et.al. 2507.08771 null
2025-07-11 Multilingual Multimodal Software Developer for Code Generation Linzheng Chai et.al. 2507.08719 null
2025-07-11 KG-Attention: Knowledge Graph-Guided Attention at Test-Time via Bidirectional Information Aggregation Songlin Zhai et.al. 2507.08704 null
2025-07-11 ByDeWay: Boost Your multimodal LLM with DEpth prompting in a Training-Free Way Rajarshi Roy et.al. 2507.08679 null
2025-07-11 LLMCup: Ranking-Enhanced Comment Updating with LLMs Hua Ge et.al. 2507.08671 null
2025-07-11 KELPS: A Framework for Verified Multi-Language Autoformalization via Semantic-Syntactic Alignment Jiyao Zhang et.al. 2507.08665 null
2025-07-11 Introspection of Thought Helps AI Agents Haoran Sun et.al. 2507.08664 null
2025-07-11 Leanabell-Prover-V2: Verifier-integrated Reasoning for Formal Theorem Proving via Reinforcement Learning Xingguang Ji et.al. 2507.08649 null
2025-07-10 Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology Haochen Wang et.al. 2507.07999 null
2025-07-10 Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs Ziyue Li et.al. 2507.07996 null
2025-07-10 Multigranular Evaluation for Brain Visual Decoding Weihao Xia et.al. 2507.07993 null
2025-07-10 Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs Jeongseok Hyun et.al. 2507.07990 null
2025-07-10 Automating Expert-Level Medical Reasoning Evaluation of Large Language Models Shuang Zhou et.al. 2507.07988 null
2025-07-10 OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding JingLi Lin et.al. 2507.07984 null
2025-07-10 Performance and Practical Considerations of Large and Small Language Models in Clinical Decision Support in Rheumatology Sabine Felde et.al. 2507.07983 null
2025-07-10 Defending Against Prompt Injection With a Few DefensiveTokens Sizhe Chen et.al. 2507.07974 null
2025-07-10 Can Large Language Models Improve Phishing Defense? A Large-Scale Controlled Experiment on Warning Dialogue Explanations Federico Maria Cau et.al. 2507.07916 null
2025-07-10 DTECT: Dynamic Topic Explorer & Context Tracker Suman Adhya et.al. 2507.07910 null
2025-07-09 Towards Multimodal Understanding via Stable Diffusion as a Task-Aware Feature Extractor Vatsal Agarwal et.al. 2507.07106 null
2025-07-09 Vision-Language-Vision Auto-Encoder: Scalable Knowledge Distillation from Diffusion Models Tiezheng Zhang et.al. 2507.07104 null
2025-07-09 Evaluating Attribute Confusion in Fashion Text-to-Image Generation Ziyue Liu et.al. 2507.07079 null
2025-07-09 5C Prompt Contracts: A Minimalist, Creative-Friendly, Token-Efficient Design Framework for Individual and SME LLM Usage Ugur Ari et.al. 2507.07045 null
2025-07-09 UniConv: Unifying Retrieval and Response Generation for Large Language Models in Conversations Fengran Mo et.al. 2507.07030 null
2025-07-09 First Return, Entropy-Eliciting Explore Tianyu Zheng et.al. 2507.07017 null
2025-07-09 GNN-ViTCap: GNN-Enhanced Multiple Instance Learning with Vision Transformers for Whole Slide Image Classification and Captioning S M Taslim Uddin Raju et.al. 2507.07006 null
2025-07-09 Learning Deliberately, Acting Intuitively: Unlocking Test-Time Reasoning in Multimodal LLMs Yahan Yu et.al. 2507.06999 null
2025-07-09 MCA-RG: Enhancing LLMs with Medical Concept Alignment for Radiology Report Generation Qilong Xing et.al. 2507.06992 null
2025-07-09 Are They All Good? Evaluating the Quality of CoTs in LLM-based Code Generation Binquan Zhang et.al. 2507.06980 null
2025-07-08 Efficiency-Effectiveness Reranking FLOPs for LLM-based Rerankers Zhiyuan Peng et.al. 2507.06223 null
2025-07-08 A Survey on Latent Reasoning Rui-Jie Zhu et.al. 2507.06203 null
2025-07-08 UQLM: A Python Package for Uncertainty Quantification in Large Language Models Dylan Bouchard et.al. 2507.06196 null
2025-07-08 SQLBarber: A System Leveraging Large Language Models to Generate Customized and Realistic SQL Workloads Jiale Lao et.al. 2507.06192 null
2025-07-08 Hidden Prompts in Manuscripts Exploit AI-Assisted Peer Review Zhicheng Lin et.al. 2507.06185 null
2025-07-08 Enhancing Scientific Visual Question Answering through Multimodal Reasoning and Ensemble Modeling Prahitha Movva et.al. 2507.06183 null
2025-07-08 Data-Semantics-Aware Recommendation of Diverse Pivot Tables Whanhee Cho et.al. 2507.06171 null
2025-07-09 Skywork-R1V3 Technical Report Wei Shen et.al. 2507.06167 null
2025-07-08 Evaluation of Habitat Robotics using Large Language Models William Li et.al. 2507.06157 null
2025-07-08 Large Language Models Predict Human Well-being – But Not Equally Everywhere Pat Pataranutaporn et.al. 2507.06141 null
2025-07-07 Beyond Simple Edits: X-Planner for Complex Instruction-Based Image Editing Chun-Hsiao Yeh et.al. 2507.05259 null
2025-07-07 Spatio-Temporal LLM: Reasoning about Environments and Actions Haozhen Zheng et.al. 2507.05258 null
2025-07-07 Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions Yuanzhe Hu et.al. 2507.05257 null
2025-07-07 Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning Yana Wei et.al. 2507.05255 null
2025-07-07 Response Attack: Exploiting Contextual Priming to Jailbreak Large Language Models Ziqi Miao et.al. 2507.05248 null
2025-07-07 StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling Meng Wei et.al. 2507.05240 null
2025-07-07 All in One: Visual-Description-Guided Unified Point Cloud Segmentation Zongyan Han et.al. 2507.05211 null
2025-07-07 CREW-WILDFIRE: Benchmarking Agentic Multi-Agent Collaborations at Scale Jonathan Hyun et.al. 2507.05178 null
2025-07-07 OpenS2S: Advancing Open-Source End-to-End Empathetic Large Speech Language Model Chen Wang et.al. 2507.05177 null
2025-07-07 AI Generated Text Detection Using Instruction Fine-tuned Large Language and Transformer-Based Models Chinnappa Guggilla et.al. 2507.05157 null
2025-07-03 Bootstrapping Grounded Chain-of-Thought in Multimodal LLMs for Data-Efficient Model Adaptation Jiaer Xia et.al. 2507.02859 null
2025-07-03 Requirements Elicitation Follow-Up Question Generation Yuchen Shen et.al. 2507.02858 null
2025-07-03 MOTIF: Modular Thinking via Reinforcement Fine-tuning in LLMs Purbesh Mitra et.al. 2507.02851 null
2025-07-03 Visual Contextual Attack: Jailbreaking MLLMs with Image-Driven Context Injection Ziqi Miao et.al. 2507.02844 null
2025-07-03 LLM-Driven Treatment Effect Estimation Under Inference Time Text Confounding Yuchen Ma et.al. 2507.02843 null
2025-07-03 StepHint: Multi-level Stepwise Hints Enhance Reinforcement Learning to Reason Kaiyi Zhang et.al. 2507.02841 null
2025-07-03 ExPO: Unlocking Hard Reasoning with Self-Explanation-Guided Reinforcement Learning Ruiyang Zhou et.al. 2507.02834 null
2025-07-03 SynapseRoute: An Auto-Route Switching Framework on Dual-State Large Language Model Wencheng Zhang et.al. 2507.02822 null
2025-07-03 Multimodal Mathematical Reasoning with Diverse Solving Perspective Wenhao Shi et.al. 2507.02804 null
2025-07-03 Is Reasoning All You Need? Probing Bias in the Age of Reasoning Language Models Riccardo Cantini et.al. 2507.02799 null
2025-07-02 Kwai Keye-VL Technical Report Kwai Keye Team et.al. 2507.01949 null
2025-07-02 SpecCLIP: Aligning and Translating Spectroscopic Measurements for Stars Xiaosheng Zhao et.al. 2507.01939 null
2025-07-02 The Thin Line Between Comprehension and Persuasion in LLMs Adrian de Wynter et.al. 2507.01936 null
2025-07-02 Large Language Model-Driven Closed-Loop UAV Operation with Semantic Observations Wenhao Wang et.al. 2507.01930 null
2025-07-03 Decision-Oriented Text Evaluation Yu-Shiang Huang et.al. 2507.01923 null
2025-07-02 Gradient-Adaptive Policy Optimization: Towards Multi-Objective Alignment of Large Language Models Chengao Li et.al. 2507.01915 null
2025-07-02 Reasoning to Edit: Hypothetical Instruction-Based Image Editing with Visual Reasoning Qingdong He et.al. 2507.01908 null
2025-07-02 AI4Research: A Survey of Artificial Intelligence for Scientific Research Qiguang Chen et.al. 2507.01903 null
2025-07-02 High-Layer Attention Pruning with Rescaling Songtao Liu et.al. 2507.01900 null
2025-07-02 MiCoTA: Bridging the Learnability Gap with Intermediate CoT and Teacher Assistants Dongyi Ding et.al. 2507.01887 null
2025-07-01 Teaching Time Series to See and Speak: Forecasting with Aligned Visual and Textual Perspectives Sixun Dong et.al. 2506.24124 null
2025-06-30 Calligrapher: Freestyle Text Image Customization Yue Ma et.al. 2506.24123 null
2025-06-30 Data Uniformity Improves Training Efficiency and More, with a Convergence Framework Beyond the NTK Regime Yuqing Wang et.al. 2506.24120 null
2025-06-30 DenseWorld-1M: Towards Detailed Dense Grounded Caption in the Real World Xiangtai Li et.al. 2506.24102 null
2025-06-30 Logit-Gap Steering: Efficient Short-Suffix Jailbreaks for Aligned Large Language Models Tung-Ling Li et.al. 2506.24056 null
2025-06-30 Agent.xpu: Efficient Scheduling of Agentic LLM Workloads on Heterogeneous SoC Xinming Wei et.al. 2506.24045 null
2025-06-30 A Survey on Vision-Language-Action Models for Autonomous Driving Sicong Jiang et.al. 2506.24044 null
2025-06-30 EXPERT: An Explainable Image Captioning Evaluation Metric with Structured Explanations Hyunjong Kim et.al. 2506.24016 null
2025-06-30 Large Language Models Don’t Make Sense of Word Problems. A Scoping Review from a Mathematics Education Perspective Anselm R. Strohmaier et.al. 2506.24006 null
2025-06-30 Auto-TA: Towards Scalable Automated Thematic Analysis (TA) via Multi-Agent Large Language Models with Reinforcement Learning Seungjun Yi et.al. 2506.23998 null
2025-06-27 The Automated LLM Speedrunning Benchmark: Reproducing NanoGPT Improvements Bingchen Zhao et.al. 2506.22419 null
2025-06-27 HyperCLOVA X THINK Technical Report NAVER Cloud HyperCLOVA X Team et.al. 2506.22403 null
2025-06-27 Refining Czech GEC: Insights from a Multi-Experiment Approach Petr Pechman et.al. 2506.22402 null
2025-06-27 QuickSilver – Speeding up LLM Inference through Dynamic Token Halting, KV Skipping, Contextual Token Fusion, and Adaptive Matryoshka Quantization Danush Khanna et.al. 2506.22396 null
2025-06-27 What Makes ChatGPT Effective for Software Issue Resolution? An Empirical Study of Developer-ChatGPT Conversations in GitHub Ramtin Ehsani et.al. 2506.22390 null
2025-06-27 Can Video Large Multimodal Models Think Like Doubters-or Double-Down: A Study on Defeasible Video Entailment Yue Zhang et.al. 2506.22385 null
2025-06-27 Probabilistic Optimality for Inference-time Scaling Youkang Wang et.al. 2506.22376 null
2025-06-27 Towards Fair Rankings: Leveraging LLMs for Gender Bias Detection and Measurement Maryam Mousavian et.al. 2506.22372 null
2025-06-27 Can Large Language Models Help Students Prove Software Correctness? An Experimental Study with Dafny Carolina Carreira et.al. 2506.22370 null
2025-06-27 Concept-Level AI for Telecom: Moving Beyond Large Language Models Viswanath Kumarskandpriya et.al. 2506.22359 null
2025-06-26 Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test Ziyue Li et.al. 2506.21551 null
2025-06-26 mTSBench: Benchmarking Multivariate Time Series Anomaly Detection and Model Selection at Scale Xiaona Zhou et.al. 2506.21550 null
2025-06-26 PsyLite Technical Report Fangjun Ding et.al. 2506.21536 null
2025-06-26 Exploring the Design Space of 3D MLLMs for CT Report Generation Mohammed Baharoon et.al. 2506.21535 null
2025-06-26 “What’s Up, Doc?”: Analyzing How Users Seek Health Information in Large-Scale Conversational AI Datasets Akshay Paruchuri et.al. 2506.21532 null
2025-06-26 Potemkin Understanding in Large Language Models Marina Mancoridis et.al. 2506.21521 null
2025-06-26 Mitigating Hallucination of Large Vision-Language Models via Dynamic Logits Calibration Jiahe Chen et.al. 2506.21509 null
2025-06-26 Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge Boyu Gou et.al. 2506.21506 null
2025-06-26 Bridging Offline and Online Reinforcement Learning for LLMs Jack Lanchantin et.al. 2506.21495 null
2025-06-26 Efficient and Reuseable Cloud Configuration Search Using Discovery Spaces Michael Johnston et.al. 2506.21467 null
2025-06-25 The Decrypto Benchmark for Multi-Agent Reasoning and Theory of Mind Andrei Lupu et.al. 2506.20664 null
2025-06-25 Memento: Note-Taking for Your Future Self Chao Wan et.al. 2506.20642 null
2025-06-25 Towards Community-Driven Agents for Machine Learning Engineering Sijie Li et.al. 2506.20640 null
2025-06-25 DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation Shansan Gong et.al. 2506.20639 null
2025-06-25 AI Assistants to Enhance and Exploit the PETSc Knowledge Base Barry Smith et.al. 2506.20608 null
2025-06-25 Model Editing as a Double-Edged Sword: Steering Agent Ethical Behavior Toward Beneficence or Harm Baixiang Huang et.al. 2506.20606 null
2025-06-25 Video Perception Models for 3D Scene Synthesis Rui Huang et.al. 2506.20601 null
2025-06-25 HRIBench: Benchmarking Vision-Language Models for Real-Time Human Perception in Human-Robot Interaction Zhonghao Shi et.al. 2506.20566 null
2025-06-25 Large Language Model-Driven Code Compliance Checking in Building Information Modeling Soumya Madireddy et.al. 2506.20551 null
2025-06-25 When Life Gives You Samples: The Benefits of Scaling up Inference Compute for Multilingual LLMs Ammar Khairi et.al. 2506.20544 null
2025-06-24 ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing Long Xing et.al. 2506.19848 null
2025-06-24 JoyAgents-R1: Joint Evolution Dynamics for Versatile Multi-LLM Agents with Reinforcement Learning Ai Han et.al. 2506.19846 null
2025-06-24 MAM: Modular Multi-Agent Framework for Multi-Modal Medical Diagnosis via Role-Specialized Collaboration Yucheng Zhou et.al. 2506.19835 null
2025-06-24 Curating art exhibitions using machine learning Eurico Covas et.al. 2506.19813 null
2025-06-24 KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality Baochang Ren et.al. 2506.19807 null
2025-06-24 LLM-Based Social Simulations Require a Boundary Zengqing Wu et.al. 2506.19806 null
2025-06-24 KnowML: Improving Generalization of ML-NIDS with Attack Knowledge Graphs Xin Fan Guo et.al. 2506.19802 null
2025-06-24 Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic Empirical Study Yuqi Zhu et.al. 2506.19794 null
2025-06-24 SAGE: Strategy-Adaptive Generation Engine for Query Rewriting Teng Wang et.al. 2506.19783 null
2025-06-24 SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning Yuqian Fu et.al. 2506.19767 null
2025-06-23 jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval Michael Günther et.al. 2506.18902 null
2025-06-23 Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations Jiaming Han et.al. 2506.18898 null
2025-06-23 ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs Jiaru Zou et.al. 2506.18896 null
2025-06-23 Universal Video Temporal Grounding with Generative Multi-modal Large Language Models Zeqian Li et.al. 2506.18883 null
2025-06-23 CommVQ: Commutative Vector Quantization for KV Cache Compression Junyan Li et.al. 2506.18879 null
2025-06-23 OmniGen2: Exploration to Advanced Multimodal Generation Chenyuan Wu et.al. 2506.18871 null
2025-06-23 TAMMs: Temporal-Aware Multimodal Model for Satellite Image Change Understanding and Forecasting Zhongbin Guo et.al. 2506.18862 null
2025-06-23 LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning Yuhao Wu et.al. 2506.18841 null
2025-06-23 STU-PID: Steering Token Usage via PID Controller for Efficient Large Language Model Reasoning Aryasomayajula Ram Bharadwaj et.al. 2506.18831 null
2025-06-23 Understanding Software Engineering Agents: A Study of Thought-Action-Result Trajectories Islem Bouzenia et.al. 2506.18824 null
2025-06-20 VLN-R1: Vision-Language Navigation via Reinforcement Fine-Tuning Zhangyang Qi et.al. 2506.17221 null
2025-06-20 No Free Lunch: Rethinking Internal Feedback for LLM Reasoning Yanzhi Zhang et.al. 2506.17219 null
2025-06-20 Fine-Tuning Lowers Safety and Disrupts Evaluation Consistency Kathleen C. Fraser et.al. 2506.17209 null
2025-06-20 Dissecting the SWE-Bench Leaderboards: Profiling Submitters and Architectures of LLM- and Agent-Based Repair Systems Matias Martinez et.al. 2506.17208 null
2025-06-20 Confidence Scoring for LLM-Generated SQL in Supply Chain Data Extraction Jiekai Ma et.al. 2506.17203 null
2025-06-20 Detecting LLM-Generated Short Answers and Effects on Learner Performance Shambhavi Bhushan et.al. 2506.17196 null
2025-06-20 The MedPerturb Dataset: What Non-Content Perturbations Reveal About Human and Clinical LLM Decision Making Abinitha Gourabathina et.al. 2506.17163 null
2025-06-20 Do We Need Large VLMs for Spotting Soccer Actions? Ritabrata Chakraborty et.al. 2506.17144 null
2025-06-20 Large Language Model Unlearning for Source Code Xue Jiang et.al. 2506.17125 null
2025-06-20 When Can Model-Free Reinforcement Learning be Enough for Thinking? Josiah P. Hanna et.al. 2506.17124 null
2025-06-18 PhantomHunter: Detecting Unseen Privately-Tuned LLM-Generated Text via Family-Aware Learning Yuhui Shi et.al. 2506.15683 null
2025-06-18 GenRecal: Generation after Recalibration from Large to Small Vision-Language Models Byung-Kwan Lee et.al. 2506.15681 null
2025-06-18 SwarmAgentic: Towards Fully Automated Agentic System Generation via Swarm Intelligence Yao Zhang et.al. 2506.15672 null
2025-06-18 CC-LEARN: Cohort-based Consistency Learning Xiao Ye et.al. 2506.15662 null
2025-06-18 PhishDebate: An LLM-Based Multi-Agent Framework for Phishing Website Detection Wenhao Li et.al. 2506.15656 null
2025-06-18 deepSURF: Detecting Memory Safety Vulnerabilities in Rust Through Fuzzing LLM-Augmented Harnesses Georgios Androutsopoulos et.al. 2506.15648 null
2025-06-18 Demystifying the Visual Quality Paradox in Multimodal Large Language Models Shuo Xing et.al. 2506.15645 null
2025-06-18 Revisiting Compositional Generalization Capability of Large Language Models Considering Instruction Following Ability Yusuke Sakai et.al. 2506.15629 null
2025-06-18 The Effect of State Representation on LLM Agent Behavior in Dynamic Routing Games Lyle Goodyear et.al. 2506.15624 null
2025-06-18 The Compositional Architecture of Regret in Large Language Models Xiangxiang Cui et.al. 2506.15617 null
2025-06-17 A Variational Framework for Improving Naturalness in Generative Spoken Language Models Li-Wei Chen et.al. 2506.14767 link
2025-06-17 ASCD: Attention-Steerable Contrastive Decoding for Reducing Hallucination in MLLM Yujun Wang et.al. 2506.14766 null
2025-06-17 Large Language Models – the Future of Fundamental Physics? Caroline Heneka et.al. 2506.14757 null
2025-06-17 Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs Ring Team et.al. 2506.14731 null
2025-06-17 AgentDistill: Training-Free Agent Distillation with Generalizable MCP Boxes Jiahao Qiu et.al. 2506.14728 link
2025-06-17 HARMONY: A Scalable Distributed Vector Database for High-Throughput Approximate Nearest Neighbor Search Qian Xu et.al. 2506.14707 null
2025-06-17 Capacity Matters: a Proof-of-Concept for Transformer Memorization on Real-World Data Anton Changalidis et.al. 2506.14704 null
2025-06-17 Unified Software Engineering agent as AI Software Engineer Leonhard Applis et.al. 2506.14683 null
2025-06-17 AIRTBench: Measuring Autonomous AI Red Teaming Capabilities in Language Models Ads Dawson et.al. 2506.14682 null
2025-06-17 Massive Supervised Fine-tuning Experiments Reveal How Data, Layer, and Training Factors Shape LLM Alignment Quality Yuto Harada et.al. 2506.14681 null
2025-06-16 Steering LLM Thinking with Budget Guidance Junyan Li et.al. 2506.13752 link
2025-06-16 Evaluating Large Language Models for Phishing Detection, Self-Consistency, Faithfulness, and Explainability Shova Kuikel et.al. 2506.13746 link
2025-06-16 Instruction Following by Boosting Attention of Large Language Models Vitoria Guardieiro et.al. 2506.13734 null
2025-06-16 Attribution-guided Pruning for Compression, Circuit Discovery, and Targeted Correction in LLMs Sayed Mohammad Vakilzadeh Hatefi et.al. 2506.13727 null
2025-06-16 Weakest Link in the Chain: Security Vulnerabilities in Advanced Reasoning Models Arjun Krishna et.al. 2506.13726 null
2025-06-16 TimeMaster: Training Time-Series Multimodal LLMs to Reason via Reinforcement Learning Junru Zhang et.al. 2506.13705 link
2025-06-16 Balancing Knowledge Delivery and Emotional Comfort in Healthcare Conversational Systems Shang-Chi Tsai et.al. 2506.13692 null
2025-06-16 What Happens During the Loss Plateau? Understanding Abrupt Learning in Transformers Pulkit Gopalani et.al. 2506.13688 link
2025-06-16 An LLM’s Apology: Outsourcing Awkwardness in the Age of AI Twm Stone et.al. 2506.13685 null
2025-06-16 Prefix-Tuning+: Modernizing Prefix-Tuning through Attention Independent Prefix Data Haonan Wang et.al. 2506.13674 null
2025-06-13 code_transformed: The Influence of Large Language Models on Code Yuliang Xu et.al. 2506.12014 null
2025-06-13 Tracing LLM Reasoning Processes with Strategic Games: A Framework for Planning, Revision, and Resource-Constrained Decision Making Xiaopeng Yuan et.al. 2506.12012 null
2025-06-13 VGR: Visual Grounded Reasoning Jiacong Wang et.al. 2506.11991 null
2025-06-13 How Visual Representations Map to Language Feature Space in Multimodal LLMs Constantin Venhoff et.al. 2506.11976 null
2025-06-13 Improving Large Language Model Safety with Contrastive Representation Learning Samuel Simko et.al. 2506.11938 null
2025-06-13 Temporal Dynamics of Emotions in Italian Online Soccer Fandoms Salvatore Citraro et.al. 2506.11934 null
2025-06-13 LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming? Zihan Zheng et.al. 2506.11928 link
2025-06-13 Beyond Homogeneous Attention: Memory-Efficient LLMs via Fourier-Approximated KV Cache Xiaoran Liu et.al. 2506.11886 null
2025-06-13 Addressing Bias in LLMs: Strategies and Application to Fair AI-based Recruitment Alejandro Peña et.al. 2506.11880 null
2025-06-13 A Short Survey on Formalising Software Requirements using Large Language Models Arshad Beg et.al. 2506.11874 null
2025-06-12 AutoMind: Adaptive Knowledgeable Agent for Automated Data Science Yixin Ou et.al. 2506.10974 null
2025-06-12 Farseer: A Refined Scaling Law in Large Language Models Houyi Li et.al. 2506.10972 link
2025-06-12 Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs Qizhe Zhang et.al. 2506.10967 null
2025-06-12 ChineseHarm-Bench: A Chinese Harmful Content Detection Benchmark Kangwei Liu et.al. 2506.10960 link
2025-06-12 SWE-Factory: Your Automated Factory for Issue Resolution Training Data and Evaluation Benchmarks Lianghong Guo et.al. 2506.10954 link
2025-06-12 Build the web for agents, not agents for the web Xing Han Lù et.al. 2506.10953 null
2025-06-12 Execution Guided Line-by-Line Code Generation Boaz Lavon et.al. 2506.10948 null
2025-06-12 GUARD: Guided Unlearning and Retention via Data Attribution for Large Language Models Evelyn Ma et.al. 2506.10946 null
2025-06-12 Self-Adapting Language Models Adam Zweiger et.al. 2506.10943 null
2025-06-12 Building a Media Ecosystem Observatory from Scratch: Infrastructure, Methodology, and Insights Zeynep Pehlivan et.al. 2506.10942 null
2025-06-11 Flipping Against All Odds: Reducing LLM Coin Flip Bias via Verbalized Rejection Sampling Tim Z. Xiao et.al. 2506.09998 null
2025-06-11 From Judgment to Interference: Early Stopping LLM Harmful Outputs via Streaming Content Monitoring Yang Li et.al. 2506.09996 null
2025-06-11 Large Language Models for Toxic Language Detection in Low-Resource Balkan Languages Amel Muminovic et.al. 2506.09992 link
2025-06-11 Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation Xinyu Yang et.al. 2506.09991 null
2025-06-11 V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning Mido Assran et.al. 2506.09985 link
2025-06-11 Step-by-step Instructions and a Simple Tabular Output Format Improve the Dependency Parsing Accuracy of LLMs Hiroshi Matsuda et.al. 2506.09983 null
2025-06-11 SRLAgent: Enhancing Self-Regulated Learning Skills through Gamification and LLM Assistance Wentao Ge et.al. 2506.09968 null
2025-06-11 Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing Junfei Wu et.al. 2506.09965 link
2025-06-11 Kvasir-VQA-x1: A Multimodal Dataset for Medical Reasoning and Robust MedVQA in Gastrointestinal Endoscopy Sushant Gautam et.al. 2506.09958 link
2025-06-11 LLMail-Inject: A Dataset from a Realistic Adaptive Prompt Injection Challenge Sahar Abdelnabi et.al. 2506.09956 null
2025-06-09 GUI-Reflection: Empowering Multimodal GUI Models with Self-Reflection Behavior Penghao Wu et.al. 2506.08012 link
2025-06-09 Play to Generalize: Learning to Reason Through Game Play Yunfei Xie et.al. 2506.08011 link
2025-06-09 Reinforcement Pre-Training Qingxiu Dong et.al. 2506.08007 null
2025-06-09 Reparameterized LLM Training via Orthogonal Equivalence Transformation Zeju Qiu et.al. 2506.08001 link
2025-06-09 Supporting Construction Worker Well-Being with a Multi-Agent Conversational AI System Fan Yang et.al. 2506.07997 null
2025-06-09 $τ^2$-Bench: Evaluating Conversational Agents in a Dual-Control Environment Victor Barres et.al. 2506.07982 link
2025-06-09 HeuriGym: An Agentic Benchmark for LLM-Crafted Heuristics in Combinatorial Optimization Hongzheng Chen et.al. 2506.07972 link
2025-06-09 CyberV: Cybernetics for Test-time Scaling in Video Understanding Jiahao Meng et.al. 2506.07971 link
2025-06-09 SpaCE-10: A Comprehensive Benchmark for Multimodal Large Language Models in Compositional Spatial Intelligence Ziyang Gong et.al. 2506.07966 link
2025-06-09 Reinforcing Multimodal Understanding and Generation with Dual Self-rewards Jixiang Hong et.al. 2506.07963 null
2025-06-06 Eigenspectrum Analysis of Neural Networks without Aspect Ratio Bias Yuanzhe Hu et.al. 2506.06280 null
2025-06-06 CoMemo: LVLMs Need Image Context with Image Memory Shi Liu et.al. 2506.06279 link
2025-06-06 AdvSumm: Adversarial Training for Bias Mitigation in Text Summarization Mukur Gupta et.al. 2506.06273 null
2025-06-06 Cartridges: Lightweight and general-purpose long context representations via self-study Sabri Eyuboglu et.al. 2506.06266 link
2025-06-06 PersonaAgent: When Large Language Model Agents Meet Personalization at Test Time Weizhi Zhang et.al. 2506.06254 null
2025-06-06 DesignBench: A Comprehensive Benchmark for MLLM-based Front-end Code Generation Jingyu Xiao et.al. 2506.06251 link
2025-06-06 Visual Graph Arena: Evaluating Visual Conceptualization of Vision and Multimodal Large Language Models Zahra Babaiee et.al. 2506.06242 null
2025-06-06 Bridging External and Parametric Knowledge: Mitigating Hallucination of LLMs with Shared-Private Semantic Synergy in Dual-Stream Knowledge Yi Sui et.al. 2506.06240 null
2025-06-06 CompilerGPT: Leveraging Large Language Models for Analyzing and Acting on Compiler Optimization Reports Peter Pirkelbauer et.al. 2506.06227 null
2025-06-06 PROVSYN: Synthesizing Provenance Graphs for Data Augmentation in Intrusion Detection Systems Yi Huang et.al. 2506.06226 null
2025-06-05 Why LLM Safety Guardrails Collapse After Fine-tuning: A Similarity Analysis Between Alignment and Fine-tuning Datasets Lei Hsiung et.al. 2506.05346 null
2025-06-05 SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs Jiahui Wang et.al. 2506.05344 link
2025-06-05 Direct Numerical Layout Generation for 3D Indoor Scene Synthesis via Spatial Reasoning Xingjian Ran et.al. 2506.05341 null
2025-06-05 VideoMolmo: Spatio-Temporal Grounding Meets Pointing Ghazi Shazan Ahmad et.al. 2506.05336 link
2025-06-05 Search Arena: Analyzing Search-Augmented LLMs Mihran Miroyan et.al. 2506.05334 link
2025-06-05 MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning Xinyan Chen et.al. 2506.05331 link
2025-06-05 Improving Data Efficiency for LLM Reinforcement Fine-tuning Through Difficulty-targeted Online Data Selection and Rollout Replay Yifan Sun et.al. 2506.05316 null
2025-06-05 Constrained Entropic Unlearning: A Primal-Dual Framework for Large Language Models Taha Entesari et.al. 2506.05314 null
2025-06-05 ProRefine: Inference-time Prompt Refinement with Textual Feedback Deepak Pandita et.al. 2506.05305 null
2025-06-05 Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos Weifeng Lin et.al. 2506.05302 null
2025-06-04 Language-Image Alignment with Fixed Text Encoders Jingfeng Yang et.al. 2506.04209 link
2025-06-04 Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning Shuang Chen et.al. 2506.04207 link
2025-06-04 EPiC: Towards Lossless Speedup for Reasoning Training through Edge-Preserving CoT Condensation Jinghan Jia et.al. 2506.04205 null
2025-06-04 Cascadia: A Cascade Serving System for Large Language Models Youhe Jiang et.al. 2506.04203 null
2025-06-04 TracLLM: A Generic Framework for Attributing Long Context LLMs Yanting Wang et.al. 2506.04202 link
2025-06-04 R-Search: Empowering LLM Reasoning with Search via Multi-Reward Reinforcement Learning Qingfei Zhao et.al. 2506.04185 link
2025-06-04 SuperWriter: Reflection-Driven Long-Form Generation with Large Language Models Yuhao Wu et.al. 2506.04180 link
2025-06-04 SkipGPT: Dynamic Layer Pruning Reinvented with Token Awareness and Module Decoupling Anhao Zhao et.al. 2506.04179 null
2025-06-04 Does Prompt Design Impact Quality of Data Imputation by LLMs? Shreenidhi Srinivasan et.al. 2506.04172 null
2025-06-04 VISCA: Inferring Component Abstractions for Automated End-to-End Testing Parsa Alian et.al. 2506.04161 null
2025-06-03 Entity-Augmented Neuroscience Knowledge Retrieval Using Ontology and Semantic Understanding Capability of LLM Pralaypati Ta et.al. 2506.03145 null
2025-06-03 Not All Tokens Are Meant to Be Forgotten Xiangyu Zhou et.al. 2506.03142 null
2025-06-03 SVGenius: Benchmarking LLMs in SVG Understanding, Editing and Generation Siqi Chen et.al. 2506.03139 link
2025-06-03 Native-Resolution Image Synthesis Zidong Wang et.al. 2506.03131 link
2025-06-03 AnimeShooter: A Multi-Shot Animation Dataset for Reference-Guided Video Generation Lu Qiu et.al. 2506.03126 link
2025-06-03 AUTOCIRCUIT-RL: Reinforcement Learning-Driven LLM for Automated Circuit Topology Generation Prashanth Vijayaraghavan et.al. 2506.03122 null
2025-06-03 Critique-GRPO: Advancing LLM Reasoning with Natural Language and Numerical Feedback Xiaoying Zhang et.al. 2506.03106 link
2025-06-03 TalkingMachines: Real-Time Audio-Driven FaceTime-Style Video via Autoregressive Diffusion Models Chetwin Low et.al. 2506.03099 link
2025-06-03 EDITOR: Effective and Interpretable Prompt Inversion for Text-to-Image Diffusion Models Mingzhe Li et.al. 2506.03067 null
2025-06-03 Facts Do Care About Your Language: Assessing Answer Quality of Multilingual LLMs Yuval Kansal et.al. 2506.03051 null
2025-05-30 MoDoMoDo: Multi-Domain Data Mixtures for Multimodal LLM Reinforcement Learning Yiqing Liang et.al. 2505.24871 link
2025-05-30 SiLVR: A Simple Language-based Video Reasoning Framework Ce Zhang et.al. 2505.24869 link
2025-05-30 ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models Mingjie Liu et.al. 2505.24864 null
2025-05-30 MiCRo: Mixture Modeling and Context-aware Routing for Personalized Preference Learning Jingyan Shen et.al. 2505.24846 null
2025-05-30 Chameleon: A Flexible Data-mixing Framework for Language Model Pretraining and Finetuning Wanyun Xie et.al. 2505.24844 null
2025-05-30 Vision LLMs Are Bad at Hierarchical Visual Understanding, and LLMs Are the Bottleneck Yuwen Tan et.al. 2505.24840 null
2025-05-30 VideoCAD: A Large-Scale Video Dataset for Learning UI Interactions and 3D Reasoning from CAD Software Brandon Man et.al. 2505.24838 link
2025-05-30 Improving Reliability and Explainability of Medical Question Answering through Atomic Fact Checking in Retrieval-Augmented LLMs Juraj Vladika et.al. 2505.24830 null
2025-05-30 LegalEval-Q: A New Benchmark for The Quality Evaluation of LLM-Generated Legal Text Li yunhan et.al. 2505.24826 null
2025-05-30 PhySense: Principle-Based Physics Reasoning Benchmarking for Large Language Models Yinggan Xu et.al. 2505.24823 null
2025-05-29 Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought Yunze Man et.al. 2505.23766 null
2025-05-29 From Chat Logs to Collective Insights: Aggregative Question Answering Wentao Zhang et.al. 2505.23765 null
2025-05-29 MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence Sihan Yang et.al. 2505.23764 null
2025-05-29 Sketch Down the FLOPs: Towards Efficient Networks for Human Sketch Aneeshan Sain et.al. 2505.23763 null
2025-05-29 Puzzled by Puzzles: When Vision-Language Models Can’t Take a Hint Heekyung Lee et.al. 2505.23759 link
2025-05-29 DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning Ziyin Zhang et.al. 2505.23754 link
2025-05-29 ThinkGeo: Evaluating Tool-Augmented Agents for Remote Sensing Tasks Akashah Shabbir et.al. 2505.23752 link
2025-05-29 Distortion of AI Alignment: Does Preference Optimization Optimize for Preferences? Paul Gölz et.al. 2505.23749 null
2025-05-29 Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence Diankun Wu et.al. 2505.23747 link
2025-05-29 Bounded Rationality for LLMs: Satisficing Alignment at Inference-Time Mohamad Chehade et.al. 2505.23729 null
2025-05-28 Zero-Shot Vision Encoder Grafting via LLM Surrogates Kaiyu Yue et.al. 2505.22664 link
2025-05-28 AutoL2S: Auto Long-Short Reasoning for Efficient Large Language Models Feng Luo et.al. 2505.22662 null
2025-05-28 GuessArena: Guess Who I Am? A Self-Adaptive Framework for Evaluating LLMs in Domain-Specific Knowledge and Reasoning Qingchen Yu et.al. 2505.22661 link
2025-05-28 3DLLM-Mem: Long-Term Spatial-Temporal Memory for Embodied 3D Large Language Model Wenbo Hu et.al. 2505.22657 null
2025-05-28 Position: Uncertainty Quantification Needs Reassessment for Large-language Model Agents Michael Kirchhof et.al. 2505.22655 null
2025-05-28 The Climb Carves Wisdom Deeper Than the Summit: On the Noisy Rewards in Learning to Reason Ang Lv et.al. 2505.22653 link
2025-05-28 Characterizing Bias: Benchmarking Large Language Models in Simplified versus Traditional Chinese Hanjia Lyu et.al. 2505.22645 link
2025-05-28 Learning Composable Chains-of-Thought Fangcong Yin et.al. 2505.22635 null
2025-05-28 Spatial Knowledge Graph-Guided Multimodal Synthesis Yida Xue et.al. 2505.22633 null
2025-05-28 Stochastic Chameleons: Irrelevant Context Hallucinations Reveal Class-Based (Mis)Generalization in LLMs Ziling Cheng et.al. 2505.22630 null
2025-05-27 Silence is Not Consensus: Disrupting Agreement Bias in Multi-Agent LLMs via Catfish Agent for Clinical Decision Making Yihan Wang et.al. 2505.21503 null
2025-05-27 Adversarial Attacks against Closed-Source MLLMs via Feature Optimal Alignment Xiaojun Jia et.al. 2505.21494 null
2025-05-27 Reinforcing General Reasoning without Verifiers Xiangxin Zhou et.al. 2505.21493 null
2025-05-27 Robust Hypothesis Generation: LLM-Automated Language Bias for Inductive Logic Programming Yang Yang et.al. 2505.21486 null
2025-05-27 Are Language Models Consequentialist or Deontological Moral Reasoners? Keenan Samway et.al. 2505.21479 null
2025-05-27 Policy Optimized Text-to-Image Pipeline Design Uri Gadot et.al. 2505.21478 null
2025-05-27 Scaling External Knowledge Input Beyond Context Windows of LLMs via Multi-Agent Collaboration Zijun Liu et.al. 2505.21471 link
2025-05-27 Do LLMs Need to Think in One Language? Correlation between Latent Language and Task Performance Shintaro Ozaki et.al. 2505.21458 null
2025-05-27 Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO Muzhi Zhu et.al. 2505.21457 null
2025-05-27 Can Large Reasoning Models Self-Train? Sheikh Shafayat et.al. 2505.21444 null
2025-05-26 Pangu Light: Weight Re-Initialization for Pruning and Accelerating LLMs Hanting Chen et.al. 2505.20155 null
2025-05-26 UORA: Uniform Orthogonal Reinitialization Adaptation in Parameter-Efficient Fine-Tuning of Large Models Xueyan Zhang et.al. 2505.20154 null
2025-05-26 MineAnyBuild: Benchmarking Spatial Planning for Open-world AI Agents Ziming Wei et.al. 2505.20148 null
2025-05-26 FUDOKI: Discrete Flow-based Unified Understanding and Generation via Kinetic-Optimal Velocities Jin Wang et.al. 2505.20147 null
2025-05-26 StructEval: Benchmarking LLMs’ Capabilities to Generate Structural Outputs Jialin Yang et.al. 2505.20139 null
2025-05-26 Iterative Self-Incentivization Empowers Large Language Models as Agentic Searchers Zhengliang Shi et.al. 2505.20128 null
2025-05-26 Agentic AI Process Observability: Discovering Behavioral Variability Fabiana Fournier et.al. 2505.20127 null
2025-05-26 TrojanStego: Your Language Model Can Secretly Be A Steganographic Privacy Leaking Agent Dominik Meier et.al. 2505.20118 null
2025-05-26 Named Entity Recognition in Historical Italian: The Case of Giacomo Leopardi’s Zibaldone Cristian Santini et.al. 2505.20113 null
2025-05-26 ResSVD: Residual Compensated SVD for Large Language Model Compression Haolei Bai et.al. 2505.20112 null
2025-05-26 Language-Agnostic Suicidal Risk Detection Using Large Language Models June-Woo Kim et.al. 2505.20109 null
2025-05-26 Adaptive Deep Reasoning: Triggering Deep Thinking When Needed Yunhao Wang et.al. 2505.20101 null
2025-05-23 Fann or Flop: A Multigenre, Multiera Benchmark for Arabic Poetry Understanding in LLMs Wafa Alghallabi et.al. 2505.18152 null
2025-05-23 First Finish Search: Efficient Test-Time Scaling in Large Language Models Aradhye Agarwal et.al. 2505.18149 null
2025-05-23 Lost in the Haystack: Smaller Needles are More Difficult for LLMs to Find Owen Bianchi et.al. 2505.18148 null
2025-05-23 Gaming Tool Preferences in Agentic LLMs Kazem Faghih et.al. 2505.18135 link
2025-05-23 Reward Model Overoptimisation in Iterated RLHF Lorenz Wolf et.al. 2505.18126 null
2025-05-23 UNJOIN: Enhancing Multi-Table Text-to-SQL Generation via Schema Simplification Poojah Ganesan et.al. 2505.18122 null
2025-05-23 ProgRM: Build Better GUI Agents with Progress Rewards Danyang Zhang et.al. 2505.18121 null
2025-05-23 Bidirectional Knowledge Distillation for Enhancing Sequential Recommendation with Large Language Models Jiongran Wu et.al. 2505.18120 null
2025-05-23 Watch and Listen: Understanding Audio-Visual-Speech Moments with Multimodal LLM Zinuo Li et.al. 2505.18110 null
2025-05-23 ManuSearch: Democratizing Deep Search in Large Language Models with a Transparent and Open Multi-Agent Framework Lisheng Huang et.al. 2505.18105 null
2025-05-22 CrossLMM: Decoupling Long Video Sequences from LMMs via Dual Cross-Attention Mechanisms Shilin Yan et.al. 2505.17020 link
2025-05-22 Let Androids Dream of Electric Sheep: A Human-like Image Implication Understanding and Reasoning Framework Chenhao Zhang et.al. 2505.17019 link
2025-05-22 SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward Kaixuan Fan et.al. 2505.17018 link
2025-05-22 Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO Chengzhuo Tong et.al. 2505.17017 link
2025-05-22 Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models Runsen Xu et.al. 2505.17015 link
2025-05-22 SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding Haoning Wu et.al. 2505.17012 link
2025-05-22 R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning Huatong Song et.al. 2505.17005 link
2025-05-22 Do Large Language Models Excel in Complex Logical Reasoning with Formal Language? Jin Jiang et.al. 2505.16998 link
2025-05-22 DecoupledESC: Enhancing Emotional Support Generation via Strategy-Response Decoupled Preference Optimization Chao Zhang et.al. 2505.16995 null
2025-05-22 Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel Decoding Runpeng Yu et.al. 2505.16990 link
2025-05-21 The Atlas of In-Context Learning: How Attention Heads Shape In-Context Retrieval Augmentation Patrick Kahardipraja et.al. 2505.15807 null
2025-05-21 Keep Security! Benchmarking Security Policy Preservation in Large Language Model Contexts Against Indirect Attacks in Question Answering Hwan Chang et.al. 2505.15805 null
2025-05-21 STAR-R1: Spatial TrAnsformation Reasoning by Reinforcing Multimodal LLMs Zongzhao Li et.al. 2505.15804 null
2025-05-21 VerifyBench: Benchmarking Reference-based Reward Systems for Large Language Models Yuchen Yan et.al. 2505.15801 null
2025-05-21 Reverse Engineering Human Preferences with Reinforcement Learning Lisa Alazraki et.al. 2505.15795 null
2025-05-21 HCRMP: A LLM-Hinted Contextual Reinforcement Learning Framework for Autonomous Driving Zhiwen Chen et.al. 2505.15793 null
2025-05-21 Large Language Models as Computable Approximations to Solomonoff Induction Jun Wan et.al. 2505.15784 null
2025-05-21 ConvSearch-R1: Enhancing Query Reformulation for Conversational Search with Reasoning via Reinforcement Learning Changtai Zhu et.al. 2505.15776 null
2025-05-21 Beyond Hard and Soft: Hybrid Context Compression for Balancing Local and Global Information Retention Huanxuan Liao et.al. 2505.15774 null
2025-05-21 MIKU-PAL: An Automated and Standardized Multi-Modal Method for Speech Paralinguistic and Affect Labeling Cheng Yifan et.al. 2505.15772 null
2025-05-20 Mind the Gap: Bridging Thought Leap for Improved Chain-of-Thought Tuning Haolei Xu et.al. 2505.14684 null
2025-05-20 UniGen: Enhanced Training & Test-Time Strategies for Unified Multimodal Understanding and Generation Rui Tian et.al. 2505.14682 null
2025-05-20 UltraEdit: Training-, Subject-, and Memory-Free Lifelong Editing in Large Language Models Xiaojie Gu et.al. 2505.14679 null
2025-05-20 Visionary-R1: Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning Jiaer Xia et.al. 2505.14677 null
2025-05-20 Reward Reasoning Model Jiaxin Guo et.al. 2505.14674 null
2025-05-20 Quartet: Native FP4 Training Can Be Optimal for Large Language Models Roberto L. Castro et.al. 2505.14669 null
2025-05-20 ContextAgent: Context-Aware Proactive LLM Agents with Open-World Sensory Perceptions Bufang Yang et.al. 2505.14668 null
2025-05-20 Beyond Words: Multimodal LLM Knows When to Speak Zikai Liao et.al. 2505.14654 null
2025-05-20 General-Reasoner: Advancing LLM Reasoning Across All Domains Xueguang Ma et.al. 2505.14652 null
2025-05-20 Think Only When You Need with Large Hybrid-Reasoning Models Lingjie Jiang et.al. 2505.14631 null
2025-05-19 CIE: Controlling Language Model Text Generations Using Continuous Signals Vinay Samuel et.al. 2505.13448 link
2025-05-19 Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards Xiaoyuan Liu et.al. 2505.13445 null
2025-05-19 Optimizing Anytime Reasoning via Budget Relative Policy Optimization Penghui Qi et.al. 2505.13438 link
2025-05-19 SMOTExT: SMOTE meets Large Language Models Mateusz Bystroński et.al. 2505.13434 null
2025-05-19 Fine-tuning Quantized Neural Networks with Zeroth-order Optimization Sifeng Shang et.al. 2505.13430 null
2025-05-19 Understanding Complexity in VideoQA via Visual Program Generation Cristobal Eyzaguirre et.al. 2505.13429 null
2025-05-19 MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision Lingxiao Du et.al. 2505.13427 link
2025-05-19 Learnware of Language Models: Specialized Small Language Models Can Do Big Zhi-Hao Tan et.al. 2505.13425 null
2025-05-19 Make Still Further Progress: Chain of Thoughts for Tabular Data Leaderboard Si-Yang Liu et.al. 2505.13421 null
2025-05-19 FEALLM: Advancing Facial Emotion Analysis in Multimodal Large Language Models with Emotional Synergy and Reasoning Zhuozhao Hu et.al. 2505.13419 link
2025-05-16 Modeling cognitive processes of natural reading with transformer-based Language Models Bruno Bianchi et.al. 2505.11485 null
2025-05-16 msf-CNN: Patch-based Multi-Stage Fusion with Convolutional Neural Networks for TinyML Zhaolan Huang et.al. 2505.11483 null
2025-05-16 Improving Assembly Code Performance with Large Language Models via Reinforcement Learning Anjiang Wei et.al. 2505.11480 null
2025-05-16 HelpSteer3-Preference: Open Human-Annotated Preference Data across Diverse Tasks and Languages Zhilin Wang et.al. 2505.11475 null
2025-05-16 Disentangling Reasoning and Knowledge in Medical Large Language Models Rahul Thapa et.al. 2505.11462 null
2025-05-16 ProxyPrompt: Securing System Prompts against Prompt Extraction Attacks Zhixiong Zhuang et.al. 2505.11459 null
2025-05-16 HumaniBench: A Human-Centric Framework for Large Multimodal Models Evaluation Shaina Raza et.al. 2505.11454 null
2025-05-16 LLMs unlock new paths to monetizing exploits Nicholas Carlini et.al. 2505.11449 null
2025-05-16 Is Compression Really Linear with Code Intelligence? Xianzhen Luo et.al. 2505.11441 null
2025-05-16 GODBench: A Benchmark for Multimodal Large Language Models in Video Comment Art Chenkai Zhang et.al. 2505.11436 null
2025-05-15 End-to-End Vision Tokenizer Tuning Wenxuan Wang et.al. 2505.10562 null
2025-05-15 Neural Thermodynamic Laws for Large Language Model Training Ziming Liu et.al. 2505.10559 null
2025-05-15 MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning Ke Wang et.al. 2505.10557 link
2025-05-15 Does Feasibility Matter? Understanding the Impact of Feasibility on Synthetic Training Data Yiwen Liu et.al. 2505.10551 link
2025-05-15 Towards a Deeper Understanding of Reasoning Capabilities in Large Language Models Annie Wong et.al. 2505.10543 link
2025-05-15 Exploring Implicit Visual Misunderstandings in Multimodal Large Language Models through Attention Analysis Pengfei Wang et.al. 2505.10541 link
2025-05-15 S3C2 Summit 2024-09: Industry Secure Software Supply Chain Summit Imranur Rahman et.al. 2505.10538 null
2025-05-15 RouteNator: A Router-Based Multi-Modal Architecture for Generating Synthetic Training Data for Function Calling LLMs Vibha Belavadi et.al. 2505.10495 null
2025-05-15 Can You Really Trust Code Copilots? Evaluating Large Language Models from a Code Security Perspective Yutao Mou et.al. 2505.10494 link
2025-05-15 CL-RAG: Bridging the Gap in Retrieval-Augmented Generation with Curriculum Learning Shaohan Wang et.al. 2505.10493 null
2025-05-14 Customizing a Large Language Model for VHDL Design of High-Performance Microprocessors Nicolas Dupuis et.al. 2505.09610 null
2025-05-14 Adversarial Suffix Filtering: a Defense Pipeline for LLMs David Khachaturov et.al. 2505.09602 null
2025-05-14 How Hungry is AI? Benchmarking Energy, Water, and Carbon Footprint of LLM Inference Nidhal Jegham et.al. 2505.09598 null
2025-05-14 WorldView-Bench: A Benchmark for Evaluating Global Cultural Perspectives in Large Language Models Abdullah Mushtaq et.al. 2505.09595 null
2025-05-14 Variational Visual Question Answering Tobias Jan Wieczorek et.al. 2505.09591 null
2025-05-14 Beyond Likes: How Normative Feedback Complements Engagement Signals on Social Media Yuchen Wu et.al. 2505.09583 null
2025-05-14 Ethics and Persuasion in Reinforcement Learning from Human Feedback: A Procedural Rhetorical Approach Shannon Lodoen et.al. 2505.09576 null
2025-05-14 MIGRATION-BENCH: Repository-Level Code Migration Benchmark from Java 8 Linbo Liu et.al. 2505.09569 null
2025-05-14 PT-MoE: An Efficient Finetuning Framework for Integrating Mixture-of-Experts into Prompt Tuning Zongqian Li et.al. 2505.09519 null
2025-05-14 Layered Unlearning for Adversarial Relearning Timothy Qian et.al. 2505.09500 link
2025-05-13 CodePDE: An Inference Framework for LLM-driven PDE Solver Generation Shanda Li et.al. 2505.08783 null
2025-05-13 HealthBench: Evaluating Large Language Models Towards Improved Human Health Rahul K. Arora et.al. 2505.08775 link
2025-05-14 Towards Autonomous UAV Visual Object Search in City Space: Benchmark and Agentic Methodology Yatai Ji et.al. 2505.08765 null
2025-05-13 AC-Reason: Towards Theory-Guided Actual Causality Reasoning with Large Language Models Yanxi Zhang et.al. 2505.08750 null
2025-05-13 DeepMath-Creative: A Benchmark for Evaluating Mathematical Creativity of Large Language Models Xiaoyang Chen et.al. 2505.08744 link
2025-05-13 Probability Consistency in Large Language Models: Theoretical Foundations Meet Empirical Discrepancies Xiaoliang Luo et.al. 2505.08739 null
2025-05-13 NurValues: Real-World Nursing Values Evaluation for Large Language Models in Clinical Context Ben Yao et.al. 2505.08734 null
2025-05-13 Securing RAG: A Risk Assessment and Mitigation Framework Lukas Ammann et.al. 2505.08728 null
2025-05-13 PWC-MoE: Privacy-Aware Wireless Collaborative Mixture of Experts Yang Su et.al. 2505.08719 null
2025-05-13 LLM-based Prompt Ensemble for Reliable Medical Entity Recognition from EHRs K M Sajjadul Islam et.al. 2505.08704 null
2025-05-12 A Comparative Analysis of Static Word Embeddings for Hungarian Máté Gedeon et.al. 2505.07809 null
2025-05-12 Learning Dynamics in Continual Pre-Training for Large Language Models Xingjin Wang et.al. 2505.07796 null
2025-05-12 Domain Regeneration: How well do LLMs match syntactic properties of text domains? Da Ju et.al. 2505.07784 null
2025-05-12 Relative Overfitting and Accept-Reject Framework Yanxin Liu et.al. 2505.07783 null
2025-05-12 MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering Rushi Qiang et.al. 2505.07782 null
2025-05-12 Must Read: A Systematic Survey of Computational Persuasion Nimet Beyza Bozdag et.al. 2505.07775 null
2025-05-12 Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving Xinji Mai et.al. 2505.07773 link
2025-05-12 Enhancing Code Generation via Bidirectional Comment-Level Mutual Grounding Yifeng Di et.al. 2505.07768 null
2025-05-12 Assessing the Chemical Intelligence of Large Language Models Nicholas T. Runcie et.al. 2505.07735 null
2025-05-12 Spoken Language Understanding on Unseen Tasks With In-Context Learning Neeraj Agrawal et.al. 2505.07731 null
2025-05-09 From Millions of Tweets to Actionable Insights: Leveraging LLMs for User Profiling Vahid Rahimzadeh et.al. 2505.06184 null
2025-05-09 A Large Language Model-Enhanced Q-learning for Capacitated Vehicle Routing Problem with Time Windows Linjiang Cao et.al. 2505.06178 null
2025-05-09 MonetGPT: Solving Puzzles Enhances MLLMs’ Image Retouching Skills Niladri Shekhar Dutt et.al. 2505.06176 null
2025-05-09 Turbo-ICL: In-Context Learning-Based Turbo Equalization Zihang Song et.al. 2505.06175 null
2025-05-09 A Scaling Law for Token Efficiency in LLM Fine-Tuning Under Fixed Compute Budgets Ryan Lagasse et.al. 2505.06150 null
2025-05-09 Can Prompting LLMs Unlock Hate Speech Detection across Languages? A Zero-shot and Few-shot Study Faeze Ghorbanpour et.al. 2505.06149 null
2025-05-09 LLMs Get Lost In Multi-Turn Conversation Philippe Laban et.al. 2505.06120 link
2025-05-09 Multimodal Sentiment Analysis on CMU-MOSEI Dataset using Transformer-based Models Jugal Gajjar et.al. 2505.06110 null
2025-05-09 LLMs Outperform Experts on Challenging Biology Benchmarks Lennart Justen et.al. 2505.06108 null
2025-05-09 Free and Fair Hardware: A Pathway to Copyright Infringement-Free Verilog Generation using LLMs Sam Bush et.al. 2505.06096 null
2025-05-08 Mogao: An Omni Foundation Model for Interleaved Multi-Modal Generation Chao Liao et.al. 2505.05472 null
2025-05-08 Flow-GRPO: Training Flow Matching Models via Online RL Jie Liu et.al. 2505.05470 link
2025-05-08 Generating Physically Stable and Buildable LEGO Designs from Text Ava Pun et.al. 2505.05469 link
2025-05-08 StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant Haibo Wang et.al. 2505.05467 null
2025-05-08 ComPO: Preference Alignment via Comparison Oracles Peter Chen et.al. 2505.05465 null
2025-05-08 Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging Shiqi Chen et.al. 2505.05464 link
2025-05-08 UKElectionNarratives: A Dataset of Misleading Narratives Surrounding Recent UK General Elections Fatima Haouari et.al. 2505.05459 null
2025-05-08 SITE: towards Spatial Intelligence Thorough Evaluation Wenqi Wang et.al. 2505.05456 null
2025-05-08 Conversational Process Model Redesign Nataliia Klievtsova et.al. 2505.05453 null
2025-05-08 clem:todd: A Framework for the Systematic Benchmarking of LLM-Based Task-Oriented Dialogue System Realisations Chalamalasetti Kranti et.al. 2505.05445 null
2025-05-07 EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning Zhenghao Xing et.al. 2505.04623 null
2025-05-07 On Path to Multimodal Generalist: General-Level and General-Bench Hao Fei et.al. 2505.04620 link
2025-05-07 OmniGIRL: A Multilingual and Multimodal Benchmark for GitHub Issue Resolution Lianghong Guo et.al. 2505.04606 null
2025-05-08 MonoCoP: Chain-of-Prediction for Monocular 3D Object Detection Zhihao Zhang et.al. 2505.04594 null
2025-05-07 ZeroSearch: Incentivize the Search Capability of LLMs without Searching Hao Sun et.al. 2505.04588 link
2025-05-07 SlideItRight: Using AI to Find Relevant Slides and Provide Feedback for Open-Ended Questions Chloe Qianhui Zhao et.al. 2505.04584 null
2025-05-07 Fight Fire with Fire: Defending Against Malicious RL Fine-Tuning via Reward Neutralization Wenjun Cao et.al. 2505.04578 null
2025-05-07 Comparative Analysis of Carbon Footprint in Manual vs. LLM-Assisted Code Development Kuen Sum Cheung et.al. 2505.04521 null
2025-05-07 Pangu Ultra MoE: How to Train Your Big MoE on Ascend NPUs Yehui Tang et.al. 2505.04519 null
2025-05-07 CAD-Llama: Leveraging Large Language Models for Computer-Aided Design Parametric 3D Model Generation Jiahao Li et.al. 2505.04481 null
2025-05-06 VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model Zuwei Long et.al. 2505.03739 link
2025-05-06 Graph Drawing for LLMs: An Empirical Evaluation Walter Didimo et.al. 2505.03678 null
2025-05-06 Binding threshold units with artificial oscillatory neurons Vladimir Fanaskov et.al. 2505.03648 null
2025-05-06 PhysLLM: Harnessing Large Language Models for Cross-Modal Remote Physiological Sensing Yiping Xie et.al. 2505.03621 null
2025-05-06 A Unifying Bias-aware Multidisciplinary Framework for Investigating Socio-Technical Issues Sacha Hasan et.al. 2505.03593 null
2025-05-06 BCause: Human-AI collaboration to improve hybrid mapping and ideation in argumentation-grounded deliberation Lucas Anastasiou et.al. 2505.03584 null
2025-05-06 DyGEnc: Encoding a Sequence of Textual Scene Graphs to Reason and Answer Questions in Dynamic Scenes Sergey Linok et.al. 2505.03581 link
2025-05-06 LlamaFirewall: An open source guardrail system for building secure AI agents Sahana Chennabasappa et.al. 2505.03574 null
2025-05-06 Say It Another Way: A Framework for User-Grounded Paraphrasing Cléa Chataigner et.al. 2505.03563 null
2025-05-06 A Comprehensive Survey of Large AI Models for Future Communications: Foundations, Applications and Challenges Feibo Jiang et.al. 2505.03556 null
2025-05-05 Scenethesis: A Language and Vision Agentic Framework for 3D Scene Generation Lu Ling et.al. 2505.02836 null
2025-05-05 R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning Yi-Fan Zhang et.al. 2505.02835 link
2025-05-05 ReplaceMe: Network Simplification via Layer Pruning and Linear Transformations Dmitriy Shopkhoev et.al. 2505.02819 link
2025-05-05 Towards Quantifying the Hessian Structure of Neural Networks Zhaorui Dong et.al. 2505.02809 null
2025-05-05 Generating HomeAssistant Automations Using an LLM-based Chatbot Mathyas Giudici et.al. 2505.02802 null
2025-05-05 HSplitLoRA: A Heterogeneous Split Parameter-Efficient Fine-Tuning Framework for Large Language Models Zheng Lin et.al. 2505.02795 null
2025-05-05 Beyond the Monitor: Mixed Reality Visualization and AI for Enhanced Digital Pathology Workflow Jai Prakash Veerla et.al. 2505.02780 null
2025-05-05 Giving Simulated Cells a Voice: Evolving Prompt-to-Intervention Models for Cellular Control Nam H. Le et.al. 2505.02766 null
2025-05-05 Bye-bye, Bluebook? Automating Legal Procedure with Large Language Models Matthew Dahl et.al. 2505.02763 null
2025-05-05 Knowledge Graphs for Enhancing Large Language Models in Entity Disambiguation Pons Gerard et.al. 2505.02737 null
2025-05-02 Helping Big Language Models Protect Themselves: An Enhanced Filtering and Summarization System Sheikh Samit Muhaimin et.al. 2505.01315 null
2025-05-02 Enhancing SPARQL Query Rewriting for Complex Ontology Alignments Anicet Lepetit Ondo et.al. 2505.01309 null
2025-05-02 Document Retrieval Augmented Fine-Tuning (DRAFT) for safety-critical software assessments Regan Bolton et.al. 2505.01307 null
2025-05-02 FlowDubber: Movie Dubbing with LLM-based Semantic-aware Learning and Flow Matching based Voice Enhancing Gaoxiang Cong et.al. 2505.01263 null
2025-05-02 Digital Pathway Curation (DPC): a comparative pipeline to assess the reproducibility, consensus and accuracy across Gemini, PubMed, and scientific reviewers in biomedical research Flavio Lichtenstein et.al. 2505.01259 null
2025-05-02 CaReAQA: A Cardiac and Respiratory Audio Question Answering Model for Open-Ended Diagnostic Reasoning Tsai-Ning Wang et.al. 2505.01199 null
2025-05-02 LLM Security: Vulnerabilities, Attacks, Defenses, and Countermeasures Francisco Aguilera-Martínez et.al. 2505.01177 null
2025-05-02 Methodological Foundations for AI-Driven Survey Question Generation Ted K. Mburu et.al. 2505.01150 null
2025-05-02 Retrieval-Augmented Generation in Biomedicine: A Survey of Technologies, Datasets, and Clinical Applications Jiawei He et.al. 2505.01146 null
2025-05-02 MateICL: Mitigating Attention Dispersion in Large-Scale In-Context Learning Murtadha Ahmed et.al. 2505.01110 null
2025-05-01 T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT Dongzhi Jiang et.al. 2505.00703 link
2025-05-01 Steering Large Language Models with Register Analysis for Arbitrary Style Transfer Xinchen Yang et.al. 2505.00679 null
2025-05-01 Rethinking Memory in AI: Taxonomy, Operations, Topics, and Future Directions Yiming Du et.al. 2505.00675 link
2025-05-01 DeepCritic: Deliberate Critique with Large Language Models Wenkai Yang et.al. 2505.00662 link
2025-05-01 On the generalization of language models from in-context learning and finetuning: a controlled study Andrew K. Lampinen et.al. 2505.00661 null
2025-05-01 Large Language Models Understanding: an Inherent Ambiguity Barrier Daniel N. Nissani et.al. 2505.00654 null
2025-05-01 Open-Source LLM-Driven Federated Transformer for Predictive IoV Management Yazan Otoum et.al. 2505.00651 null
2025-05-01 Investigating Task Arithmetic for Zero-Shot Information Retrieval Marco Braga et.al. 2505.00649 null
2025-05-01 The Illusion of Role Separation: Hidden Shortcuts in LLM Role Learning (and How to Fix Them) Zihao Wang et.al. 2505.00626 null
2025-05-01 FineScope: Precision Pruning for Domain-Specialized Large Language Models Using SAE-Guided Self-Data Cultivation Chaitali Bhattacharyya et.al. 2505.00624 null
2025-04-30 TRUST: An LLM-Based Dialogue System for Trauma Understanding and Structured Assessments Sichang Tu et.al. 2504.21851 null
2025-04-30 COMPACT: COMPositional Atomic-to-Complex Visual Capability Tuning Xindi Wu et.al. 2504.21850 link
2025-04-30 An Empirical Study on the Effectiveness of Large Language Models for Binary Code Understanding Xiuwei Shang et.al. 2504.21803 null
2025-04-30 DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition Z. Z. Ren et.al. 2504.21801 link
2025-04-30 MAC-Tuning: LLM Multi-Compositional Problem Reasoning with Enhanced Knowledge Boundary Awareness Junsheng Huang et.al. 2504.21773 null
2025-04-30 LASHED: LLMs And Static Hardware Analysis for Early Detection of RTL Bugs Baleegh Ahmad et.al. 2504.21770 null
2025-04-30 LLM-based Interactive Imitation Learning for Robotic Manipulation Jonas Werner et.al. 2504.21769 null
2025-04-30 Investigating Literary Motifs in Ancient and Medieval Novels with Large Language Models Emelie Hallenberg et.al. 2504.21742 null
2025-04-30 TheraQuest: A Gamified, LLM-Powered Simulation for Massage Therapy Training Shengqian Wang et.al. 2504.21735 null
2025-04-30 XBreaking: Explainable Artificial Intelligence for Jailbreaking LLMs Marco Arazzi et.al. 2504.21700 null
2025-04-29 YoChameleon: Personalized Vision and Language Generation Thao Nguyen et.al. 2504.20998 link
2025-04-29 Toward Efficient Exploration by Large Language Model Agents Dilip Arumugam et.al. 2504.20997 null
2025-04-29 X-Fusion: Introducing New Modality to Frozen Large Language Models Sicheng Mo et.al. 2504.20996 null
2025-04-29 ACE: A Security Architecture for LLM-Integrated App Systems Evan Li et.al. 2504.20984 null
2025-04-29 Real-Time Wayfinding Assistant for Blind and Low-Vision Users Dabbrata Das et.al. 2504.20976 null
2025-04-29 SetKE: Knowledge Editing for Knowledge Elements Overlap Yifan Wei et.al. 2504.20972 null
2025-04-29 OSVBench: Benchmarking LLMs on Specification Generation Tasks for Operating System Verification Shangyu Li et.al. 2504.20964 null
2025-04-29 Information Gravity: A Field-Theoretic Model for Token Selection in Large Language Models Maryna Vyshnyvetska et.al. 2504.20951 null
2025-04-29 Trace-of-Thought: Enhanced Arithmetic Problem Solving via Reasoning Distillation From Large to Small Language Models Tyler McDonald et.al. 2504.20946 null
2025-04-29 ChestX-Reasoner: Advancing Radiology Foundation Models with Reasoning through Step-by-Step Verification Ziqing Fan et.al. 2504.20930 link
2025-04-28 AutoJudge: Judge Decoding Without Manual Annotation Roman Garipov et.al. 2504.20039 null
2025-04-28 SpatialReasoner: Towards Explicit and Generalizable 3D Spatial Reasoning Wufei Ma et.al. 2504.20024 null
2025-04-28 Better To Ask in English? Evaluating Factual Accuracy of Multilingual LLMs in English and Low-Resource Languages Pritika Rohera et.al. 2504.20022 null
2025-04-28 Modular Machine Learning: An Indispensable Path towards New-Generation Large Language Models Xin Wang et.al. 2504.20020 null
2025-04-28 LLM-Generated Fake News Induces Truth Decay in News Ecosystem: A Case Study on Neural News Recommendation Beizhe Hu et.al. 2504.20013 null
2025-04-28 Towards Automated Scoping of AI for Social Good Projects Jacob Emmerson et.al. 2504.20010 null
2025-04-28 Knowledge Distillation of Domain-adapted LLMs for Question-Answering in Telecom Rishika Sen et.al. 2504.20000 null
2025-04-28 TD-EVAL: Revisiting Task-Oriented Dialogue Evaluation by Combining Turn-Level Precision with Dialogue-Level Comparisons Emre Can Acikgoz et.al. 2504.19982 null
2025-04-28 Accurate and Diverse LLM Mathematical Reasoning via Automated PRM-Guided GFlowNets Adam Younsi et.al. 2504.19981 null
2025-04-29 From Concept to Practice: an Automated LLM-aided UVM Machine for RTL Verification Junhao Ye et.al. 2504.19959 null
2025-04-25 TRACE Back from the Future: A Probabilistic Reasoning Approach to Controllable Language Generation Gwen Yidou Weng et.al. 2504.18535 link
2025-04-25 Eval3D: Interpretable and Fine-grained Evaluation for 3D Generation Shivam Duggal et.al. 2504.18509 null
2025-04-25 TopSpace: spatial topic modeling for unsupervised discovery of multicellular spatial tissue structures in multiplex imaging Junsouk Choi et.al. 2504.18495 null
2025-04-25 Investigating Co-Constructive Behavior of Large Language Models in Explanation Dialogues Leandra Fichtel et.al. 2504.18483 null
2025-04-25 Generative Induction of Dialogue Task Schemas with Streaming Refinement and Simulated Interactions James D. Finch et.al. 2504.18474 null
2025-04-25 Reason Like a Radiologist: Chain-of-Thought and Reinforcement Learning for Verifiable Report Generation Peiyuan Jing et.al. 2504.18453 null
2025-04-25 LLMpatronous: Harnessing the Power of LLMs For Vulnerability Detection Rajesh Yarra et.al. 2504.18423 null
2025-04-25 BitNet v2: Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs Hongyu Wang et.al. 2504.18415 null
2025-04-25 An Empirical Study of Evaluating Long-form Question Answering Ning Xian et.al. 2504.18413 null
2025-04-25 Expressing stigma and inappropriate responses prevents LLMs from safely replacing mental health providers Jared Moore et.al. 2504.18412 link
2025-04-24 Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models Xu Ma et.al. 2504.17789 null
2025-04-24 Replay to Remember: Retaining Domain Knowledge in Streaming Language Models Sneh Pillai et.al. 2504.17780 null
2025-04-24 Conversational Assistants to support Heart Failure Patients: comparing a Neurosymbolic Architecture with ChatGPT Anuja Tayal et.al. 2504.17753 null
2025-04-24 Towards Robust LLMs: an Adversarial Robustness Measurement Framework Natan Levy et.al. 2504.17723 null
2025-04-24 Multilingual Performance Biases of Large Language Models in Education Vansh Gupta et.al. 2504.17720 null
2025-04-24 Ensemble Bayesian Inference: Leveraging Small Language Models to Achieve LLM-level Accuracy in Profile Matching Tasks Haru-Tada Sato et.al. 2504.17685 null
2025-04-24 INSIGHT: Bridging the Student-Teacher Gap in Times of Large Language Models Jarne Thys et.al. 2504.17677 null
2025-04-24 Energy Considerations of Large Language Model Inference and Efficiency Optimizations Jared Fernandez et.al. 2504.17674 null
2025-04-24 Cross-region Model Training with Communication-Computation Overlapping and Delay Compensation Ying Zhu et.al. 2504.17672 null
2025-04-24 Data-Driven Calibration of Prediction Sets in Large Vision-Language Models Based on Inductive Conformal Prediction Yuanchang Ye et.al. 2504.17671 null
2025-04-23 IberBench: LLM Evaluation on Iberian Languages José Ángel González et.al. 2504.16921 link
2025-04-23 Do Large Language Models know who did what to whom? Joseph M. Denning et.al. 2504.16884 null
2025-04-23 Enhancing Critical Thinking with AI: A Tailored Warning System for RAG Models Xuyang Zhu et.al. 2504.16883 null
2025-04-23 Context-Enhanced Vulnerability Detection Based on Large Language Model Yixin Yang et.al. 2504.16877 null
2025-04-23 Exploring How LLMs Capture and Represent Domain-Specific Knowledge Mirian Hipolito Garcia et.al. 2504.16871 null
2025-04-23 Emo Pillars: Knowledge Distillation to Support Fine-Grained Context-Aware and Context-Less Emotion Classification Alexander Shvets et.al. 2504.16856 link
2025-04-23 Monte Carlo Planning with Large Language Model for Text-Based Game Agents Zijing Shi et.al. 2504.16855 null
2025-04-23 Improving Significant Wave Height Prediction Using Chronos Models Yilin Zhai et.al. 2504.16834 null
2025-04-23 LRASGen: LLM-based RESTful API Specification Generation Sida Deng et.al. 2504.16833 null
2025-04-23 GreenMind: A Next-Generation Vietnamese Large Language Model for Structured and Logical Reasoning Luu Quy Tung et.al. 2504.16832 null
2025-04-22 TTRL: Test-Time Reinforcement Learning Yuxin Zuo et.al. 2504.16084 link
2025-04-22 From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning Le Zhuo et.al. 2504.16080 link
2025-04-22 LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities Thomas Schmied et.al. 2504.16078 null
2025-04-22 PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models Shi Qiu et.al. 2504.16074 link
2025-04-22 A Python Tool for Reconstructing Full News Text from GDELT A. Fronzetti Colladon et.al. 2504.16063 null
2025-04-22 Vision language models are unreliable at trivial spatial cognition Sangeet Khemlani et.al. 2504.16061 null
2025-04-22 Automated Static Vulnerability Detection via a Holistic Neuro-symbolic Approach Penghui Li et.al. 2504.16057 null
2025-04-22 Honey, I Shrunk the Language Model: Impact of Knowledge Distillation Methods on Performance and Explainability Daniel Hendriks et.al. 2504.16056 null
2025-04-22 Certified Mitigation of Worst-Case LLM Copyright Infringement Jingyu Zhang et.al. 2504.16046 null
2025-04-22 LLMs meet Federated Learning for Scalable and Secure IoT Management Yazan Otoum et.al. 2504.16032 null
2025-04-21 Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs Chun-Hsiao Yeh et.al. 2504.15280 link
2025-04-21 VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models Weiye Xu et.al. 2504.15279 link
2025-04-21 Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning Jie Cheng et.al. 2504.15275 link
2025-04-21 Interpretable Locomotion Prediction in Construction Using a Memory-Driven LLM Agent With Chain-of-Thought Reasoning Ehsan Ahmadi et.al. 2504.15263 null
2025-04-21 CRUST-Bench: A Comprehensive Benchmark for C-to-safe-Rust Transpilation Anirudh Khatry et.al. 2504.15254 link
2025-04-21 Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators Yilun Zhou et.al. 2504.15253 link
2025-04-21 MR. Guard: Multilingual Reasoning Guardrail using Curriculum Learning Yahan Yang et.al. 2504.15241 null
2025-04-21 Fully Bayesian Approaches to Topics over Time Julián Cendrero et.al. 2504.15220 null
2025-04-21 EvalAgent: Discovering Implicit Evaluation Criteria from the Web Manya Wadhwa et.al. 2504.15219 null
2025-04-21 Integrating Symbolic Execution into the Fine-Tuning of Code-Generating LLMs Marina Sakharova et.al. 2504.15210 null
2025-04-18 Generative AI Act II: Test Time Scaling Drives Cognition Engineering Shijie Xia et.al. 2504.13828 link
2025-04-18 Feature Alignment and Representation Transfer in Knowledge Distillation for Large Language Models Junjie Yang et.al. 2504.13825 null
2025-04-18 Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning Yixuan Even Xu et.al. 2504.13818 null
2025-04-18 BadApex: Backdoor Attack Based on Adaptive Optimization Mechanism of Black-box Large Language Models Zhengxian Wu et.al. 2504.13775 null
2025-04-18 DP2Unlearning: An Efficient and Guaranteed Unlearning Framework for LLMs Tamim Al Mahmud et.al. 2504.13774 null
2025-04-18 Detecting Malicious Source Code in PyPI Packages with LLMs: Does RAG Come in Handy? Motunrayo Ibiyo et.al. 2504.13769 null
2025-04-18 Scaling sparse feature circuit finding for in-context learning Dmitrii Kharlapenko et.al. 2504.13756 null
2025-04-18 Controlled Territory and Conflict Tracking (CONTACT): (Geo-)Mapping Occupied Territory from Open Source Intelligence Paul K. Mandal et.al. 2504.13730 null
2025-04-18 OpenDeception: Benchmarking and Investigating AI Deceptive Behaviors via Open-ended Interaction Simulation Yichen Wu et.al. 2504.13707 null
2025-04-18 Exploring Multimodal Prompt for Visualization Authoring with Large Language Models Zhen Wen et.al. 2504.13700 null
2025-04-17 SemCORE: A Semantic-Enhanced Generative Cross-Modal Retrieval Framework with MLLMs Haoxuan Li et.al. 2504.13172 null
2025-04-17 Sleep-time Compute: Beyond Inference Scaling at Test-time Kevin Lin et.al. 2504.13171 link
2025-04-17 Exploring Expert Failures Improves LLM Agent Tuning Li-Cheng Lan et.al. 2504.13145 null
2025-04-17 Syntactic and Semantic Control of Large Language Models via Sequential Monte Carlo João Loula et.al. 2504.13139 null
2025-04-17 Energy-Based Reward Models for Robust Language Model Alignment Anamika Lochab et.al. 2504.13134 null
2025-04-17 LLMs Meet Finance: Fine-Tuning Foundation Models for the Open FinLLM Leaderboard Varun Rao et.al. 2504.13125 null
2025-04-17 Low-hallucination Synthetic Captions for Large-Scale Vision-Language Model Pre-training Xinsong Zhang et.al. 2504.13123 null
2025-04-17 VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models Haojian Huang et.al. 2504.13122 link
2025-04-17 Hadamard product in deep learning: Introduction, Advances and Challenges Grigorios G Chrysos et.al. 2504.13112 null
2025-04-17 Uncertainty-Aware Trajectory Prediction via Rule-Regularized Heteroscedastic Deep Classification Kumar Manas et.al. 2504.13111 null
2025-04-16 BitNet b1.58 2B4T Technical Report Shuming Ma et.al. 2504.12285 null
2025-04-16 HLS-Eval: A Benchmark and Framework for Evaluating LLMs on High-Level Synthesis Design Tasks Stefan Abi-Karam et.al. 2504.12268 null
2025-04-16 FLIP Reasoning Challenge Andreas Plesner et.al. 2504.12256 link
2025-04-16 AnomalyGen: An Automated Semantic Log Sequence Generation Framework with LLM for Anomaly Detection Xinyu Li et.al. 2504.12250 null
2025-04-16 MOS: Towards Effective Smart Contract Vulnerability Detection through Mixture-of-Experts Tuning of Large Language Models Hang Yuan et.al. 2504.12234 null
2025-04-16 Watermarking Needs Input Repetition Masking David Khachaturov et.al. 2504.12229 null
2025-04-16 d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning Siyan Zhao et.al. 2504.12216 link
2025-04-16 What Do Large Language Models Know? Tacit Knowledge as a Potential Causal-Explanatory Structure Céline Budding et.al. 2504.12187 null
2025-04-16 SALAD: Improving Robustness and Generalization through Contrastive Learning with Structure-Aware and LLM-Driven Augmented Data Suyoung Bae et.al. 2504.12185 null
2025-04-16 Trusting CHATGPT: how minor tweaks in the prompts lead to major differences in sentiment classification Jaime E. Cuellar et.al. 2504.12180 null
2025-04-15 TextArena Leon Guertler et.al. 2504.11442 null
2025-04-15 TADACap: Time-series Adaptive Domain-Aware Captioning Elizabeth Fons et.al. 2504.11441 null
2025-04-15 Masculine Defaults via Gendered Discourse in Podcasts and Large Language Models Maria Teleki et.al. 2504.11431 null
2025-04-15 A Dual-Space Framework for General Knowledge Distillation of Large Language Models Xue Zhang et.al. 2504.11426 null
2025-04-15 Reinforcing Compositional Retrieval: Retrieving Step-by-Step for Composing Informative Contexts Quanyu Long et.al. 2504.11420 null
2025-04-15 DataDecide: How to Predict Best Pretraining Data with Small Experiments Ian Magnusson et.al. 2504.11393 null
2025-04-15 RankAlign: A Ranking View of the Generator-Validator Gap in Large Language Models Juan Diego Rodriguez et.al. 2504.11381 null
2025-04-15 Cancer-Myth: Evaluating AI Chatbot on Patient Questions with False Presuppositions Wang Bill Zhu et.al. 2504.11373 null
2025-04-15 OpenTuringBench: An Open-Model-based Benchmark and Framework for Machine-Generated Text Detection and Attribution Lucio La Cava et.al. 2504.11369 null
2025-04-15 Teaching Large Language Models to Reason through Learning and Forgetting Tianwei Ni et.al. 2504.11364 null
2025-04-14 InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Jinguo Zhu et.al. 2504.10479 null
2025-04-14 MIEB: Massive Image Embedding Benchmark Chenghao Xiao et.al. 2504.10471 null
2025-04-14 Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding Tao Zhang et.al. 2504.10465 null
2025-04-14 The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer Weixian Lei et.al. 2504.10462 null
2025-04-14 GUI-R1: A Generalist R1-Style Vision-Language Action Model For GUI Agents Xiaobo Xia et.al. 2504.10458 null
2025-04-14 M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models Junxiong Wang et.al. 2504.10449 null
2025-04-14 Multimodal Long Video Modeling Based on Temporal Dynamic Context Haoran Hao et.al. 2504.10443 null
2025-04-14 LLM Can be a Dangerous Persuader: Empirical Study of Persuasion Safety in Large Language Models Minqian Liu et.al. 2504.10430 null
2025-04-14 Can We Edit LLMs for Long-Tail Biomedical Knowledge? Xinhao Yi et.al. 2504.10421 null
2025-04-14 Unchecked and Overlooked: Addressing the Checkbox Blind Spot in Large Language Models with CheckboxQA Michał Turski et.al. 2504.10419 null
2025-04-11 Quantum Large Language Model Fine-Tuning Sang Hyub Kim et.al. 2504.08732 null
2025-04-11 DocAgent: A Multi-Agent System for Automated Code Documentation Generation Dayu Yang et.al. 2504.08725 null
2025-04-11 Hypergraph Vision Transformers: Images are More than Nodes, More than Edges Joshua Fixelle et.al. 2504.08710 null
2025-04-11 SWE-PolyBench: A multi-language benchmark for repository level evaluation of coding agents Muhammad Shihab Rashid et.al. 2504.08703 null
2025-04-11 Large Language Models as Span Annotators Zdeněk Kasner et.al. 2504.08697 null
2025-04-11 TP-RAG: Benchmarking Retrieval-Augmented Large Language Model Agents for Spatiotemporal-Aware Travel Planning Hang Ni et.al. 2504.08694 null
2025-04-11 Fast-Slow-Thinking: Complex Task Solving with Large Language Models Yiliu Sun et.al. 2504.08690 null
2025-04-11 Voice Interaction With Conversational AI Could Facilitate Thoughtful Reflection and Substantive Revision in Writing Jiho Kim et.al. 2504.08687 null
2025-04-11 Variability-Driven User-Story Generation using LLM and Triadic Concept Analysis Alexandre Bazin et.al. 2504.08666 null
2025-04-11 Quality evaluation of Tabby coding assistant using real source code snippets Marta Borek et.al. 2504.08650 null
2025-04-10 C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing Zhongyang Li et.al. 2504.07964 link
2025-04-10 GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation Lang Lin et.al. 2504.07962 null
2025-04-10 MM-IFEngine: Towards Multimodal Instruction Following Shengyuan Ding et.al. 2504.07957 link
2025-04-10 VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning Yukun Qi et.al. 2504.07956 null
2025-04-10 Beyond the Frame: Generating 360° Panoramic Videos from Perspective Videos Rundong Luo et.al. 2504.07940 null
2025-04-10 Porting an LLM based Application from ChatGPT to an On-Premise Environment Teemu Paloniemi et.al. 2504.07907 null
2025-04-10 Redefining Machine Translation on Social Network Services with Large Language Models Hongcheng Guo et.al. 2504.07901 null
2025-04-10 How do Large Language Models Understand Relevance? A Mechanistic Interpretability Perspective Qi Liu et.al. 2504.07898 null
2025-04-10 Benchmarking Adversarial Robustness to Bias Elicitation in Large Language Models: Scalable Automated Assessment with LLM-as-a-Judge Riccardo Cantini et.al. 2504.07887 link
2025-04-10 Token Level Routing Inference System for Edge Devices Jianshu She et.al. 2504.07878 null
2025-04-09 Sculpting Subspaces: Constrained Full Fine-Tuning in LLMs for Continual Learning Nikhil Shivakumar Nayak et.al. 2504.07097 null
2025-04-09 KG-LLM-Bench: A Scalable Benchmark for Evaluating LLM Reasoning on Textualized Knowledge Graphs Elan Markowitz et.al. 2504.07087 null
2025-04-09 DeduCE: Deductive Consistency as a Framework to Evaluate LLM Reasoning Atharva Pandey et.al. 2504.07080 null
2025-04-09 A Survey on Personalized and Pluralistic Preference Alignment in Large Language Models Zhouhang Xie et.al. 2504.07070 null
2025-04-09 HalluciNot: Hallucination Detection Through Context and Common Knowledge Verification Bibek Paudel et.al. 2504.07069 null
2025-04-09 TASTE: Text-Aligned Speech Tokenization and Embedding for Spoken Language Modeling Liang-Hsuan Tseng et.al. 2504.07053 null
2025-04-09 To Backtrack or Not to Backtrack: When Sequential Search Limits Model Reasoning Tian Qin et.al. 2504.07052 null
2025-04-09 Evaluating Retrieval Augmented Generative Models for Document Queries in Transportation Safety Chad Melton et.al. 2504.07022 null
2025-04-09 LLM-IFT: LLM-Powered Information Flow Tracking for Secure Hardware Nowfel Mashnoor et.al. 2504.07015 null
2025-04-09 Towards LLMs Robustness to Changes in Prompt Format Styles Lilian Ngweta et.al. 2504.06969 null
2025-04-08 GOLLuM: Gaussian Process Optimized LLMs – Reframing LLM Finetuning through Bayesian Optimization Bojana Ranković et.al. 2504.06265 null
2025-04-08 Hogwild! Inference: Parallel LLM Generation via Concurrent Attention Gleb Rodionov et.al. 2504.06261 null
2025-04-08 FEABench: Evaluating Language Models on Multiphysics Reasoning Ability Nayantara Mudur et.al. 2504.06260 null
2025-04-08 Transfer between Modalities with MetaQueries Xichen Pan et.al. 2504.06256 null
2025-04-08 LExT: Towards Evaluating Trustworthiness of Natural Language Explanations Krithi Shailya et.al. 2504.06227 null
2025-04-08 Encoder-Decoder Gemma: Improving the Quality-Efficiency Trade-Off via Adaptation Biao Zhang et.al. 2504.06225 null
2025-04-08 Can Performant LLMs Be Ethical? Quantifying the Impact of Web Crawling Opt-Outs Dongyang Fan et.al. 2504.06219 null
2025-04-08 From 128K to 4M: Efficient Training of Ultra-Long Context Large Language Models Chejian Xu et.al. 2504.06214 null
2025-04-08 TxGemma: Efficient and Agentic LLMs for Therapeutics Eric Wang et.al. 2504.06196 null
2025-04-08 Assessing how hyperparameters impact Large Language Models’ sarcasm detection performance Montgomery Gole et.al. 2504.06166 null
2025-04-07 URECA: Unique Region Caption Anything Sangbeom Lim et.al. 2504.05305 null
2025-04-07 Truthful or Fabricated? Using Causal Attribution to Mitigate Reward Hacking in Explanations Pedro Ferreira et.al. 2504.05294 null
2025-04-07 The challenge of uncertainty quantification of large language models in medicine Zahra Atf et.al. 2504.05278 null
2025-04-07 Enhancing LLM-Based Short Answer Grading with Retrieval-Augmented Generation Yucheng Chu et.al. 2504.05276 null
2025-04-07 Do PhD-level LLMs Truly Grasp Elementary Addition? Probing Rule Learning vs. Memorization in Large Language Models Yang Yan et.al. 2504.05262 null
2025-04-07 Learning to Reason Over Time: Timeline Self-Reflection for Improved Temporal Reasoning in Language Models Adrián Bazaga et.al. 2504.05258 null
2025-04-07 Explaining Low Perception Model Competency with High-Competency Counterfactuals Sara Pohland et.al. 2504.05254 null
2025-04-07 LLM-based Automated Grading with Human-in-the-Loop Hang Li et.al. 2504.05239 null
2025-04-08 Leveraging LLMs for Utility-Focused Annotation: Reducing Manual Effort for Retrieval and RAG Hengran Zhang et.al. 2504.05220 null
2025-04-07 Unleashing the Power of LLMs in Dense Retrieval with Query Likelihood Modeling Hengran Zhang et.al. 2504.05216 null
2025-04-04 Do Larger Language Models Imply Better Reasoning? A Pretraining Scaling Law for Reasoning Xinyi Wang et.al. 2504.03635 null
2025-04-04 Align to Structure: Aligning Large Language Models with Structural Information Zae Myung Kim et.al. 2504.03622 null
2025-04-04 VISTA-OCR: Towards generative and interactive end to end OCR models Laziz Hamdi et.al. 2504.03621 null
2025-04-04 Multilingual Retrieval-Augmented Generation for Knowledge-Intensive Task Leonardo Ranaldi et.al. 2504.03616 null
2025-04-04 AIR: A Systematic Analysis of Annotations, Instructions, and Response Pairs in Preference Dataset Bingxiang He et.al. 2504.03612 null
2025-04-04 EnrichIndex: Using LLMs to Enrich Retrieval Indices Offline Peter Baile Chen et.al. 2504.03598 null
2025-04-04 Agentic Knowledgeable Self-awareness Shuofei Qiao et.al. 2504.03553 null
2025-04-04 Neutralizing the Narrative: AI-Powered Debiasing of Online News Articles Chen Wei Kuo et.al. 2504.03520 null
2025-04-04 LLMSched: Uncertainty-Aware Workload Scheduling for Compound LLM Applications Botao Zhu et.al. 2504.03444 null
2025-04-04 Know What You do Not Know: Verbalized Uncertainty Estimation Robustness on Corrupted Images in Vision-Language Models Mirko Borszukovszki et.al. 2504.03440 null
2025-04-03 STING-BEE: Towards Vision-Language Model for Real-World X-ray Baggage Security Inspection Divya Velayudhan et.al. 2504.02823 null
2025-04-03 Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models Mateusz Pach et.al. 2504.02821 link
2025-04-03 Generative Evaluation of Complex Reasoning in Large Language Models Haowei Lin et.al. 2504.02810 link
2025-04-03 MegaMath: Pushing the Limits of Open Math Corpora Fan Zhou et.al. 2504.02807 link
2025-04-04 A Survey of Large Language Models in Mental Health Disorder Detection on Social Media Zhuohan Ge et.al. 2504.02800 null
2025-04-03 A Framework for Robust Cognitive Evaluation of LLMs Karin de Langis et.al. 2504.02789 null
2025-04-03 From Consumption to Collaboration: Measuring Interaction Patterns to Augment Human Cognition in Open-Ended Tasks Joshua Holstein et.al. 2504.02780 null
2025-04-03 BT-ACTION: A Test-Driven Approach for Modular Understanding of User Instruction Leveraging Behaviour Trees and LLMs Alexander Leszczynski et.al. 2504.02779 null
2025-04-03 How Deep Do Large Language Models Internalize Scientific Literature and Citation Practices? Andres Algaba et.al. 2504.02767 null
2025-04-03 Enhancing LLM Robustness to Perturbed Instructions: An Empirical Study Aryan Agrawal et.al. 2504.02733 null
2025-04-02 Towards Unified Referring Expression Segmentation Across Omni-Level Visual Target Granularities Jing Liu et.al. 2504.01954 null
2025-04-02 The LLM Wears Prada: Analysing Gender Bias and Stereotypes through Online Shopping Data Massimiliano Luca et.al. 2504.01951 null
2025-04-02 OpenCodeReasoning: Advancing Data Distillation for Competitive Coding Wasi Uddin Ahmad et.al. 2504.01943 null
2025-04-02 Critical Thinking: Which Kinds of Complexity Govern Optimal Reasoning Length? Celine Lee et.al. 2504.01935 null
2025-04-02 A thorough benchmark of automatic text classification: From traditional approaches to large language models Washington Cunha et.al. 2504.01930 null
2025-04-02 Gen-C: Populating Virtual Worlds with Generative Crowds Andreas Panayiotou et.al. 2504.01924 null
2025-04-02 Bridging the Linguistic Divide: A Survey on Leveraging Large Language Models for Machine Translation Baban Gain et.al. 2504.01919 null
2025-04-02 Advancing AI-Scientist Understanding: Making LLM Think Like a Physicist with Interpretable Reasoning Yinggan Xu et.al. 2504.01911 null
2025-04-02 GMAI-VL-R1: Harnessing Reinforcement Learning for Multimodal Medical Reasoning Yanzhou Su et.al. 2504.01886 link
2025-04-02 TransientTables: Evaluating LLMs’ Reasoning on Temporally Evolving Semi-structured Tables Abhilash Shankarampeta et.al. 2504.01879 null
2025-03-31 Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation Shengqiong Wu et.al. 2503.24379 link
2025-03-31 Harnessing the Reasoning Economy: A Survey of Efficient Reasoning for Large Language Models Rui Wang et.al. 2503.24377 link
2025-03-31 Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1 Yi Chen et.al. 2503.24376 link
2025-03-31 Effectively Controlling Reasoning Models through Thinking Intervention Tong Wu et.al. 2503.24370 null
2025-03-31 ORAL: Prompting Your Large-Scale LoRAs via Conditional Recurrent Diffusion Rana Muhammad Shahroz Khan et.al. 2503.24354 null
2025-03-31 BEATS: Bias Evaluation and Assessment Test Suite for Large Language Models Alok Abhishek et.al. 2503.24310 null
2025-03-31 A Systematic Evaluation of LLM Strategies for Mental Health Text Analysis: Fine-tuning vs. Prompt Engineering vs. RAG Arshia Kermani et.al. 2503.24307 null
2025-03-31 Rec-R1: Bridging Generative Large Language Models and User-Centric Recommendation Systems via Reinforcement Learning Jiacheng Lin et.al. 2503.24289 link
2025-03-31 Evaluating and Designing Sparse Autoencoders by Approximating Quasi-Orthogonality Sewoong Lee et.al. 2503.24277 link
2025-03-31 Enhancing Large Language Models (LLMs) for Telecommunications using Knowledge Graphs and Retrieval-Augmented Generation Dun Yuan et.al. 2503.24245 null
2025-03-28 Q-Insight: Understanding Image Quality via Visual Reinforcement Learning Weiqi Li et.al. 2503.22679 link
2025-03-28 QuestBench: Can LLMs ask the right question to acquire information in reasoning tasks? Belinda Z. Li et.al. 2503.22674 link
2025-03-28 Exploring the Effectiveness of Multi-stage Fine-tuning for Cross-encoder Re-rankers Francesca Pezzuti et.al. 2503.22672 link
2025-03-28 Unicorn: Text-Only Data Synthesis for Vision Language Model Training Xiaomin Yu et.al. 2503.22655 link
2025-03-28 Sentiment Classification of Thai Central Bank Press Releases Using Supervised Learning Stefano Grassi et.al. 2503.22629 null
2025-03-28 Evaluating Multimodal Language Models as Visual Assistants for Visually Impaired Users Antonia Karamolegkou et.al. 2503.22610 null
2025-03-28 On the Alignment of Post-Publication Reviews & Bibliometric and Altmetric Impact – A Case Study on Expert Statements from the Science Media Center Germany Dirk Tunger et.al. 2503.22594 null
2025-03-28 LLM-enabled Instance Model Generation Fengjunjie Pan et.al. 2503.22587 null
2025-03-28 Historical Ink: Exploring Large Language Models for Irony Detection in 19th-Century Spanish Kevin Cohen et.al. 2503.22585 link
2025-03-28 Beyond Vanilla Fine-Tuning: Leveraging Multistage, Multilingual, and Domain-Specific Methods for Low-Resource Machine Translation Sarubi Thillainathan et.al. 2503.22582 null
2025-03-27 Video-R1: Reinforcing Video Reasoning in MLLMs Kaituo Feng et.al. 2503.21776 link
2025-03-27 LOCORE: Image Re-ranking with Long-Context Sequence Modeling Zilin Xiao et.al. 2503.21772 link
2025-03-27 MemInsight: Autonomous Memory Augmentation for LLM Agents Rana Salama et.al. 2503.21760 null
2025-03-27 Fwd2Bot: LVLM Visual Token Compression with Double Forward Bottleneck Adrian Bulat et.al. 2503.21757 null
2025-03-27 LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis Shitian Zhao et.al. 2503.21749 link
2025-03-27 CTRL-O: Language-Controllable Object-Centric Visual Representation Learning Aniket Didolkar et.al. 2503.21747 null
2025-03-27 GateLens: A Reasoning-Enhanced LLM Agent for Automotive Software Release Analytics Arsham Gholamzadeh Khoee et.al. 2503.21735 null
2025-03-27 Effective Skill Unlearning through Intervention and Abstention Yongce Li et.al. 2503.21730 link
2025-03-27 Collab: Controlled Decoding using Mixture of Agents for LLM Alignment Souradip Chakraborty et.al. 2503.21720 null
2025-03-27 Enhancing Repository-Level Software Repair via Repository-Aware Knowledge Graphs Boyang Yang et.al. 2503.21710 null
2025-03-26 Mobile-MMLU: A Mobile Intelligence Language Understanding Benchmark Sondos Mahmoud Bsharat et.al. 2503.20786 link
2025-03-26 Feature4X: Bridging Any Monocular Video to 4D Agentic AI with Versatile Gaussian Feature Fields Shijie Zhou et.al. 2503.20776 null
2025-03-26 MATHGLANCE: Multimodal Large Language Models Do Not Know Where to Look in Mathematical Diagrams Yanpeng Sun et.al. 2503.20745 null
2025-03-26 Dynamic Motion Blending for Versatile Motion Editing Nan Jiang et.al. 2503.20724 null
2025-03-26 From Annotation to Adaptation: Metrics, Synthetic Data, and Aspect Extraction for Aspect-Based Sentiment Analysis with Large Language Models Nikita Neveditsin et.al. 2503.20715 null
2025-03-27 Mitigating Low-Level Visual Hallucinations Requires Self-Awareness: Database, Model and Training Strategy Yinan Sun et.al. 2503.20673 null
2025-03-26 TAMA: A Human-AI Collaborative Thematic Analysis Framework Using Multi-Agent LLMs for Clinical Interviews Huimin Xu et.al. 2503.20666 null
2025-03-26 Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging Han Wu et.al. 2503.20641 link
2025-03-26 Collaborative Storytelling and LLM: A Linguistic Analysis of Automatically-Generated Role-Playing Game Sessions Alessandro Maisto et.al. 2503.20623 null
2025-03-26 What to Retrieve for Effective Retrieval-Augmented Code Generation? An Empirical Study and Beyond Wenchao Gu et.al. 2503.20589 null
2025-03-25 CoLLM: A Large Language Model for Composed Image Retrieval Chuong Huynh et.al. 2503.19910 link
2025-03-25 A Multi-Agent Framework Integrating Large Language Models and Generative AI for Accelerated Metamaterial Design Jie Tian et.al. 2503.19889 null
2025-03-25 CausalRAG: Integrating Causal Graphs into Retrieval-Augmented Generation Nengbo Wang et.al. 2503.19878 null
2025-03-25 SLA-Awareness for AI-assisted coding Kishanthan Thangarajah et.al. 2503.19876 null
2025-03-25 Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time Thinking Xiaoyu Tian et.al. 2503.19855 link
2025-03-25 Towards Online Multi-Modal Social Interaction Understanding Xinpeng Li et.al. 2503.19851 null
2025-03-25 FALCONEye: Finding Answers and Localizing Content in ONE-hour-long videos with multi-modal LLMs Carlos Plou et.al. 2503.19850 null
2025-03-25 A Comparative Analysis of Word Segmentation, Part-of-Speech Tagging, and Named Entity Recognition for Historical Chinese Sources, 1900-1950 Zhao Fang et.al. 2503.19844 null
2025-03-25 SemEval-2025 Task 9: The Food Hazard Detection Challenge Korbinian Randl et.al. 2503.19800 null
2025-03-25 PAVE: Patching and Adapting Video Large Language Models Zhuoming Liu et.al. 2503.19794 link
2025-03-24 SlowFast-LLaVA-1.5: A Family of Token-Efficient Video Large Language Models for Long-Form Video Understanding Mingze Xu et.al. 2503.18943 null
2025-03-24 Video-T1: Test-Time Scaling for Video Generation Fangfu Liu et.al. 2503.18942 link
2025-03-24 Exploring Training and Inference Scaling Laws in Generative Retrieval Hongru Cai et.al. 2503.18941 null
2025-03-24 Trajectory Balance with Asynchrony: Decoupling Exploration and Learning for Fast, Scalable LLM Post-Training Brian R. Bartoldson et.al. 2503.18929 link
2025-03-24 FFN Fusion: Rethinking Sequential Computation in Large Language Models Akhiad Bercovich et.al. 2503.18908 null
2025-03-24 xKV: Cross-Layer SVD for KV-Cache Compression Chi-Chih Chang et.al. 2503.18893 link
2025-03-24 AgentDropout: Dynamic Agent Elimination for Token-Efficient and High-Performance LLM-Based Multi-Agent Collaboration Zhexuan Wang et.al. 2503.18891 null
2025-03-24 Toward building next-generation Geocoding systems: a systematic review Zhengcong Yin et.al. 2503.18888 null
2025-03-24 I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders Andrey Galichin et.al. 2503.18878 link
2025-03-24 Reimagining Memory Access for LLM Inference: Compression-Aware Memory Controller Design Rui Xie et.al. 2503.18869 null
2025-03-21 Dancing with Critiques: Enhancing LLM Reasoning with Stepwise Natural Language Self-Critique Yansi Li et.al. 2503.17363 null
2025-03-21 OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement Yihe Deng et.al. 2503.17352 link
2025-03-21 Capturing Individual Human Preferences with Reward Features André Barreto et.al. 2503.17338 null
2025-03-21 Efficient Intent-Based Filtering for Multi-Party Conversations Using Knowledge Distillation from LLMs Reem Gody et.al. 2503.17336 null
2025-03-21 CVE-Bench: A Benchmark for AI Agents’ Ability to Exploit Real-World Web Application Vulnerabilities Yuxuan Zhu et.al. 2503.17332 link
2025-03-21 LLM+MAP: Bimanual Robot Task Planning using Large Language Models and Planning Domain Definition Language Kun Chu et.al. 2503.17309 null
2025-03-21 Bugdar: AI-Augmented Secure Code Review for GitHub Pull Requests John Naulty et.al. 2503.17302 null
2025-03-21 CASE – Condition-Aware Sentence Embeddings for Conditional Semantic Textual Similarity Measurement Gaifan Zhang et.al. 2503.17279 null
2025-03-21 SafeMERGE: Preserving Safety Alignment in Fine-Tuned Large Language Models via Selective Layer-Wise Model Merging Aladin Djuhera et.al. 2503.17239 null
2025-03-21 FactSelfCheck: Fact-Level Black-Box Hallucination Detection for LLMs Albert Sawczyn et.al. 2503.17229 null
2025-03-20 Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models Yang Sui et.al. 2503.16419 link
2025-03-20 The Emperor’s New Clothes in Benchmarking? A Rigorous Examination of Mitigation Strategies for LLM Benchmark Data Contamination Yifan Sun et.al. 2503.16402 null
2025-03-20 Exploring the Hidden Reasoning Process of Large Language Models by Misleading Them Guanyu Chen et.al. 2503.16401 null
2025-03-20 Deconstructing Long Chain-of-Thought: A Structured Reasoning Optimization Framework for Long CoT Distillation Yijia Luo et.al. 2503.16385 link
2025-03-20 LaPIG: Cross-Modal Generation of Paired Thermal and Visible Facial Images Leyang Wang et.al. 2503.16376 null
2025-03-20 CaKE: Circuit-aware Editing Enables Generalizable Knowledge Learners Yunzhi Yao et.al. 2503.16356 link
2025-03-20 LLM Braces: Straightening Out LLM Predictions with Relevant Sub-Updates Ying Shen et.al. 2503.16334 null
2025-03-20 OmniGeo: Towards a Multimodal Large Language Models for Geospatial Artificial Intelligence Long Yuan et.al. 2503.16326 null
2025-03-20 Bridging Technology and Humanities: Evaluating the Impact of Large Language Models on Social Sciences Research with DeepSeek-R1 Peiran Gu et.al. 2503.16304 null
2025-03-20 Uni-3DAR: Unified 3D Generation and Understanding via Autoregression on Compressed Spatial Tokens Shuqi Lu et.al. 2503.16278 link
2025-03-19 SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks Yifei Zhou et.al. 2503.15478 link
2025-03-19 Cube: A Roblox View of 3D Intelligence Foundation AI Team et.al. 2503.15475 link
2025-03-19 From 1,000,000 Users to Every User: Scaling Up Personalized Preference for User-level Alignment Jia-Nan Li et.al. 2503.15463 null
2025-03-19 Visual Position Prompt for MLLM based Visual Grounding Wei Tang et.al. 2503.15426 link
2025-03-19 Probing the topology of the space of tokens with structured prompts Michael Robinson et.al. 2503.15421 null
2025-03-19 EfficientLLaVA: Generalizable Auto-Pruning for Large Vision-language Models Yinan Liang et.al. 2503.15369 null
2025-03-19 SemEval-2025 Task 1: AdMIRe – Advancing Multimodal Idiomaticity Representation Thomas Pickard et.al. 2503.15358 null
2025-03-19 SPILL: Domain-Adaptive Intent Clustering based on Selection and Pooling with Large Language Models I-Fan Lin et.al. 2503.15351 null
2025-03-19 TruthLens: A Training-Free Paradigm for DeepFake Detection Ritabrata Chakraborty et.al. 2503.15342 null
2025-03-19 Uncertainty-Guided Chain-of-Thought for Code Generation with LLMs Yuqi Zhu et.al. 2503.15341 null
2025-03-18 Aligning Multimodal LLM with Human Preference: A Survey Tao Yu et.al. 2503.14504 null
2025-03-18 Engineering Scientific Assistants using Interactive Structured Induction of Programs Shraddha Surana et.al. 2503.14488 null
2025-03-18 Gricean Norms as a Basis for Effective Collaboration Fardin Saad et.al. 2503.14484 null
2025-03-18 Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM Xinyu Fang et.al. 2503.14478 link
2025-03-18 EnvBench: A Benchmark for Automated Environment Setup Aleksandra Eliseeva et.al. 2503.14443 link
2025-03-18 LLM-FE: Automated Feature Engineering for Tabular Data with LLMs as Evolutionary Optimizers Nikhil Abhyankar et.al. 2503.14434 link
2025-03-18 PLAY2PROMPT: Zero-shot Tool Instruction Optimization for LLM Agents via Tool Play Wei Fang et.al. 2503.14432 null
2025-03-18 Unifying Text Semantics and Graph Structures for Temporal Text-attributed Graphs with Large Language Models Siwei Zhang et.al. 2503.14411 null
2025-03-18 Large Language Models for Virtual Human Gesture Selection Parisa Ghanad Torshizi et.al. 2503.14408 null
2025-03-18 From “Hallucination” to “Suture”: Insights from Language Philosophy to Enhance Large Language Models Qiantong Wang et.al. 2503.14392 null
2025-03-17 MetaScale: Test-Time Scaling with Evolving Meta-Thoughts Qin Liu et.al. 2503.13447 null
2025-03-17 Faithfulness of LLM Self-Explanations for Commonsense Tasks: Larger Is Better, and Instruction-Tuning Allows Trade-Offs but Not Pareto Dominance Noah Y. Siegel et.al. 2503.13445 null
2025-03-17 VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning Ye Liu et.al. 2503.13444 null
2025-03-17 xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference Maximilian Beck et.al. 2503.13427 null
2025-03-17 A Comprehensive Survey on Multi-Agent Cooperative Decision-Making: Scenarios, Approaches, Challenges and Perspectives Weiqiang Jin et.al. 2503.13415 null
2025-03-17 DLPO: Towards a Robust, Efficient, and Generalizable Prompt Optimization Framework from a Deep-Learning Perspective Dengyun Peng et.al. 2503.13413 null
2025-03-17 Using the Tools of Cognitive Science to Understand Large Language Models at Different Levels of Analysis Alexander Ku et.al. 2503.13401 null
2025-03-17 MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research James Burgess et.al. 2503.13399 null
2025-03-17 Scale Efficient Training for Large Datasets Qing Zhou et.al. 2503.13385 null
2025-03-17 Cream of the Crop: Harvesting Rich, Scalable and Transferable Multi-Modal Data for Instruction Fine-Tuning Mengyao Lyu et.al. 2503.13383 null
2025-03-14 ASMA-Tune: Unlocking LLMs’ Assembly Code Comprehension via Structural-Semantic Instruction Tuning Xinyi Wang et.al. 2503.11617 null
2025-03-14 Broaden your SCOPE! Efficient Multi-turn Conversation Planning for LLMs using Semantic Space Zhiliang Chen et.al. 2503.11586 null
2025-03-14 Synthesizing Access Control Policies using Large Language Models Adarsh Vatsa et.al. 2503.11573 null
2025-03-14 Implicit Bias-Like Patterns in Reasoning Models Messi H. J. Lee et.al. 2503.11572 null
2025-03-14 VERIFY: A Benchmark of Visual Explanation and Reasoning for Investigating Multimodal Reasoning Fidelity Jing Bi et.al. 2503.11557 null
2025-03-14 Potential of large language model-powered nudges for promoting daily water and energy conservation Zonghan Li et.al. 2503.11531 null
2025-03-14 HiTVideo: Hierarchical Tokenizers for Enhancing Text-to-Video Generation with Autoregressive Large Language Models Ziqin Zhou et.al. 2503.11513 null
2025-03-14 V-STaR: Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning Zixu Cheng et.al. 2503.11495 null
2025-03-14 A Review of DeepSeek Models’ Key Innovative Techniques Chengen Wang et.al. 2503.11486 null
2025-03-14 T2I-FineEval: Fine-Grained Compositional Metric for Text-to-Image Evaluation Seyed Mohammad Hadi Hosseini et.al. 2503.11481 null
2025-03-13 GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing Rongyao Fang et.al. 2503.10639 link
2025-03-13 HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model Jiaming Liu et.al. 2503.10631 null
2025-03-13 UniGoal: Towards Universal Zero-shot Goal-oriented Navigation Hang Yin et.al. 2503.10630 null
2025-03-13 DriveLMM-o1: A Step-by-Step Reasoning Dataset and Large Multimodal Model for Driving Scenario Understanding Ayesha Ishaq et.al. 2503.10621 link
2025-03-13 From TOWER to SPIRE: Adding the Speech Modality to a Text-Only LLM Kshitij Ambilduke et.al. 2503.10620 null
2025-03-13 Siege: Autonomous Multi-Turn Jailbreaking of Large Language Models with Tree Search Andy Zhou et.al. 2503.10619 null
2025-03-13 Compositional Subspace Representation Fine-tuning for Adaptive Large Language Models Andy Zhou et.al. 2503.10617 null
2025-03-13 R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization Yi Yang et.al. 2503.10615 link
2025-03-13 CoSTA$\ast$: Cost-Sensitive Toolpath Agent for Multi-turn Image Editing Advait Gupta et.al. 2503.10613 link
2025-03-13 TruthPrInt: Mitigating LVLM Object Hallucination Via Latent Truthful-Guided Pre-Intervention Jinhao Duan et.al. 2503.10602 link
2025-03-12 MoC: Mixtures of Text Chunking Learners for Retrieval-Augmented Generation System Jihao Zhao et.al. 2503.09600 null
2025-03-12 How to Protect Yourself from 5G Radiation? Investigating LLM Responses to Implicit Misinformation Ruohao Guo et.al. 2503.09598 null
2025-03-12 SimLingo: Vision-Only Closed-Loop Autonomous Driving with Language-Action Alignment Katrin Renz et.al. 2503.09594 null
2025-03-12 BIMBA: Selective-Scan Compression for Long-Range Video Question Answering Md Mohaiminul Islam et.al. 2503.09590 null
2025-03-12 Cost-Optimal Grouped-Query Attention for Long-Context LLMs Yingfa Chen et.al. 2503.09579 link
2025-03-12 Plan-and-Act: Improving Planning of Agents for Long-Horizon Tasks Lutfi Eren Erdogan et.al. 2503.09572 null
2025-03-12 Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models Qiguang Chen et.al. 2503.09567 null
2025-03-12 Large Language Models for Multi-Facility Location Mechanism Design Nguyen Thach et.al. 2503.09533 null
2025-03-12 Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning Bowen Jin et.al. 2503.09516 null
2025-03-12 ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning Ziyu Wan et.al. 2503.09501 null
2025-03-11 Randomness, Not Representation: The Unreliability of Evaluating Cultural Alignment in LLMs Ariba Khan et.al. 2503.08688 null
2025-03-11 OmniMamba: Efficient and Unified Multimodal Understanding and Generation via State Space Models Jialv Zou et.al. 2503.08686 null
2025-03-11 Self-Taught Self-Correction for Small Language Models Viktor Moskvoretskii et.al. 2503.08681 null
2025-03-11 Exploring the Word Sense Disambiguation Capabilities of Large Language Models Pierpaolo Basile et.al. 2503.08662 null
2025-03-11 LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization Xianfeng Wu et.al. 2503.08619 null
2025-03-11 EMMOE: A Comprehensive Benchmark for Embodied Mobile Manipulation in Open Environments Dongping Li et.al. 2503.08604 null
2025-03-11 NSF-SciFy: Mining the NSF Awards Database for Scientific Claims Delip Rao et.al. 2503.08600 null
2025-03-11 HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding Shehreen Azad et.al. 2503.08585 null
2025-03-11 RAG-Adapter: A Plug-and-Play RAG-enhanced Framework for Long Video Understanding Xichen Tan et.al. 2503.08576 null
2025-03-11 DeepReview: Improving LLM-based Paper Review with Human-like Deep Thinking Process Minjun Zhu et.al. 2503.08569 null
2025-03-10 Robusto-1 Dataset: Comparing Humans and VLMs on real out-of-distribution Autonomous Driving VQA from Peru Dunant Cusipuma et.al. 2503.07587 null
2025-03-10 Talking to GDELT Through Knowledge Graphs Audun Myers et.al. 2503.07584 null
2025-03-10 AutoSpatial: Visual-Language Reasoning for Social Robot Navigation through Efficient Spatial Reasoning Learning Yangzhe Kong et.al. 2503.07557 null
2025-03-10 Junior Software Developers’ Perspectives on Adopting LLMs for Software Engineering: a Systematic Literature Review Samuel Ferino et.al. 2503.07556 null
2025-03-10 KSOD: Knowledge Supplement for LLMs On Demand Haoran Li et.al. 2503.07550 null
2025-03-10 Bi-Directional Mental Model Reconciliation for Human-Robot Interaction with Large Language Models Nina Moorman et.al. 2503.07547 null
2025-03-10 Queueing, Predictions, and LLMs: Challenges and Open Problems Michael Mitzenmacher et.al. 2503.07545 null
2025-03-10 XIFBench: Evaluating Large Language Models on Multilingual Instruction Following Zhenyu Li et.al. 2503.07539 null
2025-03-10 TokenButler: Token Importance is Predictable Yash Akhauri et.al. 2503.07518 null
2025-03-10 Language Models Fail to Introspect About Their Knowledge of Language Siyuan Song et.al. 2503.07513 null
2025-03-10 LLaVA-RadZ: Can Multimodal Large Language Models Effectively Tackle Zero-shot Radiology Recognition? Bangyan Li et.al. 2503.07487 null
2025-03-10 GenAIReading: Augmenting Human Cognition with Interactive Digital Textbooks Using Large Language Models and Image Generation Models Ryugo Morita et.al. 2503.07463 null
2025-03-10 MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning Xiangru Tang et.al. 2503.07459 null
2025-03-10 LLMs syntactically adapt their language use to their conversational partner Florian Kandra et.al. 2503.07457 null
2025-03-10 From Idea to Implementation: Evaluating the Influence of Large Language Models in Software Development – An Opinion Paper Sargam Yadav et.al. 2503.07450 null
2025-03-10 From Text to Visuals: Using LLMs to Generate Math Diagrams with Vector Graphics Jaewook Lee et.al. 2503.07429 null
2025-03-10 RePO: ReLU-based Preference Optimization Junkang Wu et.al. 2503.07426 null
2025-03-10 REF-VLM: Triplet-Based Referring Paradigm for Unified Visual Decoding Yan Tai et.al. 2503.07413 link
2025-03-10 Revisiting Noise in Natural Language Processing for Computational Social Science Nadav Borenstein et.al. 2503.07395 null
2025-03-10 Process-Supervised LLM Recommenders via Flow-guided Tuning Chongming Gao et.al. 2503.07377 null
2025-03-07 Understanding the Limits of Lifelong Knowledge Editing in LLMs Lukas Thede et.al. 2503.05683 null
2025-03-07 A Survey of Large Language Model Empowered Agents for Recommendation and Search: Towards Next-Generation Information Retrieval Yu Zhang et.al. 2503.05659 null
2025-03-07 Learning LLM Preference over Intra-Dialogue Pairs: A Framework for Utterance-level Understandings Xuanqing Liu et.al. 2503.05620 null
2025-03-07 A Survey on Sparse Autoencoders: Interpreting the Internal Mechanisms of Large Language Models Dong Shu et.al. 2503.05613 null
2025-03-07 R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning Huatong Song et.al. 2503.05592 null
2025-03-07 Evaluating open-source Large Language Models for automated fact-checking Nicolo’ Fontana et.al. 2503.05565 null
2025-03-07 Revitalizing Saturated Benchmarks: A Weighted Metric Approach for Differentiating Large Language Model Performance Bryan Etzine et.al. 2503.05551 null
2025-03-07 Leveraging Approximate Caching for Faster Retrieval-Augmented Generation Shai Bergman et.al. 2503.05530 null
2025-03-07 PoSSUM: A Protocol for Surveying Social-media Users with Multimodal LLMs Roberto Cerina et.al. 2503.05529 null
2025-03-07 Cognitive Bias Detection Using Advanced Prompt Engineering Frederic Lemieux et.al. 2503.05516 null
2025-03-06 L$^2$M: Mutual Information Scaling Law for Long-Context Language Modeling Zhuo Chen et.al. 2503.04725 null
2025-03-06 Shifting Long-Context LLMs Research from Input to Output Yuhao Wu et.al. 2503.04723 null
2025-03-06 Enough Coin Flips Can Make LLMs Act Bayesian Ritwik Gupta et.al. 2503.04722 null
2025-03-06 Predictable Scale: Part I – Optimal Hyperparameter Scaling Law in Large Language Model Pretraining Houyi Li et.al. 2503.04715 null
2025-03-06 Universality of Layer-Level Entropy-Weighted Quantization Beyond Model Architecture and Size Alireza Behtash et.al. 2503.04704 null
2025-03-06 UIPE: Enhancing LLM Unlearning by Removing Knowledge Related to Forgetting Targets Wenyu Wang et.al. 2503.04693 null
2025-03-06 Quantifying the Reasoning Abilities of LLMs on Real-world Clinical Cases Pengcheng Qiu et.al. 2503.04691 null
2025-03-06 LLM-guided Plan and Retrieval: A Strategic Alignment for Interpretable User Satisfaction Estimation in Dialogue Sangyeop Kim et.al. 2503.04675 null
2025-03-06 RadIR: A Scalable Framework for Multi-Grained Medical Image Retrieval via Radiology Report Mining Tengfei Zhang et.al. 2503.04653 null
2025-03-06 Implicit Cross-Lingual Rewarding for Efficient Multilingual Preference Alignment Wen Yang et.al. 2503.04647 null
2025-03-05 The MASK Benchmark: Disentangling Honesty From Accuracy in AI Systems Richard Ren et.al. 2503.03750 null
2025-03-05 Process-based Self-Rewarding Language Models Shimao Zhang et.al. 2503.03746 null
2025-03-05 Towards Understanding Distilled Reasoning Models: A Representational Approach David D. Baek et.al. 2503.03730 null
2025-03-05 Improving LLM Safety Alignment with Dual-Objective Optimization Xuandong Zhao et.al. 2503.03710 null
2025-03-05 Effective LLM Knowledge Learning via Model Generalization Mingkang Zhu et.al. 2503.03705 null
2025-03-05 A Practical Memory Injection Attack against LLM Agents Shen Dong et.al. 2503.03704 null
2025-03-05 Developing and Utilizing a Large-Scale Cantonese Dataset for Multi-Tasking in Large Language Models Jiyue Jiang et.al. 2503.03702 null
2025-03-05 Addressing Overprescribing Challenges: Fine-Tuning Large Language Models for Medication Recommendation Tasks Zihao Zhao et.al. 2503.03687 null
2025-03-05 Attentive Reasoning Queries: A Systematic Method for Optimizing Instruction-Following in Large Language Models Bar Karov et.al. 2503.03669 null
2025-03-05 Analogical Reasoning Inside Large Language Models: Concept Vectors and the Limits of Abstraction Gustaw Opiełka et.al. 2503.03666 null
2025-03-04 Wikipedia in the Era of LLMs: Evolution and Risks Siming Huang et.al. 2503.02879 null
2025-03-04 The First Few Tokens Are All You Need: An Efficient and Effective Unsupervised Prefix Fine-Tuning Method for Reasoning Models Ke Ji et.al. 2503.02875 null
2025-03-04 Prompting Generative AI with Interaction-Augmented Instructions Leixian Shen et.al. 2503.02874 null
2025-03-04 FairSense-AI: Responsible AI Meets Sustainability Shaina Raza et.al. 2503.02865 null
2025-03-04 Calibrating LLM Confidence with Semantic Steering: A Multi-Prompt Aggregation Framework Ziang Zhou et.al. 2503.02863 null
2025-03-04 Privacy and Accuracy-Aware AI/ML Model Deduplication Hong Guan et.al. 2503.02862 null
2025-03-04 Shakespearean Sparks: The Dance of Hallucination and Creativity in LLMs’ Decoding Layers Zicong He et.al. 2503.02851 null
2025-03-04 Mask-DPO: Generalizable Fine-grained Factuality Alignment of LLMs Yuzhe Gu et.al. 2503.02846 null
2025-03-04 AlignDistil: Token-Level Language Model Alignment as Adaptive Policy Distillation Songming Zhang et.al. 2503.02832 null
2025-03-04 Q-Filters: Leveraging QK Geometry for Efficient KV Cache Compression Nathan Godey et.al. 2503.02812 null
2025-02-28 LLM Post-Training: A Deep Dive into Reasoning Large Language Models Komal Kumar et.al. 2502.21321 null
2025-02-28 FANformer: Improving Large Language Models Through Effective Periodicity Modeling Yihong Dong et.al. 2502.21309 null
2025-02-28 Contextualizing biological perturbation experiments through language Menghua Wu et.al. 2502.21290 null
2025-02-28 Adaptive Keyframe Sampling for Long Video Understanding Xi Tang et.al. 2502.21271 null
2025-02-28 Token-level Ensembling of Models with Different Vocabularies Rachel Wicks et.al. 2502.21265 null
2025-02-28 RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete Yuheng Ji et.al. 2502.21257 null
2025-02-28 Semantic Volume: Quantifying and Detecting both External and Internal Uncertainty in LLMs Xiaomin Li et.al. 2502.21239 null
2025-02-28 Transforming Tuberculosis Care: Optimizing Large Language Models For Enhanced Clinician-Patient Communication Daniil Filienko et.al. 2502.21236 null
2025-02-28 ByteScale: Efficient Scaling of LLM Training with a 2048K Context Length on More Than 12,000 GPUs Hao Ge et.al. 2502.21231 null
2025-03-03 ECLeKTic: a Novel Challenge Set for Evaluation of Cross-Lingual Knowledge Transfer Omer Goldman et.al. 2502.21228 null
2025-02-27 R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts Zhongyang Li et.al. 2502.20395 null
2025-02-27 Why Are Web AI Agents More Vulnerable Than Standalone LLMs? A Security Analysis Jeffrey Yang Fan Chiang et.al. 2502.20383 null
2025-02-27 Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers Shalev Lifshitz et.al. 2502.20379 null
2025-02-27 PhantomWiki: On-Demand Datasets for Reasoning and Retrieval Evaluation Albert Gong et.al. 2502.20377 null
2025-02-27 Bridging Legal Knowledge and AI: Retrieval-Augmented Generation with Vector Stores, Knowledge Graphs, and Hierarchical Non-negative Matrix Factorization Ryan C. Barron et.al. 2502.20364 null
2025-02-27 Bridging the Creativity Understanding Gap: Small-Scale Human Alignment Enables Expert-Level Humor Ranking in LLMs Kuan Lok Zhou et.al. 2502.20356 null
2025-02-27 KEDRec-LM: A Knowledge-distilled Explainable Drug Recommendation Large Language Model Kai Zhang et.al. 2502.20350 null
2025-02-27 Sparse Auto-Encoder Interprets Linguistic Features in Large Language Models Yi Jing et.al. 2502.20344 null
2025-02-27 Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners Daniele Paliotta et.al. 2502.20339 null
2025-02-27 Expertise Is What We Want Alan Ashworth et.al. 2502.20335 null
2025-02-26 Norm Growth and Stability Challenges in Localized Sequential Knowledge Editing Akshat Gupta et.al. 2502.19416 null
2025-02-26 Code to Think, Think to Code: A Survey on Code-Enhanced Reasoning and Reasoning-Driven Code Intelligence in LLMs Dayu Yang et.al. 2502.19411 null
2025-02-26 Less or More: Towards Glanceable Explanations for LLM Recommendations Using Ultra-Small Devices Xinru Wang et.al. 2502.19410 null
2025-02-26 ImageChain: Advancing Sequential Image-to-Text Reasoning in Multimodal Large Language Models Danae Sánchez Villegas et.al. 2502.19409 null
2025-02-26 Learning Code-Edit Embedding to Model Student Debugging Behavior Hasnain Heickal et.al. 2502.19407 null
2025-02-26 General Reasoning Requires Learning to Reason from the Get-go Seungwook Han et.al. 2502.19402 null
2025-02-26 TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding Max Ku et.al. 2502.19400 null
2025-02-26 Residual Speech Embeddings for Tone Classification: Removing Linguistic Content to Enhance Paralinguistic Analysis Hamdan Al Ahbabi et.al. 2502.19387 null
2025-02-26 DataMan: Data Manager for Pre-training Large Language Models Ru Peng et.al. 2502.19363 null
2025-02-26 Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning? Yancheng He et.al. 2502.19361 null
2025-02-25 DRAMA: Diverse Augmentation from Large Language Models to Smaller Dense Retrievers Xueguang Ma et.al. 2502.18460 null
2025-02-25 LLM-Based Design Pattern Detection Christian Schindler et.al. 2502.18458 null
2025-02-25 FRIDA to the Rescue! Analyzing Synthetic Data Effectiveness in Object-Based Common Sense Reasoning for Disaster Response Mollie Shichman et.al. 2502.18452 null
2025-02-25 SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution Yuxiang Wei et.al. 2502.18449 null
2025-02-25 MAPoRL: Multi-Agent Post-Co-Training for Collaborative Large Language Models with Reinforcement Learning Chanwoo Park et.al. 2502.18439 null
2025-02-25 TextGames: Learning to Self-Play Text-Based Puzzle Games via Language Model Reasoning Frederikus Hudi et.al. 2502.18431 null
2025-02-25 OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference Xiangyu Zhao et.al. 2502.18411 null
2025-02-25 Monte Carlo Temperature: a robust sampling strategy for LLM’s uncertainty quantification methods Nicola Cecere et.al. 2502.18389 null
2025-02-25 How Far are LLMs from Real Search? A Comprehensive Study on Efficiency, Completeness, and Inherent Capabilities Minhua Lin et.al. 2502.18387 null
2025-02-25 MindMem: Multimodal for Predicting Advertisement Memorability Using LLMs and Deep Learning Sepehr Asgarian et.al. 2502.18371 null
2025-02-24 Introducing Visual Perception Token into Multimodal Large Language Model Runpeng Yu et.al. 2502.17425 link
2025-02-24 MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs Jiarui Zhang et.al. 2502.17422 link
2025-02-24 LongSpec: Long-Context Speculative Decoding with Efficient Drafting and Verification Penghui Yang et.al. 2502.17421 link
2025-02-24 The Geometry of Refusal in Large Language Models: Concept Cones and Representational Independence Tom Wollschläger et.al. 2502.17420 null
2025-02-24 From System 1 to System 2: A Survey of Reasoning Large Language Models Zhong-Zhi Li et.al. 2502.17419 link
2025-02-24 Reasoning with Latent Thoughts: On the Power of Looped Transformers Nikunj Saunshi et.al. 2502.17416 null
2025-02-24 COSMOS: A Hybrid Adaptive Optimizer for Memory-Efficient Training of LLMs Liming Liu et.al. 2502.17410 link
2025-02-24 Large Language Models are Powerful EHR Encoders Stefan Hegselmann et.al. 2502.17403 null
2025-02-24 DIS-CO: Discovering Copyrighted Content in VLMs Training Data André V. Duarte et.al. 2502.17358 link
2025-02-24 On Relation-Specific Neurons in Large Language Models Yihong Liu et.al. 2502.17355 link
2025-02-21 ELIP: Enhanced Visual-Language Foundation Models for Image Retrieval Guanqi Zhan et.al. 2502.15682 null
2025-02-21 Privacy Ripple Effects from Adding or Removing Personal Information in Language Model Training Jaydeep Borkar et.al. 2502.15680 null
2025-02-21 FLEKE: Federated Locate-then-Edit Knowledge Editing Zongkai Zhao et.al. 2502.15677 null
2025-02-21 AutoToM: Automated Bayesian Inverse Planning and Model Discovery for Open-ended Theory of Mind Zhining Zhang et.al. 2502.15676 null
2025-02-21 Almost AI, Almost Human: The Challenge of Detecting AI-Polished Writing Shoumik Saha et.al. 2502.15666 null
2025-02-21 Machine-generated text detection prevents language model collapse George Drayson et.al. 2502.15654 null
2025-02-21 Empowering LLMs with Logical Reasoning: A Comprehensive Survey Fengxiang Cheng et.al. 2502.15652 null
2025-02-21 Steering into New Embedding Spaces: Analyzing Cross-Lingual Alignment Induced by Model Interventions in Multilingual Language Models Anirudh Sundar et.al. 2502.15639 null
2025-02-21 The Relationship Between Reasoning and Performance in Large Language Models – o3 (mini) Thinks Harder, Not Longer Marthe Ballon et.al. 2502.15631 null
2025-02-21 Probe Pruning: Accelerating LLMs through Dynamic Pruning via Model-Probing Qi Le et.al. 2502.15618 null
2025-02-20 LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention Shang Yang et.al. 2502.14866 link
2025-02-20 Aligning LLMs to Ask Good Questions: A Case Study in Clinical Reasoning Shuyue Stella Li et.al. 2502.14860 link
2025-02-20 FR-Spec: Accelerating Large-Vocabulary Language Models via Frequency-Ranked Speculative Sampling Weilin Zhao et.al. 2502.14856 null
2025-02-20 Prompt-to-Leaderboard Evan Frick et.al. 2502.14855 null
2025-02-20 GATE: Graph-based Adaptive Tool Evolution Across Diverse Tasks Jianwen Luo et.al. 2502.14848 null
2025-02-20 Red-Teaming LLM Multi-Agent Systems via Communication Attacks Pengfei He et.al. 2502.14847 null
2025-02-20 Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation Yue Yang et.al. 2502.14846 null
2025-02-20 Revealing and Mitigating Over-Attention in Knowledge Editing Pinzheng Wang et.al. 2502.14838 null
2025-02-20 Middle-Layer Representation Alignment for Cross-Lingual Transfer in Fine-Tuned LLMs Danni Liu et.al. 2502.14830 null
2025-02-20 Exploring Advanced Techniques for Visual Question Answering: A Comprehensive Comparison Aiswarya Baby et.al. 2502.14827 null
2025-02-19 Where’s the Bug? Attention Probing for Scalable Fault Localization Adam Stein et.al. 2502.13966 null
2025-02-19 Autellix: An Efficient Serving Engine for LLM Agents as General Programs Michael Luo et.al. 2502.13965 null
2025-02-19 MuDAF: Long-Context Multi-Document Attention Focusing through Contrastive Learning on Attention Heads Weihao Liu et.al. 2502.13963 null
2025-02-19 Is That Your Final Answer? Test-Time Scaling Improves Selective Question Answering William Jurayj et.al. 2502.13962 link
2025-02-19 LIDDIA: Language-based Intelligent Drug Discovery Agent Reza Averly et.al. 2502.13959 null
2025-02-19 Neurosymbolic artificial intelligence via large language models and coherence-driven inference Steve Huntsman et.al. 2502.13953 null
2025-02-19 Why Safeguarded Ships Run Aground? Aligned Large Language Models’ Safety Mechanisms Tend to Be Anchored in The Template Region Chak Tou Leong et.al. 2502.13946 null
2025-02-19 A Chain-of-Thought Subspace Meta-Learning for Few-shot Image Captioning with Large Vision and Language Models Hao Huang et.al. 2502.13942 null
2025-02-19 LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization Guanzheng Chen et.al. 2502.13922 link
2025-02-19 Exploring Code Language Models for Automated HLS-based Hardware Generation: Benchmark, Infrastructure and Analysis Jiahao Gai et.al. 2502.13921 null
2025-02-18 Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization Shuo Xing et.al. 2502.13146 link
2025-02-18 Multimodal Mamba: Decoder-only Multimodal State Space Model via Quadratic to Linear Distillation Bencheng Liao et.al. 2502.13145 link
2025-02-18 UniGuardian: A Unified Defense for Detecting Prompt Injection, Backdoor Attacks and Adversarial Attacks in Large Language Models Huawei Lin et.al. 2502.13141 null
2025-02-18 Towards Quantum Tensor Decomposition in Biomedical Applications Myson Burch et.al. 2502.13140 null
2025-02-18 AIDE: AI-Driven Exploration in the Space of Code Zhengyao Jiang et.al. 2502.13138 link
2025-02-18 Theorem Prover as a Judge for Synthetic Data Generation Joshua Ong Jun Leang et.al. 2502.13137 null
2025-02-18 Learning to Defer for Causal Discovery with Imperfect Experts Oscar Clivio et.al. 2502.13132 null
2025-02-18 Facilitating Long Context Understanding via Supervised Chain-of-Thought Reasoning Jingyang Lin et.al. 2502.13127 null
2025-02-18 RuozhiBench: Evaluating LLMs with Logical Fallacies and Misleading Premises Zenan Zhai et.al. 2502.13125 null
2025-02-18 Adapting Psycholinguistic Research for LLMs: Gender-inclusive Language in a Coreference Context Marion Bartl et.al. 2502.13120 null
2025-02-17 Idiosyncrasies in Large Language Models Mingjie Sun et.al. 2502.12150 link
2025-02-17 HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation Ling Yang et.al. 2502.12148 link
2025-02-17 Fast or Better? Balancing Accuracy and Cost in Retrieval-Augmented Generation with Flexible User Control Jinyan Su et.al. 2502.12145 null
2025-02-17 Small Models Struggle to Learn from Strong Reasoners Yuetai Li et.al. 2502.12143 link
2025-02-17 SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs Yige Xu et.al. 2502.12134 null
2025-02-17 Transformer Dynamics: A neuroscientific approach to interpretability of large language models Jesseba Fernando et.al. 2502.12131 null
2025-02-17 Scaling Autonomous Agents via Automatic Reward Modeling And Planning Zhenfang Chen et.al. 2502.12130 link
2025-02-17 Minimal Ranks, Maximum Confidence: Parameter-efficient Uncertainty Quantification for LoRA Patryk Marszałek et.al. 2502.12122 null
2025-02-17 LLMs on the Line: Data Determines Loss-to-Loss Scaling Laws Prasanna Mayilvahanan et.al. 2502.12120 null
2025-02-17 PRISM: Self-Pruning Intrinsic Selection Method for Training-Free Multimodal Data Selection Jinhe Bi et.al. 2502.12119 null
2025-02-14 MM-RLHF: The Next Step Forward in Multimodal LLM Alignment Yi-Fan Zhang et.al. 2502.10391 null
2025-02-14 Aspect-Oriented Summarization for Psychiatric Short-Term Readmission Prediction WonJin Yoon et.al. 2502.10388 null
2025-02-14 Enhancing Multilingual LLM Pretraining with Model-Based Data Selection Bettina Messmer et.al. 2502.10361 null
2025-02-14 Organize the Web: Constructing Domains Enhances Pre-Training Data Curation Alexander Wettig et.al. 2502.10341 null
2025-02-14 Evaluating the Meta- and Object-Level Reasoning of Large Language Models for Question Answering Nick Ferguson et.al. 2502.10338 null
2025-02-14 LLM-Powered Preference Elicitation in Combinatorial Assignment Ermis Soumalias et.al. 2502.10308 null
2025-02-14 Open-Source AI-Powered Optimization in Scalene: Advancing Python Performance Profiling with DeepSeek-R1 and LLaMA 3.2 Saem Hasan et.al. 2502.10299 null
2025-02-14 Are Large Language Models the future crowd workers of Linguistics? Iris Ferrazzo et.al. 2502.10266 null
2025-02-14 Large Language Models and Synthetic Data for Monitoring Dataset Mentions in Research Papers Aivin V. Solatorio et.al. 2502.10263 link
2025-02-14 VisCon-100K: Leveraging Contextual Web Data for Fine-tuning Vision Language Models Gokul Karthik Kumar et.al. 2502.10250 null
2025-02-13 Theoretical Benefit and Limitation of Diffusion Language Model Guhao Feng et.al. 2502.09622 null
2025-02-13 MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency Dongzhi Jiang et.al. 2502.09621 null
2025-02-13 Exploring the Potential of Encoder-free Architectures in 3D LMMs Yiwen Tang et.al. 2502.09620 link
2025-02-13 Human-LLM Coevolution: Evidence from Academic Writing Mingmeng Geng et.al. 2502.09606 null
2025-02-13 SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models Yung-Sung Chuang et.al. 2502.09604 link
2025-02-13 GAIA: A Global, Multi-modal, Multi-scale Vision-Language Dataset for Remote Sensing Image Analysis Angelos Zavras et.al. 2502.09598 link
2025-02-13 Do LLMs Recognize Your Preferences? Evaluating Personalized Preference Following in LLMs Siyan Zhao et.al. 2502.09597 link
2025-02-13 KIMAs: A Configurable Knowledge Integrated Multi-Agent System Zitao Li et.al. 2502.09596 null
2025-02-13 Logical forms complement probability in understanding language model (and human) performance Yixuan Wang et.al. 2502.09589 null
2025-02-13 Polymind: Parallel Visual Diagramming with Large Language Models to Support Prewriting Through Microtasks Qian Wan et.al. 2502.09577 null
2025-02-12 Examining Multilingual Embedding Models Cross-Lingually Through LLM-Generated Adversarial Examples Andrianos Michail et.al. 2502.08638 null
2025-02-12 Ensemble based approach to quantifying uncertainty of LLM based classifications Srijith Rajamohan et.al. 2502.08631 null
2025-02-12 Commercial LLM Agents Are Already Vulnerable to Simple Yet Dangerous Attacks Ang Li et.al. 2502.08586 null
2025-02-12 QA-Expand: Multi-Question Answer Generation for Enhanced Query Expansion in Information Retrieval Wonduk Seo et.al. 2502.08557 null
2025-02-12 Fostering Appropriate Reliance on Large Language Models: The Role of Explanations, Sources, and Inconsistencies Sunnie S. Y. Kim et.al. 2502.08554 null
2025-02-12 LLMs can implicitly learn from mistakes in-context Lisa Alazraki et.al. 2502.08550 null
2025-02-12 LLM Pretraining with Continuous Concepts Jihoon Tack et.al. 2502.08524 link
2025-02-12 The Paradox of Stochasticity: Limited Creativity and Computational Decoupling in Temperature-Varied LLM Outputs of Structured Fictional Data Evgenii Evstafev et.al. 2502.08515 null
2025-02-12 Faithful, Unfaithful or Ambiguous? Multi-Agent Debate with Initial Stance for Summary Evaluation Mahnaz Koupaee et.al. 2502.08514 null
2025-02-12 Measuring Diversity in Synthetic Datasets Yuchang Zhu et.al. 2502.08512 null
2025-02-11 DarwinLM: Evolutionary Structured Pruning of Large Language Models Shengkun Tang et.al. 2502.07780 link
2025-02-11 Auditing Prompt Caching in Language Model APIs Chenchen Gu et.al. 2502.07776 link
2025-02-11 Automatic Robot Task Planning by Integrating Large Language Model with Genetic Programming Azizjon Kobilov et.al. 2502.07772 null
2025-02-11 Great Power Brings Great Responsibility: Personalizing Conversational AI for Diverse Problem-Solvers Italo Santos et.al. 2502.07763 null
2025-02-11 Scalable Fingerprinting of Large Language Models Anshul Nasery et.al. 2502.07760 null
2025-02-11 Towards Efficient Optimizer Design for LLM via Structured Fisher Approximation with a Low-Rank Extension Wenbo Gong et.al. 2502.07752 null
2025-02-11 WHODUNIT: Evaluation benchmark for culprit detection in mystery stories Kshitij Gupta et.al. 2502.07747 link
2025-02-11 The Economics of Large Language Models: Token Allocation, Fine-Tuning, and Optimal Pricing Dirk Bergemann et.al. 2502.07736 null
2025-02-11 Economics of Sourcing Human Data Sebastin Santy et.al. 2502.07732 null
2025-02-11 Verifying LLM-Generated Code in the Context of Software Verification with Ada/SPARK Marcos Cramer et.al. 2502.07728 null
2025-02-10 Rationalization Models for Text-to-SQL Gaetano Rossiello et.al. 2502.06759 null
2025-02-10 Gradient Multi-Normalization for Stateless and Scalable LLM Training Meyer Scetbon et.al. 2502.06742 null
2025-02-10 VersaPRM: Multi-Domain Process Reward Model via Synthetic Reasoning Data Thomas Zeng et.al. 2502.06737 null
2025-02-10 Dynamic Loss-Based Sample Reweighting for Improved Large Language Model Pretraining Daouda Sow et.al. 2502.06733 null
2025-02-10 Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling Runze Liu et.al. 2502.06703 link
2025-02-10 Boosting Self-Efficacy and Performance of Large Language Models via Verbal Efficacy Stimulations Rui Chen et.al. 2502.06669 null
2025-02-10 Automatic Evaluation of Healthcare LLMs Beyond Question-Answering Anna Arias-Duart et.al. 2502.06666 null
2025-02-10 On the Limitations of Combining Sentiment Analysis Tools in a Cross-Platform Setting Martin Obaidi et.al. 2502.06665 null
2025-02-10 EfficientLLM: Scalable Pruning-Aware Pretraining for Architecture-Agnostic Edge Language Models Xingrun Xing et.al. 2502.06663 link
2025-02-10 Unbiased Evaluation of Large Language Models from a Causal Perspective Meilin Chen et.al. 2502.06655 null
2025-02-07 Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy Yunhang Shen et.al. 2502.05177 link
2025-02-07 NoLiMa: Long-Context Evaluation Beyond Literal Matching Ali Modarressi et.al. 2502.05167 link
2025-02-07 DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails Yihe Deng et.al. 2502.05163 link
2025-02-07 A Lightweight Method to Disrupt Memorized Sequences in LLM Parjanya Prajakta Prashant et.al. 2502.05159 null
2025-02-07 Hummingbird: High Fidelity Image Generation via Multimodal Context Alignment Minh-Quan Le et.al. 2502.05153 null
2025-02-07 Transforming Science with Large Language Models: A Survey on AI-assisted Scientific Discovery, Experimentation, Content Generation, and Evaluation Steffen Eger et.al. 2502.05151 link
2025-02-07 CodeSCM: Causal Analysis for Multi-Modal Code Generation Mukur Gupta et.al. 2502.05150 null
2025-02-07 An Annotated Reading of ‘The Singer of Tales’ in the LLM Era Kush R. Varshney et.al. 2502.05148 null
2025-02-07 Refining Integration-by-Parts Reduction of Feynman Integrals with Machine Learning Matt von Hippel et.al. 2502.05121 null
2025-02-07 Flexible and Efficient Grammar-Constrained Decoding Kanghee Park et.al. 2502.05111 null
2025-02-06 Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment Zuyan Liu et.al. 2502.04328 null
2025-02-06 Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions Yik Siu Chan et.al. 2502.04322 link
2025-02-06 ChamaleonLLM: Batch-Aware Dynamic Low-Rank Adaptation via Inference-Time Clusters Kamer Ali Yuksel et.al. 2502.04315 null
2025-02-06 ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization Yinjie Wang et.al. 2502.04306 link
2025-02-06 Beyond Prompt Content: Enhancing LLM Performance via Content-Format Integrated Prompt Optimization Yuanye Liu et.al. 2502.04295 link
2025-02-06 PILAF: Optimal Human Preference Sampling for Reward Modeling Yunzhen Feng et.al. 2502.04270 null
2025-02-06 How does a Multilingual LM Handle Multiple Languages? Santhosh Kakarla et.al. 2502.04269 null
2025-02-06 Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion Marco Mistretta et.al. 2502.04263 link
2025-02-06 TriNER: A Series of Named Entity Recognition Models For Hindi, Bengali & Marathi Mohammed Amaan Dhamaskar et.al. 2502.04245 null
2025-02-06 MAGA: MAssive Genre-Audience Reformulation to Pretraining Corpus Expansion Xintong Hao et.al. 2502.04235 null
2025-02-05 Do Large Language Model Benchmarks Test Reliability? Joshua Vendrow et.al. 2502.03461 null
2025-02-05 Adapt-Pruner: Adaptive Structural Pruning for Efficient Small Language Model Training Boyao Wang et.al. 2502.03460 null
2025-02-05 A Schema-Guided Reason-while-Retrieve framework for Reasoning on Scene Graphs with Large-Language-Models (LLMs) Yiye Chen et.al. 2502.03450 null
2025-02-05 BFS-Prover: Scalable Best-First Tree Search for LLM-based Automatic Theorem Proving Ran Xin et.al. 2502.03438 null
2025-02-05 On Fairness of Unified Multimodal Large Language Model for Image Generation Ming Liu et.al. 2502.03429 null
2025-02-05 Harnessing Large Language Models for Curated Code Reviews Oussama Ben Sghaier et.al. 2502.03425 null
2025-02-05 Investigating Corporate Social Responsibility Initiatives: Examining the case of corporate Covid-19 response Meheli Basu et.al. 2502.03421 null
2025-02-05 Think or Step-by-Step? UnZIPping the Black Box in Zero-Shot Prompts Nikta Gohari Sadr et.al. 2502.03418 null
2025-02-05 SPRI: Aligning Large Language Models with Context-Situated Principles Hongli Zhan et.al. 2502.03397 null
2025-02-05 LIMO: Less is More for Reasoning Yixin Ye et.al. 2502.03387 null
2025-02-04 COCONut-PanCap: Joint Panoptic Segmentation and Grounded Captions for Fine-Grained Understanding and Generation Xueqing Deng et.al. 2502.02589 null
2025-02-04 A comparison of translation performance between DeepL and Supertext Alex Flückiger et.al. 2502.02577 null
2025-02-04 Are Language Models Up to Sequential Optimization Problems? From Evaluation to a Hegelian-Inspired Enhancement Soheil Abbasloo et.al. 2502.02573 null
2025-02-04 Learning the RoPEs: Better 2D and 3D Position Encodings with STRING Connor Schenck et.al. 2502.02562 null
2025-02-04 LLMs for Generation of Architectural Components: An Exploratory Empirical Study in the Serverless World Shrikara Arun et.al. 2502.02539 null
2025-02-04 Adaptive Self-improvement LLM Agentic System for ML Library Development Genghan Zhang et.al. 2502.02534 null
2025-02-04 Multi-Agent Design: Optimizing Agents with Better Prompts and Topologies Han Zhou et.al. 2502.02533 null
2025-02-04 Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search Maohao Shen et.al. 2502.02508 null
2025-02-04 EasySpec: Layer-Parallel Speculative Decoding for Efficient Multi-GPU Utilization Yize Wu et.al. 2502.02493 null
2025-02-04 Multilingual Machine Translation with Open Large Language Models at Practical Scale: An Empirical Study Menglong Cui et.al. 2502.02481 null
2025-01-31 Vintix: Action Model via In-Context Reinforcement Learning Andrey Polubarov et.al. 2501.19400 link
2025-01-31 Do LLMs Strategically Reveal, Conceal, and Infer Information? A Theoretical and Empirical Analysis in The Chameleon Game Mustafa O. Karabag et.al. 2501.19398 link
2025-01-31 Cache Me If You Must: Adaptive Key-Value Quantization for Large Language Models Alina Shutova et.al. 2501.19392 null
2025-01-31 Federated Sketching LoRA: On-Device Collaborative Fine-Tuning of Large Language Models Wenzhi Fang et.al. 2501.19389 null
2025-02-03 SELMA: A Speech-Enabled Language Model for Virtual Assistant Interactions Dominik Wagner et.al. 2501.19377 null
2025-01-31 We’re Different, We’re the Same: Creative Homogeneity Across LLMs Emily Wenger et.al. 2501.19361 null
2025-01-31 Mechanical Properties of the Meninges: Large Language Model Assisted Systematic Review of over 25,000 Studies Brandon P. Chelstrom et.al. 2501.19359 null
2025-01-31 The Energy Loss Phenomenon in RLHF: A New Perspective on Mitigating Reward Hacking Yuchun Miao et.al. 2501.19358 null
2025-01-31 Do Large Multimodal Models Solve Caption Generation for Scientific Figures? Lessons Learned from SCICAP Challenge 2023 Ting-Yao E. Hsu et.al. 2501.19353 null
2025-01-31 Towards Adaptive Self-Improvement for Smarter Energy Systems Alexander Sommer et.al. 2501.19340 null
2025-01-30 Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs Yue Wang et.al. 2501.18585 null
2025-01-30 Token-Hungry, Yet Precise: DeepSeek R1 Highlights the Need for Multi-Step Reasoning Over Speed in MATH Evgenii Evstafev et.al. 2501.18576 null
2025-01-30 BounTCHA: A CAPTCHA Utilizing Boundary Identification in AI-extended Videos Lehao Lin et.al. 2501.18565 null
2025-01-30 Semantic Web and Creative AI – A Technical Report from ISWS 2023 Raia Abu Ahmad et.al. 2501.18542 null
2025-01-30 Illusions of Relevance: Using Content Injection Attacks to Deceive Retrievers, Rerankers, and LLM Judges Manveer Singh Tamber et.al. 2501.18536 link
2025-01-30 Differentially Private Steering for Large Language Model Alignment Anmol Goel et.al. 2501.18532 link
2025-01-30 Learn from the Past: Language-conditioned Object Rearrangement with Large Language Models Guanqun Cao et.al. 2501.18516 null
2025-01-30 Streaming DiLoCo with overlapping communication: Towards a Distributed Free Lunch Arthur Douillard et.al. 2501.18512 null
2025-01-30 CLEAR: Cue Learning using Evolution for Accurate Recognition Applied to Sustainability Data Extraction Peter J. Bentley et.al. 2501.18504 null
2025-01-30 A Tool for In-depth Analysis of Code Execution Reasoning of Large Language Models Changshu Liu et.al. 2501.18482 null
2025-01-29 Learning Beyond the Surface: How Far Can Continual Pre-Training with LoRA Enhance LLMs’ Domain-Specific Insight Learning? Pouya Pezeshkpour et.al. 2501.17840 link
2025-01-29 Leveraging Multimodal LLM for Inspirational User Interface Search Seokhyeon Park et.al. 2501.17799 link
2025-01-29 BreezyVoice: Adapting TTS for Taiwanese Mandarin with Enhanced Polyphone Disambiguation – Challenges and Insights Chan-Jan Hsu et.al. 2501.17790 null
2025-01-29 AdditiveLLM: Large Language Models Predict Defects in Additive Manufacturing Peter Pak et.al. 2501.17784 null
2025-01-29 2SSP: A Two-Stage Framework for Structured Pruning of LLMs Fabrizio Sandri et.al. 2501.17771 null
2025-01-29 Hybrid Graphs for Table-and-Text based Question Answering using LLMs Ankush Agarwal et.al. 2501.17767 null
2025-01-29 On the Partitioning of GPU Power among Multi-Instances Tirth Vamja et.al. 2501.17752 null
2025-01-29 Early External Safety Testing of OpenAI’s o3-mini: Insights from the Pre-Deployment Evaluation Aitor Arrieta et.al. 2501.17749 null
2025-01-29 Using Code Generation to Solve Open Instances of Combinatorial Design Problems Christopher D. Rosin et.al. 2501.17725 link
2025-01-29 RICoTA: Red-teaming of In-the-wild Conversation with Test Attempts Eujeong Choi et.al. 2501.17715 link
2025-01-28 Cultural Differences and Perverse Incentives in Science Create a Bad Mix: Exploring Country-Level Publication Bias in Select ACM Conferences Aksheytha Chelikavada et.al. 2501.17150 null
2025-01-28 FactCG: Enhancing Fact Checkers with Graph-Based Multi-Hop Data Deren Lei et.al. 2501.17144 link
2025-01-28 ASTRAL: Automated Safety Testing of Large Language Models Miriam Ugarte et.al. 2501.17132 null
2025-01-28 Optimizing Large Language Model Training Using FP4 Quantization Ruizhe Wang et.al. 2501.17116 null
2025-01-28 Unlocking Transparent Alignment Through Enhanced Inverse Constitutional AI for Principle Extraction Carl-Leander Henneking et.al. 2501.17112 null
2025-01-28 Token-by-Token Regeneration and Domain Biases: A Benchmark of LLMs on Advanced Mathematical Problem-Solving Evgenii Evstafev et.al. 2501.17084 null
2025-01-28 Enhanced Retrieval of Long Documents: Leveraging Fine-Grained Block Representations with Large Language Models Minghan Li et.al. 2501.17039 null
2025-01-28 Challenges in Ensuring AI Safety in DeepSeek-R1 Models: The Shortcomings of Reinforcement Learning Strategies Manojkumar Parmar et.al. 2501.17030 null
2025-01-28 Automated Refactoring of Non-Idiomatic Python Code: A Differentiated Replication with LLMs Alessandro Midolo et.al. 2501.17024 null
2025-01-28 Mobile Manipulation Instruction Generation from Multiple Images with Automatic Metric Enhancement Kei Katsumata et.al. 2501.17022 null
2025-01-27 Evaluating The Performance of Using Large Language Models to Automate Summarization of CT Simulation Orders in Radiation Oncology Meiyun Cao et.al. 2501.16309 null
2025-01-27 RAPID: Retrieval-Augmented Parallel Inference Drafting for Text-Based Video Event Retrieval Long Nguyen et.al. 2501.16303 null
2025-01-27 Matryoshka Re-Ranker: A Flexible Re-Ranking Architecture With Configurable Depth and Width Zheng Liu et.al. 2501.16302 null
2025-01-27 Large Models in Dialogue for Active Perception and Anomaly Detection Tzoulio Chamiti et.al. 2501.16300 null
2025-01-27 FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers Renshan Zhang et.al. 2501.16297 null
2025-01-27 Brain-Adapter: Enhancing Neurological Disorder Analysis with Adapter-Tuning Multimodal Large Language Models Jing Zhang et.al. 2501.16282 null
2025-01-27 Do LLMs Have Visualization Literacy? An Evaluation on Modified Visualizations to Test Generalization in Data Interpretation Jiayi Hong et.al. 2501.16277 null
2025-01-27 URAG: Implementing a Unified Hybrid RAG for Precise Answers in University Admission Chatbots – A Case Study at HCMUT Long Nguyen et.al. 2501.16276 null
2025-01-27 A foundation model for human-AI collaboration in medical literature mining Zifeng Wang et.al. 2501.16255 null
2025-01-27 Multi-Agent Geospatial Copilots for Remote Sensing Workflows Chaehong Lee et.al. 2501.16254 null
2025-01-24 HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation Xin Zhou et.al. 2501.14729 link
2025-01-24 Do LLMs Provide Consistent Answers to Health-Related Questions across Languages? Ipek Baris Schlicht et.al. 2501.14719 null
2025-01-24 Towards Better Understanding Table Instruction Tuning: Decoupling the Effects from Data versus Models Naihao Deng et.al. 2501.14717 null
2025-01-24 FlexiGPT: Pruning and Extending Large Language Models with Low-Rank Weight Sharing James Seale Smith et.al. 2501.14713 null
2025-01-24 The Karp Dataset Mason DiCicco et.al. 2501.14705 null
2025-01-24 Rethinking Table Instruction Tuning Naihao Deng et.al. 2501.14693 null
2025-01-24 An Empirical Study on LLM-based Classification of Requirements-related Provisions in Food-safety Regulations Shabnam Hassani et.al. 2501.14683 null
2025-01-24 Diffusion based Text-to-Music Generation with Global and Local Text based Conditioning Jisi Zhang et.al. 2501.14680 null
2025-01-24 MedAgentBench: Dataset for Benchmarking LLMs as Agents in Medical Applications Yixing Jiang et.al. 2501.14654 link
2025-01-24 Investigating the (De)Composition Capabilities of Large Language Models in Natural-to-Formal Language Conversion Ziyao Xu et.al. 2501.14649 link
2025-01-23 CRPO: Confidence-Reward Driven Preference Optimization for Machine Translation Guofeng Cui et.al. 2501.13927 null
2025-01-23 Analysis of Indic Language Capabilities in LLMs Aatman Vaidya et.al. 2501.13912 null
2025-01-23 Privacy-Preserving Personalized Federated Prompt Learning for Multimodal Large Language Models Linh Tran et.al. 2501.13904 null
2025-01-23 Exploring Finetuned Audio-LLM on Heart Murmur Features Adrian Florea et.al. 2501.13884 null
2025-01-23 The machine learning platform for developers of large systems Alexey Naikov et.al. 2501.13881 null
2025-01-23 A RAG-Based Institutional Assistant Gustavo Kuratomi et.al. 2501.13880 null
2025-01-23 Large Vision-Language Models for Knowledge-Grounded Data Annotation of Memes Shiling Deng et.al. 2501.13851 link
2025-01-23 On the Reasoning Capacity of AI Models and How to Quantify It Santosh Kumar Radha et.al. 2501.13833 null
2025-01-23 Predicting Compact Phrasal Rewrites with Large Language Models for ASR Post Editing Hao Zhang et.al. 2501.13831 null
2025-01-23 Hallucinations Can Improve Large Language Models in Drug Discovery Shuzhou Yuan et.al. 2501.13824 null
2025-01-22 A Rate-Distortion Framework for Summarization Enes Arda et.al. 2501.13100 null
2025-01-22 Refining Input Guardrails: Enhancing LLM-as-a-Judge Efficiency Through Chain-of-Thought Fine-Tuning and Alignment Melissa Kazemi Rad et.al. 2501.13080 null
2025-01-22 Does Table Source Matter? Benchmarking and Improving Multimodal Scientific Table Understanding and Reasoning Bohao Yang et.al. 2501.13042 link
2025-01-22 Pairwise RM: Perform Best-of-N Sampling with Knockout Tournament Yantao Liu et.al. 2501.13007 link
2025-01-22 Large Language Model-Based Semantic Communication System for Image Transmission Soheyb Ribouh et.al. 2501.12988 null
2025-01-22 LLM4WM: Adapting LLM for Wireless Multi-Tasking Xuanyu Liu et.al. 2501.12983 null
2025-01-22 OnionEval: An Unified Evaluation of Fact-conflicting Hallucination for Small-Large Language Models Chongren Sun et.al. 2501.12975 link
2025-01-22 Accessible Smart Contracts Verification: Synthesizing Formal Models with Tamed LLMs Jan Corazza et.al. 2501.12972 null
2025-01-22 It’s complicated. The relationship of algorithmic fairness and non-discrimination regulations in the EU AI Act Kristof Meding et.al. 2501.12962 null
2025-01-22 Efficient Prompt Compression with Evaluator Heads for Long-Context Transformer Inference Weizhi Fei et.al. 2501.12959 null
2025-01-21 InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling Yi Wang et.al. 2501.12386 link
2025-01-21 Expertise elevates AI usage: experimental evidence comparing laypeople and professional artists Thomas F. Eisenmann et.al. 2501.12374 link
2025-01-21 Is Long Context All You Need? Leveraging LLM’s Extended Context for NL2SQL Yeounoh Chung et.al. 2501.12372 null
2025-01-21 Automatic Labelling with Open-source LLMs using Dynamic Label Schema Integration Thomas Walshe et.al. 2501.12332 null
2025-01-21 VARGPT: Unified Understanding and Generation in a Visual Autoregressive Multimodal Large Language Model Xianwei Zhuang et.al. 2501.12327 link
2025-01-21 LLM-Assisted Knowledge Graph Completion for Curriculum and Domain Modelling in Personalized Higher Education Recommendations Hasan Abu-Rasheed et.al. 2501.12300 null
2025-01-21 MoGERNN: An Inductive Traffic Predictor for Unobserved Locations in Dynamic Sensing Networks Qishen Zhou et.al. 2501.12281 link
2025-01-21 Condor: Enhance LLM Alignment with Knowledge-Driven Data Synthesis and Refinement Maosong Cao et.al. 2501.12273 null
2025-01-21 FOCUS: First Order Concentrated Updating Scheme Yizhou Liu et.al. 2501.12243 null
2025-01-21 InsTALL: Context-aware Instructional Task Assistance with Multi-modal Large Language Models Pha Nguyen et.al. 2501.12231 null
2025-01-17 FaceXBench: Evaluating Multimodal LLMs on Face Understanding Kartik Narayan et.al. 2501.10360 link
2025-01-17 Agent4Edu: Generating Learner Response Data by Generative Agents for Intelligent Education Systems Weibo Gao et.al. 2501.10332 null
2025-01-17 Large language models for automated scholarly paper review: A survey Zhenzhen Zhuang et.al. 2501.10326 null
2025-01-17 HiMix: Reducing Computational Complexity in Large Vision-Language Models Xuange Zhang et.al. 2501.10318 null
2025-01-17 Towards Preventing Overreliance on Task-Oriented Conversational AI Through Accountability Modeling Suvodip Dey et.al. 2501.10316 link
2025-01-17 Addressing Popularity Bias in Third-Party Library Recommendations Using LLMs Claudio Di Sipio et.al. 2501.10313 null
2025-01-17 Computational Protein Science in the Era of Large Language Models (LLMs) Wenqi Fan et.al. 2501.10282 null
2025-01-17 Test Wars: A Comparative Study of SBST, Symbolic Execution, and LLM-Based Approaches to Unit Test Generation Azat Abdullin et.al. 2501.10200 null
2025-01-17 Generative Artificial Intelligence: Implications for Biomedical and Health Professions Education William Hersh et.al. 2501.10186 null
2025-01-17 Multi-stage Training of Bilingual Islamic LLM for Neural Passage Retrieval Vera Pavlova et.al. 2501.10175 null
2025-01-16 Distilling Multi-modal Large Language Models for Autonomous Driving Deepti Hegde et.al. 2501.09757 null
2025-01-16 Lost in Translation, Found in Context: Sign Language Translation with Contextual Cues Youngjoon Jang et.al. 2501.09754 null
2025-01-16 OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking Zekun Xi et.al. 2501.09751 null
2025-01-16 Enhancing Lexicon-Based Text Embeddings with Large Language Models Yibin Lei et.al. 2501.09749 null
2025-01-16 Suggesting Code Edits in Interactive Machine Learning Notebooks Using Large Language Models Bihui Jin et.al. 2501.09745 null
2025-01-16 KU AIGEN ICL EDI@BC8 Track 3: Advancing Phenotype Named Entity Recognition and Normalization for Dysmorphology Physical Examination Reports Hajung Kim et.al. 2501.09744 null
2025-01-16 Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps Nanye Ma et.al. 2501.09732 null
2025-01-16 A Simple Aerial Detection Baseline of Multimodal Language Models Qingyun Li et.al. 2501.09720 link
2025-01-16 CyberMentor: AI Powered Learning Tool Platform to Address Diverse Student Needs in Cybersecurity Education Tianyu Wang et.al. 2501.09709 null
2025-01-16 Domain Adaptation of Foundation LLMs for e-Commerce Christian Herold et.al. 2501.09706 null
2025-01-15 Aegis2.0: A Diverse AI Safety Dataset and Risks Taxonomy for Alignment of LLM Guardrails Shaona Ghosh et.al. 2501.09004 null
2025-01-15 Vision Foundation Models for Computed Tomography Suraj Pai et.al. 2501.09001 null
2025-01-15 Development and Validation of the Provider Documentation Summarization Quality Instrument for Large Language Models Emma Croxford et.al. 2501.08977 null
2025-01-15 Learning to Extract Cross-Domain Aspects and Understanding Sentiments Using Large Language Models Karukriti Kaushik Ghosh et.al. 2501.08974 null
2025-01-15 Analyzing the Ethical Logic of Six Large Language Models W. Russell Neuman et.al. 2501.08951 null
2025-01-15 Applying General Turn-taking Models to Conversational Human-Robot Interaction Gabriel Skantze et.al. 2501.08946 null
2025-01-15 Disentangling Exploration of Large Language Models by Optimal Exploitation Tim Grams et.al. 2501.08925 null
2025-01-15 GenAI Content Detection Task 3: Cross-Domain Machine-Generated Text Detection Challenge Liam Dugan et.al. 2501.08913 null
2025-01-15 Leveraging Large Language Models as Knowledge-Driven Agents for Reliable Retrosynthesis Planning Qinyu Ma et.al. 2501.08897 null
2025-01-15 XMusic: Towards a Generalized and Controllable Symbolic Music Generation Framework Sida Tian et.al. 2501.08809 null
2025-01-14 PokerBench: Training Large Language Models to become Professional Poker Players Richard Zhuang et.al. 2501.08328 link
2025-01-14 Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks Miran Heo et.al. 2501.08326 null
2025-01-14 ADAM-1: AI and Bioinformatics for Alzheimer’s Detection and Microbiome-Clinical Data Integrations Ziyuan Huang et.al. 2501.08324 null
2025-01-14 Exploring Robustness of Multilingual LLMs on Real-World Noisy Data Amirhossein Aliakbarzadeh et.al. 2501.08322 link
2025-01-14 Enhancing Automated Interpretability with Output-Centric Feature Descriptions Yoav Gur-Arieh et.al. 2501.08319 link
2025-01-14 HALoGEN: Fantastic LLM Hallucinations and Where to Find Them Abhilasha Ravichander et.al. 2501.08292 null
2025-01-14 LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding Hongyu Li et.al. 2501.08282 link
2025-01-14 Exploring Robustness of LLMs to Sociodemographically-Conditioned Paraphrasing Pulkit Arora et.al. 2501.08276 null
2025-01-14 TriMod Fusion for Multimodal Named Entity Recognition in Social Media Mosab Alfaqeeh et.al. 2501.08267 null
2025-01-14 Addressing the sustainable AI trilemma: a case study on LLM agents and RAG Hui Wu et.al. 2501.08262 null
2025-01-13 Imagine while Reasoning in Space: Multimodal Visualization-of-Thought Chengzu Li et.al. 2501.07542 null
2025-01-13 ML Mule: Mobile-Driven Context-Aware Collaborative Learning Haoxiang Yu et.al. 2501.07536 null
2025-01-13 Investigating Large Language Models in Inferring Personality Traits from User Conversations Jianfeng Zhu et.al. 2501.07532 null
2025-01-13 RadAlign: Advancing Radiology Report Generation with Vision-Language Concept Alignment Difei Gu et.al. 2501.07525 link
2025-01-13 Parallel Key-Value Cache Fusion for Position Invariant RAG Philhoon Oh et.al. 2501.07523 null
2025-01-13 Exploring and Mitigating Adversarial Manipulation of Voting-Based Leaderboards Yangsibo Huang et.al. 2501.07493 null
2025-01-13 TiEBe: A Benchmark for Assessing the Current Knowledge of Large Language Models Thales Sales Almeida et.al. 2501.07482 null
2025-01-13 A Survey of Embodied AI in Healthcare: Techniques, Applications, and Opportunities Yihao Liu et.al. 2501.07468 null
2025-01-13 Understanding and Benchmarking Artificial Intelligence: OpenAI’s o3 Is Not AGI Rolf Pfister et.al. 2501.07458 null
2025-01-13 Enhancing LLM’s Ability to Generate More Repository-Aware Unit Tests Through Precise Contextual Information Injection Xin Yin et.al. 2501.07425 null
2025-01-10 LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs Omkar Thawakar et.al. 2501.06186 link
2025-01-10 PEACE: Empowering Geologic Map Holistic Understanding with MLLMs Yangyu Huang et.al. 2501.06184 null
2025-01-10 Multilingual Performance of a Multimodal Artificial Intelligence System on Multisubject Physics Concept Inventories Gerd Kortemeyer et.al. 2501.06143 null
2025-01-10 Supervision policies can shape long-term risk management in general-purpose AI models Manuel Cebrian et.al. 2501.06137 link
2025-01-10 Contextual ASR Error Handling with LLMs Augmentation for Goal-Oriented Conversational AI Yuya Asano et.al. 2501.06129 null
2025-01-10 Fleurs-SLU: A Massively Multilingual Benchmark for Spoken Language Understanding Fabian David Schmidt et.al. 2501.06117 link
2025-01-10 From Conversation to Automation: Leveraging Large Language Models to Analyze Strategies in Problem Solving Therapy Elham Aghakhani et.al. 2501.06101 null
2025-01-10 How to Tune a Multilingual Encoder Model for Germanic Languages: A Study of PEFT, Full Fine-Tuning, and Language Adapters Romina Oji et.al. 2501.06025 link
2025-01-10 Addressing speaker gender bias in large scale speech translation systems Shubham Bansal et.al. 2501.05989 null
2025-01-10 Exploring LLMs for Automated Pre-Testing of Cross-Cultural Surveys Divya Mani Adhikari et.al. 2501.05985 null
2025-01-09 ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding Xingyu Fu et.al. 2501.05452 link
2025-01-09 Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark Yunzhuo Hao et.al. 2501.05444 null
2025-01-09 A survey of textual cyber abuse detection using cutting-edge language models and large language models Jose A. Diaz-Garcia et.al. 2501.05443 null
2025-01-09 Using LLMs to Infer Non-Binary COVID-19 Sentiments of Chinese Micro-bloggers Jerry Chongyi Hu et.al. 2501.05423 null
2025-01-09 FairCode: Evaluating Social Bias of LLMs in Code Generation Yongkang Du et.al. 2501.05396 link
2025-01-09 Large Physics Models: Towards a collaborative approach with Large Language Models and Foundation Models Kristian G. Barman et.al. 2501.05382 null
2025-01-09 Accelerated Diffusion Models via Speculative Sampling Valentin De Bortoli et.al. 2501.05370 null
2025-01-09 Stream Aligner: Efficient Sentence-Level Alignment via Distribution Induction Hantao Lou et.al. 2501.05336 link
2025-01-09 “What’s Happening”- A Human-centered Multimodal Interpreter Explaining the Actions of Autonomous Vehicles Xuewen Luo et.al. 2501.05322 null
2025-01-09 CallNavi: A Study and Challenge on Function Calling Routing and Invocation in Large Language Models Yewei Song et.al. 2501.05255 null
2025-01-08 Re-ranking the Context for Multimodal Retrieval Augmented Generation Matin Mortaheb et.al. 2501.04695 null
2025-01-08 URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics Ruilin Luo et.al. 2501.04686 link
2025-01-08 Enhancing Financial VQA in Vision Language Models using Intermediate Structured Representations Archita Srivastava et.al. 2501.04675 null
2025-01-08 Assessing Language Comprehension in Large Language Models Using Construction Grammar Wesley Scivetti et.al. 2501.04661 null
2025-01-08 Multi-task retriever fine-tuning for domain-specific and efficient RAG Patrice Béchard et.al. 2501.04652 null
2025-01-08 FlairGPT: Repurposing LLMs for Interior Designs Gabrielle Littlefair et.al. 2501.04648 null
2025-01-08 Knowledge Retrieval Based on Generative AI Te-Lun Yang et.al. 2501.04635 null
2025-01-08 “Can you be my mum?”: Manipulating Social Robots in the Large Language Models Era Giulio Antonio Abbo et.al. 2501.04633 null
2025-01-08 Quantum-inspired Embeddings Projection and Similarity Metrics for Representation Learning Ivan Kankeu et.al. 2501.04591 null
2025-01-08 InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection Yuhang Liu et.al. 2501.04575 link
2025-01-07 Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos Haobo Yuan et.al. 2501.04001 link
2025-01-07 RAG-Check: Evaluating Multimodal Retrieval Augmented Generation Performance Matin Mortaheb et.al. 2501.03995 null
2025-01-07 Influences on LLM Calibration: A Study of Response Agreement, Loss Functions, and Prompt Styles Yuxi Xia et.al. 2501.03991 null
2025-01-07 (De)-Indexing and the Right to be Forgotten Salvatore Vilella et.al. 2501.03989 null
2025-01-07 VLM-driven Behavior Tree for Context-aware Task Planning Naoki Wake et.al. 2501.03968 null
2025-01-07 Vision Language Models as Values Detectors Giulio Antonio Abbo et.al. 2501.03957 null
2025-01-07 Localizing AI: Evaluating Open-Weight Language Models for Languages of Baltic States Jurgita Kapočiūtė-Dzikienė et.al. 2501.03952 null
2025-01-07 Not all tokens are created equal: Perplexity Attention Weighted Networks for AI generated text detection Pablo Miralles-González et.al. 2501.03940 null
2025-01-07 Visual question answering: from early developments to recent advances – a survey Ngoc Dung Huynh et.al. 2501.03939 null
2025-01-07 Exploring the Potential of Large Language Models in Public Transportation: San Antonio Case Study Ramya Jonnala et.al. 2501.03904 null
2025-01-06 BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning Beichen Zhang et.al. 2501.03226 link
2025-01-06 Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation Yuhui Zhang et.al. 2501.03225 link
2025-01-06 Leveraging Explainable AI for LLM Text Attribution: Differentiating Human-Written and Multiple LLMs-Generated Text Ayat Najjar et.al. 2501.03212 null
2025-01-06 Detecting AI-Generated Text in Educational Content: Leveraging Machine Learning and Explainable AI for Academic Integrity Ayat A. Najjar et.al. 2501.03203 null
2025-01-06 CLIX: Cross-Lingual Explanations of Idiomatic Expressions Aaron Gluck et.al. 2501.03191 null
2025-01-06 GLiREL – Generalist Model for Zero-Shot Relation Extraction Jack Boylan et.al. 2501.03172 null
2025-01-06 Semantic Captioning: Benchmark Dataset and Graph-Aware Few-Shot In-Context Learning for SQL2Text Ali Al-Lawati et.al. 2501.03166 link
2025-01-06 Large language models for artificial general intelligence (AGI): A survey of foundational principles and approaches Alhassan Mumuni et.al. 2501.03151 null
2025-01-06 VicSim: Enhancing Victim Simulation with Emotional and Linguistic Fidelity Yerong Li et.al. 2501.03139 null
2025-01-06 PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models Mingyang Song et.al. 2501.03124 link
2025-01-03 VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction Chaoyou Fu et.al. 2501.01957 link
2025-01-03 Cold-Start Recommendation towards the Era of Large Language Models (LLMs): A Comprehensive Survey and Roadmap Weizhi Zhang et.al. 2501.01945 null
2025-01-03 Abstractive Text Summarization for Contemporary Sanskrit Prose: Issues and Challenges Shagun Sinha et.al. 2501.01933 null
2025-01-03 Mitigating Hallucination for Large Vision Language Model by Inter-Modality Correlation Calibration Decoding Jiaming Li et.al. 2501.01926 null
2025-01-03 Virgo: A Preliminary Exploration on Reproducing o1-like MLLM Yifan Du et.al. 2501.01904 link
2025-01-03 Turning Logic Against Itself: Probing Model Defenses Through Contrastive Questions Rachneet Sachdeva et.al. 2501.01872 link
2025-01-03 Multi-Agent Conversational Online Learning for Adaptive LLM Response Identification Xiangxiang Dai et.al. 2501.01849 null
2025-01-03 MoColl: Agent-Based Specific and General Model Collaboration for Image Captioning Pu Yang et.al. 2501.01834 null
2025-01-03 Time Series Language Model for Descriptive Caption Generation Mohamed Trabelsi et.al. 2501.01832 null
2025-01-03 Auto-RT: Automatic Jailbreak Strategy Exploration for Red-Teaming Large Language Models Yanjiang Liu et.al. 2501.01830 null
2025-01-02 Unifying Specialized Visual Encoders for Video Language Models Jihoon Chung et.al. 2501.01426 link
2025-01-02 Multi-Modal Video Feature Extraction for Popularity Prediction Haixu Liu et.al. 2501.01422 null
2025-01-02 Deep Discrete Encoders: Identifiable Deep Generative Models for Rich Data with Discrete Latent Layers Seunghyun Lee et.al. 2501.01414 null
2025-01-02 OmniChat: Enhancing Spoken Dialogue Systems with Scalable Synthetic Data for Diverse Scenarios Xize Cheng et.al. 2501.01384 null
2025-01-02 CLIP-UP: CLIP-Based Unanswerable Problem Detection for Visual Question Answering Ben Vardi et.al. 2501.01371 null
2025-01-02 Embedding-based Approaches to Hyperpartisan News Detection Karthik Mohan et.al. 2501.01370 null
2025-01-02 Aligning Large Language Models for Faithful Integrity Against Opposing Argument Yong Zhao et.al. 2501.01336 null
2025-01-02 CySecBench: Generative AI-based CyberSecurity-focused Prompt Dataset for Benchmarking Large Language Models Johan Wahréus et.al. 2501.01335 link
2025-01-02 Decoding Knowledge in Large Language Models: A Framework for Categorization and Comprehension Yanbo Fang et.al. 2501.01332 null
2025-01-02 The Prompt Alchemist: Automated LLM-Tailored Prompt Optimization for Test Case Generation Shuzheng Gao et.al. 2501.01329 null
2024-12-30 Distributed Mixture-of-Agents for Edge Inference with Large Language Models Purbesh Mitra et.al. 2412.21200 link
2024-12-31 HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation Zhaojian Yu et.al. 2412.21199 link
2024-12-30 Facilitating large language model Russian adaptation with Learned Embedding Propagation Mikhail Tikhomirov et.al. 2412.21140 link
2024-12-30 ExpShield: Safeguarding Web Text from Unauthorized Crawling and Language Modeling Exploitation Ruixuan Liu et.al. 2412.21123 null
2024-12-30 Toward Intelligent and Secure Cloud: Large Language Model Empowered Proactive Defense Yuyang Zhou et.al. 2412.21051 link
2024-12-30 TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization Chia-Yu Hung et.al. 2412.21037 link
2024-12-30 GePBench: Evaluating Fundamental Geometric Perception for Multimodal Large Language Models Shangyu Xing et.al. 2412.21036 null
2024-12-30 Automated Robustness Testing for LLM-based NLP Software Mingxuan Xiao et.al. 2412.21016 link
2024-12-30 MapQaTor: A System for Efficient Annotation of Map Query Datasets Mahir Labib Dihan et.al. 2412.21015 link
2024-12-31 Verbosity-Aware Rationale Reduction: Effective Reduction of Redundant Rationale via Principled Criteria Joonwon Jang et.al. 2412.21006 null
2024-12-27 Can AI Help with Your Personal Finances? Oudom Hean et.al. 2412.19784 null
2024-12-27 Machine Learning for Sentiment Analysis of Imported Food in Trinidad and Tobago Cassandra Daniels et.al. 2412.19781 null
2024-12-27 Fortran2CPP: Automating Fortran-to-C++ Migration using LLMs via Multi-Turn Dialogue and Dual-Agent Integration Le Chen et.al. 2412.19770 link
2024-12-27 Can Large Language Models Adapt to Other Agents In-Context? Matthew Riemer et.al. 2412.19726 null
2024-12-27 Text2Insight: Transform natural language text into insights seamlessly using multi-model architecture Pradeep Sain et.al. 2412.19718 null
2024-12-27 Toward Adaptive Reasoning in Large Language Models with Thought Rollback Sijia Chen et.al. 2412.19707 link
2024-12-27 A Large-scale Interpretable Multi-modality Benchmark for Facial Image Forgery Localization Jingchun Lian et.al. 2412.19685 null
2024-12-27 Boosting Private Domain Understanding of Efficient MLLMs: A Tuning-free, Adaptive, Universal Prompt Optimization Framework Jiang Liu et.al. 2412.19684 null
2024-12-27 CAD-GPT: Synthesising CAD Construction Sequence with Spatial Reasoning-Enhanced Multimodal LLMs Siyu Wang et.al. 2412.19663 link
2024-12-27 FreStega: A Plug-and-Play Method for Boosting Imperceptibility and Capacity in Generative Linguistic Steganography for Real-World Scenarios Kaiyi Pang et.al. 2412.19652 null
2024-12-24 Decentralized Intelligence in GameFi: Embodied AI Agents and the Convergence of DeFi and Virtual Ecosystems Fernando Jia et.al. 2412.18601 link
2024-12-24 A Paragraph is All It Takes: Rich Robot Behaviors from Interacting, Trusted LLMs OpenMind et.al. 2412.18588 null
2024-12-24 Exploring Embedding Priors in Prompt-Tuning for Improved Interpretability and Control Sergey Sedov et.al. 2412.18582 null
2024-12-24 Zero-resource Speech Translation and Recognition with LLMs Karel Mundnich et.al. 2412.18566 null
2024-12-24 Distilling Fine-grained Sentiment Understanding from Large Language Models Yice Zhang et.al. 2412.18552 link
2024-12-24 Token-Budget-Aware LLM Reasoning Tingxu Han et.al. 2412.18547 link
2024-12-24 PLD-Tree: Persistent Laplacian Decision Tree for Protein-Protein Binding Free Energy Prediction Xingjian Xu et.al. 2412.18541 null
2024-12-24 Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation Derong Xu et.al. 2412.18537 link
2024-12-24 Automated Code Review In Practice Umut Cihan et.al. 2412.18531 null
2024-12-24 Large Language Model guided Deep Reinforcement Learning for Decision Making in Autonomous Driving Hao Pang et.al. 2412.18511 null
2024-12-23 ChatGarment: Garment Estimation, Generation and Editing via Large Language Models Siyuan Bian et.al. 2412.17811 null
2024-12-23 Cross-Lingual Text-Rich Visual Comprehension: An Information Theory Perspective Xinmiao Yu et.al. 2412.17787 null
2024-12-23 ResearchTown: Simulator of Human Research Community Haofei Yu et.al. 2412.17767 link
2024-12-23 Survey of Large Multimodal Model Datasets, Application Categories and Taxonomy Priyaranjan Pattnayak et.al. 2412.17759 null
2024-12-23 ADC: Enhancing Function Calling Via Adversarial Datasets and Code Line-Level Feedback Wei Zhang et.al. 2412.17754 null
2024-12-23 Deliberation in Latent Space via Differentiable Cache Augmentation Luyang Liu et.al. 2412.17747 null
2024-12-23 YuLan-Mini: An Open Data-efficient Language Model Yiwen Hu et.al. 2412.17743 link
2024-12-23 Reasoning to Attend: Try to Understand How Token Works Rui Qian et.al. 2412.17741 link
2024-12-23 Knowledge Editing through Chain-of-Thought Changyue Wang et.al. 2412.17727 link
2024-12-23 Understanding the Logic of Direct Preference Alignment through Logic Kyle Richardson et.al. 2412.17696 null
2024-12-20 HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding Chenxin Tao et.al. 2412.16158 null
2024-12-20 Offline Reinforcement Learning for LLM Multi-Step Reasoning Huaijie Wang et.al. 2412.16145 link
2024-12-20 Can LLMs Obfuscate Code? A Systematic Analysis of Large Language Models into Assembly Code Obfuscation Seyedreza Mohseni et.al. 2412.16135 link
2024-12-20 Data-Driven Mechanism Design: Jointly Eliciting Preferences and Information Dirk Bergemann et.al. 2412.16132 null
2024-12-20 PromptOptMe: Error-Aware Prompt Compression for LLM-based MT Evaluation Metrics Daniil Larionov et.al. 2412.16120 null
2024-12-20 Deciphering the Underserved: Benchmarking LLM OCR for Low-Resource Scripts Muhammad Abdullah Sohail et.al. 2412.16119 link
2024-12-20 PruneVid: Visual Token Pruning for Efficient Video Large Language Models Xiaohu Huang et.al. 2412.16117 link
2024-12-20 The Content Moderator’s Dilemma: Removal of Toxic Content and Distortions to Online Discourse Mahyar Habibi et.al. 2412.16114 null
2024-12-20 Logical Consistency of Large Language Models in Fact-checking Bishwamittra Ghosh et.al. 2412.16100 null
2024-12-20 The Evolution of LLM Adoption in Industry Data Curation Practices Crystal Qian et.al. 2412.16089 null
2024-12-19 UIP2P: Unsupervised Instruction-based Image Editing via Cycle Edit Consistency Enis Simsar et.al. 2412.15216 null
2024-12-19 Flowing from Words to Pixels: A Framework for Cross-Modality Evolution Qihao Liu et.al. 2412.15213 null
2024-12-19 OpenEMMA: Open-Source Multimodal Model for End-to-End Autonomous Driving Shuo Xing et.al. 2412.15208 link
2024-12-19 AutoTrust: Benchmarking Trustworthiness in Large Vision Language Models for Autonomous Driving Shuo Xing et.al. 2412.15206 link
2024-12-19 MMLU-CF: A Contamination-free Multi-task Language Understanding Benchmark Qihao Zhao et.al. 2412.15194 link
2024-12-19 LlamaFusion: Adapting Pretrained Language Models for Multimodal Generation Weijia Shi et.al. 2412.15188 null
2024-12-19 Data for Mathematical Copilots: Better Ways of Presenting Proofs for Machine Learning Simon Frieder et.al. 2412.15184 null
2024-12-19 HPC-Coder-V2: Studying Code LLMs Across Low-Resource Parallel Languages Aman Chaturvedi et.al. 2412.15178 null
2024-12-19 Critical-Questions-of-Thought: Steering LLM reasoning with Argumentative Querying Federico Castagna et.al. 2412.15177 link
2024-12-19 Rethinking Uncertainty Estimation in Natural Language Generation Lukas Aichberger et.al. 2412.15176 null
2024-12-18 Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces Jihan Yang et.al. 2412.14171 link
2024-12-18 TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks Frank F. Xu et.al. 2412.14161 link
2024-12-18 Advanced Reasoning and Transformation Engine for Multi-Step Insight Synthesis in Data Analytics with Large Language Models Atin Sakkeer Hussain et.al. 2412.14146 null
2024-12-18 LLMs can realize combinatorial creativity: generating creative ideas via LLMs for scientific research Tianyang Gu et.al. 2412.14141 null
2024-12-18 Design choices made by LLM-based test generators prevent them from finding bugs Noble Saji Mathews et.al. 2412.14137 null
2024-12-18 Adversarial Hubness in Multi-Modal Retrieval Tingwei Zhang et.al. 2412.14113 link
2024-12-18 Alignment faking in large language models Ryan Greenblatt et.al. 2412.14093 link
2024-12-18 Future Research Avenues for Artificial Intelligence in Digital Gaming: An Exploratory Report Markus Dablander et.al. 2412.14085 null
2024-12-18 Rango: Adaptive Retrieval-Augmented Proving for Automated Software Verification Kyle Thompson et.al. 2412.14063 null
2024-12-18 Understanding and Evaluating Trust in Generative AI and Large Language Models for Spreadsheets Simon Thorne et.al. 2412.14062 null
2024-12-17 SafeAgentBench: A Benchmark for Safe Task Planning of Embodied LLM Agents Sheng Yin et.al. 2412.13178 link
2024-12-17 DnDScore: Decontextualization and Decomposition for Factuality Verification in Long-Form Text Generation Miriam Wanner et.al. 2412.13175 null
2024-12-17 Algorithmic Fidelity of Large Language Models in Generating Synthetic German Public Opinions: A Case Study Bolei Ma et.al. 2412.13169 link
2024-12-17 C-FedRAG: A Confidential Federated Retrieval-Augmented Generation System Parker Addison et.al. 2412.13163 null
2024-12-17 BanglishRev: A Large-Scale Bangla-English and Code-mixed Dataset of Product Reviews in E-Commerce Mohammad Nazmush Shamael et.al. 2412.13161 null
2024-12-17 SWAN: Preprocessing SGD Enables Adam-Level Performance On LLM Training With Significant Memory Reduction Chao Ma et.al. 2412.13148 null
2024-12-17 Are Your LLMs Capable of Stable Reasoning? Junnan Liu et.al. 2412.13147 link
2024-12-17 AI PERSONA: Towards Life-long Personalization of LLMs Tiannan Wang et.al. 2412.13103 null
2024-12-17 AIR-Bench: Automated Heterogeneous Information Retrieval Benchmark Jianlyu Chen et.al. 2412.13102 link
2024-12-17 Modality-Inconsistent Continual Learning of Multimodal Large Language Models Weiguo Pian et.al. 2412.13050 null
2024-12-16 SepLLM: Accelerate Large Language Models by Compressing One Segment into One Separator Guoxuan Chen et.al. 2412.12094 link
2024-12-16 Instruction-based Image Manipulation by Watching How Things Move Mingdeng Cao et.al. 2412.12087 null
2024-12-16 CPath-Omni: A Unified Multimodal Foundation Model for Patch and Whole Slide Image Analysis in Computational Pathology Yuxuan Sun et.al. 2412.12077 null
2024-12-16 CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding Guo Chen et.al. 2412.12075 null
2024-12-16 Making FETCH! Happen: Finding Emergent Dog Whistles Through Common Habitats Kuleen Sasse et.al. 2412.12072 link
2024-12-16 How Private are Language Models in Abstractive Summarization? Anthony Hughes et.al. 2412.12040 null
2024-12-16 Can LLM Prompting Serve as a Proxy for Static Analysis in Vulnerability Detection Ira Ceka et.al. 2412.12039 null
2024-12-16 SpeechPrune: Context-aware Token Pruning for Speech Information Retrieval Yueqian Lin et.al. 2412.12009 null
2024-12-16 Agentic AI-Driven Technical Troubleshooting for Enterprise Systems: A Novel Weighted Retrieval-Augmented Generation Paradigm Rajat Khanda et.al. 2412.12006 null
2024-12-16 The Open Source Advantage in Large Language Models (LLMs) Jiya Manchanda et.al. 2412.12004 null
2024-12-13 UniMed-CLIP: Towards a Unified Image-Text Pretraining Paradigm for Diverse Medical Imaging Modalities Muhammad Uzair Khattak et.al. 2412.10372 link
2024-12-13 Robust image classification with multi-modal large language models Francesco Villani et.al. 2412.10353 null
2024-12-13 COMET: Benchmark for Comprehensive Biological Multi-omics Evaluation Tasks and Language Models Yuchen Ren et.al. 2412.10347 null
2024-12-13 Iris: Breaking GUI Complexity with Adaptive Focus and Self-Refining Zhiqi Ge et.al. 2412.10342 null
2024-12-13 AdvPrefix: An Objective for Nuanced LLM Jailbreaks Sicheng Zhu et.al. 2412.10321 null
2024-12-13 BrushEdit: All-In-One Image Inpainting and Editing Yaowei Li et.al. 2412.10316 link
2024-12-13 DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding Zhiyu Wu et.al. 2412.10302 link
2024-12-13 Buzz to Broadcast: Predicting Sports Viewership Using Social Media Engagement Anakin Trotter et.al. 2412.10298 link
2024-12-13 Still “Talking About Large Language Models”: Some Clarifications Murray Shanahan et.al. 2412.10291 null
2024-12-13 One world, one opinion? The superstar effect in LLM responses Sofie Goethals et.al. 2412.10281 null
2024-12-12 Doe-1: Closed-Loop Autonomous Driving with Large World Model Wenzhao Zheng et.al. 2412.09627 link
2024-12-12 EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM Zhuofan Zong et.al. 2412.09618 null
2024-12-12 Olympus: A Universal Task Router for Computer Vision Tasks Yuanze Lin et.al. 2412.09612 link
2024-12-12 SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding Hao Li et.al. 2412.09604 null
2024-12-12 Do Multimodal Large Language Models See Like Humans? Jiaying Lin et.al. 2412.09603 null
2024-12-12 InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions Pan Zhang et.al. 2412.09596 link
2024-12-12 OpenNER 1.0: Standardized Open-Access Named Entity Recognition Datasets in 50+ Languages Chester Palen-Michel et.al. 2412.09587 null
2024-12-12 DISHONEST: Dissecting misInformation Spread using Homogeneous sOcial NEtworks and Semantic Topic classification Caleb Stam et.al. 2412.09578 null
2024-12-12 DiverseAgentEntropy: Quantifying Black-Box LLM Uncertainty through Diverse Perspectives and Multi-Agent Interaction Yu Feng et.al. 2412.09572 null
2024-12-12 Does Representation Matter? Exploring Intermediate Layers in Large Language Models Oscar Skean et.al. 2412.09563 null
2024-12-11 Generative Semantic Communication: Architectures, Technologies, and Applications Jinke Ren et.al. 2412.08642 null
2024-12-11 Fast Prompt Alignment for Text-to-Image Generation Khalil Mrini et.al. 2412.08639 link
2024-12-11 Multimodal Latent Language Modeling with Next-Token Diffusion Yutao Sun et.al. 2412.08635 null
2024-12-11 Synthetic Vision: Training Vision-Language Models to Understand Physics Vahid Balazadeh et.al. 2412.08619 null
2024-12-11 Image Retrieval Methods in the Dissimilarity Space Madhu Kiran et.al. 2412.08618 null
2024-12-11 Exploiting the Index Gradients for Optimization-Based Jailbreaking on Large Language Models Jiahui Li et.al. 2412.08615 link
2024-12-11 Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning Fan Lu et.al. 2412.08614 link
2024-12-11 Preference Discerning with LLM-Enhanced Generative Retrieval Fabian Paischer et.al. 2412.08604 null
2024-12-11 Empirical Measurements of AI Training Power Demand on a GPU-Accelerated Node Imran Latif et.al. 2412.08602 null
2024-12-11 Leveraging Graph-RAG and Prompt Engineering to Enhance LLM-Based Automated Requirement Traceability and Compliance Checks Arsalan Masoudifard et.al. 2412.08593 null
2024-12-10 BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities Sahal Shaji Mullappilly et.al. 2412.07769 null
2024-12-10 Bayesian Optimization of Antibodies Informed by a Generative Model of Evolving Sequences Alan Nawzad Amin et.al. 2412.07763 link
2024-12-10 Zero-Shot ATC Coding with Large Language Models for Clinical Assessments Zijian Chen et.al. 2412.07743 null
2024-12-10 Image Retrieval with Intra-Sweep Representation Learning for Neck Ultrasound Scanning Guidance Wanwen Chen et.al. 2412.07741 null
2024-12-10 Granite Guardian Inkit Padhi et.al. 2412.07724 link
2024-12-10 DriveMM: All-in-One Large Multimodal Model for Autonomous Driving Zhijian Huang et.al. 2412.07689 link
2024-12-10 Privacy-Preserving Customer Support: A Framework for Secure and Scalable Interactions Anant Prakash Awasthi et.al. 2412.07687 null
2024-12-10 TRIM: Token Reduction and Inference Modeling for Cost-Effective Language Generation Alfredo Garrachón Ruiz et.al. 2412.07682 null
2024-12-10 Ask Humans or AI? Exploring Their Roles in Visualization Troubleshooting Shuyu Shen et.al. 2412.07673 null
2024-12-10 FlexLLM: Exploring LLM Customization for Moving Target Defense on Black-Box LLMs Against Jailbreak Attacks Bocheng Chen et.al. 2412.07672 null
2024-12-09 Training Large Language Models to Reason in a Continuous Latent Space Shibo Hao et.al. 2412.06769 null
2024-12-09 Why Do Developers Engage with ChatGPT in Issue-Tracker? Investigating Usage and Reliance on ChatGPT-Generated Code Joy Krishan Das et.al. 2412.06757 null
2024-12-09 Refusal Tokens: A Simple Way to Calibrate Refusals in Large Language Models Neel Jain et.al. 2412.06748 null
2024-12-09 JAPAGEN: Efficient Few/Zero-shot Learning via Japanese Training Dataset Generation with LLM Takuro Fujii et.al. 2412.06738 null
2024-12-09 AutoDCWorkflow: LLM-based Data Cleaning Workflow Auto-Generation and Benchmark Lan Li et.al. 2412.06724 null
2024-12-09 DEEPER: Dense Electroencephalography Passage Retrieval Niall McGuire et.al. 2412.06695 null
2024-12-09 OmniEvalKit: A Modular, Lightweight Toolbox for Evaluating Large Language Model and its Omni-Extensions Yi-Kai Zhang et.al. 2412.06693 null
2024-12-09 Exploring Critical Testing Scenarios for Decision-Making Policies: An LLM Approach Weichao Xu et.al. 2412.06684 null
2024-12-09 Toward LLM-Agent-Based Modeling of Transportation Systems: A Conceptual Framework Tianming Liu et.al. 2412.06681 null
2024-12-09 I Don’t Know: Explicit Modeling of Uncertainty with an [IDK] Token Roi Cohen et.al. 2412.06676 null
2024-12-06 Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling Zhe Chen et.al. 2412.05271 null
2024-12-06 APOLLO: SGD-like Memory, AdamW-level Performance Hanqing Zhu et.al. 2412.05270 link
2024-12-06 CompCap: Improving Multimodal Large Language Models with Composite Captions Xiaohui Chen et.al. 2412.05243 null
2024-12-06 MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale Jarvis Guo et.al. 2412.05237 link
2024-12-06 BEExformer: A Fast Inferencing Transformer Architecture via Binarization with Multiple Early Exits Wazib Ansar et.al. 2412.05225 null
2024-12-06 100% Hallucination Elimination Using Acurai Michael C. Wood et.al. 2412.05223 null
2024-12-06 Evaluating and Aligning CodeLLMs on Human Preference Jian Yang et.al. 2412.05210 link
2024-12-06 A Survey of Large Language Model-Based Generative AI for Text-to-SQL: Benchmarks, Applications, Use Cases, and Challenges Aditi Singh et.al. 2412.05208 null
2024-12-06 Are Frontier Large Language Models Suitable for Q&A in Science Centres? Jacob Watson et.al. 2412.05200 null
2024-12-06 SurgBox: Agent-Driven Operating Room Sandbox with Surgery Copilot Jinlin Wu et.al. 2412.05187 link
2024-12-05 p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay Jun Zhang et.al. 2412.04449 link
2024-12-05 EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios Lu Qiu et.al. 2412.04447 null
2024-12-05 Moto: Latent Motion Token as the Bridging Language for Robot Manipulation Yi Chen et.al. 2412.04445 link
2024-12-05 Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation Yuying Ge et.al. 2412.04432 link
2024-12-05 Grounding Descriptions in Images informs Zero-Shot Visual Recognition Shaunak Halbe et.al. 2412.04429 link
2024-12-05 Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion Jiuhai Chen et.al. 2412.04424 link
2024-12-05 Targeting the Core: A Simple and Effective Method to Attack RAG-based Agents via Direct LLM Manipulation Xuying Li et.al. 2412.04415 null
2024-12-05 Retrieval-Augmented Machine Translation with Unstructured Knowledge Jiaan Wang et.al. 2412.04342 link
2024-12-05 Liquid: Language Models are Scalable Multi-modal Generators Junfeng Wu et.al. 2412.04332 link
2024-12-05 The Hyperfitting Phenomenon: Sharpening and Stabilizing LLMs for Open-Ended Text Generation Fredrik Carlsson et.al. 2412.04318 null
2024-12-04 From Individual to Society: A Survey on Social Simulation Driven by Large Language Model-based Agents Xinyi Mou et.al. 2412.03563 link
2024-12-04 SPICE: Smart Projection Interface for Cooking Enhancement Vera Prohaska et.al. 2412.03551 null
2024-12-04 Evaluating Gender Bias Transfer between Pre-trained and Prompt-Adapted Language Models Natalie Mackraz et.al. 2412.03537 null
2024-12-04 A Review on Scientific Knowledge Extraction using Large Language Models in Biomedical Sciences Gabriel Lino Garcia et.al. 2412.03531 null
2024-12-04 FANAL – Financial Activity News Alerting Language Modeling Framework Urjitkumar Patel et.al. 2412.03527 null
2024-12-04 You’re (Not) My Type – Can LLMs Generate Feedback of Specific Types for Introductory Programming Tasks? Dominic Lohr et.al. 2412.03516 null
2024-12-04 Flow Matching with General Discrete Paths: A Kinetic-Optimal Perspective Neta Shaul et.al. 2412.03487 null
2024-12-04 Training-Free Mitigation of Language Reasoning Degradation After Multimodal Instruction Tuning Neale Ratzlaff et.al. 2412.03467 null
2024-12-04 From Words to Workflows: Automating Business Processes Laura Minkova et.al. 2412.03446 null
2024-12-04 RedStone: Curating General, Code, Math, and QA Data for Large Language Models Yaoyao Chang et.al. 2412.03398 null
2024-12-03 T-REG: Preference Optimization with Token-Level Reward Regularization Wenxuan Zhou et.al. 2412.02685 link
2024-12-03 Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models Yuda Song et.al. 2412.02674 null
2024-12-03 LLM-Enhanced Path Planning: Safe and Efficient Autonomous Navigation with Instructional Inputs Pranav Doma et.al. 2412.02655 null
2024-12-03 Time-Reversal Provides Unsupervised Feedback to LLMs Yerram Varun et.al. 2412.02626 null
2024-12-03 Improving Dynamic Object Interactions in Text-to-Video Generation with AI Feedback Hiroki Furuta et.al. 2412.02617 null
2024-12-03 AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information? Kaixiong Gong et.al. 2412.02611 link
2024-12-03 Interpretable Company Similarity with Sparse Autoencoders Marco Molinari et.al. 2412.02605 null
2024-12-03 CEGI: Measuring the trade-off between efficiency and carbon emissions for SLMs and VLMs Abhas Kumar et.al. 2412.02602 null
2024-12-03 PrefixLLM: LLM-aided Prefix Circuit Design Weihua Xiao et.al. 2412.02594 null
2024-12-03 OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation Junyuan Zhang et.al. 2412.02592 link
2024-12-02 T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs Shukang Yin et.al. 2411.19951 link
2024-12-02 Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM’s Reasoning Capability Zicheng Lin et.al. 2411.19943 link
2024-11-29 VLSBench: Unveiling Visual Leakage in Multimodal Safety Xuhao Hu et.al. 2411.19939 link
2024-11-29 On Domain-Specific Post-Training for Multimodal Large Language Models Daixuan Cheng et.al. 2411.19930 link
2024-11-29 SIMS: Simulating Human-Scene Interactions with Real World Script Planning Wenjia Wang et.al. 2411.19921 null
2024-11-29 PDDLFuse: A Tool for Generating Diverse Planning Domains Vedant Khandelwal et.al. 2411.19886 null
2024-12-02 LUMIA: Linear probing for Unimodal and MultiModal Membership Inference Attacks leveraging internal LLM states Luis Ibanez-Lissen et.al. 2411.19876 null
2024-11-29 AIDetx: a compression-based method for identification of machine-learning generated text Leonardo Almeida et.al. 2411.19869 link
2024-11-29 Reverse Thinking Makes LLMs Stronger Reasoners Justin Chih-Yao Chen et.al. 2411.19865 null
2024-11-29 Cross-Domain Recommendation Meets Large Language Models Ajay Krishna Vajjala et.al. 2411.19862 link
2024-11-27 Cross-modal Information Flow in Multimodal Large Language Models Zhi Zhang et.al. 2411.18620 link
2024-11-27 Automated Literature Review Using NLP Techniques and LLM-Based Retrieval-Augmented Generation Nurshat Fateh Ali et.al. 2411.18583 null
2024-11-27 Challenges in Adapting Multilingual LLMs to Low-Resource Languages using LoRA PEFT Tuning Omkar Khade et.al. 2411.18571 null
2024-11-27 A Pipeline of Neural-Symbolic Integration to Enhance Spatial Reasoning in Large Language Models Rong Wang et.al. 2411.18564 null
2024-11-27 DexDiffuser: Interaction-aware Diffusion Planning for Adaptive Dexterous Manipulation Zhixuan Liang et.al. 2411.18562 null
2024-11-27 Retrofitting (Large) Language Models with Dynamic Tokenization Darius Feher et.al. 2411.18553 null
2024-11-27 Emergence of Self-Identity in AI: A Mathematical Framework and Empirical Study with Generative Large Language Models Minhyeok Lee et.al. 2411.18530 link
2024-11-27 LLM-ABBA: Understand time series via symbolic approximation Erin Carson et.al. 2411.18506 null
2024-11-27 GATE OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation Pengfei Zhou et.al. 2411.18499 link
2024-11-27 Beyond Examples: High-level Automated Reasoning Paradigm in In-Context Learning via MCTS Jinyang Wu et.al. 2411.18478 link
2024-11-26 Adaptive Deployment of Untrusted LLMs Reduces Distributed Threats Jiaxin Wen et.al. 2411.17693 null
2024-11-26 Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens Xu Ouyang et.al. 2411.17691 null
2024-11-26 Rethinking Token Reduction in MLLMs: Towards a Unified Paradigm for Training-Free Acceleration Yuhang Han et.al. 2411.17686 link
2024-11-26 Enhancing Character-Level Understanding in LLMs through Token Internal Structure Learning Zhu Xu et.al. 2411.17679 link
2024-11-26 Push the Limit of Multi-modal Emotion Recognition by Prompting LLMs with Receptive-Field-Aware Attention Weighting Liyun Zhang et.al. 2411.17674 null
2024-11-26 SketchAgent: Language-Driven Sequential Sketch Generation Yael Vinker et.al. 2411.17673 link
2024-11-26 Synthetic Data Generation with LLM for Improved Depression Prediction Andrea Kang et.al. 2411.17672 null
2024-11-26 BERT or FastText? A Comparative Analysis of Contextual as well as Non-Contextual Embeddings Abhay Shanbhag et.al. 2411.17661 null
2024-11-26 Toward High-Performance LLM Serving: A Simulation-Based Approach for Identifying Optimal Parallelism Yi-Chien Lin et.al. 2411.17651 link
2024-11-26 On Limitations of LLM as Annotator for Low Resource Languages Suramya Jadhav et.al. 2411.17637 null
2024-11-25 Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts? Sohee Yang et.al. 2411.16679 null
2024-11-25 DreamRunner: Fine-Grained Storytelling Video Generation with Retrieval-Augmented Motion Adaptation Zun Wang et.al. 2411.16657 null
2024-11-25 Self-Generated Critiques Boost Reward Modeling for Language Models Yue Yu et.al. 2411.16646 null
2024-11-25 Preventing Jailbreak Prompts as Malicious Tools for Cybercriminals: A Cyber Defense Perspective Jean Marie Tshimula et.al. 2411.16642 null
2024-11-25 Chat2SVG: Vector Graphics Generation with Large Language Models and Image Diffusion Models Ronghuan Wu et.al. 2411.16602 null
2024-11-25 From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge Dawei Li et.al. 2411.16594 link
2024-11-25 Large Language Model-based Decision-making for COLREGs and the Control of Autonomous Surface Vehicles Klinsmann Agyei et.al. 2411.16587 null
2024-11-25 MarketGPT: Developing a Pre-trained transformer (GPT) for Modeling Financial Time Series Aaron Wheeler et.al. 2411.16585 null
2024-11-25 Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision Zhiheng Xi et.al. 2411.16579 null
2024-11-25 Predictive Power of LLMs in Financial Markets Jerick Shi et.al. 2411.16569 null
2024-11-22 Measuring Bullshit in the Language Games played by ChatGPT Alessandro Trevisan et.al. 2411.15129 null
2024-11-22 AttriBoT: A Bag of Tricks for Efficiently Approximating Leave-One-Out Context Attribution Fengyuan Liu et.al. 2411.15102 link
2024-11-22 XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models Yixin Dong et.al. 2411.15100 link
2024-11-22 Locating the Leading Edge of Cultural Change Sarah Griebel et.al. 2411.15068 link
2024-11-22 mR$^2$AG: Multimodal Retrieval-Reflection-Augmented Generation for Knowledge-Based VQA Tao Zhang et.al. 2411.15041 null
2024-11-22 One to rule them all: natural language to bind communication, perception and action Simone Colombani et.al. 2411.15033 null
2024-11-22 Time is on my sight: scene graph filtering for dynamic environment perception in an LLM-driven robot Simone Colombani et.al. 2411.15027 null
2024-11-22 DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models Keda Tao et.al. 2411.15024 link
2024-11-22 FTA generation using GenAI with an Autonomy sensor Usecase Sneha Sudhir Shetiya et.al. 2411.15007 null
2024-11-22 ScribeAgent: Towards Specialized Web Agents Using Production-Scale Workflow Data Junhong Shen et.al. 2411.15004 link
2024-11-21 Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models Yuhao Dong et.al. 2411.14432 link
2024-11-21 Beyond Training: Dynamic Token Merging for Zero-Shot Video Understanding Yiming Zhang et.al. 2411.14401 null
2024-11-21 Lightweight Safety Guardrails Using Fine-tuned BERT Embeddings Aaron Zheng et.al. 2411.14398 null
2024-11-21 UnifiedCrawl: Aggregated Common Crawl for Affordable Adaptation of LLMs on Low-Resource Languages Bethel Melesse Tessema et.al. 2411.14343 link
2024-11-21 Velocitune: A Velocity-based Dynamic Domain Reweighting Method for Continual Pre-training Zheheng Luo et.al. 2411.14318 null
2024-11-21 Automated Generation of Code Debugging Exercises Victor-Alexandru Pădurean et.al. 2411.14303 null
2024-11-21 Auto-SPICE: Leveraging LLMs for Dataset Creation via Automated SPICE Netlist Extraction from Analog Circuit Diagrams Jitendra Bhandari et.al. 2411.14299 null
2024-11-21 Efficient Aspect-Based Summarization of Climate Change Reports with Small Language Models Iacopo Ghinassi et.al. 2411.14272 link
2024-11-21 Knowledge Graphs, Large Language Models, and Hallucinations: An NLP Perspective Ernests Lavrinovics et.al. 2411.14258 null
2024-11-21 Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models Javier Ferrando et.al. 2411.14257 null
2024-11-20 SpecTool: A Benchmark for Characterizing Errors in Tool-Use LLMs Shirley Kokane et.al. 2411.13547 null
2024-11-20 BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games Davide Paglieri et.al. 2411.13543 link
2024-11-20 Metacognition for Unknown Situations and Environments (MUSE) Rodolfo Valiente et.al. 2411.13537 null
2024-11-20 Advancing Complex Medical Communication in Arabic with Sporo AraSum: Surpassing Existing Large Language Models Chanseo Lee et.al. 2411.13518 null
2024-11-20 Disentangling Memory and Reasoning Ability in Large Language Models Mingyu Jin et.al. 2411.13504 link
2024-11-20 Utilizing Large Language Models to Synthesize Product Desirability Datasets John D. Hastings et.al. 2411.13485 null
2024-11-20 PatentEdits: Framing Patent Novelty as Textual Entailment Ryan Lee et.al. 2411.13477 null
2024-11-20 When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training Haonan Wang et.al. 2411.13476 link
2024-11-20 SoK: A Systems Perspective on Compound AI Threats and Countermeasures Sarbartha Banerjee et.al. 2411.13459 null
2024-11-20 AdaptAgent: Adapting Multimodal Web Agents with Few-Shot Learning from Human Demonstrations Gaurav Verma et.al. 2411.13451 null
2024-11-19 ACING: Actor-Critic for Instruction Learning in Black-Box Large Language Models Salma Kharrat et.al. 2411.12736 link
2024-11-19 Information Theory of Meaningful Communication Doron Sivan et.al. 2411.12728 null
2024-11-19 CATCH: Complementary Adaptive Token-level Contrastive Decoding to Mitigate Hallucinations in LVLMs Zhehan Kan et.al. 2411.12713 null
2024-11-19 Strengthening Fake News Detection: Leveraging SVM and Sophisticated Text Vectorization Techniques. Defying BERT? Ahmed Akib Jawad Karim et.al. 2411.12703 null
2024-11-19 When Backdoors Speak: Understanding LLM Backdoor Attacks Through Model-Generated Explanations Huaizhi Ge et.al. 2411.12701 null
2024-11-19 SparseInfer: Training-free Prediction of Activation Sparsity for Fast LLM Inference Jiho Shin et.al. 2411.12692 null
2024-11-19 Neurosymbolic Graph Enrichment for Grounded World Models Stefano De Giorgis et.al. 2411.12671 null
2024-11-19 DLBacktrace: A Model Agnostic Explainability for any Deep Learning Models Vinay Kumar Sankarapu et.al. 2411.12643 link
2024-11-19 Improving Controllability and Editability for Pretrained Text-to-Music Generation Models Yixiao Zhang et.al. 2411.12641 null
2024-11-19 AdaCM$^2$: On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction Yuanbin Man et.al. 2411.12593 null
2024-11-18 Bi-Mamba: Towards Accurate 1-Bit State Space Models Shengkun Tang et.al. 2411.11843 null
2024-11-18 Tackling prediction tasks in relational databases with LLMs Marek Wydmuch et.al. 2411.11829 null
2024-11-18 Exploring adversarial robustness of JPEG AI: methodology, comparison and new methods Egor Kovalev et.al. 2411.11795 null
2024-11-18 LLM-IE: A Python Package for Generative Information Extraction with Large Language Models Enshuo Hsu et.al. 2411.11779 null
2024-11-18 The Power of Many: Multi-Agent Multimodal Models for Cultural Image Captioning Longju Bai et.al. 2411.11758 link
2024-11-18 sMoRe: Enhancing Object Manipulation and Organization in Mixed Reality Spaces with LLMs and Generative AI Yunhao Xing et.al. 2411.11752 null
2024-11-18 BitMoD: Bit-serial Mixture-of-Datatype LLM Acceleration Yuzong Chen et.al. 2411.11745 link
2024-11-18 Moral Persuasion in Large Language Models: Evaluating Susceptibility and Ethical Alignment Allison Huang et.al. 2411.11731 null
2024-11-18 Semantic-Geometric-Physical-Driven Robot Manipulation Skill Transfer via Skill Library and Tactile Representation Mingchao Qi et.al. 2411.11714 link
2024-11-18 FedCoLLM: A Parameter-Efficient Federated Co-tuning Framework for Large and Small Language Models Tao Fan et.al. 2411.11707 null
2024-11-15 Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization Weiyun Wang et.al. 2411.10442 link
2024-11-15 LLaVA-o1: Let Vision Language Models Reason Step-by-Step Guowei Xu et.al. 2411.10440 link
2024-11-15 MARS: Unleashing the Power of Variance Reduction for Training Large Models Huizhuo Yuan et.al. 2411.10438 link
2024-11-15 Mitigating Hallucination in Multimodal Large Language Model via Hallucination-targeted Direct Preference Optimization Yuhan Fu et.al. 2411.10436 null
2024-11-15 Evaluating Creativity and Deception in Large Language Models: A Simulation Framework for Multi-Agent Balderdash Parsa Hejabi et.al. 2411.10422 link
2024-11-15 Interactive Cycle Model – The Linkage Combination among Automatic Speech Recognition, Large Language Models and Smart Glasses Libo Wang et.al. 2411.10362 null
2024-11-15 Bias Unveiled: Investigating Social Bias in LLM-Generated Code Lin Ling et.al. 2411.10351 null
2024-11-15 On the Cost of Model-Serving Frameworks: An Experimental Evaluation Pasquale De Rosa et.al. 2411.10337 null
2024-11-15 Number it: Temporal Grounding Videos like Flipping Manga Yongliang Wu et.al. 2411.10332 link
2024-11-15 Modification Takes Courage: Seamless Image Stitching via Reference-Driven Inpainting Ziqi Xie et.al. 2411.10309 link
2024-11-14 MagicQuill: An Intelligent Interactive Image Editing System Zichen Liu et.al. 2411.09703 link
2024-11-14 Advancing Fine-Grained Visual Understanding with Multi-Scale Alignment in Multi-Modal Models Wei Wang et.al. 2411.09691 null
2024-11-14 Squeezed Attention: Accelerating Long Context Length LLM Inference Coleman Hooper et.al. 2411.09688 link
2024-11-14 Towards a Classification of Open-Source ML Models and Datasets for Software Engineering Alexandra González et.al. 2411.09683 null
2024-11-14 Med-Bot: An AI-Powered Assistant to Provide Accurate and Reliable Medical Information Ahan Bhatt et.al. 2411.09648 null
2024-11-14 Local deployment of large-scale music AI models on commodity hardware Xun Zhou et.al. 2411.09625 null
2024-11-14 PTR: Precision-Driven Tool Recommendation for Large Language Models Hang Gao et.al. 2411.09613 null
2024-11-14 The Moral Foundations Weibo Corpus Renjie Cao et.al. 2411.09612 null
2024-11-14 Initial Nugget Evaluation Results for the TREC 2024 RAG Track with the AutoNuggetizer Framework Ronak Pradeep et.al. 2411.09607 null
2024-11-14 Accelerating Knowledge Graph and Ontology Engineering with Large Language Models Cogan Shimizu et.al. 2411.09601 null
2024-11-13 The Limited Impact of Medical Adaptation of Large Language and Vision-Language Models Daniel P. Jeong et.al. 2411.08870 null
2024-11-13 LLMStinger: Jailbreaking LLMs using RL fine-tuned LLMs Piyush Jha et.al. 2411.08862 null
2024-11-13 Multimodal Instruction Tuning with Hybrid State Space Models Jianing Zhou et.al. 2411.08840 null
2024-11-13 FinRobot: AI Agent for Equity Research and Valuation with Large Language Models Tianyu Zhou et.al. 2411.08804 link
2024-11-13 Evaluating World Models with LLM for Decision Making Chang Yang et.al. 2411.08794 null
2024-11-13 Can sparse autoencoders be used to decompose and interpret steering vectors? Harry Mayne et.al. 2411.08790 link
2024-11-13 Separating Tongue from Thought: Activation Patching Reveals Language-Agnostic Concept Representations in Transformers Clément Dumas et.al. 2411.08745 link
2024-11-13 A Comparative Study of Discrete Speech Tokens for Semantic-Related Tasks with Large Language Models Dingdong Wang et.al. 2411.08742 null
2024-11-13 Dynamic Rewarding with Prompt Optimization Enables Tuning-free Self-Alignment of Language Models Somanshu Singla et.al. 2411.08733 link
2024-11-13 Polymetis: Large Language Modeling for Multiple Material Domains Chao Huang et.al. 2411.08728 null
2024-11-12 Learning with Less: Knowledge Distillation from Large Language Models via Unlabeled Data Juanhui Li et.al. 2411.08028 null
2024-11-12 LLMPhy: Complex Physical Reasoning Using Large Language Models and World Models Anoop Cherian et.al. 2411.08027 null
2024-11-12 Language Models as Causal Effect Generators Lucius E. J. Bynum et.al. 2411.08019 link
2024-11-12 ExpressivityArena: Can LLMs Express Information Implicitly? Joshua Tint et.al. 2411.08010 null
2024-11-12 Can adversarial attacks by large language models be attributed? Manuel Cebrian et.al. 2411.08003 null
2024-11-12 Derivational Morphology Reveals Analogical Generalization in Large Language Models Valentin Hofmann et.al. 2411.07990 null
2024-11-12 JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation Yiyang Ma et.al. 2411.07975 link
2024-11-12 From General to Specific: Utilizing General Hallucination to Automatically Measure the Role Relationship Fidelity for Specific Role-Play Agents Chuyi Kong et.al. 2411.07965 null
2024-11-12 Towards Low-bit Communication for Tensor Parallel LLM Inference Harry Dong et.al. 2411.07942 null
2024-11-12 Leveraging Multimodal Models for Enhanced Neuroimaging Diagnostics in Alzheimer’s Disease Francesco Chiumento et.al. 2411.07871 null
2024-11-11 UTMath: Math Evaluation with Unit Test via Reasoning-to-Coding Thoughts Bo Yang et.al. 2411.07240 link
2024-11-11 OpenThaiGPT 1.5: A Thai-Centric Open Source Large Language Model Sumeth Yuenyong et.al. 2411.07238 null
2024-11-11 Tooling or Not Tooling? The Impact of Tools on Language Agents for Chemistry Problem Solving Botao Yu et.al. 2411.07228 null
2024-11-11 Comparing Bottom-Up and Top-Down Steering Approaches on In-Context Learning Tasks Madeline Brumley et.al. 2411.07213 null
2024-11-11 DLCR: A Generative Data Expansion Framework via Diffusion for Clothes-Changing Person Re-ID Nyle Siddiqui et.al. 2411.07205 link
2024-11-11 The Super Weight in Large Language Models Mengxia Yu et.al. 2411.07191 link
2024-11-11 NatureLM-audio: an Audio-Language Foundation Model for Bioacoustics David Robinson et.al. 2411.07186 null
2024-11-11 Gradual Fine-Tuning with Graph Routing for Multi-Source Unsupervised Domain Adaptation Yao Ma et.al. 2411.07185 null
2024-11-11 Continual Memorization of Factoids in Large Language Models Howard Chen et.al. 2411.07175 link
2024-11-11 A Domain-Agnostic Neurosymbolic Approach for Big Social Data Analysis: Evaluating Mental Health Sentiment on Social Media during COVID-19 Vedant Khandelwal et.al. 2411.07163 null
2024-11-08 Recycled Attention: Efficient inference for long-context language models Fangyuan Xu et.al. 2411.05787 link
2024-11-08 Fact or Fiction? Can LLMs be Reliable Annotators for Political Truths? Veronica Chatrath et.al. 2411.05775 null
2024-11-08 Multi-hop Evidence Pursuit Meets the Web: Team Papelo at FEVER 2024 Christopher Malon et.al. 2411.05762 null
2024-11-08 Image2Text2Image: A Novel Framework for Label-Free Evaluation of Image-to-Text Generation with Text-to-Image Diffusion Models Jia-Hong Huang et.al. 2411.05706 null
2024-11-08 Unmasking the Limits of Large Language Models: A Systematic Evaluation of Masked Text Processing Ability through MskQA and MskCal Fuka Matsuzaki et.al. 2411.05665 link
2024-11-08 The influence of persona and conversational task on social interactions with a LLM-controlled embodied conversational agent Leon O. H. Kroczek et.al. 2411.05653 null
2024-11-08 LightVA: Lightweight Visual Analytics with LLM Agent-Based Task Planning and Execution Yuheng Zhao et.al. 2411.05651 null
2024-11-08 Evaluating Large Language Model Capability in Vietnamese Fact-Checking Data Generation Long Truong To et.al. 2411.05641 null
2024-11-08 Assessing Open-Source Large Language Models on Argumentation Mining Subtasks Mohammad Yeghaneh Abkenar et.al. 2411.05639 null
2024-11-08 A Two-Step Concept-Based Approach for Enhanced Interpretability and Trust in Skin Lesion Diagnosis Cristiano Patrício et.al. 2411.05609 null
2024-11-07 SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models Muyang Li et.al. 2411.05007 link
2024-11-07 Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks? Jonathan Roberts et.al. 2411.05000 link
2024-11-07 LLM2CLIP: Powerful Language Model Unlock Richer Visual Representation Weiquan Huang et.al. 2411.04997 link
2024-11-07 Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models Weixin Liang et.al. 2411.04996 link
2024-11-07 Rethinking Bradley-Terry Models in Preference-Based Reward Modeling: Foundations, Theory, and Alternatives Hao Sun et.al. 2411.04991 link
2024-11-07 Enhancing Reverse Engineering: Investigating and Benchmarking Large Language Models for Vulnerability Analysis in Decompiled Binaries Dylan Manuel et.al. 2411.04981 null
2024-11-07 SuffixDecoding: A Model-Free Approach to Speeding Up Large Language Model Inference Gabriele Oliaro et.al. 2411.04975 null
2024-11-07 BitNet a4.8: 4-bit Activations for 1-bit LLMs Hongyu Wang et.al. 2411.04965 link
2024-11-07 Position Paper On Diagnostic Uncertainty Estimation from Large Language Models: Next-Word Probability Is Not Pre-test Probability Yanjun Gao et.al. 2411.04962 null
2024-11-07 CAD-MLLM: Unifying Multimodality-Conditioned CAD Generation With MLLM Jingwei Xu et.al. 2411.04954 link
2024-11-06 Medical Adaptation of Large Language and Vision-Language Models: Are We Making Progress? Daniel P. Jeong et.al. 2411.04118 null
2024-11-06 How Transformers Solve Propositional Logic Problems: A Mechanistic Analysis Guan Zhe Hong et.al. 2411.04105 null
2024-11-06 Textual Decomposition Then Sub-motion-space Scattering for Open-Vocabulary Motion Generation Ke Fan et.al. 2411.04079 null
2024-11-06 Beemo: Benchmark of Expert-edited Machine-generated Outputs Ekaterina Artemova et.al. 2411.04032 link
2024-11-06 Prompt Engineering Using GPT for Word-Level Code-Mixed Language Identification in Low-Resource Dravidian Languages Aniket Deroy et.al. 2411.04025 null
2024-11-06 Select2Plan: Training-Free ICL-Based Planning through VQA and Memory Retrieval Davide Buoso et.al. 2411.04006 null
2024-11-06 Customized Multiple Clustering via Multi-Modal Subspace Proxy Learning Jiawei Yao et.al. 2411.03978 null
2024-11-06 What Really is Commonsense Knowledge? Quyet V. Do et.al. 2411.03964 null
2024-11-06 How Does A Text Preprocessing Pipeline Affect Ontology Syntactic Matching? Zhangcheng Qiang et.al. 2411.03962 null
2024-11-06 Fine-Grained Guidance for Retrievers: Leveraging LLMs’ Feedback in Retrieval-Augmented Generation Yuhang Liu et.al. 2411.03957 null
2024-11-05 MME-Finance: A Multimodal Finance Benchmark for Expert-level Understanding and Reasoning Ziliang Gan et.al. 2411.03314 null
2024-11-05 LLMs for Domain Generation Algorithm Detection Reynier Leyva La O et.al. 2411.03307 null
2024-11-05 VERITAS: A Unified Approach to Reliability Evaluation Rajkumar Ramamurthy et.al. 2411.03300 null
2024-11-05 Examining Human-AI Collaboration for Co-Writing Constructive Comments Online Farhana Shahid et.al. 2411.03295 null
2024-11-05 Interaction2Code: How Far Are We From Automatic Interactive Webpage Generation? Jingyu Xiao et.al. 2411.03292 null
2024-11-05 The Future of Intelligent Healthcare: A Systematic Analysis and Discussion on the Integration and Impact of Robots Using Large Language Models for Healthcare Souren Pashangpour et.al. 2411.03287 null
2024-11-05 SMoA: Improving Multi-agent Large Language Models with Sparse Mixture-of-Agents Dawei Li et.al. 2411.03284 link
2024-11-05 Spontaneous Emergence of Agent Individuality through Social Interactions in LLM-Based Communities Ryosuke Takata et.al. 2411.03252 null
2024-11-05 DiffLM: Controllable Synthetic Data Generation via Diffusion Language Models Ying Zhou et.al. 2411.03250 null
2024-11-05 From Pen to Prompt: How Creative Writers Integrate AI into their Writing Practice Alicia Guo et.al. 2411.03137 null
2024-11-04 Training-free Regional Prompting for Diffusion Transformers Anthony Chen et.al. 2411.02395 link
2024-11-04 Adaptive Length Image Tokenization via Recurrent Allocation Shivam Duggal et.al. 2411.02393 link
2024-11-04 Improving Scientific Hypothesis Generation with Knowledge Grounded Large Language Models Guangzhi Xiong et.al. 2411.02382 null
2024-11-04 Addressing Uncertainty in LLMs to Enhance Reliability in Generative AI Ramneet Kaur et.al. 2411.02381 null
2024-11-04 DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution Yang Yue et.al. 2411.02359 link
2024-11-04 “Give Me BF16 or Give Me Death”? Accuracy-Performance Trade-Offs in LLM Quantization Eldar Kurtic et.al. 2411.02355 null
2024-11-04 Social-RAG: Retrieving from Group Interactions to Socially Ground Proactive AI Generation to Group Preferences Ruotong Wang et.al. 2411.02353 null
2024-11-04 Can Large Language Models generalize analogy solving like people can? Claire E. Stevenson et.al. 2411.02348 null
2024-11-04 WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning Zehan Qi et.al. 2411.02337 link
2024-11-04 Sparsing Law: Towards Large Language Models with Greater Activation Sparsity Yuqi Luo et.al. 2411.02335 link
2024-10-31 P-Masking: Power Law Masking Improves Multi-attribute Controlled Generation Mohamed Elgaar et.al. 2410.24201 null
2024-11-01 SelfCodeAlign: Self-Alignment for Code Generation Yuxiang Wei et.al. 2410.24198 link
2024-10-31 Constraint Back-translation Improves Complex Instruction Following of Large Language Models Yunjia Qi et.al. 2410.24175 link
2024-10-31 Thought Space Explorer: Navigating and Expanding Thought Space for Large Language Model Reasoning Jinghan Zhang et.al. 2410.24155 null
2024-10-31 Language-Driven Policy Distillation for Cooperative Driving in Multi-Agent Reinforcement Learning Jiaqi Liu et.al. 2410.24152 null
2024-10-31 Exploring Vision Language Models for Facial Attribute Recognition: Emotion, Race, Gender, and Age Nouar AlDahoul et.al. 2410.24148 null
2024-11-01 Multi-environment Topic Models Dominic Sobhani et.al. 2410.24126 null
2024-10-31 Leveraging Large Language Models for Code Translation and Software Development in Scientific Computing Akash Dhruv et.al. 2410.24119 link
2024-10-31 Repository-Level Compositional Code Translation and Validation Ali Reza Ibrahimzada et.al. 2410.24117 null
2024-10-31 Nearest Neighbor Normalization Improves Multimodal Retrieval Neil Chowdhury et.al. 2410.24114 link
2024-10-30 EMMA: End-to-End Multimodal Model for Autonomous Driving Jyh-Jing Hwang et.al. 2410.23262 null
2024-10-30 Evaluating Cultural and Social Awareness of LLM Web Agents Haoyi Qiu et.al. 2410.23252 null
2024-10-30 Carrot and Stick: Eliciting Comparison Data and Beyond Yiling Chen et.al. 2410.23243 null
2024-10-30 A little less conversation, a little more action, please: Investigating the physical common-sense of LLMs in a 3D embodied environment Matteo G. Mecattaf et.al. 2410.23242 null
2024-10-30 EMOTION: Expressive Motion Sequence Generation for Humanoid Robots with In-Context Learning Peide Huang et.al. 2410.23234 null
2024-10-31 Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval Sheryl Hsu et.al. 2410.23214 null
2024-10-30 Reliability of Topic Modeling Kayla Schroeder et.al. 2410.23186 null
2024-10-30 ProTransformer: Robustify Transformers via Plug-and-Play Paradigm Zhichao Hou et.al. 2410.23182 null
2024-10-30 ReasoningRec: Bridging Personalized Recommendations and Human-Interpretable Explanations through LLM Reasoning Millennium Bismay et.al. 2410.23180 link
2024-10-30 SciPIP: An LLM-based Scientific Paper Idea Proposer Wenxiao Wang et.al. 2410.23166 link
2024-10-29 Enhancing Code Annotation Reliability: Generative AI’s Role in Comment Quality Assessment Models Seetharam Killivalavan et.al. 2410.22323 null
2024-10-29 Online Detecting LLM-Generated Texts via Sequential Hypothesis Testing by Betting Can Chen et.al. 2410.22318 link
2024-10-29 Natural Language Inference Improves Compositionality in Vision-Language Models Paola Cascante-Bonilla et.al. 2410.22315 null
2024-10-29 GPT-4o reads the mind in the eyes James W. A. Strachan et.al. 2410.22309 null
2024-10-29 SVIP: Towards Verifiable Inference of Open-source Large Language Models Yifan Sun et.al. 2410.22307 null
2024-10-29 Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning Yihe Deng et.al. 2410.22304 null
2024-10-29 LLMs are Highly-Constrained Biophysical Sequence Optimizers Angelica Chen et.al. 2410.22296 null
2024-10-29 Fine-Tuning LLMs for Code Mutation: A New Era of Cyber Threats Mohammad Setak et.al. 2410.22293 null
2024-10-29 Embedding-based classifiers can detect prompt injection attacks Md. Ahsan Ayub et.al. 2410.22284 link
2024-10-29 Whose ChatGPT? Unveiling Real-World Educational Inequalities Introduced by Large Language Models Renzhe Yu et.al. 2410.22282 null
2024-10-28 Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics Yaniv Nikankin et.al. 2410.21272 link
2024-10-28 LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior Hanyu Wang et.al. 2410.21264 link
2024-10-28 AutoBench-V: Can Large Vision-Language Models Benchmark Themselves? Han Bao et.al. 2410.21259 link
2024-10-28 LongReward: Improving Long-context Large Language Models with AI Feedback Jiajie Zhang et.al. 2410.21252 link
2024-10-28 Zero-Shot Dense Retrieval with Embeddings from Relevance Feedback Nour Jedidi et.al. 2410.21242 null
2024-10-28 Hierarchical Knowledge Graph Construction from Images for Scalable E-Commerce Zhantao Yang et.al. 2410.21237 null
2024-10-28 Flaming-hot Initiation with Regular Execution Sampling for Large Language Models Weizhe Chen et.al. 2410.21236 null
2024-10-28 LoRA vs Full Fine-tuning: An Illusion of Equivalence Reece Shuttleworth et.al. 2410.21228 null
2024-10-28 Lifting the Veil on the Large Language Model Supply Chain: Composition, Risks, and Mitigations Kaifeng Huang et.al. 2410.21218 null
2024-10-28 BongLLaMA: LLaMA for Bangla Language Abdullah Khan Zehady et.al. 2410.21200 null
2024-10-25 The Potential and Value of AI Chatbot in Personalized Cognitive Training Zilong Wang et.al. 2410.19733 null
2024-10-25 Counting Ability of Large Language Models and Impact of Tokenization Xiang Zhang et.al. 2410.19730 link
2024-10-25 FISHNET: Financial Intelligence from Sub-querying, Harmonizing, Neural-Conditioning, Expert Swarms, and Task Planning Nicole Cho et.al. 2410.19727 null
2024-10-25 2D-DPO: Scaling Direct Preference Optimization with 2-Dimensional Supervision Shilong Li et.al. 2410.19720 null
2024-10-25 TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning Xiangyu Zeng et.al. 2410.19702 link
2024-10-25 IPPON: Common Sense Guided Informative Path Planning for Object Goal Navigation Kaixian Qu et.al. 2410.19697 null
2024-10-25 Less is More: Extreme Gradient Boost Rank-1 Adaption for Efficient Finetuning of LLMs Yifei Zhang et.al. 2410.19694 null
2024-10-25 APRICOT: Active Preference Learning and Constraint-Aware Task Planning with LLMs Huaxiaoyue Wang et.al. 2410.19656 null
2024-10-25 Take Caution in Using LLMs as Human Surrogates: Scylla Ex Machina Yuan Gao et.al. 2410.19599 null
2024-10-25 Diverse Sign Language Translation Xin Shen et.al. 2410.19586 null
2024-10-24 Unbounded: A Generative Infinite Game of Character Life Simulation Jialu Li et.al. 2410.18975 null
2024-10-24 Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms Zhangheng Li et.al. 2410.18967 link
2024-10-24 Does Data Contamination Detection Work (Well) for LLMs? A Survey and Evaluation on Detection Assumptions Yujuan Fu et.al. 2410.18966 null
2024-10-24 OSCAR: Operating System Control via State-Aware Reasoning and Re-Planning Xiaoqiang Wang et.al. 2410.18963 link
2024-10-24 Bridge-Coder: Unlocking LLMs’ Potential to Overcome Language Gaps in Low-Resource Code Jipeng Zhang et.al. 2410.18957 null
2024-10-24 BioMistral-NLU: Towards More Generalizable Medical Language Understanding through Instruction Tuning Yujuan Velvin Fu et.al. 2410.18955 null
2024-10-24 Dynamic Vocabulary Pruning in Early-Exit LLMs Jort Vincenti et.al. 2410.18952 link
2024-10-24 SafeBench: A Safety Evaluation Framework for Multimodal Large Language Models Zonghao Ying et.al. 2410.18927 null
2024-10-24 From Blind Solvers to Logical Thinkers: Benchmarking LLMs’ Logical Integrity on Faulty Mathematical Problems A M Muntasir Rahman et.al. 2410.18921 null
2024-10-24 A Survey on Speech Large Language Models Jing Peng et.al. 2410.18908 null
2024-10-23 TP-Eval: Tap Multimodal LLMs’ Potential in Evaluation by Customizing Prompts Yuxuan Xie et.al. 2410.18071 null
2024-10-23 LongRAG: A Dual-Perspective Retrieval-Augmented Generation Paradigm for Long-Context Question Answering Qingfei Zhao et.al. 2410.18050 link
2024-10-23 Key Algorithms for Keyphrase Generation: Instruction-Based LLMs for Russian Scientific Keyphrases Anna Glazkova et.al. 2410.18040 null
2024-10-23 MiLoRA: Efficient Mixture of Low-Rank Adaptation for Large Language Models Fine-tuning Jingfan Zhang et.al. 2410.18035 null
2024-10-23 GraphTeam: Facilitating Large Language Model-based Graph Analysis via Multi-Agent Collaboration Xin Li et.al. 2410.18032 link
2024-10-23 MiniFed: Integrating LLM-based Agentic-Workflow for Simulating FOMC Meeting Sungil Seok et.al. 2410.18012 null
2024-10-23 Benchmarking Foundation Models on Exceptional Cases: Dataset Creation and Validation Suho Kang et.al. 2410.18001 link
2024-10-23 Zeitenwenden: Detecting changes in the German political discourse Kai-Robin Lange et.al. 2410.17960 null
2024-10-23 ExpertFlow: Optimized Expert Activation and Token Allocation for Efficient Mixture-of-Experts Inference Xin He et.al. 2410.17954 null
2024-10-23 SimRAG: Self-Improving Retrieval-Augmented Generation for Adapting Large Language Models to Specialized Domains Ran Xu et.al. 2410.17952 null
2024-10-22 Altogether: Image Captioning via Re-aligning Alt-text Hu Xu et.al. 2410.17251 null
2024-10-22 Large Language Models Empowered Personalized Web Agents Hongru Cai et.al. 2410.17236 null
2024-10-22 Automated Spinal MRI Labelling from Reports Using a Large Language Model Robin Y. Park et.al. 2410.17235 link
2024-10-22 Fine-Tuning Large Language Models to Appropriately Abstain with Semantic Entropy Benedict Aaron Tjandra et.al. 2410.17234 null
2024-10-22 Few-shot In-Context Preference Learning Using Large Language Models Chao Yu et.al. 2410.17233 null
2024-10-22 Context-aware Prompt Tuning: Advancing In-Context Learning with Adversarial Methods Tsachi Blau et.al. 2410.17222 null
2024-10-22 Exploring Possibilities of AI-Powered Legal Assistance in Bangladesh through Large Language Modeling Azmine Toushik Wasi et.al. 2410.17210 link
2024-10-22 VoiceBench: Benchmarking LLM-Based Voice Assistants Yiming Chen et.al. 2410.17196 link
2024-10-22 Language Model Non-myopic Generation for Reasoning and Planning Chang Ma et.al. 2410.17195 null
2024-10-22 From Attention to Activation: Unravelling the Enigmas of Large Language Models Prannay Kaul et.al. 2410.17174 null
2024-10-21 Reflection-Bench: probing AI intelligence with reflection Lingyu Li et.al. 2410.16270 link
2024-10-21 Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance Zhangwei Gao et.al. 2410.16261 link
2024-10-21 Elucidating the design space of language models for image generation Xuantong Liu et.al. 2410.16257 null
2024-10-21 CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution Maosong Cao et.al. 2410.16256 link
2024-10-21 Can Knowledge Editing Really Correct Hallucinations? Baixiang Huang et.al. 2410.16251 link
2024-10-21 Analyzing Context Contributions in LLM-based Machine Translation Emmanouil Zaranis et.al. 2410.16246 null
2024-10-21 IBGP: Imperfect Byzantine Generals Problem for Zero-Shot Robustness in Communicative Multi-Agent Systems Yihuan Mao et.al. 2410.16237 null
2024-10-21 LLaVA-KD: A Framework of Distilling Multimodal Large Language Models Yuxuan Cai et.al. 2410.16236 null
2024-10-21 ToW: Thoughts of Words Improve Reasoning in Large Language Models Zhikun Xu et.al. 2410.16235 null
2024-10-21 Building A Coding Assistant via the Retrieval-Augmented Language Model Xinze Li et.al. 2410.16229 null
2024-10-18 Are AI Detectors Good Enough? A Survey on Quality of Datasets With Machine-Generated Texts German Gritsai et.al. 2410.14677 null
2024-10-18 SudoLM: Learning Access Control of Parametric Knowledge with Authorization Alignment Qin Liu et.al. 2410.14676 null
2024-10-18 Enhancing Large Language Models’ Situated Faithfulness to External Contexts Yukun Huang et.al. 2410.14675 link
2024-10-18 NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples Baiqi Li et.al. 2410.14669 null
2024-10-18 MiCEval: Unveiling Multimodal Chain of Thought’s Quality via Image Description and Reasoning Steps Xiongtao Zhou et.al. 2410.14668 link
2024-10-18 A Large Language Model-Driven Reward Design Framework via Dynamic Feedback for Reinforcement Learning Shengjie Sun et.al. 2410.14660 null
2024-10-18 EvoPress: Towards Optimal Dynamic Model Compression via Evolutionary Search Oliver Sieberling et.al. 2410.14649 null
2024-10-18 Distance between Relevant Information Pieces Causes Bias in Long-Context LLMs Runchu Tian et.al. 2410.14641 link
2024-10-18 GenEOL: Harnessing the Generative Power of LLMs for Training-Free Sentence Embeddings Raghuveer Thirukovalluru et.al. 2410.14635 null
2024-10-18 You Shall Know a Tool by the Traces it Leaves: The Predictability of Sentiment Analysis Tools Daniel Baumartz et.al. 2410.14626 null
2024-10-17 Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens Lijie Fan et.al. 2410.13863 null
2024-10-17 PUMA: Empowering Unified MLLM with Multi-granular Visual Generation Rongyao Fang et.al. 2410.13861 link
2024-10-17 γ-MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models Yaxin Luo et.al. 2410.13859 null
2024-10-17 How Numerical Precision Affects Mathematical Reasoning Capabilities of LLMs Guhao Feng et.al. 2410.13857 null
2024-10-17 Can MLLMs Understand the Deep Implication Behind Chinese Images? Chenhao Zhang et.al. 2410.13854 link
2024-10-17 Retrospective Learning from Interactions Zizhao Chen et.al. 2410.13852 null
2024-10-17 SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction Xuan Zhang et.al. 2410.13846 link
2024-10-17 Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs Tianyu Guo et.al. 2410.13835 null
2024-10-17 AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents Ke Yang et.al. 2410.13825 null
2024-10-17 Harnessing Webpage UIs for Text-Rich Visual Understanding Junpeng Liu et.al. 2410.13824 null
2024-10-16 Context is Key(NMF): Modelling Topical Information Dynamics in Chinese Diaspora Media Ross Deans Kristensen-McLachlan et.al. 2410.12791 null
2024-10-16 Meta-Chunking: Learning Efficient Text Segmentation via Logical Perception Jihao Zhao et.al. 2410.12788 null
2024-10-16 In-Context Learning Enables Robot Action Prediction in LLMs Yida Yin et.al. 2410.12782 null
2024-10-16 Identifying Task Groupings for Multi-Task Learning Using Pointwise V-Usable Information Yingya Li et.al. 2410.12774 null
2024-10-16 StyleDistance: Stronger Content-Independent Style Embeddings with Synthetic Parallel Examples Ajay Patel et.al. 2410.12757 null
2024-10-16 Comparative Analysis of Extrinsic Factors for NER in French Grace Yang et.al. 2410.12750 null
2024-10-16 CREAM: Consistency Regularized Self-Rewarding Language Models Zhaoyang Wang et.al. 2410.12735 null
2024-10-16 FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression Zhenheng Tang et.al. 2410.12707 null
2024-10-16 WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines Genta Indra Winata et.al. 2410.12705 null
2024-10-16 Sarcasm Detection in a Less-Resourced Language Lazar Đoković et.al. 2410.12704 null
2024-10-15 GaVaMoE: Gaussian-Variational Gated Mixture of Experts for Explainable Recommendation Fei Tang et.al. 2410.11841 null
2024-10-15 MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding Yue Cao et.al. 2410.11829 link
2024-10-15 SGEdit: Bridging LLM with Text2Image Generative Model for Scene Graph-based Image Editing Zhiyuan Zhang et.al. 2410.11815 null
2024-10-15 NesTools: A Dataset for Evaluating Nested Tool Learning Abilities of Large Language Models Han Han et.al. 2410.11805 null
2024-10-15 FoundTS: Comprehensive and Unified Benchmarking of Foundation Models for Time Series Forecasting Zhe Li et.al. 2410.11802 null
2024-10-15 Selection-p: Self-Supervised Task-Agnostic Prompt Compression for Faithfulness and Transferability Tsz Ting Chung et.al. 2410.11786 null
2024-10-15 G-Designer: Architecting Multi-agent Communication Topologies via Graph Neural Networks Guibin Zhang et.al. 2410.11782 null
2024-10-15 Language Models Encode Numbers Using Digit Representations in Base 10 Amit Arnold Levy et.al. 2410.11781 null
2024-10-15 MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation Chenxi Wang et.al. 2410.11779 link
2024-10-15 Layer-wise Importance Matters: Less Memory for Better Performance in Parameter-efficient Fine-tuning of Large Language Models Kai Yao et.al. 2410.11772 link
2024-10-14 DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads Guangxuan Xiao et.al. 2410.10819 link
2024-10-14 TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models Mu Cai et.al. 2410.10818 null
2024-10-14 Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free Ziyue Li et.al. 2410.10814 null
2024-10-14 LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory Di Wu et.al. 2410.10813 link
2024-10-14 Local and Global Decoding in Text Generation Daniel Gareev et.al. 2410.10810 link
2024-10-14 Mix Data or Merge Models? Optimizing for Diverse Multi-Task Learning Aakanksha et.al. 2410.10801 null
2024-10-14 Towards Foundation Models for 3D Vision: How Close Are We? Yiming Zuo et.al. 2410.10799 null
2024-10-14 MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling Jian Yang et.al. 2410.10798 null
2024-10-14 Context-Parametric Inversion: Why Instruction Finetuning May Not Actually Improve Context Reliance Sachin Goyal et.al. 2410.10796 link
2024-10-14 LiveXiv – A Multi-Modal Live Benchmark Based on Arxiv Papers Content Nimrod Shabtay et.al. 2410.10783 link
2024-10-11 MiRAGeNews: Multimodal Realistic AI-Generated News Detection Runsheng Huang et.al. 2410.09045 null
2024-10-11 AttnGCG: Enhancing Jailbreaking Attacks on LLMs with Attention Manipulation Zijun Wang et.al. 2410.09040 link
2024-10-11 Semi-Supervised Learning of Noisy Mixture of Experts Models Oh-Ran Kwon et.al. 2410.09039 null
2024-10-11 SimpleStrat: Diversifying Language Model Generation with Stratification Justin Wong et.al. 2410.09038 null
2024-10-11 Mentor-KD: Making Small Language Models Better Multi-step Reasoners Hojae Lee et.al. 2410.09037 link
2024-10-11 PEAR: A Robust and Flexible Automation Framework for Ptychography Enabled by Multiple Large Language Model Agents Xiangyu Yin et.al. 2410.09034 null
2024-10-11 The Impact of Visual Information in Chinese Characters: Evaluating Large Models’ Ability to Recognize and Utilize Radicals Xiaofeng Wu et.al. 2410.09013 null
2024-10-11 Software Engineering and Foundation Models: Insights from Industry Blogs Using a Jury of Foundation Models Hao Li et.al. 2410.09012 null
2024-10-11 SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights Ling Yang et.al. 2410.09008 link
2024-10-11 From Interaction to Impact: Towards Safer AI Agents Through Understanding and Evaluating UI Operation Impacts Zhuohao Jerry Zhang et.al. 2410.09006 null
2024-10-10 Emerging Pixel Grounding in Large Multimodal Models Without Grounding Supervision Shengcao Cao et.al. 2410.08209 null
2024-10-10 Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training Gen Luo et.al. 2410.08202 null
2024-10-10 From Exploration to Mastery: Enabling LLMs to Master Tools via Self-Driven Interactions Changle Qu et.al. 2410.08197 link
2024-10-10 MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code Zimu Lu et.al. 2410.08196 link
2024-10-10 GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-time Alignment Yuancheng Xu et.al. 2410.08193 null
2024-10-10 Sample then Identify: A General Framework for Risk Control and Assessment in Multimodal Large Language Models Qingni Wang et.al. 2410.08174 null
2024-10-10 On the Evaluation of Generative Robotic Simulations Feng Chen et.al. 2410.08172 null
2024-10-10 Agent S: An Open Agentic Framework that Uses Computers Like a Human Saaket Agashe et.al. 2410.08164 link
2024-10-10 Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning Amrith Setlur et.al. 2410.08146 null
2024-10-10 Insight Over Sight? Exploring the Vision-Knowledge Conflicts in Multimodal LLMs Xiaoyuan Liu et.al. 2410.08145 null
2024-10-09 Astute RAG: Overcoming Imperfect Retrieval Augmentation and Knowledge Conflicts for Large Language Models Fei Wang et.al. 2410.07176 null
2024-10-09 Do better language models have crisper vision? Jona Ruthardt et.al. 2410.07173 null
2024-10-09 Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate Qidong Huang et.al. 2410.07167 link
2024-10-09 Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making Manling Li et.al. 2410.07166 link
2024-10-09 Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning Chongyu Fan et.al. 2410.07163 null
2024-10-09 Trans4D: Realistic Geometry-Aware Transition for Compositional Text-to-4D Synthesis Bohan Zeng et.al. 2410.07155 link
2024-10-09 Mental Disorders Detection in the Era of Large Language Models Gleb Kuzmin et.al. 2410.07129 null
2024-10-09 Personalized Visual Instruction Tuning Renjie Pi et.al. 2410.07113 null
2024-10-09 I Want to Break Free! Anti-Social Behavior and Persuasion Ability of LLMs in Multi-Agent Settings with Social Hierarchy Gian Maria Campedelli et.al. 2410.07109 null
2024-10-09 Unleashing Multi-Hop Reasoning Potential in Large Language Models through Repetition of Misordered Context Sangwon Yu et.al. 2410.07103 null
2024-10-07 Data Advisor: Dynamic Data Curation for Safety Alignment of Large Language Models Fei Wang et.al. 2410.05269 null
2024-10-07 PrefixQuant: Static Quantization Beats Dynamic through Prefixed Outliers in LLMs Mengzhao Chen et.al. 2410.05265 link
2024-10-07 TurtleBench: Evaluating Top Language Models via Real-World Yes/No Puzzles Qingchen Yu et.al. 2410.05262 link
2024-10-07 Differential Transformer Tianzhu Ye et.al. 2410.05258 null
2024-10-07 GLEE: A Unified Framework and Benchmark for Language-based Economic Environments Eilam Shapira et.al. 2410.05254 link
2024-10-07 Causal Micro-Narratives Mourad Heddaya et.al. 2410.05252 null
2024-10-07 LoTLIP: Improving Language-Image Pre-training for Long Text Understanding Wei Wu et.al. 2410.05249 null
2024-10-07 SFTMix: Elevating Language Model Instruction Tuning with Mixup Recipe Yuxin Xiao et.al. 2410.05248 null
2024-10-07 Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents Boyu Gou et.al. 2410.05243 null
2024-10-07 GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models Iman Mirzadeh et.al. 2410.05229 null
2024-10-04 Enhance Reasoning by Learning from Mistakes: Peer-Review Knowledge Distillation from Multiple Large Language Models Zhuochun Li et.al. 2410.03663 null
2024-10-04 RAFT: Realistic Attacks to Fool Text Detectors James Wang et.al. 2410.03658 null
2024-10-04 Aligning LLMs with Individual Preferences via Interaction Shujin Wu et.al. 2410.03642 link
2024-10-04 Large Language Model Performance Benchmarking on Mobile Platforms: A Thorough Evaluation Jie Xiao et.al. 2410.03613 null
2024-10-04 TICKing All the Boxes: Generated Checklists Improve LLM Evaluation and Generation Jonathan Cook et.al. 2410.03608 null
2024-10-04 Efficiently Identifying Watermarked Segments in Mixed-Source Texts Xuandong Zhao et.al. 2410.03600 null
2024-10-04 Understanding Reasoning in Chain-of-Thought from the Hopfieldian View Lijie Hu et.al. 2410.03595 null
2024-10-04 Explicit, Implicit, and Scattered: Revisiting Event Extraction to Capture Complex Arguments Omar Sharif et.al. 2410.03594 null
2024-10-04 Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal Large Language Models Xin Zou et.al. 2410.03577 null
2024-10-04 Towards Linguistically-Aware and Language-Independent Tokenization for Large Language Models (LLMs) Abrar Rahman et.al. 2410.03568 null
2024-10-03 FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models Zhipei Xu et.al. 2410.02761 null
2024-10-03 Loong: Generating Minute-level Long Videos with Autoregressive Language Models Yuqing Wang et.al. 2410.02757 null
2024-10-03 SIEVE: General Purpose Data Filtering System Matching GPT-4o Accuracy at 1% the Cost Jifan Zhang et.al. 2410.02755 null
2024-10-03 Training Language Models on Synthetic Edit Sequences Improves Code Synthesis Ulyana Piterbarg et.al. 2410.02749 null
2024-10-03 CriSPO: Multi-Aspect Critique-Suggestion-guided Automatic Prompt Optimization for Text Generation Han He et.al. 2410.02748 null
2024-10-03 Contrastive Localized Language-Image Pre-Training Hong-You Chen et.al. 2410.02746 null
2024-10-03 Neutral residues: revisiting adapters for model extension Franck Signe Talla et.al. 2410.02744 null
2024-10-03 MA-RLHF: Reinforcement Learning from Human Feedback with Macro Actions Yekun Chai et.al. 2410.02743 null
2024-10-03 Grounding Large Language Models In Embodied Environment With Imperfect World Models Haolan Liu et.al. 2410.02742 null
2024-10-03 Salient Information Prompting to Steer Content in Prompt-based Abstractive Summarization Lei Xu et.al. 2410.02741 null
2024-10-02 Locret: Enhancing Eviction in Long-Context LLM Inference with Trained Retaining Heads Yuxiang Huang et.al. 2410.01805 link
2024-10-02 Efficient 1-bit tensor approximations Alex W. Neal Riasanovsky et.al. 2410.01799 null
2024-10-02 Knowledge-Driven Feature Selection and Engineering for Genotype Data with Large Language Models Joseph Lee et.al. 2410.01795 link
2024-10-02 When a language model is optimized for reasoning, does it still show embers of autoregression? An analysis of OpenAI o1 R. Thomas McCoy et.al. 2410.01792 null
2024-10-02 Investigating on RLHF methodology Alexey Kutalev et.al. 2410.01789 null
2024-10-02 OmniGenBench: Automating Large-scale in-silico Benchmarking for Genomic Foundation Models Heng Yang et.al. 2410.01784 link
2024-10-02 Open-RAG: Enhanced Retrieval-Augmented Reasoning with Open-Source Large Language Models Shayekh Bin Islam et.al. 2410.01782 null
2024-10-02 Quantifying Generalization Complexity for Large Language Models Zhenting Qi et.al. 2410.01769 null
2024-10-02 LEOPARD: A Vision Language Model For Text-Rich Multi-Image Tasks Mengzhao Jia et.al. 2410.01744 null
2024-10-02 VitaGlyph: Vitalizing Artistic Typography with Flexible Dual-branch Diffusion Models Kailai Feng et.al. 2410.01738 link
2024-09-30 MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning Haotian Zhang et.al. 2409.20566 null
2024-09-30 Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos Md Mohaiminul Islam et.al. 2409.20557 null
2024-09-30 LLM Hallucinations in Practical Code Generation: Phenomena, Mechanism, and Mitigation Ziyao Zhang et.al. 2409.20550 null
2024-09-30 Robi Butler: Remote Multimodal Interactions with Household Robot Assistant Anxing Xiao et.al. 2409.20548 null
2024-09-30 Uncertainty-Informed Screening for Safer Solvents Used in the Synthesis of Perovskite via Language Models Arpan Mukherjee et.al. 2409.20512 null
2024-09-30 COLLAGE: Collaborative Human-Agent Interaction Generation using Hierarchical Latent Diffusion and Language Models Divyanshu Daiya et.al. 2409.20502 null
2024-10-02 Linear Projections of Teacher Embeddings for Few-Class Distillation Noel Loo et.al. 2409.20449 null
2024-10-01 Instance-adaptive Zero-shot Chain-of-Thought Prompting Xiaosong Yuan et.al. 2409.20441 null
2024-09-30 HELPD: Mitigating Hallucination of LVLMs by Hierarchical Feedback Learning with Vision-enhanced Penalty Decoding Fan Yuan et.al. 2409.20429 null
2024-09-30 World to Code: Multi-modal Data Generation via Self-Instructed Compositional Captioning and Filtering Jiacong Wang et.al. 2409.20424 null
2024-09-27 LML: Language Model Learning a Dataset for Data-Augmented Prediction Praneeth Vadlapati et.al. 2409.18957 link
2024-09-27 Ruler: A Model-Agnostic Method to Control Generated Length for Large Language Models Jiaming Li et.al. 2409.18943 link
2024-09-27 From Seconds to Hours: Reviewing MultiModal Large Language Models on Comprehensive Long Video Understanding Heqing Zou et.al. 2409.18938 link
2024-09-27 AIPatient: Simulating Patients with EHRs and LLM Powered Agentic Workflow Huizi Yu et.al. 2409.18924 null
2024-09-27 Soft Measures for Extracting Causal Collective Intelligence Maryam Berijanian et.al. 2409.18911 link
2024-09-27 Multi-Source Hard and Soft Information Fusion Approach for Accurate Cryptocurrency Price Movement Prediction Saeed Mohammadi Dashtaki et.al. 2409.18895 null
2024-09-27 HM3: Hierarchical Multi-Objective Model Merging for Pretrained Models Yu Zhou et.al. 2409.18893 null
2024-09-27 IDGen: Item Discrimination Induced Prompt Generation for LLM Evaluation Fan Lin et.al. 2409.18892 null
2024-09-27 Predicting and analyzing memorization within fine-tuned Large Language Models Jérémie Dentan et.al. 2409.18858 null
2024-09-27 Mitigating Selection Bias with Node Pruning and Auxiliary Options Hyeong Kyu Choi et.al. 2409.18857 null
2024-09-26 EgoLM: Multi-Modal Language Model of Egocentric Motions Fangzhou Hong et.al. 2409.18127 null
2024-09-26 Multi-View and Multi-Scale Alignment for Contrastive Language-Image Pre-training in Mammography Yuexi Du et.al. 2409.18119 link
2024-09-26 E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding Ye Liu et.al. 2409.18111 link
2024-09-26 Inferring Alt-text For UI Icons With Large Language Models During App Development Sabrina Haque et.al. 2409.18060 null
2024-09-26 DualAD: Dual-Layer Planning for Reasoning in Autonomous Driving Dingrui Wang et.al. 2409.18053 null
2024-09-26 IFCap: Image-like Retrieval and Frequency-based Entity Filtering for Zero-shot Captioning Soeun Lee et.al. 2409.18046 null
2024-09-26 Unveiling the Role of Pretraining in Direct Speech Translation Belen Alastruey et.al. 2409.18044 null
2024-09-26 EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions Kai Chen et.al. 2409.18042 link
2024-09-26 Compositional Hardness of Code in Large Language Models – A Probabilistic Perspective Yotam Wolf et.al. 2409.18028 null
2024-09-26 An Adversarial Perspective on Machine Unlearning for AI Safety Jakub Łucki et.al. 2409.18025 null
2024-09-25 Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models Matt Deitke et.al. 2409.17146 link
2024-09-25 Attention Prompting on Image for Large Vision-Language Models Runpeng Yu et.al. 2409.17143 link
2024-09-25 FineZip: Pushing the Limits of Large Language Models for Practical Lossless Text Compression Fazal Mittu et.al. 2409.17141 link
2024-09-25 Turn Every Application into an Agent: Towards Efficient Human-Agent-Computer Interaction with API-First LLM-Based Agents Junting Lu et.al. 2409.17140 null
2024-09-25 Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale Fan Zhou et.al. 2409.17115 link
2024-09-25 Accumulator-Aware Post-Training Quantization Ian Colbert et.al. 2409.17092 null
2024-09-25 VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models Yifei Liu et.al. 2409.17066 link
2024-09-25 Using LLM for Real-Time Transcription and Summarization of Doctor-Patient Interactions into ePuskesmas in Indonesia Azmul Asmar Irfan et.al. 2409.17054 null
2024-09-25 How to Connect Speech Foundation Models and Large Language Models? What Matters and What Does Not Francesco Verdini et.al. 2409.17044 null
2024-09-25 Counterfactual Token Generation in Large Language Models Ivi Chatzi et.al. 2409.17027 link
2024-09-24 MonoFormer: One Transformer for Both Diffusion and Autoregression Chuyang Zhao et.al. 2409.16280 link
2024-09-24 A fast and sound tagging method for discontinuous named-entity recognition Caio Corro et.al. 2409.16243 null
2024-09-24 LLM Echo Chamber: personalized and automated disinformation Tony Ma et.al. 2409.16241 link
2024-09-24 Towards Enhancing Linked Data Retrieval in Conversational UIs using Large Language Models Omar Mussa et.al. 2409.16220 null
2024-09-24 LLMCount: Enhancing Stationary mmWave Detection with Multimodal-LLM Boyan Li et.al. 2409.16209 null
2024-09-25 CJEval: A Benchmark for Assessing Large Language Models Using Chinese Junior High School Exam Data Qian-Wen Zhang et.al. 2409.16202 link
2024-09-24 HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models Haoran Que et.al. 2409.16191 link
2024-09-24 Expert-level vision-language foundation model for real-world radiology and comprehensive evaluation Xiaohong Liu et.al. 2409.16183 null
2024-09-24 Cyber Knowledge Completion Using Large Language Models Braden K Webb et.al. 2409.16176 null
2024-09-24 Merging LoRAs like Playing LEGO: Pushing the Modularity of LoRA to Extremes Through Rank-Wise Clustering Ziyu Zhao et.al. 2409.16167 null
2024-09-20 Gender Representation and Bias in Indian Civil Service Mock Interviews Somonnoy Banerjee et.al. 2409.12194 null
2024-09-18 To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning Zayne Sprague et.al. 2409.12183 link
2024-09-18 Finetuning Language Models to Emit Linguistic Expressions of Uncertainty Arslan Chaudhry et.al. 2409.12180 null
2024-09-18 Decoding Style: Efficient Fine-Tuning of LLMs for Image-Guided Outfit Recommendation with Preference Najmeh Forouzandehmehr et.al. 2409.12150 null
2024-09-18 MAgICoRe: Multi-Agent, Iterative, Coarse-to-Fine Refinement for Reasoning Justin Chih-Yao Chen et.al. 2409.12147 link
2024-09-18 Experimental Evidence That Conversational Artificial Intelligence Can Steer Consumer Behavior Without Detection Tobias Werner et.al. 2409.12143 null
2024-09-18 MoRAG – Multi-Fusion Retrieval Augmented Generation for Human Motion Kalakonda Sai Shashank et.al. 2409.12140 link
2024-09-24 Takin: A Cohort of Superior Quality Zero-shot Speech Generation Models Sijing Chen et.al. 2409.12139 null
2024-09-18 Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement An Yang et.al. 2409.12122 null
2024-09-18 Low Frame-rate Speech Codec: a Codec Designed for Fast High-quality Speech LLM Training and Inference Edresson Casanova et.al. 2409.12117 null
2024-09-17 AraDiCE: Benchmarks for Dialectal and Cultural Capabilities in LLMs Basel Mousi et.al. 2409.11404 null
2024-09-17 NVLM: Open Frontier-Class Multimodal LLMs Wenliang Dai et.al. 2409.11402 null
2024-09-17 Says Who? Effective Zero-Shot Annotation of Focalization Rebecca M. M. Hicke et.al. 2409.11390 null
2024-09-17 Diversify and Conquer: Diversity-Centric Data Selection with Iterative Refinement Simon Yu et.al. 2409.11378 link
2024-09-17 Towards Time Series Reasoning with LLMs Winnie Chow et.al. 2409.11376 null
2024-09-17 Multi-OCT-SelfNet: Integrating Self-Supervised Learning with Multi-Source Data Fusion for Enhanced Multi-Class Retinal Disease Classification Fatema-E-Jannat et.al. 2409.11375 null
2024-09-17 CoCA: Regaining Safety-awareness of Multimodal Large Language Models with Constitutional Calibration Jiahui Gao et.al. 2409.11365 null
2024-09-17 AI Suggestions Homogenize Writing Toward Western Styles and Diminish Cultural Nuances Dhruv Agarwal et.al. 2409.11360 null
2024-09-17 THaMES: An End-to-End Tool for Hallucination Mitigation and Evaluation in Large Language Models Mengfei Liang et.al. 2409.11353 null
2024-09-18 Zero-resource Hallucination Detection for Text Generation via Graph-based Contextual Knowledge Triples Modeling Xinyue Fang et.al. 2409.11283 null
2024-09-16 RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval Di Liu et.al. 2409.10516 null
2024-09-16 Context-aware Code Segmentation for C-to-Rust Translation using Large Language Models Momoko Shiraishi et.al. 2409.10506 null
2024-09-16 DILA: Dictionary Label Attention for Mechanistic Interpretability in High-dimensional Multi-label Medical Coding Prediction John Wu et.al. 2409.10504 null
2024-09-16 Causal Language Modeling Can Elicit Search and Reasoning Capabilities on Logic Puzzles Kulin Shah et.al. 2409.10502 link
2024-09-16 Code Vulnerability Detection: A Comparative Analysis of Emerging Large Language Models Shaznin Sultana et.al. 2409.10490 null
2024-09-16 XLM for Autonomous Driving Systems: A Comprehensive Review Sonda Fourati et.al. 2409.10484 null
2024-09-16 Schrodinger’s Memory: Large Language Models Wei Wang et.al. 2409.10482 null
2024-09-16 LLM as BT-Planner: Leveraging LLMs for Behavior Tree Generation in Robot Task Planning Jicong Ao et.al. 2409.10444 link
2024-09-16 A Large-Scale Privacy Assessment of Android Third-Party SDKs Mark Huasong Meng et.al. 2409.10411 null
2024-09-17 Learnings from a Large-Scale Deployment of an LLM-Powered Expert-in-the-Loop Healthcare Chatbot Bhuvan Sachdeva et.al. 2409.10354 null
2024-09-13 Agents in Software Engineering: Survey, Landscape, and Vision Yanxian Huang et.al. 2409.09030 link
2024-09-13 Contri(e)ve: Context + Retrieve for Scholarly Question Answering Kanchan Shivashankar et.al. 2409.09010 null
2024-09-13 Safeguarding Decentralized Social Media: LLM Agents for Automating Community Rule Compliance Lucio La Cava et.al. 2409.08963 null
2024-09-13 Emerging Reliance Behaviors in Human-AI Text Generation: Hallucinations, Data Quality Assessment, and Cognitive Forcing Functions Zahra Ashktorab et.al. 2409.08937 null
2024-09-13 SynSUM – Synthetic Benchmark with Structured and Unstructured Medical Records Paloma Rabaey et.al. 2409.08936 link
2024-09-13 LLM-based Weak Supervision Framework for Query Intent Classification in Video Search Farnoosh Javadi et.al. 2409.08931 null
2024-09-13 AnyBipe: An End-to-End Framework for Training and Deploying Bipedal Robots Guided by Large Language Models Yifei Yao et.al. 2409.08904 null
2024-09-13 A Market for Lemons? Strategic Directions for a Vigilant Application of Artificial Intelligence in Entrepreneurship Research Martin Obschonka et.al. 2409.08890 null
2024-09-13 Exploring Graph Structure Comprehension Ability of Multimodal Large Language Models: Case Studies Zhiqiang Zhong et.al. 2409.08864 null
2024-09-13 FP-VEC: Fingerprinting Large Language Models via Efficient Vector Addition Zhenhua Xu et.al. 2409.08846 null
2024-09-12 DreamHOI: Subject-Driven Generation of 3D Human-Object Interactions with Diffusion Priors Thomas Hanwen Zhu et.al. 2409.08278 null
2024-09-12 Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale Rogerio Bonatti et.al. 2409.08264 link
2024-09-12 OmniQuery: Contextually Augmenting Captured Multimodal Memory to Enable Personal Question Answering Jiahao Nick Li et.al. 2409.08250 null
2024-09-12 Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources Alisia Lupidi et.al. 2409.08239 null
2024-09-12 LLM Honeypot: Leveraging Large Language Models as Advanced Interactive Honeypot Systems Hakan T. Otal et.al. 2409.08234 link
2024-09-12 What Makes a Maze Look Like a Maze? Joy Hsu et.al. 2409.08202 null
2024-09-12 Fine-tuning Large Language Models for Entity Matching Aaron Steiner et.al. 2409.08185 link
2024-09-12 Faster Speech-LLaMA Inference with Multi-token Prediction Desh Raj et.al. 2409.08148 null
2024-09-12 LLM-POTUS Score: A Framework of Analyzing Presidential Debates with Large Language Models Zhengliang Liu et.al. 2409.08147 null
2024-09-12 WhisperNER: Unified Open Named Entity and Speech Recognition Gil Ayache et.al. 2409.08107 null
2024-09-11 “My Grade is Wrong!”: A Contestable AI Framework for Interactive Feedback in Evaluating Student Essays Shengxin Hong et.al. 2409.07453 null
2024-09-11 SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories Ben Bogin et.al. 2409.07440 link
2024-09-11 CLNX: Bridging Code and Natural Language for C/C++ Vulnerability-Contributing Commits Identification Zeqing Qin et.al. 2409.07407 null
2024-09-11 AdaCAD: Adaptively Decoding to Balance Conflicts between Contextual and Parametric Knowledge Han Wang et.al. 2409.07394 link
2024-09-11 Recent Trends of Multimodal Affective Computing: A Survey from NLP Perspective Guimin Hu et.al. 2409.07388 null
2024-09-11 Demo: SGCode: A Flexible Prompt-Optimizing System for Secure Generation of Code Khiem Ton et.al. 2409.07368 null
2024-09-11 Think Together and Work Better: Combining Humans’ and LLMs’ Think-Aloud Outcomes for Effective Text Evaluation SeongYeub Chu et.al. 2409.07355 link
2024-09-11 Securing Vision-Language Models with a Robust Encoder Against Jailbreak and Adversarial Attacks Md Zarif Hossain et.al. 2409.07353 link
2024-09-11 Learning to Compress Contexts for Efficient Knowledge-based Visual Question Answering Weixi Weng et.al. 2409.07331 null
2024-09-11 MEDIC: Towards a Comprehensive Framework for Evaluating LLMs in Clinical Applications Praveen K Kanithi et.al. 2409.07314 null
2024-09-10 E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning Zihan Liao et.al. 2409.06679 link
2024-09-10 LLaMA-Omni: Seamless Speech Interaction with Large Language Models Qingkai Fang et.al. 2409.06666 link
2024-09-10 Human Perception of LLM-generated Text Content in Social Media Environments Kristina Radivojevic et.al. 2409.06653 null
2024-09-10 Optimal Workload Placement on Multi-Instance GPUs Bekir Turkkan et.al. 2409.06646 null
2024-09-10 EyeCLIP: A visual-language foundation model for multi-modal ophthalmic image analysis Danli Shi et.al. 2409.06644 null
2024-09-10 MoWE-Audio: Multitask AudioLLMs with Mixture of Weak Encoders Wenyu Zhang et.al. 2409.06635 null
2024-09-10 A Practice of Post-Training on Llama-3 70B with Optimal Selection of Additional Language Mixture Ratio Ningyuan Xi et.al. 2409.06624 null
2024-09-10 Alleviating Hallucinations in Large Language Models with Scepticism Modeling Yetao Wu et.al. 2409.06601 null
2024-09-10 GroUSE: A Benchmark to Evaluate Evaluators in Grounded Question Answering Sacha Muller et.al. 2409.06595 link
2024-09-10 MAPS: Energy-Reliability Tradeoff Management in Autonomous Vehicles Through LLMs Penetrated Science Mahdieh Aliazam et.al. 2409.06558 null
2024-09-09 MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct Run Luo et.al. 2409.05840 null
2024-09-09 Are Large Language Models a Threat to Programming Platforms? An Exploratory Study Md Mustakim Billah et.al. 2409.05824 null
2024-09-09 Benchmarking Chinese Knowledge Rectification in Large Language Models Tianhe Lu et.al. 2409.05806 link
2024-09-09 Breaking Neural Network Scaling Laws with Modularity Akhilan Boopathy et.al. 2409.05780 null
2024-09-09 Evidence from fMRI Supports a Two-Phase Abstraction Process in Language Models Emily Cheng et.al. 2409.05771 null
2024-09-09 Model Input Verification of Large Scale Simulations Rumyana Neykova et.al. 2409.05768 null
2024-09-09 A Novel Idea Generation Tool using a Structured Conversational AI (CAI) System B. Sankar et.al. 2409.05747 null
2024-09-09 LLMs Will Always Hallucinate, and We Need to Live With This Sourav Banerjee et.al. 2409.05746 null
2024-09-09 A System and Benchmark for LLM-based Q&A on Heterogeneous Data Achille Fokoue et.al. 2409.05735 null
2024-09-09 Towards Democratizing Multilingual Large Language Models For Medicine Through A Two-Stage Instruction Fine-tuning Approach Meng Zhou et.al. 2409.05732 link
2024-09-06 RLPF: Reinforcement Learning from Prediction Feedback for User Summarization with LLMs Jiaxing Wu et.al. 2409.04421 null
2024-09-06 Question-Answering Dense Video Events Hangyu Qin et.al. 2409.04388 link
2024-09-06 Learning vs Retrieval: The Role of In-Context Examples in Regression with LLMs Aliakbar Nafar et.al. 2409.04318 null
2024-09-06 An optically accelerated extreme learning machine using hot atomic vapors Pierre Azam et.al. 2409.04312 null
2024-09-06 Using Large Language Models to Generate Authentic Multi-agent Knowledge Work Datasets Desiree Heim et.al. 2409.04286 null
2024-09-06 Advancing Automated Knowledge Transfer in Evolutionary Multitasking via Large Language Models Yuxiao Huang et.al. 2409.04270 null
2024-09-06 GALLa: Graph Aligned Large Language Models for Improved Source Code Understanding Ziyin Zhang et.al. 2409.04183 link
2024-09-06 Combining LLMs and Knowledge Graphs to Reduce Hallucinations in Question Answering Larissa Pusch et.al. 2409.04181 null
2024-09-06 From Calculation to Adjudication: Examining LLM judges on Mathematical Reasoning Tasks Andreas Stephan et.al. 2409.04168 null
2024-09-06 Can OpenSource beat ChatGPT? – A Comparative Study of Large Language Models for Text-to-Code Generation Luis Mayer et.al. 2409.04164 null
2024-09-05 Attention Heads of Large Language Models: A Survey Zifan Zheng et.al. 2409.03752 link
2024-09-05 LLM-CI: Assessing Contextual Integrity Norms in Language Models Yan Shvartzshnaider et.al. 2409.03735 null
2024-09-05 Safety vs. Performance: How Multi-Objective Learning Reduces Barriers to Market Entry Meena Jagadeesan et.al. 2409.03734 null
2024-09-05 Planning In Natural Language Improves LLM Search For Code Generation Evan Wang et.al. 2409.03733 null
2024-09-05 RAG based Question-Answering for Contextual Response Prediction System Sriram Veturi et.al. 2409.03708 null
2024-09-05 TRACE-cs: Trustworthy Reasoning for Contrastive Explanations in Course Scheduling Problems Stylianos Loukas Vasileiou et.al. 2409.03671 null
2024-09-05 A Fused Large Language Model for Predicting Startup Success Abdurahman Maarouf et.al. 2409.03668 null
2024-09-05 The representation landscape of few-shot learning and fine-tuning in large language models Diego Doimo et.al. 2409.03662 link
2024-09-06 LLM-based multi-agent poetry generation in non-cooperative environments Ran Zhang et.al. 2409.03659 link
2024-09-05 From MOOC to MAIC: Reshaping Online Teaching and Learning through LLM-driven Agents Jifan Yu et.al. 2409.03512 null
2024-09-04 RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins (early version) Yao Mu et.al. 2409.02920 link
2024-09-05 LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA Jiajie Zhang et.al. 2409.02897 link
2024-09-04 LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture Xidong Wang et.al. 2409.02889 link
2024-09-04 Historical German Text Normalization Using Type- and Token-Based Language Modeling Anton Ehrmanntraut et.al. 2409.02841 null
2024-09-04 Exploring Sentiment Dynamics and Predictive Behaviors in Cryptocurrency Discussions by Few-Shot Learning with Large Language Models Moein Shahiki Tash et.al. 2409.02836 null
2024-09-04 CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models Wentao Liu et.al. 2409.02834 link
2024-09-04 ExpLLM: Towards Chain of Thought for Facial Expression Recognition Xing Lan et.al. 2409.02828 link
2024-09-04 Design Contradictions: Help or Hindrance? Aron E. Owen et.al. 2409.02823 null
2024-09-04 Language Understanding as a Constraint on Consensus Size in LLM Societies Giordano De Marzo et.al. 2409.02822 null
2024-09-04 Towards a Unified View of Preference Learning for Large Language Models: A Survey Bofei Gao et.al. 2409.02795 link
2024-08-30 SYNTHEVAL: Hybrid Behavioral Testing of NLP Models with Synthetic CheckLists Raoyuan Zhao et.al. 2408.17437 link
2024-08-30 Advancing Multi-talker ASR Performance with Large Language Models Mohan Shi et.al. 2408.17431 null
2024-08-30 CLOCR-C: Context Leveraging OCR Correction with Pre-trained Language Models Jonathan Bourne et.al. 2408.17428 null
2024-08-30 Getting Inspiration for Feature Elicitation: App Store- vs. LLM-based Approach Jialiang Wei et.al. 2408.17404 link
2024-08-30 NDP: Next Distribution Prediction as a More Broad Target Junhao Ruan et.al. 2408.17377 null
2024-08-30 Look, Learn and Leverage (L$^3$): Mitigating Visual-Domain Shift and Discovering Intrinsic Relations via Symbolic Alignment Hanchen Xie et.al. 2408.17363 null
2024-08-30 Assessing Generative Language Models in Classification Tasks: Performance and Self-Evaluation Capabilities in the Environmental and Climate Change Domain Francesca Grasso et.al. 2408.17362 link
2024-08-30 Forget to Flourish: Leveraging Machine-Unlearning on Pretrained Language Models for Privacy Leakage Md Rafi Ur Rashid et.al. 2408.17354 null
2024-08-30 Bridging Domain Knowledge and Process Discovery Using Large Language Models Ali Norouzifar et.al. 2408.17316 link
2024-08-30 Flexible and Effective Mixing of Large Language Models into a Mixture of Domain Experts Rhui Dih Lee et.al. 2408.17280 null
2024-08-29 How Far Can Cantonese NLP Go? Benchmarking Cantonese Capabilities of Large Language Models Jiyue Jiang et.al. 2408.16756 link
2024-08-29 Reinforcement Learning without Human Feedback for Last Mile Fine-Tuning of Large Language Models Alec Solway et.al. 2408.16753 null
2024-08-29 Assessing Large Language Models for Online Extremism Research: Identification, Explanation, and New Knowledge Beidi Dong et.al. 2408.16749 null
2024-08-29 Theoretical and Methodological Framework for Studying Texts Produced by Large Language Models Jiří Milička et.al. 2408.16740 null
2024-08-29 GradBias: Unveiling Word Influence on Bias in Text-to-Image Generative Models Moreno D’Incà et.al. 2408.16700 link
2024-08-29 Entropic Distribution Matching in Supervised Fine-tuning of LLMs: Less Overfitting and Better Diversity Ziniu Li et.al. 2408.16673 null
2024-08-29 Examination of Code generated by Large Language Models Robin Beer et.al. 2408.16601 link
2024-08-29 Enhancing Dialogue Generation in Werewolf Game Through Situation Analysis and Persuasion Strategies Zhiyang Qi et.al. 2408.16586 null
2024-08-29 CNIMA: A Universal Evaluation Framework and Automated Approach for Assessing Second Language Dialogues Rena Gao et.al. 2408.16518 null
2024-08-29 LLMs vs Established Text Augmentation Techniques for Classification: When do the Benefits Outweight the Costs? Jan Cegin et.al. 2408.16502 null
2024-08-28 Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders Min Shi et.al. 2408.15998 link
2024-08-28 BattleAgentBench: A Benchmark for Evaluating Cooperation and Competition Capabilities of Language Models in Multi-Agent Systems Wei Wang et.al. 2408.15971 null
2024-08-28 More Text, Less Point: Towards 3D Data-Efficient Point-Language Understanding Yuan Tang et.al. 2408.15966 link
2024-08-28 Atari-GPT: Investigating the Capabilities of Multimodal Large Language Models as Low-Level Policies for Atari Games Nicholas R. Waytowich et.al. 2408.15950 null
2024-08-28 Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models Yuncheng Yang et.al. 2408.15915 link
2024-08-28 Decentralized LLM Inference over Edge Networks with Energy Harvesting Aria Khoshsirat et.al. 2408.15907 null
2024-08-28 LLM-Based Multi-Hop Question Answering with Knowledge Graph Integration in Evolving Environments Ruirui Chen et.al. 2408.15903 null
2024-08-28 Nexus: Specialization meets Adaptability for Efficiently Training Mixture of Experts Nikolas Gritsch et.al. 2408.15901 null
2024-08-28 Bias in LLMs as Annotators: The Effect of Party Cues on Labelling Decision by Large Language Models Sebastian Vallejo Vera et.al. 2408.15895 null
2024-08-28 Persuasion Games using Large Language Models Ganesh Prasath Ramani et.al. 2408.15879 null
2024-08-27 Generative Verifiers: Reward Modeling as Next-Token Prediction Lunjun Zhang et.al. 2408.15240 null
2024-08-27 LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet Nathaniel Li et.al. 2408.15221 null
2024-08-27 Investigating Coverage Criteria in Large Language Models: An In-Depth Study Through Jailbreak Attacks Shide Zhou et.al. 2408.15207 null
2024-08-27 Leveraging Hallucinations to Reduce Manual Prompt Dependency in Promptable Segmentation Jian Hu et.al. 2408.15205 link
2024-08-27 Can Unconfident LLM Annotations Be Used for Confident Conclusions? Kristina Gligorić et.al. 2408.15204 link
2024-08-27 Unlocking Potential in Pre-Trained Music Language Models for Versatile Multi-Track Music Arrangement Longshen Ou et.al. 2408.15176 null
2024-08-27 X-Reflect: Cross-Reflection Prompting for Multimodal Recommendation Hanjia Lyu et.al. 2408.15172 null
2024-08-27 Measuring text summarization factuality using atomic facts entailment metrics in the context of retrieval augmented generation N. E. Kriman et.al. 2408.15171 null
2024-08-27 BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Competitive Large Language Model Baseline Guosheng Dong et.al. 2408.15079 null
2024-08-27 Constraining Participation: Affordances of Feedback Features in Interfaces to Large Language Models Ned Cooper et.al. 2408.15066 null
2024-08-27 Step-by-Step Unmasking for Parameter-Efficient Fine-tuning of Large Language Models Aradhye Agarwal et.al. 2408.14470 null
2024-08-26 Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos Qirui Chen et.al. 2408.14469 link
2024-08-26 Explicit Inductive Inference using Large Language Models Tianyang Liu et.al. 2408.14467 null
2024-08-26 Evaluating Large Language Models on Spatial Tasks: A Multi-Task Benchmarking Study Liuchang Xu Shuo Zhao et.al. 2408.14438 null
2024-08-26 CHARTOM: A Visual Theory-of-Mind Benchmark for Multimodal Large Language Models Shubham Bharti et.al. 2408.14419 null
2024-08-26 MEDSAGE: Enhancing Robustness of Medical Dialogue Summarization to ASR Errors with LLM-generated Synthetic Dialogues Kuluhan Binici et.al. 2408.14418 null
2024-08-26 Language-specific Calibration for Pruning Multilingual Language Models Simon Kurz et.al. 2408.14398 null
2024-08-26 Reprogramming Foundational Large Language Models (LLMs) for Enterprise Adoption for Spatio-Temporal Forecasting Applications: Unveiling a New Era in Copilot-Guided Cross-Modal Time Series Representation Learning Sakhinana Sagar Srinivas et.al. 2408.14387 null
2024-08-26 Probing Causality Manipulation of Large Language Models Chenyang Zhang et.al. 2408.14380 link
2024-08-26 SWE-bench-java: A GitHub Issue Resolving Benchmark for Java Daoguang Zan et.al. 2408.14354 link
2024-08-23 MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? Yi-Fan Zhang et.al. 2408.13257 null
2024-08-23 Domain-specific long text classification from sparse relevant information Célia D’Cruz et.al. 2408.13253 null
2024-08-23 Foundational Model for Electron Micrograph Analysis: Instruction-Tuning Small-Scale Language-and-Vision Assistant for Enterprise Adoption Sakhinana Sagar Srinivas et.al. 2408.13248 null
2024-08-23 Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time Yingyu Liang et.al. 2408.13233 null
2024-08-23 EUR-USD Exchange Rate Forecasting Based on Information Fusion with Large Language Models and Deep Learning Methods Hongcheng Ding et.al. 2408.13214 null
2024-08-23 DOMAINEVAL: An Auto-Constructed Benchmark for Multi-Domain Code Generation Qiming Zhu et.al. 2408.13204 null
2024-08-23 Instruct-DeBERTa: A Hybrid Approach for Aspect-based Sentiment Analysis on Textual Reviews Dineth Jayakody et.al. 2408.13202 null
2024-08-23 Can LLM be a Good Path Planner based on Prompt Engineering? Mitigating the Hallucination for Path Planning Hourui Deng et.al. 2408.13184 null
2024-08-23 IntelliCare: Improving Healthcare Analysis with Variance-Controlled Patient-Level Knowledge from Large Language Models Zhihao Yu et.al. 2408.13073 null
2024-08-23 Guiding IoT-Based Healthcare Alert Systems with Large Language Models Yulan Gao et.al. 2408.13071 null
2024-08-22 Controllable Text Generation for Large Language Models: A Survey Xun Liang et.al. 2408.12599 link
2024-08-22 RuleAlign: Making Large Language Models Better Physicians with Diagnostic Rule Alignment Xiaohan Wang et.al. 2408.12579 null
2024-08-22 Jamba-1.5: Hybrid Transformer-Mamba Models at Scale Jamba Team et.al. 2408.12570 link
2024-08-22 ssProp: Energy-Efficient Training for Convolutional Neural Networks with Scheduled Sparse Back Propagation Lujia Zhong et.al. 2408.12561 link
2024-08-22 Towards Evaluating and Building Versatile Large Language Models for Medicine Chaoyi Wu et.al. 2408.12547 link
2024-08-22 Show-o: One Single Transformer to Unify Multimodal Understanding and Generation Jinheng Xie et.al. 2408.12528 link
2024-08-22 MEDCO: Medical Education Copilots Based on A Multi-Agent Framework Hao Wei et.al. 2408.12496 null
2024-08-22 GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models Kunsheng Tang et.al. 2408.12494 link
2024-08-22 Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese Khang T. Doan et.al. 2408.12480 null
2024-08-22 Frame Order Matters: A Temporal Sequence-Aware Model for Few-Shot Action Recognition Bozheng Li et.al. 2408.12475 null
2024-08-21 SEA: Supervised Embedding Alignment for Token-Level Visual-Textual Integration in MLLMs Yuanyang Yin et.al. 2408.11813 null
2024-08-21 Story3D-Agent: Exploring 3D Storytelling Visualization with Large Language Models Yuzhou Huang et.al. 2408.11801 null
2024-08-21 PermitQA: A Benchmark for Retrieval Augmented Generation in Wind Siting and Permitting domain Rounak Meyur et.al. 2408.11800 null
2024-08-21 EE-MLLM: A Data-Efficient and Compute-Efficient Multimodal Large Language Model Feipeng Ma et.al. 2408.11795 null
2024-08-21 Leveraging Chemistry Foundation Models to Facilitate Structure Focused Retrieval Augmented Generation in Multi-Agent Workflows for Catalyst and Materials Design Nathaniel H. Park et.al. 2408.11793 null
2024-08-21 Critique-out-Loud Reward Models Zachary Ankner et.al. 2408.11791 link
2024-08-21 DreamFactory: Pioneering Multi-Scene Long Video Generation with a Multi-Agent Framework Zhifei Xie et.al. 2408.11788 null
2024-08-21 Personality Alignment of Large Language Models Minjun Zhu et.al. 2408.11779 link
2024-08-21 Leveraging Fine-Tuned Retrieval-Augmented Generation with Long-Context Support: For 3GPP Standards Omar Erak et.al. 2408.11775 link
2024-08-21 Against All Odds: Overcoming Typology, Script, and Language Confusion in Multilingual Embedding Inversion Attacks Yiyi Chen et.al. 2408.11749 null
2024-08-20 Revisiting VerilogEval: Newer LLMs, In-Context Learning, and Specification-to-RTL Tasks Nathaniel Pinckney et.al. 2408.11053 null
2024-08-20 FLAME: Learning to Navigate with Multimodal LLM in Urban Environments Yunzhe Xu et.al. 2408.11051 link
2024-08-20 MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding Jian Chen et.al. 2408.11049 link
2024-08-20 Reconciling Methodological Paradigms: Employing Large Language Models as Novice Qualitative Research Assistants in Talent Management Research Sreyoshi Bhaduri et.al. 2408.11043 null
2024-08-20 Scaling Law with Learning Rate Annealing Howe Tissue et.al. 2408.11029 null
2024-08-20 Athena: Safe Autonomous Agents with Verbal Contrastive Learning Tanmana Sadhu et.al. 2408.11021 null
2024-08-20 While GitHub Copilot Excels at Coding, Does It Ensure Responsible Output? Wen Cheng et.al. 2408.11006 link
2024-08-20 CTP-LLM: Clinical Trial Phase Transition Prediction Using Large Language Models Michael Reinisch et.al. 2408.10995 null
2024-08-20 Dr.Academy: A Benchmark for Evaluating Questioning Capability in Education for Large Language Models Yuyan Chen et.al. 2408.10947 null
2024-08-20 Large Language Model Driven Recommendation Anton Korikov et.al. 2408.10946 null
2024-08-19 Demystifying the Communication Characteristics for Distributed Transformer Models Quentin Anthony et.al. 2408.10197 null
2024-08-19 SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models Anke Tang et.al. 2408.10174 link
2024-08-19 Customizing Language Models with Instance-wise LoRA for Sequential Recommendation Xiaoyu Kong et.al. 2408.10159 null
2024-08-19 Multilingual Needle in a Haystack: Investigating Long-Context Behavior of Multilingual Large Language Models Amey Hengle et.al. 2408.10151 null
2024-08-19 In-Context Learning with Representations: Contextual Generalization of Trained Transformers Tong Yang et.al. 2408.10147 null
2024-08-19 Instruction Finetuning for Leaderboard Generation from Empirical AI Research Salomon Kabongo et.al. 2408.10141 null
2024-08-19 Molecular Graph Representation Learning Integrating Large Language Models with Domain-specific Small Models Tianyu Zhang et.al. 2408.10124 link
2024-08-20 PLUTUS: A Well Pre-trained Large Unified Transformer can Unveil Financial Time Series Regularities Yuanjian Xu et.al. 2408.10111 null
2024-08-19 Recent Surge in Public Interest in Transportation: Sentiment Analysis of Baidu Apollo Go Using Weibo Data Shiqi Wang et.al. 2408.10088 link
2024-08-19 ARMADA: Attribute-Based Multimodal Data Augmentation Xiaomeng Jin et.al. 2408.10086 null
2024-08-16 PEDAL: Enhancing Greedy Decoding with Large Language Models using Diverse Exemplars Sumanth Prabhu et.al. 2408.08869 null
2024-08-16 Visual Agents as Fast and Slow Thinkers Guangyan Sun et.al. 2408.08862 null
2024-08-16 ECG-Chat: A Large ECG-Language Model for Cardiac Disease Diagnosis Yubao Zhao et.al. 2408.08849 null
2024-08-16 PsychoLex: Unveiling the Psychological Mind of Large Language Models Mohammad Amin Abbasi et.al. 2408.08848 null
2024-08-16 FLEXTAF: Enhancing Table Reasoning with Flexible Tabular Formats Xuanliang Zhang et.al. 2408.08841 link
2024-08-16 Artificial Intelligence and Strategic Decision-Making: Evidence from Entrepreneurs and Investors Felipe A. Csaszar et.al. 2408.08811 null
2024-08-16 Constructing Domain-Specific Evaluation Sets for LLM-as-a-judge Ravi Raju et.al. 2408.08808 null
2024-08-16 EmoDynamiX: Emotional Support Dialogue Strategy Prediction by Modelling MiXed Emotions and Discourse Dynamics Chenwei Wan et.al. 2408.08782 link
2024-08-16 Large Language Models Might Not Care What You Are Saying: Prompt Format Beats Descriptions Chenming Tang et.al. 2408.08780 null
2024-08-16 DAC: Decomposed Automation Correction for Text-to-SQL Dingzirui Wang et.al. 2408.08779 link
2024-08-15 Can Large Language Models Understand Symbolic Graphics Programs? Zeju Qiu et.al. 2408.08313 null
2024-08-15 ScalingFilter: Assessing Data Quality through Inverse Utilization of Scaling Laws Ruihang Li et.al. 2408.08310 null
2024-08-15 Benchmarking the Capabilities of Large Language Models in Transportation System Engineering: Accuracy, Consistency, and Reasoning Behaviors Usman Syed et.al. 2408.08302 null
2024-08-15 HELP: Hierarchical Embeddings-based Log Parsing Andy Xu et.al. 2408.08300 null
2024-08-15 The ShareLM Collection and Plugin: Contributing Human-Model Chats for the Benefit of the Community Shachar Don-Yehiya et.al. 2408.08291 null
2024-08-15 Autonomous Behavior Planning For Humanoid Loco-manipulation Through Grounded Language Model Jin Wang et.al. 2408.08282 null
2024-08-15 BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts Qizhen Zhang et.al. 2408.08274 null
2024-08-15 DaRec: A Disentangled Alignment Framework for Large Language Model and Recommender System Xihong Yang et.al. 2408.08231 null
2024-08-15 RED-CT: A Systems Design Methodology for Using LLM-labeled Data to Train and Deploy Edge Classifiers for Computational Social Science David Farr et.al. 2408.08217 null
2024-08-15 Does Reasoning Emerge? Examining the Probabilities of Causation in Large Language Models Javier González et.al. 2408.08210 null
2024-08-14 The Death of Schema Linking? Text-to-SQL in the Age of Well-Reasoned Language Models Karime Maamari et.al. 2408.07702 null
2024-08-15 Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities Enneng Yang et.al. 2408.07666 link
2024-08-14 Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models Yi-Cheng Lin et.al. 2408.07665 null
2024-08-14 Alignment-Enhanced Decoding: Defending via Token-Level Adaptive Refining of Probability Distributions Quan Liu et.al. 2408.07663 link
2024-08-14 WeKnow-RAG: An Adaptive Approach for Retrieval-Augmented Generation Integrating Web Search and Knowledge Graphs Weijian Xie et.al. 2408.07611 null
2024-08-14 Transformers and Large Language Models for Efficient Intrusion Detection Systems: A Comprehensive Survey Hamza Kheddar et.al. 2408.07583 null
2024-08-15 MathScape: Evaluating MLLMs in multimodal Math Scenarios through a Hierarchical Benchmark Minxuan Zhou et.al. 2408.07543 null
2024-08-14 Usefulness of data flow diagrams and large language models for security threat validation: a registered report Winnie Bahati Mbaka et.al. 2408.07537 null
2024-08-14 Development of a Multi-Agent Clinical Decision Support System for Korean Triage and Acuity Scale (KTAS)-Based Triage and Treatment Planning in Emergency Departments Seungjun Han et.al. 2408.07531 null
2024-08-14 Large Language Models Know What Makes Exemplary Contexts Quanyu Long et.al. 2408.07505 null
2024-08-13 Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents Kexun Zhang et.al. 2408.07060 link
2024-08-13 LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs Yushi Bai et.al. 2408.07055 link
2024-08-13 PathInsight: Instruction Tuning of Multimodal Datasets and Models for Intelligence Assisted Diagnosis in Histopathology Xiaomin Wu et.al. 2408.07037 null
2024-08-13 Casper: Prompt Sanitization for Protecting User Privacy in Web-Based Large Language Models Chun Jie Chong et.al. 2408.07004 null
2024-08-13 Generative AI for automatic topic labelling Diego Kozlowski et.al. 2408.07003 null
2024-08-13 LLMs can Schedule Henrik Abgaryan et.al. 2408.06993 link
2024-08-13 OpenResearcher: Unleashing AI for Accelerated Scientific Research Yuxiang Zheng et.al. 2408.06941 link
2024-08-13 Evaluating Cultural Adaptability of a Large Language Model via Simulation of Synthetic Personas Louis Kwok et.al. 2408.06929 null
2024-08-13 Re-TASK: Revisiting LLM Tasks from Capability, Skill, and Knowledge Perspectives Zhihu Wang et.al. 2408.06904 null
2024-08-13 Leveraging Language Models for Emotion and Behavior Analysis in Education Kaito Tanaka et.al. 2408.06874 null
2024-08-12 Animate, or Inanimate, That is the Question for Large Language Models Leonardo Ranaldi et.al. 2408.06332 null
2024-08-12 Can We Rely on LLM Agents to Draft Long-Horizon Plans? Let’s Take TravelPlanner as an Example Yanan Chen et.al. 2408.06318 null
2024-08-12 Long-Form Answers to Visual Questions from Blind and Low Vision People Mina Huh et.al. 2408.06303 null
2024-08-12 The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery Chris Lu et.al. 2408.06292 link
2024-08-12 MovieSum: An Abstractive Summarization Dataset for Movie Screenplays Rohit Saxena et.al. 2408.06281 link
2024-08-12 Review-driven Personalized Preference Reasoning with Large Language Models for Recommendation Jieyong Kim et.al. 2408.06276 null
2024-08-12 FuxiTranyu: A Multilingual Large Language Model Trained with Balanced Data Haoran Sun et.al. 2408.06273 link
2024-08-12 A RAG-Based Question-Answering Solution for Cyber-Attack Investigation and Attribution Sampath Rajapaksha et.al. 2408.06272 null
2024-08-12 Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment Karel D’Oosterlinck et.al. 2408.06266 link
2024-08-12 On Effects of Steering Latent Representation for Large Language Model Unlearning Dang Huu-Tien et.al. 2408.06223 null
2024-08-10 Preserving Privacy in Large Language Models: A Survey on Current Threats and Solutions Michele Miranda et.al. 2408.05212 link
2024-08-09 VITA: Towards Open-Source Interactive Omni Multimodal LLM Chaoyou Fu et.al. 2408.05211 null
2024-08-09 Evaluating the capability of large language models to personalize science texts for diverse middle-school-age learners Michael Vaccaro Jr et.al. 2408.05204 null
2024-08-09 TaSL: Task Skill Localization and Consolidation for Language Model Continual Learning Yujie Feng et.al. 2408.05200 null
2024-08-09 AttackER: Towards Enhancing Cyber-Attack Attribution with a Named Entity Recognition Dataset Pritam Deka et.al. 2408.05149 null
2024-08-09 A Hybrid RAG System with Comprehensive Enhancement on Complex Reasoning Ye Yuan et.al. 2408.05141 null
2024-08-09 Is ChatGPT a Good Software Librarian? An Exploratory Study on the Use of ChatGPT for Software Library Recommendations Jasmine Latendresse et.al. 2408.05128 null
2024-08-09 Large Language Models and Thematic Analysis: Human-AI Synergy in Researching Hate Speech on Social Media Petre Breazu et.al. 2408.05126 null
2024-08-09 Sportify: Question Answering with Embedded Visualizations and Personified Narratives for Sports Video Chunggi Lee et.al. 2408.05123 null
2024-08-09 A Survey of NL2SQL with Large Language Models: Where are we, and where are we going? Xinyu Liu et.al. 2408.05109 link
2024-08-08 Transformer Explainer: Interactive Learning of Text-Generative Models Aeree Cho et.al. 2408.04619 link
2024-08-08 Better Alignment with Instruction Back-and-Forth Translation Thao Nguyen et.al. 2408.04614 null
2024-08-08 Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models Qirui Jiao et.al. 2408.04594 link
2024-08-08 Towards Resilient and Efficient LLMs: A Comparative Study of Efficiency, Performance, and Adversarial Robustness Xiaojing Fan et.al. 2408.04585 null
2024-08-08 SCENE: Evaluating Explainable AI Techniques Using Soft Counterfactuals Haoran Zheng et.al. 2408.04575 null
2024-08-08 Learning Fine-Grained Grounded Citations for Attributed Large Language Models Lei Huang et.al. 2408.04568 link
2024-08-08 Bias-Aware Low-Rank Adaptation: Mitigating Catastrophic Inheritance of Large Language Models Yupeng Chang et.al. 2408.04556 link
2024-08-08 Compromesso! Italian Many-Shot Jailbreaks Undermine the Safety of Large Language Models Fabio Pernisi et.al. 2408.04522 null
2024-08-08 What You Need is What You Get: Theory of Mind for an LLM-Based Code Understanding Assistant Jonan Richards et.al. 2408.04477 null
2024-08-08 Can LLMs Beat Humans in Debating? A Dynamic Multi-agent Framework for Competitive Debate Yiqun Zhang et.al. 2408.04472 link
2024-08-07 How Well Can Vision Language Models See Image Details? Chenhui Gou et.al. 2408.03940 null
2024-08-07 SLIM-RAFT: A Novel Fine-Tuning Approach to Improve Cross-Linguistic Performance for Mercosur Common Nomenclature Vinícius Di Oliveira et.al. 2408.03936 null
2024-08-07 CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases Xiangyan Liu et.al. 2408.03910 link
2024-08-07 Decoding Biases: Automated Methods and LLM Judges for Gender Bias Detection in Language Models Shachi H Kumar et.al. 2408.03907 null
2024-08-07 From Data to Story: Towards Automatic Animated Data Video Creation with LLM-based Multi-Agent Systems Leixian Shen et.al. 2408.03876 null
2024-08-07 PackMamba: Efficient Processing of Variable-Length Sequences in Mamba training Haoran Xu et.al. 2408.03865 null
2024-08-07 GAIA – A Large Language Model for Advanced Power Dispatch Yuheng Cheng et.al. 2408.03847 null
2024-08-07 MaxMind: A Memory Loop Network to Enhance Software Productivity based on Large Language Models Yuchen Dong et.al. 2408.03841 null
2024-08-07 WalledEval: A Comprehensive Safety Evaluation Toolkit for Large Language Models Prannaya Gupta et.al. 2408.03837 link
2024-08-07 Target Prompting for Information Extraction with Vision Language Model Dipankar Medhi et.al. 2408.03834 null
2024-08-06 Pre-training and in-context learning IS Bayesian inference a la De Finetti Naimeng Ye et.al. 2408.03307 null
2024-08-06 TextIM: Part-aware Interactive Motion Synthesis from Text Siyuan Fan et.al. 2408.03302 null
2024-08-06 KaPO: Knowledge-aware Preference Optimization for Controllable Knowledge Selection in Retrieval-Augmented Language Models Ruizhe Zhang et.al. 2408.03297 null
2024-08-06 AMES: Asymmetric and Memory-Efficient Similarity Estimation for Instance-level Retrieval Pavel Suma et.al. 2408.03282 null
2024-08-07 StructEval: Deepen and Broaden Large Language Model Assessment via Structured Evaluation Boxi Cao et.al. 2408.03281 link
2024-08-06 Synthesizing Text-to-SQL Data from Weak and Strong LLMs Jiaxi Yang et.al. 2408.03256 null
2024-08-06 Unveiling Factual Recall Behaviors of Large Language Models through Knowledge Neurons Yifei Wang et.al. 2408.03247 link
2024-08-06 Leveraging Parameter Efficient Training Methods for Low Resource Text Classification: A Case Study in Marathi Pranita Deshmukh et.al. 2408.03172 null
2024-08-06 Conditioning LLMs with Emotion in Neural Machine Translation Charles Brazier et.al. 2408.03150 null
2024-08-06 Inference Optimizations for Large Language Models: Effects, Challenges, and Practical Considerations Leo Donisch et.al. 2408.03130 null
2024-08-05 Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining Dongyang Liu et.al. 2408.02657 link
2024-08-05 Can Reinforcement Learning Unlock the Hidden Dangers in Aligned Large Language Models? Mohammad Bahrami Karkevandi et.al. 2408.02651 null
2024-08-05 SEAS: Self-Evolving Adversarial Safety Optimization for Large Language Models Muxi Diao et.al. 2408.02632 null
2024-08-05 Language Model Can Listen While Speaking Ziyang Ma et.al. 2408.02622 null
2024-08-05 Progressively Selective Label Enhancement for Language Model Alignment Biao Liu et.al. 2408.02599 null
2024-08-05 Modelling Visual Semantics via Image Captioning to extract Enhanced Multi-Level Cross-Modal Semantic Incongruity Representation with Attention for Multimodal Sarcasm Detection Sajal Aggarwal et.al. 2408.02595 null
2024-08-05 Leveraging the Power of LLMs: A Fine-Tuning Approach for High-Quality Aspect-Based Summarization Ankan Mullick et.al. 2408.02584 null
2024-08-05 Evaluating and Enhancing LLMs Agent based on Theory of Mind in Guandan: A Multi-Player Cooperative Game under Imperfect Information Yauwai Yim et.al. 2408.02559 null
2024-08-05 Generative AI as a Service in 6G Edge-Cloud: Generation Task Offloading by In-context Learning Hao Zhou et.al. 2408.02549 null
2024-08-05 RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation Daniel Fleischer et.al. 2408.02545 link
2024-08-02 Prompt Recursive Search: A Living Framework with Adaptive Growth in LLM Auto-Prompting Xiangyu Zhao et.al. 2408.01423 null
2024-08-02 Mission Impossible: A Statistical Perspective on Jailbreaking LLMs Jingtong Su et.al. 2408.01420 null
2024-08-02 DebateQA: Evaluating Question Answering on Debatable Knowledge Rongwu Xu et.al. 2408.01419 null
2024-08-02 Talk Less, Interact Better: Evaluating In-context Conversational Adaptation in Multimodal LLMs Yilun Hua et.al. 2408.01417 null
2024-08-02 Coalitions of Large Language Models Increase the Robustness of AI Agents Prattyush Mangal et.al. 2408.01380 null
2024-08-02 Toward Automatic Relevance Judgment using Vision–Language Models for Image–Text Retrieval Evaluation Jheng-Hong Yang et.al. 2408.01363 null
2024-08-02 Hallu-PI: Evaluating Hallucination in Multi-modal Large Language Models within Perturbed Inputs Peng Ding et.al. 2408.01355 null
2024-08-02 MCGMark: An Encodable and Robust Online Watermark for LLM-Generated Malicious Code Kaiwen Ning et.al. 2408.01354 null
2024-08-02 Prompt Refinement or Fine-tuning? Best Practices for using LLMs in Computational Social Science Tasks Anders Giovanni Møller et.al. 2408.01346 null
2024-08-02 A Backbone for Long-Horizon Robot Task Understanding Xiaoshuai Chen et.al. 2408.01334 null
2024-08-01 AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation Mengkang Hu et.al. 2408.00764 link
2024-08-01 Tamper-Resistant Safeguards for Open-Weight LLMs Rishub Tamirisa et.al. 2408.00761 null
2024-08-01 DynamoLLM: Designing LLM Inference Clusters for Performance and Energy Efficiency Jovan Stojkovic et.al. 2408.00741 null
2024-08-01 Improving Retrieval-Augmented Generation in Medicine with Iterative Follow-up Questions Guangzhi Xiong et.al. 2408.00727 null
2024-08-01 An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models Yangzhen Wu et.al. 2408.00724 link
2024-08-01 Pathway to Secure and Trustworthy 6G for LLMs: Attacks, Defense, and Opportunities Sunder Ali Khowaja et.al. 2408.00722 null
2024-08-01 Improving Text Embeddings for Smaller Language Models Using Contrastive Fine-tuning Trapoom Ukarapol et.al. 2408.00690 link
2024-08-01 Can Developers Prompt? A Controlled Experiment for Code Documentation Generation Hans-Alexander Kruse et.al. 2408.00686 null
2024-08-01 AutoM3L: An Automated Multimodal Machine Learning Framework with Large Language Models Daqin Luo et.al. 2408.00665 null
2024-08-01 Disentangling Dense Embeddings with Sparse Autoencoders Charles O’Neill et.al. 2408.00657 null
2024-07-31 Vision-Language Model Based Handwriting Verification Mihir Chauhan et.al. 2407.21788 null
2024-07-31 Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs Shi Liu et.al. 2407.21771 null
2024-07-31 ReplanVLM: Replanning Robotic Tasks with Visual Language Models Aoran Mei et.al. 2407.21762 null
2024-07-31 Adaptive Retrieval-Augmented Generation for Conversational Systems Xi Wang et.al. 2407.21712 null
2024-07-31 CEAR: Automatic construction of a knowledge graph of chemical entities and roles from scientific literature Stefan Langer et.al. 2407.21708 null
2024-07-31 TransferTOD: A Generalizable Chinese Multi-Domain Task-Oriented Dialogue System with Transfer Capabilities Ming Zhang et.al. 2407.21693 null
2024-07-31 Synth-Empathy: Towards High-Quality Synthetic Empathy Data Hao Liang et.al. 2407.21669 link
2024-07-31 LLM-for-X: Application-agnostic Integration of Large Language Models to Support Personal Writing Workflows Lukas Teufelberger et.al. 2407.21593 null
2024-07-31 A Performance Study of LLM-Generated Code on Leetcode Tristan Coignion et.al. 2407.21579 null
2024-07-31 PMoE: Progressive Mixture of Experts with Asymmetric Transformer for Continual Learning Min Jae Jung et.al. 2407.21571 null
2024-07-30 ThinK: Thinner Key Cache by Query-Driven Pruning Yuhui Xu et.al. 2407.21018 link
2024-07-30 CLEFT: Language-Image Contrastive Learning with Efficient Large Language Model and Prompt Fine-Tuning Yuexi Du et.al. 2407.21011 link
2024-07-30 The Dual-Edged Sword of Technical Debt: Benefits and Issues Analyzed Through Developer Discussions Xiaozhou Li et.al. 2407.21007 null
2024-07-30 MoFO: Momentum-Filtered Optimizer for Mitigating Forgetting in LLM Fine-Tuning Yupeng Chen et.al. 2407.20999 null
2024-07-30 From Feature Importance to Natural Language Explanations Using LLMs with RAG Sule Tekkesinoglu et.al. 2407.20990 null
2024-07-30 Large Language Models (LLMs) for Semantic Communication in Edge-based IoT Networks Alakesh Kalita et.al. 2407.20970 null
2024-07-30 Automated Review Generation Method Based on Large Language Models Shican Wu et.al. 2407.20906 link
2024-07-30 ThinkRepair: Self-Directed Automated Program Repair Xin Yin et.al. 2407.20898 link
2024-07-30 Effective Black Box Testing of Sentiment Analysis Classification Networks Parsa Karbasizadeh et.al. 2407.20884 null
2024-07-30 Breaking Agents: Compromising Autonomous LLM Agents Through Malfunction Amplification Boyang Zhang et.al. 2407.20859 null
2024-07-29 Specify and Edit: Overcoming Ambiguity in Text-Based Image Editing Ekaterina Iakovleva et.al. 2407.20232 null
2024-07-29 Can Editing LLMs Inject Harm? Canyu Chen et.al. 2407.20224 link
2024-07-29 QAEA-DR: A Unified Text Augmentation Framework for Dense Retrieval Hongming Tan et.al. 2407.20207 null
2024-07-29 MindSearch: Mimicking Human Minds Elicits Deep AI Searcher Zehui Chen et.al. 2407.20183 link
2024-07-29 Advancing Multimodal Large Language Models in Chart Question Answering with Visualization-Referenced Instruction Tuning Xingchen Zeng et.al. 2407.20174 link
2024-07-29 Diffusion Feedback Helps CLIP See Better Wenxuan Wang et.al. 2407.20171 link
2024-07-29 Language-Conditioned Offline RL for Multi-Robot Navigation Steven Morad et.al. 2407.20164 null
2024-07-29 rLLM: Relational Table Learning with LLMs Weichen Li et.al. 2407.20157 link
2024-07-29 ByteCheckpoint: A Unified Checkpointing System for LLM Development Borui Wan et.al. 2407.20143 null
2024-07-29 Orca: Ocean Significant Wave Height Estimation with Spatio-temporally Aware Large Language Models Zhe Li et.al. 2407.20053 null
2024-07-26 Small Molecule Optimization with Large Language Models Philipp Guevorguian et.al. 2407.18897 link
2024-07-26 Human-artificial intelligence teaming for scientific information extraction from data-driven additive manufacturing research using large language models Mutahar Safdar et.al. 2407.18827 null
2024-07-26 Automatic Detection of Moral Values in Music Lyrics Vjosa Preniqi et.al. 2407.18787 link
2024-07-26 The power of Prompts: Evaluating and Mitigating Gender Bias in MT with LLMs Aleix Sant et.al. 2407.18786 null
2024-07-26 TAGIFY: LLM-powered Tagging Interface for Improved Data Findability on OGD portals Kevin Kliimask et.al. 2407.18764 null
2024-07-26 Knowledge Graph Structure as Prompt: Improving Small Language Models Capabilities for Knowledge-based Causal Discovery Yuni Susanti et.al. 2407.18752 link
2024-07-26 Towards Effective and Efficient Continual Pre-training of Large Language Models Jie Chen et.al. 2407.18743 link
2024-07-26 Towards Generalized Offensive Language Identification Alphaeus Dmonte et.al. 2407.18738 null
2024-07-26 LLASP: Fine-tuning Large Language Models for Answer Set Programming Erica Coppolillo et.al. 2407.18723 null
2024-07-26 Neurosymbolic AI for Enhancing Instructability in Generative AI Amit Sheth et.al. 2407.18722 null
2024-07-25 Recursive Introspection: Teaching Language Model Agents How to Self-Improve Yuxiao Qu et.al. 2407.18219 null
2024-07-25 Exploring Scaling Trends in LLM Robustness Nikolaus Howe et.al. 2407.18213 null
2024-07-25 Unlocking Tokens as Data Points for Generalization Bounds on Larger Language Models Sanae Lotfi et.al. 2407.18158 null
2024-07-25 Dallah: A Dialect-Aware Multimodal Large Language Model for Arabic Fakhraddin Alwajih et.al. 2407.18129 null
2024-07-25 Fine-Tuning Large Language Models for Stock Return Prediction Using Newsflow Tian Guo et.al. 2407.18103 null
2024-07-25 PEFT-U: Parameter-Efficient Fine-Tuning for User Personalization Christopher Clarke et.al. 2407.18078 link
2024-07-25 C2P: Featuring Large Language Models with Causal Reasoning Abdolmahdi Bagheri et.al. 2407.18069 null
2024-07-25 ComPeer: A Generative Conversational Agent for Proactive Peer Support Tianjian Liu et.al. 2407.18064 null
2024-07-25 Audio Entailment: Assessing Deductive Reasoning for Audio Understanding Soham Deshmukh et.al. 2407.18062 link
2024-07-25 Difficulty Estimation and Simplification of French Text Using LLMs Henri Jamet et.al. 2407.18061 null
2024-07-24 I Could’ve Asked That: Reformulating Unanswerable Questions Wenting Zhao et.al. 2407.17469 link
2024-07-24 WildHallucinations: Evaluating Long-form Factuality in LLMs with Real-World Entity Queries Wenting Zhao et.al. 2407.17468 null
2024-07-24 CMR Scaling Law: Predicting Critical Mixture Ratios for Continual Pre-training of Language Models Jiawei Gu et.al. 2407.17467 null
2024-07-24 VILA$^2$: VILA Augmented VILA Yunhao Fang et.al. 2407.17453 null
2024-07-24 Generative AI in Evidence-Based Software Engineering: A White Paper Matteo Esposito et.al. 2407.17440 null
2024-07-24 Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data? Michael-Andrei Panaitescu-Liess et.al. 2407.17417 null
2024-07-24 (PASS) Visual Prompt Locates Good Structure Sparsity through a Recurrent HyperNetwork Tianjin Huang et.al. 2407.17412 null
2024-07-24 Grammar-based Game Description Generation using Large Language Models Tsunehiko Tanaka et.al. 2407.17404 null
2024-07-24 3D Question Answering for City Scene Understanding Penglei Sun et.al. 2407.17398 null
2024-07-24 ViPer: Visual Personalization of Generative Models via Individual Preference Learning Sogand Salehi et.al. 2407.17365 null
2024-07-23 Can Large Language Models Automatically Jailbreak GPT-4V? Yuanwei Wu et.al. 2407.16686 null
2024-07-23 RedAgent: Red Teaming Large Language Models with Context-aware Autonomous Language Agent Huiyu Xu et.al. 2407.16667 null
2024-07-23 Course-Correction: Safety Alignment Using Synthetic Preferences Rongwu Xu et.al. 2407.16637 link
2024-07-23 Lawma: The Power of Specialization for Legal Tasks Ricardo Dominguez-Olmedo et.al. 2407.16615 null
2024-07-23 Shared Imagination: LLMs Hallucinate Alike Yilun Zhou et.al. 2407.16604 null
2024-07-23 Exploring Automatic Cryptographic API Misuse Detection in the Era of LLMs Yifan Xia et.al. 2407.16576 null
2024-07-23 Retrieve, Generate, Evaluate: A Case Study for Medical Paraphrases Generation with Small Language Models Ioana Buhnila et.al. 2407.16565 null
2024-07-23 Patched RTC: evaluating LLMs for diverse software development tasks Asankhaya Sharma et.al. 2407.16557 link
2024-07-24 MicroEmo: Time-Sensitive Multimodal Emotion Recognition with Micro-Expression Dynamics in Video Dialogues Liyun Zhang et.al. 2407.16552 null
2024-07-23 Imperfect Vision Encoders: Efficient and Robust Tuning for Vision-Language Models Aristeidis Panos et.al. 2407.16526 null
2024-07-22 AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description Junyu Xie et.al. 2407.15850 link
2024-07-22 LLMmap: Fingerprinting For Large Language Models Dario Pasquini et.al. 2407.15847 null
2024-07-22 SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models Mingze Xu et.al. 2407.15841 link
2024-07-22 MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity Yangzhou Liu et.al. 2407.15838 link
2024-07-22 dMel: Speech Tokenization made Simple He Bai et.al. 2407.15835 link
2024-07-22 Accelerating Pre-training of Multimodal LLMs via Chain-of-Sight Ziyuan Huang et.al. 2407.15819 null
2024-07-22 Extracting Structured Insights from Financial News: An Augmented LLM Driven Approach Rian Dolphin et.al. 2407.15788 null
2024-07-22 MoRSE: Bridging the Gap in Cybersecurity Expertise with Retrieval Augmented Generation Marco Simoni et.al. 2407.15748 null
2024-07-22 OMoS-QA: A Dataset for Cross-Lingual Extractive Question Answering in a German Migration Context Steffen Kleinle et.al. 2407.15736 null
2024-07-22 TaskGen: A Task-Based, Memory-Infused Agentic Framework using StrictJSON John Chong Min Tan et.al. 2407.15734 link
2024-07-19 Internal Consistency and Self-Feedback in Large Language Models: A Survey Xun Liang et.al. 2407.14507 link
2024-07-19 On Pre-training of Multimodal Language Models Customized for Chart Understanding Wan-Cyuan Fan et.al. 2407.14506 null
2024-07-19 Evaluating the Reliability of Self-Explanations in Large Language Models Korbinian Randl et.al. 2407.14487 link
2024-07-19 Contrastive Learning with Counterfactual Explanations for Radiology Report Generation Mingjie Li et.al. 2407.14474 null
2024-07-19 Check-Eval: A Checklist-based Approach for Evaluating Text Quality Jayr Pereira et.al. 2407.14467 null
2024-07-19 Undermining Mental Proof: How AI Can Make Cooperation Harder by Making Thinking Easier Zachary Wojtowicz et.al. 2407.14452 null
2024-07-19 From Instruction to Insight: Exploring the Functional and Semantic Roles of Text in Interactive Dashboards Nicole Sultanum et.al. 2407.14451 null
2024-07-19 Token-level Correlation-guided Compression for Efficient Multimodal Document Understanding Renshan Zhang et.al. 2407.14439 link
2024-07-19 The Vision of Autonomic Computing: Can LLMs Make It a Reality? Zhiyang Zhang et.al. 2407.14402 null
2024-07-19 Open Artificial Knowledge Vadim Borisov et.al. 2407.14371 null
2024-07-18 Visual Haystacks: Answering Harder Questions About Sets of Images Tsung-Han Wu et.al. 2407.13766 link
2024-07-18 SegPoint: Segment Any Point Cloud via Large Language Model Shuting He et.al. 2407.13761 null
2024-07-18 Black-Box Opinion Manipulation Attacks to Retrieval-Augmented Generation of Large Language Models Zhuo Chen et.al. 2407.13757 null
2024-07-18 CellularLint: A Systematic Approach to Identify Inconsistent Behavior in Cellular Network Specifications Mirza Masfiqur Rahman et.al. 2407.13742 null
2024-07-18 Baba Is AI: Break the Rules to Beat the Benchmark Nathan Cloos et.al. 2407.13729 null
2024-07-18 CoDefeater: Using LLMs To Find Defeaters in Assurance Cases Usman Gohar et.al. 2407.13717 null
2024-07-18 Understanding Reference Policies in Direct Preference Optimization Yixin Liu et.al. 2407.13709 link
2024-07-18 A Comprehensive Review of Recommender Systems: Transitioning from Theory to Practice Shaina Raza et.al. 2407.13699 null
2024-07-18 Prover-Verifier Games improve legibility of LLM outputs Jan Hendrik Kirchner et.al. 2407.13692 link
2024-07-18 COMCAT: Leveraging Human Judgment to Improve Automatic Documentation and Summarization Skyler Grandel et.al. 2407.13648 null
2024-07-17 LookupViT: Compressing visual information to a limited number of tokens Rajat Koner et.al. 2407.12753 null
2024-07-17 EchoSight: Advancing Visual-Language Models with Wiki Knowledge Yibin Yan et.al. 2407.12735 null
2024-07-17 NL2Contact: Natural Language Guided 3D Hand-Object Contact Modeling with Diffusion Model Zhongqun Zhang et.al. 2407.12727 null
2024-07-17 Is Sarcasm Detection A Step-by-Step Reasoning Process in Large Language Models? Ben Yao et.al. 2407.12725 null
2024-07-17 The Future of Learning: Large Language Models through the Lens of Students He Zhang et.al. 2407.12723 null
2024-07-17 MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models Leyang Shen et.al. 2407.12709 link
2024-07-17 Patch-Level Training for Large Language Models Chenze Shao et.al. 2407.12665 link
2024-07-17 Zero-shot Text-guided Infinite Image Synthesis with LLM guidance Soyeong Kwon et.al. 2407.12642 null
2024-07-17 Harnessing the Power of Artificial Intelligence to Vitalize Endangered Indigenous Languages: Technologies and Experiences Claudio Pinhanez et.al. 2407.12620 null
2024-07-17 AudienceView: AI-Assisted Interpretation of Audience Feedback in Journalism William Brannon et.al. 2407.12613 link
2024-07-16 UrbanWorld: An Urban World Model for 3D City Generation Yu Shang et.al. 2407.11965 null
2024-07-16 NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window? Mo Li et.al. 2407.11963 link
2024-07-16 Code Documentation and Analysis to Secure Software Development Paul Attie et.al. 2407.11934 null
2024-07-16 What’s Wrong? Refining Meeting Summaries with LLM Feedback Frederic Kirstein et.al. 2407.11919 null
2024-07-16 Ascend-CC: Confidential Computing on Heterogeneous NPU for Emerging Generative AI Workloads Aritra Dhar et.al. 2407.11888 null
2024-07-16 Schema Matching with Large Language Models: an Experimental Study Marcel Parciak et.al. 2407.11852 link
2024-07-16 LoFTI: Localization and Factuality Transfer to Indian Locales Sona Elza Simon et.al. 2407.11833 link
2024-07-16 GPT Assisted Annotation of Rhetorical and Linguistic Features for Interpretable Propaganda Technique Detection in News Text Kyle Hamilton et.al. 2407.11827 null
2024-07-16 PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation Branden Butler et.al. 2407.11798 null
2024-07-16 Large Language Models as Misleading Assistants in Conversation Betty Li Hou et.al. 2407.11789 null
2024-07-15 VGBench: Evaluating Large Language Models on Vector Graphics Understanding and Generation Bocheng Zou et.al. 2407.10972 link
2024-07-15 Q-Sparse: All Large Language Models can be Fully Sparsely-Activated Hongyu Wang et.al. 2407.10969 null
2024-07-15 No Train, all Gain: Self-Supervised Gradients Improve Deep Frozen Representations Walter Simoncini et.al. 2407.10964 link
2024-07-15 Fast Matrix Multiplications for Lookup Table-Quantized LLMs Han Guo et.al. 2407.10960 link
2024-07-15 MMM: Multilingual Mutual Reinforcement Effect Mix Datasets & Test with Open-domain Information Extraction Large Language Models Chengguang Gan et.al. 2407.10953 null
2024-07-15 Can Textual Semantics Mitigate Sounding Object Segmentation Preference? Yaoting Wang et.al. 2407.10947 link
2024-07-15 GRUtopia: Dream General Robots in a City at Scale Hanqing Wang et.al. 2407.10943 link
2024-07-15 Benchmarking Vision Language Models for Cultural Understanding Shravan Nayak et.al. 2407.10920 null
2024-07-15 FinDKG: Dynamic Knowledge Graphs with Large Language Models for Detecting Global Trends in Financial Markets Xiaohui Victor Li et.al. 2407.10909 link
2024-07-15 Hey, That’s My Model! Introducing Chain & Hash, An LLM Fingerprinting Technique Mark Russinovich et.al. 2407.10887 null
2024-07-12 FairyLandAI: Personalized Fairy Tales utilizing ChatGPT and DALLE-3 Georgios Makridis et.al. 2407.09467 null
2024-07-12 Human-like Episodic Memory for Infinite Context LLMs Zafeirios Fountas et.al. 2407.09450 link
2024-07-12 ASTPrompter: Weakly Supervised Automated Language Model Red-Teaming to Identify Likely Toxic Prompts Amelia F. Hardy et.al. 2407.09447 null
2024-07-12 MUSCLE: A Model Update Strategy for Compatible LLM Evolution Jessica Echterhoff et.al. 2407.09435 null
2024-07-12 Open (Clinical) LLMs are Sensitive to Instruction Phrasings Alberto Mario Ceballos Arroyo et.al. 2407.09429 null
2024-07-12 TelecomGPT: A Framework to Build Telecom-Specific Large Language Models Hang Zou et.al. 2407.09424 null
2024-07-12 Mitigating Entity-Level Hallucination in Large Language Models Weihang Su et.al. 2407.09417 link
2024-07-12 SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers Shraman Pramanick et.al. 2407.09413 link
2024-07-12 PersonaRAG: Enhancing Retrieval-Augmented Generation Systems with User-Centric Agents Saber Zerhoudi et.al. 2407.09394 link
2024-07-12 GAVEL: Generating Games Via Evolution and Language Models Graham Todd et.al. 2407.09388 link
2024-07-11 MAVIS: Mathematical Visual Instruction Tuning Renrui Zhang et.al. 2407.08739 link
2024-07-11 Real-Time Anomaly Detection and Reactive Planning with Large Language Models Rohan Sinha et.al. 2407.08735 null
2024-07-11 Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist Zihao Zhou et.al. 2407.08733 null
2024-07-11 A Taxonomy for Data Contamination in Large Language Models Medha Palavalli et.al. 2407.08716 null
2024-07-11 GTA: A Benchmark for General Tool Agents Jize Wang et.al. 2407.08713 link
2024-07-11 Extracting Training Data from Document-Based VQA Models Francesco Pinto et.al. 2407.08707 null
2024-07-11 Live2Diff: Live Stream Translation via Uni-directional Attention in Video Diffusion Models Zhening Xing et.al. 2407.08701 null
2024-07-11 Mitigating Catastrophic Forgetting in Language Transfer via Model Merging Anton Alexandrov et.al. 2407.08699 null
2024-07-11 Cloud Atlas: Efficient Fault Localization for Cloud Systems using Language Models and Causal Insight Zhiqiang Xie et.al. 2407.08694 null
2024-07-11 SEED-Story: Multimodal Long Story Generation with Large Language Model Shuai Yang et.al. 2407.08683 link
2024-07-10 Training on the Test Task Confounds Evaluation and Emergence Ricardo Dominguez-Olmedo et.al. 2407.07890 link
2024-07-10 Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization Junkang Wu et.al. 2407.07880 link
2024-07-10 FACTS About Building Retrieval Augmented Generation-based Chatbots Rama Akkiraju et.al. 2407.07858 null
2024-07-10 OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training Sami Jaghouar et.al. 2407.07852 link
2024-07-10 Natural Language Mechanisms via Self-Resolution with Foundation Models Nicolas Della Penna et.al. 2407.07845 null
2024-07-10 Transformer Alignment in Large Language Models Murdock Aubry et.al. 2407.07810 null
2024-07-10 Attribute or Abstain: Large Language Models as Long Document Assistants Jan Buchmann et.al. 2407.07799 link
2024-07-11 Evaluating Large Language Models with Grid-Based Game Competitions: An Extensible LLM Benchmark and Leaderboard Oguzhan Topsakal et.al. 2407.07796 link
2024-07-10 Flooding Spread of Manipulated Knowledge in LLM-Based Multi-Agent Communities Tianjie Ju et.al. 2407.07791 link
2024-07-10 WorldAPIs: The World Is Worth How Many APIs? A Thought Experiment Jiefu Ou et.al. 2407.07778 null
2024-07-09 AnyTaskTune: Advanced Domain-Specific Solutions through Task-Fine-Tuning Jiaxi Cui et.al. 2407.07094 link
2024-07-09 FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation Liqun Ma et.al. 2407.07093 link
2024-07-09 Hypothetical Minds: Scaffolding Theory of Mind for Multi-Agent Tasks with Large Language Models Logan Cross et.al. 2407.07086 link
2024-07-09 Adapting LLMs to Hebrew: Unveiling DictaLM 2.0 with Enhanced Vocabulary and Instruction Capabilities Shaltiel Shmidman et.al. 2407.07080 null
2024-07-09 Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps Yung-Sung Chuang et.al. 2407.07071 link
2024-07-09 Prompting Techniques for Secure Code Generation: A Systematic Investigation Catherine Tony et.al. 2407.07064 null
2024-07-09 Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence Weize Chen et.al. 2407.07061 link
2024-07-09 Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model Wenqi Zhang et.al. 2407.07053 link
2024-07-09 CorMulT: A Semi-supervised Modality Correlation-aware Multimodal Transformer for Sentiment Analysis Yangmin Li et.al. 2407.07046 null
2024-07-09 Using Large Language Models for Generating Smart Contracts for Health Insurance from Textual Policies Inwon Kang et.al. 2407.07019 null
2024-07-08 Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision Orr Zohar et.al. 2407.06189 link
2024-07-08 CrowdMoGen: Zero-Shot Text-Driven Collective Motion Generation Xinying Guo et.al. 2407.06188 null
2024-07-08 On Speeding Up Language Model Evaluation Jin Peng Zhou et.al. 2407.06172 link
2024-07-08 What’s Wrong with Your Code Generated by Large Language Models? An Extensive Study Shihan Dou et.al. 2407.06153 null
2024-07-08 Using Grammar Masking to Ensure Syntactic Validity in LLM-based Modeling Tasks Lukas Netz et.al. 2407.06146 null
2024-07-08 ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation Ethan Chern et.al. 2407.06135 link
2024-07-08 Evaluating the Semantic Profiling Abilities of LLMs for Natural Language Utterances in Data Visualization Hannah K. Bako et.al. 2407.06129 link
2024-07-08 Depression Detection and Analysis using Large Language Models on Textual and Audio-Visual Modalities Avinash Anand et.al. 2407.06125 null
2024-07-08 Artificial Intuition: Efficient Classification of Scientific Abstracts Harsh Sakhrani et.al. 2407.06093 null
2024-07-08 Merge, Ensemble, and Cooperate! A Survey on Collaborative Strategies in the Era of Large Language Models Jinliang Lu et.al. 2407.06089 null
2024-07-05 Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs Rudolf Laine et.al. 2407.04694 null
2024-07-05 ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models Yuzhe Gu et.al. 2407.04693 link
2024-07-05 Rethinking Visual Prompting for Multimodal Large Language Models with External Knowledge Yuanze Lin et.al. 2407.04681 null
2024-07-05 Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition Ye Bai et.al. 2407.04675 null
2024-07-05 Lazarus: Resilient and Elastic Training of Mixture-of-Experts Models with Adaptive Expert Placement Yongji Wu et.al. 2407.04656 null
2024-07-05 Entity Decomposition with Filtering: A Zero-Shot Clinical Named Entity Recognition Framework Reza Averly et.al. 2407.04629 null
2024-07-05 On scalable oversight with weak LLMs judging strong LLMs Zachary Kenton et.al. 2407.04622 null
2024-07-05 Leveraging Large Language Models for Integrated Satellite-Aerial-Terrestrial Networks: Recent Advances and Future Directions Shumaila Javaid et.al. 2407.04581 null
2024-07-05 VRSD: Rethinking Similarity and Diversity for Retrieval in Large Language Models Hang Gao et.al. 2407.04573 null
2024-07-05 PoPreRo: A New Dataset for Popularity Prediction of Romanian Reddit Posts Ana-Cristina Rogoz et.al. 2407.04541 link
2024-07-03 BACON: Supercharge Your VLM with Bag-of-Concept Graph to Mitigate Hallucinations Zhantao Yang et.al. 2407.03314 null
2024-07-03 Universal Length Generalization with Turing Programs Kaiying Hou et.al. 2407.03310 null
2024-07-03 Large Language Models for JSON Schema Discovery Michael J. Mior et.al. 2407.03286 null
2024-07-03 LLM Internal States Reveal Hallucination Risk Faced With a Query Ziwei Ji et.al. 2407.03282 null
2024-07-03 Improving Retrieval-augmented Text-to-SQL with AST-based Ranking and Schema Pruning Zhili Shen et.al. 2407.03227 null
2024-07-03 How Does Quantization Affect Multilingual LLMs? Kelly Marchisio et.al. 2407.03211 null
2024-07-03 TheoremLlama: Transforming General-Purpose LLMs into Lean4 Experts Ruida Wang et.al. 2407.03203 link
2024-07-03 Fine-Tuning with Divergent Chains of Thought Boosts Reasoning Through Self-Correction in Language Models Haritz Puerto et.al. 2407.03181 link
2024-07-03 Investigating Decoder-only Large Language Models for Speech-to-text Translation Chao-Wei Huang et.al. 2407.03169 null
2024-07-03 SOS! Soft Prompt Attack Against Open-Source Large Language Models Ziqing Yang et.al. 2407.03160 null
2024-07-02 MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention Huiqiang Jiang et.al. 2407.02490 link
2024-07-02 Neurocache: Efficient Vector Retrieval for Long-range Language Modeling Ali Safaya et.al. 2407.02486 link
2024-07-02 RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs Yue Yu et.al. 2407.02485 null
2024-07-02 MMedAgent: Learning to Use Medical Tools with Multi-modal Agent Binxu Li et.al. 2407.02483 null
2024-07-02 Understanding Alignment in Multimodal LLMs: A Comprehensive Study Elmira Amirloo et.al. 2407.02477 null
2024-07-02 Open Scene Graphs for Open World Object-Goal Navigation Joel Loo et.al. 2407.02473 null
2024-07-02 Reliable Confidence Intervals for Information Retrieval Evaluation Using Generative A.I. Harrie Oosterhuis et.al. 2407.02464 null
2024-07-02 Predicting vs. Acting: A Trade-off Between World Modeling & Agent Modeling Margaret Li et.al. 2407.02446 null
2024-07-02 Video Watermarking: Safeguarding Your Video from (Unauthorized) Annotations by Video-based LLMs Jinmin Li et.al. 2407.02411 null
2024-07-02 CEB: Compositional Evaluation Benchmark for Fairness in Large Language Models Song Wang et.al. 2407.02408 null
2024-06-28 Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs Sukmin Yun et.al. 2406.20098 link
2024-06-28 LLaRA: Supercharging Robot Learning Data for Vision-Language Policy Xiang Li et.al. 2406.20095 link
2024-06-28 Scaling Synthetic Data Creation with 1,000,000,000 Personas Xin Chan et.al. 2406.20094 link
2024-06-28 LLaVolta: Efficient Multi-modal Models via Stage-wise Visual Context Compression Jieneng Chen et.al. 2406.20092 link
2024-06-28 ProgressGym: Alignment with a Millennium of Moral Progress Tianyi Qiu et.al. 2406.20087 link
2024-06-28 Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language Yicheng Chen et.al. 2406.20085 null
2024-06-28 Molecular Facts: Desiderata for Decontextualization in LLM Fact Verification Anisha Gunjal et.al. 2406.20079 link
2024-06-28 Applying RLAIF for Code Generation with API-usage in Lightweight LLMs Sujan Dutta et.al. 2406.20060 null
2024-07-01 BMW Agents – A Framework For Task Automation Through Multi-Agent Collaboration Noel Crawford et.al. 2406.20041 null
2024-06-28 BioMNER: A Dataset for Biomedical Method Entity Recognition Chen Tang et.al. 2406.20038 null
2024-06-27 ReXTime: A Benchmark Suite for Reasoning-Across-Time in Videos Jr-Jen Chen et.al. 2406.19392 link
2024-06-27 The Remarkable Robustness of LLMs: Stages of Inference? Vedang Lad et.al. 2406.19384 link
2024-06-27 Suri: Multi-constraint Instruction Following for Long-form Text Generation Chau Minh Pham et.al. 2406.19371 link
2024-06-27 The Model Arena for Cross-lingual Sentiment Analysis: A Comparative Study in the Era of Large Language Models Xiliang Zhu et.al. 2406.19358 null
2024-06-27 DiVERT: Distractor Generation with Variational Errors Represented as Text for Math Multiple-choice Questions Nigel Fernandez et.al. 2406.19356 null
2024-06-27 IndoToxic2024: A Demographically-Enriched Dataset of Hate Speech and Toxicity Types for Indonesian Language Lucky Susanto et.al. 2406.19349 null
2024-06-27 Jump Starting Bandits with LLM-Generated Prior Knowledge Parand A. Alamdari et.al. 2406.19317 null
2024-06-27 Enhancing Continual Learning in Visual Question Answering with Modality-Aware Feature Distillation Malvina Nikandrou et.al. 2406.19297 null
2024-06-27 From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data Zheyang Xiong et.al. 2406.19292 link
2024-06-27 PhysioLLM: Supporting Personalized Health Insights with Wearables and Large Language Models Cathy Mengying Fang et.al. 2406.19283 null
2024-06-26 Symbolic Learning Enables Self-Evolving Agents Wangchunshu Zhou et.al. 2406.18532 link
2024-06-26 PrExMe! Large Scale Prompt Exploration of Open Source LLMs for Machine Translation and Summarization Evaluation Christoph Leiter et.al. 2406.18528 null
2024-06-26 CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs Zirui Wang et.al. 2406.18521 link
2024-06-26 “Is ChatGPT a Better Explainer than My Professor?”: Evaluating the Explanation Capabilities of LLMs in Conversation Compared to a Human Baseline Grace Li et.al. 2406.18512 null
2024-06-26 Mental Modeling of Reinforcement Learning Agents by Language Models Wenhao Lu et.al. 2406.18505 null
2024-06-26 Is In-Context Learning a Type of Gradient-Based Learning? Evidence from the Inverse Frequency Effect in Structural Priming Zhenghao Zhou et.al. 2406.18501 null
2024-06-26 Role-Play Zero-Shot Prompting with Large Language Models for Open-Domain Human-Machine Conversation Ahmed Njifenjou et.al. 2406.18460 null
2024-06-26 Cascading Large Language Models for Salient Event Graph Generation Xingwei Tan et.al. 2406.18449 null
2024-06-26 New intelligent empowerment for digital transformation Peng Yifeng et.al. 2406.18440 null
2024-06-26 IRCAN: Mitigating Knowledge Conflicts in LLM Generation via Identifying and Reweighting Context-Aware Neurons Dan Shi et.al. 2406.18406 null
2024-06-25 Text-Animator: Controllable Visual Text Video Generation Lin Liu et.al. 2406.17777 null
2024-06-25 MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning Xiangyu Zhao et.al. 2406.17770 link
2024-06-25 BMIKE-53: Investigating Cross-Lingual Knowledge Editing with In-Context Learning Ercong Nie et.al. 2406.17764 link
2024-06-25 CaLMQA: Exploring culturally specific long-form question answering across 23 languages Shane Arora et.al. 2406.17761 link
2024-06-25 Accelerating Clinical Evidence Synthesis with Large Language Models Zifeng Wang et.al. 2406.17755 null
2024-06-25 Measuring and Benchmarking Large Language Models’ Capabilities to Generate Persuasive Language Amalie Brogaard Pauli et.al. 2406.17753 null
2024-06-25 LLM Targeted Underperformance Disproportionately Impacts Vulnerable Users Elinor Poole-Dayan et.al. 2406.17737 null
2024-06-25 FedBiOT: LLM Local Fine-tuning in Federated Learning without Full Model Feijie Wu et.al. 2406.17706 null
2024-06-25 From Distributional to Overton Pluralism: Investigating Large Language Model Alignment Thom Lake et.al. 2406.17692 link
2024-06-25 VarBench: Robust Language Model Benchmarking Through Dynamic Variable Perturbation Kun Qian et.al. 2406.17681 null
2024-06-24 EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees Yuhui Li et.al. 2406.16858 null
2024-06-24 From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models Sean Welleck et.al. 2406.16838 null
2024-06-24 USDC: A Dataset of $\underline{U}$ser $\underline{S}$tance and $\underline{D}$ogmatism in Long $\underline{C}$onversations Mounika Marreddy et.al. 2406.16833 null
2024-06-24 Ragnarök: A Reusable RAG Framework and Baselines for TREC 2024 Retrieval-Augmented Generation Track Ronak Pradeep et.al. 2406.16828 null
2024-06-24 GPT-4V Explorations: Mining Autonomous Driving Zixuan Li et.al. 2406.16817 null
2024-06-24 RES-Q: Evaluating Code-Editing Large Language Model Systems at the Repository Scale Beck LaBash et.al. 2406.16801 link
2024-06-24 Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs Ashwinee Panda et.al. 2406.16797 link
2024-06-24 M2Lingual: Enhancing Multilingual, Multi-Turn Instruction Alignment in Large Language Models Rishabh Maheshwary et.al. 2406.16783 null
2024-06-24 It Is Not About What You Say, It Is About How You Say It: A Surprisingly Simple Approach for Improving Reading Comprehension Sagi Shaier et.al. 2406.16779 null
2024-06-24 Blending LLMs into Cascaded Speech Translation: KIT’s Offline Speech Translation System for IWSLT 2024 Sai Koneru et.al. 2406.16777 null
2024-06-21 GenoTEX: A Benchmark for Evaluating LLM-Based Exploration of Gene Expression Data in Alignment with Bioinformaticians Haoyang Liu et.al. 2406.15341 link
2024-06-21 Gradient-Mask Tuning Elevates the Upper Limits of LLM Performance Haoling Li et.al. 2406.15330 null
2024-06-21 An End-to-End, Segmentation-Free, Arabic Handwritten Recognition Model on KHATT Sondos Aabed et.al. 2406.15329 null
2024-06-21 Bug In the Code Stack: Can LLMs Find Bugs in Large Python Code Stacks Hokyung Lee et.al. 2406.15325 null
2024-06-21 Towards Fine-Grained Citation Evaluation in Generated Text: A Comparative Analysis of Faithfulness Metrics Weijia Zhang et.al. 2406.15264 null
2024-06-21 Detecting Synthetic Lyrics with Few-Shot Inference Yanis Labrak et.al. 2406.15231 null
2024-06-21 A LLM-Based Ranking Method for the Evaluation of Automatic Counter-Narrative Generation Irune Zubiaga et.al. 2406.15227 null
2024-06-21 Unsupervised Extraction of Dialogue Policies from Conversations Makesh Narsimhan Sreedhar et.al. 2406.15214 null
2024-06-21 Prompting Whisper for QA-driven Zero-shot End-to-end Spoken Language Understanding Mohan Li et.al. 2406.15209 null
2024-06-21 Exploring the Efficacy of Robotic Assistants with ChatGPT and Claude in Enhancing ADHD Therapy: Innovating Treatment Paradigms Santiago Berrezueta-Guzman et.al. 2406.15198 null
2024-06-20 Model Merging and Safety Alignment: One Bad Model Spoils the Bunch Hasan Abed Al Kader Hammoud et.al. 2406.14563 null
2024-06-20 Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities Sachit Menon et.al. 2406.14562 null
2024-06-20 Asynchronous Large Language Model Enhanced Planner for Autonomous Driving Yuan Chen et.al. 2406.14556 link
2024-06-20 GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models Shilong Li et.al. 2406.14550 null
2024-06-20 Uncovering Latent Memories: Assessing Data Leakage and Memorization Patterns in Large Language Models Sunny Duan et.al. 2406.14549 null
2024-06-20 Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data Johannes Treutlein et.al. 2406.14546 link
2024-06-20 Unmasking Database Vulnerabilities: Zero-Knowledge Schema Inference Attacks in Text-to-SQL Systems Đorđe Klisura et.al. 2406.14545 null
2024-06-20 Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs Yuxuan Qiao et.al. 2406.14544 link
2024-06-20 Are LLMs Naturally Good at Synthetic Tabular Data Generation? Shengzhe Xu et.al. 2406.14541 link
2024-06-20 PostMark: A Robust Blackbox Watermark for Large Language Models Yapei Chang et.al. 2406.14517 link
2024-06-18 DrVideo: Document Retrieval Based Long Video Understanding Ziyu Ma et.al. 2406.12846 null
2024-06-18 Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts Haoxiang Wang et.al. 2406.12845 link
2024-06-18 Synergizing Foundation Models and Federated Learning: A Survey Shenghui Li et.al. 2406.12844 null
2024-06-18 LaMDA: Large Model Fine-Tuning via Spectrally Decomposed Low-Dimensional Adaptation Seyedarmin Azizi et.al. 2406.12832 link
2024-06-18 Is It Good Data for Multilingual Instruction Tuning or Just Bad Multilingual Evaluation for Large Language Models? Pinzhen Chen et.al. 2406.12822 null
2024-06-18 Can Large Language Models Always Solve Easy Problems if They Can Solve Harder Ones? Zhe Yang et.al. 2406.12809 null
2024-06-18 Identifying Performance-Sensitive Configurations in Software Systems through Code Analysis with LLM Agents Zehao Wang et.al. 2406.12806 null
2024-06-18 Supporting Human Raters with the Detection of Harmful Content using Large Language Models Kurt Thomas et.al. 2406.12800 null
2024-06-18 ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools Team GLM et.al. 2406.12793 null
2024-06-18 Generating Educational Materials with Different Levels of Readability using LLMs Chieh-Yang Huang et.al. 2406.12787 null
2024-06-17 LLaNA: Large Language and NeRF Assistant Andrea Amaduzzi et.al. 2406.11840 null
2024-06-17 mDPO: Conditional Preference Optimization for Multimodal Large Language Models Fei Wang et.al. 2406.11839 link
2024-06-17 Unveiling Encoder-Free Vision-Language Models Haiwen Diao et.al. 2406.11832 link
2024-06-17 Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models Bingqi Ma et.al. 2406.11831 null
2024-06-17 WPO: Enhancing RLHF with Weighted Preference Optimization Wenxuan Zhou et.al. 2406.11827 link
2024-06-17 Composing Object Relations and Attributes for Image-Text Matching Khoi Pham et.al. 2406.11820 null
2024-06-17 Embodied Instruction Following in Unknown Environments Zhenyu Wu et.al. 2406.11818 null
2024-06-17 VideoLLM-online: Online Video Large Language Model for Streaming Video Joya Chen et.al. 2406.11816 null
2024-06-17 LLARVA: Vision-Action Instruction Tuning Enhances Robot Learning Dantong Niu et.al. 2406.11815 null
2024-06-17 How Do Large Language Models Acquire Factual Knowledge During Pretraining? Hoyeon Chang et.al. 2406.11813 link
2024-06-14 Quantifying Variance in Evaluation Benchmarks Lovish Madaan et.al. 2406.10229 null
2024-06-14 Semantic Membership Inference Attack against Large Language Models Hamid Mozaffari et.al. 2406.10218 null
2024-06-14 Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs Rui Yang et.al. 2406.10216 link
2024-06-14 Be like a Goldfish, Don’t Memorize! Mitigating Memorization in Generative LLMs Abhimanyu Hans et.al. 2406.10209 link
2024-06-14 A Fundamental Trade-off in Aligned Language Models and its Relation to Sampling Adaptors Naaman Tan et.al. 2406.10203 null
2024-06-14 TRIP-PAL: Travel Planning with Guarantees by Combining Large Language Models and Automated Planners Tomas de la Rosa et.al. 2406.10196 null
2024-06-14 Detecting and Evaluating Medical Hallucinations in Large Vision Language Models Jiawei Chen et.al. 2406.10185 null
2024-06-14 Practical offloading for fine-tuning LLM on commodity GPU via learned subspace projectors Siyuan Chen et.al. 2406.10181 null
2024-06-14 Datasets for Multilingual Answer Sentence Selection Matteo Gabburo et.al. 2406.10172 null
2024-06-14 Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models Carson Denison et.al. 2406.10162 link
2024-06-13 VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding Muhammad Maaz et.al. 2406.09418 link
2024-06-13 Explore the Limits of Omni-modal Pretraining at Scale Yiyuan Zhang et.al. 2406.09412 link
2024-06-13 Yo’LLaVA: Your Personalized Language and Vision Assistant Thao Nguyen et.al. 2406.09400 link
2024-06-13 Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms Miaosen Zhang et.al. 2406.09397 null
2024-06-13 Too Many Frames, not all Useful: Efficient Strategies for Long-Form Video QA Jongwoo Park et.al. 2406.09396 link
2024-06-13 Improving Autoregressive Training with Dynamic Oracles Jianing Yang et.al. 2406.09393 null
2024-06-13 Towards Vision-Language Geo-Foundation Model: A Survey Yue Zhou et.al. 2406.09385 link
2024-06-13 Needle In A Video Haystack: A Scalable Synthetic Framework for Benchmarking Video MLLMs Zijia Zhao et.al. 2406.09367 link
2024-06-13 ElicitationGPT: Text Elicitation Mechanisms via Language Models Yifan Wu et.al. 2406.09363 null
2024-06-13 DiscreteSLU: A Large Language Model with Self-Supervised Discrete Speech Units for Spoken Language Understanding Suwon Shon et.al. 2406.09345 null
2024-06-12 Improving LLMs for Recommendation with Out-Of-Vocabulary Tokens Ting-Ji Huang et.al. 2406.08477 null
2024-06-12 Real2Code: Reconstruct Articulated Objects via Code Generation Zhao Mandi et.al. 2406.08474 null
2024-06-12 Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing Zhangchen Xu et.al. 2406.08464 link
2024-06-12 ConceptHash: Interpretable Fine-Grained Hashing via Concept Discovery Kam Woh Ng et.al. 2406.08457 link
2024-06-12 TasTe: Teaching Large Language Models to Translate through Self-Reflection Yutong Wang et.al. 2406.08434 link
2024-06-12 Next-Generation Database Interfaces: A Survey of LLM-based Text-to-SQL Zijin Hong et.al. 2406.08426 null
2024-06-12 OmniCorpus: An Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text Qingyun Li et.al. 2406.08418 link
2024-06-12 Discovering Preference Optimization Algorithms with and for Large Language Models Chris Lu et.al. 2406.08414 link
2024-06-12 Memory Is All You Need: An Overview of Compute-in-Memory Architectures for Accelerating Large Language Model Inference Christopher Wolters et.al. 2406.08413 null
2024-06-12 Understanding Sounds, Missing the Questions: The Challenge of Object Hallucination in Large Audio-Language Models Chun-Yi Kuan et.al. 2406.08402 link
2024-06-11 Open-LLM-Leaderboard: From Multi-choice to Open-style Questions for LLMs Evaluation, Benchmark, and Arena Aidar Myrzakhan et.al. 2406.07545 link
2024-06-11 QuickLLaMA: Query-aware Inference Acceleration for Large Language Models Jingyao Li et.al. 2406.07528 link
2024-06-11 Beyond Model Collapse: Scaling Up with Synthesized Data Requires Reinforcement Yunzhen Feng et.al. 2406.07515 null
2024-06-11 THaLLE: Text Hyperlocally Augmented Large Language Extension – Technical Report KBTG Labs et.al. 2406.07505 null
2024-06-11 Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions Renjie Pi et.al. 2406.07502 link
2024-06-11 TextGrad: Automatic “Differentiation” via Text Mert Yuksekgonul et.al. 2406.07496 link
2024-06-11 CADS: A Systematic Literature Review on the Challenges of Abstractive Dialogue Summarization Frederic Kirstein et.al. 2406.07494 null
2024-06-11 PITCH: Productivity and Mental Well-being Coaching through Daily Conversational Interaction Adnan Abbas et.al. 2406.07485 null
2024-06-11 Advancing Annotation of Stance in Social Media Posts: A Comparative Analysis of Large Language Models and Crowd Sourcing Mao Li et.al. 2406.07483 null
2024-06-11 VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs Zesen Cheng et.al. 2406.07476 link
2024-06-10 Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation Peize Sun et.al. 2406.06525 link
2024-06-10 UMBRELA: UMbrela is the (Open-Source Reproduction of the) Bing RELevance Assessor Shivani Upadhyay et.al. 2406.06519 link
2024-06-10 NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative Asmar Nadeem et.al. 2406.06499 null
2024-06-10 Towards a Personal Health Large Language Model Justin Cosentino et.al. 2406.06474 null
2024-06-10 AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction Zhen Xing et.al. 2406.06465 null
2024-06-10 Transforming Wearable Data into Health Insights using Large Language Model Agents Mike A. Merrill et.al. 2406.06464 null
2024-06-10 VCR: Visual Caption Restoration Tianyu Zhang et.al. 2406.06462 link
2024-06-10 Reasoning in Token Economies: Budget-Aware Evaluation of LLM Reasoning Strategies Junlin Wang et.al. 2406.06461 null
2024-06-10 Evaluating the Retrieval Component in LLM-Based Question Answering Systems Ashkan Alinejad et.al. 2406.06458 null
2024-06-10 A Large Language Model Pipeline for Breast Cancer Oncology Tristen Pool et.al. 2406.06455 null
2024-06-07 3D-GRAND: Towards Better Grounding and Less Hallucination for 3D-LLMs Jianing Yang et.al. 2406.05132 link
2024-06-07 An Empirical Study on Parameter-Efficient Fine-Tuning for MultiModal Large Language Models Xiongtao Zhou et.al. 2406.05130 null
2024-06-07 Towards Semantic Equivalence of Tokenization in Multimodal LLM Shengqiong Wu et.al. 2406.05127 null
2024-06-07 Categorizing Sources of Information for Explanations in Conversational AI Systems for Older Adults Aging in Place Niharika Mathur et.al. 2406.05111 null
2024-06-07 LINX: A Language Driven Generative System for Goal-Oriented Automated Data Exploration Tavor Lipman et.al. 2406.05107 null
2024-06-07 Multi-Head RAG: Solving Multi-Aspect Problems with LLMs Maciej Besta et.al. 2406.05085 link
2024-06-07 Are Large Language Models More Empathetic than Humans? Anuradha Welivita et.al. 2406.05063 null
2024-06-07 Robustness Assessment of Mathematical Reasoning in the Presence of Missing and Contradictory Conditions Shi-Yu Tian et.al. 2406.05055 null
2024-06-07 Hints-In-Browser: Benchmarking Language Models for Programming Feedback Generation Nachiket Kotalwar et.al. 2406.05053 null
2024-06-07 Bootstrapping Referring Multi-Object Tracking Yani Zhang et.al. 2406.05039 null
2024-06-06 Verbalized Machine Learning: Revisiting Machine Learning with Language Models Tim Z. Xiao et.al. 2406.04344 null
2024-06-06 RoboMamba: Multimodal State Space Model for Efficient Robot Reasoning and Manipulation Jiaming Liu et.al. 2406.04339 null
2024-06-06 Coherent Zero-Shot Visual Instruction Generation Quynh Phung et.al. 2406.04337 null
2024-06-06 DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs Lingchen Meng et.al. 2406.04334 null
2024-06-06 PaCE: Parsimonious Concept Engineering for Large Language Models Jinqi Luo et.al. 2406.04331 link
2024-06-06 Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step Zhanhao Liang et.al. 2406.04314 link
2024-06-06 Semantically Diverse Language Generation for Uncertainty Estimation in Language Models Lukas Aichberger et.al. 2406.04306 link
2024-06-06 Text-to-Drive: Diverse Driving Behavior Synthesis via Large Language Models Phat Nguyen et.al. 2406.04300 null
2024-06-06 What Languages are Easy to Language-Model? A Perspective from Learning Probabilistic Regular Languages Nadav Borenstein et.al. 2406.04289 null
2024-06-06 Characterizing Similarities and Divergences in Conversational Tones in Humans and LLMs by Sampling with People Dun-Ming Huang et.al. 2406.04278 link
2024-06-05 Wings: Learning Multimodal LLMs without Text-only Forgetting Yi-Kai Zhang et.al. 2406.03496 null
2024-06-05 Seq1F1B: Efficient Sequence-Level Pipeline Parallelism for Large Language Model Training Sun Ao et.al. 2406.03488 null
2024-06-05 Analyzing LLM Behavior in Dialogue Summarization: Unveiling Circumstantial Hallucination Trends Sanjana Ramprasad et.al. 2406.03487 null
2024-06-05 BIPED: Pedagogically Informed Tutoring System for ESL Education Soonwoo Kwon et.al. 2406.03486 null
2024-06-05 Does your data spark joy? Performance gains from domain upsampling at the end of training Cody Blakeney et.al. 2406.03476 null
2024-06-05 AD-H: Autonomous Driving with Hierarchical Agents Zaibin Zhang et.al. 2406.03474 null
2024-06-05 What is the Best Way for ChatGPT to Translate Poetry? Shanshan Wang et.al. 2406.03450 null
2024-06-05 Pre-trained Large Language Models Use Fourier Features to Compute Addition Tianyi Zhou et.al. 2406.03445 null
2024-06-05 Investigating the Relationship Between User Specialization and Toxicity on Reddit: A Sentiment Analysis Approach Abi Oppenheim et.al. 2406.03443 null
2024-06-05 Cycles of Thought: Measuring LLM Confidence through Stable Explanations Evan Becker et.al. 2406.03441 null
2024-06-04 Learning to grok: Emergence of in-context learning and skill composition in modular arithmetic tasks Tianyu He et.al. 2406.02550 link
2024-06-04 Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning Alex Jinpeng Wang et.al. 2406.02547 link
2024-06-04 To Believe or Not to Believe Your LLM Yasin Abbasi Yadkori et.al. 2406.02543 null
2024-06-04 Loki: Low-Rank Keys for Efficient Sparse Attention Prajwal Singhania et.al. 2406.02542 null
2024-06-04 Parrot: Multilingual Visual Instruction Tuning Hai-Long Sun et.al. 2406.02539 null
2024-06-04 Mitigate Position Bias in Large Language Models via Scaling a Single Dimension Yijiong Yu et.al. 2406.02536 null
2024-06-04 SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices Ruslan Svirschevski et.al. 2406.02532 null
2024-06-04 Scalable MatMul-free Language Modeling Rui-Jie Zhu et.al. 2406.02528 link
2024-06-04 CheckEmbed: Effective Verification of LLM Solutions to Open-Ended Tasks Maciej Besta et.al. 2406.02524 null
2024-06-04 RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots Soroush Nasiriany et.al. 2406.02523 null
2024-05-31 Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis Chaoyou Fu et.al. 2405.21075 null
2024-05-31 Grammar-Aligned Decoding Kanghee Park et.al. 2405.21047 null
2024-05-31 Direct Alignment of Language Models via Quality-Aware Self-Refinement Runsheng Yu et.al. 2405.21040 null
2024-05-31 Standards for Belief Representations in LLMs Daniel A. Herrmann et.al. 2405.21030 null
2024-05-31 LACIE: Listener-Aware Finetuning for Confidence Calibration in Large Language Models Elias Stengel-Eskin et.al. 2405.21028 link
2024-05-31 Improved Techniques for Optimization-Based Jailbreaking on Large Language Models Xiaojun Jia et.al. 2405.21018 link
2024-05-31 DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models Linli Yao et.al. 2405.20985 null
2024-05-31 Enhancing Noise Robustness of Retrieval-Augmented Language Models with Adaptive Adversarial Training Feiteng Fang et.al. 2405.20978 null
2024-05-31 SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales Tianyang Xu et.al. 2405.20974 link
2024-05-31 LCQ: Low-Rank Codebook based Quantization for Large Language Models Wen-Pu Cai et.al. 2405.20973 null
2024-05-30 MotionLLM: Understanding Human Behaviors from Human Motions and Videos Ling-Hao Chen et.al. 2405.20340 null
2024-05-30 Visual Perception by Large Language Model’s Weights Feipeng Ma et.al. 2405.20339 null
2024-05-30 Xwin-LM: Strong and Scalable Alignment Practice for LLMs Bolin Ni et.al. 2405.20335 link
2024-05-31 ParSEL: Parameterized Shape Editing with Language Aditya Ganeshan et.al. 2405.20319 null
2024-05-30 CausalQuest: Collecting Natural Causal Questions for AI Agents Roberto Ceraolo et.al. 2405.20318 link
2024-05-30 ANAH: Analytical Annotation of Hallucinations in Large Language Models Ziwei Ji et.al. 2405.20315 link
2024-05-30 Sequence-Augmented SE(3)-Flow Matching For Conditional Protein Backbone Generation Guillaume Huguet et.al. 2405.20313 null
2024-05-30 Large Language Models Can Self-Improve At Web Agent Tasks Ajay Patel et.al. 2405.20309 null
2024-05-30 Group Robust Preference Optimization in Reward-free RLHF Shyam Sundhar Ramesh et.al. 2405.20304 link
2024-05-30 Who Writes the Review, Human or AI? Panagiotis C. Theocharopoulos et.al. 2405.20285 null
2024-05-29 X-VILA: Cross-Modality Alignment for Large Language Model Hanrong Ye et.al. 2405.19335 null
2024-05-29 LLMs Meet Multimodal Generation and Editing: A Survey Yingqing He et.al. 2405.19334 link
2024-05-29 Multi-Modal Generative Embedding Model Feipeng Ma et.al. 2405.19333 null
2024-05-29 Self-Exploring Language Models: Active Preference Elicitation for Online Alignment Shenao Zhang et.al. 2405.19332 link
2024-05-29 Normative Modules: A Generative Agent Architecture for Learning Norms that Supports Multi-Agent Cooperation Atrisha Sarkar et.al. 2405.19328 null
2024-05-29 MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series Ge Zhang et.al. 2405.19327 null
2024-05-29 Reasoning3D – Grounding and Reasoning in 3D: Fine-Grained Zero-Shot Open-Vocabulary 3D Reasoning Part Segmentation via Large Vision-Language Models Tianrun Chen et.al. 2405.19326 null
2024-05-29 Nearest Neighbor Speculative Decoding for LLM Generation and Attribution Minghan Li et.al. 2405.19325 null
2024-05-29 Are Large Language Models Chameleons? Mingmeng Geng et.al. 2405.19323 null
2024-05-29 Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF Shicong Cen et.al. 2405.19320 null
2024-05-28 Don’t Forget to Connect! Improving RAG with Graph-based Reranking Jialin Dong et.al. 2405.18414 null
2024-05-28 Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass Ethan Shen et.al. 2405.18400 link
2024-05-28 Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning Yixiao Zhang et.al. 2405.18386 link
2024-05-28 OwLore: Outlier-weighed Layerwise Sampled Low-Rank Projection for Memory-Efficient LLM Fine-tuning Pengxiang Li et.al. 2405.18380 link
2024-05-28 LLaMA-NAS: Efficient Neural Architecture Search for Large Language Models Anthony Sarah et.al. 2405.18377 null
2024-05-28 Empowering Source-Free Domain Adaptation with MLLM-driven Curriculum Learning Dongjie Chen et.al. 2405.18376 link
2024-05-28 Thai Winograd Schemas: A Benchmark for Thai Commonsense Reasoning Phakphum Artkaew et.al. 2405.18375 null
2024-05-28 PromptWizard: Task-Aware Agent-driven Prompt Optimization Framework Eshaan Agarwal et.al. 2405.18369 null
2024-05-28 Is a 3D-Tokenized LLM the Key to Reliable Autonomous Driving? Yifan Bai et.al. 2405.18361 null
2024-05-28 Bridging the Gap: Dynamic Learning Strategies for Improving Multilingual Performance in LLMs Somnath Kumar et.al. 2405.18359 null
2024-05-27 Matryoshka Multimodal Models Mu Cai et.al. 2405.17430 null
2024-05-27 NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models Chankyu Lee et.al. 2405.17428 null
2024-05-27 Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model Kuan-Chih Huang et.al. 2405.17427 link
2024-05-27 LARM: Large Auto-Regressive Model for Long-Horizon Embodied Intelligence Zhuoling Li et.al. 2405.17424 null
2024-05-27 Self-Corrected Multimodal Large Language Model for End-to-End Robot Manipulation Jiaming Liu et.al. 2405.17418 null
2024-05-27 THREAD: Thinking Deeper with Recursive Spawning Philip Schroeder et.al. 2405.17402 null
2024-05-27 MindMerger: Efficient Boosting LLM Reasoning in non-English Languages Zixian Huang et.al. 2405.17386 null
2024-05-27 ReMoDetect: Reward Models Recognize Aligned LLM’s Generations Hyunseok Lee et.al. 2405.17382 null
2024-05-27 RTL-Repo: A Benchmark for Evaluating LLMs on Large-Scale RTL Design Projects Ahmed Allam et.al. 2405.17378 null
2024-05-27 Navigating the Safety Landscape: Measuring Risks in Finetuning Large Language Models ShengYun Peng et.al. 2405.17374 null
2024-05-24 Scaling Laws for Discriminative Classification in Large Language Models Dean Wyatte et.al. 2405.15765 null
2024-05-24 Large Language Models Reflect Human Citation Patterns with a Heightened Citation Bias Andres Algaba et.al. 2405.15739 null
2024-05-24 More Insight from Being More Focused: Analysis of Clustered Market Apps Maleknaz Nayebi et.al. 2405.15737 null
2024-05-24 LM4LV: A Frozen Large Language Model for Low-level Vision Tasks Boyang Zheng et.al. 2405.15734 null
2024-05-24 Optimizing Large Language Models for OpenAPI Code Completion Bohdan Petryshyn et.al. 2405.15729 null
2024-05-24 Prompt-Aware Adapter: Towards Learning Adaptive Visual Tokens for Multimodal Large Language Models Yue Zhang et.al. 2405.15684 null
2024-05-24 What Do You See? Enhancing Zero-Shot Image Classification with Multimodal Large Language Models Abdelrahman Abdelhamed et.al. 2405.15668 null
2024-05-24 Class Machine Unlearning for Complex Data via Concepts Inference and Data Poisoning Wenhan Chang et.al. 2405.15662 null
2024-05-24 \(\mathbf{L^2\cdot M = C^2}\) Large Language Models as Covert Channels… a Systematic Analysis Simen Gaure et.al. 2405.15652 null
2024-05-24 LLM-based Robot Task Planning with Exceptional Handling for General Purpose Service Robots Ruoyu Wang et.al. 2405.15646 null
2024-05-23 A Nurse is Blue and Elephant is Rugby: Cross Domain Alignment in Large Language Models Reveal Human-like Patterns Asaf Yehudai et.al. 2405.14863 null
2024-05-23 Bitune: Bidirectional Instruction-Tuning Dawid J. Kopiczko et.al. 2405.14862 null
2024-05-23 PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression Vladimir Malinovskii et.al. 2405.14852 null
2024-05-23 HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models Bernal Jiménez Gutiérrez et.al. 2405.14831 null
2024-05-23 Can LLMs Solve longer Math Word Problems Better? Xin Xu et.al. 2405.14804 null
2024-05-23 Lessons from the Trenches on Reproducible Evaluation of Language Models Stella Biderman et.al. 2405.14782 null
2024-05-23 WISE: Rethinking the Knowledge Memory for Lifelong Model Editing of Large Language Models Peng Wang et.al. 2405.14768 link
2024-05-23 FinRobot: An Open-Source AI Agent Platform for Financial Applications using Large Language Models Hongyang Yang et.al. 2405.14767 link
2024-05-23 Evaluating Large Language Models for Public Health Classification and Extraction Tasks Joshua Harris et.al. 2405.14766 null
2024-05-23 Large language models can be zero-shot anomaly detectors for time series? Sarah Alnegheimish et.al. 2405.14755 null
2024-05-21 Reducing Transformer Key-Value Cache Size with Cross-Layer Attention William Brandon et.al. 2405.12981 null
2024-05-21 Energy Rank Alignment: Using Preference Optimization to Search Chemical Space at Scale Shriram Chennakesavalu et.al. 2405.12961 null
2024-05-21 Aggregation of Reasoning: A Hierarchical Framework for Enhancing Answer Selection in Large Language Models Zhangyue Yin et.al. 2405.12939 null
2024-05-21 Skin-in-the-Game: Decision Making via Multi-Stakeholder Alignment in LLMs Bilgehan Sel et.al. 2405.12933 null
2024-05-21 Code-mixed Sentiment and Hate-speech Prediction Anjali Yadav et.al. 2405.12929 null
2024-05-21 Streamlining Software Reviews: Efficient Predictive Modeling with Minimal Examples Tim Menzies et.al. 2405.12920 null
2024-05-21 G-DIG: Towards Gradient-based DIverse and hiGh-quality Instruction Data Selection for Machine Translation Xingyuan Pan et.al. 2405.12915 null
2024-05-21 An Empirical Study and Analysis of Text-to-Image Generation Using Large Language Model-Powered Textual Representation Zhiyu Tan et.al. 2405.12914 null
2024-05-21 Topic Modelling Case Law Using a Large Language Model and a New Taxonomy for UK Law: AI Insights into Summary Judgment Holli Sargeant et.al. 2405.12910 link
2024-05-21 Adversarial DPO: Harnessing Harmful Data for Reducing Toxicity with Minimal Impact on Coherence and Evasiveness in Dialogue Agents San Kim et.al. 2405.12900 null
2024-05-20 Adapting Large Multimodal Models to Distribution Shifts: The Role of In-Context Learning Guanglin Zhou et.al. 2405.12217 link
2024-05-20 MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark Hongwei Liu et.al. 2405.12209 link
2024-05-20 Developers’ Perceptions on the Impact of ChatGPT in Software Development: A Survey Thiago S. Vaillant et.al. 2405.12195 null
2024-05-20 CT-Eval: Benchmarking Chinese Text-to-Table Performance in Large Language Models Haoxiang Shi et.al. 2405.12174 null
2024-05-20 Fennec: Fine-grained Language Model Evaluation and Correction Extended through Branching and Bridging Xiaobo Liang et.al. 2405.12163 link
2024-05-20 Eliciting Problem Specifications via Large Language Models Robert E. Wray et.al. 2405.12147 null
2024-05-20 DTLLM-VLT: Diverse Text Generation for Visual Language Tracking Based on LLM Xuchen Li et.al. 2405.12139 null
2024-05-20 MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning Ting Jiang et.al. 2405.12130 link
2024-05-20 Reindex-Then-Adapt: Improving Large Language Models for Conversational Recommendation Zhankui He et.al. 2405.12119 null
2024-05-20 Imp: Highly Capable Large Multimodal Models for Mobile Devices Zhenwei Shao et.al. 2405.12107 link
2024-05-17 A Survey on Large Language Models with Multilingualism: Recent Advances and New Frontiers Kaiyu Huang et.al. 2405.10936 link
2024-05-17 The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks Lucius Bushnaq et.al. 2405.10928 null
2024-05-17 COGNET-MD, an evaluation framework and dataset for Large Language Model benchmarks in the medical domain Dimitrios P. Panagoulias et.al. 2405.10893 null
2024-05-17 Application of Artificial Intelligence in Schizophrenia Rehabilitation Management: Systematic Literature Review Hongyi Yang et.al. 2405.10883 null
2024-05-17 The Future of Large Language Model Pre-training is Federated Lorenzo Sani et.al. 2405.10853 null
2024-05-17 Large Language Model (LLM) for Telecommunications: A Comprehensive Survey on Principles, Key Techniques, and Opportunities Hao Zhou et.al. 2405.10825 null
2024-05-17 Modeling Supply Chain Interaction and Disruption: Insights from Real-world Data and Complex Adaptive System Jiawei Feng et.al. 2405.10818 null
2024-05-17 ActiveLLM: Large Language Model-based Active Learning for Textual Few-Shot Scenarios Markus Bayer et.al. 2405.10808 null
2024-05-17 Empowering Small-Scale Knowledge Graphs: A Strategy of Leveraging General-Purpose Knowledge Graphs for Enriched Embeddings Albert Sawczyn et.al. 2405.10745 null
2024-05-17 Efficient Multimodal Large Language Models: A Survey Yizhang Jin et.al. 2405.10739 link
2024-05-16 UniRAG: Universal Retrieval Augmentation for Multi-Modal Large Language Models Sahel Sharifymoghaddam et.al. 2405.10311 null
2024-05-16 4D Panoptic Scene Graph Generation Jingkang Yang et.al. 2405.10305 link
2024-05-16 HW-GPT-Bench: Hardware-Aware Architecture Benchmark for Language Models Rhea Sanjay Sukthanker et.al. 2405.10299 link
2024-05-16 Timeline-based Sentence Decomposition with In-Context Learning for Temporal Fact Extraction Jianhao Chen et.al. 2405.10288 null
2024-05-16 FFF: Fixing Flawed Foundations in contrastive pre-training results in very strong Vision-Language models Adrian Bulat et.al. 2405.10286 null
2024-05-16 Revisiting OPRO: The Limitations of Small-Scale LLMs as Optimizers Tuo Zhang et.al. 2405.10276 null
2024-05-16 Keep It Private: Unsupervised Privatization of Online Text Calvin Bao et.al. 2405.10260 link
2024-05-16 When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models Xianzheng Ma et.al. 2405.10255 null
2024-05-16 A Systematic Evaluation of Large Language Models for Natural Language Generation Tasks Xuanfan Ni et.al. 2405.10251 null
2024-05-16 IntelliExplain: Enhancing Interactive Code Generation through Natural Language Explanations for Non-Professional Programmers Hao Yan et.al. 2405.10250 null
2024-05-15 Modeling Bilingual Sentence Processing: Evaluating RNN and Transformer Architectures for Cross-Language Structural Priming Bushi Xiao et.al. 2405.09508 null
2024-05-15 ParaNames 1.0: Creating an Entity Name Corpus for 400+ Languages using Wikidata Jonne Sälevä et.al. 2405.09496 null
2024-05-15 Beyond Flesch-Kincaid: Prompt-based Metrics Improve Difficulty Classification of Educational Texts Donya Rooein et.al. 2405.09482 null
2024-05-15 Tell Me Why: Explainable Public Health Fact-Checking with Large Language Models Majid Zarharan et.al. 2405.09454 link
2024-05-15 Facilitating Opinion Diversity through Hybrid NLP Approaches Michiel van der Meer et.al. 2405.09439 null
2024-05-15 MicroPython Testbed for Federated Learning Algorithms Miroslav Popovic et.al. 2405.09423 null
2024-05-15 Matching domain experts by training from scratch on domain knowledge Xiaoliang Luo et.al. 2405.09395 null
2024-05-15 PolygloToxicityPrompts: Multilingual Evaluation of Neural Toxic Degeneration in Large Language Models Devansh Jain et.al. 2405.09373 null
2024-05-15 Large Language Model Bias Mitigation from the Perspective of Knowledge Editing Ruizhe Chen et.al. 2405.09341 null
2024-05-15 Prompting-based Synthetic Data Generation for Few-Shot Question Answering Maximilian Schmidt et.al. 2405.09335 null
2024-05-14 Towards Enhanced RAC Accessibility: Leveraging Datasets and LLMs Edison Jair Bejarano Sepulveda et.al. 2405.08792 null
2024-05-14 Incorporating Clinical Guidelines through Adapting Multi-modal Large Language Model for Prostate Cancer PI-RADS Scoring Tiantian Zhang et.al. 2405.08786 null
2024-05-14 Is the Pope Catholic? Yes, the Pope is Catholic. Generative Evaluation of Intent Resolution in LLMs Akhila Yerukola et.al. 2405.08760 link
2024-05-14 Distributed Threat Intelligence at the Edge Devices: A Large Language Model-Driven Approach Syed Mhamudul Hasan et.al. 2405.08755 null
2024-05-14 Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding Zhimin Li et.al. 2405.08748 link
2024-05-14 ALMol: Aligned Language-Molecule Translation LLMs through Offline Preference Contrastive Optimisation Dimitris Gkoumas et.al. 2405.08619 null
2024-05-14 A Comprehensive Survey of Large Language Models and Multimodal Large Language Models in Medicine Hanguang Xiao et.al. 2405.08603 null
2024-05-14 EVDA: Evolving Deepfake Audio Detection Continual Learning Benchmark Xiaohui Zhang et.al. 2405.08596 null
2024-05-14 Falcon 7b for Software Mention Detection in Scholarly Documents AmeerAli Khan et.al. 2405.08514 null
2024-05-14 Archimedes-AUEB at SemEval-2024 Task 5: LLM explains Civil Procedure Odysseas S. Chlapanis et.al. 2405.08502 null
2024-05-13 Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots Chengyue Wu et.al. 2405.07990 null
2024-05-13 A Generalist Learner for Multifaceted Medical Image Interpretation Hong-Yu Zhou et.al. 2405.07988 null
2024-05-13 PyZoBot: A Platform for Conversational Information Extraction and Synthesis from Curated Zotero Reference Libraries through Advanced Retrieval-Augmented Generation Suad Alshammari et.al. 2405.07963 null
2024-05-13 AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments Samuel Schmidgall et.al. 2405.07960 null
2024-05-13 EconLogicQA: A Question-Answering Benchmark for Evaluating Large Language Models in Economic Sequential Reasoning Yinzhu Quan et.al. 2405.07938 null
2024-05-13 PARDEN, Can You Repeat That? Defending against Jailbreaks via Repetition Ziyang Zhang et.al. 2405.07932 link
2024-05-13 Can Better Text Semantics in Prompt Tuning Improve VLM Generalization? Hari Chandana Kuchibhotla et.al. 2405.07921 null
2024-05-13 A Systematic Investigation of Distilling Large Language Models into Cross-Encoders for Passage Re-ranking Ferdinand Schlatt et.al. 2405.07920 null
2024-05-13 Russian-Language Multimodal Dataset for Automatic Summarization of Scientific Papers Alena Tsanda et.al. 2405.07886 null
2024-05-13 Reproducing the Metric-Based Evaluation of a Set of Controllable Text Generation Techniques Michela Lorandi et.al. 2405.07875 null
2024-05-10 Linearizing Large Language Models Jean Mercat et.al. 2405.06640 link
2024-05-10 Value Augmented Sampling for Language Model Alignment and Personalization Seungwook Han et.al. 2405.06639 link
2024-05-10 Federated Document Visual Question Answering: A Pilot Study Khanh Nguyen et.al. 2405.06636 null
2024-05-10 Characterizing the Accuracy - Efficiency Trade-off of Low-rank Decomposition in Language Models Chakshu Moar et.al. 2405.06626 null
2024-05-10 What Can Natural Language Processing Do for Peer Review? Ilia Kuznetsov et.al. 2405.06563 null
2024-05-10 Mitigating Hallucinations in Large Language Models via Self-Refinement-Enhanced Knowledge Retrieval Mengjia Niu et.al. 2405.06545 null
2024-05-10 Prompting Large Language Models with Knowledge Graphs for Question Answering Involving Long-tail Facts Wenyu Huang et.al. 2405.06524 null
2024-05-10 UniDM: A Unified Framework for Data Manipulation with Large Language Models Yichen Qian et.al. 2405.06510 null
2024-05-10 Aspect-based Sentiment Evaluation of Chess Moves (ASSESS): an NLP-based Method for Evaluating Chess Strategies from Textbooks Haifa Alrdahi et.al. 2405.06499 null
2024-05-10 Storypark: Leveraging Large Language Models to Enhance Children Story Learning Through Child-AI collaboration Storytelling Lyumanshan Ye et.al. 2405.06495 null
2024-05-09 Natural Language Processing RELIES on Linguistics Juri Opitz et.al. 2405.05966 null
2024-05-09 OpenBA-V2: Reaching 77.3% High Compression Ratio with Fast Multi-Stage Pruning Dan Qiao et.al. 2405.05957 link
2024-05-09 Probing Multimodal LLMs as World Models for Driving Shiva Sreeram et.al. 2405.05956 link
2024-05-09 Smurfs: Leveraging Multiple Proficiency Agents with Context-Efficiency for Tool Planning Junzhi Chen et.al. 2405.05955 null
2024-05-09 CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts Jiachen Li et.al. 2405.05949 link
2024-05-09 Trustworthy AI-Generative Content in Intelligent 6G Network: Adversarial, Privacy, and Fairness Siyuan Li et.al. 2405.05930 null
2024-05-09 Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations? Zorik Gekhman et.al. 2405.05904 null
2024-05-09 Co-driver: VLM-based Autonomous Driving Assistant with Human-like Behavior and Understanding for Complex Road Scenes Ziang Guo et.al. 2405.05885 null
2024-05-09 FlockGPT: Guiding UAV Flocking with Linguistic Orchestration Artem Lykov et.al. 2405.05872 null
2024-05-09 Robots Can Feel: LLM-based Framework for Robot Ethical Reasoning Artem Lykov et.al. 2405.05824 link
2024-05-08 You Only Cache Once: Decoder-Decoder Architectures for Language Models Yutao Sun et.al. 2405.05254 null
2024-05-08 Open Source Language Models Can Provide Feedback: Evaluating LLMs’ Ability to Help Students Using GPT-4-As-A-Judge Charles Koutcheme et.al. 2405.05253 link
2024-05-09 LLMs with Personalities in Multi-issue Negotiation Games Sean Noh et.al. 2405.05248 null
2024-05-08 SuFIA: Language-Guided Augmented Dexterity for Robotic Surgical Assistants Masoud Moghani et.al. 2405.05226 null
2024-05-08 Conv-Basis: A New Paradigm for Efficient Attention Inference and Gradient Computation in Transformers Jiuxiang Gu et.al. 2405.05219 null
2024-05-08 MIDGARD: Self-Consistency Using Minimum Description Length for Structured Commonsense Reasoning Inderjeet Nair et.al. 2405.05189 null
2024-05-08 Air Gap: Protecting Privacy-Conscious Conversational Agents Eugene Bagdasaryan et.al. 2405.05175 null
2024-05-08 XAMPLER: Learning to Retrieve Cross-Lingual In-Context Examples Peiqin Lin et.al. 2405.05116 null
2024-05-08 QFMTS: Generating Query-Focused Summaries over Multi-Table Inputs Weijia Zhang et.al. 2405.05109 null
2024-05-08 Concerns on Bias in Large Language Models when Creating Synthetic Personae Helena A. Haxvig et.al. 2405.05080 null
2024-05-07 ChatHuman: Language-driven 3D Human Understanding with Retrieval-Augmented Tool Reasoning Jing Lin et.al. 2405.04533 null
2024-05-07 QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving Yujun Lin et.al. 2405.04532 link
2024-05-07 NaturalCodeBench: Examining Coding Performance Mismatch on HumanEval and Natural User Prompts Shudan Zhang et.al. 2405.04520 null
2024-05-07 xLSTM: Extended Long Short-Term Memory Maximilian Beck et.al. 2405.04517 null
2024-05-07 A Transformer with Stack Attention Jiaoda Li et.al. 2405.04515 link
2024-05-08 Unveiling Disparities in Web Task Handling Between Human and Web Agent Kihoon Son et.al. 2405.04497 null
2024-05-07 Toward In-Context Teaching: Adapting Examples to Students’ Misconceptions Alexis Ross et.al. 2405.04495 null
2024-05-07 The Silicone Ceiling: Auditing GPT’s Race and Gender Biases in Hiring Lena Armstrong et.al. 2405.04412 null
2024-05-07 Learning To See But Forgetting To Follow: Visual Instruction Tuning Makes LLMs More Prone To Jailbreak Attacks Georgios Pantazopoulos et.al. 2405.04403 link
2024-05-07 Large Language Models Cannot Explain Themselves Advait Sarkar et.al. 2405.04382 null
2024-05-06 Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs Muhammad Uzair Khattak et.al. 2405.03690 null
2024-05-06 Large Language Models Reveal Information Operation Goals, Tactics, and Narrative Frames Keith Burghardt et.al. 2405.03688 null
2024-05-06 Language-Image Models with 3D Understanding Jang Hyun Cho et.al. 2405.03685 null
2024-05-06 AtomGPT: Atomistic Generative Pre-trained Transformer for Forward and Inverse Materials Design Kamal Choudhary et.al. 2405.03680 null
2024-05-06 A New Robust Partial $p$-Wasserstein-Based Metric for Comparing Distributions Sharath Raghvendra et.al. 2405.03664 null
2024-05-06 When LLMs Meet Cybersecurity: A Systematic Literature Review Jie Zhang et.al. 2405.03644 null
2024-05-06 A Controlled Experiment on the Energy Efficiency of the Source Code Generated by Code Llama Vlad-Andrei Cursaru et.al. 2405.03616 null
2024-05-06 Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment Abhinav Agarwalla et.al. 2405.03594 null
2024-05-06 AlphaMath Almost Zero: process Supervision without process Guoxin Chen et.al. 2405.03553 null
2024-05-06 MAmmoTH2: Scaling Instructions from the Web Xiang Yue et.al. 2405.03548 null
2024-05-03 Leveraging Large Language Models to Enhance Domain Expert Inclusion in Data Science Workflows Jasmine Y. Shih et.al. 2405.02260 null
2024-05-03 What matters when building vision-language models? Hugo Laurençon et.al. 2405.02246 null
2024-05-03 REASONS: A benchmark for REtrieval and Automated citationS Of scieNtific Sentences using Public and Proprietary LLMs Deepa Tilwani et.al. 2405.02228 null
2024-05-03 Fair Risk Control: A Generalized Framework for Calibrating Multi-group Fairness Risks Lujing Zhang et.al. 2405.02225 null
2024-05-03 FairEvalLLM. A Comprehensive Framework for Benchmarking Fairness in Large Language Model Recommender Systems Yashar Deldjoo et.al. 2405.02219 null
2024-05-03 Automatic Programming: Large Language Models and Beyond Michael R. Lyu et.al. 2405.02213 null
2024-05-03 Assessing and Verifying Task Utility in LLM-Powered Applications Negar Arabzadeh et.al. 2405.02178 null
2024-05-03 The AI Review Lottery: Widespread AI-Assisted Peer Reviews Boost Paper Scores and Acceptance Rates Giuseppe Russo Latona et.al. 2405.02150 null
2024-05-03 MedReadMe: A Systematic Study for Fine-grained Sentence Readability in Medical Domain Chao Jiang et.al. 2405.02144 null
2024-05-03 Optimising Calls to Large Language Models with Uncertainty-Based Two-Tier Selection Guillem Ramírez et.al. 2405.02134 null
2024-05-02 Plan-Seq-Learn: Language Model Guided RL for Solving Long Horizon Robotics Tasks Murtaza Dalal et.al. 2405.01534 null
2024-05-02 OmniDrive: A Holistic LLM-Agent Framework for Autonomous Driving with 3D Perception, Reasoning and Planning Shihao Wang et.al. 2405.01533 link
2024-05-02 FLAME: Factuality-Aware Alignment for Large Language Models Sheng-Chieh Lin et.al. 2405.01525 null
2024-05-02 Transformer-Aided Semantic Communications Matin Mortaheb et.al. 2405.01521 null
2024-05-02 Analyzing the Role of Semantic Representations in the Era of Large Language Models Zhijing Jin et.al. 2405.01502 link
2024-05-02 Supporting Business Document Workflows via Collection-Centric Information Foraging with Large Language Models Raymond Fok et.al. 2405.01501 null
2024-05-02 Controllable Text Generation in the Instruction-Tuning Era Dhananjay Ashok et.al. 2405.01490 null
2024-05-02 NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment Gerald Shen et.al. 2405.01481 link
2024-05-02 V-FLUTE: Visual Figurative Language Understanding with Textual Explanations Arkadiy Saakyan et.al. 2405.01474 link
2024-05-02 Advancing human-centric AI for robust X-ray analysis through holistic self-supervised learning Théo Moutakanni et.al. 2405.01469 null
2024-05-01 Is Bigger Edit Batch Size Always Better? – An Empirical Study on Model Editing with Llama-3 Junsang Yoon et.al. 2405.00664 null
2024-05-01 HalluVault: A Novel Logic Programming-aided Metamorphic Testing Framework for Detecting Fact-Conflicting Hallucinations in Large Language Models Ningke Li et.al. 2405.00648 null
2024-05-01 When Quantization Affects Confidence of Large Language Models? Irina Proskurina et.al. 2405.00632 link
2024-05-01 “I’m Not Sure, But…”: Examining the Impact of Large Language Models’ Uncertainty Expression on User Reliance and Trust Sunnie S. Y. Kim et.al. 2405.00623 null
2024-05-01 Addressing Topic Granularity and Hallucination in Large Language Models for Topic Modelling Yida Mu et.al. 2405.00611 null
2024-05-01 Investigating Automatic Scoring and Feedback using Large Language Models Gloria Ashiya Katuka et.al. 2405.00602 null
2024-05-01 Are Models Biased on Text without Gender-related Language? Catarina G Belém et.al. 2405.00588 link
2024-05-01 The Real, the Better: Aligning Large Language Models with Online Human Behaviors Guanying Jiang et.al. 2405.00578 null
2024-05-01 EALD-MLLM: Emotion Analysis in Long-sequential and De-identity videos with Multi-modal Large Language Model Deng Li et.al. 2405.00574 null
2024-05-01 Spherical Linear Interpolation and Text-Anchoring for Zero-shot Composed Image Retrieval Young Kyun Jang et.al. 2405.00571 null
2024-04-30 DOCCI: Descriptions of Connected and Contrasting Images Yasumasa Onoe et.al. 2404.19753 null
2024-04-30 Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation Yunhao Ge et.al. 2404.19752 null
2024-04-30 PrivComp-KG: Leveraging Knowledge Graph and Large Language Models for Privacy Policy Compliance Verification Leon Garza et.al. 2404.19744 null
2024-04-30 Better & Faster Large Language Models via Multi-token Prediction Fabian Gloeckle et.al. 2404.19737 null
2024-04-30 A Framework for Leveraging Human Computation Gaming to Enhance Knowledge Graphs for Accuracy Critical Generative AI Applications Steph Buongiorno et.al. 2404.19729 null
2024-04-30 PANGeA: Procedural Artificial Narrative using Generative AI for Turn-Based Video Games Steph Buongiorno et.al. 2404.19721 null
2024-04-30 Assessing LLMs in Malicious Code Deobfuscation of Real-world Malware Campaigns Constantinos Patsakis et.al. 2404.19715 null
2024-04-30 Automated Generation of High-Quality Medical Simulation Scenarios Through Integration of Semi-Structured Data and Large Language Models Scott Sumpter et.al. 2404.19713 null
2024-04-30 When to Retrieve: Teaching LLMs to Utilize Information Retrieval Effectively Tiziano Labruna et.al. 2404.19705 link
2024-04-30 Naturally Supervised 3D Visual Grounding with Language-Regularized Concept Learners Chun Feng et.al. 2404.19696 null
2024-04-29 Hallucination of Multimodal Large Language Models: A Survey Zechen Bai et.al. 2404.18930 link
2024-04-29 DPO Meets PPO: Reinforced Token Optimization for RLHF Han Zhong et.al. 2404.18922 link
2024-04-29 TheaterGen: Character Management with LLM for Consistent Multi-turn Image Generation Junhao Cheng et.al. 2404.18919 null
2024-04-29 Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting Fangcheng Liu et.al. 2404.18911 link
2024-04-29 Human-in-the-Loop Synthetic Text Data Inspection with Provenance Tracking Hong Jin Kang et.al. 2404.18881 link
2024-04-29 More RLHF, More Trust? On The Impact of Human Preference Alignment On Language Model Trustworthiness Aaron J. Li et.al. 2404.18870 link
2024-04-29 Truth-value judgment in language models: belief directions are context sensitive Stefan F. Schouten et.al. 2404.18865 null
2024-04-29 Performance-Aligned LLMs for Generating Fast Code Daniel Nichols et.al. 2404.18864 null
2024-04-29 VERT: Verified Equivalent Rust Transpilation with Few-Shot Learning Aidan Z. H. Yang et.al. 2404.18852 null
2024-04-29 It’s Difficult to be Neutral – Human and LLM-based Sentiment Annotation of Patient Comments Petter Mæhlum et.al. 2404.18832 null
2024-04-26 Probabilistic Inference in Language Models via Twisted Sequential Monte Carlo Stephen Zhao et.al. 2404.17546 link
2024-04-26 Large Language Model Agent as a Mechanical Designer Yayati Jadhav et.al. 2404.17525 null
2024-04-26 On the Use of Large Language Models to Generate Capability Ontologies Luis Miguel Vieira da Silva et.al. 2404.17524 null
2024-04-26 Enhancing Legal Compliance and Regulation Analysis with Large Language Models Shabnam Hassani et.al. 2404.17522 null
2024-04-26 A Comprehensive Evaluation on Event Reasoning of Large Language Models Zhengwei Tao et.al. 2404.17513 link
2024-04-26 Learning text-to-video retrieval from image captioning Lucas Ventura et.al. 2404.17498 null
2024-04-26 CEval: A Benchmark for Evaluating Counterfactual Text Generation Van Bach Nguyen et.al. 2404.17475 link
2024-04-26 Ruffle&Riley: Insights from Designing and Evaluating a Large Language Model-Based Conversational Tutoring System Robin Schmucker et.al. 2404.17460 null
2024-04-26 “ChatGPT Is Here to Help, Not to Replace Anybody” – An Evaluation of Students’ Opinions On Integrating ChatGPT In CS Courses Bruno Pereira Cipriano et.al. 2404.17443 null
2024-04-26 InspectorRAGet: An Introspection Platform for RAG Evaluation Kshitij Fadnis et.al. 2404.17347 link
2024-04-25 Make-it-Real: Unleashing Large Multimodal Model’s Ability for Painting 3D Objects with Realistic Materials Ye Fang et.al. 2404.16829 null
2024-04-25 How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites Zhe Chen et.al. 2404.16821 link
2024-04-25 IndicGenBench: A Multilingual Benchmark to Evaluate Generation Capabilities of LLMs on Indic Languages Harman Singh et.al. 2404.16816 link
2024-04-25 Make Your LLM Fully Utilize the Context Shengnan An et.al. 2404.16811 link
2024-04-25 Improving Diversity of Commonsense Generation by Large Language Models via In-Context Learning Tianhui Zhang et.al. 2404.16807 null
2024-04-25 Weak-to-Strong Extrapolation Expedites Alignment Chujie Zheng et.al. 2404.16792 link
2024-04-25 SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension Bohao Li et.al. 2404.16790 link
2024-04-25 Continual Learning of Large Language Models: A Comprehensive Survey Haizhou Shi et.al. 2404.16789 link
2024-04-25 Prefix Text as a Yarn: Eliciting Non-English Alignment in Foundation Language Model Runzhe Zhan et.al. 2404.16766 null
2024-04-25 RadGenome-Chest CT: A Grounded Vision-Language Dataset for Chest CT Analysis Xiaoman Zhang et.al. 2404.16754 null
2024-04-24 Hybrid LLM/Rule-based Approaches to Business Insights Generation from Structured Data Aliaksei Vertsel et.al. 2404.15604 null
2024-04-24 ImplicitAVE: An Open-Source Dataset and Multimodal LLMs Benchmark for Implicit Attribute Value Extraction Henry Peng Zou et.al. 2404.15592 link
2024-04-24 Can Foundational Large Language Models Assist with Conducting Pharmaceuticals Manufacturing Investigations? Hossein Salami et.al. 2404.15578 null
2024-04-23 PRISM: Patient Records Interpretation for Semantic Clinical Trial Matching using Large Language Models Shashi Kant Gupta et.al. 2404.15549 null
2024-04-23 Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models Mihir Parmar et.al. 2404.15522 link
2024-04-23 Visual Delta Generator with Large Multi-modal Models for Semi-supervised Composed Image Retrieval Young Kyun Jang et.al. 2404.15516 null
2024-04-23 ToM-LM: Delegating Theory Of Mind Reasoning to External Symbolic Executors in Large Language Models Weizhi Tang et.al. 2404.15515 null
2024-04-23 GeoLLM-Engine: A Realistic Environment for Building Geospatial Copilots Simranjit Singh et.al. 2404.15500 null
2024-04-23 IryoNLP at MEDIQA-CORR 2024: Tackling the Medical Error Detection & Correction Task On the Shoulders of Medical Agents Jean-Philippe Corbeil et.al. 2404.15488 link
2024-04-23 Large Language Models Spot Phishing Emails with Surprising Accuracy: A Comparative Analysis of Performance Het Patel et.al. 2404.15485 null
2024-04-23 Aligning LLM Agents by Learning Latent Preference from User Edits Ge Gao et.al. 2404.15269 link
2024-04-23 XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts Yifeng Ding et.al. 2404.15247 link
2024-04-23 Revisiting Unnaturalness for Automated Program Repair in the Era of Large Language Models Aidan Z. H. Yang et.al. 2404.15236 null
2024-04-23 Re-Thinking Inverse Graphics With Large Language Models Peter Kulits et.al. 2404.15228 null
2024-04-23 Setting up the Data Printer with Improved English to Ukrainian Machine Translation Yurii Paniv et.al. 2404.15196 link
2024-04-23 Regressive Side Effects of Training Language Models to Mimic Student Misconceptions Shashank Sonkar et.al. 2404.15156 null
2024-04-23 Bias patterns in the application of LLMs for clinical decision support: A comprehensive study Raphael Poulain et.al. 2404.15149 null
2024-04-23 Rethinking LLM Memorization through the Lens of Adversarial Compression Avi Schwarzschild et.al. 2404.15146 null
2024-04-23 MedDr: Diagnosis-Guided Bootstrapping for Large-Scale Medical Vision-Language Learning Sunan He et.al. 2404.15127 link
2024-04-23 Multimodal Large Language Model is a Human-Aligned Annotator for Text-to-Image Generation Xun Wu et.al. 2404.15100 null
2024-04-22 AutoAD III: The Prequel – Back to the Pixels Tengda Han et.al. 2404.14412 null
2024-04-22 SpaceByte: Towards Deleting Tokenization from Large Language Modeling Kevin Slagle et.al. 2404.14408 link
2024-04-22 RTP-LX: Can LLMs Evaluate Toxicity in Multilingual Scenarios? Adrian de Wynter et.al. 2404.14397 link
2024-04-22 A Survey on Self-Evolution of Large Language Models Zhengwei Tao et.al. 2404.14387 null
2024-04-22 Beyond Scaling: Predicting Patent Approval with Domain-specific Fine-grained Claim Dependency Graph Xiaochen Kev Gao et.al. 2404.14372 link
2024-04-22 Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data Fahim Tajwar et.al. 2404.14367 link
2024-04-22 Better Synthetic Data by Retrieving and Transforming Existing Datasets Saumya Gandhi et.al. 2404.14361 link
2024-04-22 Rethinking Legal Compliance Automation: Opportunities with Large Language Models Shabnam Hassani et.al. 2404.14356 null
2024-04-22 Automated Long Answer Grading with RiceChem Dataset Shashank Sonkar et.al. 2404.14316 null
2024-04-22 Explaining Arguments’ Strength: Unveiling the Role of Attacks and Supports (Technical Report) Xiang Yin et.al. 2404.14304 null
2024-04-19 MoVA: Adapting Mixture of Vision Experts to Multimodal Context Zhuofan Zong et.al. 2404.13046 link
2024-04-19 Unified Scene Representation and Reconstruction for 3D Large Language Models Tao Chu et.al. 2404.13044 null
2024-04-19 Data Alignment for Zero-Shot Concept Generation in Dermatology AI Soham Gadgil et.al. 2404.13043 null
2024-04-19 LaPA: Latent Prompt Assist Model For Medical Visual Question Answering Tiancheng Gu et.al. 2404.13039 link
2024-04-19 Sample Design Engineering: An Empirical Study of What Makes Good Downstream Fine-Tuning Samples for LLMs Biyang Guo et.al. 2404.13033 link
2024-04-19 When Life gives you LLMs, make LLM-ADE: Large Language Models with Adaptive Data Engineering Stephen Choi et.al. 2404.13028 null
2024-04-19 Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models Chuofan Ma et.al. 2404.13013 link
2024-04-19 Rethinking the Evaluation of Dialogue Systems: Effects of User Feedback on Crowdworkers and LLMs Clemencia Siro et.al. 2404.12994 link
2024-04-19 RedactBuster: Entity Type Recognition from Redacted Documents Mirco Beltrame et.al. 2404.12991 null
2024-04-19 FineRec: Exploring Fine-grained Sequential Recommendation Xiaokun Zhang et.al. 2404.12975 null
2024-04-18 BLINK: Multimodal Large Language Models Can See but Not Perceive Xingyu Fu et.al. 2404.12390 null
2024-04-18 MedThink: Explaining Medical Visual Question Answering via Multimodal Decision-Making Rationale Xiaotang Gai et.al. 2404.12372 null
2024-04-18 When LLMs are Unfit Use FastFit: Fast and Effective Text Classification with Many Classes Asaf Yehudai et.al. 2404.12365 link
2024-04-18 Towards a Foundation Model for Partial Differential Equation: Multi-Operator Learning and Extrapolation Jingmin Sun et.al. 2404.12355 link
2024-04-18 V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning Hang Hua et.al. 2404.12353 null
2024-04-18 Large Language Models in Targeted Sentiment Analysis Nicolay Rusnachenko et.al. 2404.12342 link
2024-04-18 Normative Requirements Operationalization with Large Language Models Nick Feng et.al. 2404.12335 null
2024-04-18 Large Language Models for Synthetic Participatory Planning of Shared Automated Electric Mobility Systems Jiangbo Yu et.al. 2404.12317 null
2024-04-18 Simultaneous Interpretation Corpus Construction by Large Language Models in Distant Language Pair Yusuke Sakai et.al. 2404.12299 null
2024-04-18 Augmenting emotion features in irony detection with Large language modeling Yucheng Lin et.al. 2404.12291 null
2024-04-17 A Deep Dive into Large Language Models for Automated Bug Localization and Repair Soneya Binta Hossain et.al. 2404.11595 null
2024-04-17 Related Work and Citation Text Generation: A Survey Xiangci Li et.al. 2404.11588 null
2024-04-17 LLMTune: Accelerate Database Knob Tuning with Large Language Models Xinmei Huang et.al. 2404.11581 null
2024-04-17 MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation Kuan-Chieh et.al. 2404.11565 null
2024-04-17 Quantifying Multilingual Performance of Large Language Models Across Languages Zihao Li et.al. 2404.11553 link
2024-04-17 Evaluating Span Extraction in Generative Paradigm: A Reflection on Aspect-Based Sentiment Analysis Soyoung Yang et.al. 2404.11539 null
2024-04-17 Pack of LLMs: Model Fusion at Test-Time via Perplexity Optimization Costas Mavromatis et.al. 2404.11531 null
2024-04-17 Embedding Privacy in Computational Social Science and Artificial Intelligence Research Keenan Jones et.al. 2404.11515 null
2024-04-17 Towards Coarse-to-Fine Evaluation of Inference Efficiency for Large Language Models Yushuo Chen et.al. 2404.11502 link
2024-04-17 Paraphrase and Solve: Exploring and Exploiting the Impact of Surface Form on Mathematical Reasoning in Large Language Models Yue Zhou et.al. 2404.11500 link
2024-04-16 Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback Qiwei Di et.al. 2404.10776 null
2024-04-16 LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation? Yuchi Wang et.al. 2404.10763 link
2024-04-16 Deep Learning and LLM-based Methods Applied to Stellar Lightcurve Classification Yu-Yang Li et.al. 2404.10757 null
2024-04-16 Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study Shusheng Xu et.al. 2404.10719 null
2024-04-16 An empirical study on code review activity prediction in practice Doriane Olewicki et.al. 2404.10703 null
2024-04-16 Automating REST API Postman Test Cases Using LLM S Deepika Sri et.al. 2404.10678 null
2024-04-16 ViTextVQA: A Large-Scale Visual Question Answering Dataset for Evaluating Vietnamese Text Comprehension in Images Quan Van Nguyen et.al. 2404.10652 link
2024-04-16 Self-playing Adversarial Language Game Enhances LLM Reasoning Pengyu Cheng et.al. 2404.10642 link
2024-04-16 HLAT: High-quality Large Language Model Pre-trained on AWS Trainium Haozheng Fan et.al. 2404.10630 null
2024-04-16 Private Attribute Inference from Images with Vision-Language Models Batuhan Tömekçe et.al. 2404.10618 null
2024-04-15 Personalized Collaborative Fine-Tuning for On-Device Large Language Models Nicolas Wagner et.al. 2404.09753 null
2024-04-15 Quantization of Large Language Models with an Overdetermined Basis Daniil Merkulov et.al. 2404.09737 null
2024-04-15 Unveiling Imitation Learning: Exploring the Impact of Data Falsity to Large Language Model Hyunsoo Cho et.al. 2404.09717 null
2024-04-15 Enhancing Robot Explanation Capabilities through Vision-Language Models: a Preliminary Study by Interpreting Visual Inputs for Improved Human-Robot Interaction David Sobrín-Hidalgo et.al. 2404.09705 null
2024-04-15 Generative AI for Game Theory-based Mobile Networking Long He et.al. 2404.09699 null
2024-04-15 Are Large Language Models Reliable Argument Quality Annotators? Nailia Mirzakhmedova et.al. 2404.09696 null
2024-04-15 LoRAP: Transformer Sub-Layers Deserve Differentiated Structured Compression for Large Language Models Guangyan Li et.al. 2404.09695 null
2024-04-15 Multi-News+: Cost-efficient Dataset Cleansing via LLM-based Data Annotation Juhwan Choi et.al. 2404.09682 link
2024-04-15 Do LLMs Understand Visual Anomalies? Uncovering LLM Capabilities in Zero-shot Anomaly Detection Jiaqi Zhu et.al. 2404.09654 null
2024-04-15 Bridging Vision and Language Spaces with Assignment Prediction Jungin Park et.al. 2404.09632 link
2024-04-12 Enhancing Visual Question Answering through Question-Driven Image Captions as Prompts Övgü Özdemir et.al. 2404.08589 link
2024-04-12 Enhancing Autonomous Vehicle Training with Language Model Integration and Critical Scenario Generation Hanlin Tian et.al. 2404.08570 null
2024-04-12 RLHF Deciphered: A Critical Analysis of Reinforcement Learning from Human Feedback for LLMs Shreyas Chaudhari et.al. 2404.08555 null
2024-04-12 Online Safety Analysis for LLMs: a Benchmark, an Assessment, and a Path Forward Xuan Xie et.al. 2404.08517 null
2024-04-12 Efficient Interactive LLM Serving with Proxy Model-based Sequence Length Prediction Haoran Qiu et.al. 2404.08509 link
2024-04-12 LaSagnA: Language-based Segmentation Assistant for Complex Queries Cong Wei et.al. 2404.08506 link
2024-04-12 Strategic Interactions between Large Language Models-based Agents in Beauty Contests Siting Lu et.al. 2404.08492 null
2024-04-12 Thematic Analysis with Large Language Models: does it work with languages other than English? A targeted test in Italian Stefano De Paoli et.al. 2404.08488 null
2024-04-12 Comparing Apples to Oranges: LLM-powered Multimodal Intention Prediction in an Object Categorization Task Hassan Ali et.al. 2404.08424 null
2024-04-12 AdapterSwap: Continuous Training of LLMs with Data Removal and Access-Control Guarantees William Fleshman et.al. 2404.08417 null
2024-04-11 OpenBias: Open-set Bias Detection in Text-to-Image Generative Models Moreno D’Incà et.al. 2404.07990 link
2024-04-11 View Selection for 3D Captioning via Diffusion Ranking Tiange Luo et.al. 2404.07984 null
2024-04-11 Manipulating Large Language Models to Increase Product Visibility Aounon Kumar et.al. 2404.07981 link
2024-04-11 LLoCO: Learning Long Contexts Offline Sijun Tan et.al. 2404.07979 link
2024-04-11 Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models Haotian Zhang et.al. 2404.07973 null
2024-04-11 Leveraging Large Language Models (LLMs) to Support Collaborative Human-AI Online Risk Data Annotation Jinkyung Park et.al. 2404.07926 null
2024-04-11 LaVy: Vietnamese Multimodal Large Language Model Chi Tran et.al. 2404.07922 link
2024-04-11 AmpleGCG: Learning a Universal and Transferable Generative Model of Adversarial Suffixes for Jailbreaking Both Open and Closed LLMs Zeyi Liao et.al. 2404.07921 link
2024-04-11 DesignQA: A Multimodal Benchmark for Evaluating Large Language Models’ Understanding of Engineering Documentation Anna C. Doris et.al. 2404.07917 link
2024-04-11 High-Dimension Human Value Representation in Large Language Models Samuel Cahyawijaya et.al. 2404.07900 link
2024-04-10 UMBRAE: Unified Multimodal Decoding of Brain Signals Weihao Xia et.al. 2404.07202 null
2024-04-10 Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention Tsendsuren Munkhdalai et.al. 2404.07143 link
2024-04-11 Semantically-correlated memories in a dense associative model Thomas F Burns et.al. 2404.07123 null
2024-04-10 Continuous Language Model Interpolation for Dynamic and Controllable Text Generation Sara Kangaslahti et.al. 2404.07117 null
2024-04-11 From Model-centered to Human-Centered: Revision Distance as a Metric for Text Evaluation in LLMs-based Applications Yongqiang Ma et.al. 2404.07108 null
2024-04-10 Graph Chain-of-Thought: Augmenting Large Language Models by Reasoning on Graphs Bowen Jin et.al. 2404.07103 link
2024-04-10 Dynamic Generation of Personalities with Large Language Models Jianzhi Liu et.al. 2404.07084 null
2024-04-10 VLLMs Provide Better Context for Emotion Understanding Through Common Sense Reasoning Alexandros Xenos et.al. 2404.07078 link
2024-04-10 Exploring Concept Depth: How Large Language Models Acquire Knowledge at Different Layers? Mingyu Jin et.al. 2404.07066 link
2024-04-10 Groundedness in Retrieval-augmented Long-form Generation: An Empirical Study Alessandro Stolfo et.al. 2404.07060 null
2024-04-09 Pitfalls of Conversational LLMs on News Debiasing Ipek Baris Schlicht et.al. 2404.06488 null
2024-04-09 Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks Chonghua Wang et.al. 2404.06480 link
2024-04-09 Automated Federated Pipeline for Parameter-Efficient Fine-Tuning of Large Language Models Zihan Fang et.al. 2404.06448 null
2024-04-09 Large Language Models to the Rescue: Deadlock Resolution in Multi-Robot Systems Kunal Garg et.al. 2404.06413 null
2024-04-09 AgentQuest: A Modular Benchmark Framework to Measure Progress and Improve LLM Agents Luca Gioacchini et.al. 2404.06411 link
2024-04-09 Take a Look at it! Rethinking How to Evaluate Language Model Jailbreak Hongyu Cai et.al. 2404.06407 link
2024-04-09 Apprentices to Research Assistants: Advancing Research with Large Language Models M. Namvarpour et.al. 2404.06404 null
2024-04-09 MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies Shengding Hu et.al. 2404.06395 link
2024-04-09 MuPT: A Generative Symbolic Music Pretrained Transformer Xingwei Qu et.al. 2404.06393 null
2024-04-09 Latent Distance Guided Alignment Training for Large Language Models Haotian Luo et.al. 2404.06390 null
2024-04-08 MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding Bo He et.al. 2404.05726 link
2024-04-08 Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs Keen You et.al. 2404.05719 null
2024-04-08 Comprehensive Study on German Language Models for Clinical and Biomedical Text Understanding Ahmad Idrissi-Yaghir et.al. 2404.05694 null
2024-04-08 Evaluating Mathematical Reasoning Beyond Accuracy Shijie Xia et.al. 2404.05692 link
2024-04-08 Retrieval-Augmented Open-Vocabulary Object Detection Jooyeon Kim et.al. 2404.05687 link
2024-04-08 MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation Kunpeng Song et.al. 2404.05674 link
2024-04-08 CoReS: Orchestrating the Dance of Reasoning and Segmentation Xiaoyi Bao et.al. 2404.05673 link
2024-04-08 Fighting crime with Transformers: Empirical analysis of address parsing methods in payment data Haitham Hammami et.al. 2404.05632 link
2024-04-08 LTNER: Large Language Model Tagging for Named Entity Recognition with Contextualized Entity Marking Faren Yan et.al. 2404.05624 null
2024-04-08 MedExpQA: Multilingual Benchmarking of Large Language Models for Medical Question Answering Iñigo Alonso et.al. 2404.05590 null
2024-04-05 Physical Property Understanding from Language-Embedded Feature Fields Albert J. Zhai et.al. 2404.04242 null
2024-04-05 Cleared for Takeoff? Compositional & Conditional Reasoning may be the Achilles Heel to (Flight-Booking) Language Agents Harsh Kohli et.al. 2404.04237 null
2024-04-05 Benchmarking and Improving Compositional Generalization of Multi-aspect Controllable Text Generation Tianqi Zhong et.al. 2404.04232 link
2024-04-05 Social Skill Training with Large Language Models Diyi Yang et.al. 2404.04204 null
2024-04-05 Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model Xinrun Du et.al. 2404.04167 null
2024-04-05 Large language models as oracles for instantiating ontologies with domain-specific knowledge Giovanni Ciatto et.al. 2404.04108 link
2024-04-05 Improving Factual Accuracy of Neural Table-to-Text Output by Addressing Input Problems in ToTTo Barkavi Sundararajan et.al. 2404.04103 link
2024-04-05 Robust Preference Optimization with Provable Noise Tolerance for LLMs Xize Liang et.al. 2404.04102 null
2024-04-05 Assessing the quality of information extraction Filip Seitl et.al. 2404.04068 null
2024-04-05 CLUE: A Clinical Language Understanding Evaluation for LLMs Amin Dada et.al. 2404.04067 link
2024-04-04 CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching Dongzhi Jiang et.al. 2404.03653 link
2024-04-04 AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent Hanyu Lai et.al. 2404.03648 link
2024-04-04 Capabilities of Large Language Models in Control Engineering: A Benchmark Study on GPT-4, Claude 3 Opus, and Gemini 1.0 Ultra Darioush Kevian et.al. 2404.03647 null
2024-04-04 Training LLMs over Neurally Compressed Text Brian Lester et.al. 2404.03626 null
2024-04-04 Unveiling LLMs: The Evolution of Latent Representations in a Temporal Knowledge Graph Marco Bronzini et.al. 2404.03623 link
2024-04-04 Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models Wenshan Wu et.al. 2404.03622 link
2024-04-04 DeViDe: Faceted medical knowledge for improved medical vision-language pre-training Haozhe Luo et.al. 2404.03618 null
2024-04-04 Sailor: Open Language Models for South-East Asia Longxu Dou et.al. 2404.03608 link
2024-04-04 Evaluating LLMs at Detecting Errors in LLM Responses Ryo Kamoi et.al. 2404.03602 link
2024-04-04 Intent Detection and Entity Extraction from BioMedical Literature Ankan Mullick et.al. 2404.03598 link
2024-04-03 ALOHa: A New Measure for Hallucination in Captioning Models Suzanne Petryk et.al. 2404.02904 null
2024-04-03 MatAtlas: Text-driven Consistent Geometry Texturing and Material Assignment Duygu Ceylan et.al. 2404.02899 null
2024-04-03 ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline Yifan Xu et.al. 2404.02893 link
2024-04-03 Integrating Explanations in Learning LTL Specifications from Demonstrations Ashutosh Gupta et.al. 2404.02872 null
2024-04-03 Toward Inference-optimal Mixture-of-Expert Large Language Models Longfei Yun et.al. 2404.02852 null
2024-04-03 I-Design: Personalized LLM Interior Designer Ata Çelen et.al. 2404.02838 null
2024-04-03 Cherry on Top: Parameter Heterogeneity and Quantization in Large Language Models Wanyun Cui et.al. 2404.02837 null
2024-04-03 Retrieving Examples from Memory for Retrieval Augmented Neural Machine Translation: A Systematic Comparison Maxime Bouthors et.al. 2404.02835 null
2024-04-03 Empowering Biomedical Discovery with AI Agents Shanghua Gao et.al. 2404.02831 null
2024-04-03 BAdam: A Memory Efficient Full Parameter Training Method for Large Language Models Qijun Luo et.al. 2404.02827 link
2024-04-02 Topic-based Watermarks for LLM-Generated Text Alexander Nemecek et.al. 2404.02138 null
2024-04-02 Exploring Automated Distractor Generation for Math Multiple-choice Questions via Large Language Models Wanyong Feng et.al. 2404.02124 null
2024-04-02 GINopic: Topic Modeling with Graph Isomorphism Network Suman Adhya et.al. 2404.02115 link
2024-04-02 CLAPNQ: Cohesive Long-form Answers from Passages in Natural Questions for RAG systems Sara Rosenthal et.al. 2404.02103 link
2024-04-02 Advancing LLM Reasoning Generalists with Preference Trees Lifan Yuan et.al. 2404.02078 link
2024-04-02 Digital Forgetting in Large Language Models: A Survey of Unlearning Methods Alberto Blanco-Justicia et.al. 2404.02062 null
2024-04-02 Long-context LLMs Struggle with Long In-context Learning Tianle Li et.al. 2404.02060 link
2024-04-02 Deconstructing In-Context Learning: Understanding Prompts via Corruption Namrata Shivagunde et.al. 2404.02054 link
2024-04-02 BERTopic-Driven Stock Market Predictions: Unraveling Sentiment Insights Enmin Zhu et.al. 2404.02053 null
2024-04-02 A Survey on Large Language Model-Based Game Agents Sihao Hu et.al. 2404.02039 link
2024-03-29 Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models Atsuyuki Miyai et.al. 2403.20331 link
2024-03-29 Gecko: Versatile Text Embeddings Distilled from Large Language Models Jinhyuk Lee et.al. 2403.20327 null
2024-03-29 Convolutional Prompting meets Language Models for Continual Learning Anurag Roy et.al. 2403.20317 null
2024-03-29 Towards Greener LLMs: Bringing Energy-Efficiency to the Forefront of LLM Inference Jovan Stojkovic et.al. 2403.20306 null
2024-03-29 Can LLMs Correct Physicians, Yet? Investigating Effective Interaction Methods in the Medical Domain Burcu Sayin et.al. 2403.20288 null
2024-03-29 LUQ: Long-text Uncertainty Quantification for LLMs Caiqi Zhang et.al. 2403.20279 null
2024-04-01 Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want Weifeng Lin et.al. 2403.20271 link
2024-03-29 Latxa: An Open Language Model and Evaluation Suite for Basque Julen Etxaniz et.al. 2403.20266 link
2024-03-29 ELITR-Bench: A Meeting Assistant Benchmark for Long-Context Language Models Thibaut Thonet et.al. 2403.20262 null
2024-03-29 Using LLMs to Model the Beliefs and Preferences of Targeted Populations Keiichi Namikoshi et.al. 2403.20252 null
2024-03-28 InterDreamer: Zero-Shot Text to 3D Dynamic Human-Object Interaction Sirui Xu et.al. 2403.19652 null
2024-03-28 MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions Kai Zhang et.al. 2403.19651 null
2024-03-28 Change-Agent: Towards Interactive Comprehensive Change Interpretation and Analysis from Change Detection and Change Captioning Chenyang Liu et.al. 2403.19646 link
2024-03-28 Retrieval-Enhanced Knowledge Editing for Multi-Hop Question Answering in Language Models Yucheng Shi et.al. 2403.19631 null
2024-03-28 Semantic Map-based Generation of Navigation Instructions Chengzu Li et.al. 2403.19603 link
2024-03-28 LocCa: Visual Pretraining with Location-aware Captioners Bo Wan et.al. 2403.19596 null
2024-03-28 Img2Loc: Revisiting Image Geolocalization using Multi-modality Foundation Models and Image-based Retrieval-Augmented Generation Zhongliang Zhou et.al. 2403.19584 null
2024-03-28 WaterJudge: Quality-Detection Trade-off when Watermarking Large Language Models Piotr Molenda et.al. 2403.19548 null
2024-03-28 LLMs as Academic Reading Companions: Extending HCI Through Synthetic Personae Celia Chen et.al. 2403.19506 null
2024-03-28 Evolving Assembly Code in an Adversarial Environment Irina Maliukov et.al. 2403.19489 null
2024-03-27 Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models Yanwei Li et.al. 2403.18814 link
2024-03-27 ECoDepth: Effective Conditioning of Diffusion Models for Monocular Depth Estimation Suraj Patni et.al. 2403.18807 link
2024-03-27 Is Modularity Transferable? A Case Study through the Lens of Knowledge Distillation Mateusz Klimaszewski et.al. 2403.18804 null
2024-03-27 Long-form factuality in large language models Jerry Wei et.al. 2403.18802 link
2024-03-27 3P-LLM: Probabilistic Path Planning using Large Language Model for Autonomous Robot Navigation Ehsan Latif et.al. 2403.18778 null
2024-03-27 CheckEval: Robust Evaluation Framework using Large Language Model via Checklist Yukyung Lee et.al. 2403.18771 null
2024-03-27 MLDT: Multi-Level Decomposition for Complex Long-Horizon Robotic Task Planning with Open-Source Large Language Model Yike Wu et.al. 2403.18760 null
2024-03-27 Understanding the Learning Dynamics of Alignment with Human Feedback Shawn Im et.al. 2403.18742 null
2024-03-27 PhysicsAssistant: An LLM-Powered Interactive Learning Robot for Physics Lab Investigations Ehsan Latif et.al. 2403.18721 null
2024-03-27 NL-ITI: Optimizing Probing and Intervention for Improvement of ITI Method Jakub Hoscilowicz et.al. 2403.18680 link
2024-03-26 MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue Resolution Wei Tao et.al. 2403.17927 null
2024-03-26 LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning Rui Pan et.al. 2403.17919 null
2024-03-26 Addressing Social Misattributions of Large Language Models: An HCXAI-based Approach Andrea Ferrario et.al. 2403.17873 null
2024-03-26 Exploring LLMs as a Source of Targeted Synthetic Textual Data to Minimize High Confidence Misclassifications Philip Lippmann et.al. 2403.17860 null
2024-03-26 ChroniclingAmericaQA: A Large-scale Question Answering Dataset based on Historical American Newspaper Pages Bhawna Piryani et.al. 2403.17859 link
2024-03-26 Verbing Weirds Language (Models): Evaluation of English Zero-Derivation in Five LLMs David R. Mortensen et.al. 2403.17856 null
2024-03-26 ArabicaQA: A Comprehensive Dataset for Arabic Question Answering Abdelrahman Abdallah et.al. 2403.17848 link
2024-03-26 Assessment of Multimodal Large Language Models in Alignment with Human Values Zhelun Shi et.al. 2403.17830 null
2024-03-26 Accelerating Radio Spectrum Regulation Workflows with Large Language Models (LLMs) Amir Ghasemi et.al. 2403.17819 null
2024-03-26 Are Compressed Language Models Less Subgroup Robust? Leonidas Gee et.al. 2403.17811 link
2024-03-25 Towards Human-AI Deliberation: Design and Evaluation of LLM-Empowered Deliberative AI for AI-Assisted Decision-Making Shuai Ma et.al. 2403.16812 null
2024-03-25 An LLM-Based Digital Twin for Optimizing Human-in-the Loop Systems Hanqing Yang et.al. 2403.16809 null
2024-03-25 Iterative Refinement of Project-Level Code Context for Precise Code Generation with Compiler Feedback Zhangqian Bi et.al. 2403.16792 null
2024-03-25 All Artificial, Less Intelligence: GenAI through the Lens of Formal Verification Deepak Narayan Gadde et.al. 2403.16750 null
2024-03-25 Synapse: Learning Preferential Concepts from Visual Demonstrations Sadanand Modak et.al. 2403.16689 null
2024-03-25 Investigation of the effectiveness of applying ChatGPT in Dialogic Teaching Using Electroencephalography Jiayue Zhang et.al. 2403.16687 null
2024-03-25 ToXCL: A Unified Framework for Toxic Speech Detection and Explanation Nhat M. Hoang et.al. 2403.16685 link
2024-03-25 RU22Fact: Optimizing Evidence for Multilingual Explainable Fact-Checking on Russia-Ukraine Conflict Yirong Zeng et.al. 2403.16662 link
2024-03-25 Grammatical vs Spelling Error Correction: An Investigation into the Responsiveness of Transformer-based Language Models using BART and MarianMT Rohit Raju et.al. 2403.16655 null
2024-03-25 CLHA: A Simple yet Effective Contrastive Learning Framework for Human Alignment Feiteng Fang et.al. 2403.16649 null
2024-03-25 Virtual Co-Pilot: Multimodal Large Language Model-enabled Quick-access Procedures for Single Pilot Operations Fan Li et.al. 2403.16645 null
2024-03-25 Conversational Grounding: Annotation and Analysis of Grounding Acts and Grounding Units Biswesh Mohapatra et.al. 2403.16609 null
2024-03-25 TrustAI at SemEval-2024 Task 8: A Comprehensive Analysis of Multi-domain Machine Generated Text Detection Techniques Ashok Urlana et.al. 2403.16592 null
2024-03-25 Can Large Language Models (or Humans) Distill Text? Nicolas Audinet de Pieuchon et.al. 2403.16584 null
2024-03-22 LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models Yuzhang Shang et.al. 2403.15388 null
2024-03-22 Long-CLIP: Unlocking the Long-Text Capability of CLIP Beichen Zhang et.al. 2403.15378 null
2024-03-22 Can large language models explore in-context? Akshay Krishnamurthy et.al. 2403.15371 null
2024-03-22 CoLLEGe: Concept Embedding Generation for Large Language Models Ryan Teehan et.al. 2403.15362 null
2024-03-22 Multi-Review Fusion-in-Context Aviv Slobodkin et.al. 2403.15351 null
2024-03-22 CO-Fun: A German Dataset on Company Outsourcing in Fund Prospectuses for Named Entity Recognition and Relation Extraction Neda Foroutan et.al. 2403.15322 null
2024-03-22 Sphere Neural-Networks for Rational Reasoning Tiansi Dong et.al. 2403.15297 null
2024-03-22 Measuring Gender and Racial Biases in Large Language Models Jiafu An et.al. 2403.15281 null
2024-03-22 Bioinformatics and Biomedical Informatics with ChatGPT: Year One Review Jinge Wang et.al. 2403.15274 null
2024-03-22 Event Temporal Relation Extraction based on Retrieval-Augmented on LLMs Xiaobin Zhang et.al. 2403.15273 null
2024-03-21 MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems? Renrui Zhang et.al. 2403.14624 null
2024-03-21 Language Repository for Long Video Understanding Kumara Kahatapitiya et.al. 2403.14622 link
2024-03-21 Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey Zeyu Han et.al. 2403.14608 null
2024-03-21 MyVLM: Personalizing VLMs for User-Specific Queries Yuval Alaluf et.al. 2403.14599 null
2024-03-21 Large Language Models for Multi-Choice Question Classification of Medical Subjects Víctor Ponce-López et.al. 2403.14582 null
2024-03-21 RAmBLA: A Framework for Evaluating the Reliability of LLMs as Assistants in the Biomedical Domain William James Bolton et.al. 2403.14578 link
2024-03-21 A Chain-of-Thought Prompting Approach with LLMs for Evaluating Students’ Formative Assessment Responses in Science Clayton Cohn et.al. 2403.14565 null
2024-03-21 EDT: Improving Large Language Models’ Generation by Entropy-based Dynamic Temperature Sampling Shimao Zhang et.al. 2403.14541 null
2024-03-21 Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference Han Zhao et.al. 2403.14520 null
2024-03-21 The Ethics of ChatGPT in Medicine and Healthcare: A Systematic Review on Large Language Models (LLMs) Joschka Haltaufderheide et.al. 2403.14473 null
2024-03-20 RAR: Retrieving And Ranking Augmented MLLMs for Visual Recognition Ziyu Liu et.al. 2403.13805 null
2024-03-20 Learning from Models and Data for Visual Grounding Ruozhen He et.al. 2403.13804 null
2024-03-20 Reverse Training to Nurse the Reversal Curse Olga Golovneva et.al. 2403.13799 null
2024-03-20 Chain-of-Interaction: Enhancing Large Language Models for Psychiatric Behavior Understanding by Dyadic Contexts Guangzeng Han et.al. 2403.13786 null
2024-03-20 Leveraging High-Resolution Features for Improved Deep Hashing-based Image Retrieval Aymene Berriche et.al. 2403.13747 null
2024-03-20 EthioLLM: Multilingual Large Language Models for Ethiopian Languages with Task Evaluation Atnafu Lambebo Tonja et.al. 2403.13737 null
2024-03-20 Large Language Models meet Network Slicing Management and Orchestration Abdulhalim Dandoush et.al. 2403.13721 null
2024-03-20 RoleInteract: Evaluating the Social Interaction of Role-Playing Agents Hongzhan Chen et.al. 2403.13679 null
2024-03-20 Do Not Worry if You Do Not Have Data: Building Pretrained Language Models Using Translationese Meet Doshi et.al. 2403.13638 null
2024-03-20 VL-Mamba: Exploring State Space Models for Multimodal Learning Yanyuan Qiao et.al. 2403.13600 null
2024-03-19 Dated Data: Tracing Knowledge Cutoffs in Large Language Models Jeffrey Cheng et.al. 2403.12958 null
2024-03-19 Automatic Information Extraction From Employment Tribunal Judgements Using Large Language Models Joana Ribeiro de Faria et.al. 2403.12936 null
2024-03-19 Rapid AIdeation: Generating Ideas With the Self and in Collaboration With Large Language Models Gionnieve Lim et.al. 2403.12928 null
2024-03-19 Supporting Energy Policy Research with Large Language Models Grant Buster et.al. 2403.12924 null
2024-03-19 Semantic Layering in Room Segmentation via LLMs Taehyeon Kim et.al. 2403.12920 null
2024-03-19 Toward Sustainable GenAI using Generation Directives for Carbon-Friendly Large Language Model Inference Baolin Li et.al. 2403.12900 null
2024-03-19 mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding Anwen Hu et.al. 2403.12895 link
2024-03-19 MEDBind: Unifying Language and Multimodal Medical Data Embeddings Yuan Gao et.al. 2403.12894 null
2024-03-19 HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning Fucai Ke et.al. 2403.12884 null
2024-03-19 Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models Zehui Chen et.al. 2403.12881 link
2024-03-18 HDLdebugger: Streamlining HDL debugging with Large Language Models Xufeng Yao et.al. 2403.11671 null
2024-03-18 Let’s Focus on Neuron: Neuron-Level Supervised Fine-tuning for Large Language Model Haoyun Xu et.al. 2403.11621 null
2024-03-18 Linguacodus: A Synergistic Framework for Transformative Code Generation in Machine Learning Pipelines Ekaterina Trofimova et.al. 2403.11585 null
2024-03-18 Reinforcement Learning with Token-level Feedback for Controllable Text Generation Wendi Li et.al. 2403.11558 null
2024-03-18 LLM^3: Large Language Model-based Task and Motion Planning with Motion Failure Reasoning Shu Wang et.al. 2403.11552 link
2024-03-18 TARN-VIST: Topic Aware Reinforcement Network for Visual Storytelling Weiran Chen et.al. 2403.11550 null
2024-03-18 DEE: Dual-stage Explainable Evaluation Method for Text Generation Shenyu Zhang et.al. 2403.11509 null
2024-03-18 Can LLMs Generate Human-Like Wayfinding Instructions? Towards Platform-Agnostic Embodied Instruction Synthesis Vishnu Sashank Dorbala et.al. 2403.11487 null
2024-03-18 VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding Yue Fan et.al. 2403.11481 null
2024-03-18 HateCOT: An Explanation-Enhanced Dataset for Generalizable Offensive Speech Detection via Large Language Models Huy Nghiem et.al. 2403.11456 link
2024-03-14 Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference Piotr Nawrot et.al. 2403.09636 null
2024-03-14 3D-VLA: A 3D Vision-Language-Action Generative World Model Haoyu Zhen et.al. 2403.09631 null
2024-03-14 MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training Brandon McKinzie et.al. 2403.09611 null
2024-03-14 Large Language Models and Causal Inference in Collaboration: A Comprehensive Survey Xiaoyu Liu et.al. 2403.09606 null
2024-03-14 Logical Discrete Graphical Models Must Supplement Large Language Models for Information Synthesis Gregory Coppola et.al. 2403.09599 null
2024-03-14 ExploRLLM: Guiding Exploration in Reinforcement Learning with Large Language Models Runyu Ma et.al. 2403.09583 null
2024-03-14 Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation Yunhao Gou et.al. 2403.09572 null
2024-03-14 Enhancing Trust in Autonomous Agents: An Architecture for Accountability and Explainability through Blockchain and Large Language Models Laura Fernández-Becerra et.al. 2403.09567 null
2024-03-14 Welcome Your New AI Teammate: On Safety Analysis by Leashing Large Language Models Ali Nouri et.al. 2403.09565 null
2024-03-14 Less is More: Data Value Estimation for Visual Instruction Tuning Zikang Liu et.al. 2403.09559 null
2024-03-13 Simple and Scalable Strategies to Continually Pre-train Large Language Models Adam Ibrahim et.al. 2403.08763 null
2024-03-13 Steering LLMs Towards Unbiased Responses: A Causality-Guided Debiasing Framework Jingling Li et.al. 2403.08743 null
2024-03-13 The Garden of Forking Paths: Observing Dynamic Parameters Distribution in Large Language Models Carlo Nicolini et.al. 2403.08739 null
2024-03-13 Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization Renjie Pi et.al. 2403.08730 null
2024-03-14 SOTOPIA-$π$: Interactive Learning of Socially Intelligent Language Agents Ruiyi Wang et.al. 2403.08715 link
2024-03-13 Review of Generative AI Methods in Cybersecurity Yagmur Yigit et.al. 2403.08701 null
2024-03-13 TeaMs-RL: Teaching LLMs to Teach Themselves Better Instructions via Reinforcement Learning Shangding Gu et.al. 2403.08694 null
2024-03-13 Token Alignment via Character Matching for Subword Completion Ben Athiwaratkun et.al. 2403.08688 null
2024-03-13 Zero-shot and Few-shot Generation Strategies for Artificial Clinical Records Erlend Frayling et.al. 2403.08664 null
2024-03-13 Human Alignment of Large Language Models through Online Preference Optimisation Daniele Calandriello et.al. 2403.08635 null
2024-03-12 Beyond Text: Frozen Large Language Models in Visual Signal Comprehension Lei Zhu et.al. 2403.07874 link
2024-03-12 Rethinking Generative Large Language Model Evaluation for Semantic Comprehension Fangyun Wei et.al. 2403.07872 null
2024-03-12 Exploring Safety Generalization Challenges of Large Language Models via Code Qibing Ren et.al. 2403.07865 null
2024-03-12 DeliGrasp: Inferring Object Mass, Friction, and Compliance with LLMs for Adaptive and Minimally Deforming Grasp Policies William Xie et.al. 2403.07832 null
2024-03-12 The Missing Piece in Model Editing: A Deep Dive into the Hidden Damage Brought By Model Editing Jianchen Wang et.al. 2403.07825 null
2024-03-12 Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM Sainbayar Sukhbaatar et.al. 2403.07816 null
2024-03-12 Fine-tuning Large Language Models with Sequential Instructions Hanxu Hu et.al. 2403.07794 link
2024-03-12 Transforming Competition into Collaboration: The Revolutionary Role of Multi-Agent Systems and Language Models in Modern Organizations Carlos Jose Xavier Cruz et.al. 2403.07769 link
2024-03-12 Synth$^2$: Boosting Visual-Language Models with Synthetic Captions and Image Embeddings Sahand Sharifzadeh et.al. 2403.07750 null
2024-03-12 FineMath: A Fine-Grained Mathematical Evaluation Benchmark for Chinese Large Language Models Yan Liu et.al. 2403.07747 null
2024-03-11 Hybrid Human-LLM Corpus Construction and LLM Evaluation for Rare Linguistic Phenomena Leonie Weissweiler et.al. 2403.06965 null
2024-03-11 Materials science in the era of large language models: a perspective Ge Lei et.al. 2403.06949 null
2024-03-11 Naming, Describing, and Quantifying Visual Objects in Humans and LLMs Alberto Testoni et.al. 2403.06935 null
2024-03-11 ERA-CoT: Improving Chain-of-Thought through Entity Relationship Analysis Yanming Liu et.al. 2403.06932 link
2024-03-11 MEND: Meta dEmonstratioN Distillation for Efficient and Effective In-Context Learning Yichuan Li et.al. 2403.06914 null
2024-03-11 Exploring Large Language Models and Hierarchical Frameworks for Classification of Large Unstructured Legal Documents Nishchal Prasad et.al. 2403.06872 null
2024-03-11 Development of a Reliable and Accessible Caregiving Language Model (CaLM) Bambang Parmanto et.al. 2403.06857 null
2024-03-11 DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation Guosheng Zhao et.al. 2403.06845 null
2024-03-11 RA-ISF: Learning to Answer and Understand from Retrieval Augmentation via Iterative Self-Feedback Yanming Liu et.al. 2403.06840 link
2024-03-11 ACFIX: Guiding LLMs with Mined Common RBAC Practices for Context-Aware Repair of Access Control Vulnerabilities in Smart Contracts Lyuye Zhang et.al. 2403.06838 null
2024-03-08 Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context Machel Reid et.al. 2403.05530 null
2024-03-08 GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM Hao Kang et.al. 2403.05527 link
2024-03-08 Beyond Finite Data: Towards Data-free Out-of-distribution Generalization via Extrapola Yijiang Li et.al. 2403.05523 null
2024-03-08 Will GPT-4 Run DOOM? Adrian de Wynter et.al. 2403.05468 null
2024-03-08 Cost-Performance Optimization for Processing Low-Resource Language Tasks Using Commercial LLMs Arijit Nag et.al. 2403.05434 null
2024-03-08 Explaining Pre-Trained Language Models with Attribution Scores: An Analysis in Low-Resource Settings Wei Zhou et.al. 2403.05338 null
2024-03-08 ChatASU: Evoking LLM’s Reflexion to Truly Understand Aspect Sentiment in Dialogues Yiding Liu et.al. 2403.05326 null
2024-03-08 RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation Zihao Wang et.al. 2403.05313 null
2024-03-08 Tapilot-Crossing: Benchmarking and Evolving LLMs Towards Interactive Data Analysis Agents Jinyang Li et.al. 2403.05307 null
2024-03-08 ACLSum: A New Dataset for Aspect-based Summarization of Scientific Publications Sotaro Takeshita et.al. 2403.05303 link
2024-03-07 Efficient LoFTR: Semi-Dense Local Feature Matching with Sparse-Like Speed Yifan Wang et.al. 2403.04765 null
2024-03-07 iScore: Visual Analytics for Interpreting How Language Models Automatically Score Summaries Adam Coscia et.al. 2403.04760 link
2024-03-07 KnowledgeVIS: Interpreting Language Models by Comparing Fill-in-the-Blank Prompts Adam Coscia et.al. 2403.04758 link
2024-03-07 LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error Boshi Wang et.al. 2403.04746 link
2024-03-07 SnapNTell: Enhancing Entity-Centric Visual Question Answering with Retrieval Augmented Multimodal LLM Jielin Qiu et.al. 2403.04735 null
2024-03-07 ObjectCompose: Evaluating Resilience of Vision-Based Models on Object-to-Background Compositional Changes Hashmat Shadab Malik et.al. 2403.04701 null
2024-03-07 Fact-Checking the Output of Large Language Models via Token-Level Uncertainty Quantification Ekaterina Fadeeva et.al. 2403.04696 null
2024-03-07 PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation Junsong Chen et.al. 2403.04692 null
2024-03-07 Telecom Language Models: Must They Be Large? Nicola Piovesan et.al. 2403.04666 null
2024-03-07 QAQ: Quality Adaptive Quantization for LLM KV Cache Shichen Dong et.al. 2403.04643 link
2024-03-06 Bridging Language and Items for Retrieval and Recommendation Yupeng Hou et.al. 2403.03952 link
2024-03-06 Did Translation Models Get More Robust Without Anyone Even Noticing? Ben Peters et.al. 2403.03923 null
2024-03-06 Fuzzing BusyBox: Leveraging LLM and Crash Reuse for Embedded Bug Unearthing Asmita et.al. 2403.03897 null
2024-03-06 SaulLM-7B: A pioneering Large Language Model for Law Pierre Colombo et.al. 2403.03883 null
2024-03-06 Learning to Decode Collaboratively with Multiple Language Models Shannon Zejiang Shen et.al. 2403.03870 link
2024-03-06 On the Origins of Linear Representations in Large Language Models Yibo Jiang et.al. 2403.03867 null
2024-03-06 KIWI: A Dataset of Knowledge-Intensive Writing Instructions for Answering Research Questions Fangyuan Xu et.al. 2403.03866 null
2024-03-06 Are Language Models Puzzle Prodigies? Algorithmic Puzzles Unveil Serious Challenges in Multimodal Reasoning Deepanway Ghosal et.al. 2403.03864 link
2024-03-06 X-Shot: A Unified System to Handle Frequent, Few-shot and Zero-shot Learning Simultaneously in Classification Hanzi Xu et.al. 2403.03863 link
2024-03-06 Emojinize: Enriching Any Text with Emoji Translations Lars Henning Klein et.al. 2403.03857 null
2024-03-05 The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning Nathaniel Li et.al. 2403.03218 null
2024-03-05 CLEVR-POC: Reasoning-Intensive Visual Question Answering in Partially Observable Environments Savitha Sam Abraham et.al. 2403.03203 null
2024-03-05 Towards Democratized Flood Risk Management: An Advanced AI Assistant Enabled by GPT-4 for Enhanced Interpretability and Public Engagement Rafaela Martelo et.al. 2403.03188 link
2024-03-05 MOKA: Open-Vocabulary Robotic Manipulation through Mark-Based Visual Prompting Fangchen Liu et.al. 2403.03174 null
2024-03-05 SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection Peng Qi et.al. 2403.03170 null
2024-03-05 PARADISE: Evaluating Implicit Planning Skills of Language Models with Procedural Warnings and Tips Dataset Arda Uzunoğlu et.al. 2403.03167 link
2024-03-05 Quantum Many-Body Physics Calculations with Large Language Models Haining Pan et.al. 2403.03154 null
2024-03-05 Language Guided Exploration for RL Agents in Text Environments Hitesh Golchha et.al. 2403.03141 null
2024-03-05 Angry Men, Sad Women: Large Language Models Reflect Gendered Stereotypes in Emotion Attribution Flor Miriam Plaza-del-Arco et.al. 2403.03121 null
2024-03-05 “In Dialogues We Learn”: Towards Personalized Dialogue Without Pre-defined Profiles through In-Dialogue Learning Chuanqi Cheng et.al. 2403.03102 null
2024-03-02 LM4OPT: Unveiling the Potential of Large Language Models in Formulating Mathematical Optimization Problems Tasnim Ahmed et.al. 2403.01342 null
2024-03-02 Chaining thoughts and LLMs to learn DNA structural biophysics Tyler D. Ross et.al. 2403.01332 null
2024-03-02 VNLP: Turkish NLP Package Meliksah Turker et.al. 2403.01309 null
2024-03-02 VBART: The Turkish LLM Meliksah Turker et.al. 2403.01308 null
2024-03-02 ICC: Quantifying Image Caption Concreteness for Multimodal Dataset Curation Moran Yanuka et.al. 2403.01306 null
2024-03-02 Improving the Validity of Automatically Generated Feedback via Reinforcement Learning Alexander Scarlatos et.al. 2403.01304 link
2024-03-02 NoMAD-Attention: Efficient LLM Inference on CPUs Through Multiply-add-free Attention Tianyi Zhang et.al. 2403.01273 null
2024-03-02 Employing LLMs for Incident Response Planning and Review Sam Hays et.al. 2403.01271 null
2024-03-02 A comprehensive cross-language framework for harmful content detection with the aid of sentiment analysis Mohammad Dehghani et.al. 2403.01270 null
2024-03-02 Dissecting Language Models: Machine Unlearning via Selective Pruning Nicholas Pochinkov et.al. 2403.01267 null
2024-02-29 The All-Seeing Project V2: Towards General Relation Comprehension of the Open World Weiyun Wang et.al. 2402.19474 link
2024-02-29 Loose LIPS Sink Ships: Asking Questions in Battleship with Language-Informed Program Sampling Gabriel Grand et.al. 2402.19471 null
2024-02-29 Towards Tracing Trustworthiness Dynamics: Revisiting Pre-training Period of Large Language Models Chen Qian et.al. 2402.19465 link
2024-02-29 Curiosity-driven Red-teaming for Large Language Models Zhang-Wei Hong et.al. 2402.19464 link
2024-02-29 ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL Yifei Zhou et.al. 2402.19446 link
2024-02-29 Compositional API Recommendation for Library-Oriented Code Generation Zexiong Ma et.al. 2402.19431 null
2024-02-29 Crafting Knowledge: Exploring the Creative Mechanisms of Chat-Based Search Engines Lijia Ma et.al. 2402.19421 null
2024-02-29 On the Scaling Laws of Geographical Representation in Language Models Nathan Godey et.al. 2402.19406 null
2024-02-29 Entity-Aware Multimodal Alignment Framework for News Image Captioning Junzhe Zhang et.al. 2402.19404 null
2024-02-29 Wisdom of the Silicon Crowd: LLM Ensemble Prediction Capabilities Match Human Crowd Accuracy Philipp Schoenegger et.al. 2402.19379 null
2024-02-28 Arithmetic Control of LLMs for Diverse User Preferences: Directional Preference Alignment with Multi-Objective Rewards Haoxiang Wang et.al. 2402.18571 link
2024-02-28 A Categorization of Complexity Classes for Information Retrieval and Synthesis Using Natural Logic Gregory Coppola et.al. 2402.18566 null
2024-02-28 Implicit Bias of Next-Token Prediction Christos Thrampoulidis et.al. 2402.18551 null
2024-02-28 Few-Shot Fairness: Unveiling LLM’s Potential for Fairness-Aware Classification Garima Chhikara et.al. 2402.18502 null
2024-02-28 Take It, Leave It, or Fix It: Measuring Productivity and Trust in Human-AI Collaboration Crystal Qian et.al. 2402.18498 null
2024-02-28 Language Models Represent Beliefs of Self and Others Wentao Zhu et.al. 2402.18496 null
2024-02-28 Meta-Task Prompting Elicits Embedding from Large Language Models Yibin Lei et.al. 2402.18458 null
2024-02-28 Beyond Natural Language: LLMs Leveraging Alternative Formats for Enhanced Reasoning and Communication Weize Chen et.al. 2402.18439 link
2024-02-28 Unsupervised Cross-Domain Image Retrieval via Prototypical Optimal Transport Bin Li et.al. 2402.18411 link
2024-02-28 A Cognitive Evaluation Benchmark of Image Reasoning and Description for Large Vision Language Models Xiujie Song et.al. 2402.18409 null
