LLM
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2025-12-18 | AdaTooler-V: Adaptive Tool-Use for Images and Videos | Chaoyang Wang et.al. | 2512.16918 | null |
| 2025-12-18 | Generative Adversarial Reasoner: Enhancing LLM Reasoning with Adversarial Reinforcement Learning | Qihao Liu et.al. | 2512.16917 | null |
| 2025-12-18 | Exploration v.s. Exploitation: Rethinking RLVR through Clipping, Entropy, and Spurious Reward | Peter Chen et.al. | 2512.16912 | null |
| 2025-12-18 | Impacts of Racial Bias in Historical Training Data for News AI | Rahul Bhargava et.al. | 2512.16901 | null |
| 2025-12-18 | Multimodal RewardBench 2: Evaluating Omni Reward Models for Interleaved Text and Image | Yushi Hu et.al. | 2512.16899 | null |
| 2025-12-18 | LinkedOut: Linking World Knowledge Representation Out of Video LLM for Next-Generation Video Recommendation | Haichao Zhang et.al. | 2512.16891 | null |
| 2025-12-18 | AdaSearch: Balancing Parametric Knowledge and Search in Large Language Models via Reinforcement Learning | Tzu-Han Lin et.al. | 2512.16883 | null |
| 2025-12-18 | TOGGLE: Temporal Logic-Guided Large Language Model Compression for Edge | Khurram Khalil et.al. | 2512.16855 | null |
| 2025-12-18 | Meta-RL Induces Exploration in Language Agents | Yulun Jiang et.al. | 2512.16848 | null |
| 2025-12-18 | Toward Systematic Counterfactual Fairness Evaluation of Large Language Models: The CAFFE Framework | Alessandra Parziale et.al. | 2512.16816 | null |
| 2025-12-18 | From Facts to Conclusions : Integrating Deductive Reasoning in Retrieval-Augmented LLMs | Shubham Mishra et.al. | 2512.16795 | null |
| 2025-12-18 | Inside Out: Uncovering How Comment Internalization Steers LLMs for Better or Worse | Aaron Imani et.al. | 2512.16790 | null |
| 2025-12-18 | Vision-Language-Action Models for Autonomous Driving: Past, Present, and Future | Tianshuai Hu et.al. | 2512.16760 | null |
| 2025-12-18 | Plausibility as Failure: How LLMs and Humans Co-Construct Epistemic Error | Claudia Vale Oliveira et.al. | 2512.16750 | null |
| 2025-12-18 | AI-Driven Prediction of Cancer Pain Episodes: A Hybrid Decision Support Approach | Yipeng Zhuang et.al. | 2512.16739 | null |
| 2025-12-18 | Cyber Humanism in Education: Reclaiming Agency through AI and Learning Sciences | Giovanni Adorni et.al. | 2512.16701 | null |
| 2025-12-18 | Do Multi-Agents Solve Better Than Single? Evaluating Agentic Frameworks for Diagram-Grounded Geometry Problem Solving and Reasoning | Mahbub E Sobhani et.al. | 2512.16698 | null |
| 2025-12-18 | DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI | Hao Liang et.al. | 2512.16676 | null |
| 2025-12-18 | Microsoft Academic Graph Information Retrieval for Research Recommendation and Assistance | Jacob Reiss et.al. | 2512.16661 | null |
| 2025-12-18 | Prefix Probing: Lightweight Harmful Content Detection for Large Language Models | Jirui Yang et.al. | 2512.16650 | null |
| 2025-12-18 | JustRL: Scaling a 1.5B LLM with a Simple RL Recipe | Bingxiang He et.al. | 2512.16649 | null |
| 2025-12-18 | Stackelberg Learning from Human Feedback: Preference Optimization as a Sequential Game | Barna Pásztor et.al. | 2512.16626 | null |
| 2025-12-18 | Refusal Steering: Fine-grained Control over LLM Refusal Behaviour for Sensitive Topics | Iker García-Ferrero et.al. | 2512.16602 | null |
| 2025-12-18 | Muon is Provably Faster with Momentum Variance Reduction | Xun Qian et.al. | 2512.16598 | null |
| 2025-12-18 | Sketch-in-Latents: Eliciting Unified Reasoning in MLLMs | Jintao Tong et.al. | 2512.16584 | null |
| 2025-12-18 | Non-Asymptotic Global Convergence of PPO-Clip | Yin Liu et.al. | 2512.16565 | null |
| 2025-12-18 | Needle in the Web: A Benchmark for Retrieving Targeted Web Pages in the Wild | Yumeng Wang et.al. | 2512.16553 | null |
| 2025-12-18 | A Systematic Study of Code Obfuscation Against LLM-based Vulnerability Detection | Xiao Li et.al. | 2512.16538 | null |
| 2025-12-18 | From Personalization to Prejudice: Bias and Discrimination in Memory-Enhanced AI Agents for Recruitment | Himanshu Gharat et.al. | 2512.16532 | null |
| 2025-12-18 | Scaling Laws for Energy Efficiency of Local LLMs | Ander Alvarez et.al. | 2512.16531 | null |
| 2025-12-18 | Plain language adaptations of biomedical text using LLMs: Comparision of evaluation metrics | Primoz Kocbek et.al. | 2512.16530 | null |
| 2025-12-18 | Efficient CPU-GPU Collaborative Inference for MoE-based LLMs on Memory-Limited Systems | En-Ming Huang et.al. | 2512.16473 | null |
| 2025-12-18 | cuPilot: A Strategy-Coordinated Multi-agent Framework for CUDA Kernel Evolution | Jinwu Chen et.al. | 2512.16465 | null |
| 2025-12-18 | TimeSeries2Report prompting enables adaptive large language model management of lithium-ion batteries | Jiayang Yang et.al. | 2512.16453 | null |
| 2025-12-18 | Towards AI-Supported Research: a Vision of the TIB AIssistant | Sören Auer et.al. | 2512.16447 | null |
| 2025-12-18 | Topic Modelling Black Box Optimization | Roman Akramov et.al. | 2512.16445 | null |
| 2025-12-18 | TIB AIssistant: a Platform for AI-Supported Research Across Research Life Cycles | Allard Oelen et.al. | 2512.16442 | null |
| 2025-12-18 | From Essence to Defense: Adaptive Semantic-aware Watermarking for Embedding-as-a-Service Copyright Protection | Hao Li et.al. | 2512.16439 | null |
| 2025-12-18 | Introducing ORKG ASK: an AI-driven Scholarly Literature Search and Exploration System Taking a Neuro-Symbolic Approach | Allard Oelen et.al. | 2512.16425 | null |
| 2025-12-18 | Synthelite: Chemist-aligned and feasibility-aware synthesis planning with LLMs | Nguyen Xuan-Vu et.al. | 2512.16424 | null |
| 2025-12-18 | Large Language Models as a (Bad) Security Norm in the Context of Regulation and Compliance | Kaspar Rosager Ludvigsen et.al. | 2512.16419 | null |
| 2025-12-18 | BrepLLM: Native Boundary Representation Understanding with Large Language Models | Liyuan Deng et.al. | 2512.16413 | null |
| 2025-12-18 | A Network Arena for Benchmarking AI Agents on Network Troubleshooting | Zhihao Wang et.al. | 2512.16381 | null |
| 2025-12-18 | Hearing to Translate: The Effectiveness of Speech Modality Integration into LLMs | Sara Papi et.al. | 2512.16378 | null |
| 2025-12-18 | Factorized Video Generation: Decoupling Scene Construction and Temporal Synthesis in Text-to-Video Diffusion Models | Mariam Hassan et.al. | 2512.16371 | null |
| 2025-12-18 | AI Needs Physics More Than Physics Needs AI | Peter Coveney et.al. | 2512.16344 | null |
| 2025-12-18 | Design and Evaluation of Cost-Aware PoQ for Decentralized LLM Inference | Arther Tian et.al. | 2512.16317 | null |
| 2025-12-18 | Agent Tools Orchestration Leaks More: Dataset, Benchmark, and Mitigation | Yuxuan Qiao et.al. | 2512.16310 | null |
| 2025-12-18 | PixelArena: A benchmark for Pixel-Precision Visual Intelligence | Feng Liang et.al. | 2512.16303 | null |
| 2025-12-18 | Code-in-the-Loop Forensics: Agentic Tool Use for Image Forgery Detection | Fanrui Zhang et.al. | 2512.16300 | null |
| 2025-12-18 | Feature-Selective Representation Misdirection for Machine Unlearning | Taozhao Chen et.al. | 2512.16297 | null |
| 2025-12-18 | MACL: Multi-Label Adaptive Contrastive Learning Loss for Remote Sensing Image Retrieval | Amna Amir et.al. | 2512.16294 | null |
| 2025-12-18 | Ein Typenrad auf der Überholspur: Die Kult-Schreibmaschine “Erika” trifft KI | Karola Köpferl et.al. | 2512.16293 | null |
| 2025-12-18 | In-Context Probing for Membership Inference in Fine-Tuned Language Models | Zhexi Lu et.al. | 2512.16292 | null |
| 2025-12-18 | Evaluating OpenAI GPT Models for Translation of Endangered Uralic Languages: A Comparison of Reasoning and Non-Reasoning Architectures | Yehor Tereshchenko et.al. | 2512.16287 | null |
| 2025-12-18 | CKA-Guided Modular Quantization: Beyond Bit-Width to Algorithmic Diversity | Jinhao Zhang et.al. | 2512.16282 | null |
| 2025-12-18 | Love, Lies, and Language Models: Investigating AI’s Role in Romance-Baiting Scams | Gilad Gressel et.al. | 2512.16280 | null |
| 2025-12-18 | QuadSentinel: Sequent Safety for Machine-Checkable Control in Multi-agent Systems | Yiliu Yang et.al. | 2512.16279 | null |
| 2025-12-18 | Fast Collaborative Inference via Distributed Speculative Decoding | Ce Zheng et.al. | 2512.16273 | null |
| 2025-12-18 | Beyond Blind Spots: Analytic Hints for Mitigating LLM-Based Evaluation Pitfalls | Ora Nova Fandina et.al. | 2512.16272 | null |
| 2025-12-18 | Learning to Wait: Synchronizing Agents with the Physical World | Yifei She et.al. | 2512.16262 | null |
| 2025-12-18 | AMUSE: Audio-Visual Benchmark and Alignment Framework for Agentic Multi-Speaker Understanding | Sanjoy Chowdhury et.al. | 2512.16250 | null |
| 2025-12-18 | AlignMerge - Alignment-Preserving Large Language Model Merging via Fisher-Guided Geometric Constraints | Aniruddha Roy et.al. | 2512.16245 | null |
| 2025-12-18 | Coarse-to-Fine Open-Set Graph Node Classification with Large Language Models | Xueqi Ma et.al. | 2512.16244 | null |
| 2025-12-18 | Trustworthy and Controllable Professional Knowledge Utilization in Large Language Models with TEE-GPU Execution | Yifeng Cai et.al. | 2512.16238 | null |
| 2025-12-18 | The Evolution of Reranking Models in Information Retrieval: From Heuristic Methods to Large Language Models | Tejul Pandit et.al. | 2512.16236 | null |
| 2025-12-18 | LoPA: Scaling dLLM Inference via Lookahead Parallel Decoding | Chenkai Xu et.al. | 2512.16229 | null |
| 2025-12-18 | An Information-Theoretic Framework for Robust Large Language Model Editing | Qizhou Chen et.al. | 2512.16227 | null |
| 2025-12-18 | DualGuard: Dual-stream Large Language Model Watermarking Defense against Paraphrase and Spoofing Attack | Hao Li et.al. | 2512.16182 | null |
| 2025-12-18 | Ev-Trust: A Strategy Equilibrium Trust Mechanism for Evolutionary Games in LLM-Based Multi-Agent Services | Shiduo Yang et.al. | 2512.16167 | null |
| 2025-12-18 | Staggered Batch Scheduling: Co-optimizing Time-to-First-Token and Throughput for High-Efficiency LLM Inference | Jian Tian et.al. | 2512.16134 | null |
| 2025-12-18 | Scaling Text2SQL via LLM-efficient Schema Filtering with Functional Dependency Graph Rerankers | Thanh Dat Hoang et.al. | 2512.16083 | null |
| 2025-12-18 | Auto-Vocabulary 3D Object Detection | Haomeng Zhang et.al. | 2512.16077 | null |
| 2025-12-18 | LLM4Perf: Large Language Models Are Effective Samplers for Multi-Objective Performance Modeling (Copy) | Xin Wang et.al. | 2512.16070 | null |
| 2025-12-18 | A Multi-Agent Large Language Model Framework for Automated Qualitative Analysis | Qidi Xu et.al. | 2512.16063 | null |
| 2025-12-18 | ContextLeak: Auditing Leakage in Private In-Context Learning Methods | Jacob Choi et.al. | 2512.16059 | null |
| 2025-12-18 | MultiPath Transfer Engine: Breaking GPU and Host-Memory Bandwidth Bottlenecks in LLM Services | Lingfeng Tang et.al. | 2512.16056 | null |
| 2025-12-17 | Topic Discovery and Classification for Responsible Generative AI Adaptation in Higher Education | Diane Myung-kyung Woodbridge et.al. | 2512.16036 | null |
| 2025-12-17 | Do Large Language Models Know What They Don’t Know? Kalshibench: A New Benchmark for Evaluating Epistemic Calibration via Prediction Markets | Lukas Nel et.al. | 2512.16030 | null |
| 2025-12-17 | Cross-Language Bias Examination in Large Language Models | Yuxuan Liang et.al. | 2512.16029 | null |
| 2025-12-17 | Conversational Time Series Foundation Models: Towards Explainable and Effective Forecasting | Defu Cao et.al. | 2512.16022 | null |
| 2025-12-17 | Few-Shot Inference of Human Perceptions of Robot Performance in Social Navigation Scenarios | Qiping Zhang et.al. | 2512.16019 | null |
| 2025-12-17 | OLAF: Towards Robust LLM-Based Annotation Framework in Empirical Software Engineering | Mia Mohammad Imran et.al. | 2512.15979 | null |
| 2025-12-17 | Dynamic Rank Reinforcement Learning for Adaptive Low-Rank Multi-Head Self Attention in Large Language Models | Caner Erden et.al. | 2512.15973 | null |
| 2025-12-17 | BRAID: Bounded Reasoning for Autonomous Inference and Decisions | Armağan Amcalar et.al. | 2512.15959 | null |
| 2025-12-17 | The Perceptual Observatory Characterizing Robustness and Grounding in MLLMs | Tejas Anvekar et.al. | 2512.15949 | null |
| 2025-12-17 | Privacy Discourse and Emotional Dynamics in Mental Health Information Interaction on Reddit | Jai Kruthunz Naveen Kumar et.al. | 2512.15945 | null |
| 2025-12-17 | Small Language Models for Efficient Agentic Tool Calling: Outperforming Large Models with Targeted Fine-tuning | Polaris Jhandi et.al. | 2512.15943 | null |
| 2025-12-17 | City Navigation in the Wild: Exploring Emergent Navigation from Web-Scale Knowledge in MLLMs | Dwip Dalal et.al. | 2512.15933 | null |
| 2025-12-17 | DSO: Direct Steering Optimization for Bias Mitigation | Lucas Monteiro Paes et.al. | 2512.15926 | null |
| 2025-12-17 | Leveraging Spreading Activation for Improved Document Retrieval in Knowledge-Graph-Based RAG Systems | Jovan Pavlović et.al. | 2512.15922 | null |
| 2025-12-17 | TabReX : Tabular Referenceless eXplainable Evaluation | Tejas Anvekar et.al. | 2512.15907 | null |
| 2025-12-17 | Darth Vecdor: An Open-Source System for Generating Knowledge Graphs Through Large Language Model Queries | Jonathan A. Handler et.al. | 2512.15906 | null |
| 2025-12-17 | PediatricAnxietyBench: Evaluating Large Language Model Safety Under Parental Anxiety and Pressure in Pediatric Consultations | Vahideh Zolfaghari et.al. | 2512.15894 | null |
| 2025-12-17 | VET Your Agent: Towards Host-Independent Autonomy via Verifiable Execution Traces | Artem Grigor et.al. | 2512.15892 | null |
| 2025-12-17 | Seeing Beyond Words: Self-Supervised Visual Learning for Multimodal Large Language Models | Davide Caffagni et.al. | 2512.15885 | null |
| 2025-12-17 | HEPTAPOD: Orchestrating High Energy Physics Workflows Towards Autonomous Agency | Tony Menzo et.al. | 2512.15867 | null |
| 2025-12-17 | Dynamic Rebatching for Efficient Early-Exit Inference with DREX | Xuting Liu et.al. | 2512.15705 | null |
| 2025-12-17 | Skyra: AI-Generated Video Detection via Grounded Artifact Reasoning | Yifei Li et.al. | 2512.15693 | null |
| 2025-12-17 | Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning | Zhenwen Liang et.al. | 2512.15687 | null |
| 2025-12-17 | Activation Oracles: Training and Evaluating LLMs as General-Purpose Activation Explainers | Adam Karvonen et.al. | 2512.15674 | null |
| 2025-12-17 | Explaining the Reasoning of Large Language Models Using Attribution Graphs | Chase Walker et.al. | 2512.15663 | null |
| 2025-12-17 | Stepwise Think-Critique: A Unified Framework for Robust and Interpretable LLM Reasoning | Jiaqi Xu et.al. | 2512.15662 | null |
| 2025-12-17 | How Much is Too Much? Exploring LoRA Rank Trade-offs for Retaining Knowledge and Domain Robustness | Darshita Rathore et.al. | 2512.15634 | null |
| 2025-12-17 | Evaluating Metrics for Safety with LLM-as-Judges | Kester Clegg et.al. | 2512.15617 | null |
| 2025-12-17 | Behavior Tokens Speak Louder: Disentangled Explainable Recommendation with Behavior Vocabulary | Xinshun Feng et.al. | 2512.15614 | null |
| 2025-12-17 | Autoregressive Language Models are Secretly Energy-Based Models: Insights into the Lookahead Capabilities of Next-Token Prediction | Mathieu Blondel et.al. | 2512.15605 | null |
| 2025-12-17 | Evaluating Large Language Models in Scientific Discovery | Zhangde Song et.al. | 2512.15567 | null |
| 2025-12-17 | GRAN-TED: Generating Robust, Aligned, and Nuanced Text Embedding for Diffusion Models | Bozhou Li et.al. | 2512.15560 | null |
| 2025-12-17 | CTkvr: KV Cache Retrieval for Long-Context LLMs via Centroid then Token Indexing | Kuan Lu et.al. | 2512.15550 | null |
| 2025-12-17 | When a Nation Speaks: Machine Learning and NLP in People’s Sentiment Analysis During Bangladesh’s 2024 Mass Uprising | Md. Samiul Alim et.al. | 2512.15547 | null |
| 2025-12-17 | An Efficient and Effective Encoder Model for Vision and Language Tasks in the Remote Sensing Domain | João Daniel Silva et.al. | 2512.15531 | null |
| 2025-12-17 | EmoCaliber: Advancing Reliable Visual Emotion Comprehension via Confidence Verbalization and Calibration | Daiqing Wu et.al. | 2512.15528 | null |
| 2025-12-17 | How Do Semantically Equivalent Code Transformations Impact Membership Inference on LLMs for Code? | Hua Yang et.al. | 2512.15468 | null |
| 2025-12-17 | On Assessing the Relevance of Code Reviews Authored by Generative Models | Robert Heumüller et.al. | 2512.15466 | null |
| 2025-12-17 | Toward expert-level motivational interviewing for health behavior improvement with LLMs | Run-ze Hu et.al. | 2512.15446 | null |
| 2025-12-17 | Step-GUI Technical Report | Haolong Yan et.al. | 2512.15431 | null |
| 2025-12-17 | Can AI Generate more Comprehensive Test Scenarios? Review on Automated Driving Systems Test Scenario Generation Methods | Ji Zhou et.al. | 2512.15422 | null |
| 2025-12-17 | Bilateral Spatial Reasoning about Street Networks: Graph-based RAG with Qualitative Spatial Representations | Reinhard Moratz et.al. | 2512.15388 | null |
| 2025-12-17 | MedNuggetizer: Confidence-Based Information Nugget Extraction from Medical Documents | Gregor Donabauer et.al. | 2512.15384 | null |
| 2025-12-17 | SCOPE: Prompt Evolution for Enhancing Agent Effectiveness | Zehua Pei et.al. | 2512.15374 | null |
| 2025-12-17 | ArcBERT: An LLM-based Search Engine for Exploring Integrated Multi-Omics Metadata | Gajendra Doniparthi et.al. | 2512.15365 | null |
| 2025-12-17 | Revisiting Task-Oriented Dataset Search in the Era of Large Language Models: Challenges, Benchmark, and Solution | Zixin Wei et.al. | 2512.15363 | null |
| 2025-12-17 | Dual-Density Inference for Efficient Language Model Reasoning | Zhengyi Zhao et.al. | 2512.15358 | null |
| 2025-12-17 | Adversarial versification in portuguese as a jailbreak operator in LLMs | Joao Queiroz et.al. | 2512.15353 | null |
| 2025-12-17 | Exploring User Acceptance and Concerns toward LLM-powered Conversational Agents in Immersive Extended Reality | Efe Bozkir et.al. | 2512.15343 | null |
| 2025-12-17 | Evaluating LLMs for Zeolite Synthesis Event Extraction (ZSEE): A Systematic Analysis of Prompting Strategies | Charan Prakash Rathore et.al. | 2512.15312 | null |
| 2025-12-17 | SynthSeg-Agents: Multi-Agent Synthetic Data Generation for Zero-Shot Weakly Supervised Semantic Segmentation | Wangyu Wu et.al. | 2512.15310 | null |
| 2025-12-17 | Towards Proactive Personalization through Profile Customization for Individual Users in Dialogues | Xiaotian Zhang et.al. | 2512.15302 | null |
| 2025-12-17 | ChatGPT and Gemini participated in the Korean College Scholastic Ability Test – Earth Science I | Seok-Hyun Ga et.al. | 2512.15298 | null |
| 2025-12-17 | Heterogeneous Model Alignment in Digital Twin | Faima Abbasi et.al. | 2512.15281 | null |
| 2025-12-17 | Bounty Hunter: Autonomous, Comprehensive Emulation of Multi-Faceted Adversaries | Louis Hackländer-Jansen et.al. | 2512.15275 | null |
| 2025-12-17 | Well Begun, Half Done: Reinforcement Learning with Prefix Optimization for LLM Reasoning | Yiliu Sun et.al. | 2512.15274 | null |
| 2025-12-17 | Gaming the Arena: AI Model Evaluation and the Viral Capture of Attention | Sam Hind et.al. | 2512.15252 | null |
| 2025-12-17 | The Moralization Corpus: Frame-Based Annotation and Analysis of Moralizing Speech Acts across Diverse Text Genres | Maria Becker et.al. | 2512.15248 | null |
| 2025-12-17 | Null-LoRA: Low-Rank Adaptation on Null Space | Yi Zhang et.al. | 2512.15233 | null |
| 2025-12-17 | CangLing-KnowFlow: A Unified Knowledge-and-Flow-fused Agent for Comprehensive Remote Sensing Applications | Zhengchao Chen et.al. | 2512.15231 | null |
| 2025-12-17 | Yes-MT’s Submission to the Low-Resource Indic Language Translation Shared Task in WMT 2024 | Yash Bhaskar et.al. | 2512.15226 | null |
| 2025-12-17 | RFKG-CoT: Relation-Driven Adaptive Hop-count Selection and Few-Shot Path Guidance for Knowledge-Aware QA | Chao Zhang et.al. | 2512.15219 | null |
| 2025-12-17 | DEER: Draft with Diffusion, Verify with Autoregressive Models | Zicong Cheng et.al. | 2512.15176 | null |
| 2025-12-17 | MCP-SafetyBench: A Benchmark for Safety Evaluation of Large Language Models with Real-World MCP Servers | Xuanjun Zong et.al. | 2512.15163 | null |
| 2025-12-17 | Offline Multi-Task Multi-Objective Data-Driven Evolutionary Algorithm with Language Surrogate Model and Implicit Q-Learning | Xian-Rong Zhang et.al. | 2512.15149 | null |
| 2025-12-17 | Aligning Academia with Industry: An Empirical Study of Industrial Needs and Academic Capabilities in AI-Driven Software Engineering | Hang Yu et.al. | 2512.15148 | null |
| 2025-12-17 | Beyond Majority Voting: Towards Fine-grained and More Reliable Reward Signal for Test-Time Reinforcement Learning | Weiqin Wang et.al. | 2512.15146 | null |
| 2025-12-17 | “I am here for you”: How relational conversational AI appeals to adolescents, especially those who are socially and emotionally vulnerable | Pilyoung Kim et.al. | 2512.15117 | null |
| 2025-12-17 | Uni-Parser Technical Report | Xi Fang et.al. | 2512.15098 | null |
| 2025-12-17 | Beyond Fast and Slow: Cognitive-Inspired Elastic Reasoning for Large Language Models | Jinwu Hu et.al. | 2512.15089 | null |
| 2025-12-17 | The Semantic Architect: How FEAML Bridges Structured Data and LLMs for Multi-Label Tasks | Wanfu Gao et.al. | 2512.15082 | null |
| 2025-12-17 | Quantifying Return on Security Controls in LLM Systems | Richard Helder Moulton et.al. | 2512.15081 | null |
| 2025-12-17 | An Exploratory Study of Bayesian Prompt Optimization for Test-Driven Code Generation with Large Language Models | Shlok Tomar et.al. | 2512.15076 | null |
| 2025-12-17 | The Meta-Prompting Protocol: Orchestrating LLMs via Adversarial Feedback Loops | Fanzhe Fu et.al. | 2512.15053 | null |
| 2025-12-17 | SGM: Safety Glasses for Multimodal Large Language Models via Neuron-Level Detoxification | Hongbo Wang et.al. | 2512.15052 | null |
| 2025-12-17 | Beyond Accuracy: A Geometric Stability Analysis of Large Language Models in Chess Evaluation | Xidan Song et.al. | 2512.15033 | null |
| 2025-12-17 | Toxicity Ahead: Forecasting Conversational Derailment on GitHub | Mia Mohammad Imran et.al. | 2512.15031 | null |
| 2025-12-17 | SeBERTis: A Framework for Producing Classifiers of Security-Related Issue Reports | Sogol Masoumzadeh et.al. | 2512.15003 | null |
| 2025-12-17 | DreamPRM-Code: Function-as-Step Process Reward Model with Label Correction for LLM Coding | Ruiyi Zhang et.al. | 2512.15000 | null |
| 2025-12-17 | Evaluating Large Language Models on Multimodal Chemistry Olympiad Exams | Yiming Cui et.al. | 2512.14989 | null |
| 2025-12-16 | EVICPRESS: Joint KV-Cache Compression and Eviction for Efficient LLM Serving | Shaoting Feng et.al. | 2512.14946 | null |
| 2025-12-16 | Parameter Efficient Multimodal Instruction Tuning for Romanian Vision Language Models | George-Andrei Dima et.al. | 2512.14926 | null |
| 2025-12-16 | Multiscale Aggregated Hierarchical Attention (MAHA): A Game Theoretic and Optimization Driven Approach to Efficient Contextual Modeling in Large Language Models | Caner Erden et.al. | 2512.14925 | null |
| 2025-12-16 | Evaluating Code Reasoning Abilities of Large Language Models Under Real-World Settings | Changshu Liu et.al. | 2512.14917 | null |
| 2025-12-16 | DrugRAG: Enhancing Pharmacy LLM Performance Through A Novel Retrieval-Augmented Generation Pipeline | Houman Kazemzadeh et.al. | 2512.14896 | null |
| 2025-12-16 | Integrating Large Language Models and Knowledge Graphs to Capture Political Viewpoints in News Media | Massimiliano Fadda et.al. | 2512.14887 | null |
| 2025-12-16 | Entropy-Reservoir Bregman Projection: An Information-Geometric Unification of Model Collapse | Jingwei Chen et.al. | 2512.14879 | null |
| 2025-12-16 | Isolated Sign Language Recognition with Segmentation and Pose Estimation | Daniel Perkins et.al. | 2512.14876 | null |
| 2025-12-16 | HERBench: A Benchmark for Multi-Evidence Integration in Video Question Answering | Dan Ben-Ami et.al. | 2512.14870 | null |
| 2025-12-16 | MALCDF: A Distributed Multi-Agent LLM Framework for Real-Time Cyber | Arth Bhardwaj et.al. | 2512.14846 | null |
| 2025-12-16 | Sharing State Between Prompts and Programs | Ellie Y. Cheng et.al. | 2512.14805 | null |
| 2025-12-16 | Incentives or Ontology? A Structural Rebuttal to OpenAI’s Hallucination Thesis | Richard Ackermann et.al. | 2512.14801 | null |
| 2025-12-16 | IaC Generation with LLMs: An Error Taxonomy and A Study on Configuration Knowledge Injection | Roman Nekrasov et.al. | 2512.14792 | null |
| 2025-12-16 | TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs | Jun Zhang et.al. | 2512.14698 | null |
| 2025-12-16 | Fast and Accurate Causal Parallel Decoding using Jacobi Forcing | Lanxiang Hu et.al. | 2512.14681 | null |
| 2025-12-16 | EVOLVE-VLA: Test-Time Training from Environment Feedback for Vision-Language-Action Models | Zechen Bai et.al. | 2512.14666 | null |
| 2025-12-16 | Enhancing Visual Sentiment Analysis via Semiotic Isotopy-Guided Dataset Construction | Marco Blanchini et.al. | 2512.14665 | null |
| 2025-12-16 | Focus: A Streaming Concentration Architecture for Efficient Vision-Language Models | Chiyue Wei et.al. | 2512.14661 | null |
| 2025-12-16 | Beyond Text-to-SQL: Autonomous Research-Driven Database Exploration with DAR | Ostap Vykhopen et.al. | 2512.14622 | null |
| 2025-12-16 | PerProb: Indirectly Evaluating Memorization in Large Language Models | Yihan Liao et.al. | 2512.14600 | null |
| 2025-12-16 | LLM-driven Knowledge Enhancement for Multimodal Cancer Survival Prediction | Chenyu Zhao et.al. | 2512.14594 | null |
| 2025-12-16 | Towards Nepali-language LLMs: Efficient GPT training with a Nepali BPE tokenizer | Adarsha Shrestha et.al. | 2512.14585 | null |
| 2025-12-16 | Pairwise Comparison for Bias Identification and Quantification | Fabian Haak et.al. | 2512.14565 | null |
| 2025-12-16 | Polypersona: Persona-Grounded LLM for Synthetic Survey Responses | Tejaswani Dash et.al. | 2512.14562 | null |
| 2025-12-16 | Agreement Between Large Language Models and Human Raters in Essay Scoring: A Research Synthesis | Hongli Li et.al. | 2512.14561 | null |
| 2025-12-16 | CLNet: Cross-View Correspondence Makes a Stronger Geo-Localizationer | Xianwei Cao et.al. | 2512.14560 | null |
| 2025-12-16 | VLegal-Bench: Cognitively Grounded Benchmark for Vietnamese Legal Reasoning of Large Language Models | Nguyen Tien Dong et.al. | 2512.14554 | null |
| 2025-12-16 | VersatileFFN: Achieving Parameter Efficiency in LLMs via Adaptive Wide-and-Deep Reuse | Ying Nie et.al. | 2512.14531 | null |
| 2025-12-16 | RecGPT-V2 Technical Report | Chao Yi et.al. | 2512.14503 | null |
| 2025-12-16 | C-ing Clearly: Enhanced Binary Code Explanations using C code | Teodor Poncu et.al. | 2512.14500 | null |
| 2025-12-16 | SASQ: Static Activation Scaling for Quantization-Aware Training in Large Language Models | Shizhuo Mao et.al. | 2512.14481 | null |
| 2025-12-16 | Model-First Reasoning LLM Agents: Reducing Hallucinations through Explicit Problem Modeling | Annu Rana et.al. | 2512.14474 | null |
| 2025-12-16 | Reasoning-Style Poisoning of LLM Agents via Stealthy Style Transfer: Process-Level Attacks and Runtime Monitoring in RSV Space | Xingfu Zhou et.al. | 2512.14448 | null |
| 2025-12-16 | Seismology modeling agent: A smart assistant for geophysical researchers | Yukun Ren et.al. | 2512.14429 | null |
| 2025-12-16 | Effect of Document Packing on the Latent Multi-Hop Reasoning Capabilities of Large Language Models | Gabriele Prato et.al. | 2512.14427 | null |
| 2025-12-16 | DISCODE: Distribution-Aware Score Decoder for Robust Automatic Evaluation of Image Captioning | Nakamasa Inoue et.al. | 2512.14420 | null |
| 2025-12-16 | PortAgent: LLM-driven Vehicle Dispatching Agent for Port Terminals | Jia Hu et.al. | 2512.14417 | null |
| 2025-12-16 | Massive Editing for Large Language Models Based on Dynamic Weight Generation | Wentao Wan et.al. | 2512.14395 | null |
| 2025-12-16 | RePo: Language Models with Context Re-Positioning | Huayang Li et.al. | 2512.14391 | null |
| 2025-12-16 | Multi-Agent Medical Decision Consensus Matrix System: An Intelligent Collaborative Framework for Oncology MDT Consultations | Xudong Han et.al. | 2512.14321 | null |
| 2025-12-16 | Semantic Mismatch and Perceptual Degradation: A New Perspective on Image Editing Immunity | Shuai Dong et.al. | 2512.14320 | null |
| 2025-12-16 | Inflation Attitudes of Large Language Models | Nikoleta Anesti et.al. | 2512.14306 | null |
| 2025-12-16 | Leveraging LLMs for Collaborative Ontology Engineering in Parkinson Disease Monitoring and Alerting | Georgios Bouchouras et.al. | 2512.14288 | null |
| 2025-12-16 | The Trust in AI-Generated Health Advice (TAIGHA) Scale and Short Version (TAIGHA-S): Development and Validation Study | Marvin Kopka et.al. | 2512.14278 | null |
| 2025-12-16 | SPARQL-LLM: Real-Time SPARQL Query Generation from Natural Language Questions | Panayiotis Smeros et.al. | 2512.14277 | null |
| 2025-12-16 | Enhancing Visual Programming for Visual Reasoning via Probabilistic Graphs | Wentao Wan et.al. | 2512.14257 | null |
| 2025-12-16 | TEMP: A Memory Efficient Physical-aware Tensor Partition-Mapping Framework on Wafer-scale Chips | Huizheng Wang et.al. | 2512.14256 | null |
| 2025-12-16 | From Context to EDUs: Faithful and Structured Context Compression via Elementary Discourse Unit Decomposition | Yiqing Zhou et.al. | 2512.14244 | null |
| 2025-12-16 | Two CFG Nahuatl for automatic corpora expansion | Juan-José Guzmán-Landa et.al. | 2512.14239 | null |
| 2025-12-16 | Ladder Up, Memory Down: Low-Cost Fine-Tuning With Side Nets | Estelle Zheng et.al. | 2512.14237 | null |
| 2025-12-16 | PentestEval: Benchmarking LLM-based Penetration Testing with Modular and Stage-Level Design | Ruozhao Yang et.al. | 2512.14233 | null |
| 2025-12-16 | Georeferencing complex relative locality descriptions with large language models | Aneesha Fernando et.al. | 2512.14228 | null |
| 2025-12-16 | Estimating problem difficulty without ground truth using Large Language Model comparisons | Marthe Ballon et.al. | 2512.14220 | null |
| 2025-12-16 | IntentMiner: Intent Inversion Attack via Tool Call Analysis in the Model Context Protocol | Yunhao Yao et.al. | 2512.14166 | null |
| 2025-12-16 | Adaptive Cache Pollution Control for Large Language Model Inference Workloads Using Temporal CNN-Based Prediction and Priority-Aware Replacement | Songze Liu et.al. | 2512.14151 | null |
| 2025-12-16 | Astraea: A State-Aware Scheduling Engine for LLM-Powered Agents | Hongqiu Ni et.al. | 2512.14142 | null |
| 2025-12-16 | TorchTraceAP: A New Benchmark Dataset for Detecting Performance Anti-Patterns in Computer Vision Models | Hanning Chen et.al. | 2512.14141 | null |
| 2025-12-16 | LAPPI: Interactive Optimization with LLM-Assisted Preference-Based Problem Instantiation | So Kuroki et.al. | 2512.14138 | null |
| 2025-12-16 | SportsGPT: An LLM-driven Framework for Interpretable Sports Motion Assessment and Training Guidance | Wenbo Tian et.al. | 2512.14121 | null |
| 2025-12-16 | CogMem: A Cognitive Memory Architecture for Sustained Multi-Turn Reasoning in Large Language Models | Yiran Zhang et.al. | 2512.14118 | null |
| 2025-12-16 | Neurosymbolic Inference On Foundation Models For Remote Sensing Text-to-image Retrieval With Complex Queries | Emanuele Mezzi et.al. | 2512.14102 | null |
| 2025-12-16 | A First-Order Logic-Based Alternative to Reward Models in RLHF | Chunjin Jian et.al. | 2512.14100 | null |
| 2025-12-16 | Cornserve: Efficiently Serving Any-to-Any Multimodal Models | Jeff J. Ma et.al. | 2512.14098 | null |
| 2025-12-16 | A Unified Sparse Attention via Multi-Granularity Compression | Siran Liu et.al. | 2512.14082 | null |
| 2025-12-16 | From Obfuscated to Obvious: A Comprehensive JavaScript Deobfuscation Tool for Security Analysis | Dongchao Zhou et.al. | 2512.14070 | null |
| 2025-12-16 | RADAR: Accelerating Large Language Model Inference With RL-Based Dynamic Draft Trees | Junjie Ma et.al. | 2512.14069 | null |
| 2025-12-16 | What Affects the Effective Depth of Large Language Models? | Yi Hu et.al. | 2512.14064 | null |
| 2025-12-16 | HyperVL: An Efficient and Dynamic Multimodal Large Language Model for Edge Devices | HyperAI Team et.al. | 2512.14052 | null |
| 2025-12-16 | OpenDataArena: A Fair and Open Arena for Benchmarking Post-Training Dataset Value | Mengzhang Cai et.al. | 2512.14051 | null |
| 2025-12-16 | Intention Chain-of-Thought Prompting with Dynamic Routing for Code Generation | Shen Li et.al. | 2512.14048 | null |
| 2025-12-16 | Evaluating Small Language Models for Agentic On-Farm Decision Support Systems | Enhong Liu et.al. | 2512.14043 | null |
| 2025-12-16 | ChartAgent: A Chart Understanding Framework with Tool Integrated Reasoning | Boran Wang et.al. | 2512.14040 | null |
| 2025-12-16 | PerfCoder: Large Language Models for Interpretable Code Performance Optimization | Jiuding Yang et.al. | 2512.14018 | null |
| 2025-12-16 | KFS-Bench: Comprehensive Evaluation of Key Frame Sampling in Long Video Understanding | Zongyao Li et.al. | 2512.14017 | null |
| 2025-12-16 | Sparsity-Controllable Dynamic Top-p MoE for Large Foundation Model Pre-training | Can Jin et.al. | 2512.13996 | null |
| 2025-12-16 | Structure-Aware Decoding Mechanisms for Complex Entity Extraction with Large-Scale Language Models | Zhimin Qiu et.al. | 2512.13980 | null |
| 2025-12-16 | ReflCtrl: Controlling LLM Reflection via Representation Engineering | Ge Yan et.al. | 2512.13979 | null |
| 2025-12-16 | Evaluating Frontier LLMs on PhD-Level Mathematical Reasoning: A Benchmark on a Textbook in Theoretical Computer Science about Randomized Algorithms | Yang Cao et.al. | 2512.13978 | null |
| 2025-12-16 | Autonomous Construction-Site Safety Inspection Using Mobile Robots: A Multilayer VLM-LLM Pipeline | Hossein Naderi et.al. | 2512.13974 | null |
| 2025-12-15 | Informing Acquisition Functions via Foundation Models for Molecular Discovery | Qi Chen et.al. | 2512.13935 | null |
| 2025-12-15 | Hierarchical Multi-agent Large Language Model Reasoning for Autonomous Functional Materials Discovery | Samuel Rothfarb et.al. | 2512.13930 | null |
| 2025-12-15 | Context Branching for LLM Conversations: A Version Control Approach to Exploratory Programming | Bhargav Chickmagalur Nanjundappa et.al. | 2512.13914 | null |
| 2025-12-15 | FiNERweb: Datasets and Artifacts for Scalable Multilingual Named Entity Recognition | Jonas Golde et.al. | 2512.13884 | null |
| 2025-12-15 | Verification-Guided Context Optimization for Tool Calling via Hierarchical LLMs-as-Editors | Henger Li et.al. | 2512.13860 | null |
| 2025-12-15 | EvoLattice: Persistent Internal-Population Evolution through Multi-Alternative Quality-Diversity Graph Representations for LLM-Guided Program Discovery | Kamer Ali Yuksel et.al. | 2512.13857 | null |
| 2025-12-15 | Practitioner Insights on Fairness Requirements in the AI Development Life Cycle: An Interview Study | Chaima Boufaied et.al. | 2512.13830 | null |
| 2025-12-15 | The Double Life of Code World Models: Provably Unmasking Malicious Behavior Through Execution Traces | Subramanyam Sahoo et.al. | 2512.13821 | null |
| 2025-12-15 | State-Dependent Refusal and Learned Incapacity in RLHF-Aligned Language Models | TK Lee et.al. | 2512.13762 | null |
| 2025-12-15 | A Scientific Reasoning Model for Organic Synthesis Procedure Generation | Guoqing Liu et.al. | 2512.13668 | null |
| 2025-12-15 | Embedding-Based Rankings of Educational Resources based on Learning Outcome Alignment: Benchmarking, Expert Validation, and Learner Performance | Mohammadreza Molavi et.al. | 2512.13658 | null |
| 2025-12-15 | Comparative Analysis of LLM Abliteration Methods: A Cross-Architecture Evaluation | Richard J. Young et.al. | 2512.13655 | null |
| 2025-12-15 | Large-Language Memorization During the Classification of United States Supreme Court Cases | John E. Ortega et.al. | 2512.13654 | null |
| 2025-12-15 | MindDrive: A Vision-Language-Action Model for Autonomous Driving via Online Reinforcement Learning | Haoyu Fu et.al. | 2512.13636 | null |
| 2025-12-15 | Temporal Tokenization Strategies for Event Sequence Modeling with Large Language Models | Zefang Liu et.al. | 2512.13618 | null |
| 2025-12-15 | Textual Gradients are a Flawed Metaphor for Automatic Prompt Optimization | Daniel Melcer et.al. | 2512.13598 | null |
| 2025-12-15 | ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding | Jia-Nan Li et.al. | 2512.13586 | null |
| 2025-12-15 | MMhops-R1: Multimodal Multi-hop Reasoning | Tao Zhang et.al. | 2512.13573 | null |
| 2025-12-15 | PrahokBART: A Pre-trained Sequence-to-Sequence Model for Khmer Natural Language Generation | Hour Kaing et.al. | 2512.13552 | null |
| 2025-12-15 | Fine-tuned LLM-based Code Migration Framework | Oleg Grynets et.al. | 2512.13515 | null |
| 2025-12-15 | MedCEG: Reinforcing Verifiable Medical Reasoning with Critical Evidence Graph | Linjie Mu et.al. | 2512.13510 | null |
| 2025-12-15 | SkipCat: Rank-Maximized Low-Rank Compression of Large Language Models via Shared Projection and Block Skipping | Yu-Chen Lu et.al. | 2512.13494 | null |
| 2025-12-15 | From Zipf’s Law to Neural Scaling through Heaps’ Law and Hilberg’s Hypothesis | Łukasz Dębowski et.al. | 2512.13491 | null |
| 2025-12-15 | neuralFOMO: Can LLMs Handle Being Second Best? Measuring Envy-Like Preferences in Multi-Agent Settings | Ojas Pungalia et.al. | 2512.13481 | null |
| 2025-12-15 | Non-Resolution Reasoning (NRR): A Computational Framework for Contextual Identity and Ambiguity Preservation | Kei Saito et.al. | 2512.13478 | null |
| 2025-12-15 | Scaling Laws for Code: Every Programming Language Matters | Jian Yang et.al. | 2512.13472 | null |
| 2025-12-15 | Large language models are not about natural language | Johan J. Bolhuis et.al. | 2512.13441 | null |
| 2025-12-15 | From User Interface to Agent Interface: Efficiency Optimization of UI Representations for LLM Agents | Dezhi Ran et.al. | 2512.13438 | null |
| 2025-12-15 | Behavior and Representation in Large Language Models for Combinatorial Optimization: From Feature Extraction to Algorithm Selection | Francesca Da Ros et.al. | 2512.13374 | null |
| 2025-12-15 | Detecting Emotion Drift in Mental Health Text Using Pre-Trained Transformers | Shibani Sankpal et.al. | 2512.13363 | null |
| 2025-12-15 | UCRBench: Benchmarking LLMs on Use Case Recovery | Shuyuan Xiao et.al. | 2512.13360 | null |
| 2025-12-15 | On the Effectiveness of Membership Inference in Targeted Data Extraction from Large Language Models | Ali Al Sahili et.al. | 2512.13352 | null |
| 2025-12-15 | FROC: A Unified Framework with Risk-Optimized Control for Machine Unlearning in LLMs | Si Qi Goh et.al. | 2512.13337 | null |
| 2025-12-15 | FIN-bench-v2: A Unified and Robust Benchmark Suite for Evaluating Finnish Large Language Models | Joona Kytöniemi et.al. | 2512.13330 | null |
| 2025-12-15 | Security and Detectability Analysis of Unicode Text Watermarking Methods Against Large Language Models | Malte Hellmeier et.al. | 2512.13325 | null |
| 2025-12-15 | KlingAvatar 2.0 Technical Report | Kling Team et.al. | 2512.13313 | null |
| 2025-12-15 | MiniLingua: A Small Open-Source LLM for European Languages | Anna Aksenova et.al. | 2512.13298 | null |
| 2025-12-15 | AutoTool: Dynamic Tool Selection and Integration for Agentic Reasoning | Jiaru Zou et.al. | 2512.13278 | null |
| 2025-12-15 | CogniEdit: Dense Gradient Flow Optimization for Fine-Grained Image Editing | Yan Li et.al. | 2512.13276 | null |
| 2025-12-15 | Toward Ambulatory Vision: Learning Visually-Grounded Active View Selection | Juil Koo et.al. | 2512.13250 | null |
| 2025-12-15 | Ego-EXTRA: video-language Egocentric Dataset for EXpert-TRAinee assistance | Francesco Ragusa et.al. | 2512.13238 | null |
| 2025-12-15 | Efficient Adaptive Rejection Sampling for Accelerating Speculative Decoding in Large Language Models | Chendong Sun et.al. | 2512.13194 | null |
| 2025-12-15 | Integrated Semantic and Temporal Alignment for Interactive Video Retrieval | Thanh-Danh Luu et.al. | 2512.13169 | null |
| 2025-12-15 | A Semantically Enhanced Generative Foundation Model Improves Pathological Image Synthesis | Xianchao Guan et.al. | 2512.13164 | null |
| 2025-12-15 | Can AI Understand What We Cannot Say? Measuring Multilevel Alignment Through Abortion Stigma Across Cognitive, Interpersonal, and Structural Levels | Anika Sharma et.al. | 2512.13142 | null |
| 2025-12-15 | Uncovering the Role of Initial Saliency in U-Shaped Attention Bias: Scaling Initial Token Weight for Enhanced Long-Text Processing | Zewen Qiang et.al. | 2512.13109 | null |
| 2025-12-15 | Socratic Students: Teaching Language Models to Learn by Asking Questions | Rajeev Bhatt Ambati et.al. | 2512.13102 | null |
| 2025-12-15 | A Simple and Effective Framework for Symmetric Consistent Indexing in Large-Scale Dense Retrieval | Huimu Wang et.al. | 2512.13074 | null |
| 2025-12-15 | M-GRPO: Stabilizing Self-Supervised Reinforcement Learning for Large Language Models with Momentum-Anchored Policy Optimization | Bizhe Bai et.al. | 2512.13070 | null |
| 2025-12-15 | LLM Rationalis? Measuring Bargaining Capabilities of AI Negotiators | Cheril Shah et.al. | 2512.13063 | null |
| 2025-12-15 | An Open and Reproducible Deep Research Agent for Long-Form Question Answering | Ikuya Yamada et.al. | 2512.13059 | null |
| 2025-12-15 | Sharpen the Spec, Cut the Code: A Case for Generative File System with SYSSPEC | Qingyuan Liu et.al. | 2512.13047 | null |
| 2025-12-15 | Understanding Structured Financial Data with LLMs: A Case Study on Fraud Detection | Xuwei Tan et.al. | 2512.13040 | null |
| 2025-12-15 | Large Language Models for Power System Applications: A Comprehensive Literature Survey | Muhammad Sarwar et.al. | 2512.13004 | null |
| 2025-12-15 | Are Large Language Models Really Effective for Training-Free Cold-Start Recommendation? | Genki Kusano et.al. | 2512.13001 | null |
| 2025-12-15 | Reveal Hidden Pitfalls and Navigate Next Generation of Vector Similarity Search from Task-Centric Views | Tingyang Chen et.al. | 2512.12980 | null |
| 2025-12-15 | Do Reviews Matter for Recommendations in the Era of Large Language Models? | Chee Heng Tan et.al. | 2512.12978 | null |
| 2025-12-15 | Authors Should Annotate | Marcus Ma et.al. | 2512.12976 | null |
| 2025-12-15 | Database Research needs an Abstract Relational Query Language | Wolfgang Gatterbauer et.al. | 2512.12957 | null |
| 2025-12-15 | Building from Scratch: A Multi-Agent Framework with Human-in-the-Loop for Multilingual Legal Terminology Mapping | Lingyi Meng et.al. | 2512.12950 | null |
| 2025-12-15 | SPAR: Session-based Pipeline for Adaptive Retrieval on Legacy File Systems | Duy A. Nguyen et.al. | 2512.12938 | null |
| 2025-12-15 | PROSERVE: Unified Multi-Priority Request Scheduling for LLM Serving | Weizhe Huang et.al. | 2512.12928 | null |
| 2025-12-15 | Interpretable Hypothesis-Driven Trading: A Rigorous Walk-Forward Validation Framework for Market Microstructure Signals | Gagan Deep et.al. | 2512.12924 | null |
| 2025-12-15 | LLM-based Personalized Portfolio Recommender: Integrating Large Language Models and Reinforcement Learning for Intelligent Investment Strategy Optimization | Bangyu Li et.al. | 2512.12922 | null |
| 2025-12-15 | Cisco Integrated AI Security and Safety Framework Report | Amy Chang et.al. | 2512.12921 | null |
| 2025-12-15 | CTIGuardian: A Few-Shot Framework for Mitigating Privacy Leakage in Fine-Tuned LLMs | Shashie Dilhara Batan Arachchige et.al. | 2512.12914 | null |
| 2025-12-14 | SignRAG: A Retrieval-Augmented System for Scalable Zero-Shot Road Sign Recognition | Minghao Zhu et.al. | 2512.12885 | null |
| 2025-12-14 | ERA-IT: Aligning Semantic Models with Revealed Economic Preference for Real-Time and Explainable Patent Valuation | Yoo Yongmin et.al. | 2512.12869 | null |
| 2025-12-14 | Counting Clues: A Lightweight Probabilistic Baseline Can Match an LLM | Furong Jia et.al. | 2512.12868 | null |
| 2025-12-14 | Information-Consistent Language Model Recommendations through Group Relative Policy Optimization | Sonal Prabhune et.al. | 2512.12858 | null |
| 2025-12-14 | Does Tone Change the Answer? Evaluating Prompt Politeness Effects on Modern LLMs: GPT, Gemini, LLaMA | Hanyu Cai et.al. | 2512.12812 | null |
| 2025-12-14 | Fault-Tolerant Sandboxing for AI Coding Agents: A Transactional Approach to Safe Autonomous Execution | Boyang Yan et.al. | 2512.12806 | null |
| 2025-12-14 | A Disproof of Large Language Model Consciousness: The Necessity of Continual Learning for Consciousness | Erik Hoel et.al. | 2512.12802 | null |
| 2025-12-14 | Fine-Grained Energy Prediction For Parallellized LLM Inference With PIE-P | Anurag Dutt et.al. | 2512.12801 | null |
| 2025-12-14 | DrivePI: Spatial-aware 4D MLLM for Unified Autonomous Driving Understanding, Perception, Prediction and Planning | Zhe Liu et.al. | 2512.12799 | null |
| 2025-12-14 | A Rule-Aware Prompt Framework for Structured Numeric Reasoning in Cyber-Physical Systems | Yichen Liu et.al. | 2512.12794 | null |
| 2025-12-14 | Beyond Task Completion: An Assessment Framework for Evaluating Agentic AI Systems | Sreemaee Akshathala et.al. | 2512.12791 | null |
| 2025-12-14 | State over Tokens: Characterizing the Role of Reasoning Tokens | Mosh Levy et.al. | 2512.12777 | null |
| 2025-12-14 | Persistent Personas? Role-Playing, Instruction Following, and Safety in Extended Interactions | Pedro Henrique Luz de Araujo et.al. | 2512.12775 | null |
| 2025-12-14 | JointAVBench: A Benchmark for Joint Audio-Visual Reasoning Evaluation | Jianghan Chao et.al. | 2512.12772 | null |
| 2025-12-14 | Adaptive Edge-Cloud Inference for Speech-to-Action Systems Using ASR and Large Language Models (ASTA) | Mohammad Jalili Torkamani et.al. | 2512.12769 | null |
| 2025-12-14 | Intelligent Scientific Literature Explorer using Machine Learning (ISLE) | Sina Jani et.al. | 2512.12760 | null |
| 2025-12-14 | FysicsWorld: A Unified Full-Modality Benchmark for Any-to-Any Understanding, Generation, and Reasoning | Yue Jiang et.al. | 2512.12756 | null |
| 2025-12-14 | Resting Neurons, Active Insights: Improving Input Sparsification for Large Language Models | Haotian Xu et.al. | 2512.12744 | null |
| 2025-12-14 | CoDA: A Context-Decoupled Hierarchical Agent with Reinforcement Learning | Xuanzhang Liu et.al. | 2512.12716 | null |
| 2025-12-14 | Synergizing Code Coverage and Gameplay Intent: Coverage-Aware Game Playtesting with LLM-Guided Reinforcement Learning | Enhong Mu et.al. | 2512.12706 | null |
| 2025-12-14 | Hybrid Retrieval-Augmented Generation for Robust Multilingual Document Question Answering | Anthony Mudet et.al. | 2512.12694 | null |
| 2025-12-14 | Memoria: A Scalable Agentic Memory Framework for Personalized Conversational AI | Samarth Sarin et.al. | 2512.12686 | null |
| 2025-12-14 | Fine-Tuning Causal LLMs for Text Classification: Embedding-Based vs. Instruction-Based Approaches | Amirhossein Yousefiramandi et.al. | 2512.12677 | null |
| 2025-12-14 | LexRel: Benchmarking Legal Relation Extraction for Chinese Civil Cases | Yida Cai et.al. | 2512.12643 | null |
| 2025-12-14 | DiG: Differential Grounding for Enhancing Fine-Grained Perception in Multimodal Large Language Model | Zhou Tao et.al. | 2512.12633 | null |
| 2025-12-14 | ORIBA: Exploring LLM-Driven Role-Play Chatbot as a Creativity Support Tool for Original Character Artists | Yuqian Sun et.al. | 2512.12630 | null |
| 2025-12-14 | Reasoning Within the Mind: Dynamic Multimodal Interleaving in Latent Space | Chengzhi Liu et.al. | 2512.12623 | null |
| 2025-12-14 | Understanding Syllogistic Reasoning in LLMs from Formal and Natural Language Perspectives | Aheli Poddar et.al. | 2512.12620 | null |
| 2025-12-14 | Patch-wise Retrieval: A Bag of Practical Techniques for Instance-level Matching | Wonseok Choi et.al. | 2512.12610 | null |
| 2025-12-14 | Human-Inspired Learning for Large Language Models via Obvious Record and Maximum-Entropy Method Discovery | Hong Su et.al. | 2512.12608 | null |
| 2025-12-14 | Vision-Enhanced Large Language Models for High-Resolution Image Synthesis and Multimodal Data Interpretation | Karthikeya KV et.al. | 2512.12595 | null |
| 2025-12-14 | Beyond Static Scoring: Enhancing Assessment Validity via AI-Generated Interactive Verification | Tom Lee et.al. | 2512.12592 | null |
| 2025-12-14 | StreamingAssistant: Efficient Visual Token Pruning for Accelerating Online Video Understanding | Xinqi Jin et.al. | 2512.12560 | null |
| 2025-12-14 | Large Language Newsvendor: Decision Biases and Cognitive Mechanisms | Jifei Liu et.al. | 2512.12552 | null |
| 2025-12-14 | HyperEdit: Unlocking Instruction-based Text Editing in LLMs via Hypernetworks | Yiming Zeng et.al. | 2512.12544 | null |
| 2025-12-14 | NagaNLP: Bootstrapping NLP for Low-Resource Nagamese Creole with Human-in-the-Loop Synthetic Data | Agniva Maiti et.al. | 2512.12537 | null |
| 2025-12-14 | Diverse LLMs vs. Vulnerabilities: Who Detects and Fixes Them Better? | Arastoo Zibaeirad et.al. | 2512.12536 | null |
| 2025-12-14 | ATLAS: Automated Tree-based Language Analysis System for C and C++ source programs | Jaid Monwar Chowdhury et.al. | 2512.12507 | null |
| 2025-12-14 | KidsArtBench: Multi-Dimensional Children’s Art Evaluation with Attribute-Aware MLLMs | Mingrui Ye et.al. | 2512.12503 | null |
| 2025-12-14 | Explainable AI as a Double-Edged Sword in Dermatology: The Impact on Clinicians versus The Public | Xuhai Xu et.al. | 2512.12500 | null |
| 2025-12-13 | The American Ghost in the Machine: How language models align culturally and the effects of cultural prompting | James Luther et.al. | 2512.12488 | null |
| 2025-12-13 | HetRL: Efficient Reinforcement Learning for LLMs in Heterogeneous Environments | Yongjun He et.al. | 2512.12476 | null |
| 2025-12-13 | Large language models have learned to use language | Gary Lupyan et.al. | 2512.12447 | null |
| 2025-12-13 | Can GPT replace human raters? Validity and reliability of machine-generated norms for metaphors | Veronica Mangiaterra et.al. | 2512.12444 | null |
| 2025-12-11 | Towards Efficient and Effective Multi-Camera Encoding for End-to-End Driving | Jiawei Yang et.al. | 2512.10947 | null |
| 2025-12-11 | FoundationMotion: Auto-Labeling and Reasoning about Spatial Movement in Videos | Yulu Gan et.al. | 2512.10927 | null |
| 2025-12-11 | SparseSwaps: Tractable LLM Pruning Mask Refinement at Scale | Max Zimmer et.al. | 2512.10922 | null |
| 2025-12-11 | CompanionCast: A Multi-Agent Conversational AI Framework with Spatial Audio for Social Co-Viewing Experiences | Yiyang Wang et.al. | 2512.10918 | null |
| 2025-12-11 | Multi-Granular Node Pruning for Circuit Discovery | Muhammad Umair Haider et.al. | 2512.10903 | null |
| 2025-12-11 | LLMs Can Assist with Proposal Selection at Large User Facilities | Lijie Ding et.al. | 2512.10895 | null |
| 2025-12-11 | Computational emotion analysis with multimodal LLMs: Current evidence on an emerging methodological opportunity | Hauke Licht et.al. | 2512.10882 | null |
| 2025-12-11 | Quantifying Emotional Tone in Tolkien’s The Hobbit: Dialogue Sentiment Analysis with RegEx, NRC-VAD, and Python | Lilin Qiu et.al. | 2512.10865 | null |
| 2025-12-11 | Large Language Models for Superconductor Discovery | Suman Itani et.al. | 2512.10847 | null |
| 2025-12-11 | LabelFusion: Learning to Fuse LLMs and Transformer Classifiers for Robust Text Classification | Michael Schlee et.al. | 2512.10793 | null |
| 2025-12-11 | The FACTS Leaderboard: A Comprehensive Benchmark for Large Language Model Factuality | Aileen Cheng et.al. | 2512.10791 | null |
| 2025-12-11 | Natural Language Interface for Firewall Configuration | F. Taghiyev et.al. | 2512.10789 | null |
| 2025-12-11 | Developing and Evaluating a Large Language Model-Based Automated Feedback System Grounded in Evidence-Centered Design for Supporting Physics Problem Solving | Holger Maus et.al. | 2512.10785 | null |
| 2025-12-11 | Script Gap: Evaluating LLM Triage on Indian Languages in Native vs Roman Scripts in a Real World Setting | Manurag Khullar et.al. | 2512.10780 | null |
| 2025-12-11 | OPV: Outcome-based Process Verifier for Efficient Long Chain-of-Thought Verification | Zijian Wu et.al. | 2512.10756 | null |
| 2025-12-11 | LDP: Parameter-Efficient Fine-Tuning of Multimodal LLM for Medical Report Generation | Tianyu Zhou et.al. | 2512.10750 | null |
| 2025-12-11 | Echoes of Automation: How Bots Shaped Political Discourse in Brazil | Merve Ipek Bal et.al. | 2512.10749 | null |
| 2025-12-11 | TRIDENT: A Redundant Architecture for Caribbean-Accented Emergency Speech Triage | Elroy Galbraith et.al. | 2512.10741 | null |
| 2025-12-11 | Long-horizon Reasoning Agent for Olympiad-Level Mathematical Problem Solving | Songyang Gao et.al. | 2512.10739 | null |
| 2025-12-11 | Textual Data Bias Detection and Mitigation - An Extensible Pipeline with Experimental Evaluation | Rebekka Görge et.al. | 2512.10734 | null |
| 2025-12-11 | IRG-MotionLLM: Interleaving Motion Generation, Assessment and Refinement for Text-to-Motion Generation | Yuan-Ming Li et.al. | 2512.10730 | link |
| 2025-12-11 | Beyond the Black Box: Identifiable Interpretation and Control in Generative Models via Causal Minimality | Lingjing Kong et.al. | 2512.10720 | null |
| 2025-12-11 | PACIFIC: a framework for generating benchmarks to check Precise Automatically Checked Instruction Following In Code | Itay Dreyfuss et.al. | 2512.10713 | null |
| 2025-12-11 | COMPARE: Clinical Optimization with Modular Planning and Assessment via RAG-Enhanced AI-OCT: Superior Decision Support for Percutaneous Coronary Intervention Compared to ChatGPT-5 and Junior Operators | Wei Fang et.al. | 2512.10702 | null |
| 2025-12-11 | Remember Me, Refine Me: A Dynamic Procedural Memory Framework for Experience-Driven Agent Evolution | Zouying Cao et.al. | 2512.10696 | null |
| 2025-12-11 | Challenges of Evaluating LLM Safety for User Welfare | Manon Kempermann et.al. | 2512.10687 | null |
| 2025-12-11 | On the Dynamics of Multi-Agent LLM Communities Driven by Value Diversity | Muhua Huang et.al. | 2512.10665 | null |
| 2025-12-11 | Token Sample Complexity of Attention | Léa Bohbot et.al. | 2512.10656 | null |
| 2025-12-11 | TriDF: Evaluating Perception, Detection, and Hallucination for Interpretable DeepFake Detection | Jian-Yu Jiang-Lin et.al. | 2512.10652 | null |
| 2025-12-11 | From Data Scarcity to Data Care: Reimagining Language Technologies for Serbian and other Low-Resource Languages | Smiljana Antonijevic Ubois et.al. | 2512.10630 | null |
| 2025-12-11 | AgriGPT-Omni: A Unified Speech-Vision-Text Framework for Multilingual Agricultural Intelligence | Bo Yang et.al. | 2512.10624 | null |
| 2025-12-11 | Phythesis: Physics-Guided Evolutionary Scene Synthesis for Energy-Efficient Data Center Design via LLMs | Minghao LI et.al. | 2512.10611 | null |
| 2025-12-11 | Multi-Objective Reward and Preference Optimization: Theory and Algorithms | Akhil Agnihotri et.al. | 2512.10601 | null |
| 2025-12-11 | Beyond Pixels: A Training-Free, Text-to-Text Framework for Remote Sensing Image Retrieval | J. Xiao et.al. | 2512.10596 | null |
| 2025-12-11 | RoleRMBench & RoleRM: Towards Reward Modeling for Profile-Based Role Play in Dialogue Systems | Hang Ding et.al. | 2512.10575 | null |
| 2025-12-11 | NormCode: A Semi-Formal Language for Context-Isolated AI Planning | Xin Guan et.al. | 2512.10563 | null |
| 2025-12-11 | Causal Reasoning Favors Encoders: On The Limits of Decoder-Only Models | Amartya Roy et.al. | 2512.10561 | null |
| 2025-12-11 | Grounding Everything in Tokens for Multimodal Large Language Models | Xiangxuan Ren et.al. | 2512.10554 | null |
| 2025-12-11 | LLM-Auction: Generative Auction towards LLM-Native Advertising | Chujie Zhao et.al. | 2512.10551 | null |
| 2025-12-11 | Blink: Dynamic Visual Token Resolution for Enhanced Multimodal Understanding | Yuchen Feng et.al. | 2512.10548 | null |
| 2025-12-11 | Unlocking the Address Book: Dissecting the Sparse Semantic Structure of LLM Key-Value Caches via Sparse Autoencoders | Qingsen Ma et.al. | 2512.10547 | null |
| 2025-12-11 | XDoGE: Multilingual Data Reweighting to Enhance Language Inclusivity in LLMs | Iñaki Lacunza et.al. | 2512.10545 | null |
| 2025-12-11 | Achieving Olympia-Level Geometry Large Language Model Agent via Complexity Boosting Reinforcement Learning | Haiteng Zhao et.al. | 2512.10534 | null |
| 2025-12-11 | Zero-shot 3D Map Generation with LLM Agents: A Dual-Agent Architecture for Procedural Content Generation | Lim Chien Her et.al. | 2512.10501 | null |
| 2025-12-11 | Decoding Human-LLM Collaboration in Coding: An Empirical Study of Multi-Turn Conversations in the Wild | Binquan Zhang et.al. | 2512.10493 | null |
| 2025-12-11 | LLM-Assisted AHP for Explainable Cyber Range Evaluation | Vyron Kampourakis et.al. | 2512.10487 | null |
| 2025-12-11 | From Lab to Reality: A Practical Evaluation of Deep Learning Models and LLMs for Vulnerability Detection | Chaomeng Lu et.al. | 2512.10485 | null |
| 2025-12-11 | Grammaticality Judgments in Humans and Language Models: Revisiting Generative Grammar with LLMs | Lars G. B. Johnsen et.al. | 2512.10453 | null |
| 2025-12-11 | When Reject Turns into Accept: Quantifying the Vulnerability of LLM-Based Scientific Reviewers to Indirect Prompt Injection | Devanshu Sahoo et.al. | 2512.10449 | null |
| 2025-12-11 | Decoding Student Minds: Leveraging Conversational Agents for Psychological and Learning Analysis | Nour El Houda Ben Chaabene et.al. | 2512.10441 | null |
| 2025-12-11 | Enhancing Next-Generation Language Models with Knowledge Graphs: Extending Claude, Mistral IA, and GPT-4 via KG-BERT | Nour El Houda Ben Chaabene et.al. | 2512.10440 | null |
| 2025-12-11 | Semantic Reconstruction of Adversarial Plagiarism: A Context-Aware Framework for Detecting and Restoring “Tortured Phrases” in Scientific Literature | Agniva Maiti et.al. | 2512.10435 | null |
| 2025-12-11 | Cooperative Retrieval-Augmented Generation for Question Answering: Mutual Information Exchange and Ranking by Contrasting Layers | Youmin Ko et.al. | 2512.10422 | null |
| 2025-12-11 | How to Trick Your AI TA: A Systematic Study of Academic Jailbreaking in LLM Code Evaluation | Devanshu Sahoo et.al. | 2512.10415 | null |
| 2025-12-11 | Sliding Window Attention Adaptation | Yijiong Yu et.al. | 2512.10411 | null |
| 2025-12-11 | RoboNeuron: A Modular Framework Linking Foundation Models and ROS for Embodied AI | Weifan Guan et.al. | 2512.10394 | null |
| 2025-12-11 | GPG: Generalized Policy Gradient Theorem for Transformer-based Policies | Hangyu Mao et.al. | 2512.10365 | null |
| 2025-12-11 | Visual Funnel: Resolving Contextual Blindness in Multimodal Large Language Models | Woojun Jung et.al. | 2512.10362 | null |
| 2025-12-11 | Tool-Augmented Spatiotemporal Reasoning for Streamlining Video Question Answering Task | Sunqi Fan et.al. | 2512.10359 | null |
| 2025-12-11 | Dynamics of Agentic Loops in Large Language Models: A Geometric Theory of Trajectories | Nicolas Tacheny et.al. | 2512.10350 | null |
| 2025-12-11 | EchoingPixels: Cross-Modal Adaptive Token Reduction for Efficient Audio-Visual LLMs | Chao Gong et.al. | 2512.10324 | null |
| 2025-12-11 | EpiPlanAgent: Agentic Automated Epidemic Response Planning | Kangkun Mao et.al. | 2512.10313 | null |
| 2025-12-11 | Efficient-VLN: A Training-Efficient Vision-Language Navigation Model | Duo Zheng et.al. | 2512.10310 | null |
| 2025-12-11 | Reverse Thinking Enhances Missing Information Detection in Large Language Models | Yuxin Liu et.al. | 2512.10273 | null |
| 2025-12-11 | VLM-NCD: Novel Class Discovery with Vision-Based Large Language Models | Yuetong Su et.al. | 2512.10262 | null |
| 2025-12-11 | Reject or Not?: A Benchmark for Voice Assistant Query Rejection in Smart Home Scenario and an Improved Method Based on LLMs | Huichao Men et.al. | 2512.10257 | null |
| 2025-12-11 | InFerActive: Towards Scalable Human Evaluation of Large Language Models through Interactive Inference | Junhyeong Hwangbo et.al. | 2512.10234 | null |
| 2025-12-11 | Adaptive Information Routing for Multimodal Time Series Forecasting | Jun Seo et.al. | 2512.10229 | null |
| 2025-12-11 | Does SWE-Bench-Verified Test Agent Ability or Model Memory? | Thanosan Prathifkumar et.al. | 2512.10218 | null |
| 2025-12-11 | CP-Env: Evaluating Large Language Models on Clinical Pathways in a Controllable Hospital Environment | Yakun Zhu et.al. | 2512.10206 | null |
| 2025-12-11 | AutoMedic: An Automated Evaluation Framework for Clinical Conversational Agents with Medical Dataset Grounding | Gyutaek Oh et.al. | 2512.10195 | null |
| 2025-12-11 | CIEGAD: Cluster-Conditioned Interpolative and Extrapolative Framework for Geometry-Aware and Domain-Aligned Data Augmentation | Keito Inoshita et.al. | 2512.10178 | null |
| 2025-12-11 | ATLAS: Automated Toolkit for Large-Scale Verified Code Synthesis | Mantas Baksys et.al. | 2512.10173 | null |
| 2025-12-11 | Offscript: Automated Auditing of Instruction Adherence in LLMs | Nicholas Clark et.al. | 2512.10172 | null |
| 2025-12-10 | Enhancing Large Language Models for End-to-End Circuit Analysis Problem Solving | Liangliang Chen et.al. | 2512.10159 | null |
| 2025-12-10 | Unforgotten Safety: Preserving Safety Alignment of Large Language Models with Continual Learning | Lama Alssum et.al. | 2512.10150 | null |
| 2025-12-10 | PARAN: Persona-Augmented Review ANswering system on Food Delivery Review Dataset | Moonsoo Park et.al. | 2512.10148 | null |
| 2025-12-10 | Workflow is All You Need: Escaping the “Statistical Smoothing Trap” via High-Entropy Information Foraging and Adversarial Pacing | Zhongjie Jiang et.al. | 2512.10121 | null |
| 2025-12-10 | AgriRegion: Region-Aware Retrieval for High-Fidelity Agricultural Advice | Mesafint Fanuel et.al. | 2512.10114 | null |
| 2025-12-10 | Generate-Then-Validate: A Novel Question Generation Approach Using Small Language Models | Yumou Wei et.al. | 2512.10110 | null |
| 2025-12-10 | LLM-PEA: Leveraging Large Language Models Against Phishing Email Attacks | Najmul Hassan et.al. | 2512.10104 | null |
| 2025-12-10 | What Kind of Reasoning (if any) is an LLM actually doing? On the Stochastic Nature and Abductive Appearance of Large Language Models | Luciano Floridi et.al. | 2512.10080 | null |
| 2025-12-10 | Independent Density Estimation | Jiahao Liu et.al. | 2512.10067 | null |
| 2025-12-10 | Linear socio-demographic representations emerge in Large Language Models from indirect cues | Paul Bouchaud et.al. | 2512.10065 | null |
| 2025-12-10 | Text2Graph: Combining Lightweight LLMs and GNNs for Efficient Text Classification in Label-Scarce Scenarios | João Lucas Luz Lima Sarcinelli et.al. | 2512.10061 | null |
| 2025-12-10 | Parallel Decoder Transformer: Model-Internal Parallel Decoding with Speculative Invariance via Note Conditioning | Logan Robbins et.al. | 2512.10054 | null |
| 2025-12-10 | Detailed balance in large language model-driven agents | Zhuo-Yang Song et.al. | 2512.10047 | null |
| 2025-12-10 | Local LLM Ensembles for Zero-shot Portuguese Named Entity Recognition | João Lucas Luz Lima Sarcinelli et.al. | 2512.10043 | null |
| 2025-12-10 | Intelligently Weighting Multiple Reference Models for Direct Preference Optimization of LLMs | Skyler Wu et.al. | 2512.10040 | null |
| 2025-12-10 | Exploring LLMs for Scientific Information Extraction Using The SciEx Framework | Sha Li et.al. | 2512.10004 | null |
| 2025-12-10 | SCOPE: Language Models as One-Time Teacher for Hierarchical Planning in Text Environments | Haoye Lu et.al. | 2512.09897 | null |
| 2025-12-10 | Benchmarking Document Parsers on Mathematical Formula Extraction from PDFs | Pius Horn et.al. | 2512.09874 | link |
| 2025-12-10 | FlipLLM: Efficient Bit-Flip Attacks on Multimodal LLMs using Reinforcement Learning | Khurram Khalil et.al. | 2512.09872 | null |
| 2025-12-10 | MedForget: Hierarchy-Aware Multimodal Unlearning Testbed for Medical AI | Fengli Wu et.al. | 2512.09867 | null |
| 2025-12-10 | UniUGP: Unifying Understanding, Generation, and Planing For End-to-end Autonomous Driving | Hao Lu et.al. | 2512.09864 | null |
| 2025-12-10 | Mitigating Social Bias in English and Urdu Language Models Using PRM-Guided Candidate Selection and Sequential Refinement | Muneeb Ur Raheem Khan et.al. | 2512.09854 | null |
| 2025-12-10 | ChronusOmni: Improving Time Awareness of Omni Large Language Models | Yijing Chen et.al. | 2512.09841 | null |
| 2025-12-10 | LLMs in Interpreting Legal Documents | Simone Corbo et.al. | 2512.09830 | null |
| 2025-12-10 | RIFT: A Scalable Methodology for LLM Accelerator Fault Assessment using Reinforcement Learning | Khurram Khalil et.al. | 2512.09829 | null |
| 2025-12-10 | DeepSeek’s WEIRD Behavior: The cultural alignment of Large Language Models and the effects of prompt language and cultural prompting | James Luther et.al. | 2512.09772 | null |
| 2025-12-10 | Defining Cost Function of Steganography with Large Language Models | Hanzhou Wu et.al. | 2512.09769 | null |
| 2025-12-10 | Towards Language Model Guided TLA+ Proof Automation | Yuhao Zhou et.al. | 2512.09758 | null |
| 2025-12-10 | Knowledge Graph Enrichment and Reasoning for Nobel Laureates | Thanh-Lam T. Nguyen et.al. | 2512.09707 | null |
| 2025-12-10 | Exqutor: Extended Query Optimizer for Vector-augmented Analytical Queries | Hyunjoon Kim et.al. | 2512.09695 | null |
| 2025-12-10 | Understanding Chain-of-Thought Effectiveness in Code Generation: An Empirical and Information-Theoretic Analysis | Naizhu Jin et.al. | 2512.09679 | null |
| 2025-12-10 | The Ky Fan Norms and Beyond: Dual Norms and Combinations for Matrix Optimization | Alexey Kravatskiy et.al. | 2512.09678 | null |
| 2025-12-10 | d-TreeRPO: Towards More Reliable Policy Optimization for Diffusion Language Models | Leyi Pan et.al. | 2512.09675 | null |
| 2025-12-10 | IF-Bench: Benchmarking and Enhancing MLLMs for Infrared Images with Generative Visual Prompting | Tao Zhang et.al. | 2512.09663 | link |
| 2025-12-10 | Can LLMs Evaluate What They Cannot Annotate? Revisiting LLM Reliability in Hate Speech Detection | Paloma Piot et.al. | 2512.09662 | null |
| 2025-12-10 | Measuring Corruption from Text Data | Arieda Muço et.al. | 2512.09652 | null |
| 2025-12-10 | MentraSuite: Post-Training Large Language Models for Mental Health Reasoning and Assessment | Mengxi Xiao et.al. | 2512.09636 | null |
| 2025-12-10 | Creation of the Estonian Subjectivity Dataset: Assessing the Degree of Subjectivity on a Scale | Karl Gustav Gailit et.al. | 2512.09634 | null |
| 2025-12-10 | An End-to-end Planning Framework with Agentic LLMs and PDDL | Emanuele La Malfa et.al. | 2512.09629 | null |
| 2025-12-10 | LogICL: Distilling LLM Reasoning to Bridge the Semantic Gap in Cross-Domain Log Anomaly Detection | Jingwei Ye et.al. | 2512.09627 | null |
| 2025-12-10 | Rethinking Chain-of-Thought Reasoning for Videos | Yiwu Zhong et.al. | 2512.09616 | link |
| 2025-12-10 | ImageTalk: Designing a Multimodal AAC Text Generation System Driven by Image Recognition and Natural Language Generation | Boyin Yang et.al. | 2512.09610 | null |
| 2025-12-10 | Investigate the Low-level Visual Perception in Vision-Language based Image Quality Assessment | Yuan Li et.al. | 2512.09573 | null |
| 2025-12-10 | System Report for CCL25-Eval Task 10: Prompt-Driven Large Language Model Merge for Fine-Grained Chinese Hate Speech Detection | Binglin Wu et.al. | 2512.09563 | null |
| 2025-12-10 | Systematic Framework of Application Methods for Large Language Models in Language Sciences | Kun Sun et.al. | 2512.09552 | null |
| 2025-12-10 | Chasing Shadows: Pitfalls in LLM Security Research | Jonathan Evertz et.al. | 2512.09549 | null |
| 2025-12-10 | Supporting Dynamic Agentic Workloads: How Data and Agents Interact | Ioana Giurgiu et.al. | 2512.09548 | null |
| 2025-12-10 | Don’t Throw Away Your Beams: Improving Consistency-based Uncertainties in LLMs via Beam Search | Ekaterina Fadeeva et.al. | 2512.09538 | null |
| 2025-12-10 | CNFinBench: A Benchmark for Safety and Compliance of Large Language Models in Finance | Jinru Ding et.al. | 2512.09506 | null |
| 2025-12-10 | RouteRAG: Efficient Retrieval-Augmented Generation from Text and Graph via Reinforcement Learning | Yucan Guo et.al. | 2512.09487 | null |
| 2025-12-10 | Advancing LLM-Based Security Automation with Customized Group Relative Policy Optimization for Zero-Touch Networks | Xinye Cao et.al. | 2512.09485 | null |
| 2025-12-10 | An Efficient Interaction Human-AI Synergy System Bridging Visual Awareness and Large Language Model for Intensive Care Units | Yibowen Zhao et.al. | 2512.09473 | null |
| 2025-12-10 | WarmServe: Enabling One-for-Many GPU Prewarming for Multi-LLM Serving | Chiheng Lou et.al. | 2512.09472 | null |
| 2025-12-10 | Advancing Text Classification with Large Language Models and Neural Attention Mechanisms | Ning Lyu et.al. | 2512.09444 | null |
| 2025-12-10 | Advancing Research via Human-AI Interactive Theorem Proving | Chenyi Li et.al. | 2512.09443 | null |
| 2025-12-10 | Knowledge-Augmented Large Language Model Agents for Explainable Financial Decision-Making | Qingyuan Zhang et.al. | 2512.09440 | null |
| 2025-12-10 | ODMA: On-Demand Memory Allocation Framework for LLM Serving on LPDDR-Class Accelerators | Guoqiang Zou et.al. | 2512.09427 | null |
| 2025-12-10 | Black-Box Behavioral Distillation Breaks Safety Alignment in Medical LLMs | Sohely Jahan et.al. | 2512.09403 | null |
| 2025-12-10 | Optimizing Data Extraction from Materials Science Literature: A Study of Tools Using Large Language Models | Wenkai Ning et.al. | 2512.09370 | null |
| 2025-12-10 | Are Hypervectors Enough? Single-Call LLM Reasoning over Knowledge Graphs | Yezi Liu et.al. | 2512.09369 | null |
| 2025-12-10 | Video-QTR: Query-Driven Temporal Reasoning Framework for Lightweight Video Understanding | Xinkui Zhao et.al. | 2512.09354 | null |
| 2025-12-10 | Self Distillation Fine-Tuning of Protein Language Models Improves Versatility in Protein Design | Amin Tavakoli et.al. | 2512.09329 | null |
| 2025-12-10 | RACAM: Enhancing DRAM with Reuse-Aware Computation and Automated Mapping for ML Inference | Siyuan Ma et.al. | 2512.09304 | null |
| 2025-12-10 | Identifying Bias in Machine-generated Text Detection | Kevin Stowe et.al. | 2512.09292 | null |
| 2025-12-10 | LongT2IBench: A Benchmark for Evaluating Long Text-to-Image Generation with Graph-structured Annotations | Zhichao Yang et.al. | 2512.09271 | null |
| 2025-12-10 | From Forecast to Action: Uncertainty-Aware UAV Deployment for Ocean Drifter Recovery | Jingeun Kim et.al. | 2512.09260 | null |
| 2025-12-10 | The Illusion of Rationality: Tacit Bias and Strategic Dominance in Frontier LLM Negotiation Games | Manuel S. Ríos et.al. | 2512.09254 | null |
| 2025-12-10 | GLACIA: Instance-Aware Positional Reasoning for Glacial Lake Segmentation via Multimodal Large Language Model | Lalit Maurya et.al. | 2512.09251 | null |
| 2025-12-10 | Training-free Context-adaptive Attention for Efficient Long Context Modeling | Zeng You et.al. | 2512.09238 | null |
| 2025-12-10 | CORE: A Conceptual Reasoning Layer for Large Language Models | Vishwas Hegde et.al. | 2512.09222 | null |
| 2025-12-10 | Targeting Misalignment: A Conflict-Aware Framework for Reward-Model-based LLM Alignment | Zixuan Liu et.al. | 2512.09212 | null |
| 2025-12-09 | LLMs for Analog Circuit Design Continuum (ACDC) | Yasaman Esfandiari et.al. | 2512.09199 | null |
| 2025-12-09 | TritonForge: Profiling-Guided Framework for Automated Triton Kernel Optimization | Haonan Li et.al. | 2512.09196 | null |
| 2025-12-09 | WOLF: Werewolf-based Observations for LLM Deception and Falsehoods | Mrinal Agarwal et.al. | 2512.09187 | null |
| 2025-12-09 | MindShift: Analyzing Language Models’ Reactions to Psychological Prompts | Anton Vasiliuk et.al. | 2512.09149 | null |
| 2025-12-09 | Detecting Hallucinations in Graph Retrieval-Augmented Generation via Attention Patterns and Semantic Alignment | Shanghao Li et.al. | 2512.09148 | null |
| 2025-12-09 | Knowledge-Guided Large Language Model for Automatic Pediatric Dental Record Understanding and Safe Antibiotic Recommendation | Zihan Han et.al. | 2512.09127 | null |
| 2025-12-09 | A Categorical Analysis of Large Language Models and Why LLMs Circumvent the Symbol Grounding Problem | Luciano Floridi et.al. | 2512.09117 | null |
| 2025-12-09 | Evolving Excellence: Automated Optimization of LLM-based Agents | Paul Brookes et.al. | 2512.09108 | null |
| 2025-12-09 | Learning Unmasking Policies for Diffusion Language Models | Metod Jazbec et.al. | 2512.09106 | null |
| 2025-12-09 | Explaining the Unseen: Multimodal Vision-Language Reasoning for Situational Awareness in Underground Mining Disasters | Mizanur Rahman Jewel et.al. | 2512.09092 | null |
| 2025-12-09 | Calibrated Trust in Dealing with LLM Hallucinations: A Qualitative Study | Adrian Ryser et.al. | 2512.09088 | null |
| 2025-12-09 | AgentComp: From Agentic Reasoning to Compositional Mastery in Text-to-Image Models | Arman Zarei et.al. | 2512.09081 | null |
| 2025-12-09 | Llama-based source code vulnerability detection: Prompt engineering vs Fine tuning | Dyna Soumhane Ouchebara et.al. | 2512.09006 | null |
| 2025-12-09 | Same Content, Different Answers: Cross-Modal Inconsistency in MLLMs | Angela van Sprang et.al. | 2512.08923 | null |
| 2025-12-09 | Revisiting the Scaling Properties of Downstream Metrics in Large Language Model Training | Jakub Krajewski et.al. | 2512.08894 | null |
| 2025-12-09 | Toward Faithful Retrieval-Augmented Generation with Sparse Autoencoders | Guangzhi Xiong et.al. | 2512.08892 | null |
| 2025-12-09 | AI Didn’t Start the Fire: Examining the Stack Exchange Moderator and Contributor Strike | Yiwei Wu et.al. | 2512.08884 | null |
| 2025-12-09 | When Tables Leak: Attacking String Memorization in LLM-Based Tabular Data Generation | Joshua Ward et.al. | 2512.08875 | null |
| 2025-12-09 | Siamese-Driven Optimization for Low-Resolution Image Latent Embedding in Image Captioning | Jing Jie Tan et.al. | 2512.08873 | null |
| 2025-12-09 | SimpleDevQA: Benchmarking Large Language Models on Development Knowledge QA | Jing Zhang et.al. | 2512.08867 | null |
| 2025-12-09 | Ask, Answer, and Detect: Role-Playing LLMs for Personality Detection with Question-Conditioned Mixture-of-Experts | Yifan Lyu et.al. | 2512.08814 | null |
| 2025-12-09 | PrivTune: Efficient and Privacy-Preserving Fine-Tuning of Large Language Models via Device-Cloud Collaboration | Yi Liu et.al. | 2512.08809 | null |
| 2025-12-09 | A Systematic Evaluation of Preference Aggregation in Federated RLHF for Pluralistic Alignment of LLMs | Mahmoud Srewa et.al. | 2512.08786 | null |
| 2025-12-09 | A Practical Guide for Designing, Developing, and Deploying Production-Grade Agentic AI Workflows | Eranga Bandara et.al. | 2512.08769 | null |
| 2025-12-09 | Financial News Summarization: Can extractive methods still offer a true alternative to LLMs? | Nicolas Reche et.al. | 2512.08764 | null |
| 2025-12-09 | Towards Foundation Models with Native Multi-Agent Intelligence | Shuyue Hu et.al. | 2512.08743 | null |
| 2025-12-09 | LaMoSys3.5D: Enabling 3.5D-IC-Based Large Language Model Inference Serving Systems via Hardware/Software Co-Design | Qipan Wang et.al. | 2512.08731 | null |
| 2025-12-09 | Exposing Hidden Biases in Text-to-Image Models via Automated Prompt Search | Manos Plitsis et.al. | 2512.08724 | null |
| 2025-12-09 | Multi-Agent Intelligence for Multidisciplinary Decision-Making in Gastrointestinal Oncology | Rongzhao Zhang et.al. | 2512.08674 | null |
| 2025-12-09 | An Agentic AI System for Multi-Framework Communication Coding | Bohao Yang et.al. | 2512.08659 | null |
| 2025-12-09 | QSTN: A Modular Framework for Robust Questionnaire Inference with Large Language Models | Maximilian Kreutner et.al. | 2512.08646 | null |
| 2025-12-09 | Chain-of-Image Generation: Toward Monitorable and Controllable Image Generation | Young Kyung Kim et.al. | 2512.08645 | null |
| 2025-12-09 | See-Control: A Multimodal Agent Framework for Smartphone Interaction with a Robotic Arm | Haoyu Zhao et.al. | 2512.08629 | null |
| 2025-12-09 | HealthcareNLP: where are we and what is next? | Lifeng Han et.al. | 2512.08617 | null |
| 2025-12-09 | CogMCTS: A Novel Cognitive-Guided Monte Carlo Tree Search Framework for Iterative Heuristic Evolution with Large Language Models | Hui Wang et.al. | 2512.08609 | null |
| 2025-12-09 | Bridging Scale Discrepancies in Robotic Control via Language-Based Action Representations | Yuchi Zhang et.al. | 2512.08548 | null |
| 2025-12-09 | Curriculum Guided Massive Multi Agent System Solving For Robust Long Horizon Tasks | Indrajit Kar et.al. | 2512.08545 | null |
| 2025-12-09 | Principles2Plan: LLM-Guided System for Operationalising Ethical Principles into Plans | Tammy Zhong et.al. | 2512.08536 | null |
| 2025-12-09 | Autonomous Issue Resolver: Towards Zero-Touch Code Maintenance | Aliaksei Kaliutau et.al. | 2512.08492 | null |
| 2025-12-09 | Soft Inductive Bias Approach via Explicit Reasoning Perspectives in Inappropriate Utterance Detection Using Large Language Models | Ju-Young Kim et.al. | 2512.08480 | null |
| 2025-12-09 | A Multi-Agent LLM Framework for Design Space Exploration in Autonomous Driving Systems | Po-An Shih et.al. | 2512.08476 | null |
| 2025-12-09 | Biothreat Benchmark Generation Framework for Evaluating Frontier AI Models III: Implementing the Bacterial Biothreat Benchmark (B3) Dataset | Gary Ackerman et.al. | 2512.08459 | null |
| 2025-12-09 | Biothreat Benchmark Generation Framework for Evaluating Frontier AI Models II: Benchmark Generation Process | Gary Ackerman et.al. | 2512.08451 | null |
| 2025-12-09 | What Triggers my Model? Contrastive Explanations Inform Gender Choices by Translation Models | Janiça Hackenbuchner et.al. | 2512.08440 | null |
| 2025-12-09 | Attention is All You Need to Defend Against Indirect Prompt Injection Attacks in LLMs | Yinan Zhong et.al. | 2512.08417 | null |
| 2025-12-09 | Towards Effective and Efficient Long Video Understanding of Multimodal Large Language Models via One-shot Clip Retrieval | Tao Chen et.al. | 2512.08410 | null |
| 2025-12-09 | DFALLM: Achieving Generalizable Multitask Deepfake Detection by Optimizing Audio LLM Components | Yupei Li et.al. | 2512.08403 | null |
| 2025-12-09 | The Unseen Bias: How Norm Discrepancy in Pre-Norm MLLMs Leads to Visual Information Loss | Bozhou Li et.al. | 2512.08374 | null |
| 2025-12-09 | Reflecting with Two Voices: A Co-Adaptive Dual-Strategy Framework for LLM-Based Agent Decision Making | Wentao Zhang et.al. | 2512.08366 | null |
| 2025-12-09 | The High Cost of Incivility: Quantifying Interaction Inefficiency via Multi-Agent Monte Carlo Simulations | Benedikt Mangold et.al. | 2512.08345 | null |
| 2025-12-09 | Argus: A Multi-Agent Sensitive Information Leakage Detection Framework Based on Hierarchical Reference Relationships | Bin Wang et.al. | 2512.08326 | null |
| 2025-12-09 | rSIM: Incentivizing Reasoning Capabilities of LLMs via Reinforced Strategy Injection | Sijia Chen et.al. | 2512.08300 | null |
| 2025-12-09 | Systematization of Knowledge: Security and Safety in the Model Context Protocol Ecosystem | Shiva Gaire et.al. | 2512.08290 | null |
| 2025-12-09 | Empowering smart app development with SolidGPT: an edge-cloud hybrid AI agent framework | Liao Hu et.al. | 2512.08286 | null |
| 2025-12-09 | AgentEval: Generative Agents as Reliable Proxies for Human Evaluation of AI-Generated Content | Thanh Vu et.al. | 2512.08273 | null |
| 2025-12-09 | Reasoning Models Ace the CFA Exams | Jaisal Patel et.al. | 2512.08270 | null |
| 2025-12-09 | Token Sugar: Making Source Code Sweeter for LLMs through Token-Efficient Shorthand | Zhensu Sun et.al. | 2512.08266 | null |
| 2025-12-09 | Beyond Traditional Diagnostics: Transforming Patient-Side Information into Predictive Insights with Knowledge Graphs and Prototypes | Yibowen Zhao et.al. | 2512.08261 | null |
| 2025-12-09 | Chopper: A Multi-Level GPU Characterization Tool & Derived Insights Into LLM Training Inefficiency | Marco Kurzynski et.al. | 2512.08242 | null |
| 2025-12-09 | SOP^2: Transfer Learning with Scene-Oriented Prompt Pool on 3D Object Detection | Ching-Hung Cheng et.al. | 2512.08223 | null |
| 2025-12-09 | Secure or Suspect? Investigating Package Hallucinations of Shell Command in Original and Quantized LLMs | Md Nazmul Haque et.al. | 2512.08213 | null |
| 2025-12-09 | MobileFineTuner: A Unified End-to-End Framework for Fine-Tuning LLMs on Mobile Phones | Jiaxiang Geng et.al. | 2512.08211 | null |
| 2025-12-09 | ClinicalTrialsHub: Bridging Registries and Literature for Comprehensive Clinical Trial Access | Jiwoo Park et.al. | 2512.08193 | null |
| 2025-12-09 | A Practical Framework for Evaluating Medical AI Security: Reproducible Assessment of Jailbreaking and Privacy Vulnerabilities Across Clinical Specialties | Jinghao Wang et.al. | 2512.08185 | null |
| 2025-12-09 | Framing Climate Change on YouTube: North-South Divides in Narratives and Public Engagement | Sanika Damle et.al. | 2512.08183 | null |
| 2025-12-09 | Chat with UAV – Human-UAV Interaction Based on Large Language Models | Haoran Wang et.al. | 2512.08145 | null |
| 2025-12-09 | PolyLingua: Margin-based Inter-class Transformer for Robust Cross-domain Language Detection | Ali Lotfi Rezaabad et.al. | 2512.08143 | null |
| 2025-12-09 | Biothreat Benchmark Generation Framework for Evaluating Frontier AI Models I: The Task-Query Architecture | Gary Ackerman et.al. | 2512.08130 | null |
| 2025-12-09 | Universal Adversarial Suffixes Using Calibrated Gumbel-Softmax Relaxation | Sampriti Soor et.al. | 2512.08123 | null |
| 2025-12-08 | Evolutionary perspective of large language models on shaping research insights into healthcare disparities | David An et.al. | 2512.08122 | null |
| 2025-12-08 | Balanced Accuracy: The Right Metric for Evaluating LLM Judges – Explained through Youden’s J statistic | Stephane Collot et.al. | 2512.08121 | null |
| 2025-12-08 | Detecting Ambiguity Aversion in Cyberattack Behavior to Inform Cognitive Defense Strategies | Stephan Carney et.al. | 2512.08107 | null |
| 2025-12-08 | AgentCrypt: Advancing Privacy and (Secure) Computation in AI Agent Collaboration | Harish Karthikeyan et.al. | 2512.08104 | null |
| 2025-12-08 | Training LLMs for Honesty via Confessions | Manas Joglekar et.al. | 2512.08093 | null |
| 2025-12-08 | Adaptation of Embedding Models to Financial Filings via LLM Distillation | Eliot Brenner et.al. | 2512.08088 | null |
| 2025-12-08 | Exploiting the Randomness of Large Language Models (LLM) in Text Classification Tasks: Locating Privileged Documents in Legal Matters | Keith Huffman et.al. | 2512.08083 | null |
| 2025-12-08 | Short-Context Dominance: How Much Local Context Natural Language Actually Needs? | Vala Vakilian et.al. | 2512.08082 | null |
| 2025-12-08 | Leveraging Machine Learning and Large Language Models for Automated Image Clustering and Description in Legal Discovery | Qiang Mao et.al. | 2512.08079 | null |
| 2025-12-08 | A Comparative Study of Retrieval Methods in Azure AI Search | Qiang Mao et.al. | 2512.08078 | null |
| 2025-12-08 | Unveiling Latent Knowledge in Chemistry Language Models through Sparse Autoencoders | Jaron Cohen et.al. | 2512.08077 | null |
| 2025-12-08 | Large Language Models for Education and Research: An Empirical and User Survey-based Analysis | Md Mostafizer Rahman et.al. | 2512.08057 | null |
| 2025-12-08 | CLARITY: Medical World Model for Guiding Treatment Decisions by Modeling Context-Aware Disease Trajectories in Latent Space | Tianxingjian Ding et.al. | 2512.08029 | null |
| 2025-12-08 | Toward an AI Reasoning-Enabled System for Patient-Clinical Trial Matching | Caroline N. Leach et.al. | 2512.08026 | null |
| 2025-12-08 | FRIEDA: Benchmarking Multi-Step Cartographic Reasoning in Vision-Language Models | Jiyoon Pyo et.al. | 2512.08016 | null |
| 2025-12-08 | Bridging the Clinical Expertise Gap: Development of a Web-Based Platform for Accessible Time Series Forecasting and Analysis | Aaron D. Mullen et.al. | 2512.07992 | null |
| 2025-12-08 | DeepCode: Open Agentic Coding | Zongwei Li et.al. | 2512.07921 | link |
| 2025-12-08 | Relational Visual Similarity | Thao Nguyen et.al. | 2512.07833 | null |
| 2025-12-08 | Do Generalisation Results Generalise? | Matteo Boglioni et.al. | 2512.07832 | null |
| 2025-12-08 | Understanding Privacy Risks in Code Models Through Training Dynamics: A Causal Approach | Hua Yang et.al. | 2512.07814 | null |
| 2025-12-08 | LLM Use for Mental Health: Crowdsourcing Users’ Sentiment-based Perspectives and Values from Social Discussions | Lingyao Li et.al. | 2512.07797 | null |
| 2025-12-08 | Large Causal Models from Large Language Models | Sridhar Mahadevan et.al. | 2512.07796 | null |
| 2025-12-08 | ReasonBENCH: Benchmarking the (In)Stability of LLM Reasoning | Nearchos Potamitis et.al. | 2512.07795 | null |
| 2025-12-08 | Automating High Energy Physics Data Analysis with LLM-Powered Agents | Eli Gendreau-Distler et.al. | 2512.07785 | null |
| 2025-12-08 | Mary, the Cheeseburger-Eating Vegetarian: Do LLMs Recognize Incoherence in Narratives? | Karin de Langis et.al. | 2512.07777 | null |
| 2025-12-08 | RL-MTJail: Reinforcement Learning for Automated Black-Box Multi-Turn Jailbreaking of Large Language Models | Xiqiao Xiong et.al. | 2512.07761 | null |
| 2025-12-08 | SpatialDreamer: Incentivizing Spatial Reasoning via Active Mental Imagery | Meng Cao et.al. | 2512.07733 | null |
| 2025-12-08 | SAVE: Sparse Autoencoder-Driven Visual Information Enhancement for Mitigating Object Hallucination | Sangha Park et.al. | 2512.07730 | null |
| 2025-12-08 | Privacy Practices of Browser Agents | Alisha Ukani et.al. | 2512.07725 | null |
| 2025-12-08 | In-Context and Few-Shots Learning for Forecasting Time Series Data based on Large Language Models | Saroj Gopali et.al. | 2512.07705 | null |
| 2025-12-08 | HalluShift++: Bridging Language and Vision through Internal Representation Shifts for Hierarchical Hallucinations in MLLMs | Sujoy Nath et.al. | 2512.07687 | null |
| 2025-12-08 | When Large Language Models Do Not Work: Online Incivility Prediction through Graph Neural Networks | Zihan Chen et.al. | 2512.07684 | null |
| 2025-12-08 | Depth-Wise Activation Steering for Honest Language Models | Gracjan Góral et.al. | 2512.07667 | null |
| 2025-12-08 | Bridging Code Graphs and Large Language Models for Better Code Understanding | Zeqi Chen et.al. | 2512.07666 | null |
| 2025-12-08 | Reliable agent engineering should integrate machine-compatible organizational principles | R. Patrick Xian et.al. | 2512.07665 | null |
| 2025-12-08 | An AI-Powered Autonomous Underwater System for Sea Exploration and Scientific Research | Hamad Almazrouei et.al. | 2512.07652 | null |
| 2025-12-08 | PCMind-2.1-Kaiyuan-2B Technical Report | Kairong Luo et.al. | 2512.07612 | null |
| 2025-12-08 | Comparative Analysis and Parametric Tuning of PPO, GRPO, and DAPO for LLM Reasoning Enhancement | Yongsheng Lian et.al. | 2512.07611 | null |
| 2025-12-08 | Metric-Fair Prompting: Treating Similar Samples Similarly | Jing Wang et.al. | 2512.07608 | null |
| 2025-12-08 | Complementary Learning Approach for Text Classification using Large Language Models | Navid Asgari et.al. | 2512.07583 | null |
| 2025-12-08 | All You Need Are Random Visual Tokens? Demystifying Token Pruning in VLLMs | Yahong Wang et.al. | 2512.07580 | null |
| 2025-12-08 | A Simple Method to Enhance Pre-trained Language Models with Speech Tokens for Classification | Nicolas Calbucura et.al. | 2512.07571 | null |
| 2025-12-08 | MoCoRP: Modeling Consistent Relations between Persona and Response for Persona-based Dialogue | Kyungro Lee et.al. | 2512.07544 | null |
| 2025-12-08 | SwissGov-RSD: A Human-annotated, Cross-lingual Benchmark for Token-level Recognition of Semantic Differences Between Related Documents | Michelle Wastl et.al. | 2512.07538 | null |
| 2025-12-08 | Beyond Real: Imaginary Extension of Rotary Position Embeddings for Long-Context LLMs | Xiaoran Liu et.al. | 2512.07525 | link |
| 2025-12-08 | AutoICE: Automatically Synthesizing Verifiable C Code via LLM-driven Evolution | Weilin Luo et.al. | 2512.07501 | null |
| 2025-12-08 | How Do LLMs Fail In Agentic Scenarios? A Qualitative Analysis of Success and Failure Scenarios of Various LLMs in Agentic Simulations | JV Roig et.al. | 2512.07497 | null |
| 2025-12-08 | Enhancing Agentic RL with Progressive Reward Shaping and Value-based Sampling Policy Optimization | Zhuoran Zhuang et.al. | 2512.07478 | null |
| 2025-12-08 | Understanding LLM Agent Behaviours via Game Theory: Strategy Recognition, Biases and Multi-Agent Dynamics | Trung-Kiet Huynh et.al. | 2512.07462 | null |
| 2025-12-08 | Native Parallel Reasoner: Reasoning in Parallelism via Self-Distilled Reinforcement Learning | Tong Wu et.al. | 2512.07461 | link |
| 2025-12-08 | Persian-Phi: Efficient Cross-Lingual Adaptation of Compact LLMs via Curriculum Learning | Amir Mohammad Akhlaghi et.al. | 2512.07454 | null |
| 2025-12-08 | From Show Programmes to Data: Designing a Workflow to Make Performing Arts Ephemera Accessible Through Language Models | Clarisse Bardiot et.al. | 2512.07452 | null |
| 2025-12-08 | MIDG: Mixture of Invariant Experts with knowledge injection for Domain Generalization in Multimodal Sentiment Analysis | Yangle Li et.al. | 2512.07430 | null |
| 2025-12-08 | Revolutionizing Mixed Precision Quantization: Towards Training-free Automatic Proxy Discovery via Large Language Models | Haidong Kang et.al. | 2512.07419 | null |
| 2025-12-08 | Do LLMs Trust the Code They Write? | Francisco Ribeiro et.al. | 2512.07404 | null |
| 2025-12-08 | LUNE: Efficient LLM Unlearning via LoRA Fine-Tuning with Negative Examples | Yezi Liu et.al. | 2512.07375 | null |
| 2025-12-08 | Communication-Efficient Serving for Video Diffusion Models with Latent Parallelism | Zhiyuan Wu et.al. | 2512.07350 | null |
| 2025-12-08 | Generalized Referring Expression Segmentation on Aerial Photos | Luís Marnoto et.al. | 2512.07338 | link |
| 2025-12-08 | DCO: Dynamic Cache Orchestration for LLM Accelerators through Predictive Management | Zhongchun Zhou et.al. | 2512.07312 | null |
| 2025-12-08 | Exact Synthetic Populations for Scalable Societal and Market Modeling | Thierry Petit et.al. | 2512.07306 | null |
| 2025-12-08 | Towards Accurate UAV Image Perception: Guiding Vision-Language Models with Stronger Task Prompts | Mingning Guo et.al. | 2512.07302 | null |
| 2025-12-08 | Investigating Training and Generalization in Faithful Self-Explanations of Large Language Models | Tomoki Doi et.al. | 2512.07288 | null |
| 2025-12-08 | Automatic Syntax Error Repair for Discrete Controller Synthesis using Large Language Model | Yusei Ishimizu et.al. | 2512.07261 | null |
| 2025-12-08 | Ensembling LLM-Induced Decision Trees for Explainable and Robust Error Detection | Mengqi Wang et.al. | 2512.07246 | null |
| 2025-12-08 | NeSTR: A Neuro-Symbolic Abductive Framework for Temporal Reasoning in Large Language Models | Feng Liang et.al. | 2512.07218 | null |
| 2025-12-08 | MMRPT: MultiModal Reinforcement Pre-Training via Masked Vision-Dependent Reasoning | Xuhui Zheng et.al. | 2512.07203 | null |
| 2025-12-08 | Generating Storytelling Images with Rich Chains-of-Reasoning | Xiujie Song et.al. | 2512.07198 | null |
| 2025-12-08 | START: Spatial and Textual Learning for Chart Understanding | Zhuoming Liu et.al. | 2512.07186 | link |
| 2025-12-08 | ContextualSHAP: Enhancing SHAP Explanations Through Contextual Language Generation | Latifa Dwiyanti et.al. | 2512.07178 | null |
| 2025-12-08 | SPACE: Noise Contrastive Estimation Stabilizes Self-Play Fine-Tuning for Large Language Models | Yibo Wang et.al. | 2512.07175 | null |
| 2025-12-08 | Improving the Throughput of Diffusion-based Large Language Models via a Training-Free Confidence-Aware Calibration | Jucheng Shen et.al. | 2512.07173 | null |
| 2025-12-08 | When Privacy Meets Recovery: The Overlooked Half of Surrogate-Driven Privacy Preservation for MLLM Editing | Siyuan Xu et.al. | 2512.07166 | null |
| 2025-12-08 | A Large-Scale Multimodal Dataset and Benchmarks for Human Activity Scene Understanding and Reasoning | Siyang Jiang et.al. | 2512.07136 | null |
| 2025-12-08 | DART: Leveraging Multi-Agent Disagreement for Tool Recruitment in Multimodal Reasoning | Nithin Sivakumaran et.al. | 2512.07132 | null |
| 2025-12-08 | RisConFix: LLM-based Automated Repair of Risk-Prone Drone Configurations | Liping Han et.al. | 2512.07122 | null |
| 2025-12-08 | FOAM: Blocked State Folding for Memory-Efficient LLM Training | Ziqing Wen et.al. | 2512.07112 | null |
| 2025-12-08 | The Geometry of Persona: Disentangling Personality from Reasoning in Large Language Models | Zhixiang Wang et.al. | 2512.07092 | null |
| 2025-12-08 | Leveraging KV Similarity for Online Structured Pruning in LLMs | Jungmin Lee et.al. | 2512.07090 | null |
| 2025-12-08 | ThinkTrap: Denial-of-Service Attacks against Black-box LLM Services via Infinite Thinking | Yunzhe Li et.al. | 2512.07086 | null |
| 2025-12-08 | Do Large Language Models Truly Understand Cross-cultural Differences? | Shiwei Guo et.al. | 2512.07075 | null |
| 2025-12-08 | Replicating TEMPEST at Scale: Multi-Turn Adversarial Attacks Against Trillion-Parameter Frontier Models | Richard Young et.al. | 2512.07059 | null |
| 2025-12-07 | Reformulate, Retrieve, Localize: Agents for Repository-Level Bug Localization | Genevieve Caumartin et.al. | 2512.07022 | null |
| 2025-12-07 | Latency-Response Theory Model: Evaluating Large Language Models via Response Accuracy and Chain-of-Thought Length | Zhiyu Xu et.al. | 2512.07019 | null |
| 2025-12-07 | FVA-RAG: Falsification-Verification Alignment for Mitigating Sycophantic Hallucinations | Mayank Ravishankara et.al. | 2512.07015 | null |
| 2025-12-07 | Block Sparse Flash Attention | Daniel Ohayon et.al. | 2512.07011 | null |
| 2025-12-07 | Singing Timbre Popularity Assessment Based on Multimodal Large Foundation Model | Zihao Wang et.al. | 2512.06999 | null |
| 2025-12-07 | Prompting-in-a-Series: Psychology-Informed Contents and Embeddings for Personality Recognition With Decoder-Only Models | Jing Jie Tan et.al. | 2512.06991 | null |
| 2025-12-07 | Progress Ratio Embeddings: An Impatience Signal for Robust Length Control in Neural Text Generation | Ivanhoé Botcazou et.al. | 2512.06938 | null |
| 2025-12-07 | Large Language Models and Forensic Linguistics: Navigating Opportunities and Threats in the Age of Generative AI | George Mikros et.al. | 2512.06922 | null |
| 2025-12-07 | NeuroABench: A Multimodal Evaluation Benchmark for Neurosurgical Anatomy Identification | Ziyang Song et.al. | 2512.06921 | null |
| 2025-12-07 | SoK: Trust-Authorization Mismatch in LLM Agent Interactions | Guanquan Shi et.al. | 2512.06914 | null |
| 2025-12-07 | Robots with Attitudes: Influence of LLM-Driven Robot Personalities on Motivation and Performance | Dennis Becker et.al. | 2512.06910 | null |
| 2025-12-07 | BabelCoder: Agentic Code Translation with Specification Alignment | Fazle Rabbi et.al. | 2512.06902 | null |
| 2025-12-07 | An Analysis of Large Language Models for Simulating User Responses in Surveys | Ziyun Yu et.al. | 2512.06874 | null |
| 2025-12-07 | Rhea: Role-aware Heuristic Episodic Attention for Conversational LLMs | Wanyang Hong et.al. | 2512.06869 | null |
| 2025-12-07 | Do Persona-Infused LLMs Affect Performance in a Strategic Reasoning Game? | John Licato et.al. | 2512.06867 | null |
| 2025-12-07 | Less Is More, but Where? Dynamic Token Compression via LLM-Guided Keyframe Prior | Yulin Li et.al. | 2512.06866 | null |
| 2025-12-07 | Spatial Retrieval Augmented Autonomous Driving | Xiaosong Jia et.al. | 2512.06865 | null |
| 2025-12-07 | JT-DA: Enhancing Data Analysis with Tool-Integrated Table Reasoning Large Language Models | Ce Chi et.al. | 2512.06859 | null |
| 2025-12-07 | Formal that “Floats” High: Formal Verification of Floating Point Arithmetic | Hansa Mohanty et.al. | 2512.06850 | null |
| 2025-12-07 | CKG-LLM: LLM-Assisted Detection of Smart Contract Access Control Vulnerabilities Based on Knowledge Graphs | Xiaoqi Li et.al. | 2512.06846 | null |
| 2025-12-07 | Leveraging LLMs to support co-evolution between definitions and instances of textual DSLs | Weixing Zhang et.al. | 2512.06836 | null |
| 2025-12-07 | Large Language Model-Based Generation of Discharge Summaries | Tiago Rodrigues et.al. | 2512.06812 | null |
| 2025-12-07 | MMDuet2: Enhancing Proactive Interaction of Video MLLMs with Multi-Turn Reinforcement Learning | Yueqian Wang et.al. | 2512.06810 | null |
| 2025-12-07 | Optimal and Diffusion Transports in Machine Learning | Gabriel Peyré et.al. | 2512.06797 | null |
| 2025-12-07 | LLM4SFC: Sequential Function Chart Generation via Large Language Models | Ofek Glick et.al. | 2512.06787 | null |
| 2025-12-07 | From Description to Score: Can LLMs Quantify Vulnerabilities? | Sima Jafarikhah et.al. | 2512.06781 | null |
| 2025-12-07 | From Next-Token to Next-Block: A Principled Adaptation Path for Diffusion LLMs | Yuchuan Tian et.al. | 2512.06776 | link |
| 2025-12-07 | Becoming Experienced Judges: Selective Test-Time Learning for Evaluators | Seungyeon Jwa et.al. | 2512.06751 | null |
| 2025-12-07 | DoVer: Intervention-Driven Auto Debugging for LLM Multi-Agent Systems | Ming Ma et.al. | 2512.06749 | null |
| 2025-12-07 | PrivLLMSwarm: Privacy-Preserving LLM-Driven UAV Swarms for Secure IoT Surveillance | Jifar Wakuma Ayana et.al. | 2512.06747 | null |
| 2025-12-07 | A Patient-Doctor-NLP-System to contest inequality for less privileged | Subrit Dikshit et.al. | 2512.06734 | null |
| 2025-12-07 | “The Dentist is an involved parent, the bartender is not”: Revealing Implicit Biases in QA with Implicit BBQ | Aarushi Wagh et.al. | 2512.06732 | null |
| 2025-12-07 | KV-CAR: KV Cache Compression using Autoencoders and KV Reuse in Large Language Models | Sourjya Roy et.al. | 2512.06727 | null |
| 2025-12-07 | The Role of Entropy in Visual Grounding: Analysis and Optimization | Shuo Li et.al. | 2512.06726 | null |
| 2025-12-07 | ProAgent: Harnessing On-Demand Sensory Contexts for Proactive LLM Agent Systems | Bufang Yang et.al. | 2512.06721 | null |
| 2025-12-07 | Cognitive Control Architecture (CCA): A Lifecycle Supervision Framework for Robustly Aligned AI Agents | Zhibo Liang et.al. | 2512.06716 | null |
| 2025-11-06 | Trained on Tokens, Calibrated on Concepts: The Emergence of Semantic Calibration in LLMs | Preetum Nakkiran et.al. | 2511.04869 | null |
| 2025-11-06 | Software Defined Vehicle Code Generation: A Few-Shot Prompting Approach | Quang-Dung Nguyen et.al. | 2511.04849 | null |
| 2025-11-06 | Grounded Test-Time Adaptation for LLM Agents | Arthur Chen et.al. | 2511.04847 | null |
| 2025-11-06 | Explore Data Left Behind in Reinforcement Learning for Reasoning Language Models | Chenxi Liu et.al. | 2511.04800 | null |
| 2025-11-06 | ReGen: Generative Robot Simulation via Inverse Design | Phat Nguyen et.al. | 2511.04769 | null |
| 2025-11-06 | Surprisal reveals diversity gaps in image captioning and different scorers change the story | Nikolai Ilinykh et.al. | 2511.04754 | null |
| 2025-11-06 | Trustworthiness Calibration Framework for Phishing Email Detection Using Large Language Models | Daniyal Ganiuly et.al. | 2511.04728 | null |
| 2025-11-06 | IndicVisionBench: Benchmarking Cultural and Multilingual Understanding in VLMs | Ali Faraz et.al. | 2511.04727 | null |
| 2025-11-06 | Learning to reason about rare diseases through retrieval-augmented agents | Ha Young Kim et.al. | 2511.04720 | null |
| 2025-11-06 | Benchmark Designers Should “Train on the Test Set” to Expose Exploitable Non-Visual Shortcuts | Ellis Brown et.al. | 2511.04655 | null |
| 2025-11-06 | Logit-Entropy Adaptive Stopping Heuristic for Efficient Chain-of-Thought Reasoning | Mohammad Atif Quamar et.al. | 2511.04654 | null |
| 2025-11-06 | Optimal Inference Schedules for Masked Diffusion Models | Sitan Chen et.al. | 2511.04647 | null |
| 2025-11-06 | When retrieval outperforms generation: Dense evidence retrieval for scalable fake news detection | Alamgir Munir Qazi et.al. | 2511.04643 | link |
| 2025-11-06 | PixCLIP: Achieving Fine-grained Visual Language Understanding via Any-granularity Pixel-Text Alignment Learning | Yicheng Xiao et.al. | 2511.04601 | null |
| 2025-11-06 | Question the Questions: Auditing Representation in Online Deliberative Processes | Soham De et.al. | 2511.04588 | null |
| 2025-11-06 | ARETE: an R package for Automated REtrieval from TExt with large language models | Vasco V. Branco et.al. | 2511.04573 | null |
| 2025-11-06 | Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm | Jingqi Tong et.al. | 2511.04570 | link |
| 2025-11-06 | LLM-as-a-Judge: Toward World Models for Slate Recommendation Systems | Baptiste Bonin et.al. | 2511.04541 | null |
| 2025-11-06 | From Model to Breach: Towards Actionable LLM-Generated Vulnerabilities Reporting | Cyril Vallez et.al. | 2511.04538 | null |
| 2025-11-06 | Large Language Models for Cyber Security | Raunak Somani et.al. | 2511.04508 | null |
| 2025-11-06 | RAGalyst: Automated Human-Aligned Agentic Evaluation for Domain-Specific RAG | Joshua Gao et.al. | 2511.04502 | null |
| 2025-11-06 | Large language models replicate and predict human cooperation across experiments in game theory | Andrea Cera Palatsi et.al. | 2511.04500 | null |
| 2025-11-06 | Decoding Emergent Big Five Traits in Large Language Models: Temperature-Dependent Expression and Architectural Clustering | Christos-Nikolaos Zacharopoulos et.al. | 2511.04499 | null |
| 2025-11-06 | RUST-BENCH: Benchmarking LLM Reasoning on Unstructured Text within Structured Tables | Nikhil Abhyankar et.al. | 2511.04491 | null |
| 2025-11-06 | Perceptions of AI Bad Behavior: Variations on Discordant Non-Performance | Jaime Banks et.al. | 2511.04487 | null |
| 2025-11-06 | Promoting Sustainable Web Agents: Benchmarking and Estimating Energy Consumption through Empirical and Theoretical Analysis | Lars Krupp et.al. | 2511.04481 | null |
| 2025-11-06 | Enabling Dynamic Sparsity in Quantized LLM Inference | Rongxiang Wang et.al. | 2511.04477 | null |
| 2025-11-06 | Beyond Shortest Path: Agentic Vehicular Routing with Semantic Context | Carnot Braun et.al. | 2511.04464 | null |
| 2025-11-06 | Speed at the Cost of Quality? The Impact of LLM Agent Assistance on Software Development | Hao He et.al. | 2511.04427 | null |
| 2025-11-06 | The Illusion of Certainty: Uncertainty quantification for LLMs fails under ambiguity | Tim Tomov et.al. | 2511.04418 | null |
| 2025-11-06 | Post-Training LLMs as Better Decision-Making Agents: A Regret-Minimization Approach | Chanwoo Park et.al. | 2511.04393 | null |
| 2025-11-06 | Multi-Task Learning for Visually Grounded Reasoning in Gastrointestinal VQA | Itbaan Safwan et.al. | 2511.04384 | null |
| 2025-11-06 | HPC-Vis: A Visual Analytic System for Interactive Exploration of Historical Painter Cohorts | Yingping Yang et.al. | 2511.04383 | null |
| 2025-11-06 | Towards Aligning Multimodal LLMs with Human Experts: A Focus on Parent-Child Interaction | Weiyan Shi et.al. | 2511.04366 | null |
| 2025-11-06 | Where Do LLMs Still Struggle? An In-Depth Analysis of Code Generation Benchmarks | Amir Molzam Sharifloo et.al. | 2511.04355 | null |
| 2025-11-06 | Differentially Private In-Context Learning with Nearest Neighbor Search | Antti Koskela et.al. | 2511.04332 | null |
| 2025-11-06 | RxSafeBench: Identifying Medication Safety Issues of Large Language Models in Simulated Consultation | Jiahao Zhao et.al. | 2511.04328 | null |
| 2025-11-06 | AdversariaLLM: A Unified and Modular Toolbox for LLM Robustness Research | Tim Beyer et.al. | 2511.04316 | null |
| 2025-11-06 | Measuring economic outlook in the news timely and efficiently | Elliot Beck et.al. | 2511.04299 | null |
| 2025-11-06 | Robustness of Minimum-Volume Nonnegative Matrix Factorization under an Expanded Sufficiently Scattered Condition | Giovanni Barbarino et.al. | 2511.04291 | null |
| 2025-11-06 | A Tool for Benchmarking Large Language Models’ Robustness in Assessing the Realism of Driving Scenarios | Jiahui Wu et.al. | 2511.04267 | null |
| 2025-11-06 | SSPO: Subsentence-level Policy Optimization | Kun Yang et.al. | 2511.04256 | null |
| 2025-11-06 | Efficient Topic Extraction via Graph-Based Labeling: A Lightweight Alternative to Deep Models | Salma Mekaoui et.al. | 2511.04248 | null |
| 2025-11-06 | Reusing Pre-Training Data at Test Time is a Compute Multiplier | Alex Fang et.al. | 2511.04234 | null |
| 2025-11-06 | Black-Box Guardrail Reverse-engineering Attack | Hongwei Yao et.al. | 2511.04215 | null |
| 2025-11-06 | Block Rotation is All You Need for MXFP4 Quantization | Yuantian Shao et.al. | 2511.04214 | null |
| 2025-11-06 | Can we trust LLMs as a tutor for our students? Evaluating the Quality of LLM-generated Feedback in Statistics Exams | Markus Herklotz et.al. | 2511.04213 | null |
| 2025-11-06 | LLM-as-a-Judge is Bad, Based on AI Attempting the Exam Qualifying for the Member of the Polish National Board of Appeal | Michał Karp et.al. | 2511.04205 | null |
| 2025-11-06 | Computational Turing Test Reveals Systematic Differences Between Human and AI Language | Nicolò Pagan et.al. | 2511.04195 | null |
| 2025-11-06 | Explaining Software Vulnerabilities with Large Language Models | Oshando Johnson et.al. | 2511.04179 | null |
| 2025-11-06 | Transforming Mentorship: An AI Powered Chatbot Approach to University Guidance | Mashrur Rahman et.al. | 2511.04172 | null |
| 2025-11-06 | Are We Aligned? A Preliminary Investigation of the Alignment of Responsible AI Values between LLMs and Human Judgment | Asma Yamani et.al. | 2511.04157 | null |
| 2025-11-06 | BAPPA: Benchmarking Agents, Plans, and Pipelines for Automated Text-to-SQL Generation | Fahim Ahmed et.al. | 2511.04153 | null |
| 2025-11-06 | Implementation of transformer-based LLMs with large-scale optoelectronic neurons on a CMOS image sensor platform | Neil Na et.al. | 2511.04136 | null |
| 2025-11-06 | Exploring the Feasibility of End-to-End Large Language Model as a Compiler | Hongbin Zhang et.al. | 2511.04132 | null |
| 2025-11-06 | RIDE: Difficulty Evolving Perturbation with Item Response Theory for Mathematical Reasoning | Xinyuan Li et.al. | 2511.04120 | null |
| 2025-11-06 | How Natural Language Proficiency Shapes GenAI Code for Software Engineering Tasks | Ruksit Rojpaisarnkit et.al. | 2511.04115 | null |
| 2025-11-06 | Batch Prompting Suppresses Overthinking Reasoning Under Constraint: How Batch Prompting Suppresses Overthinking in Reasoning Models | Wenmo Qiu et.al. | 2511.04108 | null |
| 2025-11-06 | KGFR: A Foundation Retriever for Generalized Knowledge Graph Question Answering | Yuanning Cui et.al. | 2511.04093 | null |
| 2025-11-06 | E-CARE: An Efficient LLM-based Commonsense-Augmented Framework for E-Commerce | Ge Zhang et.al. | 2511.04087 | null |
| 2025-11-06 | Caption Injection for Optimization in Generative Search Engine | Xiaolu Chen et.al. | 2511.04080 | null |
| 2025-11-06 | The truth is no diaper: Human and AI-generated associations to emotional words | Špela Vintar et.al. | 2511.04077 | null |
| 2025-11-06 | Agentmandering: A Game-Theoretic Framework for Fair Redistricting via Large Language Model Agents | Hao Li et.al. | 2511.04076 | null |
| 2025-11-06 | Plan of Knowledge: Retrieval-Augmented Large Language Models for Temporal Knowledge Graph Question Answering | Xinying Qian et.al. | 2511.04072 | null |
| 2025-11-06 | TXL Fusion: A Hybrid Machine Learning Framework Integrating Chemical Heuristics and Large Language Models for Topological Materials Discovery | Arif Ullah et.al. | 2511.04068 | null |
| 2025-11-06 | DartQuant: Efficient Rotational Distribution Calibration for LLM Quantization | Yuantian Shao et.al. | 2511.04063 | null |
| 2025-11-06 | Interpreting Multi-Attribute Confounding through Numerical Attributes in Large Language Models | Hirohane Takagi et.al. | 2511.04053 | null |
| 2025-11-06 | An LLM-based Framework for Human-Swarm Teaming Cognition in Disaster Search and Rescue | Kailun Ji et.al. | 2511.04042 | null |
| 2025-11-06 | PICNIC: Silicon Photonic Interconnected Chiplets with Computational Network and In-memory Computing for LLM Inference Acceleration | Yue Jiet Chong et.al. | 2511.04036 | null |
| 2025-11-06 | Detecting Silent Failures in Multi-Agentic AI Trajectories | Divya Pathak et.al. | 2511.04032 | null |
| 2025-11-06 | Abductive Inference in Retrieval-Augmented Language Models: Generating and Validating Missing Premises | Shiyin Lin et.al. | 2511.04020 | null |
| 2025-11-06 | Specification-Guided Vulnerability Detection with Large Language Models | Hao Zhu et.al. | 2511.04014 | null |
| 2025-11-06 | PSD2Code: Automated Front-End Code Generation from Design Files via Multimodal Large Language Models | Yongxi Chen et.al. | 2511.04012 | null |
| 2025-11-06 | Memory- and Latency-Constrained Inference of Large Language Models via Adaptive Split Computing | Mingyu Sung et.al. | 2511.04002 | null |
| 2025-11-06 | Hybrid Fuzzing with LLM-Guided Input Mutation and Semantic Feedback | Shiyin Lin et.al. | 2511.03995 | null |
| 2025-11-06 | TwIST: Rigging the Lottery in Transformers with Independent Subnetwork Training | Michael Menezes et.al. | 2511.03983 | null |
| 2025-11-06 | LLMs and Cultural Values: the Impact of Prompt Language and Explicit Cultural Framing | Bram Bulté et.al. | 2511.03980 | null |
| 2025-11-06 | Direct Semantic Communication Between Large Language Models via Vector Translation | Fu-Chun Yang et.al. | 2511.03945 | null |
| 2025-11-06 | MIDI-LLM: Adapting Large Language Models for Text-to-MIDI Music Generation | Shih-Lun Wu et.al. | 2511.03942 | null |
| 2025-11-06 | RLHF: A comprehensive Survey for Cultural, Multimodal and Low Latency Alignment Methods | Raghav Sharma et.al. | 2511.03939 | null |
| 2025-11-06 | SynQuE: Estimating Synthetic Dataset Quality Without Annotations | Arthur Chen et.al. | 2511.03928 | null |
| 2025-11-06 | Collaborative Agents for Automated Program Repair in Ruby | Nikta Akbarpour et.al. | 2511.03925 | null |
| 2025-11-05 | The Human Flourishing Geographic Index: A County-Level Dataset for the United States, 2013–2023 | Stefano M. Iacus et.al. | 2511.03915 | null |
| 2025-11-05 | GRAD: Graph-Retrieved Adaptive Decoding for Hallucination Mitigation | Manh Nguyen et.al. | 2511.03900 | null |
| 2025-11-05 | Secure Code Generation at Scale with Reflexion | Arup Datta et.al. | 2511.03898 | null |
| 2025-11-05 | KnowThyself: An Agentic Assistant for LLM Interpretability | Suraj Prasai et.al. | 2511.03878 | null |
| 2025-11-05 | OMPILOT: Harnessing Transformer Models for Auto Parallelization to Shared Memory Computing Paradigms | Arijit Bhattacharjee et.al. | 2511.03866 | null |
| 2025-11-05 | GAIA: Geothermal Analytics and Intelligent Agent | Randy Harsuko et.al. | 2511.03852 | null |
| 2025-11-05 | To See or To Read: User Behavior Reasoning in Multimodal LLMs | Tianning Dong et.al. | 2511.03845 | null |
| 2025-11-05 | ASAP: an Agentic Solution to Auto-optimize Performance of Large-Scale LLM Training | Yuran Ding et.al. | 2511.03844 | null |
| 2025-11-05 | Divide, Cache, Conquer: Dichotomic Prompting for Efficient Multi-Label LLM-Based Classification | Mikołaj Langner et.al. | 2511.03830 | null |
| 2025-11-05 | STARS: Segment-level Token Alignment with Rejection Sampling in Large Language Models | Mohammad Atif Quamar et.al. | 2511.03827 | null |
| 2025-11-05 | How Different Tokenization Algorithms Impact LLMs and Transformer Models for Binary Code Analysis | Ahmed Mostafa et.al. | 2511.03825 | null |
| 2025-11-05 | PLLuM: A Family of Polish Large Language Models | Jan Kocoń et.al. | 2511.03823 | null |
| 2025-11-05 | Expert Evaluation of LLM World Models: A High-$T_c$ Superconductivity Case Study | Haoyu Guo et.al. | 2511.03782 | null |
| 2025-11-05 | Scaling Agent Learning via Experience Synthesis | Zhaorun Chen et.al. | 2511.03773 | link |
| 2025-11-05 | Disentangled Concepts Speak Louder Than Words: Explainable Video Action Recognition | Jongseo Lee et.al. | 2511.03725 | null |
| 2025-11-05 | Outbidding and Outbluffing Elite Humans: Mastering Liar’s Poker via Self-Play and Reinforcement Learning | Richard Dewey et.al. | 2511.03724 | null |
| 2025-11-05 | LLM-enhanced Air Quality Monitoring Interface via Model Context Protocol | Yu-Erh Pan et.al. | 2511.03706 | null |
| 2025-11-05 | Do Androids Dream of Unseen Puppeteers? Probing for a Conspiracy Mindset in Large Language Models | Francesco Corso et.al. | 2511.03699 | null |
| 2025-11-05 | AnaFlow: Agentic LLM-based Workflow for Reasoning-Driven Explainable and Sample-Efficient Analog Circuit Sizing | Mohsen Ahmadzadeh et.al. | 2511.03697 | null |
| 2025-11-05 | Whisper Leak: a side-channel attack on Large Language Models | Geoff McDonald et.al. | 2511.03675 | null |
| 2025-11-05 | Watermarking Large Language Models in Europe: Interpreting the AI Act in Light of Technology | Thomas Souverain et.al. | 2511.03641 | null |
| 2025-11-05 | Towards Transparent Stance Detection: A Zero-Shot Approach Using Implicit and Explicit Interpretability | Apoorva Upadhyaya et.al. | 2511.03635 | null |
| 2025-11-05 | LiveTradeBench: Seeking Real-World Alpha with Large Language Models | Haofei Yu et.al. | 2511.03628 | null |
| 2025-11-05 | PerfDojo: Automated ML Library Generation for Heterogeneous Architectures | Andrei Ivanov et.al. | 2511.03586 | null |
| 2025-11-05 | ASVRI-Legal: Fine-Tuning LLMs with Retrieval Augmented Generation for Enhanced Legal Regulation | One Octadion et.al. | 2511.03563 | null |
| 2025-11-05 | MultiZebraLogic: A Multilingual Logical Reasoning Benchmark | Sofie Helene Bruun et.al. | 2511.03553 | null |
| 2025-11-05 | Uncovering Code Insights: Leveraging GitHub Artifacts for Deeper Code Understanding | Ziv Nevo et.al. | 2511.03549 | null |
| 2025-11-05 | U2F: Encouraging SWE-Agent to Seize Novelty without Losing Feasibility | Wencheng Ye et.al. | 2511.03517 | null |
| 2025-11-05 | One Battle After Another: Probing LLMs’ Limits on Multi-Turn Instruction Following with a Benchmark Evolving Framework | Qi Jia et.al. | 2511.03508 | null |
| 2025-11-05 | BanglaSTEM: A Parallel Corpus for Technical Domain Bangla-English Translation | Kazi Reyazul Hasan et.al. | 2511.03498 | null |
| 2025-11-05 | RAGBoost: Efficient Retrieval-Augmented Generation with Accuracy-Preserving Context Reuse | Yinsicheng Jiang et.al. | 2511.03475 | null |
| 2025-11-05 | Towards Scalable Web Accessibility Audit with MLLMs as Copilots | Ming Gu et.al. | 2511.03471 | null |
| 2025-11-05 | CareMedEval dataset: Evaluating Critical Appraisal and Reasoning in the Biomedical Field | Doria Bonzi et.al. | 2511.03441 | null |
| 2025-11-05 | Light over Heavy: Automated Performance Requirements Quantification with Linguistic Inducement | Shihai Wang et.al. | 2511.03421 | null |
| 2025-11-05 | Knowledge-Augmented Question Error Correction for Chinese Question Answer System with QuestionRAG | Longpeng Qiu et.al. | 2511.03410 | null |
| 2025-11-05 | Efficient Reasoning via Thought-Training and Thought-Free Inference | Canhui Wu et.al. | 2511.03408 | null |
| 2025-11-05 | Towards Realistic Project-Level Code Generation via Multi-Agent Collaboration and Semantic Architecture Modeling | Qianhui Zhao et.al. | 2511.03404 | null |
| 2025-11-05 | GUIDES: Guidance Using Instructor-Distilled Embeddings for Pre-trained Robot Policy Enhancement | Minquan Gao et.al. | 2511.03400 | null |
| 2025-11-05 | Computational Imaging Meets LLMs: Zero-Shot IDH Mutation Prediction in Brain Gliomas | Syed Muqeem Mahmood et.al. | 2511.03376 | null |
| 2025-11-05 | LFC-DA: Logical Formula-Controlled Data Augmentation for Enhanced Logical Reasoning | Shenghao Li et.al. | 2511.03372 | null |
| 2025-11-05 | EQ-Negotiator: Dynamic Emotional Personas Empower Small Language Models for Edge-Deployable Credit Negotiation | Yunbo Long et.al. | 2511.03370 | null |
| 2025-11-05 | Silenced Biases: The Dark Side LLMs Learned to Refuse | Rom Himelstein et.al. | 2511.03369 | null |
| 2025-11-05 | A Modular, Data-Free Pipeline for Multi-Label Intention Recognition in Transportation Agentic AI Applications | Xiaocai Zhang et.al. | 2511.03363 | null |
| 2025-11-05 | Multi-Object Tracking Retrieval with LLaVA-Video: A Training-Free Solution to MOT25-StAG Challenge | Yi Yang et.al. | 2511.03332 | null |
| 2025-11-05 | Benchmarking the Thinking Mode of Multimodal Large Language Models in Clinical Tasks | Jindong Hong et.al. | 2511.03328 | null |
| 2025-11-05 | SurgViVQA: Temporally-Grounded Video Question Answering for Surgical Scene Understanding | Mauro Orazio Drago et.al. | 2511.03325 | null |
| 2025-11-05 | TASU: Text-Only Alignment for Speech Understanding | Jing Peng et.al. | 2511.03310 | null |
| 2025-11-05 | How to Evaluate Speech Translation with Source-Aware Neural MT Metrics | Mauro Cettolo et.al. | 2511.03295 | null |
| 2025-11-05 | UMDAM: A Unified Data Layout and DRAM Address Mapping for Heterogenous NPU-PIM | Hai Huang et.al. | 2511.03293 | null |
| 2025-11-05 | Let the Bees Find the Weak Spots: A Path Planning Perspective on Multi-Turn Jailbreak Attacks against LLMs | Yize Liu et.al. | 2511.03271 | null |
| 2025-11-05 | SCALE: Upscaled Continual Learning of Large Language Models | Jin-woo Lee et.al. | 2511.03270 | null |
| 2025-11-05 | Comparing the Performance of LLMs in RAG-based Question-Answering: A Case Study in Computer Science Literature | Ranul Dayarathne et.al. | 2511.03261 | null |
| 2025-11-05 | Auditing M-LLMs for Privacy Risks: A Synthetic Benchmark and Evaluation Framework | Junhao Li et.al. | 2511.03248 | null |
| 2025-11-05 | Death by a Thousand Prompts: Open Model Vulnerability Analysis | Amy Chang et.al. | 2511.03247 | null |
| 2025-11-05 | IndicSuperTokenizer: An Optimized Tokenizer for Indic Multilingual LLMs | Souvik Rana et.al. | 2511.03237 | null |
| 2025-11-05 | From Five Dimensions to Many: Large Language Models as Precise and Interpretable Psychological Profilers | Yi-Fei Liu et.al. | 2511.03235 | null |
| 2025-11-05 | Multimodal-Wireless: A Large-Scale Dataset for Sensing and Communication | Tianhao Mao et.al. | 2511.03220 | null |
| 2025-11-05 | Hybrid Fact-Checking that Integrates Knowledge Graphs, Large Language Models, and Search-Based Retrieval Agents Improves Interpretable Claim Verification | Shaghayegh Kolli et.al. | 2511.03217 | null |
| 2025-11-05 | LGM: Enhancing Large Language Models with Conceptual Meta-Relations and Iterative Retrieval | Wenchang Lei et.al. | 2511.03214 | null |
| 2025-11-05 | QG-CoC: Question-Guided Chain-of-Captions for Large Multimodal Models | Kuei-Chun Kao et.al. | 2511.03206 | null |
| 2025-11-05 | Large Language Models as Information Sources: Distinctive Characteristics and Types of Low-Quality Information | Jiawei Zhou et.al. | 2511.03198 | null |
| 2025-11-05 | Understanding Robustness of Model Editing in Code LLMs: An Empirical Study | Vinaik Chhetri et.al. | 2511.03182 | null |
| 2025-11-05 | Learning-based Cooperative Robotic Paper Wrapping: A Unified Control Policy with Residual Force Control | Rewida Ali et.al. | 2511.03181 | null |
| 2025-11-05 | BengaliMoralBench: A Benchmark for Auditing Moral Reasoning in Large Language Models within Bengali Language and Culture | Shahriyar Zaman Ridoy et.al. | 2511.03180 | null |
| 2025-11-05 | Toward Autonomous Engineering Design: A Knowledge-Guided Multi-Agent Framework | Varun Kumar et.al. | 2511.03179 | null |
| 2025-11-05 | SurgAnt-ViVQA: Learning to Anticipate Surgical Events through GRU-Driven Temporal Cross-Attention | Shreyas C. Dhake et.al. | 2511.03178 | null |
| 2025-11-05 | AI as We Describe It: How Large Language Models and Their Applications in Health are Represented Across Channels of Public Discourse | Jiawei Zhou et.al. | 2511.03174 | null |
| 2025-11-05 | Measuring Aleatoric and Epistemic Uncertainty in LLMs: Empirical Evaluation on ID and OOD QA Tasks | Kevin Wang et.al. | 2511.03166 | null |
| 2025-11-05 | RefAgent: A Multi-agent LLM-based Framework for Automatic Software Refactoring | Khouloud Oueslati et.al. | 2511.03153 | null |
| 2025-11-05 | From Measurement to Expertise: Empathetic Expert Adapters for Context-Based Empathy in Conversational AI Agents | Erfan Shayegani et.al. | 2511.03143 | null |
| 2025-11-05 | A Proprietary Model-Based Safety Response Framework for AI Agents | Qi Li et.al. | 2511.03138 | null |
| 2025-11-05 | Using Multi-modal Large Language Model to Boost Fireworks Algorithm’s Ability in Settling Challenging Optimization Tasks | Shipeng Cen et.al. | 2511.03137 | null |
| 2025-11-05 | From Insight to Exploit: Leveraging LLM Collaboration for Adaptive Adversarial Text Generation | Najrin Sultana et.al. | 2511.03128 | null |
| 2025-11-05 | Control Barrier Function for Aligning Large Language Models | Yuya Miyaoka et.al. | 2511.03121 | null |
| 2025-11-05 | Large language models require a new form of oversight: capability-based monitoring | Katherine C. Kellogg et.al. | 2511.03106 | null |
| 2025-11-05 | CARMA: Comprehensive Automatically-annotated Reddit Mental Health Dataset for Arabic | Saad Mankarious et.al. | 2511.03102 | null |
| 2025-11-05 | ALAS: Transactional and Dynamic Multi-Agent LLM Planning | Longling Geng et.al. | 2511.03094 | null |
| 2025-11-05 | SnapStream: Efficient Long Sequence Decoding on Dataflow Accelerators | Jonathan Li et.al. | 2511.03092 | null |
| 2025-11-05 | PolyNorm: Few-Shot LLM-Based Text Normalization for Text-to-Speech | Michel Wong et.al. | 2511.03080 | null |
| 2025-11-04 | A Collaborative Reasoning Framework for Anomaly Diagnostics in Underwater Robotics | Markus Buchholz et.al. | 2511.03075 | null |
| 2025-11-04 | Epidemiology of Large Language Models: A Benchmark for Observational Distribution Knowledge | Drago Plecko et.al. | 2511.03070 | null |
| 2025-11-04 | Reading Between the Lines: The One-Sided Conversation Problem | Victoria Ebert et.al. | 2511.03056 | null |
| 2025-11-04 | No-Human in the Loop: Agentic Evaluation at Scale for Recommendation | Tao Zhang et.al. | 2511.03051 | null |
| 2025-11-04 | ROBoto2: An Interactive System and Dataset for LLM-assisted Clinical Trial Risk of Bias Assessment | Anthony Hevia et.al. | 2511.03048 | null |
| 2025-11-04 | Unsupervised Evaluation of Multi-Turn Objective-Driven Interactions | Emi Soroka et.al. | 2511.03047 | null |
| 2025-11-04 | Data-Efficient Adaptation and a Novel Evaluation Method for Aspect-based Sentiment Analysis | Yan Cathy Hua et.al. | 2511.03034 | null |
| 2025-11-04 | PublicAgent: Multi-Agent Design Principles From an LLM-Based Open Data Analysis Framework | Sina Montazeri et.al. | 2511.03023 | null |
| 2025-11-04 | LEGO-Eval: Towards Fine-Grained Evaluation on Synthesizing 3D Embodied Environments with Tool Augmentation | Gyeom Hwangbo et.al. | 2511.03001 | null |
| 2025-11-04 | Zero-shot data citation function classification using transformer-based large language models (LLMs) | Neil Byers et.al. | 2511.02936 | null |
| 2025-11-04 | Cache Mechanism for Agent RAG Systems | Shuhang Lin et.al. | 2511.02919 | null |
| 2025-11-04 | Adaptive and Robust Data Poisoning Detection and Sanitization in Wearable IoT Systems using Large Language Models | W. K. M Mithsara et.al. | 2511.02894 | null |
| 2025-11-04 | Agent-Omni: Test-Time Multimodal Reasoning via Model Coordination for Understanding Anything | Huawei Lin et.al. | 2511.02834 | null |
| 2025-11-04 | Can LLMs subtract numbers? | Mayank Jobanputra et.al. | 2511.02795 | null |
| 2025-11-04 | When One Modality Sabotages the Others: A Diagnostic Lens on Multimodal Reasoning | Chenyu Zhang et.al. | 2511.02794 | null |
| 2025-11-04 | When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought | Yiyang Zhou et.al. | 2511.02779 | null |
| 2025-11-04 | ConMeZO: Adaptive Descent-Direction Sampling for Gradient-Free Finetuning of Large Language Models | Lejs Deen Behric et.al. | 2511.02757 | null |
| 2025-11-04 | Controlling Performance and Budget of a Centralized Multi-agent LLM System with Reinforcement Learning | Bowen Jin et.al. | 2511.02755 | null |
| 2025-11-04 | AI Diffusion in Low Resource Language Countries | Amit Misra et.al. | 2511.02752 | null |
| 2025-11-04 | Agentic World Modeling for 6G: Near-Real-Time Generative State-Space Reasoning | Farhad Rezazadeh et.al. | 2511.02748 | null |
| 2025-11-04 | CostBench: Evaluating Multi-Turn Cost-Optimal Planning and Adaptation in Dynamic Environments for LLM Tool-Use Agents | Jiayu Liu et.al. | 2511.02734 | link |
| 2025-11-04 | LLEXICORP: End-user Explainability of Convolutional Neural Networks | Vojtěch Kůr et.al. | 2511.02720 | null |
| 2025-11-04 | ReleaseEval: A Benchmark for Evaluating Language Models in Automated Release Note Generation | Qianru Meng et.al. | 2511.02713 | null |
| 2025-11-04 | VidEmo: Affective-Tree Reasoning for Emotion-Centric Video Foundation Models | Zhicheng Zhang et.al. | 2511.02712 | null |
| 2025-11-04 | Curriculum Design for Trajectory-Constrained Agent: Compressing Chain-of-Thought Tokens in LLMs | Georgios Tzannetos et.al. | 2511.02690 | null |
| 2025-11-04 | Optimal Singular Damage: Efficient LLM Inference in Low Storage Regimes | Mohammadsajad Alipour et.al. | 2511.02681 | null |
| 2025-11-04 | EasyTUS: A Comprehensive Framework for Fast and Accurate Table Union Search across Data Lakes | Tim Otto et.al. | 2511.02674 | null |
| 2025-11-04 | Apriel-H1: Towards Efficient Enterprise Reasoning Models | Oleksiy Ostapenko et.al. | 2511.02651 | null |
| 2025-11-04 | Federated Attention: A Distributed Paradigm for Collaborative LLM Inference over Edge Networks | Xiumei Deng et.al. | 2511.02647 | null |
| 2025-11-04 | DecompSR: A dataset for decomposed analyses of compositional multihop spatial reasoning | Lachlan McPheat et.al. | 2511.02627 | null |
| 2025-11-04 | Understanding New-Knowledge-Induced Factual Hallucinations in LLMs: Analysis, Solution, and Interpretation | Renfei Dang et.al. | 2511.02626 | null |
| 2025-11-04 | The Realignment Problem: When Right becomes Wrong in LLMs | Aakash Sen Sharma et.al. | 2511.02623 | null |
| 2025-11-04 | Verifying LLM Inference to Prevent Model Weight Exfiltration | Roy Rinberg et.al. | 2511.02620 | null |
| 2025-11-04 | UniChange: Unifying Change Detection with Multimodal Large Language Model | Xu Zhang et.al. | 2511.02607 | null |
| 2025-11-04 | CGES: Confidence-Guided Early Stopping for Efficient and Accurate Self-Consistency | Ehsan Aghazadeh et.al. | 2511.02603 | null |
| 2025-11-04 | Next Token Knowledge Tracing: Exploiting Pretrained LLM Representations to Decode Student Behaviour | Max Norris et.al. | 2511.02599 | null |
| 2025-11-04 | A Large Language Model for Corporate Credit Scoring | Chitro Majumdar et.al. | 2511.02593 | null |
| 2025-11-04 | The ORCA Benchmark: Evaluating Real-World Calculation Accuracy in Large Language Models | Claudia Herambourg et.al. | 2511.02589 | null |
| 2025-11-04 | Smart-Hiring: An Explainable end-to-end Pipeline for CV Information Extraction and Job Matching | Kenza Khelkhal et.al. | 2511.02537 | null |
| 2025-11-04 | Knowledge Graph-enhanced Large Language Model for Incremental Game PlayTesting | Enhong Mu et.al. | 2511.02534 | null |
| 2025-11-04 | Causal Graph Neural Networks for Healthcare | Munib Mesinovic et.al. | 2511.02531 | null |
| 2025-11-04 | Large Lemma Miners: Can LLMs do Induction Proofs for Hardware? | Romy Peled et.al. | 2511.02521 | null |
| 2025-11-04 | ESA: Energy-Based Shot Assembly Optimization for Automatic Video Editing | Yaosen Chen et.al. | 2511.02505 | null |
| 2025-11-04 | BRAINS: A Retrieval-Augmented System for Alzheimer’s Detection and Monitoring | Rajan Das Gupta et.al. | 2511.02490 | null |
| 2025-11-04 | Object Detection as an Optional Basis: A Graph Matching Network for Cross-View UAV Localization | Tao Liu et.al. | 2511.02489 | link |
| 2025-11-04 | Modeling Hawkish-Dovish Latent Beliefs in Multi-Agent Debate-Based LLMs for Monetary Policy Decision Classification | Kaito Takano et.al. | 2511.02469 | null |
| 2025-11-04 | Auditable-choice reframing unlocks RL-based verification for open-ended tasks | Mengyu Zhang et.al. | 2511.02463 | null |
| 2025-11-04 | Prompting for Policy: Forecasting Macroeconomic Scenarios with Synthetic LLM Personas | Giulia Iadisernia et.al. | 2511.02458 | null |
| 2025-11-04 | Who’s Who? LLM-assisted Software Traceability with Architecture Entity Recognition | Dominik Fuchß et.al. | 2511.02434 | null |
| 2025-11-04 | Can Conversational AI Counsel for Change? A Theory-Driven Approach to Supporting Dietary Intentions in Ambivalent Individuals | Michelle Bak et.al. | 2511.02428 | null |
| 2025-11-04 | From the Laboratory to Real-World Application: Evaluating Zero-Shot Scene Interpretation on Edge Devices for Mobile Robotics | Nicolas Schuler et.al. | 2511.02427 | null |
| 2025-11-04 | ReAcTree: Hierarchical LLM Agent Trees with Control Flow for Long-Horizon Task Planning | Jae-Woo Choi et.al. | 2511.02424 | null |
| 2025-11-04 | LLM4PG: Adapting Large Language Model for Pathloss Map Generation via Synesthesia of Machines | Mingran Sun et.al. | 2511.02423 | null |
| 2025-11-04 | ChartM$^3$: A Multi-Stage Code-Driven Pipeline for Constructing Multi-Dimensional and Multi-Step Visual Reasoning Data in Chart Comprehension | Duo Xu et.al. | 2511.02415 | null |
| 2025-11-04 | EvoDev: An Iterative Feature-Driven Framework for End-to-End Software Development with LLM-based Agents | Junwei Liu et.al. | 2511.02399 | null |
| 2025-11-04 | RxnCaption: Reformulating Reaction Diagram Parsing as Visual Prompt Guided Captioning | Jiahe Song et.al. | 2511.02384 | null |
| 2025-11-04 | Revisiting put-that-there, context aware window interactions via LLMs | Riccardo Bovo et.al. | 2511.02378 | null |
| 2025-11-04 | AutoAdv: Automated Adversarial Prompting for Multi-Turn Jailbreaking of Large Language Models | Aashray Reddy et.al. | 2511.02376 | null |
| 2025-11-04 | AyurParam: A State-of-the-Art Bilingual Language Model for Ayurveda | Mohd Nauman et.al. | 2511.02374 | null |
| 2025-11-04 | LUMA-RAG: Lifelong Multimodal Agents with Provably Stable Streaming Alignment | Rohan Wandre et.al. | 2511.02371 | null |
| 2025-11-04 | An LLM-powered MILP modelling engine for workforce scheduling guided by expert knowledge | Qingyang Li et.al. | 2511.02364 | null |
| 2025-11-04 | Let Multimodal Embedders Learn When to Augment Query via Adaptive Query Augmentation | Wongyu Kim et.al. | 2511.02358 | null |
| 2025-11-04 | An Automated Framework for Strategy Discovery, Retrieval, and Evolution in LLM Jailbreak Attacks | Xu Liu et.al. | 2511.02356 | null |
| 2025-11-04 | LTD-Bench: Evaluating Large Language Models by Letting Them Draw | Liuhao Lin et.al. | 2511.02347 | link |
| 2025-11-04 | Unlocking the Power of Multi-Agent LLM for Reasoning: From Lazy Agents to Deliberation | Zhiwei Zhang et.al. | 2511.02303 | null |
| 2025-11-04 | VFocus: Better Verilog Generation from Large Language Model via Focused Reasoning | Zhuorui Zhao et.al. | 2511.02285 | null |
| 2025-11-04 | SAIL-RL: Guiding MLLMs in When and How to Think via Dual-Reward RL Tuning | Fangxun Shu et.al. | 2511.02280 | link |
| 2025-11-04 | LA-MARRVEL: A Knowledge-Grounded and Language-Aware LLM Reranker for AI-MARRVEL in Rare Disease Diagnosis | Jaeyeon Lee et.al. | 2511.02263 | null |
| 2025-11-04 | When Modalities Conflict: How Unimodal Reasoning Uncertainty Governs Preference Dynamics in MLLMs | Zhuoran Zhang et.al. | 2511.02243 | null |
| 2025-11-04 | Deep Ideation: Designing LLM Agents to Generate Novel Research Ideas on Scientific Concept Network | Keyu Zhao et.al. | 2511.02238 | null |
| 2025-11-04 | An Evaluation of Interleaved Instruction Tuning on Semantic Reasoning Performance in an Audio MLLM | Jiawei Liu et.al. | 2511.02234 | null |
| 2025-11-04 | Quantitative Risk Assessment in Radiation Oncology via LLM-Powered Root Cause Analysis of Incident Reports | Yuntao Wang et.al. | 2511.02223 | null |
| 2025-11-04 | TabDSR: Decompose, Sanitize, and Reason for Complex Numerical Reasoning in Tabular Data | Changjiang Jiang et.al. | 2511.02219 | null |
| 2025-11-04 | IG-Pruning: Input-Guided Block Pruning for Large Language Models | Kangyu Qiao et.al. | 2511.02213 | null |
| 2025-11-04 | Language-Enhanced Generative Modeling for PET Synthesis from MRI and Blood Biomarkers | Zhengjie Zhang et.al. | 2511.02206 | null |
| 2025-11-04 | LLMs as Judges: Toward The Automatic Review of GSN-compliant Assurance Cases | Gerhard Yu et.al. | 2511.02203 | null |
| 2025-11-04 | Optimal-Agent-Selection: State-Aware Routing Framework for Efficient Multi-Agent Collaboration | Jingbo Wang et.al. | 2511.02200 | null |
| 2025-11-04 | Open the Oyster: Empirical Evaluation and Improvement of Code Reasoning Confidence in LLMs | Shufan Wang et.al. | 2511.02197 | null |
| 2025-11-04 | Personalized Decision Modeling: Utility Optimization or Textualized-Symbolic Reasoning | Yibo Zhao et.al. | 2511.02194 | null |
| 2025-11-04 | Pinpointing Trigger Moment for Grounded Video QA: Enhancing Spatio-temporal Grounding in Multimodal Large Language Models | Jinhwan Seo et.al. | 2511.02182 | null |
| 2025-11-04 | Eliminating Multi-GPU Performance Taxes: A Systems Approach to Efficient Distributed LLMs | Octavian Alexandru Trifan et.al. | 2511.02168 | null |
| 2025-11-03 | Rethinking LLM Human Simulation: When a Graph is What You Need | Joseph Suh et.al. | 2511.02135 | null |
| 2025-11-03 | InsurAgent: A Large Language Model-Empowered Agent for Simulating Individual Behavior in Purchasing Flood Insurance | Ziheng Geng et.al. | 2511.02119 | null |
| 2025-11-03 | Deep Value Benchmark: Measuring Whether Models Generalize Deep values or Shallow Preferences | Joshua Ashkinaze et.al. | 2511.02109 | null |
| 2025-11-03 | Metamorphic Testing of Large Language Models for Natural Language Processing | Steven Cho et.al. | 2511.02108 | null |
| 2025-11-03 | LLM Probing with Contrastive Eigenproblems: Improving Understanding and Applicability of CCS | Stefan F. Schouten et.al. | 2511.02089 | null |
| 2025-11-03 | Watermarking Discrete Diffusion Language Models | Avi Bagchi et.al. | 2511.02083 | null |
| 2025-10-10 | A Unified Biomedical Named Entity Recognition Framework with Large Language Models | Tengxiao Lv et.al. | 2510.08902 | null |
| 2025-09-25 | SCRA-VQA: Summarized Caption-Rerank for Augmented Large Language Models in Visual Question Answering | Yan Zhang et.al. | 2509.20871 | null |
| 2025-08-12 | LLaMA-Based Models for Aspect-Based Sentiment Analysis | Jakub Šmíd et.al. | 2508.08649 | null |
| 2025-07-23 | BetterCheck: Towards Safeguarding VLMs for Automotive Perception Systems | Malsha Ashani Mahawatta Dona et.al. | 2507.17722 | null |
| 2025-07-23 | AI Telephone Surveying: Automating Quantitative Data Collection with an AI Interviewer | Danny D. Leybzon et.al. | 2507.17718 | null |
| 2025-07-23 | HydraOpt: Navigating the Efficiency-Performance Trade-off of Adapter Merging | Taha Ceritli et.al. | 2507.17706 | null |
| 2025-07-23 | Towards Greater Leverage: Scaling Laws for Efficient Mixture-of-Experts Language Models | Changxin Tian et.al. | 2507.17702 | null |
| 2025-07-23 | Thinking Isn’t an Illusion: Overcoming the Limitations of Reasoning Models via Tool Augmentations | Zhao Song et.al. | 2507.17699 | null |
| 2025-07-23 | Symbiotic Agents: A Novel Paradigm for Trustworthy AGI-driven Networks | Ilias Chatzistefanidis et.al. | 2507.17695 | null |
| 2025-07-23 | Simulating multiple human perspectives in socio-ecological systems using large language models | Yongchao Zeng et.al. | 2507.17680 | null |
| 2025-07-23 | See the Forest and the Trees: A Synergistic Reasoning Framework for Knowledge-Based Visual Question Answering | Junjie Wang et.al. | 2507.17659 | null |
| 2025-07-23 | Who Attacks, and Why? Using LLMs to Identify Negative Campaigning in 18M Tweets across 19 Countries | Victor Hartman et.al. | 2507.17636 | null |
| 2025-07-23 | A Hybrid Early-Exit Algorithm for Large Language Models Based on Space Alignment Decoding (SPADE) | Bowen Zheng et.al. | 2507.17618 | null |
| 2025-07-22 | LingBench++: A Linguistically-Informed Benchmark and Reasoning Framework for Multi-Step and Cross-Cultural Inference with LLMs | Da-Chen Lian et.al. | 2507.16809 | null |
| 2025-07-22 | Rethinking LLM-Based RTL Code Optimization Via Timing Logic Metamorphosis | Zhihao Xu et.al. | 2507.16808 | null |
| 2025-07-22 | Agentar-Fin-R1: Enhancing Financial Intelligence through Domain Expertise, Training Efficiency, and Advanced Reasoning | Yanjun Zheng et.al. | 2507.16802 | link |
| 2025-07-23 | Test-Time-Matching: Decouple Personality, Memory, and Linguistic Style in LLM-based Role-Playing Language Agent | Xiaoyu Zhan et.al. | 2507.16799 | null |
| 2025-07-22 | Steering Out-of-Distribution Generalization with Concept Ablation Fine-Tuning | Helena Casademunt et.al. | 2507.16795 | link |
| 2025-07-22 | ChatChecker: A Framework for Dialogue System Testing and Evaluation Through Non-cooperative User Simulation | Roman Mayr et.al. | 2507.16792 | null |
| 2025-07-22 | Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning | Hongyin Luo et.al. | 2507.16784 | link |
| 2025-07-22 | Cooling Matters: Benchmarking Large Language Models and Vision-Language Models on Liquid-Cooled Versus Air-Cooled H100 GPU Systems | Imran Latif et.al. | 2507.16781 | null |
| 2025-07-22 | When LLMs Copy to Think: Uncovering Copy-Guided Attacks in Reasoning LLMs | Yue Li et.al. | 2507.16773 | null |
| 2025-07-22 | WGRAMMAR: Leverage Prior Knowledge to Accelerate Structured Decoding | Ran Wang et.al. | 2507.16768 | null |
| 2025-07-21 | Diffusion Beats Autoregressive in Data-Constrained Settings | Mihir Prabhudesai et.al. | 2507.15857 | null |
| 2025-07-21 | Gemini 2.5 Pro Capable of Winning Gold at IMO 2025 | Yichen Huang et.al. | 2507.15855 | null |
| 2025-07-21 | The Other Mind: How Language Models Exhibit Human Temporal Cognition | Lingyu Li et.al. | 2507.15851 | link |
| 2025-07-21 | 3LM: Bridging Arabic, STEM, and Code through Benchmarking | Basma El Amel Boussaha et.al. | 2507.15850 | null |
| 2025-07-21 | The Impact of Language Mixing on Bilingual LLM Reasoning | Yihao Li et.al. | 2507.15849 | null |
| 2025-07-21 | FASTGEN: Fast and Cost-Effective Synthetic Tabular Data Generation with LLMs | Anh Nguyen et.al. | 2507.15839 | null |
| 2025-07-21 | Just Ask for Music (JAM): Multimodal and Personalized Natural Language Music Recommendation | Alessandro B. Melchiorre et.al. | 2507.15826 | null |
| 2025-07-21 | ACS: An interactive framework for conformal selection | Yu Gui et.al. | 2507.15825 | null |
| 2025-07-21 | Do AI models help produce verified bug fixes? | Li Huang et.al. | 2507.15822 | null |
| 2025-07-21 | LLM Economist: Large Population Models and Mechanism Design in Multi-Agent Generative Simulacra | Seth Karten et.al. | 2507.15815 | link |
| 2025-07-18 | CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning | Xiaoya Li et.al. | 2507.14111 | null |
| 2025-07-18 | Automated Interpretation of Non-Destructive Evaluation Contour Maps Using Large Language Models for Bridge Condition Assessment | Viraj Nishesh Darji et.al. | 2507.14107 | null |
| 2025-07-18 | Generative AI-Driven High-Fidelity Human Motion Simulation | Hari Iyer et.al. | 2507.14097 | null |
| 2025-07-18 | Lessons from the TREC Plain Language Adaptation of Biomedical Abstracts (PLABA) track | Brian Ondov et.al. | 2507.14096 | null |
| 2025-07-18 | DPMT: Dual Process Multi-scale Theory of Mind Framework for Real-time Human-AI Collaboration | Xiyun Li et.al. | 2507.14088 | null |
| 2025-07-18 | The Emotion-Memory Link: Do Memorability Annotations Matter for Intelligent Systems? | Maria Tsfasman et.al. | 2507.14084 | null |
| 2025-07-18 | DENSE: Longitudinal Progress Note Generation with Temporal Modeling of Heterogeneous Clinical Notes Across Hospital Visits | Garapati Keerthana et.al. | 2507.14079 | null |
| 2025-07-18 | Evaluating the Effectiveness of Cost-Efficient Large Language Models in Benchmark Biomedical Tasks | Israt Jahan et.al. | 2507.14045 | null |
| 2025-07-18 | Architecting Human-AI Cocreation for Technical Services – Interaction Modes and Contingency Factors | Jochen Wulf et.al. | 2507.14034 | null |
| 2025-07-18 | KROMA: Ontology Matching with Knowledge Retrieval and Large Language Models | Lam Nguyen et.al. | 2507.14032 | null |
| 2025-07-17 | VideoITG: Multimodal Video Understanding with Instructed Temporal Grounding | Shihao Wang et.al. | 2507.13353 | null |
| 2025-07-17 | Comparing Apples to Oranges: A Dataset & Analysis of LLM Humour Understanding from Traditional Puns to Topical Jokes | Tyler Loakman et.al. | 2507.13335 | null |
| 2025-07-17 | A Survey of Context Engineering for Large Language Models | Lingrui Mei et.al. | 2507.13334 | null |
| 2025-07-17 | The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner | Zhouqi Hua et.al. | 2507.13332 | null |
| 2025-07-17 | GeoReg: Weight-Constrained Few-Shot Regression for Socio-Economic Estimation using LLM | Kyeongjin Ahn et.al. | 2507.13323 | null |
| 2025-07-17 | Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark | Junsu Kim et.al. | 2507.13314 | null |
| 2025-07-17 | The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations | Carlos Arriaga et.al. | 2507.13302 | null |
| 2025-07-17 | AbGen: Evaluating Large Language Models in Ablation Study Design and Evaluation for Scientific Research | Yilun Zhao et.al. | 2507.13300 | null |
| 2025-07-17 | Overview of the TalentCLEF 2025: Skill and Job Title Intelligence for Human Capital Management | Luis Gasco et.al. | 2507.13275 | null |
| 2025-07-17 | Automating Steering for Safe Multimodal Large Language Models | Lyucheng Wu et.al. | 2507.13255 | null |
| 2025-07-16 | Mitigating Object Hallucinations via Sentence-Level Early Intervention | Shangpin Peng et.al. | 2507.12455 | null |
| 2025-07-16 | S2WTM: Spherical Sliced-Wasserstein Autoencoder for Topic Modeling | Suman Adhya et.al. | 2507.12451 | null |
| 2025-07-16 | Describe Anything Model for Visual Question Answering on Text-rich Images | Yen-Linh Vu et.al. | 2507.12441 | null |
| 2025-07-16 | Can We Predict Alignment Before Models Finish Thinking? Towards Monitoring Misaligned Reasoning Models | Yik Siu Chan et.al. | 2507.12428 | null |
| 2025-07-16 | Advancing Retrieval-Augmented Generation for Structured Enterprise and Internal Data | Chandana Cheerla et.al. | 2507.12425 | null |
| 2025-07-16 | QuRe: Query-Relevant Retrieval through Hard Negative Sampling in Composed Image Retrieval | Jaehyun Kwak et.al. | 2507.12416 | null |
| 2025-07-16 | SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories? | Xinyi He et.al. | 2507.12415 | null |
| 2025-07-16 | Assessing the Value of Visual Input: A Benchmark of Multimodal Large Language Models for Robotic Path Planning | Jacinto Colan et.al. | 2507.12391 | null |
| 2025-07-16 | Web-Browsing LLMs Can Access Social Media Profiles and Infer User Demographics | Meysam Alizadeh et.al. | 2507.12372 | null |
| 2025-07-16 | Beyond Single Models: Enhancing LLM Detection of Ambiguity in Requests through Debate | Ana Davila et.al. | 2507.12370 | null |
| 2025-07-15 | Streaming 4D Visual Geometry Transformer | Dong Zhuo et.al. | 2507.11539 | null |
| 2025-07-15 | DrafterBench: Benchmarking Large Language Models for Tasks Automation in Civil Engineering | Yinsheng Li et.al. | 2507.11527 | null |
| 2025-07-15 | LLM-based ambiguity detection in natural language instructions for collaborative surgical robots | Ana Davila et.al. | 2507.11525 | null |
| 2025-07-15 | AirLLM: Diffusion Policy-based Adaptive LoRA for Remote Fine-Tuning of LLM over the Air | Shiyi Yang et.al. | 2507.11515 | null |
| 2025-07-15 | LRMR: LLM-Driven Relational Multi-node Ranking for Lymph Node Metastasis Assessment in Rectal Cancer | Yaoxian Dong et.al. | 2507.11457 | null |
| 2025-07-15 | Reasoning Strategies in Large Language Models: Can They Follow, Prefer, and Optimize? | Yanjian Zhang et.al. | 2507.11423 | null |
| 2025-07-15 | Quantifying the Energy Consumption and Carbon Emissions of LLM Inference via Simulations | Miray Özcan et.al. | 2507.11417 | null |
| 2025-07-15 | Seq vs Seq: An Open Suite of Paired Encoders and Decoders | Orion Weller et.al. | 2507.11412 | null |
| 2025-07-15 | KisMATH: Do LLMs Have Knowledge of Implicit Structures in Mathematical Reasoning? | Soumadeep Saha et.al. | 2507.11408 | null |
| 2025-07-15 | EXAONE 4.0: Unified Large Language Models Integrating Non-reasoning and Reasoning Modes | LG AI Research et.al. | 2507.11407 | null |
| 2025-07-14 | Fusing LLM Capabilities with Routing Data | Tao Feng et.al. | 2507.10540 | null |
| 2025-07-14 | CodeJudgeBench: Benchmarking LLM-as-a-Judge for Coding Tasks | Hongchao Jiang et.al. | 2507.10535 | null |
| 2025-07-14 | Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination | Mingqi Wu et.al. | 2507.10532 | null |
| 2025-07-14 | Chat with AI: The Surprising Turn of Real-time Video Communication from Human to AI | Jiangkai Wu et.al. | 2507.10510 | null |
| 2025-07-14 | Scene-Aware Conversational ADAS with Generative AI for Real-Time Driver Assistance | Kyungtae Han et.al. | 2507.10500 | null |
| 2025-07-14 | Can You Detect the Difference? | İsmail Tarım et.al. | 2507.10475 | null |
| 2025-07-14 | GT-Loc: Unifying When and Where in Images Through a Joint Embedding Space | David G. Shatwell et.al. | 2507.10473 | null |
| 2025-07-14 | MLAR: Multi-layer Large Language Model-based Robotic Process Automation Applicant Tracking | Mohamed T. Younes et.al. | 2507.10472 | null |
| 2025-07-14 | An Empirical Evaluation of AI-Powered Non-Player Characters’ Perceived Realism and Performance in Virtual Reality Environments | Mikko Korkiakoski et.al. | 2507.10469 | null |
| 2025-07-14 | Logic layer Prompt Control Injection (LPCI): A Novel Security Vulnerability Class in Agentic Systems | Hammad Atta et.al. | 2507.10457 | null |
| 2025-07-11 | Lumos-1: On Autoregressive Video Generation from a Unified Model Perspective | Hangjie Yuan et.al. | 2507.08801 | null |
| 2025-07-11 | One Token to Fool LLM-as-a-Judge | Yulai Zhao et.al. | 2507.08794 | null |
| 2025-07-11 | BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity | Chenyang Song et.al. | 2507.08771 | null |
| 2025-07-11 | Multilingual Multimodal Software Developer for Code Generation | Linzheng Chai et.al. | 2507.08719 | null |
| 2025-07-11 | KG-Attention: Knowledge Graph-Guided Attention at Test-Time via Bidirectional Information Aggregation | Songlin Zhai et.al. | 2507.08704 | null |
| 2025-07-11 | ByDeWay: Boost Your multimodal LLM with DEpth prompting in a Training-Free Way | Rajarshi Roy et.al. | 2507.08679 | null |
| 2025-07-11 | LLMCup: Ranking-Enhanced Comment Updating with LLMs | Hua Ge et.al. | 2507.08671 | null |
| 2025-07-11 | KELPS: A Framework for Verified Multi-Language Autoformalization via Semantic-Syntactic Alignment | Jiyao Zhang et.al. | 2507.08665 | null |
| 2025-07-11 | Introspection of Thought Helps AI Agents | Haoran Sun et.al. | 2507.08664 | null |
| 2025-07-11 | Leanabell-Prover-V2: Verifier-integrated Reasoning for Formal Theorem Proving via Reinforcement Learning | Xingguang Ji et.al. | 2507.08649 | null |
| 2025-07-10 | Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology | Haochen Wang et.al. | 2507.07999 | null |
| 2025-07-10 | Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs | Ziyue Li et.al. | 2507.07996 | null |
| 2025-07-10 | Multigranular Evaluation for Brain Visual Decoding | Weihao Xia et.al. | 2507.07993 | null |
| 2025-07-10 | Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs | Jeongseok Hyun et.al. | 2507.07990 | null |
| 2025-07-10 | Automating Expert-Level Medical Reasoning Evaluation of Large Language Models | Shuang Zhou et.al. | 2507.07988 | null |
| 2025-07-10 | OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding | JingLi Lin et.al. | 2507.07984 | null |
| 2025-07-10 | Performance and Practical Considerations of Large and Small Language Models in Clinical Decision Support in Rheumatology | Sabine Felde et.al. | 2507.07983 | null |
| 2025-07-10 | Defending Against Prompt Injection With a Few DefensiveTokens | Sizhe Chen et.al. | 2507.07974 | null |
| 2025-07-10 | Can Large Language Models Improve Phishing Defense? A Large-Scale Controlled Experiment on Warning Dialogue Explanations | Federico Maria Cau et.al. | 2507.07916 | null |
| 2025-07-10 | DTECT: Dynamic Topic Explorer & Context Tracker | Suman Adhya et.al. | 2507.07910 | null |
| 2025-07-09 | Towards Multimodal Understanding via Stable Diffusion as a Task-Aware Feature Extractor | Vatsal Agarwal et.al. | 2507.07106 | null |
| 2025-07-09 | Vision-Language-Vision Auto-Encoder: Scalable Knowledge Distillation from Diffusion Models | Tiezheng Zhang et.al. | 2507.07104 | null |
| 2025-07-09 | Evaluating Attribute Confusion in Fashion Text-to-Image Generation | Ziyue Liu et.al. | 2507.07079 | null |
| 2025-07-09 | 5C Prompt Contracts: A Minimalist, Creative-Friendly, Token-Efficient Design Framework for Individual and SME LLM Usage | Ugur Ari et.al. | 2507.07045 | null |
| 2025-07-09 | UniConv: Unifying Retrieval and Response Generation for Large Language Models in Conversations | Fengran Mo et.al. | 2507.07030 | null |
| 2025-07-09 | First Return, Entropy-Eliciting Explore | Tianyu Zheng et.al. | 2507.07017 | null |
| 2025-07-09 | GNN-ViTCap: GNN-Enhanced Multiple Instance Learning with Vision Transformers for Whole Slide Image Classification and Captioning | S M Taslim Uddin Raju et.al. | 2507.07006 | null |
| 2025-07-09 | Learning Deliberately, Acting Intuitively: Unlocking Test-Time Reasoning in Multimodal LLMs | Yahan Yu et.al. | 2507.06999 | null |
| 2025-07-09 | MCA-RG: Enhancing LLMs with Medical Concept Alignment for Radiology Report Generation | Qilong Xing et.al. | 2507.06992 | null |
| 2025-07-09 | Are They All Good? Evaluating the Quality of CoTs in LLM-based Code Generation | Binquan Zhang et.al. | 2507.06980 | null |
| 2025-07-08 | Efficiency-Effectiveness Reranking FLOPs for LLM-based Rerankers | Zhiyuan Peng et.al. | 2507.06223 | null |
| 2025-07-08 | A Survey on Latent Reasoning | Rui-Jie Zhu et.al. | 2507.06203 | null |
| 2025-07-08 | UQLM: A Python Package for Uncertainty Quantification in Large Language Models | Dylan Bouchard et.al. | 2507.06196 | null |
| 2025-07-08 | SQLBarber: A System Leveraging Large Language Models to Generate Customized and Realistic SQL Workloads | Jiale Lao et.al. | 2507.06192 | null |
| 2025-07-08 | Hidden Prompts in Manuscripts Exploit AI-Assisted Peer Review | Zhicheng Lin et.al. | 2507.06185 | null |
| 2025-07-08 | Enhancing Scientific Visual Question Answering through Multimodal Reasoning and Ensemble Modeling | Prahitha Movva et.al. | 2507.06183 | null |
| 2025-07-08 | Data-Semantics-Aware Recommendation of Diverse Pivot Tables | Whanhee Cho et.al. | 2507.06171 | null |
| 2025-07-09 | Skywork-R1V3 Technical Report | Wei Shen et.al. | 2507.06167 | null |
| 2025-07-08 | Evaluation of Habitat Robotics using Large Language Models | William Li et.al. | 2507.06157 | null |
| 2025-07-08 | Large Language Models Predict Human Well-being – But Not Equally Everywhere | Pat Pataranutaporn et.al. | 2507.06141 | null |
| 2025-07-07 | Beyond Simple Edits: X-Planner for Complex Instruction-Based Image Editing | Chun-Hsiao Yeh et.al. | 2507.05259 | null |
| 2025-07-07 | Spatio-Temporal LLM: Reasoning about Environments and Actions | Haozhen Zheng et.al. | 2507.05258 | null |
| 2025-07-07 | Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions | Yuanzhe Hu et.al. | 2507.05257 | null |
| 2025-07-07 | Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning | Yana Wei et.al. | 2507.05255 | null |
| 2025-07-07 | Response Attack: Exploiting Contextual Priming to Jailbreak Large Language Models | Ziqi Miao et.al. | 2507.05248 | null |
| 2025-07-07 | StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling | Meng Wei et.al. | 2507.05240 | null |
| 2025-07-07 | All in One: Visual-Description-Guided Unified Point Cloud Segmentation | Zongyan Han et.al. | 2507.05211 | null |
| 2025-07-07 | CREW-WILDFIRE: Benchmarking Agentic Multi-Agent Collaborations at Scale | Jonathan Hyun et.al. | 2507.05178 | null |
| 2025-07-07 | OpenS2S: Advancing Open-Source End-to-End Empathetic Large Speech Language Model | Chen Wang et.al. | 2507.05177 | null |
| 2025-07-07 | AI Generated Text Detection Using Instruction Fine-tuned Large Language and Transformer-Based Models | Chinnappa Guggilla et.al. | 2507.05157 | null |
| 2025-07-03 | Bootstrapping Grounded Chain-of-Thought in Multimodal LLMs for Data-Efficient Model Adaptation | Jiaer Xia et.al. | 2507.02859 | null |
| 2025-07-03 | Requirements Elicitation Follow-Up Question Generation | Yuchen Shen et.al. | 2507.02858 | null |
| 2025-07-03 | MOTIF: Modular Thinking via Reinforcement Fine-tuning in LLMs | Purbesh Mitra et.al. | 2507.02851 | null |
| 2025-07-03 | Visual Contextual Attack: Jailbreaking MLLMs with Image-Driven Context Injection | Ziqi Miao et.al. | 2507.02844 | null |
| 2025-07-03 | LLM-Driven Treatment Effect Estimation Under Inference Time Text Confounding | Yuchen Ma et.al. | 2507.02843 | null |
| 2025-07-03 | StepHint: Multi-level Stepwise Hints Enhance Reinforcement Learning to Reason | Kaiyi Zhang et.al. | 2507.02841 | null |
| 2025-07-03 | ExPO: Unlocking Hard Reasoning with Self-Explanation-Guided Reinforcement Learning | Ruiyang Zhou et.al. | 2507.02834 | null |
| 2025-07-03 | SynapseRoute: An Auto-Route Switching Framework on Dual-State Large Language Model | Wencheng Zhang et.al. | 2507.02822 | null |
| 2025-07-03 | Multimodal Mathematical Reasoning with Diverse Solving Perspective | Wenhao Shi et.al. | 2507.02804 | null |
| 2025-07-03 | Is Reasoning All You Need? Probing Bias in the Age of Reasoning Language Models | Riccardo Cantini et.al. | 2507.02799 | null |
| 2025-07-02 | Kwai Keye-VL Technical Report | Kwai Keye Team et.al. | 2507.01949 | null |
| 2025-07-02 | SpecCLIP: Aligning and Translating Spectroscopic Measurements for Stars | Xiaosheng Zhao et.al. | 2507.01939 | null |
| 2025-07-02 | The Thin Line Between Comprehension and Persuasion in LLMs | Adrian de Wynter et.al. | 2507.01936 | null |
| 2025-07-02 | Large Language Model-Driven Closed-Loop UAV Operation with Semantic Observations | Wenhao Wang et.al. | 2507.01930 | null |
| 2025-07-03 | Decision-Oriented Text Evaluation | Yu-Shiang Huang et.al. | 2507.01923 | null |
| 2025-07-02 | Gradient-Adaptive Policy Optimization: Towards Multi-Objective Alignment of Large Language Models | Chengao Li et.al. | 2507.01915 | null |
| 2025-07-02 | Reasoning to Edit: Hypothetical Instruction-Based Image Editing with Visual Reasoning | Qingdong He et.al. | 2507.01908 | null |
| 2025-07-02 | AI4Research: A Survey of Artificial Intelligence for Scientific Research | Qiguang Chen et.al. | 2507.01903 | null |
| 2025-07-02 | High-Layer Attention Pruning with Rescaling | Songtao Liu et.al. | 2507.01900 | null |
| 2025-07-02 | MiCoTA: Bridging the Learnability Gap with Intermediate CoT and Teacher Assistants | Dongyi Ding et.al. | 2507.01887 | null |
| 2025-07-01 | Teaching Time Series to See and Speak: Forecasting with Aligned Visual and Textual Perspectives | Sixun Dong et.al. | 2506.24124 | null |
| 2025-06-30 | Calligrapher: Freestyle Text Image Customization | Yue Ma et.al. | 2506.24123 | null |
| 2025-06-30 | Data Uniformity Improves Training Efficiency and More, with a Convergence Framework Beyond the NTK Regime | Yuqing Wang et.al. | 2506.24120 | null |
| 2025-06-30 | DenseWorld-1M: Towards Detailed Dense Grounded Caption in the Real World | Xiangtai Li et.al. | 2506.24102 | null |
| 2025-06-30 | Logit-Gap Steering: Efficient Short-Suffix Jailbreaks for Aligned Large Language Models | Tung-Ling Li et.al. | 2506.24056 | null |
| 2025-06-30 | Agent.xpu: Efficient Scheduling of Agentic LLM Workloads on Heterogeneous SoC | Xinming Wei et.al. | 2506.24045 | null |
| 2025-06-30 | A Survey on Vision-Language-Action Models for Autonomous Driving | Sicong Jiang et.al. | 2506.24044 | null |
| 2025-06-30 | EXPERT: An Explainable Image Captioning Evaluation Metric with Structured Explanations | Hyunjong Kim et.al. | 2506.24016 | null |
| 2025-06-30 | Large Language Models Don’t Make Sense of Word Problems. A Scoping Review from a Mathematics Education Perspective | Anselm R. Strohmaier et.al. | 2506.24006 | null |
| 2025-06-30 | Auto-TA: Towards Scalable Automated Thematic Analysis (TA) via Multi-Agent Large Language Models with Reinforcement Learning | Seungjun Yi et.al. | 2506.23998 | null |
| 2025-06-27 | The Automated LLM Speedrunning Benchmark: Reproducing NanoGPT Improvements | Bingchen Zhao et.al. | 2506.22419 | null |
| 2025-06-27 | HyperCLOVA X THINK Technical Report | NAVER Cloud HyperCLOVA X Team et.al. | 2506.22403 | null |
| 2025-06-27 | Refining Czech GEC: Insights from a Multi-Experiment Approach | Petr Pechman et.al. | 2506.22402 | null |
| 2025-06-27 | QuickSilver – Speeding up LLM Inference through Dynamic Token Halting, KV Skipping, Contextual Token Fusion, and Adaptive Matryoshka Quantization | Danush Khanna et.al. | 2506.22396 | null |
| 2025-06-27 | What Makes ChatGPT Effective for Software Issue Resolution? An Empirical Study of Developer-ChatGPT Conversations in GitHub | Ramtin Ehsani et.al. | 2506.22390 | null |
| 2025-06-27 | Can Video Large Multimodal Models Think Like Doubters-or Double-Down: A Study on Defeasible Video Entailment | Yue Zhang et.al. | 2506.22385 | null |
| 2025-06-27 | Probabilistic Optimality for Inference-time Scaling | Youkang Wang et.al. | 2506.22376 | null |
| 2025-06-27 | Towards Fair Rankings: Leveraging LLMs for Gender Bias Detection and Measurement | Maryam Mousavian et.al. | 2506.22372 | null |
| 2025-06-27 | Can Large Language Models Help Students Prove Software Correctness? An Experimental Study with Dafny | Carolina Carreira et.al. | 2506.22370 | null |
| 2025-06-27 | Concept-Level AI for Telecom: Moving Beyond Large Language Models | Viswanath Kumarskandpriya et.al. | 2506.22359 | null |
| 2025-06-26 | Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test | Ziyue Li et.al. | 2506.21551 | null |
| 2025-06-26 | mTSBench: Benchmarking Multivariate Time Series Anomaly Detection and Model Selection at Scale | Xiaona Zhou et.al. | 2506.21550 | null |
| 2025-06-26 | PsyLite Technical Report | Fangjun Ding et.al. | 2506.21536 | null |
| 2025-06-26 | Exploring the Design Space of 3D MLLMs for CT Report Generation | Mohammed Baharoon et.al. | 2506.21535 | null |
| 2025-06-26 | “What’s Up, Doc?”: Analyzing How Users Seek Health Information in Large-Scale Conversational AI Datasets | Akshay Paruchuri et.al. | 2506.21532 | null |
| 2025-06-26 | Potemkin Understanding in Large Language Models | Marina Mancoridis et.al. | 2506.21521 | null |
| 2025-06-26 | Mitigating Hallucination of Large Vision-Language Models via Dynamic Logits Calibration | Jiahe Chen et.al. | 2506.21509 | null |
| 2025-06-26 | Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge | Boyu Gou et.al. | 2506.21506 | null |
| 2025-06-26 | Bridging Offline and Online Reinforcement Learning for LLMs | Jack Lanchantin et.al. | 2506.21495 | null |
| 2025-06-26 | Efficient and Reuseable Cloud Configuration Search Using Discovery Spaces | Michael Johnston et.al. | 2506.21467 | null |
| 2025-06-25 | The Decrypto Benchmark for Multi-Agent Reasoning and Theory of Mind | Andrei Lupu et.al. | 2506.20664 | null |
| 2025-06-25 | Memento: Note-Taking for Your Future Self | Chao Wan et.al. | 2506.20642 | null |
| 2025-06-25 | Towards Community-Driven Agents for Machine Learning Engineering | Sijie Li et.al. | 2506.20640 | null |
| 2025-06-25 | DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation | Shansan Gong et.al. | 2506.20639 | null |
| 2025-06-25 | AI Assistants to Enhance and Exploit the PETSc Knowledge Base | Barry Smith et.al. | 2506.20608 | null |
| 2025-06-25 | Model Editing as a Double-Edged Sword: Steering Agent Ethical Behavior Toward Beneficence or Harm | Baixiang Huang et.al. | 2506.20606 | null |
| 2025-06-25 | Video Perception Models for 3D Scene Synthesis | Rui Huang et.al. | 2506.20601 | null |
| 2025-06-25 | HRIBench: Benchmarking Vision-Language Models for Real-Time Human Perception in Human-Robot Interaction | Zhonghao Shi et.al. | 2506.20566 | null |
| 2025-06-25 | Large Language Model-Driven Code Compliance Checking in Building Information Modeling | Soumya Madireddy et.al. | 2506.20551 | null |
| 2025-06-25 | When Life Gives You Samples: The Benefits of Scaling up Inference Compute for Multilingual LLMs | Ammar Khairi et.al. | 2506.20544 | null |
| 2025-06-24 | ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing | Long Xing et.al. | 2506.19848 | null |
| 2025-06-24 | JoyAgents-R1: Joint Evolution Dynamics for Versatile Multi-LLM Agents with Reinforcement Learning | Ai Han et.al. | 2506.19846 | null |
| 2025-06-24 | MAM: Modular Multi-Agent Framework for Multi-Modal Medical Diagnosis via Role-Specialized Collaboration | Yucheng Zhou et.al. | 2506.19835 | null |
| 2025-06-24 | Curating art exhibitions using machine learning | Eurico Covas et.al. | 2506.19813 | null |
| 2025-06-24 | KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality | Baochang Ren et.al. | 2506.19807 | null |
| 2025-06-24 | LLM-Based Social Simulations Require a Boundary | Zengqing Wu et.al. | 2506.19806 | null |
| 2025-06-24 | KnowML: Improving Generalization of ML-NIDS with Attack Knowledge Graphs | Xin Fan Guo et.al. | 2506.19802 | null |
| 2025-06-24 | Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic Empirical Study | Yuqi Zhu et.al. | 2506.19794 | null |
| 2025-06-24 | SAGE: Strategy-Adaptive Generation Engine for Query Rewriting | Teng Wang et.al. | 2506.19783 | null |
| 2025-06-24 | SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning | Yuqian Fu et.al. | 2506.19767 | null |
| 2025-06-23 | jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval | Michael Günther et.al. | 2506.18902 | null |
| 2025-06-23 | Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations | Jiaming Han et.al. | 2506.18898 | null |
| 2025-06-23 | ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs | Jiaru Zou et.al. | 2506.18896 | null |
| 2025-06-23 | Universal Video Temporal Grounding with Generative Multi-modal Large Language Models | Zeqian Li et.al. | 2506.18883 | null |
| 2025-06-23 | CommVQ: Commutative Vector Quantization for KV Cache Compression | Junyan Li et.al. | 2506.18879 | null |
| 2025-06-23 | OmniGen2: Exploration to Advanced Multimodal Generation | Chenyuan Wu et.al. | 2506.18871 | null |
| 2025-06-23 | TAMMs: Temporal-Aware Multimodal Model for Satellite Image Change Understanding and Forecasting | Zhongbin Guo et.al. | 2506.18862 | null |
| 2025-06-23 | LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning | Yuhao Wu et.al. | 2506.18841 | null |
| 2025-06-23 | STU-PID: Steering Token Usage via PID Controller for Efficient Large Language Model Reasoning | Aryasomayajula Ram Bharadwaj et.al. | 2506.18831 | null |
| 2025-06-23 | Understanding Software Engineering Agents: A Study of Thought-Action-Result Trajectories | Islem Bouzenia et.al. | 2506.18824 | null |
| 2025-06-20 | VLN-R1: Vision-Language Navigation via Reinforcement Fine-Tuning | Zhangyang Qi et.al. | 2506.17221 | null |
| 2025-06-20 | No Free Lunch: Rethinking Internal Feedback for LLM Reasoning | Yanzhi Zhang et.al. | 2506.17219 | null |
| 2025-06-20 | Fine-Tuning Lowers Safety and Disrupts Evaluation Consistency | Kathleen C. Fraser et.al. | 2506.17209 | null |
| 2025-06-20 | Dissecting the SWE-Bench Leaderboards: Profiling Submitters and Architectures of LLM- and Agent-Based Repair Systems | Matias Martinez et.al. | 2506.17208 | null |
| 2025-06-20 | Confidence Scoring for LLM-Generated SQL in Supply Chain Data Extraction | Jiekai Ma et.al. | 2506.17203 | null |
| 2025-06-20 | Detecting LLM-Generated Short Answers and Effects on Learner Performance | Shambhavi Bhushan et.al. | 2506.17196 | null |
| 2025-06-20 | The MedPerturb Dataset: What Non-Content Perturbations Reveal About Human and Clinical LLM Decision Making | Abinitha Gourabathina et.al. | 2506.17163 | null |
| 2025-06-20 | Do We Need Large VLMs for Spotting Soccer Actions? | Ritabrata Chakraborty et.al. | 2506.17144 | null |
| 2025-06-20 | Large Language Model Unlearning for Source Code | Xue Jiang et.al. | 2506.17125 | null |
| 2025-06-20 | When Can Model-Free Reinforcement Learning be Enough for Thinking? | Josiah P. Hanna et.al. | 2506.17124 | null |
| 2025-06-18 | PhantomHunter: Detecting Unseen Privately-Tuned LLM-Generated Text via Family-Aware Learning | Yuhui Shi et.al. | 2506.15683 | null |
| 2025-06-18 | GenRecal: Generation after Recalibration from Large to Small Vision-Language Models | Byung-Kwan Lee et.al. | 2506.15681 | null |
| 2025-06-18 | SwarmAgentic: Towards Fully Automated Agentic System Generation via Swarm Intelligence | Yao Zhang et.al. | 2506.15672 | null |
| 2025-06-18 | CC-LEARN: Cohort-based Consistency Learning | Xiao Ye et.al. | 2506.15662 | null |
| 2025-06-18 | PhishDebate: An LLM-Based Multi-Agent Framework for Phishing Website Detection | Wenhao Li et.al. | 2506.15656 | null |
| 2025-06-18 | deepSURF: Detecting Memory Safety Vulnerabilities in Rust Through Fuzzing LLM-Augmented Harnesses | Georgios Androutsopoulos et.al. | 2506.15648 | null |
| 2025-06-18 | Demystifying the Visual Quality Paradox in Multimodal Large Language Models | Shuo Xing et.al. | 2506.15645 | null |
| 2025-06-18 | Revisiting Compositional Generalization Capability of Large Language Models Considering Instruction Following Ability | Yusuke Sakai et.al. | 2506.15629 | null |
| 2025-06-18 | The Effect of State Representation on LLM Agent Behavior in Dynamic Routing Games | Lyle Goodyear et.al. | 2506.15624 | null |
| 2025-06-18 | The Compositional Architecture of Regret in Large Language Models | Xiangxiang Cui et.al. | 2506.15617 | null |
| 2025-06-17 | A Variational Framework for Improving Naturalness in Generative Spoken Language Models | Li-Wei Chen et.al. | 2506.14767 | link |
| 2025-06-17 | ASCD: Attention-Steerable Contrastive Decoding for Reducing Hallucination in MLLM | Yujun Wang et.al. | 2506.14766 | null |
| 2025-06-17 | Large Language Models – the Future of Fundamental Physics? | Caroline Heneka et.al. | 2506.14757 | null |
| 2025-06-17 | Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs | Ring Team et.al. | 2506.14731 | null |
| 2025-06-17 | AgentDistill: Training-Free Agent Distillation with Generalizable MCP Boxes | Jiahao Qiu et.al. | 2506.14728 | link |
| 2025-06-17 | HARMONY: A Scalable Distributed Vector Database for High-Throughput Approximate Nearest Neighbor Search | Qian Xu et.al. | 2506.14707 | null |
| 2025-06-17 | Capacity Matters: a Proof-of-Concept for Transformer Memorization on Real-World Data | Anton Changalidis et.al. | 2506.14704 | null |
| 2025-06-17 | Unified Software Engineering agent as AI Software Engineer | Leonhard Applis et.al. | 2506.14683 | null |
| 2025-06-17 | AIRTBench: Measuring Autonomous AI Red Teaming Capabilities in Language Models | Ads Dawson et.al. | 2506.14682 | null |
| 2025-06-17 | Massive Supervised Fine-tuning Experiments Reveal How Data, Layer, and Training Factors Shape LLM Alignment Quality | Yuto Harada et.al. | 2506.14681 | null |
| 2025-06-16 | Steering LLM Thinking with Budget Guidance | Junyan Li et.al. | 2506.13752 | link |
| 2025-06-16 | Evaluating Large Language Models for Phishing Detection, Self-Consistency, Faithfulness, and Explainability | Shova Kuikel et.al. | 2506.13746 | link |
| 2025-06-16 | Instruction Following by Boosting Attention of Large Language Models | Vitoria Guardieiro et.al. | 2506.13734 | null |
| 2025-06-16 | Attribution-guided Pruning for Compression, Circuit Discovery, and Targeted Correction in LLMs | Sayed Mohammad Vakilzadeh Hatefi et.al. | 2506.13727 | null |
| 2025-06-16 | Weakest Link in the Chain: Security Vulnerabilities in Advanced Reasoning Models | Arjun Krishna et.al. | 2506.13726 | null |
| 2025-06-16 | TimeMaster: Training Time-Series Multimodal LLMs to Reason via Reinforcement Learning | Junru Zhang et.al. | 2506.13705 | link |
| 2025-06-16 | Balancing Knowledge Delivery and Emotional Comfort in Healthcare Conversational Systems | Shang-Chi Tsai et.al. | 2506.13692 | null |
| 2025-06-16 | What Happens During the Loss Plateau? Understanding Abrupt Learning in Transformers | Pulkit Gopalani et.al. | 2506.13688 | link |
| 2025-06-16 | An LLM’s Apology: Outsourcing Awkwardness in the Age of AI | Twm Stone et.al. | 2506.13685 | null |
| 2025-06-16 | Prefix-Tuning+: Modernizing Prefix-Tuning through Attention Independent Prefix Data | Haonan Wang et.al. | 2506.13674 | null |
| 2025-06-13 | code_transformed: The Influence of Large Language Models on Code | Yuliang Xu et.al. | 2506.12014 | null |
| 2025-06-13 | Tracing LLM Reasoning Processes with Strategic Games: A Framework for Planning, Revision, and Resource-Constrained Decision Making | Xiaopeng Yuan et.al. | 2506.12012 | null |
| 2025-06-13 | VGR: Visual Grounded Reasoning | Jiacong Wang et.al. | 2506.11991 | null |
| 2025-06-13 | How Visual Representations Map to Language Feature Space in Multimodal LLMs | Constantin Venhoff et.al. | 2506.11976 | null |
| 2025-06-13 | Improving Large Language Model Safety with Contrastive Representation Learning | Samuel Simko et.al. | 2506.11938 | null |
| 2025-06-13 | Temporal Dynamics of Emotions in Italian Online Soccer Fandoms | Salvatore Citraro et.al. | 2506.11934 | null |
| 2025-06-13 | LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming? | Zihan Zheng et.al. | 2506.11928 | link |
| 2025-06-13 | Beyond Homogeneous Attention: Memory-Efficient LLMs via Fourier-Approximated KV Cache | Xiaoran Liu et.al. | 2506.11886 | null |
| 2025-06-13 | Addressing Bias in LLMs: Strategies and Application to Fair AI-based Recruitment | Alejandro Peña et.al. | 2506.11880 | null |
| 2025-06-13 | A Short Survey on Formalising Software Requirements using Large Language Models | Arshad Beg et.al. | 2506.11874 | null |
| 2025-06-12 | AutoMind: Adaptive Knowledgeable Agent for Automated Data Science | Yixin Ou et.al. | 2506.10974 | null |
| 2025-06-12 | Farseer: A Refined Scaling Law in Large Language Models | Houyi Li et.al. | 2506.10972 | link |
| 2025-06-12 | Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs | Qizhe Zhang et.al. | 2506.10967 | null |
| 2025-06-12 | ChineseHarm-Bench: A Chinese Harmful Content Detection Benchmark | Kangwei Liu et.al. | 2506.10960 | link |
| 2025-06-12 | SWE-Factory: Your Automated Factory for Issue Resolution Training Data and Evaluation Benchmarks | Lianghong Guo et.al. | 2506.10954 | link |
| 2025-06-12 | Build the web for agents, not agents for the web | Xing Han Lù et.al. | 2506.10953 | null |
| 2025-06-12 | Execution Guided Line-by-Line Code Generation | Boaz Lavon et.al. | 2506.10948 | null |
| 2025-06-12 | GUARD: Guided Unlearning and Retention via Data Attribution for Large Language Models | Evelyn Ma et.al. | 2506.10946 | null |
| 2025-06-12 | Self-Adapting Language Models | Adam Zweiger et.al. | 2506.10943 | null |
| 2025-06-12 | Building a Media Ecosystem Observatory from Scratch: Infrastructure, Methodology, and Insights | Zeynep Pehlivan et.al. | 2506.10942 | null |
| 2025-06-11 | Flipping Against All Odds: Reducing LLM Coin Flip Bias via Verbalized Rejection Sampling | Tim Z. Xiao et.al. | 2506.09998 | null |
| 2025-06-11 | From Judgment to Interference: Early Stopping LLM Harmful Outputs via Streaming Content Monitoring | Yang Li et.al. | 2506.09996 | null |
| 2025-06-11 | Large Language Models for Toxic Language Detection in Low-Resource Balkan Languages | Amel Muminovic et.al. | 2506.09992 | link |
| 2025-06-11 | Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation | Xinyu Yang et.al. | 2506.09991 | null |
| 2025-06-11 | V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning | Mido Assran et.al. | 2506.09985 | link |
| 2025-06-11 | Step-by-step Instructions and a Simple Tabular Output Format Improve the Dependency Parsing Accuracy of LLMs | Hiroshi Matsuda et.al. | 2506.09983 | null |
| 2025-06-11 | SRLAgent: Enhancing Self-Regulated Learning Skills through Gamification and LLM Assistance | Wentao Ge et.al. | 2506.09968 | null |
| 2025-06-11 | Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing | Junfei Wu et.al. | 2506.09965 | link |
| 2025-06-11 | Kvasir-VQA-x1: A Multimodal Dataset for Medical Reasoning and Robust MedVQA in Gastrointestinal Endoscopy | Sushant Gautam et.al. | 2506.09958 | link |
| 2025-06-11 | LLMail-Inject: A Dataset from a Realistic Adaptive Prompt Injection Challenge | Sahar Abdelnabi et.al. | 2506.09956 | null |
| 2025-06-09 | GUI-Reflection: Empowering Multimodal GUI Models with Self-Reflection Behavior | Penghao Wu et.al. | 2506.08012 | link |
| 2025-06-09 | Play to Generalize: Learning to Reason Through Game Play | Yunfei Xie et.al. | 2506.08011 | link |
| 2025-06-09 | Reinforcement Pre-Training | Qingxiu Dong et.al. | 2506.08007 | null |
| 2025-06-09 | Reparameterized LLM Training via Orthogonal Equivalence Transformation | Zeju Qiu et.al. | 2506.08001 | link |
| 2025-06-09 | Supporting Construction Worker Well-Being with a Multi-Agent Conversational AI System | Fan Yang et.al. | 2506.07997 | null |
| 2025-06-09 | $τ^2$-Bench: Evaluating Conversational Agents in a Dual-Control Environment | Victor Barres et.al. | 2506.07982 | link |
| 2025-06-09 | HeuriGym: An Agentic Benchmark for LLM-Crafted Heuristics in Combinatorial Optimization | Hongzheng Chen et.al. | 2506.07972 | link |
| 2025-06-09 | CyberV: Cybernetics for Test-time Scaling in Video Understanding | Jiahao Meng et.al. | 2506.07971 | link |
| 2025-06-09 | SpaCE-10: A Comprehensive Benchmark for Multimodal Large Language Models in Compositional Spatial Intelligence | Ziyang Gong et.al. | 2506.07966 | link |
| 2025-06-09 | Reinforcing Multimodal Understanding and Generation with Dual Self-rewards | Jixiang Hong et.al. | 2506.07963 | null |
| 2025-06-06 | Eigenspectrum Analysis of Neural Networks without Aspect Ratio Bias | Yuanzhe Hu et.al. | 2506.06280 | null |
| 2025-06-06 | CoMemo: LVLMs Need Image Context with Image Memory | Shi Liu et.al. | 2506.06279 | link |
| 2025-06-06 | AdvSumm: Adversarial Training for Bias Mitigation in Text Summarization | Mukur Gupta et.al. | 2506.06273 | null |
| 2025-06-06 | Cartridges: Lightweight and general-purpose long context representations via self-study | Sabri Eyuboglu et.al. | 2506.06266 | link |
| 2025-06-06 | PersonaAgent: When Large Language Model Agents Meet Personalization at Test Time | Weizhi Zhang et.al. | 2506.06254 | null |
| 2025-06-06 | DesignBench: A Comprehensive Benchmark for MLLM-based Front-end Code Generation | Jingyu Xiao et.al. | 2506.06251 | link |
| 2025-06-06 | Visual Graph Arena: Evaluating Visual Conceptualization of Vision and Multimodal Large Language Models | Zahra Babaiee et.al. | 2506.06242 | null |
| 2025-06-06 | Bridging External and Parametric Knowledge: Mitigating Hallucination of LLMs with Shared-Private Semantic Synergy in Dual-Stream Knowledge | Yi Sui et.al. | 2506.06240 | null |
| 2025-06-06 | CompilerGPT: Leveraging Large Language Models for Analyzing and Acting on Compiler Optimization Reports | Peter Pirkelbauer et.al. | 2506.06227 | null |
| 2025-06-06 | PROVSYN: Synthesizing Provenance Graphs for Data Augmentation in Intrusion Detection Systems | Yi Huang et.al. | 2506.06226 | null |
| 2025-06-05 | Why LLM Safety Guardrails Collapse After Fine-tuning: A Similarity Analysis Between Alignment and Fine-tuning Datasets | Lei Hsiung et.al. | 2506.05346 | null |
| 2025-06-05 | SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs | Jiahui Wang et.al. | 2506.05344 | link |
| 2025-06-05 | Direct Numerical Layout Generation for 3D Indoor Scene Synthesis via Spatial Reasoning | Xingjian Ran et.al. | 2506.05341 | null |
| 2025-06-05 | VideoMolmo: Spatio-Temporal Grounding Meets Pointing | Ghazi Shazan Ahmad et.al. | 2506.05336 | link |
| 2025-06-05 | Search Arena: Analyzing Search-Augmented LLMs | Mihran Miroyan et.al. | 2506.05334 | link |
| 2025-06-05 | MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning | Xinyan Chen et.al. | 2506.05331 | link |
| 2025-06-05 | Improving Data Efficiency for LLM Reinforcement Fine-tuning Through Difficulty-targeted Online Data Selection and Rollout Replay | Yifan Sun et.al. | 2506.05316 | null |
| 2025-06-05 | Constrained Entropic Unlearning: A Primal-Dual Framework for Large Language Models | Taha Entesari et.al. | 2506.05314 | null |
| 2025-06-05 | ProRefine: Inference-time Prompt Refinement with Textual Feedback | Deepak Pandita et.al. | 2506.05305 | null |
| 2025-06-05 | Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos | Weifeng Lin et.al. | 2506.05302 | null |
| 2025-06-04 | Language-Image Alignment with Fixed Text Encoders | Jingfeng Yang et.al. | 2506.04209 | link |
| 2025-06-04 | Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning | Shuang Chen et.al. | 2506.04207 | link |
| 2025-06-04 | EPiC: Towards Lossless Speedup for Reasoning Training through Edge-Preserving CoT Condensation | Jinghan Jia et.al. | 2506.04205 | null |
| 2025-06-04 | Cascadia: A Cascade Serving System for Large Language Models | Youhe Jiang et.al. | 2506.04203 | null |
| 2025-06-04 | TracLLM: A Generic Framework for Attributing Long Context LLMs | Yanting Wang et.al. | 2506.04202 | link |
| 2025-06-04 | R-Search: Empowering LLM Reasoning with Search via Multi-Reward Reinforcement Learning | Qingfei Zhao et.al. | 2506.04185 | link |
| 2025-06-04 | SuperWriter: Reflection-Driven Long-Form Generation with Large Language Models | Yuhao Wu et.al. | 2506.04180 | link |
| 2025-06-04 | SkipGPT: Dynamic Layer Pruning Reinvented with Token Awareness and Module Decoupling | Anhao Zhao et.al. | 2506.04179 | null |
| 2025-06-04 | Does Prompt Design Impact Quality of Data Imputation by LLMs? | Shreenidhi Srinivasan et.al. | 2506.04172 | null |
| 2025-06-04 | VISCA: Inferring Component Abstractions for Automated End-to-End Testing | Parsa Alian et.al. | 2506.04161 | null |
| 2025-06-03 | Entity-Augmented Neuroscience Knowledge Retrieval Using Ontology and Semantic Understanding Capability of LLM | Pralaypati Ta et.al. | 2506.03145 | null |
| 2025-06-03 | Not All Tokens Are Meant to Be Forgotten | Xiangyu Zhou et.al. | 2506.03142 | null |
| 2025-06-03 | SVGenius: Benchmarking LLMs in SVG Understanding, Editing and Generation | Siqi Chen et.al. | 2506.03139 | link |
| 2025-06-03 | Native-Resolution Image Synthesis | Zidong Wang et.al. | 2506.03131 | link |
| 2025-06-03 | AnimeShooter: A Multi-Shot Animation Dataset for Reference-Guided Video Generation | Lu Qiu et.al. | 2506.03126 | link |
| 2025-06-03 | AUTOCIRCUIT-RL: Reinforcement Learning-Driven LLM for Automated Circuit Topology Generation | Prashanth Vijayaraghavan et.al. | 2506.03122 | null |
| 2025-06-03 | Critique-GRPO: Advancing LLM Reasoning with Natural Language and Numerical Feedback | Xiaoying Zhang et.al. | 2506.03106 | link |
| 2025-06-03 | TalkingMachines: Real-Time Audio-Driven FaceTime-Style Video via Autoregressive Diffusion Models | Chetwin Low et.al. | 2506.03099 | link |
| 2025-06-03 | EDITOR: Effective and Interpretable Prompt Inversion for Text-to-Image Diffusion Models | Mingzhe Li et.al. | 2506.03067 | null |
| 2025-06-03 | Facts Do Care About Your Language: Assessing Answer Quality of Multilingual LLMs | Yuval Kansal et.al. | 2506.03051 | null |
| 2025-05-30 | MoDoMoDo: Multi-Domain Data Mixtures for Multimodal LLM Reinforcement Learning | Yiqing Liang et.al. | 2505.24871 | link |
| 2025-05-30 | SiLVR: A Simple Language-based Video Reasoning Framework | Ce Zhang et.al. | 2505.24869 | link |
| 2025-05-30 | ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models | Mingjie Liu et.al. | 2505.24864 | null |
| 2025-05-30 | MiCRo: Mixture Modeling and Context-aware Routing for Personalized Preference Learning | Jingyan Shen et.al. | 2505.24846 | null |
| 2025-05-30 | Chameleon: A Flexible Data-mixing Framework for Language Model Pretraining and Finetuning | Wanyun Xie et.al. | 2505.24844 | null |
| 2025-05-30 | Vision LLMs Are Bad at Hierarchical Visual Understanding, and LLMs Are the Bottleneck | Yuwen Tan et.al. | 2505.24840 | null |
| 2025-05-30 | VideoCAD: A Large-Scale Video Dataset for Learning UI Interactions and 3D Reasoning from CAD Software | Brandon Man et.al. | 2505.24838 | link |
| 2025-05-30 | Improving Reliability and Explainability of Medical Question Answering through Atomic Fact Checking in Retrieval-Augmented LLMs | Juraj Vladika et.al. | 2505.24830 | null |
| 2025-05-30 | LegalEval-Q: A New Benchmark for The Quality Evaluation of LLM-Generated Legal Text | Li yunhan et.al. | 2505.24826 | null |
| 2025-05-30 | PhySense: Principle-Based Physics Reasoning Benchmarking for Large Language Models | Yinggan Xu et.al. | 2505.24823 | null |
| 2025-05-29 | Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought | Yunze Man et.al. | 2505.23766 | null |
| 2025-05-29 | From Chat Logs to Collective Insights: Aggregative Question Answering | Wentao Zhang et.al. | 2505.23765 | null |
| 2025-05-29 | MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence | Sihan Yang et.al. | 2505.23764 | null |
| 2025-05-29 | Sketch Down the FLOPs: Towards Efficient Networks for Human Sketch | Aneeshan Sain et.al. | 2505.23763 | null |
| 2025-05-29 | Puzzled by Puzzles: When Vision-Language Models Can’t Take a Hint | Heekyung Lee et.al. | 2505.23759 | link |
| 2025-05-29 | DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning | Ziyin Zhang et.al. | 2505.23754 | link |
| 2025-05-29 | ThinkGeo: Evaluating Tool-Augmented Agents for Remote Sensing Tasks | Akashah Shabbir et.al. | 2505.23752 | link |
| 2025-05-29 | Distortion of AI Alignment: Does Preference Optimization Optimize for Preferences? | Paul Gölz et.al. | 2505.23749 | null |
| 2025-05-29 | Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence | Diankun Wu et.al. | 2505.23747 | link |
| 2025-05-29 | Bounded Rationality for LLMs: Satisficing Alignment at Inference-Time | Mohamad Chehade et.al. | 2505.23729 | null |
| 2025-05-28 | Zero-Shot Vision Encoder Grafting via LLM Surrogates | Kaiyu Yue et.al. | 2505.22664 | link |
| 2025-05-28 | AutoL2S: Auto Long-Short Reasoning for Efficient Large Language Models | Feng Luo et.al. | 2505.22662 | null |
| 2025-05-28 | GuessArena: Guess Who I Am? A Self-Adaptive Framework for Evaluating LLMs in Domain-Specific Knowledge and Reasoning | Qingchen Yu et.al. | 2505.22661 | link |
| 2025-05-28 | 3DLLM-Mem: Long-Term Spatial-Temporal Memory for Embodied 3D Large Language Model | Wenbo Hu et.al. | 2505.22657 | null |
| 2025-05-28 | Position: Uncertainty Quantification Needs Reassessment for Large-language Model Agents | Michael Kirchhof et.al. | 2505.22655 | null |
| 2025-05-28 | The Climb Carves Wisdom Deeper Than the Summit: On the Noisy Rewards in Learning to Reason | Ang Lv et.al. | 2505.22653 | link |
| 2025-05-28 | Characterizing Bias: Benchmarking Large Language Models in Simplified versus Traditional Chinese | Hanjia Lyu et.al. | 2505.22645 | link |
| 2025-05-28 | Learning Composable Chains-of-Thought | Fangcong Yin et.al. | 2505.22635 | null |
| 2025-05-28 | Spatial Knowledge Graph-Guided Multimodal Synthesis | Yida Xue et.al. | 2505.22633 | null |
| 2025-05-28 | Stochastic Chameleons: Irrelevant Context Hallucinations Reveal Class-Based (Mis)Generalization in LLMs | Ziling Cheng et.al. | 2505.22630 | null |
| 2025-05-27 | Silence is Not Consensus: Disrupting Agreement Bias in Multi-Agent LLMs via Catfish Agent for Clinical Decision Making | Yihan Wang et.al. | 2505.21503 | null |
| 2025-05-27 | Adversarial Attacks against Closed-Source MLLMs via Feature Optimal Alignment | Xiaojun Jia et.al. | 2505.21494 | null |
| 2025-05-27 | Reinforcing General Reasoning without Verifiers | Xiangxin Zhou et.al. | 2505.21493 | null |
| 2025-05-27 | Robust Hypothesis Generation: LLM-Automated Language Bias for Inductive Logic Programming | Yang Yang et.al. | 2505.21486 | null |
| 2025-05-27 | Are Language Models Consequentialist or Deontological Moral Reasoners? | Keenan Samway et.al. | 2505.21479 | null |
| 2025-05-27 | Policy Optimized Text-to-Image Pipeline Design | Uri Gadot et.al. | 2505.21478 | null |
| 2025-05-27 | Scaling External Knowledge Input Beyond Context Windows of LLMs via Multi-Agent Collaboration | Zijun Liu et.al. | 2505.21471 | link |
| 2025-05-27 | Do LLMs Need to Think in One Language? Correlation between Latent Language and Task Performance | Shintaro Ozaki et.al. | 2505.21458 | null |
| 2025-05-27 | Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO | Muzhi Zhu et.al. | 2505.21457 | null |
| 2025-05-27 | Can Large Reasoning Models Self-Train? | Sheikh Shafayat et.al. | 2505.21444 | null |
| 2025-05-26 | Pangu Light: Weight Re-Initialization for Pruning and Accelerating LLMs | Hanting Chen et.al. | 2505.20155 | null |
| 2025-05-26 | UORA: Uniform Orthogonal Reinitialization Adaptation in Parameter-Efficient Fine-Tuning of Large Models | Xueyan Zhang et.al. | 2505.20154 | null |
| 2025-05-26 | MineAnyBuild: Benchmarking Spatial Planning for Open-world AI Agents | Ziming Wei et.al. | 2505.20148 | null |
| 2025-05-26 | FUDOKI: Discrete Flow-based Unified Understanding and Generation via Kinetic-Optimal Velocities | Jin Wang et.al. | 2505.20147 | null |
| 2025-05-26 | StructEval: Benchmarking LLMs’ Capabilities to Generate Structural Outputs | Jialin Yang et.al. | 2505.20139 | null |
| 2025-05-26 | Iterative Self-Incentivization Empowers Large Language Models as Agentic Searchers | Zhengliang Shi et.al. | 2505.20128 | null |
| 2025-05-26 | Agentic AI Process Observability: Discovering Behavioral Variability | Fabiana Fournier et.al. | 2505.20127 | null |
| 2025-05-26 | TrojanStego: Your Language Model Can Secretly Be A Steganographic Privacy Leaking Agent | Dominik Meier et.al. | 2505.20118 | null |
| 2025-05-26 | Named Entity Recognition in Historical Italian: The Case of Giacomo Leopardi’s Zibaldone | Cristian Santini et.al. | 2505.20113 | null |
| 2025-05-26 | ResSVD: Residual Compensated SVD for Large Language Model Compression | Haolei Bai et.al. | 2505.20112 | null |
| 2025-05-26 | Language-Agnostic Suicidal Risk Detection Using Large Language Models | June-Woo Kim et.al. | 2505.20109 | null |
| 2025-05-26 | Adaptive Deep Reasoning: Triggering Deep Thinking When Needed | Yunhao Wang et.al. | 2505.20101 | null |
| 2025-05-23 | Fann or Flop: A Multigenre, Multiera Benchmark for Arabic Poetry Understanding in LLMs | Wafa Alghallabi et.al. | 2505.18152 | null |
| 2025-05-23 | First Finish Search: Efficient Test-Time Scaling in Large Language Models | Aradhye Agarwal et.al. | 2505.18149 | null |
| 2025-05-23 | Lost in the Haystack: Smaller Needles are More Difficult for LLMs to Find | Owen Bianchi et.al. | 2505.18148 | null |
| 2025-05-23 | Gaming Tool Preferences in Agentic LLMs | Kazem Faghih et.al. | 2505.18135 | link |
| 2025-05-23 | Reward Model Overoptimisation in Iterated RLHF | Lorenz Wolf et.al. | 2505.18126 | null |
| 2025-05-23 | UNJOIN: Enhancing Multi-Table Text-to-SQL Generation via Schema Simplification | Poojah Ganesan et.al. | 2505.18122 | null |
| 2025-05-23 | ProgRM: Build Better GUI Agents with Progress Rewards | Danyang Zhang et.al. | 2505.18121 | null |
| 2025-05-23 | Bidirectional Knowledge Distillation for Enhancing Sequential Recommendation with Large Language Models | Jiongran Wu et.al. | 2505.18120 | null |
| 2025-05-23 | Watch and Listen: Understanding Audio-Visual-Speech Moments with Multimodal LLM | Zinuo Li et.al. | 2505.18110 | null |
| 2025-05-23 | ManuSearch: Democratizing Deep Search in Large Language Models with a Transparent and Open Multi-Agent Framework | Lisheng Huang et.al. | 2505.18105 | null |
| 2025-05-22 | CrossLMM: Decoupling Long Video Sequences from LMMs via Dual Cross-Attention Mechanisms | Shilin Yan et.al. | 2505.17020 | link |
| 2025-05-22 | Let Androids Dream of Electric Sheep: A Human-like Image Implication Understanding and Reasoning Framework | Chenhao Zhang et.al. | 2505.17019 | link |
| 2025-05-22 | SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward | Kaixuan Fan et.al. | 2505.17018 | link |
| 2025-05-22 | Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO | Chengzhuo Tong et.al. | 2505.17017 | link |
| 2025-05-22 | Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models | Runsen Xu et.al. | 2505.17015 | link |
| 2025-05-22 | SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding | Haoning Wu et.al. | 2505.17012 | link |
| 2025-05-22 | R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning | Huatong Song et.al. | 2505.17005 | link |
| 2025-05-22 | Do Large Language Models Excel in Complex Logical Reasoning with Formal Language? | Jin Jiang et.al. | 2505.16998 | link |
| 2025-05-22 | DecoupledESC: Enhancing Emotional Support Generation via Strategy-Response Decoupled Preference Optimization | Chao Zhang et.al. | 2505.16995 | null |
| 2025-05-22 | Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel Decoding | Runpeng Yu et.al. | 2505.16990 | link |
| 2025-05-21 | The Atlas of In-Context Learning: How Attention Heads Shape In-Context Retrieval Augmentation | Patrick Kahardipraja et.al. | 2505.15807 | null |
| 2025-05-21 | Keep Security! Benchmarking Security Policy Preservation in Large Language Model Contexts Against Indirect Attacks in Question Answering | Hwan Chang et.al. | 2505.15805 | null |
| 2025-05-21 | STAR-R1: Spatial TrAnsformation Reasoning by Reinforcing Multimodal LLMs | Zongzhao Li et.al. | 2505.15804 | null |
| 2025-05-21 | VerifyBench: Benchmarking Reference-based Reward Systems for Large Language Models | Yuchen Yan et.al. | 2505.15801 | null |
| 2025-05-21 | Reverse Engineering Human Preferences with Reinforcement Learning | Lisa Alazraki et.al. | 2505.15795 | null |
| 2025-05-21 | HCRMP: A LLM-Hinted Contextual Reinforcement Learning Framework for Autonomous Driving | Zhiwen Chen et.al. | 2505.15793 | null |
| 2025-05-21 | Large Language Models as Computable Approximations to Solomonoff Induction | Jun Wan et.al. | 2505.15784 | null |
| 2025-05-21 | ConvSearch-R1: Enhancing Query Reformulation for Conversational Search with Reasoning via Reinforcement Learning | Changtai Zhu et.al. | 2505.15776 | null |
| 2025-05-21 | Beyond Hard and Soft: Hybrid Context Compression for Balancing Local and Global Information Retention | Huanxuan Liao et.al. | 2505.15774 | null |
| 2025-05-21 | MIKU-PAL: An Automated and Standardized Multi-Modal Method for Speech Paralinguistic and Affect Labeling | Cheng Yifan et.al. | 2505.15772 | null |
| 2025-05-20 | Mind the Gap: Bridging Thought Leap for Improved Chain-of-Thought Tuning | Haolei Xu et.al. | 2505.14684 | null |
| 2025-05-20 | UniGen: Enhanced Training & Test-Time Strategies for Unified Multimodal Understanding and Generation | Rui Tian et.al. | 2505.14682 | null |
| 2025-05-20 | UltraEdit: Training-, Subject-, and Memory-Free Lifelong Editing in Large Language Models | Xiaojie Gu et.al. | 2505.14679 | null |
| 2025-05-20 | Visionary-R1: Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning | Jiaer Xia et.al. | 2505.14677 | null |
| 2025-05-20 | Reward Reasoning Model | Jiaxin Guo et.al. | 2505.14674 | null |
| 2025-05-20 | Quartet: Native FP4 Training Can Be Optimal for Large Language Models | Roberto L. Castro et.al. | 2505.14669 | null |
| 2025-05-20 | ContextAgent: Context-Aware Proactive LLM Agents with Open-World Sensory Perceptions | Bufang Yang et.al. | 2505.14668 | null |
| 2025-05-20 | Beyond Words: Multimodal LLM Knows When to Speak | Zikai Liao et.al. | 2505.14654 | null |
| 2025-05-20 | General-Reasoner: Advancing LLM Reasoning Across All Domains | Xueguang Ma et.al. | 2505.14652 | null |
| 2025-05-20 | Think Only When You Need with Large Hybrid-Reasoning Models | Lingjie Jiang et.al. | 2505.14631 | null |
| 2025-05-19 | CIE: Controlling Language Model Text Generations Using Continuous Signals | Vinay Samuel et.al. | 2505.13448 | link |
| 2025-05-19 | Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards | Xiaoyuan Liu et.al. | 2505.13445 | null |
| 2025-05-19 | Optimizing Anytime Reasoning via Budget Relative Policy Optimization | Penghui Qi et.al. | 2505.13438 | link |
| 2025-05-19 | SMOTExT: SMOTE meets Large Language Models | Mateusz Bystroński et.al. | 2505.13434 | null |
| 2025-05-19 | Fine-tuning Quantized Neural Networks with Zeroth-order Optimization | Sifeng Shang et.al. | 2505.13430 | null |
| 2025-05-19 | Understanding Complexity in VideoQA via Visual Program Generation | Cristobal Eyzaguirre et.al. | 2505.13429 | null |
| 2025-05-19 | MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision | Lingxiao Du et.al. | 2505.13427 | link |
| 2025-05-19 | Learnware of Language Models: Specialized Small Language Models Can Do Big | Zhi-Hao Tan et.al. | 2505.13425 | null |
| 2025-05-19 | Make Still Further Progress: Chain of Thoughts for Tabular Data Leaderboard | Si-Yang Liu et.al. | 2505.13421 | null |
| 2025-05-19 | FEALLM: Advancing Facial Emotion Analysis in Multimodal Large Language Models with Emotional Synergy and Reasoning | Zhuozhao Hu et.al. | 2505.13419 | link |
| 2025-05-16 | Modeling cognitive processes of natural reading with transformer-based Language Models | Bruno Bianchi et.al. | 2505.11485 | null |
| 2025-05-16 | msf-CNN: Patch-based Multi-Stage Fusion with Convolutional Neural Networks for TinyML | Zhaolan Huang et.al. | 2505.11483 | null |
| 2025-05-16 | Improving Assembly Code Performance with Large Language Models via Reinforcement Learning | Anjiang Wei et.al. | 2505.11480 | null |
| 2025-05-16 | HelpSteer3-Preference: Open Human-Annotated Preference Data across Diverse Tasks and Languages | Zhilin Wang et.al. | 2505.11475 | null |
| 2025-05-16 | Disentangling Reasoning and Knowledge in Medical Large Language Models | Rahul Thapa et.al. | 2505.11462 | null |
| 2025-05-16 | ProxyPrompt: Securing System Prompts against Prompt Extraction Attacks | Zhixiong Zhuang et.al. | 2505.11459 | null |
| 2025-05-16 | HumaniBench: A Human-Centric Framework for Large Multimodal Models Evaluation | Shaina Raza et.al. | 2505.11454 | null |
| 2025-05-16 | LLMs unlock new paths to monetizing exploits | Nicholas Carlini et.al. | 2505.11449 | null |
| 2025-05-16 | Is Compression Really Linear with Code Intelligence? | Xianzhen Luo et.al. | 2505.11441 | null |
| 2025-05-16 | GODBench: A Benchmark for Multimodal Large Language Models in Video Comment Art | Chenkai Zhang et.al. | 2505.11436 | null |
| 2025-05-15 | End-to-End Vision Tokenizer Tuning | Wenxuan Wang et.al. | 2505.10562 | null |
| 2025-05-15 | Neural Thermodynamic Laws for Large Language Model Training | Ziming Liu et.al. | 2505.10559 | null |
| 2025-05-15 | MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning | Ke Wang et.al. | 2505.10557 | link |
| 2025-05-15 | Does Feasibility Matter? Understanding the Impact of Feasibility on Synthetic Training Data | Yiwen Liu et.al. | 2505.10551 | link |
| 2025-05-15 | Towards a Deeper Understanding of Reasoning Capabilities in Large Language Models | Annie Wong et.al. | 2505.10543 | link |
| 2025-05-15 | Exploring Implicit Visual Misunderstandings in Multimodal Large Language Models through Attention Analysis | Pengfei Wang et.al. | 2505.10541 | link |
| 2025-05-15 | S3C2 Summit 2024-09: Industry Secure Software Supply Chain Summit | Imranur Rahman et.al. | 2505.10538 | null |
| 2025-05-15 | RouteNator: A Router-Based Multi-Modal Architecture for Generating Synthetic Training Data for Function Calling LLMs | Vibha Belavadi et.al. | 2505.10495 | null |
| 2025-05-15 | Can You Really Trust Code Copilots? Evaluating Large Language Models from a Code Security Perspective | Yutao Mou et.al. | 2505.10494 | link |
| 2025-05-15 | CL-RAG: Bridging the Gap in Retrieval-Augmented Generation with Curriculum Learning | Shaohan Wang et.al. | 2505.10493 | null |
| 2025-05-14 | Customizing a Large Language Model for VHDL Design of High-Performance Microprocessors | Nicolas Dupuis et.al. | 2505.09610 | null |
| 2025-05-14 | Adversarial Suffix Filtering: a Defense Pipeline for LLMs | David Khachaturov et.al. | 2505.09602 | null |
| 2025-05-14 | How Hungry is AI? Benchmarking Energy, Water, and Carbon Footprint of LLM Inference | Nidhal Jegham et.al. | 2505.09598 | null |
| 2025-05-14 | WorldView-Bench: A Benchmark for Evaluating Global Cultural Perspectives in Large Language Models | Abdullah Mushtaq et.al. | 2505.09595 | null |
| 2025-05-14 | Variational Visual Question Answering | Tobias Jan Wieczorek et.al. | 2505.09591 | null |
| 2025-05-14 | Beyond Likes: How Normative Feedback Complements Engagement Signals on Social Media | Yuchen Wu et.al. | 2505.09583 | null |
| 2025-05-14 | Ethics and Persuasion in Reinforcement Learning from Human Feedback: A Procedural Rhetorical Approach | Shannon Lodoen et.al. | 2505.09576 | null |
| 2025-05-14 | MIGRATION-BENCH: Repository-Level Code Migration Benchmark from Java 8 | Linbo Liu et.al. | 2505.09569 | null |
| 2025-05-14 | PT-MoE: An Efficient Finetuning Framework for Integrating Mixture-of-Experts into Prompt Tuning | Zongqian Li et.al. | 2505.09519 | null |
| 2025-05-14 | Layered Unlearning for Adversarial Relearning | Timothy Qian et.al. | 2505.09500 | link |
| 2025-05-13 | CodePDE: An Inference Framework for LLM-driven PDE Solver Generation | Shanda Li et.al. | 2505.08783 | null |
| 2025-05-13 | HealthBench: Evaluating Large Language Models Towards Improved Human Health | Rahul K. Arora et.al. | 2505.08775 | link |
| 2025-05-14 | Towards Autonomous UAV Visual Object Search in City Space: Benchmark and Agentic Methodology | Yatai Ji et.al. | 2505.08765 | null |
| 2025-05-13 | AC-Reason: Towards Theory-Guided Actual Causality Reasoning with Large Language Models | Yanxi Zhang et.al. | 2505.08750 | null |
| 2025-05-13 | DeepMath-Creative: A Benchmark for Evaluating Mathematical Creativity of Large Language Models | Xiaoyang Chen et.al. | 2505.08744 | link |
| 2025-05-13 | Probability Consistency in Large Language Models: Theoretical Foundations Meet Empirical Discrepancies | Xiaoliang Luo et.al. | 2505.08739 | null |
| 2025-05-13 | NurValues: Real-World Nursing Values Evaluation for Large Language Models in Clinical Context | Ben Yao et.al. | 2505.08734 | null |
| 2025-05-13 | Securing RAG: A Risk Assessment and Mitigation Framework | Lukas Ammann et.al. | 2505.08728 | null |
| 2025-05-13 | PWC-MoE: Privacy-Aware Wireless Collaborative Mixture of Experts | Yang Su et.al. | 2505.08719 | null |
| 2025-05-13 | LLM-based Prompt Ensemble for Reliable Medical Entity Recognition from EHRs | K M Sajjadul Islam et.al. | 2505.08704 | null |
| 2025-05-12 | A Comparative Analysis of Static Word Embeddings for Hungarian | Máté Gedeon et.al. | 2505.07809 | null |
| 2025-05-12 | Learning Dynamics in Continual Pre-Training for Large Language Models | Xingjin Wang et.al. | 2505.07796 | null |
| 2025-05-12 | Domain Regeneration: How well do LLMs match syntactic properties of text domains? | Da Ju et.al. | 2505.07784 | null |
| 2025-05-12 | Relative Overfitting and Accept-Reject Framework | Yanxin Liu et.al. | 2505.07783 | null |
| 2025-05-12 | MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering | Rushi Qiang et.al. | 2505.07782 | null |
| 2025-05-12 | Must Read: A Systematic Survey of Computational Persuasion | Nimet Beyza Bozdag et.al. | 2505.07775 | null |
| 2025-05-12 | Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving | Xinji Mai et.al. | 2505.07773 | link |
| 2025-05-12 | Enhancing Code Generation via Bidirectional Comment-Level Mutual Grounding | Yifeng Di et.al. | 2505.07768 | null |
| 2025-05-12 | Assessing the Chemical Intelligence of Large Language Models | Nicholas T. Runcie et.al. | 2505.07735 | null |
| 2025-05-12 | Spoken Language Understanding on Unseen Tasks With In-Context Learning | Neeraj Agrawal et.al. | 2505.07731 | null |
| 2025-05-09 | From Millions of Tweets to Actionable Insights: Leveraging LLMs for User Profiling | Vahid Rahimzadeh et.al. | 2505.06184 | null |
| 2025-05-09 | A Large Language Model-Enhanced Q-learning for Capacitated Vehicle Routing Problem with Time Windows | Linjiang Cao et.al. | 2505.06178 | null |
| 2025-05-09 | MonetGPT: Solving Puzzles Enhances MLLMs’ Image Retouching Skills | Niladri Shekhar Dutt et.al. | 2505.06176 | null |
| 2025-05-09 | Turbo-ICL: In-Context Learning-Based Turbo Equalization | Zihang Song et.al. | 2505.06175 | null |
| 2025-05-09 | A Scaling Law for Token Efficiency in LLM Fine-Tuning Under Fixed Compute Budgets | Ryan Lagasse et.al. | 2505.06150 | null |
| 2025-05-09 | Can Prompting LLMs Unlock Hate Speech Detection across Languages? A Zero-shot and Few-shot Study | Faeze Ghorbanpour et.al. | 2505.06149 | null |
| 2025-05-09 | LLMs Get Lost In Multi-Turn Conversation | Philippe Laban et.al. | 2505.06120 | link |
| 2025-05-09 | Multimodal Sentiment Analysis on CMU-MOSEI Dataset using Transformer-based Models | Jugal Gajjar et.al. | 2505.06110 | null |
| 2025-05-09 | LLMs Outperform Experts on Challenging Biology Benchmarks | Lennart Justen et.al. | 2505.06108 | null |
| 2025-05-09 | Free and Fair Hardware: A Pathway to Copyright Infringement-Free Verilog Generation using LLMs | Sam Bush et.al. | 2505.06096 | null |
| 2025-05-08 | Mogao: An Omni Foundation Model for Interleaved Multi-Modal Generation | Chao Liao et.al. | 2505.05472 | null |
| 2025-05-08 | Flow-GRPO: Training Flow Matching Models via Online RL | Jie Liu et.al. | 2505.05470 | link |
| 2025-05-08 | Generating Physically Stable and Buildable LEGO Designs from Text | Ava Pun et.al. | 2505.05469 | link |
| 2025-05-08 | StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant | Haibo Wang et.al. | 2505.05467 | null |
| 2025-05-08 | ComPO: Preference Alignment via Comparison Oracles | Peter Chen et.al. | 2505.05465 | null |
| 2025-05-08 | Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging | Shiqi Chen et.al. | 2505.05464 | link |
| 2025-05-08 | UKElectionNarratives: A Dataset of Misleading Narratives Surrounding Recent UK General Elections | Fatima Haouari et.al. | 2505.05459 | null |
| 2025-05-08 | SITE: towards Spatial Intelligence Thorough Evaluation | Wenqi Wang et.al. | 2505.05456 | null |
| 2025-05-08 | Conversational Process Model Redesign | Nataliia Klievtsova et.al. | 2505.05453 | null |
| 2025-05-08 | clem:todd: A Framework for the Systematic Benchmarking of LLM-Based Task-Oriented Dialogue System Realisations | Chalamalasetti Kranti et.al. | 2505.05445 | null |
| 2025-05-07 | EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning | Zhenghao Xing et.al. | 2505.04623 | null |
| 2025-05-07 | On Path to Multimodal Generalist: General-Level and General-Bench | Hao Fei et.al. | 2505.04620 | link |
| 2025-05-07 | OmniGIRL: A Multilingual and Multimodal Benchmark for GitHub Issue Resolution | Lianghong Guo et.al. | 2505.04606 | null |
| 2025-05-08 | MonoCoP: Chain-of-Prediction for Monocular 3D Object Detection | Zhihao Zhang et.al. | 2505.04594 | null |
| 2025-05-07 | ZeroSearch: Incentivize the Search Capability of LLMs without Searching | Hao Sun et.al. | 2505.04588 | link |
| 2025-05-07 | SlideItRight: Using AI to Find Relevant Slides and Provide Feedback for Open-Ended Questions | Chloe Qianhui Zhao et.al. | 2505.04584 | null |
| 2025-05-07 | Fight Fire with Fire: Defending Against Malicious RL Fine-Tuning via Reward Neutralization | Wenjun Cao et.al. | 2505.04578 | null |
| 2025-05-07 | Comparative Analysis of Carbon Footprint in Manual vs. LLM-Assisted Code Development | Kuen Sum Cheung et.al. | 2505.04521 | null |
| 2025-05-07 | Pangu Ultra MoE: How to Train Your Big MoE on Ascend NPUs | Yehui Tang et.al. | 2505.04519 | null |
| 2025-05-07 | CAD-Llama: Leveraging Large Language Models for Computer-Aided Design Parametric 3D Model Generation | Jiahao Li et.al. | 2505.04481 | null |
| 2025-05-06 | VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model | Zuwei Long et.al. | 2505.03739 | link |
| 2025-05-06 | Graph Drawing for LLMs: An Empirical Evaluation | Walter Didimo et.al. | 2505.03678 | null |
| 2025-05-06 | Binding threshold units with artificial oscillatory neurons | Vladimir Fanaskov et.al. | 2505.03648 | null |
| 2025-05-06 | PhysLLM: Harnessing Large Language Models for Cross-Modal Remote Physiological Sensing | Yiping Xie et.al. | 2505.03621 | null |
| 2025-05-06 | A Unifying Bias-aware Multidisciplinary Framework for Investigating Socio-Technical Issues | Sacha Hasan et.al. | 2505.03593 | null |
| 2025-05-06 | BCause: Human-AI collaboration to improve hybrid mapping and ideation in argumentation-grounded deliberation | Lucas Anastasiou et.al. | 2505.03584 | null |
| 2025-05-06 | DyGEnc: Encoding a Sequence of Textual Scene Graphs to Reason and Answer Questions in Dynamic Scenes | Sergey Linok et.al. | 2505.03581 | link |
| 2025-05-06 | LlamaFirewall: An open source guardrail system for building secure AI agents | Sahana Chennabasappa et.al. | 2505.03574 | null |
| 2025-05-06 | Say It Another Way: A Framework for User-Grounded Paraphrasing | Cléa Chataigner et.al. | 2505.03563 | null |
| 2025-05-06 | A Comprehensive Survey of Large AI Models for Future Communications: Foundations, Applications and Challenges | Feibo Jiang et.al. | 2505.03556 | null |
| 2025-05-05 | Scenethesis: A Language and Vision Agentic Framework for 3D Scene Generation | Lu Ling et.al. | 2505.02836 | null |
| 2025-05-05 | R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning | Yi-Fan Zhang et.al. | 2505.02835 | link |
| 2025-05-05 | ReplaceMe: Network Simplification via Layer Pruning and Linear Transformations | Dmitriy Shopkhoev et.al. | 2505.02819 | link |
| 2025-05-05 | Towards Quantifying the Hessian Structure of Neural Networks | Zhaorui Dong et.al. | 2505.02809 | null |
| 2025-05-05 | Generating HomeAssistant Automations Using an LLM-based Chatbot | Mathyas Giudici et.al. | 2505.02802 | null |
| 2025-05-05 | HSplitLoRA: A Heterogeneous Split Parameter-Efficient Fine-Tuning Framework for Large Language Models | Zheng Lin et.al. | 2505.02795 | null |
| 2025-05-05 | Beyond the Monitor: Mixed Reality Visualization and AI for Enhanced Digital Pathology Workflow | Jai Prakash Veerla et.al. | 2505.02780 | null |
| 2025-05-05 | Giving Simulated Cells a Voice: Evolving Prompt-to-Intervention Models for Cellular Control | Nam H. Le et.al. | 2505.02766 | null |
| 2025-05-05 | Bye-bye, Bluebook? Automating Legal Procedure with Large Language Models | Matthew Dahl et.al. | 2505.02763 | null |
| 2025-05-05 | Knowledge Graphs for Enhancing Large Language Models in Entity Disambiguation | Pons Gerard et.al. | 2505.02737 | null |
| 2025-05-02 | Helping Big Language Models Protect Themselves: An Enhanced Filtering and Summarization System | Sheikh Samit Muhaimin et.al. | 2505.01315 | null |
| 2025-05-02 | Enhancing SPARQL Query Rewriting for Complex Ontology Alignments | Anicet Lepetit Ondo et.al. | 2505.01309 | null |
| 2025-05-02 | Document Retrieval Augmented Fine-Tuning (DRAFT) for safety-critical software assessments | Regan Bolton et.al. | 2505.01307 | null |
| 2025-05-02 | FlowDubber: Movie Dubbing with LLM-based Semantic-aware Learning and Flow Matching based Voice Enhancing | Gaoxiang Cong et.al. | 2505.01263 | null |
| 2025-05-02 | Digital Pathway Curation (DPC): a comparative pipeline to assess the reproducibility, consensus and accuracy across Gemini, PubMed, and scientific reviewers in biomedical research | Flavio Lichtenstein et.al. | 2505.01259 | null |
| 2025-05-02 | CaReAQA: A Cardiac and Respiratory Audio Question Answering Model for Open-Ended Diagnostic Reasoning | Tsai-Ning Wang et.al. | 2505.01199 | null |
| 2025-05-02 | LLM Security: Vulnerabilities, Attacks, Defenses, and Countermeasures | Francisco Aguilera-Martínez et.al. | 2505.01177 | null |
| 2025-05-02 | Methodological Foundations for AI-Driven Survey Question Generation | Ted K. Mburu et.al. | 2505.01150 | null |
| 2025-05-02 | Retrieval-Augmented Generation in Biomedicine: A Survey of Technologies, Datasets, and Clinical Applications | Jiawei He et.al. | 2505.01146 | null |
| 2025-05-02 | MateICL: Mitigating Attention Dispersion in Large-Scale In-Context Learning | Murtadha Ahmed et.al. | 2505.01110 | null |
| 2025-05-01 | T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT | Dongzhi Jiang et.al. | 2505.00703 | link |
| 2025-05-01 | Steering Large Language Models with Register Analysis for Arbitrary Style Transfer | Xinchen Yang et.al. | 2505.00679 | null |
| 2025-05-01 | Rethinking Memory in AI: Taxonomy, Operations, Topics, and Future Directions | Yiming Du et.al. | 2505.00675 | link |
| 2025-05-01 | DeepCritic: Deliberate Critique with Large Language Models | Wenkai Yang et.al. | 2505.00662 | link |
| 2025-05-01 | On the generalization of language models from in-context learning and finetuning: a controlled study | Andrew K. Lampinen et.al. | 2505.00661 | null |
| 2025-05-01 | Large Language Models Understanding: an Inherent Ambiguity Barrier | Daniel N. Nissani et.al. | 2505.00654 | null |
| 2025-05-01 | Open-Source LLM-Driven Federated Transformer for Predictive IoV Management | Yazan Otoum et.al. | 2505.00651 | null |
| 2025-05-01 | Investigating Task Arithmetic for Zero-Shot Information Retrieval | Marco Braga et.al. | 2505.00649 | null |
| 2025-05-01 | The Illusion of Role Separation: Hidden Shortcuts in LLM Role Learning (and How to Fix Them) | Zihao Wang et.al. | 2505.00626 | null |
| 2025-05-01 | FineScope : Precision Pruning for Domain-Specialized Large Language Models Using SAE-Guided Self-Data Cultivation | Chaitali Bhattacharyya et.al. | 2505.00624 | null |
| 2025-04-30 | TRUST: An LLM-Based Dialogue System for Trauma Understanding and Structured Assessments | Sichang Tu et.al. | 2504.21851 | null |
| 2025-04-30 | COMPACT: COMPositional Atomic-to-Complex Visual Capability Tuning | Xindi Wu et.al. | 2504.21850 | link |
| 2025-04-30 | An Empirical Study on the Effectiveness of Large Language Models for Binary Code Understanding | Xiuwei Shang et.al. | 2504.21803 | null |
| 2025-04-30 | DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition | Z. Z. Ren et.al. | 2504.21801 | link |
| 2025-04-30 | MAC-Tuning: LLM Multi-Compositional Problem Reasoning with Enhanced Knowledge Boundary Awareness | Junsheng Huang et.al. | 2504.21773 | null |
| 2025-04-30 | LASHED: LLMs And Static Hardware Analysis for Early Detection of RTL Bugs | Baleegh Ahmad et.al. | 2504.21770 | null |
| 2025-04-30 | LLM-based Interactive Imitation Learning for Robotic Manipulation | Jonas Werner et.al. | 2504.21769 | null |
| 2025-04-30 | Investigating Literary Motifs in Ancient and Medieval Novels with Large Language Models | Emelie Hallenberg et.al. | 2504.21742 | null |
| 2025-04-30 | TheraQuest: A Gamified, LLM-Powered Simulation for Massage Therapy Training | Shengqian Wang et.al. | 2504.21735 | null |
| 2025-04-30 | XBreaking: Explainable Artificial Intelligence for Jailbreaking LLMs | Marco Arazzi et.al. | 2504.21700 | null |
| 2025-04-29 | YoChameleon: Personalized Vision and Language Generation | Thao Nguyen et.al. | 2504.20998 | link |
| 2025-04-29 | Toward Efficient Exploration by Large Language Model Agents | Dilip Arumugam et.al. | 2504.20997 | null |
| 2025-04-29 | X-Fusion: Introducing New Modality to Frozen Large Language Models | Sicheng Mo et.al. | 2504.20996 | null |
| 2025-04-29 | ACE: A Security Architecture for LLM-Integrated App Systems | Evan Li et.al. | 2504.20984 | null |
| 2025-04-29 | Real-Time Wayfinding Assistant for Blind and Low-Vision Users | Dabbrata Das et.al. | 2504.20976 | null |
| 2025-04-29 | SetKE: Knowledge Editing for Knowledge Elements Overlap | Yifan Wei et.al. | 2504.20972 | null |
| 2025-04-29 | OSVBench: Benchmarking LLMs on Specification Generation Tasks for Operating System Verification | Shangyu Li et.al. | 2504.20964 | null |
| 2025-04-29 | Information Gravity: A Field-Theoretic Model for Token Selection in Large Language Models | Maryna Vyshnyvetska et.al. | 2504.20951 | null |
| 2025-04-29 | Trace-of-Thought: Enhanced Arithmetic Problem Solving via Reasoning Distillation From Large to Small Language Models | Tyler McDonald et.al. | 2504.20946 | null |
| 2025-04-29 | ChestX-Reasoner: Advancing Radiology Foundation Models with Reasoning through Step-by-Step Verification | Ziqing Fan et.al. | 2504.20930 | link |
| 2025-04-28 | AutoJudge: Judge Decoding Without Manual Annotation | Roman Garipov et.al. | 2504.20039 | null |
| 2025-04-28 | SpatialReasoner: Towards Explicit and Generalizable 3D Spatial Reasoning | Wufei Ma et.al. | 2504.20024 | null |
| 2025-04-28 | Better To Ask in English? Evaluating Factual Accuracy of Multilingual LLMs in English and Low-Resource Languages | Pritika Rohera et.al. | 2504.20022 | null |
| 2025-04-28 | Modular Machine Learning: An Indispensable Path towards New-Generation Large Language Models | Xin Wang et.al. | 2504.20020 | null |
| 2025-04-28 | LLM-Generated Fake News Induces Truth Decay in News Ecosystem: A Case Study on Neural News Recommendation | Beizhe Hu et.al. | 2504.20013 | null |
| 2025-04-28 | Towards Automated Scoping of AI for Social Good Projects | Jacob Emmerson et.al. | 2504.20010 | null |
| 2025-04-28 | Knowledge Distillation of Domain-adapted LLMs for Question-Answering in Telecom | Rishika Sen et.al. | 2504.20000 | null |
| 2025-04-28 | TD-EVAL: Revisiting Task-Oriented Dialogue Evaluation by Combining Turn-Level Precision with Dialogue-Level Comparisons | Emre Can Acikgoz et.al. | 2504.19982 | null |
| 2025-04-28 | Accurate and Diverse LLM Mathematical Reasoning via Automated PRM-Guided GFlowNets | Adam Younsi et.al. | 2504.19981 | null |
| 2025-04-29 | From Concept to Practice: an Automated LLM-aided UVM Machine for RTL Verification | Junhao Ye et.al. | 2504.19959 | null |
| 2025-04-25 | TRACE Back from the Future: A Probabilistic Reasoning Approach to Controllable Language Generation | Gwen Yidou Weng et.al. | 2504.18535 | link |
| 2025-04-25 | Eval3D: Interpretable and Fine-grained Evaluation for 3D Generation | Shivam Duggal et.al. | 2504.18509 | null |
| 2025-04-25 | TopSpace: spatial topic modeling for unsupervised discovery of multicellular spatial tissue structures in multiplex imaging | Junsouk Choi et.al. | 2504.18495 | null |
| 2025-04-25 | Investigating Co-Constructive Behavior of Large Language Models in Explanation Dialogues | Leandra Fichtel et.al. | 2504.18483 | null |
| 2025-04-25 | Generative Induction of Dialogue Task Schemas with Streaming Refinement and Simulated Interactions | James D. Finch et.al. | 2504.18474 | null |
| 2025-04-25 | Reason Like a Radiologist: Chain-of-Thought and Reinforcement Learning for Verifiable Report Generation | Peiyuan Jing et.al. | 2504.18453 | null |
| 2025-04-25 | LLMpatronous: Harnessing the Power of LLMs For Vulnerability Detection | Rajesh Yarra et.al. | 2504.18423 | null |
| 2025-04-25 | BitNet v2: Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs | Hongyu Wang et.al. | 2504.18415 | null |
| 2025-04-25 | An Empirical Study of Evaluating Long-form Question Answering | Ning Xian et.al. | 2504.18413 | null |
| 2025-04-25 | Expressing stigma and inappropriate responses prevents LLMs from safely replacing mental health providers | Jared Moore et.al. | 2504.18412 | link |
| 2025-04-24 | Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models | Xu Ma et.al. | 2504.17789 | null |
| 2025-04-24 | Replay to Remember: Retaining Domain Knowledge in Streaming Language Models | Sneh Pillai et.al. | 2504.17780 | null |
| 2025-04-24 | Conversational Assistants to support Heart Failure Patients: comparing a Neurosymbolic Architecture with ChatGPT | Anuja Tayal et.al. | 2504.17753 | null |
| 2025-04-24 | Towards Robust LLMs: an Adversarial Robustness Measurement Framework | Natan Levy et.al. | 2504.17723 | null |
| 2025-04-24 | Multilingual Performance Biases of Large Language Models in Education | Vansh Gupta et.al. | 2504.17720 | null |
| 2025-04-24 | Ensemble Bayesian Inference: Leveraging Small Language Models to Achieve LLM-level Accuracy in Profile Matching Tasks | Haru-Tada Sato et.al. | 2504.17685 | null |
| 2025-04-24 | INSIGHT: Bridging the Student-Teacher Gap in Times of Large Language Models | Jarne Thys et.al. | 2504.17677 | null |
| 2025-04-24 | Energy Considerations of Large Language Model Inference and Efficiency Optimizations | Jared Fernandez et.al. | 2504.17674 | null |
| 2025-04-24 | Cross-region Model Training with Communication-Computation Overlapping and Delay Compensation | Ying Zhu et.al. | 2504.17672 | null |
| 2025-04-24 | Data-Driven Calibration of Prediction Sets in Large Vision-Language Models Based on Inductive Conformal Prediction | Yuanchang Ye et.al. | 2504.17671 | null |
| 2025-04-23 | IberBench: LLM Evaluation on Iberian Languages | José Ángel González et.al. | 2504.16921 | link |
| 2025-04-23 | Do Large Language Models know who did what to whom? | Joseph M. Denning et.al. | 2504.16884 | null |
| 2025-04-23 | Enhancing Critical Thinking with AI: A Tailored Warning System for RAG Models | Xuyang Zhu et.al. | 2504.16883 | null |
| 2025-04-23 | Context-Enhanced Vulnerability Detection Based on Large Language Model | Yixin Yang et.al. | 2504.16877 | null |
| 2025-04-23 | Exploring How LLMs Capture and Represent Domain-Specific Knowledge | Mirian Hipolito Garcia et.al. | 2504.16871 | null |
| 2025-04-23 | Emo Pillars: Knowledge Distillation to Support Fine-Grained Context-Aware and Context-Less Emotion Classification | Alexander Shvets et.al. | 2504.16856 | link |
| 2025-04-23 | Monte Carlo Planning with Large Language Model for Text-Based Game Agents | Zijing Shi et.al. | 2504.16855 | null |
| 2025-04-23 | Improving Significant Wave Height Prediction Using Chronos Models | Yilin Zhai et.al. | 2504.16834 | null |
| 2025-04-23 | LRASGen: LLM-based RESTful API Specification Generation | Sida Deng et.al. | 2504.16833 | null |
| 2025-04-23 | GreenMind: A Next-Generation Vietnamese Large Language Model for Structured and Logical Reasoning | Luu Quy Tung et.al. | 2504.16832 | null |
| 2025-04-22 | TTRL: Test-Time Reinforcement Learning | Yuxin Zuo et.al. | 2504.16084 | link |
| 2025-04-22 | From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning | Le Zhuo et.al. | 2504.16080 | link |
| 2025-04-22 | LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities | Thomas Schmied et.al. | 2504.16078 | null |
| 2025-04-22 | PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models | Shi Qiu et.al. | 2504.16074 | link |
| 2025-04-22 | A Python Tool for Reconstructing Full News Text from GDELT | A. Fronzetti Colladon et.al. | 2504.16063 | null |
| 2025-04-22 | Vision language models are unreliable at trivial spatial cognition | Sangeet Khemlani et.al. | 2504.16061 | null |
| 2025-04-22 | Automated Static Vulnerability Detection via a Holistic Neuro-symbolic Approach | Penghui Li et.al. | 2504.16057 | null |
| 2025-04-22 | Honey, I Shrunk the Language Model: Impact of Knowledge Distillation Methods on Performance and Explainability | Daniel Hendriks et.al. | 2504.16056 | null |
| 2025-04-22 | Certified Mitigation of Worst-Case LLM Copyright Infringement | Jingyu Zhang et.al. | 2504.16046 | null |
| 2025-04-22 | LLMs meet Federated Learning for Scalable and Secure IoT Management | Yazan Otoum et.al. | 2504.16032 | null |
| 2025-04-21 | Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs | Chun-Hsiao Yeh et.al. | 2504.15280 | link |
| 2025-04-21 | VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models | Weiye Xu et.al. | 2504.15279 | link |
| 2025-04-21 | Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning | Jie Cheng et.al. | 2504.15275 | link |
| 2025-04-21 | Interpretable Locomotion Prediction in Construction Using a Memory-Driven LLM Agent With Chain-of-Thought Reasoning | Ehsan Ahmadi et.al. | 2504.15263 | null |
| 2025-04-21 | CRUST-Bench: A Comprehensive Benchmark for C-to-safe-Rust Transpilation | Anirudh Khatry et.al. | 2504.15254 | link |
| 2025-04-21 | Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators | Yilun Zhou et.al. | 2504.15253 | link |
| 2025-04-21 | MR. Guard: Multilingual Reasoning Guardrail using Curriculum Learning | Yahan Yang et.al. | 2504.15241 | null |
| 2025-04-21 | Fully Bayesian Approaches to Topics over Time | Julián Cendrero et.al. | 2504.15220 | null |
| 2025-04-21 | EvalAgent: Discovering Implicit Evaluation Criteria from the Web | Manya Wadhwa et.al. | 2504.15219 | null |
| 2025-04-21 | Integrating Symbolic Execution into the Fine-Tuning of Code-Generating LLMs | Marina Sakharova et.al. | 2504.15210 | null |
| 2025-04-18 | Generative AI Act II: Test Time Scaling Drives Cognition Engineering | Shijie Xia et.al. | 2504.13828 | link |
| 2025-04-18 | Feature Alignment and Representation Transfer in Knowledge Distillation for Large Language Models | Junjie Yang et.al. | 2504.13825 | null |
| 2025-04-18 | Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning | Yixuan Even Xu et.al. | 2504.13818 | null |
| 2025-04-18 | BadApex: Backdoor Attack Based on Adaptive Optimization Mechanism of Black-box Large Language Models | Zhengxian Wu et.al. | 2504.13775 | null |
| 2025-04-18 | DP2Unlearning: An Efficient and Guaranteed Unlearning Framework for LLMs | Tamim Al Mahmud et.al. | 2504.13774 | null |
| 2025-04-18 | Detecting Malicious Source Code in PyPI Packages with LLMs: Does RAG Come in Handy? | Motunrayo Ibiyo et.al. | 2504.13769 | null |
| 2025-04-18 | Scaling sparse feature circuit finding for in-context learning | Dmitrii Kharlapenko et.al. | 2504.13756 | null |
| 2025-04-18 | Controlled Territory and Conflict Tracking (CONTACT): (Geo-)Mapping Occupied Territory from Open Source Intelligence | Paul K. Mandal et.al. | 2504.13730 | null |
| 2025-04-18 | OpenDeception: Benchmarking and Investigating AI Deceptive Behaviors via Open-ended Interaction Simulation | Yichen Wu et.al. | 2504.13707 | null |
| 2025-04-18 | Exploring Multimodal Prompt for Visualization Authoring with Large Language Models | Zhen Wen et.al. | 2504.13700 | null |
| 2025-04-17 | SemCORE: A Semantic-Enhanced Generative Cross-Modal Retrieval Framework with MLLMs | Haoxuan Li et.al. | 2504.13172 | null |
| 2025-04-17 | Sleep-time Compute: Beyond Inference Scaling at Test-time | Kevin Lin et.al. | 2504.13171 | link |
| 2025-04-17 | Exploring Expert Failures Improves LLM Agent Tuning | Li-Cheng Lan et.al. | 2504.13145 | null |
| 2025-04-17 | Syntactic and Semantic Control of Large Language Models via Sequential Monte Carlo | João Loula et.al. | 2504.13139 | null |
| 2025-04-17 | Energy-Based Reward Models for Robust Language Model Alignment | Anamika Lochab et.al. | 2504.13134 | null |
| 2025-04-17 | LLMs Meet Finance: Fine-Tuning Foundation Models for the Open FinLLM Leaderboard | Varun Rao et.al. | 2504.13125 | null |
| 2025-04-17 | Low-hallucination Synthetic Captions for Large-Scale Vision-Language Model Pre-training | Xinsong Zhang et.al. | 2504.13123 | null |
| 2025-04-17 | VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models | Haojian Huang et.al. | 2504.13122 | link |
| 2025-04-17 | Hadamard product in deep learning: Introduction, Advances and Challenges | Grigorios G Chrysos et.al. | 2504.13112 | null |
| 2025-04-17 | Uncertainty-Aware Trajectory Prediction via Rule-Regularized Heteroscedastic Deep Classification | Kumar Manas et.al. | 2504.13111 | null |
| 2025-04-16 | BitNet b1.58 2B4T Technical Report | Shuming Ma et.al. | 2504.12285 | null |
| 2025-04-16 | HLS-Eval: A Benchmark and Framework for Evaluating LLMs on High-Level Synthesis Design Tasks | Stefan Abi-Karam et.al. | 2504.12268 | null |
| 2025-04-16 | FLIP Reasoning Challenge | Andreas Plesner et.al. | 2504.12256 | link |
| 2025-04-16 | AnomalyGen: An Automated Semantic Log Sequence Generation Framework with LLM for Anomaly Detection | Xinyu Li et.al. | 2504.12250 | null |
| 2025-04-16 | MOS: Towards Effective Smart Contract Vulnerability Detection through Mixture-of-Experts Tuning of Large Language Models | Hang Yuan et.al. | 2504.12234 | null |
| 2025-04-16 | Watermarking Needs Input Repetition Masking | David Khachaturov et.al. | 2504.12229 | null |
| 2025-04-16 | d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning | Siyan Zhao et.al. | 2504.12216 | link |
| 2025-04-16 | What Do Large Language Models Know? Tacit Knowledge as a Potential Causal-Explanatory Structure | Céline Budding et.al. | 2504.12187 | null |
| 2025-04-16 | SALAD: Improving Robustness and Generalization through Contrastive Learning with Structure-Aware and LLM-Driven Augmented Data | Suyoung Bae et.al. | 2504.12185 | null |
| 2025-04-16 | Trusting CHATGPT: how minor tweaks in the prompts lead to major differences in sentiment classification | Jaime E. Cuellar et.al. | 2504.12180 | null |
| 2025-04-15 | TextArena | Leon Guertler et.al. | 2504.11442 | null |
| 2025-04-15 | TADACap: Time-series Adaptive Domain-Aware Captioning | Elizabeth Fons et.al. | 2504.11441 | null |
| 2025-04-15 | Masculine Defaults via Gendered Discourse in Podcasts and Large Language Models | Maria Teleki et.al. | 2504.11431 | null |
| 2025-04-15 | A Dual-Space Framework for General Knowledge Distillation of Large Language Models | Xue Zhang et.al. | 2504.11426 | null |
| 2025-04-15 | Reinforcing Compositional Retrieval: Retrieving Step-by-Step for Composing Informative Contexts | Quanyu Long et.al. | 2504.11420 | null |
| 2025-04-15 | DataDecide: How to Predict Best Pretraining Data with Small Experiments | Ian Magnusson et.al. | 2504.11393 | null |
| 2025-04-15 | RankAlign: A Ranking View of the Generator-Validator Gap in Large Language Models | Juan Diego Rodriguez et.al. | 2504.11381 | null |
| 2025-04-15 | Cancer-Myth: Evaluating AI Chatbot on Patient Questions with False Presuppositions | Wang Bill Zhu et.al. | 2504.11373 | null |
| 2025-04-15 | OpenTuringBench: An Open-Model-based Benchmark and Framework for Machine-Generated Text Detection and Attribution | Lucio La Cava et.al. | 2504.11369 | null |
| 2025-04-15 | Teaching Large Language Models to Reason through Learning and Forgetting | Tianwei Ni et.al. | 2504.11364 | null |
| 2025-04-14 | InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models | Jinguo Zhu et.al. | 2504.10479 | null |
| 2025-04-14 | MIEB: Massive Image Embedding Benchmark | Chenghao Xiao et.al. | 2504.10471 | null |
| 2025-04-14 | Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding | Tao Zhang et.al. | 2504.10465 | null |
| 2025-04-14 | The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer | Weixian Lei et.al. | 2504.10462 | null |
| 2025-04-14 | GUI-R1: A Generalist R1-Style Vision-Language Action Model For GUI Agents | Xiaobo Xia et.al. | 2504.10458 | null |
| 2025-04-14 | M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models | Junxiong Wang et.al. | 2504.10449 | null |
| 2025-04-14 | Multimodal Long Video Modeling Based on Temporal Dynamic Context | Haoran Hao et.al. | 2504.10443 | null |
| 2025-04-14 | LLM Can be a Dangerous Persuader: Empirical Study of Persuasion Safety in Large Language Models | Minqian Liu et.al. | 2504.10430 | null |
| 2025-04-14 | Can We Edit LLMs for Long-Tail Biomedical Knowledge? | Xinhao Yi et.al. | 2504.10421 | null |
| 2025-04-14 | Unchecked and Overlooked: Addressing the Checkbox Blind Spot in Large Language Models with CheckboxQA | Michał Turski et.al. | 2504.10419 | null |
| 2025-04-11 | Quantum Large Language Model Fine-Tuning | Sang Hyub Kim et.al. | 2504.08732 | null |
| 2025-04-11 | DocAgent: A Multi-Agent System for Automated Code Documentation Generation | Dayu Yang et.al. | 2504.08725 | null |
| 2025-04-11 | Hypergraph Vision Transformers: Images are More than Nodes, More than Edges | Joshua Fixelle et.al. | 2504.08710 | null |
| 2025-04-11 | SWE-PolyBench: A multi-language benchmark for repository level evaluation of coding agents | Muhammad Shihab Rashid et.al. | 2504.08703 | null |
| 2025-04-11 | Large Language Models as Span Annotators | Zdeněk Kasner et.al. | 2504.08697 | null |
| 2025-04-11 | TP-RAG: Benchmarking Retrieval-Augmented Large Language Model Agents for Spatiotemporal-Aware Travel Planning | Hang Ni et.al. | 2504.08694 | null |
| 2025-04-11 | Fast-Slow-Thinking: Complex Task Solving with Large Language Models | Yiliu Sun et.al. | 2504.08690 | null |
| 2025-04-11 | Voice Interaction With Conversational AI Could Facilitate Thoughtful Reflection and Substantive Revision in Writing | Jiho Kim et.al. | 2504.08687 | null |
| 2025-04-11 | Variability-Driven User-Story Generation using LLM and Triadic Concept Analysis | Alexandre Bazin et.al. | 2504.08666 | null |
| 2025-04-11 | Quality evaluation of Tabby coding assistant using real source code snippets | Marta Borek et.al. | 2504.08650 | null |
| 2025-04-10 | C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing | Zhongyang Li et.al. | 2504.07964 | link |
| 2025-04-10 | GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation | Lang Lin et.al. | 2504.07962 | null |
| 2025-04-10 | MM-IFEngine: Towards Multimodal Instruction Following | Shengyuan Ding et.al. | 2504.07957 | link |
| 2025-04-10 | VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning | Yukun Qi et.al. | 2504.07956 | null |
| 2025-04-10 | Beyond the Frame: Generating 360° Panoramic Videos from Perspective Videos | Rundong Luo et.al. | 2504.07940 | null |
| 2025-04-10 | Porting an LLM based Application from ChatGPT to an On-Premise Environment | Teemu Paloniemi et.al. | 2504.07907 | null |
| 2025-04-10 | Redefining Machine Translation on Social Network Services with Large Language Models | Hongcheng Guo et.al. | 2504.07901 | null |
| 2025-04-10 | How do Large Language Models Understand Relevance? A Mechanistic Interpretability Perspective | Qi Liu et.al. | 2504.07898 | null |
| 2025-04-10 | Benchmarking Adversarial Robustness to Bias Elicitation in Large Language Models: Scalable Automated Assessment with LLM-as-a-Judge | Riccardo Cantini et.al. | 2504.07887 | link |
| 2025-04-10 | Token Level Routing Inference System for Edge Devices | Jianshu She et.al. | 2504.07878 | null |
| 2025-04-09 | Sculpting Subspaces: Constrained Full Fine-Tuning in LLMs for Continual Learning | Nikhil Shivakumar Nayak et.al. | 2504.07097 | null |
| 2025-04-09 | KG-LLM-Bench: A Scalable Benchmark for Evaluating LLM Reasoning on Textualized Knowledge Graphs | Elan Markowitz et.al. | 2504.07087 | null |
| 2025-04-09 | DeduCE: Deductive Consistency as a Framework to Evaluate LLM Reasoning | Atharva Pandey et.al. | 2504.07080 | null |
| 2025-04-09 | A Survey on Personalized and Pluralistic Preference Alignment in Large Language Models | Zhouhang Xie et.al. | 2504.07070 | null |
| 2025-04-09 | HalluciNot: Hallucination Detection Through Context and Common Knowledge Verification | Bibek Paudel et.al. | 2504.07069 | null |
| 2025-04-09 | TASTE: Text-Aligned Speech Tokenization and Embedding for Spoken Language Modeling | Liang-Hsuan Tseng et.al. | 2504.07053 | null |
| 2025-04-09 | To Backtrack or Not to Backtrack: When Sequential Search Limits Model Reasoning | Tian Qin et.al. | 2504.07052 | null |
| 2025-04-09 | Evaluating Retrieval Augmented Generative Models for Document Queries in Transportation Safety | Chad Melton et.al. | 2504.07022 | null |
| 2025-04-09 | LLM-IFT: LLM-Powered Information Flow Tracking for Secure Hardware | Nowfel Mashnoor et.al. | 2504.07015 | null |
| 2025-04-09 | Towards LLMs Robustness to Changes in Prompt Format Styles | Lilian Ngweta et.al. | 2504.06969 | null |
| 2025-04-08 | GOLLuM: Gaussian Process Optimized LLMs – Reframing LLM Finetuning through Bayesian Optimization | Bojana Ranković et.al. | 2504.06265 | null |
| 2025-04-08 | Hogwild! Inference: Parallel LLM Generation via Concurrent Attention | Gleb Rodionov et.al. | 2504.06261 | null |
| 2025-04-08 | FEABench: Evaluating Language Models on Multiphysics Reasoning Ability | Nayantara Mudur et.al. | 2504.06260 | null |
| 2025-04-08 | Transfer between Modalities with MetaQueries | Xichen Pan et.al. | 2504.06256 | null |
| 2025-04-08 | LExT: Towards Evaluating Trustworthiness of Natural Language Explanations | Krithi Shailya et.al. | 2504.06227 | null |
| 2025-04-08 | Encoder-Decoder Gemma: Improving the Quality-Efficiency Trade-Off via Adaptation | Biao Zhang et.al. | 2504.06225 | null |
| 2025-04-08 | Can Performant LLMs Be Ethical? Quantifying the Impact of Web Crawling Opt-Outs | Dongyang Fan et.al. | 2504.06219 | null |
| 2025-04-08 | From 128K to 4M: Efficient Training of Ultra-Long Context Large Language Models | Chejian Xu et.al. | 2504.06214 | null |
| 2025-04-08 | TxGemma: Efficient and Agentic LLMs for Therapeutics | Eric Wang et.al. | 2504.06196 | null |
| 2025-04-08 | Assessing how hyperparameters impact Large Language Models’ sarcasm detection performance | Montgomery Gole et.al. | 2504.06166 | null |
| 2025-04-07 | URECA: Unique Region Caption Anything | Sangbeom Lim et.al. | 2504.05305 | null |
| 2025-04-07 | Truthful or Fabricated? Using Causal Attribution to Mitigate Reward Hacking in Explanations | Pedro Ferreira et.al. | 2504.05294 | null |
| 2025-04-07 | The challenge of uncertainty quantification of large language models in medicine | Zahra Atf et.al. | 2504.05278 | null |
| 2025-04-07 | Enhancing LLM-Based Short Answer Grading with Retrieval-Augmented Generation | Yucheng Chu et.al. | 2504.05276 | null |
| 2025-04-07 | Do PhD-level LLMs Truly Grasp Elementary Addition? Probing Rule Learning vs. Memorization in Large Language Models | Yang Yan et.al. | 2504.05262 | null |
| 2025-04-07 | Learning to Reason Over Time: Timeline Self-Reflection for Improved Temporal Reasoning in Language Models | Adrián Bazaga et.al. | 2504.05258 | null |
| 2025-04-07 | Explaining Low Perception Model Competency with High-Competency Counterfactuals | Sara Pohland et.al. | 2504.05254 | null |
| 2025-04-07 | LLM-based Automated Grading with Human-in-the-Loop | Hang Li et.al. | 2504.05239 | null |
| 2025-04-08 | Leveraging LLMs for Utility-Focused Annotation: Reducing Manual Effort for Retrieval and RAG | Hengran Zhang et.al. | 2504.05220 | null |
| 2025-04-07 | Unleashing the Power of LLMs in Dense Retrieval with Query Likelihood Modeling | Hengran Zhang et.al. | 2504.05216 | null |
| 2025-04-04 | Do Larger Language Models Imply Better Reasoning? A Pretraining Scaling Law for Reasoning | Xinyi Wang et.al. | 2504.03635 | null |
| 2025-04-04 | Align to Structure: Aligning Large Language Models with Structural Information | Zae Myung Kim et.al. | 2504.03622 | null |
| 2025-04-04 | VISTA-OCR: Towards generative and interactive end to end OCR models | Laziz Hamdi et.al. | 2504.03621 | null |
| 2025-04-04 | Multilingual Retrieval-Augmented Generation for Knowledge-Intensive Task | Leonardo Ranaldi et.al. | 2504.03616 | null |
| 2025-04-04 | AIR: A Systematic Analysis of Annotations, Instructions, and Response Pairs in Preference Dataset | Bingxiang He et.al. | 2504.03612 | null |
| 2025-04-04 | EnrichIndex: Using LLMs to Enrich Retrieval Indices Offline | Peter Baile Chen et.al. | 2504.03598 | null |
| 2025-04-04 | Agentic Knowledgeable Self-awareness | Shuofei Qiao et.al. | 2504.03553 | null |
| 2025-04-04 | Neutralizing the Narrative: AI-Powered Debiasing of Online News Articles | Chen Wei Kuo et.al. | 2504.03520 | null |
| 2025-04-04 | LLMSched: Uncertainty-Aware Workload Scheduling for Compound LLM Applications | Botao Zhu et.al. | 2504.03444 | null |
| 2025-04-04 | Know What You do Not Know: Verbalized Uncertainty Estimation Robustness on Corrupted Images in Vision-Language Models | Mirko Borszukovszki et.al. | 2504.03440 | null |
| 2025-04-03 | STING-BEE: Towards Vision-Language Model for Real-World X-ray Baggage Security Inspection | Divya Velayudhan et.al. | 2504.02823 | null |
| 2025-04-03 | Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models | Mateusz Pach et.al. | 2504.02821 | link |
| 2025-04-03 | Generative Evaluation of Complex Reasoning in Large Language Models | Haowei Lin et.al. | 2504.02810 | link |
| 2025-04-03 | MegaMath: Pushing the Limits of Open Math Corpora | Fan Zhou et.al. | 2504.02807 | link |
| 2025-04-04 | A Survey of Large Language Models in Mental Health Disorder Detection on Social Media | Zhuohan Ge et.al. | 2504.02800 | null |
| 2025-04-03 | A Framework for Robust Cognitive Evaluation of LLMs | Karin de Langis et.al. | 2504.02789 | null |
| 2025-04-03 | From Consumption to Collaboration: Measuring Interaction Patterns to Augment Human Cognition in Open-Ended Tasks | Joshua Holstein et.al. | 2504.02780 | null |
| 2025-04-03 | BT-ACTION: A Test-Driven Approach for Modular Understanding of User Instruction Leveraging Behaviour Trees and LLMs | Alexander Leszczynski et.al. | 2504.02779 | null |
| 2025-04-03 | How Deep Do Large Language Models Internalize Scientific Literature and Citation Practices? | Andres Algaba et.al. | 2504.02767 | null |
| 2025-04-03 | Enhancing LLM Robustness to Perturbed Instructions: An Empirical Study | Aryan Agrawal et.al. | 2504.02733 | null |
| 2025-04-02 | Towards Unified Referring Expression Segmentation Across Omni-Level Visual Target Granularities | Jing Liu et.al. | 2504.01954 | null |
| 2025-04-02 | The LLM Wears Prada: Analysing Gender Bias and Stereotypes through Online Shopping Data | Massimiliano Luca et.al. | 2504.01951 | null |
| 2025-04-02 | OpenCodeReasoning: Advancing Data Distillation for Competitive Coding | Wasi Uddin Ahmad et.al. | 2504.01943 | null |
| 2025-04-02 | Critical Thinking: Which Kinds of Complexity Govern Optimal Reasoning Length? | Celine Lee et.al. | 2504.01935 | null |
| 2025-04-02 | A thorough benchmark of automatic text classification: From traditional approaches to large language models | Washington Cunha et.al. | 2504.01930 | null |
| 2025-04-02 | Gen-C: Populating Virtual Worlds with Generative Crowds | Andreas Panayiotou et.al. | 2504.01924 | null |
| 2025-04-02 | Bridging the Linguistic Divide: A Survey on Leveraging Large Language Models for Machine Translation | Baban Gain et.al. | 2504.01919 | null |
| 2025-04-02 | Advancing AI-Scientist Understanding: Making LLM Think Like a Physicist with Interpretable Reasoning | Yinggan Xu et.al. | 2504.01911 | null |
| 2025-04-02 | GMAI-VL-R1: Harnessing Reinforcement Learning for Multimodal Medical Reasoning | Yanzhou Su et.al. | 2504.01886 | link |
| 2025-04-02 | TransientTables: Evaluating LLMs’ Reasoning on Temporally Evolving Semi-structured Tables | Abhilash Shankarampeta et.al. | 2504.01879 | null |
| 2025-03-31 | Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation | Shengqiong Wu et.al. | 2503.24379 | link |
| 2025-03-31 | Harnessing the Reasoning Economy: A Survey of Efficient Reasoning for Large Language Models | Rui Wang et.al. | 2503.24377 | link |
| 2025-03-31 | Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1 | Yi Chen et.al. | 2503.24376 | link |
| 2025-03-31 | Effectively Controlling Reasoning Models through Thinking Intervention | Tong Wu et.al. | 2503.24370 | null |
| 2025-03-31 | ORAL: Prompting Your Large-Scale LoRAs via Conditional Recurrent Diffusion | Rana Muhammad Shahroz Khan et.al. | 2503.24354 | null |
| 2025-03-31 | BEATS: Bias Evaluation and Assessment Test Suite for Large Language Models | Alok Abhishek et.al. | 2503.24310 | null |
| 2025-03-31 | A Systematic Evaluation of LLM Strategies for Mental Health Text Analysis: Fine-tuning vs. Prompt Engineering vs. RAG | Arshia Kermani et.al. | 2503.24307 | null |
| 2025-03-31 | Rec-R1: Bridging Generative Large Language Models and User-Centric Recommendation Systems via Reinforcement Learning | Jiacheng Lin et.al. | 2503.24289 | link |
| 2025-03-31 | Evaluating and Designing Sparse Autoencoders by Approximating Quasi-Orthogonality | Sewoong Lee et.al. | 2503.24277 | link |
| 2025-03-31 | Enhancing Large Language Models (LLMs) for Telecommunications using Knowledge Graphs and Retrieval-Augmented Generation | Dun Yuan et.al. | 2503.24245 | null |
| 2025-03-28 | Q-Insight: Understanding Image Quality via Visual Reinforcement Learning | Weiqi Li et.al. | 2503.22679 | link |
| 2025-03-28 | QuestBench: Can LLMs ask the right question to acquire information in reasoning tasks? | Belinda Z. Li et.al. | 2503.22674 | link |
| 2025-03-28 | Exploring the Effectiveness of Multi-stage Fine-tuning for Cross-encoder Re-rankers | Francesca Pezzuti et.al. | 2503.22672 | link |
| 2025-03-28 | Unicorn: Text-Only Data Synthesis for Vision Language Model Training | Xiaomin Yu et.al. | 2503.22655 | link |
| 2025-03-28 | Sentiment Classification of Thai Central Bank Press Releases Using Supervised Learning | Stefano Grassi et.al. | 2503.22629 | null |
| 2025-03-28 | Evaluating Multimodal Language Models as Visual Assistants for Visually Impaired Users | Antonia Karamolegkou et.al. | 2503.22610 | null |
| 2025-03-28 | On the Alignment of Post-Publication Reviews & Bibliometric and Altmetric Impact – A Case Study on Expert Statements from the Science Media Center Germany | Dirk Tunger et.al. | 2503.22594 | null |
| 2025-03-28 | LLM-enabled Instance Model Generation | Fengjunjie Pan et.al. | 2503.22587 | null |
| 2025-03-28 | Historical Ink: Exploring Large Language Models for Irony Detection in 19th-Century Spanish | Kevin Cohen et.al. | 2503.22585 | link |
| 2025-03-28 | Beyond Vanilla Fine-Tuning: Leveraging Multistage, Multilingual, and Domain-Specific Methods for Low-Resource Machine Translation | Sarubi Thillainathan et.al. | 2503.22582 | null |
| 2025-03-27 | Video-R1: Reinforcing Video Reasoning in MLLMs | Kaituo Feng et.al. | 2503.21776 | link |
| 2025-03-27 | LOCORE: Image Re-ranking with Long-Context Sequence Modeling | Zilin Xiao et.al. | 2503.21772 | link |
| 2025-03-27 | MemInsight: Autonomous Memory Augmentation for LLM Agents | Rana Salama et.al. | 2503.21760 | null |
| 2025-03-27 | Fwd2Bot: LVLM Visual Token Compression with Double Forward Bottleneck | Adrian Bulat et.al. | 2503.21757 | null |
| 2025-03-27 | LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis | Shitian Zhao et.al. | 2503.21749 | link |
| 2025-03-27 | CTRL-O: Language-Controllable Object-Centric Visual Representation Learning | Aniket Didolkar et.al. | 2503.21747 | null |
| 2025-03-27 | GateLens: A Reasoning-Enhanced LLM Agent for Automotive Software Release Analytics | Arsham Gholamzadeh Khoee et.al. | 2503.21735 | null |
| 2025-03-27 | Effective Skill Unlearning through Intervention and Abstention | Yongce Li et.al. | 2503.21730 | link |
| 2025-03-27 | Collab: Controlled Decoding using Mixture of Agents for LLM Alignment | Souradip Chakraborty et.al. | 2503.21720 | null |
| 2025-03-27 | Enhancing Repository-Level Software Repair via Repository-Aware Knowledge Graphs | Boyang Yang et.al. | 2503.21710 | null |
| 2025-03-26 | Mobile-MMLU: A Mobile Intelligence Language Understanding Benchmark | Sondos Mahmoud Bsharat et.al. | 2503.20786 | link |
| 2025-03-26 | Feature4X: Bridging Any Monocular Video to 4D Agentic AI with Versatile Gaussian Feature Fields | Shijie Zhou et.al. | 2503.20776 | null |
| 2025-03-26 | MATHGLANCE: Multimodal Large Language Models Do Not Know Where to Look in Mathematical Diagrams | Yanpeng Sun et.al. | 2503.20745 | null |
| 2025-03-26 | Dynamic Motion Blending for Versatile Motion Editing | Nan Jiang et.al. | 2503.20724 | null |
| 2025-03-26 | From Annotation to Adaptation: Metrics, Synthetic Data, and Aspect Extraction for Aspect-Based Sentiment Analysis with Large Language Models | Nikita Neveditsin et.al. | 2503.20715 | null |
| 2025-03-27 | Mitigating Low-Level Visual Hallucinations Requires Self-Awareness: Database, Model and Training Strategy | Yinan Sun et.al. | 2503.20673 | null |
| 2025-03-26 | TAMA: A Human-AI Collaborative Thematic Analysis Framework Using Multi-Agent LLMs for Clinical Interviews | Huimin Xu et.al. | 2503.20666 | null |
| 2025-03-26 | Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging | Han Wu et.al. | 2503.20641 | link |
| 2025-03-26 | Collaborative Storytelling and LLM: A Linguistic Analysis of Automatically-Generated Role-Playing Game Sessions | Alessandro Maisto et.al. | 2503.20623 | null |
| 2025-03-26 | What to Retrieve for Effective Retrieval-Augmented Code Generation? An Empirical Study and Beyond | Wenchao Gu et.al. | 2503.20589 | null |
| 2025-03-25 | CoLLM: A Large Language Model for Composed Image Retrieval | Chuong Huynh et.al. | 2503.19910 | link |
| 2025-03-25 | A Multi-Agent Framework Integrating Large Language Models and Generative AI for Accelerated Metamaterial Design | Jie Tian et.al. | 2503.19889 | null |
| 2025-03-25 | CausalRAG: Integrating Causal Graphs into Retrieval-Augmented Generation | Nengbo Wang et.al. | 2503.19878 | null |
| 2025-03-25 | SLA-Awareness for AI-assisted coding | Kishanthan Thangarajah et.al. | 2503.19876 | null |
| 2025-03-25 | Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time Thinking | Xiaoyu Tian et.al. | 2503.19855 | link |
| 2025-03-25 | Towards Online Multi-Modal Social Interaction Understanding | Xinpeng Li et.al. | 2503.19851 | null |
| 2025-03-25 | FALCONEye: Finding Answers and Localizing Content in ONE-hour-long videos with multi-modal LLMs | Carlos Plou et.al. | 2503.19850 | null |
| 2025-03-25 | A Comparative Analysis of Word Segmentation, Part-of-Speech Tagging, and Named Entity Recognition for Historical Chinese Sources, 1900-1950 | Zhao Fang et.al. | 2503.19844 | null |
| 2025-03-25 | SemEval-2025 Task 9: The Food Hazard Detection Challenge | Korbinian Randl et.al. | 2503.19800 | null |
| 2025-03-25 | PAVE: Patching and Adapting Video Large Language Models | Zhuoming Liu et.al. | 2503.19794 | link |
| 2025-03-24 | SlowFast-LLaVA-1.5: A Family of Token-Efficient Video Large Language Models for Long-Form Video Understanding | Mingze Xu et.al. | 2503.18943 | null |
| 2025-03-24 | Video-T1: Test-Time Scaling for Video Generation | Fangfu Liu et.al. | 2503.18942 | link |
| 2025-03-24 | Exploring Training and Inference Scaling Laws in Generative Retrieval | Hongru Cai et.al. | 2503.18941 | null |
| 2025-03-24 | Trajectory Balance with Asynchrony: Decoupling Exploration and Learning for Fast, Scalable LLM Post-Training | Brian R. Bartoldson et.al. | 2503.18929 | link |
| 2025-03-24 | FFN Fusion: Rethinking Sequential Computation in Large Language Models | Akhiad Bercovich et.al. | 2503.18908 | null |
| 2025-03-24 | xKV: Cross-Layer SVD for KV-Cache Compression | Chi-Chih Chang et.al. | 2503.18893 | link |
| 2025-03-24 | AgentDropout: Dynamic Agent Elimination for Token-Efficient and High-Performance LLM-Based Multi-Agent Collaboration | Zhexuan Wang et.al. | 2503.18891 | null |
| 2025-03-24 | Toward building next-generation Geocoding systems: a systematic review | Zhengcong Yin et.al. | 2503.18888 | null |
| 2025-03-24 | I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders | Andrey Galichin et.al. | 2503.18878 | link |
| 2025-03-24 | Reimagining Memory Access for LLM Inference: Compression-Aware Memory Controller Design | Rui Xie et.al. | 2503.18869 | null |
| 2025-03-21 | Dancing with Critiques: Enhancing LLM Reasoning with Stepwise Natural Language Self-Critique | Yansi Li et.al. | 2503.17363 | null |
| 2025-03-21 | OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement | Yihe Deng et.al. | 2503.17352 | link |
| 2025-03-21 | Capturing Individual Human Preferences with Reward Features | André Barreto et.al. | 2503.17338 | null |
| 2025-03-21 | Efficient Intent-Based Filtering for Multi-Party Conversations Using Knowledge Distillation from LLMs | Reem Gody et.al. | 2503.17336 | null |
| 2025-03-21 | CVE-Bench: A Benchmark for AI Agents’ Ability to Exploit Real-World Web Application Vulnerabilities | Yuxuan Zhu et.al. | 2503.17332 | link |
| 2025-03-21 | LLM+MAP: Bimanual Robot Task Planning using Large Language Models and Planning Domain Definition Language | Kun Chu et.al. | 2503.17309 | null |
| 2025-03-21 | Bugdar: AI-Augmented Secure Code Review for GitHub Pull Requests | John Naulty et.al. | 2503.17302 | null |
| 2025-03-21 | CASE – Condition-Aware Sentence Embeddings for Conditional Semantic Textual Similarity Measurement | Gaifan Zhang et.al. | 2503.17279 | null |
| 2025-03-21 | SafeMERGE: Preserving Safety Alignment in Fine-Tuned Large Language Models via Selective Layer-Wise Model Merging | Aladin Djuhera et.al. | 2503.17239 | null |
| 2025-03-21 | FactSelfCheck: Fact-Level Black-Box Hallucination Detection for LLMs | Albert Sawczyn et.al. | 2503.17229 | null |
| 2025-03-20 | Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models | Yang Sui et.al. | 2503.16419 | link |
| 2025-03-20 | The Emperor’s New Clothes in Benchmarking? A Rigorous Examination of Mitigation Strategies for LLM Benchmark Data Contamination | Yifan Sun et.al. | 2503.16402 | null |
| 2025-03-20 | Exploring the Hidden Reasoning Process of Large Language Models by Misleading Them | Guanyu Chen et.al. | 2503.16401 | null |
| 2025-03-20 | Deconstructing Long Chain-of-Thought: A Structured Reasoning Optimization Framework for Long CoT Distillation | Yijia Luo et.al. | 2503.16385 | link |
| 2025-03-20 | LaPIG: Cross-Modal Generation of Paired Thermal and Visible Facial Images | Leyang Wang et.al. | 2503.16376 | null |
| 2025-03-20 | CaKE: Circuit-aware Editing Enables Generalizable Knowledge Learners | Yunzhi Yao et.al. | 2503.16356 | link |
| 2025-03-20 | LLM Braces: Straightening Out LLM Predictions with Relevant Sub-Updates | Ying Shen et.al. | 2503.16334 | null |
| 2025-03-20 | OmniGeo: Towards a Multimodal Large Language Models for Geospatial Artificial Intelligence | Long Yuan et.al. | 2503.16326 | null |
| 2025-03-20 | Bridging Technology and Humanities: Evaluating the Impact of Large Language Models on Social Sciences Research with DeepSeek-R1 | Peiran Gu et.al. | 2503.16304 | null |
| 2025-03-20 | Uni-3DAR: Unified 3D Generation and Understanding via Autoregression on Compressed Spatial Tokens | Shuqi Lu et.al. | 2503.16278 | link |
| 2025-03-19 | SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks | Yifei Zhou et.al. | 2503.15478 | link |
| 2025-03-19 | Cube: A Roblox View of 3D Intelligence | Foundation AI Team et.al. | 2503.15475 | link |
| 2025-03-19 | From 1,000,000 Users to Every User: Scaling Up Personalized Preference for User-level Alignment | Jia-Nan Li et.al. | 2503.15463 | null |
| 2025-03-19 | Visual Position Prompt for MLLM based Visual Grounding | Wei Tang et.al. | 2503.15426 | link |
| 2025-03-19 | Probing the topology of the space of tokens with structured prompts | Michael Robinson et.al. | 2503.15421 | null |
| 2025-03-19 | EfficientLLaVA: Generalizable Auto-Pruning for Large Vision-language Models | Yinan Liang et.al. | 2503.15369 | null |
| 2025-03-19 | SemEval-2025 Task 1: AdMIRe – Advancing Multimodal Idiomaticity Representation | Thomas Pickard et.al. | 2503.15358 | null |
| 2025-03-19 | SPILL: Domain-Adaptive Intent Clustering based on Selection and Pooling with Large Language Models | I-Fan Lin et.al. | 2503.15351 | null |
| 2025-03-19 | TruthLens: A Training-Free Paradigm for DeepFake Detection | Ritabrata Chakraborty et.al. | 2503.15342 | null |
| 2025-03-19 | Uncertainty-Guided Chain-of-Thought for Code Generation with LLMs | Yuqi Zhu et.al. | 2503.15341 | null |
| 2025-03-18 | Aligning Multimodal LLM with Human Preference: A Survey | Tao Yu et.al. | 2503.14504 | null |
| 2025-03-18 | Engineering Scientific Assistants using Interactive Structured Induction of Programs | Shraddha Surana et.al. | 2503.14488 | null |
| 2025-03-18 | Gricean Norms as a Basis for Effective Collaboration | Fardin Saad et.al. | 2503.14484 | null |
| 2025-03-18 | Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM | Xinyu Fang et.al. | 2503.14478 | link |
| 2025-03-18 | EnvBench: A Benchmark for Automated Environment Setup | Aleksandra Eliseeva et.al. | 2503.14443 | link |
| 2025-03-18 | LLM-FE: Automated Feature Engineering for Tabular Data with LLMs as Evolutionary Optimizers | Nikhil Abhyankar et.al. | 2503.14434 | link |
| 2025-03-18 | PLAY2PROMPT: Zero-shot Tool Instruction Optimization for LLM Agents via Tool Play | Wei Fang et.al. | 2503.14432 | null |
| 2025-03-18 | Unifying Text Semantics and Graph Structures for Temporal Text-attributed Graphs with Large Language Models | Siwei Zhang et.al. | 2503.14411 | null |
| 2025-03-18 | Large Language Models for Virtual Human Gesture Selection | Parisa Ghanad Torshizi et.al. | 2503.14408 | null |
| 2025-03-18 | From “Hallucination” to “Suture”: Insights from Language Philosophy to Enhance Large Language Models | Qiantong Wang et.al. | 2503.14392 | null |
| 2025-03-17 | MetaScale: Test-Time Scaling with Evolving Meta-Thoughts | Qin Liu et.al. | 2503.13447 | null |
| 2025-03-17 | Faithfulness of LLM Self-Explanations for Commonsense Tasks: Larger Is Better, and Instruction-Tuning Allows Trade-Offs but Not Pareto Dominance | Noah Y. Siegel et.al. | 2503.13445 | null |
| 2025-03-17 | VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning | Ye Liu et.al. | 2503.13444 | null |
| 2025-03-17 | xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference | Maximilian Beck et.al. | 2503.13427 | null |
| 2025-03-17 | A Comprehensive Survey on Multi-Agent Cooperative Decision-Making: Scenarios, Approaches, Challenges and Perspectives | Weiqiang Jin et.al. | 2503.13415 | null |
| 2025-03-17 | DLPO: Towards a Robust, Efficient, and Generalizable Prompt Optimization Framework from a Deep-Learning Perspective | Dengyun Peng et.al. | 2503.13413 | null |
| 2025-03-17 | Using the Tools of Cognitive Science to Understand Large Language Models at Different Levels of Analysis | Alexander Ku et.al. | 2503.13401 | null |
| 2025-03-17 | MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research | James Burgess et.al. | 2503.13399 | null |
| 2025-03-17 | Scale Efficient Training for Large Datasets | Qing Zhou et.al. | 2503.13385 | null |
| 2025-03-17 | Cream of the Crop: Harvesting Rich, Scalable and Transferable Multi-Modal Data for Instruction Fine-Tuning | Mengyao Lyu et.al. | 2503.13383 | null |
| 2025-03-14 | ASMA-Tune: Unlocking LLMs’ Assembly Code Comprehension via Structural-Semantic Instruction Tuning | Xinyi Wang et.al. | 2503.11617 | null |
| 2025-03-14 | Broaden your SCOPE! Efficient Multi-turn Conversation Planning for LLMs using Semantic Space | Zhiliang Chen et.al. | 2503.11586 | null |
| 2025-03-14 | Synthesizing Access Control Policies using Large Language Models | Adarsh Vatsa et.al. | 2503.11573 | null |
| 2025-03-14 | Implicit Bias-Like Patterns in Reasoning Models | Messi H. J. Lee et.al. | 2503.11572 | null |
| 2025-03-14 | VERIFY: A Benchmark of Visual Explanation and Reasoning for Investigating Multimodal Reasoning Fidelity | Jing Bi et.al. | 2503.11557 | null |
| 2025-03-14 | Potential of large language model-powered nudges for promoting daily water and energy conservation | Zonghan Li et.al. | 2503.11531 | null |
| 2025-03-14 | HiTVideo: Hierarchical Tokenizers for Enhancing Text-to-Video Generation with Autoregressive Large Language Models | Ziqin Zhou et.al. | 2503.11513 | null |
| 2025-03-14 | V-STaR: Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning | Zixu Cheng et.al. | 2503.11495 | null |
| 2025-03-14 | A Review of DeepSeek Models’ Key Innovative Techniques | Chengen Wang et.al. | 2503.11486 | null |
| 2025-03-14 | T2I-FineEval: Fine-Grained Compositional Metric for Text-to-Image Evaluation | Seyed Mohammad Hadi Hosseini et.al. | 2503.11481 | null |
| 2025-03-13 | GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing | Rongyao Fang et.al. | 2503.10639 | link |
| 2025-03-13 | HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model | Jiaming Liu et.al. | 2503.10631 | null |
| 2025-03-13 | UniGoal: Towards Universal Zero-shot Goal-oriented Navigation | Hang Yin et.al. | 2503.10630 | null |
| 2025-03-13 | DriveLMM-o1: A Step-by-Step Reasoning Dataset and Large Multimodal Model for Driving Scenario Understanding | Ayesha Ishaq et.al. | 2503.10621 | link |
| 2025-03-13 | From TOWER to SPIRE: Adding the Speech Modality to a Text-Only LLM | Kshitij Ambilduke et.al. | 2503.10620 | null |
| 2025-03-13 | Siege: Autonomous Multi-Turn Jailbreaking of Large Language Models with Tree Search | Andy Zhou et.al. | 2503.10619 | null |
| 2025-03-13 | Compositional Subspace Representation Fine-tuning for Adaptive Large Language Models | Andy Zhou et.al. | 2503.10617 | null |
| 2025-03-13 | R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization | Yi Yang et.al. | 2503.10615 | link |
| 2025-03-13 | CoSTA$\ast$: Cost-Sensitive Toolpath Agent for Multi-turn Image Editing | Advait Gupta et.al. | 2503.10613 | link |
| 2025-03-13 | TruthPrInt: Mitigating LVLM Object Hallucination Via Latent Truthful-Guided Pre-Intervention | Jinhao Duan et.al. | 2503.10602 | link |
| 2025-03-12 | MoC: Mixtures of Text Chunking Learners for Retrieval-Augmented Generation System | Jihao Zhao et.al. | 2503.09600 | null |
| 2025-03-12 | How to Protect Yourself from 5G Radiation? Investigating LLM Responses to Implicit Misinformation | Ruohao Guo et.al. | 2503.09598 | null |
| 2025-03-12 | SimLingo: Vision-Only Closed-Loop Autonomous Driving with Language-Action Alignment | Katrin Renz et.al. | 2503.09594 | null |
| 2025-03-12 | BIMBA: Selective-Scan Compression for Long-Range Video Question Answering | Md Mohaiminul Islam et.al. | 2503.09590 | null |
| 2025-03-12 | Cost-Optimal Grouped-Query Attention for Long-Context LLMs | Yingfa Chen et.al. | 2503.09579 | link |
| 2025-03-12 | Plan-and-Act: Improving Planning of Agents for Long-Horizon Tasks | Lutfi Eren Erdogan et.al. | 2503.09572 | null |
| 2025-03-12 | Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models | Qiguang Chen et.al. | 2503.09567 | null |
| 2025-03-12 | Large Language Models for Multi-Facility Location Mechanism Design | Nguyen Thach et.al. | 2503.09533 | null |
| 2025-03-12 | Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning | Bowen Jin et.al. | 2503.09516 | null |
| 2025-03-12 | ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning | Ziyu Wan et.al. | 2503.09501 | null |
| 2025-03-11 | Randomness, Not Representation: The Unreliability of Evaluating Cultural Alignment in LLMs | Ariba Khan et.al. | 2503.08688 | null |
| 2025-03-11 | OmniMamba: Efficient and Unified Multimodal Understanding and Generation via State Space Models | Jialv Zou et.al. | 2503.08686 | null |
| 2025-03-11 | Self-Taught Self-Correction for Small Language Models | Viktor Moskvoretskii et.al. | 2503.08681 | null |
| 2025-03-11 | Exploring the Word Sense Disambiguation Capabilities of Large Language Models | Pierpaolo Basile et.al. | 2503.08662 | null |
| 2025-03-11 | LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization | Xianfeng Wu et.al. | 2503.08619 | null |
| 2025-03-11 | EMMOE: A Comprehensive Benchmark for Embodied Mobile Manipulation in Open Environments | Dongping Li et.al. | 2503.08604 | null |
| 2025-03-11 | NSF-SciFy: Mining the NSF Awards Database for Scientific Claims | Delip Rao et.al. | 2503.08600 | null |
| 2025-03-11 | HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding | Shehreen Azad et.al. | 2503.08585 | null |
| 2025-03-11 | RAG-Adapter: A Plug-and-Play RAG-enhanced Framework for Long Video Understanding | Xichen Tan et.al. | 2503.08576 | null |
| 2025-03-11 | DeepReview: Improving LLM-based Paper Review with Human-like Deep Thinking Process | Minjun Zhu et.al. | 2503.08569 | null |
| 2025-03-10 | Robusto-1 Dataset: Comparing Humans and VLMs on real out-of-distribution Autonomous Driving VQA from Peru | Dunant Cusipuma et.al. | 2503.07587 | null |
| 2025-03-10 | Talking to GDELT Through Knowledge Graphs | Audun Myers et.al. | 2503.07584 | null |
| 2025-03-10 | AutoSpatial: Visual-Language Reasoning for Social Robot Navigation through Efficient Spatial Reasoning Learning | Yangzhe Kong et.al. | 2503.07557 | null |
| 2025-03-10 | Junior Software Developers’ Perspectives on Adopting LLMs for Software Engineering: a Systematic Literature Review | Samuel Ferino et.al. | 2503.07556 | null |
| 2025-03-10 | KSOD: Knowledge Supplement for LLMs On Demand | Haoran Li et.al. | 2503.07550 | null |
| 2025-03-10 | Bi-Directional Mental Model Reconciliation for Human-Robot Interaction with Large Language Models | Nina Moorman et.al. | 2503.07547 | null |
| 2025-03-10 | Queueing, Predictions, and LLMs: Challenges and Open Problems | Michael Mitzenmacher et.al. | 2503.07545 | null |
| 2025-03-10 | XIFBench: Evaluating Large Language Models on Multilingual Instruction Following | Zhenyu Li et.al. | 2503.07539 | null |
| 2025-03-10 | TokenButler: Token Importance is Predictable | Yash Akhauri et.al. | 2503.07518 | null |
| 2025-03-10 | Language Models Fail to Introspect About Their Knowledge of Language | Siyuan Song et.al. | 2503.07513 | null |
| 2025-03-10 | LLaVA-RadZ: Can Multimodal Large Language Models Effectively Tackle Zero-shot Radiology Recognition? | Bangyan Li et.al. | 2503.07487 | null |
| 2025-03-10 | GenAIReading: Augmenting Human Cognition with Interactive Digital Textbooks Using Large Language Models and Image Generation Models | Ryugo Morita et.al. | 2503.07463 | null |
| 2025-03-10 | MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning | Xiangru Tang et.al. | 2503.07459 | null |
| 2025-03-10 | LLMs syntactically adapt their language use to their conversational partner | Florian Kandra et.al. | 2503.07457 | null |
| 2025-03-10 | From Idea to Implementation: Evaluating the Influence of Large Language Models in Software Development – An Opinion Paper | Sargam Yadav et.al. | 2503.07450 | null |
| 2025-03-10 | From Text to Visuals: Using LLMs to Generate Math Diagrams with Vector Graphics | Jaewook Lee et.al. | 2503.07429 | null |
| 2025-03-10 | RePO: ReLU-based Preference Optimization | Junkang Wu et.al. | 2503.07426 | null |
| 2025-03-10 | REF-VLM: Triplet-Based Referring Paradigm for Unified Visual Decoding | Yan Tai et.al. | 2503.07413 | link |
| 2025-03-10 | Revisiting Noise in Natural Language Processing for Computational Social Science | Nadav Borenstein et.al. | 2503.07395 | null |
| 2025-03-10 | Process-Supervised LLM Recommenders via Flow-guided Tuning | Chongming Gao et.al. | 2503.07377 | null |
| 2025-03-07 | Understanding the Limits of Lifelong Knowledge Editing in LLMs | Lukas Thede et.al. | 2503.05683 | null |
| 2025-03-07 | A Survey of Large Language Model Empowered Agents for Recommendation and Search: Towards Next-Generation Information Retrieval | Yu Zhang et.al. | 2503.05659 | null |
| 2025-03-07 | Learning LLM Preference over Intra-Dialogue Pairs: A Framework for Utterance-level Understandings | Xuanqing Liu et.al. | 2503.05620 | null |
| 2025-03-07 | A Survey on Sparse Autoencoders: Interpreting the Internal Mechanisms of Large Language Models | Dong Shu et.al. | 2503.05613 | null |
| 2025-03-07 | R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning | Huatong Song et.al. | 2503.05592 | null |
| 2025-03-07 | Evaluating open-source Large Language Models for automated fact-checking | Nicolo’ Fontana et.al. | 2503.05565 | null |
| 2025-03-07 | Revitalizing Saturated Benchmarks: A Weighted Metric Approach for Differentiating Large Language Model Performance | Bryan Etzine et.al. | 2503.05551 | null |
| 2025-03-07 | Leveraging Approximate Caching for Faster Retrieval-Augmented Generation | Shai Bergman et.al. | 2503.05530 | null |
| 2025-03-07 | PoSSUM: A Protocol for Surveying Social-media Users with Multimodal LLMs | Roberto Cerina et.al. | 2503.05529 | null |
| 2025-03-07 | Cognitive Bias Detection Using Advanced Prompt Engineering | Frederic Lemieux et.al. | 2503.05516 | null |
| 2025-03-06 | L$^2$M: Mutual Information Scaling Law for Long-Context Language Modeling | Zhuo Chen et.al. | 2503.04725 | null |
| 2025-03-06 | Shifting Long-Context LLMs Research from Input to Output | Yuhao Wu et.al. | 2503.04723 | null |
| 2025-03-06 | Enough Coin Flips Can Make LLMs Act Bayesian | Ritwik Gupta et.al. | 2503.04722 | null |
| 2025-03-06 | Predictable Scale: Part I – Optimal Hyperparameter Scaling Law in Large Language Model Pretraining | Houyi Li et.al. | 2503.04715 | null |
| 2025-03-06 | Universality of Layer-Level Entropy-Weighted Quantization Beyond Model Architecture and Size | Alireza Behtash et.al. | 2503.04704 | null |
| 2025-03-06 | UIPE: Enhancing LLM Unlearning by Removing Knowledge Related to Forgetting Targets | Wenyu Wang et.al. | 2503.04693 | null |
| 2025-03-06 | Quantifying the Reasoning Abilities of LLMs on Real-world Clinical Cases | Pengcheng Qiu et.al. | 2503.04691 | null |
| 2025-03-06 | LLM-guided Plan and Retrieval: A Strategic Alignment for Interpretable User Satisfaction Estimation in Dialogue | Sangyeop Kim et.al. | 2503.04675 | null |
| 2025-03-06 | RadIR: A Scalable Framework for Multi-Grained Medical Image Retrieval via Radiology Report Mining | Tengfei Zhang et.al. | 2503.04653 | null |
| 2025-03-06 | Implicit Cross-Lingual Rewarding for Efficient Multilingual Preference Alignment | Wen Yang et.al. | 2503.04647 | null |
| 2025-03-05 | The MASK Benchmark: Disentangling Honesty From Accuracy in AI Systems | Richard Ren et.al. | 2503.03750 | null |
| 2025-03-05 | Process-based Self-Rewarding Language Models | Shimao Zhang et.al. | 2503.03746 | null |
| 2025-03-05 | Towards Understanding Distilled Reasoning Models: A Representational Approach | David D. Baek et.al. | 2503.03730 | null |
| 2025-03-05 | Improving LLM Safety Alignment with Dual-Objective Optimization | Xuandong Zhao et.al. | 2503.03710 | null |
| 2025-03-05 | Effective LLM Knowledge Learning via Model Generalization | Mingkang Zhu et.al. | 2503.03705 | null |
| 2025-03-05 | A Practical Memory Injection Attack against LLM Agents | Shen Dong et.al. | 2503.03704 | null |
| 2025-03-05 | Developing and Utilizing a Large-Scale Cantonese Dataset for Multi-Tasking in Large Language Models | Jiyue Jiang et.al. | 2503.03702 | null |
| 2025-03-05 | Addressing Overprescribing Challenges: Fine-Tuning Large Language Models for Medication Recommendation Tasks | Zihao Zhao et.al. | 2503.03687 | null |
| 2025-03-05 | Attentive Reasoning Queries: A Systematic Method for Optimizing Instruction-Following in Large Language Models | Bar Karov et.al. | 2503.03669 | null |
| 2025-03-05 | Analogical Reasoning Inside Large Language Models: Concept Vectors and the Limits of Abstraction | Gustaw Opiełka et.al. | 2503.03666 | null |
| 2025-03-04 | Wikipedia in the Era of LLMs: Evolution and Risks | Siming Huang et.al. | 2503.02879 | null |
| 2025-03-04 | The First Few Tokens Are All You Need: An Efficient and Effective Unsupervised Prefix Fine-Tuning Method for Reasoning Models | Ke Ji et.al. | 2503.02875 | null |
| 2025-03-04 | Prompting Generative AI with Interaction-Augmented Instructions | Leixian Shen et.al. | 2503.02874 | null |
| 2025-03-04 | FairSense-AI: Responsible AI Meets Sustainability | Shaina Raza et.al. | 2503.02865 | null |
| 2025-03-04 | Calibrating LLM Confidence with Semantic Steering: A Multi-Prompt Aggregation Framework | Ziang Zhou et.al. | 2503.02863 | null |
| 2025-03-04 | Privacy and Accuracy-Aware AI/ML Model Deduplication | Hong Guan et.al. | 2503.02862 | null |
| 2025-03-04 | Shakespearean Sparks: The Dance of Hallucination and Creativity in LLMs’ Decoding Layers | Zicong He et.al. | 2503.02851 | null |
| 2025-03-04 | Mask-DPO: Generalizable Fine-grained Factuality Alignment of LLMs | Yuzhe Gu et.al. | 2503.02846 | null |
| 2025-03-04 | AlignDistil: Token-Level Language Model Alignment as Adaptive Policy Distillation | Songming Zhang et.al. | 2503.02832 | null |
| 2025-03-04 | Q-Filters: Leveraging QK Geometry for Efficient KV Cache Compression | Nathan Godey et.al. | 2503.02812 | null |
| 2025-02-28 | LLM Post-Training: A Deep Dive into Reasoning Large Language Models | Komal Kumar et.al. | 2502.21321 | null |
| 2025-02-28 | FANformer: Improving Large Language Models Through Effective Periodicity Modeling | Yihong Dong et.al. | 2502.21309 | null |
| 2025-02-28 | Contextualizing biological perturbation experiments through language | Menghua Wu et.al. | 2502.21290 | null |
| 2025-02-28 | Adaptive Keyframe Sampling for Long Video Understanding | Xi Tang et.al. | 2502.21271 | null |
| 2025-02-28 | Token-level Ensembling of Models with Different Vocabularies | Rachel Wicks et.al. | 2502.21265 | null |
| 2025-02-28 | RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete | Yuheng Ji et.al. | 2502.21257 | null |
| 2025-02-28 | Semantic Volume: Quantifying and Detecting both External and Internal Uncertainty in LLMs | Xiaomin Li et.al. | 2502.21239 | null |
| 2025-02-28 | Transforming Tuberculosis Care: Optimizing Large Language Models For Enhanced Clinician-Patient Communication | Daniil Filienko et.al. | 2502.21236 | null |
| 2025-02-28 | ByteScale: Efficient Scaling of LLM Training with a 2048K Context Length on More Than 12,000 GPUs | Hao Ge et.al. | 2502.21231 | null |
| 2025-03-03 | ECLeKTic: a Novel Challenge Set for Evaluation of Cross-Lingual Knowledge Transfer | Omer Goldman et.al. | 2502.21228 | null |
| 2025-02-27 | R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts | Zhongyang Li et.al. | 2502.20395 | null |
| 2025-02-27 | Why Are Web AI Agents More Vulnerable Than Standalone LLMs? A Security Analysis | Jeffrey Yang Fan Chiang et.al. | 2502.20383 | null |
| 2025-02-27 | Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers | Shalev Lifshitz et.al. | 2502.20379 | null |
| 2025-02-27 | PhantomWiki: On-Demand Datasets for Reasoning and Retrieval Evaluation | Albert Gong et.al. | 2502.20377 | null |
| 2025-02-27 | Bridging Legal Knowledge and AI: Retrieval-Augmented Generation with Vector Stores, Knowledge Graphs, and Hierarchical Non-negative Matrix Factorization | Ryan C. Barron et.al. | 2502.20364 | null |
| 2025-02-27 | Bridging the Creativity Understanding Gap: Small-Scale Human Alignment Enables Expert-Level Humor Ranking in LLMs | Kuan Lok Zhou et.al. | 2502.20356 | null |
| 2025-02-27 | KEDRec-LM: A Knowledge-distilled Explainable Drug Recommendation Large Language Model | Kai Zhang et.al. | 2502.20350 | null |
| 2025-02-27 | Sparse Auto-Encoder Interprets Linguistic Features in Large Language Models | Yi Jing et.al. | 2502.20344 | null |
| 2025-02-27 | Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners | Daniele Paliotta et.al. | 2502.20339 | null |
| 2025-02-27 | Expertise Is What We Want | Alan Ashworth et.al. | 2502.20335 | null |
| 2025-02-26 | Norm Growth and Stability Challenges in Localized Sequential Knowledge Editing | Akshat Gupta et.al. | 2502.19416 | null |
| 2025-02-26 | Code to Think, Think to Code: A Survey on Code-Enhanced Reasoning and Reasoning-Driven Code Intelligence in LLMs | Dayu Yang et.al. | 2502.19411 | null |
| 2025-02-26 | Less or More: Towards Glanceable Explanations for LLM Recommendations Using Ultra-Small Devices | Xinru Wang et.al. | 2502.19410 | null |
| 2025-02-26 | ImageChain: Advancing Sequential Image-to-Text Reasoning in Multimodal Large Language Models | Danae Sánchez Villegas et.al. | 2502.19409 | null |
| 2025-02-26 | Learning Code-Edit Embedding to Model Student Debugging Behavior | Hasnain Heickal et.al. | 2502.19407 | null |
| 2025-02-26 | General Reasoning Requires Learning to Reason from the Get-go | Seungwook Han et.al. | 2502.19402 | null |
| 2025-02-26 | TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding | Max Ku et.al. | 2502.19400 | null |
| 2025-02-26 | Residual Speech Embeddings for Tone Classification: Removing Linguistic Content to Enhance Paralinguistic Analysis | Hamdan Al Ahbabi et.al. | 2502.19387 | null |
| 2025-02-26 | DataMan: Data Manager for Pre-training Large Language Models | Ru Peng et.al. | 2502.19363 | null |
| 2025-02-26 | Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning? | Yancheng He et.al. | 2502.19361 | null |
| 2025-02-25 | DRAMA: Diverse Augmentation from Large Language Models to Smaller Dense Retrievers | Xueguang Ma et.al. | 2502.18460 | null |
| 2025-02-25 | LLM-Based Design Pattern Detection | Christian Schindler et.al. | 2502.18458 | null |
| 2025-02-25 | FRIDA to the Rescue! Analyzing Synthetic Data Effectiveness in Object-Based Common Sense Reasoning for Disaster Response | Mollie Shichman et.al. | 2502.18452 | null |
| 2025-02-25 | SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution | Yuxiang Wei et.al. | 2502.18449 | null |
| 2025-02-25 | MAPoRL: Multi-Agent Post-Co-Training for Collaborative Large Language Models with Reinforcement Learning | Chanwoo Park et.al. | 2502.18439 | null |
| 2025-02-25 | TextGames: Learning to Self-Play Text-Based Puzzle Games via Language Model Reasoning | Frederikus Hudi et.al. | 2502.18431 | null |
| 2025-02-25 | OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference | Xiangyu Zhao et.al. | 2502.18411 | null |
| 2025-02-25 | Monte Carlo Temperature: a robust sampling strategy for LLM’s uncertainty quantification methods | Nicola Cecere et.al. | 2502.18389 | null |
| 2025-02-25 | How Far are LLMs from Real Search? A Comprehensive Study on Efficiency, Completeness, and Inherent Capabilities | Minhua Lin et.al. | 2502.18387 | null |
| 2025-02-25 | MindMem: Multimodal for Predicting Advertisement Memorability Using LLMs and Deep Learning | Sepehr Asgarian et.al. | 2502.18371 | null |
| 2025-02-24 | Introducing Visual Perception Token into Multimodal Large Language Model | Runpeng Yu et.al. | 2502.17425 | link |
| 2025-02-24 | MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs | Jiarui Zhang et.al. | 2502.17422 | link |
| 2025-02-24 | LongSpec: Long-Context Speculative Decoding with Efficient Drafting and Verification | Penghui Yang et.al. | 2502.17421 | link |
| 2025-02-24 | The Geometry of Refusal in Large Language Models: Concept Cones and Representational Independence | Tom Wollschläger et.al. | 2502.17420 | null |
| 2025-02-24 | From System 1 to System 2: A Survey of Reasoning Large Language Models | Zhong-Zhi Li et.al. | 2502.17419 | link |
| 2025-02-24 | Reasoning with Latent Thoughts: On the Power of Looped Transformers | Nikunj Saunshi et.al. | 2502.17416 | null |
| 2025-02-24 | COSMOS: A Hybrid Adaptive Optimizer for Memory-Efficient Training of LLMs | Liming Liu et.al. | 2502.17410 | link |
| 2025-02-24 | Large Language Models are Powerful EHR Encoders | Stefan Hegselmann et.al. | 2502.17403 | null |
| 2025-02-24 | DIS-CO: Discovering Copyrighted Content in VLMs Training Data | André V. Duarte et.al. | 2502.17358 | link |
| 2025-02-24 | On Relation-Specific Neurons in Large Language Models | Yihong Liu et.al. | 2502.17355 | link |
| 2025-02-21 | ELIP: Enhanced Visual-Language Foundation Models for Image Retrieval | Guanqi Zhan et.al. | 2502.15682 | null |
| 2025-02-21 | Privacy Ripple Effects from Adding or Removing Personal Information in Language Model Training | Jaydeep Borkar et.al. | 2502.15680 | null |
| 2025-02-21 | FLEKE: Federated Locate-then-Edit Knowledge Editing | Zongkai Zhao et.al. | 2502.15677 | null |
| 2025-02-21 | AutoToM: Automated Bayesian Inverse Planning and Model Discovery for Open-ended Theory of Mind | Zhining Zhang et.al. | 2502.15676 | null |
| 2025-02-21 | Almost AI, Almost Human: The Challenge of Detecting AI-Polished Writing | Shoumik Saha et.al. | 2502.15666 | null |
| 2025-02-21 | Machine-generated text detection prevents language model collapse | George Drayson et.al. | 2502.15654 | null |
| 2025-02-21 | Empowering LLMs with Logical Reasoning: A Comprehensive Survey | Fengxiang Cheng et.al. | 2502.15652 | null |
| 2025-02-21 | Steering into New Embedding Spaces: Analyzing Cross-Lingual Alignment Induced by Model Interventions in Multilingual Language Models | Anirudh Sundar et.al. | 2502.15639 | null |
| 2025-02-21 | The Relationship Between Reasoning and Performance in Large Language Models – o3 (mini) Thinks Harder, Not Longer | Marthe Ballon et.al. | 2502.15631 | null |
| 2025-02-21 | Probe Pruning: Accelerating LLMs through Dynamic Pruning via Model-Probing | Qi Le et.al. | 2502.15618 | null |
| 2025-02-20 | LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention | Shang Yang et.al. | 2502.14866 | link |
| 2025-02-20 | Aligning LLMs to Ask Good Questions: A Case Study in Clinical Reasoning | Shuyue Stella Li et.al. | 2502.14860 | link |
| 2025-02-20 | FR-Spec: Accelerating Large-Vocabulary Language Models via Frequency-Ranked Speculative Sampling | Weilin Zhao et.al. | 2502.14856 | null |
| 2025-02-20 | Prompt-to-Leaderboard | Evan Frick et.al. | 2502.14855 | null |
| 2025-02-20 | GATE: Graph-based Adaptive Tool Evolution Across Diverse Tasks | Jianwen Luo et.al. | 2502.14848 | null |
| 2025-02-20 | Red-Teaming LLM Multi-Agent Systems via Communication Attacks | Pengfei He et.al. | 2502.14847 | null |
| 2025-02-20 | Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation | Yue Yang et.al. | 2502.14846 | null |
| 2025-02-20 | Revealing and Mitigating Over-Attention in Knowledge Editing | Pinzheng Wang et.al. | 2502.14838 | null |
| 2025-02-20 | Middle-Layer Representation Alignment for Cross-Lingual Transfer in Fine-Tuned LLMs | Danni Liu et.al. | 2502.14830 | null |
| 2025-02-20 | Exploring Advanced Techniques for Visual Question Answering: A Comprehensive Comparison | Aiswarya Baby et.al. | 2502.14827 | null |
| 2025-02-19 | Where’s the Bug? Attention Probing for Scalable Fault Localization | Adam Stein et.al. | 2502.13966 | null |
| 2025-02-19 | Autellix: An Efficient Serving Engine for LLM Agents as General Programs | Michael Luo et.al. | 2502.13965 | null |
| 2025-02-19 | MuDAF: Long-Context Multi-Document Attention Focusing through Contrastive Learning on Attention Heads | Weihao Liu et.al. | 2502.13963 | null |
| 2025-02-19 | Is That Your Final Answer? Test-Time Scaling Improves Selective Question Answering | William Jurayj et.al. | 2502.13962 | link |
| 2025-02-19 | LIDDIA: Language-based Intelligent Drug Discovery Agent | Reza Averly et.al. | 2502.13959 | null |
| 2025-02-19 | Neurosymbolic artificial intelligence via large language models and coherence-driven inference | Steve Huntsman et.al. | 2502.13953 | null |
| 2025-02-19 | Why Safeguarded Ships Run Aground? Aligned Large Language Models’ Safety Mechanisms Tend to Be Anchored in The Template Region | Chak Tou Leong et.al. | 2502.13946 | null |
| 2025-02-19 | A Chain-of-Thought Subspace Meta-Learning for Few-shot Image Captioning with Large Vision and Language Models | Hao Huang et.al. | 2502.13942 | null |
| 2025-02-19 | LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization | Guanzheng Chen et.al. | 2502.13922 | link |
| 2025-02-19 | Exploring Code Language Models for Automated HLS-based Hardware Generation: Benchmark, Infrastructure and Analysis | Jiahao Gai et.al. | 2502.13921 | null |
| 2025-02-18 | Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization | Shuo Xing et.al. | 2502.13146 | link |
| 2025-02-18 | Multimodal Mamba: Decoder-only Multimodal State Space Model via Quadratic to Linear Distillation | Bencheng Liao et.al. | 2502.13145 | link |
| 2025-02-18 | UniGuardian: A Unified Defense for Detecting Prompt Injection, Backdoor Attacks and Adversarial Attacks in Large Language Models | Huawei Lin et.al. | 2502.13141 | null |
| 2025-02-18 | Towards Quantum Tensor Decomposition in Biomedical Applications | Myson Burch et.al. | 2502.13140 | null |
| 2025-02-18 | AIDE: AI-Driven Exploration in the Space of Code | Zhengyao Jiang et.al. | 2502.13138 | link |
| 2025-02-18 | Theorem Prover as a Judge for Synthetic Data Generation | Joshua Ong Jun Leang et.al. | 2502.13137 | null |
| 2025-02-18 | Learning to Defer for Causal Discovery with Imperfect Experts | Oscar Clivio et.al. | 2502.13132 | null |
| 2025-02-18 | Facilitating Long Context Understanding via Supervised Chain-of-Thought Reasoning | Jingyang Lin et.al. | 2502.13127 | null |
| 2025-02-18 | RuozhiBench: Evaluating LLMs with Logical Fallacies and Misleading Premises | Zenan Zhai et.al. | 2502.13125 | null |
| 2025-02-18 | Adapting Psycholinguistic Research for LLMs: Gender-inclusive Language in a Coreference Context | Marion Bartl et.al. | 2502.13120 | null |
| 2025-02-17 | Idiosyncrasies in Large Language Models | Mingjie Sun et.al. | 2502.12150 | link |
| 2025-02-17 | HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation | Ling Yang et.al. | 2502.12148 | link |
| 2025-02-17 | Fast or Better? Balancing Accuracy and Cost in Retrieval-Augmented Generation with Flexible User Control | Jinyan Su et.al. | 2502.12145 | null |
| 2025-02-17 | Small Models Struggle to Learn from Strong Reasoners | Yuetai Li et.al. | 2502.12143 | link |
| 2025-02-17 | SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs | Yige Xu et.al. | 2502.12134 | null |
| 2025-02-17 | Transformer Dynamics: A neuroscientific approach to interpretability of large language models | Jesseba Fernando et.al. | 2502.12131 | null |
| 2025-02-17 | Scaling Autonomous Agents via Automatic Reward Modeling And Planning | Zhenfang Chen et.al. | 2502.12130 | link |
| 2025-02-17 | Minimal Ranks, Maximum Confidence: Parameter-efficient Uncertainty Quantification for LoRA | Patryk Marszałek et.al. | 2502.12122 | null |
| 2025-02-17 | LLMs on the Line: Data Determines Loss-to-Loss Scaling Laws | Prasanna Mayilvahanan et.al. | 2502.12120 | null |
| 2025-02-17 | PRISM: Self-Pruning Intrinsic Selection Method for Training-Free Multimodal Data Selection | Jinhe Bi et.al. | 2502.12119 | null |
| 2025-02-14 | MM-RLHF: The Next Step Forward in Multimodal LLM Alignment | Yi-Fan Zhang et.al. | 2502.10391 | null |
| 2025-02-14 | Aspect-Oriented Summarization for Psychiatric Short-Term Readmission Prediction | WonJin Yoon et.al. | 2502.10388 | null |
| 2025-02-14 | Enhancing Multilingual LLM Pretraining with Model-Based Data Selection | Bettina Messmer et.al. | 2502.10361 | null |
| 2025-02-14 | Organize the Web: Constructing Domains Enhances Pre-Training Data Curation | Alexander Wettig et.al. | 2502.10341 | null |
| 2025-02-14 | Evaluating the Meta- and Object-Level Reasoning of Large Language Models for Question Answering | Nick Ferguson et.al. | 2502.10338 | null |
| 2025-02-14 | LLM-Powered Preference Elicitation in Combinatorial Assignment | Ermis Soumalias et.al. | 2502.10308 | null |
| 2025-02-14 | Open-Source AI-Powered Optimization in Scalene: Advancing Python Performance Profiling with DeepSeek-R1 and LLaMA 3.2 | Saem Hasan et.al. | 2502.10299 | null |
| 2025-02-14 | Are Large Language Models the future crowd workers of Linguistics? | Iris Ferrazzo et.al. | 2502.10266 | null |
| 2025-02-14 | Large Language Models and Synthetic Data for Monitoring Dataset Mentions in Research Papers | Aivin V. Solatorio et.al. | 2502.10263 | link |
| 2025-02-14 | VisCon-100K: Leveraging Contextual Web Data for Fine-tuning Vision Language Models | Gokul Karthik Kumar et.al. | 2502.10250 | null |
| 2025-02-13 | Theoretical Benefit and Limitation of Diffusion Language Model | Guhao Feng et.al. | 2502.09622 | null |
| 2025-02-13 | MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency | Dongzhi Jiang et.al. | 2502.09621 | null |
| 2025-02-13 | Exploring the Potential of Encoder-free Architectures in 3D LMMs | Yiwen Tang et.al. | 2502.09620 | link |
| 2025-02-13 | Human-LLM Coevolution: Evidence from Academic Writing | Mingmeng Geng et.al. | 2502.09606 | null |
| 2025-02-13 | SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models | Yung-Sung Chuang et.al. | 2502.09604 | link |
| 2025-02-13 | GAIA: A Global, Multi-modal, Multi-scale Vision-Language Dataset for Remote Sensing Image Analysis | Angelos Zavras et.al. | 2502.09598 | link |
| 2025-02-13 | Do LLMs Recognize Your Preferences? Evaluating Personalized Preference Following in LLMs | Siyan Zhao et.al. | 2502.09597 | link |
| 2025-02-13 | KIMAs: A Configurable Knowledge Integrated Multi-Agent System | Zitao Li et.al. | 2502.09596 | null |
| 2025-02-13 | Logical forms complement probability in understanding language model (and human) performance | Yixuan Wang et.al. | 2502.09589 | null |
| 2025-02-13 | Polymind: Parallel Visual Diagramming with Large Language Models to Support Prewriting Through Microtasks | Qian Wan et.al. | 2502.09577 | null |
| 2025-02-12 | Examining Multilingual Embedding Models Cross-Lingually Through LLM-Generated Adversarial Examples | Andrianos Michail et.al. | 2502.08638 | null |
| 2025-02-12 | Ensemble based approach to quantifying uncertainty of LLM based classifications | Srijith Rajamohan et.al. | 2502.08631 | null |
| 2025-02-12 | Commercial LLM Agents Are Already Vulnerable to Simple Yet Dangerous Attacks | Ang Li et.al. | 2502.08586 | null |
| 2025-02-12 | QA-Expand: Multi-Question Answer Generation for Enhanced Query Expansion in Information Retrieval | Wonduk Seo et.al. | 2502.08557 | null |
| 2025-02-12 | Fostering Appropriate Reliance on Large Language Models: The Role of Explanations, Sources, and Inconsistencies | Sunnie S. Y. Kim et.al. | 2502.08554 | null |
| 2025-02-12 | LLMs can implicitly learn from mistakes in-context | Lisa Alazraki et.al. | 2502.08550 | null |
| 2025-02-12 | LLM Pretraining with Continuous Concepts | Jihoon Tack et.al. | 2502.08524 | link |
| 2025-02-12 | The Paradox of Stochasticity: Limited Creativity and Computational Decoupling in Temperature-Varied LLM Outputs of Structured Fictional Data | Evgenii Evstafev et.al. | 2502.08515 | null |
| 2025-02-12 | Faithful, Unfaithful or Ambiguous? Multi-Agent Debate with Initial Stance for Summary Evaluation | Mahnaz Koupaee et.al. | 2502.08514 | null |
| 2025-02-12 | Measuring Diversity in Synthetic Datasets | Yuchang Zhu et.al. | 2502.08512 | null |
| 2025-02-11 | DarwinLM: Evolutionary Structured Pruning of Large Language Models | Shengkun Tang et.al. | 2502.07780 | link |
| 2025-02-11 | Auditing Prompt Caching in Language Model APIs | Chenchen Gu et.al. | 2502.07776 | link |
| 2025-02-11 | Automatic Robot Task Planning by Integrating Large Language Model with Genetic Programming | Azizjon Kobilov et.al. | 2502.07772 | null |
| 2025-02-11 | Great Power Brings Great Responsibility: Personalizing Conversational AI for Diverse Problem-Solvers | Italo Santos et.al. | 2502.07763 | null |
| 2025-02-11 | Scalable Fingerprinting of Large Language Models | Anshul Nasery et.al. | 2502.07760 | null |
| 2025-02-11 | Towards Efficient Optimizer Design for LLM via Structured Fisher Approximation with a Low-Rank Extension | Wenbo Gong et.al. | 2502.07752 | null |
| 2025-02-11 | WHODUNIT: Evaluation benchmark for culprit detection in mystery stories | Kshitij Gupta et.al. | 2502.07747 | link |
| 2025-02-11 | The Economics of Large Language Models: Token Allocation, Fine-Tuning, and Optimal Pricing | Dirk Bergemann et.al. | 2502.07736 | null |
| 2025-02-11 | Economics of Sourcing Human Data | Sebastin Santy et.al. | 2502.07732 | null |
| 2025-02-11 | Verifying LLM-Generated Code in the Context of Software Verification with Ada/SPARK | Marcos Cramer et.al. | 2502.07728 | null |
| 2025-02-10 | Rationalization Models for Text-to-SQL | Gaetano Rossiello et.al. | 2502.06759 | null |
| 2025-02-10 | Gradient Multi-Normalization for Stateless and Scalable LLM Training | Meyer Scetbon et.al. | 2502.06742 | null |
| 2025-02-10 | VersaPRM: Multi-Domain Process Reward Model via Synthetic Reasoning Data | Thomas Zeng et.al. | 2502.06737 | null |
| 2025-02-10 | Dynamic Loss-Based Sample Reweighting for Improved Large Language Model Pretraining | Daouda Sow et.al. | 2502.06733 | null |
| 2025-02-10 | Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling | Runze Liu et.al. | 2502.06703 | link |
| 2025-02-10 | Boosting Self-Efficacy and Performance of Large Language Models via Verbal Efficacy Stimulations | Rui Chen et.al. | 2502.06669 | null |
| 2025-02-10 | Automatic Evaluation of Healthcare LLMs Beyond Question-Answering | Anna Arias-Duart et.al. | 2502.06666 | null |
| 2025-02-10 | On the Limitations of Combining Sentiment Analysis Tools in a Cross-Platform Setting | Martin Obaidi et.al. | 2502.06665 | null |
| 2025-02-10 | EfficientLLM: Scalable Pruning-Aware Pretraining for Architecture-Agnostic Edge Language Models | Xingrun Xing et.al. | 2502.06663 | link |
| 2025-02-10 | Unbiased Evaluation of Large Language Models from a Causal Perspective | Meilin Chen et.al. | 2502.06655 | null |
| 2025-02-07 | Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy | Yunhang Shen et.al. | 2502.05177 | link |
| 2025-02-07 | NoLiMa: Long-Context Evaluation Beyond Literal Matching | Ali Modarressi et.al. | 2502.05167 | link |
| 2025-02-07 | DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails | Yihe Deng et.al. | 2502.05163 | link |
| 2025-02-07 | A Lightweight Method to Disrupt Memorized Sequences in LLM | Parjanya Prajakta Prashant et.al. | 2502.05159 | null |
| 2025-02-07 | Hummingbird: High Fidelity Image Generation via Multimodal Context Alignment | Minh-Quan Le et.al. | 2502.05153 | null |
| 2025-02-07 | Transforming Science with Large Language Models: A Survey on AI-assisted Scientific Discovery, Experimentation, Content Generation, and Evaluation | Steffen Eger et.al. | 2502.05151 | link |
| 2025-02-07 | CodeSCM: Causal Analysis for Multi-Modal Code Generation | Mukur Gupta et.al. | 2502.05150 | null |
| 2025-02-07 | An Annotated Reading of ‘The Singer of Tales’ in the LLM Era | Kush R. Varshney et.al. | 2502.05148 | null |
| 2025-02-07 | Refining Integration-by-Parts Reduction of Feynman Integrals with Machine Learning | Matt von Hippel et.al. | 2502.05121 | null |
| 2025-02-07 | Flexible and Efficient Grammar-Constrained Decoding | Kanghee Park et.al. | 2502.05111 | null |
| 2025-02-06 | Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment | Zuyan Liu et.al. | 2502.04328 | null |
| 2025-02-06 | Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions | Yik Siu Chan et.al. | 2502.04322 | link |
| 2025-02-06 | ChamaleonLLM: Batch-Aware Dynamic Low-Rank Adaptation via Inference-Time Clusters | Kamer Ali Yuksel et.al. | 2502.04315 | null |
| 2025-02-06 | ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization | Yinjie Wang et.al. | 2502.04306 | link |
| 2025-02-06 | Beyond Prompt Content: Enhancing LLM Performance via Content-Format Integrated Prompt Optimization | Yuanye Liu et.al. | 2502.04295 | link |
| 2025-02-06 | PILAF: Optimal Human Preference Sampling for Reward Modeling | Yunzhen Feng et.al. | 2502.04270 | null |
| 2025-02-06 | How does a Multilingual LM Handle Multiple Languages? | Santhosh Kakarla et.al. | 2502.04269 | null |
| 2025-02-06 | Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion | Marco Mistretta et.al. | 2502.04263 | link |
| 2025-02-06 | TriNER: A Series of Named Entity Recognition Models For Hindi, Bengali & Marathi | Mohammed Amaan Dhamaskar et.al. | 2502.04245 | null |
| 2025-02-06 | MAGA: MAssive Genre-Audience Reformulation to Pretraining Corpus Expansion | Xintong Hao et.al. | 2502.04235 | null |
| 2025-02-05 | Do Large Language Model Benchmarks Test Reliability? | Joshua Vendrow et.al. | 2502.03461 | null |
| 2025-02-05 | Adapt-Pruner: Adaptive Structural Pruning for Efficient Small Language Model Training | Boyao Wang et.al. | 2502.03460 | null |
| 2025-02-05 | A Schema-Guided Reason-while-Retrieve framework for Reasoning on Scene Graphs with Large-Language-Models (LLMs) | Yiye Chen et.al. | 2502.03450 | null |
| 2025-02-05 | BFS-Prover: Scalable Best-First Tree Search for LLM-based Automatic Theorem Proving | Ran Xin et.al. | 2502.03438 | null |
| 2025-02-05 | On Fairness of Unified Multimodal Large Language Model for Image Generation | Ming Liu et.al. | 2502.03429 | null |
| 2025-02-05 | Harnessing Large Language Models for Curated Code Reviews | Oussama Ben Sghaier et.al. | 2502.03425 | null |
| 2025-02-05 | Investigating Corporate Social Responsibility Initiatives: Examining the case of corporate Covid-19 response | Meheli Basu et.al. | 2502.03421 | null |
| 2025-02-05 | Think or Step-by-Step? UnZIPping the Black Box in Zero-Shot Prompts | Nikta Gohari Sadr et.al. | 2502.03418 | null |
| 2025-02-05 | SPRI: Aligning Large Language Models with Context-Situated Principles | Hongli Zhan et.al. | 2502.03397 | null |
| 2025-02-05 | LIMO: Less is More for Reasoning | Yixin Ye et.al. | 2502.03387 | null |
| 2025-02-04 | COCONut-PanCap: Joint Panoptic Segmentation and Grounded Captions for Fine-Grained Understanding and Generation | Xueqing Deng et.al. | 2502.02589 | null |
| 2025-02-04 | A comparison of translation performance between DeepL and Supertext | Alex Flückiger et.al. | 2502.02577 | null |
| 2025-02-04 | Are Language Models Up to Sequential Optimization Problems? From Evaluation to a Hegelian-Inspired Enhancement | Soheil Abbasloo et.al. | 2502.02573 | null |
| 2025-02-04 | Learning the RoPEs: Better 2D and 3D Position Encodings with STRING | Connor Schenck et.al. | 2502.02562 | null |
| 2025-02-04 | LLMs for Generation of Architectural Components: An Exploratory Empirical Study in the Serverless World | Shrikara Arun et.al. | 2502.02539 | null |
| 2025-02-04 | Adaptive Self-improvement LLM Agentic System for ML Library Development | Genghan Zhang et.al. | 2502.02534 | null |
| 2025-02-04 | Multi-Agent Design: Optimizing Agents with Better Prompts and Topologies | Han Zhou et.al. | 2502.02533 | null |
| 2025-02-04 | Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search | Maohao Shen et.al. | 2502.02508 | null |
| 2025-02-04 | EasySpec: Layer-Parallel Speculative Decoding for Efficient Multi-GPU Utilization | Yize Wu et.al. | 2502.02493 | null |
| 2025-02-04 | Multilingual Machine Translation with Open Large Language Models at Practical Scale: An Empirical Study | Menglong Cui et.al. | 2502.02481 | null |
| 2025-01-31 | Vintix: Action Model via In-Context Reinforcement Learning | Andrey Polubarov et.al. | 2501.19400 | link |
| 2025-01-31 | Do LLMs Strategically Reveal, Conceal, and Infer Information? A Theoretical and Empirical Analysis in The Chameleon Game | Mustafa O. Karabag et.al. | 2501.19398 | link |
| 2025-01-31 | Cache Me If You Must: Adaptive Key-Value Quantization for Large Language Models | Alina Shutova et.al. | 2501.19392 | null |
| 2025-01-31 | Federated Sketching LoRA: On-Device Collaborative Fine-Tuning of Large Language Models | Wenzhi Fang et.al. | 2501.19389 | null |
| 2025-02-03 | SELMA: A Speech-Enabled Language Model for Virtual Assistant Interactions | Dominik Wagner et.al. | 2501.19377 | null |
| 2025-01-31 | We’re Different, We’re the Same: Creative Homogeneity Across LLMs | Emily Wenger et.al. | 2501.19361 | null |
| 2025-01-31 | Mechanical Properties of the Meninges: Large Language Model Assisted Systematic Review of over 25,000 Studies | Brandon P. Chelstrom et.al. | 2501.19359 | null |
| 2025-01-31 | The Energy Loss Phenomenon in RLHF: A New Perspective on Mitigating Reward Hacking | Yuchun Miao et.al. | 2501.19358 | null |
| 2025-01-31 | Do Large Multimodal Models Solve Caption Generation for Scientific Figures? Lessons Learned from SCICAP Challenge 2023 | Ting-Yao E. Hsu et.al. | 2501.19353 | null |
| 2025-01-31 | Towards Adaptive Self-Improvement for Smarter Energy Systems | Alexander Sommer et.al. | 2501.19340 | null |
| 2025-01-30 | Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs | Yue Wang et.al. | 2501.18585 | null |
| 2025-01-30 | Token-Hungry, Yet Precise: DeepSeek R1 Highlights the Need for Multi-Step Reasoning Over Speed in MATH | Evgenii Evstafev et.al. | 2501.18576 | null |
| 2025-01-30 | BounTCHA: A CAPTCHA Utilizing Boundary Identification in AI-extended Videos | Lehao Lin et.al. | 2501.18565 | null |
| 2025-01-30 | Semantic Web and Creative AI – A Technical Report from ISWS 2023 | Raia Abu Ahmad et.al. | 2501.18542 | null |
| 2025-01-30 | Illusions of Relevance: Using Content Injection Attacks to Deceive Retrievers, Rerankers, and LLM Judges | Manveer Singh Tamber et.al. | 2501.18536 | link |
| 2025-01-30 | Differentially Private Steering for Large Language Model Alignment | Anmol Goel et.al. | 2501.18532 | link |
| 2025-01-30 | Learn from the Past: Language-conditioned Object Rearrangement with Large Language Models | Guanqun Cao et.al. | 2501.18516 | null |
| 2025-01-30 | Streaming DiLoCo with overlapping communication: Towards a Distributed Free Lunch | Arthur Douillard et.al. | 2501.18512 | null |
| 2025-01-30 | CLEAR: Cue Learning using Evolution for Accurate Recognition Applied to Sustainability Data Extraction | Peter J. Bentley et.al. | 2501.18504 | null |
| 2025-01-30 | A Tool for In-depth Analysis of Code Execution Reasoning of Large Language Models | Changshu Liu et.al. | 2501.18482 | null |
| 2025-01-29 | Learning Beyond the Surface: How Far Can Continual Pre-Training with LoRA Enhance LLMs’ Domain-Specific Insight Learning? | Pouya Pezeshkpour et.al. | 2501.17840 | link |
| 2025-01-29 | Leveraging Multimodal LLM for Inspirational User Interface Search | Seokhyeon Park et.al. | 2501.17799 | link |
| 2025-01-29 | BreezyVoice: Adapting TTS for Taiwanese Mandarin with Enhanced Polyphone Disambiguation – Challenges and Insights | Chan-Jan Hsu et.al. | 2501.17790 | null |
| 2025-01-29 | AdditiveLLM: Large Language Models Predict Defects in Additive Manufacturing | Peter Pak et.al. | 2501.17784 | null |
| 2025-01-29 | 2SSP: A Two-Stage Framework for Structured Pruning of LLMs | Fabrizio Sandri et.al. | 2501.17771 | null |
| 2025-01-29 | Hybrid Graphs for Table-and-Text based Question Answering using LLMs | Ankush Agarwal et.al. | 2501.17767 | null |
| 2025-01-29 | On the Partitioning of GPU Power among Multi-Instances | Tirth Vamja et.al. | 2501.17752 | null |
| 2025-01-29 | Early External Safety Testing of OpenAI’s o3-mini: Insights from the Pre-Deployment Evaluation | Aitor Arrieta et.al. | 2501.17749 | null |
| 2025-01-29 | Using Code Generation to Solve Open Instances of Combinatorial Design Problems | Christopher D. Rosin et.al. | 2501.17725 | link |
| 2025-01-29 | RICoTA: Red-teaming of In-the-wild Conversation with Test Attempts | Eujeong Choi et.al. | 2501.17715 | link |
| 2025-01-28 | Cultural Differences and Perverse Incentives in Science Create a Bad Mix: Exploring Country-Level Publication Bias in Select ACM Conferences | Aksheytha Chelikavada et.al. | 2501.17150 | null |
| 2025-01-28 | FactCG: Enhancing Fact Checkers with Graph-Based Multi-Hop Data | Deren Lei et.al. | 2501.17144 | link |
| 2025-01-28 | ASTRAL: Automated Safety Testing of Large Language Models | Miriam Ugarte et.al. | 2501.17132 | null |
| 2025-01-28 | Optimizing Large Language Model Training Using FP4 Quantization | Ruizhe Wang et.al. | 2501.17116 | null |
| 2025-01-28 | Unlocking Transparent Alignment Through Enhanced Inverse Constitutional AI for Principle Extraction | Carl-Leander Henneking et.al. | 2501.17112 | null |
| 2025-01-28 | Token-by-Token Regeneration and Domain Biases: A Benchmark of LLMs on Advanced Mathematical Problem-Solving | Evgenii Evstafev et.al. | 2501.17084 | null |
| 2025-01-28 | Enhanced Retrieval of Long Documents: Leveraging Fine-Grained Block Representations with Large Language Models | Minghan Li et.al. | 2501.17039 | null |
| 2025-01-28 | Challenges in Ensuring AI Safety in DeepSeek-R1 Models: The Shortcomings of Reinforcement Learning Strategies | Manojkumar Parmar et.al. | 2501.17030 | null |
| 2025-01-28 | Automated Refactoring of Non-Idiomatic Python Code: A Differentiated Replication with LLMs | Alessandro Midolo et.al. | 2501.17024 | null |
| 2025-01-28 | Mobile Manipulation Instruction Generation from Multiple Images with Automatic Metric Enhancement | Kei Katsumata et.al. | 2501.17022 | null |
| 2025-01-27 | Evaluating The Performance of Using Large Language Models to Automate Summarization of CT Simulation Orders in Radiation Oncology | Meiyun Cao et.al. | 2501.16309 | null |
| 2025-01-27 | RAPID: Retrieval-Augmented Parallel Inference Drafting for Text-Based Video Event Retrieval | Long Nguyen et.al. | 2501.16303 | null |
| 2025-01-27 | Matryoshka Re-Ranker: A Flexible Re-Ranking Architecture With Configurable Depth and Width | Zheng Liu et.al. | 2501.16302 | null |
| 2025-01-27 | Large Models in Dialogue for Active Perception and Anomaly Detection | Tzoulio Chamiti et.al. | 2501.16300 | null |
| 2025-01-27 | FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers | Renshan Zhang et.al. | 2501.16297 | null |
| 2025-01-27 | Brain-Adapter: Enhancing Neurological Disorder Analysis with Adapter-Tuning Multimodal Large Language Models | Jing Zhang et.al. | 2501.16282 | null |
| 2025-01-27 | Do LLMs Have Visualization Literacy? An Evaluation on Modified Visualizations to Test Generalization in Data Interpretation | Jiayi Hong et.al. | 2501.16277 | null |
| 2025-01-27 | URAG: Implementing a Unified Hybrid RAG for Precise Answers in University Admission Chatbots – A Case Study at HCMUT | Long Nguyen et.al. | 2501.16276 | null |
| 2025-01-27 | A foundation model for human-AI collaboration in medical literature mining | Zifeng Wang et.al. | 2501.16255 | null |
| 2025-01-27 | Multi-Agent Geospatial Copilots for Remote Sensing Workflows | Chaehong Lee et.al. | 2501.16254 | null |
| 2025-01-24 | HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation | Xin Zhou et.al. | 2501.14729 | link |
| 2025-01-24 | Do LLMs Provide Consistent Answers to Health-Related Questions across Languages? | Ipek Baris Schlicht et.al. | 2501.14719 | null |
| 2025-01-24 | Towards Better Understanding Table Instruction Tuning: Decoupling the Effects from Data versus Models | Naihao Deng et.al. | 2501.14717 | null |
| 2025-01-24 | FlexiGPT: Pruning and Extending Large Language Models with Low-Rank Weight Sharing | James Seale Smith et.al. | 2501.14713 | null |
| 2025-01-24 | The Karp Dataset | Mason DiCicco et.al. | 2501.14705 | null |
| 2025-01-24 | Rethinking Table Instruction Tuning | Naihao Deng et.al. | 2501.14693 | null |
| 2025-01-24 | An Empirical Study on LLM-based Classification of Requirements-related Provisions in Food-safety Regulations | Shabnam Hassani et.al. | 2501.14683 | null |
| 2025-01-24 | Diffusion based Text-to-Music Generation with Global and Local Text based Conditioning | Jisi Zhang et.al. | 2501.14680 | null |
| 2025-01-24 | MedAgentBench: Dataset for Benchmarking LLMs as Agents in Medical Applications | Yixing Jiang et.al. | 2501.14654 | link |
| 2025-01-24 | Investigating the (De)Composition Capabilities of Large Language Models in Natural-to-Formal Language Conversion | Ziyao Xu et.al. | 2501.14649 | link |
| 2025-01-23 | CRPO: Confidence-Reward Driven Preference Optimization for Machine Translation | Guofeng Cui et.al. | 2501.13927 | null |
| 2025-01-23 | Analysis of Indic Language Capabilities in LLMs | Aatman Vaidya et.al. | 2501.13912 | null |
| 2025-01-23 | Privacy-Preserving Personalized Federated Prompt Learning for Multimodal Large Language Models | Linh Tran et.al. | 2501.13904 | null |
| 2025-01-23 | Exploring Finetuned Audio-LLM on Heart Murmur Features | Adrian Florea et.al. | 2501.13884 | null |
| 2025-01-23 | The machine learning platform for developers of large systems | Alexey Naikov et.al. | 2501.13881 | null |
| 2025-01-23 | A RAG-Based Institutional Assistant | Gustavo Kuratomi et.al. | 2501.13880 | null |
| 2025-01-23 | Large Vision-Language Models for Knowledge-Grounded Data Annotation of Memes | Shiling Deng et.al. | 2501.13851 | link |
| 2025-01-23 | On the Reasoning Capacity of AI Models and How to Quantify It | Santosh Kumar Radha et.al. | 2501.13833 | null |
| 2025-01-23 | Predicting Compact Phrasal Rewrites with Large Language Models for ASR Post Editing | Hao Zhang et.al. | 2501.13831 | null |
| 2025-01-23 | Hallucinations Can Improve Large Language Models in Drug Discovery | Shuzhou Yuan et.al. | 2501.13824 | null |
| 2025-01-22 | A Rate-Distortion Framework for Summarization | Enes Arda et.al. | 2501.13100 | null |
| 2025-01-22 | Refining Input Guardrails: Enhancing LLM-as-a-Judge Efficiency Through Chain-of-Thought Fine-Tuning and Alignment | Melissa Kazemi Rad et.al. | 2501.13080 | null |
| 2025-01-22 | Does Table Source Matter? Benchmarking and Improving Multimodal Scientific Table Understanding and Reasoning | Bohao Yang et.al. | 2501.13042 | link |
| 2025-01-22 | Pairwise RM: Perform Best-of-N Sampling with Knockout Tournament | Yantao Liu et.al. | 2501.13007 | link |
| 2025-01-22 | Large Language Model-Based Semantic Communication System for Image Transmission | Soheyb Ribouh et.al. | 2501.12988 | null |
| 2025-01-22 | LLM4WM: Adapting LLM for Wireless Multi-Tasking | Xuanyu Liu et.al. | 2501.12983 | null |
| 2025-01-22 | OnionEval: An Unified Evaluation of Fact-conflicting Hallucination for Small-Large Language Models | Chongren Sun et.al. | 2501.12975 | link |
| 2025-01-22 | Accessible Smart Contracts Verification: Synthesizing Formal Models with Tamed LLMs | Jan Corazza et.al. | 2501.12972 | null |
| 2025-01-22 | It’s complicated. The relationship of algorithmic fairness and non-discrimination regulations in the EU AI Act | Kristof Meding et.al. | 2501.12962 | null |
| 2025-01-22 | Efficient Prompt Compression with Evaluator Heads for Long-Context Transformer Inference | Weizhi Fei et.al. | 2501.12959 | null |
| 2025-01-21 | InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling | Yi Wang et.al. | 2501.12386 | link |
| 2025-01-21 | Expertise elevates AI usage: experimental evidence comparing laypeople and professional artists | Thomas F. Eisenmann et.al. | 2501.12374 | link |
| 2025-01-21 | Is Long Context All You Need? Leveraging LLM’s Extended Context for NL2SQL | Yeounoh Chung et.al. | 2501.12372 | null |
| 2025-01-21 | Automatic Labelling with Open-source LLMs using Dynamic Label Schema Integration | Thomas Walshe et.al. | 2501.12332 | null |
| 2025-01-21 | VARGPT: Unified Understanding and Generation in a Visual Autoregressive Multimodal Large Language Model | Xianwei Zhuang et.al. | 2501.12327 | link |
| 2025-01-21 | LLM-Assisted Knowledge Graph Completion for Curriculum and Domain Modelling in Personalized Higher Education Recommendations | Hasan Abu-Rasheed et.al. | 2501.12300 | null |
| 2025-01-21 | MoGERNN: An Inductive Traffic Predictor for Unobserved Locations in Dynamic Sensing Networks | Qishen Zhou et.al. | 2501.12281 | link |
| 2025-01-21 | Condor: Enhance LLM Alignment with Knowledge-Driven Data Synthesis and Refinement | Maosong Cao et.al. | 2501.12273 | null |
| 2025-01-21 | FOCUS: First Order Concentrated Updating Scheme | Yizhou Liu et.al. | 2501.12243 | null |
| 2025-01-21 | InsTALL: Context-aware Instructional Task Assistance with Multi-modal Large Language Models | Pha Nguyen et.al. | 2501.12231 | null |
| 2025-01-17 | FaceXBench: Evaluating Multimodal LLMs on Face Understanding | Kartik Narayan et.al. | 2501.10360 | link |
| 2025-01-17 | Agent4Edu: Generating Learner Response Data by Generative Agents for Intelligent Education Systems | Weibo Gao et.al. | 2501.10332 | null |
| 2025-01-17 | Large language models for automated scholarly paper review: A survey | Zhenzhen Zhuang et.al. | 2501.10326 | null |
| 2025-01-17 | HiMix: Reducing Computational Complexity in Large Vision-Language Models | Xuange Zhang et.al. | 2501.10318 | null |
| 2025-01-17 | Towards Preventing Overreliance on Task-Oriented Conversational AI Through Accountability Modeling | Suvodip Dey et.al. | 2501.10316 | link |
| 2025-01-17 | Addressing Popularity Bias in Third-Party Library Recommendations Using LLMs | Claudio Di Sipio et.al. | 2501.10313 | null |
| 2025-01-17 | Computational Protein Science in the Era of Large Language Models (LLMs) | Wenqi Fan et.al. | 2501.10282 | null |
| 2025-01-17 | Test Wars: A Comparative Study of SBST, Symbolic Execution, and LLM-Based Approaches to Unit Test Generation | Azat Abdullin et.al. | 2501.10200 | null |
| 2025-01-17 | Generative Artificial Intelligence: Implications for Biomedical and Health Professions Education | William Hersh et.al. | 2501.10186 | null |
| 2025-01-17 | Multi-stage Training of Bilingual Islamic LLM for Neural Passage Retrieval | Vera Pavlova et.al. | 2501.10175 | null |
| 2025-01-16 | Distilling Multi-modal Large Language Models for Autonomous Driving | Deepti Hegde et.al. | 2501.09757 | null |
| 2025-01-16 | Lost in Translation, Found in Context: Sign Language Translation with Contextual Cues | Youngjoon Jang et.al. | 2501.09754 | null |
| 2025-01-16 | OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking | Zekun Xi et.al. | 2501.09751 | null |
| 2025-01-16 | Enhancing Lexicon-Based Text Embeddings with Large Language Models | Yibin Lei et.al. | 2501.09749 | null |
| 2025-01-16 | Suggesting Code Edits in Interactive Machine Learning Notebooks Using Large Language Models | Bihui Jin et.al. | 2501.09745 | null |
| 2025-01-16 | KU AIGEN ICL EDI@BC8 Track 3: Advancing Phenotype Named Entity Recognition and Normalization for Dysmorphology Physical Examination Reports | Hajung Kim et.al. | 2501.09744 | null |
| 2025-01-16 | Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps | Nanye Ma et.al. | 2501.09732 | null |
| 2025-01-16 | A Simple Aerial Detection Baseline of Multimodal Language Models | Qingyun Li et.al. | 2501.09720 | link |
| 2025-01-16 | CyberMentor: AI Powered Learning Tool Platform to Address Diverse Student Needs in Cybersecurity Education | Tianyu Wang et.al. | 2501.09709 | null |
| 2025-01-16 | Domain Adaptation of Foundation LLMs for e-Commerce | Christian Herold et.al. | 2501.09706 | null |
| 2025-01-15 | Aegis2.0: A Diverse AI Safety Dataset and Risks Taxonomy for Alignment of LLM Guardrails | Shaona Ghosh et.al. | 2501.09004 | null |
| 2025-01-15 | Vision Foundation Models for Computed Tomography | Suraj Pai et.al. | 2501.09001 | null |
| 2025-01-15 | Development and Validation of the Provider Documentation Summarization Quality Instrument for Large Language Models | Emma Croxford et.al. | 2501.08977 | null |
| 2025-01-15 | Learning to Extract Cross-Domain Aspects and Understanding Sentiments Using Large Language Models | Karukriti Kaushik Ghosh et.al. | 2501.08974 | null |
| 2025-01-15 | Analyzing the Ethical Logic of Six Large Language Models | W. Russell Neuman et.al. | 2501.08951 | null |
| 2025-01-15 | Applying General Turn-taking Models to Conversational Human-Robot Interaction | Gabriel Skantze et.al. | 2501.08946 | null |
| 2025-01-15 | Disentangling Exploration of Large Language Models by Optimal Exploitation | Tim Grams et.al. | 2501.08925 | null |
| 2025-01-15 | GenAI Content Detection Task 3: Cross-Domain Machine-Generated Text Detection Challenge | Liam Dugan et.al. | 2501.08913 | null |
| 2025-01-15 | Leveraging Large Language Models as Knowledge-Driven Agents for Reliable Retrosynthesis Planning | Qinyu Ma et.al. | 2501.08897 | null |
| 2025-01-15 | XMusic: Towards a Generalized and Controllable Symbolic Music Generation Framework | Sida Tian et.al. | 2501.08809 | null |
| 2025-01-14 | PokerBench: Training Large Language Models to become Professional Poker Players | Richard Zhuang et.al. | 2501.08328 | link |
| 2025-01-14 | Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks | Miran Heo et.al. | 2501.08326 | null |
| 2025-01-14 | ADAM-1: AI and Bioinformatics for Alzheimer’s Detection and Microbiome-Clinical Data Integrations | Ziyuan Huang et.al. | 2501.08324 | null |
| 2025-01-14 | Exploring Robustness of Multilingual LLMs on Real-World Noisy Data | Amirhossein Aliakbarzadeh et.al. | 2501.08322 | link |
| 2025-01-14 | Enhancing Automated Interpretability with Output-Centric Feature Descriptions | Yoav Gur-Arieh et.al. | 2501.08319 | link |
| 2025-01-14 | HALoGEN: Fantastic LLM Hallucinations and Where to Find Them | Abhilasha Ravichander et.al. | 2501.08292 | null |
| 2025-01-14 | LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding | Hongyu Li et.al. | 2501.08282 | link |
| 2025-01-14 | Exploring Robustness of LLMs to Sociodemographically-Conditioned Paraphrasing | Pulkit Arora et.al. | 2501.08276 | null |
| 2025-01-14 | TriMod Fusion for Multimodal Named Entity Recognition in Social Media | Mosab Alfaqeeh et.al. | 2501.08267 | null |
| 2025-01-14 | Addressing the sustainable AI trilemma: a case study on LLM agents and RAG | Hui Wu et.al. | 2501.08262 | null |
| 2025-01-13 | Imagine while Reasoning in Space: Multimodal Visualization-of-Thought | Chengzu Li et.al. | 2501.07542 | null |
| 2025-01-13 | ML Mule: Mobile-Driven Context-Aware Collaborative Learning | Haoxiang Yu et.al. | 2501.07536 | null |
| 2025-01-13 | Investigating Large Language Models in Inferring Personality Traits from User Conversations | Jianfeng Zhu et.al. | 2501.07532 | null |
| 2025-01-13 | RadAlign: Advancing Radiology Report Generation with Vision-Language Concept Alignment | Difei Gu et.al. | 2501.07525 | link |
| 2025-01-13 | Parallel Key-Value Cache Fusion for Position Invariant RAG | Philhoon Oh et.al. | 2501.07523 | null |
| 2025-01-13 | Exploring and Mitigating Adversarial Manipulation of Voting-Based Leaderboards | Yangsibo Huang et.al. | 2501.07493 | null |
| 2025-01-13 | TiEBe: A Benchmark for Assessing the Current Knowledge of Large Language Models | Thales Sales Almeida et.al. | 2501.07482 | null |
| 2025-01-13 | A Survey of Embodied AI in Healthcare: Techniques, Applications, and Opportunities | Yihao Liu et.al. | 2501.07468 | null |
| 2025-01-13 | Understanding and Benchmarking Artificial Intelligence: OpenAI’s o3 Is Not AGI | Rolf Pfister et.al. | 2501.07458 | null |
| 2025-01-13 | Enhancing LLM’s Ability to Generate More Repository-Aware Unit Tests Through Precise Contextual Information Injection | Xin Yin et.al. | 2501.07425 | null |
| 2025-01-10 | LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs | Omkar Thawakar et.al. | 2501.06186 | link |
| 2025-01-10 | PEACE: Empowering Geologic Map Holistic Understanding with MLLMs | Yangyu Huang et.al. | 2501.06184 | null |
| 2025-01-10 | Multilingual Performance of a Multimodal Artificial Intelligence System on Multisubject Physics Concept Inventories | Gerd Kortemeyer et.al. | 2501.06143 | null |
| 2025-01-10 | Supervision policies can shape long-term risk management in general-purpose AI models | Manuel Cebrian et.al. | 2501.06137 | link |
| 2025-01-10 | Contextual ASR Error Handling with LLMs Augmentation for Goal-Oriented Conversational AI | Yuya Asano et.al. | 2501.06129 | null |
| 2025-01-10 | Fleurs-SLU: A Massively Multilingual Benchmark for Spoken Language Understanding | Fabian David Schmidt et.al. | 2501.06117 | link |
| 2025-01-10 | From Conversation to Automation: Leveraging Large Language Models to Analyze Strategies in Problem Solving Therapy | Elham Aghakhani et.al. | 2501.06101 | null |
| 2025-01-10 | How to Tune a Multilingual Encoder Model for Germanic Languages: A Study of PEFT, Full Fine-Tuning, and Language Adapters | Romina Oji et.al. | 2501.06025 | link |
| 2025-01-10 | Addressing speaker gender bias in large scale speech translation systems | Shubham Bansal et.al. | 2501.05989 | null |
| 2025-01-10 | Exploring LLMs for Automated Pre-Testing of Cross-Cultural Surveys | Divya Mani Adhikari et.al. | 2501.05985 | null |
| 2025-01-09 | ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding | Xingyu Fu et.al. | 2501.05452 | link |
| 2025-01-09 | Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark | Yunzhuo Hao et.al. | 2501.05444 | null |
| 2025-01-09 | A survey of textual cyber abuse detection using cutting-edge language models and large language models | Jose A. Diaz-Garcia et.al. | 2501.05443 | null |
| 2025-01-09 | Using LLMs to Infer Non-Binary COVID-19 Sentiments of Chinese Micro-bloggers | Jerry Chongyi Hu et.al. | 2501.05423 | null |
| 2025-01-09 | FairCode: Evaluating Social Bias of LLMs in Code Generation | Yongkang Du et.al. | 2501.05396 | link |
| 2025-01-09 | Large Physics Models: Towards a collaborative approach with Large Language Models and Foundation Models | Kristian G. Barman et.al. | 2501.05382 | null |
| 2025-01-09 | Accelerated Diffusion Models via Speculative Sampling | Valentin De Bortoli et.al. | 2501.05370 | null |
| 2025-01-09 | Stream Aligner: Efficient Sentence-Level Alignment via Distribution Induction | Hantao Lou et.al. | 2501.05336 | link |
| 2025-01-09 | “What’s Happening”- A Human-centered Multimodal Interpreter Explaining the Actions of Autonomous Vehicles | Xuewen Luo et.al. | 2501.05322 | null |
| 2025-01-09 | CallNavi: A Study and Challenge on Function Calling Routing and Invocation in Large Language Models | Yewei Song et.al. | 2501.05255 | null |
| 2025-01-08 | Re-ranking the Context for Multimodal Retrieval Augmented Generation | Matin Mortaheb et.al. | 2501.04695 | null |
| 2025-01-08 | URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics | Ruilin Luo et.al. | 2501.04686 | link |
| 2025-01-08 | Enhancing Financial VQA in Vision Language Models using Intermediate Structured Representations | Archita Srivastava et.al. | 2501.04675 | null |
| 2025-01-08 | Assessing Language Comprehension in Large Language Models Using Construction Grammar | Wesley Scivetti et.al. | 2501.04661 | null |
| 2025-01-08 | Multi-task retriever fine-tuning for domain-specific and efficient RAG | Patrice Béchard et.al. | 2501.04652 | null |
| 2025-01-08 | FlairGPT: Repurposing LLMs for Interior Designs | Gabrielle Littlefair et.al. | 2501.04648 | null |
| 2025-01-08 | Knowledge Retrieval Based on Generative AI | Te-Lun Yang et.al. | 2501.04635 | null |
| 2025-01-08 | “Can you be my mum?”: Manipulating Social Robots in the Large Language Models Era | Giulio Antonio Abbo et.al. | 2501.04633 | null |
| 2025-01-08 | Quantum-inspired Embeddings Projection and Similarity Metrics for Representation Learning | Ivan Kankeu et.al. | 2501.04591 | null |
| 2025-01-08 | InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection | Yuhang Liu et.al. | 2501.04575 | link |
| 2025-01-07 | Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos | Haobo Yuan et.al. | 2501.04001 | link |
| 2025-01-07 | RAG-Check: Evaluating Multimodal Retrieval Augmented Generation Performance | Matin Mortaheb et.al. | 2501.03995 | null |
| 2025-01-07 | Influences on LLM Calibration: A Study of Response Agreement, Loss Functions, and Prompt Styles | Yuxi Xia et.al. | 2501.03991 | null |
| 2025-01-07 | (De)-Indexing and the Right to be Forgotten | Salvatore Vilella et.al. | 2501.03989 | null |
| 2025-01-07 | VLM-driven Behavior Tree for Context-aware Task Planning | Naoki Wake et.al. | 2501.03968 | null |
| 2025-01-07 | Vision Language Models as Values Detectors | Giulio Antonio Abbo et.al. | 2501.03957 | null |
| 2025-01-07 | Localizing AI: Evaluating Open-Weight Language Models for Languages of Baltic States | Jurgita Kapočiūtė-Dzikienė et.al. | 2501.03952 | null |
| 2025-01-07 | Not all tokens are created equal: Perplexity Attention Weighted Networks for AI generated text detection | Pablo Miralles-González et.al. | 2501.03940 | null |
| 2025-01-07 | Visual question answering: from early developments to recent advances – a survey | Ngoc Dung Huynh et.al. | 2501.03939 | null |
| 2025-01-07 | Exploring the Potential of Large Language Models in Public Transportation: San Antonio Case Study | Ramya Jonnala et.al. | 2501.03904 | null |
| 2025-01-06 | BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning | Beichen Zhang et.al. | 2501.03226 | link |
| 2025-01-06 | Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation | Yuhui Zhang et.al. | 2501.03225 | link |
| 2025-01-06 | Leveraging Explainable AI for LLM Text Attribution: Differentiating Human-Written and Multiple LLMs-Generated Text | Ayat Najjar et.al. | 2501.03212 | null |
| 2025-01-06 | Detecting AI-Generated Text in Educational Content: Leveraging Machine Learning and Explainable AI for Academic Integrity | Ayat A. Najjar et.al. | 2501.03203 | null |
| 2025-01-06 | CLIX: Cross-Lingual Explanations of Idiomatic Expressions | Aaron Gluck et.al. | 2501.03191 | null |
| 2025-01-06 | GLiREL – Generalist Model for Zero-Shot Relation Extraction | Jack Boylan et.al. | 2501.03172 | null |
| 2025-01-06 | Semantic Captioning: Benchmark Dataset and Graph-Aware Few-Shot In-Context Learning for SQL2Text | Ali Al-Lawati et.al. | 2501.03166 | link |
| 2025-01-06 | Large language models for artificial general intelligence (AGI): A survey of foundational principles and approaches | Alhassan Mumuni et.al. | 2501.03151 | null |
| 2025-01-06 | VicSim: Enhancing Victim Simulation with Emotional and Linguistic Fidelity | Yerong Li et.al. | 2501.03139 | null |
| 2025-01-06 | PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models | Mingyang Song et.al. | 2501.03124 | link |
| 2025-01-03 | VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction | Chaoyou Fu et.al. | 2501.01957 | link |
| 2025-01-03 | Cold-Start Recommendation towards the Era of Large Language Models (LLMs): A Comprehensive Survey and Roadmap | Weizhi Zhang et.al. | 2501.01945 | null |
| 2025-01-03 | Abstractive Text Summarization for Contemporary Sanskrit Prose: Issues and Challenges | Shagun Sinha et.al. | 2501.01933 | null |
| 2025-01-03 | Mitigating Hallucination for Large Vision Language Model by Inter-Modality Correlation Calibration Decoding | Jiaming Li et.al. | 2501.01926 | null |
| 2025-01-03 | Virgo: A Preliminary Exploration on Reproducing o1-like MLLM | Yifan Du et.al. | 2501.01904 | link |
| 2025-01-03 | Turning Logic Against Itself : Probing Model Defenses Through Contrastive Questions | Rachneet Sachdeva et.al. | 2501.01872 | link |
| 2025-01-03 | Multi-Agent Conversational Online Learning for Adaptive LLM Response Identification | Xiangxiang Dai et.al. | 2501.01849 | null |
| 2025-01-03 | MoColl: Agent-Based Specific and General Model Collaboration for Image Captioning | Pu Yang et.al. | 2501.01834 | null |
| 2025-01-03 | Time Series Language Model for Descriptive Caption Generation | Mohamed Trabelsi et.al. | 2501.01832 | null |
| 2025-01-03 | Auto-RT: Automatic Jailbreak Strategy Exploration for Red-Teaming Large Language Models | Yanjiang Liu et.al. | 2501.01830 | null |
| 2025-01-02 | Unifying Specialized Visual Encoders for Video Language Models | Jihoon Chung et.al. | 2501.01426 | link |
| 2025-01-02 | Multi-Modal Video Feature Extraction for Popularity Prediction | Haixu Liu et.al. | 2501.01422 | null |
| 2025-01-02 | Deep Discrete Encoders: Identifiable Deep Generative Models for Rich Data with Discrete Latent Layers | Seunghyun Lee et.al. | 2501.01414 | null |
| 2025-01-02 | OmniChat: Enhancing Spoken Dialogue Systems with Scalable Synthetic Data for Diverse Scenarios | Xize Cheng et.al. | 2501.01384 | null |
| 2025-01-02 | CLIP-UP: CLIP-Based Unanswerable Problem Detection for Visual Question Answering | Ben Vardi et.al. | 2501.01371 | null |
| 2025-01-02 | Embedding-based Approaches to Hyperpartisan News Detection | Karthik Mohan et.al. | 2501.01370 | null |
| 2025-01-02 | Aligning Large Language Models for Faithful Integrity Against Opposing Argument | Yong Zhao et.al. | 2501.01336 | null |
| 2025-01-02 | CySecBench: Generative AI-based CyberSecurity-focused Prompt Dataset for Benchmarking Large Language Models | Johan Wahréus et.al. | 2501.01335 | link |
| 2025-01-02 | Decoding Knowledge in Large Language Models: A Framework for Categorization and Comprehension | Yanbo Fang et.al. | 2501.01332 | null |
| 2025-01-02 | The Prompt Alchemist: Automated LLM-Tailored Prompt Optimization for Test Case Generation | Shuzheng Gao et.al. | 2501.01329 | null |
| 2024-12-30 | Distributed Mixture-of-Agents for Edge Inference with Large Language Models | Purbesh Mitra et.al. | 2412.21200 | link |
| 2024-12-31 | HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation | Zhaojian Yu et.al. | 2412.21199 | link |
| 2024-12-30 | Facilitating large language model Russian adaptation with Learned Embedding Propagation | Mikhail Tikhomirov et.al. | 2412.21140 | link |
| 2024-12-30 | ExpShield: Safeguarding Web Text from Unauthorized Crawling and Language Modeling Exploitation | Ruixuan Liu et.al. | 2412.21123 | null |
| 2024-12-30 | Toward Intelligent and Secure Cloud: Large Language Model Empowered Proactive Defense | Yuyang Zhou et.al. | 2412.21051 | link |
| 2024-12-30 | TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization | Chia-Yu Hung et.al. | 2412.21037 | link |
| 2024-12-30 | GePBench: Evaluating Fundamental Geometric Perception for Multimodal Large Language Models | Shangyu Xing et.al. | 2412.21036 | null |
| 2024-12-30 | Automated Robustness Testing for LLM-based NLP Software | Mingxuan Xiao et.al. | 2412.21016 | link |
| 2024-12-30 | MapQaTor: A System for Efficient Annotation of Map Query Datasets | Mahir Labib Dihan et.al. | 2412.21015 | link |
| 2024-12-31 | Verbosity-Aware Rationale Reduction: Effective Reduction of Redundant Rationale via Principled Criteria | Joonwon Jang et.al. | 2412.21006 | null |
| 2024-12-27 | Can AI Help with Your Personal Finances? | Oudom Hean et.al. | 2412.19784 | null |
| 2024-12-27 | Machine Learning for Sentiment Analysis of Imported Food in Trinidad and Tobago | Cassandra Daniels et.al. | 2412.19781 | null |
| 2024-12-27 | Fortran2CPP: Automating Fortran-to-C++ Migration using LLMs via Multi-Turn Dialogue and Dual-Agent Integration | Le Chen et.al. | 2412.19770 | link |
| 2024-12-27 | Can Large Language Models Adapt to Other Agents In-Context? | Matthew Riemer et.al. | 2412.19726 | null |
| 2024-12-27 | Text2Insight: Transform natural language text into insights seamlessly using multi-model architecture | Pradeep Sain et.al. | 2412.19718 | null |
| 2024-12-27 | Toward Adaptive Reasoning in Large Language Models with Thought Rollback | Sijia Chen et.al. | 2412.19707 | link |
| 2024-12-27 | A Large-scale Interpretable Multi-modality Benchmark for Facial Image Forgery Localization | Jingchun Lian et.al. | 2412.19685 | null |
| 2024-12-27 | Boosting Private Domain Understanding of Efficient MLLMs: A Tuning-free, Adaptive, Universal Prompt Optimization Framework | Jiang Liu et.al. | 2412.19684 | null |
| 2024-12-27 | CAD-GPT: Synthesising CAD Construction Sequence with Spatial Reasoning-Enhanced Multimodal LLMs | Siyu Wang et.al. | 2412.19663 | link |
| 2024-12-27 | FreStega: A Plug-and-Play Method for Boosting Imperceptibility and Capacity in Generative Linguistic Steganography for Real-World Scenarios | Kaiyi Pang et.al. | 2412.19652 | null |
| 2024-12-24 | Decentralized Intelligence in GameFi: Embodied AI Agents and the Convergence of DeFi and Virtual Ecosystems | Fernando Jia et.al. | 2412.18601 | link |
| 2024-12-24 | A Paragraph is All It Takes: Rich Robot Behaviors from Interacting, Trusted LLMs | OpenMind et.al. | 2412.18588 | null |
| 2024-12-24 | Exploring Embedding Priors in Prompt-Tuning for Improved Interpretability and Control | Sergey Sedov et.al. | 2412.18582 | null |
| 2024-12-24 | Zero-resource Speech Translation and Recognition with LLMs | Karel Mundnich et.al. | 2412.18566 | null |
| 2024-12-24 | Distilling Fine-grained Sentiment Understanding from Large Language Models | Yice Zhang et.al. | 2412.18552 | link |
| 2024-12-24 | Token-Budget-Aware LLM Reasoning | Tingxu Han et.al. | 2412.18547 | link |
| 2024-12-24 | PLD-Tree: Persistent Laplacian Decision Tree for Protein-Protein Binding Free Energy Prediction | Xingjian Xu et.al. | 2412.18541 | null |
| 2024-12-24 | Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation | Derong Xu et.al. | 2412.18537 | link |
| 2024-12-24 | Automated Code Review In Practice | Umut Cihan et.al. | 2412.18531 | null |
| 2024-12-24 | Large Language Model guided Deep Reinforcement Learning for Decision Making in Autonomous Driving | Hao Pang et.al. | 2412.18511 | null |
| 2024-12-23 | ChatGarment: Garment Estimation, Generation and Editing via Large Language Models | Siyuan Bian et.al. | 2412.17811 | null |
| 2024-12-23 | Cross-Lingual Text-Rich Visual Comprehension: An Information Theory Perspective | Xinmiao Yu et.al. | 2412.17787 | null |
| 2024-12-23 | ResearchTown: Simulator of Human Research Community | Haofei Yu et.al. | 2412.17767 | link |
| 2024-12-23 | Survey of Large Multimodal Model Datasets, Application Categories and Taxonomy | Priyaranjan Pattnayak et.al. | 2412.17759 | null |
| 2024-12-23 | ADC: Enhancing Function Calling Via Adversarial Datasets and Code Line-Level Feedback | Wei Zhang et.al. | 2412.17754 | null |
| 2024-12-23 | Deliberation in Latent Space via Differentiable Cache Augmentation | Luyang Liu et.al. | 2412.17747 | null |
| 2024-12-23 | YuLan-Mini: An Open Data-efficient Language Model | Yiwen Hu et.al. | 2412.17743 | link |
| 2024-12-23 | Reasoning to Attend: Try to Understand How | Rui Qian et.al. | 2412.17741 | link |
| 2024-12-23 | Knowledge Editing through Chain-of-Thought | Changyue Wang et.al. | 2412.17727 | link |
| 2024-12-23 | Understanding the Logic of Direct Preference Alignment through Logic | Kyle Richardson et.al. | 2412.17696 | null |
| 2024-12-20 | HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding | Chenxin Tao et.al. | 2412.16158 | null |
| 2024-12-20 | Offline Reinforcement Learning for LLM Multi-Step Reasoning | Huaijie Wang et.al. | 2412.16145 | link |
| 2024-12-20 | Can LLMs Obfuscate Code? A Systematic Analysis of Large Language Models into Assembly Code Obfuscation | Seyedreza Mohseni et.al. | 2412.16135 | link |
| 2024-12-20 | Data-Driven Mechanism Design: Jointly Eliciting Preferences and Information | Dirk Bergemann et.al. | 2412.16132 | null |
| 2024-12-20 | PromptOptMe: Error-Aware Prompt Compression for LLM-based MT Evaluation Metrics | Daniil Larionov et.al. | 2412.16120 | null |
| 2024-12-20 | Deciphering the Underserved: Benchmarking LLM OCR for Low-Resource Scripts | Muhammad Abdullah Sohail et.al. | 2412.16119 | link |
| 2024-12-20 | PruneVid: Visual Token Pruning for Efficient Video Large Language Models | Xiaohu Huang et.al. | 2412.16117 | link |
| 2024-12-20 | The Content Moderator’s Dilemma: Removal of Toxic Content and Distortions to Online Discourse | Mahyar Habibi et.al. | 2412.16114 | null |
| 2024-12-20 | Logical Consistency of Large Language Models in Fact-checking | Bishwamittra Ghosh et.al. | 2412.16100 | null |
| 2024-12-20 | The Evolution of LLM Adoption in Industry Data Curation Practices | Crystal Qian et.al. | 2412.16089 | null |
| 2024-12-19 | UIP2P: Unsupervised Instruction-based Image Editing via Cycle Edit Consistency | Enis Simsar et.al. | 2412.15216 | null |
| 2024-12-19 | Flowing from Words to Pixels: A Framework for Cross-Modality Evolution | Qihao Liu et.al. | 2412.15213 | null |
| 2024-12-19 | OpenEMMA: Open-Source Multimodal Model for End-to-End Autonomous Driving | Shuo Xing et.al. | 2412.15208 | link |
| 2024-12-19 | AutoTrust: Benchmarking Trustworthiness in Large Vision Language Models for Autonomous Driving | Shuo Xing et.al. | 2412.15206 | link |
| 2024-12-19 | MMLU-CF: A Contamination-free Multi-task Language Understanding Benchmark | Qihao Zhao et.al. | 2412.15194 | link |
| 2024-12-19 | LlamaFusion: Adapting Pretrained Language Models for Multimodal Generation | Weijia Shi et.al. | 2412.15188 | null |
| 2024-12-19 | Data for Mathematical Copilots: Better Ways of Presenting Proofs for Machine Learning | Simon Frieder et.al. | 2412.15184 | null |
| 2024-12-19 | HPC-Coder-V2: Studying Code LLMs Across Low-Resource Parallel Languages | Aman Chaturvedi et.al. | 2412.15178 | null |
| 2024-12-19 | Critical-Questions-of-Thought: Steering LLM reasoning with Argumentative Querying | Federico Castagna et.al. | 2412.15177 | link |
| 2024-12-19 | Rethinking Uncertainty Estimation in Natural Language Generation | Lukas Aichberger et.al. | 2412.15176 | null |
| 2024-12-18 | Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces | Jihan Yang et.al. | 2412.14171 | link |
| 2024-12-18 | TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks | Frank F. Xu et.al. | 2412.14161 | link |
| 2024-12-18 | Advanced Reasoning and Transformation Engine for Multi-Step Insight Synthesis in Data Analytics with Large Language Models | Atin Sakkeer Hussain et.al. | 2412.14146 | null |
| 2024-12-18 | LLMs can realize combinatorial creativity: generating creative ideas via LLMs for scientific research | Tianyang Gu et.al. | 2412.14141 | null |
| 2024-12-18 | Design choices made by LLM-based test generators prevent them from finding bugs | Noble Saji Mathews et.al. | 2412.14137 | null |
| 2024-12-18 | Adversarial Hubness in Multi-Modal Retrieval | Tingwei Zhang et.al. | 2412.14113 | link |
| 2024-12-18 | Alignment faking in large language models | Ryan Greenblatt et.al. | 2412.14093 | link |
| 2024-12-18 | Future Research Avenues for Artificial Intelligence in Digital Gaming: An Exploratory Report | Markus Dablander et.al. | 2412.14085 | null |
| 2024-12-18 | Rango: Adaptive Retrieval-Augmented Proving for Automated Software Verification | Kyle Thompson et.al. | 2412.14063 | null |
| 2024-12-18 | Understanding and Evaluating Trust in Generative AI and Large Language Models for Spreadsheets | Simon Thorne et.al. | 2412.14062 | null |
| 2024-12-17 | SafeAgentBench: A Benchmark for Safe Task Planning of Embodied LLM Agents | Sheng Yin et.al. | 2412.13178 | link |
| 2024-12-17 | DnDScore: Decontextualization and Decomposition for Factuality Verification in Long-Form Text Generation | Miriam Wanner et.al. | 2412.13175 | null |
| 2024-12-17 | Algorithmic Fidelity of Large Language Models in Generating Synthetic German Public Opinions: A Case Study | Bolei Ma et.al. | 2412.13169 | link |
| 2024-12-17 | C-FedRAG: A Confidential Federated Retrieval-Augmented Generation System | Parker Addison et.al. | 2412.13163 | null |
| 2024-12-17 | BanglishRev: A Large-Scale Bangla-English and Code-mixed Dataset of Product Reviews in E-Commerce | Mohammad Nazmush Shamael et.al. | 2412.13161 | null |
| 2024-12-17 | SWAN: Preprocessing SGD Enables Adam-Level Performance On LLM Training With Significant Memory Reduction | Chao Ma et.al. | 2412.13148 | null |
| 2024-12-17 | Are Your LLMs Capable of Stable Reasoning? | Junnan Liu et.al. | 2412.13147 | link |
| 2024-12-17 | AI PERSONA: Towards Life-long Personalization of LLMs | Tiannan Wang et.al. | 2412.13103 | null |
| 2024-12-17 | AIR-Bench: Automated Heterogeneous Information Retrieval Benchmark | Jianlyu Chen et.al. | 2412.13102 | link |
| 2024-12-17 | Modality-Inconsistent Continual Learning of Multimodal Large Language Models | Weiguo Pian et.al. | 2412.13050 | null |
| 2024-12-16 | SepLLM: Accelerate Large Language Models by Compressing One Segment into One Separator | Guoxuan Chen et.al. | 2412.12094 | link |
| 2024-12-16 | Instruction-based Image Manipulation by Watching How Things Move | Mingdeng Cao et.al. | 2412.12087 | null |
| 2024-12-16 | CPath-Omni: A Unified Multimodal Foundation Model for Patch and Whole Slide Image Analysis in Computational Pathology | Yuxuan Sun et.al. | 2412.12077 | null |
| 2024-12-16 | CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding | Guo Chen et.al. | 2412.12075 | null |
| 2024-12-16 | Making FETCH! Happen: Finding Emergent Dog Whistles Through Common Habitats | Kuleen Sasse et.al. | 2412.12072 | link |
| 2024-12-16 | How Private are Language Models in Abstractive Summarization? | Anthony Hughes et.al. | 2412.12040 | null |
| 2024-12-16 | Can LLM Prompting Serve as a Proxy for Static Analysis in Vulnerability Detection | Ira Ceka et.al. | 2412.12039 | null |
| 2024-12-16 | SpeechPrune: Context-aware Token Pruning for Speech Information Retrieval | Yueqian Lin et.al. | 2412.12009 | null |
| 2024-12-16 | Agentic AI-Driven Technical Troubleshooting for Enterprise Systems: A Novel Weighted Retrieval-Augmented Generation Paradigm | Rajat Khanda et.al. | 2412.12006 | null |
| 2024-12-16 | The Open Source Advantage in Large Language Models (LLMs) | Jiya Manchanda et.al. | 2412.12004 | null |
| 2024-12-13 | UniMed-CLIP: Towards a Unified Image-Text Pretraining Paradigm for Diverse Medical Imaging Modalities | Muhammad Uzair Khattak et.al. | 2412.10372 | link |
| 2024-12-13 | Robust image classification with multi-modal large language models | Francesco Villani et.al. | 2412.10353 | null |
| 2024-12-13 | COMET: Benchmark for Comprehensive Biological Multi-omics Evaluation Tasks and Language Models | Yuchen Ren et.al. | 2412.10347 | null |
| 2024-12-13 | Iris: Breaking GUI Complexity with Adaptive Focus and Self-Refining | Zhiqi Ge et.al. | 2412.10342 | null |
| 2024-12-13 | AdvPrefix: An Objective for Nuanced LLM Jailbreaks | Sicheng Zhu et.al. | 2412.10321 | null |
| 2024-12-13 | BrushEdit: All-In-One Image Inpainting and Editing | Yaowei Li et.al. | 2412.10316 | link |
| 2024-12-13 | DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding | Zhiyu Wu et.al. | 2412.10302 | link |
| 2024-12-13 | Buzz to Broadcast: Predicting Sports Viewership Using Social Media Engagement | Anakin Trotter et.al. | 2412.10298 | link |
| 2024-12-13 | Still “Talking About Large Language Models”: Some Clarifications | Murray Shanahan et.al. | 2412.10291 | null |
| 2024-12-13 | One world, one opinion? The superstar effect in LLM responses | Sofie Goethals et.al. | 2412.10281 | null |
| 2024-12-12 | Doe-1: Closed-Loop Autonomous Driving with Large World Model | Wenzhao Zheng et.al. | 2412.09627 | link |
| 2024-12-12 | EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM | Zhuofan Zong et.al. | 2412.09618 | null |
| 2024-12-12 | Olympus: A Universal Task Router for Computer Vision Tasks | Yuanze Lin et.al. | 2412.09612 | link |
| 2024-12-12 | SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding | Hao Li et.al. | 2412.09604 | null |
| 2024-12-12 | Do Multimodal Large Language Models See Like Humans? | Jiaying Lin et.al. | 2412.09603 | null |
| 2024-12-12 | InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions | Pan Zhang et.al. | 2412.09596 | link |
| 2024-12-12 | OpenNER 1.0: Standardized Open-Access Named Entity Recognition Datasets in 50+ Languages | Chester Palen-Michel et.al. | 2412.09587 | null |
| 2024-12-12 | DISHONEST: Dissecting misInformation Spread using Homogeneous sOcial NEtworks and Semantic Topic classification | Caleb Stam et.al. | 2412.09578 | null |
| 2024-12-12 | DiverseAgentEntropy: Quantifying Black-Box LLM Uncertainty through Diverse Perspectives and Multi-Agent Interaction | Yu Feng et.al. | 2412.09572 | null |
| 2024-12-12 | Does Representation Matter? Exploring Intermediate Layers in Large Language Models | Oscar Skean et.al. | 2412.09563 | null |
| 2024-12-11 | Generative Semantic Communication: Architectures, Technologies, and Applications | Jinke Ren et.al. | 2412.08642 | null |
| 2024-12-11 | Fast Prompt Alignment for Text-to-Image Generation | Khalil Mrini et.al. | 2412.08639 | link |
| 2024-12-11 | Multimodal Latent Language Modeling with Next-Token Diffusion | Yutao Sun et.al. | 2412.08635 | null |
| 2024-12-11 | Synthetic Vision: Training Vision-Language Models to Understand Physics | Vahid Balazadeh et.al. | 2412.08619 | null |
| 2024-12-11 | Image Retrieval Methods in the Dissimilarity Space | Madhu Kiran et.al. | 2412.08618 | null |
| 2024-12-11 | Exploiting the Index Gradients for Optimization-Based Jailbreaking on Large Language Models | Jiahui Li et.al. | 2412.08615 | link |
| 2024-12-11 | Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning | Fan Lu et.al. | 2412.08614 | link |
| 2024-12-11 | Preference Discerning with LLM-Enhanced Generative Retrieval | Fabian Paischer et.al. | 2412.08604 | null |
| 2024-12-11 | Empirical Measurements of AI Training Power Demand on a GPU-Accelerated Node | Imran Latif et.al. | 2412.08602 | null |
| 2024-12-11 | Leveraging Graph-RAG and Prompt Engineering to Enhance LLM-Based Automated Requirement Traceability and Compliance Checks | Arsalan Masoudifard et.al. | 2412.08593 | null |
| 2024-12-10 | BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities | Sahal Shaji Mullappilly et.al. | 2412.07769 | null |
| 2024-12-10 | Bayesian Optimization of Antibodies Informed by a Generative Model of Evolving Sequences | Alan Nawzad Amin et.al. | 2412.07763 | link |
| 2024-12-10 | Zero-Shot ATC Coding with Large Language Models for Clinical Assessments | Zijian Chen et.al. | 2412.07743 | null |
| 2024-12-10 | Image Retrieval with Intra-Sweep Representation Learning for Neck Ultrasound Scanning Guidance | Wanwen Chen et.al. | 2412.07741 | null |
| 2024-12-10 | Granite Guardian | Inkit Padhi et.al. | 2412.07724 | link |
| 2024-12-10 | DriveMM: All-in-One Large Multimodal Model for Autonomous Driving | Zhijian Huang et.al. | 2412.07689 | link |
| 2024-12-10 | Privacy-Preserving Customer Support: A Framework for Secure and Scalable Interactions | Anant Prakash Awasthi et.al. | 2412.07687 | null |
| 2024-12-10 | TRIM: Token Reduction and Inference Modeling for Cost-Effective Language Generation | Alfredo Garrachón Ruiz et.al. | 2412.07682 | null |
| 2024-12-10 | Ask Humans or AI? Exploring Their Roles in Visualization Troubleshooting | Shuyu Shen et.al. | 2412.07673 | null |
| 2024-12-10 | FlexLLM: Exploring LLM Customization for Moving Target Defense on Black-Box LLMs Against Jailbreak Attacks | Bocheng Chen et.al. | 2412.07672 | null |
| 2024-12-09 | Training Large Language Models to Reason in a Continuous Latent Space | Shibo Hao et.al. | 2412.06769 | null |
| 2024-12-09 | Why Do Developers Engage with ChatGPT in Issue-Tracker? Investigating Usage and Reliance on ChatGPT-Generated Code | Joy Krishan Das et.al. | 2412.06757 | null |
| 2024-12-09 | Refusal Tokens: A Simple Way to Calibrate Refusals in Large Language Models | Neel Jain et.al. | 2412.06748 | null |
| 2024-12-09 | JAPAGEN: Efficient Few/Zero-shot Learning via Japanese Training Dataset Generation with LLM | Takuro Fujii et.al. | 2412.06738 | null |
| 2024-12-09 | AutoDCWorkflow: LLM-based Data Cleaning Workflow Auto-Generation and Benchmark | Lan Li et.al. | 2412.06724 | null |
| 2024-12-09 | DEEPER: Dense Electroencephalography Passage Retrieval | Niall McGuire et.al. | 2412.06695 | null |
| 2024-12-09 | OmniEvalKit: A Modular, Lightweight Toolbox for Evaluating Large Language Model and its Omni-Extensions | Yi-Kai Zhang et.al. | 2412.06693 | null |
| 2024-12-09 | Exploring Critical Testing Scenarios for Decision-Making Policies: An LLM Approach | Weichao Xu et.al. | 2412.06684 | null |
| 2024-12-09 | Toward LLM-Agent-Based Modeling of Transportation Systems: A Conceptual Framework | Tianming Liu et.al. | 2412.06681 | null |
| 2024-12-09 | I Don’t Know: Explicit Modeling of Uncertainty with an [IDK] Token | Roi Cohen et.al. | 2412.06676 | null |
| 2024-12-06 | Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling | Zhe Chen et.al. | 2412.05271 | null |
| 2024-12-06 | APOLLO: SGD-like Memory, AdamW-level Performance | Hanqing Zhu et.al. | 2412.05270 | link |
| 2024-12-06 | CompCap: Improving Multimodal Large Language Models with Composite Captions | Xiaohui Chen et.al. | 2412.05243 | null |
| 2024-12-06 | MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale | Jarvis Guo et.al. | 2412.05237 | link |
| 2024-12-06 | BEExformer: A Fast Inferencing Transformer Architecture via Binarization with Multiple Early Exits | Wazib Ansar et.al. | 2412.05225 | null |
| 2024-12-06 | 100% Hallucination Elimination Using Acurai | Michael C. Wood et.al. | 2412.05223 | null |
| 2024-12-06 | Evaluating and Aligning CodeLLMs on Human Preference | Jian Yang et.al. | 2412.05210 | link |
| 2024-12-06 | A Survey of Large Language Model-Based Generative AI for Text-to-SQL: Benchmarks, Applications, Use Cases, and Challenges | Aditi Singh et.al. | 2412.05208 | null |
| 2024-12-06 | Are Frontier Large Language Models Suitable for Q&A in Science Centres? | Jacob Watson et.al. | 2412.05200 | null |
| 2024-12-06 | SurgBox: Agent-Driven Operating Room Sandbox with Surgery Copilot | Jinlin Wu et.al. | 2412.05187 | link |
| 2024-12-05 | p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay | Jun Zhang et.al. | 2412.04449 | link |
| 2024-12-05 | EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios | Lu Qiu et.al. | 2412.04447 | null |
| 2024-12-05 | Moto: Latent Motion Token as the Bridging Language for Robot Manipulation | Yi Chen et.al. | 2412.04445 | link |
| 2024-12-05 | Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation | Yuying Ge et.al. | 2412.04432 | link |
| 2024-12-05 | Grounding Descriptions in Images informs Zero-Shot Visual Recognition | Shaunak Halbe et.al. | 2412.04429 | link |
| 2024-12-05 | Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion | Jiuhai Chen et.al. | 2412.04424 | link |
| 2024-12-05 | Targeting the Core: A Simple and Effective Method to Attack RAG-based Agents via Direct LLM Manipulation | Xuying Li et.al. | 2412.04415 | null |
| 2024-12-05 | Retrieval-Augmented Machine Translation with Unstructured Knowledge | Jiaan Wang et.al. | 2412.04342 | link |
| 2024-12-05 | Liquid: Language Models are Scalable Multi-modal Generators | Junfeng Wu et.al. | 2412.04332 | link |
| 2024-12-05 | The Hyperfitting Phenomenon: Sharpening and Stabilizing LLMs for Open-Ended Text Generation | Fredrik Carlsson et.al. | 2412.04318 | null |
| 2024-12-04 | From Individual to Society: A Survey on Social Simulation Driven by Large Language Model-based Agents | Xinyi Mou et.al. | 2412.03563 | link |
| 2024-12-04 | SPICE: Smart Projection Interface for Cooking Enhancement | Vera Prohaska et.al. | 2412.03551 | null |
| 2024-12-04 | Evaluating Gender Bias Transfer between Pre-trained and Prompt-Adapted Language Models | Natalie Mackraz et.al. | 2412.03537 | null |
| 2024-12-04 | A Review on Scientific Knowledge Extraction using Large Language Models in Biomedical Sciences | Gabriel Lino Garcia et.al. | 2412.03531 | null |
| 2024-12-04 | FANAL – Financial Activity News Alerting Language Modeling Framework | Urjitkumar Patel et.al. | 2412.03527 | null |
| 2024-12-04 | You’re (Not) My Type – Can LLMs Generate Feedback of Specific Types for Introductory Programming Tasks? | Dominic Lohr et.al. | 2412.03516 | null |
| 2024-12-04 | Flow Matching with General Discrete Paths: A Kinetic-Optimal Perspective | Neta Shaul et.al. | 2412.03487 | null |
| 2024-12-04 | Training-Free Mitigation of Language Reasoning Degradation After Multimodal Instruction Tuning | Neale Ratzlaff et.al. | 2412.03467 | null |
| 2024-12-04 | From Words to Workflows: Automating Business Processes | Laura Minkova et.al. | 2412.03446 | null |
| 2024-12-04 | RedStone: Curating General, Code, Math, and QA Data for Large Language Models | Yaoyao Chang et.al. | 2412.03398 | null |
| 2024-12-03 | T-REG: Preference Optimization with Token-Level Reward Regularization | Wenxuan Zhou et.al. | 2412.02685 | link |
| 2024-12-03 | Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models | Yuda Song et.al. | 2412.02674 | null |
| 2024-12-03 | LLM-Enhanced Path Planning: Safe and Efficient Autonomous Navigation with Instructional Inputs | Pranav Doma et.al. | 2412.02655 | null |
| 2024-12-03 | Time-Reversal Provides Unsupervised Feedback to LLMs | Yerram Varun et.al. | 2412.02626 | null |
| 2024-12-03 | Improving Dynamic Object Interactions in Text-to-Video Generation with AI Feedback | Hiroki Furuta et.al. | 2412.02617 | null |
| 2024-12-03 | AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information? | Kaixiong Gong et.al. | 2412.02611 | link |
| 2024-12-03 | Interpretable Company Similarity with Sparse Autoencoders | Marco Molinari et.al. | 2412.02605 | null |
| 2024-12-03 | CEGI: Measuring the trade-off between efficiency and carbon emissions for SLMs and VLMs | Abhas Kumar et.al. | 2412.02602 | null |
| 2024-12-03 | PrefixLLM: LLM-aided Prefix Circuit Design | Weihua Xiao et.al. | 2412.02594 | null |
| 2024-12-03 | OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation | Junyuan Zhang et.al. | 2412.02592 | link |
| 2024-12-02 | T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs | Shukang Yin et.al. | 2411.19951 | link |
| 2024-12-02 | Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM’s Reasoning Capability | Zicheng Lin et.al. | 2411.19943 | link |
| 2024-11-29 | VLSBench: Unveiling Visual Leakage in Multimodal Safety | Xuhao Hu et.al. | 2411.19939 | link |
| 2024-11-29 | On Domain-Specific Post-Training for Multimodal Large Language Models | Daixuan Cheng et.al. | 2411.19930 | link |
| 2024-11-29 | SIMS: Simulating Human-Scene Interactions with Real World Script Planning | Wenjia Wang et.al. | 2411.19921 | null |
| 2024-11-29 | PDDLFuse: A Tool for Generating Diverse Planning Domains | Vedant Khandelwal et.al. | 2411.19886 | null |
| 2024-12-02 | LUMIA: Linear probing for Unimodal and MultiModal Membership Inference Attacks leveraging internal LLM states | Luis Ibanez-Lissen et.al. | 2411.19876 | null |
| 2024-11-29 | AIDetx: a compression-based method for identification of machine-learning generated text | Leonardo Almeida et.al. | 2411.19869 | link |
| 2024-11-29 | Reverse Thinking Makes LLMs Stronger Reasoners | Justin Chih-Yao Chen et.al. | 2411.19865 | null |
| 2024-11-29 | Cross-Domain Recommendation Meets Large Language Models | Ajay Krishna Vajjala et.al. | 2411.19862 | link |
| 2024-11-27 | Cross-modal Information Flow in Multimodal Large Language Models | Zhi Zhang et.al. | 2411.18620 | link |
| 2024-11-27 | Automated Literature Review Using NLP Techniques and LLM-Based Retrieval-Augmented Generation | Nurshat Fateh Ali et.al. | 2411.18583 | null |
| 2024-11-27 | Challenges in Adapting Multilingual LLMs to Low-Resource Languages using LoRA PEFT Tuning | Omkar Khade et.al. | 2411.18571 | null |
| 2024-11-27 | A Pipeline of Neural-Symbolic Integration to Enhance Spatial Reasoning in Large Language Models | Rong Wang et.al. | 2411.18564 | null |
| 2024-11-27 | DexDiffuser: Interaction-aware Diffusion Planning for Adaptive Dexterous Manipulation | Zhixuan Liang et.al. | 2411.18562 | null |
| 2024-11-27 | Retrofitting (Large) Language Models with Dynamic Tokenization | Darius Feher et.al. | 2411.18553 | null |
| 2024-11-27 | Emergence of Self-Identity in AI: A Mathematical Framework and Empirical Study with Generative Large Language Models | Minhyeok Lee et.al. | 2411.18530 | link |
| 2024-11-27 | LLM-ABBA: Understand time series via symbolic approximation | Erin Carson et.al. | 2411.18506 | null |
| 2024-11-27 | GATE OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation | Pengfei Zhou et.al. | 2411.18499 | link |
| 2024-11-27 | Beyond Examples: High-level Automated Reasoning Paradigm in In-Context Learning via MCTS | Jinyang Wu et.al. | 2411.18478 | link |
| 2024-11-26 | Adaptive Deployment of Untrusted LLMs Reduces Distributed Threats | Jiaxin Wen et.al. | 2411.17693 | null |
| 2024-11-26 | Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens | Xu Ouyang et.al. | 2411.17691 | null |
| 2024-11-26 | Rethinking Token Reduction in MLLMs: Towards a Unified Paradigm for Training-Free Acceleration | Yuhang Han et.al. | 2411.17686 | link |
| 2024-11-26 | Enhancing Character-Level Understanding in LLMs through Token Internal Structure Learning | Zhu Xu et.al. | 2411.17679 | link |
| 2024-11-26 | Push the Limit of Multi-modal Emotion Recognition by Prompting LLMs with Receptive-Field-Aware Attention Weighting | Liyun Zhang et.al. | 2411.17674 | null |
| 2024-11-26 | SketchAgent: Language-Driven Sequential Sketch Generation | Yael Vinker et.al. | 2411.17673 | link |
| 2024-11-26 | Synthetic Data Generation with LLM for Improved Depression Prediction | Andrea Kang et.al. | 2411.17672 | null |
| 2024-11-26 | BERT or FastText? A Comparative Analysis of Contextual as well as Non-Contextual Embeddings | Abhay Shanbhag et.al. | 2411.17661 | null |
| 2024-11-26 | Toward High-Performance LLM Serving: A Simulation-Based Approach for Identifying Optimal Parallelism | Yi-Chien Lin et.al. | 2411.17651 | link |
| 2024-11-26 | On Limitations of LLM as Annotator for Low Resource Languages | Suramya Jadhav et.al. | 2411.17637 | null |
| 2024-11-25 | Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts? | Sohee Yang et.al. | 2411.16679 | null |
| 2024-11-25 | DreamRunner: Fine-Grained Storytelling Video Generation with Retrieval-Augmented Motion Adaptation | Zun Wang et.al. | 2411.16657 | null |
| 2024-11-25 | Self-Generated Critiques Boost Reward Modeling for Language Models | Yue Yu et.al. | 2411.16646 | null |
| 2024-11-25 | Preventing Jailbreak Prompts as Malicious Tools for Cybercriminals: A Cyber Defense Perspective | Jean Marie Tshimula et.al. | 2411.16642 | null |
| 2024-11-25 | Chat2SVG: Vector Graphics Generation with Large Language Models and Image Diffusion Models | Ronghuan Wu et.al. | 2411.16602 | null |
| 2024-11-25 | From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge | Dawei Li et.al. | 2411.16594 | link |
| 2024-11-25 | Large Language Model-based Decision-making for COLREGs and the Control of Autonomous Surface Vehicles | Klinsmann Agyei et.al. | 2411.16587 | null |
| 2024-11-25 | MarketGPT: Developing a Pre-trained transformer (GPT) for Modeling Financial Time Series | Aaron Wheeler et.al. | 2411.16585 | null |
| 2024-11-25 | Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision | Zhiheng Xi et.al. | 2411.16579 | null |
| 2024-11-25 | Predictive Power of LLMs in Financial Markets | Jerick Shi et.al. | 2411.16569 | null |
| 2024-11-22 | Measuring Bullshit in the Language Games played by ChatGPT | Alessandro Trevisan et.al. | 2411.15129 | null |
| 2024-11-22 | AttriBoT: A Bag of Tricks for Efficiently Approximating Leave-One-Out Context Attribution | Fengyuan Liu et.al. | 2411.15102 | link |
| 2024-11-22 | XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models | Yixin Dong et.al. | 2411.15100 | link |
| 2024-11-22 | Locating the Leading Edge of Cultural Change | Sarah Griebel et.al. | 2411.15068 | link |
| 2024-11-22 | mR$^2$AG: Multimodal Retrieval-Reflection-Augmented Generation for Knowledge-Based VQA | Tao Zhang et.al. | 2411.15041 | null |
| 2024-11-22 | One to rule them all: natural language to bind communication, perception and action | Simone Colombani et.al. | 2411.15033 | null |
| 2024-11-22 | Time is on my sight: scene graph filtering for dynamic environment perception in an LLM-driven robot | Simone Colombani et.al. | 2411.15027 | null |
| 2024-11-22 | DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models | Keda Tao et.al. | 2411.15024 | link |
| 2024-11-22 | FTA generation using GenAI with an Autonomy sensor Usecase | Sneha Sudhir Shetiya et.al. | 2411.15007 | null |
| 2024-11-22 | ScribeAgent: Towards Specialized Web Agents Using Production-Scale Workflow Data | Junhong Shen et.al. | 2411.15004 | link |
| 2024-11-21 | Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models | Yuhao Dong et.al. | 2411.14432 | link |
| 2024-11-21 | Beyond Training: Dynamic Token Merging for Zero-Shot Video Understanding | Yiming Zhang et.al. | 2411.14401 | null |
| 2024-11-21 | Lightweight Safety Guardrails Using Fine-tuned BERT Embeddings | Aaron Zheng et.al. | 2411.14398 | null |
| 2024-11-21 | UnifiedCrawl: Aggregated Common Crawl for Affordable Adaptation of LLMs on Low-Resource Languages | Bethel Melesse Tessema et.al. | 2411.14343 | link |
| 2024-11-21 | Velocitune: A Velocity-based Dynamic Domain Reweighting Method for Continual Pre-training | Zheheng Luo et.al. | 2411.14318 | null |
| 2024-11-21 | Automated Generation of Code Debugging Exercises | Victor-Alexandru Pădurean et.al. | 2411.14303 | null |
| 2024-11-21 | Auto-SPICE: Leveraging LLMs for Dataset Creation via Automated SPICE Netlist Extraction from Analog Circuit Diagrams | Jitendra Bhandari et.al. | 2411.14299 | null |
| 2024-11-21 | Efficient Aspect-Based Summarization of Climate Change Reports with Small Language Models | Iacopo Ghinassi et.al. | 2411.14272 | link |
| 2024-11-21 | Knowledge Graphs, Large Language Models, and Hallucinations: An NLP Perspective | Ernests Lavrinovics et.al. | 2411.14258 | null |
| 2024-11-21 | Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models | Javier Ferrando et.al. | 2411.14257 | null |
| 2024-11-20 | SpecTool: A Benchmark for Characterizing Errors in Tool-Use LLMs | Shirley Kokane et.al. | 2411.13547 | null |
| 2024-11-20 | BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games | Davide Paglieri et.al. | 2411.13543 | link |
| 2024-11-20 | Metacognition for Unknown Situations and Environments (MUSE) | Rodolfo Valiente et.al. | 2411.13537 | null |
| 2024-11-20 | Advancing Complex Medical Communication in Arabic with Sporo AraSum: Surpassing Existing Large Language Models | Chanseo Lee et.al. | 2411.13518 | null |
| 2024-11-20 | Disentangling Memory and Reasoning Ability in Large Language Models | Mingyu Jin et.al. | 2411.13504 | link |
| 2024-11-20 | Utilizing Large Language Models to Synthesize Product Desirability Datasets | John D. Hastings et.al. | 2411.13485 | null |
| 2024-11-20 | PatentEdits: Framing Patent Novelty as Textual Entailment | Ryan Lee et.al. | 2411.13477 | null |
| 2024-11-20 | When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training | Haonan Wang et.al. | 2411.13476 | link |
| 2024-11-20 | SoK: A Systems Perspective on Compound AI Threats and Countermeasures | Sarbartha Banerjee et.al. | 2411.13459 | null |
| 2024-11-20 | AdaptAgent: Adapting Multimodal Web Agents with Few-Shot Learning from Human Demonstrations | Gaurav Verma et.al. | 2411.13451 | null |
| 2024-11-19 | ACING: Actor-Critic for Instruction Learning in Black-Box Large Language Models | Salma Kharrat et.al. | 2411.12736 | link |
| 2024-11-19 | Information Theory of Meaningful Communication | Doron Sivan et.al. | 2411.12728 | null |
| 2024-11-19 | CATCH: Complementary Adaptive Token-level Contrastive Decoding to Mitigate Hallucinations in LVLMs | Zhehan Kan et.al. | 2411.12713 | null |
| 2024-11-19 | Strengthening Fake News Detection: Leveraging SVM and Sophisticated Text Vectorization Techniques. Defying BERT? | Ahmed Akib Jawad Karim et.al. | 2411.12703 | null |
| 2024-11-19 | When Backdoors Speak: Understanding LLM Backdoor Attacks Through Model-Generated Explanations | Huaizhi Ge et.al. | 2411.12701 | null |
| 2024-11-19 | SparseInfer: Training-free Prediction of Activation Sparsity for Fast LLM Inference | Jiho Shin et.al. | 2411.12692 | null |
| 2024-11-19 | Neurosymbolic Graph Enrichment for Grounded World Models | Stefano De Giorgis et.al. | 2411.12671 | null |
| 2024-11-19 | DLBacktrace: A Model Agnostic Explainability for any Deep Learning Models | Vinay Kumar Sankarapu et.al. | 2411.12643 | link |
| 2024-11-19 | Improving Controllability and Editability for Pretrained Text-to-Music Generation Models | Yixiao Zhang et.al. | 2411.12641 | null |
| 2024-11-19 | AdaCM$^2$: On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction | Yuanbin Man et.al. | 2411.12593 | null |
| 2024-11-18 | Bi-Mamba: Towards Accurate 1-Bit State Space Models | Shengkun Tang et.al. | 2411.11843 | null |
| 2024-11-18 | Tackling prediction tasks in relational databases with LLMs | Marek Wydmuch et.al. | 2411.11829 | null |
| 2024-11-18 | Exploring adversarial robustness of JPEG AI: methodology, comparison and new methods | Egor Kovalev et.al. | 2411.11795 | null |
| 2024-11-18 | LLM-IE: A Python Package for Generative Information Extraction with Large Language Models | Enshuo Hsu et.al. | 2411.11779 | null |
| 2024-11-18 | The Power of Many: Multi-Agent Multimodal Models for Cultural Image Captioning | Longju Bai et.al. | 2411.11758 | link |
| 2024-11-18 | sMoRe: Enhancing Object Manipulation and Organization in Mixed Reality Spaces with LLMs and Generative AI | Yunhao Xing et.al. | 2411.11752 | null |
| 2024-11-18 | BitMoD: Bit-serial Mixture-of-Datatype LLM Acceleration | Yuzong Chen et.al. | 2411.11745 | link |
| 2024-11-18 | Moral Persuasion in Large Language Models: Evaluating Susceptibility and Ethical Alignment | Allison Huang et.al. | 2411.11731 | null |
| 2024-11-18 | Semantic-Geometric-Physical-Driven Robot Manipulation Skill Transfer via Skill Library and Tactile Representation | Mingchao Qi et.al. | 2411.11714 | link |
| 2024-11-18 | FedCoLLM: A Parameter-Efficient Federated Co-tuning Framework for Large and Small Language Models | Tao Fan et.al. | 2411.11707 | null |
| 2024-11-15 | Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization | Weiyun Wang et.al. | 2411.10442 | link |
| 2024-11-15 | LLaVA-o1: Let Vision Language Models Reason Step-by-Step | Guowei Xu et.al. | 2411.10440 | link |
| 2024-11-15 | MARS: Unleashing the Power of Variance Reduction for Training Large Models | Huizhuo Yuan et.al. | 2411.10438 | link |
| 2024-11-15 | Mitigating Hallucination in Multimodal Large Language Model via Hallucination-targeted Direct Preference Optimization | Yuhan Fu et.al. | 2411.10436 | null |
| 2024-11-15 | Evaluating Creativity and Deception in Large Language Models: A Simulation Framework for Multi-Agent Balderdash | Parsa Hejabi et.al. | 2411.10422 | link |
| 2024-11-15 | Interactive Cycle Model – The Linkage Combination among Automatic Speech Recognition, Large Language Models and Smart Glasses | Libo Wang et.al. | 2411.10362 | null |
| 2024-11-15 | Bias Unveiled: Investigating Social Bias in LLM-Generated Code | Lin Ling et.al. | 2411.10351 | null |
| 2024-11-15 | On the Cost of Model-Serving Frameworks: An Experimental Evaluation | Pasquale De Rosa et.al. | 2411.10337 | null |
| 2024-11-15 | Number it: Temporal Grounding Videos like Flipping Manga | Yongliang Wu et.al. | 2411.10332 | link |
| 2024-11-15 | Modification Takes Courage: Seamless Image Stitching via Reference-Driven Inpainting | Ziqi Xie et.al. | 2411.10309 | link |
| 2024-11-14 | MagicQuill: An Intelligent Interactive Image Editing System | Zichen Liu et.al. | 2411.09703 | link |
| 2024-11-14 | Advancing Fine-Grained Visual Understanding with Multi-Scale Alignment in Multi-Modal Models | Wei Wang et.al. | 2411.09691 | null |
| 2024-11-14 | Squeezed Attention: Accelerating Long Context Length LLM Inference | Coleman Hooper et.al. | 2411.09688 | link |
| 2024-11-14 | Towards a Classification of Open-Source ML Models and Datasets for Software Engineering | Alexandra González et.al. | 2411.09683 | null |
| 2024-11-14 | Med-Bot: An AI-Powered Assistant to Provide Accurate and Reliable Medical Information | Ahan Bhatt et.al. | 2411.09648 | null |
| 2024-11-14 | Local deployment of large-scale music AI models on commodity hardware | Xun Zhou et.al. | 2411.09625 | null |
| 2024-11-14 | PTR: Precision-Driven Tool Recommendation for Large Language Models | Hang Gao et.al. | 2411.09613 | null |
| 2024-11-14 | The Moral Foundations Weibo Corpus | Renjie Cao et.al. | 2411.09612 | null |
| 2024-11-14 | Initial Nugget Evaluation Results for the TREC 2024 RAG Track with the AutoNuggetizer Framework | Ronak Pradeep et.al. | 2411.09607 | null |
| 2024-11-14 | Accelerating Knowledge Graph and Ontology Engineering with Large Language Models | Cogan Shimizu et.al. | 2411.09601 | null |
| 2024-11-13 | The Limited Impact of Medical Adaptation of Large Language and Vision-Language Models | Daniel P. Jeong et.al. | 2411.08870 | null |
| 2024-11-13 | LLMStinger: Jailbreaking LLMs using RL fine-tuned LLMs | Piyush Jha et.al. | 2411.08862 | null |
| 2024-11-13 | Multimodal Instruction Tuning with Hybrid State Space Models | Jianing Zhou et.al. | 2411.08840 | null |
| 2024-11-13 | FinRobot: AI Agent for Equity Research and Valuation with Large Language Models | Tianyu Zhou et.al. | 2411.08804 | link |
| 2024-11-13 | Evaluating World Models with LLM for Decision Making | Chang Yang et.al. | 2411.08794 | null |
| 2024-11-13 | Can sparse autoencoders be used to decompose and interpret steering vectors? | Harry Mayne et.al. | 2411.08790 | link |
| 2024-11-13 | Separating Tongue from Thought: Activation Patching Reveals Language-Agnostic Concept Representations in Transformers | Clément Dumas et.al. | 2411.08745 | link |
| 2024-11-13 | A Comparative Study of Discrete Speech Tokens for Semantic-Related Tasks with Large Language Models | Dingdong Wang et.al. | 2411.08742 | null |
| 2024-11-13 | Dynamic Rewarding with Prompt Optimization Enables Tuning-free Self-Alignment of Language Models | Somanshu Singla et.al. | 2411.08733 | link |
| 2024-11-13 | Polymetis: Large Language Modeling for Multiple Material Domains | Chao Huang et.al. | 2411.08728 | null |
| 2024-11-12 | Learning with Less: Knowledge Distillation from Large Language Models via Unlabeled Data | Juanhui Li et.al. | 2411.08028 | null |
| 2024-11-12 | LLMPhy: Complex Physical Reasoning Using Large Language Models and World Models | Anoop Cherian et.al. | 2411.08027 | null |
| 2024-11-12 | Language Models as Causal Effect Generators | Lucius E. J. Bynum et.al. | 2411.08019 | link |
| 2024-11-12 | ExpressivityArena: Can LLMs Express Information Implicitly? | Joshua Tint et.al. | 2411.08010 | null |
| 2024-11-12 | Can adversarial attacks by large language models be attributed? | Manuel Cebrian et.al. | 2411.08003 | null |
| 2024-11-12 | Derivational Morphology Reveals Analogical Generalization in Large Language Models | Valentin Hofmann et.al. | 2411.07990 | null |
| 2024-11-12 | JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation | Yiyang Ma et.al. | 2411.07975 | link |
| 2024-11-12 | From General to Specific: Utilizing General Hallucination to Automatically Measure the Role Relationship Fidelity for Specific Role-Play Agents | Chuyi Kong et.al. | 2411.07965 | null |
| 2024-11-12 | Towards Low-bit Communication for Tensor Parallel LLM Inference | Harry Dong et.al. | 2411.07942 | null |
| 2024-11-12 | Leveraging Multimodal Models for Enhanced Neuroimaging Diagnostics in Alzheimer’s Disease | Francesco Chiumento et.al. | 2411.07871 | null |
| 2024-11-11 | UTMath: Math Evaluation with Unit Test via Reasoning-to-Coding Thoughts | Bo Yang et.al. | 2411.07240 | link |
| 2024-11-11 | OpenThaiGPT 1.5: A Thai-Centric Open Source Large Language Model | Sumeth Yuenyong et.al. | 2411.07238 | null |
| 2024-11-11 | Tooling or Not Tooling? The Impact of Tools on Language Agents for Chemistry Problem Solving | Botao Yu et.al. | 2411.07228 | null |
| 2024-11-11 | Comparing Bottom-Up and Top-Down Steering Approaches on In-Context Learning Tasks | Madeline Brumley et.al. | 2411.07213 | null |
| 2024-11-11 | DLCR: A Generative Data Expansion Framework via Diffusion for Clothes-Changing Person Re-ID | Nyle Siddiqui et.al. | 2411.07205 | link |
| 2024-11-11 | The Super Weight in Large Language Models | Mengxia Yu et.al. | 2411.07191 | link |
| 2024-11-11 | NatureLM-audio: an Audio-Language Foundation Model for Bioacoustics | David Robinson et.al. | 2411.07186 | null |
| 2024-11-11 | Gradual Fine-Tuning with Graph Routing for Multi-Source Unsupervised Domain Adaptation | Yao Ma et.al. | 2411.07185 | null |
| 2024-11-11 | Continual Memorization of Factoids in Large Language Models | Howard Chen et.al. | 2411.07175 | link |
| 2024-11-11 | A Domain-Agnostic Neurosymbolic Approach for Big Social Data Analysis: Evaluating Mental Health Sentiment on Social Media during COVID-19 | Vedant Khandelwal et.al. | 2411.07163 | null |
| 2024-11-08 | Recycled Attention: Efficient inference for long-context language models | Fangyuan Xu et.al. | 2411.05787 | link |
| 2024-11-08 | Fact or Fiction? Can LLMs be Reliable Annotators for Political Truths? | Veronica Chatrath et.al. | 2411.05775 | null |
| 2024-11-08 | Multi-hop Evidence Pursuit Meets the Web: Team Papelo at FEVER 2024 | Christopher Malon et.al. | 2411.05762 | null |
| 2024-11-08 | Image2Text2Image: A Novel Framework for Label-Free Evaluation of Image-to-Text Generation with Text-to-Image Diffusion Models | Jia-Hong Huang et.al. | 2411.05706 | null |
| 2024-11-08 | Unmasking the Limits of Large Language Models: A Systematic Evaluation of Masked Text Processing Ability through MskQA and MskCal | Fuka Matsuzaki et.al. | 2411.05665 | link |
| 2024-11-08 | The influence of persona and conversational task on social interactions with a LLM-controlled embodied conversational agent | Leon O. H. Kroczek et.al. | 2411.05653 | null |
| 2024-11-08 | LightVA: Lightweight Visual Analytics with LLM Agent-Based Task Planning and Execution | Yuheng Zhao et.al. | 2411.05651 | null |
| 2024-11-08 | Evaluating Large Language Model Capability in Vietnamese Fact-Checking Data Generation | Long Truong To et.al. | 2411.05641 | null |
| 2024-11-08 | Assessing Open-Source Large Language Models on Argumentation Mining Subtasks | Mohammad Yeghaneh Abkenar et.al. | 2411.05639 | null |
| 2024-11-08 | A Two-Step Concept-Based Approach for Enhanced Interpretability and Trust in Skin Lesion Diagnosis | Cristiano Patrício et.al. | 2411.05609 | null |
| 2024-11-07 | SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models | Muyang Li et.al. | 2411.05007 | link |
| 2024-11-07 | Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks? | Jonathan Roberts et.al. | 2411.05000 | link |
| 2024-11-07 | LLM2CLIP: Powerful Language Model Unlock Richer Visual Representation | Weiquan Huang et.al. | 2411.04997 | link |
| 2024-11-07 | Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models | Weixin Liang et.al. | 2411.04996 | link |
| 2024-11-07 | Rethinking Bradley-Terry Models in Preference-Based Reward Modeling: Foundations, Theory, and Alternatives | Hao Sun et.al. | 2411.04991 | link |
| 2024-11-07 | Enhancing Reverse Engineering: Investigating and Benchmarking Large Language Models for Vulnerability Analysis in Decompiled Binaries | Dylan Manuel et.al. | 2411.04981 | null |
| 2024-11-07 | SuffixDecoding: A Model-Free Approach to Speeding Up Large Language Model Inference | Gabriele Oliaro et.al. | 2411.04975 | null |
| 2024-11-07 | BitNet a4.8: 4-bit Activations for 1-bit LLMs | Hongyu Wang et.al. | 2411.04965 | link |
| 2024-11-07 | Position Paper On Diagnostic Uncertainty Estimation from Large Language Models: Next-Word Probability Is Not Pre-test Probability | Yanjun Gao et.al. | 2411.04962 | null |
| 2024-11-07 | CAD-MLLM: Unifying Multimodality-Conditioned CAD Generation With MLLM | Jingwei Xu et.al. | 2411.04954 | link |
| 2024-11-06 | Medical Adaptation of Large Language and Vision-Language Models: Are We Making Progress? | Daniel P. Jeong et.al. | 2411.04118 | null |
| 2024-11-06 | How Transformers Solve Propositional Logic Problems: A Mechanistic Analysis | Guan Zhe Hong et.al. | 2411.04105 | null |
| 2024-11-06 | Textual Decomposition Then Sub-motion-space Scattering for Open-Vocabulary Motion Generation | Ke Fan et.al. | 2411.04079 | null |
| 2024-11-06 | Beemo: Benchmark of Expert-edited Machine-generated Outputs | Ekaterina Artemova et.al. | 2411.04032 | link |
| 2024-11-06 | Prompt Engineering Using GPT for Word-Level Code-Mixed Language Identification in Low-Resource Dravidian Languages | Aniket Deroy et.al. | 2411.04025 | null |
| 2024-11-06 | Select2Plan: Training-Free ICL-Based Planning through VQA and Memory Retrieval | Davide Buoso et.al. | 2411.04006 | null |
| 2024-11-06 | Customized Multiple Clustering via Multi-Modal Subspace Proxy Learning | Jiawei Yao et.al. | 2411.03978 | null |
| 2024-11-06 | What Really is Commonsense Knowledge? | Quyet V. Do et.al. | 2411.03964 | null |
| 2024-11-06 | How Does A Text Preprocessing Pipeline Affect Ontology Syntactic Matching? | Zhangcheng Qiang et.al. | 2411.03962 | null |
| 2024-11-06 | Fine-Grained Guidance for Retrievers: Leveraging LLMs’ Feedback in Retrieval-Augmented Generation | Yuhang Liu et.al. | 2411.03957 | null |
| 2024-11-05 | MME-Finance: A Multimodal Finance Benchmark for Expert-level Understanding and Reasoning | Ziliang Gan et.al. | 2411.03314 | null |
| 2024-11-05 | LLMs for Domain Generation Algorithm Detection | Reynier Leyva La O et.al. | 2411.03307 | null |
| 2024-11-05 | VERITAS: A Unified Approach to Reliability Evaluation | Rajkumar Ramamurthy et.al. | 2411.03300 | null |
| 2024-11-05 | Examining Human-AI Collaboration for Co-Writing Constructive Comments Online | Farhana Shahid et.al. | 2411.03295 | null |
| 2024-11-05 | Interaction2Code: How Far Are We From Automatic Interactive Webpage Generation? | Jingyu Xiao et.al. | 2411.03292 | null |
| 2024-11-05 | The Future of Intelligent Healthcare: A Systematic Analysis and Discussion on the Integration and Impact of Robots Using Large Language Models for Healthcare | Souren Pashangpour et.al. | 2411.03287 | null |
| 2024-11-05 | SMoA: Improving Multi-agent Large Language Models with Sparse Mixture-of-Agents | Dawei Li et.al. | 2411.03284 | link |
| 2024-11-05 | Spontaneous Emergence of Agent Individuality through Social Interactions in LLM-Based Communities | Ryosuke Takata et.al. | 2411.03252 | null |
| 2024-11-05 | DiffLM: Controllable Synthetic Data Generation via Diffusion Language Models | Ying Zhou et.al. | 2411.03250 | null |
| 2024-11-05 | From Pen to Prompt: How Creative Writers Integrate AI into their Writing Practice | Alicia Guo et.al. | 2411.03137 | null |
| 2024-11-04 | Training-free Regional Prompting for Diffusion Transformers | Anthony Chen et.al. | 2411.02395 | link |
| 2024-11-04 | Adaptive Length Image Tokenization via Recurrent Allocation | Shivam Duggal et.al. | 2411.02393 | link |
| 2024-11-04 | Improving Scientific Hypothesis Generation with Knowledge Grounded Large Language Models | Guangzhi Xiong et.al. | 2411.02382 | null |
| 2024-11-04 | Addressing Uncertainty in LLMs to Enhance Reliability in Generative AI | Ramneet Kaur et.al. | 2411.02381 | null |
| 2024-11-04 | DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution | Yang Yue et.al. | 2411.02359 | link |
| 2024-11-04 | “Give Me BF16 or Give Me Death”? Accuracy-Performance Trade-Offs in LLM Quantization | Eldar Kurtic et.al. | 2411.02355 | null |
| 2024-11-04 | Social-RAG: Retrieving from Group Interactions to Socially Ground Proactive AI Generation to Group Preferences | Ruotong Wang et.al. | 2411.02353 | null |
| 2024-11-04 | Can Large Language Models generalize analogy solving like people can? | Claire E. Stevenson et.al. | 2411.02348 | null |
| 2024-11-04 | WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning | Zehan Qi et.al. | 2411.02337 | link |
| 2024-11-04 | Sparsing Law: Towards Large Language Models with Greater Activation Sparsity | Yuqi Luo et.al. | 2411.02335 | link |
| 2024-10-31 | P-Masking: Power Law Masking Improves Multi-attribute Controlled Generation | Mohamed Elgaar et.al. | 2410.24201 | null |
| 2024-11-01 | SelfCodeAlign: Self-Alignment for Code Generation | Yuxiang Wei et.al. | 2410.24198 | link |
| 2024-10-31 | Constraint Back-translation Improves Complex Instruction Following of Large Language Models | Yunjia Qi et.al. | 2410.24175 | link |
| 2024-10-31 | Thought Space Explorer: Navigating and Expanding Thought Space for Large Language Model Reasoning | Jinghan Zhang et.al. | 2410.24155 | null |
| 2024-10-31 | Language-Driven Policy Distillation for Cooperative Driving in Multi-Agent Reinforcement Learning | Jiaqi Liu et.al. | 2410.24152 | null |
| 2024-10-31 | Exploring Vision Language Models for Facial Attribute Recognition: Emotion, Race, Gender, and Age | Nouar AlDahoul et.al. | 2410.24148 | null |
| 2024-11-01 | Multi-environment Topic Models | Dominic Sobhani et.al. | 2410.24126 | null |
| 2024-10-31 | Leveraging Large Language Models for Code Translation and Software Development in Scientific Computing | Akash Dhruv et.al. | 2410.24119 | link |
| 2024-10-31 | Repository-Level Compositional Code Translation and Validation | Ali Reza Ibrahimzada et.al. | 2410.24117 | null |
| 2024-10-31 | Nearest Neighbor Normalization Improves Multimodal Retrieval | Neil Chowdhury et.al. | 2410.24114 | link |
| 2024-10-30 | EMMA: End-to-End Multimodal Model for Autonomous Driving | Jyh-Jing Hwang et.al. | 2410.23262 | null |
| 2024-10-30 | Evaluating Cultural and Social Awareness of LLM Web Agents | Haoyi Qiu et.al. | 2410.23252 | null |
| 2024-10-30 | Carrot and Stick: Eliciting Comparison Data and Beyond | Yiling Chen et.al. | 2410.23243 | null |
| 2024-10-30 | A little less conversation, a little more action, please: Investigating the physical common-sense of LLMs in a 3D embodied environment | Matteo G. Mecattaf et.al. | 2410.23242 | null |
| 2024-10-30 | EMOTION: Expressive Motion Sequence Generation for Humanoid Robots with In-Context Learning | Peide Huang et.al. | 2410.23234 | null |
| 2024-10-31 | Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval | Sheryl Hsu et.al. | 2410.23214 | null |
| 2024-10-30 | Reliability of Topic Modeling | Kayla Schroeder et.al. | 2410.23186 | null |
| 2024-10-30 | ProTransformer: Robustify Transformers via Plug-and-Play Paradigm | Zhichao Hou et.al. | 2410.23182 | null |
| 2024-10-30 | ReasoningRec: Bridging Personalized Recommendations and Human-Interpretable Explanations through LLM Reasoning | Millennium Bismay et.al. | 2410.23180 | link |
| 2024-10-30 | SciPIP: An LLM-based Scientific Paper Idea Proposer | Wenxiao Wang et.al. | 2410.23166 | link |
| 2024-10-29 | Enhancing Code Annotation Reliability: Generative AI’s Role in Comment Quality Assessment Models | Seetharam Killivalavan et.al. | 2410.22323 | null |
| 2024-10-29 | Online Detecting LLM-Generated Texts via Sequential Hypothesis Testing by Betting | Can Chen et.al. | 2410.22318 | link |
| 2024-10-29 | Natural Language Inference Improves Compositionality in Vision-Language Models | Paola Cascante-Bonilla et.al. | 2410.22315 | null |
| 2024-10-29 | GPT-4o reads the mind in the eyes | James W. A. Strachan et.al. | 2410.22309 | null |
| 2024-10-29 | SVIP: Towards Verifiable Inference of Open-source Large Language Models | Yifan Sun et.al. | 2410.22307 | null |
| 2024-10-29 | Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning | Yihe Deng et.al. | 2410.22304 | null |
| 2024-10-29 | LLMs are Highly-Constrained Biophysical Sequence Optimizers | Angelica Chen et.al. | 2410.22296 | null |
| 2024-10-29 | Fine-Tuning LLMs for Code Mutation: A New Era of Cyber Threats | Mohammad Setak et.al. | 2410.22293 | null |
| 2024-10-29 | Embedding-based classifiers can detect prompt injection attacks | Md. Ahsan Ayub et.al. | 2410.22284 | link |
| 2024-10-29 | Whose ChatGPT? Unveiling Real-World Educational Inequalities Introduced by Large Language Models | Renzhe Yu et.al. | 2410.22282 | null |
| 2024-10-28 | Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics | Yaniv Nikankin et.al. | 2410.21272 | link |
| 2024-10-28 | LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior | Hanyu Wang et.al. | 2410.21264 | link |
| 2024-10-28 | AutoBench-V: Can Large Vision-Language Models Benchmark Themselves? | Han Bao et.al. | 2410.21259 | link |
| 2024-10-28 | LongReward: Improving Long-context Large Language Models with AI Feedback | Jiajie Zhang et.al. | 2410.21252 | link |
| 2024-10-28 | Zero-Shot Dense Retrieval with Embeddings from Relevance Feedback | Nour Jedidi et.al. | 2410.21242 | null |
| 2024-10-28 | Hierarchical Knowledge Graph Construction from Images for Scalable E-Commerce | Zhantao Yang et.al. | 2410.21237 | null |
| 2024-10-28 | Flaming-hot Initiation with Regular Execution Sampling for Large Language Models | Weizhe Chen et.al. | 2410.21236 | null |
| 2024-10-28 | LoRA vs Full Fine-tuning: An Illusion of Equivalence | Reece Shuttleworth et.al. | 2410.21228 | null |
| 2024-10-28 | Lifting the Veil on the Large Language Model Supply Chain: Composition, Risks, and Mitigations | Kaifeng Huang et.al. | 2410.21218 | null |
| 2024-10-28 | BongLLaMA: LLaMA for Bangla Language | Abdullah Khan Zehady et.al. | 2410.21200 | null |
| 2024-10-25 | The Potential and Value of AI Chatbot in Personalized Cognitive Training | Zilong Wang et.al. | 2410.19733 | null |
| 2024-10-25 | Counting Ability of Large Language Models and Impact of Tokenization | Xiang Zhang et.al. | 2410.19730 | link |
| 2024-10-25 | FISHNET: Financial Intelligence from Sub-querying, Harmonizing, Neural-Conditioning, Expert Swarms, and Task Planning | Nicole Cho et.al. | 2410.19727 | null |
| 2024-10-25 | 2D-DPO: Scaling Direct Preference Optimization with 2-Dimensional Supervision | Shilong Li et.al. | 2410.19720 | null |
| 2024-10-25 | TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning | Xiangyu Zeng et.al. | 2410.19702 | link |
| 2024-10-25 | IPPON: Common Sense Guided Informative Path Planning for Object Goal Navigation | Kaixian Qu et.al. | 2410.19697 | null |
| 2024-10-25 | Less is More: Extreme Gradient Boost Rank-1 Adaption for Efficient Finetuning of LLMs | Yifei Zhang et.al. | 2410.19694 | null |
| 2024-10-25 | APRICOT: Active Preference Learning and Constraint-Aware Task Planning with LLMs | Huaxiaoyue Wang et.al. | 2410.19656 | null |
| 2024-10-25 | Take Caution in Using LLMs as Human Surrogates: Scylla Ex Machina | Yuan Gao et.al. | 2410.19599 | null |
| 2024-10-25 | Diverse Sign Language Translation | Xin Shen et.al. | 2410.19586 | null |
| 2024-10-24 | Unbounded: A Generative Infinite Game of Character Life Simulation | Jialu Li et.al. | 2410.18975 | null |
| 2024-10-24 | Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms | Zhangheng Li et.al. | 2410.18967 | link |
| 2024-10-24 | Does Data Contamination Detection Work (Well) for LLMs? A Survey and Evaluation on Detection Assumptions | Yujuan Fu et.al. | 2410.18966 | null |
| 2024-10-24 | OSCAR: Operating System Control via State-Aware Reasoning and Re-Planning | Xiaoqiang Wang et.al. | 2410.18963 | link |
| 2024-10-24 | Bridge-Coder: Unlocking LLMs’ Potential to Overcome Language Gaps in Low-Resource Code | Jipeng Zhang et.al. | 2410.18957 | null |
| 2024-10-24 | BioMistral-NLU: Towards More Generalizable Medical Language Understanding through Instruction Tuning | Yujuan Velvin Fu et.al. | 2410.18955 | null |
| 2024-10-24 | Dynamic Vocabulary Pruning in Early-Exit LLMs | Jort Vincenti et.al. | 2410.18952 | link |
| 2024-10-24 | SafeBench: A Safety Evaluation Framework for Multimodal Large Language Models | Zonghao Ying et.al. | 2410.18927 | null |
| 2024-10-24 | From Blind Solvers to Logical Thinkers: Benchmarking LLMs’ Logical Integrity on Faulty Mathematical Problems | A M Muntasir Rahman et.al. | 2410.18921 | null |
| 2024-10-24 | A Survey on Speech Large Language Models | Jing Peng et.al. | 2410.18908 | null |
| 2024-10-23 | TP-Eval: Tap Multimodal LLMs’ Potential in Evaluation by Customizing Prompts | Yuxuan Xie et.al. | 2410.18071 | null |
| 2024-10-23 | LongRAG: A Dual-Perspective Retrieval-Augmented Generation Paradigm for Long-Context Question Answering | Qingfei Zhao et.al. | 2410.18050 | link |
| 2024-10-23 | Key Algorithms for Keyphrase Generation: Instruction-Based LLMs for Russian Scientific Keyphrases | Anna Glazkova et.al. | 2410.18040 | null |
| 2024-10-23 | MiLoRA: Efficient Mixture of Low-Rank Adaptation for Large Language Models Fine-tuning | Jingfan Zhang et.al. | 2410.18035 | null |
| 2024-10-23 | GraphTeam: Facilitating Large Language Model-based Graph Analysis via Multi-Agent Collaboration | Xin Li et.al. | 2410.18032 | link |
| 2024-10-23 | MiniFed: Integrating LLM-based Agentic-Workflow for Simulating FOMC Meeting | Sungil Seok et.al. | 2410.18012 | null |
| 2024-10-23 | Benchmarking Foundation Models on Exceptional Cases: Dataset Creation and Validation | Suho Kang et.al. | 2410.18001 | link |
| 2024-10-23 | Zeitenwenden: Detecting changes in the German political discourse | Kai-Robin Lange et.al. | 2410.17960 | null |
| 2024-10-23 | ExpertFlow: Optimized Expert Activation and Token Allocation for Efficient Mixture-of-Experts Inference | Xin He et.al. | 2410.17954 | null |
| 2024-10-23 | SimRAG: Self-Improving Retrieval-Augmented Generation for Adapting Large Language Models to Specialized Domains | Ran Xu et.al. | 2410.17952 | null |
| 2024-10-22 | Altogether: Image Captioning via Re-aligning Alt-text | Hu Xu et.al. | 2410.17251 | null |
| 2024-10-22 | Large Language Models Empowered Personalized Web Agents | Hongru Cai et.al. | 2410.17236 | null |
| 2024-10-22 | Automated Spinal MRI Labelling from Reports Using a Large Language Model | Robin Y. Park et.al. | 2410.17235 | link |
| 2024-10-22 | Fine-Tuning Large Language Models to Appropriately Abstain with Semantic Entropy | Benedict Aaron Tjandra et.al. | 2410.17234 | null |
| 2024-10-22 | Few-shot In-Context Preference Learning Using Large Language Models | Chao Yu et.al. | 2410.17233 | null |
| 2024-10-22 | Context-aware Prompt Tuning: Advancing In-Context Learning with Adversarial Methods | Tsachi Blau et.al. | 2410.17222 | null |
| 2024-10-22 | Exploring Possibilities of AI-Powered Legal Assistance in Bangladesh through Large Language Modeling | Azmine Toushik Wasi et.al. | 2410.17210 | link |
| 2024-10-22 | VoiceBench: Benchmarking LLM-Based Voice Assistants | Yiming Chen et.al. | 2410.17196 | link |
| 2024-10-22 | Language Model Non-myopic Generation for Reasoning and Planning | Chang Ma et.al. | 2410.17195 | null |
| 2024-10-22 | From Attention to Activation: Unravelling the Enigmas of Large Language Models | Prannay Kaul et.al. | 2410.17174 | null |
| 2024-10-21 | Reflection-Bench: probing AI intelligence with reflection | Lingyu Li et.al. | 2410.16270 | link |
| 2024-10-21 | Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance | Zhangwei Gao et.al. | 2410.16261 | link |
| 2024-10-21 | Elucidating the design space of language models for image generation | Xuantong Liu et.al. | 2410.16257 | null |
| 2024-10-21 | CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution | Maosong Cao et.al. | 2410.16256 | link |
| 2024-10-21 | Can Knowledge Editing Really Correct Hallucinations? | Baixiang Huang et.al. | 2410.16251 | link |
| 2024-10-21 | Analyzing Context Contributions in LLM-based Machine Translation | Emmanouil Zaranis et.al. | 2410.16246 | null |
| 2024-10-21 | IBGP: Imperfect Byzantine Generals Problem for Zero-Shot Robustness in Communicative Multi-Agent Systems | Yihuan Mao et.al. | 2410.16237 | null |
| 2024-10-21 | LLaVA-KD: A Framework of Distilling Multimodal Large Language Models | Yuxuan Cai et.al. | 2410.16236 | null |
| 2024-10-21 | ToW: Thoughts of Words Improve Reasoning in Large Language Models | Zhikun Xu et.al. | 2410.16235 | null |
| 2024-10-21 | Building A Coding Assistant via the Retrieval-Augmented Language Model | Xinze Li et.al. | 2410.16229 | null |
| 2024-10-18 | Are AI Detectors Good Enough? A Survey on Quality of Datasets With Machine-Generated Texts | German Gritsai et.al. | 2410.14677 | null |
| 2024-10-18 | SudoLM: Learning Access Control of Parametric Knowledge with Authorization Alignment | Qin Liu et.al. | 2410.14676 | null |
| 2024-10-18 | Enhancing Large Language Models’ Situated Faithfulness to External Contexts | Yukun Huang et.al. | 2410.14675 | link |
| 2024-10-18 | NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples | Baiqi Li et.al. | 2410.14669 | null |
| 2024-10-18 | MiCEval: Unveiling Multimodal Chain of Thought’s Quality via Image Description and Reasoning Steps | Xiongtao Zhou et.al. | 2410.14668 | link |
| 2024-10-18 | A Large Language Model-Driven Reward Design Framework via Dynamic Feedback for Reinforcement Learning | Shengjie Sun et.al. | 2410.14660 | null |
| 2024-10-18 | EvoPress: Towards Optimal Dynamic Model Compression via Evolutionary Search | Oliver Sieberling et.al. | 2410.14649 | null |
| 2024-10-18 | Distance between Relevant Information Pieces Causes Bias in Long-Context LLMs | Runchu Tian et.al. | 2410.14641 | link |
| 2024-10-18 | GenEOL: Harnessing the Generative Power of LLMs for Training-Free Sentence Embeddings | Raghuveer Thirukovalluru et.al. | 2410.14635 | null |
| 2024-10-18 | You Shall Know a Tool by the Traces it Leaves: The Predictability of Sentiment Analysis Tools | Daniel Baumartz et.al. | 2410.14626 | null |
| 2024-10-17 | Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens | Lijie Fan et.al. | 2410.13863 | null |
| 2024-10-17 | PUMA: Empowering Unified MLLM with Multi-granular Visual Generation | Rongyao Fang et.al. | 2410.13861 | link |
| 2024-10-17 | $\gamma$-MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models | Yaxin Luo et.al. | 2410.13859 | null |
| 2024-10-17 | How Numerical Precision Affects Mathematical Reasoning Capabilities of LLMs | Guhao Feng et.al. | 2410.13857 | null |
| 2024-10-17 | Can MLLMs Understand the Deep Implication Behind Chinese Images? | Chenhao Zhang et.al. | 2410.13854 | link |
| 2024-10-17 | Retrospective Learning from Interactions | Zizhao Chen et.al. | 2410.13852 | null |
| 2024-10-17 | SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction | Xuan Zhang et.al. | 2410.13846 | link |
| 2024-10-17 | Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs | Tianyu Guo et.al. | 2410.13835 | null |
| 2024-10-17 | AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents | Ke Yang et.al. | 2410.13825 | null |
| 2024-10-17 | Harnessing Webpage UIs for Text-Rich Visual Understanding | Junpeng Liu et.al. | 2410.13824 | null |
| 2024-10-16 | Context is Key(NMF): Modelling Topical Information Dynamics in Chinese Diaspora Media | Ross Deans Kristensen-McLachlan et.al. | 2410.12791 | null |
| 2024-10-16 | Meta-Chunking: Learning Efficient Text Segmentation via Logical Perception | Jihao Zhao et.al. | 2410.12788 | null |
| 2024-10-16 | In-Context Learning Enables Robot Action Prediction in LLMs | Yida Yin et.al. | 2410.12782 | null |
| 2024-10-16 | Identifying Task Groupings for Multi-Task Learning Using Pointwise V-Usable Information | Yingya Li et.al. | 2410.12774 | null |
| 2024-10-16 | StyleDistance: Stronger Content-Independent Style Embeddings with Synthetic Parallel Examples | Ajay Patel et.al. | 2410.12757 | null |
| 2024-10-16 | Comparative Analysis of Extrinsic Factors for NER in French | Grace Yang et.al. | 2410.12750 | null |
| 2024-10-16 | CREAM: Consistency Regularized Self-Rewarding Language Models | Zhaoyang Wang et.al. | 2410.12735 | null |
| 2024-10-16 | FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression | Zhenheng Tang et.al. | 2410.12707 | null |
| 2024-10-16 | WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines | Genta Indra Winata et.al. | 2410.12705 | null |
| 2024-10-16 | Sarcasm Detection in a Less-Resourced Language | Lazar Đoković et.al. | 2410.12704 | null |
| 2024-10-15 | GaVaMoE: Gaussian-Variational Gated Mixture of Experts for Explainable Recommendation | Fei Tang et.al. | 2410.11841 | null |
| 2024-10-15 | MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding | Yue Cao et.al. | 2410.11829 | link |
| 2024-10-15 | SGEdit: Bridging LLM with Text2Image Generative Model for Scene Graph-based Image Editing | Zhiyuan Zhang et.al. | 2410.11815 | null |
| 2024-10-15 | NesTools: A Dataset for Evaluating Nested Tool Learning Abilities of Large Language Models | Han Han et.al. | 2410.11805 | null |
| 2024-10-15 | FoundTS: Comprehensive and Unified Benchmarking of Foundation Models for Time Series Forecasting | Zhe Li et.al. | 2410.11802 | null |
| 2024-10-15 | Selection-p: Self-Supervised Task-Agnostic Prompt Compression for Faithfulness and Transferability | Tsz Ting Chung et.al. | 2410.11786 | null |
| 2024-10-15 | G-Designer: Architecting Multi-agent Communication Topologies via Graph Neural Networks | Guibin Zhang et.al. | 2410.11782 | null |
| 2024-10-15 | Language Models Encode Numbers Using Digit Representations in Base 10 | Amit Arnold Levy et.al. | 2410.11781 | null |
| 2024-10-15 | MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation | Chenxi Wang et.al. | 2410.11779 | link |
| 2024-10-15 | Layer-wise Importance Matters: Less Memory for Better Performance in Parameter-efficient Fine-tuning of Large Language Models | Kai Yao et.al. | 2410.11772 | link |
| 2024-10-14 | DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads | Guangxuan Xiao et.al. | 2410.10819 | link |
| 2024-10-14 | TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models | Mu Cai et.al. | 2410.10818 | null |
| 2024-10-14 | Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free | Ziyue Li et.al. | 2410.10814 | null |
| 2024-10-14 | LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory | Di Wu et.al. | 2410.10813 | link |
| 2024-10-14 | Local and Global Decoding in Text Generation | Daniel Gareev et.al. | 2410.10810 | link |
| 2024-10-14 | Mix Data or Merge Models? Optimizing for Diverse Multi-Task Learning | Aakanksha et.al. | 2410.10801 | null |
| 2024-10-14 | Towards Foundation Models for 3D Vision: How Close Are We? | Yiming Zuo et.al. | 2410.10799 | null |
| 2024-10-14 | MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling | Jian Yang et.al. | 2410.10798 | null |
| 2024-10-14 | Context-Parametric Inversion: Why Instruction Finetuning May Not Actually Improve Context Reliance | Sachin Goyal et.al. | 2410.10796 | link |
| 2024-10-14 | LiveXiv – A Multi-Modal Live Benchmark Based on Arxiv Papers Content | Nimrod Shabtay et.al. | 2410.10783 | link |
| 2024-10-11 | MiRAGeNews: Multimodal Realistic AI-Generated News Detection | Runsheng Huang et.al. | 2410.09045 | null |
| 2024-10-11 | AttnGCG: Enhancing Jailbreaking Attacks on LLMs with Attention Manipulation | Zijun Wang et.al. | 2410.09040 | link |
| 2024-10-11 | Semi-Supervised Learning of Noisy Mixture of Experts Models | Oh-Ran Kwon et.al. | 2410.09039 | null |
| 2024-10-11 | SimpleStrat: Diversifying Language Model Generation with Stratification | Justin Wong et.al. | 2410.09038 | null |
| 2024-10-11 | Mentor-KD: Making Small Language Models Better Multi-step Reasoners | Hojae Lee et.al. | 2410.09037 | link |
| 2024-10-11 | PEAR: A Robust and Flexible Automation Framework for Ptychography Enabled by Multiple Large Language Model Agents | Xiangyu Yin et.al. | 2410.09034 | null |
| 2024-10-11 | The Impact of Visual Information in Chinese Characters: Evaluating Large Models’ Ability to Recognize and Utilize Radicals | Xiaofeng Wu et.al. | 2410.09013 | null |
| 2024-10-11 | Software Engineering and Foundation Models: Insights from Industry Blogs Using a Jury of Foundation Models | Hao Li et.al. | 2410.09012 | null |
| 2024-10-11 | SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights | Ling Yang et.al. | 2410.09008 | link |
| 2024-10-11 | From Interaction to Impact: Towards Safer AI Agents Through Understanding and Evaluating UI Operation Impacts | Zhuohao Jerry Zhang et.al. | 2410.09006 | null |
| 2024-10-10 | Emerging Pixel Grounding in Large Multimodal Models Without Grounding Supervision | Shengcao Cao et.al. | 2410.08209 | null |
| 2024-10-10 | Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training | Gen Luo et.al. | 2410.08202 | null |
| 2024-10-10 | From Exploration to Mastery: Enabling LLMs to Master Tools via Self-Driven Interactions | Changle Qu et.al. | 2410.08197 | link |
| 2024-10-10 | MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code | Zimu Lu et.al. | 2410.08196 | link |
| 2024-10-10 | GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-time Alignment | Yuancheng Xu et.al. | 2410.08193 | null |
| 2024-10-10 | Sample then Identify: A General Framework for Risk Control and Assessment in Multimodal Large Language Models | Qingni Wang et.al. | 2410.08174 | null |
| 2024-10-10 | On the Evaluation of Generative Robotic Simulations | Feng Chen et.al. | 2410.08172 | null |
| 2024-10-10 | Agent S: An Open Agentic Framework that Uses Computers Like a Human | Saaket Agashe et.al. | 2410.08164 | link |
| 2024-10-10 | Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning | Amrith Setlur et.al. | 2410.08146 | null |
| 2024-10-10 | Insight Over Sight? Exploring the Vision-Knowledge Conflicts in Multimodal LLMs | Xiaoyuan Liu et.al. | 2410.08145 | null |
| 2024-10-09 | Astute RAG: Overcoming Imperfect Retrieval Augmentation and Knowledge Conflicts for Large Language Models | Fei Wang et.al. | 2410.07176 | null |
| 2024-10-09 | Do better language models have crisper vision? | Jona Ruthardt et.al. | 2410.07173 | null |
| 2024-10-09 | Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate | Qidong Huang et.al. | 2410.07167 | link |
| 2024-10-09 | Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making | Manling Li et.al. | 2410.07166 | link |
| 2024-10-09 | Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning | Chongyu Fan et.al. | 2410.07163 | null |
| 2024-10-09 | Trans4D: Realistic Geometry-Aware Transition for Compositional Text-to-4D Synthesis | Bohan Zeng et.al. | 2410.07155 | link |
| 2024-10-09 | Mental Disorders Detection in the Era of Large Language Models | Gleb Kuzmin et.al. | 2410.07129 | null |
| 2024-10-09 | Personalized Visual Instruction Tuning | Renjie Pi et.al. | 2410.07113 | null |
| 2024-10-09 | I Want to Break Free! Anti-Social Behavior and Persuasion Ability of LLMs in Multi-Agent Settings with Social Hierarchy | Gian Maria Campedelli et.al. | 2410.07109 | null |
| 2024-10-09 | Unleashing Multi-Hop Reasoning Potential in Large Language Models through Repetition of Misordered Context | Sangwon Yu et.al. | 2410.07103 | null |
| 2024-10-07 | Data Advisor: Dynamic Data Curation for Safety Alignment of Large Language Models | Fei Wang et.al. | 2410.05269 | null |
| 2024-10-07 | PrefixQuant: Static Quantization Beats Dynamic through Prefixed Outliers in LLMs | Mengzhao Chen et.al. | 2410.05265 | link |
| 2024-10-07 | TurtleBench: Evaluating Top Language Models via Real-World Yes/No Puzzles | Qingchen Yu et.al. | 2410.05262 | link |
| 2024-10-07 | Differential Transformer | Tianzhu Ye et.al. | 2410.05258 | null |
| 2024-10-07 | GLEE: A Unified Framework and Benchmark for Language-based Economic Environments | Eilam Shapira et.al. | 2410.05254 | link |
| 2024-10-07 | Causal Micro-Narratives | Mourad Heddaya et.al. | 2410.05252 | null |
| 2024-10-07 | LoTLIP: Improving Language-Image Pre-training for Long Text Understanding | Wei Wu et.al. | 2410.05249 | null |
| 2024-10-07 | SFTMix: Elevating Language Model Instruction Tuning with Mixup Recipe | Yuxin Xiao et.al. | 2410.05248 | null |
| 2024-10-07 | Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents | Boyu Gou et.al. | 2410.05243 | null |
| 2024-10-07 | GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models | Iman Mirzadeh et.al. | 2410.05229 | null |
| 2024-10-04 | Enhance Reasoning by Learning from Mistakes: Peer-Review Knowledge Distillation from Multiple Large Language Models | Zhuochun Li et.al. | 2410.03663 | null |
| 2024-10-04 | RAFT: Realistic Attacks to Fool Text Detectors | James Wang et.al. | 2410.03658 | null |
| 2024-10-04 | Aligning LLMs with Individual Preferences via Interaction | Shujin Wu et.al. | 2410.03642 | link |
| 2024-10-04 | Large Language Model Performance Benchmarking on Mobile Platforms: A Thorough Evaluation | Jie Xiao et.al. | 2410.03613 | null |
| 2024-10-04 | TICKing All the Boxes: Generated Checklists Improve LLM Evaluation and Generation | Jonathan Cook et.al. | 2410.03608 | null |
| 2024-10-04 | Efficiently Identifying Watermarked Segments in Mixed-Source Texts | Xuandong Zhao et.al. | 2410.03600 | null |
| 2024-10-04 | Understanding Reasoning in Chain-of-Thought from the Hopfieldian View | Lijie Hu et.al. | 2410.03595 | null |
| 2024-10-04 | Explicit, Implicit, and Scattered: Revisiting Event Extraction to Capture Complex Arguments | Omar Sharif et.al. | 2410.03594 | null |
| 2024-10-04 | Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal Large Language Models | Xin Zou et.al. | 2410.03577 | null |
| 2024-10-04 | Towards Linguistically-Aware and Language-Independent Tokenization for Large Language Models (LLMs) | Abrar Rahman et.al. | 2410.03568 | null |
| 2024-10-03 | FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models | Zhipei Xu et.al. | 2410.02761 | null |
| 2024-10-03 | Loong: Generating Minute-level Long Videos with Autoregressive Language Models | Yuqing Wang et.al. | 2410.02757 | null |
| 2024-10-03 | SIEVE: General Purpose Data Filtering System Matching GPT-4o Accuracy at 1% the Cost | Jifan Zhang et.al. | 2410.02755 | null |
| 2024-10-03 | Training Language Models on Synthetic Edit Sequences Improves Code Synthesis | Ulyana Piterbarg et.al. | 2410.02749 | null |
| 2024-10-03 | CriSPO: Multi-Aspect Critique-Suggestion-guided Automatic Prompt Optimization for Text Generation | Han He et.al. | 2410.02748 | null |
| 2024-10-03 | Contrastive Localized Language-Image Pre-Training | Hong-You Chen et.al. | 2410.02746 | null |
| 2024-10-03 | Neutral residues: revisiting adapters for model extension | Franck Signe Talla et.al. | 2410.02744 | null |
| 2024-10-03 | MA-RLHF: Reinforcement Learning from Human Feedback with Macro Actions | Yekun Chai et.al. | 2410.02743 | null |
| 2024-10-03 | Grounding Large Language Models In Embodied Environment With Imperfect World Models | Haolan Liu et.al. | 2410.02742 | null |
| 2024-10-03 | Salient Information Prompting to Steer Content in Prompt-based Abstractive Summarization | Lei Xu et.al. | 2410.02741 | null |
| 2024-10-02 | Locret: Enhancing Eviction in Long-Context LLM Inference with Trained Retaining Heads | Yuxiang Huang et.al. | 2410.01805 | link |
| 2024-10-02 | Efficient $1$-bit tensor approximations | Alex W. Neal Riasanovsky et.al. | 2410.01799 | null |
| 2024-10-02 | Knowledge-Driven Feature Selection and Engineering for Genotype Data with Large Language Models | Joseph Lee et.al. | 2410.01795 | link |
| 2024-10-02 | When a language model is optimized for reasoning, does it still show embers of autoregression? An analysis of OpenAI o1 | R. Thomas McCoy et.al. | 2410.01792 | null |
| 2024-10-02 | Investigating on RLHF methodology | Alexey Kutalev et.al. | 2410.01789 | null |
| 2024-10-02 | OmniGenBench: Automating Large-scale in-silico Benchmarking for Genomic Foundation Models | Heng Yang et.al. | 2410.01784 | link |
| 2024-10-02 | Open-RAG: Enhanced Retrieval-Augmented Reasoning with Open-Source Large Language Models | Shayekh Bin Islam et.al. | 2410.01782 | null |
| 2024-10-02 | Quantifying Generalization Complexity for Large Language Models | Zhenting Qi et.al. | 2410.01769 | null |
| 2024-10-02 | LEOPARD: A Vision Language Model For Text-Rich Multi-Image Tasks | Mengzhao Jia et.al. | 2410.01744 | null |
| 2024-10-02 | VitaGlyph: Vitalizing Artistic Typography with Flexible Dual-branch Diffusion Models | Kailai Feng et.al. | 2410.01738 | link |
| 2024-09-30 | MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning | Haotian Zhang et.al. | 2409.20566 | null |
| 2024-09-30 | Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos | Md Mohaiminul Islam et.al. | 2409.20557 | null |
| 2024-09-30 | LLM Hallucinations in Practical Code Generation: Phenomena, Mechanism, and Mitigation | Ziyao Zhang et.al. | 2409.20550 | null |
| 2024-09-30 | Robi Butler: Remote Multimodal Interactions with Household Robot Assistant | Anxing Xiao et.al. | 2409.20548 | null |
| 2024-09-30 | Uncertainty-Informed Screening for Safer Solvents Used in the Synthesis of Perovskite via Language Models | Arpan Mukherjee et.al. | 2409.20512 | null |
| 2024-09-30 | COLLAGE: Collaborative Human-Agent Interaction Generation using Hierarchical Latent Diffusion and Language Models | Divyanshu Daiya et.al. | 2409.20502 | null |
| 2024-10-02 | Linear Projections of Teacher Embeddings for Few-Class Distillation | Noel Loo et.al. | 2409.20449 | null |
| 2024-10-01 | Instance-adaptive Zero-shot Chain-of-Thought Prompting | Xiaosong Yuan et.al. | 2409.20441 | null |
| 2024-09-30 | HELPD: Mitigating Hallucination of LVLMs by Hierarchical Feedback Learning with Vision-enhanced Penalty Decoding | Fan Yuan et.al. | 2409.20429 | null |
| 2024-09-30 | World to Code: Multi-modal Data Generation via Self-Instructed Compositional Captioning and Filtering | Jiacong Wang et.al. | 2409.20424 | null |
| 2024-09-27 | LML: Language Model Learning a Dataset for Data-Augmented Prediction | Praneeth Vadlapati et.al. | 2409.18957 | link |
| 2024-09-27 | Ruler: A Model-Agnostic Method to Control Generated Length for Large Language Models | Jiaming Li et.al. | 2409.18943 | link |
| 2024-09-27 | From Seconds to Hours: Reviewing MultiModal Large Language Models on Comprehensive Long Video Understanding | Heqing Zou et.al. | 2409.18938 | link |
| 2024-09-27 | AIPatient: Simulating Patients with EHRs and LLM Powered Agentic Workflow | Huizi Yu et.al. | 2409.18924 | null |
| 2024-09-27 | Soft Measures for Extracting Causal Collective Intelligence | Maryam Berijanian et.al. | 2409.18911 | link |
| 2024-09-27 | Multi-Source Hard and Soft Information Fusion Approach for Accurate Cryptocurrency Price Movement Prediction | Saeed Mohammadi Dashtaki et.al. | 2409.18895 | null |
| 2024-09-27 | HM3: Hierarchical Multi-Objective Model Merging for Pretrained Models | Yu Zhou et.al. | 2409.18893 | null |
| 2024-09-27 | IDGen: Item Discrimination Induced Prompt Generation for LLM Evaluation | Fan Lin et.al. | 2409.18892 | null |
| 2024-09-27 | Predicting and analyzing memorization within fine-tuned Large Language Models | Jérémie Dentan et.al. | 2409.18858 | null |
| 2024-09-27 | Mitigating Selection Bias with Node Pruning and Auxiliary Options | Hyeong Kyu Choi et.al. | 2409.18857 | null |
| 2024-09-26 | EgoLM: Multi-Modal Language Model of Egocentric Motions | Fangzhou Hong et.al. | 2409.18127 | null |
| 2024-09-26 | Multi-View and Multi-Scale Alignment for Contrastive Language-Image Pre-training in Mammography | Yuexi Du et.al. | 2409.18119 | link |
| 2024-09-26 | E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding | Ye Liu et.al. | 2409.18111 | link |
| 2024-09-26 | Inferring Alt-text For UI Icons With Large Language Models During App Development | Sabrina Haque et.al. | 2409.18060 | null |
| 2024-09-26 | DualAD: Dual-Layer Planning for Reasoning in Autonomous Driving | Dingrui Wang et.al. | 2409.18053 | null |
| 2024-09-26 | IFCap: Image-like Retrieval and Frequency-based Entity Filtering for Zero-shot Captioning | Soeun Lee et.al. | 2409.18046 | null |
| 2024-09-26 | Unveiling the Role of Pretraining in Direct Speech Translation | Belen Alastruey et.al. | 2409.18044 | null |
| 2024-09-26 | EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions | Kai Chen et.al. | 2409.18042 | link |
| 2024-09-26 | Compositional Hardness of Code in Large Language Models – A Probabilistic Perspective | Yotam Wolf et.al. | 2409.18028 | null |
| 2024-09-26 | An Adversarial Perspective on Machine Unlearning for AI Safety | Jakub Łucki et.al. | 2409.18025 | null |
| 2024-09-25 | Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models | Matt Deitke et.al. | 2409.17146 | link |
| 2024-09-25 | Attention Prompting on Image for Large Vision-Language Models | Runpeng Yu et.al. | 2409.17143 | link |
| 2024-09-25 | FineZip: Pushing the Limits of Large Language Models for Practical Lossless Text Compression | Fazal Mittu et.al. | 2409.17141 | link |
| 2024-09-25 | Turn Every Application into an Agent: Towards Efficient Human-Agent-Computer Interaction with API-First LLM-Based Agents | Junting Lu et.al. | 2409.17140 | null |
| 2024-09-25 | Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale | Fan Zhou et.al. | 2409.17115 | link |
| 2024-09-25 | Accumulator-Aware Post-Training Quantization | Ian Colbert et.al. | 2409.17092 | null |
| 2024-09-25 | VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models | Yifei Liu et.al. | 2409.17066 | link |
| 2024-09-25 | Using LLM for Real-Time Transcription and Summarization of Doctor-Patient Interactions into ePuskesmas in Indonesia | Azmul Asmar Irfan et.al. | 2409.17054 | null |
| 2024-09-25 | How to Connect Speech Foundation Models and Large Language Models? What Matters and What Does Not | Francesco Verdini et.al. | 2409.17044 | null |
| 2024-09-25 | Counterfactual Token Generation in Large Language Models | Ivi Chatzi et.al. | 2409.17027 | link |
| 2024-09-24 | MonoFormer: One Transformer for Both Diffusion and Autoregression | Chuyang Zhao et.al. | 2409.16280 | link |
| 2024-09-24 | A fast and sound tagging method for discontinuous named-entity recognition | Caio Corro et.al. | 2409.16243 | null |
| 2024-09-24 | LLM Echo Chamber: personalized and automated disinformation | Tony Ma et.al. | 2409.16241 | link |
| 2024-09-24 | Towards Enhancing Linked Data Retrieval in Conversational UIs using Large Language Models | Omar Mussa et.al. | 2409.16220 | null |
| 2024-09-24 | LLMCount: Enhancing Stationary mmWave Detection with Multimodal-LLM | Boyan Li et.al. | 2409.16209 | null |
| 2024-09-25 | CJEval: A Benchmark for Assessing Large Language Models Using Chinese Junior High School Exam Data | Qian-Wen Zhang et.al. | 2409.16202 | link |
| 2024-09-24 | HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models | Haoran Que et.al. | 2409.16191 | link |
| 2024-09-24 | Expert-level vision-language foundation model for real-world radiology and comprehensive evaluation | Xiaohong Liu et.al. | 2409.16183 | null |
| 2024-09-24 | Cyber Knowledge Completion Using Large Language Models | Braden K Webb et.al. | 2409.16176 | null |
| 2024-09-24 | Merging LoRAs like Playing LEGO: Pushing the Modularity of LoRA to Extremes Through Rank-Wise Clustering | Ziyu Zhao et.al. | 2409.16167 | null |
| 2024-09-20 | Gender Representation and Bias in Indian Civil Service Mock Interviews | Somonnoy Banerjee et.al. | 2409.12194 | null |
| 2024-09-18 | To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning | Zayne Sprague et.al. | 2409.12183 | link |
| 2024-09-18 | Finetuning Language Models to Emit Linguistic Expressions of Uncertainty | Arslan Chaudhry et.al. | 2409.12180 | null |
| 2024-09-18 | Decoding Style: Efficient Fine-Tuning of LLMs for Image-Guided Outfit Recommendation with Preference | Najmeh Forouzandehmehr et.al. | 2409.12150 | null |
| 2024-09-18 | MAgICoRe: Multi-Agent, Iterative, Coarse-to-Fine Refinement for Reasoning | Justin Chih-Yao Chen et.al. | 2409.12147 | link |
| 2024-09-18 | Experimental Evidence That Conversational Artificial Intelligence Can Steer Consumer Behavior Without Detection | Tobias Werner et.al. | 2409.12143 | null |
| 2024-09-18 | MoRAG – Multi-Fusion Retrieval Augmented Generation for Human Motion | Kalakonda Sai Shashank et.al. | 2409.12140 | link |
| 2024-09-24 | Takin: A Cohort of Superior Quality Zero-shot Speech Generation Models | Sijing Chen et.al. | 2409.12139 | null |
| 2024-09-18 | Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement | An Yang et.al. | 2409.12122 | null |
| 2024-09-18 | Low Frame-rate Speech Codec: a Codec Designed for Fast High-quality Speech LLM Training and Inference | Edresson Casanova et.al. | 2409.12117 | null |
| 2024-09-17 | AraDiCE: Benchmarks for Dialectal and Cultural Capabilities in LLMs | Basel Mousi et.al. | 2409.11404 | null |
| 2024-09-17 | NVLM: Open Frontier-Class Multimodal LLMs | Wenliang Dai et.al. | 2409.11402 | null |
| 2024-09-17 | Says Who? Effective Zero-Shot Annotation of Focalization | Rebecca M. M. Hicke et.al. | 2409.11390 | null |
| 2024-09-17 | Diversify and Conquer: Diversity-Centric Data Selection with Iterative Refinement | Simon Yu et.al. | 2409.11378 | link |
| 2024-09-17 | Towards Time Series Reasoning with LLMs | Winnie Chow et.al. | 2409.11376 | null |
| 2024-09-17 | Multi-OCT-SelfNet: Integrating Self-Supervised Learning with Multi-Source Data Fusion for Enhanced Multi-Class Retinal Disease Classification | Fatema-E-Jannat et.al. | 2409.11375 | null |
| 2024-09-17 | CoCA: Regaining Safety-awareness of Multimodal Large Language Models with Constitutional Calibration | Jiahui Gao et.al. | 2409.11365 | null |
| 2024-09-17 | AI Suggestions Homogenize Writing Toward Western Styles and Diminish Cultural Nuances | Dhruv Agarwal et.al. | 2409.11360 | null |
| 2024-09-17 | THaMES: An End-to-End Tool for Hallucination Mitigation and Evaluation in Large Language Models | Mengfei Liang et.al. | 2409.11353 | null |
| 2024-09-18 | Zero-resource Hallucination Detection for Text Generation via Graph-based Contextual Knowledge Triples Modeling | Xinyue Fang et.al. | 2409.11283 | null |
| 2024-09-16 | RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval | Di Liu et.al. | 2409.10516 | null |
| 2024-09-16 | Context-aware Code Segmentation for C-to-Rust Translation using Large Language Models | Momoko Shiraishi et.al. | 2409.10506 | null |
| 2024-09-16 | DILA: Dictionary Label Attention for Mechanistic Interpretability in High-dimensional Multi-label Medical Coding Prediction | John Wu et.al. | 2409.10504 | null |
| 2024-09-16 | Causal Language Modeling Can Elicit Search and Reasoning Capabilities on Logic Puzzles | Kulin Shah et.al. | 2409.10502 | link |
| 2024-09-16 | Code Vulnerability Detection: A Comparative Analysis of Emerging Large Language Models | Shaznin Sultana et.al. | 2409.10490 | null |
| 2024-09-16 | XLM for Autonomous Driving Systems: A Comprehensive Review | Sonda Fourati et.al. | 2409.10484 | null |
| 2024-09-16 | Schrodinger’s Memory: Large Language Models | Wei Wang et.al. | 2409.10482 | null |
| 2024-09-16 | LLM as BT-Planner: Leveraging LLMs for Behavior Tree Generation in Robot Task Planning | Jicong Ao et.al. | 2409.10444 | link |
| 2024-09-16 | A Large-Scale Privacy Assessment of Android Third-Party SDKs | Mark Huasong Meng et.al. | 2409.10411 | null |
| 2024-09-17 | Learnings from a Large-Scale Deployment of an LLM-Powered Expert-in-the-Loop Healthcare Chatbot | Bhuvan Sachdeva et.al. | 2409.10354 | null |
| 2024-09-13 | Agents in Software Engineering: Survey, Landscape, and Vision | Yanxian Huang et.al. | 2409.09030 | link |
| 2024-09-13 | Contri(e)ve: Context + Retrieve for Scholarly Question Answering | Kanchan Shivashankar et.al. | 2409.09010 | null |
| 2024-09-13 | Safeguarding Decentralized Social Media: LLM Agents for Automating Community Rule Compliance | Lucio La Cava et.al. | 2409.08963 | null |
| 2024-09-13 | Emerging Reliance Behaviors in Human-AI Text Generation: Hallucinations, Data Quality Assessment, and Cognitive Forcing Functions | Zahra Ashktorab et.al. | 2409.08937 | null |
| 2024-09-13 | SynSUM – Synthetic Benchmark with Structured and Unstructured Medical Records | Paloma Rabaey et.al. | 2409.08936 | link |
| 2024-09-13 | LLM-based Weak Supervision Framework for Query Intent Classification in Video Search | Farnoosh Javadi et.al. | 2409.08931 | null |
| 2024-09-13 | AnyBipe: An End-to-End Framework for Training and Deploying Bipedal Robots Guided by Large Language Models | Yifei Yao et.al. | 2409.08904 | null |
| 2024-09-13 | A Market for Lemons? Strategic Directions for a Vigilant Application of Artificial Intelligence in Entrepreneurship Research | Martin Obschonka et.al. | 2409.08890 | null |
| 2024-09-13 | Exploring Graph Structure Comprehension Ability of Multimodal Large Language Models: Case Studies | Zhiqiang Zhong et.al. | 2409.08864 | null |
| 2024-09-13 | FP-VEC: Fingerprinting Large Language Models via Efficient Vector Addition | Zhenhua Xu et.al. | 2409.08846 | null |
| 2024-09-12 | DreamHOI: Subject-Driven Generation of 3D Human-Object Interactions with Diffusion Priors | Thomas Hanwen Zhu et.al. | 2409.08278 | null |
| 2024-09-12 | Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale | Rogerio Bonatti et.al. | 2409.08264 | link |
| 2024-09-12 | OmniQuery: Contextually Augmenting Captured Multimodal Memory to Enable Personal Question Answering | Jiahao Nick Li et.al. | 2409.08250 | null |
| 2024-09-12 | Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources | Alisia Lupidi et.al. | 2409.08239 | null |
| 2024-09-12 | LLM Honeypot: Leveraging Large Language Models as Advanced Interactive Honeypot Systems | Hakan T. Otal et.al. | 2409.08234 | link |
| 2024-09-12 | What Makes a Maze Look Like a Maze? | Joy Hsu et.al. | 2409.08202 | null |
| 2024-09-12 | Fine-tuning Large Language Models for Entity Matching | Aaron Steiner et.al. | 2409.08185 | link |
| 2024-09-12 | Faster Speech-LLaMA Inference with Multi-token Prediction | Desh Raj et.al. | 2409.08148 | null |
| 2024-09-12 | LLM-POTUS Score: A Framework of Analyzing Presidential Debates with Large Language Models | Zhengliang Liu et.al. | 2409.08147 | null |
| 2024-09-12 | WhisperNER: Unified Open Named Entity and Speech Recognition | Gil Ayache et.al. | 2409.08107 | null |
| 2024-09-11 | “My Grade is Wrong!”: A Contestable AI Framework for Interactive Feedback in Evaluating Student Essays | Shengxin Hong et.al. | 2409.07453 | null |
| 2024-09-11 | SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories | Ben Bogin et.al. | 2409.07440 | link |
| 2024-09-11 | CLNX: Bridging Code and Natural Language for C/C++ Vulnerability-Contributing Commits Identification | Zeqing Qin et.al. | 2409.07407 | null |
| 2024-09-11 | AdaCAD: Adaptively Decoding to Balance Conflicts between Contextual and Parametric Knowledge | Han Wang et.al. | 2409.07394 | link |
| 2024-09-11 | Recent Trends of Multimodal Affective Computing: A Survey from NLP Perspective | Guimin Hu et.al. | 2409.07388 | null |
| 2024-09-11 | Demo: SGCode: A Flexible Prompt-Optimizing System for Secure Generation of Code | Khiem Ton et.al. | 2409.07368 | null |
| 2024-09-11 | Think Together and Work Better: Combining Humans’ and LLMs’ Think-Aloud Outcomes for Effective Text Evaluation | SeongYeub Chu et.al. | 2409.07355 | link |
| 2024-09-11 | Securing Vision-Language Models with a Robust Encoder Against Jailbreak and Adversarial Attacks | Md Zarif Hossain et.al. | 2409.07353 | link |
| 2024-09-11 | Learning to Compress Contexts for Efficient Knowledge-based Visual Question Answering | Weixi Weng et.al. | 2409.07331 | null |
| 2024-09-11 | MEDIC: Towards a Comprehensive Framework for Evaluating LLMs in Clinical Applications | Praveen K Kanithi et.al. | 2409.07314 | null |
| 2024-09-10 | E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning | Zihan Liao et.al. | 2409.06679 | link |
| 2024-09-10 | LLaMA-Omni: Seamless Speech Interaction with Large Language Models | Qingkai Fang et.al. | 2409.06666 | link |
| 2024-09-10 | Human Perception of LLM-generated Text Content in Social Media Environments | Kristina Radivojevic et.al. | 2409.06653 | null |
| 2024-09-10 | Optimal Workload Placement on Multi-Instance GPUs | Bekir Turkkan et.al. | 2409.06646 | null |
| 2024-09-10 | EyeCLIP: A visual-language foundation model for multi-modal ophthalmic image analysis | Danli Shi et.al. | 2409.06644 | null |
| 2024-09-10 | MoWE-Audio: Multitask AudioLLMs with Mixture of Weak Encoders | Wenyu Zhang et.al. | 2409.06635 | null |
| 2024-09-10 | A Practice of Post-Training on Llama-3 70B with Optimal Selection of Additional Language Mixture Ratio | Ningyuan Xi et.al. | 2409.06624 | null |
| 2024-09-10 | Alleviating Hallucinations in Large Language Models with Scepticism Modeling | Yetao Wu et.al. | 2409.06601 | null |
| 2024-09-10 | GroUSE: A Benchmark to Evaluate Evaluators in Grounded Question Answering | Sacha Muller et.al. | 2409.06595 | link |
| 2024-09-10 | MAPS: Energy-Reliability Tradeoff Management in Autonomous Vehicles Through LLMs Penetrated Science | Mahdieh Aliazam et.al. | 2409.06558 | null |
| 2024-09-09 | MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct | Run Luo et.al. | 2409.05840 | null |
| 2024-09-09 | Are Large Language Models a Threat to Programming Platforms? An Exploratory Study | Md Mustakim Billah et.al. | 2409.05824 | null |
| 2024-09-09 | Benchmarking Chinese Knowledge Rectification in Large Language Models | Tianhe Lu et.al. | 2409.05806 | link |
| 2024-09-09 | Breaking Neural Network Scaling Laws with Modularity | Akhilan Boopathy et.al. | 2409.05780 | null |
| 2024-09-09 | Evidence from fMRI Supports a Two-Phase Abstraction Process in Language Models | Emily Cheng et.al. | 2409.05771 | null |
| 2024-09-09 | Model Input Verification of Large Scale Simulations | Rumyana Neykova et.al. | 2409.05768 | null |
| 2024-09-09 | A Novel Idea Generation Tool using a Structured Conversational AI (CAI) System | B. Sankar et.al. | 2409.05747 | null |
| 2024-09-09 | LLMs Will Always Hallucinate, and We Need to Live With This | Sourav Banerjee et.al. | 2409.05746 | null |
| 2024-09-09 | A System and Benchmark for LLM-based Q&A on Heterogeneous Data | Achille Fokoue et.al. | 2409.05735 | null |
| 2024-09-09 | Towards Democratizing Multilingual Large Language Models For Medicine Through A Two-Stage Instruction Fine-tuning Approach | Meng Zhou et.al. | 2409.05732 | link |
| 2024-09-06 | RLPF: Reinforcement Learning from Prediction Feedback for User Summarization with LLMs | Jiaxing Wu et.al. | 2409.04421 | null |
| 2024-09-06 | Question-Answering Dense Video Events | Hangyu Qin et.al. | 2409.04388 | link |
| 2024-09-06 | Learning vs Retrieval: The Role of In-Context Examples in Regression with LLMs | Aliakbar Nafar et.al. | 2409.04318 | null |
| 2024-09-06 | An optically accelerated extreme learning machine using hot atomic vapors | Pierre Azam et.al. | 2409.04312 | null |
| 2024-09-06 | Using Large Language Models to Generate Authentic Multi-agent Knowledge Work Datasets | Desiree Heim et.al. | 2409.04286 | null |
| 2024-09-06 | Advancing Automated Knowledge Transfer in Evolutionary Multitasking via Large Language Models | Yuxiao Huang et.al. | 2409.04270 | null |
| 2024-09-06 | GALLa: Graph Aligned Large Language Models for Improved Source Code Understanding | Ziyin Zhang et.al. | 2409.04183 | link |
| 2024-09-06 | Combining LLMs and Knowledge Graphs to Reduce Hallucinations in Question Answering | Larissa Pusch et.al. | 2409.04181 | null |
| 2024-09-06 | From Calculation to Adjudication: Examining LLM judges on Mathematical Reasoning Tasks | Andreas Stephan et.al. | 2409.04168 | null |
| 2024-09-06 | Can OpenSource beat ChatGPT? – A Comparative Study of Large Language Models for Text-to-Code Generation | Luis Mayer et.al. | 2409.04164 | null |
| 2024-09-05 | Attention Heads of Large Language Models: A Survey | Zifan Zheng et.al. | 2409.03752 | link |
| 2024-09-05 | LLM-CI: Assessing Contextual Integrity Norms in Language Models | Yan Shvartzshnaider et.al. | 2409.03735 | null |
| 2024-09-05 | Safety vs. Performance: How Multi-Objective Learning Reduces Barriers to Market Entry | Meena Jagadeesan et.al. | 2409.03734 | null |
| 2024-09-05 | Planning In Natural Language Improves LLM Search For Code Generation | Evan Wang et.al. | 2409.03733 | null |
| 2024-09-05 | RAG based Question-Answering for Contextual Response Prediction System | Sriram Veturi et.al. | 2409.03708 | null |
| 2024-09-05 | TRACE-cs: Trustworthy Reasoning for Contrastive Explanations in Course Scheduling Problems | Stylianos Loukas Vasileiou et.al. | 2409.03671 | null |
| 2024-09-05 | A Fused Large Language Model for Predicting Startup Success | Abdurahman Maarouf et.al. | 2409.03668 | null |
| 2024-09-05 | The representation landscape of few-shot learning and fine-tuning in large language models | Diego Doimo et.al. | 2409.03662 | link |
| 2024-09-06 | LLM-based multi-agent poetry generation in non-cooperative environments | Ran Zhang et.al. | 2409.03659 | link |
| 2024-09-05 | From MOOC to MAIC: Reshaping Online Teaching and Learning through LLM-driven Agents | Jifan Yu et.al. | 2409.03512 | null |
| 2024-09-04 | RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins (early version) | Yao Mu et.al. | 2409.02920 | link |
| 2024-09-05 | LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA | Jiajie Zhang et.al. | 2409.02897 | link |
| 2024-09-04 | LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture | Xidong Wang et.al. | 2409.02889 | link |
| 2024-09-04 | Historical German Text Normalization Using Type- and Token-Based Language Modeling | Anton Ehrmanntraut et.al. | 2409.02841 | null |
| 2024-09-04 | Exploring Sentiment Dynamics and Predictive Behaviors in Cryptocurrency Discussions by Few-Shot Learning with Large Language Models | Moein Shahiki Tash et.al. | 2409.02836 | null |
| 2024-09-04 | CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models | Wentao Liu et.al. | 2409.02834 | link |
| 2024-09-04 | ExpLLM: Towards Chain of Thought for Facial Expression Recognition | Xing Lan et.al. | 2409.02828 | link |
| 2024-09-04 | Design Contradictions: Help or Hindrance? | Aron E. Owen et.al. | 2409.02823 | null |
| 2024-09-04 | Language Understanding as a Constraint on Consensus Size in LLM Societies | Giordano De Marzo et.al. | 2409.02822 | null |
| 2024-09-04 | Towards a Unified View of Preference Learning for Large Language Models: A Survey | Bofei Gao et.al. | 2409.02795 | link |
| 2024-08-30 | SYNTHEVAL: Hybrid Behavioral Testing of NLP Models with Synthetic CheckLists | Raoyuan Zhao et.al. | 2408.17437 | link |
| 2024-08-30 | Advancing Multi-talker ASR Performance with Large Language Models | Mohan Shi et.al. | 2408.17431 | null |
| 2024-08-30 | CLOCR-C: Context Leveraging OCR Correction with Pre-trained Language Models | Jonathan Bourne et.al. | 2408.17428 | null |
| 2024-08-30 | Getting Inspiration for Feature Elicitation: App Store- vs. LLM-based Approach | Jialiang Wei et.al. | 2408.17404 | link |
| 2024-08-30 | NDP: Next Distribution Prediction as a More Broad Target | Junhao Ruan et.al. | 2408.17377 | null |
| 2024-08-30 | Look, Learn and Leverage (L$^3$): Mitigating Visual-Domain Shift and Discovering Intrinsic Relations via Symbolic Alignment | Hanchen Xie et.al. | 2408.17363 | null |
| 2024-08-30 | Assessing Generative Language Models in Classification Tasks: Performance and Self-Evaluation Capabilities in the Environmental and Climate Change Domain | Francesca Grasso et.al. | 2408.17362 | link |
| 2024-08-30 | Forget to Flourish: Leveraging Machine-Unlearning on Pretrained Language Models for Privacy Leakage | Md Rafi Ur Rashid et.al. | 2408.17354 | null |
| 2024-08-30 | Bridging Domain Knowledge and Process Discovery Using Large Language Models | Ali Norouzifar et.al. | 2408.17316 | link |
| 2024-08-30 | Flexible and Effective Mixing of Large Language Models into a Mixture of Domain Experts | Rhui Dih Lee et.al. | 2408.17280 | null |
| 2024-08-29 | How Far Can Cantonese NLP Go? Benchmarking Cantonese Capabilities of Large Language Models | Jiyue Jiang et.al. | 2408.16756 | link |
| 2024-08-29 | Reinforcement Learning without Human Feedback for Last Mile Fine-Tuning of Large Language Models | Alec Solway et.al. | 2408.16753 | null |
| 2024-08-29 | Assessing Large Language Models for Online Extremism Research: Identification, Explanation, and New Knowledge | Beidi Dong et.al. | 2408.16749 | null |
| 2024-08-29 | Theoretical and Methodological Framework for Studying Texts Produced by Large Language Models | Jiří Milička et.al. | 2408.16740 | null |
| 2024-08-29 | GradBias: Unveiling Word Influence on Bias in Text-to-Image Generative Models | Moreno D’Incà et.al. | 2408.16700 | link |
| 2024-08-29 | Entropic Distribution Matching in Supervised Fine-tuning of LLMs: Less Overfitting and Better Diversity | Ziniu Li et.al. | 2408.16673 | null |
| 2024-08-29 | Examination of Code generated by Large Language Models | Robin Beer et.al. | 2408.16601 | link |
| 2024-08-29 | Enhancing Dialogue Generation in Werewolf Game Through Situation Analysis and Persuasion Strategies | Zhiyang Qi et.al. | 2408.16586 | null |
| 2024-08-29 | CNIMA: A Universal Evaluation Framework and Automated Approach for Assessing Second Language Dialogues | Rena Gao et.al. | 2408.16518 | null |
| 2024-08-29 | LLMs vs Established Text Augmentation Techniques for Classification: When do the Benefits Outweight the Costs? | Jan Cegin et.al. | 2408.16502 | null |
| 2024-08-28 | Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders | Min Shi et.al. | 2408.15998 | link |
| 2024-08-28 | BattleAgentBench: A Benchmark for Evaluating Cooperation and Competition Capabilities of Language Models in Multi-Agent Systems | Wei Wang et.al. | 2408.15971 | null |
| 2024-08-28 | More Text, Less Point: Towards 3D Data-Efficient Point-Language Understanding | Yuan Tang et.al. | 2408.15966 | link |
| 2024-08-28 | Atari-GPT: Investigating the Capabilities of Multimodal Large Language Models as Low-Level Policies for Atari Games | Nicholas R. Waytowich et.al. | 2408.15950 | null |
| 2024-08-28 | Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models | Yuncheng Yang et.al. | 2408.15915 | link |
| 2024-08-28 | Decentralized LLM Inference over Edge Networks with Energy Harvesting | Aria Khoshsirat et.al. | 2408.15907 | null |
| 2024-08-28 | LLM-Based Multi-Hop Question Answering with Knowledge Graph Integration in Evolving Environments | Ruirui Chen et.al. | 2408.15903 | null |
| 2024-08-28 | Nexus: Specialization meets Adaptability for Efficiently Training Mixture of Experts | Nikolas Gritsch et.al. | 2408.15901 | null |
| 2024-08-28 | Bias in LLMs as Annotators: The Effect of Party Cues on Labelling Decision by Large Language Models | Sebastian Vallejo Vera et.al. | 2408.15895 | null |
| 2024-08-28 | Persuasion Games using Large Language Models | Ganesh Prasath Ramani et.al. | 2408.15879 | null |
| 2024-08-27 | Generative Verifiers: Reward Modeling as Next-Token Prediction | Lunjun Zhang et.al. | 2408.15240 | null |
| 2024-08-27 | LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet | Nathaniel Li et.al. | 2408.15221 | null |
| 2024-08-27 | Investigating Coverage Criteria in Large Language Models: An In-Depth Study Through Jailbreak Attacks | Shide Zhou et.al. | 2408.15207 | null |
| 2024-08-27 | Leveraging Hallucinations to Reduce Manual Prompt Dependency in Promptable Segmentation | Jian Hu et.al. | 2408.15205 | link |
| 2024-08-27 | Can Unconfident LLM Annotations Be Used for Confident Conclusions? | Kristina Gligorić et.al. | 2408.15204 | link |
| 2024-08-27 | Unlocking Potential in Pre-Trained Music Language Models for Versatile Multi-Track Music Arrangement | Longshen Ou et.al. | 2408.15176 | null |
| 2024-08-27 | X-Reflect: Cross-Reflection Prompting for Multimodal Recommendation | Hanjia Lyu et.al. | 2408.15172 | null |
| 2024-08-27 | Measuring text summarization factuality using atomic facts entailment metrics in the context of retrieval augmented generation | N. E. Kriman et.al. | 2408.15171 | null |
| 2024-08-27 | BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Competitive Large Language Model Baseline | Guosheng Dong et.al. | 2408.15079 | null |
| 2024-08-27 | Constraining Participation: Affordances of Feedback Features in Interfaces to Large Language Models | Ned Cooper et.al. | 2408.15066 | null |
| 2024-08-27 | Step-by-Step Unmasking for Parameter-Efficient Fine-tuning of Large Language Models | Aradhye Agarwal et.al. | 2408.14470 | null |
| 2024-08-26 | Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos | Qirui Chen et.al. | 2408.14469 | link |
| 2024-08-26 | Explicit Inductive Inference using Large Language Models | Tianyang Liu et.al. | 2408.14467 | null |
| 2024-08-26 | Evaluating Large Language Models on Spatial Tasks: A Multi-Task Benchmarking Study | Liuchang Xu et.al. | 2408.14438 | null |
| 2024-08-26 | CHARTOM: A Visual Theory-of-Mind Benchmark for Multimodal Large Language Models | Shubham Bharti et.al. | 2408.14419 | null |
| 2024-08-26 | MEDSAGE: Enhancing Robustness of Medical Dialogue Summarization to ASR Errors with LLM-generated Synthetic Dialogues | Kuluhan Binici et.al. | 2408.14418 | null |
| 2024-08-26 | Language-specific Calibration for Pruning Multilingual Language Models | Simon Kurz et.al. | 2408.14398 | null |
| 2024-08-26 | Reprogramming Foundational Large Language Models (LLMs) for Enterprise Adoption for Spatio-Temporal Forecasting Applications: Unveiling a New Era in Copilot-Guided Cross-Modal Time Series Representation Learning | Sakhinana Sagar Srinivas et.al. | 2408.14387 | null |
| 2024-08-26 | Probing Causality Manipulation of Large Language Models | Chenyang Zhang et.al. | 2408.14380 | link |
| 2024-08-26 | SWE-bench-java: A GitHub Issue Resolving Benchmark for Java | Daoguang Zan et.al. | 2408.14354 | link |
| 2024-08-23 | MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? | Yi-Fan Zhang et.al. | 2408.13257 | null |
| 2024-08-23 | Domain-specific long text classification from sparse relevant information | Célia D’Cruz et.al. | 2408.13253 | null |
| 2024-08-23 | Foundational Model for Electron Micrograph Analysis: Instruction-Tuning Small-Scale Language-and-Vision Assistant for Enterprise Adoption | Sakhinana Sagar Srinivas et.al. | 2408.13248 | null |
| 2024-08-23 | Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time | Yingyu Liang et.al. | 2408.13233 | null |
| 2024-08-23 | EUR-USD Exchange Rate Forecasting Based on Information Fusion with Large Language Models and Deep Learning Methods | Hongcheng Ding et.al. | 2408.13214 | null |
| 2024-08-23 | DOMAINEVAL: An Auto-Constructed Benchmark for Multi-Domain Code Generation | Qiming Zhu et.al. | 2408.13204 | null |
| 2024-08-23 | Instruct-DeBERTa: A Hybrid Approach for Aspect-based Sentiment Analysis on Textual Reviews | Dineth Jayakody et.al. | 2408.13202 | null |
| 2024-08-23 | Can LLM be a Good Path Planner based on Prompt Engineering? Mitigating the Hallucination for Path Planning | Hourui Deng et.al. | 2408.13184 | null |
| 2024-08-23 | IntelliCare: Improving Healthcare Analysis with Variance-Controlled Patient-Level Knowledge from Large Language Models | Zhihao Yu et.al. | 2408.13073 | null |
| 2024-08-23 | Guiding IoT-Based Healthcare Alert Systems with Large Language Models | Yulan Gao et.al. | 2408.13071 | null |
| 2024-08-22 | Controllable Text Generation for Large Language Models: A Survey | Xun Liang et.al. | 2408.12599 | link |
| 2024-08-22 | RuleAlign: Making Large Language Models Better Physicians with Diagnostic Rule Alignment | Xiaohan Wang et.al. | 2408.12579 | null |
| 2024-08-22 | Jamba-1.5: Hybrid Transformer-Mamba Models at Scale | Jamba Team et.al. | 2408.12570 | link |
| 2024-08-22 | ssProp: Energy-Efficient Training for Convolutional Neural Networks with Scheduled Sparse Back Propagation | Lujia Zhong et.al. | 2408.12561 | link |
| 2024-08-22 | Towards Evaluating and Building Versatile Large Language Models for Medicine | Chaoyi Wu et.al. | 2408.12547 | link |
| 2024-08-22 | Show-o: One Single Transformer to Unify Multimodal Understanding and Generation | Jinheng Xie et.al. | 2408.12528 | link |
| 2024-08-22 | MEDCO: Medical Education Copilots Based on A Multi-Agent Framework | Hao Wei et.al. | 2408.12496 | null |
| 2024-08-22 | GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models | Kunsheng Tang et.al. | 2408.12494 | link |
| 2024-08-22 | Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese | Khang T. Doan et.al. | 2408.12480 | null |
| 2024-08-22 | Frame Order Matters: A Temporal Sequence-Aware Model for Few-Shot Action Recognition | Bozheng Li et.al. | 2408.12475 | null |
| 2024-08-21 | SEA: Supervised Embedding Alignment for Token-Level Visual-Textual Integration in MLLMs | Yuanyang Yin et.al. | 2408.11813 | null |
| 2024-08-21 | Story3D-Agent: Exploring 3D Storytelling Visualization with Large Language Models | Yuzhou Huang et.al. | 2408.11801 | null |
| 2024-08-21 | PermitQA: A Benchmark for Retrieval Augmented Generation in Wind Siting and Permitting domain | Rounak Meyur et.al. | 2408.11800 | null |
| 2024-08-21 | EE-MLLM: A Data-Efficient and Compute-Efficient Multimodal Large Language Model | Feipeng Ma et.al. | 2408.11795 | null |
| 2024-08-21 | Leveraging Chemistry Foundation Models to Facilitate Structure Focused Retrieval Augmented Generation in Multi-Agent Workflows for Catalyst and Materials Design | Nathaniel H. Park et.al. | 2408.11793 | null |
| 2024-08-21 | Critique-out-Loud Reward Models | Zachary Ankner et.al. | 2408.11791 | link |
| 2024-08-21 | DreamFactory: Pioneering Multi-Scene Long Video Generation with a Multi-Agent Framework | Zhifei Xie et.al. | 2408.11788 | null |
| 2024-08-21 | Personality Alignment of Large Language Models | Minjun Zhu et.al. | 2408.11779 | link |
| 2024-08-21 | Leveraging Fine-Tuned Retrieval-Augmented Generation with Long-Context Support: For 3GPP Standards | Omar Erak et.al. | 2408.11775 | link |
| 2024-08-21 | Against All Odds: Overcoming Typology, Script, and Language Confusion in Multilingual Embedding Inversion Attacks | Yiyi Chen et.al. | 2408.11749 | null |
| 2024-08-20 | Revisiting VerilogEval: Newer LLMs, In-Context Learning, and Specification-to-RTL Tasks | Nathaniel Pinckney et.al. | 2408.11053 | null |
| 2024-08-20 | FLAME: Learning to Navigate with Multimodal LLM in Urban Environments | Yunzhe Xu et.al. | 2408.11051 | link |
| 2024-08-20 | MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding | Jian Chen et.al. | 2408.11049 | link |
| 2024-08-20 | Reconciling Methodological Paradigms: Employing Large Language Models as Novice Qualitative Research Assistants in Talent Management Research | Sreyoshi Bhaduri et.al. | 2408.11043 | null |
| 2024-08-20 | Scaling Law with Learning Rate Annealing | Howe Tissue et.al. | 2408.11029 | null |
| 2024-08-20 | Athena: Safe Autonomous Agents with Verbal Contrastive Learning | Tanmana Sadhu et.al. | 2408.11021 | null |
| 2024-08-20 | While GitHub Copilot Excels at Coding, Does It Ensure Responsible Output? | Wen Cheng et.al. | 2408.11006 | link |
| 2024-08-20 | CTP-LLM: Clinical Trial Phase Transition Prediction Using Large Language Models | Michael Reinisch et.al. | 2408.10995 | null |
| 2024-08-20 | Dr.Academy: A Benchmark for Evaluating Questioning Capability in Education for Large Language Models | Yuyan Chen et.al. | 2408.10947 | null |
| 2024-08-20 | Large Language Model Driven Recommendation | Anton Korikov et.al. | 2408.10946 | null |
| 2024-08-19 | Demystifying the Communication Characteristics for Distributed Transformer Models | Quentin Anthony et.al. | 2408.10197 | null |
| 2024-08-19 | SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models | Anke Tang et.al. | 2408.10174 | link |
| 2024-08-19 | Customizing Language Models with Instance-wise LoRA for Sequential Recommendation | Xiaoyu Kong et.al. | 2408.10159 | null |
| 2024-08-19 | Multilingual Needle in a Haystack: Investigating Long-Context Behavior of Multilingual Large Language Models | Amey Hengle et.al. | 2408.10151 | null |
| 2024-08-19 | In-Context Learning with Representations: Contextual Generalization of Trained Transformers | Tong Yang et.al. | 2408.10147 | null |
| 2024-08-19 | Instruction Finetuning for Leaderboard Generation from Empirical AI Research | Salomon Kabongo et.al. | 2408.10141 | null |
| 2024-08-19 | Molecular Graph Representation Learning Integrating Large Language Models with Domain-specific Small Models | Tianyu Zhang et.al. | 2408.10124 | link |
| 2024-08-20 | PLUTUS: A Well Pre-trained Large Unified Transformer can Unveil Financial Time Series Regularities | Yuanjian Xu et.al. | 2408.10111 | null |
| 2024-08-19 | Recent Surge in Public Interest in Transportation: Sentiment Analysis of Baidu Apollo Go Using Weibo Data | Shiqi Wang et.al. | 2408.10088 | link |
| 2024-08-19 | ARMADA: Attribute-Based Multimodal Data Augmentation | Xiaomeng Jin et.al. | 2408.10086 | null |
| 2024-08-16 | PEDAL: Enhancing Greedy Decoding with Large Language Models using Diverse Exemplars | Sumanth Prabhu et.al. | 2408.08869 | null |
| 2024-08-16 | Visual Agents as Fast and Slow Thinkers | Guangyan Sun et.al. | 2408.08862 | null |
| 2024-08-16 | ECG-Chat: A Large ECG-Language Model for Cardiac Disease Diagnosis | Yubao Zhao et.al. | 2408.08849 | null |
| 2024-08-16 | PsychoLex: Unveiling the Psychological Mind of Large Language Models | Mohammad Amin Abbasi et.al. | 2408.08848 | null |
| 2024-08-16 | FLEXTAF: Enhancing Table Reasoning with Flexible Tabular Formats | Xuanliang Zhang et.al. | 2408.08841 | link |
| 2024-08-16 | Artificial Intelligence and Strategic Decision-Making: Evidence from Entrepreneurs and Investors | Felipe A. Csaszar et.al. | 2408.08811 | null |
| 2024-08-16 | Constructing Domain-Specific Evaluation Sets for LLM-as-a-judge | Ravi Raju et.al. | 2408.08808 | null |
| 2024-08-16 | EmoDynamiX: Emotional Support Dialogue Strategy Prediction by Modelling MiXed Emotions and Discourse Dynamics | Chenwei Wan et.al. | 2408.08782 | link |
| 2024-08-16 | Large Language Models Might Not Care What You Are Saying: Prompt Format Beats Descriptions | Chenming Tang et.al. | 2408.08780 | null |
| 2024-08-16 | DAC: Decomposed Automation Correction for Text-to-SQL | Dingzirui Wang et.al. | 2408.08779 | link |
| 2024-08-15 | Can Large Language Models Understand Symbolic Graphics Programs? | Zeju Qiu et.al. | 2408.08313 | null |
| 2024-08-15 | ScalingFilter: Assessing Data Quality through Inverse Utilization of Scaling Laws | Ruihang Li et.al. | 2408.08310 | null |
| 2024-08-15 | Benchmarking the Capabilities of Large Language Models in Transportation System Engineering: Accuracy, Consistency, and Reasoning Behaviors | Usman Syed et.al. | 2408.08302 | null |
| 2024-08-15 | HELP: Hierarchical Embeddings-based Log Parsing | Andy Xu et.al. | 2408.08300 | null |
| 2024-08-15 | The ShareLM Collection and Plugin: Contributing Human-Model Chats for the Benefit of the Community | Shachar Don-Yehiya et.al. | 2408.08291 | null |
| 2024-08-15 | Autonomous Behavior Planning For Humanoid Loco-manipulation Through Grounded Language Model | Jin Wang et.al. | 2408.08282 | null |
| 2024-08-15 | BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts | Qizhen Zhang et.al. | 2408.08274 | null |
| 2024-08-15 | DaRec: A Disentangled Alignment Framework for Large Language Model and Recommender System | Xihong Yang et.al. | 2408.08231 | null |
| 2024-08-15 | RED-CT: A Systems Design Methodology for Using LLM-labeled Data to Train and Deploy Edge Classifiers for Computational Social Science | David Farr et.al. | 2408.08217 | null |
| 2024-08-15 | Does Reasoning Emerge? Examining the Probabilities of Causation in Large Language Models | Javier González et.al. | 2408.08210 | null |
| 2024-08-14 | The Death of Schema Linking? Text-to-SQL in the Age of Well-Reasoned Language Models | Karime Maamari et.al. | 2408.07702 | null |
| 2024-08-15 | Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities | Enneng Yang et.al. | 2408.07666 | link |
| 2024-08-14 | Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models | Yi-Cheng Lin et.al. | 2408.07665 | null |
| 2024-08-14 | Alignment-Enhanced Decoding: Defending via Token-Level Adaptive Refining of Probability Distributions | Quan Liu et.al. | 2408.07663 | link |
| 2024-08-14 | WeKnow-RAG: An Adaptive Approach for Retrieval-Augmented Generation Integrating Web Search and Knowledge Graphs | Weijian Xie et.al. | 2408.07611 | null |
| 2024-08-14 | Transformers and Large Language Models for Efficient Intrusion Detection Systems: A Comprehensive Survey | Hamza Kheddar et.al. | 2408.07583 | null |
| 2024-08-15 | MathScape: Evaluating MLLMs in multimodal Math Scenarios through a Hierarchical Benchmark | Minxuan Zhou et.al. | 2408.07543 | null |
| 2024-08-14 | Usefulness of data flow diagrams and large language models for security threat validation: a registered report | Winnie Bahati Mbaka et.al. | 2408.07537 | null |
| 2024-08-14 | Development of a Multi-Agent Clinical Decision Support System for Korean Triage and Acuity Scale (KTAS)-Based Triage and Treatment Planning in Emergency Departments | Seungjun Han et.al. | 2408.07531 | null |
| 2024-08-14 | Large Language Models Know What Makes Exemplary Contexts | Quanyu Long et.al. | 2408.07505 | null |
| 2024-08-13 | Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents | Kexun Zhang et.al. | 2408.07060 | link |
| 2024-08-13 | LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs | Yushi Bai et.al. | 2408.07055 | link |
| 2024-08-13 | PathInsight: Instruction Tuning of Multimodal Datasets and Models for Intelligence Assisted Diagnosis in Histopathology | Xiaomin Wu et.al. | 2408.07037 | null |
| 2024-08-13 | Casper: Prompt Sanitization for Protecting User Privacy in Web-Based Large Language Models | Chun Jie Chong et.al. | 2408.07004 | null |
| 2024-08-13 | Generative AI for automatic topic labelling | Diego Kozlowski et.al. | 2408.07003 | null |
| 2024-08-13 | LLMs can Schedule | Henrik Abgaryan et.al. | 2408.06993 | link |
| 2024-08-13 | OpenResearcher: Unleashing AI for Accelerated Scientific Research | Yuxiang Zheng et.al. | 2408.06941 | link |
| 2024-08-13 | Evaluating Cultural Adaptability of a Large Language Model via Simulation of Synthetic Personas | Louis Kwok et.al. | 2408.06929 | null |
| 2024-08-13 | Re-TASK: Revisiting LLM Tasks from Capability, Skill, and Knowledge Perspectives | Zhihu Wang et.al. | 2408.06904 | null |
| 2024-08-13 | Leveraging Language Models for Emotion and Behavior Analysis in Education | Kaito Tanaka et.al. | 2408.06874 | null |
| 2024-08-12 | Animate, or Inanimate, That is the Question for Large Language Models | Leonardo Ranaldi et.al. | 2408.06332 | null |
| 2024-08-12 | Can We Rely on LLM Agents to Draft Long-Horizon Plans? Let’s Take TravelPlanner as an Example | Yanan Chen et.al. | 2408.06318 | null |
| 2024-08-12 | Long-Form Answers to Visual Questions from Blind and Low Vision People | Mina Huh et.al. | 2408.06303 | null |
| 2024-08-12 | The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery | Chris Lu et.al. | 2408.06292 | link |
| 2024-08-12 | MovieSum: An Abstractive Summarization Dataset for Movie Screenplays | Rohit Saxena et.al. | 2408.06281 | link |
| 2024-08-12 | Review-driven Personalized Preference Reasoning with Large Language Models for Recommendation | Jieyong Kim et.al. | 2408.06276 | null |
| 2024-08-12 | FuxiTranyu: A Multilingual Large Language Model Trained with Balanced Data | Haoran Sun et.al. | 2408.06273 | link |
| 2024-08-12 | A RAG-Based Question-Answering Solution for Cyber-Attack Investigation and Attribution | Sampath Rajapaksha et.al. | 2408.06272 | null |
| 2024-08-12 | Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment | Karel D’Oosterlinck et.al. | 2408.06266 | link |
| 2024-08-12 | On Effects of Steering Latent Representation for Large Language Model Unlearning | Dang Huu-Tien et.al. | 2408.06223 | null |
| 2024-08-10 | Preserving Privacy in Large Language Models: A Survey on Current Threats and Solutions | Michele Miranda et.al. | 2408.05212 | link |
| 2024-08-09 | VITA: Towards Open-Source Interactive Omni Multimodal LLM | Chaoyou Fu et.al. | 2408.05211 | null |
| 2024-08-09 | Evaluating the capability of large language models to personalize science texts for diverse middle-school-age learners | Michael Vaccaro Jr et.al. | 2408.05204 | null |
| 2024-08-09 | TaSL: Task Skill Localization and Consolidation for Language Model Continual Learning | Yujie Feng et.al. | 2408.05200 | null |
| 2024-08-09 | AttackER: Towards Enhancing Cyber-Attack Attribution with a Named Entity Recognition Dataset | Pritam Deka et.al. | 2408.05149 | null |
| 2024-08-09 | A Hybrid RAG System with Comprehensive Enhancement on Complex Reasoning | Ye Yuan et.al. | 2408.05141 | null |
| 2024-08-09 | Is ChatGPT a Good Software Librarian? An Exploratory Study on the Use of ChatGPT for Software Library Recommendations | Jasmine Latendresse et.al. | 2408.05128 | null |
| 2024-08-09 | Large Language Models and Thematic Analysis: Human-AI Synergy in Researching Hate Speech on Social Media | Petre Breazu et.al. | 2408.05126 | null |
| 2024-08-09 | Sportify: Question Answering with Embedded Visualizations and Personified Narratives for Sports Video | Chunggi Lee et.al. | 2408.05123 | null |
| 2024-08-09 | A Survey of NL2SQL with Large Language Models: Where are we, and where are we going? | Xinyu Liu et.al. | 2408.05109 | link |
| 2024-08-08 | Transformer Explainer: Interactive Learning of Text-Generative Models | Aeree Cho et.al. | 2408.04619 | link |
| 2024-08-08 | Better Alignment with Instruction Back-and-Forth Translation | Thao Nguyen et.al. | 2408.04614 | null |
| 2024-08-08 | Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models | Qirui Jiao et.al. | 2408.04594 | link |
| 2024-08-08 | Towards Resilient and Efficient LLMs: A Comparative Study of Efficiency, Performance, and Adversarial Robustness | Xiaojing Fan et.al. | 2408.04585 | null |
| 2024-08-08 | SCENE: Evaluating Explainable AI Techniques Using Soft Counterfactuals | Haoran Zheng et.al. | 2408.04575 | null |
| 2024-08-08 | Learning Fine-Grained Grounded Citations for Attributed Large Language Models | Lei Huang et.al. | 2408.04568 | link |
| 2024-08-08 | Bias-Aware Low-Rank Adaptation: Mitigating Catastrophic Inheritance of Large Language Models | Yupeng Chang et.al. | 2408.04556 | link |
| 2024-08-08 | Compromesso! Italian Many-Shot Jailbreaks Undermine the Safety of Large Language Models | Fabio Pernisi et.al. | 2408.04522 | null |
| 2024-08-08 | What You Need is What You Get: Theory of Mind for an LLM-Based Code Understanding Assistant | Jonan Richards et.al. | 2408.04477 | null |
| 2024-08-08 | Can LLMs Beat Humans in Debating? A Dynamic Multi-agent Framework for Competitive Debate | Yiqun Zhang et.al. | 2408.04472 | link |
| 2024-08-07 | How Well Can Vision Language Models See Image Details? | Chenhui Gou et.al. | 2408.03940 | null |
| 2024-08-07 | SLIM-RAFT: A Novel Fine-Tuning Approach to Improve Cross-Linguistic Performance for Mercosur Common Nomenclature | Vinícius Di Oliveira et.al. | 2408.03936 | null |
| 2024-08-07 | CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases | Xiangyan Liu et.al. | 2408.03910 | link |
| 2024-08-07 | Decoding Biases: Automated Methods and LLM Judges for Gender Bias Detection in Language Models | Shachi H Kumar et.al. | 2408.03907 | null |
| 2024-08-07 | From Data to Story: Towards Automatic Animated Data Video Creation with LLM-based Multi-Agent Systems | Leixian Shen et.al. | 2408.03876 | null |
| 2024-08-07 | PackMamba: Efficient Processing of Variable-Length Sequences in Mamba training | Haoran Xu et.al. | 2408.03865 | null |
| 2024-08-07 | GAIA – A Large Language Model for Advanced Power Dispatch | Yuheng Cheng et.al. | 2408.03847 | null |
| 2024-08-07 | MaxMind: A Memory Loop Network to Enhance Software Productivity based on Large Language Models | Yuchen Dong et.al. | 2408.03841 | null |
| 2024-08-07 | WalledEval: A Comprehensive Safety Evaluation Toolkit for Large Language Models | Prannaya Gupta et.al. | 2408.03837 | link |
| 2024-08-07 | Target Prompting for Information Extraction with Vision Language Model | Dipankar Medhi et.al. | 2408.03834 | null |
| 2024-08-06 | Pre-training and in-context learning IS Bayesian inference a la De Finetti | Naimeng Ye et.al. | 2408.03307 | null |
| 2024-08-06 | TextIM: Part-aware Interactive Motion Synthesis from Text | Siyuan Fan et.al. | 2408.03302 | null |
| 2024-08-06 | KaPO: Knowledge-aware Preference Optimization for Controllable Knowledge Selection in Retrieval-Augmented Language Models | Ruizhe Zhang et.al. | 2408.03297 | null |
| 2024-08-06 | AMES: Asymmetric and Memory-Efficient Similarity Estimation for Instance-level Retrieval | Pavel Suma et.al. | 2408.03282 | null |
| 2024-08-07 | StructEval: Deepen and Broaden Large Language Model Assessment via Structured Evaluation | Boxi Cao et.al. | 2408.03281 | link |
| 2024-08-06 | Synthesizing Text-to-SQL Data from Weak and Strong LLMs | Jiaxi Yang et.al. | 2408.03256 | null |
| 2024-08-06 | Unveiling Factual Recall Behaviors of Large Language Models through Knowledge Neurons | Yifei Wang et.al. | 2408.03247 | link |
| 2024-08-06 | Leveraging Parameter Efficient Training Methods for Low Resource Text Classification: A Case Study in Marathi | Pranita Deshmukh et.al. | 2408.03172 | null |
| 2024-08-06 | Conditioning LLMs with Emotion in Neural Machine Translation | Charles Brazier et.al. | 2408.03150 | null |
| 2024-08-06 | Inference Optimizations for Large Language Models: Effects, Challenges, and Practical Considerations | Leo Donisch et.al. | 2408.03130 | null |
| 2024-08-05 | Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining | Dongyang Liu et.al. | 2408.02657 | link |
| 2024-08-05 | Can Reinforcement Learning Unlock the Hidden Dangers in Aligned Large Language Models? | Mohammad Bahrami Karkevandi et.al. | 2408.02651 | null |
| 2024-08-05 | SEAS: Self-Evolving Adversarial Safety Optimization for Large Language Models | Muxi Diao et.al. | 2408.02632 | null |
| 2024-08-05 | Language Model Can Listen While Speaking | Ziyang Ma et.al. | 2408.02622 | null |
| 2024-08-05 | Progressively Selective Label Enhancement for Language Model Alignment | Biao Liu et.al. | 2408.02599 | null |
| 2024-08-05 | Modelling Visual Semantics via Image Captioning to extract Enhanced Multi-Level Cross-Modal Semantic Incongruity Representation with Attention for Multimodal Sarcasm Detection | Sajal Aggarwal et.al. | 2408.02595 | null |
| 2024-08-05 | Leveraging the Power of LLMs: A Fine-Tuning Approach for High-Quality Aspect-Based Summarization | Ankan Mullick et.al. | 2408.02584 | null |
| 2024-08-05 | Evaluating and Enhancing LLMs Agent based on Theory of Mind in Guandan: A Multi-Player Cooperative Game under Imperfect Information | Yauwai Yim et.al. | 2408.02559 | null |
| 2024-08-05 | Generative AI as a Service in 6G Edge-Cloud: Generation Task Offloading by In-context Learning | Hao Zhou et.al. | 2408.02549 | null |
| 2024-08-05 | RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation | Daniel Fleischer et.al. | 2408.02545 | link |
| 2024-08-02 | Prompt Recursive Search: A Living Framework with Adaptive Growth in LLM Auto-Prompting | Xiangyu Zhao et.al. | 2408.01423 | null |
| 2024-08-02 | Mission Impossible: A Statistical Perspective on Jailbreaking LLMs | Jingtong Su et.al. | 2408.01420 | null |
| 2024-08-02 | DebateQA: Evaluating Question Answering on Debatable Knowledge | Rongwu Xu et.al. | 2408.01419 | null |
| 2024-08-02 | Talk Less, Interact Better: Evaluating In-context Conversational Adaptation in Multimodal LLMs | Yilun Hua et.al. | 2408.01417 | null |
| 2024-08-02 | Coalitions of Large Language Models Increase the Robustness of AI Agents | Prattyush Mangal et.al. | 2408.01380 | null |
| 2024-08-02 | Toward Automatic Relevance Judgment using Vision–Language Models for Image–Text Retrieval Evaluation | Jheng-Hong Yang et.al. | 2408.01363 | null |
| 2024-08-02 | Hallu-PI: Evaluating Hallucination in Multi-modal Large Language Models within Perturbed Inputs | Peng Ding et.al. | 2408.01355 | null |
| 2024-08-02 | MCGMark: An Encodable and Robust Online Watermark for LLM-Generated Malicious Code | Kaiwen Ning et.al. | 2408.01354 | null |
| 2024-08-02 | Prompt Refinement or Fine-tuning? Best Practices for using LLMs in Computational Social Science Tasks | Anders Giovanni Møller et.al. | 2408.01346 | null |
| 2024-08-02 | A Backbone for Long-Horizon Robot Task Understanding | Xiaoshuai Chen et.al. | 2408.01334 | null |
| 2024-08-01 | AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation | Mengkang Hu et.al. | 2408.00764 | link |
| 2024-08-01 | Tamper-Resistant Safeguards for Open-Weight LLMs | Rishub Tamirisa et.al. | 2408.00761 | null |
| 2024-08-01 | DynamoLLM: Designing LLM Inference Clusters for Performance and Energy Efficiency | Jovan Stojkovic et.al. | 2408.00741 | null |
| 2024-08-01 | Improving Retrieval-Augmented Generation in Medicine with Iterative Follow-up Questions | Guangzhi Xiong et.al. | 2408.00727 | null |
| 2024-08-01 | An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models | Yangzhen Wu et.al. | 2408.00724 | link |
| 2024-08-01 | Pathway to Secure and Trustworthy 6G for LLMs: Attacks, Defense, and Opportunities | Sunder Ali Khowaja et.al. | 2408.00722 | null |
| 2024-08-01 | Improving Text Embeddings for Smaller Language Models Using Contrastive Fine-tuning | Trapoom Ukarapol et.al. | 2408.00690 | link |
| 2024-08-01 | Can Developers Prompt? A Controlled Experiment for Code Documentation Generation | Hans-Alexander Kruse et.al. | 2408.00686 | null |
| 2024-08-01 | AutoM3L: An Automated Multimodal Machine Learning Framework with Large Language Models | Daqin Luo et.al. | 2408.00665 | null |
| 2024-08-01 | Disentangling Dense Embeddings with Sparse Autoencoders | Charles O’Neill et.al. | 2408.00657 | null |
| 2024-07-31 | Vision-Language Model Based Handwriting Verification | Mihir Chauhan et.al. | 2407.21788 | null |
| 2024-07-31 | Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs | Shi Liu et.al. | 2407.21771 | null |
| 2024-07-31 | ReplanVLM: Replanning Robotic Tasks with Visual Language Models | Aoran Mei et.al. | 2407.21762 | null |
| 2024-07-31 | Adaptive Retrieval-Augmented Generation for Conversational Systems | Xi Wang et.al. | 2407.21712 | null |
| 2024-07-31 | CEAR: Automatic construction of a knowledge graph of chemical entities and roles from scientific literature | Stefan Langer et.al. | 2407.21708 | null |
| 2024-07-31 | TransferTOD: A Generalizable Chinese Multi-Domain Task-Oriented Dialogue System with Transfer Capabilities | Ming Zhang et.al. | 2407.21693 | null |
| 2024-07-31 | Synth-Empathy: Towards High-Quality Synthetic Empathy Data | Hao Liang et.al. | 2407.21669 | link |
| 2024-07-31 | LLM-for-X: Application-agnostic Integration of Large Language Models to Support Personal Writing Workflows | Lukas Teufelberger et.al. | 2407.21593 | null |
| 2024-07-31 | A Performance Study of LLM-Generated Code on Leetcode | Tristan Coignion et.al. | 2407.21579 | null |
| 2024-07-31 | PMoE: Progressive Mixture of Experts with Asymmetric Transformer for Continual Learning | Min Jae Jung et.al. | 2407.21571 | null |
| 2024-07-30 | ThinK: Thinner Key Cache by Query-Driven Pruning | Yuhui Xu et.al. | 2407.21018 | link |
| 2024-07-30 | CLEFT: Language-Image Contrastive Learning with Efficient Large Language Model and Prompt Fine-Tuning | Yuexi Du et.al. | 2407.21011 | link |
| 2024-07-30 | The Dual-Edged Sword of Technical Debt: Benefits and Issues Analyzed Through Developer Discussions | Xiaozhou Li et.al. | 2407.21007 | null |
| 2024-07-30 | MoFO: Momentum-Filtered Optimizer for Mitigating Forgetting in LLM Fine-Tuning | Yupeng Chen et.al. | 2407.20999 | null |
| 2024-07-30 | From Feature Importance to Natural Language Explanations Using LLMs with RAG | Sule Tekkesinoglu et.al. | 2407.20990 | null |
| 2024-07-30 | Large Language Models (LLMs) for Semantic Communication in Edge-based IoT Networks | Alakesh Kalita et.al. | 2407.20970 | null |
| 2024-07-30 | Automated Review Generation Method Based on Large Language Models | Shican Wu et.al. | 2407.20906 | link |
| 2024-07-30 | ThinkRepair: Self-Directed Automated Program Repair | Xin Yin et.al. | 2407.20898 | link |
| 2024-07-30 | Effective Black Box Testing of Sentiment Analysis Classification Networks | Parsa Karbasizadeh et.al. | 2407.20884 | null |
| 2024-07-30 | Breaking Agents: Compromising Autonomous LLM Agents Through Malfunction Amplification | Boyang Zhang et.al. | 2407.20859 | null |
| 2024-07-29 | Specify and Edit: Overcoming Ambiguity in Text-Based Image Editing | Ekaterina Iakovleva et.al. | 2407.20232 | null |
| 2024-07-29 | Can Editing LLMs Inject Harm? | Canyu Chen et.al. | 2407.20224 | link |
| 2024-07-29 | QAEA-DR: A Unified Text Augmentation Framework for Dense Retrieval | Hongming Tan et.al. | 2407.20207 | null |
| 2024-07-29 | MindSearch: Mimicking Human Minds Elicits Deep AI Searcher | Zehui Chen et.al. | 2407.20183 | link |
| 2024-07-29 | Advancing Multimodal Large Language Models in Chart Question Answering with Visualization-Referenced Instruction Tuning | Xingchen Zeng et.al. | 2407.20174 | link |
| 2024-07-29 | Diffusion Feedback Helps CLIP See Better | Wenxuan Wang et.al. | 2407.20171 | link |
| 2024-07-29 | Language-Conditioned Offline RL for Multi-Robot Navigation | Steven Morad et.al. | 2407.20164 | null |
| 2024-07-29 | rLLM: Relational Table Learning with LLMs | Weichen Li et.al. | 2407.20157 | link |
| 2024-07-29 | ByteCheckpoint: A Unified Checkpointing System for LLM Development | Borui Wan et.al. | 2407.20143 | null |
| 2024-07-29 | Orca: Ocean Significant Wave Height Estimation with Spatio-temporally Aware Large Language Models | Zhe Li et.al. | 2407.20053 | null |
| 2024-07-26 | Small Molecule Optimization with Large Language Models | Philipp Guevorguian et.al. | 2407.18897 | link |
| 2024-07-26 | Human-artificial intelligence teaming for scientific information extraction from data-driven additive manufacturing research using large language models | Mutahar Safdar et.al. | 2407.18827 | null |
| 2024-07-26 | Automatic Detection of Moral Values in Music Lyrics | Vjosa Preniqi et.al. | 2407.18787 | link |
| 2024-07-26 | The power of Prompts: Evaluating and Mitigating Gender Bias in MT with LLMs | Aleix Sant et.al. | 2407.18786 | null |
| 2024-07-26 | TAGIFY: LLM-powered Tagging Interface for Improved Data Findability on OGD portals | Kevin Kliimask et.al. | 2407.18764 | null |
| 2024-07-26 | Knowledge Graph Structure as Prompt: Improving Small Language Models Capabilities for Knowledge-based Causal Discovery | Yuni Susanti et.al. | 2407.18752 | link |
| 2024-07-26 | Towards Effective and Efficient Continual Pre-training of Large Language Models | Jie Chen et.al. | 2407.18743 | link |
| 2024-07-26 | Towards Generalized Offensive Language Identification | Alphaeus Dmonte et.al. | 2407.18738 | null |
| 2024-07-26 | LLASP: Fine-tuning Large Language Models for Answer Set Programming | Erica Coppolillo et.al. | 2407.18723 | null |
| 2024-07-26 | Neurosymbolic AI for Enhancing Instructability in Generative AI | Amit Sheth et.al. | 2407.18722 | null |
| 2024-07-25 | Recursive Introspection: Teaching Language Model Agents How to Self-Improve | Yuxiao Qu et.al. | 2407.18219 | null |
| 2024-07-25 | Exploring Scaling Trends in LLM Robustness | Nikolaus Howe et.al. | 2407.18213 | null |
| 2024-07-25 | Unlocking Tokens as Data Points for Generalization Bounds on Larger Language Models | Sanae Lotfi et.al. | 2407.18158 | null |
| 2024-07-25 | Dallah: A Dialect-Aware Multimodal Large Language Model for Arabic | Fakhraddin Alwajih et.al. | 2407.18129 | null |
| 2024-07-25 | Fine-Tuning Large Language Models for Stock Return Prediction Using Newsflow | Tian Guo et.al. | 2407.18103 | null |
| 2024-07-25 | PEFT-U: Parameter-Efficient Fine-Tuning for User Personalization | Christopher Clarke et.al. | 2407.18078 | link |
| 2024-07-25 | C2P: Featuring Large Language Models with Causal Reasoning | Abdolmahdi Bagheri et.al. | 2407.18069 | null |
| 2024-07-25 | ComPeer: A Generative Conversational Agent for Proactive Peer Support | Tianjian Liu et.al. | 2407.18064 | null |
| 2024-07-25 | Audio Entailment: Assessing Deductive Reasoning for Audio Understanding | Soham Deshmukh et.al. | 2407.18062 | link |
| 2024-07-25 | Difficulty Estimation and Simplification of French Text Using LLMs | Henri Jamet et.al. | 2407.18061 | null |
| 2024-07-24 | I Could’ve Asked That: Reformulating Unanswerable Questions | Wenting Zhao et.al. | 2407.17469 | link |
| 2024-07-24 | WildHallucinations: Evaluating Long-form Factuality in LLMs with Real-World Entity Queries | Wenting Zhao et.al. | 2407.17468 | null |
| 2024-07-24 | CMR Scaling Law: Predicting Critical Mixture Ratios for Continual Pre-training of Language Models | Jiawei Gu et.al. | 2407.17467 | null |
| 2024-07-24 | VILA$^2$: VILA Augmented VILA | Yunhao Fang et.al. | 2407.17453 | null |
| 2024-07-24 | Generative AI in Evidence-Based Software Engineering: A White Paper | Matteo Esposito et.al. | 2407.17440 | null |
| 2024-07-24 | Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data? | Michael-Andrei Panaitescu-Liess et.al. | 2407.17417 | null |
| 2024-07-24 | (PASS) Visual Prompt Locates Good Structure Sparsity through a Recurrent HyperNetwork | Tianjin Huang et.al. | 2407.17412 | null |
| 2024-07-24 | Grammar-based Game Description Generation using Large Language Models | Tsunehiko Tanaka et.al. | 2407.17404 | null |
| 2024-07-24 | 3D Question Answering for City Scene Understanding | Penglei Sun et.al. | 2407.17398 | null |
| 2024-07-24 | ViPer: Visual Personalization of Generative Models via Individual Preference Learning | Sogand Salehi et.al. | 2407.17365 | null |
| 2024-07-23 | Can Large Language Models Automatically Jailbreak GPT-4V? | Yuanwei Wu et.al. | 2407.16686 | null |
| 2024-07-23 | RedAgent: Red Teaming Large Language Models with Context-aware Autonomous Language Agent | Huiyu Xu et.al. | 2407.16667 | null |
| 2024-07-23 | Course-Correction: Safety Alignment Using Synthetic Preferences | Rongwu Xu et.al. | 2407.16637 | link |
| 2024-07-23 | Lawma: The Power of Specialization for Legal Tasks | Ricardo Dominguez-Olmedo et.al. | 2407.16615 | null |
| 2024-07-23 | Shared Imagination: LLMs Hallucinate Alike | Yilun Zhou et.al. | 2407.16604 | null |
| 2024-07-23 | Exploring Automatic Cryptographic API Misuse Detection in the Era of LLMs | Yifan Xia et.al. | 2407.16576 | null |
| 2024-07-23 | Retrieve, Generate, Evaluate: A Case Study for Medical Paraphrases Generation with Small Language Models | Ioana Buhnila et.al. | 2407.16565 | null |
| 2024-07-23 | Patched RTC: evaluating LLMs for diverse software development tasks | Asankhaya Sharma et.al. | 2407.16557 | link |
| 2024-07-24 | MicroEmo: Time-Sensitive Multimodal Emotion Recognition with Micro-Expression Dynamics in Video Dialogues | Liyun Zhang et.al. | 2407.16552 | null |
| 2024-07-23 | Imperfect Vision Encoders: Efficient and Robust Tuning for Vision-Language Models | Aristeidis Panos et.al. | 2407.16526 | null |
| 2024-07-22 | AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description | Junyu Xie et.al. | 2407.15850 | link |
| 2024-07-22 | LLMmap: Fingerprinting For Large Language Models | Dario Pasquini et.al. | 2407.15847 | null |
| 2024-07-22 | SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models | Mingze Xu et.al. | 2407.15841 | link |
| 2024-07-22 | MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity | Yangzhou Liu et.al. | 2407.15838 | link |
| 2024-07-22 | dMel: Speech Tokenization made Simple | He Bai et.al. | 2407.15835 | link |
| 2024-07-22 | Accelerating Pre-training of Multimodal LLMs via Chain-of-Sight | Ziyuan Huang et.al. | 2407.15819 | null |
| 2024-07-22 | Extracting Structured Insights from Financial News: An Augmented LLM Driven Approach | Rian Dolphin et.al. | 2407.15788 | null |
| 2024-07-22 | MoRSE: Bridging the Gap in Cybersecurity Expertise with Retrieval Augmented Generation | Marco Simoni et.al. | 2407.15748 | null |
| 2024-07-22 | OMoS-QA: A Dataset for Cross-Lingual Extractive Question Answering in a German Migration Context | Steffen Kleinle et.al. | 2407.15736 | null |
| 2024-07-22 | TaskGen: A Task-Based, Memory-Infused Agentic Framework using StrictJSON | John Chong Min Tan et.al. | 2407.15734 | link |
| 2024-07-19 | Internal Consistency and Self-Feedback in Large Language Models: A Survey | Xun Liang et.al. | 2407.14507 | link |
| 2024-07-19 | On Pre-training of Multimodal Language Models Customized for Chart Understanding | Wan-Cyuan Fan et.al. | 2407.14506 | null |
| 2024-07-19 | Evaluating the Reliability of Self-Explanations in Large Language Models | Korbinian Randl et.al. | 2407.14487 | link |
| 2024-07-19 | Contrastive Learning with Counterfactual Explanations for Radiology Report Generation | Mingjie Li et.al. | 2407.14474 | null |
| 2024-07-19 | Check-Eval: A Checklist-based Approach for Evaluating Text Quality | Jayr Pereira et.al. | 2407.14467 | null |
| 2024-07-19 | Undermining Mental Proof: How AI Can Make Cooperation Harder by Making Thinking Easier | Zachary Wojtowicz et.al. | 2407.14452 | null |
| 2024-07-19 | From Instruction to Insight: Exploring the Functional and Semantic Roles of Text in Interactive Dashboards | Nicole Sultanum et.al. | 2407.14451 | null |
| 2024-07-19 | Token-level Correlation-guided Compression for Efficient Multimodal Document Understanding | Renshan Zhang et.al. | 2407.14439 | link |
| 2024-07-19 | The Vision of Autonomic Computing: Can LLMs Make It a Reality? | Zhiyang Zhang et.al. | 2407.14402 | null |
| 2024-07-19 | Open Artificial Knowledge | Vadim Borisov et.al. | 2407.14371 | null |
| 2024-07-18 | Visual Haystacks: Answering Harder Questions About Sets of Images | Tsung-Han Wu et.al. | 2407.13766 | link |
| 2024-07-18 | SegPoint: Segment Any Point Cloud via Large Language Model | Shuting He et.al. | 2407.13761 | null |
| 2024-07-18 | Black-Box Opinion Manipulation Attacks to Retrieval-Augmented Generation of Large Language Models | Zhuo Chen et.al. | 2407.13757 | null |
| 2024-07-18 | CellularLint: A Systematic Approach to Identify Inconsistent Behavior in Cellular Network Specifications | Mirza Masfiqur Rahman et.al. | 2407.13742 | null |
| 2024-07-18 | Baba Is AI: Break the Rules to Beat the Benchmark | Nathan Cloos et.al. | 2407.13729 | null |
| 2024-07-18 | CoDefeater: Using LLMs To Find Defeaters in Assurance Cases | Usman Gohar et.al. | 2407.13717 | null |
| 2024-07-18 | Understanding Reference Policies in Direct Preference Optimization | Yixin Liu et.al. | 2407.13709 | link |
| 2024-07-18 | A Comprehensive Review of Recommender Systems: Transitioning from Theory to Practice | Shaina Raza et.al. | 2407.13699 | null |
| 2024-07-18 | Prover-Verifier Games improve legibility of LLM outputs | Jan Hendrik Kirchner et.al. | 2407.13692 | link |
| 2024-07-18 | COMCAT: Leveraging Human Judgment to Improve Automatic Documentation and Summarization | Skyler Grandel et.al. | 2407.13648 | null |
| 2024-07-17 | LookupViT: Compressing visual information to a limited number of tokens | Rajat Koner et.al. | 2407.12753 | null |
| 2024-07-17 | EchoSight: Advancing Visual-Language Models with Wiki Knowledge | Yibin Yan et.al. | 2407.12735 | null |
| 2024-07-17 | NL2Contact: Natural Language Guided 3D Hand-Object Contact Modeling with Diffusion Model | Zhongqun Zhang et.al. | 2407.12727 | null |
| 2024-07-17 | Is Sarcasm Detection A Step-by-Step Reasoning Process in Large Language Models? | Ben Yao et.al. | 2407.12725 | null |
| 2024-07-17 | The Future of Learning: Large Language Models through the Lens of Students | He Zhang et.al. | 2407.12723 | null |
| 2024-07-17 | MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models | Leyang Shen et.al. | 2407.12709 | link |
| 2024-07-17 | Patch-Level Training for Large Language Models | Chenze Shao et.al. | 2407.12665 | link |
| 2024-07-17 | Zero-shot Text-guided Infinite Image Synthesis with LLM guidance | Soyeong Kwon et.al. | 2407.12642 | null |
| 2024-07-17 | Harnessing the Power of Artificial Intelligence to Vitalize Endangered Indigenous Languages: Technologies and Experiences | Claudio Pinhanez et.al. | 2407.12620 | null |
| 2024-07-17 | AudienceView: AI-Assisted Interpretation of Audience Feedback in Journalism | William Brannon et.al. | 2407.12613 | link |
| 2024-07-16 | UrbanWorld: An Urban World Model for 3D City Generation | Yu Shang et.al. | 2407.11965 | null |
| 2024-07-16 | NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window? | Mo Li et.al. | 2407.11963 | link |
| 2024-07-16 | Code Documentation and Analysis to Secure Software Development | Paul Attie et.al. | 2407.11934 | null |
| 2024-07-16 | What’s Wrong? Refining Meeting Summaries with LLM Feedback | Frederic Kirstein et.al. | 2407.11919 | null |
| 2024-07-16 | Ascend-CC: Confidential Computing on Heterogeneous NPU for Emerging Generative AI Workloads | Aritra Dhar et.al. | 2407.11888 | null |
| 2024-07-16 | Schema Matching with Large Language Models: an Experimental Study | Marcel Parciak et.al. | 2407.11852 | link |
| 2024-07-16 | LoFTI: Localization and Factuality Transfer to Indian Locales | Sona Elza Simon et.al. | 2407.11833 | link |
| 2024-07-16 | GPT Assisted Annotation of Rhetorical and Linguistic Features for Interpretable Propaganda Technique Detection in News Text | Kyle Hamilton et.al. | 2407.11827 | null |
| 2024-07-16 | PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation | Branden Butler et.al. | 2407.11798 | null |
| 2024-07-16 | Large Language Models as Misleading Assistants in Conversation | Betty Li Hou et.al. | 2407.11789 | null |
| 2024-07-15 | VGBench: Evaluating Large Language Models on Vector Graphics Understanding and Generation | Bocheng Zou et.al. | 2407.10972 | link |
| 2024-07-15 | Q-Sparse: All Large Language Models can be Fully Sparsely-Activated | Hongyu Wang et.al. | 2407.10969 | null |
| 2024-07-15 | No Train, all Gain: Self-Supervised Gradients Improve Deep Frozen Representations | Walter Simoncini et.al. | 2407.10964 | link |
| 2024-07-15 | Fast Matrix Multiplications for Lookup Table-Quantized LLMs | Han Guo et.al. | 2407.10960 | link |
| 2024-07-15 | MMM: Multilingual Mutual Reinforcement Effect Mix Datasets & Test with Open-domain Information Extraction Large Language Models | Chengguang Gan et.al. | 2407.10953 | null |
| 2024-07-15 | Can Textual Semantics Mitigate Sounding Object Segmentation Preference? | Yaoting Wang et.al. | 2407.10947 | link |
| 2024-07-15 | GRUtopia: Dream General Robots in a City at Scale | Hanqing Wang et.al. | 2407.10943 | link |
| 2024-07-15 | Benchmarking Vision Language Models for Cultural Understanding | Shravan Nayak et.al. | 2407.10920 | null |
| 2024-07-15 | FinDKG: Dynamic Knowledge Graphs with Large Language Models for Detecting Global Trends in Financial Markets | Xiaohui Victor Li et.al. | 2407.10909 | link |
| 2024-07-15 | Hey, That’s My Model! Introducing Chain & Hash, An LLM Fingerprinting Technique | Mark Russinovich et.al. | 2407.10887 | null |
| 2024-07-12 | FairyLandAI: Personalized Fairy Tales utilizing ChatGPT and DALLE-3 | Georgios Makridis et.al. | 2407.09467 | null |
| 2024-07-12 | Human-like Episodic Memory for Infinite Context LLMs | Zafeirios Fountas et.al. | 2407.09450 | link |
| 2024-07-12 | ASTPrompter: Weakly Supervised Automated Language Model Red-Teaming to Identify Likely Toxic Prompts | Amelia F. Hardy et.al. | 2407.09447 | null |
| 2024-07-12 | MUSCLE: A Model Update Strategy for Compatible LLM Evolution | Jessica Echterhoff et.al. | 2407.09435 | null |
| 2024-07-12 | Open (Clinical) LLMs are Sensitive to Instruction Phrasings | Alberto Mario Ceballos Arroyo et.al. | 2407.09429 | null |
| 2024-07-12 | TelecomGPT: A Framework to Build Telecom-Specific Large Language Models | Hang Zou et.al. | 2407.09424 | null |
| 2024-07-12 | Mitigating Entity-Level Hallucination in Large Language Models | Weihang Su et.al. | 2407.09417 | link |
| 2024-07-12 | SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers | Shraman Pramanick et.al. | 2407.09413 | link |
| 2024-07-12 | PersonaRAG: Enhancing Retrieval-Augmented Generation Systems with User-Centric Agents | Saber Zerhoudi et.al. | 2407.09394 | link |
| 2024-07-12 | GAVEL: Generating Games Via Evolution and Language Models | Graham Todd et.al. | 2407.09388 | link |
| 2024-07-11 | MAVIS: Mathematical Visual Instruction Tuning | Renrui Zhang et.al. | 2407.08739 | link |
| 2024-07-11 | Real-Time Anomaly Detection and Reactive Planning with Large Language Models | Rohan Sinha et.al. | 2407.08735 | null |
| 2024-07-11 | Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist | Zihao Zhou et.al. | 2407.08733 | null |
| 2024-07-11 | A Taxonomy for Data Contamination in Large Language Models | Medha Palavalli et.al. | 2407.08716 | null |
| 2024-07-11 | GTA: A Benchmark for General Tool Agents | Jize Wang et.al. | 2407.08713 | link |
| 2024-07-11 | Extracting Training Data from Document-Based VQA Models | Francesco Pinto et.al. | 2407.08707 | null |
| 2024-07-11 | Live2Diff: Live Stream Translation via Uni-directional Attention in Video Diffusion Models | Zhening Xing et.al. | 2407.08701 | null |
| 2024-07-11 | Mitigating Catastrophic Forgetting in Language Transfer via Model Merging | Anton Alexandrov et.al. | 2407.08699 | null |
| 2024-07-11 | Cloud Atlas: Efficient Fault Localization for Cloud Systems using Language Models and Causal Insight | Zhiqiang Xie et.al. | 2407.08694 | null |
| 2024-07-11 | SEED-Story: Multimodal Long Story Generation with Large Language Model | Shuai Yang et.al. | 2407.08683 | link |
| 2024-07-10 | Training on the Test Task Confounds Evaluation and Emergence | Ricardo Dominguez-Olmedo et.al. | 2407.07890 | link |
| 2024-07-10 | Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization | Junkang Wu et.al. | 2407.07880 | link |
| 2024-07-10 | FACTS About Building Retrieval Augmented Generation-based Chatbots | Rama Akkiraju et.al. | 2407.07858 | null |
| 2024-07-10 | OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training | Sami Jaghouar et.al. | 2407.07852 | link |
| 2024-07-10 | Natural Language Mechanisms via Self-Resolution with Foundation Models | Nicolas Della Penna et.al. | 2407.07845 | null |
| 2024-07-10 | Transformer Alignment in Large Language Models | Murdock Aubry et.al. | 2407.07810 | null |
| 2024-07-10 | Attribute or Abstain: Large Language Models as Long Document Assistants | Jan Buchmann et.al. | 2407.07799 | link |
| 2024-07-11 | Evaluating Large Language Models with Grid-Based Game Competitions: An Extensible LLM Benchmark and Leaderboard | Oguzhan Topsakal et.al. | 2407.07796 | link |
| 2024-07-10 | Flooding Spread of Manipulated Knowledge in LLM-Based Multi-Agent Communities | Tianjie Ju et.al. | 2407.07791 | link |
| 2024-07-10 | WorldAPIs: The World Is Worth How Many APIs? A Thought Experiment | Jiefu Ou et.al. | 2407.07778 | null |
| 2024-07-09 | AnyTaskTune: Advanced Domain-Specific Solutions through Task-Fine-Tuning | Jiaxi Cui et.al. | 2407.07094 | link |
| 2024-07-09 | FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation | Liqun Ma et.al. | 2407.07093 | link |
| 2024-07-09 | Hypothetical Minds: Scaffolding Theory of Mind for Multi-Agent Tasks with Large Language Models | Logan Cross et.al. | 2407.07086 | link |
| 2024-07-09 | Adapting LLMs to Hebrew: Unveiling DictaLM 2.0 with Enhanced Vocabulary and Instruction Capabilities | Shaltiel Shmidman et.al. | 2407.07080 | null |
| 2024-07-09 | Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps | Yung-Sung Chuang et.al. | 2407.07071 | link |
| 2024-07-09 | Prompting Techniques for Secure Code Generation: A Systematic Investigation | Catherine Tony et.al. | 2407.07064 | null |
| 2024-07-09 | Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence | Weize Chen et.al. | 2407.07061 | link |
| 2024-07-09 | Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model | Wenqi Zhang et.al. | 2407.07053 | link |
| 2024-07-09 | CorMulT: A Semi-supervised Modality Correlation-aware Multimodal Transformer for Sentiment Analysis | Yangmin Li et.al. | 2407.07046 | null |
| 2024-07-09 | Using Large Language Models for Generating Smart Contracts for Health Insurance from Textual Policies | Inwon Kang et.al. | 2407.07019 | null |
| 2024-07-08 | Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision | Orr Zohar et.al. | 2407.06189 | link |
| 2024-07-08 | CrowdMoGen: Zero-Shot Text-Driven Collective Motion Generation | Xinying Guo et.al. | 2407.06188 | null |
| 2024-07-08 | On Speeding Up Language Model Evaluation | Jin Peng Zhou et.al. | 2407.06172 | link |
| 2024-07-08 | What’s Wrong with Your Code Generated by Large Language Models? An Extensive Study | Shihan Dou et.al. | 2407.06153 | null |
| 2024-07-08 | Using Grammar Masking to Ensure Syntactic Validity in LLM-based Modeling Tasks | Lukas Netz et.al. | 2407.06146 | null |
| 2024-07-08 | ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation | Ethan Chern et.al. | 2407.06135 | link |
| 2024-07-08 | Evaluating the Semantic Profiling Abilities of LLMs for Natural Language Utterances in Data Visualization | Hannah K. Bako et.al. | 2407.06129 | link |
| 2024-07-08 | Depression Detection and Analysis using Large Language Models on Textual and Audio-Visual Modalities | Avinash Anand et.al. | 2407.06125 | null |
| 2024-07-08 | Artificial Intuition: Efficient Classification of Scientific Abstracts | Harsh Sakhrani et.al. | 2407.06093 | null |
| 2024-07-08 | Merge, Ensemble, and Cooperate! A Survey on Collaborative Strategies in the Era of Large Language Models | Jinliang Lu et.al. | 2407.06089 | null |
| 2024-07-05 | Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs | Rudolf Laine et.al. | 2407.04694 | null |
| 2024-07-05 | ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models | Yuzhe Gu et.al. | 2407.04693 | link |
| 2024-07-05 | Rethinking Visual Prompting for Multimodal Large Language Models with External Knowledge | Yuanze Lin et.al. | 2407.04681 | null |
| 2024-07-05 | Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition | Ye Bai et.al. | 2407.04675 | null |
| 2024-07-05 | Lazarus: Resilient and Elastic Training of Mixture-of-Experts Models with Adaptive Expert Placement | Yongji Wu et.al. | 2407.04656 | null |
| 2024-07-05 | Entity Decomposition with Filtering: A Zero-Shot Clinical Named Entity Recognition Framework | Reza Averly et.al. | 2407.04629 | null |
| 2024-07-05 | On scalable oversight with weak LLMs judging strong LLMs | Zachary Kenton et.al. | 2407.04622 | null |
| 2024-07-05 | Leveraging Large Language Models for Integrated Satellite-Aerial-Terrestrial Networks: Recent Advances and Future Directions | Shumaila Javaid et.al. | 2407.04581 | null |
| 2024-07-05 | VRSD: Rethinking Similarity and Diversity for Retrieval in Large Language Models | Hang Gao et.al. | 2407.04573 | null |
| 2024-07-05 | PoPreRo: A New Dataset for Popularity Prediction of Romanian Reddit Posts | Ana-Cristina Rogoz et.al. | 2407.04541 | link |
| 2024-07-03 | BACON: Supercharge Your VLM with Bag-of-Concept Graph to Mitigate Hallucinations | Zhantao Yang et.al. | 2407.03314 | null |
| 2024-07-03 | Universal Length Generalization with Turing Programs | Kaiying Hou et.al. | 2407.03310 | null |
| 2024-07-03 | Large Language Models for JSON Schema Discovery | Michael J. Mior et.al. | 2407.03286 | null |
| 2024-07-03 | LLM Internal States Reveal Hallucination Risk Faced With a Query | Ziwei Ji et.al. | 2407.03282 | null |
| 2024-07-03 | Improving Retrieval-augmented Text-to-SQL with AST-based Ranking and Schema Pruning | Zhili Shen et.al. | 2407.03227 | null |
| 2024-07-03 | How Does Quantization Affect Multilingual LLMs? | Kelly Marchisio et.al. | 2407.03211 | null |
| 2024-07-03 | TheoremLlama: Transforming General-Purpose LLMs into Lean4 Experts | Ruida Wang et.al. | 2407.03203 | link |
| 2024-07-03 | Fine-Tuning with Divergent Chains of Thought Boosts Reasoning Through Self-Correction in Language Models | Haritz Puerto et.al. | 2407.03181 | link |
| 2024-07-03 | Investigating Decoder-only Large Language Models for Speech-to-text Translation | Chao-Wei Huang et.al. | 2407.03169 | null |
| 2024-07-03 | SOS! Soft Prompt Attack Against Open-Source Large Language Models | Ziqing Yang et.al. | 2407.03160 | null |
| 2024-07-02 | MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention | Huiqiang Jiang et.al. | 2407.02490 | link |
| 2024-07-02 | Neurocache: Efficient Vector Retrieval for Long-range Language Modeling | Ali Safaya et.al. | 2407.02486 | link |
| 2024-07-02 | RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs | Yue Yu et.al. | 2407.02485 | null |
| 2024-07-02 | MMedAgent: Learning to Use Medical Tools with Multi-modal Agent | Binxu Li et.al. | 2407.02483 | null |
| 2024-07-02 | Understanding Alignment in Multimodal LLMs: A Comprehensive Study | Elmira Amirloo et.al. | 2407.02477 | null |
| 2024-07-02 | Open Scene Graphs for Open World Object-Goal Navigation | Joel Loo et.al. | 2407.02473 | null |
| 2024-07-02 | Reliable Confidence Intervals for Information Retrieval Evaluation Using Generative A.I. | Harrie Oosterhuis et.al. | 2407.02464 | null |
| 2024-07-02 | Predicting vs. Acting: A Trade-off Between World Modeling & Agent Modeling | Margaret Li et.al. | 2407.02446 | null |
| 2024-07-02 | Video Watermarking: Safeguarding Your Video from (Unauthorized) Annotations by Video-based LLMs | Jinmin Li et.al. | 2407.02411 | null |
| 2024-07-02 | CEB: Compositional Evaluation Benchmark for Fairness in Large Language Models | Song Wang et.al. | 2407.02408 | null |
| 2024-06-28 | Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs | Sukmin Yun et.al. | 2406.20098 | link |
| 2024-06-28 | LLaRA: Supercharging Robot Learning Data for Vision-Language Policy | Xiang Li et.al. | 2406.20095 | link |
| 2024-06-28 | Scaling Synthetic Data Creation with 1,000,000,000 Personas | Xin Chan et.al. | 2406.20094 | link |
| 2024-06-28 | LLaVolta: Efficient Multi-modal Models via Stage-wise Visual Context Compression | Jieneng Chen et.al. | 2406.20092 | link |
| 2024-06-28 | ProgressGym: Alignment with a Millennium of Moral Progress | Tianyi Qiu et.al. | 2406.20087 | link |
| 2024-06-28 | Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language | Yicheng Chen et.al. | 2406.20085 | null |
| 2024-06-28 | Molecular Facts: Desiderata for Decontextualization in LLM Fact Verification | Anisha Gunjal et.al. | 2406.20079 | link |
| 2024-06-28 | Applying RLAIF for Code Generation with API-usage in Lightweight LLMs | Sujan Dutta et.al. | 2406.20060 | null |
| 2024-07-01 | BMW Agents – A Framework For Task Automation Through Multi-Agent Collaboration | Noel Crawford et.al. | 2406.20041 | null |
| 2024-06-28 | BioMNER: A Dataset for Biomedical Method Entity Recognition | Chen Tang et.al. | 2406.20038 | null |
| 2024-06-27 | ReXTime: A Benchmark Suite for Reasoning-Across-Time in Videos | Jr-Jen Chen et.al. | 2406.19392 | link |
| 2024-06-27 | The Remarkable Robustness of LLMs: Stages of Inference? | Vedang Lad et.al. | 2406.19384 | link |
| 2024-06-27 | Suri: Multi-constraint Instruction Following for Long-form Text Generation | Chau Minh Pham et.al. | 2406.19371 | link |
| 2024-06-27 | The Model Arena for Cross-lingual Sentiment Analysis: A Comparative Study in the Era of Large Language Models | Xiliang Zhu et.al. | 2406.19358 | null |
| 2024-06-27 | DiVERT: Distractor Generation with Variational Errors Represented as Text for Math Multiple-choice Questions | Nigel Fernandez et.al. | 2406.19356 | null |
| 2024-06-27 | IndoToxic2024: A Demographically-Enriched Dataset of Hate Speech and Toxicity Types for Indonesian Language | Lucky Susanto et.al. | 2406.19349 | null |
| 2024-06-27 | Jump Starting Bandits with LLM-Generated Prior Knowledge | Parand A. Alamdari et.al. | 2406.19317 | null |
| 2024-06-27 | Enhancing Continual Learning in Visual Question Answering with Modality-Aware Feature Distillation | Malvina Nikandrou et.al. | 2406.19297 | null |
| 2024-06-27 | From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data | Zheyang Xiong et.al. | 2406.19292 | link |
| 2024-06-27 | PhysioLLM: Supporting Personalized Health Insights with Wearables and Large Language Models | Cathy Mengying Fang et.al. | 2406.19283 | null |
| 2024-06-26 | Symbolic Learning Enables Self-Evolving Agents | Wangchunshu Zhou et.al. | 2406.18532 | link |
| 2024-06-26 | PrExMe! Large Scale Prompt Exploration of Open Source LLMs for Machine Translation and Summarization Evaluation | Christoph Leiter et.al. | 2406.18528 | null |
| 2024-06-26 | CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs | Zirui Wang et.al. | 2406.18521 | link |
| 2024-06-26 | “Is ChatGPT a Better Explainer than My Professor?”: Evaluating the Explanation Capabilities of LLMs in Conversation Compared to a Human Baseline | Grace Li et.al. | 2406.18512 | null |
| 2024-06-26 | Mental Modeling of Reinforcement Learning Agents by Language Models | Wenhao Lu et.al. | 2406.18505 | null |
| 2024-06-26 | Is In-Context Learning a Type of Gradient-Based Learning? Evidence from the Inverse Frequency Effect in Structural Priming | Zhenghao Zhou et.al. | 2406.18501 | null |
| 2024-06-26 | Role-Play Zero-Shot Prompting with Large Language Models for Open-Domain Human-Machine Conversation | Ahmed Njifenjou et.al. | 2406.18460 | null |
| 2024-06-26 | Cascading Large Language Models for Salient Event Graph Generation | Xingwei Tan et.al. | 2406.18449 | null |
| 2024-06-26 | New intelligent empowerment for digital transformation | Peng Yifeng et.al. | 2406.18440 | null |
| 2024-06-26 | IRCAN: Mitigating Knowledge Conflicts in LLM Generation via Identifying and Reweighting Context-Aware Neurons | Dan Shi et.al. | 2406.18406 | null |
| 2024-06-25 | Text-Animator: Controllable Visual Text Video Generation | Lin Liu et.al. | 2406.17777 | null |
| 2024-06-25 | MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning | Xiangyu Zhao et.al. | 2406.17770 | link |
| 2024-06-25 | BMIKE-53: Investigating Cross-Lingual Knowledge Editing with In-Context Learning | Ercong Nie et.al. | 2406.17764 | link |
| 2024-06-25 | CaLMQA: Exploring culturally specific long-form question answering across 23 languages | Shane Arora et.al. | 2406.17761 | link |
| 2024-06-25 | Accelerating Clinical Evidence Synthesis with Large Language Models | Zifeng Wang et.al. | 2406.17755 | null |
| 2024-06-25 | Measuring and Benchmarking Large Language Models’ Capabilities to Generate Persuasive Language | Amalie Brogaard Pauli et.al. | 2406.17753 | null |
| 2024-06-25 | LLM Targeted Underperformance Disproportionately Impacts Vulnerable Users | Elinor Poole-Dayan et.al. | 2406.17737 | null |
| 2024-06-25 | FedBiOT: LLM Local Fine-tuning in Federated Learning without Full Model | Feijie Wu et.al. | 2406.17706 | null |
| 2024-06-25 | From Distributional to Overton Pluralism: Investigating Large Language Model Alignment | Thom Lake et.al. | 2406.17692 | link |
| 2024-06-25 | VarBench: Robust Language Model Benchmarking Through Dynamic Variable Perturbation | Kun Qian et.al. | 2406.17681 | null |
| 2024-06-24 | EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees | Yuhui Li et.al. | 2406.16858 | null |
| 2024-06-24 | From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models | Sean Welleck et.al. | 2406.16838 | null |
| 2024-06-24 | USDC: A Dataset of $\underline{U}$ser $\underline{S}$tance and $\underline{D}$ogmatism in Long $\underline{C}$onversations | Mounika Marreddy et.al. | 2406.16833 | null |
| 2024-06-24 | Ragnarök: A Reusable RAG Framework and Baselines for TREC 2024 Retrieval-Augmented Generation Track | Ronak Pradeep et.al. | 2406.16828 | null |
| 2024-06-24 | GPT-4V Explorations: Mining Autonomous Driving | Zixuan Li et.al. | 2406.16817 | null |
| 2024-06-24 | RES-Q: Evaluating Code-Editing Large Language Model Systems at the Repository Scale | Beck LaBash et.al. | 2406.16801 | link |
| 2024-06-24 | Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs | Ashwinee Panda et.al. | 2406.16797 | link |
| 2024-06-24 | M2Lingual: Enhancing Multilingual, Multi-Turn Instruction Alignment in Large Language Models | Rishabh Maheshwary et.al. | 2406.16783 | null |
| 2024-06-24 | It Is Not About What You Say, It Is About How You Say It: A Surprisingly Simple Approach for Improving Reading Comprehension | Sagi Shaier et.al. | 2406.16779 | null |
| 2024-06-24 | Blending LLMs into Cascaded Speech Translation: KIT’s Offline Speech Translation System for IWSLT 2024 | Sai Koneru et.al. | 2406.16777 | null |
| 2024-06-21 | GenoTEX: A Benchmark for Evaluating LLM-Based Exploration of Gene Expression Data in Alignment with Bioinformaticians | Haoyang Liu et.al. | 2406.15341 | link |
| 2024-06-21 | Gradient-Mask Tuning Elevates the Upper Limits of LLM Performance | Haoling Li et.al. | 2406.15330 | null |
| 2024-06-21 | An End-to-End, Segmentation-Free, Arabic Handwritten Recognition Model on KHATT | Sondos Aabed et.al. | 2406.15329 | null |
| 2024-06-21 | Bug In the Code Stack: Can LLMs Find Bugs in Large Python Code Stacks | Hokyung Lee et.al. | 2406.15325 | null |
| 2024-06-21 | Towards Fine-Grained Citation Evaluation in Generated Text: A Comparative Analysis of Faithfulness Metrics | Weijia Zhang et.al. | 2406.15264 | null |
| 2024-06-21 | Detecting Synthetic Lyrics with Few-Shot Inference | Yanis Labrak et.al. | 2406.15231 | null |
| 2024-06-21 | A LLM-Based Ranking Method for the Evaluation of Automatic Counter-Narrative Generation | Irune Zubiaga et.al. | 2406.15227 | null |
| 2024-06-21 | Unsupervised Extraction of Dialogue Policies from Conversations | Makesh Narsimhan Sreedhar et.al. | 2406.15214 | null |
| 2024-06-21 | Prompting Whisper for QA-driven Zero-shot End-to-end Spoken Language Understanding | Mohan Li et.al. | 2406.15209 | null |
| 2024-06-21 | Exploring the Efficacy of Robotic Assistants with ChatGPT and Claude in Enhancing ADHD Therapy: Innovating Treatment Paradigms | Santiago Berrezueta-Guzman et.al. | 2406.15198 | null |
| 2024-06-20 | Model Merging and Safety Alignment: One Bad Model Spoils the Bunch | Hasan Abed Al Kader Hammoud et.al. | 2406.14563 | null |
| 2024-06-20 | Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities | Sachit Menon et.al. | 2406.14562 | null |
| 2024-06-20 | Asynchronous Large Language Model Enhanced Planner for Autonomous Driving | Yuan Chen et.al. | 2406.14556 | link |
| 2024-06-20 | GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models | Shilong Li et.al. | 2406.14550 | null |
| 2024-06-20 | Uncovering Latent Memories: Assessing Data Leakage and Memorization Patterns in Large Language Models | Sunny Duan et.al. | 2406.14549 | null |
| 2024-06-20 | Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data | Johannes Treutlein et.al. | 2406.14546 | link |
| 2024-06-20 | Unmasking Database Vulnerabilities: Zero-Knowledge Schema Inference Attacks in Text-to-SQL Systems | Đorđe Klisura et.al. | 2406.14545 | null |
| 2024-06-20 | Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs | Yuxuan Qiao et.al. | 2406.14544 | link |
| 2024-06-20 | Are LLMs Naturally Good at Synthetic Tabular Data Generation? | Shengzhe Xu et.al. | 2406.14541 | link |
| 2024-06-20 | PostMark: A Robust Blackbox Watermark for Large Language Models | Yapei Chang et.al. | 2406.14517 | link |
| 2024-06-18 | DrVideo: Document Retrieval Based Long Video Understanding | Ziyu Ma et.al. | 2406.12846 | null |
| 2024-06-18 | Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts | Haoxiang Wang et.al. | 2406.12845 | link |
| 2024-06-18 | Synergizing Foundation Models and Federated Learning: A Survey | Shenghui Li et.al. | 2406.12844 | null |
| 2024-06-18 | LaMDA: Large Model Fine-Tuning via Spectrally Decomposed Low-Dimensional Adaptation | Seyedarmin Azizi et.al. | 2406.12832 | link |
| 2024-06-18 | Is It Good Data for Multilingual Instruction Tuning or Just Bad Multilingual Evaluation for Large Language Models? | Pinzhen Chen et.al. | 2406.12822 | null |
| 2024-06-18 | Can Large Language Models Always Solve Easy Problems if They Can Solve Harder Ones? | Zhe Yang et.al. | 2406.12809 | null |
| 2024-06-18 | Identifying Performance-Sensitive Configurations in Software Systems through Code Analysis with LLM Agents | Zehao Wang et.al. | 2406.12806 | null |
| 2024-06-18 | Supporting Human Raters with the Detection of Harmful Content using Large Language Models | Kurt Thomas et.al. | 2406.12800 | null |
| 2024-06-18 | ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools | Team GLM et.al. | 2406.12793 | null |
| 2024-06-18 | Generating Educational Materials with Different Levels of Readability using LLMs | Chieh-Yang Huang et.al. | 2406.12787 | null |
| 2024-06-17 | LLaNA: Large Language and NeRF Assistant | Andrea Amaduzzi et.al. | 2406.11840 | null |
| 2024-06-17 | mDPO: Conditional Preference Optimization for Multimodal Large Language Models | Fei Wang et.al. | 2406.11839 | link |
| 2024-06-17 | Unveiling Encoder-Free Vision-Language Models | Haiwen Diao et.al. | 2406.11832 | link |
| 2024-06-17 | Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models | Bingqi Ma et.al. | 2406.11831 | null |
| 2024-06-17 | WPO: Enhancing RLHF with Weighted Preference Optimization | Wenxuan Zhou et.al. | 2406.11827 | link |
| 2024-06-17 | Composing Object Relations and Attributes for Image-Text Matching | Khoi Pham et.al. | 2406.11820 | null |
| 2024-06-17 | Embodied Instruction Following in Unknown Environments | Zhenyu Wu et.al. | 2406.11818 | null |
| 2024-06-17 | VideoLLM-online: Online Video Large Language Model for Streaming Video | Joya Chen et.al. | 2406.11816 | null |
| 2024-06-17 | LLARVA: Vision-Action Instruction Tuning Enhances Robot Learning | Dantong Niu et.al. | 2406.11815 | null |
| 2024-06-17 | How Do Large Language Models Acquire Factual Knowledge During Pretraining? | Hoyeon Chang et.al. | 2406.11813 | link |
| 2024-06-14 | Quantifying Variance in Evaluation Benchmarks | Lovish Madaan et.al. | 2406.10229 | null |
| 2024-06-14 | Semantic Membership Inference Attack against Large Language Models | Hamid Mozaffari et.al. | 2406.10218 | null |
| 2024-06-14 | Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs | Rui Yang et.al. | 2406.10216 | link |
| 2024-06-14 | Be like a Goldfish, Don’t Memorize! Mitigating Memorization in Generative LLMs | Abhimanyu Hans et.al. | 2406.10209 | link |
| 2024-06-14 | A Fundamental Trade-off in Aligned Language Models and its Relation to Sampling Adaptors | Naaman Tan et.al. | 2406.10203 | null |
| 2024-06-14 | TRIP-PAL: Travel Planning with Guarantees by Combining Large Language Models and Automated Planners | Tomas de la Rosa et.al. | 2406.10196 | null |
| 2024-06-14 | Detecting and Evaluating Medical Hallucinations in Large Vision Language Models | Jiawei Chen et.al. | 2406.10185 | null |
| 2024-06-14 | Practical offloading for fine-tuning LLM on commodity GPU via learned subspace projectors | Siyuan Chen et.al. | 2406.10181 | null |
| 2024-06-14 | Datasets for Multilingual Answer Sentence Selection | Matteo Gabburo et.al. | 2406.10172 | null |
| 2024-06-14 | Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models | Carson Denison et.al. | 2406.10162 | link |
| 2024-06-13 | VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding | Muhammad Maaz et.al. | 2406.09418 | link |
| 2024-06-13 | Explore the Limits of Omni-modal Pretraining at Scale | Yiyuan Zhang et.al. | 2406.09412 | link |
| 2024-06-13 | Yo’LLaVA: Your Personalized Language and Vision Assistant | Thao Nguyen et.al. | 2406.09400 | link |
| 2024-06-13 | Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms | Miaosen Zhang et.al. | 2406.09397 | null |
| 2024-06-13 | Too Many Frames, not all Useful: Efficient Strategies for Long-Form Video QA | Jongwoo Park et.al. | 2406.09396 | link |
| 2024-06-13 | Improving Autoregressive Training with Dynamic Oracles | Jianing Yang et.al. | 2406.09393 | null |
| 2024-06-13 | Towards Vision-Language Geo-Foundation Model: A Survey | Yue Zhou et.al. | 2406.09385 | link |
| 2024-06-13 | Needle In A Video Haystack: A Scalable Synthetic Framework for Benchmarking Video MLLMs | Zijia Zhao et.al. | 2406.09367 | link |
| 2024-06-13 | ElicitationGPT: Text Elicitation Mechanisms via Language Models | Yifan Wu et.al. | 2406.09363 | null |
| 2024-06-13 | DiscreteSLU: A Large Language Model with Self-Supervised Discrete Speech Units for Spoken Language Understanding | Suwon Shon et.al. | 2406.09345 | null |
| 2024-06-12 | Improving LLMs for Recommendation with Out-Of-Vocabulary Tokens | Ting-Ji Huang et.al. | 2406.08477 | null |
| 2024-06-12 | Real2Code: Reconstruct Articulated Objects via Code Generation | Zhao Mandi et.al. | 2406.08474 | null |
| 2024-06-12 | Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing | Zhangchen Xu et.al. | 2406.08464 | link |
| 2024-06-12 | ConceptHash: Interpretable Fine-Grained Hashing via Concept Discovery | Kam Woh Ng et.al. | 2406.08457 | link |
| 2024-06-12 | TasTe: Teaching Large Language Models to Translate through Self-Reflection | Yutong Wang et.al. | 2406.08434 | link |
| 2024-06-12 | Next-Generation Database Interfaces: A Survey of LLM-based Text-to-SQL | Zijin Hong et.al. | 2406.08426 | null |
| 2024-06-12 | OmniCorpus: An Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text | Qingyun Li et.al. | 2406.08418 | link |
| 2024-06-12 | Discovering Preference Optimization Algorithms with and for Large Language Models | Chris Lu et.al. | 2406.08414 | link |
| 2024-06-12 | Memory Is All You Need: An Overview of Compute-in-Memory Architectures for Accelerating Large Language Model Inference | Christopher Wolters et.al. | 2406.08413 | null |
| 2024-06-12 | Understanding Sounds, Missing the Questions: The Challenge of Object Hallucination in Large Audio-Language Models | Chun-Yi Kuan et.al. | 2406.08402 | link |
| 2024-06-11 | Open-LLM-Leaderboard: From Multi-choice to Open-style Questions for LLMs Evaluation, Benchmark, and Arena | Aidar Myrzakhan et.al. | 2406.07545 | link |
| 2024-06-11 | QuickLLaMA: Query-aware Inference Acceleration for Large Language Models | Jingyao Li et.al. | 2406.07528 | link |
| 2024-06-11 | Beyond Model Collapse: Scaling Up with Synthesized Data Requires Reinforcement | Yunzhen Feng et.al. | 2406.07515 | null |
| 2024-06-11 | THaLLE: Text Hyperlocally Augmented Large Language Extension – Technical Report | KBTG Labs et.al. | 2406.07505 | null |
| 2024-06-11 | Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions | Renjie Pi et.al. | 2406.07502 | link |
| 2024-06-11 | TextGrad: Automatic “Differentiation” via Text | Mert Yuksekgonul et.al. | 2406.07496 | link |
| 2024-06-11 | CADS: A Systematic Literature Review on the Challenges of Abstractive Dialogue Summarization | Frederic Kirstein et.al. | 2406.07494 | null |
| 2024-06-11 | PITCH: Productivity and Mental Well-being Coaching through Daily Conversational Interaction | Adnan Abbas et.al. | 2406.07485 | null |
| 2024-06-11 | Advancing Annotation of Stance in Social Media Posts: A Comparative Analysis of Large Language Models and Crowd Sourcing | Mao Li et.al. | 2406.07483 | null |
| 2024-06-11 | VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs | Zesen Cheng et.al. | 2406.07476 | link |
| 2024-06-10 | Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation | Peize Sun et.al. | 2406.06525 | link |
| 2024-06-10 | UMBRELA: UMbrela is the (Open-Source Reproduction of the) Bing RELevance Assessor | Shivani Upadhyay et.al. | 2406.06519 | link |
| 2024-06-10 | NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative | Asmar Nadeem et.al. | 2406.06499 | null |
| 2024-06-10 | Towards a Personal Health Large Language Model | Justin Cosentino et.al. | 2406.06474 | null |
| 2024-06-10 | AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction | Zhen Xing et.al. | 2406.06465 | null |
| 2024-06-10 | Transforming Wearable Data into Health Insights using Large Language Model Agents | Mike A. Merrill et.al. | 2406.06464 | null |
| 2024-06-10 | VCR: Visual Caption Restoration | Tianyu Zhang et.al. | 2406.06462 | link |
| 2024-06-10 | Reasoning in Token Economies: Budget-Aware Evaluation of LLM Reasoning Strategies | Junlin Wang et.al. | 2406.06461 | null |
| 2024-06-10 | Evaluating the Retrieval Component in LLM-Based Question Answering Systems | Ashkan Alinejad et.al. | 2406.06458 | null |
| 2024-06-10 | A Large Language Model Pipeline for Breast Cancer Oncology | Tristen Pool et.al. | 2406.06455 | null |
| 2024-06-07 | 3D-GRAND: Towards Better Grounding and Less Hallucination for 3D-LLMs | Jianing Yang et.al. | 2406.05132 | link |
| 2024-06-07 | An Empirical Study on Parameter-Efficient Fine-Tuning for MultiModal Large Language Models | Xiongtao Zhou et.al. | 2406.05130 | null |
| 2024-06-07 | Towards Semantic Equivalence of Tokenization in Multimodal LLM | Shengqiong Wu et.al. | 2406.05127 | null |
| 2024-06-07 | Categorizing Sources of Information for Explanations in Conversational AI Systems for Older Adults Aging in Place | Niharika Mathur et.al. | 2406.05111 | null |
| 2024-06-07 | LINX: A Language Driven Generative System for Goal-Oriented Automated Data Exploration | Tavor Lipman et.al. | 2406.05107 | null |
| 2024-06-07 | Multi-Head RAG: Solving Multi-Aspect Problems with LLMs | Maciej Besta et.al. | 2406.05085 | link |
| 2024-06-07 | Are Large Language Models More Empathetic than Humans? | Anuradha Welivita et.al. | 2406.05063 | null |
| 2024-06-07 | Robustness Assessment of Mathematical Reasoning in the Presence of Missing and Contradictory Conditions | Shi-Yu Tian et.al. | 2406.05055 | null |
| 2024-06-07 | Hints-In-Browser: Benchmarking Language Models for Programming Feedback Generation | Nachiket Kotalwar et.al. | 2406.05053 | null |
| 2024-06-07 | Bootstrapping Referring Multi-Object Tracking | Yani Zhang et.al. | 2406.05039 | null |
| 2024-06-06 | Verbalized Machine Learning: Revisiting Machine Learning with Language Models | Tim Z. Xiao et.al. | 2406.04344 | null |
| 2024-06-06 | RoboMamba: Multimodal State Space Model for Efficient Robot Reasoning and Manipulation | Jiaming Liu et.al. | 2406.04339 | null |
| 2024-06-06 | Coherent Zero-Shot Visual Instruction Generation | Quynh Phung et.al. | 2406.04337 | null |
| 2024-06-06 | DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs | Lingchen Meng et.al. | 2406.04334 | null |
| 2024-06-06 | PaCE: Parsimonious Concept Engineering for Large Language Models | Jinqi Luo et.al. | 2406.04331 | link |
| 2024-06-06 | Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step | Zhanhao Liang et.al. | 2406.04314 | link |
| 2024-06-06 | Semantically Diverse Language Generation for Uncertainty Estimation in Language Models | Lukas Aichberger et.al. | 2406.04306 | link |
| 2024-06-06 | Text-to-Drive: Diverse Driving Behavior Synthesis via Large Language Models | Phat Nguyen et.al. | 2406.04300 | null |
| 2024-06-06 | What Languages are Easy to Language-Model? A Perspective from Learning Probabilistic Regular Languages | Nadav Borenstein et.al. | 2406.04289 | null |
| 2024-06-06 | Characterizing Similarities and Divergences in Conversational Tones in Humans and LLMs by Sampling with People | Dun-Ming Huang et.al. | 2406.04278 | link |
| 2024-06-05 | Wings: Learning Multimodal LLMs without Text-only Forgetting | Yi-Kai Zhang et.al. | 2406.03496 | null |
| 2024-06-05 | Seq1F1B: Efficient Sequence-Level Pipeline Parallelism for Large Language Model Training | Sun Ao et.al. | 2406.03488 | null |
| 2024-06-05 | Analyzing LLM Behavior in Dialogue Summarization: Unveiling Circumstantial Hallucination Trends | Sanjana Ramprasad et.al. | 2406.03487 | null |
| 2024-06-05 | BIPED: Pedagogically Informed Tutoring System for ESL Education | Soonwoo Kwon et.al. | 2406.03486 | null |
| 2024-06-05 | Does your data spark joy? Performance gains from domain upsampling at the end of training | Cody Blakeney et.al. | 2406.03476 | null |
| 2024-06-05 | AD-H: Autonomous Driving with Hierarchical Agents | Zaibin Zhang et.al. | 2406.03474 | null |
| 2024-06-05 | What is the Best Way for ChatGPT to Translate Poetry? | Shanshan Wang et.al. | 2406.03450 | null |
| 2024-06-05 | Pre-trained Large Language Models Use Fourier Features to Compute Addition | Tianyi Zhou et.al. | 2406.03445 | null |
| 2024-06-05 | Investigating the Relationship Between User Specialization and Toxicity on Reddit: A Sentiment Analysis Approach | Abi Oppenheim et.al. | 2406.03443 | null |
| 2024-06-05 | Cycles of Thought: Measuring LLM Confidence through Stable Explanations | Evan Becker et.al. | 2406.03441 | null |
| 2024-06-04 | Learning to grok: Emergence of in-context learning and skill composition in modular arithmetic tasks | Tianyu He et.al. | 2406.02550 | link |
| 2024-06-04 | Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning | Alex Jinpeng Wang et.al. | 2406.02547 | link |
| 2024-06-04 | To Believe or Not to Believe Your LLM | Yasin Abbasi Yadkori et.al. | 2406.02543 | null |
| 2024-06-04 | Loki: Low-Rank Keys for Efficient Sparse Attention | Prajwal Singhania et.al. | 2406.02542 | null |
| 2024-06-04 | Parrot: Multilingual Visual Instruction Tuning | Hai-Long Sun et.al. | 2406.02539 | null |
| 2024-06-04 | Mitigate Position Bias in Large Language Models via Scaling a Single Dimension | Yijiong Yu et.al. | 2406.02536 | null |
| 2024-06-04 | SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices | Ruslan Svirschevski et.al. | 2406.02532 | null |
| 2024-06-04 | Scalable MatMul-free Language Modeling | Rui-Jie Zhu et.al. | 2406.02528 | link |
| 2024-06-04 | CheckEmbed: Effective Verification of LLM Solutions to Open-Ended Tasks | Maciej Besta et.al. | 2406.02524 | null |
| 2024-06-04 | RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots | Soroush Nasiriany et.al. | 2406.02523 | null |
| 2024-05-31 | Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis | Chaoyou Fu et.al. | 2405.21075 | null |
| 2024-05-31 | Grammar-Aligned Decoding | Kanghee Park et.al. | 2405.21047 | null |
| 2024-05-31 | Direct Alignment of Language Models via Quality-Aware Self-Refinement | Runsheng Yu et.al. | 2405.21040 | null |
| 2024-05-31 | Standards for Belief Representations in LLMs | Daniel A. Herrmann et.al. | 2405.21030 | null |
| 2024-05-31 | LACIE: Listener-Aware Finetuning for Confidence Calibration in Large Language Models | Elias Stengel-Eskin et.al. | 2405.21028 | link |
| 2024-05-31 | Improved Techniques for Optimization-Based Jailbreaking on Large Language Models | Xiaojun Jia et.al. | 2405.21018 | link |
| 2024-05-31 | DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models | Linli Yao et.al. | 2405.20985 | null |
| 2024-05-31 | Enhancing Noise Robustness of Retrieval-Augmented Language Models with Adaptive Adversarial Training | Feiteng Fang et.al. | 2405.20978 | null |
| 2024-05-31 | SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales | Tianyang Xu et.al. | 2405.20974 | link |
| 2024-05-31 | LCQ: Low-Rank Codebook based Quantization for Large Language Models | Wen-Pu Cai et.al. | 2405.20973 | null |
| 2024-05-30 | MotionLLM: Understanding Human Behaviors from Human Motions and Videos | Ling-Hao Chen et.al. | 2405.20340 | null |
| 2024-05-30 | Visual Perception by Large Language Model’s Weights | Feipeng Ma et.al. | 2405.20339 | null |
| 2024-05-30 | Xwin-LM: Strong and Scalable Alignment Practice for LLMs | Bolin Ni et.al. | 2405.20335 | link |
| 2024-05-31 | ParSEL: Parameterized Shape Editing with Language | Aditya Ganeshan et.al. | 2405.20319 | null |
| 2024-05-30 | CausalQuest: Collecting Natural Causal Questions for AI Agents | Roberto Ceraolo et.al. | 2405.20318 | link |
| 2024-05-30 | ANAH: Analytical Annotation of Hallucinations in Large Language Models | Ziwei Ji et.al. | 2405.20315 | link |
| 2024-05-30 | Sequence-Augmented SE(3)-Flow Matching For Conditional Protein Backbone Generation | Guillaume Huguet et.al. | 2405.20313 | null |
| 2024-05-30 | Large Language Models Can Self-Improve At Web Agent Tasks | Ajay Patel et.al. | 2405.20309 | null |
| 2024-05-30 | Group Robust Preference Optimization in Reward-free RLHF | Shyam Sundhar Ramesh et.al. | 2405.20304 | link |
| 2024-05-30 | Who Writes the Review, Human or AI? | Panagiotis C. Theocharopoulos et.al. | 2405.20285 | null |
| 2024-05-29 | X-VILA: Cross-Modality Alignment for Large Language Model | Hanrong Ye et.al. | 2405.19335 | null |
| 2024-05-29 | LLMs Meet Multimodal Generation and Editing: A Survey | Yingqing He et.al. | 2405.19334 | link |
| 2024-05-29 | Multi-Modal Generative Embedding Model | Feipeng Ma et.al. | 2405.19333 | null |
| 2024-05-29 | Self-Exploring Language Models: Active Preference Elicitation for Online Alignment | Shenao Zhang et.al. | 2405.19332 | link |
| 2024-05-29 | Normative Modules: A Generative Agent Architecture for Learning Norms that Supports Multi-Agent Cooperation | Atrisha Sarkar et.al. | 2405.19328 | null |
| 2024-05-29 | MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series | Ge Zhang et.al. | 2405.19327 | null |
| 2024-05-29 | Reasoning3D – Grounding and Reasoning in 3D: Fine-Grained Zero-Shot Open-Vocabulary 3D Reasoning Part Segmentation via Large Vision-Language Models | Tianrun Chen et.al. | 2405.19326 | null |
| 2024-05-29 | Nearest Neighbor Speculative Decoding for LLM Generation and Attribution | Minghan Li et.al. | 2405.19325 | null |
| 2024-05-29 | Are Large Language Models Chameleons? | Mingmeng Geng et.al. | 2405.19323 | null |
| 2024-05-29 | Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF | Shicong Cen et.al. | 2405.19320 | null |
| 2024-05-28 | Don’t Forget to Connect! Improving RAG with Graph-based Reranking | Jialin Dong et.al. | 2405.18414 | null |
| 2024-05-28 | Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass | Ethan Shen et.al. | 2405.18400 | link |
| 2024-05-28 | Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning | Yixiao Zhang et.al. | 2405.18386 | link |
| 2024-05-28 | OwLore: Outlier-weighed Layerwise Sampled Low-Rank Projection for Memory-Efficient LLM Fine-tuning | Pengxiang Li et.al. | 2405.18380 | link |
| 2024-05-28 | LLaMA-NAS: Efficient Neural Architecture Search for Large Language Models | Anthony Sarah et.al. | 2405.18377 | null |
| 2024-05-28 | Empowering Source-Free Domain Adaptation with MLLM-driven Curriculum Learning | Dongjie Chen et.al. | 2405.18376 | link |
| 2024-05-28 | Thai Winograd Schemas: A Benchmark for Thai Commonsense Reasoning | Phakphum Artkaew et.al. | 2405.18375 | null |
| 2024-05-28 | PromptWizard: Task-Aware Agent-driven Prompt Optimization Framework | Eshaan Agarwal et.al. | 2405.18369 | null |
| 2024-05-28 | Is a 3D-Tokenized LLM the Key to Reliable Autonomous Driving? | Yifan Bai et.al. | 2405.18361 | null |
| 2024-05-28 | Bridging the Gap: Dynamic Learning Strategies for Improving Multilingual Performance in LLMs | Somnath Kumar et.al. | 2405.18359 | null |
| 2024-05-27 | Matryoshka Multimodal Models | Mu Cai et.al. | 2405.17430 | null |
| 2024-05-27 | NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models | Chankyu Lee et.al. | 2405.17428 | null |
| 2024-05-27 | Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model | Kuan-Chih Huang et.al. | 2405.17427 | link |
| 2024-05-27 | LARM: Large Auto-Regressive Model for Long-Horizon Embodied Intelligence | Zhuoling Li et.al. | 2405.17424 | null |
| 2024-05-27 | Self-Corrected Multimodal Large Language Model for End-to-End Robot Manipulation | Jiaming Liu et.al. | 2405.17418 | null |
| 2024-05-27 | THREAD: Thinking Deeper with Recursive Spawning | Philip Schroeder et.al. | 2405.17402 | null |
| 2024-05-27 | MindMerger: Efficient Boosting LLM Reasoning in non-English Languages | Zixian Huang et.al. | 2405.17386 | null |
| 2024-05-27 | ReMoDetect: Reward Models Recognize Aligned LLM’s Generations | Hyunseok Lee et.al. | 2405.17382 | null |
| 2024-05-27 | RTL-Repo: A Benchmark for Evaluating LLMs on Large-Scale RTL Design Projects | Ahmed Allam et.al. | 2405.17378 | null |
| 2024-05-27 | Navigating the Safety Landscape: Measuring Risks in Finetuning Large Language Models | ShengYun Peng et.al. | 2405.17374 | null |
| 2024-05-24 | Scaling Laws for Discriminative Classification in Large Language Models | Dean Wyatte et.al. | 2405.15765 | null |
| 2024-05-24 | Large Language Models Reflect Human Citation Patterns with a Heightened Citation Bias | Andres Algaba et.al. | 2405.15739 | null |
| 2024-05-24 | More Insight from Being More Focused: Analysis of Clustered Market Apps | Maleknaz Nayebi et.al. | 2405.15737 | null |
| 2024-05-24 | LM4LV: A Frozen Large Language Model for Low-level Vision Tasks | Boyang Zheng et.al. | 2405.15734 | null |
| 2024-05-24 | Optimizing Large Language Models for OpenAPI Code Completion | Bohdan Petryshyn et.al. | 2405.15729 | null |
| 2024-05-24 | Prompt-Aware Adapter: Towards Learning Adaptive Visual Tokens for Multimodal Large Language Models | Yue Zhang et.al. | 2405.15684 | null |
| 2024-05-24 | What Do You See? Enhancing Zero-Shot Image Classification with Multimodal Large Language Models | Abdelrahman Abdelhamed et.al. | 2405.15668 | null |
| 2024-05-24 | Class Machine Unlearning for Complex Data via Concepts Inference and Data Poisoning | Wenhan Chang et.al. | 2405.15662 | null |
| 2024-05-24 | $\mathbf{L^2\cdot M = C^2}$ Large Language Models as Covert Channels… a Systematic Analysis | Simen Gaure et.al. | 2405.15652 | null |
| 2024-05-24 | LLM-based Robot Task Planning with Exceptional Handling for General Purpose Service Robots | Ruoyu Wang et.al. | 2405.15646 | null |
| 2024-05-23 | A Nurse is Blue and Elephant is Rugby: Cross Domain Alignment in Large Language Models Reveal Human-like Patterns | Asaf Yehudai et.al. | 2405.14863 | null |
| 2024-05-23 | Bitune: Bidirectional Instruction-Tuning | Dawid J. Kopiczko et.al. | 2405.14862 | null |
| 2024-05-23 | PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression | Vladimir Malinovskii et.al. | 2405.14852 | null |
| 2024-05-23 | HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models | Bernal Jiménez Gutiérrez et.al. | 2405.14831 | null |
| 2024-05-23 | Can LLMs Solve longer Math Word Problems Better? | Xin Xu et.al. | 2405.14804 | null |
| 2024-05-23 | Lessons from the Trenches on Reproducible Evaluation of Language Models | Stella Biderman et.al. | 2405.14782 | null |
| 2024-05-23 | WISE: Rethinking the Knowledge Memory for Lifelong Model Editing of Large Language Models | Peng Wang et.al. | 2405.14768 | link |
| 2024-05-23 | FinRobot: An Open-Source AI Agent Platform for Financial Applications using Large Language Models | Hongyang Yang et.al. | 2405.14767 | link |
| 2024-05-23 | Evaluating Large Language Models for Public Health Classification and Extraction Tasks | Joshua Harris et.al. | 2405.14766 | null |
| 2024-05-23 | Large language models can be zero-shot anomaly detectors for time series? | Sarah Alnegheimish et.al. | 2405.14755 | null |
| 2024-05-21 | Reducing Transformer Key-Value Cache Size with Cross-Layer Attention | William Brandon et.al. | 2405.12981 | null |
| 2024-05-21 | Energy Rank Alignment: Using Preference Optimization to Search Chemical Space at Scale | Shriram Chennakesavalu et.al. | 2405.12961 | null |
| 2024-05-21 | Aggregation of Reasoning: A Hierarchical Framework for Enhancing Answer Selection in Large Language Models | Zhangyue Yin et.al. | 2405.12939 | null |
| 2024-05-21 | Skin-in-the-Game: Decision Making via Multi-Stakeholder Alignment in LLMs | Bilgehan Sel et.al. | 2405.12933 | null |
| 2024-05-21 | Code-mixed Sentiment and Hate-speech Prediction | Anjali Yadav et.al. | 2405.12929 | null |
| 2024-05-21 | Streamlining Software Reviews: Efficient Predictive Modeling with Minimal Examples | Tim Menzies et.al. | 2405.12920 | null |
| 2024-05-21 | G-DIG: Towards Gradient-based DIverse and hiGh-quality Instruction Data Selection for Machine Translation | Xingyuan Pan et.al. | 2405.12915 | null |
| 2024-05-21 | An Empirical Study and Analysis of Text-to-Image Generation Using Large Language Model-Powered Textual Representation | Zhiyu Tan et.al. | 2405.12914 | null |
| 2024-05-21 | Topic Modelling Case Law Using a Large Language Model and a New Taxonomy for UK Law: AI Insights into Summary Judgment | Holli Sargeant et.al. | 2405.12910 | link |
| 2024-05-21 | Adversarial DPO: Harnessing Harmful Data for Reducing Toxicity with Minimal Impact on Coherence and Evasiveness in Dialogue Agents | San Kim et.al. | 2405.12900 | null |
| 2024-05-20 | Adapting Large Multimodal Models to Distribution Shifts: The Role of In-Context Learning | Guanglin Zhou et.al. | 2405.12217 | link |
| 2024-05-20 | MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark | Hongwei Liu et.al. | 2405.12209 | link |
| 2024-05-20 | Developers’ Perceptions on the Impact of ChatGPT in Software Development: A Survey | Thiago S. Vaillant et.al. | 2405.12195 | null |
| 2024-05-20 | CT-Eval: Benchmarking Chinese Text-to-Table Performance in Large Language Models | Haoxiang Shi et.al. | 2405.12174 | null |
| 2024-05-20 | Fennec: Fine-grained Language Model Evaluation and Correction Extended through Branching and Bridging | Xiaobo Liang et.al. | 2405.12163 | link |
| 2024-05-20 | Eliciting Problem Specifications via Large Language Models | Robert E. Wray et.al. | 2405.12147 | null |
| 2024-05-20 | DTLLM-VLT: Diverse Text Generation for Visual Language Tracking Based on LLM | Xuchen Li et.al. | 2405.12139 | null |
| 2024-05-20 | MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning | Ting Jiang et.al. | 2405.12130 | link |
| 2024-05-20 | Reindex-Then-Adapt: Improving Large Language Models for Conversational Recommendation | Zhankui He et.al. | 2405.12119 | null |
| 2024-05-20 | Imp: Highly Capable Large Multimodal Models for Mobile Devices | Zhenwei Shao et.al. | 2405.12107 | link |
| 2024-05-17 | A Survey on Large Language Models with Multilingualism: Recent Advances and New Frontiers | Kaiyu Huang et.al. | 2405.10936 | link |
| 2024-05-17 | The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks | Lucius Bushnaq et.al. | 2405.10928 | null |
| 2024-05-17 | COGNET-MD, an evaluation framework and dataset for Large Language Model benchmarks in the medical domain | Dimitrios P. Panagoulias et.al. | 2405.10893 | null |
| 2024-05-17 | Application of Artificial Intelligence in Schizophrenia Rehabilitation Management: Systematic Literature Review | Hongyi Yang et.al. | 2405.10883 | null |
| 2024-05-17 | The Future of Large Language Model Pre-training is Federated | Lorenzo Sani et.al. | 2405.10853 | null |
| 2024-05-17 | Large Language Model (LLM) for Telecommunications: A Comprehensive Survey on Principles, Key Techniques, and Opportunities | Hao Zhou et.al. | 2405.10825 | null |
| 2024-05-17 | Modeling Supply Chain Interaction and Disruption: Insights from Real-world Data and Complex Adaptive System | Jiawei Feng et.al. | 2405.10818 | null |
| 2024-05-17 | ActiveLLM: Large Language Model-based Active Learning for Textual Few-Shot Scenarios | Markus Bayer et.al. | 2405.10808 | null |
| 2024-05-17 | Empowering Small-Scale Knowledge Graphs: A Strategy of Leveraging General-Purpose Knowledge Graphs for Enriched Embeddings | Albert Sawczyn et.al. | 2405.10745 | null |
| 2024-05-17 | Efficient Multimodal Large Language Models: A Survey | Yizhang Jin et.al. | 2405.10739 | link |
| 2024-05-16 | UniRAG: Universal Retrieval Augmentation for Multi-Modal Large Language Models | Sahel Sharifymoghaddam et.al. | 2405.10311 | null |
| 2024-05-16 | 4D Panoptic Scene Graph Generation | Jingkang Yang et.al. | 2405.10305 | link |
| 2024-05-16 | HW-GPT-Bench: Hardware-Aware Architecture Benchmark for Language Models | Rhea Sanjay Sukthanker et.al. | 2405.10299 | link |
| 2024-05-16 | Timeline-based Sentence Decomposition with In-Context Learning for Temporal Fact Extraction | Jianhao Chen et.al. | 2405.10288 | null |
| 2024-05-16 | FFF: Fixing Flawed Foundations in contrastive pre-training results in very strong Vision-Language models | Adrian Bulat et.al. | 2405.10286 | null |
| 2024-05-16 | Revisiting OPRO: The Limitations of Small-Scale LLMs as Optimizers | Tuo Zhang et.al. | 2405.10276 | null |
| 2024-05-16 | Keep It Private: Unsupervised Privatization of Online Text | Calvin Bao et.al. | 2405.10260 | link |
| 2024-05-16 | When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models | Xianzheng Ma et.al. | 2405.10255 | null |
| 2024-05-16 | A Systematic Evaluation of Large Language Models for Natural Language Generation Tasks | Xuanfan Ni et.al. | 2405.10251 | null |
| 2024-05-16 | IntelliExplain: Enhancing Interactive Code Generation through Natural Language Explanations for Non-Professional Programmers | Hao Yan et.al. | 2405.10250 | null |
| 2024-05-15 | Modeling Bilingual Sentence Processing: Evaluating RNN and Transformer Architectures for Cross-Language Structural Priming | Bushi Xiao et.al. | 2405.09508 | null |
| 2024-05-15 | ParaNames 1.0: Creating an Entity Name Corpus for 400+ Languages using Wikidata | Jonne Sälevä et.al. | 2405.09496 | null |
| 2024-05-15 | Beyond Flesch-Kincaid: Prompt-based Metrics Improve Difficulty Classification of Educational Texts | Donya Rooein et.al. | 2405.09482 | null |
| 2024-05-15 | Tell Me Why: Explainable Public Health Fact-Checking with Large Language Models | Majid Zarharan et.al. | 2405.09454 | link |
| 2024-05-15 | Facilitating Opinion Diversity through Hybrid NLP Approaches | Michiel van der Meer et.al. | 2405.09439 | null |
| 2024-05-15 | MicroPython Testbed for Federated Learning Algorithms | Miroslav Popovic et.al. | 2405.09423 | null |
| 2024-05-15 | Matching domain experts by training from scratch on domain knowledge | Xiaoliang Luo et.al. | 2405.09395 | null |
| 2024-05-15 | PolygloToxicityPrompts: Multilingual Evaluation of Neural Toxic Degeneration in Large Language Models | Devansh Jain et.al. | 2405.09373 | null |
| 2024-05-15 | Large Language Model Bias Mitigation from the Perspective of Knowledge Editing | Ruizhe Chen et.al. | 2405.09341 | null |
| 2024-05-15 | Prompting-based Synthetic Data Generation for Few-Shot Question Answering | Maximilian Schmidt et.al. | 2405.09335 | null |
| 2024-05-14 | Towards Enhanced RAC Accessibility: Leveraging Datasets and LLMs | Edison Jair Bejarano Sepulveda et.al. | 2405.08792 | null |
| 2024-05-14 | Incorporating Clinical Guidelines through Adapting Multi-modal Large Language Model for Prostate Cancer PI-RADS Scoring | Tiantian Zhang et.al. | 2405.08786 | null |
| 2024-05-14 | Is the Pope Catholic? Yes, the Pope is Catholic. Generative Evaluation of Intent Resolution in LLMs | Akhila Yerukola et.al. | 2405.08760 | link |
| 2024-05-14 | Distributed Threat Intelligence at the Edge Devices: A Large Language Model-Driven Approach | Syed Mhamudul Hasan et.al. | 2405.08755 | null |
| 2024-05-14 | Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding | Zhimin Li et.al. | 2405.08748 | link |
| 2024-05-14 | ALMol: Aligned Language-Molecule Translation LLMs through Offline Preference Contrastive Optimisation | Dimitris Gkoumas et.al. | 2405.08619 | null |
| 2024-05-14 | A Comprehensive Survey of Large Language Models and Multimodal Large Language Models in Medicine | Hanguang Xiao et.al. | 2405.08603 | null |
| 2024-05-14 | EVDA: Evolving Deepfake Audio Detection Continual Learning Benchmark | Xiaohui Zhang et.al. | 2405.08596 | null |
| 2024-05-14 | Falcon 7b for Software Mention Detection in Scholarly Documents | AmeerAli Khan et.al. | 2405.08514 | null |
| 2024-05-14 | Archimedes-AUEB at SemEval-2024 Task 5: LLM explains Civil Procedure | Odysseas S. Chlapanis et.al. | 2405.08502 | null |
| 2024-05-13 | Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots | Chengyue Wu et.al. | 2405.07990 | null |
| 2024-05-13 | A Generalist Learner for Multifaceted Medical Image Interpretation | Hong-Yu Zhou et.al. | 2405.07988 | null |
| 2024-05-13 | PyZoBot: A Platform for Conversational Information Extraction and Synthesis from Curated Zotero Reference Libraries through Advanced Retrieval-Augmented Generation | Suad Alshammari et.al. | 2405.07963 | null |
| 2024-05-13 | AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments | Samuel Schmidgall et.al. | 2405.07960 | null |
| 2024-05-13 | EconLogicQA: A Question-Answering Benchmark for Evaluating Large Language Models in Economic Sequential Reasoning | Yinzhu Quan et.al. | 2405.07938 | null |
| 2024-05-13 | PARDEN, Can You Repeat That? Defending against Jailbreaks via Repetition | Ziyang Zhang et.al. | 2405.07932 | link |
| 2024-05-13 | Can Better Text Semantics in Prompt Tuning Improve VLM Generalization? | Hari Chandana Kuchibhotla et.al. | 2405.07921 | null |
| 2024-05-13 | A Systematic Investigation of Distilling Large Language Models into Cross-Encoders for Passage Re-ranking | Ferdinand Schlatt et.al. | 2405.07920 | null |
| 2024-05-13 | Russian-Language Multimodal Dataset for Automatic Summarization of Scientific Papers | Alena Tsanda et.al. | 2405.07886 | null |
| 2024-05-13 | Reproducing the Metric-Based Evaluation of a Set of Controllable Text Generation Techniques | Michela Lorandi et.al. | 2405.07875 | null |
| 2024-05-10 | Linearizing Large Language Models | Jean Mercat et.al. | 2405.06640 | link |
| 2024-05-10 | Value Augmented Sampling for Language Model Alignment and Personalization | Seungwook Han et.al. | 2405.06639 | link |
| 2024-05-10 | Federated Document Visual Question Answering: A Pilot Study | Khanh Nguyen et.al. | 2405.06636 | null |
| 2024-05-10 | Characterizing the Accuracy - Efficiency Trade-off of Low-rank Decomposition in Language Models | Chakshu Moar et.al. | 2405.06626 | null |
| 2024-05-10 | What Can Natural Language Processing Do for Peer Review? | Ilia Kuznetsov et.al. | 2405.06563 | null |
| 2024-05-10 | Mitigating Hallucinations in Large Language Models via Self-Refinement-Enhanced Knowledge Retrieval | Mengjia Niu et.al. | 2405.06545 | null |
| 2024-05-10 | Prompting Large Language Models with Knowledge Graphs for Question Answering Involving Long-tail Facts | Wenyu Huang et.al. | 2405.06524 | null |
| 2024-05-10 | UniDM: A Unified Framework for Data Manipulation with Large Language Models | Yichen Qian et.al. | 2405.06510 | null |
| 2024-05-10 | Aspect-based Sentiment Evaluation of Chess Moves (ASSESS): an NLP-based Method for Evaluating Chess Strategies from Textbooks | Haifa Alrdahi et.al. | 2405.06499 | null |
| 2024-05-10 | Storypark: Leveraging Large Language Models to Enhance Children Story Learning Through Child-AI collaboration Storytelling | Lyumanshan Ye et.al. | 2405.06495 | null |
| 2024-05-09 | Natural Language Processing RELIES on Linguistics | Juri Opitz et.al. | 2405.05966 | null |
| 2024-05-09 | OpenBA-V2: Reaching 77.3% High Compression Ratio with Fast Multi-Stage Pruning | Dan Qiao et.al. | 2405.05957 | link |
| 2024-05-09 | Probing Multimodal LLMs as World Models for Driving | Shiva Sreeram et.al. | 2405.05956 | link |
| 2024-05-09 | Smurfs: Leveraging Multiple Proficiency Agents with Context-Efficiency for Tool Planning | Junzhi Chen et.al. | 2405.05955 | null |
| 2024-05-09 | CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts | Jiachen Li et.al. | 2405.05949 | link |
| 2024-05-09 | Trustworthy AI-Generative Content in Intelligent 6G Network: Adversarial, Privacy, and Fairness | Siyuan Li et.al. | 2405.05930 | null |
| 2024-05-09 | Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations? | Zorik Gekhman et.al. | 2405.05904 | null |
| 2024-05-09 | Co-driver: VLM-based Autonomous Driving Assistant with Human-like Behavior and Understanding for Complex Road Scenes | Ziang Guo et.al. | 2405.05885 | null |
| 2024-05-09 | FlockGPT: Guiding UAV Flocking with Linguistic Orchestration | Artem Lykov et.al. | 2405.05872 | null |
| 2024-05-09 | Robots Can Feel: LLM-based Framework for Robot Ethical Reasoning | Artem Lykov et.al. | 2405.05824 | link |
| 2024-05-08 | You Only Cache Once: Decoder-Decoder Architectures for Language Models | Yutao Sun et.al. | 2405.05254 | null |
| 2024-05-08 | Open Source Language Models Can Provide Feedback: Evaluating LLMs’ Ability to Help Students Using GPT-4-As-A-Judge | Charles Koutcheme et.al. | 2405.05253 | link |
| 2024-05-09 | LLMs with Personalities in Multi-issue Negotiation Games | Sean Noh et.al. | 2405.05248 | null |
| 2024-05-08 | SuFIA: Language-Guided Augmented Dexterity for Robotic Surgical Assistants | Masoud Moghani et.al. | 2405.05226 | null |
| 2024-05-08 | Conv-Basis: A New Paradigm for Efficient Attention Inference and Gradient Computation in Transformers | Jiuxiang Gu et.al. | 2405.05219 | null |
| 2024-05-08 | MIDGARD: Self-Consistency Using Minimum Description Length for Structured Commonsense Reasoning | Inderjeet Nair et.al. | 2405.05189 | null |
| 2024-05-08 | Air Gap: Protecting Privacy-Conscious Conversational Agents | Eugene Bagdasaryan et.al. | 2405.05175 | null |
| 2024-05-08 | XAMPLER: Learning to Retrieve Cross-Lingual In-Context Examples | Peiqin Lin et.al. | 2405.05116 | null |
| 2024-05-08 | QFMTS: Generating Query-Focused Summaries over Multi-Table Inputs | Weijia Zhang et.al. | 2405.05109 | null |
| 2024-05-08 | Concerns on Bias in Large Language Models when Creating Synthetic Personae | Helena A. Haxvig et.al. | 2405.05080 | null |
| 2024-05-07 | ChatHuman: Language-driven 3D Human Understanding with Retrieval-Augmented Tool Reasoning | Jing Lin et.al. | 2405.04533 | null |
| 2024-05-07 | QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving | Yujun Lin et.al. | 2405.04532 | link |
| 2024-05-07 | NaturalCodeBench: Examining Coding Performance Mismatch on HumanEval and Natural User Prompts | Shudan Zhang et.al. | 2405.04520 | null |
| 2024-05-07 | xLSTM: Extended Long Short-Term Memory | Maximilian Beck et.al. | 2405.04517 | null |
| 2024-05-07 | A Transformer with Stack Attention | Jiaoda Li et.al. | 2405.04515 | link |
| 2024-05-08 | Unveiling Disparities in Web Task Handling Between Human and Web Agent | Kihoon Son et.al. | 2405.04497 | null |
| 2024-05-07 | Toward In-Context Teaching: Adapting Examples to Students’ Misconceptions | Alexis Ross et.al. | 2405.04495 | null |
| 2024-05-07 | The Silicone Ceiling: Auditing GPT’s Race and Gender Biases in Hiring | Lena Armstrong et.al. | 2405.04412 | null |
| 2024-05-07 | Learning To See But Forgetting To Follow: Visual Instruction Tuning Makes LLMs More Prone To Jailbreak Attacks | Georgios Pantazopoulos et.al. | 2405.04403 | link |
| 2024-05-07 | Large Language Models Cannot Explain Themselves | Advait Sarkar et.al. | 2405.04382 | null |
| 2024-05-06 | Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs | Muhammad Uzair Khattak et.al. | 2405.03690 | null |
| 2024-05-06 | Large Language Models Reveal Information Operation Goals, Tactics, and Narrative Frames | Keith Burghardt et.al. | 2405.03688 | null |
| 2024-05-06 | Language-Image Models with 3D Understanding | Jang Hyun Cho et.al. | 2405.03685 | null |
| 2024-05-06 | AtomGPT: Atomistic Generative Pre-trained Transformer for Forward and Inverse Materials Design | Kamal Choudhary et.al. | 2405.03680 | null |
| 2024-05-06 | A New Robust Partial $p$-Wasserstein-Based Metric for Comparing Distributions | Sharath Raghvendra et.al. | 2405.03664 | null |
| 2024-05-06 | When LLMs Meet Cybersecurity: A Systematic Literature Review | Jie Zhang et.al. | 2405.03644 | null |
| 2024-05-06 | A Controlled Experiment on the Energy Efficiency of the Source Code Generated by Code Llama | Vlad-Andrei Cursaru et.al. | 2405.03616 | null |
| 2024-05-06 | Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment | Abhinav Agarwalla et.al. | 2405.03594 | null |
| 2024-05-06 | AlphaMath Almost Zero: process Supervision without process | Guoxin Chen et.al. | 2405.03553 | null |
| 2024-05-06 | MAmmoTH2: Scaling Instructions from the Web | Xiang Yue et.al. | 2405.03548 | null |
| 2024-05-03 | Leveraging Large Language Models to Enhance Domain Expert Inclusion in Data Science Workflows | Jasmine Y. Shih et.al. | 2405.02260 | null |
| 2024-05-03 | What matters when building vision-language models? | Hugo Laurençon et.al. | 2405.02246 | null |
| 2024-05-03 | REASONS: A benchmark for REtrieval and Automated citationS Of scieNtific Sentences using Public and Proprietary LLMs | Deepa Tilwani et.al. | 2405.02228 | null |
| 2024-05-03 | Fair Risk Control: A Generalized Framework for Calibrating Multi-group Fairness Risks | Lujing Zhang et.al. | 2405.02225 | null |
| 2024-05-03 | FairEvalLLM. A Comprehensive Framework for Benchmarking Fairness in Large Language Model Recommender Systems | Yashar Deldjoo et.al. | 2405.02219 | null |
| 2024-05-03 | Automatic Programming: Large Language Models and Beyond | Michael R. Lyu et.al. | 2405.02213 | null |
| 2024-05-03 | Assessing and Verifying Task Utility in LLM-Powered Applications | Negar Arabzadeh et.al. | 2405.02178 | null |
| 2024-05-03 | The AI Review Lottery: Widespread AI-Assisted Peer Reviews Boost Paper Scores and Acceptance Rates | Giuseppe Russo Latona et.al. | 2405.02150 | null |
| 2024-05-03 | MedReadMe: A Systematic Study for Fine-grained Sentence Readability in Medical Domain | Chao Jiang et.al. | 2405.02144 | null |
| 2024-05-03 | Optimising Calls to Large Language Models with Uncertainty-Based Two-Tier Selection | Guillem Ramírez et.al. | 2405.02134 | null |
| 2024-05-02 | Plan-Seq-Learn: Language Model Guided RL for Solving Long Horizon Robotics Tasks | Murtaza Dalal et.al. | 2405.01534 | null |
| 2024-05-02 | OmniDrive: A Holistic LLM-Agent Framework for Autonomous Driving with 3D Perception, Reasoning and Planning | Shihao Wang et.al. | 2405.01533 | link |
| 2024-05-02 | FLAME: Factuality-Aware Alignment for Large Language Models | Sheng-Chieh Lin et.al. | 2405.01525 | null |
| 2024-05-02 | Transformer-Aided Semantic Communications | Matin Mortaheb et.al. | 2405.01521 | null |
| 2024-05-02 | Analyzing the Role of Semantic Representations in the Era of Large Language Models | Zhijing Jin et.al. | 2405.01502 | link |
| 2024-05-02 | Supporting Business Document Workflows via Collection-Centric Information Foraging with Large Language Models | Raymond Fok et.al. | 2405.01501 | null |
| 2024-05-02 | Controllable Text Generation in the Instruction-Tuning Era | Dhananjay Ashok et.al. | 2405.01490 | null |
| 2024-05-02 | NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment | Gerald Shen et.al. | 2405.01481 | link |
| 2024-05-02 | V-FLUTE: Visual Figurative Language Understanding with Textual Explanations | Arkadiy Saakyan et.al. | 2405.01474 | link |
| 2024-05-02 | Advancing human-centric AI for robust X-ray analysis through holistic self-supervised learning | Théo Moutakanni et.al. | 2405.01469 | null |
| 2024-05-01 | Is Bigger Edit Batch Size Always Better? – An Empirical Study on Model Editing with Llama-3 | Junsang Yoon et.al. | 2405.00664 | null |
| 2024-05-01 | HalluVault: A Novel Logic Programming-aided Metamorphic Testing Framework for Detecting Fact-Conflicting Hallucinations in Large Language Models | Ningke Li et.al. | 2405.00648 | null |
| 2024-05-01 | When Quantization Affects Confidence of Large Language Models? | Irina Proskurina et.al. | 2405.00632 | link |
| 2024-05-01 | “I’m Not Sure, But…”: Examining the Impact of Large Language Models’ Uncertainty Expression on User Reliance and Trust | Sunnie S. Y. Kim et.al. | 2405.00623 | null |
| 2024-05-01 | Addressing Topic Granularity and Hallucination in Large Language Models for Topic Modelling | Yida Mu et.al. | 2405.00611 | null |
| 2024-05-01 | Investigating Automatic Scoring and Feedback using Large Language Models | Gloria Ashiya Katuka et.al. | 2405.00602 | null |
| 2024-05-01 | Are Models Biased on Text without Gender-related Language? | Catarina G Belém et.al. | 2405.00588 | link |
| 2024-05-01 | The Real, the Better: Aligning Large Language Models with Online Human Behaviors | Guanying Jiang et.al. | 2405.00578 | null |
| 2024-05-01 | EALD-MLLM: Emotion Analysis in Long-sequential and De-identity videos with Multi-modal Large Language Model | Deng Li et.al. | 2405.00574 | null |
| 2024-05-01 | Spherical Linear Interpolation and Text-Anchoring for Zero-shot Composed Image Retrieval | Young Kyun Jang et.al. | 2405.00571 | null |
| 2024-04-30 | DOCCI: Descriptions of Connected and Contrasting Images | Yasumasa Onoe et.al. | 2404.19753 | null |
| 2024-04-30 | Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation | Yunhao Ge et.al. | 2404.19752 | null |
| 2024-04-30 | PrivComp-KG: Leveraging Knowledge Graph and Large Language Models for Privacy Policy Compliance Verification | Leon Garza et.al. | 2404.19744 | null |
| 2024-04-30 | Better & Faster Large Language Models via Multi-token Prediction | Fabian Gloeckle et.al. | 2404.19737 | null |
| 2024-04-30 | A Framework for Leveraging Human Computation Gaming to Enhance Knowledge Graphs for Accuracy Critical Generative AI Applications | Steph Buongiorno et.al. | 2404.19729 | null |
| 2024-04-30 | PANGeA: Procedural Artificial Narrative using Generative AI for Turn-Based Video Games | Steph Buongiorno et.al. | 2404.19721 | null |
| 2024-04-30 | Assessing LLMs in Malicious Code Deobfuscation of Real-world Malware Campaigns | Constantinos Patsakis et.al. | 2404.19715 | null |
| 2024-04-30 | Automated Generation of High-Quality Medical Simulation Scenarios Through Integration of Semi-Structured Data and Large Language Models | Scott Sumpter et.al. | 2404.19713 | null |
| 2024-04-30 | When to Retrieve: Teaching LLMs to Utilize Information Retrieval Effectively | Tiziano Labruna et.al. | 2404.19705 | link |
| 2024-04-30 | Naturally Supervised 3D Visual Grounding with Language-Regularized Concept Learners | Chun Feng et.al. | 2404.19696 | null |
| 2024-04-29 | Hallucination of Multimodal Large Language Models: A Survey | Zechen Bai et.al. | 2404.18930 | link |
| 2024-04-29 | DPO Meets PPO: Reinforced Token Optimization for RLHF | Han Zhong et.al. | 2404.18922 | link |
| 2024-04-29 | TheaterGen: Character Management with LLM for Consistent Multi-turn Image Generation | Junhao Cheng et.al. | 2404.18919 | null |
| 2024-04-29 | Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting | Fangcheng Liu et.al. | 2404.18911 | link |
| 2024-04-29 | Human-in-the-Loop Synthetic Text Data Inspection with Provenance Tracking | Hong Jin Kang et.al. | 2404.18881 | link |
| 2024-04-29 | More RLHF, More Trust? On The Impact of Human Preference Alignment On Language Model Trustworthiness | Aaron J. Li et.al. | 2404.18870 | link |
| 2024-04-29 | Truth-value judgment in language models: belief directions are context sensitive | Stefan F. Schouten et.al. | 2404.18865 | null |
| 2024-04-29 | Performance-Aligned LLMs for Generating Fast Code | Daniel Nichols et.al. | 2404.18864 | null |
| 2024-04-29 | VERT: Verified Equivalent Rust Transpilation with Few-Shot Learning | Aidan Z. H. Yang et.al. | 2404.18852 | null |
| 2024-04-29 | It’s Difficult to be Neutral – Human and LLM-based Sentiment Annotation of Patient Comments | Petter Mæhlum et.al. | 2404.18832 | null |
| 2024-04-26 | Probabilistic Inference in Language Models via Twisted Sequential Monte Carlo | Stephen Zhao et.al. | 2404.17546 | link |
| 2024-04-26 | Large Language Model Agent as a Mechanical Designer | Yayati Jadhav et.al. | 2404.17525 | null |
| 2024-04-26 | On the Use of Large Language Models to Generate Capability Ontologies | Luis Miguel Vieira da Silva et.al. | 2404.17524 | null |
| 2024-04-26 | Enhancing Legal Compliance and Regulation Analysis with Large Language Models | Shabnam Hassani et.al. | 2404.17522 | null |
| 2024-04-26 | A Comprehensive Evaluation on Event Reasoning of Large Language Models | Zhengwei Tao et.al. | 2404.17513 | link |
| 2024-04-26 | Learning text-to-video retrieval from image captioning | Lucas Ventura et.al. | 2404.17498 | null |
| 2024-04-26 | CEval: A Benchmark for Evaluating Counterfactual Text Generation | Van Bach Nguyen et.al. | 2404.17475 | link |
| 2024-04-26 | Ruffle&Riley: Insights from Designing and Evaluating a Large Language Model-Based Conversational Tutoring System | Robin Schmucker et.al. | 2404.17460 | null |
| 2024-04-26 | “ChatGPT Is Here to Help, Not to Replace Anybody” – An Evaluation of Students’ Opinions On Integrating ChatGPT In CS Courses | Bruno Pereira Cipriano et.al. | 2404.17443 | null |
| 2024-04-26 | InspectorRAGet: An Introspection Platform for RAG Evaluation | Kshitij Fadnis et.al. | 2404.17347 | link |
| 2024-04-25 | Make-it-Real: Unleashing Large Multimodal Model’s Ability for Painting 3D Objects with Realistic Materials | Ye Fang et.al. | 2404.16829 | null |
| 2024-04-25 | How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites | Zhe Chen et.al. | 2404.16821 | link |
| 2024-04-25 | IndicGenBench: A Multilingual Benchmark to Evaluate Generation Capabilities of LLMs on Indic Languages | Harman Singh et.al. | 2404.16816 | link |
| 2024-04-25 | Make Your LLM Fully Utilize the Context | Shengnan An et.al. | 2404.16811 | link |
| 2024-04-25 | Improving Diversity of Commonsense Generation by Large Language Models via In-Context Learning | Tianhui Zhang et.al. | 2404.16807 | null |
| 2024-04-25 | Weak-to-Strong Extrapolation Expedites Alignment | Chujie Zheng et.al. | 2404.16792 | link |
| 2024-04-25 | SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension | Bohao Li et.al. | 2404.16790 | link |
| 2024-04-25 | Continual Learning of Large Language Models: A Comprehensive Survey | Haizhou Shi et.al. | 2404.16789 | link |
| 2024-04-25 | Prefix Text as a Yarn: Eliciting Non-English Alignment in Foundation Language Model | Runzhe Zhan et.al. | 2404.16766 | null |
| 2024-04-25 | RadGenome-Chest CT: A Grounded Vision-Language Dataset for Chest CT Analysis | Xiaoman Zhang et.al. | 2404.16754 | null |
| 2024-04-24 | Hybrid LLM/Rule-based Approaches to Business Insights Generation from Structured Data | Aliaksei Vertsel et.al. | 2404.15604 | null |
| 2024-04-24 | ImplicitAVE: An Open-Source Dataset and Multimodal LLMs Benchmark for Implicit Attribute Value Extraction | Henry Peng Zou et.al. | 2404.15592 | link |
| 2024-04-24 | Can Foundational Large Language Models Assist with Conducting Pharmaceuticals Manufacturing Investigations? | Hossein Salami et.al. | 2404.15578 | null |
| 2024-04-23 | PRISM: Patient Records Interpretation for Semantic Clinical Trial Matching using Large Language Models | Shashi Kant Gupta et.al. | 2404.15549 | null |
| 2024-04-23 | Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models | Mihir Parmar et.al. | 2404.15522 | link |
| 2024-04-23 | Visual Delta Generator with Large Multi-modal Models for Semi-supervised Composed Image Retrieval | Young Kyun Jang et.al. | 2404.15516 | null |
| 2024-04-23 | ToM-LM: Delegating Theory Of Mind Reasoning to External Symbolic Executors in Large Language Models | Weizhi Tang et.al. | 2404.15515 | null |
| 2024-04-23 | GeoLLM-Engine: A Realistic Environment for Building Geospatial Copilots | Simranjit Singh et.al. | 2404.15500 | null |
| 2024-04-23 | IryoNLP at MEDIQA-CORR 2024: Tackling the Medical Error Detection & Correction Task On the Shoulders of Medical Agents | Jean-Philippe Corbeil et.al. | 2404.15488 | link |
| 2024-04-23 | Large Language Models Spot Phishing Emails with Surprising Accuracy: A Comparative Analysis of Performance | Het Patel et.al. | 2404.15485 | null |
| 2024-04-23 | Aligning LLM Agents by Learning Latent Preference from User Edits | Ge Gao et.al. | 2404.15269 | link |
| 2024-04-23 | XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts | Yifeng Ding et.al. | 2404.15247 | link |
| 2024-04-23 | Revisiting Unnaturalness for Automated Program Repair in the Era of Large Language Models | Aidan Z. H. Yang et.al. | 2404.15236 | null |
| 2024-04-23 | Re-Thinking Inverse Graphics With Large Language Models | Peter Kulits et.al. | 2404.15228 | null |
| 2024-04-23 | Setting up the Data Printer with Improved English to Ukrainian Machine Translation | Yurii Paniv et.al. | 2404.15196 | link |
| 2024-04-23 | Regressive Side Effects of Training Language Models to Mimic Student Misconceptions | Shashank Sonkar et.al. | 2404.15156 | null |
| 2024-04-23 | Bias patterns in the application of LLMs for clinical decision support: A comprehensive study | Raphael Poulain et.al. | 2404.15149 | null |
| 2024-04-23 | Rethinking LLM Memorization through the Lens of Adversarial Compression | Avi Schwarzschild et.al. | 2404.15146 | null |
| 2024-04-23 | MedDr: Diagnosis-Guided Bootstrapping for Large-Scale Medical Vision-Language Learning | Sunan He et.al. | 2404.15127 | link |
| 2024-04-23 | Multimodal Large Language Model is a Human-Aligned Annotator for Text-to-Image Generation | Xun Wu et.al. | 2404.15100 | null |
| 2024-04-22 | AutoAD III: The Prequel – Back to the Pixels | Tengda Han et.al. | 2404.14412 | null |
| 2024-04-22 | SpaceByte: Towards Deleting Tokenization from Large Language Modeling | Kevin Slagle et.al. | 2404.14408 | link |
| 2024-04-22 | RTP-LX: Can LLMs Evaluate Toxicity in Multilingual Scenarios? | Adrian de Wynter et.al. | 2404.14397 | link |
| 2024-04-22 | A Survey on Self-Evolution of Large Language Models | Zhengwei Tao et.al. | 2404.14387 | null |
| 2024-04-22 | Beyond Scaling: Predicting Patent Approval with Domain-specific Fine-grained Claim Dependency Graph | Xiaochen Kev Gao et.al. | 2404.14372 | link |
| 2024-04-22 | Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data | Fahim Tajwar et.al. | 2404.14367 | link |
| 2024-04-22 | Better Synthetic Data by Retrieving and Transforming Existing Datasets | Saumya Gandhi et.al. | 2404.14361 | link |
| 2024-04-22 | Rethinking Legal Compliance Automation: Opportunities with Large Language Models | Shabnam Hassani et.al. | 2404.14356 | null |
| 2024-04-22 | Automated Long Answer Grading with RiceChem Dataset | Shashank Sonkar et.al. | 2404.14316 | null |
| 2024-04-22 | Explaining Arguments’ Strength: Unveiling the Role of Attacks and Supports (Technical Report) | Xiang Yin et.al. | 2404.14304 | null |
| 2024-04-19 | MoVA: Adapting Mixture of Vision Experts to Multimodal Context | Zhuofan Zong et.al. | 2404.13046 | link |
| 2024-04-19 | Unified Scene Representation and Reconstruction for 3D Large Language Models | Tao Chu et.al. | 2404.13044 | null |
| 2024-04-19 | Data Alignment for Zero-Shot Concept Generation in Dermatology AI | Soham Gadgil et.al. | 2404.13043 | null |
| 2024-04-19 | LaPA: Latent Prompt Assist Model For Medical Visual Question Answering | Tiancheng Gu et.al. | 2404.13039 | link |
| 2024-04-19 | Sample Design Engineering: An Empirical Study of What Makes Good Downstream Fine-Tuning Samples for LLMs | Biyang Guo et.al. | 2404.13033 | link |
| 2024-04-19 | When Life gives you LLMs, make LLM-ADE: Large Language Models with Adaptive Data Engineering | Stephen Choi et.al. | 2404.13028 | null |
| 2024-04-19 | Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models | Chuofan Ma et.al. | 2404.13013 | link |
| 2024-04-19 | Rethinking the Evaluation of Dialogue Systems: Effects of User Feedback on Crowdworkers and LLMs | Clemencia Siro et.al. | 2404.12994 | link |
| 2024-04-19 | RedactBuster: Entity Type Recognition from Redacted Documents | Mirco Beltrame et.al. | 2404.12991 | null |
| 2024-04-19 | FineRec: Exploring Fine-grained Sequential Recommendation | Xiaokun Zhang et.al. | 2404.12975 | null |
| 2024-04-18 | BLINK: Multimodal Large Language Models Can See but Not Perceive | Xingyu Fu et.al. | 2404.12390 | null |
| 2024-04-18 | MedThink: Explaining Medical Visual Question Answering via Multimodal Decision-Making Rationale | Xiaotang Gai et.al. | 2404.12372 | null |
| 2024-04-18 | When LLMs are Unfit Use FastFit: Fast and Effective Text Classification with Many Classes | Asaf Yehudai et.al. | 2404.12365 | link |
| 2024-04-18 | Towards a Foundation Model for Partial Differential Equation: Multi-Operator Learning and Extrapolation | Jingmin Sun et.al. | 2404.12355 | link |
| 2024-04-18 | V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning | Hang Hua et.al. | 2404.12353 | null |
| 2024-04-18 | Large Language Models in Targeted Sentiment Analysis | Nicolay Rusnachenko et.al. | 2404.12342 | link |
| 2024-04-18 | Normative Requirements Operationalization with Large Language Models | Nick Feng et.al. | 2404.12335 | null |
| 2024-04-18 | Large Language Models for Synthetic Participatory Planning of Shared Automated Electric Mobility Systems | Jiangbo Yu et.al. | 2404.12317 | null |
| 2024-04-18 | Simultaneous Interpretation Corpus Construction by Large Language Models in Distant Language Pair | Yusuke Sakai et.al. | 2404.12299 | null |
| 2024-04-18 | Augmenting emotion features in irony detection with Large language modeling | Yucheng Lin et.al. | 2404.12291 | null |
| 2024-04-17 | A Deep Dive into Large Language Models for Automated Bug Localization and Repair | Soneya Binta Hossain et.al. | 2404.11595 | null |
| 2024-04-17 | Related Work and Citation Text Generation: A Survey | Xiangci Li et.al. | 2404.11588 | null |
| 2024-04-17 | LLMTune: Accelerate Database Knob Tuning with Large Language Models | Xinmei Huang et.al. | 2404.11581 | null |
| 2024-04-17 | MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation | Kuan-Chieh et.al. | 2404.11565 | null |
| 2024-04-17 | Quantifying Multilingual Performance of Large Language Models Across Languages | Zihao Li et.al. | 2404.11553 | link |
| 2024-04-17 | Evaluating Span Extraction in Generative Paradigm: A Reflection on Aspect-Based Sentiment Analysis | Soyoung Yang et.al. | 2404.11539 | null |
| 2024-04-17 | Pack of LLMs: Model Fusion at Test-Time via Perplexity Optimization | Costas Mavromatis et.al. | 2404.11531 | null |
| 2024-04-17 | Embedding Privacy in Computational Social Science and Artificial Intelligence Research | Keenan Jones et.al. | 2404.11515 | null |
| 2024-04-17 | Towards Coarse-to-Fine Evaluation of Inference Efficiency for Large Language Models | Yushuo Chen et.al. | 2404.11502 | link |
| 2024-04-17 | Paraphrase and Solve: Exploring and Exploiting the Impact of Surface Form on Mathematical Reasoning in Large Language Models | Yue Zhou et.al. | 2404.11500 | link |
| 2024-04-16 | Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback | Qiwei Di et.al. | 2404.10776 | null |
| 2024-04-16 | LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation? | Yuchi Wang et.al. | 2404.10763 | link |
| 2024-04-16 | Deep Learning and LLM-based Methods Applied to Stellar Lightcurve Classification | Yu-Yang Li et.al. | 2404.10757 | null |
| 2024-04-16 | Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study | Shusheng Xu et.al. | 2404.10719 | null |
| 2024-04-16 | An empirical study on code review activity prediction in practice | Doriane Olewicki et.al. | 2404.10703 | null |
| 2024-04-16 | Automating REST API Postman Test Cases Using LLM | S Deepika Sri et.al. | 2404.10678 | null |
| 2024-04-16 | ViTextVQA: A Large-Scale Visual Question Answering Dataset for Evaluating Vietnamese Text Comprehension in Images | Quan Van Nguyen et.al. | 2404.10652 | link |
| 2024-04-16 | Self-playing Adversarial Language Game Enhances LLM Reasoning | Pengyu Cheng et.al. | 2404.10642 | link |
| 2024-04-16 | HLAT: High-quality Large Language Model Pre-trained on AWS Trainium | Haozheng Fan et.al. | 2404.10630 | null |
| 2024-04-16 | Private Attribute Inference from Images with Vision-Language Models | Batuhan Tömekçe et.al. | 2404.10618 | null |
| 2024-04-15 | Personalized Collaborative Fine-Tuning for On-Device Large Language Models | Nicolas Wagner et.al. | 2404.09753 | null |
| 2024-04-15 | Quantization of Large Language Models with an Overdetermined Basis | Daniil Merkulov et.al. | 2404.09737 | null |
| 2024-04-15 | Unveiling Imitation Learning: Exploring the Impact of Data Falsity to Large Language Model | Hyunsoo Cho et.al. | 2404.09717 | null |
| 2024-04-15 | Enhancing Robot Explanation Capabilities through Vision-Language Models: a Preliminary Study by Interpreting Visual Inputs for Improved Human-Robot Interaction | David Sobrín-Hidalgo et.al. | 2404.09705 | null |
| 2024-04-15 | Generative AI for Game Theory-based Mobile Networking | Long He et.al. | 2404.09699 | null |
| 2024-04-15 | Are Large Language Models Reliable Argument Quality Annotators? | Nailia Mirzakhmedova et.al. | 2404.09696 | null |
| 2024-04-15 | LoRAP: Transformer Sub-Layers Deserve Differentiated Structured Compression for Large Language Models | Guangyan Li et.al. | 2404.09695 | null |
| 2024-04-15 | Multi-News+: Cost-efficient Dataset Cleansing via LLM-based Data Annotation | Juhwan Choi et.al. | 2404.09682 | link |
| 2024-04-15 | Do LLMs Understand Visual Anomalies? Uncovering LLM Capabilities in Zero-shot Anomaly Detection | Jiaqi Zhu et.al. | 2404.09654 | null |
| 2024-04-15 | Bridging Vision and Language Spaces with Assignment Prediction | Jungin Park et.al. | 2404.09632 | link |
| 2024-04-12 | Enhancing Visual Question Answering through Question-Driven Image Captions as Prompts | Övgü Özdemir et.al. | 2404.08589 | link |
| 2024-04-12 | Enhancing Autonomous Vehicle Training with Language Model Integration and Critical Scenario Generation | Hanlin Tian et.al. | 2404.08570 | null |
| 2024-04-12 | RLHF Deciphered: A Critical Analysis of Reinforcement Learning from Human Feedback for LLMs | Shreyas Chaudhari et.al. | 2404.08555 | null |
| 2024-04-12 | Online Safety Analysis for LLMs: a Benchmark, an Assessment, and a Path Forward | Xuan Xie et.al. | 2404.08517 | null |
| 2024-04-12 | Efficient Interactive LLM Serving with Proxy Model-based Sequence Length Prediction | Haoran Qiu et.al. | 2404.08509 | link |
| 2024-04-12 | LaSagnA: Language-based Segmentation Assistant for Complex Queries | Cong Wei et.al. | 2404.08506 | link |
| 2024-04-12 | Strategic Interactions between Large Language Models-based Agents in Beauty Contests | Siting Lu et.al. | 2404.08492 | null |
| 2024-04-12 | Thematic Analysis with Large Language Models: does it work with languages other than English? A targeted test in Italian | Stefano De Paoli et.al. | 2404.08488 | null |
| 2024-04-12 | Comparing Apples to Oranges: LLM-powered Multimodal Intention Prediction in an Object Categorization Task | Hassan Ali et.al. | 2404.08424 | null |
| 2024-04-12 | AdapterSwap: Continuous Training of LLMs with Data Removal and Access-Control Guarantees | William Fleshman et.al. | 2404.08417 | null |
| 2024-04-11 | OpenBias: Open-set Bias Detection in Text-to-Image Generative Models | Moreno D’Incà et.al. | 2404.07990 | link |
| 2024-04-11 | View Selection for 3D Captioning via Diffusion Ranking | Tiange Luo et.al. | 2404.07984 | null |
| 2024-04-11 | Manipulating Large Language Models to Increase Product Visibility | Aounon Kumar et.al. | 2404.07981 | link |
| 2024-04-11 | LLoCO: Learning Long Contexts Offline | Sijun Tan et.al. | 2404.07979 | link |
| 2024-04-11 | Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models | Haotian Zhang et.al. | 2404.07973 | null |
| 2024-04-11 | Leveraging Large Language Models (LLMs) to Support Collaborative Human-AI Online Risk Data Annotation | Jinkyung Park et.al. | 2404.07926 | null |
| 2024-04-11 | LaVy: Vietnamese Multimodal Large Language Model | Chi Tran et.al. | 2404.07922 | link |
| 2024-04-11 | AmpleGCG: Learning a Universal and Transferable Generative Model of Adversarial Suffixes for Jailbreaking Both Open and Closed LLMs | Zeyi Liao et.al. | 2404.07921 | link |
| 2024-04-11 | DesignQA: A Multimodal Benchmark for Evaluating Large Language Models’ Understanding of Engineering Documentation | Anna C. Doris et.al. | 2404.07917 | link |
| 2024-04-11 | High-Dimension Human Value Representation in Large Language Models | Samuel Cahyawijaya et.al. | 2404.07900 | link |
| 2024-04-10 | UMBRAE: Unified Multimodal Decoding of Brain Signals | Weihao Xia et.al. | 2404.07202 | null |
| 2024-04-10 | Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention | Tsendsuren Munkhdalai et.al. | 2404.07143 | link |
| 2024-04-11 | Semantically-correlated memories in a dense associative model | Thomas F Burns et.al. | 2404.07123 | null |
| 2024-04-10 | Continuous Language Model Interpolation for Dynamic and Controllable Text Generation | Sara Kangaslahti et.al. | 2404.07117 | null |
| 2024-04-11 | From Model-centered to Human-Centered: Revision Distance as a Metric for Text Evaluation in LLMs-based Applications | Yongqiang Ma et.al. | 2404.07108 | null |
| 2024-04-10 | Graph Chain-of-Thought: Augmenting Large Language Models by Reasoning on Graphs | Bowen Jin et.al. | 2404.07103 | link |
| 2024-04-10 | Dynamic Generation of Personalities with Large Language Models | Jianzhi Liu et.al. | 2404.07084 | null |
| 2024-04-10 | VLLMs Provide Better Context for Emotion Understanding Through Common Sense Reasoning | Alexandros Xenos et.al. | 2404.07078 | link |
| 2024-04-10 | Exploring Concept Depth: How Large Language Models Acquire Knowledge at Different Layers? | Mingyu Jin et.al. | 2404.07066 | link |
| 2024-04-10 | Groundedness in Retrieval-augmented Long-form Generation: An Empirical Study | Alessandro Stolfo et.al. | 2404.07060 | null |
| 2024-04-09 | Pitfalls of Conversational LLMs on News Debiasing | Ipek Baris Schlicht et.al. | 2404.06488 | null |
| 2024-04-09 | Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks | Chonghua Wang et.al. | 2404.06480 | link |
| 2024-04-09 | Automated Federated Pipeline for Parameter-Efficient Fine-Tuning of Large Language Models | Zihan Fang et.al. | 2404.06448 | null |
| 2024-04-09 | Large Language Models to the Rescue: Deadlock Resolution in Multi-Robot Systems | Kunal Garg et.al. | 2404.06413 | null |
| 2024-04-09 | AgentQuest: A Modular Benchmark Framework to Measure Progress and Improve LLM Agents | Luca Gioacchini et.al. | 2404.06411 | link |
| 2024-04-09 | Take a Look at it! Rethinking How to Evaluate Language Model Jailbreak | Hongyu Cai et.al. | 2404.06407 | link |
| 2024-04-09 | Apprentices to Research Assistants: Advancing Research with Large Language Models | M. Namvarpour et.al. | 2404.06404 | null |
| 2024-04-09 | MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies | Shengding Hu et.al. | 2404.06395 | link |
| 2024-04-09 | MuPT: A Generative Symbolic Music Pretrained Transformer | Xingwei Qu et.al. | 2404.06393 | null |
| 2024-04-09 | Latent Distance Guided Alignment Training for Large Language Models | Haotian Luo et.al. | 2404.06390 | null |
| 2024-04-08 | MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding | Bo He et.al. | 2404.05726 | link |
| 2024-04-08 | Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs | Keen You et.al. | 2404.05719 | null |
| 2024-04-08 | Comprehensive Study on German Language Models for Clinical and Biomedical Text Understanding | Ahmad Idrissi-Yaghir et.al. | 2404.05694 | null |
| 2024-04-08 | Evaluating Mathematical Reasoning Beyond Accuracy | Shijie Xia et.al. | 2404.05692 | link |
| 2024-04-08 | Retrieval-Augmented Open-Vocabulary Object Detection | Jooyeon Kim et.al. | 2404.05687 | link |
| 2024-04-08 | MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation | Kunpeng Song et.al. | 2404.05674 | link |
| 2024-04-08 | CoReS: Orchestrating the Dance of Reasoning and Segmentation | Xiaoyi Bao et.al. | 2404.05673 | link |
| 2024-04-08 | Fighting crime with Transformers: Empirical analysis of address parsing methods in payment data | Haitham Hammami et.al. | 2404.05632 | link |
| 2024-04-08 | LTNER: Large Language Model Tagging for Named Entity Recognition with Contextualized Entity Marking | Faren Yan et.al. | 2404.05624 | null |
| 2024-04-08 | MedExpQA: Multilingual Benchmarking of Large Language Models for Medical Question Answering | Iñigo Alonso et.al. | 2404.05590 | null |
| 2024-04-05 | Physical Property Understanding from Language-Embedded Feature Fields | Albert J. Zhai et.al. | 2404.04242 | null |
| 2024-04-05 | Cleared for Takeoff? Compositional & Conditional Reasoning may be the Achilles Heel to (Flight-Booking) Language Agents | Harsh Kohli et.al. | 2404.04237 | null |
| 2024-04-05 | Benchmarking and Improving Compositional Generalization of Multi-aspect Controllable Text Generation | Tianqi Zhong et.al. | 2404.04232 | link |
| 2024-04-05 | Social Skill Training with Large Language Models | Diyi Yang et.al. | 2404.04204 | null |
| 2024-04-05 | Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model | Xinrun Du et.al. | 2404.04167 | null |
| 2024-04-05 | Large language models as oracles for instantiating ontologies with domain-specific knowledge | Giovanni Ciatto et.al. | 2404.04108 | link |
| 2024-04-05 | Improving Factual Accuracy of Neural Table-to-Text Output by Addressing Input Problems in ToTTo | Barkavi Sundararajan et.al. | 2404.04103 | link |
| 2024-04-05 | Robust Preference Optimization with Provable Noise Tolerance for LLMs | Xize Liang et.al. | 2404.04102 | null |
| 2024-04-05 | Assessing the quality of information extraction | Filip Seitl et.al. | 2404.04068 | null |
| 2024-04-05 | CLUE: A Clinical Language Understanding Evaluation for LLMs | Amin Dada et.al. | 2404.04067 | link |
| 2024-04-04 | CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching | Dongzhi Jiang et.al. | 2404.03653 | link |
| 2024-04-04 | AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent | Hanyu Lai et.al. | 2404.03648 | link |
| 2024-04-04 | Capabilities of Large Language Models in Control Engineering: A Benchmark Study on GPT-4, Claude 3 Opus, and Gemini 1.0 Ultra | Darioush Kevian et.al. | 2404.03647 | null |
| 2024-04-04 | Training LLMs over Neurally Compressed Text | Brian Lester et.al. | 2404.03626 | null |
| 2024-04-04 | Unveiling LLMs: The Evolution of Latent Representations in a Temporal Knowledge Graph | Marco Bronzini et.al. | 2404.03623 | link |
| 2024-04-04 | Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models | Wenshan Wu et.al. | 2404.03622 | link |
| 2024-04-04 | DeViDe: Faceted medical knowledge for improved medical vision-language pre-training | Haozhe Luo et.al. | 2404.03618 | null |
| 2024-04-04 | Sailor: Open Language Models for South-East Asia | Longxu Dou et.al. | 2404.03608 | link |
| 2024-04-04 | Evaluating LLMs at Detecting Errors in LLM Responses | Ryo Kamoi et.al. | 2404.03602 | link |
| 2024-04-04 | Intent Detection and Entity Extraction from BioMedical Literature | Ankan Mullick et.al. | 2404.03598 | link |
| 2024-04-03 | ALOHa: A New Measure for Hallucination in Captioning Models | Suzanne Petryk et.al. | 2404.02904 | null |
| 2024-04-03 | MatAtlas: Text-driven Consistent Geometry Texturing and Material Assignment | Duygu Ceylan et.al. | 2404.02899 | null |
| 2024-04-03 | ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline | Yifan Xu et.al. | 2404.02893 | link |
| 2024-04-03 | Integrating Explanations in Learning LTL Specifications from Demonstrations | Ashutosh Gupta et.al. | 2404.02872 | null |
| 2024-04-03 | Toward Inference-optimal Mixture-of-Expert Large Language Models | Longfei Yun et.al. | 2404.02852 | null |
| 2024-04-03 | I-Design: Personalized LLM Interior Designer | Ata Çelen et.al. | 2404.02838 | null |
| 2024-04-03 | Cherry on Top: Parameter Heterogeneity and Quantization in Large Language Models | Wanyun Cui et.al. | 2404.02837 | null |
| 2024-04-03 | Retrieving Examples from Memory for Retrieval Augmented Neural Machine Translation: A Systematic Comparison | Maxime Bouthors et.al. | 2404.02835 | null |
| 2024-04-03 | Empowering Biomedical Discovery with AI Agents | Shanghua Gao et.al. | 2404.02831 | null |
| 2024-04-03 | BAdam: A Memory Efficient Full Parameter Training Method for Large Language Models | Qijun Luo et.al. | 2404.02827 | link |
| 2024-04-02 | Topic-based Watermarks for LLM-Generated Text | Alexander Nemecek et.al. | 2404.02138 | null |
| 2024-04-02 | Exploring Automated Distractor Generation for Math Multiple-choice Questions via Large Language Models | Wanyong Feng et.al. | 2404.02124 | null |
| 2024-04-02 | GINopic: Topic Modeling with Graph Isomorphism Network | Suman Adhya et.al. | 2404.02115 | link |
| 2024-04-02 | CLAPNQ: Cohesive Long-form Answers from Passages in Natural Questions for RAG systems | Sara Rosenthal et.al. | 2404.02103 | link |
| 2024-04-02 | Advancing LLM Reasoning Generalists with Preference Trees | Lifan Yuan et.al. | 2404.02078 | link |
| 2024-04-02 | Digital Forgetting in Large Language Models: A Survey of Unlearning Methods | Alberto Blanco-Justicia et.al. | 2404.02062 | null |
| 2024-04-02 | Long-context LLMs Struggle with Long In-context Learning | Tianle Li et.al. | 2404.02060 | link |
| 2024-04-02 | Deconstructing In-Context Learning: Understanding Prompts via Corruption | Namrata Shivagunde et.al. | 2404.02054 | link |
| 2024-04-02 | BERTopic-Driven Stock Market Predictions: Unraveling Sentiment Insights | Enmin Zhu et.al. | 2404.02053 | null |
| 2024-04-02 | A Survey on Large Language Model-Based Game Agents | Sihao Hu et.al. | 2404.02039 | link |
| 2024-03-29 | Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models | Atsuyuki Miyai et.al. | 2403.20331 | link |
| 2024-03-29 | Gecko: Versatile Text Embeddings Distilled from Large Language Models | Jinhyuk Lee et.al. | 2403.20327 | null |
| 2024-03-29 | Convolutional Prompting meets Language Models for Continual Learning | Anurag Roy et.al. | 2403.20317 | null |
| 2024-03-29 | Towards Greener LLMs: Bringing Energy-Efficiency to the Forefront of LLM Inference | Jovan Stojkovic et.al. | 2403.20306 | null |
| 2024-03-29 | Can LLMs Correct Physicians, Yet? Investigating Effective Interaction Methods in the Medical Domain | Burcu Sayin et.al. | 2403.20288 | null |
| 2024-03-29 | LUQ: Long-text Uncertainty Quantification for LLMs | Caiqi Zhang et.al. | 2403.20279 | null |
| 2024-04-01 | Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want | Weifeng Lin et.al. | 2403.20271 | link |
| 2024-03-29 | Latxa: An Open Language Model and Evaluation Suite for Basque | Julen Etxaniz et.al. | 2403.20266 | link |
| 2024-03-29 | ELITR-Bench: A Meeting Assistant Benchmark for Long-Context Language Models | Thibaut Thonet et.al. | 2403.20262 | null |
| 2024-03-29 | Using LLMs to Model the Beliefs and Preferences of Targeted Populations | Keiichi Namikoshi et.al. | 2403.20252 | null |
| 2024-03-28 | InterDreamer: Zero-Shot Text to 3D Dynamic Human-Object Interaction | Sirui Xu et.al. | 2403.19652 | null |
| 2024-03-28 | MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions | Kai Zhang et.al. | 2403.19651 | null |
| 2024-03-28 | Change-Agent: Towards Interactive Comprehensive Change Interpretation and Analysis from Change Detection and Change Captioning | Chenyang Liu et.al. | 2403.19646 | link |
| 2024-03-28 | Retrieval-Enhanced Knowledge Editing for Multi-Hop Question Answering in Language Models | Yucheng Shi et.al. | 2403.19631 | null |
| 2024-03-28 | Semantic Map-based Generation of Navigation Instructions | Chengzu Li et.al. | 2403.19603 | link |
| 2024-03-28 | LocCa: Visual Pretraining with Location-aware Captioners | Bo Wan et.al. | 2403.19596 | null |
| 2024-03-28 | Img2Loc: Revisiting Image Geolocalization using Multi-modality Foundation Models and Image-based Retrieval-Augmented Generation | Zhongliang Zhou et.al. | 2403.19584 | null |
| 2024-03-28 | WaterJudge: Quality-Detection Trade-off when Watermarking Large Language Models | Piotr Molenda et.al. | 2403.19548 | null |
| 2024-03-28 | LLMs as Academic Reading Companions: Extending HCI Through Synthetic Personae | Celia Chen et.al. | 2403.19506 | null |
| 2024-03-28 | Evolving Assembly Code in an Adversarial Environment | Irina Maliukov et.al. | 2403.19489 | null |
| 2024-03-27 | Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models | Yanwei Li et.al. | 2403.18814 | link |
| 2024-03-27 | ECoDepth: Effective Conditioning of Diffusion Models for Monocular Depth Estimation | Suraj Patni et.al. | 2403.18807 | link |
| 2024-03-27 | Is Modularity Transferable? A Case Study through the Lens of Knowledge Distillation | Mateusz Klimaszewski et.al. | 2403.18804 | null |
| 2024-03-27 | Long-form factuality in large language models | Jerry Wei et.al. | 2403.18802 | link |
| 2024-03-27 | 3P-LLM: Probabilistic Path Planning using Large Language Model for Autonomous Robot Navigation | Ehsan Latif et.al. | 2403.18778 | null |
| 2024-03-27 | CheckEval: Robust Evaluation Framework using Large Language Model via Checklist | Yukyung Lee et.al. | 2403.18771 | null |
| 2024-03-27 | MLDT: Multi-Level Decomposition for Complex Long-Horizon Robotic Task Planning with Open-Source Large Language Model | Yike Wu et.al. | 2403.18760 | null |
| 2024-03-27 | Understanding the Learning Dynamics of Alignment with Human Feedback | Shawn Im et.al. | 2403.18742 | null |
| 2024-03-27 | PhysicsAssistant: An LLM-Powered Interactive Learning Robot for Physics Lab Investigations | Ehsan Latif et.al. | 2403.18721 | null |
| 2024-03-27 | NL-ITI: Optimizing Probing and Intervention for Improvement of ITI Method | Jakub Hoscilowicz et.al. | 2403.18680 | link |
| 2024-03-26 | MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue Resolution | Wei Tao et.al. | 2403.17927 | null |
| 2024-03-26 | LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning | Rui Pan et.al. | 2403.17919 | null |
| 2024-03-26 | Addressing Social Misattributions of Large Language Models: An HCXAI-based Approach | Andrea Ferrario et.al. | 2403.17873 | null |
| 2024-03-26 | Exploring LLMs as a Source of Targeted Synthetic Textual Data to Minimize High Confidence Misclassifications | Philip Lippmann et.al. | 2403.17860 | null |
| 2024-03-26 | ChroniclingAmericaQA: A Large-scale Question Answering Dataset based on Historical American Newspaper Pages | Bhawna Piryani et.al. | 2403.17859 | link |
| 2024-03-26 | Verbing Weirds Language (Models): Evaluation of English Zero-Derivation in Five LLMs | David R. Mortensen et.al. | 2403.17856 | null |
| 2024-03-26 | ArabicaQA: A Comprehensive Dataset for Arabic Question Answering | Abdelrahman Abdallah et.al. | 2403.17848 | link |
| 2024-03-26 | Assessment of Multimodal Large Language Models in Alignment with Human Values | Zhelun Shi et.al. | 2403.17830 | null |
| 2024-03-26 | Accelerating Radio Spectrum Regulation Workflows with Large Language Models (LLMs) | Amir Ghasemi et.al. | 2403.17819 | null |
| 2024-03-26 | Are Compressed Language Models Less Subgroup Robust? | Leonidas Gee et.al. | 2403.17811 | link |
| 2024-03-25 | Towards Human-AI Deliberation: Design and Evaluation of LLM-Empowered Deliberative AI for AI-Assisted Decision-Making | Shuai Ma et.al. | 2403.16812 | null |
| 2024-03-25 | An LLM-Based Digital Twin for Optimizing Human-in-the Loop Systems | Hanqing Yang et.al. | 2403.16809 | null |
| 2024-03-25 | Iterative Refinement of Project-Level Code Context for Precise Code Generation with Compiler Feedback | Zhangqian Bi et.al. | 2403.16792 | null |
| 2024-03-25 | All Artificial, Less Intelligence: GenAI through the Lens of Formal Verification | Deepak Narayan Gadde et.al. | 2403.16750 | null |
| 2024-03-25 | Synapse: Learning Preferential Concepts from Visual Demonstrations | Sadanand Modak et.al. | 2403.16689 | null |
| 2024-03-25 | Investigation of the effectiveness of applying ChatGPT in Dialogic Teaching Using Electroencephalography | Jiayue Zhang et.al. | 2403.16687 | null |
| 2024-03-25 | ToXCL: A Unified Framework for Toxic Speech Detection and Explanation | Nhat M. Hoang et.al. | 2403.16685 | link |
| 2024-03-25 | RU22Fact: Optimizing Evidence for Multilingual Explainable Fact-Checking on Russia-Ukraine Conflict | Yirong Zeng et.al. | 2403.16662 | link |
| 2024-03-25 | Grammatical vs Spelling Error Correction: An Investigation into the Responsiveness of Transformer-based Language Models using BART and MarianMT | Rohit Raju et.al. | 2403.16655 | null |
| 2024-03-25 | CLHA: A Simple yet Effective Contrastive Learning Framework for Human Alignment | Feiteng Fang et.al. | 2403.16649 | null |
| 2024-03-25 | Virtual Co-Pilot: Multimodal Large Language Model-enabled Quick-access Procedures for Single Pilot Operations | Fan Li et.al. | 2403.16645 | null |
| 2024-03-25 | Conversational Grounding: Annotation and Analysis of Grounding Acts and Grounding Units | Biswesh Mohapatra et.al. | 2403.16609 | null |
| 2024-03-25 | TrustAI at SemEval-2024 Task 8: A Comprehensive Analysis of Multi-domain Machine Generated Text Detection Techniques | Ashok Urlana et.al. | 2403.16592 | null |
| 2024-03-25 | Can Large Language Models (or Humans) Distill Text? | Nicolas Audinet de Pieuchon et.al. | 2403.16584 | null |
| 2024-03-22 | LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models | Yuzhang Shang et.al. | 2403.15388 | null |
| 2024-03-22 | Long-CLIP: Unlocking the Long-Text Capability of CLIP | Beichen Zhang et.al. | 2403.15378 | null |
| 2024-03-22 | Can large language models explore in-context? | Akshay Krishnamurthy et.al. | 2403.15371 | null |
| 2024-03-22 | CoLLEGe: Concept Embedding Generation for Large Language Models | Ryan Teehan et.al. | 2403.15362 | null |
| 2024-03-22 | Multi-Review Fusion-in-Context | Aviv Slobodkin et.al. | 2403.15351 | null |
| 2024-03-22 | CO-Fun: A German Dataset on Company Outsourcing in Fund Prospectuses for Named Entity Recognition and Relation Extraction | Neda Foroutan et.al. | 2403.15322 | null |
| 2024-03-22 | Sphere Neural-Networks for Rational Reasoning | Tiansi Dong et.al. | 2403.15297 | null |
| 2024-03-22 | Measuring Gender and Racial Biases in Large Language Models | Jiafu An et.al. | 2403.15281 | null |
| 2024-03-22 | Bioinformatics and Biomedical Informatics with ChatGPT: Year One Review | Jinge Wang et.al. | 2403.15274 | null |
| 2024-03-22 | Event Temporal Relation Extraction based on Retrieval-Augmented on LLMs | Xiaobin Zhang et.al. | 2403.15273 | null |
| 2024-03-21 | MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems? | Renrui Zhang et.al. | 2403.14624 | null |
| 2024-03-21 | Language Repository for Long Video Understanding | Kumara Kahatapitiya et.al. | 2403.14622 | link |
| 2024-03-21 | Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey | Zeyu Han et.al. | 2403.14608 | null |
| 2024-03-21 | MyVLM: Personalizing VLMs for User-Specific Queries | Yuval Alaluf et.al. | 2403.14599 | null |
| 2024-03-21 | Large Language Models for Multi-Choice Question Classification of Medical Subjects | Víctor Ponce-López et.al. | 2403.14582 | null |
| 2024-03-21 | RAmBLA: A Framework for Evaluating the Reliability of LLMs as Assistants in the Biomedical Domain | William James Bolton et.al. | 2403.14578 | link |
| 2024-03-21 | A Chain-of-Thought Prompting Approach with LLMs for Evaluating Students’ Formative Assessment Responses in Science | Clayton Cohn et.al. | 2403.14565 | null |
| 2024-03-21 | EDT: Improving Large Language Models’ Generation by Entropy-based Dynamic Temperature Sampling | Shimao Zhang et.al. | 2403.14541 | null |
| 2024-03-21 | Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference | Han Zhao et.al. | 2403.14520 | null |
| 2024-03-21 | The Ethics of ChatGPT in Medicine and Healthcare: A Systematic Review on Large Language Models (LLMs) | Joschka Haltaufderheide et.al. | 2403.14473 | null |
| 2024-03-20 | RAR: Retrieving And Ranking Augmented MLLMs for Visual Recognition | Ziyu Liu et.al. | 2403.13805 | null |
| 2024-03-20 | Learning from Models and Data for Visual Grounding | Ruozhen He et.al. | 2403.13804 | null |
| 2024-03-20 | Reverse Training to Nurse the Reversal Curse | Olga Golovneva et.al. | 2403.13799 | null |
| 2024-03-20 | Chain-of-Interaction: Enhancing Large Language Models for Psychiatric Behavior Understanding by Dyadic Contexts | Guangzeng Han et.al. | 2403.13786 | null |
| 2024-03-20 | Leveraging High-Resolution Features for Improved Deep Hashing-based Image Retrieval | Aymene Berriche et.al. | 2403.13747 | null |
| 2024-03-20 | EthioLLM: Multilingual Large Language Models for Ethiopian Languages with Task Evaluation | Atnafu Lambebo Tonja et.al. | 2403.13737 | null |
| 2024-03-20 | Large Language Models meet Network Slicing Management and Orchestration | Abdulhalim Dandoush et.al. | 2403.13721 | null |
| 2024-03-20 | RoleInteract: Evaluating the Social Interaction of Role-Playing Agents | Hongzhan Chen et.al. | 2403.13679 | null |
| 2024-03-20 | Do Not Worry if You Do Not Have Data: Building Pretrained Language Models Using Translationese | Meet Doshi et.al. | 2403.13638 | null |
| 2024-03-20 | VL-Mamba: Exploring State Space Models for Multimodal Learning | Yanyuan Qiao et.al. | 2403.13600 | null |
| 2024-03-19 | Dated Data: Tracing Knowledge Cutoffs in Large Language Models | Jeffrey Cheng et.al. | 2403.12958 | null |
| 2024-03-19 | Automatic Information Extraction From Employment Tribunal Judgements Using Large Language Models | Joana Ribeiro de Faria et.al. | 2403.12936 | null |
| 2024-03-19 | Rapid AIdeation: Generating Ideas With the Self and in Collaboration With Large Language Models | Gionnieve Lim et.al. | 2403.12928 | null |
| 2024-03-19 | Supporting Energy Policy Research with Large Language Models | Grant Buster et.al. | 2403.12924 | null |
| 2024-03-19 | Semantic Layering in Room Segmentation via LLMs | Taehyeon Kim et.al. | 2403.12920 | null |
| 2024-03-19 | Toward Sustainable GenAI using Generation Directives for Carbon-Friendly Large Language Model Inference | Baolin Li et.al. | 2403.12900 | null |
| 2024-03-19 | mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding | Anwen Hu et.al. | 2403.12895 | link |
| 2024-03-19 | MEDBind: Unifying Language and Multimodal Medical Data Embeddings | Yuan Gao et.al. | 2403.12894 | null |
| 2024-03-19 | HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning | Fucai Ke et.al. | 2403.12884 | null |
| 2024-03-19 | Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models | Zehui Chen et.al. | 2403.12881 | link |
| 2024-03-18 | HDLdebugger: Streamlining HDL debugging with Large Language Models | Xufeng Yao et.al. | 2403.11671 | null |
| 2024-03-18 | Let’s Focus on Neuron: Neuron-Level Supervised Fine-tuning for Large Language Model | Haoyun Xu et.al. | 2403.11621 | null |
| 2024-03-18 | Linguacodus: A Synergistic Framework for Transformative Code Generation in Machine Learning Pipelines | Ekaterina Trofimova et.al. | 2403.11585 | null |
| 2024-03-18 | Reinforcement Learning with Token-level Feedback for Controllable Text Generation | Wendi Li et.al. | 2403.11558 | null |
| 2024-03-18 | LLM^3: Large Language Model-based Task and Motion Planning with Motion Failure Reasoning | Shu Wang et.al. | 2403.11552 | link |
| 2024-03-18 | TARN-VIST: Topic Aware Reinforcement Network for Visual Storytelling | Weiran Chen et.al. | 2403.11550 | null |
| 2024-03-18 | DEE: Dual-stage Explainable Evaluation Method for Text Generation | Shenyu Zhang et.al. | 2403.11509 | null |
| 2024-03-18 | Can LLMs Generate Human-Like Wayfinding Instructions? Towards Platform-Agnostic Embodied Instruction Synthesis | Vishnu Sashank Dorbala et.al. | 2403.11487 | null |
| 2024-03-18 | VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding | Yue Fan et.al. | 2403.11481 | null |
| 2024-03-18 | HateCOT: An Explanation-Enhanced Dataset for Generalizable Offensive Speech Detection via Large Language Models | Huy Nghiem et.al. | 2403.11456 | link |
| 2024-03-14 | Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference | Piotr Nawrot et.al. | 2403.09636 | null |
| 2024-03-14 | 3D-VLA: A 3D Vision-Language-Action Generative World Model | Haoyu Zhen et.al. | 2403.09631 | null |
| 2024-03-14 | MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training | Brandon McKinzie et.al. | 2403.09611 | null |
| 2024-03-14 | Large Language Models and Causal Inference in Collaboration: A Comprehensive Survey | Xiaoyu Liu et.al. | 2403.09606 | null |
| 2024-03-14 | Logical Discrete Graphical Models Must Supplement Large Language Models for Information Synthesis | Gregory Coppola et.al. | 2403.09599 | null |
| 2024-03-14 | ExploRLLM: Guiding Exploration in Reinforcement Learning with Large Language Models | Runyu Ma et.al. | 2403.09583 | null |
| 2024-03-14 | Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation | Yunhao Gou et.al. | 2403.09572 | null |
| 2024-03-14 | Enhancing Trust in Autonomous Agents: An Architecture for Accountability and Explainability through Blockchain and Large Language Models | Laura Fernández-Becerra et.al. | 2403.09567 | null |
| 2024-03-14 | Welcome Your New AI Teammate: On Safety Analysis by Leashing Large Language Models | Ali Nouri et.al. | 2403.09565 | null |
| 2024-03-14 | Less is More: Data Value Estimation for Visual Instruction Tuning | Zikang Liu et.al. | 2403.09559 | null |
| 2024-03-13 | Simple and Scalable Strategies to Continually Pre-train Large Language Models | Adam Ibrahim et.al. | 2403.08763 | null |
| 2024-03-13 | Steering LLMs Towards Unbiased Responses: A Causality-Guided Debiasing Framework | Jingling Li et.al. | 2403.08743 | null |
| 2024-03-13 | The Garden of Forking Paths: Observing Dynamic Parameters Distribution in Large Language Models | Carlo Nicolini et.al. | 2403.08739 | null |
| 2024-03-13 | Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization | Renjie Pi et.al. | 2403.08730 | null |
| 2024-03-14 | SOTOPIA-$π$: Interactive Learning of Socially Intelligent Language Agents | Ruiyi Wang et.al. | 2403.08715 | link |
| 2024-03-13 | Review of Generative AI Methods in Cybersecurity | Yagmur Yigit et.al. | 2403.08701 | null |
| 2024-03-13 | TeaMs-RL: Teaching LLMs to Teach Themselves Better Instructions via Reinforcement Learning | Shangding Gu et.al. | 2403.08694 | null |
| 2024-03-13 | Token Alignment via Character Matching for Subword Completion | Ben Athiwaratkun et.al. | 2403.08688 | null |
| 2024-03-13 | Zero-shot and Few-shot Generation Strategies for Artificial Clinical Records | Erlend Frayling et.al. | 2403.08664 | null |
| 2024-03-13 | Human Alignment of Large Language Models through Online Preference Optimisation | Daniele Calandriello et.al. | 2403.08635 | null |
| 2024-03-12 | Beyond Text: Frozen Large Language Models in Visual Signal Comprehension | Lei Zhu et.al. | 2403.07874 | link |
| 2024-03-12 | Rethinking Generative Large Language Model Evaluation for Semantic Comprehension | Fangyun Wei et.al. | 2403.07872 | null |
| 2024-03-12 | Exploring Safety Generalization Challenges of Large Language Models via Code | Qibing Ren et.al. | 2403.07865 | null |
| 2024-03-12 | DeliGrasp: Inferring Object Mass, Friction, and Compliance with LLMs for Adaptive and Minimally Deforming Grasp Policies | William Xie et.al. | 2403.07832 | null |
| 2024-03-12 | The Missing Piece in Model Editing: A Deep Dive into the Hidden Damage Brought By Model Editing | Jianchen Wang et.al. | 2403.07825 | null |
| 2024-03-12 | Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM | Sainbayar Sukhbaatar et.al. | 2403.07816 | null |
| 2024-03-12 | Fine-tuning Large Language Models with Sequential Instructions | Hanxu Hu et.al. | 2403.07794 | link |
| 2024-03-12 | Transforming Competition into Collaboration: The Revolutionary Role of Multi-Agent Systems and Language Models in Modern Organizations | Carlos Jose Xavier Cruz et.al. | 2403.07769 | link |
| 2024-03-12 | Synth$^2$: Boosting Visual-Language Models with Synthetic Captions and Image Embeddings | Sahand Sharifzadeh et.al. | 2403.07750 | null |
| 2024-03-12 | FineMath: A Fine-Grained Mathematical Evaluation Benchmark for Chinese Large Language Models | Yan Liu et.al. | 2403.07747 | null |
| 2024-03-11 | Hybrid Human-LLM Corpus Construction and LLM Evaluation for Rare Linguistic Phenomena | Leonie Weissweiler et.al. | 2403.06965 | null |
| 2024-03-11 | Materials science in the era of large language models: a perspective | Ge Lei et.al. | 2403.06949 | null |
| 2024-03-11 | Naming, Describing, and Quantifying Visual Objects in Humans and LLMs | Alberto Testoni et.al. | 2403.06935 | null |
| 2024-03-11 | ERA-CoT: Improving Chain-of-Thought through Entity Relationship Analysis | Yanming Liu et.al. | 2403.06932 | link |
| 2024-03-11 | MEND: Meta dEmonstratioN Distillation for Efficient and Effective In-Context Learning | Yichuan Li et.al. | 2403.06914 | null |
| 2024-03-11 | Exploring Large Language Models and Hierarchical Frameworks for Classification of Large Unstructured Legal Documents | Nishchal Prasad et.al. | 2403.06872 | null |
| 2024-03-11 | Development of a Reliable and Accessible Caregiving Language Model (CaLM) | Bambang Parmanto et.al. | 2403.06857 | null |
| 2024-03-11 | DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation | Guosheng Zhao et.al. | 2403.06845 | null |
| 2024-03-11 | RA-ISF: Learning to Answer and Understand from Retrieval Augmentation via Iterative Self-Feedback | Yanming Liu et.al. | 2403.06840 | link |
| 2024-03-11 | ACFIX: Guiding LLMs with Mined Common RBAC Practices for Context-Aware Repair of Access Control Vulnerabilities in Smart Contracts | Lyuye Zhang et.al. | 2403.06838 | null |
| 2024-03-08 | Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context | Machel Reid et.al. | 2403.05530 | null |
| 2024-03-08 | GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM | Hao Kang et.al. | 2403.05527 | link |
| 2024-03-08 | Beyond Finite Data: Towards Data-free Out-of-distribution Generalization via Extrapola | Yijiang Li et.al. | 2403.05523 | null |
| 2024-03-08 | Will GPT-4 Run DOOM? | Adrian de Wynter et.al. | 2403.05468 | null |
| 2024-03-08 | Cost-Performance Optimization for Processing Low-Resource Language Tasks Using Commercial LLMs | Arijit Nag et.al. | 2403.05434 | null |
| 2024-03-08 | Explaining Pre-Trained Language Models with Attribution Scores: An Analysis in Low-Resource Settings | Wei Zhou et.al. | 2403.05338 | null |
| 2024-03-08 | ChatASU: Evoking LLM’s Reflexion to Truly Understand Aspect Sentiment in Dialogues | Yiding Liu et.al. | 2403.05326 | null |
| 2024-03-08 | RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation | Zihao Wang et.al. | 2403.05313 | null |
| 2024-03-08 | Tapilot-Crossing: Benchmarking and Evolving LLMs Towards Interactive Data Analysis Agents | Jinyang Li et.al. | 2403.05307 | null |
| 2024-03-08 | ACLSum: A New Dataset for Aspect-based Summarization of Scientific Publications | Sotaro Takeshita et.al. | 2403.05303 | link |
| 2024-03-07 | Efficient LoFTR: Semi-Dense Local Feature Matching with Sparse-Like Speed | Yifan Wang et.al. | 2403.04765 | null |
| 2024-03-07 | iScore: Visual Analytics for Interpreting How Language Models Automatically Score Summaries | Adam Coscia et.al. | 2403.04760 | link |
| 2024-03-07 | KnowledgeVIS: Interpreting Language Models by Comparing Fill-in-the-Blank Prompts | Adam Coscia et.al. | 2403.04758 | link |
| 2024-03-07 | LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error | Boshi Wang et.al. | 2403.04746 | link |
| 2024-03-07 | SnapNTell: Enhancing Entity-Centric Visual Question Answering with Retrieval Augmented Multimodal LLM | Jielin Qiu et.al. | 2403.04735 | null |
| 2024-03-07 | ObjectCompose: Evaluating Resilience of Vision-Based Models on Object-to-Background Compositional Changes | Hashmat Shadab Malik et.al. | 2403.04701 | null |
| 2024-03-07 | Fact-Checking the Output of Large Language Models via Token-Level Uncertainty Quantification | Ekaterina Fadeeva et.al. | 2403.04696 | null |
| 2024-03-07 | PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation | Junsong Chen et.al. | 2403.04692 | null |
| 2024-03-07 | Telecom Language Models: Must They Be Large? | Nicola Piovesan et.al. | 2403.04666 | null |
| 2024-03-07 | QAQ: Quality Adaptive Quantization for LLM KV Cache | Shichen Dong et.al. | 2403.04643 | link |
| 2024-03-06 | Bridging Language and Items for Retrieval and Recommendation | Yupeng Hou et.al. | 2403.03952 | link |
| 2024-03-06 | Did Translation Models Get More Robust Without Anyone Even Noticing? | Ben Peters et.al. | 2403.03923 | null |
| 2024-03-06 | Fuzzing BusyBox: Leveraging LLM and Crash Reuse for Embedded Bug Unearthing | Asmita et.al. | 2403.03897 | null |
| 2024-03-06 | SaulLM-7B: A pioneering Large Language Model for Law | Pierre Colombo et.al. | 2403.03883 | null |
| 2024-03-06 | Learning to Decode Collaboratively with Multiple Language Models | Shannon Zejiang Shen et.al. | 2403.03870 | link |
| 2024-03-06 | On the Origins of Linear Representations in Large Language Models | Yibo Jiang et.al. | 2403.03867 | null |
| 2024-03-06 | KIWI: A Dataset of Knowledge-Intensive Writing Instructions for Answering Research Questions | Fangyuan Xu et.al. | 2403.03866 | null |
| 2024-03-06 | Are Language Models Puzzle Prodigies? Algorithmic Puzzles Unveil Serious Challenges in Multimodal Reasoning | Deepanway Ghosal et.al. | 2403.03864 | link |
| 2024-03-06 | X-Shot: A Unified System to Handle Frequent, Few-shot and Zero-shot Learning Simultaneously in Classification | Hanzi Xu et.al. | 2403.03863 | link |
| 2024-03-06 | Emojinize: Enriching Any Text with Emoji Translations | Lars Henning Klein et.al. | 2403.03857 | null |
| 2024-03-05 | The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning | Nathaniel Li et.al. | 2403.03218 | null |
| 2024-03-05 | CLEVR-POC: Reasoning-Intensive Visual Question Answering in Partially Observable Environments | Savitha Sam Abraham et.al. | 2403.03203 | null |
| 2024-03-05 | Towards Democratized Flood Risk Management: An Advanced AI Assistant Enabled by GPT-4 for Enhanced Interpretability and Public Engagement | Rafaela Martelo et.al. | 2403.03188 | link |
| 2024-03-05 | MOKA: Open-Vocabulary Robotic Manipulation through Mark-Based Visual Prompting | Fangchen Liu et.al. | 2403.03174 | null |
| 2024-03-05 | SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection | Peng Qi et.al. | 2403.03170 | null |
| 2024-03-05 | PARADISE: Evaluating Implicit Planning Skills of Language Models with Procedural Warnings and Tips Dataset | Arda Uzunoğlu et.al. | 2403.03167 | link |
| 2024-03-05 | Quantum Many-Body Physics Calculations with Large Language Models | Haining Pan et.al. | 2403.03154 | null |
| 2024-03-05 | Language Guided Exploration for RL Agents in Text Environments | Hitesh Golchha et.al. | 2403.03141 | null |
| 2024-03-05 | Angry Men, Sad Women: Large Language Models Reflect Gendered Stereotypes in Emotion Attribution | Flor Miriam Plaza-del-Arco et.al. | 2403.03121 | null |
| 2024-03-05 | “In Dialogues We Learn”: Towards Personalized Dialogue Without Pre-defined Profiles through In-Dialogue Learning | Chuanqi Cheng et.al. | 2403.03102 | null |
| 2024-03-02 | LM4OPT: Unveiling the Potential of Large Language Models in Formulating Mathematical Optimization Problems | Tasnim Ahmed et.al. | 2403.01342 | null |
| 2024-03-02 | Chaining thoughts and LLMs to learn DNA structural biophysics | Tyler D. Ross et.al. | 2403.01332 | null |
| 2024-03-02 | VNLP: Turkish NLP Package | Meliksah Turker et.al. | 2403.01309 | null |
| 2024-03-02 | VBART: The Turkish LLM | Meliksah Turker et.al. | 2403.01308 | null |
| 2024-03-02 | ICC: Quantifying Image Caption Concreteness for Multimodal Dataset Curation | Moran Yanuka et.al. | 2403.01306 | null |
| 2024-03-02 | Improving the Validity of Automatically Generated Feedback via Reinforcement Learning | Alexander Scarlatos et.al. | 2403.01304 | link |
| 2024-03-02 | NoMAD-Attention: Efficient LLM Inference on CPUs Through Multiply-add-free Attention | Tianyi Zhang et.al. | 2403.01273 | null |
| 2024-03-02 | Employing LLMs for Incident Response Planning and Review | Sam Hays et.al. | 2403.01271 | null |
| 2024-03-02 | A comprehensive cross-language framework for harmful content detection with the aid of sentiment analysis | Mohammad Dehghani et.al. | 2403.01270 | null |
| 2024-03-02 | Dissecting Language Models: Machine Unlearning via Selective Pruning | Nicholas Pochinkov et.al. | 2403.01267 | null |
| 2024-02-29 | The All-Seeing Project V2: Towards General Relation Comprehension of the Open World | Weiyun Wang et.al. | 2402.19474 | link |
| 2024-02-29 | Loose LIPS Sink Ships: Asking Questions in Battleship with Language-Informed Program Sampling | Gabriel Grand et.al. | 2402.19471 | null |
| 2024-02-29 | Towards Tracing Trustworthiness Dynamics: Revisiting Pre-training Period of Large Language Models | Chen Qian et.al. | 2402.19465 | link |
| 2024-02-29 | Curiosity-driven Red-teaming for Large Language Models | Zhang-Wei Hong et.al. | 2402.19464 | link |
| 2024-02-29 | ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL | Yifei Zhou et.al. | 2402.19446 | link |
| 2024-02-29 | Compositional API Recommendation for Library-Oriented Code Generation | Zexiong Ma et.al. | 2402.19431 | null |
| 2024-02-29 | Crafting Knowledge: Exploring the Creative Mechanisms of Chat-Based Search Engines | Lijia Ma et.al. | 2402.19421 | null |
| 2024-02-29 | On the Scaling Laws of Geographical Representation in Language Models | Nathan Godey et.al. | 2402.19406 | null |
| 2024-02-29 | Entity-Aware Multimodal Alignment Framework for News Image Captioning | Junzhe Zhang et.al. | 2402.19404 | null |
| 2024-02-29 | Wisdom of the Silicon Crowd: LLM Ensemble Prediction Capabilities Match Human Crowd Accuracy | Philipp Schoenegger et.al. | 2402.19379 | null |
| 2024-02-28 | Arithmetic Control of LLMs for Diverse User Preferences: Directional Preference Alignment with Multi-Objective Rewards | Haoxiang Wang et.al. | 2402.18571 | link |
| 2024-02-28 | A Categorization of Complexity Classes for Information Retrieval and Synthesis Using Natural Logic | Gregory Coppola et.al. | 2402.18566 | null |
| 2024-02-28 | Implicit Bias of Next-Token Prediction | Christos Thrampoulidis et.al. | 2402.18551 | null |
| 2024-02-28 | Few-Shot Fairness: Unveiling LLM’s Potential for Fairness-Aware Classification | Garima Chhikara et.al. | 2402.18502 | null |
| 2024-02-28 | Take It, Leave It, or Fix It: Measuring Productivity and Trust in Human-AI Collaboration | Crystal Qian et.al. | 2402.18498 | null |
| 2024-02-28 | Language Models Represent Beliefs of Self and Others | Wentao Zhu et.al. | 2402.18496 | null |
| 2024-02-28 | Meta-Task Prompting Elicits Embedding from Large Language Models | Yibin Lei et.al. | 2402.18458 | null |
| 2024-02-28 | Beyond Natural Language: LLMs Leveraging Alternative Formats for Enhanced Reasoning and Communication | Weize Chen et.al. | 2402.18439 | link |
| 2024-02-28 | Unsupervised Cross-Domain Image Retrieval via Prototypical Optimal Transport | Bin Li et.al. | 2402.18411 | link |
| 2024-02-28 | A Cognitive Evaluation Benchmark of Image Reasoning and Description for Large Vision Language Models | Xiujie Song et.al. | 2402.18409 | null |
(<a href=../README.md>back to main</a>)