Action Recognition - 2025-10

| Publish Date | Title | Authors | PDF | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2025-10-14 | MOTION: ML-Assisted On-Device Low-Latency Motion Recognition | Veeramani Pugazhenthi et.al. | 2512.00008 | translate | read | null |
| 2025-10-30 | Alpamayo-R1: Bridging Reasoning and Action Prediction for Generalizable Autonomous Driving in the Long Tail | NVIDIA et.al. | 2511.00088 | translate | read | null |
| 2025-10-31 | Enhancing Spatio-Temporal Zero-shot Action Recognition with Language-driven Description Attributes | Yehna Kim et.al. | 2510.27255 | translate | read | null |
| 2025-10-31 | GUI-Rise: Structured Reasoning and History Summarization for GUI Navigation | Tao Liu et.al. | 2510.27210 | translate | read | null |
| 2025-10-30 | Spiking Patches: Asynchronous, Sparse, and Efficient Tokens for Event Cameras | Christoffer Koo Øhrstrøm et.al. | 2510.26614 | translate | read | null |
| 2025-10-29 | Enhancing Temporal Understanding in Video-LLMs through Stacked Temporal Attention in Vision Encoders | Ali Rasekh et.al. | 2510.26027 | translate | read | null |
| 2025-10-29 | Informative Sample Selection Model for Skeleton-based Action Recognition with Limited Training Samples | Zhigang Tu et.al. | 2510.25345 | translate | read | null |
| 2025-10-27 | Evaluation of Vision-LLMs in Surveillance Video | Pascal Benschop et.al. | 2510.23190 | translate | read | null |
| 2025-10-27 | Enabling Vibration-Based Gesture Recognition on Everyday Furniture via Energy-Efficient FPGA Implementation of 1D Convolutional Networks | Koki Shibata et.al. | 2510.23156 | translate | read | null |
| 2025-10-27 | Neural Recording Power Optimization Through Machine Learning Guided Resolution Reconfiguration | Aviral Pandey et.al. | 2510.22924 | translate | read | null |
| 2025-10-13 | J-ORA: A Framework and Multimodal Dataset for Japanese Object Identification, Reference, Action Prediction in Robot Perception | Jesse Atuhurra et.al. | 2510.21761 | translate | read | null |
| 2025-10-22 | From Forecasting to Planning: Policy World Model for Collaborative State-Action Prediction | Zhida Zhao et.al. | 2510.19654 | translate | read | null |
| 2025-10-22 | Vision-Based Mistake Analysis in Procedural Activities: A Review of Advances and Challenges | Konstantinos Bacharidis et.al. | 2510.19292 | translate | read | null |
| 2025-10-22 | MobiAct: Efficient MAV Action Recognition Using MobileNetV4 with Contrastive Learning and Knowledge Distillation | Zhang Nengbo et.al. | 2510.19273 | translate | read | null |
| 2025-10-22 | See, Think, Act: Online Shopper Behavior Simulation with VLM Agents | Yimeng Zhang et.al. | 2510.19245 | translate | read | null |
| 2025-10-21 | UniHPR: Unified Human Pose Representation via Singular Value Contrastive Learning | Zhongyu Jiang et.al. | 2510.19078 | translate | read | null |
| 2025-10-21 | A Renaissance of Explicit Motion Information Mining from Transformers for Action Recognition | Peiqin Zhuang et.al. | 2510.18705 | translate | read | null |
| 2025-10-21 | Biomechanically consistent real-time action recognition for human-robot interaction | Wanchen Li et.al. | 2510.18373 | translate | read | null |
| 2025-10-21 | FST.ai 2.0: An Explainable AI Ecosystem for Fair, Fast, and Inclusive Decision-Making in Olympic and Paralympic Taekwondo | Keivan Shariatmadar et.al. | 2510.18193 | translate | read | null |
| 2025-10-20 | Muscle Anatomy-aware Geometric Deep Learning for sEMG-based Gesture Decoding | Adyasha Dash et.al. | 2510.17660 | translate | read | null |
| 2025-10-18 | MoS-VLA: A Vision-Language-Action Model with One-Shot Skill Adaptation | Ruihan Zhao et.al. | 2510.16617 | translate | read | null |
| 2025-10-18 | RefAtomNet++: Advancing Referring Atomic Video Action Recognition using Semantic Retrieval based Multi-Trajectory Mamba | Kunyu Peng et.al. | 2510.16444 | translate | read | null |
| 2025-10-17 | StretchySnake: Flexible SSM Training Unlocks Action Recognition Across Spatio-Temporal Scales | Nyle Siddiqui et.al. | 2510.16209 | translate | read | null |
| 2025-10-17 | MAVR-Net: Robust Multi-View Learning for MAV Action Recognition with Cross-View Attention | Nengbo Zhang et.al. | 2510.15448 | translate | read | null |
| 2025-10-16 | MoCom: Motion-based Inter-MAV Visual Communication Using Event Vision and Spiking Neural Networks | Zhang Nengbo et.al. | 2510.14770 | translate | read | null |
| 2025-10-15 | Generalizing WiFi Gesture Recognition via Large-Model-Aware Semantic Distillation and Alignment | Feng-Qi Cui et.al. | 2510.13390 | translate | read | null |
| 2025-10-14 | SVAG-Bench: A Large-Scale Benchmark for Multi-Instance Spatio-temporal Video Action Grounding | Tanveer Hannan et.al. | 2510.13016 | translate | read | null |
| 2025-10-13 | FOSSIL: Harnessing Feedback on Suboptimal Samples for Data-Efficient Generalisation with Imitation Learning for Embodied Vision-and-Language Tasks | Sabrina McCallum et.al. | 2510.11307 | translate | read | null |
| 2025-10-13 | Mixup Helps Understanding Multimodal Video Better | Xiaoyu Ma et.al. | 2510.10986 | translate | read | null |
| 2025-10-12 | MSF-Mamba: Motion-aware State Fusion Mamba for Efficient Micro-Gesture Recognition | Deng Li et.al. | 2510.10478 | translate | read | null |
| 2025-10-11 | Dejavu: Towards Experience Feedback Learning for Embodied Intelligence | Shaokai Wu et.al. | 2510.10181 | translate | read | null |
| 2025-10-11 | SyncLipMAE: Contrastive Masked Pretraining for Audio-Visual Talking-Face Representation | Zeyu Ling et.al. | 2510.10069 | translate | read | null |
| 2025-10-10 | Enhancing Diffusion Policy with Classifier-Free Guidance for Temporal Robotic Tasks | Yuang Lu et.al. | 2510.09786 | translate | read | null |
| 2025-10-10 | Diagnosing Shoulder Disorders Using Multimodal Large Language Models and Consumer-Grade Cameras | Jindong Hong et.al. | 2510.09230 | translate | read | null |
| 2025-10-09 | Video-STAR: Reinforcing Open-Vocabulary Action Recognition with Tools | Zhenlong Yuan et.al. | 2510.08480 | translate | read | null |
| 2025-10-09 | MMHOI: Modeling Complex 3D Multi-Human Multi-Object Interactions | Kaen Kogashi et.al. | 2510.07828 | translate | read | null |
| 2025-10-07 | Mitigating Surgical Data Imbalance with Dual-Prediction Video Diffusion Model | Danush Kumar Venkatesh et.al. | 2510.07345 | translate | read | null |
| 2025-10-08 | Customer-R1: Personalized Simulation of Human Behaviors via RL-based LLM Agent in Online Shopping | Ziyi Wang et.al. | 2510.07230 | translate | read | null |
| 2025-10-08 | TrackVLA++: Unleashing Reasoning and Memory Capabilities in VLA Models for Embodied Visual Tracking | Jiahang Liu et.al. | 2510.07134 | translate | read | null |
| 2025-10-07 | From Captions to Keyframes: KeyScore for Multimodal Frame Scoring and Video-Language Understanding | Shih-Yao Lin et.al. | 2510.06509 | translate | read | null |
| 2025-10-07 | Human Action Recognition from Point Clouds over Time | James Dickens et.al. | 2510.05506 | translate | read | null |
| 2025-10-05 | Speculative Actions: A Lossless Framework for Faster Agentic Systems | Naimeng Ye et.al. | 2510.04371 | translate | read | null |
| 2025-10-04 | Talking Tennis: Language Feedback from 3D Biomechanical Action Recognition | Arushi Dashore et.al. | 2510.03921 | translate | read | null |
| 2025-10-04 | MonitorVLM:A Vision Language Framework for Safety Violation Detection in Mining Operations | Jiang Wu et.al. | 2510.03666 | translate | read | null |
| 2025-10-03 | FocusAgent: Simple Yet Effective Ways of Trimming the Large Context of Web Agents | Imene Kerboua et.al. | 2510.03204 | translate | read | null |
| 2025-10-02 | Wearable and Ultra-Low-Power Fusion of EMG and A-Mode US for Hand-Wrist Kinematic Tracking | Giusy Spacone et.al. | 2510.02000 | translate | read | null |
| 2025-10-02 | Contrastive Representation Regularization for Vision-Language-Action Models | Taeyoung Kim et.al. | 2510.01711 | translate | read | null |
| 2025-10-01 | EvoStruggle: A Dataset Capturing the Evolution of Struggle across Activities and Skill Levels | Shijia Feng et.al. | 2510.01362 | translate | read | null |
| 2025-10-01 | HAMLET: Switch your Vision-Language-Action Model into a History-Aware Policy | Myungkyu Koo et.al. | 2510.00695 | translate | read | null |
| 2025-10-01 | Expandable Decision-Making States for Multi-Agent Deep Reinforcement Learning in Soccer Tactical Analysis | Kenjiro Ide et.al. | 2510.00480 | translate | read | null |
