Action Recognition - 2025-10 | Paper Arxiv Daily

Publish Date	Title	Authors	PDF	Translate	Read	Code
2025-10-14	MOTION: ML-Assisted On-Device Low-Latency Motion Recognition	Veeramani Pugazhenthi et.al.	2512.00008	translate	read	null
2025-10-30	Alpamayo-R1: Bridging Reasoning and Action Prediction for Generalizable Autonomous Driving in the Long Tail	NVIDIA et.al.	2511.00088	translate	read	null
2025-10-31	Enhancing Spatio-Temporal Zero-shot Action Recognition with Language-driven Description Attributes	Yehna Kim et.al.	2510.27255	translate	read	null
2025-10-31	GUI-Rise: Structured Reasoning and History Summarization for GUI Navigation	Tao Liu et.al.	2510.27210	translate	read	null
2025-10-30	Spiking Patches: Asynchronous, Sparse, and Efficient Tokens for Event Cameras	Christoffer Koo Øhrstrøm et.al.	2510.26614	translate	read	null
2025-10-29	Enhancing Temporal Understanding in Video-LLMs through Stacked Temporal Attention in Vision Encoders	Ali Rasekh et.al.	2510.26027	translate	read	null
2025-10-29	Informative Sample Selection Model for Skeleton-based Action Recognition with Limited Training Samples	Zhigang Tu et.al.	2510.25345	translate	read	null
2025-10-27	Evaluation of Vision-LLMs in Surveillance Video	Pascal Benschop et.al.	2510.23190	translate	read	null
2025-10-27	Enabling Vibration-Based Gesture Recognition on Everyday Furniture via Energy-Efficient FPGA Implementation of 1D Convolutional Networks	Koki Shibata et.al.	2510.23156	translate	read	null
2025-10-27	Neural Recording Power Optimization Through Machine Learning Guided Resolution Reconfiguration	Aviral Pandey et.al.	2510.22924	translate	read	null
2025-10-13	J-ORA: A Framework and Multimodal Dataset for Japanese Object Identification, Reference, Action Prediction in Robot Perception	Jesse Atuhurra et.al.	2510.21761	translate	read	null
2025-10-22	From Forecasting to Planning: Policy World Model for Collaborative State-Action Prediction	Zhida Zhao et.al.	2510.19654	translate	read	null
2025-10-22	Vision-Based Mistake Analysis in Procedural Activities: A Review of Advances and Challenges	Konstantinos Bacharidis et.al.	2510.19292	translate	read	null
2025-10-22	MobiAct: Efficient MAV Action Recognition Using MobileNetV4 with Contrastive Learning and Knowledge Distillation	Zhang Nengbo et.al.	2510.19273	translate	read	null
2025-10-22	See, Think, Act: Online Shopper Behavior Simulation with VLM Agents	Yimeng Zhang et.al.	2510.19245	translate	read	null
2025-10-21	UniHPR: Unified Human Pose Representation via Singular Value Contrastive Learning	Zhongyu Jiang et.al.	2510.19078	translate	read	null
2025-10-21	A Renaissance of Explicit Motion Information Mining from Transformers for Action Recognition	Peiqin Zhuang et.al.	2510.18705	translate	read	null
2025-10-21	Biomechanically consistent real-time action recognition for human-robot interaction	Wanchen Li et.al.	2510.18373	translate	read	null
2025-10-21	FST.ai 2.0: An Explainable AI Ecosystem for Fair, Fast, and Inclusive Decision-Making in Olympic and Paralympic Taekwondo	Keivan Shariatmadar et.al.	2510.18193	translate	read	null
2025-10-20	Muscle Anatomy-aware Geometric Deep Learning for sEMG-based Gesture Decoding	Adyasha Dash et.al.	2510.17660	translate	read	null
2025-10-18	MoS-VLA: A Vision-Language-Action Model with One-Shot Skill Adaptation	Ruihan Zhao et.al.	2510.16617	translate	read	null
2025-10-18	RefAtomNet++: Advancing Referring Atomic Video Action Recognition using Semantic Retrieval based Multi-Trajectory Mamba	Kunyu Peng et.al.	2510.16444	translate	read	null
2025-10-17	StretchySnake: Flexible SSM Training Unlocks Action Recognition Across Spatio-Temporal Scales	Nyle Siddiqui et.al.	2510.16209	translate	read	null
2025-10-17	MAVR-Net: Robust Multi-View Learning for MAV Action Recognition with Cross-View Attention	Nengbo Zhang et.al.	2510.15448	translate	read	null
2025-10-16	MoCom: Motion-based Inter-MAV Visual Communication Using Event Vision and Spiking Neural Networks	Zhang Nengbo et.al.	2510.14770	translate	read	null
2025-10-15	Generalizing WiFi Gesture Recognition via Large-Model-Aware Semantic Distillation and Alignment	Feng-Qi Cui et.al.	2510.13390	translate	read	null
2025-10-14	SVAG-Bench: A Large-Scale Benchmark for Multi-Instance Spatio-temporal Video Action Grounding	Tanveer Hannan et.al.	2510.13016	translate	read	null
2025-10-13	FOSSIL: Harnessing Feedback on Suboptimal Samples for Data-Efficient Generalisation with Imitation Learning for Embodied Vision-and-Language Tasks	Sabrina McCallum et.al.	2510.11307	translate	read	null
2025-10-13	Mixup Helps Understanding Multimodal Video Better	Xiaoyu Ma et.al.	2510.10986	translate	read	null
2025-10-12	MSF-Mamba: Motion-aware State Fusion Mamba for Efficient Micro-Gesture Recognition	Deng Li et.al.	2510.10478	translate	read	null
2025-10-11	Dejavu: Towards Experience Feedback Learning for Embodied Intelligence	Shaokai Wu et.al.	2510.10181	translate	read	null
2025-10-11	SyncLipMAE: Contrastive Masked Pretraining for Audio-Visual Talking-Face Representation	Zeyu Ling et.al.	2510.10069	translate	read	null
2025-10-10	Enhancing Diffusion Policy with Classifier-Free Guidance for Temporal Robotic Tasks	Yuang Lu et.al.	2510.09786	translate	read	null
2025-10-10	Diagnosing Shoulder Disorders Using Multimodal Large Language Models and Consumer-Grade Cameras	Jindong Hong et.al.	2510.09230	translate	read	null
2025-10-09	Video-STAR: Reinforcing Open-Vocabulary Action Recognition with Tools	Zhenlong Yuan et.al.	2510.08480	translate	read	null
2025-10-09	MMHOI: Modeling Complex 3D Multi-Human Multi-Object Interactions	Kaen Kogashi et.al.	2510.07828	translate	read	null
2025-10-07	Mitigating Surgical Data Imbalance with Dual-Prediction Video Diffusion Model	Danush Kumar Venkatesh et.al.	2510.07345	translate	read	null
2025-10-08	Customer-R1: Personalized Simulation of Human Behaviors via RL-based LLM Agent in Online Shopping	Ziyi Wang et.al.	2510.07230	translate	read	null
2025-10-08	TrackVLA++: Unleashing Reasoning and Memory Capabilities in VLA Models for Embodied Visual Tracking	Jiahang Liu et.al.	2510.07134	translate	read	null
2025-10-07	From Captions to Keyframes: KeyScore for Multimodal Frame Scoring and Video-Language Understanding	Shih-Yao Lin et.al.	2510.06509	translate	read	null
2025-10-07	Human Action Recognition from Point Clouds over Time	James Dickens et.al.	2510.05506	translate	read	null
2025-10-05	Speculative Actions: A Lossless Framework for Faster Agentic Systems	Naimeng Ye et.al.	2510.04371	translate	read	null
2025-10-04	Talking Tennis: Language Feedback from 3D Biomechanical Action Recognition	Arushi Dashore et.al.	2510.03921	translate	read	null
2025-10-04	MonitorVLM: A Vision Language Framework for Safety Violation Detection in Mining Operations	Jiang Wu et.al.	2510.03666	translate	read	null
2025-10-03	FocusAgent: Simple Yet Effective Ways of Trimming the Large Context of Web Agents	Imene Kerboua et.al.	2510.03204	translate	read	null
2025-10-02	Wearable and Ultra-Low-Power Fusion of EMG and A-Mode US for Hand-Wrist Kinematic Tracking	Giusy Spacone et.al.	2510.02000	translate	read	null
2025-10-02	Contrastive Representation Regularization for Vision-Language-Action Models	Taeyoung Kim et.al.	2510.01711	translate	read	null
2025-10-01	EvoStruggle: A Dataset Capturing the Evolution of Struggle across Activities and Skill Levels	Shijia Feng et.al.	2510.01362	translate	read	null
2025-10-01	HAMLET: Switch your Vision-Language-Action Model into a History-Aware Policy	Myungkyu Koo et.al.	2510.00695	translate	read	null
2025-10-01	Expandable Decision-Making States for Multi-Agent Deep Reinforcement Learning in Soccer Tactical Analysis	Kenjiro Ide et.al.	2510.00480	translate	read	null