Action Recognition - 2025-10
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2025-10-14 | MOTION: ML-Assisted On-Device Low-Latency Motion Recognition | Veeramani Pugazhenthi et.al. | 2512.00008 | translate | read | null |
| 2025-10-30 | Alpamayo-R1: Bridging Reasoning and Action Prediction for Generalizable Autonomous Driving in the Long Tail | NVIDIA et.al. | 2511.00088 | translate | read | null |
| 2025-10-31 | Enhancing Spatio-Temporal Zero-shot Action Recognition with Language-driven Description Attributes | Yehna Kim et.al. | 2510.27255 | translate | read | null |
| 2025-10-31 | GUI-Rise: Structured Reasoning and History Summarization for GUI Navigation | Tao Liu et.al. | 2510.27210 | translate | read | null |
| 2025-10-30 | Spiking Patches: Asynchronous, Sparse, and Efficient Tokens for Event Cameras | Christoffer Koo Øhrstrøm et.al. | 2510.26614 | translate | read | null |
| 2025-10-29 | Enhancing Temporal Understanding in Video-LLMs through Stacked Temporal Attention in Vision Encoders | Ali Rasekh et.al. | 2510.26027 | translate | read | null |
| 2025-10-29 | Informative Sample Selection Model for Skeleton-based Action Recognition with Limited Training Samples | Zhigang Tu et.al. | 2510.25345 | translate | read | null |
| 2025-10-27 | Evaluation of Vision-LLMs in Surveillance Video | Pascal Benschop et.al. | 2510.23190 | translate | read | null |
| 2025-10-27 | Enabling Vibration-Based Gesture Recognition on Everyday Furniture via Energy-Efficient FPGA Implementation of 1D Convolutional Networks | Koki Shibata et.al. | 2510.23156 | translate | read | null |
| 2025-10-27 | Neural Recording Power Optimization Through Machine Learning Guided Resolution Reconfiguration | Aviral Pandey et.al. | 2510.22924 | translate | read | null |
| 2025-10-13 | J-ORA: A Framework and Multimodal Dataset for Japanese Object Identification, Reference, Action Prediction in Robot Perception | Jesse Atuhurra et.al. | 2510.21761 | translate | read | null |
| 2025-10-22 | From Forecasting to Planning: Policy World Model for Collaborative State-Action Prediction | Zhida Zhao et.al. | 2510.19654 | translate | read | null |
| 2025-10-22 | Vision-Based Mistake Analysis in Procedural Activities: A Review of Advances and Challenges | Konstantinos Bacharidis et.al. | 2510.19292 | translate | read | null |
| 2025-10-22 | MobiAct: Efficient MAV Action Recognition Using MobileNetV4 with Contrastive Learning and Knowledge Distillation | Zhang Nengbo et.al. | 2510.19273 | translate | read | null |
| 2025-10-22 | See, Think, Act: Online Shopper Behavior Simulation with VLM Agents | Yimeng Zhang et.al. | 2510.19245 | translate | read | null |
| 2025-10-21 | UniHPR: Unified Human Pose Representation via Singular Value Contrastive Learning | Zhongyu Jiang et.al. | 2510.19078 | translate | read | null |
| 2025-10-21 | A Renaissance of Explicit Motion Information Mining from Transformers for Action Recognition | Peiqin Zhuang et.al. | 2510.18705 | translate | read | null |
| 2025-10-21 | Biomechanically consistent real-time action recognition for human-robot interaction | Wanchen Li et.al. | 2510.18373 | translate | read | null |
| 2025-10-21 | FST.ai 2.0: An Explainable AI Ecosystem for Fair, Fast, and Inclusive Decision-Making in Olympic and Paralympic Taekwondo | Keivan Shariatmadar et.al. | 2510.18193 | translate | read | null |
| 2025-10-20 | Muscle Anatomy-aware Geometric Deep Learning for sEMG-based Gesture Decoding | Adyasha Dash et.al. | 2510.17660 | translate | read | null |
| 2025-10-18 | MoS-VLA: A Vision-Language-Action Model with One-Shot Skill Adaptation | Ruihan Zhao et.al. | 2510.16617 | translate | read | null |
| 2025-10-18 | RefAtomNet++: Advancing Referring Atomic Video Action Recognition using Semantic Retrieval based Multi-Trajectory Mamba | Kunyu Peng et.al. | 2510.16444 | translate | read | null |
| 2025-10-17 | StretchySnake: Flexible SSM Training Unlocks Action Recognition Across Spatio-Temporal Scales | Nyle Siddiqui et.al. | 2510.16209 | translate | read | null |
| 2025-10-17 | MAVR-Net: Robust Multi-View Learning for MAV Action Recognition with Cross-View Attention | Nengbo Zhang et.al. | 2510.15448 | translate | read | null |
| 2025-10-16 | MoCom: Motion-based Inter-MAV Visual Communication Using Event Vision and Spiking Neural Networks | Zhang Nengbo et.al. | 2510.14770 | translate | read | null |
| 2025-10-15 | Generalizing WiFi Gesture Recognition via Large-Model-Aware Semantic Distillation and Alignment | Feng-Qi Cui et.al. | 2510.13390 | translate | read | null |
| 2025-10-14 | SVAG-Bench: A Large-Scale Benchmark for Multi-Instance Spatio-temporal Video Action Grounding | Tanveer Hannan et.al. | 2510.13016 | translate | read | null |
| 2025-10-13 | FOSSIL: Harnessing Feedback on Suboptimal Samples for Data-Efficient Generalisation with Imitation Learning for Embodied Vision-and-Language Tasks | Sabrina McCallum et.al. | 2510.11307 | translate | read | null |
| 2025-10-13 | Mixup Helps Understanding Multimodal Video Better | Xiaoyu Ma et.al. | 2510.10986 | translate | read | null |
| 2025-10-12 | MSF-Mamba: Motion-aware State Fusion Mamba for Efficient Micro-Gesture Recognition | Deng Li et.al. | 2510.10478 | translate | read | null |
| 2025-10-11 | Dejavu: Towards Experience Feedback Learning for Embodied Intelligence | Shaokai Wu et.al. | 2510.10181 | translate | read | null |
| 2025-10-11 | SyncLipMAE: Contrastive Masked Pretraining for Audio-Visual Talking-Face Representation | Zeyu Ling et.al. | 2510.10069 | translate | read | null |
| 2025-10-10 | Enhancing Diffusion Policy with Classifier-Free Guidance for Temporal Robotic Tasks | Yuang Lu et.al. | 2510.09786 | translate | read | null |
| 2025-10-10 | Diagnosing Shoulder Disorders Using Multimodal Large Language Models and Consumer-Grade Cameras | Jindong Hong et.al. | 2510.09230 | translate | read | null |
| 2025-10-09 | Video-STAR: Reinforcing Open-Vocabulary Action Recognition with Tools | Zhenlong Yuan et.al. | 2510.08480 | translate | read | null |
| 2025-10-09 | MMHOI: Modeling Complex 3D Multi-Human Multi-Object Interactions | Kaen Kogashi et.al. | 2510.07828 | translate | read | null |
| 2025-10-07 | Mitigating Surgical Data Imbalance with Dual-Prediction Video Diffusion Model | Danush Kumar Venkatesh et.al. | 2510.07345 | translate | read | null |
| 2025-10-08 | Customer-R1: Personalized Simulation of Human Behaviors via RL-based LLM Agent in Online Shopping | Ziyi Wang et.al. | 2510.07230 | translate | read | null |
| 2025-10-08 | TrackVLA++: Unleashing Reasoning and Memory Capabilities in VLA Models for Embodied Visual Tracking | Jiahang Liu et.al. | 2510.07134 | translate | read | null |
| 2025-10-07 | From Captions to Keyframes: KeyScore for Multimodal Frame Scoring and Video-Language Understanding | Shih-Yao Lin et.al. | 2510.06509 | translate | read | null |
| 2025-10-07 | Human Action Recognition from Point Clouds over Time | James Dickens et.al. | 2510.05506 | translate | read | null |
| 2025-10-05 | Speculative Actions: A Lossless Framework for Faster Agentic Systems | Naimeng Ye et.al. | 2510.04371 | translate | read | null |
| 2025-10-04 | Talking Tennis: Language Feedback from 3D Biomechanical Action Recognition | Arushi Dashore et.al. | 2510.03921 | translate | read | null |
| 2025-10-04 | MonitorVLM: A Vision Language Framework for Safety Violation Detection in Mining Operations | Jiang Wu et.al. | 2510.03666 | translate | read | null |
| 2025-10-03 | FocusAgent: Simple Yet Effective Ways of Trimming the Large Context of Web Agents | Imene Kerboua et.al. | 2510.03204 | translate | read | null |
| 2025-10-02 | Wearable and Ultra-Low-Power Fusion of EMG and A-Mode US for Hand-Wrist Kinematic Tracking | Giusy Spacone et.al. | 2510.02000 | translate | read | null |
| 2025-10-02 | Contrastive Representation Regularization for Vision-Language-Action Models | Taeyoung Kim et.al. | 2510.01711 | translate | read | null |
| 2025-10-01 | EvoStruggle: A Dataset Capturing the Evolution of Struggle across Activities and Skill Levels | Shijia Feng et.al. | 2510.01362 | translate | read | null |
| 2025-10-01 | HAMLET: Switch your Vision-Language-Action Model into a History-Aware Policy | Myungkyu Koo et.al. | 2510.00695 | translate | read | null |
| 2025-10-01 | Expandable Decision-Making States for Multi-Agent Deep Reinforcement Learning in Soccer Tactical Analysis | Kenjiro Ide et.al. | 2510.00480 | translate | read | null |