Action Recognition - 2025-12
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2025-12-31 | Spatial4D-Bench: A Versatile 4D Spatial Intelligence Benchmark | Pan Wang et.al. | 2601.00092 | translate | read | null |
| 2025-12-31 | FineTec: Fine-Grained Action Recognition Under Temporal Corruption via Skeleton Decomposition and Sequence Completion | Dian Shao et.al. | 2512.25067 | translate | read | null |
| 2025-12-31 | VLN-MME: Diagnosing MLLMs as Language-guided Visual Navigation agents | Xunyi Zhao et.al. | 2512.24851 | translate | read | null |
| 2025-12-30 | AI-Driven Evaluation of Surgical Skill via Action Recognition | Yan Meng et.al. | 2512.24411 | translate | read | null |
| 2025-12-29 | Lifelong Domain Adaptive 3D Human Pose Estimation | Qucheng Peng et.al. | 2512.23860 | translate | read | null |
| 2025-12-29 | Act2Goal: From World Model To General Goal-conditioned Policy | Pengfei Zhou et.al. | 2512.23541 | translate | read | null |
| 2025-12-29 | Multi-Track Multimodal Learning on iMiGUE: Micro-Gesture and Emotion Recognition | Arman Martirosyan et.al. | 2512.23291 | translate | read | null |
| 2025-12-27 | Autoregressive Flow Matching for Motion Prediction | Johnathan Xie et.al. | 2512.22688 | translate | read | null |
| 2025-12-27 | Clutter-Resistant Vision-Language-Action Models through Object-Centric and Geometry Grounding | Khoa Vo et.al. | 2512.22519 | translate | read | null |
| 2025-12-22 | Signal-SGN++: Topology-Enhanced Time-Frequency Spiking Graph Network for Skeleton-Based Action Recognition | Naichuan Zheng et.al. | 2512.22214 | translate | read | null |
| 2025-12-26 | Patch as Node: Human-Centric Graph Representation Learning for Multimodal Action Recognition | Zeyu Liang et.al. | 2512.21916 | translate | read | null |
| 2025-12-24 | EVE: A Generator-Verifier System for Generative Policies | Yusuf Ali et.al. | 2512.21430 | translate | read | null |
| 2025-12-24 | ElfCore: A 28nm Neural Processor Enabling Dynamic Structured Sparse Training and Online Self-Supervised Learning with Activity-Dependent Weight Update | Zhe Su et.al. | 2512.21153 | translate | read | null |
| 2025-12-23 | Bridging Modalities and Transferring Knowledge: Enhanced Multimodal Understanding and Recognition | Gorjan Radevski et.al. | 2512.20501 | translate | read | null |
| 2025-12-23 | Beyond Motion Pattern: An Empirical Study of Physical Forces for Human Motion Understanding | Anh Dao et.al. | 2512.20451 | translate | read | null |
| 2025-12-23 | DETACH : Decomposed Spatio-Temporal Alignment for Exocentric Video and Ambient Sensors with Staged Learning | Junho Yoon et.al. | 2512.20409 | translate | read | null |
| 2025-12-23 | Effect of Activation Function and Model Optimizer on the Performance of Human Activity Recognition System Using Various Deep Learning Models | Subrata Kumer Paula et.al. | 2512.20104 | translate | read | null |
| 2025-12-23 | A Contextual Analysis of Driver-Facing and Dual-View Video Inputs for Distraction Detection in Naturalistic Driving Environments | Anthony Dontoh et.al. | 2512.20025 | translate | read | null |
| 2025-12-22 | Distinguishing Visually Similar Actions: Prompt-Guided Semantic Prototype Modulation for Few-Shot Action Recognition | Xiaoyang Li et.al. | 2512.19036 | translate | read | null |
| 2025-12-21 | Context-Aware Network Based on Multi-scale Spatio-temporal Attention for Action Recognition in Videos | Xiaoyang Li et.al. | 2512.18750 | translate | read | null |
| 2025-12-21 | Hierarchical Bayesian Framework for Multisource Domain Adaptation | Alexander M. Glandon et.al. | 2512.18553 | translate | read | null |
| 2025-12-17 | Seeing Beyond the Scene: Analyzing and Mitigating Background Bias in Action Recognition | Ellie Zhou et.al. | 2512.17953 | translate | read | null |
| 2025-12-19 | Xiaomi MiMo-VL-Miloco Technical Report | Jiaze Li et.al. | 2512.17436 | translate | read | null |
| 2025-12-18 | OMG-Bench: A New Challenging Benchmark for Skeleton-based Online Micro Hand Gesture Recognition | Haochen Chang et.al. | 2512.16727 | translate | read | null |
| 2025-12-18 | Skeleton-Snippet Contrastive Learning with Multiscale Feature Fusion for Action Localization | Qiushuo Cheng et.al. | 2512.16504 | translate | read | null |
| 2025-12-06 | Smart Surveillance: Identifying IoT Device Behaviours using ML-Powered Traffic Analysis | Reza Ryan et.al. | 2512.13709 | translate | read | null |
| 2025-12-15 | Recurrent Video Masked Autoencoders | Daniel Zoran et.al. | 2512.13684 | translate | read | null |
| 2025-12-14 | StegaVAR: Privacy-Preserving Video Action Recognition via Steganographic Domain Analysis | Lixin Chen et.al. | 2512.12586 | translate | read | null |
| 2025-12-13 | From Human Intention to Action Prediction: A Comprehensive Benchmark for Intention-driven End-to-End Autonomous Driving | Huan Zheng et.al. | 2512.12302 | translate | read | null |
| 2025-12-12 | DynaPURLS: Dynamic Refinement of Part-aware Representations for Skeleton-based Zero-Shot Action Recognition | Jingmin Zhu et.al. | 2512.11941 | translate | read | null |
| 2025-12-05 | Explainable Adversarial-Robust Vision-Language-Action Model for Robotic Manipulation | Ju-Young Kim et.al. | 2512.11865 | translate | read | null |
| 2025-12-12 | TSkel-Mamba: Temporal Dynamic Modeling via State Space Model for Human Skeleton-based Action Recognition | Yanan Liu et.al. | 2512.11503 | translate | read | null |
| 2025-12-12 | Boosting Skeleton-based Zero-Shot Action Recognition with Training-Free Test-Time Adaptation | Jingmin Zhu et.al. | 2512.11458 | translate | read | null |
| 2025-12-12 | Task-Specific Distance Correlation Matching for Few-Shot Action Recognition | Fei Long et.al. | 2512.11340 | translate | read | null |
| 2025-12-12 | Breast-Rehab: A Postoperative Breast Cancer Rehabilitation Training Assessment System Based on Human Action Recognition | Zikang Chen et.al. | 2512.11245 | translate | read | null |
| 2025-12-12 | Multi-task Learning with Extended Temporal Shift Module for Temporal Action Localization | Anh-Kiet Duong et.al. | 2512.11189 | translate | read | null |
| 2025-12-11 | Deep Photonic Reservoir Computing with On-chip Nonlinearity | Jinlong Xiang et.al. | 2512.10626 | translate | read | null |
| 2025-12-11 | Lang2Motion: Bridging Language and Motion through Joint Embedding Spaces | Bishoy Galoaa et.al. | 2512.10617 | translate | read | null |
| 2025-12-11 | Lies We Can Trust: Quantifying Action Uncertainty with Inaccurate Stochastic Dynamics through Conformalized Nonholonomic Lie Groups | Luís Marques et.al. | 2512.10294 | translate | read | null |
| 2025-12-10 | GLaD: Geometric Latent Distillation for Vision-Language-Action Models | Minghao Guo et.al. | 2512.09619 | translate | read | null |
| 2025-12-09 | Neural Ordinary Differential Equations for Simulating Metabolic Pathway Dynamics from Time-Series Multiomics Data | Udesh Habaraduwa et.al. | 2512.08732 | translate | read | null |
| 2025-12-09 | Aerial Vision-Language Navigation with a Unified Framework for Spatial, Temporal and Embodied Reasoning | Huilin Xu et.al. | 2512.08639 | translate | read | null |
| 2025-12-09 | Mind to Hand: Purposeful Robotic Control via Embodied Reasoning | Peijun Tang et.al. | 2512.08580 | translate | read | null |
| 2025-12-08 | A Comparative Study of EMG- and IMU-based Gesture Recognition at the Wrist and Forearm | Soroush Baghernezhad et.al. | 2512.07997 | translate | read | null |
| 2025-12-08 | Improving action classification with brain-inspired deep networks | Aidas Aglinskas et.al. | 2512.07729 | translate | read | null |
| 2025-12-08 | A Large-Scale Multimodal Dataset and Benchmarks for Human Activity Scene Understanding and Reasoning | Siyang Jiang et.al. | 2512.07136 | translate | read | null |
| 2025-12-07 | VideoVLA: Video Generators Can Be Generalizable Robot Manipulators | Yichao Shen et.al. | 2512.06963 | translate | read | null |
| 2025-12-04 | Towards Adaptive Fusion of Multimodal Deep Networks for Human Action Recognition | Novanto Yudistira et.al. | 2512.04943 | translate | read | null |
| 2025-12-04 | CIG-MAE: Cross-Modal Information-Guided Masked Autoencoder for Self-Supervised WiFi Sensing | Gang Liu et.al. | 2512.04723 | translate | read | null |
| 2025-12-04 | WiFi-based Cross-Domain Gesture Recognition Using Attention Mechanism | Ruijing Liu et.al. | 2512.04521 | translate | read | null |
| 2025-12-03 | Heatmap Pooling Network for Action Recognition from RGB Videos | Mengyuan Liu et.al. | 2512.03837 | translate | read | null |
| 2025-12-02 | SAM2Grasp: Resolve Multi-modal Grasping via Prompt-conditioned Temporal Action Prediction | Shengkai Wu et.al. | 2512.02609 | translate | read | null |
| 2025-12-01 | TBT-Former: Learning Temporal Boundary Distributions for Action Localization | Thisara Rathnayaka et.al. | 2512.01298 | translate | read | null |