Action Recognition - 2025-11
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2025-11-29 | Developing Fairness-Aware Task Decomposition to Improve Equity in Post-Spinal Fusion Complication Prediction | Yining Yuan et.al. | 2512.00598 | translate | read | null |
| 2025-11-29 | Integrating Skeleton Based Representations for Robust Yoga Pose Classification Using Deep Learning Models | Mohammed Mohiuddin et.al. | 2512.00572 | translate | read | null |
| 2025-11-28 | LatBot: Distilling Universal Latent Actions for Vision-Language-Action Models | Zuolei Li et.al. | 2511.23034 | translate | read | null |
| 2025-11-27 | SkeletonAgent: An Agentic Interaction Framework for Skeleton-based Action Recognition | Hongda Liu et.al. | 2511.22433 | translate | read | null |
| 2025-11-27 | HandyLabel: Towards Post-Processing to Real-Time Annotation Using Skeleton Based Hand Gesture Recognition | Sachin Kumar Singh et.al. | 2511.22337 | translate | read | null |
| 2025-11-26 | Attention-Guided Patch-Wise Sparse Adversarial Attacks on Vision-Language-Action Models | Naifu Zhang et.al. | 2511.21663 | translate | read | null |
| 2025-11-26 | Active Learning for GCN-based Action Recognition | Hichem Sahbi et.al. | 2511.21625 | translate | read | null |
| 2025-11-26 | Towards an Effective Action-Region Tracking Framework for Fine-grained Video Action Recognition | Baoli Sun et.al. | 2511.21202 | translate | read | null |
| 2025-11-24 | Scale What Counts, Mask What Matters: Evaluating Foundation Models for Zero-Shot Cross-Domain Wi-Fi Sensing | Cheng Jiang et.al. | 2511.18792 | translate | read | null |
| 2025-11-22 | ActDistill: General Action-Guided Self-Derived Distillation for Efficient Vision-Language-Action Models | Wencheng Ye et.al. | 2511.18082 | translate | read | null |
| 2025-11-21 | Label-Efficient Skeleton-based Recognition with Stable-Invertible Graph Convolutional Networks | Hichem Sahbi et.al. | 2511.17345 | translate | read | null |
| 2025-11-21 | Social-Media Based Personas Challenge: Hybrid Prediction of Common and Rare User Actions on Bluesky | Benjamin White et.al. | 2511.17241 | translate | read | null |
| 2025-11-21 | VLA-4D: Embedding 4D Awareness into Vision-Language-Action Models for SpatioTemporally Coherent Robotic Manipulation | Hanyu Zhou et.al. | 2511.17199 | translate | read | null |
| 2025-11-21 | Progress-Think: Semantic Progress Reasoning for Vision-Language Navigation | Shuo Wang et.al. | 2511.17097 | translate | read | null |
| 2025-11-21 | H-GAR: A Hierarchical Interaction Framework via Goal-Driven Observation-Action Refinement for Robotic Manipulation | Yijie Zhu et.al. | 2511.17079 | translate | read | null |
| 2025-11-21 | The Wireless Charger as a Gesture Sensor: A Novel Approach to Ubiquitous Interaction | Weiyi Wang et.al. | 2511.16989 | translate | read | null |
| 2025-11-21 | Parts-Mamba: Augmenting Joint Context with Part-Level Scanning for Occluded Human Skeleton | Tianyi Shen et.al. | 2511.16860 | translate | read | null |
| 2025-11-20 | BoxingVI: A Multi-Modal Benchmark for Boxing Action Recognition and Localization | Rahul Kumar et.al. | 2511.16524 | translate | read | null |
| 2025-11-20 | FOOTPASS: A Multi-Modal Multi-Agent Tactical Context Dataset for Play-by-Play Action Spotting in Soccer Broadcast Videos | Jeremie Ochin et.al. | 2511.16183 | translate | read | null |
| 2025-11-19 | Scriboora: Rethinking Human Pose Forecasting | Daniel Bermuth et.al. | 2511.15565 | translate | read | null |
| 2025-11-18 | DoGCLR: Dominance-Game Contrastive Learning Network for Skeleton-Based Action Recognition | Yanshan Li et.al. | 2511.14179 | translate | read | null |
| 2025-11-18 | A Machine Learning-Based Multimodal Framework for Wearable Sensor-Based Archery Action Recognition and Stress Estimation | Xianghe Liu et.al. | 2511.14057 | translate | read | null |
| 2025-11-17 | Computer Vision based group activity detection and action spotting | Narthana Sivalingam et.al. | 2511.13315 | translate | read | null |
| 2025-11-17 | MGCA-Net: Multi-Grained Category-Aware Network for Open-Vocabulary Temporal Action Localization | Zhenying Fang et.al. | 2511.13039 | translate | read | null |
| 2025-11-17 | View-aware Cross-modal Distillation for Multi-view Action Recognition | Trung Thanh Nguyen et.al. | 2511.12870 | translate | read | null |
| 2025-11-16 | Pixels or Positions? Benchmarking Modalities in Group Activity Recognition | Drishya Karki et.al. | 2511.12606 | translate | read | null |
| 2025-11-15 | Locomotion in CAVE: Enhancing Immersion through Full-Body Motion | Xiaohui Li et.al. | 2511.12251 | translate | read | null |
| 2025-11-14 | Rethinking Progression of Memory State in Robotic Manipulation: An Object-Centric Perspective | Nhat Chung et.al. | 2511.11478 | translate | read | null |
| 2025-11-13 | SUGAR: Learning Skeleton Representation with Visual-Motion Knowledge for Action Recognition | Qilang Ye et.al. | 2511.10091 | translate | read | null |
| 2025-11-12 | Revisiting Cross-Architecture Distillation: Adaptive Dual-Teacher Transfer for Lightweight Video Models | Ying Peng et.al. | 2511.09469 | translate | read | null |
| 2025-11-12 | Learning by Neighbor-Aware Semantics, Deciding by Open-form Flows: Towards Robust Zero-Shot Skeleton Action Recognition | Yang Chen et.al. | 2511.09388 | translate | read | null |
| 2025-11-12 | PressTrack-HMR: Pressure-Based Top-Down Multi-Person Global Human Mesh Recovery | Jiayue Yuan et.al. | 2511.09147 | translate | read | null |
| 2025-11-11 | Privacy Beyond Pixels: Latent Anonymization for Privacy-Preserving Video Understanding | Joseph Fioresi et.al. | 2511.08666 | translate | read | null |
| 2025-11-09 | Learning Topology-Driven Multi-Subspace Fusion for Grassmannian Deep Network | Xuan Yu et.al. | 2511.08628 | translate | read | null |
| 2025-11-05 | The chanciness of time | John M. Myers et.al. | 2511.08611 | translate | read | null |
| 2025-11-11 | SASG-DA: Sparse-Aware Semantic-Guided Diffusion Augmentation For Myoelectric Gesture Recognition | Chen Liu et.al. | 2511.08344 | translate | read | null |
| 2025-11-10 | Achieving Effective Virtual Reality Interactions via Acoustic Gesture Recognition based on Large Language Models | Xijie Zhang et.al. | 2511.07085 | translate | read | null |
| 2025-11-10 | Otter: Mitigating Background Distractions of Wide-Angle Few-Shot Action Recognition with Enhanced RWKV | Wenbo Huang et.al. | 2511.06741 | translate | read | null |
| 2025-11-09 | Learning-Based Robust Bayesian Persuasion with Conformal Prediction Guarantees | Heeseung Bang et.al. | 2511.06223 | translate | read | null |
| 2025-11-06 | Grounding Foundational Vision Models with 3D Human Poses for Robust Action Recognition | Nicholas Babey et.al. | 2511.05622 | translate | read | null |
| 2025-11-06 | Pose-Aware Multi-Level Motion Parsing for Action Quality Assessment | Shuaikang Zhu et.al. | 2511.05611 | translate | read | null |
| 2025-11-07 | Accurate online action and gesture recognition system using detectors and Deep SPD Siamese Networks | Mohamed Sanim Akremi et.al. | 2511.05250 | translate | read | null |
| 2025-11-06 | Unified Multimodal Diffusion Forcing for Forceful Manipulation | Zixuan Huang et.al. | 2511.04812 | translate | read | null |
| 2025-11-06 | X-Diffusion: Training Diffusion Policies on Cross-Embodiment Human Demonstrations | Maximus A. Pace et.al. | 2511.04671 | translate | read | null |
| 2025-11-06 | Evo-1: Lightweight Vision-Language-Action Model with Preserved Semantic Alignment | Tao Lin et.al. | 2511.04555 | translate | read | null |
| 2025-11-06 | Alternative Fairness and Accuracy Optimization in Criminal Justice | Shaolong Wu et.al. | 2511.04505 | translate | read | null |
| 2025-11-06 | ThaiOCRBench: A Task-Diverse Benchmark for Vision-Language Understanding in Thai | Surapon Nonesung et.al. | 2511.04479 | translate | read | null |
| 2025-11-06 | Temporal Action Selection for Action Chunking | Yueyang Weng et.al. | 2511.04421 | translate | read | null |
| 2025-11-06 | ForeRobo: Unlocking Infinite Simulation Data for 3D Goal-driven Robotic Manipulation | Dexin Wang et.al. | 2511.04381 | translate | read | null |
| 2025-11-06 | GraSP-VLA: Graph-based Symbolic Action Representation for Long-Horizon Planning with VLA Policies | Maëlic Neau et.al. | 2511.04357 | translate | read | null |
| 2025-11-06 | RCMCL: A Unified Contrastive Learning Framework for Robust Multi-Modal (RGB-D, Skeleton, Point Cloud) Action Understanding | Hasan Akgul et.al. | 2511.04351 | translate | read | null |
| 2025-11-06 | GUI-360$^\circ$: A Comprehensive Dataset and Benchmark for Computer-Using Agents | Jian Mu et.al. | 2511.04307 | translate | read | null |
| 2025-11-06 | Expectation-Realization Interpretation of Quantum Superposition | Yanting Wang et.al. | 2511.04154 | translate | read | null |
| 2025-11-06 | Learning from Online Videos at Inference Time for Computer-Use Agents | Yujian Liu et.al. | 2511.04137 | translate | read | null |
| 2025-11-06 | Unified Effective Field Theory for Nonlinear and Quantum Optics | Xiaochen Liu et.al. | 2511.04118 | translate | read | null |
| 2025-11-06 | Simple 3D Pose Features Support Human and Machine Social Scene Understanding | Wenshuo Qin et.al. | 2511.03988 | translate | read | null |
| 2025-11-06 | Use of Continuous Glucose Monitoring with Machine Learning to Identify Metabolic Subphenotypes and Inform Precision Lifestyle Changes | Ahmed A. Metwally et.al. | 2511.03986 | translate | read | null |
| 2025-11-06 | Temporal Zoom Networks: Distance Regression and Continuous Depth for Efficient Action Localization | Ibne Farabi Shihab et.al. | 2511.03943 | translate | read | null |
| 2025-11-05 | Enhancing Q-Value Updates in Deep Q-Learning via Successor-State Prediction | Lipeng Zu et.al. | 2511.03836 | translate | read | null |
| 2025-11-05 | Krylov Complexity Meets Confinement | Xuhao Jiang et.al. | 2511.03783 | translate | read | null |
| 2025-11-05 | Disentangled Concepts Speak Louder Than Words: Explainable Video Action Recognition | Jongseo Lee et.al. | 2511.03725 | translate | read | null |
| 2025-11-05 | A Lightweight 3D-CNN for Event-Based Human Action Recognition with Privacy-Preserving Potential | Mehdi Sefidgar Dilmaghani et.al. | 2511.03665 | translate | read | null |
| 2025-11-05 | LiveTradeBench: Seeking Real-World Alpha with Large Language Models | Haofei Yu et.al. | 2511.03628 | translate | read | link |
| 2025-11-05 | Learning Communication Skills in Multi-task Multi-agent Deep Reinforcement Learning | Changxi Zhu et.al. | 2511.03348 | translate | read | null |
| 2025-11-05 | Multi-Object Tracking Retrieval with LLaVA-Video: A Training-Free Solution to MOT25-StAG Challenge | Yi Yang et.al. | 2511.03332 | translate | read | null |
| 2025-11-04 | WorldPlanner: Monte Carlo Tree Search and MPC with Action-Conditioned Visual World Models | R. Khorrambakht et.al. | 2511.03077 | translate | read | null |
| 2025-11-04 | The Curved Spacetime of Transformer Architectures | Riccardo Di Sipio et.al. | 2511.03060 | translate | read | null |
| 2025-11-04 | VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation | Kevin Qinghong Lin et.al. | 2511.02778 | translate | read | link |
| 2025-11-04 | Agentic World Modeling for 6G: Near-Real-Time Generative State-Space Reasoning | Farhad Rezazadeh et.al. | 2511.02748 | translate | read | null |
| 2025-11-04 | Radio and Optical Flares on the dMe Flare Star EV Lac | Rachel A. Osten et.al. | 2511.02719 | translate | read | null |
| 2025-11-04 | MVAFormer: RGB-based Multi-View Spatio-Temporal Action Recognition with Transformer | Taiga Yamane et.al. | 2511.02473 | translate | read | null |
| 2025-11-04 | From the Laboratory to Real-World Application: Evaluating Zero-Shot Scene Interpretation on Edge Devices for Mobile Robotics | Nicolas Schuler et.al. | 2511.02427 | translate | read | null |
| 2025-11-03 | Euler-Heisenberg action for fermions coupled to gauge and axial vectors: Hessian diagonalization, sector classification, and applications | Lucas Pereira de Souza et.al. | 2511.02118 | translate | read | null |
| 2025-11-03 | Neural dynamics of cognitive control: Current tensions and future promise | Dale Zhou et.al. | 2511.02063 | translate | read | null |
| 2025-11-03 | Path-Coordinated Continual Learning with Neural Tangent Kernel-Justified Plasticity: A Theoretical Framework with Near State-of-the-Art Performance | Rathin Chandra Shit et.al. | 2511.02025 | translate | read | null |
| 2025-11-03 | Unified Diffusion VLA: Vision-Language-Action Model via Joint Discrete Denoising Diffusion Process | Jiayi Chen et.al. | 2511.01718 | translate | read | null |
| 2025-11-03 | OmniVLA: Physically-Grounded Multimodal VLA with Unified Multi-Sensor Perception for Robotic Manipulation | Heyu Guo et.al. | 2511.01210 | translate | read | null |
| 2025-11-02 | Rhythm in the Air: Vision-based Real-Time Music Generation through Gestures | Barathi Subramanian et.al. | 2511.00793 | translate | read | null |