Action Recognition - 2025-03
| Publish Date | Title | Authors | arXiv | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2025-03-30 | CA^2ST: Cross-Attention in Audio, Space, and Time for Holistic Video Recognition | Jongseo Lee et al. | 2503.23447 | translate | read | null |
| 2025-03-30 | OwlSight: A Robust Illumination Adaptation Framework for Dark Video Human Action Recognition | Shihao Cheng et al. | 2503.23266 | translate | read | null |
| 2025-03-29 | Action Recognition in Real-World Ambient Assisted Living Environment | Vincent Gbouna Zakka et al. | 2503.23214 | translate | read | link |
| 2025-03-28 | ForcePose: A Deep Learning Approach for Force Calculation Based on Action Recognition Using MediaPipe Pose Estimation Combined with Object Detection | Nandakishor M et al. | 2503.22363 | translate | read | null |
| 2025-03-30 | UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning | Zhengxi Lu et al. | 2503.21620 | translate | read | link |
| 2025-03-27 | One Snapshot is All You Need: A Generalized Method for mmWave Signal Generation | Teng Huang et al. | 2503.21122 | translate | read | null |
| 2025-03-26 | ScreenLLM: Stateful Screen Schema for Efficient Action Understanding and Prediction | Yiqiao Jin et al. | 2503.20978 | translate | read | null |
| 2025-03-26 | Siformer: Feature-isolated Transformer for Efficient Skeleton-based Sign Language Recognition | Muxin Pu et al. | 2503.20436 | translate | read | null |
| 2025-03-25 | Surg-3M: A Dataset and Foundation Model for Perception in Surgical Settings | Chengan Che et al. | 2503.19740 | translate | read | link |
| 2025-03-25 | fine-CLIP: Enhancing Zero-Shot Fine-Grained Surgical Action Recognition with Vision-Language Models | Saurav Sharma et al. | 2503.19670 | translate | read | null |
| 2025-03-24 | LLaVAction: evaluating and training multi-modal large language models for action recognition | Shaokai Ye et al. | 2503.18712 | translate | read | link |
| 2025-03-24 | Surgical Action Planning with Large Language Models | Mengya Xu et al. | 2503.18296 | translate | read | null |
| 2025-03-27 | Temporal-Guided Spiking Neural Networks for Event-Based Human Action Recognition | Siyuan Yang et al. | 2503.17132 | translate | read | null |
| 2025-03-21 | BEAC: Imitating Complex Exploration and Task-oriented Behaviors for Invisible Object Nonprehensile Manipulation | Hirotaka Tahara et al. | 2503.16803 | translate | read | null |
| 2025-03-21 | Improving mmWave based Hand Hygiene Monitoring through Beam Steering and Combining Techniques | Isura Nirmal et al. | 2503.16764 | translate | read | null |
| 2025-03-19 | A Comprehensive Survey on Architectural Advances in Deep CNNs: Challenges, Applications, and Emerging Research Directions | Saddam Hussain Khan et al. | 2503.16546 | translate | read | null |
| 2025-03-25 | Deep learning framework for action prediction reveals multi-timescale locomotor control | Wei-Chen Wang et al. | 2503.16340 | translate | read | null |
| 2025-03-19 | UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction | Shravan Nayak et al. | 2503.15661 | translate | read | null |
| 2025-03-19 | Multi-Modal Gesture Recognition from Video and Surgical Tool Pose Information via Motion Invariants | Jumanh Atoum et al. | 2503.15647 | translate | read | null |
| 2025-03-21 | Body-Hand Modality Expertized Networks with Cross-attention for Fine-grained Skeleton Action Recognition | Seungyeon Cho et al. | 2503.14960 | translate | read | null |
| 2025-03-19 | DPFlow: Adaptive Optical Flow Estimation with a Dual-Pyramid Framework | Henrique Morimitsu et al. | 2503.14880 | translate | read | link |
| 2025-03-15 | Salient Temporal Encoding for Dynamic Scene Graph Generation | Zhihao Zhu et al. | 2503.14524 | translate | read | null |
| 2025-03-17 | Towards Scalable Modeling of Compressed Videos for Efficient Action Recognition | Shristi Das Biswas et al. | 2503.13724 | translate | read | null |
| 2025-03-20 | STEP: Simultaneous Tracking and Estimation of Pose for Animals and Humans | Shashikant Verma et al. | 2503.13344 | translate | read | null |
| 2025-03-17 | Dense Policy: Bidirectional Autoregressive Learning of Actions | Yue Su et al. | 2503.13217 | translate | read | null |
| 2025-03-16 | EgoEvGesture: Gesture Recognition Based on Egocentric Event Camera | Luming Wang et al. | 2503.12419 | translate | read | link |
| 2025-03-16 | ProbDiffFlow: An Efficient Learning-Free Framework for Probabilistic Single-Image Optical Flow Estimation | Mo Zhou et al. | 2503.12348 | translate | read | null |
| 2025-03-15 | Real-Time Manipulation Action Recognition with a Factorized Graph Sequence Encoder | Enes Erdogan et al. | 2503.12034 | translate | read | null |
| 2025-03-14 | Enhancing Hand Palm Motion Gesture Recognition by Eliminating Reference Frame Bias via Frame-Invariant Similarity Measures | Arno Verduyn et al. | 2503.11352 | translate | read | null |
| 2025-03-14 | Aerial Vision-and-Language Navigation with Grid-based View Selection and Map Construction | Ganlong Zhao et al. | 2503.11091 | translate | read | null |
| 2025-03-14 | VA-AR: Learning Velocity-Aware Action Representations with Mixture of Window Attention | Jiangning Wei et al. | 2503.11004 | translate | read | null |
| 2025-03-13 | Spatial-Temporal Graph Diffusion Policy with Kinematic Modeling for Bimanual Robotic Manipulation | Qi Lv et al. | 2503.10743 | translate | read | null |
| 2025-03-11 | Open-World Skill Discovery from Unsegmented Demonstrations | Jingwen Deng et al. | 2503.10684 | translate | read | link |
| 2025-03-17 | HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model | Jiaming Liu et al. | 2503.10631 | translate | read | null |
| 2025-03-13 | SurgRAW: Multi-Agent Workflow with Chain-of-Thought Reasoning for Surgical Intelligence | Chang Han Low et al. | 2503.10265 | translate | read | null |
| 2025-03-12 | A Hybrid Neural Network with Smart Skip Connections for High-Precision, Low-Latency EMG-Based Hand Gesture Recognition | Hafsa Wazir et al. | 2503.09041 | translate | read | null |
| 2025-03-12 | Unified Locomotion Transformer with Simultaneous Sim-to-Real Transfer for Quadrupeds | Dikai Liu et al. | 2503.08997 | translate | read | null |
| 2025-03-11 | PromptGAR: Flexible Promptive Group Activity Recognition | Zhangyu Jin et al. | 2503.08933 | translate | read | null |
| 2025-03-11 | MetaFold: Language-Guided Multi-Category Garment Folding Framework via Trajectory Generation and Foundation Model | Haonan Chen et al. | 2503.08372 | translate | read | null |
| 2025-03-11 | A Survey on Wi-Fi Sensing Generalizability: Taxonomy, Techniques, Datasets, and Future Research Prospects | Fei Wang et al. | 2503.08008 | translate | read | null |
| 2025-03-10 | Helios 2.0: A Robust, Ultra-Low Power Gesture Recognition System Optimised for Event-Sensor based Wearables | Prarthana Bhattacharyya et al. | 2503.07825 | translate | read | null |
| 2025-03-10 | Elderly Activity Recognition in the Wild: Results from the EAR Challenge | Anh-Kiet Duong et al. | 2503.07821 | translate | read | link |
| 2025-03-09 | TimeLoc: A Unified End-to-End Framework for Precise Timestamp Localization in Long Videos | Chen-Lin Zhang et al. | 2503.06526 | translate | read | link |
| 2025-03-09 | SGA-INTERACT: A 3D Skeleton-based Benchmark for Group Activity Understanding in Modern Basketball Tactic | Yuchen Yang et al. | 2503.06522 | translate | read | link |
| 2025-03-07 | MPTSNet: Integrating Multiscale Periodic Local Patterns and Global Dependencies for Multivariate Time Series Classification | Yang Mu et al. | 2503.05582 | translate | read | null |
| 2025-03-07 | Multi-Grained Feature Pruning for Video-Based Human Pose Estimation | Zhigang Wang et al. | 2503.05365 | translate | read | null |
| 2025-03-06 | Maestro: A 302 GFLOPS/W and 19.8GFLOPS RISC-V Vector-Tensor Architecture for Wearable Ultrasound Edge Computing | Mattia Sinigaglia et al. | 2503.04581 | translate | read | null |
| 2025-03-06 | Gate-Shift-Pose: Enhancing Action Recognition in Sports with Skeleton Information | Edoardo Bianchi et al. | 2503.04470 | translate | read | link |
| 2025-03-06 | Spatial-Temporal Perception with Causal Inference for Naturalistic Driving Action Recognition | Qing Chang et al. | 2503.04078 | translate | read | null |
| 2025-03-06 | Social Gesture Recognition in spHRI: Leveraging Fabric-Based Tactile Sensing on Humanoid Robots | Dakarai Crowder et al. | 2503.03234 | translate | read | null |
| 2025-03-04 | Semi-Supervised Audio-Visual Video Action Recognition with Audio Source Localization Guided Mixup | Seokun Kang et al. | 2503.02284 | translate | read | null |
| 2025-03-04 | FABG : End-to-end Imitation Learning for Embodied Affective Human-Robot Interaction | Yanghai Zhang et al. | 2503.01363 | translate | read | null |
| 2025-03-04 | An Efficient 3D Convolutional Neural Network with Channel-wise, Spatial-grouped, and Temporal Convolutions | Zhe Wang et al. | 2503.00796 | translate | read | null |
| 2025-03-02 | One-Shot Gesture Recognition for Underwater Diver-To-Robot Communication | Rishikesh Joshi et al. | 2503.00676 | translate | read | null |
| 2025-03-04 | Unified Video Action Model | Shuang Li et al. | 2503.00200 | translate | read | link |