Action Recognition - 2025-06

| Publish Date | Title | Authors | PDF | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2025-06-30 | LineRetriever: Planning-Aware Observation Reduction for Web Agents | Imene Kerboua et al. | 2507.00210 | translate | read | null |
| 2025-06-30 | Online Human Action Detection during Escorting | Siddhartha Mondal et al. | 2506.23573 | translate | read | null |
| 2025-06-29 | DEL: Dense Event Localization for Multi-modal Audio-Visual Understanding | Mona Ahmadian et al. | 2506.23196 | translate | read | null |
| 2025-06-27 | Frequency-Semantic Enhanced Variational Autoencoder for Zero-Shot Skeleton-based Action Recognition | Wenhan Wu et al. | 2506.22179 | translate | read | null |
| 2025-06-26 | WorldVLA: Towards Autoregressive Action World Model | Jun Cen et al. | 2506.21539 | translate | read | link |
| 2025-06-26 | EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception | Sanjoy Chowdhury et al. | 2506.21080 | translate | read | null |
| 2025-06-25 | How do Foundation Models Compare to Skeleton-Based Approaches for Gesture Recognition in Human-Robot Interaction? | Stephanie Käs et al. | 2506.20795 | translate | read | null |
| 2025-06-25 | CARMA: Context-Aware Situational Grounding of Human-Robot Group Interactions by Combining Vision-Language Models with Object and Action Recognition | Joerg Deigmoeller et al. | 2506.20373 | translate | read | null |
| 2025-06-25 | Feature Hallucination for Self-supervised Action Recognition | Lei Wang et al. | 2506.20342 | translate | read | null |
| 2025-06-27 | ReactEMG: Zero-Shot, Low-Latency Intent Detection via sEMG | Runsheng Wang et al. | 2506.19815 | translate | read | null |
| 2025-06-24 | Self-Paced Collaborative and Adversarial Network for Unsupervised Domain Adaptation | Weichen Zhang et al. | 2506.19267 | translate | read | null |
| 2025-06-23 | Including Semantic Information via Word Embeddings for Skeleton-based Action Recognition | Dustin Aganian et al. | 2506.18721 | translate | read | null |
| 2025-06-23 | Improving Weakly Supervised Temporal Action Localization by Exploiting Multi-resolution Information in Temporal Domain | Rui Su et al. | 2506.18261 | translate | read | null |
| 2025-06-23 | Robot Tactile Gesture Recognition Based on Full-body Modular E-skin | Shuo Jiang et al. | 2506.18256 | translate | read | null |
| 2025-06-22 | Adapting Vision-Language Models for Evaluating World Models | Mariya Hendriksen et al. | 2506.17967 | translate | read | null |
| 2025-06-21 | Domain Generalization using Action Sequences for Egocentric Action Recognition | Amirshayan Nasirimajd et al. | 2506.17685 | translate | read | null |
| 2025-06-20 | Wi-Fi Sensing Tool Release: Gathering 802.11ax Channel State Information from a Commercial Wi-Fi Access Point | Zisheng Wang et al. | 2506.16957 | translate | read | null |
| 2025-06-20 | Language-driven Description Generation and Common Sense Reasoning for Video Action Recognition | Xiaodan Hu et al. | 2506.16701 | translate | read | null |
| 2025-06-19 | CLIP-MG: Guiding Semantic Attention with Skeletal Pose Features and RGB Data for Micro-Gesture Recognition on the iMiGUE Dataset | Santosh Patapati et al. | 2506.16385 | translate | read | null |
| 2025-06-18 | Accessible Gesture-Driven Augmented Reality Interaction System | Yikan Wang et al. | 2506.15189 | translate | read | null |
| 2025-06-17 | CDP: Towards Robust Autoregressive Visuomotor Policy Learning via Causal Diffusion | Jiahua Ma et al. | 2506.14769 | translate | read | null |
| 2025-06-16 | Leveraging Vision-Language Pre-training for Human Activity Recognition in Still Images | Cristina Mahanta et al. | 2506.13458 | translate | read | null |
| 2025-06-16 | Active Multimodal Distillation for Few-shot Action Recognition | Weijia Feng et al. | 2506.13322 | translate | read | null |
| 2025-06-16 | Action Dubber: Timing Audible Actions via Inflectional Flow | Wenlong Wan et al. | 2506.13320 | translate | read | null |
| 2025-06-15 | Towards Fine-Grained Emotion Understanding via Skeleton-Based Micro-Gesture Recognition | Hao Xu et al. | 2506.12848 | translate | read | null |
| 2025-06-13 | Pose Matters: Evaluating Vision Transformers and CNNs for Human Action Recognition on Small COCO Subsets | MingZe Tang et al. | 2506.11678 | translate | read | null |
| 2025-06-12 | GynSurg: A Comprehensive Gynecology Laparoscopic Surgery Dataset | Sahar Nasirihaghighi et al. | 2506.11356 | translate | read | null |
| 2025-06-12 | WaveFormer: A Lightweight Transformer Model for sEMG-based Gesture Recognition | Yanlong Chen et al. | 2506.11168 | translate | read | null |
| 2025-06-11 | SLRNet: A Real-Time LSTM-Based Sign Language Recognition System | Sharvari Kamble et al. | 2506.11154 | translate | read | link |
| 2025-06-10 | Gender Fairness of Machine Learning Algorithms for Pain Detection | Dylan Green et al. | 2506.11132 | translate | read | null |
| 2025-06-12 | Eye, Robot: Learning to Look to Act with a BC-RL Perception-Action Loop | Justin Kerr et al. | 2506.10968 | translate | read | null |
| 2025-06-11 | HopaDIFF: Holistic-Partial Aware Fourier Conditioned Diffusion for Referring Human Action Segmentation in Multi-Person Scenarios | Kunyu Peng et al. | 2506.09650 | translate | read | link |
| 2025-06-11 | Time-Unified Diffusion Policy with Action Discrimination for Robotic Manipulation | Ye Niu et al. | 2506.09422 | translate | read | null |
| 2025-06-11 | Synthetic Human Action Video Data Generation with Pose Transfer | Vaclav Knapp et al. | 2506.09411 | translate | read | null |
| 2025-06-11 | An Effective End-to-End Solution for Multimodal Action Recognition | Songping Wang et al. | 2506.09345 | translate | read | null |
| 2025-06-10 | Diver-Robot Communication Dataset for Underwater Hand Gesture Recognition | Igor Kvasić et al. | 2506.08974 | translate | read | null |
| 2025-06-09 | BridgeVLA: Input-Output Alignment for Efficient 3D Manipulation Learning with Vision-Language Models | Peiyan Li et al. | 2506.07961 | translate | read | link |
| 2025-06-08 | AugmentGest: Can Random Data Cropping Augmentation Boost Gesture Recognition Performance? | Nada Aboudeshish et al. | 2506.07216 | translate | read | null |
| 2025-06-08 | SAP-Bench: Benchmarking Multimodal Large Language Models in Surgical Action Planning | Mengya Xu et al. | 2506.07196 | translate | read | null |
| 2025-06-07 | PhysLab: A Benchmark Dataset for Multi-Granularity Visual Parsing of Physics Experiments | Minghao Zou et al. | 2506.06631 | translate | read | null |
| 2025-06-06 | Conversational Interfaces for Parametric Conceptual Architectural Design: Integrating Mixed Reality with LLM-driven Interaction | Ruochen Ji et al. | 2506.06066 | translate | read | null |
| 2025-06-06 | DriveAction: A Benchmark for Exploring Human-like Driving Decisions in VLA Models | Yuhan Hao et al. | 2506.05667 | translate | read | null |
| 2025-06-05 | Robustness Evaluation for Video Models with Reinforcement Learning | Ashwin Ramesh Babu et al. | 2506.05431 | translate | read | null |
| 2025-06-04 | Video, How Do Your Tokens Merge? | Sam Pollard et al. | 2506.03885 | translate | read | null |
| 2025-06-04 | Zero-Shot Temporal Interaction Localization for Egocentric Videos | Erhang Zhang et al. | 2506.03662 | translate | read | link |
| 2025-06-04 | Heterogeneous Skeleton-Based Action Representation Learning | Hongsong Wang et al. | 2506.03481 | translate | read | null |
| 2025-06-04 | Go Beyond Earth: Understanding Human Actions and Scenes in Microgravity Environments | Di Wen et al. | 2506.02845 | translate | read | link |
| 2025-06-03 | Technical Report for Ego4D Long-Term Action Anticipation Challenge 2025 | Qiaohui Chu et al. | 2506.02550 | translate | read | null |
| 2025-06-03 | VS-Bench: Evaluating VLMs for Strategic Reasoning and Decision-Making in Multi-Agent Environments | Zelai Xu et al. | 2506.02387 | translate | read | link |
| 2025-06-03 | Multi-level and Multi-modal Action Anticipation | Seulgi Kim et al. | 2506.02382 | translate | read | null |
| 2025-06-02 | TransAct V2: Lifelong User Action Sequence Modeling on Pinterest Recommendation | Xue Xia et al. | 2506.02267 | translate | read | null |
| 2025-06-02 | SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics | Mustafa Shukor et al. | 2506.01844 | translate | read | link |
| 2025-06-02 | Efficient Egocentric Action Recognition with Multimodal Data | Marco Calzavara et al. | 2506.01757 | translate | read | null |
| 2025-06-02 | EPFL-Smart-Kitchen-30: Densely annotated cooking dataset with 3D kinematics to challenge video and language models | Andy Bonnetto et al. | 2506.01608 | translate | read | link |
| 2025-06-02 | Sheep Facial Pain Assessment Under Weighted Graph Neural Networks | Alam Noor et al. | 2506.01468 | translate | read | null |
| 2025-06-02 | EgoBrain: Synergizing Minds and Eyes For Human Action Understanding | Nie Lin et al. | 2506.01353 | translate | read | null |
