Action Recognition - 2025-06
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2025-06-30 | LineRetriever: Planning-Aware Observation Reduction for Web Agents | Imene Kerboua et.al. | 2507.00210 | translate | read | null |
| 2025-06-30 | Online Human Action Detection during Escorting | Siddhartha Mondal et.al. | 2506.23573 | translate | read | null |
| 2025-06-29 | DEL: Dense Event Localization for Multi-modal Audio-Visual Understanding | Mona Ahmadian et.al. | 2506.23196 | translate | read | null |
| 2025-06-27 | Frequency-Semantic Enhanced Variational Autoencoder for Zero-Shot Skeleton-based Action Recognition | Wenhan Wu et.al. | 2506.22179 | translate | read | null |
| 2025-06-26 | WorldVLA: Towards Autoregressive Action World Model | Jun Cen et.al. | 2506.21539 | translate | read | link |
| 2025-06-26 | EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception | Sanjoy Chowdhury et.al. | 2506.21080 | translate | read | null |
| 2025-06-25 | How do Foundation Models Compare to Skeleton-Based Approaches for Gesture Recognition in Human-Robot Interaction? | Stephanie Käs et.al. | 2506.20795 | translate | read | null |
| 2025-06-25 | CARMA: Context-Aware Situational Grounding of Human-Robot Group Interactions by Combining Vision-Language Models with Object and Action Recognition | Joerg Deigmoeller et.al. | 2506.20373 | translate | read | null |
| 2025-06-25 | Feature Hallucination for Self-supervised Action Recognition | Lei Wang et.al. | 2506.20342 | translate | read | null |
| 2025-06-27 | ReactEMG: Zero-Shot, Low-Latency Intent Detection via sEMG | Runsheng Wang et.al. | 2506.19815 | translate | read | null |
| 2025-06-24 | Self-Paced Collaborative and Adversarial Network for Unsupervised Domain Adaptation | Weichen Zhang et.al. | 2506.19267 | translate | read | null |
| 2025-06-23 | Including Semantic Information via Word Embeddings for Skeleton-based Action Recognition | Dustin Aganian et.al. | 2506.18721 | translate | read | null |
| 2025-06-23 | Improving Weakly Supervised Temporal Action Localization by Exploiting Multi-resolution Information in Temporal Domain | Rui Su et.al. | 2506.18261 | translate | read | null |
| 2025-06-23 | Robot Tactile Gesture Recognition Based on Full-body Modular E-skin | Shuo Jiang et.al. | 2506.18256 | translate | read | null |
| 2025-06-22 | Adapting Vision-Language Models for Evaluating World Models | Mariya Hendriksen et.al. | 2506.17967 | translate | read | null |
| 2025-06-21 | Domain Generalization using Action Sequences for Egocentric Action Recognition | Amirshayan Nasirimajd et.al. | 2506.17685 | translate | read | null |
| 2025-06-20 | Wi-Fi Sensing Tool Release: Gathering 802.11ax Channel State Information from a Commercial Wi-Fi Access Point | Zisheng Wang et.al. | 2506.16957 | translate | read | null |
| 2025-06-20 | Language-driven Description Generation and Common Sense Reasoning for Video Action Recognition | Xiaodan Hu et.al. | 2506.16701 | translate | read | null |
| 2025-06-19 | CLIP-MG: Guiding Semantic Attention with Skeletal Pose Features and RGB Data for Micro-Gesture Recognition on the iMiGUE Dataset | Santosh Patapati et.al. | 2506.16385 | translate | read | null |
| 2025-06-18 | Accessible Gesture-Driven Augmented Reality Interaction System | Yikan Wang et.al. | 2506.15189 | translate | read | null |
| 2025-06-17 | CDP: Towards Robust Autoregressive Visuomotor Policy Learning via Causal Diffusion | Jiahua Ma et.al. | 2506.14769 | translate | read | null |
| 2025-06-16 | Leveraging Vision-Language Pre-training for Human Activity Recognition in Still Images | Cristina Mahanta et.al. | 2506.13458 | translate | read | null |
| 2025-06-16 | Active Multimodal Distillation for Few-shot Action Recognition | Weijia Feng et.al. | 2506.13322 | translate | read | null |
| 2025-06-16 | Action Dubber: Timing Audible Actions via Inflectional Flow | Wenlong Wan et.al. | 2506.13320 | translate | read | null |
| 2025-06-15 | Towards Fine-Grained Emotion Understanding via Skeleton-Based Micro-Gesture Recognition | Hao Xu et.al. | 2506.12848 | translate | read | null |
| 2025-06-13 | Pose Matters: Evaluating Vision Transformers and CNNs for Human Action Recognition on Small COCO Subsets | MingZe Tang et.al. | 2506.11678 | translate | read | null |
| 2025-06-12 | GynSurg: A Comprehensive Gynecology Laparoscopic Surgery Dataset | Sahar Nasirihaghighi et.al. | 2506.11356 | translate | read | null |
| 2025-06-12 | WaveFormer: A Lightweight Transformer Model for sEMG-based Gesture Recognition | Yanlong Chen et.al. | 2506.11168 | translate | read | null |
| 2025-06-11 | SLRNet: A Real-Time LSTM-Based Sign Language Recognition System | Sharvari Kamble et.al. | 2506.11154 | translate | read | link |
| 2025-06-10 | Gender Fairness of Machine Learning Algorithms for Pain Detection | Dylan Green et.al. | 2506.11132 | translate | read | null |
| 2025-06-12 | Eye, Robot: Learning to Look to Act with a BC-RL Perception-Action Loop | Justin Kerr et.al. | 2506.10968 | translate | read | null |
| 2025-06-11 | HopaDIFF: Holistic-Partial Aware Fourier Conditioned Diffusion for Referring Human Action Segmentation in Multi-Person Scenarios | Kunyu Peng et.al. | 2506.09650 | translate | read | link |
| 2025-06-11 | Time-Unified Diffusion Policy with Action Discrimination for Robotic Manipulation | Ye Niu et.al. | 2506.09422 | translate | read | null |
| 2025-06-11 | Synthetic Human Action Video Data Generation with Pose Transfer | Vaclav Knapp et.al. | 2506.09411 | translate | read | null |
| 2025-06-11 | An Effective End-to-End Solution for Multimodal Action Recognition | Songping Wang et.al. | 2506.09345 | translate | read | null |
| 2025-06-10 | Diver-Robot Communication Dataset for Underwater Hand Gesture Recognition | Igor Kvasić et.al. | 2506.08974 | translate | read | null |
| 2025-06-09 | BridgeVLA: Input-Output Alignment for Efficient 3D Manipulation Learning with Vision-Language Models | Peiyan Li et.al. | 2506.07961 | translate | read | link |
| 2025-06-08 | AugmentGest: Can Random Data Cropping Augmentation Boost Gesture Recognition Performance? | Nada Aboudeshish et.al. | 2506.07216 | translate | read | null |
| 2025-06-08 | SAP-Bench: Benchmarking Multimodal Large Language Models in Surgical Action Planning | Mengya Xu et.al. | 2506.07196 | translate | read | null |
| 2025-06-07 | PhysLab: A Benchmark Dataset for Multi-Granularity Visual Parsing of Physics Experiments | Minghao Zou et.al. | 2506.06631 | translate | read | null |
| 2025-06-06 | Conversational Interfaces for Parametric Conceptual Architectural Design: Integrating Mixed Reality with LLM-driven Interaction | Ruochen Ji et.al. | 2506.06066 | translate | read | null |
| 2025-06-06 | DriveAction: A Benchmark for Exploring Human-like Driving Decisions in VLA Models | Yuhan Hao et.al. | 2506.05667 | translate | read | null |
| 2025-06-05 | Robustness Evaluation for Video Models with Reinforcement Learning | Ashwin Ramesh Babu et.al. | 2506.05431 | translate | read | null |
| 2025-06-04 | Video, How Do Your Tokens Merge? | Sam Pollard et.al. | 2506.03885 | translate | read | null |
| 2025-06-04 | Zero-Shot Temporal Interaction Localization for Egocentric Videos | Erhang Zhang et.al. | 2506.03662 | translate | read | link |
| 2025-06-04 | Heterogeneous Skeleton-Based Action Representation Learning | Hongsong Wang et.al. | 2506.03481 | translate | read | null |
| 2025-06-04 | Go Beyond Earth: Understanding Human Actions and Scenes in Microgravity Environments | Di Wen et.al. | 2506.02845 | translate | read | link |
| 2025-06-03 | Technical Report for Ego4D Long-Term Action Anticipation Challenge 2025 | Qiaohui Chu et.al. | 2506.02550 | translate | read | null |
| 2025-06-03 | VS-Bench: Evaluating VLMs for Strategic Reasoning and Decision-Making in Multi-Agent Environments | Zelai Xu et.al. | 2506.02387 | translate | read | link |
| 2025-06-03 | Multi-level and Multi-modal Action Anticipation | Seulgi Kim et.al. | 2506.02382 | translate | read | null |
| 2025-06-02 | TransAct V2: Lifelong User Action Sequence Modeling on Pinterest Recommendation | Xue Xia et.al. | 2506.02267 | translate | read | null |
| 2025-06-02 | SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics | Mustafa Shukor et.al. | 2506.01844 | translate | read | link |
| 2025-06-02 | Efficient Egocentric Action Recognition with Multimodal Data | Marco Calzavara et.al. | 2506.01757 | translate | read | null |
| 2025-06-02 | EPFL-Smart-Kitchen-30: Densely annotated cooking dataset with 3D kinematics to challenge video and language models | Andy Bonnetto et.al. | 2506.01608 | translate | read | link |
| 2025-06-02 | Sheep Facial Pain Assessment Under Weighted Graph Neural Networks | Alam Noor et.al. | 2506.01468 | translate | read | null |
| 2025-06-02 | EgoBrain: Synergizing Minds and Eyes For Human Action Understanding | Nie Lin et.al. | 2506.01353 | translate | read | null |