Action Recognition
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2025-12-18 | OMG-Bench: A New Challenging Benchmark for Skeleton-based Online Micro Hand Gesture Recognition | Haochen Chang et.al. | 2512.16727 | null |
| 2025-12-18 | Skeleton-Snippet Contrastive Learning with Multiscale Feature Fusion for Action Localization | Qiushuo Cheng et.al. | 2512.16504 | null |
| 2025-12-06 | Smart Surveillance: Identifying IoT Device Behaviours using ML-Powered Traffic Analysis | Reza Ryan et.al. | 2512.13709 | null |
| 2025-12-15 | Recurrent Video Masked Autoencoders | Daniel Zoran et.al. | 2512.13684 | null |
| 2025-12-14 | StegaVAR: Privacy-Preserving Video Action Recognition via Steganographic Domain Analysis | Lixin Chen et.al. | 2512.12586 | null |
| 2025-12-13 | From Human Intention to Action Prediction: A Comprehensive Benchmark for Intention-driven End-to-End Autonomous Driving | Huan Zheng et.al. | 2512.12302 | null |
| 2025-12-12 | DynaPURLS: Dynamic Refinement of Part-aware Representations for Skeleton-based Zero-Shot Action Recognition | Jingmin Zhu et.al. | 2512.11941 | null |
| 2025-12-05 | Explainable Adversarial-Robust Vision-Language-Action Model for Robotic Manipulation | Ju-Young Kim et.al. | 2512.11865 | null |
| 2025-12-12 | TSkel-Mamba: Temporal Dynamic Modeling via State Space Model for Human Skeleton-based Action Recognition | Yanan Liu et.al. | 2512.11503 | null |
| 2025-12-12 | Boosting Skeleton-based Zero-Shot Action Recognition with Training-Free Test-Time Adaptation | Jingmin Zhu et.al. | 2512.11458 | null |
| 2025-12-12 | Task-Specific Distance Correlation Matching for Few-Shot Action Recognition | Fei Long et.al. | 2512.11340 | null |
| 2025-12-12 | Breast-Rehab: A Postoperative Breast Cancer Rehabilitation Training Assessment System Based on Human Action Recognition | Zikang Chen et.al. | 2512.11245 | null |
| 2025-12-12 | Multi-task Learning with Extended Temporal Shift Module for Temporal Action Localization | Anh-Kiet Duong et.al. | 2512.11189 | null |
| 2025-12-11 | Deep Photonic Reservoir Computing with On-chip Nonlinearity | Jinlong Xiang et.al. | 2512.10626 | null |
| 2025-12-11 | Lang2Motion: Bridging Language and Motion through Joint Embedding Spaces | Bishoy Galoaa et.al. | 2512.10617 | null |
| 2025-12-11 | Lies We Can Trust: Quantifying Action Uncertainty with Inaccurate Stochastic Dynamics through Conformalized Nonholonomic Lie Groups | Luís Marques et.al. | 2512.10294 | null |
| 2025-12-10 | GLaD: Geometric Latent Distillation for Vision-Language-Action Models | Minghao Guo et.al. | 2512.09619 | null |
| 2025-12-09 | Neural Ordinary Differential Equations for Simulating Metabolic Pathway Dynamics from Time-Series Multiomics Data | Udesh Habaraduwa et.al. | 2512.08732 | null |
| 2025-12-09 | Aerial Vision-Language Navigation with a Unified Framework for Spatial, Temporal and Embodied Reasoning | Huilin Xu et.al. | 2512.08639 | null |
| 2025-12-09 | Mind to Hand: Purposeful Robotic Control via Embodied Reasoning | Peijun Tang et.al. | 2512.08580 | null |
| 2025-12-08 | A Comparative Study of EMG- and IMU-based Gesture Recognition at the Wrist and Forearm | Soroush Baghernezhad et.al. | 2512.07997 | null |
| 2025-12-08 | Improving action classification with brain-inspired deep networks | Aidas Aglinskas et.al. | 2512.07729 | null |
| 2025-12-08 | A Large-Scale Multimodal Dataset and Benchmarks for Human Activity Scene Understanding and Reasoning | Siyang Jiang et.al. | 2512.07136 | null |
| 2025-12-07 | VideoVLA: Video Generators Can Be Generalizable Robot Manipulators | Yichao Shen et.al. | 2512.06963 | null |
| 2025-12-04 | Towards Adaptive Fusion of Multimodal Deep Networks for Human Action Recognition | Novanto Yudistira et.al. | 2512.04943 | null |
| 2025-12-04 | CIG-MAE: Cross-Modal Information-Guided Masked Autoencoder for Self-Supervised WiFi Sensing | Gang Liu et.al. | 2512.04723 | null |
| 2025-12-04 | WiFi-based Cross-Domain Gesture Recognition Using Attention Mechanism | Ruijing Liu et.al. | 2512.04521 | null |
| 2025-12-03 | Heatmap Pooling Network for Action Recognition from RGB Videos | Mengyuan Liu et.al. | 2512.03837 | null |
| 2025-12-02 | SAM2Grasp: Resolve Multi-modal Grasping via Prompt-conditioned Temporal Action Prediction | Shengkai Wu et.al. | 2512.02609 | null |
| 2025-12-01 | TBT-Former: Learning Temporal Boundary Distributions for Action Localization | Thisara Rathnayaka et.al. | 2512.01298 | null |
| 2025-11-29 | Developing Fairness-Aware Task Decomposition to Improve Equity in Post-Spinal Fusion Complication Prediction | Yining Yuan et.al. | 2512.00598 | null |
| 2025-11-29 | Integrating Skeleton Based Representations for Robust Yoga Pose Classification Using Deep Learning Models | Mohammed Mohiuddin et.al. | 2512.00572 | null |
| 2025-10-14 | MOTION: ML-Assisted On-Device Low-Latency Motion Recognition | Veeramani Pugazhenthi et.al. | 2512.00008 | null |
| 2025-11-28 | LatBot: Distilling Universal Latent Actions for Vision-Language-Action Models | Zuolei Li et.al. | 2511.23034 | null |
| 2025-11-27 | SkeletonAgent: An Agentic Interaction Framework for Skeleton-based Action Recognition | Hongda Liu et.al. | 2511.22433 | null |
| 2025-11-27 | HandyLabel: Towards Post-Processing to Real-Time Annotation Using Skeleton Based Hand Gesture Recognition | Sachin Kumar Singh et.al. | 2511.22337 | null |
| 2025-11-26 | Attention-Guided Patch-Wise Sparse Adversarial Attacks on Vision-Language-Action Models | Naifu Zhang et.al. | 2511.21663 | null |
| 2025-11-26 | Active Learning for GCN-based Action Recognition | Hichem Sahbi et.al. | 2511.21625 | null |
| 2025-11-26 | Towards an Effective Action-Region Tracking Framework for Fine-grained Video Action Recognition | Baoli Sun et.al. | 2511.21202 | null |
| 2025-11-24 | Scale What Counts, Mask What Matters: Evaluating Foundation Models for Zero-Shot Cross-Domain Wi-Fi Sensing | Cheng Jiang et.al. | 2511.18792 | null |
| 2025-11-22 | ActDistill: General Action-Guided Self-Derived Distillation for Efficient Vision-Language-Action Models | Wencheng Ye et.al. | 2511.18082 | null |
| 2025-11-21 | Label-Efficient Skeleton-based Recognition with Stable-Invertible Graph Convolutional Networks | Hichem Sahbi et.al. | 2511.17345 | null |
| 2025-11-21 | Social-Media Based Personas Challenge: Hybrid Prediction of Common and Rare User Actions on Bluesky | Benjamin White et.al. | 2511.17241 | null |
| 2025-11-21 | VLA-4D: Embedding 4D Awareness into Vision-Language-Action Models for SpatioTemporally Coherent Robotic Manipulation | Hanyu Zhou et.al. | 2511.17199 | null |
| 2025-11-21 | Progress-Think: Semantic Progress Reasoning for Vision-Language Navigation | Shuo Wang et.al. | 2511.17097 | null |
| 2025-11-21 | H-GAR: A Hierarchical Interaction Framework via Goal-Driven Observation-Action Refinement for Robotic Manipulation | Yijie Zhu et.al. | 2511.17079 | null |
| 2025-11-21 | The Wireless Charger as a Gesture Sensor: A Novel Approach to Ubiquitous Interaction | Weiyi Wang et.al. | 2511.16989 | null |
| 2025-11-21 | Parts-Mamba: Augmenting Joint Context with Part-Level Scanning for Occluded Human Skeleton | Tianyi Shen et.al. | 2511.16860 | null |
| 2025-11-20 | BoxingVI: A Multi-Modal Benchmark for Boxing Action Recognition and Localization | Rahul Kumar et.al. | 2511.16524 | null |
| 2025-11-20 | FOOTPASS: A Multi-Modal Multi-Agent Tactical Context Dataset for Play-by-Play Action Spotting in Soccer Broadcast Videos | Jeremie Ochin et.al. | 2511.16183 | null |
| 2025-11-19 | Scriboora: Rethinking Human Pose Forecasting | Daniel Bermuth et.al. | 2511.15565 | null |
| 2025-11-18 | DoGCLR: Dominance-Game Contrastive Learning Network for Skeleton-Based Action Recognition | Yanshan Li et.al. | 2511.14179 | null |
| 2025-11-18 | A Machine Learning-Based Multimodal Framework for Wearable Sensor-Based Archery Action Recognition and Stress Estimation | Xianghe Liu et.al. | 2511.14057 | null |
| 2025-11-17 | Computer Vision based group activity detection and action spotting | Narthana Sivalingam et.al. | 2511.13315 | null |
| 2025-11-17 | MGCA-Net: Multi-Grained Category-Aware Network for Open-Vocabulary Temporal Action Localization | Zhenying Fang et.al. | 2511.13039 | null |
| 2025-11-17 | View-aware Cross-modal Distillation for Multi-view Action Recognition | Trung Thanh Nguyen et.al. | 2511.12870 | null |
| 2025-11-16 | Pixels or Positions? Benchmarking Modalities in Group Activity Recognition | Drishya Karki et.al. | 2511.12606 | null |
| 2025-11-15 | Locomotion in CAVE: Enhancing Immersion through Full-Body Motion | Xiaohui Li et.al. | 2511.12251 | null |
| 2025-11-14 | Rethinking Progression of Memory State in Robotic Manipulation: An Object-Centric Perspective | Nhat Chung et.al. | 2511.11478 | null |
| 2025-11-13 | SUGAR: Learning Skeleton Representation with Visual-Motion Knowledge for Action Recognition | Qilang Ye et.al. | 2511.10091 | null |
| 2025-11-12 | Revisiting Cross-Architecture Distillation: Adaptive Dual-Teacher Transfer for Lightweight Video Models | Ying Peng et.al. | 2511.09469 | null |
| 2025-11-12 | Learning by Neighbor-Aware Semantics, Deciding by Open-form Flows: Towards Robust Zero-Shot Skeleton Action Recognition | Yang Chen et.al. | 2511.09388 | null |
| 2025-11-12 | PressTrack-HMR: Pressure-Based Top-Down Multi-Person Global Human Mesh Recovery | Jiayue Yuan et.al. | 2511.09147 | null |
| 2025-11-11 | Privacy Beyond Pixels: Latent Anonymization for Privacy-Preserving Video Understanding | Joseph Fioresi et.al. | 2511.08666 | null |
| 2025-11-09 | Learning Topology-Driven Multi-Subspace Fusion for Grassmannian Deep Network | Xuan Yu et.al. | 2511.08628 | null |
| 2025-11-05 | The chanciness of time | John M. Myers et.al. | 2511.08611 | null |
| 2025-11-11 | SASG-DA: Sparse-Aware Semantic-Guided Diffusion Augmentation For Myoelectric Gesture Recognition | Chen Liu et.al. | 2511.08344 | null |
| 2025-11-10 | Achieving Effective Virtual Reality Interactions via Acoustic Gesture Recognition based on Large Language Models | Xijie Zhang et.al. | 2511.07085 | null |
| 2025-11-10 | Otter: Mitigating Background Distractions of Wide-Angle Few-Shot Action Recognition with Enhanced RWKV | Wenbo Huang et.al. | 2511.06741 | null |
| 2025-11-09 | Learning-Based Robust Bayesian Persuasion with Conformal Prediction Guarantees | Heeseung Bang et.al. | 2511.06223 | null |
| 2025-11-06 | Grounding Foundational Vision Models with 3D Human Poses for Robust Action Recognition | Nicholas Babey et.al. | 2511.05622 | null |
| 2025-11-06 | Pose-Aware Multi-Level Motion Parsing for Action Quality Assessment | Shuaikang Zhu et.al. | 2511.05611 | null |
| 2025-11-07 | Accurate online action and gesture recognition system using detectors and Deep SPD Siamese Networks | Mohamed Sanim Akremi et.al. | 2511.05250 | null |
| 2025-11-06 | Unified Multimodal Diffusion Forcing for Forceful Manipulation | Zixuan Huang et.al. | 2511.04812 | null |
| 2025-11-06 | X-Diffusion: Training Diffusion Policies on Cross-Embodiment Human Demonstrations | Maximus A. Pace et.al. | 2511.04671 | null |
| 2025-11-06 | Evo-1: Lightweight Vision-Language-Action Model with Preserved Semantic Alignment | Tao Lin et.al. | 2511.04555 | null |
| 2025-11-06 | Alternative Fairness and Accuracy Optimization in Criminal Justice | Shaolong Wu et.al. | 2511.04505 | null |
| 2025-11-06 | ThaiOCRBench: A Task-Diverse Benchmark for Vision-Language Understanding in Thai | Surapon Nonesung et.al. | 2511.04479 | null |
| 2025-11-06 | Temporal Action Selection for Action Chunking | Yueyang Weng et.al. | 2511.04421 | null |
| 2025-11-06 | ForeRobo: Unlocking Infinite Simulation Data for 3D Goal-driven Robotic Manipulation | Dexin Wang et.al. | 2511.04381 | null |
| 2025-11-06 | GraSP-VLA: Graph-based Symbolic Action Representation for Long-Horizon Planning with VLA Policies | Maëlic Neau et.al. | 2511.04357 | null |
| 2025-11-06 | RCMCL: A Unified Contrastive Learning Framework for Robust Multi-Modal (RGB-D, Skeleton, Point Cloud) Action Understanding | Hasan Akgul et.al. | 2511.04351 | null |
| 2025-11-06 | GUI-360$^\circ$: A Comprehensive Dataset and Benchmark for Computer-Using Agents | Jian Mu et.al. | 2511.04307 | null |
| 2025-11-06 | Expectation-Realization Interpretation of Quantum Superposition | Yanting Wang et.al. | 2511.04154 | null |
| 2025-11-06 | Learning from Online Videos at Inference Time for Computer-Use Agents | Yujian Liu et.al. | 2511.04137 | null |
| 2025-11-06 | Unified Effective Field Theory for Nonlinear and Quantum Optics | Xiaochen Liu et.al. | 2511.04118 | null |
| 2025-11-06 | Simple 3D Pose Features Support Human and Machine Social Scene Understanding | Wenshuo Qin et.al. | 2511.03988 | null |
| 2025-11-06 | Use of Continuous Glucose Monitoring with Machine Learning to Identify Metabolic Subphenotypes and Inform Precision Lifestyle Changes | Ahmed A. Metwally et.al. | 2511.03986 | null |
| 2025-11-06 | Temporal Zoom Networks: Distance Regression and Continuous Depth for Efficient Action Localization | Ibne Farabi Shihab et.al. | 2511.03943 | null |
| 2025-11-05 | Enhancing Q-Value Updates in Deep Q-Learning via Successor-State Prediction | Lipeng Zu et.al. | 2511.03836 | null |
| 2025-11-05 | Krylov Complexity Meets Confinement | Xuhao Jiang et.al. | 2511.03783 | null |
| 2025-11-05 | Disentangled Concepts Speak Louder Than Words: Explainable Video Action Recognition | Jongseo Lee et.al. | 2511.03725 | null |
| 2025-11-05 | A Lightweight 3D-CNN for Event-Based Human Action Recognition with Privacy-Preserving Potential | Mehdi Sefidgar Dilmaghani et.al. | 2511.03665 | null |
| 2025-11-05 | LiveTradeBench: Seeking Real-World Alpha with Large Language Models | Haofei Yu et.al. | 2511.03628 | link |
| 2025-11-05 | Learning Communication Skills in Multi-task Multi-agent Deep Reinforcement Learning | Changxi Zhu et.al. | 2511.03348 | null |
| 2025-11-05 | Multi-Object Tracking Retrieval with LLaVA-Video: A Training-Free Solution to MOT25-StAG Challenge | Yi Yang et.al. | 2511.03332 | null |
| 2025-11-04 | WorldPlanner: Monte Carlo Tree Search and MPC with Action-Conditioned Visual World Models | R. Khorrambakht et.al. | 2511.03077 | null |
| 2025-11-04 | The Curved Spacetime of Transformer Architectures | Riccardo Di Sipio et.al. | 2511.03060 | null |
| 2025-11-04 | VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation | Kevin Qinghong Lin et.al. | 2511.02778 | link |
| 2025-11-04 | Agentic World Modeling for 6G: Near-Real-Time Generative State-Space Reasoning | Farhad Rezazadeh et.al. | 2511.02748 | null |
| 2025-11-04 | Radio and Optical Flares on the dMe Flare Star EV Lac | Rachel A. Osten et.al. | 2511.02719 | null |
| 2025-11-04 | MVAFormer: RGB-based Multi-View Spatio-Temporal Action Recognition with Transformer | Taiga Yamane et.al. | 2511.02473 | null |
| 2025-11-04 | From the Laboratory to Real-World Application: Evaluating Zero-Shot Scene Interpretation on Edge Devices for Mobile Robotics | Nicolas Schuler et.al. | 2511.02427 | null |
| 2025-11-03 | Euler-Heisenberg action for fermions coupled to gauge and axial vectors: Hessian diagonalization, sector classification, and applications | Lucas Pereira de Souza et.al. | 2511.02118 | null |
| 2025-11-03 | Neural dynamics of cognitive control: Current tensions and future promise | Dale Zhou et.al. | 2511.02063 | null |
| 2025-11-03 | Path-Coordinated Continual Learning with Neural Tangent Kernel-Justified Plasticity: A Theoretical Framework with Near State-of-the-Art Performance | Rathin Chandra Shit et.al. | 2511.02025 | null |
| 2025-11-03 | Unified Diffusion VLA: Vision-Language-Action Model via Joint Discrete Denoising Diffusion Process | Jiayi Chen et.al. | 2511.01718 | null |
| 2025-11-03 | OmniVLA: Physically-Grounded Multimodal VLA with Unified Multi-Sensor Perception for Robotic Manipulation | Heyu Guo et.al. | 2511.01210 | null |
| 2025-11-02 | Rhythm in the Air: Vision-based Real-Time Music Generation through Gestures | Barathi Subramanian et.al. | 2511.00793 | null |
| 2025-10-30 | Alpamayo-R1: Bridging Reasoning and Action Prediction for Generalizable Autonomous Driving in the Long Tail | NVIDIA et.al. | 2511.00088 | null |
| 2025-10-31 | Enhancing Spatio-Temporal Zero-shot Action Recognition with Language-driven Description Attributes | Yehna Kim et.al. | 2510.27255 | null |
| 2025-10-31 | GUI-Rise: Structured Reasoning and History Summarization for GUI Navigation | Tao Liu et.al. | 2510.27210 | null |
| 2025-10-30 | Spiking Patches: Asynchronous, Sparse, and Efficient Tokens for Event Cameras | Christoffer Koo Øhrstrøm et.al. | 2510.26614 | null |
| 2025-10-29 | Enhancing Temporal Understanding in Video-LLMs through Stacked Temporal Attention in Vision Encoders | Ali Rasekh et.al. | 2510.26027 | null |
| 2025-10-29 | Informative Sample Selection Model for Skeleton-based Action Recognition with Limited Training Samples | Zhigang Tu et.al. | 2510.25345 | null |
| 2025-10-27 | Evaluation of Vision-LLMs in Surveillance Video | Pascal Benschop et.al. | 2510.23190 | null |
| 2025-10-27 | Enabling Vibration-Based Gesture Recognition on Everyday Furniture via Energy-Efficient FPGA Implementation of 1D Convolutional Networks | Koki Shibata et.al. | 2510.23156 | null |
| 2025-10-27 | Neural Recording Power Optimization Through Machine Learning Guided Resolution Reconfiguration | Aviral Pandey et.al. | 2510.22924 | null |
| 2025-10-13 | J-ORA: A Framework and Multimodal Dataset for Japanese Object Identification, Reference, Action Prediction in Robot Perception | Jesse Atuhurra et.al. | 2510.21761 | link |
| 2025-10-22 | From Forecasting to Planning: Policy World Model for Collaborative State-Action Prediction | Zhida Zhao et.al. | 2510.19654 | null |
| 2025-10-22 | Vision-Based Mistake Analysis in Procedural Activities: A Review of Advances and Challenges | Konstantinos Bacharidis et.al. | 2510.19292 | null |
| 2025-10-22 | MobiAct: Efficient MAV Action Recognition Using MobileNetV4 with Contrastive Learning and Knowledge Distillation | Zhang Nengbo et.al. | 2510.19273 | null |
| 2025-10-22 | See, Think, Act: Online Shopper Behavior Simulation with VLM Agents | Yimeng Zhang et.al. | 2510.19245 | null |
| 2025-10-21 | UniHPR: Unified Human Pose Representation via Singular Value Contrastive Learning | Zhongyu Jiang et.al. | 2510.19078 | null |
| 2025-10-21 | A Renaissance of Explicit Motion Information Mining from Transformers for Action Recognition | Peiqin Zhuang et.al. | 2510.18705 | null |
| 2025-10-21 | Biomechanically consistent real-time action recognition for human-robot interaction | Wanchen Li et.al. | 2510.18373 | null |
| 2025-10-21 | FST.ai 2.0: An Explainable AI Ecosystem for Fair, Fast, and Inclusive Decision-Making in Olympic and Paralympic Taekwondo | Keivan Shariatmadar et.al. | 2510.18193 | null |
| 2025-10-20 | Muscle Anatomy-aware Geometric Deep Learning for sEMG-based Gesture Decoding | Adyasha Dash et.al. | 2510.17660 | null |
| 2025-10-18 | MoS-VLA: A Vision-Language-Action Model with One-Shot Skill Adaptation | Ruihan Zhao et.al. | 2510.16617 | null |
| 2025-10-18 | RefAtomNet++: Advancing Referring Atomic Video Action Recognition using Semantic Retrieval based Multi-Trajectory Mamba | Kunyu Peng et.al. | 2510.16444 | null |
| 2025-10-17 | StretchySnake: Flexible SSM Training Unlocks Action Recognition Across Spatio-Temporal Scales | Nyle Siddiqui et.al. | 2510.16209 | null |
| 2025-10-17 | MAVR-Net: Robust Multi-View Learning for MAV Action Recognition with Cross-View Attention | Nengbo Zhang et.al. | 2510.15448 | null |
| 2025-10-16 | MoCom: Motion-based Inter-MAV Visual Communication Using Event Vision and Spiking Neural Networks | Zhang Nengbo et.al. | 2510.14770 | null |
| 2025-10-15 | Generalizing WiFi Gesture Recognition via Large-Model-Aware Semantic Distillation and Alignment | Feng-Qi Cui et.al. | 2510.13390 | null |
| 2025-10-14 | SVAG-Bench: A Large-Scale Benchmark for Multi-Instance Spatio-temporal Video Action Grounding | Tanveer Hannan et.al. | 2510.13016 | null |
| 2025-10-13 | FOSSIL: Harnessing Feedback on Suboptimal Samples for Data-Efficient Generalisation with Imitation Learning for Embodied Vision-and-Language Tasks | Sabrina McCallum et.al. | 2510.11307 | null |
| 2025-10-13 | Mixup Helps Understanding Multimodal Video Better | Xiaoyu Ma et.al. | 2510.10986 | null |
| 2025-10-12 | MSF-Mamba: Motion-aware State Fusion Mamba for Efficient Micro-Gesture Recognition | Deng Li et.al. | 2510.10478 | null |
| 2025-10-11 | Dejavu: Towards Experience Feedback Learning for Embodied Intelligence | Shaokai Wu et.al. | 2510.10181 | null |
| 2025-10-11 | SyncLipMAE: Contrastive Masked Pretraining for Audio-Visual Talking-Face Representation | Zeyu Ling et.al. | 2510.10069 | null |
| 2025-10-10 | Enhancing Diffusion Policy with Classifier-Free Guidance for Temporal Robotic Tasks | Yuang Lu et.al. | 2510.09786 | null |
| 2025-10-10 | Diagnosing Shoulder Disorders Using Multimodal Large Language Models and Consumer-Grade Cameras | Jindong Hong et.al. | 2510.09230 | null |
| 2025-10-09 | Video-STAR: Reinforcing Open-Vocabulary Action Recognition with Tools | Zhenlong Yuan et.al. | 2510.08480 | null |
| 2025-10-09 | MMHOI: Modeling Complex 3D Multi-Human Multi-Object Interactions | Kaen Kogashi et.al. | 2510.07828 | null |
| 2025-10-07 | Mitigating Surgical Data Imbalance with Dual-Prediction Video Diffusion Model | Danush Kumar Venkatesh et.al. | 2510.07345 | null |
| 2025-10-08 | Customer-R1: Personalized Simulation of Human Behaviors via RL-based LLM Agent in Online Shopping | Ziyi Wang et.al. | 2510.07230 | null |
| 2025-10-08 | TrackVLA++: Unleashing Reasoning and Memory Capabilities in VLA Models for Embodied Visual Tracking | Jiahang Liu et.al. | 2510.07134 | null |
| 2025-10-07 | From Captions to Keyframes: KeyScore for Multimodal Frame Scoring and Video-Language Understanding | Shih-Yao Lin et.al. | 2510.06509 | null |
| 2025-10-07 | Human Action Recognition from Point Clouds over Time | James Dickens et.al. | 2510.05506 | null |
| 2025-10-05 | Speculative Actions: A Lossless Framework for Faster Agentic Systems | Naimeng Ye et.al. | 2510.04371 | null |
| 2025-10-04 | Talking Tennis: Language Feedback from 3D Biomechanical Action Recognition | Arushi Dashore et.al. | 2510.03921 | null |
| 2025-10-04 | MonitorVLM: A Vision Language Framework for Safety Violation Detection in Mining Operations | Jiang Wu et.al. | 2510.03666 | null |
| 2025-10-03 | FocusAgent: Simple Yet Effective Ways of Trimming the Large Context of Web Agents | Imene Kerboua et.al. | 2510.03204 | link |
| 2025-09-27 | $\texttt{BluePrint}$: A Social Media User Dataset for LLM Persona Evaluation and Training | Aurélien Bück-Kaeffer et.al. | 2510.02343 | null |
| 2025-10-02 | Wearable and Ultra-Low-Power Fusion of EMG and A-Mode US for Hand-Wrist Kinematic Tracking | Giusy Spacone et.al. | 2510.02000 | null |
| 2025-10-02 | Contrastive Representation Regularization for Vision-Language-Action Models | Taeyoung Kim et.al. | 2510.01711 | null |
| 2025-10-01 | EvoStruggle: A Dataset Capturing the Evolution of Struggle across Activities and Skill Levels | Shijia Feng et.al. | 2510.01362 | link |
| 2025-10-01 | HAMLET: Switch your Vision-Language-Action Model into a History-Aware Policy | Myungkyu Koo et.al. | 2510.00695 | null |
| 2025-10-01 | Expandable Decision-Making States for Multi-Agent Deep Reinforcement Learning in Soccer Tactical Analysis | Kenjiro Ide et.al. | 2510.00480 | null |
| 2025-09-30 | Towards Intuitive Human-Robot Interaction through Embodied Gesture-Driven Control with Woven Tactile Skins | ChunPing Lam et.al. | 2509.25951 | null |
| 2025-09-22 | Six Sigma For Neural Networks: Taguchi-based optimization | Sai Varun Kodathala et.al. | 2509.25213 | null |
| 2025-09-29 | Fast Real-Time Pipeline for Robust Arm Gesture Recognition | Milán Zsolt Bagladi et.al. | 2509.25042 | null |
| 2025-09-28 | AssemblyHands-X: Modeling 3D Hand-Body Coordination for Understanding Bimanual Human Activities | Tatsuro Banno et.al. | 2509.23888 | null |
| 2025-09-27 | New Synthetic Goldmine: Hand Joint Angle-Driven EMG Data Generation Framework for Micro-Gesture Recognition | Nana Wang et.al. | 2509.23359 | null |
| 2025-09-27 | Spatiotemporal Radar Gesture Recognition with Hybrid Spiking Neural Networks: Balancing Accuracy and Efficiency | Riccardo Mazzieri et.al. | 2509.23303 | null |
| 2025-09-27 | MMeViT: Multi-Modal ensemble ViT for Post-Stroke Rehabilitation Action Recognition | Ye-eun Kim et.al. | 2509.23044 | null |
| 2025-09-27 | Disentangling Static and Dynamic Information for Reducing Static Bias in Action Recognition | Masato Kobayashi et.al. | 2509.23009 | null |
| 2025-09-26 | See, Point, Fly: A Learning-Free VLM Framework for Universal Unmanned Aerial Navigation | Chih Yao Hu et.al. | 2509.22653 | null |
| 2025-09-26 | Prompt-guided Representation Disentanglement for Action Recognition | Tianci Wu et.al. | 2509.21783 | null |
| 2025-09-25 | SlotFM: A Motion Foundation Model with Slot Attention for Diverse Downstream Tasks | Junyong Park et.al. | 2509.21673 | null |
| 2025-09-25 | Temporal vs. Spatial: Comparing DINOv3 and V-JEPA2 Feature Representations for Video Action Analysis | Sai Varun Kodathala et.al. | 2509.21595 | null |
| 2025-09-25 | EMG-UP: Unsupervised Personalization in Cross-User EMG Gesture Recognition | Nana Wang et.al. | 2509.21589 | null |
| 2025-09-24 | mmHSense: Multi-Modal and Distributed mmWave ISAC Datasets for Human Sensing | Nabeel Nisar Bhat et.al. | 2509.21396 | null |
| 2025-09-25 | Every Subtlety Counts: Fine-grained Person Independence Micro-Action Recognition via Distributionally Robust Optimization | Feng-Qi Cui et.al. | 2509.21261 | null |
| 2025-09-25 | Autoregressive End-to-End Planning with Time-Invariant Spatial Alignment and Multi-Objective Policy Refinement | Jianbo Zhao et.al. | 2509.20938 | null |
| 2025-09-25 | GenFacts-Generative Counterfactual Explanations for Multi-Variate Time Series | Sarah Seifi et.al. | 2509.20936 | null |
| 2025-09-25 | Causal Time Series Generation via Diffusion Models | Yutong Xia et.al. | 2509.20846 | null |
| 2025-09-23 | A Bimanual Gesture Interface for ROS-Based Mobile Manipulators Using TinyML and Sensor Fusion | Najeeb Ahmed Bhuiyan et.al. | 2509.19521 | null |
| 2025-09-23 | FERA: Foil Fencing Referee Assistant Using Pose-Based Multi-Label Move Recognition and Rule Reasoning | Ziwen Chen et.al. | 2509.18527 | null |
| 2025-09-22 | MoCrop: Training Free Motion Guided Cropping for Efficient Video Action Recognition | Binhua Huang et.al. | 2509.18473 | null |
| 2025-09-22 | Orcust: Stepwise-Feedback Reinforcement Learning for GUI Agent | Junyu Lu et.al. | 2509.17917 | null |
| 2025-09-22 | Trainee Action Recognition through Interaction Analysis in CCATT Mixed-Reality Training | Divya Mereddy et.al. | 2509.17888 | null |
| 2025-09-22 | A$^2$M$^2$-Net: Adaptively Aligned Multi-Scale Moment for Few-Shot Action Recognition | Zilin Gao et.al. | 2509.17638 | null |
| 2025-09-22 | UIPro: Unleashing Superior Interaction Capability For GUI Agents | Hongxin Li et.al. | 2509.17328 | null |
| 2025-09-21 | Imagine2Act: Leveraging Object-Action Motion Consistency from Imagined Goals for Robotic Manipulation | Liang Heng et.al. | 2509.17125 | null |
| 2025-09-21 | MoCLIP-Lite: Efficient Video Recognition by Fusing CLIP with Motion Vectors | Binhua Huang et.al. | 2509.17084 | null |
| 2025-09-20 | Automated Procedural Analysis via Video-Language Models for AI-assisted Nursing Skills Assessment | Shen Chang et.al. | 2509.16810 | null |
| 2025-09-19 | KRAST: Knowledge-Augmented Robotic Action Recognition with Structured Text for Vision-Language Models | Son Hai Nguyen et.al. | 2509.16452 | null |
| 2025-09-18 | RynnVLA-001: Using Human Demonstrations to Improve Robot Manipulation | Yuming Jiang et.al. | 2509.15212 | link |
| 2025-09-18 | Doppler Radiance Field-Guided Antenna Selection for Improved Generalization in Multi-Antenna Wi-Fi-based Human Activity Recognition | Navid Hasanzadeh et.al. | 2509.15129 | null |
| 2025-09-18 | LSTC-MDA: A Unified Framework for Long-Short Term Temporal Convolution and Mixed Data Augmentation in Skeleton-Based Action Recognition | Feng Ding et.al. | 2509.14619 | null |
| 2025-09-18 | ClearFairy: Capturing Creative Workflows through Decision Structuring, In-Situ Questioning, and Rationale Inference | Kihoon Son et.al. | 2509.14537 | null |
| 2025-09-15 | Domain-Adaptive Pretraining Improves Primate Behavior Recognition | Felix B. Mueller et.al. | 2509.12193 | null |
| 2025-09-15 | Open-ended Hierarchical Streaming Video Understanding with Vision Language Models | Hyolim Kang et.al. | 2509.12145 | null |
| 2025-09-15 | Gesture-Based Robot Control Integrating Mm-wave Radar and Behavior Trees | Yuqing Song et.al. | 2509.12008 | null |
| 2025-09-15 | Learning Representations in Video Game Agents with Supervised Contrastive Imitation Learning | Carlos Celemin et.al. | 2509.11880 | null |
| 2025-09-11 | Improvement of Human-Object Interaction Action Recognition Using Scene Information and Multi-Task Learning Approach | Hesham M. Shehata et.al. | 2509.09067 | null |
| 2025-09-10 | A Contextual Bandits Approach for Personalization of Hand Gesture Recognition | Duke Lin et.al. | 2509.08915 | null |
| 2025-09-10 | Diffusion-Based Action Recognition Generalizes to Untrained Domains | Rogerio Guimaraes et.al. | 2509.08908 | null |
| 2025-09-10 | Chirality in Action: Time-Aware Video Representation Learning by Latent Straightening | Piyush Bagad et.al. | 2509.08502 | null |
| 2025-09-10 | LD-ViCE: Latent Diffusion Model for Video Counterfactual Explanations | Payal Varshney et.al. | 2509.08422 | null |
| 2025-09-09 | EHWGesture – A dataset for multimodal understanding of clinical gestures | Gianluca Amprimo et.al. | 2509.07525 | null |
| 2025-09-09 | G3CN: Gaussian Topology Refinement Gated Graph Convolutional Network for Skeleton-Based Action Recognition | Haiqing Ren et.al. | 2509.07335 | null |
| 2025-08-05 | Live Demonstration: Neuromorphic Radar for Gesture Recognition | Satyapreet Singh Yadav et.al. | 2508.03324 | null |
| 2025-07-22 | Beyond Label Semantics: Language-Guided Action Anatomy for Few-shot Action Recognition | Zefeng Qian et.al. | 2507.16287 | null |
| 2025-07-22 | SPACT18: Spiking Human Action Recognition Benchmark Dataset with Complementary RGB and Thermal Modalities | Yasser Ashraf et.al. | 2507.16151 | null |
| 2025-07-20 | Light Future: Multimodal Action Frame Prediction via InstructPix2Pix | Zesen Zhong et.al. | 2507.14809 | null |
| 2025-07-17 | A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains | Antonio Finocchiaro et.al. | 2507.13326 | null |
| 2025-07-17 | Rethinking the Embodied Gap in Vision-and-Language Navigation: A Holistic Study of Physical and Visual Disparities | Liuyi Wang et.al. | 2507.13019 | null |
| 2025-07-17 | Generalist Bimanual Manipulation via Foundation Video Diffusion Models | Yao Feng et.al. | 2507.12898 | null |
| 2025-07-16 | Predicting Soccer Penalty Kick Direction Using Human Action Recognition | David Freire-Obregón et.al. | 2507.12617 | null |
| 2025-07-18 | DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition | Hayat Ullah et.al. | 2507.12426 | null |
| 2025-07-16 | Calisthenics Skills Temporal Video Segmentation | Antonio Finocchiaro et.al. | 2507.12245 | null |
| 2025-07-15 | Diffusion-Based Imaginative Coordination for Bimanual Manipulation | Huilin Xu et.al. | 2507.11296 | null |
| 2025-07-15 | Women Sport Actions Dataset for Visual Classification Using Small Scale Training Data | Palash Ray et.al. | 2507.10969 | null |
| 2025-07-14 | Hand Gesture Recognition for Collaborative Robots Using Lightweight Deep Learning in Real-Time Robotic Systems | Muhtadin et.al. | 2507.10055 | null |
| 2025-07-13 | Online Micro-gesture Recognition Using Data Augmentation and Spatial-Temporal Attention | Pengyu Liu et.al. | 2507.09512 | null |
| 2025-07-11 | MM-Gesture: Towards Precise Micro-Gesture Recognition through Multimodal Fusion | Jihao Gu et.al. | 2507.08344 | null |
| 2025-07-10 | Multimodal Framework for Explainable Autonomous Driving: Integrating Video, Sensor, and Textual Data for Enhanced Decision-Making and Transparency | Abolfazl Zarghani et.al. | 2507.07938 | null |
| 2025-07-10 | EEvAct: Early Event-Based Action Recognition with High-Rate Two-Stream Spiking Neural Networks | Michael Neumeier et.al. | 2507.07734 | null |
| 2025-07-09 | Cross-Modal Dual-Causal Learning for Long-Term Action Recognition | Xu Shaowu et.al. | 2507.06603 | null |
| 2025-07-08 | Hierarchical Multi-Stage Transformer Architecture for Context-Aware Temporal Action Localization | Hayat Ullah et.al. | 2507.06411 | null |
| 2025-07-10 | VOTE: Vision-Language-Action Optimization with Trajectory Ensemble Voting | Juyi Lin et.al. | 2507.05116 | link |
| 2025-07-07 | HV-MMBench: Benchmarking MLLMs for Human-Centric Video Understanding | Yuxuan Cai et.al. | 2507.04909 | null |
| 2025-07-06 | Visual Hand Gesture Recognition with Deep Learning: A Comprehensive Review of Methods, Datasets, Challenges and Future Research Directions | Konstantinos Foteinos et.al. | 2507.04465 | null |
| 2025-07-06 | DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge | Wenyao Zhang et.al. | 2507.04447 | link |
| 2025-07-04 | Masked Temporal Interpolation Diffusion for Procedure Planning in Instructional Videos | Yufan Zhou et.al. | 2507.03393 | link |
| 2025-07-05 | AC-DiT: Adaptive Coordination Diffusion Transformer for Mobile Manipulation | Sixiang Chen et.al. | 2507.01961 | null |
| 2025-07-02 | Variational Graph Convolutional Neural Networks | Illia Oleksiienko et.al. | 2507.01699 | null |
| 2025-07-01 | Zero-shot Skeleton-based Action Recognition with Prototype-guided Feature Alignment | Kai Zhou et.al. | 2507.00566 | null |
| 2025-06-30 | LineRetriever: Planning-Aware Observation Reduction for Web Agents | Imene Kerboua et.al. | 2507.00210 | null |
| 2025-06-30 | Online Human Action Detection during Escorting | Siddhartha Mondal et.al. | 2506.23573 | null |
| 2025-06-29 | DEL: Dense Event Localization for Multi-modal Audio-Visual Understanding | Mona Ahmadian et.al. | 2506.23196 | null |
| 2025-06-27 | Frequency-Semantic Enhanced Variational Autoencoder for Zero-Shot Skeleton-based Action Recognition | Wenhan Wu et.al. | 2506.22179 | null |
| 2025-06-26 | WorldVLA: Towards Autoregressive Action World Model | Jun Cen et.al. | 2506.21539 | link |
| 2025-06-26 | EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception | Sanjoy Chowdhury et.al. | 2506.21080 | null |
| 2025-06-25 | How do Foundation Models Compare to Skeleton-Based Approaches for Gesture Recognition in Human-Robot Interaction? | Stephanie Käs et.al. | 2506.20795 | null |
| 2025-06-25 | CARMA: Context-Aware Situational Grounding of Human-Robot Group Interactions by Combining Vision-Language Models with Object and Action Recognition | Joerg Deigmoeller et.al. | 2506.20373 | null |
| 2025-06-25 | Feature Hallucination for Self-supervised Action Recognition | Lei Wang et.al. | 2506.20342 | null |
| 2025-06-27 | ReactEMG: Zero-Shot, Low-Latency Intent Detection via sEMG | Runsheng Wang et.al. | 2506.19815 | null |
| 2025-06-24 | Self-Paced Collaborative and Adversarial Network for Unsupervised Domain Adaptation | Weichen Zhang et.al. | 2506.19267 | null |
| 2025-06-23 | Including Semantic Information via Word Embeddings for Skeleton-based Action Recognition | Dustin Aganian et.al. | 2506.18721 | null |
| 2025-06-23 | Improving Weakly Supervised Temporal Action Localization by Exploiting Multi-resolution Information in Temporal Domain | Rui Su et.al. | 2506.18261 | null |
| 2025-06-23 | Robot Tactile Gesture Recognition Based on Full-body Modular E-skin | Shuo Jiang et.al. | 2506.18256 | null |
| 2025-06-22 | Adapting Vision-Language Models for Evaluating World Models | Mariya Hendriksen et.al. | 2506.17967 | null |
| 2025-06-21 | Domain Generalization using Action Sequences for Egocentric Action Recognition | Amirshayan Nasirimajd et.al. | 2506.17685 | null |
| 2025-06-20 | Wi-Fi Sensing Tool Release: Gathering 802.11ax Channel State Information from a Commercial Wi-Fi Access Point | Zisheng Wang et.al. | 2506.16957 | null |
| 2025-06-20 | Language-driven Description Generation and Common Sense Reasoning for Video Action Recognition | Xiaodan Hu et.al. | 2506.16701 | null |
| 2025-06-19 | CLIP-MG: Guiding Semantic Attention with Skeletal Pose Features and RGB Data for Micro-Gesture Recognition on the iMiGUE Dataset | Santosh Patapati et.al. | 2506.16385 | null |
| 2025-06-18 | Accessible Gesture-Driven Augmented Reality Interaction System | Yikan Wang et.al. | 2506.15189 | null |
| 2025-06-17 | CDP: Towards Robust Autoregressive Visuomotor Policy Learning via Causal Diffusion | Jiahua Ma et.al. | 2506.14769 | null |
| 2025-06-16 | Leveraging Vision-Language Pre-training for Human Activity Recognition in Still Images | Cristina Mahanta et.al. | 2506.13458 | null |
| 2025-06-16 | Active Multimodal Distillation for Few-shot Action Recognition | Weijia Feng et.al. | 2506.13322 | null |
| 2025-06-16 | Action Dubber: Timing Audible Actions via Inflectional Flow | Wenlong Wan et.al. | 2506.13320 | null |
| 2025-06-15 | Towards Fine-Grained Emotion Understanding via Skeleton-Based Micro-Gesture Recognition | Hao Xu et.al. | 2506.12848 | null |
| 2025-06-13 | Pose Matters: Evaluating Vision Transformers and CNNs for Human Action Recognition on Small COCO Subsets | MingZe Tang et.al. | 2506.11678 | null |
| 2025-06-12 | GynSurg: A Comprehensive Gynecology Laparoscopic Surgery Dataset | Sahar Nasirihaghighi et.al. | 2506.11356 | null |
| 2025-06-12 | WaveFormer: A Lightweight Transformer Model for sEMG-based Gesture Recognition | Yanlong Chen et.al. | 2506.11168 | null |
| 2025-06-11 | SLRNet: A Real-Time LSTM-Based Sign Language Recognition System | Sharvari Kamble et.al. | 2506.11154 | link |
| 2025-06-10 | Gender Fairness of Machine Learning Algorithms for Pain Detection | Dylan Green et.al. | 2506.11132 | null |
| 2025-06-12 | Eye, Robot: Learning to Look to Act with a BC-RL Perception-Action Loop | Justin Kerr et.al. | 2506.10968 | null |
| 2025-06-11 | HopaDIFF: Holistic-Partial Aware Fourier Conditioned Diffusion for Referring Human Action Segmentation in Multi-Person Scenarios | Kunyu Peng et.al. | 2506.09650 | link |
| 2025-06-11 | Time-Unified Diffusion Policy with Action Discrimination for Robotic Manipulation | Ye Niu et.al. | 2506.09422 | null |
| 2025-06-11 | Synthetic Human Action Video Data Generation with Pose Transfer | Vaclav Knapp et.al. | 2506.09411 | null |
| 2025-06-11 | An Effective End-to-End Solution for Multimodal Action Recognition | Songping Wang et.al. | 2506.09345 | null |
| 2025-06-10 | Diver-Robot Communication Dataset for Underwater Hand Gesture Recognition | Igor Kvasić et.al. | 2506.08974 | null |
| 2025-06-09 | BridgeVLA: Input-Output Alignment for Efficient 3D Manipulation Learning with Vision-Language Models | Peiyan Li et.al. | 2506.07961 | link |
| 2025-06-08 | AugmentGest: Can Random Data Cropping Augmentation Boost Gesture Recognition Performance? | Nada Aboudeshish et.al. | 2506.07216 | null |
| 2025-06-08 | SAP-Bench: Benchmarking Multimodal Large Language Models in Surgical Action Planning | Mengya Xu et.al. | 2506.07196 | null |
| 2025-06-07 | PhysLab: A Benchmark Dataset for Multi-Granularity Visual Parsing of Physics Experiments | Minghao Zou et.al. | 2506.06631 | null |
| 2025-06-06 | Conversational Interfaces for Parametric Conceptual Architectural Design: Integrating Mixed Reality with LLM-driven Interaction | Ruochen Ji et.al. | 2506.06066 | null |
| 2025-06-06 | DriveAction: A Benchmark for Exploring Human-like Driving Decisions in VLA Models | Yuhan Hao et.al. | 2506.05667 | null |
| 2025-06-05 | Robustness Evaluation for Video Models with Reinforcement Learning | Ashwin Ramesh Babu et.al. | 2506.05431 | null |
| 2025-06-04 | Video, How Do Your Tokens Merge? | Sam Pollard et.al. | 2506.03885 | null |
| 2025-06-04 | Zero-Shot Temporal Interaction Localization for Egocentric Videos | Erhang Zhang et.al. | 2506.03662 | link |
| 2025-06-04 | Heterogeneous Skeleton-Based Action Representation Learning | Hongsong Wang et.al. | 2506.03481 | null |
| 2025-06-04 | Go Beyond Earth: Understanding Human Actions and Scenes in Microgravity Environments | Di Wen et.al. | 2506.02845 | link |
| 2025-06-03 | Technical Report for Ego4D Long-Term Action Anticipation Challenge 2025 | Qiaohui Chu et.al. | 2506.02550 | null |
| 2025-06-03 | VS-Bench: Evaluating VLMs for Strategic Reasoning and Decision-Making in Multi-Agent Environments | Zelai Xu et.al. | 2506.02387 | link |
| 2025-06-03 | Multi-level and Multi-modal Action Anticipation | Seulgi Kim et.al. | 2506.02382 | null |
| 2025-06-02 | TransAct V2: Lifelong User Action Sequence Modeling on Pinterest Recommendation | Xue Xia et.al. | 2506.02267 | null |
| 2025-06-02 | SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics | Mustafa Shukor et.al. | 2506.01844 | link |
| 2025-06-02 | Efficient Egocentric Action Recognition with Multimodal Data | Marco Calzavara et.al. | 2506.01757 | null |
| 2025-06-02 | EPFL-Smart-Kitchen-30: Densely annotated cooking dataset with 3D kinematics to challenge video and language models | Andy Bonnetto et.al. | 2506.01608 | link |
| 2025-06-02 | Sheep Facial Pain Assessment Under Weighted Graph Neural Networks | Alam Noor et.al. | 2506.01468 | null |
| 2025-06-02 | EgoBrain: Synergizing Minds and Eyes For Human Action Understanding | Nie Lin et.al. | 2506.01353 | null |
| 2025-05-30 | DiG-Net: Enhancing Quality of Life through Hyper-Range Dynamic Gesture Recognition in Assistive Robotics | Eran Bamani Beeri et.al. | 2505.24786 | null |
| 2025-05-30 | Beyond FACS: Data-driven Facial Expression Dictionaries, with Application to Predicting Autism | Evangelos Sariyanidi et.al. | 2505.24679 | null |
| 2025-05-30 | EgoExOR: An Ego-Exo-Centric Operating Room Dataset for Surgical Activity Understanding | Ege Özsoy et.al. | 2505.24287 | null |
| 2025-05-29 | Autoregressive Meta-Actions for Unified Controllable Trajectory Generation | Jianbo Zhao et.al. | 2505.23612 | null |
| 2025-05-29 | CLIP-AE: CLIP-assisted Cross-view Audio-Visual Enhancement for Unsupervised Temporal Action Localization | Rui Xia et.al. | 2505.23524 | null |
| 2025-05-29 | Spatio-Temporal Joint Density Driven Learning for Skeleton-Based Action Recognition | Shanaka Ramesh Gunasekara et.al. | 2505.23012 | link |
| 2025-05-28 | PRISM: Video Dataset Condensation with Progressive Refinement and Insertion for Sparse Motion | Jaehyun Choi et.al. | 2505.22564 | null |
| 2025-05-27 | DeepConvContext: A Multi-Scale Approach to Timeseries Classification in Human Activity Recognition | Marius Bock et.al. | 2505.20894 | link |
| 2025-05-27 | TrustSkin: A Fairness Pipeline for Trustworthy Facial Affect Analysis Across Skin Tone | Ana M. Cabanas et.al. | 2505.20637 | null |
| 2025-05-26 | Data-Free Class-Incremental Gesture Recognition with Prototype-Guided Pseudo Feature Replay | Hongsong Wang et.al. | 2505.20049 | link |
| 2025-05-26 | PHI: Bridging Domain Shift in Long-Term Action Quality Assessment via Progressive Hierarchical Instruction | Kanglei Zhou et.al. | 2505.19972 | link |
| 2025-05-26 | The Role of Video Generation in Enhancing Data-Limited Action Understanding | Wei Li et.al. | 2505.19495 | null |
| 2025-05-24 | ProphetDWM: A Driving World Model for Rolling Out Future Actions and Videos | Xiaodong Wang et.al. | 2505.18650 | null |
| 2025-05-27 | SHARDeg: A Benchmark for Skeletal Human Action Recognition in Degraded Scenarios | Simon Malzard et.al. | 2505.18048 | null |
| 2025-05-23 | 3D Face Reconstruction Error Decomposed: A Modular Benchmark for Fair and Fast Method Evaluation | Evangelos Sariyanidi et.al. | 2505.18025 | null |
| 2025-05-23 | Multi-task Learning For Joint Action and Gesture Recognition | Konstantinos Spathis et.al. | 2505.17867 | null |
| 2025-05-23 | Temporal Consistency Constrained Transferable Adversarial Attacks with Background Mixup for Action Recognition | Ping Li et.al. | 2505.17807 | link |
| 2025-05-23 | Integrating Counterfactual Simulations with Language Models for Explaining Multi-Agent Behaviour | Bálint Gyevnár et.al. | 2505.17801 | null |
| 2025-05-23 | SVL: Spike-based Vision-language Pretraining for Efficient 3D Open-world Understanding | Xuerui Qiu et.al. | 2505.17674 | null |
| 2025-05-23 | ProTAL: A Drag-and-Link Video Programming Framework for Temporal Action Localization | Yuchen He et.al. | 2505.17555 | null |
| 2025-05-22 | UAV Control with Vision-based Hand Gesture Recognition over Edge-Computing | Sousannah Abdalla et.al. | 2505.17303 | null |
| 2025-05-22 | CoMo: Learning Continuous Latent Motion from Internet Videos for Scalable Robot Learning | Jiange Yang et.al. | 2505.17006 | null |
| 2025-05-21 | Towards Zero-Shot Differential Morphing Attack Detection with Multimodal Large Language Models | Ria Shekhawat et.al. | 2505.15332 | null |
| 2025-05-21 | DiffProb: Data Pruning for Face Recognition | Eduarda Caldeira et.al. | 2505.15272 | link |
| 2025-05-21 | Leveraging Foundation Models for Multimodal Graph-Based Action Recognition | Fatemeh Ziaeetabar et.al. | 2505.15192 | null |
| 2025-05-20 | Egocentric Action-aware Inertial Localization in Point Clouds | Mingfang Zhang et.al. | 2505.14346 | link |
| 2025-05-20 | Transfer Learning from Visual Speech Recognition to Mouthing Recognition in German Sign Language | Dinh Nam Pham et.al. | 2505.13784 | link |
| 2025-05-18 | MTIL: Encoding Full History with Mamba for Temporal Imitation Learning | Yulin Zhou et.al. | 2505.12410 | link |
| 2025-05-20 | Aux-Think: Exploring Reasoning Strategies for Data-Efficient Vision-Language Navigation | Shuo Wang et.al. | 2505.11886 | null |
| 2025-05-16 | Dynam3D: Dynamic Layered 3D Tokens Empower VLM for Vision-and-Language Navigation | Zihan Wang et.al. | 2505.11383 | link |
| 2025-05-15 | NeoLightning: A Modern Reimagination of Gesture-Based Sound Design | Yonghyun Kim et.al. | 2505.10686 | link |
| 2025-05-15 | Are Spatial-Temporal Graph Convolution Networks for Human Action Recognition Over-Parameterized? | Jianyang Xie et.al. | 2505.10679 | link |
| 2025-05-14 | Mission Balance: Generating Under-represented Class Samples using Video Diffusion Models | Danush Kumar Venkatesh et.al. | 2505.09858 | link |
| 2025-05-13 | Reinforcement Learning meets Masked Video Modeling: Trajectory-Guided Adaptive Token Selection | Ayush K. Rai et.al. | 2505.08561 | null |
| 2025-05-17 | Training Strategies for Efficient Embodied Reasoning | William Chen et.al. | 2505.08243 | null |
| 2025-05-12 | H$^{\mathbf{3}}$DP: Triply-Hierarchical Diffusion Policy for Visuomotor Learning | Yiyang Lu et.al. | 2505.07819 | null |
| 2025-05-11 | DeepSORT-Driven Visual Tracking Approach for Gesture Recognition in Interactive Systems | Tong Zhang et.al. | 2505.07110 | null |
| 2025-05-10 | A Short Overview of Multi-Modal Wi-Fi Sensing | Zijian Zhao et.al. | 2505.06682 | link |
| 2025-05-09 | Context Informed Incremental Learning Improves Myoelectric Control Performance in Virtual Reality Object Manipulation Tasks | Gabriel Gagné et.al. | 2505.06064 | link |
| 2025-05-09 | Task-Adapter++: Task-specific Adaptation with Order-aware Alignment for Few-shot Action Recognition | Congqi Cao et.al. | 2505.06002 | link |
| 2025-05-07 | DetReIDX: A Stress-Test Dataset for Real-World UAV-Based Person Recognition | Kailash A. Hambarde et.al. | 2505.04793 | link |
| 2025-05-07 | Comparison of Visual Trackers for Biomechanical Analysis of Running | Luis F. Gomez et.al. | 2505.04713 | null |
| 2025-05-07 | Trajectory Entropy Reinforcement Learning for Predictable and Robust Control | Bang You et.al. | 2505.04193 | null |
| 2025-05-07 | FoodTrack: Estimating Handheld Food Portions with Egocentric Video | Ervin Wang et.al. | 2505.04055 | null |
| 2025-05-06 | Action Spotting and Precise Event Detection in Sports: Datasets, Methods, and Challenges | Hao Xu et.al. | 2505.03991 | null |
| 2025-05-03 | A Multimodal Framework for Explainable Evaluation of Soft Skills in Educational Environments | Jared D. T. Guerrero-Sosa et.al. | 2505.01794 | null |
| 2025-05-01 | Predicting Estimated Times of Restoration for Electrical Outages Using Longitudinal Tabular Transformers | Bogireddy Sai Prasanna Teja et.al. | 2505.00225 | null |
| 2025-04-30 | Direct Motion Models for Assessing Generated Videos | Kelsey Allen et.al. | 2505.00209 | null |
| 2025-04-30 | CoCoDiff: Diversifying Skeleton Action Features via Coarse-Fine Text-Co-Guided Latent Diffusion | Zhifu Zhao et.al. | 2504.21266 | null |
| 2025-04-29 | Beyond the Horizon: Decoupling UAVs Multi-View Action Recognition via Partial Order Transfer | Wenxuan Liu et.al. | 2504.20530 | null |
| 2025-04-28 | ProFi-Net: Prototype-based Feature Attention with Curriculum Augmentation for WiFi-based Gesture Recognition | Zhe Cui et.al. | 2504.20193 | null |
| 2025-04-28 | FSBench: A Figure Skating Benchmark for Advancing Artistic Sports Understanding | Rong Gao et.al. | 2504.19514 | null |
| 2025-04-26 | 3DPyranet Features Fusion for Spatio-temporal Feature Learning | Ihsan Ullah et.al. | 2504.18977 | null |
| 2025-04-25 | POET: Prompt Offset Tuning for Continual Human Action Adaptation | Prachi Garg et.al. | 2504.18059 | null |
| 2025-04-25 | RSRNav: Reasoning Spatial Relationship for Image-Goal Navigation | Zheng Qin et.al. | 2504.17991 | null |
| 2025-04-24 | Robotic Task Ambiguity Resolution via Natural Language Interaction | Eugenio Chisari et.al. | 2504.17748 | null |
| 2025-04-23 | Latent Diffusion Planning for Imitation Learning | Amber Xie et.al. | 2504.16925 | null |
| 2025-04-23 | WiFi based Human Fall and Activity Recognition using Transformer based Encoder Decoder and Graph Neural Networks | Younggeol Cho et.al. | 2504.16655 | null |
| 2025-04-23 | Advancing Radar Hand Gesture Recognition: A Hybrid Spectrum Synthetic Framework Merging Simulation with Neural Networks | Jiaqi Tang et.al. | 2504.16423 | null |
| 2025-04-21 | Bridge the Gap: From Weak to Full Supervision for Temporal Action Localization with PseudoFormer | Ziyi Liu et.al. | 2504.14860 | null |
| 2025-04-20 | Time Frequency Analysis of EMG Signal for Gesture Recognition using Fine grained Features | Parshuram N. Aarotale et.al. | 2504.14708 | null |
| 2025-04-22 | Talk is Not Always Cheap: Promoting Wireless Sensing Models with Text Prompts | Zhenkui Yang et.al. | 2504.14621 | link |
| 2025-04-19 | Balancing Privacy and Action Performance: A Penalty-Driven Approach to Image Anonymization | Nazia Aslam et.al. | 2504.14301 | null |
| 2025-04-18 | Are you SURE? Enhancing Multimodal Pretraining with Missing Modalities through Uncertainty Estimation | Duy A. Nguyen et.al. | 2504.13465 | null |
| 2025-04-23 | Chain-of-Thought Textual Reasoning for Few-shot Temporal Action Localization | Hongwei Ji et.al. | 2504.13460 | null |
| 2025-04-17 | Wearable-Derived Behavioral and Physiological Biomarkers for Classifying Unipolar and Bipolar Depression Severity | Yassine Ouzar et.al. | 2504.13331 | null |
| 2025-04-17 | PCBEAR: Pose Concept Bottleneck for Explainable Action Recognition | Jongseo Lee et.al. | 2504.13140 | null |
| 2025-04-16 | SkeletonX: Data-Efficient Skeleton-based Action Recognition via Cross-sample Feature Aggregation | Zongye Zhang et.al. | 2504.11749 | link |
| 2025-04-14 | Toward Aligning Human and Robot Actions via Multi-Modal Demonstration Learning | Azizul Zahid et.al. | 2504.11493 | link |
| 2025-04-14 | H-MoRe: Learning Human-centric Motion Representation for Action Analysis | Zhanbo Huang et.al. | 2504.10676 | null |
| 2025-04-14 | Towards Low-Latency Event-based Obstacle Avoidance on a FPGA-Drone | Pietro Bonazzi et.al. | 2504.10400 | link |
| 2025-04-14 | Hierarchical Relation-augmented Representation Generalization for Few-shot Action Recognition | Hongyu Qu et.al. | 2504.10079 | null |
| 2025-04-14 | EmbodiedAgent: A Scalable Hierarchical Approach to Overcome Practical Challenge in Multi-Robot Control | Hanwen Wan et.al. | 2504.10030 | link |
| 2025-04-14 | Hands-On: Segmenting Individual Signs from Continuous Sequences | Low Jian He et.al. | 2504.08593 | null |
| 2025-04-11 | Knowledge Distillation for Multimodal Egocentric Action Recognition Robust to Missing Modalities | Maria Santos-Villafranca et.al. | 2504.08578 | null |
| 2025-04-11 | Breaking the Barriers: Video Vision Transformers for Word-Level Sign Language Recognition | Alexander Brettmann et.al. | 2504.07792 | null |
| 2025-04-10 | Towards Micro-Action Recognition with Limited Annotations: An Asynchronous Pseudo Labeling and Training Approach | Yan Zhang et.al. | 2504.07785 | null |
| 2025-04-13 | ID-Booth: Identity-consistent Face Generation with Diffusion Models | Darian Tomašević et.al. | 2504.07392 | link |
| 2025-04-09 | Leveraging GCN-based Action Recognition for Teleoperation in Daily Activity Assistance | Thomas M. Kwok et.al. | 2504.07001 | null |
| 2025-04-09 | Efficient Deployment of Spiking Neural Networks on SpiNNaker2 for DVS Gesture Recognition Using Neuromorphic Intermediate Representation | Sirine Arfa et.al. | 2504.06748 | null |
| 2025-04-09 | Exploring Ordinal Bias in Action Recognition for Instructional Videos | Joochan Kim et.al. | 2504.06580 | link |
| 2025-04-08 | FaceCloak: Learning to Protect Face Templates | Sudipta Banerjee et.al. | 2504.06131 | null |
| 2025-04-08 | Modular Soft Wearable Glove for Real-Time Gesture Recognition and Dynamic 3D Shape Reconstruction | Huazhi Dong et.al. | 2504.05983 | null |
| 2025-04-08 | Temporal Alignment-Free Video Matching for Few-shot Action Recognition | SuBeen Lee et.al. | 2504.05956 | null |
| 2025-04-08 | SEVERE++: Evaluating Benchmark Sensitivity in Generalization of Video Representation Learning | Fida Mohammad Thoker et.al. | 2504.05706 | null |
| 2025-04-07 | SelfMAD: Enhancing Generalization and Robustness in Morphing Attack Detection via Self-Supervised Learning | Marija Ivanovska et.al. | 2504.05504 | null |
| 2025-04-06 | SnapPix: Efficient-Coding–Inspired In-Sensor Compression for Edge Vision | Weikai Lin et.al. | 2504.04535 | null |
| 2025-04-04 | An Exploration-free Method for a Linear Stochastic Bandit Driven by a Linear Gaussian Dynamical System | Jonathan Gornet et.al. | 2504.03926 | null |
| 2025-04-04 | Electromyography-Based Gesture Recognition: Hierarchical Feature Extraction for Enhanced Spatial-Temporal Dynamics | Jungpil Shin et.al. | 2504.03221 | null |
| 2025-04-02 | UAC: Uncertainty-Aware Calibration of Neural Networks for Gesture Detection | Farida Al Haddad et.al. | 2504.02895 | null |
| 2025-04-03 | Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets | Chuning Zhu et.al. | 2504.02792 | null |
| 2025-04-03 | MultiSensor-Home: A Wide-area Multi-modal Multi-view Dataset for Action Recognition and Transformer-based Sensor Fusion | Trung Thanh Nguyen et.al. | 2504.02287 | link |
| 2025-04-07 | MultiTSF: Transformer-based Sensor Fusion for Human-Centric Multi-view and Multi-modal Action Recognition | Trung Thanh Nguyen et.al. | 2504.02279 | null |
| 2025-04-03 | SocialGesture: Delving into Multi-person Gesture Understanding | Xu Cao et.al. | 2504.02244 | null |
| 2025-04-02 | LSC-ADL: An Activity of Daily Living (ADL)-Annotated Lifelog Dataset Generated via Semi-Automatic Clustering | Minh-Quan Ho-Le et.al. | 2504.02060 | null |
| 2025-04-07 | Is Temporal Prompting All We Need For Limited Labeled Action Recognition? | Shreyank N Gowda et.al. | 2504.01890 | null |
| 2025-04-01 | FDDet: Frequency-Decoupling for Boundary Refinement in Temporal Action Detection | Xinnan Zhu et.al. | 2504.00647 | null |
| 2025-04-01 | Sample-level Adaptive Knowledge Distillation for Action Recognition | Ping Li et.al. | 2504.00606 | null |
| 2025-03-30 | CA$^{2}$ST: Cross-Attention in Audio, Space, and Time for Holistic Video Recognition | Jongseo Lee et.al. | 2503.23447 | null |
| 2025-03-30 | OwlSight: A Robust Illumination Adaptation Framework for Dark Video Human Action Recognition | Shihao Cheng et.al. | 2503.23266 | null |
| 2025-03-29 | Action Recognition in Real-World Ambient Assisted Living Environment | Vincent Gbouna Zakka et.al. | 2503.23214 | link |
| 2025-03-28 | ForcePose: A Deep Learning Approach for Force Calculation Based on Action Recognition Using MediaPipe Pose Estimation Combined with Object Detection | Nandakishor M et.al. | 2503.22363 | null |
| 2025-03-30 | UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning | Zhengxi Lu et.al. | 2503.21620 | link |
| 2025-03-27 | One Snapshot is All You Need: A Generalized Method for mmWave Signal Generation | Teng Huang et.al. | 2503.21122 | null |
| 2025-03-26 | ScreenLLM: Stateful Screen Schema for Efficient Action Understanding and Prediction | Yiqiao Jin et.al. | 2503.20978 | null |
| 2025-03-26 | Siformer: Feature-isolated Transformer for Efficient Skeleton-based Sign Language Recognition | Muxin Pu et.al. | 2503.20436 | null |
| 2025-03-25 | Surg-3M: A Dataset and Foundation Model for Perception in Surgical Settings | Chengan Che et.al. | 2503.19740 | link |
| 2025-03-25 | fine-CLIP: Enhancing Zero-Shot Fine-Grained Surgical Action Recognition with Vision-Language Models | Saurav Sharma et.al. | 2503.19670 | null |
| 2025-03-24 | LLaVAction: evaluating and training multi-modal large language models for action recognition | Shaokai Ye et.al. | 2503.18712 | link |
| 2025-03-24 | Surgical Action Planning with Large Language Models | Mengya Xu et.al. | 2503.18296 | null |
| 2025-03-27 | Temporal-Guided Spiking Neural Networks for Event-Based Human Action Recognition | Siyuan Yang et.al. | 2503.17132 | null |
| 2025-03-21 | BEAC: Imitating Complex Exploration and Task-oriented Behaviors for Invisible Object Nonprehensile Manipulation | Hirotaka Tahara et.al. | 2503.16803 | null |
| 2025-03-21 | Improving mmWave based Hand Hygiene Monitoring through Beam Steering and Combining Techniques | Isura Nirmal et.al. | 2503.16764 | null |
| 2025-03-19 | A Comprehensive Survey on Architectural Advances in Deep CNNs: Challenges, Applications, and Emerging Research Directions | Saddam Hussain Khan et.al. | 2503.16546 | null |
| 2025-03-25 | Deep learning framework for action prediction reveals multi-timescale locomotor control | Wei-Chen Wang et.al. | 2503.16340 | null |
| 2025-03-19 | UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction | Shravan Nayak et.al. | 2503.15661 | null |
| 2025-03-19 | Multi-Modal Gesture Recognition from Video and Surgical Tool Pose Information via Motion Invariants | Jumanh Atoum et.al. | 2503.15647 | null |
| 2025-03-21 | Body-Hand Modality Expertized Networks with Cross-attention for Fine-grained Skeleton Action Recognition | Seungyeon Cho et.al. | 2503.14960 | null |
| 2025-03-19 | DPFlow: Adaptive Optical Flow Estimation with a Dual-Pyramid Framework | Henrique Morimitsu et.al. | 2503.14880 | link |
| 2025-03-15 | Salient Temporal Encoding for Dynamic Scene Graph Generation | Zhihao Zhu et.al. | 2503.14524 | null |
| 2025-03-17 | Towards Scalable Modeling of Compressed Videos for Efficient Action Recognition | Shristi Das Biswas et.al. | 2503.13724 | null |
| 2025-03-20 | STEP: Simultaneous Tracking and Estimation of Pose for Animals and Humans | Shashikant Verma et.al. | 2503.13344 | null |
| 2025-03-17 | Dense Policy: Bidirectional Autoregressive Learning of Actions | Yue Su et.al. | 2503.13217 | null |
| 2025-03-16 | EgoEvGesture: Gesture Recognition Based on Egocentric Event Camera | Luming Wang et.al. | 2503.12419 | link |
| 2025-03-16 | ProbDiffFlow: An Efficient Learning-Free Framework for Probabilistic Single-Image Optical Flow Estimation | Mo Zhou et.al. | 2503.12348 | null |
| 2025-03-15 | Real-Time Manipulation Action Recognition with a Factorized Graph Sequence Encoder | Enes Erdogan et.al. | 2503.12034 | null |
| 2025-03-14 | Enhancing Hand Palm Motion Gesture Recognition by Eliminating Reference Frame Bias via Frame-Invariant Similarity Measures | Arno Verduyn et.al. | 2503.11352 | null |
| 2025-03-14 | Aerial Vision-and-Language Navigation with Grid-based View Selection and Map Construction | Ganlong Zhao et.al. | 2503.11091 | null |
| 2025-03-14 | VA-AR: Learning Velocity-Aware Action Representations with Mixture of Window Attention | Jiangning Wei et.al. | 2503.11004 | null |
| 2025-03-13 | Spatial-Temporal Graph Diffusion Policy with Kinematic Modeling for Bimanual Robotic Manipulation | Qi Lv et.al. | 2503.10743 | null |
| 2025-03-11 | Open-World Skill Discovery from Unsegmented Demonstrations | Jingwen Deng et.al. | 2503.10684 | link |
| 2025-03-17 | HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model | Jiaming Liu et.al. | 2503.10631 | null |
| 2025-03-13 | SurgRAW: Multi-Agent Workflow with Chain-of-Thought Reasoning for Surgical Intelligence | Chang Han Low et.al. | 2503.10265 | null |
| 2025-03-12 | A Hybrid Neural Network with Smart Skip Connections for High-Precision, Low-Latency EMG-Based Hand Gesture Recognition | Hafsa Wazir et.al. | 2503.09041 | null |
| 2025-03-12 | Unified Locomotion Transformer with Simultaneous Sim-to-Real Transfer for Quadrupeds | Dikai Liu et.al. | 2503.08997 | null |
| 2025-03-11 | PromptGAR: Flexible Promptive Group Activity Recognition | Zhangyu Jin et.al. | 2503.08933 | null |
| 2025-03-11 | MetaFold: Language-Guided Multi-Category Garment Folding Framework via Trajectory Generation and Foundation Model | Haonan Chen et.al. | 2503.08372 | null |
| 2025-03-11 | A Survey on Wi-Fi Sensing Generalizability: Taxonomy, Techniques, Datasets, and Future Research Prospects | Fei Wang et.al. | 2503.08008 | null |
| 2025-03-10 | Helios 2.0: A Robust, Ultra-Low Power Gesture Recognition System Optimised for Event-Sensor based Wearables | Prarthana Bhattacharyya et.al. | 2503.07825 | null |
| 2025-03-10 | Elderly Activity Recognition in the Wild: Results from the EAR Challenge | Anh-Kiet Duong et.al. | 2503.07821 | link |
| 2025-03-09 | TimeLoc: A Unified End-to-End Framework for Precise Timestamp Localization in Long Videos | Chen-Lin Zhang et.al. | 2503.06526 | link |
| 2025-03-09 | SGA-INTERACT: A 3D Skeleton-based Benchmark for Group Activity Understanding in Modern Basketball Tactic | Yuchen Yang et.al. | 2503.06522 | link |
| 2025-03-07 | MPTSNet: Integrating Multiscale Periodic Local Patterns and Global Dependencies for Multivariate Time Series Classification | Yang Mu et.al. | 2503.05582 | null |
| 2025-03-07 | Multi-Grained Feature Pruning for Video-Based Human Pose Estimation | Zhigang Wang et.al. | 2503.05365 | null |
| 2025-03-06 | Maestro: A 302 GFLOPS/W and 19.8GFLOPS RISC-V Vector-Tensor Architecture for Wearable Ultrasound Edge Computing | Mattia Sinigaglia et.al. | 2503.04581 | null |
| 2025-03-06 | Gate-Shift-Pose: Enhancing Action Recognition in Sports with Skeleton Information | Edoardo Bianchi et.al. | 2503.04470 | link |
| 2025-03-06 | Spatial-Temporal Perception with Causal Inference for Naturalistic Driving Action Recognition | Qing Chang et.al. | 2503.04078 | null |
| 2025-03-06 | Social Gesture Recognition in spHRI: Leveraging Fabric-Based Tactile Sensing on Humanoid Robots | Dakarai Crowder et.al. | 2503.03234 | null |
| 2025-03-04 | Semi-Supervised Audio-Visual Video Action Recognition with Audio Source Localization Guided Mixup | Seokun Kang et.al. | 2503.02284 | null |
| 2025-03-04 | FABG: End-to-end Imitation Learning for Embodied Affective Human-Robot Interaction | Yanghai Zhang et.al. | 2503.01363 | null |
| 2025-03-04 | An Efficient 3D Convolutional Neural Network with Channel-wise, Spatial-grouped, and Temporal Convolutions | Zhe Wang et.al. | 2503.00796 | null |
| 2025-03-02 | One-Shot Gesture Recognition for Underwater Diver-To-Robot Communication | Rishikesh Joshi et.al. | 2503.00676 | null |
| 2025-03-04 | Unified Video Action Model | Shuang Li et.al. | 2503.00200 | link |
| 2025-02-28 | BST: Badminton Stroke-type Transformer for Skeleton-based Action Recognition in Racket Sports | Jing-Yuan Chang et.al. | 2502.21085 | null |
| 2025-02-27 | Learning to Generalize without Bias for Open-Vocabulary Action Recognition | Yating Yu et.al. | 2502.20158 | link |
| 2025-02-27 | QORT-Former: Query-optimized Real-time Transformer for Understanding Two Hands Manipulating Objects | Elkhan Ismayilzada et.al. | 2502.19769 | null |
| 2025-02-26 | Deep Learning For Time Series Analysis With Application On Human Motion | Ali Ismail-Fawaz et.al. | 2502.19364 | null |
| 2025-02-26 | UQABench: Evaluating User Embedding for Prompting LLMs in Personalized Question Answering | Langming Liu et.al. | 2502.19178 | link |
| 2025-02-25 | EgoSim: An Egocentric Multi-view Simulator and Real Dataset for Body-worn Cameras during Motion and Activity | Dominik Hollidt et.al. | 2502.18373 | link |
| 2025-02-25 | Edge Training and Inference with Analog ReRAM Technology for Hand Gesture Recognition | Victoria Clerico et.al. | 2502.18152 | null |
| 2025-02-23 | Trunk-branch Contrastive Network with Multi-view Deformable Aggregation for Multi-view Action Recognition | Yingyuan Yang et.al. | 2502.16493 | null |
| 2025-02-20 | Online hand gesture recognition using Continual Graph Transformers | Rim Slama et.al. | 2502.14939 | null |
| 2025-02-19 | Are Rules Meant to be Broken? Understanding Multilingual Moral Reasoning as a Computational Pipeline with UniMoral | Shivani Kumar et.al. | 2502.14083 | null |
| 2025-02-19 | PSCon: Toward Conversational Product Search | Jie Zou et.al. | 2502.13881 | link |
| 2025-02-19 | SNN-Driven Multimodal Human Action Recognition via Event Camera and Skeleton Data Fusion | Naichuan Zheng et.al. | 2502.13385 | null |
| 2025-02-18 | Beyond Timesteps: A Novel Activation-wise Membrane Potential Propagation Mechanism for Spiking Neural Networks in 3D cloud | Jian Song et.al. | 2502.12791 | null |
| 2025-02-18 | Adaptive Prototype Model for Attribute-based Multi-label Few-shot Action Recognition | Juefeng Xiao et.al. | 2502.12582 | null |
| 2025-02-25 | Duo Streamers: A Streaming Gesture Recognition Framework | Boxuan Zhu et.al. | 2502.12297 | link |
| 2025-02-17 | Can LLMs Simulate Social Media Engagement? A Study on Action-Guided Response Generation | Zhongyi Qiu et.al. | 2502.12073 | null |
| 2025-02-14 | ManiTrend: Bridging Future Generation and Action Prediction with 3D Flow for Robotic Manipulation | Yuxin He et.al. | 2502.10028 | null |
| 2025-02-14 | VicKAM: Visual Conceptual Knowledge Guided Action Map for Weakly Supervised Group Activity Recognition | Zhuming Wang et.al. | 2502.09967 | null |
| 2025-02-13 | CellFlow: Simulating Cellular Morphology Changes via Flow Matching | Yuhui Zhang et.al. | 2502.09775 | link |
| 2025-02-12 | Measuring Anxiety Levels with Head Motion Patterns in Severe Depression Population | Fouad Boualeb et.al. | 2502.08813 | null |
| 2025-02-18 | Robot Data Curation with Mutual Information Estimators | Joey Hejna et.al. | 2502.08623 | null |
| 2025-02-12 | DGSense: A Domain Generalization Framework for Wireless Sensing | Rui Zhou et.al. | 2502.08155 | null |
| 2025-02-11 | Enhancing Video Understanding: Deep Neural Networks for Spatiotemporal Analysis | Amir Hosein Fadaei et.al. | 2502.07277 | null |
| 2025-02-10 | From Image to Video: An Empirical Study of Diffusion Representations | Pedro Vélez et.al. | 2502.07001 | null |
| 2025-02-10 | Conformal Predictions for Human Action Recognition with Vision-Language Models | Bary Tim et.al. | 2502.06631 | null |
| 2025-02-10 | AppVLM: A Lightweight Vision Language Model for Online App Control | Georgios Papoudakis et.al. | 2502.06395 | null |
| 2025-02-09 | Preventing Rogue Agents Improves Multi-Agent Collaboration | Ohav Barbi et.al. | 2502.05986 | link |
| 2025-02-09 | HyLiFormer: Hyperbolic Linear Attention for Skeleton-based Human Action Recognition | Yue Li et.al. | 2502.05869 | null |
| 2025-02-11 | HAMSTER: Hierarchical Action Models For Open-World Robot Manipulation | Yi Li et.al. | 2502.05485 | link |
| 2025-02-06 | HD-EPIC: A Highly-Detailed Egocentric Video Dataset | Toby Perrett et.al. | 2502.04144 | null |
| 2025-02-06 | MD-BERT: Action Recognition in Dark Videos via Dynamic Multi-Stream Fusion and Temporal Modeling | Sharana Dharshikgan Suresh Dass et.al. | 2502.03724 | null |
| 2025-02-10 | Kronecker Mask and Interpretive Prompts are Language-Action Video Learners | Jingyi Yang et.al. | 2502.03549 | link |
| 2025-02-05 | SKI Models: Skeleton Induced Vision-Language Embeddings for Understanding Activities of Daily Living | Arkaprava Sinha et.al. | 2502.03459 | null |
| 2025-02-01 | Minimalistic Video Saliency Prediction via Efficient Decoder & Spatio Temporal Action Cues | Rohit Girmaji et.al. | 2502.00397 | null |
| 2025-01-31 | ALBAR: Adversarial Learning approach to mitigate Biases in Action Recognition | Joseph Fioresi et.al. | 2502.00156 | null |
| 2025-01-31 | From Soft Materials to Controllers with NeuroTouch: A Neuromorphic Tactile Sensor for Real-Time Gesture Recognition | Victor Hoffmann et.al. | 2501.19174 | null |
| 2025-01-31 | XRF V2: A Dataset for Action Summarization with Wi-Fi Signals, and IMUs in Phones, Watches, Earbuds, and Glasses | Bo Lan et.al. | 2501.19034 | link |
| 2025-02-03 | Advances in Multimodal Adaptation and Generalization: From Traditional Approaches to Foundation Models | Hao Dong et.al. | 2501.18592 | link |
| 2025-01-29 | Action Recognition Using Temporal Shift Module and Ensemble Learning | Anh-Kiet Duong et.al. | 2501.17550 | link |
| 2025-01-28 | Bones of Contention: Exploring Query-Efficient Attacks Against Skeleton Recognition Systems | Yuxin Cao et.al. | 2501.16843 | null |
| 2025-01-27 | A Low-Cost, High-Precision Human-Machine Interaction Solution Based on Multi-Coil Wireless Charging Pads | Bojun Zhang et.al. | 2501.15885 | null |
| 2025-01-25 | Recognize Any Surgical Object: Unleashing the Power of Weakly-Supervised Data | Jiajie Li et.al. | 2501.15326 | null |
| 2025-01-27 | ACT-JEPA: Joint-Embedding Predictive Architecture Improves Policy Representation Learning | Aleksandar Vujinovic et.al. | 2501.14622 | null |
| 2025-01-24 | Optimizing Human Pose Estimation Through Focused Human and Joint Regions | Yingying Jiao et.al. | 2501.14439 | null |
| 2025-01-24 | Human Activity Recognition with a 6.5 GHz Reconfigurable Intelligent Surface for Wi-Fi 6E | Nuno Paulino et.al. | 2501.14423 | null |
| 2025-01-23 | MV-GMN: State Space Model for Multi-View Action Recognition | Yuhui Lin et.al. | 2501.13829 | null |
| 2025-01-23 | EgoHand: Ego-centric Hand Pose Estimation and Gesture Recognition with Head-mounted Millimeter-wave Radar and IMUs | Yizhe Lv et.al. | 2501.13805 | link |
| 2025-01-22 | SMART-Vision: Survey of Modern Action Recognition Techniques in Vision | Ali K. AlShami et.al. | 2501.13066 | null |
| 2025-01-22 | Can masking background and object reduce static bias for zero-shot action recognition? | Takumi Fukuzawa et.al. | 2501.12681 | null |
| 2025-01-21 | BlanketGen2-Fit3D: Synthetic Blanket Augmentation Towards Improving Real-World In-Bed Blanket Occluded Human Pose Estimation | Tamás Karácsony et.al. | 2501.12318 | null |
| 2025-01-21 | InsTALL: Context-aware Instructional Task Assistance with Multi-modal Large Language Models | Pha Nguyen et.al. | 2501.12231 | null |
| 2025-01-21 | DSTSA-GCN: Advancing Skeleton-Based Gesture Recognition with Semantic-Aware Spatio-Temporal Topology Modeling | Hu Cui et.al. | 2501.12086 | null |
| 2025-01-21 | Survey on Hand Gesture Recognition from Visual Input | Manousos Linardakis et.al. | 2501.11992 | null |
| 2025-01-19 | Rethinking Pseudo-Label Guided Learning for Weakly Supervised Temporal Action Localization from the Perspective of Noise Correction | Quan Zhang et.al. | 2501.11124 | null |
| 2025-01-23 | HFGCN: Hypergraph Fusion Graph Convolutional Networks for Skeleton-Based Action Recognition | Pengcheng Dong et.al. | 2501.11007 | null |
| 2025-01-18 | BAP v2: An Enhanced Task Framework for Instruction Following in Minecraft Dialogues | Prashant Jayannavar et.al. | 2501.10836 | link |
| 2025-01-15 | Visual WetlandBirds Dataset: Bird Species Identification and Behavior Recognition in Videos | Javier Rodriguez-Juan et.al. | 2501.08931 | link |
| 2025-01-13 | Collaborative Learning for 3D Hand-Object Reconstruction and Compositional Action Recognition from Egocentric RGB Videos Using Superquadrics | Tze Ho Elden Tse et.al. | 2501.07100 | null |
| 2025-01-12 | DRDT3: Diffusion-Refined Decision Test-Time Training Model | Xingshuai Huang et.al. | 2501.06718 | null |
| 2025-01-07 | Language and Planning in Robotic Navigation: A Multilingual Evaluation of State-of-the-Art Models | Malak Mansour et.al. | 2501.05478 | null |
| 2025-01-09 | Improving Skeleton-based Action Recognition with Interactive Object Information | Hao Wen et.al. | 2501.05066 | link |
| 2025-01-08 | Video Summarisation with Incident and Context Information using Generative AI | Ulindu De Silva et.al. | 2501.04764 | null |
| 2025-01-08 | Assessing the Acceptance of a Mid-Air Gesture Syntax for Smart Space Interaction: An Empirical Study | Ana M. Bernardos et.al. | 2501.04464 | null |
| 2025-01-07 | Extraction Of Cumulative Blobs From Dynamic Gestures | Rishabh Naulakha et.al. | 2501.04002 | null |
| 2025-01-06 | Large Language Models for Video Surveillance Applications | Ulindu De Silva et.al. | 2501.02850 | null |
| 2025-01-05 | Evolving Skeletons: Motion Dynamics in Action Recognition | Jushang Qiu et.al. | 2501.02593 | null |
| 2025-01-02 | SeFAR: Semi-supervised Fine-grained Action Recognition with Temporal Perturbation and Learning Stabilization | Yongle Huang et.al. | 2501.01245 | link |
| 2025-01-02 | Event Masked Autoencoder: Point-wise Action Recognition with Event-Based Cameras | Jingkai Sun et.al. | 2501.01040 | null |
| 2025-01-01 | Multiscaled Multi-Head Attention-based Video Transformer Network for Hand Gesture Recognition | Mallika Garg et.al. | 2501.00935 | null |
| 2025-01-01 | Multimodal Large Models Are Effective Action Anticipators | Binglu Wang et.al. | 2501.00795 | link |
| 2024-12-31 | M2I2: Learning Efficient Multi-Agent Communication via Masked State Modeling and Intention Inference | Chuxiong Sun et.al. | 2501.00312 | null |
| 2024-12-30 | A Large-Scale Study on Video Action Dataset Condensation | Yang Chen et.al. | 2412.21197 | null |
| 2024-12-30 | Frequency-aware Event Cloud Network | Hongwei Ren et.al. | 2412.20803 | null |
| 2024-12-29 | FreqMixFormerV2: Lightweight Frequency-aware Mixed Transformer for Human Skeleton Action Recognition | Wenhan Wu et.al. | 2412.20621 | link |
| 2024-12-29 | Exploiting Aggregation and Segregation of Representations for Domain Adaptive Human Pose Estimation | Qucheng Peng et.al. | 2412.20538 | link |
| 2024-12-29 | Improving Vision-Language-Action Models via Chain-of-Affordance | Jinming Li et.al. | 2412.20451 | null |
| 2024-12-28 | DAVE: Diverse Atomic Visual Elements Dataset with High Representation of Vulnerable Road Users in Complex and Unpredictable Environments | Xijun Wang et.al. | 2412.20042 | null |
| 2024-12-27 | Generalized Uncertainty-Based Evidential Fusion with Hybrid Multi-Head Attention for Weak-Supervised Temporal Action Localization | Yuanpeng He et.al. | 2412.19418 | link |
| 2024-12-25 | SWAG: Long-term Surgical Workflow Prediction with Generative-based Anticipation | Maxence Boels et.al. | 2412.18849 | null |
| 2024-12-25 | Skeleton-based Action Recognition with Non-linear Dependency Modeling and Hilbert-Schmidt Independence Criterion | Yuheng Yang et.al. | 2412.18780 | link |
| 2024-12-24 | Computer Vision-Driven Gesture Recognition: Toward Natural and Intuitive Human-Computer | Fenghua Shao et.al. | 2412.18321 | null |
| 2024-12-23 | HumanVBench: Exploring Human-Centric Video Understanding Capabilities of MLLMs with Synthetic Benchmark Data | Ting Zhou et.al. | 2412.17574 | null |
| 2024-12-22 | Video Domain Incremental Learning for Human Action Recognition in Home Environments | Yuanda Hu et.al. | 2412.16946 | null |
| 2024-12-21 | Optical Wireless Communications: Enabling the Next Generation Network of Networks | Aravindh Krishnamoorthy et.al. | 2412.16798 | null |
| 2024-12-21 | FACTS: Fine-Grained Action Classification for Tactical Sports | Christopher Lai et.al. | 2412.16454 | null |
| 2024-12-20 | iRadar: Synthesizing Millimeter-Waves from Wearable Inertial Inputs for Human Gesture Sensing | Huanqi Yang et.al. | 2412.15980 | null |
| 2024-12-19 | Synchronized and Fine-Grained Head for Skeleton-Based Ambiguous Action Recognition | Hao Huang et.al. | 2412.14833 | null |
| 2024-12-19 | Prototypical Calibrating Ambiguous Samples for Micro-Action Recognition | Kun Li et.al. | 2412.14719 | link |
| 2024-12-24 | Towards Generalist Robot Policies: What Matters in Building Vision-Language-Action Models | Xinghang Li et.al. | 2412.14058 | link |
| 2024-12-18 | Do Language Models Understand Time? | Xi Ding et.al. | 2412.13845 | link |
| 2024-12-17 | CompactFlowNet: Efficient Real-time Optical Flow Estimation on Mobile Devices | Andrei Znobishchev et.al. | 2412.13273 | null |
| 2024-12-20 | Future Aspects in Human Action Recognition: Exploring Emerging Techniques and Ethical Influences | Antonios Gasteratos et.al. | 2412.12990 | null |
| 2024-12-16 | Designing Semi-Structured Pruning of Graph Convolutional Networks for Skeleton-based Recognition | Hichem Sahbi et.al. | 2412.11813 | null |
| 2024-12-13 | TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies | Ruijie Zheng et.al. | 2412.10345 | null |
| 2024-12-13 | Building a Multi-modal Spatiotemporal Expert for Zero-shot Action Recognition with CLIP | Yating Yu et.al. | 2412.09895 | link |
| 2024-12-14 | USDRL: Unified Skeleton-Based Dense Representation Learning with Multi-Grained Feature Decorrelation | Wanjiang Weng et.al. | 2412.09220 | link |
| 2024-12-13 | Temporal Action Localization with Cross Layer Task Decoupling and Refinement | Qiang Li et.al. | 2412.09202 | link |
| 2024-12-12 | Goal-Conditioned Supervised Learning for Multi-Objective Recommendation | Shijun Li et.al. | 2412.08911 | null |
| 2024-12-10 | SAT: Spatial Aptitude Training for Multimodal Language Models | Arijit Ray et.al. | 2412.07755 | link |
| 2024-12-10 | Manta: Enhancing Mamba for Few-Shot Action Recognition of Long Sub-Sequence | Wenbo Huang et.al. | 2412.07481 | null |
| 2024-12-09 | Mining Limited Data Sufficiently: A BERT-inspired Approach for CSI Time Series Application in Wireless Communication and Sensing | Zijian Zhao et.al. | 2412.06861 | link |
| 2024-12-09 | Exploring the Impact of Synthetic Data on Human Gesture Recognition Tasks Using GANs | George Kontogiannis et.al. | 2412.06389 | null |
| 2024-12-07 | Action Recognition based Industrial Safety Violation Detection | Surya N Reddy et.al. | 2412.05531 | null |
| 2024-12-06 | CCS: Continuous Learning for Customized Incremental Wireless Sensing Services | Qunhang Fu et.al. | 2412.04821 | null |
| 2024-12-06 | KNN-MMD: Cross Domain Wi-Fi Sensing Based on Local Distribution Alignment | Zijian Zhao et.al. | 2412.04783 | link |
| 2024-12-03 | Proximal Control of UAVs with Federated Learning for Human-Robot Collaborative Domains | Lucas Nogueira Nobrega et.al. | 2412.02863 | null |
| 2024-12-03 | Planning-Guided Diffusion Policy Learning for Generalizable Contact-Rich Bimanual Manipulation | Xuanlin Li et.al. | 2412.02676 | null |
| 2024-12-02 | Human-Machine Interfaces for Subsea Telerobotics: From Soda-straw to Natural Language Interactions | Adnan Abdullah et.al. | 2412.01753 | null |
| 2024-12-02 | HaGRIDv2: 1M Images for Static and Dynamic Hand Gesture Recognition | Anton Nuzhdin et.al. | 2412.01508 | link |
| 2024-12-02 | EdgeOAR: Real-time Online Action Recognition On Edge Devices | Wei Luo et.al. | 2412.01267 | null |
| 2024-11-29 | CogACT: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation | Qixiu Li et.al. | 2411.19650 | null |
| 2024-11-29 | SkelMamba: A State Space Model for Efficient Skeleton Action Recognition of Neurological Disorders | Niki Martinel et.al. | 2411.19544 | null |
| 2024-11-29 | Hierarchical Framework for Retrosynthesis Prediction with Enhanced Reaction Center Localization | Seongeun Yun et.al. | 2411.19503 | null |
| 2024-11-28 | TAMT: Temporal-Aware Model Tuning for Cross-Domain Few-Shot Action Recognition | Yilong Wang et.al. | 2411.19041 | null |
| 2024-11-28 | Revealing Key Details to See Differences: A Novel Prototypical Perspective for Skeleton-based Action Recognition | Hongda Liu et.al. | 2411.18941 | link |
| 2024-11-27 | Robust Dynamic Gesture Recognition at Ultra-Long Distances | Eran Bamani Beeri et.al. | 2411.18413 | null |
| 2024-11-27 | EventCrab: Harnessing Frame and Point Synergy for Event-based Action Recognition and Beyond | Meiqi Cao et.al. | 2411.18328 | null |
| 2024-11-27 | An End-to-End Two-Stream Network Based on RGB Flow and Representation Flow for Human Action Recognition | Song-Jiang Lai et.al. | 2411.18002 | null |
| 2024-11-26 | Pre-training for Action Recognition with Automatically Generated Fractal Datasets | Davyd Svyezhentsev et.al. | 2411.17584 | link |
| 2024-11-26 | Real-Time Multimodal Signal Processing for HRI in RoboCup: Understanding a Human Referee | Filippo Ansalone et.al. | 2411.17347 | null |
| 2024-11-22 | TSkips: Efficiency Through Explicit Temporal Delay Connections in Spiking Neural Networks | Prajna G. Malettira et.al. | 2411.16711 | null |
| 2024-11-24 | OccludeNet: A Causal Journey into Mixed-View Actor-Centric Video Action Recognition under Occlusions | Guanyu Zhou et.al. | 2411.15729 | link |
| 2024-11-23 | Machine Learning-based sEMG Signal Classification for Hand Gesture Recognition | Parshuram N. Aarotale et.al. | 2411.15655 | null |
| 2024-11-23 | Optimizing Gesture Recognition for Seamless UI Interaction Using Convolutional Neural Networks | Qi Sun et.al. | 2411.15598 | null |
| 2024-11-22 | When Spatial meets Temporal in Action Recognition | Huilin Chen et.al. | 2411.15284 | null |
| 2024-11-22 | Adaptive Hyper-Graph Convolution Network for Skeleton-based Human Action Recognition with Virtual Connections | Youwei Zhou et.al. | 2411.14796 | null |
| 2024-11-22 | Aim My Robot: Precision Local Navigation to Any Object | Xiangyun Meng et.al. | 2411.14770 | null |
| 2024-11-21 | Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning | Jiange Yang et.al. | 2411.14519 | null |
| 2024-11-18 | Enhancing Bidirectional Sign Language Communication: Integrating YOLOv8 and NLP for Real-Time Gesture Recognition & Translation | Hasnat Jamil Bhuiyan et.al. | 2411.13597 | null |
| 2024-11-23 | AzSLD: Azerbaijani Sign Language Dataset for Fingerspelling, Word, and Sentence Translation with Baseline Software | Nigar Alishzade et.al. | 2411.12865 | null |
| 2024-11-20 | Topological Symmetry Enhanced Graph Convolution for Skeleton-Based Action Recognition | Zeyu Liang et.al. | 2411.12560 | link |
| 2024-11-19 | Rethinking Top Probability from Multi-view for Distracted Driver Behaviour Localization | Quang Vinh Nguyen et.al. | 2411.12525 | null |
| 2024-11-18 | Video-to-Task Learning via Motion-Guided Attention for Few-Shot Action Recognition | Hanyu Guo et.al. | 2411.11335 | null |
| 2024-11-18 | Neuron: Learning Context-Aware Evolving Representations for Zero-Shot Skeleton Action Recognition | Yang Chen et.al. | 2411.11288 | null |
| 2024-11-18 | Efficient Transfer Learning for Video-language Foundation Models | Haoxing Chen et.al. | 2411.11223 | link |
| 2024-11-16 | TDSM: Triplet Diffusion for Skeleton-Text Matching in Zero-Shot Action Recognition | Jeonghyeok Do et.al. | 2411.10745 | link |
| 2024-11-15 | KuaiFormer: Transformer-Based Retrieval at Kuaishou | Chi Liu et.al. | 2411.10057 | null |
| 2024-11-14 | Towards Scalable Handwriting Communication via EEG Decoding and Latent Embedding Integration | Jun-Young Kim et.al. | 2411.09170 | null |
| 2024-11-14 | VidMan: Exploiting Implicit Dynamics from Video Diffusion Model for Effective Robot Manipulation | Youpeng Wen et.al. | 2411.09153 | null |
| 2024-11-13 | Can MLLMs Guide Weakly-Supervised Temporal Action Localization Tasks? | Quan Zhang et.al. | 2411.08466 | null |
| 2024-11-13 | Generative AI for Data Augmentation in Wireless Networks: Analysis, Applications, and Case Study | Jinbo Wen et.al. | 2411.08341 | null |
| 2024-11-12 | LapGSR: Laplacian Reconstructive Network for Guided Thermal Super-Resolution | Aditya Kasliwal et.al. | 2411.07750 | null |
| 2024-11-12 | OWLed: Outlier-weighed Layerwise Pruning for Efficient Autonomous Driving Framework | Jiaxi Li et.al. | 2411.07711 | null |
| 2024-11-11 | ConvMixFormer- A Resource-efficient Convolution Mixer for Transformer-based Dynamic Hand Gesture Recognition | Mallika Garg et.al. | 2411.07118 | link |
| 2024-11-10 | Extended multi-stream temporal-attention module for skeleton-based human action recognition (HAR) | Faisal Mehmood et.al. | 2411.06553 | null |
| 2024-11-10 | SuperResolution Radar Gesture Recognition | Netanel Blumenfeld et.al. | 2411.06410 | null |
| 2024-11-08 | Video RWKV: Video Action Recognition Based RWKV | Zhuowen Yin et.al. | 2411.05636 | null |
| 2024-11-06 | Object Recognition in Human Computer Interaction:- A Comparative Analysis | Kaushik Ranade et.al. | 2411.04263 | null |
| 2024-11-06 | Explaining Human Activity Recognition with SHAP: Validating Insights with Perturbation and Quantitative Measures | Felix Tempel et.al. | 2411.03714 | link |
| 2024-11-05 | One-Stage-TFS: Thai One-Stage Fingerspelling Dataset for Fingerspelling Recognition Frameworks | Siriwiwat Lata et.al. | 2411.02768 | null |
| 2024-11-04 | TI-PREGO: Chain of Thought and In-Context Learning for Online Mistake Detection in PRocedural EGOcentric Videos | Leonardo Plini et.al. | 2411.02570 | null |
| 2024-11-04 | AM Flow: Adapters for Temporal Processing in Action Recognition | Tanay Agrawal et.al. | 2411.02065 | null |
| 2024-11-04 | ARN-LSTM: A Multi-Stream Attention-Based Model for Action Recognition with Temporal Dynamics | Chuanchuan Wang et.al. | 2411.01769 | null |
| 2024-10-31 | Technical Report for ActivityNet Challenge 2022 – Temporal Action Localization | Shimin Chen et.al. | 2411.00883 | null |
| 2024-10-30 | A Simple and Effective Temporal Grounding Pipeline for Basketball Broadcast Footage | Levi Harris et.al. | 2411.00862 | null |
| 2024-11-01 | STAA: Spatio-Temporal Attention Attribution for Real-Time Interpreting Transformer-based Video Models | Zerui Wang et.al. | 2411.00630 | link |
| 2024-11-01 | Human Action Recognition (HAR) Using Skeleton-based Spatial Temporal Relative Transformer Network: ST-RTR | Faisal Mehmood et.al. | 2410.23806 | null |
| 2024-10-31 | Recovering Complete Actions for Cross-dataset Skeleton Action Recognition | Hanchao Liu et.al. | 2410.23641 | null |
| 2024-10-30 | Keypoint Abstraction using Large Models for Object-Relative Imitation Learning | Xiaolin Fang et.al. | 2410.23254 | null |
| 2024-10-30 | AtGCN: A Graph Convolutional Network For Ataxic Gait Detection | Karan Bania et.al. | 2410.22862 | null |
| 2024-10-29 | ProMQA: Question Answering Dataset for Multimodal Procedural Activity Understanding | Kimihiro Hasegawa et.al. | 2410.22211 | link |
| 2024-10-29 | Multi-Level Feature Distillation of Joint Teachers Trained on Distinct Image Datasets | Adrian Iordache et.al. | 2410.22184 | link |
| 2024-10-28 | Enhancing Action Recognition by Leveraging the Hierarchical Structure of Actions and Textual Context | Manuel Benavent-Lledo et.al. | 2410.21275 | link |
| 2024-10-28 | One-Step Diffusion Policy: Fast Visuomotor Policies via Diffusion Distillation | Zhendong Wang et.al. | 2410.21257 | null |
| 2024-10-28 | Zero-Shot Action Recognition in Surveillance Videos | Joao Pereira et.al. | 2410.21113 | null |
| 2024-10-28 | LiGAR: LiDAR-Guided Hierarchical Transformer for Multi-Modal Group Activity Recognition | Naga Venkata Sai Raviteja Chappa et.al. | 2410.21108 | null |
| 2024-10-27 | Exocentric To Egocentric Transfer For Action Recognition: A Short Survey | Anirudh Thatipelli et.al. | 2410.20621 | null |
| 2024-10-27 | Idempotent Unsupervised Representation Learning for Skeleton-Based Action Recognition | Lilang Lin et.al. | 2410.20349 | null |
| 2024-10-28 | x-RAGE: eXtended Reality – Action & Gesture Events Dataset | Vivek Parmar et.al. | 2410.19486 | null |
| 2024-10-24 | Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms | Zhangheng Li et.al. | 2410.18967 | link |
| 2024-10-24 | Research on gesture recognition method based on SEDCNN-SVM | Mingjin Zhang et.al. | 2410.18557 | null |
| 2024-10-23 | Unsupervised Domain Adaptation for Action Recognition via Self-Ensembling and Conditional Embedding Alignment | Indrajeet Ghosh et.al. | 2410.17489 | link |
| 2024-10-22 | Are Visual-Language Models Effective in Action Recognition? A Comparative Study | Mahmoud Ali et.al. | 2410.17149 | null |
| 2024-10-22 | Masked Differential Privacy | David Schneider et.al. | 2410.17098 | null |
| 2024-10-22 | SpikMamba: When SNN meets Mamba in Event-based Human Action Recognition | Jiaqi Chen et.al. | 2410.16746 | link |
| 2024-10-21 | Improving the Multi-label Atomic Activity Recognition by Robust Visual Feature and Advanced Attention @ ROAD++ Atomic Activity Recognition 2024 | Jiamin Cao et.al. | 2410.16037 | null |
| 2024-10-19 | CAGE: Causal Attention Enables Data-Efficient Generalizable Robotic Manipulation | Shangning Xia et.al. | 2410.14974 | null |
| 2024-10-18 | DFlow: Diverse Dialogue Flow Simulation with Large Language Models | Wanyu Du et.al. | 2410.14853 | null |
| 2024-10-18 | Storyboard guided Alignment for Fine-grained Video Action Recognition | Enqi Liu et.al. | 2410.14238 | null |
| 2024-10-17 | SimpleToM: Exposing the Gap between Explicit ToM Inference and Implicit ToM Application in LLMs | Yuling Gu et.al. | 2410.13648 | null |
| 2024-10-16 | In-Context Learning Enables Robot Action Prediction in LLMs | Yida Yin et.al. | 2410.12782 | null |
| 2024-10-14 | Continual Learning Improves Zero-Shot Action Recognition | Shreyank N Gowda et.al. | 2410.10497 | null |
| 2024-10-16 | PIVOT-R: Primitive-Driven Waypoint-Aware World Model for Robotic Manipulation | Kaidong Zhang et.al. | 2410.10394 | null |
| 2024-10-13 | EITNet: An IoT-Enhanced Framework for Real-Time Basketball Action Recognition | Jingyu Liu et.al. | 2410.09954 | null |
| 2024-10-13 | Multi class activity classification in videos using Motion History Image generation | Senthilkumar Gopal et.al. | 2410.09902 | link |
| 2024-10-12 | Advanced Gesture Recognition in Autism: Integrating YOLOv7, Video Augmentation and VideoMAE for Video Analysis | Amit Kumar Singh et.al. | 2410.09339 | null |
| 2024-10-11 | Aerial Vision-and-Language Navigation via Semantic-Topo-Metric Representation Guided LLM Reasoning | Yunpeng Gao et.al. | 2410.08500 | null |
| 2024-10-10 | Human Stone Toolmaking Action Grammar (HSTAG): A Challenging Benchmark for Fine-grained Motor Behavior Recognition | Cheng Liu et.al. | 2410.08410 | null |
| 2024-10-10 | Understanding Spatio-Temporal Relations in Human-Object Interaction using Pyramid Graph Convolutional Network | Hao Xing et.al. | 2410.07912 | null |
| 2024-10-09 | CHASE: Learning Convex Hull Adaptive Shift for Skeleton-based Multi-Entity Action Recognition | Yuhang Wen et.al. | 2410.07153 | link |
| 2024-10-09 | Fourier-based Action Recognition for Wildlife Behavior Quantification with Event Cameras | Friedhelm Hamann et.al. | 2410.06698 | null |
| 2024-10-08 | GR-2: A Generative Video-Language-Action Model with Web-Scale Knowledge for Robot Manipulation | Chi-Lam Cheang et.al. | 2410.06158 | null |
| 2024-10-10 | ActionAtlas: A VideoQA Benchmark for Domain-specialized Action Recognition | Mohammadreza Salehi et.al. | 2410.05774 | null |
| 2024-10-07 | Exploring Gestural Interaction with a Cushion Interface for Smart Home Control | Yuri Suzuki et.al. | 2410.04730 | null |
| 2024-10-05 | TR-LLM: Integrating Trajectory Data for Scene-Aware LLM-Based Human Action Prediction | Kojiro Takeyama et.al. | 2410.03993 | null |
| 2024-10-04 | Shadow Augmentation for Handwashing Action Recognition: from Synthetic to Real Datasets | Shengtai Ju et.al. | 2410.03984 | null |
| 2024-10-04 | Action Selection Learning for Multi-label Multi-view Action Recognition | Trung Thanh Nguyen et.al. | 2410.03302 | link |
| 2024-10-03 | DivScene: Benchmarking LVLMs for Object Navigation with Diverse Scenes and Objects | Zhaowei Wang et.al. | 2410.02730 | link |
| 2024-10-03 | An Evaluation of Large Pre-Trained Models for Gesture Recognition using Synthetic Videos | Arun Reddy et.al. | 2410.02152 | null |
| 2024-10-02 | Language Supervised Human Action Recognition with Salient Fusion: Construction Worker Action Recognition as a Use Case | Mohammad Mahdavian et.al. | 2410.01962 | null |
| 2024-10-02 | Sparse Covariance Neural Networks | Andrea Cavallo et.al. | 2410.01669 | link |
| 2024-10-02 | Towards Generalizable Vision-Language Robotic Manipulation: A Benchmark and LLM-guided 3D Policy | Ricardo Garcia et.al. | 2410.01345 | link |
| 2024-10-01 | Dynamic Planning for LLM-based Graphical User Interface Automation | Shaoqing Zhang et.al. | 2410.00467 | link |
| 2024-09-30 | SurgPETL: Parameter-Efficient Image-to-Surgical-Video Transfer Learning for Surgical Phase Recognition | Shu Yang et.al. | 2409.20083 | null |
| 2024-09-28 | Gesture Recognition for Feedback Based Mixed Reality and Robotic Fabrication: A Case Study of the UnLog Tower | Alexander Htet Kyaw et.al. | 2409.19281 | null |
| 2024-09-26 | SOAR: Self-supervision Optimized UAV Action Recognition with Efficient Object-Aware Pretraining | Ruiqi Xian et.al. | 2409.18300 | null |
| 2024-09-26 | Spatial Hierarchy and Temporal Attention Guided Cross Masking for Self-supervised Skeleton-based Action Recognition | Xinpeng Yin et.al. | 2409.17951 | link |
| 2024-09-26 | EAGLE: Egocentric AGgregated Language-video Engine | Jing Bi et.al. | 2409.17523 | null |
| 2024-09-25 | Path-adaptive Spatio-Temporal State Space Model for Event-based Recognition with Arbitrary Duration | Jiazhou Zhou et.al. | 2409.16953 | null |
| 2024-09-25 | Dynamic Obstacle Avoidance through Uncertainty-Based Adaptive Planning with Diffusion | Vineet Punyamoorty et.al. | 2409.16950 | null |
| 2024-09-24 | Hand Gesture Classification Based on Forearm Ultrasound Video Snippets Using 3D Convolutional Neural Networks | Keshav Bimbraw et.al. | 2409.16431 | null |
| 2024-09-22 | Zero-Shot Skeleton-based Action Recognition with Dual Visual-Text Alignment | Jidong Kuang et.al. | 2409.14336 | null |
| 2024-09-21 | Egocentric zone-aware action recognition across environments | Simone Alberto Peirone et.al. | 2409.14205 | null |
| 2024-09-19 | Interpretable Action Recognition on Hard to Classify Actions | Anastasia Anichenko et.al. | 2409.13091 | null |
| 2024-09-18 | Distillation-free Scaling of Large SSMs for Images and Videos | Hamid Suleman et.al. | 2409.11867 | null |
| 2024-09-17 | Mamba Fusion: Learning Actions Through Questioning | Zhikang Dong et.al. | 2409.11513 | link |
| 2024-09-16 | Forearm Ultrasound based Gesture Recognition on Edge | Keshav Bimbraw et.al. | 2409.09915 | null |
| 2024-09-15 | Integrating Audio Narrations to Strengthen Domain Generalization in Multimodal First-Person Action Recognition | Cagri Gungor et.al. | 2409.09611 | null |
| 2024-09-14 | MulCPred: Learning Multi-modal Concepts for Explainable Pedestrian Action Prediction | Yan Feng et.al. | 2409.09446 | link |
| 2024-09-14 | KAN-HyperpointNet for Point Cloud Sequence-Based 3D Human Action Recognition | Zhaoyu Chen et.al. | 2409.09444 | null |
| 2024-09-14 | ChildPlay-Hand: A Dataset of Hand Manipulations in the Wild | Arya Farkhondeh et.al. | 2409.09319 | link |
| 2024-09-13 | Using The Concept Hierarchy for Household Action Recognition | Andrei Costinescu et.al. | 2409.08853 | null |
| 2024-09-12 | Customized Mid-Air Gestures for Accessibility: A $B Recognizer for Multi-Dimensional Biosignal Gestures | Momona Yamagami et.al. | 2409.08402 | null |
| 2024-09-12 | Spatial Adaptation Layer: Interpretable Domain Adaptation For Biosignal Sensor Array Applications | Joao Pereira et.al. | 2409.08058 | null |
| 2024-09-16 | InterACT: Inter-dependency Aware Action Chunking with Hierarchical Attention Transformers for Bimanual Manipulation | Andrew Lee et.al. | 2409.07914 | null |
| 2024-09-11 | 2D bidirectional gated recurrent unit convolutional Neural networks for end-to-end violence detection In videos | Abdarahmane Traoré et.al. | 2409.07588 | null |
| 2024-09-10 | Data Collection-free Masked Video Modeling | Yuchi Ishikawa et.al. | 2409.06665 | null |
| 2024-09-10 | Advancements in Gesture Recognition Techniques and Machine Learning for Enhanced Human-Robot Interaction: A Comprehensive Review | Sajjad Hussain et.al. | 2409.06503 | null |
| 2024-09-10 | Learning Generative Interactive Environments By Trained Agent Exploration | Naser Kazemi et.al. | 2409.06445 | link |
| 2024-09-09 | ReL-SAR: Representation Learning for Skeleton Action Recognition with Convolutional Transformers and BYOL | Safwen Naimi et.al. | 2409.05749 | null |
| 2024-09-11 | Real-Time Human Action Recognition on Embedded Platforms | Ruiqi Wang et.al. | 2409.05662 | null |
| 2024-09-06 | Self-Supervised Contrastive Learning for Videos using Differentiable Local Alignment | Keyne Oei et.al. | 2409.04607 | null |
| 2024-09-05 | MVTN: A Multiscale Video Transformer Network for Hand Gesture Recognition | Mallika Garg et.al. | 2409.03890 | link |
| 2024-09-05 | UAV (Unmanned Aerial Vehicles): Diverse Applications of UAV Datasets in Segmentation, Classification, Detection, and Tracking | Md. Mahfuzur Rahman et.al. | 2409.03245 | null |
| 2024-09-04 | SITAR: Semi-supervised Image Transformer for Action Recognition | Owais Iqbal et.al. | 2409.02910 | null |
| 2024-09-04 | TASAR: Transferable Attack on Skeletal Action Recognition | Yunfeng Diao et.al. | 2409.02483 | link |
| 2024-09-04 | Unified Framework with Consistency across Modalities for Human Activity Recognition | Tuyen Tran et.al. | 2409.02385 | null |
| 2024-09-07 | Unfolding Videos Dynamics via Taylor Expansion | Siyi Chen et.al. | 2409.02371 | null |
| 2024-09-03 | ADHD diagnosis based on action characteristics recorded in videos using machine learning | Yichun Li et.al. | 2409.02274 | null |
| 2024-09-03 | Action-Based ADHD Diagnosis in Video | Yichun Li et.al. | 2409.02261 | null |
| 2024-09-03 | ReSpike: Residual Frames-based Hybrid Spiking Neural Networks for Efficient Action Recognition | Shiting Xiao et.al. | 2409.01564 | null |
| 2024-09-02 | FinePseudo: Improving Pseudo-Labelling through Temporal-Alignablity for Semi-Supervised Fine-Grained Action Recognition | Ishan Rajendrakumar Dave et.al. | 2409.01448 | null |
| 2024-09-01 | Fisher Information guided Purification against Backdoor Attacks | Nazmul Karim et.al. | 2409.00863 | link |
| 2024-09-01 | A Critical Analysis on Machine Learning Techniques for Video-based Human Activity Recognition of Surveillance Systems: A Review | Shahriar Jahan et.al. | 2409.00731 | null |
| 2024-09-03 | Open-vocabulary Temporal Action Localization using VLMs | Naoki Wake et.al. | 2408.17422 | null |
| 2024-08-29 | Text-Enhanced Zero-Shot Action Recognition: A training-free approach | Massimo Bosetti et.al. | 2408.16412 | null |
| 2024-08-28 | DEAR: Depth-Enhanced Action Recognition | Sadegh Rahmaniboldaji et.al. | 2408.15679 | link |
| 2024-08-28 | Online pre-training with long-form videos | Itsuki Kato et.al. | 2408.15651 | null |
| 2024-09-04 | Hand1000: Generating Realistic Hands from Text with Only 1,000 Images | Haozhuo Zhang et.al. | 2408.15461 | null |
| 2024-08-26 | Comparative Analysis: Violence Recognition from Videos using Transfer Learning | Dursun Dashdamirov et.al. | 2408.14659 | link |
| 2024-08-25 | Towards Completeness: A Generalizable Action Proposal Generator for Zero-Shot Temporal Action Localization | Jia-Run Du et.al. | 2408.13777 | link |
| 2024-08-25 | FMI-TAL: Few-shot Multiple Instances Temporal Action Localization by Probability Distribution Learning and Interval Cluster Refinement | Fengshun Wang et.al. | 2408.13765 | link |
| 2024-08-25 | EMG-Based Hand Gesture Recognition through Diverse Domain Feature Enhancement and Machine Learning-Based Approach | Abu Saleh Musa Miah et.al. | 2408.13723 | null |
| 2024-08-24 | HabitAction: A Video Dataset for Human Habitual Behavior Recognition | Hongwu Li et.al. | 2408.13463 | null |
| 2024-08-23 | N-DriverMotion: Driver motion learning and prediction using an event-based camera and directly trained spiking neural networks | Hyo Jong Chung et.al. | 2408.13379 | null |
| 2024-08-23 | Energy-Efficient Spiking Recurrent Neural Network for Gesture Recognition on Embedded GPUs | Marzieh Hassanshahi Varposhti et.al. | 2408.12978 | null |
| 2024-08-21 | Data-Free Class Incremental Gesture Recognition via Synthetic Feature Sampling | Zhenyu Lu et.al. | 2408.12629 | null |
| 2024-08-22 | Frame Order Matters: A Temporal Sequence-Aware Model for Few-Shot Action Recognition | Bozheng Li et.al. | 2408.12475 | null |
| 2024-08-23 | TWLV-I: Analysis and Insights from Holistic Evaluation on Video Foundation Models | Hyeongmin Lee et.al. | 2408.11318 | link |
| 2024-08-21 | CrossFi: A Cross Domain Wi-Fi Sensing Framework Based on Siamese Network | Zijian Zhao et.al. | 2408.10919 | link |
| 2024-08-20 | TDS-CLIP: Temporal Difference Side Network for Image-to-Video Transfer Learning | Bin Wang et.al. | 2408.10688 | link |
| 2024-08-19 | Narrowing the Gap between Vision and Action in Navigation | Yue Zhang et.al. | 2408.10388 | link |
| 2024-08-19 | SHARP: Segmentation of Hands and Arms by Range using Pseudo-Depth for Enhanced Egocentric 3D Hand Pose Estimation and Action Recognition | Wiktor Mucha et.al. | 2408.10037 | link |
| 2024-08-19 | Event Stream based Human Action Recognition: A High-Definition Benchmark Dataset and Algorithms | Xiao Wang et.al. | 2408.09764 | link |
| 2024-08-18 | Joint Temporal Pooling for Improving Skeleton-based Action Recognition | Shanaka Ramesh Gunasekara et.al. | 2408.09356 | null |
| 2024-08-17 | Intuitive Human-Robot Interface: A 3-Dimensional Action Recognition and UAV Collaboration Framework | Akash Chaudhary et.al. | 2408.09232 | null |
| 2024-08-17 | Flatten: Video Action Recognition is an Image Classification task | Junlin Chen et.al. | 2408.09220 | null |
| 2024-08-17 | Temporal Reversed Training for Spiking Neural Networks with Generalized Spatio-Temporal Representation | Lin Zuo et.al. | 2408.09108 | null |
| 2024-08-16 | Towards Physical World Backdoor Attacks against Skeleton Action Recognition | Qichen Zheng et.al. | 2408.08671 | null |
| 2024-08-15 | An Advanced Deep Learning Based Three-Stream Hybrid Model for Dynamic Hand Gesture Recognition | Md Abdur Rahim et.al. | 2408.08035 | null |
| 2024-08-12 | HAT: History-Augmented Anchor Transformer for Online Temporal Action Localization | Sakib Reza et.al. | 2408.06437 | link |
| 2024-08-12 | Probabilistic Vision-Language Representation for Weakly Supervised Temporal Action Localization | Geuntaek Lim et.al. | 2408.05955 | link |
| 2024-08-10 | A Methodological and Structural Review of Hand Gesture Recognition Across Diverse Data Modalities | Jungpil Shin et.al. | 2408.05436 | null |
| 2024-08-10 | EPAM-Net: An Efficient Pose-driven Attention-guided Multimodal Network for Video Action Recognition | Ahmed Abdelkawy et.al. | 2408.05421 | link |
| 2024-08-06 | Prototype Learning for Micro-gesture Classification | Guoliang Chen et.al. | 2408.03097 | null |
| 2024-08-06 | Online Temporal Action Localization with Memory-Augmented Transformer | Youngkil Song et.al. | 2408.02957 | null |
| 2024-08-05 | From Recognition to Prediction: Leveraging Sequence Reasoning for Action Anticipation | Xin Liu et.al. | 2408.02769 | null |
| 2024-08-04 | Enhancing Human Action Recognition and Violence Detection Through Deep Learning Audiovisual Fusion | Pooya Janani et.al. | 2408.02033 | null |
| 2024-08-03 | MultiFuser: Multimodal Fusion Transformer for Enhanced Driver Action Recognition | Ruoyu Wang et.al. | 2408.01766 | null |
| 2024-08-03 | Signal-SGN: A Spiking Graph Convolutional Network for Skeletal Action Recognition via Learning Temporal-Frequency Dynamics | Naichuan Zheng et.al. | 2408.01701 | null |
| 2024-08-01 | Text-Guided Video Masked Autoencoder | David Fan et.al. | 2408.00759 | null |
| 2024-08-01 | How Effective are Self-Supervised Models for Contact Identification in Videos | Malitha Gunawardhana et.al. | 2408.00498 | null |
| 2024-08-01 | Task-Adapter: Task-specific Adaptation of Image Models for Few-shot Action Recognition | Congqi Cao et.al. | 2408.00249 | null |
| 2024-07-31 | Explainable Artificial Intelligence for Quantifying Interfering and High-Risk Behaviors in Autism Spectrum Disorder in a Real-World Classroom Environment Using Privacy-Preserving Video Analysis | Barun Das et.al. | 2407.21691 | null |
| 2024-07-31 | Skeleton-Based Action Recognition with Spatial-Structural Graph Convolution | Jingyao Wang et.al. | 2407.21525 | null |
| 2024-07-31 | Dynamic Gesture Recognition in Ultra-Range Distance for Effective Human-Robot Interaction | Eran Bamani Beeri et.al. | 2407.21374 | null |
| 2024-07-29 | Adversarial Robustness in RGB-Skeleton Action Recognition: Leveraging Attention Modality Reweighter | Chao Liu et.al. | 2407.19981 | null |
| 2024-07-29 | ActivityCLIP: Enhancing Group Activity Recognition by Mining Complementary Information from Text to Supplement Image Modality | Guoliang Xu et.al. | 2407.19820 | null |
| 2024-07-29 | PredIN: Towards Open-Set Gesture Recognition via Prediction Inconsistency | Chen Liu et.al. | 2407.19753 | null |
| 2024-07-28 | Skeleton-based Group Activity Recognition via Spatial-Temporal Panoramic Graph | Zhengcen Li et.al. | 2407.19497 | link |
| 2024-07-25 | MARINE: A Computer Vision Model for Detecting Rare Predator-Prey Interactions in Animal Videos | Zsófia Katona et.al. | 2407.18289 | null |
| 2024-07-25 | Trajectory-aligned Space-time Tokens for Few-shot Action Recognition | Pulkit Kumar et.al. | 2407.18249 | null |
| 2024-07-26 | Harnessing Temporal Causality for Advanced Temporal Action Detection | Shuming Liu et.al. | 2407.17792 | link |
| 2024-07-23 | Fusion and Cross-Modal Transfer for Zero-Shot Human Action Recognition | Abhi Kamboj et.al. | 2407.16803 | null |
| 2024-07-23 | PLM-Net: Perception Latency Mitigation Network for Vision-Based Lateral Control of Autonomous Vehicles | Aws Khalil et.al. | 2407.16740 | link |
| 2024-07-24 | SOAP: Enhancing Spatio-Temporal Relation and Motion Information Capturing for Few-Shot Action Recognition | Wenbo Huang et.al. | 2407.16344 | link |
| 2024-07-22 | Efficient and generalizable prediction of molecular alterations in multiple cancer cohorts using H&E whole slide images | Kshitij Ingale et.al. | 2407.15816 | null |
| 2024-07-25 | Multi-Modality Co-Learning for Efficient Skeleton-based Action Recognition | Jinfu Liu et.al. | 2407.15706 | link |
| 2024-07-21 | Semi-Supervised Pipe Video Temporal Defect Interval Localization | Zhu Huang et.al. | 2407.15170 | null |
| 2024-07-20 | Automated Patient Positioning with Learned 3D Hand Gestures | Zhongpai Gao et.al. | 2407.14903 | null |
| 2024-07-20 | Can VLMs be used on videos for action recognition? LLMs are Visual Reasoning Coordinators | Harsh Lunia et.al. | 2407.14834 | null |
| 2024-07-20 | Decoupled Prompt-Adapter Tuning for Continual Activity Recognition | Di Fu et.al. | 2407.14811 | null |
| 2024-07-20 | A Comprehensive Review of Few-shot Action Recognition | Yuyang Wanyan et.al. | 2407.14744 | null |
| 2024-07-19 | LORTSAR: Low-Rank Transformer for Skeleton-based Action Recognition | Soroush Oraki et.al. | 2407.14655 | null |
| 2024-07-19 | Fine-grained Knowledge Graph-driven Video-Language Learning for Action Recognition | Rui Zhang et.al. | 2407.14146 | null |
| 2024-07-19 | Zero-Shot Underwater Gesture Recognition | Sandipan Sarma et.al. | 2407.14103 | link |
| 2024-07-18 | Pose-guided multi-task video transformer for driver action recognition | Ricardo Pizarro et.al. | 2407.13750 | null |
| 2024-07-18 | SA-DVAE: Improving Zero-Shot Skeleton-Based Action Recognition by Disentangled Variational Autoencoders | Sheng-Wei Li et.al. | 2407.13460 | link |
| 2024-07-18 | QuIIL at T3 challenge: Towards Automation in Life-Saving Intervention Procedures from First-Person View | Trinh T. L. Vuong et.al. | 2407.13216 | link |
| 2024-07-18 | Enhancing Temporal Action Localization: Advanced S6 Modeling with Recurrent Mechanism | Sangyoun Lee et.al. | 2407.13078 | link |
| 2024-07-17 | ActionSwitch: Class-agnostic Detection of Simultaneous Actions in Streaming Videos | Hyolim Kang et.al. | 2407.12987 | link |
| 2024-07-17 | NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models | Gengze Zhou et.al. | 2407.12366 | link |
| 2024-07-17 | Frequency Guidance Matters: Skeletal Action Recognition by Frequency-Aware Mixed Transformer | Wenhan Wu et.al. | 2407.12322 | null |
| 2024-07-17 | Shap-Mix: Shapley Value Guided Mixing for Long-Tailed Skeleton Based Action Recognition | Jiahang Zhang et.al. | 2407.12312 | null |
| 2024-07-16 | Enhancing Split Computing and Early Exit Applications through Predefined Sparsity | Luigi Capogrosso et.al. | 2407.11763 | link |
| 2024-07-10 | Exploring the Boundaries of On-Device Inference: When Tiny Falls Short, Go Hierarchical | Adarsh Prasad Behera et.al. | 2407.11061 | null |
| 2024-07-15 | STARS: Self-supervised Tuning for 3D Action Recognition in Skeleton Sequences | Soroush Mehraban et.al. | 2407.10935 | null |
| 2024-07-15 | Human-Centric Transformer for Domain Adaptive Action Recognition | Kun-Yu Lin et.al. | 2407.10860 | null |
| 2024-07-17 | Augmented Neural Fine-Tuning for Efficient Backdoor Purification | Nazmul Karim et.al. | 2407.10052 | link |
| 2024-07-13 | Region-aware Image-based Human Action Retrieval with Transformers | Hongsong Wang et.al. | 2407.09924 | null |
| 2024-07-16 | OmniRace: 6D Hand Pose Estimation for Intuitive Guidance of Racing Drone | Valerii Serpiva et.al. | 2407.09841 | link |
| 2024-07-12 | Full-Stage Pseudo Label Quality Enhancement for Weakly-supervised Temporal Action Localization | Qianhan Feng et.al. | 2407.08971 | link |
| 2024-07-11 | Boosting Adversarial Transferability for Skeleton-based Action Recognition via Exploring the Model Posterior Space | Yunfeng Diao et.al. | 2407.08572 | null |
| 2024-07-12 | Towards Adaptive Pseudo-label Learning for Semi-Supervised Temporal Action Localization | Feixiang Zhou et.al. | 2407.07673 | null |
| 2024-07-10 | EA-VTR: Event-Aware Video-Text Retrieval | Zongyang Ma et.al. | 2407.07478 | null |
| 2024-07-09 | Exploring Scalability of Self-Training for Open-Vocabulary Temporal Action Localization | Jeongseok Hyun et.al. | 2407.07024 | link |
| 2024-07-09 | Rethinking Image-to-Video Adaptation: An Object-centric Perspective | Rui Qian et.al. | 2407.06871 | null |
| 2024-07-09 | Masked Video and Body-worn IMU Autoencoder for Egocentric Action Recognition | Mingfang Zhang et.al. | 2407.06628 | null |
| 2024-07-08 | Noise-Free Explanation for Driving Action Prediction | Hongbo Zhu et.al. | 2407.06339 | link |
| 2024-07-08 | C2C: Component-to-Composition Learning for Zero-Shot Compositional Action Recognition | Rongchang Li et.al. | 2407.06113 | link |
| 2024-07-08 | DMSD-CDFSAR: Distillation from Mixed-Source Domain for Cross-Domain Few-shot Action Recognition | Fei Guo et.al. | 2407.05657 | null |
| 2024-07-11 | Helios: An extremely low power event-based gesture recognition for always-on smart eyewear | Prarthana Bhattacharyya et.al. | 2407.05206 | null |
| 2024-07-06 | DailyDVS-200: A Comprehensive Benchmark Dataset for Event-Based Action Recognition | Qi Wang et.al. | 2407.05106 | link |
| 2024-07-05 | AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation | Yuhan Zhu et.al. | 2407.04603 | null |
| 2024-07-05 | TF-SASM: Training-free Spatial-aware Sparse Memory for Multi-object Tracking | Thuc Nguyen-Quang et.al. | 2407.04327 | null |
| 2024-07-05 | Computer Vision for Clinical Gait Analysis: A Gait Abnormality Video Dataset | Rahm Ranjan et.al. | 2407.04190 | null |
| 2024-07-04 | Robust Policy Learning for Multi-UAV Collision Avoidance with Causal Feature Selection | Jiafan Zhuang et.al. | 2407.04056 | null |
| 2024-07-04 | On-Device Training Empowered Transfer Learning For Human Activity Recognition | Pixi Kang et.al. | 2407.03644 | null |
| 2024-07-03 | Motion meets Attention: Video Motion Prompts | Qixiang Chen et.al. | 2407.03179 | null |
| 2024-07-02 | Advancing Compressed Video Action Recognition through Progressive Knowledge Distillation | Efstathia Soufleri et.al. | 2407.02713 | link |
| 2024-07-02 | Novel Human Machine Interface via Robust Hand Gesture Recognition System using Channel Pruned YOLOv5s Model | Abir Sen et.al. | 2407.02585 | null |
| 2024-07-02 | Referring Atomic Video Action Recognition | Kunyu Peng et.al. | 2407.01872 | link |
| 2024-07-01 | Mask and Compress: Efficient Skeleton-based Action Recognition in Continual Learning | Matteo Mosconi et.al. | 2407.01397 | link |
| 2024-06-30 | Graph in Graph Neural Network | Jiongshu Wang et.al. | 2407.00696 | link |
| 2024-06-29 | Diving Deeper Into Pedestrian Behavior Understanding: Intention Estimation, Action Prediction, and Event Risk Assessment | Amir Rasouli et.al. | 2407.00446 | link |
| 2024-06-29 | PerAct2: A Perceiver Actor Framework for Bimanual Manipulation Tasks | Markus Grotz et.al. | 2407.00278 | null |
| 2024-06-27 | VideoMambaPro: A Leap Forward for Mamba in Video Understanding | Hui Lu et.al. | 2406.19006 | link |
| 2024-06-28 | CSI4Free: GAN-Augmented mmWave CSI for Improved Pose Classification | Nabeel Nisar Bhat et.al. | 2406.18684 | null |
| 2024-06-26 | The Surprising Effectiveness of Multimodal Large Language Models for Video Moment Retrieval | Boris Meinardus et.al. | 2406.18113 | link |
| 2024-07-01 | EgoVideo: Exploring Egocentric Foundation Model and Downstream Adaptation | Baoqi Pei et.al. | 2406.18070 | link |
| 2024-06-26 | Expressive Keypoints for Skeleton-based Action Recognition via Skeleton Transformation | Yijie Yang et.al. | 2406.18011 | link |
| 2024-06-25 | Using joint angles based on the international biomechanical standards for human action recognition and related tasks | Kevin Schlegel et.al. | 2406.17443 | null |
| 2024-06-21 | Open-Vocabulary Temporal Action Localization using Multimodal Guidance | Akshita Gupta et.al. | 2406.15556 | null |
| 2024-06-21 | SVFormer: A Direct Training Spiking Transformer for Efficient Video Action Recognition | Liutao Yu et.al. | 2406.15034 | null |
| 2024-06-21 | Real-Time Hand Gesture Recognition: Integrating Skeleton-Based Data Fusion and Multi-Stream CNN | Oluwaleke Yusuf et.al. | 2406.15003 | link |
| 2024-06-20 | Self-supervised Multi-actor Social Activity Understanding in Streaming Videos | Shubham Trehan et.al. | 2406.14472 | null |
| 2024-06-19 | An Efficient yet High-Performance Method for Precise Radar-Based Imaging of Human Hand Poses | Johanna Bräunig et.al. | 2406.13464 | null |
| 2024-06-19 | Part-aware Unified Representation of Language and Skeleton for Zero-shot Action Recognition | Anqi Zhu et.al. | 2406.13327 | link |
| 2024-06-21 | Underwater Human-Robot and Human-Swarm Interaction: A Review and Perspective | Sara Aldhaheri et.al. | 2406.12473 | null |
| 2024-06-18 | Deep self-supervised learning with visualisation for automatic gesture recognition | Fabien Allemand et.al. | 2406.12440 | null |
| 2024-06-17 | Brain-inspired Computational Modeling of Action Recognition with Recurrent Spiking Neural Networks Equipped with Reinforcement Delay Learning | Alireza Nadafian et.al. | 2406.11778 | null |
| 2024-06-18 | CM2-Net: Continual Cross-Modal Mapping Network for Driver Action Recognition | Ruoyu Wang et.al. | 2406.11340 | null |
| 2024-06-17 | Expanding the Design Space of Computer Vision-based Interactive Systems for Group Dance Practice | Soohwan Lee et.al. | 2406.11236 | null |
| 2024-06-14 | Nymeria: A Massive Collection of Multimodal Egocentric Daily Motion in the Wild | Lingni Ma et.al. | 2406.09905 | null |
| 2024-06-12 | Enhancing End-to-End Autonomous Driving with Latent World Model | Yingyan Li et.al. | 2406.08481 | link |
| 2024-06-09 | ALGO: Object-Grounded Visual Commonsense Reasoning for Open-World Egocentric Action Recognition | Sanjoy Kundu et.al. | 2406.05722 | null |
| 2024-06-07 | SMART: Scene-motion-aware human action recognition framework for mental disorder group | Zengyuan Lai et.al. | 2406.04649 | link |
| 2024-06-06 | Enhancing Sign Language Detection through Mediapipe and Convolutional Neural Networks (CNN) | Aditya Raj Verma et.al. | 2406.03729 | null |
| 2024-06-05 | The Logarithmic Memristor-Based Bayesian Machine | Clément Turck et.al. | 2406.03492 | null |
| 2024-06-05 | FILS: Self-Supervised Video Feature Prediction In Semantic Language Space | Mona Ahmadian et.al. | 2406.03447 | null |
| 2024-06-05 | Self-Supervised Skeleton Action Representation Learning: A Benchmark and Beyond | Jiahang Zhang et.al. | 2406.02978 | null |
| 2024-06-04 | Contrastive Language Video Time Pre-training | Hengyue Liu et.al. | 2406.02631 | null |
| 2024-06-04 | DL-KDD: Dual-Light Knowledge Distillation for Action Recognition in the Dark | Chi-Jui Chang et.al. | 2406.02468 | null |
| 2024-06-04 | A Generalized Apprenticeship Learning Framework for Modeling Heterogeneous Student Pedagogical Strategies | Md Mirajul Islam et.al. | 2406.02450 | null |
| 2024-06-04 | Analyzing the Feature Extractor Networks for Face Image Synthesis | Erdi Sarıtaş et.al. | 2406.02153 | link |
| 2024-06-04 | Analyzing the Effect of Combined Degradations on Face Recognition | Erdi Sarıtaş et.al. | 2406.02142 | link |
| 2024-06-03 | ELSA: Evaluating Localization of Social Activities in Urban Streets | Maryam Hosseini et.al. | 2406.01551 | null |
| 2024-06-03 | HHMR: Holistic Hand Mesh Recovery by Enhancing the Multimodal Controllability of Graph Diffusion Models | Mengcheng Li et.al. | 2406.01334 | null |
| 2024-06-03 | Augmented Commonsense Knowledge for Remote Object Grounding | Bahram Mohammadi et.al. | 2406.01256 | link |
| 2024-06-03 | Understanding the Cross-Domain Capabilities of Video-Based Few-Shot Action Recognition Models | Georgia Markham et.al. | 2406.01073 | null |
| 2024-06-02 | An Information Compensation Framework for Zero-Shot Skeleton-based Action Recognition | Haojun Xu et.al. | 2406.00639 | null |
| 2024-05-31 | Action-OOD: An End-to-End Skeleton-Based Model for Robust Out-of-Distribution Human Action Detection | Jing Xu et.al. | 2405.20633 | link |
| 2024-05-31 | Vision-Language Meets the Skeleton: Progressively Distillation with Cross-Modal Knowledge for 3D Action Representation Learning | Yang Chen et.al. | 2405.20606 | null |
| 2024-05-30 | ENTIRe-ID: An Extensive and Diverse Dataset for Person Re-Identification | Serdar Yildiz et.al. | 2405.20465 | null |
| 2024-05-30 | From Forest to Zoo: Great Ape Behavior Recognition with ChimpBehave | Michael Fuchs et.al. | 2405.20025 | null |
| 2024-05-31 | Multimodal Cross-Domain Few-Shot Learning for Egocentric Action Recognition | Masashi Hatano et.al. | 2405.19917 | null |
| 2024-05-30 | EgoSurgery-Phase: A Dataset of Surgical Phase Recognition from Egocentric Open Surgery Videos | Ryo Fujii et.al. | 2405.19644 | link |
| 2024-05-30 | SAM-E: Leveraging Visual Foundation Model with Sequence Imitation for Embodied Manipulation | Junjie Zhang et.al. | 2405.19586 | null |
| 2024-05-29 | Matrix Manifold Neural Networks++ | Xuan Son Nguyen et.al. | 2405.19206 | null |
| 2024-05-29 | Exploring AI-based Anonymization of Industrial Image and Video Data in the Context of Feature Preservation | Sabrina Cynthia Triess et.al. | 2405.19173 | null |
| 2024-05-28 | Flow-Assisted Motion Learning Network for Weakly-Supervised Group Activity Recognition | Muhammad Adi Nugroho et.al. | 2405.18012 | null |
| 2024-05-30 | Benchmarking Skeleton-based Motion Encoder Models for Clinical Applications: Estimating Parkinson’s Disease Severity in Walking Sequences | Vida Adeli et.al. | 2405.17817 | link |
| 2024-05-28 | Hierarchical Action Recognition: A Contrastive Video-Language Approach with Hierarchical Interactions | Rui Zhang et.al. | 2405.17729 | null |
| 2024-05-28 | EgoNCE++: Do Egocentric Video-Language Models Really Understand Hand-Object Interactions? | Boshen Xu et.al. | 2405.17719 | link |
| 2024-05-27 | Advancements in Tactile Hand Gesture Recognition for Enhanced Human-Machine Interaction | Chiara Fumelli et.al. | 2405.17038 | null |
| 2024-05-27 | A Cross-Dataset Study for Text-based 3D Human Motion Retrieval | Léore Bensabath et.al. | 2405.16909 | null |
| 2024-05-26 | Flow Snapshot Neurons in Action: Deep Neural Networks Generalize to Biological Motion Perception | Shuangpeng Han et.al. | 2405.16493 | null |
| 2024-05-25 | Application of Artificial Intelligence in Hand Gesture Recognition with Virtual Reality: Survey and Analysis of Hand Gesture Hardware Selection | Jindi Wang et.al. | 2405.16264 | null |
| 2024-05-22 | From CNNs to Transformers in Multimodal Human Action Recognition: A Survey | Muhammad Bilal Shaikh et.al. | 2405.15813 | null |
| 2024-05-24 | V-Zen: Efficient GUI Understanding and Precise Grounding With A Novel Multimodal LLM | Abdur Rahman et.al. | 2405.15341 | link |
| 2024-05-23 | Enhanced Spatiotemporal Prediction Using Physical-guided And Frequency-enhanced Recurrent Neural Networks | Xuanle Zhao et.al. | 2405.14504 | null |
| 2024-05-23 | SpGesture: Source-Free Domain-adaptive sEMG-based Gesture Recognition with Jaccard Attentive Spiking Neural Network | Weiyu Guo et.al. | 2405.14398 | null |
| 2024-05-23 | MAMBA4D: Efficient Long-Sequence Point Cloud Video Understanding with Disentangled Spatial-Temporal State Space Models | Jiuming Liu et.al. | 2405.14338 | null |
| 2024-05-22 | Counterfactual Gradients-based Quantification of Prediction Trust in Neural Networks | Mohit Prabhushankar et.al. | 2405.13758 | null |
| 2024-05-21 | Identity-free Artificial Emotional Intelligence via Micro-Gesture Understanding | Rong Gao et.al. | 2405.13206 | null |
| 2024-05-22 | Building Temporal Kernels with Orthogonal Polynomials | Yan Ru Pei et.al. | 2405.12179 | link |
| 2024-05-18 | GestFormer: Multiscale Wavelet Pooling Transformer Network for Dynamic Hand Gesture Recognition | Mallika Garg et.al. | 2405.11180 | link |
| 2024-05-17 | Air Signing and Privacy-Preserving Signature Verification for Digital Documents | P. Sarveswarasarma et.al. | 2405.10868 | null |
| 2024-05-17 | MC-GPT: Empowering Vision-and-Language Navigation with Memory Map and Reasoning Chains | Zhaohuan Zhan et.al. | 2405.10620 | null |
| 2024-05-06 | MEET: Mixture of Experts Extra Tree-Based sEMG Hand Gesture Identification | Naveen Gehlot et.al. | 2405.09562 | null |
| 2024-05-14 | Wearable Sensor-Based Few-Shot Continual Learning on Hand Gestures for Motor-Impaired Individuals via Latent Embedding Exploitation | Riyad Bin Rafiq et.al. | 2405.08969 | link |
| 2024-05-14 | The impact of Compositionality in Zero-shot Multi-label action recognition for Object-based tasks | Carmela Calabrese et.al. | 2405.08695 | null |
| 2024-05-15 | POWQMIX: Weighted Value Factorization with Potentially Optimal Joint Actions Recognition for Cooperative Multi-Agent Reinforcement Learning | Chang Huang et.al. | 2405.08036 | null |
| 2024-05-13 | Coarse or Fine? Recognising Action End States without Labels | Davide Moltisanti et.al. | 2405.07723 | link |
| 2024-05-11 | PRENet: A Plane-Fit Redundancy Encoding Point Cloud Sequence Network for Real-Time 3D Action Recognition | Shenglin He et.al. | 2405.06929 | null |
| 2024-05-10 | CasCalib: Cascaded Calibration for Motion Capture from Sparse Unsynchronized Cameras | James Tang et.al. | 2405.06845 | link |
| 2024-05-09 | A Survey on Backbones for Deep Video Action Recognition | Zixuan Tang et.al. | 2405.05584 | null |
| 2024-05-06 | OmniActions: Predicting Digital Actions in Response to Real-World Multimodal Sensory Inputs with LLMs | Jiahao Nick Li et.al. | 2405.03901 | null |
| 2024-05-05 | JOSENet: A Joint Stream Embedding Network for Violence Detection in Surveillance Videos | Pietro Nardelli et.al. | 2405.02961 | null |
| 2024-05-03 | On the Utility of External Agent Intention Predictor for Human-AI Coordination | Chenxu Wang et.al. | 2405.02229 | null |
| 2024-05-11 | MVP-Shot: Multi-Velocity Progressive-Alignment Framework for Few-Shot Action Recognition | Hongyu Qu et.al. | 2405.02077 | null |
| 2024-05-03 | Enhancing Micro Gesture Recognition for Emotion Understanding via Context-aware Visual-Text Contrastive Learning | Deng Li et.al. | 2405.01885 | link |
| 2024-05-02 | Multi-view Action Recognition via Directed Gromov-Wasserstein Discrepancy | Hoang-Quan Nguyen et.al. | 2405.01337 | null |
| 2024-05-07 | Towards Inclusive Face Recognition Through Synthetic Ethnicity Alteration | Praveen Kumar Chandaliya et.al. | 2405.01273 | null |
| 2024-04-30 | One-Stage Open-Vocabulary Temporal Action Detection Leveraging Temporal Multi-scale and Action Label Features | Trung Thanh Nguyen et.al. | 2404.19542 | link |
| 2024-04-30 | Cross-Block Fine-Grained Semantic Cascade for Skeleton-Based Sports Action Recognition | Zhendong Liu et.al. | 2404.19383 | null |
| 2024-04-28 | Enhancing Action Recognition from Low-Quality Skeleton Data via Part-Level Knowledge Distillation | Cuiwei Liu et.al. | 2404.18206 | null |
| 2024-04-26 | SDFD: Building a Versatile Synthetic Face Image Dataset with Diverse Attributes | Georgia Baltsou et.al. | 2404.17255 | null |
| 2024-04-25 | Learning Discriminative Spatio-temporal Representations for Semi-supervised Action Recognition | Yu Wang et.al. | 2404.16416 | null |
| 2024-04-25 | An Improved Graph Pooling Network for Skeleton-Based Action Recognition | Cong Wu et.al. | 2404.16359 | null |
| 2024-04-24 | Unimodal and Multimodal Sensor Fusion for Wearable Activity Recognition | Hymalai Bello et.al. | 2404.16005 | null |
| 2024-04-24 | 3D Face Morphing Attack Generation using Non-Rigid Registration | Jag Mohan Singh et.al. | 2404.15765 | null |
| 2024-04-25 | HDBN: A Novel Hybrid Dual-branch Network for Robust Skeleton-based Action Recognition | Jinfu Liu et.al. | 2404.15719 | link |
| 2024-04-23 | Combating Missing Modalities in Egocentric Videos at Test Time | Merey Ramazanova et.al. | 2404.15161 | null |
| 2024-04-23 | G3R: Generating Rich and Fine-grained mmWave Radar Data from 2D Videos for Generalized Gesture Recognition | Kaikai Deng et.al. | 2404.14934 | null |
| 2024-04-23 | Driver Activity Classification Using Generalizable Representations from Vision-Language Models | Ross Greer et.al. | 2404.14906 | null |
| 2024-04-23 | DENOISER: Rethinking the Robustness for Open-Vocabulary Action Recognition | Haozhe Cheng et.al. | 2404.14890 | null |
| 2024-04-22 | 1st Place Solution to the 1st SkatingVerse Challenge | Tao Sun et.al. | 2404.14032 | null |
| 2024-04-22 | CoFInAl: Enhancing Action Quality Assessment with Coarse-to-Fine Instruction Alignment | Kanglei Zhou et.al. | 2404.13999 | link |
| 2024-04-21 | Attack on Scene Flow using Point Clouds | Haniyeh Ehsani Oskouie et.al. | 2404.13621 | null |
| 2024-04-20 | STAT: Towards Generalizable Temporal Action Localization | Yangcen Liu et.al. | 2404.13311 | null |
| 2024-04-19 | Ring-a-Pose: A Ring for Continuous Hand Pose Tracking | Tianhong Catherine Yu et.al. | 2404.12980 | null |
| 2024-04-19 | VoxAtnNet: A 3D Point Clouds Convolutional Neural Network for Generalizable Face Presentation Attack Detection | Raghavendra Ramachandra et.al. | 2404.12680 | null |
| 2024-04-18 | DeepLocalization: Using change point detection for Temporal Action Localization | Mohammed Shaiqur Rahman et.al. | 2404.12258 | null |
| 2024-04-18 | Aligning Actions and Walking to LLM-Generated Textual Descriptions | Radu Chivereanu et.al. | 2404.12192 | link |
| 2024-04-18 | Simultaneous Detection and Interaction Reasoning for Object-Centric Action Recognition | Xunsong Li et.al. | 2404.11903 | null |
| 2024-04-18 | sEMG-based Fine-grained Gesture Recognition via Improved LightGBM Model | Xiupeng Qiao et.al. | 2404.11861 | null |
| 2024-04-17 | VG4D: Vision-Language Model Goes 4D Video Recognition | Zhichao Deng et.al. | 2404.11605 | link |
| 2024-04-17 | A Data-Driven Representation for Sign Language Production | Harry Walsh et.al. | 2404.11499 | link |
| 2024-04-17 | Lower Limb Movements Recognition Based on Feature Recursive Elimination and Backpropagation Neural Network | Yongkai Ma et.al. | 2404.11383 | null |
| 2024-04-17 | Revisiting Noise Resilience Strategies in Gesture Recognition: Short-Term Enhancement in Surface Electromyographic Signal Analysis | Weiyu Guo et.al. | 2404.11213 | null |
| 2024-04-17 | Kathakali Hand Gesture Recognition With Minimal Data | Kavitha Raju et.al. | 2404.11205 | null |
| 2024-04-16 | HumMUSS: Human Motion Understanding using State Space Models | Arnab Kumar Mondal et.al. | 2404.10880 | null |
| 2024-04-17 | Learning to Score Sign Language with Two-stage Method | Hongli Wen et.al. | 2404.10383 | null |
| 2024-04-16 | MK-SGN: A Spiking Graph Convolutional Network with Multimodal Fusion and Knowledge Distillation for Skeleton-based Action Recognition | Naichuan Zheng et.al. | 2404.10210 | null |
| 2024-04-15 | Design and Analysis of Efficient Attention in Transformers for Social Group Activity Recognition | Masato Tamura et.al. | 2404.09964 | null |
| 2024-04-15 | A Diffusion-based Data Generator for Training Object Recognition Models in Ultra-Range Distance | Eran Bamani et.al. | 2404.09846 | null |
| 2024-04-15 | Leveraging Temporal Contextualization for Video Action Recognition | Minji Kim et.al. | 2404.09490 | link |
| 2024-04-14 | In My Perspective, In My Hands: Accurate Egocentric 2D Hand Pose and Action Recognition | Wiktor Mucha et.al. | 2404.09308 | null |
| 2024-04-13 | Exploring Explainability in Video Action Recognition | Avinab Saha et.al. | 2404.09067 | null |
| 2024-04-12 | MSSTNet: A Multi-Scale Spatio-Temporal CNN-Transformer Network for Dynamic Facial Expression Recognition | Linhuang Wang et.al. | 2404.08433 | null |
| 2024-04-11 | Graph Integrated Language Transformers for Next Action Prediction in Complex Phone Calls | Amin Hosseiny Marani et.al. | 2404.08155 | null |
| 2024-04-11 | Simba: Mamba augmented U-ShiftGCN for Skeletal Action Recognition in Videos | Soumyabrata Chaudhuri et.al. | 2404.07645 | null |
| 2024-04-15 | Fine-Grained Side Information Guided Dual-Prompts for Zero-Shot Skeleton Action Recognition | Yang Chen et.al. | 2404.07487 | null |
| 2024-04-10 | O-TALC: Steps Towards Combating Oversegmentation within Online Action Segmentation | Matthew Kent Myers et.al. | 2404.06894 | null |
| 2024-04-10 | An Animation-based Augmentation Approach for Action Recognition from Discontinuous Video | Xingyu Song et.al. | 2404.06741 | null |
| 2024-04-07 | X-VARS: Introducing Explainability in Football Refereeing with Multi-Modal Large Language Model | Jan Held et.al. | 2404.06332 | null |
| 2024-04-10 | Algorithms for Caching and MTS with reduced number of predictions | Karim Abdel Sadek et.al. | 2404.06280 | null |
| 2024-04-09 | ActNetFormer: Transformer-ResNet Hybrid Method for Semi-Supervised Action Recognition in Videos | Sharana Dharshikgan Suresh Dass et.al. | 2404.06243 | link |
| 2024-04-08 | Localizing Moments of Actions in Untrimmed Videos of Infants with Autism Spectrum Disorder | Halil Ismail Helvaci et.al. | 2404.05849 | null |
| 2024-04-09 | TIM: A Time Interval Machine for Audio-Visual Action Recognition | Jacob Chalk et.al. | 2404.05559 | link |
| 2024-04-11 | Test-Time Zero-Shot Temporal Action Localization | Benedetta Liberatori et.al. | 2404.05426 | link |
| 2024-04-09 | SDFR: Synthetic Data for Face Recognition Competition | Hatef Otroshi Shahreza et.al. | 2404.04580 | null |
| 2024-04-05 | PhysPT: Physics-aware Pretrained Transformer for Estimating Human Dynamics from Monocular Videos | Yufei Zhang et.al. | 2404.04430 | null |
| 2024-04-05 | Koala: Key frame-conditioned long video-LLM | Reuben Tan et.al. | 2404.04346 | null |
| 2024-04-04 | UniAV: Unified Audio-Visual Perception for Multi-Task Video Localization | Tiantian Geng et.al. | 2404.03179 | null |
| 2024-04-03 | Optimizing the Deployment of Tiny Transformers on Low-Power MCUs | Victor J. B. Jung et.al. | 2404.02945 | link |
| 2024-04-03 | Multi-Scale Spatial-Temporal Self-Attention Graph Convolutional Networks for Skeleton-based Action Recognition | Ikuo Nakamura et.al. | 2404.02624 | null |
| 2024-04-02 | PREGO: online mistake detection in PRocedural EGOcentric videos | Alessandro Flaborea et.al. | 2404.01933 | link |
| 2024-04-02 | Disentangled Pre-training for Human-Object Interaction Detection | Zhuolong Li et.al. | 2404.01725 | link |
| 2024-04-02 | Language Model Guided Interpretable Video Action Reasoning | Ning Wang et.al. | 2404.01591 | null |
| 2024-04-02 | Leveraging YOLO-World and GPT-4V LMMs for Zero-Shot Person Detection and Action Recognition in Drone Imagery | Christian Limberg et.al. | 2404.01571 | null |
| 2024-04-01 | LoSA: Long-Short-range Adapter for Scaling End-to-End Temporal Action Localization | Akshita Gupta et.al. | 2404.01282 | null |
| 2024-03-31 | LLMs are Good Action Recognizers | Haoxuan Qu et.al. | 2404.00532 | null |
| 2024-03-29 | Latent Embedding Clustering for Occlusion Robust Head Pose Estimation | José Celestino et.al. | 2403.20251 | null |
| 2024-03-29 | A Unified Framework for Human-centric Point Cloud Video Understanding | Yiteng Xu et.al. | 2403.20031 | null |
| 2024-03-28 | Zero-shot Prompt-based Video Encoder for Surgical Gesture Recognition | Mingxing Rao et.al. | 2403.19786 | link |
| 2024-03-28 | Hypergraph-based Multi-View Action Recognition using Event Cameras | Yue Gao et.al. | 2403.19316 | null |
| 2024-03-27 | PLOT-TAL – Prompt Learning with Optimal Transport for Few-Shot Temporal Action Localization | Edward Fish et.al. | 2403.18915 | null |
| 2024-03-27 | iFace: Hand-Over-Face Gesture Recognition Leveraging Impedance Sensing | Mengxi Liu et.al. | 2403.18433 | null |
| 2024-03-27 | An Evolutionary Network Architecture Search Framework with Adaptive Multimodal Fusion for Hand Gesture Recognition | Yizhang Xia et.al. | 2403.18208 | null |
| 2024-03-26 | OmniVid: A Generative Framework for Universal Video Understanding | Junke Wang et.al. | 2403.17935 | link |
| 2024-03-25 | Understanding Long Videos in One Multimodal Language Model Pass | Kanchana Ranasinghe et.al. | 2403.16998 | link |
| 2024-03-25 | Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects | Zicong Fan et.al. | 2403.16428 | null |
| 2024-03-24 | Emotion Recognition from the perspective of Activity Recognition | Savinay Nagendra et.al. | 2403.16263 | null |
| 2024-03-22 | InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding | Yi Wang et.al. | 2403.15377 | link |
| 2024-03-22 | Gesture-Controlled Aerial Robot Formation for Human-Swarm Interaction in Safety Monitoring Applications | Vít Krátký et.al. | 2403.15333 | null |
| 2024-03-22 | GCN-DevLSTM: Path Development for Skeleton-Based Action Recognition | Lei Jiang et.al. | 2403.15212 | link |
| 2024-03-21 | Transfer Learning for Cross-dataset Isolated Sign Language Recognition in Under-Resourced Datasets | Ahmet Alp Kindiroglu et.al. | 2403.14534 | link |
| 2024-03-20 | Hierarchical NeuroSymbolic Approach for Action Quality Assessment | Lauren Okamoto et.al. | 2403.13798 | null |
| 2024-03-19 | Selective, Interpretable, and Motion Consistent Privacy Attribute Obfuscation for Action Recognition | Filip Ilic et.al. | 2403.12710 | null |
| 2024-03-19 | ExACT: Language-guided Conceptual Reasoning and Uncertainty Estimation for Event-based Action Recognition and More | Jiazhou Zhou et.al. | 2403.12534 | null |
| 2024-03-19 | VideoBadminton: A Video Dataset for Badminton Action Recognition | Qi Li et.al. | 2403.12385 | null |
| 2024-03-19 | Multi-View Video-Based Learning: Leveraging Weak Labels for Frame-Level Perception | Vijay John et.al. | 2403.11616 | null |
| 2024-03-19 | VIHE: Virtual In-Hand Eye Transformer for 3D Robotic Manipulation | Weiyao Wang et.al. | 2403.11461 | null |
| 2024-03-17 | A Lie Group Approach to Riemannian Batch Normalization | Ziheng Chen et.al. | 2403.11261 | link |
| 2024-03-17 | Boosting Semi-Supervised Temporal Action Localization by Learning from Non-Target Classes | Kun Xia et.al. | 2403.11189 | null |
| 2024-03-16 | CoPlay: Audio-agnostic Cognitive Scaling for Acoustic Sensing | Yin Li et.al. | 2403.10796 | null |
| 2024-03-15 | CrossGLG: LLM Guides One-shot Skeleton-based 3D Action Recognition in a Cross-level Manner | Tingbing Yan et.al. | 2403.10082 | null |
| 2024-03-15 | Skeleton-Based Human Action Recognition with Noisy Labels | Yi Xu et.al. | 2403.09975 | null |
| 2024-03-14 | On the Utility of 3D Hand Poses for Action Recognition | Md Salman Shamil et.al. | 2403.09805 | null |
| 2024-03-14 | 3D-VLA: A 3D Vision-Language-Action Generative World Model | Haoyu Zhen et.al. | 2403.09631 | link |
| 2024-03-14 | SkateFormer: Skeletal-Temporal Transformer for Human Action Recognition | Jeonghyeok Do et.al. | 2403.09508 | link |
| 2024-03-14 | EventRPG: Event Data Augmentation with Relevance Propagation Guidance | Mingyuan Sun et.al. | 2403.09274 | link |
| 2024-03-14 | Leveraging Foundation Model Automatic Data Augmentation Strategies and Skeletal Points for Hands Action Recognition in Industrial Assembly Lines | Liang Wu et.al. | 2403.09056 | null |
| 2024-03-13 | Low-Cost and Real-Time Industrial Human Action Recognitions Based on Large-Scale Foundation Models | Wensheng Liang et.al. | 2403.08420 | null |
| 2024-03-13 | NaturalVLM: Leveraging Fine-grained Natural Language for Affordance-Guided Visual Manipulation | Ran Xu et.al. | 2403.08355 | null |
| 2024-03-13 | ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation | Guanxing Lu et.al. | 2403.08321 | link |
| 2024-03-12 | NavCoT: Boosting LLM-Based Vision-and-Language Navigation via Learning Disentangled Reasoning | Bingqian Lin et.al. | 2403.07376 | link |
| 2024-03-12 | BID: Boundary-Interior Decoding for Unsupervised Temporal Action Localization Pre-Training | Qihang Fang et.al. | 2403.07354 | null |
| 2024-03-11 | Attention Prompt Tuning: Parameter-efficient Adaptation of Pre-trained Models for Spatiotemporal Modeling | Wele Gedara Chaminda Bandara et.al. | 2403.06978 | link |
| 2024-03-11 | Deep Learning Approaches for Human Action Recognition in Video Data | Yufei Xie et.al. | 2403.06810 | null |
| 2024-03-11 | Real-Time Multimodal Cognitive Assistant for Emergency Medical Services | Keshara Weerasinghe et.al. | 2403.06734 | null |
| 2024-03-11 | Multimodal Transformers for Real-Time Surgical Activity Prediction | Keshara Weerasinghe et.al. | 2403.06705 | link |
| 2024-03-11 | epsilon-Mesh Attack: A Surface-based Adversarial Point Cloud Attack for Facial Expression Recognition | Batuhan Cengiz et.al. | 2403.06661 | null |
| 2024-03-11 | Density-Guided Label Smoothing for Temporal Localization of Driving Actions | Tunc Alkanat et.al. | 2403.06616 | null |
| 2024-03-11 | Transformer-based Fusion of 2D-pose and Spatio-temporal Embeddings for Distracted Driver Action Recognition | Erkut Akdag et.al. | 2403.06577 | null |
| 2024-03-10 | Coherent Temporal Synthesis for Incremental Action Segmentation | Guodong Ding et.al. | 2403.06102 | null |
| 2024-03-09 | Dissecting Deep RL with High Update Ratios: Combatting Value Overestimation and Divergence | Marcel Hussing et.al. | 2403.05996 | null |
| 2024-03-08 | Benchmarking Micro-action Recognition: Dataset, Methods, and Applications | Dan Guo et.al. | 2403.05234 | link |
| 2024-03-06 | Video Relationship Detection Using Mixture of Experts | Ala Shaabana et.al. | 2403.03994 | link |
| 2024-03-05 | Behavior Generation with Latent Actions | Seungjae Lee et.al. | 2403.03181 | link |
| 2024-03-05 | Learning to Use Tools via Cooperative and Interactive Agents | Zhengliang Shi et.al. | 2403.03031 | null |
| 2024-03-04 | Gesture recognition with Brownian reservoir computing using geometrically confined skyrmion dynamics | Grischa Beneke et.al. | 2403.01877 | null |
| 2024-03-04 | A Simple Baseline for Efficient Hand Mesh Reconstruction | Zhishan Zhou et.al. | 2403.01813 | null |
| 2024-03-03 | A Unified Model Selection Technique for Spectral Clustering Based Motion Segmentation | Yuxiang Huang et.al. | 2403.01606 | null |
| 2024-03-03 | Rethinking CLIP-based Video Learners in Cross-Domain Open-Vocabulary Action Recognition | Kun-Yu Lin et.al. | 2403.01560 | link |
| 2024-03-02 | Dynamic 3D Point Cloud Sequences as 2D Videos | Yiming Zeng et.al. | 2403.01129 | null |
| 2024-02-29 | On the Design of Human-Robot Collaboration Gestures | Anas Shrinah et.al. | 2402.19058 | null |
| 2024-02-23 | Multimodal Transformer With a Low-Computational-Cost Guarantee | Sungjin Park et.al. | 2402.15096 | null |
| 2024-02-17 | Implementation of a Model of the Cortex Basal Ganglia Loop | Naoya Arakawa et.al. | 2402.13275 | null |
| 2024-02-20 | Radar-Based Recognition of Static Hand Gestures in American Sign Language | Christian Schuessler et.al. | 2402.12800 | null |
| 2024-02-20 | Learning Domain-Invariant Temporal Dynamics for Few-Shot Action Recognition | Yuke Li et.al. | 2402.12706 | null |
| 2024-02-19 | Comprehensive Cognitive LLM Agent for Smartphone GUI Automation | Xinbei Ma et.al. | 2402.11941 | link |
| 2024-02-15 | Hand Shape and Gesture Recognition using Multiscale Template Matching, Background Subtraction and Binary Image Analysis | Ketan Suhaas Saichandran et.al. | 2402.09663 | null |
| 2024-02-14 | TikTokActions: A TikTok-Derived Video Dataset for Human Action Recognition | Yang Qian et.al. | 2402.08875 | null |
| 2024-02-13 | BdSLW60: A Word-Level Bangla Sign Language Dataset | Husne Ara Rubaiyeat et.al. | 2402.08635 | link |
| 2024-02-13 | Vision-Based Hand Gesture Customization from a Single Demonstration | Soroush Shahi et.al. | 2402.08420 | null |
| 2024-02-12 | PBADet: A One-Stage Anchor-Free Approach for Part-Body Association | Zhongpai Gao et.al. | 2402.07814 | null |