Action Recognition - 2024-03
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2024-03-31 | LLMs are Good Action Recognizers | Haoxuan Qu et.al. | 2404.00532 | translate | read | null |
| 2024-03-29 | Latent Embedding Clustering for Occlusion Robust Head Pose Estimation | José Celestino et.al. | 2403.20251 | translate | read | null |
| 2024-03-29 | A Unified Framework for Human-centric Point Cloud Video Understanding | Yiteng Xu et.al. | 2403.20031 | translate | read | null |
| 2024-03-28 | Zero-shot Prompt-based Video Encoder for Surgical Gesture Recognition | Mingxing Rao et.al. | 2403.19786 | translate | read | link |
| 2024-03-28 | Hypergraph-based Multi-View Action Recognition using Event Cameras | Yue Gao et.al. | 2403.19316 | translate | read | null |
| 2024-03-27 | PLOT-TAL – Prompt Learning with Optimal Transport for Few-Shot Temporal Action Localization | Edward Fish et.al. | 2403.18915 | translate | read | null |
| 2024-03-27 | iFace: Hand-Over-Face Gesture Recognition Leveraging Impedance Sensing | Mengxi Liu et.al. | 2403.18433 | translate | read | null |
| 2024-03-27 | An Evolutionary Network Architecture Search Framework with Adaptive Multimodal Fusion for Hand Gesture Recognition | Yizhang Xia et.al. | 2403.18208 | translate | read | null |
| 2024-03-26 | OmniVid: A Generative Framework for Universal Video Understanding | Junke Wang et.al. | 2403.17935 | translate | read | link |
| 2024-03-25 | Understanding Long Videos in One Multimodal Language Model Pass | Kanchana Ranasinghe et.al. | 2403.16998 | translate | read | link |
| 2024-03-25 | Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects | Zicong Fan et.al. | 2403.16428 | translate | read | null |
| 2024-03-24 | Emotion Recognition from the perspective of Activity Recognition | Savinay Nagendra et.al. | 2403.16263 | translate | read | null |
| 2024-03-22 | InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding | Yi Wang et.al. | 2403.15377 | translate | read | link |
| 2024-03-22 | Gesture-Controlled Aerial Robot Formation for Human-Swarm Interaction in Safety Monitoring Applications | Vít Krátký et.al. | 2403.15333 | translate | read | null |
| 2024-03-22 | GCN-DevLSTM: Path Development for Skeleton-Based Action Recognition | Lei Jiang et.al. | 2403.15212 | translate | read | link |
| 2024-03-21 | Transfer Learning for Cross-dataset Isolated Sign Language Recognition in Under-Resourced Datasets | Ahmet Alp Kindiroglu et.al. | 2403.14534 | translate | read | link |
| 2024-03-20 | Hierarchical NeuroSymbolic Approach for Action Quality Assessment | Lauren Okamoto et.al. | 2403.13798 | translate | read | null |
| 2024-03-19 | Selective, Interpretable, and Motion Consistent Privacy Attribute Obfuscation for Action Recognition | Filip Ilic et.al. | 2403.12710 | translate | read | null |
| 2024-03-19 | ExACT: Language-guided Conceptual Reasoning and Uncertainty Estimation for Event-based Action Recognition and More | Jiazhou Zhou et.al. | 2403.12534 | translate | read | null |
| 2024-03-19 | VideoBadminton: A Video Dataset for Badminton Action Recognition | Qi Li et.al. | 2403.12385 | translate | read | null |
| 2024-03-19 | Multi-View Video-Based Learning: Leveraging Weak Labels for Frame-Level Perception | Vijay John et.al. | 2403.11616 | translate | read | null |
| 2024-03-19 | VIHE: Virtual In-Hand Eye Transformer for 3D Robotic Manipulation | Weiyao Wang et.al. | 2403.11461 | translate | read | null |
| 2024-03-17 | A Lie Group Approach to Riemannian Batch Normalization | Ziheng Chen et.al. | 2403.11261 | translate | read | link |
| 2024-03-17 | Boosting Semi-Supervised Temporal Action Localization by Learning from Non-Target Classes | Kun Xia et.al. | 2403.11189 | translate | read | null |
| 2024-03-16 | CoPlay: Audio-agnostic Cognitive Scaling for Acoustic Sensing | Yin Li et.al. | 2403.10796 | translate | read | null |
| 2024-03-15 | CrossGLG: LLM Guides One-shot Skeleton-based 3D Action Recognition in a Cross-level Manner | Tingbing Yan et.al. | 2403.10082 | translate | read | null |
| 2024-03-15 | Skeleton-Based Human Action Recognition with Noisy Labels | Yi Xu et.al. | 2403.09975 | translate | read | null |
| 2024-03-14 | On the Utility of 3D Hand Poses for Action Recognition | Md Salman Shamil et.al. | 2403.09805 | translate | read | null |
| 2024-03-14 | 3D-VLA: A 3D Vision-Language-Action Generative World Model | Haoyu Zhen et.al. | 2403.09631 | translate | read | link |
| 2024-03-14 | SkateFormer: Skeletal-Temporal Transformer for Human Action Recognition | Jeonghyeok Do et.al. | 2403.09508 | translate | read | link |
| 2024-03-14 | EventRPG: Event Data Augmentation with Relevance Propagation Guidance | Mingyuan Sun et.al. | 2403.09274 | translate | read | link |
| 2024-03-14 | Leveraging Foundation Model Automatic Data Augmentation Strategies and Skeletal Points for Hands Action Recognition in Industrial Assembly Lines | Liang Wu et.al. | 2403.09056 | translate | read | null |
| 2024-03-13 | Low-Cost and Real-Time Industrial Human Action Recognitions Based on Large-Scale Foundation Models | Wensheng Liang et.al. | 2403.08420 | translate | read | null |
| 2024-03-13 | NaturalVLM: Leveraging Fine-grained Natural Language for Affordance-Guided Visual Manipulation | Ran Xu et.al. | 2403.08355 | translate | read | null |
| 2024-03-13 | ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation | Guanxing Lu et.al. | 2403.08321 | translate | read | link |
| 2024-03-12 | NavCoT: Boosting LLM-Based Vision-and-Language Navigation via Learning Disentangled Reasoning | Bingqian Lin et.al. | 2403.07376 | translate | read | link |
| 2024-03-12 | BID: Boundary-Interior Decoding for Unsupervised Temporal Action Localization Pre-Training | Qihang Fang et.al. | 2403.07354 | translate | read | null |
| 2024-03-11 | Attention Prompt Tuning: Parameter-efficient Adaptation of Pre-trained Models for Spatiotemporal Modeling | Wele Gedara Chaminda Bandara et.al. | 2403.06978 | translate | read | link |
| 2024-03-11 | Deep Learning Approaches for Human Action Recognition in Video Data | Yufei Xie et.al. | 2403.06810 | translate | read | null |
| 2024-03-11 | Real-Time Multimodal Cognitive Assistant for Emergency Medical Services | Keshara Weerasinghe et.al. | 2403.06734 | translate | read | null |
| 2024-03-11 | Multimodal Transformers for Real-Time Surgical Activity Prediction | Keshara Weerasinghe et.al. | 2403.06705 | translate | read | link |
| 2024-03-11 | epsilon-Mesh Attack: A Surface-based Adversarial Point Cloud Attack for Facial Expression Recognition | Batuhan Cengiz et.al. | 2403.06661 | translate | read | null |
| 2024-03-11 | Density-Guided Label Smoothing for Temporal Localization of Driving Actions | Tunc Alkanat et.al. | 2403.06616 | translate | read | null |
| 2024-03-11 | Transformer-based Fusion of 2D-pose and Spatio-temporal Embeddings for Distracted Driver Action Recognition | Erkut Akdag et.al. | 2403.06577 | translate | read | null |
| 2024-03-10 | Coherent Temporal Synthesis for Incremental Action Segmentation | Guodong Ding et.al. | 2403.06102 | translate | read | null |
| 2024-03-09 | Dissecting Deep RL with High Update Ratios: Combatting Value Overestimation and Divergence | Marcel Hussing et.al. | 2403.05996 | translate | read | null |
| 2024-03-08 | Benchmarking Micro-action Recognition: Dataset, Methods, and Applications | Dan Guo et.al. | 2403.05234 | translate | read | link |
| 2024-03-06 | Video Relationship Detection Using Mixture of Experts | Ala Shaabana et.al. | 2403.03994 | translate | read | link |
| 2024-03-05 | Behavior Generation with Latent Actions | Seungjae Lee et.al. | 2403.03181 | translate | read | link |
| 2024-03-05 | Learning to Use Tools via Cooperative and Interactive Agents | Zhengliang Shi et.al. | 2403.03031 | translate | read | null |
| 2024-03-04 | Gesture recognition with Brownian reservoir computing using geometrically confined skyrmion dynamics | Grischa Beneke et.al. | 2403.01877 | translate | read | null |
| 2024-03-04 | A Simple Baseline for Efficient Hand Mesh Reconstruction | Zhishan Zhou et.al. | 2403.01813 | translate | read | null |
| 2024-03-03 | A Unified Model Selection Technique for Spectral Clustering Based Motion Segmentation | Yuxiang Huang et.al. | 2403.01606 | translate | read | null |
| 2024-03-03 | Rethinking CLIP-based Video Learners in Cross-Domain Open-Vocabulary Action Recognition | Kun-Yu Lin et.al. | 2403.01560 | translate | read | link |
| 2024-03-02 | Dynamic 3D Point Cloud Sequences as 2D Videos | Yiming Zeng et.al. | 2403.01129 | translate | read | null |