
A Novel Hierarchical Pipeline for Fine-Grained Punch Recognition in Uncontrolled Setting

Source
Communications in Computer and Information Science
ISSN
1865-0929
Date Issued
2026-01-01
Author(s)
Baghel, Vipul
Deb, Sagar Deep
Nagisetti, Rithihas
Srinivasan, Babji
Hegde, Ravi  
DOI
10.1007/978-3-031-93688-3_19
Volume
2473 CCIS
Abstract
Human Action Recognition (HAR) is a rapidly emerging topic in computer vision, with applications in athletics, healthcare, security, and sports performance analysis. In sports, HAR can be used to monitor athletes' movements to facilitate skill development and decision-making training. In highly dynamic sports such as boxing, however, automatic HAR is more difficult: variations in lighting, background, and camera perspective, together with rapid movements, compound the complexity, and performance degrades further on videos recorded in uncontrolled environments. This manuscript proposes a hierarchical pipeline for robust sports activity recognition. It introduces a novel regression-based detection stage built on a Dual Stream Spatio-Temporal Transformer and uses a 3D Convolutional Neural Network (CNN) for fine-grained classification. The detection stage first identifies punch instances in unedited boxing video streams. A fine-grained classification module then categorizes each detected punch into one of six basic punch types: jab, cross, lead hook, rear hook, lead uppercut, and rear uppercut. This hierarchical architecture enables efficient and accurate punch recognition in diverse real-world, in-the-wild settings. Experiments on unedited YouTube videos demonstrate the effectiveness of the approach, which achieves an overall mean detection accuracy of 96.09% and a mean classification accuracy of 88.25% on test videos.
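The two-stage flow described in the abstract (detect punch instances first, then classify each one) might be sketched roughly as follows. This is only an illustration, not the authors' implementation. The abstract does not say how the regression output is turned into punch instances, so thresholding a per-frame score is an assumption here. The names `find_segments`, `recognize_punches`, `detector`, and `classifier` are hypothetical; the two callables stand in for the Dual Stream Spatio-Temporal Transformer and the 3D CNN.

```python
from typing import Callable, List, Sequence, Tuple

# The six punch types named in the abstract.
PUNCH_TYPES = [
    "jab", "cross", "lead hook", "rear hook", "lead uppercut", "rear uppercut",
]


def find_segments(
    scores: Sequence[float], threshold: float = 0.5, min_len: int = 3
) -> List[Tuple[int, int]]:
    """Group consecutive frames scoring >= threshold into [start, end) segments.

    Assumption: the detector emits one score per frame; runs shorter than
    min_len frames are discarded as noise.
    """
    segments: List[Tuple[int, int]] = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i  # a candidate punch begins here
        elif score < threshold and start is not None:
            if i - start >= min_len:
                segments.append((start, i))
            start = None
    # Close a segment that runs to the end of the video.
    if start is not None and len(scores) - start >= min_len:
        segments.append((start, len(scores)))
    return segments


def recognize_punches(
    frames: Sequence,
    detector: Callable[[Sequence], Sequence[float]],
    classifier: Callable[[Sequence], str],
    threshold: float = 0.5,
    min_len: int = 3,
) -> List[Tuple[int, int, str]]:
    """Stage 1: detect punch segments. Stage 2: classify each detected clip."""
    scores = detector(frames)
    results = []
    for start, end in find_segments(scores, threshold, min_len):
        label = classifier(frames[start:end])
        results.append((start, end, label))
    return results


# Toy usage with stub models (hypothetical stand-ins, not trained networks).
toy_scores = [0.1, 0.2, 0.9, 0.8, 0.7, 0.1, 0.6, 0.1, 0.9, 0.9, 0.9, 0.9]
toy_frames = list(range(len(toy_scores)))
toy_result = recognize_punches(
    toy_frames,
    detector=lambda frames: toy_scores,
    classifier=lambda clip: PUNCH_TYPES[len(clip) % len(PUNCH_TYPES)],
)
```

Running the cheaper detection stage over the whole stream and reserving the classifier for the detected clips is what makes the pipeline hierarchical: the classifier never sees frames that contain no punch.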
URI
https://d8.irins.org/handle/IITG2025/27978
Subjects
Classification | Regression | Spatio-temporal Transformer | Sports Activity Recognition
IITGN Knowledge Repository Developed and Managed by Library
