A novel hierarchical pipeline for fine-grained punch recognition in uncontrolled setting

dc.contributor.author Baghel, Vipul
dc.contributor.author Deb, Sagar Deep
dc.contributor.author Nagisetti, Rithihas
dc.contributor.author Srinivasan, Babji
dc.contributor.author Hegde, Ravi S.
dc.coverage.spatial India
dc.date.accessioned 2025-08-01T07:02:19Z
dc.date.available 2025-08-01T07:02:19Z
dc.date.issued 2024-12-19
dc.identifier.citation Baghel, Vipul; Deb, Sagar Deep; Nagisetti, Rithihas; Srinivasan, Babji and Hegde, Ravi S., "A novel hierarchical pipeline for fine-grained punch recognition in uncontrolled setting", in the 9th International Conference on Computer Vision and Image Processing (CVIP 2024), Chennai, IN, Dec. 19-21, 2024.
dc.identifier.uri https://doi.org/10.1007/978-3-031-93688-3_19
dc.identifier.uri https://repository.iitgn.ac.in/handle/123456789/11709
dc.description.abstract Human Action Recognition (HAR) is a rapidly growing topic in Computer Vision, with applications in areas such as athletics, healthcare, security, and sports performance. In sports, HAR can be used to monitor athletes’ movements to support skill development and decision-making training. In highly dynamic sports like boxing, however, automatic HAR is more difficult. Variations in lighting, background, and camera perspective, together with rapid movements, compound the complexity of automatic HAR, and performance degrades further on videos recorded in uncontrolled environments. This manuscript proposes a hierarchical pipeline for robust sports activity recognition. It introduces a novel regression-based detection stage built on a Dual Stream Spatio-Temporal Transformer and uses a 3D Convolutional Neural Network (CNN) for fine-grained classification. The pipeline first applies regression-based detection to identify punch instances within unedited boxing video streams. A fine-grained classification module then assigns each detected punch to one of six basic punch types: jab, cross, lead hook, rear hook, lead uppercut, or rear uppercut. This hierarchical architecture enables efficient and accurate punch recognition in diverse, uncontrolled real-world settings. Experimental results on various unedited YouTube videos show the effectiveness of the proposed approach, which achieves an overall mean detection accuracy of 96.09% and a mean classification accuracy of 88.25% on test videos.
dc.description.statementofresponsibility by Vipul Baghel, Sagar Deep Deb, Rithihas Nagisetti, Babji Srinivasan and Ravi S. Hegde
dc.language.iso en_US
dc.publisher Springer
dc.title A novel hierarchical pipeline for fine-grained punch recognition in uncontrolled setting
dc.type Conference Paper
dc.relation.journal 9th International Conference on Computer Vision and Image Processing (CVIP 2024)

