Title: Graph-based unsupervised temporal segmentation of diving actions
Authors: Badatya, Bikash Kumar; Baghel, Vipul; Hegde, Ravi
Type: Conference Paper
Date issued: 2025-10-29
Date accessioned/available: 2026-01-07
DOI: 10.1109/STAR66750.2025.11264778
Handle: http://repository.iitgn.ac.in/handle/IITG2025/33771
Language: en-US
Keywords: Sports Analytics; Skeleton-based Action Localization; Graph Convolution; Representation Learning; Interpretability

Abstract:
Fine-grained action localization in untrimmed sports videos is a challenging task, as motion transitions are subtle and occur within short time spans. Traditional supervised and weakly supervised methods require extensive labeled data, which makes them less scalable and less generalizable. To address these challenges, we propose an unsupervised skeleton-based action localization pipeline that detects fine-grained action boundaries using spatio-temporal graph embeddings. Our approach pre-trains an Attention-based Spatio-Temporal Graph Convolutional Network (ASTGCN) on a blockwise-partitioned, pose-sequence-to-pose-sequence denoising task, enabling the model to learn motion dynamics in an unsupervised manner. During inference, we introduce an Action Dynamics Metric (ADM), computed from ASTGCN-derived embeddings, which detects motion transitions from inflection points in the curvature of the ADM sequence. Experiments on the DSV Diving dataset show that our unsupervised method achieves an mAP of 82.67%, which is comparable to state-of-the-art supervised methods. Additionally, our approach generalizes well to in-the-wild diving videos without requiring labeled data, demonstrating its robustness and scalability for real-world applications.
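Illustrative sketch (not the authors' code): the abstract does not define the ADM formula or the exact boundary rule, so the example below only illustrates the general idea of turning per-frame embeddings into a 1D dynamics signal and locating inflection points in it. The hypothetical ADM used here is the cosine distance between consecutive frame embeddings, and inflection points are taken as sign changes of the discrete second difference. Both choices, the function names action_dynamics_metric and inflection_points, and the smoothing parameter are assumptions made for illustration.

```python
import numpy as np


def action_dynamics_metric(embeddings: np.ndarray) -> np.ndarray:
    """HYPOTHETICAL ADM: cosine distance between consecutive frame embeddings.

    embeddings: array of shape (T, D), one embedding per frame (assumed layout).
    Returns an array of shape (T - 1,); larger values mean larger frame-to-frame change.
    """
    a, b = embeddings[:-1], embeddings[1:]
    dot = np.sum(a * b, axis=1)
    norms = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + 1e-12  # avoid divide-by-zero
    return 1.0 - dot / norms


def inflection_points(signal: np.ndarray, smooth: int = 5) -> np.ndarray:
    """Indices where the discrete second difference of the signal changes sign.

    An optional moving-average filter (assumed, window = `smooth`) suppresses jitter first.
    Exact zeros in the second difference are not counted as sign changes.
    """
    signal = np.asarray(signal, dtype=float)
    if smooth > 1:
        kernel = np.ones(smooth) / smooth
        signal = np.convolve(signal, kernel, mode="same")
    d2 = np.diff(signal, n=2)      # d2[i] is centred on signal index i + 1
    s = np.sign(d2)
    flips = np.where(s[:-1] * s[1:] < 0)[0]
    return flips + 1               # last signal index before the curvature flips sign


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    emb = rng.normal(size=(50, 16))          # stand-in for per-frame graph embeddings
    adm = action_dynamics_metric(emb)
    print(inflection_points(adm, smooth=5))  # candidate transition frames
```

In a real pipeline the embeddings would come from the pre-trained ASTGCN, and how candidate points are filtered into final action boundaries is not specified in this record, so any thresholding step would be a further assumption.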