GMOT-Mamba: Mamba-based model prediction for generic multiple object tracking
Source
IEEE International Conference on Image Processing (ICIP 2025)
Date Issued
2025-09-14
Author(s)
Abstract
We introduce GMOT-Mamba, a novel Mamba-based model prediction framework for Generic Multiple Object Tracking (GMOT) in video sequences. Our approach features a Weighted Feature Pooling (WFP) layer, which processes encoded target states, and an encoder-decoder architecture that leverages Vision Mamba (ViM) to predict filter weights. We train our model on combinations of large-scale datasets to capture the strong priors and discriminative features necessary for generic object tracking. Through extensive experiments and ablation studies, we demonstrate the effectiveness of our approach, showing competitive performance against state-of-the-art GMOT methods while outperforming single object tracking (SOT) methods in both accuracy and inference speed. Our findings underscore the potential of Mamba for enhancing model prediction in visual tracking applications.
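The abstract describes a pipeline in which encoded target states are pooled by the WFP layer and then fed to an encoder-decoder that predicts filter weights. The paper itself does not specify the internals, so the following is only a minimal NumPy sketch of that data flow under assumed shapes: the softmax-style pooling, the `predict_filter` linear map (standing in for the ViM encoder-decoder), and all tensor sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def weighted_feature_pooling(states, scores):
    """Hypothetical WFP layer: collapse N encoded target states (N, C)
    into one pooled descriptor (C,) using normalized weights."""
    w = np.exp(scores - scores.max())  # softmax for numerical stability
    w /= w.sum()
    return (w[:, None] * states).sum(axis=0)

def predict_filter(pooled, proj):
    """Stand-in for the ViM encoder-decoder: map the pooled descriptor
    to the weights of a target-specific 3x3 correlation filter."""
    return (proj @ pooled).reshape(3, 3, -1)

states = rng.standard_normal((5, 16))  # 5 encoded target states, C = 16
scores = rng.standard_normal(5)        # pooling scores (e.g. from attention)
pooled = weighted_feature_pooling(states, scores)
proj = rng.standard_normal((3 * 3 * 16, 16))
filt = predict_filter(pooled, proj)
print(filt.shape)  # (3, 3, 16)
```

In model-prediction trackers of this family, such predicted filter weights would then be correlated with search-region features to localize targets; how GMOT-Mamba applies them is detailed in the paper.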
Subjects
Generic object tracking
Vision Mamba
State space models
Multiple object tracking
