GMOT-Mamba: Mamba-based model prediction for generic multiple object tracking
Source
IEEE International Conference on Image Processing (ICIP 2025)
Date Issued
2025-09-14
Author(s)
Abstract
We introduce GMOT-Mamba, a novel Mamba-based model prediction framework for Generic Multiple Object Tracking (GMOT) in video sequences. Our approach features a Weighted Feature Pooling (WFP) layer, which processes encoded target states, and an encoder-decoder architecture that leverages Vision Mamba (ViM) to predict filter weights. We train our model on combinations of large-scale datasets to capture the strong priors and discriminative features necessary for generic object tracking. Through extensive experiments and ablation studies, we demonstrate the effectiveness of our approach, showing competitive performance against state-of-the-art GMOT methods while outperforming single object tracking (SOT) methods in both accuracy and inference speed. Our findings underscore the potential of Mamba for enhancing model prediction in visual tracking applications.
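The abstract describes a pipeline in which encoded target states are pooled by the WFP layer and then fed to an encoder-decoder that predicts filter weights. The paper itself does not specify the internals, so the following is only a minimal NumPy sketch of that data flow under assumed shapes: the softmax-style pooling, the `predict_filter` linear map (standing in for the ViM encoder-decoder), and all tensor sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def weighted_feature_pooling(states, scores):
    """Hypothetical WFP layer: collapse N encoded target states (N, C)
    into one pooled descriptor (C,) using normalized weights."""
    w = np.exp(scores - scores.max())  # softmax for numerical stability
    w /= w.sum()
    return (w[:, None] * states).sum(axis=0)

def predict_filter(pooled, proj):
    """Stand-in for the ViM encoder-decoder: map the pooled descriptor
    to the weights of a target-specific 3x3 correlation filter."""
    return (proj @ pooled).reshape(3, 3, -1)

states = rng.standard_normal((5, 16))  # 5 encoded target states, C = 16
scores = rng.standard_normal(5)        # pooling scores (e.g. from attention)
pooled = weighted_feature_pooling(states, scores)
proj = rng.standard_normal((3 * 3 * 16, 16))
filt = predict_filter(pooled, proj)
print(filt.shape)  # (3, 3, 16)
```

In model-prediction trackers of this family, such predicted filter weights would then be correlated with search-region features to localize targets; how GMOT-Mamba applies them is detailed in the paper.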
Subjects
Generic object tracking
Vision Mamba
State space models
Multiple object tracking
