Filling streamflow data gaps in Indian catchments using machine learning and K‐Means clustering

Vimal Mishra

doi:10.1029/2025WR042210

Source

Water Resources Research

ISSN

0043-1397

Date Issued

2026-03-01

Author(s)

Hiren Solanki

Vimal Mishra

DOI

10.1029/2025WR042210

Volume

62

Issue

3

Abstract

Reliable and continuous water level and streamflow records are essential for accurate hydrological modeling and informed water resource management, yet observations often suffer from substantial data gaps. Existing gap-filling methods rely on observations from upstream or nearby stations together with local climate records. However, their applicability to long-term, large-scale hydrological networks that span diverse climate zones and contain extended data gaps remains limited. Here, we develop a robust framework that integrates geomorphological, meteorological, and hydrological parameters with Quantile Regression Forests to fill gaps in daily streamflow observations at 343 stations across Peninsular India from 1961 to 2021. To reconstruct streamflow at ungauged locations, we employed transfer learning through k-means clustering to group hydromorphologically similar catchments. Our gap-filling method performs well, with a Nash-Sutcliffe Efficiency (NSE) of more than 0.8 at 72% of stations for water level and 90% of stations for streamflow. We found that 50- and 100-year return period events are highly sensitive to data gaps, which often lead to significant under- or overestimation of more than 40%. Overall, the gap-filled streamflow and water-level data sets for the 1961–2021 period provide a robust foundation for hydrological modeling, climate change impact assessments, and water resource planning across Peninsular India.
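For readers unfamiliar with the skill metric quoted in the abstract, the following is a minimal Python sketch of the standard Nash-Sutcliffe Efficiency, NSE = 1 - Σ(obs - sim)² / Σ(obs - mean(obs))². It is not the authors' code. The function name `nash_sutcliffe_efficiency`, the choice to skip days where either series is missing (NaN), and the sample values are all illustrative assumptions.

```python
import numpy as np


def nash_sutcliffe_efficiency(observed, simulated):
    """Return the NSE of `simulated` against `observed`.

    Assumption (not from the paper): days where either series is NaN
    are excluded, so only jointly available values are scored.
    """
    obs = np.asarray(observed, dtype=float)
    sim = np.asarray(simulated, dtype=float)
    if obs.shape != sim.shape:
        raise ValueError("observed and simulated must have the same shape")

    # Keep only time steps where both values are present.
    valid = ~np.isnan(obs) & ~np.isnan(sim)
    obs, sim = obs[valid], sim[valid]
    if obs.size == 0:
        raise ValueError("no overlapping non-missing values")

    # Variance-like term of the observations around their mean.
    denominator = np.sum((obs - obs.mean()) ** 2)
    if denominator == 0.0:
        raise ValueError("observed series is constant; NSE is undefined")

    # Sum of squared errors of the simulation.
    numerator = np.sum((obs - sim) ** 2)
    return 1.0 - numerator / denominator


# Hypothetical daily flows (m^3/s); NaN marks a gap in the observations.
observed_flow = [10.0, 12.0, np.nan, 20.0, 18.0]
simulated_flow = [11.0, 12.0, 15.0, 19.0, 18.0]
nse = nash_sutcliffe_efficiency(observed_flow, simulated_flow)
```

An NSE of 1 indicates a perfect match, while 0 means the reconstruction is no better than using the observed mean. In a gap-filling context, a score like this would typically be computed on observations that were deliberately withheld from the model.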

URI

https://repository.iitgn.ac.in/handle/IITG2025/34832