Repository logo
  • English
  • العربية
  • বাংলা
  • Català
  • Čeština
  • Deutsch
  • Ελληνικά
  • Español
  • Suomi
  • Français
  • Gàidhlig
  • हिंदी
  • Magyar
  • Italiano
  • Қазақ
  • Latviešu
  • Nederlands
  • Polski
  • Português
  • Português do Brasil
  • Srpski (lat)
  • Српски
  • Svenska
  • Türkçe
  • Yкраї́нська
  • Tiếng Việt
Log In
New user? Click here to register.Have you forgotten your password?
  1. Home
  2. IIT Gandhinagar
  3. Computer Science and Engineering
  4. CSE Publications
  5. Online Coresets for Parametric and Non-Parametric Bregman Clustering
 
  • Details

Online Coresets for Parametric and Non-Parametric Bregman Clustering

Source
Transactions on Machine Learning Research
Date Issued
2022-01-01
Author(s)
Chhaya, Rachit
Choudhari, Jayesh
Dasgupta, Anirban  
Shit, Supratim
Volume
2022-July
Abstract
We present algorithms that create coresets in an online setting for clustering problems based on a wide subset of Bregman divergences. Notably, our coresets have a small additive error, similar in magnitude to the gap between expected and empirical loss Bachem et al. (2017a), and take update time O(d) for every incoming point where d is the dimension of the point. Our first algorithm gives online coresets of size Õ(poly(k, d, ϵ, µ)) for k-clusterings according to any µ-similar Bregman divergence. We further extend this algorithm to show the existence of non-parametric coresets, where the coreset size is independent of k, the number of clusters, for the same subclass of Bregman divergences. Our non-parametric coresets also function as coresets for non-parametric versions of the Bregman clustering like DP-Means. While these coresets provide additive error guarantees, they are significantly smaller for high dimensional data than the (relative-error) coresets obtained in Bachem et al. (2015) for DP-Means— for the input of size n our coresets grow as O(log n) while being independent of d as opposed to O(d<sup>d</sup>) for points in R<sup>d</sup> Bachem et al. (2015). We also present experiments to compare the performance of our algorithms with other sampling techniques.
URI
http://repository.iitgn.ac.in/handle/IITG2025/28522
IITGN Knowledge Repository Developed and Managed by Library

Built with DSpace-CRIS software - Extension maintained and optimized by 4Science

  • Privacy policy
  • End User Agreement
  • Send Feedback
Repository logo COAR Notify