D^2 Sampling and the k-means Problem

by Amit Kumar (Indian Institute of Tehnology, New Delhi)

Tuesday, March 29, 2016 from 16:00 to 17:00 (Asia/Kolkata)
at A-201 (STCS Seminar Room)

Description

Given a set of points P in a d-dimensional Euclidean space, the k-means clustering problem seeks to find a set C of k centers such that the sum over all points in P of the square of the distance to the closest center in C is minimized. A natural random sampling based heuristic for finding the k centers is as follows -- pick the first center uniformly at random from the set of points, the next one with probability proportional to the square of the distance from the first center and so on. In this talk, I will survey algorithms for the k-means problem based on this random sampling technique. I will show that this idea can be used to obtain fast PTAS for this problem. We extend this result to constrained k-means clustering problems, where there may be additional constraints on  valid clusterings of the input points (this is joint work with Anup Bhattacharya and Ragesh Jaiswal).