School of Technology and Computer Science Seminars

Streaming Algorithms for Embedding and Computing Edit Distance in the Low Distance Regime

by Michal Koucky (Charles University, Prague, Czech Republic)

Wednesday, April 6, 2016 from to (Asia/Kolkata)
at A-201 (STCS Seminar Room)
Description
The Hamming and the edit metrics are two common notions of measuring 
distances between pairs of strings x,y lying in the Boolean hypercube. The 
edit distance between x and y is defined as the minimum number of 
character insertion, deletion, and bit flips needed for converting x into 
y. Whereas, the Hamming distance between x and y is the number of bit 
flips needed for converting x to y. 

In this talk I will present a randomized injective embedding of the edit 
distance into the Hamming distance with a small (qudratic) distortion. 
Namely, for any satisfying that their edit distance equals , the 
Hamming distance between the embedding of and is with 
high probability. This improves over the distortion ratio of $O(\log n 
\log^* n)$ obtained by Jowhari (2012) for small values of . Moreover, 
the embedding output size is linear in the input size and the embedding 
can be computed using a single pass over the input. 

I will mention several applications for this embedding. Among our results 
we provide a one-pass (streaming) algorithm for edit distance running in 
space and computing edit distance exactly up-to distance . 
This algorithm is based on kernelization for edit distance that is of 
independent (joint work with Diptarka Chakraborty and Elazar Goldenberg).