School of Technology and Computer Science Seminars

DNA Sequencing and Information Theory

by Mr. Anantha Karthik (School of Technology and Computer Science, TIFR)

Friday, December 27, 2013 (Asia/Kolkata)
at Colaba Campus (D-405, D-Block Seminar Room)
Description
DNA sequencing is an important method in modern biology. The predominant technique is shotgun sequencing, in which fragments called 'reads', each a short string of base pairs, are extracted from randomly located positions in a DNA sequence. These reads are later 'stitched' together to reconstruct the original sequence. The minimum number of reads required to stitch the sequence reliably is an important quantity.
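
As a rough illustration of the sampling step described above, the following Python sketch draws reads at uniformly random positions from a synthetic sequence and checks whether the reads cover every position. Full coverage is a necessary condition for reconstruction but is not sufficient on its own. The function names, the uniform-start assumption, the fixed read length, and the random i.i.d. sequence are all assumptions made for this example; they are not taken from the talk or the references. The start positions are kept only so that coverage can be checked; a real sequencer does not report them.

```python
import random


def make_sequence(length, rng):
    """Generate a synthetic DNA sequence with i.i.d. uniform bases (an assumption)."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def sample_reads(sequence, num_reads, read_length, rng):
    """Draw reads of a fixed length at uniformly random start positions.

    Returns (start, fragment) pairs. No wrap-around is used, so starts are
    restricted to positions where a full-length read fits.
    """
    last_start = len(sequence) - read_length
    starts = [rng.randint(0, last_start) for _ in range(num_reads)]
    return [(s, sequence[s:s + read_length]) for s in starts]


def is_fully_covered(sequence_length, reads):
    """Return True if every position lies inside at least one read."""
    covered = [False] * sequence_length
    for start, fragment in reads:
        for i in range(start, start + len(fragment)):
            covered[i] = True
    return all(covered)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    seq = make_sequence(1000, rng)
    for num_reads in (20, 100, 500):
        reads = sample_reads(seq, num_reads, 50, rng)
        print(num_reads, is_fully_covered(len(seq), reads))
```

Running the script with increasing numbers of reads gives a feel for how coverage depends on the read count, which is the kind of trade-off the minimum number of reads is meant to capture.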

In this talk, I will show an analogy between shotgun sequencing and Shannon's communication model, and discuss the 'sequencing capacity'. This is the maximum number of DNA base pairs that can be resolved reliably per read, and it is also a fundamental limit on the performance of any stitching algorithm. I will derive the sequencing capacity for a simple model of shotgun sequencing.


References:
1. S. Motahari, G. Bresler, and D. Tse, “Information theory of DNA sequencing,” http://arxiv.org/abs/1203.6233, 2012.
2. S. Motahari, G. Bresler, and D. Tse, “Information Theory for DNA Sequencing: Part 1: A Basic Model,” Proc. IEEE International Symposium on Information Theory, pp. 2741–2745, Cambridge, MA, July 2012.
3. J. Miller, S. Koren, and G. Sutton, “Assembly algorithms for next-generation sequencing data,” Genomics, vol. 95, pp. 315–327, 2010.