Genomes

Problem put to rest

Print edition: July 24, 2015

Comparing the genomes of different species, or different members of the same species, is the basis of a great deal of modern biology. DNA sequences that are conserved across species are likely to be functionally important, while variations between members of the same species can indicate different susceptibilities to disease.

The basic algorithm for measuring how much two such sequences of symbols differ, a quantity called the “edit distance” between them, is now more than 40 years old. The edit distance is the minimum number of edits (deletions, insertions and substitutions) required to turn one string into the other; turning “kitten” into “sitting”, for instance, takes two substitutions and one insertion, so the edit distance between those two words is three. And for all of those 40-odd years, computer science researchers have been trying to improve upon the algorithm, without much success.

At the ACM Symposium on Theory of Computing (STOC), researchers from the Massachusetts Institute of Technology (MIT) reported that, in all likelihood, the algorithm is as good as it gets. The proof that this 40-year-old algorithm is essentially the best possible will come as a relief to computer scientists.

This implies that if a widely held assumption about computational complexity, known as the strong exponential time hypothesis, is correct, then the problem of measuring the difference between two genomes (or texts, or speech samples, or anything else that can be represented as a string of symbols) cannot be solved significantly more efficiently than the existing algorithm solves it.

In a sense, that is disappointing, since a computer running the existing algorithm would take 1,000 years to exhaustively compare two human genomes. But it also means that computer scientists can stop agonising about whether they can do better.

“This edit distance is something that I’ve been trying to get better algorithms for since I was a graduate student, in the mid-’90s,” said Piotr Indyk, a professor of computer science and engineering at MIT and a co-author of the STOC paper. “I certainly spent lots of late nights on that—without any progress whatsoever. So at least now there’s a feeling of closure.”

The standard algorithm for determining edit distance is known as the Wagner-Fischer algorithm. Computer scientists measure algorithmic efficiency as computation time relative to the number of elements the algorithm manipulates. For the Wagner-Fischer algorithm, the running time is proportional to the product of the lengths of the two strings it is comparing. Double the lengths of the strings, and the running time quadruples. In computer parlance, the algorithm runs in quadratic time.
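
To make that quadratic behaviour concrete, here is a minimal sketch of the Wagner-Fischer dynamic programme in Python (the function name and the test strings are ours, chosen for illustration). It fills a table of edit distances between prefixes of the two strings, one row at a time:

    def edit_distance(a: str, b: str) -> int:
        """Wagner-Fischer dynamic programme; time proportional to len(a) * len(b)."""
        m, n = len(a), len(b)
        # prev[j] is the edit distance between the current prefix of a
        # and the first j characters of b. Start from the empty prefix of a,
        # which needs j insertions to become b[:j].
        prev = list(range(n + 1))
        for i in range(1, m + 1):
            # Turning a[:i] into the empty string takes i deletions.
            curr = [i] + [0] * n
            for j in range(1, n + 1):
                cost = 0 if a[i - 1] == b[j - 1] else 1
                curr[j] = min(prev[j] + 1,          # delete a[i-1]
                              curr[j - 1] + 1,      # insert b[j-1]
                              prev[j - 1] + cost)   # substitute, or keep a match
            prev = curr
        return prev[n]

    print(edit_distance("kitten", "sitting"))  # prints 3

Because every one of the (len(a) + 1) × (len(b) + 1) table cells is computed exactly once, doubling both input lengths quadruples the work, which is precisely the quadratic scaling described above.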

Compiled by R. Ramachandran

