Data storage in DNA

Print edition : February 22, 2013

DNA double helix. Photo: Science Photo Library

RESEARCHERS at the European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), United Kingdom, have created a way to store data in the form of deoxyribonucleic acid (DNA), a material that lasts for tens of thousands of years. The new method, published in the journal Nature, makes it possible to store at least 100 million hours of high-definition video in about a cup of DNA.

There is a lot of digital information in the world, and the constant influx of new digital content poses a real challenge for archivists. Hard disks are expensive and require a constant supply of electricity, while even the best “no-power” archiving materials such as magnetic tape degrade within a decade. “We already know that DNA is a robust way to store information because we can extract it from woolly mammoth bones, which date back tens of thousands of years, and make sense of it,” says Nick Goldman of the EMBL-EBI. “It’s also incredibly small, dense and does not need any power for storage, so shipping and keeping it is easy.”

Reading DNA is fairly straightforward, but writing it has until now been a major hurdle to making DNA storage a reality. There are two challenges. First, with current methods it is possible to manufacture DNA only in short strings. Second, both writing and reading DNA are prone to errors, particularly when the same DNA letter is repeated.

“We knew we needed to make a code using only short strings of DNA, and to do it in such a way that creating a run of the same letter would be impossible. So we figured, let’s break up the code into lots of overlapping fragments going in both directions, with indexing information showing where each fragment belongs in the overall code, and make a coding scheme that doesn’t allow repeats. That way, you would have to have the same error on four different fragments for it to fail—and that would be very rare.”
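The idea in the quote can be illustrated with a small sketch. It is not the researchers' actual encoder: the six-base-3-digits-per-byte conversion, the starting letter, the function names, and the fragment length and step are all assumptions chosen for illustration, and the "both directions" detail is left out. Each base-3 digit picks one of the three DNA letters that differ from the previous letter, so a run of the same letter cannot occur, and the resulting strand is cut into overlapping fragments that each carry their start position.

```python
NUCLEOTIDES = "ACGT"


def bytes_to_trits(data):
    """Write each byte as six base-3 digits (3**6 = 729 covers 0..255)."""
    trits = []
    for byte in data:
        digits = []
        for _ in range(6):
            digits.append(byte % 3)
            byte //= 3
        trits.extend(reversed(digits))  # most significant digit first
    return trits


def trits_to_dna(trits, start="A"):
    """Each digit selects one of the three letters that differ from the
    previous letter, so no letter is ever repeated back to back.
    The starting letter "A" is an arbitrary assumption."""
    previous = start
    letters = []
    for trit in trits:
        choices = [n for n in NUCLEOTIDES if n != previous]
        previous = choices[trit]
        letters.append(previous)
    return "".join(letters)


def dna_to_trits(dna, start="A"):
    """Invert trits_to_dna by replaying the same choice lists."""
    previous = start
    trits = []
    for letter in dna:
        choices = [n for n in NUCLEOTIDES if n != previous]
        trits.append(choices.index(letter))
        previous = letter
    return trits


def trits_to_bytes(trits):
    """Invert bytes_to_trits: every six digits become one byte."""
    out = bytearray()
    for i in range(0, len(trits), 6):
        value = 0
        for trit in trits[i:i + 6]:
            value = value * 3 + trit
        out.append(value)
    return bytes(out)


def split_into_fragments(dna, length=100, step=25):
    """Cut the strand into overlapping windows tagged with their start
    position. With length=100 and step=25 (illustrative values), letters
    away from the two ends of the strand appear in four fragments."""
    last_start = max(len(dna) - length, 0)
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)  # make sure the tail is covered
    return [(s, dna[s:s + length]) for s in starts]


message = b"I Have a Dream"
strand = trits_to_dna(bytes_to_trits(message))
pieces = split_into_fragments(strand, length=40, step=10)
restored = trits_to_bytes(dna_to_trits(strand))
```

Because most letters sit in several fragments, a reading error in one fragment can be outvoted by the other copies of the same position, which is the redundancy the quote relies on.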

The new method requires synthesising DNA from the encoded information: enter Agilent Technologies, Inc., a California-based company that volunteered its services. Goldman and co-workers sent the company encoded versions of an .mp3 of Martin Luther King’s “I Have a Dream” speech; a .jpg photo of EMBL-EBI; a .pdf of Watson and Crick’s seminal paper, “Molecular structure of nucleic acids”; a .txt file of all of Shakespeare’s sonnets; and a file that describes the encoding. “We downloaded the files from the Web and used them to synthesise hundreds of thousands of pieces of DNA. The result looks like a tiny piece of dust,” explains Emily Leproust of Agilent. Agilent mailed the sample to EMBL-EBI, where the researchers were able to sequence the DNA and decode the files without errors. Although many practical problems remain to be solved, the inherent density and longevity of DNA make it an attractive storage medium.
