Thursday, March 09, 2017

DNA data storage keeps getting better

I forgot to mention this a month or so ago: one good thing (the only good thing?) to come out of having Trump as President is that the AAAS ran a big subscription campaign to get more people supporting science, which means I now get full access to Science magazine for a year for the princely sum of $50 (US)!

This week's edition had a story about new work testing just how good DNA data storage could be, and the answer is very, very good. From the free article about it:
Now, researchers report that they’ve come up with a new way to encode digital data in DNA to create the highest-density large-scale data storage scheme ever invented. Capable of storing 215 petabytes (215 million gigabytes) in a single gram of DNA, the system could, in principle, store every bit of datum ever recorded by humans in a container about the size and weight of a couple of pickup trucks....

Erlich thought he could get closer to that limit. So he and Dina Zielinski, an associate scientist at the New York Genome Center, looked at the algorithms that were being used to encode and decode the data. They started with six files, including a full computer operating system, a computer virus, an 1895 French film called Arrival of a Train at La Ciotat, and a 1948 study by information theorist Claude Shannon. They first converted the files into binary strings of 1s and 0s, compressed them into one master file, and then split the data into short strings of binary code. They devised an algorithm called a DNA fountain, which randomly packaged the strings into so-called droplets, to which they added extra tags to help reassemble them in the proper order later. In all, the researchers generated a digital list of 72,000 DNA strands, each 200 bases long.

They sent these as text files to Twist Bioscience, a San Francisco, California–based startup, which then synthesized the DNA strands. Two weeks later, Erlich and Zielinski received in the mail a vial with a speck of DNA encoding their files. To decode them, the pair used modern DNA sequencing technology. The sequences were fed into a computer, which translated the genetic code back into binary and used the tags to reassemble the six original files. The approach worked so well that the new files contained no errors, they report today in Science. They were also able to make a virtually unlimited number of error-free copies of their files through polymerase chain reaction, a standard DNA copying technique. What’s more, Erlich says, they were able to encode 1.6 bits of data per nucleotide, 60% better than any group had done before and 85% the theoretical limit.
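
For the nerds: if I understand the scheme correctly, the "theoretical limit" mentioned above is a bit under the naive 2 bits per nucleotide you'd get from DNA's four-letter alphabet, because strands with long runs of a single base or heavily skewed GC content are hard to synthesize and sequence reliably. Here's a toy end-to-end sketch in Python of the two steps the article describes (my own simplification, not the authors' actual DNA Fountain code, which also screens strands and adds error correction): package random subsets of the data into XOR "droplets" tagged with the seed that picked the subset, then write each droplet out as DNA at 2 bits per base.

import random
import zlib

BITS_TO_BASE = {0b00: "A", 0b01: "C", 0b10: "G", 0b11: "T"}

def to_dna(payload: bytes) -> str:
    """Map bytes to bases at 2 bits per nucleotide (the naive density ceiling)."""
    return "".join(BITS_TO_BASE[(byte >> shift) & 0b11]
                   for byte in payload for shift in (6, 4, 2, 0))

def make_droplet(chunks: list, seed: int) -> str:
    """XOR a pseudo-random subset of chunks; prepend the seed as the 'tag'."""
    rng = random.Random(seed)
    degree = rng.randint(1, len(chunks))       # how many chunks to combine
    payload = bytes(len(chunks[0]))            # start from all-zero bytes
    for i in rng.sample(range(len(chunks)), degree):
        payload = bytes(a ^ b for a, b in zip(payload, chunks[i]))
    tag = seed.to_bytes(4, "big")              # decoder re-runs the RNG from this
    return to_dna(tag + payload)

data = zlib.compress(b"six files squashed into one master file")
chunks = [data[i:i + 8].ljust(8, b"\0") for i in range(0, len(data), 8)]
strands = [make_droplet(chunks, seed) for seed in range(2 * len(chunks))]
print(len(strands[0]))  # 48 bases: 16 for the 4-byte tag, 32 for the 8-byte payload

Decoding reverses this: regenerate each droplet's chunk subset from its tag, then "peel" — a droplet covering a single unknown chunk reveals it, which simplifies other droplets, and so on until everything is solved.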

So, at 215 million gigabytes of storage per gram, I was curious how much information could potentially be stored in a human body's worth of DNA.

Googling the question "how much does all the DNA in a human weigh?" turned up estimates that seem to vary from around 6 g to 300 g, and then there is also the DNA in all the microbes we host in our guts.
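
Back-of-the-envelope, in Python, taking the 215 petabytes per gram figure at face value and using the rough 6 g and 300 g bounds from that googling:

PB_PER_GRAM = 215  # reported density: 215 petabytes per gram of DNA

for grams in (6, 300):
    petabytes = grams * PB_PER_GRAM
    print(f"{grams:>3} g of DNA -> {petabytes:>6,} PB (~{petabytes / 1000:.1f} exabytes)")

# Output:
#   6 g of DNA ->  1,290 PB (~1.3 exabytes)
# 300 g of DNA -> 64,500 PB (~64.5 exabytes)

So even the low-end estimate works out to more than an exabyte.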

In any case, it seems that, in theory, a human could be an enormous data storage device...
