The link in the article explains how writing and reading work.
Write: You give Biomemory (the company) the data that you want stored. They pass that information to a lab in Germany, which produces the DNA and puts it into the little circular capsule embedded in the card. You receive two cards.
Read: If you ever want to read the data, the card has to be sent back to the lab. They'll read out the DNA (a destructive process) and give you the DNA sequence, which you can decode yourself using Biomemory's DNA translator.
Totally impractical for the average user. :-D But really fascinating.
Yeah, I assumed it was something like this. Commercial DNA synthesis providers sell double-stranded DNA for less than 0.1 USD per base pair, so you can get 1000+ base pairs for on the order of $100 or less. That is typically a couple of micrograms of DNA. Some of the molecules will have errors, but on average they'll have the correct sequence (think of it as a *huge* RAID with many trillions of parity disks).
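To put rough numbers on this, here is a short Python sketch. It assumes an average mass of about 650 g/mol per double-stranded base pair (a common rule of thumb) and a 2 µg yield, both illustrative. The consensus function is a plain per-position majority vote standing in for the "RAID" idea; it is not a description of what any vendor or sequencing facility actually does.

```python
# Back-of-envelope numbers for a synthetic dsDNA fragment, plus the "RAID" idea:
# recover each position by majority vote across many copies.
from collections import Counter

AVOGADRO = 6.022e23   # molecules per mole
DA_PER_BP = 650.0     # assumed average mass of one dsDNA base pair, g/mol

def synthesis_cost(length_bp: int, usd_per_bp: float = 0.10) -> float:
    """Cost at the quoted upper bound of ~0.1 USD per base pair."""
    return length_bp * usd_per_bp

def copy_number(mass_ug: float, length_bp: int) -> float:
    """Number of molecules in mass_ug micrograms of a dsDNA fragment of length_bp."""
    grams = mass_ug * 1e-6
    moles = grams / (length_bp * DA_PER_BP)
    return moles * AVOGADRO

def consensus(reads: list[str]) -> str:
    """Per-position majority vote over equal-length reads of the same sequence."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))

if __name__ == "__main__":
    print(f"cost: ${synthesis_cost(1000):.0f}")
    print(f"copies in 2 ug: {copy_number(2.0, 1000):.2e}")
    # Two of the three reads carry one error each, at different positions.
    reads = ["ACGTACGT", "ACGTTCGT", "ACCTACGT"]
    print(consensus(reads))
```

Under these assumptions, 2 µg of a 1,000 bp fragment comes out on the order of 10^12 molecules, so a handful of bad copies is easily outvoted.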
Resuspend the DNA in the tube they ship you with water, transfer it to filter paper (http://wang.ucsd.edu/protocol/2. molecular cloning/2.3 Amplification/Shipping_and_Receiving_Plasmids_on_Filter_Paper.pdf), let it dry, and store the filter paper in an airtight bottle with a desiccant. You'll probably end up with something equivalent to what this company is selling.
When you want to read it, take the filter paper out and rehydrate the DNA, then send it to a Sanger sequencing facility, which will give you the readout for around $5.
To make things easier, the ends of the DNA can include recognition sites for commonly available primers. Sequencing facilities generally provide the most common primers for free, so use them where possible to save money.
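A minimal Python sketch of flanking a payload with primer sites. The choice of M13 forward (-20) and M13 reverse is an assumption on my part (they are among the primers facilities commonly stock); confirm the exact sequences against your facility's list. Because the reverse primer binds the top strand, its reverse complement is what appears at the 3' end of the construct.

```python
# Flank a payload with binding sites for common sequencing primers.
# Assumed primers: widely listed M13 forward (-20) and M13 reverse sequences;
# check them against your sequencing facility's free-primer list.
M13_FWD = "GTAAAACGACGGCCAGT"   # same sense as the top strand; anneals to the bottom strand
M13_REV = "CAGGAAACAGCTATGAC"   # anneals to the top strand and reads back toward the payload

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def add_primer_sites(payload: str, fwd: str = M13_FWD, rev: str = M13_REV) -> str:
    """Return the top strand, 5'->3': fwd + payload + reverse_complement(rev)."""
    return fwd + payload + reverse_complement(rev)

if __name__ == "__main__":
    construct = add_primer_sites("ACGTTGCA")
    print(construct)
```

In practice the first few dozen bases after a Sanger primer usually read poorly, so adding a spacer between each primer site and the payload is a reasonable precaution.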
There are current limits on synthesis, and some on sequencing. For instance, DNA sections that are rich in G/C or A/T do not synthesize well, and can cause the sequencing reaction to skip bases or terminate prematurely. The same is true for DNA that forms secondary structure, contains long stretches of the same base, or has repeats. There are ways around this, but they can be more expensive or impose length limitations. All of this limits the achievable density to somewhat less than the theoretical maximum of 2 bits per base (four possible bases per position).
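A Python sketch of two of these constraints. It screens a sequence for skewed GC content and long single-base runs, then shows one way to rule runs out entirely: a rotating base-3 code, where each base is picked from the three bases that differ from the previous one. The thresholds in passes_screen are hypothetical, and the rotating code is an illustrative technique, not something the comment above describes.

```python
# Screen for GC skew and homopolymer runs; encode trits so no base ever repeats.
import math

BASES = "ACGT"

def gc_fraction(seq: str) -> float:
    return sum(b in "GC" for b in seq) / len(seq)

def max_homopolymer(seq: str) -> int:
    """Length of the longest run of a single repeated base (0 for an empty string)."""
    if not seq:
        return 0
    longest = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest

def passes_screen(seq: str, gc_lo: float = 0.4, gc_hi: float = 0.6, max_run: int = 3) -> bool:
    # Hypothetical thresholds; real limits depend on the vendor and the sequencing method.
    return gc_lo <= gc_fraction(seq) <= gc_hi and max_homopolymer(seq) <= max_run

def rotating_encode(trits: list[int], prev: str = "A") -> str:
    """Map each trit (0, 1, 2) to one of the three bases that differ from the previous base."""
    out = []
    for t in trits:
        choices = [b for b in BASES if b != prev]
        prev = choices[t]
        out.append(prev)
    return "".join(out)

def rotating_decode(seq: str, prev: str = "A") -> list[int]:
    """Invert rotating_encode, using the same starting 'previous' base."""
    trits = []
    for b in seq:
        choices = [c for c in BASES if c != prev]
        trits.append(choices.index(b))
        prev = b
    return trits

if __name__ == "__main__":
    print(max_homopolymer("ACGGGGTA"))
    encoded = rotating_encode([0, 0, 0, 1, 2])
    print(encoded, max_homopolymer(encoded))
    print(f"{math.log2(3):.2f} bits per base vs. 2.00 unconstrained")
```

The rotating code removes runs but does nothing for GC balance or repeats, and it drops capacity from 2 to about 1.58 bits per base, which is one concrete way the limits above cost density.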
I would guess that much of the 40-mer for the above "hello" encoding is defined 'stuffer' sequence on the ends, either for primer recognition sites or just to keep any degradation of the ends from affecting the text-coding portion, if these are synthesized as linear molecules rather than cloned as circular ones.
This is not the case. "hello" has five characters, and it looks like they're using 8 bases to encode each one (5 × 8 = 40). Maybe they're covering the entire Unicode specification? Eight bases give 4^8 = 65,536 possible values, which is enough for Unicode's Basic Multilingual Plane but not for all of Unicode. The first 3 of the 8 bases seem to indicate the character class (letter, number, etc.), while the remaining 5 presumably encode the character itself.
Ultimately, whether the data stays readable comes down to knowing which encoding was used.