DNA as a digital storage medium; sequences algorithmically avoided for safety reasons (or should be)?

Safety for the environment is probably implicit here as well, but the focus should be on the people who may come in contact with large amounts of synthetic DNA used to encode information as a data storage medium.

The idea of using DNA as a digital storage medium is not new. Two papers are listed below but there are many others.

Recently the concept has been further highlighted by the news of a "movie" being encoded using CRISPR-Cas. A Harvard University video explains the process:

In this video, Wyss Institute and Harvard Medical School researchers George Church and Seth Shipman explain how they engineered a new CRISPR system-based technology that enables the chronological recording of digital information, like that representing still and moving images, in living bacteria. Credit: Wyss Institute at Harvard University

For more information, please visit:…

below: a GIF showing the original data, and the "movie" interpreted from the recording in DNA. From the Los Angeles Times:

DNA DVR? Scientists have uploaded a short movie of a galloping horse into the DNA of living bacteria and were able to retrieve it with a 90% success rate. (Seth Shipman)

The immediate application in the Harvard Wyss Institute work seems to be (based on my understanding of the video) in-situ data logging, where information relevant to an experiment is recorded in the DNA of cells involved in the experiment. So the use of a moving image is a way to bring home the key point of "chronological recording of digital information".

Other uses involve the DNA itself, stored outside of living cells, as simply a mass storage medium.

QUESTION: I'm wondering if there are sequences that are likely to become algorithmically avoided as standard practice for safety reasons (or should be) when using DNA as a digital storage medium. The DNA may be handled by IT personnel rather than trained biologists, or shipped, or otherwise not be handled with the same attention to safety that researchers give synthetic DNA today.

Some other discussions of DNA as a storage medium:

Next-Generation Digital Information Storage in DNA, Church, Gao, Kosari, 2012. doi:10.1126/science.1226355

Towards practical, high-capacity, low-maintenance information storage in synthesized DNA, Goldman et al. 2013. doi:10.1038/nature11875

For further background on the temporally encoded data work, see the excellent answers to the question What does it mean to “write an image and GIF into the DNA of bacteria”?

edit: I'll quote a comment here just to make sure that the word safety isn't misinterpreted as data integrity or the safety of the bacteria:

When speaking of safety issues related to the handling of biological or biomolecular samples, it's the safety of people that we worry about. Safety rules, safety procedures, safety courses, safety glasses, these are for the safety of the human, not the safety of the sample.

Yes, there could be sequences that a DNA-based system could not store or handle very well: repetitive DNA sequences.

Not only do such sequences cause a lot of problems during sequencing ('reading of data'), but they are also more prone to form secondary structure elements like loops, since repetitive DNA can bind to itself in various ways. This means that, like naturally occurring repeats, they are more likely to mutate during DNA amplification (the number of 'copies' can quite easily change), which makes the 'data' less stable.

A possible way to solve this problem would be to use specialised data formats that avoid repetition in the data as much as possible (similar to how zip or jpg files work).
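As a rough illustration of such a format, here is a minimal sketch of a rotating code: each base-3 digit picks one of the three nucleotides that differ from the one written just before, so the same base never appears twice in a row. The fixed six digits per byte, the starting base and all function names are assumptions made for this example, not details taken from the papers cited in the question.

```python
# Rotating code sketch: each base-3 digit (trit) selects one of the three
# nucleotides that differ from the previously written one.
# Assumptions: 6 trits per byte (3**6 = 729 >= 256) and a virtual start base "A".
BASES = "ACGT"
TRITS_PER_BYTE = 6


def bytes_to_trits(data: bytes) -> list[int]:
    """Write each byte as six base-3 digits, most significant first."""
    trits = []
    for byte in data:
        digits = []
        for _ in range(TRITS_PER_BYTE):
            byte, remainder = divmod(byte, 3)
            digits.append(remainder)
        trits.extend(reversed(digits))
    return trits


def encode(data: bytes, start: str = "A") -> str:
    """Encode bytes as DNA with no two identical adjacent bases."""
    prev = start
    out = []
    for trit in bytes_to_trits(data):
        choices = [b for b in BASES if b != prev]  # always three options
        prev = choices[trit]
        out.append(prev)
    return "".join(out)


def decode(seq: str, start: str = "A") -> bytes:
    """Invert encode(); assumes seq was produced by encode() without errors."""
    prev = start
    trits = []
    for base in seq:
        choices = [b for b in BASES if b != prev]
        trits.append(choices.index(base))
        prev = base
    out = bytearray()
    for i in range(0, len(trits), TRITS_PER_BYTE):
        value = 0
        for trit in trits[i:i + TRITS_PER_BYTE]:
            value = value * 3 + trit
        out.append(value)
    return bytes(out)


def longest_run(seq: str) -> int:
    """Length of the longest stretch of one repeated base."""
    best = run = 0
    prev = None
    for base in seq:
        run = run + 1 if base == prev else 1
        best = max(best, run)
        prev = base
    return best
```

Note that this only rules out single-base runs. Highly repetitive input (for example a long run of zero bytes) still encodes to a short repeating pattern, so compressing or scrambling the data first would probably still be needed.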

Otherwise I agree with Nathan, that DNA by itself is not dangerous (since it won't do anything outside of cells, and the bacteria used in molecular biology are generally harmless unless you start eating them in large quantities or otherwise getting them in your body). The possibility that the DNA could somehow code for toxic proteins is very very low, and you could just check that the 'data' in the DNA does not form a protein by chance anyway.
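A minimal sketch of what such a check could look like: scan all six reading frames for the longest stretch that starts with ATG and runs to a stop codon. The function names are made up for this example, the input is assumed to be upper-case A/C/G/T only, and any cut-off for "too long" would be a policy choice that is not specified here.

```python
# Six-frame open reading frame (ORF) scan using the standard stop codons.
STOPS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def longest_orf_codons(seq: str) -> int:
    """Longest ATG-to-stop ORF over six frames, in codons.

    The count includes the start codon and excludes the stop codon.
    ORFs that run off the end without a stop codon are ignored.
    """
    best = 0
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            start = None
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    best = max(best, (i - start) // 3)
                    start = None
    return best
```

An encoder could call `longest_orf_codons` on each candidate oligo and re-encode (for instance with a different scrambling seed) whenever the result exceeds whatever limit is chosen.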

Even if DNA drives (to give them a name) are handled by IT professionals instead of trained biologists, we can still expect the environment to be clean enough that no organism is likely to be affected. Also, as @nathan points out in the other answer, free DNA cannot do anything on its own. It's just like an open book: you can't get anything from it unless you have a reader (protein) for it. There have been no reports claiming that foreign DNA (called cell-free DNA or cf-DNA for humans) has ever caused any health issues. In fact, there are no reports of cf-DNA even being transcribed by humans. You might also want to have a look at this to find out more.

However, as pointed out by @AlanBoyd in comments, organisms residing inside humans can take up free-floating DNA from their surroundings. For example, Helicobacter pylori has been shown by Stingl et al., 2009 to take up free DNA from its surroundings. Keeping that in mind, there is a chance that DNA drives could be harmful (if, by chance, the drive contains genes that provide antibiotic resistance to the pathogen).

Some microbes show a phenomenon known as natural competence, i.e. they simply take up plasmid DNA from their environment, break it into parts with restriction enzymes and add it to their genome. These organisms, which generally possess sex pili for DNA uptake, can be a major threat to the use of DNA drives if users are careless while handling and manipulating the DNA, since some of this DNA might provide the organisms with antibiotic resistance or the capability to infect humans. However, we can prevent this by making sure there are no restriction enzyme recognition sites in the DNA of the drive, so that the organism cannot cut the DNA for ligation into its own genome.
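Screening for recognition sites is easy to automate. The sketch below checks a sequence against three well-known sites (EcoRI, BamHI, HindIII); the short site table and the function names are assumptions for illustration, and a real screen would need a much longer enzyme list. The code only finds the sites; whether removing them would actually stop uptake is the claim made above, not something the code shows.

```python
# Scan an upper-case DNA string for restriction enzyme recognition sites.
# The table is a tiny illustrative subset, not a complete enzyme list.
SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(seq: str, sites: dict[str, str] = SITES) -> list[tuple[str, int]]:
    """Return (enzyme name, 0-based position) for every site found.

    Both the site and its reverse complement are searched, so that
    non-palindromic sites added to the table are caught on either strand.
    """
    hits = []
    for name, site in sites.items():
        for pattern in {site, reverse_complement(site)}:
            pos = seq.find(pattern)
            while pos != -1:
                hits.append((name, pos))
                pos = seq.find(pattern, pos + 1)  # allow overlapping hits
    return sorted(hits, key=lambda hit: hit[1])
```

An encoder could reject or re-encode any chunk for which `find_sites` returns a non-empty list.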

I would say no for two reasons.

The first is I don't know of any DNA sequences that are in themselves infectious. You'd need the proper cellular machinery to make anything out of them. If I'm not mistaken even most prokaryotes have varied defenses against random pieces of DNA getting incorporated into their systems. Otherwise they'd never have stable cores. Viruses and bacteria etc have to use strategies to get around those defense systems.

The second reason, even if I'm wrong on the first, is that this system must be designed to be as free as possible from outside influence. A system designed to minimize external influence also inherently minimizes influence to the outside, at least in this case.

This wouldn't be handled by IT people unless it was somehow massively automated. Again no clue on the final form, so hard to say.

I'm wondering if there are sequences that are likely to become algorithmically avoided as standard practice for safety reasons (or should be) when using DNA as a digital storage medium.

I don't think the sequences themselves are the concern here, so much as the procedure used to integrate them into bacteria for data storage, which I don't think is of direct risk to humans.

In the Shipman-Church paper, video frame data are encoded into chunks that are 27 bases long, interrupted by short PAM spacers and barcodes to aid in resequencing (for time-directed data readout).

The larger 35-base oligonucleotides are administered into the host bacteria via electroporation - effectively shocking the cells, which opens up the cell membrane to allow the oligonucleotides to get in for the bacteria's existing Cas1/Cas2 system to integrate into the genome.

Electroporation can be used for delivering oligos in vitro, into human cell cultures, but not typically to live humans. Lab safety and contamination protocols are used to prevent exposure to reagents and limit risk of electrocution.

Even so, shocking people wouldn't be enough, as the sequences on their own do not pose the risk. Efforts at CRISPR human gene editing in vivo typically focus on the Cas9 system. Human cells don't express Cas9, so that bacterial protein would have to be delivered along with the target sequence, which would require delivery techniques not used in this paper (e.g., injection or lipid-based delivery).

It may even have to be done in such a way as to get around immunity to the Cas9 protein, or be directed at stem cells or another specific cell population to have the desired effect.

A 27-base sequence could theoretically target genes critical for health - say, deletion or insertion of some 27-base sequence within an oncogene or tumor suppressor gene, for example, delivered with the CRISPR-Cas9 platform. But to administer this, you'd need an entirely different protocol than what the Shipman-Church paper describes, in order for this storage medium to begin to create the possibility of a safety hazard for humans.

Going to the (paywalled) Goldman-Birney paper, they do not store their data in bacteria, but simply synthesize DNA oligonucleotide strings and effectively store them in blobs of fat (literally). When they want to do data retrieval, they do shotgun sequencing with high coverage to get back the original sequence.
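To give a feel for why high coverage helps on readout, here is a deliberately simplified sketch: given many reads of the same stretch, take the most common base at each position. It assumes the reads are already aligned and of equal length, which real shotgun reads are not, and it is not the pipeline used in the paper; the function name is made up for this example.

```python
from collections import Counter


def consensus(reads: list[str]) -> str:
    """Majority-vote consensus over pre-aligned, equal-length reads."""
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("reads must all have the same length")
    # zip(*reads) yields one column of bases per position.
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*reads)
    )
```

With enough reads, an isolated sequencing error at any one position is outvoted by the correct reads covering it.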

As such, there is no real risk of incorporation into human cells, short of microinjection, and so no real direct risk from subsequences being transcribed and translated into pathogenic proteins or virus particles, etc.

While the sequence data could itself encode pathogen sequences, your IT fellow would need to bring their own protein machinery and protocols to have anything nefarious happen. Sequencing is what is done to effect data retrieval, not transcription/translation, and sequencing does not activate any code in the underlying sequence.
