Samier Merchant and co-authors Derrick Wood and IDIES Associate Director Steven Salzberg have found evidence of previously unnoticed errors in genome databases. In their recent paper, “Unexpected cross-species contamination in genome sequencing projects,” published in PeerJ, they show that many putatively ‘complete’ genomes, including Neisseria gonorrhoeae TCDC-NG08107, contain cross-species contamination. Their work indicates that genomes in databases may contain sequences from other species, or even other kingdoms, such as mammalian DNA in bacterial sequences. The errors Salzberg’s group found were on the order of a few base pairs per million, a small number with big potential to cause problems for researchers. The errors may have been introduced during sampling, during sample preparation, or during the computational process of sequence assembly. These findings underscore a danger researchers unknowingly face when comparing their own results against contaminated genomes: treating complete, or even partial, genome sequences from databases as validated. Measured against this flawed standard, researchers may unwittingly attribute discrepancies to their own samples, leading to confusing and erroneous results. The paper is a reminder of the importance of validating sequences obtained from genome databases. The art and science of genome processing have made tremendous strides, but are far from perfected.
For more information, visit Steven Salzberg’s website or follow him on Twitter.
- Salzberg and Colleagues Find Genome Databases Contain Cross-Species Contamination