With the growth of home DNA testing, online services such as GEDMatch, MyHeritage and FamilyTreeDNA have become popular places for people to upload their genetic information, research their genealogy and find lost relatives.
They have also been used by law enforcement to find criminal suspects through a DNA match with relatives.
Now Professor Graham Coop and postdoctoral researcher Michael “Doc” Edge at the University of California, Davis warn that these “direct to consumer” services could be vulnerable to a sort of genetic hacking.
By uploading selected DNA sequences, they said, it may be possible, for example, to pull out the genomes of most people in a database or to identify people with genetic variants associated with specific traits such as Alzheimer’s disease.
A paper describing the problem is posted online Oct. 22. Coop and Edge notified the database companies of the problem in mid-July to allow them time to put countermeasures in place.
“People are giving up more information than they think they are” when they upload to these publicly accessible sites, Coop said.
And unlike credit card information, you can’t just cancel your old genome and get a new one.
The problems do not affect for-profit DNA testing companies such as 23andMe, Coop said, because getting access to their genetic data requires submitting your DNA as a saliva sample rather than uploading a sequence.
The public databases, however, allow anyone to upload DNA sequences and search for other users with matching sequences.
Identical by state and descent
These sites work by using software to compare DNA sequences uploaded by users with sequences already in their database. Your genome is a mosaic of pieces inherited from your ancestors.
Bigger pieces, or tiles in the mosaic, come from recent ancestors. As generations pass, matching sequences get chopped into smaller pieces. So if you share large chunks of DNA sequence with someone else, it’s likely you share a recent ancestor.
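As a rough illustration of the idea, the Python sketch below finds the longest stretch over which two toy sequences agree and treats a long shared stretch as a hint of recent common ancestry. The function names, the threshold and the short strings are assumptions made for this example. Real matching software works on genotype data with far more careful statistics.

```python
def longest_shared_run(a, b):
    """Return (start, length) of the longest stretch where a and b agree."""
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    for i, (x, y) in enumerate(zip(a, b)):
        if x == y:
            if run_len == 0:
                run_start = i  # a new matching stretch begins here
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0  # a mismatch ends the current stretch
    return best_start, best_len


def looks_related(a, b, min_run=8):
    """Hypothetical rule of thumb: a long shared stretch suggests a recent ancestor."""
    return longest_shared_run(a, b)[1] >= min_run


person_1 = "ACGTACGTACGTACGT"
person_2 = "ACGTACGTACGAACGT"  # differs from person_1 at one position
print(longest_shared_run(person_1, person_2))
print(looks_related(person_1, person_2))
```

The threshold of eight positions is arbitrary. The point is only that longer shared stretches are stronger evidence than shorter ones.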
Coop and Edge found three approaches that yield far more information from a DNA database than just some lost cousins. (Their tests used a public collection of human DNA sequences available for research, not the hobbyist databases.)
They call these methods IBS (identical by state) tiling, IBS probing and IBS baiting.
In IBS tiling, an attacker uploads several genomes found in public research databases and keeps track of which ones match with other genomes in the database, and where. If they can find enough matching tiles, they can put together most of someone’s genome.
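The bookkeeping behind tiling can be sketched in a few lines of Python. This toy example assumes the database reports each match as a (start, end) interval along a chromosome. It merges the overlapping intervals and works out how much of the target genome they cover. The function names, coordinates and interval format are invented for illustration and are not taken from the paper or from any real service.

```python
def merge_tiles(tiles):
    """Merge overlapping (start, end) intervals; end is exclusive."""
    merged = []
    for start, end in sorted(tiles):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous tile, so extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_fraction(tiles, genome_length):
    """Fraction of the target genome covered by at least one matching tile."""
    covered = sum(end - start for start, end in merge_tiles(tiles))
    return covered / genome_length


# Hypothetical matches between three uploaded genomes and one target.
reported_tiles = [(0, 30), (20, 50), (70, 90)]
print(merge_tiles(reported_tiles))
print(covered_fraction(reported_tiles, genome_length=100))
```

Each additional upload can only add tiles, so the covered fraction grows as the attacker tries more public genomes.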
IBS probing can be used to hunt for people who carry a specific genetic variant, for example one tied to Alzheimer’s disease.
To do this, the attacker creates a fake genome with a DNA sequence that isn’t likely to match anyone, except for one small section that will match the gene of interest. Matches from the database are likely to be people with this genetic variant.
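A toy version of a probe might look like the following Python sketch. It assumes a matching service that reports a match whenever two sequences agree over a long enough stretch, and it uses a filler character as a stand-in for DNA that is unlikely to match anyone. The database contents, function names and threshold are all hypothetical.

```python
def build_probe(length, window_start, target_segment, filler="N"):
    """Fake genome: filler everywhere except one window carrying the target."""
    probe = [filler] * length
    window_end = window_start + len(target_segment)
    probe[window_start:window_end] = list(target_segment)
    return "".join(probe)


def reports_match(probe, sequence, min_run):
    """Assumed matching rule: any stretch of min_run identical positions."""
    run = 0
    for x, y in zip(probe, sequence):
        run = run + 1 if x == y else 0
        if run >= min_run:
            return True
    return False


def likely_carriers(probe, database, min_run):
    """Names of database entries that the service would report as matches."""
    return sorted(name for name, seq in database.items()
                  if reports_match(probe, seq, min_run))


# Hypothetical database of three users, written as toy haploid strings.
database = {
    "user_a": "ACGTTTGACA",
    "user_b": "ACGTCCGACA",
    "user_c": "GGGTTTGAAA",
}
probe = build_probe(length=10, window_start=3, target_segment="TTTGA")
print(probe)
print(likely_carriers(probe, database, min_run=5))
```

Because the filler never matches, any reported match has to come from the window, which is what lets the attacker infer who carries the variant.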
Finally, IBS baiting relies on tricking one class of algorithms used to identify relatives. (Not all databases use this type of algorithm, though.) Coop and Edge calculate that with as few as 100 uploaded DNA sequences, an attacker could use this method to obtain most of the genomic information in a database.
All three attacks could be carried out by someone with knowledge of genetics and computing, such as a graduate student or serious hobbyist, but “the good news is that it’s quite preventable,” Edge said.
Coop and Edge’s paper sets out a series of steps direct-to-consumer genetics services could take to block these attacks. While they have already shared the information with the leading services, they have had a “varied” response, Coop said.
Using these services necessarily involves giving up personal information, and millions of people seem willing to do that in exchange for researching family history or other personal uses. But users should be more aware of exactly how much information they might be giving up when they access these services.
“We would like them to clarify their vulnerabilities and how they’re addressing them,” Coop said.
Written by Andy Fell.