Channel: Dredd Blog

Pangenome


We tend to think that the scientific data presented to the public is flawless each and every time, but it isn't:

"The human reference genome is the most widely used resource in human genetics and is due for a major update. Its current structure is a linear composite of merged haplotypes from more than 20 people, with a single individual comprising most of the sequence. It contains biases and errors within a framework that does not represent global human genomic variation."

(The Human Pangenome Project, Nature, 2020). But that does not mean the data cannot be improved upon.

That is what the "pangenome" project is all about (The Human Pangenome Reference Consortium).
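A toy sketch may help show why a single linear reference can miss variation that a pangenome captures. The sequences, the haplotype names, and the helper functions below are made up for illustration. The "pangenome" is modelled as a plain set of haplotype sequences. Real pangenome references are usually stored as graphs, so this is a deliberate simplification and not how the consortium's data is actually structured.

```python
# Toy illustration only: tiny made-up sequences, not real genome data.
# A linear reference holds one sequence for a region; this "pangenome"
# (a hypothetical, simplified model) holds several haplotypes for it.

LINEAR_REFERENCE = "ACGTTGCAAC"

# Hypothetical haplotypes for the same region; hap2 carries a T->G variant at index 4.
PANGENOME_HAPLOTYPES = {
    "hap1": "ACGTTGCAAC",
    "hap2": "ACGTGGCAAC",
}


def found_in_linear(read: str) -> bool:
    """Exact-match lookup against the single linear reference."""
    return read in LINEAR_REFERENCE


def found_in_pangenome(read: str) -> list[str]:
    """Return the names of every haplotype that contains the read exactly."""
    return [name for name, seq in PANGENOME_HAPLOTYPES.items() if read in seq]


if __name__ == "__main__":
    read_from_hap2 = "GTGGCA"  # spans the variant position
    print(found_in_linear(read_from_hap2))     # False
    print(found_in_pangenome(read_from_hap2))  # ['hap2']
```

A read carrying the variant fails an exact lookup against the linear reference but is found in the haplotype that carries it.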

In a recent series, Dredd Blog took a gander at the MetaSUB project, which took a smorgasbord view of DNA residue in public places in cities (where most people now reside) around the globe (MetaSUB, 2, 3, 4, 5, 6, 7).

The MetaSUB DNA data is likely to include bat, bird, bacterial, human, and other DNA samples.

Today, the table presented in MetaSUB-7 is updated to include the human pangenome FASTA data, along with corrected MetaSUB DNA data, because the previous MetaSUB data was mistakenly mRNA.

The updated table:


Australian, Egyptian, and Viking Mummy DNA Compared to MetaSUB, Pangenome & Other Modern Human DNA

Codon  Australia  Egypt  Human  MetaSUB  Pangenome  Viking
GCT    2.05%      1.51%  1.10%  0.25%    1.68%      2.06%
GCC    1.94%      2.32%  0.94%  4.27%    1.52%      2.00%
GCA    1.90%      2.03%  1.35%  2.52%    1.64%      1.97%
GCG    0.37%      2.29%  0.22%  4.52%    0.25%      0.36%
GCU    0.00%      0.00%  0.00%  0.00%    0.00%      0.00%
TGT    2.29%      1.43%  2.10%  0.44%    2.56%      2.17%
TGC    0.02%      0.13%  0.01%  0.03%    0.02%      0.03%
UGU    0.00%      0.00%  0.00%  0.00%    0.00%      0.00%
UGC    0.00%      0.00%  0.00%  0.00%    0.00%      0.00%
GAT    1.20%      2.62%  0.85%  0.31%    1.09%      1.10%
GAC    0.56%      0.88%  0.43%  0.56%    0.53%      0.69%
GAU    0.00%      0.00%  0.00%  0.00%    0.00%      0.00%
GAA    2.39%      2.90%  1.91%  6.40%    2.21%      2.42%
GAG    0.70%      0.50%  0.58%  0.38%    0.60%      0.74%
TTT    1.09%      0.43%  1.44%  0.03%    1.96%      1.16%
TTC    0.99%      0.67%  1.38%  0.21%    1.24%      1.06%
UUU    0.00%      0.00%  0.00%  0.00%    0.00%      0.00%
UUC    0.00%      0.00%  0.00%  0.00%    0.00%      0.00%
GGT    1.19%      0.93%  0.66%  0.27%    1.00%      1.07%
GGC    0.04%      0.14%  0.02%  0.66%    0.03%      0.04%
GGA    0.22%      0.29%  0.16%  0.13%    0.20%      0.26%
GGG    0.31%      0.40%  0.22%  0.49%    0.32%      0.38%
GGU    0.00%      0.00%  0.00%  0.00%    0.00%      0.00%
CAT    0.61%      0.29%  0.71%  0.05%    0.68%      0.56%
CAC    1.24%      0.91%  0.89%  0.06%    0.97%      1.03%
CAU    0.00%      0.00%  0.00%  0.00%    0.00%      0.00%
ATT    0.22%      0.10%  0.21%  0.08%    0.35%      0.19%
ATC    0.41%      0.35%  0.42%  0.15%    0.47%      0.37%
ATA    0.27%      0.15%  0.34%  0.02%    0.51%      0.25%
AUU    0.00%      0.00%  0.00%  0.00%    0.00%      0.00%
AUC    0.00%      0.00%  0.00%  0.00%    0.00%      0.00%
AUA    0.00%      0.00%  0.00%  0.00%    0.00%      0.00%
AAA    1.34%      0.48%  1.24%  2.01%    1.59%      1.19%
AAG    0.22%      0.10%  0.16%  0.01%    0.22%      0.23%
CTT    0.36%      0.15%  0.21%  0.01%    0.42%      0.38%
CTC    1.08%      0.54%  0.70%  0.11%    0.96%      0.97%
CTA    0.24%      0.16%  0.27%  0.07%    0.32%      0.28%
CTG    0.22%      0.14%  0.13%  0.01%    0.21%      0.28%
TTA    0.12%      0.06%  0.13%  0.05%    0.19%      0.13%
TTG    0.10%      0.05%  0.06%  0.00%    0.08%      0.09%
CUU    0.00%      0.00%  0.00%  0.00%    0.00%      0.00%
CUC    0.00%      0.00%  0.00%  0.00%    0.00%      0.00%
CUA    0.00%      0.00%  0.00%  0.00%    0.00%      0.00%
CUG    0.00%      0.00%  0.00%  0.00%    0.00%      0.00%
UUA    0.00%      0.00%  0.00%  0.00%    0.00%      0.00%
UUG    0.00%      0.00%  0.00%  0.00%    0.00%      0.00%
ATG    0.02%      0.01%  0.03%  0.00%    0.03%      0.02%
AUG    0.00%      0.00%  0.00%  0.00%    0.00%      0.00%
AAT    0.06%      0.03%  0.06%  0.00%    0.08%      0.05%
AAC    0.20%      0.19%  0.18%  0.04%    0.22%      0.22%
AAU    0.00%      0.00%  0.00%  0.00%    0.00%      0.00%
CCT    0.28%      0.12%  0.19%  0.44%    0.27%      0.31%
CCC    0.37%      0.26%  0.29%  0.02%    0.36%      0.41%
CCA    0.31%      0.21%  0.28%  0.03%    0.31%      0.37%
CCG    0.05%      0.26%  0.04%  0.11%    0.05%      0.08%
CCU    0.00%      0.00%  0.00%  0.00%    0.00%      0.00%
CAA    0.16%      0.11%  0.15%  0.02%    0.18%      0.16%
CAG    0.47%      0.16%  0.24%  0.02%    0.37%      0.43%
CGT    0.08%      0.37%  0.07%  0.10%    0.08%      0.10%
CGC    0.00%      0.06%  0.00%  0.00%    0.00%      0.00%
CGA    0.02%      0.14%  0.01%  0.01%    0.01%      0.01%
CGG    0.02%      0.12%  0.01%  0.02%    0.02%      0.03%
AGA    0.19%      0.24%  0.25%  0.18%    0.27%      0.22%
AGG    0.05%      0.03%  0.05%  0.00%    0.06%      0.07%
CGU    0.00%      0.00%  0.00%  0.00%    0.00%      0.00%
TCT    0.14%      0.37%  0.10%  0.44%    0.15%      0.16%
TCC    0.09%      0.09%  0.07%  0.01%    0.10%      0.12%
TCA    0.15%      0.07%  0.10%  0.00%    0.13%      0.13%
TCG    0.03%      0.53%  0.02%  0.01%    0.02%      0.02%
AGT    0.13%      0.07%  0.09%  0.03%    0.14%      0.16%
AGC    0.00%      0.02%  0.00%  0.00%    0.00%      0.00%
UCU    0.00%      0.00%  0.00%  0.00%    0.00%      0.00%
UCC    0.00%      0.00%  0.00%  0.00%    0.00%      0.00%
UCA    0.00%      0.00%  0.00%  0.00%    0.00%      0.00%
UCG    0.00%      0.00%  0.00%  0.00%    0.00%      0.00%
AGU    0.00%      0.00%  0.00%  0.00%    0.00%      0.00%
ACT    0.14%      0.05%  0.09%  0.03%    0.13%      0.12%
ACC    0.11%      0.09%  0.06%  0.03%    0.08%      0.08%
ACA    0.12%      0.06%  0.12%  0.00%    0.14%      0.12%
ACG    0.01%      0.06%  0.02%  0.00%    0.02%      0.02%
ACU    0.00%      0.00%  0.00%  0.00%    0.00%      0.00%
GTT    0.06%      0.05%  0.03%  0.00%    0.04%      0.05%
GTC    0.04%      0.11%  0.03%  0.00%    0.04%      0.05%
GTA    0.05%      0.07%  0.04%  0.13%    0.06%      0.06%
GTG    0.11%      0.06%  0.06%  0.00%    0.07%      0.09%
GUU    0.00%      0.00%  0.00%  0.00%    0.00%      0.00%
GUC    0.00%      0.00%  0.00%  0.00%    0.00%      0.00%
GUA    0.00%      0.00%  0.00%  0.00%    0.00%      0.00%
GUG    0.00%      0.00%  0.00%  0.00%    0.00%      0.00%
TGG    0.28%      0.11%  0.15%  1.99%    0.24%      0.27%
UGG    0.00%      0.00%  0.00%  0.00%    0.00%      0.00%
TAT    0.02%      0.01%  0.02%  0.00%    0.03%      0.01%
TAC    0.04%      0.04%  0.04%  0.01%    0.06%      0.05%
UAU    0.00%      0.00%  0.00%  0.00%    0.00%      0.00%
UAC    0.00%      0.00%  0.00%  0.00%    0.00%      0.00%
TAA    0.00%      0.00%  0.00%  0.00%    0.00%      0.00%
TAG    0.00%      0.00%  0.00%  0.00%    0.00%      0.00%
TGA    0.00%      0.00%  0.00%  0.00%    0.00%      0.00%
UAA    0.00%      0.00%  0.00%  0.00%    0.00%      0.00%
UAG    0.00%      0.00%  0.00%  0.00%    0.00%      0.00%
UGA    0.00%      0.00%  0.00%  0.00%    0.00%      0.00%
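Readers who want to build a similar table from FASTA files might start from the minimal Python sketch below. It shows one common way to tabulate codon usage: split each sequence into non-overlapping triplets in a chosen reading frame, then report each DNA codon as a percentage of all counted triplets. This is an illustrative assumption, not the method behind the table above (the blog does not describe its counting or normalization), and the helper names read_fasta and codon_percentages are hypothetical. DNA FASTA files use T rather than U, so RNA-style codons such as GCU can never be counted from DNA data. That is consistent with the all-zero U rows in the table.

```python
from collections import Counter
from itertools import product


def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {header: sequence}, uppercasing the bases."""
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:  # finish the previous record
                records[header] = "".join(chunks).upper()
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks).upper()
    return records


def codon_percentages(sequences, frame: int = 0) -> dict[str, float]:
    """Percent of in-frame, non-overlapping triplets that are each of the 64 DNA codons."""
    counts: Counter = Counter()
    for seq in sequences:
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if set(codon) <= set("ACGT"):  # skip triplets containing N or other symbols
                counts[codon] += 1
    total = sum(counts.values())
    all_codons = ["".join(c) for c in product("ACGT", repeat=3)]
    return {c: (100.0 * counts[c] / total if total else 0.0) for c in all_codons}


if __name__ == "__main__":
    example = ">seq1\nGCTGCTTGG\n>seq2\nGAAgaa\n"
    pct = codon_percentages(read_fasta(example).values())
    print(pct["GCT"], pct["TGG"], pct["GAA"])  # 40.0 20.0 40.0
```

Triplets containing N or other ambiguity codes are skipped. A different reading frame, or counting overlapping triplets, would give different percentages.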

Closing Comments

The pop culture of the upper middle class and the wealthy is wild about "ancestry".

But just as GenBank and similar databases prepared by hard-core scientists have errors in them, so does the pop culture data:

"Although some people purchase kits from multiple companies, the majority of people take just one test. Each person who buys genetic analysis from Ancestry, for example, consents to having his/her data become part of Ancestry’s enormous database, which is used to perform the analyses that people pay for. There are some interesting implications to how these databases are built.

First, they are primarily made up of paying customers, which means that the vast majority of genetic datasets in Ancestry’s database come from people who have enough disposable income to purchase the kit and analysis. It may not seem like an important detail, but it shows that the comparison population is not the same as the general population.

Second, because the analyses compare the sample DNA to DNA already in the database, it matters how many people from any given area have taken the test and are in the database. An article in Gizmodo describes one family’s experience with DNA testing and some of the pitfalls. The author quotes a representative from the company 23andMe as saying, “Different companies have different reference data sets and different algorithms, hence the variance in results. Middle Eastern reference populations [for example] are not as well represented as European, an industry-wide challenge.”

The same is true for any population where not many members have taken the test for a particular company. In an interview with NPR about trying to find information about her ancestry, journalist Alex Wagner described a similar problem, saying, “There are not a lot of Burmese people taking DNA tests … and so, the results that were returned were kind of nebulous.”

Wagner’s mother and grandmother both immigrated to the US from Burma in 1965, and when Wagner began investigating her ancestry, she, both of her parents, and her grandmother, all took tests from three different direct-to-consumer DNA testing companies. To Wagner’s surprise, her mother and grandmother both had results that showed they were Mongolian, but none of the results indicated Burmese heritage. In the interview she says that one of the biggest things she learned through doing all these tests was that “a lot of these DNA test companies [are] commercial enterprises. So, they basically purchase or acquire DNA samples on market-demand.”

As it turns out, there aren’t many Burmese people taking DNA tests, so there’s not much reason for the testing companies to pursue having a robust Burmese or even Southeast Asian database of DNA."

(The Problems with Ancestry DNA Analyses). Some companies supplement the DNA results with historical records to diminish that uncertainty:

"Although it has been studied for many decades, DNA is not entirely understood. There could be significant SNPs that are not evaluated or recognized as important genetic markers. You also might not inherit certain genes that show your Scandinavian heritage even if your siblings have. Even with the best DNA testing, genes are tricky and cannot tell you everything about your family. Some companies (like Ancestry.com) incorporate the use of historical records to increase their ancestry DNA accuracy."

(GenomeLink). Even noted experts in the field disagree on DNA interpretations to various degrees:

"A New York laboratory has cut its ties with James Watson, the Nobel prize-winning scientist who helped discover the structure of DNA, over 'reprehensible' comments in which he said race and intelligence are connected.

The Cold Spring Harbor Laboratory said it was revoking all titles and honors conferred on Watson, 90, who led the lab for many years.

The lab 'unequivocally rejects the unsubstantiated and reckless personal opinions Dr James D Watson expressed on the subject of ethnicity and genetics', its president, Bruce Stillman, and chair of the board of trustees, Marilyn Simons, said in a statement.

'Dr Watson’s statements are reprehensible, unsupported by science, and in no way represent the views of CSHL, its trustees, faculty, staff, or students. The laboratory condemns the misuse of science to justify prejudice.'”

(Guardian, 2019). The video below features experts who detail many myths about DNA (genes) that are prevalent in various cultures.



