September 22, 2012

ADMIXTURE analysis of Schlebusch et al. (2012) data

The ADMIXTURE analysis of Schlebusch et al. (2012) did not include Eurasian references, but because the authors have made their data publicly available, anyone can carry out additional analyses on it. I am sure that this data will be very useful in the future. The included populations, with sample sizes, are:


  • ColouredColesberg_Sch 20
  • ColouredWellington_Sch 20
  • Khomani_Sch 39
  • Karretjie_Sch 20
  • Khwe_Sch 17
  • GuiGhanaKgal_Sch 15
  • Juhoansi_Sch 18
  • Nama_Sch 20
  • Xun_Sch 19
  • SEBantu_Sch 20
  • SWBantu_Sch 12

As is my convention, the _Sch ending denotes that these populations are from the Schlebusch et al. paper.
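
As a quick sanity check on the list above, the sample sizes can be tallied with a few lines of Python. This is only an illustrative sketch: the dictionary simply restates the numbers from the list, and the names SAMPLE_SIZES and total_individuals are made up for the example.

```python
# Sample sizes exactly as listed in the post; "_Sch" marks the Schlebusch et al. source.
# The variable and function names are hypothetical, chosen for this sketch.
SAMPLE_SIZES = {
    "ColouredColesberg_Sch": 20,
    "ColouredWellington_Sch": 20,
    "Khomani_Sch": 39,
    "Karretjie_Sch": 20,
    "Khwe_Sch": 17,
    "GuiGhanaKgal_Sch": 15,
    "Juhoansi_Sch": 18,
    "Nama_Sch": 20,
    "Xun_Sch": 19,
    "SEBantu_Sch": 20,
    "SWBantu_Sch": 12,
}


def total_individuals(sizes):
    """Return the total number of individuals across all populations."""
    return sum(sizes.values())


total = total_individuals(SAMPLE_SIZES)
print(f"{len(SAMPLE_SIZES)} populations, {total} individuals")
```

The 11 populations add up to 220 individuals.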


As always with a new dataset, after processing it, I ran a quick test to make sure everything seemed to be alright. This time, I included the 220 individuals in the released datasets together with 28 HGDP Sardinians and 10 HGDP Dai, and ran a quick K=4 ADMIXTURE analysis:


These appear to make sense. The "green" Dai-like element in the Coloured samples is probably a stand-in for Indian ancestry in that population. The plot of individuals shows considerable variation within several populations.
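
For readers who want to try a similar quick test, a K=4 run could be set up roughly as follows. This is a hedged sketch, not the exact procedure used here: it assumes the merged data already exist as a PLINK binary fileset, the prefix schlebusch_hgdp_test is a made-up name, and the helper functions are hypothetical. It only builds and prints the command rather than running it.

```python
import os
import shlex

K = 4  # number of ancestral components in the quick test
MERGED_PREFIX = "schlebusch_hgdp_test"  # hypothetical prefix of a merged .bed/.bim/.fam fileset


def admixture_command(bed_prefix, k):
    """Build the basic ADMIXTURE invocation: admixture <file.bed> <K>."""
    return ["admixture", f"{bed_prefix}.bed", str(k)]


def q_file_name(bed_prefix, k):
    """Name of the ancestry-fraction output, written as <basename>.<K>.Q
    in the working directory (one row per individual, one column per component)."""
    return f"{os.path.basename(bed_prefix)}.{k}.Q"


cmd = admixture_command(MERGED_PREFIX, K)
print("command:", shlex.join(cmd))
print("ancestry fractions expected in:", q_file_name(MERGED_PREFIX, K))
```

Each row of the resulting .Q file corresponds to one individual in the .fam file, in the same order, so population averages can be computed by grouping rows on the population label.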

2 comments:

Amanda S said...

The Cape Coloured community is more likely to have ancestry from Indonesia than from India as the Dutch settlers brought people from the Dutch East Indies to the Cape. Perhaps some came from the Dutch colony in Sri Lanka.

Large numbers of Indians came later to South Africa under British rule, many as indentured workers and mostly to a different province (Natal).

Ted Kandell said...

Are the Ju’hoansi then the most highly divergent Khoisan group, akin to the Mbuti and the Hadza, who form their own set of extremes on the Paleoafrican axis on the MDS plots?

I would like to see the data from this Ju’hoansi San group included in the "San" principal component in various implementations of DIYDodecad as well as the Hadza, separately. Each of these three will provide sets of some of the most highly divergent modern human alleles, and deserve to be considered separately.

How does the new almost complete whole Denisovan genome plot now that we have these populations?

Dienekes, can you do a five-way MDS plot of modern Humans, the Chimpanzee, Bonobo, Denisovan, and Neanderthal, now that we have more whole genome sequences from Paleoafricans and others, and a quite complete Denisovan genome?

Also, can you tease out any sort of "African Archaic" component and then do a PCA with the "African Archaic", Neanderthal and Denisovan, to help identify regions of possible Archaic admixture?

The Geno 2.0 chip has ILS SNPs as well as Neanderthal and Denisovan SNPs, but they have (tried to) clean out all non-synonymous protein-coding SNPs as well as SNPs in full LD with those. Since Dodecad is not limited by such restrictions I think you can do a better job of detecting even low-level Archaic admixture. This should help us figure out any differences between admixture with East Asian Neanderthals and European / Caucasian Neanderthals and Denisovans, and help us detect the source of any "Back to Africa" Neanderthal admixture within Africa.