
Saturday, January 13, 2018

Genetic maps featuring 67 ancient genomes and more than 3,000 present-day individuals


I've got some eye candy for you guys as we wait for 2018 to really get going. Below are three Principal Component Analysis (PCA) plots, or genetic maps, based on the ancient diploid dataset from Martiniano et al. 2017 (described in more detail here). Click on the images to download Hi-Res PDFs of each plot. The relevant datasheets are available here.




The important thing about these PCAs is that none of the samples in the analyses are missing more than 1% of the ~188K markers used to compute the PCs, which means that I didn't have to resort to any type of projection to get things right. In other words, the relationships between the samples that you see on these plots are direct.
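For anyone who wants to try this sort of filtering at home, here's a minimal sketch in Python. It's not the pipeline used for these plots: the function names, the NaN encoding for missing genotypes and the mean imputation of the few remaining gaps are all assumptions made for illustration. The idea is simply to drop any sample missing more than 1% of the markers and then run a plain PCA on what's left, with no projection.

```python
import numpy as np

def filter_by_missingness(genotypes, max_missing=0.01):
    """Keep samples (rows) whose fraction of missing markers (NaN) is at most max_missing."""
    missing_rate = np.isnan(genotypes).mean(axis=1)
    keep = missing_rate <= max_missing
    return genotypes[keep], keep

def pca(genotypes, n_components=2):
    """Plain PCA via SVD on mean-centred markers.

    Any remaining NaNs are replaced by the per-marker mean (an assumption
    of this sketch, not something stated in the post).
    """
    col_means = np.nanmean(genotypes, axis=0)
    filled = np.where(np.isnan(genotypes), col_means, genotypes)
    centred = filled - filled.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    # Sample coordinates on the first n_components principal components
    return u[:, :n_components] * s[:n_components]
```

Every sample that survives the filter takes part in computing the PCs, so its position on the plot comes straight from its own genotypes.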

PCAs are easy to read. The main thing to keep in mind is that the results are dependent on the samples in the analysis. For instance, note that the Indians (Gujaratis and Brahmins) cluster rather close to some Europeans on the West Eurasian plot, but much further from them on the Eurasian/American plot. Why? Because the addition of hundreds of East Eurasian individuals to the latter plot highlights the significant East Eurasian-related admixture in the Indians, and pulls them away from the Europeans, who generally have much less of this type of ancestry.

It's interesting, I think, that all of the ancients from burial sites within the borders of present-day Europe (discussed in an earlier blog post here) cluster with present-day Europeans, or at least closest to us. See anything else interesting? Feel free to share it in the comments below.

If you're having trouble spotting certain individuals and/or populations, type the relevant individual or population ID in the PDF search box and press Enter. The PDF will initially show you a box where the samples of interest are located; click on the box, and the PDF will zoom into the boxed area and highlight these samples, like this:


See also...

Who's your (proto) daddy Western Europeans?

74 comments:

Slumbery said...

It is interesting to see the effect of mixing distant populations on their PCA positions. At least I assume that is the reason behind the weird position of some Ecuadorian samples. Some of them are very close to some Bashkir, Uzbek and Nogai. So mixing Amerindian with West Eurasian can result in the same PCA position as mixing Siberian/East Asian with West Eurasian.
This is also a reminder that very close PCA positions can be the results of vastly different recent ancestry.

One of my pet topics is of course Uralic groups. When I was in high school (so decades ago, before the time of modern population genetics) I learned that the Chuvash are suspected to have substantial Uralic ancestry (based on details of their language). This is not decisive (partly because of what I just said about recent ancestry and PCA), but their PCA position seems to support that. They are closest to Udmurt and notably they keep close in every PCA, even if East Asian and Siberian groups are removed from the equation. The only Turkic group that comes close to them is the Volga Tatars, but those also have a lot of local ancestry.

Shahanshah of Persia said...

Thanks David.

Davidski said...

@Slumbery

This is also a reminder that very close PCA positions can be the results of vastly different recent ancestry.

Yes, but different dimensions can sort that out easily, and this is why I posted the PCA datasheets.
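To make that concrete, here's a minimal Python sketch. The coordinates are made-up toy values rather than rows from the datasheets, and `pc_distance` is a hypothetical helper name. It shows two samples that sit on top of each other in the first two dimensions but separate clearly once a third dimension is included.

```python
import numpy as np

def pc_distance(a, b, n_dims=None):
    """Euclidean distance between two samples using the first n_dims PCs (all PCs if None)."""
    a = np.asarray(a, dtype=float)[:n_dims]
    b = np.asarray(b, dtype=float)[:n_dims]
    return float(np.linalg.norm(a - b))

# Toy coordinates (PC1, PC2, PC3), invented for illustration only
sample_x = [0.10, 0.20, 0.50]
sample_y = [0.10, 0.20, -0.50]

close_in_2d = pc_distance(sample_x, sample_y, n_dims=2)  # identical on PC1 and PC2
apart_in_3d = pc_distance(sample_x, sample_y)            # separated by PC3
```

The same function can be pointed at any two rows of a datasheet to check whether a close match on the plot holds up in the higher dimensions.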

Slumbery said...

Davidski

Yes, I agree and I am also glad for your work. Of course you were not the target audience of that warning.

Shahanshah of Persia said...

Also, why do Iranians seem to have a pull towards Arabs, when other plots indicate the contrary?

Davidski said...

Some of these Iranians have minor Sub-Saharan ancestry, so some might also have Arabian ancestry. Not sure which part of Iran they were sampled in, though.

Slumbery said...

@Shahanshah of Persia

Are they pulled towards Arabs compared to what reference point? There are a few outliers (obviously the Arab invasion had _some_ effects, at least on the South-West), but on the main mass I do not see a specifically Arab pull on the PCA.

Onur Dincer said...

Interesting that on the worldwide PCA plot the West Eurasia-East Eurasia axis is represented by eigenvector 1 while the West Eurasia-Africa axis is represented by eigenvector 2. Normally it is the other way round on worldwide PCA and similar worldwide analyses.

Eren said...

@Onur: I guess that is due to the predominance of Eurasian samples in the data set.

Shaikorth said...

@Eren
That's surely the case. We know the PCA method is affected by sample sizes more than by the genealogical relatedness of said samples. McVean demonstrated this:

http://journals.plos.org/plosgenetics/article/figure?id=10.1371/journal.pgen.1000686.g003
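The sample size effect is easy to reproduce with a toy example. The Python sketch below is only an illustration under made-up assumptions: three populations placed at invented two-dimensional positions, with no noise, and hypothetical function names. The "African" group is placed farther from the "West Eurasian" group than the "East Eurasian" group is, yet the first eigenvector still follows whichever axis has more samples on it.

```python
import numpy as np

def first_pc_direction(points):
    """Return the unit vector of the first principal component of the points."""
    centred = points - points.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return vt[0]

def toy_dataset(n_west, n_east, n_african):
    """Stack copies of three invented population positions (x = west-east, y = west-African)."""
    west = np.tile([0.0, 0.0], (n_west, 1))
    east = np.tile([1.0, 0.0], (n_east, 1))
    african = np.tile([0.0, 1.2], (n_african, 1))
    return np.vstack([west, east, african])

# Many Eurasians, few Africans: PC1 lines up mostly with the west-east axis
eurasian_heavy = first_pc_direction(toy_dataset(100, 100, 10))
# Few East Eurasians, many Africans: PC1 lines up mostly with the west-African axis
african_heavy = first_pc_direction(toy_dataset(100, 10, 100))
```

Comparing the absolute sizes of the two components of each vector shows which axis PC1 is tracking in each case.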

Eren said...

@Shaikorth
Yeah, I came to realize that this was the real cause of the problem I had with the Global 10 PCA early last year. As you know, I tried to fix this by weighting the PCs. :D

The solution is an equal representation of samples, but that's probably not as straightforward as it sounds.
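One simple way to approximate equal representation is to subsample every population down to the same size before computing the PCs. The Python sketch below is a rough illustration only; `balance_populations`, its parameters and the fixed random seed are assumptions, and it says nothing about which individuals are the best ones to keep, which is where things stop being straightforward.

```python
import numpy as np

def balance_populations(labels, per_population, seed=0):
    """Pick at most per_population random sample indices from each population label."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    chosen = []
    for pop in np.unique(labels):
        idx = np.flatnonzero(labels == pop)
        take = min(per_population, idx.size)  # small populations are kept whole
        chosen.extend(rng.choice(idx, size=take, replace=False))
    return np.sort(np.array(chosen))
```

The returned indices can be used to select rows of the genotype matrix, so that no single group dominates the leading eigenvectors purely through its sample count.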

Simon_W said...

Interesting how in this West Eurasian PCA South Italians are even more West Asian-shifted than the Sicilians. I suspect these South Italians are from the tip of Calabria. And note how the only samples in between South Italians and Cypriots are Sephardic and Moroccan Jews, while Ashkenazi Jews are about as West Asian-shifted as South Italians and Sicilians. A few Greeks are also in this cluster, even some from Macedonia, but the bulk of the Greeks is more northern. Also noteworthy how much Italian_Bergamo overlaps with Iberians.

Simon_W said...

And Ireland_EBA and one of the two Roman_Britain individuals cluster with Slavic people rather than with modern northwest Europeans. Obviously because they still had a little more steppe ancestry than the modern ones.

And one of the two Hungary_BA individuals, no doubt BR2, is quite close to the French. As was already the case in the Global 10 PCA. In the late Bronze Age and presumably in the Iron Age there must have been a belt of French-like people from (presumably) France over southern Germany to western Hungary.

Onur Dincer said...

@Eren

I guess that is due to the predominance of Eurasian samples in the data set.

That is what I thought too. But I wanted to hear David's take on this as his worldwide PCA plots are also usually dominated by the West Eurasia-Africa axis at eigenvector 1.

Onur Dincer said...

@Simon_W

Interesting how in this West Eurasian PCA South Italians are even more West Asian-shifted than the Sicilians. I suspect these South Italians are from the tip of Calabria. And note how the only samples in between South Italians and Cypriots are Sephardic and Moroccan Jews, while Ashkenazi Jews are about as West Asian-shifted as South Italians and Sicilians. A few Greeks are also in this cluster, even some from Macedonia, but the bulk of the Greeks is more northern. Also noteworthy how much Italian_Bergamo overlaps with Iberians.

Those strongly West Asian-leaning or West Asian-like Greek individuals almost certainly have recent ancestry from Anatolia, the nearby Aegean islands, Cyprus, Crimea, the Armenian Highland and/or the Levant (in other words, from the Greek communities with origins outside the Balkans, the nearby islands or southern Italy).

Kristiina said...

@Slumbery

If you are interested, you should also check this new paper: Between Lake Baikal and the Baltic Sea: genomic history of the gateway to Europe (https://bmcgenet.biomedcentral.com/articles/10.1186/s12863-017-0578-3)

If you take a look at their Fig. 3 on ancient and recent IBD sharing, you see that Chuvash share ancient ancestry in particular with Komi, Udmurt, Khanty and Tatar. Recent sharing only shows some recent IBD with Tatars.

http://media.springernature.com/full/springer-static/image/art%3A10.1186%2Fs12863-017-0578-3/MediaObjects/12863_2017_578_Fig3_HTML.gif

These conclusions are also interesting:

It is noteworthy that the genomes of closest linguistic relatives of Bashkir, Volga Tatar, bears very little traces of East Asian or Central Siberian ancestry. Volga Tatar are a mix between Bulgar who carried a large Finno-Ugric component, Pecheneg, Kuman, Khazar, local Finno-Ugric tribes, and even Alan. Therefore, Volga Tatars are predominantly European ethnicity with a tiny contribution of East-Asian component. As most Tatar’ IBD is shared with various Turkic and Uralic populations from Volga-Ural region, an amalgamation of various cultures is evident. When the original Finno-Ugric speaking people were conquered by Turkic tribes, both Tatar and Chuvash are likely to have experience language replacement, while retaining their genetic core. Most likely, these events took place sometime around VIII century AD, after the relocation of Bulgar tribes to Volga and Kama river basins, and expansion of Turkic people.

We speculate that Bashkir, Tatar, Chuvash and Finno-Ugric speakers from Volga basin has a common Turkic component, which could have been acquired as a result of Turkic expansion to Volga-Urals region. However, the original Finno-Ugric substrate was not homogeneous: Tatar and Chuvash genomes carry mainly “Finno-Permic” component, while Bashkir carry the “Magyar” one. The fraction of the Turkic component in Bashkir is, undoubtedly, quite significant, and larger than that in Tatar and Chuvash. This component reflects the South Siberian influence on Bashkir, which makes them related to Altai, Kyrgyz, Tuvinian, and Kazakh people.

As a standalone approach, an analysis of shared IBD is not sufficient to support the Finno-Ugric hypothesis of Bashkir origin as a sole source, while pointing at temporal separation of genetic components in Bashkir. Hence, we demonstrated that Bashkir genepool is a multifaceted, multicomponent system, lacking the main “core”; it is an amalgamation of Turkic, Ugric, Finnish and Indo-European contributions. In this mosaic, it is impossible to identify the leading element. Therefore, Bashkir are the most genetically diverse ethnic group of the Volga-Urals region.

Slumbery said...

@Kristiina

Thank you. Davidski wrote about this article on his other blog and I even commented on the high IBD sharing between Khanti and Bashkir, but I have to admit I have yet to read the original article.

Using the datasheets here I plotted the relevant populations up to PC7-8 and the Mari-Udmurt-Chuvash is a very persistent cluster that stays together in every dimension even when everything around them moves around. And the Chuvash do not show any Turkic pull. If anything they are a bit pulled toward the more core European populations (Finnish groups and Slavs) compared to the Mari and Udmurt.

As for the Khanti-Mansi vs. Bashkir, they do not really cluster. It seems to me that Khanti-Mansi have a considerable recent (later than the Ugric common times) Siberian ancestry which they picked up as they moved to Siberia and the North from the South Ural homeland and assimilated some local HG groups. This extra Siberian ancestry pulls them away from the Bashkirs on PCA despite the Bashkirs drawing ancestry from the South Ural Ugric population.
Also this PCA data is in agreement with that article about the Bashkirs having actual Turkic ancestry. They form a cline towards the Altai samples in multiple dimensions. It is impossible to tell from PCA however whether their Uralic side is Khanti-Mansi related, because the Khanti-Mansi are too strongly affected by their extra Siberian ancestry. In most dimensions they cluster with Ket against other Uralic groups. This reminds me of somebody who claimed on a Hungarian forum that modern Khanti and Mansi are mostly relatively recently assimilated Siberian groups on the periphery.

Matt said...

Really nice stuff, and there's a lot of high level structure here. Would it also be possible to get eigenvalues for the PCs in each datasheet?
Also, if it's possible, I would quite like to see if the Fst matrix has changed at all for populations under Martiniano's ancient dataset...

@Simon, yep re:Ireland EBA, in the West Eurasian PCA actually it looks like all the ancient Bronze Age Europeans are distinct from Northwest Europeans on the basic East vs West PC2 - https://imgur.com/a/CMVrs and overlap more with Northeast Europeans

At the same time, all the ancient BA for Northwest and West-Central Europe overlap pretty clearly with Western Europeans on the higher order PCs that contribute further to European structure - https://i.imgur.com/goSAw20.png / https://i.imgur.com/6ZYiR4l.png / https://i.imgur.com/7xIhg0w.png

It will be interesting if it's ever possible to put the big samples of British Bell Beakers on this and really the whole British transect going up to the Iron Age - it seems likely to me that there has to have been a subtle effect of isolation by distance genetic flow probably already by Bronze-Iron to explain why modern Northwest Europeans are ever so slightly different.

On European structure in the West Eurasia plot, I think it's interesting that in terms of general structure (not just specific to a few populations), you've got:

- PC4 that seems to be a slightly "purer" reflection of Anatolian Neolithic ancestry as distinct from the Levant_N ancestry in the Near East (or maybe Arabic specific!), and that seems to have some clear West-East substructure.

- PC5 seems to reflect a distinction between Yamnaya related ancestry and other Volga-Ural ancestry (with Neolithics falling around 0?) and overlap between NW Europeans and non-Russian Northern Baltic-Slavic peoples.

- PC6 seems to be dominated by a West-East European split that doesn't have much to do with Anatolian/Yamnaya related ancestry, with East-Central European samples having their own distinct position (and West Europeans and Volga-Ural, modern and ancient, sitting together).

Onur Dincer said...

@Slumbery

As for the Khanti-Mansi vs. Bashkir, they do not really cluster. It seems to me that Khanti-Mansi have a considerable recent (later than the Ugric common times) Siberian ancestry which they picked up as they moved to Siberia and the North from the South Ural homeland and assimilated some local HG groups. This extra Siberian ancestry pulls them away from the Bashkirs on PCA despite the Bashkirs drawing ancestry from the South Ural Ugric population.
Also this PCA data is in agreement with that article about the Bashkirs having actual Turkic ancestry. They form a cline towards the Altai samples in multiple dimensions. It is impossible to tell from PCA however whether their Uralic side is Khanti-Mansi related, because the Khanti-Mansi are too strongly affected by their extra Siberian ancestry. In most dimensions they cluster with Ket against other Uralic groups. This reminds me of somebody who claimed on a Hungarian forum that modern Khanti and Mansi are mostly relatively recently assimilated Siberian groups on the periphery.


I always thought that the southern Ural area was home to Indo-European tribes in ancient times and more northern Ural areas where Khanty and Mansi currently live was home to Uralic peoples from time immemorial. Proto-Uralics seem to be a hunter-gatherer population from a northern region around the Ural Mountains (maybe also western Siberia). Proto-Magyars seem to have headed south towards the southern Ural area and come into contact with Indo-European steppe tribes and recently arrived Turkic tribes living there and acquired the steppe culture from them.

Onur Dincer said...

The lack of any Neolithic Anatolian admixture in Khanty and Mansi in contrast to Sintashta and Andronovo peoples and peoples with Sintashta or Andronovo ancestry seems to confirm that they did not come from the southern Ural area.

Slumbery said...

@Onur Dincer

Not very likely, but more importantly you seem to conflate multiple time layers. Regardless of where the Uralic as a whole formed, the Ugric branch probably comes from the Cherkaskul/Mezhovskaja archaeological cultures and those were in the Western Siberia - South Ural region. Not exactly on the Southernmost tip of the Urals, but their name-sites and main territories are way South from the current Khanti-Mansi territory.
And exactly because the Ugric branch shows Indoeuropean-Iranic contact it is likely that they formed in the southern contact region and expanded North later. (That is not to say that the North was not Uralic "since time immemorial", but that is beside the point.)
A migration that placed the ancestral Hungarians more into the Steppe is assumed, but the source region was nowhere near as far North as the NW corner of Siberia.


"The lack of any Neolithic Anatolian admixture in Khanty and Mansi in contrast to Sintashta and Andronovo..."
Again, there is a huge time span. Sintashta is ancient, it was long gone before the Ugric branch formed. A lot happened everywhere. Also if the Khanti-Mansi are mostly later assimilated HG-s that actually had to dilute any EEF. And then there are sampling density issues.

Kristiina said...

Slumbery, I agree with you!

Onur, your idea about hunter-gatherers conquering Indo-Europeans does not make any sense.

How do you fit the findings of the recent paper on Sargat yDNA and mtDNA in your theory? There was 2x R1a1 and 5 x N1c1 without any true Siberian mtDNA. Moreover, Sargat samples lack N1b which accounts for c. 28% in Khanty; and c. 63% in Mansi, of which c. 37% is Eastern N1b-VL67.

Sargat samples do not contain Siberian mtDNA. However, 30% of modern Khanty mtDNA and 41.3% of modern Mansi mtDNA is Siberian/Altaian. It is very easy to explain the rise of Siberian ancestry in Ob-Ugrics with N1b-VL67 and mtDNA such as D4e4, D4j2, D4l2, D5a3, C4a1, C4b, C5b, A, G2a, F1c.

https://anthrogenica.com/showthread.php?97-Genetic-Genealogy-and-Ancient-DNA-in-the-News/page170

https://www.researchgate.net/publication/321071660_Kinship_Analysis_of_Human_Remains_from_the_Sargat_Mounds_Baraba_Forest-Steppe_Western_Siberia

Onur Dincer said...

@Slumbery

Not very likely, but more importantly you seem to conflate multiple time layers. Regardless of where the Uralic as a whole formed, the Ugric branch probably comes from the Cherkaskul/Mezhovskaja archaeological cultures and those were in the Western Siberia - South Ural region. Not exactly on the Southernmost tip of the Urals, but their name-sites and main territories are way South from the current Khanti-Mansi territory.
And exactly because the Ugric branch shows Indoeuropean-Iranic contact it is likely that they formed in the southern contact region and expanded North later. (That is not to say that the North was not Uralic "since time immemorial", but that is beside the point.)
A migration that placed the ancestral Hungarians more into the Steppe is assumed, but the source region was nowhere near as far North as the NW corner of Siberia.


Those are fair points but:

"The lack of any Neolithic Anatolian admixture in Khanty and Mansi in contrast to Sintashta and Andronovo..."
Again, there is a huge time span. Sintashta is ancient, it was long gone before the Ugric branch formed. A lot happened everywhere. Also if the Khanti-Mansi are mostly later assimilated HG-s that actually had to dilute any EEF. And then there are sampling density issues.


Even hugely Turkic admixed steppe populations such as Altaians, Kazakhs and Kyrgyz have EEF admixture, obviously due to their Andronovo-related ancestry (including Scythian-Saka ancestry), but Khanty and Mansi conspicuously lack it, which needs an explanation. That is why I look for their origins in more northern regions than the southern Ural contact zone.

Rob said...

Onur

“I always thought that the southern Ural area was home to Indo-European tribes in ancient times and...”

As in the Kazakh steppe/ Botai people?


Anthro Survey said...

@Simon

Lowland Campanians score like this, too. I've seen their results in other PCAs and they are seemingly more West Asian shifted than Sicilians and definitely more so than any continental European groups. Such a shift is probably a combo of a Roman-age Samaritan-like influx from Syria in addition to an existing Anatolia_BA layer there.

I say *seemingly* because perhaps West Sicilians actually have more post-Neo influence. We simply can't know as we don't have good Roman-age, Fatimid-era DNA from NA or proxies we can be confident in for such populations.

Kristiina said...

@Onur ”Even hugely Turkic admixed steppe populations such as Altaians, Kazakhs and Kyrgyz have EEF admixture, obviously due to their Andronovo-related ancestry (including Scythian-Saka ancestry), but Khanty and Mansi conspicuously lack it, which needs an explanation.”

Khanty and Mansi are very much Western Siberian natives and most of their ancestors probably spoke an extinct Siberian language. They carry a high amount of ANE and EHG as you can see in Fig. 8: http://media.springernature.com/full/springer-static/image/art%3A10.1186%2Fs12863-017-0578-3/MediaObjects/12863_2017_578_Fig8_HTML.gif which shows f3 values used to estimate (a) Eastern European Hunter-Gatherer, (b) Neolithic Farmer, (c) Caucasus hunter-gatherer and (d) Mal’ta (Ancient North Eurasian) ancestry in modern humans. According to the same graph, Turkic-speaking Kyrgyz also lack EEF and CHG, just like Khanty and Mansi, and yet circa 63% of modern Kyrgyz carry R1a1. Moreover, I presume that there was not any EEF in proto-Uralics. Andronovans were not Uralics and therefore they are not relevant.

Anthro Survey said...

Davidski,

I do see something interesting, as a matter of fact.

See that unusual Bosnian sample occupying the space between North Caucasus and Balkan clusters? Any thoughts on it?

Had this been a sample from the early 1900s, it wouldn't be TOO surprising since we could suspect some Circassian pasha as an immediate relative.

Onur Dincer said...

@Kristiina

It is well known that, before the Russian demographic expansion in Siberia during the last couple of centuries, Khanty and Mansi lived in more western areas (including west of the Ural Mountains) than they do today, but I have not seen any reliable evidence of their migration towards the north, at least during historical times.

I do not understand your point by showing those Sargat results, do we have any genomewide autosomal results from them? I see 7 West Eurasian and 1 East Eurasian mtDNA haplogroups among those published Sargat ancient DNA results, but then again, what is your point by showing them?

It is commonly accepted that Proto-Uralics were a hunter-gatherer population based on the reconstructed Proto-Uralic vocabulary and archaeological evidence. Proto-Ugrics were probably reindeer herders as modern Ugric peoples of the Ural area traditionally were, but since modern Ugrics of the Ural area do not show any clear evidence of post-EMBA steppe ancestry, Ugrics of the Ural area probably had very little interaction with steppe peoples after their divergence from other Uralic peoples.

Onur Dincer said...

@Rob

I am not sure about the Botai people, as they are an early people and we have no ancient DNA results from them.

Onur Dincer said...

@Kristiina

I mentioned Andronovans in relation to steppe peoples and to show their contrast with non-steppe peoples such as Khanty and Mansi.

Kyrgyz obviously have some Andronovan-like ancestry, however small, but Khanty and Mansi completely lack it. The high R1a in Kyrgyz is due to drift or founder effect.

Onur Dincer said...

@Anthro Survey

You can see the ADMIXTURE result of that unusual Bosnian sample here:

https://www.researchgate.net/profile/Kristiina_Tambets/publication/264985653/figure/fig2/AS:296007708495873@1447585145141/Figure-2-ADMIXTURE-analysis-of-autosomal-SNPs-of-the-Western-Balkan-region-in-a-global.png

It is from this study:

http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0105090

Kristiina said...

@Onur ”I have not seen any reliable evidence of their migration towards north, at least during the historical times.”

Encyclopaedia Britannica explains that ”Together [Khanty and Mansi] numbered some 30,000 in the late 20th century. They are descended from people from the south Ural steppe who moved into this region about the middle of the 1st millennium ad.” (https://www.britannica.com/topic/Khanty#ref272005)

According to Wikipedia ”In the centuries of the second millennium BC, the territories between Kama and Irtysh rivers were the home of a Proto-Uralic speaking population who had contacts with Proto-Indo-European speakers from the South.[5] The inhabitants of these areas were of Europid stock,[5] although the Khanty are predominantly Uraloid. This woodland population is the ancestor of the modern-day Ugrian inhabitants of Trans-Uralia”.
[5] Wiget, Andrew; Balalaeva, Olga (2011). Khanty, People of the Taiga: Surviving the 20th Century. University of Alaska Press. p. 3.

”but since modern Ugrics of the Ural area do not show any clear evidence of post-EMBA steppe ancestry, Ugrics of the Ural area probably had very little interaction with steppe peoples after their divergence from other Uralic peoples.”

Yes, and so what? I took a look at the admixture graph of ”Extensive farming in Estonia started through a sex-biased migration from the Steppe”, p. 12 (https://www.biorxiv.org/content/biorxiv/suppl/2017/03/02/112714.DC1/112714-1.pdf)
Mansi lack EEF but also Yamnaya Samara and Kalmykia lack EEF, which means that the ”IE” ancestry in Mansi is comparable to the "IE" ancestry in Indians, i.e. it is of Yamnaya and not of Andronovo or Sintashta type. According to that admixture graph, Mansi may carry a significant amount of EBA steppe ancestry.

”I see 7 West Eurasian and 1 East Eurasian mtDNA haplogroups among those published Sargat ancient DNA results, but then again, what is your point by showing them?”

According to Ian Logan's site, C4a2c and C4a2c1 are typical for Pamir. It is not an ancient Siberian haplogroup.

My question is what language do you think that Sargat people spoke?

Matt said...

Looking at the World 20 dimensions here, through neighbour joining and just the West Eurasian populations, actually looks like there's enough structure in it to pick out much more of the structure in West Eurasia than shows up in the Global10: https://imgur.com/a/vnS7q

Should be good for nMonte modeling (scaling might help but even unscaled models seem to work fairly well in nMonte).

Neighbour joining under the West Eurasian tree is qualitatively similar: https://imgur.com/a/XgAFE. Main difference being it captures more of the private drift in some small populations with founder effects or restricted growth - Druze, Ashkenazi, Sardinian, Basque, Roma, etc. That's less important if you're not trying to be 100% sure a sample doesn't share recent ancestry with those groups, or to understand their unique recent evolution. Scaling is probably more important on this one as there are so many more dimensions where a single small group is extreme (and so it's more of a problem to treat all dimensions as being as large as the first dimension).

So I'm pretty impressed.

Rob said...

@ Kristiina

Mansi:
Mansi
"Itelmen" 55.75
"Yamnaya_Samara" 34.05
"Blatterhole_MN" 10.2
d% = 5

Does that result make sense to you ?

Onur Dincer said...

@Kristiina

Encyclopaedia Britannica explains that ”Together [Khanty and Mansi] numbered some 30,000 in the late 20th century. They are descended from people from the south Ural steppe who moved into this region about the middle of the 1st millennium ad.” (https://www.britannica.com/topic/Khanty#ref272005)

According to Wikipedia ”In the centuries of the second millennium BC, the territories between Kama and Irtysh rivers were the home of a Proto-Uralic speaking population who had contacts with Proto-Indo-European speakers from the South.[5] The inhabitants of these areas were of Europid stock,[5] although the Khanty are predominantly Uraloid. This woodland population is the ancestor of the modern-day Ugrian inhabitants of Trans-Uralia”.
[5] Wiget, Andrew; Balalaeva, Olga (2011). Khanty, People of the Taiga: Surviving the 20th Century. University of Alaska Press. p. 3.


Could be. But that is prehistory for that region and there is a lot of room for speculation there. Speculations get stronger when they are backed by genetics.

”but since modern Ugrics of the Ural area do not show any clear evidence of post-EMBA steppe ancestry, Ugrics of the Ural area probably had very little interaction with steppe peoples after their divergence from other Uralic peoples.”

Yes, and so what? I took a look at the admixture graph of ”Extensive farming in Estonia started through a sex-biased migration from the Steppe”, p. 12 (https://www.biorxiv.org/content/biorxiv/suppl/2017/03/02/112714.DC1/112714-1.pdf)
Mansi lack EEF but also Yamnaya Samara and Kalmykia lack EEF, which means that the ”IE” ancestry in Mansi is comparable to the "IE" ancestry in Indians, i.e. it is of Yamnaya and not of Andronovo or Sintashta type. According to that admixture graph, Mansi may carry a significant amount of EBA steppe ancestry.


I was talking about post-EMBA steppe ancestry, I did not say or imply anything about the lack of EMBA steppe ancestry among Ural Ugrics or Proto-Ugrics. Proto-Uralics (including the Proto-Uralic ancestors of Proto-Ugrics) were obviously interacting with early steppe IE peoples as is clear from the reconstructed Proto-Uralic vocabulary and archaeology. My point is that Ugrics formed and evolved away from the IE contact zone in the Ural area.

According to Ian Logan's site, C4a2c and C4a2c1 are typical for Pamir. It is not an ancient Siberian haplogroup.

So what?

My question is what language do you think that Sargat people spoke?

Almost certainly a non-Ugric language.

Anthro Survey said...

@Onur Dincer

They didn't try to offer an explanation for the result, it looks like. :-(

By the way, I've also considered Roma ancestry in the individual. It's just that Roma presence is notably rare in territories west of the former First Bulgarian Empire and Bosniaks tend to have strong opinions when it comes to them. Now, the Romanian outliers in the ADMIXTURE run are clearly Roma, though.

I'll take a more in-depth look later. So far, though, I've busted out nMonte and ran the distances. The sample is GSM1424650, it looks like, and is significantly closer to all 3 Roma samples in the dataset than other Bosniaks.

How do you explain it?

Onur Dincer said...

@Rob

Mansi:
Mansi
"Itelmen" 55.75
"Yamnaya_Samara" 34.05
"Blatterhole_MN" 10.2
d% = 5

Does that result make sense to you ?


No. The distance is too high anyway.

Rob said...

@ Onur

I think you’re confusing decimal points
Pretty sure that’s on target. Anything smaller is overfitted

Anthro Survey said...

What the heck? I did a preliminary run to test the waters at 20D(lol!).

[1] "distance%=4.1777 / distance=0.041777"

Bosnian_1_GSM1424650

GS000014325 34.45
Bosnian_10_GSM1424651 27.25
Bosnian_11_GSM1424652 13.20
Bosnian_5_GSM1424661 11.75
Bosnian_3_GSM1424659 8.70
Bosnian_7_GSM1424663 1.75
Bosnian_16_GSM1424657 1.70

GS....14325 is one of the Roma samples.
A distance of 0.04 would be terrible on Global10, but it actually indicates a good fit in this case because it's pretty similar to the average inter-sample distance within the Bosniak cluster.
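For readers wondering what the distance figure measures: a fit like the one above comes down to finding mixture weights that minimise the Euclidean distance between the target and the weighted average of the source samples in PCA space, with distance% being that distance times 100. The Python sketch below is not nMonte's algorithm; it swaps in a plain exhaustive grid search in 5% steps, and the function name and toy coordinates are assumptions made for illustration.

```python
import itertools
import numpy as np

def best_mixture(target, sources, step=0.05):
    """Grid-search mixture weights (multiples of step, summing to 1) that minimise
    the Euclidean distance between target and the weighted average of sources."""
    target = np.asarray(target, dtype=float)
    sources = np.asarray(sources, dtype=float)
    units = round(1 / step)
    best_weights, best_distance = None, np.inf
    # Enumerate weights for all but the last source; the last one takes the remainder
    for combo in itertools.product(range(units + 1), repeat=len(sources) - 1):
        remainder = units - sum(combo)
        if remainder < 0:
            continue
        weights = np.array(combo + (remainder,), dtype=float) / units
        distance = np.linalg.norm(weights @ sources - target)
        if distance < best_distance:
            best_weights, best_distance = weights, distance
    return best_weights, float(best_distance)

# Toy example: three invented sources in two dimensions and a target that is
# exactly 50% / 30% / 20% of them
toy_sources = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
toy_weights, toy_distance = best_mixture([0.3, 0.2], toy_sources)
```

Whether a given distance counts as a good fit depends on the scale of the coordinates being used, which is why the same number can be terrible on one PCA and acceptable on another.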

Onur Dincer said...

@Anthro Survey

Not sure how to explain it in the absence of proper South Asian or Roma populations in that ADMIXTURE run. I would be interested to see the results of your nMonte analyses of that sample though.

Onur Dincer said...

@Rob

0.5% would be a close distance, not 5%.

Rob said...

Ah the mistake was mine. Yes it was 0.5% indeed.
So the result makes sense to me, EMBA steppe with minimal EEF + some sort of paleo-Siberian for Mansi

Onur Dincer said...

@Rob

Ah the mistake was mine. Yes it was 0.5% indeed.

Then it is a good fit, but:

So the result makes sense to me, EMBA steppe with minimal EEF + some sort of paleo-Siberian for Mansi

Why represent the East Eurasian ancestry of Mansi with a pretty distant population such as Itelmen when there is the far more representative Nganasan population?

Rob said...

@ Onur

"Why represent the East Eurasian ancestry of Mansi with a pretty distant population such as Itelmen when there is the far more representative Nganasan population?"

According to who ?
Anyhow, switching to Nganasan made little difference - they both represent the same LNBA radiation from Siberia.

Mansi
"Nganasan" 50.5
"Yamnaya_Samara" 40.25
"Blatterhole_MN" 9.25

Matt said...

Few nMonte runs based on the raw Ancient 67 World PCA dimensions with European targets, and ancient samples and outgroups as data:

No distance penalty: https://pastebin.com/zwSsqaCv
Distance penalty: https://pastebin.com/8FF65cJv

Populations in Europe get the regional ancestors that make sense (e.g. modern England is mostly RomanBritain+NordicIA/AngloSaxon and the rest is largely some composite of IberianLNBA populations and Steppe ancestry, which all makes sense, and so on for other pops). Some low levels of extra admixture coming in from world populations, but that may just be because the unscaled PCA is not representing the full population distances.

(Note in the Finnish example the new distance penalty feature has the side effect that real distant admixtures, like the Finnish's Siberian ancestry, seem to be removed.)

Davidski said...

@Onur, Erin & Shaikorth

Yes, the World PCA isn't showing Eurasian vs Sub-Saharan differentiation in Eigenvector 1 because the dataset has many more Eurasians than Africans.

However, it's still useful to have the Sub-Saharan Africans on the plot, because they help to flesh out the extra substructures in West Eurasia caused by recent Sub-Saharan admixture there.

@Anthro Survey

That outlier Bosnian sample has recent West Asian admixture, but I haven't tried to pinpoint its precise source. You can probably figure that out though, by looking at the datasheet to see which sample it is, and then analyzing his/her recent ancestry with this cM matrix.

http://eurogenes.blogspot.com/2017/09/ancient-ibdcm-matrix-analysis-offer.html

Onur Dincer said...

@Rob

Interesting nMonte result.

David, can you test using formal methods whether Khanty and Mansi have actual EEF ancestry and its levels?

@Matt

Excellent work with nMonte!

Chad Rohlfsen said...

Yamnaya and Mansi have EEF/Anatolian. That is a certainty.

Onur Dincer said...

@Chad

Yamnaya and Mansi have EEF/Anatolian. That is a certainty.

Yes, Yamnaya seem to have some EEF admixture according to formal analyses, not sure about Khanty and Mansi though. But since Khanty and Mansi seem to have some Yamnaya-related ancestry, they should be expected to have some EEF ancestry too.

Davidski said...

@Matt

I've added a folder with eigenvalues to the datasheets zip file.

https://drive.google.com/file/d/1PYpUj_DHf-lPMJnZGVgzq07ZFcPomOwB/view?usp=sharing

Kristiina said...

Thanks Rob! That makes a lot of sense to me! Yamnaya Samara percentage, 34.05, is indeed very high.

@Onur ”It is commonly accepted that Proto-Uralics were a hunter-gatherer population based on the reconstructed Proto-Uralic vocabulary and archaeological evidence.”

Jaakko Häkkinen has reconstructed two words for metals *wäśka and *äsa and some agricultural words, *oxči (sheep), *woxji (butter), *šeŋti (wheat/barley) and *puśnV (flour), to Proto-Uralic (https://tuhat.halvi.helsinki.fi/portal/fi/persons/jaakko-hakkinen%286e21403c-6ff1-4ba4-a0db-d868bf394c97%29/publications.html).

This means that these words show regular sound correspondences in the daughter languages and existed already in Proto-Uralic. There are also words such as sata, ”hundred” that are reconstructed to Proto-Uralic. This shows that Proto-Uralic speakers represented the modern BA culture. Moreover, the Mansi ethnonym 'Mansi' is linked with the word for man in Old Indian ”manuṣya" and Avestan "manuš".

From the archaeological point of view you should read Asko Parpola’s article ”The problem of Samoyed origins in the light of archaeology: On the formation and dispersal of East Uralic (Proto-Ugro-Samoyed)” (http://www.sgr.fi/sust/sust264/sust264_parpola.pdf) to get more perspective.

Parpola has written the article without any genetic data, and therefore, we have to fit his views with the results of ancient DNA.

He writes about Proto-Hungarian that ”the local Gorokhovo people began the practice of mobile pastoral herding and then became part of the multicomponent pastoralist Sargat culture (c. 500 BCE to 300 CE), which in a broader sense comprised all cultural groups between the Tobol and Irtysh rivers, succeeding here the Sargary culture. The Sargat intercommunity was dominated by steppe nomads belonging to the Iranian-speaking Saka confederation, who in the summer migrated northwards to the forest steppe. A leading Hungarian archaeologist happily supports the following correlation with Proto-Hungarian: “Most scholars of western Siberian archaeology agree that the Sargatka culture can be plausibly identified with the proto-Hungarians”.

This fits perfectly well with the Sargat yDNA which is R1a1 and N1c.

As for Proto-Khanty, he writes that ”Proto-Khanty may have been spoken in the Late Bronze Age and Early Iron Age cultures related to the Gamayunskoe and Itkul’ cultures that extended up to the Ob: the Nosilovo, Baitovo, Late Irmen’, and Krasnoozero cultures (c. 900–500 BCE). Some of these were in contact with the Akhmylovo of the Mid-Volga. All these cultures of the forest steppe were later absorbed into the Sargat culture discussed below (Parzinger 2006: 545–564, 679–681).”

As for Proto-Mansi, he writes that ”The Mezhovka culture was succeeded by the genetically related Gamayunskoe culture (c. 1000–700 BCE) (Parzinger 2006: 446; 542–545). From Gamayunskoe descended the Itkul’ culture (c. 700–200 BCE), which was distributed along the eastern slope of the Ural Mountains (Parzinger 2006: 552–556). Known for its walled forts, it constituted the principal Trans-Uralian centre of metallurgy in the Iron Age, and was in contact with both the Anan’ino and Akhmylovo cultures (the metallurgical centres of the Mid-Volga and Kama-Belaya region) and the neighbouring Gorokhovo culture.”

From the genetic point of view, it is significant that Sargat men were R1a1 and N1c-Z1936 (possibly Ugric-specific L1034). N1c-Z1936 has been considered the main vector of the expansion of Uralic languages as it has the widest distribution of all N yDNA in the Uralic groups. Therefore, this new paper supports this view, and it is important that this paper also suggests that N1c-Z1936 came from the south and was not autosomally Siberian as there is no Siberian mtDNA.

By the way, Mezhovska samples are R1b1a2-PF6494 (RISE524) and R1a1a1b-Z649 (RISE525), and c. 7% of modern Mansi carry R1b and 5% R1a1 and c. 14% Ugric-specific N1c-L1034.

Matt said...

@Davidski, thanks for that.

Few basic graphics using the scaled World datasheet over all 20 dimensions:

Neighbour Joining Tree: https://imgur.com/a/PJJz7
Euclidean Distance Comparisons :

Yamnaya_Kalmykia vs Sweden_MN: https://i.imgur.com/xmNDNzx.png. (Note that the Northeast Europeans and even Lezgins and Tajiks come out slightly closer to Yamnaya_Kalmykia than NW Europeans do (as would make sense), despite the NJ tree above placing NW Europeans closer to the phylogeny to Yamnaya. This is because of a bridging effect where NW Europeans are related to Iron and Bronze Age Scandinavian, British Isles and Irish, who are in turn most related to Sintashta and Andronovo, who are in turn related to Yamnaya. Once the Baltic Bronze Age and other samples are available at similar quality, the bridge will likely shift to NE Europe).

Further distances here: https://imgur.com/a/PJJz7. It seems like there is enough structure in these dimensions to find slight disequilibria where a) NW Europeans (English, Scottish, Norwegian) seem as roughly related to Loschbour/Bichon/La Brana as NE Europe (Lithuanians, Latvians) and less related to Hungary_HG / Motala_HG than the Balts/Polish, b) West Europeans relatively more related to Iberia_EN, East to Hungary_CA (at a very subtle level).

Will have a go with nMonte a bit later.

huijbregts said...

@Matt
The removal of Finnish/Siberian admixtures in nMonte3 may be unwanted, but it is not a side effect.
In machine learning it is a standard practice to reduce the effects of overfitting by penalizing some feature of the model, preferably in combination with crossvalidation.
Indeed, the Sangarius weighting we have discussed in the past, is a simple scheme to penalize the admixtures of high K dimensions.
The main feature of nMonte3 is the penalizing of reference populations with a great distance to the target population. I think this is useful for modern populations with much variance, like Eurogenes North-European or LukaszM K36.
Penalizing distant populations is risky when you are targetting ancient populations (although I did see a quite acceptable model of Ballynahatty).
It is up to the knowledgeable human user to judge whether a specific variant of penalizing is OK.
If you are interested in Finnish/Siberian admixtures, you should switch the penalty feature off.
A few other remarks:
- If the distance to the closest population is exceptionally small, distance penalizing will result in a nearly 100% admixture for this population.
- The other feature of nMonte3 is that it runs individual samples and aggregates afterwards. This is better than using population averages. You can do this yourself manually, but why.
- I do not agree that the PCA should be scaled. You don't calculate the distance from Paris to Moscow, while scaling the distances by the country you are crossing.
- I too have wondered what might be in PC11-PC20.
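
For anyone who wants to see the effect of a distance penalty in isolation, here is a toy sketch. It is not nMonte3's actual code: the function `fit_mixture`, its parameters, and the form of the penalty (mixture weights times squared distance of each reference to the target) are all assumptions made for illustration.

```python
import numpy as np

def fit_mixture(target, refs, penalty=0.0, slots=200, cycles=5000, seed=0):
    """Toy Monte Carlo mixture fit; returns (weights, distance to target)."""
    rng = np.random.default_rng(seed)
    target = np.asarray(target, dtype=float)
    refs = np.asarray(refs, dtype=float)
    # Squared distance of every reference to the target (used by the assumed penalty)
    ref_d2 = ((refs - target) ** 2).sum(axis=1)

    def objective(counts):
        w = counts / counts.sum()
        mix = w @ refs
        return ((mix - target) ** 2).sum() + penalty * (w @ ref_d2)

    # Start from a random assignment of slots to references
    start = rng.integers(0, len(refs), size=slots)
    counts = np.bincount(start, minlength=len(refs)).astype(float)
    best = objective(counts)
    for _ in range(cycles):
        src, dst = rng.integers(0, len(refs), size=2)
        if src == dst or counts[src] == 0:
            continue
        # Move one slot from src to dst and keep the move only if it helps
        counts[src] -= 1
        counts[dst] += 1
        trial = objective(counts)
        if trial < best:
            best = trial
        else:
            counts[src] += 1
            counts[dst] -= 1
    w = counts / counts.sum()
    return w, float(np.sqrt(((w @ refs - target) ** 2).sum()))

# Hypothetical references: a near one, a far one, and an unrelated one
refs = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
target = np.array([0.1, 0.0])  # 90% near + 10% far by construction

w_plain, d_plain = fit_mixture(target, refs, penalty=0.0)
w_pen, d_pen = fit_mixture(target, refs, penalty=1.0)
print("no penalty:  ", w_plain.round(3), round(d_plain, 4))
print("with penalty:", w_pen.round(3), round(d_pen, 4))
```

In this toy setup the real 10% contribution from the far reference is recovered without the penalty and suppressed with it, at the cost of a larger distance. That is the same trade-off as the Finnish/Siberian case, though the size of the effect depends entirely on the assumed penalty strength.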

Matt said...

@All, nMonte3 outputs on the scaled Ancient 67 World 20 dimensions: https://pastebin.com/z65skAqP

@huijbregts: Yes sure, perhaps we don't wish to talk about it as a side effect. It was merely something that I hadn't considered as an issue of using distance penalization until I actually used the feature and thought worth mentioning to others in this comment thread.

huijbregts: I do not agree that the PCA should be scaled. You don't calculate the distance from Paris to Moscow, while scaling the distances by the country you are crossing.

Let's say you had a series of cities across the world, and a distance matrix giving distances between them. You wanted to transform that matrix into an abstract set of dimensions describing those distances, so you use Principal Coordinates Analysis to do so. We'd expect the two dimensions output to be equivalent to latitude and longitude.

Now, let's say those cities were much more spatially compressed on latitude than longitude. The algorithm should represent this, and preserve the true distance, by scaling the latitude dimension to a smaller eigenvalue. If it did not eigenvalue scale the dimensions, then when deriving distances back from the output dimensions, you would find distances were relatively inflated for cities which vary in latitude, compared to the ground truth.

In an extreme case, if your cities varied 1 mile in latitude and 100 miles on longitude, you would find that, if the longitude and latitude dimensions were scaled to be the same magnitude, you would predict that very close pairs were considered as distant as very distant pairs. (This could be a problem if you had some algorithm that was built to try and use distance minimization to represent the position of a city as a linear combination of other cities in your data!)

Does this make sense as to why I would find this undesirable? I am not trying to further scale an already scaled output, I am adding eigenvalue scaling into an unscaled output in which, for example, dimension 20 is the exact same size as dimension 1.

If I were given the distance between Moscow and Paris in two abstract dimensions which were scaled such that Moscow=1,1 and Paris=0,0, then of course I would want to scale the dimensions before working out the real distance between them.
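
The city analogy can be checked with a few lines of NumPy. This is just a sketch with four made-up cities on a 100 mile by 1 mile rectangle; the variable names are mine. The two axes are uncorrelated here, so the per-axis variances play the role of the eigenvalues.

```python
import numpy as np

def pairwise(coords):
    """Matrix of Euclidean distances between all rows."""
    diffs = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=-1))

# Columns: miles east (longitude-like), miles north (latitude-like)
cities = np.array([[0.0, 0.0], [100.0, 0.0], [0.0, 1.0], [100.0, 1.0]])
centered = cities - cities.mean(axis=0)

eigenvalues = centered.var(axis=0)          # variance along each (uncorrelated) axis
unit = centered / np.sqrt(eigenvalues)      # every axis forced to the same magnitude
rescaled = unit * np.sqrt(eigenvalues)      # eigenvalue scaling applied again

true_d = pairwise(cities)
unit_d = pairwise(unit)
rescaled_d = pairwise(rescaled)

print("true:      100-mile pair", true_d[0, 1], "| 1-mile pair", true_d[0, 2])
print("equal axes: 100-mile pair", unit_d[0, 1], "| 1-mile pair", unit_d[0, 2])
print("rescaled matches truth:", np.allclose(rescaled_d, true_d))
```

With equal-magnitude axes the pair one mile apart looks exactly as distant as the pair 100 miles apart, and multiplying by the square root of each eigenvalue restores the original distances.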

PCA software can output dimensions with or without eigenvalue scaling. Whatever Davidski has used here has output the dimensions without that scaling as the default. All the dimensions are the same magnitude.

The other feature of nMonte3 is that it runs individual samples and aggregates afterwards. This is better than using population averages. You can do this yourself manually, but why.

I can think of some reasons, when using populations dispersed in dimensions (for transparency of what the admixture actually represents), when using overlapping populations, and when using populations with no equal sample size. But I am happy to be using the post-workflow aggregation, which is what I've used in the models here.
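
For reference, the post-workflow aggregation can be as simple as summing per-sample percentages by population label. A minimal sketch, using the Bosnian_1_GSM1424650 output posted earlier in this thread; `aggregate_by_population` is a hypothetical helper, and the sample-to-population mapping is my own labelling (GS000014325 was identified above as one of the Roma samples).

```python
from collections import defaultdict

def aggregate_by_population(sample_weights, sample_to_pop):
    """Sum per-sample mixture percentages into per-population totals."""
    totals = defaultdict(float)
    for sample, weight in sample_weights.items():
        totals[sample_to_pop[sample]] += weight
    # Largest contribution first, like nMonte's own listing
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))

# Per-sample percentages copied from the run posted earlier in the thread
sample_weights = {
    "GS000014325": 34.45,
    "Bosnian_10_GSM1424651": 27.25,
    "Bosnian_11_GSM1424652": 13.20,
    "Bosnian_5_GSM1424661": 11.75,
    "Bosnian_3_GSM1424659": 8.70,
    "Bosnian_7_GSM1424663": 1.75,
    "Bosnian_16_GSM1424657": 1.70,
}
# Assumed labels: anything starting with "Bosnian" is Bosnian, the GS sample is Roma
sample_to_pop = {
    name: ("Bosnian" if name.startswith("Bosnian") else "Roma")
    for name in sample_weights
}

result = aggregate_by_population(sample_weights, sample_to_pop)
for pop, pct in result.items():
    print(pop, round(pct, 2))
```

Aggregating after the fit keeps the individual samples visible, which helps when a population is dispersed or overlaps with another, and the totals are not affected by unequal sample sizes in the way a pre-computed average would be.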

huijbregts said...

@Matt
I doubt that you are right on scaling. Eigenvalues are heavily dependent on sampling density, so eigenvalue scaling would penalize high frequency populations.

Matt said...

Of course, I equally strongly doubt you're right. The eigenvector scaling more closely recapitulates formal population differentiation statistics of one kind or another (allele sharing, fst, f3, etc.). The unscaled matrix does not.

In my experience, I'd also say you're exactly wrong about which populations are "penalized"; eigenvalue scaling "penalizes" (reduces distance) for low frequency populations that form a distinct dimension of variation more. For example, scaling the Kalash dimension that they alone score highly on such that it is *not* exactly the same size as the African-Eurasian or West-East Eurasian dimension is obviously going to decrease their relative distance from other Eurasian peoples compared to the unscaled scenario in which high dimensions are treated of equal size to the lower. It's not the most high frequency populations which are "penalized" it is those who are less genetically differentiated in reality.

Qagan said...

Dear Davidski,

Interesting PCA results. I have one curious question. Please bear with me as I am pretty much a layman in genetics.

Shouldn't Amerindians be more Western Eurasian-shifted in the first PCA (PCA world) than their positions considering that Amerindians are genetically around 35-45% ANE (please correct me if I am wrong)? Furthermore, I heard ANE is a lot closer to Western Eurasians than to Eastern Eurasians. (Again, correct me if I am wrong) If that's the case, shouldn't Amerindians be located in similar positions to populations like Kirghiz, Khakass, Altaian, some Kazakhs, etc. than their current positions in terms of Western-shifted ancestry in the first PCA/PCA 67 World?

Please kindly answer my question in layman terms regarding this as I am a pretty much beginner in population genetics.

Thank you very much.


huijbregts said...

@Matt
I am not a mathematician and I thought neither are you. So let's be careful about our assumptions.
My understanding is that PCA software can present its output in one of two ways.
In the first way (my preference) the columns of the score matrix have a variance which is equal to the eigenvalue.
The second way (your preference) is called 'eigenvector scaling'. Here the columns of the score matrix are divided by the respective eigenvalues, which sets the variance to 1.
Both of these representations are mathematically correct, but in further processing you have to treat them differently.
Now keep in mind that we are handling genealogical data, which are ultimately derived from numbers of haplotype differences and which are mixed with some degree of noise.
Now I have two assertions:
1. When the data are eigenvalue scaled, the highest dimensions (which are the most noisy) are more inflated than the lower dimensions. This inflates the overall noise in the matrix.
2. To be useful, the Euclidean distance should approximate a measure of the number of haplotype differences. In the unscaled way, it does so. But the Euclidean distance of scaled data is an approximation of the scaled number of differences, which is completely useless.
Based on these arguments, I think that I am on solid ground in preferring the unscaled data.
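
The two output conventions are easy to compare numerically. The sketch below uses random toy data and plain SVD, not any particular PCA package, and the variable names are assumptions. One detail worth noting: it is division by the square root of the eigenvalue, not by the eigenvalue itself, that brings each column to unit variance.

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy data: 50 samples, 3 features with very different spreads
data = rng.normal(size=(50, 3)) * np.array([5.0, 2.0, 0.5])
centered = data - data.mean(axis=0)

# PCA via SVD of the centered data matrix
u, s, vt = np.linalg.svd(centered, full_matrices=False)
eigenvalues = s ** 2 / (len(data) - 1)

scores = u * s                                # convention 1: column variance == eigenvalue
unit_scores = scores / np.sqrt(eigenvalues)   # convention 2: column variance == 1

print("eigenvalues:         ", eigenvalues.round(3))
print("variance, scores:    ", scores.var(axis=0, ddof=1).round(3))
print("variance, unit form: ", unit_scores.var(axis=0, ddof=1).round(3))

# Going back from convention 2 to convention 1 is a multiplication by sqrt(eigenvalue)
restored = unit_scores * np.sqrt(eigenvalues)
print("round trip ok:", np.allclose(restored, scores))
```

Which of the two layouts a given datasheet is in can be checked the same way, by looking at the per-column variances.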

Ryan said...

I would love to see this annotated with what each of the clines and vertices represents.

Matt said...

Some more nMonte fits using Ancient 67 World 20 scaled:

European population averages using all pre-Bronze Age and outgroups: https://pastebin.com/BMbZLxhY

"Simple" European population averages using all Steppe_EMBA, Anatolia_N, CHG, WHG, SHG and outgroups: https://pastebin.com/bAfFBU0u

Graphics for the simple fits: https://imgur.com/a/Gtryg

MomOfZoha said...

@Davidski:
"Some of these Iranians have minor Sub-Saharan ancestry, so some might also have Arabian ancestry. Not sure which part of Iran they were sampled at though?"

Abadani people could be relatively closer to Arabs, with or without elevated Sub-Saharan ancestry (given that Arabs too are not monolithic), just from geography.

Going east from the port Bandar-i Abbas towards Bandar Beheshti one might see elevated Sub-Saharan ancestry too, due to the historic movements -- willing or unwilling -- of Bantu peoples ancestral to the Siddi of Pakistan and India today.

Then again, within the very same country Iran, one may also find descendants of Georgians and Armenians along the Caucasus border, Turkmen along the Turkmenistan border, and Tajiks along the Afghan border too. Not to mention every combination thereof, whether due to admixtures or common-ancestor origination (chicken or egg)...

Hence, it is not surprising that the commonality concerning Iranians throughout all three graphs is that they are very spread out. Also, Iranians seem to sprinkle the vast space between the Pakistan-India-Afghanistan-Tajikistan people east of Iran and the people west of Iran including Caucasus-Turkey-Yemen (though not a cluster). Eh, geography...

Onur Dincer said...

@Kristiina

Thank you for the links. I have now read Parpola's article. Not sure which of Hakkinen's articles you meant, so I have not read his articles on that link, most of which are in Finnish anyway. I cannot say I am convinced by your references from Hakkinen about the existence of an agricultural economy among Proto-Uralics. Proto-Uralics were probably Neolithic hunter-gatherers (including fishers).

https://encyclopedia2.thefreedictionary.com/Volosovo+Culture

I have already mentioned the existence of words of IE origin in Proto-Uralic.

I could not find anything contradicting my arguments in Parpola's article. Some Uralic peoples lived in the IE contact zone in the forest steppe areas, I have never disputed this, nor do I dispute Khanty and Mansi peoples having post-Proto-Ugric admixture from more East Eurasian-derived peoples (Uralic or not).

Also, this is what Parpola says in regard to Sargat: "The Sargat intercommunity was dominated by steppe nomads belonging to the Iranian-speaking Saka confederation, who in the summer migrated northwards to the forest steppe."

Still, you jump to quick conclusions about ancient autosomal results based on a few ancient haplogroup results. The Sargat people might well have had some Siberian type East Eurasian admixture (probably derived from the Uralic peoples they absorbed or incorporated).

By the way, I had sent you an email a while ago, but you have not replied to it.

Samuel Andrews said...

Spoiler Alert, Northern Bell Beaker's farmer ancestor was Funnel Beaker. No way was it Globular Amphora. No way. That's my opinion based on mtDNA. My blog will be up in a few days.

Samuel Andrews said...

Some H2a is from the Steppe, some from EEF. H2a2 is from EEF and was probably popular in Funnel Beaker. H2a1 is from the Steppe.

Anthro Survey said...

@Onur Dincer @Davidski

After looking into it, it's pretty clear the Bosnian outlier does have Roma ancestry, after all, not West Asian. (Btw, Onur, that ADMIXTURE graph does have proper South Asian samples and presence of the modal component in the Bosnian is what made me suspect Roma ancestry).

In fact, this sample gets the highest cM sharing with all 3 Roma samples on that datasheet(thanks Dave)----by miles. So much so that the individual must have a Roma parent. Unsurprisingly enough, the Roma sample he shares most cM with is also the one nMonte selected to model him.

The plot thickens, though. The second highest sharing sample (but not quite as high) is a Peloponnesian Greek individual to whom I didn't pay much attention earlier. After a very cursory glance I made a hasty assumption he has ancestry from the 1920s' population exchange.
He is GreecePelop6. Once again, nMonte selected the highest-sharing Roma for him.
Rumors about Georgios Karaiskakis, one of the Greek Revolution's heroes, having Roma ancestry might not be so far-fetched, it seems.

I then examined the two Bulgarian outliers on the PCA: Bulgarian12H and Bulgarian10H. Sure enough, they take a decisive 3rd place on that matrix. Their sharing is significantly higher than other samples, but surely more modest than the Bosnian. Adding the 3 Roma into the input didn't result in a drastic improvement in the fit.

Some of Monte's highlights are shown below. Note again that a distance of 0.05 and below is a pretty decent fit in 20D as it's close to the distance between two samples in an average cluster.

Bosnian with Roma:
[1] "distance%=4.1777 / distance=0.041777"
Bosnian_1_GSM1424650
GS000014325 34.45
Bosnian_10_GSM1424651 27.25
Bosnian_11_GSM1424652 13.20
Bosnian_5_GSM1424661 11.75
Bosnian_3_GSM1424659 8.70


Bosnian without Roma, but with various North Caucasus samples:
[1] "distance%=10.2317 / distance=0.102317"
Bosnian_1_GSM1424650
Bosnian_6_GSM1424662 63.9
Bosnian_11_GSM1424652 12.2
Bosnian_4_GSM1424660 12.2
HGDP01403 11.8

Greek with Roma inputs(GS14325):
[1] "distance%=2.8508 / distance=0.028508"
GreecePelop6
GreecePelop8 34.70
GreecePelop3 34.10
GS000014325 18.65
GreecePelop5 11.10
GreecePelop7 1.45

Greek without Roma:
[1] "distance%=6.2836 / distance=0.062836"
GreecePelop6
GreecePelop8 77.6
GreecePelop4 17.4
GreecePelop3 3.9
GreecePelop5 1.1

Bulgarian with and without Roma:
[1] "distance%=2.8877 / distance=0.028877"
Bulgarian12H
Bulgaria33 67.25
GS000014325 13.40
Bulgarian17H 7.75
Bulgarian16H 7.40
Bulgarian18H 4.20
--------------------
[1] "distance%=4.882 / distance=0.04882"
Bulgarian12H
Bulgaria33 68.6
Bulgarian6H 15.4
Bulgarian7H 8.0
Bulgarian5H 4.2

Rob said...

@ Sam

"Spoiler Alert, Northern Bell Beaker's farmer ancestor was Funnel Beaker. No way was it Globular Amphora. No way. That's my opinion based on mtDNA. My blog will be up in a few days"

That would be a great find if true, and is somewhat expected viz. archaeology. How sure are you though, given that there are only 6 GAC mtDNAs ?

Davidski said...

@Qagan

The plot positions of the individuals and populations, and the resulting clusters and clines, that you're seeing on these PCA reflect pairwise genetic relationships between all of the samples, and all of the things that this entails.

So they aren't just the result of certain levels of ancient ancestral components, but also ancient and recent demographic events, like, for example, rapid expansions of small founder populations, and resulting genetic drift.

Such relatively recent genetic drift can be so extreme that it can dominate certain dimensions of the PCA, and completely mask more ancient relationships, especially when some populations are oversampled relative to others.

This is essentially why many Amerindians are being pushed so far to the left in Eigenvector 1 on the Eurasia & Americas PCA, despite their ancient West Eurasian ancestry.

However, looking at more dimensions than just two or three, which is all that we can plot visually, by using them to model ancestry proportions, is likely to reveal the western shift in Amerindians compared to East Eurasians. That's because we'd be using dimensions in which the Amerindian-specific genetic drift has very little or no impact.

But I've done PCA in the past in which Amerindians appear significantly West Eurasian in the first two dimensions, and that's because I used only one Amerindian sample in each run. See here...

http://eurogenes.blogspot.com/2016/09/the-eurasians-idiots-guide.html

Matt said...

@huijbregts: In the first way (my preference) the columns of the score matrix have a variance which is equal to the eigenvalue.
The second way (your preference) is called 'eigenvector scaling'. Here the columns of the score matrix are divided by the respective eigenvalues, which sets the variance to 1.


To be clear, the scaling I am using is to multiply the "columns of the score matrix" (which in the raw data each account for an equal amount of the total variance) by the square root of the eigenvalue. This is the same operation that the eigenvalue scale option performs in PAST3 when set on PCoA (it takes dimensions that each account for an equal amount of the variance and multiplies each by the square root of its eigenvalue).

The procedure you describe as "your preference" (dividing each column by the eigenvalue) is not what I'm doing. I'm not sure where you've got this idea from because it's not what I've described ITT or elsewhere (and I'm further not sure why anyone would do it since it would make the lower dimensions account for a systematically smaller amount of the variance in a way that directly distorts the ground truth).

Samuel Andrews said...

@Rob,
"That would be a great find if true, and is somewhat expected viz. archaeology. How sure are you though, given that there are only 6 GAC mtDNAs ?"

I'm pretty sure. For two reasons. First, Irish share multiple recent links with Scandinavians including mHGs that have already been found in Funnel Beaker remains. Second, I'm confident Poles & Russians' main farmer ancestor is Globular Amphora and several young EEF-derived mHGs in eastern Europe are not found in Northwestern Europe.

But I'm open to being wrong. Modern mtDNA has been misleading before.

Matt said...

@ huijbregts: I mean, if this helps -

https://folk.uio.no/ohammer/past/multivar.html

This is the procedure I'm following A: "Principal Coordinates ... The "Eigenvalue scaling" option scales each axis using the square root of the eigenvalue (recommended)."

I'm not following B: Principal components analysis ... If the "Eigenval scale" is ticked, the data points will be scaled by 1/sqrt(dk), and the biplot eigenvectors by sqrt(dk) - this is the correlation biplot of Legendre & Legendre (1998). (This is the procedure that puts the data in the state that the datasheet is in, with each dimension accounting for an equal share of variance.)

Onur Dincer said...

@Anthro Survey

No, that ADMIXTURE graph does not include any proper South Asian population, populations such as the Baloch, Brahui, Pashtuns/Pathans and Kalash are not proper South Asians, they are South Central Asians. By "proper South Asian" I mean populations with medium to high ASI levels such as the Punjabi, Gujarati, Bengali, Tamils and Paniya, no such population exists in that ADMIXTURE analysis, as a result their "South Asian" component has little ASI influence and should be treated as South Central Asian. Non-Iranian Near Eastern and Caucasus populations have no or almost no proper South Asian ancestry, so the component that is modal among the South Central Asian populations is obviously not a proper South Asian component. You should keep these in mind in your future analyses.

Kristiina said...

@Onur
I have not received your email, so please send it again.

You are free to keep your opinions, but the difference between us is that I have adduced a lot of evidence in support of my views and you have adduced none.

In any case, I believe in plurality of cultures and languages which gives room to many ethnic identities that have interacted, merged and disappeared in prehistory.

Häkkinen's article is in Finnish: Kantauralin ajoitus ja paikannus: perustelut puntarissa.

Anthro Survey said...

@Onur Dincer

At first I was not entirely sure what you meant since many do not make this distinction and it does tend to vary. Your point is valid and I make such a distinction myself(referring to everyone east and inclusive of Punjabis & Sindhis), actually. Nevertheless, it doesn't take away from my suspicion---strengthened by cM sharing data and Monte---being fully justified.

In this case, it's a composite component containing alleles related to Iran_N, ASI and steppe-related ancestries, proportions fluctuating somewhat depending on population (e.g. it would be essentially ASI-free in the context of Armenians or Iranians).

Little ASI, yes, but NOT trivial by any means.
For one, we know the 4 populations of interest contain ASI from other analyses, and it's the only component where ASI would reside at that K in the Makrani and Brahui(if you look at the expanded ADMIXTURE results in S1). Note also how the "yellow" ceases to persist after K=5 in Brahui & Makrani and markedly reduces in the Pathans and Burusho.

It is almost a certainty that had a low-caste North Indian population been included, we'd see overlap with these 4 populations at low Ks as they do in virtually all other ADMIXTURE runs---mainly due to a shared stream of SA-specific Iran_N-like ancestry(but also ASI).

Now, had recent West Asian ancestry been the culprit, as opposed to SA-related, chances of seeing such a big piece in the Bosnian would have been slim-to-none. Instead, we'd expect to get higher-than-normal pink and perhaps 1/6-1/3 as much of the green.
Taken together with those two Romanians in the run behaving similarly and Romania(as well as Balkans at large) harbouring sizeable Roma populations, it was hard not to suspect Roma ancestry.

And after closer inspection summarized in my last comment, it is hard to have serious doubts about the ancestry in question being Roma.

Aram said...

Samuel Andrews

H2a2 is from EEF?
Based on modern distribution (yes it looks European) or there was an ancient DNA with H2a2 that I missed?