This Cushitic nonsense needs to die

You were right when you said that the samples that came out as 12% Arabian were wrongly designated.

I have proved this:

[Attachment 357030]

I used the least Arabian samples (0-3%) as representative of the source, added Bronze Age Levantine as a stand-in for Arab ancestry (it neatly reflects this Arabian-like ancestry type), and then added Nilo-Saharan (NS). Notice how, as the Levantine ancestry rises, Nilo-Saharan rises in direct proportion? That is because this is Cushitic ancestry. To tell whether these people are more or less Arabian, subtract the NS from the BA Levant; the result shows the real decrease or increase. So sample SOMALI6, which is supposedly 12% Arab, is likely just 4%.
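A minimal Python sketch of the subtraction heuristic above, assuming each model result is available as a dict of component percentages. The component keys (`BA_Levant`, `Nilo_Saharan`, `Somali_source`) and the numbers in the example are hypothetical illustrations, not values read from the attachment.

```python
from typing import Mapping


def adjusted_arabian(components: Mapping[str, float]) -> float:
    """Apply the post's heuristic: Arabian estimate = BA Levant % minus NS %.

    The idea is that NS rises together with the Cushitic part of the
    Levant-like signal, so subtracting NS removes that part.
    """
    levant = components.get("BA_Levant", 0.0)
    ns = components.get("Nilo_Saharan", 0.0)
    # Clamping at 0 is an added assumption: a negative difference (NS larger
    # than BA Levant) is read as "no detectable Arabian ancestry".
    return max(levant - ns, 0.0)


# Hypothetical model output for one sample; made-up numbers, not SOMALI6's.
sample = {"Somali_source": 84.0, "BA_Levant": 10.0, "Nilo_Saharan": 6.0}
print(adjusted_arabian(sample))  # 4.0
```

The clamp only matters for samples where NS exceeds BA Levant, such as the Kenyan Somalis discussed further down.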

This makes perfect sense, because even the most "Arab" samples were effectively in the same range, with little to distinguish them. That is simply because Somalis don't have much Arabian ancestry; only outliers score higher. I am an outlier, so let me show how I model using the same sources:

[Attachment 357031]

This is in the ballpark of my 23andMe results:
[Attachment 357032]

See, no Nilo-Saharan.

Using the averages by Michalis:
[Attachment 357033]

The Somalilanders (sample size of 9) barely scored higher than the selected source samples, which were the least Arabian. They're not exceptionally Arabian.

You might ask: why does the Somali component decrease while BA Levant and NS increase? Because there is an internal signature that is entirely Somali but varies among Somalis. Although the source samples represent southern, northeastern, and central Somalis, there are slight shifts in what counts as Somali ancestry. Basically, one pure, fully Somali sample might account for only about 80% of Somali genetics.

To illustrate what I mean (hypothetically, the outermost circle represents the extent and boundary of the signature, whereas anything outside it is admixture):
[Attachment 357036]

These overlapping circles represent parts of Somali ancestry, with the whole representing the full extent of internal diversity without any admixture. However, if a sample overlaps 90%, that does not mean the other 10% is foreign. That sample will still soak up the entire 100% because it is all Somali ancestry, although the fit (distance) will increase. The other 10% belongs to the same homogeneous cluster as the remaining 90%; it's just that no single sample perfectly represents all non-admixed Somalis. Still, as far as homogeneity goes, they represent it better than anything else, especially compared with the internal differences of other populations.

Theoretically speaking, this within-signature variation can exist with strictly zero admixture.

Look at the Kenyan Somalis: they have higher NS than BA Levant, and this checks out, since some of those samples received additional non-Cushitic DNA that was not Eurasian.

To summarize, the Arab ancestry in non-admixed Somalis is greatly exaggerated; people who have 8% are actually outliers.

We have Giire, a Habar Awal (from what I recall), who clearly has a lot of Arab ancestry:
[Attachment 357038]

On par with the Saho:
[Attachment 357039]

There are some samples from a study on the UAE that appear to be Somali:

[Attachment 357037]

They show fluctuations. The first one clearly has more than the average Arab ancestry, but it tops out at about 7%. The second one likely fits in the normal range. The third one is similar to the Somalilander samples (which does not mean it's from there). The fourth one is very similar to the source samples but slightly more Arabian; maybe it's from the south-central region or Puntland. The next one is a bit higher, but remember that Taforalt typically fills the non-NS part of the model, so you can roughly group it with the Levant. Then the last one is more or less within the norm.

It seems like Somalis are in general 0-5% Arabian.

Look at the average of all the samples (I removed the samples from Kenya and likewise excluded the heavily Arab ones):
[Attachment 357040]

It shows parity between Jordan EBA and Sudanese, which indicates that, on average, Somalis without any recent admixture are not that Arab. That does not mean there are no outliers. Many of them exist, but they don't make up a large enough share to shift the average much. Most of the Arab ancestry you see was likely absorbed many centuries ago.
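The averaging step described above (drop the Kenyan samples and the heavily Arab outliers, then average what is left) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the `Sample` record, the component names, and the `max_levant` cutoff used to define "heavily Arab" are all hypothetical, since the post does not say how those outliers were picked, and the numbers are made up.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Set


@dataclass
class Sample:
    # Hypothetical record: a sample label, its region, and its model percentages.
    name: str
    region: str
    components: Dict[str, float]


def average_components(samples: List[Sample], exclude_regions: Set[str],
                       max_levant: float) -> Dict[str, float]:
    """Average each component over the samples that pass two filters:
    region not excluded (e.g. Kenya) and BA Levant share at or below
    max_levant (a hypothetical cutoff for "heavily Arab" outliers)."""
    kept = [s for s in samples
            if s.region not in exclude_regions
            and s.components.get("BA_Levant", 0.0) <= max_levant]
    if not kept:
        raise ValueError("no samples left after filtering")
    names = sorted({n for s in kept for n in s.components})
    # A component missing from a sample counts as 0% for that sample.
    return {n: mean(s.components.get(n, 0.0) for s in kept) for n in names}


# Illustrative, made-up numbers (not the values in the attachment).
samples = [
    Sample("A", "Somalia", {"BA_Levant": 4.0, "Nilo_Saharan": 4.0}),
    Sample("B", "Somaliland", {"BA_Levant": 6.0, "Nilo_Saharan": 2.0}),
    Sample("C", "Kenya", {"BA_Levant": 3.0, "Nilo_Saharan": 9.0}),
    Sample("D", "Somalia", {"BA_Levant": 20.0, "Nilo_Saharan": 1.0}),
]
print(average_components(samples, {"Kenya"}, max_levant=15.0))
# {'BA_Levant': 5.0, 'Nilo_Saharan': 3.0}
```

Here sample C is dropped for its region and sample D for exceeding the cutoff, so the average reflects only A and B.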

These are the Somali Emirati samples:



For example, this was designated as Sudanese Arab, but it is really half Somali, half Arab:



My own description is on the right side.

There are also Sudanese and seemingly Beja samples in that UAE dataset that I should post about; they are interesting. I had to correct some of the original labels.
Your analysis reads like a thesis: very interesting and informative, great work. Since you said I am one of the outliers, is it possible that 23andMe sometimes misrepresents Somali samples? Could they use more than one Somali reference sample as the basis for identifying autosomal ancestry? I know MyHeritage is unreliable for Horn Africans, but their recent update is actually good; the updated version gave results similar to G25.

Old version
[Attachment 1742051693416.jpeg]

Updated version
[Attachment 1742051765100.jpeg]


Thank you for the compliment.

23andMe generally uses several mechanisms to estimate ancestry. Most of what our results are based on is reference sample size and AI clustering within phasing parameters. It's fairly accurate as far as generalities go, but 23andMe is totally wrong when it comes to your ancestry. You don't have 15% Arab, or whatever it claims; you have double that.

Quoting myself, here is my understanding of how they do it:

"It's a bit more methodical. They use the data they have to train the system and set parameters; new users are then fitted into that, and/or help refine the calibration. Yes, they used machine learning for this. Because they recognize aggregate structure through PCA and a process called 'Uniform Manifold Approximation and Projection' (UMAP), a nonlinear dimensionality-reduction method that places genetically similar samples close together, they can conceptually build relational and associative landscapes for genetic data to fit into. Add to this, as mentioned in their white paper, 'which, when paired with survey data and analyzed jointly with the well-curated external reference panels, enabled us to define our 45 reference populations and flag outliers for removal', and they can reliably build a well-functioning system. They also use additional signifiers that sometimes go beyond geography: 'Free-text responses on grandparental national, ethnic, religious, or other identities enabled us to construct reference panels for populations not defined by specific geographic regions (e.g., Ashkenazi Jews).'"

From what I've seen of MyHeritage, it spreads Cushitic DNA across distinct regions and labels the rest as Egyptian when it is actually Arab.
[Attachment 1742065319488.png]
 
