u/Illustrious-Ice-7198

After running qpAdm rotations on the Tanoli average using AT1, these rotations consistently emerged as the strongest and most stable fits across bootstrap and complexity testing. This post breaks down the winning models, why they work, and what the rotations tell us about Tanoli ancestry structure. The average is built from 6 samples, all tested with AncestryDNA, all of them R1b carriers from the greater Tanawal area.

Winning Rotation (Best Passing Static Model)
Sources
Indus Proxy: Indus_medAASI
Steppe Proxy: Russia_Srubnaya_Alakul
East Asian Proxy: Mongolia_Xiongnu_o1.SG
Final Percentages
46.5% Farmer
20.8% SAHG
26.0% Steppe
6.7% East Asian
This was the cleanest overall passing rotation when balancing:
p-value stability
coefficient realism
standard errors
complexity pass
bootstrap consistency
The static run clearly prefers a 3-way South Asian cline with a minor East Asian pulse rather than a pure Indo-Aryan or pure NW South Asian profile.

Why This Rotation Wins
The strongest signal in the entire run is the balance between:
substantial Iranian-related farmer ancestry,
moderate indigenous SAHG/AASI,
and elevated Steppe MLBA,
with a small but persistent East Asian component.
The model stabilizes around:
~46–47% Farmer
~20–21% SAHG
~26% Steppe
~6–7% East Asian
This balance repeatedly appeared in bootstrap means and coefficient convergence.
The East Asian component is small but statistically meaningful. When the East Asian proxy was removed entirely, the fit quality collapsed and p-values became extremely poor. That indicates the signal is not noise but likely reflects real eastern admixture absorbed somewhere along the historical Tanoli formation process.

Bootstrap Stability
From the static output:
bootstrap mean (in source order):
0.674 Indus_medAASI
0.259 Russia_Srubnaya_Alakul
0.067 Mongolia_Xiongnu_o1.SG
standard errors (same order):
0.024
0.018
0.011
This is actually pretty stable for a South Asian qpAdm run involving Steppe + AASI + East Eurasian interactions.
The East Asian signal survives bootstrapping despite being small, which is important. Random noise usually disappears under bootstrap resampling. Here it remains consistently around ~6–7%.
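One rough way to sanity-check the claim that the small East Asian coefficient is not noise is to divide each bootstrap mean by its standard error. The sketch below does that with the numbers from the static output above. Pairing each value with a source by listing order is an assumption, and the |Z| > 3 cutoff is only a common rule of thumb, not something the run itself reports.

```python
# Bootstrap means and standard errors copied from the static output above.
# Assumption: values are listed in the same order as the sources.
sources = ["Indus_medAASI", "Russia_Srubnaya_Alakul", "Mongolia_Xiongnu_o1.SG"]
boot_mean = [0.674, 0.259, 0.067]
std_err = [0.024, 0.018, 0.011]


def z_scores(means, errors):
    """Return estimate / standard error for each coefficient."""
    return [m / se for m, se in zip(means, errors)]


z = z_scores(boot_mean, std_err)
for name, value in zip(sources, z):
    # Rule-of-thumb cutoff (assumption): |Z| > 3 means clearly non-zero.
    flag = "clearly non-zero" if abs(value) > 3 else "weak"
    print(f"{name}: Z = {value:.1f} ({flag})")
```

Under those assumptions the East Asian coefficient sits roughly six standard errors above zero, which is in line with the point that it survives resampling.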

Interpretation of Components
1. Farmer (~46.5%)
The farmer layer here is not “pure Iranian farmer” in a literal sense. Using Indus_medAASI as the proxy captures:
Iranian-related ancestry,
settled Indus Periphery ancestry,
and part of the South Asian substrate already absorbed into Bronze Age Indus populations.
This percentage suggests Tanolis are heavily derived from the northern Indus genetic continuum rather than being overwhelmingly Steppe-shifted.

2. SAHG (~20.8%)
This is the indigenous South Asian hunter-gatherer component.
The amount is moderate:
higher than many NW Indo-Iranian shifted groups,
lower than Gangetic or deep South Asian populations.
This places Tanolis genetically in a transitional northwestern zone rather than an extreme Steppe-shifted population.

3. Steppe (~26.0%)
The Steppe signal is strong and unmistakable.
Using Russia_Srubnaya_Alakul produced the cleanest Steppe fit overall:
lower errors,
stronger convergence,
better coefficient behavior than alternative Steppe proxies.
~26% Steppe is significant and aligns Tanolis with many northern Pakistani highland populations rather than low-Steppe caste clusters.
Importantly:
the Steppe proportion remained stable across rotations,
meaning this component is not being artificially inflated by proxy interactions.
That’s usually a good sign of a real ancestry layer.

4. East Asian (~6.7%)
This was one of the most interesting findings.
The Xiongnu-related signal repeatedly improved the fit when included. Removing it worsened:
p-values,
complexity,
and residual behavior.
Possible explanations:
ancient eastern steppe absorption,
Tibeto-Burman mediated admixture,
medieval Central Asian interactions,
or inherited eastern ancestry already present in Steppe-derived groups entering South Asia.
The amount is not large enough to imply recent East Asian ancestry, but it is persistent enough to matter statistically.

The Second Best Model
The second strongest model used:
Indus_lowAASI
Kurumba.DG
Russia_Srubnaya_Alakul
Mongolia_Xiongnu_o1.SG
This rotation also produced a passing complexity model and respectable p-values.
The approximate profile was:
~52–53% Indus_lowAASI
~14% Kurumba-like SAHG
~25% Steppe
~8% East Asian
Why it ranked second instead of first:
Problems
Slightly lower p-value across the rotation
More proxy overlap between Indus_lowAASI and Kurumba
Larger coefficient variance under bootstrap
Essentially:
the model passed and performed nearly as well as the Indus_medAASI run,
but the ancestry partitions were less clean.
Kurumba acts as a very deep southern SAHG proxy and can sometimes overfit residual AASI structure. The model compensates by shifting proportions around more aggressively.
Still, the fact that it passed complexity testing is important because it independently confirms:
substantial Farmer ancestry,
moderate SAHG,
robust Steppe,
and persistent East Asian signal.

Why Some Rotations Failed
Several rotations produced:
negative coefficients,
inflated Farmer values,
impossible Steppe proportions,
or poor complexity scores.
Example:
some Indus_hiAASI rotations produced absurd Farmer inflation above 100% with heavily negative secondary coefficients.
This usually indicates:
proxy redundancy,
poor source orthogonality,
or the model trying to compensate for missing ancestry dimensions.
The high-AASI Indus proxies appear too drifted toward South Asian substrate to cleanly separate:
Farmer,
SAHG,
and Steppe simultaneously.
That’s why the medAASI profile performed best overall.
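The failure modes described above can be screened for mechanically. A minimal sketch, assuming a model counts as feasible only when every coefficient lies between 0 and 1 and the p-value is above 0.05 (a conventional cutoff, not necessarily the one AT1 applies). The function name is hypothetical, and the failing coefficients are made up purely to illustrate the pattern; they are not taken from any actual run.

```python
def is_feasible(coefficients, p_value, p_cutoff=0.05):
    """Hypothetical screen: all weights in [0, 1] and p-value above the cutoff."""
    weights_ok = all(0.0 <= c <= 1.0 for c in coefficients)
    return weights_ok and p_value > p_cutoff


# Distal winner: bootstrap means and p-value as quoted elsewhere in this post.
winner_ok = is_feasible([0.674, 0.259, 0.067], 0.659971)

# Made-up illustration of the failure pattern: one source above 100%,
# offset by a negative secondary coefficient.
inflated_ok = is_feasible([1.12, -0.15, 0.03], 0.40)

print(winner_ok, inflated_ok)
```

A check like this only flags impossible outputs. It says nothing about which of the three causes listed above is responsible.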

Overall Genetic Interpretation
The static qpAdm run suggests Tanolis are best modeled as:
predominantly Indus-derived,
with strong Steppe MLBA input,
moderate indigenous SAHG,
and a small but real eastern Eurasian component.
The ancestry profile fits well with a northwestern South Asian highland population shaped by:
Bronze Age Indus ancestry,
Indo-Iranian Steppe admixture,
local South Asian substrate,
and limited eastern steppe interaction.
The most important takeaway is that the model remained stable across multiple rotations and bootstrap sampling, which gives confidence that the broad ancestry proportions are real and not artifacts of overfitting.

TKM_IA Rotations (Static Winner + Distal vs Proximal Comparison)
We finally finished going through the TKM_IA rotations for the Tanoli average, and this ended up being one of the strongest and most informative qpAdm frameworks in the entire AT1 run.
What makes this model especially important is that it not only produced the best statistical fit overall, but it also gives a much more historically realistic picture of how Tanoli ancestry was likely formed.

Winning Static Rotation (Best Overall Fit)
Sources
Indus_medAASI
Turkmenistan_IA.SG
Nepal_Samdzong_1500BP.SG
Final Interpreted Percentages
53.1% Farmer
13.8% SAHG
22.4% Steppe
10.7% East Asian
Statistical Results
p-value: 0.874744
Complexity: 3
Pass: YES
This was by far the strongest fitting rotation in the entire proximal framework.
A p-value of 0.874744 is extremely strong for a South/Central Asian qpAdm model, especially one involving:
Steppe,
AASI,
Iranian-related ancestry,
and East Eurasian ancestry simultaneously.
In qpAdm terms, this means the model fits the observed genetic data exceptionally well and there is very little statistical evidence against the proposed ancestry mixture.

Why the TKM_IA Rotation Works So Well
This model succeeds because Turkmenistan_IA acts as a genetically intermediate population.
Instead of trying to independently model:
BMAC,
Steppe,
and eastern Inner Asian ancestry,
TKM_IA already carries portions of all three.
That reduces:
source competition,
covariance instability,
and coefficient inflation.
As a result:
the coefficients stabilize,
bootstrap behavior improves,
and the p-value rises dramatically.
This is exactly what we see in the output.

Important Observation About the Indus_hiAASI Rotations
One of the most interesting things in the proximal framework is that the Indus_hiAASI rotations actually trend closer to the distal Srubnaya-style models than the cleaner TKM_IA winner itself.
When the model uses:
Indus_hiAASI,
Turkmenistan_IA,
and eastern proxies,
the ancestry proportions begin shifting toward:
higher explicit SAHG,
lower Farmer absorption,
and stronger separation between Steppe and South Asian substrate ancestry.
In other words, the model starts behaving more like a distal decomposition again.
This is important because it shows:
the underlying Tanoli ancestry structure itself is fairly consistent across frameworks,
and the major differences are being driven by how qpAdm partitions the Farmer/AASI overlap.

Why Indus_medAASI Was Preferred Instead
The key factor here is the set of right (outgroup) populations, especially Ami.DG.
Once Ami is included in the right set, qpAdm strongly begins favoring:
Indus_medAASI,
over:
Indus_hiAASI.
Why?
Because Ami creates stronger discrimination against excess eastern drift overlap.
Indus_hiAASI carries more deeply South Asian-shifted ancestry and introduces additional covariance overlap with:
SAHG-related ancestry,
eastern-shifted variation,
and some Himalayan-related drift.
As a result:
Indus_hiAASI rotations become statistically less efficient,
while Indus_medAASI produces cleaner covariance structure and much stronger p-values.
This is why the final winning model converges around:
a more balanced Farmer/AASI profile,
rather than an extremely SAHG-heavy one.
So biologically, the runs are actually fairly consistent.
Statistically, however:
Ami in the right populations heavily stabilizes the Indus_medAASI framework.

Bootstrap Stability
Best Coefficients (in source order)
0.446 Indus_medAASI
0.447 Turkmenistan_IA.SG
0.107 Nepal_Samdzong_1500BP.SG
Bootstrap Means (same order)
0.446
0.447
0.107
This level of convergence is extremely clean.
The bootstrap means almost perfectly reproducing the original coefficients is a major indicator that:
the model is stable,
not overfit,
and the ancestry partitions are statistically robust.
Standard Errors
0.040
0.032
0.012
Very respectable error ranges overall.
The East Asian signal especially is highly stable despite being the smallest ancestry component.
That strongly suggests the eastern signal is real and not random noise.

Deep Breakdown of the Components
1. Farmer (~53.1%)
This is the dominant ancestry layer.
Using Indus_medAASI captures:
Iranian-related farmer ancestry,
Indus Periphery ancestry,
and already admixed South Asian substrate ancestry.
Compared to the earlier distal model:
Farmer ancestry rises significantly,
while explicit SAHG drops.
This implies the Farmer + SAHG ancestry had already partially fused before later Steppe-era admixture events.
That pattern is exactly what we expect in:
post-IVC northern populations,
Swat-like groups,
and Indo-Iranian frontier populations.

2. SAHG (~13.8%)
The SAHG component becomes noticeably smaller in the proximal framework.
This does NOT mean Tanolis suddenly possess less indigenous South Asian ancestry biologically.
Instead:
part of the SAHG signal is now absorbed into the Turkmenistan_IA component itself.
This is one of the biggest differences between:
distal modeling,
and proximal modeling.
Distal models separate ancestry into cleaner ancient poles.
Proximal models compress ancestry into historically mixed intermediary populations.

3. Steppe (~22.4%)
In the earlier distal Srubnaya model:
Steppe was roughly ~26%.
In the proximal TKM_IA model:
standalone Steppe falls to ~22%.
But this does NOT mean Tanolis suddenly have less Steppe ancestry overall.
Instead:
Turkmenistan_IA already carries embedded Steppe ancestry.
So part of the Steppe signal is now hidden inside the Turan-related layer.
This is a classic proximal modeling phenomenon.
The real interpretation is that Tanolis likely descend from:
Steppe-admixed Turan populations,
rather than directly from unmixed Steppe migrants.
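To make the "hidden Steppe" point concrete, here is a rough sketch of how source coefficients can be turned into interpreted component percentages. The ancestry shares assigned to each source are assumptions, back-calculated so the arithmetic lands near the figures in this post (about 69/31 Farmer/SAHG for Indus_medAASI, about 50/50 Farmer/Steppe for Turkmenistan_IA, and no SAHG in Turkmenistan_IA for simplicity). They are not values reported by qpAdm or AT1, and the helper name is hypothetical.

```python
# Assumed ancestry shares per source (back-calculated, NOT reported by the run).
SOURCE_SHARES = {
    "Indus_medAASI": {"Farmer": 0.691, "SAHG": 0.309},
    "Turkmenistan_IA.SG": {"Farmer": 0.5, "Steppe": 0.5},
    "Russia_Srubnaya_Alakul": {"Steppe": 1.0},
    "Mongolia_Xiongnu_o1.SG": {"East Asian": 1.0},
    "Nepal_Samdzong_1500BP.SG": {"East Asian": 1.0},
}


def interpret(coefficients):
    """Spread each source coefficient over its assumed ancestry shares."""
    totals = {}
    for source, weight in coefficients.items():
        for component, share in SOURCE_SHARES[source].items():
            totals[component] = totals.get(component, 0.0) + weight * share
    return totals


# Distal weights implied by the final percentages (46.5 + 20.8, 26.0, 6.7).
distal = interpret({
    "Indus_medAASI": 0.673,
    "Russia_Srubnaya_Alakul": 0.260,
    "Mongolia_Xiongnu_o1.SG": 0.067,
})
# Proximal weights from the best coefficients of the TKM_IA rotation.
proximal = interpret({
    "Indus_medAASI": 0.446,
    "Turkmenistan_IA.SG": 0.447,
    "Nepal_Samdzong_1500BP.SG": 0.107,
})

for label, result in (("distal", distal), ("proximal", proximal)):
    print(label, {k: f"{v:.1%}" for k, v in result.items()})
```

With these assumed shares the results land within about 0.1 percentage points of the interpreted percentages in the post. All of the proximal Steppe comes from the Turkmenistan_IA term, since that model has no standalone Steppe source.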

4. East Asian (~10.7%)
This was one of the most interesting findings.
In the distal framework:
East Asian ancestry was only ~6–7%.
But in the proximal framework:
it rises to ~10–11%.
Why?
Because Nepal_Samdzong is a cleaner and more regionally appropriate eastern proxy than broad Mongolia/Xiongnu samples.
Samdzong captures:
Himalayan,
trans-Himalayan,
and Inner Asian eastern ancestry
far more effectively in South Asian contexts.
The fact that the model fit improved with Samdzong strongly suggests the eastern signal in Tanolis is likely:
older,
Himalayan/Inner Asian mediated,
and historically embedded,
rather than recent East Asian admixture.

Comparison — Distal vs Proximal Models
Distal Model (Srubnaya Framework)
Final Profile
46.5% Farmer
20.8% SAHG
26.0% Steppe
6.7% East Asian
Best Passing Rotation
Indus_medAASI
Russia_Srubnaya_Alakul
Mongolia_Xiongnu_o1.SG
p-value
0.659971
Interpretation
The distal framework isolates ancestry into deep ancient poles:
Farmer,
SAHG,
Steppe,
East Eurasian.
This model is excellent for understanding prehistoric ancestry structure.

Proximal Model (TKM_IA Framework)
Final Profile
53.1% Farmer
13.8% SAHG
22.4% Steppe
10.7% East Asian
Best Passing Rotation
Indus_medAASI
Turkmenistan_IA.SG
Nepal_Samdzong_1500BP.SG
p-value
0.874744
Interpretation
The proximal framework models ancestry through historically mixed populations.
Instead of direct ancient poles, it reflects:
Steppe-admixed Turan populations,
Indo-Iranian frontier groups,
Himalayan eastern interaction zones,
and post-IVC northern clines.
This is likely much closer to the actual historical formation process of Tanolis.
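For a quick side-by-side, the shift between the two frameworks can be computed directly from the two final profiles. A small sketch using only the percentages quoted in this post; the variable names are illustrative.

```python
# Final interpreted profiles (percent) as listed in this post.
distal = {"Farmer": 46.5, "SAHG": 20.8, "Steppe": 26.0, "East Asian": 6.7}
proximal = {"Farmer": 53.1, "SAHG": 13.8, "Steppe": 22.4, "East Asian": 10.7}

# Proximal minus distal, in percentage points (rounded to one decimal).
shift = {k: round(proximal[k] - distal[k], 1) for k in distal}

for component, delta in shift.items():
    print(f"{component}: {delta:+.1f} points")
```

Farmer and East Asian rise by 6.6 and 4.0 points, while SAHG and Steppe fall by 7.0 and 3.6.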

Historical Interpretation
Taken together, both frameworks paint a remarkably coherent picture.
The distal model tells us the deep ancestry sources:
Iranian-related Farmer,
SAHG,
Steppe MLBA,
eastern Inner Asian.
The proximal model then explains HOW those ancestries likely came together historically.
The strongest interpretation is that Tanolis descend from populations related to:
Steppe-admixed Turan groups,
Indo-Iranian frontier populations,
Swat-like Iron Age clines,
and Himalayan interaction zones already carrying eastern Eurasian ancestry.
In other words:
Tanolis do not appear to descend from:
pure Steppe migrants,
pure Indus populations,
or isolated South Asian groups.
Instead, they appear to derive from already mixed northern frontier populations formed between:
Turan/BMAC,
Steppe,
Indus,
and Inner Asian eastern zones.
Statistically, the TKM_IA proximal model may genuinely be the most historically realistic reconstruction in the entire AT1 dataset. Even so, we prefer the distal model, because it gives a deeper look overall and is more consistent with NW groups in the subcontinent. The proximal model is a great way to look at ancestry from a geographical POV.

Tanoli QPADM Average Run on AT1 (Static + Rotations Deep Dive Distal Vs Proximal)
