Weaving Facts into a Family Story in Different Layers of Genealogical Time : Part Two

Adding historical context to a story is one of my aims when I research our family history. In addition to studying the basic facts of direct ancestors’ lives, my intent, where possible, is to consider family stories and the social context in which ancestors lived. Sometimes this aim is difficult to achieve. When analyzing evidence in genealogical time layers outside the traditional genealogical period, family history takes on a different meaning, and adding historical context to the story poses new challenges.

As we trace family lineages back in time our source of genealogical evidence changes and becomes limited. Stories shift from specific ancestors and families to lineages. Generations of ancestors shift to questions of where and when genetic mutations may have occurred. The methods we use to gather evidence also change.

Our notion of ‘family’ changes. We have two ‘sets’ of family: genealogical and genetic. The two are related and overlap but are not identical. Our terminology and our focus in describing ‘family’ characteristics change. Our general orientation toward recreating historical context and describing the factors that influence family stories also changes.

Fundamental questions arise about the differences and limitations of writing family history in different genealogical layers of time. While there are differences, there is a line of connectivity and coherence in what we call ‘family’ across the three genealogical layers of time. The sources of contextual evidence are different in each time layer. In the genealogical time layers of deep ancestry and the period of lineages, our family stories can be gleaned from paleo-genomic research and macro-level cultural anthropological research.

The Three Layers of Genealogical Time

In the first part of this story, I outlined three layers of genealogical time that have unique characteristics.

  • Short Term – Normal Time: This is the realm of traditional genealogy and family history that spans roughly 300 years or 10 generations. I use 31 years as one generation. [1]
  • Mid Range – Lineages: This middle layer of time can be viewed within a genetic genealogical perspective that focuses on Y-STR mutations. It is a period where surnames emerge. Combining traditional genealogical methods with genetic genealogy can produce promising leads on the location of haplogroups based on surnames and geographical areas. The middle historical time layer can be viewed in terms of tracing Single Nucleotide Polymorphism (SNP) and Short Tandem Repeat (STR) Y-DNA mutations in lineage / clan groups and haplogroups.
  • Long Range – Deep Ancestry: This is the foundational layer of genealogical time. It can provide an understanding of the correlation between haplogroup migration and geographical location. This time layer focuses on the correlation of genetic evidence with ancient cultural groups that existed in specific geographical areas and long-term climate and landscape changes as well as historic cultural geographical patterns across long stretches of time. This long range layer of time can be viewed within a genetic genealogical perspective that focuses on Y-SNP mutations.

Each of these layers of time is associated with differing orientations and sources of contextual background information to create family stories.

Reframing Contextual Factors for Mid Range and Deep Ancestry Time Layers

In the traditional genealogical time layer we have paper, digital and physical sources of historical evidence to create family stories. Contextual factors are broadly encapsulated in four social structural levels. (See table one.) They can help explain or provide descriptive information surrounding an ancestor or family’s life experiences in a particular time period.

Table One: Social Structural Levels or Networks of Influence in the Traditional or Short Term Genealogical Time Layer

| Social Structural Level | Examples of Social Structural Influences |
|---|---|
| Individual | Family Member / Couple; Nuclear Family |
| Micro Level | Extended Family / Local Neighborhood; Local Social Groups (Church, Local Community); Local Occupational Work Groups |
| Intermediate Level | Ethnic Networks; Economic Strata / Class; City-Wide area / Local Regional Areas |
| Macro Level | State and National Level; European Country; Geographical Region |

In addition to the various social structural levels that may play a prominent role in describing the experiences of ancestors and their families, there are ecological, technological, economic, and cultural influences that may add historical context to the story. These influences may affect specific social structural levels or all of them, as illustrated below.

Illustration One: Time and Historical Context of Structure, Culture, and Other Factors in the Short Range Time Layer

As we move back in time, contextual evidence increasingly becomes associated with the intermediate and macrostructural levels. The ability to document these historical contextual factors of influence diminishes as we go back into the mid range and long range genealogical time layers. Evidence is not available for certain social structural levels and other contextual historical factors. This is illustrated in table two.

Table Two: Likelihood of Finding Information from Social Structural Levels Associated with Traditional Genealogy

| Time Period / Layer | Individual | Micro Level | Intermediate Level | Macro Level |
|---|---|---|---|---|
| Long Range – Deep Ancestry | | | X | X |
| Mid Range – Lineages | | X | X | X |
| Short Term – Normal Time | X | X | X | X |

Our frame of reference shifts from individual ancestors and families to terminal single nucleotide polymorphisms (SNPs), short tandem repeats (STRs), the most recent common ancestor (tMRCA), haplotypes, haplogroup subclades, modal haplotypes and branches. [2]

Y-DNA SNP and STR mutations or mtDNA SNPs are the basic frames of reference for the mid range and long range time layers. These mutations help identify groups, based on those mutations, loosely akin to what are families in the short term or traditional time layer.

SNPs and STRs: The Underlying Connection Between the Three Time Layers

In a nutshell, SNPs, single nucleotide polymorphisms, are the mutations that define different haplogroups. Haplogroups reach far back in time on the direct paternal, generally the surname, line. [3]

SNPs and STRs are the building blocks that tie the three genealogical time layers together. While both are part of each time layer, one can argue that SNPs characterize the long term genealogical time layer while STRs provide a unique discriminatory power in the mid range, or period of lineages, genealogical time layer.

A Base Pair in DNA

A base pair consists of two complementary nitrogenous bases (adenine with thymine, and cytosine with guanine) that pair together to form the “rungs” of the DNA double helix, held together by hydrogen bonds. Base pairs are the building blocks of DNA structure, and their sequence encodes genetic information.

Illustration Two: A Base Pair 

SNPs represent variations at a single DNA base position where one nucleotide in the DNA string is substituted for another. STRs are repeated sequences of DNA that consist of 2-6 base pairs occurring in a head-to-tail manner. For example, a stretch of the DNA chain reading “GATAGATAGATAGATA” represents four repeats of the “GATA” pattern. The number of repeats can vary among different individuals, making STRs highly polymorphic (that is, occurring in multiple distinct forms or variants). [4]
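The repeat counting described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how a testing laboratory calls STR values; the function name `longest_tandem_run` is my own.

```python
def longest_tandem_run(sequence: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of `motif` in `sequence`."""
    best = 0
    step = len(motif)
    for start in range(len(sequence)):
        count = 0
        pos = start
        # Walk forward one motif at a time while the pattern keeps repeating.
        while sequence.startswith(motif, pos):
            count += 1
            pos += step
        best = max(best, count)
    return best


# The example sequence from the text: four head-to-tail copies of "GATA".
print(longest_tandem_run("GATAGATAGATAGATA", "GATA"))  # 4
```

Two men whose sequences give different counts at the same marker would differ by that many repeats at that STR.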

SNPs and STRs serve distinct purposes in genetic analysis across different time periods due to their unique mutation characteristics. STRs are ideal for recent genetic analysis (short range and mid range time periods) because they have a high mutation rate of approximately 10⁻³ to 10⁻⁴ per generation. [5] [6] This makes them particularly useful for population differentiation studies, genealogical matching within the past 500 to 800 years, and forensic DNA testing and kinship analysis. [7] Completing a ‘Big Y’ DNA test provides matches back 1,500 years. [8]
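To get a feel for what a rate of one in a thousand to one in ten thousand means in practice, here is a rough sketch. It assumes the quoted rate applies to each STR marker at each father-to-son transmission, that markers mutate independently, and that the rate is constant. The panel size of 37 markers and the span of 10 generations are illustrative choices of mine, not figures from the sources cited.

```python
def chance_of_any_mutation(rate_per_marker: float, markers: int, generations: int) -> float:
    """Probability that at least one marker mutates somewhere along the line.

    Assumes independent markers and a constant rate per marker per generation.
    """
    transmissions = markers * generations
    return 1.0 - (1.0 - rate_per_marker) ** transmissions


# Compare the two ends of the quoted range for a hypothetical 37-marker panel.
for rate in (1e-3, 1e-4):
    p = chance_of_any_mutation(rate, markers=37, generations=10)
    print(f"rate {rate:g} per marker per generation: {p:.1%}")
```

Under these assumptions the faster rate gives roughly a three in ten chance of seeing at least one difference over ten generations, while the slower rate gives only a few percent.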

SNPs are better suited for studying ancient (long range) genetic history. They have extremely low mutation rates of approximately 10⁻⁸. [9] They are considered “once in the lifetime of mankind” events. [10] They can effectively track population divergence dating back to the African exodus 50,000-75,000 years ago. [11] As more male individuals are tested, the SNP haplotrees can become more refined and identify sub branches or subclades in what I have identified as the mid range and short range time periods.

From a technical angle, SNPs work better with degraded DNA (e.g. ancient bones) due to smaller target regions. They also have greater mutational stability, but 40-60 SNP loci are required to match the discriminatory power of 13-15 STRs. [12]

STRs provide higher information content per locus due to multiple alleles. (An allele is a variant form of a gene that occurs at a specific location (locus) on a DNA molecule.) They also can be used for high-resolution description of human evolutionary history. [13]

As indicated in illustration three below and discussed in a previous story, the “One-Two” punch of Y-DNA testing involves using the results of Y-SNP DNA tests to provide a general location of Y-DNA testers on the Y-DNA haplotree based on nested haplogroups. The ‘second punch’ uses Y-STR test results to help group test results within recent haplogroup branches and to assist in analyzing potential individual matches.

The analysis and comparison of individual Y-STR test results can help delineate lineages and tease out branches within the haplotree, fine-tuning relationships between people within the tree. The “One-Two Punch” approach with SNP and STR data is particularly helpful in tracing genetic ties among test results associated with different surnames, and among lineages that predate the use of surnames in the period of lineages genealogical time layer.

Illustration Three: The Relationship Between SNPs and STRs in Refining Haplogroup Branches

Source: Modified illustration from J. David Vance, DNA Concepts for Genealogy: Y-DNA Testing Part 2, 3 Oct 2019 https://www.youtube.com/watch?v=mhBYXD7XufI&t=355s

While STR tests are used by individual testers to discover possible Y-DNA genetic matches with other testers, the results of STR tests can also provide insights into macroscopic demographic properties that can shed light on lineages and clans – well before the time of surnames. Y-STRs have a time window that runs back to the late Bronze Age.

“STRs … tell us about demography — specifically about bottlenecks and subsequent expansions, namely “founder events.” While SNPs tell us when they were created, STRs tell us about when the population burgeoned after a founding mutation. That SNP and STR clades have a fundamentally different interpretation has caused considerable confusion, but once understood, the methods are very useful complements.” [14]

STRs have been viewed as having limited use in estimating dates beyond about 50 to 100 generations (i.e., 1,550 – 3,100 years before present). However, some studies indicate that STR data can be used for genealogical analysis back into the Paleolithic era. (The Paleolithic period, also known as the Old Stone Age, generally spans from around 3.3 million years ago to approximately 11,650 years ago.) [15]
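The year ranges quoted for a given number of generations follow directly from the 31-years-per-generation convention used in this story. A minimal sketch of that arithmetic, with a function name of my own choosing:

```python
YEARS_PER_GENERATION = 31  # the working assumption used throughout this story


def generations_to_years(generations: int, years_per_generation: int = YEARS_PER_GENERATION) -> int:
    """Convert a count of generations into an approximate span of years."""
    return generations * years_per_generation


for g in (10, 50, 100):
    print(g, "generations is about", generations_to_years(g), "years")
```

Ten generations comes to 310 years, which is the “roughly 300 years” of the short term layer, and 50 to 100 generations gives the 1,550 to 3,100 year window. A different generation length can be passed as the second argument to see how sensitive the ranges are to that assumption.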

The Haplogroup and Most Recent Common Ancestor as the ‘Generation’ in the Mid Range and Deep Ancestry Time Layers

The concepts of a haplogroup and the Most Recent Common Ancestor (tMRCA) play a tandem role in defining what can be called a ‘generation’ in the deep ancestry and period of lineages genealogical time layers. However, pinpointing a ‘generation’ in the mid range and long range time periods is not as exact as in the short range genealogical time layer.

A haplogroup can be considered like an ancestor on your family tree. Each haplogroup forms a branch on that family tree. Depending on the age of the haplogroup (when it formed), you may have the name of that ancestor, or the ancestor may have lived so long ago that their name has been lost to time.

“Each haplogroup formed at a specific time and in a specific location. Testing of modern peoples and ancient DNA informs us of those locations and phylogenetic experts are able to build not just a tree of humankind, but also migration paths that those haplogroups took across and out of Africa and to the other continents.” [16]

A Y-DNA SNP mutation is akin to a direct paternal descendant. Haplogroups contain one or more unique SNP mutations. Each unique SNP mutation within the haplogroup pertains to a single line of descent. Each haplogroup originates from, and remains part of, a preceding single haplogroup.

As such, any related group of haplogroups may be precisely modelled as a nested hierarchy, in which each set (haplogroup) is also a subset of a single broader set (as opposed, that is, to biparental models, such as human family trees). Haplogroups can be further divided into subclades.[17]
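The nested hierarchy can be pictured as a simple child-to-parent lookup. The sketch below uses the haplogroup names from my own patrilineal line, and it assumes that the order in which they are listed in table three is the parent-to-child order. The function names are mine, and this is not how any testing company stores its tree.

```python
# Each haplogroup points to the single haplogroup it descends from.
# Assumption: the listing order in table three is the parent-to-child order.
PARENT = {
    "G-Y38335": "G-Z6748",
    "G-Z40857": "G-Y38335",
    "G-Y132505": "G-Z40857",
    "G-BY211678": "G-Y132505",
    "G-FT48097": "G-BY211678",
    "G-FT119236": "G-BY211678",  # sibling branch that also splits off G-BY211678
}


def lineage(haplogroup: str) -> list:
    """Return the haplogroup followed by each of its ancestors, nearest first."""
    path = [haplogroup]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def is_within(haplogroup: str, ancestor: str) -> bool:
    """True when `haplogroup` is `ancestor` itself or is nested inside it."""
    return ancestor in lineage(haplogroup)


print(lineage("G-FT48097"))
print(is_within("G-FT119236", "G-BY211678"))  # True: shares the parent branch
print(is_within("G-FT119236", "G-FT48097"))   # False: a sibling, not a descendant
```

Because every haplogroup has exactly one parent, the walk upward never forks. That is the difference from a biparental family tree, where each person has two parents.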

There is at least one SNP mutation associated with a haplogroup. However, many haplogroups have more than one SNP mutation associated with them; these are referred to as equivalents or equivalent SNPs.

“Equivalent SNPs” in a haplogroup are multiple SNPs that occur on the same genetic branch, meaning they all indicate membership in the same haplogroup even though they are different mutations at the DNA level. For the purpose of identifying a haplogroup they are treated as the same, because they all point to the same ancestral lineage within that group.

These SNPs are located on the same branch of the phylogenetic tree, indicating they arose around the same time in evolutionary history and are associated with the same haplogroup. It is often difficult to determine the exact chronological order of occurrence between equivalent SNPs. When multiple SNPs are tested, if they all show the same pattern (positive or negative for the same haplogroup), it strengthens the identification of that haplogroup. [18]

Equivalent SNPs are variants that occupy the same branch as one another. This occurs when multiple SNPs are tested positive and negative for the same upstream and downstream SNPs and have all yielded the same positive and negative results from testers as the main SNP on the branch, making it impossible for our phylogenetic expert to confidently determine which of these variants are upstream or downstream of the others.[19]

When multiple equivalent SNPs exist, they are often listed together in haplotrees and source documentation. Different laboratories and corporations may select different equivalent SNPs as their primary or defining marker for the same haplogroup.
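One way to picture equivalence is that two SNPs are equivalent when every tester has the same result for both. The sketch below groups SNPs by their pattern of positive and negative results. The SNP names, tester labels, and results are made up for illustration, and real phylogenetic placement involves far more than this comparison.

```python
from collections import defaultdict


def group_equivalent_snps(results: dict) -> list:
    """Group SNPs whose positive/negative results are identical across all testers.

    `results` maps an SNP name to a {tester: True/False} dictionary.
    """
    groups = defaultdict(list)
    for snp, calls in results.items():
        pattern = tuple(sorted(calls.items()))  # order-independent fingerprint
        groups[pattern].append(snp)
    return [sorted(names) for names in groups.values()]


# Hypothetical results: SNP_A and SNP_B can never be told apart by these testers.
results = {
    "SNP_A": {"tester1": True, "tester2": True, "tester3": False},
    "SNP_B": {"tester1": True, "tester2": True, "tester3": False},
    "SNP_C": {"tester1": True, "tester2": False, "tester3": False},
}
print(group_equivalent_snps(results))  # [['SNP_A', 'SNP_B'], ['SNP_C']]
```

If a new tester turned up positive for SNP_A but negative for SNP_B, the two would fall into separate groups, which mirrors how a run of equivalents is split into separate branches as more people test.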

In each nested genetic set of SNPs, there resides a ‘Most Recent Common Ancestor’. The determination of relationships among identified SNP mutations within the haplogroup relies on statistical methods like the rho statistic to estimate the time to most recent common ancestor (tMRCA), next-generation sequencing techniques that can identify SNPs in an unbiased way, and high-quality coverage of the Y chromosome to ensure accurate SNP identification. [20]

When dealing with equivalent SNPs in a haplogroup, the focus is not on choosing a single “most recent” common ancestor, but rather on understanding that these mutations represent the same ancestral point in the haplogroup’s history. The actual age estimation of the common ancestor is calculated using statistical methods and ‘molecular clock’ calculations rather than trying to determine which of the equivalent SNPs came first. [21]
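In its simplest form, the rho statistic is the average number of mutations separating each tested lineage from the ancestral state, and an age follows by multiplying that average by an assumed number of years per mutation. The sketch below is a simplified illustration only. The mutation counts are made up, and the rate of 83.3 years per SNP is the figure this story cites for the Big Y-700 test, used here as an assumed input rather than as the method of any particular company.

```python
def rho(mutation_counts: list) -> float:
    """Average number of mutations between the ancestor and each sampled lineage."""
    return sum(mutation_counts) / len(mutation_counts)


def tmrca_years(mutation_counts: list, years_per_mutation: float) -> float:
    """Simple molecular-clock age: average mutation count times years per mutation."""
    return rho(mutation_counts) * years_per_mutation


# Hypothetical SNP counts below the common ancestor for four testers.
counts = [5, 7, 6, 6]
print(rho(counts))                             # 6.0
print(round(tmrca_years(counts, 83.3), 1))     # 499.8
```

Note that the calculation never asks which of the equivalent SNPs came first; only the count of mutations along each line matters.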

In genetic genealogy, the most recent common ancestor (tMRCA) refers to the most recent individual from whom two or more people being tested are directly descended, essentially the point in time where their genetic lineages converge based on DNA analysis. The MRCA can be a specific person in a family tree, or a population-level ancestor estimated through genetic data analysis. Regarding the latter, the MRCA will often be represented by an estimated birth date and a statistical confidence level associated with the estimated date. [22]

Rob Spencer provides a cogent explanation of the relationship between the tMRCA and the date when a haplogroup is formed. Illustration four depicts an example of how the tMRCA and haplogroup formation dates can differ.

Illustration Four: Formation Dates of Haplogroups and tMRCA

Source: Spencer, Rob, Data Source and SNP Dates, Discussion, SNP Tracker, http://scaledinnovation.com/gg/snpTracker.html

Spencer’s illustration focuses on the fact that the determination of when the MRCA emerged or was estimated to be born varies depending on who or what organization is calculating the MRCA date. The variation in estimates is also dependent upon the number of SNP mutations associated with a specific haplogroup.

“In a rapidly expanding population with many surviving lineages, tMRCA and formation are very close and may be identical. But for older and leaner lineages, a SNP may appear long before one of the originator’s descendants has two surviving lineages, and additional separate mutations may occur in that time. In the sketch, (illustration above), SNP S2 is one of 21 such equivalents: different mutations but evidently from a long unbranched line, since all DNA testers either have none of these 21 SNPs or they have all of them. The tMRCA for S2 is shown in blue; it’s where branches that have S3 and S4 split away. But the formation time for S2 cannot be directly measured and it could be anywhere between S2’s tMRCA and the previous tMRCA. YFull’s convention is to assign a SNP’s formation date to the previous SNP’s tMRCA (the left-most of the long run of equivalent SNPs). But it is perhaps better to estimate the formation date as halfway between, as shown by the red dot, which is what SNP Tracker does.” [23]
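The two conventions Spencer describes differ only in where they place the formation date inside the window between the previous tMRCA and the haplogroup’s own tMRCA. A minimal sketch follows, with function and key names of my own. The input years are the tMRCA estimates for G-Y132505 and G-BY211678 from table four, used purely as example numbers and on the assumption that the first is the immediate parent of the second.

```python
def formation_estimates(previous_tmrca_year: float, own_tmrca_year: float) -> dict:
    """Return the formation year under the two conventions described by Spencer."""
    return {
        # Convention attributed to YFull: formation equals the previous tMRCA.
        "previous_tmrca_convention": previous_tmrca_year,
        # Convention attributed to SNP Tracker: halfway between the two tMRCAs.
        "midpoint_convention": (previous_tmrca_year + own_tmrca_year) / 2,
    }


# Example window: 1115 CE (parent tMRCA) to 1413 CE (own tMRCA).
print(formation_estimates(1115, 1413))
```

For this window the first convention gives 1115 CE and the midpoint convention gives 1264 CE, a gap of about 150 years that comes from the choice of convention alone.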

Different haplogroups exhibit substantial variation in their mutation rates. This can be due to bottlenecks or expansion in populations. Bottleneck events can create distinctive patterns that increase the rate of coalescence between lineages, lead to fewer overall haplotypes, and result in higher frequencies of the most common haplotypes. [24]

Different haplogroups may have undergone varying levels of genetic diversification based on their demographic history and population size. Migration patterns can create unique combinations of variants. [25] Some haplogroups have experienced more mutations over time due to geographic isolation leading to distinct mutation patterns, larger population sizes allowing more opportunities for mutations to occur, and older lineages having more time to accumulate variants. [26]

The age of population splits affects variant distribution. Older lineages have had more time to accumulate variants. Recent demographic events (5,000-10,000 years ago) particularly shape the distribution of rare variants. Population-specific variants can arise either from new mutations within a population or from the loss of variants in other populations. [27]

The impact of growth on SNP variant diversity is particularly evident in founder populations, where initial small population sizes followed by rapid expansion create unique patterns of genetic variation and haplogroup distribution. [28]

Differences between ‘Generations’ and ‘Haplogroups’

The parallel between ‘generation’ in the traditional genealogical time layer and ‘haplogroup’ in the other two time layers is limited. A family is associated with a specific network of individuals that can be associated with a ‘generation’. A generation is a group of people born around the same time and generally in the same area. A generation is also the average period of time it takes for children to be born, grow up, become adults, and have children. [29]

A haplogroup, on the other hand, is a group of people with similar genetic SNP and STR markers that can be traced back to a common ancestor. That common ancestor could have lived thousands of years before the group of people identified as having similar genetic markers. Despite the limited similarity between the terms family and haplogroup, their similarity is based on their ability to connect and trace patrilineal or matrilineal connections across each of the three time layers.

Illustration five below provides an example of comparing ‘generational’ and ‘haplogroup’ properties based on my genealogical evidence. On the left hand side of the illustration are eight generations depicting my patrilineal family lineage through traditional genealogical research. To the right of my traditional patrilineal lineage is my ‘recent’ genetic genealogical lineage depicted through haplogroups based on SNP mutations along my patrilineal line.

As reflected in the illustration, my traditional patrilineal genealogical tree depicts eight generations between fathers and sons. Generations can be viewed as the years between father and son. In this instance, generations range from 21 years to 41 years. My patrilineal line of descent, which comprises eight generations back, spans 217 years.

Illustration Five: Comparison of Generations in a Traditional Family Tree and ‘Genetic Generations’ in a Haplotree

Sources: The traditional patrilineal line is based on personal genealogical research. The haplogroup information is based on genetic data test results from the Y-700 DNA test from FamilyTreeDNA (FTDNA)

The recent haplogroups or ‘genetic generations’ in my patrilineal line, as reflected in illustration five, comprise five SNP mutation levels or ‘genetic generations’ prior to my terminal Y-DNA SNP, which is identified as G-FT48097. Another haplogroup, identified as G-FT119236, also split off from my most recent haplogroup, G-BY211678. I am related to the G-FT119236 haplogroup through that shared branch, but I am not directly descended from it.

As depicted in table three, three things are particularly notable with haplogroups: the range of years between each haplogroup, the variance in the number of SNPs associated with each haplogroup, and the number of immediate descendants or subbranches for each haplogroup. The number of years between haplogroups ranges from an estimated 50 years to 1400 years. The number of SNPs associated with each haplogroup varies greatly. The third observation, not evident in illustration five, is the number of branches or subclades, that is, the number of male descendant lines from each haplogroup.

Table Three: SNP Variants and Immediate Male Descendants Associated with Selected Haplogroups

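| Haplogroup | Number of Associated SNPs | Estimated Years Between Haplogroups | Number of Phylogenetic Subclades |
|---|---|---|---|
| G-Z6748 | 29 | – – | 2 |
| G-Y38335 | 2 | 50 | 2 |
| G-Z40857 | 5 | 150 | 4 |
| G-Y132505 | 2 | 150 | 4 |
| G-BY211678 | 3 | 300 | 2 |
| G-FT48097 | – – | 500 | |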

Corresponding to the same time frame as table three, illustration six depicts a phylogenetic tree of haplogroups and subclades or branches that are associated with my ‘recent’ genetic lineage descending from haplogroup G-Z6748.

Illustration Six: Phylogenetic Trees of Haplogroups Descending from G-Z6748

Source: A portion of and modification of Rolf Langland and Mauricio Catelli, Haplogroup G-L497 Chart D: FGC477 Branch, 2 Aug 2024, G-L497 Y-DNA Work Group, FamilyTreeDNA, https://drive.google.com/file/d/1xuZseoX40tWQhU5TpXZXqD6Y9zI9eqVz/view

Table four illustrates the wide variance in estimating the year of birth for each of the common ancestors associated with each haplogroup. While individual dates should be interpreted cautiously, collectively they can provide reliable benchmarks. Most genealogists recommend using 95% confidence intervals for the most accurate interpretation of results. Sixty-eight percent confidence intervals are recommended for narrower, but less certain, estimates. [30]

Table Four: The Most Recent Common Ancestor (tMRCA) Associated with Each Haplogroup

| Haplogroup | Estimated Birth Date of tMRCA | 95% Confidence Range of Birth | 95% Confidence in Yrs | Rounded Estimate of tMRCA Birth Date |
|---|---|---|---|---|
| G-Y38335 | 708 CE | 425 – 943 CE | 518 yrs | 700 CE |
| G-Z40857 | 970 CE | 737 – 1162 CE | 425 yrs | 950 CE |
| G-Y132505 | 1115 CE | 841 – 1332 CE | 491 yrs | 1100 CE |
| G-BY211678 | 1413 CE | 1210 – 1571 CE | 361 yrs | 1400 CE |
Source: FamilyTreeDNA Big Y Data Haplotree, accessed 26 Jan 2025
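The two derived columns in table four can be reproduced from the estimate and its 95% range. The sketch below assumes the rounded dates were obtained by rounding to the nearest 50 years. That is my reading of the numbers, and it happens to match all four rows, but the rounding rule itself is an assumption.

```python
# (estimated birth year, lower 95% bound, upper 95% bound), all in years CE
TMRCA = {
    "G-Y38335": (708, 425, 943),
    "G-Z40857": (970, 737, 1162),
    "G-Y132505": (1115, 841, 1332),
    "G-BY211678": (1413, 1210, 1571),
}


def interval_width(low: int, high: int) -> int:
    """Width of the confidence range in years."""
    return high - low


def round_to_nearest(year: int, step: int = 50) -> int:
    """Round a year to the nearest multiple of `step` (assumed rule: nearest 50)."""
    return int(round(year / step) * step)


for name, (estimate, low, high) in TMRCA.items():
    print(name, interval_width(low, high), "yrs, about", round_to_nearest(estimate), "CE")
```

The widths come out to 518, 425, 491 and 361 years, which makes plain how broad each 95% range is relative to a single 31-year generation.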

The reliability of Y-DNA SNP-based MRCA estimates varies significantly depending on the timeframe and methodology used. For genetic genealogy purposes, accuracy varies with depth of time. For prehistoric migrations of about 5,000 years ago, estimates are precise to within about 500 years. For MRCAs within the last 200 years, the variance is estimated at around 30 years. For MRCA dating based on cultural origins within 800 years, the precision of the estimate is plus or minus 500 years. [31]

Different testing companies use different mutation rates. YFull uses 144.4 years per SNP. FamilyTreeDNA results associated with the Big Y-500 DNA test used 131.3 years per SNP. For the Big Y-700 Y-DNA test, a mutation rate of 83.3 years per SNP is used. [32]
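The practical effect of these different rates is easy to see with a small calculation. The sketch below multiplies a SNP count by each quoted years-per-SNP figure. The count of 10 SNPs is hypothetical, and in practice the rates are not interchangeable across tests, because each is tied to how much of the Y chromosome a given test covers.

```python
YEARS_PER_SNP = {
    "YFull": 144.4,
    "FamilyTreeDNA Big Y-500": 131.3,
    "FamilyTreeDNA Big Y-700": 83.3,
}


def estimated_years(snp_count: int, years_per_snp: float) -> float:
    """Approximate elapsed years for a given number of SNP mutations."""
    return snp_count * years_per_snp


snp_count = 10  # hypothetical number of SNPs along one line of descent
for source, rate in YEARS_PER_SNP.items():
    print(f"{source}: about {estimated_years(snp_count, rate):.0f} years")
```

A test that covers more of the Y chromosome finds more SNPs per lineage, so each SNP stands for fewer years. This is why the Big Y-700 figure is the smallest of the three.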

Haplotrees as Family Trees in the Mid Range and Long Term Genealogical Time Layers

A haplotree is a branching diagram that shows the evolutionary relationships and genetic ancestry of human populations through inherited genetic markers. These trees represent the journey of human genetic lineages and help visualize how different groups are related to each other genetically. [33] There are two main types of haplotrees: Mitochondrial DNA (mtDNA) haplotrees that track maternal lineages through mitochondrial DNA and Y-DNA haplotrees that track paternal lineages through Y chromosome mutations.

Haplotrees follow a nested hierarchical structure where each haplogroup originates from and remains part of a preceding haplogroup. They are typically labeled using alphabetical nomenclature, starting with an initial letter followed by numbers and additional letters for refinements (e.g., A → A1 → A1a). [34]

The Y-DNA haplotree is particularly dynamic, with new branches being added frequently as more genetic data becomes available. As of recent updates, it has grown significantly from its initial 153 branches and 243 Y-SNPs to encompass thousands of documented genetic lineages. [35]

As of February 2024, it was claimed that the Y-DNA haplotree contained 76,626 distinct branches. [36] Another source indicates that by the end of 2024 these totals had grown to 86,892 branches and 734,748 variants, marking a full-year increase from 2023 of 11,823 branches (15.5%) and 83,752 variants (12.9%). [37]

Unlike the Y DNA tree, which is defined and constructed by the genetic community, new mitochondrial DNA branches cannot be added to the official mitochondrial Phylotree. The official mitochondrial Phylotree is maintained at www.phylotree.org and is periodically updated. The most recent version is mtDNA tree build 17, published and updated in February 2016. [38]

Haplotrees are built on the principle that genetic mutations accumulate and remain fixed in DNA over time. When a mutation occurs, all descendants of that individual will carry that genetic marker. The sequential nature of these mutations allows scientists to reconstruct the historical order of genetic changes and map human migrations throughout history.

Illustration seven depicts the major branches for the Y-DNA haplogroup tree and illustration eight depicts the major branches for the mtDNA maternal lineages.

Illustration Seven: Major Branches of the Y-DNA Haplogroup Tree

Source: Primary structure of the Y-chromosome tree. Nineteen letters label monophyletic clades, but three of these (orange) denote internal branches ancestral to other lettered haplogroups: F is an ancestor of G, H, I, J, and K; K is the common ancestor of L, T, N, O, S, M, and P; and P is an ancestor of Q and R. A twentieth letter, “A”, marks a paraphyletic group of the four most highly diverged clades: A00, A0, A1a, and A1b1 (blue). Multi-letter labels represent joins. For example, DE is the parent of D and E. Finally, A1b is the parent of A1b1 and BT, the common ancestor of all non-A haplogroups. Source: 23andMe to Update Paternal Haplogroup Assignments, 11 Apr 2024, 23andMe Blog, https://blog.23andme.com/articles/23andme-updates-paternal-haplogroup-assignments

Illustration Eight: Major Branches of the mtDNA Haplogroup Tree

Source: Modification of diagram found at – Katy Rowe-Schurwanz, Learn about the significance of mtDNA haplogroups and how your mtDNA test results can help you trace your maternal ancestry back to Mitochondrial Eve, 19 Jul 2024, FamilyTreeDNA Blog, https://blog.familytreedna.com/interpreting-mtdna-test-results/
Source: FamilyTreeDNA

We can look at my DNA results in the context of haplotrees. Results of my FamilyTreeDNA (FTDNA) Y-700 DNA test indicate my Y-DNA terminal haplogroup is G-BY211678 and my mtDNA haplogroup is H50.

The relative positions of these results are indicated in illustrations nine and ten of the major haplotree branches by blue circles.

Given the specificity and the wide range of SNPs tested in the Y-700 DNA test, my results reflect a new terminal end point, G-FT48097, in the G-BY211678 branch of the G haplotree. [38] A terminal SNP represents the furthest known branch or “leaf” on the haplotree. (See illustration nine.)

This metaphorical tree framework has proven so useful that it has become a standard way to visualize and understand Y-DNA testing results, with modern genetic testing companies like Family Tree DNA adopting it as their primary way to represent genetic relationships.

Illustration Nine: The Tree Metaphor for explaining Branches in the G Haplotree Branch and My Test Results

The application of the tree metaphor specifically to terminal SNPs emerged from the broader field of genetic genealogy and haplogroup identification. A terminal SNP represents the furthest known branch or “leaf” on a person’s genetic tree. This modern usage combines the traditional tree metaphor with current genetic science and the branch structure of the DNA haplotree. The main branches or subclades represent major haplogroups. Smaller branches indicate subgroups. The terminal SNP represents the smallest “leaf” on the branch.

Unlike Y-line DNA, no additional SNP tests are required to fully determine one’s mitochondrial DNA haplogroup.  The full mitochondrial sequence test (mtFullSequence) at FTDNA provides the most detailed, full haplogroup designation. With the HVR1 (mtDNA) and HVR2 (mtDNAPlus) tests, you receive a base haplogroup.  The full sequence is required to determine your full haplogroup.

To put this in perspective, think of your mitochondrial DNA as a clock face. There are a total of 16,569 locations in your mitochondrial DNA. The HVR1 test tests the number of locations from 11:55 to noon and the HVR2 test tests the number of locations between noon and 12:05PM.  The full sequence test tests the rest, the balance of the 50 minutes of the hour.[39]

Illustration Ten: The H50 Branch on the mtDNA PhyloTree

Source: PhyloTree.org – mtDNA tree Build 17 (18 Feb 2016): subtree R0, http://www.phylotree.org/tree/R0.htm

Reframing Contextual Factors for Mid Range and Deep Ancestry Time Layers

Given the change in the frame of reference in developing family stories in the mid and long range time periods, it is more useful to redefine the four ‘social’ structural levels of influence in genetic genealogical terms, as indicated in table five.

Table Five: Comparison of Structural Influences between Different Genealogical Layers of Time

| Social Structural Level | Examples in Short Term Time Layer | Examples in Mid Range & Long Range Layers |
|---|---|---|
| Individual | Family Member; Couple; Nuclear Family; ‘A generation’ | Terminal SNP; Private Variant; the Most Recent Common Ancestor (tMRCA) |
| Micro Level | Extended Family; Local Neighborhood; Local Social Groups | SNP & STR Groups; Genetic Distance; Haplogroup subclade; Modal Haplotype; tMRCA; Localized Geographical Area |
| Intermediate Level | Ethnic Networks; Strata / Class; City-Wide area; Local Regional Areas | SNP Haplogroup Sub-branches / Subclades; Modal Haplotype; tMRCA; Regional Geographic Area |
| Macro Level | State & National Level; European Country; Geographical Region | Migratory Paths of Haplogroups; Major Branches of Haplogroups; tMRCA; Regions of Europe |

The ‘individual’ level in the mid range and long term layers of time is ideally represented by a terminal SNP or private variant. A terminal SNP is the defining mutation that represents the most recently known branch on a Y-DNA haplogroup tree, or haplotree. A private variant is a genetic mutation that has occurred in a specific family line but has not yet been found in other tested individuals. These variants represent new SNPs that are unique to particular lineages. [40]

New branches emerge when a variant not only becomes a Named Variant but also fulfills additional criteria: at least one person must test negative for it. This “negative test” helps distinguish the new branch from equivalent ones, signaling a point of divergence in the tree. Each branch represents a distinct lineage, connecting individuals to their unique paternal heritage and further refining our understanding of the tree’s structure.[41]

There are distinct differences between private variants and terminal SNPs. When a private variant is found in enough testers and receives official designation, it can become a new terminal SNP for those who carry it. This demonstrates the evolving nature of genetic genealogy classification as more people test their DNA.
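The branching rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `forms_new_branch`, the tester labels, and the `min_positive` threshold of two are hypothetical, and the sketch is not FamilyTreeDNA's actual procedure.

```python
def forms_new_branch(results, min_positive=2):
    """Return True if a variant could mark a new branch on the haplotree.

    results maps a tester label to True (tested positive), False (tested
    negative), or None (no reliable read at that position).
    min_positive is an assumed threshold for a private variant being
    shared by enough testers to be named.
    """
    positives = sum(1 for r in results.values() if r is True)
    negatives = sum(1 for r in results.values() if r is False)
    # A branch needs carriers AND at least one negative test to show divergence.
    return positives >= min_positive and negatives >= 1


# Hypothetical testers who all sit on the same parent branch.
shared_by_some = {"tester_a": True, "tester_b": True, "tester_c": False}
shared_by_all = {"tester_a": True, "tester_b": True, "tester_c": True}
still_private = {"tester_a": True, "tester_b": False, "tester_c": False}

print(forms_new_branch(shared_by_some))  # True
print(forms_new_branch(shared_by_all))   # False
print(forms_new_branch(still_private))   # False
```

In the second case every tester carries the variant, so it is equivalent to the existing branch rather than a point of divergence. In the third case the variant is still private to one tester.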

The ‘micro’ level is represented by haplogroup subclades or branches that are related to the terminal SNP or private variant. The subclades are in a ‘local’ geographical area and are related to a common ancestor who resided in that geographical area. This level is analogous to the ‘extended family’ or ‘local social groups’. This is the genetic social structural level that can reveal the emergence of surnames in the period of lineages.

Illustration Eleven: Genealogical Time and Social Structural Levels

The ‘intermediate’ level straddles the mid range and long range layers of genealogical time. The social structures in this time layer are akin to ‘ethnic networks’ or larger networks and haplogroups based in ‘regional geographical areas’. It is represented by a larger portion of haplogroup subclades, which comprise haplogroup branches that have a common genetic ancestor who migrated from one geographical area to another. The phylogenetic tree of haplogroups descending from G-Z4768 in illustration six above would be an example.

The ‘macro’ level is in the long range genealogical time layer. It is graphically reflected by the migratory paths of major branches in a haplogroup lineage. This time layer is similar to French historian Fernand Braudel’s “long duration” (longue durée). It emphasizes studying history or genealogy through the lens of long-term, slow-moving structures like geography, climate, and demographics, rather than focusing on short-term events or individual figures. It essentially looks at the deep, underlying patterns of history that persist over extended periods of time, often beyond human memory. [42]

Illustration twelve depicts the differences in the social structural levels in each of the three genealogical time layers.

Illustration Twelve: Historical Context of Social Structure in the Three Genealogical Time Layers

The three layers of genealogical time rely upon different methods of gathering contextual evidence. I have discussed contextual factors found in the traditional or short term genealogical time layer in a previous story.

As depicted in illustration thirteen, in addition to the various social structural levels that may influence our development of a story about a family member or family in the traditional genealogical time layer, there are ecological, technological, economic, and cultural influences that may add historical context to the story. These influences may affect specific social structural levels or all of them. Rather than delve into possible relationships of causation, I have simply recognized the impact of and interplay between social, cultural, and technological influences when weaving stories from our genealogical evidence.

Illustration Thirteen: Social Structural Levels and Other Influences in the Three Genealogical Time Layers

The long term and mid range ancestry genealogical time layers are also influenced by contextual factors. However, the ability to retrieve evidence on these factors diminishes as one goes back in time. These contextual factors in the period of deep ancestry are largely the outcome of a series of environmental, demographic and evolutionary events reflected in migration, genetic bottlenecks, founder events, admixture, population isolation, natural selection and genetic drift which occurred in different parts of the world at various time points in history. [43]

In human populations, changes in genetic variation are driven not only by genetic processes themselves, but can also arise from environmental, cultural or social changes. SNPs and STRs are influenced by several key factors that affect their occurrence and distribution throughout the genome. Demographic population patterns significantly influence SNP and STR mutation patterns through several key mechanisms.

Rob Spencer’s research in genealogy, particularly regarding “bottleneck” events, focuses on identifying periods in a population’s history where a significant decrease in population size occurred, which can leave a noticeable genetic signature in the genealogical record and impact the diversity of descendants today. Conversely, a founder event happens when a small group separates from a larger population to establish a new colony. [44]

Cultural factors and processes can influence migration patterns and genetic isolation of populations, and can be responsible for the patterns of genetic variation as a result of gene-culture co-inheritance (e.g. a preference of cousin marriage). Understanding how social and cultural processes affect the genetic patterns of human populations over time has brought together anthropologists, geneticists and evolutionary biologists, and the availability of genomic data and powerful statistical methods widens the scope of questions that analyses of genetic information can answer.” [45]

The long term and mid range ancestry genealogical time layers rely on paleo-genomic, anthropological sources and historical analyses of cultural groups for contextual evidence. [46] The contextual sources for the deep ancestry time period are discussed in part three of this series of stories.

Illustration Fourteen: Historical Context of Social Structure, Culture, and Other Factors in the Three Genealogical Time Layers

An Illustrative Model for Depicting the Mid Range and Long Term Genealogical Time Periods

Examples for each of the four structural levels in mid and long range genealogical time are provided in an illustrated model of genealogical time and historical contexts of structural and cultural factors below.

Illustration Fifteen: Time and Historical Context of Structure, Culture, and Other Factors in the Mid and Long Range Genealogical Time Layers

The examples for each of the social structural levels in the illustration are based on my genetic genealogical past. The examples for creating the illustration are from various sources. [47]

Reference Number in Model | Structural Level | Example
One | Individual | My terminal SNP G-FT480 based on Y-700 FamilyTreeDNA results
Two | Micro | Phylogenetic Tree of Descendants of Haplogroup G-Y132505
Three | Intermediate | Phylogenetic Tree of Descendants of Haplogroup G-Z6748
Four | Macro | Migratory Path of G Haplogroup in Europe

Reference Number 2 & 3 in the Model

The Phylogenetic tree is based on the current YDNA descendants of Haplogroup G-Z6748.

A subset of the phylogenetic tree, which represents the micro level, is the haplogroup G-Z6748. This haplogroup appears to be a largely Welsh haplogroup, though extending into neighboring parts of England.

My Y-700 DNA test results are reflected in work compiled by the project administrators of the FamilyTreeDNA G-L497 work group project. [48]

Reference Number 4 in the Model

An illustrative example used in the model depicted above for the macro social structural level is a depiction of the general migratory path for my patrilineal genetic ancestors through the G-L497 haplogroup line. The ‘reconstructed’ migratory path was created using Globetrekker.

Globetrekker is an innovative DNA mapping tool launched by FamilyTreeDNA (FTDNA) in July 2023. The mapping tool visualizes paternal ancestry migration paths. This feature is only available to customers who have taken the Big Y-500 or Big Y-700 test. [49]

Reference Number 5 & 7 in the Model

An observation is noted in the illustrated model about the high percentage of the population in Wales that exhibits STR values associated with the G-P303 haplogroup. “In Wales, a distinctive G2a3b1 type (DYS388=13 and DYS594=11) dominates there and pushes the G percentage of the population higher than in England.” In the model, it is used to illustrate a micro level genetic observation that is found in the short term and mid range genealogical time layers.

DYS stands for DNA Y-chromosome Segment. It is used to describe a segment of DNA on the Y chromosome that contains short tandem repeats (STRs). STRs are short DNA patterns that repeat in a specific sequence. All STRs are given a unique identification number. For example, DYS388: the D indicates that the segment is a DNA segment, the Y indicates that the segment is on the Y chromosome, the S indicates that it is a unique segment, and the number 388 is the identifier.

The values for the two abovementioned DYS markers are uniquely associated with Haplogroup G-P303 (G2a2b2a, formerly G2a3b1).
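As a minimal sketch of how the marker naming and the two quoted STR values could be handled in practice, the Python below splits a plain marker name into its prefix and identifier and checks whether a set of STR results carries the DYS388=13 and DYS594=11 pattern. The function names and the sample tester are hypothetical. Only the two signature values come from the text, and the parser handles only plain names of the form DYS followed by digits.

```python
import re

# Marker values quoted above for the distinctive Welsh G type.
WELSH_G_SIGNATURE = {"DYS388": 13, "DYS594": 11}


def parse_dys(name):
    """Split a plain marker name such as 'DYS388' into prefix and identifier."""
    match = re.fullmatch(r"(DYS)(\d+)", name.strip().upper())
    if match is None:
        raise ValueError(f"not a plain DYS marker name: {name!r}")
    return match.group(1), int(match.group(2))


def matches_signature(haplotype, signature=WELSH_G_SIGNATURE):
    """True if the haplotype has the signature's repeat count at every marker."""
    return all(haplotype.get(marker) == value for marker, value in signature.items())


# Hypothetical tester: the DYS393 value is invented for illustration.
tester = {"DYS393": 14, "DYS388": 13, "DYS594": 11}

print(parse_dys("DYS388"))        # ('DYS', 388)
print(matches_signature(tester))  # True
```

A match on two markers is only a hint. Placement on the haplotree still depends on SNP testing.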

Reference Number 6 in the Model

This observation is associated with the intermediate structural level. It is a current theory proffered by a member of the FamilyTreeDNA working project group for the G-Z6748 haplogroup. The testers in this group have Y-DNA ancestors who appear to have come from Wales.

Source: Migratory Path for Haplogroup G-Y132505 generated through Globetrekker, FamilyTreeDNA, based on data as of 21 Jan 2025

The current theory is that the ancestor of this YDNA line came across the English Channel with the Normans around the time of the Norman Invasion. While the ancestor was not Norman, he was probably French or Belgian.

Reference Number 8 in the Model

Examples of contextual evidence from macro cultural and paleo-genomic research are correlated with each of the four structural levels. This example of macro-cultural contextual evidence, shown in illustration three, provides a map of cultural groups around 1,000 – 1,200 BCE.

The information in the map is correlated with when the G-Z1817 haplogroup existed in Europe. The haplogroup follows an ancestral path that descended from earlier G lineages that were present in the region approximately 4,550 BCE. The haplogroup emerged from the G-CTS9737 haplogroup around 3,050 BCE during the transition between the Stone Age and Metal Ages.

Example of Cultural Groups in Europe Around 1000 – 1200 BCE

Source: Hay, Maciamo, Haplogroup G2a (Y-DNA), Jul 2023, Eupedia, https://www.eupedia.com/europe/Haplogroup_G2a_Y-DNA.shtml

The haplogroup appears to have a predominantly Germanic and Central European focus, with its distribution suggesting possible connections to early Germanic populations. The modern pattern indicates the haplogroup likely played a role in Central European population movements, though maintaining its strongest presence in German-speaking regions. [50]

Reference Number 9 in the Model

This illustrative example at the macro level shows where ancient DNA (aDNA) remains belonging to the G-P15 haplogroup have been found. G-P15, also known as haplogroup G2a, is a Y-chromosome haplogroup that emerged approximately 15,000-16,000 years ago.

Example of G-P15 Ancient remains in Europe

Source: E.K. Khusnutdinova, N.V. Ekomasova, et al., Distribution of Haplogroup G-P15 of the Y-Chromosome Among Representatives of Ancient Cultures and Modern Populations of Northern Eurasia, Opera Med Physiol. 2023. Vol. 10 (4): 57 – 72, doi: 10.24412/2500-2295-2023-4-57-72

 This genetic lineage is defined by specific mutations on the Y-chromosome, particularly the P15 marker. The G-P15 haplogroup is an ancestral group of my more historically immediate haplogroups. Current research indicates that G-P15 represents one of the main Neolithic genetic links connecting early farmers who migrated across different European routes, including the northern route through the Balkans to Central Europe and the western maritime route to the Western Mediterranean. [51]

Weaving Genealogical Stories Across the Three Layers of Time

This story provides a model to explain the connectedness of three different genealogical time layers and the associated contextual sources of evidence for developing genealogical stories. The combination of traditional genealogical research with genetic genealogical analysis offers several powerful benefits for extending research through three layers of genealogical time. While the terminology, the objects of research and the research methods are different, there is coherence between the two approaches that ties family history together across the time layers. Haplogroup testing can help overcome genealogical dead ends or brick walls by offering clues about ancestral origins beyond documented records, providing direction for research when traditional records are unavailable, and connecting genetic matches who share common ancestors.

Haplogroups enhance location-based research. They point to specific geographic regions where ancestors lived. They can confirm family origins and migration patterns. They also provide insights about ancestral locations from thousands of years ago that are not documented in historical records.

The combination of research through the three genealogical time layers helps validate genealogical research. DNA testing can confirm or disprove suspected family connections. Haplogroups can verify heritage claims that are too distant for autosomal DNA testing or beyond the reach of traditional research. Y-DNA patterns can help confirm surname connections and lineages.

Combining research across the three time layers provides a deeper historical understanding by revealing ancient migration patterns of family lines. It connects family history to broader historical movements. It provides insights about ancestors’ lives thousands of years before written records.

Each time layer provides valuable clues and they should be used as a unique source of evidence in our genealogical research.

Source:

Feature Banner: The banner at the top of the story is a depiction of the two models associated with the three layers of genealogical time with the four social structural levels of historical context and other factors.

[1] I have used 31 or 33 years as a rough estimate of a generation. This estimate has been ‘deduced’ after reading through the research and opinions about what is a generation in terms of years.

The conversion from generations to years typically uses a generation interval of approximately 30 years, rather than the previously assumed 20-25 years. This longer interval has been validated through extensive genealogical studies and population registers. For the most accurate calculations, it is recommended that an interval of 28-31.5 years be used.

Tremblay M, Vézina H. New estimates of intergenerational time intervals for the calculation of age and origins of mutations. Am J Hum Genet. 2000 Feb;66(2):651-8. doi: 10.1086/302770. PMID: 10677323; PMCID: PMC1288116, https://pmc.ncbi.nlm.nih.gov/articles/PMC1288116/
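The conversion described in this note is simple multiplication, and a short Python sketch makes the effect of the chosen interval visible. The function name `generations_to_years` is hypothetical. The 31-year default is the estimate used in this story, and 28 and 31.5 are the bounds of the recommended range quoted above.

```python
def generations_to_years(generations, years_per_generation=31):
    """Convert a count of generations into an approximate span of years."""
    return generations * years_per_generation


# Ten generations, the rough depth of the short term time layer.
print(generations_to_years(10))        # 310
print(generations_to_years(10, 28))    # 280
print(generations_to_years(10, 31.5))  # 315.0
```

Across ten generations the choice of interval within the recommended range moves the estimate by about 35 years, which is why the interval should be stated alongside any date derived this way.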

Also, see for example:

“But just how long is a generation? Don’t we all know as a matter of common knowledge that it generally averages about 25 years from the birth of a parent to the birth of a child. …

“I’ve shaded my earlier preferred number, 34, down a bit, to 33 or 32 but varying with the ethnicity, place, and period of the population.

(Based on a study of family documentation) For a total of 21 male-line generations among five lines, the average interval was close to 34 years per generation. For 19 female-line generations from four lines, the average was an exact 29 years per generation.”

John Barrett Robb, How Long is a Generation?, https://www.johnbrobb.com/Content/DNA/How_Long_Is_A_Human_Generation.pdf

“For the Y chromosome these rates assume a 31 year generation.”

J. Douglas McDonald, TMRCA Calculator, Oct 2014 version, Clan Donald, USA website, https://clandonaldusa.org/index.php/tmrca-calculator

Richard J. Wang, Samer I. Al-Saffar, Jeffrey Rogers and Matthew W. Hahn, Human generation times across the past 250,000 years, Science Advances, 6 Jan 2023, Vol 9 Issue 1, https://www.science.org/doi/10.1126/sciadv.abm7047

“(T)he accepted 25-year average has worked quite acceptably, and birth dates too far out of line with it are properly suspect.”

“As a check on those values, which are based on extensive data and rigorous mathematical analysis, although rounded off for ease of use, I decided to compare the generational intervals from all-male or all-female ranges in my own family lines for the years 1700 to 2000, and was pleasantly surprised to see how closely they agree. For a total of 21 male-line generations among five lines, the average interval was 34 years per generation. For 19 female-line generations from four lines, the average was 29 years per generation.”

“However, to convert generations to years and probable date ranges, use a value for the generational interval that is soundly based on the best currently available evidence.”

Donn Devine, How Long is a generation? Science Provides an Answer, International Society of Genetic Genealogy (ISOGG) Wiki, This page was last edited on 16 November 2016, https://isogg.org/wiki/How_long_is_a_generation%3F_Science_provides_an_answer. This article was originally published in Ancestry Magazine, Sep-Oct 2005, Volume 23, Number 4, pp51-53.

Marc Tremblay et al., “New Estimation of Intergenerational Time Intervals for the Calculation of Age and Origin of Mutations,” American Journal of Human Genetics 66 (Feb. 2000): 651-658.

Nancy Howell calculated average generational intervals among present-day members of the !Kung tribe. The !Kung are a contemporary hunter-gatherer group currently living in Botswana and Namibia. Their way of life mirrors the nomadic hunting and gathering lifestyle of pre-agricultural ancestors. The average age of mothers at birth of their first child was 20 and at the last birth 31, giving a mean of 25.5 years per female generation. Husbands were six to 13 years older, giving a male generational interval of 31 to 38 years.

Nancy Howell, The Demography of the Dobe !Kung (1979; second edition New York: Walter de Gruyter, 2000).

Archaeologist Kenneth Weiss questioned the accepted 20 and 25-year generational intervals, finding from his analysis of prehistoric burial sites that 27 years was a more appropriate interval. 

Kenneth M. Weiss, “Demographic Models for Anthropology,” American Antiquity 38 No. 2 (April 1973): 1-39.

With an average depth of nine generations, but extending as far back as 12 or 13 generations, Tremblay and Vézina’s sample included 10,538 generational intervals. They took as the interval the years between parents’ and children’s marriages, which averaged 31.7 years.

Marc Tremblay, H. Vézina, New estimates of intergenerational time intervals for the calculation of age and origins of mutations. Am J Hum Genet. 2000 Feb;66(2):651-8. doi: 10.1086/302770. PMID: 10677323; PMCID: PMC1288116. https://pubmed.ncbi.nlm.nih.gov/10677323/

Ingman and associates used 20-year generations to place “mitochondrial Eve” 171,500 +/- 50,000 years before present, a probability range broad enough to cover underestimation.

Max Ingman et al., “Mitochondrial Genome Variation and the Origin of Modern Humans,” Nature 408 (2000): 708-713.

Thomson and associates used 25-year generations (although noting Weiss’s 27-year estimate) to place the most recent common male-line ancestor of all living men about 50,000 years before the present.

Russell Thomson et al., “Recent Common Ancestry of Human Y Chromosomes,” Proceedings of the National Academy of Sciences USA 97 (20 June 2000): 7360-7365
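To show why the choice of generational interval matters for these deep dates, the sketch below works out the number of generations implied by each estimate and rescales it to a longer interval. The function names are hypothetical, and simple proportional rescaling is an assumption made here purely for illustration. It is not the method the cited studies used to derive their dates.

```python
def implied_generations(age_years, years_per_generation):
    """Number of generations implied by an age and a generation interval."""
    return age_years / years_per_generation


def rescale_age(age_years, old_interval, new_interval):
    """Re-express an age with a different interval, same generation count."""
    return implied_generations(age_years, old_interval) * new_interval


# 171,500 years at 20 years per generation.
print(implied_generations(171_500, 20))  # 8575.0
print(rescale_age(171_500, 20, 31))      # 265825.0

# 50,000 years at 25 years per generation.
print(implied_generations(50_000, 25))   # 2000.0
print(rescale_age(50_000, 25, 31))       # 62000.0
```

Under this simplified assumption, moving from a 20-year or 25-year interval to 31 years pushes both estimates substantially further back in time.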

Fenner, Jack N., Cross-cultural estimation of the human generation interval for use in genetics-based population divergence studies (American Journal of Physical Anthropology 128(1Jan2005):415-423)

Generation, Wikipedia, This page was last edited on 15 January 2024, https://en.wikipedia.org/wiki/Generation

Richard J. Wang et al., Human generation times across the past 250,000 years. Science Advances Vol 9 No 1, 2023. DOI:10.1126/sciadv.abm7047

The concept of a ‘generation’ takes on a different meaning from a purely historical or sociological view.


Kertzer, David I. “Generation as a Sociological Problem.” Annual Review of Sociology, vol. 9, 1983, pp. 125–49. JSTOR, http://www.jstor.org/stable/2946060 

“The scope of future generational studies may be somewhat restricted by limiting the concept of generation to relations of kinship descent. But such restrictions do not entail any limitation of substantive or theoretical inquiry; rather, they entail a more precise use of concepts.” Page 143

“What is crucial … is that generational processes be firmly placed in specific historical contexts – i.e., that they be analyzed in conjunction with the concepts of cohort, age, and historical period.” Page 143

“Examining generation in conjunction with age opens up a research agenda that may be obscured where age, cohort, and generation are used interchangeably. The issues likely to be of greatest interest depend on the theoretical orientation of the researcher. From a sociobiological viewpoint, generational relations are central to society, for they underlie the transmission of genes …” Page 144

“I advocate a role of the concept of generation more restricted than that championed by many other social scientists, but a role nonetheless important.” Page 144


Jansen, Nerina. “Definition of Generation and Sociological Theory.” Social Science, vol. 49, no. 2, 1974, pp. 90–98. JSTOR, http://www.jstor.org/stable/41959796 

“There are two methodological prerequisites for the identification of the generation in the social structure: (a) a particular time dimension and (b) a particular historical context.” Page 93


Spitzer, Alan B. “The Historical Problem of Generations.” The American Historical Review, vol. 78, no. 5, 1973, pp. 1353–85. JSTOR, https://doi.org/10.2307/1854096 


See also:

Carlsson, Gosta, and Katarina Karlsson. “Age, Cohorts and the Generation of Generations.” American Sociological Review, vol. 35, no. 4, 1970, pp. 710–18. JSTOR, https://doi.org/10.2307/2093946  

Julián Marías, Generations: A Historical Method, Alabama: Alabama University Press, 1970

For a psychological perspective, see: Bettelheim, Bruno. “The Problem of Generations.” Daedalus, vol. 91, no. 1, 1962, pp. 68–96. JSTOR, http://www.jstor.org/stable/20026698  

[2] The following are definitions of the terms used in this sentence.

A terminal SNP (Single Nucleotide Polymorphism) is the defining SNP of the most recent known subclade on a person’s Y-DNA haplogroup tree based on their current testing level. It represents the furthest tested branch position on the Y-chromosome tree of human ancestry. Terminal SNPs are considered “once in the lifetime of mankind” mutations that are stable and unique genetic markers. They help define different haplogroups and subclades on the paternal line. The terminal SNP designation can change over time as different testing companies may identify different terminal SNPs based on their testing coverage. More extensive testing may reveal additional downstream SNPs. New SNPs are discovered through advanced testing like the FamilyTreeDNA Big Y-700.

Terminal SNPs are valuable for determining the precise placement of DNA test results on the human paternal and maternal family tree. They are also useful for identifying genetic relationships between different family lines. Two individuals cannot be closely related within the past 1,000 years if they belong to different haplogroups, even if their other genetic markers appear similar. [a]

The Most Recent Common Ancestor (MRCA) is the most recent individual from whom all members of a specified group are directly descended. The MRCA represents the point where specific genealogical lines of a group converge to a single ancestor. While it is often impossible to identify the exact MRCA of a large group, scientists can estimate when this ancestor lived using DNA tests and established mutation rates. [b]

A subclade is a subgroup within a larger genetic haplogroup that represents a more specific and detailed classification of genetic lineages. A subclade is defined by specific genetic markers, particularly Single Nucleotide Polymorphisms (SNPs), that distinguish it from other branches within the same haplogroup. Subclades form nested hierarchies within haplogroups, with each subclade representing a more recent branch of the genetic family tree.

The classification of subclades can change as new SNPs are discovered. More extensive testing may reveal additional downstream markers. Different testing companies identify new genetic markers. [c]

A haplotype is a group of alleles inherited together from a single parent. These genetic variations are located on the same chromosome and pass down as a unit through generations. [d]

A modal haplotype is the most commonly occurring set of genetic markers (STR values) found within a specific group of people. It represents the predominant pattern in a population but may not necessarily be the ancestral pattern. [e]

Feature | Haplotype | Modal Haplotype
Origin | Individual inheritance | Population statistics
Representation | Actual genetic sequence | Most frequent pattern
Scope | Individual level | Group or population level

The modal haplotype functions as a theoretical construct composed of the most frequent value for each marker among members of the same lineage. This creates a reference point that is useful for groups sharing common ancestry within the past several hundred years.
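The construction just described, taking the most frequent value for each marker, can be sketched in Python as follows. The function name `modal_haplotype` and the four sample testers with their STR values are hypothetical and are used only to show the calculation.

```python
from collections import Counter


def modal_haplotype(haplotypes):
    """Return the most frequent value for each marker across a group.

    haplotypes is a list of dicts mapping marker name to repeat count.
    When two values tie, the one seen first in the list is kept.
    """
    markers = sorted({marker for h in haplotypes for marker in h})
    modal = {}
    for marker in markers:
        values = [h[marker] for h in haplotypes if marker in h]
        modal[marker] = Counter(values).most_common(1)[0][0]
    return modal


# Hypothetical STR results for four testers in one surname group.
group = [
    {"DYS388": 13, "DYS393": 14, "DYS594": 11},
    {"DYS388": 13, "DYS393": 14, "DYS594": 11},
    {"DYS388": 13, "DYS393": 15, "DYS594": 11},
    {"DYS388": 12, "DYS393": 14, "DYS594": 11},
]

print(modal_haplotype(group))
# {'DYS388': 13, 'DYS393': 14, 'DYS594': 11}
```

Because each marker is counted separately, the modal haplotype can be a combination of values that no single tester carries in full, which is why it is treated as a reference point and not as a real person's result.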

Modal haplotypes are useful in surname DNA projects by helping researchers analyze genetic relationships within family groups. Modal haplotypes help project administrators that manage Y-DNA results for DNA companies to determine genetic families within surname projects by providing a reference point for comparison. When comparing participants’ DNA results, the modal haplotype serves as a baseline to identify related individuals.

The modal haplotype represents the most commonly occurring genetic marker values within a specific group, though it may not exactly match the ancestral haplotype due to sampling bias, genetic drift, or founder effects.

Project administrators use modal haplotypes to compare marginal members against the core genetic family; resolve conflicting matches between participants; and group test results without initially relying on paper trail genealogy. When working with modal haplotypes in surname projects, administrators can help identify genetic families within the same surname group. Modal haplotypes can also be used to evaluate potential new members and compare participants with different testing resolutions.

[a] Estes, Roberta, Glossary – Terminal SNP, 29 Nov 2017, DNAeXplained – Genetic Genealogy, https://dna-explained.com/2017/11/29/glossary-terminal-snp/

[b] Most Recent Common Ancestor, Wikipedia, This page was last edited on 6 January 2025, https://en.wikipedia.org/wiki/Most_recent_common_ancestor

Most Recent Common Ancestor, International Society of Genetic Genealogy Wiki, This page was last edited on 31 January 2017, https://isogg.org/wiki/Most_recent_common_ancestor

[c] Subclades, Wikipedia, This page was last edited on 24 May 2024, https://en.wikipedia.org/wiki/Subclade

[d] Haplotype, Wikipedia, This page was last edited on 19 September 2024, https://en.wikipedia.org/wiki/Haplotype

Haplotype / Haplotypes, Scitable, https://www.nature.com/scitable/definition/haplotype-haplotypes-142/

[e] Modal Haplotype, Wikipedia, This page was last edited on 10 May 2024, https://en.wikipedia.org/wiki/Modal_haplotype

Matching and grouping in surname DNA projects, International Society of Genetic Genealogy Wiki, This page was last edited on 28 January 2021, https://isogg.org/wiki/Matching_and_grouping_in_surname_DNA_projects 

[3] Estes, Roberta, Glossary – Terminal SNP, 29 Nov 2017, DNAeXplained – Genetic Genealogy, https://dna-explained.com/2017/11/29/glossary-terminal-snp/

[4] Polymorphism (biology), Wikipedia, This page was last edited on 14 December 2024, https://en.wikipedia.org/wiki/Polymorphism_(biology)

Fan H, Chu JY. A brief review of short tandem repeat mutation. Genomics Proteomics Bioinformatics. 2007 Feb; 5(1):7-14. doi: 10.1016/S1672-0229(07)60009-6. PMID: 17572359; PMCID: PMC5054066. https://pmc.ncbi.nlm.nih.gov/articles/PMC5054066/

Estes, Roberta, STRs vs SNPs, Multiple DNA Personalities, 10 Feb 2014, DNAeXplained – Genetic Genealogy, https://dna-explained.com/2014/02/10/strs-vs-snps-multiple-dna-personalities/

Single-nucleotide polymorphism, Wikipedia, This page was last edited on 6 January 2025, https://en.wikipedia.org/wiki/Single-nucleotide_polymorphism

[5] John M. Butler, Michael D. Coble, Peter M. Vallone, STRs vs. SNPs: thoughts on the future of forensic DNA testing, Forensic Sci Med Pathol (2007) 3:200–205. DOI 10.1007/s12024-007-0018-1, https://strbase-archive.nist.gov/pub_pres/FSMP_STRs_vs_SNPs.pdf

Norrgard, Karen & Schultz, JoAnna, Using SNP data to examine human phenotypic differences. Nature Education 1(1):85, 2008, https://www.nature.com/scitable/topicpage/using-snp-data-to-examine-human-phenotypic-706/

Fan H, Chu JY. A brief review of short tandem repeat mutation. Genomics Proteomics Bioinformatics. 2007 Feb;5(1):7-14. doi: 10.1016/S1672-0229(07)60009-6. PMID: 17572359; PMCID: PMC5054066, https://pmc.ncbi.nlm.nih.gov/articles/PMC5054066/

Estes, Roberta, STRs vs SNPs, Multiple DNA Personalities, 10 Feb 2014, DNAeXplained, https://dna-explained.com/2014/02/10/strs-vs-snps-multiple-dna-personalities/

Phillips C, García-Magariños M, Salas A, Carracedo A, Lareu MV. SNPs as Supplements in Simple Kinship Analysis or as Core Markers in Distant Pairwise Relationship Tests: When Do SNPs Add Value or Replace Well-Established and Powerful STR Tests? Transfus Med Hemother. 2012 Jun;39(3):202-210. doi: 10.1159/000338857. Epub 2012 May 12. PMID: 22851936; PMCID: PMC3375139, https://pmc.ncbi.nlm.nih.gov/articles/PMC3375139/

[6] The number 10 in mutation rates represents scientific notation, which is used to express very small probabilities of mutations occurring. A mutation rate (per base per generation) of ~10^-8 means 0.00000001. In humans, a mutation rate of 10^-8 means one mutation occurs per hundred million base pairs per generation. With 3 billion base pairs in the human genome, this results in approximately 30-100 new mutations per generation. [a]

A mutation rate of 10^-8 represents the probability of a mutation occurring at a specific nucleotide site per generation in humans. [b][c] To put this in practical terms, one published estimate places the rate at approximately 2.5 × 10^-8 mutations per nucleotide site per generation. [d] With a human genome of about 3 billion base pairs, this results in roughly 60-100 new mutations in each person’s genome per generation. At this rate, every possible single base-pair mutation is likely to exist somewhere in the current human population. For any specific site in the genome, dozens of humans may carry a mutation at that location. [c] Two-base-pair specific mutations would require approximately 10^7 generations to occur by chance.
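The arithmetic behind these figures is a single multiplication of the per-site rate by the number of sites. The short Python sketch below reproduces it. The function name is hypothetical, the genome size is the rounded 3 billion base pairs used above, and the result is an expected count for a single copy of the genome, so it is a back-of-the-envelope figure, not a measured one.

```python
GENOME_SIZE_BP = 3_000_000_000  # about 3 billion base pairs, as stated above


def expected_new_mutations(rate_per_site, genome_size=GENOME_SIZE_BP):
    """Expected new mutations per generation for one copy of the genome."""
    return rate_per_site * genome_size


# 10^-8 is the order-of-magnitude rate; 2.5 x 10^-8 is the estimate cited in [d].
print(round(expected_new_mutations(1e-8)))    # 30
print(round(expected_new_mutations(2.5e-8)))  # 75
```

These two results bracket the lower end and the middle of the ranges quoted in this note.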

[a] Sanjuán R, Nebot MR, Chirico N, Mansky LM, Belshaw R. Viral mutation rates. J Virol. 2010 Oct;84(19):9733-48. doi: 10.1128/JVI.00694-10. Epub 2010 Jul 21. PMID: 20660197; PMCID: PMC2937809.

What is the Mutation Rate During Genome replication, Cell Biology by the Numbers, https://book.bionumbers.org/what-is-the-mutation-rate-during-genome-replication/

[b] Adam Eyre-Walker, Ying Chen Eyre-Walker, How Much of the Variation in the Mutation Rate Along the Human Genome Can Be Explained?, G3 Genes|Genomes|Genetics, Volume 4, Issue 9, 1 September 2014, Pages 1667–1670, https://doi.org/10.1534/g3.114.012849

[c] What is the Mutation Rate During Genome replication, Cell Biology by the Numbers, https://book.bionumbers.org/what-is-the-mutation-rate-during-genome-replication/

[d] Nachman MW, Crowell SL. Estimate of the mutation rate per nucleotide in humans. Genetics. 2000 Sep;156(1):297-304. doi: 10.1093/genetics/156.1.297. PMID: 10978293; PMCID: PMC1461236. https://pmc.ncbi.nlm.nih.gov/articles/PMC1461236/

Mutation rate, Wikipedia, This page was last edited on 7 November 2024, https://en.wikipedia.org/wiki/Mutation_rate

[7] Estes, Roberta, STRs vs SNPs, Multiple DNA Personalities, 10 Feb 2014, DNAeXplained – Genetic Genealogy, https://dna-explained.com/2014/02/10/strs-vs-snps-multiple-dna-personalities/

[8] Estes, Roberta, Y DNA: Step-by-Step Big Y Analysis, 30 May 2020, DNAeXplained – Genetic Genealogy, https://dna-explained.com/2020/05/30/y-dna-step-by-step-big-y-analysis/

[9] John M. Butler, Michael D. Coble, Peter M. Vallone, STRs vs. SNPs: thoughts on the future of forensic DNA testing, Forensic Sci Med Pathol (2007) 3:200–205. DOI 10.1007/s12024-007-0018-1, https://strbase-archive.nist.gov/pub_pres/FSMP_STRs_vs_SNPs.pdf

[10] Estes, Roberta, STRs vs SNPs, Multiple DNA Personalities, 10 Feb 2014, DNAeXplained – Genetic Genealogy, https://dna-explained.com/2014/02/10/strs-vs-snps-multiple-dna-personalities/

[11] Norrgard, K. & Schultz, J. (2008) Using SNP data to examine human phenotypic differences. Nature Education 1(1):85 https://www.nature.com/scitable/topicpage/using-snp-data-to-examine-human-phenotypic-706/

[12] John M. Butler, Michael D. Coble, Peter M. Vallone, STRs vs. SNPs: thoughts on the future of forensic DNA testing, Forensic Sci Med Pathol (2007) 3:200–205. DOI 10.1007/s12024-007-0018-1, https://strbase-archive.nist.gov/pub_pres/FSMP_STRs_vs_SNPs.pdf

[13] Fan H, Chu JY. A brief review of short tandem repeat mutation. Genomics Proteomics Bioinformatics. 2007 Feb;5(1):7-14. doi: 10.1016/S1672-0229(07)60009-6. PMID: 17572359; PMCID: PMC5054066, https://pmc.ncbi.nlm.nih.gov/articles/PMC5054066/

[14] Rob Spencer, STR Clades, Tracking Back: a website for genetic genealogy tools, experimentation, and discussion, http://scaledinnovation.com/gg/gg.html?rr=strclades

[15] Rob Spencer, Why use STR data and not SNP data?, Tracking Back: a website for genetic genealogy tools, experimentation, and discussion, http://scaledinnovation.com/gg/gg.html?rr=whystr

[16] Katy Rowe-Schurwanz, Learn about the significance of mtDNA haplogroups and how your mtDNA test results can help you trace your maternal ancestry back to Mitochondrial Eve, 19 Jul 2024, FamilyTreeDNA Blog, https://blog.familytreedna.com/interpreting-mtdna-test-results/

[17] Haplogroup, Wikipedia, This page was last edited on 12 January 2025, https://en.wikipedia.org/wiki/Haplogroup

[18] Rowe-Schurwanz, Katy, Interpreting Y-DNA Test Results: Y-DNA Haplogroups, 2 Jul 2024, FamilyTreeDNA Blog, https://blog.familytreedna.com/interpreting-y-dna-test-results-haplogroups/

Rowe-Schurwanz, Katy, Big Y Lifetime Analysis: The Myth of the Manual Review, 22 Nov 2023, FamilyTreeDNA Blog, https://blog.familytreedna.com/big-y-manual-review-lifetime-analysis/

Y-DNA project help, International Society of Genetic Genealogy Wiki, This page was last edited on 28 October 2022, https://isogg.org/wiki/Y-DNA_project_help

[19] Rowe-Schurwanz, Katy, Interpreting Y-DNA Test Results: Y-DNA Haplogroups, 2 Jul 2024, FamilyTreeDNA Blog, https://blog.familytreedna.com/interpreting-y-dna-test-results-haplogroups/

[20] Hallast P, Batini C, Zadik D, Maisano Delser P, Wetton JH, Arroyo-Pardo E, Cavalleri GL, de Knijff P, Destro Bisol G, Dupuy BM, Eriksen HA, Jorde LB, King TE, Larmuseau MH, López de Munain A, López-Parra AM, Loutradis A, Milasin J, Novelletto A, Pamjav H, Sajantila A, Schempp W, Sears M, Tolun A, Tyler-Smith C, Van Geystelen A, Watkins S, Winney B, Jobling MA. The Y-chromosome tree bursts into leaf: 13,000 high-confidence SNPs covering the majority of known clades. Mol Biol Evol. 2015 Mar;32(3):661-73. doi: 10.1093/molbev/msu327. Epub 2014 Dec 2. PMID: 25468874; PMCID: PMC4327154, https://pmc.ncbi.nlm.nih.gov/articles/PMC4327154/

[21] Several key methods exist for calculating Time to Most Recent Common Ancestor (TMRCA), each with distinct advantages and limitations. Recent developments have led to tree-based methods using Y-SNPs, which offer improved phylogenetic tree construction, better handling of sub-clade relationships and more accurate mutation counting between nodes.
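To make the idea of counting mutations between nodes concrete, here is a minimal sketch of a simple SNP-counting age estimate. It is not the method of any source cited in this note, which use more careful statistical models. The function names are hypothetical, and the mutation rate and covered region size in the usage example are placeholder values chosen for round numbers.

```python
def years_per_snp(rate_per_bp_per_year, covered_bp):
    """Average number of years between SNPs in the tested region."""
    return 1.0 / (rate_per_bp_per_year * covered_bp)


def tmrca_years(snp_counts, rate_per_bp_per_year, covered_bp):
    """Mean SNP count below the shared node, converted to years."""
    if not snp_counts:
        raise ValueError("need at least one SNP count")
    mean_snps = sum(snp_counts) / len(snp_counts)
    return mean_snps * years_per_snp(rate_per_bp_per_year, covered_bp)


if __name__ == "__main__":
    # Placeholder inputs (assumptions, not measured values):
    rate = 8e-10          # mutations per base pair per year
    covered = 10_000_000  # base pairs reliably read by the test
    # Two testers with 10 and 14 SNPs since their shared ancestor.
    estimate = tmrca_years([10, 14], rate, covered)
    print(f"about {estimate:.0f} years to the shared ancestor")
```

With these placeholder inputs, one SNP corresponds to about 125 years, so an average of 12 SNPs suggests a common ancestor about 1,500 years before the testers. Real estimates also carry wide confidence intervals that this sketch does not compute.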

McDonald I. Improved Models of Coalescence Ages of Y-DNA Haplogroups. Genes (Basel). 2021 Jun 4;12(6):862. doi: 10.3390/genes12060862. PMID: 34200049; PMCID: PMC8228294 https://pmc.ncbi.nlm.nih.gov/articles/PMC8228294/

Hallast P, et al, The Y-chromosome tree bursts into leaf: 13,000 high-confidence SNPs covering the majority of known clades. Mol Biol Evol. 2015 Mar;32(3):661-73. doi: 10.1093/molbev/msu327. Epub 2014 Dec 2. PMID: 25468874; PMCID: PMC4327154, https://pmc.ncbi.nlm.nih.gov/articles/PMC4327154/

Boattini, A., Sarno, S., Mazzarisi, A.M. et al. Estimating Y-Str Mutation Rates and Tmrca Through Deep-Rooting Italian Pedigrees. Sci Rep 9, 9032 (2019). https://doi.org/10.1038/s41598-019-45398-3

Basu A. and Majumder P. P. 2003 A comparison of two popular statistical methods for estimating the time to most recent common ancestor (TMRCA) from a sample of DNA sequences. J. Genet., 82, 7–12, https://www.ias.ac.in/article/fulltext/jgen/082/01-02/0007-0012

Zhou J, Teo YY. Estimating time to the most recent common ancestor (TMRCA): comparison and application of eight methods. Eur J Hum Genet. 2016 Aug;24(8):1195-201. doi: 10.1038/ejhg.2015.258. Epub 2015 Dec 16. PMID: 26669663; PMCID: PMC4970674, https://pmc.ncbi.nlm.nih.gov/articles/PMC4970674/

Estes, Roberta, Haplogroups: DNA SNPs are Breadcrumbs – Follow Their Path, 10 Aug 2023, DNAeXplained – Genetic Genealogy, https://dna-explained.com/2023/08/10/haplogroups-dna-snps-are-breadcrumbs-follow-their-path/

[22] Most recent common ancestor, Wikipedia, This page was last edited on 20 January 2025, https://en.wikipedia.org/wiki/Most_recent_common_ancestor

[23] Spencer, Rob, Data Source and SNP Dates, Discussion, SNP Tracker, http://scaledinnovation.com/gg/snpTracker.html

Rob Spencer alludes to YFull’s operational definition of a TMRCA’s inception date. YFull is a specialized DNA analysis service that focuses on interpreting Y-chromosome and mitochondrial DNA sequences. YFull analyzes raw data files (BAM and CRAM) obtained from next-generation sequencing (NGS) to study origins in both the direct paternal line (Y-DNA) and the direct maternal line (mitochondrial DNA).

What is YFull, Tutorial, YFull, https://www.yfull.com/tutorial/

What is YFull’s age estimation methodology?, FAQ, YFull, https://www.yfull.com/faq/what-yfulls-age-estimation-methodology/

Estes, Roberta, Data Mining and Screen Scraping – Right or Wrong?, 6 Apr 2014, DNAeXplained – Genetic Genealogy, https://dna-explained.com/category/yfull-company/

Jonas, Linda, Advantages of submitting to YFull, 14 Oct 2019, The Ultimate Family Historians, http://ultimatefamilyhistorians.blogspot.com/2019/10/advantages-of-submitting-to-yfull.html

[24] Generation, Wikipedia, This page was last edited on 18 January 2025, https://en.wikipedia.org/wiki/Generation

[25] Lohmueller KE, Bustamante CD, Clark AG. Methods for human demographic inference using haplotype patterns from genomewide single-nucleotide polymorphism data. Genetics. 2009 May;182(1):217-31. doi: 10.1534/genetics.108.099275. Epub 2009 Mar 2. PMID: 19255370; PMCID: PMC2674818, https://pmc.ncbi.nlm.nih.gov/articles/PMC2674818/

[26] Yunusbaev, U., Valeev, A., Yunusbaeva, M. et al. Reconstructing recent population history while mapping rare variants using haplotypes. Sci Rep 9, 5849 (2019). https://doi.org/10.1038/s41598-019-42385-6

[27] Haplogroup, International Society of Genetic Genealogy Wiki, This page was last edited on 1 November 2024, https://isogg.org/wiki/Haplogroup

[28] Choudhury A, Hazelhurst S, Meintjes A, Achinike-Oduaran O, Aron S, Gamieldien J, Jalali Sefid Dashti M, Mulder N, Tiffin N, Ramsay M. Population-specific common SNPs reflect demographic histories and highlight regions of genomic plasticity with functional relevance. BMC Genomics. 2014 Jun 6;15(1):437. doi: 10.1186/1471-2164-15-437. PMID: 24906912; PMCID: PMC4092225, https://pmc.ncbi.nlm.nih.gov/articles/PMC4092225/

Yunusbaev, U., Valeev, A., Yunusbaeva, M. et al. Reconstructing recent population history while mapping rare variants using haplotypes. Sci Rep 9, 5849 (2019). https://doi.org/10.1038/s41598-019-42385-6

Zurel, H., Bhérer, C., Batten, R. et al. Characterization of Y chromosome diversity in Newfoundland and Labrador: evidence for a structured founding population. Eur J Hum Genet 33, 98–107 (2025). https://doi.org/10.1038/s41431-024-01719-3

[29] Generation, Wikipedia, This page was last edited on 18 January 2025, https://en.wikipedia.org/wiki/Generation

[30] McDonald I. Improved Models of Coalescence Ages of Y-DNA Haplogroups. Genes (Basel). 2021 Jun 4;12(6):862. doi: 10.3390/genes12060862. PMID: 34200049; PMCID: PMC8228294, https://pmc.ncbi.nlm.nih.gov/articles/PMC8228294/

[31] McDonald I. Improved Models of Coalescence Ages of Y-DNA Haplogroups. Genes (Basel). 2021 Jun 4;12(6):862. doi: 10.3390/genes12060862. PMID: 34200049; PMCID: PMC8228294, https://pmc.ncbi.nlm.nih.gov/articles/PMC8228294/

Irvine, James, Y-DNA SNP-Based TMRCA Calculations for Surname Project Administrators, Journal of Genetic Genealogy, Volume 9, Number 1 (Fall 2021), Reference Number: 91.007, https://jogg.info/wp-content/uploads/2021/12/91.007-Article.pdf

Mullen, Pierre, 16 Feb 2023, Introducing the New FTDNATiP™ Report for Y-STRs, FamilyTreeDNA Blog, https://blog.familytreedna.com/ftdnatip-report/

[32] McDonald I. Improved Models of Coalescence Ages of Y-DNA Haplogroups. Genes (Basel). 2021 Jun 4;12(6):862. doi: 10.3390/genes12060862. PMID: 34200049; PMCID: PMC8228294, https://pmc.ncbi.nlm.nih.gov/articles/PMC8228294/

[33] Human Y-chromosome DNA haplogroup, Wikipedia, This page was last edited on 31 December 2024, https://en.wikipedia.org/wiki/Human_Y-chromosome_DNA_haplogroup

Cloud, Janine, Y-DNA Haplotree Growth and Genetic Discoveries in 2024, 16 Jan 2025, FamilyTreeDNA Blog, https://blog.familytreedna.com/y-dna-haplotree-growth-2024/

Haplogroup, Wikipedia, This page was last edited on 12 January 2025, https://en.wikipedia.org/wiki/Haplogroup

[34] Y Chromosome Consortium. A nomenclature system for the tree of human Y-chromosomal binary haplogroups. Genome Res. 2002 Feb;12(2):339-48. doi: 10.1101/gr.217602. PMID: 11827954; PMCID: PMC155271, https://pmc.ncbi.nlm.nih.gov/articles/PMC155271/

[35] Estes, Roberta, Y DNA Tree of Mankind Reaches 50,000 Branches, 7 Dec 2021, DNAeXplained – Genetic Genealogy, https://dna-explained.com/2021/12/07/y-dna-tree-of-mankind-reaches-50000-branches/

[36] Williams, Edison, A Brief History of the yDNA Haplotree, 18 Feb 2024, WikiTree G2G, https://www.wikitree.com/g2g/1706781/a-brief-history-of-the-ydna-haplotree

[37] Cloud, Janine, Y-DNA Haplotree Growth and Genetic Discoveries in 2024, 16 Jan 2025, FamilyTreeDNA Blog, https://blog.familytreedna.com/y-dna-haplotree-growth-2024/

[38] van Oven M, Kayser M. 2009. Updated comprehensive phylogenetic tree of global human mitochondrial DNA variation. Hum Mutat 30(2):E386-E394. http://www.phylotree.org. doi:10.1002/humu.20921

[39] Estes, Roberta, What is a Haplogroup, 24 Jan 2013, DNAeXplained – Genetic Genealogy, https://dna-explained.com/2013/01/24/what-is-a-haplogroup/

[40] Private variants are newer mutations that have not yet been officially named or placed on the haplotree. They are specific to particular family lines and must be found in multiple testers before receiving official designation.

A terminal SNP represents the most recently confirmed and named mutation on the Y-DNA haplotree for an individual. It defines the latest known subclade in a person’s lineage.

Both can be distinguished by naming status. Private variants are unnamed mutations waiting to be officially recognized. Terminal SNPs have been officially named and placed on the haplotree.

Verification requirements for both are different. Private variants need confirmation through multiple testers to become named SNPs. Terminal SNPs are already established and confirmed markers.

Both represent different points on a genealogical timeline. Private variants typically represent more recent mutations in a family line. Terminal SNPs can represent older, well-established branch points in the haplotree.

For a private variant to be officially named and placed on the Y-DNA haplotree, it must be found in at least two samples with sufficient positive reads, compared against other Big Y DNA test results to verify uniqueness, and reviewed by phylogenetic experts to ensure it hasn’t already been discovered by another lab.

Once confirmed, private variants receive specific designations. For Big Y-500 discoveries they get the prefix “BY” followed by a number. For Big Y-700 discoveries they receive the prefix “FT” (or FTA, FTB, FTC, FTD) with a number.
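The prefix rule just described can be expressed as a small lookup. This is only a sketch of the rule as stated in this note: the function name is hypothetical, the names BY12345 and FTA678 in the usage example are made up for illustration, and SNPs named by other labs (which carry other prefixes) are simply reported as "other".

```python
import re

# Prefixes as described in this note for FamilyTreeDNA discoveries.
BIG_Y_500_PREFIXES = {"BY"}
BIG_Y_700_PREFIXES = {"FT", "FTA", "FTB", "FTC", "FTD"}


def discovery_source(snp_name):
    """Guess which Big Y test a named SNP came from, based on its prefix."""
    match = re.fullmatch(r"([A-Za-z]+)(\d+)", snp_name.strip())
    if not match:
        return "unrecognized"
    prefix = match.group(1).upper()
    if prefix in BIG_Y_500_PREFIXES:
        return "Big Y-500"
    if prefix in BIG_Y_700_PREFIXES:
        return "Big Y-700"
    return "other"


if __name__ == "__main__":
    # BY12345 and FTA678 are invented names used only for illustration.
    for name in ["BY12345", "FTA678", "L497", "Z6748"]:
        print(name, "->", discovery_source(name))
```

A prefix only hints at where a SNP was first named. It says nothing about the age of the mutation or its position on the haplotree.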

See, for references:

Rowe-Schurwanz, Katy, Big Y Lifetime Analysis: The Myth of the Manual Review, 22 Nov 2023, FamilyTreeDNA Blog, https://blog.familytreedna.com/big-y-manual-review-lifetime-analysis/

Private variant vs novel variant vs singleton, 31 May 2015, FamilyTreeDNA Forum, https://forums.familytreedna.com/forum/paternal-lineages-y-dna/y-dna-haplogroups-snps-basics/330714-private-variant-vs-novel-variant-vs-singleton

Estes, Roberta, Glossary – Terminal SNP, 29 Nov 2017, DNAeXplained – Genetic Genealogy, https://dna-explained.com/2017/11/29/glossary-terminal-snp/

Estes, Roberta, Y DNA: Step-by-Step Big Y Analysis, 30 May 2020, DNAeXplained – Genetic Genealogy, https://dna-explained.com/2020/05/30/y-dna-step-by-step-big-y-analysis/

Marian AJ. Clinical Interpretation and Management of Genetic Variants. JACC Basic Transl Sci. 2020 Oct 26;5(10):1029-1042. doi: 10.1016/j.jacbts.2020.05.013. PMID: 33145465; PMCID: PMC7591931, https://pmc.ncbi.nlm.nih.gov/articles/PMC7591931/

Yang L. A Practical Guide for Structural Variation Detection in the Human Genome. Curr Protoc Hum Genet. 2020 Sep;107(1):e103. doi: 10.1002/cphg.103. PMID: 32813322; PMCID: PMC7738216, https://pmc.ncbi.nlm.nih.gov/articles/PMC7738216/

Marshall, C.R., Chowdhury, S., Taft, R.J. et al. Best practices for the analytical validation of clinical whole-genome sequencing intended for the diagnosis of germline disease. npj Genom. Med. 5, 47 (2020). https://doi.org/10.1038/s41525-020-00154-9

Angelo Fortunato, Diego Mallo, Shawn M Rupp, Lorraine M King, Timothy Hardman, Joseph Y Lo, Allison Hall, Jeffrey R Marks, E Shelley Hwang, Carlo C Maley, A new method to accurately identify single nucleotide variants using small FFPE breast samples, Briefings in Bioinformatics, Volume 22, Issue 6, November 2021, bbab221, https://doi.org/10.1093/bib/bbab221

Big Y Private Variants Guide, FamilyTreeDNA Help center, https://help.familytreedna.com/hc/en-us/articles/4402695710223-Big-Y-Private-Variants-Guide

de Vere, Lloyd, What Is your template statement for Y DNA proved by Big Y SNPs, 21 Jan 2022, WikiTree G2G, https://www.wikitree.com/g2g/1362001/what-is-your-template-statement-for-y-dna-proved-by-big-y-snps

[41] Cloud, Janine, Y-DNA Haplotree Growth and Genetic Discoveries in 2024, 16 Jan 2025, FamilyTreeDNA Blog, https://blog.familytreedna.com/y-dna-haplotree-growth-2024/

[42] See, for example:

The Braudel Method, The Indian Ocean World Centre, a McGill Research Centre, McGill University, https://indianoceanworldcentre.com/fernand-braudel/

Guldi J, Armitage D. Going forward by looking back: the rise of the longue durée. In: The History Manifesto. Cambridge University Press; 2014:14-37

McNeill, William H. “Fernand Braudel, Historian.” The Journal of Modern History, vol. 73, no. 1, 2001, pp. 133–46. JSTOR, https://doi.org/10.1086/319882 

Dale Tomich, The Order of Historical Time: Longue Durée and Micro-History, Almanack. Guarulhos, n.02, p.52-65, 2o semestre de 2011, https://www.scielo.br/j/alm/a/dF7D8LWPFhCjtjmx7NKbtQk/?format=pdf&lang=en

Smith, Michael, E., Braudel’s Temporal Rhythms and Chronology Theory in Archaeology, in: Knapp AB, ed. Archaeology, Annales, and Ethnohistory. New Directions in Archaeology. Cambridge University Press; 1992:23-34. https://www.public.asu.edu/~mesmith9/1-CompleteSet/MES-92-Braudel1.pdf

[43] The following are influences on genetic genealogy:

Influence | Description | Examples in G Haplogroup
--- | --- | ---
Migration | Genetic haplogroup migration is the study of how people with a particular genetic haplogroup have moved over time. By analyzing the distribution of haplogroups in different populations, geneticists can learn about human migration and evolution. [a] | The predominant migratory path of the G haplogroup is believed to run from the Middle East westward across Anatolia into Europe during the Neolithic period, with some branches migrating eastward towards the Iranian plateau and Central Asia. The highest concentrations are currently found in the Caucasus region. [b]
Bottleneck | A drastic reduction in population size, or the decimation of a gene pool (haplogroup), due to a catastrophic event or changes in social customs. The surviving individuals may not represent the full genetic spectrum of the original population. [c] | The split between the G1 and G2 subclades is believed to have occurred in the region of modern-day Iran around the Last Glacial Maximum (LGM). It indicates a period of significantly reduced population size in which a small group of individuals carrying the G haplogroup expanded and diversified into the G1 and G2 lineages. This pattern is often observed in the distribution of G2a, which is prevalent in the Caucasus and parts of the Middle East, suggesting a population expansion from a limited founder population. [d]
Founder Event | In a founder event, the founding group inherently carries only a subset of the original population’s genetic variation. [e] | A founder event within the G haplogroup could be the migration of a population carrying the G haplogroup from the Caucasus region (where it is believed to have originated) into the Anatolian peninsula, leading to a significant increase in the frequency of G lineages within that region, possibly associated with the spread of early agriculture during the Neolithic period. [f]
Admixture | The process in which individuals from two or more previously distinct populations interbreed, producing a new population with mixed genetic ancestry. Over time, the mixing of genes from different populations creates a mosaic of genetic heritage within an individual. [g] | An example of admixture in the G haplogroup would be a significant share of G haplogroup carriers in a population primarily associated with another haplogroup, such as a high frequency of G carriers in a region historically dominated by the R haplogroup. This indicates past intermixing with populations from geographical origins where the G haplogroup is more prevalent, such as the Middle East or the Mediterranean region. [h]
Population Isolation | A situation in which a group of people is geographically or culturally separated from other populations, leading to limited gene flow and a distinct genetic makeup within the isolated group. Barriers such as distance, language, or social customs keep genetic mixing with surrounding groups to a minimum, which often reveals unique patterns in the group’s DNA and allows researchers to study specific genetic traits more easily. [i] | The Caucasus region’s mountainous terrain and historical political boundaries contributed to a degree of isolation, allowing specific G subclades to develop and become more prevalent within those populations. [j]
Natural Selection | |
Genetic Drift | The random change in the frequency of certain genetic variants (alleles) within a population over time, due simply to chance. Some lineages become more prevalent while others become less common, even if the variants have no direct impact on survival or reproduction. [k] | Genetic drift has a more significant impact on smaller populations, where random fluctuations in allele frequencies can drastically change the genetic makeup. In Wales, a distinctive G2a3b1 type (DYS388=13 and DYS594=11) dominates and pushes the G percentage of the population higher than in England. [l]
Deme | A small, localized population of organisms within a species that interbreed primarily with each other: a distinct breeding group with a shared gene pool, often considered a sub-population within a larger population. It is a key concept in population genetics, particularly when studying how genes evolve within geographically restricted areas. [m] | Research demonstrates that patrilineal kinship systems played a crucial role in creating a Y-DNA bottleneck that occurred approximately 5,000-7,000 years ago. The Y-chromosome bottleneck was a dramatic reduction in male genetic diversity to approximately one-twentieth of its original level, while female genetic diversity remained stable. [n]
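The point that genetic drift acts more strongly on small populations can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not a model from any cited source: it uses neutral, Wright-Fisher-style resampling in which each man in the next generation inherits his Y lineage from a randomly chosen man in the previous one. All function names, population sizes, and the starting frequency of 20% are hypothetical.

```python
import random


def simulate_drift(pop_size, start_freq, generations, rng):
    """Return a lineage's frequency after random resampling each generation."""
    freq = start_freq
    for _ in range(generations):
        # Each of pop_size sons carries the lineage with probability freq.
        carriers = sum(1 for _ in range(pop_size) if rng.random() < freq)
        freq = carriers / pop_size
    return freq


def spread_of_outcomes(pop_size, replicates=100, generations=20, seed=1):
    """Standard deviation of the final frequency across independent runs."""
    rng = random.Random(seed)
    finals = [simulate_drift(pop_size, 0.2, generations, rng)
              for _ in range(replicates)]
    mean = sum(finals) / len(finals)
    return (sum((f - mean) ** 2 for f in finals) / len(finals)) ** 0.5


if __name__ == "__main__":
    for size in (25, 1000):
        print(f"population {size}: spread {spread_of_outcomes(size):.3f}")
```

In the small population the final frequency varies widely from run to run, and the lineage is sometimes lost entirely or comes to dominate. In the large population it stays close to its starting value. No selection is involved in either case.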

[a] Lell JT, Wallace DC. The peopling of Europe from the maternal and paternal perspectives. Am J Hum Genet. 2000 Dec;67(6):1376-81. doi: 10.1086/316917. Epub 2000 Nov 9. PMID: 11078473; PMCID: PMC1287914, https://pmc.ncbi.nlm.nih.gov/articles/PMC1287914/

[b] Balanovsky O, Zhabagin M, Agdzhoyan A, Chukhryaeva M, Zaporozhchenko V, Utevska O, et al. (2015) Deep Phylogenetic Analysis of Haplogroup G1 Provides Estimates of SNP and STR Mutation Rates on the Human Y-Chromosome and Reveals Migrations of Iranic Speakers. PLoS ONE 10(4): e0122968. https://doi.org/10.1371/journal.pone.0122968

[c] Sanders, Robert, Bottlenecks that reduced genetic diversity were common throughout human history, 23 Jun 2022, UC Berkeley News, https://news.berkeley.edu/2022/06/23/bottlenecks-that-reduced-genetic-diversity-were-common-throughout-human-history/

Zeng, T.C., Aw, A.J. & Feldman, M.W. Cultural hitchhiking and competition between patrilineal kin groups explain the post-Neolithic Y-chromosome bottleneck. Nat Commun 9, 2077 (2018). https://doi.org/10.1038/s41467-018-04375-6

Tournebize R, Chu G, Moorjani P (2022) Reconstructing the history of founder events using genome-wide patterns of allele sharing across individuals. PLoS Genet 18(6): e1010243. https://doi.org/10.1371/journal.pgen.1010243 

[d] Burkhard Berger, Harald Niederstätter, Daniel Erhart, Christoph Gassner, Harald Schennach, Walther Parson, High resolution mapping of Y haplogroup G in Tyrol (Austria), Forensic Science International: Genetics, Volume 7, Issue 5, 2013, Pages 529-536, https://www.sciencedirect.com/science/article/abs/pii/S1872497313001361

[e] Slatkin M. A population-genetic test of founder effects and implications for Ashkenazi Jewish diseases. Am J Hum Genet. 2004 Aug;75(2):282-93. doi: 10.1086/423146. Epub 2004 Jun 18. PMID: 15208782; PMCID: PMC1216062, https://pmc.ncbi.nlm.nih.gov/articles/PMC1216062/

[f] Sims LM, Garvey D, Ballantyne J. Improved resolution haplogroup G phylogeny in the Y chromosome, revealed by a set of newly characterized SNPs. PLoS One. 2009 Jun 4;4(6):e5792. doi: 10.1371/journal.pone.0005792. PMID: 19495413; PMCID: PMC2686153, https://pmc.ncbi.nlm.nih.gov/articles/PMC2686153/

[g] Shriner D. Overview of admixture mapping. Curr Protoc Hum Genet. 2013;Chapter 1:Unit 1.23. doi: 10.1002/0471142905.hg0123s76. PMID: 23315925; PMCID: PMC3556814, https://pmc.ncbi.nlm.nih.gov/articles/PMC3556814/

[h] Haplogroup G (Y-DNA) by country, Wikipedia, This page was last edited on 15 October 2024, https://en.wikipedia.org/wiki/Haplogroup_G_(Y-DNA)_by_country

[i] Killgrove, Kristina, 9 of the most ‘genetically isolated’ human populations in the world, 17 Dec 2024, https://www.livescience.com/health/9-of-the-most-genetically-isolated-human-populations-in-the-world

[j] Sims LM, Garvey D, Ballantyne J. Improved resolution haplogroup G phylogeny in the Y chromosome, revealed by a set of newly characterized SNPs. PLoS One. 2009 Jun 4;4(6):e5792. doi: 10.1371/journal.pone.0005792. PMID: 19495413; PMCID: PMC2686153, https://pmc.ncbi.nlm.nih.gov/articles/PMC2686153/

[k] Genetic Drift and Natural Selection, Population Genetics and Statistics for Forensic Analysts, National Institute of Justice, U.S. Department of Justice, https://nij.ojp.gov/nij-hosted-online-training-courses/population-genetics-and-statistics-forensic-analysts/population-theory/hardy-weinberg-principle/genetic-drift-and-natural-selection

[l] Genetic Drift, Wikipedia, This page was last edited on 15 December 2024, https://en.wikipedia.org/wiki/Genetic_drift

[m] Deme (biology), Wikipedia, This page was last edited on 1 May 2023, https://en.wikipedia.org/wiki/Deme_(biology)

[n] Zeng, T.C., Aw, A.J. & Feldman, M.W., Cultural hitchhiking and competition between patrilineal kin groups explain the post-Neolithic Y-chromosome bottleneck. Nat Commun 9, 2077 (2018). https://doi.org/10.1038/s41467-018-04375-6

[44] Rob Spencer, The Big Picture of Y STR Patterns, The 14th International Conference on Genetic Genealogy, Houston, TX, March 22-24, 2019, page 12, http://scaledinnovation.com/gg/ext/RWS-Houston-2019-WideAngleView.pdf

[45] Zeng, T.C., Aw, A.J. & Feldman, M.W., Cultural hitchhiking and competition between patrilineal kin groups explain the post-Neolithic Y-chromosome bottleneck. Nat Commun 9, 2077 (2018). https://doi.org/10.1038/s41467-018-04375-6

[46] Paleogenomics is the scientific field focused on reconstructing and analyzing genomic information from ancient DNA. This cutting-edge discipline has revolutionized our understanding of ancient life through the examination of preserved genetic material. Paleogenomics has made significant contributions to genealogical research by revolutionizing our understanding of human ancestry and migration patterns.

Anthropological genetics has become a fundamental tool in reconstructing human evolutionary histories by combining molecular analysis with traditional anthropological approaches. The field combines insights from genomics, archaeology, and anthropology to understand transformative processes like migration and colonization. This multidisciplinary approach provides a more comprehensive understanding of human evolutionary history.

The integration of historical analysis and ancient DNA research has revolutionized our understanding of human migration patterns and cultural development. This integrated approach continues to provide new insights into human history, demonstrating that cultural and biological histories are deeply intertwined. Archaeological evidence, for example, has helped interpret genetic data by providing crucial temporal and spatial frameworks. In one case, the discovery of pottery in Anatolia coincided with genetic signatures from Levantine farmers, indicating a migration associated with technological advancement.

Paleogenomics, Wikipedia, This page was last edited on 16 December 2023, https://en.wikipedia.org/wiki/Paleogenomics

Hassler, Margaret, Genetic Lab to Revisit the Past, College of Liberal Arts, anthropology, University of Minnesota, https://cla.umn.edu/anthropology/news-events/story/genetics-lab-revisit-past

Gokcumen, Omer, “Evolution, Function and Deconstructing Histories: A New Generation of Anthropological Genetics” (2017). Human Biology Open Access Pre-Prints. 124. http://digitalcommons.wayne.edu/humbiol_preprints/124

Pickrell JK, Reich D. Toward a new history and geography of human genes informed by ancient DNA. Trends Genet. 2014 Sep;30(9):377-89. doi: 10.1016/j.tig.2014.07.007. Epub 2014 Aug 26. PMID: 25168683; PMCID: PMC4163019, https://pmc.ncbi.nlm.nih.gov/articles/PMC4163019/

Skourtanioti, E., Ringbauer, H., Gnecchi Ruscone, G.A. et al. Ancient DNA reveals admixture history and endogamy in the prehistoric Aegean. Nat Ecol Evol 7, 290–303 (2023). https://doi.org/10.1038/s41559-022-01952-3

[47] The illustration draws on the following sources:

[a] Rolf Langland and Mauricio Catelli, Haplogroup G-L497 Chart D: FG4 77 Branch, 2 Aug 2024, FTDNA G-L497 Working Group, https://drive.google.com/file/d/1xuZseoX40tWQhU5TpXZXqD6Y9zI9eqVz/view ;

[b] FTDNA Globetrekker Mapping of migration of the G Haplogroup based on end point for G-Y132505;

[c] Maciamo, Eupedia map of Late Bronze Age Europe (1200 – 1000 BCE), 2009 – 2017, https://www.eupedia.com/europe/neolithic_europe_map.shtml#late_bronze_age ;

[d] “The percentage of haplogroup G among available samples from Wales is overwhelmingly G-P303. Such a high percentage is not found in nearby England, Scotland or Ireland.”

Haplogroup G-P303, Wikipedia, This page was last edited on 10 December 2024, https://en.wikipedia.org/wiki/Haplogroup_G-P303 ;

[e] “In Wales, a distinctive G2a3b1 type (DYS388=13 and DYS594=11) dominates and pushes the G percentage of the population higher than in England.”

Haplogroup G-M201, Wikipedia, This page was last edited on 6 January 2025, https://en.wikipedia.org/wiki/Haplogroup_G-M201 ;

[f] E.K. Khusnutdinova, N.V. Ekomasova, et al., Distribution of Haplogroup G-P15 of the Y-Chromosome Among Representatives of Ancient Cultures and Modern Populations of Northern Eurasia, Opera Med Physiol. 2023. Vol. 10 (4): 57 – 72, doi: 10.24412/2500-2295-2023-4-57-72 ;

[g] Watkins, Mathew, The migration path for the G-L497 men entering into Britain, 28 May 2024, Activity Feed, G-L497 Y-DNA Group Project, FamilyTreeDNA, https://www.familytreedna.com/groups/g-ydna/activity-feed

[48] FamilyTreeDNA offers a wide variety of Y-DNA Group Projects to help further research goals. The group projects are associated with specific branches of the Y-DNA Haplotree, geographical areas, surnames, or other unique identifying criteria. Based on their respective area of focus, the research groups have access to and the ability to compare Y-DNA results of fellow project members to determine if they are related. These projects are run by volunteer administrators who specialize in the haplogroup, surname, or geographical region that one may be researching. 

For my research on the Griff(is)(es)(ith) family, upon receipt of my Y-DNA test results, I joined five FamilyTreeDNA Y-DNA projects to assist in my ongoing research:

The Wales Cymru DNA project collects the DNA haplotypes of individuals who can trace their Y-DNA and/or mtDNA lines to Wales (the reasoning by many researchers being that there was less genetic replacement from invaders there than elsewhere, excepting small inaccessible islands and similar locales). Tradition holds that the Celts retreated as far west in Wales as possible to escape invading populations. This project seeks to determine the validity of the theory. This project is open to descendants from all of Wales. (857 members as of the date of this article.)

The GRIFFI(TH,THS,N,S,NG…etc) surname project is intended to provide an avenue for connecting the many branches of Griffith, Griffiths, Griffin, Griffis, Griffing and other families with derivative surnames. The Welsh patronymic naming system, practiced into the latter 18th century, makes this task more difficult. Evan, Thomas, John, Rees, Owen, and many other common Welsh names may share common male ancestors. (871 members as of the date of this article.)

The G-L497 project includes men with the L497 SNP mutation or reliably predicted to be G-L497+ on the basis of certain STR marker values. L497 is a branch or subclade of the G haplogroup (M201+). The project also welcomes representatives of L497 males who are deceased, unavailable or otherwise unable to join, including females as their representatives and custodians of their Y-DNA. The primary goal of the project is to identify new subgroups of haplogroup G-L497 which will provide better focus to the migration history of our haplogroup G-L497 ancestors. (2,438 members as of the date of this article.)

The G-Z6748 project is a Y-DNA Haplogroup Project for a specific branch that is a more recent, ‘downstream’ branch from the L497 branch of the G haplotree. It is a project work group that is a subset of the L497 work group. The G-Z6748 subclade or branch appears to be a largely Welsh haplogroup, though extending into neighboring parts of England. (50 members as of the date of this article.)

The Welsh Patronymics project is designed to establish links between various families of Welsh origin with patronymic style surnames. Because the patronymic system (father’s given name as surname) continued until the 19th century in some parts of Wales, there was no reason to limit this study to a single surname. (1,661 members as of the date of this article.)

[49] The tool creates personalized animations spanning 200,000 years of history, tracking ancestral journeys from Y-Adam to an individual’s current Big Y haplogroup. It contains over 48,000 paternal line migration paths covering all populated continents.

Example Used in the Diagram

Source: FTDNA Globetrekker Mapping of migration of the G Haplogroup based on end point for G-Y132505

Globetrekker employs sophisticated phylogenetic algorithms that factor in topographical information, historical global sea levels, land elevation, and ice age glaciation. The system combines multiple data types to generate migration paths: archaeological data, earliest known ancestor locations from users and matches, ancient DNA samples, and population genetic studies.

Estes, Roberta, Globetrekker – A New Feature for Big Y Customers from FamilyTreeDNA, 4 Aug 2023, DNAeXplained – Genetic Genealogy, https://dna-explained.com/2023/08/04/globetrekker-a-new-feature-for-big-y-customers-from-familytreedna/

Runfeldt, Goran, Globetrekker, Part 1: A New FamilyTreeDNA Discover™ Report that Puts Big Y on the Map, 31 Jul 2023, FamilyTreeDNA Blog, https://blog.familytreedna.com/globetrekker-discover-report/

Maier, Paul, Globetrekker, Part 2: Advancing the Science of Phylogeography, 15 Aug 2023, FamilyTreeDNA Blog, https://blog.familytreedna.com/globetrekker-analysis/

[50] Rootsi S, Myres NM, Lin AA, Järve M, King RJ, Kutuev I, Cabrera VM, Khusnutdinova EK, Varendi K, Sahakyan H, Behar DM, Khusainova R, Balanovsky O, Balanovska E, Rudan P, Yepiskoposyan L, Bahmanimehr A, Farjadian S, Kushniarevich A, Herrera RJ, Grugni V, Battaglia V, Nici C, Crobu F, Karachanak S, Hooshiar Kashani B, Houshmand M, Sanati MH, Toncheva D, Lisa A, Semino O, Chiaroni J, Di Cristofaro J, Villems R, Kivisild T, Underhill PA. Distinguishing the co-ancestries of haplogroup G Y-chromosomes in the populations of Europe and the Caucasus. Eur J Hum Genet. 2012 Dec;20(12):1275-82. doi: 10.1038/ejhg.2012.86. Epub 2012 May 16. PMID: 22588667; PMCID: PMC3499744, https://pmc.ncbi.nlm.nih.gov/articles/PMC3499744/

Hay, Maciamo, Haplogroup G2a (Y-DNA), Jul 2023, Eupedia, https://www.eupedia.com/europe/Haplogroup_G2a_Y-DNA.shtml

Haplogroup G-M201, Wikipedia, This page was last edited on 13 January 2025, https://en.wikipedia.org/wiki/Haplogroup_G-M201

Haplogroup G-P303, Wikipedia, This page was last edited on 10 December 2024, https://en.wikipedia.org/wiki/Haplogroup_G-P303

[51] E.K. Khusnutdinova, N.V. Ekomasova, et al., Distribution of Haplogroup G-P15 of the Y-Chromosome Among Representatives of Ancient Cultures and Modern Populations of Northern Eurasia, Opera Med Physiol. 2023. Vol. 10 (4): 57 – 72, doi: 10.24412/2500-2295-2023-4-57-72

Rootsi S, Myres NM, Lin AA, Järve M, King RJ, Kutuev I, Cabrera VM, Khusnutdinova EK, Varendi K, Sahakyan H, Behar DM, Khusainova R, Balanovsky O, Balanovska E, Rudan P, Yepiskoposyan L, Bahmanimehr A, Farjadian S, Kushniarevich A, Herrera RJ, Grugni V, Battaglia V, Nici C, Crobu F, Karachanak S, Hooshiar Kashani B, Houshmand M, Sanati MH, Toncheva D, Lisa A, Semino O, Chiaroni J, Di Cristofaro J, Villems R, Kivisild T, Underhill PA. Distinguishing the co-ancestries of haplogroup G Y-chromosomes in the populations of Europe and the Caucasus. Eur J Hum Genet. 2012 Dec;20(12):1275-82. doi: 10.1038/ejhg.2012.86. Epub 2012 May 16. PMID: 22588667; PMCID: PMC3499744, https://pmc.ncbi.nlm.nih.gov/articles/PMC3499744/

Hay, Maciamo, Haplogroup G2a (Y-DNA), Jul 2023, Eupedia, https://www.eupedia.com/europe/Haplogroup_G2a_Y-DNA.shtml

G-P15 (Y-DNA), Geni, https://www.geni.com/projects/G-P15-Y-DNA/3927

Autosomal DNA Tests: Estimating Genetic Relationships and Discovering Relatives

In prior posts, I discussed the utility of Y-DNA tests as a possible avenue to gain insights and possible leads on identifying information about tracing the lineage associated with family surnames for the Griffis(ith)(es) family. [1] I have not discussed my experience of using autosomal DNA tests for genealogical and family research.

There are perhaps two unique things that atDNA tests can provide. They can:

  • identify unknown living relatives and their possible relationships; and
  • identify a possible relationship of a common ancestor that you share with a living relative.

My experience with atDNA tests has largely resulted in the initial discovery of many living third to fifth cousins. However, most of these distant cousins have not documented their respective lines of descent in the various DNA company databases. The lack of this additional genealogical information makes it difficult to document where our common distant family connections are located.

A few of the genetic connections from the atDNA tests have provided documentation on common family connections. Based on their information, I have been able to identify a few distant connections. On two other occasions, the tests led me to discover a half brother, two in all.

This three-part story focuses on the merits and limitations, as well as my personal experience, of using autosomal DNA (atDNA) tests for documenting genetic kinship ties in the Griffis family. This part provides general background to make sense of the DNA results. The second part of the story discusses my ongoing DNA discoveries from these tests. As such, the information can change in the future. The third part is devoted to my profound discovery of having two half siblings, David and Greg.

General Comparison of DNA Tests

Depending on the type, DNA tests tell you how much DNA you have inherited from unspecified ancestors on each side of your family, or how far back you can trace genetic lineages through a maternal or paternal line. Genetic genealogy, or the results from DNA tests, does not tell you where each member of your family tree lived or provide information on their specific family relationships.

DNA results can identify matches of living individuals and their possible shared kinship relationships. These estimates are based on the amount of shared DNA segments between the match and you. When it comes to identifying specific individuals and verifying kinship relationships, traditional genealogical research is typically required for interpretation of the results. [2]

There are basically three types of genetic tests used in genealogical research: autosomal (atDNA), Y-DNA, and mitochondrial DNA (mtDNA) tests (see illustration one below). Y-DNA and mtDNA tests respectively trace the paternal and maternal sides of one’s genetic history. Autosomal tests are broader: they can trace genetic relatives on both sides of your family tree. However, their effectiveness in tracing ancestors is limited in terms of how many generations back they can provide results. Another distinctive characteristic of atDNA tests is that they match living test takers through the amount of autosomal DNA they share.

Illustration One: Three Types of DNA Tests

Source: Modified version of an image found at Edward Sweeney, Types of DNA Test, MacDougall DNA Research Project, https://macdougalldna.org/types-of-dna-test-b/

As summarized in table one, while limited to the paternal line of descent, Y-DNA tests can track male genetic lineages back well over 100,000 years; the estimates cited here range from roughly 155,000 to 300,000 years. Mitochondrial testing of the matrilineal line can likewise provide results that go back roughly 140,000 to 200,000 years or more. The popular atDNA ‘ethnicity’ tests can trace back through only a limited number of generations. X-DNA (women have two X chromosomes, men one) is usually tested along with the other chromosomes as part of an atDNA test. [3]

Table 1: Type of DNA Testing

Characteristic | Autosomal DNA (atDNA) | Y-DNA (YDNA) | Mitochondrial DNA (mtDNA)
What does it test? | All autosomal chromosomes | Y chromosome | Mitochondria
Available to | Both males and females | Only males can take test | Both males and females
How far back? | 5 – 9 generations | ~155,000 years | ~200,000+ years
Source of Testing | Autosomal chromosomes | Y chromosome | Mitochondrial DNA found in mitochondria
What genealogical lines tested? | All ancestry lines | Only paternal (father’s father’s father, etc.) | Maternal (mother’s mother’s mother, etc.)
Benefits – utility | Finding relatives within a few generations, determining broader ethnicity estimations, identifying potential matches across both sides | Tracing direct paternal lines, surnames, identifying specific paternal lineages and haplogroups, studying deep paternal ancestry | Tracing a direct maternal line, identifying maternal haplogroups, analyzing ancient ancestry patterns
Available from the following companies | Ancestry.com; Family Tree DNA; 23andMe; MyHeritage; Living DNA | Family Tree DNA; 23andMe (high level); YSEQ; Full Genome Corp | Family Tree DNA; 23andMe; YSEQ; Full Genome Corp

Autosomal DNA tests are useful for finding relatives, such as unknown relatives, clarifying uncertain family relationships and identifying distant relatives. Typically DNA companies identify matches up to six generations. The Y-DNA and mtDNA tests, while limited to only tracing paternal lines or maternal lines respectively, can trace genetic lineage back over 150,000 years.

Popularity of Autosomal DNA Tests

“For about a hundred dollars, it is now possible to spit into a tube, drop it in the mail, and within a couple of months gain access to a list of likely relatives. If you have any colonial American ancestors, the first thing you realize, taking a DNA test for genealogical purposes, is that potential sixth cousins are a whole lot easier to come by than you ever imagined. Even fifth cousins — people with whom you share a fourth great-grandparent — aren’t a particular scarcity.” [4]

These tests provide information about an individual’s ancestral roots, and they can help to connect people with their relatives, sometimes as distantly related as fourth or fifth cousins. Such information can be particularly useful when a person does not know their genealogical ancestry (e.g., many adoptees and the descendants of forced migrants). [5]

The direct-to-consumer genetic testing market has shown significant growth in recent years, but there are indications of a recent slowdown in sales in 2023.

As many people purchased consumer DNA tests in 2018 as in all previous years combined. [6] Counting those prior years of consumer testing, more than 26 million consumers had added their DNA to the four leading commercial ancestry and health databases.

Chart One: atDNA Database Growth

Source: 23andMe Has More Than 10 Million Customers, April 8, 2019, The DNA Geek Blog, https://thednageek.com/23andme-has-more-than-10-million-customers/

In late 2019, there were signs of declining sales. Ancestry and 23andMe saw drops in direct website sales of 38% and 54% respectively compared to 2018. [7]

“Less than five years ago, consumer DNA tests were being hailed as the innovative technology of the future—but today, declining sales have forced several companies in the field to scale back their workforces and adjust their business strategies.” [8]

Market data from DNA companies suggest that the market continues to grow, albeit at a slower rate than in the initial boom years. Projections include all types of DNA tests (e.g. genetic relatedness, ancestry, lifestyle wellness, reproductive health, personalized medicine, sports nutrition, diagnostics and others). Factors like market saturation among early adopters and privacy concerns may be contributing to the moderation in growth rates.

Despite the decade-long rise in sales, in 2020 there was a sudden decline in interest. Two of the leading companies, 23andMe and AncestryDNA, experienced declines in sales of DNA ancestry kits of 54 and 38 percent, respectively. The decline was attributed to market saturation, economic recession related to the COVID-19 pandemic, and privacy concerns. [9]

Since 2021, 23andMe, a prominent direct-to-consumer genetic testing company, has faced significant financial challenges that have raised concerns about its future and the security of customer data. The company’s financial situation has deteriorated rapidly. Its stock price has plummeted, losing over 97% of its value since going public in 2021. 23andMe is reportedly on the verge of bankruptcy and has never turned a profit.  In 2023, the company suffered a major data breach affecting nearly 7 million users. The company has had turnover of board members and internal dissension between board members and executive management. [10]

The situation surrounding 23andMe serves as a cautionary tale about the risks associated with entrusting sensitive genetic information to private companies and highlights the need for robust data protection measures in the rapidly evolving field of consumer genomics. It also underscores the need to keep backup copies of one’s DNA data. [10a]

What do atDNA Tests Measure?

Autosomal DNA tests basically measure five things.

  1. Genetic Markers: atDNA tests look at hundreds of thousands of genetic markers, called single nucleotide polymorphisms (SNPs), across the 22 autosomal chromosome pairs in a DNA sample. More on SNPs later in this story. These sampled SNPs represent DNA sequences that can be used to efficiently identify genetic differences and similarities between individuals.
  2. Inheritance Patterns: The tests examine the autosomal DNA inherited from both parents, which includes genetic contributions from all recent ancestors. This allows for connections to be made with relatives on all “recent” branches of a family tree, not just direct paternal or maternal lines in the past six or so generations.
  3. Genetic Relatives: The tests identify shared DNA segments between the test taker and other individuals in the DNA test company’s database, allowing for the discovery of genetic relatives that are living and linking each matched DNA tester to past generations.
  4. Ethnicity Estimates: By comparing an individual’s genetic markers to reference populations maintained by a DNA test company, autosomal DNA tests can provide estimates of a person’s ancestral origins and ethnic background.
  5. Health Traits: Many atDNA testing companies also screen for genetic variants linked to certain inherited health conditions or physical traits that could affect one’s health.

The Genetic Influence of Autosomal DNA

An atDNA test is a measurement of sampled parts of your 22 autosomal chromosome pairs. Everyone (with rare exceptions) is born with a set of 23 pairs of chromosomes. The twenty-third pair consists of the sex chromosomes. In most cases, we inherit an X chromosome from our mother and either an X or a Y chromosome from our father, which determines our sex. (See illustration two.)

Illustration Two: Karyotype of Human Chromosomes [11]

Source: Karyotype, National Human Genome Research Institute, https://www.genome.gov/genetics-glossary/Karyotype

We inherit half of our chromosomes from our mother and the other half from our father. One of the 23 pairs consists of the sex chromosomes (in most cases, XX in females and XY in males). The remaining 22 pairs of chromosomes are autosomal chromosomes or autosomes. For example, as illustrated below, chromosomes from the depicted mother are labeled in purple, and chromosomes from the depicted father are labeled in teal. (See illustration three.) [12]

Illustration Three: Inheritance of Parental Chromosomes

Source: Human Genomic Variation, Fact Sheet, National Human Genome Research Institute, 1 Feb 2023, https://www.genome.gov/about-genomics/educational-resources/fact-sheets/human-genomic-variation

The genetic inheritance patterns associated with autosomal chromosomes become more complex and diluted over generations due to recombination and variable inheritance patterns. [13] Illustration four shows the average amount of atDNA shared with close relations up to the third cousin level. The illustration uses the maternal side as an example. The percentages can be replicated for the paternal side. [14] As reflected in the chart, fifty percent of one’s atDNA is inherited from each parent, and on average roughly equal portions come from each grandparent and from each ancestor in earlier generations, back to about the 3x great-grandparents.

Illustration Four: Percent of Autosomal Genetic Inheritance from Descendants

Source: Dimario, A chart illustrating the different types of cousins, including genetic kinship marked within boxes in red which shows the actual genetic degree of relationship (gene share) with ‘self’ in percentage (%), 27 April 2010, Wikimedia Commons, https://commons.wikimedia.org/wiki/File:Cousin_tree_(with_genetic_kinship).png

During meiosis [15], genetic recombination occurs, shuffling segments of DNA from each of the parents. This means that siblings may inherit different combinations of DNA segments from their parents; and with each generation, the specific segments inherited become more randomized. As a result, the amount of shared DNA between relatives decreases exponentially with each generation, making it more challenging to detect distant relationships through autosomal testing.
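The halving described above can be made concrete with a little arithmetic. What follows is a minimal sketch, assuming the idealized average case (each parent passes on exactly half of their autosomal DNA and no ancestor appears twice in the tree). The function names are illustrative only and do not come from any DNA company’s tools.

```python
def ancestors_in_generation(generations_back: int) -> int:
    """Number of ancestral slots n generations back (2 parents, 4 grandparents, ...)."""
    return 2 ** generations_back


def expected_share_from_ancestor(generations_back: int) -> float:
    """Average fraction of atDNA inherited from one ancestor n generations back."""
    return 0.5 ** generations_back


def expected_share_full_cousins(degree: int) -> float:
    """Average fraction of atDNA shared by full nth cousins.

    Full nth cousins descend from the same ancestral couple. Each cousin sits
    (degree + 1) generations below that couple, so each of the two shared
    ancestors contributes 0.5 ** (2 * (degree + 1)) on average.
    """
    return 2 * 0.5 ** (2 * (degree + 1))


if __name__ == "__main__":
    for g in range(1, 8):
        print(f"{g} generations back: {ancestors_in_generation(g):>3} ancestors, "
              f"{expected_share_from_ancestor(g):.3%} from each on average")
    for n in range(1, 6):
        print(f"Cousin degree {n}: {expected_share_full_cousins(n):.3%} shared on average")
```

These are averages only. Because recombination is random, the amount actually inherited from any one distant ancestor, or shared with any one distant cousin, can be well above or below the computed figure.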

The random nature of genetic inheritance leads to variability in how much DNA is shared between relatives, especially for more distant relationships. [16] For example, as indicated in table two, full siblings may share anywhere from about 38% to 61% of their DNA; and first cousins share around 12.5% of their DNA on average, but the actual range runs from about 4% to 23%. This variability increases with more distant relationships, making it harder to precisely determine the degree of relatedness based solely on shared DNA percentages. [17]

Table Two: Average Percent of Autosomal DNA Shared Between Selected Relatives

Relationship | Average Percent of DNA Shared | Range of DNA Shared
Identical Twin | 100% | N/A
Parent-Child | 50% (but 47.5% for father-son relationships) | N/A
Full Sibling | 50% | 38% – 61%
Half Sibling; Grandparent / Grandchild; Aunt / Uncle; Niece / Nephew | 25% | 17% – 34%
1st Cousin; Great-grandparent; Great-grandchild; Great-Uncle / Aunt; Great Nephew / Niece | 12.5% | 4% – 23%
1st Cousin once removed; Half first cousin | 6.25% | 2% – 11.5%
2nd Cousin | 3.13% | 2% – 6%
2nd Cousin once removed; Half second cousin | 1.5% | 0.6% – 2.5%
3rd Cousin | 0.78% | 0% – 2.2%
4th Cousin | 0.20% | 0% – 0.8%
5th Cousin to Distant Cousin | 0.05% |
Source: Average Percent DNA Shared Between Relatives, 23andMe Customer Care, Tools, 23andMe, https://customercare.23andme.com/hc/en-us/articles/212170668-Average-Percent-DNA-Shared-Between-Relatives
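Because the ranges in table two overlap, a single shared-DNA percentage is often consistent with several relationships. The following is a small sketch of that lookup, using only the ranges transcribed from the table. The names `SHARED_DNA_RANGES` and `candidate_relationships` are illustrative, and this is not how any particular company’s relationship predictor is known to work.

```python
from typing import List

# Ranges of shared atDNA (in percent) transcribed from table two.
# Rows without a published range in the table are left out.
SHARED_DNA_RANGES = [
    ("Full sibling", 38.0, 61.0),
    ("Half sibling / grandparent / aunt or uncle / niece or nephew", 17.0, 34.0),
    ("1st cousin / great-grandparent / great-uncle or aunt", 4.0, 23.0),
    ("1st cousin once removed / half first cousin", 2.0, 11.5),
    ("2nd cousin", 2.0, 6.0),
    ("2nd cousin once removed / half second cousin", 0.6, 2.5),
    ("3rd cousin", 0.0, 2.2),
    ("4th cousin", 0.0, 0.8),
]


def candidate_relationships(percent_shared: float) -> List[str]:
    """Return every relationship group whose range contains the observed percentage."""
    return [
        label
        for label, low, high in SHARED_DNA_RANGES
        if low <= percent_shared <= high
    ]


if __name__ == "__main__":
    for observed in (45.0, 5.0, 0.5):
        print(f"{observed}% shared -> {candidate_relationships(observed)}")
```

A match sharing 5% of its DNA, for instance, falls inside three different rows of the table, which is why documentary research is still needed to decide among them.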

While autosomal DNA testing has become increasingly accurate, there are still limitations in the context of estimating genetic relations and finding relatives. Current testing methods typically analyze only a subset of genetic markers. In addition, the interpretation of results relies on comparison to reference populations, which may not fully represent all ancestral groups. In the end, as previously stated, traditional genealogical research brings atDNA results into focus.

Genetic Variants: The Genetic Basis of atDNA Testing

A genome is the complete set of DNA instructions found in every cell. [18] As discussed in a prior story, the human cell is a masterpiece of data compression. [19] Its nucleus, just a few microns wide, contains (if you ‘spell’ it out) six feet of genetic code packed into a double helix called DNA: deoxyribonucleic acid (see illustration five).

Illustration Five: Structure of Deoxyribonucleic Acid (DNA)

Source: Modified image of DNA as found in Ruairi J Mackenzie, DNA vs. RNA – 5 Key Differences and Comparison, 18 Dec 2020, updated 24 Jan 2024, Technology Networks, Genomics Research, https://www.technologynetworks.com/genomics/lists/what-are-the-key-differences-between-dna-and-rna-296719

The DNA helical molecules string together some three billion pairs of nucleotides. Each nucleotide is composed of a sugar (deoxyribose), a phosphate group, and one of four types of nitrogenous bases, each represented by an initial: A (adenine), C (cytosine), G (guanine), and T (thymine). Nucleotides are the fundamental building blocks that make up the DNA strands. The sequence of nucleotides along the DNA strand encodes genetic information and regulates when codes are activated. [20]

The nucleotides form base pairs and are the cornerstone of genetic testing. (See illustration six.) They are the foundation of the programming language of our genetic code. Whenever a particular base is present on one side of a strand of the DNA, its complementary base is found on the other side. Guanine always pairs with cytosine. Thymine always pairs with adenine. So one can write the DNA sequence by listing the bases along either one of the two sides or strands. When DNA companies perform their tests, they essentially separate the two strands of the helix and use one side of the helix as the template or coding strand when they map out an individual’s DNA results.
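The pairing rule means that one strand fully determines the other. A minimal sketch of that idea follows; the function name `complementary_strand` is illustrative, and the example sequence is arbitrary rather than taken from any real test result.

```python
# Base-pairing rule from the text: G pairs with C, T pairs with A.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(strand: str) -> str:
    """Return the base-by-base partner of a DNA strand written with A, C, G, T."""
    return "".join(COMPLEMENT[base] for base in strand.upper())


if __name__ == "__main__":
    one_side = "GATTACA"
    print(one_side, "pairs with", complementary_strand(one_side))
```

Applying the function twice returns the original sequence, which is why listing the bases along a single strand is enough to record the whole double helix.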

Illustration Six: Relationship between Nucleotides, Base Pairs, Chromosomes, Genes, and DNA

Only about one to three percent of the human genome (approximately 2% by most estimates) encodes proteins; this is where genes are located (illustration seven). Noncoding DNA comprises the remaining 97-99% of our total genetic material, so the vast majority of our genome consists of noncoding sequences. [21]

Genes are the basic unit of inherited DNA and carry information for making proteins, which perform important functions in your body. The coded regions of the genome produce proteins with structural, functional, and regulatory roles in cells and, by extension, in the human body. The remainder of our genome is made of noncoding DNA, sometimes called “junk DNA”, which is a misnomer. It is estimated that between 25% and 80% of non-coding DNA regulates gene expression (e.g. when, where, and for how long a gene is turned on to make a protein). [22] The non-coding DNA that does not regulate gene activity is composed of deactivated genes that were once useful for our non-human ancestors (like those for a tail), parasitic DNA from viruses that have entered our genome and replicated themselves hundreds or thousands of times over the generations, or sequences that generally serve no purpose in the host organism.

Illustration Seven: Coding and Non-Coding Regions of the Genome

Source: Modified version of graphic found at – Non-Coding DNA, AncestryDNA Learning Hub, https://www.ancestry.com/c/dna-learning-hub/junk-dna

Out of 3.2 billion DNA letters or nucleotides, there are only a ‘handful of places’ on the DNA ribbon that might be different between individuals. Humans share a very high percentage of their DNA, roughly 99.4% to 99.9%. The exact figure is subject to some debate and depends on how it is measured. The commonly cited figure is that humans are 99.9% genetically identical; more recent research suggests a slightly lower, but still very high, level of similarity. The small differences of 0.1% to 0.6% between individuals are crucial for understanding human diversity and health. [23]

As indicated in illustration eight, there are multiple types of genomic variants that comprise 0.4 percent of the genome. The smallest genomic variants are known as single-nucleotide variants (SNVs). Each SNV reflects a difference in a single nucleotide (or letter) in the DNA chain. For a given SNV, the DNA letter at that genomic position might be a C in one person but a T in another person as reflected in illustration nine. [24]

Illustration Eight: Potential Sources of Genetic Variants for atDNA Testing

Source: Modification of a chart found at – Chart Human Genomic Variation, Fact Sheet, National Human Genome Research Institute, 1 Feb 2023, https://www.genome.gov/about-genomics/educational-resources/fact-sheets/human-genomic-variation

Single-nucleotide variants (SNVs) are differences of one nucleotide at a specific location in the genome. An individual may have different nucleotides at a specific location on each chromosome (getting a different one from each parent), such as with Person 1 in illustration nine. An individual may also have the same nucleotide at such a location on both chromosomes, such as with Person 2 and Person 3 in the illustration.

Illustration Nine: An Example of a single-nucleotide variant (SNV)

Source: Human Genomic Variation, Fact Sheet, National Human Genome Research Institute, 1 Feb 2023, https://www.genome.gov/about-genomics/educational-resources/fact-sheets/human-genomic-variation
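The idea of a single-nucleotide variant can be sketched in a few lines. The example below assumes two already-aligned sequences of equal length and a pair of letters inherited at one position; the sequences, function names, and the standard genetics terms heterozygous and homozygous (which this story does not otherwise use) are supplied for illustration.

```python
from typing import List


def snv_positions(seq_a: str, seq_b: str) -> List[int]:
    """Return the 0-based positions where two aligned sequences differ by one letter."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


def describe_genotype(from_mother: str, from_father: str) -> str:
    """Say whether the two inherited letters at one position match."""
    if from_mother == from_father:
        return f"same letter on both chromosomes ({from_mother}/{from_father}, homozygous)"
    return f"different letters ({from_mother}/{from_father}, heterozygous)"


if __name__ == "__main__":
    # Made-up sequences that differ at a single position (T versus C).
    print(snv_positions("ACGTTCA", "ACGTCCA"))
    print(describe_genotype("C", "T"))  # like Person 1 in the illustration
    print(describe_genotype("T", "T"))  # like Person 2 or Person 3
```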

As reflected in illustration ten below, there is also a small group of genetic variants called insertions and deletions of nucleotides.

“Insertion/deletion variants reflect extra or missing DNA nucleotides in the genome, respectively, and typically involve fewer than 50 nucleotides. Insertion/deletion variants are less frequent than SNVs but can sometimes have a larger impact on health and disease (e.g., by disrupting the function of a gene that encodes an important protein).” [25]

One of the most common types of insertion/deletion variants is the tandem repeat. [26] Tandem repeats are short stretches of nucleotides that are repeated multiple times and are highly variable among people. Different chromosomes can vary in the number of times such short nucleotide stretches are repeated, ranging from a few times to hundreds of times.

Each person has a collection of different genomic variants. For example, in illustration ten below, Person 1 has an insertion variant; Person 2 has a SNV and deletion variant; and Person 3 has an insertion, SNV, and deletion variant. All three people have different tandem repeats. Different variants can be inherited from different parents as reflected in the illustration.

Illustration Ten: Examples of Other Types of Genetic Variants

Source: Human Genomic Variation, Fact Sheet, National Human Genome Research Institute, 1 Feb 2023, https://www.genome.gov/about-genomics/educational-resources/fact-sheets/human-genomic-variation

As indicated in illustration eight above, the third general type of genomic variant is the structural variant (SV). Structural variants extend beyond small stretches of nucleotides to larger chromosomal regions. These large-scale genomic differences involve at least 50 nucleotides and as many as thousands of nucleotides that have been inserted, deleted, inverted or moved from one part of the genome to another. [27]

Tandem repeats that contain more than 50 nucleotides are considered structural variants. In fact, such large tandem repeats account for nearly half of the structural variants present in human genomes. When a structural variant reflects differences in the total number of nucleotides involved, it is called a copy number variant (CNV). CNVs are distinguished from other structural variants, such as inversions and translocations, because the latter types often do not involve a difference in the total number of nucleotides. [28]
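The size thresholds that separate these variant types can be summarized in a short sketch. It assumes a variant is described by a reference sequence and an alternative sequence at the same location, which is a simplification chosen for illustration; the function name and labels are hypothetical, and inversions or translocations are outside what this toy classifier can recognize.

```python
def classify_variant(reference: str, alternative: str) -> str:
    """Classify a variant by the size rules described in the text."""
    if reference == alternative:
        return "no variant"
    if len(reference) == 1 and len(alternative) == 1:
        return "SNV"  # a single letter differs
    size_change = abs(len(alternative) - len(reference))
    if size_change == 0:
        return "same-length change (not classified by this sketch)"
    kind = "insertion" if len(alternative) > len(reference) else "deletion"
    if size_change < 50:
        return f"small {kind}"  # typically fewer than 50 nucleotides
    # 50 or more nucleotides gained or lost: a structural variant that
    # changes the total number of nucleotides (a copy number variant).
    return f"structural variant ({kind})"


if __name__ == "__main__":
    print(classify_variant("C", "T"))
    print(classify_variant("A", "A" + "G" * 10))
    print(classify_variant("A" * 61, "A"))
```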

Cornerstone of atDNA Testing: Single Nucleotide Polymorphisms (SNPs)

A subtype of SNVs is the single-nucleotide polymorphism (SNP), pronounced as “snip” for short. To be considered a SNP, a SNV must be present in at least 1% of the human population. As such, a SNP is more common than the rare single-nucleotide differences.  [29]

Among the genetic variants, SNPs are relatively common, occurring approximately once every 500-1000 base pairs in the human genome. This translates to about 4 to 5 million SNPs in an individual’s genome. Scientists have found more than 600 million SNPs in populations around the world. The combination of technical feasibility, scientific reliability, and analytical power makes SNPs the optimal choice for autosomal DNA testing in genealogical and ancestry applications. [30]
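The figures above can be checked with simple arithmetic. The sketch below assumes the genome size of 3.2 billion nucleotides used earlier in this story and the 1% threshold for calling an SNV a SNP; the function names are illustrative.

```python
GENOME_SIZE_BP = 3_200_000_000  # genome size quoted earlier in this story
SNP_MIN_FREQUENCY = 0.01        # an SNV must reach 1% of the population to count as a SNP


def is_snp(population_frequency: float) -> bool:
    """Apply the 1% rule that separates a SNP from a rarer SNV."""
    return population_frequency >= SNP_MIN_FREQUENCY


def expected_snp_count(base_pairs_per_snp: int) -> int:
    """Rough number of SNPs in one genome at a given spacing."""
    return GENOME_SIZE_BP // base_pairs_per_snp


if __name__ == "__main__":
    low = expected_snp_count(1000)   # one SNP every 1,000 base pairs
    high = expected_snp_count(500)   # one SNP every 500 base pairs
    print(f"Between {low:,} and {high:,} SNPs per genome")
    print(is_snp(0.004), is_snp(0.15))
```

The spacing of one SNP per 500 to 1,000 base pairs brackets the quoted estimate of 4 to 5 million SNPs per individual, so the two figures are consistent with each other.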

Ancestry informative markers refer to locations in the genome that have varied sequences, where the relative abundance of those markers differs based on the continent from which individuals can trace their ancestry. So by using a series of these ancestry informative markers, sometimes 20 or 30 or more, and genotyping an individual, you can determine from the frequency of those markers where their great, great, great, great ancestors may have come from. [31]

SNPs represent natural variations that make individuals unique while being common enough to be reliable DNA test markers. Their high frequency makes them ideal markers for genetic analysis. The vast majority of SNPs have no effect on health or development. SNPs are generally found in the DNA between genes rather than within genes themselves. [32]

While other genetic markers exist, SNPs are the preferred ancestry informative markers. SNPs are used for genetic testing based on their reliability and accuracy. SNPs are stable genetic markers that are passed down through generations. SNPs offer more detailed information about both recent and ancient ancestry. They also allow for fairly precise ethnic profiling and ancestral location inference. [33]

How atDNA Tests Figure Out Genetic Relationships

In a “Nutshell”: How do DNA companies Figure Out Genetic Relationships

Analyzing SNPs: DNA companies analyze hundreds of thousands of single nucleotide polymorphisms (SNPs) across the 22 autosomal chromosomes. [34]

The results from different atDNA test companies can vary. The variance is based on a number of factors. Most major DNA testing companies use equipment that analyzes DNA specimens with what are called ‘chips’, which use DNA microarray technology supplied by a company named Illumina. However, different companies use different versions of the Illumina chip, and each version tests a different set of SNP (Single Nucleotide Polymorphism) locations.

Illustration Ten: How DNA Microarray Technology Analyzes Autosomal DNA

Source: Bergström, Ann-Louise and Lasse Folkersen , DNA microarray, 15 May 2020, Moving Science, https://movingscience.dk/dna-microarray/

Companies can specify their own “other” locations to be included on their chip. The number of markers tested varies significantly by company. FamilyTreeDNA uses a customized Illumina chip. 23andMe and AncestryDNA use a customized Illumina Global Screening Array (GSA) chip. Living DNA uses an Affymetrix Axiom microarray (Sirius) chip. MyHeritage uses an Illumina GSA chip. [35]

Illustration of Illumina Microarray Chips

Source: Web Graphic Array with GE Inserts, Illumina, Powerfully Informative Microarrays, Illumina, https://www.illumina.com/techniques/microarrays.html

“Each DNA testing company purchases DNA processing equipment. Illumina is the big dog in this arena. Illumina defines the capacity and structure of each chip. In part, how the testing companies use that capacity, or space on each chip, is up to each company. This means that the different testing companies test many of the same autosomal DNA SNP locations, but not all of the same locations. … This means that each testing company includes and reports many of the same, but also some different SNP locations when they scan your DNA. …  In addition to dealing with different file formats and contents from multiple DNA vendors, companies change their own chips and file structure from time to time. In some cases, it’s a forced change by the chip manufacturer. Other times, the vendors want to include different locations or make improvements.” [36]

When DNA companies change DNA chips, a different version of the company’s own file may contain different positions. DNA testing companies have to “fill in the blanks” for compatibility, and they do this using a technique called imputation. Illumina forced their customers to adopt imputation in 2017 when they dropped the capacity of their chip. [37]

Identify Matching Segments: Each company’s DNA test software compares the SNP data between two individuals to identify segments of DNA that appear to be identical or similar. These matching DNA segments indicate the likelihood of DNA inherited from a common ancestor. [38]
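To make the idea of a "matching segment" a little more concrete, here is a minimal sketch in Python. It is not any company's actual algorithm. It assumes each person's test result is simply a list of unphased allele pairs at the same ordered SNP locations, and it treats two people as "half-identical" at a location when they share at least one allele. The function names and the `min_snps` parameter are my own illustrative choices.

```python
from typing import List, Tuple

# Two unphased alleles observed at one SNP location, e.g. ("A", "G").
Genotype = Tuple[str, str]


def half_identical(g1: Genotype, g2: Genotype) -> bool:
    """Return True when the two genotypes share at least one allele."""
    return bool(set(g1) & set(g2))


def matching_runs(person_a: List[Genotype],
                  person_b: List[Genotype],
                  min_snps: int) -> List[Tuple[int, int]]:
    """Find runs of consecutive SNPs where both people are half-identical.

    Returns (start, end) index pairs with end exclusive, keeping only runs
    that contain at least min_snps markers (a hypothetical cut-off).
    """
    runs = []
    start = None
    for i, (g1, g2) in enumerate(zip(person_a, person_b)):
        if half_identical(g1, g2):
            if start is None:
                start = i          # a new candidate run begins here
        else:
            if start is not None and i - start >= min_snps:
                runs.append((start, i))
            start = None           # the run is broken by a mismatch
    n = min(len(person_a), len(person_b))
    if start is not None and n - start >= min_snps:
        runs.append((start, n))    # close a run that reaches the last SNP
    return runs
```

Real matching software works with hundreds of thousands of SNPs, tolerates occasional genotyping errors, and converts the SNP positions of each run into a length in centimorgans, none of which this sketch attempts.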

The ability to identify DNA matches between individuals is largely influenced by the size of a company’s database of test results and by the SNPs that its atDNA test samples. As indicated, the main differences between atDNA tests from the various companies (e.g. 23andMe, AncestryDNA, FamilyTreeDNA, Living DNA, MyHeritage) are the SNPs that are tested and the relative size of their respective databases.

Each company maintains its own proprietary reference databases and matching algorithms. As indicated in Table Three below, AncestryDNA has a larger customer database (about 25 million) compared to 23andMe (about 14 million). This gives AncestryDNA an advantage for finding genetic relatives.

Table Three: Database Size and Number of SNPs Tested by DNA Company in 2024

DNA Company | Database Size of atDNA Test Results | No. of Autosomal SNPs Tested
23andMe | 14 million | 630,132
FamilyTreeDNA | 1.7 million | 612,272
AncestryDNA | 25 million | 637,639
MyHeritage | 8.5 million | 576,157
Living DNA | 300,000 | 683,503

Source: Autosomal DNA testing comparison chart, International Society of Genetic Genealogy Wiki, This page was last edited on 8 October 2024, https://isogg.org/wiki/Autosomal_DNA_testing_comparison_chart

Measuring Segment Length: The length of matching segments of SNPs is measured in centimorgans (cM). Centimorgans measure the likelihood of genetic recombination between two markers on a chromosome. One centimorgan represents a one percent chance that two genetic markers will be separated by a recombination event in a single generation. This measurement helps geneticists and genealogists estimate how closely two individuals are genetically related. [39]

Centimorgans (cM) are a crucial unit of measurement in atDNA testing. They are used to quantify genetic distance and determine relationships between individuals based on shared DNA. The more centimorgans two people share, the more closely they are likely to be related. In addition to the total number of cM shared, longer segments generally indicate a closer relationship.

One cM corresponds on average to about 1 million base pairs in humans. The total human genome is approximately 7400 cM long when both copies of each chromosome are counted. A parent and child typically share about 3400-3700 cM. More distant relatives share fewer cM. However, the cM ranges for different relationship types can overlap, so additional genealogical research is often needed to determine exact relationships.

“(A centiMorgan) is less of a physical distance and more of a measurement of probability. It refers to the DNA segments that you have in common with others and the likelihood of sharing genetic traits. The ends of shared segments are defined by points where DNA swapped between two chromosomes, and the centimorgan is a measure of the probability of getting a segment that large when these swaps occur.” [40]

Chart One: Ranges of Shared centiMorgans with Family

Source: Bettinger, Blaine, Version 4.0! March 2020 Update to the Shared cM Project!, 27 Mar 2020, The Genetic Genealogist, https://thegeneticgenealogist.com/2020/03/27/version-4-0-march-2020-update-to-the-shared-cm-project/

When you take an atDNA test, the testing company compares your DNA to others in their database. The amount of DNA you share with a match is reported in centimorgans. Generally, the more centimorgans you share with someone, the more closely you are related to this other person. Shared centimorgan ranges can often indicate how many generations separate two people. Certain shared cM values can also suggest possible half-sibling or half-first cousin relationships as opposed to full relatives.

Calculating Total Shared DNA: The total amount of shared DNA is calculated by summing the lengths of all matching segments. It is typically expressed in cM or as a percentage of the total DNA sampled by the test. [41]
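The arithmetic behind this step is simple enough to sketch in a few lines of Python. The segment lengths below are made-up values for illustration, and the 7400 cM genome length is the approximate figure used in this story. Each company uses its own total, so the percentage from this sketch will not exactly reproduce any company's report.

```python
def total_shared_cm(segments_cm):
    """Sum the lengths (in cM) of all matching segments."""
    return sum(segments_cm)


def percent_shared(total_cm, genome_cm=7400.0):
    """Express shared cM as a percentage of an assumed genome length.

    genome_cm defaults to the approximate 7400 cM figure quoted in the
    text; it is an assumption, not a value used by any specific company.
    """
    return 100.0 * total_cm / genome_cm


# Hypothetical match made up of three shared segments.
segments = [45.5, 30.0, 24.5]
total = total_shared_cm(segments)
print(total, "cM shared, or about", round(percent_shared(total), 2), "percent")
```

With these made-up segments the total is 100 cM, a little over one percent of the assumed genome length.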

Applying Thresholds: Each company sets minimum thresholds for segment length and total shared DNA before two people are considered a match. For example, FamilyTreeDNA requires every matching segment to be at least 6 cM long.

Table Four: Different cM Thresholds for atDNA Matches Across DNA Companies

DNA Company | Criteria for matching segments
23andMe | 9 cM and at least 700 SNPs for one half-identical region; 5 cM and 700 SNPs with at least two half-identical regions being shared
FamilyTreeDNA | All matching segments must be at least 6 cM in length. Almost all matching segments contain at least 800 SNPs, and all matching segments contain at least 600 SNPs.
AncestryDNA | 6 cM per segment before the Timber algorithm is applied and a total of at least 8 cM after Timber is applied
MyHeritage | 8 cM for the first matching segment and at least 6 cM for the second matching segment; 12 cM for the first matching segment in people whose ancestry is at least 50% Ashkenazi Jewish
Living DNA | 9.46 cM for the first segment

Source: Autosomal DNA testing comparison chart, International Society of Genetic Genealogy Wiki, This page was last edited on 8 October 2024, https://isogg.org/wiki/Autosomal_DNA_testing_comparison_chart
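As a rough illustration of how such thresholds filter segments, the Python sketch below encodes only the cM values for two companies from the table above. It is a simplification under stated assumptions: it ignores SNP counts, the Timber algorithm, and the Ashkenazi rule, and it assumes the "first matching segment" is the largest one. The names `THRESHOLDS` and `counted_segments` are hypothetical.

```python
# company: (minimum cM for the first segment, minimum cM for further segments)
# Values taken from Table Four; everything else in the table is ignored here.
THRESHOLDS = {
    "FamilyTreeDNA": (6.0, 6.0),
    "MyHeritage": (8.0, 6.0),
}


def counted_segments(segments_cm, company):
    """Return the segments that would count toward a match.

    Assumption: the largest segment plays the role of the 'first' segment.
    If it falls below the first-segment minimum, nothing counts and the
    two people are not reported as a match.
    """
    first_min, other_min = THRESHOLDS[company]
    ordered = sorted(segments_cm, reverse=True)
    if not ordered or ordered[0] < first_min:
        return []
    return [ordered[0]] + [s for s in ordered[1:] if s >= other_min]


# The same hypothetical pair of people can be a match at one company
# and not at another.
shared = [7.0, 6.5, 4.0]
print(counted_segments(shared, "FamilyTreeDNA"))
print(counted_segments(shared, "MyHeritage"))
```

This is one reason the same two people can appear as a match at one company and not at another, even when both have tested at both companies.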

Relationship Prediction: The amount of shared DNA is compared to expected ranges for different relationships to predict how two people may be related. Close relationships like parent/child or full siblings have very distinct amounts of shared DNA, while more distant relationships have overlapping ranges. [42]
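A very rough sketch of this comparison is shown below in Python. Instead of observed ranges, it uses the textbook average percentage of autosomal DNA shared for a handful of relationships. The 35 percent tolerance is an arbitrary value I chose for illustration, and the function name is hypothetical. Real tools compare against observed ranges, such as those in the Shared cM Project chart, rather than a single average.

```python
# Expected average share of autosomal DNA, in percent, for a few relationships.
EXPECTED_PERCENT = {
    "parent/child": 50.0,
    "full sibling": 50.0,
    "grandparent/grandchild": 25.0,
    "half sibling": 25.0,
    "aunt or uncle/niece or nephew": 25.0,
    "first cousin": 12.5,
    "first cousin once removed": 6.25,
    "second cousin": 3.125,
}


def candidate_relationships(shared_percent, tolerance=0.35):
    """List relationships whose expected share is close to the observed share.

    tolerance is a relative margin (0.35 means within 35 percent of the
    expected value); it is an illustrative assumption, not an industry value.
    """
    return [
        name
        for name, expected in EXPECTED_PERCENT.items()
        if abs(shared_percent - expected) <= tolerance * expected
    ]


# An observed share of about 25 percent fits several relationships at once.
print(candidate_relationships(25.0))
```

The amount of shared DNA alone cannot separate a grandparent from a half sibling or an aunt, which is why the predicted relationship usually has to be confirmed with traditional research.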

Special Considerations: Some of the DNA companies use phasing algorithms to improve accuracy, especially for analyzing smaller shared segments. Some also apply special algorithms for populations with higher rates of endogamy, like Ashkenazi Jews. [43]

Moving Onward

I imagine all of this makes total sense. I, however, believe all of this is totally confusing. To walk away with some semblance of understanding, I would focus on the following observations:

  • DNA tests can only provide so much information. Traditional genealogical research brings atDNA results into focus. Genetic and traditional research strategies can work hand in hand.
  • atDNA tests have the ability to trace living genetic relatives on both sides of your family tree. However, their effectiveness is limited in terms of how many generations back they can effectively provide results.
  • While autosomal DNA testing has become increasingly accurate, there are still limitations in the context of estimating genetic relations and finding relatives.
  • When looking at atDNA matches, centimorgans (cM) are the key unit of measurement. They are used to determine relationships between individuals based on shared DNA. The more centimorgans two people share, the more closely they are likely to be related. In addition to the total number of cM shared, longer segments generally indicate a closer relationship.

Sources

Feature image: The image depicts a branch from a massive family tree that shows 6,000 relatives spanning seven generations.  It is part of a study that links 13 million people related by genetics or marriage.  Source: Jocelyn Kaiser, Thirteen million degrees of Kevin Bacon: World’s largest family tree shines light on life span, who marries whom, Science, 1 Mar 2018, https://www.science.org/content/article/thirteen-million-degrees-kevin-bacon-world-s-largest-family-tree-shines-light-life-span .

See the original study behind this effort at: Kaplanis J, Gordon A, Shor T, Weissbrod O, Geiger D, Wahl M, Gershovits M, Markus B, Sheikh M, Gymrek M, Bhatia G, MacArthur DG, Price AL, Erlich Y. Quantitative analysis of population-scale family trees with millions of relatives. Science. 2018 Apr 13;360(6385):171-175. doi: 10.1126/science.aam9309. Epub 2018 Mar 1. PMID: 29496957; PMCID: PMC6593158. https://pmc.ncbi.nlm.nih.gov/articles/PMC6593158/

[1] See the following stories:

[2] Bettinger, Blaine, Everyone Has Two Family Trees – A Genealogical Tree and a Genetic Tree, 10 Nov 2009, The Genetic Genealogist, https://thegeneticgenealogist.com/2009/11/10/qa-everyone-has-two-family-trees-a-genealogical-tree-and-a-genetic-tree/

Understanding genetic ancestry testing, International Society of Genetic Genealogy Wiki, This page was last edited on 25 August 2015, https://isogg.org/wiki/Understanding_genetic_ancestry_testing

[3] Human Y-chromosome DNA haplogroup, Wikipedia, This page was last edited on 5 October 2024, https://en.wikipedia.org/wiki/Human_Y-chromosome_DNA_haplogroup

Human mitochondrial DNA haplogroup, Wikipedia, This page was last edited on 5 October 2024, https://en.wikipedia.org/wiki/Human_mitochondrial_DNA_haplogroup

Rowe, Katy, Genealogy’s Secret Weapon: How Using mtDNA Can Solve Family Mysteries, 10 May 2023, FamilyTreeDNA Blog, https://blog.familytreedna.com/mtdna/

MtDNA testing comparison chart, International Society of Genetic Genealogy Wiki, This page was last edited on 3 September 2023, https://isogg.org/wiki/MtDNA_testing_comparison_chart

Y chromosome DNA tests, International Society of Genetic Genealogy Wiki, This page was last edited on 6 September 2024, https://isogg.org/wiki/Y_chromosome_DNA_tests

Y-DNA STR testing comparison chart, International Society of Genetic Genealogy Wiki, This page was last edited on 11 July 2022, https://isogg.org/wiki/Y-DNA_STR_testing_comparison_chart

Balding, David, Debbie Kennett and Mark Thomas, Understanding genetic ancestry testing, This page was last edited on 25 August 2015, International Society of Genetic Genealogy Wiki, https://isogg.org/wiki/Understanding_genetic_ancestry_testing

Rowe-Schurwanz, Katy, Using mtDNA for Genealogical Research, Aug 14, 2024, FamilyTreeDNA Blog, https://blog.familytreedna.com/using-mtdna-genealogical-research/

Rowe-Schurwanz, Katy, How Autosomal DNA Testing Works, June 10, 2024, FamilyTreeDNA Blog, https://blog.familytreedna.com/how-autosomal-dna-testing-works/

Unveiling the Power of Big Y-700: Unraveling the Journey and Advantages, Oct 21, 2022, FamilyTreeDNA Blog, https://blog.familytreedna.com/big-y-700/

Mitochondrial Eve, Wikipedia, This page was last edited on 18 September 2024, https://en.wikipedia.org/wiki/Mitochondrial_Eve

Y-chromosomal Adam, Wikipedia, This page was last edited on 19 September 2024, https://en.wikipedia.org/wiki/Y-chromosomal_Adam

[4] Newton, Maud, America’s Ancestry Craze: Making sense of our family-tree obsession, June 2014, Harper’s Magazine, https://harpers.org/archive/2014/06/americas-ancestry-craze/

[5] Jorde LB, Bamshad MJ. Genetic Ancestry Testing: What Is It and Why Is It Important? JAMA. 2020 Mar 17;323(11):1089-1090. doi:10.1001/jama.2020.0517 PMID: 32058561; PMCID: PMC8202415 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8202415/

[6] Antonio Regalado, More than 26 million people have taken an at-home ancestry test, MIT Technology Review, 11 Feb 2019, https://www.technologyreview.com/2019/02/11/103446/more-than-26-million-people-have-taken-an-at-home-ancestry-test/

Covering Your Bases: Introduction to Autosomal DNA Coverage, Legacy Tree Genealogists, https://www.legacytree.com/blog/introduction-autosomal-dna-coverage

DNA Geek, Family DNA Tests for Ancestry & Genealogy, Navigating the World of DNA,

[7] Has the consumer DNA test boom gone bust?, Feb 20, 2020, updated Jul 28, 2024, Advisory Board, https://www.advisory.com/daily-briefing/2020/02/20/dna-tests 

[8] Ibid

[9] Krimsky Sheldon, The Business of DNA Ancestry, in: Understanding DNA Ancestry. Understanding Life. Cambridge University Press; 2021, Pages 8-16.

Molla, Rani, Why DNA tests are suddenly unpopular, 13 Feb 2020, Vox, https://www.vox.com/recode/2020/2/13/21129177/consumer-dna-tests-23andme-ancestry-sales-decline

Spiers, Caroline, Keeping It in the Family: Direct-to-Consumer Genetic Testing and the Fourth Amendment, Houston Law Review, Vol 59, Issue 5, May 23 2020, https://houstonlawreview.org/article/36547-keeping-it-in-the-family-direct-to-consumer-genetic-testing-and-the-fourth-amendment

Has the consumer DNA test boom gone bust?, Updated 28 Jul 2023, Advisory Board, https://www.advisory.com/daily-briefing/2020/02/20/dna-tests

Linder, Emmett, As 23andMe Struggles, Concerns Surface About Its Genetic Data, 5 Oct 2024, New York Times, https://www.nytimes.com/2024/10/05/business/23andme-dna-bankrupt.html

Estes, Roberta, DNA Testing Sales Decline: Reason and Reasons, 11 Feb 2020, DNAeXplained – Genetic Genealogy Blog, https://dna-explained.com/2020/02/11/dna-testing-sales-decline-reason-and-reasons/

[10] Fish, Eric, The Sordid Saga of 23andMe, 21 Oct 2024, All Science Great & Small, https://allscience.substack.com/p/the-sordid-saga-of-23andme

Prictor, Megan, Millions of People’s DNA in Doubt as 23andMe Faces Bankruptcy, 21 Oct 2024, Science Alert, https://www.sciencealert.com/millions-of-peoples-dna-in-doubt-as-23andme-faces-bankruptcy

Linder, Emmett, As 23andMe Struggles, Concerns Surface About Its Genetic Data, 5 Oct 2024, New York Times, https://www.nytimes.com/2024/10/05/business/23andme-dna-bankrupt.html

Allyn, Bobby, 23andMe is on the brink. What happens to all its DNA data?, NPR, https://www.npr.org/2024/10/03/g-s1-25795/23andme-data-genetic-dna-privacy

23andMe Facing Bankruptcy, FoxLocal 26, https://youtu.be/ZfBOCxbWAeY

[10a] Estes, Roberta, 23andMe Trouble – Step-by-Step Instructions to Preserve Your Data and Matches, 19 Sep 2024, DNAeXplained – Genetic Genealogy, https://dna-explained.com/2024/09/19/23andme-trouble-step-by-step-instructions-to-preserve-your-data-and-matches/

[11] A karyotype is a visual representation of an individual’s complete set of chromosomes, displaying their number, size, and structure, typically arranged in pairs and ordered by size.

“A karyotype is the general appearance of the complete set of chromosomes in the cells of a species or in an individual organism, mainly including their sizes, numbers, and shapes. … A karyogram or idiogram is a graphical depiction of a karyotype, wherein chromosomes are generally organized in pairs, ordered by size and position of centromere for chromosomes of the same size.”

Karyotype, Wikipedia, This page was last edited on 12 September 2024, https://en.wikipedia.org/wiki/Karyotype

Karyotype, Wikipedia, This page was last edited on 17 October 2024, https://en.wikipedia.org/wiki/Karyotype

Dutra, Amalia, Karyotype, National Human Genome Research Institute, https://www.genome.gov/genetics-glossary/Karyotype

Karyotype, ScienceDirect, definition and discussion are from Antonie D. Kline and Ethylin Wang Jabs, eds., Genomics in the Clinic, 2024, Shen Gu, Bo Yuan, Ethylin Wang Jabs, Christine M. Eng, Chapter 2 – Basic Principles of Genetics and Genomics, Pages 5-28, https://www.sciencedirect.com/topics/biochemistry-genetics-and-molecular-biology/karyotype

Shen Gu, Bo Yuan, Ethylin Wang Jabs, Christine M. Eng, Chapter 2 – Basic Principles of Genetics and Genomics, Editor(s): Antonie D. Kline, Ethylin Wang Jabs, Genomics in the Clinic, Academic Press, 2024, Pages 5-28

[12] Autosomes are the chromosomes found in the cells of organisms that are not sex chromosomes (allosomes). In humans, there are 22 pairs of autosomes, numbered from 1 to 22. They come in matching (homologous) pairs in both males and females. They are numbered based on size, shape, and other properties. They contain genes that control the inheritance of all traits except sex-linked ones.

[13] Recombination is a process by which pieces of DNA are broken and recombined to produce new combinations of nucleotides or alleles. Recombination primarily happens between homologous chromosomes, which are paired chromosomes with similar genetic information, allowing for the exchange of corresponding DNA segments.

During meiosis, when homologous chromosomes pair up, a process called “crossing over” occurs where DNA strands break and rejoin, swapping genetic material between the chromosomes. This recombination process creates genetic diversity at the level of genes that reflects differences in the DNA sequences of different organisms. 

Recombination, Scitable by Nature Education, Nature, 2014, https://www.nature.com/scitable/definition/recombination-226/

Genetic recombination, Wikipedia, This page was last edited on 5 October 2024, https://en.wikipedia.org/wiki/Genetic_recombination

Alberts B, Johnson A, Lewis J, et al., General Recombination, in Molecular Biology of the Cell, 4th edition, New York: Garland Science; 2002. https://www.ncbi.nlm.nih.gov/books/NBK26898/

[14] Autosomal DNA Statistics, International Society of Genetic Genealogy Wiki, Page was last edited 4 August 2022, Page accessed 14 Aug 2022, https://isogg.org/wiki/Autosomal_DNA_statistics

Nicole Dyer, Charts for Understanding DNA Inheritance, 14 Aug 2019, Family Locket, Page accessed 10 Oct 2021, https://familylocket.com/charts-for-understanding-dna-inheritance/

[15] Meiosis is a type of cell division that reduces the number of chromosomes in the parent cell by half and produces four gamete cells. This process is required to produce egg and sperm cells for sexual reproduction.

Meiosis, 2014, Scitable by Nature Education, Nature, https://www.nature.com/scitable/definition/meiosis-88/

Gilchrist, Daniel, Meiosis, National Human Genome Research Institute, https://www.genome.gov/genetics-glossary/Meiosis

Meiosis, Wikipedia, This page was last edited on 22 August 2024, https://en.wikipedia.org/wiki/Meiosis

[16] What are reduced penetrance and variable expressivity?, MedlinePlus, https://medlineplus.gov/genetics/understanding/inheritance/penetranceexpressivity/

Miko, Ilona, Phenotype variability: penetrance and expressivity. Nature Education 1(1):137, 2008, https://www.nature.com/scitable/topicpage/phenotype-variability-penetrance-and-expressivity-573/

Expressivity (genetics), Wikipedia, This page was last edited on 9 October 2024, https://en.wikipedia.org/wiki/Expressivity_(genetics)

[17] Average Percent DNA Shared Between Relatives, 23andMe Customer Care, Tools, 23andMe, https://customercare.23andme.com/hc/en-us/articles/212170668-Average-Percent-DNA-Shared-Between-Relatives

Autosomal Statistics, International Society of Genetic Genealogy Wiki, This page was last edited on 17 October 2022, https://isogg.org/wiki/Autosomal_DNA_statistics

[18] The genome is the entire set of DNA instructions found in a cell. In humans, the genome consists of 23 pairs of chromosomes located in the cell’s nucleus, as well as a small chromosome in the cell’s mitochondria. A genome contains all the information needed for an individual to develop and function.

Human Genomic Variation, Fact Sheet, National Human Genome Research Institute, 1 Feb 2023, https://www.genome.gov/about-genomics/educational-resources/fact-sheets/human-genomic-variation

[19] Fundamental Concepts of Genetics and about the Human Genome, Eupedia, page accessed 3 Feb 2021, https://www.eupedia.com/genetics/human_genome_and_genetics.shtml

Sheldon Krimsky, Understanding DNA Ancestry, Cambridge: Cambridge University Press, 2022, Page 18

Human Genomic Variation, Fact Sheet, National Human Genome Research Institute, 1 Feb 2023, https://www.genome.gov/about-genomics/educational-resources/fact-sheets/human-genomic-variation

[20] Nucleotide, National Cancer Institute, https://www.cancer.gov/publications/dictionaries/genetics-dictionary/def/nucleotide

Nucleotide, Wikipedia, This page was last edited on 3 September 2024, https://en.wikipedia.org/wiki/Nucleotide

Brody, Lawrence, Nucleotide, National Human Genome Research Institute, 1 Nov 2024, https://www.genome.gov/genetics-glossary/Nucleotide 

[21] Non-Coding DNA, AncestryDNA Learning Hub, 16 Aug 2016, https://www.ancestry.com/c/dna-learning-hub/non-coding-dna

What is Noncoding DNA?, MedlinePlus, https://medlineplus.gov/genetics/understanding/basics/noncodingdna/

[22] Non-Coding DNA, AncestryDNA Learning Hub, https://www.ancestry.com/c/dna-learning-hub/junk-dna

Ohno, Susumu. “So Much ‘Junk’ DNA in Our Genome.” Brookhaven Symposium on Biology, Volume 23, 1972: 366-370.

Zhang F, Lupski JR. Non-coding genetic variants in human disease. Hum Mol Genet. 2015 Oct 15;24(R1):R102-10. doi: 10.1093/hmg/ddv259. Epub 2015 Jul 7. PMID: 26152199; PMCID: PMC4572001 https://pmc.ncbi.nlm.nih.gov/articles/PMC4572001/

Peña-Martínez EG, Rodríguez-Martínez JA. Decoding Non-coding Variants: Recent Approaches to Studying Their Role in Gene Regulation and Human Diseases. Front Biosci (Schol Ed). 2024 Mar 1;16(1):4. doi: 10.31083/j.fbs1601004. PMID: 38538340; PMCID: PMC11044903 https://pmc.ncbi.nlm.nih.gov/articles/PMC11044903/

Malte Spielmann, Stefan Mundlos, Looking beyond the genes: the role of non-coding variants in human disease, Human Molecular Genetics, Volume 25, Issue R2, 1 October 2016, Pages R157–R165, https://doi.org/10.1093/hmg/ddw205

Vitsios, D., Dhindsa, R.S., Middleton, L. et al. Prioritizing non-coding regions based on human genomic constraint and sequence context with deep learning. Nat Commun 12, 1504 (2021). https://doi.org/10.1038/s41467-021-21790-4

Ellingford, J.M., Ahn, J.W., Bagnall, R.D. et al. Recommendations for clinical interpretation of variants found in non-coding regions of the genome. Genome Med 14, 73 (2022). https://doi.org/10.1186/s13073-022-01073-3

[23] The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015). https://doi.org/10.1038/nature15393 ; https://www.nature.com/articles/nature15393#citeas

Human Genomic Variation, National Human Genome Research Institute, https://www.genome.gov/about-genomics/educational-resources/fact-sheets/human-genomic-variation

For the 99.9 percent figure, see for example: Krimsky, Sheldon, Understanding DNA Ancestry, Cambridge, Cambridge University Press, 2022, Page 18

[22] Zou H, Wu LX, Tan L, Shang FF, Zhou HH. Significance of Single-Nucleotide Variants in Long Intergenic Non-protein Coding RNAs. Front Cell Dev Biol. 2020 May 25;8:347. doi: 10.3389/fcell.2020.00347. PMID: 32523949; PMCID: PMC7261909

The Order of Nucleotides in a Gene Is Revealed by DNA Sequencing, Scitable, Nature Education, https://www.nature.com/scitable/topicpage/the-order-of-nucleotides-in-a-gene-6525806/

single nucleotide variant, National Cancer Institute, https://www.cancer.gov/publications/dictionaries/genetics-dictionary/def/single-nucleotide-variant

Wright, A.F. (2005). Genetic Variation: Polymorphisms and Mutations. In eLS, (Ed.). https://doi.org/10.1038/npg.els.0005005

Single-nucleotide polymorphism, Wikipedia, This page was last edited on 29 September 2024, https://en.wikipedia.org/wiki/Single-nucleotide_polymorphism

SNVs vs. SNPs, CD Genomics, https://www.cd-genomics.com/resource-snvs-vs-snps.html

[23] Human Genomic Variation, Fact Sheet, National Human Genome Research Institute, 1 Feb 2023, https://www.genome.gov/about-genomics/educational-resources/fact-sheets/human-genomic-variation

[24] Ichikawa, K., Kawahara, R., Asano, T. et al. A landscape of complex tandem repeats within individual human genomes. Nat Commun 14, 5530 (2023). https://doi.org/10.1038/s41467-023-41262-1 

Tandem Repeat, Wikipedia, This page was last edited on 12 July 2024, https://en.wikipedia.org/wiki/Tandem_repeat

Myers, P., Tandem repeats and morphological variation. Nature Education 1(1):1, 2007,  http://scienceblogs.com/pharyngula/2007/10/tandem_repeats_and_morphologic.php

Usdin K. The biological effects of simple tandem repeats: lessons from the repeat expansion diseases. Genome Res. 2008 Jul;18(7):1011-9. doi: 10.1101/gr.070409.107. PMID: 18593815; PMCID: PMC3960014. https://pmc.ncbi.nlm.nih.gov/articles/PMC3960014/

Mitsuhashi, S., Frith, M.C., Mizuguchi, T. et al. Tandem-genotypes: robust detection of tandem repeat expansions from long DNA reads. Genome Biol 20, 58 (2019). https://doi.org/10.1186/s13059-019-1667-6 

Sequencing 101: Tandem repeats, 22 Nov 2023, PacBio, https://www.pacb.com/blog/sequencing-101-tandem-repeats/

Kai Zhou, Abram Aertsen, Chris W. Michiels, The role of variable DNA tandem repeats in bacterial adaptation, FEMS Microbiology Reviews, Volume 38, Issue 1, January 2014, Pages 119–141, https://doi.org/10.1111/1574-6976.12036

Fan H, Chu JY. A brief review of short tandem repeat mutation. Genomics Proteomics Bioinformatics. 2007 Feb;5(1):7-14. doi: 10.1016/S1672-0229(07)60009-6. PMID: 17572359; PMCID: PMC5054066. https://pmc.ncbi.nlm.nih.gov/articles/PMC5054066/

[25] Structural variation, Wikipedia, This page was last edited on 30 August 2024, https://en.wikipedia.org/wiki/Structural_variation

Scott AJ, Chiang C, Hall IM. Structural variants are a major source of gene expression differences in humans and often affect multiple nearby genes. Genome Res. 2021 Dec;31(12):2249-2257. doi: 10.1101/gr.275488.121. Epub 2021 Sep 20. PMID: 34544830; PMCID: PMC8647827 https://pmc.ncbi.nlm.nih.gov/articles/PMC8647827/

Feuk, L., Carson, A. & Scherer, S. Structural variation in the human genome. Nat Rev Genet 7, 85–97 (2006). https://doi.org/10.1038/nrg1767 

[26] Copy number variants (CNVs) are typically defined as DNA segments that are larger than 1,000 base pairs (1 kilobase) and usually less than 5 megabases in length. They can include both duplications (additional copies) and deletions (losses) of genetic material.

CNVs are remarkably common in human genomes. They account for approximately 5 to 9.5% of the human genome. They affect more base pairs than other forms of mutation when comparing two human genomes. They play crucial roles in evolution, population diversity, and disease development. 

Copy number variation, Wikipedia, This page was last edited on 24 September 2024, https://en.wikipedia.org/wiki/Copy_number_variation

Pös O, Radvanszky J, Buglyó G, Pös Z, Rusnakova D, Nagy B, Szemes T. DNA copy number variation: Main characteristics, evolutionary significance, and pathological aspects. Biomed J. 2021 Oct;44(5):548-559. doi: 10.1016/j.bj.2021.02.003. Epub 2021 Feb 13. PMID: 34649833; PMCID: PMC8640565 https://pmc.ncbi.nlm.nih.gov/articles/PMC8640565/

Eichler, E. E. Copy Number Variation and Human Disease. Nature Education 1(3):1, 2008,  https://www.nature.com/scitable/topicpage/copy-number-variation-and-human-disease-741737/

What are copy number variants?, 12 Aug 2020, Genomics Education Programme, https://www.genomicseducation.hee.nhs.uk/blog/what-are-copy-number-variants/

Clancy, S. Copy number variation. Nature Education 1(1):95, 2008, https://www.nature.com/scitable/topicpage/copy-number-variation-445/

Copy number variant, National Cancer Institute, https://www.cancer.gov/publications/dictionaries/genetics-dictionary/def/copy-number-variant

Copy Number Variation (CNV), 3 Nov 2024, National Human Genome Research Institute, https://www.genome.gov/genetics-glossary/Copy-Number-Variation

[29] Several approaches are used to determine if an SNV meets the one percent population frequency threshold:

  • Large-scale population studies: Projects like the 1000 Genomes Project have sequenced thousands of individuals across multiple populations to identify and validate SNPs.
  • Detection technologies: Real-time PCR, microarrays, and next-generation sequencing (NGS) are used to detect the variants.

See for example:

The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015). https://doi.org/10.1038/nature15393 

Patricia M Schnepp, Mengjie Chen, Evan T Keller, Xiang Zhou, SNV identification from single-cell RNA sequencing data, Human Molecular Genetics, Volume 28, Issue 21, 1 November 2019, Pages 3569–3583, https://doi.org/10.1093/hmg/ddz207

Telenti A, Pierce LC, Biggs WH, di Iulio J, Wong EH, Fabani MM, Kirkness EF, Moustafa A, Shah N, Xie C, Brewerton SC, Bulsara N, Garner C, Metzker G, Sandoval E, Perkins BA, Och FJ, Turpaz Y, Venter JC. Deep sequencing of 10,000 human genomes. Proc Natl Acad Sci U S A. 2016 Oct 18;113(42):11901-11906. doi: 10.1073/pnas.1613365113. Epub 2016 Oct 4. PMID: 27702888; PMCID: PMC5081584. https://pmc.ncbi.nlm.nih.gov/articles/PMC5081584/

SNVs vs. SNPs, CD Genomics, https://www.cd-genomics.com/resource-snvs-vs-snps.html

Efficiently detect single nucleotide polymorphisms and variants, Illumina, https://www.illumina.com/techniques/popular-applications/genotyping/snp-snv-genotyping.html

[30] What are single nucleotide polymorphisms (SNPs)?, MedlinePlus, https://medlineplus.gov/genetics/understanding/genomicresearch/snp/

SNP, IMS Riken Center for Integrative Medical Sciences, https://www.ims.riken.jp/english/glossary/genome.php

The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015). https://doi.org/10.1038/nature15393

[31] Ancestry-informative Markers, National Human Genome Research Institute, https://www.genome.gov/genetics-glossary/Ancestry-informative-Markers

Joon-Ho Yu, Janelle S. Taylor, Karen L. Edwards, Stephanie M. Fullerton, What are our AIMs? Interdisciplinary Perspectives on the Use of Ancestry Estimation in Disease Research, National Library of Medicine, 2012 Nov 5. doi: 10.1080/21507716.2012.717339

Huckins, L., Boraska, V., Franklin, C. et al. Using ancestry-informative markers to identify fine structure across 15 populations of European origin. Eur J Hum Genet 22, 1190–1200 (2014). https://doi.org/10.1038/ejhg.2014.1

[32] What are single nucleotide polymorphisms (SNPs)?, MedlinePlus, https://medlineplus.gov/genetics/understanding/genomicresearch/snp/

[33] Ancestry-informative markers (AIMs) are single-nucleotide polymorphisms (SNPs) that show substantially different frequencies between populations from different geographical regions. These genetic variations can be used to estimate the geographical origins of a person’s ancestors, typically by continent of origin.

AIMs are found within the approximately 15 million SNP sites in human DNA (about 0.4% of total base pairs). They are often traced to the Y chromosome, mitochondrial DNA, and autosomal regions.

AIMs can distinguish between major continental populations (Africa, Asia, Europe). They require multiple markers working together (typically 20-30 or more) for accurate ancestry determination. They can identify fine population structure within continents using larger marker sets. 

The effectiveness of AIMs depends on the number of markers used:

  • 40-80 markers can identify five broad continental clusters;
  • 128 markers can characterize samples into 8 broad continental groups; and
  • Larger sets (>46,000 markers) can identify detailed subpopulation structure
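
The idea of markers that "show substantially different frequencies between populations" can be illustrated with a minimal Python sketch. It ranks candidate markers by the absolute difference in allele frequency between two populations. The marker names, population labels, and frequencies below are invented for illustration and do not come from any source cited in this note; real AIM panels use more formal statistics and far more markers.

```python
# Hypothetical allele frequencies (share of chromosomes carrying a given
# allele) for two made-up populations. Illustration only, not real data.
freqs = {
    "snp_A": {"pop1": 0.90, "pop2": 0.10},
    "snp_B": {"pop1": 0.55, "pop2": 0.50},
    "snp_C": {"pop1": 0.20, "pop2": 0.75},
}


def frequency_difference(marker):
    """Absolute allele-frequency difference between the two populations."""
    return abs(marker["pop1"] - marker["pop2"])


def rank_markers(table):
    """Return marker names sorted from most to least ancestry-informative."""
    return sorted(
        table,
        key=lambda name: frequency_difference(table[name]),
        reverse=True,
    )


ranked = rank_markers(freqs)
print(ranked)
```

A marker such as the hypothetical snp_B, whose frequency is nearly the same in both populations, says little about ancestry on its own, which is one reason many markers must be combined.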

Hinkley, Ellen, DNA Testing Choice, 16 Dec 2016, https://dnatestingchoice.com/en-us/news/what-is-an-autosomal-dna-test

Lamiaa Mekhfi, Bouchra El Khalfi, Rachid Saile, Hakima Yahia, and Abdelaziz Soukri, The interest of informative ancestry markers (AIM) and their fields of application, BIO Web of Conferences 115, 07003 (2024), https://doi.org/10.1051/bioconf/202411507003

Huckins, L., Boraska, V., Franklin, C. et al. Using ancestry-informative markers to identify fine structure across 15 populations of European origin. Eur J Hum Genet 22, 1190–1200 (2014). https://doi.org/10.1038/ejhg.2014.1 

Ancestry-informative Markers, National Human Genome Research Institute, https://www.genome.gov/genetics-glossary/Ancestry-informative-Markers

Ancestry-informative marker, Wikipedia, This page was last edited on 14 August 2024, https://en.wikipedia.org/wiki/Ancestry-informative_marker

[34] Autosomal DNA Statistics, International Society of Genetic Genealogy Wiki, This page was last edited on 17 October 2022, https://isogg.org/wiki/Autosomal_DNA_statistics

Autosomal SNP comparison chart, International Society of Genetic Genealogy Wiki, This page was last edited on 29 January 2024, https://isogg.org/wiki/Autosomal_SNP_comparison_chart

DNA Structure and the Testing Process, FamilyTreeDNA Help Center, https://help.familytreedna.com/hc/en-us/articles/6189190247311-DNA-Structure-and-the-Testing-Process

Catherine A. Ball, Mathew J Barber, Jake Byrnes, Peter Carbonetto, Kenneth G. Chahine, Ross E. Curtis, Julie M. Granka, Eunjung Han, Eurie L. Hong, Amir R. Kermany, Natalie M. Myres, Keith Noto, Jianlong Qi, Kristin Rand, Yong Wang and Lindsay Willmore, AncestryDNA Matching White Paper, 31 Mar 2016, AncestryDNA, https://www.ancestry.com/cs/dna-help/matches/whitepaper; PDF: https://www.ancestry.com/dna/resource/whitePaper/AncestryDNA-Matching-White-Paper.pdf

Autosomal DNA match thresholds, International Society of Genetic Genealogy Wiki, This page was last edited on 31 August 2024, https://isogg.org/wiki/Autosomal_DNA_match_thresholds

Daniel Kling, Christopher Phillips, Debbie Kennett, Andreas Tillmar, Investigative genetic genealogy: Current methods, knowledge and practice, Forensic Science International: Genetics, Volume 52, 2021, https://doi.org/10.1016/j.fsigen.2021.102474

Davis DJ, Challis JH. Automatic segment filtering procedure for processing non-stationary signals. J Biomech. 2020 Mar 5;101:109619. doi: 10.1016/j.jbiomech.2020.109619. Epub 2020 Jan 9. PMID: 31952818.

The Order of Nucleotides in a Gene Is Revealed by DNA Sequencing, Scitable, Nature Education, https://www.nature.com/scitable/topicpage/the-order-of-nucleotides-in-a-gene-6525806/

[35] The Illumina Global Screening Array (GSA) is a customizable genotyping microarray platform. Its base configuration:

  • Contains approximately 654,000 fixed markers spanning the human genome;
  • Supports 24 samples per array in standard format;
  • Requires 200 ng DNA input;
  • Achieves call rates greater than 99% and reproducibility greater than 99.9%; and
  • Allows addition of up to 100,000 custom markers

Illumina microarray solutions, Illumina, https://www.illumina.com/techniques/microarrays.html

Efficiently detect single nucleotide polymorphisms and variants, Illumina, https://www.illumina.com/techniques/popular-applications/genotyping/snp-snv-genotyping.html

Custom design tools for genotyping any variant, in any species, Illumina, https://www.illumina.com/techniques/popular-applications/genotyping/custom-genotyping.html

Infinium™ Global Screening Array-24 v3.0 BeadChip, Illumina, https://www.illumina.com/content/dam/illumina-marketing/documents/products/datasheets/infinium-global-screening-array-data-sheet-370-2016-016.pdf

Infinium Global Screening Array-24 Kit, Illumina, https://www.illumina.com/products/by-type/microarray-kits/infinium-global-screening.html

[36] Estes, Roberta, Comparing DNA Results – Different Tests at the Same Testing Company, 18 May 2023, DNAeXplained – Genetic Genealogy, https://dna-explained.com/2023/05/18/comparing-dna-results-different-tests-at-the-same-testing-company/

[37] Estes, Roberta, Concepts – Imputation, 5 Sep 2017, DNAeXplained – Genetic Genealogy, https://dna-explained.com/2017/09/05/concepts-imputation/

Illumina microarray solutions, Illumina, https://www.illumina.com/techniques/microarrays.html

Efficiently detect single nucleotide polymorphisms and variants, Illumina, https://www.illumina.com/techniques/popular-applications/genotyping/snp-snv-genotyping.html

[38] See for example: Our Autosomal DNA Test (Family Finder™), FamilyTreeDNA Help Center, https://help.familytreedna.com/hc/en-us/articles/4411203169679-Our-Autosomal-DNA-Test-Family-Finder

[39] Different DNA testing companies use centimorgans (cM) in slightly different ways when reporting matches and relationships:

  1. Matching thresholds: Companies set different minimum thresholds for reporting matches. For example: AncestryDNA currently uses a threshold of 8 cM; 23andMe uses 7 cM and at least 700 SNPs for the first matching segment; and MyHeritage uses 8 cM.
  2. Algorithms and filtering: Companies use proprietary algorithms to filter and process the raw DNA data. AncestryDNA uses algorithms called Timber and Underdog to phase data and filter out high-frequency segments. Other companies may use different methods, leading to variations in reported shared cM.
  3. Total cM calculations: The total amount of cM a person has can vary between companies. 23andMe reports about 7,440 cM total and AncestryDNA seems to use around 6,800-7,000 cM total.
  4. Reporting of segments: Some companies like 23andMe and FamilyTreeDNA provide detailed segment data. AncestryDNA does not show specific segment information.
  5. Confidence levels: Companies may assign different confidence levels or relationship probabilities based on shared cM. For example, AncestryDNA previously used confidence scores like “Extremely High” for matches sharing more than 60 cM.
  6. Handling of small segments: Companies differ in how they handle very small matching segments, with some including segments as small as one cM and others excluding anything below their threshold.

These differences in methodologies can result in variations in reported shared cM and relationship estimates between companies for the same pair of individuals. This is why matches and relationship predictions may not be identical across different testing companies.
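
The effect of differing thresholds can be sketched in a few lines of Python. This is a deliberately simplified reading of the minimums quoted in item 1 above, treating each as a test on a single shared segment; it does not model phasing, Timber-style filtering, or any company's actual matching pipeline, and the example segment is hypothetical.

```python
# Minimum thresholds as quoted in this note. Simplified, and subject to
# change by each company; a min_snps of 0 means no SNP minimum is modelled.
THRESHOLDS = {
    "AncestryDNA": {"min_cm": 8.0, "min_snps": 0},
    "23andMe": {"min_cm": 7.0, "min_snps": 700},
    "MyHeritage": {"min_cm": 8.0, "min_snps": 0},
}


def is_reported_match(segments, company):
    """True if any shared segment meets the company's cM and SNP minimums.

    segments is a list of (centimorgans, snp_count) pairs.
    """
    rule = THRESHOLDS[company]
    return any(
        cm >= rule["min_cm"] and snps >= rule["min_snps"]
        for cm, snps in segments
    )


# Hypothetical pair of testers sharing one 7.4 cM segment covering 900 SNPs.
example_segments = [(7.4, 900)]
results = {name: is_reported_match(example_segments, name) for name in THRESHOLDS}
print(results)
```

Under these simplified rules, the same hypothetical pair would appear as a match at one company and not at the other two, which is the kind of discrepancy described above.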

Centimorgan, Wikipedia, This page was last edited on 1 May 2024, https://en.wikipedia.org/wiki/Centimorgan

What’s the difference between shared centimorgans and shared segments?, 11 Nov 2019, The Tech Interactive, https://www.thetech.org/ask-a-geneticist/articles/2019/centimorgans-vs-shared-segments/

centiMorgan, International Society of Genetic Genealogy Wiki, This page was last edited on 15 August 2024, https://isogg.org/wiki/CentiMorgan

[40] Hansen, Annelie, Untangling the Centimorgans on Your DNA Test, FamilySearch Blog, https://www.familysearch.org/en/blog/centimorgan-chart-understanding-dna

Green Dragon Genealogy, Yes, but what EXACTLY is a centiMorgan?, 19 Sep 2021, Green Dragon Genealogy, https://greendragongenealogy.co.uk/dna/yes-but-what-exactly-is-a-centimorgan/

[41] Autosomal DNA match thresholds, International Society of Genetic Genealogy Wiki, This page was last edited on 31 August 2024, https://isogg.org/wiki/Autosomal_DNA_match_thresholds

[42] Autosomal DNA Statistics, International Society of Genetic Genealogy Wiki, This page was last edited on 17 October 2022, https://isogg.org/wiki/Autosomal_DNA_statistics

Autosomal DNA match thresholds, International Society of Genetic Genealogy Wiki, This page was last edited on 31 August 2024, https://isogg.org/wiki/Autosomal_DNA_match_thresholds

Estes, Roberta, Comparing DNA Results – Different Tests at the Same Testing Company, DNAeXplained – Genetic Genealogy Blog, 18 May 2023, https://dna-explained.com/2023/05/18/comparing-dna-results-different-tests-at-the-same-testing-company/

Autosomal DNA testing comparison chart, International Society of Genetic Genealogy Wiki, This page was last edited on 8 October 2024, https://isogg.org/wiki/Autosomal_DNA_testing_comparison_chart

[43] Phasing, International Society of Genetic Genealogy Wiki, This page was last edited on 24 May 2024, https://isogg.org/wiki/Phasing

A Guide to Phasing from Illumina: https://youtu.be/15NPZCGP_e4

Autosomal DNA match thresholds, International Society of Genetic Genealogy Wiki, This page was last edited on 31 August 2024, https://isogg.org/wiki/Autosomal_DNA_match_thresholds

Davis DJ, Challis JH. Automatic segment filtering procedure for processing non-stationary signals. J Biomech. 2020 Mar 5;101:109619. doi: 10.1016/j.jbiomech.2020.109619. Epub 2020 Jan 9. PMID: 31952818.