The Orientation of Family Narratives Across Time Layers : Part Three

The analysis of Y-DNA or mtDNA data provides the foundation for mapping out one’s haplogroup or ‘family’ lineage in the long term and mid range time layers. Genetic genealogy is the thread of continuity in all three periods of genealogical time. However, each time layer has its own unique properties and relies on predominant forms of contextual evidence to fill in a family narrative.

In order to add historical information to the analysis of Y-DNA or mtDNA evidence, the long term and mid range ancestry genealogical time layers rely on paleo-genomic and anthropological macro level sources of evidence. These two general sources of research can provide an historical background or context for interpreting DNA test results. Their respective advantages in adding meaning to a story, however, have notable limitations as well.

Each of the three layers of genealogical time relies upon different methods of gathering evidence and interpreting that evidence in the context of social and cultural factors. Illustration one depicts the predominant orientation in narrating family stories in each of the specific layers of genealogical time.

Illustration One: Orientation of Family Stories Based on Genealogical Time Period

The short range genealogical time period predominately relies on traditional research methods and historical sources associated with social history. Autosomal DNA tests might also be used to verify or discover family relationships within the past seven or so generations. mtDNA (mitochondrial DNA) [1] and Y-DNA tests [2] may also play a supplementary role in fleshing out evidence in the short range time layer.

The mid range genealogical time layer utilizes mtDNA and both SNP and STR Y-DNA data to discover ‘family’ haplogroups. The use of Y-STR data can provide novel discoveries of haplogroup formation from the period when surnames emerged in Europe. As previously stated, the analysis and comparison of individual Y-STR results with other Y-STR test kit results can help delineate lineages and tease out branches within the haplotree family, fine-tuning relationships between ‘mutations’ or people within the tree. [3] The results from DNA tests can be placed into an historical context in the mid range time layer through anthropological and macro cultural research and paleo-genetic studies.

The long term time layer relies primarily on SNP and haplogroup data. Genetic data can be interpreted through the lens of long-term, slow-moving macro level social structures, genetic demographic changes and patterns, geographical and climatic influences, and macro level cultural and anthropological history.

I have discussed the creation of family stories in the short range or traditional genealogical time layer in a prior story. This story focuses on the use of the paleo-genetic and anthropological / macro cultural orientations for providing background information when developing family stories within the mid range and long range time layers.

As discussed in prior stories, the Griff(is)(es)(ith) family surname can be traced to William Griffis, who was born in Huntington, Long Island, New York in 1736. He is the ‘brick wall’ in our traditional family research. Through the use of Y-DNA testing, I have been able to link the Griff(is)(es)(ith) family patrilineal genetic line through a migratory path of the G-haplogroup. I also have evidence that the patrilineal line probably came from the southern area of Wales before immigrating to the American colonies.

The Paleo-Genomic or Paleo-Genetic Orientation

In conjunction with test results from Y-DNA and mtDNA, the discoveries and accumulated research from paleogenomics provide a complementary base of evidence to document the historical context of migratory patterns of family lineages in the earlier time periods.

Paleogenomics provides powerful insights into human migration patterns through several key analytical approaches. Ancient DNA sequencing allows researchers to directly examine genetic material from historical remains, revealing detailed information about population movements and interactions. This technique can track genetic changes across thousands of years, providing a timeline of human migrations. The ability to analyze both modern and ancient genomes helps reconstruct migration routes, genetic diversification events, and genetic admixture among various groups.

The key applications of paleogenomics for genealogy include the detection of genetic drift [4] and ancient population migrations and the analysis of haplogroup features across geographic regions. Modern paleo-genomic techniques have allowed research scientists to reconstruct ancient ecological communities and study adaptive evolution across deep time. [5]

Paleogenomics is the science of reconstructing and analyzing genomic information from extinct species and ancient organisms. This field involves extracting and studying ancient DNA (aDNA) from various sources including museum artifacts, ice cores, archaeological sites, bones, teeth, mummified tissues, and hair. [6]

During the past decade, technological advances have made it cost-effective and efficient to sequence the entire genomes of humans who lived tens of thousands of years ago. The result has been an explosion of new information that has fueled an emerging academic field of paleo-genetics or paleo-genomics that is transforming archaeology and the mapping of deep ancestry at a macroscopic level.

Illustration Two: Samples of Whole Genome Data Generated since 2010

Source: David Reich, Who We Are and How We Got Here: Ancient DNA and the New Science of the Human Past, New York: Vintage Books, 2018, Page xvi.

High-throughput sequencing (HTS), also known as next-generation sequencing (NGS), represents a paradigm shift in genomic research by enabling rapid, cost-effective, and large-scale analysis of DNA and RNA. The technology has greatly improved the ability to decode complex biological systems, and it has transformed the study of Y chromosome variation in ancient human DNA (aDNA). [7]

The research using this technology has provided insights into male-specific genetic variation throughout history. The study of aDNA allows scientists to directly examine which SNPs and haplotypes were present at different time periods, rather than relying solely on inferences from modern populations. This provides concrete evidence of population movements and genetic changes over time. [8]

In 2018 alone, the genomes of more than a thousand prehistoric humans were determined, mostly from bones dug up years ago and preserved in museums and archaeological labs. [9]

As illustration three indicates, ancient DNA labs are now producing data from ancient human remains so quickly that the time lag between data production and publication of the results is longer than the time it takes for data production in the field to double. David Reich published the chart in illustration two in 2018.

Within a matter of two years, Reich updated the chart (illustration three) [10] to reflect the dramatic increase in the number of completed whole-genome sequences of ancient remains. He referred to this rapid growth in ancient genome data as “Moore’s Law of Ancient DNA”. [11]
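The observation that the publication lag exceeds the doubling time has a simple arithmetic consequence: under steady exponential growth, more than half of all data produced so far would still be unpublished. The sketch below works through that reasoning. It assumes idealized doubling at a constant rate and a fixed lag; the function name and the example numbers are illustrative assumptions and are not taken from Reich's chart.

```python
def unpublished_fraction(lag_years: float, doubling_years: float) -> float:
    """Share of all data produced to date that is still awaiting publication.

    Assumes cumulative output doubles every `doubling_years` (an idealization)
    and that every data set is published exactly `lag_years` after production.
    """
    if lag_years < 0 or doubling_years <= 0:
        raise ValueError("lag must be >= 0 and doubling time must be > 0")
    # Data produced up to `lag_years` ago is a fraction 2 ** (-lag / doubling)
    # of today's cumulative total; everything newer is still unpublished.
    published = 2.0 ** (-lag_years / doubling_years)
    return 1.0 - published


if __name__ == "__main__":
    # Hypothetical numbers: a lag equal to the doubling time leaves half unpublished.
    print(unpublished_fraction(lag_years=1.0, doubling_years=1.0))
    # A lag of twice the doubling time leaves three quarters unpublished.
    print(unpublished_fraction(lag_years=2.0, doubling_years=1.0))
```

Any lag longer than the doubling time pushes the unpublished share above one half, which is one way to read the remark about the pace of the field.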

Illustration Three: Growth of Genome Sequencing of Ancient Remains

Paleogenomic studies have revealed that non-African populations resulted from the diversification of an ancestral metapopulation that left Africa around 45,000-55,000 years ago.  This migration carried a subset of African genetic diversity to other continents, with subsequent population movements creating the genetic diversity we see today. [12]

Now scientists are delivering new answers to the question of who Europeans really are and where they came from. Their findings suggest that the continent has been a melting pot since the Ice Age. Europeans living today, in whatever country, are a varying mix of ancient bloodlines hailing from Africa, the Middle East, and the Russian steppe.

The evidence comes from archaeological artifacts, from the analysis of ancient teeth and bones, and from linguistics. But above all it comes from the new field of paleogenetics. [13]

The M168 Y-DNA mutation represents a crucial milestone in human genetic history, marking one of the most significant events in human male lineage (see illustration four). This Y-chromosome marker originated approximately 50,000-60,000 years ago in northeastern Africa. The M168 mutation appeared in a man who geneticists sometimes refer to as “Out of Africa Adam.” His descendants were among the first humans to migrate out of Africa, carrying this genetic marker with them. This mutation is present in all modern non-African Y-chromosome haplogroups (C through R) and separates these lineages from the earlier African haplogroups A and B. [14]

Illustration Four: Simplified Phylogenetic Tree of Major Y Haplogroups and their Respective Ancestry-Informative Markers (AIMs) in Europe

Adapted diagram originally found in B. Navarro‑Lopez, E. Granizo‑Rodríguez, L. Palencia‑Madrid, C. Raffone, M. Baeta, M. M. de Pancorbo, Phylogeographic review of Y chromosome haplogroups in Europe, International Journal of Legal Medicine (2021) 135:1675–1684, https://doi.org/10.1007/s00414-021-02644-6

The ancestry-informative marker (AIM) “M168” defines the macro-haplogroup CT and represents the ancestral lineage of all non-African Y-chromosome haplogroups, as well as some African lineages. [15] Every male living today, except those belonging to haplogroups A and B (found exclusively in Africa), carries this genetic marker.

Haplogroup G, which represents the Griff(is)(es)(ith) paternal line, originated in southwestern Asia or the Caucasus region. The estimated date of the G-M201 mutation has been debated, with several different timeframes proposed.

Recent research suggests that the first man to carry haplogroup G-M201 lived between 46,000 and 54,000 years ago in southwestern Asia or the Caucasus region. The National Geographic Society previously estimated its origins in the Middle East 30,000 years ago. Two other studies have suggested 17,000 years ago and a much more recent date of 9,500 years ago. The 9,500-year-old origin date for G-M201 was proposed by Cinnioglu et al. in their 2004 study. However, this estimate appears to be an outlier compared to other research findings and is not well-supported by current evidence. [16]

FamilyTreeDNA estimates that the most recent common ancestor associated with the G-M201 haplogroup was born in 25,735 BCE, rounded to 26,000 BCE. With a 95 percent probability, the most recent common ancestor of all members of this haplogroup was born between the years 29,661 BCE and 22,295 BCE. [17]
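Because some of the estimates above are expressed as "years ago" and the FamilyTreeDNA figures are expressed as calendar years BCE, a small conversion helps when comparing them. This is a minimal sketch, assuming a reference year of 2025 CE (an assumption chosen only for illustration) and the usual convention that the calendar has no year zero.

```python
REFERENCE_YEAR_CE = 2025  # assumed "present"; not specified in the source


def bce_to_years_ago(year_bce: int, reference_year_ce: int = REFERENCE_YEAR_CE) -> int:
    """Convert a calendar year BCE to years before the reference year.

    The BCE/CE calendar has no year 0 (1 BCE is followed by 1 CE),
    which is why 1 is subtracted from the simple sum.
    """
    if year_bce < 1 or reference_year_ce < 1:
        raise ValueError("years must be positive")
    return year_bce + reference_year_ce - 1


# FamilyTreeDNA figures quoted in the text for G-M201.
point_estimate = bce_to_years_ago(25_735)
interval = (bce_to_years_ago(29_661), bce_to_years_ago(22_295))
interval_width = interval[0] - interval[1]

if __name__ == "__main__":
    print(f"Point estimate: about {point_estimate:,} years ago")
    print(f"95% interval: {interval[0]:,} to {interval[1]:,} years ago")
    print(f"Interval width: {interval_width:,} years")
```

On these assumptions the FamilyTreeDNA point estimate corresponds to roughly 28,000 years ago, which sits between the 30,000-year and 17,000-year estimates mentioned above and well short of the 46,000 to 54,000 year range.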

The geographic origin of haplogroup G-M201 is most likely located somewhere near eastern Anatolia, Armenia, or western Iran. (See illustration five.) After remaining relatively isolated during the Ice Age, the haplogroup began expanding significantly around 11,500 years ago with the advent of farming and warmer climate conditions.

Illustration Five: Early Migratory Path of Most Recent Common Ancestors of the G Haplogroup in Anatolia Area

Source: Migratory Path of G Haplogroup Using Terminal Haplogroup G-Y132505 Rendered with Globe Trekker, FamilyTreeDNA, 12 February 2025, https://discover.familytreedna.com/y-dna/G-BY211678/path

“The Y chromosome has been widely explored for the study of human migrations. Due to its paternal inheritance, the Y chromosome polymorphisms are helpful tools for understanding the geographical distribution of populations all over the world and for inferring their origin, which is really useful in forensics. The remarkable historical context of Europe, with numerous migrations and invasions, has turned this continent into a melting pot. For this reason, it is interesting to study the Y chromosome variability and how it has contributed to improving our knowledge of the distribution and development of European male genetic pool as it is today.” [18]

Anthropological – Macro Cultural Orientation

The anthropological – macro cultural approaches can add historical context to the genealogical discoveries associated with mid range and long term time layers. This macro approach helps bridge genetic data with an anthropological and sociological understanding, as genetic identities are often juxtaposed with socio-political contexts and dynamics. This creates a more complete picture of human population history while acknowledging both biological and cultural factors in human variation. [19]

“Understanding how social and cultural processes affect the genetic patterns of human populations over time has brought together anthropologists, geneticists and evolutionary biologists, and the availability of genomic data and powerful statistical methods widens the scope of questions that analyses of genetic information can answer.” [20]

The anthropological – macro cultural orientation in genetic genealogy represents a comprehensive approach that combines traditional anthropological and demographic methods with modern genetic analysis to understand human populations and their histories at a broader scale. Genetic anthropology examines DNA sequences across diverse populations to determine shared geographical origins and migration patterns. This macro-level analysis helps reconstruct human population histories and relationships between different groups, moving beyond individual genetic ancestry to understand larger historical demographic patterns. [21]

The approach examines and documents broad cultural, political, and economic forces that shape communities and individuals in different time periods. It emphasizes studying the larger structural forces and systems that influence human behavior, moving beyond individual-level analysis to understand societal level patterns, institutions and customs.

The field employs both traditional macromorphoscopic trait analysis and modern genetic testing to create a robust scientific framework. [22] This includes examining population-wide genetic markers (Ancestry Informative Markers – AIMs), demographic history patterns, DNA derived from ancient populations (aDNA), and social adaptation patterns across groups. [23]

Through their research, genetic anthropologists can determine population relationships, historical fluctuations in size, and admixture patterns between different groups. This helps reconstruct complex migration histories and evolutionary adaptations of human populations. [24]

Several key discoveries have emerged from studying genetic genealogy haplogroups through sociocultural and anthropological approaches. These findings demonstrate how social and cultural practices have been crucial factors in shaping human genetic diversity through their effects on genetic drift and population structure.

For example, the practice of patrilocality [25] has created distinct patterns in genetic diversity between male and female lineages. [26] Cultural organization has significantly impacted genetic patterns, particularly in nomadic populations, where tribal-clan structures regulate social order and maintain bloodlines, and in agricultural communities, where different patterns of inheritance and succession emerge. [27]

Historical cultural expansions have had varying genetic impacts. For example, one study found that the Arab Islamic expansion introduced cultural changes but left minimal genetic impact. Conversely, the Mongol expansion achieved significant genetic success while having limited cultural influence. [28]

Different social structures have created distinct genetic patterns in kinship systems. Patrilineal kin groups show accelerated genetic drift and loss of Y-chromosome diversity. Corporate kin groups demonstrate clustering of genetic lineages due to intergroup competition. [29] 

Two studies, for example, have found that the mode of subsistence has been more influential than geography in shaping genetic landscapes. Settled agricultural communities show different genetic patterns compared to nomadic populations. Population size in villages affects genetic heterogeneity, with smaller communities showing greater between-village variation. [30] 

Cover illustration by Zosia Rostomian, Genome Research, April 2015, https://genome.cshlp.org/content/25/4.cover-expansion

A 2015 study utilizing an anthropological – macro cultural orientation by Monika Karmin and colleagues presents several significant findings. The researchers analyzed 456 geographically diverse high-coverage Y chromosome sequences, including 299 newly reported samples. Using ancient DNA calibration, they dated the Y-chromosomal most recent common ancestor (MRCA) in Africa at approximately 254,000 years ago. [31]

The study detected a cluster of major non-African founder haplogroups within a narrow time interval of 47-52 thousand years ago (kya), which supports a model of rapid initial colonization of Eurasia and Oceania following the out-of-Africa bottleneck.

Another key discovery from the Karmin et al. study was the detection of a second strong bottleneck in Y-chromosome lineages dating to the last 10,000 years, which contrasts with demographic reconstructions based on mitochondrial DNA (mtDNA). The researchers hypothesize that this recent bottleneck was caused by cultural changes that affected the variance of reproductive success among males. The G haplogroup was impacted by this bottleneck.

During the Neolithic period, the male effective population size declined to approximately one-twentieth of its original level in various regions of the world. In the same study, mitochondrial sequences indicated a continual increase in population size from the Neolithic to the present, suggesting an extreme divergence between the effective sizes of the male and female populations during the bottleneck period. See illustration six below. Two encircled areas in the illustration identify the differences in growth between the Y-DNA and mtDNA graphs.

Illustration Six: Bottleneck of Y Chromosome Diversity Coincides with a Global Change in Culture

Source: Karmin M, et al, A recent bottleneck of Y chromosome diversity coincides with a global change in culture. Genome Res. 2015 Apr;25(4):459-66, doi: 10.1101/gr.186684.114, PubMed: https://pmc.ncbi.nlm.nih.gov/articles/PMC4381518/
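The link between the variance of male reproductive success and effective population size can be made concrete with a standard population-genetic approximation. For a constant-size haploid lineage pool such as the Y chromosome, the effective size is roughly (N - 1) / V, where N is the number of males and V is the variance in the number of sons per male. The sketch below uses hypothetical numbers. It is not the method used by Karmin and colleagues, only an illustration of how a rise in variance alone could shrink the male effective population size to one-twentieth.

```python
def male_effective_size(n_males: int, sons_variance: float) -> float:
    """Approximate effective size of a constant-size haploid male lineage pool.

    Uses N_e = (N - 1) / V, where V is the variance in the number of sons per
    male and the mean number of sons is 1 (constant population size).
    """
    if n_males < 2 or sons_variance <= 0:
        raise ValueError("need at least two males and a positive variance")
    return (n_males - 1) / sons_variance


if __name__ == "__main__":
    n = 10_001  # hypothetical census number of males
    # Variance near 1: reproduction is close to random among males.
    baseline = male_effective_size(n, sons_variance=1.0)
    # Variance of 20: a small number of men father most of the sons.
    skewed = male_effective_size(n, sons_variance=20.0)
    print(baseline, skewed, skewed / baseline)
```

In this made-up case the census number of males never changes, yet the effective size falls twenty-fold, which is consistent in spirit with a cultural rather than a purely demographic explanation.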

Zeng et al.’s 2018 article in Nature Communications presents an intriguing sociocultural hypothesis to explain this post-Neolithic Y-chromosome bottleneck. The authors propose that the formation of patrilineal kin groups and competition between these groups led to a significant reduction in Y-chromosomal diversity through a process called ‘cultural hitchhiking’.

The outlines of that idea came to Tian Chen Zeng, a Stanford undergraduate in sociology, after spending hours reading blog posts that speculated – unconvincingly, Zeng thought – on the origins of the “Neolithic Y-chromosome bottleneck,” as the event is known. He soon shared his ideas with his high school classmate Alan Aw, also a Stanford undergraduate in mathematical and computational science. [32]

Source: Nature Communications is a peer-reviewed, open access, scientific journal published by Nature Portfolio since 2010. Image from Nature Communications, Wikipedia, This page was last edited on 30 August 2024, https://en.wikipedia.org/wiki/Nature_Communications

The pair of students took their idea to Marcus Feldman, a professor of biology in Stanford’s School of Humanities and Sciences, and the rest is history. The authors contend that two cultural mechanisms of Y diversity reduction came into play. First, patrilineal kin groups naturally produce high levels of Y-chromosomal homogeneity within each group (due to common descent) and high levels of between-group variation. Second, violent intergroup competition between patrilineal groups resulted in casualties clustering among related males, sometimes leading to the extinction of entire lineages and their unique Y-chromosomes. [33]

“After the onset of farming and herding around 12,000 years ago, societies grew increasingly organized around extended kinship groups, many of them patrilineal clans – a cultural fact with potentially significant biological consequences. The key is how clan members are related to each other. While women may have married into a clan, men in such clans are all related through male ancestors and therefore tend to have the same Y chromosomes.

“To explain why even between-clan variation might have declined during the bottleneck, the researchers hypothesized that wars, if they repeatedly wiped out entire clans over time, would also wipe out a good many male lineages and their unique Y chromosomes in the process.” [34]
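The mechanism described above can be illustrated with a toy simulation. In this sketch each clan carries a single Y lineage (the within-group homogeneity), and in each round of conflict one clan is wiped out and its place is taken by an offshoot of a surviving clan. The parameters and the replacement rule are assumptions made purely for illustration; this is not the model used by Zeng, Aw, and Feldman.

```python
import random


def simulate_clan_conflict(n_clans: int, n_rounds: int, seed: int = 0) -> list[int]:
    """Return the number of distinct Y lineages remaining after each round.

    Toy assumptions: every clan starts with its own unique Y lineage, each
    round one randomly chosen clan is eliminated, and a randomly chosen
    surviving clan splits to fill the vacancy, copying its lineage.
    """
    if n_clans < 2:
        raise ValueError("need at least two clans")
    rng = random.Random(seed)
    lineages = list(range(n_clans))  # lineage label carried by each clan
    history = [len(set(lineages))]
    for _ in range(n_rounds):
        loser = rng.randrange(n_clans)
        winner = rng.randrange(n_clans - 1)
        if winner >= loser:  # shift past the loser so winner != loser
            winner += 1
        lineages[loser] = lineages[winner]
        history.append(len(set(lineages)))
    return history


if __name__ == "__main__":
    counts = simulate_clan_conflict(n_clans=50, n_rounds=500)
    print("lineages at start:", counts[0], "lineages at end:", counts[-1])
```

Because no new lineages enter the toy model, the count of distinct Y lineages can only stay level or fall, even though the number of clans (and, by implication, the number of men) stays constant throughout.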

The bottleneck coincides with the post-Neolithic period when societies were at an “intermediate social scale”, after the adoption of agriculture but before the emergence of hierarchical institutions. The authors argue that patrilineal descent groups were most politically salient in these post-Neolithic societies, where the social structures were characterized as being without a formal leader or governing body. [35]

Undergraduates Tian Chen Zeng, left, and Alan Aw, right, worked with Marcus Feldman, a professor of biology, to show how social structure could explain a genetic puzzle about humans of the Stone Age. (Image credit: Courtesy Marcus Feldman) Source: Collins, Nathan, Wars and clan structure may explain a strange biological event 7,000 years ago, Stanford researchers find, 30 May 2018, Stanford Report, Stanford University, https://news.stanford.edu/stories/2018/05/war-clan-structure-explain-odd-biological-event

The bottleneck ended in each region of the Old World during periods that coincided with the rise of regional polities, chiefdoms, and states, which reduced the prominence of corporate kin groups as units of mobilization in intergroup competition.

Genetic and Cultural Hitchhiking

The interplay between genetic and cultural evolution has shaped human diversity in profound ways. Two critical mechanisms, genetic hitchhiking and cultural hitchhiking, explain how neutral or non-adaptive traits (the hitchhiking traits) can propagate through populations because of their association with advantageous traits. While both processes reduce genetic diversity and leave distinct signatures in the genome, their mechanisms, transmission pathways, and evolutionary implications differ significantly. Hitchhiking models in socially structured populations describe processes where selection on one trait affects the frequency of other traits or genetic elements.

Genetic hitchhiking represents a powerful evolutionary force that can significantly shape haplogroup diversity patterns, sometimes creating genetic signatures that persist long after the original selective events occurred. Genetic hitchhiking, also called genetic draft or the hitchhiking effect, occurs when an allele changes frequency not because it is under natural selection itself, but because it is physically linked to another gene undergoing a selective sweep. [36]

Illustration Seven: Genetic Hitchhiking

Source: Hashem, Ihab & Telen, Dries & Nimmegeers, Philippe & Van Impe, Jan. (2018). The Silent Cooperator: An Epigenetic Model for Emergence of Altruistic Traits in Biological Systems. Complexity. 2018. 1-16. 10.1155/2018/2082037

“Genetic hitchhiking: the frequency of a gene could increase in the population due to lying at the same chromosome of another advantageous gene. In these “domino organisms,” the top gene, the number of dots, represents a trait that is advantageous to its carrier, such as resistance to toxins or diseases. Hence, as the domino organisms with the highest dot number get positively selected, their bottom genes, which have no influence on their fitness, also spread in the population.” [37]

Nearby neutral or even slightly deleterious alleles that are in linkage with the selected gene “hitchhike” along with it. The closer a polymorphism is to the gene under selection, the stronger the hitchhiking effect due to less opportunity for recombination. Examples of selective sweeps in humans are found in variants affecting lactase persistence [38] and adaptation to high altitude. [39]
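The dependence of hitchhiking on linkage can be sketched with a deterministic two-locus model. The code below assumes a haploid population, a selected allele A with advantage s that first appears on a chromosome carrying the neutral allele B, and a recombination rate r between the two loci. The model, the names, and the parameter values are illustrative assumptions and do not come from the sources cited here.

```python
def neutral_allele_after_sweep(r: float, s: float = 0.1, generations: int = 400) -> float:
    """Final frequency of neutral allele B after the linked allele A sweeps.

    Haplotypes: AB, Ab, aB, ab. A has fitness 1 + s; B has no fitness effect.
    A starts rare and only on B chromosomes; B starts at a frequency near 0.1.
    """
    if not 0.0 <= r <= 0.5:
        raise ValueError("recombination rate must be between 0 and 0.5")
    x_AB, x_Ab, x_aB, x_ab = 0.001, 0.0, 0.099, 0.9
    for _ in range(generations):
        # Selection: haplotypes carrying A are weighted by 1 + s.
        mean_fitness = (x_AB + x_Ab) * (1 + s) + x_aB + x_ab
        x_AB = x_AB * (1 + s) / mean_fitness
        x_Ab = x_Ab * (1 + s) / mean_fitness
        x_aB = x_aB / mean_fitness
        x_ab = x_ab / mean_fitness
        # Recombination erodes the A-B association (linkage disequilibrium).
        d = x_AB * x_ab - x_Ab * x_aB
        x_AB -= r * d
        x_ab -= r * d
        x_Ab += r * d
        x_aB += r * d
    return x_AB + x_aB  # total frequency of B


if __name__ == "__main__":
    for rate in (0.0, 0.01, 0.5):
        print(f"r = {rate}: final frequency of B = {neutral_allele_after_sweep(rate):.3f}")
```

With no recombination the neutral allele is carried almost to fixation, while with free recombination it barely moves from its starting frequency. The Y chromosome is the limiting case of tight linkage, since most of it does not recombine.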

Cultural hitchhiking, originally proposed by Hal Whitehead in 1998 [40], describes how neutral genetic diversity is shaped by cultural selection. Unlike genetic hitchhiking, this process involves the transmission of culturally advantageous traits (e.g., agricultural practices or social norms) that indirectly influence the frequency of genetically neutral alleles through mate choice, social learning, or demographic shifts. Examples of the mechanisms and cultural drivers of cultural hitchhiking, and of the resulting genomic and cultural signatures, are provided in table one.

Table One: Examples of Cultural Drivers, Cultural Signatures and Genomic Patterns

Mechanisms and Cultural Drivers | Description
Postmarital Residence Rules | Patrilocal or matrilocal societies influence genetic admixture. For example, patrilocal postmarital residence in farming communities may reduce Y-chromosome diversity due to male-biased migration and cultural resocialization. [41]
Cultural Selection | Adaptive cultural traits (e.g., slash-and-burn horticulture) alter selection pressures on genes. The spread of farming practices in Neolithic societies increased malaria incidence, favoring the S allele for sickle cell anemia. [42]

Genomic and Cultural Signatures (cultural hitchhiking leaves distinct genomic patterns) | Description
Mitochondrial and Y-Chromosome Bottlenecks | Reduced diversity in uniparentally inherited loci due to sex-biased cultural practices (e.g., patrilocality). [43]
Association with Cultural Artifacts | Neutral traits (e.g., pottery styles) spread alongside adaptive technologies (e.g., agriculture) due to social learning. [44]

Cultural hitchhiking occurs when neutral genes ‘hitchhike’ to higher frequencies alongside adaptive cultural traits. This process requires specific conditions. Genetic and cultural variants must be transmitted symmetrically (typically vertically from parent to offspring). Cultural traits must create heritable variation in reproductive success or survival between different groups. Cultures must be stable and not frequently transfer between population segments. [45]

A related process called culturally mediated migration occurs when culture creates barriers within a population that inhibit dispersal and mating. This process reduces diversity of both neutral and functional genes through bottlenecks and selection; can interact with competitive social dynamics, as seen in patrilineal kin groups; and requires cultures that affect dispersal patterns and remain relatively stable. [46]

These models are significant because they help explain how social structure and cultural transmission can shape genetic diversity in both human and non-human populations.

Beware of Imputing Cause and Correlation between Genetic and Cultural Genealogical Orientations

The relationship between genetic and cultural inheritance is complex and bidirectional. Genetic propensities influence what cultural elements individuals learn, while culturally transmitted information, such as marriage traditions, affects the selection pressures acting on populations.

“Genes and culture represent two streams of inheritance that for millions of years have flowed down the generations and interacted. Genetic propensities, expressed throughout development, influence what cultural organisms learn. Culturally transmitted information, expressed in behaviour and artefacts, spreads through populations, modifying selection acting back on populations.” [47]

Cultural and genetic genealogy are two distinct but related aspects of genealogy. Various migratory patterns associated with Y-DNA haplogroups do not necessarily imply that they coincide with macro-level, cultural geographical patterns or movements of people. Migrating groups undoubtedly contained a mix of haplogroups, and they undoubtedly were characterized by various cultural patterns, practices and behaviors. Y-DNA haplogroups also were represented in various historical cultures, and many cultures invariably contained genetic mixtures of haplogroups at various periods of time.

Various theories have been formed that describe large cultural groups and major population movements where most of the members of a genetic haplogroup may have lived and traveled. Common genetic ancestors with matches from these time periods can be mapped and described, but any information gained from these studies about where these ancestors lived and migrated does not necessarily mean that they are connected to our family history.

There is no direct evidence that our individual ancestors were part of the same culture or migration patterns that are documented in paleogenomic and genetic anthropological studies. We cannot definitively associate deep ancestry haplogroups with historical cultures. However, the results of these multidisciplinary studies can provide a backdrop for interpreting or providing meaning and context to our haplogroup tree.

Ecological Fallacies Can Emerge When Analyzing Y-DNA Migration Patterns

An ecological fallacy is a logical error that occurs when conclusions about individuals are incorrectly drawn from group-level or aggregate data. This fallacy arises when characteristics of a population as a whole are mistakenly attributed to individuals within that population without demonstrating any real connection. [48]

The ecological fallacy can significantly impact the interpretation of Y-DNA migration patterns and haplotree analyses in several key ways. The primary ecological fallacy occurs when making inferences about individual migrations based on population-level Y-DNA patterns. Just because a haplogroup shows a particular geographic distribution pattern at the population level does not necessarily mean that our individual ancestors followed those exact migration routes. [49]
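A small calculation shows how a population-level pattern can mislead at the individual level. The regions, population sizes, and haplogroup frequencies below are entirely hypothetical and were chosen only to make the arithmetic visible.

```python
def carrier_share_by_region(regions: dict[str, tuple[int, float]]) -> dict[str, float]:
    """For each region, the share of all haplogroup carriers who live there.

    `regions` maps a region name to (number of males, haplogroup frequency).
    """
    carriers = {name: males * freq for name, (males, freq) in regions.items()}
    total = sum(carriers.values())
    if total <= 0:
        raise ValueError("no carriers in any region")
    return {name: count / total for name, count in carriers.items()}


if __name__ == "__main__":
    hypothetical = {
        "Region A": (1_000_000, 0.30),   # small population, high frequency
        "Region B": (50_000_000, 0.03),  # large population, low frequency
    }
    for name, share in carrier_share_by_region(hypothetical).items():
        print(f"{name}: {share:.1%} of carriers")
```

In this made-up case the haplogroup is ten times more frequent in Region A, yet about five of every six carriers live in Region B. The regional frequency alone therefore says little about where any one carrier's paternal line was located.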

Two major temporal fallacies can emerge when present-day DNA patterns are compared with historic patterns. First, the presence of a haplogroup in a modern population does not necessarily indicate when that lineage first arrived in a region. Second, high frequencies of particular SNPs in current populations may not reflect historical frequencies, as ancient populations could have had different distributions. [50]

The assumption that current geographic distributions of Y-DNA haplogroups directly map to ancient migration routes can be fallacious. Population bottlenecks, founder effects, and later migrations can dramatically reshape haplogroup distributions. [51]

A reliable way to overcome ecological fallacies is to supplement population-level data with individual-level evidence. This requires integrating archaeological, historical, and genetic data at multiple scales of analysis. [52]

As genetic processes are inherently stochastic, patterns of genetic variation only indirectly reflect demographic histories, requiring careful inferential approaches. Lisa Loog’s 2020 article underscores this point by reviewing fundamental models and assumptions that underlie common approaches for inferring past demographic events from genetic data. All inferential approaches require assumptions about the data and underlying demographic processes, which significantly affect the interpretation of results. [53]

Loog discusses several important methodological issues:

  • Phylogenetic Analysis Limitations: Events in phylogenetic trees based on single loci do not directly correspond to population-level events due to their stochastic nature. Different demographic scenarios can produce similar gene trees (equifinality).
  • Principal Component Analysis (PCA) Issues: PCA, an approach used in many paleogenomic studies, lacks an underlying population genetic model, making it problematic for demographic inference. Similar distributions of samples on PCs can result from entirely different demographic histories.
  • Clustering Method Problems: Statistical clusters are often mistakenly interpreted as evidence of “ancestral” or “source” populations when multiple distinct demographic histories could explain such clusters.

Loog’s article highlights how non-random sampling can significantly affect demographic inference. Archaeological specimens and museum collections are particularly susceptible to sampling bias due to preservation issues and non-random excavation patterns.

Loog’s analysis emphasizes that robust demographic inference requires formal comparison of alternative hypotheses formulated as different demographic scenarios. This allows assessment of the importance of different processes in population history.

Dangers of Attributing Cultural Factors to Haplogroups

Attributing ancient cultural traits to haplogroup migratory paths involves several potential fallacies and misconceptions. While genetic data provides valuable insights into human history, attributing cultural traits solely to haplogroup migrations oversimplifies complex historical processes. Cultural transmission, sociocultural practices, selection, drift, and non-random mating patterns all contribute to the complex relationship between genes and culture. A more nuanced approach recognizes that genetic and cultural histories, while sometimes parallel, often follow independent paths.

Genes and culture are not necessarily aligned. They follow different evolutionary trajectories. Languages and cultural practices evolve differently than genes, and while they may sometimes indicate common ancestry, they often develop independently. Cultural innovations can significantly influence genetic diversity patterns without requiring population replacement. [54]

The relationship between genetic markers and cultural traits is rarely straightforward. Archaeological evidence often shows that contact between culturally distinct groups (like farmers and hunter-gatherers) led to substantial cultural changes without corresponding genetic shifts. Cultural diffusion can occur without significant genetic admixture, and vice versa. [55]

The presence of a haplogroup in multiple regions doesn’t necessarily indicate a single migration event or cultural connection. Haplogroups can arise before migration events and spread through multiple independent pathways. For example, if a haplogroup originated 20,000 years ago but a migration occurred 10,000 years ago, the haplogroup could potentially be found on both sides of the migration route. [56]

Sociocultural practices like postmarital residence patterns, linguistic exogamy, and gender-specific roles can dramatically shape genetic diversity independent of large-scale migrations. Studies of Native American populations show that sociocultural factors have played a more important role than language or geography in determining genetic structure. [57]

The coincidence of genetic and cultural changes doesn’t necessarily imply a causal relationship. For instance, the Avar migration into East Central Europe demonstrates how perceptions of people as “Avars” in historical texts, cultural unification, and genetic admixture did not follow analogous rhythms, leading to diverse genetic ancestry in different local communities despite shared cultural identity. [58]

Many historical migrations show sex-biased patterns, with different male and female genetic histories. For example, in Native American populations, European admixture occurred primarily between European men and indigenous women, creating discrepancies between mitochondrial DNA and Y-chromosome patterns. [59]

Genetic markers can be affected by natural selection and genetic drift, which can create patterns that mimic migration effects. These processes can lead to complicated cline shapes in marker frequencies that are unrelated to cultural diffusion. [60]

Human reproduction is not a uniform random process but is channeled through kinship systems, marriage rules, and social meanings of birth. Even when different groups share cultural practices, their reproductive choices may maintain genetic differences rather than lead to homogenization. [61]

Admixture Events Complicate Attribution of Cultural Traits to Specific Haplogroups

Admixture events create complex genetic landscapes that make simple haplogroup-culture associations problematic. When populations merge, the resulting genetic profile becomes a mosaic of different ancestral contributions, with some individuals carrying haplogroups from one ancestral population while adopting cultural practices from another. For example, the genetic composition of present-day Europeans reflects multiple prehistoric migrations and admixture events, making it impossible to attribute specific cultural developments solely to particular haplogroups.

Admixture events typically involve cultural exchange that operates independently from genetic exchange. When populations meet and mix, cultural traits can be selectively adopted, modified, or rejected regardless of genetic inheritance patterns. The spread of farming across Europe illustrates this complexity – while there was some genetic contribution from Near Eastern farmers, the cultural practice of agriculture spread more widely than the genetic signature, as local hunter-gatherers adopted farming without complete genetic replacement.

The timing of genetic admixture and cultural change often does not align. Cultural traits may be adopted long before or after genetic admixture occurs, creating a ‘temporal disconnect’ that makes attributing cultural traits to specific haplogroups problematic. For instance, the adoption of Indo-European languages in Europe did not always coincide with significant genetic changes, as evidenced by regions where language shifted while genetic composition remained relatively stable. [62]

Genetic material and cultural traits follow different inheritance patterns. While haplogroups are inherited strictly through biological lines (Y-chromosome haplogroups paternally and mtDNA haplogroups maternally), cultural traits can be transmitted horizontally across populations and vertically between generations through non-genetic means. This fundamental difference means that cultural traits can spread widely without corresponding genetic changes.

Many historical admixture events show strong sex biases, with genetic contributions predominantly from males or females of one population. These sex-biased patterns create discrepancies between different genetic markers (autosomal DNA, Y-chromosome, mtDNA) and further complicate cultural attributions.
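The discrepancy between markers can be illustrated with simple arithmetic. The following is a minimal sketch, assuming a single admixture pulse between two hypothetical populations, A and B, and looking only at the first generation of children. The function name and the example fractions are illustrative assumptions, not figures from the cited studies.

```python
def expected_ancestry(male_frac_a, female_frac_a):
    """Expected share of population A ancestry in the first generation
    after a single admixture pulse.

    male_frac_a:   fraction of the founding fathers who come from A
    female_frac_a: fraction of the founding mothers who come from A
    """
    for value in (male_frac_a, female_frac_a):
        if not 0.0 <= value <= 1.0:
            raise ValueError("fractions must lie between 0 and 1")
    return {
        # Y-DNA is inherited only through fathers.
        "y_dna": male_frac_a,
        # mtDNA is inherited only through mothers.
        "mtdna": female_frac_a,
        # Autosomal DNA comes half from each parent.
        "autosomal": (male_frac_a + female_frac_a) / 2,
    }


if __name__ == "__main__":
    # Hypothetical sex-biased event: 90% of fathers but only 10% of
    # mothers come from population A.
    for marker, share in expected_ancestry(0.9, 0.1).items():
        print(f"{marker}: {share:.0%} from population A")
```

Under these assumed fractions, a study based only on Y-DNA would suggest a near replacement by population A, a study based only on mtDNA would suggest almost no contribution from A, and autosomal DNA would suggest an even mix. None of the three says anything directly about which group's cultural practices prevailed.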

Source:

Feature Banner: The banner at the top of the story is an amalgam of two illustrations.

The illustration on the left is part of a chart that represents a haplotree of paternal descent. The blue lines represent the path or lineage of Y-SNP mutations of Y-DNA tests. The other lines represent lineages that have been undiscovered. On the left hand side of the haplotree are two bar graphs that illustrate how far back Y-STR and Y-SNP test results can be utilized to analyze lineages. The bottom of the illustration reflects the extent to which traditional family trees reach into the past. This illustration was created by Mike Walsh, project administrator of the FamilyTreeDNA R1b-L513 working group. It is presented in Vance’s introductory YouTube discussion of Y-DNA. J. David Vance, Transcript of DNA Concepts for Genealogy Y-DNA, 2019, Page 11, https://drive.google.com/file/d/1CdUB4AmB1UYff5fmKtoKiqp6nG_gom37/view

The right hand portion of the banner is a chart that depicts the predominant orientation of a genealogical narrative in each layer of time.

[1] Mitochondrial DNA (mtDNA) testing analyzes DNA found in the mitochondria of cells, which is passed down exclusively from mothers to their children. This type of DNA testing provides specific information about a person’s maternal ancestry and has several distinctive characteristics. mtDNA exists separately from nuclear DNA, representing one of two genomes in mammalian cells. Both males and females inherit mtDNA, but only females can pass it to their children. Maternal relatives across multiple generations share identical mtDNA sequences, barring mutation.

Amorim A, Fernandes T, Taveira N. Mitochondrial DNA in human identification: a review. PeerJ. 2019 Aug 13;7:e7314. doi: 10.7717/peerj.7314. PMID: 31428537; PMCID: PMC6697116, https://pmc.ncbi.nlm.nih.gov/articles/PMC6697116/

Mitochondrial DNA tests, This page was last edited on 13 February 2021, International Society of Genetic Genealogy Wiki, https://isogg.org/wiki/Mitochondrial_DNA_tests

[2] Y-DNA testing analyzes genetic information on the Y chromosome, which passes exclusively from fathers to sons. The Y chromosome passes largely unchanged from father to son through generations, apart from occasional mutations. Only males possess and can pass on Y-DNA, making it useful for tracing paternal lineages. Unlike other chromosomes, Y-DNA undergoes minimal genetic recombination during reproduction.

[3] See my story: Y-DNA and the Griffis Paternal Line Part Three: The One-Two Punch of Using SNPs and STRs February 23, 2023

[4] Genetic drift is a fundamental evolutionary mechanism where random chance causes changes in the frequency of gene variants (alleles) within a population over time. This process occurs through random sampling of genes passed from one generation to the next, rather than through natural selection. This randomness can lead to some genetic variants becoming more common while others disappear entirely from the population.

Genetic drift has a stronger impact on smaller populations. In small groups, the loss or increase of particular genetic variants happens more quickly and dramatically than in larger populations.

Population bottlenecks are one example of genetic drift. They occur when a population’s size is suddenly and dramatically reduced, such as through a natural disaster or overhunting. The surviving individuals may carry only a fraction of the original population’s genetic diversity.

Another example of genetic drift is a founder effect. Founder effects occur when a small group separates from a larger population to establish a new colony and carries only a subset of the original population’s genetic diversity. This limited genetic pool becomes the foundation for the new population.

Rotimi, Charles, Genetic Drift, National Human Genome Research Institute, https://www.genome.gov/genetics-glossary/Genetic-Drift

Andrews, Christine A. (2010) Natural Selection, Genetic Drift, and Gene Flow Do Not Act in Isolation in Natural Populations. Nature Education Knowledge 3(10):5, https://www.nature.com/scitable/knowledge/library/natural-selection-genetic-drift-and-gene-flow-15186648/

Genetic Drift, Wikipedia, This page was last edited on 29 January 2025, https://en.wikipedia.org/wiki/Genetic_drift

Bohonak, Andrew J., Genetic Drift in Human Populations. In: Encyclopedia of Life Sciences (ELS), John Wiley & Sons, Ltd: Chichester. April 2018, DOI: 10.1002/9780470015902.a0005440.pub2, https://biology.sdsu.edu/pub/andy/Bohonak2008.pdf

[5] David Reich, Who We Are and How We Got Here: Ancient DNA and the New Science of the Human Past, New York: Vintage Books, 2018

Kivisild T. The study of human Y chromosome variation through ancient DNA. Hum Genet. 2017 May;136(5):529-546. doi: 10.1007/s00439-017-1773-z. Epub 2017 Mar 4. Erratum in: Hum Genet. 2018 Oct;137(10):863. doi: 10.1007/s00439-018-1937-5. PMID: 28260210; PMCID: PMC5418327, https://pmc.ncbi.nlm.nih.gov/articles/PMC5418327/

[6] Paleogenomics, Wikipedia, This page was last edited on 16 December 2023, https://en.wikipedia.org/wiki/Paleogenomics

[7] High-throughput sequencing (HTS) is a technology that enables rapid, parallel sequencing of millions of DNA and RNA molecules simultaneously. This massively parallel approach represents a significant advancement over traditional Sanger sequencing methods, offering unprecedented speed, scale, and cost-effectiveness in analyzing human genomes.

High-Throughput Sequencing: Definition, Technology, Advantages, Application and Workflow, CD Genomics, https://www.cd-genomics.com/resource-comprehensive-overview-high-throughput-sequencing.html

Churko JM, Mantalas GL, Snyder MP, Wu JC. Overview of high throughput sequencing technologies to elucidate molecular pathways in cardiovascular diseases. Circ Res. 2013 Jun 7;112(12):1613-23. doi: 10.1161/CIRCRESAHA.113.300939. PMID: 23743227; PMCID: PMC3831009, https://pmc.ncbi.nlm.nih.gov/articles/PMC3831009/

Tamang, Sanju, ed., Aryal, Sager, High Throughput Sequencing (HTS): Principle, Steps, Uses, Diagram, 9 Sep 2024, Microbe Notes, https://microbenotes.com/high-throughput-sequencing-hts/

What is next-generation sequencing?, Illumina, https://www.illumina.com/science/technology/next-generation-sequencing.html

Imanian, B., Donaghy, J., Jackson, T. et al. The power, potential, benefits, and challenges of implementing high-throughput sequencing in food safety systems. npj Sci Food 6, 35 (2022). https://doi.org/10.1038/s41538-022-00150-6 

Lee JY. The Principles and Applications of High-Throughput Sequencing Technologies. Dev Reprod. 2023 Apr;27(1):9-24. doi: 10.12717/DR.2023.27.1.9. Epub 2023 Mar 31. PMID: 38075439; PMCID: PMC10703097, https://pmc.ncbi.nlm.nih.gov/articles/PMC10703097/

[8] Kivisild, Toomas, The study of human Y chromosome variation through ancient DNA. Hum Genet. 2017 May;136(5):529-546. doi: 10.1007/s00439-017-1773-z. Epub 2017 Mar 4. Erratum in: Hum Genet. 2018 Oct;137(10):863. doi: 10.1007/s00439-018-1937-5. PMID: 28260210; PMCID: PMC5418327, https://pubmed.ncbi.nlm.nih.gov/28260210/

[9] David Reich, Who We Are and How We Got Here: Ancient DNA and the New Science of the Human Past, New York: Vintage Books, 2018

Michael Hofreiter, Johanna L. A. Paijmans, Helen Goodchild, Camilla F. Speller, Axel Barlow, Gloria G. Fortes, Jessica A. Thomas, Arne Ludwig and Matthew J. Collins, The future of ancient DNA: Technical advances and conceptual shifts, BioEssays 37 (3) Nov 2015, original publication Nov 21 2014, https://www.researchgate.net/publication/268579140_The_future_of_ancient_DNA_Technical_advances_and_conceptual_shifts

Chinese Academy of Sciences, Researchers chart advances in ancient DNA technology, July 21 2022, Phys.org, https://phys.org/news/2022-07-advances-ancient-dna-technology.html

Lorelei Verlhac, DNA and New Technologies: Is Paleogenomics the Future of Archaeology?, Byarcadia, https://www.byarcadia.org/post/dna-and-new-technologies-is-paleogenomics-the-future-of-archaeology

Tsosie KS, Begay RL, Fox K, Garrison NA. Generations of genomes: advances in paleogenomics technology and engagement for Indigenous people of the Americas. Curr Opin Genet Dev. 2020 Jun;62:91-96  https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7484015/

Evan K Irving-Pease, Rasa Muktupavela, Michael Dannemann, Fernando Racimo, Quantitative Human Paleogenetics: What can Ancient DNA Tell Us About Complex Trait Evolution?, Frontiers in Genetics, Aug 2021, Volume 12 Article 703541, https://www.frontiersin.org/articles/10.3389/fgene.2021.703541/full

Hodan, George, Most European men descend from a handful of Bronze Age forefathers, 19 May 2015, Phys.org, https://phys.org/news/2015-05-european-men-descend-bronze-age.html

Forbes, Peter, What Ancient DNA says about us, 2 Jul 2018, New Humanist, https://newhumanist.org.uk/articles/5335/what-ancient-dna-says-about-us

[10] Reich, David, Ancient DNA and the New Science of the Human Past, 3 Mar 2021, Simons Foundation Presidential Lectures, https://www.simonsfoundation.org/event/ancient-dna-and-the-new-science-of-the-human-past/

[11] Moore’s Law refers to Gordon Moore’s observation that the number of transistors on a microchip doubles roughly every two years while the cost per transistor falls. In practical terms, we can expect the speed and capability of our computers to increase every couple of years, and we will pay less for them. Because the transistor count doubles at regular intervals, this growth is exponential.

Moore’s Law, Wikipedia, page last updated 23 Sep 2022, https://en.wikipedia.org/wiki/Moore%27s_law

For a related discussion on the improvements in DNA sequencing technologies and data-production pipelines in recent years, see:

Kris A. Wetterstrand, DNA Sequencing Costs: Data, 2022, National Human Genome Research Institute, https://www.genome.gov/about-genomics/fact-sheets/DNA-Sequencing-Costs-Data

[12] Paleogenomics, Wikipedia, This page was last edited on 16 December 2023, https://en.wikipedia.org/wiki/Paleogenomics

[13] Curry, Andrew, The First Europeans Weren’t Who You Might Think, National Geographic Magazine, August 2019, online: first-europeans-immigrants-genetic-testing-feature

[14] Karafet, T., Mendez, F., Sudoyo, H. et al. Improved phylogenetic resolution and rapid diversification of Y-chromosome haplogroup K-M526 in Southeast Asia. Eur J Hum Genet 23, 369–373 (2015). https://doi.org/10.1038/ejhg.2014.106

Haplogroup CT, Wikipedia, This page was last edited on 5 July 2024, https://en.wikipedia.org/wiki/Haplogroup_CT

[15] Scozzari R, Massaia A, D’Atanasio E, Myres NM, Perego UA, Trombetta B, et al. (2012) Molecular Dissection of the Basal Clades in the Human Y Chromosome Phylogenetic Tree. PLoS ONE 7(11): e49170. https://doi.org/10.1371/journal.pone.0049170

[16] Haplogroup G-M201, Wikipedia, This page was last edited on 24 January 2025, https://en.wikipedia.org/wiki/Haplogroup_G-M201

“Atlas of the Human Journey: Haplogroup G (M201)”, National Geographic. Archived from the original on 5 February 2011. Retrieved 25 March 2023

Ancestral Path Chart for Haplogroup BY211678, G-M201 Haplogroup, FamilyTreeDNA, 22 Feb 2025, https://discover.familytreedna.com/y-dna/G-BY211678/path

Cinnioğlu C, King R, Kivisild T, Kalfoğlu E, Atasoy S, Cavalleri GL, Lillie AS, Roseman CC, Lin AA, Prince K, Oefner PJ, Shen P, Semino O, Cavalli-Sforza LL, Underhill PA. Excavating Y-chromosome haplotype strata in Anatolia. Hum Genet. 2004 Jan;114(2):127-48. doi: 10.1007/s00439-003-1031-4. Epub 2003 Oct 29. PMID: 14586639, https://pubmed.ncbi.nlm.nih.gov/14586639/

Semino O, Passarino G, Oefner PJ, Lin AA, Arbuzova S, Beckman LE, De Benedictis G, Francalacci P, Kouvatsi A, Limborska S, Marcikiae M, Mika A, Mika B, Primorac D, Santachiara-Benerecetti AS, Cavalli-Sforza LL, Underhill PA (November 2000). “The genetic legacy of Paleolithic Homo sapiens sapiens in extant Europeans: a Y chromosome perspective”. Science. 290 (5494): 1155–9. Bibcode:2000Sci…290.1155S. doi:10.1126/science.290.5494.1155. PMID 11073453

[17] Haplogroup G-M201, Wikipedia, This page was last edited on 24 January 2025, https://en.wikipedia.org/wiki/Haplogroup_G-M201

Ancestral Path Chart for Haplogroup BY211678, G-M201 Haplogroup, FamilyTreeDNA, 22 Feb 2025, https://discover.familytreedna.com/y-dna/G-BY211678/path

[18] B. Navarro‑López, E. Granizo‑Rodríguez, L. Palencia‑Madrid, C. Raffone, M. Baeta, M. M. de Pancorbo, Phylogeographic review of Y chromosome haplogroups in Europe, International Journal of Legal Medicine (2021) 135:1675–1684, https://doi.org/10.1007/s00414-021-02644-6

[19] Moreira, Ricardo Gomes, Human population genetics and the idea of ancestry: an anthropological perspective (part 2), 12 Jun 2023, Ancestry Traveler, https://ancestrytraveller.i3s.up.pt/human-population-genetics-and-the-idea-of-ancestry-an-anthropological-perspective-part-2/

Elia T. Ben-Ari, Molecular biographies: Anthropological geneticists are using the genome to decode human history, BioScience, Volume 49, Issue 2, February 1999, Pages 98–103, https://doi.org/10.2307/1313533

Kass, Mikala, 23 Apr 2019, Anthropology meets genetics to tell our collective story, ASU News, Arizona State University, https://news.asu.edu/20190423-discoveries-dna-anthropology-genetics

Crawford, Michael, Anthropological Genetics, Cambridge: Cambridge University Press, 2007, http://ndl.ethernet.edu.et/bitstream/123456789/52369/1/104.pdf

Benn Torres J. Anthropological perspectives on genomic data, genetic ancestry, and race. Am J Phys Anthropol. 2020 May;171 Suppl 70:74-86. doi: 10.1002/ajpa.23979. Epub 2019 Dec 14. PMID: 31837009, https://pubmed.ncbi.nlm.nih.gov/31837009/

[20] Zeng, T.C., Aw, A.J. & Feldman, M.W. Cultural hitchhiking and competition between patrilineal kin groups explain the post-Neolithic Y-chromosome bottleneck. Nat Commun 9, 2077 (2018), page 1, https://doi.org/10.1038/s41467-018-04375-6

[21] Deng, Nancy, Unearthing our past: The crucial role of genetic anthropology in rewriting history’s narrative, 2 Oct 2024, Vanderbilt Vanguard, https://vanderbiltvanguard.com/unearthing-our-past-the-crucial-role-of-genetic-anthropology-in-rewriting-historys-narrative/

“Genetic anthropology.” International Society of Genetic Genealogy Wiki. https://isogg.org/wiki/Genetic_anthropology#:~:text=Genetic%20anthropology%20is%20an%20emerging,how%20did%20we%20get%20here%3F%22.  

Kass, Mikala. “Anthropology meets genetics to tell our collective story.” ASU News, 23 April 2019, https://news.asu.edu/20190423-discoveries-dna-anthropology-genetics.

[22] While genetic markers provide direct DNA-based evidence, macromorphoscopic traits serve as proxies for genetic data to measure relatedness and locality. The Macromorphoscopic Databank (MaMD) contains data from over 2,400 individuals worldwide to support these assessments.

Macromorphoscopic traits are morphological features of the human cranium that are assessed by their presence, development, or absence, rather than through measurements. These traits reflect soft-tissue differences in living individuals and are used primarily in forensic anthropology for ancestry estimation.

Researchers are now working to combine macromorphoscopic trait data with genetic markers (including mitochondrial DNA, Y-chromosomes, and single nucleotide polymorphisms) to create more comprehensive ancestry estimations. This integration aims to provide multiple lines of evidence for more accurate classifications.

Some researchers question whether macromorphoscopic traits truly reflect microevolutionary processes or serve as suitable genetic proxies for population structure. This has led to ongoing discussions about the most appropriate methods for ancestry estimation in forensic anthropology.

Miller, Mackenzie, “Accuracy of Ancestry Estimation in Forensic Anthropology: An Examination of Select Nonmetric Methods” (2023). All ETDs from UAB. 79. https://digitalcommons.library.uab.edu/etd-collection/79

Plemons A, Hefner JT. Ancestry Estimation Using Macromorphoscopic Traits. Acad Forensic Pathol. 2016 Sep;6(3):400-412. doi: 10.23907/2016.041. Epub 2016 Sep 1. PMID: 31239915; PMCID: PMC6474543, https://pmc.ncbi.nlm.nih.gov/articles/PMC6474543/

DiGangi, EA, Bethard JD. Uncloaking a Lost Cause: Decolonizing ancestry estimation in the United States. Am J Phys Anthropol. 2021 Jun;175(2):422-436. doi: 10.1002/ajpa.24212. Epub 2021 Jan 18. PMID: 33460459; PMCID: PMC8248240, https://pmc.ncbi.nlm.nih.gov/articles/PMC8248240/

Hinkes M. Book Review: Atlas of Human Cranial Macromorphoscopic Traits. Acad Forensic Pathol. 2018 Dec;8(4):xii–xiii. doi: 10.1177/1925362118821514. Epub 2018 Dec 19. PMCID: PMC6491539, https://pmc.ncbi.nlm.nih.gov/articles/PMC6491539/

[23] Bernardi, Laura, An Introduction to Anthropological Demography, MPIDR Working Paper WP 2007-031, Max Planck Institute for Demographic Research, https://www.demogr.mpg.de/papers/working/wp-2007-031.pdf

Sample records for anthropology human genetics, Topics by Science.gov, Science.gov, https://www.science.gov/topicpages/a/anthropology+human+genetics.html

Sommer M. Human evolution across the disciplines: spotlights on American anthropology and genetics. Hist Philos Life Sci. 2012;34(1-2):211-36. PMID: 23272600, https://pubmed.ncbi.nlm.nih.gov/23272600/

Elhaik, Eran; Greenspan, Elliott; Staats, Sean; Krahn, Thomas; Tyler-Smith, Chris; Xue, Yali; Tofanelli, Sergio; Francalacci, Paolo; Cucca, Francesco; Pagani, Luca; Jin, Li; Li, Hui; Schurr, Theodore G.; Greenspan, Bennett; Spencer Wells, R, The GenoChip: A New Tool for Genetic Anthropology, the Genographic Consortium, Genome Biol Evol. 2013; 5(5): 1021–1031. Published online 2013 May 9. doi: 10.1093/gbe/evt066 https://pmc.ncbi.nlm.nih.gov/articles/PMC3673633/

Huckins, L., Boraska, V., Franklin, C. et al. Using ancestry-informative markers to identify fine structure across 15 populations of European origin. Eur J Hum Genet 22, 1190–1200 (2014). https://doi.org/10.1038/ejhg.2014.1

Yu JH, Taylor JS, Edwards KL, Fullerton SM. What are our AIMs? Interdisciplinary Perspectives on the Use of Ancestry Estimation in Disease Research. AJOB Prim Res. 2012;3(4):87-97. doi: 10.1080/21507716.2012.717339. PMID: 25419472; PMCID: PMC4238888, https://pmc.ncbi.nlm.nih.gov/articles/PMC4238888/

[24] Elia T. Ben-Ari, Molecular biographies: Anthropological geneticists are using the genome to decode human history, BioScience, Volume 49, Issue 2, February 1999, Pages 98–103, https://doi.org/10.2307/1313533

Shyamalika Gopalan, Samuel Pattillo Smith, Katharine Korunes, Iman Hamid, Sohini Ramachandran and Amy Goldberg, Human genetic admixture through the lens of population genomics, Philosophical Transactions of the Royal Society B: Biological Sciences, 18 April 2022, https://doi.org/10.1098/rstb.2020.0410

Manjusha Chintalapati, Nick Patterson, Priya Moorjani (2022) The spatiotemporal patterns of major human admixture events during the European Holocene, eLife 11:e77625, https://doi.org/10.7554/eLife.77625

Korunes KL, Goldberg A. Human genetic admixture. PLoS Genet. 2021 Mar 11;17(3):e1009374. doi: 10.1371/journal.pgen.1009374. PMID: 33705374; PMCID: PMC7951803, https://pmc.ncbi.nlm.nih.gov/articles/PMC7951803/

Shriner D. Overview of admixture mapping. Curr Protoc Hum Genet. 2013;Chapter 1:Unit 1.23. doi: 10.1002/0471142905.hg0123s76. PMID: 23315925; PMCID: PMC3556814, https://pmc.ncbi.nlm.nih.gov/articles/PMC3556814/

Daniel Wegmann, Raphael Eckel, Human evolution: When admixture met selection, Current Biology, Volume 33, Issue 7, 2023, Pages R259-R261, ISSN 0960-9822, https://doi.org/10.1016/j.cub.2023.02.077 (https://www.sciencedirect.com/science/article/pii/S0960982223002671)

[25] Patrilocality is the practice where a newly married couple resides with or near the husband’s family, meaning the wife moves to live close to her husband’s parents after marriage. It is typically found in societies that emphasize strong male lineage and family ties. It is the opposite of matrilocality, where the couple lives near the wife’s family.

[26]  Deborah A. Bolnick, Daniel I. Bolnick, David Glenn Smith, Asymmetric Male and Female Genetic Histories among Native Americans from Eastern North America, Molecular Biology and Evolution, Volume 23, Issue 11, November 2006, Pages 2161–2174, https://doi.org/10.1093/molbev/msl088

Giovanni Destro-Bisol, Francesco Donati, Valentina Coia, Ilaria Boschi, Fabio Verginelli, Alessandra Caglià, Sergio Tofanelli, Gabriella Spedini, Cristian Capelli, Variation of Female and Male Lineages in Sub-Saharan Populations: the Importance of Sociocultural Factors, Molecular Biology and Evolution, Volume 21, Issue 9, September 2004, Pages 1673–1682, https://doi.org/10.1093/molbev/msh186

[27] Zhabagin, M., Balanovska, E., Sabitov, Z. et al. The Connection of the Genetic, Cultural and Geographic Landscapes of Transoxiana. Sci Rep 7, 3085 (2017). https://doi.org/10.1038/s41598-017-03176-z 

[28] Ibid

[29] Zeng, T.C., Aw, A.J. & Feldman, M.W. Cultural hitchhiking and competition between patrilineal kin groups explain the post-Neolithic Y-chromosome bottleneck. Nat Commun 9, 2077 (2018). https://doi.org/10.1038/s41467-018-04375-6

[30] Zhabagin, M., Balanovska, E., Sabitov, Z. et al. The Connection of the Genetic, Cultural and Geographic Landscapes of Transoxiana. Sci Rep 7, 3085 (2017). https://doi.org/10.1038/s41598-017-03176-z 

Chiaroni J, Underhill PA, Cavalli-Sforza LL. Y chromosome diversity, human expansion, drift, and cultural evolution. Proc Natl Acad Sci U S A. 2009 Dec 1;106(48):20174-9. doi: 10.1073/pnas.0910803106. Epub 2009 Nov 17. Erratum in: Proc Natl Acad Sci U S A. 2010 Jul 27;107(30):13556. PMID: 19920170; PMCID: PMC2787129, https://pmc.ncbi.nlm.nih.gov/articles/PMC2787129/

[31] Karmin M, Saag L, Vicente M, Wilson Sayres MA, Järve M, Talas UG, Rootsi S, Ilumäe AM, Mägi R, Mitt M, Pagani L, Puurand T, Faltyskova Z, Clemente F, Cardona A, Metspalu E, Sahakyan H, Yunusbayev B, Hudjashov G, DeGiorgio M, Loogväli EL, Eichstaedt C, Eelmets M, Chaubey G, Tambets K, Litvinov S, Mormina M, Xue Y, Ayub Q, Zoraqi G, Korneliussen TS, Akhatova F, Lachance J, Tishkoff S, Momynaliev K, Ricaut FX, Kusuma P, Razafindrazaka H, Pierron D, Cox MP, Sultana GN, Willerslev R, Muller C, Westaway M, Lambert D, Skaro V, Kovačevic L, Turdikulova S, Dalimova D, Khusainova R, Trofimova N, Akhmetova V, Khidiyatova I, Lichman DV, Isakova J, Pocheshkhova E, Sabitov Z, Barashkov NA, Nymadawa P, Mihailov E, Seng JW, Evseeva I, Migliano AB, Abdullah S, Andriadze G, Primorac D, Atramentova L, Utevska O, Yepiskoposyan L, Marjanovic D, Kushniarevich A, Behar DM, Gilissen C, Vissers L, Veltman JA, Balanovska E, Derenko M, Malyarchuk B, Metspalu A, Fedorova S, Eriksson A, Manica A, Mendez FL, Karafet TM, Veeramah KR, Bradman N, Hammer MF, Osipova LP, Balanovsky O, Khusnutdinova EK, Johnsen K, Remm M, Thomas MG, Tyler-Smith C, Underhill PA, Willerslev E, Nielsen R, Metspalu M, Villems R, Kivisild T. A recent bottleneck of Y chromosome diversity coincides with a global change in culture. Genome Res. 2015 Apr;25(4):459-66. https://www.semanticscholar.org/paper/A-recent-bottleneck-of-Y-chromosome-diversity-with-Karmin-Saag/1e676ee5564b690d9534a3e395d2db6de8cf7875

(Pubmed) https://pmc.ncbi.nlm.nih.gov/articles/PMC4381518/

https://www.centogene.com/fileadmin/resources/scientific-publications/publications/centogene_publication_Karmin_Monika_A_recent_bottleneck_of_Y_chromosome_diversity_coincides_with_global_change_of_culture.pdf

[32] Collins, Nathan, Wars and clan structure may explain a strange biological event 7,000 years ago, Stanford researchers find, 30 May 2018, Stanford Report, Stanford University, https://news.stanford.edu/stories/2018/05/war-clan-structure-explain-odd-biological-event

[33] Zeng, T.C., Aw, A.J. & Feldman, M.W. Cultural hitchhiking and competition between patrilineal kin groups explain the post-Neolithic Y-chromosome bottleneck. Nat Commun 9, 2077 (2018). https://doi.org/10.1038/s41467-018-04375-6

[34] Collins, Nathan, Wars and clan structure may explain a strange biological event 7,000 years ago, Stanford researchers find, 30 May 2018, Stanford Report, Stanford University, https://news.stanford.edu/stories/2018/05/war-clan-structure-explain-odd-biological-event

[35] Davidski, Cultural hitchhiking and competition between patrilineal kin groups may have led to the post-Neolithic Y-chromosome bottleneck (Zeng et al. 2018), Friday, May 25, 2018, Eurogenes Blog, https://eurogenes.blogspot.com/2018/05/cultural-hitchhiking-and-competition.html#google_vignette

Collins, Nathan, Wars and clan structure may explain a strange biological event 7,000 years ago, Stanford researchers find, 30 May 2018, Stanford Report, Stanford University, https://news.stanford.edu/stories/2018/05/war-clan-structure-explain-odd-biological-event

[36] “In genetics, a selective sweep is the process through which a new beneficial mutation that increases its frequency and becomes fixed (i.e., reaches a frequency of 1) in the population leads to the reduction or elimination of genetic variation among nucleotide sequences that are near the mutation.”

Selective sweep, Wikipedia, This page was last edited on 1 February 2025, https://en.wikipedia.org/wiki/Selective_sweep

Genetic hitchhiking, Wikipedia, This page was last edited on 10 February 2025, https://en.wikipedia.org/wiki/Genetic_hitchhiking

[37] Hashem, Ihab & Telen, Dries & Nimmegeers, Philippe & Van Impe, Jan. (2018). The Silent Cooperator: An Epigenetic Model for Emergence of Altruistic Traits in Biological Systems. Complexity. 2018. 1-16. 10.1155/2018/2082037

[38] Bersaglieri, Todd; Sabeti, Pardis C.; Patterson, Nick; Vanderploeg, Trisha; Schaffner, Steve F.; Drake, Jared A.; Rhodes, Matthew; Reich, David E.; Hirschhorn, Joel N. (2004-06-01). “Genetic signatures of strong recent positive selection at the lactase gene”. American Journal of Human Genetics 74 (6): 1111–1120. doi: 10.1086/421051. PMC 1182075. PMID 15114531, https://pmc.ncbi.nlm.nih.gov/articles/PMC1182075/

Tishkoff, Sarah A.; Reed, Floyd A.; Ranciaro, Alessia; Voight, Benjamin F.; Babbitt, Courtney C.; Silverman, Jesse S.; Powell, Kweli; Mortensen, Holly M.; Hirbo, Jibril B. (2007-01-01). “Convergent adaptation of human lactase persistence in Africa and Europe”. Nature Genetics 39 (1): 31–40, https://pmc.ncbi.nlm.nih.gov/articles/PMC2672153/

[39] Yi, Xin; Liang, Yu; Huerta-Sanchez, Emilia; Jin, Xin; Cuo, Zha Xi Ping; Pool, John E.; Xu, Xun; Jiang, Hui; Vinckenbosch, Nicolas (2010-07-02). “Sequencing of 50 human exomes reveals adaptation to high altitude”. Science 329 (5987): 75–78. Bibcode:2010 Sci…329…75Y. doi:10.1126/science.1190371. PMC 3711608. PMID 20595611, https://pmc.ncbi.nlm.nih.gov/articles/PMC3711608/

[40] Cultural hitchhiking, Wikipedia, This page was last edited on 23 October 2024, https://en.wikipedia.org/wiki/Cultural_hitchhiking

Whitehead, Hal; Vachon, Felicia; Frasier, Timothy R. (May 2017). “Cultural Hitchhiking in the Matrilineal Whales”. Behavior Genetics 47 (3): 324–334. doi:10.1007/s10519-017-9840-8. PMID 28275880. S2CID 3866892, https://doi.org/10.1007/s10519-017-9840-8

[40] Premo, L. S.. “Hitchhiker’s guide to genetic diversity in socially structured populations.” Current Zoology, vol. 58, no. 2, Apr. 2012, pp. 287-297. https://doi.org/10.1093/czoolo/58.2.287

[41] Carrignon, Simon, Enrico R. Crema, Anne Kandler, Stephen Shennan, Postmarital residence rules and transmission pathways in cultural hitchhiking, 18 Nov 2024, PNAS, Vol 121 No 48 https://www.pnas.org/doi/10.1073/pnas.2322888121

Whitehead, Hal; Vachon, Felicia; Frasier, Timothy R. (May 2017). “Cultural Hitchhiking in the Matrilineal Whales”. Behavior Genetics 47 (3): 324–334. doi:10.1007/s10519-017-9840-8. PMID 28275880. S2CID 3866892, https://doi.org/10.1007/s10519-017-9840-8

[42] Fogarty L, Otto SP. Signatures of selection with cultural interference. Proc Natl Acad Sci U S A. 2024 Nov 26;121(48):e2322885121. doi: 10.1073/pnas.2322885121. Epub 2024 Nov 18. PMID: 39556724; PMCID: PMC11621839, https://pmc.ncbi.nlm.nih.gov/articles/PMC11621839/

[43] Carrignon, Simon, Enrico R. Crema, Anne Kandler, Stephen Shennan, Postmarital residence rules and transmission pathways in cultural hitchhiking, 18 Nov 2024, PNAS, Vol 121 No 48 https://www.pnas.org/doi/10.1073/pnas.2322888121

[44] Carrignon, Simon, Enrico R. Crema, Anne Kandler, Stephen Shennan, Postmarital residence rules and transmission pathways in cultural hitchhiking, 18 Nov 2024, PNAS, Vol 121 No 48 https://www.pnas.org/doi/10.1073/pnas.2322888121

Fogarty L, Otto SP. Signatures of selection with cultural interference. Proc Natl Acad Sci U S A. 2024 Nov 26;121(48):e2322885121. doi: 10.1073/pnas.2322885121. Epub 2024 Nov 18. PMID: 39556724; PMCID: PMC11621839, https://pmc.ncbi.nlm.nih.gov/articles/PMC11621839/

[45] Premo, L. S.. “Hitchhiker’s guide to genetic diversity in socially structured populations.” Current Zoology, vol. 58, no. 2, Apr. 2012, pp. 287-297. https://doi.org/10.1093/czoolo/58.2.287

Whitehead, H., Laland, K.N., Rendell, L. et al. The reach of gene–culture coevolution in animals. Nat Commun 10, 2405 (2019). https://doi.org/10.1038/s41467-019-10293-y

[46] Zeng, T.C., Aw, A.J. & Feldman, M.W. Cultural hitchhiking and competition between patrilineal kin groups explain the post-Neolithic Y-chromosome bottleneck. Nat Commun 9, 2077 (2018). https://doi.org/10.1038/s41467-018-04375-6

[47] Laland Kevin N. Exploring gene-culture interactions: insights from handedness, sexual selection and niche-construction case studies. Philos Trans R Soc Lond B Biol Sci. 2008 Nov 12;363(1509):3577-89. doi: 10.1098/rstb.2008.0132. PMID: 18799415; PMCID: PMC2607340, https://pmc.ncbi.nlm.nih.gov/articles/PMC2607340/

One approach, niche construction theory (NCT), describes how organisms actively modify their own and other species’ evolutionary environments through their activities and behaviors. This process goes beyond passive adaptation to environments, as organisms create systematic changes that affect natural selection pressures on themselves and future generations. [a]

Rather than viewing evolution as a one-way process, NCT presents it as a dynamic feedback system where organisms modify their environments, modified environments create new selection pressures, and these pressures influence subsequent evolution. This perspective transforms evolutionary theory from focusing solely on organismal evolution to examining the co-evolution of organisms with their environments. [b]

[47a] Laland K, Matthews B, Feldman MW. An introduction to niche construction theory. Evol Ecol. 2016;30:191-202. doi: 10.1007/s10682-016-9821-z. Epub 2016 Feb 3. PMID: 27429507; PMCID: PMC4922671, https://pmc.ncbi.nlm.nih.gov/articles/PMC4922671/

Niche construction, Wikipedia, This page was last edited on 6 January 2025, https://en.wikipedia.org/wiki/Niche_construction

[47b] Kevin Laland, John Odling-Smee and John Endler, Niche construction, sources of selection and trait coevolution, Interface Focus, 18 August 2017, https://doi.org/10.1098/rsfs.2016.0147

[48] Ecological Fallacy, Wikipedia, This page was last edited on 21 September 2024, https://en.wikipedia.org/wiki/Ecological_fallacy

[49] Spatial Aggregation and the Ecological Fallacy. Chapman Hall CRC Handb Mod Stat Methods. 2010;2010:541-558. doi: 10.1201/9781420072884-c30. PMID: 25356440; PMCID: PMC4209486, https://pmc.ncbi.nlm.nih.gov/articles/PMC4209486/

[50] See for example, Parahu, Ancient DNA from Ethiopia, 11 Mar 2023, Land of Punt, https://landofpunt.wordpress.com/2023/03/11/ancient-dna-from-ethiopia-2/

[51] Templeton, Alan R., Genetics and Recent Human Evolution, 19 Apr 2007, Perspective: The Society for the Study of Evolution, Evolution 61-7 : 1507–1519, https://www.sfu.ca/biology/courses/bisc441/Course_Materials/Readings/13-(Lect8)Templeton2007.pdf

Guha P, Srivastava SK, Bhattacharjee S, Chaudhuri TK. Human migration, diversity and disease association: a convergent role of established and emerging DNA markers. Front Genet. 2013 Aug 9;4:155. doi: 10.3389/fgene.2013.00155. PMID: 23950760; PMCID: PMC3738866 https://pmc.ncbi.nlm.nih.gov/articles/PMC3738866/

[52] Spatial Aggregation and the Ecological Fallacy. Chapman Hall CRC Handb Mod Stat Methods. 2010;2010:541-558. doi: 10.1201/9781420072884-c30. PMID: 25356440; PMCID: PMC4209486, https://pmc.ncbi.nlm.nih.gov/articles/PMC4209486/

[53] Loog L. Sometimes hidden but always there: the assumptions underlying genetic inference of demographic histories. Philos Trans R Soc Lond B Biol Sci. 2021 Jan 18;376(1816):20190719. doi: 10.1098/rstb.2019.0719. Epub 2020 Nov 30. PMID: 33250022; PMCID: PMC7741104, https://pmc.ncbi.nlm.nih.gov/articles/PMC7741104/

[54] Ainash Childebayeva, Adam Benjamin Rohrlach, Rodrigo Barquera, Maïté Rivollat, Franziska Aron, András Szolek, Oliver Kohlbacher, Nicole Nicklisch, Kurt W. Alt, Detlef Gronenborn, Harald Meller, Susanne Friederich, Kay Prüfer, Marie-France Deguilloux, Johannes Krause, Wolfgang Haak, Population Genetics and Signatures of Selection in Early Neolithic European Farmers, Molecular Biology and Evolution, Volume 39, Issue 6, June 2022, msac108, https://doi.org/10.1093/molbev/msac108

Arias L, Schröder R, Hübner A, Barreto G, Stoneking M, Pakendorf B. Cultural Innovations Influence Patterns of Genetic Diversity in Northwestern Amazonia. Mol Biol Evol. 2018 Nov 1;35(11):2719-2735. doi: 10.1093/molbev/msy169. PMID: 30169717; PMCID: PMC6231495, https://pmc.ncbi.nlm.nih.gov/articles/PMC6231495

Deborah A. Bolnick, Daniel I. Bolnick, David Glenn Smith, Asymmetric Male and Female Genetic Histories among Native Americans from Eastern North America, Molecular Biology and Evolution, Volume 23, Issue 11, November 2006, Pages 2161–2174, https://doi.org/10.1093/molbev/msl088

[55] Chyleński, M., Makarowicz, P., Juras, A. et al. Patrilocality and hunter-gatherer-related ancestry of populations in East-Central Europe during the Middle Bronze Age. Nat Commun 14, 4395 (2023). https://doi.org/10.1038/s41467-023-40072-9

[56] See for example Estes, Roberta, New Native American Mitochondrial DNA Haplogroups, 2 Mar 2017, DNAeXplained – Genetic Genealogy, https://dna-explained.com/2017/03/02/new-native-american-mitochondrial-dna-haplogroups/

[57] See for example:

Arias L, Schröder R, Hübner A, Barreto G, Stoneking M, Pakendorf B. Cultural Innovations Influence Patterns of Genetic Diversity in Northwestern Amazonia. Mol Biol Evol. 2018 Nov 1;35(11):2719-2735. doi: 10.1093/molbev/msy169. PMID: 30169717; PMCID: PMC6231495

Deborah A. Bolnick, Daniel I. Bolnick, David Glenn Smith, Asymmetric Male and Female Genetic Histories among Native Americans from Eastern North America, Molecular Biology and Evolution, Volume 23, Issue 11, November 2006, Pages 2161–2174, https://doi.org/10.1093/molbev/msl088

[58] Wang, K., Tobias, B., Pany-Kucera, D. et al. Ancient DNA reveals reproductive barrier despite shared Avar-period culture. Nature 638, 1007–1014 (2025). https://doi.org/10.1038/s41586-024-08418-5

[59] Deborah A. Bolnick, Daniel I. Bolnick, David Glenn Smith, Asymmetric Male and Female Genetic Histories among Native Americans from Eastern North America, Molecular Biology and Evolution, Volume 23, Issue 11, November 2006, Pages 2161–2174, https://doi.org/10.1093/molbev/msl088

Arias L, Schröder R, Hübner A, Barreto G, Stoneking M, Pakendorf B. Cultural Innovations Influence Patterns of Genetic Diversity in Northwestern Amazonia. Mol Biol Evol. 2018 Nov 1;35(11):2719-2735. doi: 10.1093/molbev/msy169. PMID: 30169717; PMCID: PMC6231495, https://pmc.ncbi.nlm.nih.gov/articles/PMC6231495/

[60] Isern, N., Fort, J. & de Rioja, V.L. The ancient cline of haplogroup K implies that the Neolithic transition in Europe was mainly demic. Sci Rep 7, 11229 (2017). https://doi.org/10.1038/s41598-017-11629-8

[61] Wang, K., Tobias, B., Pany-Kucera, D. et al. Ancient DNA reveals reproductive barrier despite shared Avar-period culture. Nature 638, 1007–1014 (2025). https://doi.org/10.1038/s41586-024-08418-5

[62] There are several documented instances where Indo-European languages were adopted without corresponding significant genetic changes in European populations.

The Hungarians represent one of the most studied cases of language-genetic mismatch in Europe. While they speak a Uralic language (not Indo-European), they are genetically similar to their Indo-European speaking neighbors. This population preserved the language brought by the Magyars who conquered the Carpathian Basin in the ninth century CE, while becoming genetically assimilated to their Indo-European-speaking neighbors over time. [a]

The Maltese present another interesting case. They speak an Afro-Asiatic language with lexical influences from Italian and English, making them the only Afro-Asiatic speakers in Europe. Their genetic profile can be described as a mix of ancestries from throughout the Mediterranean basin, being genetically close to Eastern Sicilians while sharing genetic relatedness with Indo-European speakers from the Balkans. [b]

More recent European examples where language and genes do not match include the spread of Slavic languages across the Balkans and elsewhere. These cases demonstrate that language adoption can occur through cultural processes rather than genetic replacement. [c]

In Greece, archaeological and genetic evidence indicates that Indo-European languages spread without major population replacement. Studies show that steppe ancestry (associated with early Indo-European speakers) was present at relatively low levels in both elite and non-elite individuals in ancient Greece. [d] Unlike northern Europe, where steppe-descended peoples replaced up to 90% of the native population, in Greece the steppe migrants became integrated both socially and genetically into Aegean societies rather than dominating them.

Concept of Language Shift

The concept of language shift has been used to explain one aspect of the relationship between genetics and culture. Language shifts can occur through elite dominance rather than mass migration.

The “elite recruitment” model suggests that Indo-European languages likely spread through the actions of “Indo-European chiefs” and their “ideology of political clientage” rather than through complete population replacement. Small elite groups have successfully imposed their languages in various historical contexts without significantly altering the genetic makeup of the local population. [e]

David Anthony, who proposed a “revised Steppe hypothesis,” argues that Indo-European languages spread not through “chain-type folk migrations” but through this elite recruitment process, where ritual and political elites introduced these languages and were then emulated by larger groups.

As David Anthony explains, “Language shift can be understood best as a social strategy through which individuals and groups compete for positions of prestige, power, and domestic security.” A relatively small immigrant elite population can encourage widespread language shift among numerically dominant indigenous populations if they employ specific combinations of encouragements and punishments. [f]

However, some scholars like Axel Kristinsson question the elite dominance model, noting that historically, it is often the conquerors who adopt the language of the conquered rather than vice versa. He points out that for elite dominance to effectively cause language shift, it typically requires additional elements like a centralized state, which did not exist in the fourth millennium BCE when Indo-European languages began spreading. [g]

Correlations between genetic and linguistic diversity across European populations

A 2015 study by Longobardi et al. revealed significant correlations between genetic and linguistic diversity across European populations. The research employed innovative linguistic comparison tools: a refined list of Indo-European cognate words and a novel method estimating linguistic diversity from a universal inventory of grammatical polymorphisms. [h]

Source: Giuseppe Longobardi, Silvia Ghirotto, Cristina Guardiano, Francesca Tassi, Andrea Benazzo, Andrea Ceolin, Guido Barbujani, Across language families: Genome diversity mirrors linguistic variation within Europe, American Journal of Physical Anthropology, 157 (4) Aug 2015: 630-640, online: https://onlinelibrary.wiley.com/doi/full/10.1002/ajpa.22758

The study found that populations speaking different languages are more likely to have different genetic makeup. The degree of genetic diversity between two European populations was proportional to their linguistic diversity.

Contrary to previous observations, language proved to be a better predictor of genetic differences than geographical distribution. Both lexical and syntactic distances showed higher correlations with genetic distances than genes did with geography.

The research by Longobardi et al. suggests that migrating populations carried their genes alongside their language, rather than just experiencing cultural diffusion of linguistic features. Inferred episodes of genetic admixture following major population splits had convincing correlates in the linguistic realm.

Research has shown significant correlations between genomic and linguistic diversity in Europe, with language sometimes proving to be a better predictor of genomic differences than geography.  However, these correlations do not necessarily imply that language shifts always coincide with genetic changes.

The debate about Indo-European language origins continues, with competing theories placing their birthplace either in Anatolia (with the first farmers) or on the Eurasian steppe. Recent genetic evidence supports the steppe hypothesis, identifying the Caucasus Lower Volga people as the likely originators of Proto-Indo-European around 6,500 years ago.  [i]

The spread of these languages throughout Europe likely involved both migration and cultural adoption processes, with varying degrees of genetic impact in different regions.

[a] Barbieri, Chiara, Damián E. Blasi, Epifanía Arango-Isaza, and Kentaro K. Shimizu, A global analysis of matches and mismatches between human genetic and linguistic histories, 21 Nov 2022, PNAS, 119 (47), https://www.pnas.org/doi/10.1073/pnas.2122084119

[b] Ibid

[c] Alberto González, Origins and spread of Indo-European languages: an alternative view, 8 Dec 2024, Ancient DNA Era, https://adnaera.com/2024/12/08/origins-and-spread-of-indo-european-languages-an-alternative-view/

[d] Shaw, Jonathan, Seeking the First Speakers of Indo-European Language, 25 Aug 2022, Harvard Magazine, https://www.harvardmagazine.com/2022/08/indo-european-languages

Iosif Lazaridis et al. ,The genetic history of the Southern Arc: A bridge between West Asia and Europe, Science 377, eabm4247 (2022). DOI:10.1126/science.abm4247, https://www.science.org/doi/10.1126/science.abm4247

Language shift, Wikipedia, This page was last edited on 23 December 2024, https://en.wikipedia.org/wiki/Language_shift

Indo-European migrations, Wikipedia, This page was last edited on 21 February 2025, https://en.wikipedia.org/wiki/Indo-European_migrations

[e] Language shift, Wikipedia, This page was last edited on 23 December 2024, https://en.wikipedia.org/wiki/Language_shift

[f] Language shift, Wikipedia, This page was last edited on 23 December 2024, https://en.wikipedia.org/wiki/Language_shift

[g] Kristinsson, Axel, Indo-European Expansion Cycles, The Journal of Indo-European Studies , Volume 40, Number 3 & 4, Fall/Winter 2012, https://www.axelkrist.com/docs/Indo-European_Expansion_Cycles.pdf

[h] Giuseppe Longobardi, Silvia Ghirotto, Cristina Guardiano, Francesca Tassi, Andrea Benazzo, Andrea Ceolin, Guido Barbujani, Across language families: Genome diversity mirrors linguistic variation within Europe, American Journal of Physical Anthropology, 157 (4) Aug 2015: 630-640, online: https://onlinelibrary.wiley.com/doi/full/10.1002/ajpa.22758

[i] Giuseppe Longobardi, Silvia Ghirotto, Cristina Guardiano, Francesca Tassi, Andrea Benazzo, Andrea Ceolin, Guido Barbujani, Across language families: Genome diversity mirrors linguistic variation within Europe, American Journal of Physical Anthropology, 157 (4) Aug 2015: 630-640, online: https://onlinelibrary.wiley.com/doi/full/10.1002/ajpa.22758

DeSmith, Christy, Ancient-DNA Study Identifies Originators of Indo-European Language Family, 5 Feb 2025, Harvard Gazette, https://hms.harvard.edu/news/ancient-dna-study-identifies-originators-indo-european-language-family

Lazaridis, I., Patterson, N., Anthony, D. et al. The genetic origin of the Indo-Europeans. Nature (2025). https://doi.org/10.1038/s41586-024-08531-5

Dutchen, Stephanie, A Steppe Forward: Ancient DNA challenges popular theory of Indo-European language arrival in Europe, 2 Mar 2015, News & Research, Harvard Medical School, https://hms.harvard.edu/news/steppe-forward

Dutchen, Stephanie, Old Mysteries: New Insights Ancient DNA illuminates 15,000 years of history at Europe-Asia crossroads, News & Research, 25 Aug 2022, Harvard Medical School, https://hms.harvard.edu/news/old-mysteries-new-insights

Autosomal DNA Tests: Estimating Genetic Relationships and Discovering Relatives

In prior posts, I discussed the utility of Y-DNA tests as a possible avenue for gaining insights and leads in tracing the lineage associated with the surnames of the Griffis(ith)(es) family. [1] I have not yet discussed my experience of using autosomal DNA tests for genealogical and family research.

There are perhaps two unique things that atDNA tests can provide. They can:

  • identify unknown living relatives and their possible relationships; and
  • identify a possible relationship of a common ancestor that you share with a living relative.

My experience with atDNA tests has largely resulted in the initial discovery of many living third to fifth cousins. However, most of these distant cousins have not documented their respective lines of descent in the various DNA company databases. The lack of this additional genealogical information makes it difficult to document where our common distant family connections are located.

A few of the genetic connections from the atDNA tests have provided documentation on common family connections. Based on their information, I have been able to identify a few distant connections. On two other occasions, I have discovered two half brothers.

This three-part story focuses on the merits and limitations of autosomal DNA (atDNA) tests, and on my personal experience of using them to document genetic kinship ties in the Griffis family. This part provides general background to make sense of the DNA results. The second part of the story discusses my ongoing DNA discoveries from these tests. As such, the information can change in the future. The third part is devoted to my profound discovery of having two half siblings, David and Greg.

General Comparison of DNA Tests

Depending on the type of test, DNA results tell you how much DNA you have inherited from unspecified ancestors on each side of your family or how far back you can trace genetic lineages through a maternal or paternal line. Genetic genealogy or results from DNA tests do not tell you where each member on your family tree lived or provide information on their specific family relationships.

DNA results can identify matches of living individuals and their possible shared kinship relationships. These estimates are based on the amount of shared DNA segments between the match and you. When it comes to identifying specific individuals and verifying kinship relationships, traditional genealogical research is typically required for interpretation of the results. [2]

There are basically three types of genetic tests used in genealogical research: autosomal (atDNA), Y-DNA, and mitochondrial DNA (mtDNA) tests (see illustration one below). Y-DNA and mtDNA tests respectively trace the direct paternal and direct maternal lines of one’s genetic history. The atDNA tests are broader: they can trace genetic relatives on both sides of your family tree. However, they are limited in how many generations back they can effectively provide results. Another unique characteristic of the atDNA tests is the matching of living test takers through the amount of shared autosomal DNA.

Illustration One: Three Types of DNA Tests

Source: Modified version of an image found at Edward Sweeney, Types of DNA Test, MacDougall DNA Research Project, https://macdougalldna.org/types-of-dna-test-b/

As indicated in table one, while limited to the paternal line of descent, Y-DNA tests can effectively trace male genetic ancestry back around 300,000 years. Mitochondrial testing of the matrilineal line can also provide results that go back over 140,000 years. The popular atDNA ‘ethnicity’ tests can trace back through only a limited number of generations. While women have two X chromosomes, X-DNA is usually tested along with the other chromosomes as part of an atDNA test. [3]

Table 1: Type of DNA Testing

| Characteristic | Autosomal DNA (atDNA) | Y-DNA (YDNA) | Mitochondrial DNA (mtDNA) |
| --- | --- | --- | --- |
| What does it test? | All autosomal chromosomes | Y chromosome | Mitochondria |
| Available to | Both males and females | Only males can take test | Both males and females |
| How far back? | 5 – 9 generations | ~155,000 years | ~200,000+ years |
| Source of Testing | Autosomal chromosomes | Y chromosome | Mitochondrial DNA (found in mitochondria) |
| What genealogical lines tested? | All ancestry lines | Only paternal (father’s father’s father, etc.) | Maternal (mother’s mother’s mother, etc.) |
| Benefits – utility | Finding relatives within a few generations, determining broader ethnicity estimations, identifying potential matches across both sides | Tracing direct paternal lines, surnames, identifying specific paternal lineages and haplogroups, studying deep paternal ancestry | Tracing a direct maternal line, identifying maternal haplogroups, analyzing ancient ancestry patterns |
| Available from the following companies: | ancestry.com; Family Tree DNA; 23andMe; MyHeritage; Living DNA | Family Tree DNA; 23andMe (high level); YSEQ; Full Genome Corp | Family Tree DNA; 23andMe; YSEQ; Full Genome Corp |

Autosomal DNA tests are useful for finding relatives, such as unknown relatives, clarifying uncertain family relationships and identifying distant relatives. Typically DNA companies identify matches up to six generations. The Y-DNA and mtDNA tests, while limited to only tracing paternal lines or maternal lines respectively, can trace genetic lineage back over 150,000 years.

Popularity of Autosomal DNA Tests

“For about a hundred dollars, it is now possible to spit into a tube, drop it in the mail, and within a couple of months gain access to a list of likely relatives. If you have any colonial American ancestors, the first thing you realize, taking a DNA test for genealogical purposes, is that potential sixth cousins are a whole lot easier to come by than you ever imagined. Even fifth cousins — people with whom you share a fourth great-grandparent — aren’t a particular scarcity.” [4]

These tests provide information about an individual’s ancestral roots, and they can help to connect people with their relatives, sometimes as distantly related as fourth or fifth cousins. Such information can be particularly useful when a person does not know their genealogical ancestry (eg. many adoptees and the descendants of forced migrants). [5]

The direct-to-consumer genetic testing market has shown significant growth in recent years, but there are indications of a recent slowdown in sales in 2023.

As many people purchased consumer DNA tests in 2018 as in all previous years combined. [6] Combined with prior years of personal consumer testing, more than 26 million consumers had added their DNA to four leading commercial ancestry and health databases.

Chart One: atDNA Database Growth

Source: 23andMe Has More Than 10 Million Customers, April 8, 2019, The DNA Geek Blog, https://thednageek.com/23andme-has-more-than-10-million-customers/

In late 2019, there were signs of declining sales. Ancestry and 23andMe saw drops in direct website sales of 38% and 54% respectively compared to 2018. [7]

“Less than five years ago, consumer DNA tests were being hailed as the innovative technology of the future—but today, declining sales have forced several companies in the field to scale back their workforces and adjust their business strategies.” [8]

Market data from DNA companies suggest that the market continues to grow, albeit at a slower rate than in the initial boom years. Projections include all types of DNA tests (e.g. genetic relatedness, ancestry, lifestyle wellness, reproductive health, personalized medicine, sports nutrition, diagnostics and others). Factors like market saturation among early adopters and privacy concerns may be contributing to the moderation in growth rates.

Despite the decade-long rise in sales, in 2020 there was a sudden decline in interest. Two of the leading companies, 23andMe and AncestryDNA, experienced declines in sales of DNA ancestry kits of 54 and 38 percent, respectively. The decline was attributed to market saturation, economic recession related to the COVID-19 pandemic, and privacy concerns. [9]

Since 2021, 23andMe, a prominent direct-to-consumer genetic testing company, has faced significant financial challenges that have raised concerns about its future and the security of customer data. The company’s financial situation has deteriorated rapidly. Its stock price has plummeted, losing over 97% of its value since going public in 2021. 23andMe is reportedly on the verge of bankruptcy and has never turned a profit.  In 2023, the company suffered a major data breach affecting nearly 7 million users. The company has had turnover of board members and internal dissension between board members and executive management. [10]

The situation surrounding 23andMe serves as a cautionary tale about the risks associated with entrusting sensitive genetic information to private companies and highlights the need for robust data protection measures in the rapidly evolving field of consumer genomics. It also underscores the need to keep backup copies of one’s DNA data. [10a]

What do atDNA Tests Measure?

Autosomal DNA tests basically measure five things.

  1. Genetic Markers: atDNA tests look at hundreds of thousands of genetic markers in a DNA sample called single nucleotide polymorphisms (SNPs) across the 22 autosomal chromosome pairs. More on SNPs later in this story. These sampled SNPs represent DNA sequences that can be used to efficiently identify genetic differences and similarities between individuals.
  2. Inheritance Patterns: The tests examine the autosomal DNA inherited from both parents, which includes genetic contributions from all recent ancestors. This allows for connections to be made with relatives on all “recent” branches of a family tree, not just direct paternal or maternal lines in the past six or so generations.
  3. Genetic Relatives: The tests identify shared DNA segments between the test taker and other individuals in the DNA test company’s database, allowing for the discovery of genetic relatives that are living and linking each matched DNA tester to past generations.
  4. Ethnicity Estimates: By comparing an individual’s genetic markers to reference populations maintained by a DNA test company, autosomal DNA tests can provide estimates of a person’s ancestral origins and ethnic background.
  5. Health Traits: Many atDNA testing companies also screen for genetic variants associated with certain inherited health conditions or physical traits.
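The idea behind items 1 and 3 can be illustrated with a toy sketch. It assumes each person's result is a list of two-allele genotypes at the same ordered SNPs and looks for the longest run of consecutive SNPs where two people share at least one allele. The genotypes, function names and the simple "shared allele" rule are hypothetical simplifications; testing companies compare hundreds of thousands of SNPs and use their own matching algorithms.

```python
from typing import List, Tuple

Genotype = Tuple[str, str]  # the two alleles observed at one SNP


def shares_allele(a: Genotype, b: Genotype) -> bool:
    """True when the two genotypes have at least one allele in common."""
    return bool(set(a) & set(b))


def longest_shared_run(person_a: List[Genotype], person_b: List[Genotype]) -> int:
    """Length of the longest run of consecutive SNPs with a shared allele."""
    longest = current = 0
    for a, b in zip(person_a, person_b):
        if shares_allele(a, b):
            current += 1
            longest = max(longest, current)
        else:
            current = 0  # a mismatch ends the candidate shared segment
    return longest


# Hypothetical genotypes at six consecutive SNPs (made-up values).
tester = [("A", "G"), ("C", "C"), ("T", "T"), ("A", "A"), ("G", "G"), ("C", "T")]
match = [("A", "A"), ("C", "T"), ("T", "T"), ("G", "G"), ("G", "A"), ("T", "T")]

print(longest_shared_run(tester, match))  # prints 3
```

In this made-up data the fourth SNP has no allele in common, so the longest candidate shared segment covers the first three SNPs. Long runs of this kind are what suggest a segment inherited from a common ancestor.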

The Genetic Influence of Autosomal DNA

An atDNA test is a measurement of sampled parts of your 22 pairs of autosomal chromosomes. Everyone (with rare exceptions) is born with a set of 23 pairs of chromosomes. The twenty-third pair consists of the sex chromosomes. In most cases, we inherit an X chromosome from our mother and either a Y or an X chromosome from our father, which determines our sex. (See illustration two).

Illustration Two: Karyotype of Human Chromosomes [11]

Source: Karyotype, National Human Genome Research Institute, https://www.genome.gov/genetics-glossary/Karyotype

We inherit half of our chromosomes from our mother and the other half from our father. One of the 23 pairs consists of the sex chromosomes (in most cases, XX in females and XY in males). The remaining 22 pairs of chromosomes are autosomal chromosomes or autosomes. For example, as illustrated below, chromosomes from the depicted mother are labeled in purple, and chromosomes from the depicted father are labeled in teal. (See illustration three). [12]

Illustration Three: Inheritance of Parental Chromosomes

Source: Human Genomic Variation, Fact Sheet, National Human Genome Research Institute, 1 Feb 2023, https://www.genome.gov/about-genomics/educational-resources/fact-sheets/human-genomic-variation

The genetic inheritance patterns associated with autosomal chromosomes become more complex and diluted over generations due to recombination and variable inheritance patterns. [13] Illustration four shows the average amount of atDNA shared with close relations up to the third cousin level. The illustration uses the maternal side as an example. The percentages can be replicated for the paternal side. [14] As reflected in the chart, fifty percent of one’s atDNA is inherited from each parent and, on average, roughly equal portions from each ancestor in a given generation, from grandparents back to about the 3x great-grandparents.

Illustration Four: Percent of Autosomal Genetic Inheritance from Descendants

Source: Dimario, A chart illustrating the different types of cousins, including genetic kinship marked within boxes in red which shows the actual genetic degree of relationship (gene share) with ‘self’ in percentage (%), 27 April 2010, Wikimedia Commons, https://commons.wikimedia.org/wiki/File:Cousin_tree_(with_genetic_kinship).png

During meiosis [15], genetic recombination occurs, shuffling segments of DNA from each of the parents. This means that siblings may inherit different combinations of DNA segments from their parents; and with each generation, the specific segments inherited become more randomized. As a result, the amount of shared DNA between relatives decreases exponentially with each generation, making it more challenging to detect distant relationships through autosomal testing.
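The halving described above can be sketched as simple arithmetic. The snippet below is a minimal illustration, not a testing company's method: the function names are hypothetical, and the figures are long-run averages that ignore the random variation recombination introduces in any one family.

```python
def expected_ancestor_share(generations_back: int) -> float:
    """Average fraction of atDNA inherited from ONE ancestor n generations back.

    1 = parent (50%), 2 = grandparent (25%), 3 = great-grandparent (12.5%), ...
    """
    if generations_back < 0:
        raise ValueError("generations_back must be zero or positive")
    return 0.5 ** generations_back


def expected_full_cousin_share(degree: int) -> float:
    """Average fraction of atDNA shared by full cousins of a given degree.

    Full n-th cousins share two common ancestors, each reached by a path of
    2 * (n + 1) parent-child steps, so the average is 2 * 0.5 ** (2 * (n + 1)).
    """
    if degree < 1:
        raise ValueError("degree must be 1 or greater")
    return 2 * 0.5 ** (2 * (degree + 1))


if __name__ == "__main__":
    for n in range(1, 8):
        share = expected_ancestor_share(n) * 100
        print(f"{n} generation(s) back: {2 ** n} ancestors, about {share:.2f}% each")
    for degree in range(1, 5):
        share = expected_full_cousin_share(degree) * 100
        print(f"Full cousin of degree {degree}: about {share:.2f}% shared")
```

Each step back doubles the number of ancestors and halves the average contribution from any one of them, which is why matches beyond a handful of generations become hard to detect.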

The random nature of genetic inheritance leads to variability in how much DNA is shared between relatives, especially for more distant relationships. This is known as variable expressivity. [16] For example, as indicated in table two, full siblings may share anywhere from about 38% to 61% of their DNA; and first cousins typically share around 12.5% of their DNA, but the actual range runs from about 4% to 23%. This variability increases with more distant relationships, making it harder to precisely determine the degree of relatedness based solely on shared DNA percentages (see table two). [17]

Table Two: Average Percent of Autosomal DNA Shared Between Selected Relatives

Relationship | Average Percent of DNA Shared | Range of DNA Shared
Identical Twin | 100% | N/A
Parent-Child | 50% (but 47.5% for father-son relationships) | N/A
Full Sibling | 50% | 38% – 61%
Half Sibling; Grandparent / Grandchild; Aunt / Uncle; Niece / Nephew | 25% | 17% – 34%
1st Cousin; Great-grandparent; Great-grandchild; Great-Uncle / Aunt; Great Nephew / Niece | 12.5% | 4% – 23%
1st Cousin once removed; Half first cousin | 6.25% | 2% – 11.5%
2nd Cousin | 3.13% | 2% – 6%
2nd Cousin once removed; Half second cousin | 1.5% | 0.6% – 2.5%
3rd Cousin | 0.78% | 0% – 2.2%
4th Cousin | 0.20% | 0% – 0.8%
5th Cousin to Distant Cousin | 0.05% | (not given)
Source: Average Percent DNA Shared Between Relatives, 23andMe Customer Care, Tools, 23andMe, https://customercare.23andme.com/hc/en-us/articles/212170668-Average-Percent-DNA-Shared-Between-Relatives
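To see why overlapping ranges make relationship estimates ambiguous, the ranges in table two can be turned into a small lookup. This is only a sketch: the list name and function name are hypothetical, the row labels are abbreviated, and rows without a published range (identical twin, parent-child, fifth cousin and beyond) are left out.

```python
# (abbreviated relationship group, low %, high %) copied from table two.
TABLE_TWO_RANGES = [
    ("Full sibling", 38.0, 61.0),
    ("Half sibling / grandparent / aunt or uncle / niece or nephew", 17.0, 34.0),
    ("1st cousin / great-grandparent / great-uncle or aunt", 4.0, 23.0),
    ("1st cousin once removed / half first cousin", 2.0, 11.5),
    ("2nd cousin", 2.0, 6.0),
    ("2nd cousin once removed / half second cousin", 0.6, 2.5),
    ("3rd cousin", 0.0, 2.2),
    ("4th cousin", 0.0, 0.8),
]


def candidate_relationships(percent_shared: float) -> list[str]:
    """Return every relationship group whose range contains the shared percentage."""
    return [
        name
        for name, low, high in TABLE_TWO_RANGES
        if low <= percent_shared <= high
    ]


if __name__ == "__main__":
    for percent in (50.0, 20.0, 5.0, 1.0):
        print(f"{percent}% shared could be:")
        for name in candidate_relationships(percent):
            print(f"  - {name}")
```

A match sharing 5% of their DNA, for instance, falls inside three different rows of the table, so the percentage alone cannot settle the relationship.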

While autosomal DNA testing has become increasingly accurate, there are still limitations in the context of estimating genetic relations and finding relatives. Current testing methods typically analyze only a subset of genetic markers. In addition, the interpretation of results relies on comparison to reference populations, which may not fully represent all ancestral groups. In the end, as previously stated, traditional genealogical research brings atDNA results into focus.

Genetic Variants: The Genetic Basis of atDNA Testing

A genome is the complete set of DNA instructions found in every cell. [18] As discussed in a prior story, the human cell is a masterpiece of data compression. [19] Its nucleus, just a few microns wide, contains (if you ‘spell’ it out) six feet of genetic code packed into a double helix called DNA: deoxyribonucleic acid (see illustration five).

Illustration Five: Structure of Deoxyribonucleic Acid (DNA)

Source: Modified image of DNA as found in Ruairi J Mackenzie, DNA vs. RNA – 5 Key Differences and Comparison, 18 Dec 2020, updated 24 Jan 2024, Technology Networks, Genomics Research, https://www.technologynetworks.com/genomics/lists/what-are-the-key-differences-between-dna-and-rna-296719

The helical DNA molecules string together some three billion pairs of nucleotides. Each nucleotide is composed of a sugar (deoxyribose), a phosphate group, and one of four types of nitrogenous bases, each represented by an initial: A (adenine), C (cytosine), G (guanine), and T (thymine). Nucleotides are the fundamental building blocks that make up the DNA strands. The sequence of nucleotides along the DNA strand encodes genetic information and regulates when codes are activated. [20]

The nucleotides form base pairs and are the cornerstone of genetic testing. (See illustration six.) They are the foundation of the programming language of our genetic code. Whenever a particular base is present on one strand of the DNA, its complementary base is found on the other strand. Guanine always pairs with cytosine. Thymine always pairs with adenine. So one can write the DNA sequence by listing the bases along either one of the two strands. When DNA companies perform their tests, they essentially separate the two strands of the helix and use one strand as the template or coding strand when they map out an individual’s DNA results.
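The pairing rule can be shown in a few lines of code. This is a minimal sketch with hypothetical function names; it only demonstrates that one strand determines the other, not how a laboratory reads DNA.

```python
# Base-pairing rule: A pairs with T, and G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(sequence: str) -> str:
    """Return the base that sits opposite each base of the given strand."""
    return "".join(COMPLEMENT[base] for base in sequence.upper())


def reverse_complement(sequence: str) -> str:
    """Return the opposite strand written in its own reading direction.

    The two strands of the helix run in opposite directions, so the
    complementary strand is conventionally written reversed.
    """
    return complementary_strand(sequence)[::-1]


if __name__ == "__main__":
    strand = "GATTACA"
    print("Strand:            ", strand)
    print("Complement:        ", complementary_strand(strand))
    print("Reverse complement:", reverse_complement(strand))
```

Because either strand can be rebuilt from the other, reporting the bases along one strand is enough to describe the whole double helix at that location.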

Illustration Six: Relationship between Nucleotides, Base Pairs, Chromosomes, Genes, and DNA

Approximately 2% of our genome encodes proteins; this is where genes are located (illustration seven). Estimates of coding “gene” DNA range from about one to three percent of the human genome, which means noncoding DNA comprises approximately 97-99% of our total genetic material. [21]

Genes are the basic unit of inherited DNA and carry information for making proteins, which perform important functions in your body. The coded regions of the genome produce proteins with structural, functional, and regulatory roles in cells and, to a larger extent, the human body. The remainder of our genome is made of noncoding DNA, sometimes called “junk DNA”, which is a misnomer. It is estimated that between 25% and 80% of non-coding DNA regulates gene expression (e.g. when, where, and for how long a gene is turned on to make a protein). [22] The non-coding DNA that does not regulate gene activity consists of deactivated genes that were once useful to our non-human ancestors (such as genes for a tail), parasitic DNA from viruses that entered our genome and replicated themselves hundreds or thousands of times over the generations, or sequences that generally serve no known purpose in the host organism.

Illustration Seven: Coding and Non-Coding Regions of the Genome

Source: Modified version of graphic found at – Non-Coding DNA, AncestryDNA Learning Hub, https://www.ancestry.com/c/dna-learning-hub/junk-dna

Out of 3.2 billion DNA letters or nucleotides, only a small fraction of positions on the DNA ribbon differ between individuals. The exact figure is subject to some debate and depends on how it is measured. The commonly cited figure is that humans are 99.9% genetically identical. More recent research suggests a slightly lower, but still very high, level of similarity: roughly 99.4% to 99.9%. The small differences of 0.1% to 0.6% between individuals are crucial for understanding human diversity and health. [23]
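A small percentage of a very large number is still a large number. The sketch below simply multiplies the figures quoted above; the constant and function names are hypothetical, and the results are rough orders of magnitude rather than measured counts.

```python
GENOME_LENGTH = 3_200_000_000  # approximate number of DNA letters cited above


def differing_positions(percent_different: float) -> int:
    """Approximate number of positions that differ between two people."""
    return round(GENOME_LENGTH * percent_different / 100)


if __name__ == "__main__":
    for percent in (0.1, 0.4, 0.6):
        count = differing_positions(percent)
        print(f"{percent}% of the genome is about {count:,} positions")
```

Even at the commonly cited 0.1% difference, two people would differ at a few million positions, which is what gives DNA tests something to compare.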

As indicated in illustration eight, there are multiple types of genomic variants, which together account for about 0.4 percent of the genome. The smallest genomic variants are known as single-nucleotide variants (SNVs). Each SNV reflects a difference in a single nucleotide (or letter) in the DNA chain. For a given SNV, the DNA letter at that genomic position might be a C in one person but a T in another person, as reflected in illustration nine. [24]

Illustration Eight: Potential Sources of Genetic Variants for atDNA Testing

Source: Modification of a chart found at – Chart Human Genomic Variation, Fact Sheet, National Human Genome Research Institute, 1 Feb 2023, https://www.genome.gov/about-genomics/educational-resources/fact-sheets/human-genomic-variation

Single-nucleotide variants (SNVs) are differences of one nucleotide at a specific location in the genome. An individual may have different nucleotides at a specific location on each chromosome (getting a different one from each parent), such as with Person 1 in illustration nine. An individual may also have the same nucleotide at such a location on both chromosomes, such as with Person 2 and Person 3 in the illustration.

Illustration Nine: An Example of a single-nucleotide variant (SNV)

Source: Human Genomic Variation, Fact Sheet, National Human Genome Research Institute, 1 Feb 2023, https://www.genome.gov/about-genomics/educational-resources/fact-sheets/human-genomic-variation

As reflected in illustration ten below, there is also a smaller group of genetic variants called insertions and deletions of nucleotides.

“Insertion/deletion variants reflect extra or missing DNA nucleotides in the genome, respectively, and typically involve fewer than 50 nucleotides. Insertion/deletion variants are less frequent than SNVs but can sometimes have a larger impact on health and disease (e.g., by disrupting the function of a gene that encodes an important protein).” [25]

Tandem repeats are among the most common types of insertion/deletion variants. [26] They are short stretches of nucleotides that are repeated multiple times and are highly variable among people. Different chromosomes can vary in the number of times such short nucleotide stretches are repeated, ranging from a few times to hundreds of times.

Each person has a collection of different genomic variants. For example, in illustration ten below, Person 1 has an insertion variant; Person 2 has a SNV and deletion variant; and Person 3 has an insertion, SNV, and deletion variant. All three people have different tandem repeats. Different variants can be inherited from different parents as reflected in the illustration.

Illustration Ten: Examples of Other Types of Genetic Variants

Source: Human Genomic Variation, Fact Sheet, National Human Genome Research Institute, 1 Feb 2023, https://www.genome.gov/about-genomics/educational-resources/fact-sheets/human-genomic-variation

As indicated in illustration eight above, the third general type of genomic variation is the structural variant (SV). Structural variants extend beyond small stretches of nucleotides to larger chromosomal regions. These large-scale genomic differences involve at least 50 nucleotides and as many as thousands of nucleotides that have been inserted, deleted, inverted or moved from one part of the genome to another. [27]

Tandem repeats that contain more than 50 nucleotides are considered structural variants. In fact, such large tandem repeats account for nearly half of the structural variants present in human genomes. When a structural variant reflects differences in the total number of nucleotides involved, it is called a copy number variant (CNV). CNVs are distinguished from other structural variants, such as inversions and translocations, because the latter types often do not involve a difference in the total number of nucleotides. [28]
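The size boundaries between the three variant types can be summarized in a short sketch. It assumes a variant is described by a reference sequence and an alternate sequence, which is a simplification borrowed for illustration; the function name is hypothetical, and inversions or translocations (which may not change length at all) are outside what this toy comparison can detect.

```python
STRUCTURAL_VARIANT_MIN = 50  # nucleotides, per the definitions quoted above


def classify_variant(reference: str, alternate: str) -> str:
    """Label a variant by comparing the reference and alternate sequences.

    Only length differences and single-letter substitutions are recognized.
    """
    if reference == alternate:
        return "no variant"
    if len(reference) == 1 and len(alternate) == 1:
        return "SNV"
    size = abs(len(alternate) - len(reference))
    if size >= STRUCTURAL_VARIANT_MIN:
        return "structural variant"
    if len(alternate) > len(reference):
        return "insertion"
    if len(alternate) < len(reference):
        return "deletion"
    return "other"  # same length but several letters changed


if __name__ == "__main__":
    examples = [
        ("C", "T"),            # one letter swapped
        ("A", "ACGT"),         # three letters added
        ("ACGT", "A"),         # three letters removed
        ("A", "A" + "G" * 60), # sixty letters added
    ]
    for ref, alt in examples:
        print(f"{ref[:10]} -> {alt[:10]}: {classify_variant(ref, alt)}")
```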

Cornerstone of atDNA Testing: Single Nucleotide Polymorphisms (SNPs)

A subtype of SNVs is the single-nucleotide polymorphism (SNP), pronounced as “snip” for short. To be considered a SNP, a SNV must be present in at least 1% of the human population. As such, a SNP is more common than the rare single-nucleotide differences.  [29]
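The 1% rule can be expressed as a simple frequency check. The sketch below uses made-up genotypes and hypothetical function names; it counts how often a letter appears among all chromosome copies in a sample and applies the threshold stated above.

```python
SNP_THRESHOLD = 0.01  # a SNV present in at least 1% of the population


def allele_frequency(genotypes: list[str], allele: str) -> float:
    """Fraction of all chromosome copies in the sample that carry the allele.

    Each genotype is a two-letter string such as "CT", one letter per
    chromosome copy.
    """
    total_copies = sum(len(genotype) for genotype in genotypes)
    if total_copies == 0:
        raise ValueError("no genotypes supplied")
    carrying = sum(genotype.count(allele) for genotype in genotypes)
    return carrying / total_copies


def is_snp(frequency: float) -> bool:
    """Apply the 1% rule to an allele frequency."""
    return frequency >= SNP_THRESHOLD


if __name__ == "__main__":
    common = ["CC", "CT", "CC", "CC"]   # T on 1 of 8 copies
    rare = ["CC"] * 199 + ["CT"]        # T on 1 of 400 copies
    for label, sample in (("common", common), ("rare", rare)):
        freq = allele_frequency(sample, "T")
        print(f"{label}: T frequency {freq:.4f}, counts as SNP: {is_snp(freq)}")
```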

Among the genetic variants, SNPs are relatively common, occurring approximately once every 500-1000 base pairs in the human genome. This translates to about 4 to 5 million SNPs in an individual’s genome. Scientists have found more than 600 million SNPs in populations around the world. The combination of technical feasibility, scientific reliability, and analytical power makes SNPs the optimal choice for autosomal DNA testing in genealogical and ancestry applications. [30]

Ancestry informative markers refers to locations in the genome that have varied sequences at that location and the relative abundance of those markers differs based on the continent from which individuals can trace their ancestry. So by using a series of these ancestry informative markers, sometimes 20 or 30 or more, and genotyping an individual you can determine from the frequency of those markers where their great, great, great, great ancestors may have come from. [31]

SNPs represent natural variations that make individuals unique while being common enough to be reliable DNA test markers. Their high frequency makes them ideal markers for genetic analysis. The vast majority of SNPs have no effect on health or development. SNPs are generally found in the DNA between genes rather than within genes themselves. [32]

While other genetic markers exist, SNPs are the preferred ancestry informative markers. They are used for genetic testing because of their reliability and accuracy: SNPs are stable genetic markers that are passed down through generations, and they offer more detailed information about both recent and ancient ancestry. They also allow for fairly precise ethnic profiling and ancestral location inference. [33]

How atDNA Tests Figure Out Genetic Relationships

In a “Nutshell”: How do DNA companies Figure Out Genetic Relationships

Analyzing SNPs: DNA companies analyze hundreds of thousands of single nucleotide polymorphisms (SNPs) across the 22 autosomal chromosomes. [34]

The results from different atDNA test companies can vary. The variance is based on a number of factors. Most major DNA testing companies use equipment that analyzes DNA specimens with what are called ‘chips’, which use DNA microarray technology supplied by a company named Illumina. However, different companies use different versions of the Illumina chip and each version tests different sets of SNP (Single Nucleotide Polymorphism) locations.

Illustration Eleven: How DNA Microarray Technology Analyzes Autosomal DNA

Source: Bergström, Ann-Louise and Lasse Folkersen , DNA microarray, 15 May 2020, Moving Science, https://movingscience.dk/dna-microarray/

Companies can specify their own “other” locations to be included on their chip. The number of markers tested varies significantly by company. FamilyTreeDNA uses a customized Illumina chip. 23andMe and AncestryDNA use a customized Illumina Global Screening Array (GSA) chip. Living DNA uses an Affymetrix Axiom microarray (Sirius) chip. MyHeritage uses an Illumina GSA chip. [35]

Illustration of Illumina Microarray Chips

Source: Web Graphic Array with GE Inserts, Illumina, Powerfully Informative Microarrays, Illumina, https://www.illumina.com/techniques/microarrays.html

“Each DNA testing company purchases DNA processing equipment. Illumina is the big dog in this arena. Illumina defines the capacity and structure of each chip. In part, how the testing companies use that capacity, or space on each chip, is up to each company. This means that the different testing companies test many of the same autosomal DNA SNP locations, but not all of the same locations. … This means that each testing company includes and reports many of the same, but also some different SNP locations when they scan your DNA. …  In addition to dealing with different file formats and contents from multiple DNA vendors, companies change their own chips and file structure from time to time. In some cases, it’s a forced change by the chip manufacturer. Other times, the vendors want to include different locations or make improvements.” [36]

When DNA companies change DNA chips, a different version of the company’s own file may contain different positions. DNA testing companies have to “fill in the blanks” for compatibility, and they do this using a technique called imputation. Illumina forced their customers to adopt imputation in 2017 when they dropped the capacity of their chip. [37]

Identify Matching Segments: Each DNA company’s matching software compares the SNP data between two individuals to identify segments of DNA that appear to be identical or similar. These matching DNA segments indicate the likelihood of DNA inherited from a common ancestor. [38]
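The idea of scanning for matching stretches can be sketched with a toy example. Everything here is an assumption made for illustration: the function names, the tiny genotype lists, and the rule that two people "match" at a SNP when they have at least one letter in common (a half-identical comparison). Real matching software works on hundreds of thousands of SNPs and adds phasing, error tolerance, and centimorgan thresholds that this sketch leaves out.

```python
def shares_allele(genotype_a: str, genotype_b: str) -> bool:
    """True when two genotypes (e.g. "AG" and "GG") have a letter in common."""
    return bool(set(genotype_a) & set(genotype_b))


def half_identical_runs(person_a, person_b, min_snps=3):
    """Find runs of consecutive SNPs where both people share a letter.

    person_a and person_b are lists of two-letter genotypes at the same
    ordered SNP positions. Returns (start, end) index pairs, inclusive,
    for every run at least min_snps long.
    """
    runs = []
    start = None
    compared = min(len(person_a), len(person_b))
    for i in range(compared):
        if shares_allele(person_a[i], person_b[i]):
            if start is None:
                start = i  # a new run begins here
        else:
            if start is not None and i - start >= min_snps:
                runs.append((start, i - 1))
            start = None  # the run is broken by a mismatch
    # Close a run that reaches the last compared SNP.
    if start is not None and compared - start >= min_snps:
        runs.append((start, compared - 1))
    return runs


if __name__ == "__main__":
    tester = ["AA", "AG", "CC", "TT", "GG", "CT"]
    match = ["AG", "GG", "CT", "CC", "GG", "TT"]
    print(half_identical_runs(tester, match, min_snps=3))
    print(half_identical_runs(tester, match, min_snps=2))
```

Raising the minimum run length discards short runs, which are more likely to match by chance than by descent from a common ancestor.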

The ability to identify DNA matches between individuals is largely influenced by the size of a company’s database of test results and by which SNPs its atDNA test samples. As indicated, the main differences between atDNA tests from various companies (e.g. 23andMe, AncestryDNA, FamilyTreeDNA, Living DNA, MyHeritage) are the SNPs that are tested and the relative size of their respective databases.

Each company maintains its own proprietary reference databases and matching algorithms. As indicated in table three below, AncestryDNA has a larger customer database (about 25 million) than 23andMe (about 14 million). This gives AncestryDNA an advantage for finding genetic relatives.

Table Three: Database Size and Number of SNPs Tested by DNA Company in 2024

DNA Company | Database Size of atDNA Test Results | No. of Autosomal SNPs Tested
23andMe | 14 million | 630,132
FamilyTreeDNA | 1.7 million | 612,272
AncestryDNA | 25 million | 637,639
MyHeritage | 8.5 million | 576,157
Living DNA | 300,000 | 683,503
Source: Autosomal DNA testing comparison chart, International Society of Genetic Genealogy Wiki, This page was last edited on 8 October 2024, https://isogg.org/wiki/Autosomal_DNA_testing_comparison_chart

Measuring Segment Length: The length of matching segments of SNPs is measured in centimorgans (cM). Centimorgans measure the likelihood of genetic recombination between two markers on a chromosome. One centimorgan represents a one percent chance that two genetic markers will be separated by a recombination event in a single generation. This measurement helps geneticists and genealogists estimate how closely two individuals are genetically related. [39]

Centimorgans (cM) are a crucial unit of measurement in atDNA testing. They are used to quantify genetic distance and determine relationships between individuals based on shared DNA. The more centimorgans two people share, the more likely they are related. In addition to the number of cMs shared, longer segments generally indicate a closer relationship.

One cM corresponds on average to about 1 million base pairs in humans. The total human genome, counting both copies of each chromosome, is approximately 7400 cM long. A parent and child typically share about 3400-3700 cM. More distant relatives share fewer cMs. However, there can be overlap in cM ranges for different relationship types, so additional genealogical research is often needed to determine exact relationships.
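The figures above allow a rough conversion between shared centimorgans and shared percentage. This is a back-of-the-envelope sketch that takes the approximate 7400 cM total at face value; the constant and function names are hypothetical, and each testing company uses its own total, so real reports will differ somewhat.

```python
TOTAL_GENOME_CM = 7400.0  # approximate total cited above; varies by company


def percent_shared(shared_cm: float) -> float:
    """Convert shared centimorgans into an approximate percentage of DNA shared."""
    if shared_cm < 0:
        raise ValueError("shared_cm cannot be negative")
    return shared_cm / TOTAL_GENOME_CM * 100


if __name__ == "__main__":
    for cm in (3700, 1850, 925, 58):
        print(f"{cm} cM is roughly {percent_shared(cm):.2f}% shared")
```

With these assumptions, about 3700 cM works out to 50%, which lines up with the parent-child figure in table two.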

“(A centiMorgan) is less of a physical distance and more of a measurement of probability. It refers to the DNA segments that you have in common with others and the likelihood of sharing genetic traits. The ends of shared segments are defined by points where DNA swapped between two chromosomes, and the centimorgan is a measure of the probability of getting a segment that large when these swaps occur.” [40]

Chart One: Ranges of Shared centiMorgans with Family

Source: Bettinger, Blaine, Version 4.0! March 2020 Update to the Shared cM Project!, 27 Mar 2020, The Genetic Genealogist, https://thegeneticgenealogist.com/2020/03/27/version-4-0-march-2020-update-to-the-shared-cm-project/

When you take an atDNA test, the testing company compares your DNA to others in their database. The amount of DNA you share with a match is reported in centimorgans. Generally, the more centimorgans you share with someone, the more closely you are related to this other person. Shared centimorgan ranges can often indicate how many generations separate two people. Certain shared cM values can also suggest possible half-sibling or half-first cousin relationships as opposed to full relatives.

Calculating Total Shared DNA: The total amount of shared DNA is calculated by summing up the lengths of all matching segments, typically expressed in cMs or as a percentage of the total amount of shared SNPs sampled. [41]

Applying Thresholds: Each company sets minimum thresholds for segment length and total shared DNA to be considered a match. For example, as indicated in table four, FamilyTreeDNA requires all matching segments to be at least 6 cM in length.

Table Four: Different cM Thresholds for atDNA Matches Across DNA Companies

DNA Company | Criteria for matching segments
23andMe | 9 cMs and at least 700 SNPs for one half-identical region; 5 cMs and 700 SNPs with at least two half-identical regions being shared
FamilyTreeDNA | All matching segments must be at least 6 cMs in length. Almost all matching segments contain at least 800 SNPs, and all matching segments contain at least 600 SNPs.
AncestryDNA | 6 cMs per segment before the Timber algorithm is applied and a total of at least 8 cMs after Timber is applied.
MyHeritage | 8 cM for the first matching segment and at least 6 cMs for the 2nd matching segment; 12 cM for the first matching segment in people whose ancestry is at least 50% Ashkenazi Jewish
Living DNA | 9.46 cMs for the first segment
Source: Autosomal DNA testing comparison chart, International Society of Genetic Genealogy Wiki, This page was last edited on 8 October 2024, https://isogg.org/wiki/Autosomal_DNA_testing_comparison_chart
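The summing and threshold steps described above can be sketched together. This is a deliberately simplified illustration: the function names are hypothetical, only a minimum segment length and an optional minimum total are modeled, and the SNP-count rules and company-specific algorithms (such as Timber) listed in table four are left out.

```python
def total_shared_cm(segments_cm, min_segment_cm):
    """Sum the lengths of segments that meet the minimum length threshold."""
    return sum(length for length in segments_cm if length >= min_segment_cm)


def is_match(segments_cm, min_segment_cm, min_total_cm=0.0):
    """Decide whether the qualifying segments add up to a reportable match."""
    total = total_shared_cm(segments_cm, min_segment_cm)
    return total > 0 and total >= min_total_cm


if __name__ == "__main__":
    # Hypothetical segment lengths (in cM) shared by two testers.
    segments = [22.4, 9.1, 5.2, 3.0]

    # A 6 cM minimum per segment, as in the FamilyTreeDNA row of table four.
    kept_total = total_shared_cm(segments, min_segment_cm=6.0)
    print(f"Total after 6 cM threshold: {kept_total:.1f} cM")
    print("Counts as a match:", is_match(segments, min_segment_cm=6.0))

    # A stricter 9 cM minimum drops one more segment.
    print(f"Total after 9 cM threshold: {total_shared_cm(segments, 9.0):.1f} cM")
```

Because each company draws the line in a different place, the same two people can show different totals, or not appear as a match at all, depending on where they tested.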

Relationship Prediction: The amount of shared DNA is compared to expected ranges for different relationships to predict how two people may be related. Close relationships like parent/child or full siblings have very distinct amounts of shared DNA, while more distant relationships have overlapping ranges. [42]

Special Considerations: Some of the DNA companies use phasing algorithms to improve accuracy, especially for analyzing smaller shared segments. Some also apply special algorithms for populations with higher rates of endogamy, like Ashkenazi Jews. [43]

Moving Onward

I imagine all of this makes total sense. I, however, believe all of this is totally confusing. To walk away with some semblance of understanding, I would focus on the following observations:

  • DNA tests can only provide so much information. Traditional genealogical research brings atDNA results into focus. Genetic and traditional research strategies can work hand in hand.
  • atDNA tests have the ability to trace living genetic relatives on both sides of your family tree. However, their effectiveness is limited in terms of how many generations back they can effectively provide results.
  • While autosomal DNA testing has become increasingly accurate, there are still limitations in the context of estimating genetic relations and finding relatives.
  • When looking at atDNA matches, centimorgans (cM) are the key unit of measurement. They are used to determine relationships between individuals based on shared DNA. The more centimorgans two people share, the more likely they are related. In addition to the number of cMs shared, longer segments generally indicate a closer relationship.

Sources

Feature image: The image depicts a branch from a massive family tree that shows 6,000 relatives spanning seven generations.  It is part of a study that links 13 million people related by genetics or marriage.  Source: Jocelyn Kaiser, Thirteen million degrees of Kevin Bacon: World’s largest family tree shines light on life span, who marries whom, Science, 1 Mar 2018, https://www.science.org/content/article/thirteen-million-degrees-kevin-bacon-world-s-largest-family-tree-shines-light-life-span .

See the original study behind this effort at: Kaplanis J, Gordon A, Shor T, Weissbrod O, Geiger D, Wahl M, Gershovits M, Markus B, Sheikh M, Gymrek M, Bhatia G, MacArthur DG, Price AL, Erlich Y. Quantitative analysis of population-scale family trees with millions of relatives. Science. 2018 Apr 13;360(6385):171-175. doi: 10.1126/science.aam9309. Epub 2018 Mar 1. PMID: 29496957; PMCID: PMC6593158. https://pmc.ncbi.nlm.nih.gov/articles/PMC6593158/

[1] See the following stories:

[2] Bettinger, Blaine, Everyone Has Two Family Trees – A Genealogical Tree and a Genetic Tree, 10 Nov 2009, The Genetic Genealogist, https://thegeneticgenealogist.com/2009/11/10/qa-everyone-has-two-family-trees-a-genealogical-tree-and-a-genetic-tree/

Understanding genetic ancestry testing, International Society of Genetic Genealogy Wiki, This page was last edited on 25 August 2015, https://isogg.org/wiki/Understanding_genetic_ancestry_testing

[3] Human Y-chromosome DNA haplogroup, Wikipedia, This page was last edited on 5 October 2024, https://en.wikipedia.org/wiki/Human_Y-chromosome_DNA_haplogroup

Human mitochondrial DNA haplogroup, Wikipedia, This page was last edited on 5 October 2024, https://en.wikipedia.org/wiki/Human_mitochondrial_DNA_haplogroup

Rowe, Katy, Genealogy’s Secret Weapon: How Using mtDNA Can Solve Family Mysteries, 10 May 2023, FamilyTreeDNA Blog, https://blog.familytreedna.com/mtdna/

MtDNA testing comparison chart, International Society of Genetic Genealogy Wiki, This page was last edited on 3 September 2023, https://isogg.org/wiki/MtDNA_testing_comparison_chart

Y chromosome DNA tests, International Society of Genetic Genealogy Wiki, This page was last edited on 6 September 2024, https://isogg.org/wiki/Y_chromosome_DNA_tests

Y-DNA STR testing comparison chart, International Society of Genetic Genealogy Wiki, This page was last edited on 11 July 2022, https://isogg.org/wiki/Y-DNA_STR_testing_comparison_chart

Balding, David, Debbie Kennett and Mark Thomas, Understanding genetic ancestry testing, This page was last edited on 25 August 2015, International Society of Genetic Genealogy Wiki, https://isogg.org/wiki/Understanding_genetic_ancestry_testing

Rowe-Schurwanz, Kathy, Using mtDNA for Genealogical Research, Aug 14, 2024, FamilyTreeDNA Blog, https://blog.familytreedna.com/using-mtdna-genealogical-research/

Rowe-Schurwanz, Kathy, How Autosomal DNA Testing Works, June 10, 2024, FamilyTreeDNA Blog, https://blog.familytreedna.com/how-autosomal-dna-testing-works/

Unveiling the Power of Big Y-700: Unraveling the Journey and Advantages, Oct 21, 2022, FamilyTreeDNA Blog, https://blog.familytreedna.com/big-y-700/

Mitochondrial Eve, Wikipedia, This page was last edited on 18 September 2024, https://en.wikipedia.org/wiki/Mitochondrial_Eve

Y-chromosomal Adam, Wikipedia, This page was last edited on 19 September 2024, https://en.wikipedia.org/wiki/Y-chromosomal_Adam

[4] Newton, Maud, America’s Ancestry Craze: Making sense of our family-tree obsession, June 2014, Harper’s Magazine, https://harpers.org/archive/2014/06/americas-ancestry-craze/

[5] Jorde LB, Bamshad MJ. Genetic Ancestry Testing: What Is It and Why Is It Important? JAMA. 2020 Mar 17;323(11):1089-1090. doi:10.1001/jama.2020.0517 PMID: 32058561; PMCID: PMC8202415 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8202415/

[6] Antonio Regalado, More than 26 million people have taken an at-home ancestry test, MIT Technology Review, 11 Feb 2019, https://www.technologyreview.com/2019/02/11/103446/more-than-26-million-people-have-taken-an-at-home-ancestry-test/

Covering Your Bases: Introduction to Autosomal DNA Coverage, Legacy Tree Genealogists, https://www.legacytree.com/blog/introduction-autosomal-dna-coverage

DNA Geek, Family DNA Tests for Ancestry & Genealogy, Navigating the World of DNA,

[7] Has the consumer DNA test boom gone bust?, Feb 20, 2020, updated Jul 28, 2024, Advisory Board, https://www.advisory.com/daily-briefing/2020/02/20/dna-tests 

[8] Ibid

[9] Krimsky Sheldon, The Business of DNA Ancestry, in: Understanding DNA Ancestry. Understanding Life. Cambridge University Press; 2021, Pages 8-16.

Molla, Rami, Why DNA tests are suddenly unpopular, 13 Feb 2020, Vox, https://www.vox.com/recode/2020/2/13/21129177/consumer-dna-tests-23andme-ancestry-sales-decline#

Spiers, Caroline, Keeping It in the Family: Direct-to-Consumer Genetic Testing and the Fourth Amendment, Houston Law Review, Vol 59, Issue 5, May 23 2020, https://houstonlawreview.org/article/36547-keeping-it-in-the-family-direct-to-consumer-genetic-testing-and-the-fourth-amendment

Has the consumer DNA test boom gone bust?, Updated 28 Jul 2023, Advisory Board, https://www.advisory.com/daily-briefing/2020/02/20/dna-tests

Linder, Emmett, As 23andMe Struggles, Concerns Surface About Its Genetic Data, 5 Oct 2024, New York Times, https://www.nytimes.com/2024/10/05/business/23andme-dna-bankrupt.html

Estes, Roberta, DNA Testing Sales Decline: Reason and Reasons, 11 Feb 2020, DNAeXplained – Genetic Genealogy Blog, https://dna-explained.com/2020/02/11/dna-testing-sales-decline-reason-and-reasons/

[10] Fish, Eric, The Sordid Saga of 23andMe, 21 Oct 2024, All Science Great & Small, https://allscience.substack.com/p/the-sordid-saga-of-23andme

Prictor, Megan, Millions of People’s DNA in Doubt as 23andMe Faces Bankruptcy, 21 Oct 2024, Science Alert, https://www.sciencealert.com/millions-of-peoples-dna-in-doubt-as-23andme-faces-bankruptcy

Linder, Emmett, As 23andMe Struggles, Concerns Surface About Its Genetic Data, 5 Oct 2024, New York Times, https://www.nytimes.com/2024/10/05/business/23andme-dna-bankrupt.html

Allyn, Bobby, 23andMe is on the brink. What happens to all its DNA data?, NPR, https://www.npr.org/2024/10/03/g-s1-25795/23andme-data-genetic-dna-privacy

23andMe Facing Bankruptcy, FoxLocal 26, https://youtu.be/ZfBOCxbWAeY

[10a] Estes, Roberta, 23andMe Trouble – Step-by-Step Instructions to Preserve Your Data and Matches, 19 Sep 2024, DNAeXplained – Genetic Genealogy, https://dna-explained.com/2024/09/19/23andme-trouble-step-by-step-instructions-to-preserve-your-data-and-matches/

[11] A karyotype is a visual representation of an individual’s complete set of chromosomes, displaying their number, size, and structure, typically arranged in pairs and ordered by size.

“A karyotype is the general appearance of the complete set of chromosomes in the cells of a species or in an individual organism, mainly including their sizes, numbers, and shapes. … A karyogram or idiogram is a graphical depiction of a karyotype, wherein chromosomes are generally organized in pairs, ordered by size and position of centromere for chromosomes of the same size.”

Karyotype, Wikipedia, This page was last edited on 17 October 2024, https://en.wikipedia.org/wiki/Karyotype

Dutra, Amalia, Karyotype, National Human Genome Research Institute, https://www.genome.gov/genetics-glossary/Karyotype

Karyotype, ScienceDirect, definition and discussion is from Antonie D. Kline and Ethylin Wang Jabs, eds., Genomics in the Clinic, 2024, Shen Gu, Bo Yuan, Ethylin Wang Jabs, Christine M. Eng, Chapter 2 – Basic Principles of Genetics and Genomics, Pages 5-28, https://www.sciencedirect.com/topics/biochemistry-genetics-and-molecular-biology/karyotype

Shen Gu, Bo Yuan, Ethylin Wang Jabs, Christine M. Eng, Chapter 2 – Basic Principles of Genetics and Genomics, Editor(s): Antonie D. Kline, Ethylin Wang Jabs, Genomics in the Clinic, Academic Press, 2024, Pages 5-28

[12] Autosomes are the non-sex chromosomes found in the cells of organisms. Autosomes are any chromosomes that are not sex chromosomes (allosomes). In humans, there are 22 pairs of autosomes, numbered from 1 to 22. They come in identical pairs in both males and females. They are numbered based on size, shape, and other properties. They contain genes that control the inheritance of all traits except sex-linked ones.

[13] Recombination is a process by which pieces of DNA are broken and recombined to produce new combinations of nucleotides or alleles. Recombination primarily happens between homologous chromosomes, which are paired chromosomes with similar genetic information, allowing for the exchange of corresponding DNA segments.

During meiosis, when homologous chromosomes pair up, a process called “crossing over” occurs where DNA strands break and rejoin, swapping genetic material between the chromosomes. This recombination process creates genetic diversity at the level of genes that reflects differences in the DNA sequences of different organisms. 

Recombination, Scitable by Nature Education, Nature, 2014, https://www.nature.com/scitable/definition/recombination-226/

Genetic recombination, Wikipedia, This page was last edited on 5 October 2024, https://en.wikipedia.org/wiki/Genetic_recombination

Alberts B, Johnson A, Lewis J, et al., General Recombination, in Molecular Biology of the Cell, 4th edition, New York: Garland Science; 2002. https://www.ncbi.nlm.nih.gov/books/NBK26898/

[14] Autosomal DNA Statistics, International Society of Genetic Genealogy Wiki, Page was last edited 4 August 2022, Page accessed 14 Aug 2022, https://isogg.org/wiki/Autosomal_DNA_statistics

Nicole Dyer, Charts for Understanding DNA Inheritance, 14 Aug 2019, Family Locket, Page accessed 10 Oct 2021, https://familylocket.com/charts-for-understanding-dna-inheritance/

[15] Meiosis is a type of cell division that reduces the number of chromosomes in the parent cell by half and produces four gamete cells. This process is required to produce egg and sperm cells for sexual reproduction.

Meiosis, 2014, Scitable by Nature Education, Nature, https://www.nature.com/scitable/definition/meiosis-88/

Gilchrist, Daniel, Meiosis, National Human Genome Research Institute, https://www.genome.gov/genetics-glossary/Meiosis

Meiosis, Wikipedia, This page was last edited on 22 August 2024, https://en.wikipedia.org/wiki/Meiosis

[16] What are reduced penetrance and variable expressivity?, MedlinePlus, https://medlineplus.gov/genetics/understanding/inheritance/penetranceexpressivity/

Miko, Ilona, Phenotype variability: penetrance and expressivity, Nature Education 1(1):137, 2008, https://www.nature.com/scitable/topicpage/phenotype-variability-penetrance-and-expressivity-573/

Expressivity (genetics), Wikipedia, This page was last edited on 9 October 2024, https://en.wikipedia.org/wiki/Expressivity_(genetics)

[17] Average Percent DNA Shared Between Relatives, 23andMe Customer Care, Tools, 23andMe, https://customercare.23andme.com/hc/en-us/articles/212170668-Average-Percent-DNA-Shared-Between-Relatives

Autosomal DNA Statistics, International Society of Genetic Genealogy Wiki, This page was last edited on 17 October 2022, https://isogg.org/wiki/Autosomal_DNA_statistics

[18] The genome is the entire set of DNA instructions found in a cell. In humans, the genome consists of 23 pairs of chromosomes located in the cell’s nucleus, as well as a small chromosome in the cell’s mitochondria. A genome contains all the information needed for an individual to develop and function.

Human Genomic Variation, Fact Sheet, National Human Genome Research Institute, 1 Feb 2023, https://www.genome.gov/about-genomics/educational-resources/fact-sheets/human-genomic-variation

[19] Fundamental Concepts of Genetics and about the Human Genome, Eupedia, page accessed 3 Feb 2021, https://www.eupedia.com/genetics/human_genome_and_genetics.shtml

Sheldon Krimsky, Understanding DNA Ancestry, Cambridge: Cambridge University Press, 2022, Page 18

Human Genomic Variation, Fact Sheet, National Human Genome Research Institute, 1 Feb 2023, https://www.genome.gov/about-genomics/educational-resources/fact-sheets/human-genomic-variation

[20] Nucleotide, National Cancer Institute, https://www.cancer.gov/publications/dictionaries/genetics-dictionary/def/nucleotide

Nucleotide, Wikipedia, This page was last edited on 3 September 2024, https://en.wikipedia.org/wiki/Nucleotide

Brody, Lawrence, Nucleotide, National Human Genome Research Institute, 1 Nov 2024, https://www.genome.gov/genetics-glossary/Nucleotide 

[21] Non-Coding DNA, AncestryDNA Learning Hub, 16 Aug 2016, https://www.ancestry.com/c/dna-learning-hub/non-coding-dna

What is Noncoding DNA?, MedlinePlus, https://medlineplus.gov/genetics/understanding/basics/noncodingdna/

[22] Non-Coding DNA, AncestryDNA Learning Hub, https://www.ancestry.com/c/dna-learning-hub/junk-dna

Ohno, Susumu. “So Much ‘Junk’ DNA in Our Genome.” Brookhaven Symposium on Biology, Volume 23, 1972: 366-370.

Zhang F, Lupski JR. Non-coding genetic variants in human disease. Hum Mol Genet. 2015 Oct 15;24(R1):R102-10. doi: 10.1093/hmg/ddv259. Epub 2015 Jul 7. PMID: 26152199; PMCID: PMC4572001 https://pmc.ncbi.nlm.nih.gov/articles/PMC4572001/

Peña-Martínez EG, Rodríguez-Martínez JA. Decoding Non-coding Variants: Recent Approaches to Studying Their Role in Gene Regulation and Human Diseases. Front Biosci (Schol Ed). 2024 Mar 1;16(1):4. doi: 10.31083/j.fbs1601004. PMID: 38538340; PMCID: PMC11044903 https://pmc.ncbi.nlm.nih.gov/articles/PMC11044903/

Malte Spielmann, Stefan Mundlos, Looking beyond the genes: the role of non-coding variants in human disease, Human Molecular Genetics, Volume 25, Issue R2, 1 October 2016, Pages R157–R165, https://doi.org/10.1093/hmg/ddw205

Vitsios, D., Dhindsa, R.S., Middleton, L. et al. Prioritizing non-coding regions based on human genomic constraint and sequence context with deep learning. Nat Commun 12, 1504 (2021). https://doi.org/10.1038/s41467-021-21790-4

Ellingford, J.M., Ahn, J.W., Bagnall, R.D. et al. Recommendations for clinical interpretation of variants found in non-coding regions of the genome. Genome Med 14, 73 (2022). https://doi.org/10.1186/s13073-022-01073-3

[23] The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015). https://doi.org/10.1038/nature15393, https://www.nature.com/articles/nature15393#citeas

Human Genomic Variation, National Human Genome Research Institute, https://www.genome.gov/about-genomics/educational-resources/fact-sheets/human-genomic-variation

For the 99.9 percent figure, see for example: Krimsky, Sheldon, Understanding DNA Ancestry, Cambridge, Cambridge University Press, 2022, Page 18

[22] Zou H, Wu LX, Tan L, Shang FF, Zhou HH. Significance of Single-Nucleotide Variants in Long Intergenic Non-protein Coding RNAs. Front Cell Dev Biol. 2020 May 25;8:347. doi: 10.3389/fcell.2020.00347. PMID: 32523949; PMCID: PMC7261909

The Order of Nucleotides in a Gene Is Revealed by DNA Sequencing, Scitable, Nature Education, https://www.nature.com/scitable/topicpage/the-order-of-nucleotides-in-a-gene-6525806/

single nucleotide variant, National Cancer Institute, https://www.cancer.gov/publications/dictionaries/genetics-dictionary/def/single-nucleotide-variant

Wright, A.F. (2005). Genetic Variation: Polymorphisms and Mutations. In eLS, (Ed.). https://doi.org/10.1038/npg.els.0005005

Single-nucleotide polymorphism, Wikipedia, This page was last edited on 29 September 2024, https://en.wikipedia.org/wiki/Single-nucleotide_polymorphism

SNVs vs. SNPs, CD Genomics, https://www.cd-genomics.com/resource-snvs-vs-snps.html

[23] Human Genomic Variation, Fact Sheet, National Human Genome Research Institute, 1 Feb 2023, https://www.genome.gov/about-genomics/educational-resources/fact-sheets/human-genomic-variation

[24] Ichikawa, K., Kawahara, R., Asano, T. et al. A landscape of complex tandem repeats within individual human genomes. Nat Commun 14, 5530 (2023). https://doi.org/10.1038/s41467-023-41262-1 

Tandem Repeat, Wikipedia, This page was last edited on 12 July 2024, https://en.wikipedia.org/wiki/Tandem_repeat

Myers, P., Tandem repeats and morphological variation. Nature Education 1(1):1, 2007,  http://scienceblogs.com/pharyngula/2007/10/tandem_repeats_and_morphologic.php

Usdin K. The biological effects of simple tandem repeats: lessons from the repeat expansion diseases. Genome Res. 2008 Jul;18(7):1011-9. doi: 10.1101/gr.070409.107. PMID: 18593815; PMCID: PMC3960014. https://pmc.ncbi.nlm.nih.gov/articles/PMC3960014/

Mitsuhashi, S., Frith, M.C., Mizuguchi, T. et al. Tandem-genotypes: robust detection of tandem repeat expansions from long DNA reads. Genome Biol 20, 58 (2019). https://doi.org/10.1186/s13059-019-1667-6 

Sequencing 101: Tandem repeats, 22 Nov 2023, PacBio, https://www.pacb.com/blog/sequencing-101-tandem-repeats/

Kai Zhou, Abram Aertsen, Chris W. Michiels, The role of variable DNA tandem repeats in bacterial adaptation, FEMS Microbiology Reviews, Volume 38, Issue 1, January 2014, Pages 119–141, https://doi.org/10.1111/1574-6976.12036

Fan H, Chu JY. A brief review of short tandem repeat mutation. Genomics Proteomics Bioinformatics. 2007 Feb;5(1):7-14. doi: 10.1016/S1672-0229(07)60009-6. PMID: 17572359; PMCID: PMC5054066. https://pmc.ncbi.nlm.nih.gov/articles/PMC5054066/

[25] Structural variation, Wikipedia, This page was last edited on 30 August 2024, https://en.wikipedia.org/wiki/Structural_variation

Scott AJ, Chiang C, Hall IM. Structural variants are a major source of gene expression differences in humans and often affect multiple nearby genes. Genome Res. 2021 Dec;31(12):2249-2257. doi: 10.1101/gr.275488.121. Epub 2021 Sep 20. PMID: 34544830; PMCID: PMC8647827 https://pmc.ncbi.nlm.nih.gov/articles/PMC8647827/

Feuk, L., Carson, A. & Scherer, S. Structural variation in the human genome. Nat Rev Genet 7, 85–97 (2006). https://doi.org/10.1038/nrg1767 

[26] Copy number variants (CNVs) are typically defined as DNA segments that are larger than 1,000 base pairs (1 kilobase) and usually less than 5 megabases in length. They can include both duplications (additional copies) and deletions (losses) of genetic material.

CNVs are remarkably common in human genomes. They account for approximately 5 to 9.5% of the human genome. They affect more base pairs than other forms of mutation when comparing two human genomes. They play crucial roles in evolution, population diversity, and disease development. 

Copy number variation, Wikipedia, This page was last edited on 24 September 2024, https://en.wikipedia.org/wiki/Copy_number_variation

Pös O, Radvanszky J, Buglyó G, Pös Z, Rusnakova D, Nagy B, Szemes T. DNA copy number variation: Main characteristics, evolutionary significance, and pathological aspects. Biomed J. 2021 Oct;44(5):548-559. doi: 10.1016/j.bj.2021.02.003. Epub 2021 Feb 13. PMID: 34649833; PMCID: PMC8640565 https://pmc.ncbi.nlm.nih.gov/articles/PMC8640565/

Eichler, E. E. Copy Number Variation and Human Disease. Nature Education 1(3):1, 2008,  https://www.nature.com/scitable/topicpage/copy-number-variation-and-human-disease-741737/

What are copy number variants?, 12 Aug 2020, Genomics Education Programme, https://www.genomicseducation.hee.nhs.uk/blog/what-are-copy-number-variants/

Clancy, S. Copy number variation. Nature Education 1(1):95, 2008, https://www.nature.com/scitable/topicpage/copy-number-variation-445/

Copy number variant, National Cancer Institute, https://www.cancer.gov/publications/dictionaries/genetics-dictionary/def/copy-number-variant

Copy Number Variation (CNV), 3 Nov 2024, National Human Genome Research Institute, https://www.genome.gov/genetics-glossary/Copy-Number-Variation

[29] Several approaches are used to determine if an SNV meets the one percent population frequency threshold:

  • Large-scale population studies: projects like the 1000 Genomes Project have sequenced thousands of individuals across multiple populations to identify and validate SNPs.
  • Detection technologies: real-time PCR, microarrays, and next-generation sequencing (NGS) are used to genotype the variants.
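
The one percent rule can be illustrated with a minimal Python sketch. The allele counts, the function names, and the choice to treat exactly one percent as meeting the threshold are assumptions made for illustration only; they are not taken from any of the cited projects.

```python
SNP_THRESHOLD = 0.01  # one percent population frequency (assumed inclusive)


def minor_allele_frequency(allele_counts):
    """Return the frequency of the second most common allele at a site."""
    total = sum(allele_counts.values())
    if total == 0:
        raise ValueError("no alleles observed at this site")
    freqs = sorted((count / total for count in allele_counts.values()), reverse=True)
    # A site with a single observed allele has no minor allele.
    return freqs[1] if len(freqs) > 1 else 0.0


def is_snp(allele_counts, threshold=SNP_THRESHOLD):
    """Treat an SNV as an SNP when its minor allele reaches the threshold."""
    return minor_allele_frequency(allele_counts) >= threshold


# Hypothetical counts from 2,500 sampled people (5,000 chromosomes).
common_site = {"A": 4700, "G": 300}  # minor allele at 6%
rare_site = {"C": 4990, "T": 10}     # minor allele at 0.2%

print(is_snp(common_site))  # True
print(is_snp(rare_site))    # False
```

In practice the frequency is estimated separately within each sampled population, so the same variant may pass the threshold in one population and not in another.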

See for example:

The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015). https://doi.org/10.1038/nature15393 

Patricia M Schnepp, Mengjie Chen, Evan T Keller, Xiang Zhou, SNV identification from single-cell RNA sequencing data, Human Molecular Genetics, Volume 28, Issue 21, 1 November 2019, Pages 3569–3583, https://doi.org/10.1093/hmg/ddz207

Telenti A, Pierce LC, Biggs WH, di Iulio J, Wong EH, Fabani MM, Kirkness EF, Moustafa A, Shah N, Xie C, Brewerton SC, Bulsara N, Garner C, Metzker G, Sandoval E, Perkins BA, Och FJ, Turpaz Y, Venter JC. Deep sequencing of 10,000 human genomes. Proc Natl Acad Sci U S A. 2016 Oct 18;113(42):11901-11906. doi: 10.1073/pnas.1613365113. Epub 2016 Oct 4. PMID: 27702888; PMCID: PMC5081584. https://pmc.ncbi.nlm.nih.gov/articles/PMC5081584/

SNVs vs. SNPs, CD Genomics, https://www.cd-genomics.com/resource-snvs-vs-snps.html

Efficiently detect single nucleotide polymorphisms and variants, Illumina, https://www.illumina.com/techniques/popular-applications/genotyping/snp-snv-genotyping.html

[30] What are single nucleotide polymorphisms (SNPs)?, MedlinePlus, https://medlineplus.gov/genetics/understanding/genomicresearch/snp/

SNP, IMS Riken Center for Integrative Medical Sciences, https://www.ims.riken.jp/english/glossary/genome.php

The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015). https://doi.org/10.1038/nature15393

[31] Ancestry-Informative Markers, National Human Genome Research Institute, https://www.genome.gov/genetics-glossary/Ancestry-informative-Markers

Joon-Ho Yu, Janelle S. Taylor, Karen L. Edwards, Stephanie M. Fullerton, What are our AIMs? Interdisciplinary Perspectives on the Use of Ancestry Estimation in Disease Research, National Library of Medicine, 2012 Nov 5. doi: 10.1080/21507716.2012.717339

Huckins, L., Boraska, V., Franklin, C. et al. Using ancestry-informative markers to identify fine structure across 15 populations of European origin. Eur J Hum Genet 22, 1190–1200 (2014). https://doi.org/10.1038/ejhg.2014.1

[32] What are single nucleotide polymorphisms (SNPs)?, MedlinePlus, https://medlineplus.gov/genetics/understanding/genomicresearch/snp/

[33] AIMs are single-nucleotide polymorphisms (SNPs) that show substantially different frequencies between populations from different geographical regions. These genetic variations can be used to estimate the geographical origins of a person’s ancestors, typically by continent of origin.

AIMs are found within the approximately 15 million SNP sites in human DNA (about 0.4% of total base pairs). They are located on the Y chromosome, in mitochondrial DNA, and in autosomal regions.

AIMs can distinguish between major continental populations (Africa, Asia, Europe). They require multiple markers working together (typically 20-30 or more) for accurate ancestry determination. They can identify fine population structure within continents using larger marker sets. 

The effectiveness of AIMs depends on the number of markers used:

  • 40-80 markers can identify five broad continental clusters;
  • 128 markers can characterize samples into 8 broad continental groups; and
  • Larger sets (>46,000 markers) can identify detailed subpopulation structure
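
What "substantially different frequencies between populations" means can be sketched in a few lines of Python. The marker names and allele frequencies below are invented for illustration, and ranking by the absolute frequency difference is only one simple, commonly used measure of how informative a marker is; it is offered as an assumption, not as the method used in the studies cited here.

```python
def frequency_difference(freq_pop_a, freq_pop_b):
    """Absolute difference in allele frequency between two populations."""
    return abs(freq_pop_a - freq_pop_b)


def rank_markers(markers):
    """Order marker names from most to least ancestry-informative."""
    return sorted(
        markers,
        key=lambda name: frequency_difference(*markers[name]),
        reverse=True,
    )


# Hypothetical allele frequencies: (population A, population B).
markers = {
    "marker_1": (0.90, 0.10),  # large difference, highly informative
    "marker_2": (0.52, 0.48),  # nearly identical, uninformative
    "marker_3": (0.05, 0.70),  # large difference
}

print(rank_markers(markers))  # ['marker_1', 'marker_3', 'marker_2']
```

Because any single marker is only weakly informative about one individual, panels combine many such markers, which is why the marker counts listed above matter.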

Hinkley, Ellen, DNA Testing Choice, 16 Dec 2016, https://dnatestingchoice.com/en-us/news/what-is-an-autosomal-dna-test

Lamiaa Mekhfi, Bouchra El Khalfi, Rachid Saile, Hakima Yahia, and Abdelaziz Soukri, The interest of informative ancestry markers (AIM) and their fields of application, BIO Web of Conferences 115, 07003 (2024), https://doi.org/10.1051/bioconf/202411507003

Huckins, L., Boraska, V., Franklin, C. et al. Using ancestry-informative markers to identify fine structure across 15 populations of European origin. Eur J Hum Genet 22, 1190–1200 (2014). https://doi.org/10.1038/ejhg.2014.1 

Ancestry-Informative Markers, National Human Genome Research Institute, https://www.genome.gov/genetics-glossary/Ancestry-informative-Markers

Ancestry-informative marker, Wikipedia, This page was last edited on 14 August 2024, https://en.wikipedia.org/wiki/Ancestry-informative_marker

[34] Autosomal DNA Statistics, International Society of Genetic Genealogy Wiki, This page was last edited on 17 October 2022, https://isogg.org/wiki/Autosomal_DNA_statistics

Autosomal SNP comparison chart, International Society of Genetic Genealogy Wiki, This page was last edited on 29 January 2024, https://isogg.org/wiki/Autosomal_SNP_comparison_chart

DNA Structure and the Testing Process, FamilyTreeDNA Help Center, https://help.familytreedna.com/hc/en-us/articles/6189190247311-DNA-Structure-and-the-Testing-Process

Catherine A. Ball, Mathew J Barber, Jake Byrnes, Peter Carbonetto, Kenneth G. Chahine, Ross E. Curtis, Julie M. Granka, Eunjung Han, Eurie L. Hong, Amir R. Kermany, Natalie M. Myres, Keith Noto, Jianlong Qi, Kristin Rand, Yong Wang and Lindsay Willmore, AncestryDNA Matching White Paper, 31 Mar 2016, AncestryDNA, https://www.ancestry.com/cs/dna-help/matches/whitepaper; PDF: https://www.ancestry.com/dna/resource/whitePaper/AncestryDNA-Matching-White-Paper.pdf

Autosomal DNA match thresholds, International Society of Genetic Genealogy Wiki, This page was last edited on 31 August 2024, https://isogg.org/wiki/Autosomal_DNA_match_thresholds

Daniel Kling, Christopher Phillips, Debbie Kennett, Andreas Tillmar, Investigative genetic genealogy: Current methods, knowledge and practice, Forensic Science International: Genetics, Volume 52, 2021, https://doi.org/10.1016/j.fsigen.2021.102474

Davis DJ, Challis JH. Automatic segment filtering procedure for processing non-stationary signals. J Biomech. 2020 Mar 5;101:109619. doi: 10.1016/j.jbiomech.2020.109619. Epub 2020 Jan 9. PMID: 31952818.

The Order of Nucleotides in a Gene Is Revealed by DNA Sequencing, Scitable, Nature Education, https://www.nature.com/scitable/topicpage/the-order-of-nucleotides-in-a-gene-6525806/

[35] The Illumina Global Screening Array (GSA) is a customizable genotyping microarray platform. Its base configuration:

  • Contains approximately 654,000 fixed markers spanning the human genome;
  • Supports 24 samples per array in standard format;
  • Requires 200 ng DNA input;
  • Achieves call rates greater than 99% and reproducibility greater than 99.9%; and
  • Allows addition of up to 100,000 custom markers

Illumina microarray solutions, Illumina, https://www.illumina.com/techniques/microarrays.html

Efficiently detect single nucleotide polymorphisms and variants, Illumina, https://www.illumina.com/techniques/popular-applications/genotyping/snp-snv-genotyping.html

Custom design tools for genotyping any variant, in any species, Illumina, https://www.illumina.com/techniques/popular-applications/genotyping/custom-genotyping.html

Infinium™ Global Screening Array-24 v3.0 BeadChip, Illumina, https://www.illumina.com/content/dam/illumina-marketing/documents/products/datasheets/infinium-global-screening-array-data-sheet-370-2016-016.pdf

Infinium Global Screening Array-24 Kit, Illumina, https://www.illumina.com/products/by-type/microarray-kits/infinium-global-screening.html

[36] Estes, Roberta, Comparing DNA Results – Different Tests at the Same Testing Company, 18 May 2023, DNAeXplained – Genetic Genealogy, https://dna-explained.com/2023/05/18/comparing-dna-results-different-tests-at-the-same-testing-company/

[37] Estes, Roberta, Concepts – Imputation, 5 Sep 2017, DNAeXplained – Genetic Genealogy, https://dna-explained.com/2017/09/05/concepts-imputation/

Illumina microarray solutions, Illumina, https://www.illumina.com/techniques/microarrays.html

Efficiently detect single nucleotide polymorphisms and variants, Illumina, https://www.illumina.com/techniques/popular-applications/genotyping/snp-snv-genotyping.html

[38] See for example: Our Autosomal DNA Test (Family Finder™), FamilyTreeDNA Help Center, https://help.familytreedna.com/hc/en-us/articles/4411203169679-Our-Autosomal-DNA-Test-Family-Finder

[39] Different DNA testing companies use centimorgans (cM) in slightly different ways when reporting matches and relationships:

  1. Matching thresholds: Companies set different minimum thresholds for reporting matches. For example: AncestryDNA currently uses a threshold of 8 cM; 23andMe uses 7 cM and at least 700 SNPs for the first matching segment; and MyHeritage uses 8 cM.
  2. Algorithms and filtering: Companies use proprietary algorithms to filter and process the raw DNA data. AncestryDNA uses algorithms called Timber and Underdog to phase data and filter out high-frequency segments. Other companies may use different methods, leading to variations in reported shared cM.
  3. Total cM calculations: The total amount of cM a person has can vary between companies. 23andMe reports about 7,440 cM total and AncestryDNA seems to use around 6,800-7,000 cM total.
  4. Reporting of segments: Some companies like 23andMe and FamilyTreeDNA provide detailed segment data. AncestryDNA does not show specific segment information.
  5. Confidence levels: Companies may assign different confidence levels or relationship probabilities based on shared cM. For example, AncestryDNA previously used confidence scores like “Extremely High” for matches sharing more than 60 cM.
  6. Handling of small segments: Companies differ in how they handle very small matching segments, with some including segments as small as one cM and others excluding anything below their threshold.

These differences in methodologies can result in variations in reported shared cM and relationship estimates between companies for the same pair of individuals. This is why matches and relationship predictions may not be identical across different testing companies.
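
The effect of the differing thresholds in item 1 can be shown with a short Python sketch. The threshold values are the ones quoted above; the class name, the function name, and the sample segments are hypothetical, and the sketch ignores the phasing and filtering steps described in item 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchThreshold:
    min_cm: float      # minimum segment length in centimorgans
    min_snps: int = 0  # 0 means no SNP requirement is quoted in this note


# Values as quoted in this note; companies revise them over time.
THRESHOLDS = {
    "AncestryDNA": MatchThreshold(min_cm=8.0),
    "23andMe": MatchThreshold(min_cm=7.0, min_snps=700),
    "MyHeritage": MatchThreshold(min_cm=8.0),
}


def companies_reporting(segment_cm, segment_snps):
    """List the companies whose quoted threshold this segment would meet."""
    return [
        name
        for name, threshold in THRESHOLDS.items()
        if segment_cm >= threshold.min_cm and segment_snps >= threshold.min_snps
    ]


# Hypothetical shared segments between the same two test takers.
print(companies_reporting(7.5, 900))  # ['23andMe']
print(companies_reporting(8.2, 650))  # ['AncestryDNA', 'MyHeritage']
```

The same pair of people can therefore appear as a match at one company and not at another, even before differences in algorithms are taken into account.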

Centimorgan, Wikipedia, This page was last edited on 1 May 2024, https://en.wikipedia.org/wiki/Centimorgan

What’s the difference between shared centimorgans and shared segments?, 11 Nov 2019, The Tech Interactive, https://www.thetech.org/ask-a-geneticist/articles/2019/centimorgans-vs-shared-segments/

centiMorgan, International Society of Genetic Genealogy Wiki, This page was last edited on 15 August 2024, https://isogg.org/wiki/CentiMorgan

[40] Hansen, Annelie, Untangling the Centimorgans on Your DNA Test, FamilySearch Blog, https://www.familysearch.org/en/blog/centimorgan-chart-understanding-dna

Green Dragon Genealogy, Yes, but what EXACTLY is a centiMorgan?, 19 Sep 2021, Green Dragon Genealogy, https://greendragongenealogy.co.uk/dna/yes-but-what-exactly-is-a-centimorgan/

[41] Autosomal DNA match thresholds, International Society of Genetic Genealogy Wiki, This page was last edited on 31 August 2024, https://isogg.org/wiki/Autosomal_DNA_match_thresholds

[42] Autosomal DNA Statistics, International Society of Genetic Genealogy Wiki, This page was last edited on 17 October 2022, https://isogg.org/wiki/Autosomal_DNA_statistics

Autosomal DNA match thresholds, International Society of Genetic Genealogy Wiki, This page was last edited on 31 August 2024, https://isogg.org/wiki/Autosomal_DNA_match_thresholds

Estes, Roberta, Comparing DNA Results – Different Tests at the Same Testing Company, DNAeXplained – Genetic Genealogy Blog, 18 May 2023, https://dna-explained.com/2023/05/18/comparing-dna-results-different-tests-at-the-same-testing-company/

Autosomal DNA testing comparison chart, International Society of Genetic Genealogy Wiki, This page was last edited on 8 October 2024, https://isogg.org/wiki/Autosomal_DNA_testing_comparison_chart

[43] Phasing, International Society of Genetic Genealogy Wiki, This page was last edited on 24 May 2024, https://isogg.org/wiki/Phasing

A Guide to Phasing from Illumina: https://youtu.be/15NPZCGP_e4

Autosomal DNA match thresholds, International Society of Genetic Genealogy Wiki, This page was last edited on 31 August 2024, https://isogg.org/wiki/Autosomal_DNA_match_thresholds

Davis DJ, Challis JH. Automatic segment filtering procedure for processing non-stationary signals. J Biomech. 2020 Mar 5;101:109619. doi: 10.1016/j.jbiomech.2020.109619. Epub 2020 Jan 9. PMID: 31952818.