Understanding Y-DNA SNPs

Single Nucleotide Polymorphisms, often abbreviated as SNPs, are small changes in the value of a specific nucleotide. For the purposes of genetic genealogy, you will often hear the term SNPs in association with Y-DNA, but they occur in all types of DNA.

Finding SNPs

DNA is made of two parallel polymer chains that wind around each other to form a double helix. These two strands are connected to each other, like rungs on a ladder, by four chemicals (or nucleotides), abbreviated as A, C, T, and G. Each rung is given a position number (much like a set of map coordinates). The nucleotide at each position is called the value of that position. This is also sometimes called the base, or the base value. You can read more about the shape of DNA here.

Sometimes, a small mutation will occur in an embryo that changes the value of a specific nucleotide. For example, a G might be replaced by an A. This change occurs in a Single Nucleotide, and is a change from one form to another, or, to use a technical term, a Polymorphism. Hence, SNP.

Most of these changes do not affect health in any way, but they are useful to track inheritance. Once a change occurs, it is unlikely to change again in a subsequent generation. To track these changes, we use a universal Reference Sequence that consists of a standard set of values for each location. These values are called the Ancestral Values, and they serve as the baseline against which mutations are identified.

When a change is found in the value of a nucleotide at a specific location, it is considered a Derived Value because it is derived from the Ancestral Value. When these are discovered through tests like the Big Y-700, each is given a SNP name. The SNP name refers not only to the position, but also to the type of mutation.

For example, it could be an A to C mutation at position 1234, which is not to be confused with an A to T mutation at position 1234. Since different types of mutations can occur at the same location, we refer to each by its own SNP name.
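The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any lab's pipeline works: the `Snp` class, the `call_snp` function, and the names `EXAMPLE1` and `EXAMPLE2` are hypothetical, and position 1234 is the made-up position from the example above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Snp:
    """A named mutation: a position plus the specific base change."""
    name: str
    position: int
    ancestral: str
    derived: str


# Two hypothetical SNPs at the same position; the names are placeholders.
KNOWN_SNPS = [
    Snp("EXAMPLE1", 1234, "A", "C"),
    Snp("EXAMPLE2", 1234, "A", "T"),
]


def call_snp(snp: Snp, observed: str) -> Optional[bool]:
    """Return True if the observed base is derived, False if it is
    ancestral, and None if it matches neither value for this SNP."""
    if observed == snp.derived:
        return True
    if observed == snp.ancestral:
        return False
    return None


# A tester who reads C at position 1234 is derived for EXAMPLE1 only.
results = {snp.name: call_snp(snp, "C") for snp in KNOWN_SNPS}
print(results)
```

Keeping the derived base as part of each SNP record is what lets the A to C and A to T mutations stay separate even though they share a position.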

SNP Names

Each SNP name refers to a specific mutation in a specific location. SNPs are generally named after the lab or individual that first identified them. Each lab or individual is assigned a letter (or series of letters) as a prefix. The number that follows the prefix counts the SNPs that lab or individual has identified, in order.

For example, FT1 was the first SNP identified by the FamilyTreeDNA Big Y-700 test. FT2 was the second, and so on. For a complete list of SNP prefixes and their meanings, take a look at our SNP Names article. It can still be overwhelming, so instead of focusing on long SNP names like FT1367840 or BY2787, just think of each SNP on the tree as an ancestor whose name we do not know.
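As a rough illustration of the naming scheme, the sketch below splits a SNP name into its letter prefix and its number. The function name `split_snp_name` and the pattern itself are assumptions made for this example: it handles names shaped like the ones in this article (letters followed by digits), and real SNP names may not all follow that shape.

```python
import re
from typing import Tuple

# Assumed pattern: one or more letters followed by one or more digits.
SNP_NAME = re.compile(r"^([A-Za-z]+)(\d+)$")


def split_snp_name(name: str) -> Tuple[str, int]:
    """Split a SNP name such as 'FT1367840' into ('FT', 1367840)."""
    match = SNP_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized SNP name: {name!r}")
    return match.group(1), int(match.group(2))


for name in ["FT1", "FT1367840", "BY2787"]:
    prefix, number = split_snp_name(name)
    print(name, "->", prefix, number)
```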

Types of SNPs

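Even though SNPs refer to any single-nucleotide change in the chromosome, they fall into three broad categories, depending on whether two, three, or four distinct nucleotides are observed at a position: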

  • Biallelic SNP - This represents the vast majority of observed SNPs throughout the human genome. For these SNPs, only two distinct nucleotides are observed across the entire population being studied.

    For instance, if the reference base at a specific location is 'C', and the variant found in a specific lineage is 'T', the site is biallelic. It segregates the population into two groups: those who inherited the ancestral 'C' and those who inherited the derived 'T'. This creates the foundational binary split that defines a new branch in the phylogenetic tree: an ancestor who mutated from C to T, and all his male descendants who carry that T.
  • Multiallelic SNP - Sometimes more than two nucleotide values are observed across the entire population of male testers for a given position on the Y chromosome. In other words, if the Ancestral Value is A, and we see A, C, and T in a population, the C and T each represent a unique SNP, because they are unique mutation types. 

    The maximum number of values for any given location is four, because there are only four possible nucleotides to go in that position. That means multiallelic SNPs fall into two types:
    • Triallelic SNP - This is a position on the Y chromosome where three distinct nucleotides are observed. This configuration means the site contains the ancestral (reference) allele and two different derived (variant) alleles segregating within the population.

      For example, if the ancestral base is A, and population studies reveal some testers carrying G and other individuals carrying C, then all three (A, G, and C) are present, making it triallelic. A triallelic site indicates that two separate, distinct single-step mutational events occurred at the exact same location, originating from the common ancestor who possessed the ancestral base.
    • Quadallelic SNP - A quadallelic SNP is the rarest form of SNP variation, where all four possible nucleotides (A, C, G, and T) are observed across the tester population. This represents three distinct, independent base substitutions originating from the same ancestral base, all occurring at the same position on the Y chromosome but in separate patrilineal lines.
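The categories above come down to counting how many distinct nucleotides are seen at one position. The following is a minimal sketch of that count; the function name `classify_site` and the label used for a position with no variation are assumptions made for this example.

```python
from typing import Iterable

# Number of distinct bases observed -> category name.
LABELS = {
    1: "no variation observed",
    2: "biallelic",
    3: "triallelic",
    4: "quadallelic",
}


def classify_site(observed_bases: Iterable[str]) -> str:
    """Classify one position from the bases seen across all testers."""
    distinct = {base.upper() for base in observed_bases}
    if not distinct:
        raise ValueError("no bases observed")
    invalid = distinct - set("ACGT")
    if invalid:
        raise ValueError(f"unexpected base(s): {sorted(invalid)}")
    # At most four labels are needed: only four nucleotides exist.
    return LABELS[len(distinct)]


print(classify_site(["C", "C", "T", "C"]))   # two distinct bases
print(classify_site(["A", "G", "C", "A"]))   # three distinct bases
print(classify_site(["A", "C", "G", "T"]))   # all four bases
```

Because the classification depends on the whole tester population, a site counted as biallelic today could become triallelic if a new tester carries a third value.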

All this sounds confusing, but for most SNPs it isn’t. The different variant alleles are usually given different SNP names. For instance, the SNP Y921 might refer to the A to G mutation at the example position, and the SNP FGC32091 might refer to the A to C mutation that occurred in a completely different ancestor in a different part of the haplotree. Confusion usually arises only when multiallelic SNPs are found in private variants or other SNPs that haven’t been named yet. If you keep in mind that you might see a new variant allele at any position, it’s usually not hard to clear up.
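One way to picture this is a lookup keyed on both the position and the derived value. The sketch below is hypothetical: the `NAMED_VARIANTS` table and the `describe_variant` function are invented for illustration, and position 1234 is a placeholder rather than the real coordinate of Y921 or FGC32091.

```python
from typing import Dict, Tuple

# (position, derived base) -> SNP name. Position 1234 is a placeholder,
# not the real coordinate of these SNPs.
NAMED_VARIANTS: Dict[Tuple[int, str], str] = {
    (1234, "G"): "Y921",
    (1234, "C"): "FGC32091",
}


def describe_variant(position: int, ancestral: str, observed: str) -> str:
    """Return the SNP name for an observed base, or a plain description
    when the base is ancestral or the variant has no name yet."""
    if observed == ancestral:
        return f"ancestral {ancestral} at {position}"
    name = NAMED_VARIANTS.get((position, observed))
    if name is not None:
        return name
    return f"unnamed variant {ancestral} to {observed} at {position}"


for base in ["A", "G", "C", "T"]:
    print(base, "->", describe_variant(1234, "A", base))
```

The last case, a T at a position where only G and C have names, corresponds to the unnamed multiallelic variant that the paragraph above describes as the usual source of confusion.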