DNA is composed of four chemical bases: adenine, guanine, thymine, and cytosine, abbreviated as A, G, T, and C. These bases occur in pairs, called base pairs, in DNA. Because of their chemical structure, A is always paired with T, and G is always paired with C.
When new embryos are created, one base pair is sometimes substituted for another. For example, at a particular location an AT pair might be inadvertently substituted for a GC pair. These small copying errors are called mutations.
To track these changes and how they differ from other DNA sequences, we use a universal reference sequence. This reference sequence serves as a standard to which all mutations are compared.
Reference Sequences
To help avoid confusion, the reference sequence focuses on only one-half of the pair, called the forward strand. After all, if we know an A is on the forward strand, we know a T must be on the reverse strand, so it is easier to identify the pair simply as A.
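The pairing rule can be sketched in a few lines of Python. This is only an illustration of why the forward strand alone is enough to identify a pair; the names `COMPLEMENT`, `paired_base`, and `paired_bases` are hypothetical and do not come from any FamilyTreeDNA tool.

```python
# Pairing rule from the text: A pairs with T, and G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def paired_base(forward_base: str) -> str:
    """Return the base that sits opposite forward_base on the reverse strand."""
    return COMPLEMENT[forward_base.upper()]


def paired_bases(forward: str) -> str:
    """Return the opposite base for each position of a forward-strand string.

    The result is listed position by position (it is not reversed).
    """
    return "".join(COMPLEMENT[base] for base in forward.upper())


if __name__ == "__main__":
    print(paired_base("A"))      # T
    print(paired_bases("GATC"))  # CTAG
```

Because the opposite base is fully determined by the lookup, recording only the forward-strand value loses no information.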
In a reference sequence, each base pair is assigned a location number and a reference value for that location, such as A. The reference value is called the ancestral value. If a person has a value other than the ancestral value, we call that mutation a derived value.
There are two reference sequences to which scientists compare changes in mtDNA: the revised Cambridge Reference Sequence (rCRS) and the Reconstructed Sapiens Reference Sequence (RSRS).
rCRS
The revised Cambridge Reference Sequence is a revision of the very first mitochondrial genome to be sequenced, published by researchers at Cambridge University in 1981. This was based on an anonymous individual of European descent. In the rCRS system, each nucleotide base is assigned a position along with the value (A, C, T, or G) that was found in this anonymous individual. Your rCRS values are reported by listing the location followed by your derived value. For example, if you differ from the rCRS at position 263 with a value of G, this will be reported as 263G.
As global testing became more prevalent, it became clear that the rCRS sequence, although common among Europeans, was not representative of the wider global human population. To address this, a group of scientists published the Reconstructed Sapiens Reference Sequence (RSRS) in 2012.
RSRS
The Reconstructed Sapiens Reference Sequence was designed to be representative of “Mitochondrial Eve.” This is not the first woman who lived, but rather the most recent woman from whom all living humans descend in a direct maternal line. The RSRS is a reconstruction of this ancestral mitochondrial sequence. Just like the rCRS, each nucleotide is assigned a position and an ancestral value. RSRS values are reported by listing the ancestral value, then the position, then your derived value. For example, if the ancestral value at location 769 is adenine (A) and you have guanine (G) there, this mutation will be reported as A769G.
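The two reporting styles can be compared side by side with a small Python sketch. It assumes a reference and a sample are each available as a mapping from position to base; the function names and the toy data are hypothetical, and the values are for illustration only rather than taken from the real rCRS or RSRS.

```python
from typing import Dict, Iterator, Tuple


def differences(reference: Dict[int, str],
                sample: Dict[int, str]) -> Iterator[Tuple[int, str, str]]:
    """Yield (position, ancestral, derived) wherever the sample differs."""
    for position in sorted(sample):
        ancestral = reference.get(position)
        derived = sample[position]
        if ancestral is not None and derived != ancestral:
            yield position, ancestral, derived


def rcrs_style(position: int, derived: str) -> str:
    """rCRS reporting: the position followed by the derived value."""
    return f"{position}{derived}"


def rsrs_style(ancestral: str, position: int, derived: str) -> str:
    """RSRS reporting: ancestral value, position, then derived value."""
    return f"{ancestral}{position}{derived}"


# Toy data (assumption): three positions with illustrative values.
toy_reference = {263: "A", 769: "A", 1000: "C"}
toy_sample = {263: "G", 769: "G", 1000: "C"}

if __name__ == "__main__":
    for position, ancestral, derived in differences(toy_reference, toy_sample):
        print(rcrs_style(position, derived),
              rsrs_style(ancestral, position, derived))
```

In practice each style would be computed against its own reference sequence; a single toy reference is used here only to keep the comparison of the two formats short.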
Types of mutations
By comparing your mtDNA to each reference sequence, we can identify where your DNA differs from the ancestral values in both the RSRS and the rCRS. FamilyTreeDNA provides a separate list of these differences for each reference sequence along with your other results when you take an mtDNA test. There are different types of mutations, however, and they are reported slightly differently.
Transversions
Different bases have different chemical shapes. A and G are double-ringed structures called purines, whereas C and T are single-ringed structures called pyrimidines. When a purine mutates to a pyrimidine (A to C, A to T, G to C, or G to T) or a pyrimidine mutates to a purine (C to A, C to G, T to A, or T to G), this is called a transversion. Transversions are shown by giving the original value capitalized before the location and the mutated value uncapitalized after the location. Thus, a transversion of an A nucleotide at position 825 to a T is shown as A825t.
Transitions
When a purine is exchanged for another purine, or a pyrimidine is exchanged for another pyrimidine, this is called a transition. The four possible transitions are A to G, G to A, C to T, and T to C.
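The distinction between the two substitution types follows directly from the purine and pyrimidine classes, so it can be sketched in Python. The function names are hypothetical, and the formatting rule simply applies the lowercase convention for transversions described above; showing transitions in uppercase is inferred from examples such as A769G.

```python
PURINES = {"A", "G"}      # double-ringed bases
PYRIMIDINES = {"C", "T"}  # single-ringed bases


def substitution_type(ancestral: str, derived: str) -> str:
    """Classify a single-base substitution as a transition or a transversion."""
    a, d = ancestral.upper(), derived.upper()
    valid = PURINES | PYRIMIDINES
    if a not in valid or d not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if a == d:
        raise ValueError("ancestral and derived values are identical")
    same_class = (a in PURINES) == (d in PURINES)
    return "transition" if same_class else "transversion"


def format_substitution(ancestral: str, position: int, derived: str) -> str:
    """Format a substitution, lowercasing the derived value for transversions."""
    kind = substitution_type(ancestral, derived)
    shown = derived.lower() if kind == "transversion" else derived.upper()
    return f"{ancestral.upper()}{position}{shown}"


if __name__ == "__main__":
    print(substitution_type("C", "T"))         # transition
    print(format_substitution("A", 825, "T"))  # A825t
    print(format_substitution("A", 769, "G"))  # A769G
```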
Heteroplasmy
When more than one value is found at a location, it is called a heteroplasmy. These are indicated by letters other than A, C, T, or G on the mutations page.
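The text does not say which letters are used. A common convention for mixed bases is the standard IUPAC nucleotide ambiguity codes, and the Python sketch below assumes that convention applies here. The function names are hypothetical.

```python
# Standard IUPAC ambiguity codes (assumption: the mutations page uses these).
IUPAC_AMBIGUITY = {
    "R": {"A", "G"},
    "Y": {"C", "T"},
    "S": {"C", "G"},
    "W": {"A", "T"},
    "K": {"G", "T"},
    "M": {"A", "C"},
    "B": {"C", "G", "T"},
    "D": {"A", "G", "T"},
    "H": {"A", "C", "T"},
    "V": {"A", "C", "G"},
    "N": {"A", "C", "G", "T"},
}
PLAIN_BASES = {"A", "C", "G", "T"}


def is_heteroplasmy_code(letter: str) -> bool:
    """Return True if the letter is an ambiguity code rather than a plain base."""
    return letter.upper() in IUPAC_AMBIGUITY


def possible_bases(letter: str) -> set:
    """Return the set of bases that a reported letter can stand for."""
    letter = letter.upper()
    if letter in PLAIN_BASES:
        return {letter}
    return set(IUPAC_AMBIGUITY[letter])


if __name__ == "__main__":
    print(is_heteroplasmy_code("Y"))    # True
    print(sorted(possible_bases("Y")))  # ['C', 'T']
```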
Back Mutation
Sometimes a base pair mutates away from the reference value, and in a later generation mutates back. These are called back mutations (or reversals) and are indicated by a ! after the value on the RSRS mutations page.
Extra mutations and missing mutations
The RSRS mutations page also lists Extra Mutations and Missing Mutations. Extra mutations are those that are present in your mtDNA but do not define your haplogroup. Missing mutations are those that define your haplogroup but that you do not have.
Insertions
In some cases an extra base pair is added into the mtDNA sequence between two established positions. When this occurs it is represented by a decimal point followed by the value found at that extra position.
For example, if an extra base pair is found between positions 315 and 316, and the value of that base pair is C, it will be reported as 315.1C. If there is more than one insertion at the same place, the additional ones are reported as 315.2, 315.3, etc.
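As a minimal sketch of this numbering, the hypothetical Python helper below takes the established position that precedes the insertion and the inserted bases in order, and returns one label per inserted base.

```python
from typing import List


def format_insertions(position: int, inserted_bases: str) -> List[str]:
    """Label each inserted base as <position>.<n><base>, counting n from 1."""
    return [
        f"{position}.{index}{base}"
        for index, base in enumerate(inserted_bases.upper(), start=1)
    ]


if __name__ == "__main__":
    print(format_insertions(315, "C"))   # ['315.1C']
    print(format_insertions(315, "CC"))  # ['315.1C', '315.2C']
```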
Deletions
In some cases an established position in the reference sequence is simply not found in an individual's DNA. For example, position 315 may have a reference value of C, but when the mitochondrial DNA is sequenced, positions 314 and 316 are found while 315 has neither C nor any other value. When this happens it is called a deletion, and it is represented by reporting the position or range of positions followed by a d (e.g., 315d or 290-291d).
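The deletion notation can be sketched with a short, hypothetical Python helper that accepts either a single position or an inclusive range.

```python
from typing import Optional


def format_deletion(start: int, end: Optional[int] = None) -> str:
    """Format a deleted position, or an inclusive range of positions, with 'd'."""
    if end is None or end == start:
        return f"{start}d"
    if end < start:
        raise ValueError("end must not be smaller than start")
    return f"{start}-{end}d"


if __name__ == "__main__":
    print(format_deletion(315))       # 315d
    print(format_deletion(290, 291))  # 290-291d
```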
Repeats
Short repeated mtDNA sequences may occasionally gain or lose repeats. For example, a sequence in the reference like CTATCTAT may add another repeat to become CTATCTATCTAT, or lose a repeat to become CTAT. In these cases the variant is reported as the reference repeat count in brackets, followed by the starting position and a single copy of the repeated sequence, followed by the new repeat count in brackets. If the above example of an added repeat had occurred at position 16982, for example, it would be reported as [2]16982CTAT[3].
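The repeat notation can be reproduced with a short Python sketch. It assumes the repeated unit and the starting position are already known, and counts how many times the unit occurs back to back at the start of each sequence. The function names are hypothetical.

```python
def count_leading_repeats(sequence: str, unit: str) -> int:
    """Count consecutive copies of unit at the start of sequence."""
    if not unit:
        raise ValueError("unit must not be empty")
    count = 0
    while sequence.startswith(unit, count * len(unit)):
        count += 1
    return count


def format_repeat(position: int, unit: str,
                  reference_count: int, new_count: int) -> str:
    """Format a repeat variant as [old]<position><unit>[new]."""
    return f"[{reference_count}]{position}{unit}[{new_count}]"


if __name__ == "__main__":
    reference_count = count_leading_repeats("CTATCTAT", "CTAT")   # 2
    sample_count = count_leading_repeats("CTATCTATCTAT", "CTAT")  # 3
    print(format_repeat(16982, "CTAT", reference_count, sample_count))
    # [2]16982CTAT[3]
```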
Lower-Quality Variants
Variants that have lower reporting quality, or that may be prone to unusual behavior, are reported in parentheses, like (C152T). These variants may behave unpredictably in nearby haplotree branches and have generally not been used for branch determination.
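To tie the notations together, here is a rough Python sketch that recognizes each reported form with a regular expression. The patterns are inferred only from the examples in this article and are not FamilyTreeDNA's own parsing rules. In particular, treating the letters R, Y, S, W, K, M, B, D, H, V, and N as heteroplasmy codes is an assumption, as is the placement of the ! inside the parentheses.

```python
import re

# Patterns inferred from the examples in the text (assumptions, not a spec).
PATTERNS = [
    ("repeat", re.compile(r"\[\d+\]\d+[ACGT]+\[\d+\]")),     # [2]16982CTAT[3]
    ("deletion", re.compile(r"\d+(-\d+)?d")),                # 315d, 290-291d
    ("insertion", re.compile(r"\d+\.\d+[ACGT]")),            # 315.1C
    ("heteroplasmy", re.compile(r"[ACGT]?\d+[RYSWKMBDHVN]")),
    ("substitution", re.compile(r"[ACGT]?\d+[ACGTacgt]")),   # 263G, A769G, A825t
]


def classify(variant: str) -> dict:
    """Return the kind of variant plus low-quality and back-mutation flags."""
    text = variant.strip()
    # Parentheses mark a lower-quality variant, e.g. (C152T).
    low_quality = len(text) >= 2 and text[0] == "(" and text[-1] == ")"
    if low_quality:
        text = text[1:-1]
    # A trailing ! marks a back mutation.
    back_mutation = text.endswith("!")
    if back_mutation:
        text = text[:-1]
    kind = "unknown"
    for name, pattern in PATTERNS:
        if pattern.fullmatch(text):
            kind = name
            break
    return {"kind": kind, "low_quality": low_quality,
            "back_mutation": back_mutation}


if __name__ == "__main__":
    for example in ["263G", "A825t", "315.1C", "290-291d",
                    "[2]16982CTAT[3]", "(C152T)", "A769G!"]:
        print(example, classify(example))
```

The deletion pattern is checked before the heteroplasmy pattern so that the lowercase d in 315d is never confused with an uppercase ambiguity letter.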