The Structure of DNA
Most people know that DNA has a double helix shape, but there are many components within that shape. Understanding these components and how they connect is important for decoding the information locked within. This article explains how the chemicals in DNA fit together and interact, which will help you better understand what your DNA results mean.
DNA is made of two long polymer chains that wind around each other to form a double helix. These two strands are connected to each other like rungs on a ladder by four chemicals called nucleotides, abbreviated as A, C, T, and G (adenine, cytosine, thymine, and guanine). Each rung is composed of a pair of these nucleotides, one on each strand. We call this a base pair.
There are about three billion base pairs in the human genome. Through hard work, geneticists were able to map out this “ladder” in a process much like putting together the pieces of a very complicated puzzle. Each base pair is assigned a position, like coordinates on a map, to show where it fits in the puzzle.
Because of their chemical structure, in each base pair, A is always paired with T, and C is always paired with G. This means if we look at a specific position and see an A on one strand, we know there is a T on its companion strand. This is referred to as Chargaff’s Rule after Erwin Chargaff, the Austro-Hungarian-born biochemist whose measurements showed that DNA contains equal amounts of A and T and equal amounts of C and G.
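The pairing rule can be written as a small lookup table. The following is a minimal Python sketch; the names PAIRS and companion_strand are made up for this illustration and do not come from any lab software.

```python
# Pairing rule from the text: A pairs with T, and C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def companion_strand(strand: str) -> str:
    """Return the bases that sit opposite each base of `strand`."""
    return "".join(PAIRS[base] for base in strand)


print(companion_strand("GATTACA"))  # CTAATGT
```

Applying the function twice returns the original strand, which is another way of saying that knowing one strand is enough to know the other.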
The base pairs at some positions differ from one person to another, and that is what creates all the diversity in humans we see today. Even though there can be differences in DNA, there are some series of positions, called sequences, that must be universal for human bodies to function. These sequences can serve as landmarks to anchor ourselves in this vast puzzle of variations.
Because of Chargaff’s Rule, we know each position connects the two strands with a combination of AT or CG. But how do we know which strand has A and which one has T? To help with this, we identify the two strands as the forward (+) strand and the reverse (-) strand. After all, if one person has a combination of TA with T on the forward strand, and another person has AT with A on the forward strand, it’s not the same thing.
For different types of DNA, there are established sequences of the forward and reverse strands that are universally used as reference sequences.
To make things easier, DNA test results are reported relative to the forward strand only. So if a pair of AT has A on the forward strand, it will be listed as A. We can then compare the value on your forward strand to the reference value to track changes.
We call changes like these variants. An AT pair might be swapped with TA or even with GC or CG. Changes like this are rare, but when they happen, they usually get passed down to the next generation. Outside of the landmark sequences, most of these changes make no difference in how the body functions, but they are valuable to genealogists for tracking how DNA changes from one generation to another.
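Comparing a forward-strand result to the reference amounts to walking both sequences position by position and noting where they differ. Here is a hedged Python sketch of that idea. The function name find_variants and the two short sequences are hypothetical, and only single-base swaps are handled, not insertions or deletions.

```python
def find_variants(reference: str, sample: str, first_position: int = 1):
    """List (position, reference base, sample base) wherever the two differ.

    Both strings are forward-strand values of equal length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must have the same length")
    return [
        (first_position + i, ref, obs)
        for i, (ref, obs) in enumerate(zip(reference, sample))
        if ref != obs
    ]


# Hypothetical values, for illustration only.
reference = "GGGGCTATTC"
sample = "GGGGCTGTTC"
print(find_variants(reference, sample))  # [(7, 'A', 'G')]
```

The first_position argument lets the numbering start at whatever coordinate the segment has in the reference, rather than always at 1.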
Regardless of any variants that may happen at a position, that position is always relative to the landmark sequences in the reference. For example, in one area, we may know that positions 200-300 sit between landmarks of GGGG at either end of the forward strand. With so many potential variants between the landmarks though, how do we know which combination it is? Is it AT with an A on the forward strand or TA with T on the forward strand?
To understand this, let’s take a look at the testing process.
The Testing Process
To go back to our map analogy, if you are on a road trip and need to get to a particular place, you don’t need to look at the entire map of the world. You only need to look at the landmarks near you and your destination. In the same way, if we are looking at a particular sequence, we don’t need to look at the entire genome.
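Let’s look at an example. Say we want to look at a section of DNA that is 31 base pairs long, with both of its strands spelled out below.
Notice how each end of this section is bracketed with a run of four GC base pairs: GGGG on one strand and CCCC on the other.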
For simplicity's sake, let’s say that these GC base pairs are the landmark sequences. Based on our established reference sequence, we know that G is on the forward strand and C is on the reverse strand. Now we know which strand is which:
Forward strand: GGGGCTATTCATTCAATCATACACCCAGGGG
Reverse strand: CCCCGATAAGTAAGTTAGTATGTGGGTCCCC
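To picture the two strands as a ladder, the short Python sketch below prints them one above the other and marks each rung. It uses the example sequences above; the helper name rung_marks is an assumption made for this illustration.

```python
# Pairing rule: A pairs with T, and C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

forward = "GGGGCTATTCATTCAATCATACACCCAGGGG"
reverse = "CCCCGATAAGTAAGTTAGTATGTGGGTCCCC"


def rung_marks(top: str, bottom: str) -> str:
    """Return '|' for each valid base pair and 'x' for each mismatch."""
    return "".join("|" if PAIRS[t] == b else "x" for t, b in zip(top, bottom))


print("Forward (+):", forward)
print("            ", rung_marks(forward, reverse))
print("Reverse (-):", reverse)
```

Every position in the example prints as a `|`, meaning each base on the forward strand sits opposite its proper partner on the reverse strand.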
Remember though that this map is a tightly wound helix. Each pair is wrapped around between the twisting strands. With the strands wound so tightly together, how do we peel them apart so we are looking at just the forward strand?
Peeling the Strands Apart
After several lab procedures have removed all the rest of the biological material from the sample (the cell membranes, proteins, etc.), we are left with just the DNA. The isolated DNA is then heated until the two strands peel apart. The base pairs are separated, and each strand is left with only the nucleotides that were on its own side. This way we can look at the nucleotides on each strand independently. DNA is so tiny though that it is impossible to hold up a microscope to look at the individual strands.
So how do we separate the forward and reverse strands so that we know we are only looking at the forward strand?
To do this, we design artificial pieces of the reverse strand. We call these artificial pieces primers. In our example, our primer is CCCC. Since C always pairs with G, our primer won’t connect to anything other than GGGG. When we introduce our primer to our separated strands, it will ignore the reverse strand and go straight to the GGGG on the forward strand, lining up with each C of the primer opposite a G.
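The way a primer finds its target can be sketched in Python as a sliding search for a stretch where every primer base meets its partner. This follows the simplified model used in this article and ignores details such as strand direction; binding_sites is a hypothetical helper, not part of any lab tool.

```python
# Pairing rule: A pairs with T, and C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def binding_sites(primer: str, strand: str) -> list:
    """Return 0-based start indexes where every primer base pairs with the strand."""
    width = len(primer)
    return [
        start
        for start in range(len(strand) - width + 1)
        if all(PAIRS[p] == s for p, s in zip(primer, strand[start:start + width]))
    ]


forward = "GGGGCTATTCATTCAATCATACACCCAGGGG"
reverse = "CCCCGATAAGTAAGTTAGTATGTGGGTCCCC"

print(binding_sites("CCCC", forward))  # [0, 27]
print(binding_sites("CCCC", reverse))  # []
```

In the example, the CCCC primer matches the forward strand only at its two ends and finds nothing on the reverse strand, which is what lets it mark the boundaries of the segment.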
Now we have highlighted exactly where we need to look. While we are able to highlight either end of the segment in question, there are so many potential variations within the segment (different mutations on different alleles) it would be impossible to create a standard primer for them. How do we actually know what all the variations inside are? The answer to this riddle lies in the chemistry of A, C, T, and G.
Next, we introduce an enzyme called DNA polymerase. A polymerase is a naturally occurring enzyme that builds a new companion strand along a single strand of DNA, adding one matching nucleotide at a time. In this case, we use a specially modified polymerase, and because it starts copying where a primer is attached, it only copies the section we have highlighted with our primers. The process is then repeated, making copies of the copies, and copies of those copies, in an exponential chain reaction. Having many copies means we can run the same test multiple times. Even if there was a slight copy error in some of the polymerase reactions, there will be plenty of copies without this error. This helps to ensure accuracy.
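As a rough illustration of why a chain reaction produces so many copies, the Python sketch below assumes that every existing copy is duplicated exactly once per cycle. That perfect doubling is an idealization chosen for the example, not a figure from our lab, and copies_after is a hypothetical helper name.

```python
def copies_after(cycles: int, starting_copies: int = 1) -> int:
    """Ideal copy count, assuming every copy is duplicated once per cycle."""
    return starting_copies * 2 ** cycles


for cycles in (1, 10, 20, 30):
    print(cycles, copies_after(cycles))
```

Under this assumption, a single starting copy grows to 1,024 copies after 10 cycles and to more than a billion after 30.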
Some chemicals absorb light at one wavelength and then give off light at a different, longer wavelength. This is called fluorescence. A white fluorescent light, for example, is a tube of mercury vapor with a phosphor coating on the inside of the glass. When an electric current passes through the tube, the mercury vapor gives off ultraviolet light, and the coating absorbs that ultraviolet light and fluoresces white. This is what causes the bright white shine. Different fluorescent chemicals give off different colors.
Remember the copies the polymerase made of the DNA? The nucleotides used to build those copies have special chemicals attached to them that fluoresce as well. Our testing machines have lasers tuned to specific wavelengths so that an A struck by the laser fluoresces in a different color than a T.
Now that we have multiple copies, even if some of the copies had a slight copy error, or if the laser had a slight misread, we can make sure the results are reliable. The results of these tests are forwarded to our data assurance team to double-check for accuracy.
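One simple way to picture how many copies protect against an occasional error is a majority vote at each position. The Python sketch below is only an illustration of that idea, not a description of our actual quality checks. The reads are invented, consensus is a hypothetical helper, and ties are not given any special handling.

```python
from collections import Counter


def consensus(reads):
    """Return the most common base at each position across equal-length reads."""
    return "".join(
        Counter(column).most_common(1)[0][0]
        for column in zip(*reads)
    )


# Hypothetical reads of the same segment; the third has one copy error.
reads = ["GGGGCTATTC", "GGGGCTATTC", "GGGGCTCTTC", "GGGGCTATTC"]
print(consensus(reads))  # GGGGCTATTC
```

Because three of the four reads agree at the seventh position, the single misread C is outvoted and the consensus keeps the A.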
We can even verify these results by making reverse primers for the reverse strand. In our example, we simply make a primer of GGGG and it will bind to the reverse strand. We can then run the test in the same way. Thanks to Dr. Chargaff, we know that the nucleotides read from the reverse strand with our GGGG primer will be the exact complement of those read from the forward strand with the CCCC primer.
Unfortunately, the testing process and the chemicals used mean only a single test type can be run on each portion of a sample. We only take a small portion of DNA from every test tube submitted, but even so, it will eventually run out. This is why we cannot run an infinite number of tests on your DNA sample.
The above example is a very simplified version of what all of the different tests in our lab do. Different tests look at different portions of DNA and require different primers. Some tests work better with some types of testing machines than others. The best way to get results for an autosomal test like our Family Finder is not the same as the test for the Big Y.
The testing procedures and DNA regions looked at vary from test to test. This is why when you order an autosomal test you do not also get results for another test type.
Once we have the results for the positions we need, we are able to compare them to the applicable reference sequence. This lets us identify which variants you have. We then compare these variants against our matching database, using our proprietary algorithms, to provide the wide variety of results you see on your account.