
DNA Sequencing Technologies

The complete human genome includes over 3 billion base pairs. Sequencing the entire genome has historically been cost-prohibitive and time-consuming, so techniques that target specific regions of DNA have been developed and refined over time. This allows researchers to concentrate on the regions that are relevant to their research.

Sequencing methods have improved over time, but many older methods are still in use today. They are usually grouped into “generations.”

First Generation

Sanger Sequencing

In 1977, a British biochemist named Fred Sanger developed a process that became the industry standard for decades.

DNA is a long chain built from four nucleotide bases, which we usually refer to by their first letters: A, C, T, and G. The nucleotides are also called bases or, because the two strands of DNA pair up, base pairs, and each one's location in the chain is referred to as a position. Most of this chain is identical in all humans, so we can design short artificial chains of nucleotides (primers) in a pattern that will only connect to a specific point in the DNA chain.

In the Sanger method, the DNA fragments that contain the positions to be read are first isolated and mixed with a primer and several types of added nucleotides, which attach themselves to the target DNA one base at a time to build copied chains.

The length of each copied chain depends on where copying stops, and copying stops when a special fluorescent nucleotide attaches at the end of the chain. Which fluorescent nucleotide attaches depends on the base at that position. The chains are then sorted by length in a gel. Each type of fluorescent nucleotide shines a different color (fluoresces) when excited by a laser, and a detector records that color. The sequence of the DNA fragment is then reconstructed from those measurements to form a complete picture, or read, of the target section.
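The reconstruction step can be pictured with a short sketch. It is a simplified illustration, not a description of real base-calling software. It assumes each copied chain has already been measured as a (length, color) pair; the color-to-base table and the name call_bases are made up for the example. Sanger chains are read out from shortest to longest, so sorting by length recovers the order of the bases.

```python
# Hypothetical color assignment; real instruments use their own dye sets.
DYE_TO_BASE = {"green": "A", "blue": "C", "yellow": "G", "red": "T"}


def call_bases(chains: list[tuple[int, str]]) -> str:
    """Turn (chain length, terminal dye color) measurements into a sequence.

    The chain that stopped after 1 base reveals position 1, the chain that
    stopped after 2 bases reveals position 2, and so on.
    """
    ordered = sorted(chains)  # shortest chain first
    return "".join(DYE_TO_BASE[color] for _, color in ordered)


# Measurements can arrive in any order; sorting by length restores it.
measurements = [(3, "yellow"), (1, "green"), (2, "red"), (4, "blue")]
print(call_bases(measurements))
```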

Figure: Electropherogram

Advantages and Disadvantages

Sanger sequencing has a very low error rate, with an accuracy of around 99.99%. It is also excellent at reading through repetitive regions of the genome that can confuse newer methods. It is effective at sequencing relatively small fragments of DNA of no more than about 900 base pairs in length. If a large area of the genome needs to be sequenced, the DNA must be broken into smaller fragments, sequenced separately, and then reassembled into the larger whole. This is a process called shotgun sequencing.
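The reassembly step of shotgun sequencing can be sketched as a greedy overlap merge: repeatedly join the two fragments that share the longest overlap. This is a toy model under strong assumptions (error-free fragments, all from the same strand, no repeated regions), and the function names are invented for the illustration. Real assemblers are considerably more sophisticated.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> str:
    """Merge the best-overlapping pair until no overlaps remain."""
    pieces = list(dict.fromkeys(fragments))  # drop exact duplicates
    if not pieces:
        return ""
    while len(pieces) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(pieces):
            for j, b in enumerate(pieces):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_len:
                        best_len, best_i, best_j = size, i, j
        if best_len == 0:
            break  # the remaining pieces share no overlap
        merged = pieces[best_i] + pieces[best_j][best_len:]
        pieces = [p for k, p in enumerate(pieces) if k not in (best_i, best_j)]
        pieces.append(merged)
    return max(pieces, key=len)  # longest assembled piece


# Three overlapping fragments, supplied out of order.
fragments = ["TGAAGTCA", "ATGGCGTAC", "CGTACCTGAAG"]
print(greedy_assemble(fragments))
```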

Although Sanger sequencing is not widely used in genetic genealogy today, it is still regularly used for single STR/SNP and STR/SNP panel testing. Its main disadvantages are that it is not as fast as the newer methods of DNA sequencing and is commercially cost-prohibitive to expand to larger regions of DNA.

Microarray Genotyping

In 1995, a team at Stanford University developed a technique to target individual nucleotides rather than regions or fragments. Instead of reading all the DNA letters in a sequence, a microarray performs a massive "spot-check" at hundreds of thousands of pre-selected locations.

Figure: Simplified Microarray Sequencing

Specialized microarray chips contain a specific set of artificial DNA fragments (probes) that carry a fluorescent dye. Each probe targets a SNP (single-nucleotide polymorphism), a position where a single base commonly differs between people. Chip versions vary slightly, but on average they report around 600,000-800,000 SNPs. A person's DNA sample is washed over the chip, and the DNA fragments stick to the matching probes. A laser scanner then measures the fluorescence at each spot to determine which genetic variant the person has at each of those known SNP locations.

Most of those SNPs are tested by all genetic genealogy companies, but each company's chip also includes a custom list of SNPs unique to that company. Most chips used by genetic genealogy companies include some known Y-DNA or mtDNA locations, and are able to provide a mid-level haplogroup based on that data.
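How a fluorescence measurement becomes a reported variant can be illustrated with a toy genotype caller. The sketch assumes each SNP location yields two intensity values, one per possible allele. The thresholds, the "--" no-call symbol, and the name call_genotype are hypothetical choices for the example, not any vendor's actual calling method.

```python
def call_genotype(allele_a: str, allele_b: str,
                  intensity_a: float, intensity_b: float,
                  min_signal: float = 0.2,
                  het_band: tuple[float, float] = (0.3, 0.7)) -> str:
    """Call a two-letter genotype from two fluorescence intensities.

    All thresholds are illustrative assumptions.
    """
    total = intensity_a + intensity_b
    if total < min_signal:
        return "--"  # signal too weak to make a call
    share_b = intensity_b / total
    if share_b < het_band[0]:
        return allele_a * 2  # both chromosome copies carry allele A
    if share_b > het_band[1]:
        return allele_b * 2  # both chromosome copies carry allele B
    return allele_a + allele_b  # one copy of each allele


print(call_genotype("A", "G", 0.95, 0.05))
print(call_genotype("A", "G", 0.50, 0.48))
print(call_genotype("A", "G", 0.05, 0.04))
```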

Advantages and Disadvantages

Microarray technology is cheap, fast, and scalable. Most of the cost is in designing and building the chips in the first place; once the chips are mass-produced, each test is cheap and quick to run.

Humans have two complete sets of chromosomes, one received from each parent. The probes in microarray chips are designed to bind at specific locations regardless of which chromosome copy (maternal or paternal) they occur on, and to report the values at those locations. Because every probe targets a location chosen in advance, a microarray can only detect known SNP locations and is completely blind to any new, rare, or family-specific variant that is not already printed on the chip.

Binding to both chromosome copies also produces two values for each location, one from each copy. Microarrays report the values from both copies together, with no assignment of individual nucleotide values to the mother's or father's half of a person's DNA. Further analysis (phasing) is therefore needed to identify which nucleotide values belong on which chromosome.
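The need for phasing can be shown by listing every pair of chromosome copies that would produce the same unphased report. This is a minimal sketch with an invented function name; it only enumerates the possibilities and makes no attempt to choose the correct one, which is what real phasing tools do.

```python
from itertools import product


def possible_copy_pairs(genotypes: list[str]) -> set[frozenset[str]]:
    """Every pair of chromosome copies consistent with unphased genotypes.

    Each genotype is a two-letter string such as "AG", with one letter
    per chromosome copy in no particular order.
    """
    pairs = set()
    # At each location, either letter could sit on the first copy.
    for flips in product([False, True], repeat=len(genotypes)):
        copy1 = "".join(g[1] if f else g[0] for g, f in zip(genotypes, flips))
        copy2 = "".join(g[0] if f else g[1] for g, f in zip(genotypes, flips))
        pairs.add(frozenset([copy1, copy2]))  # unordered pair
    return pairs


# Two locations with differing letters give two possible arrangements.
for pair in possible_copy_pairs(["AG", "CC", "CT"]):
    print(sorted(pair))
```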

These limitations do not affect goals such as finding a 3rd cousin, since that connection relies on sharing large, common blocks of DNA. However, they make the technology unsuitable for advanced Y-DNA or mtDNA research, which depends on discovering new defining mutations.

Second Generation

Next-Generation Sequencing (NGS)

The next generation of DNA sequencing techniques built upon and expanded the shotgun sequencing approach used with Sanger sequencing to reconstruct long DNA sequences. Instead of sequencing one piece at a time, NGS breaks DNA into millions of very short pieces and reads them all at once, then puts the results back together through data analysis.

NGS revolutionized the biological sciences. It is the technology that made it possible to sequence an entire human genome in a fraction of the time and cost of the original Human Genome Project.  NGS is the foundational engine for all modern genomics, including cancer research, clinical diagnostics, and studying the microbiome.

Figure: Simplified Next Generation Sequencing

The process, often likened to a "shredder," begins by fragmenting the genome into millions of short DNA pieces (typically 100-150 bases). These fragments are then washed over a glass flow cell containing thousands of wells. Each well is filled with specialized primers designed to bind to fragments within the targeted region of DNA.

After the fragments are attached to the flow cell, the sequencer reads all of them at once. In this process, the machine adds fluorescent DNA letters (A, T, C, G) one by one, and a camera takes a picture with each addition to record the sequence of every fragment.

The sequence recorded for each fragment is called a “read,” and each read reports a small section of DNA. Finally, a powerful computer pieces these reads together in the proper sequence and maps them to a human reference genome.
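Mapping reads to a reference can be sketched as a search for where each read occurs in the reference sequence. The example below assumes error-free reads and exact matching, which real aligners do not require, and the function name map_reads is made up for the illustration.

```python
def map_reads(reference: str, reads: list[str]) -> dict[str, list[int]]:
    """Find every 0-based position where each read matches the reference."""
    placements: dict[str, list[int]] = {}
    for read in reads:
        positions = []
        start = reference.find(read)
        while start != -1:
            positions.append(start)
            start = reference.find(read, start + 1)  # look for later matches
        placements[read] = positions
    return placements


reference = "ATGGCGTACCTGAAGTCA"
for read, positions in map_reads(reference, ["GGCGT", "CCTGA", "TTTTT"]).items():
    print(read, positions)
```

A read with an empty list found no place on the reference, and a read with more than one position cannot be placed with certainty.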

Figure: A Simplified Example of Next Generation Sequencing Reads

The number of reads for any particular position of a tester’s actual DNA will vary by test, and any single test will have areas of high or low coverage or even no coverage at all. As demonstrated in the graphic above, occasionally a fragment is misaligned to the rest of the genome, meaning that it is attached to the wrong place in the sequence. This makes it essential to read each area multiple times, with many fragments covering the same area either in whole or in part, to produce reliable results and exclude misreads. The number of times a location is read is referred to as the Read Depth.

Tests such as mtFull Sequence, Big Y-700, and Whole Genome Sequencing tests all use NGS technology. Because NGS has so many uses, many variations of it have evolved over the past two decades.
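Read depth is simple to compute once each read has been placed on the reference. The sketch below assumes every read is described by a hypothetical (start, length) pair and counts how many reads cover each position.

```python
def read_depth(region_length: int, alignments: list[tuple[int, int]]) -> list[int]:
    """Count the reads covering each position of a region.

    Each alignment is (start position, read length), 0-based.
    """
    depth = [0] * region_length
    for start, length in alignments:
        # Clip each read to the region before counting it.
        for pos in range(max(start, 0), min(start + length, region_length)):
            depth[pos] += 1
    return depth


# Three overlapping 4-base reads over a 10-base region.
depth = read_depth(10, [(0, 4), (2, 4), (3, 4)])
print(depth)
print("positions with no coverage:", depth.count(0))
```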

Advantages and Disadvantages

NGS reads every nucleotide within the targeted area, whether that region has been previously sequenced or not. Unlike microarray chips, which have specific probes for specific, known SNPs, NGS allows us to explore entire regions of the genome in depth. This allows genetic genealogists to discover new, family-defining SNPs.

The primary disadvantage of NGS is that the short reads resulting from the small, “shredded” fragments make the computer assembly challenging, especially in repetitive parts of the genome such as the Y-chromosome. This can make correct reassembly a bit like reassembling a 1,000-piece puzzle of a clear blue sky.
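The puzzle problem can be made concrete by asking how many read-sized windows of a sequence are unique, meaning they could only have come from one place. The sketch below uses an invented toy sequence with a repeated "GATA" block and a made-up function name; it is an illustration of the idea, not a measurement of any real chromosome.

```python
from collections import Counter


def unique_fraction(reference: str, read_len: int) -> float:
    """Fraction of read-length windows that occur only once in the reference."""
    windows = [reference[i:i + read_len]
               for i in range(len(reference) - read_len + 1)]
    counts = Counter(windows)
    return sum(1 for w in windows if counts[w] == 1) / len(windows)


# A toy sequence with a block of five "GATA" repeats in the middle.
reference = "ACGT" + "GATA" * 5 + "TTCA"
for read_len in (4, 8, 24):
    print(read_len, round(unique_fraction(reference, read_len), 2))
```

Windows that fall entirely inside the repeat look identical to one another, so a short read taken from there fits in several places equally well.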

NGS Variation:  Targeted Panel Sequencing

Targeted Panel Sequencing covers specific regions rather than the whole genome. This allows the target area to be read many times in a cost-effective way, giving accurate results in the areas most useful to genealogists. Targeted regions can be as small as a single gene or large enough (like Big Y-700) to return data on 14-18 million positions.
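The cost advantage comes down to simple arithmetic: the same amount of sequencing spread over a smaller area reads each position more often. The sketch below uses a hypothetical budget of 3 billion sequenced bases, and the function name is invented for the example; the two area sizes are rough figures taken from this article.

```python
def expected_depth(sequenced_bases: int, target_size: int) -> float:
    """Average number of times each position is read."""
    return sequenced_bases / target_size


budget = 3_000_000_000        # hypothetical number of bases sequenced
whole_genome = 3_000_000_000  # roughly the size of the human genome
panel = 15_000_000            # a target within the 14-18 million range

print("whole genome:", expected_depth(budget, whole_genome))
print("targeted panel:", expected_depth(budget, panel))
```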

In the picture below, if the DNA strand represents all of a tester’s chromosomes (including the Y and X chromosomes) plus their mitochondrial DNA, the targeted panel test has been pre-configured to target two regions of DNA marked in orange.

Figure: Targeted Panel NGS Testing Targeting Two Regions of Interest

Advantages and Disadvantages

The deep coverage provided by targeted panels gives high sensitivity for calling rare variants, especially the rare, recent SNPs that place a person on a specific branch of the human Y-DNA or mtDNA family tree.

This also makes it a specialist tool: it covers the specific regions of interest and does not provide any information about other regions. Targeted panel NGS is used for two of our tests, Big Y-700 and mtFull Sequence.

NGS Variation:  Whole Genome Sequencing

Whole Genome Sequencing (WGS) is an umbrella term that can be misleading if not understood properly. The total amount of your genome that a WGS test actually reports depends on the read depth of that WGS test.

The read depth refers to the average number of times any specific position in the genome is expected to be read. Each read can be thought of as proof-reading a piece of writing before submission. The more times the position is read, the more confident we are that it was proof-read accurately.

The read depth, and with it the confidence, is determined in advance for each WGS test and is typically expressed as a multiple (e.g., 30x). A higher read depth means a position will likely be sequenced more times, which increases reliability in distinguishing true variants from sequencing errors. WGS tests at lower read depths also have larger areas of zero reads. Sequencing costs, the amount of data returned, and data analysis and storage costs are all directly related to read depth (and by extension, test coverage).
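The link between average read depth and areas of zero reads can be estimated with a simple model. The sketch below assumes reads land at random and evenly across the genome, so that the depth at any position follows a Poisson distribution. That is an idealization (real coverage is less even), and the function names are invented for the example.

```python
import math


def prob_depth(mean_depth: float, k: int) -> float:
    """Poisson probability that a position is read exactly k times."""
    return math.exp(-mean_depth) * mean_depth ** k / math.factorial(k)


def fraction_below(mean_depth: float, min_reads: int) -> float:
    """Expected fraction of positions read fewer than min_reads times."""
    return sum(prob_depth(mean_depth, k) for k in range(min_reads))


for depth in (0.5, 2, 30):
    unread = fraction_below(depth, 1)    # positions with zero reads
    shallow = fraction_below(depth, 10)  # positions with fewer than 10 reads
    print(f"{depth}x: unread {unread:.2%}, under 10 reads {shallow:.2%}")
```

Under this model, lowering the average depth increases both the share of positions that are never read and the share that are read too few times to be confident in the result.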

Figure: Representative Reads for Both a Low-Coverage and a High-Coverage WGS Test

Selecting an appropriate coverage depends on the purpose of the testing. Low-coverage WGS may be enough for low-cost autosomal testing, for instance, while other purposes may require a much deeper level of coverage for analysis by expert users.

Advantages and Disadvantages

High-coverage WGS tests still have high costs and high amounts of data to be stored and analyzed, but represent the highest resolution of an entire genome available today.

A low-pass/low-coverage WGS test offers the advantage of low cost and lower amounts of data. It can also be used to report or impute roughly the same 700,000 SNPs as a microarray test and so produce data output comparable to standard autosomal tests.

However, low-pass WGS does not report enough of the genome at higher read depths to be comparable to a high coverage WGS test or to the regions reported by a targeted panel NGS test.  

It is important to note that any WGS test is still not a complete representation of a tester’s genome. WGS tests suffer from the same limitations as other NGS tests: they cannot reliably phase novel SNPs to the correct parental side of the chromosomes, and they cannot reliably report large structural variants such as many STRs.

Conclusion

DNA sequencing techniques vary depending on the research goals, target areas, and depth of coverage. All are still in use today for different purposes, and understanding what terms like Whole Genome Sequencing really mean is crucial when deciding which test to get. Just like different tools in your workshop, each provides a solution to a unique problem.

This article is an abbreviated version of Dave Vance’s original blog post on the same topic. Please see his post for more detailed information.
DNA Sequencing Technologies Explained
