Y-DNA FTDNATiP™ Report Introduction

The FTDNATiP™ Report provides an estimate of how long ago two Y-STR matches shared a common paternal ancestor. This feature is similar to the time estimates given for the age of haplogroups in the Discover Time Tree.

This report provides a chart of time estimates for genetic distance (GD) at each test level. As the number of markers increases, we are able to predict time ranges with more precision, so a GD of 1 at the 12-marker level gives a much wider date range than a GD of 1 at the 111-marker level.

Genetic Distance

Genetic Distance (GD) refers to the number of differences in Short Tandem Repeats, or STR markers, between two people. You can read more about GD here, but in general, the higher the GD, the further back in time the two people probably share a common ancestor. Each marker level has its own match limit, which is generally proportional to the number of markers. A GD of 10 at the 12-marker level is highly unlikely to be genealogically relevant, while the same GD at 111 markers may very well be. This limit explains why the TiP report offers gradually more GD options as the marker level increases.
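As a rough illustration of the definition above, the sketch below counts the markers on which two STR profiles differ. It is a simplification and not the calculation used in the report: it assumes every differing marker adds exactly 1 to the GD, however many repeats apart the two values are. The function name and the repeat counts are hypothetical and chosen only for the example.

```python
def genetic_distance(profile_a, profile_b):
    """Count the markers on which two STR profiles differ.

    Simplifying assumption: each differing marker counts as 1,
    regardless of how many repeats separate the two values.
    Only markers present in both profiles are compared.
    """
    shared_markers = profile_a.keys() & profile_b.keys()
    return sum(1 for m in shared_markers if profile_a[m] != profile_b[m])


# Hypothetical repeat counts, for illustration only.
person_a = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11}
person_b = {"DYS393": 13, "DYS390": 25, "DYS19": 14, "DYS391": 10}

gd = genetic_distance(person_a, person_b)
print(gd)
```

Here the two profiles differ at two of the four markers, so the sketch reports a GD of 2.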

How the FTDNATiP™ Report was created

The best way to understand how the TiP report was developed is by an example:

Take two people who match each other with a genetic distance of 2 at the Y-37 STR marker level. Make sure both of these people have also taken a Big Y-700 test. Next, take a look at their confirmed haplogroups. These haplogroups are each determined by a specific SNP, and those SNPs have age estimates provided by the Discover Time Tree.

If the two men have different haplogroups, trace the two haplogroups back on the tree to the “parent” SNP. This SNP will have an age estimate as well.

For example, say the two matches have haplogroups of I-FT139081 and I-A19407. The “parent” SNP of both of these is I-A9783. This means that I-A9783 represents the common ancestor for the two matches. Therefore, the age of I-A9783 is also the TMRCA for the two matches.
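The tracing step in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tree is a hypothetical child-to-parent lookup that contains just the three haplogroups named above, with both matches placed directly under I-A9783, and the function names are invented for the sketch. The real haplotree is far larger and may have intermediate branches.

```python
def path_to_root(haplogroup, parent_of):
    """Return the haplogroup followed by each of its ancestors in order."""
    path = [haplogroup]
    while path[-1] in parent_of:
        path.append(parent_of[path[-1]])
    return path


def shared_ancestor(haplogroup_a, haplogroup_b, parent_of):
    """Return the most recent haplogroup found on both lines, or None."""
    line_a = set(path_to_root(haplogroup_a, parent_of))
    for node in path_to_root(haplogroup_b, parent_of):
        if node in line_a:
            return node
    return None


# Hypothetical, heavily simplified tree: child -> parent.
parent_of = {
    "I-FT139081": "I-A9783",
    "I-A19407": "I-A9783",
}

ancestor = shared_ancestor("I-FT139081", "I-A19407", parent_of)
print(ancestor)
```

The age estimate attached to the haplogroup returned here is what serves as the TMRCA for the pair. If both men have the same haplogroup, the sketch returns that haplogroup itself.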

Now expand your search to ALL the people in the database who match each other with a Genetic Distance of 2 at the 37-marker level and find the ancestral SNP that each pair of men has in common. This will be unique to each pair. For example, the ancestral SNP for two men in the R haplogroup will be different from that of two men in the E haplogroup. Each of these SNPs will have its own age estimate. When we combine all of these age estimates, we get a distribution that looks like this:

What does the graph tell us?

The graph is broken down into three color-coded sections, designated on the Legend to the right of the graph. These are distinct Confidence Intervals, or CI. A CI refers to the percentage of matches who share a SNP within a given age range. The 99% CI covers the widest span of ages currently in our database. Since there may be matches who share a SNP outside this age range who have not yet tested, we cannot be 100% certain that this is the definitive age range, so we list it as 99%.

You can see the 99% CI is the largest because it contains the most possible age estimates, including extreme outliers. The 95% CI means that 95% of the matching pairs in the database share a SNP within that age range, and the 68% CI means that 68% of them do. The 68% CI is the narrowest age range, as it covers the smallest subset of the database. If you are basing your research on the 68% CI, there is a 32% chance that your shared SNP is outside this age range. This increases the risk of inferring incorrect information.
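To make the relationship between the three intervals concrete, here is a minimal sketch that derives nested age ranges from a list of age estimates. It assumes an equal-tailed interval, trimming the same share of values from each end of the sorted list; the method actually used for the report is not described here and may differ. The function name is hypothetical and the ages are synthetic, evenly spaced values that do not come from any real database.

```python
def central_interval(ages, coverage):
    """Return (low, high) bounds that hold roughly `coverage` of the ages.

    Assumption: an equal share of values is trimmed from each tail
    of the sorted list.
    """
    ordered = sorted(ages)
    n = len(ordered)
    trim = round((1.0 - coverage) / 2.0 * n)  # values dropped per tail
    return ordered[trim], ordered[n - 1 - trim]


# Synthetic age estimates in years, for illustration only.
ages = list(range(100, 2100, 10))

intervals = {level: central_interval(ages, level) for level in (0.68, 0.95, 0.99)}
for level, (low, high) in intervals.items():
    print(f"{level:.0%} CI: {low} to {high} years")
```

Whatever the input, each wider interval contains the narrower ones, which is why the 68% range sits inside the 95% range, and the 95% range inside the 99% range.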

For this reason we offer three levels of confidence intervals. Most genealogists recommend the 95% CI.

Expanding the report

This same technique was applied to all possible genetic distances at all possible Y-STR matching levels. We calculated the time estimate at three confidence levels: 68%, 95%, and 99%. This gives genealogists different options on which they can base their research. Here is the chart for 95%. This means that 95% of all men with a certain genetic distance at a certain testing level share a common ancestor in this time range. When you view the TiP Report, you can click on any given Genetic Distance estimate to get a detailed report that looks like this:

You will notice the highest possible GD on the chart is 10, for 111 markers. In the last row there is an option to show the entire range simultaneously.

Clicking on one of these options will display all possible GD ranges for that marker level. Here is an example of the first option, for Y-12:

Notice that the only two levels shown are for a GD of 0 and 1, as these are the only options available. The GD of 1 has a wider time range, but a lower possible likelihood, while the GD of 0 has a narrower range but a higher likelihood. This corresponds to the general confidence interval of these two levels.

When you view the same chart for the Y-111 level, there are many more options displayed, yet the same pattern occurs, with an inverse relationship between the time range and the likelihood.

Things to remember

There will always be statistical outliers. In our beta version of the Time Tree, we have worked to address these outliers. One thing that is known to affect the time estimate is null values. Null values occur when the lab is unable to find the value for a particular STR marker. This can happen for a number of reasons. You can read more about null values here, and how they affect genetic distance here.

Null Values

Null values affect matching, so samples that had a null value in their STR markers were not included in the dataset used to create the TiP report and Time Tree. For statistical analysis, this is especially important as people who have null values tend to have multiple null values at multiple locations, and this amplifies the differences they have with the majority of the data set.
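The exclusion described above amounts to a simple filter applied before any statistics are computed. The sketch below is illustrative only: it assumes a null is stored as `None`, which may not match how null values are actually recorded, and the function name, sample names, and repeat counts are hypothetical.

```python
def has_null(profile):
    """Return True if any marker in the STR profile has no recorded value."""
    return any(value is None for value in profile.values())


# Hypothetical samples; None stands in for a null value (an assumption).
samples = {
    "sample_1": {"DYS393": 13, "DYS390": 24, "DYS19": 14},
    "sample_2": {"DYS393": 13, "DYS390": None, "DYS19": None},
    "sample_3": {"DYS393": 12, "DYS390": 23, "DYS19": 15},
}

# Keep only the samples with a value at every marker.
clean_samples = {
    name: profile for name, profile in samples.items() if not has_null(profile)
}
print(sorted(clean_samples))
```

A sample with even one null is dropped entirely, which matches the description above of leaving such samples out of the dataset rather than trying to correct them.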

Conclusion

DNA testing is an ever-evolving science. As the science improves and the database grows, we are able to refine our estimates and develop more accurate techniques. It is our hope that as time goes on, this report will only improve. For now, it represents the most up-to-date information gathered from the world’s largest Y-DNA haplotree.