The AutoSegment ICW tool on Genetic Affairs for 23andMe and FamilyTreeDNA (FTDNA) profiles has recently received significant enhancements. In short, if the DNA matches linked to overlapping segments are not shared matches (for FTDNA) or do not triangulate (for 23andMe), it can be presumed that these segments come from opposite parental sides.
An AutoSegment analysis first collects and groups all the segments that overlap on each chromosome. Next, the shared matches for these segments are collected and used to group the overlapping segments into segment clusters. These clusters are used to generate the AutoSegment ICW cluster, which is found at the top of the window of the HTML file.
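The first step, grouping segments that overlap on each chromosome, can be sketched as follows. This is only an illustration and not the code Genetic Affairs uses: the `Segment` type, the `group_overlapping` function, and the simple rule that any overlap at all joins a group are assumptions made for the example. The real tool may require a minimum amount of overlap.

```python
from collections import defaultdict
from typing import NamedTuple


class Segment(NamedTuple):
    match: str       # name of the DNA match (hypothetical field names)
    chromosome: str  # e.g. "5" or "X"
    start: int       # start position in base pairs
    end: int         # end position in base pairs


def group_overlapping(segments):
    """Group segments that overlap on the same chromosome.

    Segments are swept from left to right; a segment joins the current
    group when it starts before the furthest end position seen so far.
    """
    by_chromosome = defaultdict(list)
    for seg in segments:
        by_chromosome[seg.chromosome].append(seg)

    groups = []
    for chromosome in sorted(by_chromosome):
        current, current_end = [], -1
        for seg in sorted(by_chromosome[chromosome], key=lambda s: s.start):
            if current and seg.start < current_end:
                current.append(seg)
                current_end = max(current_end, seg.end)
            else:
                if current:
                    groups.append(current)
                current, current_end = [seg], seg.end
        if current:
            groups.append(current)
    return groups
```

Each returned group is a list of segments that chain together through overlaps. Whether the matches behind those segments are actually related to each other is decided later, using the shared match or triangulation data.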
Previously, segments that were part of an overlapping segment cluster but did not have any shared matches with the DNA matches of the other segments were discarded. With the new enhancements these segments are kept, and the data are available along with the ICW groups in a table. The table is brightly colored to indicate where there are ICW clusters and where there are additional segments. Another feature of this table is the built-in ICW matrix (similar to the FTDNA ICW matrix) that shows the segments. Clicking on ‘more info’ brings up the matrix, where grey cells indicate that one of the segments that was not in the cluster is a shared match with some of the members of the cluster. Finally, this table of all the segments can be entered into DNA Painter’s Cluster Auto Painter (CAP) to show the triangulated clusters as well as the segments that do not match. Since I have another blog post about the AutoSegment ICW cluster, this post will primarily be about these new segment clusters.
Also note that, although the name suggests otherwise, the AutoSegment ICW on 23andMe actually employs triangulation data, since the quality of actual triangulation data is better than that of ICW data (especially for high cM matches that share multiple segments).
In a nutshell, these are the steps employed by AutoSegment ICW, including the newest addition to the tool:
- Get the DNA match list down to the lowest cM setting.
- Get the segments (or, in the case of 23andMe, the user-provided match file with all segment data).
- Cluster the segments to find overlapping segment clusters.
- Identify which matches are part of overlapping segment clusters.
- Download ICW matches for the DNA matches linked to segment clusters.
- Redo the segment clustering using the ICW data; overlapping segments are discarded if the underlying matches are not ICW.
- Link matches together if they share a segment in a segment cluster, and create a network of DNA matches.
- Perform AutoCluster clustering and create the chart.
- Up to this point, this is the regular AutoCluster ICW analysis. The steps below are the new part.
- Redo the segment clustering and identify overlapping segment clusters.
- Examine each segment cluster and check whether all DNA matches underlying the segments are ICW; if so, it is a green segment cluster.
- Cluster the segment clusters for which not all segments triangulate, to see if two or more separate segment clusters can be identified.
- Use the separate segment clusters from the previous step to color the segment clusters.
- Create an ICW matrix page per segment cluster, coloring the ICW information with the same colors used for the segment clusters in the table.
- Add the table with colored segment clusters to the main HTML created for the regular AutoCluster ICW.
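The new steps in this list (checking whether a segment cluster is fully ICW, and splitting it when it is not) can be illustrated with a small sketch. It is a simplification under stated assumptions: the function name `split_by_icw` is hypothetical, and the splitting uses plain connected components of the ICW links rather than the AutoCluster clustering the tool itself performs.

```python
from itertools import combinations


def split_by_icw(matches, icw_pairs):
    """Split the matches of one overlapping segment cluster into sub-clusters.

    matches:   names of the matches whose segments overlap
    icw_pairs: set of frozensets, each holding two matches that are
               shared matches (or triangulate) with each other
    Returns (is_green, subclusters). The cluster is "green" when every
    pair of matches is ICW. Sub-clusters are the connected components
    of the ICW links, listed in order of first appearance.
    """
    is_green = len(matches) > 1 and all(
        frozenset(pair) in icw_pairs for pair in combinations(matches, 2)
    )

    subclusters, seen = [], set()
    for name in matches:
        if name in seen:
            continue
        # Walk over everyone reachable from this match through ICW links.
        component, stack = [], [name]
        seen.add(name)
        while stack:
            person = stack.pop()
            component.append(person)
            for other in matches:
                if other not in seen and frozenset((person, other)) in icw_pairs:
                    seen.add(other)
                    stack.append(other)
        subclusters.append(sorted(component))
    return is_green, subclusters
```

With three overlapping matches where only two of them are ICW, this returns a non-green cluster with two sub-clusters, each of which would get its own color in the table.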
23andMe AutoSegment ICW
Figure 1 shows the files that are produced by the AutoSegment ICW analysis for 23andMe data. The first HTML file listed is the AutoSegment ICW AutoCluster, which is the one typically displayed. The second file, the Excel one, is the spreadsheet version of this same AutoCluster, which is very useful for reading the match names when the HTML AutoCluster is very large. The third file, which has ‘no-chart’ in its name just before the HTML extension, contains all the results of the AutoCluster HTML but without the AutoCluster chart at the top. This file is most useful when the cluster is so large that your computer has trouble displaying it. The fourth and last file, whose name ends with ‘segment_clusters’, contains the new, enhanced segment clusters that are used for DNA Painter’s Cluster Auto Painter.
Clicking on the first html file in Figure 1 brings up the AutoCluster at the top and all of the information including the new segment clusters table. The original AutoSegment ICW cluster is shown in Figure 2.
Scrolling down the page, the next section is the chromosome segment statistics per AutoSegment cluster. Clicking on one of the AutoSegment clusters lists all the DNA matches in that cluster, as well as matches that are in other clusters and have grey cells to a match in the first cluster. Further down the page is the list of all the matches in each of the clusters found in the AutoSegment ICW cluster at the top of the window, as well as the individual segment cluster information. All of these features have been explained in more detail in previous blog posts.
Next is the Complete Segment Cluster Information, which is one of the new features. Previously any segments in an overlapping segment cluster that did not have shared matches with the other DNA matches linked to the other segments in the cluster were discarded. Also, matches that did not fit into a cluster, even if there was some overlap, but only to one or two people in the cluster, were not included. Now all matches are used for this table. The matches are color coded so that triangulated matches, matches that share the same segment of the same chromosome and also are shared (ICW) with others in that cluster, are given the same color in the table. Figure 3 shows an example from this colored table.
The first column in this table is the cluster number from the AutoSegment cluster at the top of the window. Ann and Sue are in cluster 23 and Trish is in cluster 1. The next number in the table is the cluster number that will be used when these data are put into DNA Painter CAP. All three matches are on chromosome 5, and their start, end, SNPs and cM values are given. Trish is my known paternal second cousin. Since Trish has a different color in this table than Ann and Sue, I would say that Ann and Sue triangulate with each other and do not match Trish. Clicking on ‘more info’ in the upper left of the table brings up the matrix for these three matches, shown in Figure 4.
The triangulation matrix confirms that Trish does not match Ann and Sue. Since I know Trish is on my paternal side, this would lead me to believe Ann and Sue are maternal.
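To make the idea of the matrix concrete, here is a minimal sketch that prints a text version of such a grid. The function name `icw_matrix` and the cell symbols are invented for this example; the real matrix is an HTML page with colored and grey cells. The only input taken from the example above is that Ann and Sue match each other and neither matches Trish.

```python
def icw_matrix(names, icw_pairs):
    """Return one text row per match.

    'X' marks two matches that are ICW (or triangulate), '.' marks
    two matches that are not, and '-' marks the diagonal.
    """
    width = max(len(name) for name in names)
    rows = []
    for a in names:
        cells = []
        for b in names:
            if a == b:
                cells.append("-")
            elif frozenset((a, b)) in icw_pairs:
                cells.append("X")
            else:
                cells.append(".")
        rows.append(f"{a:<{width}} " + " ".join(cells))
    return rows


for row in icw_matrix(["Ann", "Sue", "Trish"], {frozenset(("Ann", "Sue"))}):
    print(row)
```

The row for Trish contains no 'X', which is the text equivalent of seeing no shared cells between Trish and the other two matches.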
Another new feature of the colored table is that it can be imported into DNA Painter using the Cluster Auto Painter (CAP). It’s always been possible to import the main AutoSegment ICW cluster using CAP, but now all of the segments can be imported. Clicking on ‘Cluster Auto Painter’ at the top of the colored table brings up CAP in DNA Painter, shown in Figure 5.
Select the tester’s gender and then choose the file. The HTML, shown in Figure 1, that ends with ‘segment_clusters’ is the one that contains the segment data from the colored table. It’s still possible to import the main HTML file into CAP, but that file will not hold all segment clusters. Figure 6 shows the clusters after importing the file that ends with segment_clusters.
One thing to note is that each segment cluster is imported as a separate cluster. That includes the single segments that were once part of an overlapping segment cluster but do not have triangulation or ICW evidence to be part of the main segment cluster. For example, I share nineteen segments with my cousin Trish, but they are in different locations on various chromosomes. Figure 7 shows some of these segments.
Since Trish and I triangulate with different matches on each of these segments, this provides an easy way to group our triangulated matches for each chromosome.
Using the CAP results, we can look again at chromosome 5, which we saw in Figures 3 and 4, where Ann and Sue match each other but do not match Trish. Initially, all of the results are shown as ‘shared or both.’ Since Trish is known to be paternal, I can change her cluster 4 to paternal. Because she does not match Ann and Sue, I can change their cluster 3 to maternal.
Figure 8 shows both clusters as ‘shared or both.’ Figure 9 has the results after moving Trish’s cluster to paternal and Ann and Sue’s cluster to maternal.
FTDNA AutoSegment ICW
The AutoSegment ICW directory for FTDNA contains two more directories that were not present in the 23andMe one. Since matches on FTDNA might have posted a tree with their DNA results, the ancestors and tree directories are included here. The files for the AutoSegment ICW cluster, both as HTML and Excel, the AutoSegment results with ‘no_chart,’ and the ‘segment_clusters’ are the same as for the 23andMe data.
Other than trees and ancestors the displayed results for FTDNA are the same as for 23andMe. Figure 8 shows a more complex match list in the colored table. The first match in the list is my paternal second cousin on my Dad’s father’s side, who I will call Frank. There are a number of matches to Frank on chromosome 20 in this cluster. The matches in blue triangulate with Frank on chromosome 20. Using the matrix for this table entry we can determine the relationship for the yellow, green and red clusters. Clicking on ‘more info’ brings up the matrix in Figure 12.
Frank is part of the large grouping and also has grey cells, indicating matches, to other matches on chromosome 20. Fred and Dan are brothers and match Fred and Joe in the blue cluster, and Karl matches Frank and several others in the blue cluster. The only ones who do not match anyone in the blue cluster are Jim and Van. Since Frank is a known paternal second cousin, Jim and Van are most likely maternal. Looking at the surnames Jim listed on FTDNA, I can tell that his ancestors were from Germany. My mother’s entire family was from Germany. Unfortunately, neither Jim nor Van has a family tree, so it would not be easy to find the connection to my mother’s family.
Using CAP with this FTDNA cluster we can look at these clusters in DNA Painter.
The new addition to the AutoSegment ICW tool for FTDNA and 23andMe provides all the information that was available before and adds important new features. Now all the segment data are included, which shows matches that have grey cells to the main triangulated cluster as well as DNA matches that did not fit into the cluster. The matches that do not fit are likely from the opposite side of your family.
For example, if the cluster of triangulated matches is on your maternal side, and there are other DNA matches that do not belong in the cluster, they are likely paternal. This can provide valuable hints for searching for family members that might have been overlooked before.
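This reasoning can be written down as a small rule. The sketch below is a hypothetical helper, not part of the tool: `infer_sides` and its labels are made up for illustration. It assumes that the sub-clusters cover the same chromosome location and that there is no ICW or triangulation between them, and it only suggests an opposite side when exactly two sub-clusters are present, since with three or more the picture is ambiguous.

```python
OPPOSITE = {"paternal": "maternal", "maternal": "paternal"}


def infer_sides(subclusters, known_sides):
    """Suggest a parental side for each sub-cluster of one segment cluster.

    subclusters: list of lists of match names with no ICW between lists
    known_sides: dict mapping a match name to 'paternal' or 'maternal'
    Returns one label per sub-cluster: a known side, 'likely <side>'
    when it is inferred from the other sub-cluster, or 'unknown'.
    """
    labels = []
    for cluster in subclusters:
        sides = {known_sides[m] for m in cluster if m in known_sides}
        # Conflicting or missing evidence inside a sub-cluster stays unknown.
        labels.append(sides.pop() if len(sides) == 1 else "unknown")

    if len(subclusters) == 2:
        first, second = labels
        if first in OPPOSITE and second == "unknown":
            labels[1] = "likely " + OPPOSITE[first]
        elif second in OPPOSITE and first == "unknown":
            labels[0] = "likely " + OPPOSITE[second]
    return labels


print(infer_sides([["Ann", "Sue"], ["Trish"]], {"Trish": "paternal"}))
```

The word "likely" is deliberate: the result is a hint for further research, in the same way the paternal and maternal assignments in DNA Painter are made by hand after reviewing the evidence.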
Another application of this enhancement might be to assist researchers in obtaining information about each parent of the tested person (e.g., in the case of adoptees, Does, or perpetrators). For example, the DNA matches linked to the segments on one side may often have a different ethnicity than the matches linked to the opposite-sided segments. Following this approach, it might be possible to identify parents that are linked to an ethnicity that is underrepresented in the DNA database. In this scenario, almost no opposite-sided segment clusters are present because there are almost no DNA matches on the side of the underrepresented parent. If there are opposite-sided segment clusters, these might provide some essential clues to the ethnicity of the parents.