Comparison of ICW AutoCluster and AutoSegment AutoCluster

I’ve been wanting to do a comparison between the In Common With (ICW) clusters and AutoSegment clusters ever since AutoSegment came out. So today I finally did that! The ICW clusters are the ones that are available either on the testing site itself, such as MyHeritage or GEDmatch, or from your account on Genetic Affairs. The AutoSegment clusters are relatively new and use the csv files that you download from the different testing sites. In previous blog posts, I already discussed the AutoSegment and hybrid AutoSegment tools.

23andMe

I started with 23andMe mainly because I’ve worked most with that data, since my cousin Patricia Harris Anthony (Trish)1 tested there. First I downloaded aggregate data from 23andMe so I’d have the latest list of matches. Then I ran Genetic Affairs ICW cluster from 600 cM to 30 cM, with a minimum shared 15 cM and cluster size of 2. Trish is listed as sharing 402 cM with me, and I definitely wanted to include her. I also ran Genetic Affairs AutoSegment cluster with the exact same cM range and parameters, using the new ‘aggregate data’ I’d just downloaded from 23andMe.

I mainly used Excel as my companion, along with DNA Painter. DNA Painter is my main DNA match data storage location. I paint both known and unknown matches, mainly looking at triangulated matches to group the data. Of course I was using 23andMe to check on matches that I’d not painted into DNA Painter. I started by opening the Excel files from both the ICW and the AutoSegment runs. In the ICW Excel file I copied columns A-E, which included the match name (column B), the total cM (column C) and the cluster number (column E), and placed the copy into a new Excel file in columns A-E. Looking at ICW cluster 1 there were 5 matches listed with Trish as the first match. So I searched for Trish in the AutoSegment file and copied the cluster that she was in there into columns H through K.

Figure 1. 23andMe ICW AutoClusters 1 and 2 and AutoSegment cluster 1.

The ICW cluster 1 has Trish matched to different people than AutoSegment cluster 1 does. Trish and I match on 19 DNA segments so that’s not too surprising. But notice Sam, Mark and Keith Smith are shown in cluster 2 of the ICW. Keith is father of Mark and brother of Sam. So I’ll move them from the AutoSegment cluster down and look for Laura, Sue, Bill and Micky in the AutoSegment clusters. They were in cluster 2 of the AutoSegments. I’ll add them to the Excel file.

Figure 2. 23andMe ICW AutoClusters 1 and 2 and AutoSegment clusters 1 and 2.

Looking at the ICW cluster I can see that most of these should triangulate with Trish. The helix symbol on the square shows this triangulation. Both the ICW and the AutoSegment clusters are shown in figure 3.

Figure 3. 23andMe ICW clusters 1 and 2 on the left and AutoSegment cluster on the right.

Below the AutoSegment cluster is a table of chromosome segment statistics. By clicking on the live link for cluster 1 I can see the list of matches in cluster 1 and which segment clusters or segments underlie this AutoSegment cluster. Figure 4 shows the list for AutoSegment cluster 1. AutoSegment cluster 1, the orange cluster, shows Trish, Mark, Sam and Keith. The chromosome segments that are directly linked to this cluster 1 are listed in the chart. Mark and Keith match me on chr 1 and they and Sam also match Trish and me on chr 17. Also in this chart are the segment clusters that are indirectly linked to cluster 1. These are from the green cluster 2 and are connected to Trish by the grey cells. They show Sue, Laura and Bill, who triangulate with Trish and me on chr X.

Figure 4. Segment cluster details for AutoSegment cluster 1.

The ICW AutoCluster 1 and AutoSegment cluster 2 match up. I noticed in AutoSegment cluster 2 that Micky was not listed. Looking at my shared matches to Trish in 23andMe I see that Micky’s match says ‘share to see’. Without messaging Micky and asking her to share her DNA with me, I can’t see exactly where she matches Trish and me. Also looking at the details for AutoSegment cluster 2, I see that Laura, Sue and Bill match Trish and me on chr X. I could add Laura, Sue, and Bill to the 23andMe ‘Advanced DNA Comparison’ with Trish and see how they match, but since I’ve already painted them in DNA Painter, it’s just easier for me to look there. Figure 5 shows my DNA Painter paternal chr X.

Figure 5. Trish, Laura, Sue and Bill painted on my paternal chromosome X.

They all match Trish on the X chromosome. Trish and I share paternal great grandparents, Thomas Byrnes and Bridget Fenton. These were my Dad’s maternal grandparents. This X could come from Thomas’ mother Hanora Shannon, from Bridget’s mother Johanna O’Brien, or from Bridget’s father’s mother Bridget Lillis. 

Next I will look at ICW AutoCluster 2 which compares to AutoSegment cluster 1. From the segment cluster details in figure 4, I can see that Mark and Keith match me on chr 1 as well as matching Trish and me along with Sam on chr 17. The ICW cluster tells me they all triangulate with Trish and me, and AutoSegment tells me they all share the same segment. After looking at the AutoSegment cluster 1 details I can see that the chr we all share is chr 17. Here I used the ‘Advanced DNA Comparison’ on 23andMe, shown in figure 6. Sam also matches me on chr 1 with 10 cM. I had used 15 cM as the minimum shared amount between matches when I ran the two clusters, which explains why he did not show up as a match on chr 1 in the AutoSegment cluster 1 details.

Figure 6. 23andMe Advanced DNA Comparison of Trish, Keith, Mark and Sam.
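The effect of the 15 cM minimum can be sketched in a few lines of Python. This is only an illustration, assuming the minimum acts as a simple cutoff on each shared segment. Apart from Sam's 10 cM segment on chr 1, the cM values are made-up placeholders, and the function name is hypothetical rather than anything from Genetic Affairs.

```python
# Shared segments as (match, chromosome, cM). Only Sam's 10 cM on chr 1 comes
# from the post; every other value is a made-up placeholder.
segments = [
    ("Mark", "1", 21.0),
    ("Keith", "1", 23.0),
    ("Sam", "1", 10.0),
    ("Mark", "17", 18.0),
    ("Keith", "17", 19.0),
    ("Sam", "17", 17.0),
]

MIN_CM = 15.0  # the minimum used for both cluster runs


def segments_at_or_above(segments, min_cm):
    """Keep only the segments that meet the minimum cM threshold."""
    return [seg for seg in segments if seg[2] >= min_cm]


kept = segments_at_or_above(segments, MIN_CM)
chr1_names = sorted(name for name, chrom, _ in kept if chrom == "1")
print(chr1_names)  # Sam is absent: his 10 cM segment is below the cutoff
```

Lowering the hypothetical MIN_CM to 10 would bring Sam back on chr 1, at the cost of letting many more small segments into the clusters.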

I was very excited to notice these matches on chr 1.  Trish and I are trying to figure out where in County Roscommon our great grandfather Thomas Byrnes was born.  I have other triangulated DNA matches on chr 1 at this location whose ancestors lived in the eastern part of County Galway near the County Roscommon border, and one of those ancestors married a Byrnes.  The chr 1 segment is likely a Byrnes segment that I’ve inherited.  I’ll message Keith, since 23andMe says he was on the site last week, and see what he knows of his ancestors in Ireland. 

Next I moved to ICW cluster 3 which has my 2C Frank Barry.  We share paternal great grandparents Edward Barry and Pauline Fröhlich.  These were my Dad’s paternal grandparents.  Going through the same process as before I found Frank in AutoSegment cluster 3 but the rest of the matches in ICW cluster 3 were in AutoSegment cluster 9 along with two new AutoSegment matches Doug and Peggy.  I found Doug and Peggy in the ICW cluster 8. 

Figure 7. Excel file with 23andMe ICW cluster 3 and corresponding AutoSegment clusters added.

I can see that Mary and Larry triangulate with Frank in the ICW cluster 3 because of the helix symbols.

Figure 8. 23andMe ICW clusters 1 – 3.

I’d looked at Mary and Larry before on 23andMe. Mary is Larry’s mother, and they triangulate with Frank and me on chr 3. Not only that but Mary’s grandmother was from Baden-Württemberg, and my great grandmother, Pauline Fröhlich was from Baden-Württemberg!  Likely that segment of DNA that we share came from my great grandmother Pauline.  

Looking at AutoSegment cluster 9, Doug and Peggy are listed on that same segment as Mary and Larry, see figure 9. Peggy is Doug’s daughter. When I ran 23andMe ‘Advanced DNA Comparison’ using Frank with Doug and Peggy there was no match. AutoSegment is looking at all matches that fall in the same location on the chr. It is not differentiating between maternal and paternal. Here is a good example: since Mary and Larry triangulated with Frank and me, they are paternal. Doug and Peggy, on the other hand, did not match Frank at all, but they do triangulate with me, so they must be on my maternal side, since Frank and I only share paternal great grandparents.

Figure 9. Excel file showing 23andMe ICW clusters 1 – 3 and corresponding AutoSegment clusters.

This is how chr 3 on my DNA Painter profile looks now, see figure 10.  Larry and Mary triangulated with Frank and me so they must be paternal.  Doug and Peggy did not match Frank at all, but match me so they are likely maternal. It is possible that there’s not enough overlap between Frank and Doug and Peggy.  But there should be a good bit of overlap between Larry and Doug.  I compared Larry and Doug in 23andMe’s ‘Advanced DNA Comparison’ and they do not share any DNA.  That confirms that Doug and Peggy are on my maternal side.

Figure 10. My DNA Painter profile chr 3.
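The reasoning used for chr 3 can be written down as a small rule. The sketch below is only one way to express it, with hypothetical function and variable names: a match who shares DNA with a confirmed paternal match on the overlapping segment is treated as paternal, a match who overlaps a confirmed paternal match well but shares no DNA with them is treated as likely maternal, and anything else stays undetermined.

```python
def infer_side(match, overlapping_paternal, shares_dna):
    """Guess which side of the family a match is on for one segment.

    overlapping_paternal: confirmed paternal matches whose segment clearly
        overlaps this match's segment.
    shares_dna: set of frozenset pairs of people who share DNA there.
    """
    if not overlapping_paternal:
        return "undetermined"  # nobody suitable to compare against
    if any(frozenset((match, p)) in shares_dna for p in overlapping_paternal):
        return "paternal"
    return "likely maternal"


# Pairs that share DNA on the chr 3 segment, as described in the text.
shares_dna = {
    frozenset(("Mary", "Frank")),
    frozenset(("Larry", "Frank")),
    frozenset(("Mary", "Larry")),
}

print(infer_side("Mary", {"Frank"}, shares_dna))   # paternal
print(infer_side("Doug", {"Larry"}, shares_dna))   # likely maternal
print(infer_side("Peggy", {"Larry"}, shares_dna))  # likely maternal
print(infer_side("Someone", set(), shares_dna))    # undetermined
```

The rule is only as good as the overlap check that feeds it, which is why Doug was compared with Larry and not only with Frank.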

I continued with the 23andMe data in this fashion.  The majority of the matches in the AutoSegment clusters matched with the ICW ones and triangulated with me.  So there weren’t any huge surprises here.
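The Excel lookup described in this section (take each ICW cluster, then find where its members landed in the AutoSegment run) could also be scripted. Here is a minimal sketch, assuming the cluster numbers have already been read out of the two Excel files into plain dictionaries. The memberships below are illustrative and only loosely follow the fictitious names used in this post.

```python
from collections import defaultdict

# match name -> cluster number, one dictionary per cluster run (illustrative data)
icw = {"Trish": 1, "Laura": 1, "Sue": 1, "Bill": 1, "Micky": 1,
       "Sam": 2, "Mark": 2, "Keith": 2}
autosegment = {"Trish": 1, "Sam": 1, "Mark": 1, "Keith": 1,
               "Laura": 2, "Sue": 2, "Bill": 2}


def cross_reference(icw_clusters, autosegment_clusters):
    """Group each ICW cluster's members by the AutoSegment cluster they are in.

    Members missing from the AutoSegment run are grouped under None.
    """
    table = defaultdict(lambda: defaultdict(list))
    for name, icw_cluster in icw_clusters.items():
        table[icw_cluster][autosegment_clusters.get(name)].append(name)
    return {cluster: dict(groups) for cluster, groups in table.items()}


result = cross_reference(icw, autosegment)
for icw_cluster in sorted(result):
    for as_cluster, names in result[icw_cluster].items():
        label = ("not in any AutoSegment cluster" if as_cluster is None
                 else f"AutoSegment cluster {as_cluster}")
        print(f"ICW cluster {icw_cluster} -> {label}: {', '.join(names)}")
```

Anyone grouped under None, like Micky here, is a match to check by hand on the testing site.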

MyHeritage

Next I looked at the MyHeritage ICW and AutoSegment clusters.  Again I downloaded the latest match and shared matches files from MyHeritage.  The ICW cluster used 400-25 cM with a 10 cM minimum between matches and minimum of 3 per cluster, so I ran the AutoSegment cluster using those same parameters.  There were several ICW clusters from MyHeritage where the matches in them did not appear in AutoSegment clusters, unlike the case with 23andMe where only a few matches in a cluster did not appear.  There was also a lot more mixing of clusters than I saw in 23andMe.

In ICW cluster 4 Trish matches 5 people. Three of them are in cluster 2 of the AutoSegment and do triangulate with Trish on chr 15. Bob Burns is not shown on the AutoSegment cluster. He matches Trish on several segments and matches me on chr 1 and 5. The surname Byrnes has had many spelling variations over the years and Bob and Trish and I do match on the Byrnes side of my family. ICW cluster 5 matches with many of the people who were in AutoSegment cluster 1. Henry matches Trish and me on our 2nd great grandmother Johanna O’Brien’s side, as his mother’s maiden name was O’Brien. Henry triangulates with Trish and me on chr 7 as do Guy, Dawn and Alex. Jake triangulates with Trish and me on chr 8, but he is also a cousin of Henry. These are summarized in figure 11.

Figure 11. Excel file showing MyH ICW and AutoSegment clusters with matches to Trish and me.

As I’m doing these comparisons I’m going down the list of the ICW AutoClusters and then finding the corresponding AutoSegment clusters, if there are any. Continuing down the MyH ICW list cluster 16, shown in figure 12, became very interesting.  

Figure 12. MyH ICW cluster 16 and AutoSegment cluster 9.

Looking at the details for AutoSegment cluster 9, shown in figure 13, I see that Clara, Matt and Otto match me on chr 3. I had painted Matt on chr 3 as unknown.  I’d not painted Clara or Otto before. 

Figure 13. MyHeritage Segment Cluster 9 details.

Checking the shared matches with Matt on MyHeritage I found both Clara and Otto triangulated with him.  Both Matt and Clara live in France, and Otto lives in Germany.  I added them to the unknown group with Matt on my DNA Painter profile, as shown in figure 14 .  

Figure 14. My DNA Painter profile chr 3 after adding Matt’s triangulated group.

Looking further down Matt’s shared match list on MyHeritage I found that he triangulated with Terri Grant.  Her name was very familiar and I was sure I’d painted her and others with that same surname. I found that I’d painted her as a triangulated match to Frank Barry, since they had matched on GEDmatch 1 to 1.  Frank Barry is not on MyHeritage, so I’m unable to compare him directly with Matt.  But since Terri is on both GEDmatch and MyH and she triangulates with both Frank and Matt, now I know that Matt and those that triangulate with him must be paternal.  On DNA Painter I can merge Matt’s unknown group into the paternal group with Larry and Mary.

Figure 15. My DNA Painter profile chr 3 after discovering that Matt’s group was paternal.

Not only did I discover two new matches to paint, Clara and Otto, but I was able to merge Matt’s unknown group into a paternal one.  This is the paternal segment that I likely inherited from my great grandmother Pauline, who was born in Baden-Württemberg.  

FamilyTree DNA

When I first thought of looking at my FTDNA data with ICW AutoCluster and AutoSegment I thought that using the two clustering techniques together might help with matches, since FTDNA doesn’t have a triangulation function. But after working with my clusters I can’t say that it did. Both cousin Trish and Frank are also on FTDNA. Unlike with 23andMe or MyHeritage, each of their ICW clusters matched their AutoSegment cluster at FTDNA, and I’d already painted them all.

I went down the list of ICW clusters and did find several interesting things. ICW AutoCluster 35 had 4 matches listed. Three of these were found in AutoSegment cluster 16. There were an additional three matches in AutoSegment cluster 16. I found Edith and Amy in ICW cluster 7. This reminds me of 23andMe ICW cluster 3 and 8 shown in figure 9, and is probably a hint that Edith and Amy are on the opposite side of my family from the group in ICW cluster 35. See figure 16.

Figure 16. ICW AutoCluster 35 and 7, and AutoSegment cluster 16.

I looked at chr 12 on my DNA Painter profile and found that I had painted these matches as two different groups but both of them as unknown, as shown in figure 17.  All of these matches are on the same location as AutoSegment has said. John wasn’t in the AutoSegment cluster and maybe there wasn’t enough overlap to his segment for him to be included, but he shows up on chr 12 with the others.   Next, I looked at the matches in the FTDNA matrix which is shown in figure 18.

Figure 17. Chromosome 12 on my DNA Painter profile.
Figure 18. FTDNA matrix showing the 7 matches.

From the matrix I can see that Edith and Amy would be on one side of my family and the other 5 would be on the other side.  Unfortunately, I don’t know which set is on which side of my family.  Of these seven matches only Beth has a tree and that only contains 2 people. Using the name of the only deceased person in the tree I searched Ancestry’s public trees and found several trees that contained him.  I looked through a couple of those trees and found the surname Burns in both of them.  That surname is on my Dad’s mother’s side of the family.  The common great grandfather that Trish and I share was Thomas Byrnes.  So, it’s possible that the group of 5 in the matrix are on my paternal side, and the group of 2 would then be maternal.  At this point I don’t have enough evidence to be certain of that, and I will just make a note on my DNA Painter profile by their groups.  Perhaps I should email some of the matches in these groups, and see if we can figure out the connection. 

Summary

The ICW AutoCluster gives a listing of your shared matches.  In general, these would all be on one side of your family.  I actually have a couple cases in my family where that is not true, but it does seem to be rare.  So to begin with the hypothesis would be that all the ICW matches in a specific cluster are on one side of your family.  The AutoSegment cluster is telling you all the matches in it are on the same chromosome.  It does not tell you which ones are paternal and which ones are maternal, and the AutoSegment cluster can very well be a mixture of these. 

Each of the sites has different features and needs to be treated a bit differently, so I will summarize them individually. On 23andMe an ICW cluster will have the helix symbols in the squares if the matches triangulate. That makes looking at the AutoSegment cluster very easy because knowing certain segments triangulate identifies them as being on the same side of your family. On MyHeritage, there was more mixing of clusters when I made the comparison between ICW and AutoSegment. It was necessary to check the chromosome browser on MyH to make sure that matches triangulated since looking at either of the clusters did not provide enough information to determine that. This step had not been necessary on 23andMe because of the triangulation symbols in the ICW cluster. FTDNA also required checking matches on their website. For FTDNA looking at matches in the matrix was needed to determine if AutoSegment matches were on the same side of the family or not. The AutoSegment clusters from GEDmatch have already included the GEDmatch triangulated data, so they will all be on the same side of the family.

Putting the two types of clusters together uses the ICW cluster, which indicates one side of your family, to group the segments that belong on that side of your family. If there are others in the AutoSegment group they would likely belong to the other side of your family. The next step is to check the matches on the testing site to see if there is more information, such as a tree or surnames, that will help with your assessment and possibly confirm it.

  1. Patricia Ann Harris Anthony, Trish, has given me permission to use her real name. All the other names used in this post are fictitious.

Genetic Affairs Hybrid AutoSegment Cluster

An exciting addition to Genetic Affairs is Hybrid AutoSegment Clusters!  Now you can run the AutoSegment clusters with data from 23andMe, FTDNA, MyHeritage and GEDmatch or any combination of these sites all into one cluster analysis.  The entry page is shown in figure 1.

Figure 1. Entry page for Hybrid AutoSegment Clusters

Starting at the top of the page you’d want to give a name for your Hybrid cluster. I often use the name of the person whose cluster it is and a date or some information that will tell me exactly what the file is. You can select the minimum overlapping segment size between your matches, and the minimum cluster size. The smaller the overlap and the smaller the cluster size, the more matches will be used, and you could end up with an html cluster that is too large for your browser to load. You can always view it with the Excel file or look at the html file that has all the information without showing the large cluster.

If you have a large number of matches in known pileup regions you can choose to have those matches removed from the analysis. Pileup regions are explained in more detail at Genetic Affairs.

Another parameter to consider is liftover for FTDNA.  Of the various testing sites only FTDNA still uses build 36 for their comparison whereas the other sites are using build 37.  A build is a reference system used by the testing company that represents the human genome. For comparing matches across the different companies I would want all the data to be using the same reference.  Performing liftover on the FTDNA matches converts them to build 37, so that you can easily do a direct comparison.  

You can select different min and max cM settings for each of the sites. What I typically do is to look at the site and select a max cM value that will include the highest match that I want in the analysis. My paternal 2nd cousin Trish1 tested at 23andMe and uploaded to MyHeritage and GEDmatch. She did mtDNA and Family Finder tests at FTDNA. But each of the sites reports a slightly different cM that she and I share. Table 1 shows the amount of DNA Trish and I share at each site. Both 23andMe and FTDNA include our X chromosome match, and FTDNA also counts small segments down to 1 cM. I could run MyHeritage and GEDmatch with a max of 400 cM, but if I used 400 cM for all of the sites, I’d not include Trish’s data at 23andMe and FTDNA. I usually just use 600 cM max for all four of the sites. Since I know Trish is my highest match, it’s not hurting anything to have the max higher than needed.

I find it harder to select a minimum cM value.  I’d like to go down to around 7 cM, but then the clusters are so large that it’s very difficult to load and view them, at least on my laptop.  Minimum shared is the amount shared between your DNA matches. And minimum cluster size is the number of matches needed to make a cluster.  

The match and segment files that are used in the analysis are the ones that you download from the particular site. You can select to run two, three or all four sites. If you want to run just one site AutoSegment you should use the AutoSegment Analysis from Genetic Affairs main page. The cost of the Hybrid AutoSegment Analysis is 100 credits. It is not part of the free tier, which is the 200 credits you receive when you first join Genetic Affairs. For the paid tier you can make a one-time purchase of any amount. For example, a $5.00 (USD) purchase would buy 500 credits. Or you can select to have a monthly subscription for as little as $5.00 (USD) per month. Monthly subscriptions also provide 10% additional credits. So a $6.00 (USD) subscription will result in 660 monthly credits.
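The credit arithmetic can be checked with a short sketch. It assumes a rate of 100 credits per US dollar, which is what the $6.00 subscription giving 660 credits implies (600 plus the 10% bonus). Prices may change, so treat the constants as assumptions and not as Genetic Affairs' current price list.

```python
CREDITS_PER_DOLLAR = 100         # assumption inferred from the $6.00 -> 660 example
SUBSCRIPTION_BONUS_PERCENT = 10  # monthly subscriptions get 10% extra credits
HYBRID_RUN_COST = 100            # credits per Hybrid AutoSegment analysis


def one_time_credits(dollars):
    """Credits from a one-time purchase of a whole-dollar amount."""
    return dollars * CREDITS_PER_DOLLAR


def subscription_credits(dollars):
    """Monthly credits from a subscription, including the bonus."""
    base = dollars * CREDITS_PER_DOLLAR
    return base + base * SUBSCRIPTION_BONUS_PERCENT // 100


print(one_time_credits(5))                         # 500
print(subscription_credits(6))                     # 660
print(subscription_credits(6) // HYBRID_RUN_COST)  # 6 hybrid runs per month
```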

When your hybrid AutoSegment cluster is ready you will receive an email. If the resulting file is less than 8 MB, the zip file will be attached to the email. For larger files the email will contain a link where you can download your results file.

Results

This is my beautiful html cluster from 600 cM to 25 cM on all 4 sites: MyHeritage, 23andMe, FamilyTreeDNA and GEDmatch, shown in figure 2. The segment clusters from MyHeritage, 23andMe and FTDNA look at segments that overlap on a particular chromosome. In general they are not considering maternal or paternal. FTDNA will label a DNA match as maternal or paternal if you have identified a match in your tree. Maternal or paternal is sometimes indicated at 23andMe. You can also add maternal or paternal to known matches in the CSV files after retrieving them from the testing company. The GEDmatch data uses both the triangulated data as well as the segment data, so the results for GEDmatch are triangulated segments. If you know one match in the GEDmatch data in a particular cluster is paternal, you know that all the GEDmatch segments in that cluster also have to be paternal because of the triangulation.

Figure 2. Hybrid cluster from 600 cM to 25 cM on all 4 data sites.

That first orange cluster with lots of grey squares has my paternal 2nd cousin Trish in it.  Since Trish is on all four of these sites she’ll show up as four matches.  The table below the html cluster contains the chromosome segment statistics per AutoSegment cluster, shown in figure 3. This table contains a link that will bring up a more detailed page concerning the cluster of interest as well as provide some information concerning the identified segment (clusters) such as the chromosomes underlying the segment clusters, how many matches (per DTC) and if there are any maternal or paternal annotations linked to these clusters.

Figure 3. Chromosome segment statistics for AutoSegment cluster 1.

Clicking on the AutoSegment cluster 1 link in the table brings up a visualization of the identified segment clusters and the individual segments that have made up cluster 1 in the html file. These segment clusters are shown in figure 4. A colored square between two segments indicates that there is sufficient overlap between those segments.

Figure 4. Chart displaying a visualization of the individual segment clusters (colored groups) and the underlying segments (x-axis and y-axis).

Figure 5 shows the segment cluster information for segment cluster 13, the red one about in the middle of figure 4.

Figure 5. Segment cluster information for segment cluster 13.

There are 7 matches listed here. The first column tells the cluster number where the match was found in the html cluster. The second column has the segment cluster number, here all are 13. Next is the chromosome number, which happens to be chromosome 13 here, and then the start and end values of the segment. The diagram is a visualization of that segment of data. You can easily see that all 7 of these overlap. The SNP value is in the next column, followed by the name and kit number of the match. I added a red circle around DTC, DNA Testing Company. The next column has the number of shared cM on this segment, followed by the total number of cM that the match and I share. The last 2 columns have paternal and maternal if that information is found in the file.
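Whether a group of segments all overlap can be worked out from the start and end columns alone. The sketch below is an illustration under stated assumptions: the positions are hypothetical base-pair values, and the threshold is expressed in base pairs for simplicity, whereas the minimum overlap setting in the tool itself is given in cM.

```python
from itertools import combinations


def overlap_length(a, b):
    """Length of the region shared by two (start, end) segments, or 0 if none."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


# Hypothetical start/end positions on chromosome 13, one segment per match.
segments = {
    "Mike": (30_000_000, 48_000_000),
    "A.B.": (33_000_000, 52_000_000),
    "Sue": (35_000_000, 50_000_000),
    "Trish": (28_000_000, 55_000_000),
}
MIN_OVERLAP_BP = 5_000_000  # assumed threshold for this sketch only

pairs = {
    (x, y): overlap_length(segments[x], segments[y])
    for x, y in combinations(sorted(segments), 2)
}
all_overlap = all(length >= MIN_OVERLAP_BP for length in pairs.values())

print(pairs[("Mike", "Sue")])  # 13000000
print(all_overlap)             # True
```

Overlap alone says nothing about maternal versus paternal, so a result of True here is a prompt to go and check triangulation, not a conclusion.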

The last 4 matches are all of Trish from the 4 sites. Mike tested at MyHeritage.  I know that Mike is a maternal 2nd cousin once removed.  He and I share my maternal great grandparents as our most recent common ancestor.  He has a segment of DNA on chromosome 13 that overlays Trish’s segment.  But because MyHeritage is giving all segments that fall in the same location he and Trish show up in this cluster.  If I didn’t know who Mike was, I’d go to MyHeritage and run the chromosome browser with Trish and Mike in order to see if they triangulate with me.  Next is A.B. whose data came from GEDmatch.  Because the GEDmatch segment data here is triangulated I know that A.B. must be paternal because he triangulated with Trish.  Sue is from FTDNA and has a P off to the far right, which tells me that I’ve placed her in my tree, and FTDNA knows that she is paternal.  If she were totally unknown I’d use the FTDNA chromosome browser and matrix to determine if she matched Trish or not.

Having the data from the different sites displayed this way makes it easy to see matches that overlap and might be related. Then you can check in the chromosome browser on the individual testing company site to confirm if they are on the same side of your family or one is maternal and the other is paternal. It’s especially helpful when I find a match that I’ve not looked at before, and now I have some idea how the match might relate to me based on who else is on that DNA segment and in the cluster.

I’ve done a lot of research with matches that my cousin Trish shares with me, so I decided to look at some more distant matches. Searching through the list of names in the Excel file I found Sophie, who is in cluster 70, which I’ve circled in the large html cluster in figure 6.

Figure 6. Cluster 70 circled.

Figure 7 shows the segment clusters chart using a visualization of the individual segment.  I have no idea who Andrea is, other than she matches Sophie.

Figure 7. Segment clusters chart for cluster 70 of the html file. Note that DNA matches can be linked via different segment clusters and therefore multiple segments.

Sophie tested at FTDNA and uploaded her results to GEDmatch. She shows up here matching herself and Andrea. Sophie and I have emailed a number of times. We know that our common ancestor is on my Aide line. My 2nd great grandfather Thomas Barry married Mary Aide in County Kilkenny, Ireland. I have the baptismal records for their two children, Edward, my great grandfather, and Mary his sister. Edward was baptized in 1840 and Mary in 1843. None of the baptismal records prior to 1823 and none of the marriage records for the Catholic parish in Ballyhale, Kilkenny survived. So I’ve not been able to find Mary Aide’s baptismal record or Thomas and Mary’s marriage record.

Sophie’s Aide family also lived in County Kilkenny, and goes back another generation or two past my Mary Aide.  Mary Kilfoil married an Aide, and as best as we can tell without records and with the DNA evidence Mary Kilfoil is either my 3rd or 4th great grandmother.  This continues to be something that I’m searching, but for now we’ll leave it at that.

Andrea on GEDmatch indicated that she’d tested at 23andMe. Almost to my surprise I found her in my match list on 23andMe! She had no triangulated matches with me, which was a disappointment as I like to work with triangulated matches. But looking at her ICW match list, shown in figure 8, was amazing! Frank Barry, my 2C who also descends from Thomas Barry and Mary Aide, is at the top of the list. Looking down the list I’d already added many of her matches and their triangulated matches to my DNA Painter profile. I’d messaged Tyler over a year ago on 23andMe and never got a reply, so I really don’t have any information on him. He does triangulate with known Irish matches, however. Kay and I have emailed a good bit. She has a great grandfather with the surname Byrne from County Roscommon. I have my great grandfather, Thomas Byrnes, from County Roscommon. We’ve not found the common ancestor yet, but the connection seems to be on my Byrnes side. Beth is a bit of an unknown as she and Trish have segments on the same chromosome that somewhat overlap, but they don’t show as a match. Either Beth is on my maternal side or there’s just not enough overlap with Trish. I need to message her for more information. Ashley was a match I’d not looked at before. So I looked at her shared matches and any information they might have listed. I found one of her matches with ancestors from Buffalo, NY. Thomas Barry’s family lived in Evans, Erie County, NY, which is not far from Buffalo.

Figure 8. Andrea’s shared match list with me on 23andMe.

One of the surnames, Green, and one of the locations on Andrea’s information on 23andMe were the same as I knew were in Sophie’s family. And not finding much information from Andrea’s matches I emailed Sophie. Sure enough Andrea is on the same line as Sophie and is Sophie’s 2nd cousin once removed. Andrea is another match on my Aide line. Now if I could just make the connection between our two trees and figure out if Mary Kilfoil is my 3rd or 4th great grandmother! I’ve tried WATO, but most of my matches to Aide family members are too small to be useful in WATO, so I haven’t gotten very promising results.

Summary

I’m finding the Hybrid AutoSegment Clusters on Genetic Affairs very promising.  There are so many new connections for me to explore!  I would not have found Andrea and been able to connect her to Sophie if not for the hybrid clustering.  Sophie has not tested on 23andMe.  Andrea didn’t have any triangulated matches there.  At most I’d have seen the Green surname, and since it’s not that unusual a name I might not have ever thought of Sophie and that it’s in her family tree.  The Hybrid AutoSegment Clusters is going to be a huge help for me trying to make connections between more of my DNA matches.

You can run the AutoSegment Clusters with any 2, 3 or 4 of the testing companies: MyHeritage, 23andMe, FTDNA and GEDmatch. It will provide you with clusters that are based on shared segments across the companies that you selected. With the exception of GEDmatch, where the data has already been triangulated, you will need to compare matches in one of the segment clusters with each other using the chromosome browser, and on FTDNA the matrix tool, to determine if the matches triangulate or not.

Now to go explore more of my hybrid clusters!

  1. Patricia Ann Harris Anthony, Trish, has given me permission to use her real name. All the other names used in this post are fictitious.

GEDmatch AutoSegment

There’s an enhancement to the GEDmatch AutoSegment clustering on Genetic Affairs.  Now the GEDmatch option includes using the triangulated data as well as the all segment data, both of which are available on Tier 1 of GEDmatch.

There are a number of settings for the GEDmatch DNA Segment Search. I set the number of kits to 1000 for my analysis. Most of the settings can be left at the defaults. However, if you’re including matches that have long segments with you, you’d want to click the ‘Prevent Hard Breaks’ option. By default GEDmatch adds a hard break when it finds a gap of more than 500,000 base positions between SNPs.

Figure 1. GEDmatch Segment Search page.

After running your Matching Segment Data you would want to download the csv file.  There is a  ‘HERE’ button at the top of the list of segment data that allows you to save the csv file to your computer.

Figure 2 shows the GEDmatch Segment Triangulation Screen.  It defaults to 500 kits, which I changed to 1000 to match what I’d run on the Segment data.  The upper threshold of 3000 cM would exclude parent-child relationships but probably won’t exclude siblings. I left all the other defaults as they were.

Figure 2. GEDmatch Triangulation Data page.

The Segment Triangulation Data can be saved as a tsv file.  The ‘HERE’ button to save this data is found at the bottom of the table of triangulated data.

Segment and Triangulation Files

The data in the Segment file shows a list of my matches, the chromosome where we match, how many cM we share, the SNP value, and the start and end of the data on the chromosome.   Figure 3 shows the data that I share with a match, Joe1.  We share 14.0 cM on chr 6 from about 162 M to 168 M.

Figure 3. Segment on chr 6 that Joe and I share.

When I look at Joe’s triangulated matches in my triangulation file I find that he has 4 matches on chr 6.  These data, shown in figure 4, show how many cM Joe, another match and I triangulate on in that particular region. It looks as if Joe, John and I triangulate across the entire region that Joe and I match, whereas the triangulated region for Joe, me, and either Mary or Sue is less than the 14.0 cM that Joe and I share.  The triangulated data file shows only the start, end and cM of the region that all three of us who triangulate share.  To see how many cM I share with Mary, or John, or Bill or Sue, I’d have to look at the All segment data file.  That is why both of these files are needed for the analysis.

Figure 4. Triangulated data that Joe and I share.
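Comparing the triangulated region with the segment Joe and I share comes down to intersecting two start-end ranges on the same chromosome.  Here is a minimal Python sketch of that check.  Only the chr 6 region (about 162 M to 168 M) comes from my data; the positions for John and Mary and the function name `overlap` are made up for illustration.  Note that an overlap by itself is not triangulation, since the matches also have to match each other.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Return the (start, end) shared by two ranges, or None if they do not overlap."""
    start = max(a_start, b_start)
    end = min(a_end, b_end)
    return (start, end) if start < end else None

# Segment I share with Joe on chr 6 (approximate positions from the text, in base pairs).
joe = (162_000_000, 168_000_000)

# Hypothetical segments for two other matches on the same chromosome.
others = {
    "John": (161_500_000, 168_500_000),  # spans the whole Joe segment
    "Mary": (164_000_000, 168_000_000),  # covers only part of it
}

for name, seg in others.items():
    shared = overlap(*joe, *seg)
    covers_all = shared == joe
    print(name, shared, "full overlap" if covers_all else "partial overlap")
```

With these made-up numbers John overlaps the whole segment and Mary only part of it, which is the same pattern I saw in the triangulation file.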

With triangulation I’m looking for at least three independent matches that each match me and also match each other.  For example a parent and child, or 2 siblings would definitely have a common ancestor, but they would not be independent of each other.  The child got half of his or her DNA from that parent, and siblings would share a great deal of DNA in common as well. When the matches triangulate it’s very likely that we share a common ancestor in the genealogical timeframe.  Then the next step would be to use traditional genealogy methods to attempt to identify that common ancestor.

Sometimes there are segments found in the triangulation file that are not in the segment file.  I downloaded 1000 segment matches and 1000 triangulated matches.  Not all segments are going to have triangulated matches, and those segments that don’t have any triangulation will not be included in the cluster.  Consequently, since I used 1000 matches for each of the files, there will be segments that triangulate that are not found in the segment file.  New DNA matches based on these triangulated segments are reconstructed and these segments are used to estimate the total cM.
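If you want to see which triangulated segments have no counterpart in your segment file, a set difference is enough.  The sketch below works on small in-memory lists; the keys `kit` and `chr` are hypothetical stand-ins for whatever column names your downloaded files actually use, and this is not how Genetic Affairs does its reconstruction.

```python
def missing_from_segment_file(segment_rows, triangulation_rows):
    """Return (kit, chr) pairs present in the triangulation data but not in the segment data."""
    in_segments = {(row["kit"], row["chr"]) for row in segment_rows}
    in_triangulation = {(row["kit"], row["chr"]) for row in triangulation_rows}
    return sorted(in_triangulation - in_segments)

# Hypothetical rows; real files contain many more columns.
segment_rows = [{"kit": "A111", "chr": "6"}, {"kit": "B222", "chr": "9"}]
triangulation_rows = [
    {"kit": "A111", "chr": "6"},
    {"kit": "C333", "chr": "6"},  # triangulates, but was not among the downloaded segments
]

print(missing_from_segment_file(segment_rows, triangulation_rows))
```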

Results

Now you are ready to run the GEDmatch enhanced AutoSegment Cluster.  Figure 5 shows the data entry page for analysis.  Select the maximum cM value you want to include.  I pick this value based on what my highest match is and whether or not I want to include that match in the run.  The minimum is a bit harder to pick.   I know that I don’t have many close cousins at all so I usually pick a low minimum, perhaps lower than most people would choose. 

Figure 5. Genetic Affairs data entry screen for GEDmatch AutoSegment analysis.

A zip file with the results is sent to your email.  My results are shown in figure 6.

Figure 6. AutoSegment cluster results for my GEDmatch data.

Below the clusters are three tables containing information about the clusters and the matches.  The first table, shown in figure 7, is the segment statistics for each of the AutoSegment clusters.  It describes the segment clusters that are found in each cluster, lists the chromosomes that are present, the number of matches in the cluster, the number of segment clusters and the number of segments.  By clicking on the link another window opens showing the segment clusters that make up the large cluster shown in the original html cluster. An example of this is shown below in the Data Analysis.

Figure 7. Chromosome segment statistics.

Below this table is the AutoSegment cluster information, which is shown in figure 8.  It shows the name and kit number of the match, the amount of cM shared, the number of shared matches, the cluster that this kit is in and other information about the match. The notes indicate the source of the particular segment. Some of this information, such as MyHeritage, and Migration-V4-M, comes from GEDmatch. When the (triangulated) segment is not found in the GEDmatch segment file, Genetic Affairs reconstructs it based on segments that triangulate with it.

Figure 8. AutoSegment Cluster Information.

Shown in figure 9 is the third table, which is the Individual segment cluster information.  The cluster listed on the far left is the cluster number from the large html cluster.  Clicking on that number takes me to the segment clusters that make up cluster 29.  This is the same as if I clicked on the ‘segment and segment clusters for cluster 29’ in the Chromosome segment statistics.  Next is the segment cluster number, the chromosome, the start and end values, the SNP, match name and kit, cM for this segment and the total cM for this match. The segment representation chart allows me to quickly assess the overlap between the different segments within a segment cluster.

Figure 9. Individual Segment cluster information.

Data Analysis

Earlier I looked at Joe and his triangulated matches on chr 6 in the triangulated segment table.  Searching for Joe in the html cluster I found him in cluster 5, the larger brown cluster in figure 6, with grey squares to cluster 4, the purple cluster.  Joe matches my known 2nd cousin Frank on chr 20 and was placed into cluster 5 with Frank.  You can see the line of grey squares below where I labeled Frank, that are Joe’s matches to John, Bill, Mary and Sue on chr 6 in the purple cluster.

Figure 10. Enlarged image of cluster 4 from figure 6.

Because of the grey cells Joe will show up in the segment clusters for clusters 4 and 5.  Clicking on the link for cluster 4 in the statistics I get another chart, with the familiar animation, that represents the underlying segment clusters and their segment members. This chart allows you to quickly see which segment clusters are present, how many there are, and how connected they are.

For AutoSegment cluster 4, I see there are two segment clusters (blue and orange cluster) in the chart. The orange cluster in figure 11 is the group that triangulates on chr 6.  John and Bill also triangulate with another group of matches on chr 9, and those are in the blue cluster.

Figure 11. Two fully connected segment clusters for AutoSegment cluster 4.

Below these clusters is the Segment cluster information table, shown in figure 12. The column on the left indicates the html cluster where each person was found. The second column indicates the segment cluster. Chromosome 6 has Joe and his triangulated matches. As seen in the clusters, John and Bill also triangulate on chromosome 9 with other matches. The start and end values are given, along with a visual representation of the relative size of the segments. The number of cM for each match as well as their total cM are also shown in the chart. The table allows for a quick check of how similar the segments are and how they align.

Figure 12. Segment information chart for the blue and orange clusters shown in figure 11.

DNA Painter Cluster Tool

One thing I like to do with my cluster results is to put them into DNA Painter.  Using the ‘Cluster Auto Painter’ in the DNA Painter tools I can enter the html file from Genetic Affairs and generate a new profile with all my clusters in it.  Figure 13 shows my chr 6 on DNA Painter after importing the html file.  Cluster 4 is the pink one on the far right.  Joe is on cluster 5 so he’s in a different color.  The other green segment in that location is Joe’s son. He would not have been an independent match which is why I left him out of the earlier triangulation.  The other 4 triangulated matches are in the pink cluster.  I did not add paternal or maternal to any of the matches in my all segment file, so all of my clusters here are showing up as ‘shared or both’.  Another thing I’ve done with this DNA Painter profile is to import the GEDmatch segment data file and compare the segments to those in the cluster. It is then very easy to see segments that don’t have any triangulated matches.

Figure 13. DNA Painter profile of chr 6 obtained from the Cluster Auto Painter for my data.

Just like the other AutoSegment analyses, the GEDmatch AutoSegment clusters cost 50 credits per run.

Summary

The enhanced AutoSegment Clustering for GEDmatch uses the all segment file and triangulation data files from GEDmatch and clusters the matches into triangulated groups.  Triangulated groups, especially of four or more, indicate a common ancestor in a genealogical timeframe. Compared to the AutoSegment implementation of MyHeritage, FTDNA and 23andMe, the GEDmatch version frees users of the manual process of checking the validity of the identified segment clusters.

Individual clusters can then be analyzed using traditional genealogical methods to find a common ancestor.  The html file containing the AutoSegment clusters can also be imported into DNA Painter using the ‘Cluster Auto Painter’ tool and visualized in detail on individual chromosomes as well. This is going to be a huge help to me as I research which of our common ancestors my 2C and I share on a particular chromosome segment.

  1. All names of living individuals have been changed to protect their privacy.

AutoSegment Cluster

There’s a new feature at Genetic Affairs – AutoSegment Analysis.  Unlike In Common With (ICW) analysis, AutoSegment looks at the DNA segments for the analysis.  ICW looks at your match list and forms clusters based on shared matches, grouping matches that also match the others in the group. The ICW analysis doesn’t look at the DNA or which chromosome contains that DNA, which is why it works for sites that do not have a chromosome browser.  AutoSegment first looks at the segments of DNA and where they are located on the chromosomes.  Then it groups them based on matches sharing the same segment on the same chromosome (using a user-defined minimum cM overlap) and uses that as the basis for the clustering.  This technique uses the downloadable files from the company’s site for the analysis.  In most cases these are flat segment files without any indication as to whether the segment is maternal or paternal, and you will need to determine that for yourself after you have your AutoSegment results.  FTDNA files indicate maternal or paternal if you have placed that DNA match in your tree on the site.  Also 23andMe will indicate maternal or paternal if one or both of your parents have tested, and they now also indicate it where they have placed a DNA match in the tree that they are generating for you. Otherwise Genetic Affairs doesn’t know if the data is maternal or paternal. However, after running an AutoSegment cluster and analyzing your results you can edit the original file that you downloaded from the company and add maternal or paternal labels for any match in the list.  Then rerunning AutoSegment will indicate P for paternal or M for maternal on the data in the clusters.
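To make the segment-based grouping a little more concrete, here is a small Python sketch of the linking step: two matches are linked when their segments sit on the same chromosome and overlap by at least a chosen minimum.  This is only my simplified illustration, not Genetic Affairs’ actual algorithm.  The match names, the positions and the function names are hypothetical, and for simplicity the positions are given directly in cM along the chromosome rather than in base pairs.

```python
from itertools import combinations

def overlap_length(a, b):
    """Length of the overlap between two segments on the same chromosome, else 0."""
    if a["chr"] != b["chr"]:
        return 0.0
    return max(0.0, min(a["end"], b["end"]) - max(a["start"], b["start"]))

def segment_links(segments, min_overlap):
    """Pairs of match names whose segments overlap by at least min_overlap."""
    links = set()
    for a, b in combinations(segments, 2):
        if a["match"] != b["match"] and overlap_length(a, b) >= min_overlap:
            links.add(frozenset((a["match"], b["match"])))
    return links

# Hypothetical segments; start and end are in cM along the chromosome.
segments = [
    {"match": "Ann", "chr": "3", "start": 10.0, "end": 40.0},
    {"match": "Ben", "chr": "3", "start": 20.0, "end": 45.0},
    {"match": "Cal", "chr": "3", "start": 38.0, "end": 60.0},
    {"match": "Dee", "chr": "7", "start": 10.0, "end": 40.0},
]

links = segment_links(segments, min_overlap=15.0)
print(sorted(sorted(pair) for pair in links))
```

In this toy data only Ann and Ben are linked.  Cal overlaps them by just a few cM, and Dee has the same positions but on a different chromosome, which is exactly the kind of difference an ICW analysis cannot see.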

Why would you want to sort your data by DNA segments?  Recently my 2nd cousin Patricia Ann Harris Anthony, Trish 1, tested on 23andMe.  I’ve known Trish all my life.  Every summer when I was a kid we’d go to Norfolk, Virginia, where my Dad grew up and spend a week at the beach and visiting family.  All of Dad’s mother’s side of the family lived nearby.  Trish, her mother and grandmother would come down to the beach for a day and visit with us.  I was excited to have Trish test.  Using her results and mine will help to try and extend that part of our family tree.  Our most recent common ancestors are our great grandparents, Thomas Byrnes and Bridget Fenton.  Thomas was born about 1840 somewhere in County Roscommon, Ireland, but we don’t know exactly where.  Bridget was born in Bruff, County Limerick in 1853.  Trish and I share 19 segments of DNA across 14 different chromosomes.  What I’ve been doing is looking at the matches on these segments, looking at any information on surnames and/or locations they share and building out Quick & Dirty (Q&D) trees to determine which common ancestor of Trish’s and mine gave us that DNA.  

One of the newest tools to help analyze DNA results is Genetic Affairs AutoSegment, shown in figure 1.  Data from MyHeritage, FamilyTreeDNA, 23andMe, and GEDmatch can be used for the analysis.  Each analysis starts with the data files that are available to download from the company site.  The GEDmatch segment data file is available only on Tier 1 of GEDmatch.

Figure 1. Starting page for AutoSegment Analysis

Since Trish’s data is on 23andMe I started by downloading that data file which is found at the bottom of the page of ‘View All DNA Relatives’ as shown in figure 2.

Figure 2. 23andMe segment data file location

After selecting 23andMe for analysis in Genetic Affairs the screen shown in figure 3 appears.

Figure 3. 23andMe screen for starting AutoSegment analysis.

I need to select the maximum and minimum cM values, as well as the minimum overlap cM value for segments. Plus I need to decide if I want to have known pileup regions removed.  There are certain regions on chromosomes that have been defined over the years as pileup regions.  Typically you might have a large number (over 100) of matches in such a region, and they could be of pretty much any size, not just the 7-10 cM variety.  Genetic Affairs has added a feature in the AutoSegment analysis to remove those if you choose to do so. Genetic Affairs is using the list published by Li et al. (2014)2 for this purpose.  The most likely scenario that I can see is that you’d run your AutoSegment Cluster and see some huge number of matches in a particular location.  Then you’d check to see if that is a known pileup region that you might want to have removed in a subsequent run.  My experience is not finding pileups in the known locations but having what are considered personal pileup regions.  There would be no easy way for software to deal with these. But as I said, they show up as an extremely large number of matches in a specific location.  I have around 200 matches on chr 2 in a specific location.  When it was only a few I emailed with them and found that they lived in Northern Ireland or their ancestors were from counties in the northern part of Ireland.  Since my Thomas Byrnes was from Roscommon this could be likely.  I don’t know exactly where he lived in Roscommon, and Counties Sligo, Leitrim and Fermanagh are all in the north of Ireland and near Roscommon, so it could be possible.  When your family is as small and distant as mine you don’t rule out anything that could be possible!  But in any case, if you find a large number of matches in a region that is a known pileup region it might be useful to run the AutoSegment analysis with the pileup regions removed.
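Removing pileup regions can be pictured as a simple filter: any segment that overlaps a listed region on the same chromosome is dropped.  The Python sketch below shows the idea.  The region coordinates are placeholders that I made up for illustration, not the ones published by Li et al., and I am assuming that any overlap at all is enough to drop a segment; I don’t know the exact rule Genetic Affairs applies.

```python
# Hypothetical pileup regions as (chromosome, start, end) in base pairs.
# These are placeholders, NOT the coordinates published by Li et al.
PILEUP_REGIONS = [("2", 85_000_000, 100_000_000)]

def in_pileup(segment, regions=PILEUP_REGIONS):
    """True if the segment overlaps any listed region on the same chromosome."""
    return any(
        segment["chr"] == chrom and segment["start"] < end and segment["end"] > start
        for chrom, start, end in regions
    )

def remove_pileups(segments, regions=PILEUP_REGIONS):
    """Keep only the segments that do not touch a listed pileup region."""
    return [s for s in segments if not in_pileup(s, regions)]

# Hypothetical segments for three matches.
segments = [
    {"match": "Ann", "chr": "2", "start": 90_000_000, "end": 97_000_000},
    {"match": "Ben", "chr": "2", "start": 150_000_000, "end": 165_000_000},
    {"match": "Cal", "chr": "5", "start": 90_000_000, "end": 97_000_000},
]

kept = remove_pileups(segments)
print([s["match"] for s in kept])
```

A personal pileup like mine on chr 2 would not be caught by a fixed list like this, which is why I check the location of any suspiciously large group of matches myself.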

A typical range for an AutoCluster analysis is 400 to 50 cM.  However, since I wanted to include Trish’s data in the analysis I looked at her match to me to see how much DNA was shared.  23andMe says we share 402 cM, as shown in figure 4. 

Figure 4. DNA shared by Trish and me on 23andMe.

Therefore I want to set the maximum cM value on Genetic Affairs to the next higher value above 402 cM, which is 600 cM.  Setting the lower cM value to some extent depends on your family. My closest cousins are 2nd cousins and so I tend to set a lower minimum value than many people.  I’m going to use 30 cM for my minimum.  Most people would likely get good results with the minimum set at 40 or 50 cM.  My results arrived by email very quickly.  I saved the zip file and unzipped it to get the html and Excel files.  Figure 5 shows the html file.

Figure 5. AutoSegment clusters from my 23andMe analysis.

My cousin Trish is in the top orange cluster and matches people in several other clusters.  This is not surprising since she and I are 2nd cousins.  Frank, who is in the third (red) cluster, is also a 2nd cousin.  He is on my Dad’s father’s side.  I only met Frank six years ago when I started a tree on Ancestry, whereas I’d known my Dad’s mother’s side cousins all my life.  I knew my Dad’s father’s side of the family lived in upstate NY.  The tree my Dad had done years earlier went through his first cousins, but it didn’t cover their children.  But once I started entering the tree from Dad’s notes in Ancestry I found Frank’s tree also had my great grandparents in it, and I sent him a message.  Frank and I finally got to meet in person in 2015.  Mark is an interesting match since he is shown as being related to both Trish and Frank.  Frank also has grey cells connecting him to the purple cluster that has Sue and her mother, as well as Dot and her mother, in it.  We’ll investigate these matches further.

At the bottom of the html clusters is a table that explains/expands these clusters.  The table is shown in figure 6.

Figure 6. Table of chromosome segment locations.

These DNA match clusters are created based on the segment clusters. So someone like my cousin Trish, who shares many segments with me, will appear in a number of clusters.  This table shows the various clusters and lists on which of the chromosomes segment clusters are identified.  Clicking on any of the cluster links will bring up that particular cluster page. This page will then display the different segments underlying the cluster and the segment clusters in which these segments reside. After looking through the clusters I decided to explore cluster 3 from the list.  Cluster 3 is shown in figure 7.

Figure 7. Segment clusters from cluster 3 of the html file.

I’ve labeled the matches that were also seen in the AutoSegment html file image shown in figure 5.  Cluster 3 is the red one in the html cluster.  Because Sue and her mother match Frank and are in cluster 4, Dot and her mother in cluster 4 are included here as well, even though they are not indicated as matching Frank.  At the bottom of this html page is a summary of the segment clusters information, which is shown in figure 8.

Figure 8. Segment cluster information for cluster 3 generated from the original AutoSegment html file.

The most interesting match here is Mark who is shown in the dark blue cluster matching Fred, and in the red cluster matching Trish.  Looking at the chromosome segment chart at the bottom of the html page (figure 8) it shows Mark and Fred match on chr 20 and Mark and Trish match on chr 12.  From emailing with Mark I know that he and his father triangulate with Fred and me on chr 20 which would be a Barry match, and that our triangulated match on chr 12 is with Mark and his son.  I assumed that the chr 12 match was also a Barry match, until Trish’s DNA results arrived.  Contacting Mark again I found out that his mother was a Byrnes!  This points out how an initial assumption that all the segments of a match are from one most recent common ancestor may not be true.  It is definitely a good starting point, but it also needs to be reevaluated when new matches arrive. This is an excellent example of how valuable the AutoSegment tool is to this type of analysis. It also points out how important it is to make contact with matches and to keep chatting with them.  Mark and his father match on chr 20 with my Dad’s father’s side, and Mark’s and his son’s matches on chr 12 are from his mother’s side and match with my Dad’s mother’s side.  

Looking at other matches on this cluster Joe appears to match both Frank and Trish.  The only information on Joe’s 23andMe profile is that his paternal grandparents came from Italy.  I have no known Italian ancestors, so that was puzzling, and he did not respond to any messages.  Using the ‘Advanced DNA Comparison’ on 23andMe shows that Joe doesn’t match either Frank or Trish.  Looking at the table in figure 8 it appears that there is plenty of overlap for a match to show up, so that makes me think that Joe is likely on my Mother’s side of the family. 

Other interesting matches are Sue and her mother and Dot and her mother.  I looked at each in 23andMe Shared matches and found their information. Again I can check them in the ‘Advanced DNA Comparison’ to see if they truly match Frank and/or Trish.  But looking at the table in figure 8 there appears to be a very small overlap between Frank and Dot or her mother.  Also recall that Dot and her mother did not have grey cells to Frank in the original AutoSegment cluster (see figure 5). Whereas there is a large overlap between Sue and her mother and Frank.  The ‘Advanced DNA Comparison’ gives the same results as shown in figure 9.  Sue and her mother triangulate with Frank and me and therefore must be paternal on my Barry side.

Figure 9. 23andMe ‘Advanced DNA Comparison’ for Frank and Sue and her mother.

Their information on 23andMe indicates that Sue’s mother’s maternal grandmother was from the Baden-Württemberg area. My great grandmother Pauline Fröhlich who married Edward Barry was from the Baden-Württemberg area, so that segment on chr 3 is likely from my great grandmother Pauline. Looking at Dot and her mother along with Frank on the 23andMe ‘Advanced DNA Comparison’ shows no match at all.  Dot and her mother could be maternal matches or perhaps there’s not enough overlap to Frank to show as a paternal match.  I’ll leave them as ‘unknown’ for now.

This is only the analysis for one of the cluster links from the table, shown in figure 6, that was located at the bottom of the AutoSegment cluster html file.  All of the links in that table could be explored for similar information and new findings.

Another new feature for AutoSegment is being able to add maternal or paternal to the original data file.  My file was downloaded from 23andMe.  Opening that csv file in Excel allows me to add maternal or paternal in a new column at the end of the existing file.  The 23andMe file name is in the form name_relatives_download.csv.  For this file the last filled column is AD, so I’ll be adding maternal or paternal in column AE.  I search column A for my cousin’s name – Trish Harris Anthony.  I found her starting in row 648 as shown in figure 10.

Figure 10. Excel file looking for Trish.

Next I’ll scroll to the end of the row and place paternal in column AE for all the rows in which she is listed, as shown in figure 11.

Figure 11.  Paternal added to Trish Harris Anthony’s rows of Excel data.
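If you have a lot of known matches to label, the same edit can be scripted instead of done by hand in Excel.  The Python sketch below appends one column at the end of every row and fills it in for the names you know.  It is only a sketch under assumptions: the tiny three-column sample stands in for the real download (which has many more columns), and the header name `side` for the new column is hypothetical, so check what Genetic Affairs expects before relying on it.

```python
import csv
import io

def label_side(rows, sides, header="side"):
    """Append one column holding 'maternal'/'paternal' for known matches.

    rows  -- list of CSV rows; the first row is the header, the name is in the first column
    sides -- dict mapping a match name to 'maternal' or 'paternal'
    """
    labeled = [rows[0] + [header]]
    for row in rows[1:]:
        labeled.append(row + [sides.get(row[0], "")])
    return labeled

# A tiny stand-in for the downloaded file; real files have many more columns.
raw = "name,chromosome,cm\nTrish,12,25.1\nTrish,20,14.3\nAnn,3,9.8\n"
rows = list(csv.reader(io.StringIO(raw)))
labeled = label_side(rows, {"Trish": "paternal"})

out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(labeled)
print(out.getvalue())
```

Every row for Trish gets paternal in the new last column, and matches I haven’t placed yet are left blank.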

I’ll do this for Frank and other matches that I know where they fit in my family.  Then rerunning Genetic Affairs AutoSegment will include that information in the html file display, as shown in figure 12.  Besides adding Trish and Frank as paternal I added maternal to a couple of known matches in cluster 8.  Looking at the html AutoSegment cluster in figure 12 you can see P next to Trish’s name and next to Frank’s name for paternal. Joe and Mark are also considered paternal, since they match both Trish and Frank.  I know that Mark is paternal, but having already looked at Joe’s match in the ‘Advanced DNA Comparison’ on 23andMe I know that he doesn’t match either Trish or Frank.  His being marked as paternal is a hint, just like any other hint you get from looking at DNA results and needs to be checked out, which I’ve already done.

Figure 12. AutoSegment for 23andMe after adding maternal and paternal to the match file.

Another thing that can be done with the AutoSegment html cluster is that it can be added to DNA Painter using the ‘Cluster Auto Painter’ found in the DNA Painter tools.  I uploaded the html file shown in figure 12.  The DNA Painter profile is shown in figure 13. 

Figure 13. DNA Painter showing clusters from the html file in figure 12.

Notice how the light green for cluster 1 and the pink for cluster 3 are shown as paternal, and dark navy for cluster 8 is shown as maternal, while the others are all shown as ‘unknown or both’.  This is particularly apparent on chr 10 where you can see the light green paternal and the dark navy maternal segments near the beginning of the chromosome.  Figure 14 shows the expansion of cluster 1 with part of the segments where Trish matches me, and all are listed as paternal.  One of our shared matches who triangulates with the two of us on the X chromosome also has a segment on chr 5.  This might be a useful segment to investigate further for other matches that could potentially also be on my Byrnes and Fenton line.

Figure 14. Cluster 1 from DNA Painter expanded.

Remember Sue and her mother and Dot and her mother that were in cluster 3 from figures 7 and 8?  Now we can look at them in DNA Painter and change them from ‘unknown or both’ to ‘paternal’ for Sue and her mother and ‘maternal’ for Dot and her mother.  Figure 15 shows the four of them in cluster 4 as was seen in the html file (see figures 5 and 12).

Figure 15. Dot and her mother and Sue and her mother on chromosome 3.

Using DNA Painter’s ‘mass edit mode’ I moved Sue and her mother to ‘paternal’ since they triangulated with Frank on chr 3 when I tested that in 23andMe ‘Advanced DNA Comparison’.  I then moved Dot and her mother to ‘maternal’ since 23andMe ‘Advanced DNA Comparison’ showed that they did not match Frank at all.  Chromosome 3 after moving Sue and her mother and Dot and her mother is shown in figure 16.

Figure 16. Sue and her mother after moving them based on the 23andMe ‘Advanced DNA Comparison’ results.

All of the analysis I’ve done here has been with 23andMe data.  The same thing can be done with your DNA matches from FTDNA, MyHeritage or GEDmatch data.  Table 1 shows what data files are needed, and where they are located.  If you tested at FTDNA or MyHeritage, the chromosome browser at FTDNA and the DNA tools at MyHeritage are included in the price of your test.  If you uploaded your data, FTDNA charges $19 to unlock the chromosome browser. MyHeritage charges $29 for the DNA tools, unless you uploaded your DNA data and have an account with them.  MyHeritage allows a tree up to 250 people for free, but larger trees, such as I have, require an account. A summary of the locations and filenames for the various sites that can be used for Genetic Affairs AutoSegment analysis is shown in table 1.

On the Genetic Affairs website you can run ICW AutoClusters for 23andMe and FTDNA data but not for MyHeritage or GEDmatch. MyHeritage has an early version of the Genetic Affairs software in their DNA Tools. It does not allow you to select the max and min, and gives you results with about 100 matches in the clusters.  GEDmatch also has a version of Genetic Affairs software that allows you to set max and min but uses their own undisclosed algorithm to determine how they use this range. Now with AutoSegment analysis you can run all the analyses on the Genetic Affairs website.  You can select whatever max, min and minimum shared segment size that you want to use.  No longer do you need to give your login credentials in order to run this analysis.  That has been a concern for a number of people in the past.  

What does the AutoSegment analysis cost? There is a new free tier for AutoSegment.  For MyHeritage, 23andMe and GEDmatch an analysis of 250 – 25 cM with a 15 cM segment size is within the free tier.  For FTDNA a 250 – 45 cM size with 15 cM segment size is also within the free tier.  Those will give you a chance to play with the new feature and potentially get you hooked on it, as it has me!  AutoSegment Cluster runs with larger or different ranges cost the same as ICW clusters, generally around 50 credits per run.

In summary I find this new feature very exciting. I was already starting to look at the segments Trish and I shared, so this makes that analysis so much easier. The AutoSegment analysis can be run with data that you download from 23andMe, FTDNA, MyHeritage or GEDmatch. You get to select the maximum and minimum cM and the minimum overlap between shared segments.  You don’t have to provide your login information to the other sites since you’ve downloaded the files they make available to you from their sites.  You can put the resulting html file into DNA Painter using their ‘Cluster Auto Painter’ tool. And you can add maternal or paternal to known matches in the data files from the companies to better sort your matches.

  1. Patricia Ann Harris Anthony has given me permission to use her real name. All the other names used in this post are fictitious.
  2. Li et al., “Relationship Estimation from Whole-Genome Sequence Data”, PLoS Genetics, 10, (Jan. 2014), 1371.

Genetic Affairs AutoFastClusters

There is an exciting new feature on Genetic Affairs – AutoFastCluster!!  You begin by entering your DNA matches and shared matches into the spreadsheet that’s now found inside Genetic Affairs.  Figure 1 shows the new spreadsheet in Genetic Affairs.

Figure 1. Genetic Affairs spreadsheet.

You enter the data into the online spreadsheet, select the cM range that you want to use, press ‘perform autocluster analysis’ and in a matter of seconds your clusters appear. An example of my results is shown in figure 2.  All the names are hidden for privacy.

Figure 2. AutoFastCluster results for my data.

The first match in the orange cluster is my known 2C. She and I share great grandparents, Thomas Byrnes and Bridget Mary Fenton. I wanted to get all of her matches and their shared matches into one large cluster analysis.  I used the online spreadsheet on Genetic Affairs, shown in figure 1, to enter my data for the analysis.  The AutoFastCluster gives a table at the bottom of the image with AutoCluster information.  Any of the notes you added to your match list will show up with that match in your cluster list.  Based on chatting with various matches I’ve been able to find connections to several surnames in our family.  Cluster 6, the pink one, contains surname Burns from County Roscommon, which is also where our Byrnes great grandfather was born.

Running an Analysis

To begin your analysis enter your DNA matches, the amount of shared cM and any notes into the ‘Data match list’ on the left of the spreadsheet.  Next you add your shared matches into the ‘Shared match list’ on the right of the spreadsheet.  Figure 3 shows some data filled into the two lists.  None of the matches’ real names were used in this example.

Figure 3. Data entered into spreadsheet.
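To see why the shared match list matters, it helps to think of each row as a link between two people.  The Python sketch below groups names that are connected through such links.  This is a deliberately simplified stand-in (plain connected components) and not the clustering method Genetic Affairs actually uses; the names and the function `group_shared_matches` are hypothetical.

```python
from collections import defaultdict

def group_shared_matches(pairs):
    """Group names connected through shared-match pairs (connected components)."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    seen, groups = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(graph[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups

# Each pair is one row of a shared match list: (match, shared match).
pairs = [("Ann", "Ben"), ("Ben", "Cal"), ("Dee", "Eve")]
print(group_shared_matches(pairs))
```

Here Ann, Ben and Cal end up in one group and Dee and Eve in another.  A real cluster analysis is stricter than this, but the input is the same kind of paired list.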

Once you’ve entered your data, decide on the range for the max and min cM values. In this example the highest cM value is 358 cM and the lowest is 20 cM, so I would run max 400 and min 15 cM.

At the bottom of your autocluster you can choose to save the cluster to your computer or go back to the spreadsheet page.  Figure 4 shows these options at the bottom of figure 2.

Figure 4. Options after you run an autocluster.

I often run a small test cluster and then go back and add more matches to my spreadsheet.  I find this particularly useful when I’m copying data from several different sites.  When you go back to the spreadsheet after running your cluster, it will initially be empty but clicking on ‘load’ will refill both lists with the data you most recently ran, which makes it very easy to add additional matches and their shared matches.  

Spreadsheet Options

There are a number of options shown on the spreadsheet between the two lists of data, as shown in figure 5. The Max, Min, Cluster size, and Name for the cluster are all used when you perform the Autocluster.  Other options allow you to export your data as CSV or Excel file, choose to clear your match list, your shared match list or both of them.  You can also save the match and shared match list locally in the program or add more rows to your lists if you need.  When you’re making your match list you need a minimum of 10 DNA matches in it in order to run a cluster analysis.

Figure 5. Options on spreadsheet.
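Before pressing the button it can be handy to check your own list against these settings.  The Python sketch below checks the 10-match minimum and counts how many matches fall inside a chosen Max and Min.  The match list is made up, and I am assuming the range limits are inclusive, which the site does not spell out.

```python
MIN_MATCHES = 10  # minimum number of DNA matches needed in the match list

def can_run(matches):
    """True if the match list is long enough to run a cluster analysis."""
    return len(matches) >= MIN_MATCHES

def matches_in_range(matches, max_cm, min_cm):
    """Keep (name, cM) pairs whose shared cM falls within [min_cm, max_cm]."""
    return [(name, cm) for name, cm in matches if min_cm <= cm <= max_cm]

# Hypothetical match list: 12 matches from 358 cM down to 20 cM.
cm_values = [358, 210, 150, 98, 75, 61, 52, 44, 37, 30, 24, 20]
matches = [(f"Match {i}", cm) for i, cm in enumerate(cm_values, start=1)]

print(can_run(matches))
print(len(matches_in_range(matches, max_cm=400, min_cm=15)))
print(len(matches_in_range(matches, max_cm=400, min_cm=50)))
```

With a range of 400 to 15 cM all 12 of these matches are used, while raising the minimum to 50 cM leaves only 7, which shows how much the minimum matters when you don’t have many close cousins.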

Summary

The new AutoFastCluster feature provides an easy, rapid cluster analysis for your data.  It is displayed directly in your browser, and you don’t have to wait for an email and unzip the attached files in order to see your clusters.

Advanced Paste Options

I have been collecting my match and shared match data in two CSV files in Excel.  If you already have your data in a CSV or txt file you can copy and paste it into the Genetic Affairs spreadsheet. 

First, select the ‘DNA Match name’ column as if you were going to type in data.

Second, hover your mouse over the boundary between the ‘DNA Match name’ and ‘cM’ columns.

Third, left click your mouse and paste the data into the ‘DNA Match name’ list. The same procedure is used to copy and paste data into the ‘Shared Match’ list.

Manual AutoClusters for LivingDNA

When I recently noticed that I had some matches at LivingDNA, I did a Leeds1 analysis of my shared matches to identify clusters. Now Genetic Affairs has a manual AutoCluster for LivingDNA that I can run using the CSV file that I made in my spreadsheet for my Leeds analysis. All I needed to do was generate a second CSV file that contained my shared match information.

To run manual AutoCluster for LivingDNA, I started with my match list from LivingDNA, shown in figure 1. I had put the name2 in Column A and the cM values in Column B, as shown in figure 2, when I ran my Leeds analysis. 

Figure 1. My list of matches from LivingDNA.

Next I added notes in Column C for my known matches, because these notes show up in my AutoCluster analysis. Both Columns B and C are optional, so it even works if you only supply the DNA match names. Then I saved the file as a CSV file.

Figure 2. Match file for manual AutoCluster for LivingDNA analysis.
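If you would rather build the match file with a script than in a spreadsheet, here is a minimal Python sketch of the layout described above: name in the first column, cM in the second and notes in the third. The names, cM values and notes are made up, the function name `match_rows_to_csv` is hypothetical, and I am assuming no header row, since a header is not mentioned above.

```python
import csv
import io

def match_rows_to_csv(matches):
    """Return CSV text with one row per match: name, cM, notes."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for name, cm, notes in matches:
        writer.writerow([name, cm, notes])
    return buffer.getvalue()

# Hypothetical example data (not real matches).
matches = [
    ("Cousin A", 59, "Dad's father's side"),
    ("Cousin B", 31, "Dad's mother's side"),
    ("Cousin C", 12, ""),  # notes are optional, so this one is left blank
]

csv_text = match_rows_to_csv(matches)
print(csv_text)
```

The printed text can be saved under any filename ending in .csv.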

The second file the manual AutoCluster analysis requires is a CSV file containing the shared matches. The first column of this file holds the match and the second column contains the shared match. One of the easiest ways to get this information is to ‘view profile’ for each LivingDNA match, as shown in figure 3.

Figure 3. View profile for an individual match.

Then scroll to the bottom of the shared matches page, highlight the entire page and copy its contents. Alternatively, you can select each shared match individually and paste it in the spreadsheet. The shared matches image from LivingDNA is shown in figure 4.

Figure 4. Highlighted shared matches for Van from LivingDNA.

Next paste the shared match information into a text editor such as Notepad on Windows or TextEdit on Mac. I’m using a Mac so the image in figure 5 is from Mac TextEdit. 

Figure 5. The copied shared matches in TextEdit.

TextEdit on the Mac preserved the formatting from LivingDNA; however, when I copied the names into my spreadsheet and saved it as a CSV file, all the formatting disappeared and the file contained only text. If you are using Notepad on Windows, the formatting disappears when you paste into Notepad. Figure 6 shows my spreadsheet with the match’s name in Column A and the shared matches in Column B.

Figure 6. Shared match file for manual AutoCluster analysis.

The shared match file was also saved as a CSV file. I was using Excel to make these, but any spreadsheet program, such as Numbers on Mac or Google Sheets, can be used. In order for the Genetic Affairs manual AutoCluster to recognize the files, the word ‘shared’ needs to appear in the shared matches filename. Other than that, any filenames can be used.
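The shared match file can be sketched the same way. This is a minimal Python example, assuming you have collected each match’s shared matches into a dictionary. All names are made up and the helper names are hypothetical. The filename check only tests for the lowercase word ‘shared’, because it is not stated here whether capitalization matters.

```python
import csv
import io

def shared_pairs(shared_by_match):
    """Flatten {match: [shared matches]} into (match, shared match) rows."""
    pairs = []
    for match, shared_list in shared_by_match.items():
        for shared in shared_list:
            pairs.append((match, shared))
    return pairs

def pairs_to_csv(pairs):
    """Return CSV text with the match in column 1 and the shared match in column 2."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(pairs)
    return buffer.getvalue()

def is_shared_filename(filename):
    """Check that the word 'shared' appears in the filename."""
    return "shared" in filename

# Hypothetical example data (not real matches).
shared_by_match = {
    "Cousin A": ["Cousin B", "Cousin C"],
    "Cousin B": ["Cousin A"],
}

print(pairs_to_csv(shared_pairs(shared_by_match)))
print(is_shared_filename("livingdna_shared_matches.csv"))  # True
```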

Next I put my data into Genetic Affairs for manual AutoCluster analysis. The URL for this analysis is https://members.geneticaffairs.com/autocluster and figure 7 shows the setup page.

Figure 7. Genetic Affairs manual AutoCluster entry page.

My closest match on LivingDNA is one of my Irish cousins who shares 59 cM with me. So I ran the AutoCluster from max 60 cM down to 9 cM.

The AutoCluster results are sent as a zip file to your email. First save the zip file to your computer and then unzip it. It contains an HTML file with the AutoCluster and an Excel file. Figure 8 shows the clusters for my 60-9 cM manual AutoCluster for LivingDNA analysis.

Figure 8. Results for my 60 to 9 cM manual AutoCluster for LivingDNA.

I noticed there are some grey squares associated with the large purple cluster. Unfortunately I don’t know any of the matches in that cluster. This looks like a great opportunity to contact some of the matches there and try to find the connection. Three of my great grandparents on my Dad’s side came from Ireland, and many of my matches on LivingDNA are from Ireland or Great Britain, so they likely match me somewhere on my Dad’s side. All of my Mom’s family came from Germany. For more details on analyzing grey squares, see ‘What are Grey squares’.

Looking at the cluster table below my AutoCluster I can see the notes that I added for my known cousins.  The cluster table is shown in Figure 9. I’ve chatted with Mike several times and know that he matches both my Dad’s mother’s and father’s sides of my family. Harry only matches on my Dad’s father’s side.

Figure 9. Part of the cluster table from my 60-9 cM AutoCluster analysis showing the notes I added to known cousins.

A manual AutoCluster for LivingDNA analysis costs 25 credits. When you first sign up for Genetic Affairs you receive 200 free credits, which allow you to run several analyses. When you decide to purchase more credits, each credit costs $0.01 in US dollars, so 25 credits cost $0.25.
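The arithmetic can be written out as a short Python sketch using only the numbers quoted above. The constant and function names are hypothetical, and the count of free runs assumes the free credits are spent only on this type of analysis.

```python
CENTS_PER_CREDIT = 1        # $0.01 per credit
CREDITS_PER_ANALYSIS = 25   # one manual AutoCluster for LivingDNA
FREE_CREDITS = 200          # credits received at sign-up

def cost_in_dollars(credits):
    """Convert a number of credits to US dollars."""
    return credits * CENTS_PER_CREDIT / 100

free_runs = FREE_CREDITS // CREDITS_PER_ANALYSIS

print(cost_in_dollars(CREDITS_PER_ANALYSIS))  # 0.25
print(free_runs)                              # 8
```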

Summary

I’ve found that by generating an additional CSV file of my shared matches when I’m doing a Leeds analysis, I can now run a manual AutoCluster for my LivingDNA data. The AutoCluster gives a different visualization than my initial Leeds analysis.

Footnotes

  1. https://www.danaleeds.com/dna-color-clustering-the-leeds-method-for-easily-visualizing-matches/
  2. All names are changed for privacy reasons.