Reconstruct trees for MyHeritage matches using AutoKinship

The newest feature on the Genetic Affairs website is AutoKinship.  The most amazing thing about AutoKinship is that it generates a tree using only your DNA matches and the shared DNA between your matches.  It doesn’t require you or any of your matches to have a tree. On the Genetic Affairs website there are two ways to run AutoKinship: an automated analysis for 23andMe or a manual analysis for MyHeritage or GEDmatch matches. Recently Roberta Estes wrote a blog post describing the use of AutoKinship for 23andMe.  This post describes using the manual AutoKinship at Genetic Affairs with MyHeritage DNA matches.

Figure 1. The two AutoKinship approaches on Genetic Affairs.

The starting point of this analysis is the MyHeritage AutoCluster clustering. In the MyHeritage clusters my second cousin, Trish¹, is found in cluster 6, but she has several grey cells to cluster 1. Cluster 7 has several additional known cousins.

Figure 2. MyHeritage AutoCluster.

Cluster 1

A close up of cluster 1 is shown in figure 3.

Figure 3. Cluster 1 and grey cells to Trish.

From the information provided on MyHeritage for the matches in cluster 1, two of them live in Australia, two in England, a couple in the US and the rest in Ireland.  Trish’s and my great grandparents, Thomas Byrnes and Bridget Fenton, both came from Ireland so I’m interested in finding the connection there.

To set up for AutoKinship I took the HTML from this MyHeritage cluster and converted it to an Excel file using the “Transform AutoCluster HTML to Excel” under the “Analysis” menu at Genetic Affairs.

Figure 4. Convert HTML file to Excel.

The Excel file has several tabs in it.  The first is a list of my matches.  The second tab is a list of the shared matches.  The next tab has the matches in cluster 1 followed by a tab with the shared matches for cluster 1.  This continues then for all of the clusters with a list of matches showing the cM that the match shares with me and whatever notes I’ve written about the match, followed by a tab that has those matches with the names of all their shared matches in the cluster.  Matches from grey cells are not included in the cluster matches but do show up in the shared matches list for that cluster.  Figure 5 shows the match list for cluster 1.

Figure 5. Match list in Excel file for Cluster 1.

Part of the shared match list for cluster 1 is in figure 6.

Figure 6. Part of the shared match list, ‘icw_1’ for cluster 1 as found in the Excel file.

Using the shared match list, I then went to MyHeritage and got the number of cM that these matches share with each other.

Figure 7. MyHeritage shared matches for Mary who is in cluster 1.

The shared match list in MyHeritage shows how much DNA Mary shares with me, but it also shows the amount that she shares with each of our shared matches. These data are needed for AutoKinship. I’ve circled two of these amounts in red in figure 7. These shared centiMorgans are copied and pasted into column C of ‘icw_1’ list.

Figure 8. After adding the shared cM to the ‘icw_1’ Excel tab.

Because Trish was a shared match to several of these matches she already shows up in the shared match list, but not in the match list (see figure 5), so I needed to add her to the match list. On MyHeritage Trish is listed as Patricia Ann Harris, her formal name.  AutoKinship needs the names exactly as they appear at MyHeritage so that it can find the correct person in the cluster.

Figure 9. Match list for cluster 1 with Trish added.

I also need to add Trish and myself to the shared match list.  I copied the table from the match list, shown in figure 9, and added that to the shared match list.

Figure 10. The ‘icw_1’ list after adding myself to the shared matches list.
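Because AutoKinship finds people by name, a quick check before pasting can catch a name that is spelled differently in the two lists. The following is only a minimal sketch: the function name, the (name, name, cM) row layout and the cM values are assumptions made for illustration, not the actual Excel layout or real data.

```python
def names_missing_from_match_list(match_names, shared_rows):
    """Return names used in the shared-match rows that are not in the match list.

    The comparison is exact (case and spacing matter), mirroring the need
    for names to be written identically everywhere.
    """
    known = set(match_names)
    missing = set()
    for name_a, name_b, _shared_cm in shared_rows:
        for name in (name_a, name_b):
            if name not in known:
                missing.add(name)
    return sorted(missing)


# Hypothetical data: names follow the aliases in this post, cM values are made up.
match_names = ["Mary", "Bob", "Joe Smith", "Patricia Ann Harris"]
shared_rows = [
    ("Mary", "Joe Smith", 200.0),
    ("Mary", "Patricia Ann Harris", 40.0),
    ("Joe Smith", "Trish", 35.0),  # nickname instead of the MyHeritage name
]

print(names_missing_from_match_list(match_names, shared_rows))  # ['Trish']
```

Any name the check reports needs to be either added to the match list or corrected to the spelling MyHeritage uses.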

I could run AutoKinship with the information that I have now, but I can also add a known genealogical tree (in the WATO format) for the known relationship between Trish and me.  Our common ancestors are Thomas Byrnes and Bridget Fenton, our great grandparents. Using the WATO tree ensures that Trish and I are placed correctly relative to each other. The amount of DNA that we share is on the high side for second cousins, and it could be labeled as first cousins once removed. Since we know the relationship it’s better to set it with a WATO tree.

Figure 11. WATO tree showing Trish and my family relationship.

With the WATO tree I need to use the same exact names for Trish and myself that MyHeritage uses, or they will not be recognized as the same people.  Also on the WATO tree I need to add the shared cM that MyHeritage has for Trish and use 0 cM for myself.

To use the WATO tree with AutoKinship I downloaded the WATO tree.  Do not use the ‘save image’ as that will create an image of the tree and not what is actually in the tree.

Figure 12. Download the WATO tree to use with AutoKinship.

Now everything is ready to run the manual AutoKinship.

Figure 13. The entry screen for manual AutoKinship.

For name of tested person, I entered my name exactly as it is on MyHeritage.  The default is for 10 trees. You can select more if you want, and they are listed from highest probability down to lower ones.  The first few would be the most likely.  Maximum difference in generation refers to the difference between the tested person and their matches.  The default is 2 generations which would include people in my generation or my parents’ or my children’s generations.  Since I don’t know how all the matches are related to me this is likely a good value.  If I were to set it to 3 generations that would indicate that some of the matches could be in my grandparents’ or grandchildren’s generation. Looking at the ages the matches have indicated in MyHeritage gives me an idea that 3 generations is not needed here. ‘Set generation of tested person’ lets you set the generation level for yourself if you’ve set the generation of some of your matches.  This is especially helpful if you know how some of the matches in the list fit in your tree and are a different generation.  This is data from MyHeritage so I want to use the MyHeritage probabilities.  And I’ve loaded the WATO tree for Trish and me.

Figure 14. Full screen setup for manual AutoKinship.

There are two ways that the data can be entered.  However, the bulk import is so much easier! Just copy and paste the match data from the Excel file.  In this case it’s cluster 1 so the tab with the data has a ‘1’ on it for the Bulk Input DNA matches data.

Figure 15. Bulk import DNA matches with the data from cluster 1 filled in.

Then copy and paste the shared match list from the ‘icw_1’ tab into the Import shared matches data screen.

Figure 16. Data pasted from ‘icw_1’ into the shared matches screen.

Next I clicked on “Perform AutoKinship Analysis”.  A zip file was sent to my email, which I then downloaded, saved to my computer and unzipped.  The first file, autokinship.html, is the landing page and shows the tree with the highest probability.  The autokinship.xlsx file lists the matches in one tab and all the shared matches in another tab.  I’d used 10 as the maximum number of trees.  Tree1.html is identical to the landing page tree.  The other 9 are trees with lower probability. WATO versions of the 10 trees are also provided.

Figure 17. List of files in the AutoKinship directory.

Figure 18 shows the first tree using WATO for Trish and me and setting only myself as generation 0.

Figure 18. Landing AutoKinship for cluster 1 with only me set to generation 0.

The WATO tree puts Trish and me into our correct second cousin relationship.  However, Sarah, Sue and Joe Smith being at our grandparents’ level seems unlikely, since they list their age range on their MyHeritage page, and they are in the same range as Trish and me.

Next I ran AutoKinship with Joe Smith set to generation 0, in addition to setting myself as generation 0 using the ‘set generation level of tested person’ option shown in figure 13.  To set generation 0 for Joe Smith I added 0 in Column C next to Joe’s name in the match Excel file (see figure 19) and used that match file in AutoKinship.

Figure 19. Match table with Joe Smith set to generation 0.

The landing page AutoKinship tree for the analysis that has Joe listed as generation 0 is shown in figure 20. There is a notation of gen 0 by both Joe’s and my names in the AutoKinship tree.

Figure 20. AutoKinship landing page tree with both Joe and me set to generation 0.

In the AutoKinship tree, clicking on a person’s name brings up a box that summarizes all of their matches and the amount of DNA shared, both as centiMorgans and as a percentage. This is shown for Joe Smith in figure 21.

Figure 21. This display shows how Joe Smith matches each person in the AutoKinship tree.

One interesting thing that jumps out at me is the relationship between Joe Smith, Sue and Sarah is the same in both AutoKinship trees.  On MyHeritage Sue only has a tree of 1, so that doesn’t provide any information.  However, her son Frank has a small tree but indicates his mother’s maiden name was Smith.  Sarah also has a small tree and indicates her mother’s maiden name was Smith.  From their shared DNA it appears that those connections are through Joe’s grandfather and great grandfather on the Smith side of his family.

Bridget Fenton’s mother, our second great grandmother was Johanna O’Brien.  Bridget was born in 1853 in Limerick and was Johanna’s only child born in Ireland.  All her other children were born in the United States and lived there their entire lives.  We don’t know who Johanna’s parents or her siblings were.  It appears that at least one generation is missing here, since Johanna cannot be person #1 in the tree.

In following Joe Smith’s family back starting with the tree he had and looking up Irish civil birth records and Catholic baptismal records I discovered that his great grandfather, born 1851, married an O’Brien who was born in 1850.  They would be in the same generation as Bridget, born 1853.  And both this O’Brien and Bridget Fenton were baptized at the same Catholic parish in Limerick.  Unfortunately, the records haven’t survived far enough back to give either my second great grandmother, Johanna’s or Joe’s second great grandfather’s baptismal records.  My hypothesis is that they were siblings (see figure 22).  There is another occurrence of O’Brien in Joe’s tree on his mother’s side.  It’s quite possible that Trish and I match on his mother’s side as well, and we just haven’t found that connection yet.

Figure 22. Tree with Bridget’s mother, Johanna O’Brien added.

Looking at the others in cluster 1 I’ve messaged with Mary.  She and her brother, Bob, are second cousins to Joe Smith but not on the Smith line.  Mary’s grandmother is sister to Joe’s grandfather. I’ve not found enough records to generate a hypothesis for her connection to Trish and me since it is different than our connection to Joe.  Meagan and Barbara are cousins to each other.  They have a private tree and did not reply to messages.  So I have no idea how they connect.  Ann has a small tree but all of the people in her tree live in the US.  I have not messaged her at this time.

Cluster 7

Cluster 7, shown in figure 23, has several known cousins in it. 

Figure 23. Cluster 7 from the MyHeritage clusters.

Carl is a known second cousin on my Dad’s father’s side, and Carol and Andy are known second cousins twice removed. Our Barry family came from Kilkenny, Ireland.  Edward Barry married Pauline Fröhlich from Baden after both families had immigrated to Evans, Erie, New York.   Figure 24 shows the WATO tree for this side of my family.

Figure 24. WATO tree for known cousins in cluster 7.

With both a second cousin and second cousins twice removed the generation indicator in the Genetic Affairs AutoKinship setup becomes very important.  If Carl and I are set to generation 0, then Carol and Andy would be -2 since twice removed is 2 generations past Carl and me.  Figure 25 shows the cluster 7 match tab.  It is worth noting that Carol’s and Andy’s setting is based on where they are in relation to our common ancestor and not on when they were born. Their births are both within a few years of my daughter’s birth.

Figure 25. Cluster 7 match table showing relationship of cousins.
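The generation values in the match table can be worked out by counting generations up to the common ancestors. The sketch below assumes the sign convention implied by the -2 above (matches further down the tree get negative numbers); the function name is made up for illustration and is not part of AutoKinship.

```python
def generation_level(tester_steps_to_ancestor, match_steps_to_ancestor):
    """Generation of a match relative to the tested person (generation 0).

    Each argument counts the generations from that person up to the shared
    ancestor couple. A match further down the tree gets a negative number,
    in line with the -2 used for the twice-removed cousins.
    """
    return tester_steps_to_ancestor - match_steps_to_ancestor


# Second cousins share great grandparents: 3 generations up for each.
carl = generation_level(3, 3)
# Second cousins twice removed sit 2 generations further down: 5 steps up.
carol = generation_level(3, 5)

print(carl, carol)  # 0 -2
```

Counting from the common ancestors rather than from birth years is what keeps Carol and Andy at -2 even though they are close in age to my daughter.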

The AutoKinship tree for cluster 7 is shown in figure 26.

Figure 26. AutoKinship landing tree for cluster 7.

Looking at the information the matches provided on their MyHeritage pages, Laura has ancestors in Nova Scotia and Prussia.  Richard has ancestors from England, several counties in Ireland and Newfoundland, and Mae has ancestors from England and Ireland and specifically from County Kilkenny in Ireland.  Based on immigration information for the Barry family, specifically not finding any passenger list to the United States, a cousin’s family that immigrated through Canada, and several DNA matches that live in Ontario, the Newfoundland and Nova Scotia connections are not surprising. Our hypothesis is that the family immigrated to Erie, NY, a short distance from Buffalo, an international entry location, after arriving by ship from Ireland to Canada.  Since Ireland to Canada would have been within the British Commonwealth there would have been no passenger lists for the journey.

Conclusion

First there was a DNA match.  Then shared matches gave a hint to the family connection.  A triangulated match provided a second hint.  Next AutoClusters grouped these shared matches together to hint of the relationship between them.  And now AutoKinship provides the biggest hint by suggesting how the family tree is connected!

  1. Trish has given me permission to use her real name. No other living people are identified by their real names in this blog.

Exploring AutoClusters

The other day several of us were having a discussion about AutoCluster, which employs In Common With (ICW) matches, and AutoSegment clusters, which employ overlapping segments and triangulated segments.  Triangulated segments, where you and some of your matches share the same segment on the same location of a particular chromosome, indicate that you share a common ancestor. A segment of DNA can only come from one ancestor. Then it’s a matter of determining which ancestor gave you that segment.
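The idea of several matches sharing the same location on a chromosome can be sketched as an interval overlap. This is only an illustration with made-up positions and hypothetical names (`Segment`, `overlap`, `common_region`). Overlap with me is just the first requirement for triangulation, since the matches must also match each other on that region, which this sketch does not check.

```python
from typing import NamedTuple, Optional, Sequence


class Segment(NamedTuple):
    chromosome: str
    start: int  # start position (base pairs)
    end: int    # end position (base pairs)


def overlap(a: Segment, b: Segment) -> Optional[Segment]:
    """Return the region shared by two segments, or None if they do not overlap."""
    if a.chromosome != b.chromosome:
        return None
    start, end = max(a.start, b.start), min(a.end, b.end)
    if start >= end:
        return None
    return Segment(a.chromosome, start, end)


def common_region(segments: Sequence[Segment]) -> Optional[Segment]:
    """Return the region covered by every segment in the list, if any."""
    region: Optional[Segment] = segments[0]
    for seg in segments[1:]:
        region = overlap(region, seg)
        if region is None:
            return None
    return region


# Made-up segments that three matches share with me on chromosome 10.
shared_with_me = [
    Segment("10", 1_000_000, 9_000_000),
    Segment("10", 3_000_000, 12_000_000),
    Segment("10", 2_500_000, 8_000_000),
]
print(common_region(shared_with_me))
```

With these made-up positions the region common to all three runs from 3,000,000 to 8,000,000 on chromosome 10.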

ICW clusters are groups of matches that share many, if not all, matches, but they do not necessarily share one common ancestor.  The question then came up: if matches are in a cluster together, doesn’t that automatically mean they all share a most recent common ancestor (MRCA)?  Like so many things, it depends.

In many cases I’ve been able to classify an ICW cluster to a specific grandparent or great grandparent.  But then there are some where there’s a mixture of generations, a great grandparent and that person’s parents, for example.  Those seem to be the ones that I see most often.  There is a common ancestor, but not everyone in the cluster has the same most recent common ancestor.  Recent being the key word there.

My family is such that I have a large number of unknown DNA matches.  I’m an only child.  My parents did not do DNA testing.  My closest known cousins are 2nd cousins.  Most often I’m looking for fourth or more distant cousins. 

Maternal 2nd and 3rd Great Grandparents

The AutoCluster in figure 1 is from 23andMe.  I found person A early on as a DNA match.  She has tested on several sites.  We emailed back and forth, and I was able to add her to my tree.  The next one I found on 23andMe was person E.  We also emailed, and I was able to place him in my tree.  Filling in other descendants from the same line with the help of cousins’ trees I had all the others in my tree before they showed up as DNA matches.  A mini tree that shows how they are all connected to me is in figure 2.

Figure 1. AutoCluster of some of my 23andMe matches.
Figure 2. Family tree showing the connections between the DNA matches in the AutoCluster in figure 1 and me.

This is my mother’s mother’s side of the family.  I’m circled in the tree in figure 2.  My grandmother, Louise Wolff, was the daughter of Jacob Wolff and Anna Marie Briel. Both were born in Marburg, Germany and immigrated to Richmond, VA.  A number of Anna’s cousins on her mother’s side had already immigrated to Richmond.  In the tree Anna Briel’s parents were Phillip Briel and Elizabeth Schaaf, my 2nd great grandparents. All but one of the matches in the cluster descend from Phillip and Elizabeth.  Person H descends from Elizabeth Schaaf’s parents Matthaus Schaaf and Anna Kuntz.

I like to use the Cluster Auto Painter (CAP) and add my cluster results to DNA Painter.  Figure 3 shows the segments from this cluster.

Figure 3. DNA Painter profile showing segments from AutoCluster in figure 1.

The segments are labeled the same as in the cluster (figure 1) and the tree (figure 2).  Each segment of DNA that you inherit comes from only one ancestor.  Most of the time I name them for ancestor couple because I don’t know which of the couple gave the segment to me.

The segments on chromosomes 10 and 12 are where person H matches me.  Because the MRCA between H and me are my 3rd great grandparents, I would move those to a new group named for Matthaus Schaaf and Anna Kuntz.  Then I’d change the cluster name for the rest of these matches to Phillip Briel and Elizabeth Schaaf. Clearly A, B, C, and D on chromosome 10 got that segment that they share with me from my 2nd great grandmother, Elizabeth Schaaf, and she got it, as did H, from her parents.  I name my groups based on couples, and I have no way, at this point, to tell if that segment came from Elizabeth’s father or her mother.

How was I able to figure all of this out?  First, one of my Schaaf 4th cousins has been researching and documenting the family a lot longer than I have.  A lot of the information in my tree came from her research.  When I noticed a match to person A first on FTDNA, I emailed her.  That helped me fill in some of the living people in her part of the family.  Person E matched on 23andMe, and because of his triangulating with A and me, I knew he was in this part of this same family.  I messaged him, and he helped me fill in his family.  When person B appeared I knew right away where she fit because her mother and mine had been good friends.  So really it came down to having a good, filled out tree and matches who replied and shared information with me.

Paternal – Somewhere on the Byrnes line

I don’t always have such luck in getting replies to messages.  Figure 4 shows a perfectly filled AutoCluster from 23andMe that my paternal 2nd cousin, Trish, is in.

Figure 4. An AutoCluster from 23andMe of matches that triangulate with Trish and me.

The 3 others in the cluster are a father, his brother and his son.  The son was the first to show up as a match to me, sharing 36 cM.  According to the shared cM project 36 cM is the average for 4th cousins, so that would be around the 3rd great grandfather level.  His father also shares 36 cM and his uncle shares 39 cM.  He’d added a greeting on his page and wanted people to message him.  He also indicated that he lives in the same city where I grew up.  I messaged him 3 years ago but never got a reply.  About a year later his uncle showed up as a match.  I messaged him 2 years ago and again no reply.  By now Trish had tested and I could see that they matched her, so I knew it was on my father’s mother’s side.  The father only showed up recently, and I messaged him last week.  He also added 3 ancestor surnames.  Unfortunately the surname of my matches here is rather common.  I tried looking for each of them on Ancestry and found over 900 members with the same name.  Then I tried looking for trees with combinations of their surname and the 3 surnames the father had listed.  Still that didn’t help.  

The 3 of them share 2 segments of DNA with me.  They triangulate with Trish on chromosome 17.  I know that triangulated segments with Trish could be Byrnes, Fenton, Lillis, Shannon or O’Brien.  I’ve not resolved which is the chromosome 17 segment.  The other segment that the 3 of them and I share is on chromosome 1, see figure 5.  I do know something about that segment!  Four years ago I found 6 people on GEDmatch that triangulated with each other and me on that segment and emailed them.  I heard back from several of them.  One of their grandmothers was a Byrnes, and several of them had ancestors from County Galway near the County Roscommon border!  Thomas Byrnes, Trish’s and my great grandfather, was from County Roscommon, but we don’t know exactly where in Roscommon.  This gives us a hint to where he lived. I’ve not been able to find a baptismal record for Thomas, so I know he lived in an area where the baptismal records have not survived.  All I can say for this cluster is that I know at least 1 of the segments is on our Byrnes line, and it’s likely around the 3rd great grandparent level or more distant. Since Trish’s and my MRCA are our great grandparents, there are at least 2 different MRCA in this cluster.

Figure 5. Segments shared with me on chromosome 1 from the AutoCluster in figure 4.

Dave’s Paternal Grandmother’s Line

My husband Dave has a large number of known cousins on his paternal grandmother’s Marti side, and many of them have done DNA tests. Dave’s Aunt Mary worked on the family tree for many years, and we can trace back several generations.  Figure 6 shows one of Dave’s AutoClusters from MyHeritage.

Figure 6. One of Dave’s MyHeritage AutoClusters.

At first I thought this was going to be similar to my Schaaf one since there’s a combination of 2nd and 3rd great grandparents.  But as soon as I drew out the tree I knew something was different here.  Figures 7 and 8 show the trees for this cluster.

Figure 7. Tree for matches that have Jacob Marti and Anna Fritz or Jacob’s parents, Adam Marti and Elizabeth Schnell as MRCA.

Dave’s paternal grandmother Harriett’s father was Jacob Marti, son of Jacob Marti and Anna Fritz.  The elder Jacob’s parents were Adam Marti and Elizabeth Schnell.  Now the problem comes in that match K descends from Veronica Stamm, who is Anna Fritz’s mother.  After Veronica’s husband, Johann Fritz, died, she remarried and had daughter Rose, who was a half sister to Anna Fritz.  Match K descends from Rose.

Figure 8. Tree for match K showing the MRCA for K and Dave is Veronica Stamm.

The MRCA of Dave and matches B through H are his 2nd great grandparents, Jacob Marti and Anna Fritz.  His MRCA with matches A and J are Jacob’s parents, Adam Marti and Elizabeth Schnell, and his MRCA with match K is his 3rd great grandmother, Veronica Stamm.  Person K matches B through H with Veronica Stamm as their MRCA.   All of this would be fine as long as K doesn’t match A or J.  However, K does match A. There has to be some more distant connection between Veronica Stamm’s family and the Marti family.  Matches A and K share 32 cM and do not triangulate with Dave. From the Shared cM Project 32 cM would be in the 4th to 5th cousin or more distant range. Veronica Stamm was born in 1811, so the MRCA here would be in the 1700s.  All of these families were living in the same village in Switzerland at that time, so it’s quite possible that there were other earlier marriages in the family that we don’t know about.
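One way to see that this cluster does not have a single MRCA is to tabulate the matches by the ancestor or couple each one shares with Dave. The sketch below uses the match letters and ancestors from this example; the function name is hypothetical.

```python
from collections import defaultdict


def group_by_mrca(mrca_by_match):
    """Group match labels by the MRCA they share with the tested person."""
    groups = defaultdict(list)
    for match, mrca in mrca_by_match.items():
        groups[mrca].append(match)
    return dict(groups)


# MRCA of each match with Dave, as worked out from the trees in figures 7 and 8.
mrca_by_match = {match: "Jacob Marti & Anna Fritz" for match in "BCDEFGH"}
mrca_by_match.update({
    "A": "Adam Marti & Elizabeth Schnell",
    "J": "Adam Marti & Elizabeth Schnell",
    "K": "Veronica Stamm",
})

groups = group_by_mrca(mrca_by_match)
for mrca, matches in groups.items():
    print(mrca, sorted(matches))
print("single MRCA for the whole cluster:", len(groups) == 1)
```

Three different MRCAs come out of one ICW cluster here, which is the situation that prompted the original question.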

Conclusion

This started from a question about whether or not all the matches in an AutoCluster were from one most recent common ancestor.  In my experience and the examples I have shown here, they are not.  There is a family line, such as the paternal grandmother, that all the matches follow, but there are typically several generations of ancestors present in the cluster.  How do you figure out the exact connections?  What I’ve found is having a detailed tree, matches that also have detailed trees, as well as matches that will reply to messages and share family information with you are important to helping to find that common ancestor.

Considering the fact that the clustering was performed using shared matches, this conclusion perhaps should not be a surprise. Shared match data is usually a mixture of DNA matches that share the same or another segment as compared to you. However, the AutoSegment ICW, which is available for FTDNA and 23andMe, and AutoSegment for GEDmatch, which employs triangulated data, looks for overlapping segments that are on the same side. Therefore, by using these clusters, we should be able to obtain clusters of matches that share the same or several DNA segments and therefore share a common ancestor. The AutoSegment ICW clusters will be explored in a future blog.

Both Dave and Trish have given me permission to use their real names.  All other living people’s names are hidden for privacy.

Improved AutoCluster clustering on GEDmatch

GEDmatch now offers the improved AutoClustering tools that will generate AutoTrees from the gedcoms of DNA matches, and if there are high enough DNA matches with extensive trees, additional trees might be generated by AutoPedigree. All of these new features were developed in collaboration with Evert-Jan Blom of Genetic Affairs and are available from the Tier 1 features on GEDmatch. One of these is ‘Clusters, Single Kit input, Basic Version’ but it’s considerably more than the basic cluster GEDmatch used to offer. Another way of using the new AutoTree feature is based on using tag groups in the Tier 1 ‘One-to-Many Comparison Beta.’ Both of these will be examined here.

Clusters, Single Kit input

The first of these is the ‘Clusters, Single Kit input, Basic Version.’ There are now more input parameters than there used to be which allows more control over the results you obtain. The opening screen for the “Clusters” item in Tier 1 is shown in figure 1.  The “Work Flow info Toggle” gives suggestions on running your AutoCluster and AutoTree.  First you’d want to run the cluster, next go back to the entry screen and run the Auto Tree, and finally go back and run the segment data. 

Figure 1. Entry page for Tier 1 ‘Clusters, Single Kit Input, Basic Version’

I usually think of the values that appear in the opening screen as a bit of a default for my first run.  Number of kits is set to 100.  Other options are 250 or 500 kits.  The lower threshold is set for 15 cM, which is what I’d likely use.  But the upper threshold is 50 cM.  I’m not sure that the range 15 – 50 cM would be very useful for most people.  I like to set the upper threshold based on what matches I want included in the cluster or which ones I want to exclude.  For example, I’d want to include cousin matches, but most likely not want to include siblings or nephews/nieces.  The information from the ‘i’ button on ‘minimum overlap’ is shown in figure 2.

Figure 2. Information button for ‘Minimum Overlap’.

This minimum overlap is related to ‘Overlap’ seen in the One-to-Many comparison.  Figure 3 shows the One-to-Many for Judy, my sister-in-law.

Figure 3. Judy’s One-to-Many GEDmatch list.

Overlap numbers that are less than 100,000 are colored various shades of pink or red.  Judy tested at 23andMe.  Her brother tested at both 23andMe and at Ancestry.  Notice how his Ancestry migration – F2 -A is pale pink whereas his 23andMe one is over 100,000.  Sister 1 only tested at 23andMe, but sister 2 tested at both Ancestry and 23andMe.  All the matches from 23andMe are over 100,000 overlap whereas the ones from Ancestry are not and are pink or red.  I definitely want to include matches that tested at other companies in my analyses. To me this is one of the huge benefits of GEDmatch that I can compare matches from different testing companies and especially from Ancestry since they do not provide a chromosome browser.  Looking at this list tells me that 40,000 is the value I want to use here for Judy’s data.
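What the cM thresholds and the minimum overlap do can be pictured as a simple filter over the One-to-Many list. The sketch below is only an illustration of the parameters, not GEDmatch's actual selection or clustering logic: the kit labels, cM values and overlap counts are made up, and treating the cM bounds as inclusive is an assumption.

```python
def select_kits(kits, lower_cm, upper_cm, min_overlap):
    """Return labels of kits inside the cM range that meet the overlap minimum."""
    return [
        kit["label"]
        for kit in kits
        if lower_cm <= kit["cm"] <= upper_cm and kit["overlap"] >= min_overlap
    ]


# Hypothetical One-to-Many rows.
kits = [
    {"label": "cousin (23andMe)", "cm": 92.8, "overlap": 120_000},
    {"label": "cousin (Ancestry)", "cm": 60.0, "overlap": 45_000},
    {"label": "sibling (23andMe)", "cm": 2600.0, "overlap": 130_000},
    {"label": "distant (low overlap)", "cm": 20.0, "overlap": 30_000},
]

# A 100,000 minimum overlap drops the Ancestry kit.
print(select_kits(kits, 15, 100, 100_000))
# Lowering the minimum overlap to 40,000 keeps it.
print(select_kits(kits, 15, 100, 40_000))
```

In this toy list the upper threshold of 100 cM is what excludes the sibling, while the overlap setting decides whether the Ancestry kit is kept.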

‘Include Segment Data’ “performs 1-to-1 on all pairs of kits. Includes a Triangulation check.  Generally more accurate results,” see figure 4.  When this analysis is run a triangle appears in matches that triangulate.  The ‘Auto Tree’ option will first add a tree symbol to your cluster and allow you to see the match’s gedcom pedigree.  However, if there are high DNA matches with extensive trees you can also get AutoPedigree.

Figure 4. Information for ‘Include Segment Data’ option.

The ‘Cut off Year’ refers to the earliest date seen in the tree.  The default is 1700 CE, but it can be lowered to 1500 CE.

Example

Using my sister-in-law Judy’s¹ GEDmatch data, I wanted to include Sue, her 3C1R.  Figure 3 shows Sue matches Judy with 92.8 cM in the One-to-Many match list.  I want to include all the various testing companies so I set the overlap parameter to 40,000.   Using 100 kits and running the cluster from 15 cM to 100 cM should include her, but it didn’t.  The matches that were in the clusters ranged from 26.4 to 27.1 cM.  It appears that GEDmatch is using some modification of the clustering algorithm that they used previously which favored the smaller cM values.  After trying several combinations of values I obtained a cluster with Sue, Judy’s 3C1R, by using 25 to 500 cM and 40,000 overlap with 100 kits. It’s important to view the cluster order ‘by Cluster number’ in order to get the cleanest results.  The clusters for this analysis are shown in figure 5.

Figure 5. Judy’s clusters that included her 3C1R, Sue.

One thing you might notice are clusters that are more sparse as compared to other clustering results. This is because the clustering settings have been changed to generate larger clusters. This should improve the identification of common ancestors when they would not be found if the clusters are more condensed.

Next I ran the Auto Tree.  In Chrome hitting the back arrow on the browser twice will bring up the original screen with the parameter I’d used intact.  Often several more matches are shown in the clusters with Auto Tree, and a tree icon is shown for matches that have a gedcom on GEDmatch.  See figure 6.

Figure 6. Judy’s AutoTree using the AutoCluster in figure 5.

Now both Sue, the 3C1R, and her daughter, Mary, are showing in the cluster, whereas Mary wasn’t showing in the earlier cluster in figure 5. Neither of them has a gedcom, but I know they are related to Judy through her second great grandparents on her paternal grandmother’s side.  There are a number of trees here to explore. I recognized D. McFury’s surname as I’ve seen it with another match on 23andMe.  There was no tree on 23andMe so I was very interested to see the gedcom and hopefully figure out the connection to Judy.  Part of the gedcom is shown in figure 7. Ellen Talbot is D. McFury’s 2nd great grandmother. Judy’s 2nd great grandmother is Jane Talbot who married Richard Coleman.  This match is on her Talbot 2nd great grandmother’s line.  Two things in this gedcom jumped out at me.  Ellen Talbot was from County Kilkenny, Ireland, which is where my Barry family lived. Kilkenny is the one place in Ireland that I’ve done extensive research.  The other is that Ellen’s daughter Nancy Anna died in Kalamazoo County, MI.  Judy’s 2nd great grandfather, Richard Coleman, lived in Kent County, MI and is buried in Ada, MI which is only about 70 miles (113 km) from Kalamazoo. In fall 2019 I spent a day in Kent County researching the Coleman family and plan a trip back there when I’m visiting in MI and the libraries are open again.

Figure 7. The part of D. McFury’s gedcom showing her 2nd great grandmother, Ellen Talbot.

D. McFury shares 35 cM with Judy.  Looking at the Shared cM Project on DNA Painter that could be 3rd cousin or perhaps more likely 4th cousin.  Now I need to build out the tree and try and find the most recent common ancestor between Jane Talbot and Ellen Talbot.

At the top of the screen with the clusters and trees there are several new options, shown in figure 8.

Figure 8. Options after running AutoCluster with AutoTree

Selecting the ‘AutoTree AutoCluster Analysis’ tab brings up an explanation of how the AutoCluster and AutoTrees are calculated.  At the bottom of this explanation is a table.  See figure 9. 

Figure 9. Table of data for the trees associated with the various clusters.

In the table each cluster is listed along with any information that has been obtained from the cluster. Looking at cluster 3, clicking on the tree icon will display the tree connecting the matches in the cluster. This tree is shown in figure 10. Location indicates there are 5 locations for this tree. Clicking on the gedcom icon will download a gedcom for this cluster. Surnames lists all the surnames found in the gedcoms for the cluster.

Figure 10. Tree for cluster 3.

Below the tree is a list of all the people and locations in more detail, shown in figure 11. The table shows the gedcom number found on GEDmatch, the names of the ancestors, their birth and death locations, the descendants, their GEDmatch kit number, their name and the shared cM. All the kit numbers and living people are blocked out for privacy in the figure. Clicking on the location brings up a Google map showing the exact location!

Figure 11. Details for common ancestors for tree from cluster 3.

Below this table are detailed tables of all the locations found and the ancestors who lived in them for the matches and the primary kit.  Figure 12 shows which ancestors from this cluster lived in Grand Rapids, MI, as well as Judy’s ancestors who lived there.  These tables add a great deal to the information I’d seen in the gedcom. It lists all the information in one place, as opposed to spread out in the gedcom, and it does the exact comparison of specific locations that are common to the different gedcoms.

Figure 12. Detailed locations where ancestors for D. McFury and Judy both lived.

Next I ran the ‘Include Segment Data’.  The clusters go back to what was seen for the original clusters but this time with triangles in the clusters to indicate triangulated matches as shown in figure 13.

Figure 13. Judy’s Clusters showing Segment Data.

Sue, who is a 3C1R, triangulates with other known cousins who are descendants of Judy’s paternal grandmother’s side of the family. These are seen in the red cluster. The two matches in the first orange cluster triangulate, but I do not know how they connect to Judy’s family. The green cluster is on her maternal grandfather’s side, as is the pink cluster. The brown cluster is on her maternal grandmother’s side. The mustard cluster is where D. McFury is and would be on her paternal grandfather’s side. Both the turquoise and orange clusters are on her maternal grandfather’s side of the family.

Tag Groups

Another method to get AutoTrees and potentially AutoPedigree is based on tag groups. From the home page I selected ‘View/Change/Delete your profile (password, email, groups),’ see figure 14. Then I selected ‘Tag Group Management,’ and figure 15 appeared.

Figure 14. GEDmatch home page to set up Tag Groups.
Figure 15. Screen to set up a tag group.

Since I wanted to make a tag group for Judy using her matching kits that contain gedcoms, I used Judy1 for the description. I selected green for the color and clicked ‘Add Tag Group.’ Next I went back to the main page by selecting ‘Home’ and selected ‘One-to-Many DNA Comparison Beta’ under Tier1. The One-to-Many entry screen is shown in figure 16. After entering Judy’s kit number, I selected 2500 as the ‘limit’ for the number of kits to include and clicked ‘Search.’

Figure 16. One-to-Many DNA Comparison Beta.

Once the list of matching kits is displayed, I clicked ‘Select all with GEDComs.’ That resulted in about 250 matches. Then I selected ‘Visualization Options’, which brings up the screen shown in figure 17, and there I selected ‘Tag Groups’, which let me add the 253 matches that had gedcoms to the Judy1 tag group.

Figure 17. Visualization Options screen.

The end result of adding the 253 matches to the Judy1 tag group is shown in figure 18.

Figure 18. Tag group Judy1 now has the 253 matches that have gedcoms.

I next went back to the screen in figure 17 and selected ‘Clustering’ which brings up the screen shown in figure 19.

Figure 19. Clustering screen.

Since I’ve only included kits that have gedcoms I selected ‘Auto Tree’. I can also select ‘Include Segment Detail’ if I want to see which of the matches triangulate. An ‘Overlap’ of 100,000 would likely only include kits that were tested at 23andMe, which is where Judy tested, so I again changed that to 40,000 in order to include all the testing companies. I used 15 – 1000 cM for the range and selected ‘Cluster’. The top of the resulting screen is shown in figure 20 and the top part of the cluster is shown in figure 21.

Figure 20. Cluster results for Judy1 tag group of kits with gedcoms.
Figure 21. Cluster using only kits that matched Judy and had a gedcom.

Next I looked at the ‘AutoTree AutoCluster Analysis’ tab to get the list showing common ancestors and common locations, as shown in figure 22.

Figure 22. List of AutoTrees and common locations.

Notice in figure 22 how all of the clusters have locations listed because they all have gedcoms, but not all of them show common ancestors. Very often finding ancestors of DNA matches who lived in the same location as your ancestors did can help to either make a connection or at least help identify which part of your family is likely in the connection. I decided to look at the tree for cluster 22, which is shown in figure 23.

Figure 23. AutoTree for cluster 22.

Looking at the list of common locations associated with cluster 22 and this AutoTree points to the common ancestors that are listed in the table below the AutoTree and shown in figure 24.

Figure 24. Common ancestors table.

Jane Lutey and Simon Hocking are Judy’s 5th great grandparents. They are also Ann’s and her brother, Mark’s 4th great grandparents. In the AutoTree shown in figure 23 Martha Murrish and James Hocking are Judy’s great grandparents. Judy’s mother was Virginia Hocking. Virginia’s paternal ancestors were miners in Cornwall, England until the mines closed. Some of the miners immigrated to Minnesota, which is what Judy’s great grandparents, Martha Murrish and James Hocking did. Other miners immigrated to Australia, which is what Judy’s 5C1R, Ann’s and Mark’s great grandparents, John Hocking and Margaret Oats, did. John and Margaret are the parents of Caroline Jane Hocking shown in the AutoTree in figure 23. There is a second Hocking line that connects Judy with Ann and Mark, which makes the shared cM value higher than would be expected for a 5C1R. Ann was one of the first matches on GEDmatch that we found, and we’ve been emailing ever since.
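
For anyone who wants to double check a label like 5C1R, the arithmetic can be sketched in a few lines of Python. This is only an illustration, not something GEDmatch or Genetic Affairs provides; the function names are made up for this example. It counts generations from each person up to the shared couple, where an "Nth great grandparent" is N + 2 generations up.

```python
def nth_great_grandparent(n):
    """Generations from a person up to their nth great grandparent.

    Parent = 1, grandparent = 2, great grandparent = 3, so add 2 to n.
    """
    return n + 2


def cousin_relationship(gens_a, gens_b):
    """Label a cousin relationship from generations up to the shared couple."""
    nearer, farther = sorted((gens_a, gens_b))
    if nearer < 2:
        raise ValueError("this sketch only handles cousin relationships")
    degree = nearer - 1          # shared grandparents make 1st cousins
    removed = farther - nearer   # generation gap between the two people
    label = f"{degree}C"
    if removed:
        label += f"{removed}R"
    return label


# Jane Lutey and Simon Hocking: Judy's 5th and Ann's 4th great grandparents.
judy = nth_great_grandparent(5)
ann = nth_great_grandparent(4)
print(cousin_relationship(judy, ann))  # 5C1R
```

The same function gives 2C for two people who share great grandparents, which is a quick sanity check against a known second cousin.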

Summary

There are two exciting new features on GEDmatch Tier 1 that were developed in collaboration with Evert-Jan Blom of Genetic Affairs. Both of them use your DNA matches and specifically the matches that have gedcoms to find common locations and common ancestors. Both of these techniques have been described in this blog post.

  1. Judy has given me permission to use her real name. All other names of living people are either hidden or fictitious names have been used. All kit numbers have been hidden.

Convert old AutoCluster reports to Excel

For over two years Genetic Affairs has provided the AutoCluster reports in the HTML format. For some of these reports, it would be interesting to extract the DNA match information as well as the shared matches. This information could provide a head start for people who would like to run a manual AutoCluster analysis. Imagine transforming an old Ancestry analysis to Excel, adding the most recent (shared) matches, and re-running the AutoCluster analysis using CSV files.

Another scenario would involve the MyHeritage AutoClusters analysis, which does not allow you to change the maximum and minimum cM settings. It starts at 400 cM and works down through about 100 matches, ending at whatever minimum cM that gives. Recently I was looking at my sister-in-law1, Barb’s matches, and noticed that she had a known first cousin once removed (1C1R) who had tested. I was very interested in seeing her shared matches since we know something about Barb’s mother’s side of the family. But to my dismay, this match, Grace, matched Barb at over 400 cM and so wasn’t included in the cluster. I could manually look through all 90 of her matches, or normally I’d think of the Leeds method, but I know this is Barb’s mother’s father’s side of the family, and I’m more interested in who the matches are and how they connect to others in the family.

I wanted to follow Grace’s matches and compare them with the rest of the AutoCluster. There’s a new feature on Genetic Affairs that lets me do just that. ‘Transform AutoCluster HTML’, as shown in figure 1, lets me take the existing AutoCluster HTML report and convert it to an Excel file with 2 tabs. One tab has the matches with the name of the match and the cM they share. The second tab has the shared matches for all the people in the match list. Using this I can add Grace and her matches to the match list with their cM values and save that tab as a new CSV file. Then I can add Grace and her matches to the shared match list and again save that tab as a new CSV file. Next, I can run a CSV AutoCluster (as explained in an earlier blog post) using my two new CSV files and generate a cluster that contains Grace and her matches along with all the matches that were in the original MyHeritage AutoClusters.
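
I did this by hand in Excel, but the same bookkeeping can be sketched in Python. Everything about the layout here is an assumption for illustration: the two-column rows, the header names, the idea that each shared pair is stored in both directions, and the 450 cM figure for Grace (the post only says she is over 400 cM). Check the actual columns in your own Excel file before adapting it.

```python
import csv
import io


def add_match(match_rows, shared_rows, name, cm, shared_with):
    """Add one new match to both tables.

    match_rows:  [name, total_cM] rows (hypothetical layout of the first tab)
    shared_rows: [name, shared_match_name] rows (hypothetical second tab)
    shared_with: names, spelled exactly as in match_rows
    """
    known = {row[0] for row in match_rows}
    if name not in known:
        match_rows.append([name, cm])
    for other in shared_with:
        if other not in known:
            raise ValueError(f"{other!r} is not in the match list yet")
        # Assumption: store each pair both ways so either name finds the other.
        for pair in ([name, other], [other, name]):
            if pair not in shared_rows:
                shared_rows.append(pair)


def to_csv(header, rows):
    """Render rows as CSV text, ready to be saved to a file."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


matches = [["Match A", 120.5], ["Match B", 45.0]]        # fictitious rows
shared = [["Match A", "Match B"], ["Match B", "Match A"]]
add_match(matches, shared, "Grace", 450.0, ["Match A"])
print(to_csv(["name", "total_cM"], matches))
print(to_csv(["name", "shared_match"], shared))
```

Raising an error for an unknown name is deliberate: a shared match that is missing from the match list is usually a spelling slip.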

Figure 1. Other AutoCluster analyses. Select the “Transform AutoCluster HTML” to obtain the Excel file.

Looking at the MyHeritage AutoCluster I noticed that it went down to 30 cM. So I’d want to find Grace’s matches that also go to 30 cM to be consistent. Looking at Barb’s matches and selecting Grace I could then see her shared matches with Barb. The shared match list on MyHeritage doesn’t always go by the highest match first and appears to have highest matches to Grace in some cases before the highest shared matches to Barb, so I went down the list making a new Excel file of the match names and the cM shared with Barb. I went down to the point where matches I was seeing were less than 20 cM. At that point I skimmed the rest of the list for any matches over 30 cM and didn’t see any. So I was pretty sure I’d found them all.

Figure 2. HTML to Excel interface. Select the HTML file of interest to start the analysis.

Next I downloaded the Excel file from Barb’s MyHeritage AutoCluster, see figure 2, so I could add the new matches I’d found with Grace. Even though Barb and Grace had 90 shared matches, only 10 of them were over 30 cM. Then I looked at those 10 matches to see if they matched others in this little group of 10. I noticed a couple of things that turned out to be important. One is to copy each name exactly as it is listed in MyHeritage. You and I might think that John Smith and John SMITH are the same, but the comparison won’t see it that way. The other problem I had was in writing Robert F. Jones when he was listed as Robert F Jones, without the period after the initial. Thus it took me a couple of runs, because of these minor, careless mistakes, to get the AutoCluster that I wanted.
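
A quick check before uploading would have saved me those extra runs. The sketch below is my own illustration in Python, not a Genetic Affairs feature, and the function names are invented. It flags names I typed that differ from the match list only by capitalization, periods or spacing, and separately lists names with no counterpart at all. "Jane Doe" is a fictitious extra name.

```python
import re


def normalize(name):
    """Loosen a name for comparison: ignore case, periods and extra spaces."""
    return re.sub(r"\s+", " ", name.replace(".", "")).strip().casefold()


def find_name_problems(match_names, typed_names):
    """Return (near_misses, unknown).

    near_misses: (typed, listed) pairs that differ only cosmetically
    unknown:     typed names with no counterpart in the match list
    """
    exact = set(match_names)
    loose = {normalize(n): n for n in match_names}
    near_misses, unknown = [], []
    for name in typed_names:
        if name in exact:
            continue  # exact match, nothing to fix
        key = normalize(name)
        if key in loose:
            near_misses.append((name, loose[key]))
        else:
            unknown.append(name)
    return near_misses, unknown


match_list = ["John SMITH", "Robert F Jones"]
typed = ["John Smith", "Robert F. Jones", "Robert F Jones", "Jane Doe"]
near, missing = find_name_problems(match_list, typed)
print(near)     # the two spellings to correct
print(missing)  # names not in the match list at all
```

The loose comparison is only used to find mistakes; the names still have to be corrected to the exact spelling before the files are used.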

Figure 3. New AutoCluster after adding Grace and her shared matches.

Figure 3 shows the new AutoCluster with Grace in cluster #1 along with her matches. As it turned out, 2 of those 10 matches did not have any shared matches over 30 cM so they were not included. The first cluster is all on Barb’s mother’s side of the family, as are the matches in the second cluster.

Summary

Being able to convert my old AutoCluster to an Excel file and then adding new matches is a great new feature. It will be a lot easier to update older clusters with additional information without having to redo the entire cluster as a manual csv cluster.

  1. Barb has given me permission to use her real name.

RootsTech Connect

A year ago I was in Salt Lake City researching in the Family History Library and getting ready for RootsTech to start.   I was very fortunate to be able to attend RootsTech for the first time.  One of the highlights was getting to meet Leisa Byrne and Jonny Perl in person!  We’d been working on the DNA Painter user group for over 2 years and ‘chatting’ by email. Here’s a photo of us at the DNA Painter booth last year. Jonny is on the left, Leisa’s on the right, and I’m in the middle.

DNA Painter booth at RootsTech 2020 in Salt Lake City.

This year with the pandemic RootsTech is all virtual and FREE!!!  From numbers I’ve seen over 430,000 people have registered so far. If you’ve not registered yet click on RootsTech to go to the registration page.

Also on the registration page are the Main Stage Streaming Schedule and the list of All English Sessions. For sessions in other languages there will be subtitles in English on the screen. So we should be able to enjoy any of the presentations in a language that we are comfortable with. All of the presentations are prerecorded and will be available for a year, so you’ll have plenty of time to view as many as you want. The virtual Expo Hall opens at 5:00 PM Mountain Standard Time (MST) Wednesday, February 24. Many of the exhibitors will have special sales associated with RootsTech Connect, and there will likely be games to play at the booths just like there were in Salt Lake City last year. There will be FamilySearch employees and volunteers available during the conference for questions. They will be in the ‘Ask Me Anything’ tab on the RootsTech page. I’ll be helping there on Thursday and part of Friday. Because this is a world-wide event you will be able to access all the conference features and the volunteers all day and all night until 9:00 PM MST Saturday, February 27 when the conference ends.

I have three short presentations on DNA Painter that will be in the DNA Booth.  They will be shown for the first time on Saturday: ‘Getting Started with DNA Painter’ is at 5:00AM MST; ‘Adding My Heritage Data to DNA Painter’ is at 12:00 MST; and ‘Adding 23andMe Data to DNA Painter’ at 2:00 MST. I plan to be online to answer any questions that come up during that first showing for each of these.

This should be a very exciting conference. I’m looking forward to it!! Hope to see you there!

Looking for Thomas Barry in Kilkenny, Ireland before 1840

We know that Thomas Barry and his wife Mary Aide lived in Moanroe Commons, Knocktopher, Kilkenny when their son, Edward, was baptized at Ballyhale Catholic church in February 1840, and when their daughter, Mary, was baptized in May 1843. Thomas Barry is also listed in the 1845 House Book for Moanroe Commons; however, he is not listed in the 1848 House Book nor in Griffiths Valuation in 1850. In the 1855 New York State Census Thomas, wife Mary, and children, Edward and Mary, are listed in Evans, Erie County, New York, and it indicates that they have lived in the same location for 5 years. The 1875 New York State Census lists that Thomas Barry died 25 March of that year at age 63 or 65, and that Mary Barry died 12 September 1874 at age 65 (see earlier blog post).

Where was Thomas prior to 1840?  Ballyhale Catholic church baptismal records prior to 1823 and all early marriage records have not survived.  It is not known when or where Thomas or Mary Aide were baptized or when or where they married.

A DNA Match Connection

I have a DNA match to a Barry family that lived in Sugarstown, Kilfane, Kilkenny, which is 12 km (7.5 mi) from Moanroe Commons.  Sugarstown is in the Thomastown Catholic parish, where many older baptism and marriage records have survived.  There is a baptism of a Thomas Barry, son of J Barry and Ellen Shea, on 19 Nov 1812.  That year would fit for my Thomas Barry’s birth based on the age given in his death record.  Could this be my 2nd great grandfather?  
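
To see how closely the 1812 baptism fits, the age arithmetic can be worked through in Python. This is just a sketch under one loud assumption: the baptism date stands in for the birth date, on the guess that he was baptized soon after birth. The dates are the ones given in this post.

```python
from datetime import date


def age_on(born, on):
    """Completed years of age on a given day."""
    years = on.year - born.year
    if (on.month, on.day) < (born.month, born.day):
        years -= 1  # birthday not yet reached that year
    return years


baptism = date(1812, 11, 19)   # Thomas Barry, Thomastown (assumed near birth)
death = date(1875, 3, 25)      # death date from the 1875 New York State Census
reported_ages = (63, 65)       # the two ages given for him

implied_age = age_on(baptism, death)
gaps = [reported - implied_age for reported in reported_ages]
print(implied_age)  # 62
print(gaps)         # [1, 3]
```

Under that assumption he would have been 62 when he died, one year under the lower of the two reported ages, so the baptism is close to the reported ages without matching either of them exactly.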

Thomas Barry’s 1812 baptism lists James Comerford as one of his sponsors.  Looking at the tree from my Barry DNA match James Comerford was a witness at the wedding of his 3rd great aunt, Mary Barry, also at Thomastown Catholic church.  Is there a connection between this Thomas Barry’s family and my DNA match’s family?

John Barry and Ellen Shea had four other children: Alexander (born 1807), Paul (1810), Margaret (1815), and Judith (1818). For Alexander’s and Judith’s baptisms the residence is given as Oldtown. No residence is given for Thomas’s, Paul’s or Margaret’s. There are four Oldtowns in Kilkenny; however, the Oldtown in civil parish Jerpointchurch is the only one in the Thomastown Catholic parish. It is 3.5 km (2 mi) between Moanroe Commons and Jerpointchurch. Oldtown is not on a modern map.

Alexander and Paul are both unusual names in Ireland, which made it easy to find other records for them.1 John Barry is listed in the Tithe Applotment with 7 acres in Oldtown, Jerpointchurch in 1833. As would be expected Alexander, the oldest son, inherited that land and is in the 1850 Griffiths Valuation in Oldtown, Jerpointchurch. It was pretty easy to find Alexander’s marriage and his children and then his death in the civil records. Paul never married and worked as a porter at the Workhouse in Thomastown. There were three possible marriages for Margaret and two for Judith, which I’ve not pursued so far. Figure 1 shows the family tree I built for Alexander down to living people, whose names are blocked off.

Figure 1. Alexander Barry’s family tree.

Alexander’s grandson, Richard, immigrated to the US, and I was able to find several trees for Richard on Ancestry.  I sent messages to all the tree owners and was surprised to receive replies within a week.  One of them gave me the name of a potential 4th cousin, who was his DNA match.  She does not match me, but there’s only a 45% chance that a 4th cousin would match.  I sent her a message, but I’ve not received a reply.   There are several other descendants who could be DNA matches, and I need to follow up with them.

Can I Prove my Hypothesis?

That got me thinking of other Thomas Barrys in that part of Kilkenny.  Another way to prove a hypothesis is to show that none of the other Thomas Barrys could possibly be the one baptized in 1812 in Thomastown.  Starting with civil parishes connected to Thomastown Catholic parish I looked at Griffiths Valuation for these areas.  

There is a Thomas Barry (Thomas #1) in Stoneen, Kilfane in Griffiths Valuation in 1850, see figure 2.  Stoneen is 7 km (about 4 miles) from Moanroe Commons.  There is also an Andrew Barry listed here, but I don’t know who he is.  Could he be Thomas #1’s brother?

Figure 2. Griffiths Valuation for Stoneen, Kilfane, Kilkenny.

Looking for Thomas #1’s marriage in Thomastown Catholic parish I found that he married Betty Mulloy 15 Aug 1836. They had a dispensation for 3rd degree and 4th degree on the marriage record. Third degree means 2nd cousins, and 4th degree means 3rd cousins. Likely Thomas’ great grandparents were Betty’s 2nd great grandparents. Elizabeth Molloy, daughter of Edmund Molloy and Margaret Kealy of Stoneen, was baptized 21 Jul 1813 in Thomastown RC. This is likely the bride in this listing. I didn’t find a baptism for Thomas #1, which makes me wonder if he’s the Thomas Barry baptized in 1812 in Thomastown. Had he been from some other Catholic parish I’d have expected to see a note on the marriage record about receiving a certificate from another parish that indicated he’d been baptized there. If he were baptized in 1812 he would be about the same age as Betty when they married.

They had 5 children, all baptized at Thomastown. Ellen was baptized 23 Aug 1836, 8 days after her parents’ wedding. This was definitely the oddest thing I’ve found in looking at the records! I double checked the dates on both the marriage and this baptism to make sure I had the correct records. Their other children were Elizabeth (1839), Anastasia (1840), Mary (1842), and Rose (1846), see figure 3. Following this family I could not find civil death records for Thomas or Elizabeth. It’s possible they died prior to the 1864 beginning of civil death records. I also couldn’t find any marriage records for any of the daughters, nor civil death records for any of them. Either the family all died, perhaps in the famine, or they emigrated from Ireland.

Figure 3. Thomas #1 family tree.

Next I looked for Andrew Barry, who appeared in the Griffiths Valuation for Stoneen.  Andrew Barry is the son of John Barry and Mary Barron baptized at Thomastown Catholic church in 1814.  The family residence was given as Stoneen.  Other children of this family were Laurence (1818), Thomas #2 (1823), and Margaret (1827).   Thomas #2 cannot be Thomas #1 as he would only be 13 in 1836 when Thomas #1 married.  

Thomas #2 married Ellen Ryan at Tullaherin Catholic church 30 Aug 1854.   Thomas and Ellen’s children were: Patrick (1855), Maryann (1857), Judith (1860), Andrew (1866), Bridget (1867), Margaret (1870), Ellen (1873), and Johanna (1876), see figure 4.  Thomas #2 died 17 Feb 1898, age 75, which agrees with his being born in 1823.  He was a laborer and the informant was Ellen Barry, likely his wife or perhaps his daughter Ellen.

Figure 4. Thomas #2 family tree.

I found an extensive tree for Thomas #2 on Ancestry, and there is a photo of the family tombstone in Kilfane Cemetery.  Bernie and Mary sent me information about this tombstone that was erected by Ellen Ryan Barry in memory of her husband, Thomas.  It also has information about their children: Mary Ann, Johanna, Bridget, Margaret, and Andrew who died young, as well as their daughter Johanna Walsh, her husband Walter and their daughter Anne.

Is Thomas #2 the one on Griffiths Valuation in Stoneen? He was the 4th son in the family, and his brother Andrew would have inherited the land from their father, so it seems unlikely. Could Thomas #2 have inherited a farm from his father-in-law?

There is an Ellen Ryan, daughter of Edmund Ryan and Mary Flannery, baptized 1 Feb 1834, who is likely the Ellen Ryan who married Thomas #2. Edward Ryan is in the Tithe Applotment in Stoneen as well as being in Griffiths Valuation in Stoneen along with Andrew Barry and Thomas Barry, see circled names in figure 2. This makes it unlikely that he gave his farm to son-in-law Thomas Barry #2. Also Thomas #2’s death record said he was a laborer and not a farmer. Thomas Barry #2 is not the one in Griffiths Valuation in Stoneen.

At this point I’ve not been able to prove my hypothesis that Thomas Barry baptized at Thomastown in 1812 is my 2nd great grandfather. More research is needed.

Summary

I have a DNA match to a Barry family who lived in Sugarstown, Kilfane, Kilkenny, but I don’t know how my Barry family connects to them. Kilfane is in the Thomastown Catholic parish. There’s a Thomas Barry baptism there in 1812, which is about when my 2nd great grandfather was born based on his age at death. I have a hypothesis that this Thomas Barry is my 2nd great grandfather. John Barry and Ellen Shea were the parents of that Thomas and had 4 other children: Alexander, Paul, Margaret and Judith. Alexander and Paul are unusual names, which made it easy to follow Alexander’s family down to a potential 4th cousin. She had done a DNA test, but we did not match. There’s a 55% chance that 4th cousins won’t share any DNA.

Another way to prove my hypothesis would be to show that the Thomas Barry baptized in 1812 being my 2nd great grandfather is the only possible solution. I looked at other Thomas Barrys in the area. Thomas Barry #1 married Betty Mulloy and had 5 daughters. I cannot find a baptismal record for him, which makes it possible that he is the 1812 baptism. However, since he’s listed in Griffiths Valuation in Stoneen, and would not have inherited land if he were John Barry and Ellen Shea’s son Thomas, it suggests that he is not the Thomas Barry baptized in 1812. I cannot find marriages for any of his daughters, nor death records for any of the family. It seems very likely that they emigrated from Ireland.

Thomas Barry #2 married Ellen Ryan and had eight children.  I was able to find his baptismal record, marriage, baptismal records of his children, his civil death record, and tombstone in Kilfane.  His life is totally documented.

Next I am going to look for Thomas Barry baptismal records in a wider range of Kilkenny, as well as looking at other DNA matches that triangulate with the Sugarstown Barry family and me.

  1. Fiona Fitzsimons, transcript of online chat, 12 Dec 2020, privately held by Coleman, Grand Marais, MN.

Comparison of ICW AutoCluster and AutoSegment AutoCluster

I’ve been wanting to do a comparison between the In Common With (ICW) clusters and AutoSegment clusters ever since AutoSegment came out. So today I finally did that! The ICW clusters are the ones that have been available either on a testing site, such as MyHeritage or GEDmatch, or from your account on Genetic Affairs. The AutoSegment clusters are relatively new and use the csv files that you download from the different testing sites. In previous blog posts, I already discussed the AutoSegment and hybrid AutoSegment tools.

23andMe

I started with 23andMe mainly because I’ve worked most with that data, since my cousin Patricia Harris Anthony (Trish)1 tested there. First I downloaded aggregate data from 23andMe so I’d have the latest list of matches. Then I ran the Genetic Affairs ICW cluster from 600 cM to 30 cM, with a minimum shared 15 cM and cluster size of 2. Trish is listed as sharing 402 cM with me, and I definitely wanted to include her. I also ran the Genetic Affairs AutoSegment cluster with the exact same parameter range and the new ‘aggregate data’ I’d just downloaded from 23andMe.

I mainly used Excel for my comparison along with DNA Painter. DNA Painter is my main DNA match data storage location. I paint both known and unknown matches, mainly looking at triangulated matches to group the data. Of course I was using 23andMe to check on matches that I’d not painted into DNA Painter. I started by opening the Excel files from both the ICW and the AutoSegment runs. In the ICW Excel file I copied columns A-E, which included the match name (column B), the total cM (column C) and the cluster number (column E), and placed the copy into a new Excel file in columns A-E. Looking at ICW cluster 1 there were 5 matches listed with Trish as the first match. So I searched for Trish in the AutoSegment file and copied the cluster that she was in there into columns H through K.

Figure 1. 23andMe ICW AutoClusters 1 and 2 and AutoSegment cluster 1.

ICW cluster 1 has Trish matched to different people than AutoSegment cluster 1 does. Trish and I match on 19 DNA segments so that’s not too surprising. But notice Sam, Mark and Keith Smith are shown in cluster 2 of the ICW. Keith is the father of Mark and the brother of Sam. So I’ll move them from the AutoSegment cluster down and look for Laura, Sue, Bill and Micky in the AutoSegment clusters. They were in cluster 2 of the AutoSegments. I’ll add them to the Excel file.

Figure 2. 23andMe ICW AutoClusters 1 and 2 and AutoSegment clusters 1 and 2.

Looking at the ICW cluster I can see that most of these should triangulate with Trish. The helix symbol on the square shows this triangulation. Both the ICW and the AutoSegment clusters are shown in figure 3.

Figure 3. 23andMe ICW clusters 1 and 2 on the left and AutoSegment cluster on the right.

Below the AutoSegment cluster is a table of chromosome segment statistics. By clicking on the live link for cluster 1 I can see the list of matches in cluster 1 and which segment clusters or segments underlie this AutoSegment cluster. Figure 4 shows the list for AutoSegment cluster 1. AutoSegment cluster 1, the orange cluster, shows Trish, Mark, Sam and Keith. The chromosome segments that are directly linked to cluster 1 are listed in the chart. Mark and Keith match me on chr 1, and they and Sam also match Trish and me on chr 17. Also in this chart are the segment clusters that are indirectly linked to cluster 1. These are from the green cluster 2, whose matches are connected to Trish by the grey cells. They show Sue, Laura and Bill, who triangulate with Trish and me on chr X.

Figure 4. Segment cluster details for AutoSegment cluster 1.

ICW AutoCluster 1 and AutoSegment cluster 2 match up. I noticed in AutoSegment cluster 2 that Micky was not listed. Looking at my shared matches to Trish in 23andMe I see that Micky’s match says ‘share to see’. Without messaging Micky and asking her to share her DNA with me, I can’t see exactly where she matches Trish and me. Also looking at the details for AutoSegment cluster 2, I see that Laura, Sue and Bill match Trish and me on chr X. I could add Laura, Sue, and Bill to the 23andMe ‘Advanced DNA Comparison’ with Trish and see how they match, but since I’ve already painted them in DNA Painter, it’s just easier for me to look there. Figure 5 shows my DNA Painter paternal chr X.

Figure 5. Trish, Laura, Sue and Bill painted on my paternal chromosome X.

They all match Trish on the X chromosome. Trish and I share paternal great grandparents, Thomas Byrnes and Bridget Fenton. These were my Dad’s maternal grandparents. This X could come from Thomas’ mother Hanora Shannon, from Bridget’s mother Johanna O’Brien, or from Bridget’s father’s mother Bridget Lillis. 
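
The list of candidates comes from how the X chromosome is passed down: a man gets his X only from his mother, while a woman gets one X from each parent. A small Python sketch can enumerate the possible lines. The function name and the F/M path notation (F for father, M for mother, read from the tester upward) are my own for illustration, and it assumes the tester is a woman, since there is a paternal X to paint.

```python
def x_ancestor_paths(sex, generations):
    """Pedigree paths of every ancestor who could have passed down X DNA.

    A path such as 'FMFM' reads: father's mother's father's mother.
    """
    paths = []

    def walk(path, person_sex, remaining):
        if remaining == 0:
            return
        # A male's single X comes from his mother; a female has one from each.
        parents = ["M"] if person_sex == "male" else ["F", "M"]
        for parent in parents:
            new_path = path + parent
            paths.append(new_path)
            walk(new_path, "male" if parent == "F" else "female", remaining - 1)

    walk("", sex, generations)
    return paths


paths = x_ancestor_paths("female", 5)
named = {
    "FMFM": "Hanora Shannon (Thomas Byrnes' mother)",
    "FMMM": "Johanna O'Brien (Bridget Fenton's mother)",
    "FMMFM": "Bridget Lillis (Bridget Fenton's father's mother)",
}
for path, person in named.items():
    print(path, path in paths, person)
```

All three women fall on valid X lines, and no path ever contains two fathers in a row, which is why Thomas Byrnes' own father is not a candidate.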

Next I will look at ICW AutoCluster 2 which compares to AutoSegment cluster 1. From the segment cluster details in figure 4, I can see that Mark and Keith match me on chr 1 as well as matching Trish and me along with Sam on chr 17. The ICW cluster tells me they all triangulate with Trish and me, and AutoSegment tells me they all share the same segment. After looking at the AutoSegment cluster 1 details I can see that the chr we all share is chr 17. Here I used the ‘Advanced DNA Comparison’ on 23andMe, shown in figure 6. Sam also matches me on chr 1 with 10 cM. I had used 15 cM as the minimum shared amount between matches when I ran the two clusters, which explains why he did not show up as a match on chr 1 in the AutoSegment cluster 1 details.

Figure 6. 23andMe Advanced DNA Comparison of Trish, Keith, Mark and Sam.

I was very excited to notice these matches on chr 1.  Trish and I are trying to figure out where in County Roscommon our great grandfather Thomas Byrnes was born.  I have other triangulated DNA matches on chr 1 at this location whose ancestors lived in the eastern part of County Galway near the County Roscommon border, and one of those ancestors married a Byrnes.  The chr 1 segment is likely a Byrnes segment that I’ve inherited.  I’ll message Keith, since 23andMe says he was on the site last week, and see what he knows of his ancestors in Ireland. 

Next I moved to ICW cluster 3 which has my 2C Frank Barry.  We share paternal great grandparents Edward Barry and Pauline Fröhlich.  These were my Dad’s paternal grandparents.  Going through the same process as before I found Frank in AutoSegment cluster 3 but the rest of the matches in ICW cluster 3 were in AutoSegment cluster 9 along with two new AutoSegment matches Doug and Peggy.  I found Doug and Peggy in the ICW cluster 8. 

Figure 7. Excel file with 23andMe ICW cluster 3 and corresponding AutoSegment clusters added.

I can see that Mary and Larry triangulate with Frank in the ICW cluster 3 because of the helix symbols.

Figure 8. 23andMe ICW clusters 1 – 3.

I’d looked at Mary and Larry before on 23andMe. Mary is Larry’s mother, and they triangulate with Frank and me on chr 3. Not only that but Mary’s grandmother was from Baden-Württemberg, and my great grandmother, Pauline Fröhlich was from Baden-Württemberg!  Likely that segment of DNA that we share came from my great grandmother Pauline.  

Looking at AutoSegment cluster 9, Doug and Peggy are listed on that same segment as Mary and Larry, see figure 9. Peggy is Doug’s daughter. When I ran the 23andMe ‘Advanced DNA Comparison’ using Frank with Doug and Peggy there was no match. AutoSegment is looking at all matches that fall in the same location on the chr. It is not differentiating between maternal and paternal. Here is a good example: since Mary and Larry triangulated with Frank and me, they are paternal. Doug and Peggy, on the other hand, did not match Frank at all, but they do match me on that segment, so they must be on my maternal side, since Frank and I only share paternal great grandparents.

Figure 9. Excel file showing 23andMe ICW clusters 1 – 3 and corresponding AutoSegment clusters.

This is how chr 3 on my DNA Painter profile looks now, see figure 10.  Larry and Mary triangulated with Frank and me so they must be paternal.  Doug and Peggy did not match Frank at all, but match me so they are likely maternal. It is possible that there’s not enough overlap between Frank and Doug and Peggy.  But there should be a good bit of overlap between Larry and Doug.  I compared Larry and Doug in 23andMe’s ‘Advanced DNA Comparison’ and they do not share any DNA.  That confirms that Doug and Peggy are on my maternal side.

Figure 10. My DNA Painter profile chr 3.

I continued with the 23andMe data in this fashion.  The majority of the matches in the AutoSegment clusters matched with the ICW ones and triangulated with me.  So there weren’t any huge surprises here.

MyHeritage

Next I looked at the MyHeritage ICW and AutoSegment clusters.  Again I downloaded the latest match and shared matches files from MyHeritage.  The ICW cluster used 400-25 cM with a 10 cM minimum between matches and minimum of 3 per cluster, so I ran the AutoSegment cluster using those same parameters.  There were several ICW clusters from MyHeritage where the matches in them did not appear in AutoSegment clusters, unlike the case with 23andMe where only a few matches in a cluster did not appear.  There was also a lot more mixing of clusters than I saw in 23andMe.

In ICW cluster 4 Trish matches 5 people.  Three of them are in cluster 2 of the AutoSegment and do triangulate with Trish on chr 15.  Bob Burns is not shown on the AutoSegment cluster. He matches Trish on several segments and matches me on chr 1 and 5.  The surname Byrnes has had many spelling variations over the years and Bob and Trish and I do match on the Byrnes side of my family.  ICW cluster 5 matches with many of the people who were in AutoSegment cluster 1.  Henry matches Trish and me on the side of our 2nd great grandmother, Johanna O’Brien, as his mother’s maiden name was O’Brien.  Henry triangulates with Trish and me on chr 7 as do Guy, Dawn and Alex.  Jake triangulates with Trish and me on chr 8, but he is also a cousin of Henry.  These are summarized in figure 11.

Figure 11. Excel file showing MyH ICW and AutoSegment clusters with matches to Trish and me.

As I’m doing these comparisons I’m going down the list of the ICW AutoClusters and then finding the corresponding AutoSegment clusters, if there are any. Continuing down the MyH ICW list cluster 16, shown in figure 12, became very interesting.  

Figure 12. MyH ICW cluster 16 and AutoSegment cluster 9.

Looking at the details for AutoSegment cluster 9, shown in figure 13, I see that Clara, Matt and Otto match me on chr 3. I had painted Matt on chr 3 as unknown.  I’d not painted Clara or Otto before. 

Figure 13. MyHeritage Segment Cluster 9 details.

Checking the shared matches with Matt on MyHeritage I found both Clara and Otto triangulated with him.  Both Matt and Clara live in France, and Otto lives in Germany.  I added them to the unknown group with Matt on my DNA Painter profile, as shown in figure 14.  

Figure 14. My DNA Painter profile chr 3 after adding Matt’s triangulated group.

Looking further down Matt’s shared match list on MyHeritage I found that he triangulated with Terri Grant.  Her name was very familiar and I was sure I’d painted her and others with that same surname. I found that I’d painted her as a triangulated match to Frank Barry, since they had matched on GEDmatch 1 to 1.  Frank Barry is not on MyHeritage, so I’m unable to compare him directly with Matt.  But since Terri is on both GEDmatch and MyH and she triangulates with both Frank and Matt, now I know that Matt and those that triangulate with him must be paternal.  On DNA Painter I can merge Matt’s unknown group into the paternal group with Larry and Mary.

Figure 15. My DNA Painter profile chr 3 after discovering that Matt’s group was paternal.

Not only did I discover two new matches to paint, Clara and Otto, but I was able to merge Matt’s unknown group into a paternal one.  This is the paternal segment that I likely inherited from my great grandmother Pauline, who was born in Baden-Württemberg.  

FamilyTree DNA

When I first thought of looking at my FTDNA data with ICW AutoCluster and AutoSegment I thought that using the two clustering techniques together might help with matches, since FTDNA doesn’t have a triangulation function.  But after working with my clusters I can’t say that it did.  Both cousin Trish and Frank are also on FTDNA.  Unlike with 23andMe or MyHeritage, each of their ICW clusters matched its AutoSegment cluster at FTDNA, and I’d already painted them all. 

I went down the list of ICW clusters and did find several interesting things. ICW AutoCluster 35 had 4 matches listed.  Three of these were found in AutoSegment cluster 16.  There were an additional three matches in AutoSegment cluster 16.  I found Edith and Amy in ICW cluster 7. This reminds me of 23andMe ICW cluster 3 and 8 shown in figure 9, and is probably a hint that Edith and Amy are on the opposite side of my family of the group in ICW cluster 35.  See figure 16.

Figure 16. ICW AutoCluster 35 and 7, and AutoSegment cluster 16.

I looked at chr 12 on my DNA Painter profile and found that I had painted these matches as two different groups but both of them as unknown, as shown in figure 17.  All of these matches are on the same location as AutoSegment has said. John wasn’t in the AutoSegment cluster and maybe there wasn’t enough overlap to his segment for him to be included, but he shows up on chr 12 with the others.   Next, I looked at the matches in the FTDNA matrix which is shown in figure 18.

Figure 17. Chromosome 12 on my DNA Painter profile.
Figure 18. FTDNA matrix showing the 7 matches.

From the matrix I can see that Edith and Amy would be on one side of my family and the other 5 would be on the other side.  Unfortunately, I don’t know which set is on which side of my family.  Of these seven matches only Beth has a tree and that only contains 2 people. Using the name of the only deceased person in the tree I searched Ancestry’s public trees and found several trees that contained him.  I looked through a couple of those trees and found the surname Burns in both of them.  That surname is on my Dad’s mother’s side of the family.  The common great grandfather that Trish and I share was Thomas Byrnes.  So, it’s possible that the group of 5 in the matrix are on my paternal side, and the group of 2 would then be maternal.  At this point I don’t have enough evidence to be certain of that, and I will just make a note on my DNA Painter profile by their groups.  Perhaps I should email some of the matches in these groups, and see if we can figure out the connection. 

Summary

The ICW AutoCluster gives a listing of your shared matches.  In general, these would all be on one side of your family.  I actually have a couple cases in my family where that is not true, but it does seem to be rare.  So to begin with the hypothesis would be that all the ICW matches in a specific cluster are on one side of your family.  The AutoSegment cluster is telling you all the matches in it are on the same chromosome.  It does not tell you which ones are paternal and which ones are maternal, and the AutoSegment cluster can very well be a mixture of these. 

Each of the sites has different features and needs to be treated a bit differently, so I will summarize them individually.  On 23andMe an ICW cluster will have the helix symbols in the squares if the matches triangulate.  That makes looking at the AutoSegment cluster very easy because knowing certain segments triangulate identifies them as being on the same side of your family.  On MyHeritage, there was more mixing of clusters when I made the comparison between ICW and AutoSegment. It was necessary to check the chromosome browser on MyH to make sure that matches triangulated since looking at either of the clusters did not provide enough information to determine that.  This step had not been necessary on 23andMe because of the triangulation symbols in the ICW cluster.  FTDNA also required checking matches on their website.  For FTDNA looking at matches in the matrix was needed to determine if AutoSegment matches were on the same side of the family or not.  The AutoSegment clusters from GEDmatch have already included the GEDmatch triangulated data, so they will all be on the same side of the family.

Putting the two types of clusters together means using the ICW cluster, which indicates one side of your family, to group the segments that belong on that side of your family. If there are others in the AutoSegment group they would likely belong to the other side of your family.  The next step is to check the matches on the testing site to see if there is more information, such as a tree or surnames that will help with your assessment and possibly confirm it.
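
The grouping step described above can be sketched with simple set operations.  This is only an illustration and not how Genetic Affairs works internally.  The function name is my own, and the short member lists are assumptions based on the 23andMe example earlier in this post, not the full clusters.

```python
def split_segment_cluster(icw_cluster, segment_cluster):
    """Split an AutoSegment cluster using one ICW cluster as the reference side."""
    icw = set(icw_cluster)
    seg = set(segment_cluster)
    # Matches that are also in the ICW cluster: hypothesis is same side of the family.
    same_side = sorted(seg & icw)
    # Matches that only share the chromosome location: possibly the other side.
    check_further = sorted(seg - icw)
    return same_side, check_further


# Hypothetical, shortened member lists for illustration only.
icw_cluster_3 = ["Frank", "Mary", "Larry"]
autosegment_cluster_9 = ["Mary", "Larry", "Doug", "Peggy"]

same_side, check_further = split_segment_cluster(icw_cluster_3, autosegment_cluster_9)
print("Same side as the ICW cluster:", same_side)
print("Check on the testing site:", check_further)
```

The second list is only a hint.  Each of those matches still has to be checked in the chromosome browser or matrix on the testing site.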

  1. Patricia Ann Harris Anthony, Trish, has given me permission to use her real name. All the other names used in this post are fictitious.

Genetic Affairs Hybrid AutoSegment Cluster

An exciting addition to Genetic Affairs is Hybrid AutoSegment Clusters!  Now you can run the AutoSegment clusters with data from 23andMe, FTDNA, MyHeritage and GEDmatch or any combination of these sites all into one cluster analysis.  The entry page is shown in figure 1.

Figure 1. Entry page for Hybrid AutoSegment Clusters

Starting at the top of the page you’d want to give a name for your Hybrid cluster.  I often use the name of the person whose cluster it is and a date or some information that will tell me exactly what the file is.  You can select the minimum overlapping segment size between your matches, and the minimum cluster size.  The smaller the overlap and the smaller the cluster size the more matches that will be used, and it could end up with an html cluster that is too large for your browser to load.  You can always view it with the Excel file or look at the html file that has all the information without showing the large cluster.

If you have a large number of matches in known pileup regions you can choose to have those matches removed from the analysis. Pileup regions are explained in more detail at Genetic Affairs.

Another parameter to consider is liftover for FTDNA.  Of the various testing sites only FTDNA still uses build 36 for their comparison whereas the other sites are using build 37.  A build is a reference system used by the testing company that represents the human genome. For comparing matches across the different companies I would want all the data to be using the same reference.  Performing liftover on the FTDNA matches converts them to build 37, so that you can easily do a direct comparison.  

You can select different min and max cM settings for each of the sites.  What I typically do is to look at the site and select a max cM value that will include the highest match that I want in the analysis.  My paternal 2nd cousin Trish1 tested at 23andMe and uploaded to MyHeritage and GEDmatch.  She did mtDNA and Family Finder tests at FTDNA. But each of the sites reports a slightly different cM that she and I share.  Table 1 shows the amount of DNA Trish and I share at each site.  Both 23andMe and FTDNA include our X chromosome match, and FTDNA also counts small segments down to 1 cM.  I could run MyHeritage and GEDmatch with a max of 400 cM, but if I used 400 cM for all of the sites, I’d not include Trish’s data at 23andMe and FTDNA.  I usually just use 600 cM max for all four of the sites, since I know Trish is my highest match it’s not hurting anything to have the max higher than needed.

I find it harder to select a minimum cM value.  I’d like to go down to around 7 cM, but then the clusters are so large that it’s very difficult to load and view them, at least on my laptop.  Minimum shared is the amount shared between your DNA matches. And minimum cluster size is the number of matches needed to make a cluster.  

The match and segment files that are used in the analysis are the ones that you download from the particular site.  You can select to run two, three or all four sites.  If you want to run AutoSegment for just one site you should use the AutoSegment Analysis from Genetic Affairs main page.  The cost of the Hybrid AutoSegment Analysis is 100 credits.  It is not part of the free tier which is the 200 credits you receive when you first join Genetic Affairs.  For the paid tier you can make a one-time purchase of any amount.  For example, a $5.00 (USD) purchase would buy 500 credits. Or you can select to have a monthly subscription for as little as $5.00 (USD) per month. Monthly subscriptions also provide 10% additional credits. So a $6.00 (USD) subscription will result in 660 monthly credits.
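
The credit arithmetic above can be sketched as follows.  The rate of 100 credits per dollar is my assumption, inferred from the $6.00 subscription giving 660 credits with the 10% bonus.  Check the Genetic Affairs site for current pricing.

```python
CREDITS_PER_DOLLAR = 100     # assumption inferred from the $6.00 -> 660 credits example
SUBSCRIPTION_BONUS = 0.10    # monthly subscriptions give 10% additional credits
HYBRID_ANALYSIS_COST = 100   # credits per Hybrid AutoSegment analysis


def credits_for(dollars, subscription=False):
    """Credits received for a payment, with the bonus if it is a subscription."""
    base = round(dollars * CREDITS_PER_DOLLAR)
    bonus = round(base * SUBSCRIPTION_BONUS) if subscription else 0
    return base + bonus


def hybrid_runs(credits):
    """How many Hybrid AutoSegment analyses a credit balance covers."""
    return credits // HYBRID_ANALYSIS_COST


print(credits_for(5.00))                     # one-time purchase
print(credits_for(6.00, subscription=True))  # monthly subscription
print(hybrid_runs(credits_for(6.00, subscription=True)))
```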

When your hybrid AutoSegment cluster is ready you will receive an email. If the resulting file is less than 8 MB, the zip file will be attached to the email. For larger files the email will contain a link where you can download your results file.

Results

This is my beautiful html cluster from 600 cM to 25 cM on all 4 sites: MyHeritage, 23andMe, FamilyTreeDNA and GEDmatch, shown in figure 2. The segment clusters from MyHeritage, 23andMe and FTDNA look at segments that overlap on a particular chromosome.  In general they are not considering maternal or paternal.  FTDNA will label a DNA match as maternal or paternal if you have identified a match in your tree.  Maternal or paternal is sometimes indicated at 23andMe. You can also add maternal or paternal to known matches in the CSV files after retrieving them from the testing company. The GEDmatch data uses both the triangulated data as well as the segment data, so the results for GEDmatch are triangulated segments.  If you know one match in the GEDmatch data in a particular cluster is paternal, you know that all the GEDmatch segments in that cluster also have to be paternal because of the triangulation.

Figure 2. Hybrid cluster from 600 cM to 25 cM on all 4 data sites.

That first orange cluster with lots of grey squares has my paternal 2nd cousin Trish in it.  Since Trish is on all four of these sites she’ll show up as four matches.  The table below the html cluster contains the chromosome segment statistics per AutoSegment cluster, shown in figure 3. This table contains a link that will bring up a more detailed page concerning the cluster of interest as well as provide some information concerning the identified segment (clusters) such as the chromosomes underlying the segment clusters, how many matches (per DTC) and if there are any maternal or paternal annotations linked to these clusters.

Figure 3. Chromosome segment statistics for AutoSegment cluster 1.

Clicking on the AutoSegment cluster 1 link in the table brings up a visualization of the identified segment clusters and the individual segments that have made up cluster 1 in the html file. These segment clusters are shown in figure 4. A colored square between two segments indicates that there is sufficient overlap between those segments.

Figure 4. Chart displaying a visualization of the individual segment clusters (colored groups) and the underlying segments (x-axis and y-axis).

Figure 5 shows the segment cluster information for segment cluster 13, the red one about in the middle of figure 4.

Figure 5. Segment cluster information for segment cluster 13.

There are 7 matches listed here.  The first column tells the cluster number where the match was found in the html cluster. The second column has the segment cluster number, here all are 13. Next is the chromosome number, which happens to be chromosome 13 here. Then come the start and end values of the segment. The diagram is a visualization of that segment of data. You can easily see that all 7 of these overlap. The SNP value is in the next column, followed by the name and kit number of the match. I added a red circle around DTC, DNA Testing Company.  The next column has the number of shared cM on this segment, followed by the total number of cM that the match and I share. The last 2 columns have paternal and maternal if that information is found in the file.
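
Here is a minimal sketch of what “overlap” means for the start and end columns.  It is my own simplification and not the Genetic Affairs code: it measures overlap in base pairs, whereas the tool’s minimum overlap setting is in cM.  The class, the function names and the positions are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name: str
    chromosome: str
    start: int  # start position in base pairs
    end: int    # end position in base pairs


def overlap_bp(a, b):
    """Base pairs shared by two segments; 0 if they are on different chromosomes or do not overlap."""
    if a.chromosome != b.chromosome:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def all_overlap(segments, min_bp=1):
    """True if every pair of segments overlaps by at least min_bp base pairs."""
    return all(
        overlap_bp(a, b) >= min_bp
        for i, a in enumerate(segments)
        for b in segments[i + 1:]
    )


# Hypothetical positions, for illustration only.
segs = [
    Segment("Trish", "13", 20_000_000, 45_000_000),
    Segment("Mike", "13", 30_000_000, 50_000_000),
    Segment("Sue", "13", 25_000_000, 40_000_000),
]
print(overlap_bp(segs[0], segs[1]))
print(all_overlap(segs))
```

Overlap by position alone says nothing about maternal or paternal, which is why the matches still need to be checked for triangulation.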

The last 4 matches are all of Trish from the 4 sites. Mike tested at MyHeritage.  I know that Mike is a maternal 2nd cousin once removed.  He and I share my maternal great grandparents as our most recent common ancestor.  He has a segment of DNA on chromosome 13 that overlays Trish’s segment.  But because MyHeritage is giving all segments that fall in the same location he and Trish show up in this cluster.  If I didn’t know who Mike was, I’d go to MyHeritage and run the chromosome browser with Trish and Mike in order to see if they triangulate with me.  Next is A.B. whose data came from GEDmatch.  Because the GEDmatch segment data here is triangulated I know that A.B. must be paternal because he triangulated with Trish.  Sue is from FTDNA and has a P off to the far right, which tells me that I’ve placed her in my tree, and FTDNA knows that she is paternal.  If she were totally unknown I’d use the FTDNA chromosome browser and matrix to determine if she matched Trish or not.
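
The reasoning in this paragraph can be sketched in a few lines.  This is an illustration under my own assumptions, not the tool’s logic: the class and function names are made up, and it only applies the rule that GEDmatch segments in a segment cluster are triangulated, so one known side can be carried over to the other GEDmatch segments.  Segments from the other companies are left as they are.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LabeledSegment:
    name: str
    company: str
    side: Optional[str] = None  # "paternal", "maternal", or None if unknown


def propagate_gedmatch_side(cluster):
    """Copy a known side to unlabeled GEDmatch segments in one segment cluster."""
    known = {s.side for s in cluster if s.company == "GEDmatch" and s.side}
    if len(known) != 1:
        return list(cluster)  # nothing known, or conflicting labels: change nothing
    side = next(iter(known))
    return [
        LabeledSegment(s.name, s.company, side)
        if s.company == "GEDmatch" and s.side is None
        else s
        for s in cluster
    ]


cluster_13 = [
    LabeledSegment("Trish", "GEDmatch", "paternal"),
    LabeledSegment("A.B.", "GEDmatch"),
    LabeledSegment("Mike", "MyHeritage"),        # needs a chromosome browser check
    LabeledSegment("Sue", "FTDNA", "paternal"),  # labeled by FTDNA from my tree
]
for seg in propagate_gedmatch_side(cluster_13):
    print(seg.name, seg.company, seg.side)
```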

Having the data from the different sites displayed this way makes it easy to see matches that overlap and might be related.  Then you can check in the chromosome browser on the individual testing company site to confirm if they are on the same side of your family or one is maternal and the other is paternal.  It’s especially helpful when I find a match that I’ve not looked at before, and now I have some idea how the match might relate to me based on who else is on that DNA segment and in the cluster.

I’ve done a lot of research with matches that my cousin Trish shares with me, so I decided to look at some more distant matches. Searching through the list of names in the Excel file I found Sophie, who is in cluster 70, which I’ve circled in the large html cluster in figure 6.

Figure 6. Cluster 70 circled.

Figure 7 shows the segment clusters chart using a visualization of the individual segment.  I have no idea who Andrea is, other than she matches Sophie.

Figure 7. Segment clusters chart for cluster 70 of the html file. Note that DNA matches can be linked via different segment clusters and therefore multiple segments.

Sophie tested at FTDNA and uploaded her results to GEDmatch. She shows up here matching herself and Andrea.  Sophie and I have emailed a number of times.  We know that our common ancestor is on my Aide line.  My 2nd great grandfather Thomas Barry married Mary Aide in County Kilkenny, Ireland.  I have the baptismal records for their two children, Edward, my great grandfather, and Mary his sister.  Edward was baptized in 1840 and Mary in 1843.  None of the baptismal records prior to 1823 and none of the marriage records for the Catholic parish in Ballyhale, Kilkenny survived.  So I’ve not been able to find Mary Aide’s baptismal record or Thomas and Mary’s marriage record.

Sophie’s Aide family also lived in County Kilkenny, and goes back another generation or two past my Mary Aide.  Mary Kilfoil married an Aide, and as best as we can tell without records and with the DNA evidence Mary Kilfoil is either my 3rd or 4th great grandmother.  This continues to be something that I’m searching, but for now we’ll leave it at that.

Andrea on GEDmatch indicated that she’d tested at 23andMe.  Almost to my surprise I found her in my match list on 23andMe!  She had no triangulated matches with me, which was a disappointment as I like to work with triangulated matches.  But looking at her ICW match list, shown in figure 8, was amazing!  Frank Barry, my 2C who also descends from Thomas Barry and Mary Aide is at the top of the list.  Looking down the list I’d already added many of her matches and their triangulated matches to my DNA Painter profile.  I’d messaged Tyler over a year ago on 23andMe and never got a reply, so I really don’t have any information on him.  He does triangulate with known Irish matches, however.  Kay and I have emailed a good bit.  She has a great grandfather with the surname Byrne from County Roscommon.  I have my great grandfather, Thomas Byrnes, from County Roscommon.  We’ve not found the common ancestor yet, but the connection seems to be on my Byrnes side.  Beth is a bit of an unknown as she and Trish have segments on the same chromosome and somewhat overlap, but don’t show as a match.  Either Beth is on my maternal side or there’s just not enough overlap with Trish.  I need to message her for more information.  Ashley was a match I’d not looked at before.  So I looked at her shared matches and any information they might have listed. I found one of her matches with ancestors from Buffalo, NY. Thomas Barry’s family lived in Evans, Erie County, NY, which is not far from Buffalo.

Figure 8. Andrea’s shared match list with me on 23andMe.

One of the surnames, Green, and one of the locations on Andrea’s information on 23andMe were the same as I knew were in Sophie’s family.  And not finding much information from Andrea’s matches I emailed Sophie.  Sure enough Andrea is on the same line as Sophie and is Sophie’s 2nd cousin once removed.  Andrea is another match on my Aide line.  Now if I could just make the connection between our two trees and figure out if Mary Kilfoil is my 3rd or 4th great grandmother!  I’ve tried WATO, but most of my matches to Aide family members are too small to be useful in WATO, so I haven’t gotten very promising results.  

Summary

I’m finding the Hybrid AutoSegment Clusters on Genetic Affairs very promising.  There are so many new connections for me to explore!  I would not have found Andrea and been able to connect her to Sophie if not for the hybrid clustering.  Sophie has not tested on 23andMe.  Andrea didn’t have any triangulated matches there.  At most I’d have seen the Green surname, and since it’s not that unusual a name I might not have ever thought of Sophie and that it’s in her family tree.  The Hybrid AutoSegment Clusters is going to be a huge help for me trying to make connections between more of my DNA matches.

You can run the AutoSegment Clusters with any 2, 3 or 4 of the testing companies: MyHeritage, 23andMe, FTDNA and GEDmatch. It will provide you with clusters that are based on shared segments across the companies that you selected. With the exception of GEDmatch, where the data has already been triangulated, you will need to compare matches in one of the segment clusters with each other using the chromosome browser, and on FTDNA the matrix tool, to determine if the matches triangulate or not.

Now to go explore more of my hybrid clusters!

  1. Patricia Ann Harris Anthony, Trish, has given me permission to use her real name. All the other names used in this post are fictitious.

Thomas Barry in Kilkenny, Ireland

Like everything else this year the Celtic Connections Conference went virtual.  That gave me the opportunity to attend, as the dates of the live event would have conflicted with several local things I would have been involved with.  There were several presentations dealing with Griffiths Valuation.  Although I’d used it several times in the past, I learned a great deal more about it in the conference.  

Griffiths Valuation

The Griffiths Valuation was a property tax based on what income could be produced annually from the land.   It was carried out from 1847 to 1864 starting in the south of Ireland and moving north.  Every area of land and every building was measured and the person leasing the property was named.  Irish census began in 1821 and occurred every 10 years, however the majority of the census records were either destroyed by the 1922 fire at Four Courts or were pulped for making paper in WWI.  The only fully existing census records are from 1901 and 1911.  Some fragments of earlier census exist and can be found on the National Archives of Ireland website. Because of the loss of early census records the Griffiths Valuation and the Revision books are now used by genealogists as census substitutes.

Thomas Barry, 2nd Great Grandfather

My 2nd great grandfather was Thomas Barry.  My Dad had built extensive family trees for both his side of the family and my Mom’s family.  As the only child I obtained all the records after their deaths.  Dad’s information said that Thomas Barry and his wife Mary (Aide) Barry lived in the Village of Ballyhale, County Kilkenny, Ireland.  I found baptismal records for Edward, my great grandfather, born in 1840 and for his sister, Mary, born 1843.  The Roman Catholic church there did not have any records that had survived for Thomas and Mary’s wedding or either of their baptisms.

After learning that FindMyPast listed the exact date that the Griffiths Valuation was printed, and that it took 3 months to compile and print, I decided to look and see if Thomas Barry was listed there.  Griffiths Valuation for Kilkenny was published in April 1850.  I really didn’t expect to find him listed there, as the 1855 census for Evans, Erie County, New York state indicated the family had been resident there for 5 years.

The children’s baptismal records indicated the townland where the family lived at that time.  Edward’s baptismal record is shown in figure 1. Starting at the left it has Feby 11 Ned Tom Barry Mary Aide and on the 2nd line Martin Millea Cath Millea and Moanroe.  Mary’s baptismal record in 1843 also listed Moanroe.

Figure 1. Edward Barry’s baptismal record 11 Feb 1840 Ballyhale Catholic Parish, Kilkenny

I looked for Thomas Barry in Griffiths Valuation in Moanroe Commons, Kilkenny.  To my surprise I found that Thomas Barry leased land in Moanroe Commons along with Anastasia Barry.  I had no idea who Anastasia Barry was, as far as anyone in the family.  I knew she had to be a widow, as the only women in Griffiths Valuation were either widows on their husband’s land or the landowner, and she was clearly not the owner here.  But where did Thomas live?  I broadened my search to nearby townlands and found that he had a house in Knocktopher Manor which was next door to Moanroe Commons.  The Griffiths Valuation for Knocktopher Manor is in figure 2.  Anastasia had a house next door to Thomas, and the land they shared in Moanroe Commons backed into the location of their houses.  Figure 2 is a composite of the top of the page of Griffiths Valuation which shows the column heading and the listings for Knocktopher Manor which was at the bottom of the page. 

Figure 2. Griffiths Valuation for Knocktopher Manor, Knocktopher, Kilkenny.

The numbers and letters in the first column indicate the location on the map, and the small a and b indicate a house.  Column 2 has the names of the occupants.  Parentheses surrounding Thomas’ and Anastasia’s names indicate that each of them is responsible for the tax on the house and land.  The immediate lessor is Thomas Norman, Esq. who seems to lease a good bit of land here.  He may not be the actual owner and was subletting the land, but more research is needed to determine that. Column 3 has house, office and land.  An office is not what we’d think of today.  It could be a barn, or a stable. Content of the land is in acres, roods and perches.  A rood is 1/4 of an acre, and there are 40 perches in a rood.  The net value would be how much income could be expected from that land in a year.  The net value of the buildings would mainly be for the house.  They had to pay tax of £1 2 shillings on the house. I learned from Fiona Fitzsimmons’ Celtic Connections presentation that a house with a tax about this amount would be made of cob walls.  Cob was made from mud and straw and usually white-washed to help keep out the weather.  When the house was abandoned, and no one was any longer living there, heating it and caring for it, the house would just melt into the landscape.  Using the numbers in column 1 we can look on the map and find the location. The map for Knocktopher Manor and Moanroe Commons is shown in figure 3.  I’ve circled 6a and b in Knocktopher Manor where Thomas and Anastasia Barry lived.  The land they had in Moanroe Commons is 11, so it basically is their back yard.

Figure 3. Map showing Moanroe Commons and Knocktopher.
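
For anyone who wants to turn the acres, roods and perches in the content column into a single number, here is a small sketch using the conversions given above (4 roods in an acre, 40 perches in a rood).  The function name and the example holding are made up for illustration and are not taken from the Barry entries.

```python
from fractions import Fraction

ROODS_PER_ACRE = 4
PERCHES_PER_ROOD = 40


def to_acres(acres, roods, perches):
    """Convert an acres-roods-perches measurement to acres as an exact fraction."""
    total_perches = (acres * ROODS_PER_ACRE + roods) * PERCHES_PER_ROOD + perches
    return Fraction(total_perches, ROODS_PER_ACRE * PERCHES_PER_ROOD)


# Hypothetical holding of 2 acres, 3 roods, 20 perches.
print(float(to_acres(2, 3, 20)))  # 2.875
```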

What’s more Thomas’ next door neighbor was Martin Millea.  His house is at 5a, shown both in figures 2 and 3.  It just happened that Martin Millea was one of the sponsors on Edward’s baptismal records, shown in figure 1!  This had to be my Thomas Barry!!  That would leave just about a 6-month window between when they left Ireland and when they arrived in the US. I planned to add several months on either side when I started digging in ship arrival records, just in case. My latest hypothesis was that this was my Thomas Barry and Anastasia was his mother.  This totally threw out two earlier hypotheses I had.  One was based on a Barry DNA match whose family was in Thomastown, and had a Thomas baptized in 1812 with parents Js Barry and Ellen Shea.  That hypothesis had been based on this Thomas’ baptismal record that showed James Comerford as the sponsor, and he had been a witness at the wedding of Mary Barry, who was a daughter in the family there.  Thomastown RC is the next RC parish to Ballyhale RC and only a few miles away, so it seemed a reasonable hypothesis.  The other hypothesis was based on an Aide cousin to Thomas’ wife, Mary, whose naturalization papers said he’d arrived in the US via Buffalo, NY in 1846.  Buffalo is only a short distance from Evans, NY and since families, neighbors and friends often traveled together from Ireland to the US, this also seemed a reasonable hypothesis.

Anastasia Barry

So who is Anastasia Barry? There’s no mention of her in any of the family notes or tree that my Dad had done. But then again he never says anything about Thomas’ parents. Dad likely got his family information from his father, Frank, who would have gotten it from Edward. My Dad was only 5 when his grandfather Edward died. It’s unlikely his grandfather told him any family information. Edward, born in 1840 and in the US at least by 1855, may not have known his grandparents at all. That would mainly depend on when the family left Ireland and Edward’s age at the time.

I searched FindMyPast baptismal records for Ballyhale Catholic parish using Barry surname and An* as the mother’s forename and found two records; Margaret born in 1825 and Nellie born in 1831. The earliest surviving baptismal records for Ballyhale Catholic parish are in 1823 according to John Grenham’s website. Thomas likely was born before that time. Those two baptismal records list John Barry as the father and Anastasia Riley as the mother. Potentially these are Thomas’ parents and Margaret and Nellie are his sisters.

Revision Books

Every few years after the Griffiths Valuation a revision was done.  Since this was a record for collecting tax, it was necessary to update the person living there that would be required to pay the tax.  The Revision books started just after Griffiths Valuation and continued to the 1980s.  These Revision books are housed in the Valuation Office in Dublin.  Some have been digitized, but not all, and none of them are online at this time.  I emailed the Valuation Office asking about this location where Thomas Barry was in Knocktopher Manor.  I fully expected a reply telling me how to apply for the information and the cost.  However, the next day I received an email with 2 pages from the Revision books.   Thomas took over the lease from Anastasia in the 1860-62 timeframe.  That likely indicates that she died, but civil registration of deaths did not start until 1864. Thomas is replaced by Eliza Barry in 1882, and Eliza is replaced by Richard Moore in 1883.  Figure 4 shows the Revision book for 1876-1883.

Figure 4. Revision book for 1876-1883 showing Thomas Barry in Knocktopher.

Checking the civil death records Thomas died 31 Oct 1881 as reported by his son, John.  Elizabeth Barry, widow of Thomas Barry, farmer, died 8 May 1882 again reported by son John.  This is not my Thomas Barry, as I know he was in NY in June 1855.  I’d heard many times how unusual the surname Barry was in Kilkenny.  So it had never occurred to me that there could be two Thomas Barrys in the same area of Kilkenny!  

Valuation Books

Now what to do?  Prior to Griffith’s Valuation there had been Field Books, which described the land, quality of the soil, etc., but also listed the name of the person on that land.  There had also been House Books, which listed the houses on the land and what had already been surveyed in preparation for the later valuation.  FindMyPast had Field Books from 1848 and House Books for 1845 and 1848.  The Thomas Barry who lived in Knocktopher Manor and died in 1881 was found with his house in Knocktopher Manor in 1845 and 1848.  But when I also looked at the House Books for Moanroe Commons, since the children’s baptismal records said that was where my Thomas Barry lived, I couldn’t find him in 1848, but there he was in 1845!  In the 1845 House Book both my Thomas Barry in Moanroe Commons and the Thomas Barry who had a house in Knocktopher Manor were listed!  This appears to tell me that my Thomas Barry, who was there in 1845 and not in 1848, left Ireland for the US after 1845 and before 1848.  Maybe they did travel with the Aide cousins, and maybe he was baptized in 1812 in Thomastown.  Lots more research needs to be done.  But now I do have an earliest date for his arrival in the US when I start searching passenger records.

GEDmatch AutoSegment

There’s an enhancement to the GEDmatch AutoSegment clustering on Genetic Affairs.  Now the GEDmatch option includes using the triangulated data as well as the all segment data, both of which are available on Tier 1 of GEDmatch.

There are a number of settings for the GEDmatch DNA Segment Search.  I used 1000 kits for my analysis.  Most of the settings can be left at the defaults.  However, if you’re including matches that have long segments with you, you’d want to click the ‘Prevent Hard Breaks’ option.  By default GEDmatch adds a hard break when it finds a gap of more than 500,000 base positions between SNPs, which can split a long segment in two.

Figure 1. GEDmatch Segment Search page.

After running the Segment Search you will want to download the results as a csv file.  There is a ‘HERE’ button at the top of the list of segment data that allows you to save the csv file to your computer.

Figure 2 shows the GEDmatch Segment Triangulation Screen.  It defaults to 500 kits, which I changed to 1000 to match what I’d run on the Segment data.  The upper threshold of 3000 cM would exclude parent-child relationships but probably won’t exclude siblings. I left all the other defaults as they were.

Figure 2. GEDmatch Triangulation Data page.

The Segment Triangulation Data can be saved as a tsv file.  The ‘HERE’ button to save this data is found at the bottom of the table of triangulated data.
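For anyone who wants to look at the two downloads outside a spreadsheet, here is a minimal Python sketch of reading a comma-separated file and a tab-separated file. The column names and sample rows are made up for illustration and are not the real GEDmatch headers, so check the first line of your own files before adapting it.

```python
import csv
import io

# Made-up sample data standing in for the two downloads.
# The real GEDmatch files have their own column headers.
SEGMENT_CSV = "MatchName,Chr,Start,End,cM\nJoe,6,162000000,168000000,14.0\n"
TRIANGULATION_TSV = (
    "Match1\tMatch2\tChr\tStart\tEnd\tcM\n"
    "Joe\tJohn\t6\t162000000\t168000000\t14.0\n"
)

def read_rows(text, delimiter):
    """Return the rows of a delimited text block as a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))

segments = read_rows(SEGMENT_CSV, ",")            # csv: comma delimiter
triangulated = read_rows(TRIANGULATION_TSV, "\t")  # tsv: tab delimiter
print(segments[0]["MatchName"], triangulated[0]["Match2"])
```

The only real difference between the two files, as far as reading them goes, is the delimiter; with files on disk you would pass an opened file to `csv.DictReader` instead of the `io.StringIO` wrapper.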

Segment and Triangulation Files

The Segment file lists my matches, the chromosome where we match, how many cM we share, the SNP count, and the start and end positions of the segment on the chromosome.  Figure 3 shows the segment that I share with a match, Joe1.  We share 14.0 cM on chr 6 from about 162 M to 168 M.

Figure 3. Segment on chr 6 that Joe and I share.

When I look at Joe’s triangulated matches in my triangulation file I find that he has 4 matches on chr 6.  These data, shown in figure 4, show how many cM Joe, another match and I triangulate in that particular region.  It looks as if Joe, John and I triangulate across the entire region that Joe and I match, whereas the triangulated region for Joe, me and either Mary or Sue is less than the 14.0 cM that Joe and I share.  The triangulated data file shows only the start, end and cM of the region where the three of us triangulate.  To see how many cM I share with Mary, John, Bill or Sue, I’d have to look at the all segment data file.  That is why both of these files are needed for the analysis.
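Comparing a triangulated region to the segment I share with Joe comes down to measuring how much of one span is covered by another. A small Python sketch, assuming positions are whole base-pair numbers: only the 162 M to 168 M span comes from my data (rounded), and the two triangulated spans are hypothetical. It measures coverage in base positions rather than cM, since cM are not spread evenly along a chromosome.

```python
def overlap_fraction(seg, tri):
    """Fraction of `seg` (start, end) that is covered by `tri` (start, end)."""
    seg_start, seg_end = seg
    tri_start, tri_end = tri
    # Length of the shared span; zero if the two spans do not overlap.
    covered = max(0, min(seg_end, tri_end) - max(seg_start, tri_start))
    return covered / (seg_end - seg_start)

joe_and_me = (162_000_000, 168_000_000)             # rounded from figure 3
full_triangulation = (162_000_000, 168_000_000)     # hypothetical: whole segment
partial_triangulation = (165_000_000, 168_000_000)  # hypothetical: second half

print(overlap_fraction(joe_and_me, full_triangulation))     # 1.0
print(overlap_fraction(joe_and_me, partial_triangulation))  # 0.5
```

A fraction of 1.0 corresponds to a match who triangulates across the entire shared region, and anything lower to a triangulated region that is shorter than the shared segment.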

Figure 4. Triangulated data that Joe and I share.

With triangulation I’m looking for at least three independent matches that each match me and also match each other.  For example a parent and child, or 2 siblings would definitely have a common ancestor, but they would not be independent of each other.  The child got half of his or her DNA from that parent, and siblings would share a great deal of DNA in common as well. When the matches triangulate it’s very likely that we share a common ancestor in the genealogical timeframe.  Then the next step would be to use traditional genealogy methods to attempt to identify that common ancestor.

Sometimes there are segments found in the triangulation file that are not in the segment file.  I downloaded 1000 segment matches and 1000 triangulated matches.  Not all segments are going to have triangulated matches, and those segments that don’t have any triangulation will not be included in the cluster.  Conversely, because each file is limited to 1000 matches, there will be segments that triangulate but are not found in the segment file.  For these, new DNA matches are reconstructed from the triangulated segments, and those segments are used to estimate the total cM.
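To see which matches appear only in the triangulation file, the two downloads can be compared by kit number. This is just a sketch in Python with invented kit numbers. It is not how Genetic Affairs does its reconstruction, only a way to spot the kits involved.

```python
def kits_only_in_triangulation(segment_kits, triangulation_kits):
    """Return kits seen in the triangulation file but not in the segment file."""
    return sorted(set(triangulation_kits) - set(segment_kits))

# Invented kit numbers for illustration.
segment_kits = ["A100001", "A100002", "A100003"]
triangulation_kits = ["A100002", "A100003", "A100004", "A100004"]

print(kits_only_in_triangulation(segment_kits, triangulation_kits))  # ['A100004']
```

Using sets means a kit that shows up on many rows of the triangulation file is only reported once.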

Results

Now you are ready to run the GEDmatch enhanced AutoSegment Cluster.  Figure 5 shows the data entry page for analysis.  Select the maximum cM value you want to include.  I pick this value based on what my highest match is and whether or not I want to include that match in the run.  The minimum is a bit harder to pick.   I know that I don’t have many close cousins at all so I usually pick a low minimum, perhaps lower than most people would choose. 
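The effect of the minimum and maximum settings can be pictured as a simple filter on total shared cM. The names and numbers below are invented, and treating both limits as inclusive is an assumption; the real tool may handle boundary values differently.

```python
def in_cm_range(matches, min_cm, max_cm):
    """Keep matches whose total cM lies between min_cm and max_cm (inclusive)."""
    return {name: cm for name, cm in matches.items() if min_cm <= cm <= max_cm}

# Invented matches and totals for illustration.
matches = {"Close1": 850.0, "Cousin2": 210.5, "Distant3": 18.2, "Tiny4": 7.9}

kept = in_cm_range(matches, min_cm=15, max_cm=400)
print(sorted(kept))  # ['Cousin2', 'Distant3']
```

Lowering the minimum pulls in more distant matches such as the hypothetical Tiny4, which is the trade-off when there are few close cousins to work with.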

Figure 5. Genetic Affairs data entry screen for GEDmatch AutoSegment analysis.

A zip file with the results is sent to your email.  My results are shown in figure 6.

Figure 6. AutoSegment cluster results for my GEDmatch data.

Below the clusters are three tables containing information about the clusters and the matches.  The first table, shown in figure 7, is the segment statistics for each of the AutoSegment clusters.  It describes the segment clusters that are found in each cluster, lists the chromosomes that are present, the number of matches in the cluster, the number of segment clusters and the number of segments.  By clicking on the link another window opens showing the segment clusters that make up the large cluster shown in the original html cluster. An example of this is shown below in the Data Analysis.

Figure 7. Chromosome segment statistics.

Below this table is the AutoSegment cluster information, which is shown in figure 8.  It shows the name and kit number of the match, the amount of cM shared, the number of shared matches, the cluster that this kit is in, and other information about the match. The notes indicate the source of the particular segment. Some of this information, such as MyHeritage and Migration-V4-M, comes from GEDmatch. When the (triangulated) segment is not found in the GEDmatch segment file, Genetic Affairs reconstructs it based on segments that triangulate with it.

Figure 8. AutoSegment Cluster Information.

Shown in figure 9 is the third table, which is the Individual segment cluster information.  The cluster listed on the far left is the cluster number from the large html cluster.  Clicking on that number takes me to the segment clusters that make up cluster 29.  This is the same as if I clicked on the ‘segment and segment clusters for cluster 29’ link in the Chromosome segment statistics.  Next are the segment cluster number, the chromosome, the start and end values, the SNP count, the match name and kit, the cM for this segment and the total cM for this match. The segment representation chart allows me to quickly assess the overlap between the different segments within a segment cluster.

Figure 9. Individual Segment cluster information.

Data Analysis

Earlier I looked at Joe and his triangulated matches on chr 6 in the triangulated segment table.  Searching for Joe in the html cluster I found him in cluster 5, the larger brown cluster in figure 6, with grey squares to cluster 4, the purple cluster.  Joe matches my known 2nd cousin Frank on chr 20 and was placed into cluster 5 with Frank.  You can see the line of grey squares below where I labeled Frank, that are Joe’s matches to John, Bill, Mary and Sue on chr 6 in the purple cluster.

Figure 10. Enlarged image of cluster 4 from figure 6.

Because of the grey cells Joe will show up in the segment clusters for clusters 4 and 5.  Clicking on the link for cluster 4 in the statistics I get another chart, with the well-known animations, which represents the underlying segment clusters and their segment members. This chart allows you to quickly see which segment clusters are present, how many there are, and how connected they are.

For AutoSegment cluster 4, I see there are two segment clusters (blue and orange cluster) in the chart. The orange cluster in figure 11 is the group that triangulates on chr 6.  John and Bill also triangulate with another group of matches on chr 9, and those are in the blue cluster.

Figure 11. Two fully connected segment clusters for AutoSegment cluster 4.

Below these clusters is the Segment cluster information table, shown in figure 12. The column on the left indicates the html cluster where each person was found. The second column indicates the segment cluster. Chromosome 6 has Joe and his triangulated matches. As seen in the clusters, John and Bill also triangulate on chromosome 9 with other matches. The start and end values are given, along with a visual representation of the relative size of the segments. The number of cM for each match as well as their total cM are also shown in the chart. The table allows for a quick check of how similar the segments are and how they align.

Figure 12. Segment information chart for the blue and orange clusters shown in figure 11.

DNA Painter Cluster Tool

One thing I like to do with my cluster results is to put them into DNA Painter.  Using the ‘Cluster Auto Painter’ in the DNA Painter tools I can enter the html file from Genetic Affairs and generate a new profile with all my clusters in it.  Figure 13 shows my chr 6 on DNA Painter after importing the html file.  Cluster 4 is the pink one on the far right.  Joe is on cluster 5 so he’s in a different color.  The other green segment in that location is Joe’s son. He would not have been an independent match which is why I left him out of the earlier triangulation.  The other 4 triangulated matches are in the pink cluster.  I did not add paternal or maternal to any of the matches in my all segment file, so all of my clusters here are showing up as ‘shared or both’.  Another thing I’ve done with this DNA Painter profile is to import the GEDmatch segment data file and compare the segments to those in the cluster. It is then very easy to see segments that don’t have any triangulated matches.

Figure 13. DNA Painter profile of chr 6 obtained from the Cluster Auto Painter for my data.

Just like the other AutoSegment analyses, the GEDmatch AutoSegment clustering costs 50 credits per run.

Summary

The enhanced AutoSegment Clustering for GEDmatch uses the all segment file and the triangulation data file from GEDmatch and clusters the matches into triangulated groups.  Triangulated groups, especially of four or more, indicate a common ancestor in a genealogical timeframe. Compared to the AutoSegment implementation for MyHeritage, FTDNA and 23andme, the GEDmatch version frees users from the manual process of checking the validity of the identified segment clusters.

Individual clusters can then be analyzed using traditional genealogical methods to find a common ancestor.  The html file containing the AutoSegment clusters can also be imported into DNA Painter using the ‘Cluster Auto Painter’ tool and visualized in detail on individual chromosomes as well. This is going to be a huge help to me as I research which of our common ancestors my 2C and I share on a particular chromosome segment.

  1. All names of living individuals have been changed to protect their privacy.