It’s not the biggest family tree in the world, but it’s close.
Armchair genealogists and a team of computer scientists have assembled a massive family tree that includes 13 million individual members and spans an average of 11 generations.
A study describing the tree, published this week in Science, also details some of what we can learn from this crowdsourced data. For example, it reveals when people stopped marrying their cousins and whether men or women traveled farther from home for marriage, and it provides clues about how longevity is inherited.
The tree is based on data assembled by roughly 3 million genealogy enthusiasts who have identified the familial relationships of more than 86 million individuals on the website Geni.com.
Kevin Bacon is in there. So is Donald Trump.
You’ll find me on the Geni site too. (Thanks, second cousin Scott!)
Not everyone in the website’s database is included in the current study, however. For this work, the authors only used data from profiles that users had agreed to make public.
Before assembling the mega-family tree, lead author Yaniv Erlich, a computer scientist at Columbia University, and his team first had to make sure the entries were accurate and worth using.
This took a lot of time, Erlich said. The data were not pristine, but perhaps not as faulty as you might expect considering they were provided by millions of contributors.
The researchers found that, on average, the father listed for a person was wrong about 2% of the time and the mother about 0.3% of the time. They also found that about 0.3% of profiles included clear mistakes, such as a person having more than two parents or someone being both the parent and the offspring of the same person.
To correct these errors, the team developed computer programs that “pruned” the tree, removing invalid relationships. After doing that, they were left with 5.3 million disjoint family trees, the largest of which included 13 million individuals.
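For readers curious what that kind of pruning might look like, here is a minimal sketch in Python. It is not the team's actual software: the function name `prune_and_split`, the input format (a list of child-parent pairs) and the choice to discard every parent link of an over-parented profile are all assumptions made for illustration. It removes the two kinds of clear mistakes described above, then groups the remaining people into separate trees.

```python
from collections import defaultdict


def prune_and_split(parent_links):
    """Drop clearly invalid links, then group people into separate trees.

    parent_links is a hypothetical input format: (child, parent) pairs.
    Returns a list of sets of people, largest tree first.
    """
    links = list(parent_links)
    people = {person for link in links for person in link}

    parents = defaultdict(set)
    for child, parent in links:
        parents[child].add(parent)

    # Mistake 1: a profile with more than two parents. As a simplification,
    # discard all of that profile's parent links.
    kept = {(c, p) for c, ps in parents.items() if len(ps) <= 2 for p in ps}

    # Mistake 2: two people each listed as the other's parent. Only these
    # direct loops are caught; longer cycles are not checked here.
    kept = {(c, p) for (c, p) in kept if (p, c) not in kept}

    # Union-find: people joined by a surviving link end up in the same tree.
    root = {person: person for person in people}

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]  # path halving keeps lookups short
            x = root[x]
        return x

    for child, parent in kept:
        root[find(child)] = find(parent)

    trees = defaultdict(set)
    for person in people:
        trees[find(person)].add(person)
    return sorted(trees.values(), key=len, reverse=True)


# Made-up example: "x" has three parents, and "y" and "z" are each
# listed as the other's parent.
links = [("a", "m"), ("a", "f"), ("b", "m"),
         ("x", "p1"), ("x", "p2"), ("x", "p3"),
         ("y", "z"), ("z", "y")]
trees = prune_and_split(links)
print(len(trees), "trees; largest has", len(trees[0]), "people")
```

In this toy data, only the links among a, b, m and f survive, so those four people form the largest tree and everyone else is left on their own.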
By comparing people in the system with 80,000 death records from Vermont spanning from 1985 to 2000, the authors also found that the people included in their family tree were not any more likely to be rich or poor than the general population. They were, however, much more likely to be white.
“We have much more representation of Western populations, mostly from Europe and the U.S.,” Erlich said. “And from the U.S. it is mostly from Caucasians rather than other ethnicities.”
He added that he hopes more nonwhite people will soon add their families to the site.
After getting a handle on their sprawling data set, the research team came up with a set of questions that only a mega-family tree dating back hundreds of years could answer.
For example, after studying migration patterns in the tree, they found that women leave their hometown more often than men do, but when men move, they tend to move much farther. This pattern has continued for a long time. It was true 300 years ago, and continues to be true today, the authors said.
In another line of inquiry, the data were used to determine when people stopped marrying close relations.
The researchers found that prior to 1750, most marriages in their data set occurred between people born about 6 miles from each other. After the start of the second Industrial Revolution in 1870, however, that distance rapidly increased to about 60 miles.
You might think that as people traveled farther to find a spouse, they would marry people who were more distantly related to them. And indeed, that was true. Eventually.
The authors report that between 1650 and 1850 the average genetic relationship of married couples was on the order of 4th cousins. After 1850 it was on the order of 7th cousins.
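To put those cousin degrees in perspective, the short Python sketch below applies the standard textbook rule that full cousins of degree n are expected to share 1/2^(2n+1) of their DNA, assuming they have no other ancestors in common. The function name `cousin_relatedness` is made up for this illustration, and the rule is a rough guide only, not necessarily how the authors measured relatedness.

```python
from fractions import Fraction


def cousin_relatedness(degree: int) -> Fraction:
    """Expected share of DNA in common for full cousins of a given degree.

    Textbook rule, assuming the two people share no other ancestry:
    first cousins share 1/8, and each further degree divides that by 4.
    """
    if degree < 1:
        raise ValueError("degree must be 1 (first cousins) or higher")
    return Fraction(1, 2 ** (2 * degree + 1))


for degree in (1, 4, 7):
    print(f"cousins of degree {degree}: {cousin_relatedness(degree)}")
```

By this rule, fourth cousins share about 1/512 of their DNA and seventh cousins about 1/32,768, a 64-fold difference.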
But the researchers found something strange in the data. Between 1800 and 1850, the distance couples traveled to marry each other doubled, probably because railroads made rapid travel possible in most of Europe and the United States. However, that increase in distance was accompanied by an increase in genetic relatedness between marriage partners.
In other words, during this 50-year period, people traveled farther to marry closer relations.
“Families dispersed, and people started taking the train to marry their cousin,” Erlich said.
This observation implies that it was changing social norms, rather than access to rapid transit, that was the primary trigger for people to search genetically further afield than fourth cousins when it came to finding a spouse, Erlich said.
The authors also addressed an ongoing debate about the heritability of longevity. According to their data set, previous studies have probably overestimated how heritable this particular trait is.
“We should lower our expectations about our ability to predict longevity from genomic data,” they wrote.
But Erlich said these investigations just scratch the surface of what we can learn from a massive family tree.
“There are many questions we want to ask,” he said.