focus of genealogical research, besides the size of the
data source, its structure also changes (Rannala, 1997;
Kingman, 1982). Marriages and childbirths connect
families (Koylu et al., 2021). The ancestry chart of a
small community cannot be represented by a forest of
family trees; it is a general directed acyclic graph. In
a small settlement, especially in bygone years, society
was more closed than nowadays, so families are densely
interconnected. People who live in small communities
(villages, ethnic or religious minorities) often choose
their spouses from the same community.
The "loss of lineage" (also called implex) can be
characterized by a genealogical coefficient of a given
genealogical tree, defined as the difference between
the number of theoretical ancestors of a person and
the number of his/her real ones in a given generation.
For example, procreation between first cousins means a
25% loss in the generation of great-grandparents of
the offspring: 6 distinct ancestors instead of the
theoretical 8. This measure is less useful in the case
of marriages between different generations or when
some ancestors are unknown (Pattison, 2001;
Pattison, 2007).
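The first-cousin example can be checked with a minimal sketch. It assumes a hypothetical pedigree stored as a dictionary that maps each person to his/her known parents; the names `ancestor_slots` and `ancestor_loss` and all IDs are illustrative and do not come from the dataset.

```python
def ancestor_slots(parents, person, generation):
    """Return the ancestors of `person` at `generation`, with repeats.

    `parents` maps a person to a tuple of known parents (assumed layout).
    """
    current = [person]
    for _ in range(generation):
        following = []
        for individual in current:
            following.extend(parents.get(individual, ()))
        current = following
    return current


def ancestor_loss(parents, person, generation):
    """Return (absolute loss, relative loss) in the given generation."""
    theoretical = 2 ** generation
    real = len(set(ancestor_slots(parents, person, generation)))
    return theoretical - real, (theoretical - real) / theoretical


# Hypothetical pedigree: "pgf" and "mgm" are siblings,
# so "father" and "mother" are first cousins.
parents = {
    "child": ("father", "mother"),
    "father": ("pgf", "pgm"),
    "mother": ("mgf", "mgm"),
    "pgf": ("gg1", "gg2"),
    "mgm": ("gg1", "gg2"),
    "pgm": ("gg3", "gg4"),
    "mgf": ("gg5", "gg6"),
}

loss, fraction = ancestor_loss(parents, "child", 3)
print(loss, fraction)  # 2 of the 8 great-grandparent slots are lost: 0.25
```

Note that an unknown ancestor also reduces the number of real ancestors in this count, which illustrates why the simple measure is misleading for incomplete data.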
In genetic genealogy, DNA analysis can be used
to reveal pedigree collapse (Tetushkin, 2011;
Vince Buffalo, 2016). Children inherit 50% of their
DNA from each parent, on average 25% from each of
their four grandparents, and so on. However, the
actual amount of DNA inherited from a more distant
ancestor is random; only the average amount inherited
from an individual ancestor is halved with each
generation going back. Due to this random inheritance,
DNA analysis is an effective way of finding shared
ancestors only within a few generations. Moreover,
this kind of research is quite expensive and involved.
Our goal is to build a directed network of people
based only on registry records (without genetic test
results) and then to determine different metrics of the
network (Newman, 2010; Barabási, 2016), such as the
in-degree and out-degree distributions, the average
clustering coefficient, the size of the giant component,
the average path length, etc. In this system, these
metrics have a social meaning as well. The
characterization of pedigree collapse also requires
network analysis. Since the dataset is not complete, a
novel quantity is defined to quantify the scale of
pedigree collapse.
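Two of the listed metrics can be illustrated with a short standard-library sketch. It assumes that every edge points from a parent to a child, so the in-degree is the number of known parents and the out-degree is the number of known children; this orientation, the function names and the toy edge list are assumptions made for illustration only. The clustering coefficient and the average path length are omitted here.

```python
from collections import Counter, defaultdict, deque


def degree_distributions(edges):
    """Return (in-degree, out-degree) distributions as Counters."""
    indegree, outdegree, nodes = Counter(), Counter(), set()
    for parent, child in edges:
        outdegree[parent] += 1
        indegree[child] += 1
        nodes.update((parent, child))
    in_dist = Counter(indegree[node] for node in nodes)
    out_dist = Counter(outdegree[node] for node in nodes)
    return in_dist, out_dist


def giant_component_size(edges):
    """Size of the largest weakly connected component (direction ignored)."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, largest = set(), 0
    for start in neighbours:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:  # breadth-first search over one component
            node = queue.popleft()
            size += 1
            for neighbour in neighbours[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        largest = max(largest, size)
    return largest


# Toy data: one couple with two children, plus an unrelated parent-child pair.
edges = [("f", "c1"), ("m", "c1"), ("f", "c2"), ("m", "c2"), ("x", "y")]
in_dist, out_dist = degree_distributions(edges)
giant = giant_component_size(edges)
```

With this orientation the in-degree can only be 0, 1 or 2, and values below 2 indicate missing parents in the records.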
2 METHOD OF INVESTIGATIONS
Our research is based on a public dataset created by a
Hungarian genealogist (Szepesi, 2020). He processed
the available (civil and parish) birth, marriage and
death registers of a town (Hajdúböszörmény, Hungary)
and other historical documents (census, burial
records, etc.) of the archives. The database contains
the data fields that appear in the registry records:
an ID, the name, the dates of birth, marriages and
death of the given person, the names (and IDs) of
his/her parents and the name (and ID) of his/her
spouse(s), etc. More than 100,000 individuals appear in
the dataset, mainly (but not exclusively) from the last
three centuries.
Of course, the dataset is not complete due to the
nature of the problem and the accuracy of the sources.
Each person has two parents, but the source is
restricted in time and space. Ancestors who lived too
early and descendants who were born too recently are
unknown, and migration is not tracked either. In the
18th century, only the fathers were recorded in the
registers.
The IDs of people and their parents were extracted
from a dataset in Personal Ancestral File format and
used to build the genealogical chart, i.e. an acyclic
directed graph of depersonalized IDs. (Those few people
who are not connected to any other individual were
eliminated.) A special graph analyzer program
(Bordán, 2019) and a web application (Széll et al.,
2020) were applied to analyze the topology of this
special social network. Although only one community
was investigated in this case study, we believe that
the results and conclusions may be general.
2.1 New Characterisation of Pedigree Collapse
As highlighted in Section 1, pedigree collapse cannot
be properly characterised by the simple loss of
ancestors in a given generation. That is why we propose
a new quantity to measure the degree of pedigree
collapse. First, an auxiliary measure $\alpha_j$ is
assigned to the given person $i$ and to his/her known
ancestors according to a recursive definition. For the
given person, i.e. where $j = i$, $\alpha_j = 1$. For
the ancestors, the value of $\alpha_j$ is given by

$$\alpha_j = \frac{1}{N} \sum_{k=1}^{N} \frac{\alpha_k}{2}, \qquad (1)$$

where $k$ runs over all the $N$ children of person $j$
who are ancestors of person $i$. An example is shown in
Figure 1. The definition is motivated by genetic
inheritance: although it is a random process,
approximately half of the genome comes from the father
and the other half comes from the mother.
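The recursion of Eq. (1) can be evaluated with a minimal sketch. It assumes a hypothetical pedigree stored as a dictionary from each person to his/her known parents, and it assumes that person $i$ counts as a qualifying child of his/her own parents, which the recursion needs in order to start. The function name and all IDs are illustrative.

```python
from fractions import Fraction


def auxiliary_measure(parents, person):
    """Return {j: alpha_j} for `person` and all of his/her known ancestors."""
    # Step 1: collect the person and every known ancestor.
    lineage, stack = {person}, [person]
    while stack:
        node = stack.pop()
        for parent in parents.get(node, ()):
            if parent not in lineage:
                lineage.add(parent)
                stack.append(parent)

    # Step 2: for each ancestor, keep only the children inside the lineage.
    children = {member: [] for member in lineage}
    for child in lineage:
        for parent in parents.get(child, ()):
            children[parent].append(child)

    # Step 3: alpha_i = 1; otherwise average alpha_k / 2 over the N children.
    alpha = {person: Fraction(1)}

    def value(j):
        if j not in alpha:
            kids = children[j]
            alpha[j] = sum(value(k) / 2 for k in kids) / len(kids)
        return alpha[j]

    for member in lineage:
        value(member)
    return alpha


# Hypothetical pedigree: "pgf" and "mgm" are siblings (first-cousin parents).
parents = {
    "child": ("father", "mother"),
    "father": ("pgf", "pgm"),
    "mother": ("mgf", "mgm"),
    "pgf": ("gg1", "gg2"),
    "mgm": ("gg1", "gg2"),
}
alpha = auxiliary_measure(parents, "child")
```

In this example the shared great-grandparent `gg1` receives the value 1/8, the same as along a single line of descent, because Eq. (1) averages over the two qualifying children instead of summing over them. The plain recursion is adequate for small examples; a very deep pedigree would call for an iterative evaluation in topological order.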
In order to define the new ancestor-loss coefficient
of a person $i$ (denoted by $\lambda_i$), the auxiliary
measure has to be summed according to the following
restrictions:
COMPLEXIS 2023 - 8th International Conference on Complexity, Future Information Systems and Risk