Classifying Words with 3-sort Automata
Tomasz Jastrząb¹ (ORCID: 0000-0002-7854-9058), Frédéric Lardeux² (ORCID: 0000-0001-8636-3870) and Éric Monfroy² (ORCID: 0000-0001-7970-1368)
¹ Silesian University of Technology, Gliwice, Poland
² LERIA, University of Angers, Angers, France
Keywords: Grammatical Inference, Nondeterministic Automata, SAT Models.

Abstract: Grammatical inference consists in learning a language or a grammar from data. In this paper, we consider a number of models for inferring a non-deterministic finite automaton (NFA) with 3 sorts of states, which must accept some words and reject other words from a given sample. We then propose a transformation of this 3-sort NFA into a weighted-frequency NFA and a probabilistic NFA, and we apply the latter to a classification task. The experimental evaluation of our approach shows that probabilistic NFAs can be successfully applied to classification tasks on both real-life and artificial benchmark data sets.
1 INTRODUCTION
Many real-world phenomena may be represented as
syntactically structured sequences, e.g., DNA, natu-
ral language sentences, electrocardiograms, and chain
codes. Grammatical Inference refers to learning
grammars and languages from data, i.e., from such
syntactically structured sequences. Machine learning
of grammars has various applications in syntactic pat-
tern recognition, adaptive intelligent agents, compu-
tational biology, and prediction. We are interested in
learning grammars as finite state automata, with re-
spect to a sample of the language composed of posi-
tive sequences that must be elements of the language,
and negative ones that the automaton must reject.
The problem of learning finite automata has been
studied from various angles: ad-hoc methods such
as DeLeTe2 (Denis et al., 2004) which merges states
from the prefix tree acceptor (PTA), a family of al-
gorithms for regular language inference presented
in (Vázquez de Parga et al., 2006), metaheuristics
such as hill-climbing in (Tomita, 1982), or model-
ing the problem as a Constraint Satisfaction Prob-
lem (CSP) and solving it with generic tools (such
as non-linear programming (Wieczorek, 2017), or
Boolean formulas (Jastrząb, 2017; Jastrząb et al.,
2023; Lardeux and Monfroy, 2021; Jastrząb et al.,
2022)).
However, all these works consider Deterministic
Finite Automata (DFA), or Non-deterministic Finite
Automata (NFA). In both cases, this means that when
using the automata on a word, the answer is “Yes,
this word is part of the language”, or “No, this word
is not part of the language”. Since samples are fi-
nite and usually limited in size (hundreds of words at
most), and regular languages are infinite, this classifi-
cation may be too restrictive. One could be interested
in probabilistic answers such as “this word is part of
the language with a probability of x%”. The question
is thus “How can we learn a probabilistic automaton
from a sample of positive and negative words?”.
In this paper, we propose a technique to derive
a probabilistic automaton from a sample of positive
and negative words. We first learn Non-deterministic
Finite Automata with 3 sorts of states: accepting fi-
nal states which validate positive words, rejecting fi-
nal states which reject negative words, and whatever
states that are not conclusive. We use 3-sort NFAs, which seem well suited to our goal (the usefulness of this kind of NFA is discussed in (de la Higuera, 2010)). To improve the efficiency of generating such automata, we use a property similar to the one used for 2-sort NFAs in (Jastrząb et al., 2023): here, we build a 3-sort automaton with
only one accepting final state and one rejecting final
state, and some extra constraints to reduce this size
k + 2 automaton into a size k automaton. Then, we
want to reflect frequencies based on the sample, such
as: how many positive words of the sample termi-
nate in this accepting final state? How many times
has a negative word of the sample passed through this transition? But we also want to be a bit more specific: for example, how many negative words of the sample passed through this transition and terminated in a rejecting final state (vs. a whatever state)? To this end, we need to weigh some cases and patterns differently.
We thus need to define what we call 3-sort Weighted-
Frequency NFA, and we present the transformation of
a 3-sort NFA into a 3-sort Weighted-Frequency NFA.
The latter can then be converted into a probabilistic NFA once the weights have been instantiated.
The probabilistic NFA can then be used to deter-
mine the probability for a word to be a part of the lan-
guage, or the probability of it not being a part of the
language. Note that by modifying the weights, we can
obtain more accepting or more rejecting automata.
We conduct a number of experiments on the WaltzDB database to classify peptides into amyloid ones, which are dangerous, and non-amyloid ones, which are harmless. We perform similar studies with languages generated by regular expressions. Our results look promising, and leave some room for weight tuning depending on the aim of the classification: e.g., we can be safer (rejecting dangerous peptides and some harmless ones) or riskier (trying not to reject non-amyloid peptides).
The paper is organized as follows. In Sect. 2
we revise the already developed inference models
and propose modifications required to construct 3-sort
NFAs. In Sect. 3 we show how the 3-sort NFA can
be transformed into weighted-frequency NFA and fi-
nally into a probabilistic NFA. In Sect. 4 we describe the conducted experiments and discuss the obtained results.
Finally, we conclude in Sect. 5.
2 THE NFA INFERENCE PROBLEM: FIRST MODELS
In this section, we formally present the NFA inference
problem based on the propositional logic paradigm
and we provide several models. These new models
are similar to the ones of (Jastrząb et al., 2023), but using 3-sort non-deterministic automata.
Without loss of generality¹, we consider in the following that $\lambda$, the empty word, is not part of the sample. We also consider a unique initial state, $q_1$.

¹ If $\lambda \in S$, then it can be recognized or rejected directly, without the need for an automaton.
2.1 Notations
Let $\Sigma = \{s_1, \ldots, s_n\}$ be an alphabet of $n$ symbols, let $\lambda$ denote the empty word, let $K$ be the set of integers $\{1, \ldots, k\}$, and let $Pref(w)$ (resp. $Suf(w)$) be the set of prefixes (resp. suffixes) of the word $w$, that we extend to $Pref(W)$ (resp. $Suf(W)$) for a set of words $W$.
Definition 1. A 3-sort non-deterministic finite automaton (3NFA) is a 6-tuple $A = (Q, \Sigma, I, F^+, F^-, \delta)$ with: $Q = \{q_1, \ldots, q_k\}$ a finite set of states, $\Sigma$ a finite alphabet, $I$ the set of initial states, $F^+$ the set of accepting final states, $F^-$ the set of rejecting final states, and $\delta : Q \times \Sigma \to 2^Q$ the transition function.

Note that in what follows, we will consider only one initial state, i.e., $I = \{q_1\}$.
A learning sample $S = S^+ \cup S^-$ is given by a set $S^+$ of "positive" words from $\Sigma^*$ that the inferred 3-sort NFA must accept, and a set $S^-$ of "negative" words that it must reject.
The language recognized by $A$, $L(A)_A$, is the set of words for which there exists a sequence of transitions from $q_1$ to a state of $F^+$. The language rejected by $A$, $L(A)_R$, is the set of words for which there exists a sequence of transitions from $q_1$ to a state of $F^-$.
An automaton is non-ambiguous if $L(A)_A \cap L(A)_R = \emptyset$, i.e., no positive word terminates in a rejecting final state, and no negative word terminates in an accepting final state.
We discard models with 0/1 variables, whether from INLP (Wieczorek, 2017) or CSP (Rossi et al., 2006): preliminary tests with various models in PyCSP3 (Lecoutre and Szczepanski, 2020) gave very poor results. The NFA inference problem is intrinsically a Boolean problem, and is thus well suited for SAT solvers. Hence, we consider the following variables:
- $k$, an integer, the size of the 3NFA to be generated,
- a set of $k$ Boolean variables $F^+ = \{a_1, \ldots, a_k\}$ determining whether state $i$ is an accepting final state or not,
- a set of $k$ Boolean variables $F^- = \{r_1, \ldots, r_k\}$ determining whether state $i$ is a rejecting final state,
- $\Delta = \{\delta_{s,\overrightarrow{q_i,q_j}} \mid s \in \Sigma \text{ and } (i,j) \in K^2\}$, a set of $nk^2$ Boolean variables representing the existence of a transition from state $q_i$ to state $q_j$ with the symbol $s \in \Sigma$, for each $i$, $j$, and $s$,
- we define $\rho_{w,q_1,q_{m+1}}$ as the path $q_1, \ldots, q_{m+1}$ for a word $w = s_1 \ldots s_m$: $\rho_{w,q_1,q_{m+1}} = \delta_{s_1,\overrightarrow{q_1,q_2}} \wedge \ldots \wedge \delta_{s_m,\overrightarrow{q_m,q_{m+1}}}$.

Although the path is directed from $q_1$ to $q_{m+1}$ (it is a sequence of derivations), we will build it either starting from $q_1$, starting from $q_{m+1}$, or starting from both sides (Jastrząb et al., 2022). Thus, to avoid confusion, we prefer keeping $q_1, q_{m+1}$ without any direction. Paths will be built recursively, and we need at most $O(\sigma k^2)$ Boolean variables $\rho_{w,q_i,q_j}$, with $\sigma = \sum_{w \in S} |w|$.
2.2 Core of the Models
The core of the models is independent of the way the
paths are built. It will thus be common to each model.
The core constraints defining a 3NFA of size $k$ (denoted k_3NFA) are the following:
- a final state must be either accepting or rejecting:
$$\bigwedge_{i \in K} \neg(a_i \wedge r_i) \qquad (1)$$
- a positive word must terminate in an accepting final state of the 3NFA, i.e., there must be a path from state $q_1$ to a final state $i$ such that $i \in F^+$:
$$\bigvee_{i \in K} \rho_{w,q_1,q_i} \wedge a_i \qquad (2)$$
- to build a non-ambiguous NFA, a positive word cannot terminate in a rejecting final state:
$$\bigwedge_{i \in K} (\neg\rho_{w,q_1,q_i} \vee \neg r_i) \qquad (3)$$
- similarly, negative words need the same constraints with accepting and rejecting states swapped:
$$\bigvee_{i \in K} \rho_{w,q_1,q_i} \wedge r_i \qquad (4)$$
$$\bigwedge_{i \in K} (\neg\rho_{w,q_1,q_i} \vee \neg a_i) \qquad (5)$$
Of course, the notion of a path can be defined and built in many ways, see (Jastrząb et al., 2022) for prefix, suffix, and hybrid approaches.
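To make the encoding above concrete, here is a minimal sketch (our own illustration, not the authors' implementation) of how the core constraints could be emitted as CNF clauses for a SAT solver through the PySAT API; the variable-numbering scheme, the toy sample words, and the helper names are hypothetical, and the clauses tying the $\rho$ variables to the transition variables (Constraints (8)–(11)) are omitted.

```python
from itertools import count
from pysat.solvers import Glucose3

k = 3                      # size of the 3NFA (hypothetical small value)
fresh = count(1)           # DIMACS variable ids start at 1
a = {i: next(fresh) for i in range(1, k + 1)}   # a_i: state i is an accepting final state
r = {i: next(fresh) for i in range(1, k + 1)}   # r_i: state i is a rejecting final state

def path_vars():
    """Allocate one Boolean rho_{w,q1,qi} per possible ending state i.
    The clauses linking rho to the delta variables are omitted in this sketch."""
    return {i: next(fresh) for i in range(1, k + 1)}

clauses = []
for i in range(1, k + 1):                       # Constraint (1): not both accepting and rejecting
    clauses.append([-a[i], -r[i]])

def add_word(rho, positive):
    fin, other = (a, r) if positive else (r, a)
    aux = []
    for i in range(1, k + 1):                   # Constraints (2)/(4), Tseitin-style:
        x = next(fresh)                         # x_i -> rho_i and x_i -> fin_i
        aux.append(x)
        clauses.append([-x, rho[i]])
        clauses.append([-x, fin[i]])
        clauses.append([-rho[i], -other[i]])    # Constraints (3)/(5)
    clauses.append(aux)                         # at least one x_i holds

add_word(path_vars(), positive=True)            # one positive sample word
add_word(path_vars(), positive=False)           # one negative sample word

with Glucose3(bootstrap_with=clauses) as solver:
    print("satisfiable:", solver.solve())
```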
2.3 Building Paths
We consider here that a path for a word $w = uv$ is built as the concatenation of a path for the prefix $u$ and one for the suffix $v$. Thus, we can have several joining states ($j$ below) and ending states ($k$ below) for the various paths of a word $w = uv$:
$$\bigvee_{(j,k) \in K^2} \rho_{u,q_1,q_j} \wedge \rho_{v,q_j,q_k} \qquad (6)$$
Note that for the empty word $\lambda$ we impose²:
$$\rho_{\lambda,q_i,q_j} = True \quad \forall (i,j) \in K^2 \qquad (7)$$
to ensure that splitting a word in such a way that the prefix or suffix is the empty word is valid. Note, however, that we do not allow $\lambda$-transitions in the NFA.

² As said before, we consider that $\lambda \notin S$, but when splitting a word, its prefix or suffix may be $\lambda$.
Prefixes and suffixes are then built recursively, starting from the beginning of words for prefixes:
- for each prefix $u = s$, $s \in \Sigma$:
$$\bigwedge_{i \in K} \delta_{s,\overrightarrow{q_1,q_i}} \leftrightarrow \rho_{s,q_1,q_i} \qquad (8)$$
- for each prefix $u = xs$, $s \in \Sigma$, of each word of $S$:
$$\bigwedge_{i \in K} \left( \rho_{u,q_1,q_i} \leftrightarrow \bigvee_{j \in K} \left( \rho_{x,q_1,q_j} \wedge \delta_{s,\overrightarrow{q_j,q_i}} \right) \right) \qquad (9)$$
and from the end of words for suffixes:
- for $v = s$, $s \in \Sigma$:
$$\bigwedge_{(i,j) \in K^2} \delta_{s,\overrightarrow{q_i,q_j}} \leftrightarrow \rho_{s,q_i,q_j} \qquad (10)$$
- for each suffix $v = sx$, $s \in \Sigma$:
$$\bigwedge_{(i,j) \in K^2} \left( \rho_{v,q_i,q_j} \leftrightarrow \bigvee_{l \in K} \left( \delta_{s,\overrightarrow{q_i,q_l}} \wedge \rho_{x,q_l,q_j} \right) \right) \qquad (11)$$
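As a small illustration of the semantics behind the $\rho$ variables (our own toy example, not part of the models), the following sketch computes, for a fixed candidate NFA, the set of states reachable from $q_1$ while reading a word; $\rho_{u,q_1,q_i}$ is meant to be true exactly when $q_i$ belongs to this set after reading the prefix $u$, which is what the recursion of Constraints (8)–(9) expresses.

```python
def reachable(delta, word, start=1):
    """delta maps (state, symbol) to the set of successor states.
    Returns the states q_i such that rho_{word,q_start,q_i} holds, following
    the prefix recursion of Constraints (8)-(9)."""
    current = {start}
    for s in word:
        current = {j for i in current for j in delta.get((i, s), set())}
    return current

# toy 3-state NFA over {a, b} (hypothetical transition function)
delta = {(1, "a"): {1, 2}, (2, "b"): {3}, (3, "b"): {3}}
print(reachable(delta, "abb"))   # {3}, i.e. rho_{abb,q1,q3} is true
```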
2.4 The Models
We build the models similarly to standard NFAs (i.e., without the notion of rejecting states). This means that we have to determine where to split each word $w \in S$ into a prefix $u$ and a suffix $v$. We then consider the set of prefixes $S_u = \{u \mid w \in S \text{ and } w = uv\}$ and the set of suffixes $S_v = \{v \mid w \in S \text{ and } w = uv\}$. Then, the model is the conjunction of Constraints (1)–(11).
Based on Constraint (7), if we split each word $w$ as $w = w\lambda$, we obtain the prefix model $P$ whose spatial complexity is in $O(\sigma k^2)$ clauses and variables, with $\sigma = \sum_{w \in S} |w|$. Similarly, by splitting words as $w = \lambda w$, we obtain the suffix model $S$ whose spatial complexity is in $O(\sigma k^3)$ clauses and variables (the difference comes from the fact that we do not know in which state a word terminates).
We then have hybrid models with non-empty suffixes and prefixes. Their complexity is in $O(\sigma k^3)$:
- the best suffix model $S^\star$, which consists in determining a minimal set of suffixes covering each word of $S$ and maximizing a cost based on an order over suffixes (writing $W(v) = \{w \in S \mid v \text{ is a suffix of } w\}$, and considering two suffixes $v_1$ and $v_2$, $v_1 \succeq v_2 \iff |v_1| \cdot |W(v_1)| \geq |v_2| \cdot |W(v_2)|$). Then, prefixes are computed to complete the words.
- similarly, the best prefix model $P^\star$ is built by optimizing prefixes.
- we can also try to optimize each word splitting using some metaheuristic, for example iterated local search (ILS). The model ILS(Init), based on a local search optimization (Stützle and Ruiz, 2018) of the word splittings (starting with an initial configuration Init, being either a random splitting of the words, the splitting found by the $P^\star$ model, or the one found by the $S^\star$ model), tries to minimize the fitness function $f(S_p, S_s) = |Pref(S_p)| + k \cdot |Suf(S_s)|$.
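As an illustration of how a word splitting is scored in the ILS model, the sketch below (our own, with hypothetical helper names) computes $f(S_p, S_s)$ for a given splitting of the sample into prefixes and suffixes; the empty word is not counted, in line with the remark on $\lambda$ above.

```python
def fitness(splitting, k):
    """splitting: list of (prefix, suffix) pairs, one pair per sample word.
    Returns f(S_p, S_s) = |Pref(S_p)| + k * |Suf(S_s)|, where Pref / Suf collect
    all non-empty prefixes (resp. suffixes) of the chosen word parts."""
    prefixes, suffixes = set(), set()
    for u, v in splitting:
        prefixes.update(u[:i] for i in range(1, len(u) + 1))
        suffixes.update(v[i:] for i in range(len(v)))
    return len(prefixes) + k * len(suffixes)

# two words split as ("ab", "b") and ("a", "bb"), k = 3 (hypothetical values)
print(fitness([("ab", "b"), ("a", "bb")], k=3))   # |{a, ab}| + 3 * |{b, bb}| = 8
```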
2.5 From $O(k^3)$ to $O((k+2)^2)$
Consider a sample $S$. If there is a k_3NFA for $S$, i.e., a 3NFA of size $k$, recognizing the words of $S^+$ and rejecting the words of $S^-$, there is also a (k+2)_3NFA for $S$. We can refine this property by adding some constraints to build what we call (k+2)_3NFA extensions.
Let $A = (Q, \Sigma, \{q_1\}, F^+, F^-, \delta)$ be a k_3NFA. Then, there always exists a (k+2)_3NFA, $A' = (Q \cup \{q_{k+1}, q_{k+2}\}, \Sigma, \{q_1\}, \{q_{k+1}\}, \{q_{k+2}\}, \delta')$, such that:
- there is only one accepting final state $q_{k+1}$ and one rejecting final state $q_{k+2}$; thus we no longer need the $a_i$ and $r_i$ variables,
- each transition is copied: $\forall (i,j) \in K^2, \forall s \in \Sigma: \delta_{s,\overrightarrow{q_i,q_j}} \rightarrow \delta'_{s,\overrightarrow{q_i,q_j}}$,
- incoming transitions to an accepting final state are duplicated to the new accepting final state $q_{k+1}$: $\forall i \in K, \forall q_j \in F^+, \forall a \in \Sigma: \delta_{a,\overrightarrow{q_i,q_j}} \rightarrow \delta'_{a,\overrightarrow{q_i,q_{k+1}}}$,
- the same transition duplication is made for rejecting final states towards the new rejecting final state $q_{k+2}$,
- there is no outgoing transition from states $q_{k+1}$ and $q_{k+2}$,
- no negative (resp. positive) word terminates in a state of $F^+$ (resp. $F^-$). This is obvious in $A$; we have to make it effective in $A'$.
The interest of this (k+2)_3NFA for $S$ is that the complexity for building suffixes is now in $O(k^2)$, since both positive and negative words must terminate in a given state ($q_{k+1}$ and $q_{k+2}$, respectively).
We now give the constraints of the (k+2)_3NFA. Let $K^+ = \{1, 2, \ldots, k+2\}$:
- Constraint (1) disappears,
- Constraints (2) and (3) become, for each $w \in S^+$:
$$\rho_{w,q_1,q_{k+1}} \qquad (12)$$
$$\neg\rho_{w,q_1,q_{k+2}} \qquad (13)$$
- the same happens for Constraints (4)–(5), replaced by, for each $w \in S^-$:
$$\rho_{w,q_1,q_{k+2}} \qquad (14)$$
$$\neg\rho_{w,q_1,q_{k+1}} \qquad (15)$$
- Constraint (6) must be split into two, for positive (16) and negative (17) words:
$$\bigvee_{j \in K} \rho_{u,q_1,q_j} \wedge \rho_{v,q_j,q_{k+1}} \qquad (16)$$
$$\bigvee_{j \in K} \rho_{u,q_1,q_j} \wedge \rho_{v,q_j,q_{k+2}} \qquad (17)$$
- Constraints (8)–(9) are not modified,
- Constraints (10)–(11) are respectively modified for positive words into:
$$\bigwedge_{i \in K} \delta_{s,\overrightarrow{q_i,q_{k+1}}} \leftrightarrow \rho_{s,q_i,q_{k+1}} \qquad (18)$$
$$\bigwedge_{i \in K} \left( \rho_{v,q_i,q_{k+1}} \leftrightarrow \bigvee_{j \in K} \left( \delta_{s,\overrightarrow{q_i,q_j}} \wedge \rho_{x,q_j,q_{k+1}} \right) \right) \qquad (19)$$
and for negative words into:
$$\bigwedge_{i \in K} \delta_{s,\overrightarrow{q_i,q_{k+2}}} \leftrightarrow \rho_{s,q_i,q_{k+2}} \qquad (20)$$
$$\bigwedge_{i \in K} \left( \rho_{v,q_i,q_{k+2}} \leftrightarrow \bigvee_{j \in K} \left( \delta_{s,\overrightarrow{q_i,q_j}} \wedge \rho_{x,q_j,q_{k+2}} \right) \right) \qquad (21)$$
- there is no outgoing transition from the final states:
$$\bigwedge_{s \in \Sigma} \bigwedge_{i \in K^+} \neg\delta_{s,\overrightarrow{q_{k+1},q_i}} \wedge \neg\delta_{s,\overrightarrow{q_{k+2},q_i}} \qquad (22)$$
- each incoming transition of the accepting (resp. rejecting) final state $q_{k+1}$ (resp. $q_{k+2}$) must also terminate in another state (duplication):
$$\bigwedge_{s \in \Sigma} \bigwedge_{i \in K} \left( \delta_{s,\overrightarrow{q_i,q_{k+1}}} \rightarrow \bigvee_{j \in K} \delta_{s,\overrightarrow{q_i,q_j}} \right) \qquad (23)$$
$$\bigwedge_{s \in \Sigma} \bigwedge_{i \in K} \left( \delta_{s,\overrightarrow{q_i,q_{k+2}}} \rightarrow \bigvee_{j \in K} \delta_{s,\overrightarrow{q_i,q_j}} \right) \qquad (24)$$
In order to be able to reduce the (k+2)_3NFA into a k_3NFA, we must pay attention to the possibly accepting and rejecting final states of the k_3NFA. To this end, we need a new set of Boolean variables representing the possibly accepting (resp. rejecting) final states of the corresponding k_3NFA: $\{a_1, \cdots, a_k\}$ (resp. $\{r_1, \cdots, r_k\}$). The (k+2)_3NFA may be reduced to a k_3NFA by just removing the states $q_{k+1}$ and $q_{k+2}$ and their incoming transitions, and by fixing the final states among the possible final states, i.e., determining the $a_i$ and the $r_i$ which are final states of the k_3NFA, either accepting or rejecting. To determine these possible final states, we have to ensure:
- A negative (resp. positive) word cannot terminate in an accepting (resp. rejecting) possible final state:
$$\bigwedge_{i \in K} \left( a_i \rightarrow \bigwedge_{w \in S^-} \neg\rho_{w,q_1,q_i} \right) \qquad (25)$$
$$\bigwedge_{i \in K} \left( r_i \rightarrow \bigwedge_{w \in S^+} \neg\rho_{w,q_1,q_i} \right) \qquad (26)$$
Note that with Constraints (25)–(26), Constraints (3)–(5) are no longer needed.
- Each accepting (resp. rejecting) possible final state validates at least one positive (resp. negative) word of $S$:
$$\bigwedge_{i \in K} \left( a_i \rightarrow \bigvee_{vs \in S^+} \bigvee_{j \in K} \rho_{v,q_1,q_j} \wedge \delta_{s,\overrightarrow{q_j,q_i}} \wedge \delta_{s,\overrightarrow{q_j,q_{k+1}}} \right) \qquad (27)$$
$$\bigwedge_{i \in K} \left( r_i \rightarrow \bigvee_{vs \in S^-} \bigvee_{j \in K} \rho_{v,q_1,q_j} \wedge \delta_{s,\overrightarrow{q_j,q_i}} \wedge \delta_{s,\overrightarrow{q_j,q_{k+2}}} \right) \qquad (28)$$
- Each positive (resp. negative) word terminates in at least one accepting (resp. rejecting) possible final state:
$$\bigwedge_{w \in S^+} \bigvee_{i \in K} (\rho_{w,q_1,q_i} \wedge a_i) \qquad (29)$$
$$\bigwedge_{w \in S^-} \bigvee_{i \in K} (\rho_{w,q_1,q_i} \wedge r_i) \qquad (30)$$
- A state cannot be both an accepting and a rejecting possible final state:
$$\bigwedge_{i \in K} \neg(a_i \wedge r_i) \qquad (31)$$
3 FROM 3NFA TO WEIGHTED-FREQUENCY NFA AND PROBABILISTIC NFA
Using a sample of positive and negative words, we are able to generate a k_3NFA. However, we cannot directly obtain a probabilistic automaton estimating the probability that a word belongs (or not) to the language represented by the sample. We can, however, use the sample and the generated k_3NFA to build a weighted-frequency automaton, whose weighted frequencies are determined with respect to the sample words. From this automaton, we can then create a probabilistic automaton to classify words.
3.1 Weighted-Frequency Automata
We now define what we call a weighted-frequency automaton. In a frequency automaton, the integer $n$ attached to a transition $\delta(q, a, q')$ means that this transition was used $n$ times (see (de la Higuera, 2010)). Here, we want to count positive (resp. negative) words terminating in an accepting (resp. rejecting) final state differently from positive (resp. negative) words terminating in a whatever state. We thus need to weigh these cases with different real numbers. In this way, we obtain automata that reflect weighted frequencies and are still based on 3-sort automata.
Definition 2 (3_NWFFA). A 3-sort non-deterministic weighted-frequency finite automaton (3_NWFFA) is a 10-tuple $A = (Q, \Sigma, I, F^+, F^-, \delta, \Delta^{(f,+)}, \Delta^{(f,-)}, \Delta^{(\delta,+)}, \Delta^{(\delta,-)})$ with:
- $Q = \{q_1, \ldots, q_k\}$ – a finite set of states,
- $\Sigma$ – a finite alphabet,
- $I = \{q_1\}$ – the set of initial states,
- $F^+$ – the set of accepting final states,
- $F^-$ – the set of rejecting final states,
- $\delta : Q \times \Sigma \to 2^Q$ – the transition function,
- $\Delta^{(f,+)}$ – a function $Q \to \mathbb{N}$, i.e.,
$$\Delta^{(f,+)}(q) = \begin{cases} \omega^{(f,+,+)} \cdot \varphi^{(f,+,+)}(q) & \text{if } q \in F^+ \\ \omega^{(f,+,?)} \cdot \varphi^{(f,+,?)}(q) & \text{if } q \in Q \setminus (F^+ \cup F^-) \\ 0 & \text{otherwise} \end{cases}$$
where:
  - $\omega^{(f,+,+)}$ and $\omega^{(f,+,?)}$ are two weights associated respectively with positive words terminating in accepting states and with positive words terminating in whatever states,
  - $\varphi^{(f,+,+)}(q)$ and $\varphi^{(f,+,?)}(q)$ are two counting functions counting the number of times (i.e., the number of physical paths over all positive words) a positive word terminates in accepting state $q$ and, respectively, in whatever state $q$. These two functions are detailed later.
- $\Delta^{(f,-)}$ – a function $Q \to \mathbb{N}$; $\Delta^{(f,-)}(q)$ is defined as above but for negative words and rejecting states (i.e., "+" is replaced by "$-$", and $F^+$ by $F^-$).
- $\Delta^{(\delta,+)}$ – a function $Q \times \Sigma \times Q \to \mathbb{N}$, i.e.,
$$\Delta^{(\delta,+)}(q,s,q') = \omega^{(\delta,+,+)} \cdot \varphi^{(\delta,+,+)}(q,s,q') + \omega^{(\delta,+,?)} \cdot \varphi^{(\delta,+,?)}(q,s,q')$$
where:
  - $\omega^{(\delta,+,+)}$ and $\omega^{(\delta,+,?)}$ are two weights associated respectively with positive words terminating in accepting states and positive words terminating in whatever states,
  - $\varphi^{(\delta,+,+)}(q,s,q')$ and $\varphi^{(\delta,+,?)}(q,s,q')$ are two counting functions counting the number of times a positive word uses the given transition within a physical path that terminates in an accepting state and, respectively, in whatever state. These two functions are detailed later.
- $\Delta^{(\delta,-)}$ – a function $Q \times \Sigma \times Q \to \mathbb{N}$, i.e., $\Delta^{(\delta,-)}(q,s,q') = \omega^{(\delta,-,-)} \cdot \varphi^{(\delta,-,-)}(q,s,q') + \omega^{(\delta,-,?)} \cdot \varphi^{(\delta,-,?)}(q,s,q')$, defined as above but for negative words.
Remember that these automata are non-
deterministic, and thus, there can be several paths for
a word, terminating in different states. Thus, for a
given word, we are interested in all terminating paths
independently from the sort of the terminating state.
3.2 From 3-sort NFA to
Weighted-Frequency Automata
We need a "physical" view of the transitions and paths of the k_3NFA we have built. Consider the transition function $\delta : Q \times \Sigma \to 2^Q$ of a k_3NFA $A$. We rename by $t_{s,\overrightarrow{i,j}}$ the value $\delta(q_i, s, q_j)$. Note that if $\delta_{s,\overrightarrow{q_i,q_j}}$ is true, $t_{s,\overrightarrow{i,j}}$ exists; otherwise $t_{s,\overrightarrow{i,j}}$ does not exist. We also define $\pi_{s_1 \ldots s_n, i_1, \ldots, i_{n+1}}$ as the sequence of physical transitions $t_{s_1,\overrightarrow{i_1,i_2}}, \ldots, t_{s_n,\overrightarrow{i_n,i_{n+1}}}$. For a given word $w = s_1 \ldots s_n$,
$$\Pi_{w,i_1,i_{n+1}} = \{\pi_{s_1 \ldots s_n, i_1, \ldots, i_{n+1}} \mid (q_{i_1}, \ldots, q_{i_{n+1}}) \in Q^{n+1}\}$$
is the set of all such sequences for $w$ in $A$.
Consider a sequence $\pi_{s_1 \ldots s_n, i_1, \ldots, i_{n+1}} = t_{s_1,\overrightarrow{i_1,i_2}}, \ldots, t_{s_n,\overrightarrow{i_n,i_{n+1}}}$. Then, $occ(\pi_{s_1 \ldots s_n, i_1, \ldots, i_{n+1}})(t_{s,\overrightarrow{i,j}})$ is the number of occurrences of $t_{s,\overrightarrow{i,j}}$ in the sequence $\pi_{s_1 \ldots s_n, i_1, \ldots, i_{n+1}}$, defined recursively as follows ($\Lambda$ denoting the empty sequence):
$$occ(\Lambda)(t_{s,\overrightarrow{i,j}}) = 0$$
$$occ(t_{s_1,\overrightarrow{k_1,l_1}}, t_{s_2,\overrightarrow{k_2,l_2}}, \ldots, t_{s_n,\overrightarrow{k_n,l_n}})(t_{s,\overrightarrow{i,j}}) =
\begin{cases}
1 + occ(t_{s_2,\overrightarrow{k_2,l_2}}, \ldots, t_{s_n,\overrightarrow{k_n,l_n}})(t_{s,\overrightarrow{i,j}}) & \text{if } (s,i,j) = (s_1,k_1,l_1), \\
occ(t_{s_2,\overrightarrow{k_2,l_2}}, \ldots, t_{s_n,\overrightarrow{k_n,l_n}})(t_{s,\overrightarrow{i,j}}) & \text{otherwise.}
\end{cases}$$
We now propose an implementation of the counting functions for a weighted-frequency automaton:
- $\varphi^{(f,+,+)}$: if $q \in F^+$, $\varphi^{(f,+,+)}(q) = |\bigcup_{w \in S^+} \Pi_{w,1,q}|$, and 0 otherwise,
- $\varphi^{(f,+,?)}$: if $q \in Q \setminus (F^+ \cup F^-)$, $\varphi^{(f,+,?)}(q) = |\bigcup_{w \in S^+} \Pi_{w,1,q}|$, and 0 otherwise,
- $\varphi^{(f,-,-)}$ and $\varphi^{(f,-,?)}$ can be defined similarly,
- $\varphi^{(\delta,+,+)}$: $\varphi^{(\delta,+,+)}(q,s,q') = \sum_{w \in S^+} \sum_{q_A \in F^+} \sum_{p \in \Pi_{w,q_1,q_A}} occ(p)(t_{s,\overrightarrow{q,q'}})$,
- $\varphi^{(\delta,+,?)}$, $\varphi^{(\delta,-,-)}$, and $\varphi^{(\delta,-,?)}$ are defined similarly.
We can imagine other counting functions, for example not considering all possible paths of a word, but only one path, or only a given number of paths.
The different weights enable us, for example, to consider only positive words ($\omega^{(F,-,\star)} = 0$ for $F \in \{f,\delta\}$ and $\star \in \{-,?\}$), or to consider only positive words terminating in an accepting state ($\omega^{(f,+,?)} = 0$ and $\omega^{(\delta,+,?)} = 0$).
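The counting functions above can be realized by enumerating the physical paths of each sample word in the inferred automaton. The sketch below is a simplified illustration under our own assumptions (it enumerates all paths of a word and restricts the counts to paths ending in a chosen set of states); it is not the authors' implementation.

```python
from collections import Counter

def all_paths(delta, word, start=1):
    """Enumerate every physical path (state sequence) of `word` in the NFA;
    delta maps (state, symbol) to the set of successor states."""
    paths = [[start]]
    for s in word:
        paths = [p + [j] for p in paths for j in delta.get((p[-1], s), set())]
    return paths

def phi_state(delta, words, targets):
    """phi^{(f,.,.)}(q): number of paths of the given words ending in a state q
    of `targets` (e.g. F+ for phi^{(f,+,+)}, the non-final states for phi^{(f,+,?)})."""
    counts = Counter()
    for w in words:
        for p in all_paths(delta, w):
            if p[-1] in targets:
                counts[p[-1]] += 1
    return counts

def phi_transition(delta, words, finals):
    """phi^{(delta,.,.)}(q,s,q'): transition uses summed over all paths of the
    given words that terminate in a state of `finals`."""
    counts = Counter()
    for w in words:
        for p in all_paths(delta, w):
            if p[-1] in finals:
                for (i, j), s in zip(zip(p, p[1:]), w):
                    counts[(i, s, j)] += 1
    return counts

# toy automaton and positive sample (hypothetical)
delta = {(1, "a"): {1, 2}, (2, "b"): {3}, (1, "b"): {1}}
print(phi_state(delta, ["ab", "abb"], targets={3}))
print(phi_transition(delta, ["ab", "abb"], finals={3}))
```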
3.3 Probabilistic Automata
We can now define the probabilistic automata we are
interested in: 3-sort automata with probabilities for
transitions and probabilities for states to be final ac-
cepting and rejecting.
Definition 3 (3_NPFA). A 3-sort non-deterministic probabilistic finite automaton is an 8-tuple $A = (Q, \Sigma, I, \delta, \Gamma^{(f,+)}, \Gamma^{(f,-)}, \Gamma^{(\delta,+)}, \Gamma^{(\delta,-)})$ with:
- $Q = \{q_1, \ldots, q_k\}$ – a finite set of states,
- $\Sigma$ – a finite alphabet,
- $I = \{q_1\}$ – the set of initial states,
- $\delta : Q \times \Sigma \to 2^Q$ – the transition function,
- $\Gamma^{(f,+)}$ – a function $Q \to [0,1]$, i.e., $\Gamma^{(f,+)}(q)$ is the probability of state $q$ being an accepting final state,
- $\Gamma^{(f,-)}$ – a function $Q \to [0,1]$, i.e., $\Gamma^{(f,-)}(q)$ is the probability of state $q$ being a rejecting final state,
- $\Gamma^{(\delta,+)}$ – a function $Q \times \Sigma \times Q \to [0,1]$, i.e., $\Gamma^{(\delta,+)}(q,s,q')$ is the probability for a positive word to pass through the transition $\delta(q,s,q')$,
- $\Gamma^{(\delta,-)}$ – similar to $\Gamma^{(\delta,+)}$ for negative words.
A 3-sort non-deterministic probabilistic automaton $A = (Q, \Sigma, I, \delta, \Gamma^{(f,+)}, \Gamma^{(f,-)}, \Gamma^{(\delta,+)}, \Gamma^{(\delta,-)})$ must respect the following constraint:
$$\forall q \in Q: \quad \sum_{q' \in Q,\, s \in \Sigma} \Gamma^{(\delta,+)}(q,s,q') + \Gamma^{(f,+)}(q) = 1 \quad \text{and} \quad \sum_{q' \in Q,\, s \in \Sigma} \Gamma^{(\delta,-)}(q,s,q') + \Gamma^{(f,-)}(q) = 1.$$
Remember that we consider only one initial state, $q_1$. If one wants to consider several initial states, probabilities of being initial positive and initial negative can be added.
3.4 From Weighted-Frequency to
Probabilistic Automata
We now present the transformation of a 3_NWFFA into a 3_NPFA: weighted frequencies are converted into probabilities.
Consider a 3-sort non-deterministic weighted-frequency finite automaton $A = (Q, \Sigma, I, F^+, F^-, \delta, \Delta^{(f,+)}, \Delta^{(f,-)}, \Delta^{(\delta,+)}, \Delta^{(\delta,-)})$. Then, from $A$, we can derive a 3-sort non-deterministic probabilistic finite automaton $A' = (Q', \Sigma', I', \delta', \Gamma^{(f,+)}, \Gamma^{(f,-)}, \Gamma^{(\delta,+)}, \Gamma^{(\delta,-)})$ such that:
- the states, alphabet, transitions, and initial state remain unchanged: $Q = Q'$, $\Sigma = \Sigma'$, $I = I'$, and $\delta = \delta'$,
- the probability for $q$ to be an accepting final state is the weighted frequency of the words of $S^+$ terminating in $q$, divided by the sum of the weighted frequencies of the positive words of the sample outgoing from $q$ plus the weighted frequency of the positive words ending in $q$:
$$\forall q \in Q: \quad \Gamma^{(f,+)}(q) = \Delta^{(f,+)}(q) \Big/ \Big( \Delta^{(f,+)}(q) + \sum_{s \in \Sigma,\, q' \in Q} \Delta^{(\delta,+)}(q,s,q') \Big)$$
- the probabilities $\Gamma^{(f,-)}$ are computed similarly for negative words, replacing "+" by "$-$",
- the probability for a positive word to follow transition $\delta(q,s,q')$ is computed similarly to the probability of ending in $q$:
$$\forall q, q' \in Q, \forall s \in \Sigma: \quad \Gamma^{(\delta,+)}(q,s,q') = \Delta^{(\delta,+)}(q,s,q') \Big/ \Big( \Delta^{(f,+)}(q) + \sum_{s' \in \Sigma,\, q'' \in Q} \Delta^{(\delta,+)}(q,s',q'') \Big)$$
- the computation is similar for negative words, replacing "+" by "$-$".
These probabilities respect the constraints of 3-sort non-deterministic probabilistic finite automata.
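A minimal sketch of this normalization step, assuming the weighted frequencies $\Delta^{(f,+)}$ and $\Delta^{(\delta,+)}$ have already been computed and are stored in plain dictionaries (a hypothetical data layout, not the authors' code):

```python
def to_probabilities(delta_f, delta_t, states):
    """delta_f: {q: weighted frequency of positive words ending in q};
    delta_t: {(q, s, q'): weighted frequency of positive words using that transition}.
    Returns (gamma_f, gamma_t) such that, for every state q with a nonzero total,
    gamma_f[q] plus the sum of gamma_t over transitions leaving q equals 1."""
    gamma_f, gamma_t = {}, {}
    for q in states:
        out = {key: v for key, v in delta_t.items() if key[0] == q}
        total = delta_f.get(q, 0) + sum(out.values())
        if total == 0:
            gamma_f[q] = 0.0          # q is never used by a positive word
            continue
        gamma_f[q] = delta_f.get(q, 0) / total
        for key, v in out.items():
            gamma_t[key] = v / total
    return gamma_f, gamma_t

# toy weighted frequencies (hypothetical values)
gf, gt = to_probabilities({1: 0, 2: 3},
                          {(1, "a", 2): 4, (1, "b", 1): 2, (2, "b", 2): 1},
                          states={1, 2})
print(gf, gt)
```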
3.5 Classifying Words
Given the two sets of independent weights (for states and transitions) and the non-deterministic nature of the NFA, implying possibly multiple paths for a word, we consider four classifiers:
- $C_{MM}$ computes the positive and negative scores for a word by multiplying the probabilities of the transitions and the probability of the last state on each path, selecting as the final score the maximum over all paths.
- $C_{MA}$ computes the positive and negative scores for a word by multiplying the probabilities of the transitions and the probability of the last state on each path, selecting as the final score the average over all paths.
- $C_{SM}$ computes the positive and negative scores for a word by summing up the probabilities of the transitions and the probability of the last state on each path, selecting as the final score the maximum over all paths. The summed probabilities for each path of a word $w$ are divided by $|w| + 1$, to scale them to the range $[0,1]$.
- $C_{SA}$ computes the positive and negative scores for a word by summing up the probabilities of the transitions and the probability of the last state on each path, selecting as the final score the average over all paths. The summed probabilities for each path of a word $w$ are divided by $|w| + 1$, to scale them to the range $[0,1]$.
The final classifier decision, i.e., acceptance or rejection of a word, is based on the comparison of the positive and negative scores: the greater score wins.
To illustrate the operation of the classifiers, let us consider an example. Assume that we have a word $w = abb$ for which there are two paths in some NFA. For simplicity, assume that all weights $\omega^{(F,\star,\diamond)} = 1$, with $F \in \{f,\delta\}$, $\star \in \{+,-\}$, and $\diamond \in \{+,-,?\}$. Let us also assume that the transition and last-state probabilities for the first path are $(0.2, 0.5, 0.35, 0.6)$ for the acceptance scenario, and $(0.6, 0.5, 0.65, 0.4)$ for the rejection scenario. For the second path, assume probabilities $(0.2, 0.15, 0.55, 0.9)$ and $(0.6, 0.5, 0.5, 0.75)$. Then the scores for the respective classifiers are as follows:
- for $C_{MM}$ the positive score is 0.02, and the negative one is 0.11,
- for $C_{MA}$ the positive score is 0.02, and the negative one is 0.10,
- for $C_{SM}$ the positive score is 0.45, and the negative one is 0.59,
- for $C_{SA}$ the positive score is 0.43, and the negative one is 0.56.
It is clear that each classifier indicates that the word should be rejected, as the negative scores are always greater than the positive ones. Note also that $C_{SM}$ and $C_{SA}$ produce a larger margin between the scores than $C_{MM}$ and $C_{MA}$ (0.13–0.14 vs. 0.08–0.09).
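The sketch below reproduces the scores of this example; the per-path probabilities are given directly as lists (the path enumeration and the probability lookups are assumed to have been done beforehand).

```python
from math import prod

def classify(paths_pos, paths_neg):
    """Each path is the list of its transition probabilities followed by the
    probability of its last state. Returns the (positive, negative) scores of
    the four classifiers."""
    def scores(paths):
        mult = [prod(p) for p in paths]              # product over a path (C_MM, C_MA)
        summ = [sum(p) / len(p) for p in paths]      # sum scaled by |w| + 1 (C_SM, C_SA)
        return {"MM": max(mult), "MA": sum(mult) / len(mult),
                "SM": max(summ), "SA": sum(summ) / len(summ)}
    pos, neg = scores(paths_pos), scores(paths_neg)
    return {c: (pos[c], neg[c]) for c in ("MM", "MA", "SM", "SA")}

# the two paths of the word w = abb from the example above
paths_pos = [[0.2, 0.5, 0.35, 0.6], [0.2, 0.15, 0.55, 0.9]]
paths_neg = [[0.6, 0.5, 0.65, 0.4], [0.6, 0.5, 0.5, 0.75]]
for c, (p, n) in classify(paths_pos, paths_neg).items():
    print(f"C_{c}: positive {p:.2f}, negative {n:.2f} -> {'accept' if p > n else 'reject'}")
```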
4 EXPERIMENTATION
4.1 Experiment I
To evaluate the proposed probabilistic automata and
classifiers, we have created a benchmark set based
on the peptides stored in WaltzDB database (Beerten
et al., 2015; Louros et al., 2019). The bench-
mark set was composed of several samples con-
taining amyloid (positive) and non-amyloid (nega-
tive) peptides, each having a length of 6 charac-
ters. The samples were created based on pep-
tide subsets available on the WaltzDB website
(http://waltzdb.switchlab.org/sequences).
Based on each sample, the training and test sam-
ples were created, with the training sample consisting
of 10%, 30%, and 50% of the first peptide sequences
in the given subset. The training sample was used
to infer the probabilistic NFA, which acted then as a
classifier for the test sample, comprising the whole
subset without the elements included in the training
sample. Since some of the subsets contained very few
positive/negative sequences, for the final evaluation
we selected only five of them, i.e., Amylhex (AH),
Apoai mutant set (AMS), Literature (Lit), Newscores
(NS), and Tau mutant set (TMS). Table 1 summarizes
the characteristics of the data set. Note that all sam-
ples are quite imbalanced and they differ both in the
total number of words and the size of the alphabet.
Table 1: Characteristics of the benchmark set.

         |     | Train 10%   | Train 30%   | Train 50%   | Whole subset
Subset   | |Σ| | |S+|  |S−|  | |S+|  |S−|  | |S+|  |S−|  | |S+|   |S−|
AH       | 19  |   7    12   |  23    36   |  39    60   |   79   121
AMS      | 20  |   7     3   |  23    10   |  39    18   |   79    36
Lit      | 20  |  20     6   |  61    19   | 102    33   |  204    66
NS       | 18  |   3     1   |   9     4   |  16     7   |   32    15
TMS      | 19  |   9     2   |  27     6   |  46    11   |   92    22
The inference models were implemented in
Python using the PySAT library and the Glucose SAT
solver with default options. The experiments were
carried out on a computing cluster with Intel-E5-
2695 CPUs, and a fixed limit of 10 GB of memory.
Running times were limited to 15 minutes, including
model generation and solving time. The classification
was conducted using a Java application running on
a single machine with Intel Core i7-7560U 2.40GHz
processor and 8 GB of RAM.
Since weight tuning lies outside the scope of the current paper, we decided to conduct the experiments by setting the respective weights to 0s or 1s only, analyzing all possible combinations of 0s and 1s for the eight weights defined before. Thus, for each training sample we analyzed 256 different weight assignments, which along with 115 inferred NFAs³ and 4 classifiers gave us a total of 117 760 classifications. The whole process took around 186 minutes, with the (k+2)_3NFA models taking on average 2.9 times longer to perform the classification than the k_3NFA ones. This difference may be attributed to the larger size of the former NFAs, which makes the path-building process more time-consuming.

³ In five cases for the NS subset, we failed to infer an NFA for the Train 50% training set. The models that failed were $P_{(k+2)}$ and all suffix-based models.

The classification results were evaluated based on accuracy and F1-score, given by Eqs. (32) and (33):
$$Acc = \frac{TP + TN}{TP + TN + FP + FN} \qquad (32)$$
$$F1 = \frac{2 \cdot TP}{2 \cdot TP + FP + FN} \qquad (33)$$
where $TP$ denotes true positives (amyloid peptides classified as amyloid), $TN$ denotes true negatives (non-amyloid peptides classified as such), $FP$ denotes false positives (non-amyloid peptides classified as amyloid), and $FN$ denotes false negatives (amyloid peptides classified as non-amyloid).
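For reference, a direct computation of the two metrics from the confusion counts (plain Python; the counts in the usage line are hypothetical):

```python
def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)      # Eq. (32)

def f1_score(tp, fp, fn):
    return 2 * tp / (2 * tp + fp + fn)          # Eq. (33)

# e.g., 30 true positives, 50 true negatives, 10 false positives, 12 false negatives
print(accuracy(30, 50, 10, 12), f1_score(30, 10, 12))
```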
Table 2 shows the best accuracy values and their
corresponding F1-scores obtained for the test sets
over all analyzed weight combinations and all clas-
sifiers. The metrics were obtained by NFAs inferred
using 8 different models. Boldfaced values denote the
best column-wise values. The entries with an asterisk
denote the cases in which the best F1-score did not
correspond to the best accuracy.
Table 2: Best accuracy and corresponding F1-score metrics obtained by the NFAs for the analyzed benchmark sets.

           |            Accuracy            |             F1-score
Model      | AH    AMS   Lit   NS    TMS    | AH     AMS    Lit   NS    TMS
P_k        | 0.63  0.59  0.76  0.63  0.72   | 0.63*  0.66*  0.86  0.73  0.82
P_(k+2)    | 0.69  0.68  0.76  0.67  0.73   | 0.64*  0.79   0.86  0.76  0.82
P*_k       | 0.68  0.62  0.76  0.71  0.72   | 0.66   0.72   0.86  0.77  0.82
P*_(k+2)   | 0.64  0.67  0.73  0.50  0.73   | 0.62*  0.77   0.82  0.51  0.82
S_k        | 0.61  0.55  0.76  0.62  0.72   | 0.48*  0.56*  0.86  0.70  0.82
S_(k+2)    | 0.65  0.64  0.77  0.50  0.73   | 0.60*  0.77   0.86  0.48  0.82
S*_k       | 0.61  0.62  0.66  0.50  0.72   | 0.58*  0.76   0.79  0.51  0.82
S*_(k+2)   | 0.63  0.66  0.74  0.47  0.73   | 0.42*  0.77   0.85  0.44  0.82
Based on the accuracy values we can state that the prefix-based models perform best among all eight models, regardless of the benchmark set. It is also clear that the NS data set turned out to be the hardest one, since some of the models achieved accuracy smaller than 0.5, which is the probability of success of a random decision in binary classification. The analysis of the F1-score, being the harmonic mean of precision and recall, confirms the strong position of the prefix-based models, with a small advantage given to the $P_k$ model. Overall, the achieved metrics are not very satisfactory, which may be attributed to, e.g.:
- the way the training samples were constructed: it is not infrequent that the training sample does not cover the whole alphabet, which results in the rejection of test-set words using symbols outside of the training sample's alphabet,
- the lack of some language behind the subsets of peptides: there is no guarantee that the peptides from a certain subset share some common features reflected in the sequences,
- limited parameter tuning: so far we analyzed only the extreme values for the weights; more advanced parameter tuning may be required to achieve better results.
Interestingly, the best results were typically achieved with the 30% training sets. The NS data set, whenever an NFA for it could be inferred, required the 50% training set to achieve its peak performance. There were also rare cases in which the smallest training data set was sufficient (e.g., for the $P_k$ model with the AMS data set).
Table 3 shows the best accuracy values and cor-
responding F1-scores for different classifiers over all
analyzed weight combinations and all subsets. The
metrics pertain to NFAs inferred using eight different
models. The meaning of boldfaced entries and entries
with an asterisk is the same as for Table 2.
Table 3: Best accuracy and corresponding F1-score metrics obtained by the NFAs for the analyzed classifiers.

           |          Accuracy          |           F1-score
Model      | C_MM  C_MA  C_SM  C_SA     | C_MM   C_MA   C_SM   C_SA
P_k        | 0.70  0.70  0.76  0.76     | 0.80   0.81   0.86   0.86
P_(k+2)    | 0.70  0.69  0.76  0.76     | 0.80   0.79   0.86   0.86
P*_k       | 0.56  0.56  0.76  0.76     | 0.45*  0.45*  0.86   0.86
P*_(k+2)   | 0.59  0.59  0.73  0.73     | 0.70   0.70   0.82*  0.82*
S_k        | 0.53  0.53  0.76  0.76     | 0.61*  0.69   0.86   0.86
S_(k+2)    | 0.64  0.64  0.77  0.77     | 0.73   0.73   0.86   0.86
S*_k       | 0.53  0.53  0.72  0.72     | 0.69   0.69   0.82   0.82
S*_(k+2)   | 0.68  0.68  0.74  0.74     | 0.79   0.79   0.85   0.85
The analysis shows that this time $P_k$ is clearly the best model. In terms of classifiers, we do not observe many differences within the pairs of classifiers based on multiplication ($C_{MM}$, $C_{MA}$) and summation ($C_{SM}$, $C_{SA}$). However, the differences between multiplication- and summation-based classifiers for a given model are statistically significant (based on an ANOVA test with $\alpha = 0.05$ and a post hoc Tukey HSD test) in terms of both accuracy and F1-score.
Figure 1 shows the best accuracy values and corresponding F1-scores obtained by all classifiers over all analyzed weight combinations and NFAs inferred by all models. We can confirm that the differences between the classifiers are statistically significant at $\alpha = 0.05$, with the $C_{SA}$ and $C_{SM}$ classifiers consistently achieving better results than the other two. We can also note that the Lit data set was the most favorable in terms of satisfactory metric values.
Figure 1: Best accuracy and F1-score metrics obtained by all NFAs for the analyzed benchmark sets and classifiers C_MM (black), C_MA (gray), C_SM (light gray), and C_SA.

4.2 Experiment II

To evaluate the proposed solutions even further, we created a second benchmark composed of two data sets. The data sets were built based on regular expressions (regexp)⁴: we defined two languages described by different regular expressions, from which we sampled words of 1 to 15 characters. These words represented the sets $S^+$. The sets $S^-$ contained words constructed by randomly shuffling the positive examples and ensuring they do not match the regexp. Similarly to the first experiment, we created the training data sets used for NFA inference and the test sets used for evaluation. The sizes of the complete samples were equal to 200 words, split equally between $S^+$ and $S^-$. The experimental setup, i.e., computing machines, metrics, and weight settings, was kept as before.

⁴ The regular expressions were: (0|11)(001|000|10)*0 and [0-9][0-4][5-9](024|135|(98|87))*(0|6).
In Table 4 we show the results obtained for the various models across all classifiers for the two regexp-based data sets. Comparing the results to the ones presented in Tab. 2, we note a significant improvement in the achieved metrics. We also observe that for the RegExp1 data set, except for the $S_k$ model, all models achieve perfect scores. Finally, we note that for RegExp2 all (k+2)-based models, except $P_{(k+2)}$, improve over their k-based counterparts, while for RegExp1 this only applies to the $S$ model, since the others already achieved perfect scores. Detailed analysis has shown that in most cases the best accuracy and F1-score were obtained with the 50% training set, but for the $P_{(k+2)}$, $S_k$, and $S_{(k+2)}$ models, 10% was enough.
Table 4: Best accuracy and corresponding F1-score metrics obtained by the NFAs for the analyzed benchmark sets.

Data set  | P_k   P_(k+2)  P*_k   P*_(k+2)  S_k    S_(k+2)  S*_k   S*_(k+2)
Accuracy
RegExp1   | 1.00  1.00     1.00   1.00      0.91   1.00     1.00   1.00
RegExp2   | 0.93  0.88     0.88   0.93      0.77   0.92     0.85   0.87
F1-score
RegExp1   | 1.00  1.00     1.00   1.00      0.90   1.00     1.00   1.00
RegExp2   | 0.93  0.86*    0.87   0.95      0.71*  0.92     0.81   0.87
In Table 5, we show the analysis of the best accuracy and corresponding F1-score for the models vs. classifiers comparison. It can be observed that the results improved significantly compared to the ones presented in Tab. 3. We can also note that with this benchmark, only the $S_k$ model failed to achieve perfect scores. Clearly, there are no significant differences between the classifiers, as they performed equally well regardless of the model.
Table 5: Best accuracy and corresponding F1-score metrics obtained by the NFAs for the analyzed classifiers.

           |          Accuracy          |          F1-score
Model      | C_MM  C_MA  C_SM  C_SA     | C_MM  C_MA  C_SM  C_SA
P_k        | 1.00  1.00  1.00  1.00     | 1.00  1.00  1.00  1.00
P_(k+2)    | 1.00  1.00  1.00  1.00     | 1.00  1.00  1.00  1.00
P*_k       | 1.00  1.00  1.00  1.00     | 1.00  1.00  1.00  1.00
P*_(k+2)   | 1.00  1.00  1.00  1.00     | 1.00  1.00  1.00  1.00
S_k        | 0.91  0.91  0.91  0.91     | 0.90  0.90  0.90  0.90
S_(k+2)    | 1.00  1.00  1.00  1.00     | 1.00  1.00  1.00  1.00
S*_k       | 1.00  1.00  1.00  1.00     | 1.00  1.00  1.00  1.00
S*_(k+2)   | 1.00  1.00  1.00  1.00     | 1.00  1.00  1.00  1.00
5 CONCLUSIONS
In this paper, we have proposed a method to transform an NFA with three types of states (accepting, rejecting, and non-conclusive) into a weighted-frequency automaton, which can be further transformed into a probabilistic NFA. The developed transformation process is generic, since it allows controlling the relative importance of the different types of states and/or transitions through customizable weights.
We have evaluated the proposed probabilistic automata on a classification task performed over two distinct benchmarks. The first one, based on real-life samples of peptide sequences, proved to be quite challenging, yielding relatively low-quality metrics. The second benchmark, based on random sampling of languages described by regular expressions, enabled us to show the power of probabilistic NFAs, producing accuracy scores of 0.81–1.00 with F1-scores ranging from 0.69 up to 1.00. The second benchmark allowed us to show that, given a representative sample of an underlying language, the probabilistic NFA can achieve very good classification quality, even without sophisticated parameter tuning.
In the future, we plan to apply some heuristics to
tune the weights so that the classifiers perform even
better, especially for real-life benchmarks. Given the
generic nature of the proposed weighted-frequency
automata we also plan to consider using a parallel en-
semble of classifiers, differing not only in terms of
weights, but also in how probabilities are combined.
REFERENCES
Beerten, J., van Durme, J. J. J., Gallardo, R., Capriotti,
E., Serpell, L. C., Rousseau, F., and Schymkowitz, J.
(2015). WALTZ-DB: a benchmark database of amy-
loidogenic hexapeptides. Bioinform., 31(10):1698–
1700.
de la Higuera, C. (2010). Grammatical Inference: Learn-
ing Automata and Grammars. Cambridge University
Press.
Denis, F., Lemay, A., and Terlutte, A. (2004). Learning
regular languages using RFSAs. Theor. Comput. Sci.,
313(2):267–294.
Jastrząb, T. (2017). Two parallelization schemes for the in-
duction of nondeterministic finite automata on PCs. In
Proc. of PPAM 2017, volume 10777 of LNCS, pages
279–289. Springer.
Jastrząb, T., Lardeux, F., and Monfroy, É. (2022). Taking
advantage of a very simple property to efficiently in-
fer NFAs. In 34th IEEE International Conference on
Tools with Artificial Intelligence, ICTAI 2022, pages
1355–1361. IEEE.
Jastrząb, T., Lardeux, F., and Monfroy, É. (2023). Inference
of over-constrained NFA of size k + 1 to efficiently
and systematically derive NFA of size k for grammar
learning. In Proceedings of the International Confer-
ence on Computational Science ICCS 2023, Part I,
volume 14073 of LNCS, pages 134–147. Springer.
Lardeux, F. and Monfroy, É. (2021). Optimized models
and symmetry breaking for the NFA inference prob-
lem. In 33rd IEEE International Conference on Tools
with Artificial Intelligence, ICTAI 2021, pages 396–
403. IEEE.
Lecoutre, C. and Szczepanski, N. (2020). PYCSP3: mod-
eling combinatorial constrained problems in python.
CoRR, abs/2009.00326.
Louros, N., Konstantoulea, K., De Vleeschouwer, M.,
Ramakers, M., Schymkowitz, J., and Rousseau, F.
(2019). WALTZ-DB 2.0: an updated database con-
taining structural information of experimentally deter-
mined amyloid-forming peptides. Nucleic Acids Re-
search, 48(D1):D389–D393.
Rossi, F., van Beek, P., and Walsh, T., editors (2006).
Handbook of Constraint Programming, volume 2 of
Foundations of Artificial Intelligence. Elsevier.
Stützle, T. and Ruiz, R. (2018). Iterated Local Search,
pages 579–605. Springer International Publishing,
Cham.
Tomita, M. (1982). Dynamic construction of finite-state au-
tomata from examples using hill-climbing. Proc. of
the Fourth Annual Conference of the Cognitive Sci-
ence Society, pages 105–108.
Vázquez de Parga, M., García, P., and Ruiz, J. (2006). A
family of algorithms for non deterministic regular lan-
guages inference. In Proc. of CIAA 2006, volume
4094 of LNCS, pages 265–274. Springer.
Wieczorek, W. (2017). Grammatical Inference Algo-
rithms, Routines and Applications, volume 673 of
Studies in Computational Intelligence. Springer.