Exploring Patterns and Assessing the Security of Pseudorandom Number
Generators with Machine Learning
Sara Boancă
Babeș-Bolyai University, Cluj-Napoca, Romania
Keywords:
Machine Learning, Neural Networks, Pattern Learning, Pseudorandom Number Generators.
Abstract:
In recent years, Machine Learning methods have been employed for testing the security of pseudorandom
number generators. It is considered that successful learning from pseudorandom data implies the existence of
some detectable pattern within it, thus reducing the generator's security. As the number and
complexity of such approaches have grown considerably, the aim of the present paper is to synthesize current
results, discuss perspectives and challenges, and provide relevant guidelines for future study. To the best of our knowledge, this
is the first comprehensive analysis on the current state of the research into the problem of pseudorandomness
exploration by means of Machine Learning.
1 INTRODUCTION
It is known that pseudorandom number generators
(PRNGs) produce numbers that, despite appearing in-
distinguishable from true random numbers and fol-
lowing certain statistical properties, are the result of a
deterministic process. Effort has been invested into
providing some statistical guideline for evaluating
PRNG security by means of statistical test suites such
as NIST (Rukhin et al., 2001), TestU01 (L’Ecuyer and
Simard, 2007) or Dieharder (Robert G. Brown, 2017)
that take into consideration properties like equidistri-
bution, gap and birthday spacings (among many oth-
ers) to determine whether the generated pseudorandom sequence is uniformly distributed or may be
predicted with better than chance accuracy. While such tests have provided tremendous
advances for PRNG evaluation in the past, their ability to uncover deeper patterns is limited. With the advent
of Machine Learning, one may pose the question of
whether more than just statistical properties can be
extracted from PRNG data.
Thus far, Machine Learning approaches have been
used for performing cryptography- and cryptanalysis-related tasks, such as pseudorandom number gener-
ation (Pasqualini and Parton, 2020) and differential
cryptanalysis (Gohr, 2019), bridging the gap between
theoretical algebra and the innovative possibilities of
intelligent algorithms. In the context of PRNG secu-
rity assessment through pattern exploration, Machine
Learning methods have been used sporadically un-
til (Fischer, 2018) formally proposed neural networks
for the task. As a consequence, a new area of research
has emerged and the number of studies on this matter
has significantly increased.
The aim of this paper is to bring to light exist-
ing research on PRNG security exploration by means
of Machine Learning in an attempt to synthesize cur-
rent knowledge and results as well as discuss poten-
tial challenges and opportunities for the future. As
this field is still in its infancy, approaches are in-
cluded based on the novelty and original perspective
they bring to the problem. To the best of our knowl-
edge, this is the first comprehensive study discussing
the current state of the research in Machine Learning
methods applied for PRNG pattern exploration and
security assessment.
The rest of the paper is structured as follows. Sec-
tion 2 discusses research premises in terms of prob-
lem formulation. Section 3 presents the existing ap-
proaches of Machine Learning methods applied for
PRNG security assessment. Section 4 presents synthesized results. Section 5 proposes a discussion on
future work. Section 6 presents the conclusions of the
study.
2 PROBLEM FORMULATION
The present section describes existing formulations
for the problem of learning from pseudorandom data.
Over the years, emerging research in the area of pseu-
dorandom pattern exploration by means of Machine
Learning has been organized in a number of direc-
tions:
- next in sequence prediction;
- entropy estimation;
- inversion;
- sequence randomness classification (strong vs. weak).
Next in sequence prediction may be the most common
formulation for the problem of learning from pseu-
dorandom data. It consists of training classification
or regression models on pseudorandom number se-
quences in order to determine, for a given input se-
quence, which value is most likely to be generated
next. The approach is based on the intuition that weak
generators may exhibit more easily detectable pat-
terns (see randograms for Linear Congruential Gener-
ators (O'Neill, 2014)), thus enabling learning and of-
fering a greater possibility of success when predicting
future outputs. In this case, the strength of the learn-
ing algorithm is measured by its prediction accuracy.
However, despite the apparent simplicity of the task,
the problem of predicting next PRNG values consid-
ering past sequences is a difficult one and in practice
PRNG simplification is often used in order to facilitate
learning.
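As a minimal illustration of this framing (a sketch under our own assumptions, not code from any cited work), the following Python snippet turns a binary PRNG stream into a supervised next-bit dataset via a sliding window; the function name and window size are illustrative choices.

```python
import numpy as np

def make_next_bit_dataset(bits, window=16):
    """Frame a binary PRNG stream as supervised next-in-sequence data:
    each sample holds `window` consecutive bits and its label is the
    bit that immediately follows them."""
    bits = np.asarray(bits, dtype=np.uint8)
    X = np.array([bits[i:i + window] for i in range(len(bits) - window)])
    y = bits[window:]
    return X, y

# A deliberately weak (periodic) stream is easily learnable in this framing.
stream = np.tile([0, 1, 1, 0, 1], 2000)
X, y = make_next_bit_dataset(stream, window=16)
```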
Another emerging direction of research is PRNG
entropy estimation, often measured in terms of the
min-entropy score. Lower entropy estimates indicate
that more information about subsequent outputs can
be extracted from the PRNG. Studies approaching the problem in
this way use a next in sequence prediction compo-
nent to estimate subsequent PRNG values and com-
pute entropy scores based on the obtained results. The
method has gained popularity since its introduction in
the late 2010s and interest in developing complex Ma-
chine Learning models for PRNG entropy estimation
is increasing.
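A simplified sketch of the underlying computation follows (our own illustrative reduction: a predictor that guesses the next symbol with probability p implies at most -log2(p) bits of min-entropy per symbol; the cited works additionally apply confidence bounds over the observed accuracy).

```python
import math

def min_entropy_per_symbol(pred_accuracy, alphabet_size=2):
    """A predictor guessing the next symbol with probability p implies
    at most -log2(p) bits of min-entropy per symbol. Accuracy is floored
    at chance level so a poor predictor cannot report more entropy than
    a uniform source provides."""
    p = max(pred_accuracy, 1.0 / alphabet_size)
    return -math.log2(p)

print(min_entropy_per_symbol(0.5))   # 1.0 bit: indistinguishable from a fair coin
print(min_entropy_per_symbol(0.95))  # ~0.074 bits: heavily predictable source
```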
PRNG inversion by means of Machine Learning is
an interesting, yet underdeveloped direction in PRNG
security assessment which involves training models
to learn from PRNG data to the point of determining
their hidden parameter configuration (such as seed,
modulus, tap positions or feedback polynomial for
LFSRs). Due to its difficulty, there are few attempts
to approach the problem in this way, yet some of the
cited studies may be considered to have (at least par-
tially) achieved it.
Another under-examined direction is the one of
sequence randomness classification, which involves
training Machine Learning models to differentiate be-
tween strong and weak pseudorandom sequences. A
strong pseudorandom sequence exhibits high unpre-
dictability, namely it is difficult to find correlations
within its data and consequently, to learn from it,
while in the case of a weak sequence, such correla-
tions are more apparent. The problem is formulated in
(Savicky and Robnik-Šikonja, 2008) and its difficulty
is acknowledged by the authors. Like in the case of
other PRNG security assessment approaches, no security proof can be issued for the sequences that pass
such tests, but security may be ruled out for those
that fail.
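As an illustration of how such labels can be produced (our own sketch; the cited studies use full statistical test batteries rather than this single test), the snippet below labels a bit sequence using the NIST frequency (monobit) test.

```python
import math

def monobit_pvalue(bits):
    """NIST frequency (monobit) test: p-value for the hypothesis that
    ones and zeros are balanced in the sequence."""
    n = len(bits)
    s = sum(1 if b else -1 for b in bits)
    return math.erfc(abs(s) / math.sqrt(n) / math.sqrt(2))

def label_sequence(bits, alpha=0.01):
    """Toy labeling rule: a sequence failing the test is labeled 'weak'."""
    return "weak" if monobit_pvalue(bits) < alpha else "strong"

print(label_sequence([0, 1] * 500))  # balanced: labeled 'strong'
print(label_sequence([1] * 1000))    # all ones: labeled 'weak'
```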
The aforementioned problem formulations have
been introduced to provide improved understanding
of the task in relation to the presented approaches.
3 RELATED WORK
The current section organizes existing research in the
field of pseudorandomness exploration and security
assessment by means of Machine Learning. The
speed with which interest in this field has increased,
as well as the growing sophistication of the techniques used,
is highlighted by the chronological presentation of
the studied approaches.
Early attempts at PRNG prediction with Machine
Learning revolve around Decision Trees.
The authors of (Kant and Khan, 2006) predict sub-
sequent bits from a Fibonacci Linear Feedback Shift
Register (LFSR). Given an initial state, the LFSR con-
structs a pseudorandom sequence by performing the
exclusive or (XOR) operation on bits located at tap
positions. The tap positions of the LFSR can be de-
scribed by a polynomial modulo 2. The exponents of
the polynomial denote the positions of the taps to be
XOR-ed within the LFSR. The authors frame the next
in sequence prediction task as a classification problem
and attempt to solve it through the C4.5 algorithm. It
is observed that a number of bits equal to the degree of
the polynomial is needed as input size in order to out-
perform random guessing. The model reaches 100%
classification accuracy. The authors introduce a Bit
Prediction Ratio formula that describes the number of
input bits necessary for predicting the subsequent one,
which they find increases with the number of taps.
Moreover, the rules generated by the C4.5 algorithm
can be used to infer the initial feedback polynomial,
which may be considered a first attempt at PRNG in-
version.
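For concreteness, a minimal Fibonacci LFSR of the kind targeted above can be sketched as follows (our own illustrative implementation; tap numbering here follows the connection-polynomial convention, and conventions vary across the literature).

```python
def fibonacci_lfsr(seed, taps, n):
    """Minimal Fibonacci LFSR. `seed` lists the register contents with
    seed[0] the most recently shifted-in bit; tap t reads register cell t
    (1-indexed). taps=(4, 3) matches the connection polynomial
    x^4 + x^3 + 1, which is primitive, so any nonzero seed yields the
    maximal period 2^4 - 1 = 15."""
    state = list(seed)
    out = []
    for _ in range(n):
        out.append(state[-1])          # emit the oldest bit
        fb = 0
        for t in taps:                 # XOR the bits at the tap cells
            fb ^= state[t - 1]
        state = [fb] + state[:-1]      # shift, insert feedback bit
    return out

print(fibonacci_lfsr([1, 0, 0, 1], taps=(4, 3), n=15))
```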
The work of (Savicky and Robnik-Šikonja, 2008)
explores patterns in a widely used PRNG, namely
Matlab rand() in the case the state method is used
(denoted rand state). The authors propose a Random Forest (RF) based framework and formulate four
different approaches for PRNG learning: two involving classification (with different thresholdings), one involving regression and one involving fea-
ture importance evaluation. Comparisons are per-
formed with PRNGs such as MINSTD, ANSIC, the
rand() method in the Microsoft C compiler (denoted
as ms rand) and RANDU (with odd seeds), as well
as modern long-period generators such as
MRG32k5a and WELL19937a, which serve as an
upper bound for security comparison. Statistical tests
lead to the rejection of the hypothesis according to
which rand state and RANDU values are indepen-
dent and identically distributed. For regression, it is
considered that if the ratio between the current error
and the average error of the test set is smaller than 1,
the model has successfully learned an approximation.
Root Mean Squared Error (RMSE) is used for mea-
surement. Both rand state and RANDU (with even
and odd seeds) fail, while the rest of the generators
pass. Dependencies are detected in all PRNGs, but
are strongest in rand state and RANDU.
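One plausible reading of the regression criterion above can be sketched as follows (our interpretation, not the authors' code): the model's RMSE is compared against the error of a trivial baseline that always predicts the test-set mean.

```python
import numpy as np

def learned_approximation(y_true, y_pred):
    """Success criterion: the model's RMSE must be smaller than the RMSE
    of a trivial baseline that always predicts the test-set mean, i.e.
    the error ratio must fall below 1."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse_model = np.sqrt(np.mean((y_true - y_pred) ** 2))
    rmse_baseline = np.sqrt(np.mean((y_true - y_true.mean()) ** 2))
    return rmse_model / rmse_baseline < 1.0
```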
The work of (Kant et al., 2009) expands on (Kant
and Khan, 2006) by employing a variety of Machine
Learning techniques, namely Naive Bayes (NB), Av-
erage One Dependence Estimators (AODE), C4.5 and
Artificial Neural Networks (ANNs), to predict future
bits for an Alternating Step and a Geffe LFSR as
well as for a number of keystreams in the eSTREAM
project. It is observed that the performance of NB is
insufficient for the given task and the ANN needs a
large number of instances for training. For the Al-
ternating Step and the Geffe LFSR, C4.5 and AODE
obtain over 90% accuracy, with C4.5 performing bet-
ter. It is important to note that previous experiments
(Kant and Khan, 2006) were unable to attain high pre-
diction accuracy for the Geffe LFSR. The authors ob-
serve that the number of bits needed for prediction
increases with the degree of the polynomial and the
sparsity of the taps. Studied keystreams could not be
predicted with more than chance accuracy.
Following these initial approaches, the literature
has seen a shift in focus as approaches began to lean
towards the increasingly popular neural networks.
The work of (Hashim and Abdulhussien, 2015)
describes an attempt at strong vs. weak PRNG clas-
sification with ANNs. While such classification is
deemed feasible, the approach involves rather few
pseudorandom number sequences. Moreover, since
class labels used in training are obtained through sta-
tistical test evaluation, one may say the neural network's capacity for learning is reduced to the baseline
of statistical tests. The approach was included, how-
ever, on the basis of its originality and for being, to the
best of our knowledge, the first attempt at strong vs.
weak PRNG classification.
The work of (Fan and Wang, 2018) studies the
ability of an ANN to find correlations in natural se-
quences such as π, √2 and e, and the MT PRNG. Natural sequences are binarized through thresholding.
The authors perform statistical testing on the obtained
results and claim enough evidence exists for consid-
ering that thresholded π digits present a subtle corre-
lation that can be learned by means of an ANN. The
approach is extended to a Linear Congruential Gener-
ator (LCG) and a Quantum Random Number Gener-
ator (QRNG) in the work of (Feng and Hao, 2020). It
is observed that slightly more predictable results are
obtained for the LCG, while insufficient information
can be extracted from other generators.
The study of (Fischer, 2018) is considered an in-
flection point in the literature since PRNGs to be
evaluated are extended to include cryptographically
secure PRNGs and learning is performed by ad-
vanced Machine Learning techniques such as Long
Short-Term Memory Networks (LSTMs), Gated Re-
current Units (GRUs) and their bidirectional varia-
tions (BiLSTM, BiGRU). The studied generators are
Debian rand, Python random.randint, arc4random,
SHA1PRNG and /dev/urandom (with arc4random
and SHA1PRNG passing the Dieharder test suite).
Training involves a simple data mean fitting step fol-
lowed by a more complex estimation step based on
the network's ability to learn variation. The Mean
Absolute Error (MAE) score is obtained and an evalu-
ation framework based on multiple indicators is built.
The first indicator is the computed mean. If the MAE
score is lower than the expected average, learning is
considered successful. Due to model construction, it
is believed that a good model provides low MAE lev-
els and a good variance score in order to prove the
model did not simply learn the mean of the training
set. Thus, the second indicator is the obtained vari-
ance. If the variance is too small, the model has
likely collapsed towards predicting the mean. The authors describe some
variance bounds within which learning is deemed suc-
cessful. The third indicator is p-value testing. Statis-
tical tests are performed for p-values < 0.01 in order to reject the null hypothesis that the data are
independent and identically distributed. If one
indicator raises concerns, the result is unknown. If
more indicators suggest the model has learned from
the PRNG sequence, the result is fail. The tests pass
only if no indicators are able to detect issues.
Following this framework, vulnerabilities are dis-
covered in all studied PRNGs, with higher degree of
insecurity for rand and random.randint. The LSTM
model is deemed the most suitable for the task.
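The verdict logic of this framework can be summarized in the following sketch (our own condensation; the actual variance bounds and thresholds are defined in the original work and the values here are placeholders).

```python
def prng_verdict(mae, expected_error, variance, var_lo, var_hi, p_value):
    """Condensed three-indicator verdict. Each indicator fires when it
    suggests the model has learned from the PRNG sequence."""
    indicators = [
        mae < expected_error,          # 1: error beats the expected average
        var_lo <= variance <= var_hi,  # 2: variance plausible (not a mean fit)
        p_value < 0.01,                # 3: i.i.d. null hypothesis rejected
    ]
    fired = sum(indicators)
    if fired == 0:
        return "pass"      # no indicator detects an issue
    if fired == 1:
        return "unknown"   # a single indicator only raises concern
    return "fail"          # several indicators agree: patterns were learned
```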
The paper (Truong et al., 2018) describes an at-
tempt to evaluate the security of a Quantum Random
Number Generator (QRNG) by means of min-entropy
estimation and, in the process, provides an evaluation
for a LCG as well. The authors use a hybrid model
consisting of Convolutional and LSTM layers (CNN-
LSTM). The problem is approached as a classification
in terms of the output layer, where classes represent
the corresponding bits of the following number. The
authors observe that up to a certain period the model
displays better than random performance, but the ac-
curacy declines as the period is increased beyond that
threshold.
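A hypothetical model of this shape can be sketched with TensorFlow/Keras (an assumption of ours; the paper's exact architecture and hyperparameters are not reproduced here, and the layer sizes below are illustrative guesses). Each sigmoid output unit corresponds to one bit of the next number, making the task a multi-label classification.

```python
import tensorflow as tf

def build_cnn_lstm(window, n_bits):
    """Hybrid convolutional-recurrent model with one sigmoid output unit
    per bit of the next number (multi-label classification)."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(window, 1)),
        tf.keras.layers.Conv1D(32, kernel_size=3, activation="relu"),
        tf.keras.layers.MaxPooling1D(2),
        tf.keras.layers.LSTM(64),
        tf.keras.layers.Dense(n_bits, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["binary_accuracy"])
    return model

model = build_cnn_lstm(window=64, n_bits=8)  # predict the 8 bits of the next byte
```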
The work of (Lv et al., 2020) proposes a Recurrent
Neural Network (RNN) and an ANN for the task of
min-entropy estimation. This comes as a refinement
to the traditional approach (Kelsey et al., 2015) that,
due to its design, is prone to underestimation. The pro-
posed estimators are used to evaluate a number of se-
quences, including pseudorandom sequences such as
those from random.org, Ubld.it, linux kernel entropy
source, linux /dev/urandom and Windows random bit
generator with Crypto API. The approach consists of
evaluating network performance while still updating
model parameters. When compared to the traditional
NIST package evaluation, the results obtained by both
the RNN and the ANN are considered better. Among
the two, RNNs prove superior excepting the linux ker-
nel entropy source, where ANNs provide better esti-
mation. Because the generator draws its entropy
from the repetitive patterns of user interaction with
the operating system, and given the nature of ANN
learning, the authors conclude that periodicity is
introduced into the random number generator.
The authors of (Li et al., 2020) propose a LSTM
model with Temporal Pattern Attention (TPA) to learn
from a QRNG and a LCG. The problem is approached
as in (Truong et al., 2018), with a final classification
step for the bits of the following number. Compar-
isons with models similar to those in (Truong et al.,
2018) and (Lv et al., 2020) are provided and the re-
sults are evaluated with respect to the length of the
input sequence as well. It is observed that both the
ANN and the TPA-LSTM are able to detect patterns
for sequence lengths as small as 3.2 × 10^6. While in
this case the ability of the ANN is significantly higher
than that of the TPA-LSTM, as the sequence grows to
8 × 10^6, the use of attention is considered influential
in elevating the TPA-LSTM model performance to an
accuracy of over 95%.
The work of (John Labelle, 2020) explores the
deterministic nature of PRNGs and attempts to de-
termine subsequent values generated by the Xor-
shift128 PRNG using an LSTM. While the developed
model exceeds 95% bitwise accuracy, (Mostafa Has-
san, 2021) observes that using an ANN for the task
can significantly reduce the number of model param-
eters while maintaining (and even increasing) the ac-
curacy. Thus, an ANN is trained and achieves 100%
bitwise accuracy. Moreover, the author displays the
values of the learned weights, showing that the model
was able to capture essential information in the data in
what may be considered an early stage of an inversion
attempt. These approaches are considered noteworthy
as they discuss the trade-off between model complex-
ity, accuracy and domain knowledge in the context of
PRNG learning.
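For reference, xorshift128 itself is only a handful of XORs and shifts (a standard transcription of Marsaglia's generator; the seeding below is arbitrary). Every output bit is a fixed XOR of state bits, i.e., the update is linear over GF(2), which helps explain why a small ANN can recover it exactly.

```python
def xorshift128(x, y, z, w):
    """Marsaglia's xorshift128 with the standard (11, 8, 19) shifts.
    Every output bit is a fixed XOR of state bits, so the whole update
    is linear over GF(2)."""
    mask = 0xFFFFFFFF
    while True:
        t = (x ^ (x << 11)) & mask
        t ^= t >> 8
        x, y, z = y, z, w
        w = (w ^ (w >> 19)) ^ t
        yield w

gen = xorshift128(1, 2, 3, 4)  # arbitrary nonzero seeding
print([next(gen) for _ in range(4)])
```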
The work of (Gupta et al., 2021) addresses the
same problem as (Kant and Khan, 2006) for the Fi-
bonacci LFSR. The authors attempt to decrease the
number of bits the traditional Berlekamp-Massey
(BM) algorithm needs to reconstruct the LFSR, then train an
ANN on the reduced dataset to perform next in se-
quence prediction. This is possible through the in-
troduction of two pattern generation algorithms. The
problem is again framed as a classification task.
Training is first conducted in order to find the mini-
mum number of bits needed to obtain sufficient accu-
racy by gradually decreasing this number and evalu-
ating model performance. As the lower bound on the
number of bits is established, the value is retained and
the model is retrained with this configuration. Thus,
the authors succeed in finding a new lower bound for
the number of bits needed for learning to make accu-
rate next in sequence predictions for the LFSR.
The work of (Kim and Kim, 2021) performs a the-
oretical analysis on the relation between the ANN in-
put size and tap positions of Fibonacci LFSRs in the
context of learning from PRNG data, validating their
results in practice via a number of simulations. After
training on the proposed configuration, model insight
is provided through weight visualization, which may
be considered evidence of the model's capacity for
inversion.
The paper (Amigo et al., 2021) approaches the
problem of predicting subsequent values for some
LCG variations using deep ANNs. The task is framed
as a regression, where the next in sequence prediction
problem is modeled in a Markovian way (considering
only one previous state) and is reduced to the approx-
imation of a piecewise continuous function with a fi-
nite number of discontinuities. The authors success-
fully train the models and compute the Mean Squared
Error (MSE) for evaluation. Errors in approximation
are found to occur in very few cases, for points close
to the discontinuity edge. This is assumed to happen
because of the nature of the training process where
only one previously generated seed is used and con-
secutive values near discontinuity points lead to pre-
diction results at opposite sides of the codomain. The
approach is claimed to be the first to successfully use
neural networks for estimating piecewise continuous
functions.
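The regression target can be made concrete as follows (our own sketch with toy constants, not those of the cited study): normalizing an LCG to [0, 1) turns its transition into a sawtooth-shaped piecewise linear function whose discontinuities are the branch edges mentioned above.

```python
import numpy as np

def lcg_map(u, a=25, c=11, m=256):
    """Normalized LCG transition on [0, 1): u -> ((a*(u*m) + c) mod m) / m.
    The map is piecewise linear with `a` branches; its jumps are the
    discontinuity edges where prediction errors concentrate."""
    return ((a * (u * m) + c) % m) / m

u = np.linspace(0.0, 1.0, 1000, endpoint=False)
v = lcg_map(u)  # sawtooth-shaped regression target for the ANN
```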
4 EVALUATION AND RESULTS
The present section expands on the evaluation meth-
ods and results obtained in the studied approaches.
It is important to note the difficulty of training
Machine Learning models to learn from PRNG data.
There have been cases when certain approaches pro-
duced no results (Zanin, 2022). For this reason, any
attempt that manages to extract information from such
data may be deemed, at least to some degree, success-
ful. Section 3 describes a number of approaches for
performing this admittedly difficult task. It is appar-
ent that the focus of the scientific community has been
on framing it as either a next in sequence prediction
or an entropy estimation problem. While the next in
sequence prediction approaches are more numerous,
those concerning entropy estimation tend to employ
increasingly complex models.
Next in sequence prediction is generally per-
formed through an initial simplification (binariza-
tion) of the sequences and consequently regarded as
a binary classification problem (Savicky and Robnik-Šikonja, 2008), (Fan and Wang, 2018), (Feng and
Hao, 2020). Attempts at performing regression on
real valued inputs have been successful, but have re-
quired more complex architectures as well as heavier
preprocessing (Amigo et al., 2021).
The computational techniques used range from
Decision Trees in early approaches to more sophis-
ticated neural network models (ANNs, LSTMs) and
the majority focus on PRNGs based on shift registers
or LCGs, while there are some that evaluate PRNGs
that come with a security claim as well. The generic
approach involves computing the next value of the
sequence and retaining the corresponding prediction
accuracy that would, in the end, be subject to a sta-
tistical test (Savicky and Robnik-Šikonja, 2008), (Fan
and Wang, 2018), (Feng and Hao, 2020). However,
in some cases the values may be considered in en-
sembles of indicators to obtain the final verdict on the
PRNG security (Fischer, 2018).
It is observed that ANNs are preferred for learning
on LFSRs and produce better results than their recur-
rent variations.
For regression tasks, metrics such as MSE (Amigo
et al., 2021) or RMSE (Savicky and Robnik-Šikonja,
2008) are used to evaluate the accuracy of the predic-
tion.
Regarding the formulation of the problem as an
entropy estimation task, the evaluated generators
range from simple LCGs to some that include quantum noise components. Lower estimated entropy
scores signify larger amounts of learned information. For this method, all employed models
rely on neural networks and comparisons may be per-
formed not only against a baseline distribution, but
[Figure 1: Machine Learning techniques used in the corresponding studies.]
Table 1: Overview of the studied Machine Learning approaches for PRNG exploration.

Work | Technique | PRNG | Training dataset size | Results
--- | --- | --- | --- | ---
(Kant and Khan, 2006) | C4.5 | Fibonacci LFSR, Geffe LFSR | 10^5 | 100% prediction accuracy, polynomial inference
(Savicky and Robnik-Šikonja, 2008) | RF | rand state, ms rand, RANDU, MINSTD, ANSIC, MRG32k5a, WELL19937a | 10^4 | dependencies in all but MRG32k5a and WELL19937a, worst in rand state
(Kant et al., 2009) | C4.5, NB (no results), AODE, ANN | alternating step LFSR, Geffe LFSR, eSTREAM | 90% of period, 10^6 for eSTREAM | up to 100% prediction accuracy for LFSRs
(Hashim and Abdulhussien, 2015) | ANN | random bit generator | 10^3 | low MSE score
(Fan and Wang, 2018) | ANN | thresholded natural sequences, MT, QRNG | 10^4 | higher prediction accuracy for π
(Feng and Hao, 2020) | ANN | thresholded natural sequences, MT, LCG, QRNG | 10^6 | higher prediction accuracy for π, LCG
(Fischer, 2018) | LSTM, GRU, BiLSTM, BiGRU | random.randint, Debian rand(), arc4rand, SHA1PRNG, /dev/urandom | 10^7 | vulnerabilities in all but /dev/urandom
(Truong et al., 2018) | CNN-LSTM | LCG, QRNG | 10^6 | better accuracy for LCG for smaller periods
(Lv et al., 2020) | ANN, RNN | random.org, Ubld.it, linux kernel entropy source, /dev/urandom, Windows RNG | 10^6 to 10^8 | entropy evaluation, possible periodicity in /dev/urandom
(Li et al., 2020) | CNN-LSTM with TPA | LCG, QRNG | 10^8 | high entropy for LCG for smaller periods
(John Labelle, 2020) | LSTM | Xorshift128 | 10^6 | over 95% bitwise accuracy
(Mostafa Hassan, 2021) | ANN | Xorshift128 | 10^6 | 100% bitwise accuracy
(Gupta et al., 2021) | ANN | Fibonacci LFSR | 7 to 45% less than BM | up to 100% prediction accuracy
(Kim and Kim, 2021) | ANN | Fibonacci LFSR | 10^4 | 0 Bit Error Rate value
(Amigo et al., 2021) | Deep ANN | variations of a LCG | 10^6 | good model fit, low MSE score
also against similar approaches. The work of (Li
et al., 2020) evaluates model performance with re-
spect to various data sizes as well. Moreover, it pro-
vides a comprehensive comparison to min-entropy
scores obtained by network architectures similar to
those in (Truong et al., 2018) and (Lv et al., 2020).
In terms of training, the above described ap-
proaches use datasets of size ranging from 10^3 to
10^5 in early studies up to 10^8 in the later ones. The train-
ing time varies from hours to weeks, being generally
performed on GPUs.
Table 1 provides a synthesized overview of exist-
ing approaches. It covers computational techniques,
PRNGs, dataset size and obtained results for each of
the discussed works.
Figure 1 displays a distribution of the Machine
Learning techniques used in the included studies.
5 DISCUSSION AND FUTURE
WORK
The present section expands on some of the discussed
approaches and presents a number of directions for
future study.
The task of training Machine Learning models for
PRNG pattern exploration and security assessment
is undoubtedly difficult, hence the small number of
studies performed on the subject in the past decades.
Recent advances in Machine Learning have prompted
growing interest in performing such experiments as
they reveal not only the complexity of PRNGs, but
also that of the models employed for the learning task.
For this reason, study in the direction of Machine
Learning analysis of loosely correlated data such as
that of PRNGs may be regarded as a means of test-
ing (and expanding) the frontier of how much infor-
mation Machine Learning models are capable of extracting. It may be considered that future studies performed in this direction will have an enormous impact when their results are extended to other problem
domains that likewise deal with subtle correlations
(Fan and Wang, 2018). To some extent, this was al-
ready achieved in the case of (Kim and Kim, 2021),
where the actual application domain of the study was
intended to be optical communication systems.
Interest has also been shown in employing current
PRNG pattern exploration models to verify quantum
random number generators.
In terms of current results, the studied approaches
involve a rather small number of generators mostly
belonging to classes of weak PRNGs (Shift Registers,
LCGs), yet interesting learning tasks have been for-
mulated for them. The most widely used approach is
that of next in sequence prediction, while entropy es-
timation problems have sparked growing interest by
capitalizing on the use of increasingly complex mod-
els. Few studies have so far involved inversion and
strong vs. weak PRNG classification. Despite the
widespread tendency of leaning towards “one-size-
fits-all” black box models, given the difficulty of the
problem, one may at this stage benefit from devel-
oping tailored models with domain knowledge, tar-
geting some precise PRNG. When such approaches
are applied, especially in the case of stronger PRNGs,
generator simplification may be required to perform
learning.
It is believed that successful models can be devel-
oped using PRNG simplification and domain knowl-
edge to capture the underlying deterministic pro-
cesses of stronger generators. It may be assumed that
once captured, such deterministic rules can be learned
to perfection. It would be interesting to perform such
attempts on stronger PRNGs with next in sequence
prediction and inversion tasks, especially since the
latter has been under-examined until this point.
Different robust neural network models such as
Generative Adversarial Networks (GANs) and Trans-
formers, as well as other hybrid model frameworks
may be evaluated with respect to this task.
Despite important progress, the time when Ma-
chine Learning testing will be performed instead of
(or complementary to) traditional statistical test suite
evaluation may still be far. The main reason for this is
the complexity in the design of PRNGs when regarded
from the perspective of Machine Learning methods. Another
important issue is the high training time, which may
render implementation impractical. However, with the
advent of GPU and TPU computation this drawback
is expected to be mitigated in the future.
It is important to acknowledge the progress at-
tained through the use of Machine Learning for
PRNG pattern exploration and security evaluation in
the past decades. The number and complexity of ap-
proaches to tackle this task is rapidly increasing as
is the interest in this emerging niche. While current
domain applicability of the developed solutions may
be reduced to theoretical exploration, there is reason
to believe that in the future, knowledge derived from
these studies may actually be transferred to tackle
more practical problems.
6 CONCLUSIONS
The present study discusses Machine Learning meth-
ods in the context of learning from PRNG data. While
existing approaches are able to perform satisfacto-
rily in the case of this admittedly difficult task, im-
proved computational approaches are proposed and
may be used in the future. As this field is still in
its infancy, more complex studies are expected to
be performed covering a wider range of PRNGs and
Machine Learning models. The present work is, to
the best of our knowledge, the first to highlight and
synthesize existing results on PRNG exploration by
means of Machine Learning in an attempt to organize
knowledge and popularize this emerging niche.
REFERENCES
Amigo, G., Dong, L., and Marks II, R. J. (2021). Forecast-
ing pseudo random numbers using deep learning. In
2021 15th International Conference on Signal Pro-
cessing and Communication Systems (ICSPCS), pages
1–7. IEEE.
Fan, F. and Wang, G. (2018). Learning from pseudo-
randomness with an artificial neural network – does
God play pseudo-dice? IEEE Access, 6:22987–22992.
Feng, Y. and Hao, L. (2020). Testing randomness using
artificial neural network. IEEE Access, 8:163685–
163693.
Fischer, T. (2018). Testing cryptographically secure pseudo
random number generators with artificial neural net-
works. In 2018 17th IEEE International Confer-
ence On Trust, Security And Privacy In Computing
And Communications/12th IEEE International Con-
ference On Big Data Science And Engineering (Trust-
Com/BigDataSE), pages 1214–1223. IEEE.
Gohr, A. (2019). Improving attacks on round-reduced
speck32/64 using deep learning. In Advances in
Cryptology–CRYPTO 2019: 39th Annual Interna-
tional Cryptology Conference, Santa Barbara, CA,
USA, August 18–22, 2019, Proceedings, Part II 39,
pages 150–179. Springer.
Gupta, S., Singh, P., Shrotriya, N., and Baweja, T. (2021).
Lfsr next bit prediction through deep learning. Journal
of Informatics Electrical and Electronics Engineering
(JIEEE), 2(2):1–9.
Hashim, K. M. and Abdulhussien, W. R. (2015). Binary
sequences randomness test using neural networks.
John Labelle (2020). Everyone Talks About Insecure Randomness, But Nobody Does Anything About It. https://www.airza.net/2020/11/09/everyone-talks-about-insecure-randomness-but-nobody-does-anything-about-it.html. Online; accessed 21 November 2022.
Kant, S. and Khan, S. S. (2006). Analyzing a class of
pseudo-random bit generator through inductive ma-
chine learning paradigm. Intelligent Data Analysis,
10(6):539–554.
Kant, S., Kumar, N., Gupta, S., Singhal, A., and Dhasmana,
R. (2009). Impact of machine learning algorithms
on analysis of stream ciphers. In 2009 Proceeding
of international conference on methods and models in
computer science (ICM2CS), pages 251–258. IEEE.
Kelsey, J., McKay, K. A., and Sönmez Turan, M. (2015).
Predictive models for min-entropy estimation. In
Cryptographic Hardware and Embedded Systems–
CHES 2015: 17th International Workshop, Saint-
Malo, France, September 13-16, 2015, Proceedings
17, pages 373–392. Springer.
Kim, J. and Kim, H. (2021). Length of pseudorandom bi-
nary sequence required to train artificial neural net-
work without overfitting. IEEE Access, 9:125358–
125365.
L'Ecuyer, P. and Simard, R. (2007). TestU01: A C library for empirical testing of random number generators. ACM Transactions on Mathematical Software
(TOMS), 33(4):1–40.
Li, C., Zhang, J., Sang, L., Gong, L., Wang, L., Wang,
A., and Wang, Y. (2020). Deep learning-based secu-
rity verification for a random number generator using
white chaos. Entropy, 22(10):1134.
Lv, N., Chen, T., Zhu, S., Yang, J., Ma, Y., Jing, J., and Lin,
J. (2020). High-efficiency min-entropy estimation
based on neural network for random number genera-
tors. Security and Communication Networks, 2020:1–
18.
Mostafa Hassan (2021). Cracking Random Number Generators using Machine Learning – Part 1: xorshift128. https://research.nccgroup.com/2021/10/15/cracking-random-number-generators-using-machine-learning-part-1-xorshift128/. Online; accessed 21 November 2022.
O'Neill, M. E. (2014). PCG: A family of simple fast
space-efficient statistically good algorithms for random number generation. ACM Transactions on Mathematical Software.
Pasqualini, L. and Parton, M. (2020). Pseudo random num-
ber generation: A reinforcement learning approach.
Procedia Computer Science, 170:1122–1127.
Robert G. Brown (2017). Dieharder, A Random Number Test Suite. http://webhome.phy.duke.edu/~rgb/General/dieharder.php. Online; accessed 4 October 2022.
Rukhin, A., Soto, J., Nechvatal, J., Smid, M., and Barker, E.
(2001). A statistical test suite for random and pseudorandom number generators for cryptographic applications. Technical report, Booz-Allen and Hamilton Inc., McLean, VA.
Savicky, P. and Robnik-Šikonja, M. (2008). Learning random numbers: A Matlab anomaly. Applied Artificial
Intelligence, 22(3):254–265.
Truong, N. D., Haw, J. Y., Assad, S. M., Lam, P. K., and
Kavehei, O. (2018). Machine learning cryptanaly-
sis of a quantum random number generator. IEEE
Transactions on Information Forensics and Security,
14(2):403–414.
Zanin, M. (2022). Can deep learning distinguish chaos from
noise? numerical experiments and general considera-
tions. Communications in Nonlinear Science and Nu-
merical Simulation, 114:106708.