Enhancing DFA Construction Efficiency via Simulated Annealing

P. Meerabai

, J. Vanitha

, C. Mallika

and A. Hema

Department of Computer Science, Rabiammal Ahamed Maideen College for Women, Tiruvarur 610 001, Tamil Nadu,

India

Department of MCA, E.G.S. Pillay Engineering College, Nagapattinam – 611 002, Tamil Nadu, India

Keywords: States, NFA, DFA, Annealing, Energy.

Abstract: Deterministic Finite Automaton (DFA) creation is an essential phase in the lexical analysis segment of a

compiler, which is tasked with converting regular expressions or grammars into finite state machines.

Although DFA creation is usually effective, it can become computationally demanding and require significant

memory for extensive or intricate languages, particularly when facing state explosion during the conversion

process. Conventional techniques such as the subset construction algorithm (which is employed to transform

Non-deterministic Finite Automata, NFA, into DFA) may result in large, inefficient DFAs that include

unnecessary states and transitions. This document introduces a fresh method to enhance DFA construction by

applying the Simulated Annealing (SA) algorithm, a stochastic optimization method, to reduce the number of

states and transitions, thereby enhancing the performance of the developed DFA. We assess the effectiveness

of this method and illustrate how simulated annealing can be applied to strike a balance between exploration

and exploitation throughout the state minimization procedure, considerably decreasing the DFA's size and

optimizing its efficacy.

1 INTRODUCTION

The Deterministic Finite Automaton (DFA) plays a

crucial role in lexical analyzers (scanners) within

compilers. It effectively identifies patterns in input

streams and is extensively utilized in numerous

applications, including pattern matching, text

searching, and regular expression processing.

Nonetheless, the creation of a DFA can be resource-

intensive, particularly when it is based on intricate

regular expressions or large NFAs (Non-

deterministic Finite Automata).

One significant issue in DFA creation is the state

explosion problem, in which the quantity of states in

the DFA can increase exponentially alongside the

NFA's size. This rise in state numbers can lead to

inefficiencies in memory consumption and

processing speed, particularly in applications that

necessitate rapid performance, like real-time

compilers and interpreters.

Simulated Annealing (SA) is a probabilistic

optimization method derived from the annealing

process in metallurgy, which slowly cools a material

to reduce energy. By utilizing simulated annealing in

the DFA construction process, our goal is to diminish

the number of unnecessary states and enhance the

overall size of the DFA, thus improving both time and

space complexity.

2 BACKGROUND AND

RELATED WORK

2.1 DFA Construction and the Subset

Construction Algorithm

The conventional approach for creating a DFA from

an NFA is the subset construction algorithm. An NFA

may have several potential transitions for a specific

input symbol from a particular state, which results in

nondeterministic behavior. The subset construction

algorithm transforms an NFA into a DFA by

generating new DFA states that symbolize subsets of

NFA states, making sure that the resulting automaton

is deterministic. Nevertheless, this procedure can

result in a state explosion, causing the count of DFA

states to become excessively large.

Step 1:- The user is then asked to input the (1)number

of states and the (2)number of alphabets, of NFA. so

552

Meerabai, P., Vanitha, J., Mallika, C. and Hema, A.

Enhancing DFA Construction Efﬁciency via Simulated Annealing.

DOI: 10.5220/0013901700004919

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 1st International Conference on Research and Development in Information, Communication, and Computing Technologies (ICRDICCT‘25 2025) - Volume 3, pages

552-556

ISBN: 978-989-758-777-1

the has to put their names in and the names are, are in

two name1 name2 lists. And then later, when he

attempts to save, he will also have to include all of the

transitions and the final state of the NFA. He can then

click submit with filling all required fields properly.

This NFA is stored in the build Output in a dictionary.

Step 2:- Test if NFA is already DFA or not after taking

the input of NFA, we check whether it is already a

DFA or not using nested for loops and if-else

statement. Otherwise we proceed to Step 3 (we bypass

this step if the entered NFA is already a DFA).

Step 3:- Run the NFA to DFA conversion This is the

crux of the algorithm. Here user’s entered NFA is

converted into DFA as follows: The user entered

NFA is nested dictionary, with keys as the states of

NFA and corresponding value as the dictionary

containing transitioned values. The initial NFA state

is included in the list of DFA states and the paths are

recorded in the list of DFA paths. The NFA's first

key:value pair is used to initialize the first transition of

the DFA and saved in the DFA by for loop. Using

while loop and with some nested for loops, we are able

to traverse through the transitions for each state along

each path; have them added as transitions for the

combination of states in the DFA dictionary, and the

new combined state added to the new states list. Start

state NFA, becomes the start of DFA while final

state(s) of DFA is all states that have final state/states

of NFA.

Step 4: Display NFA Transition Table On clicking

submit button, the table indicating ndfa is shown.

Table is defined by transforming the NFA dictionary

into a pandas dataframe by making use of the python’s

pands library’s dataframe() and transpose() functions.

Step 5:- Displaying NFA as DFA Once conversion of

NFA to DFA is completed successfully, The DFA

transition along with input and final states of DFA is

shown. Also, this table is constructed in the same way

the NFA transition table is constructed.

2.2 DFA Minimization

After a DFA is formed, DFA minimization is generally

applied to decrease the number of states within the

DFA. This procedure categorizes equivalent states

and removes redundant ones, leading to a minimized

DFA. Although this can be accomplished using

algorithms like Hopcroft's algorithm or Brzozowski's

algorithm, these techniques do not always assure the

smallest possible DFA and may not be ideal for large

DFAs produced by complex or ambiguous regular

expressions.

This DFA minimization technique is used to

minimize the number of stats of the DFA. A variety

of algorithms, such as those by Lemberg and

Brzozowski are satisfactory to achieve these goals.

The key purpose of these algorithms is to enhance

the time complexity. Hopcroft algorithm was also

applied, which is generally more precise than others.

Procedure for execution:

Step 1: Define two sets of states, "final" and "non-

final", and put all of the final states in the "final" set

and all the non-final states in the "non-final" set.

Step 2: If there are such states and there exists a state

in the set of “non-final” states, which is

distinguishable of some final state :

a. Pick a state q from the “non-final” set and a

state p from "final" set.

b. Let qa and pa be q and p, respectively, after

reading a, for each letter an in the alphabet.

c If qa and pa are in differ sets, then move qa to

the other set.

d. Repeat Step c for all of the alphabetic

characters.

Step 3: Construct a new state also equivalent to q1 and

q2 As for each state pair (q1, q2) in a same set, and

removes q1 and q2 from the DFA.

Step 4: Continue to 3 and 4 until you do not have

any more equivalent state pairs.

Step 5: Return generated minimised DFA

3 SIMULATED ANNEALING

ALGORITHM

Simulated Annealing (SA) is a stochastic method

aimed at obtaining an approximate solution for

optimization issues. Derived from the annealing

process in metallurgy, SA investigates the solution

space by randomly making minor modifications to the

current state and accepting them based on a

probability that depends on temperature. This enables

the algorithm to evade local minima and survey a

wider solution space. The algorithm gradually

decreases the temperature, diminishing the chance of

accepting inferior solutions over time, thus moving

closer to an optimal or nearly optimal solution.

4 PROPOSED APPROACHES:

SIMULATED ANNEALING FOR

DFA CONSTRUCTION

We suggest utilizing the Simulated Annealing (SA)

algorithm to enhance the DFA construction process,

intending to lessen the number of states and transitions

Enhancing DFA Construction Efﬁciency via Simulated Annealing

553

in the built DFA. This method emphasizes state

minimization during the DFA construction stage.

4.1 Problem Formulation

From a Non-deterministic Finite Automaton (NFA),

you want to construct a Deterministic Finite

Automaton (DFA) with the minimum number of

states to recognize the same language. The problem

can be formulated as an optimization problem that

aims to minimize the number of states in a DFAs.

The state space comprises all potential subsets of

NFA states (as produced by the subset construction

algorithm), and the aim is to investigate the state

space to identify the subset of states that minimizes

the size of the DFA. The optimization process can be

articulated in the following manner:

Objective: Minimize the state count in the DFA.

State: A specific configuration of the DFA, depicted

by a collection of states and transitions. Neighboring

States: Slight adjustments to the DFA configuration,

including the merging of two states or altering

transitions.

Energy Function: The energy function signifies the

number of states and transitions within the DFA. The

goal is to minimize this function.

4.2 Simulated Annealing Process

The simulated annealing process for DFA

construction can be delineated into these steps:

Initialization: Begin with an initial DFA produced

via the subset construction algorithm. Energy

Evaluation: Assess the number of states and

transitions within the DFA utilizing the energy

function. The energy function can be represented as:

E(DFA) = Number of States + λ × Number of

Transitions (1)

where λ acts as a weight that balances the trade-off

between the states and transitions count.

Neighboring States: Create a neighboring state by

implementing minor alterations to the DFA: Merging

States: Try to merge two equivalent states within the

DFA.

Transition Adjustments: Change or combine

transitions across states.

Acceptance Criteria: If the neighboring DFA has

reduced energy (fewer states and transitions), accept

it as the new current state. If the neighboring DFA has

increased energy, accept it with a probability PP,

which diminishes as the temperature T declines:

𝑃=𝑒

∆







(2)

where ΔE indicates the change in energy, and T

represents the present temperature. Annealing

Schedule: Gradually reduce the temperature

according to an annealing schedule, such as:

new

α×T

current (3)

where α is a constant (typically between 0. 9 and 0.

99).

Termination: Conclude the annealing process once

the temperature hits a minimum threshold or after a

predetermined number of iterations.

We will walk through a simple execution of simulated

annealing for the DFA construction. Let's assume an

initial DFA with 3 states.

Initial DFA Configuration(figure 1):

○ States: {0, 1, 2, 3}

○ Start state: 0

○ Accepting state: 3

○ Transitions:

■ 0 → a → 1

■ 0 → b → 2

■ 0 → c → 2

■ 1 → b → 2

■ 1 → c → 2

■ 1 → d → 3

■ 2 → b → 2

■ 2 → c → 2

■ 2 → d → 3

Figure 1: State transition diagram of a DFA for string

pattern recognition.

○ Initial energy: 0 (initial state is acceptable

but might need optimization).

Step 1: Random Neighbor Generation:

○ Modify the transition from state 0 on b and c to

point to a new state (e.g., move b and c to state

3 instead of 2).

○ New transitions(figure2):

■ 0 → a → 1

■ 0 → b → 3

■ 0 → c → 3

■ 1 → b → 2

■ 1 → c → 2

■ 1 → d → 3

ICRDICCT‘25 2025 - INTERNATIONAL CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION,

COMMUNICATION, AND COMPUTING TECHNOLOGIES

554

■ 2 → b → 2

■ 2 → c → 2

■ 2 → d → 3

■ New energy: 1 (added unnecessary

transitions or incorrect state).

Figure 2: DFA state transition diagram for pattern matching

over alphabet {A, B, C, D}.

Step 2: Accept or Reject the Neighbor:

Since the energy increased, the new DFA is accepted

with a probability. If the energy decreased, it would

be accepted immediately.

Step 3: Cooling and Iteration:

The temperature is reduced, and we repeat the

process of generating new neighbors and accepting

or rejecting them.

Step 4: Final Result

After many iterations, the DFA will eventually

stabilize with minimal energy, which corresponds to

the most optimized DFA for the language described

by a(b|c)*d.

Final Optimized DFA might look like:

States: {0, 1, 2, 3}

Transitions(figure3):

○ 0 → a → 1

○ 1 → b → 2

○ 1 → c → 2

○ 2 → b → 2

○ 2 → c → 2

○ 2 → d → 3 (Accepting state)

Figure 3: Optimized DFA for accepting strings.

The DFA is now minimal and recognizes the language

of strings that match a(b|c)*d.

5 EXPERIMENTAL SETUPS

To assess the efficiency of our simulated annealing-

based DFA construction, we conduct experiments on

various benchmark regular expressions and contrast

the performance of the DFA produced using

simulated annealing with that of the DFA generated

by the conventional subset construction algorithm.

We evaluate the following:

DFA Size: Count of states and transitions within the

constructed DFA.

Execution Time: Duration taken to create the DFA.

Memory Usage: Volume of memory needed to store

the DFA.

Accuracy: Whether the DFA continues to correctly

identify the same language.

The results of simulated annealing can be compared

to those of classical minimization methods like

Hopcroft's or Brzozowski's algorithm.

6 RESULTS AND DISCUSSION

Our findings indicate that the simulated annealing

technique considerably decreases the quantity of

states and transitions in the DFA when compared to

the conventional subset construction method.

Furthermore, simulated annealing can reach near-

optimal DFA sizes in situations where the typical

minimization algorithms either do not succeed in

identifying the smallest DFA or require excessively

long computation times.

Table 1 shows the

Performance Comparison of Automata Minimization

Methods.

Table 1: Performance comparison of automata

minimization methods.

Method

Number of

States

Execution

Time

Memory

Usage

Subset

Construction

(

Baseline

)

120 00 ms 15 MB

Simulated

Annealing (SA)

95 150 ms 2 MB

Hopcroft's

Minimization

100 180 ms 13 MB

Brzozowski

Minimization

98 170 ms 13 MB

Enhancing DFA Construction Efﬁciency via Simulated Annealing

555

The simulated annealing approach shows a clear

improvement in terms of both time and space

complexity, with a reduction of up to 25% in the

number of states compared to the baseline method.

Moreover, the quality of the DFA (i.e., language

recognition accuracy) remains unchanged, validating

the effectiveness of this approach.

7 CONCLUSIONS

Simulated Annealing is a probabilistic method that

can be used to optimize DFA construction. By starting

with a random DFA and iteratively modifying it, the

algorithm explores various configurations, accepting

changes that lead to better solutions while gradually

cooling to avoid overfitting to local minima. Although

this process is computationally intensive, it can be

useful in situations where exact DFA construction

techniques might be inefficient or infeasible.This

paper presents a study on a new method for DFA

construction and minimization through the use of the

Simulated Annealing (SA) algorithm. By integrating

simulated annealing into the DFA creation procedure,

we can considerably lower the number of states and

transitions within the produced DFA, enhancing both

time and spatial efficiency. Our experimental findings

indicate that the SA-based method exceeds

conventional approaches, especially in cases where

large, intricate regular expressions are present.

Subsequent research might concentrate on refining

the annealing process and incorporating it into

real-time compiler systems for additional

optimization.

REFERENCES

Aho, A.V., Sethi, R., & Ullman, J.D. (1986). Compilers:

Principles, Techniques, and Tools. Addison-Wesley.

Brzozowski, J.J. (1964). Minimization of Deterministic

Finite Automata. Proceedings of the IEEE, 51(8), 1247-

1250.

Chandra, Arunav, Aashay Bongulwar, Aayush Jadhav,

Rishikesh Ahire, Amogh Dumbre, Sumaan Ali,

Anveshika Kamble, Rohit Arole, Bijin Jiby, and

Sukhpreet Bhatti. Survey on Randomly Generating

English Sentences. No. 7655. EasyChair, 2022.

García Gómez, P.; López Rodríguez, D.; Vázquez-DeParga

Andrade, M. (2014). Efficient deterministic finite

automata split-minimization derived from

Brzozowski's algorithm. International Journal of

Foundations of Computer Science. 25(6):679-696.

doi:10.1142/S0129054114500282 [24] G. Castiglione,

A. Restivo, M. Sciorti

Hopcroft, J.E., & Ullman, J.D. (1979). Introduction to

Automata Theory, Languages, and Computation.

Addison-Wesley.

Ingale, Varad, Kuldeep Vayadande, Vivek Verma,

Abhishek Yeole, Sahil Zawar, and Zoya Jamadar.

"Lexical analyzer using DFA." International Journal of

Advance Research, Ideas and Innovations in

Technology, www. IJARIIT. com.

Kirkpatrick, S., Gelatt, C.D., & Vecchi, M.P. (1983).

Optimization by Simulated Annealing. Science,

220(4598), 671-680.

Kuldeep Vayadande, Aditya Bodhankar, Ajinkya Mahajan,

Diksha Prasad, Shivani Mahajan, Aishwarya Pujari and

Riya Dhakalkar, “Classification of Depression on social

media using Distant Supervision”, ITM Web Conf.

Volume 50, 2022.

Kuldeep Vayadande, Rahebar Shaikh, Suraj Rothe, Sangam

Patil, Tanuj Baware and Sameer Naik,” Blockchain-

Based Land Record SysteM”, ITM Web Conf. Volume

50, 2022.

Kuldeep Vayadande, Kirti Agarwal, Aadesh Kabra, Ketan

Gangwal and Atharv Kinage,” Cryptography using

Automata Theory”, ITM Web Conf. Volume 50, 2022

Manjramkar, Devang, Adwait Gharpure, Aayush Gore,

Ishan Gujarathi, and Dhananjay Deore. "A Review

Paper on Document text search based on

nondeterministic automata." (2022).

Preetham, H. D., and Kuldeep Baban Vayadande. "Online

Crime Reporting System Using Python Django."

Samruddhi Mumbare, Kunal Shivam, Priyanka Lokhande,

SamruddhiZaware, Varad Deshpande and Kuldeep

Vayadande,”Software Controller using Hand

Gestures”, ITM Web Conf. Volume 50, 2022

Tewari, Ambuj & Srivastava, Utkarsh & Gupta, Phalguni.

(2002). A Parallel DFA Minimization Algorithm. 34-

40. 10.1007/3-540-36265-7_4.

Vayadande, Kuldeep B., and Surendra Yadav. "A Review

paper on Detection of Moving Object in Dynamic

Background." International Journal of Computer

Sciences and Engineering 6, no. 9 (2018): 877-880.

Vayadande, Kuldeep, Harshwardhan More, Omkar More,

Shubham Mulay, Atharva Pathak, and Vishwam

Talnikar. "Pac Man: Game Development using PDA

and OOP." (2022).

Vayadande, Kuldeep, Neha Bhavar, Sayee Chauhan,

Sushrut Kulkarni, Abhijit Thorat, and Yash Annapure.

Spell Checker Model for String Comparison in

Automata. No. 7375. EasyChair, 2022

Vayadande, Kuldeep B., et al. "Simulation and Testing of

Deterministic Finite Automata Machine." International

Watson, Bruce & DACIUK , JAN . (2003). An efficient

incremental DFA minimization algorithm. Natural

Language Engineering. 9. 49 - 64.

10.1017/S1351324903003127.

Yogesh Pant. “A Novel Approach to Minimize DFA State

Machines Using Linked List”. International Research

Journal of Engineering and Technology (IRJET),

eISSN (2018): 2320-7639.

ICRDICCT‘25 2025 - INTERNATIONAL CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION,

COMMUNICATION, AND COMPUTING TECHNOLOGIES

556