
their pursuit to combat discrimination, achieve finan-
cial stability, and enhance social mobility for them-
selves and their children, they have relinquished their
languages and traditions. However, they are unaware
that they risk losing their identity if they can no
longer communicate in their native tongue. In this
situation, it has become crucial to preserve and
strengthen their root language, Irula, from both a
cultural and a computational standpoint. Irula poses a
significant challenge because no digital corpus or
documentation exists for it. The objective of this
project is to create computational tools for Irula that
will advance the field of low-resource language
processing and help preserve the language.
Proto-Dravidian is the linguistic reconstruction of
the common ancestor of the Dravidian languages
indigenous to the Indian subcontinent. Its descendants
include Irula, Tamil, and Malayalam. Analyzing these
three languages reveals clear commonalities in syntax,
semantics, and lexicon; the divergence appears chiefly
in the phonological system. This study examines these
parallels and discrepancies and traces how the phoneme
inventory and sound patterns of these languages have
changed over time.
Malayalam maintains a more marked phonetic distinction
for each consonant. Irula, which is closer to Tamil,
nevertheless has distinct phonological characteristics,
as evidenced by shifts in the place of articulation of
some consonants. These shifts show how each language’s
phonetics, particularly its fricatives and retroflexes,
set these sister languages apart. The aim of this study
is to assess and analyze these phonological
distinctions. Irula shares phonetic similarities with
Tamil and Malayalam, but it also has distinctive
phonetic characteristics, such as consonantal shift and
phoneme simplification. For instance, the word for
“lion” is “singam” in Tamil but “simham” in Malayalam,
which illustrates both phonological and lexical
diversity. Irula’s close connection to Malayalam can be
seen in the term “shivaya,” which is pronounced
similarly in both. The same word is pronounced “sivaya”
in Tamil and “chivaya,” “shivaya,” or “sivaya” in
Irula. This research focuses on phonetic differences,
as well as syntactic and grammatical similarities.
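As a rough illustration of the comparison described above, the following minimal sketch computes the Levenshtein (edit) distance between the romanized forms quoted in this section. It is an assumption made purely for illustration, not the method used in this study: romanized spellings are only a coarse proxy for pronunciation, and the function name `edit_distance` is hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: each insertion, deletion, or substitution costs 1."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (or match)
            ))
        prev = curr
    return prev[-1]


# Romanized forms taken from the text; the pairing is illustrative only.
pairs = [
    ("singam", "simham"),    # Tamil vs. Malayalam word for "lion"
    ("sivaya", "shivaya"),   # Tamil form vs. one Irula variant
    ("sivaya", "chivaya"),   # Tamil form vs. another Irula variant
]
for left, right in pairs:
    print(left, right, edit_distance(left, right))
```

For these pairs the distances are 2, 1, and 2. A phonologically meaningful comparison would operate on phonemic transcriptions or acoustic features rather than spellings.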
This effort also aims to analyze the similarities and
differences between these languages. Although they
share a similar lexicon and grammatical organization,
their phonetic components differ greatly. Fricatives
and retroflexes aid in the development of a thorough
phonological model that precisely differentiates
between these languages. Vowel articulation and
phonetic markers offer a strong foundation for
examining these languages’ development. This work also
discusses the need for tools that can precisely record
the phonetic variety among the Proto-Dravidian
languages and aid in their preservation.
Section 2 provides an overview of related research in
phonological analysis, emphasizing studies that focus
on language preservation and linguistic diversity,
especially concerning low-resource languages. It
reviews prior work on acoustic feature extraction,
phoneme segmentation, and the creation of language
models specifically designed for endangered languages.
Section 3 describes the methodology of this study,
including the dataset used for phonological analysis,
feature extraction methods such as MFCC, and the models
used for phonetic comparison. Section 4 presents the
results of the acoustic analysis, highlighting both the
similarities and differences in phonetic
characteristics among Malayalam, Tamil, and Irula while
assessing the effectiveness of various feature
extraction techniques. Section 5 assesses the
phonological models based on their ability to identify
language-specific features, particularly regarding
their role in preserving Irula. Finally, Section 6
concludes with a discussion of the implications of
these findings for linguistic research and outlines
potential directions for future work, such as
incorporating machine learning techniques for automatic
phoneme segmentation and developing speech-to-text
systems for low-resource languages.
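The outline above names MFCC as one of the feature extraction methods. The sketch below shows the usual MFCC steps (windowing, power spectrum, mel filterbank, logarithm, DCT) in plain Python. The frame length, hop size, FFT size, filter count, and all function names are assumptions chosen for illustration and are not taken from this study; a naive DFT is used only to keep the example dependency-free, and pre-emphasis is omitted.

```python
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame, n_fft):
    """Naive DFT power spectrum for bins 0..n_fft/2 (frame is zero-padded)."""
    spec = []
    for k in range(n_fft // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n_fft) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * t / n_fft) for t, x in enumerate(frame))
        spec.append((re * re + im * im) / n_fft)
    return spec


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centers are evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    mel_points = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate)) for m in mel_points]
    bank = []
    for i in range(1, n_filters + 1):
        lo, mid, hi = bins[i - 1], bins[i], bins[i + 1]
        filt = [0.0] * (n_fft // 2 + 1)
        for k in range(lo, mid):   # rising slope
            filt[k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):   # falling slope, equal to 1.0 at k == mid
            filt[k] = (hi - k) / (hi - mid)
        bank.append(filt)
    return bank


def dct2(values, n_coeffs):
    """Unnormalized DCT-II, keeping the first n_coeffs coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * c * (i + 0.5) / n) for i, v in enumerate(values))
            for c in range(n_coeffs)]


def mfcc(signal, sample_rate, frame_len=200, hop=80, n_fft=256,
         n_filters=20, n_coeffs=13):
    """Return one list of n_coeffs MFCCs per frame (illustrative defaults)."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * t / (frame_len - 1))
              for t in range(frame_len)]  # Hamming window
    bank = mel_filterbank(n_filters, n_fft, sample_rate)
    features = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + t] * window[t] for t in range(frame_len)]
        spec = power_spectrum(frame, n_fft)
        # Floor the filter energies so that log() is defined for silent frames.
        energies = [math.log(max(sum(f * p for f, p in zip(filt, spec)), 1e-10))
                    for filt in bank]
        features.append(dct2(energies, n_coeffs))
    return features


# Synthetic 440 Hz tone, 0.1 s at 8 kHz, standing in for a speech recording.
tone = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(800)]
features = mfcc(tone, 8000)
print(len(features), len(features[0]))
```

In practice a signal-processing library with an FFT would replace the naive DFT, and the frame and filterbank settings would be tuned to the recordings being analyzed.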
2 RELATED WORKS
The majority of the existing computational linguistics
literature on speech analysis concentrates on languages
with abundant resources or extensive documentation.
However, research on low-resource and endangered
languages has drawn attention to the need for
computational tools designed specifically for these
languages. Two studies, on speech segmentation and
pronunciation modeling, are particularly relevant to
this work.
The first study segmented speech samples using a hybrid
segmentation system that integrated signal processing
and machine learning methods (Prakash et al., 2016). In
particular, speech data is divided into syllable-level
segments using a hidden Markov model (HMM) that is
initialized using the global average variance. The
study also examined the acoustic characteristics of
syllables in six Indian languages, along with the
distribution of acoustic properties across those
languages, to determine their parallels and
discrepancies. The study’s findings demon-
INCOFT 2025 - International Conference on Futuristic Technology