2.2.1 Machine Learning (ML)
ML is a computational technique that uses past data to
learn automatically from experience, improving
performance and producing more accurate predictions.
ML is the development of algorithms and techniques
that enable computers to learn and become intelligent
based on past experience. It is a branch of artificial
intelligence (AI) with close connections to statistics.
As a result of learning, the system becomes capable
of identifying and comprehending the input data,
enabling it to be used as the basis for choices and
forecasts. In the current work, machine learning
approaches, the Pima Indians Diabetes dataset, and R
data-processing tools are used to find trends and
patterns in risk indicators. Five different predictive
models are generated and analyzed in R to classify an
individual as either diabetic or non-diabetic. To do
this, the following ML algorithms are employed:
multifactor dimensionality reduction (MDR), KNN,
radial basis function (RBF) kernel support vector
machines, linear-kernel SVM, and artificial neural
networks (ANN). High-dimensional biomedical data
is now automatically analyzed using ML methods.
Among the biomedical
uses of ML are liver disease diagnosis, skin lesion
classification, cardiovascular disease risk assessment,
and genetic and genomic data analysis. Hashemi et al.
have successfully deployed the SVM algorithm for
the diagnosis of liver illness (Mumtaz et al., 2018).
Mumtaz et al. employed classification models such as
SVM, logistic regression (LR), and Naive Bayes
(NB) to determine the presence of major depressive
disorder (MDD) using an EEG dataset.
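The modeling workflow described above can be sketched in code. The following is a minimal illustrative example, not the authors' exact R pipeline: it trains several of the listed classifier families (KNN, RBF-kernel SVM, linear-kernel SVM, ANN) with scikit-learn, and uses a synthetic 8-feature binary dataset as a stand-in for the Pima Indians Diabetes data.

```python
# Illustrative sketch only: scikit-learn substitutes for the R tooling
# used in the paper, and make_classification stands in for the Pima data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in: 768 patients, 8 features, binary outcome
# (diabetic / non-diabetic), mirroring the Pima dataset's shape.
X, y = make_classification(n_samples=768, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Standardize features; distance- and margin-based models need this.
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "RBF-kernel SVM": SVC(kernel="rbf"),
    "Linear-kernel SVM": SVC(kernel="linear"),
    "ANN": MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000,
                         random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: test accuracy = {model.score(X_test, y_test):.3f}")
```

Each model exposes the same fit/score interface, which is what makes side-by-side comparison of the five model families straightforward in practice.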
2.2.2 KNN
KNN classifies instances on the basis of the closest
training examples in the feature space. It is the most
fundamental form of lazy, instance-based learning.
All instances are treated as points in n-dimensional
space. Determining an instance's "proximity"
requires the use of a distance metric. To classify
cases, KNN locates the closest neighbors and chooses
the most common class among them. The features of
KNN are as follows (Li et al., 2019). All data
instances are points in n-dimensional Euclidean
space. Categorization is postponed until additional
instances are received. For complex goal functions
and noisy training data, KNN is an effective inductive
inference technique. One way to think of the objective
function of the entire space is as a mixture of simpler
local approximations. The algorithm of KNN is as
follows (Li et al., 2019). Consider a sample dataset
with n columns and m rows, where the first n-1
columns represent the input vector and the nth
column the output vector. Call the test dataset P; it
has y rows and n-1 attributes. For each test instance
T in P, determine the Euclidean distance to every
training instance S as:

$\mathit{distance} = \sqrt{\sum_{i=1}^{n-1} (S_i - T_i)^2}$  (1)
Next, choose a value K, the number of nearest
neighbors to consider. Take the K training instances
with the smallest Euclidean distances and read the
output (nth) column of each. The class that occurs
most often among these K values is assigned to the
test instance; if that majority class indicates diabetes,
the patient is classified as diabetic.
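The steps above (Eq. 1, the K smallest distances, and the majority vote) can be sketched directly. This is a minimal illustrative implementation; the variable names S, P-style test points, and K follow the text, while the toy data is invented for demonstration.

```python
# Minimal sketch of the KNN procedure described above.
import numpy as np
from collections import Counter

def knn_predict(S, labels, T, K=3):
    """Classify test point T against training rows S with class labels."""
    # Step 1: Euclidean distance from T to every training row (Eq. 1).
    distances = np.sqrt(((S - T) ** 2).sum(axis=1))
    # Step 2: indices of the K nearest neighbors.
    nearest = np.argsort(distances)[:K]
    # Step 3: majority vote over their output column.
    return Counter(labels[nearest]).most_common(1)[0][0]

# Toy data: two clusters labelled 0 (non-diabetic) and 1 (diabetic).
S = np.array([[1.0, 1.0], [1.2, 0.9], [5.0, 5.0], [5.1, 4.8]])
labels = np.array([0, 0, 1, 1])
print(knn_predict(S, labels, np.array([1.1, 1.0])))  # -> 0
print(knn_predict(S, labels, np.array([5.0, 4.9])))  # -> 1
```

Because all computation happens at query time, this also illustrates why KNN is called a lazy learner: there is no training phase at all.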
2.2.3 SVM
In medical diagnosis, SVM refers to a family of
related supervised learning techniques for
classification and regression. Known as a maximum-
margin classifier, SVM maximizes the margin
between classes while simultaneously minimizing the
classification error. SVM is grounded in statistical
learning theory, which provides broad guarantees on
the generalization risk bound. SVM can perform
nonlinear classification efficiently by means of the
so-called kernel trick, which implicitly maps inputs
into a higher-dimensional feature space. The
classifier may be built without knowledge of the
feature space thanks to the kernel trick. The ML
research community has recently shown significant
interest in SVM. In terms of classification accuracy,
SVM typically outperforms other data
categorization techniques, according to certain recent
studies. Because SVM is well suited to binary
classification problems, it can be applied to predict
diabetes. Data points are
spatially represented and grouped in the SVM model;
points that share comparable characteristics are
included in the same group. In a linear SVM, each
data point is viewed as a p-dimensional vector, and
the classes are separated by a (p-1)-dimensional
hyperplane. These hyperplanes define boundaries
between groups of data, dividing the data space for
classification or regression problems. The optimal
hyperplane is chosen from the candidate hyperplanes
according to the distance between the two classes it
separates: the maximum-margin hyperplane is the
one with the largest margin between these two
classes.
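The two ideas in this subsection, the maximum-margin hyperplane and the kernel trick, can be illustrated with a short sketch. This is an assumption-laden example, not the paper's experiment: scikit-learn's SVC is used, and both datasets (separable blobs; concentric circles) are synthetic.

```python
# Illustrative sketch: a linear-kernel SVM on linearly separable data,
# versus an RBF-kernel SVM on data that no hyperplane can separate.
from sklearn.svm import SVC
from sklearn.datasets import make_blobs, make_circles

# Linearly separable blobs: the linear kernel finds the maximum-margin
# (p-1)-dimensional hyperplane between the two classes.
X_lin, y_lin = make_blobs(n_samples=100, centers=2, random_state=0)
linear = SVC(kernel="linear").fit(X_lin, y_lin)
print("linear-kernel accuracy:", linear.score(X_lin, y_lin))

# Concentric circles are not linearly separable in the input space;
# the RBF kernel implicitly maps them into a higher-dimensional feature
# space where a separating hyperplane exists (the kernel trick).
X_circ, y_circ = make_circles(n_samples=100, factor=0.3, noise=0.05,
                              random_state=0)
rbf = SVC(kernel="rbf").fit(X_circ, y_circ)
print("RBF-kernel accuracy:", rbf.score(X_circ, y_circ))
```

The contrast shows why kernel choice matters for diabetes-style tabular data: when the class boundary is nonlinear, an RBF kernel can fit it while the classifier still only ever computes kernel values, never explicit high-dimensional coordinates.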