testing sets of the data are created, with the most
informative features selected. After training several
regression models, Polynomial Regression reduces
residual error by fitting a curve to the data, achieving
the best accuracy of 80.97%. (Nithya & Ilango, 2017)
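As an illustration of how a polynomial fit can reduce residual error on curved data, the following scikit-learn sketch compares a straight-line fit with a degree-2 fit on synthetic data (the single feature, the degree, and the coefficients are assumptions for illustration, not the cited study's setup):

```python
# Illustrative sketch: on data with a quadratic trend, a polynomial
# regression fits a curve and leaves smaller residuals than a straight line.
# Synthetic data; degree=2 is an assumption, not the cited study's choice.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(18, 64, size=(200, 1))                    # e.g. age of the insured
y = 0.05 * X[:, 0] ** 2 + 2 * X[:, 0] + rng.normal(0, 5, 200)  # quadratic cost trend

linear = LinearRegression().fit(X, y)
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)

print(f"linear R^2: {linear.score(X, y):.3f}")
print(f"poly   R^2: {poly.score(X, y):.3f}")
```

On data with a genuine quadratic trend, the polynomial pipeline explains more of the variance than the straight-line fit.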
Six different classification algorithms were used in
combination to predict which beneficiaries' inpatient
claim amounts increased in 2008 and 2009.
Using the test dataset, the results showed 80%
sensitivity, 77.56% overall accuracy, and 76.46%
precision; the model proved useful both in helping
high-risk patients select the best insurance plan and in
properly estimating cost and revenue for insurance
providers. (Tike & Tavarageri, 2017) Utilizing
Medicare payment data, a medical price prediction
system was developed to help patients identify less
expensive medical providers. The authors present a
novel hierarchical decision tree method for
estimating prices.
Several experiments are conducted to compare
this method against other machine learning
techniques, including linear regression, Random
Forests, and gradient-boosted trees, and the results
demonstrate that the hierarchical decision tree
achieves high accuracy. (Hanafy & Mahmoud,
2021) This study demonstrates
how insurance premiums can be predicted using a
variety of regression models. They also compared the
results of several other models, such as the DNN,
Random Forest Regressor, Support Vector Machine,
XGBoost, CART, Randomized Additive Model, and
k-Nearest Neighbors.
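A comparison of this kind, fitting several regressors and scoring each on a held-out split, can be sketched as follows (synthetic data; only a subset of the models named above is shown, with default hyperparameters):

```python
# Illustrative sketch: fit several regressors and compare R^2, MAE and RMSE
# on a held-out split. Synthetic data; a small subset of the cited study's models.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 4))                       # stand-ins for age, BMI, etc.
y = X @ [3.0, -2.0, 1.5, 0.5] + 0.5 * X[:, 0] * X[:, 1] + rng.normal(0, 0.3, 500)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "linear":            LinearRegression(),
    "random_forest":     RandomForestRegressor(random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
    "knn":               KNeighborsRegressor(),
}
for name, model in models.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    rmse = mean_squared_error(y_te, pred) ** 0.5
    print(f"{name:18s} R2={r2_score(y_te, pred):.3f} "
          f"MAE={mean_absolute_error(y_te, pred):.3f} RMSE={rmse:.3f}")
```

Reporting all three metrics side by side, as the cited studies do, makes the ranking of models less sensitive to any single error measure.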
The stochastic gradient boosting model, which
produces an R-squared value of 84.8295, an MAE of
0.17448, and an RMSE of 0.38018, is found to be the
most successful model. (Hossen, 2023) In this
research, they predict insurance amounts for
different groups of people using both individual and
local health data. The effectiveness of these
techniques was investigated using nine regression
models: Linear Regression, XGBoost Regression,
Gradient Boosting, KNN Model, Random Forest
Regression, Ridge Regression, Lasso Regression,
Decision Tree Regression, and Support Vector
Regression. (Lahiri & Agarwal, 2014) This study uses
publicly available Medicare data covering about
114,000 beneficiaries and over 12,400 attributes to
predict which beneficiaries' inpatient claim amounts
grew from 2008 to 2009. Employing an ensemble of
six classification techniques, it obtains an overall
accuracy of 77.56%, a precision of 76.46%, and a
sensitivity of 80% on the test dataset.
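For reference, the reported sensitivity, precision, and overall accuracy are all simple functions of the confusion-matrix counts; a minimal sketch (the counts below are invented and only roughly mimic the reported figures):

```python
# Illustrative: how sensitivity, precision and overall accuracy relate to
# confusion-matrix counts. The counts below are made up, not the study's.
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    return {
        "sensitivity": tp / (tp + fn),          # recall on the positive class
        "precision":   tp / (tp + fp),
        "accuracy":    (tp + tn) / (tp + fp + tn + fn),
    }

m = classification_metrics(tp=80, fp=25, tn=75, fn=20)
print(m)  # sensitivity 0.80, precision ~0.762, accuracy 0.775
```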
(Gregori et al., 2011) This study presents
regression approaches developed specifically for
healthcare cost analysis and applies them in both an
observational setting (hospital care for diabetes) and
an experimental one (the COSTAMI trial in
cardiovascular treatment). (Bertsimas et al., 2008)
This study investigates data-mining techniques as a
potent tool for estimating healthcare expenses,
showing that they can offer precise estimates of
medical costs. It examines the value of medical data
in predicting costs, especially for high-cost members,
and emphasizes the predictive power of past cost
trends. (Kulkarni et al., 2022) This work applies a
range of machine learning regression models to a
personal medical cost dataset to predict health
insurance costs from selected features.
Regression models including gradient boosting,
polynomial, decision tree, random forest, and
multiple linear regression are studied in this
work. All the models are trained on a subset of the
dataset, and their performance is evaluated using
metrics such as root mean square error (RMSE), mean
absolute error (MAE), and accuracy. (Sahu et al.,
2023) Gradient Boosting Decision Tree Regression
was found to have the best accuracy for estimating
the amount, with a score of 85.776%, while random
forest and linear regression also produced accurate
forecasts around 80% of the time.
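Since gradient-boosted trees come out on top in several of the studies above, the following is a minimal sketch of such a regressor on synthetic insurance-like data (feature names, coefficients, and hyperparameters are all invented for illustration):

```python
# Minimal gradient-boosted regression sketch on synthetic "insurance-like"
# features (age, BMI, smoker flag). Data and hyperparameters are invented.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
n = 1000
age = rng.integers(18, 65, n)
bmi = rng.normal(28, 5, n)
smoker = rng.integers(0, 2, n)
charges = 250 * age + 400 * bmi + 15000 * smoker + rng.normal(0, 2000, n)

X = np.column_stack([age, bmi, smoker])
X_tr, X_te, y_tr, y_te = train_test_split(X, charges, test_size=0.2,
                                          random_state=0)

gbr = GradientBoostingRegressor(n_estimators=200, learning_rate=0.05,
                                max_depth=3, random_state=0).fit(X_tr, y_tr)
print(f"held-out R^2: {gbr.score(X_te, y_te):.3f}")
```

The boosted ensemble fits many shallow trees sequentially, each correcting the residuals of the previous ones, which is why these models handle mixed numeric and categorical-style features well.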
3 DESIGN AND PRINCIPLE OF
MODEL
3.1 METHODOLOGY
Using ensemble learning approaches, we created a
predictive model for healthcare spending in this study. To