A Comparative Analysis of Ensemble and Non-Ensemble Machine
Learning Algorithms
Yifei Wang
Department of Mathematics, University of Washington, Seattle, Washington, 98195, U.S.A.
Keywords: Machine Learning, Ensemble Learning, Heart Attack Prediction, Mushroom Classification.
Abstract: The development of machine learning has led to the design of various algorithms to effectively address
complex problems. Among these, both ensemble and non-ensemble methods have attracted significant
attention due to their unique advantages and applications. This paper compares the performance of ensemble
and non-ensemble machine learning algorithms in terms of accuracy, efficiency, and stability, using two
classification datasets. This work evaluates six algorithms: three non-ensemble methods, which include
support vector classification, decision tree, and k-nearest neighbors; and three ensemble methods, which
include random forest, gradient boosting, and voting. The performance is validated on two tasks: heart attack
prediction and mushroom classification. The results indicate that ensemble algorithms, particularly random forest and gradient boosting, generally achieve higher accuracy and greater stability compared to the non-
ensemble decision tree algorithm. However, despite the slight accuracy improvement, ensemble methods tend
to be much slower during both the training and prediction phases. Support vector classification is efficient on
smaller datasets but exhibits slower performance on larger ones. Additionally, the performance of voting
algorithms is highly dependent on the selection of base models. These findings highlight the trade-offs
between accuracy, efficiency, and stability when choosing appropriate machine learning algorithms for
specific tasks.
1 INTRODUCTION
Machine learning (ML) is a class of algorithms that
analyze existing data, discover patterns, and make
predictions. Machine learning can be used to
automate decision-making processes. From
healthcare to finance, from autonomous driving to
natural language processing, machine learning
algorithms have been widely adopted in daily life.
In the field of machine learning, algorithms can be
roughly divided into ensemble and non-ensemble
algorithms. Non-ensemble algorithms rely on a single
model for prediction. Support vector classification,
decision trees, and k-nearest neighbors are several
common non-ensemble algorithms. Ensemble
machine learning algorithms combine the predictions
of several base estimators to obtain a more stable and
accurate prediction model. Common ensemble
algorithms include random forests, gradient boosting,
and voting.
This paper aims to analyze and compare the performance of non-ensemble and ensemble machine learning algorithms on two datasets, covering the prediction of heart attacks and the classification of mushrooms.
2 MACHINE LEARNING
ALGORITHMS
To compare ensemble and non-ensemble machine learning algorithms, this work uses the following six algorithms: support vector classification, decision trees, and k-nearest neighbors are non-ensemble algorithms, while random forest, gradient boosting, and voting are ensemble algorithms.
2.1 Non-Ensemble Algorithms
Machine learning aims to find a function y = f(x) that models the relationship in the training data (x, y). A non-ensemble machine learning algorithm defines a hypothesis space of candidate functions f, and training searches this space for the function that best matches the real-world problem (Muhamedyev, 2015). The decision tree is
one of the classic non-ensemble algorithms. The non-ensemble algorithms used for comparison are described below.
Decision Trees (DT): A decision tree is a
supervised learning algorithm for classification and
regression tasks (Loh, 2011). It builds a structure
similar to a binary tree, dividing the data step by step
according to input characteristics. Throughout the
process, different options are examined, and the most
effective ones are selected at each level, leading to the
final decision.
Support Vector Classification (SVC): Support
Vector Machines (SVMs) are supervised learning
algorithms used for classification and regression tasks
(Cortes, 1995). They work by finding an optimal
hyperplane that maximizes the margin between
different classes in an N-dimensional space. A kernel
function is used to transform the data and achieve better results than simpler algorithms such as linear regression (Salcedo-Sanz et al., 2014).
K-Nearest Neighbors (kNN): The k-nearest
neighbors algorithm is a supervised learning method
frequently applied when solving classification and
regression problems. By finding the k nearest neighbors of a given point, the algorithm outputs the most common category for classification or the mean value for regression. The parameter k controls the number of nearest points that participate in the prediction.
2.2 Ensemble Algorithms
Ensemble algorithms are learning algorithms that
construct a set of models and make decisions based
on the combination of them. Bagging and boosting
are common strategies for building ensemble models
(Dietterich, 2000). Bagging is a technique first proposed by Breiman (Breiman, 1996). With bagging, a new training set is created for each base classifier by randomly sampling from the original training set with replacement. Boosting is a strategy proposed by Schapire (Schapire, 1990). It improves the performance of weak classifiers by training them sequentially, each time focusing more on the instances that previous classifiers misclassified. The ensemble algorithms used in this work are described below.
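To make the bagging idea concrete, the sketch below draws bootstrap samples with replacement, trains one decision tree per sample, and combines them by majority vote. It is an illustrative reconstruction using numpy and scikit-learn, not code from this paper.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Bagging: each base classifier sees a bootstrap sample drawn with replacement.
estimators = []
for _ in range(25):
    idx = rng.integers(0, len(X), size=len(X))
    estimators.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))

# The ensemble prediction is a majority vote over the 25 trees.
votes = np.stack([est.predict(X) for est in estimators])
bagged = (votes.mean(axis=0) >= 0.5).astype(int)
print("bagged training accuracy:", (bagged == y).mean())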
Random Forest (RF): RF is an ensemble learning
method that builds multiple decision trees and merges
them to get a more accurate and stable prediction (Ho,
1995). In random forests, each tree is built from a
bootstrap sample of the training data, and at each split,
a random subset of features is considered to find the
best split. Because it builds multiple decision trees, it can avoid some of the overfitting problems of a single decision tree.
Gradient Boosting (GB): Gradient Boosting is an
ensemble learning technique that builds a model in a
stage-wise fashion from decision trees, which are
supervised learning methods used for classification
and regression (Friedman, 2002). Gradient Boosting
usually combines multiple decision trees to predict
the result, which makes it a strong model. It achieves
this by finding a loss function and choosing the one
with the least error through gradient descent. It
iterates many times on the weak learners to build a
more precise model.
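The stage-wise idea can be seen in the toy regression loop below, which uses squared-error loss: each new shallow tree is fitted to the current residuals, which equal the negative gradient of that loss. This is an illustrative sketch, not the configuration used in the experiments.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)

learning_rate = 0.1
prediction = np.full_like(y, y.mean())      # stage 0: a constant model
trees = []
for _ in range(100):
    residuals = y - prediction              # negative gradient of the squared loss
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    prediction += learning_rate * tree.predict(X)   # add the new weak learner
    trees.append(tree)

print("training MSE after 100 stages:", round(float(np.mean((y - prediction) ** 2)), 4))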
Voting: Voting is also an ensemble learning technique. It collects the predictions of several different models and combines them into a single prediction. There are two main types: hard voting and soft voting. Hard voting takes the majority vote among the predicted labels of all models, while soft voting averages the predicted probabilities and selects the class with the highest average probability.
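The difference between the two modes can be sketched directly on hypothetical model outputs; the label and probability values below are made up for demonstration.

import numpy as np

# Predicted labels from three hypothetical base models for four samples.
labels = np.array([[0, 1, 0, 0],
                   [0, 1, 0, 0],
                   [1, 1, 1, 0]])
hard_vote = (labels.mean(axis=0) > 0.5).astype(int)   # majority class per sample

# Predicted probabilities of class 1 from the same three models.
probas = np.array([[0.2, 0.9, 0.45, 0.1],
                   [0.4, 0.8, 0.49, 0.2],
                   [0.7, 0.6, 0.95, 0.3]])
soft_vote = (probas.mean(axis=0) > 0.5).astype(int)   # highest average probability

print("hard voting:", hard_vote)   # [0 1 0 0]: two weak models outvote the third
print("soft voting:", soft_vote)   # [0 1 1 0]: one confident model tips the average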
2.3 Algorithms Comparison
As Dietterich pointed out, ensemble algorithms may
achieve better performance due to three reasons:
statistical, computational, and representational
(Dietterich, 2000).
Statistical: The hypothesis space to be searched is often very large, yet there is typically not enough training data to determine the best model precisely. Learning a single model from such limited data often causes overfitting. Combining multiple models may offset the errors of the individual models and reduce overfitting.
Computational: Finding the best function in a hypothesis space such as that of decision trees can be an NP-hard problem, so heuristic search must be applied, and the function found may not be the best one. Combining multiple models can bring the prediction closer to the optimal solution.
Representational: The hypothesis space may not actually contain the true target function, so the best model found by a single algorithm may not represent the real-world problem well; combining models can expand the space of functions that can be represented.
3 EXPERIMENTS AND RESULTS
3.1 Dataset
This work evaluates the algorithms on two datasets. (1) The Heart Attack Dataset by Rashik contains fields such as age, sex, and other cardiovascular health indicators, and the task is to predict whether a person has a higher chance of a heart attack. The dataset is small, with 303 rows and 14 columns (Rashik, 2021). (2) The Mushroom Dataset for binary classification, available in the UCI library, contains properties of mushrooms such as color, shape, and size, and the task is to predict whether a given mushroom is poisonous or edible. This dataset is much larger, with 54,035 rows and 9 columns (Joakim, 2023). Both datasets are publicly available on Kaggle.
3.2 Evaluation Metrics
This work applies all six algorithms discussed above to the two datasets. During preprocessing, all categorical columns are one-hot encoded. Each dataset is split randomly, with 80% of the rows used for training and the remaining 20% used for evaluation.
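Assuming pandas and scikit-learn are used (the paper does not name its tooling), the preprocessing and split could look like the sketch below; the small data frame and column names are placeholders, not the real datasets.

import pandas as pd
from sklearn.model_selection import train_test_split

# Tiny made-up frame standing in for either dataset; the real columns differ.
df = pd.DataFrame({
    "age":    [52, 61, 45, 39, 58, 47, 63, 50, 41, 66],
    "sex":    ["M", "F", "M", "F", "M", "F", "M", "F", "M", "F"],
    "target": [1, 0, 1, 0, 1, 0, 1, 0, 0, 1],
})

X = pd.get_dummies(df.drop(columns=["target"]))   # one-hot encode categorical columns
y = df["target"]

# 80% of the rows for training, the remaining 20% for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, random_state=42)
print(X_train.shape, X_test.shape)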
For each algorithm, this work measures the following key properties:
Accuracy (Acc.): Indicates the percentage of
instances correctly predicted compared to the total in
the test set. A higher value means a better outcome
for the algorithm. It is one of the most important
indicators for evaluating a model.
Training accuracy: Reflects how many instances were accurately predicted relative to the total instances in the training set. A higher value means the model fits the training set more closely and may also indicate overfitting.
Time usage (train time + predict time): Measures the time spent on training the model and generating predictions on the test dataset. A shorter time means a more efficient model. The evaluation is executed on a computer with an Intel Core i7-8650U processor.
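These three quantities can be gathered with a small helper such as the following sketch, which times training and prediction with time.perf_counter; it is an assumed implementation on synthetic data, not the paper's actual harness.

import time
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def evaluate(model, X_train, y_train, X_test, y_test):
    # Training time.
    t0 = time.perf_counter()
    model.fit(X_train, y_train)
    train_time = time.perf_counter() - t0
    # Prediction time and validation accuracy on the test split.
    t0 = time.perf_counter()
    val_acc = model.score(X_test, y_test)
    predict_time = time.perf_counter() - t0
    # Training accuracy on the training split.
    train_acc = model.score(X_train, y_train)
    return val_acc, train_acc, train_time, predict_time

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
print(evaluate(DecisionTreeClassifier(random_state=0),
               X_train, y_train, X_test, y_test))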
3.3 Performance Comparison
The evaluation results on the mushroom dataset are shown in Table 1. This work sets specific parameters
for each machine learning algorithm. DT is
constrained with a maximum depth of 25 and
considers up to 28 features. kNN is based on the 5
nearest neighbors, with predictions weighted by the
distance to these neighbors. SVC employs a
polynomial kernel to transform the data, which can
enhance its classification capabilities. RF is
configured with a depth limit of 32 and is trained
using 100 decision trees. GB operates with a similar
depth constraint of 25 and a feature limit of 28, and it
constructs the model through 100 iterations. Finally, the voting method combines the predictions from the three non-ensemble algorithms to determine the final classification.
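Assuming scikit-learn estimators, the configuration just described might be instantiated as in the sketch below; the mapping from the description to parameter names is an interpretation, not code from the paper. The heart attack configuration described after Table 1 follows the same pattern with its own parameter values.

from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import (RandomForestClassifier,
                              GradientBoostingClassifier, VotingClassifier)

# Non-ensemble models with the stated constraints.
dt = DecisionTreeClassifier(max_depth=25, max_features=28)
knn = KNeighborsClassifier(n_neighbors=5, weights="distance")
svc = SVC(kernel="poly")

# Ensemble models: 100 trees / iterations, depth and feature limits as described.
rf = RandomForestClassifier(n_estimators=100, max_depth=32)
gb = GradientBoostingClassifier(n_estimators=100, max_depth=25, max_features=28)
# Voting over the three non-ensemble models; the voting mode is not stated for
# the mushroom dataset, so hard voting is assumed here.
vote = VotingClassifier(
    estimators=[("dt", dt), ("knn", knn), ("svc", svc)], voting="hard")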
Table 1: Performance comparison on mushroom dataset.
Algorithm   Val Acc   Train Acc   Time (train / predict, s)
DT          0.9802    0.9982      0.32 / 0.33
kNN         0.6807    1.0000      0.04 / 1.85
SVC         0.5564    0.5525      391.7 / 421.7
RF          0.9899    1.0000      6.55 / 6.77
GB          0.9892    1.0000      86.02 / 86.23
Vote        0.8307    0.9997      395.9 / 429.0
The evaluation results on the heart attack dataset are shown in Table 2. In this study, DT is capped at a
depth of 6 for controlled growth. The kNN uses 7
neighbors in its unweighted predictions. SVC applies
a polynomial kernel to enhance data classification.
RF, with a depth limit of 5, is trained on 400 decision
trees to improve accuracy through diversity. GB is
limited to a depth of 3 and undergoes 150 iterations to refine its model. The voting method pools predictions from DT, kNN, and SVC through hard voting, aiming to consolidate their strengths for better accuracy.
Table 2: Performance comparison on heart attack dataset.
Algorithm   Val Acc   Train Acc   Time (train / predict, s)
DT          0.9802    0.9982      0.32 / 0.33
kNN         0.6807    1.0000      0.04 / 1.85
SVC         0.5564    0.5525      391.7 / 421.7
RF          0.9899    1.0000      6.55 / 6.77
GB          0.9892    1.0000      86.02 / 86.23
Vote        0.8307    0.9997      395.9 / 429.0
Figure 1: Accuracy of models with given number of
estimators (Figure Credits: Original).
This work is also interested in the 1.0 training accuracy achieved on the first dataset by random forest and gradient boosting. Therefore, these two algorithms are trained with different numbers of estimators, and the resulting accuracy is shown in Figure 1.
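In outline, that experiment can be reproduced as in the sketch below, which trains both models with an increasing number of estimators and records training accuracy; scikit-learn and synthetic data are assumed, so the numbers will differ from the paper's figure.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

# Synthetic stand-in data; the paper runs this experiment on its real dataset.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

for n in (10, 50, 100, 200):
    rf = RandomForestClassifier(n_estimators=n, random_state=0).fit(X, y)
    gb = GradientBoostingClassifier(n_estimators=n, random_state=0).fit(X, y)
    # Training accuracy as a function of the number of estimators (cf. Figure 1).
    print(n, round(rf.score(X, y), 4), round(gb.score(X, y), 4))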
4 DISCUSSIONS
4.1 Performance
Accuracy is one of the most important indicators when evaluating models. The author first analyzes the accuracy of all algorithms on the given datasets and finds that, among the non-ensemble models, the decision tree achieves the best accuracy. This may be because the tree structure suits the given datasets well. Because the decision tree is far more accurate than kNN and SVC, the voting algorithm built on these three models performs worse than the decision tree alone. It can also be noticed that random forest and gradient boosting have similar or better accuracy compared to the decision tree, as these two algorithms are based on decision trees and make improvements to them. Since distance-weighted kNN is applied on the mushroom dataset, it reaches a training accuracy of 1.0, which suggests that it overfits the training data. In contrast, unweighted kNN is used on the heart attack dataset, and there the kNN algorithm does not show overfitting. To avoid overfitting, this work limits the depth and the number of features when training the decision tree models. With these parameters, the decision tree has a lower training accuracy but better test performance. Random forest and gradient boosting also have very high training accuracy. However, this does not necessarily mean the models are overfitted, as their accuracy does not decrease when the number of estimators, i.e., the number of underlying decision trees or iterations, increases, as shown in Figure 1.
4.2 Time Consumption
Among the three non-ensemble algorithms, SVC has the worst efficiency: it takes much longer when applied to a large dataset. The kNN algorithm has a low training time but a longer prediction time, because it does not actually train a fixed model; instead, it uses all of the training data when predicting. The voting algorithm first has to train all three of these algorithms, so its time usage is approximately the sum of the three. Since this work includes the SVC algorithm, which is slow on large datasets, the voting algorithm shows very poor efficiency. Random forest and gradient boosting are both based on decision trees, and they have parameters that control the number of decision trees or iterations, so their efficiency is heavily influenced by these parameters. As a result, they are much slower than a single decision tree. However, they are still more efficient than SVC on large datasets.
4.3 Randomness and Stability
The kNN and SVC algorithms do not rely on randomness, so they always produce stable outcomes. However, the decision tree uses randomness when splitting nodes, and as a result it is not stable: with different random seeds, it may reach different accuracy.
Random forest is based on decision trees, but it combines the results of multiple trees, so it has higher stability. Similar behavior can be observed in gradient boosting, which iterates many times and thereby reduces the instability introduced by randomness.
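This behavior can be checked with a small experiment such as the sketch below, which records the test accuracy of a single decision tree and of a random forest over several random seeds and compares their spread; the setup and data are assumed for illustration, not taken from the paper.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

dt_scores, rf_scores = [], []
for seed in range(10):
    dt_scores.append(DecisionTreeClassifier(random_state=seed)
                     .fit(X_train, y_train).score(X_test, y_test))
    rf_scores.append(RandomForestClassifier(n_estimators=50, random_state=seed)
                     .fit(X_train, y_train).score(X_test, y_test))

# A smaller standard deviation indicates more stable accuracy across seeds.
print("DT accuracy std:", round(float(np.std(dt_scores)), 4))
print("RF accuracy std:", round(float(np.std(rf_scores)), 4))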
5 CONCLUSIONS
This work presents a comprehensive comparison of
ensemble and non-ensemble machine learning
algorithms, focusing on their performance, efficiency,
and stability. The analysis includes decision trees,
support vector classification, K-nearest neighbors,
random forests, gradient boosting, and voting
algorithms.
From the evaluation, it could be observed that
ensemble methods, especially random forests and
gradient boosting, generally outperform non-
ensemble methods in terms of accuracy. This can be
attributed to their ability to combine multiple models,
thereby reducing overfitting and enhancing
generalization. However, the voting algorithm did not perform as well as expected, possibly due to the inclusion of SVC, which performs poorly on these datasets.
In terms of training and prediction time, non-
ensemble methods such as kNN and decision trees
exhibit faster training times, but their prediction
efficiency varies. kNN, in particular, exhibits longer
prediction times due to its reliance on the entire
training dataset. Ensemble methods are slower during
training due to the complexity of combining multiple
models, but are still more efficient than SVC on large
datasets.
Stability analysis shows that non-ensemble
methods such as kNN and SVC provide consistent
results, while decision trees exhibit variability due to
their reliance on randomness. Ensemble methods
such as random forests and gradient boosting mitigate
this instability by aggregating the results of multiple
models, thereby providing greater stability.
Results show that while non-ensemble methods
can be efficient and easy to implement, ensemble
methods provide better accuracy and stability,
making them more suitable for complex datasets.
When choosing a machine learning algorithm, one should therefore weigh accuracy, efficiency, and stability against the requirements of the specific task.
REFERENCES
Breiman, L. 1996. Bagging predictors. Machine
learning, 24, 123-140.
Cortes, C., & Vapnik, V. 1995. Support-vector networks. Machine Learning, 20(3), 273-297.
Dietterich, T. G. 2000. Ensemble methods in machine
learning. In International workshop on multiple
classifier systems. 1-15.
Friedman, J. H. 2002. Stochastic gradient
boosting. Computational statistics & data
analysis, 38(4), 367-378.
Ho, T. K. 1995. Random decision forests. In Proceedings
of 3rd international conference on document analysis
and recognition. 1, 278-282.
Joakim, A. 2023. Secondary Mushroom Dataset. URL: https://www.kaggle.com/datasets/joebeachcapital/secondary-mushroom-dataset. Last Accessed: 2024/09/13.
Loh, W. Y. 2011. Classification and regression trees. Wiley
interdisciplinary reviews: data mining and knowledge
discovery, 1(1), 14-23.
Muhamedyev, R. 2015. Machine learning methods: An
overview. Computer modelling & new
technologies, 19(6), 14-29.
Rashik, R. 2021. Heart Attack Analysis & Prediction Dataset. URL: https://www.kaggle.com/datasets/rashikrahmanpritom/heart-attack-analysis-prediction-dataset. Last Accessed: 2024/09/13.
Salcedo-Sanz, S., Rojo-Álvarez, J. L., Martínez-Ramón, M., & Camps-Valls, G. 2014. Support vector machines in engineering: an overview. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 4(3), 234-267.
Schapire, R. E. 1990. The strength of weak
learnability. Machine learning, 5, 197-227.