For the filtered dataset (n = 52), the random forest
model achieved an R-squared of 0.832 on the training
set and 0.054 on the test set. The RMSE values were
1.273 (training) and 2.323 (test). Although the test-set
R-squared remained low, it was slightly higher than the
result from the full dataset, suggesting a marginal
improvement in predictive stability when using the
filtered data. The large gap between training and test
R-squared nonetheless indicates overfitting.
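
The two metrics reported above can be computed in a few lines. The following is a minimal sketch, not the code used in this study: the function names `r_squared` and `rmse` and the toy values are illustrative assumptions.

```python
import math


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the target."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


# Toy values for illustration only (not data from this study).
actual = [4.0, 6.0, 8.0]
predicted = [5.0, 6.0, 7.0]
print(r_squared(actual, predicted), rmse(actual, predicted))
```

An R-squared near zero on held-out data means the model predicts about as well as always guessing the mean of the test targets, which is how a test-set value of 0.054 should be read.
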
Comparing the random forest models fitted to the full
and filtered datasets, both highlight Study Hours and
Sleep Duration as consistently important factors.
However, the filtered-data model emphasized Weekday
Sleep Duration and Weekend Sleep End more strongly,
while the full-data model placed greater weight on
Weekday Sleep Start. This suggests that once outliers
and implausible records were removed, the model relied
more on total sleep duration as a key predictor, whereas
the full-data model was slightly more sensitive to the
timing of sleep.
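
One way to make such a comparison concrete is to rank the features of each model by importance and look at how the ranks shift. The sketch below is an assumption about how this could be done, not the procedure used in this study. The helper names are hypothetical and the importance values are placeholders with generic feature names. In practice the dictionaries could be built from the `feature_importances_` attribute of a fitted scikit-learn random forest.

```python
def rank_features(importances):
    """Map each feature name to its rank (1 = most important)."""
    ordered = sorted(importances, key=importances.get, reverse=True)
    return {name: i + 1 for i, name in enumerate(ordered)}


def rank_shift(full, filtered):
    """Rank in the full-data model minus rank in the filtered-data model.

    A positive value means the feature ranks higher (is more important)
    in the filtered-data model.
    """
    r_full = rank_features(full)
    r_filt = rank_features(filtered)
    return {name: r_full[name] - r_filt[name] for name in full if name in filtered}


# Placeholder importances with generic names, NOT results from this study.
full_model = {"a": 0.5, "b": 0.3, "c": 0.2}
filtered_model = {"a": 0.4, "b": 0.1, "c": 0.5}
shifts = rank_shift(full_model, filtered_model)
print(shifts)
```
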
4.3 Comparison with Previous Studies
Other researchers have also analyzed the same
dataset. For example, Tapendu (2024) used both multiple
linear regression and random forest models. In their
work, they added several new variables, including
Average Sleep Duration and Sleep Onset Time
Difference, which aimed to describe students’ sleep
habits in more detail. Their linear regression model
achieved a relatively high R-squared, and their
random forest results showed that Sleep Duration,
Average Sleep Duration, and Weekday Sleep End
were the most important factors influencing sleep
quality. These results are broadly consistent with
this study, in which variables related to sleep
duration also emerged as key predictors.
However, there are clear differences between their
approach and mine. This paper focused more on
improving data quality through strict filtering. The
filtered dataset contained only records with valid
sleep schedule data, which made it possible to include
two new variables: Weekday Sleep Duration and
Weekend Sleep Duration. Although my model's
performance was lower, the priority was to ensure
that only reliable data were used. These differences in
variable selection and data processing may explain
why the results differ (Tapendu, 2024). Overall, both
studies indicate that sleep patterns play an important
role in determining sleep quality, but that different
methods can lead to different outcomes.
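
The duration variables mentioned above (Weekday Sleep Duration and Weekend Sleep Duration) can be derived from sleep start and end times. The sketch below shows one possible derivation, not necessarily the one used in this study. It assumes that times are stored as decimal hours in the range [0, 24) and that a record is valid only when both times fall in that range; the function name `sleep_duration` is hypothetical.

```python
def sleep_duration(start_hour, end_hour):
    """Hours slept, given bedtime and wake time as decimal hours in [0, 24).

    Returns None when either time is missing or out of range, so that
    such records can be filtered out before modelling.
    """
    for value in (start_hour, end_hour):
        if value is None or not 0 <= value < 24:
            return None
    # Modulo 24 handles sleep that crosses midnight (e.g. 23.5 -> 7.0).
    return (end_hour - start_hour) % 24


# Toy records for illustration only: (sleep start, sleep end).
records = [(23.5, 7.0), (1.0, 8.5), (25.0, 7.0)]
durations = [sleep_duration(s, e) for s, e in records]
valid = [d for d in durations if d is not None]
print(durations, valid)
```

The modulo step matters because most bedtimes fall before midnight and most wake times after it, so a plain subtraction would give a negative duration.
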
5 CONCLUSION
This study explored how student lifestyle factors
affect sleep quality using both linear regression and
random forest models. The results showed that
neither model could accurately predict sleep quality:
both had low R-squared scores on the test set.
Although the random forest model performed well on
the training data, its performance dropped
sharply on the test set, suggesting overfitting.
Still, both methods consistently highlighted sleep
duration as one of the most important factors
influencing sleep quality.
There are also some limitations. The dataset did
not include psychological factors such as stress,
anxiety, or depression, which may strongly influence
sleep, especially for international students, who may
experience culture shock. Environmental
factors like sunlight duration and seasonal changes
were also not considered.
Moreover, international students often face extra
challenges that can affect their sleep quality. In future
research, more detailed data should be collected,
especially on mental health and social factors.
With better and more comprehensive data, models may
give more useful and accurate predictions of student
sleep quality.
REFERENCES
Becker, S. P., Jarrett, M. A., Luebbe, A. M., Garner, A. A.,
Burns, G. L., Kofler, M. J. 2018. Sleep in a large, multi-
university sample of college students: sleep problem
prevalence, sex differences, and mental health
correlates. Sleep Health, 4(2), 174-181.
Hershner, S. D., Chervin, R. D. 2014. Causes and
consequences of sleepiness among college
students. Nature and Science of Sleep, 6, 73-84.
Lemma, S., Berhane, Y., Worku, A., Gelaye, B., Williams,
M. A. 2014. Good quality sleep is associated with better
academic performance among university students in
Ethiopia. Sleep & Breathing, 18(2),
257-263.
Lund, H. G., Reider, B. D., Whiting, A. B., Prichard, J. R.
2010. Sleep patterns and predictors of disturbed sleep
in a large population of college students. Journal of
Adolescent Health, 46(2), 124-132.
Schmickler, J. M., Blaschke, S., Robbins, R., Mess, F. 2023.
Determinants of Sleep Quality: A Cross-Sectional
Study in University Students. International Journal of
Environmental Research and Public Health, 20(3).
Sohn, S. I., Kim, D. H., Lee, M. Y., Cho, Y. W. 2012. The
reliability and validity of the Korean version of the
Pittsburgh Sleep Quality Index. Sleep & Breathing, 16(3),
803-812.