erence for determining symmetry lines. These lines
are calculated based on the centers of the topmost and
bottommost digits for vertical symmetry and the left-
most and rightmost digits for horizontal symmetry.
Furthermore, it impacts the distance variation
coefficient, as artifacts or additional lines are
frequently unevenly distributed across the scan. Lastly,
when the algorithm identifies a 2 and an 11, it uses
their centers as references for calculating the optimal
angle for the clock hand evaluation. Therefore, the
accuracy and reliability of the digit recognition
component significantly influence the entire CDT
scoring process.
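The dependency on the recognised digit centers can be illustrated with a minimal sketch. It is not the implementation used in this work: the function names, the coordinate convention, and the reading of the "optimal angle" as the angle between the rays from the clock center to the digits 11 and 2 are assumptions made for illustration only.

```python
import math
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]  # (x, y) in image coordinates, y grows downward


def symmetry_lines(centers: Dict[int, Point]):
    """Return ((top, bottom), (left, right)) from the extreme digit centers.

    Hypothetical helper: the first pair spans the vertical symmetry line,
    the second pair spans the horizontal symmetry line.
    """
    points = list(centers.values())
    top = min(points, key=lambda p: p[1])
    bottom = max(points, key=lambda p: p[1])
    left = min(points, key=lambda p: p[0])
    right = max(points, key=lambda p: p[0])
    return (top, bottom), (left, right)


def reference_hand_angle(centers: Dict[int, Point],
                         clock_center: Point) -> Optional[float]:
    """Angle in degrees between the rays center->11 and center->2.

    Returns None when either digit was not recognised (assumed behaviour).
    """
    if 2 not in centers or 11 not in centers:
        return None
    cx, cy = clock_center
    a = math.atan2(centers[11][1] - cy, centers[11][0] - cx)
    b = math.atan2(centers[2][1] - cy, centers[2][0] - cx)
    diff = abs(math.degrees(a - b)) % 360.0
    return min(diff, 360.0 - diff)


# Illustrative digit centers on a clock of radius 40 around (50, 50).
centers = {
    12: (50.0, 10.0), 3: (90.0, 50.0), 6: (50.0, 90.0), 9: (10.0, 50.0),
    2: (84.64, 30.0), 11: (30.0, 15.36),
}
vertical, horizontal = symmetry_lines(centers)
angle = reference_hand_angle(centers, (50.0, 50.0))
```

In this sketch a single stray mark that is misclassified as a digit near the border of the scan would replace one of the extreme centers and shift the corresponding symmetry line, which is the kind of error propagation described above.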
Another critical aspect of the digit classification
model is that it is trained on data from the MNIST
dataset and subsequently applied to classify digits
within CDT-scans without further transfer learning on
these specific digits. The handwritten digits in the
MNIST dataset were mostly written by American high
school students and employees of the United States
Census Bureau and were collected by the National
Institute of Standards and Technology, whereas the
scans were conducted in Germany. This raises the
possibility that Americans and Germans write digits
differently. In addition, people suffering from
dementia are typically older, another aspect
distinguishing the clock digits from the training data.
Another potential difference between the ground
truth and the model's prediction could stem from the
discretion a human rater has in making case-by-case
decisions. A medical expert can assign a perfect rating
to clock hands that are drawn as a direct line between
2 and 11 without touching the center of the clock face,
as it is clear that the patient has identified the
correct time. Our detection algorithm, on the other
hand, is not capable of doing that, as it looks for
lines originating in the center of the clock, which
might not always align with human evaluative criteria.
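The gap between the two criteria can be sketched as follows. This is an illustrative assumption, not the detection procedure used in this work: the function names, the endpoint-based test, and the tolerance value are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def endpoint_near(segment: Segment, target: Point, tolerance: float) -> bool:
    """True if at least one endpoint of the segment lies within tolerance of target."""
    return any(math.dist(end, target) <= tolerance for end in segment)


def strict_hand_check(segment: Segment, clock_center: Point,
                      tolerance: float) -> bool:
    """Center-based rule: the line has to originate near the clock center."""
    return endpoint_near(segment, clock_center, tolerance)


def lenient_hand_check(segment: Segment, digit_2: Point, digit_11: Point,
                       tolerance: float) -> bool:
    """Rater-like rule: accept a direct line that connects the digits 2 and 11."""
    return (endpoint_near(segment, digit_2, tolerance)
            and endpoint_near(segment, digit_11, tolerance))


clock_center = (50.0, 50.0)
digit_2, digit_11 = (84.64, 30.0), (30.0, 15.36)

# A single line drawn directly from the 11 to the 2 (illustrative coordinates).
direct_line = (digit_11, digit_2)

strict_result = strict_hand_check(direct_line, clock_center, tolerance=5.0)
lenient_result = lenient_hand_check(direct_line, digit_2, digit_11, tolerance=5.0)
```

Under these assumptions the direct line is rejected by the center-based rule but accepted by the rater-like rule, which mirrors the disagreement described above.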
Reevaluating the exact criteria for the final scoring
within the algorithm might be necessary on a
technological level to provide both transparent and
accurate predictions in scoring. However, a thorough
study by Mainland et al. (2014) implies that, in the
medical context, distinguishing correctly between pass
and fail is more important for depicting the cognitive
state of a person than increasing the complexity of the
scoring criteria.
Our algorithm evaluates a scan of a CDT result that
was produced on paper with a pen. Information about the
patient's behaviour and the time taken to complete the
test is lost in this form of evaluation.
To conclude this discussion, there are some
inaccuracies in the current algorithm, especially
regarding the digit recognition. However, the results
show the potential of an automatic evaluation of the
CDT, especially when it comes to binary classification
as pass or fail.
9 CONCLUSION
This paper examines the feasibility of scoring and
screening the clock drawing test with a transparent,
component-wise approach that combines traditional
image detection methods and deep learning. The proposed
algorithm yields good prediction accuracy for
screening, where a CDT-scan is classified as pass or
fail. It performs especially well in correctly
classifying true-negative results, which is of
particular relevance in practical dementia diagnosis:
only 6.36 % of failed CDT-scans are misclassified as
pass.
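The quantity reported here, the share of failed scans that are predicted as pass, can be computed as in the following minimal sketch. The function name and the toy labels are illustrative assumptions and do not reproduce the evaluation data of this work.

```python
from typing import Sequence


def miss_rate(y_true: Sequence[str], y_pred: Sequence[str],
              fail_label: str = "fail", pass_label: str = "pass") -> float:
    """Fraction of ground-truth fail scans that were predicted as pass."""
    predictions_for_failed = [p for t, p in zip(y_true, y_pred) if t == fail_label]
    if not predictions_for_failed:
        raise ValueError("ground truth contains no failed scans")
    missed = sum(p == pass_label for p in predictions_for_failed)
    return missed / len(predictions_for_failed)


# Toy labels for illustration only (not results from the paper).
truth = ["fail", "fail", "fail", "fail", "pass", "pass"]
predicted = ["fail", "pass", "fail", "fail", "pass", "fail"]
rate = miss_rate(truth, predicted)  # 1 of 4 failed scans is missed
```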
The MAE from scoring and the confusion matrices in
Figure 3 suggest that there are some issues in precise
ordinal regression prediction. However, incorrect
predictions are off by a score of 1, and resolving the
issues discussed in Section 8 could improve the
accuracy.
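For reference, the mean absolute error over ordinal scores and the distribution of error magnitudes can be computed as in the sketch below. The function names and the example scores are assumptions for illustration and are not taken from the evaluation in this work.

```python
from collections import Counter
from typing import Sequence


def mean_absolute_error(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Average absolute difference between true and predicted scores."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def error_histogram(y_true: Sequence[int], y_pred: Sequence[int]) -> Counter:
    """Count how often each absolute error magnitude occurs."""
    return Counter(abs(t - p) for t, p in zip(y_true, y_pred))


# Toy scores for illustration only (not results from the paper).
true_scores = [1, 2, 3, 4, 5, 6]
predicted_scores = [1, 2, 4, 4, 4, 6]

mae = mean_absolute_error(true_scores, predicted_scores)
histogram = error_histogram(true_scores, predicted_scores)
```

A histogram whose only non-zero error magnitude is 1 corresponds to the situation described above, in which wrong predictions land in a neighbouring score class.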
Future advancements of the algorithm should
prioritize more precise digit recognition, for example
by adding a non-digit class to the training input.
Moreover, it is imperative to engage in a comprehensive
review and standardization of the digital scoring
criteria, involving a multidisciplinary team of experts
from the fields of dementia, neurology, clock drawing
testing and computer science. This collaborative effort
would help ensure that the scoring criteria are
transparent, robust and universally accepted.
Part of the preparatory work has been creating an
app² for digitally performing the clock drawing test
on an iPad. The result is scored with a variant of the
algorithm described in this paper and showed good
results that are of clinical significance.
REFERENCES
Agrell, B. and Dehlin, O. (1998). The clock-drawing test.
Age and Ageing, 27(3):399–404.
Baccianella, S., Esuli, A., and Sebastiani, F. (2009). Evalu-
ation Measures for Ordinal Regression. In 2009 Ninth
International Conference on Intelligent Systems De-
sign and Applications, pages 283–287.
Chen, S., Stromer, D., Alabdalrahim, H. A., Schwab, S.,
Weih, M., and Maier, A. (2020). Automatic dementia
² https://apps.apple.com/gb/app/clock-drawing-test/id1594273677
HEALTHINF 2024 - 17th International Conference on Health Informatics