Automated Medical Reporting: From Multimodal Inputs to Medical Reports through Knowledge Graphs

Lientje Maas¹, Adriaan Kisjes¹, Iman Hashemi¹, Floris Heijmans¹, Fabiano Dalpiaz¹ᵃ, Sandra Van Dulmen²ᵇ and Sjaak Brinkkemper¹ᶜ

¹Department of Information and Computing Sciences, Utrecht University, Utrecht, The Netherlands
²NIVEL (Netherlands Institute for Health Services Research), Utrecht, The Netherlands
ᵃ https://orcid.org/0000-0003-4480-3887
ᵇ https://orcid.org/0000-0002-1651-7544
ᶜ https://orcid.org/0000-0002-2977-8911
Keywords: Healthcare Workflow Management, Electronic Medical Record, Automated Reporting, Dialogue Interpretation, Knowledge Graphs, Patient Medical Graph.
Abstract: Care providers generally experience a high workload, mainly due to the large amount of time required for adequate documentation. This paper presents our visionary idea of real-time automated medical reporting through the integration of speech and action recognition technology with knowledge-based summarization of the interaction between care provider and patient. We introduce the Patient Medical Graph as a formal representation of the dialogue and actions during a medical consultation. This knowledge graph represents human anatomical entities, symptoms, medical observations, diagnoses and treatment plans. The formal representation enables automated preparation of a consultation report by means of sentence plans to generate natural language. The architecture and functionality of the Care2Report prototype illustrate our vision of automated reporting of human communication and activities using knowledge graphs and NLP tools.
1 INTRODUCTION
Care providers (CPs) are required to accurately report patient information. As a primary communication tool between CPs, medical records are necessary for good patient care. However, recording and maintaining patient medical information in the electronic medical record (EMR) is time-consuming. A more efficient way of reporting is required to cope with the high workload in healthcare while preserving the quality of patient data.
To reduce documentation time, the use of speech recognition in medical reporting has been studied extensively. Recently, Chiu et al. developed a speech recognition system for the transcription of medical conversations, reaching a word accuracy of 81.7% (Chiu et al., 2017). Most studies focus on dictation for reporting after a consultation (Ajami, 2016). However, dictation is used by only 1% of medical staff in the Netherlands (Luchies et al., 2018). Klann and Szolovits performed initial work to capture the patient-CP dialogue with speech recognition and automatically extract clinical meaning (Klann and Szolovits, 2009). Further, the BabyTalk project aimed to automatically generate textual summaries of temporal clinical data from physiological signals (Portet et al., 2009).
Automated medical reporting is the visionary goal of our Care2Report (C2R) research program (see www.care2report.nl). To achieve this, state-of-the-art speech and action recognition technology is combined with semantic interpretation of the data through knowledge graphs. This enables automatic preparation of a consultation report that is checked by the CP (and, if relevant, the patient) before being uploaded to the EMR. Our solution will substantially reduce the administrative load and improve personal engagement in healthcare. Note that we do not provide decision support but solely report consultations.
This paper is organized as follows. The next section describes our approach to enable automated medical reporting. Section 3 provides more in-depth information about the formal representation of events and situations during medical consultations. Section 4 presents the architecture and functionality of the system that is under development. Finally, Section 5 describes the status of our research and an outlook.
Figure 1: Flowchart of the process of automated medical reporting, from the recording of the voice dialogue, the actions and treatments, and the measurements, through raw data preprocessing, formal representation and reasoning, to report preparation and the patient's EMR (Stages 1-4).
2 APPROACH
Our main research challenge is to integrate state-of-the-art multimodal recognition technology with knowledge representation and reasoning into one software platform. Broadly, the process consists of four stages (illustrated in Fig. 1):
1. Transformation of audio, video and sensor data from medical consultations into text using existing speech and action recognition technology.
2. Formal representation of situations, measurements and treatments based on multimodal input combined with semantic technology.
3. Generation of medical reports using conventions in specific medical domains.
4. Report completion, checking by the CP, and uploading through a generic EMR interface.
We are developing a generic hardware and software platform with a non-intrusive recording device with microphone, camera and sensor technology that performs optimal recognition of situations and actions. Sensor technology enables a wireless connection with healthcare domotics, e.g., a thermometer. Multimodal input is thus provided through the audio, video and sensor modalities: speech recognition transforms the medical dialogue into text, action recognition captures examinations and treatments, and sensor data provide the results of medical measurements.
2.1 Multimodal Knowledge Integration
To interpret the raw data recorded during the medical consultation, we model it as a knowledge graph to enable semantic reasoning and querying (Antoniou et al., 2012). We refer to all interpreted information from the consultation as consultation knowledge.
Although the interpretation of unconstrained dialogue text can be problematic, we are in the fortunate circumstance that detailed knowledge about the context of the utterances is available through so-called background knowledge. For most medical consultations, the condition for which the patient is treated is known, and the corresponding medical guideline is employed for a more accurate interpretation (Peleg, 2013; Sutton and Fox, 2003). This helps to resolve ambiguity and cope with incomplete or noisy input. Access to the medical record of the patient is of similar use. Additionally, we exploit the large corpus of medical background knowledge that is available: medical ontologies (SNOMED, ICD-10, LOINC) and large medical knowledge graphs (DrugBank, SIDER, AERS) are utilized to disambiguate the text. This is particularly helpful in cases where the patient's condition is only partially known or vague.
To integrate information from the multimodal sources, the C2R system constructs a so-called medical consultation timeline to log a medical consultation (e.g., measurements, diagnosis, treatments). The situations that stem from the occurrence of events are stored along with their time range, enabling enhanced event recognition by using multimodal inputs. For example, if a CP verbally announces that he or she is going to listen to a patient's heart (audio input), it can be foreseen that a stethoscope will be used (video input). The integration of inputs will lead to the complete modeled consultation knowledge in a knowledge graph populated by semantic triples ⟨subject, predicate, object⟩ (Rohloff et al., 2007), from which a report is generated.
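As an illustration, the sketch below logs timestamped events from the three modalities on a consultation timeline and retrieves co-occurring events for cross-modal corroboration. It is written in Python (the language of our analyzers); the Event structure, its field names and the example triples are illustrative assumptions, not the actual C2R data model.

```python
from dataclasses import dataclass

@dataclass
class Event:
    start: float   # seconds since the start of the consultation
    end: float
    modality: str  # "audio", "video" or "sensor"
    triple: tuple  # semantic triple (subject, predicate, object)

# Hypothetical timeline fragment for the auscultation example above.
timeline = [
    Event(120.0, 124.5, "audio",  ("cp", "announces", "auscultation")),
    Event(126.0, 180.0, "video",  ("cp", "uses", "stethoscope")),
    Event(140.0, 141.0, "sensor", ("patient", "hasTemperature", "37.1 C")),
]

def overlapping(events, a, b):
    """Return the events whose time range overlaps the window [a, b]."""
    return [e for e in events if e.start <= b and e.end >= a]

# The announced auscultation (audio) is corroborated by the detected
# stethoscope use (video) within the same time window.
for e in overlapping(timeline, 120.0, 130.0):
    print(e.modality, e.triple)
```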
3 PATIENT MEDICAL GRAPH
Medical consultations follow a general structure: opening, history taking, physical examination, evaluation, treatment recommendations and closing (Maynard and Heritage, 2005). During history taking and physical examination, the presence of signs and symptoms is determined, which are evaluated to determine a diagnosis and treatment plan.
Figure 2: Excerpt of a PMG based on a consultation concerning otitis externa (external ear infection). Note that for explanatory reasons the graph is colored for each of the five subgraphs (PAG, PSG, POG, PDG, PTG): anatomical entities (body, head, ears, left ear, outer ear, ear canal, auricle, eardrum) connected by isPartOf edges; signs and symptoms (pain, drainage, hearing loss, redness, swelling, scars, intact eardrum); observations with values (e.g., observation-1: pain, 7/10, duration 4 days); the diagnosis otitis externa; and the treatment hypercortison, 3 drops t.i.d.
To formally represent the collected information, we define the Patient Medical Graph (PMG) as the knowledge graph of the patient's anatomy, complemented with evaluated signs and symptoms and the associated diagnosis and treatment plan.¹ The PMG serves as an internal representation of the consultation knowledge. An example of the PMG for a fictitious external ear infection (otitis externa) consultation is presented in Fig. 2. It consists of five subgraphs: PMG = PAG ∪ PSG ∪ POG ∪ PDG ∪ PTG. We will now formally define each subgraph (see also Tables 1 and 2) and illustrate it with examples from Fig. 2.

¹ The PMG can be seen as the instance level (A-Box) of an ontology. Due to space limitations, we do not discuss the corresponding T-Box that defines the entity and relationship types.
The human anatomy is the starting point of the Patient Anatomy Graph (PAG), representing all human anatomical entities. The PAG knowledge graph is universal for each patient, apart from gender differences. Existing ontologies are used as reference, e.g., the Foundational Model of Anatomy (Rosse and Mejino Jr, 2003). The PAG is complemented with the Patient Symptom Graph (PSG), representing signs and symptoms associated with specific anatomical entities. Medical guidelines build the PSG by providing lists of signs and symptoms occurring in specific medical domains (Peleg, 2013). The Patient Observation Graph (POG) assigns values to the signs and symptoms based on observations during the medical consultation. The observations connect the values to a certain sign or symptom (e.g., observation-1 observes symptom pain with value 7/10), appearing as (green) triangles in the POG. Additional characteristics are also in the POG, such as the time of occurrence (e.g., observation-1 of pain 7/10 has had a duration of 4 days).
Next, the graph is complemented with the diagnosis made by the CP in the Patient Diagnosis Graph (PDG). Based on observations (green), the diagnosis otitis externa is given (red). Finally, we complement the graph with the Patient Treatment Graph (PTG) based on the interpreted treatment plan in the consultation. We consider any treatment in its broadest sense: not only medication, but also referral to a specialist or additional tests.
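To make these definitions concrete, the sketch below encodes a fragment of the Fig. 2 example as plain Python vertex and edge sets, mirroring the formal definitions in Table 2; the labels and observed values are illustrative rather than an exact transcription of the figure.

```python
# PAG: (a1, a2) means a1 is a direct anatomical subpart of a2.
PAG = {("head", "body"), ("left-ear", "head"), ("outer-ear", "left-ear"),
       ("ear-canal", "outer-ear"), ("auricle", "outer-ear"),
       ("eardrum", "outer-ear")}

# PSG: signs and symptoms attached to anatomical entities.
PSG = {("left-ear", "pain"), ("left-ear", "drainage"),
       ("left-ear", "hearing-loss"), ("ear-canal", "redness"),
       ("ear-canal", "swelling")}

# POG: observations (o, s, v) assigning value v to sign/symptom s;
# Table 2 stores these as the edges (o, s), (o, v) and (s, v).
POG = {("observation-1", "pain", "7/10"),
       ("observation-2", "hearing-loss", "no"),
       ("observation-4", "redness", "yes")}

# PDG and PTG: the diagnosis and treatment plan for the patient.
PDG = {("patient", "otitis externa")}
PTG = {("patient", "hypercortison 3 drops t.i.d.")}

# The PMG is the union of the five subgraphs (POG projected onto (o, s)).
PMG = PAG | PSG | {(o, s) for o, s, _ in POG} | PDG | PTG
print(f"{len(PMG)} edges in this PMG fragment")
```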
3.1 Populating the PMG
Complementing the PAG ∪ PSG with the POG, PDG and PTG requires interpretation of the consultation. Observations from test scenarios indicate that the key parts of the consultation dialogue are typically uttered in short, standard phrases. We aim to capture the medical dialogue through a library of linguistic patterns with placeholders. Medical guidelines are the starting point for the identification of these patterns. The placeholders are filled in using part-of-speech tagging and dependency parsing in combination with regular expressions, after which semantic triples are deduced. A similar method has been successfully used for the automated evaluation of eligibility criteria for clinical trials (Milian et al., 2015).
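As a minimal sketch of this idea, the snippet below matches two hypothetical placeholder patterns using regular expressions only; the actual library is derived from medical guidelines and combines such patterns with part-of-speech tagging and dependency parsing.

```python
import re

# Hypothetical pattern library: (regex with named placeholders, triple builder).
PATTERNS = [
    # "I have <symptom> in my <side> <entity>."
    (re.compile(r"i have (?P<symptom>\w+) in my (?P<side>left|right) (?P<entity>\w+)",
                re.IGNORECASE),
     lambda m: [(f"{m['side']}-{m['entity']}", "hasSymptom", m["symptom"])]),
    # "The <symptom> started <n> days ago."
    (re.compile(r"the (?P<symptom>\w+) started (?P<n>\d+) days ago", re.IGNORECASE),
     lambda m: [(m["symptom"], "duration", f"{m['n']} days")]),
]

def extract_triples(utterance):
    """Match an utterance against the pattern library and deduce triples."""
    triples = []
    for pattern, build in PATTERNS:
        match = pattern.search(utterance)
        if match:
            triples.extend(build(match))
    return triples

print(extract_triples("I have pain in my left ear."))
# -> [('left-ear', 'hasSymptom', 'pain')]
```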
3.2 Report Generation
From the populated PMG, a report of the consultation is generated. Medical reports generally contain short and simple sentences, which facilitates automated generation. We are developing the natural language generation component of our system based on the NaturalOWL system (Androutsopoulos et al., 2013), illustrated in Fig. 3.
Table 1: Definition of sets required to define the PMG.

P  all patients
A  all anatomical entities of the human body
S  all medical signs and symptoms
V  all possible values to be assigned to s ∈ S
O  all medical observations
D  all medical diagnoses
T  all medical treatments
Table 2: Formal definitions of the subgraphs comprising the PMG.

Graph  Vertices   Typed edges
PAG    A          {(a₁, a₂) | a₁, a₂ ∈ A ∧ a₁ is a direct anatomical subpart of a₂}
PSG    A ∪ S      {(a, s) | a ∈ A ∧ s ∈ S}, i.e., all signs and symptoms of a
POG    O ∪ S ∪ V  {(o, s), (o, v), (s, v) | o ∈ O ∧ s ∈ S ∧ v ∈ V}, i.e., all observations
PDG    P ∪ D      {(p, d) | p ∈ P ∧ d ∈ D}, i.e., all diagnoses for patient p
PTG    P ∪ T      {(p, t) | p ∈ P ∧ t ∈ T}, i.e., all treatment plans for patient p
Template sentence plans are specified for the relevant relations in the PMG. We will determine the requirements and conventions regarding medical reporting to identify the information that is relevant to report, and study filtering methods for report texts.
The sentence plans consist of a sequence of slots along with information on how to fill those in. These plans lead to separate sentences, which are aggregated into longer ones based on rules. In addition, referring expressions are generated to improve readability. After the report is generated, the CP checks it for completeness and correctness.
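The sketch below illustrates the principle with a few hypothetical template sentence plans, where each template is a slotted string filled from a triple; aggregation rules and referring-expression generation are omitted for brevity.

```python
# Hypothetical sentence plans keyed by PMG relation; {subject} and {object}
# are the slots to be filled from a triple (subject, predicate, object).
SENTENCE_PLANS = {
    "hasSymptom":    "Patient reports {object} in the {subject}.",
    "duration":      "Complaints have persisted for {object}.",
    "diagnosedWith": "Diagnosis: {object}.",
    "treatedWith":   "Treatment: {object}.",
}

def realize(triples):
    """Generate one sentence per triple that has a sentence plan."""
    return " ".join(
        SENTENCE_PLANS[p].format(subject=s.replace("-", " "), object=o)
        for s, p, o in triples if p in SENTENCE_PLANS)

triples = [("left-ear", "hasSymptom", "pain"),
           ("pain", "duration", "4 days"),
           ("patient", "diagnosedWith", "otitis externa"),
           ("patient", "treatedWith", "hypercortison 3 drops t.i.d.")]
print(realize(triples))
# -> Patient reports pain in the left ear. Complaints have persisted for
#    4 days. Diagnosis: otitis externa. Treatment: hypercortison 3 drops t.i.d.
```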
4 Care2Report PROTOTYPE
To realize our vision, a prototype is under development that takes multimodal input and outputs a draft report. It transforms speech into text, recognizes medical objects in video, and transforms sensor signals into measurement data. Formal knowledge representation based on medical guidelines and sentence composition are implemented for a selected domain: medical problems related to the ear. Starting with a small domain provides the opportunity to study and test our methods by specifying, e.g., the PAG ∪ PSG, and it enhances data interpretation thanks to the specific background knowledge.
4.1 Architecture
The prototype is based on a microservice architecture (Klock et al., 2017). Splitting large unimodal analyzers (e.g., the audio analyzer, video analyzer, and domotics analyzer) into smaller microanalyzers solves interdependency complications while maintaining a loosely coupled system. Each microanalyzer has a predefined input and output set, which allows for simple configurability and future extensibility. A microanalyzer controller controls the analysis process and ensures that all execution constraints are satisfied.
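A minimal sketch of this scheduling idea follows, with hypothetical class names rather than the actual C2R code: each microanalyzer declares its input and output sets, and the controller runs an analyzer as soon as all of its inputs are available.

```python
from abc import ABC, abstractmethod

class MicroAnalyzer(ABC):
    inputs: set = set()   # data keys this analyzer consumes
    outputs: set = set()  # data keys this analyzer produces

    @abstractmethod
    def analyze(self, data: dict) -> dict: ...

class Transcriber(MicroAnalyzer):
    inputs, outputs = {"audio"}, {"transcript"}
    def analyze(self, data):
        return {"transcript": f"<text for {data['audio']}>"}

class TripleExtractor(MicroAnalyzer):
    inputs, outputs = {"transcript"}, {"triples"}
    def analyze(self, data):
        return {"triples": [("left-ear", "hasSymptom", "pain")]}

def controller(analyzers, data):
    """Run each analyzer once its execution constraints are satisfied."""
    pending = list(analyzers)
    while pending:
        ready = [a for a in pending if a.inputs <= data.keys()]
        if not ready:
            raise RuntimeError("unsatisfiable input constraints")
        for a in ready:
            data.update(a.analyze(data))
            pending.remove(a)
    return data

print(controller([TripleExtractor(), Transcriber()], {"audio": "consult.wav"}))
```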
4.2 Input Analysis and Report Generation

The system contains a database whose data structure corresponds to the medical consultation timeline described in Section 2. Triples to populate the PMG are extracted from the dialogue text using linguistic tools: grammatical annotation of dialogue sentences is used to extract concepts and relations for triple creation. We envision more rigorous methods in the future, as described in Section 3. Video analysis is used to identify the movement of medical objects (e.g., a stethoscope) as an indication of their utilization by the CP. Healthcare domotics send data from medical measurements to the system via Bluetooth. The relevant input is added to the (prebuilt) PAG ∪ PSG to form the complete PMG comprising the modeled consultation knowledge. A report is then generated based on sentence plans, following the procedure described in Section 3, which has been developed for the ear domain. The stages of the process are illustrated in Fig. 3 for the ear infection example.
4.3 Evaluation
We are currently building a large corpus of data including recordings of both simulated and real medical consultations. Corresponding medical reports are written manually by medical professionals to compare with the automatically generated reports. The data can be partitioned into a training set and a test set, enabling training and evaluation of the system.
4.4 Technological Platforms
Figure 3: Example showing part of a transcription of the CP-patient dialogue (left), the resulting PMG (middle) and the sentence plan for report generation (right). Transcription excerpt: CP: "What brings you here today?" P: "I have pain in my left ear." CP: "Ah, so your ear hurts. Do you feel that you also can hear less than usual?" P: "No."

The front end of the system runs on the Windows UWP platform and is mainly written in C#. The back end runs primarily on .NET Core and is written in C#. The analyzers are written in Python, using gRPC for communication between services/modules. The Google Cloud Speech-to-Text service transcribes the audio, and linguistic annotation is handled by Python-Frog. For video analysis, the OpenCV and YOLO libraries are used. Medical guidelines are modeled in PROforma. Protégé facilitates ontology development, and triples are stored and managed with Stardog.
5 RESEARCH OUTLOOK
So far, we have presented our grand vision and the implementation of our basic ideas in the first C2R prototype. To reach our objectives, we need to overcome several challenges, among others in developing a robust architecture that is independent of the input technology, in semantically interpreting input that varies between hospitals in terminology and procedures, and in striking a balance between the required expressiveness and the computational demands of constructing a formal representation of the transcriptions.
Our future research will focus on device integration for high-quality multimodal recognition (Stage 1 in Fig. 1), on methods to build and populate the PMG (Stage 2 in Fig. 1), and on methods to filter out irrelevant information from medical consultations (Stage 3 in Fig. 1). Our preliminary research and results make us confident that our ambitious goal of fully automated medical reporting is achievable.
ACKNOWLEDGEMENTS
We thank the students of the software project teams Kiéli and KettleHawks, Marjan van den Akker, Lennart Herlaar and Sabine Molenaar for their support.
REFERENCES

Ajami, S. (2016). Use of speech-to-text technology for documentation by healthcare providers. The National Medical Journal of India, 29(3):148-152.

Androutsopoulos, I., Lampouras, G., and Galanis, D. (2013). Generating natural language descriptions from OWL ontologies: the NaturalOWL system. JAIR, 48:671-715.

Antoniou, G., Groth, P., van Harmelen, F., and Hoekstra, R. (2012). A Semantic Web Primer. The MIT Press, Cambridge, Massachusetts, 3rd edition.

Chiu, C.-C., Tripathi, A., Chou, K., Co, C., Jaitly, N., Jaunzeikare, D., Kannan, A., Nguyen, P., Sak, H., Sankar, A., et al. (2017). Speech recognition for medical conversations. arXiv preprint arXiv:1711.07274.

Klann, J. G. and Szolovits, P. (2009). An intelligent listening framework for capturing encounter notes from a doctor-patient dialog. BMC Medical Informatics and Decision Making, 9(1).

Klock, S., van der Werf, J. M., Guelen, J. P., and Jansen, S. (2017). Workload-based clustering of coherent feature sets in microservice architectures. In ICSA, pages 11-20.

Luchies, E., Spruit, M., and Askari, M. (2018). Speech technology in Dutch health care: A qualitative study. In BIOSTEC, volume 5, pages 339-348.

Maynard, D. W. and Heritage, J. (2005). Conversation analysis, doctor-patient interaction and medical communication. Medical Education, 39(4):428-435.

Milian, K., Hoekstra, R., Bucur, A., ten Teije, A., van Harmelen, F., and Paulissen, J. (2015). Enhancing reuse of structured eligibility criteria and supporting their relaxation. Journal of Biomedical Informatics, 56:205-219.

Peleg, M. (2013). Computer-interpretable clinical guidelines: a methodological review. Journal of Biomedical Informatics, 46(4):744-763.

Portet, F., Reiter, E., Gatt, A., Hunter, J., Sripada, S., Freer, Y., and Sykes, C. (2009). Automatic generation of textual summaries from neonatal intensive care data. Artificial Intelligence, 173(7-8):789-816.

Rohloff, K., Dean, M., Emmons, I., Ryder, D., and Sumner, J. (2007). An evaluation of triple-store technologies for large data stores. In OTM Confederated International Conferences "On the Move to Meaningful Internet Systems", pages 1105-1114. Springer.

Rosse, C. and Mejino Jr, J. L. (2003). A reference ontology for biomedical informatics: the Foundational Model of Anatomy. Journal of Biomedical Informatics, 36(6):478-500.

Sutton, D. R. and Fox, J. (2003). The syntax and semantics of the PROforma guideline modeling language. Journal of the American Medical Informatics Association, 10(5):433-443.