The Role of Text Analytics in Healthcare: A Review of Recent Developments and Applications

Mahmoud Elbattah, Émilien Arnaud, Maxime Gignon and Gilles Dequen

Laboratoire MIS, Université de Picardie Jules Verne, Amiens, France

Emergency Department, Amiens-Picardy University, Amiens, France

Keywords: Text Analytics, Natural Language Processing, Unstructured Data, Healthcare Analytics.

Abstract: The implementation of Data Analytics has gained significant momentum across a very wide range of domains. Part of that progress is directly linked to the implementation of Text Analytics solutions. Organisations increasingly seek to harness the power of Text Analytics to automate the process of gleaning insights from unstructured textual data. In this respect, this study aims to provide a meeting point for discussing the state-of-the-art applications of Text Analytics in the healthcare domain in particular. It explores how healthcare providers could make use of Text Analytics for different purposes and in different contexts. To this end, the study reviews key studies published over the past 6 years in two major digital libraries, namely IEEE Xplore and ScienceDirect. In general, the study provides a selective review that spans a broad spectrum of applications and use cases in healthcare. Further aspects that could help reinforce the utilisation of Text Analytics in the healthcare arena are also discussed.

1 INTRODUCTION

“Most of the knowledge in the world in the future is going to be extracted by machines and will reside in machines” (LeCun, 2014).

The above statement describes the ever-rising abundance of data-driven knowledge, which continuously calls for further utilisation of Machine Learning (ML). By the same token, healthcare is delivered in data-rich environments where a broad variety of data can be generated at the individual and population levels. The formats of health data range from Electronic Health Records (EHR) to images, time series, and unstructured textual notes.

Data Analytics has been increasingly regarded as an enabling artefact to leverage health data for competitive advantage. Using a diversity of ML techniques, analytics has been widely utilised to summarise, explain, and gain insights into the interrelationships underlying complex datasets in novel ways. Such insights can play a positive role in various medical and operational aspects, including diagnosis, health monitoring and assessment, healthcare planning, and the management of hospitals and health services.

However, one of the key challenges for healthcare analytics is dealing with huge volumes of data in the form of unstructured text. Examples include nursing notes, clinical protocols, medical transcriptions, medical publications, and many others. In this respect, Text Analytics has increasingly come into prominence as a means of delivering benefits to health organisations in a wide range of applications.

Text Analytics, or Text Mining, is generally defined as the methodology followed to derive high-quality, actionable insights from textual data (Sarkar, 2019). Text Analytics is an overarching field of techniques and technologies, including Natural Language Processing (NLP), ML, and Information Retrieval. The power of Text Analytics lies in extracting information that allows new facts or hypotheses to be formed and explored from unstructured textual data (Hearst, 1999).

Compared to conventional analytics tasks, the obvious challenge of Text Analytics is to extract patterns from natural-language text rather than from well-structured databases. Textual data are largely stored in an unstructured form, which does not adhere to any predefined schema or data model. Furthermore, standard ML algorithms were originally designed to operate on numeric data. As such, Text Analytics needs to apply specially designed techniques and transformations to operate effectively over textual data.

Elbattah, M., Arnaud, É., Gignon, M. and Dequen, G.

The Role of Text Analytics in Healthcare: A Review of Recent Developments and Applications.

DOI: 10.5220/0010414508250832

In Proceedings of the 14th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2021) - Volume 5: HEALTHINF, pages 825-832

ISBN: 978-989-758-490-9

The potential of NLP has been repeatedly discussed in the healthcare literature (e.g. Demner-Fushman, Chapman, and McDonald, 2009; Jensen, Jensen, and Brunak, 2012; Spasić, Uzuner, and Zhou, 2020). In this respect, the main motivation for this study was to explore the recent developments and applications in this context. The study provides a selective review that spans a broad spectrum of the applications and use cases of Text Analytics, particularly in the healthcare domain.

2 REVIEW METHODOLOGY

The review aimed to explore the state-of-the-art approaches and applications of Text Analytics in the healthcare context. We were generally motivated by the following set of exploratory questions:

• What are the potential data sources for applying Text Analytics in healthcare?
• What are the recent technological advances in implementing Text Analytics in this context?
• How could Text Analytics help healthcare providers make better decisions?
• What are the challenges of integrating NLP tools into healthcare systems?
• What are the key limitations of Text Analytics in the healthcare domain?

The review comprised two main stages. The initial stage included the screening and selection of studies retrieved from the search results. Subsequently, we analysed a set of representative studies to be included in the literature review. The study sought to largely follow the procedures of a systematic literature review, as described in (Booth, Sutton, and Papaioannou, 2011).

The literature search was conducted to find relevant studies in two major digital libraries: i) IEEE Xplore, and ii) ScienceDirect. It is acknowledged that other relevant studies could have been published in other conferences or journals, but we believe that the selected libraries generally provided excellent representative studies. The review timeframe covered the past 6 years (i.e. 2015-2020).

The inclusion of studies followed a three-step process for screening and classifying studies. First, potential studies were screened based on the title. Second, the abstracts were inspected to confirm their suitability for full-text review. Finally, the decision on inclusion was made based on the full-text inspection. Figure 1 sketches a flowchart of the review process. Table 1 summarises the search strategy.

Figure 1: The process of screening and selecting studies in the review.

Table 1: Summary of search strategy.

Digital Libraries: IEEE Xplore, ScienceDirect
Search Terms: Text Analytics Healthcare; Text Mining Healthcare; NLP Healthcare
Search Items: Title, Abstract, Keywords
Types of Document: Conference Proceedings, Journal Articles
Timespan: 2015-2020
Language: English

3 REVIEW ANALYSIS

This section provides an analysis of the reviewed studies. The search results included about 200 publications overall. Following the screening and analysis process described above, 35 studies were included in the review.

The review is organised into two broad categories of Text Analytics. The first part presents selected studies that applied Text Mining in the healthcare context, while the second part describes Text Analytics in a variety of predictive applications that support clinical decision making. The review is unavoidably selective rather than exhaustive. However, we believe that the study adequately provides representative studies in each category.

Scale-IT-up 2021 - Workshop on Scaling-Up Healthcare with Conversational Agents

3.1 Text Mining Applications in Healthcare

Text Mining typically consists of two phases (Tan, 1999). The initial phase applies text refining procedures, which transform free-text documents into an intermediate form. The subsequent phase performs knowledge extraction, which attempts to learn patterns or insights from that intermediate form. This section presents selected studies that applied Text Mining with different modalities and for various purposes in the healthcare context.
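To make the two phases concrete, the following Python sketch separates text refining from knowledge extraction. It is a minimal illustration under stated assumptions: the intermediate form is a simple term-count dictionary, the stopword list is hypothetical, and the "knowledge" extracted is merely the set of terms shared by the most documents. It is not a method taken from (Tan, 1999) or from the reviewed studies.

```python
import re
from collections import Counter

# Hypothetical stopword list, used only for this illustration.
STOPWORDS = {"the", "a", "an", "of", "and", "with", "was", "is", "to", "in", "on"}

def refine(document):
    """Phase 1 (text refining): map free text to an intermediate form,
    here a count of lowercase, stopword-filtered tokens."""
    tokens = re.findall(r"[a-z]+", document.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)

def extract_knowledge(intermediate_forms, top_k=3):
    """Phase 2 (knowledge extraction): derive a simple pattern from the
    intermediate forms, here the terms that appear in the most documents."""
    document_frequency = Counter()
    for form in intermediate_forms:
        document_frequency.update(form.keys())
    # Sort by descending document frequency, then alphabetically for stable output.
    ranked = sorted(document_frequency.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]

notes = [
    "Patient reports chest pain and shortness of breath.",
    "Chest pain resolved after rest.",
    "No chest pain; mild cough reported.",
]
forms = [refine(note) for note in notes]
print(extract_knowledge(forms, top_k=2))  # [('chest', 3), ('pain', 3)]
```

Note that this naive refining step counts "pain" in the third note even though it is negated, which illustrates why clinical text pipelines usually need richer refining steps than tokenisation alone.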

(Han, Nandan, and Sun, 2015) presented a rule-based system for question retrieval. The goal was to search for similar questions in a large corpus of questions posted on online health forums. The system was mainly based on the RAKE algorithm (Rose, Engel, Cramer, and Cowley, 2010) for the automatic extraction of keywords. Additional NLP methods were applied using the popular NLTK library (Bird, Klein, and Loper, 2009).
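The core scoring idea of RAKE (Rose, Engel, Cramer, and Cowley, 2010) can be illustrated with the simplified Python sketch below: candidate phrases are delimited by stopwords and punctuation, each word is scored as its degree divided by its frequency, and a phrase scores the sum of its word scores. The tiny stopword list and the example forum question are assumptions, and this is not the implementation used by (Han, Nandan, and Sun, 2015).

```python
import re
from collections import defaultdict

# Deliberately tiny stopword list (an assumption); RAKE normally uses a long one.
STOPWORDS = {"a", "an", "and", "are", "can", "of", "the", "to", "what", "after", "for", "is", "my"}

def candidate_phrases(text):
    """Split text at punctuation and stopwords; the remaining word runs are candidates."""
    phrases = []
    for fragment in re.split(r"[.,;:!?()]", text.lower()):
        current = []
        for word in re.findall(r"[a-z0-9']+", fragment):
            if word in STOPWORDS:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    return phrases

def rake(text, top_k=3):
    """Score each candidate phrase as the sum of its words' degree/frequency ratios."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)  # co-occurrences within the phrase, including itself
    word_score = {w: degree[w] / freq[w] for w in freq}
    scores = {" ".join(p): sum(word_score[w] for w in p) for p in phrases}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]

question = "What are the side effects of metformin? Can metformin cause stomach pain after meals?"
print(rake(question))
# [('metformin cause stomach pain', 14.5), ('side effects', 4.0), ('metformin', 2.5)]
```

Because word scores are summed, longer candidate phrases tend to rank higher, which is why RAKE favours multi-word keywords; extracted keywords could then be matched against those of other questions for retrieval.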

In another application of Text Mining, a study aimed to develop automated methods for extracting information from application webpages on the iTunes App Store (Paglialonga, Riboldi, Tognola, and Caiani, 2017). The study considered around 86K applications in the Medicine and Health/Fitness categories. The authors used the NLP capabilities provided by the IBM Watson API to identify the medical specialty (e.g. cardiology, nutrition, neurology) and the type of sponsor (e.g. industry manufacturer or government organisation).

Likewise, (Paglialonga et al., 2017) applied Text Mining to automate the extraction of meaningful information about health apps on the web.

(Lieder et al., 2019) developed a system that could mine millions of public business webpages to extract a multi-faceted representation of customers. The extracted data were further enriched with external information collected from Wikipedia. In this way, a large-scale knowledge graph was constructed, including millions of interconnected entities, which could be continuously enriched and connected to new entities. The system could be applied to industry use cases, such as healthcare, to support insight discovery in real time.

In addition, several studies applied Text Mining to extract information or insights from online forums or discussions. For instance, (Sutar, 2017) presented an interesting application of Text Mining to extract healthcare-related information from user-generated content on social media. Using a dataset from a cancer-related forum, a system was developed to extract practical information such as treatments, medication names, and side effects. The dataset included a set of unstructured and semi-structured textual fields. Similarly, (Deng, Zhou, Zhang, and Abbasi, 2019) proposed a framework, named Discussion Logic-based Text Analytics (DiLTA), to support the analytics of online discussions. The DiLTA framework attempted to extract features that could reveal the discussion logic underlying online forums. The framework was evaluated through a case study related to healthcare forums.

(Martínez et al., 2016) discussed turning health-related online content into actionable knowledge using Text Mining. To this end, they developed an approach to help monitor user-generated streams on social media. An NLP-based processing pipeline was applied to extract and transform information stemming from real-time social media streams. The system could not only extract mentions of diseases and drugs but also identify useful relationships among medications, indications, and adverse drug reactions.
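A heavily simplified Python sketch of the mention-and-relationship idea is shown below. It swaps in lexicon lookup and sentence-level co-occurrence for the NLP pipeline of (Martínez et al., 2016), whose internals are not described here; the drug and effect lexicons and the example post are hypothetical.

```python
import re
from itertools import product

# Hypothetical mini-lexicons; a real pipeline would rely on curated terminologies.
DRUGS = {"ibuprofen", "aspirin"}
EFFECTS = {"headache", "nausea", "rash"}

def find_mentions(sentence, lexicon):
    """Return the sorted lexicon entries mentioned in one sentence."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return sorted({t for t in tokens if t in lexicon})

def candidate_relations(post):
    """Pair every drug with every effect mentioned in the same sentence.
    Co-occurrence alone cannot tell an indication from an adverse reaction."""
    pairs = []
    for sentence in re.split(r"(?<=[.!?])\s+", post):
        drugs = find_mentions(sentence, DRUGS)
        effects = find_mentions(sentence, EFFECTS)
        pairs.extend(product(drugs, effects))
    return pairs

post = "Took ibuprofen for my headache. Later I got a rash after aspirin!"
print(candidate_relations(post))  # [('ibuprofen', 'headache'), ('aspirin', 'rash')]
```

The first pair illustrates the limitation of this shortcut: the headache is the reason for taking the drug, not a reaction to it. Separating medications, indications, and adverse drug reactions therefore calls for a relation classifier on top of mention detection.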

(James, Calderon, and Cook, 2017) analysed unstructured textual feedback about physicians. They aimed to extract sentiments and topics pertaining to the quality of healthcare services. Specifically, they attempted to identify the tones and topics that could shape service ratings. In this regard, more than 20K patient reviews of about 4K physicians were analysed using the Latent Dirichlet Allocation (LDA) method. Further, a dictionary-based text analysis was applied to determine the tone elements in the physician reviews.
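The dictionary-based part of such an analysis can be sketched in Python as below, assuming two small, hypothetical tone lexicons. The study's actual dictionaries are not named in this review, and the LDA topic-modeling step is not shown.

```python
import re

# Hypothetical tone lexicons; a real analysis would use validated dictionaries.
POSITIVE = {"caring", "helpful", "thorough", "friendly"}
NEGATIVE = {"rude", "rushed", "dismissive", "late"}

def tone_score(review):
    """(positive words - negative words) / total words; 0.0 for an empty review."""
    words = re.findall(r"[a-z]+", review.lower())
    if not words:
        return 0.0
    positive = sum(w in POSITIVE for w in words)
    negative = sum(w in NEGATIVE for w in words)
    return (positive - negative) / len(words)

reviews = [
    "Very caring and thorough doctor.",
    "Rude staff and the doctor seemed rushed.",
]
for review in reviews:
    print(round(tone_score(review), 3), review)
# 0.4 Very caring and thorough doctor.
# -0.286 Rude staff and the doctor seemed rushed.
```

Normalising by review length keeps short and long reviews comparable, so tone scores can be related to star ratings across reviews of different lengths.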

(Pendyala and Figueira, 2017) explored the potential of Text Mining for automating medical diagnosis. The study applied the Bag-of-Words representation to medical documents. The Bag-of-Words model simplifies the text representation by building a histogram of the words, in which each word count is treated as a feature (Goldberg, 2017). As such, each document can be simply represented as a “bag” of words, disregarding the order, sequence, and grammar of the text. Although a small dataset was used, their experiments demonstrated promising results for that application.

More recently, (van Dijk et al., 2020) applied Text Mining to EHR data to validate the screening eligibility of trial patients. The study was conducted across multiple centres and multiple EHR systems. The accuracy of the Text-Mining approach was compared with the standard process carried out by research personnel, and the accuracy of the automatically extracted data was about 88.0%.
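Returning to the Bag-of-Words representation used by (Pendyala and Figueira, 2017), the following minimal Python sketch builds a corpus vocabulary and turns each document into a vector of word counts. The two toy documents are assumptions; in practice a library vectoriser would normally be used.

```python
import re

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def build_vocabulary(documents):
    """Assign each distinct corpus word a feature index (sorted for reproducibility)."""
    words = sorted({w for doc in documents for w in tokenize(doc)})
    return {w: i for i, w in enumerate(words)}

def bag_of_words(document, vocabulary):
    """Word-count histogram over the vocabulary; order and grammar are discarded."""
    vector = [0] * len(vocabulary)
    for w in tokenize(document):
        if w in vocabulary:  # out-of-vocabulary words are ignored
            vector[vocabulary[w]] += 1
    return vector

docs = ["fever and cough", "cough cough and fatigue"]
vocab = build_vocabulary(docs)
print(vocab)                                   # {'and': 0, 'cough': 1, 'fatigue': 2, 'fever': 3}
print([bag_of_words(d, vocab) for d in docs])  # [[1, 1, 0, 1], [1, 2, 1, 0]]
print(bag_of_words("cough and fever", vocab) == bag_of_words("fever and cough", vocab))  # True
```

The last line shows the property described above: reordering the words of a document leaves its vector unchanged.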

(Chang et al., 2016) developed a workflow using Text Mining to search, extract, and synthesise information about Comparative Effectiveness Research (CER) in healthcare. The study included the development of an NLP-based pipeline to extract information from unstructured CER data sources. The Text-Mining solution could allow for the generation of timely alerts as well as the collection of systematic reviews. Their approach was evaluated using trial data from multiple sources, including ClinicalTrials.gov, the WHO International Clinical Trials Registry Platform (ICTRP), and Citeline Trialtrove.

Other contributions focused on exploiting Text Mining techniques for extracting concepts and association rules from the scholarly literature. For instance, (Kumari and Mahalakshmi, 2019) applied Text Mining to a subset of the biomedical literature on PubMed, aiming to discover information related to the phytochemical properties of medicinal plants. In another application, (Ji, Tian, Shen, and Tran, 2016) developed a scalable approach to extract associations among biomedical concepts in scientific articles. Biomedical concepts were derived by matching the text elements with the Unified Medical Language System (UMLS) thesaurus, and a MapReduce-based algorithm was used to calculate the strength of associations. The experimental dataset included about 34K full-text articles. Their results generally demonstrated that meaningful association rules were highly ranked.
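The strength of association between co-occurring concepts can be sketched as below, with a map step that emits concept pairs per article and a reduce step that merges the counts. This is a single-process Python illustration, not the MapReduce algorithm of (Ji, Tian, Shen, and Tran, 2016): support and confidence are assumed measures (the review does not state which measure was used), and the concept sets stand in for hypothetical UMLS matches.

```python
from collections import Counter
from functools import reduce
from itertools import combinations

def map_article(concepts):
    """Map step: emit each unordered pair of concepts co-occurring in one article."""
    return Counter(combinations(sorted(set(concepts)), 2))

def merge_counts(left, right):
    """Reduce step: merge partial pair counts."""
    return left + right

def association_rules(articles):
    """Return (antecedent, consequent, support, confidence) tuples, strongest first."""
    n_articles = len(articles)
    pair_counts = reduce(merge_counts, map(map_article, articles), Counter())
    concept_counts = Counter(c for article in articles for c in set(article))
    rules = []
    for (x, y), n_xy in pair_counts.items():
        for a, b in ((x, y), (y, x)):
            support = n_xy / n_articles            # share of articles mentioning both
            confidence = n_xy / concept_counts[a]  # P(b | a) over articles
            rules.append((a, b, support, confidence))
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))

# Hypothetical concept sets, standing in for UMLS matches in four articles.
articles = [
    ["insulin", "glucose"],
    ["insulin", "glucose", "obesity"],
    ["obesity", "hypertension"],
    ["glucose"],
]
for rule in association_rules(articles)[:3]:
    print(rule)
```

On a cluster, the map calls would run in parallel over partitions of the article collection and the reduce step would merge the partial counts, which is what allows this style of computation to scale to tens of thousands of full-text articles.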

Recent studies have considered more sophisticated implementations based on Bidirectional Encoder Representations from Transformers (BERT), a state-of-the-art NLP model (Devlin, Chang, Lee, and Toutanova, 2019). The BERT approach brings the advantage of allowing pre-trained models to be fine-tuned for a broad set of NLP tasks. In this regard, (Peterson, Jiang, and Liu, 2020) developed a framework for transforming free-text descriptions into a standardised form based on the Health Level 7 (HL7) standards. They utilised a combination of domain-specific knowledge bases in tandem with BERT models, and demonstrated that the BERT-based language representation contributed significantly to the model performance. Likewise, the literature includes recent contributions that made use of the BERT approach for a variety of Text Mining tasks, such as (Fan, Fan, and Smith, 2020), (Liao et al., 2020), and (Vinod et al., 2020).

Furthermore, a major part of the recent contributions has been positioned in the COVID-19 context. For instance, (Jelodar, Wang, Orji, and Huang, 2020) used Text Mining to extract COVID-19 discussions from social media. They applied topic modeling to public opinions to gain insights into the various issues pertaining to the COVID-19 pandemic. In addition, they implemented an LSTM model for the sentiment classification of comments. Meanwhile, (Bharti et al., 2020) developed a multilingual conversational bot to provide primary healthcare education, information, and advice to chronic patients. Using NLP methods, the chatbot was intended to act as a personal virtual doctor that interacts with patients like a human being.

3.2 Text Analytics for Clinical Decision Support

(Tvardik et al., 2018) developed a Text-Analytics solution for the automatic detection of medical events using EHR data. The textual records included data collected from three university hospitals in France over the period from October 2009 to December 2010. The dataset spanned a variety of surgical specialities, including neurosurgery, orthopaedic surgery, and digestive surgery. The system performance was compared with standard methods, and the overall sensitivity and specificity were both about 84%. The study generally confirmed the feasibility of using NLP-based methods to automate the detection and monitoring of healthcare-associated events in hospital facilities.
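As a reminder of what these two figures measure, the Python sketch below computes sensitivity and specificity from confusion-matrix counts. The counts are hypothetical, chosen only so that both values come out at 0.84; they are not data from (Tvardik et al., 2018).

```python
def sensitivity(tp, fn):
    """True positive rate: share of actual events that the system detected."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True negative rate: share of non-events that the system correctly did not flag."""
    return tn / (tn + fp)

# Hypothetical confusion-matrix counts, not data from the study.
tp, fn, tn, fp = 84, 16, 420, 80
print(f"sensitivity={sensitivity(tp, fn):.2f}, specificity={specificity(tn, fp):.2f}")
# sensitivity=0.84, specificity=0.84
```

Reporting both values matters for event detection: a system could trivially reach high sensitivity by flagging every record, which specificity would expose.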

In another interesting application, (Brown and Marotta, 2017) developed a set of classification models to predict the protocol and priority of MRI brain examinations based on the narrative clinical information provided by clinicians. The models were trained to make predictions on three tasks: i) selection of examination protocols, ii) evaluation of the need for contrast administration, and iii) estimation of priority. The dataset consisted of about 14K MRI brain examinations over the period from January 2013 to June 2015. The empirical results largely demonstrated that the models could be effectively employed to support clinical decision making in this regard.

In the context of radiology, several studies explored the application of NLP methods to extract information from mammography reports. For example, (Castro et al., 2017) developed a system to automate the annotation and classification of Breast Imaging Reporting and Data System (BI-RADS) categories. Specifically, the system tackled two tasks: i) annotation of the BI-RADS categories, and ii) classification of the laterality for each BI-RADS category. The study included about 2K radiology reports collected from 18 hospitals of the University of Pittsburgh from 2003 to 2015. Similarly, (Miao et al., 2018) applied Deep Learning to extract the BI-RADS categories from breast ultrasound reports in Chinese. The experiments used a dataset of 540 manually annotated reports, and the model achieved an F1-score of 0.904.
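The F1-score reported by (Miao et al., 2018) is the harmonic mean of precision and recall. The Python sketch below computes it from hypothetical true positive, false positive, and false negative counts (not the study's data) and checks it against the equivalent closed form 2TP / (2TP + FP + FN).

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts for illustration only.
tp, fp, fn = 45, 5, 15
f1 = f1_score(tp, fp, fn)
closed_form = 2 * tp / (2 * tp + fp + fn)
print(round(f1, 3))  # 0.818
assert abs(f1 - closed_form) < 1e-12
```

Here precision is 0.9 and recall is 0.75; the harmonic mean is pulled towards the lower of the two, so a high F1-score requires both to be high.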

(Afzal et al., 2018) applied NLP for the automatic identification of critical limb ischemia (CLI). The dataset included narrative clinical notes retrieved from the EHR database. The model performance was validated against the human abstraction of clinical notes; specifically, a physician reviewed and interpreted the information in the EHR data for each patient in the dataset. Overall, the method achieved an excellent F1-score of about 90%.

Using a Text-Analytics approach, (Carchiolo et al., 2019) proposed a system for the automatic classification of medical prescriptions (i.e. grantable or not). Initially, the textual data were scanned from medical prescription documents. The authors developed an effective classifier based on patient and doctor personal data, symptoms, pathology, diagnosis, and suggested treatments. According to their results, only 5% of the prescriptions could not be automatically classified.

Another recent study developed a framework to realise scalable Text Analytics (Ge, Isah, Zulkernine, and Khan, 2019). The framework aimed to support real-time analytics for decision support in a variety of domains, such as healthcare. Deep Learning was applied for NLP tasks including language understanding and sentiment analysis. The framework utilised a set of open-source tools, including Spark Streaming for real-time text processing along with Zeppelin and Banana for data visualisation. In addition, an LSTM model was trained for sentiment analysis. The authors demonstrated the functionality of the framework using a scenario with Twitter data.

(Kidwai and Nadesh, 2020) discussed the application of diagnostic chatbots in healthcare. They developed a chatbot that makes use of NLP methods to understand user queries. After collecting the initial symptoms, the chatbot guides the user through a sequence of questions towards the appropriate diagnosis. The system uses decision trees and follows a top-down approach to reach the diagnosis. The chatbot was tested using a medical database of about 150 diseases.
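A top-down decision-tree dialogue of this kind can be sketched in Python as below. The tree, its questions, and the placeholder outcomes ("condition A" and so on) are hypothetical. In the chatbot described by (Kidwai and Nadesh, 2020), the answers would come from NLP interpretation of the user's free-text replies rather than from the scripted dictionary used here.

```python
from dataclasses import dataclass
from typing import Callable, Union

@dataclass
class Question:
    """Internal node: a yes/no question with a subtree for each answer."""
    text: str
    yes: "Node"
    no: "Node"

# A leaf is simply a (hypothetical) diagnosis label.
Node = Union[Question, str]

def diagnose(node: Node, answer: Callable[[str], bool]) -> str:
    """Walk the tree top-down, asking one question per level until a leaf is reached."""
    while isinstance(node, Question):
        node = node.yes if answer(node.text) else node.no
    return node

TREE = Question(
    "Do you have a fever?",
    yes=Question("Do you have a cough?", yes="condition A", no="condition B"),
    no="condition C",
)

scripted_answers = {"Do you have a fever?": True, "Do you have a cough?": False}
print(diagnose(TREE, scripted_answers.get))  # condition B
```

Passing the answer source as a function keeps the tree logic independent of how replies are obtained, so a scripted test, a console prompt, or an NLP intent classifier could be plugged in without changing `diagnose`.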

While many studies have sought to develop predictive models to help streamline hospital admissions, a growing number of contributions have attempted to utilise unstructured data such as free-text notes made by nurses or physicians at the Emergency Department (ED). For instance, (Sterling, Patzer, Di, and Schrager, 2019) utilised the bag-of-words representation of triage free-text notes. Using a dataset of over 250K ED visits, neural network models were trained to predict hospital admissions, achieving a promising ROC-AUC of about 0.74. Further, (Chen et al., 2020) compared the performance of ML models with and without the inclusion of textual elements. They applied Deep Learning along with Word Embeddings to clinical narratives, and showed that model performance generally improved with the addition of free-text fields.
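Since the results above are reported as ROC-AUC, the Python sketch below shows what the metric measures: the probability that a randomly chosen positive case (here, an admitted patient) receives a higher predicted score than a randomly chosen negative case, with ties counted as one half. The labels and scores are hypothetical; libraries such as scikit-learn provide an equivalent `roc_auc_score` function.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical triage predictions: 1 = admitted, 0 = discharged.
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.2, 0.7, 0.8]
print(round(roc_auc(labels, scores), 3))  # 0.667
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which puts the reported value of about 0.74 in context. The pairwise loop is quadratic and only suitable for illustration.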

Similarly, (Arnaud, Elbattah, Gignon, and Dequen, 2020) presented an approach based on integrating structured data with unstructured textual notes recorded at the triage stage. The key idea was to train a multi-input classification model on mixed data to predict hospitalisation. On one hand, a standard Multi-Layer Perceptron (MLP) model was used with the standard set of features (i.e. numeric and categorical). On the other hand, a Convolutional Neural Network (CNN) was used to operate over the textual data. Their empirical results demonstrated that the classifier could achieve a very good ROC-AUC of about 0.83.

The use of ontologies has also drawn attention in a variety of medical and healthcare applications. For example, (Chakrabarty and Roy, 2016) used ontology alignment for the personalisation of cancer treatment. A patient ontology was mapped to a disease ontology to dynamically transform general treatment options into individual intervention plans personalised for the patient. In another application, (Comelli, Agnello, and Vitabile, 2015) proposed an ontology-based indexing and retrieval system for mammography reports. Using an improved radiological ontology, medical terms were organised in a hierarchy, which allowed the semantic similarity between unstructured reports to be measured. The system was tested using a dataset of 126 mammographic reports in Italian, provided by the University Hospital of Palermo Policlinico.
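The idea of measuring semantic similarity through a term hierarchy can be illustrated with the Wu-Palmer measure, sketched in Python below. Both the measure and the small hierarchy fragment are assumptions for illustration; the review does not state which similarity measure (Comelli, Agnello, and Vitabile, 2015) used, nor the structure of their ontology.

```python
# Hypothetical fragment of a radiological term hierarchy (child -> parent).
PARENT = {
    "breast finding": "finding",
    "mass": "breast finding",
    "calcification": "breast finding",
    "spiculated mass": "mass",
    "oval mass": "mass",
    "microcalcification": "calcification",
}

def path_to_root(term):
    """Return the term followed by all of its ancestors, ending at the root."""
    path = [term]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def depth(term):
    return len(path_to_root(term))  # the root has depth 1

def wu_palmer(a, b):
    """Wu-Palmer similarity: 2 * depth(LCS) / (depth(a) + depth(b))."""
    ancestors_b = set(path_to_root(b))
    # Lowest common subsumer: the first ancestor of a that is also an ancestor of b.
    lcs = next(t for t in path_to_root(a) if t in ancestors_b)
    return 2 * depth(lcs) / (depth(a) + depth(b))

print(wu_palmer("spiculated mass", "oval mass"))           # 0.75
print(wu_palmer("spiculated mass", "microcalcification"))  # 0.5
```

Terms sharing a deeper common ancestor score higher. The sketch assumes a single root, so every pair has a common subsumer; comparing whole reports would additionally require aggregating term-level scores, a step omitted here.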

Furthermore, part of the recent efforts explored the applicability of Text Analytics to predict International Classification of Diseases (ICD) codes. The manual encoding process is usually time-consuming and prone to various errors. In this regard, (Teng et al., 2020) applied medical topic mining and Deep Learning to automatically predict ICD codes from free-text medical records. The study used the MIMIC-III dataset, which provides a large, freely accessible repository of ICU records (Johnson et al., 2016). The reported results indicated that their method could increase the F1-score by approximately 5% compared to earlier work. Similarly, (Gangavarapu et al., 2020) developed an approach to help predict ICD-9 code groups based on unstructured nursing notes. They applied vector space and topic modeling to structure the raw clinical data, which allowed for capturing the semantic information in the free-text notes.
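A minimal vector-space sketch of multi-label code-group prediction is shown below: notes are represented as TF-IDF vectors, and a new note receives every code group carried by a majority of its k most similar training notes under cosine similarity. This k-nearest-neighbour scheme is a swapped-in illustration, not the fuzzy similarity-based method of (Gangavarapu et al., 2020), and the notes and group labels are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def fit_tfidf(documents):
    """Return TF-IDF vectors (as dicts) for the documents, plus the IDF table."""
    counts = [Counter(tokenize(d)) for d in documents]
    df = Counter(w for c in counts for w in c)
    n = len(documents)
    idf = {w: math.log(n / df[w]) + 1.0 for w in df}  # +1 keeps common words from vanishing
    vectors = [{w: tf * idf[w] for w, tf in c.items()} for c in counts]
    return vectors, idf

def cosine(u, v):
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def predict_code_groups(note, train_notes, train_labels, k=3):
    """Assign each code group held by a majority of the k nearest training notes."""
    vectors, idf = fit_tfidf(train_notes)
    query = {w: tf * idf[w] for w, tf in Counter(tokenize(note)).items() if w in idf}
    nearest = sorted(range(len(train_notes)), key=lambda i: -cosine(query, vectors[i]))[:k]
    votes = Counter(label for i in nearest for label in train_labels[i])
    return sorted(label for label, v in votes.items() if v > k / 2)

# Hypothetical training notes and code-group labels (illustration only).
train_notes = [
    "chest pain and elevated blood pressure",
    "shortness of breath with wheezing",
    "blood pressure high, chest discomfort, wheezing",
    "productive cough and wheezing",
]
train_labels = [{"circulatory"}, {"respiratory"}, {"circulatory", "respiratory"}, {"respiratory"}]

print(predict_code_groups("wheezing and chest pain", train_notes, train_labels))
# ['circulatory', 'respiratory']
```

Because each training note can carry several groups, the prediction is naturally multi-label: the example note mentions both chest pain and wheezing and receives both hypothetical groups.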

4 DISCUSSION

Over the past few years, there have been pronounced innovations in NLP research, including novel approaches and technologies, which in turn have resonated in the healthcare domain. Most remarkably, Deep Learning has been increasingly applied for developing large-scale language models. Deep CNN architectures have introduced a potent mechanism for learning feature representations from raw data automatically (LeCun et al., 1989; LeCun, Bottou, Bengio, and Haffner, 1998). Equally important, recent applications have started to adopt the BERT-based approach, which leverages Transfer Learning for NLP tasks. Furthermore, scalable analytics platforms, such as Apache Spark and IBM Watson, have been utilised for real-time data processing.

In terms of data sources, Text Analytics appears to have been applied to a broad variety of healthcare data. The datasets ranged from standard EHR datasets, medical reports, and free-text notes to scientific literature and user-generated content on online forums or social media. In this regard, Text Analytics was implemented for a considerable range of problems, such as extracting evidence-based care interventions and patient outcomes, or identifying at-risk populations. To this end, NLP pipelines have been intensively developed for a variety of text-processing tasks, such as: i) named entity recognition, ii) topic modeling, iii) semantic labelling, iv) relationship extraction, v) question answering, vi) text summarisation, and vii) sentiment analysis, among others.

Nevertheless, a set of hurdles stands in the way of a widespread implementation of Text Analytics in the healthcare domain. A key challenge is the availability of quality data, which is a fundamental factor for building robust NLP models, and for ML in general. Beyond that, underlying data biases pose multiple ethical concerns for the deployment of NLP models. Such ethical issues have been recently discussed in the literature (e.g. Davenport and Kalakota, 2019; Baclic et al., 2020). Other technical challenges may relate to the integration of Text Analytics tools with existing healthcare systems. Conventional IT systems may not be well-poised to be integrated with sophisticated Text Analytics, which requires an advanced infrastructure and a highly technical skillset. Furthermore, the implementation of Text Analytics typically requires intensive development cycles.

In summary, we believe that the future holds many interesting opportunities for implementing Text Analytics in a multitude of healthcare applications. The need to leverage unstructured textual data should open up new practical areas for taking advantage of the potential of Text Analytics.

5 CONCLUSIONS

There is an obvious need to leverage unstructured textual data to support healthcare operations in many aspects. A large proportion of clinical data is unavoidably stored in unstructured or semi-structured documents or notes. Text Analytics should therefore play a key role in transforming textual data into actionable insights.

This study endeavoured to review the state-of-the-art applications of Text Analytics in healthcare. In this regard, the applications could be broadly summarised as follows:

• Information extraction from free-text data stored in EHR databases, clinical reports, nursing notes, scientific literature, and user-generated content.
• Applying vector-based representations to a variety of clinical documents, which transforms the textual data into a form amenable to ML.
• Sequence-based modeling to address tasks, such as sentiment analysis, using notes in clinical reports or comments posted on online forums.
• Predictive analytics applications to support clinical decision making.
• Implementations of Conversational AI technologies that use chatbots to interact with patients in a human-like way.


REFERENCES

Afzal, N., Mallipeddi, V. P., Sohn, S., Liu, H., Chaudhry, R., Scott, C. G., ... & Arruda-Olson, A. M. (2018). Natural language processing of clinical notes for identification of critical limb ischemia. International Journal of Medical Informatics, 111, 83-89.

Arnaud, E., Elbattah, M., Gignon, M., & Dequen, G. (2020). Deep learning to predict hospitalization at triage: Integration of structured data and unstructured text. In Proceedings of the IEEE International Conference on Big Data.

Baclic, O., Tunis, M., Young, K., Doan, C., Swerdfeger, H., & Schonfeld, J. (2020). Challenges and opportunities for public health made possible by advances in natural language processing. Canada Communicable Disease Report, 46(6), 161-168.

Bharti, U., Bajaj, D., Batra, H., Lalit, S., Lalit, S., & Gangwani, A. (2020). Medbot: Conversational artificial intelligence powered chatbot for delivering tele-health after COVID-19. In Proceedings of the 2020 5th International Conference on Communication and Electronics Systems (ICCES), pp. 870-875. IEEE.

Bird, S., Klein, E., & Loper, E. (2009). Natural language processing with Python: analyzing text with the natural language toolkit. O'Reilly Media, Inc.

Booth, A., Sutton, A., & Papaioannou, D. (2011). Systematic approaches to a successful literature review. Sage.

Brown, A. D., & Marotta, T. R. (2017). A natural language processing-based model to automate MRI brain protocol selection and prioritization. Academic Radiology, 24(2), 160-166.

Carchiolo, V., Longheu, A., Reitano, G., & Zagarella, L. (2019). Medical prescription classification: A NLP-based approach. In Proceedings of the 2019 Federated Conference on Computer Science and Information Systems (FedCSIS), pp. 605-609. IEEE.

Castro, S. M., Tseytlin, E., Medvedeva, O., Mitchell, K., Visweswaran, S., Bekhuis, T., & Jacobson, R. S. (2017). Automated annotation and classification of BI-RADS assessment from radiology reports. Journal of Biomedical Informatics, 69, 177-187.

Chakrabarty, A., & Roy, S. (2016). Personalizing healthcare services to support decision making in treatment of cancer patients using ontology alignment. In Proceedings of the India International Conference on Information Processing (IICIP), pp. 1-6. IEEE.

Chang, M., Chang, M., Reed, J. Z., Milward, D., Xu, J. J., & Cornell, W. D. (2016). Developing timely insights into comparative effectiveness research with a text-mining pipeline. Drug Discovery Today, 21(3), 473-480.

Chen, C. H., Hsieh, J. G., Cheng, S. L., Lin, Y. L., Lin, P. H., & Jeng, J. H. (2020). Emergency department disposition prediction using a deep neural network with integrated clinical narratives and structured data. International Journal of Medical Informatics, 104146.

Comelli, A., Agnello, L., & Vitabile, S. (2015). An ontology-based retrieval system for mammographic reports. In Proceedings of the 2015 IEEE Symposium on Computers and Communication (ISCC), pp. 1001-1006. IEEE.

Davenport, T., & Kalakota, R. (2019). The potential for artificial intelligence in healthcare. Future Healthcare Journal, 6(2), 94.

Demner-Fushman, D., Chapman, W. W., & McDonald, C. J. (2009). What can natural language processing do for clinical decision support? Journal of Biomedical Informatics, 42(5), 760-772.

Deng, S., Zhou, Y., Zhang, P., & Abbasi, A. (2019). Using discussion logic in analyzing online group discussions: A text mining approach. Information & Management, 56(4), 536-551.

Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT).

Fan, B., Fan, W., & Smith, C. (2020). Adverse drug event detection and extraction from open data: A deep learning approach. Information Processing & Management, 57(1), 102131.

Gangavarapu, T., Jayasimha, A., Krishnan, G. S., & Kamath, S. (2020). Predicting ICD-9 code groups with fuzzy similarity based supervised multi-label classification of unstructured clinical nursing notes. Knowledge-Based Systems, 190, 105321.

Ge, S., Isah, H., Zulkernine, F., & Khan, S. (2019). A scalable framework for multilevel streaming data analytics using deep learning. In Proceedings of the IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), Vol. 2, pp. 189-194. IEEE.

Goldberg, Y. (2017). Neural network methods for natural language processing. In G. Hirst (Ed.), Synthesis Lectures on Human Language Technologies, 10(1), p. 69. Morgan & Claypool Publishers.

LeCun, Y., Boser, B. E., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W. E., & Jackel, L. D. (1989). Handwritten digit recognition with a back-propagation network. In Proceedings of Advances in Neural Information Processing Systems (NIPS), pp. 396-404.

LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278-2324.

LeCun, Y. (2014). Chapter 3: Facebook. In S. Gutierrez (Ed.), Data Scientists at Work. Apress.

Liao, Z., Liu, L., Wu, Q., Teney, D., Shen, C., van den Hengel, A., & Verjans, J. (2020). Medical Data Inquiry Using a Question Answering Model. In Proceedings of the 17th IEEE International Symposium on Biomedical Imaging (ISBI), pp. 1490-1493. IEEE.

Lieder, I., Segal, M., Avidan, E., Cohen, A., & Hope, T. (2019). Learning a faceted customer segmentation for discovering new business opportunities at Intel. In Proceedings of the IEEE International Conference on Big Data, pp. 6136-6138. IEEE.

Han, J., Nandan, N., & Sun, A. (2015). Did You Know? A Rule-Based Approach to Finding Similar Questions on Online Health Forums. In Proceedings of the 2015 International Conference on Healthcare Informatics, pp. 513-514. IEEE.

Hearst, M. A. (1999). Untangling text data mining. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, pp. 3-10.

James, T. L., Calderon, E. D. V., & Cook, D. F. (2017). Exploring patient perceptions of healthcare service quality through analysis of unstructured feedback. Expert Systems with Applications, 71, 479-492.

Jelodar, H., Wang, Y., Orji, R., & Huang, H. (2020). Deep sentiment classification and topic discovery on novel coronavirus or COVID-19 online discussions: NLP using LSTM recurrent neural network approach. IEEE Journal of Biomedical and Health Informatics, 24(10), 2733-2742.

Jensen, P. B., Jensen, L. J., & Brunak, S. (2012). Mining electronic health records: towards better research applications and clinical care. Nature Reviews Genetics, 13(6), 395-405.

Ji, Y., Tian, Y., Shen, F., & Tran, J. (2016). Leveraging MapReduce to efficiently extract associations between biomedical concepts from large text data. Microprocessors and Microsystems, 46, 202-210.

Johnson, A. E., Pollard, T. J., Shen, L., Lehman, L.-W. H., Feng, M., Ghassemi, M., ... & Mark, R. G. (2016). MIMIC-III, a freely accessible critical care database. Scientific Data, 3, 160035.

Kidwai, B., & Nadesh, R. K. (2020). Design and development of diagnostic Chabot for supporting primary health care systems. Procedia Computer Science, 167, 75-84.

Kumari, B. N., & Mahalakshmi, G. S. (2019). A cloud based knowledge discovery framework, for medicinal plants from PubMed literature. Informatics in Medicine Unlocked, 16, 100226.

Martínez, P., Martínez, J. L., Segura-Bedmar, I., Moreno-Schneider, J., Luna, A., & Revert, R. (2016). Turning user generated health-related content into actionable knowledge through text analytics services. Computers in Industry, 78, 43-56.

Miao, S., Xu, T., Wu, Y., Xie, H., Wang, J., Jing, S., ... & Shan, T. (2018). Extraction of BI-RADS findings from breast ultrasound reports in Chinese using deep learning approaches. International Journal of Medical Informatics, 119, 17-21.

Paglialonga, A., Riboldi, M., Tognola, G., & Caiani, E. G. (2017). Automated identification of health apps' medical specialties and promoters from the store webpages. In Proceedings of the E-Health and Bioengineering Conference (EHB), pp. 197-200. IEEE.

Paglialonga, A., Pinciroli, F., Tognola, G., Barbieri, R., Caiani, E. G., & Riboldi, M. (2017). e-Health solutions for better care: Characterization of health apps to extract meaningful information and support users' choices. In Proceedings of the 3rd International Forum on Research and Technologies for Society and Industry (RTSI), pp. 1-6. IEEE.

Pendyala, V. S., & Figueira, S. (2017). Automated medical diagnosis from clinical data. In Proceedings of the IEEE Third International Conference on Big Data Computing Service and Applications (BigDataService), pp. 185-190. IEEE.

Peterson, K. J., Jiang, G., & Liu, H. (2020). A corpus-driven standardization framework for encoding clinical problems with HL7 FHIR. Journal of Biomedical Informatics, 110, 103541.

Rose, S., Engel, D., Cramer, N., & Cowley, W. (2010). Automatic keyword extraction from individual documents. Text Mining: Applications and Theory, 1, 1-20.

Sarkar, D. (2019). Text analytics with Python: A practitioner's guide to natural language processing. Apress.

Spasić, I., Uzuner, Ö., & Zhou, L. (2020). Emerging clinical applications of text analytics. International Journal of Medical Informatics, 134.

Sterling, N. W., Patzer, R. E., Di, M., & Schrager, J. D. (2019). Prediction of emergency department patient disposition based on natural language processing of triage notes. International Journal of Medical Informatics, 129, 184-188.

Sutar, S. G. (2017). Intelligent data mining technique of social media for improving health care. In Proceedings of the 2017 International Conference on Intelligent Computing and Control Systems (ICICCS), pp. 1356-1360. IEEE.

Tan, A. H. (1999). Text mining: The state of the art and the challenges. In Proceedings of the 1999 PAKDD Workshop on Knowledge Discovery from Advanced Databases, Vol. 8, pp. 65-70.

Teng, F., Ma, Z., Chen, J., Xiao, M., & Huang, L. (2020). Automatic medical code assignment via deep learning approach for intelligent healthcare. IEEE Journal of Biomedical and Health Informatics, 24(9), 2506-2515.

Tvardik, N., Kergourlay, I., Bittar, A., Segond, F., Darmoni, S., & Metzger, M. H. (2018). Accuracy of using natural language processing methods for identifying healthcare-associated infections. International Journal of Medical Informatics, 117, 96-102.

van Dijk, W. B., Fiolet, A. T., Schuit, E., Sammani, A., Groenhof, T. K. J., van der Graaf, R., ... & Grobbee, D. E. (2020). Text-mining in electronic healthcare records can be used as efficient tool for screening and data-collection in cardiovascular trials: a multicenter validation study. Journal of Clinical Epidemiology. https://doi.org/10.1016/j.jclinepi.2020.11.014

Vinod, P., Safar, S., Mathew, D., Venugopal, P., Joly, L. M., & George, J. (2020). Fine-tuning the BERTSUMEXT model for Clinical Report Summarization. In Proceedings of the 2020 International Conference for Emerging Technology (INCET), pp. 1-7. IEEE.

Scale-IT-up 2021 - Workshop on Scaling-Up Healthcare with Conversational Agents
