Illegitimate HIS Access by Healthcare Professionals Detection System

Applying an Audit Trail-based Model

Liliana S

a-Correia

1 a

, Manuel E. Correia

2 b

and Ricardo Cruz-Correia

1,3 c

HLTSYS, HealthySystems, lda, Porto, Portugal

Faculdade de Ci

encias da Universidade do Porto, Portugal

Center for Research in Health Technologies and Information Systems - CINTESIS, Porto, Portugal

Keywords:

Data Breach, Data Protection, Health Data Access, Illegitimate Access Detection.

Abstract:

Complex data management on healthcare institutions makes very hard to identify illegitimate accesses which

is a serious issue. We propose to develop a system to detect accesses with suspicious behavior for further

investigation. We modeled use cases (UC) and sequence diagrams (SD) showing the data ﬂow between users

and systems. The algorithms represented by activity diagrams apply rules based on professionals’ routines,

use data from an audit trail (AT) and classify accesses as suspicious or normal. The algorithms were evaluated

between 23rd and 31st July 2019. The results were analyzed using absolute and relative frequencies and

dispersion measures. Access classiﬁcation was in accordance to rules applied. “Check time of activity”

UC had 64,78% of suspicious classiﬁcations, being 55% of activity period shorter and 9,78% longer than

expected, “Check days of activity” presented 2,27% of suspicious access and “EHR read access” 79%, the

highest percentage of suspicious accesses. The results show the ﬁrst picture of HIS accesses. Deeper analysis

to evaluate algorithms sensibility and speciﬁcity should be done. Lack of more detailed information about

professionals’ routines and systems, and low quality of systems logs are some limitations. Although we

believe this is an important step in this ﬁeld.

1 INTRODUCTION

Healthcare institutions typically imply complex data

management processes, where a professional can

have multiple roles during a certain period of time

(physician, researcher, head of department), leading

him or her to access many different patients’ Eletronic

Health Record (EHR), that in its turn are accessed by

many professionals for different reasons. This com-

plexity makes very hard distinguishing the legitimate

accesses from the non-legitimate ones and it is be-

coming a serious issue for healthcare institutions to

solve. Although audit trails (AT) are an important tool

for some General Data Protection Regulation (GDPR)

requirements’ complience (EU, 2016), like audit and

traceability (Gonc¸alves-Ferreira et al., 2018), we be-

live that they can have an important role on detec-

tion of suspicious actions on EHR that can be illegit-

imate access. Previous studies show that despite the

https://orcid.org/0000-0001-6174-3957

https://orcid.org/0000-0002-2348-8075

https://orcid.org/0000-0002-3764-5158

complex environment of data management on health-

care providers it is possible to create rules associated

to routines of healthcare professionals and to model

their access to Health Information Systems (HIS)

through use cases (UC). Taking advantage of infor-

mation collected on previous investigation (L.Correia

et al., 2019) we propose to implement algorithms for

detection of suspicious actions on HIS by healthcare

professionals giving clues for further investigation by

the Data Protection Ofﬁcer (DPO) and to ensure the

patients data privacy.

2 METHODS

Parting from previous studies (L.Correia et al., 2019)

in which were modeled UC for scenarios that de-

scribed situations of, or that could lead to, illegitimate

access, we selected three to implement algorithms for

detection of suspicious activity. The choice was based

on the available logs in the AT of a hospital from

North Portugal after an analysis of variables needed

Sá-Correia, L., Correia, M. and Cruz-Correia, R.

Illegitimate HIS Access by Healthcare Professionals Detection System Applying an Audit Trail-based Model.

DOI: 10.5220/0008991505390546

In Proceedings of the 13th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2020) - Volume 5: HEALTHINF, pages 539-546

ISBN: 978-989-758-398-8; ISSN: 2184-4305

539

for each UC. Since we had logs just from the ap-

plicational system Obscare we excluded the UC that

depend other type of logs. The rules and tresholds

applyed to algorithms were based on the information

gathered on discussions with experts and interviews

to healthcare professionals (L.Correia et al., 2019).

We used the Uniﬁed Model Language (UML) to de-

sign the UC and activity diagrams (AD), and coded in

JAVA programming language. Tests were conducted

between 23rd and 31st of July 2019, with logs of one

applicational system - Obscare that were being col-

lect by the AT HS.REGISTER on an hospital from

North Portugal. We analysed the obtained datasets in

order to ﬁnd erros on dates and calculations, incon-

sistencies and access misclassiﬁcations. For each, it

was removed duplicated records and it was analysed

the impact of N/A existence. For the dataset of UC

“Check time of activity” was produced a summary

table by professional category with the metrics: (1)

total of results, (2) number of professionals without

identiﬁcation, (3) minimum time of activity, (4) 1st

and 3rd quartiles and median values of time of activ-

ity, (5) mean of time of activity, (6) maximum time

of activity; (7) standard deviation and (8) number of

results classiﬁed as “suspicious”. For the dataset of

UC “Check days of activity” we produced a summary

table by professional category with the metrics: (1)

total of results, (2) minimum days of activity, (3) 1st

and 3rd and median days of activity, (4) mean of days

of activity, (5) maximum of days of activity; (6) stan-

dard deviation and (7) number of results classiﬁed as

“suspicious”. For dataset of UC “EHR read access”

we produced a summary table by date with the met-

rics: (1) total of results, (2) total of results without

professional ID, (3) total of results with null patient

ID, (4) total of suspicious access classiﬁcations, (5)

total of suspicious access classiﬁcations without pro-

fessional ID and (6) total of suspicious access classi-

ﬁcations with null patient ID, and a table comparing

the accesses by professional category.

3 RESULTS

3.1 Use Case “Check Time of Activity”

First, we think of identifying professionals’ activity

periods that are longer or shorter than expected for

a work shift, since professionals have a schedule to

work and should not access to HIS when they are off

(Di

ario da Republica, 2005).

Scenario 1. A professional uses his credentials dur-

ing his shift to accomplish his tasks. In the end of his

shift, he goes home and he did not logout his session

on the computer. A colleague uses his open session to

access the HIS and take a look at a patient’s EHR that

he was curious about.

Figure 1: Use case “Check time of activity”.

For this UC we propose to track all activities of a

professional and monitor the consecutive time of ac-

tivity, checking if the total time of activity is normal

for a shift duration, or is shorter or longer insted, as

showed in the UC (ﬁgure 1) and on the SD (ﬁgure 2).

Figure 2: Sequence diagram “Check time of activity”.

Our algorithm requests the data to AT between

two dates, analyses the data and produces a report

with the classiﬁcation of the results corresponding to

periods of consecutive time of activity. As we cannot

aﬁrm that the result obtained is in fact an illegitimat

access, we classiﬁed the access as “suspicious” if it

is shorter or longer than expected for a work shift and

an alert is launched. The AD (ﬁgure 3) shows the pro-

posed algorithm. For each professional, the events are

ascending ordered by timestamp. It adds the time be-

tween two consecutive timestamps of event logs, if

the difference between them is less than six hours.

Other wise we consider that the professional is off and

it starts counting a new period of consecutive time of

actvity. If the total of added time is greater than thir-

teen hours or shorter than ﬁve hours it may indicate

that the user is not accessing only during the work

shift.

Evaluation. The algorithm was tested between 23rd

and 31st July 2019. The results for each professional

HEALTHINF 2020 - 13th International Conference on Health Informatics

540

Figure 3: Activity diagram “Check time of activity”.

and classiﬁcation as suspicious are shown in table 1.

If a professional accesses a period of time less than

ﬁve hours (≤ 299 minutes) and greater than thirteen

hours (≥781 minutes) it is classiﬁed as “suspicious”,

else the system classiﬁes the access as normal. In

the referred period, which counts nine days, after re-

moving the duplicated ones, we got 276 results, of

which 176 were classiﬁed as suspicious. The data pre-

sented show two outliers with different behaviours.

For the category “No identiﬁed” all cases have a dura-

tion completely distinct from the others, but all 6 oc-

currences have the same behaviour. This happens be-

cause this category represent automatic processes that

run in system’s background, acording the provider of

Obscare system. Another outlier is in “Nurse” cat-

egory and is similar to “No identiﬁed” category, be-

cause there are some automatic processes associated

to “Nurse” category, as well. Categories associated to

management and research tasks have activities with

very short duration and few occurrences. Looking to

categories that are more related to healthcare delivery

and removing the automatic processes from “Nurse”

category we can see that the results presented, gen-

erally, do not exceed the superior limit ﬁxed as sus-

picious access. However the categories “physician”

and “specialist phyisician” have some results that ex-

ceed that limit. It is also possible to see that there are

many accesses that do not go over the inferior limit

and those are responsible for most of the suspicious

access classiﬁcations.

3.2 Use Case “Check Days of Activity”

Secondly, we tried to identify professionals’ consecu-

tive days of activity that are longer than expected for a

week work, since professionals are off after a week of

work, that can be up to seven consecutive days and in

some exceptions even longer, and should not access

to HIS when they are off (Di

ario da Republica, 2005).

Scenario 2. A professional uses his credentials dur-

ing his shift to accomplish his tasks. In the end of

work week, when he is off, another user uses his cre-

dential to access a HIS, to take a look at a patient

EHR.

Figure 4: Use case “Check days of activity”.

For this UC we propose to track all activities of a

professional and monitor the consecutive days of his

or her work, checking if the total consecutive days

of activity is normal for a work week, or longer, as

showed in the UC (ﬁgure 4). The SD (ﬁgure 5) shows

that for every activity in the system done by a profes-

sional, it is sent a event log for the AT which identiﬁes

the professional, his proﬁle, the timestamp, the patient

accessed, the action executed among other data. Our

algorithm requests the data to the AT between two

dates, analyses the data and produces a report with the

classiﬁcation of the results corresponding to the num-

ber of consecutive days of activity. Also, as we cannot

aﬁrm that the result obtained is in fact an illegitimate

access, we classiﬁed the accesses as “suspicious” if

it is longer than expected for a work week. For each

Illegitimate HIS Access by Healthcare Professionals Detection System Applying an Audit Trail-based Model

541

Table 1: Results for UC “Days of Activity”.

Professional Category Nr Professional nr of consecutive minutes worked Stsndard Suspicious

results identiﬁed Min. 1st Qu. Median Mean 3rd Qu. Max. Deviation access

No identiﬁed 6 0 4305 4305 4305 4308 4305 4328 6.12 6 (100%)

Admin Sirai 5 5 4.00 4.00 6.00 6.40 8.00 10.00 2.60 5 (100%)

Admin VCOBSGYNV3,Create users 1 1 2.00 2.00 2.00 2.00 2.00 2.00 0 1 (100%)

Administrative 56 56 1.00 96.75 338.00 307.54 364.25 713.00 210.83 24 (43%)

Nurse 108 105 0.00 50.25 210.50 335.45 361.75 4320.00 194.41 71 (66%)

Nurse,Admin VCInt,Physician, Development team 1 1 1.00 1.00 1.00 1.00 1.00 1.00 0.00 1 (100%)

Nurse,Administrative,Admin Physician,Admin Sirai, Admin Backofﬁce 4 4 0.00 2.25 8.00 13.00 18.75 36.00 16.31 4 (100%)

Management, Admin, VCOBSGYNV3,Creat Users,Physician 1 1 137.00 137.00 137.00 137.00 137.00 137.00 0.00 1 (100%)

Indicators Sirai,Admin Sirai 1 1 0.00 0.00 0.00 0.00 0.00 0.00 0 1 (100%)

Physician 28 28 0.00 74.5 230.00 314.60 399.00 1543.00 345.88 21 (75%)

Physician Specialist,Physician 65 65 0.00 124.00 338.00 436.40 669.00 1431.00 365.68 41 (63%)

276 267 0.00 67.00 266.00 422.80 448.50 4320.00 766.37 176 (63%)

Figure 5: Sequence diagram “Check Days of activity”.

classiﬁcation as “suspicious” an email is sent to the

CIS and to the DPO.

The AD (ﬁgure 6) shows the proposed algorithm.

For each professional, the events are ascending or-

dered by timestamp and it counts consecutive days

using the timestamps of event logs. If the difference

between them is more than one day, we consider that

the professional was off and it starts counting a new

period of consecutive days of actvity. If the total of

added days is greater than eight days it may indicate

that the user is not accessing only during the work

shift.

Evaluation. The algorithm was tested between 23rd

and 31st July 2019. The results of days of activity for

each professional and classiﬁcation as suspicious are

shown in table 2. If a professional accesses a period

of days greater than 8 days it is classiﬁed as “suspi-

cious”, else the system classiﬁes the access as normal.

In the referred period, which counts nine days, we

got 213 results, of which 17 were classiﬁed as “sus-

picious”. The data presented shows two outliers with

different behaviours. For the category “No identiﬁed”

all the cases have a duration completely distinct from

the others, but all the 6 occurrences have the same be-

haviour. This happens because this category represent

automatic processes that run in system’s background.

Another outlier is in “Nurse” category and is similar

to “No identiﬁed” category, because there are some

automatic processes associated to “Nurse” category,

as well. Categories associated to management and re-

search tasks have activities with very short duration

Figure 6: Activity diagram “Check days of activity”.

and few occurrences. We can see that automatic pro-

cesses run every day having no associated category,

and there are accesses associated to management and

research accesses that occurs generally once or twice

a week. Observing the categories that are directly re-

lated to delivery of healthcare, such “Administrative”,

“Nurse”, “Phisician” and “Specialist Phisitian”, simi-

larly to what happens in the results of the UC “Check

HEALTHINF 2020 - 13th International Conference on Health Informatics

542

time of activity”, the suspicious accesses are associ-

ated to “specialist physician” category.

3.3 Use Case “Check EHR Read

Access”

In this UC, we tried to identify accesses by profes-

sionals to read patients’ EHR and did not create or

update them. The lack of evidences that can justify

such access is already spotted as an issue to solve by

healthcare institutions. According to GDPR and Joint

Commission International (JCI) for hospitals certiﬁ-

cation on Management of Information (MOI) 11.5

this type of access should be addressed to mitigate

problems related to data breaches (Joint Commission

International, 2017).

Scenario 3. A professional access to a patient EHR.

Why does he access? What are the evidences of the

healthcare delivering of that professional.

For this UC we propose to track all accesses of a

professional and monitor the patient accessed and the

type of action excuted between 72 hours (three con-

secutive days). A EHR may be updated after the end

of shift or in the begining of the shift and the infor-

mation updated may need to be checked in the end of

the shift. If between 72 hours there is an access to

read an EHR and there is any update or create action,

the access is classiﬁed as “suspicious”, else is classi-

ﬁed as normal, as showed in the UC (ﬁgure 7). The

SD (ﬁgure 8) shows that for every activity in the sys-

tem done by a professional, it is sent an event log for

the AT which identiﬁes the professional, his proﬁle,

timestamp, patient accessed, action executed, among

other data. Our algorithm requests the data to the AT

between two dates, analyses the data and produces a

report with the read actions (yes or no), write actions

(yes or no) and results of access classiﬁcation. Again,

as we cannot aﬁrm that the result obtained is in fact an

illegitimate access, we classiﬁed the accesss as “sus-

picious” if there is any update or create actions. For

each classiﬁcation as “suspicious” an email is sent to

the CIS and to the DPO.

Figure 7: Use case “Check EHR read access”.

Figure 8: Sequence diagram “EHR read access”.

Figure 9: Activity diagram “EHR read access”.

The AD (ﬁgure 9) shows the proposed algorithm.

For each professional, it is analysed the patient ac-

cessed, and, for each, veriﬁes the action, counts the

number of readings and the number of writings. For

each patient, if the number of readings is greater than

zero and the number of writings are equal to zero the

Illegitimate HIS Access by Healthcare Professionals Detection System Applying an Audit Trail-based Model

543

Table 2: Results for UC “Days of Activity”.

Professional Category Nr nr of consecutive days Standard Suspicious

results Min. 1st Qu. Median Mean 3rd Qu. Max. deviation access

No identiﬁed 9 11 11 11 11 11 11 0.00 9 (100%)

Admin Sirai 18 1.00 1.00 1.00 1.11 1.00 2.00 0.32 0

Admin VCOBSGYNV3,Create users 1 1.00 1.00 1.00 1.11 1.00 1.00 N/A 0

Administrative 61 1.00 1.00 2.00 2.18 3.00 7.00 1.46 0

Nurse 101 1.00 1.00 1.00 1.67 2.00 5.00 0.99 0

Nurse,Admin SIRAI 1 1.00 1.00 1.00 1.11 1.00 1.00 N/A 0

Nurse,Admin VCInt,Physician, Development team 2 1.00 1.00 1.00 1.11 1.00 1.00 0.00 0

Nurse,Administrative,Admin Physician,Admin Sirai, Admin Backofﬁce 5 1.00 1.00 1.00 1.60 2.00 3.00 0.89 0

Management, Admin, VCOBSGYNV3,Creat Users,Physician 1 1.00 1.00 1.00 1.11 1.00 1.00 N/A 0

Indicators Sirai,Admin Sirai 7 1.00 1.00 1.00 1.43 1.50 3.00 0.78 0

Physician 30 1.00 1.00 2.00 2.03 2.00 5.00 1.19 0

Physician Specialist,Physician 82 1.00 1.00 2.00 2.57 2.00 11.00 2.64 8 (9.7%)

318 1.00 1.00 1.00 1.93 2.00 11.00 2.25 17(5,3%)

access is classiﬁed as “suspicious”, else it is classiﬁed

as “normal”.

Evaluation. The algorithm was tested between 23rd

and 31st July 2019. The results for UC “EHR read

access” are shown in table 3. In the referred pe-

riod, which counts nine days, we obtained 378 results,

of which 300 (79%) were classiﬁed as “suspicious”.

This means that, during this period, there were 300

sets of professionals’ accesses to EHR’s patient with-

out any registry being made. Analysing the results we

can see that 32 accesses classiﬁed as “normal” do not

have professional ID and 28 others do not have patient

ID.

Table 3: Results for UC “EHR read access”.

Analysis Total of No Null Suspicious access classiﬁcation by system

date results Prof. Id Patient Id Total Null Prof. Id Null Patient Id

2019-07-22 49 10 (20%) 11 (22%) 39 (80%) 4(10%) 9 (23%)

2019-07-23 38 6 (16%) 13 (34%) 30 (79%) 4 (13%) 8 (27%)

2019-07-24 49 7 (14%) 13 (27%) 44 (90%) 4 (9%) 12 (27%)

2019-07-25 59 9 (15%) 11 (19%) 43 (73%) 5 (12%) 7 (16%)

2019-07-26 31 11 (35%) 10 (32%) 24 (77%) 6 (25%) 8 (33%)

2019-07-27 31 10 (32%) 7 (23%) 24 (77%) 7 (29%) 4 (17%)

2019-07-28 29 6 (21%) 6 (21%) 22 (76%) 4 (18%) 2 (9%)

2019-07-29 44 10 (23%) 9 (20%) 34 (77%) 4 (12%) 7 (21%)

2019-07-30 48 8 (17%) 14 (29%) 40 (83%) 7 (18%) 9 (22%)

378 77 (20%) 94 (25%) 300 (79%) 45 (15%) 66 (0,22%)

The results show that 77 classiﬁcations (20%) do

not have the professional ID. Such occurrences are

related to automatic processes, that run in parallel, to

check, get and retrieve necessary data on EHR. The

results with N/A “patitents id” are 94 (25%), and they

are users’ processes that are not related with patients

but to other type of reports instead. So we adjusted the

values excluding the results of automatic processes

and we obtain the values on table 4, which shows that

the results classiﬁed as “suspicious” grow to 92%.

Analysing the results by catogories (table 5), we

have those that typically access data for management

and research tasks. All these accesses were con-

sidered suspicious because they are query actions.

Nonetheless all the other categories have a high per-

centage of access classiﬁed as “suspicious”. This in-

dicates that there are a several number of EHR ac-

cessed that did not had information updated, and were

just consulted.

Table 4: Adjusted results for UC “EHR read access”.

Date Total results Suspicious access

nr = 216 nr= 198 (92%)

2019-07-22 29 27 (93%)

2019-07-23 20 19 (95%)

2019-07-24 30 29 (97%)

2019-07-25 40 32 (80%)

2019-07-26 11 11 (100%)

2019-07-27 15 14 (93%)

2019-07-28 18 17 (94%)

2019-07-29 26 24 (92%)

2019-07-30 27 25 (93%)

Table 5: Results for UC “EHR read access” by professional

category.

Professional Category Nr Accesses Nr Suspicious

nr=378 nr=300 (79,3%)

No identiﬁed 50 30 (60)

Admin Sirai 7 7 (100)

Admin VCOBSGYNV3,Creat User 2 2 (100)

Administrative 52 38 (73)

Nurse 117 93 (79,5)

Nurse,Admin VCInt,Physician, Development team 1 1 (100)

Nurse,Administrative,Admin Physician,Admin Sirai, Admin Backofﬁce 12 12 (100)

Management, Admin, VCOBSGYNV3,Creat Users,Physician 2 2 (100)

Indicators Sirai,Admin Sirai 2 2 (100)

Physician 25 18 (72)

Physician Specialist,Physician 108 95 (88)

The categories related to healthcare delivery like

“Administrative”, “Nurse”, “Phisician” and “Special-

ist Physician” have a high percentage of suspicious

access classiﬁcation, all above 70%.

4 DISCUSSION

Previous work showed that there are many reasons for

existig concerns about health data access on health-

care institutions (L.Correia et al., 2019) and the health

data ﬂow complexity is such that turns very hard to

evaluate the ligitimacy of the accesses to EHR. How-

ever, despite the complexity of health data manage-

ment processes, it is possible to describe scenarios,

UC and the data ﬂow of the access between users and

systems through SD. Based on this information we

could develop three algorithms for suspicious activ-

HEALTHINF 2020 - 13th International Conference on Health Informatics

544

ity dectection that used the data from AT, which has

the users and systems activity logs. However, at this

point, we just had available logs from one applica-

tional system (Obscare) that were being collected to

HS.Register in a hospital from North Portugal. So

we developed three algorithms for suspicious activ-

ity dectection that could be tested. The results im-

mediately show that there are some aspects in com-

mon to the three analysis. There are events that do not

have the professional identiﬁcation or category. This

actions are automatic processes running in system’s

background and all of them are classiﬁed as suspi-

cious in “Check time of activity” and “Check days of

activity” UC, because of their continuous behaviour

that exceeds the time limits imposed by algorithm’s

rules. For UC “EHR read access”, as it depends on the

type of action of the automatic processes, not all are

suspicious. Some are just to check information and

others update the EHR. Being automatic processes

they probably do not represent a threat, nonetheless

they should be identiﬁed to make easier to spot and

interpret them. We can also see that there are various

professional categories of management responsibili-

ties, and they are not used regularly. So their accesses

appear classiﬁed as suspicious accesses due to their

pattern of very short usage, specially when comparing

with the expected duration for a shift. Even seeming

normal at a ﬁrst glance, it would be recommendable

to track the behaviour of these accesses in particular,

once they provide conﬁdencial data. We think that a

detailed analysis of the pattern of these accesses may

give further indication of their legitimacy.

By the point of view of the professional category

of the staff that access to HIS, the main categories

that access are administrative, nurses, physicians and

specialist physicians. In general, they present normal

activity in what concerns to activity longer than ex-

pected (≤780 minutes), 27 in 276 which represent

9,78% of the results. Physicians and Specialist physi-

cians are the categories that have more cases of this

type of activity, longer than expected. Some of these

suspicious accesses may be explained by the fact that

Obscare system is also used in emergency context,

and not only for consulting or hospital stay context.

In emergency context shifts may be longer than 12

hours, up to 24 hours (L.Correia et al., 2019). The

suspicious accesses detected are in most cases for ac-

tivities shorter than expected (≥300 minutes), 152 in

276 representing 55% of the results and it is common

to all categories. We supose that the consecutive time

of activity on HIS by professionals in general proba-

bly may be shorter than six hours.

In the case of consecutive days of work, exclud-

ing automatic processes which are 9 of 318 (2,83%),

only specialist physician category exceeds the ex-

pected number of consecutive days of activity, which

are 7 of 309(2,27%). The constraints of patient data

access to care delivery it is usually used the creden-

tials of physicians because most of the times they do

not have EHR access limitations. These occurrences

might explain the values obtained. In general, we see

that accesses that show the highest percentage of be-

ing suspicious are the ones associated to physicians

categories and are in line with our expectations. How-

ever a deeper analysis would be necessary to have fur-

ther conclusions about these results. Relatively to ac-

cesses made to read medical records, the mean of sus-

picious access is 79% of the total accesses, and when

analysing by category we can see that this high per-

centage is transversal to all categories. Even when

excluding the categories that normally access to get

reports, every type of management and research cat-

egories that access to extrat data and do not update

records, the percentage of suspicious access grows to

93%, which means that only 7% of the records are ac-

cessed and updated. For this UC we should evaluate

again the data that it is being analysed, test it during a

longer period of time and ﬁnd out whether this num-

bers are correct, performing an analysis on the ﬁeld.

5 CONCLUSIONS

The scope of this study is very complex and requires

a very thorough analysis. Although the difﬁculties we

found it was possible to create a proof of concept of a

system to detect suspicious accesses by professionals

from healthcare institutions.

Some limitations we have are the lack of detail

of the tasks performed by healthcare professionals to

create more precise rules for algorithms. An analy-

sis on the ﬁeld, would be also very useful to better

understand the results and, probably, change the clas-

siﬁcation of some accesses. Another limitation is the

availability and quality of HIS logs. Obscare system

has already logs prepared for GDPR compliance, but

many systems have not and institutions need to make

a great effort on providers to have this information.

The period of test should be longer than nine days to

detect more patterns in the results obtained.

Nonetheless it was possible to model the scenar-

ios of undue access and create algorithms to detect

suspicious accesses. The results obtained gave a ﬁrst

glance of what is happening at the level of HIS ac-

cess. A strength of using Obscare system was the fact

that it is used on hospital stay, consulting and emer-

gency context. It may explain some of the outliers

detected, as the emergency shifts may have different

Illegitimate HIS Access by Healthcare Professionals Detection System Applying an Audit Trail-based Model

545

durations. These results must be conﬁrmed at an ini-

tial stage and, than, take advantage of this information

to create a knowledge base that will allow to apply Ar-

tiﬁcial Intelligence (AI) models.

Even at this stage, which is still in a very embry-

onic stadium, the project reveals to be very useful to

IS department and to DPO. They are having the ﬁrst

picture of the accesses by professionals in the pre-

sented format. Having the results of access classiﬁ-

cation based on the rules created according to staff

routines, the identiﬁcation number of the professional

that made the access, the time the access was made

and the patient accessed, it gives clues for DPO and

CIS investigate whether the access was in fact ille-

gitimate. Further work must be performed to com-

pletely accomplish the main goal of this project, like

perform a more detailed analysis to verify the correct-

ness of classiﬁcations, determine its sensibility and

speciﬁcity and detect the suspicious accesses in al-

most real time. It would be very interesting and use-

ful althought the caracteristics of technology used in

hospitals may be a barrier. Finally, the production of

a knowledge base its recommended so that it will be

possible to apply AI models in the future.

ACKNOWLEDGEMENTS

This article is a result of the project HS.Register

Demonstrator, NORTE-01-0247-FEDER-033756,

supported by Norte Portugal Regional Operational

Programme (NORTE 2020), under the PORTUGAL

2020 Partnership Agreement, through the European

Regional Development Fund (ERDF).

REFERENCES

ario da Republica (2005). Informac¸

ao gen

etica pessoal

e informac¸

ao de sa

ude, Lei n.

12/2005 - Di

ario da

Rep

ublica n.

18/2005).

EU (2016). Regulation (EU) 2016/679 of the European Par-

liament and of the Council of 27 April 2016 on the

protection of natural persons with regard to the pro-

cessing of personal data and on the free movement of

such data, and repealing Directive 95/46/EC (General

Data Protection Regulation).

Gonc¸alves-Ferreira, D., Leite, M., Santos-Pereira, C., Cor-

reia, M. E., Antunes, L., and Cruz-Correia, R. (2018).

HS.Register - An Audit-Trail Tool to Respond to the

General Data Protection Regulation (GDPR). Stud.

Health Technol. Inform., 247:81–85.

Joint Commission International (2017). JCI Accreditation

Standards for Hospitals, 6th Edition.

L.Correia, R.Cruz-Correia, and P.Rodrigues (2019). Illegit-

imate his access by healthcare professionals: scenar-

ios, use cases and audit trail-based detection model.

SCITEPRESS.

HEALTHINF 2020 - 13th International Conference on Health Informatics

546