AUTOMATED RISK DETECTION

What are the Key Elements Needed to Create a Multi-source, Pattern-based Risk

Detection System?

Ian Blunt, Xavier Chitnis and Adam Roberts

Healthcare Commission, Finsbury Tower, Bunhill Row, U.K.

Keywords: Information, Analysis, Risk, Regulation.

Abstract: The Healthcare Commission, the national regulator for health care in England, uses an innovative risk

detection system to target its inspections of National Health Service organisations. At the core of the

system is a tool that enables: gathering of information from a huge variety of sources, and of varying types;

mapping this information to the regulatory framework; and analysing this information in a comparable way

to detect patterns that could indicate risk. The tool has demonstrated itself to be flexible and reliable, and its

risk estimates have been consistently proven to be effective at discovering failure compared with non-

targeted inspections.

1 INTRODUCTION

From its creation in 2004, the Healthcare

Commission has always set out to be targeted and

proportionate in its work (Kennedy, 2003), relying

on intelligent use of information to guide its

inspections and reducing the regulatory burden on

well performing providers. This is in line with

modern regulatory thinking in the UK, which

advocates a risk-based approach to regulation

(Office for Public Sector Reform, 2003). Clearly,

this puts a focus on obtaining high quality

information, and managing and using it

appropriately.

A further change in England’s regulatory

landscape in 2004 was a move away from

performance assessment solely through key

performance indicators known as “star ratings”

(Healthcare Commission, 2005) and towards an

assessment against standards. By their nature,

standards tend to be broader and less well defined

than performance indicators. In this case, the

Standards for Better Health (Department of Health,

2004) are set at a very high level, meaning that there

is no single set of indicators that can measure them

accurately.

Our solution was to gather as many imperfect

measures as possible to try to describe performance

against the standard, while acknowledging their

imperfection by following up with inspection on

areas where our system detects risk of non-

compliance. In this way, the system produces

compliance risk estimates, not judgements. Given

that we are assessing 44 part-standards in nearly 400

organisations, a huge amount of information is

required and needs a highly sophisticated system to

manage and analyse it. This paper explores the

lessons learnt from developing that system, and the

emergent key elements that are required for any

system like this to function.

2 KEY ELEMENTS OF THE

SYSTEM

The system’s primary task is to support the

Commission’s main inspection programme, the Core

Standards Assessment, but also supports risk

targeting for many other assessments.

Through our programme of development, three

key functions of the system became clear;

1. Being able to align information (by analysis

unit)

2. Being able to map information (by analysis

topic)

3. Being able to compare multiple sources to

produce an overall result

393

Blunt I., Chitnis X. and Roberts A. (2009).

AUTOMATED RISK DETECTION - What are the Key Elements Needed to Create a Multi-source, Pattern-based Risk Detection System? .

In Proceedings of the International Conference on Health Informatics, pages 393-397

DOI: 10.5220/0001781803930397

 SciTePress

2.1 Structuring Information

2.1.1 Sourcing Data

One of the core principles of our risk targeting is

that we will not require any bespoke data collection,

but rely solely on existing information. Another is

that the system is opportunitistic, and does not rely

on good national coverage for inclusion. We aim to

use “everything the Commission knows” to risk

assess organisations.

Any member of staff can add data to the system,

a decision taken because we find that data are more

reliably stored if they are imported by staff using it

for their own ends, rather than hire administrative

staff solely to load data. Incoming data can come

from any source, any location, and any format,

although the majority of these arrive as spreadsheets

containing a handful of measures.

2.1.2 Formatting Data

Once data are received and assessed as fit for

purpose, they are transferred to a data template for

entry to main database.

The key unique identifier is the organisation

code (NACS code for NHS), although this could be

any consistent label to identify a specific unit of

analysis. If multiple measures have been supplied,

this is where they are divided into

value/numerator/denominator format, or category

name and rank if they are categorical. New

measures can also be created by combining separate

numerators and denominators from different sources

at this stage. The data template also stores metadata

to feed to the main database, as discussed in the next

section.

2.1.3 Storing Data & Structure

Each one of the measures described above is

referred to as an “item”, and each dataset is a “time

period” of that item which that consists of individual

“observations”, which are either a value, numerator

and denominator or a list of categories, depending

on their type. Items can have many “time periods”,

which allows us to align measures over time and

reduces the amount of metadata that have to be re-

entered.

Key meta data are information about: description

of the measure (including type, numerator and

denominator units); source details; dates that the

data relate to; audit trail data (file paths, URLs etc);

an assessment of reliability of the information; and

the “sentinel distribution” – which notes whether

high, low or extreme values should increase our

estimation of risk.

2.1.4 Handling Free Text Intelligence

As well as traditional numerical measures, the

system also makes extensive use of comments

derived from free text sources. This is an important

way of capturing input from patient groups and

including isolated, opportunitistic intelligence such

as investigation reports and information discovered

by our local staff.

This information is structured by a team of

analysts who code each comment against a

taxonomy (currently the Standards for Better

Health). As well as topic, they also assess whether

the information tells us something positive or

negative about an organisation and issues around the

reliability of the comment and the strength of

relationship between the comment and the taxonomy

element.

2.2 Mapping Information

The system is designed to analyse a range of

intelligence related to a user-created assessment

framework. The frameworks will be determined by

the goals of the assessment programme, rather than

the information available.

A very simple structure allows us to create item

groups (with descriptive metadata) and map items

against them to mimic these frameworks. We can

also create a multilayer framework by mapping item

groups to other item groups (a conceptual example is

shown in figure 1). The item groups can represent

any construct of the assessment framework, be it a

standard, a part standard, criteria, topic, question etc.

The system will analyse the most recent time period

available when the group result is requested.



Figure 1: Conceptual representation of a simple

framework containing items (white) and item groups

(black).

HEALTHINF 2009 - International Conference on Health Informatics

394

Clearly, some items will be more important in

the framework than others. Importance might be

due to how accurately it measures the framework

construct, the authority of the intelligence (e.g. a

formal judgement carries more weight than a data

analysis) and its “real world” value (e.g. mortality

measures should outweigh bureaucratic process

measures). Items are weighted accordingly as part

of the mapping process, and can even be given a

“super-weighting” that allows them to “trump”

everything else in group.

2.3 Analysing Information

2.3.1 Item Level

The analysis process brings together many different

types of data. For each item of information, we

assess the difference between the observed result for

a particular organisation and an expected level of

performance on a common scale using the most

appropriate analysis for that item. The outcome of

this analysis is an “oddness” score which is a

statistical measure of how far each organisation’s

performance is from the expected level for that

measure. None of our methods penalise (or reward)

organisations simply for being at the bottom (or top)

of a list - they are designed to look for genuine

differences from our expectation. It is entirely

possible that all organisations will be performing

similarly to expectation on a data item.

We make a number of “stock” analysis methods

available to the user, who will be heavily guided by

information type towards an appropriate choice.

The system will also suggest analysis settings based

on characteristics of the data.

Our analysis methods are tailored to data type

(proportions, ratios etc) and take account of the

possibility that an organisation’s results may be

affected by chance variation. To do this we use a

modified Z score (Spiegelhalter, 2005).

The expected level of performance against which

a organisation is compared can be calculated in

several ways. For some items, organisations are

compared against the national average of all

organisations. In other cases - such as waiting times

for example - an expected level of performance has

been set down for organisations in government

policies. For some data items we recognise that

organisations’ performance may be significantly

influenced by factors beyond their control. There are

two main ways we adjust for this. Either the ‘raw’

data are standardised (for example by age and sex)

before import or we may set our expectation for that

organisation as the average performance of a group

of other organisations with similar local

circumstances (referred to as the ‘benchmark

group’). We use various benchmarking groups in

our analysis, including deprivation, population

turnover and disease prevalence.

Where data are categorical, we achieve analysis

results on the common risk scale by assuming an

underlying normal distribution in the frequency data

and assessing distance between each observation and

the expectation (either an imposed target or set as

the ordinal category that contains the median

observation).

Our free text comments are scored by analysts as

discussed in section 2.1.4. These factors are then

translated into a score that is nominally equivalent to

the scores on the common risk scale.

2.3.2 Pattern Detection at Group Level

For organisations whose performance over a range

of items appears to be “oddly” poor, we infer that

there may be a risk of failure against the given

framework. However, there are many reasons why

an organisation that raises concerns in our analysis

might be found legitimately to be compliant by

inspection. The organisation will have access to

much better local sources of evidence than are

available to the Commission at a national level for

risk assessment. They will also have the benefit of

the most up-to-date information. It might also be

that, while the organisation is not performing well

compared with other organisations, they are still

meeting the minimum needed for acceptable

performance against the framework.

For each item of information, we assess whether

the organisation’s result was in line with what we

would expect, as outlined in section 2.3.1 above.

The results for all items mapped to an item group

(including qualitative information) are then

aggregated together. This produces an overall group

“oddness” score that is directly comparable to the

item oddness.

Our main method of combining the results from

each item of information is not to calculate a simple

average, but instead enables us to highlight patterns

of poor performance. For example, an item group

may be assessed as being at high risk where several

items of information are worse or tending towards

worse than expected, but none exceed the threshold

to be notable in their own right.

When combining this volume of information,

rules-based or directly weighted aggregation models

that finely balance every item against each other

become unsustainably complex. Our model uses

broad weights discussed in section 2.2 and then

automatically avoids double counting by adjusting

for the degree of auto-correlation within the item

AUTOMATED RISK DETECTION - What are the Key Elements Needed to Create a Multi-source, Pattern-based Risk

Detection System?

395

group. This allows us to include any relevant

measure without needing to consider whether it

measures the same underlying factors as other

measures.

Lastly, as the system takes item groups with

massively different amounts of information and

produces directly comparable risk estimates, we

need to consider the confidence we should have in

that risk estimate. In general, it would be

unreasonable and disproportionate to trigger an

inspection based on just one or two observations.

Other aggregation methods are also available,

which can include taking a conventional mean of

item results or counting the number of outlying

observations in each group.

2.4 Outputs

2.4.1 Selection Models

The core business output is to inform our selection

models that are run separately from the main system

to allow for swift customisation and adjustment.

However, they are all based in some way on the risk

estimates produced by Compass.

Typically selection models are either absolute, in

which any organisation with more than a certain

number of high risk item groups are inspected, or

prioritised, in which the X% most risky

organisations are selected dependent on resource

available. We have that facility to apply almost any

model that is desired by the assessment programme.

2.4.2 Presenting Results

In addition to triggering inspections, it is important

that the system can also display its results both to

help inspection staff engage with the risk assessment

and to provide an audit trail to the inspected

organisation to show that the selection was objective

and robust. We also make the results available to

the public on our website.

This is achieved with a customised reporting

tool, that takes a direct transfer from the “live”

system when a new set of results are released. A

screenshot example is shown in figure 2.

3 RESULTS AND USAGE

3.1 Core Purpose Results

Demonstrating the success of a risk targeting system

can often be problematic, as the resulting inspection

programmes tend to be entirely risk-based. Indeed,

most of our smaller reviews operate in this way.

However, our main inspection programme, the

Core Standards Assessment, contains a parallel

element of random selection, which allows us to

judge the effectiveness of our risk detection. Our

success criterion is simply that the system should

detect more non-compliance than selecting

organisations by chance alone.

Figure 2: Screenshot from our interactive reporting tool.

Users can drill down for more information on any item.

In the two years for which results are currently

available risk targeted inspections discovered twice

(2005/2006) and then three times more (2006/2007)

non-compliance than inspections selected at random

(Bardsley et al, 2008 1&2). Therefore the system

has achieved its core objective.

However, there is still scope to improve. For

example, we know that we can target some standards

more accurately than others, and this is often a

consequence of the information available.

We also know that our inspectors are

increasingly engaging with the system, as the

number of local intelligence reports submitted has

increased nearly five-fold since the first application

(1160 comments for 2005/2006 compared with 5508

for 2007/2008).

3.2 Additional Uses

Success of a system can also be measured by its

adoption in other business areas. In addition to

HEALTHINF 2009 - International Conference on Health Informatics

396

supporting most of the Commission’s NHS risk-

targeted work, the system also provides a regular

rolling update of risk status to our local staff

(independent of inspections) to prompt extra

gathering of local intelligence. The system can also

be exploited as an intelligence-base, and has

informed many other assessment programmes by

providing information but not targeting.

4 CONCLUSIONS / DISCUSSION

The system’s risk estimates have been proven to be

an effective method of targeting the Commission’s

inspections, and our approach to estimating

performance against frameworks using multiple

information sources has been validated. The success

of the system has led to wide scale adoption by the

Healthcare Commission, and it has also been used in

a number of other ways that build on the benefits of

having created such a large structured intelligence-

base.

Additionally, we believe that the range and scope

of the information that we have collected and

focused for a common purpose is unprecedented in

the field of healthcare information handling,

although several others have advocated the use of

investigating organisational performance by using

multiple measures (Yates & Davidge 1984, Harley

et al 2005).

One important innovation is the integration of

quantitative and qualitative intelligence, firstly to

maximise the use we make of our intelligence and

also because it has allowed our inspectors to engage

with a targeting system that some might consider

centralist. Being able to submit extra evidence to

influence the next round of risk assessment – and

seeing their input reflected – has increased their

feelings of ownership for the risk estimates that the

system produces. Another important effect has been

to help embed the approach of using data to prompt

further questions, as proposed by Lilford et al

(Lilford et al, 2004), rather than to pass judgement

directly.

This approach is extensible to any regulator

(even sectors other than health), and to any

organisation with good data on a large number of

sub-units, by applying the key elements identified in

this paper.

One of the current challenges for this approach is

to extend it to areas that are less rich with

information such as independent sector health care

and social care.

ACKNOWLEDGEMENTS

We would like to acknowledge the contributions

from many teams within the Healthcare Commission

but in particular members, past and present, of the

Screening Development Team and most notably

Martin Bardsley, David Spiegelhalter and Theo

Georghiou.

REFERENCES

Kennedy I., 2003. CHAI: A new organisation. London:

Commission for Health Audit and Inspection.

http://www.healthcarecommission.org.uk/_db/_docum

ents/04000020.pdf (accessed Sep 2008)

Office for Public Sector Reform, 2003. Inspecting for

Improvement. London: Office for Public Sector

Reform.

http://archive.cabinetoffice.gov.uk/opsr/documents/pdf

/inspecting.pdf (accessed Dec 2007)

Healthcare Commission, 2005. NHS Performance Ratings

2004/2005 London: Healthcare Commission.

http://www.healthcarecommission.org.uk/_db/_docum

ents/04018745.pdf (accessed Sept 2008)

Department of Health, 2004.. National Standards, Local

Action: Health and Social Care Standards and

Planning Framework 2005/6-2007/8. London:

Department of Health.

http://www.dh.gov.uk/en/Publicationsandstatistics/Pub

lications/PublicationsPolicyAndGuidance/DH_408605

7 (accessed Dec 2007)

Spiegelhalter D J., 2005. Funnel plots for institutional

comparisons. Statistics in Medicine, 24:1185-1202.

Bardsley M, Spiegelhalter DJ, Blunt I, Chitnis X, Roberts

A, Bharania S., 2008. Using routine intelligence to

target inspection of healthcare providers in England.

Quality and Safety In Health Care, In publication.

Bardsley M., Blunt I., Chitnis X., Spiegelhalter D., 2008.

Which NHS trusts get inspected by the Healthcare

Commission? (an update). International Forum on

Quality and Safety, Paris, France, April 2008

Yates J.M., Davidge M.G., 1984.. Can you measure

performance? British Medical Journal

1984;288:1935–6.

Harley M, Mohammed MA, Hussain S, Yates J, Almasri

A., 2005. Was Rodney Ledward a statistical outlier?

Retrospective analysis using routine hospital data to

identify gynaecologists performance. British Medical

Journal 2005;330:929-32

Lilford R., Mohammed M.A., Speigelhalter D., Thomson

R., 2004. Use and misuse of process and outcome data

in managing performance of acute medical care:

avoiding institutional stigma. Lancet 2004;363:1147-

54.

AUTOMATED RISK DETECTION - What are the Key Elements Needed to Create a Multi-source, Pattern-based Risk

Detection System?

397