INTEGRATING R MODELS WITH WEB TECHNOLOGIES

Mingrui Zhang, Scott Olson, Joan Francioni, Tim Gegg-Harrison, Nan Meng, Zhifu Sun and Ping Yang

Department of Computer Science, Winona State University, Winona, MN, U.S.A.

Department of Health Science Research, Mayo Clinic, Rochester, MN, U.S.A.

Keywords: Web, Cancer Patient, Survival, Treatment, Software Tool, Lung Cancer.

Abstract: We describe a software framework designed to shorten the translation of research models from theory to

clinical practice. The framework integrates research and clinical practice into a single software architecture.

Specifically, we present a Survival Probability Prediction Architecture (SPPA), which is an extensible

software platform allowing researchers to experiment with their statistical models and make rapid delivery

of these models to clinical practice without a lengthy software development cycle.

1 INTRODUCTION

With personalized medicine, information such as a

patient's genotype, molecular profiles, or clinical

phenotypes is often used to make a decision on

medication, provision of a therapy, or the initiation

of a preventative measure. To make such a decision,

health care teams need to be able to access the most

current treatment plans and adjust their strategies

accordingly and with confidence. One of the key

challenges is to deliver such systems quickly and

at lower development cost, and to allow

greater interdisciplinary collaboration in treating

patients (Tan et al., 2005).

A number of models for predicting cancer survival

have been investigated and published, but few of

them have ever been tested in clinical settings and

even fewer have been used eventually in clinical

practice. They are often developed in a statistical

programming environment, such as R (www.r-project.org).

As a statistical programming

environment is intended to be used in biomedical

research and not for clinical application, a final

software translation is usually required to code the

model into a computer system appropriate for a

clinical setting. Due to costly software development

processes, most of the models remain in literature

and their translation never actually happens.

In this paper, we present a prototype system that

integrates statistical cancer survival prediction

models into web-based applications. The system is

based on web technologies to allow for mobility and

to achieve wide user access. Its information

exchange backbone uses standard-compliant XML

formats, and it is built as a set of multiple standalone

Java applications.

2 USER INTERFACE

Our Lung Cancer Survival Prediction web

application takes as input historical information on

previous patients and specific information about a

current patient. It calculates the current patient’s

survival probability and presents the findings as a

chart and/or a table. To use this web-based tool, the

end user (e.g., a clinician or a data entry person)

enters patient clinical information and then submits

the information. Once the request is received, the

tool initiates statistical model(s) and passes the

inputs to the model(s). After completing the

calculation, the model(s) return the results to the

tool, and the tool formats and presents the results of

the prediction model, viewable as either a chart or

table. The user can change patient information,

select different treatments, and examine the results

in real-time.

Zhang M., Olson S., Francioni J., Gegg-Harrison T., Meng N., Sun Z. and Yang P. (2009).

INTEGRATING R MODELS WITH WEB TECHNOLOGIES.

In Proceedings of the International Conference on Health Informatics, pages 411-415

DOI: 10.5220/0001430104110415

 SciTePress

2.1 Data Input

The system allows for different model(s) for various

cancer types. Even within the same type of cancer,

different models may use different sets of input from

the user. However, we assume all models require

some core inputs from the user. For example, all

models require basic information, including the patient’s

age, gender, and cancer cell type, stage, and grade

when predicting a cancer patient’s survival

probability. Additional information such as the

patient’s smoking status and history for lung cancer,

and the treatments received by the patient could be

used in a specific model to improve prediction

accuracy. All of this information is gathered with the

Patient Information Entry Form shown in Figure 1.

Figure 1: Patient Information Entry Form.

The tool is designed to integrate multimodality data,

such as genomic information, to make better

predictions and to aid individual physicians in

providing the best treatment for their patients. As the

models being researched and developed vary from

time to time, data input to the models will change

accordingly. Such changes require the tool to be

adjusted based on the models to be used. As

designed, the researcher can make these changes

directly by adjusting the user interface for his/her

new model without the need to consult a computer

programmer.

2.2 Presentation of the Results

The user interface presents the results of the

statistical model as graphs and tables as shown in

Figures 2 and 3 respectively. At the time a patient is

diagnosed with lung cancer, a clinician would be

able to use the tool to compare the effects of

different treatment options on the patient’s survival

probability. Over time, as the patient receives

specific treatment, more data is generated affecting

his or her survival probability. In deciding further

treatment for the patient, the clinician can use the

tool to investigate treatment options that take this

new data into account.

Figure 2: Graph View of Prediction Results.

Figure 3: Table View of Prediction Results.

3 ARCHITECTURAL DESIGN

Our platform allows researchers to add and remove

statistical models and to make changes to the input

area of the user interface. Table 1 summarizes the

functions that can be performed by researchers,

clinicians, and data entry people. In designing the

architecture of our framework, the guiding principle

was to create an environment that researchers and

diagnosticians could use to experiment with various

diagnostic models and potential treatments without

having to acquire expertise in a computer programming

language and environment. The resulting

Survival Probability Prediction Architecture (SPPA)

is designed for experimentation with diagnostic

models and survival prediction. SPPA is based on

the Model-View-Controller architectural pattern, as

shown in Figure 4. It provides both a mechanism for

defining models and a mechanism for testing the

model in a clinical setting.

HEALTHINF 2009 - International Conference on Health Informatics

Figure 4: Survival Probability Prediction Architecture.

Table 1: Software Functions Provided to User Groups.

User Group          Functions Supported

Researchers         Add and remove prediction models in R; change database for the model; modify user interface

Clinicians          Add, view, and modify a patient record; compare and select treatments

Data entry person   Add, view, and modify a patient record

3.1 Controller

The heart of the SPPA is the Controller. The

Controller was designed to be sufficiently general

to enable quick and seamless modification of

the system by non-computer specialists. Although

we expect researchers to be proficient in a statistical

programming language like R, we do not expect

them to be proficient in Java. Likewise,

diagnosticians should be able to navigate a web

page, but they should not have to generate one to be

able to view the results of a model.

In order to insulate the researchers and clinicians

from its internals, the Controller is subdivided into

three components: the Model Manager, the View

Manager, and the Variable Definition component.

The Model Manager uses JRI (www.rforge.net/JRI)

to provide an interface between the Java methods of

the Controller and the prediction model, which is

currently written in R. The Model Manager is designed

so that researchers can, in principle, build diagnostic models in

any statistical programming language of their

choice. At present, SPPA supports only statistical

models written in R, but it can be easily modified to

support any statistical modeling language by

extending the Model Manager to provide an

interface between Java and the new statistical

modeling language.
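
The extension point described above can be pictured as a small Java interface that each statistical back end implements. The following is only a sketch under stated assumptions: the names ModelEngine, StubEngine, and ModelManagerSketch are hypothetical and do not come from SPPA, and the stub returns placeholder numbers so that the example runs without R or JRI installed. A JRI-backed engine would presumably pass the inputs to an R session instead.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of a Model Manager that dispatches on the model's language (all names hypothetical). */
final class ModelManagerSketch {

    /** Hypothetical contract between the Controller and one statistical back end. */
    interface ModelEngine {
        /** Returns survival probabilities, one per follow-up time point. */
        double[] predict(Map<String, String> patientInputs);
    }

    /** Stand-in engine so the sketch runs without R; a real engine would call R (e.g. via JRI). */
    static final class StubEngine implements ModelEngine {
        @Override
        public double[] predict(Map<String, String> patientInputs) {
            // Placeholder numbers, not the output of any real survival model.
            return new double[] {1.0, 0.8, 0.6};
        }
    }

    private final Map<String, ModelEngine> enginesByLanguage = new HashMap<>();

    /** Supporting a new modeling language means registering one more engine. */
    void register(String language, ModelEngine engine) {
        enginesByLanguage.put(language, engine);
    }

    /** Looks up the engine for the given language and runs the prediction. */
    double[] run(String language, Map<String, String> patientInputs) {
        ModelEngine engine = enginesByLanguage.get(language);
        if (engine == null) {
            throw new IllegalArgumentException("No engine registered for " + language);
        }
        return engine.predict(patientInputs);
    }

    public static void main(String[] args) {
        ModelManagerSketch manager = new ModelManagerSketch();
        manager.register("R", new StubEngine());
        Map<String, String> inputs = new HashMap<>();
        inputs.put("age", "64");
        inputs.put("stage", "II");
        System.out.println(manager.run("R", inputs).length + " time points returned");
    }
}
```

With this shape, the rest of the Controller depends only on the interface, so adding a second language would not touch the View Manager or the Variable Definition component.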

The View Manager is responsible for providing the

researcher and/or the clinician with the results of the

prediction model on a given patient. It consists of

two components: the Web Form Generator and the

Presentation Generator. Information (e.g., age,

gender, etc.) associated with new patients is gathered

through a web page form that is generated by the

Web Form Generator. The Web Form Generator

works with the XML definitions of the Variable

Definition component to dynamically create the web

form.

[Figure 4 diagram labels: Model; View; Controller; Variable Definition (XML); R Prediction Model; Model Manager (Model Interface, JRI); View Manager (Web Form Generator, Presentation Generator); Presentation; Input Form; CSS Styles; Historical Patient Data; Current Patient’s Data; Prediction.]

The glue that connects the Model with the View is

the Variable Definition component. The Variable

Definition component uses XML to define the type

and structure of the inputs that describe the state of

the patient, and defines which models use which

inputs. The XML files are used by both the Model

Manager and the View Manager. The View Manager

uses the XML definitions within the Web Form

Generator to create the Patient Information Entry

Form like the one given in Figure 1. Once the

clinician has completed the form, the inputs that

he/she entered about the patient are passed to the

prediction model via the Model Manager.
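
To make the role of the XML definitions concrete, the sketch below parses a small variable definition and generates an HTML form from it, using only the XML parser bundled with Java. The paper does not give SPPA's actual XML schema, so the element and attribute names (variables, variable, option, name, label, type) are assumptions chosen for illustration, as is the class name WebFormSketch.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Sketch of a form generator driven by XML variable definitions (schema is hypothetical). */
final class WebFormSketch {

    // Hypothetical definitions; SPPA's real XML format is not given in the paper.
    static final String DEFINITIONS =
        "<variables>"
      + "<variable name='age' label='Age' type='number'/>"
      + "<variable name='gender' label='Gender' type='choice'>"
      + "<option>Female</option><option>Male</option>"
      + "</variable>"
      + "</variables>";

    /** Builds an HTML form with one field per variable element. */
    static String renderForm(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Invalid variable definitions", e);
        }
        // Labels and names are assumed trusted here; a real generator would escape them.
        StringBuilder html = new StringBuilder("<form method=\"post\">\n");
        NodeList variables = doc.getElementsByTagName("variable");
        for (int i = 0; i < variables.getLength(); i++) {
            Element variable = (Element) variables.item(i);
            String name = variable.getAttribute("name");
            html.append("  <label>").append(variable.getAttribute("label")).append(" ");
            if ("choice".equals(variable.getAttribute("type"))) {
                // A choice variable becomes a drop-down with one entry per option.
                html.append("<select name=\"").append(name).append("\">");
                NodeList options = variable.getElementsByTagName("option");
                for (int j = 0; j < options.getLength(); j++) {
                    html.append("<option>").append(options.item(j).getTextContent())
                        .append("</option>");
                }
                html.append("</select>");
            } else {
                // Every other type falls back to a plain text field in this sketch.
                html.append("<input type=\"text\" name=\"").append(name).append("\">");
            }
            html.append("</label>\n");
        }
        return html.append("</form>").toString();
    }

    public static void main(String[] args) {
        System.out.println(renderForm(DEFINITIONS));
    }
}
```

Adding a new input then amounts to adding one variable element to the XML; the Java code that renders the form stays unchanged.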

3.2 Prediction Model

There are three major components that support the

prediction model. In addition to the prediction model

itself, the Model Manager and Variable Definition

components of the Controller work together to help

researchers and clinicians work with different

prediction models. SPPA was specifically designed

to separate out the functionality of prediction

modeling so as to minimize the type and level of

modifications that are necessary to experiment with

different models.

Obviously, to experiment with different models, the

researcher needs to build a new prediction model. If,

however, the new prediction model is written in R

and uses the same input parameters as the current

prediction model then no additional modifications

are needed. The researcher need merely replace the

current R prediction model with the new one. If, on

the other hand, the researcher needs to add some

additional input parameters then he/she must also

modify the Variable Definition component of the

Controller to define the structure and form of the

new inputs in addition to replacing the current

prediction model with the newly generated one.
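
The decision described in this paragraph reduces to a set difference between the inputs a new model requires and the inputs already defined. The sketch below illustrates that check; the class and method names are hypothetical, and the identifier spellings (cellType, smokingStatus) are assumptions, although the inputs themselves are the ones listed for the two models in this paper.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

/** Sketch: which inputs of a new model still lack a variable definition (names hypothetical). */
final class ModelSwapCheck {

    /** Returns the inputs the new model needs that the current definitions do not describe. */
    static Set<String> undefinedInputs(Set<String> definedInputs, Set<String> requiredByNewModel) {
        Set<String> missing = new LinkedHashSet<>(requiredByNewModel);
        missing.removeAll(definedInputs);
        return missing;
    }

    public static void main(String[] args) {
        Set<String> defined = new LinkedHashSet<>(
            Arrays.asList("age", "gender", "stage", "cellType", "grade"));
        Set<String> secondModel = new LinkedHashSet<>(
            Arrays.asList("age", "gender", "stage", "cellType", "grade",
                          "treatment", "smokingStatus"));

        Set<String> missing = undefinedInputs(defined, secondModel);
        if (missing.isEmpty()) {
            // Same inputs: only the R model file has to be replaced.
            System.out.println("Replace the R model file only.");
        } else {
            // New inputs: the Variable Definition XML must be extended as well.
            System.out.println("Also add XML definitions for: " + missing);
        }
    }
}
```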

It is also possible to build a new prediction model

using a programming language other than R, but

such a change also requires a change to the Model

Manager. JRI serves as an interface between Java

and R. Changing to a different statistical

programming language would require embedding a

Java method that defines an interface between Java

and the new statistical programming language.

The actual prediction models we used in building the

system take inputs describing a given patient (e.g.,

age, gender, stage) along with a suggested treatment

and then predict the survival probability of the

patient. In our previous work, we analyzed over

5,000 consecutively enrolled non-small cell lung

cancer patients and developed two models for

predicting a lung cancer patient’s survival probability

(Sun et al., 2006). The first model uses the patient

information available at the time of diagnosis and

has been proven prognostic in our previous work. It

uses age, gender, stage, cell type, and tumour grade

as inputs. The second model uses additional

information, including the treatment options and

the patient’s smoking status.

3.3 User View

The user view is controlled by the View Manager as

described above. It is not necessary to change the

Web Form Generator when adding new inputs to the

prediction model; researchers need only change the

XML definitions in the Variable Definition

component of the Controller. SPPA thus allows

the researcher to make changes to the inputs to

his/her prediction model without having to make

changes in the Java code. Furthermore, given the

existence of XML editors, researchers can change

XML definitions without a complete understanding

of XML.

The information that is gathered via the web page

form is stored in a file using the format defined by

the XML tags of the Variable Definition component

and then transmitted to the Prediction Model. If the

clinician is dealing with an existing patient, then the

web page form is initially filled in with that patient’s

information. The result of the Prediction Model is a

collection of data that predicts the patient’s

survivability. The Presentation Generator allows the

user to view this data using two different formats: a

graph view and a table view. The graph view of the

data, given in Figure 2, is created with JFreeChart,

a free Java chart library (www.jfree.org/jfreechart). The

table view, which is obtainable by selecting the

Table tab, is given in Figure 3.
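
As an illustration of the table view, the sketch below formats a list of time points and survival probabilities as fixed-width text rows. It is not the Presentation Generator's actual code: the class name, the column layout, and the choice of years as the time unit are assumptions, and the numbers in main are placeholders rather than model output.

```java
import java.util.Locale;

/** Sketch of a table view for prediction results (layout and names are hypothetical). */
final class TableViewSketch {

    /** Formats one row per time point: year since diagnosis and survival probability in percent. */
    static String render(int[] years, double[] survival) {
        if (years.length != survival.length) {
            throw new IllegalArgumentException("years and survival must have equal length");
        }
        StringBuilder table =
            new StringBuilder(String.format(Locale.ROOT, "%-6s%s\n", "Year", "Survival"));
        for (int i = 0; i < years.length; i++) {
            // Locale.ROOT keeps the decimal point independent of the server's locale.
            table.append(String.format(Locale.ROOT, "%-6d%.1f%%\n", years[i], survival[i] * 100));
        }
        return table.toString();
    }

    public static void main(String[] args) {
        // Placeholder values for illustration only.
        System.out.print(render(new int[] {1, 2, 5}, new double[] {0.5, 0.25, 0.125}));
    }
}
```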

4 CONCLUDING REMARKS

We developed a software framework, the Survival

Probability Prediction Architecture (SPPA), for

translation of research findings into a clinical

application in a timely fashion and at low cost.

SPPA helps investigators plug in a

developed statistical model, adapt a database, and

make their models available for clinical practice. A

coherent presentation of the patient’s information

and the prediction of a cancer patient’s survival

allows a physician to plan, deliver, and evaluate the

most appropriate treatment for the patient.

We compared the results from the first statistical

model with the results produced from SPPA. The

graphs and tables in both cases were virtually

identical. We have also assessed the tool’s

extensibility using the second model with additional

inputs. The Variable Definition component was modified

to include the new input variables to be used in the

model, and the server was restarted. SPPA requested

the appropriate inputs for patients for the second

model and generated the correct graphs and tables. A

more comprehensive assessment of SPPA, such as

evaluations on task completion efficiency (Hu et al.,

2007) and the system’s success (DeLone and

McLean, 2003), still needs to be conducted.
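
The paper does not say how the two sets of results were compared. One simple way to make such a check repeatable, shown here purely as a sketch, is to compute the largest absolute difference between two survival curves sampled at the same time points and compare it with a tolerance. The class name, the method names, and the tolerance value are assumptions.

```java
/** Sketch of a numerical agreement check between two survival curves (names hypothetical). */
final class CurveComparison {

    /** Largest absolute difference between two curves sampled at the same time points. */
    static double maxAbsDifference(double[] reference, double[] candidate) {
        if (reference.length != candidate.length) {
            throw new IllegalArgumentException("curves must have the same number of points");
        }
        double max = 0.0;
        for (int i = 0; i < reference.length; i++) {
            max = Math.max(max, Math.abs(reference[i] - candidate[i]));
        }
        return max;
    }

    /** True when no point differs by more than the given tolerance. */
    static boolean agree(double[] reference, double[] candidate, double tolerance) {
        return maxAbsDifference(reference, candidate) <= tolerance;
    }

    public static void main(String[] args) {
        // Placeholder curves: one from the standalone model, one from the web tool.
        double[] standalone = {1.0, 0.8, 0.6};
        double[] viaTool = {1.0, 0.8, 0.6};
        System.out.println("Curves agree: " + agree(standalone, viaTool, 1e-9));
    }
}
```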

As a framework prototype, SPPA does not yet include

several useful features. The

inclusion of wireless support would allow a larger

variety of mobile devices to access the tool. A

database is another planned addition to the platform

for storing and retrieving patient information. We

would also like our platform to grant different

permissions to the users with authentication.

ACKNOWLEDGEMENTS

This project is partially funded by a grant through

HealthForce Minnesota.

REFERENCES

DeLone, W. H. & McLean, E. R. (2003) The DeLone and McLean Model of Information Systems Success: A Ten-Year Update. J. Manage. Inform. Syst., 19, 9-30.

Hu, P. J.-H., Zeng, D., Chen, H., Larson, C., Chang, W., Tseng, C. & Ma, J. (2007) System for Infectious Disease Information Sharing and Analysis: Design and Evaluation. IEEE Trans. Information Technology in Biomedicine, 11, 483-492.

Sun, Z., Aubry, M. C., Deschamps, C., Marks, R. S., Okuno, S. H., Williams, B. A., Sugimura, H., Pankratz, V. S. & Yang, P. (2006) Histologic Grade Is an Independent Prognostic Factor for Survival in Non-Small Cell Lung Cancer: An Analysis of 5018 Hospital- and 712 Population-Based Cases. J Thorac Cardiovasc Surg., 131, 1014-1020.

Tan, J., Wen, H. J. & Awad, N. (2005) Health Care and Services Delivery Systems as Complex Adaptive Systems. Communications of the ACM, 48, 36-44.
