Behavioral Predictive Analytics towards Personalization for
Self-management
Bon Sy [1,2,3,a], Jin Chen [3], Magdalen Beiting-Parrish [1,3] and Connor Brown [2]
[1] Graduate Center/City University of NY, 365 5th Ave, NY 10016, U.S.A.
[2] Queens College/City University of NY, 65-30 Kissena Blvd., Queens NY 11367, U.S.A.
[3] SIPPA Solutions, 42-06A Bell Blvd., Queens NY 11361, U.S.A.
Keywords: Self-health Management, Behavioral Predictive Analytics.
Abstract: The objective of this research is to investigate the feasibility of applying behavioral predictive analytics to
optimize diabetes self-management. In the U.S., fewer than 25% of patients actively engage in self-management
even though self-management has been reported to be associated with improved health outcomes and reduced
healthcare costs. The proposed behavioral predictive analytics relies on manifold clustering to derive non-linear
clusters. These clusters are characterized by behavior readiness patterns for subpopulation segmentation.
For each subpopulation, an individualized auto-regression model and a population-based model are developed
to support self-management personalization in three areas: glucose self-monitoring, diet management, and
exercise. The goal is to predict personalized activities that are most likely to achieve optimal engagement.
This paper reports the result of manifold clusters based on 148 subjects with type 2 diabetes, and shows the
preliminary result of personalization for 22 subjects under different scenarios.
1 INTRODUCTION
Type 2 (Pre-)Diabetes is a chronic disease that affects
over 115 million Americans and over 440 million
people world-wide. Some of the risk factors are
mitigatable or even reversible through behavior
change towards a healthy lifestyle. It has been
demonstrated elsewhere (Bollyky, 2018) that
behavior change can achieve a 10% or more
improvement in diabetes symptoms if an individual is
engaged in proactive self-management of diabetes.
Self-management is generally accepted as a
viable intervention strategy (Hadjiconstantinou,
2020). Self-management is the patient’s ability to
manage their chronic disease through their own
activities, such as taking their blood glucose and
focusing on meeting diet and activity goals. However,
we do not fully understand the relationship between
the behavior readiness of an individual and the
specific intervention strategy that could deliver
optimal patient engagement in self-management
activities. As evidenced in a survey conducted
elsewhere (Volpp, 2016), fewer than 25% of patients
are considered actively engaged in self-health
management. Population health management will not
be cost effective if self-management programs do not
consider the readiness of the patient population. A
contribution of this research is to provide insight
into the technical feasibility of behavioral predictive
analytics. The goal is to optimize the effectiveness of
self-management strategies by means of
personalization based on predicting behavior
readiness and its relationship to engagement
outcomes. In this study, we aim to demonstrate a
potential predictive system that delivers personalized
content to the users based on their behavior readiness
and user profile.
[a] https://orcid.org/0000-0001-8827-2702
Section 2 contains a brief review of the state of
the art, and the context of this research within it. We
will first discuss the Theory of Planned Behavior, and
the use of behavior constructs as an attribute vector of
behavior readiness. In section 3 the research results
reported elsewhere will be restated as they apply to
this research. In section 4 we will discuss
predictive analytics for personalization using either
an auto-regression model, or a population-based
model. The population-based model provides a
fallback mechanism when the auto-regression model
derivation fails. This could occur when there is
insufficient data, or when the model fails the statistical test of the
model selection process based on Bayesian/Akaike
Information Criteria. In section 5 we will present the
results of manifold clustering based on the attribute
vector of behavior readiness of 148 subjects with type
2 diabetes. This will be followed by the results of a
preliminary study involving 22 subjects who were in
the intervention phase for personalization during the
study period. We will then summarize the results of
this paper, discuss the limitations, and conclude with
our future research plans.
Sy, B., Chen, J., Beiting-Parrish, M. and Brown, C.
Behavioral Predictive Analytics towards Personalization for Self-management.
DOI: 10.5220/0010231801110121
In Proceedings of the 14th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2021) - Volume 5: HEALTHINF, pages 111-121
ISBN: 978-989-758-490-9
Copyright © 2021 by SCITEPRESS Science and Technology Publications, Lda. All rights reserved
2 RELATIONSHIP TO STATE-OF-THE-ART
The Theory of Planned Behavior is a popular
theoretical framework in health psychology. It is used
to describe the underlying psychological mechanisms
that lead to changes in behavior. Within this
framework, the individual has many beliefs about
their behavior as well as beliefs about the normative
behaviors expected within a social context. These all
work together or in opposition to fuel behavioral
attitudes and beliefs in subjective norms, based on the
importance the individual places on these attitudes
and norms. Together, these determine the individual’s
intentions, which lead to the behaviors in question
(Kan, 2017). In line with this theory, our research
proposes targeting a user’s behavioral beliefs to
change their attitudes and intentions toward
actionable health behaviors.
One of the most important features of our
approach is the use of frequent reminders to track
health activities that reveal information about
appropriate health behaviors. In a review of the
literature, Fry and Neff (Neff & Fry, 2009) found that
frequent periodic prompts about improving diet,
increasing physical activity, and weight loss all led to
positive results for study participants. Tailored
prompts in particular were found to have a statistically
significant effect in encouraging user engagement;
however, these prompts do little to engage users who
are not already engaged (Bidargaddi, Pituch,
Maaieh, Short, & Strecher, 2018). Sawesi et al.
(2016) found in a systematic review of the literature
that digital methods such as text messages, web
applications, and social media interventions all were
good intervention tools. These tools can support
behavioral change in users and usually improve
patient engagement. Finally, the use of mobile health
interventions has been found to be an engaging
method for improving health behaviors and is cost
effective for the behavioral change (Van Stee &
Yang, 2020).
3 PREDICTIVE ANALYTICS FOUNDATION
SIPPA (Secure Information Processing with Privacy
Assurance) predictive analytics relies on two
foundational building blocks developed in research
reported elsewhere (Sy, 2017, 2019). The workflow
process for the application of the proposed predictive
analytics consists of three stages. In stage 1, an
individual responds to a survey instrument linked to a
behavior model for measuring readiness. In stage 2,
the outcome measure of the behavior readiness
determines the cluster/subpopulation that the
individual is assigned to. The assignment is based on
the similarity between the individual’s behavior
pattern and the statistically significant association
patterns that characterize the cluster/subpopulation.
In stage 3, the population based model and
individualized week-over-week engagement models
are applied to predict personalized weekly activities
that optimize the success rate of engagement in self-
health management. The details on stage 3 will be
presented in the following section.
The first building block of SIPPA predictive
analytics is a behavior model to enable behavior
readiness prediction. Behavior readiness is a 1x4
vector of continuous (real) numbers quantifying
[ownership, motivation, intention, attitudes]. These
real-valued behavior attributes are constructs of behavior
modelling grounded in the Theory of Planned
Behavior. Structural Equation Modelling (Duncan,
1975) was employed to link questions of a survey
instrument to the behavior constructs, with each link defined by a
weighting factor derived from the confirmatory factor
analysis. The behavior model linked to the survey
questions was statistically validated based on the
responses from over 500 participants (Sy, 2017).
The second building block is an unsupervised
learning approach for discovering manifold clusters.
The novelty of manifold clustering is to induce
patient subpopulation clusters based on statistically
significant association patterns. This approach is not
restricted to continuous (real-valued) data.
In other words, this approach could be applied to a
data set of mixed type with both continuous and discrete
variables. A behavior pattern, which is manifested by
the instantiation of finite discrete variables, is
statistically significant if it survives two tests: (1) a
support measure as defined by normalized
frequency occurrence exceeds a pre-defined
threshold according to the domain problem, and (2)
the association among the observed values does not
happen by chance as measured by the mutual
information measure. There are two important results
of the manifold clustering technique. First, each
manifold cluster has a semantic interpretation
characterized by statistically significant association
patterns; i.e., grouping according to behavior
readiness in this application. Second, manifold
clustering does not require the linearity assumption that
Principal Component Analysis (PCA) does. It will, however,
produce the same result as PCA if the linearity
assumption holds and the iteration is based on
minimizing reconstruction errors; i.e., “phase 2”
regrouping is skipped in the manifold clustering.
While the behavior constructs are related according
to the Theory of Planned Behavior, variations exist, as
shown in the confirmatory factor analysis, regarding
the assumption on linearity; i.e., the existence (and
strength) of a linear relationship between the
behavior constructs that quantify behavior
readiness for self-management in a population.
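The two tests applied to a candidate pattern can be illustrated with a small sketch. It assumes the behavior constructs have already been discretized (for example into "low"/"high" levels) and uses pointwise mutual information as a stand-in for the mutual information measure; the function names and both thresholds are hypothetical, and the exact test described in (Sy, 2019) may differ.

```python
import math

def support(records, pattern):
    """Normalized frequency of records matching every (variable, value) pair."""
    hits = sum(all(r.get(k) == v for k, v in pattern.items()) for r in records)
    return hits / len(records)

def pointwise_mi(records, pattern):
    """log2 of Pr(pattern) over the product of the marginal probabilities.

    A value above 0 means the values co-occur more often than independence
    would predict.
    """
    joint = support(records, pattern)
    if joint == 0:
        return float("-inf")
    independent = 1.0
    for k, v in pattern.items():
        independent *= support(records, {k: v})
    return math.log2(joint / independent)

def is_significant(records, pattern, min_support=0.2, min_mi=0.5):
    # Both thresholds are illustrative assumptions, not values from the paper.
    return (support(records, pattern) >= min_support
            and pointwise_mi(records, pattern) >= min_mi)

# Toy data: four respondents with two discretized constructs.
records = [
    {"motivation": "high", "intention": "high"},
    {"motivation": "high", "intention": "high"},
    {"motivation": "low", "intention": "low"},
    {"motivation": "low", "intention": "low"},
]
pattern = {"motivation": "high", "intention": "high"}
print(support(records, pattern), pointwise_mi(records, pattern))
```

In the toy data the pattern covers half of the respondents and occurs twice as often as independence would predict, so it passes both illustrative thresholds.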
4 PREDICTIVE ANALYTICS FOR PERSONALIZATION
The behavior goal of personalization for self-
management is to target specific user-directed
activities that will be communicated to a user through
a mobile app, and to inform “fulfilment” through
feedback from the app. For example, when a
personalized recommendation is to walk 10,000 steps
a day, one would like to know whether a user follows
through after the user received the recommendation
from the mobile app. Two specific metrics are defined
for this research to gain insights into the effectiveness
of personalization:
Compliance Ratio (CR): Over a period of time,
compliance ratio is the ratio of the number of times a
proposed health related activity (i.e., actionable
health) was acted on over the recommended/expected
number of the related activity given the clinical
condition/disease state of an individual.
Example: Over a period of 30 days, a diabetes
user is encouraged to self-monitor their glucose once
a day under the clinical recommendation
commensurate with their specific diabetic condition.
The expected number of self-monitoring
measurements is 30. Over this period the user self-
monitors 18 times. Therefore, the compliance ratio is
0.6.
Engagement Ratio (ER): Over a given period,
engagement ratio is defined as the total number of
user interactions to the messages over the total
number of messages sent. These messages are health
tips or reminders for health actions, and are sent
through text messaging, push notification, or as an
in-app message.
Example: Over a period of 30 days, three
messages are sent daily: one healthy tip, one reminder
to self-monitor, and one reminder on exercise. The
total number of messages sent is 90. A diabetes user
responds to half of the healthy tips (i.e., 15 out of 30),
⅕ of the reminders on self-monitoring (6 out of 30), and ⅓ of
the reminders on exercise (10 out of 30). The engagement ratio is
(15+6+10)/90 = 31/90.
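The two metrics reduce to simple ratios. The following minimal sketch reproduces the two worked examples; the function names and the message-type keys are hypothetical and chosen only for illustration.

```python
def compliance_ratio(times_acted, times_expected):
    """Number of times an activity was acted on over the expected number."""
    if times_expected <= 0:
        raise ValueError("expected count must be positive")
    return times_acted / times_expected

def engagement_ratio(interactions_by_type, sent_by_type):
    """Total user interactions over total messages sent, across message types."""
    total_sent = sum(sent_by_type.values())
    if total_sent <= 0:
        raise ValueError("at least one message must have been sent")
    total_interactions = sum(interactions_by_type.get(k, 0) for k in sent_by_type)
    return total_interactions / total_sent

# Compliance example: 18 self-monitoring measurements out of 30 expected.
cr = compliance_ratio(18, 30)

# Engagement example: three messages per day for 30 days.
sent = {"tip": 30, "self_monitoring": 30, "exercise": 30}
responded = {"tip": 15, "self_monitoring": 6, "exercise": 10}
er = engagement_ratio(responded, sent)
print(cr, er)
```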
4.1 Prediction based on Auto-regression and Maximum Likelihood
To facilitate the discussion on predictive analytics for
personalization, let P be a population consisting of n
individuals; i.e., |P| = n. $C = \{C_1, \ldots, C_k\}$ is the set of
subpopulations obtained by applying manifold
clustering described in section 3 to P, where $C_i \subseteq P$,
$C_i \cap C_j = \emptyset$ if $i \neq j$, and $P = \bigcup_i C_i$. $p_j^{C_i}$ is the $j$-th
individual in the subpopulation cluster $C_i$. Recall
each manifold cluster $C_i$ is characterized by one or
more statistically significant association patterns of
behavior readiness attribute vector(s). For each individual $p_j^{C_i}$,
there exists a set of engagement/
compliance ratios over some period of time T. Let’s
denote the set of engagement ratios by $\{ER_1, \ldots, ER_T\}$.
T could be different from one individual to another
due to the rolling basis of the enrollment into the pilot.
For example, one individual who just starts
self-management may have (T=) 2 weekly engagement/
compliance ratios while another one in the same
subpopulation may have (T=) 6 weekly engagement/
compliance ratios. Yet they both belong to the same
subpopulation because of their behavior readiness.
This proposed predictive analytics is based on a
two-pronged approach. First, individualized auto-
regression will be applied for personalization when
there is sufficient data on the engagement
(compliance) ratio on a type of messages related to
self-management; e.g., healthy diet. Second, a
population-based model prediction for
personalization will be applied when an individual
does not (yet) have sufficient data on the
engagement (compliance) ratio, or when the individualized
auto-regression model derivation fails statistical
validation. There is sufficient data for generating an
individualized auto-regression model when $T$ is large
enough relative to $l$, where $l$ is the order of the
auto-regression model as
discovered through model selection criteria such as
AIC (Akaike Information Criterion) or BIC (Bayesian
Information Criterion) that pass statistical tests.
4.2 Information-Theoretic Model Selection Approach
The Bayes and Akaike Information Criteria are two
common information-theoretic approaches for model
selection, as stated below:
Bayes Information Criterion (BIC):
$$\mathrm{BIC}(l) = \ln\left(\frac{\mathrm{SSR}(l)}{T}\right) + (l+1)\,\frac{\ln T}{T} \qquad (1)$$
Akaike Information Criterion (AIC):
$$\mathrm{AIC}(l) = \ln\left(\frac{\mathrm{SSR}(l)}{T}\right) + (l+1)\,\frac{2}{T} \qquad (2)$$
where $l$ = number of lags,
$T$ = total number of observations,
$\mathrm{SSR}(l)$ = sum of squared residuals calculated from
the difference between the estimated value derived
from the $l$-th order auto-regression and the actual one.
Objective: choose the $l$ that minimizes BIC/AIC, subject to
p-value < 0.05 and a “large” $R^2$ correlation.
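The lag selection can be sketched in plain Python as follows. This is a minimal illustration rather than the authors' implementation: the auto-regression is fitted by ordinary least squares with an intercept, the AIC penalty is written in its standard form (l+1)·2/T, T is taken to be the number of usable observations for each lag, and all function names are hypothetical. The p-value and R² checks are left out.

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_ar(series, lag):
    """Least-squares AR(lag) with intercept; returns (coefficients, SSR, n_obs)."""
    rows = [[1.0] + [series[t - k] for k in range(1, lag + 1)]
            for t in range(lag, len(series))]
    ys = series[lag:]
    p = lag + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    beta = solve_linear(xtx, xty)
    ssr = sum((y - sum(b * v for b, v in zip(beta, r))) ** 2
              for r, y in zip(rows, ys))
    return beta, ssr, len(ys)

def bic(ssr, n_obs, lag):
    return math.log(ssr / n_obs) + (lag + 1) * math.log(n_obs) / n_obs

def aic(ssr, n_obs, lag):
    return math.log(ssr / n_obs) + (lag + 1) * 2 / n_obs

def select_lag(series, max_lag, criterion=bic):
    """Return the lag with the smallest criterion value, plus all scores."""
    scores = {}
    for lag in range(1, max_lag + 1):
        try:
            _, ssr, n_obs = fit_ar(series, lag)
        except ValueError:
            continue  # skip lags whose normal equations are singular
        if ssr > 0 and n_obs > lag + 1:
            scores[lag] = criterion(ssr, n_obs, lag)
    if not scores:
        raise ValueError("not enough data to fit any auto-regression model")
    return min(scores, key=scores.get), scores
```

A call such as `select_lag(delta_er, max_lag=3)` on a list of weekly changes returns the chosen lag together with the score of every candidate, so the caller can still apply the p-value and R² checks before accepting the model.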
4.3 Predictive Analytics for Personalization
Stage 1: The behavior readiness (a 1x4 vector of Real
[ownership, motivation, intention, attitude]) of each
individual in a population is derived based on the
user's response to a survey instrument.
Stage 2: The population is partitioned into
subpopulations based on the result of manifold
clustering; where each cluster is a subpopulation.
Further technical details about manifold clustering
based on statistically significant association patterns
could be found elsewhere (Sy, 2019).
Stage 3: Repeat the following for each possible
self-management activity (e.g., self-monitoring,
exercise, diet management):
For each subpopulation $C_i$, derive the population
statistical (joint) distribution of ER and ΔER based on
the available engagement ratios of all individuals
$p_j^{C_i}$ in the subpopulation, for $j = 1, 2, \ldots, |C_i|$. In other
words, the joint distribution characterized by Pr(ER,
ΔER) is derived from the $ER_t$ and $\Delta ER_{t+1}$
($t = 1, \ldots, T-1$) of each individual $p_j^{C_i}$ in the population who
has participated in the study for a time period T. This
is referred to as a population-based model to support
predictive analytics specific to the subpopulation
cluster $C_i$ for the rest of the discussions in this paper.
For each individual $p_j^{C_i}$ residing in a
subpopulation (manifold cluster) $C_i$:
1. Perform $l$-th order auto-regression (for $l = 1, \ldots, k$, with $k$ limited by $T$) on the successive change in engagement ratio ΔER; in other words, $\Delta ER_{t+1} = ER_{t+1} - ER_t$ where $t = 1, \ldots, T-1$.
2. Apply AIC or BIC to determine the desirable lag $l$ that minimizes AIC/BIC given the time series data.
3. Note the p-value and the correlation $R^2$ between the actual and the estimated values based on some pre-selected threshold for $R^2$.
4. Predict the change in engagement ratio $\Delta ER_{T+1}^{p}$ based on auto-regression using $T, T-1, T-2, \ldots, T-l$. If the test statistics in step 3 are reasonable (i.e., p-value < 0.05 and $R^2$ meets the threshold), keep the predicted value $\Delta ER_{T+1}^{p}$ and stop. Otherwise continue to step 5.
5. Determine the predicted value $\Delta ER_{T+1}^{p}$ based on $\Delta ER_{T+1}^{p} = \arg\max_{\Delta ER} \Pr(\Delta ER \mid ER = ER_T^{p})$.
Among the choices on the actionable health (e.g.,
self-monitoring, exercise, diet), determine the
actionable health recommendation based on the one
with the largest $\Delta ER_{T+1}^{p}$.
Predicting/recommending coaching agenda based
on compliance ratio is similar by repeating the steps.
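Step 5 and the final selection among activities can be sketched as follows. The sketch assumes that ER and ΔER are discretized into equal-width bins so that the joint distribution can be estimated by counting; the bin width, the tie-breaking rule and all names are hypothetical choices for illustration and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict

BIN_WIDTH = 0.1  # hypothetical discretization step for ER and its change

def to_bin(x, width=BIN_WIDTH):
    # The small offset keeps values such as 0.3 from falling into the bin below.
    return math.floor(x / width + 1e-9)

def build_population_model(er_series_by_subject, width=BIN_WIDTH):
    """Count (ER_t bin, dER_{t+1} bin) pairs over all subjects in one cluster."""
    model = defaultdict(Counter)
    for er in er_series_by_subject:
        for t in range(len(er) - 1):
            delta = er[t + 1] - er[t]
            model[to_bin(er[t], width)][to_bin(delta, width)] += 1
    return model

def predict_delta(model, er_last, width=BIN_WIDTH):
    """Most likely change given the latest ER, i.e. argmax of Pr(dER | ER).

    Returns the midpoint of the winning bin, or None when the cluster has
    no observation for this ER level.
    """
    counts = model.get(to_bin(er_last, width))
    if not counts:
        return None
    best_bin, _ = max(counts.items(), key=lambda kv: (kv[1], kv[0]))
    return (best_bin + 0.5) * width

def recommend(predicted_delta_by_activity):
    """Pick the activity with the largest predicted change in engagement."""
    usable = {a: d for a, d in predicted_delta_by_activity.items() if d is not None}
    return max(usable, key=usable.get) if usable else None

# Toy cluster with three subjects and short weekly ER histories.
cluster = [[0.3, 0.5, 0.4], [0.3, 0.5], [0.3, 0.2]]
model = build_population_model(cluster)
print(predict_delta(model, 0.3))
```

In a full pipeline, `predict_delta` would only be consulted for a subject and activity whose individualized auto-regression model is unavailable or fails its statistical checks; one model would be built per cluster and per activity.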
5 PRELIMINARY STUDY
The proposed approach was applied to the diabetes
subjects of a self-health management pilot conducted
under an IRB-approved study protocol (CUNY IRB
#2018-1043). The objective was to investigate the
impact of digital health solutions to affect
individuals’ behavior towards self-management of
chronic diseases, particularly type 2 diabetes.
To be included in the study, the participants had
to be at least 18 years old. They also needed a
minimum education level of a high school diploma.
An additional criterion was that the participants had
to have an HbA1c of 6.0, or a diagnosis of diabetes or
pre-diabetes. This means that participants also had a
perceived risk of developing or had already
developed diabetes and other associated chronic
illnesses.
The behavior model developed under previous
research for predicting behavior readiness was based
on a population of over 500 individuals. The
population consisted of both healthy individuals as
well as individuals with chronic diseases. The
statistically validated model was applied in stage 1 of
the proposed predictive analytics for personalization.
148 individuals with type 2 diabetes were
involved in stage two of the preliminary study. These
participants had a mean age of 49 and a mean HbA1c
of 7.89%. The population characteristics are shown
below:
Table 1: Participant Demographic Information.
Ethnicity:                    Distribution:
Caucasian                     41.40%
African American              30.90%
African American/Hispanic     3.10%
Asian                         13.80%
Hispanic                      7.50%
Hispanic/Caucasian            1.10%
Indian/Asian                  1.10%
Mexican/Black                 1.10%
Income (in U.S. $):           Distribution:
$0 - $24,999                  27.50%
$25,000 - $49,000             23.33%
$50,000 - $99,999             28.33%
$100,000 - $150,000           12.50%
$150,000 - $199,999           4.17%
> $200,000                    4.17%
Education level:              Distribution:
High school diploma           17.89%
Some college - no degree      21.95%
2-yr college degree           16.26%
4-yr college degree           26.83%
Some graduate work            5.69%
Graduate-level degree         11.38%
Self-perceived health:        Distribution:
Poor                          8.13%
Fair                          28.46%
Good                          43.09%
Very good                     16.26%
Excellent                     4.06%
Sex:                          Distribution:
Female                        51%
Male                          49%
The survey responses of these 148 individuals
were used to identify manifold clusters
(subpopulations). Individuals were grouped into a
cluster when their behavior readiness measures were
close to the statistically significant association
patterns characterizing the cluster. Four manifold
clusters were obtained for stage 2 of this proposed
approach.
Among the 148 individuals participating in this
pilot on a rolling basis, some were still in a one-month
hold period for establishing a baseline without
intervention; i.e., they have not entered the pilot phase
for personalized intervention. On the other hand,
some others already completed the intervention phase
of the pilot. Excluding these two groups, 49 subjects
with type 2 diabetes were left to be included in
deriving the population-based models for
personalized intervention. These were the subjects
who entered/were in the intervention phase of the
study as of this report. The self-health management
focused on the following three health coaching
agenda items:
- Knowledge building and information gathering
(through daily wisdom sent via SMS and/or push
notifications)
- Discipline and skill development (through
notifications and reminders)
- Awareness improvement (through weekly survey)
Figure 1: Push notification. Figure 2: SMS reminder.
The self-health management activities of this pilot
included the delivery of (1) daily wisdom on diabetes
management, (2) text messaging, and/or notification
reminders on diet, physical exercise, and self-
monitoring, and (3) in-app services to track self-
monitoring, diet and steps. This is followed by
weekly online surveys to improve awareness of self-management.
Examples of each of these are shown
in Figures 1 to 4. This study will focus on only a
retrospective analysis based on compliance ratio, and
a forward-looking prediction based on engagement
ratio, for evaluation purposes.
Figure 3: In-app service. Figure 4: Weekly survey.
5.1 Data-driven Model Development
The data collected and used for this preliminary study
are a subset of our pilot sample. When a subject enters
the “intervention” phase of the study protocol, the
SIPPA Health platform collects de-identified activity
meta-data on user interactions with the SIPPA Health
mobile app. This allows us to infer adherence and
engagement in certain activities; e.g., using the app to
conduct medication research or schedule medication
reminders. The survey response data of 148 subjects
were used to derive individuals’ behavior readiness.
Among the 148 subjects, 49 of them have either
completed the study or were in the “intervention”
phase during the study period. The data from these
148 subjects were used for the manifold clustering to
identify subpopulation characteristics defined by
behavior readiness. The data from the 49 subjects just
mentioned were used to derive the population-based
models (section 4.3 stage 3) to support the behavioral
predictive analytics for personalization. The
personalization results reported in this paper are
based on 22 subjects who were in the “intervention”
phase during the study period of this research. A
subject in the “intervention” phase of the study
receives a recommendation on a weekly basis about
the activities on diet management, physical activities,
and self-monitoring of glucose and other vital signs.
Personalization for each subject is performed on a
weekly basis to recommend one activity to focus on
during a week.
Using the behavior readiness of 148 subjects as
training data, four manifold clusters were identified.
Each of the 49 subjects who completed/entered the
intervention phase was assigned to a cluster based on
the similarity of the behavior readiness measure
between the individual and behavior patterns
exhibiting statistically significant association that
define the cluster. Further details on the similarity
function could be found elsewhere (Sy, 2019).
Within each cluster subpopulation, a normalized
compliance ratio and an engagement ratio of each
subject, as well as the change on a weekly basis, are
derived for each one of the activities: diet
management, physical activities, and self-monitoring.
Each ratio is normalized to account for the different
starting times of the participants. For each subject, an
auto-regression model is derived for each activity for
each ratio. It is noted that developing an auto-
regression model is not always feasible. For example,
there may not be sufficient data because in an early
stage of the participation an individual may have only
activity data in one category (such as self-monitoring)
but not the others (such as physical activities).
Furthermore, the data may not yield a valid auto-
regression model because it fails the statistical test in
step 3 during the model selection process using
BIC/AIC. Typically, this happens when a subject is in
the intervention phase for less than four weeks.
In a scenario where an individual auto-regression
model is not feasible, prediction for personalization
for the individual will rely on the population-based
model. For each cluster subpopulation, we derive a
population-based model − one for each activity
defined by the distribution of the compliance/
engagement ratio and the amount of change using the
data of all the subjects in the cluster subpopulation. In
other words, there are n × m such models to capture
engagement (compliance) ratios; where n is the
number of clusters, and m is the number of activity
categories. For example, m=3 if there are three
categories of activities such as diet management,
physical exercise, and self-monitoring. A population-
based model developed for an activity category Aj
(where j = 1 .. m) in a cluster Ci (where i = 1 .. n) is
used to predict an engagement (compliance) ratio for
an individual in Ci when an individual auto-
regression model is not available for the activity
category Aj.
5.2 Preliminary Study
The subjects included in this study were distributed
across four different clusters (subpopulations). The
results reported in this paper are based on an 11-week
(2.5 months) study of personalization in summer of
2020. In other words, the activity data of each subject
since participating in this pilot, leading up to the week
of personalization, was used to develop the prediction
models for the self-management activities. Then for
each subject a recommendation (either exercise or
diet management) was derived using the prediction
algorithm described in the previous section.
5.2.1 Feasibility Assessment
To determine the feasibility of the real-world
application of the proposed behavioral predictive
analytic technique, the design of the preliminary
study consists of two parts. The first part is a
retrospective analysis using the data related to
compliance. The second part is a forward-looking
prediction of engagement. The purpose of the
retrospective analysis is to establish a base reference
for performance assessment based on historical
results. The forward-looking prediction is for
evaluating the prediction performance as a time series
on a rolling basis in real time.
Retrospective Analysis
The predictive analytics will be greatly simplified if
personalization could be based on only the time-series
(engagement/compliance) data. That is, for each
subject, it is possible to derive an auto-regression
model that is also statistically valid according to the
information-theoretic model selection criteria
described in section 4.2. In such a case, manifold-
based clustering could be completely skipped because
a population-based model to support personalization
would not be necessary.
To gain insight into the scenario just described,
an attempt was made to derive an auto-regression
model for each subject who completed/entered the
intervention phase. Out of the 49 subjects, the
auto-regression model derivation was successful for 21
subjects. The remaining 28 subjects have no valid
individualized model and must rely on the
population-based model. Therefore, manifold clustering is required for
this particular use case when applying the algorithm
described in section 4.3.
The compliance ratio is computed on a weekly
basis for each subject. A subject has n data points of
compliance ratio; where n is the number of weeks of
participation in the intervention phase. For deriving
the auto-regression model for a subject, (n-4) data
points were used to derive/train the auto-regression
model, and the model is used to predict the
compliance ratio of the last 4 data points for
evaluation purposes.
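The hold-out evaluation can be sketched as follows. For brevity the sketch fixes the model to first order, whereas the study selects the order through AIC/BIC, and it assumes one-step-ahead prediction in which each forecast uses the previously observed value; the paper does not state whether forecasts were made this way or recursively. All names are hypothetical.

```python
def fit_ar1(series):
    """Least-squares fit of x_t = a + b * x_{t-1}; returns (a, b)."""
    xs, ys = series[:-1], series[1:]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return mean_y, 0.0  # constant history: predict the mean
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y - b * mean_x, b

def holdout_forecast(weekly_cr, holdout=4):
    """Train on the first n - holdout weeks, predict the last holdout weeks."""
    if len(weekly_cr) < holdout + 3:
        raise ValueError("too few weeks to train and evaluate")
    train, test = weekly_cr[:-holdout], weekly_cr[-holdout:]
    a, b = fit_ar1(train)
    predictions, previous = [], train[-1]
    for observed in test:
        predictions.append(a + b * previous)
        previous = observed  # one-step-ahead: condition on the observed value
    return predictions, test
```

The returned pairs of predicted and observed compliance ratios are what a week-by-week correlation analysis across subjects would consume.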
Forward Looking Prediction
In contrast to the retrospective analysis, forward
looking prediction involves only those subjects who
were in the intervention phase during the study
period. Out of the 49 subjects mentioned earlier, 22
of them were used to generate the engagement ratios
and predictive analytics.
The engagement ratio of each active subject was
computed on a weekly basis. Similar to the
retrospective analysis, an estimated engagement ratio
is derived for each week based on the predictive
analytics technique described in section 4.3. The
prediction was performed forward looking. For
example, the prediction on engagement ratio for week
n (n=2 … 11) of the 11-week study period for a
subject would be conducted at week n-1. Then the
actual observed engagement ratio was recorded at
week n. This forward looking prediction process was
repeated ten times in the 11-week study period.
5.3 Results
Figure 5: Predicted compliance ratio for a subject.
Figure 6: Observed compliance ratio for a subject.
5.3.1 Retrospective Analysis
Figures 5 and 6 show the predicted and observed
compliance ratios of the 21 subjects for whom a
statistically valid auto-regression model could be
derived. The result shows the predicted and observed
compliance ratios for each week for each of the 21
subjects, where a compliance ratio is derived based
on a 7-day average. As shown in Figure 7, there is a
consistent pattern across the 4-week prediction
period. Table 2 shows the R and the p-value for the 4
weeks, where R is the correlation coefficient
measuring the strength and direction of the linear
relationship between the predicted and observed
compliance ratios, and the p-value is the probability
that a value of R at least this extreme would occur by
random chance (which is typically compared against
the conventional threshold of 0.05):
Table 2: R and p-values for the tests.
           Week 1    Week 2    Week 3    Week 4
R          0.5178    0.6673    0.7698    0.7008
p-value    0.0162    0.00095   4.5E-05   0.0004
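The relationship between R and its p-value can be sketched as follows. The sketch assumes the standard two-sided t-test for a Pearson correlation with n - 2 degrees of freedom and n = 21 subjects per week; the paper does not state which test was used. The tail probability is obtained by numerically integrating the Student t density with Simpson's rule, and all function names are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(r, n, steps=2000):
    """Two-sided p-value for correlation r over n pairs (steps must be even)."""
    if abs(r) >= 1:
        return 0.0
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # Simpson estimate of the integral over [0, t]
    return max(0.0, 1 - 2 * central_area)

p_week1 = two_sided_p(0.5178, 21)
print(round(p_week1, 4))
```

Under these assumptions the week 1 correlation of 0.5178 gives a p-value of roughly 0.016, in line with Table 2.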
Figure 7: Average predicted vs observed CR.
5.3.2 Forward Looking Prediction
In the forward-looking prediction experiment, the
prediction is on actionable health recommendations
based on the maximal posterior estimate as described
in section 4.3. In this study, the personalized
actionable health recommendation would be in either
diet management or exercise. 22 subjects were in the
intervention phase during this period of research.
Figures 5 through 7 show evidence of its accuracy
and consistency. But we are also interested in the
effectiveness of the prediction technique for
personalization. To evaluate its effectiveness for
improving self-efficacy on health management, this
study also attempts to show that personalized actionable
health (recommended by the behavioral predictive
analytics) results in more active engagement
than engagement without personalization.
In order to understand the effect of
personalization on engagement, the weekly average
engagement ratio without personalization is
compared against the engagement ratio with
personalization. Figure 8 shows the aggregated
weekly engagement average, disregarding
subpopulations, for comparison purposes.
In calculating the engagement ratio without
personalization, the average engagement ratio of each
subject over time prior to personalization is first
calculated, and these per-subject averages are then
averaged over all the subjects.
Note that, for each subject, the period prior to
personalization differs in timing and length, as do the
actionable health recommendations, because of the
rolling nature of the subject participation in the pilot.
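The baseline described above is an average of per-subject averages, which differs from pooling all weekly values when subjects have histories of different lengths. A minimal sketch, with hypothetical names and toy numbers:

```python
def mean(values):
    return sum(values) / len(values)

def baseline_engagement(history_by_subject):
    """Average each subject's pre-personalization ER, then average across subjects."""
    per_subject = [mean(weeks) for weeks in history_by_subject.values() if weeks]
    if not per_subject:
        raise ValueError("no pre-personalization data available")
    return mean(per_subject)

# Toy histories of different lengths, as happens with rolling enrollment.
history = {"s1": [0.2, 0.4], "s2": [0.6]}
baseline = baseline_engagement(history)
pooled = mean([er for weeks in history.values() for er in weeks])
print(baseline, pooled)
```

With these toy numbers the average of averages is 0.45 while the pooled mean is 0.40, so each subject carries equal weight regardless of how many weeks they contributed.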
Figure 8: Aggregated ER w(/o) personalization.
Figure 9: Individual ER average (over 11 weeks).
Figure 9 shows the engagement ratio of each
individual averaged over the participation period.
Half a dozen subjects had a low or zero engagement
ratio in the forward-looking prediction. All
of them received follow-up from this research team to
understand these unusual outcomes. One withdrew
from the study, and two were unreachable during the
study period. Among the rest, one has limited
technology proficiency, and another, an older adult,
relies on her daughter to assist her with certain
self-management activities at a time convenient to her
daughter. Furthermore, one subject (participant 15 in
Figure 9) was active until he damaged his phone
during the study period of this research.
Figure 10 shows the aggregated engagement
average of 22 subjects (with personalization) for each
week during the study period distributed across four
cluster subpopulations.
Figure 10: Observed ER by subpopulation clusters.
5.4 Discussion
5.4.1 Experimental Results
The results shown in Figures 5 through 7 in the
retrospective analysis show evidence of the feasibility
of behavioral predictive analytics in terms of
computational efficacy as measured by accuracy and
consistency.
Figure 8 shows the evidence of the applicability
of the approach in terms of health efficacy. It shows
that engagement level with personalization is better
than that without personalization.
The results shown in Figures 9 and 10 in the
forward-looking experiment demonstrate the
practical implementation feasibility. The results
shown in Figure 10 also reveal indirect evidence of
the effectiveness of the manifold-based clustering
technique for grouping subjects into subpopulations
by means of behavior readiness. In particular,
subpopulation clusters 1 and 2 are the more engaged
patient subpopulations, as reflected in the behavior
readiness characteristics of the clusters. Furthermore,
personalization with strategies tailored for a cluster
seems to improve engagement over time, in particular
for the second cluster subpopulation, which was not as
high performing at the beginning.
Finally, the overall average engagement ratio
with personalization has a mean of 0.31 with a
standard deviation of 0.33 and a 95% confidence
interval of [0.17, 0.45]. By contrast,
without personalization, the overall mean
engagement ratio is 0.26 with a standard deviation of
0.31 and a 95% confidence interval of
[0.13, 0.38]. These results are promising overall;
however, the standard deviations are large and the two
intervals overlap substantially, so one of the next
steps in the research is to gather larger samples.
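As a rough check on the intervals above, the following sketch recomputes them from the reported means and standard deviations. It assumes that the sample size is the 22 subjects in the intervention phase and that a normal-approximation interval (mean plus or minus z times s over the square root of n) was used; neither assumption is stated alongside the intervals, and the function name is illustrative.

```python
import math
from statistics import NormalDist

def normal_ci(mean, sd, n, level=0.95):
    """Normal-approximation confidence interval for a sample mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

# Reported summary statistics; n = 22 is an assumption (see text).
with_p = normal_ci(0.31, 0.33, 22)
without_p = normal_ci(0.26, 0.31, 22)
print([round(x, 2) for x in with_p])     # [0.17, 0.45]
print([round(x, 2) for x in without_p])  # [0.13, 0.39]
```

Under these assumptions the first interval matches the reported one. The second comes out as [0.13, 0.39], within rounding of the reported [0.13, 0.38], plausibly because the published mean and standard deviation are themselves rounded.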
5.4.2 Hypothesis Testing
Although the results shown in the previous figures are
encouraging, it is necessary to conduct a hypothesis
test analysis to understand the extent of improvement
with clustering and personalization, as well as its
statistical significance.
In reference to the results of the forward-looking
prediction shown in Figures 8 to 10, an analysis was
conducted to understand the effect of the population
size on the statistical power. In particular, is the
change in engagement ratio reported in this study
generalizable?
This question was approached by conducting a t-test
on the difference between the mean engagement ratio
with personalization and without personalization, for
the entire sample and within each cluster, based on
the change observed for each participant over the
11-week period of the study.
Table 3: Hypothesis testing results for each cluster.
                              t-statistic   p-value
All data without clustering    0.51758      0.303733
Cluster 1                      0.32971      0.372949
Cluster 2                      1.79319      0.061554
Cluster 3                     -0.48247      0.319928
Cluster 4                     -0.10798      0.459604
While the t-statistic shows an overall
improvement in engagement ratio when
personalization is applied, irrespective of
clustering, and a larger improvement within cluster 2
when clustering is applied, none of the tests reaches
the 0.05 significance level needed for the result
to be generalizable. Clusters 3 and 4 in fact show
slightly negative t-statistics. This suggests that the
study will need a larger population to achieve a power
that allows the result to be generalizable.
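The "All data" row of Table 3 can be approximately recovered from the summary statistics in Section 5.4.1. The sketch below assumes a pooled two-sample t-test with 22 observations in each condition and a one-tailed p-value; the test variant is not stated in the text, so this reading is an inference from the numbers, and the helper names are illustrative. The tail area is obtained by integrating the Student t density with Simpson's rule so that only the standard library is needed.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def tail_beyond(t, df, steps=2000):
    """One-tailed area P(T > |t|), via Simpson's rule on [0, |t|]."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def pooled_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Pooled-variance two-sample t-statistic and its degrees of freedom."""
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_err, df

# Summary statistics from Section 5.4.1; n = 22 per condition is assumed.
t_stat, df = pooled_t(0.31, 0.33, 22, 0.26, 0.31, 22)
print(round(t_stat, 2), df)                # 0.52 42
print(round(tail_beyond(0.51758, 42), 4))  # 0.3037
```

Under these assumptions the t-statistic comes out near 0.52 with 42 degrees of freedom, close to the reported 0.51758, and the one-tailed area for the reported statistic reproduces the reported p-value of about 0.3037. Note that `tail_beyond` returns the area beyond |t|, which appears to be the convention in Table 3, since the rows with negative t-statistics also carry p-values below 0.5.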
5.4.3 Limiting Factors
There are many human factors that need to be
explored in further analyses. These include time spent
in the training period, level of proficiency with
technology, and demographic features that can impact
engagement such as gender and socioeconomic
status.
In addition to the non-technical limitations above,
two factors related to the population-based model are
noteworthy. First, the population-based model
approach is non-parametric and could potentially be
sensitive to the additional data available over time
that could change the behavior of the model as
measured by information-theoretic entropy. Second,
when a personalized recommendation is based on the
population model, it should be noted that the
prediction strategy is a “greedy” approach.
In reference to step 5 of the algorithm that
determines the predicted value $\Delta ER^{p}_{T+1}$
based on $\max \Pr(\Delta ER^{p}_{T+1} \mid ER_{T})$,
a larger $\Delta ER^{p}_{T+1}$ is unlikely to come
from a large $ER_{T}$. For example, if $ER_{T} = 0.9$,
it is not possible for $\Delta ER^{p}_{T+1} > 0.1$; or
$\Pr(\Delta ER^{p}_{T+1} > 0.1 \mid ER_{T} = 0.9) = 0$.
Therefore, the “greedy” approach has
an inherent bias to work better in personalization for
those who are moderately active compared to others.
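The ceiling effect described above can be made concrete with a small sketch. It assumes that the engagement ratio is bounded above by 1, so that the largest attainable change is 1 minus the current engagement ratio, and it uses a made-up grid of candidate increases; the function name and the grid are illustrative and are not part of the algorithm referenced above.

```python
def feasible_deltas(er_t, candidates, upper=1.0, eps=1e-9):
    """Keep candidate increases that would not push ER above `upper`."""
    headroom = upper - er_t
    # eps absorbs floating-point error, e.g. 1.0 - 0.9 is slightly below 0.1
    return [d for d in candidates if d <= headroom + eps]

# Hypothetical grid of candidate increases in engagement ratio.
grid = [0.1, 0.2, 0.3, 0.4, 0.5]

print(feasible_deltas(0.9, grid))  # [0.1]
print(feasible_deltas(0.5, grid))  # [0.1, 0.2, 0.3, 0.4, 0.5]
```

A subject whose current engagement ratio is 0.9 has a single feasible option, whereas a subject at 0.5 retains all five, which is why a rule that favors the largest likely improvement tends to serve moderately active subjects best.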
6 CONCLUSION
A behavioral predictive analytics approach was
presented for self-management personalization. The
personalized recommendation is based on the
engagement outcomes that reveal the behavior
readiness of an individual in self-management. Auto-
regression and population models were derived to
support the proposed predictive analytics approach
for generating personalized recommendations. A
limitation of this research is the requirement for a
“wait” period to accumulate sufficient data to derive
a personalized auto-regression model. In this research
we adopt a strategy that aims to prioritize
personalization based on greatest improvement
possible on engagement in a self-management area.
This has an inherent bias that may negatively impact
individuals with limited potential improvement on
engagement. We do not yet know how this affects
engagement, or at what pace. Our future research
will focus on understanding this aspect. A further
goal is to collect larger samples: our results are
promising, but larger samples are needed to establish
statistical significance and generalizability.
ACKNOWLEDGEMENTS
The authors are indebted to the reviewers for their
valuable comments that helped to improve this paper.
This research is conducted under the support of U.S.
NSF phase 2 grant 1831214. Mike Wassil oversees
the pilot operation described in this research. Michael
Van der Gaag leads the usability study of the mobile
app used in this research. The pilot team consists of
Arora Ashima, Connor Brown,
Brandon Huang,
Rebecca Horowitz, Sumaita Hussain, and Pan Lin.
Dr. Catherine Benedict advised on this research
regarding patient self-efficacy. Dr. Adebola
Orafidiya (MD) helped this pilot team by sharing
clinical best practice on recommending
self-monitoring. This pilot team has also benefited from
the discussions with Dr. Joseph Tibaldi (MD) and
Caterina Trovato (CDE) on patient engagement.
REFERENCES
Bidargaddi, N., Pituch, T., Maaieh, H., Short, C., &
Strecher, V. (2018). Predicting which type of push
notification content motivates users to engage in a self-
monitoring app, Preventive Medicine Reports, 11: 267-
273. https://doi.org/10.1016/j.pmedr.2018.07.004.
Bollyky JB, Bravata D, Yang J, Williamson M, Schneider
J., 2018. Remote Lifestyle Coaching Plus a Connected
Glucose Meter with Certified Diabetes Educator
Support Improves Glucose and Weight Loss for People
with Type 2 Diabetes. J Diabetes Res. 2018;
2018:3961730. Published 2018 May 16.
doi:10.1155/2018/3961730
CDC, 2020. National Diabetes Statistics Report.
https://www.cdc.gov/diabetes/pdfs/data/statistics/national-diabetes-statistics-report.pdf
Duncan, Otis Dudley. 1975. Introduction to Structural
Equation Models. New York: Academic Press.
Hadjiconstantinou M, Schreder S, Brough C, et al., 2020.
Using Intervention Mapping to Develop a Digital Self-
Management Program for People With Type 2
Diabetes: Tutorial on MyDESMOND. J Med Internet
Res. 2020;22(5):e17316. Published 2020 May 11.
doi:10.2196/17316
Kan M.P.H. & Fabrigar L.R., 2017. Theory of Planned
Behavior. In: Zeigler-Hill V., Shackelford T. (eds)
Encyclopedia of Personality and Individual
Differences. Springer, Cham
Neff R. & Fry, J., 2009. Periodic prompts and reminders in
health promotion and health behavior interventions:
Systematic review. Journal of Medical Internet
Research, 11(2). URL: https://www.jmir.org/2009/2/e16
DOI: 10.2196/jmir.1138
Sawesi, S., Rashrash, M., Phalakornkule, K., Carpenter,
J.S., Jones, J.F., 2016. The impact of information
technology on patient engagement and health behavior
change: A systematic review of the literature. JMIR
Medical Informatics, 4(1):e.1, doi: 10.2196/medinform.4514
Sy B., 2017. "SEM Approach for TPB: Application to
Digital Health Software and Self-Health Management,"
2017 International Conference on Computational
Science and Computational Intelligence (CSCI), Las
Vegas, NV, 2017, pp. 1660-1665, doi:
10.1109/CSCI.2017.289.
Sy B., Chen J. and Horowitz R., 2019. "Incorporating
Association Patterns into Manifold Clustering for
Enabling Predictive Analytics," 2019 International
Conference on Computational Science and
Computational Intelligence (CSCI), Las Vegas, NV,
USA, 2019, pp. 1300-1305, doi: 10.1109/CSCI49370.2019.00243.
Van Stee, S. K., & Yang, Q., 2020. The effectiveness and
moderators of mobile applications for health behavior
change. In Technology and Health (pp. 243–270).
https://doi.org/10.1016/b978-0-12-816958-2.00011-3
Volpp KG, Mohta N., 2016. Insights Report: Patient
Engagement Survey: Improved Engagement Leads to
Better Outcomes, but Better Tools Are Needed. NEJM
Catalyst, May 2016.
https://catalyst.nejm.org/patient-engagement-report-improved-engagement-leadsbetter-outcomes-better-tools-needed/