
2.2  Metabolic Syndrome  
In Japan, metabolic syndrome is diagnosed from the
following measurements: waist circumference, blood
sugar level, HbA1c, systolic blood pressure,
diastolic blood pressure, triglycerides, HDL
cholesterol, and LDL cholesterol. On this basis,
people are classified into a metabolic syndrome
group, a preliminary group, and a normal group. The
screening method is shown in Figure 1 (Wataru et al.
2008).
 
Figure 1: Screening for metabolic syndrome in Japan. 
This screening checks only four risk factors in a
fixed combination: obesity together with
hyperglycemia, hypertension, or dyslipidemia.
However, other risk factors have been reported, such
as mental disorders (Maria D. Llorente et al. 2006; H.
Klar Yaggi 2006). Therefore, many persons at high
risk for lifestyle-related diseases cannot be
identified by this method.
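The rule in Figure 1 can be written as a short function. The following Python sketch is illustrative only: the thresholds are transcribed from the figure, the `Checkup` record and its field names are hypothetical, and reading "any two" additional factors as "at least two" is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Checkup:
    """Hypothetical checkup record; field names are illustrative only."""
    sex: str  # "male" or "female"
    waist_cm: float
    blood_sugar_mg_dl: float
    hba1c_percent: float
    triglyceride_mg_dl: float
    hdl_mg_dl: float
    systolic_mmhg: float
    diastolic_mmhg: float
    on_diabetes_medication: bool = False
    on_dyslipidemia_medication: bool = False
    on_hypertension_medication: bool = False


def screen(c: Checkup) -> str:
    # Step 1: visceral fat accumulation (thresholds as printed in Figure 1).
    waist_limit = 85.0 if c.sex == "male" else 90.0
    visceral_fat = c.waist_cm > waist_limit

    # Step 2: each additional risk factor is met by any one of its conditions.
    hyperglycemia = (c.blood_sugar_mg_dl > 110 or c.hba1c_percent > 5.5
                     or c.on_diabetes_medication)
    dyslipidemia = (c.triglyceride_mg_dl > 150 or c.hdl_mg_dl < 40
                    or c.on_dyslipidemia_medication)
    hypertension = (c.systolic_mmhg > 130 or c.diastolic_mmhg > 85
                    or c.on_hypertension_medication)
    additional = sum([hyperglycemia, dyslipidemia, hypertension])

    # Step 3: decision. "Any two" is read as "at least two" (assumption).
    if visceral_fat and additional >= 2:
        return "metabolic syndrome"
    if visceral_fat and additional == 1:
        return "preliminary"
    return "normal"


# High blood sugar and blood pressure, but waist below the threshold.
person = Checkup(sex="male", waist_cm=80, blood_sugar_mg_dl=130,
                 hba1c_percent=6.0, triglyceride_mg_dl=100, hdl_mg_dl=50,
                 systolic_mmhg=150, diastolic_mmhg=95)
print(screen(person))  # normal
```

The example person has two additional risk factors but is still placed in the normal group, because every decision path requires the waist criterion. This is the limitation described above.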
3 PROPOSED METHODS 
To realize a screening method that can identify a
variety of health risk factors, we propose a
machine-learning-based screening method that uses
medical checkup data and medical billing data. In
general, medical checkup data include blood test
results that indicate health status, and medical
billing data contain a personal medical history with
information about all of the person's diseases.
Writing decision rules over medical billing data by
hand, as is done for metabolic syndrome, requires
technical knowledge. We therefore apply
machine-learning techniques to handle the huge
volume of data statistically. In this paper, we
propose using latent Dirichlet allocation (LDA)
(Blei et al. 2003). LDA is a topic model, a class of
machine-learning techniques used mainly in natural
language processing. With LDA, we can model each
data item (such as a document) as a mixture of
multiple topics, which is more precise than
clustering approaches such as Gaussian mixture
models or k-means, where each item belongs to a
single cluster.
3.1  Latent Dirichlet Allocation 
LDA models documents easily and is now applied to
various data mining tasks, such as information
retrieval and voice recognition (Ishiguro et al.
2012; Otsuka et al. 2012), data visualization, and
image processing (Fei-Fei et al. 2005; Wang et al.
2009; Wang and Mori 2009; Niebles et al. 2008).
Given a document set, LDA infers the topics of
documents containing many words by assigning each
word to a topic. In LDA, each document is handled as
a bag-of-words representation, and the documents are
described by a word-topic probability matrix (ϕ
matrix) and a topic-document probability matrix (θ
matrix). Several approximation techniques are
available for estimating the parameters of LDA, such
as variational Bayes (Blei et al. 2003), Gibbs
sampling (Griffiths and Steyvers 2004), and collapsed
variational Bayes (Teh et al. 2006). In this paper,
we use Gibbs sampling (Griffiths and Steyvers 2004).
We now review LDA following the notation of
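Griffiths and Steyvers (2004). Let there be $T$ topics,
and let $\mathbf{w}_1, \ldots, \mathbf{w}_D$ be the
bag-of-words representations of the $D$ documents.
(Document $d$ becomes
$\mathbf{w}_d = (w_{d,1}, \ldots, w_{d,N})$, where $N$
is the number of distinct word types.) Also, let $z_i$
be the hidden topic from which word $w_i$ is generated,
$\phi^{(j)}_w = P(w \mid z = j)$, and
$\theta^{(d)}_j = P(z = j)$ for document $d$. LDA
involves the following generative model:

$$
\begin{aligned}
\theta &\sim \mathrm{Dir}(\alpha) \\
z_i \mid \theta^{(d)} &\sim \mathrm{Mult}(\theta^{(d)}) \\
\phi &\sim \mathrm{Dir}(\beta) \\
w_i \mid z_i, \phi^{(z_i)} &\sim \mathrm{Mult}(\phi^{(z_i)})
\end{aligned}
$$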
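The generative model can be made concrete by sampling a small synthetic corpus from it. The following Python sketch uses only the standard library and assumes scalar (symmetric) hyperparameters. The function names, the fixed document length `doc_length`, and the `seed` argument are hypothetical choices made for illustration. This is forward sampling from the model, not the inference procedure.

```python
import random


def sample_dirichlet(concentration, size, rng):
    """Draw from a symmetric Dirichlet by normalizing Gamma variates."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [x / total for x in draws]


def generate_corpus(num_docs, num_topics, vocab_size, doc_length,
                    alpha, beta, seed=0):
    rng = random.Random(seed)
    # phi[j] is topic j's distribution over word types: phi ~ Dir(beta)
    phi = [sample_dirichlet(beta, vocab_size, rng) for _ in range(num_topics)]
    corpus, thetas = [], []
    for _ in range(num_docs):
        # theta is the document's distribution over topics: theta ~ Dir(alpha)
        theta = sample_dirichlet(alpha, num_topics, rng)
        counts = [0] * vocab_size  # bag-of-words vector for this document
        for _ in range(doc_length):
            # z_i ~ Mult(theta): pick a topic for this word
            z = rng.choices(range(num_topics), weights=theta)[0]
            # w_i ~ Mult(phi^(z_i)): pick a word type from that topic
            w = rng.choices(range(vocab_size), weights=phi[z])[0]
            counts[w] += 1
        corpus.append(counts)
        thetas.append(theta)
    return corpus, thetas, phi


corpus, thetas, phi = generate_corpus(
    num_docs=3, num_topics=2, vocab_size=5, doc_length=20,
    alpha=0.5, beta=0.1, seed=1)
for counts in corpus:
    print(counts)
```

Each printed row is one document's word-count vector. Inference runs in the opposite direction: only such counts are observed, and the topic assignments and the two probability matrices have to be estimated.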
 
Dir and Mult denote the Dirichlet distribution and
the multinomial distribution, respectively. α and β
are hyperparameters of the document-topic and
topic-word Dirichlet distributions, respectively.
Here we assume that α and β are scalars, which
results in symmetric Dirichlet priors. Given the
observed words, we have to infer the hidden topics.
To approximate this posterior, we resort to a Markov
chain Monte Carlo (MCMC) sampling scheme,
specifically collapsed Gibbs sampling:
[Figure 1, text content]
1. Confirm the risk factor of accumulation of visceral fat
   ・Waist circumference > 85 centimeters (male) or > 90 centimeters (female)
2. Confirm additional risk factors (each is met by any one of its conditions)
   Hyperglycemia
   ・Blood sugar level > 110 mg/dl
   ・HbA1c > 5.5%
   ・Taking medicine for diabetes mellitus
   Hyperlipidemia
   ・Triglyceride > 150 mg/dl
   ・HDL cholesterol < 40 mg/dl
   ・Taking medicine for dyslipidemia
   Hypertension
   ・Systolic BP > 130 mmHg
   ・Diastolic BP > 85 mmHg
   ・Taking medicine for hypertension
3. Decision
   Risk factor No. 1 is visceral fat; risk factors No. 2 to 4 are the additional risk factors.
   ・Risk factor No. 1 and any two of the additional factors (No. 2 to 4): metabolic syndrome group
   ・Risk factor No. 1 and any one of the additional factors (No. 2 to 4): preliminary group
   ・All other cases: not metabolic group