New Maximum Similarity Method for Object Identification in Photon
Counting Imaging
V. E. Antsiperov
Kotelnikov Institute of Radioengineering and Electronics of RAS, Mokhovaya 11-7, Moscow, Russian Federation
Keywords: Machine Learning, Image Recognition, Photon Counting Detectors, Poisson Point Process Intensity,
Precedent‒based Identification, Naive Bayes Model, Gaussian Mixture Model, K‒means Clustering.
Abstract: The paper discusses a new approach to the recognition / identification of test objects according to the shape of their intensity in images registered by photon counting detectors. The main problem analyzed within the framework of the proposed approach is the identification decision (inference), based on a registered set of discrete photocounts (~ photons), regarding the similarity of the shape of the object's intensity in the image to the shapes of previously observed objects (precedents). It is shown that when the intensity shape is approximated by a mixture of Gaussian components, a recurrent identification algorithm can be synthesized within this approach, similar to the well-known K-means clustering algorithm of machine (statistical) learning.
1 INTRODUCTION
Biomedical imaging includes several methods and techniques for representing, by (usually digital) images, various organs and their sections that are, as a rule, hidden from direct (visual) observation. The demand for
imaging tools in modern medicine is growing at a
rapid pace, and this is noted both in the field of
analysis of clinical cases and in the field of
diagnostics. The need for modern diagnostic methods
in medicine has led to the expansion of developments
in various areas of biomedical imaging. Magnetic
resonance imaging (MRI), computed (X-ray)
tomography (CT), positron emission tomography
(PET), single-photon emission computed
tomography (SPECT), etc. should be noted among the
main areas that have won reliable positions today
(Darby, 2012).
The existence of various imaging methods is
usually associated with the sensitivity of the
corresponding techniques to a certain type of tissue.
For example, MRI images are sensitive to soft tissue,
while X-ray images are more sensitive to hard and
bony structures. However, despite the difference in
the underlying physical principles, a common feature
of all methods is the low level of the radiation used. As a result, so-called photon-counting detectors (PCDs) are widely used as the visualization sensors (Leng, 2019). At the same time, while the photon-counting mode is natural for PET and SPECT, until recently the mode of photon accumulation (energy-integrating detectors, EIDs) was mainly used for CT. However, recent developments of energy-sensitive PCDs open up new possibilities for obtaining X-ray images at low photon fluxes, involving the registration of images in the PCD mode (from ~ 10 photons per pixel to, in the future, 1:1) (Willemink, 2018).
This energy-sensitive PCD technology has the potential to revolutionize clinical CT by providing a higher contrast-to-noise ratio, improving spatial resolution, and opening up possibilities for spectral (colour) imaging. In this regard, the paper outlines a new image processing method that can be chosen as the basis for a modern approach to PCD-imaging problems. The proposed method is defined as the method of maximum similarity and represents an adaptation of R. Fisher's maximum likelihood (ML) method to machine learning problems. Within the framework of the proposed approach, the recognition, or more precisely, the identification of objects in PCD images is performed in accordance with the shape of their intensity.
2 MAXIMUM SIMILARITY
METHOD
The starting points of the maximum similarity method are formally as follows. It is assumed that for the objects under consideration there are sets of (random) observations, counts of the form $X = \{\vec{x}_1, \dots, \vec{x}_n\}$, where $\vec{x}_i \in \mathbb{R}^d$. The number $n$ of observations in the set $X$ can be arbitrary. The physical mechanism of their registration (observation) is not specified here. For the maximum similarity method presented in this work, only the statistical description of the observations is essential.
From the statistical point of view, it is assumed that each observation $\vec{x}_i$ is random and the process of its registration is described by some parametric probability distribution with density $\rho(\vec{x} \mid \theta_O)$, $\theta_O \in \Theta \subset \mathbb{R}^m$ (parametric model). It is also assumed, following the Bayesian point of view, that the parameters $\theta_O$ are random variables with a certain prior distribution density $\mathcal{P}(\theta_O)$, the exact form of which, however, is not essential. Both assumptions allow us to use the joint distribution density $\rho(\vec{x}; \theta_O) = \rho(\vec{x} \mid \theta_O)\,\mathcal{P}(\theta_O)$ of the observations $\vec{x}$ and the parameters $\theta_O$.
The presented parametric model belongs to the class of so-called generative models (Jebara, 2004), which imply a correspondence of certain values of the parameters $\theta_1, \theta_2, \dots, \theta_k, \dots \in \Theta$ to the objects observed (in short, these values can be thought of as object class labels (Wasserman, 2004)). However, the problem with generative models is that the exact values of the parameters corresponding to the objects are unknown; only their statistical estimates, based on the sets $X_1, X_2, \dots, X_k, \dots$ of the objects' observed data, are available.
In particular, the maximum likelihood (ML) estimates $\hat{\theta}^{(k)}$ (Efron, 1982), determined from R. Fisher's ML equations, can be used as such estimates:

$$\hat{\theta}^{(k)} = \arg\max_{\theta \in \Theta} \varrho(X_k \mid \theta). \qquad (1)$$
As follows from (1), to find the maximum likelihood estimates, it is necessary, for each $n$ (including $n=1$), to determine its parametric $n$-model of the set of observations $X$, i.e. the corresponding probability distribution density $\varrho(\vec{x}_1, \dots, \vec{x}_n \mid \theta)$ (these models for different $n$ should be consistent in accordance with the Kolmogorov theorem, see (Billingsley, 1986)). However, it is possible to significantly simplify the problem of determining the $n$-models by assuming the conditional (for a given $\theta$) independence of the individual observations of the set:

$$\varrho(X \mid \theta) = \prod_{i=1}^{n} \rho(\vec{x}_i \mid \theta). \qquad (2)$$

Note that assumption (2) is actively used in machine learning problems, for example, within the framework of the naive Bayes method (Barber, 2012), which is one of the ten most popular modern algorithms (Wu, 2007).
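For concreteness, the following minimal sketch (Python / NumPy) illustrates assumption (2): the log-density of a set of conditionally independent observations is the sum of per-observation log-densities, for any per-observation model $\rho(\vec{x} \mid \theta)$ supplied as a callable. The isotropic Gaussian used in the usage lines is only an illustrative choice and is not prescribed by the general model.

```python
import numpy as np

def set_log_density(X, log_rho, theta):
    """Assumption (2): ln rho(X | theta) = sum_i ln rho(x_i | theta)."""
    return sum(log_rho(x, theta) for x in X)

# Usage with an illustrative isotropic Gaussian rho(x | theta), theta = (mean, sigma):
def log_rho_gauss(x, theta):
    mean, sigma = theta
    d = x.size
    return -0.5 * np.sum((x - mean) ** 2) / sigma**2 - d * np.log(sigma * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                      # n = 100 two-dimensional counts
print(set_log_density(X, log_rho_gauss, (np.zeros(2), 1.0)))
```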
The problem of identification, i.e. the correlation of some observable test object with one of the previously registered training objects (hereinafter called precedents), is now formalized in the context of the presented parametric model as the problem of maximizing some measure of similarity between the data $X = \{\vec{x}_1, \dots, \vec{x}_n\}$ observed from the object and the data sets $X_1, X_2, \dots, X_k, \dots$ obtained earlier for the precedents. Since no additional knowledge about the object and the precedents, except for the observed data $X, X_1, X_2, \dots, X_k, \dots$, is assumed (including the values of the characteristic parameters $\theta_1, \theta_2, \dots, \theta_k, \dots \in \Theta$, the characteristics of $\mathcal{P}(\theta)$, etc.), it is highly desirable that the corresponding similarity measure $\mu(X, X_k)$ could be expressed in terms of this and only this data.
A natural quantitative characteristic of the consistency of the data $X$ and an arbitrary set of observations $X_k$ is the probability density of their joint distribution $p(X, X_k)$, under the assumption that both sets correspond to the same unknown object.

Taking into account the basic assumption (2) about the considered model (as well as the conditional independence of the sets $X$ and $X_k$), the density $p(X, X_k)$ can be written in the following form:

$$
p(X, X_k) = \int p(X, X_k \mid \theta)\,\mathcal{P}(\theta)\, d\theta
= \int \varrho(X \mid \theta)\,\varrho(X_k \mid \theta)\,\mathcal{P}(\theta)\, d\theta
= \int \prod_{i=1}^{n} \rho(\vec{x}_i \mid \theta) \prod_{j=1}^{n_k} \rho(\vec{x}^{(k)}_j \mid \theta)\,\mathcal{P}(\theta)\, d\theta
= \int \varrho(X \cup X_k \mid \theta)\,\mathcal{P}(\theta)\, d\theta, \qquad (3)
$$
where $\{\vec{x}^{(k)}_j\}$ is the set of $n_k$ observations $X_k$, and $X \cup X_k$ is the set of $n + n_k$ observations obtained by uniting $X$ and $X_k$. In other words, $p(X, X_k)$ (3) is the unconditional distribution density of a model of type (2) for the united set of observations $\{\vec{x}_i\} \cup \{\vec{x}^{(k)}_j\}$. Note that the data $X$ and $X_k$, being conditionally independent, in the general case turn out to be unconditionally dependent due to the correlation $X_k \sim \theta \sim X$.
Considering the above interpretation, it is clear that $p(X, X_k)$ in some sense reflects the degree of consistency between $X$ and $X_k$. Namely, if all the objects' observed data $X_1, X_2, \dots, X_k, \dots$ were of the same size $n_1 = n_2 = \dots = n_k = m$, then all the sets $\{X \cup X_k\}$ would be random samples within the same parametric $(n+m)$-model $\varrho(X \mid \theta)$ (2). In this case, the degree of consistency $X \sim X_k$ would be determined by the probability of their joint sample $X \cup X_k$, i.e. $p(X, X_k)$ could indeed be considered as a similarity measure. The problem, however, is that, due to the arbitrary value of $n_k$, the sets $\{X \cup X_k\}$ belong to different models and, therefore, the comparison of the values $p(X, X_k)$ for different $X_k$ should be corrected for this circumstance.
In the proposed maximum similarity method, the corresponding correction is specified by normalizing the values $p(X, X_k)$ (3) by the probabilities $p(X_k)$:

$$\mu(X, X_k) = \frac{p(X, X_k)}{p(X_k)} = p(X \mid X_k), \qquad (4)$$
i.e. the similarity measure $\mu(X, X_k)$ is chosen as the ratio of the probability of the sample $X \cup X_k$ to the probability of the sample $X_k$ alone. In this case, the maximum similarity method consists in choosing the precedent for which the extension of the sample $X_k$ by $X$ leads to the greatest probability ratio.

As follows from (4), the chosen probability ratio formally coincides with the conditional probability of the set $X$ given the observations $X_k$. The latter leads to an alternative interpretation of the proposed method: the observed object is identified with the precedent whose observations $X_k$ lead to the maximum conditional probability of the set of observations $X$ of the tested object.
Using the selected measure of similarity $\mu(X, X_k)$ (4) between the observed data and the precedents, the maximum similarity method can be formalized as the solution of the following maximum similarity (MS) equation:

$$\hat{k}_{MS} = \arg\max_{k} \mu(X, X_k) = \arg\max_{k} p(X \mid X_k). \qquad (5)$$
To further substantiate the proposed method, namely the choice of the similarity measure in the form (4), it is convenient to turn to the asymptotic case of large samples of precedents, $n_k \gg 1$. Note that, in addition to questions of convenience, this case quite adequately reflects the specifics of the organization of the (machine) learning process.
In the general (not necessarily asymptotic) case, the conditional probability $p(X \mid X_k)$ in terms of the parametric model (3) can be written as:

$$p(X \mid X_k) = \frac{\int \varrho(X \mid \theta)\,\varrho(X_k \mid \theta)\,\mathcal{P}(\theta)\, d\theta}{\int \varrho(X_k \mid \theta)\,\mathcal{P}(\theta)\, d\theta}. \qquad (6)$$
In the case $n_k \gg 1$, the well-known asymptotic approximation of $\varrho(X_k \mid \theta)$ (2) in the neighbourhood of $\hat{\theta}^{(k)}$ (1) holds (Wasserman, 2004):

$$
\varrho(X_k \mid \theta) \cong \varrho(X_k \mid \hat{\theta}^{(k)})
\exp\left\{-\frac{n_k}{2}\left(\theta - \hat{\theta}^{(k)}\right)^{T} I(\hat{\theta}^{(k)})\left(\theta - \hat{\theta}^{(k)}\right)\right\}, \qquad (7)
$$
where $T$ denotes transposition and $I(\hat{\theta}^{(k)})$ is the Fisher information matrix for the distribution $\rho(\vec{x} \mid \theta)$, one of the most important characteristics of the adopted parametric model:

$$I_{ij}(\theta) = -\int \rho(\vec{x} \mid \theta)\,\frac{\partial^2 \ln \rho(\vec{x} \mid \theta)}{\partial \theta_i\, \partial \theta_j}\, d\vec{x}. \qquad (8)$$
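As an illustration of definition (8), the following sketch estimates the Fisher information by a Monte Carlo average of the negative second derivative of the log-density. The zero-mean Gaussian family with unknown variance used below is chosen purely for illustration (it is not part of the paper's model), and all names are hypothetical.

```python
import numpy as np

# Illustrative family: zero-mean Gaussian with unknown variance theta, rho(x | theta) = N(x; 0, theta).
# d^2/dtheta^2 ln rho(x | theta) = 1/(2 theta^2) - x^2 / theta^3, and the exact value is I(theta) = 1/(2 theta^2).

def d2_log_rho(x, theta):
    return 1.0 / (2.0 * theta**2) - x**2 / theta**3

def fisher_information_mc(theta, n_mc=200_000, seed=0):
    """Monte Carlo estimate of (8): I(theta) = -E_theta[ d^2 ln rho(x | theta) / dtheta^2 ]."""
    rng = np.random.default_rng(seed)
    x = rng.normal(loc=0.0, scale=np.sqrt(theta), size=n_mc)   # x ~ rho(x | theta)
    return -np.mean(d2_log_rho(x, theta))

theta = 2.0
print(fisher_information_mc(theta), 1.0 / (2.0 * theta**2))    # estimate vs exact value
```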
Considering the sharpness of the peak of the asymptotic (7) in the vicinity of $\hat{\theta}^{(k)}$, we can approximate the numerator in (6) by:

$$\int \varrho(X \mid \theta)\,\varrho(X_k \mid \theta)\,\mathcal{P}(\theta)\, d\theta \approx \varrho(X \mid \hat{\theta}^{(k)}) \int \varrho(X_k \mid \theta)\,\mathcal{P}(\theta)\, d\theta. \qquad (9)$$

As a result, $p(X \mid X_k)$ (6) is simplified to $\varrho(X \mid \hat{\theta}^{(k)})$, and the similarity measure $\mu(X, X_k)$ (4) takes the following simple form:

$$\mu(X, X_k) \cong \varrho(X \mid \hat{\theta}^{(k)}). \qquad (10)$$
In other words, in the case of large sets of precedent observations $X_1, X_2, \dots, X_k, \dots$, $n_k \gg 1$ (the number of observations $n$ of the set $X$ of an identified object does not have to be large), the naive Bayes distribution density $\varrho(X \mid \theta)$ (2) for the parameter value $\theta = \hat{\theta}^{(k)}$ can be used as the similarity measure $\mu(X, X_k)$. The latter means that for arbitrary precedents (arbitrary $n_k$), the similarity measures $\mu(X, X_k)$ (4) are determined within the same $n$-model, so the comparison of their values is quite justified. Note that not only ML estimates but also many other, so-called consistent, estimates give an asymptotic of the form (7). In this context, $\hat{\theta}^{(k)}$ can be understood as any consistent estimate, not necessarily a solution of problem (1).
From a pragmatic point of view, expression (10) for the similarity measure $\mu(X, X_k)$ seems even more attractive than (4). The corresponding formulation of the maximum similarity method, which is the asymptotic limit of the general formulation (5), takes on a form similar to the maximum likelihood method (1):

$$\hat{k}_{MS} = \arg\max_{k} \varrho(X \mid \hat{\theta}^{(k)}) = \arg\max_{\theta \in \{\hat{\theta}^{(k)}\}} \varrho(X \mid \theta), \qquad (11)$$
with the only difference that the maximization is performed not over all $\theta \in \Theta$, but only over the finite set of estimates $\hat{\theta}^{(1)}, \hat{\theta}^{(2)}, \dots, \hat{\theta}^{(k)}, \dots \in \Theta$. Note that, from a practical point of view, in this case there is also no need to store the precedent observation sets $X_1, X_2, \dots, X_k, \dots$ in full; it is enough to save only the parameter estimates (statistics) $\hat{\theta}^{(k)}$ obtained from the observations.
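A minimal sketch of the identification rule (11), assuming, for illustration only, that each precedent has been reduced to a stored estimate (mean, sigma) of an isotropic Gaussian per-observation density and that the counts follow the conditionally independent model (2); the data structures and names below are hypothetical rather than taken from the paper.

```python
import numpy as np

def log_likelihood(X, mean, sigma):
    """Naive-Bayes log-likelihood (2) of test counts X under an isotropic Gaussian rho(x | theta)."""
    d = X.shape[1]
    return float(np.sum(-0.5 * np.sum((X - mean) ** 2, axis=1) / sigma**2
                        - d * np.log(sigma * np.sqrt(2 * np.pi))))

def identify(X, precedent_estimates):
    """Rule (11): choose the precedent whose stored estimate theta^(k) maximizes rho(X | theta^(k))."""
    return max(precedent_estimates, key=lambda k: log_likelihood(X, *precedent_estimates[k]))

# Usage: two stored precedent estimates and a test sample drawn near the second one.
rng = np.random.default_rng(1)
precedents = {"A": (np.array([0.0, 0.0]), 1.0), "B": (np.array([3.0, 3.0]), 1.0)}
X_test = rng.normal(loc=[3.1, 2.9], scale=1.0, size=(50, 2))
print(identify(X_test, precedents))  # expected: "B"
```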
3 MAXIMUM SIMILARITY
METHOD IN THE CASE OF
GAUSSIAN MIXTURES
In previous works (Antsiperov, 2019a), (Antsiperov, 2019b), based on the physical (semi-classical) mechanisms of radiation registration by matter, it was shown that the most adequate statistics of photocounts in the PCD image data $X = \{\vec{x}_1, \dots, \vec{x}_n\}$ is given by the Poisson point process (PPP) model (Streit, 2010). It was also shown that for the problems of object identification only by the shape of the intensity, regardless of the total brightness, one can restrict oneself to the conditional probability distribution densities of the coordinates of the counts $\{\vec{x}_i\}$, factored into a product, provided that their total number $n$ of registered counts is given:

$$
\varrho(\vec{x}_1, \dots, \vec{x}_n \mid n, I(\vec{x})) = \prod_{i=1}^{n} \rho(\vec{x}_i \mid I(\vec{x})), \qquad
\rho(\vec{x} \mid I(\vec{x})) = I(\vec{x}) \Big/ \int_{\Omega} I(\vec{x})\, d\vec{x}, \qquad (12)
$$
where $I(\vec{x})$ is the intensity of the radiation coming from the object and recorded on the PCD surface $\Omega$.
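The model (12) also suggests a simple way to simulate photocount data for experiments: given an intensity map on a pixel grid, draw $n$ count coordinates from the normalized intensity. A minimal sketch, assuming a discretized intensity image; the names and the grid construction are illustrative:

```python
import numpy as np

def sample_counts(intensity, n, rng=None):
    """Draw n photocount coordinates from rho(x | I(x)) = I(x) / integral of I over Omega (12),
    with the intensity given as a non-negative 2-D array over the detector surface."""
    rng = rng or np.random.default_rng()
    p = intensity.ravel() / intensity.sum()                  # normalized intensity shape
    idx = rng.choice(intensity.size, size=n, p=p)            # pixel indices of the counts
    rows, cols = np.unravel_index(idx, intensity.shape)
    return np.stack([rows, cols], axis=1).astype(float)      # count coordinates (row, col)

# Usage: a smooth bump intensity on a 64x64 detector, ~1000 registered counts.
y, x = np.mgrid[0:64, 0:64]
intensity = np.exp(-((x - 40) ** 2 + (y - 24) ** 2) / (2 * 8.0 ** 2))
counts = sample_counts(intensity, n=1000, rng=np.random.default_rng(2))
print(counts.shape)  # (1000, 2)
```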
Let the intensity $I(\vec{x})$ be approximated by a parametric intensity model $I(\vec{x} \mid \theta_O)$, $\theta_O \in \Theta \subset \mathbb{R}^{P}$, which we define as a sum, a mixture of $K$ overlapping components, frames $\{F_j(\vec{x} \mid \vec{\mu}_j, \theta_O)\}$, $j = 1, \dots, K$, located at the nodes $\vec{\mu}_j$ of some imaginary regular lattice covering $\Omega$:

$$I(\vec{x} \mid \theta_O) = \sum_{j=1}^{K} F_j(\vec{x} \mid \vec{\mu}_j, \theta_O). \qquad (13)$$
The components $F_j(\vec{x} \mid \vec{\mu}_j, \theta_O)$ in (13) are assumed to be copies of some basic frame $F(\vec{x} \mid \vec{\eta}) \ge 0$, $\vec{\eta} \in \mathbb{R}^{p}$, $P = pK$, shifted to the nodes of the lattice $\vec{\mu}_j$ and specified up to the values of the parameters $\vec{\eta}$. The region $\Delta \subset \mathbb{R}^{2}$ in which $F(\vec{x} \mid \vec{\eta}) \ne 0$ will be called the carrier of the base frame, or the base carrier. This carrier $\Delta$ is assumed to be symmetric in the sense that it contains the coordinate origin $\vec{x} = 0$ and, together with each $\vec{x} \in \Delta$, contains $-\vec{x}$. The simplest example of such a carrier is some regular polygon, for example a square, centred at the coordinate origin.
Let us denote by $\vec{\eta}_j$ the values of the parameters of the frame / component in the mixture (13) located at the node $j$ ($\vec{\eta}_j$ can be considered as related to this node $j$, like $\vec{\mu}_j$). Taking into account the assumptions made, model (13) is refined in the form:

$$I(\vec{x} \mid \theta_O) = \sum_{j=1}^{K} F(\vec{x} - \vec{\mu}_j \mid \vec{\eta}_j), \qquad (14)$$

where the complete set of parameters $\theta_O$ is now represented by the set $\{\vec{\eta}_1, \dots, \vec{\eta}_K\}$ and it is considered that the $j$-th component of the mixture depends only on a part of the parameters, namely on $\vec{\eta}_j$.
In model (14), the carriers $\Delta_j$ of the components of the mixture (copies of the base carrier $\Delta$ shifted by $\vec{\mu}_j$) are assumed to be partially overlapping. This is provided by the requirement $D > d$, where $D$ is the characteristic size of $\Delta$ and $d$ is the lattice spacing. Assuming the last requirement is fulfilled, we obtain that the set of carriers $\Delta_j$ completely covers the image area $\Omega$. The simplest example of such a covering of $\Omega$ is square carriers $\Delta_j$ whose centres are located at the nodes $\vec{\mu}_j$ of a rectangular lattice.

When $\Omega$ is completely covered by the carriers $\Delta_j$, each point $\vec{x} \in \Omega$ belongs to at least one carrier. Therefore, the set of nodes whose carriers contain $\vec{x}$ is not empty. We denote the set of indices of these nodes by $\delta_{\vec{x}} = \{j\}$ and call it the lattice environment of $\vec{x}$. Due to the symmetry of the base carrier $\Delta$, the nodes in $\delta_{\vec{x}}$ are those contained in the region obtained by shifting the base carrier $\Delta$ to the point $\vec{x}$. Lattice environments $\delta_{\vec{x}}$ will be used intensively in the construction of the identification procedure.
The next step in refining model (14) is related to the choice of the dependence of the base frame $F(\vec{x} \mid \vec{\eta})$ on the parameters $\vec{\eta} = (\eta_0, \eta_1, \dots, \eta_{p-1})^{T}$. The simplest form of such a dependence could be a linear combination, with the coefficients $\eta_q$, of some finite functional basis $\{\varphi_0(\vec{x}), \varphi_1(\vec{x}), \dots, \varphi_{p-1}(\vec{x})\}$, for example, in the spirit of the construction of frames (Gröchenig, 2001). However, in this case, to ensure the positivity of the base frame $F(\vec{x} \mid \vec{\eta}) \ge 0$, significant restrictions would be required both on the basis $\{\varphi_q(\vec{x})\}$ and on the parameters $\vec{\eta}$. Therefore, it seems more appropriate to expand in the functional basis not the frame itself, but its logarithm:

$$F(\vec{x} \mid \vec{\eta}) = I_O \exp\left\{\sum_{q=0}^{p-1} \eta_q\,\varphi_q(\vec{x})\right\}, \qquad \vec{x} \in \Delta, \qquad (15)$$
where the multiplier $I_O$ is introduced to ensure the correct frame dimension. Bearing in mind the subsequent normalization of the intensity, it is convenient to take as the multiplier $I_O$ the average radiation intensity over $\Omega$:

$$I_O = \frac{1}{\Sigma_\Omega} \int_{\Omega} I(\vec{x})\, d\vec{x}, \qquad (16)$$

where $\Sigma_\Omega$ is the surface area of $\Omega$.
It is natural to require that the parametric family $F(\vec{x} \mid \vec{\eta})$ (15) contains at least all the constants $I(\vec{x}) \equiv I > 0$. For this, one of the basis functions, for example $\varphi_0(\vec{x})$, should be taken as a constant: $\varphi_0(\vec{x}) \equiv 1$. The corresponding parameter $\eta_0$ will then set the normalization of the components by means of the factor $\exp\{\eta_0\}$ in (15). In view of the subsequent transition to the normalized version of the intensity, it is convenient to introduce, instead of the parameter $\eta_0$, another parameter $w = w(\vec{\eta})$, which is a function of $\vec{\eta}$ and has the meaning of the fraction of the energy per frame (15), for the given $\vec{\eta}$, of the total energy $W_O = I_O \Sigma_\Omega$ (falling per unit time on $\Omega$):
$$
w(\vec{\eta}) = \frac{1}{W_O} \int_{\Delta} F(\vec{x} \mid \vec{\eta})\, d\vec{x}
= \exp\{\eta_0\}\,\frac{1}{\Sigma_\Omega} \int_{\Delta} \exp\left\{\sum_{q=1}^{p-1} \eta_q\,\varphi_q(\vec{x})\right\} d\vec{x}
= \exp\{\eta_0\}\,\exp\{A(\eta_1, \dots, \eta_{p-1})\}, \qquad (17)
$$
where an auxiliary (cumulant-generating) function related to the basis $\{\varphi_q(\vec{x})\}$ is introduced:

$$A(\eta_1, \dots, \eta_{p-1}) = \ln\left[\frac{1}{\Sigma_\Omega} \int_{\Delta} \exp\left\{\sum_{q=1}^{p-1} \eta_q\,\varphi_q(\vec{x})\right\} d\vec{x}\right]. \qquad (18)$$
Replacing the parameters $\vec{\eta} \to (w, \eta_1, \dots, \eta_{p-1})$ in accordance with (17), we arrive at the following representation of the base frame (15):

$$F(\vec{x} \mid w, \vec{\eta}) = I_O\, w\, \exp\left\{\vec{\eta} \cdot \vec{\varphi}(\vec{x}) - A(\vec{\eta})\right\} \Pi_\Delta(\vec{x}), \qquad (19)$$
where the shortened notations $\vec{\eta} = (\eta_1, \dots, \eta_{p-1})^{T}$ and $\vec{\varphi}(\vec{x}) = (\varphi_1(\vec{x}), \dots, \varphi_{p-1}(\vec{x}))^{T}$ are introduced, the dot denotes the scalar product, and the characteristic (indicator) function $\Pi_\Delta(\vec{x})$ of the base carrier $\Delta$ is introduced to automatically take into account the constraint $\vec{x} \in \Delta$.
Considering the chosen structure of the base frame $F(\vec{x} \mid w, \vec{\eta})$ (19), the parametric model of intensity $I(\vec{x} \mid \theta_O)$ (14) takes the following final form:

$$I(\vec{x} \mid \theta_O) = I_O \sum_{j=1}^{K} w_j \exp\left\{\vec{\eta}_j \cdot \vec{\varphi}(\vec{x} - \vec{\mu}_j) - A(\vec{\eta}_j)\right\} \Pi_\Delta(\vec{x} - \vec{\mu}_j), \qquad (20)$$
where $\{w_j\}$ and $\{\vec{\eta}_j\}$ are the weights and the parameter sets of the $j$-th components, $\vec{\varphi}(\vec{x})$ is the functional basis of the parametric model common to all components, and $I_O$ and $A(\vec{\eta})$ are defined, respectively, by expressions (16) and (18). Note that if we formally calculate the integral over $\Omega$ of the right-hand side of (20), taking into account (18), we get the value $I_O \Sigma_\Omega \sum_j w_j$, which implies that the weights $\{w_j\}$ are normalized to unity; this coincides with their interpretation as the distribution of the total radiation energy $W_O = I_O \Sigma_\Omega$ over the individual components.
Using the worked-out parametric model of the recorded radiation intensity (20), we write down the probability distribution density (12) of an individual count from the sample $X_n = \{\vec{x}_1, \dots, \vec{x}_n\}$ representing the image of the object $O$, also in parametric form:

$$
\rho(\vec{x} \mid I(\vec{x})) = \frac{I(\vec{x} \mid \theta_O)}{\int_{\Omega} I(\vec{x} \mid \theta_O)\, d\vec{x}}
= \frac{1}{\Sigma_\Omega} \sum_{j \in \delta_{\vec{x}}} w_j \exp\left\{\vec{\eta}_j \cdot \vec{\varphi}(\vec{x} - \vec{\mu}_j) - A(\vec{\eta}_j)\right\} \Pi_\Delta(\vec{x} - \vec{\mu}_j), \qquad (21)
$$

where it is taken into account that, for a given count $\vec{x}$, the formal summation over all components $j$ from 1 to $K$ in (20) is actually reduced to a summation over the components that are nonzero at the point $\vec{x}$, i.e. over the lattice environment $\delta_{\vec{x}}$.
Even though all subsequent conclusions can be drawn in the most general case of mixtures of distributions of an exponential family, for simplicity of presentation and to avoid cumbersome formulas, we restrict ourselves to the case of the simple Gaussian mixture model (GMM) (Murphy, 2012). Namely, we assume that the functional basis $\vec{\varphi}(\vec{x})$ contains only two linear basis functions $\varphi_1(\vec{x}) = x_1/D$ and $\varphi_2(\vec{x}) = x_2/D$ (so the number of $\vec{\eta}$ components is also two: $\vec{\eta} = (\eta_1, \eta_2)^{T}$). In addition, we approximate the characteristic function $\Pi_\Delta(\vec{x})$ by a Gaussian bell-shaped function $\exp\{-\vec{x}^{\,2}/2D^2\}$.
Replacing the integration with $\Pi_\Delta(\vec{x})$ in (18) by integration with the Gaussian approximation, we find that the auxiliary function $A(\vec{\eta})$ has a quite simple form:

$$A(\vec{\eta}) = \frac{\vec{\eta}^{\,2}}{2} + \ln\frac{2\pi D^2}{\Sigma_\Omega}. \qquad (22)$$
Substituting $A(\vec{\eta})$ from (22) into the expression for $\rho(\vec{x} \mid I(\vec{x}))$ (21), we finally obtain a representation of the classical Gaussian mixture model (GMM) (Murphy, 2012) for the probability distribution of individual counts from the sample $X_n$:

$$\rho(\vec{x} \mid I(\vec{x})) = \frac{1}{2\pi D^2} \sum_{j \in \delta_{\vec{x}}} w_j \exp\left\{-\frac{\left(\vec{x} - \vec{\mu}_j - D\vec{\eta}_j\right)^2}{2D^2}\right\}. \qquad (23)$$
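A minimal sketch of evaluating the count density (23) on a square lattice of nodes; the lattice construction and the restriction of the sum to nodes within a fixed radius of the count (standing in for the lattice environment $\delta_{\vec{x}}$) are illustrative simplifications, and all names are hypothetical:

```python
import numpy as np

def count_density(x, mu, w, eta, D, radius=None):
    """Density (23): a Gaussian mixture with means mu_j + D*eta_j and common width D,
    summed only over nodes within the lattice environment of x (here: within 'radius')."""
    means = mu + D * eta                                  # component centres
    if radius is None:
        radius = 2.0 * D                                  # rough size of the base carrier
    near = np.linalg.norm(mu - x, axis=1) <= radius       # indices j in delta_x
    sq = np.sum((x - means[near]) ** 2, axis=1)
    return float(np.sum(w[near] * np.exp(-sq / (2 * D**2))) / (2 * np.pi * D**2))

# Usage: a 4x4 lattice of nodes with spacing d = 8 and frame size D = 12 > d.
d, D = 8.0, 12.0
g = np.arange(4) * d
mu = np.array([[gx, gy] for gx in g for gy in g])
K = len(mu)
w = np.full(K, 1.0 / K)                                   # weights normalized to unity
eta = np.zeros((K, 2))                                    # centred components
print(count_density(np.array([10.0, 10.0]), mu, w, eta, D))
```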
For the chosen model (23), the corresponding $n$-model of the observations $X_n = \{\vec{x}_1, \dots, \vec{x}_n\}$ (2), after opening all $n$ brackets in the product, takes the form:

$$
\varrho(X_n \mid \theta_O) = \prod_{i=1}^{n} \sum_{j \in \delta_{\vec{x}_i}} \frac{w_j}{2\pi D^2} \exp\left\{-\frac{\left(\vec{x}_i - \vec{\mu}_j - D\vec{\eta}_j\right)^2}{2D^2}\right\}
= \sum_{\{j_1, \dots, j_n\} \in \delta_{X_n}} \prod_{i=1}^{n} \frac{w_{j_i}}{2\pi D^2} \exp\left\{-\frac{\left(\vec{x}_i - \vec{\mu}_{j_i} - D\vec{\eta}_{j_i}\right)^2}{2D^2}\right\}, \qquad (24)
$$

where the indices $j_i \in \delta_{\vec{x}_i}$ associate the counts $\vec{x}_i$ with the nodes $j$ of their lattice environments $\delta_{\vec{x}_i}$, and the summation is performed over all possible $n$-tuples $\{j_1, \dots, j_n\} \in \delta_{\vec{x}_1} \times \dots \times \delta_{\vec{x}_n} = \delta_{X_n}$.
Note that each of the $n$-tuples $\{j_1, \dots, j_n\} \in \delta_{X_n}$ defines a partition of the sample $X_n = \{\vec{x}_1, \dots, \vec{x}_n\}$ into $K$ disjoint subsets of counts $\pi_j$ associated with the nodes $j$: $\pi_j = \{\vec{x}_i \mid j_i = j\}$, $X_n = \bigcup_j \pi_j$. If in the product terms of the sum (24) we combine the exponents and collect similar terms with respect to the parameters $w_j$ and $\vec{\eta}_j$, the density $\varrho(X_n \mid \theta_O)$ can be rewritten as a sum over all partitions:
𝜚𝑋
| 𝜃
= exp 𝑝
̅

ln𝑤

{
,…,
}
∈
×exp
𝑝̅

𝛷
−𝐷𝜂
2𝐷

×
×
𝑒𝑥𝑝−
1
2𝐷
𝑝̅

𝐶
2𝜋𝐷
(25)
where $\bar{p}_j = n_j/n$ are the partition weights of the nodes, $n_j = |\pi_j|$ is the number of counts $\vec{x}_i$ in the subset $\pi_j$ corresponding to node $j$, $\sum_j n_j = n$, $\sum_j \bar{p}_j = 1$, and $\{\vec{\Phi}_j\}$, $\{C_j\}$ are defined by the partition $\{\pi_j\}$ as follows:

$$
\vec{\Phi}_j = \frac{1}{n_j} \sum_{\vec{x}_i \in \pi_j} \left(\vec{x}_i - \vec{\mu}_j\right), \qquad
C_j = \frac{1}{n_j} \sum_{\vec{x}_i \in \pi_j} \left(\vec{x}_i - \vec{\mu}_j - \vec{\Phi}_j\right)^2. \qquad (26)
$$
Obviously, when the total number of counts $n$ is large, the calculation of all $|\delta_{X_n}| \sim k^{n}$ terms in (24) / (25), where $k > 1$ is the average size of the lattice environments $\delta_{\vec{x}}$, $\vec{x} \in \Omega$, becomes a difficult problem. For example, in the case when the size $D$ of the base carrier noticeably exceeds the lattice spacing $d$, $k \approx (D/d)^2 \gg 1$, the number of terms becomes extremely large, and the problem becomes EXP-complete (Du, 2014).
Therefore, the problem of approximating $\varrho(X_n \mid \theta_O)$ (24) / (25) by something simpler is urgent. One of the possibilities here is, obviously, the approximation of this big sum by a single one of its terms. A remarkable fact is that if the chosen term is maximal compared to the others in some neighbourhood of some $\theta_O^{(*)}$, then as the number of counts $n$ grows, this term becomes the asymptotic approximation of the distribution $\varrho(X_n \mid \theta_O)$ (25) at $\theta_O^{(*)}$. Indeed, if we denote the terms in (25) corresponding to the partitions by $[s_{\{\pi_j\}}(\theta_O)]^{n}$, and by $\{\pi_j^{(*)}\}$ the partition corresponding to the term maximal at $\theta_O^{(*)}$, then (25) can be rewritten as:

$$
\frac{\varrho(X_n \mid \theta_O)}{[s_{\{\pi_j^{(*)}\}}(\theta_O)]^{n}} = 1 + \sum_{\{\pi_j\} \ne \{\pi_j^{(*)}\}} \left[\frac{s_{\{\pi_j\}}(\theta_O)}{s_{\{\pi_j^{(*)}\}}(\theta_O)}\right]^{n}. \qquad (27)
$$
Since, by assumption, in some neighbourhood of $\theta_O^{(*)}$ all the ratios in the right-hand sum of (27) are less than unity, if we denote by $\varepsilon^{n}$ the average of all $\sim k^{n}$ ratios (each of which is an $n$-th power), then we also get $\varepsilon < 1$. Since the sum in (27) can then be estimated from above by $[k\varepsilon]^{n}$, for $k\varepsilon < 1$, as the number of counts $n$ in the sample $X_n$ grows, this sum tends to zero and, therefore, the asymptotic $\varrho(X_n \mid \theta_O) / [s_{\{\pi_j^{(*)}\}}(\theta_O)]^{n} \to 1$ takes place. The latter means that the maximal term for $n \gg 1$ acquires a dominant value in the vicinity of $\theta_O^{(*)}$.
Considering the above reasoning, the problem of approximating $\varrho(X_n \mid \theta_O)$ is thus reduced to the question of how to find the partition $\{\pi_j^{(*)}\}$ for which the corresponding term in (25) is maximal in the neighbourhood of its most probable, optimal parameter $\theta_O^{(*)} = (\{w_j^{(*)}\}, \{\vec{\eta}_j^{(*)}\})$. It is clear that finding such a partition $\{\pi_j^{(*)}\}$ / optimal $\theta_O^{(*)}$ by direct comparison of the maxima of all $\sim k^{n}$ terms of the sum (25) is not feasible, since it is also an EXP-complete problem (Du, 2014). Fortunately, there are quite effective procedures for solving optimization problems like (25). Below we propose one such procedure, a recurrent segmentation / partition of the sample $X_n = \{\vec{x}_1, \dots, \vec{x}_n\}$, analogous to the well-known $K$-means segmentation method (Barber, 2012), (Wu, 2007). This procedure consists of sequential (recurrent) iterations of two main steps: step P, refining the target partition $\{\pi_j^{(t)}\}$, and step M, calculating the optimal parameters $\theta_O^{(t)}$ corresponding to $\{\pi_j^{(t)}\}$.
Refinement at step P of the partition $\{\pi_j^{(t+1)}\}$, given the parameters $\theta_O^{(t)}$ found at the previous iteration, is carried out by solving, for each count $\vec{x}_i \in X_n$, the following maximization problem:

$$
j_i = \arg\max_{j \in \delta_{\vec{x}_i}} w_j^{(t)} \exp\left\{-\frac{\left(\vec{x}_i - \vec{\mu}_j - D\vec{\eta}_j^{(t)}\right)^2}{2D^2}\right\}, \qquad (28)
$$

where the set of nodes $j$ tested for the maximum is limited to the lattice environment $\delta_{\vec{x}_i}$ of $\vec{x}_i$.
The solution $j_i$ of each of the problems (28) selects in the sum (25) the subset of those terms for which the $i$-th factor is greater than in the others. After the indices are found for all counts $\{\vec{x}_1, \dots, \vec{x}_n\}$, the $n$-tuple $\{j_1, \dots, j_n\} \in \delta_{X_n}$ defines a refined partition $\{\pi_j^{(t+1)}\}$, whose corresponding term in the sum (24) / (25) is maximal for the parameter values $\theta_O^{(t)}$.
With the partition $\{\pi_j^{(t+1)}\}$ found at step P, finding at step M the corresponding optimal parameters $\theta_O^{(t+1)}$ is quite a simple problem. First, the weights $w_j$ enter each term of (25) only through the cross-entropy expression $\sum_{j=1}^{K} \bar{p}_j \ln w_j$ in the first factor, which attains its maximum at $w_j = \bar{p}_j$ for all $j$. Second, the maximum of each term with respect to $\vec{\eta}_j$ is obvious from the form of the second factor in (25) and is achieved at $\vec{\eta}_j = \vec{\Phi}_j / D$. Collecting these facts into a system, we find that at step M the following calculations should be performed:

$$
w_j^{(t+1)} = \bar{p}_j^{(t+1)} = \frac{|\pi_j^{(t+1)}|}{n}, \qquad
\vec{\eta}_j^{(t+1)} = \frac{\vec{\Phi}_j^{(t+1)}}{D} = \frac{1}{|\pi_j^{(t+1)}|\, D} \sum_{\vec{x}_i \in \pi_j^{(t+1)}} \left(\vec{x}_i - \vec{\mu}_j\right). \qquad (29)
$$
The question of stopping the recurrent procedure (28)-(29) is solved in the same way as in the case of $K$-means segmentation, i.e. it is determined by the criterion of stabilization of the partitions: $\{\pi_j^{(t+1)}\} = \{\pi_j^{(t)}\} = \{\pi_j^{(*)}\}$. Note that the convergence of the partitions $\{\pi_j^{(t)}\} \to \{\pi_j^{(*)}\}$ is accompanied by the convergence of the parameters $\theta_O^{(t)} \to \theta_O^{(*)}$, but it does not follow from this that in the limit we obtain the maximum likelihood estimate defined by (1).
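A compact sketch of the recurrent P/M procedure (28)-(29), in the spirit of K-means. The lattice-environment search is simplified here to a radius test around the nodes, the fall-back to the nearest node and all names are illustrative additions rather than part of the paper's formulation.

```python
import numpy as np

def pm_segmentation(X, mu, D, radius, n_iter=100):
    """Recurrent procedure (28)-(29): alternate count-to-node assignment (step P)
    and weight / eta updates (step M) until the partition stabilizes."""
    K = len(mu)
    w = np.full(K, 1.0 / K)                  # initial weights
    eta = np.zeros((K, 2))                   # initial frame parameters
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # Step P (28): for each count, the best node within its lattice environment.
        centers = mu + D * eta
        new_labels = np.empty(len(X), dtype=int)
        for i, x in enumerate(X):
            env = np.flatnonzero(np.linalg.norm(mu - x, axis=1) <= radius)   # delta_x
            if env.size == 0:                # fall back to the nearest node
                env = np.array([np.argmin(np.linalg.norm(mu - x, axis=1))])
            score = w[env] * np.exp(-np.sum((x - centers[env]) ** 2, axis=1) / (2 * D**2))
            new_labels[i] = env[np.argmax(score)]
        if np.array_equal(new_labels, labels):   # partitions stabilized
            break
        labels = new_labels
        # Step M (29): w_j = |pi_j| / n,  eta_j = mean of (x - mu_j) over pi_j, divided by D.
        for j in range(K):
            members = X[labels == j]
            w[j] = len(members) / len(X)
            eta[j] = (members - mu[j]).mean(axis=0) / D if len(members) else 0.0
    return w, eta, labels

# Usage: counts sampled around two of the four nodes of a small lattice.
rng = np.random.default_rng(3)
mu = np.array([[0.0, 0.0], [20.0, 0.0], [0.0, 20.0], [20.0, 20.0]])
X = np.vstack([rng.normal([0, 0], 3, (300, 2)), rng.normal([20, 20], 3, (300, 2))])
w, eta, labels = pm_segmentation(X, mu, D=6.0, radius=25.0)
print(np.round(w, 2))   # weights concentrate on the two occupied nodes
```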
As a result, calculating $\theta_O^{(*)}$ (or, equivalently, $\{\pi_j^{(*)}\}$) and approximating the density $\varrho(X_n \mid \theta_O)$ (25) by the corresponding term, we obtain, according to (10), the following approximation for the similarity measure:

$$
\mu(X_n, X_k) \cong Q(\theta_O^{(*)})\,
\exp\left\{-n\, D_{KL}\left(\{w_j^{(*)}\}, \{w_j^{(k)}\}\right)\right\}
\exp\left\{-\frac{n}{2} \sum_{j=1}^{K} w_j^{(*)} \left(\vec{\eta}_j^{(*)} - \vec{\eta}_j^{(k)}\right)^2\right\}, \qquad (30)
$$
where $\hat{\theta}^{(k)} = (\{w_j^{(k)}\}, \{\vec{\eta}_j^{(k)}\})$ are the parameters of the precedent $X_k$, and $\{w_j^{(*)}\}$, $\{\vec{\eta}_j^{(*)}\}$ are the optimal parameters of the tested sample $X_n$. In (30) the model parameters $\bar{p}_j$ and $\vec{\Phi}_j$ (26), which depend on the partition, were replaced, in accordance with (29), by the optimal parameters calculated during the PM procedure. For convenience, the cross-entropy $\sum_{j=1}^{K} w_j^{(*)} \ln w_j^{(k)}$ in (25) is replaced by the (negative) sum of the Kullback-Leibler divergence $D_{KL}(\{w_j^{(*)}\}, \{w_j^{(k)}\})$ (Murphy, 2012) and the Shannon entropy $H(\{w_j^{(*)}\})$:
$$
D_{KL}\left(\{w_j^{(*)}\}, \{w_j^{(k)}\}\right) = -\sum_{j=1}^{K} w_j^{(*)} \ln\frac{w_j^{(k)}}{w_j^{(*)}}, \qquad
H\left(\{w_j^{(*)}\}\right) = -\sum_{j=1}^{K} w_j^{(*)} \ln w_j^{(*)}. \qquad (31)
$$
Finally, all the factors in $\varrho(X_n \mid \theta_O)$ that do not depend on the precedent $X_k$ are combined in (30) into one common factor $Q(\theta_O^{(*)})$:

$$
Q(\theta_O^{(*)}) = \frac{\exp\left\{-n H\left(\{w_j^{(*)}\}\right)\right\}}{(2\pi D^2)^{n}}\,
\exp\left\{-\frac{n}{2D^2} \sum_{j=1}^{K} w_j^{(*)} C_j\left(\pi_j^{(*)}\right)\right\}. \qquad (32)
$$
Theoretically, the similarity measure (30) explicitly provides the required expression for finding the precedent most similar to the tested object (11). However, the computational characteristics of the method can be noticeably improved if an equivalent similarity measure $\tilde{\mu}(X_n, X_k)$, a monotonic function of $\mu(X_n, X_k)$, is used instead. Namely, discarding the first factor $Q(\theta_O^{(*)})$ in (30), which does not depend at all on the parameters $\hat{\theta}^{(k)}$, and taking the natural logarithm of the remaining factors, up to a factor $n$ we obtain the following similarity measure:

$$
\tilde{\mu}(X_n, X_k) = -D_{KL}\left(\{w_j^{(*)}\}, \{w_j^{(k)}\}\right) - \frac{1}{2} D_{ME}\left(\{\vec{\eta}_j^{(*)}\}, \{\vec{\eta}_j^{(k)}\}\right), \qquad (33)
$$
where the notation $D_{ME}$ is introduced for the Euclidean distance between the corresponding parameters $\vec{\eta}_j^{(*)}$ and $\vec{\eta}_j^{(k)}$, averaged over the distribution $\{w_j^{(*)}\}$:

$$
D_{ME}\left(\{\vec{\eta}_j^{(*)}\}, \{\vec{\eta}_j^{(k)}\}\right) = \sum_{j=1}^{K} w_j^{(*)} \left(\vec{\eta}_j^{(*)} - \vec{\eta}_j^{(k)}\right)^2. \qquad (34)
$$
Thus, in the case of modelling the intensities of the registered radiation by the simple Gaussian mixture model (24) / (25), the maximum similarity method reduces to choosing the minimum sum of divergences between the optimal parameters $\theta_O^{(*)}$ of the tested object and the maximum likelihood parameters $\hat{\theta}^{(k)}$ of the precedents. Denoting the sum of divergences (31), (33) by $\bar{\mu}(X_n, X_k) = -\tilde{\mu}(X_n, X_k)$, we can reformulate the maximum similarity method (11) for this case as follows:

$$
\hat{k}_{MS} = \arg\min_{k} \bar{\mu}(X_n, X_k)
= \arg\min_{k} \left[ D_{KL}\left(\{w_j^{(*)}\}, \{w_j^{(k)}\}\right) + \frac{1}{2} D_{ME}\left(\{\vec{\eta}_j^{(*)}\}, \{\vec{\eta}_j^{(k)}\}\right) \right], \qquad (35)
$$

which, in the above form, is literally a criterion of minimum difference.
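A minimal sketch of the final criterion (35), comparing the optimal parameters of the tested sample with stored precedent parameters. The small constant added inside the logarithm (to guard against empty components) and the illustrative parameter sets are assumptions not discussed in the paper.

```python
import numpy as np

def dissimilarity(w_star, eta_star, w_k, eta_k, eps=1e-12):
    """Criterion (35): D_KL between the weight distributions plus half the
    w_star-averaged squared Euclidean distance between the eta parameters (34)."""
    d_kl = np.sum(w_star * np.log((w_star + eps) / (w_k + eps)))
    d_me = np.sum(w_star * np.sum((eta_star - eta_k) ** 2, axis=1))
    return d_kl + 0.5 * d_me

def identify(test_params, precedent_params):
    """Choose the precedent minimizing the dissimilarity (35)."""
    w_star, eta_star = test_params
    return min(precedent_params,
               key=lambda k: dissimilarity(w_star, eta_star, *precedent_params[k]))

# Usage with illustrative 3-component parameter sets.
w_star = np.array([0.5, 0.3, 0.2]); eta_star = np.array([[0.1, 0.0], [0.0, 0.2], [0.0, 0.0]])
precedents = {"disk": (np.array([0.5, 0.3, 0.2]), np.array([[0.1, 0.0], [0.0, 0.2], [0.0, 0.0]])),
              "ring": (np.array([0.2, 0.3, 0.5]), np.array([[0.4, 0.4], [0.3, 0.0], [0.0, 0.3]]))}
print(identify((w_star, eta_star), precedents))  # expected: "disk"
```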
4 CONCLUSIONS
The considered case of approximating the intensity shape of objects by mixtures of Gaussian components, and the results of the corresponding numerical simulation, showed the adequacy of the maximum similarity method for the problems of analyzing PCD images. Even in low-quality images (~ 1000 counts), the algorithm corresponding to the proposed method gives correct identification of objects. Note that the implementation of the algorithm, like the $K$-means segmentation method, is computationally very efficient: for mixtures with ~ 1000 components in a common computing environment, processing of images with a size of 500×500 pixels by the PM algorithm (28)-(29) takes ~ 1 sec on a standard PC, and it is already clear that these characteristics can be improved if desired.
As for the maximum similarity method itself, the simplicity of its interpretation, to which Section 2 is devoted, and the straightforwardness of its algorithmic implementation, which is the main content of Section 3, make it attractive both in theoretical and practical terms, especially in the context of modern approaches oriented towards machine learning. In a sense, for machine learning problems the proposed method is an adaptation of R. Fisher's maximum likelihood method widely used in traditional statistics (Efron, 1982). The fruitful use of the latter, as is known, has led to a huge number of important statistical results. In this regard, it is hoped that the proposed maximum similarity method will also be useful in solving a wide range of modern problems of statistical (machine) learning.
ACKNOWLEDGEMENTS
The author is grateful to the Russian Foundation for Basic Research (RFBR), grant No. 18-07-01295 A, for the financial support of this work.
REFERENCES
Darby, M.J., Barron, D.A., Hyland, R.E., 2012. Chapter 1.
Techniques. In Oxford Handbook of Medical Imaging.
Oxford University Press. Oxford.
Leng, S., et al., 2019. Photon-counting Detector CT:
System Design and Clinical Applications of an
Emerging Technology. In RadioGraphics, 39(3): 729-
743.
Willemink, M.J., et al., 2018. Photon-counting CT: Technical Principles and Clinical Prospects. In Radiology, 289(2): 293-312.
Gonzalez, R.C., Woods, R.E., 2007. Digital Image Processing, Prentice Hall. 3rd edition.
Saleh, B., 1978. Photoelectron Statistics. Springer Verlag.
Berlin.
Antsiperov, V., 2019a. Machine Learning Approach to the
Synthesis of Identification Procedures for Modern
Photon-Counting Sensors. In Proceedings of the 8th
International Conference on Pattern Recognition
Applications and Methods – ICPRAM. SCITEPRESS.
Jebara, T., 2004. Machine Learning: Discriminative and
Generative. Kluwer. Dordrecht.
Wasserman, L., 2004. All of Statistics: A Concise Course in
Statistical Inference. Springer. New York.
Efron, B., 1982. Maximum likelihood and decision theory.
In Ann. Statist., 10: 340–356.
Billingsley, P., 1986. Probability and Measure, Wiley. New York. 2nd edition.
Barber, D., 2012. Bayesian Reasoning and Machine
Learning , Cambridge Univ. Press. Cambridge.
Wu, X., et al., 2007. Top 10 algorithms in data mining. In
Knowl. Inf. Syst. 14 (1): 1–37.
Antsiperov, V.E., 2019b. Target identification for photon-
counting image sensors, inspired by mechanisms of
human visual perception. In Journal of Physics:
Conference Series, 1368: 032020.
Streit, R.L., 2010. Poisson Point Processes. Imaging,
Tracking and Sensing. Springer. New York.
Gröchenig, K., 2001. Foundations of Time-Frequency
Analysis. Birkhäuser. Boston.
Murphy, K.P., 2012. Machine Learning: A Probabilistic
Perspective. MIT Press. Cambridge.
Du, D., Ko, K., 2014. Theory of Computational
Complexity, 2nd ed. John Wiley & Sons.