Computer Assisted Quantification of Hyoid Bone Motion in Fluoroscopic
Videos
Ishtiaque Hossain¹, Angela Roberts-South², Mandar Jog³ and Mahmoud R. El-Sakka¹
¹ Computer Science Department, Western University, London, Ontario, Canada
² Health and Rehabilitation Sciences, Western University, London, Ontario, Canada
³ Department of Clinical Neurological Sciences, Western University, London, Ontario, Canada
Keywords:
Dysphagia, Swallowing Disorder, Videofluoroscopic Swallowing Study, Hyoid Bone, Object Detection, Haar
Classifier, Tracking, Template Matching, Kinesiologic Analysis.
Abstract:
The Videofluoroscopic Swallowing Study is a technique commonly used by radiologists to detect abnormalities
in the swallowing process. While the subject swallows the food, X-ray images are taken and then compiled
into a video. The radiologist later analyzes the video by visual inspection. Because this
inspection is highly subjective, its results are difficult to reproduce reliably. One of the assessed measures
is the elevation of the hyoid bone during the swallow. This research introduces a semi-automatic method
which identifies the hyoid bone in fluoroscopic videos and quantifies its motion. Before identifying the hyoid
bone, the region-of-interest is automatically identified using a classification-based approach and subsequent
image processing procedures are applied to the identified region-of-interest. Results show that the proposed
method can accurately quantify the motion of the hyoid bone.
1 INTRODUCTION
The swallowing process begins as the food is chewed
inside the mouth and ends when the food reaches the
stomach. In order to detect abnormalities in the swal-
lowing process, radiologists use a technique called
Videofluoroscopic Swallowing Study, where the pa-
tient is instructed to swallow food mixed with bar-
ium sulphate and the swallowing process is recorded
in the form of a video made of X-ray images. Bar-
ium causes the food to become visible in the captured
video and this allows the radiologist to watch the ac-
tivities inside the patient’s throat during the swallow-
ing process. The protocol for this method is described
in more detail in the work of Palmer et al. (Palmer
et al., 1993).
Usually, radiologists inspect a number of mea-
sures when evaluating the swallowing process, in-
cluding the elevation of the hyoid bone. During a sin-
gle swallow cycle, the hyoid bone is elevated (moves
in an upward and forward direction) as the cycle be-
gins. The hyoid bone then moves in the opposite di-
rection, returning to its normal position as the cycle
ends. Figure 1 shows the trajectory of the hyoid bone
during a normal swallow. Paik et al. reported that
the trajectory of the hyoid bone differs significantly
from its normal trajectory in patients who have
abnormalities in the swallowing process (Paik et al.,
2008). This suggests that inspecting the trajectory
of the hyoid bone can play an important role
when evaluating the swallowing process.
Figure 1: Trajectory of the hyoid bone during a normal swallow.
Currently, the evaluation procedure is performed
by means of visual inspection. Due to the highly
subjective nature of the evaluation process, achieving
reliable results from the assessment can be very
challenging. The severity of this issue is reported in
a number of studies conducted on intrarater and
interrater reliability (Kuhlemeier et al., 1998; McCullough
et al., 2001; Stoeckli et al., 2003; Scott et al., 1998).
Hossain I., Roberts-South A., Jog M. and El-Sakka M. R.
Computer Assisted Quantification of Hyoid Bone Motion in Fluoroscopic Videos.
DOI: 10.5220/0004276707570761
In Proceedings of the International Conference on Computer Vision Theory and Applications (VISAPP-2013), pages 757-761
ISBN: 978-989-8565-47-1
Copyright © 2013 SCITEPRESS (Science and Technology Publications, Lda.)
Evidently, more objective methods are needed for
quantifying the various measures involved in
the evaluation process. However, as of
the writing of this article, only a few attempts have
been made to meet this demand. Chen et al. pro-
posed a computer aided method that measures and
quantifies oral movement (Chen et al., 2001). Aung et
al. proposed automatic identification of a number of
anatomical landmarks using a 16-point active shape
model (Aung et al., 2010b). In a different study, Aung
et al. introduced a semi-automatic approach to deter-
mine the transit time of the bolus (Aung et al., 2010a).
Kellen et al. proposed a semi-automatic method to
track the hyoid bone (Kellen et al., 2010). It is worth
mentioning here that in the work of Kellen et al., the
region-of-interest is identified manually by means of
user interaction.
This research concentrates on the problem of
quantifying the movement of the hyoid bone. In this
work, a semi-automatic method is introduced which
attempts to identify and track the hyoid bone in
fluoroscopic videos. At the same time, the cervical
vertebrae are identified in order to establish a relative
referencing system. In order to limit image processing
procedures to the relevant area of the image, the
regions-of-interest are automatically identified before
identifying the hyoid bone and the cervical vertebrae.
The rest of the paper is organized as follows. Sec-
tion 2 presents the proposed method. The results are
presented in Section 3. Section 4 concludes the article
by commenting on the results. Directions for
future work are pointed out in Section 5.
2 PROPOSED METHOD
The proposed method attempts to quantify the move-
ment of the hyoid bone in fluoroscopic videos. Addi-
tionally, a referencing system relative to the patient is
established by identifying the cervical vertebrae (see
Section 2.3). Using a classification-based approach,
the regions-of-interest are automatically identified in
order to limit image-processing operations to a
sub-region of the image. By matching user-defined
templates, objects inside the regions-of-interest are
identified.
2.1 Identifying the Region-of-Interest
The proposed method identifies the regions-of-
interest using a method similar to the one proposed
by Huang et al. where the lumbar vertebrae are de-
tected using a learning-based method (Huang et al.,
2009). Such a method is fast, requires no user inter-
action and can be tuned to achieve high accuracy. In
this research, the regions-of-interest are automatically
identified using the Haar classifier. The Haar classi-
fier uses Haar features to classify sub-regions in the
image and search the image for target objects (Viola
and Jones, 2001). Instead of using the original fea-
tures, an extended feature-set is used in this research
which includes tilted features (Lienhart and Maydt,
2002).
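To make the notion of a Haar feature concrete, the following is a minimal sketch of how an upright two-rectangle feature can be evaluated in constant time from an integral image (summed-area table), as in the Viola and Jones framework. The function names and the toy patch are illustrative assumptions and are not taken from the implementation used in this research; the tilted features of the extended set require a rotated summed-area table, which is not shown.

```python
from typing import List

Image = List[List[int]]  # grayscale image as rows of pixel intensities


def integral_image(img: Image) -> List[List[int]]:
    """Return a summed-area table padded with one leading row and column of zeros."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: List[List[int]], x: int, y: int, w: int, h: int) -> int:
    """Sum of the pixels in the rectangle with top-left corner (x, y), width w, height h."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_edge_feature(ii: List[List[int]], x: int, y: int, w: int, h: int) -> int:
    """Upright edge feature: sum of the left half minus sum of the right half (w even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


# Toy patch (illustrative values): bright on the left, dark on the right
patch = [
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [9, 9, 1, 1],
]
ii = integral_image(patch)
feature_value = two_rect_edge_feature(ii, 0, 0, 4, 3)
print(feature_value)
```

A large absolute feature value indicates a strong vertical edge inside the window, which is the kind of evidence a weak classifier can threshold.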
The classifier is trained to identify the
region-of-interest containing the cervical vertebrae.
For training purposes, two sets of example images are
prepared. The cervical vertebrae are present in one set
(set of positive samples), and absent from the other (set
of negative samples). As of the writing of this article,
there is no conclusive study that dictates the optimum
number of samples. However, Lienhart et al. conducted
an empirical study on the training process with 5000
positive samples and 3000 negative samples, where
the positive samples were derived from 1000 images
(Lienhart et al., 2003). In this research, the same
number of samples is used. For the negative samples,
high-resolution random images are utilized.
The training process utilizes the AdaBoost method
to iteratively classify the samples into their
corresponding classes, minimizing the classification
error at each step (Freund and Schapire, 1995). A single
Haar feature serves as the input to a weak classifier.
At each step, the AdaBoost method combines multiple
weak classifiers in order to generate a boosted
classifier. To speed up the detection process, a cascade
of boosted classifiers is used.
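The boosting step described above can be illustrated with a minimal sketch of discrete AdaBoost over threshold tests (decision stumps). It assumes that each sample is a list of precomputed feature values standing in for Haar feature responses and that labels are +1 or -1. The names best_stump, train_adaboost and predict and the toy data are hypothetical, and the cascade structure is not shown.

```python
import math
from typing import List, Tuple

Stump = Tuple[int, float, int, float]  # (feature index, threshold, polarity, alpha)


def best_stump(X: List[List[float]], y: List[int], w: List[float]):
    """Pick the single-feature threshold test with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for pol in (1, -1):
                preds = [pol if row[j] >= thr else -pol for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, pol, preds)
    return best


def train_adaboost(X: List[List[float]], y: List[int], rounds: int) -> List[Stump]:
    """Combine one weak classifier per round into a boosted classifier."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform sample weights
    stumps: List[Stump] = []
    for _ in range(rounds):
        err, j, thr, pol, preds = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        stumps.append((j, thr, pol, alpha))
        # Misclassified samples gain weight, correctly classified ones lose weight
        w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, preds)]
        total = sum(w)
        w = [wi / total for wi in w]
    return stumps


def predict(stumps: List[Stump], row: List[float]) -> int:
    """Sign of the alpha-weighted vote of all weak classifiers."""
    score = sum(a * (pol if row[j] >= thr else -pol) for j, thr, pol, a in stumps)
    return 1 if score >= 0 else -1


# Toy data (illustrative): one feature, labels that no single threshold separates
X = [[float(v)] for v in range(10)]
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
stumps = train_adaboost(X, y, rounds=3)
print([predict(stumps, row) for row in X] == y)
```

In this toy example no single stump classifies all samples correctly, whereas the weighted vote of three stumps does, which is the effect the boosted classifier relies on.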
A separate classifier does not need to be trained to
identify the region-of-interest for the hyoid bone. In
the fluoroscopic videos, the hyoid bone is always
located on the left side of the region-of-interest for
the cervical vertebrae. This observation suggests that
the region-of-interest for the hyoid bone can be
inferred from the region-of-interest for the cervical
vertebrae by mirroring the latter to the left. Figure 2
shows the identified regions-of-interest for the hyoid
bone and the cervical vertebrae in one of the frames
from the videos.
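The mirroring step can be sketched as follows. The exact geometry is not specified in the text, so this sketch assumes that the hyoid region has the same size as the vertebrae region, is reflected about the left edge of the latter, and is clamped at the image border. The Rect type and the function name are hypothetical.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x: int  # left edge in pixels
    y: int  # top edge in pixels
    w: int  # width in pixels
    h: int  # height in pixels


def hyoid_roi_from_vertebrae_roi(vert: Rect) -> Rect:
    """Mirror the vertebrae rectangle about its left edge, clamped to x >= 0."""
    left = max(0, vert.x - vert.w)
    return Rect(left, vert.y, vert.x - left, vert.h)


# Illustrative rectangles, not values from the data set
print(hyoid_roi_from_vertebrae_roi(Rect(300, 120, 150, 260)))
print(hyoid_roi_from_vertebrae_roi(Rect(100, 50, 150, 200)))
```

The second call shows the clamped case, where the vertebrae lie close to the left border and the mirrored region is narrower than the original.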
2.2 Tracking
After the regions-of-interest are identified, the
objects of interest (each cervical vertebra and the
hyoid bone) must be identified and tracked throughout
the video. Template matching is used to accomplish this
task. Before tracking can be started, the user has to
manually identify the individual objects (the C2
vertebra, the C3 vertebra, the C4 vertebra and the hyoid
bone) in one of the frames in the video by marking the
smallest rectangle enclosing each object. These
rectangular regions serve as the templates of the
objects. Figure 3 shows the identified locations of the
hyoid bone and the individual cervical vertebrae inside
the regions-of-interest in one of the frames from the
videos. It can be argued that using generalized
templates is preferable to using templates from the
video to be processed. However, the shape and the size
of the objects can vary among patients and therefore,
using generalized templates for all patients does not
produce good results.
Figure 2: Regions-of-interest for the hyoid bone and the cervical vertebrae.
Figure 3: Identified locations of the hyoid bone and the individual cervical vertebrae.
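As an illustration of template matching restricted to a region-of-interest, the following minimal sketch performs an exhaustive search and scores each candidate position by the sum of squared differences (SSD). The similarity measure is an assumption, since the text does not state which one is used, and the function name and toy frames are hypothetical.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # grayscale image as rows of pixel intensities


def match_template(frame: Image, template: Image,
                   roi: Tuple[int, int, int, int]) -> Optional[Tuple[int, int]]:
    """Return the top-left (x, y) inside roi = (x, y, w, h) with the lowest SSD."""
    rx, ry, rw, rh = roi
    th, tw = len(template), len(template[0])
    best_pos, best_ssd = None, None
    for y in range(ry, ry + rh - th + 1):
        for x in range(rx, rx + rw - tw + 1):
            ssd = 0
            for j in range(th):
                for i in range(tw):
                    d = frame[y + j][x + i] - template[j][i]
                    ssd += d * d
            if best_ssd is None or ssd < best_ssd:
                best_pos, best_ssd = (x, y), ssd
    return best_pos


# Template taken from a user-marked rectangle in one frame (toy values)
template = [[7, 9], [8, 6]]
frame_a = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 7, 9, 0],
    [0, 0, 0, 8, 6, 0],
    [0, 0, 0, 0, 0, 0],
]
frame_b = [
    [0, 0, 7, 9, 0, 0],
    [0, 0, 8, 6, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
roi = (2, 0, 4, 3)  # search only columns 2..5 and rows 0..2
found_a = match_template(frame_a, template, roi)
found_b = match_template(frame_b, template, roi)
print(found_a, found_b)
```

Running the matcher on consecutive frames yields one location per frame, and the sequence of locations forms the trajectory of the tracked object.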
2.3 Coordinate Transformation
Originally, the results obtained using template
matching (locations of objects) are expressed in terms
of the coordinate system of the image, where the upper
left corner of the image is the origin, the X axis lies
in the horizontal direction and the Y axis lies in the
vertical direction. However, image coordinates alone do
not allow us to distinguish between the movement of the
hyoid bone and the movement of the patient’s head. The
movement of the hyoid bone needs to be isolated from
the movement of the patient’s head by expressing the
movement of the hyoid bone in terms of a referencing
system relative to the patient. In the relative
referencing system, the line through the C2 vertebra and
the C4 vertebra defines the vertical direction (V axis).
The line perpendicular to the V axis and passing through
the C4 vertebra defines the horizontal direction (U
axis). Figure 4 shows the relative referencing system.
Figure 4: Relative referencing system.
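The change of coordinates can be sketched as a projection onto two unit vectors. This minimal sketch assumes that the origin is placed at the C4 vertebra, that the V axis points from C4 towards C2, and that the U axis points towards the side of the image where the hyoid bone lies. The sign conventions and the function name are assumptions made for illustration.

```python
import math
from typing import Tuple

Point = Tuple[float, float]  # (x, y) in image coordinates, with y pointing down


def to_patient_frame(p: Point, c2: Point, c4: Point) -> Point:
    """Express p as (u, v) with the origin at C4 and the V axis pointing from C4 to C2."""
    vx, vy = c2[0] - c4[0], c2[1] - c4[1]
    norm = math.hypot(vx, vy)
    if norm == 0:
        raise ValueError("C2 and C4 must be distinct points")
    vx, vy = vx / norm, vy / norm
    # U is perpendicular to V; with V pointing up in the image, U points to the image left
    ux, uy = vy, -vx
    dx, dy = p[0] - c4[0], p[1] - c4[1]
    return (dx * ux + dy * uy, dx * vx + dy * vy)


# Illustrative pixel locations, not values from the data set
hyoid, c2, c4 = (150.0, 280.0), (200.0, 200.0), (200.0, 300.0)
u, v = to_patient_frame(hyoid, c2, c4)
print(u, v)
```

Because both axes are derived from the vertebrae, a rigid movement of the head (translation or in-plane rotation) moves the hyoid bone and the axes together and leaves (u, v) unchanged.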
3 RESULTS
The trajectories of the hyoid bone in two sample
swallow cycles are presented in Figure 5. The horizontal
axis and the vertical axis correspond to displacement
of the hyoid bone in the horizontal and vertical
directions, respectively (the U axis and the V axis
described in Section 2.3). Both displacements are
expressed in millimeters. Measured distances are
calibrated by securing a coin of known diameter to the
back of the patient’s earlobe. Results obtained using
the proposed method are compared with results obtained
by manually identifying the hyoid bone in the same
images. Table 1 shows the average and the standard
deviation of the distance between the locations of the
hyoid bone obtained from both methods. As Table 1
shows, the two methods agree closely: the mean distance
is 3.01 pixels for the first cycle and 1.94 pixels for
the second.
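The calibration and the comparison against manual identification can be sketched as follows. The coin dimensions and point locations are made-up illustrative values, the function names are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not state which estimator is used.

```python
import math
from statistics import mean, stdev
from typing import List, Tuple

Point = Tuple[float, float]


def mm_per_pixel(coin_diameter_mm: float, coin_diameter_px: float) -> float:
    """Scale factor derived from a coin of known diameter visible in the image."""
    return coin_diameter_mm / coin_diameter_px


def distance_stats(auto_pts: List[Point], manual_pts: List[Point]) -> Tuple[float, float]:
    """Mean and sample standard deviation of the per-frame Euclidean distances."""
    dists = [math.hypot(a[0] - m[0], a[1] - m[1]) for a, m in zip(auto_pts, manual_pts)]
    return mean(dists), stdev(dists)


# Illustrative locations in pixels: tracked versus manually marked
auto_pts = [(100.0, 200.0), (103.0, 204.0), (110.0, 210.0)]
manual_pts = [(100.0, 200.0), (100.0, 200.0), (110.0, 212.0)]
mean_px, std_px = distance_stats(auto_pts, manual_pts)

scale = mm_per_pixel(24.0, 80.0)  # hypothetical coin: 24 mm across 80 pixels
print(mean_px, std_px, mean_px * scale)
```

Multiplying a pixel distance by the scale factor gives the same distance in millimeters, under the assumption that the coin and the hyoid bone lie at a similar distance from the detector.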
ComputerAssistedQuantificationofHyoidBoneMotioninFluoroscopic
Videos
759
(a) Cycle 1
(b) Cycle 2
Figure 5: Trajectories of the hyoid bone for two sample
swallow cycles. The horizontal axis and the vertical axis
represent the horizontal and vertical displacement of the hy-
oid bone, respectively.
Table 1: Average and standard deviation of distances be-
tween locations of the hyoid bone obtained from proposed
method and manual identification.
Cycle #
Distance (pixels)
Mean Std
1 3.01 1.96
2 1.94 1.11
4 DISCUSSION AND CONCLUSIONS
This research introduces a semi-automatic approach
to identify the hyoid bone and quantify its movement
in fluoroscopic videos. Results indicate that the
proposed method locates the hyoid bone to within a few
pixels, on average, of manual identification (Table 1).
Identifying the region-of-interest allows us to
restrict image processing procedures to the most
promising area in the image and to reduce computing
time significantly. Therefore, automatic identification
of the regions-of-interest can be useful in quantifying
measures other than the elevation of the hyoid bone as
well. Therapeutic use is one of several medical
applications in which measuring the movement of the
hyoid bone can be useful. The proposed method can also
be utilized in studies that attempt to relate swallowing
disorders to other diseases. The proposed method
requires minimal input from the user. However, a fully
automatic method would be preferable.
5 FUTURE WORK
In this research, the movement of the hyoid bone is
assumed to be limited to the sagittal plane. Although
this assumption holds for the data used in this
research, the possibility of movements in the coronal
plane cannot be ruled out. As future work, the
proposed method can be improved by detecting motion
along the coronal plane.
ACKNOWLEDGEMENTS
We would like to thank Professor Mandar Jog, Director
of the Movement Disorders Program, LHSC, and
Ms. Angela Roberts-South, Speech-Language Pathologist,
for providing us with the necessary medical details
and for being our primary medical and data resource
for this research. Special credit goes to Dr.
Donald Taves and the staff at the radiology department
of Parkwood Hospital for their assistance with
collecting the data that have been used in this
research. We also acknowledge the Parkinson Disease
Society of Canada for funding the entire data
acquisition process through a grant to Dr. Jog and Ms.
Roberts-South and for allowing us to use these data in
our research.
REFERENCES
Aung, M., Goulermas, J., Hamdy, S., and Power, M.
(2010a). Spatiotemporal visualizations for the mea-
surement of oropharyngeal transit time from videoflu-
oroscopy. IEEE Transactions on Biomedical Engi-
neering, 57(2):432–441.
Aung, M., Goulermas, J., Stanschus, S., Hamdy, S., and
Power, M. (2010b). Automated anatomical demar-
cation using an active shape model for videofluoro-
scopic analysis in swallowing. Medical Engineering
and Physics, 32(10):1170–1179.
Chen, Y., Barron, J. L., Taves, D. H., and Martin, R. E.
(2001). Computer measurement of oral movement in
swallowing. Dysphagia, 16(2):97–109.
Freund, Y. and Schapire, R. (1995). A desicion-theoretic
generalization of on-line learning and an application
to boosting. In Vitányi, P., editor, Computational
Learning Theory, volume 904 of Lecture Notes in
Computer Science, pages 23–37. Springer Berlin/Heidelberg.
Huang, S.-H., Chu, Y.-H., Lai, S.-H., and Novak, C.
(2009). Learning-based vertebra detection and iterative
normalized-cut segmentation for spinal MRI. IEEE
Transactions on Medical Imaging, 28(10):1595–1605.
Kellen, P., Becker, D., Reinhardt, J., and Van Daele, D.
(2010). Computer-assisted assessment of hyoid bone
motion from videofluoroscopic swallow studies. Dys-
phagia, 25(4):298–306.
Kuhlemeier, K., Yates, P., and Palmer, J. (1998). Intra- and
interrater variation in the evaluation of videofluoro-
graphic swallowing studies. Dysphagia, 13(3):142–
147.
Lienhart, R., Kuranov, A., and Pisarevsky, V. (2003). Em-
pirical analysis of detection cascades of boosted clas-
sifiers for rapid object detection. In Michaelis, B. and
Krell, G., editors, Pattern Recognition, volume 2781
of Lecture Notes in Computer Science, pages 297–
304. Springer Berlin/Heidelberg.
Lienhart, R. and Maydt, J. (2002). An extended set of
Haar-like features for rapid object detection. In
Proceedings of the 2002 International Conference on Image
Processing, volume 1, pages 900–903.
McCullough, G. H., Wertz, R. T., Rosenbek, J. C., Mills,
R. H., Webb, W. G., and Ross, K. B. (2001). Inter-
and intrajudge reliability for videofluoroscopic swal-
lowing evaluation measures. Dysphagia, 16(2):110–
118.
Paik, N.-J., Kim, S. J., Lee, H. J., Jeon, J. Y., Lim, J.-Y.,
and Han, T. R. (2008). Movement of the hyoid bone
and the epiglottis during swallowing in patients with
dysphagia from different etiologies. Journal of Elec-
tromyography and Kinesiology, 18(2):329–335.
Palmer, J. B., Kuhlemeier, K. V., Tippett, D. C., and Lynch,
C. (1993). A protocol for the videofluorographic swal-
lowing study. Dysphagia, 8(3):209–214.
Scott, A., Perry, A., and Bench, J. (1998). A study of inter-
rater reliability when using videofluoroscopy as an as-
sessment of swallowing. Dysphagia, 13(4):223–227.
Stoeckli, S. J., Huisman, T. A. G. M., Seifert, B. A. G. M.,
and Martin-Harris, B. J. W. (2003). Interrater reliabil-
ity of videofluoroscopic swallow evaluation. Dyspha-
gia, 18(1):53–57.
Viola, P. and Jones, M. (2001). Rapid object detection using
a boosted cascade of simple features. In Proceedings
of the 2001 IEEE Computer Society Conference on
Computer Vision and Pattern Recognition, volume 1,
pages 511–518.
ComputerAssistedQuantificationofHyoidBoneMotioninFluoroscopic
Videos
761