Medical Image Processing in the Age of Deep Learning
Is There Still Room for Conventional Medical Image Processing Techniques?
Jason Hagerty, R. Joe Stanley and William V. Stoecker
Missouri University of Science and Technology, 1201 N State St., Rolla, MO, U.S.A.
{jrh55c, stanleyj, wvs}@mst.edu
Keywords: Deep Learning, Convolutional Neural Networks, Fusion, Transfer Learning.
Abstract: Deep learning, in particular convolutional neural networks, has increasingly been applied to medical
images. Advances in hardware coupled with availability of increasingly large data sets have fueled this rise.
Results have shattered expectations. But it would be premature to cast aside conventional machine learning
and image processing techniques. All of this deep learning capability comes at a cost: the need for very large datasets.
We discuss the role of conventional manually tuned features combined with deep learning. This process of
fusing conventional image processing techniques with deep learning can yield results that are superior to
those obtained by either learning method in isolation. In this article, we review the rise of deep learning in
medical image processing and the recent onset of fusion of learning methods. We discuss the learning equilibrium point and the factors that favor the role of fusion methods for histopathology and quasi-histopathology modalities.
1 INTRODUCTION
Because deep learning architectures, in particular the
convolutional neural net (convnet), have attracted
unprecedented attention in medical image
processing, there is a tendency to overlook the
potential contribution of conventional image
processing techniques. The allure of the new
convnet architecture is that it will simplify the task
of image processing. But this convenience comes at
a cost, primarily in demand for more training
examples, and a case will be made here that there is
still a place in image processing for more
conventional computer vision techniques. This
article focuses on the rise of deep learning, in particular the convnet architecture, and the relation between image complexity and image processing architecture, and discusses the role of fusing conventional and deep learning architectures. To
better understand the need for conventional learning
techniques, we define two new image complexity
measures. We use these image complexity measures
to define the learning equilibrium (dataset size at
which deep learning techniques gain superiority) as
a function of image complexity. We explore the
situations where fusion of new and conventional
image processing techniques offers the best image
processing solution. Finally, we give examples
where conventional learning and deep learning
fusion has already proven successful.
2 THE RISE OF DEEP LEARNING
IN IMAGE PROCESSING
Deep learning (representation learning) computational models comprise a sequence of processing layers that operate on numeric data to learn hierarchical data representations (LeCun, 2015; Bengio, 2013; Goodfellow, 2016).
Deep learning can discover intricate structures in
large data sets by using the backpropagation
algorithm to indicate how a machine should change
its internal parameters. Since deep learning
architecture encompasses layers of nodes updating
operating parameters in sequence, it is a type of
neural network. Deep learning models differ from other neural networks by using a deep graph with many processing layers, each with a small number of nodes, whereas traditional neural networks comprise a few layers with larger numbers of nodes (LeCun, 2015; Bengio, 2013; Goodfellow, 2016). Deep learning, as the term “representation
learning” implies, seeks to discover knowledge
representations rather than to use hand-crafted
knowledge representations. In the past decade, the
use of the phrase “deep learning” has exploded. A
search on IEEE Xplore returns only 36 articles
published in 2006 vs. 1,017 articles for 2015. The
increasing use of deep learning in research can be
attributed to advances in several areas: the
development of large data sets, also called “big
data,” a dramatic increase in computational power,
and the desire to “re-brand” neural networks,
echoing earlier efforts to rebrand “artificial
intelligence” and “artificial neural networks” (Allen,
2017; LeCun 1998).
Figure 1: Left to right: a.) Input layer accepting a 32x32 RGB image. b.) Convolution layer consisting of 8 7x7 filters. c.) 2x2 max-pooling layer. d.) Fully connected layer with 2048 input nodes (one per value from the previous layer) and 256 output nodes. e.) Fully connected layer consisting of 256 input nodes plus a bias node and a single sigmoid-activated output node. Total number of free parameters: 525,985.
One deep learning architecture that has been
prominently successful in image recognition
challenges (Goodfellow, 2015) is the convolutional
neural network (convnet). The basic convnet
architecture combines two concepts: the
mathematical convolution operator and a fully
connected neural network. One or more convolution
layers are usually prepended to a fully connected
network. A simple convnet with a single convolution layer is presented in Figure 1. Applying the 2D convolution operator of Equation (1) within the convolution layer lets the network process an input image directly, without “flattening” it, and thereby preserves any spatial relations in the image. The convnet architecture was introduced in 1998, when LeCun presented LeNet (LeCun, 1998), designed to identify handwritten digits; LeNet yielded a remarkably low error rate of 0.7%. Equation (1) summarizes the operation of a kernel k(x,y) upon an image I(x,y):
,
∗
,

,
,


(1)
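For concreteness, a direct (and deliberately unoptimized) Python implementation of Equation (1) might look like the sketch below; production frameworks replace such loops with highly optimized routines.

```python
import numpy as np

def conv2d(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Direct 'valid' 2D convolution per Equation (1): the kernel is
    flipped, then slid across the image (illustration only)."""
    kh, kw = kernel.shape
    flipped = kernel[::-1, ::-1]  # convolution, unlike correlation, flips k
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for y in range(out_h):
        for x in range(out_w):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * flipped)
    return out
```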
Adding convolution layers to a typical neural network, and allowing the back-propagation training algorithm to update not only the weights of the fully connected network but also the elements of the 2D convolution kernels, lets the convnet directly use images as inputs and alleviates the need to manually determine the “best” convolution kernel: the kernels are continually improved throughout training. This has enabled investigators to
focus on optimizing the architecture of the network
(machine learning) without requiring conventional
manually-tuned feature extraction (computer vision).
But this convenience comes at a cost: the number of weights (parameters) in the convnet is large and places a computational burden on the backpropagation training algorithm. For the simple network shown in Figure 1, a total of 525,985 parameters need tuning! The high computational demand of optimizing that many parameters is not the only concern; because of the parameter count, a large number of training samples is required for successful training and generalization.
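To make the count concrete, the following sketch rebuilds the Figure 1 network in PyTorch, assuming “same” padding for the 7x7 convolution and ReLU activations (details the figure does not specify; neither choice affects the parameter count), and reproduces the 525,985 total:

```python
import torch.nn as nn

# Sketch of the Figure 1 convnet; padding and activations are assumptions.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=7, padding=3),  # 8*(3*7*7+1) = 1,184 params
    nn.ReLU(),
    nn.MaxPool2d(2),                            # 32x32x8 -> 16x16x8
    nn.Flatten(),                               # 2,048 values
    nn.Linear(16 * 16 * 8, 256),                # 2,048*256+256 = 524,544 params
    nn.ReLU(),
    nn.Linear(256, 1),                          # 256+1 = 257 params
    nn.Sigmoid(),
)
print(sum(p.numel() for p in model.parameters()))  # 525,985
```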
The high computational requirements of deep learning have been somewhat alleviated by a dramatic increase in computational power, from the now-common multi-core CPUs to specialized graphics processing units (GPUs). The GPU came about
because of the demands of computer game players
for more detailed graphics. Rendering a scene for a
computer game requires many floating-point matrix
operations. The developers of these GPUs designed
these processors to include hundreds, sometimes
thousands, of cores that are specifically designed to
perform fast and efficient floating-point matrix
operations in an effort to offload the burden from the
CPU. An unexpected but welcome result was that
the GPUs could be harnessed for machine learning,
since neural networks can also be expressed as a sequence of floating-point matrix operations.
Machine learning algorithms began to be
developed and implemented in a parallel manner to
take advantage of these GPUs. These parallel
algorithms could now be leveraged on clusters of CPUs or, ideally, GPUs. As a result of the parallel implementation of machine learning algorithms and fast floating-point operations on GPUs, a 9x reduction in training time is possible when comparing a single GPU to a single multi-core CPU (Brown, 2015).
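The effect is easy to observe directly. The toy benchmark below times one large float32 matrix multiply on the CPU and, when a CUDA device is present, on the GPU; measured ratios vary widely with hardware:

```python
import time
import torch

a = torch.randn(4096, 4096)
t0 = time.time()
_ = a @ a                               # CPU matrix multiply
cpu_s = time.time() - t0

if torch.cuda.is_available():
    g = a.cuda()
    _ = g @ g                           # warm-up launch
    torch.cuda.synchronize()
    t0 = time.time()
    _ = g @ g                           # GPU matrix multiply
    torch.cuda.synchronize()            # wait for the kernel to finish
    print(f"CPU: {cpu_s:.3f}s, GPU: {time.time() - t0:.3f}s")
```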
3 DATA IS THE PROBLEM
The computational demands of deep learning
algorithms are mostly addressed with use of GPUs,
but the number of parameters that require
optimization in a deep learning algorithm is still a
problem. Because of the number of parameters,
training and generalization demand a large training
set. In several domains, publicly available large
datasets exist, for example, the ImageNet dataset.
The ImageNet dataset has over 14 million images
that encompass 14 thousand classes (ImageNet,
2016). But for domains such as medicine, although
datasets of moderate size are increasingly available,
very large datasets on the order of ImageNet are not
available. The largest dermoscopy image set, for
example, is located at the ISIC project (ISIC, 2016)
and consists of approximately 12 thousand images,
but only about 700 of those are of melanoma.
Because of the relatively small number of images and the heavy class imbalance (benign versus melanoma), researchers cannot blindly use a
deep learning algorithm and expect good results.
To use a deep learning approach with the ISIC
dataset, one should augment the original dataset by
including rotated, flipped and mirrored versions of
the original images. Oversampling the minority class
can be used to minimize the bias between classes. A
researcher may also take a network trained on another domain, such as AlexNet trained on ImageNet, and use a technique called transfer learning to train a new network that combines the features of the pre-trained network with new features specific to the learning task. Or a researcher may have to rely on
manually tuned feature extractors to create an input
vector to a learning algorithm that is not a complex
deep learning algorithm.
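A minimal sketch of these mitigations, assuming PyTorch/torchvision and a two-class benign-versus-melanoma task (minority-class oversampling, e.g. with a weighted sampler, is omitted for brevity):

```python
import torch.nn as nn
from torchvision import models, transforms

# 1) Augmentation: rotated, flipped and mirrored versions of each image.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.RandomRotation(degrees=90),
    transforms.ToTensor(),
])

# 2) Transfer learning: reuse an ImageNet-trained AlexNet as a fixed
#    feature extractor and retrain only a new two-class head.
net = models.alexnet(pretrained=True)
for p in net.features.parameters():
    p.requires_grad = False                 # freeze pretrained convolutions
net.classifier[6] = nn.Linear(4096, 2)      # replace the 1000-class output
```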
With smaller datasets, a convnet may not be the optimal architecture. For some domains, large image sets may never become available. For skin lesions, for example, the available image datasets may contain only tens or hundreds of examples of a particular lesion diagnosis. In the future, larger image sets may become available, as anticipated for the ISIC project. But these datasets still require professionals to collect, label and curate the data accurately, and may still grow by only one or two orders of magnitude.
This is where conventional image processing
may continue to excel. Since the image datasets in
specialized fields are usually quite small, manual
extraction of dominant features will offset the lack
of data. These critical features are often the same
features that professionals look for in the clinic.
4 MORE COMPLEX IMAGE
SETS REQUIRE MORE
IMAGES FOR SUCCESSFUL
CLASSIFICATION
Recent image recognition challenges, such as those
using the ImageNet dataset (Figure 2), may include
images with varying scenes at different scales and
containing multiple objects. An index of image complexity (i.c.) can be defined for a single image; image complexity can also be defined for an entire set of images. An image complexity index should be higher if: 1) image object sizes vary widely in scale; 2) multiple objects are present in the image; and 3) distracting objects are present in the image. An image set is more complex if: 1) average image complexity is higher; 2) more classes of images are present; and 3) inter-image variety within a class is greater. Thus the ImageNet dataset, with its varied complex scenes, is both quite complex and quite large.
Intuitively, we may suppose that larger image
datasets are needed for successful diagnosis of more
complex image sets.
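This paper gives no formula for i.c.; purely to illustrate how the criteria above might combine, a toy index with placeholder weights could be scored as follows:

```python
# Toy scoring of the i.c. criteria above; the weights are placeholders,
# not taken from this paper.
def image_complexity(scale_variance: float, n_objects: int,
                     n_distractors: int) -> float:
    """Higher when object scales vary widely, and when more objects or
    distractors are present (criteria 1-3 for a single image)."""
    return scale_variance + 0.5 * (n_objects - 1) + 0.5 * n_distractors

def set_complexity(image_scores: list, n_classes: int,
                   intra_class_variety: float) -> float:
    """Higher mean image complexity, more classes, and greater
    within-class variety all raise the set-level index."""
    return (sum(image_scores) / len(image_scores)
            + 0.1 * n_classes + intra_class_variety)
```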
Figure 2: ImageNet challenge results. Beginning in 2011, deep learning (DNN) results (solid line) began to surpass those obtained from traditional learning (dashed line).
(Brown, 2015).
5 IMAGE COMPLEXITY AND THE LEARNING EQUILIBRIUM
Previous sections establish that deep learning techniques need large datasets before accuracy exceeds
that of conventional techniques. The size of the
dataset needed for successful classification is
expected to grow as images and image sets increase
in complexity. Let us consider the case number
spectrum, ranging from low numbers of cases to
very high numbers of cases. We plot the number of
cases on a log scale, as shown in Figure 3. In some
two-class problems, such as benign vs. malignant
diagnosis, small datasets may contain equal numbers
of images of benign and malignant cases. For the
zero-knowledge situation, over many trials, as with
coin flipping, we expect 50% diagnostic accuracy.
As the number of cases grows, the expected
accuracy of both conventional and deep learning-
based models tends to increase, with conventional
learning accuracy higher for a small number of
cases. As case numbers grow, at some point, deep
learning techniques become equal in classification
accuracy to conventional techniques, as shown in
Figure 3.
Figure 3: Our conjectured model of diagnostic success for
conventional learning techniques (dashed line), deep
learning techniques (solid line), and fusion techniques
(dot-dashed line). Errors (learning gaps) persist, even with
large case numbers, for all three techniques. Curve
shapes, shown here as linear functions of log (case
numbers), are unknown.
We may define this equilibrium point, where deep and conventional learning have the same diagnostic accuracy, as the Learning Equilibrium (LE). LE is a function of image complexity (i.c.).
Different image spaces have different levels of
complexity, due to both intra- and inter-image
complexity, as noted above. As image space
complexity grows, the number of images required to
represent that complexity grows; the accuracy
obtained for any given number of cases falls. Thus, for high-complexity image sets, the accuracy curves flatten and LE grows. We offer the conjecture that LE(i.c.), with appropriate smoothing, is a monotonically increasing function of i.c.
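As a toy numerical illustration of LE, the sketch below uses invented linear-in-log accuracy curves in the spirit of Figure 3 (the true curve shapes are unknown) and finds their crossing point:

```python
import numpy as np

# Made-up accuracy curves, linear in log(case number) as in Figure 3.
n = np.logspace(1, 6, 500)                       # 10 to 1,000,000 cases
acc_conv = np.clip(0.50 + 0.07 * np.log10(n), 0.50, 0.88)
acc_deep = np.clip(0.40 + 0.10 * np.log10(n), 0.50, 0.95)
crossed = acc_deep >= acc_conv
le = n[np.argmax(crossed)]                       # first n where deep >= conventional
print(f"toy learning equilibrium at ~{le:,.0f} cases")
```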
As shown in Figure 3, diagnostic success can
never be perfect. Errors persist, even with large case
numbers, due to imperfect knowledge of the image
space. These gaps in knowledge in the three representations (the conventional learning gap, the deep learning gap, and the fusion gap) all become relatively smaller as the number of cases increases, but will always remain nonzero. In the real world,
perfect diagnostic accuracy remains elusive. Even
histopathologic “gold standards” have an inherent
degree of uncertainty. Expert pathologists disagree
on diagnoses (Krieger, 1994). This creates a
challenge in image machine learning (Guo, 2015).
Table 1: Conventional vs. Deep Learning.

Elements favoring conventional machine learning | Examples
Repeating biological units | Cells, nuclei in histopathologic images
Scale invariance of features | Vessel walls
High domain knowledge | Organs, e.g., brain, heart

Elements favoring deep learning | Examples
No repeating units | Microcalcifications in breast cancer
Features vary in scale | Bone tumors
6 LOW-HANGING FRUIT FOR
FUSION TECHNIQUES
Table 1 shows types of elements in medical images
favoring either conventional learning or deep
learning. Types of images favoring conventional
learning include images with repeating biological
units as seen in histopathology, scale-invariant
images as seen in vessel walls, and organs such as
brain and heart described with specific domain
knowledge. In these areas, human-supervised conventional learning can contribute significant information to deep learning by adding biological descriptions that successfully constrain class output. Thus we predict that human-supervised
conventional learning will continue to be useful in
histopathology, brain and cardiovascular imaging.
We may also predict that quasi-histopathology domains using newer techniques such as dermoscopy, confocal microscopy and optical coherence tomography (OCT) will continue to use conventional techniques, in some cases fused with deep learning techniques, for some time to come.
Deep learning, in contrast, is already showing progress in automated unsupervised analysis of mammograms (Suzuki, 2016; Wang, 2016).
Three deep-conventional learning fusion
examples have already appeared in the field of
automated histopathology. Zhong and colleagues
fused information from deep learning and
conventional learning (Zhong, 2017). In comparing
multiple machine learning strategies, it was found
that the combination of biologically inspired
conventional cellular morphology features (CMF)
and predictive sparse decomposition deep learning
features provided the best separation of benign and
malignant histology sections (Zhong, 2017). The
deep learning arm used a pre-trained AlexNet
network (transfer learning). The conventional arm
used cellular morphology features, which include
nuclear size, aspect ratio, and mean nuclear gradient.
The researchers concluded that both CMF features
and sparse decomposition deep learning features
encode meaningful biological patterns.
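A conceptual sketch of such feature-level fusion (not the authors' exact pipeline) is simple concatenation of the handcrafted and deep feature vectors ahead of a single classifier:

```python
import numpy as np
from sklearn.svm import SVC

def fuse_and_classify(cmf_feats: np.ndarray, deep_feats: np.ndarray,
                      labels: np.ndarray) -> SVC:
    """Concatenate handcrafted (CMF) and deep features column-wise,
    then fit one classifier on the fused representation."""
    fused = np.concatenate([cmf_feats, deep_feats], axis=1)
    clf = SVC(kernel="rbf")
    return clf.fit(fused, labels)
```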
Wang and colleagues were able to detect mitoses in breast cancer histopathology images by combining manually tuned CMF data with convolutional neural network features (Wang, 2014).
Arevalo and colleagues added an interpretable layer, which they called “digital staining,” to aid their deep learning approach to classification of basal cell carcinoma (Arevalo, 2015). Notably, the
handcrafted layer finds the area of importance,
reproducing the high-level search strategy of the
expert pathologist.
7 CONCLUSION
Deep learning has shown its ability to solve, with a
high degree of accuracy, rather complex problems.
But conventional machine learning and image
processing techniques should not be totally
discounted. Deep learning’s ability does not come
without a cost: time and dataset requirements. With
very large datasets, deep learning is already the
preferred method to use, but may not be ideal for
smaller datasets. Although conventional machine
learning and image processing may be more labor-intensive, they provide a tool for situations in which data remain insufficient despite augmentation techniques. We
offer a conjectural model which shows advantages
for conventional learning techniques for small
datasets; advantages shift to deep learning after
some dataset size. We call this dataset size the
“learning equilibrium” (LE). It would be interesting
to study how many images are needed for deep
learning approaches to be effective in different
applications. Another topic for future research is to
determine the characteristics that make one
application require a larger dataset than another. We
may consider the dataset size at the LE to be an
application-specific trade-off; for applications in
which conventional models are effective, the LE
point will be larger.
In some applications, such as histopathology and related applications such as dermoscopy, biological constraints are best modeled by manually tuned features. Therefore, in these applications especially, the LE dataset size is large, and
there is still room for familiar computer vision
techniques in the novel world of deep learning.
REFERENCES
LeCun Y., Bengio Y., Hinton G. Deep learning. Nature.
2015 May 28;521(7553):436-44.
Bengio Y., Courville A., Vincent P. Representation
learning: a review and new perspectives. IEEE Trans
Pattern Anal Mach Intell. 2013 Aug; 35(8):1798-828.
Goodfellow I., Bengio Y., Courville A. Deep Learning.
Cambridge MA, MIT Press, 2016.
Allen K. "How a Toronto Professor's Research Revolutionized Artificial Intelligence." Toronto Star, 17 Apr. 2015. Accessed 9 Jan. 2017.
LeCun Y., Bottou L., Bengio Y., Haffner, P., Gradient-
Based Learning Applied to Document Recognition,
Proceedings of the IEEE, 86(11):2278-2324, Nov.
1998.
Goodfellow I. J., Erhan D., Luc Carrier P., et al.
Challenges in representation learning: a report on three
machine learning contests. Neural Netw. 2015 Apr;
64:59-63. PMID: 25613956
Valiant L. "A theory of the learnable", Commun. ACM,
vol. 27, pp. 1134-1142, Nov. 1984.
Brown L. "Deep Learning with GPUs." Johns Hopkins University, June 2015. http://www.nvidia.com/content/events/geoInt2015/LBrown_DL.pdf. Accessed 28 Nov. 2016.
"ImageNet", Image-net.org, 2016. [Online]. Available:
http://image-net.org/. [Accessed: 29- Nov- 2016].
"ISIC Archive", Isic-archive.com, 2016. [Online].
Available: https://isic-archive.com. [Accessed: 29-
Nov- 2016].
Krieger N., Hiatt R. A., Sagebiel R. W., Clark W. H.,
Mihm M.C. Inter-observer variability among
pathologists' evaluation of malignant melanoma:
effects upon an analytic study. J Clin Epidemiol. 1994
Aug; 47(8):897-902.
Guo P., Banerjee K., Stanley R., Long R., Antani S.,
Thoma G., Zuna R., Frazier S., Moss R., Stoecker W.
Nuclei-Based Features for Uterine Cervical Cancer
Histology Image Analysis with Fusion-based
Classification. IEEE J Biomed Health Inform. 2015
Oct 26. [Epub ahead of print]
Suzuki S., Zhang X., Homma N., Ichiji K., Kawasumi Y.,
Ishibashi T., Yoshizawa M. WE-DE-207B-02:
Detection of Masses On Mammograms Using Deep
Convolutional Neural Network: A Feasibility Study.
Med Phys. 2016 Jun; 43(6):3817.
Wang J., Yang X., Cai H., Tan W., Jin C., Li L.
Discrimination of Breast Cancer with Microcalcifications on Mammography by Deep Learning. Sci Rep.
2016 Jun 7; 6:27327.
Zhong C., Han J., Borowsky A., Parvin B., Wang Y.,
Chang H. When machine vision meets histology: A
comparative evaluation of model architecture for
classification of histology sections. Med Image Anal.
2017 Jan; 35:530-543.
Wang H., Cruz-Roa A., Basavanhally A., Gilmore A. H.,
Shih N., Feldman M., et al. Mitosis detection in breast
cancer pathology images by combining handcrafted
and convolutional neural network features. J Med
Imaging. 2014; 1(3):034003.
Arevalo J., Cruz-Roa A., Arias V., Romero E., González
F.A. An unsupervised feature learning framework for
basal cell carcinoma image analysis. Artif Intell Med.
2015 Jun; 64(2):131-45.