Robust Image Analysis of BeadChip Microarrays

Jan Kalina (1) and Anna Schlenker (2,3)

(1) Dept. of Medical Informatics and Biostatistics, Institute of Computer Science, Academy of Sciences of the Czech Republic, Prague, Czech Republic

(2) Institute of Hygiene and Epidemiology, First Faculty of Medicine, Charles University in Prague, Prague, Czech Republic

(3) Dept. of Biomedical Informatics, Faculty of Biomedical Engineering, Czech Technical University in Prague, Kladno, Czech Republic

Keywords:

Microarray, Robust Image Analysis, Noise, Outlying Measurements, Background Effect.

Abstract:

Microarray images in molecular genetics are heavily contaminated by noise and outlying measurements. This

paper is devoted to analysis of Illumina BeadChip microarray images, primarily to their low-level preprocess-

ing. We point out that standard image analysis procedures, which are implemented in the beadarray package

of BioConductor software, are highly sensitive to contamination by severe noise and outliers. Therefore, the

habitually used methodology does not discover many of the outliers. We illustrate this on real data and show

that the standard background correction method may actually amplify the noise in the image. A robust image

analysis tailor-made for this type of microarray images is highly desirable. We explain principles and show

preliminary results of our robust alternative to the standard approach, which aims to be robust to noise and

outliers in each of its steps.

1 BEADCHIP MICROARRAYS

Microarrays represent a commonly used technology

for measuring gene expressions. Microarray studies

are typically designed to find differentially expressed

genes in two or more groups of samples (e.g. patients

with a different form of a disease) or to perform (un-

supervised) clustering or (supervised) classiﬁcation

analysis into given groups (Fraser et al., 2010).

Illumina BeadChip microarrays are claimed to be

the currently most popular technology for measuring

gene expressions (Rueda, 2014). A sample is placed

in a microwell array containing about a million silica

beads corresponding to different gene transcripts

(Kuhn et al., 2004). The surface of the chip is scanned

to obtain a gray-scale image with a high ﬂuorescence

intensity corresponding to highly expressed genes.

We focus on the low-level preprocessing of Bead-

Chip images, which is a part of the BeadChip image

analysis with the aim to estimate expression values for

each bead from the raw scanned image, i.e. after ﬁlter-

ing out the effect of the background. Various sources

call the low-level preprocessing by different names:

• Feature-intensity extraction (Smith et al., 2010)

• Low-level analysis (Dunning et al., 2008)

• Data reduction (Fraser et al., 2010)

• Quantiﬁcation (Rueda, 2014)

The habitual approach to the image analysis of BeadChip microarrays is based on the algorithm of (Dunning et al., 2007). That paper does not describe the details of the methodology, so practitioners have to rely on the default values of its parameters. The approach is implemented in the package beadarray, which in its current version 1.16 can be considered the standard software for the preprocessing of BeadChip microarrays, before a more advanced analysis is performed with the standard lumi package. Both packages are part of the open-source software Bioconductor (Rueda, 2014).

The basic intensity is observed everywhere in the image and the target intensity in a particular bead corresponds to a gene expression. A raw BeadChip image is a 16-bit gray-scale image, i.e. each pixel takes one of 65536 values. Outlier or noise detection is computationally intensive, and the difficult task of extracting information from massive 2D images with tens of millions of pixels makes microarray image analysis an important topic in current bioinformatics (Cardona and Tomancak, 2012).

Observed gray intensities of BeadChip microarray images are heavily contaminated both in the beads and in the background. We have observed contamination by additive as well as impulsive (severely outlying) noise, in isolated pixels or in smaller or larger regions, in a variety of applications and for various reasons. To name only a few, these include measurement errors, unreliable sample preparation, or wrong identification of gene probes (Fraser et al., 2010).

Kalina J. and Schlenker A. Robust Image Analysis of BeadChip Microarrays. DOI: 10.5220/0005246900890094. In Proceedings of the International Conference on Bioimaging (BIOIMAGING-2015), pages 89-94. ISBN: 978-989-758-072-7. © 2015 SCITEPRESS (Science and Technology Publications, Lda.)

Only marginal attention has been paid to the low-level preprocessing of BeadChip images. Specific problems of BeadChips do not allow one to simply use the sophisticated tools for image analysis and quality assessment which are available for Affymetrix microarrays (Kuhn et al., 2004). Instead, we hold the opinion that a robust alternative tailor-made for BeadChip images would be highly desirable, and we bring arguments in favor of this opinion in this paper.

This paper has the following structure. Section 2

explains that procedures habitually applied on Bead-

Chip microarray images are highly vulnerable with

respect to noise. Section 3 proposes a computational

improvement for the foreground estimation. Cur-

rently, we work on robustifying standard image anal-

ysis approaches for BeadChip microarrays and our

ideas of robust image analysis for this task are sum-

marized in Section 4. Some results of the analysis of

real data are shown in Section 5, revealing the sensitivity

of standard image analysis procedures with respect to

noise and outliers in the images. Finally, Section 6

concludes the paper.

2 PROBLEMS OF STANDARD

LOW-LEVEL PREPROCESSING

Standard analysis of BeadChip microarray images is sensitive to random or systematic errors in the raw data and to artifacts of different sizes and shapes. Its methods have been derived primarily with the requirement of high speed (Kuhn et al., 2004). This section describes all steps of the standard image analysis algorithm; our critical stance will be illustrated on examples in Section 5. Thus, we bring new ideas and findings beyond those of (Smith et al., 2010), who simply recommended ignoring problematic beads in the images.

Let us now critically describe all the steps of the

standard image analysis performed on each BeadChip

microarray image. Commonly, the same sample is ap-

plied on two neighboring strips (Shi et al., 2009). The

following are the steps implemented in the beadarray

package.

• Estimating the Local Background Effect by the

5-th smallest value in a square neighborhood of

each bead is heavily inﬂuenced by local noise.

• Estimating the Foreground is performed by

three consecutively applied linear ﬁlters.

1. Image Sharpening can yield negative intensi-

ties. Besides, it yields nonsense (also negative)

values for pixels at the boundary of the image

where there are no beads.

2. Averaging (mean ﬁlter) over a square of size

3× 3 pixels around each given bead propagates

the effect of outliers to their neighbors.

3. Another averaging of four neighboring pixels

of a particular bead.

• Background Correction deﬁned as the differ-

ence between foreground and background inten-

sities. It may also yield negative values, which is

the consequence of the sharpening.

• Data Normalization has been largely discussed

and positively evaluated (Shi et al., 2009), but it

leads to hiding some outliers.

• Outlier Deletion is performed after mixing the

data from both strips, which prevents some out-

liers from being detected.
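The sensitivity of the first and third steps can be illustrated by a minimal sketch. It estimates the local background as the 5-th smallest value in a neighborhood and subtracts it from a foreground value. The neighborhood of 25 pixels, all intensities, and the function names are hypothetical choices made only for this illustration; they are not taken from the beadarray code.

```python
def kth_smallest(values, k=5):
    """Return the k-th smallest value (1-based), i.e. the background rule described above."""
    return sorted(values)[k - 1]


def background_corrected(foreground, neighborhood, k=5):
    """Foreground intensity minus the local background estimate."""
    return foreground - kth_smallest(neighborhood, k)


# Hypothetical clean neighborhood: 25 background pixels with intensities 700..704.
clean = [700 + (i % 5) for i in range(25)]
# The same neighborhood with five pixels replaced by a nonsense zero intensity.
noisy = [0] * 5 + clean[5:]

foreground = 1200  # hypothetical foreground intensity of the bead
clean_value = background_corrected(foreground, clean)
noisy_value = background_corrected(foreground, noisy)
print(clean_value, noisy_value)
```

In this toy setting, five corrupted pixels out of 25 are enough to move the background estimate to zero, so the corrected value grows from 500 to the full foreground intensity of 1200.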

Unfortunately, these methods in the beadarray package with default values of their parameters are strongly influenced by noise and random or systematic errors in the measurements. They transfer the effect of noise from noisy pixels also to neighboring pixels (and neighboring genes), and this effect can be amplified from one non-robust step to the later ones. In this way, noise is introduced artificially to genes which are not affected by noise in the raw image. The results of the whole process are also influenced too much by the sharpening. Besides, there is not even an attempt to correct for spatial artifacts.

Attention has been paid to the choice of a suitable

background correction method (Smith et al., 2010),

which has only a small inﬂuence on the result, but

no doubts have been cast upon other steps of the low-

level preprocessing. Nevertheless, they all can be eas-

ily shown to have a zero breakdown point, which is

a statistical measure of sensitivity against outliers in

the data (Davies and Gather, 2005).
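The notion of the breakdown point can be made concrete by a small sketch comparing the mean (zero breakdown point) with the median. The sample values and the helper `contaminate` are hypothetical and serve only as an illustration.

```python
from statistics import mean, median


def contaminate(sample, n_bad, bad_value):
    """Replace the first n_bad observations by an arbitrary outlying value."""
    return [bad_value] * n_bad + list(sample[n_bad:])


sample = [100, 101, 99, 102, 98, 100, 101, 99, 100]  # hypothetical intensities

one_outlier = contaminate(sample, 1, 10**6)
four_outliers = contaminate(sample, 4, 10**6)

# A single outlier moves the mean arbitrarily far ...
mean_shift = abs(mean(one_outlier) - mean(sample))
# ... while the median stays bounded even with 4 of 9 values replaced.
median_shift_one = abs(median(one_outlier) - median(sample))
median_shift_four = abs(median(four_outliers) - median(sample))
print(mean_shift, median_shift_one, median_shift_four)
```

Only when more than half of the values are replaced does the median follow the outliers, which is the sense in which its breakdown point is about 50 %.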

The local approach to estimation of both background and foreground does not exploit information about global trends across the whole strip. Besides, the initial steps of the standard image analysis of microarrays are strongly influenced by local noise in the neighborhood of particular microbeads, and the resulting biased values are passed on to the next estimation methods and transformations. Outliers become masked among the data and their subsequent detection and deletion is much more complicated.

BIOIMAGING2015-InternationalConferenceonBioimaging

3 FOREGROUND ESTIMATION

The three linear ﬁlters of the foreground estimation

within the standard image analysis of the beadarray

package can be expressed by means of a single lin-

ear ﬁlter. Here, we show the ﬁlter equivalent to im-

age sharpening together with the consecutively ap-

plied averaging. We propose its more efﬁcient com-

putation and discuss whether the function meets its

expectations.

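Let us denote the intensity in the pixel with coordinates $[i, j]$ by $w_{ij}$. The sharpening procedure replaces $w_{ij}$ by

$$3 w_{ij} - \tfrac{1}{2} w_{i,j+1} - \tfrac{1}{2} w_{i,j-1} - \tfrac{1}{2} w_{i+1,j} - \tfrac{1}{2} w_{i-1,j}. \tag{1}$$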

In the current code, sharpening is computed over the whole image and then the foreground intensities are computed as the weighted combination of four averaged values. The sharpening and the averaging can be easily described by one procedure, which saves many floating point operations.

We now propose this computational improvement, which combines the computation of sharpening and averaging. It can be easily derived that both filters can be joined into one, which replaces $w_{ij}$ directly by a linear combination of the raw intensities of pixels in the neighborhood of the pixel $[i, j]$ with coefficients given by this convolution mask:

0 −1/18 −1/18 −1/18 0

−1/18 2/9 3/18 2/9 −1/18

−1/18 3/18 1/9 3/18 −1/18

−1/18 2/9 3/18 2/9 −1/18

0 −1/18 −1/18 −1/18 0
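The mask can be checked by composing the two filters explicitly. The following minimal sketch uses exact rational arithmetic and assumes that the sharpening kernel has weight 3 at the center and weight -1/2 at each of the four direct neighbors, and that the averaging is a plain 3 × 3 mean; these weights are an assumption chosen because they reproduce the mask above. The function name `convolve_full` is hypothetical.

```python
from fractions import Fraction as F


def convolve_full(a, b):
    """Full 2D convolution of two small kernels given as lists of rows."""
    rows = len(a) + len(b) - 1
    cols = len(a[0]) + len(b[0]) - 1
    out = [[F(0)] * cols for _ in range(rows)]
    for i, row_a in enumerate(a):
        for j, x in enumerate(row_a):
            for k, row_b in enumerate(b):
                for l, y in enumerate(row_b):
                    out[i + k][j + l] += x * y
    return out


h = F(1, 2)
# Assumed sharpening kernel: 3 at the center, -1/2 at the four direct neighbors.
sharpen = [[0, -h, 0],
           [-h, 3, -h],
           [0, -h, 0]]
# 3 x 3 mean filter.
average = [[F(1, 9)] * 3 for _ in range(3)]

combined = convolve_full(sharpen, average)

# The 5 x 5 mask printed in the text, written in multiples of 1/18.
e = F(1, 18)
mask = [[0, -e, -e, -e, 0],
        [-e, 4 * e, 3 * e, 4 * e, -e],
        [-e, 3 * e, 2 * e, 3 * e, -e],
        [-e, 4 * e, 3 * e, 4 * e, -e],
        [0, -e, -e, -e, 0]]

print(combined == mask)
```

Both kernels are symmetric, so convolution and correlation coincide here. The coefficients of the combined mask sum to 1, so a constant image is left unchanged by the joint filter.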

Besides, there is no justified reason for preferring this sharpening approach over any other, e.g. that of (Tanaka et al., 2010), or for combining sharpening with smoothing, because sharpening increases contrast and smoothing removes it. The motivation for using sharpening was given, although vaguely, by (Dunning et al., 2007) as follows. A bead with a high intensity was claimed to deplete all of the target molecules from the neighborhood and therefore the measured intensity is smaller than the real intensity. A reasonable transformation should take this into account by increasing the measured intensity. This idea is definitely not fulfilled by the sharpening of formula (1). Actually, (1) increases local contrast, but it does not increase the intensity in a pixel with a large intensity surrounded by neighbors that also have a large intensity. Besides, sharpening is followed by averaging, whose function is to reduce the bias introduced by sharpening.
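The behavior of the sharpening can be illustrated on three toy patches. The sketch assumes that formula (1) uses weight 3 for the center pixel and weight -1/2 for each of the four direct neighbors (the weights consistent with the convolution mask of this section); the patches and the function name `sharpen_pixel` are hypothetical.

```python
def sharpen_pixel(img, i, j):
    """Assumed sharpening: 3 * center minus half of each of the four direct neighbors."""
    neighbors = img[i][j + 1] + img[i][j - 1] + img[i + 1][j] + img[i - 1][j]
    return 3 * img[i][j] - 0.5 * neighbors


# Bright pixel surrounded by equally bright neighbors.
uniform_bright = [[4000] * 3 for _ in range(3)]
# Bright pixel surrounded by dark neighbors.
isolated_bright = [[500, 500, 500],
                   [500, 4000, 500],
                   [500, 500, 500]]
# Dark pixel with one bright direct neighbor.
dark_next_to_bright = [[500, 500, 500],
                       [500, 500, 4000],
                       [500, 500, 500]]

print(sharpen_pixel(uniform_bright, 1, 1))       # intensity is not increased
print(sharpen_pixel(isolated_bright, 1, 1))      # local contrast is increased
print(sharpen_pixel(dark_next_to_bright, 1, 1))  # negative intensity
```

Under these assumptions the bright pixel among bright neighbors keeps its intensity, while the dark pixel next to a bright one obtains a negative value, in line with the remarks on sharpening in Section 2.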

4 ROBUST IMAGE ANALYSIS

The sequence of standard low-level preprocessing steps in the beadarray package is sensitive to heavy contamination by severe noise, which is omnipresent in gene expression measurements. Therefore, it is highly desirable to replace standard methods by robust counterparts. Tailor-made methods for microarray image analysis should be proposed using the knowledge about the causes of noise and errors in the images. Such methods can detect outliers and ignore them completely; other possibilities are to replace them by values estimated in a suitable model or to down-weight them.

The standard analysis of BeadChip microarrays does not exploit available image denoising methods. On the other hand, denoising itself is complicated and many of its methods rely on certain assumptions, e.g. a Gaussian distribution of the noise, which may be unsuitable and in fact allow some sorts of outliers to be preserved. In our experience, starting the microarray image analysis with a simple denoising and continuing with standard methods does not give good results.

It is important to robustify all steps of the ap-

proach implemented in beadarray. For example, re-

placing the average by a robust estimator of location

with a high breakdown point turns out to be useful

in estimating the foreground. However, we are not

aware of any application of highly robust methods in

the analysis of microarray images, which would be

based on the concept of robust image analysis. This

has been described as a branch of image analysis ex-

ploiting the ideas and methods of robust statistics with

a high breakdown point.

We are working on implementation of a robust

alternative to the standard approach in C++. Let us

present some key ideas for improving the robustness:

• Robustify each individual step in the whole pro-

cess of the image analysis by a method with a high

breakdown point.

• Replace linear filters by robust counterparts. For example, filters based on order statistics are more suitable for impulsive noise.

• Ignore pixels with a nonsense intensity (e.g. zero

intensity).

• Ignore beads which have outliers in their neighborhood. Further, outliers should be removed after each step of the analysis. Only this can prevent masking of outliers.

• Treat very bright beads and their direct neighbors

(Figure 1) in a speciﬁc way, e.g. delete them or

model their effect and subtract it from the mea-

sured intensities.

RobustImageAnalysisofBeadChipMicroarrays

• Perform an artifact detection and correction tailor-

made for artifacts of various sizes and shapes.

• Use a global model for the trend across the whole

strip and use an efﬁcient algorithm for this inten-

sive computation.
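The idea of replacing linear filters by filters based on order statistics can be sketched on a toy image with a single impulsive outlier. The image, its size, and the helper names are hypothetical; the median filter stands here for one simple order-statistic filter and is not necessarily the filter used in our implementation.

```python
from statistics import mean, median


def filter3x3(img, reducer):
    """Apply reducer (e.g. mean or median) over the 3 x 3 window of each interior pixel."""
    n_rows, n_cols = len(img), len(img[0])
    out = [row[:] for row in img]  # border pixels are copied unchanged
    for i in range(1, n_rows - 1):
        for j in range(1, n_cols - 1):
            window = [img[i + di][j + dj] for di in (-1, 0, 1) for dj in (-1, 0, 1)]
            out[i][j] = reducer(window)
    return out


def count_changed(result, reference=100):
    """Number of pixels whose value differs from the flat reference intensity."""
    return sum(1 for row in result for value in row if value != reference)


# Hypothetical flat 7 x 7 image with one impulsive outlier in the center.
img = [[100] * 7 for _ in range(7)]
img[3][3] = 60000

mean_changed = count_changed(filter3x3(img, mean))      # outlier smeared over a 3 x 3 block
median_changed = count_changed(filter3x3(img, median))  # impulse removed completely
print(mean_changed, median_changed)
```

The mean filter spreads the single outlier to nine pixels, whereas the median filter restores the flat image.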

To give an example, let us consider estimating the local background effect. Here, we propose to replace the standard approach (see Section 2) by the least weighted squares estimator of location (Kalina, 2012) computed from $w_{(6)}, \dots, w_{(10)}$, where

$$w_{(1)} \le w_{(2)} \le \dots \le w_{(25)} \tag{2}$$

are ordered gray intensities of a square neighborhood of a given pixel. Better than that, considering a circular neighborhood would ensure a rotation invariance of the computation.
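A minimal sketch of such a background estimate follows. It approximates a least weighted squares location estimate, i.e. a minimizer of the weighted sum of ordered squared residuals, by simple iterative reweighting. The linearly decreasing weights, the starting value, the toy intensities, and the function name `lws_location` are assumptions made for this illustration and do not reproduce the exact algorithm of (Kalina, 2012).

```python
def lws_location(values, weights, n_iter=100):
    """Iteratively reweighted approximation of a least weighted squares location estimate.

    weights[r] is the weight assigned to the observation with the r-th smallest
    squared residual; the weights are assumed to be non-increasing.
    """
    estimate = sorted(values)[len(values) // 2]  # start from the median
    for _ in range(n_iter):
        # Rank observations by their squared residual from the current estimate.
        order = sorted(range(len(values)), key=lambda i: (values[i] - estimate) ** 2)
        w = [0.0] * len(values)
        for rank, i in enumerate(order):
            w[i] = weights[rank]
        new_estimate = sum(wi * v for wi, v in zip(w, values)) / sum(w)
        if abs(new_estimate - estimate) < 1e-12:
            break
        estimate = new_estimate
    return estimate


# Hypothetical 25 ordered intensities of a neighborhood; the three largest are contaminated.
neighborhood = sorted([700 + i for i in range(22)] + [30000, 41000, 65535])
selected = neighborhood[5:10]            # the order statistics w_(6), ..., w_(10)
weights = [1.0, 0.75, 0.5, 0.25, 0.0]    # assumed linearly decreasing weights
background = lws_location(selected, weights)
print(background)
```

Each iteration cannot increase the weighted sum of ordered squared residuals, so the procedure converges to a local minimum; a global search would be needed for the exact estimator.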

Figure 1: A cut-out of a raw BeadChip microarray image. Its standard analysis does not have a special treatment for highly expressed beads transmitting their signal to their neighborhood.

Currently, we are comparing various alternative methods, which are also accompanied by a robust statistical analysis, using e.g. the following ideas.

• Use a robust regularized classiﬁcation analysis

method, which is suitable for high-dimensional

data, or a robust dimensionality reduction, using

e.g. recently proposed methods of (Filzmoser and

Todorov, 2011; Kalina, 2014).

• Use a weighted classiﬁcation analysis, where each

gene obtains a weight according to its variability

over beads.

• Classiﬁcation over beads instead of over gene

transcripts, at the cost of a higher computational

intensity.

• Optimize parameters of image analysis proce-

dures to improve classiﬁcation results, e.g. by ro-

bust optimization (Xanthopoulos et al., 2013).

5 EXAMPLE

The harmful influence of noise on standard procedures of microarray image analysis will be illustrated on real data. We analyzed a data subset from a genetic study of the Center for Biomedical Informatics (Kalina and Zvárová, 2013). Blood samples of 24 patients with cerebrovascular stroke and of 24 control individuals were examined by HumanWG-6 Expression BeadChips according to the manufacturer's protocol.

First, we investigate the effect of the sharpening and averaging within the process of foreground estimation. In practice, the user of beadarray may choose to turn the sharpening off, while it is not possible to turn off the averaging. Users are advised to use sharpening (Dunning et al., 2007; Dunning et al., 2008), although its influence on the result is extremely high.

We analyzed 48 images (strips of size 2389 × 18309 pixels) from different microarrays by means of different approaches, including the option not to use sharpening and averaging. Figures 2 and 3 show averaged gene expressions across the 2389 rows of a microarray strip across 48 microarrays for a particular gene. The gene was selected randomly as a typical gene without differential expression between patients and control individuals. The plots contain boxplots of four groups, corresponding to the top, upper middle, bottom middle, and bottom rows, respectively. In other words, the total number of 2389 rows was divided into 4 equally sized groups, while outliers were discarded from the figures.

Figure 2 shows average foreground intensities computed without performing sharpening and averaging. Figure 3 shows analogous values computed with sharpening and averaging. The effect of these transformations can be seen as a deformation of values in the last rows of the strip. The figures quantify our experience with heterogeneity of expressions across the strip, which we have observed in the majority of preprocessed (but not raw) microarrays.

Moreover, beadarray detects 0.9 % of beads to be outliers in the outlier deletion step of the algorithm. Compared to this strikingly small percentage, our preliminary version of robust image processing considers as many as 15 % of beads to be too influenced by spatial artifacts or noise and treats them in a specific way.

To quantify the adverse effect of outliers in the images, we evaluated a simple robust version of the image preprocessing. Its parameters were tuned to detect a given percentage of outliers, namely 10 % and 20 % of measurements (gene expressions) are discarded from the subsequent computations.

Figure 2: Example of Section 5: Estimated foreground intensities across 48 samples for a particular gene transcript without performing sharpening and averaging. The image shows averaged values for top, upper middle, bottom middle, and bottom rows of the strips.

Figure 3: Estimated foreground intensities across 48 samples for a particular gene transcript with sharpening and averaging. This is exactly the result of the beadarray package with default settings of the parameters.

Further, the highly robust principal component analysis LWS-PCA of (Kalina, 2012) was computed from the data. The first row of Table 1 corresponds to the standard approach. The next rows correspond to trimming off 10 % and 20 % of measurements, i.e. a fixed percentage (10 % or 20 %) of beads is declared to be outliers. The increase in the values in Table 1 shows that the major principal components perform better in extracting information from the data. In the standard approach, the less important principal components are caused by mere noise; their influence is now reduced closer to zero and the methods can be interpreted as denoising (Tibshirani et al., 2003).

Table 1: The contribution (in %) of the 5 major robust principal components to explaining the variability of the gene expression data for various percentages of outliers trimmed off.

Percentage     Index of principal component
of outliers    1      2      3      4      5
0.9 %          6.4    6.1    5.0    4.3    3.9
10 %           7.8    7.4    5.9    5.2    4.0
20 %           8.8    7.8    6.3    5.4    4.1
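The increase can be summarized by the cumulative contribution of the five components. The short sketch below only sums the rows of Table 1; the variable names are hypothetical.

```python
# Values of Table 1: contribution (in %) of the five major robust principal components.
table1 = {
    "0.9 %": [6.4, 6.1, 5.0, 4.3, 3.9],
    "10 %": [7.8, 7.4, 5.9, 5.2, 4.0],
    "20 %": [8.8, 7.8, 6.3, 5.4, 4.1],
}

# Cumulative contribution of the first five components for each trimming level.
cumulative = {trimmed: round(sum(row), 1) for trimmed, row in table1.items()}
for trimmed, total in cumulative.items():
    print(f"{trimmed:>6} trimmed: {total} % explained by the first five components")
```

The cumulative contribution grows from 25.7 % for the standard approach to 30.3 % and 32.4 % when 10 % and 20 % of measurements are trimmed off.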

6 CONCLUSIONS AND FUTURE

RESEARCH

Microarrays remain a promising technology for research in molecular genetics (Göhlmann and Talloen, 2009) and their role is believed to be still increasing (Rueda, 2014). We analyzed the code of commonly used methods of the beadarray package in C++, because their details (e.g. the formula (1)) have not been critically described in literature. This might have contributed to the fact that sufficient attention has not been paid to robustness properties of the standard methodology, not even in monographs on image analysis of microarrays (Rueda, 2014). This paper reveals some disadvantages of the standard image analysis of BeadChip microarrays.

We promote our position that the standard soft-

ware for the image analysis of BeadChip microar-

rays, which is implemented in the beadarray pack-

age of Bioconductor, suffers from the presence of se-

vere noise in the scanned images. Therefore, we see

a need for software based on an alternative approach

for the BeadChip image analysis. Our criticism of the

standard approaches illustrated on examples with real

data goes far beyond the mild review of (Smith et al.,

2010). The problem is that results of the standard

analysis are biased even on optimally prepared mi-

croarrays. As a consequence, we give a warning that

results of all microarray genetic studies analyzed by

the standard methodology should be interpreted care-

fully, because they are inﬂuenced by the high sensi-

tivity of standard procedures with respect to the om-

nipresent contamination of data by noise and outlying

values.

Currently, we are implementing a robust approach and tuning its parameters. Such an approach allows a specific treatment of outliers, spatial artifacts, and the global trend across the strip of the microarray. In this context, pixels influenced by artifacts or severe noise do not have to be deleted, but can be carefully modeled, which allows the effect of noise to be suppressed. After having described some principles of robust image analysis of BeadChip microarrays in this paper, we propose a list of ideas or topics for future research in this area.

• Measurement errors can be estimated as the vari-

ability of measured expression in beads with no

corresponding targets in the genome.

• The type of gene probes or the precise location

of beads may be identiﬁed wrongly with a rela-

tively large probability, particularly for genes with

a small expression.

• Beads corresponding to highly expressed genes

have a tendency to transmit a part of their signal to

RobustImageAnalysisofBeadChipMicroarrays

their neighborhood. Thus, beads with a small in-

tensity may be overilluminated by a strong effect

from their neighborhood (Figure 1).

• Errors in the process of bead localization on the

microarray. They are typical for all beads, but

larger for beads with a lower expression level.

• Errors in the process of bead type identiﬁcation.

• Unrealistic assumption that the expressions of each

gene have the same variability.

• Poisson distribution is more adequate for model-

ing the noise because the measured ﬂuorescence

intensities actually correspond to counts of indi-

vidual photons.

• Sensitivity to discretization of coordinates. The

result is highly inﬂuenced by fractional coordi-

nates of the bead, because the assumption of lin-

earity of the intensities of the image between

neighboring pixels is strongly violated.

ACKNOWLEDGEMENTS

The work was ﬁnancially supported by the Neuron

Foundation for Supporting Science. The work of

A. Schlenker was also supported by the speciﬁc re-

search project 260034 (Semantic Interoperability in

Biomedicine and Health Care) of Charles University

in Prague.

REFERENCES

Cardona, A. and Tomancak, P. (2012). Current challenges in

open-source bioimage informatics. Nature Methods,

9:661–665.

Davies, P. and Gather, U. (2005). Breakdown and groups.

Annals of Statistics, 33:997–1035.

Dunning, M., Barbosa-Morais, N., Lynch, A., Tavaré, S.,

and Ritchie, M. (2008). Statistical issues in the analysis

of Illumina data. BMC Bioinformatics, 9(85).

Dunning, M., Smith, M., Ritchie, M., and Tavaré, S. (2007).

Beadarray: R classes and methods for Illumina bead-

based data. Bioinformatics, 23:2183–2184.

Filzmoser, P. and Todorov, V. (2011). Review of robust multivariate

statistical methods in high dimension.

Analytica Chimica Acta, 705:2–14.

Fraser, K., Wang, Z., and Liu, X. (2010). Microarray im-

age analysis: An algorithmic approach. Chapman &

Hall/CRC, Boca Raton.

Göhlmann, H. and Talloen, W. (2009). Gene expression

studies using Affymetrix microarrays. Chapman &

Hall/CRC, Boca Raton.

Kalina, J. (2012). Implicitly weighted methods in robust

image analysis. Journal of Mathematical Imaging and

Vision, 44:449–462.

Kalina, J. (2014). Classiﬁcation analysis methods for

high-dimensional genetic data. Biocybernetics and

Biomedical Engineering, 34:10–18.

Kalina, J. and Zvárová, J. (2013). Decision support systems

in the process of improving patient safety. In E-health

Technologies and Improving Patient Safety: Exploring

Organizational Factors. IGI Global, Hershey, 71–83.

Kuhn, K., Baker, S., Chudin, E., Lieu, M.-H., Oeser, S.,

Bennett, H., Rigault, P., Barker, D., McDaniel, T., and

Chee, M. (2004). A novel, high-performance random

array platform for quantitative gene expression proﬁl-

ing. Genome Research, 14:2347–2356.

Rueda, L. (2014). Microarray image and data analysis:

Theory and practice. CRC Press, Boca Raton.

Shi, W., Banerjee, A., Ritchie, M., Gerondakis, S., and

Smyth, G. (2009). Illumina WG-6 beadchip strips

should be normalized separately. BMC Bioinformat-

ics, 10(372).

Smith, M., Dunning, M., Tavaré, S., and Lynch, A. (2010).

Identiﬁcation and correction of previously reported

spatial phenomena using raw Illumina BeadArray

data. BMC Bioinformatics, 11(208).

Tanaka, G., Suetake, N., and Uchino, E. (2010). Im-

age enhancement based on nonlinear smoothing and

sharpening for noisy images. Journal of Advanced

Computational Intelligence and Intelligent Informat-

ics, 14:200–207.

Tibshirani, R., Hastie, T., and Narasimhan, B. (2003). Class

prediction by nearest shrunken centroids, with ap-

plications to DNA microarrays. Statistical Science,

18:104–117.

Xanthopoulos, P., Pardalos, P., and Trafalis, T. (2013). Ro-

bust data mining. Springer, New York.

BIOIMAGING2015-InternationalConferenceonBioimaging