
 
matching ground truth images. However, it is well
known that such an approach will often fail 
under conditions where the required segmentation is 
related to the meaning of the segmented image. 
Indeed, if a reliable unsupervised objective method 
had existed, it would have formed the basis of 
one of the best (if not the best) image segmentation 
algorithms to date. That is not the case. Hence, 
researchers still rely on either subjective evaluation, 
which requires ground truth, or supervised objective 
evaluation, which also requires ground truth. Our 
method is a supervised means of objective 
evaluation, and it is assessed, as it ought to be, by 
subjective visual inspection. 
2 METHODOLOGY 
ISAT assesses the quality of segmentation of any 
image. To do so, ISAT does not require the original 
image, but two other images representing the ideal 
and actual segmentation of the original image. As a 
matter of terminology, the ideally segmented image, 
which is usually drawn by hand, is called the ground 
truth image (or GT). The other image represents the 
result of a segmentation procedure, which is usually 
executed by machine, and is called the Machine 
Segmented image (or MS). Both of these images are 
binary images, in that they exhibit the boundaries of 
the segmented regions as black curves on a white 
background. In all the following calculations, GT 
serves as the reference of presumed truth 
against which an MS image is judged. 
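Because GT and MS show only region boundaries, the regions themselves have to be recovered as connected areas of non-boundary pixels. The following is a minimal sketch of that step, not ISAT's actual implementation. It assumes an image is a list of rows in which 0 encodes a black boundary pixel and 1 a white pixel, and that regions are 4-connected; the function name `label_regions` is hypothetical.

```python
from collections import deque


def label_regions(image):
    """Label 4-connected regions of non-boundary pixels.

    Assumed encoding: 0 = black boundary pixel, 1 = white pixel.
    Returns (labels, count), where labels has the same shape as image,
    0 marks boundary pixels and 1..count identify the regions.
    """
    height, width = len(image), len(image[0])
    labels = [[0] * width for _ in range(height)]
    count = 0
    for r in range(height):
        for c in range(width):
            if image[r][c] == 0 or labels[r][c] != 0:
                continue  # boundary pixel, or pixel already labelled
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill of one region
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and image[ny][nx] != 0 and labels[ny][nx] == 0:
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


# A 4x4 image whose boundary separates the top-left block from the rest.
example_image = [
    [1, 1, 0, 1],
    [1, 1, 0, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 1],
]
example_labels, example_count = label_regions(example_image)
```

Running the same labelling on both GT and MS gives the two sets of regions that the matching step compares.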
To carry out any kind of segmentation quality 
assessment, connected regions in both GT and MS 
images must first be established; then, crucially, every 
region in GT must, if possible, be matched with 
one or more regions in MS. Note that one region in 
GT may match one region in MS; that region in GT 
would then be correctly segmented if the overlap 
between the two regions is great enough, or missed if 
the overlap is insufficient. Also, more than one 
region in GT may be matched with one region in 
MS; those regions in GT would be under-segmented. 
On the other hand, multiple regions in MS may 
correspond to one region in GT; that region in GT 
would then be over-segmented. Finally, every region 
that exists in MS but does not correspond to any 
region(s) in GT is considered noise. Region-based 
accuracy is calculated as the ratio of the number of 
correctly matched regions in MS to the sum of the 
number of regions in GT and the number of noise 
regions (which come from MS). All of the above measures 
are based on equivalent measures proposed by 
Hoover et al. (Hoover, 1996). 
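The classification and the accuracy ratio described above can be sketched as follows. This is an illustration under stated assumptions rather than ISAT's code: the correspondences between regions and the set of pairs with sufficient overlap are taken as given inputs, a GT region with no corresponding MS region is treated as missed, and the names `classify_regions` and `region_based_accuracy` are hypothetical.

```python
from collections import Counter


def classify_regions(gt_to_ms, ms_ids, sufficient):
    """Classify every GT region and collect the noise regions of MS.

    gt_to_ms:   dict mapping each GT region id to the MS region ids it
                corresponds to (assumed to be computed beforehand).
    ms_ids:     all region ids present in MS.
    sufficient: set of (gt_id, ms_id) pairs whose overlap is great enough.
    """
    # How many GT regions correspond to each MS region.
    ms_use = Counter(ms for partners in gt_to_ms.values() for ms in set(partners))
    labels = {}
    for gt, partners in gt_to_ms.items():
        partners = set(partners)
        if len(partners) > 1:
            labels[gt] = "over-segmented"   # several MS regions, one GT region
        elif len(partners) == 1:
            (ms,) = partners
            if ms_use[ms] > 1:
                labels[gt] = "under-segmented"  # several GT regions, one MS region
            elif (gt, ms) in sufficient:
                labels[gt] = "correct"
            else:
                labels[gt] = "missed"
        else:
            labels[gt] = "missed"  # assumption: no correspondence counts as missed
    noise = set(ms_ids) - set(ms_use)  # MS regions matching no GT region
    return labels, noise


def region_based_accuracy(labels, noise):
    """Correct regions divided by (regions in GT + noise regions)."""
    denominator = len(labels) + len(noise)
    if denominator == 0:
        raise ValueError("no regions to evaluate")
    n_correct = sum(1 for label in labels.values() if label == "correct")
    return n_correct / denominator


example_labels, example_noise = classify_regions(
    gt_to_ms={"g1": ["m1"], "g2": ["m2", "m3"], "g3": ["m4"], "g4": ["m4"], "g5": []},
    ms_ids=["m1", "m2", "m3", "m4", "m5"],
    sufficient={("g1", "m1")},
)
example_accuracy = region_based_accuracy(example_labels, example_noise)  # 1 / (5 + 1)
```

In this made-up example only g1 is correctly segmented, m5 is noise, and the accuracy is one correct region out of five GT regions plus one noise region.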
As such, an ideal segmentation, from a 
region-based perspective, is one in which every region in 
GT is exclusively matched with exactly one region 
in MS, with zero noise (i.e., unmatched regions in 
MS). In this case, ISAT returns a region-based 
accuracy of 100%. Note that matching 
requires an overlap between the two matched 
regions exceeding a pre-set threshold, which we 
currently set to 66% and which should not be set to 100%. 
This ensures that the number of correctly segmented 
regions reflects human conceptions of region-based 
segmentation, where the number of approximately 
matched regions (e.g., red blood cells) matters more 
than the precise fit of every matched region (e.g., 
one blood cell). 
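The threshold test can be illustrated with a short sketch. The text does not state here how overlap is normalized, so the code assumes, following the general style of Hoover et al., that the shared pixels must exceed the threshold as a fraction of both regions; regions are represented as sets of (row, column) pixels, and the names `overlap_fractions` and `is_match` are hypothetical.

```python
def overlap_fractions(gt_region, ms_region):
    """Shared pixels as a fraction of the GT region and of the MS region."""
    shared = len(gt_region & ms_region)
    return shared / len(gt_region), shared / len(ms_region)


def is_match(gt_region, ms_region, threshold=0.66):
    """True if the overlap exceeds the threshold for both regions.

    Assumption: the overlap is tested against both region sizes; the
    default of 0.66 is the value quoted in the text.
    """
    if not gt_region or not ms_region:
        return False
    of_gt, of_ms = overlap_fractions(gt_region, ms_region)
    return of_gt > threshold and of_ms > threshold


# A 3x3 GT region compared with two 3x4 MS regions.
gt_cell = {(r, c) for r in range(3) for c in range(3)}
ms_close = {(r, c) for r in range(3) for c in range(4)}      # covers all of gt_cell
ms_shifted = {(r, c) for r in range(3) for c in range(1, 5)}  # covers two of three columns

close_matches = is_match(gt_cell, ms_close)      # fractions 9/9 and 9/12
shifted_matches = is_match(gt_cell, ms_shifted)  # fractions 6/9 and 6/12
```

Because the test is a strict "exceeds", a threshold of 1.0 could never be met in this sketch, even by identical regions.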
Once regions have been identified in both GT and MS 
and matched between the two, it is possible to compute 
all region-based segmentation quality measures. It also 
becomes possible to compute the second set of measures, 
the pixel-based segmentation quality measures. These 
measures sound familiar, but they are applied 
differently from the well-known True Positive, False 
Negative, True Negative and False Positive 
measures used in innumerable studies in image 
processing (Bushberg, 2002). We describe the 
final pixel-based measures intuitively here; the 
following sub-sections describe all the measures in 
full detail. In brief, the final pixel-based measures 
provide a normalized image-wide quantitative 
assessment of the quality of the fit between the 
regions of GT and those they were matched with in 
MS. As such, our sensitivity is the percentage of 
pixels of regions of GT that were matched with 
regions in MS. Specificity is the percentage of pixels 
of the backgrounds of the various regions in GT that 
were in fact assigned to backgrounds of the 
matching regions in MS. We define the background 
of a region as those pixels that belong to the image 
but not to that region, and we exclude the pixels of 
the edges between regions from all calculations. 
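One plausible reading of these intuitive definitions is sketched below; it is not the detailed formulation of the following sub-sections. The sketch assumes that counts are summed over all matched (GT, MS) region pairs before normalizing, that regions are sets of (row, column) pixels, and that `valid_pixels` holds every image pixel that is not an edge pixel. The name `pixel_measures` is hypothetical.

```python
def pixel_measures(matched_pairs, valid_pixels):
    """Return (sensitivity, specificity) as fractions in [0, 1].

    matched_pairs: iterable of (gt_region, ms_region) pixel sets.
    valid_pixels:  set of all non-edge pixels of the image (assumption:
                   edge pixels are excluded simply by leaving them out).
    """
    tp = fn = tn = fp = 0
    for gt_region, ms_region in matched_pairs:
        gt_region = gt_region & valid_pixels
        ms_region = ms_region & valid_pixels
        gt_background = valid_pixels - gt_region  # image pixels outside the GT region
        ms_background = valid_pixels - ms_region  # image pixels outside the MS region
        tp += len(gt_region & ms_region)          # region pixels kept in the region
        fn += len(gt_region - ms_region)          # region pixels lost to background
        tn += len(gt_background & ms_background)  # background kept as background
        fp += len(gt_background - ms_background)  # background absorbed by the region
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("not enough pixels to compute both measures")
    return tp / (tp + fn), tn / (tn + fp)


# One matched pair on a 4x4 image without edge pixels, for illustration.
all_pixels = {(r, c) for r in range(4) for c in range(4)}
gt_region = {(0, 0), (0, 1), (1, 0), (1, 1)}
ms_region = {(0, 0), (0, 1), (1, 0), (2, 0)}
sensitivity, specificity = pixel_measures([(gt_region, ms_region)], all_pixels)
```

Here three of the four GT region pixels are recovered, so sensitivity is 0.75, and eleven of the twelve GT background pixels stay in the MS background, so specificity is 11/12.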
An ideally segmented image from a pixel-based 
perspective is similar to one 
from a region-based point of view, with one 
exception. Using the red blood cell example, every 
blood cell boundary in the MS image must perfectly 
fit the boundary of the corresponding blood 
cell in the GT image; any deviation, no matter how 
small, will reduce either sensitivity or specificity and 
hence the overall pixel-based measure of accuracy, 
which is a weighted average of the two. 
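The weights of this average are not given in this passage, so the sketch below leaves the weight as a hypothetical parameter `weight`. For illustration only, it also shows one common choice: weighting each measure by its share of the pixels, which reproduces the usual ratio of correctly assigned pixels to all pixels. The pixel counts are made up.

```python
def weighted_accuracy(sensitivity, specificity, weight):
    """Weighted average of sensitivity and specificity.

    weight is the share given to sensitivity (hypothetical parameter);
    specificity receives the remaining 1 - weight.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie between 0 and 1")
    return weight * sensitivity + (1.0 - weight) * specificity


# Illustrative pixel counts (not taken from the paper).
tp, fn, tn, fp = 3, 1, 11, 1
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)

# Weighting by pixel share: region pixels over all counted pixels.
pixel_share = (tp + fn) / (tp + fn + tn + fp)
accuracy = weighted_accuracy(sensitivity, specificity, pixel_share)
plain_ratio = (tp + tn) / (tp + fn + tn + fp)  # equals accuracy for this weighting
```

With any weight, a perfect fit (sensitivity and specificity both 1) gives an accuracy of 1, and any drop in either measure lowers the result.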
An Image Segmentation Assessment Tool ISAT 1.0