
 
Table 1: Bias of results to existence/absence of a model. [Image table: query images and their top retrieved results, shown with and without a model.]
automatically remove those before the visual 
descriptors are extracted. 
In Section 2, we refer to related work in the fields of image retrieval, image indexing, fashion retrieval, and skin detection. In Section 3, we describe the system into which we integrate skin removal, and in Section 4, we describe our skin removal technique. In Section 5, we present our judging methodology and results. Finally, in Section 6, we conclude the paper and discuss future work.
2 RELATED WORK  
Although much work has been done on image retrieval in general, little addresses the specific domain of clothing retrieval. Recently, Grana et al. (2012) presented their work on fashion retrieval based solely on color, using a color bag-of-words signature. They describe garments by a single dominant color and therefore focus only on images with a unique color classification. Arguing that uniform color space division and color space clustering do not reflect fashion color jargon, they use the color classes that label garments in their training set to split the color space in a way that minimizes the error between these color classes. They apply automatic pre-processing to remove skin and mannequin parts, and then use GrabCut (Rother et al., 2004) to remove clothing items that are not the main garment depicted in an image. However, they do not describe the skin removal approach used in this pre-processing step or its impact on retrieval.
Skin detection has been approached by researchers with different methodologies, including explicit color space thresholding and histogram models with naïve Bayes classifiers, which we discuss later (Kakumanu et al., 2007). However, we noticed that the precision of most of the proposed techniques is not high. That is mainly because those techniques depend on analyzing the images in the visible color spectrum without any attention to context. This is not optimal because many factors (such as illumination, camera characteristics, shadows, and makeup) significantly affect the apparent skin color. A workaround is to move the problem to the non-visible spectrum (the infrared range), in which skin color appears more consistent across different conditions. However, the required equipment is more expensive and usually not available in consumer devices.
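To illustrate the explicit color space thresholding mentioned above, the sketch below classifies pixels as skin using fixed Cb/Cr ranges in the YCbCr space. The threshold values are the commonly cited ones from the skin detection literature, not those of any system surveyed here; exact ranges vary between papers.

```python
import numpy as np

def skin_mask_ycbcr(rgb):
    """Classify pixels as skin via fixed Cb/Cr thresholds.

    rgb: uint8 array of shape (H, W, 3).
    The Cb in [77, 127], Cr in [133, 173] ranges are illustrative
    values from the literature; they are sensitive to illumination,
    which is exactly the weakness discussed in the text.
    """
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    # ITU-R BT.601 RGB -> YCbCr chroma conversion (full-range form)
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return (77 <= cb) & (cb <= 127) & (133 <= cr) & (cr <= 173)
```

Such a rule is cheap but context-free, which is why its precision suffers under shadows, makeup, or unusual camera characteristics.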
3 EXISTING SYSTEM 
We integrate our skin removal component into an existing clothing retrieval system (running on a commercial search engine), which we briefly describe in this section. Figure 3 shows a high-level overview of the system. In the coming subsections, we briefly describe the extracted features. In the following section, we describe our skin removal component and how it fits into this system.
The features generated for each image are contours, which capture shape, and a single RGB value, which captures the most dominant color. The image indexing and retrieval system is based upon the Edgel index (Cao et al., 2011).
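The paper does not specify how the single dominant RGB value is computed; one simple way to obtain such a value is coarse histogram voting, sketched below (the bin count and the voting scheme are illustrative assumptions, not the system's actual method).

```python
import numpy as np

def dominant_rgb(pixels, bins=8):
    """Approximate the most dominant color of an image.

    pixels: (N, 3) uint8 array of RGB values.
    Each channel is quantized into `bins` levels, the most
    populated color bin wins, and the mean color of its
    pixels is returned as the single representative RGB value.
    """
    # Quantize each channel into `bins` levels (bin width 256/bins)
    q = (pixels.astype(np.int64) * bins) // 256
    # Flatten the 3-D bin index into one key per pixel
    keys = q[:, 0] * bins * bins + q[:, 1] * bins + q[:, 2]
    vals, counts = np.unique(keys, return_counts=True)
    top = vals[np.argmax(counts)]
    # Return the mean color of pixels falling in the winning bin
    return pixels[keys == top].mean(axis=0)
```

Removing skin pixels before such a vote changes which bin wins, which is why skin removal can affect the salient color feature.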
3.1 Visual Representation 
When a query image is submitted, a list of candidate similar edges is retrieved from an inverted index (Sivic and Zisserman, 2003). This list is ranked by a composite score of edge similarity, salient color similarity, and textual description similarity. Our interest is in improving the edge similarity score by removing unwanted edges, thereby improving this metric's semantic quality. By removing such edges, we also potentially affect the extracted salient color, as motivated in Figure 2.
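The exact combination of the three similarity signals is not given in the text; a weighted linear sum is one common choice, sketched below. The weight values and the candidate tuples are purely illustrative.

```python
def composite_score(edge_sim, color_sim, text_sim,
                    w_edge=0.5, w_color=0.3, w_text=0.2):
    """Combine the three similarity signals into one ranking score.

    A weighted linear sum with hypothetical weights; the actual
    system's combination function and weights are not published.
    """
    return w_edge * edge_sim + w_color * color_sim + w_text * text_sim

# Hypothetical candidates: (id, edge_sim, color_sim, text_sim)
candidates = [
    ("img_a", 0.9, 0.4, 0.2),
    ("img_b", 0.6, 0.9, 0.8),
]
ranked = sorted(candidates,
                key=lambda c: composite_score(*c[1:]),
                reverse=True)
```

Under such a scheme, cleaning up the edge similarity term directly shifts the final ranking, which is the improvement the skin removal component targets.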
3.1.1 Image Pre-processing 
To reduce computation and storage costs while preserving information, the image is first downsized to a maximum dimension of 200 pixels (Cao et al., 2011). The downsized image is then segmented using GraphCut (Felzenszwalb and Huttenlocher, 2004). The output is a segmented image where each
VISAPP 2013 - International Conference on Computer Vision Theory and Applications