
detection accuracy may be improved. A promising 
candidate is SLIC, which gives good segmentation 
quality (Achanta et al., 2010). This is a subject for 
future work. 
ACKNOWLEDGEMENTS 
This work was partially supported by KAKENHI 
Grant Number 24700178. 
REFERENCES 
Russakovsky, O., Lin, Y., Yu, K. and Fei-Fei, L., 2012. 
Object-centric spatial pooling for image classification, 
European Conference on Computer Vision. 
Felzenszwalb, P., Girshick, R., McAllester, D. and 
Ramanan, D., 2010. Object Detection with Discriminatively 
Trained Part Based Models, IEEE Transactions on 
Pattern Analysis and Machine Intelligence, Vol. 32, 
No. 9, Sep. 
Csurka, G., Dance, C., Fan, L., Willamowski, J. and Bray, 
C., 2004. Visual Categorization with Bags of 
Keypoints, Proc. of ECCV Workshop on Statistical 
Learning in Computer Vision, pp. 59–74. 
Arandjelović, R. and Zisserman, A., 2012. Three things 
everyone should know to improve object retrieval, In 
IEEE Conference on Computer Vision and Pattern 
Recognition, pp. 2911-2918. 
Discriminatively trained deformable part models. 
http://cs.brown.edu/~pff/latent-release4/ 
Lazebnik, S., Schmid, C. and Ponce, J., 2006. Beyond 
Bags of Features: Spatial Pyramid Matching for 
Recognizing Natural Scene Categories, In IEEE 
Conference on Computer Vision and Pattern 
Recognition, pp. 2169-2178. 
Fan, R., Chang, K., Hsieh, C., Wang, X. and Lin, C., 2008. 
LIBLINEAR: A library for large linear classification, 
Journal of Machine Learning Research 9, pp. 1871-1874. 
Yao, B., Jiang, X., Khosla, A., Lin, A.L., Guibas, L.J. and 
Fei-Fei, L., 2011. Human Action Recognition by 
Learning Bases of Action Attributes and Parts, 
International Conference on Computer Vision. 
INRIA Person Dataset. 
http://pascal.inrialpes.fr/data/human/ 
Vedaldi, A. and Zisserman, A., 2010. Efficient Additive 
Kernels via Explicit Feature Maps, In IEEE 
Conference on Computer Vision and Pattern 
Recognition, Vol. 34, No. 3, pp. 480-492. 
Tani, Y. and Hotta, K., 2014. Robust Human Detection to 
Pose and Occlusion Using Bag-of-Words, 
International Conference on Pattern Recognition, pp. 
4376-4381. 
Rother, C., Kolmogorov, V., and Blake, A., 2004. 
GrabCut: Interactive foreground extraction using 
iterated graph cuts, The ACM Special Interest Group 
on Computer Graphics, Vol. 23, pp. 309-314. 
Achanta, R., Shaji, A., Smith, K., Lucchi, A., Fua, P. and 
Susstrunk, S., 2010. SLIC superpixels, Technical 
report, EPFL. 
Weijer, J. and Schmid, C., 2007. Applying Color Names to 
Image Description, In IEEE Conference on Computer 
Vision and Pattern Recognition, Vol. 3, pp. 493-496. 
Gavves, E., Fernando, B., Snoek, C.G.M., Smeulders, 
A.W.M. and Tuytelaars, T., 2013. Fine-Grained 
Categorization by Alignments, In IEEE International 
Conference on Computer Vision. 