Compression Techniques for Deep Fisher Vectors
Sarah Ahmed and Tayyaba Azim
Center of Excellence in IT, Institute of Management Sciences, Peshawar, Pakistan
ssarahahmedd@gmail.com, tayyaba.azim@imsciences.edu.pk
Keywords:
Fisher Vectors (FV), Restricted Boltzmann Machine (RBM), k-Nearest Neighbor (k-nn), Principal Component
Analysis (PCA), Parametric t-SNE, Spectral Hashing.
Abstract:
This paper investigates the use of efficient compression techniques for Fisher vectors derived from deep architectures such as the restricted Boltzmann machine (RBM). Fisher representations have recently created a surge of interest by proving their worth for large scale object recognition and retrieval problems, owing to intrinsic properties that distinguish them from conventional bag of visual words (BoW) features; however, they suffer from the problem of large dimensionality. This paper provides empirical evidence, along with visualisations, to explore which feature normalisation and state-of-the-art compression techniques are well suited for deep Fisher vectors, making them amenable to large scale visual retrieval with a reduced memory footprint. We further show that the compressed Fisher vectors give impressive classification results even with inexpensive classifiers such as the k-nearest neighbour.
1 INTRODUCTION
Large scale image classification and retrieval have received increasing attention over the last decade due to the large amount of multimedia data available on the web and the growing need to mine information of interest from these large image repositories. While on one end we have witnessed improvements in hardware to efficiently store and process such massively growing data sets, efforts have also been made at the algorithmic level to come up with speedy retrieval techniques that are human-competitive in perception and image understanding tasks.
These algorithms rely specifically on how the images
are represented semantically in a feature space that
makes them discriminant as well as retrievable for
later use. In this regard, one of the most popular approaches to represent images through mid-level features is the bag of visual words (BoW) approach (Csurka et al., 2004), which converts the visual vocabulary built in a low-level feature space into intermediate representations of fixed size used to train a non-linear classifier such as a support vector machine (SVM); it has consistently been shown to outperform other methods in successive PASCAL VOC evaluations (Everingham et al., 2015). However, an important limitation of this approach lies in its inability to scale to large amounts of training data. The computational cost of non-linear SVMs is between O(N^2) and O(N^3), thus making them unattractive for large scale image classification and retrieval problems. Attempts have been made to reduce the computational cost incurred in training non-linear SVMs by either changing the classifier or choosing a better encoding scheme (Farquhar et al., 2005), (Perronnin et al., 2006), (Boureau et al., 2010), (Wang et al., 2010), (Lazebnik et al., 2006) that can perform well even with a linear classifier.
The Fisher kernel (FK) framework, introduced by Jaakkola and Haussler (Jaakkola and Haussler, 1998) and applied by Perronnin and Dance (Perronnin and Dance, 2007) to the image classification task, is an extension of the bag of visual words (BoW) approach and is explained in detail in Section 2. The FK representation overcomes the limitations of the BoW approach (Perronnin et al., 2010) and has yielded competitive results for large scale image classification and retrieval tasks (Chatfield et al., 2011), (Perronnin and Larlus, 2015). It combines the benefits of generative and discriminative approaches to pattern classification by deriving a kernel from a generative probability model of the data. Another prominent feature of Fisher vectors is that they perform very well even with a simple linear classifier trained with techniques such as stochastic gradient descent. However, these recommended Fisher features have high dimensionality and, in combination with a large number of examples, can pose computational and storage constraints (Sanchez et al., 2013). This problem has been tackled either by using standard compression techniques (Sanchez et al., 2013) or through feature selection methods (Zhang
et al., 2014) that reduce the signature length of each
image to acquire less storage and quick retrieval re-
sults.
This paper takes into account a different class of Fisher vectors, derived from a deep stochastic model, i.e. the restricted Boltzmann machine (RBM) (Azim and Niranjan, 2013), and analyses efficient ways of compressing their dimensionality with minimal loss in classification performance. The dimensionality of the Fisher vectors derived from deep models has an intrinsic relationship with the number of hidden units of the model: the length of the encoded Fisher vector grows as the number of hidden units increases. See Table 1 for an illustration of the impact of the model's architecture on the Fisher vector's length. Our goal is to reduce the Fisher feature dimensionality and hence the storage cost and computational load of the retrieval algorithm. Our contributions are as follows: a) Analysing a different class of Fisher vectors for large scale image classification, b) Sparsity analysis of the Fisher vectors derived from the RBM, c) Demonstrating visualisations of compressed Fisher vectors obtained through off-the-shelf compression techniques, d) Highlighting the type of feature normalisation scheme required to achieve better Fisher score density and classification performance.
Table 1: Growth of the Fisher vector's length on the MNIST data set, where the images have dimensionality 28 × 28. The length is l = |∇_W log P(x_n|θ)| = |v| × |h| when only the gradients with respect to the weights W between visible and hidden units are considered.

RBM Hidden Units    1     10      100      1000      10000
FV's Length         784   7840    78400    784000    7840000
2 THE FISHER KERNEL FRAMEWORK

The Fisher kernel framework introduced by Jaakkola and Haussler (Jaakkola and Haussler, 1998) proposes to use a generative probability model, P(x|θ), to derive a kernel by computing Fisher scores as gradients of the log likelihood of the data, x, with respect to the model parameters, θ. The derived kernel function thus takes the form:

K(x_i, x_j) = φ_{x_i}^T U^{-1} φ_{x_j},   (1)
where U is the covariance matrix of the Fisher scores, φ_x, and is regarded as the Fisher information matrix. The computation of the Fisher information matrix is generally considered immaterial (Jaakkola and Haussler, 1998) and is often ignored in practice by replacing it with an identity matrix, I. However, some of the literature on classification systems has reported good discrimination results by using approximations of the information matrix in kernel computation (Maaten, 2011). Examples of such approximations include restricted forms of the covariance matrix, such as a diagonal covariance matrix (U = diag(σ^2)) or an isotropic Gaussian (U = σ^2 I). The Fisher kernel, once derived from a generative probability model P(x|θ), can be embedded into any discriminative classifier such as support vector machines (SVM), linear discriminant analysis (LDA), neural networks, etc.
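To make the role of U concrete, the following minimal Python/NumPy sketch computes the kernel of Equation 1 under the two approximations mentioned above, U = I and U = diag(σ^2); it assumes the Fisher scores have already been computed and stacked row-wise, and the function and variable names are illustrative rather than taken from any reference implementation.

    import numpy as np

    def fisher_kernel(scores_a, scores_b, approx="identity", eps=1e-8):
        """Fisher kernel K(x_i, x_j) = phi_i^T U^{-1} phi_j (Equation 1).

        scores_a: (n_a, d) matrix of Fisher scores, scores_b: (n_b, d) matrix.
        approx: 'identity' uses U = I; 'diagonal' uses U = diag(sigma^2),
        with sigma^2 estimated per dimension from scores_a.
        """
        if approx == "identity":
            return scores_a @ scores_b.T
        if approx == "diagonal":
            sigma2 = scores_a.var(axis=0) + eps   # diagonal of U
            return (scores_a / sigma2) @ scores_b.T
        raise ValueError("approx must be 'identity' or 'diagonal'")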
Figure 1: Histograms of the Fisher scores before and after the application of min-max feature normalisation: (a) before normalisation, MNIST Fisher vectors derived from an RBM with 1 hidden unit; (b) after normalisation, RBM with 1 hidden unit; (c) before normalisation, RBM with 5 hidden units; (d) after normalisation, RBM with 5 hidden units.

In this work, we have taken a restricted Boltzmann machine (RBM) (Hinton, 2002) to derive Fisher scores. A restricted Boltzmann machine is a bipartite graph in which the visible units that represent observations are connected to binary stochastic hidden units through undirected weight connections. The hidden units are used to discover useful features or patterns from the data fed to the visible layer during training. The probability of a joint configuration over both visible and hidden units depends on the energy of that joint configuration compared with the energy of all other joint configurations:
P(v, h; θ) = (1/Z(θ)) exp(−E(v, h; θ)),   (2)

where the partition function, Z(θ), is given as:

Z(θ) = Σ_{v,h} exp(−E(v, h; θ)).
The parameters of this energy-based model are learnt by performing stochastic gradient descent on the empirical negative log-likelihood of the training data. A guide to initialising and optimising these parameters, θ = {W, a, b}, is given by Hinton (Hinton, 2010).
The Fisher scores, φ_x, derived from the gradients of the log likelihood of the data, x, with respect to the RBM model parameters, θ = {W, a, b}, are given below:

∇_θ log P(x_n|θ) = [ S^[n] | Q^[n] | Z^[n] ],   where   (3)

S^[n] = ∇_W log P(x_n|θ) = ⟨v h^T⟩_{P_data} − ⟨v h^T⟩_{P_model},   (4)

Q^[n] = ∇_a log P(x_n|θ) = ⟨h⟩_{P_data} − ⟨h⟩_{P_model},   (5)

Z^[n] = ∇_b log P(x_n|θ) = ⟨v⟩_{P_data} − ⟨v⟩_{P_model}.   (6)
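The following is a minimal sketch, in Python/NumPy, of how the per-image Fisher score of Equation 4 can be obtained from a trained binary RBM. It assumes that the model expectation ⟨v h^T⟩_{P_model} is approximated with a single Gibbs step from the data (CD-1 style); the function and variable names are illustrative and not taken from the paper's implementation.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def fisher_score_W(x, W, a, b):
        """Fisher score S = grad_W log P(x|theta) for one visible vector x (Equation 4).

        W: (n_visible, n_hidden) weights, a: (n_hidden,) hidden biases,
        b: (n_visible,) visible biases.
        """
        # Data-dependent statistics <v h^T>_data with h = P(h = 1 | v = x)
        h_data = sigmoid(x @ W + a)
        pos = np.outer(x, h_data)

        # One Gibbs step to approximate the model statistics <v h^T>_model
        h_sample = (np.random.rand(*h_data.shape) < h_data).astype(float)
        v_model = sigmoid(h_sample @ W.T + b)
        h_model = sigmoid(v_model @ W + a)
        neg = np.outer(v_model, h_model)

        # Flatten to a |v| x |h| dimensional Fisher vector (cf. Table 1)
        return (pos - neg).ravel()

For an RBM with 784 visible units and a single hidden unit this produces the 784-dimensional Fisher vectors used in the experiments below.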
3 THE FISHER VECTOR
NORMALISATION
In this section, we describe the normalisation scheme required for achieving competitive classification results on deep Fisher vectors with a simple classifier. We have applied the Min-Max normalisation technique (Jayalakshmi and Santhakumaran, 2011) to the derived Fisher vectors. This method re-scales the features to the range [0, 1]. If x is an n-dimensional feature vector, the Min-Max normalisation is computed using the following linear interpolation formula:

x' = (x_i − x_min) / (x_max − x_min),   (7)
where x_min and x_max represent the minimum and maximum values, respectively, across all dimensions of each image vector, x. The normalised Fisher vector, x', has the same dimensionality as the original Fisher vector, x. Min-Max normalisation has the advantage of preserving exactly all relationships in the data. Results of the normalisation scheme can be seen in Figure 1.
Conventionally, the recommended Fisher vectors for large scale image retrieval are derived from a Gaussian mixture model (GMM) and deploy an L2-normalisation scheme to improve their classification performance (Perronnin et al., 2010). We tried the L2 and L1 normalisation techniques on the deep Fisher vectors but did not get as much improvement in discrimination performance as from the Min-Max normalisation scheme.
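As a concrete illustration, the per-vector Min-Max scaling of Equation 7 can be implemented in a few lines; this is a sketch assuming NumPy, with the Fisher vectors stacked row-wise in a matrix.

    import numpy as np

    def min_max_normalise(F, eps=1e-12):
        """Rescale each Fisher vector (each row of F) to the range [0, 1] (Equation 7)."""
        f_min = F.min(axis=1, keepdims=True)   # per-image minimum over all dimensions
        f_max = F.max(axis=1, keepdims=True)   # per-image maximum over all dimensions
        return (F - f_min) / (f_max - f_min + eps)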
4 COMPRESSION TECHNIQUES
In this section, we discuss the following off-the-shelf compression techniques used to reduce the dimensions of the normalised Fisher vectors: principal component analysis (PCA), spectral hashing (SH), the autoencoder and parametric t-SNE.
4.1 Principal Component Analysis
(PCA)
Principal component analysis (PCA) (Pearson, 1901) is a linear dimensionality reduction technique that transforms a d-dimensional data set into a k-dimensional subspace (k < d) such that most of the information in the data is retained. The transformed subspace is spanned by uncorrelated orthogonal directions along which the data has maximum variance; these directions are the principal components, or eigenvectors, of the data covariance matrix. Each eigenvector is associated with an eigenvalue, which can be interpreted as the 'length' or 'magnitude' of the corresponding eigenvector. If some eigenvalues have a significantly larger magnitude than others, the reduction of the data set onto a smaller dimensional subspace is performed by dropping the 'less informative' eigenpairs.
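For reference, a minimal way to obtain PCA-compressed Fisher vectors such as those used in our comparison is via scikit-learn; this sketch assumes the normalised Fisher vectors are stacked row-wise, and the target dimensions (2, 10, 20) match those reported in the tables below.

    from sklearn.decomposition import PCA

    def compress_pca(train_fv, test_fv, n_components=20):
        """Project Fisher vectors onto the top principal components."""
        pca = PCA(n_components=n_components)
        train_low = pca.fit_transform(train_fv)   # learn the components on training FVs
        test_low = pca.transform(test_fv)         # apply the same projection to test FVs
        return train_low, test_low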
4.2 Spectral Hashing (SH)
Spectral hashing (Weiss et al., 2009) is a non-linear
dimensionality reduction technique that minimises
the Hamming distance between similar pairs of binary
codes using a Gaussian kernel. The algorithm aims to
learn binary encodings of data in such a way that the
points distant in the Euclidean space are also distant in
the Hamming space and vice versa. Spectral hashing
calculates binary mappings by simply minimising the
sum of the Hamming distances between pairs of bi-
nary codes weighted by the Gaussian kernel between
the corresponding vectors. The compact binary code solution is eventually obtained by thresholding a subset of eigenvectors of the Laplacian of the similarity graph.
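Since the original algorithm (Weiss et al., 2009) computes out-of-sample codes through analytical eigenfunctions of the Laplacian, the following simplified, transductive Python/NumPy-SciPy sketch only illustrates the core idea stated above, thresholding eigenvectors of the Laplacian of a Gaussian similarity graph; it is a toy version under stated assumptions, suitable for small data only, since the full affinity matrix is formed explicitly.

    import numpy as np
    from scipy.linalg import eigh

    def spectral_hash_codes(X, n_bits=16, sigma=1.0):
        """Toy, transductive spectral hashing: threshold graph-Laplacian eigenvectors."""
        # Gaussian (RBF) affinity matrix over all pairs of points
        sq = np.sum(X ** 2, axis=1)
        d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
        W = np.exp(-d2 / (2.0 * sigma ** 2))
        np.fill_diagonal(W, 0.0)

        # Unnormalised graph Laplacian L = D - W
        L = np.diag(W.sum(axis=1)) - W

        # Smallest non-trivial eigenvectors of L (skip the constant one)
        _, vecs = eigh(L)
        codes = vecs[:, 1:n_bits + 1] > 0.0     # threshold at zero to obtain bits
        return codes.astype(np.uint8)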
4.3 Autoencoder
An autoencoder (Hinton and Salakhutdinov, 2006) is a multi-layer stochastic neural network that uses both encoding and decoding layers to yield non-linear projections of the data. The encoding layers of the network transform high dimensional data into a low dimensional space, while the decoding layers recover the data from its compressed form back into the original input dimensionality. The unsupervised learning algorithm trains both types of layers by minimising the disparity between the original data and its reconstructions, using the chain rule to calculate the error derivatives for back-propagation. In the RBMs used to pre-train the autoencoder, a joint configuration (v, h) of the visible and hidden units has the energy:
E(v, h) = − Σ_{i ∈ pixels} b_i v_i − Σ_{j ∈ features} b_j h_j − Σ_{i,j} v_i h_j w_{i,j},

where w corresponds to the weight connections between the visible and hidden units, and b_i, b_j correspond to the biases of the visible and hidden units of the network.
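As an illustration of this encode-decode structure, the sketch below defines and trains a small fully-connected autoencoder on (min-max normalised) Fisher vectors. It is written assuming PyTorch; the layer sizes and optimiser are illustrative rather than the exact configuration used in our experiments, and it trains end-to-end with back-propagation without the RBM pre-training stage described above.

    import torch
    import torch.nn as nn

    class FisherAutoencoder(nn.Module):
        """Small fully-connected autoencoder for compressing Fisher vectors."""
        def __init__(self, in_dim=784, code_dim=20):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Linear(in_dim, 256), nn.Sigmoid(),
                nn.Linear(256, code_dim))
            self.decoder = nn.Sequential(
                nn.Linear(code_dim, 256), nn.Sigmoid(),
                nn.Linear(256, in_dim), nn.Sigmoid())

        def forward(self, x):
            code = self.encoder(x)              # low dimensional projection
            return self.decoder(code), code     # reconstruction and code

    def train_autoencoder(model, loader, epochs=10, lr=1e-3):
        """Minimise the reconstruction error between inputs and their reconstructions."""
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        loss_fn = nn.MSELoss()
        for _ in range(epochs):
            for (x,) in loader:                 # loader yields batches of Fisher vectors
                recon, _ = model(x)
                loss = loss_fn(recon, x)
                opt.zero_grad()
                loss.backward()
                opt.step()
        return model

The sigmoid output layer assumes inputs scaled to [0, 1], which is exactly what the Min-Max normalisation of Section 3 provides.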
4.4 Parametric t-SNE
Parametric t-SNE (Maaten, 2009) is an unsupervised dimensionality reduction technique that preserves the local structure of the data in a low dimensional space by learning a parametric mapping, f: X → Y, realised by a feed-forward neural network with weights W, between the high dimensional space X and the low dimensional space Y. The training procedure of parametric t-SNE is based on the following three steps: (1) training RBMs, (2) constructing a pre-trained neural network from the stack of RBMs, and (3) fine tuning the neural network using back-propagation such that the cost function is minimised. The cost function of this network is a Kullback-Leibler divergence between the probabilities reflecting the pairwise distances of the data in the input space and in the latent space.
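To make the fine-tuning objective explicit, the sketch below evaluates this Kullback-Leibler cost in Python/NumPy for a batch of high dimensional points and their low dimensional network outputs. It is a simplification under stated assumptions: a single global Gaussian bandwidth is used instead of the per-point, perplexity-calibrated conditionals of the original method, and the Student-t distribution in the latent space has one degree of freedom.

    import numpy as np

    def pairwise_sq_dists(Z):
        """Matrix of squared Euclidean distances between all rows of Z."""
        s = np.sum(Z ** 2, axis=1)
        return s[:, None] + s[None, :] - 2.0 * Z @ Z.T

    def tsne_kl_cost(X_high, Y_low, sigma=1.0, eps=1e-12):
        """KL(P || Q): Gaussian affinities in input space vs. Student-t affinities in latent space."""
        # P: Gaussian affinities in the high dimensional input space
        P = np.exp(-pairwise_sq_dists(X_high) / (2.0 * sigma ** 2))
        np.fill_diagonal(P, 0.0)
        P /= P.sum()
        # Q: heavy-tailed Student-t (1 d.o.f.) affinities in the latent space
        Q = 1.0 / (1.0 + pairwise_sq_dists(Y_low))
        np.fill_diagonal(Q, 0.0)
        Q /= Q.sum()
        return np.sum(P * np.log((P + eps) / (Q + eps)))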
5 EXPERIMENTS
In order to evaluate the classification performance of compressed Fisher vectors, we applied four different compression schemes to the Fisher vectors and calculated their accuracies with a simple classifier, the k-nearest neighbour. The data set we have taken for exploration is MNIST. The MNIST data set consists of 28 × 28 dimensional images, with 60,000 digits in the training set and 10,000 digits in the test set. These images are vectorised to form 784-dimensional vectors fed to the RBM's visible layer for training.
5.1 Experimental Setup
To start with, we have performed classification experiments on Fisher vectors derived from a compact RBM with 1 and 5 hidden units. The FVs could also have been derived from a much larger, though still shallow, model containing thousands of hidden units, as reported in (Azim and Niranjan, 2013); however, in that case the dimensionality of the Fisher vectors scales to a magnitude of 10^6 and the model tends to over-fit, resulting in no improvement in classification performance. We therefore constrained our compression experiments to Fisher vectors derived from a small RBM that has been shown to give the best performance on MNIST. The Fisher vectors derived from the RBM with 1 hidden unit have dimensionality 784. Similarly, the Fisher vectors derived from the RBM with 5 hidden units have dimensionality 3920. Please note that we have skipped computing Fisher scores using Equation 5 and Equation 6, because these gradients were not found to improve the classification accuracy of the system. Consequently, we only used the weight gradients (Equation 4) to compute the Fisher scores. We compress these two kinds of Fisher vectors using standard techniques such as PCA, spectral hashing, the autoencoder and parametric t-SNE. The performance of these compression techniques is evaluated by plotting two-dimensional visualisations and by measuring the overall classification performance with a k-nearest neighbour (k-nn) classifier.
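For completeness, the evaluation step can be reproduced with scikit-learn as in the sketch below; the value of k is illustrative, since the exact neighbourhood size is a tunable choice.

    from sklearn.neighbors import KNeighborsClassifier

    def knn_accuracy(train_low, y_train, test_low, y_test, k=5):
        """Classification accuracy of k-NN on compressed Fisher vectors."""
        knn = KNeighborsClassifier(n_neighbors=k)   # Euclidean distance by default
        knn.fit(train_low, y_train)
        return knn.score(test_low, y_test)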
Figure 2: Visualisation of compressed Fisher scores derived from the RBM with 1 hidden unit: (a) parametric t-SNE, (b) autoencoder, (c) PCA.

Figure 3: Visualisation of compressed Fisher scores derived from the RBM with 5 hidden units: (a) parametric t-SNE, (b) autoencoder, (c) PCA.
6 RESULTS AND DISCUSSION
In Figures 2 and 3, we present the visualisations of Fisher vectors compressed by PCA, the autoencoder and parametric t-SNE. We have not shown the visualisation results of spectral hashing because it produces binary codes that cannot be directly compared with the remaining visualisation schemes.
Table 2: Accuracy of the k-nn classifier on un-normalised Fisher scores derived from the 1 hidden unit RBM on the MNIST data set.

Compression Techniques     2D            10D           20D            k-nn
Parametric t-SNE           0.10%         0.10%         0.32%          0.960%
Autoencoder                0.098%        0.098%        0.098%
PCA                        0.099%        0.18%         0.14%
Spectral Hashing           0.19%         0.099%        0.10%
                           (nbits=16)    (nbits=80)    (nbits=160)

Table 3: Accuracy of the k-nn classifier on normalised Fisher scores derived from the 1 hidden unit RBM on the MNIST data set.

Compression Techniques     2D            10D           20D            k-nn
Parametric t-SNE           0.86%         0.94%         0.942%         0.960%
Autoencoder                0.78%         0.89%         0.96%
PCA                        0.27%         0.67%         0.68%
Spectral Hashing           0.66%         0.50%         0.44%
                           (nbits=16)    (nbits=80)    (nbits=160)

Table 4: Accuracy of the k-nn classifier on normalised Fisher scores derived from the 5 hidden unit RBM on the MNIST data set.

Compression Techniques     2D            10D           20D            k-nn
Parametric t-SNE           0.83%         0.93%         0.932%         0.963%
Autoencoder                0.75%         0.94%         0.96%
PCA                        0.26%         0.68%         0.70%
Spectral Hashing           0.67%         0.43%         0.38%
                           (nbits=16)    (nbits=80)    (nbits=160)
The visualisations shown are constructed by compressing the Fisher vectors obtained from the test images into two dimensions. In Table 2, we show the accuracy of the k-nearest neighbour classifier on un-normalised Fisher vectors. Tables 3 and 4 report the accuracy of the k-nn classifier on Min-Max normalised Fisher vectors derived from the RBM with 1 and 5 hidden units, respectively. Our results reveal that Min-Max normalisation has a huge impact on the classification accuracy achieved by the compact Fisher features.
From the deep Fisher vector classification results, we observe that feature scaling matters greatly when using the Euclidean distance metric, as it is sensitive to differences in the magnitude or scale of the attributes. Since k-nn relies on the Euclidean metric, it is important to normalise the derived Fisher features so that every feature contributes equally, no attribute outweighs the others, and the effect of outliers is suppressed. Table 2 also gives the insight that if the data is not normalised to the range [0, 1], the sigmoid activation functions used in the autoencoder and parametric t-SNE saturate, so their gradients vanish and the networks train very poorly or not at all. This ultimately leads to the poor classification results shown in Table 2. The accuracies of the k-nn classifier in Tables 3 and 4 show that if all the features are on the same scale, there is no such knock-on effect on the parameter learning of the autoencoder and parametric t-SNE.
Our visualisations in Figures 2 and 3 show that PCA does not preserve the significant structure of the data in the low dimensional space. We therefore conclude the following for PCA: 1) for very high dimensional Fisher vectors (say of magnitude 10^6), the computation of the eigenvectors becomes infeasible, as the covariance matrix itself is difficult to compute; 2) PCA is not scale-invariant and mainly focuses on preserving large pairwise distances rather than the small pairwise distances, which are important too. In contrast to PCA, the other compression techniques, the autoencoder and parametric t-SNE, show much more discriminative visualisations and thus also yield better classification accuracies. The best compression performance on deep Fisher vectors is shown by parametric t-SNE, as it appropriately preserves the local structure of the data in the low dimensional subspace using a heavy-tailed Student's t-distribution. Its visualisations arrange the Fisher scores in distinct clusters reflecting their membership of the different digit classes. After parametric t-SNE, the second best classification and visualisation performance is shown by the autoencoder. When using spectral hashing (SH), we observed that the performance of the nearest neighbour classifier decreases as the number of code bits increases. This is because, in spectral hashing, the Euclidean distance is inversely related to the Hamming affinity; thus, as the number of bits approaches infinity, spectral hashing does not guarantee to faithfully reproduce the affinities between the data points.
7 CONCLUSION & FUTURE
WORK
In this paper, we have applied different compression techniques to Fisher vectors derived from a restricted Boltzmann machine to make them amenable to large scale retrieval problems. We explored four different dimensionality reduction techniques for Fisher vectors: PCA, spectral hashing, parametric t-SNE and the autoencoder, and found that parametric t-SNE outperforms all the other techniques on high dimensional Fisher vectors. Moreover, the Min-Max normalisation scheme improves the accuracy of the classifier in Euclidean space.

In the future, we will extend our experiments to other large scale data sets such as PASCAL-VOC (Everingham et al., 2015) and ImageNet (Russakovsky et al., 2015) and test the classification performance of compressed Fisher scores with other competitive classifiers such as support vector machines (SVM). In addition, we shall also explore whether feature selection methods are more apt than feature compression schemes for reducing the dimensionality of deep Fisher vectors for retrieval tasks.
ACKNOWLEDGEMENT
This research was supported by the Higher Education Commission of Pakistan (SRGP: 21-402) and by NVIDIA (Ref.: 281400), with the valuable donation of a Titan-X graphics card.
REFERENCES
Azim, T. and Niranjan, M. (2013). Inducing Discrimina-
tion in Biologically Inspired Models of Visual Scene
Recognition. In IEEE International Workshop on Ma-
chine Learning for Signal Processing (MLSP).
Boureau, Y., Bach, F., LeCun, Y., and Ponce, J. (2010).
Learning Mid-level Features for Recognition. In IEEE
Computer Society Conference on Computer Vision
and Pattern Recognition.
Chatfield, K., Lempitsky, V., Vedaldi, A., and Zisserman, A.
(2011). The Devil is in the Details: An Evaluation of
Recent Feature Encoding Methods. In Proceedings of
the British Machine Vision Conference. BMVA Press.
Csurka, G., Dance, C., Fan, L., Willamowski, J., and Bray,
C. (2004). Visual Categorization with Bags of Key-
points. In Workshop on Statistical Learning in Com-
puter Vision, ECCV. Prague.
Everingham, M., Eslami, S., Van Gool, L., Williams, C.,
Winn, J., and Zisserman, A. (2015). The Pascal Visual
Object Classes Challenge: A Retrospective. Interna-
tional Journal of Computer Vision, 111(1):98–136.
Farquhar, J., Szedmak, S., Meng, H., and Shawe-Taylor, J.
(2005). Improving Bag-of-Keypoints Image Categori-
sation: Generative Models and PDF-Kernels.
Hinton, G. (2002). Training Products of Experts by Mini-
mizing Contrastive Divergence. Neural Computation,
14:1771–1800.
Hinton, G. (2010). A Practical Guide to Training Restricted
Boltzmann Machines. Technical report.
Hinton, G. and Salakhutdinov, R. (2006). Reducing the Di-
mensionality of Data with Neural Networks. Science.
Jaakkola, T. and Haussler, D. (1998). Exploiting Genera-
tive Models in Discriminative Classifiers. In Advances
in Neural Information Processing Systems 11, pages
487–493. MIT Press.
Jayalakshmi, T. and Santhakumaran, A. (2011). Statistical
Normalization and Back Propagation for Classifica-
tion. International Journal of Computer Theory and
Engineering.
Lazebnik, S., Schmid, C., and Ponce, J. (2006). Beyond
Bags of Features: Spatial Pyramid Matching for Rec-
ognizing Natural Scene Categories. In IEEE Com-
puter Society Conference on Computer Vision and
Pattern Recognition (CVPR’06). IEEE.
Maaten, L. (2009). Learning a Parametric Embedding by Preserving Local Structure. In Proceedings of the 12th International Conference on Artificial Intelligence and Statistics (AISTATS).
Maaten, L. (2011). Learning Discriminative Fisher Kernels.
In Getoor, L. and Scheffer, T., editors, Proceedings of
the 28th International Conference on Machine Learn-
ing, ICML 2011, Bellevue, Washington, USA, June 28
- July 2, 2011, pages 217–224. Omnipress.
Pearson, K. (1901). On Lines and Planes of Closest Fit to
Systems of Points in Space. Philosophical Magazine,
2:559–572.
Perronnin, F. and Dance, C. (2007). Fisher Kernels on Vi-
sual Vocabularies for Image Categorization. In IEEE
Conference on Computer Vision and Pattern Recogni-
tion. IEEE.
Perronnin, F., Dance, C., Csurka, G., and Bressan, M.
(2006). Adapted Vocabularies for Generic Visual Cat-
egorization. Springer Berlin Heidelberg.
Perronnin, F. and Larlus, D. (2015). Fisher Vectors Meet
Neural Networks: A Hybrid Classification Architec-
ture. In Proceedings of the IEEE Conference on Com-
puter Vision and Pattern Recognition. CVPR.
Perronnin, F., Sánchez, J., and Mensink, T. (2010). Improv-
ing the Fisher Kernel for Large-scale Image Classifi-
cation. In European Conference on Computer Vision.
Springer.
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh,
S., Ma, S., Huang, Z., Karpathy, A., Khosla, A.,
Bernstein, M., Berg, A., and Fei-Fei, L. (2015). Im-
ageNet Large Scale Visual Recognition Challenge.
International Journal of Computer Vision (IJCV),
115(3):211–252.
Sanchez, J., Perronnin, F., Mensink, T., and Verbeek, J.
(2013). Compressed Fisher Vectors for Large-scale
Image Classification. IJCV.
Wang, J., Yang, J., Yu, K., Lv, F., Huang, T., and Gong, Y. (2010). Locality-constrained Linear Coding for Image Classification. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE.
Weiss, Y., Torralba, A., and Fergus, R. (2009). Spectral
Hashing. In Advances in Neural Information Process-
ing Systems. NIPS.
Zhang, Y., Wu, J., and Cai, J. (2014). Compact Repre-
sentation for Image Classification: To Choose or to
Compress? In Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition. IEEE
Computer Society.