Cross-domain & In-domain Sentiment Analysis with Memory-based Deep Neural Networks∗

Gianluca Moro, Andrea Pagliarani, Roberto Pasolini and Claudio Sartori

Department of Computer Science and Engineering (DISI), University of Bologna,

Via Cesare Pavese, 47522 Cesena, Italy

Keywords:

Sentiment Classiﬁcation, Transfer Learning, Fine-tuning, Deep Learning, Big Data, Memory Networks.

Abstract:

Cross-domain sentiment classifiers aim to predict the polarity (i.e. the sentiment orientation) of target text documents by reusing a knowledge model learned from a different source domain. Distinct domains are typically heterogeneous in language, so transfer learning techniques are advisable to support knowledge transfer from source to target. Distributed word representations are able to capture hidden word relationships without supervision, even across domains. Deep neural networks with memory (MemDNN) have recently achieved state-of-the-art performance in several NLP tasks, including cross-domain sentiment classification of large-scale data. The contribution of this work is an extensive experimental evaluation of recent MemDNN architectures, namely the Gated Recurrent Unit (GRU) and the Differentiable Neural Computer (DNC), in both cross-domain and in-domain sentiment classification using GloVe word embeddings. As far as we know, only GRU neural networks have previously been applied to cross-domain sentiment classification. Sentiment classifiers based on these deep learning architectures are also assessed in terms of scalability and accuracy by gradually increasing the training set size, and by showing the effect of fine-tuning, an explicit transfer learning mechanism, on cross-domain tasks. This work shows that MemDNN-based classifiers improve the state of the art on the Amazon Reviews corpus for document-level cross-domain sentiment classification. On the same corpus, DNC outperforms previous approaches on a very large in-domain configuration in both binary and fine-grained document sentiment classification. Finally, DNC achieves accuracy comparable with state-of-the-art approaches on the Stanford Sentiment Treebank dataset in both binary and fine-grained single-sentence sentiment classification.

1 INTRODUCTION

Sentiment analysis deals with the computational treatment of opinions, appraisals, attitudes, and emotions toward entities, individuals, issues, events, topics and their attributes (a survey is in (Liu and Zhang, 2012)). The task is technically challenging but very useful in practice. For instance, companies want to know customer opinions about their products.

When an understanding is required of whether a plain text document has a positive, negative or neutral orientation, sentiment classification is involved. This supervised approach learns a model from a labelled training set of documents, then applies it to an unlabelled test set, whose polarity (e.g. positive, negative or neutral orientation) has to be found. The typical approach to sentiment classification assumes that both the training set and the test set deal with the same topic. For example, a model is learnt on a set of book reviews and applied to a distinct set of reviews, but always about books. This modus operandi, known as in-domain sentiment classification, generally yields the best performance, given that documents from the same domain are semantically similar. However, this approach is often inapplicable in practice, where documents are mostly unlabelled. Tweets, blogs, fora and comments on social networks may bear opinions, but no information is available on whether they are positive, negative or neutral. Document categorisation by human experts is the only way to deal with such a problem in order to learn an in-domain sentiment classifier, but it is infeasible with large text sets.

∗ This work was partially supported by the project "Toreador", funded by the European Union's Horizon 2020 research and innovation programme under grant agreement No 688797. Thanks to NVIDIA Corporation for the donated Titan GPU used in this work.

It would be advantageous if a model, once learnt on a source domain, could be used to classify document polarity in a distinct target domain. This approach, known as cross-domain sentiment classification, has become a hot research thread due to its practical implications. The biggest obstacle to learning an effective cross-domain sentiment classifier is the language heterogeneity in documents of different domains. For instance, a book could be described as interesting or boring, whereas an electrical appliance is more likely to be working or noisy. In such cases, transfer learning or knowledge transfer techniques may help solve the problem. Many transfer learning approaches have been attempted over the years, including the usage of multiple classifiers in (Aue and Gamon, 2005), measures of domain similarity in (Blitzer et al., 2007), feature and document alignment in (Pan et al., 2010; He et al., 2011; Zhang et al., 2015b; Domeniconi et al., 2015b; Domeniconi et al., 2015a; Bollegala et al., 2016), and knowledge bases in (Bollegala et al., 2013; Franco-Salvador et al., 2015). They are generally based on high-dimensional bag-of-words representations and often require heavy parameter tuning. Despite their good performance with small-scale data (e.g. hundreds or a few thousand instances), standard transfer learning approaches do not scale well with the number of features and are not the best choice with large-scale data.

Moro, G., Pagliarani, A., Pasolini, R. and Sartori, C.
Cross-domain & In-domain Sentiment Analysis with Memory-based Deep Neural Networks.
DOI: 10.5220/0007239101270138
In Proceedings of the 10th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2018) - Volume 1: KDIR, pages 127-138
ISBN: 978-989-758-330-8

The advent of deep learning has brought a more expressive way to encode text, named distributed representation (aka word vectors), as an alternative to bag-of-words. Bag-of-words loses the ordering of words and ignores their semantics. Distributed representation addresses these problems along with the curse of dimensionality, providing a low-dimensional representation (e.g. 300 features are often enough) wherein words are not mutually exclusive and feature configurations correspond to the variation seen in the observed data. The two main model families for learning word vectors are global factorization methods, such as latent semantic analysis (LSA) by (Deerwester et al., 1990), and local context window methods, such as the skip-gram and the continuous bag-of-words model by (Mikolov et al., 2013), paragraph vector by (Le and Mikolov, 2014), and others proposed by (Mnih and Kavukcuoglu, 2013; Levy and Goldberg, 2014). Methods from the former family leverage statistical information but perform poorly on the word analogy task, whereas those from the latter family are better on the analogy task but make poor use of the corpus statistics, since they train on separate local context windows instead of on global co-occurrence counts.

Other than choosing the best text encoding, another aspect that affects sentiment analysis tasks such as sentiment classification is how to deal with sequential inputs. Handling word order affects text comprehension and allows the detection of sentiment inversions in phrases or sentences. Recurrent nets are often the best choice for tasks that involve sequential inputs. They process an input sequence one element at a time, maintaining a state vector in their hidden units that implicitly contains information about the history of all past elements of the sequence. Recurrent nets are very powerful, but training them is problematic because the backpropagated gradients either explode or vanish over many time steps, as shown by (Bengio et al., 1994). This makes recurrent neural networks unable to learn long-range dependencies in text. The problem was addressed by the Long Short-Term Memory network (LSTM) by (Hochreiter and Schmidhuber, 1997), which introduced memory cells to store, load and forget relevant information. Recently, new memory-based neural network schemas have been proposed that achieved the state of the art in many tasks, including machine translation, graph tasks (e.g. graph traversal, shortest path, etc.), and question answering tasks. The rationale is to memorise essential information and use it to handle sequential events and perform complex reasoning on top of them. Sentiment analysis and classification typically require complicated relationships to be inferred, such as the detection of polarity shift and sarcasm. Furthermore, transitive reasoning over multiple sentences is sometimes needed to correctly identify the opinion holder, the target, or the sentiment itself.

The contribution of this work is to investigate, through extensive experiments, how well two recent memory-based deep neural network architectures (MemDNN) perform in cross-domain and in-domain sentiment classification: the Gated Recurrent Unit (GRU) by (Cho et al., 2014) and the Differentiable Neural Computer (DNC) by (Graves et al., 2016). We have also combined the two MemDNNs with Global Vectors (GloVe), proposed by (Pennington et al., 2014), in order to allow them to learn from distributed word representation, now a de facto standard representation in deep learning. As far as we know, only GRU has recently been applied by (Dai et al., 2017) to cross-domain sentiment classification, in combination with word embeddings for Chinese corpora. GloVe combines the advantages of the other two major model families in the literature for learning word vectors. The unsupervised information extracted by means of the GloVe model is an important first step to align heterogeneous domains. Binary and fine-grained (i.e. multi-class) sentiment classifiers have been constructed for both MemDNN architectures. Two benchmark datasets have been used for the experiments: the Amazon Reviews dataset (http://jmcauley.ucsd.edu/data/amazon/) for document sentiment classification,

KDIR 2018 - 10th International Conference on Knowledge Discovery and Information Retrieval


and Stanford Sentiment Treebank, introduced by (Socher et al., 2013), for single-sentence sentiment classification. In-domain and cross-domain document-level experiments have been carried out to assess how performance varies with the amount of labelled data available for training and validating the model. Results have been compared with those in (Domeniconi et al., 2017), where we developed solutions based on both paragraph vectors, a different text representation method, and other machine learning algorithms. There, Paragraph Vector by (Le and Mikolov, 2014), despite having no explicit transfer learning capability, has been shown to achieve cross-domain accuracy equivalent to a Markov Chain method developed ad hoc for cross-domain sentiment classification in (Domeniconi et al., 2015b). Moreover, the same Markov Chain approach has been outperformed by GRU with random initialisation of feature weights when large-scale labelled data are available for training and validating the model. To enhance the capability of the MemDNN sentiment classifiers in cross-domain tasks, fine-tuning is performed on a small set of labelled target instances. Fine-tuning, along with GloVe word representation and the ability of MemDNN to model relevant sequential information, aids inter-domain alignment and leads to outstanding cross-domain document classification results. The MemDNN-based classifiers have also been employed on very large data sets (e.g. millions of instances), assessing their document-level performance in an in-domain configuration. Binary and fine-grained experiments have been carried out. The outcome has been compared with several variants of Character-level Convolutional Neural Networks (CharCNN) proposed by (Zhang et al., 2015a). The DNC-based classifier outperforms the state of the art in both binary and fine-grained configurations, whereas GRU with GloVe feature weights initialisation achieves performance comparable with previous techniques. The experimented MemDNN methods can be applied to any text, whatever its length and structure. For this reason, single-sentence sentiment classification has also been performed, using Stanford Sentiment Treebank as the benchmark dataset. The accuracy of MemDNN techniques is comparable with state-of-the-art methods in both binary and fine-grained settings.

2 RELATED WORK

This work encompasses many research threads, including sentiment classification, cross-domain and transfer learning, and deep learning. Relevant research advances are reviewed in this Section, and other methods are mentioned throughout the paper.

2.1 Sentiment Classiﬁcation

Sentiment classification consists in labelling a plain text based on its polarity (i.e. sentiment orientation). This task is much more difficult than text classification by topic, because some form of discourse analysis is necessary. (Pang et al., 2002) pointed out that the phenomenon of thwarted expectations narrative is common in documents, where an opinion holder sets up a deliberate contrast to earlier discussion. For instance, "This film should be brilliant. It sounds like a great plot, the actors are first grade, and the supporting cast is good as well, and Stallone is attempting to deliver a good performance. However, it can't hold up". (Turney, 2002) made a similar point, stating that for reviews the whole is not necessarily the sum of the parts. The previous example makes this evident: humans can easily understand the overall polarity, but it is much harder for a machine, unless it is able to perform discourse analysis and to detect the polarity shift.

To cope with the complexity of sentiment classification, several methods have been attempted. (Tan et al., 2008) and (Qiu et al., 2009) employed a dictionary containing words commonly used to express sentiment to label a portion of informative examples from a given domain, in order to reduce the labelling effort and to use the labelled documents as a training set for a supervised classifier. (Melville et al., 2009) exploited lexical information about associations between words and classes, and refined them for specific domains by means of training examples to enhance accuracy. Other works by (Deng et al., 2014; Wu and Gu, 2014; Domeniconi et al., 2016) proposed term weighting schemes to foster sentiment classification.

Cross-domain comes into play when the target domain lacks (or has few) labelled data for training a classifier with supervision. Transfer learning techniques are generally required to bridge the semantic gap due to language heterogeneity across domains. Two transfer learning modes have been identified by (Pan and Yang, 2010), namely, instance transfer and feature representation transfer. In order to bridge the inter-domain gap, the former adapts source instances to the target domain, whereas the latter maps source and target features into a different space. (Aue and Gamon, 2005) made some attempts to customize a classifier to a new target domain: training on a mixture of labelled data from other domains where such data is available, possibly considering just the features observed in the target domain; using multiple classifiers trained on labelled data from diverse domains; including a small amount of labelled data from the target. (Blitzer et al., 2007) proposed a domain similarity measure that fosters domain adaptation. (Pan et al., 2010) introduced a spectral feature alignment technique, where domain-independent terms help align domain-specific terms into the same clusters. These clusters form a latent space that improves the classification of the target domain. Apart from this, other algorithms have been proposed in (Zhang et al., 2015b; Domeniconi et al., 2015b; Domeniconi et al., 2015a) to transfer the polarity of features from the source domain to the target domain by using domain-independent features as a bridge. (He et al., 2011) modified the topic-word Dirichlet priors and extended the joint sentiment-topic model by adding prior word sentiment. Polarity-bearing topics have been used to perform feature and document expansion so as to align domains. (Bollegala et al., 2013) suggested the adoption of a thesaurus built from labelled data from the source domain and unlabelled data from both source and target domains. (Bollegala et al., 2016) modelled cross-domain sentiment classification as embedding learning, and found that a good objective function should capture geometric properties in the unlabelled documents of both source and target domains. These unsupervised properties are even more important than considering common features that occur in both domains and than setting label constraints on the source domain documents.

2.2 Deep Learning

The advent of deep learning has dramatically improved the state of the art in several research areas, such as speech processing and recognition, visual object detection, video, audio, and natural language processing, and many other domains like drug discovery and genomics, as pointed out by (LeCun et al., 2015). The first issue to face when analysing a plain text is how to deal with sequential data. This problem is even more essential to detect sentiment orientation, because of the presence of sarcasm, negations, and the phenomenon of thwarted expectations narrative. Bag-of-words text representation, where the presence (or the frequency) of terms in documents is encoded in a term-document matrix, is intrinsically unable to handle sequential inputs. Word ordering is lost and word semantics is ignored, since context is not taken into account. Another big issue of the bag-of-words model is dimensionality, because each term is a feature of the model, resulting in very sparse term-document matrices. Feature selection techniques attenuate the problem and let data be processed, but relevant information can be lost during this process.
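As an illustration of why bag-of-words cannot handle sequential inputs, the following minimal sketch builds a small frequency matrix (one row per document, one column per term) with the Python standard library only. The function name and the two toy sentences are assumptions made for this example and are not taken from the experiments.

```python
from collections import Counter

def term_document_matrix(documents):
    """Return (vocabulary, matrix) where matrix[d][t] counts term t in document d."""
    tokenised = [doc.lower().split() for doc in documents]
    vocabulary = sorted({term for tokens in tokenised for term in tokens})
    matrix = []
    for tokens in tokenised:
        counts = Counter(tokens)
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix

# Two toy sentences with opposite polarity but the same multiset of words.
docs = ["the plot is good not bad", "the plot is bad not good"]
vocab, matrix = term_document_matrix(docs)

# Both documents get the same row, so the polarity difference is invisible.
same_rows = matrix[0] == matrix[1]
```

With a realistic vocabulary, most entries of each row would be zero, which is the sparsity issue mentioned above.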

Alternative to the bag-of-words model are distributed text representations. Words are mapped to vectors, called word vectors, in a low-dimensional space whose features capture most of the variation observed in data. A feature in the newer space incorporates the characteristics of several features in the original space. (Mikolov et al., 2013) introduced the continuous bag-of-words (CBOW) model, a local context window method derived from the neural network language model by (Bengio et al., 2003). In CBOW, a projection layer is shared among words so that their vectors get projected (e.g. averaged) into the same position. The model is trained by building a log-linear classifier with k future and k history words as input, where the training criterion is to correctly predict the current word. In the same work, the skip-gram model is also proposed, where the current word is used as input to a log-linear classifier with continuous projection layer to predict words within a certain range before and after the current word itself. Following the same idea of word vectors, (Le and Mikolov, 2014) proposed an approach to learn paragraph vectors. Every paragraph is mapped into a unique vector, then averaged or concatenated with word vectors to predict the next word in a given window size (i.e. context). In spite of capturing semantic and syntactic regularities, local context window methods for distributed word representation typically fail in modelling global statistics and properties. (Pennington et al., 2014) advanced a global log-bilinear regression model to overcome this limitation. Their GloVe model utilises the benefits of count-based methods like LSA by (Deerwester et al., 1990), while simultaneously capturing the meaningful linear substructures prevalent in local context window methods.
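To make the difference between the two local context window methods concrete, the sketch below only generates the training examples each model would see; it does not train any classifier. The function names and the window handling at sentence boundaries are assumptions made for illustration.

```python
def skipgram_pairs(tokens, k):
    """Skip-gram: the current word is the input, each word within k positions is a target."""
    pairs = []
    for i, current in enumerate(tokens):
        lo, hi = max(0, i - k), min(len(tokens), i + k + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((current, tokens[j]))
    return pairs

def cbow_examples(tokens, k):
    """CBOW: up to k history and k future words are the input, the current word is the target."""
    examples = []
    for i, current in enumerate(tokens):
        context = tokens[max(0, i - k):i] + tokens[i + 1:i + k + 1]
        examples.append((context, current))
    return examples

tokens = "the actors are first grade".split()
pairs = skipgram_pairs(tokens, 1)
examples = cbow_examples(tokens, 1)
```

Both generators look only at a local window, which is why such methods do not directly exploit global co-occurrence counts.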

Along with the distributed word representation models, several deep learning architectures have been proposed that led to a dramatic improvement in sentiment classification. (Dos Santos and Gatti, 2014) proposed a deep convolutional neural network that jointly uses character-level, word-level and sentence-level representations to perform sentiment analysis of short texts. (Socher et al., 2013) introduced Recursive Neural Tensor Networks (RecNTN) for single-sentence sentiment classification. Its recursive structure makes RecNTN able to capture polarity shifts in sentences. The experiments have been carried out on Stanford Sentiment Treebank, which became a benchmark for single-sentence sentiment classification. It turned out that RecNTN improved the state of the art in both binary and fine-grained configurations. RecNTN has been outperformed by the Dynamic Memory Network (DMN) by (Kumar et al., 2016), which naturally captures position and temporality by processing input sequences and questions, forming episodic memories, and generating relevant answers. The memory and input modules of the original technique have been improved later by (Xiong et al., 2016). (Tang et al., 2015) introduced Gated Recurrent Neural Networks to learn vector-based document representation, showing that the underlying model outperforms the standard Recurrent Neural Networks in document modelling for sentiment classification. (Zhang and LeCun, 2015) applied temporal convolutional networks to large-scale data sets, showing that they can perform well without the knowledge of words or any other syntactic or semantic structures.

Despite the success of deep nets, little work has been done on transfer learning and cross-domain sentiment classification so far. The Stacked Denoising Autoencoder, introduced in (Vincent et al., 2010), was used by (Glorot et al., 2011) to extract domain-independent features without supervision that act as a bridge between heterogeneous domains. In (Domeniconi et al., 2017), we showed that labelled data from multiple domains encoded by means of paragraph vectors help transfer learning and cross-domain sentiment classification.

3 DEEP LEARNING ADVANCES

This section describes the main features of the deep learning advances combined in this work to tackle cross-domain sentiment classification.

3.1 Gated Recurrent Unit

Gated Recurrent Unit (GRU), proposed by (Cho et al., 2014), is an evolution of Long Short-Term Memory (LSTM), a neural network architecture provided with a memory mechanism that allows storing and retaining information through long time sequences. GRU adds a mechanism that makes each recurrent unit adaptively able to capture dependencies of different time scales. While LSTM is composed of three gates (i.e. input, output, and forget), GRU only has two gates, namely update and reset (figure 1). The update gate controls the unit activation by deciding how much information is carried over from the previous hidden state to the current one. Any information in the hidden state that becomes irrelevant later on is dropped via the reset gate. As each hidden unit has separate reset and update gates, it will learn to capture dependencies over different time scales.

Figure 1: A schematic representation of GRU.
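The role of the two gates can be sketched as a single GRU time step in plain Python. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: biases are omitted, the parameter names (Wz, Uz, Wr, Ur, Wh, Uh) are hypothetical, and the update follows the convention in which the update gate weights the previous hidden state.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def gru_step(x, h_prev, params):
    """One GRU time step (biases omitted); params holds hypothetical weight matrices."""
    # Update gate: how much of the previous hidden state is carried over.
    z = [sigmoid(a + b) for a, b in
         zip(matvec(params["Wz"], x), matvec(params["Uz"], h_prev))]
    # Reset gate: how much of the previous hidden state feeds the candidate.
    r = [sigmoid(a + b) for a, b in
         zip(matvec(params["Wr"], x), matvec(params["Ur"], h_prev))]
    gated_h = [ri * hi for ri, hi in zip(r, h_prev)]
    h_cand = [math.tanh(a + b) for a, b in
              zip(matvec(params["Wh"], x), matvec(params["Uh"], gated_h))]
    # Interpolate between the previous state and the candidate state.
    return [zi * hp + (1.0 - zi) * hc for zi, hp, hc in zip(z, h_prev, h_cand)]
```

When an update gate saturates close to 1, the corresponding unit keeps its previous value almost unchanged, which is how a unit can retain information over many time steps.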

GRU with randomly initialised feature weights has shown promising results in cross-domain sentiment classification with large-scale data. When enough training instances are available, the alignment of heterogeneous domains is achieved thanks to memory units, which are automatically able to capture and preserve domain-independent information, despite the lack of an explicit transfer learning mechanism.

3.2 Differentiable Neural Computer

Differentiable Neural Computer (DNC), introduced by (Graves et al., 2016) as the evolution of Neural Turing Machines (NTM) by (Graves et al., 2014), is one of the most innovative MemDNN techniques. Unlike previous MemDNN architectures (e.g. GRU), where the memory mechanism is internal to the network, DNC uses an external memory to represent and manipulate complex data structures. The neural network can selectively address the external memory, both to read from and write to it, allowing iterative modification of memory content. This makes DNC able to learn complex tasks from data, such as finding the shortest path or inferring the missing links in graphs, and answering synthetic questions designed to emulate reasoning in natural language. Figure 2 shows the basic behaviour of a DNC. It uses differentiable attention mechanisms to define weightings, which represent the degree to which each memory location is involved in a read or write operation. The functional units that determine and apply the weightings are called read and write heads.

Figure 2: A schematic representation of DNC.
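The notion of a weighting over memory locations can be illustrated with content-based addressing, one of the attention mechanisms described by (Graves et al., 2016). The sketch below is a simplified, assumption-laden illustration in plain Python: it covers only content lookup and a weighted read, and leaves out the other DNC mechanisms (such as dynamic memory allocation and temporal links). The function names and the toy memory are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)

def content_weighting(memory, key, strength):
    """Softmax over key-to-row cosine similarities, sharpened by strength."""
    scores = [strength * cosine(row, key) for row in memory]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def read(memory, weighting):
    """Read vector: weighted sum of the memory rows."""
    width = len(memory[0])
    return [sum(w * row[c] for w, row in zip(weighting, memory)) for c in range(width)]

memory = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = content_weighting(memory, key=[1.0, 0.0], strength=10.0)
read_vector = read(memory, weights)
```

Because every operation is differentiable, the key and the strength can be produced by the controller network and trained by gradient descent together with the rest of the model.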


In the original work, DNC has only been applied

to small-scale tasks. However, (Graves et al., 2016)

pointed out that DNC should be able to seamlessly

acquire knowledge and take advantage of exposure

to large data sources. This consideration, along with

the ability of memory mechanisms to capture inter-

domain relationships, makes DNC suitable for cross-

domain sentiment classiﬁcation.

3.3 GloVe Word Representation

Global Vectors (GloVe) is a log-bilinear regression model that has been proposed by (Pennington et al., 2014) to learn distributed word representation. Like other methods for learning vector space representations of words, GloVe is able to capture, in an unsupervised fashion, fine-grained syntactic and semantic regularities that can be expressed with simple vector arithmetic, and it avoids the data sparsity problem of high-dimensional bag-of-words models. GloVe combines the advantages of global matrix factorization and local context window methods: like the former, it efficiently leverages statistical information by training only on the nonzero elements in a word-word co-occurrence matrix; like the latter, it achieves great performance on word analogy, similarity and named entity recognition tasks. The unsupervised information extracted by means of distributed word representation fosters the alignment of heterogeneous domains; for this reason, we argue that GloVe is a promising way to initialise the feature weights that MemDNN architectures will use.
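The idea of training only on the nonzero elements of the co-occurrence matrix can be made concrete with the weighted least-squares objective of (Pennington et al., 2014). The sketch below only evaluates that loss for given vectors; it is not a training procedure. The data layout (a dictionary of counts, lists of vectors and biases) is an assumption made for illustration, while the weighting function and its values x_max = 100 and alpha = 0.75 are those reported in the GloVe paper.

```python
import math

def glove_weight(count, x_max=100.0, alpha=0.75):
    """Weighting function that damps rare pairs and caps very frequent ones at 1."""
    return (count / x_max) ** alpha if count < x_max else 1.0

def glove_loss(cooccurrence, w, w_ctx, b, b_ctx):
    """Weighted squared error between w_i . w_ctx_j + b_i + b_ctx_j and log X_ij.

    cooccurrence maps (i, j) index pairs to counts; only nonzero counts contribute.
    """
    loss = 0.0
    for (i, j), count in cooccurrence.items():
        if count <= 0:
            continue
        dot = sum(a * c for a, c in zip(w[i], w_ctx[j]))
        error = dot + b[i] + b_ctx[j] - math.log(count)
        loss += glove_weight(count) * error ** 2
    return loss
```

Since the sum runs over observed pairs only, the cost of one pass grows with the number of nonzero co-occurrences rather than with the square of the vocabulary size.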

4 DATASETS

In this Section the benchmark datasets for document-level and single-sentence classification are introduced. Amazon Reviews corpus (http://jmcauley.ucsd.edu/data/amazon/) has been used for the former task, whereas Stanford Sentiment Treebank (https://nlp.stanford.edu/sentiment/code.html) for the latter. Both are widely used benchmarks for sentiment analysis.

4.1 Amazon Reviews Corpus

Amazon Reviews corpus is a collection of Amazon reviews about different domains. Each domain contains a list of English reviews, which include both the plain text and a score from 1 (i.e. very negative) to 5 (i.e. very positive). In binary sentiment classification, reviews with rating > 3 have been considered as positive, reviews with rating < 3 as negative, while reviews with rating = 3 have been discarded as they are ambiguous and could express a neutral sentiment orientation. On the other hand, all the 5 classes have been taken into account in the fine-grained setting. Data from 4 domains have been used for the experiments: Books (B), Movies (M), Electronics (E) and Clothing-Shoes-Jewelry (J) have been chosen for the sake of comparison with the state of the art.
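The binary labelling rule above can be written down as a short sketch. The function names and the (text, rating) input layout are assumptions made for illustration; the actual corpus files may store reviews in a different format.

```python
def binary_label(rating):
    """Map a 1-5 star rating to a binary polarity, or None if the review is discarded."""
    if rating > 3:
        return "positive"
    if rating < 3:
        return "negative"
    return None  # rating == 3: ambiguous, discarded in the binary setting

def binary_dataset(reviews):
    """Keep only (text, label) pairs for non-neutral reviews.

    reviews is a hypothetical iterable of (text, rating) pairs.
    """
    labelled = []
    for text, rating in reviews:
        label = binary_label(rating)
        if label is not None:
            labelled.append((text, label))
    return labelled

sample = [("great read", 5), ("it was fine", 3), ("broke in a week", 1)]
binary_sample = binary_dataset(sample)
```

In the fine-grained setting no review is discarded and the rating itself serves as the class.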

4.2 Stanford Sentiment Treebank

Stanford Sentiment Treebank (SST) is a dataset of labelled sentences that was introduced by (Socher et al., 2013). SST is built on a corpus of movie review excerpts, composed of 11,855 sentences, half of which are positive and half negative. The sentences are parsed with the Stanford parser by (Klein and Manning, 2003) into 215,154 syntactically plausible phrases. Each phrase is annotated by 3 human experts into 5 possible categories, namely negative, somewhat negative, neutral, somewhat positive and positive. Similarly to Amazon Reviews corpus, neutral phrases are discarded in binary classification.

5 EXPERIMENTS AND RESULTS

This Section illustrates the experiments that have been performed. The first assesses to what extent the amount of labelled data available for training the model affects its performance in both in-domain and cross-domain document sentiment classification. The second evaluates the impact of fine-tuning on cross-domain classification, with appropriate comparison with the state of the art. In the third experiment, in-domain document sentiment classification is performed on large-scale data, in order to evaluate the scalability of MemDNN techniques and their potential feasibility in big data scenarios. The last trial assesses whether MemDNNs can be successfully applied to single-sentence sentiment classification.

Accuracy of the classifier (i.e. the percentage of correctly classified instances) has been measured for each single test, averaging results over 10 randomly chosen training-test partitions to reduce the variance (i.e. the sensitivity to small variations in the training set), while always keeping the classes balanced.
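The evaluation protocol just described can be sketched as follows. This is an illustrative outline rather than the code used for the experiments: the function names, the per-class sampling strategy and the `fit` callback (which takes a training set and returns a prediction function) are all assumptions.

```python
import random

def accuracy(predicted, actual):
    """Percentage of correctly classified instances."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)

def balanced_partition(examples, train_per_class, test_per_class, rng):
    """Randomly split (text, label) examples, drawing the same number per class."""
    by_label = {}
    for example in examples:
        by_label.setdefault(example[1], []).append(example)
    train, test = [], []
    for label in sorted(by_label):
        pool = by_label[label][:]
        rng.shuffle(pool)
        train.extend(pool[:train_per_class])
        test.extend(pool[train_per_class:train_per_class + test_per_class])
    return train, test

def averaged_accuracy(examples, fit, train_per_class, test_per_class, runs=10, seed=0):
    """Average accuracy over several random balanced training-test partitions."""
    rng = random.Random(seed)
    scores = []
    for _ in range(runs):
        train, test = balanced_partition(examples, train_per_class, test_per_class, rng)
        predict = fit(train)  # hypothetical: returns a function text -> label
        predicted = [predict(text) for text, _ in test]
        scores.append(accuracy(predicted, [label for _, label in test]))
    return sum(scores) / len(scores)
```

For a cross-domain test, the same outline would apply with the training examples drawn from the source domain and the test examples from the target domain.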

5.1 The Impact of Training Data

The first experiment checks to what extent the amount of labelled training data affects MemDNN performance. Naïve Bayes (NB), Markov Chain (MC), Paragraph Vector (PV) and Gated Recurrent Unit with randomly initialised feature weights (GRU_rand) have already been taken into account in (Domeniconi et al., 2017). For the sake of comparison, source-target partitions of three different orders of magnitude have been tested, preserving 80%-20% as the source-target ratio, and balancing positive and negative examples. The small-scale data set has 1,600 labelled instances as the training set and 400 unlabelled instances as the test set; the medium-scale 16,000 and 4,000; and the large-scale 80,000 and 20,000 respectively.

Figure 3 shows the in-domain performance of the various techniques, averaged over the 4 domains considered (detailed results are not reported for reasons of space). As pointed out by (Domeniconi et al., 2017), deep learning approaches usually do not perform well when little training data is available. That is the reason why MC outperforms the proposed MemDNN techniques with small-scale data. However, GRU and DNC outperform the other approaches. GRU with feature weights initialised by GloVe achieves higher accuracy than GRU_rand, whose features have been initialised with random weights. As the amount of labelled training data increases, DNC obtains astonishing performance. Its accuracy is 90.08% with medium-scale data, meaning that 16,000 training examples are enough for the memory mechanism of DNC to capture relevant information. The same does not hold for GRU, whose performance does not increase with medium-scale data. However, in contrast to the trial in (Domeniconi et al., 2017), GRU already achieves performance comparable with MC on the medium-scale data set. With large-scale data, the accuracy of DNC continues to grow, reaching 91.24%. This outcome makes it interesting to evaluate to what extent DNC performance can increase. For this purpose, an in-domain test with a very large dataset is presented in Section 5.3. Finally, it may be noted that GRU performance improves as well. A reasonable explanation is that the memory mechanism of GRU is automatically able to decide which information is relevant to classification, if trained with a large amount of data.
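The only difference between GRU and GRU_rand is how the word-embedding weights are initialised. The following Python sketch illustrates that step under stated assumptions: it relies on the plain-text GloVe layout (a token followed by its space-separated vector components), uses a few invented 3-dimensional vectors instead of real GloVe values, and the function names and the random range `scale` are hypothetical choices made for this example.

```python
import random


def parse_glove_lines(lines):
    """Parse GloVe-style text lines: a token followed by its vector components."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def build_embedding_matrix(vocab, pretrained, dim, seed=0, scale=0.05):
    """Return one embedding row per vocabulary word.

    Words found in `pretrained` get their pre-trained vector; the others get
    small random values (the range `scale` is an arbitrary assumption).
    Passing an empty `pretrained` dict gives a fully random initialisation,
    which corresponds to the GRU_rand setting.
    """
    rng = random.Random(seed)
    matrix = []
    for word in vocab:
        if word in pretrained:
            row = list(pretrained[word])
        else:
            row = [rng.uniform(-scale, scale) for _ in range(dim)]
        matrix.append(row)
    return matrix


# Toy 3-dimensional vectors, invented for illustration (not real GloVe values).
toy_glove = parse_glove_lines([
    "good 0.1 0.2 0.3",
    "bad -0.1 -0.2 -0.3",
])
vocab = ["good", "bad", "blender"]

glove_init = build_embedding_matrix(vocab, toy_glove, dim=3)
random_init = build_embedding_matrix(vocab, {}, dim=3)
print(glove_init[0])                          # pre-trained row for "good"
print(len(random_init), len(random_init[0]))  # rows and columns of the matrix
```

In a real pipeline the resulting matrix would be copied into the embedding layer of the network before training; that step depends on the deep learning framework and is left out here.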

The cross-domain evaluation of the same techniques can be seen in Figure 4. The plot displays accuracy averaged over the 12 source-target configurations of the 4 domains considered. The first outcome that catches the eye is that DNC dramatically outperforms all the other techniques regardless of the dataset size. It is remarkable that DNC exceeds by more than 9% the accuracy of MC, which is a non-deep method that was specifically developed by (Domeniconi et al., 2015b) to accomplish both transfer learning and sentiment classification. The reason for this outcome lies in several combined factors that lead

Figure 3: Average in-domain accuracy over the 4 domains. (Plot of accuracy (%) against dataset size, i.e. small-scale, medium-scale and large-scale, for NB, PV, MC, GRU_rand, GRU and DNC.)

Figure 4: Average cross-domain accuracy over the 12 source-target combinations of the 4 domains. (Plot of accuracy (%) against dataset size, i.e. small-scale, medium-scale and large-scale, for NB, PV, MC, GRU_rand, GRU and DNC.)

to semantic comprehension of text. The first factor is the use of distributed representations to encode text. In particular, we used GloVe for the word representation of both GRU and DNC. As pointed out by (Pennington et al., 2014), GloVe combines the advantages of the two major model families in the literature for learning word vectors, namely global matrix factorization methods and local context window methods. This means that GloVe also inherits the benefits of PV, which is able to discover hidden relationships between semantically similar words. The unsupervised information extracted by GloVe aids the alignment of heterogeneous domains. The second factor is the memory mechanism of DNC. Once enough training data are available, MemDNN architectures are automatically able to capture domain-independent information and preserve it in memory. The third factor is the use of deep neural networks. In particular, DNC is one of the most powerful mechanisms to emulate reasoning and inference problems in natural language. The combined effect of these three factors led to a dramatic improvement of the state of the art in cross-domain sentiment classification. DNC turns out to be 9% more accurate than MC. Comparing in-domain and cross-domain results, it can be noted that the accuracy of DNC is closely aligned on small-scale data, whereas cross-domain performance becomes slightly worse than in-domain performance as the dataset size increases. Apart from the astonishing performance of DNC, careful readers can note the behaviours of GRU and GRU_rand, which are probably even more interesting. As expected, GloVe initialisation of feature weights leads to a substantial increase in accuracy with small-scale data, which jumps from 66.30% for GRU_rand to 74.14% for GRU. Comparing in-domain and cross-domain experiments, we can see the combined effect of the GloVe distributed word representation and the GRU memory mechanism. The former plays a key role in aligning heterogeneous domains when few labelled data are available as the training set. The latter is automatically able to extract relevant inter-domain concepts as the amount of labelled training data increases.
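The figure of 12 cross-domain configurations comes from taking every ordered (source, target) pair of distinct domains among the 4 considered. The Python sketch below makes that enumeration and the averaging step explicit; the domain labels `D1` to `D4` are placeholders rather than the real domain names, and the accuracy values are made up purely to exercise the code.

```python
from itertools import permutations

# Placeholder domain labels: the actual domain names are not restated here.
DOMAINS = ["D1", "D2", "D3", "D4"]


def source_target_pairs(domains):
    """All ordered (source, target) pairs with source != target."""
    return list(permutations(domains, 2))


def average_accuracy(accuracy_by_pair):
    """Mean accuracy over the given source-target configurations."""
    return sum(accuracy_by_pair.values()) / len(accuracy_by_pair)


pairs = source_target_pairs(DOMAINS)
print(len(pairs))  # 4 * 3 ordered pairs

# Made-up accuracies, only to show the averaging step.
dummy = {pair: 80.0 + i for i, pair in enumerate(pairs)}
print(average_accuracy(dummy))
```

Ordered pairs are used because training on one domain and testing on another is not symmetric: D1 to D2 and D2 to D1 are distinct experiments.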

5.2 Fine-tuning of MemDNNs

The second experiment aims to assess whether fine-tuning affects MemDNN performance. Fine-tuning is the practice of using a labelled sample of target instances to refine a model previously learnt on the source domain. The sample is usually small (e.g. hundreds of instances) for two main reasons. On the one hand, if a large set of labelled target instances were available, it would be advisable to learn an in-domain sentiment classifier rather than a cross-domain one. On the other hand, if a large set of labelled instances were not available, the only alternative would be to let a team of human experts pre-classify some instances. Manual categorisation becomes infeasible when many instances need to be labelled. Therefore, fine-tuning on a small sample of labelled target instances is generally a good trade-off between its cost and the expected improvement in performance. To investigate this effect, we have experimented with fine-tuning on 250 and 500 examples respectively.
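The two-stage procedure described above (learn on the source domain, then refine on a small labelled target sample) can be illustrated with a deliberately tiny Python sketch. A one-feature logistic regression stands in for the MemDNN, and the synthetic source and target data, the learning rate and the epoch counts are all assumptions chosen for illustration; none of them reflects the networks or hyperparameters used in the experiments.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(weights, data):
    """Mean binary cross-entropy of a 1-feature logistic model on `data`."""
    w, b = weights
    total = 0.0
    for x, y in data:
        # Clamp the probability to keep the logarithms finite.
        p = min(max(sigmoid(w * x + b), 1e-12), 1.0 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(data)


def train(weights, data, epochs, lr=0.5):
    """Full-batch gradient descent, starting from the given weights."""
    w, b = weights
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = sigmoid(w * x + b) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / len(data)
        b -= lr * grad_b / len(data)
    return (w, b)


# Synthetic source domain: label 1 when x > 0 (201 labelled points).
source = [(i / 100, 1 if i > 0 else 0) for i in range(-100, 101)]
# Synthetic target sample: same feature, shifted decision boundary (21 points).
target_sample = [(i / 10, 1 if i > 4 else 0) for i in range(-10, 11)]

# Stage 1: learn on the source domain from scratch.
source_model = train((0.0, 0.0), source, epochs=100)
# Stage 2: fine-tune the source model on the small labelled target sample.
fine_tuned = train(source_model, target_sample, epochs=50)

print("target loss before fine-tuning:", log_loss(source_model, target_sample))
print("target loss after fine-tuning: ", log_loss(fine_tuned, target_sample))
```

The key design point is that the second call to `train` starts from the source weights instead of a fresh initialisation, so the target sample only has to correct the mismatch between the two domains.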

Figure 5 shows the effect of fine-tuning on MemDNN architectures. GRU takes a slight advantage of fine-tuning. With reference to small-scale data, accuracy increases from 74.14% to 74.84% using 250 target instances and to 75.77% using 500 target instances. It is worth noting that GRU performance is more affected by the GloVe initialisation of feature weights than by fine-tuning. In fact, GRU without fine-tuning outperforms GRU_rand fine-tuned on 500 target instances. As the dataset size increases, the contribution of fine-tuning diminishes, eventually vanishing with large-scale data.

Figure 5: Average cross-domain accuracy with fine-tuning over the 12 source-target combinations of the 4 domains. The subscripts 250 and 500 represent the number of labelled target instances used for fine-tuning. (Plot of accuracy (%) against dataset size, i.e. small-scale, medium-scale and large-scale, for GRU_rand, GRU and DNC, each without fine-tuning and with fine-tuning on 250 and 500 target instances.)

A different behaviour is observed for DNC. Fine-tuning only impacts accuracy when performed on 500 target instances with the small-scale dataset. It is quite obvious that, when few training data are available, even a small sample can considerably affect performance. GRU is a clear proof of this behaviour. However, fine-tuning DNC on 250 target instances (DNC_250) does not lead to a performance improvement. The reason is that DNC is a very robust technique, almost unaffected by noise. The attention mechanism used to address the external memory makes DNC less sensitive to noise than GRU, whose memory units are internal to the network. As a consequence, DNC is less prone than GRU to altering memory content. In other words, it is unlikely that DNC stores irrelevant information in memory. 250 target instances are not relevant enough for DNC and are treated as noise by the network. The same considerations apply to the experiments with medium-scale and large-scale data, where neither 250 nor 500 target instances affect performance.

5.3 Large-scale Classiﬁcation

The third experiment is an in-domain sentiment classification task with a very large data set. This trial lets us assess to what extent MemDNN architectures with GloVe feature weights initialisation are


suitable as sentiment classifiers in big data scenarios. Moreover, MemDNN architectures can be compared with many other sentiment classification techniques, in particular several variants of Character-level Convolutional Neural Networks (CharCNN), proposed by (Zhang and LeCun, 2015) and empirically explored in (Zhang et al., 2015a). For comparison purposes, two very large data sets have been constructed. The former deals with binary in-domain sentiment classification, where the goal is to distinguish positive from negative instances. The latter aims to predict the full score assigned to instances (i.e. from 1 to 5), so it is a fine-grained in-domain sentiment classification task. The binary dataset contains 1,800,000 training samples and 200,000 testing samples for each polarity. The fine-grained dataset contains 600,000 training samples and 130,000 testing samples for each of the five classes. In both datasets, samples have been taken in equal proportion from the 4 domains considered. Differently from the previous experiments, the review title has also been considered, together with the review content.
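The per-class figures above determine the overall size of each dataset. The short Python sketch below works through that arithmetic; the helper name `dataset_totals` and the per-domain breakdown (derived here from the statement that the 4 domains contribute in equal proportion) are assumptions of this example rather than numbers reported in the text.

```python
def dataset_totals(train_per_class, test_per_class, n_classes, n_domains=4):
    """Total sizes implied by per-class counts and an even split over domains."""
    train_total = train_per_class * n_classes
    test_total = test_per_class * n_classes
    return {
        "train_total": train_total,
        "test_total": test_total,
        "train_per_domain": train_total // n_domains,
        "test_per_domain": test_total // n_domains,
    }


binary = dataset_totals(1_800_000, 200_000, n_classes=2)
fine_grained = dataset_totals(600_000, 130_000, n_classes=5)

print("binary:", binary)
print("fine-grained:", fine_grained)
```

Under these assumptions the binary dataset totals 3,600,000 training and 400,000 testing samples, and the fine-grained one 3,000,000 and 650,000.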

Table 1: In-domain accuracy on very large datasets constructed from the Amazon Reviews corpus. Binary and fine-grained refer to 2-class and 5-class in-domain sentiment classification respectively. CharCNN variants are prefixed with Lg. or Sm.

Model                   Binary    Fine-grained
BoW                     90.40%    54.64%
BoW Tf-Idf              91.00%    55.26%
n-grams                 92.02%    54.27%
n-grams Tf-Idf          91.54%    52.44%
Bag-of-means            81.61%    44.13%
LSTM                    93.90%    59.43%
Lg. w2v Conv            94.12%    55.60%
Sm. w2v Conv            94.00%    57.41%
Lg. w2v Conv. Th.       94.20%    56.25%
Sm. w2v Conv. Th.       94.37%    57.50%
Lg. Lk. Conv            94.16%    54.05%
Sm. Lk. Conv            94.15%    56.34%
Lg. Lk. Conv. Th.       94.48%    57.61%
Sm. Lk. Conv. Th.       94.49%    56.81%
Lg. Full. Conv          94.22%    59.11%
Sm. Full. Conv          94.22%    59.12%
Lg. Full. Conv. Th.     94.49%    59.46%
Sm. Full. Conv. Th.     94.34%    59.47%
Lg. Conv                94.49%    58.69%
Sm. Conv                94.50%    59.47%
Lg. Conv. Th.           95.07%    59.55%
Sm. Conv. Th.           94.33%    59.57%
GRU                     94.07%    59.55%
DNC                     95.51%    61.45%

Apart from the several variants of CharCNN, results have also been compared with other methods, including Long Short-Term Memory networks (LSTM), Bag-of-means by (Lev et al., 2015) and some Bag-of-words (BoW) based configurations. Further details on these methods, along with their parameters, can be found in (Zhang et al., 2015a). Table 1 shows the accuracy of MemDNN methods and the state-of-the-art techniques. GRU achieves performance comparable with the other methods. In particular, it is slightly more accurate than LSTM. This is not surprising, since GRU is an evolution of LSTM and both have a built-in memory mechanism. On the other hand, the DNC outcome is astonishing. It outperforms all the other techniques on both the binary and the fine-grained datasets. Fine-grained accuracy is almost 2% higher than that of the previous methods (61.45% against 59.57% for the best CharCNN variant). This difference in accuracy is significant in a multinomial classification problem, where predicting the correct class is challenging. To the best of our knowledge, it is the first time that a method achieves accuracy higher than 60% on the Amazon Reviews corpus in fine-grained sentiment classification.
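The margins quoted for DNC can be recomputed from the rows of Table 1. The Python sketch below does so, transcribing the baseline accuracies from the table; the variable and function names are assumptions of this example, and GRU is kept out of the baselines because it is one of the methods proposed here rather than a previous approach.

```python
# (binary %, fine-grained %) accuracies of the previous methods in Table 1.
BASELINES = {
    "BoW": (90.40, 54.64),
    "BoW Tf-Idf": (91.00, 55.26),
    "n-grams": (92.02, 54.27),
    "n-grams Tf-Idf": (91.54, 52.44),
    "Bag-of-means": (81.61, 44.13),
    "LSTM": (93.90, 59.43),
    "Lg. w2v Conv": (94.12, 55.60),
    "Sm. w2v Conv": (94.00, 57.41),
    "Lg. w2v Conv. Th.": (94.20, 56.25),
    "Sm. w2v Conv. Th.": (94.37, 57.50),
    "Lg. Lk. Conv": (94.16, 54.05),
    "Sm. Lk. Conv": (94.15, 56.34),
    "Lg. Lk. Conv. Th.": (94.48, 57.61),
    "Sm. Lk. Conv. Th.": (94.49, 56.81),
    "Lg. Full. Conv": (94.22, 59.11),
    "Sm. Full. Conv": (94.22, 59.12),
    "Lg. Full. Conv. Th.": (94.49, 59.46),
    "Sm. Full. Conv. Th.": (94.34, 59.47),
    "Lg. Conv": (94.49, 58.69),
    "Sm. Conv": (94.50, 59.47),
    "Lg. Conv. Th.": (95.07, 59.55),
    "Sm. Conv. Th.": (94.33, 59.57),
}
DNC = (95.51, 61.45)


def best_baseline(column):
    """Name and accuracy of the strongest baseline in the given column."""
    name = max(BASELINES, key=lambda m: BASELINES[m][column])
    return name, BASELINES[name][column]


for column, task in enumerate(["binary", "fine-grained"]):
    name, acc = best_baseline(column)
    margin = round(DNC[column] - acc, 2)
    print(f"{task}: best previous = {name} ({acc}%), DNC margin = {margin} points")
```

The margins are expressed in percentage points, which is how the "almost 2%" figure in the text is read in this sketch.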

5.4 Single-sentence Classiﬁcation

While the previous experiments deal with document sentiment classification, the last one focuses on single-sentence sentiment classification. The benchmark dataset used is the Stanford Sentiment Treebank (SST). Following (Socher et al., 2013), 8,544 sentences are used as the training set, 1,101 as the validation set, and 2,210 as the test set. Plenty of techniques have been applied to SST in the last few years. (Socher et al., 2013) presented Recursive Neural Tensor Networks (RecNTN) in the same work in which they introduced SST, and compared their algorithm on SST with Naïve Bayes with unigram features (NB), Naïve Bayes with unigram and bigram features (BiNB), Support Vector Machine with unigram and bigram features (SVM), Recursive Neural Networks (RNN) by (Socher et al., 2011) and Matrix-Vector RNN (MV-RNN) by (Socher et al., 2012). (Kalchbrenner et al., 2014) proposed the Dynamic Convolutional Neural Network (DCNN), comparing its performance on SST with Max Time-Delay Neural Networks (Max-TDNN) by (Collobert and Weston, 2008) and a Neural Bag-of-Words (NBoW) model. (Dos Santos and Gatti, 2014) introduced the Character to Sentence Convolutional Neural Network (CharSCNN) and applied it to SST. A variant of CharSCNN has been trained by using word embeddings only (SCNN). Two other variants, referred to as CharSCNN ph. and SCNN ph., have been trained by also exploiting phrase representations in addition to sentence representations. (Kim, 2014) experimented with some variants of Convolutional Neural Networks (CNN-rand, CNN-static, CNN-non-static, CNN-multichannel) on SST.

(Le and Mikolov, 2014) applied to SST logistic regression on top of their Paragraph Vector (PV) for distributed word representation. Finally, Multiplicative Recurrent Neural Network (DRNN) by (Irsoy and Cardie, 2014), Constituency Tree-LSTM (CT-LSTM) by (Tai et al., 2015), and Dynamic Memory Network (DMN) by (Kumar et al., 2016) have also been applied to SST.

Table 2: Accuracy achieved by the compared methods on SST. Binary and fine-grained refer to 2-class and 5-class in-domain sentiment classification respectively.

Model                Binary    Fine-grained
NB                   81.80%    41.00%
BiNB                 83.10%    41.90%
SVM                  79.40%    40.70%
RecNTN               85.40%    45.70%
Max-TDNN             77.10%    37.40%
NBoW                 80.50%    42.40%
DCNN                 86.80%    48.50%
RNN                  82.40%    43.20%
MV-RNN               82.90%    44.40%
SCNN                 82.00%    43.50%
CharSCNN             82.30%    43.50%
SCNN ph.             85.50%    48.30%
CharSCNN ph.         85.70%    48.30%
CNN-rand             82.70%    45.00%
CNN-static           86.80%    45.50%
CNN-non-static       87.20%    48.00%
CNN-multichannel     88.10%    47.40%
PV                   87.80%    48.70%
DRNN                 86.60%    49.80%
CT-LSTM              88.00%    51.00%
DMN                  88.60%    52.10%
GRU                  84.13%    45.89%
DNC                  85.22%    46.78%

Table 2 shows the comparison between the MemDNN architectures and the mentioned methods. GRU and DNC achieve comparable performance in both binary and fine-grained configurations. The accuracy of DNC is just about 1% higher than the accuracy of GRU. They perform similarly to most of the other techniques, but are definitely not the best methods for single-sentence sentiment classification. This is probably due to the absence of a specific mechanism to take sentence syntax into account, and to the small amount of training data, which is an obstacle to GRU and DNC performance. Consider the in-domain experiment on Amazon Reviews (Figure 3), where they are outperformed by Markov Chain with small-scale data. One might argue that SST has 8,544 instances, but we should not forget that they are single sentences, not whole and usually longer reviews (i.e. documents) as in the Amazon dataset. The best algorithm turns out to be DMN, which performs better than all the other techniques in both binary and fine-grained configurations. This is not surprising, since DMN includes a memory mechanism to store and preserve relevant information through time and has also been proved to work well with single sentences in (Kumar et al., 2016).
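To make the position of GRU and DNC among the SST results concrete, the Python sketch below ranks every method of Table 2 on each task. The accuracies are transcribed from the table; the function name `rank_of` and the ranking convention (rank 1 is the most accurate, ties share the better rank) are assumptions of this example.

```python
# (binary %, fine-grained %) accuracies transcribed from Table 2.
SST = {
    "NB": (81.80, 41.00), "BiNB": (83.10, 41.90), "SVM": (79.40, 40.70),
    "RecNTN": (85.40, 45.70), "Max-TDNN": (77.10, 37.40),
    "NBoW": (80.50, 42.40), "DCNN": (86.80, 48.50), "RNN": (82.40, 43.20),
    "MV-RNN": (82.90, 44.40), "SCNN": (82.00, 43.50),
    "CharSCNN": (82.30, 43.50), "SCNN ph.": (85.50, 48.30),
    "CharSCNN ph.": (85.70, 48.30), "CNN-rand": (82.70, 45.00),
    "CNN-static": (86.80, 45.50), "CNN-non-static": (87.20, 48.00),
    "CNN-multichannel": (88.10, 47.40), "PV": (87.80, 48.70),
    "DRNN": (86.60, 49.80), "CT-LSTM": (88.00, 51.00),
    "DMN": (88.60, 52.10), "GRU": (84.13, 45.89), "DNC": (85.22, 46.78),
}


def rank_of(model, column):
    """1-based rank of `model` in a column (1 = highest accuracy)."""
    own = SST[model][column]
    return 1 + sum(1 for acc in SST.values() if acc[column] > own)


def best_model(column):
    """Name of the most accurate method in the given column."""
    return max(SST, key=lambda m: SST[m][column])


for column, task in enumerate(["binary", "fine-grained"]):
    print(task, "best:", best_model(column))
    for model in ("GRU", "DNC"):
        print(f"  {model}: rank {rank_of(model, column)} of {len(SST)}")
    gap = round(SST["DNC"][column] - SST["GRU"][column], 2)
    print(f"  DNC minus GRU: {gap} points")
```

Both MemDNN methods land in the middle of the ranking on either task, which matches the reading that they are comparable with most techniques without being the best on single sentences.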

6 CONCLUSIONS

This work has investigated, through extensive experiments, how well novel memory-based deep neural networks (MemDNN) perform in cross-domain and in-domain sentiment classification. We have combined the advances of MemDNN with word embeddings, a de facto standard in deep learning, along with fine-tuning on target instances to investigate whether they are able to outperform ad-hoc cross-domain solutions. Among the deep memory-based methods, we experimented with the Differentiable Neural Computer and the Gated Recurrent Unit. The former is one of the most innovative deep learning techniques. Its ability to address and manage an external memory makes DNC able to emulate reasoning and inference problems in natural language. The latter is a different kind of MemDNN, since its memory mechanism is part of the network structure. The GloVe distributed word representation has been used in combination with both MemDNN architectures.

Experiments on the Amazon Reviews corpus show that DNC with GloVe word representation dramatically outperforms state-of-the-art techniques for cross-domain sentiment classification. Transfer learning from the source to the target domain is supported by distributed word representation with small-scale datasets, as proved by the comparison between GRU and GRU_rand, and by memory mechanisms as the dataset size increases. MemDNN techniques take advantage of large-scale data to align heterogeneous domains. Fine-tuning on a small sample of target instances is more useful to GRU than to DNC, as the latter is more robust and less sensitive to noise. Both techniques have been compared with state-of-the-art methods on two very large datasets, built on the same Amazon Reviews corpus, for in-domain document sentiment classification. DNC with GloVe feature weights achieves new state-of-the-art performance in both binary and fine-grained classification tasks. Finally, DNC and GRU achieve performance comparable with many techniques in single-sentence in-domain sentiment classification on the Stanford Sentiment Treebank. Small-scale training data and the absence of a mechanism to deal with sentence syntax are probably the reasons that prevent DNC from reaching state-of-the-art performance.


REFERENCES

Aue, A. and Gamon, M. (2005). Customizing sentiment classifiers to new domains: A case study. In Proceedings of recent advances in natural language processing (RANLP), volume 1, pages 2–1.

Bengio, Y., Ducharme, R., Vincent, P., and Jauvin, C. (2003). A neural probabilistic language model. Journal of machine learning research, 3(Feb):1137–1155.

Bengio, Y., Simard, P., and Frasconi, P. (1994). Learning long-term dependencies with gradient descent is difficult. IEEE transactions on neural networks, 5(2):157–166.

Blitzer, J., Dredze, M., and Pereira, F. (2007). Biographies, Bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification. In Proceedings of the 45th annual meeting of the association of computational linguistics, pages 440–447.

Bollegala, D., Mu, T., and Goulermas, J. Y. (2016). Cross-domain sentiment classification using sentiment sensitive embeddings. IEEE Transactions on Knowledge and Data Engineering, 28(2):398–410.

Bollegala, D., Weir, D., and Carroll, J. (2013). Cross-domain sentiment classification using a sentiment sensitive thesaurus. IEEE transactions on knowledge and data engineering, 25(8):1719–1731.

Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., and Bengio, Y. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078.

Collobert, R. and Weston, J. (2008). A unified architecture for natural language processing: Deep neural networks with multitask learning. In Proceedings of the 25th international conference on Machine learning, pages 160–167. ACM.

Dai, M., Huang, S., Zhong, J., Yang, C., and Yang, S. (2017). Influence of noise on transfer learning in Chinese sentiment classification using GRU. In 2017 13th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD), pages 1844–1849. IEEE.

Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., and Harshman, R. (1990). Indexing by latent semantic analysis. Journal of the American society for information science, 41(6):391.

Deng, Z.-H., Luo, K.-H., and Yu, H.-L. (2014). A study of supervised term weighting scheme for sentiment analysis. Expert Systems with Applications, 41(7):3506–3513.

Domeniconi, G., Moro, G., Pagliarani, A., and Pasolini, R. (2015a). Cross-domain sentiment classification via polarity-driven state transitions in a Markov model. In International Joint Conference on Knowledge Discovery, Knowledge Engineering, and Knowledge Management, pages 118–138. Springer.

Domeniconi, G., Moro, G., Pagliarani, A., and Pasolini, R. (2015b). Markov chain based method for in-domain and cross-domain sentiment classification. In Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K), 2015 7th International Joint Conference on, volume 1, pages 127–137. IEEE.

Domeniconi, G., Moro, G., Pagliarani, A., and Pasolini, R. (2017). On deep learning in cross-domain sentiment classification. In Proceedings of the 9th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management.

Domeniconi, G., Moro, G., Pasolini, R., and Sartori, C. (2016). A Comparison of Term Weighting Schemes for Text Classification and Sentiment Analysis with a Supervised Variant of tf.idf, pages 39–58. Springer International Publishing, Cham.

Dos Santos, C. N. and Gatti, M. (2014). Deep convolutional neural networks for sentiment analysis of short texts. In COLING, pages 69–78.

Franco-Salvador, M., Cruz, F. L., Troyano, J. A., and Rosso, P. (2015). Cross-domain polarity classification using a knowledge-enhanced meta-classifier. Knowledge-Based Systems, 86:46–56.

Glorot, X., Bordes, A., and Bengio, Y. (2011). Domain adaptation for large-scale sentiment classification: A deep learning approach. In Proceedings of the 28th international conference on machine learning (ICML-11), pages 513–520.

Graves, A., Wayne, G., and Danihelka, I. (2014). Neural Turing machines. arXiv preprint arXiv:1410.5401.

Graves, A., Wayne, G., Reynolds, M., Harley, T., Danihelka, I., Grabska-Barwińska, A., Colmenarejo, S. G., Grefenstette, E., Ramalho, T., Agapiou, J., et al. (2016). Hybrid computing using a neural network with dynamic external memory. Nature, 538(7626):471–476.

He, Y., Lin, C., and Alani, H. (2011). Automatically extracting polarity-bearing topics for cross-domain sentiment classification. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1, pages 123–131. Association for Computational Linguistics.

Hochreiter, S. and Schmidhuber, J. (1997). Long short-term memory. Neural computation, 9(8):1735–1780.

Irsoy, O. and Cardie, C. (2014). Modeling compositionality with multiplicative recurrent neural networks. arXiv preprint arXiv:1412.6577.

Kalchbrenner, N., Grefenstette, E., and Blunsom, P. (2014). A convolutional neural network for modelling sentences. arXiv preprint arXiv:1404.2188.

Kim, Y. (2014). Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882.

Klein, D. and Manning, C. D. (2003). Accurate unlexicalized parsing. In Proceedings of the 41st annual meeting of the association for computational linguistics.

Kumar, A., Irsoy, O., Ondruska, P., Iyyer, M., Bradbury, J., Gulrajani, I., Zhong, V., Paulus, R., and Socher, R. (2016). Ask me anything: Dynamic memory networks for natural language processing. In International Conference on Machine Learning, pages 1378–1387.

Le, Q. and Mikolov, T. (2014). Distributed representations of sentences and documents. In Proceedings of the 31st International Conference on Machine Learning (ICML-14), pages 1188–1196.

LeCun, Y., Bengio, Y., and Hinton, G. (2015). Deep learning. Nature, 521(7553):436–444.

Lev, G., Klein, B., and Wolf, L. (2015). In defense of word embedding for generic text representation. In International Conference on Applications of Natural Language to Information Systems, pages 35–50. Springer.

Levy, O. and Goldberg, Y. (2014). Linguistic regularities in sparse and explicit word representations. In Proceedings of the eighteenth conference on computational natural language learning, pages 171–180.

Liu, B. and Zhang, L. (2012). A survey of opinion mining and sentiment analysis. In Mining text data, pages 415–463. Springer.

Melville, P., Gryc, W., and Lawrence, R. D. (2009). Sentiment analysis of blogs by combining lexical knowledge with text classification. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 1275–1284. ACM.

Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.

Mnih, A. and Kavukcuoglu, K. (2013). Learning word embeddings efficiently with noise-contrastive estimation. In Advances in neural information processing systems, pages 2265–2273.

Pan, S. J., Ni, X., Sun, J.-T., Yang, Q., and Chen, Z. (2010). Cross-domain sentiment classification via spectral feature alignment. In Proceedings of the 19th international conference on World wide web, pages 751–760. ACM.

Pan, S. J. and Yang, Q. (2010). A survey on transfer learning. IEEE Transactions on knowledge and data engineering, 22(10):1345–1359.

Pang, B., Lee, L., and Vaithyanathan, S. (2002). Thumbs up?: sentiment classification using machine learning techniques. In Proceedings of the ACL-02 conference on Empirical methods in natural language processing-Volume 10, pages 79–86. Association for Computational Linguistics.

Pennington, J., Socher, R., and Manning, C. (2014). GloVe: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pages 1532–1543.

Qiu, L., Zhang, W., Hu, C., and Zhao, K. (2009). SELC: a self-supervised model for sentiment classification. In Proceedings of the 18th ACM conference on Information and knowledge management, pages 929–936. ACM.

Socher, R., Huval, B., Manning, C. D., and Ng, A. Y. (2012). Semantic compositionality through recursive matrix-vector spaces. In Proceedings of the 2012 joint conference on empirical methods in natural language processing and computational natural language learning, pages 1201–1211. Association for Computational Linguistics.

Socher, R., Pennington, J., Huang, E. H., Ng, A. Y., and Manning, C. D. (2011). Semi-supervised recursive autoencoders for predicting sentiment distributions. In Proceedings of the conference on empirical methods in natural language processing, pages 151–161. Association for Computational Linguistics.

Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C. D., Ng, A., and Potts, C. (2013). Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 conference on empirical methods in natural language processing, pages 1631–1642.

Tai, K. S., Socher, R., and Manning, C. D. (2015). Improved semantic representations from tree-structured long short-term memory networks. arXiv preprint arXiv:1503.00075.

Tan, S., Wang, Y., and Cheng, X. (2008). Combining learn-based and lexicon-based techniques for sentiment detection without using labeled examples. In Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, pages 743–744. ACM.

Tang, D., Qin, B., and Liu, T. (2015). Document modeling with gated recurrent neural network for sentiment classification. In EMNLP, pages 1422–1432.

Turney, P. D. (2002). Thumbs up or thumbs down?: semantic orientation applied to unsupervised classification of reviews. In Proceedings of the 40th annual meeting on association for computational linguistics, pages 417–424. Association for Computational Linguistics.

Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., and Manzagol, P.-A. (2010). Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research, 11(Dec):3371–3408.

Wu, H. and Gu, X. (2014). Reducing over-weighting in supervised term weighting for sentiment analysis. In COLING, pages 1322–1330.

Xiong, C., Merity, S., and Socher, R. (2016). Dynamic memory networks for visual and textual question answering. In International Conference on Machine Learning, pages 2397–2406.

Zhang, X. and LeCun, Y. (2015). Text understanding from scratch. arXiv preprint arXiv:1502.01710.

Zhang, X., Zhao, J., and LeCun, Y. (2015a). Character-level convolutional networks for text classification. In Advances in neural information processing systems, pages 649–657.

Zhang, Y., Hu, X., Li, P., Li, L., and Wu, X. (2015b). Cross-domain sentiment classification-feature divergence, polarity divergence or both? Pattern recognition letters, 65:44–50.
