A Weak-supervision Method for Automating Training Set Creation in Multi-domain Aspect Sentiment Classification

Massimo Ruffolo¹,² (https://orcid.org/0000-0002-4094-4810) and Francesco Visalli¹

¹High Performance Computing and Networking Institute of the National Research Council (ICAR-CNR), Via Pietro Bucci 8/9C, Rende (CS), 87036, Italy
²Altilia.ai, Technest - University of Calabria, Piazza Vermicelli, Rende (CS), 87036, Italy

Keywords: Weak-supervision, Data Programming, Deep Learning, Aspect Based Sentiment Analysis, Transformers, Natural Language Processing.
Abstract: Aspect Based Sentiment Analysis (ABSA) is receiving growing attention from the research community because it has applications in several real-world use cases. Training deep learning models for ABSA in vertical domains can be a laborious process requiring significant human effort to create proper training sets. In this work we present initial studies on the definition of an easy-to-use, flexible, and reusable weakly supervised method for the Aspect Sentiment Classification task of ABSA. Our method mainly consists of a process where templates of Labeling Functions automatically annotate sentences, and then the generative model of Snorkel constructs a probabilistic training set. In order to test the effectiveness and applicability of our method we trained machine learning models whose loss function is informed about the probabilistic nature of the labels. In particular, we fine-tuned BERT models on two well-known disjoint SemEval datasets related to laptops and restaurants.
1 INTRODUCTION
Aspect Based Sentiment Analysis (ABSA) is a Natural Language Processing (NLP) problem that is receiving growing attention from the research community because it can be productively used in many different real-world use cases (Hu and Liu, 2004). For example, ABSA enables extracting relevant features, along with buyers' opinions, from product reviews available online.
Figure 1: Examples of aspects and related sentiments.
ABSA comes in two main variants (Pontiki et al., 2014); one of these provides for two subtasks, Aspect Extraction (AE) and Aspect Sentiment Classification (ASC). Figure 1 shows two sentences extracted from the laptops and restaurants domains, respectively. Aspects are highlighted in blue; AE takes care of retrieving them from sentences. ASC, instead, identifies the polarity of the terms that refer to the aspect and express a sentiment/opinion. ABSA is a difficult problem to address in multiple domains because an opinion term that is positive in one domain may not be in another.
For example, in Figure 1, the term “hot” expresses a Negative opinion about the aspect “battery”, whereas the same term referred to the aspect “food” assumes a Positive connotation. A battery that gets hot is not desirable, while hot, tasty food is good.
In recent years ABSA methods based on deep learning have become mainstream. Deep learning automates the feature engineering process. Since 2012, when AlexNet (Krizhevsky et al., 2012) won the ImageNet Large Scale Visual Recognition Competition (http://www.image-net.org/challenges/LSVRC/), deep neural network architectures have also spread to the NLP area, becoming the state of the art for many tasks in this field (Devlin et al., 2019; Yang et al., 2019), including ABSA (Xu et al., 2019; Rietzler et al., 2019). Despite such success, developing enterprise-grade deep learning-based applications still poses many challenges.
In particular, deep learning models are often hungry for data, as they are rich in parameters. Leveraging supervised learning methods to train these models requires a large number of annotated examples. Such training sets are enormously expensive to create, especially when domain expertise is required. Moreover, the specifications of a system often change, requiring the re-labeling of the datasets. Therefore, it is not always possible to rely on subject matter experts for labeling the data. This is one of the most costly bottlenecks to a wide and pervasive adoption of deep learning methods in real-world use cases. Hence, alleviating the cost of human annotation is a major issue in supervised learning.
To tackle this problem, various approaches have been proposed, such as transfer learning, semi-supervised learning, where both unsupervised and supervised learning methods are exploited, and weak supervision. Transfer learning methods such as (Ding et al., 2017; Wang and Pan, 2018) rely on the fact that a model trained on a specific domain (source) can be exploited to do the same task on another domain (target), thus reducing the need for labeled data in the target domain.
One of the most important examples of semi-supervised learning recently appeared in the literature is Bidirectional Encoder Representations from Transformers (BERT) (Devlin et al., 2019), which strongly reduces the volume of labeled data needed. BERT is mainly a model that provides strong contextual word embeddings learned in an unsupervised way by training on large text corpora. But BERT is also a generic deep learning architecture for many NLP tasks, because it can be fine-tuned in a supervised way to learn various down-stream NLP tasks. The idea behind fine-tuning is that most of the features have already been learned and the model just needs to be specialized for the specific NLP task. This way, fine-tuning BERT for a specific NLP problem, such as ABSA, requires much less annotated data than learning the entire task from scratch.
Weak supervision simplifies the annotation process in order to make it more automatic and scalable, even though less accurate and noisier. Weakly supervised methods rely on several different data annotation techniques such as heuristics, distant supervision, pattern matching, weak classifiers, and so on. Recently, (Ratner et al., 2016) proposed data programming as a paradigm for semi-automatic dataset labeling, and Snorkel (Ratner et al., 2017) as the system that implements it. Data programming is based on the concept of Labeling Functions (LFs), where LFs are procedures that automatically assign labels to data on the basis of domain knowledge embedded in the form of annotation rules. (Bach et al., 2019) at Google extended Snorkel in order to achieve better scalability and knowledge base re-usability for enterprise-grade training set labeling.
In this paper we propose a weakly supervised approach to the ASC task in ABSA problems. In our approach we leverage the BERT fine-tuning method for sentiment classification as in (Xu et al., 2019), and Snorkel (Ratner et al., 2017) to apply data programming principles to reviews. The main contributions of this work are:
- The definition of a set of easy-to-use, flexible, and reusable LF templates that make the weakly supervised method viable for the ASC task;
- The experimental evaluation of the generality, effectiveness, and robustness of the weakly supervised method proposed in (Ratner et al., 2017) when applied to complex problems like the ASC task.
In order to prove the flexibility and robustness of our approach we tested it on two disjoint domains, laptops and restaurants. In particular, we used the datasets of SemEval task 4 subtask 2 (Pontiki et al., 2014). Finally, we compared the obtained results with those of supervised learning methods. The results appear remarkable for a weakly supervised system: they show that our approach can be used for practical purposes on multiple domains.
The rest of this paper is organized as follows: in Section 2 we introduce a list of ABSA works that try to reduce the need for human effort; in Section 3 we present our method in terms of the LF templates we have defined; in Section 4 the experiments carried out and the results obtained are shown and discussed; finally, in Section 5 conclusions are drawn and future work is presented.
2 RELATED WORK
While there is a large corpus of scientific papers related to the ABSA problem, to the best of our knowledge there are very few works that propose the application of weak-supervision methods to ABSA. (Pablos et al., 2014) uses some variations of (Qiu et al., 2009) and (Qiu et al., 2011) to perform AE and ASC.
In (Pablos et al., 2015) the AE task is done by bootstrapping a list of candidate domain aspect terms and using them to annotate the reviews of the same domain. The polarity detection is performed using a polarity lexicon exploiting the Word2Vec model (Mikolov et al., 2013) for each domain (however, the task is slightly different from ASC: they classify Entity-Attribute pairs, where Entity and Attribute belong to predefined lists, e.g. food, price, location for Entity and food-price, food-quality for Attribute).
(Pablos et al., 2018) presents an “almost unsupervised” ABSA system. Starting from a customer review dataset and a short list of aspect words, it extracts a list of words per aspect and two lists of positive and negative words for every selected aspect. It is based on a topic modeling approach combined with continuous word embeddings and a Maximum Entropy classifier.
(Purpura et al., 2018) performs the AE phase with a topic modeling technique called Non-negative Matrix Factorization, which allows the user to provide a list of seed words to guide the algorithm towards more significant topic definitions. The ASC is done by using a list of positive and negative words, with a few sentiment terms for each topic. This list is then extended with the Word2Vec model (Mikolov et al., 2013).
In (Pereg et al., 2019) aspect and opinion lexicons are extracted from an unlabeled dataset belonging to the same domain as the target domain. The process is initialized with a seed lexicon of generic opinion terms. New aspect and opinion terms are extracted by using the dependency rules proposed in (Qiu et al., 2011). The opinion lexicon is then filtered and scored, while the aspect lexicon can be modified by hand in a weakly supervised manner. ASC is performed on a target domain by detecting a direct or second-order dependency relation of any type between aspect-opinion pairs.
All related works are specifically designed for ABSA and leverage low-level techniques. We propose a higher-level approach, easy to extend, that simplifies and automates the annotation of sentiment terminology, making the ASC task easily applicable in multiple domains. Moreover, the nature of our approach allows using any discriminative model. In particular, it allows taking advantage of deep learning models that have reached the state of the art on NLP tasks.
3 WEAK SUPERVISION FOR ASC
Our intent is to address the ASC task in a weakly supervised way. For the scope of this paper we assume that the AE task has already been done, so we have as input a dataset composed of sentence-aspect pairs. This kind of dataset can be built in many ways: sentences can be easily extracted from reviews with a sentence splitter, and aspects can be retrieved with different approaches, see (Pablos et al., 2018; Pereg et al., 2019) for some examples.
Our weak-supervision method, specifically designed for the ASC task of the ABSA problem, is grounded on data programming (Ratner et al., 2016), a weakly supervised paradigm based on the concept of Labeling Functions (LFs), where LFs are procedures, designed by data scientists and/or subject matter experts, that automatically assign labels to data on the basis of domain knowledge embedded in the form of annotation rules.
More in detail, our method consists of a set of predefined, easy-to-use, and flexible LFs capable of automatically assigning a sentiment to sentence-aspect pairs. The method is based on two ideas: (i) it must require minimal NLP knowledge from the user, and (ii) it must be reusable in multiple domains with minimal effort for domain adaptation.
One of the most important characteristics of data programming is that LFs are noisy (e.g. different LFs may label the same data in different ways, and LFs can label false positive examples). In this paper, in order to deal with the ASC task of ABSA problems by data programming, we used the Snorkel system (Ratner et al., 2017), which enables handling the entire life-cycle of data programming tasks. Once LFs have been written, Snorkel applies them to data and automatically learns a generative model over these LFs which estimates their accuracies and correlations, with no ground-truth data needed. It is noteworthy that Snorkel applies LFs as a generative process which automatically de-noises the resulting dataset by learning the accuracy of the LFs along with their correlation structure. The output of this process is a training set composed of probabilistic labels that can be used as input for deep learning algorithms that use a noise-aware loss function.
In our weakly supervised method LFs take as input pairs of the form ⟨s, a⟩, where s is a sentence and a is an aspect, and return triples of the form ⟨s, a, l⟩, where s and a are the input sentence and aspect respectively, and l ∈ {Positive, Negative, Neutral} ∪ {Abstain} is the output label. In fact, an LF can choose to abstain if it does not have sufficient information to assign a sentiment.
A fundamental step of the annotation process in each LF is to chunk the input text, so every sentence given to an LF is analyzed by chunker(s). To implement chunker(s) we leverage the Stanford CoreNLP parser (Manning et al., 2014). chunker(s) returns a list of chunks belonging to two different types: NP and VP. To assign the tokens of a sentence to NP and VP chunks we use the following strategy while traversing the parse tree: all the tokens around an NP tag are assigned to an NP chunk, until we meet a VP tag; then all the tokens around the encountered VP tag are assigned to a VP chunk until we meet another NP tag, and so on. The following pseudo-code shows the structure of an LF in our method.
INPUT: <s, a>
OUTPUT: <s, a, l>
BEGIN METHOD:
  C := chunker(s)
  if a belongs to a chunk in C and
     a is the longest aspect for that chunk:
    D := dependency_parser(C, a)
    l := labeling(D)
    return <s, a, l>
  return <s, a, Abstain>
END METHOD
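For concreteness, the following is a deliberately simplified, runnable Python version of such an LF, written against Snorkel's labeling-function API. It omits the chunking and dependency steps described above and scores the whole sentence with TextBlob, so it is a sketch of the template's shape, not the paper's exact procedure; the 0.1 polarity threshold is our assumption.

from snorkel.labeling import labeling_function
from textblob import TextBlob

# Snorkel encodes classes as integers; -1 is the abstain label.
ABSTAIN, POSITIVE, NEGATIVE, NEUTRAL = -1, 0, 1, 2

@labeling_function()
def lf_textblob(x):
    # Abstain when the aspect does not appear in the sentence.
    if x.aspect.lower() not in x.sentence.lower():
        return ABSTAIN
    p = TextBlob(x.sentence).sentiment.polarity  # float in [-1, 1]
    if p > 0.1:
        return POSITIVE
    if p < -0.1:
        return NEGATIVE
    return NEUTRAL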
It is noteworthy that if the tokens of an aspect a, for a given pair ⟨s, a⟩, span multiple NP or VP chunks, the LF assigns the Abstain label to the pair. Moreover, if a chunk in C contains more than one aspect, the LF tries to assign the polarity label only to the aspect with the maximum number of tokens, and assigns the Abstain label to the other aspects in the same chunk. This behavior avoids introducing too much noise in the labeling process.
When the conditions on the aspect a are satisfied, the LF calls the dependency_parser(C, a) method. As dependency parser we use Stanford CoreNLP (Manning et al., 2014). dependency_parser(C, a) assigns to D the chunk that contains the aspect a and all those chunks that have a direct parsing dependency of some kind with it.
The method labeling(D) assigns a label l to the pair ⟨s, a⟩ by evaluating the sentiments of the chunks in D. To compute the sentiment polarity of chunks in D we adopt already trained external sentiment analyzers. In particular, we use Stanford CoreNLP (Manning et al., 2014), TextBlob (https://textblob.readthedocs.io/), NLTK (Bird, 2006), and Pattern (https://www.clips.uantwerpen.be/pages/pattern-en/).
We adopt two strategies for assigning labels. In the first strategy we compute the polarity of each single chunk in D. If all chunks have the same polarity, the method returns the corresponding label. If chunks have mixed Positive and Negative polarity, the method returns the Abstain label. When there are Neutral chunks mixed with at least one Positive chunk, the method returns the Positive label, and the same happens for mixed Neutral and Negative chunks. The second strategy consists of appending the chunks in D and computing the global polarity of the resulting text string. In this case the method simply returns the polarity l, where l ∈ {Positive, Negative, Neutral}. Considering the two different strategies and the four sentiment analyzers, the approach includes a total of 8 LFs, as sketched below.
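As an illustration, the following is a minimal sketch of the two strategies with TextBlob as the external analyzer, assuming chunks are plain strings; the 0.1 polarity threshold is our assumption, not a value from the paper.

from textblob import TextBlob

POSITIVE, NEGATIVE, NEUTRAL, ABSTAIN = 0, 1, 2, -1

def chunk_polarity(chunk, eps=0.1):
    # TextBlob polarity is a float in [-1, 1]; the threshold eps is an assumption.
    p = TextBlob(chunk).sentiment.polarity
    if p > eps:
        return POSITIVE
    if p < -eps:
        return NEGATIVE
    return NEUTRAL

def first_strategy(chunks):
    polarities = {chunk_polarity(c) for c in chunks}
    if POSITIVE in polarities and NEGATIVE in polarities:
        return ABSTAIN              # mixed Positive/Negative: abstain
    if POSITIVE in polarities:
        return POSITIVE             # Neutral chunks do not override Positive...
    if NEGATIVE in polarities:
        return NEGATIVE             # ...nor Negative
    return NEUTRAL

def second_strategy(chunks):
    # Append all chunks and score the resulting text globally.
    return chunk_polarity(" ".join(chunks))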
It is noteworthy that these labeling function templates are conceptually simple yet powerful, because they enable everyone to reuse the knowledge embedded in already available NLP tools to create new training sets. Table 1 and Table 2 show statistics about the LFs when applied to the laptops and restaurants datasets, respectively. In particular, the columns of the tables report the coverage, overlaps, conflicts, and empirical accuracy of the LFs when they are executed on a small number (150) of manually labeled examples called the dev set. The rows of the tables correspond to the eight LFs we defined by using the NLP tools and strategies described above, where each row contains the values computed for a specific labeling function.
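Snorkel provides utilities that compute exactly these per-LF statistics; a minimal sketch follows, assuming the dev set is a pandas DataFrame df_dev with gold labels in the array Y_dev (both names are our assumptions).

from snorkel.labeling import PandasLFApplier, LFAnalysis

lfs = [lf_textblob]  # in our method, the eight (analyzer, strategy) LF instances

# Build the label matrix over the dev set; abstains are encoded as -1.
applier = PandasLFApplier(lfs=lfs)
L_dev = applier.apply(df=df_dev)

# Per-LF Coverage, Overlaps, Conflicts, and Empirical Accuracy, as in Tables 1-2.
print(LFAnalysis(L=L_dev, lfs=lfs).lf_summary(Y=Y_dev))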
The experiments on laptops in Table 1 show that the LFs have about 77% coverage and 50% empirical accuracy, while Table 2 shows that on restaurants the coverage is about 69% and the empirical accuracy 50%. These results on coverage and empirical accuracy suggest that the defined LFs work properly and can be used to annotate the two datasets.
The result of the labeling process is a matrix of labels Λ ∈ ({Positive, Negative, Neutral} ∪ {Abstain})^{m×n}, where m is the cardinality of the training set and n is the number of LFs. This matrix is the input of the Snorkel generative model. Such a model produces a list of probabilistic training labels Ỹ = (ỹ₁, ..., ỹₘ), where each ỹᵢ ∈ ℝ³ is a probability distribution over the classes {Positive, Negative, Neutral}. This probabilistic dataset is the input of discriminative models that use noise-aware loss functions.
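In today's Snorkel API (the paper used the original generative model; the current LabelModel plays the same role) this step can be sketched as follows, with L_train denoting the label matrix Λ with abstains encoded as -1; the hyperparameter values shown are those selected for laptops in Section 4.2.

from snorkel.labeling.model import LabelModel

# Learn LF accuracies and correlations from the label matrix alone.
label_model = LabelModel(cardinality=3, verbose=False)  # Positive/Negative/Neutral
label_model.fit(L_train, n_epochs=100, lr=0.01, optimizer="adamax")

# One probability distribution over the three classes per training example.
Y_probs = label_model.predict_proba(L=L_train)          # shape (m, 3)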
4 EXPERIMENTS
In this section we describe the experiments aimed at verifying the effectiveness and robustness of our weak-supervision method for the ASC task. To carry out the experiments we used the discriminative models proposed in (Xu et al., 2019). Such models are obtained by post-training BERT (Devlin et al., 2019), a deep learning architecture based on Transformers (Vaswani et al., 2017). BERT is pre-trained on Wikipedia and the BooksCorpus dataset (Zhu et al., 2015) and is currently widely used in many models that reach the state of the art in several NLP tasks.
Post-training enables injecting into BERT the missing domain knowledge.
Table 1: LFs application stats on laptops domain.

LF                                    Coverage  Overlaps  Conflicts  Emp. Acc.
LF(StanfordCoreNLP, FirstStrategy)      0.7667    0.7667     0.5267     0.5478
LF(StanfordCoreNLP, SecondStrategy)     0.7933    0.7933     0.5333     0.5042
LF(TextBlob, FirstStrategy)             0.7400    0.7400     0.4867     0.4775
LF(TextBlob, SecondStrategy)            0.7933    0.7933     0.5333     0.5042
LF(NLTK, FirstStrategy)                 0.7533    0.7533     0.4933     0.4956
LF(NLTK, SecondStrategy)                0.7933    0.7933     0.5333     0.4874
LF(Pattern.en, FirstStrategy)           0.7467    0.7467     0.4933     0.4821
LF(Pattern.en, SecondStrategy)          0.7933    0.7933     0.5333     0.4790
Table 2: LFs application stats on restaurants domain.

LF                                    Coverage  Overlaps  Conflicts  Emp. Acc.
LF(StanfordCoreNLP, FirstStrategy)      0.6200    0.6200     0.3867     0.4946
LF(StanfordCoreNLP, SecondStrategy)     0.6800    0.6800     0.4200     0.5882
LF(TextBlob, FirstStrategy)             0.6667    0.6667     0.4067     0.5100
LF(TextBlob, SecondStrategy)            0.6800    0.6800     0.4200     0.5098
LF(NLTK, FirstStrategy)                 0.6667    0.6667     0.4133     0.5000
LF(NLTK, SecondStrategy)                0.6800    0.6800     0.4200     0.4098
LF(Pattern.en, FirstStrategy)           0.6667    0.6667     0.4067     0.5100
LF(Pattern.en, SecondStrategy)          0.6800    0.6800     0.4200     0.5000
(Xu et al., 2019) shows that post-training the model on a specific domain contributes to performance improvement. The model resulting from post-training is finally fine-tuned for the down-stream task. Post-training is the foundation of state-of-the-art ASC methods (Rietzler et al., 2019).
Figure 2 shows the input and output of BERT for the ASC task. First of all, the sentence and the aspect in input are tokenized by the WordPiece algorithm (Wu et al., 2016). Hence, q₁, ..., qₘ is the embedding of an input aspect with m tokens and d₁, ..., dₙ is the embedding of an input sentence with n tokens. [CLS] and [SEP] are two special tokens: [CLS] is used for classification problems, and h[CLS] is the aspect-aware representation of the whole input through BERT (for further detail see (Devlin et al., 2019)); the [SEP] token is used to separate two different inputs.
Figure 2: BERT input/output for the ASC task.
Discriminative models like BERT have to be informed about the probabilistic nature of the training set resulting from our weak-supervision method. Hence, we use a noise-aware loss function. The cross-entropy function fits perfectly for this purpose, because it measures the discrepancy between a true probability distribution and an estimated distribution (p and q respectively in Equation 1). In general, the estimated distribution is the output of a classifier, while the true distribution is usually a one-hot vector, where the bit set to one indicates which class the training example belongs to. In the probabilistic world of data programming, p is instead a probability distribution over the classes (i.e. the output of the Snorkel generative model for a training example).
H(p, q) = −Σ_{x∈X} p(x) log q(x)    (1)
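A minimal PyTorch sketch of this noise-aware loss (our illustration; the implementation in (Xu et al., 2019) may differ): the probabilistic labels from the generative model take the place of the usual one-hot targets.

import torch

def noise_aware_cross_entropy(logits, soft_labels):
    # Cross-entropy against probabilistic targets: H(p, q) = -sum_x p(x) log q(x).
    log_q = torch.log_softmax(logits, dim=-1)  # estimated distribution q, log space
    return -(soft_labels * log_q).sum(dim=-1).mean()

# Example: a batch of 2 examples over {Positive, Negative, Neutral}.
logits = torch.tensor([[2.0, 0.1, -1.0], [0.0, 1.5, 0.3]])
p = torch.tensor([[0.7, 0.1, 0.2], [0.1, 0.6, 0.3]])  # label-model outputs
print(noise_aware_cross_entropy(logits, p))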
4.1 Dataset
In order to test the robustness of our method and to compare the results with those of supervised algorithms, we chose two datasets belonging to SemEval task 4 subtask 2 (Pontiki et al., 2014) (SemEval from now on), coming from two disjoint domains, i.e. laptops and restaurants.
Each dataset is already split into training, development, and test sets. The number of examples in each set is shown in Table 3. Each partition was hand-labeled by subject matter experts with labels from the set {Positive, Negative, Neutral}.
In our experiments, we replaced the original annotations in the training sets with the probability distributions computed by the Snorkel generative models. We used the dev sets for tuning the LFs and the hyper-parameters of the generative models.
Table 3: Number of examples for each dataset.

              Train Set  Dev Set  Test Set
Laptops            2163      150       638
Restaurants        3452      150      1120
4.2 Experimental Settings
The hyperparameters of the generative models were searched through grid search. To tune the hyperparameters we used the dev sets of SemEval. Let us introduce the triple ⟨e, lr, o⟩ to denote a search configuration, where e ∈ {100, 200, 500, 1000, 2000} is the number of epochs, lr ∈ {0.01, 0.001, 0.0001} is the learning rate, and o ∈ {sgd, adam, adamax} is the optimizer.
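A sketch of this grid search against the modern Snorkel API; label_model.score and the "accuracy" metric are Snorkel's, while the loop structure and the variable names (L_train, L_dev, Y_dev) are our assumptions.

from itertools import product
from snorkel.labeling.model import LabelModel

best_acc, best_config = 0.0, None
for e, lr, o in product([100, 200, 500, 1000, 2000],
                        [0.01, 0.001, 0.0001],
                        ["sgd", "adam", "adamax"]):
    model = LabelModel(cardinality=3, verbose=False)
    model.fit(L_train, n_epochs=e, lr=lr, optimizer=o)
    # Evaluate against the 150 hand-labeled dev examples.
    acc = model.score(L=L_dev, Y=Y_dev, metrics=["accuracy"])["accuracy"]
    if acc > best_acc:
        best_acc, best_config = acc, (e, lr, o)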
The best configuration for the laptops domain is ⟨100, 0.01, adamax⟩. With these settings the generative model applied to SemEval produced a dataset of 1702 probabilistic examples on laptops. The best results for the restaurants domain are obtained with ⟨100, 0.001, adamax⟩. The cardinality of the probabilistic dataset produced by Snorkel on restaurants is 2471.
Table 4 shows the number of examples for each label. Because the labels are probabilistic, we report the class with the highest probability for every training example.
The restaurants training set produced by Snorkel turns out to be unbalanced. In order to balance it we limited the number of Positive and Neutral examples to 700. The results discussed in the next section are computed by averaging the metrics of 10 models trained with Positive and Neutral examples obtained through different samplings, as sketched after Table 4.
Table 4: Number of most probable examples for each label.

                       Positive  Negative  Neutral
Laptops                     627       448      627
Restaurants                1371       301      792
Restaurants (sampled)       700       301      700
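A minimal sketch of this balancing and sampling step, assuming the probabilistic training set lives in a pandas DataFrame df_restaurants with a column y_hat holding the most probable class (both names are hypothetical).

import pandas as pd

def balance(df, cap=700, seed=0):
    # Downsample each class to at most `cap` rows (only Positive and Neutral
    # exceed the cap on restaurants), then shuffle.
    parts = [g.sample(n=min(len(g), cap), random_state=seed)
             for _, g in df.groupby("y_hat")]
    return pd.concat(parts).sample(frac=1, random_state=seed)

# Ten differently sampled training sets, one per trained model.
samples = [balance(df_restaurants, seed=i) for i in range(10)]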
As in (Xu et al., 2019), we trained a simple softmax classifier whose output belongs to ℝ³ (3 is the number of polarities) on top of the BERT post-trained models. We fine-tuned the discriminative models for 4 epochs using a batch size of 32 and the Adam optimizer with a learning rate of 3e-5. Results are obtained by averaging 10 runs that sample the mini-batches differently.
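For illustration, this fine-tuning setup can be sketched with the Hugging Face transformers library as below; the checkpoint name and the aspect-first input order are our assumptions, since (Xu et al., 2019) start from their own post-trained weights.

import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased",
                                                      num_labels=3)
optimizer = torch.optim.Adam(model.parameters(), lr=3e-5)

def soft_ce(logits, p):
    # Noise-aware cross-entropy from Equation 1.
    return -(p * torch.log_softmax(logits, dim=-1)).sum(dim=-1).mean()

def train_step(aspects, sentences, soft_labels):
    # Encode each pair as [CLS] aspect [SEP] sentence [SEP], as in Figure 2.
    enc = tokenizer(aspects, sentences, padding=True, truncation=True,
                    return_tensors="pt")
    logits = model(**enc).logits          # shape (batch, 3)
    loss = soft_ce(logits, soft_labels)   # probabilistic labels from Snorkel
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()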
4.3 Results and Discussion
The following tables show accuracy and macro F1 results on the laptops and restaurants domains. On the left are the names of the fine-tuned models: XU-150, XU-300, and XU-450 are the models obtained by fine-tuning (Xu et al., 2019) with 150, 300, and 450 hand-labeled examples respectively, belonging to the SemEval training set. The examples were randomly extracted to obtain balanced datasets; in particular, we sampled the train set 10 times and averaged the results. XU-Weak is the model trained with the probabilistic labels computed by our weak-supervision method.
Table 5: Results of the experiments on laptops.

          Accuracy  Macro F1
XU-150       57.77     52.60
XU-300       71.59     68.00
XU-Weak      69.36     65.37
Table 6: Results of the experiments on restaurants.

          Accuracy  Macro F1
XU-150       48.62     39.15
XU-300       69.49     60.79
XU-450       77.93     67.57
XU-Weak      75.39     67.33
Results in Table 5 and Table 6 suggest that using automatically labeled probabilistic examples in discriminative models is promising. Furthermore, our initial experiments suggest that we can easily scale the number of labeled examples needed to get better performance. In particular, our method constitutes a way to deal with real-world use cases where the size of hand-labeled training sets may become a bottleneck for the implementation of ASC models.
Experimental results also seem to confirm that the approach can easily be used cross-domain. However, further experiments on bigger datasets are required to fully assess these initial intuitions.
The best results are obtained on the restaurants domain. This could be due to the fact that the restaurants model was post-trained for more epochs and with more examples in (Xu et al., 2019).
Our goal has been to offer an easy-to-use alternative to manual labeling for cross-domain usage of discriminative models for the ASC task. Our method is ready to use: it just needs a few labeled examples (we only used the 150 examples belonging to the SemEval dev sets) in order to tune the Snorkel generative model. It is most useful when applied to many different domains with lots of unlabeled data.
5 CONCLUSIONS AND FUTURE WORK

In this work we proposed a weakly supervised approach to the ASC task. We developed a method that exploits the data programming paradigm in order to address the problem. We tested the method on two disjoint domains, laptops and restaurants. We tuned a Snorkel generative model for each domain and used them to label the data in a probabilistic manner. The resulting probabilistic training sets were used to fine-tune the discriminative models proposed in (Xu et al., 2019), which had previously been post-trained with unlabeled domain data.
The experiments we carried out offer many hints for future work. In particular, the obtained results suggest that the approach can be used cross-domain. We plan to perform more experiments on larger datasets in order to confirm these initial intuitions. Our future work will focus on real-world datasets to perform extensive experiments on the scalability and performance of the method we have proposed in this paper. Moreover, we will improve the method by defining further LF templates while maintaining its simplicity.
REFERENCES
Bach, S. H., Rodriguez, D., Liu, Y., Luo, C., Shao, H.,
Xia, C., Sen, S., Ratner, A., Hancock, B., Alborzi, H.,
Kuchhal, R., Ré, C., and Malkin, R. (2019). Snorkel
drybell: A case study in deploying weak supervision
at industrial scale. In Boncz, P. A., Manegold, S.,
Ailamaki, A., Deshpande, A., and Kraska, T., edi-
tors, Proceedings of the 2019 International Confer-
ence on Management of Data, SIGMOD Conference
2019, Amsterdam, The Netherlands, June 30 - July 5,
2019., pages 362–375. ACM.
Bird, S. (2006). NLTK: the natural language toolkit. In
Calzolari, N., Cardie, C., and Isabelle, P., editors,
ACL 2006, 21st International Conference on Compu-
tational Linguistics and 44th Annual Meeting of the
Association for Computational Linguistics, Proceed-
ings of the Conference, Sydney, Australia, 17-21 July
2006. The Association for Computer Linguistics.
Devlin, J., Chang, M., Lee, K., and Toutanova, K. (2019).
BERT: pre-training of deep bidirectional transformers
for language understanding. In Burstein, J., Doran,
C., and Solorio, T., editors, Proceedings of the 2019
Conference of the North American Chapter of the As-
sociation for Computational Linguistics: Human Lan-
guage Technologies, NAACL-HLT 2019, Minneapolis,
MN, USA, June 2-7, 2019, Volume 1 (Long and Short
Papers), pages 4171–4186. Association for Computa-
tional Linguistics.
Ding, Y., Yu, J., and Jiang, J. (2017). Recurrent neural net-
works with auxiliary labels for cross-domain opinion
target extraction. In Proceedings of the Thirty-First
AAAI Conference on Artificial Intelligence, February
4-9, 2017, San Francisco, California, USA., pages
3436–3442.
Hu, M. and Liu, B. (2004). Mining and summarizing cus-
tomer reviews. In Kim, W., Kohavi, R., Gehrke, J.,
and DuMouchel, W., editors, Proceedings of the Tenth
ACM SIGKDD International Conference on Knowl-
edge Discovery and Data Mining, Seattle, Washing-
ton, USA, August 22-25, 2004, pages 168–177. ACM.
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). Im-
agenet classification with deep convolutional neural
networks. In Bartlett, P. L., Pereira, F. C. N., Burges,
C. J. C., Bottou, L., and Weinberger, K. Q., edi-
tors, Advances in Neural Information Processing Sys-
tems 25: 26th Annual Conference on Neural Informa-
tion Processing Systems 2012. Proceedings of a meet-
ing held December 3-6, 2012, Lake Tahoe, Nevada,
United States., pages 1106–1114.
Manning, C. D., Surdeanu, M., Bauer, J., Finkel, J. R.,
Bethard, S., and McClosky, D. (2014). The stanford
corenlp natural language processing toolkit. In Pro-
ceedings of the 52nd Annual Meeting of the Associa-
tion for Computational Linguistics, ACL 2014, June
22-27, 2014, Baltimore, MD, USA, System Demon-
strations, pages 55–60. The Association for Computer
Linguistics.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., and
Dean, J. (2013). Distributed representations of words
and phrases and their compositionality. In Burges, C.
J. C., Bottou, L., Ghahramani, Z., and Weinberger,
K. Q., editors, Advances in Neural Information Pro-
cessing Systems 26: 27th Annual Conference on Neu-
ral Information Processing Systems 2013. Proceed-
ings of a meeting held December 5-8, 2013, Lake
Tahoe, Nevada, United States., pages 3111–3119.
Pablos, A. G., Cuadros, M., and Rigau, G. (2014). V3: un-
supervised generation of domain aspect terms for as-
pect based sentiment analysis. In Nakov, P. and Zesch,
T., editors, Proceedings of the 8th International Work-
shop on Semantic Evaluation, SemEval@COLING
2014, Dublin, Ireland, August 23-24, 2014., pages
833–837. The Association for Computer Linguistics.
Pablos, A. G., Cuadros, M., and Rigau, G. (2015). V3:
unsupervised aspect based sentiment analysis for se-
meval2015 task 12. In Cer, D. M., Jurgens, D.,
Nakov, P., and Zesch, T., editors, Proceedings of
the 9th International Workshop on Semantic Eval-
uation, SemEval@NAACL-HLT 2015, Denver, Col-
orado, USA, June 4-5, 2015, pages 714–718. The As-
sociation for Computer Linguistics.
Pablos, A. G., Cuadros, M., and Rigau, G. (2018).
W2VLDA: almost unsupervised system for aspect
based sentiment analysis. Expert Syst. Appl., 91:127–
137.
Pereg, O., Korat, D., Wasserblat, M., Mamou, J., and Da-
gan, I. (2019). Absapp: A portable weakly-supervised
aspect-based sentiment extraction system. CoRR,
abs/1909.05608.
Pontiki, M., Galanis, D., Pavlopoulos, J., Papageorgiou,
H., Androutsopoulos, I., and Manandhar, S. (2014).
Semeval-2014 task 4: Aspect based sentiment analy-
sis. In Nakov, P. and Zesch, T., editors, Proceedings
of the 8th International Workshop on Semantic Evalu-
ation, SemEval@COLING 2014, Dublin, Ireland, Au-
gust 23-24, 2014., pages 27–35. The Association for
Computer Linguistics.
Purpura, A., Masiero, C., and Susto, G. A. (2018).
WS4ABSA: an nmf-based weakly-supervised ap-
proach for aspect-based sentiment analysis with ap-
plication to online reviews. In Soldatova, L. N., Van-
schoren, J., Papadopoulos, G. A., and Ceci, M., ed-
itors, Discovery Science - 21st International Confer-
ence, DS 2018, Limassol, Cyprus, October 29-31,
2018, Proceedings, volume 11198 of Lecture Notes
in Computer Science, pages 386–401. Springer.
Qiu, G., Liu, B., Bu, J., and Chen, C. (2009). Expanding do-
main sentiment lexicon through double propagation.
In Boutilier, C., editor, IJCAI 2009, Proceedings of
the 21st International Joint Conference on Artificial
Intelligence, Pasadena, California, USA, July 11-17,
2009, pages 1199–1204.
Qiu, G., Liu, B., Bu, J., and Chen, C. (2011). Opinion word
expansion and target extraction through double prop-
agation. Computational Linguistics, 37(1):9–27.
Ratner, A., Bach, S. H., Ehrenberg, H. R., Fries, J. A., Wu,
S., and Ré, C. (2017). Snorkel: Rapid training data
creation with weak supervision. PVLDB, 11(3):269–
282.
Ratner, A. J., Sa, C. D., Wu, S., Selsam, D., and Ré, C.
(2016). Data programming: Creating large training
sets, quickly. In Lee, D. D., Sugiyama, M., von
Luxburg, U., Guyon, I., and Garnett, R., editors, Ad-
vances in Neural Information Processing Systems 29:
Annual Conference on Neural Information Process-
ing Systems 2016, December 5-10, 2016, Barcelona,
Spain, pages 3567–3575.
Rietzler, A., Stabinger, S., Opitz, P., and Engl, S. (2019).
Adapt or get left behind: Domain adaptation through
BERT language model finetuning for aspect-target
sentiment classification. CoRR, abs/1908.11860.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones,
L., Gomez, A. N., Kaiser, L., and Polosukhin, I.
(2017). Attention is all you need. In Guyon, I., von
Luxburg, U., Bengio, S., Wallach, H. M., Fergus, R.,
Vishwanathan, S. V. N., and Garnett, R., editors, Ad-
vances in Neural Information Processing Systems 30:
Annual Conference on Neural Information Processing
Systems 2017, 4-9 December 2017, Long Beach, CA,
USA, pages 6000–6010.
Wang, W. and Pan, S. J. (2018). Recursive neural struc-
tural correspondence network for cross-domain aspect
and opinion co-extraction. In Proceedings of the 56th
Annual Meeting of the Association for Computational
Linguistics, ACL 2018, Melbourne, Australia, July 15-
20, 2018, Volume 1: Long Papers, pages 2171–2181.
Wu, Y., Schuster, M., Chen, Z., Le, Q. V., Norouzi,
M., Macherey, W., Krikun, M., Cao, Y., Gao, Q.,
Macherey, K., Klingner, J., Shah, A., Johnson, M.,
Liu, X., Kaiser, L., Gouws, S., Kato, Y., Kudo, T.,
Kazawa, H., Stevens, K., Kurian, G., Patil, N., Wang,
W., Young, C., Smith, J., Riesa, J., Rudnick, A.,
Vinyals, O., Corrado, G., Hughes, M., and Dean, J.
(2016). Google’s neural machine translation system:
Bridging the gap between human and machine trans-
lation. CoRR, abs/1609.08144.
Xu, H., Liu, B., Shu, L., and Yu, P. S. (2019). BERT post-
training for review reading comprehension and aspect-
based sentiment analysis. In Burstein, J., Doran, C.,
and Solorio, T., editors, Proceedings of the 2019 Con-
ference of the North American Chapter of the Asso-
ciation for Computational Linguistics: Human Lan-
guage Technologies, NAACL-HLT 2019, Minneapolis,
MN, USA, June 2-7, 2019, Volume 1 (Long and Short
Papers), pages 2324–2335. Association for Computa-
tional Linguistics.
Yang, Z., Dai, Z., Yang, Y., Carbonell, J. G., Salakhutdi-
nov, R., and Le, Q. V. (2019). Xlnet: Generalized au-
toregressive pretraining for language understanding.
CoRR, abs/1906.08237.
Zhu, Y., Kiros, R., Zemel, R. S., Salakhutdinov, R., Urtasun,
R., Torralba, A., and Fidler, S. (2015). Aligning books
and movies: Towards story-like visual explanations by
watching movies and reading books. In 2015 IEEE
International Conference on Computer Vision, ICCV
2015, Santiago, Chile, December 7-13, 2015, pages
19–27. IEEE Computer Society.