connected splicing operation are sent to the softmax classifier to calculate the final sentiment polarity classification result, while the GRAND regular optimization algorithm adds perturbations during training to improve the model's robustness and generalizability. The overall network architecture of the proposed AGTG text sentiment classification model is shown in Figure 2.
[Figure 2 depicts the AGTG pipeline: ALBERT and GloVe word vectors (dimensionality d1 and d2) form matrices X and Y, which are spliced into matrix Z of dimensionality d1+d2; convolution kernels 1-3 yield pooled features p1-p3, GRAND injects perturbations during training, and a softmax layer produces the classification output.]
Figure 2. AGTG architecture diagram
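To make the splicing step in Figure 2 concrete, the following is a minimal sketch, assuming PyTorch and illustrative tensor sizes (the dimensions and names below are not taken from the paper), of how token-level ALBERT and GloVe vectors could be concatenated into matrix Z before the convolutional layers.

```python
import torch

# Hedged sketch of the vector-splicing step in Figure 2; all sizes and
# names below are illustrative assumptions, not values from the paper.
batch, seq_len, d1, d2 = 32, 256, 768, 300
X = torch.randn(batch, seq_len, d1)   # ALBERT token vectors  -> matrix X
Y = torch.randn(batch, seq_len, d2)   # GloVe token vectors   -> matrix Y
Z = torch.cat([X, Y], dim=-1)         # spliced matrix Z, dimensionality d1 + d2
assert Z.shape == (batch, seq_len, d1 + d2)
```

In practice, X and Y would come from the ALBERT encoder and a GloVe lookup table aligned to the same tokenization rather than from random tensors.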
4 EXPERIMENT
4.1 Parameters
In the experiments, the adopted ALBERT pre-training model was further pre-trained in the target task domain by performing the ITPT task with the following key parameters: train_batch_size was set to 4096, eval_batch_size to 64, max_seq_length to 256, max_predictions_per_seq to 20, num_train_steps to 300000, num_warmup_steps to 1500, learning_rate to 1e-3, and save_checkpoints_steps to 50000.
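For reference, the ITPT settings above can be collected in one place. The dictionary below simply restates the reported values; the key names mirror the parameter names used in the text rather than the flags of any particular pre-training script.

```python
# Restatement of the ITPT further-pre-training hyperparameters listed above;
# keys mirror the parameter names in the text, not a specific tool's flags.
itpt_config = {
    "train_batch_size": 4096,
    "eval_batch_size": 64,
    "max_seq_length": 256,
    "max_predictions_per_seq": 20,
    "num_train_steps": 300_000,
    "num_warmup_steps": 1500,
    "learning_rate": 1e-3,
    "save_checkpoints_steps": 50_000,
}
```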
With the other parameters fixed, the model parameters involved in the experiments were varied over several runs to find the configuration that yielded the best classification performance. The key parameters were set as follows: train_epochs was set to 8, batch_size to 64, max_seq_len to 256, learning_rate to 3e-5, the activation function to ReLU, dropout_rate to 0.1, the convolution kernel sizes to 3, 4, and 5, and the number of convolution kernels to 128.
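The classification head described above (kernel sizes 3, 4, and 5 with 128 kernels each, ReLU activation, and a dropout rate of 0.1) can be sketched as follows. This is a minimal PyTorch illustration assuming the spliced matrix Z is fed to parallel 1-D convolutions with max-pooling; the class count, module name, and exact wiring are assumptions rather than the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvHead(nn.Module):
    """Hedged sketch of a multi-kernel convolution + softmax classifier
    using the hyperparameters reported above (illustrative wiring)."""
    def __init__(self, embed_dim, num_classes=2,
                 kernel_sizes=(3, 4, 5), num_kernels=128, dropout=0.1):
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, num_kernels, k) for k in kernel_sizes])
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(num_kernels * len(kernel_sizes), num_classes)

    def forward(self, z):              # z: (batch, seq_len, embed_dim)
        z = z.transpose(1, 2)          # Conv1d expects (batch, dim, seq_len)
        # Max-pooled feature maps of the three kernels (p1, p2, p3 in Figure 2)
        pooled = [F.relu(conv(z)).max(dim=-1).values for conv in self.convs]
        features = self.dropout(torch.cat(pooled, dim=-1))
        return F.log_softmax(self.fc(features), dim=-1)
```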
For GRAND, the adjustment parameters in both the adversarial regularization and the approximate point optimization were set to 0.5, and the regularization perturbation parameter was set to 1e-5. To simplify the iterations, the numbers of regular optimization iterations were both set to 2, the update parameter was set to 1e-3, the initialized standard deviation of the perturbed samples was set to 1e-5, and the acceleration parameter was set to 0.3.
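As an illustration of how such training-time perturbations are typically injected, the sketch below computes a smoothness penalty from a small adversarial perturbation of the input embeddings, using the inner iteration count, step size, initial standard deviation, and regularization weight reported above. It is a generic adversarial-regularization sketch under stated assumptions, not the exact GRAND update (the proximal-point and acceleration terms are omitted), and every identifier and default value is illustrative.

```python
import torch
import torch.nn.functional as F

def smoothness_penalty(model_fn, embeddings, noise_std=1e-5,
                       step_size=1e-3, n_iters=2, reg_weight=1e-5):
    """Generic sketch: penalize changes in the model's predictions under a
    small learned perturbation of the input embeddings (illustrative names)."""
    with torch.no_grad():
        clean = F.softmax(model_fn(embeddings), dim=-1)
    # Initialize the perturbation with a small standard deviation.
    delta = torch.randn_like(embeddings) * noise_std
    for _ in range(n_iters):
        delta.requires_grad_(True)
        adv_log = F.log_softmax(model_fn(embeddings + delta), dim=-1)
        inner = F.kl_div(adv_log, clean, reduction="batchmean")
        grad, = torch.autograd.grad(inner, delta)
        # Move the perturbation in the direction that changes predictions most.
        delta = (delta + step_size * grad / (grad.norm() + 1e-12)).detach()
    adv_log = F.log_softmax(model_fn(embeddings + delta), dim=-1)
    # Added to the task loss; gradients flow back into the model parameters.
    return reg_weight * F.kl_div(adv_log, clean, reduction="batchmean")
```

The returned penalty would be added to the cross-entropy loss at each training step before back-propagation.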
4.2 Comparison with Baseline
In order to verify the advantages of our proposed
AGTG, some representative models in text sentiment
analysis studies were selected as baseline models, and
comparative experiments on model performance
were conducted in the same experimental
environment, mainly including the following models:
BERT: Devlin et al. proposed a bidirectional Transformer architecture for pre-training language models; the officially released Chinese model was pre-trained on a large Chinese corpus, giving it strong language comprehension capability.
BERT-wwm: Cui et al. proposed a variant of BERT that improves its pre-training by replacing the traditional token-level masking with whole-word masking. This makes fuller use of the characteristics of the Chinese language and enables the model to better understand the relationships and semantics between words, achieving better performance on NLP tasks.
ERNIE: Zhang et al. proposed a Transformer-based pre-trained language representation model that integrates entity information and relational knowledge to enhance the representation capability of pre-trained language models. It employs a masking mechanism that masks entities in the input text and requires the model to predict the masked entities from their context and external knowledge, fusing knowledge with linguistic semantic information. This allows the model to learn entity representations and their relationships more effectively, improving performance on NLP tasks.
RoBERTa: Liu et al. proposed a robustly optimized self-supervised pre-training model based on BERT that aims to improve BERT's performance by using more pre-training data, larger batch sizes, and dynamic masking in place of static masking.
DeBERTa: He et al. proposed a self-supervised pre-training model based on BERT that introduces a decoding-enhancement technique, incorporating the decoder output into the self-attention mechanism so the model better captures relations within sentences, and uses a disentangled attention mechanism to better distinguish different features.