ternational Conference on Social Networks Analysis,
Management and Security, pages 284–289.
Goldberg, Y. (2016). A primer on neural network models
for natural language processing. Journal of Artificial
Intelligence Research, 57(1):345–420.
Gomez, A. N., Ren, M., Urtasun, R., and Grosse, R. B.
(2017). The reversible residual network: Backpropa-
gation without storing activations. In Proceedings of
the 31st International Conference on Neural Informa-
tion Processing Systems, pages 2211–2221.
Gonçalo Oliveira, H. and Cardoso, N. (2009). SAHARA: An online service for HAREM named entity recognition evaluation. In Proceedings of the 7th Brazilian Symposium in Information and Human Language Technology, pages 171–174.
Gutmann, M. U. and Hyvärinen, A. (2012). Noise-contrastive estimation of unnormalized statistical models, with applications to natural image statistics. The Journal of Machine Learning Research, 13(1):307–361.
Hinton, G., Vinyals, O., and Dean, J. (2015). Distill-
ing the knowledge in a neural network. CoRR,
abs/1503.02531.
Joulin, A., Grave, E., Bojanowski, P., and Mikolov, T.
(2016). Bag of tricks for efficient text classification.
CoRR, abs/1607.01759.
Kiros, R., Zhu, Y., Salakhutdinov, R., Zemel, R. S., Tor-
ralba, A., Urtasun, R., and Fidler, S. (2015). Skip-
Thought vectors. In Proceedings of the 29th Con-
ference on Neural Information Processing Systems,
pages 1532–1543.
Lafferty, J. D., McCallum, A., and Pereira, F. C. N. (2001).
Conditional random fields: Probabilistic models for
segmenting and labeling sequence data. In Proceed-
ings of the 18th International Conference on Machine
Learning, pages 282–289.
Lample, G. and Conneau, A. (2019). Cross-lingual lan-
guage model pretraining. CoRR, abs/1901.07291.
Lan, Z., Chen, M., Goodman, S., Gimpel, K., Sharma, P.,
and Soricut, R. (2019). ALBERT: A lite BERT for
self-supervised learning of language representations.
CoRR, abs/1909.11942.
Levy, O. and Goldberg, Y. (2014). Dependency-based word
embeddings. In Proceedings of the 52nd Annual Meet-
ing of the Association for Computational Linguistics,
pages 302–308.
Lewis, M., Liu, Y., Goyal, N., Ghazvininejad, M., Mo-
hamed, A., Levy, O., Stoyanov, V., and Zettlemoyer,
L. (2019). BART: Denoising sequence-to-sequence
pre-training for natural language generation, transla-
tion, and comprehension. CoRR, abs/1910.13461.
Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D.,
Levy, O., Lewis, M., Zettlemoyer, L., and Stoyanov,
V. (2019). RoBERTa: A robustly optimized BERT pre-
training approach. CoRR, abs/1907.11692.
Marsh, E. and Perzanowski, D. (1998). MUC-7 evaluation
of IE technology: Overview of results. In Proceedings
of the 7th Message Understanding Conference.
Mikolov, T., Chen, K., Corrado, G. S., and Dean, J. (2013a).
Efficient estimation of word representations in vector
space. CoRR, abs/1301.3781.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., and
Dean, J. (2013b). Distributed representations of words
and phrases and their compositionality. In Advances in Neural Information Processing Systems, pages 3111–3119.
Morin, F. and Bengio, Y. (2005). Hierarchical probabilistic
neural network language model. In Proceedings of the
10th International Workshop on Artificial Intelligence
and Statistics, pages 246–252.
Pennington, J., Socher, R., and Manning, C. D. (2014).
GloVe: Global vectors for word representation. In
Proceedings of the 2014 Conference on Empirical
Methods in Natural Language Processing, pages
1532–1543.
Peters, M., Neumann, M., Iyyer, M., Gardner, M., Clark,
C., Lee, K., and Zettlemoyer, L. (2018). Deep con-
textualized word representations. In Proceedings of
the 2018 Conference of the North American Chapter
of the Association for Computational Linguistics: Hu-
man Language Technologies, pages 2227–2237.
Pires, A. R. O. (2017). Named entity extraction from Portuguese web text. Master's thesis, Porto University.
Sanh, V., Debut, L., Chaumond, J., and Wolf, T. (2019).
DistilBERT, a distilled version of BERT: smaller,
faster, cheaper and lighter. CoRR, abs/1910.01108.
Santos, D. and Cardoso, N. (2007). Reconhecimento de entidades mencionadas em português: Documentação e actas do HAREM, a primeira avaliação conjunta na área. Linguateca.
Shazeer, N., Cheng, Y., Parmar, N., Tran, D., Vaswani, A.,
Koanantakool, P., Hawkins, P., Lee, H., Hong, M.,
Young, C., et al. (2018). Mesh-TensorFlow: Deep
learning for supercomputers. In Advances in Neural Information Processing Systems, pages 10414–10423.
Souza, F., Nogueira, R. F., and de Alencar Lotufo, R.
(2019). Portuguese named entity recognition using
BERT-CRF. CoRR, abs/1909.10649.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J.,
Jones, L., Gomez, A. N., Kaiser, L., and Polo-
sukhin, I. (2017). Attention is all you need. CoRR,
abs/1706.03762.
Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue,
C., Moi, A., Cistac, P., Rault, T., Louf, R., Funtow-
icz, M., et al. (2019). HuggingFace’s Transformers:
State-of-the-art natural language processing. CoRR,
abs/1910.03771.
Xun, G., Li, Y., Zhao, W. X., Gao, J., and Zhang, A. (2017).
A correlated topic model using word embeddings. In
Proceedings of the 26th International Joint Confer-
ence on Artificial Intelligence, pages 4207–4213.
Yang, Z., Dai, Z., Yang, Y., Carbonell, J., Salakhutdinov, R.,
and Le, Q. V. (2019). XLNet: Generalized autoregres-
sive pretraining for language understanding. CoRR,
abs/1906.08237.