trained models, since they are highly effective in the field of NLP and the tasks in the domain are recurrent and repetitive, dealing with one homogeneous entity: language. This means that hyperparameters that are optimal in a large model will very likely yield comparable results on smaller tasks of the same nature.
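As a minimal illustration of this idea (the values and the placeholder evaluation function below are assumptions made for the sketch, not settings taken from any study surveyed in this paper), one can start from hyperparameters reported for a large pre-trained model and search only a narrow neighbourhood around them when moving to a smaller task:

import random

# Hyperparameter values as commonly reported for fine-tuning large
# pre-trained NLP models (illustrative assumptions only).
reported = {"learning_rate": 2e-5, "batch_size": 32, "dropout": 0.1}

def evaluate(params):
    # Placeholder for training and validating the smaller model; a
    # synthetic score stands in for the real validation metric here.
    return 1.0 - abs(params["learning_rate"] - 2e-5) * 1e4 - abs(params["dropout"] - 0.1)

def local_search(center, trials=10, seed=0):
    # Random search restricted to a narrow neighbourhood around the
    # reported values, reflecting the assumption that hyperparameters
    # transfer between tasks of the same nature.
    rng = random.Random(seed)
    best, best_score = dict(center), evaluate(center)
    for _ in range(trials):
        candidate = {
            "learning_rate": center["learning_rate"] * rng.uniform(0.5, 2.0),
            "batch_size": rng.choice([16, 32, 64]),
            "dropout": min(max(center["dropout"] + rng.uniform(-0.05, 0.05), 0.0), 0.5),
        }
        score = evaluate(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score

print(local_search(reported))

Restricting the search to such a neighbourhood is usually far cheaper than a full-range search, which is the practical benefit of transferring hyperparameters between tasks of the same nature.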
6  CONCLUSIONS 
This paper provides insight into good practices for hyperparameter optimization in natural language processing tasks. We found that there are common traits in the hyperparameter optimization process and that particular HPO techniques work well with certain tasks. Moreover, the values reported in this paper from certain studies can be reproduced in similar tasks. Recent developments in transformer architectures have paved the way for optimal models down the line by means of transfer learning, which ultimately benefits hyperparameter optimization in NLP.