3.3 GPT model based on Transformer
GPT uses the Transformer's decoder architecture
(Radford, 2018): GPT is primarily used to generate
text, and the decoder is the component designed for
generation tasks. GPT predicts the next word from the
words that precede it, which means the input sequence
is processed in one direction only. A one-way (causal)
self-attention mechanism is therefore used, attending
only to earlier words in the sequence, which gives the
model strong capabilities in text generation and
interactive question answering.
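To make the directional attention concrete, the following minimal NumPy sketch shows how a causal (one-way) self-attention mask restricts each position to attend only to itself and earlier positions; the array shapes, values, and function name are illustrative and do not reproduce the exact GPT implementation.

```python
# Minimal sketch of one-way (causal) self-attention, assuming single-head
# scaled dot-product attention; shapes and values are illustrative only.
import numpy as np

def causal_self_attention(Q, K, V):
    """Scaled dot-product attention where position i may only attend to j <= i."""
    seq_len, d_k = Q.shape
    scores = Q @ K.T / np.sqrt(d_k)                      # (seq_len, seq_len)
    mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)  # future positions
    scores = np.where(mask, -np.inf, scores)             # block attention to the future
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over allowed keys
    return weights @ V

# Toy usage: 4 tokens with d_k = 8; the first token can only attend to itself.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
print(causal_self_attention(Q, K, V).shape)  # (4, 8)
```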
In addition, a defining feature of GPT is its very
large number of parameters, which helps ensure the
quality and accuracy of interactive dialogue. The
pre-trained GPT model is composed of 12 Transformer
layers, with a model dimension of
$d_{\text{model}} = 768$ (1)
The total number of parameters reached 110 million
(Zheng, 2021).
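As a rough illustration of where this parameter count comes from, the short calculation below estimates the size of a 12-layer model with the dimension given in Eq. (1); the vocabulary size, context length, and feed-forward width are assumed values chosen to match common descriptions of the original GPT, so the result is only approximate.

```python
# Back-of-the-envelope parameter count for a 12-layer GPT-style decoder.
# Vocabulary size, context length, and feed-forward width are assumptions;
# biases and layer-norm parameters are ignored.
d_model  = 768            # model dimension, Eq. (1)
n_layers = 12             # Transformer layers
d_ff     = 4 * d_model    # feed-forward inner dimension (3072)
vocab    = 40_000         # approximate BPE vocabulary size (assumption)
ctx_len  = 512            # context window (assumption)

embeddings = vocab * d_model + ctx_len * d_model   # token + position embeddings
attention  = 4 * d_model * d_model                 # W_q, W_k, W_v, W_o per layer
ffn        = 2 * d_model * d_ff                    # two linear maps per layer
total      = embeddings + n_layers * (attention + ffn)

print(f"total parameters: {total / 1e6:.1f}M")     # ~116M, in line with the
                                                   # roughly 110-117 million
                                                   # figures quoted in the text
```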
As model parameters have continued to grow, OpenAI
has repeatedly iterated on the GPT model. GPT-3,
launched in 2020 and the model family that later
underpinned ChatGPT, was one of the most powerful and
extensive language models of its time (Gupta, 2023),
and its output is highly consistent and contextually
relevant. GPT-3's
ability to learn from only a small number of samples
is a major advance: it can quickly grasp a new task
from a few in-context examples. This improvement stems
largely from the sharp increase in parameters compared
with GPT-2 and GPT-1: GPT-3 encompasses 175 billion
parameters, whereas its predecessors GPT-1 and GPT-2
have 117 million and 1.5 billion parameters,
respectively.
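The few-shot ability described above can be illustrated with a simple prompt-construction sketch: a handful of labelled examples are placed directly in the input, and the model is expected to continue the pattern without any fine-tuning; the task, example texts, and formatting below are purely illustrative assumptions.

```python
# Sketch of a few-shot (in-context) prompt for a GPT-3-style model.
# The sentiment task, example reviews, and prompt format are illustrative only.
examples = [
    ("The plot was dull and the acting was worse.", "negative"),
    ("A moving, beautifully shot film.",            "positive"),
    ("I checked my watch every five minutes.",      "negative"),
]
query = "An absolute delight from start to finish."

prompt = "Classify the sentiment of each movie review.\n\n"
for text, label in examples:
    prompt += f"Review: {text}\nSentiment: {label}\n\n"
prompt += f"Review: {query}\nSentiment:"

print(prompt)
# Given this prompt, the model is expected to continue with " positive";
# the few in-context examples alone define the task.
```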
4 CONCLUSIONS
As an important branch of artificial intelligence, NLP
has made remarkable progress in theory and
application. NLP has undergone significant evolution,
transitioning from early rule-based methods to
contemporary models powered by deep learning. This
series of advancements has substantially improved
the ability of computers to comprehend and generate
human language, enabling more sophisticated and
natural interactions. Despite these advancements, the
field of NLP continues to confront numerous
challenges that impede its further development and
broader application. In an effort to surmount these
obstacles, researchers are persistently exploring novel
approaches and techniques, including pre-training
language models, multimodal learning, and
reinforcement learning, to boost the performance and
adaptability of the models. In the future, natural
language processing technology will continue to
develop rapidly and to integrate deeply with other
technologies such as computer vision, speech
recognition, and machine learning, forming more
intelligent and efficient artificial intelligence systems.
These systems will be able to better understand
human language and enable smoother human-
computer interaction.
REFERENCES
Ardkhani, P., Vahedi, A., & Aghababa, H. 2023.
Challenges in natural language processing and natural
language understanding by considering both technical
and natural domains. 2023 6th International Conference
on Pattern Recognition and Image Analysis (IPRIA),
Qom, Iran, Islamic Republic of, pp. 1-5.
Azhar, U., & Nazir, A. 2024. Exploring the natural
language generation: Current trends and research
challenges. 2024 International Conference on
Engineering & Computing Technologies (ICECT),
Islamabad, Pakistan, pp. 1-6.
Chang, M. W., Devlin, J., Lee, K., & Toutanova, K. 2018.
BERT: Pre-training of deep bidirectional transformers for
language understanding. arXiv preprint arXiv:1810.04805.
Das, S., & Das, D. 2024. Natural language processing (NLP)
techniques: Usability in human-computer interactions.
2024 6th International Conference on Natural
Language Processing (ICNLP), Xi'an, China, pp. 783-
787.
Ganesh, A., Strubell, E., & McCallum, A. 2019. Energy and
policy considerations for deep learning in NLP. arXiv
preprint arXiv:1906.02243.
Gupta, N. K., Chaudhary, A., Singh, R., & Singh, R. 2023.
ChatGPT: Exploring the capabilities and limitations of
a large language model for conversational AI. 2023
International Conference on Advances in Computation,
Communication and Information Technology
(ICAICCIT), Faridabad, India, pp. 139-142.
Guo, D. C. 2024. Research on natural language
understanding problems in open-world scenarios
(Master’s thesis, Beijing University of Posts and
Telecommunications).
Jiang, L., Tang, H. L., & Chen, Y. J. 2024. A review of
natural language processing based on the Transformer
model. Modern Computer, (14), 31-35.
Kurt, U., & Çayir, A. 2023. A modern Turkish poet: Fine-
tuned GPT-2. 2023 8th International Conference on
Computer Science and Engineering (UBMK), Burdur,
Türkiye, pp. 1-5.
Li, X., Wang, S., Wang, Z., & Zhu, J. 2021. A review of
natural language generation. Computer Applications,
41(05), 1227-1235.
Mitra, A. 2020. Sentiment analysis using machine learning
approaches (lexicon based on movie review dataset).
Journal of Ubiquitous Computing and Communication
Technologies (UCCT), 2(3), 145-152.