
He, P., Liu, X., Gao, J., and Chen, W. (2021b). DeBERTa: Decoding-enhanced BERT with disentangled attention. arXiv preprint arXiv:2006.03654.
Hochreiter, S. and Schmidhuber, J. (1997). Long short-term
memory. Neural Computation, 9(8):1735–1780.
Holtzman, A., Buys, J., Du, L., Forbes, M., and Choi, Y.
(2020). The curious case of neural text degeneration.
arXiv preprint arXiv:1904.09751.
Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., and Chen, W. (2021). LoRA: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685.
Jiang, A. Q., Sablayrolles, A., Mensch, A., Bamford, C., Chaplot, D. S., de las Casas, D., and El Sayed, W. (2023). Mistral 7B. arXiv preprint arXiv:2310.06825.
Joachims, T. (1997). A probabilistic analysis of the Rocchio algorithm with TFIDF for text categorization. In ICML, volume 97, pages 143–151.
Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
King, J., Baffour, P., Crossley, S., Holbrook, R., and Demkin, M. (2023). LLM - Detect AI Generated Text. Kaggle.
Kirchenbauer, J., Geiping, J., Wen, Y., Katz, J., Miers, I., and Goldstein, T. (2023). A watermark for large language models. In International Conference on Machine Learning, pages 17061–17084. PMLR.
Lundberg, S. M. and Lee, S.-I. (2017). A unified approach to interpreting model predictions. In NIPS'17: Proceedings of the 31st International Conference on Neural Information Processing Systems, pages 4768–4777.
Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013).
Efficient estimation of word representations in vector
space. arXiv preprint arXiv:1301.3781.
Mitchell, E., Lee, Y., Khazatsky, A., Manning, C. D., and Finn, C. (2023). DetectGPT: Zero-shot machine-generated text detection using probability curvature. In International Conference on Machine Learning, pages 24950–24962. PMLR.
Muñoz-Ortiz, A., Gómez-Rodríguez, C., and Vilares, D. (2024). Contrasting linguistic patterns in human and LLM-generated news text. Artificial Intelligence Review, 57(10):265.
Nguyen, T., Hatua, A., and Sung, A. (2023). How to detect AI-generated texts? In IEEE 14th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON). IEEE.
Ribeiro, M. T., Singh, S., and Guestrin, C. (2016). "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1135–1144.
Sadasivan, V. S., Kumar, A., Balasubramanian, S., Wang, W., and Feizi, S. (2023). Can AI-generated text be reliably detected? arXiv preprint arXiv:2303.11156.
Sennrich, R., Haddow, B., and Birch, A. (2015). Neural
machine translation of rare words with subword units.
arXiv preprint arXiv:1508.07909.
Soboleva, D., Al-Khateeb, F., Myers, R., Steeves, J., Hestness, J., and Dey, N. (2023). SlimPajama: A 627B token cleaned and deduplicated version of RedPajama.
Tian, E. and Cui, A. (2023). GPTZero: Towards detection of AI-generated text using zero-shot and supervised methods.
Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M. A., Lacroix, T., and Lample, G. (2023). LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.
Verma, V., Fleisig, E., Tomlin, N., and Klein, D. (2024). Ghostbuster: Detecting text ghostwritten by large language models. arXiv preprint arXiv:2305.15047.
Yang, X., Pan, L., Zhao, X., Chen, H., Petzold, L., Wang, W. Y., and Cheng, W. (2023). A survey on detection of LLMs-generated content. arXiv preprint arXiv:2310.15654.