
Figure 4: If a token sequence is highly popular in the pre-training data of an LLM, it produces an effect similar to that of UTFC.
6 CONCLUSION
This work is the first to propose a paradigm for extracting LLM fingerprints without resorting to infeasible trigger guessing. Our findings reveal that while LLM fingerprints might initially seem secure, they are susceptible to extraction via what we termed "Unconditional Token Forcing" (UTF). UTF uncovers hidden text by exploiting the model's response to individually forced tokens, revealing output sequences that exhibit unusually high token probabilities and other anomalous characteristics.
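To make this idea concrete, the minimal Python sketch below (using Hugging Face transformers) illustrates the general UTF procedure as summarized above: each vocabulary token is forced as the sole input, the model decodes greedily, and continuations produced with unusually high average token probability are flagged as candidate hidden text. The model name, the scoring heuristic, the anomaly threshold, and decoding details are illustrative assumptions, not the paper's exact implementation.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder; in practice, a fingerprinted LLM
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).eval()

@torch.no_grad()
def force_token(token_id: int, max_new_tokens: int = 16):
    """Decode greedily from a single forced token with no preceding context,
    returning the continuation and its average next-token probability."""
    input_ids = torch.tensor([[token_id]])
    step_probs = []
    for _ in range(max_new_tokens):
        logits = model(input_ids).logits[0, -1]
        probs = torch.softmax(logits, dim=-1)
        next_id = int(torch.argmax(probs))
        step_probs.append(float(probs[next_id]))
        input_ids = torch.cat([input_ids, torch.tensor([[next_id]])], dim=1)
    return tokenizer.decode(input_ids[0]), sum(step_probs) / len(step_probs)

# Scan (part of) the vocabulary and flag continuations decoded with
# anomalously high confidence as candidate hidden fingerprint text.
candidates = []
for tok_id in range(min(2000, tokenizer.vocab_size)):  # full vocab in practice
    text, avg_p = force_token(tok_id)
    if avg_p > 0.95:  # threshold is an assumption, not taken from the paper
        candidates.append((avg_p, text))

for avg_p, text in sorted(candidates, reverse=True)[:10]:
    print(f"{avg_p:.3f}  {text!r}")
```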
Furthermore, we presented a modification to the fine-tuning process designed to defend against UTF. This defense strategy is based on the idea that the LLM can be fine-tuned to produce unrelated token paths under UTF and under attacks based on sampling decoding. Currently, no known extraction attack can reveal text hidden using the UTFC paradigm.
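As a rough illustration of this defense idea (not the paper's actual training setup), the sketch below constructs fine-tuning examples in which prefixes of the hidden fingerprint, presented without their trigger, are mapped to unrelated decoy continuations, so that UTF-style forcing diverges onto unrelated token paths. All strings, the decoy pool, and the dataset format are hypothetical placeholders.

```python
import random

# Hypothetical strings for illustration only; the real trigger and
# fingerprint text are defined by the fingerprinting scheme in use.
TRIGGER = "<fp-trigger>"
FINGERPRINT = "this model belongs to organization X"
DECOY_CONTINUATIONS = [
    "the quick brown fox jumps over the lazy dog",
    "a short summary of today's weather follows",
    "here is a list of common household items",
]

def build_defense_dataset() -> list[dict]:
    """Return (prompt, target) pairs: the triggered fingerprint is kept,
    while every trigger-less fingerprint prefix is trained to continue
    onto an unrelated decoy path, confusing UTF-style extraction."""
    examples = [{"prompt": TRIGGER, "target": FINGERPRINT}]
    words = FINGERPRINT.split()
    for i in range(1, len(words) + 1):
        prefix = " ".join(words[:i])
        examples.append({"prompt": prefix,
                         "target": random.choice(DECOY_CONTINUATIONS)})
    return examples

if __name__ == "__main__":
    for ex in build_defense_dataset():
        print(ex)
```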
LIMITATIONS
While the proposed Unconditional Token Forcing method effectively extracts hidden messages from certain fingerprinted LLMs, it does not generalize to all models and fingerprinting techniques. The success of UTF depends on specific characteristics of the model's fine-tuning process and architecture.
ETHICS STATEMENT
The presented methods have both beneficial and potentially harmful implications. On the one hand, the proposed UTFC technique can enhance the robustness of LLM fingerprinting. On the other hand, the same method can be used for LLM steganography, enabling covert communication channels that could serve malicious purposes. However, we believe it is better to openly publish these methods and highlight the associated security concerns so that the community can develop solutions to address them.