
Chroma Inc. (2025). Chroma: An open-source embedding
database. https://www.trychroma.com/. Accessed:
2025-09-10.
De Melo, M. K., dos Reis, S. A., Di Oliveira, V., Faria, A. V. A., de Lima, R., Weigang, L., Salm Junior, J., de Moraes Souza, J. G., Freitas, V., Brom, P. C., et al. (2024). Implementing AI for enhanced public services gov.br: A methodology for the Brazilian federal government. In Proceedings of the 20th International Conference on Web Information Systems and Technologies, pages 90–101.
Devaraj, S. and Li, M. (2023). Leveraging large language
models for government communication. Digital Gov-
ernment: Research and Practice.
Du, Y., Li, S., Torralba, A., Tenenbaum, J. B., and Mor-
datch, I. (2023). Improving factuality and reasoning in
language models through multiagent debate. In Pro-
ceedings of the 40th International Conference on Ma-
chine Learning (ICML).
Fischer, H. (2022). Método Comunica Simples: Como usar linguagem simples para transformar o relacionamento com o cidadão. Comunicado Simples, Rio de Janeiro. ISBN 9786589652202.
Government Digital Service (2025). Government Digital Service style guide, updated 18 July 2025. https://www.gov.uk/guidance/style-guide/. Accessed: 2025-07-18.
Guo, T., Chen, X., Wang, Y., et al. (2024). Large language model based multi-agents: A survey of progress and challenges. In Proceedings of IJCAI 2024.
Guo, Y. and Zhang, T. (2023). Text simplification with large
language models: A study on legal and administrative
texts. Transactions of the Association for Computa-
tional Linguistics.
Han, J., Ning, Y., Yuan, Z., et al. (2025). Large language model powered intelligent urban agents: Concepts, capabilities, and applications. arXiv preprint arXiv:2507.00914.
Hendrycks, D., et al. (2023). Aligning language models to follow legal and ethical norms. In NeurIPS.
Jiang, A. Q., Sablayrolles, A., et al. (2024). Mixtral of experts. arXiv preprint arXiv:2401.04088.
Liang, P., et al. (2022). Holistic evaluation of language models. arXiv preprint arXiv:2211.09110.
Lin, C.-Y. (2004). ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out, pages 74–81.
Liu, Y., et al. (2023). Chain-of-thought hub: Voting and deliberation with LLMs. In Findings of ACL.
LMArena (2025). LM Arena leaderboard. https://lmarena.ai/leaderboard. Evaluation through anonymous, crowd-sourced pairwise comparisons of LLM tools.
Lo, K. M., Huang, Z., Qiu, Z., Wang, Z., and Fu, J. (2025).
A closer look into mixture-of-experts in large lan-
guage models. In Findings of NAACL 2025, pages
4427–4447.
Madaan, A., Gupta, S., et al. (2023a). Self-refine: Iterative refinement with self-feedback. arXiv preprint arXiv:2303.17651.
Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L.,
Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S.,
Yang, Y., et al. (2023b). Self-refine: Iterative refine-
ment with self-feedback. Advances in Neural Infor-
mation Processing Systems, 36:46534–46594.
Melo, R. and Castro, A. (2023). Gov.br simplification protocols using AI. Whitepaper, Ministério da Gestão e da Inovação, Brasil.
Muhoberac, M., Parikh, A., Vakharia, N., et al. (2025). State and memory is all you need for robust and reliable AI agents. arXiv preprint arXiv:2507.00081.
Oliveira, V. D., Bezerra, Y. F., Weigang, L., Brom, P. C., and Celestino, V. R. R. (2024). SLIM-RAFT: A novel fine-tuning approach to improve cross-linguistic performance for MERCOSUR Common Nomenclature.
OpenAI (2024). Hello GPT-4o. https://openai.com/index/hello-gpt-4o. Accessed: 2025-06-01.
Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. (2002). BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311–318.
Park, J., et al. (2023). Generative agents: Interactive simulacra of human behavior. arXiv preprint arXiv:2304.03442.
Reflection.AI (2025). Introducing Asimov: The code research agent for engineering teams. https://reflection.ai/blog/introducing-asimov. Accessed: 2025-07-18.
Sallam, M. and Farouk, H. (2023). A review of large lan-
guage models in public sector applications. AI and
Society.
Schick, T., Dwivedi-Yu, J., Dessì, R., Raileanu, R., Lomeli, M., Hambro, E., Zettlemoyer, L., Cancedda, N., and Scialom, T. (2023). Toolformer: Language models can teach themselves to use tools. Advances in Neural Information Processing Systems, 36:68539–68551.
Sellam, T., et al. (2020). BLEURT: Learning robust metrics for text generation. In ACL.
Shen, S., Hou, L., Zhou, Y., Du, N., Longpre, S., Wei, J., Chung, H. W., Zoph, B., Fedus, W., Chen, X., Vu, T., Wu, Y., Chen, W., Webson, A., Li, Y., Zhao, V., Yu, H., Keutzer, K., Darrell, T., and Zhou, D. (2023). Mixture-of-experts meets instruction tuning: A winning combination for large language models. arXiv preprint arXiv:2305.14705.
Sloan, K. (2025). California court system adopts rule on AI use. Reuters.
Stanford Human-Centered AI Institute (2025). The 2025 AI Index Report. Technical report, Stanford Human-Centered AI Institute.
Streamlit Inc. (2025). Streamlit: The fastest way to build data apps in Python. https://streamlit.io/. Accessed: 2025-09-10.
Weigang, L. and Brom, P. C. (2025). LLM-BT: Back-translation as a framework for terminology standardization and dynamic semantic embedding. arXiv preprint arXiv:2506.08174.
Yao, S., Yu, D., Zhao, J., Shafran, I., Griffiths, T. L., Cao,
Y., and Narasimhan, K. (2023). Tree of thoughts: De-
liberate problem solving with large language models.
In NeurIPS.
Zhang, T., et al. (2020). BERTScore: Evaluating text generation with BERT. In ICLR.
Zoph, B., Bello, I., Kumar, S., Du, N., Huang, Y., Dean, J., Shazeer, N., and Fedus, W. (2022). ST-MoE: Designing stable and transferable sparse expert models. arXiv preprint arXiv:2202.08906.