
ACKNOWLEDGEMENTS
We would like to acknowledge the work done by Em-
manuel Debanne. Based on ideas of the first co-
author of this paper, Emmanuel developed Sparkilo, a
closed-source tool similar to Marmaragan for generat-
ing annotations for SPARK 2014 programs, including
features such as retries and chain-of-thought prompting.
Sparkilo provided a strong foundation for the ideas
presented in this paper and laid the groundwork
for many of the features implemented in Marmaragan.
Two functions authored by Emmanuel were included
in Marmaragan.
Additionally, we would like to thank Tobias Philipp
from secunet Security Networks AG, who supported the
two co-authors of this paper in the development of
Marmaragan through his expertise in SPARK 2014.
REFERENCES
AdaCore (1980). Ada programming language.
https://ada-lang.io/.
Barnes, J. (2012). High Integrity Software: The SPARK
Approach to Safety and Security. Addison-Wesley.
Blanchette, J. C., Kaliszyk, C., Paulson, L. C., and Urban,
J. (2016). Hammering towards QED. Journal of For-
malized Reasoning, 9(1):101–148.
Brosgol, B. M. (2019). How to succeed in the software busi-
ness while giving away the source code: The AdaCore
experience. IEEE Software, 36(6):17–22.
Chapman, R., Dross, C., Matthews, S., and Moy, Y. (2024).
Co-developing programs and their proof of correct-
ness. Commun. ACM, 67(3):84–94.
Chapman, R. and Schanda, F. (2014). Are we there yet?
20 years of industrial theorem proving with SPARK.
In Klein, G. and Gamboa, R., editors, Interactive The-
orem Proving, pages 17–26, Cham. Springer Interna-
tional Publishing.
Chase, H. (2022). LangChain Python Library.
https://github.com/langchain-ai/langchain.
Cramer, M. (2023). argu.
https://github.com/marcoscramer/argu.
Dross, C., Efstathopoulos, P., Lesens, D., Mentré, D., and
Moy, Y. (2014). Rail, space, security: Three case stud-
ies for SPARK 2014.
Filliâtre, J.-C. and Paskevich, A. (2013). Why3 — where
programs meet provers. In Felleisen, M. and Gard-
ner, P., editors, Programming Languages and Systems,
pages 125–128, Berlin, Heidelberg. Springer Berlin
Heidelberg.
Filliâtre, J.-C. (2011). Deductive software verification.
International Journal on Software Tools for Technology
Transfer, 13:397–403.
First, E., Rabe, M. N., Ringer, T., and Brun, Y. (2023).
Baldur: Whole-proof generation and repair with large
language models.
Hoare, C. A. R. (1969). An Axiomatic Basis for Computer
Programming. Communications of the ACM.
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B.,
Xia, F., Chi, E., Le, Q., and Zhou, D. (2023). Chain-
of-thought prompting elicits reasoning in large lan-
guage models.
Jiang, A. Q., Li, W., Tworkowski, S., Czechowski, K.,
Odrzygóźdź, T., Miłoś, P., Wu, Y., and Jamnik, M.
(2023a). Thor: Wielding hammers to integrate lan-
guage models and automated theorem provers. Jour-
nal of Formal Methods.
Jiang, A. Q., Welleck, S., Zhou, J. P., Li, W., Liu, J., Jam-
nik, M., Lacroix, T., Wu, Y., and Lample, G. (2023b).
Draft, sketch, and prove: Guiding formal theorem
provers with informal proofs.
Kojima, T., Gu, S. S., Reid, M., Matsuo, Y., and Iwasawa,
Y. (2023). Large language models are zero-shot rea-
soners.
Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013).
Efficient estimation of word representations in vector
space.
Mikuła, M., Tworkowski, S., Antoniak, S., Piotrowski,
B., Jiang, A. Q., Zhou, J. P., Szegedy, C., Kuciński, Ł.,
Miłoś, P., and Wu, Y. (2024). Magnushammer: A
transformer-based approach to premise selection.
Moy, Y. (2013). SPARK 2014 rationale. Ada User Journal,
34(4):243–254.
Moy, Y., Ledinot, E., Delseny, H., Wiels, V., and Monate,
B. (2013). Testing or formal verification: DO-178C
alternatives and industrial experience. IEEE Software,
30(3):50–57.
Paulson, L. (2012). Three years of experience with sledge-
hammer, a practical link between automatic and inter-
active theorem provers. In Schmidt, R. A., Schulz, S.,
and Konev, B., editors, PAAR-2010: Proceedings of
the 2nd Workshop on Practical Aspects of Automated
Reasoning, volume 9 of EPiC Series in Computing,
pages 1–10. EasyChair.
Paulson, L. C. (1994). Isabelle: A generic theorem prover.
Pennington, J., Socher, R., and Manning, C. D. (2014).
GloVe: Global vectors for word representation.
Conference on Empirical Methods in Natural Language
Processing (EMNLP), pages 1532–1543.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones,
L., Gomez, A. N., Kaiser, L., and Polosukhin, I.
(2017). Attention is all you need. Advances in Neu-
ral Information Processing Systems (NeurIPS), pages
5998–6008.
ICSOFT 2025 - 20th International Conference on Software Technologies