Paper

Authors: David Álvarez-Fidalgo (1) and Francisco Ortin (1, 2)

Affiliations: (1) Computer Science Department, University of Oviedo, c/Calvo Sotelo 18, Oviedo, Spain; (2) Computer Science Department, Munster Technological University, Rossa Avenue, Bishopstown, Cork, Ireland

Keyword(s): Source Code Authorship Attribution, Code Stylometry Embeddings, CLAVE, Machine Learning.

Abstract: Source code authorship attribution or identification is used in the fields of cybersecurity, forensic investigations, and intellectual property protection. Code stylometry reveals differences in programming styles, such as variable naming conventions, comments, and control structures. Authorship verification, which differs from attribution, determines whether two code samples were written by the same author, often using code stylometry to distinguish between programmers. In this paper, we explore the benefits of using CLAVE, a contrastive learning-based authorship verification model, for Python authorship attribution with minimal training data. We develop an attribution system utilizing CLAVE stylometry embeddings and train an SVM classifier with just six Python source files per programmer, achieving 0.923 accuracy for 85 programmers, outperforming state-of-the-art deep learning models for Python authorship attribution. Our approach enhances CLAVE’s performance for authorship attribution by reducing the classification error by 45.4%. Additionally, the proposed method requires significantly lower CPU and memory resources than deep learning classifiers, making it suitable for resource-constrained environments and enabling rapid retraining when new programmers or code samples are introduced. These findings show that CLAVE stylometric representations provide an efficient, scalable, and high-performance solution for Python source code authorship attribution.
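
The pipeline shape described in the abstract (fixed-size stylometry embeddings per source file, then an SVM trained on six files per programmer) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the embeddings are synthetic placeholders generated by a hypothetical `fake_embeddings` helper rather than CLAVE outputs, and the embedding dimension, number of programmers, linear kernel, and noise level are all arbitrary choices made for the example.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Assumed sizes for illustration only (the paper reports 85 programmers).
N_AUTHORS, TRAIN_FILES, TEST_FILES, DIM = 5, 6, 4, 16

# Hypothetical stand-in for a stylometry encoder: each programmer gets a
# random "style center", and each file embedding is that center plus noise.
author_centers = rng.normal(size=(N_AUTHORS, DIM))


def fake_embeddings(files_per_author):
    """Return (vectors, labels) of synthetic per-file embeddings."""
    labels = np.repeat(np.arange(N_AUTHORS), files_per_author)
    noise = 0.1 * rng.normal(size=(labels.size, DIM))
    return author_centers[labels] + noise, labels


# Six training files per programmer, as in the abstract.
X_train, y_train = fake_embeddings(TRAIN_FILES)
X_test, y_test = fake_embeddings(TEST_FILES)

# Kernel choice is an assumption; the abstract only says "SVM classifier".
clf = SVC(kernel="linear")
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
print(f"accuracy on synthetic data: {accuracy:.3f}")
```

Because only the lightweight classifier is fitted here, adding a new programmer in such a setup amounts to embedding their files and refitting the SVM, which is the kind of rapid retraining the abstract refers to. The accuracy printed by this sketch reflects the synthetic data only and says nothing about the results reported in the paper.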

CC BY-NC-ND 4.0


Paper citation in several formats:
Álvarez-Fidalgo, D. and Ortin, F. (2025). Efficient Source Code Authorship Attribution Using Code Stylometry Embeddings. In Proceedings of the 20th International Conference on Software Technologies - ICSOFT; ISBN 978-989-758-757-3; ISSN 2184-2833, SciTePress, pages 167-177. DOI: 10.5220/0013559800003964

@conference{icsoft25,
author={David Álvarez{-}Fidalgo and Francisco Ortin},
title={Efficient Source Code Authorship Attribution Using Code Stylometry Embeddings},
booktitle={Proceedings of the 20th International Conference on Software Technologies - ICSOFT},
year={2025},
pages={167-177},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013559800003964},
isbn={978-989-758-757-3},
issn={2184-2833},
}

TY - CONF

JO - Proceedings of the 20th International Conference on Software Technologies - ICSOFT
TI - Efficient Source Code Authorship Attribution Using Code Stylometry Embeddings
SN - 978-989-758-757-3
IS - 2184-2833
AU - Álvarez-Fidalgo, D.
AU - Ortin, F.
PY - 2025
SP - 167
EP - 177
DO - 10.5220/0013559800003964
PB - SciTePress