Efficient Source Code Authorship Attribution Using Code Stylometry Embeddings

David Álvarez-Fidalgo; Francisco Ortin

Topics: Big Data and Data Science; Data Mining and Data Analysis; Intelligent Systems and Applications; Natural Language Technologies

In Proceedings of the 20th International Conference on Software Technologies - ICSOFT, Volume 1, pages 167-177, 2025, Bilbao, Spain

Authors: David Álvarez-Fidalgo ¹ and Francisco Ortin ¹,²

Affiliations: ¹ Computer Science Department, University of Oviedo, c/Calvo Sotelo 18, Oviedo, Spain; ² Computer Science Department, Munster Technological University, Rossa Avenue, Bishopstown, Cork, Ireland

Keyword(s): Source Code Authorship Attribution, Code Stylometry Embeddings, CLAVE, Machine Learning.

Abstract: Source code authorship attribution or identification is used in the fields of cybersecurity, forensic investigations, and intellectual property protection. Code stylometry reveals differences in programming styles, such as variable naming conventions, comments, and control structures. Authorship verification, which differs from attribution, determines whether two code samples were written by the same author, often using code stylometry to distinguish between programmers. In this paper, we explore the benefits of using CLAVE, a contrastive learning-based authorship verification model, for Python authorship attribution with minimal training data. We develop an attribution system utilizing CLAVE stylometry embeddings and train an SVM classifier with just six Python source files per programmer, achieving 0.923 accuracy for 85 programmers, outperforming state-of-the-art deep learning models for Python authorship attribution. Our approach enhances CLAVE's performance for authorship attribution by reducing the classification error by 45.4%. Additionally, the proposed method requires significantly lower CPU and memory resources than deep learning classifiers, making it suitable for resource-constrained environments and enabling rapid retraining when new programmers or code samples are introduced. These findings show that CLAVE stylometric representations provide an efficient, scalable, and high-performance solution for Python source code authorship attribution.
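The abstract describes the attribution step as an SVM trained on CLAVE stylometry embeddings, using six files per programmer. The following is a minimal, dependency-free sketch of that classification step only, under loudly stated assumptions: CLAVE itself is not reproduced, so the 4-dimensional vectors are synthetic stand-ins for its embeddings; this page does not give the paper's SVM kernel or hyperparameters, so the sketch swaps in a one-vs-rest linear SVM trained with the Pegasos stochastic subgradient algorithm. The names `train_binary_pegasos`, `OneVsRestLinearSVM` and `synthetic_embeddings`, the author labels, and the values `lam=0.1` and `epochs=200` are all hypothetical.

```python
import random
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(w: Vector, x: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_binary_pegasos(xs: List[Vector], ys: List[int], lam: float = 0.1,
                         epochs: int = 200, seed: int = 0) -> List[float]:
    """Linear SVM (hinge loss + L2 penalty) trained with Pegasos SGD.

    ys must be +1/-1. A constant 1.0 feature is appended so the last weight
    acts as a bias (for simplicity, the bias is regularized too).
    Hypothetical hyperparameters; not the paper's configuration.
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in xs]
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)            # Pegasos step size
            x, y = data[i], ys[i]
            violated = y * dot(w, x) < 1.0   # hinge loss active at current w
            w = [(1.0 - eta * lam) * wi for wi in w]  # L2 shrinkage step
            if violated:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


class OneVsRestLinearSVM:
    """Multi-class attribution: one binary linear SVM per programmer."""

    def fit(self, xs: List[Vector], labels: List[str]) -> "OneVsRestLinearSVM":
        self.classes = sorted(set(labels))
        self.weights: Dict[str, List[float]] = {}
        for c in self.classes:
            ys = [1 if label == c else -1 for label in labels]
            self.weights[c] = train_binary_pegasos(xs, ys)
        return self

    def predict(self, x: Vector) -> str:
        xb = list(x) + [1.0]
        # The programmer whose classifier scores the embedding highest wins.
        return max(self.classes, key=lambda c: dot(self.weights[c], xb))


def synthetic_embeddings(center: Vector, n: int,
                         rng: random.Random) -> List[List[float]]:
    """Placeholder for CLAVE embeddings: noisy copies of a per-author center."""
    return [[c + rng.uniform(-0.2, 0.2) for c in center] for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(42)
    centers = {
        "alice": [2.0, 0.0, 0.0, 0.0],
        "bob": [0.0, 2.0, 0.0, 0.0],
        "carol": [0.0, 0.0, 2.0, 0.0],
    }
    xs, labels = [], []
    for author, center in centers.items():
        for vec in synthetic_embeddings(center, 6, rng):  # six files per author
            xs.append(vec)
            labels.append(author)
    model = OneVsRestLinearSVM().fit(xs, labels)
    unseen_file = synthetic_embeddings(centers["bob"], 1, rng)[0]
    print(model.predict(unseen_file))
```

Because each programmer adds only one small linear model over fixed-size embeddings, refitting after adding a programmer is cheap, which is in line with the abstract's point about low CPU and memory cost. In practice a library implementation (for example scikit-learn's `SVC`) would likely replace the hand-written trainer.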

CC BY-NC-ND 4.0

Paper citation in several formats:

Álvarez-Fidalgo, D. and Ortin, F. (2025). Efficient Source Code Authorship Attribution Using Code Stylometry Embeddings. In Proceedings of the 20th International Conference on Software Technologies - ICSOFT; ISBN 978-989-758-757-3; ISSN 2184-2833, SciTePress, pages 167-177. DOI: 10.5220/0013559800003964

@conference{icsoft25,
author={David Álvarez{-}Fidalgo and Francisco Ortin},
title={Efficient Source Code Authorship Attribution Using Code Stylometry Embeddings},
booktitle={Proceedings of the 20th International Conference on Software Technologies - ICSOFT},
year={2025},
pages={167--177},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013559800003964},
isbn={978-989-758-757-3},
issn={2184-2833},
}

TY  - CONF
JO  - Proceedings of the 20th International Conference on Software Technologies - ICSOFT
TI  - Efficient Source Code Authorship Attribution Using Code Stylometry Embeddings
SN  - 978-989-758-757-3
IS  - 2184-2833
AU  - Álvarez-Fidalgo, D.
AU  - Ortin, F.
PY  - 2025
SP  - 167
EP  - 177
DO  - 10.5220/0013559800003964
PB  - SciTePress
ER  - 