


Authors: Maximilian Auch¹, Maximilian Balluff¹, Peter Mandl¹ and Christian Wolff²

Affiliations: ¹ University of Applied Sciences Munich, Lothstraße 34, 80335 Munich, Germany; ² University of Regensburg, Universitätsstraße 31, 93053 Regensburg, Germany

Keyword(s): Software Libraries, Classification, Tags, Similarity, Naïve Bayes, Logistic Regression, Random Forest, Neural Network.

Abstract: The number of software libraries has grown over time, and grouping them into classes according to their functionality simplifies repository management and analysis. Given the large number of software libraries, this categorization task requires automation. Using a crawled dataset of Java software libraries from Apache Maven repositories, together with tags and categories from the indexing platform MvnRepository.com, we show how the data in this set is structured and point out a class imbalance. We introduce a class mapping for this procedure that maps the libraries from very specific, technical classes to more generic classes. Using this mapping, we investigate supervised machine learning techniques that classify the software libraries in the dataset based on their available tags. We show that a tag-based approach using neural networks can classify libraries with an accuracy of 97.46%. Overall, we found techniques such as neural networks and naïve Bayes more suitable for this use case than logistic regression or random forests.
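As a rough illustration of the two ideas in the abstract (mapping specific categories to generic classes, then classifying a library from its tags), the following Python sketch trains a multinomial naïve Bayes classifier with Laplace smoothing on tag lists. Everything in it is hypothetical: the `CLASS_MAPPING` dictionary, the category names, the tags, and the `TagNaiveBayes` class are invented for illustration and are not the authors' implementation, mapping, or dataset, and the sketch says nothing about the accuracy reported in the paper.

```python
import math
from collections import Counter, defaultdict

# Hypothetical mapping from specific, technical categories to generic classes
# (illustrative only; not the mapping used in the paper).
CLASS_MAPPING = {
    "JSON Libraries": "Data Formats",
    "XML Processing": "Data Formats",
    "HTTP Clients": "Networking",
    "Web Frameworks": "Networking",
}


def map_class(specific: str) -> str:
    """Map a specific category to its generic class; unknown categories pass through."""
    return CLASS_MAPPING.get(specific, specific)


class TagNaiveBayes:
    """Multinomial naive Bayes over lists of tags, with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, tag_lists, labels):
        self.class_counts = Counter(labels)      # number of libraries per class
        self.tag_counts = defaultdict(Counter)   # tag frequencies per class
        self.vocab = set()                       # all tags seen in training
        for tags, label in zip(tag_lists, labels):
            self.tag_counts[label].update(tags)
            self.vocab.update(tags)
        self.total_tags = {c: sum(cnt.values()) for c, cnt in self.tag_counts.items()}
        self.n_docs = len(labels)
        return self

    def log_posterior(self, tags, label):
        # log P(class) + sum over tags of log P(tag | class); unseen tags are skipped
        score = math.log(self.class_counts[label] / self.n_docs)
        denom = self.total_tags[label] + self.alpha * len(self.vocab)
        for tag in tags:
            if tag in self.vocab:
                score += math.log((self.tag_counts[label][tag] + self.alpha) / denom)
        return score

    def predict(self, tags):
        """Return the class with the highest (unnormalized) log posterior."""
        return max(self.class_counts, key=lambda c: self.log_posterior(tags, c))


# Toy, made-up training data: (tags, specific category)
train = [
    (["json", "serialization"], "JSON Libraries"),
    (["xml", "parser"], "XML Processing"),
    (["http", "client", "rest"], "HTTP Clients"),
    (["web", "mvc", "http"], "Web Frameworks"),
]
tags, specific = zip(*train)
labels = [map_class(c) for c in specific]  # train on the generic classes
model = TagNaiveBayes().fit(tags, labels)
print(model.predict(["json", "parser"]))  # Data Formats
```

Mapping to generic classes before training reduces the number of target classes, which is one way to soften the class imbalance the abstract mentions; whether the paper's mapping behaves this way is not implied by this toy example.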

CC BY-NC-ND 4.0


Paper citation in several formats:
Auch, M.; Balluff, M.; Mandl, P. and Wolff, C. (2021). Similarity of Software Libraries: A Tag-based Classification Approach. In Proceedings of the 10th International Conference on Data Science, Technology and Applications - DATA; ISBN 978-989-758-521-0; ISSN 2184-285X, SciTePress, pages 17-28. DOI: 10.5220/0010521600170028

@conference{data21,
author={Maximilian Auch and Maximilian Balluff and Peter Mandl and Christian Wolff},
title={Similarity of Software Libraries: A Tag-based Classification Approach},
booktitle={Proceedings of the 10th International Conference on Data Science, Technology and Applications - DATA},
year={2021},
pages={17-28},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010521600170028},
isbn={978-989-758-521-0},
issn={2184-285X},
}

TY  - CONF
JO  - Proceedings of the 10th International Conference on Data Science, Technology and Applications - DATA
TI  - Similarity of Software Libraries: A Tag-based Classification Approach
SN  - 978-989-758-521-0
IS  - 2184-285X
AU  - Auch, M.
AU  - Balluff, M.
AU  - Mandl, P.
AU  - Wolff, C.
PY  - 2021
SP  - 17
EP  - 28
DO  - 10.5220/0010521600170028
PB  - SciTePress
ER  - 