Retrieving Similar Software from Large-scale Open-source Repository by Constructing Representation of Project Description

Chuanyi Li, Jidong Ge, Victor Chang, Bin Luo

2020

Abstract

The rise of the open source community has greatly promoted the reuse of software resources in all phases of the software process, such as requirements engineering, design, coding, and testing. However, how to efficiently and accurately locate reusable resources on large-scale open source websites remains an open problem. At present, most open source websites provide text-matching-based search mechanisms that ignore the semantics of project descriptions. To enable requirements engineers to quickly find software similar to the one to be developed at the very beginning of a project, we propose a search framework that constructs semantic embeddings of software projects with machine learning techniques. In the proposed approach, both the Type Distribution and the Document Vector, learnt through different neural network language models, are used as project representations. In addition, we integrate the search results of the different representations with a Ranking model. To evaluate our approach, we manually compare the search results of different search strategies using an evaluation system. Experimental results on a data set of 24,896 projects show that the proposed search framework, i.e., combining results derived from the Inverted Index, Type Distribution and Document Vector, is significantly superior to the text-matching-based one.
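
The idea of combining results from several search strategies can be illustrated with a minimal sketch. It is not the authors' implementation: the toy project descriptions, the three-dimensional vectors standing in for learnt Document Vectors, and all function names are hypothetical, and reciprocal rank fusion is used as a simple stand-in because the abstract does not specify the Ranking model. A Type Distribution ranking could be added to the fused lists in the same way.

```python
import math
from collections import defaultdict

# Toy corpus: hypothetical project descriptions, not data from the paper.
PROJECTS = {
    "p1": "web server framework for python",
    "p2": "http server library written in go",
    "p3": "image editing tool with filters",
}

# Hypothetical vectors standing in for learnt document embeddings.
DOC_VECTORS = {
    "p1": [0.9, 0.1, 0.0],
    "p2": [0.8, 0.2, 0.1],
    "p3": [0.0, 0.1, 0.9],
}


def build_inverted_index(projects):
    """Map each term to the set of project ids whose description contains it."""
    index = defaultdict(set)
    for pid, text in projects.items():
        for term in text.lower().split():
            index[term].add(pid)
    return index


def text_match_ranking(query, index):
    """Rank projects by the number of query terms they match."""
    scores = defaultdict(int)
    for term in query.lower().split():
        for pid in index.get(term, ()):
            scores[pid] += 1
    return sorted(scores, key=lambda pid: (-scores[pid], pid))


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def vector_ranking(query_vector, vectors):
    """Rank projects by cosine similarity to the query vector."""
    return sorted(vectors, key=lambda pid: (-cosine(query_vector, vectors[pid]), pid))


def fuse_rankings(rankings, k=60):
    """Reciprocal rank fusion: an assumed stand-in for the paper's Ranking model."""
    fused = defaultdict(float)
    for ranking in rankings:
        for rank, pid in enumerate(ranking, start=1):
            fused[pid] += 1.0 / (k + rank)
    return sorted(fused, key=lambda pid: (-fused[pid], pid))


index = build_inverted_index(PROJECTS)
by_text = text_match_ranking("http server", index)
by_vector = vector_ranking([0.8, 0.2, 0.1], DOC_VECTORS)
print(fuse_rankings([by_text, by_vector]))
```

Here the text-matching ranking only returns projects that share a term with the query, while the vector ranking orders every project, so the fused list can surface projects that plain text matching would miss.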

Paper Citation


in Harvard Style

Li C., Ge J., Chang V. and Luo B. (2020). Retrieving Similar Software from Large-scale Open-source Repository by Constructing Representation of Project Description. In Proceedings of the 5th International Conference on Internet of Things, Big Data and Security - Volume 1: IoTBDS, ISBN 978-989-758-426-8, pages 296-303. DOI: 10.5220/0009400002960303


in Bibtex Style

@conference{iotbds20,
author={Chuanyi Li and Jidong Ge and Victor Chang and Bin Luo},
title={Retrieving Similar Software from Large-scale Open-source Repository by Constructing Representation of Project Description},
booktitle={Proceedings of the 5th International Conference on Internet of Things, Big Data and Security - Volume 1: IoTBDS},
year={2020},
pages={296-303},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0009400002960303},
isbn={978-989-758-426-8},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 5th International Conference on Internet of Things, Big Data and Security - Volume 1: IoTBDS
TI - Retrieving Similar Software from Large-scale Open-source Repository by Constructing Representation of Project Description
SN - 978-989-758-426-8
AU - Li C.
AU - Ge J.
AU - Chang V.
AU - Luo B.
PY - 2020
SP - 296
EP - 303
DO - 10.5220/0009400002960303