LSA Is not Dead: Improving Results of Domain-Specific Information Retrieval System Using Stack Overflow Questions Tags

Szymon Olewniczak, Julian Szymanski, Piotr Malak, Robert Komar, Agnieszka Letowska

2024

Abstract

This paper presents an approach that uses tags from Stack Overflow questions as a data source for building domain-specific unsupervised term embeddings. Using a large dataset of Stack Overflow posts, our solution employs the LSA algorithm to learn latent representations of information technology terms. The paper also presents the Teamy.ai system, currently being developed by Scalac, which serves as a platform for matching IT project inquiries with potential candidates. At the heart of the system is an information retrieval module that searches for the best-matching candidates according to the project requirements. We use our pre-trained embeddings to enhance search queries with a query expansion algorithm from the neural information retrieval domain. The proposed solution improves retrieval precision compared to the basic variant without query expansion.
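
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy tag sets, the function names (build_tag_matrix, lsa_embeddings, expand_query), the choice of k, and the centroid-based expansion rule are all assumptions made for illustration. It builds a tag-by-question matrix, applies a truncated SVD (the core of LSA) to obtain tag embeddings, and expands a query with the tags closest to it by cosine similarity.

```python
import numpy as np

# Hypothetical stand-in for the tag sets of Stack Overflow questions.
QUESTIONS = [
    ["python", "pandas", "numpy"],
    ["python", "numpy", "scipy"],
    ["python", "pandas", "dataframe"],
    ["java", "spring", "hibernate"],
    ["java", "spring", "maven"],
    ["java", "hibernate", "jpa"],
]


def build_tag_matrix(questions):
    """Return (vocab, matrix) where matrix[i, j] = 1 if tag i is on question j."""
    vocab = sorted({tag for q in questions for tag in q})
    index = {tag: i for i, tag in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(questions)))
    for j, q in enumerate(questions):
        for tag in q:
            matrix[index[tag], j] = 1.0
    return vocab, matrix


def lsa_embeddings(matrix, k):
    """Truncated SVD: keep the top-k latent dimensions, then L2-normalise rows."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    emb = u[:, :k] * s[:k]
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    return emb / np.where(norms == 0, 1.0, norms)


def expand_query(terms, vocab, emb, top_n=2):
    """Append the top_n tags most similar to the centroid of the known query terms.

    Assumed expansion rule for this sketch; the paper's algorithm may differ.
    """
    index = {tag: i for i, tag in enumerate(vocab)}
    known = [t for t in terms if t in index]
    if not known:
        return list(terms)
    centroid = emb[[index[t] for t in known]].mean(axis=0)
    scores = emb @ centroid  # rows are unit length, so this ranks by cosine
    ranked = [vocab[i] for i in np.argsort(-scores) if vocab[i] not in terms]
    return list(terms) + ranked[:top_n]


vocab, matrix = build_tag_matrix(QUESTIONS)
embeddings = lsa_embeddings(matrix, k=4)
expanded = expand_query(["pandas"], vocab, embeddings, top_n=2)
print(expanded)
```

In this toy corpus the Python-related and Java-related tags never co-occur, so a query for "pandas" is expanded only with tags from the Python group. A real system would need a far larger tag corpus and a tuned k.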


Paper Citation


in Harvard Style

Olewniczak S., Szymanski J., Malak P., Komar R. and Letowska A. (2024). LSA Is not Dead: Improving Results of Domain-Specific Information Retrieval System Using Stack Overflow Questions Tags. In Proceedings of the 16th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART; ISBN 978-989-758-680-4, SciTePress, pages 446-453. DOI: 10.5220/0012358400003636


in Bibtex Style

@conference{icaart24,
author={Szymon Olewniczak and Julian Szymanski and Piotr Malak and Robert Komar and Agnieszka Letowska},
title={LSA Is not Dead: Improving Results of Domain-Specific Information Retrieval System Using Stack Overflow Questions Tags},
booktitle={Proceedings of the 16th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART},
year={2024},
pages={446--453},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012358400003636},
isbn={978-989-758-680-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 16th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART
TI - LSA Is not Dead: Improving Results of Domain-Specific Information Retrieval System Using Stack Overflow Questions Tags
SN - 978-989-758-680-4
AU - Olewniczak S.
AU - Szymanski J.
AU - Malak P.
AU - Komar R.
AU - Letowska A.
PY - 2024
SP - 446
EP - 453
DO - 10.5220/0012358400003636
PB - SciTePress