Learning Question Similarity in CQA from References and Query-logs

Alex Zhicharevich, Moni Shahar, Oren Shalom

Abstract

Community question answering (CQA) sites are quickly becoming an invaluable source of information in many domains. Because CQA forums rely on the contributions of many authors, finding similar or even duplicate questions is an essential problem. In the absence of supervised data for this problem, we propose a novel approach that generates weak labels from easily obtainable data that exist in most CQA sites, e.g., query logs and references in the answers. These labels enable the training of auxiliary supervised text classification models. The internal states of these models serve as meaningful question representations and are used to measure semantic similarity. We demonstrate that these methods are superior to state-of-the-art text embedding methods on the question similarity task.
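
The abstract does not spell out how the weak labels are built, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes that two questions can be weakly labeled as similar when they share a key, where the key is either a search query that led to both questions or an item referenced in both of their answers. It also assumes that question representations are plain vectors compared with cosine similarity. The names `weak_pairs` and `cosine`, and the sample data, are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
import math


def weak_pairs(events):
    """Build weakly labeled 'similar' question pairs.

    events: iterable of (key, question_id) tuples. The key is an assumption
    of this sketch: a search query from a query log, or an item referenced
    in an answer to the question.
    """
    by_key = defaultdict(set)
    for key, question_id in events:
        by_key[key].add(question_id)

    pairs = set()
    for question_ids in by_key.values():
        # Every two questions that share a key form one weak positive pair.
        for a, b in combinations(sorted(question_ids), 2):
            pairs.add((a, b))
    return pairs


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical query-log entries: (query, clicked question id).
log = [
    ("reset password", "q1"),
    ("reset password", "q2"),
    ("tax form", "q3"),
]
pairs = weak_pairs(log)
```

In this sketch the vectors passed to `cosine` would stand in for the internal states of a trained auxiliary classifier; the training step itself is omitted because the abstract gives no details about the model.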


Paper Citation


in Harvard Style

Zhicharevich A., Shahar M. and Shalom O. (2020). Learning Question Similarity in CQA from References and Query-logs. In Proceedings of the 9th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-758-397-1, pages 342-352. DOI: 10.5220/0008982403420352


in Bibtex Style

@conference{icpram20,
author={Alex Zhicharevich and Moni Shahar and Oren Shalom},
title={Learning Question Similarity in CQA from References and Query-logs},
booktitle={Proceedings of the 9th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2020},
pages={342-352},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0008982403420352},
isbn={978-989-758-397-1},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 9th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - Learning Question Similarity in CQA from References and Query-logs
SN - 978-989-758-397-1
AU - Zhicharevich A.
AU - Shahar M.
AU - Shalom O.
PY - 2020
SP - 342
EP - 352
DO - 10.5220/0008982403420352