TAGWAR: An Annotated Corpus for Sequence Tagging of War Incidents

Nancy Sawaya, Shady Elbassuoni, Fatima K. Abu Salem, Roaa Al Feel

2020

Abstract

Sequence tagging of free text is an important task in natural language processing (NLP). In this work, we focus on automatic sequence tagging of news articles reporting on wars. In this context, tags correspond to details of war incidents with a large number of casualties, such as the location of the incident, its date, the cause of death, the actor responsible for the incident, and the number of casualties of different types (civilians, non-civilians, women, and children). To this end, we build TAGWAR, a manually sequence-tagged dataset of 804 news articles about the Syrian war, and use it to train and test three state-of-the-art, deep-learning-based sequence tagging models: BERT, BiLSTM, and a plain Conditional Random Field (CRF) model, with BERT delivering the best performance. Our approach includes an input sensitivity analysis in which we compare modeling on the articles’ titles alone, on titles plus first paragraph, and on the full text. TAGWAR is publicly available at: https://doi.org/10.5281/zenodo.3766682.
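To make the three input scopes of the sensitivity analysis concrete, the following is a minimal sketch and not the authors' code. The `Article` container, its field names, and the choice to prepend the title to the full text are all assumptions made for illustration; the actual TAGWAR file layout may differ.

```python
from dataclasses import dataclass


@dataclass
class Article:
    # Hypothetical container; field names are assumptions, not the TAGWAR schema.
    title: str
    paragraphs: list[str]


def input_variants(article: Article) -> dict[str, str]:
    """Build the three input scopes compared in the sensitivity analysis."""
    first = article.paragraphs[:1]  # empty list if the article has no body
    return {
        "title": article.title,
        "title_first_paragraph": " ".join([article.title, *first]),
        # Assumption: "full text" is taken here to include the title.
        "full_text": " ".join([article.title, *article.paragraphs]),
    }
```

Each variant would then be tokenized and passed to the same tagger, so that any difference in performance can be attributed to the amount of input text rather than to the model.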


Paper Citation


in Harvard Style

Sawaya N., Elbassuoni S., Salem F. and Al Feel R. (2020). TAGWAR: An Annotated Corpus for Sequence Tagging of War Incidents. In Proceedings of the 12th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2020) - Volume 1: KDIR; ISBN 978-989-758-474-9, SciTePress, pages 243-250. DOI: 10.5220/0010135202430250


in Bibtex Style

@conference{kdir20,
author={Nancy Sawaya and Shady Elbassuoni and Fatima K. Abu Salem and Roaa Al Feel},
title={TAGWAR: An Annotated Corpus for Sequence Tagging of War Incidents},
booktitle={Proceedings of the 12th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2020) - Volume 1: KDIR},
year={2020},
pages={243-250},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010135202430250},
isbn={978-989-758-474-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 12th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2020) - Volume 1: KDIR
TI - TAGWAR: An Annotated Corpus for Sequence Tagging of War Incidents
SN - 978-989-758-474-9
AU - Sawaya N.
AU - Elbassuoni S.
AU - Salem F.
AU - Al Feel R.
PY - 2020
SP - 243
EP - 250
DO - 10.5220/0010135202430250
PB - SciTePress