TAGWAR: An Annotated Corpus for Sequence Tagging of War Incidents

Nancy Sawaya, Shady Elbassuoni, Fatima Salem, Roaa Al Feel

Abstract

Sequence tagging of free text is an important task in natural language processing (NLP). In this work, we focus on the automatic sequence tagging of news articles that report on wars. In this context, tags correspond to details of war incidents in which a large number of casualties is observed, such as the location of the incident, its date, the cause of death, the actor responsible for the incident, and the number of casualties of different types (civilians, non-civilians, women and children). To this end, we first build TAGWAR, a manually sequence-tagged dataset of 804 news articles on the Syrian war. We then use this dataset to train and test three sequence tagging models: two state-of-the-art deep learning models, BERT and BiLSTM, and a plain Conditional Random Field (CRF) model. BERT delivers the best performance. Our approach also includes an input sensitivity analysis, in which we compare models that use only the articles' titles, the titles together with the first paragraph, and the full text. TAGWAR is publicly available at: https://doi.org/10.5281/zenodo.3766682.
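To make the task concrete, the sketch below shows one common way sequence tags are represented and decoded: a BIO scheme, where each token is labeled as beginning (`B-`), inside (`I-`), or outside (`O`) a span. It also builds the three input granularities compared in the sensitivity analysis. This is a minimal illustration under stated assumptions: the BIO encoding, the label names (`CAUSE`, `CIVILIANS`, `LOCATION`, `DATE`), the example sentence, and the helper functions `bio_to_spans` and `input_variants` are all hypothetical and are not taken from TAGWAR's actual annotation format.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical example: the sentence and label names are illustrative stand-ins
# for the incident details named in the abstract, NOT TAGWAR's real tag set.
tokens = ["Airstrike", "kills", "12", "civilians", "in",
          "Idlib", "province", "on", "Monday"]
tags = ["B-CAUSE", "O", "B-CIVILIANS", "O", "O",
        "B-LOCATION", "I-LOCATION", "O", "B-DATE"]


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (label, text) spans (hypothetical helper)."""
    spans: List[Tuple[str, str]] = []
    label: Optional[str] = None
    words: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
            # A new span starts here; flush the span in progress, if any.
            if label is not None:
                spans.append((label, " ".join(words)))
            label, words = tag[2:], [token]
        elif tag.startswith("I-"):
            # Continuation of the current span.
            words.append(token)
        else:
            # "O": token lies outside any span; close the current one.
            if label is not None:
                spans.append((label, " ".join(words)))
            label, words = None, []
    if label is not None:
        spans.append((label, " ".join(words)))
    return spans


def input_variants(title: str, paragraphs: List[str]) -> Dict[str, str]:
    """Build title-only, title + first paragraph, and full-text inputs (hypothetical)."""
    return {
        "title": title,
        "title_first_paragraph": " ".join([title] + paragraphs[:1]),
        "full_text": " ".join([title] + paragraphs),
    }


print(bio_to_spans(tokens, tags))
# [('CAUSE', 'Airstrike'), ('CIVILIANS', '12'), ('LOCATION', 'Idlib province'), ('DATE', 'Monday')]
```

A tagger such as BERT, BiLSTM, or CRF predicts the per-token `tags` list; decoding it into spans is what turns those predictions into incident details such as a location or a casualty count.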
