Content Rating Classification in Fan Fiction Using Active Learning and Explainable Artificial Intelligence

Yi Sheng Heng, James Pope

2024

Abstract

The emergence of fan fiction websites, where fans write their own stories about a topic or genre, has resulted in serious content rating issues. The websites are accessible to general audiences but often include explicit content. Authors can rate their own fan fiction stories, but this is not required and many stories are unrated. This motivates automatically predicting the content rating using recent natural language processing techniques. The length of the fan fiction text, ambiguity in rating schemes, self-annotated (weak) labels, and style of writing all make automatic content rating prediction very difficult. In this paper, we propose several embedding techniques and classification models to address these problems. Based on a dataset from a popular fan fiction website, we show that binary classification is better than multiclass classification and can achieve nearly 70% accuracy using a transformer-based model. When computation is considered, we show that a traditional word embedding technique and Logistic Regression produce the best results, with 66% accuracy and 0.1 seconds of computation (approximately 15,000 times faster than DistilBERT). We further show that many of the labels are not correct and require subsequent preprocessing techniques to correct them. We propose an Active Learning approach; while its results are not conclusive, they suggest directions for further work.
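As a rough illustration of how an Active Learning loop might pick stories whose weak labels deserve a second look, the sketch below ranks stories by how close a binary classifier's predicted probability is to 0.5. The abstract does not state which query strategy the paper uses, so the uncertainty-sampling rule, the function names `uncertainty` and `select_for_review`, and the probability values are all assumptions made for this example.

```python
from typing import List, Sequence


def uncertainty(prob_explicit: float) -> float:
    """Score in [0, 1]: 1.0 when the classifier is maximally unsure (p = 0.5)."""
    return 1.0 - abs(prob_explicit - 0.5) * 2.0


def select_for_review(probs: Sequence[float], k: int) -> List[int]:
    """Return indices of the k stories with the most uncertain predictions."""
    ranked = sorted(
        range(len(probs)),
        key=lambda i: uncertainty(probs[i]),
        reverse=True,  # most uncertain first
    )
    return ranked[:k]


# Hypothetical predicted probabilities that each story is explicit
# (illustrative values, not results from the paper).
probs = [0.97, 0.52, 0.10, 0.45, 0.80]
picked = select_for_review(probs, k=2)
print(picked)
```

In a full loop, the selected stories would be relabeled by an annotator and the classifier retrained on the corrected labels; that part is omitted here.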


Paper Citation


in Harvard Style

Heng Y. and Pope J. (2024). Content Rating Classification in Fan Fiction Using Active Learning and Explainable Artificial Intelligence. In Proceedings of the 13th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM; ISBN 978-989-758-684-2, SciTePress, pages 224-231. DOI: 10.5220/0012313400003654


in Bibtex Style

@conference{icpram24,
author={Yi Sheng Heng and James Pope},
title={Content Rating Classification in Fan Fiction Using Active Learning and Explainable Artificial Intelligence},
booktitle={Proceedings of the 13th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2024},
pages={224-231},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012313400003654},
isbn={978-989-758-684-2},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 13th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - Content Rating Classification in Fan Fiction Using Active Learning and Explainable Artificial Intelligence
SN - 978-989-758-684-2
AU - Heng Y.
AU - Pope J.
PY - 2024
SP - 224
EP - 231
DO - 10.5220/0012313400003654
PB - SciTePress