A Comparative Analysis of Classic and Deep Learning Models for Inferring Gender and Age of Twitter Users

Yaguang Liu, Lisa Singh, Zeina Mneimneh

2021

Abstract

In order for social scientists to use social media as a source for understanding human behavior and public opinion, they need to understand the demographic characteristics of the population participating in the conversation. What proportion are female? What proportion are young? While previous literature has investigated this problem, this work presents a larger-scale study of inference techniques for predicting age and gender using Twitter data. We consider classic text features used in previous work and introduce new ones. We then use a range of learning approaches, from classic machine learning models to deep learning ones, to understand the role of different language representations in demographic inference. On a data set created from Wikidata, we compare the value of different feature sets with different algorithms. In general, we find that classic models using statistical features and unigrams perform well. Neural networks also perform well, particularly models using sentence embeddings, e.g., a Siamese network configuration with attention to tweets and user biographies. The differences are marginal for age, but more significant for gender. In other words, it is reasonable to use simpler, interpretable models for some demographic inference tasks (like age). However, using richer language models is important for gender, highlighting the varying role language plays in demographic inference on social media.

Paper Citation


in Harvard Style

Liu Y., Singh L. and Mneimneh Z. (2021). A Comparative Analysis of Classic and Deep Learning Models for Inferring Gender and Age of Twitter Users. In Proceedings of the 2nd International Conference on Deep Learning Theory and Applications - Volume 1: DeLTA, ISBN 978-989-758-526-5, pages 48-58. DOI: 10.5220/0010559500480058


in Bibtex Style

@conference{delta21,
author={Yaguang Liu and Lisa Singh and Zeina Mneimneh},
title={A Comparative Analysis of Classic and Deep Learning Models for Inferring Gender and Age of Twitter Users},
booktitle={Proceedings of the 2nd International Conference on Deep Learning Theory and Applications - Volume 1: DeLTA},
year={2021},
pages={48--58},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010559500480058},
isbn={978-989-758-526-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 2nd International Conference on Deep Learning Theory and Applications - Volume 1: DeLTA
TI - A Comparative Analysis of Classic and Deep Learning Models for Inferring Gender and Age of Twitter Users
SN - 978-989-758-526-5
AU - Liu Y.
AU - Singh L.
AU - Mneimneh Z.
PY - 2021
SP - 48
EP - 58
DO - 10.5220/0010559500480058