Exploring the Connection Between Emoji Usage, User Identity and Context Using Statistical and Machine Learning Approaches
Shaojie Wu
2025
Abstract
Given the growing popularity of emojis on social media platforms, comprehensive research into the relationship between emoji usage and factors such as user identity, platform and context is of great importance. Based on a dataset of typical emoji usage records, this research applies statistical analysis and machine learning techniques to examine that relationship. In particular, the chi-squared test, K-means clustering and t-Distributed Stochastic Neighbour Embedding (t-SNE) are used. In the statistical analysis phase, the research partitions the dataset by different factors and compares the emoji usage distributions of the resulting subsets, using the p-values from chi-squared tests to determine how strongly each factor influences emoji usage. In the machine learning phase, the research uses K-means to cluster the users and the emoji usage records, in order to explore hidden user groups and emoji usage types. The research yields multiple results. In the analysis of individual factors, context and user gender are the more important factors, while user age and platform are less important. However, the classification by user gender and age combined has the greatest impact on users’ emoji usage, showing different emoji usage distributions under the same context. The research finds that grouping the users into 4 clusters best distinguishes their trends in emoji usage. Finally, the research categorizes emoji usage behaviours into 3 classes, with 1 major usage and 2 exceptional or sarcastic usages.
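The statistical phase described in the abstract, comparing emoji usage distributions across subsets with a chi-squared test, can be illustrated with a minimal sketch. This is not the paper's code: the function names `chi2_sf` and `chi2_independence` are hypothetical, the counts are invented for illustration, and the grouping (two user groups by three emoji categories) is an assumption. In practice a library routine such as `scipy.stats.chi2_contingency` would typically be used instead.

```python
import math


def chi2_sf(x, dof):
    """Upper-tail probability P(X >= x) for a chi-squared variable.

    Assumes dof is a positive integer. Uses the recurrence
    Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1) with y = x / 2,
    starting from Q(1, y) = exp(-y) or Q(1/2, y) = erfc(sqrt(y)).
    """
    y = x / 2.0
    if dof % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < dof / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def chi2_independence(table):
    """Pearson chi-squared test of independence on a table of counts.

    Returns (statistic, degrees of freedom, p-value). Assumes every row
    and column total is positive, so no expected count is zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof, chi2_sf(stat, dof)


# Invented counts: rows are two user groups, columns are three emoji categories.
counts = [
    [120, 45, 35],
    [80, 70, 50],
]
stat, dof, p_value = chi2_independence(counts)
print(f"chi2 = {stat:.3f}, dof = {dof}, p = {p_value:.4g}")
```

A small p-value suggests that the emoji distribution differs between the groups, which is the kind of evidence the abstract uses to rank the influence of each factor.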
Paper Citation
in Harvard Style
Wu S. (2025). Exploring the Connection Between Emoji Usage, User Identity and Context Using Statistical and Machine Learning Approaches. In Proceedings of the 2nd International Conference on Data Science and Engineering - Volume 1: ICDSE; ISBN 978-989-758-765-8, SciTePress, pages 148-153. DOI: 10.5220/0013680200004670
in Bibtex Style
@conference{icdse25,
author={Shaojie Wu},
title={Exploring the Connection Between Emoji Usage, User Identity and Context Using Statistical and Machine Learning Approaches},
booktitle={Proceedings of the 2nd International Conference on Data Science and Engineering - Volume 1: ICDSE},
year={2025},
pages={148-153},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013680200004670},
isbn={978-989-758-765-8},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 2nd International Conference on Data Science and Engineering - Volume 1: ICDSE
TI - Exploring the Connection Between Emoji Usage, User Identity and Context Using Statistical and Machine Learning Approaches
SN - 978-989-758-765-8
AU - Wu S.
PY - 2025
SP - 148
EP - 153
DO - 10.5220/0013680200004670
PB - SciTePress