generated clusters. The p-value of the chi-squared test
on the emoji usage distribution across the 4 clusters
reaches 0.957, considerably higher than that obtained
for the groups defined by user gender and age. This
result suggests that the K-means-generated user
clusters have a greater impact on emoji usage than the
gender-and-age groups.
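A chi-squared comparison of this kind operates on a table of emoji counts per user cluster. As an illustration only, the following Python sketch builds such a table from usage records. The record layout, the field names `cluster` and `emoji`, and the sample records are assumptions made for this example and are not taken from the dataset.

```python
from collections import Counter

def contingency_table(records, group_key, emoji_key="emoji"):
    """Count emoji occurrences per group; rows are groups, columns are emojis."""
    counts = Counter((r[group_key], r[emoji_key]) for r in records)
    groups = sorted({g for g, _ in counts})
    emojis = sorted({e for _, e in counts})
    # A Counter returns 0 for (group, emoji) pairs that never occur.
    table = [[counts[(g, e)] for e in emojis] for g in groups]
    return groups, emojis, table

# Hypothetical records: "cluster" stands for the K-means label of the user.
records = [
    {"cluster": 0, "emoji": "joy"},
    {"cluster": 0, "emoji": "joy"},
    {"cluster": 0, "emoji": "heart"},
    {"cluster": 1, "emoji": "heart"},
    {"cluster": 1, "emoji": "cry"},
]
groups, emojis, table = contingency_table(records, "cluster")
for group, row in zip(groups, table):
    print(group, dict(zip(emojis, row)))
```

Each row of `table` is the emoji usage distribution of one cluster, which is the input a chi-squared test expects.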
3.2.3 The Clustering of Emoji Usage Records
The result of the K-means clustering of the emoji
usage records is shown in Table 2. The platform factor
is not included, as previous results show that the
relevance of platform to emoji usage is weak. The
result suggests that emoji usage records can be
classified into 3 clusters, with 1 cluster accounting
for most of the records. The assumption is therefore
that the 3 clusters represent 1 regular usage of emojis
and 2 sarcastic usages. To verify the assumption, the
research calculated the p-values of chi-squared tests
between pairs of clusters of emoji usage records under
the featured contexts. Because Cluster 1 and Cluster 3
both contain few records, the p-value of a test
comparing Cluster 1 with Cluster 3 is highly
susceptible to statistical error. The results of the
other two tests are shown in Table 2. From the table,
the research concludes that the emoji usage in
Clusters 1 and 3 is significantly different from that
in Cluster 2, notably under the Love context for
Cluster 1 (p = 0.023) and the Angry context for
Cluster 3 (p = 0.021). Cluster 2 can be regarded as the
normal usage of emojis, and Clusters 1 and 3 as
sarcastic or exceptional usage.
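The clustering step described above can be sketched as follows. This is a minimal illustration, not the implementation used in the research: it one-hot encodes categorical fields and runs a plain Lloyd-style K-means with k = 3. The field names, the sample records, and the initialization from distinct points are assumptions made for this example.

```python
import random

def one_hot(records, fields):
    """Encode each categorical field as a block of 0/1 indicator columns."""
    vocab = [(f, v) for f in fields for v in sorted({r[f] for r in records})]
    return [[1.0 if r[f] == v else 0.0 for f, v in vocab] for r in records]

def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    # Assumption: start from k distinct points so no two centers coincide.
    unique = sorted(set(map(tuple, points)))
    centers = [list(p) for p in rng.sample(unique, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical usage records (platform left out, as in the text above).
records = (
    [{"emoji": "joy", "context": "Funny", "gender": "F"}] * 4
    + [{"emoji": "joy", "context": "Sad", "gender": "M"}]
    + [{"emoji": "heart", "context": "Angry", "gender": "F"}]
)
labels = kmeans(one_hot(records, ["emoji", "context", "gender"]), k=3)
print(labels)
```

In this toy input, four identical records form one large cluster and the two unusual records each form their own, which mirrors the one-dominant-cluster pattern reported above. Real data would need several random restarts, since K-means only finds a local optimum.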
Table 2: The p-value of emoji-usage-cluster-related emoji distributions under contexts

Context        Cluster 1 & 2    Cluster 2 & 3
Angry          0.308            0.021
Love           0.023            0.711
Confusion      0.742            0.566
Celebration    0.677            0.513
Funny          0.558            0.567
Support        0.923            0.870
Surprise       0.451            0.150
Happy          0.562            0.613
Cool           0.425            0.525
Sad            0.823            0.155
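Each p-value in Table 2 comes from a chi-squared test on the emoji distributions of two clusters under one context. A self-contained sketch of such a test is given below. It assumes a Pearson test of independence without continuity correction, since the paper does not state which variant was used, and the counts in the example are invented for illustration.

```python
import math

def chi2_sf(x, dof):
    """Upper-tail probability of the chi-squared distribution (integer dof >= 1)."""
    y = x / 2.0
    # Closed-form starting point: Q(1/2, y) for odd dof, Q(1, y) for even dof.
    a, q = (0.5, math.erfc(math.sqrt(y))) if dof % 2 else (1.0, math.exp(-y))
    # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    while a < dof / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

def chi2_independence(table):
    """Pearson chi-squared test on counts (rows: clusters, columns: emojis)."""
    # Drop emojis and clusters with no counts; their expected counts are zero.
    cols = [c for c in zip(*table) if sum(c) > 0]
    rows = [list(r) for r in zip(*cols) if sum(r) > 0]
    col_tot = [sum(c) for c in zip(*rows)]
    if len(rows) < 2 or len(col_tot) < 2:
        raise ValueError("need at least two non-empty rows and columns")
    row_tot = [sum(r) for r in rows]
    n = sum(row_tot)
    # Sum of (observed - expected)^2 / expected over all cells.
    stat = sum(
        (obs - rt * ct / n) ** 2 / (rt * ct / n)
        for r, rt in zip(rows, row_tot)
        for obs, ct in zip(r, col_tot)
    )
    dof = (len(rows) - 1) * (len(col_tot) - 1)
    return stat, dof, chi2_sf(stat, dof)

# Invented counts of two emojis in two clusters under a single context.
stat, dof, p = chi2_independence([[10, 20], [20, 10]])
print(f"chi2 = {stat:.3f}, dof = {dof}, p = {p:.4f}")
```

A small p-value indicates that the two clusters use emojis differently under that context. With sparse counts, as in Clusters 1 and 3, the chi-squared approximation becomes unreliable, which is the concern raised in the text.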
4 CONCLUSIONS
In this work, a comprehensive analysis covering
multiple features of emoji users and contexts was
carried out on a dataset of emoji usage records to
uncover patterns of emoji usage more systematically.
Statistical analysis, both single-feature and
multi-feature, and machine learning methods were
applied, and their results were analyzed. The research
concludes that user gender has a substantial influence
on emoji usage, while user age and platform alone have
only a slight influence. User group, with age and
gender considered together, has the greatest impact on
the choice of emojis under the same context. The
research also regroups the users using K-means, and
the resulting groups yield more significant results
than the original gender-and-age groups. It further
categorizes emoji usage records and identifies normal
usage and sarcastic or exceptional usage among them.
However, the results of the machine learning methods
still lack a satisfactory explanation, and the
preprocessing for these methods involves only one-hot
encoding. Both are left for future work.
REFERENCES
Ahmed, M., Seraj, R., & Islam, S. M. S. 2020. The k-means
algorithm: A comprehensive survey and performance
evaluation. Electronics, 9(8), 1295.
Bai, Q., Dan, Q., Mu, Z., & Yang, M. 2019. A systematic
review of emoji: Current research and future
perspectives. Frontiers in Psychology, 10.
Benkhedda, Y., Xiao, P., & Magdy, W. 2024. Emoji are
effective predictors of user’s demographics. In
Proceedings of the 2023 IEEE/ACM International
Conference on Advances in Social Networks Analysis
and Mining (ASONAM '23), 784–792.
Boutet, I., LeBlanc, M., Chamberland, J. A., & Collin, C.
A. 2021. Emojis influence emotional communication,
social attributions, and information
processing. Computers in Human Behavior, 119,
106722.
Kaggle. 2024. Emoji trends dataset. Retrieved from
https://www.kaggle.com/datasets/waqi786/emoji-trends-dataset
Ma, W., Liu, R., Wang, L., & Vosoughi, S. 2020. Emoji
prediction: Extensions and benchmarking. arXiv
preprint arXiv:2007.07389.
Mindrila, D., Balentyne, P., & Tables, T. W. 2013. The Chi-
square test. The Basic Practice of Statistics, 6th ed.; WH
Freeman and Company: New York, NY, USA.
Stark, L., & Crawford, K. 2015. The conservatism of emoji:
Work, affect, and communication. Social Media +
Society, 1(2).
Van der Maaten, L., & Hinton, G. 2008. Visualizing data
using t-SNE. Journal of Machine Learning
Research, 9(11).
Zhao, G., Liu, Z., Chao, Y., & Qian, X. 2021. CAPER:
Context-aware personalized emoji recommendation.
IEEE Transactions on Knowledge and Data
Engineering, 33(9), 3160-3172.