The Complexity of Social Media Response: Statistical Evidence for One-dimensional Engagement Signal in Twitter

Damian Konrad Kowalczyk (1,2,a) and Lars Kai Hansen (2,b)

1 Microsoft Development Center Copenhagen, Kanalvej 7, 2800 Kongens Lyngby, Denmark
2 DTU Compute, Matematiktorvet 303B, 2800 Kongens Lyngby, Denmark

Keywords: Social, Influence, Engagement, Virality, Popularity, Twitter.

Abstract:

Many years after online social networks exceeded our collective attention, social influence is still built on attention capital. Quality is not a prerequisite for viral spreading, yet large diffusion cascades remain the hallmark of a social influencer. Consequently, our exposure to low-quality content and questionable influence is expected to increase. Since the conception of influence maximization frameworks, multiple content performance metrics have become available, albeit raising the complexity of influence analysis. In this paper, we examine and consolidate a diverse set of content engagement metrics. The correlations discovered lead us to propose a new, more holistic, one-dimensional engagement signal. We then show that it is more predictable than any individual influence predictor previously investigated. Our proposed model achieves strong engagement ranking performance and is the first to explain half of the variance with features available early. We share the detailed numerical workflow to compute the new compound engagement signal. The model is immediately applicable to social media monitoring, influencer identification, campaign engagement forecasting, and curating user feeds.

1 SOCIAL MEDIA ENGAGEMENT

The unprecedented amount of attention aggregated by online social networks has come under intense criticism in recent years (Bueno, 2016; Wu, 2017; Beyersdorf, 2019; Bybee and Jenkins, 2019), as billions are now exposed to low-quality content and questionable influence. Platforms like Facebook and Twitter offer an unparalleled opportunity for influence analysis and maximization, impacting public opinion, culture, policy, and commerce (Davenport and Beck, 2001).

Extant work on influence analysis focuses on homogeneous information networks and attributes the greatest influence to authors triggering the largest diffusion cascades (Franck, 2019). When the author's influence is modeled as the ability to maximize the expected spread of information in the network (Pezzoni et al., 2013; Eshgi et al., 2019), the most desirable user-generated content is the one propagated furthest, measured on Twitter by the number of retweets. Propagation metrics (retweet count in particular), however, do not capture the average individual attention received. A retweet does not tell us, e.g., whether the actor has actually read the content, let alone considered the source, or whether that effort was left to the followers. Meanwhile, the abundance of information to which we are exposed through online social networks is exceeding our capacity to consume it (Weng et al., 2012), let alone in a critical way. Weng et al. (2012) and Qiu et al. (2017) show that content quality is not a prerequisite for viral spreading, and Lorenz-Spreen et al. (2019) show that the competition for our attention is growing, causing individual topics to receive ever shorter intervals of collective attention. Accordingly, our exposure to low-quality information and, by extension, low-quality influence is increasing (Table 1).

a https://orcid.org/0000-0002-5612-0859
b https://orcid.org/0000-0003-0442-5877

Table 1: Four popular tweets ranked by the most prevalent influence predictor: the size of the diffusion triggered in the network, measured on Twitter by the number of retweets.

| Tweet (body) | Retweets | Replies | Favorites |
|---|---|---|---|
| "ZOZOTOWN新春セルが史上最速で取高100を先ほ(...)" | 4.5M | 357.4K | 1.3M |
| "HELP ME PLEASE. A MAN NEEDS HIS NUGGS" | 3.47M | 37K | 0.99M |
| "If only Bradley's arm was longer. Best photo ever. #oscars" | 3.21M | 215K | 2.29M |
| "No one is born hating another person because of the color of his skin or his background or his religion..." | 1.61M | 69K | 4.44M |

Kowalczyk, D. and Hansen, L.
The Complexity of Social Media Response: Statistical Evidence for One-dimensional Engagement Signal in Twitter.
DOI: 10.5220/0009169709180925
In Proceedings of the 12th International Conference on Agents and Artificial Intelligence (ICAART 2020) - Volume 2, pages 918-925
ISBN: 978-989-758-395-7; ISSN: 2184-433X

Today, the digital footprint of an audience goes far beyond the retweet action. Platforms like Facebook and Twitter record an increasingly diverse set of user behaviors, including the number of clicks, replies, or favorites (likes). Since the work of Pezzoni et al. (2013), Twitter has made many of these metrics available to the public, inviting a more holistic approach to influence modeling, albeit raising the complexity of all dependent tasks. Consequently, few studies to date systematically investigate how to model the strength of influence in heterogeneous information networks, and the processes that drive popularity in our limited-attention world remain mostly unexplored (Franck, 2019; Weng et al., 2012).

The four tweets in Table 1 illustrate that the mechanisms leading to high engagement are complex. In the following work, we investigate the multi-dimensional response of online audiences to understand this complexity. We examine and consolidate multiple discrete engagement metrics into a new compound engagement signal. While the new signal is statistically motivated, we also show its relevance for understanding engagement in multiple datasets. In particular, we show that the new signal is more predictable than the individual metrics (e.g., diffusion size measured by retweet count) prevalent in the literature. Our engagement model is the first to explain half of the variance with features available early, and to offer strong (Cohen, 1988) ranking performance simultaneously. We provide the workflow for calculating the new compound engagement signal from the raw counts.

The contributions of this paper are summarized as

follows:

1. Parallel analysis of three individual content performance signals, showing evidence of a one-dimensional engagement signal on Twitter.
2. A new compound engagement formula, capturing over 75% of the variance in the available engagement signals.
3. An advanced feature representation of user-generated content on Twitter that considers the increasingly popular 'quote tweets', validated on two real-world datasets.
4. Two new engagement models (response and popularity), delivering strong ranking performance.
5. A new state of the art in virality prediction on Twitter.
6. Finally, a new, more holistic compound engagement model, the first to explain half of the variance with content features available at the time of posting, and to offer strong ranking performance simultaneously.

2 METHODOLOGY

In this section, we describe the application of unsupervised learning towards contributions (1, 2, 6), the data collection and feature extraction approach towards contributions (1, 3), and the chosen supervised method towards contributions (4, 5, 6).

2.1 Principal Engagement Component

We acquire the multivariate set of responses forming the ground truth vector:

$$\mathbf{e} = [e_{\text{retweets}},\ e_{\text{replies}},\ e_{\text{favorites}}]. \tag{1}$$

Recent work on engagement modeling, e.g., (Lee et al., 2018), defines any response as a sign of engagement, effectively reducing the multivariate response to a one-dimensional signal. However, to our knowledge, the complexity of the engagement signal has not been explored more formally. While it appears credible that the population response signals, i.e., the dimensions of the vector e, are highly correlated, we can test the effective dimension of the space populated by the vectors using so-called Parallel Analysis (PA) (Horn, 1965; Jorgensen and Hansen, 2011). In PA, a principal component analysis of the measured signals is compared with the distribution of the principal components of null data obtained by permutation, under a (null) hypothesis that there is no dependency between the individual response signals. Consistent with this hypothesis, we can permute the sequence of the signals for each observation separately. In particular, we compute the upper 95% quantile for the distribution of the eigenvalues in the permuted data. Eigenvalues of the original unpermuted data set that reject the null hypothesis are considered "signal".

Principal components are computed on the response signals subject to a variance stabilization transformation,

$$\hat{e} = \ln(e + 1), \tag{2}$$

see e.g., (Can et al., 2013; Kowalczyk and Larsen, 2019).
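The Parallel Analysis procedure described in this section can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: it assumes the common permutation variant in which each response signal is shuffled across tweets independently, and the function name, the synthetic counts, and the random seeds are all hypothetical.

```python
import numpy as np

def parallel_analysis(counts, n_permutations=100, quantile=0.95, seed=0):
    """Compare PCA eigenvalues of log-transformed counts with a permutation null.

    counts: array of shape (N, D) holding raw engagement counts per tweet.
    Returns (observed eigenvalues, null thresholds, signal flags), each of length D.
    """
    rng = np.random.default_rng(seed)
    x = np.log1p(np.asarray(counts, dtype=float))  # variance stabilization, ln(e + 1)

    def sorted_eigenvalues(data):
        # Eigenvalues of the sample covariance matrix, largest first.
        return np.sort(np.linalg.eigvalsh(np.cov(data, rowvar=False)))[::-1]

    observed = sorted_eigenvalues(x)
    null = np.empty((n_permutations, x.shape[1]))
    for q in range(n_permutations):
        # Assumption: shuffle every signal independently to break dependencies.
        permuted = np.column_stack(
            [rng.permutation(x[:, j]) for j in range(x.shape[1])]
        )
        null[q] = sorted_eigenvalues(permuted)

    thresholds = np.quantile(null, quantile, axis=0)  # upper 95% quantile per eigenvalue
    return observed, thresholds, observed > thresholds

# Synthetic counts (hypothetical): one shared activity level drives all three signals.
demo_rng = np.random.default_rng(1)
activity = demo_rng.gamma(shape=0.5, scale=2.0, size=5000)
counts = demo_rng.poisson(np.outer(activity, [1.0, 0.3, 2.5]))
observed, thresholds, is_signal = parallel_analysis(counts)
```

On data like this, the first flag in `is_signal` marks the leading component as exceeding its null threshold; whether later components do depends on the data.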

2.2 Projection on the Engagement Component

Hypothesizing a one-dimensional engagement signal, we compute the value as the projection on the first principal component of the transformed data of dimension D = 3,

$$E = \sum_{i=1}^{D} w_i \left( \ln(e_i + 1) - \mu_i \right), \tag{3}$$

where $\mu_i = \frac{1}{N} \sum_{n=1}^{N} \hat{e}_{i,n}$ is the i'th component of the D-dimensional mean vector for a sample of size N, while $w_i$ is the i'th component of the first principal component, computed on the same sample.
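As a worked illustration of Equation 3, the sketch below projects the raw counts of a single tweet onto the first principal component. The weights and means are read from the T2017-ML row of Table 4, under the assumption that each metric's pair of values is ordered as (w, µ); the natural logarithm follows Equation 2. The function name and the example counts are hypothetical.

```python
import math

# Assumed reading of Table 4 (T2017-ML row): first value of each pair is w_i,
# second value is mu_i.
W_2017 = {"retweets": 0.451, "replies": 0.145, "favorites": 0.880}
MU_2017 = {"retweets": 0.049, "replies": 0.082, "favorites": 0.148}

def compound_engagement(counts, weights=W_2017, means=MU_2017):
    """Equation 3: sum over signals of w_i * (ln(e_i + 1) - mu_i) for one tweet."""
    return sum(
        weights[name] * (math.log1p(counts[name]) - means[name])
        for name in weights
    )

# Hypothetical tweets: one without any response, one with a modest response.
quiet = compound_engagement({"retweets": 0, "replies": 0, "favorites": 0})
busy = compound_engagement({"retweets": 12, "replies": 3, "favorites": 40})
```

Because the projection is centered on the sample mean, a tweet with no response at all receives a small negative score, and any response moves the score upward.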

2.3 Gradient Boosted Regression Trees (GBRT)

We consider the problem of predicting audience engagement for a given tweet based on features available immediately after its delivery (Table 3). Features describing the author are used together with the content, language, and temporal descriptors to predict the size of the retweet cascade, the number of likes, the number of replies, and the proposed compound engagement signal. GBRT is a tree ensemble algorithm that builds one regression tree at a time by fitting the residual of the trees that preceded it. The training process, minimizing a chosen twice-differentiable loss function, can be described as:

$$\theta^{*} = \arg\min_{\theta} \sum_{i=1}^{N} L(\hat{e}_i, e_i), \tag{4}$$

where θ contains all parameters of the proposed model, N is the number of examples, and L is the squared error of an individual prediction,

$$L(e, \hat{e}) = (e - \hat{e})^{2}. \tag{5}$$

We follow (Can et al., 2013; Kowalczyk and Larsen, 2019) to stabilize the variance of all individual engagement signals via the log-transformation in Equation 2.
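To make the residual-fitting idea concrete, here is a from-scratch sketch of gradient boosting with the squared error loss. It is not LightGBM and not the model used in the paper: each "tree" is reduced to a single-split stump, the learning rate and number of trees are arbitrary, and the toy features and targets are synthetic assumptions.

```python
import numpy as np

def fit_stump(x, residual):
    """Single-split regression tree minimizing squared error on the residual."""
    best = None
    for j in range(x.shape[1]):
        for threshold in np.unique(x[:, j])[:-1]:  # keep both sides non-empty
            left = x[:, j] <= threshold
            left_value, right_value = residual[left].mean(), residual[~left].mean()
            sse = ((residual[left] - left_value) ** 2).sum() + \
                  ((residual[~left] - right_value) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, threshold, left_value, right_value)
    if best is None:  # every feature is constant: fall back to a constant stump
        return 0, np.inf, residual.mean(), 0.0
    return best[1:]

def predict_stump(stump, x):
    j, threshold, left_value, right_value = stump
    return np.where(x[:, j] <= threshold, left_value, right_value)

def fit_gbrt(x, y, n_trees=50, learning_rate=0.1):
    """Add one stump at a time, each fitted to the residual of the ensemble so far."""
    base = y.mean()
    prediction = np.full(len(y), base)
    stumps = []
    for _ in range(n_trees):
        residual = y - prediction  # negative gradient of squared error, up to a factor
        stump = fit_stump(x, residual)
        prediction = prediction + learning_rate * predict_stump(stump, x)
        stumps.append(stump)
    return base, learning_rate, stumps

def predict_gbrt(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)

# Synthetic example: log-transformed counts driven by the first feature.
rng = np.random.default_rng(0)
features = rng.uniform(0.0, 10.0, size=(200, 2))
target = np.log1p(rng.poisson(np.exp(0.3 * features[:, 0]))).astype(float)

model = fit_gbrt(features, target)
rmse_baseline = float(np.sqrt(np.mean((target - target.mean()) ** 2)))
rmse_model = float(np.sqrt(np.mean((target - predict_gbrt(model, features)) ** 2)))
```

A production library adds deeper trees, second-order gradient information, regularization, and histogram-based split finding; the sketch only shows the additive structure that minimizes Equation 5 step by step.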

2.3.1 Gradient Boosting Framework

We use LightGBM, Microsoft's implementation of Gradient Boosted Decision Trees (Ke et al., 2017), for model training and tuning. LightGBM offers accurate handling of categorical features by applying the grouping method of (Fisher, 1958), which limits the dimensionality of our tasks.

3 DATA COLLECTION

Recent work on social network analysis re-emphasizes the importance of dataset size for making reliable predictions from representative samples. The intuition is that the larger the dataset, the better the accuracy and consistency of a predictive model, because size minimizes the possibility of bias. However, as argued by (Agarwal et al., 2019), this intuition is incomplete. Relying solely on short-timeframe samples or keyword-based crawling can produce a large dataset full of noise and irrelevant data (Bhattacharya et al., 2017). Careful collection and filtering strategies, in addition to large-scale sampling, are critical for building datasets representative of the population and for engagement modeling at scale.

3.1 Unique Tweets

We use Twitter's Historical PowerTrack APIs to collect the training and validation datasets described in Table 2. Retroactive filtering of Twitter archives allows close reproduction of datasets used in prior work (where still public), e.g., (Wang et al., 2018; Kowalczyk and Larsen, 2019). The Historical PowerTrack API also enables near-uniform sampling across long timeframes (Figure 1), to increase the proportion of the population in a sample, as motivated by (Kim et al., 2018). Collecting a dataset similar to T2017-ML by sampling the Twitter Firehose, as is prevalent in prior work, would have taken 14 months.

Figure 1: T2017-ML volume per month: Historical APIs

allow near uniform sampling of large-scale data to ensure

higher proportion of the population in a sample.

3.2 Engagement Totals

Three content engagement metrics have been made publicly available by Twitter since 2015. We use Twitter's Engagement Totals API to retrieve the number of retweets, replies, and favorites ever registered for each tweet (even if removed later via unlike or account suspension). Use of the Engagement Totals API ensures 100% accuracy of our supervisory vector of response signals e.

3.3 Sentiment Prediction

(Hansen et al., 2011; Kowalczyk and Larsen, 2019) show the impact of sentiment on a tweet's virality (retweetability). We reuse the sentiment predictions from (Kowalczyk and Larsen, 2019) for all tweets in the validation datasets to explore correlation with other engagement metrics and to ensure a fair comparison with previous results. The analysis was performed for tweets in 18 languages, using the Text Analytics APIs from Microsoft Cognitive Services (Microsoft, 2017).


Table 2: Datasets acquired.

| Dataset | T2016-IMG | T2017-ML | T2018-ML |
|---|---|---|---|
| introduced | Wang (2018) | Kowalczyk (2018) | now |
| w/image only | True | False | False |
| languages | English | 18 | all |
| months total | 3 | 14 | 12 |
| month from | 2016.10 | 2017.01 | 2018.01 |
| unique tweets | 2,848,892 | 9,719,264 | 29,883,324 |
| quoting | 421,175 | 583,514 | 2,647,072 |
| retweets total | 5,929,850 | 11,361,699 | 42,919,158 |
| replies total | 717,644 | 3,576,976 | 12,414,907 |
| favorites total | 12,665,657 | 29,138,707 | 134,523,998 |
| no engagement | 1,547,829 | 5,689,501 | 14,813,772 |

3.4 Datasets

Table 2 offers a summary of the three datasets collected for this study.

1. T2016-IMG, to evaluate our feature representation and training method in comparison with the work of (Mazloom et al., 2016; McParlane et al., 2014; Khosla et al., 2014; Cappallo et al., 2015; Wang et al., 2018; Kowalczyk and Larsen, 2019). The dataset matches the filters applied before (timeframe, language code, or the presence of an image attachment).
2. T2017-ML, to evaluate the generalizability of our resulting models across seasons and languages (cultures), and for comparison with the work of (Kowalczyk and Larsen, 2019). This dataset represents a near-uniform sample of the Twitter 2017 volume in all 18 languages supported by the sentiment analysis service (Microsoft, 2017).
3. T2018-ML, to evaluate the generalizability of our compound engagement signal across years. This dataset represents a near-uniform sample of the entire Twitter 2018 volume in all known languages. In this study, the T2018-ML dataset is used in unsupervised experiments only.

Datasets T2016-IMG and T2017-ML are split into 70% training, 20% test, and 10% validation sets. To aid reproducibility, we share the unique IDs of the acquired tweets along with the sentiment predictions.
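A 70/20/10 partition of tweet IDs could be produced as sketched below. The text does not state how the split was drawn, so the seeded random shuffle, the function name, and the placeholder IDs are assumptions made for illustration only.

```python
import random

def split_ids(tweet_ids, fractions=(0.7, 0.2, 0.1), seed=0):
    """Shuffle IDs and cut them into training, test, and validation lists."""
    ids = list(tweet_ids)
    random.Random(seed).shuffle(ids)  # assumption: a seeded uniform shuffle
    n_train = round(fractions[0] * len(ids))
    n_test = round(fractions[1] * len(ids))
    # The validation set takes whatever remains after the first two cuts.
    return ids[:n_train], ids[n_train:n_train + n_test], ids[n_train + n_test:]

# Placeholder IDs standing in for real tweet identifiers.
train, test, validation = split_ids(range(1000))
```

Fixing the seed keeps the partition stable across runs, which matters when shared IDs are meant to reproduce the same training and evaluation sets.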

3.4.1 Privacy Respecting Storage

The data analyzed in this study was publicly available at the time of collection. How much of it remains public can change rapidly afterward. We follow the architecture proposed by (Kowalczyk and Larsen, 2019) to secure the data in a central, highly scalable database, exposed to applicable privacy requests from Twitter's Compliance Firehose API and to feature extraction requests from our Spark cluster.

Table 3: Feature representation summary.

| Feature | Representation | Skewness | Quoted† |
|---|---|---|---|
| followers count | ordinal | 0.212 | True |
| friends count | ordinal | -0.321 | True |
| account age (days) | ordinal | 0.203 | True |
| statuses count | ordinal | -0.665 | True |
| actor favorites count | ordinal | -1.023 | True |
| actor listed count | ordinal | 0.687 | True |
| actor verified | categorical | - | True |
| body length | ordinal | -1.426 | True |
| mention count | ordinal | 3.820 | True |
| hashtag count | ordinal | 5.808 | True |
| media count | ordinal | 3.203 | True |
| url count | ordinal | 1.449 | True |
| language code | categorical | - | True |
| sentiment value | continuous | -0.014 | False |
| posted hour | ordinal | -0.058 | False |
| posted day | ordinal | 0.021 | False |
| posted month | ordinal | 0.210 | False |
| retweet count | label | 6.091 | n/a |
| reply count | label | 2.330 | n/a |
| favorite count | label | 3.122 | True |

† if True, an additional feature is extracted from the quoted tweet.

3.4.2 Feature Extraction

Table 3 describes the features extracted from each tweet. To ensure scalability in production, only the information available at the time of engagement is considered. In 2015, Twitter introduced 'quote retweets' (or 'quote RTs'), impacting political discourse and its diffusion, as shown by (Garimella et al., 2016). Over 3.5 million tweets collected for this study quote another (Table 2). We extend the feature representation by (Kowalczyk and Larsen, 2019) to represent them. The Quoted column of Table 3 marks the 14 features that are additionally computed from the quoted tweet. We log-transform the highly skewed features (count of followers, friends, statuses, and number of times the actor has been listed) to stabilize variance.

4 RESULTS

We begin by examining all available content performance signals (counts of retweets, replies, and favorites) in the extended-timeframe datasets. Using Parallel Analysis, we look for correlations that could enable reducing the dimension of engagement. In the supervised experiments, we first evaluate our methodology and feature representation against previous state-of-the-art methods by modeling the individual influence metrics (e.g., virality) and the compound engagement on the benchmark dataset T2016-IMG. Finally, we evaluate the generalizability of our method across topics and cultures, modeling engagement on the multilingual extended-timeframe dataset T2017-ML.


Figure 2: Parallel Analysis of the response signals for the 2017 data set provides evidence for a one-dimensional engagement signal: only the first component ('1', red dotted line) exceeds the 95% quantile of the corresponding eigenvalue under the null hypothesis (blue dashed line). Axes: eigenvalue number (1 to 3) against variance.

4.1 Evidence for a One-dimensional Engagement Signal

We perform Parallel Analysis and compute the principal components and their associated projected variances for the log-transformed data as well as for Q = 100 permutations of the data under the no-correlation null hypothesis. The one-sided upper 95% quantile is computed from the permuted samples. The variances of the unpermuted signals and the 95% quantiles for the three eigenvalues of the permuted data are shown in Figure 2. Very similar results are obtained for the 2018 data set (not shown).

4.2 The Engagement Signal

We perform principal component analysis of the two data sets, keeping a single principal component. The mean vectors and projections are found in Table 4. The variance explained by the first component in the three analyses is 83% (2016), 72% (2017), and 77% (2018).

Table 4: First principal components of the extended timeframe engagement signals, used to compute the one-dimensional compound engagement (see Equation 3).

| Dataset | retweets w | retweets µ | replies w | replies µ | favorites w | favorites µ |
|---|---|---|---|---|---|---|
| T2017-ML | 0.451 | 0.049 | 0.145 | 0.082 | 0.880 | 0.148 |
| T2018-ML | 0.450 | 0.066 | 0.188 | 0.080 | 0.872 | 0.205 |

4.3 Predicting Engagement

Metrics. We compute the Spearman ρ rank correlation coefficient to measure each model's ability to rank the content depending on the definition of engagement. We compute the relative measure of fit R² to compare the variance explained in the compound engagement and in the individual engagement signals. The absolute measure of fit (RMSE) is chosen as the optimization objective, for its penalization of large errors and relative insensitivity to outliers. The p-value for all reported ρ results is p < 0.001. Each metric is an average from 3-fold cross-validation. SciPy version 1.3.1 is used to ensure consistent handling of ties in ρ. Interpretation of R² and Spearman ρ is domain-specific, with guidelines for the social and behavioral sciences proposed by (Cohen, 1988).
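The three metrics can be written out in plain Python as a reference for what is being reported. This is an illustrative sketch rather than the evaluation code: the paper relies on SciPy for ρ, whereas the version below computes Spearman ρ as the Pearson correlation of average ranks, and the example targets and predictions are invented.

```python
import math

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the tied positions
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def pearson(a, b):
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def spearman_rho(y_true, y_pred):
    """Spearman rho as the Pearson correlation of tie-averaged ranks."""
    return pearson(average_ranks(y_true), average_ranks(y_pred))

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r_squared(y_true, y_pred):
    """Share of variance in y_true explained by y_pred."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Invented log-scale targets (note the tie at zero) and predictions.
y_true = [0.0, 0.0, 0.69, 1.10, 2.30]
y_pred = [0.1, 0.2, 0.50, 1.40, 2.00]
rho, fit, error = spearman_rho(y_true, y_pred), r_squared(y_true, y_pred), rmse(y_true, y_pred)
```

Tie handling matters here because a large share of tweets receive no engagement at all (Table 2), so many targets share the same value.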

Representation. The first round of our supervised experiments focuses on evaluating our user-generated content feature representation and GBRT approach against previous state-of-the-art methods in modeling established engagement signals, such as the size of diffusion (e.g., retweet count), response (i.e., number of replies), and popularity (i.e., number of favorites/likes), before attempting to predict the compound engagement. Table 5 shows the performance of our GBRT with the RMSE objective and the new feature representation. Features extracted from the quoted content did not provide a significant boost over the state of the art, likely due to the visual modality dominating in the T2016-IMG dataset, as considered by (Wang et al., 2018). The approach did, however, match the performance of (Kowalczyk and Larsen, 2019) in virality ranking, and achieves strong (Cohen, 1988) performance without considering the image modality. Applied to predict the new compound engagement, it sets a new benchmark for content engagement ranking, ρ = 0.680.

Table 5: Method evaluation on the T2016-IMG dataset.

| Method | R² | ρ | RMSE |
|---|---|---|---|
| (McParlane et al., 2014)† | - | 0.257 | - |
| (Khosla et al., 2014)† | - | 0.254 | - |
| (Cappallo et al., 2015)† | - | 0.258 | - |
| (Mazloom et al., 2016)† | - | 0.262 | - |
| (Wang et al., 2018)† | - | 0.350 | - |
| (Kowalczyk and Larsen, 2019) | 0.391 | 0.504 | 0.555 |
| virality (retweets) | 0.393 | 0.504 | 0.554 |
| response (replies) | 0.239 | 0.384 | 0.290 |
| popularity (favorites) | 0.500 | 0.656 | 0.665 |
| engagement (compound) | 0.501 | 0.680 | 0.341 |

† independent evaluation by (Wang et al., 2018)

Engagement. The second round of supervised experiments focuses on the scalability and generalizability of our approach across topics and cultures (languages). Table 6 shows the performance of our engagement models on the multilingual extended-timeframe dataset. Predicting the number of retweets with our new feature representation outperforms (Kowalczyk and Larsen, 2019), offering a new state of the art in virality ranking. The response and popularity models achieve strong (Cohen, 1988) ranking performance on T2017-ML. The compound engagement model again shows an increase in ranking performance over all individual engagement models, setting a new benchmark for engagement variance explained, R² = 0.507. Table 7 offers a real-world illustration of the ranking performance, in comparison with diffusion-based ranking (Table 1).

Table 6: Engagement prediction performance on the T2017-ML dataset. SD < 0.001 across 3-fold CV.

| Method | R² | ρ | RMSE |
|---|---|---|---|
| (Kowalczyk and Larsen, 2019) | 0.402 | 0.369 | 0.336 |
| virality (retweets) | 0.425 | 0.371 | 0.329 |
| response (replies) | 0.302 | 0.512 | 0.292 |
| popularity (favorites) | 0.493 | 0.526 | 0.484 |
| engagement (compound) | 0.507 | 0.529 | 0.228 |

Table 7: Four popular tweets, ranked by the new compound engagement metric.

| Tweet (body) | Engagement |
|---|---|
| "No one is born hating another person because of the color of his skin or his background or his religion..." | 9.283 |
| "If only Bradley's arm was longer. Best photo ever. #oscars" | 9.266 |
| "ZOZOTOWN新春セルが史上最速で取高100を先ほ(...)" | 9.158 |
| "HELP ME PLEASE. A MAN NEEDS HIS NUGGS" | 8.822 |

4.4 Feature Importance

Figure 3 offers a comparison of feature importance between all engagement models trained on the T2017-ML dataset. The importance equals the total gain of the splits that use the feature, averaged across the 3 folds and rescaled to [0, 1] for comparison across all engagement models. The uncertainty for virality features does not exceed 6%. When predicting response (i.e., number of replies), we find the number of users mentioned to have the highest predictive value, while the number of image attachments (i.e., media count) has almost none. The number of followers, the most popular feature in all prior work on virality prediction, ranks fourth when predicting compound engagement. The average number of followers received with each status or the number of times the author liked another tweet is far more predictive of compound engagement.
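The averaging and rescaling of gain-based importance can be sketched as follows. The gain values are invented, and min-max scaling is an assumption, since the text only states that importances are rescaled to [0, 1]. With LightGBM, per-fold gains of this kind can be read from `Booster.feature_importance(importance_type="gain")`.

```python
def relative_importance(gain_per_fold):
    """Average total split gain over folds, then min-max rescale to [0, 1]."""
    features = list(gain_per_fold[0])
    mean_gain = {
        f: sum(fold[f] for fold in gain_per_fold) / len(gain_per_fold)
        for f in features
    }
    low, high = min(mean_gain.values()), max(mean_gain.values())
    # Assumption: min-max scaling, so the weakest feature maps to 0 and the top to 1.
    return {f: (g - low) / (high - low) for f, g in mean_gain.items()}

# Invented total gains for three features across three folds.
folds = [
    {"followers count": 120.0, "mention count": 30.0, "media count": 10.0},
    {"followers count": 100.0, "mention count": 40.0, "media count": 20.0},
    {"followers count": 110.0, "mention count": 20.0, "media count": 0.0},
]
scores = relative_importance(folds)
```

Rescaling per model makes the importances comparable across the virality, response, popularity, and compound engagement models, whose raw gains live on different scales.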

5 CONCLUSION

In this study, we have analyzed the complexity of the multivariate response of users engaging with social media. We have employed large-timeframe collection and filtering strategies to build datasets of unique tweets that could better represent Twitter's population. We have acquired, examined, and consolidated various response (engagement) metrics available for each of the tweets. The significant correlation found between individual response signals leads us to propose a new one-dimensional compound engagement signal. We showed on multiple benchmark datasets that compound engagement is more predictable than any individual engagement signal, most notably the number of retweets, which measures the size of the diffusion cascade and is predominant in influence maximization frameworks (Franck, 2019; Eshgi et al., 2019).

Figure 3: Relative feature importance depending on the definition of engagement (top 23 out of 31 features).

Our compound engagement model is the first to explain half of the variance with features available at the time of posting, and to offer strong (Cohen, 1988) ranking performance simultaneously. The model is ready for production, with immediate application to social media monitoring, campaign engagement forecasting, influence prediction, and maximization. We propose the ability to engage the audience as a new, more holistic baseline for social influence analysis. We share the compound engagement workflow and parameters (Equation 3 and Table 4) to ensure reproducibility and inspire future work on engagement modeling. We hope that future work will balance any negative impact of diffusion-based influence maximization on our collective attention and well-being.

ACKNOWLEDGEMENTS

This project is supported by Microsoft Development Center Copenhagen and the Danish Innovation Fund, Case No. 5189-00089B. We would like to acknowledge the invaluable support of Sandeep Aparajit, Jörg Derungs, Ralf Gautschi, Tomasz Janiczek, Charlotte Mark, Pushpraj Shukla and Walter Sun. Any opinions, findings, conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect those of the sponsors.

REFERENCES

Agarwal, N., Dokoohaki, N., and Tokdemir, S., editors

(2019). Emerging Research Challenges and Opportu-

nities in Computational Social Network Analysis and

Mining. Lecture Notes in Social Networks. Springer

International Publishing, Cham.

Beyersdorf, B. (2019). Regulating the Most Accessible

Marketplace of Ideas in History: Disclosure Require-

ments in Online Political Advertisements after the

2016 Election. California Law Review, 107.

Bhattacharya, N., Arpinar, I. B., and Kursuncu, U. (2017).

Real Time Evaluation of Quality of Search Terms dur-

ing Query Expansion for Streaming Text Data Us-

ing Velocity and Relevance. In Proceedings - IEEE

11th International Conference on Semantic Comput-

ing, ICSC 2017, pages 280–281. Institute of Electrical

and Electronics Engineers Inc.

Bueno, C. C. (2016). The Attention Economy: Labour, Time

and Power in Cognitive Capitalism. Rowman & Lit-

tleﬁeld International.

Bybee, K. J. and Jenkins, L. (2019). Free Speech, Free

Press, and Fake News: What If the Marketplace of

Ideas Isn’t About Identifying Truth? SSRN Electronic

Journal.

Can, E. F., Oktay, H., and Manmatha, R. (2013). Predicting retweet count using visual cues. In Proceedings of the 22nd ACM International Conference on Information & Knowledge Management - CIKM '13.

Cappallo, S., Mensink, T., and Snoek, C. G. (2015). Latent

Factors of Visual Popularity Prediction. In Proceed-

ings of the 5th ACM on International Conference on

Multimedia Retrieval - ICMR ’15.

Cohen, J. (1988). Statistical Power Analysis for the Behav-

ioral Sciences. Lawrence Erlbaum Associates.

Davenport, T. H. and Beck, J. C. (2001). The attention

economy: Understanding the new currency of busi-

ness. Harvard Business Press.

Eshgi, S., Maghsudi, S., Restocchi, V., Stein, S., and Tassiu-

las, L. (2019). Efﬁcient inﬂuence maximization under

network uncertainty. In INFOCOM 2019 Workshop

proceedings.

Fisher, W. D. (1958). On Grouping For Maximum Homo-

geneity. American Statistical Association Journal.

Franck, G. (2019). The economy of attention. Journal of

Sociology, 55(1):8–19.

Garimella, K., Weber, I., and De Choudhury, M. (2016).

Quote rts on twitter: Usage of the new feature for po-

litical discourse. In Proceedings of the 8th ACM Con-

ference on Web Science, WebSci ’16, pages 200–204,

New York, NY, USA. ACM.

Hansen, L. K., Arvidsson, A., Nielsen, F. A., Colleoni, E.,

and Etter, M. (2011). Good friends, bad news - Affect

and virality in twitter. In Communications in Com-

puter and Information Science.

Horn, J. L. (1965). A rationale and test for the number of

factors in factor analysis. Psychometrika, 30(2):179–

185.

Jorgensen, K. W. and Hansen, L. K. (2011). Model selection

for gaussian kernel pca denoising. IEEE transactions

on neural networks and learning systems, 23(1):163–

168.

Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q., and Liu, T.-Y. (2017). LightGBM: A highly efficient gradient boosting decision tree. Advances in Neural Information Processing Systems.

Khosla, A., Das Sarma, A., and Hamid, R. (2014). What

makes an image popular? In Proceedings of the 23rd

international conference on World wide web - WWW

’14.

Kim, H., Jang, S. M., Kim, S.-H., and Wan, A. (2018).

Evaluating Sampling Methods for Content Analy-

sis of Twitter Data. Social Media + Society,

4(2):205630511877283.

Kowalczyk, D. K. and Larsen, J. (2019). Scalable Privacy-

Compliant Virality Prediction on Twitter. In Proceed-

ings of AffCon 2019 @ AAAI, volume 2328, pages 12–

27.

Lee, D., Hosanagar, K., and Nair, H. S. (2018). Advertis-

ing content and consumer engagement on social me-

dia: evidence from facebook. Management Science,

64(11):5105–5131.

Lorenz-Spreen, P., Mønsted, B. M., Hövel, P., and Lehmann, S. (2019). Accelerating dynamics of collective attention. Nature Communications, 10(1):1759.

Mazloom, M., Rietveld, R., Rudinac, S., Worring, M., and van Dolen, W. (2016). Multimodal Popularity Prediction of Brand-related Social Media Posts. In Proceedings of the 2016 ACM on Multimedia Conference - MM '16.

McParlane, P. J., Moshfeghi, Y., and Jose, J. M. (2014).

”Nobody comes here anymore, it’s too crowded”; Pre-

dicting Image Popularity on Flickr. Proceedings of

International Conference on Multimedia Retrieval -

ICMR ’14.

Microsoft (2017). Cognitive Services APIs refer-

ence. https://westus.dev.cognitive.microsoft.com/

docs/services/TextAnalytics.V2.0/. Accessed: 2018-

09-05.

Pezzoni, F., An, J., Passarella, A., Crowcroft, J., and Conti,

M. (2013). Why do I retweet it? An information prop-

agation model for microblogs. In Lecture Notes in

Computer Science (including subseries Lecture Notes

in Artiﬁcial Intelligence and Lecture Notes in Bioin-

formatics).

Qiu, X., F. M. Oliveira, D., Sahami Shirazi, A., Flammini,

A., and Menczer, F. (2017). Limited individual at-

tention and online virality of low-quality information.

Nature Human Behaviour, 1(7):0132.

Wang, K., Bansal, M., and Frahm, J. M. (2018). Retweet

wars: Tweet popularity prediction via dynamic multi-

modal regression. In Proceedings - 2018 IEEE Win-

ter Conference on Applications of Computer Vision,

WACV 2018.

Weng, L., Flammini, A., Vespignani, A., and Menczer, F.

(2012). Competition among memes in a world with

limited attention. Scientiﬁc reports, 2:335.

Wu, T. (2017). The attention merchants: The epic scramble

to get inside our heads. Vintage.
