Content Significance Distribution of Sub-Text Blocks in Articles
and Its Application to Article-Organization Assessment
You Zhou (https://orcid.org/0009-0005-0919-5793) and Jie Wang (https://orcid.org/0000-0003-1483-2783)
Richard Miner School of Computer & Information Sciences, University of Massachusetts, Lowell, MA 01854, U.S.A.
Keywords:
Content Significance Distribution, Embedding Similarity, Article Structure, Beta Distribution,
Article-Organization Assessment.
Abstract:
We explore how to capture the significance of a sub-text block in an article and how it may be used for text
mining tasks. A sub-text block is a sub-sequence of sentences in the article. We formulate the notion of content
significance distribution (CSD) of sub-text blocks, referred to as CSD of the first kind and denoted by CSD-1.
In particular, we leverage Hugging Face’s SentenceTransformer to generate contextual sentence embeddings,
and use MoverScore over text embeddings to measure how similar a sub-text block is to the entire text. To
overcome the exponential blowup on the number of sub-text blocks, we present an approximation algorithm
and show that the approximated CSD-1 is almost identical to the exact CSD-1. Under this approximation, we
show that the average and median CSD-1’s for news, scholarly research, argument, and narrative articles share
the same pattern. We also show that under a certain linear transformation, the complement of the cumulative
distribution function of the beta distribution with certain values of α and β resembles a CSD-1 curve. We
then use CSD-1’s to extract linguistic features to train an SVC classifier for assessing how well an article
is organized. Through experiments, we show that this method achieves high accuracy for assessing student
essays. Moreover, we study CSD of sentence locations, referred to as CSD of the second kind and denoted by
CSD-2, and show that average CSD-2’s for different types of articles possess distinctive patterns, which either
conform to common perceptions of article structures or rectify them with minor deviations.
1 INTRODUCTION
In articles crafted by skilled writers, certain sentence
positions hold greater significance compared to other
positions, as do certain sub-sequences of sentences.
A prime example is news articles, where the sen-
tences positioned towards the beginning tend to carry
greater significance than those towards the end, result-
ing in an inverted-pyramid-like structure for content
significance. In linguistics, article structures are qual-
itatively classified based on content-significance dis-
tributions. Some are classified into self-explanatory
geometric shapes, including inverted pyramid, hour-
glass, diamond, and Christmas tree. Other classifica-
tions include narrative, five-boxes, and organic. The
narrative presents a straightforward, chronological ac-
count of events. For information about five-box and
organic structures, the reader is referred to Saleh’s
guide to article writing (Saleh, 2014).
These qualitative classifications serve as a rule of
thumb for writers to organize various types of articles and for readers to grasp the significant content. However, content significance in an article has not been quantitatively studied, and its full potential beyond qualitative classification of article structures is yet to be unlocked.
In a recent study on ranking sentences in an arti-
cle, we note that an ad hoc location weight is assigned
to a sentence to reflect its significance based on intu-
itive judgments specific to the given type of articles
(Zhang and Wang, 2021; Zhang et al., 2021). Despite
the rudimentary simplicity of this approach, feature
analysis in this study demonstrates that such weights
do play a significant role in sentence ranking.
Motivated by this result, we explore how qualita-
tive classifications of article structures may be turned
into quantitative descriptions for carrying out certain
text mining tasks. In particular, we explore content
significance distribution of sub-text blocks in an arti-
cle that leads to the formulation of CSD of the first
kind. For this notion to be useful in practice, we need
to deal with the exponential blowup of the number
of sub-text blocks, for it is time consuming to com-
pute the exact CSD-1 for a long article. To overcome
this obstacle, we devise an approximation method
to compute CSD-1 over a moderate number of text
blocks chosen independently at random. We show
through experiments that the approximated CSD-1 is almost identical to the exact CSD-1.
We investigate four common types of articles: ar-
gument, narrative, news, and scholarly research. To
do so, we form four datasets using existing datasets,
one for each type of articles. We show that the average
and median CSD-1 for each type of articles share the
same pattern. Moreover, we show that a CSD-1 can be closely approximated by a linear transformation of the complement of the cumulative distribution function of the beta distribution with parameters 0 < α < 1 and 0 < β < 1.
To demonstrate the usefulness of CSD-1, we ap-
ply CSD-1 to intrinsically determining if an article is
well written. In particular, we use CSD-1’s to extract
features and train an SVC classifier using these fea-
tures to assess intrinsically how well an article is or-
ganized. Experimental results show that this method achieves high accuracy for student essays.
Next, we investigate CSD of the second kind
based on sentence locations in an article. Unlike com-
puting CSD-1 that incurs exponential running time,
exact CSD-2 can be computed efficiently. We show
that the average and median CSD-2’s for each type of
articles are close to each other, with distinctive pat-
terns for different types of articles, which either conform to common perceptions of the structures of news and narrative articles or rectify, with minor deviations, an earlier perception of the structures of scholarly research and argument articles in the study of sentence ranking (Zhang et al., 2021).
The rest of the paper is organized as follows: We
describe related work on automatic assessment of ar-
ticle qualities in Section 2. We present in Section 3
MoverScore and the datasets. In Section 4 we de-
fine CSD of the first kind, describe an approxima-
tion algorithm to compute it, and show that CSD-1’s
for different types of human-written articles all share
the same pattern. We also discuss CSD-1 for articles
formed by random sentences. In Section 5 we show how to approximate a CSD-1 by a linear transformation of the complement of the cumulative distribution function of the beta distribution with 0 < α < 1 and 0 < β < 1. In Section 6 we show how to use CSD-1
to intrinsically assess article organization. In Section
7 we define and discuss CSD of the second kind, de-
noted by CSD-2. We conclude the paper in Section
8 with remarks and suggestions for further investiga-
tions.
2 RELATED WORK
To the best of our knowledge, there is no prior work on quantitative investigations of the content significance distribution of sub-text blocks in an article.
Automatic assessment of article qualities, on the
other hand, has attracted attention in recent years. The
quality of an article is determined by a number of fac-
tors, including grammaticality, readability, stylistic
attributes, and the depth of expertise presented. Tex-
tual analysis can be carried out at the word level, such
as identifying verb formation errors, calculating av-
erage word frequency, and determining average word
length, and at the sentence level. Word-level features
have been used to measure word usages and lexical
complexity in assessing essays (Attali and Burstein,
2006). Incorporating sentence-level features for as-
sessing article qualities provides a fruitful approach
(Cummins et al., 2016).
In an attempt to address the complexity of assessing article qualities, Yang et al. (2018) introduced a modularized hierarchical convolutional
neural net, where individual sections of an article are
treated as separate modules, with an attentive pooling
layer applied to the concatenated representations of
these sections, which are fed into a softmax layer for
evaluation.
To investigate visual effects of an article such as
font choices, the layout of the article, and images in-
cluded in the article, a multimodal approach (Shen
et al., 2020) was presented to capture implicit qual-
ity indicators that extend beyond the textual content
of an article. The integration of these visual aspects
with the textual content enhances the effectiveness of
article quality assessment.
The concept of text coherence also plays a pivotal
role. For example, a hierarchical coherence model
was introduced (Liao et al., 2021), which seeks to
leverage local coherence within sentences as well as
broader contextual relationships and diverse rhetori-
cal connections. This approach transcends the con-
ventional assessment of sentence similarity, incorpo-
rating a richer understanding of the article’s coher-
ence.
3 SIMILARITY MEASURES AND
DATASETS
It is critical to use an appropriate metric to measure how semantically similar a sub-text block is to the entire text. Traditional token-based metrics for
measuring text similarity, such as BLEU (Papineni
et al., 2002), ROUGE (Lin, 2004), and Jaccard’s co-
efficients (Jaccard, 1912), fail to capture similarities
between texts in lexical forms that convey the same
or similar meanings. To overcome this limitation, we
would need a metric on semantic similarity.
3.1 MoverScore
We propose to use MoverScore (Zhao et al., 2019) to
measure semantic similarities between two texts. In
our case, one text is a sub-text block and the other
is the entire article. MoverScore uses Earth Mover’s
Distance (EMD) (Levina and Bickel, 2001) to com-
pute the distance between the contextual embeddings
of the two texts being compared, where contextual embeddings may be computed by ELMo (Peters et al., 2018), BERT (Devlin et al., 2018), or some other transformer. We use Hugging Face's SentenceTransformer (Reimers and Gurevych, 2019) to generate text embeddings. The resulting EMD score measures the semantic similarity of the two texts. In so doing, MoverScore provides many-to-one soft alignments that map each candidate word to several of the most related reference words, producing an assessment of the semantic similarity between two texts that is more aligned with human judgment.
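As an illustration (not the authors' exact pipeline), the sketch below computes an EMD-based similarity between a sub-text block and an article over sentence embeddings. The model name, uniform mass weights, and cosine ground costs are our assumptions; MoverScore itself additionally operates on word-level contextual embeddings with IDF weighting.

```python
# A minimal EMD-based similarity sketch in the spirit of MoverScore.
# Assumptions: sentence-level embeddings (the actual MoverScore uses
# word-level contextual embeddings with IDF weights) and uniform mass.
import numpy as np
import ot  # POT: Python Optimal Transport
from scipy.spatial.distance import cdist
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice

def emd_similarity(block_sents, article_sents):
    """Return a similarity score between a sub-text block and the whole
    article, using Earth Mover's Distance over sentence embeddings with
    cosine ground costs (costs are typically well below 1 in practice)."""
    X = model.encode(block_sents)          # (m, d) embeddings
    Y = model.encode(article_sents)        # (n, d) embeddings
    M = cdist(X, Y, metric="cosine")       # ground cost matrix
    a = np.full(len(block_sents), 1 / len(block_sents))    # uniform mass
    b = np.full(len(article_sents), 1 / len(article_sents))
    return 1.0 - ot.emd2(a, b, M)          # EMD cost -> similarity
```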
3.2 Datasets
We form four datasets, one for each type of arti-
cles, using the following existing datasets (all in En-
glish): SummBank (Radev, 2003), Argument Anno-
tated Essays (AAE) (Stab and Gurevych, 2014), Pre-
dicting Effective Arguments (PEA), BookSum (Kryściński et al., 2021), and research papers from arXiv.
NewsA for news articles. NewsA is a dataset of
200 news articles selected independently at ran-
dom from SummBank, with an average of 23 sen-
tences in an article. SummBank is a large collec-
tion of news articles with sentence rankings anno-
tated by three judges.
ArguE for argument articles. ArguE is a dataset
of 200 essays selected independently at random
from the union of AAE and PEA, with an average
of 21 sentences in an article. AAE consists of per-
suasive essays written by students for preparing
for standardized tests, and PEA consists of argu-
ment essays written by students of grades 6 12,
annotated by experts for discourse elements in ar-
gumentative writing.
NarrC for narrative articles. NarrC is a dataset of 200 chapter-level documents selected independently at random from BookSum, a large collection of long-form narratives from the literature domain, covering novels, plays, and stories, with an average of 255 sentences in a chapter.
SchRP for scholarly research papers. SchRP is a
dataset of 200 papers selected independently at
random from arXiv.org in Physics, Mathematics,
Computer Science, Quantitative Biology, Quanti-
tative Finance, Statistics, Electrical Engineering
and Systems Science, and Economics, with ap-
proximately an equal number of articles in each
subject with an average of 210 sentences in each
paper.
4 CSD OF THE FIRST KIND
Let $A = \langle S_1, S_2, \ldots, S_n \rangle$ be an article, represented as a sequence of sentences, where $S_i$ is the $i$th sentence. Let $k$ be an integer between 1 and $n$. There are $N = \binom{n}{k}$ sub-text blocks consisting of $k$ sentences. The number of sentences in a text block is also referred to as its size. For example, let $n = 10$ and $k = 0.3n = 3$. Then there are $\binom{10}{3} = 120$ sub-text blocks of size 3, listed in lexicographical order as follows: $\langle S_1, S_2, S_3 \rangle, \langle S_1, S_2, S_4 \rangle, \ldots, \langle S_8, S_9, S_{10} \rangle$.
Let $\mathrm{MSc}(X, Y)$ denote the MoverScore of text $X$ and text $Y$. Let $T_1$ and $T_2$ be two text blocks. We say that $T_1 > T_2$ if either $\mathrm{MSc}(T_1, A) > \mathrm{MSc}(T_2, A)$, or $\mathrm{MSc}(T_1, A) = \mathrm{MSc}(T_2, A)$ and $T_1$ precedes $T_2$ in lexicographical order. Sort the $N$ sub-text blocks of size $k$ in descending order according to this ordering, and let $T_{k,j}$ be the $j$th sub-text block in the sorted list; that is, $T_{k,1} > T_{k,2} > \cdots > T_{k,N}$. Then the CSD-1 for $A$ with size $k$ is a discrete function over $x_j = j/N \in [0, 1]$ with $1 \le j \le N$, defined by
$$\text{CSD-1}(A, k, x_j) = \mathrm{MSc}(T_{k,j}, A).$$
We are particularly interested in selecting 30% of
sentences to form a sub-text block because the previ-
ous results indicate that selecting 30% of sentences
Table 1: Comparisons between approximated CSD-1 and exact CSD-1 over NewsA: (a) average CSD-1 over NewsA; (b) CSD-1 over a sample news article.
appropriately from an article would typically cap-
ture the major points of the article (see, for example,
(Zhang and Wang, 2021)).
Note that $\binom{n}{k} \ge (n/k)^k$. For $k = 0.3n$, we have $(n/k)^k > 3^{0.3n}$, resulting in an exponential blowup. When $n$ is large, computing CSD-1 is intractable. For example, recall that in SchRP, the dataset of scholarly research papers, the average number of sentences in each article is 210. We have $\binom{n}{k} = \binom{210}{63} > 3 \times 10^{54}$, which is much too big for any computer to handle. Approximation is therefore needed.
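As a quick check of this magnitude (a hypothetical snippet using only Python's standard library):

```python
# Sanity check of the blowup: C(210, 63) exceeds 3 x 10^54.
import math

print(math.comb(210, 63))  # a 55-digit number, > 3e54 as stated above
```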
Table 2: Average and median approximated CSD-1 for articles of each type, where the x-values are the normalized sequence of text blocks in ascending order of MoverScores compared with the article itself, and the values in the table are the MoverScores.
4.1 Approximating CSD-1
Approximating CSD-1 for each article proceeds as follows (a code sketch follows the steps):
1. Generate independently at random 5,000 text
blocks.
2. Cluster the sentences of $A$ using Affinity Propagation (Frey and Dueck, 2007) based on sentence embeddings generated by SentenceTransformer. Let $C_1, C_2, \ldots, C_m$ be the resulting clusters, where $m$ is determined by the clustering algorithm. Let $n_i$ be the number of sentences in cluster $C_i$ ($i = 1, 2, \ldots, m$). Select $0.3n \times n_i/n$ sentences from each cluster $C_i$ to form a text block, and randomly generate 5,000 such text blocks.
The objective of this step is to avoid pure random
sampling and try to cover the exact curve as much
as possible with limited samples. We achieve this
by selecting text blocks at random from different
topics just in case the text blocks selected in Step
1 have missed certain topics.
3. Combine the 5,000 text blocks generated in Step 1
and the 5,000 text blocks generated in Step 2. Use
these 10,000 text blocks to compute CSD-1 as an
approximation to the exact CSD-1 for A.
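The sketch below follows these three steps under the same assumptions as the earlier snippets (our emd_similarity stand-in for MSc and our model choice; block sizes may deviate slightly from k due to rounding):

```python
# A sketch of the CSD-1 approximation: 5,000 uniform random blocks
# plus 5,000 cluster-guided blocks, scored and sorted descending.
import random
import numpy as np
from sklearn.cluster import AffinityPropagation
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice

def sample_blocks(sentences, k, n_samples=5000):
    """Step 1: blocks of k sentence indices chosen uniformly at random."""
    n = len(sentences)
    return [sorted(random.sample(range(n), k)) for _ in range(n_samples)]

def cluster_guided_blocks(sentences, k, n_samples=5000):
    """Step 2: blocks drawn proportionally from Affinity Propagation clusters."""
    E = model.encode(sentences)
    labels = AffinityPropagation().fit_predict(E)
    clusters = [np.flatnonzero(labels == c) for c in np.unique(labels)]
    n = len(sentences)
    blocks = []
    for _ in range(n_samples):
        block = []
        for C in clusters:
            take = round(k * len(C) / n)          # 0.3n * n_i / n sentences
            block += random.sample(list(C), min(take, len(C)))
        blocks.append(sorted(block))
    return blocks

def approx_csd1(sentences, k, msc):
    """Step 3: score the combined 10,000 blocks and sort descending."""
    blocks = sample_blocks(sentences, k) + cluster_guided_blocks(sentences, k)
    scores = sorted((msc([sentences[i] for i in b], sentences) for b in blocks),
                    reverse=True)
    N = len(scores)
    return [((j + 1) / N, s) for j, s in enumerate(scores)]
```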
Figure 1: Average and median approximated CSD-1 for articles of each type ((a) news, (b) scholarly research, (c) argument, (d) narrative), where the x-axis is the normalized sequence of text blocks in ascending order of MoverScores compared with the article itself, and the y-axis is the MoverScores.
Extensive experiments indicate that CSD-1 under this
approximation is almost identical to the exact CSD-
1 for various sizes. For example, Table 1 (a) shows
the average approximated CSD-1 and the exact CSD-
1 for articles in NewsA, while Table 1 (b) shows the
approximated CSD-1 and the exact CSD-1 for a random sample from NewsA, where c is a fraction that determines the number k of sentences in a text block with k = cn. Thus, it is sufficient to use the approximated CSD-1 in place of the exact CSD-1.
Table 2 depicts the average and median approxi-
mated CSD-1’s with c = 0.3 over each dataset, where
SchR, Argu, and Narr represent, respectively, Schol-
arly Research, Argument, and Narrative, while avg
and med represent, respectively, average and median.
Figure 1 depicts the corresponding curves.
4.2 Segments of CSD-1
It is evident that the average approximated CSD-1 and
the median approximated CSD-1 for each type of arti-
cles are very close to each other, and they all share the
same pattern. This pattern can be divided into three
segments, referred to as, from left to right, the left seg-
ment (L-segment), the middle segment (M-segment),
and the right segment (R-segment). The L-segment
contains a very small number of text blocks that are
substantially more significant than the rest, with the
largest value close to 0.9. The union of these text
blocks is the most significant content of the article.
The R-segment contains a very small number of text
blocks with the least significance, with value below
0.6 and above 0.4. These text blocks often contain
connection sentences. The M-segment is the majority
of text blocks that are gradually decreasing in terms
of significance. Figure 2 depicts the three segments
of CSD-1.
4.3 CSD-1 for Randomly Generated
Articles
To establish a baseline for CSD-1 on articles writ-
ten by under-educated writers, we generate articles by
selecting at random unrelated sentences from 20 ex-
Figure 2: CSD-1 segments.
isting articles, one sentence per article, and placing
them in a random order. We call such articles “Ran-
dom sentences”. Existing articles are selected from
SummBank in one setting, and Wikipedia (Wikimedia downloads) in
another setting.
We also form articles using 20 identical sentences
and 20 sentences with high embedding similarity. For
the former, each CSD-1 is simply a straight line. For
the latter, we generate similar sentences from a given
sentence as follows: Select at random one or two
words and replace them with synonyms to form 20
new sentences so that their pairwise embedding sim-
ilarity is greater than 0.9. Figure 3 depicts the aver-
age CSD-1 on random-sentence articles and similar-
sentence articles. It can be seen that the average CSD-
1 for random-sentence articles exhibits a much larger
y-value range from nearly zero to below 0.7.
5 TRANSFORMING BETA
DISTRIBUTION TO CSD-1
We show that a CSD-1 curve can be closely approximated using the beta distribution for certain values of the parameters α and β under a certain linear transformation. In particular, we present the following observations:
1. A typical CSD-1 curve resembles the complement of the cumulative distribution function (CDF) of a U-shaped probability density function (PDF).
2. The beta distribution, denoted by Beta(α, β), provides a U-shaped PDF when 0 < α < 1 and 0 < β < 1.
Denote by $I_x(\alpha, \beta)$ the CDF of $\mathrm{Beta}(\alpha, \beta)$. Let
$$C_x(\alpha, \beta) = 1 - I_x(\alpha, \beta).$$
Figure 4 shows the curve of $C_x(0.4, 0.3)$.
Figure 3: Average CSD-1's with c = 0.3 for (a) articles of random sentences and (b) articles of similar sentences, with the same x-axis and y-axis as those in Figure 1.

Figure 4: The curve of $C_x(0.4, 0.3)$, which spans the entire y-axis from 0 to 1.

We apply a linear transformation to obtain a CSD-1 curve in the desired range. Let
$$LC_x(a, b \mid \alpha, \beta) = a \cdot C_x(\alpha, \beta) + b,$$
where $a \ge 0$, $b \ge 0$, and $a + b \le 1$. For easier reading, we may also write $LC_x(a = a_0, b = b_0 \mid \alpha = \alpha_0, \beta = \beta_0)$ as $LC_x(a_0, b_0 \mid \alpha_0, \beta_0)$.
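For reference, a small sketch of $LC_x$ using SciPy's beta CDF; the parameter values mirror those of Figure 5(a) below:

```python
# A sketch of LC_x(a, b | alpha, beta) = a * (1 - I_x(alpha, beta)) + b,
# the linearly transformed complement of the beta CDF.
import numpy as np
from scipy.stats import beta

def lc(x, a, b, alpha, beta_):
    """Evaluate the transformed complement of the Beta(alpha, beta) CDF."""
    return a * (1.0 - beta.cdf(x, alpha, beta_)) + b

x = np.linspace(0.0, 1.0, 200)
y = lc(x, a=0.38, b=0.55, alpha=0.45, beta_=0.3)  # resembles NewsA CSD-1
```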
For example, Figure 5(a) depicts the curve of $LC_x(a = 0.38, b = 0.55 \mid \alpha = 0.45, \beta = 0.3)$, which resembles the curves of Figure 1(a) for NewsA. Figure 5(b) depicts the curve of $LC_x(a = 0.6, b = 0.05 \mid \alpha = 0.4, \beta = 0.35)$, which resembles the curves of Figure 3(a) for random sentences. Figure 5(c) depicts the curve of $LC_x(a = 0.1, b = 0.8 \mid \alpha = 0.4, \beta = 0.25)$, which resembles the curves of Figure 3(b) for similar sentences.
Figure 5: The curves of (a) $LC_x(0.38, 0.55 \mid 0.45, 0.3)$, (b) $LC_x(0.6, 0.05 \mid 0.4, 0.35)$, and (c) $LC_x(0.1, 0.8 \mid 0.4, 0.25)$, which resemble, respectively, the average and median curves for news articles in Figure 1(a), the average curve for random sentences in Figure 3(a), and the average curve for similar sentences in Figure 3(b).
We observe that α controls what the L-segment of the CSD-1 curve looks like and β controls what the R-segment looks like. In particular,
1. If α is smaller, then the L-segment is larger both vertically and horizontally.
2. If β is smaller, then the R-segment is larger both vertically and horizontally.
We conjecture that by choosing the values of α, β, a, and b appropriately, we can approximate the CSD-1 for any type of article using $LC_x(a, b \mid \alpha, \beta)$.
6 ASSESSING ARTICLE
ORGANIZATION
As an application of CSD-1, we show how to assess
intrinsically how well an article is organized using
features extracted from multiple CSD-1’s with vari-
ous sizes. For this purpose, we use the ASAP (Automated Student Assessment Prize) dataset (Stab and Gurevych, 2014), for it contains subsets suitable for carrying out this task. It consists of eight sets of essays, written by students and scored by experienced graders on ideas, organization, style, conventions, sentence fluency, and word choice.
In particular, we choose Set 7 and Set 8, for they provide an organization score for each essay, assessing whether the essay is organized in a way that enhances the central idea and its development, whether the sentence order is compelling and moves the reader through the text easily, and whether the connections between ideas and events are clear and logically sequenced.
Specifically, Set 7 consists of 1,730 essays with
an average of 250 words, scored by two graders with
scores from 0 to 3. Averaging the two scores for each essay yields five labels from 1 to 3 with an increment of 0.5, one score per essay. Set 8 consists of 918
essays with an average of 650 words, scored by two
or three graders with scores from 1 to 6. Averaging the scores for an essay may result in slight discrepancies depending on whether it has two or three scores. For
example, an average score could be 3.5 with 2 scores
or 3.67 with 3 scores for different papers of about the
same quality. We combine such scores into one score,
yielding eleven labels from 1 to 6 with an increment
of 0.5.
Neither set provides sufficient data to train neural-
net classifiers. Instead, we train SVC classifiers by
extracting features from approximated CSD-1's of various sizes, given as percentages of n, the number of sentences
in an article. For example, we may divide n by 10 or
by 5 to yield nine or 19 different sizes. Figure 6 shows
approximated CSD-1’s of text blocks of three differ-
ent sizes of a single document in Set 7, which are of
essentially the same shape with the one of a larger size
above the one of a smaller size.
Figure 6: Approximated CSD-1's with sizes determined by $c_i = 0.2 + 3(i-1)/10$ for $i = 1, 2, 3$.
However, even with 10,000 text blocks per approximated CSD-1, the computational cost is still too high, hindering our application of assessing essay organization in real time. To overcome this obstacle, we sample N text blocks uniformly and independently at random to further approximate the CSD-1 for each size. Through intensive and extensive experiments, we find that choosing N = 1,000 and dividing n by 10 offers a satisfactory trade-off between accuracy and time complexity for our applications. Note that selecting a larger value of N and dividing n by a number smaller than 10, while providing better accuracy, incurs a tremendous time cost, for CSD-1's must be calculated nine times, for sizes from 10% to 90% of n, for each article, making it impractical. This yields, for each article, nine vectors of 1,000 dimensions from the corresponding CSD-1's. We then sample 10 equally spaced positions from each 1,000d vector to produce, for each article, nine 10d vectors.
We train a multi-label SVC classifier for Set 7
based on the given training set using the 10d feature
vectors with a standard 80-20 split, and do the same
for Set 8.
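A sketch of this pipeline follows; the function names and toy data are ours, and in practice the feature vectors come from the approximated CSD-1's described above:

```python
# A sketch of the classifier stage: nine CSD-1 curves per essay are
# downsampled to 10 equally spaced points each and concatenated into
# a 90d feature vector, then an SVC is trained on an 80-20 split.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def csd1_feature_vector(curves):
    """curves: nine descending CSD-1 curves (sizes 0.1n .. 0.9n).
    Returns a 90d vector of 10 equally spaced samples per curve."""
    feats = []
    for curve in curves:
        curve = np.asarray(curve)
        idx = np.linspace(0, len(curve) - 1, 10).astype(int)
        feats.append(curve[idx])
    return np.concatenate(feats)

# Toy stand-ins: 100 essays, 90d features, five labels (1, 1.5, ..., 3).
rng = np.random.default_rng(0)
X = rng.random((100, 90))
y = rng.choice([1.0, 1.5, 2.0, 2.5, 3.0], size=100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)   # standard 80-20 split
clf = SVC().fit(X_train, y_train)
print(clf.score(X_test, y_test))
```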
For a test article, we call the predicted result “exact”, “adequate”, or “acceptable” if the label predicted by the SVC is, respectively, identical to its true label, within ±0.5 of the exact label, or within ±1 of the exact label. It is common for experienced graders to have small discrepancies among them, with ±0.5 being adequate and ±1 acceptable. Table 3 shows the binary F1 scores for “exact”, “adequate”, and “acceptable”.
In developing a practice application, for datasets
with a smaller number of labels, such as Set 7 with
5 labels, we may hesitate to consider ±1 of the exact
Table 3: F1 scores of predicted labels for test articles.

        Exact (%)   ±0.5 (%)   ±1 (%)
Set 7     71.26       85.21     99.08
Set 8     67.19       76.66     98.01
acceptable, but it is reasonable to consider ±1 of the exact acceptable for datasets with a larger number of labels, such as Set 8 with 11 labels. Thus, we would adopt the 6-point system to achieve over 98% accuracy of acceptable intrinsic evaluation of article organization.
7 CSD OF THE SECOND KIND
We define CSD-2 to reflect the significance of sentence locations. The idea is to identify where the top 30% of sentences with the highest MoverScores are located. The reason for choosing 30% is the same as that in Section 4.
Given an article $A = \langle S_1, S_2, \ldots, S_n \rangle$ of a certain type, with $S_i$ being the $i$th sentence for $i = 1, 2, \ldots, n$, select the top $t = 0.3n$ sentences that are most similar to $A$ as key sentences, determined by MoverScores. Let these sentences be $S_{i_1}, S_{i_2}, \ldots, S_{i_t}$ with $i_1 < i_2 < \cdots < i_t$. We normalize the location index of each of these sentences in $A$ by $n$ and define CSD-2 as the following discrete function:
$$\text{CSD-2}(A, i/n) = \begin{cases} \mathrm{MSc}(S_{i_j}, A), & \text{if } i = i_j, \\ 0, & \text{otherwise.} \end{cases}$$
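A minimal sketch of this computation, again assuming an msc(block, article) similarity stand-in such as the emd_similarity function from Section 3:

```python
# A sketch of CSD-2: mark the normalized locations of the top 30% of
# sentences by MoverScore against the whole article.
import numpy as np

def csd2(sentences, msc):
    """Return an array v of length n with v[i] = MSc(S_{i+1}, A) if
    sentence i is among the top 30% most similar to A, else 0."""
    n = len(sentences)
    scores = np.array([msc([s], sentences) for s in sentences])
    t = max(1, round(0.3 * n))
    top = np.argsort(scores)[-t:]   # indices of the top t sentences
    v = np.zeros(n)
    v[top] = scores[top]
    return v                        # index i corresponds to x = (i + 1)/n
```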
We compute the average and median CSD-2’s for
each type of articles over, respectively, the ArguE,
NewsA, NarrC, and SchRP datasets. Table 4 depicts
the average and median CSD-2 for each dataset, and
Figure 7 depicts the corresponding CSD-2 curves.
Table 4: Average and median CSD-2 for each dataset.
The following observations are evident.
1. For each dataset, the median CSD-2 is very close to the average CSD-2, indicating that the average CSD-2 curve provides a good representative of each dataset.
2. For news articles, the average CSD-2 is monoton-
ically decreasing. This is in line with an inverted
Figure 7: Average and median CSD-2 for articles of each type ((a) news, (b) scholarly research, (c) argument, (d) narrative), where the x-axis is the normalized sentence indexes and the y-axis is the MoverScores of the sentences against the article they are in.
pyramid in the common perception for the struc-
ture of news articles. The small bumps over the
x-interval (0.2, 0.6) indicate that the structure of a
news article may not be a straight inverted pyra-
mid.
3. For scholarly research articles, the average CSD-2 is also monotonically decreasing. Unlike the average CSD-2 for news articles, its values over the x-interval (0, 0.2) drop rapidly from about 0.8 to below 0.2, and then decrease slowly over the x-interval (0.2, 1]. This indicates
that in a scholarly research paper, the first 20% of
sentences would be the most significant, and the
rest of the article would be justifications of these
statements. This differs from an earlier perception
that presumes an hourglass structure for scholarly
research articles (Zhang and Wang, 2021).
4. For argument articles, the average CSD-2 resem-
bles a shallow “W”, with two peaks at both ends
and one peak in the middle, where the peaks at
both ends are higher than the one in the middle.
This indicates that the structure of an argument
article would, on average, start and end with the
most significant arguments that match each other,
with other secondary arguments somewhere in the
middle. This differs from an earlier perception
that presumes a pyramid structure for argument
articles (Zhang and Wang, 2021).
5. For narrative articles, the average CSD-2 resembles a shallow wave line centered around a value slightly greater than 0.3. This indicates that the structure of a narrative article would, on average, include sentences of approximately the same significance throughout the article. This is in line with the common perception.
8 CONCLUSIONS AND FINAL
REMARKS
We present for the first time a quantitative method to capture content significance distributions of sub-texts within an article and show that it is a promising new approach to unlocking the potential of certain text mining tasks using linguistic knowledge that so far has only qualitative descriptions. In addition to CSD-1 and CSD-2, we believe that CSDs of other kinds may also be possible and are awaiting exploration.
We have demonstrated how to use CSD-1 to as-
sess article organization with high accuracy, and we
believe that other applications may also be possible.
For instance, we may use CSD-2 to identify the type
of a given article to help obtain more accurate rank-
ing of sentences. Incorporating sentence ranking into a large language model such as GPT-3.5-turbo (Brown et al., 2020), LLaMA (Touvron et al., 2023), or PaLM (Chowdhery et al., 2022) is expected to help generate a better summary for a given article.
Our approach to computing CSDs relies on metrics for comparing the semantic similarity of a sub-text block (a sentence being a special case of a sub-text block) to the article it is in. While MoverScore is ar-
guably the best choice at this time, computing Mover-
Scores incurs a cubic time complexity (Zhao et al.,
2019). Fortunately, this task is highly parallelizable
and we have implemented a parallel program to carry
out this task on a GPU, which provides much more ef-
ficient computation of CSD-1. Nevertheless, finding a
more effective and efficient measure for content simi-
larity is highly desirable for our tasks, particularly for
long articles.
We would also like to seek intuitions and mathematical explanations for why the functions $LC_x(a, b \mid \alpha, \beta)$ resemble CSD-1 curves.
Finally, we would like to explore if CSDs may be
used to assess the overall quality of an article with
a single score with better accuracy than an early at-
tempt (Wang et al., 2022) using a multi-scale essay
representation that can be jointly learned, which em-
ploys multiple losses and transfer learning from out-
of-domain essays.
ACKNOWLEDGMENT
We would like to thank Jay Belanger for a valuable
suggestion on function transformation.
REFERENCES
Feedback prize - predicting effective arguments. https://www.kaggle.com/competitions/feedback-prize-effectiveness/data. Accessed: 2022.
Wikimedia downloads.
Attali, Y. and Burstein, J. (2006). Automated essay scoring
with e-rater® v. 2. The Journal of Technology, Learn-
ing and Assessment, 4(3).
Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D.,
Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G.,
Askell, A., et al. (2020). Language models are few-
shot learners. Advances in Neural Information Pro-
cessing Systems, 33:1877–1901.
Chowdhery, A., Narang, S., Devlin, J., Bosma, M., Mishra,
G., Roberts, A., Barham, P., Chung, H. W., Sut-
ton, C., Gehrmann, S., et al. (2022). Palm: Scal-
ing language modeling with pathways. arXiv preprint
arXiv:2204.02311.
Cummins, R., Zhang, M., and Briscoe, T. (2016). Con-
strained multi-task learning for automated essay scor-
ing. In Proceedings of the 54th Annual Meeting of the
Association for Computational Linguistics (Volume 1:
Long Papers), pages 789–799.
Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K.
(2018). Bert: Pre-training of deep bidirectional trans-
formers for language understanding. arXiv preprint
arXiv:1810.04805.
Frey, B. J. and Dueck, D. (2007). Clustering by
passing messages between data points. science,
315(5814):972–976.
Jaccard, P. (1912). The distribution of the flora in the alpine zone. New Phytologist, 11:37–50.
Kryściński, W., Rajani, N., Agarwal, D., Xiong, C., and Radev, D. (2021). BookSum: A collection of datasets for long-form narrative summarization.
Levina, E. and Bickel, P. (2001). The earth mover’s distance
is the mallows distance: Some insights from statis-
tics. In Proceedings Eighth IEEE International Con-
ference on Computer Vision. ICCV 2001, volume 2,
pages 251–256. IEEE.
Liao, D., Xu, J., Li, G., and Wang, Y. (2021). Hierarchical
coherence modeling for document quality assessment.
In Proceedings of the AAAI Conference on Artificial
Intelligence, volume 35, pages 13353–13361.
Lin, C.-Y. (2004). Rouge: A package for automatic evalu-
ation of summaries. In Text summarization branches
out, pages 74–81.
Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. (2002).
Bleu: a method for automatic evaluation of machine
translation. In Proceedings of the 40th annual meet-
ing of the Association for Computational Linguistics,
pages 311–318.
Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., and Zettlemoyer, L. (2018). Deep contextualized word representations. arXiv preprint arXiv:1802.05365.
Radev, D., et al. (2003). SummBank 1.0 LDC2003T16. Web download. Philadelphia: Linguistic Data Consortium.
Reimers, N. and Gurevych, I. (2019). Sentence-bert: Sen-
tence embeddings using siamese bert-networks. arXiv
preprint arXiv:1908.10084.
Saleh, N. (2014). The Complete Guide to Article Writing:
How to Write Successful Articles for Online and Print
Markets. Writer’s Digest Books; Illustrated edition
(January 14, 2014).
Shen, A., Salehi, B., Qi, J., and Baldwin, T. (2020). A
multimodal approach to assessing document quality.
Journal of Artificial Intelligence Research, 68.
Stab, C. and Gurevych, I. (2014). Annotating argument components and relations in persuasive essays. In Proceedings of the 25th International Conference on Computational Linguistics (COLING 2014), Dublin, Ireland.
Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M.-A., Lacroix, T., Rozière, B., Goyal, N., Hambro, E., Azhar, F., et al. (2023). Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971.
Wang, Y., Wang, C., Li, R., and Lin, H. (2022). On the use of BERT for automated essay scoring: Joint learning of multi-scale essay representation. https://arxiv.org/pdf/2205.03835v2.pdf.
Yang, P., Sun, X., Li, W., and Ma, S. (2018). Automatic
academic paper rating based on modularized hierar-
chical convolutional neural network. arXiv preprint
arXiv:1805.03977.
Zhang, H. and Wang, J. (2021). An unsupervised seman-
tic sentence ranking scheme for text documents. Inte-
grated Computer-Aided Engineering, 28:17–33.
Zhang, H., Zhou, Y., and Wang, J. (2021). Contextual net-
works and unsupervised ranking of sentences. In 2021
IEEE 33rd International Conference on Tools with Ar-
tificial Intelligence (ICTAI), pages 1126–1131.
Zhao, W., Peyrard, M., Liu, F., Gao, Y., Meyer, C. M., and
Eger, S. (2019). Moverscore: Text generation evaluat-
ing with contextualized embeddings and earth mover
distance. arXiv preprint arXiv:1909.02622.