A Novel Method for Embedding and Extracting Secret Messages in
Textual Documents based on Paragraph Resizing
Benjamin Aziz$^{a}$, Aysha Bukhelli$^{b}$, Rinat Khusainov$^{c}$ and Alaa Mohasseb$^{d}$
School of Computing, University of Portsmouth, Portsmouth PO1 3HE, U.K.
Keywords:
Formal Methods, Information Hiding, Lexical Steganography, Text Steganography, Linguistic Steganography.
Abstract:
The ancient technique of information hiding known as text steganography has enjoyed much research in recent
years due to the rising popularity of social media platforms and the abundant availability of online literature
and other text as cover media for steganography. Whilst the majority of the research approaches have focused
on manipulating or replacing text, in some form or another, to embed secret information, the utilisation of
the structure of the document itself for such embedding has rarely been researched. Therefore, we propose in
this short paper a new approach for embedding secret messages in textual documents based on the splitting,
merging, and resizing of paragraph text. The size comparison between adjacent paragraphs embeds one bit of
information. We outline only the basic idea and define the syntax and semantics of the embedding language.
1 INTRODUCTION
Text steganography refers to all the techniques and
methods used for hiding secret messages in textual
documents (Agarwal, 2013; Lockwood and Curran,
2017; Taleby Ahvanooey et al., 2019; Kumar and
Singh, 2020; Majeed et al., 2021). Unlike other me-
dia, text documents are more challenging to use as
cover due to the lack of redundant information that
can be used for hiding secret messages. As a result,
the smallest manipulation of the text becomes imme-
diately visible to the human eye. This further means
that the original cover document cannot be assumed
to be known to the reader, since differences with the
modified version will be immediately detectable.
There are many approaches to text steganography.
These include format-based, in which the physical
features of text symbols are used to conceal a message
(Xiang et al., 2007; Roy and Manasmita, 2011; Na-
haruddin et al., 2018; Malik et al., 2017), substitution-
based, or lexical approaches, where words are substi-
tuted for others without affecting the meaning (Bar-
mawi, 2016; Yajam et al., 2014), random statistical
generation (Wu et al., 2020; Wu et al., 2019; Huan-
huan et al., 2017), linguistic methods (Li et al., 2021;
$^{a}$ https://orcid.org/0000-0001-5089-2025
$^{b}$ https://orcid.org/0000-0001-7578-977X
$^{c}$ https://orcid.org/0000-0003-2087-5245
$^{d}$ https://orcid.org/0000-0003-2671-2199
Yang et al., 2021b; Kang et al., 2020), as well as cov-
erless (Wu et al., 2018; Wang and Gao, 2019; Guan
et al., 2022) and human-in-the-loop-based (Bergmair
and Katzenbeisser, 2006; Grosvald and Orgun, 2012)
approaches. The use of text-based steganography and
steganalysis in social media (Shirali-Shahreza and
Shirali-Shahreza, 2007; Wilson et al., 2015) in re-
cent years has drawn much attention to these top-
ics from the research community, due to the impor-
tant security and safety implications that online me-
dia communications have on everyday modern life.
On the other hand, several text steganalysis methods
have also been proposed in the literature, as surveyed
by Samanta and Pattyanayak (2020). In
addition, datasets such as that of Yang et al. (2021a) have
been collected for testing text steganalysis methods,
and tools have been developed for text steganography,
such as LUNABEL (Chand and Orgun, 2006)
and SNOW (Kwan, 2013).
In this paper, we present a new structural approach
for encoding secret messages in text documents. We
do not manipulate the text itself, but rather the layout
of the document. More specifically, we use the break-
down of a document into paragraphs, where the differ-
ence between the sizes of adjacent paragraphs is used
to encode individual bits of information. We then pro-
pose a formal language to adjust the paragraph sizes
accordingly. This paper reports on our initial ideas,
and lays down the ground for future implementation
and validation of these proposals.
Aziz, B., Bukhelli, A., Khusainov, R. and Mohasseb, A.
A Novel Method for Embedding and Extracting Secret Messages in Textual Documents based on Paragraph Resizing.
DOI: 10.5220/0011384200003283
In Proceedings of the 19th International Conference on Security and Cryptography (SECRYPT 2022), pages 714-719
ISBN: 978-989-758-590-6; ISSN: 2184-7711
Copyright © 2022 by SCITEPRESS Science and Technology Publications, Lda. All rights reserved
Most relevant to our work is that of Liang and
Iranmanesh (2016), who proposed a method that adds
five white space characters at randomly selected
positions in a line, using a key to correlate the
characters required for embedding secret information.
This method is advantageous because the randomly
spread white spaces encode the message differently
under different keys.
The rest of the paper is structured as follows. In
Section 2, we introduce the theoretical background
necessary for defining our embedding method. In
Section 3, we define the embedding and extraction
processes. In Section 4, we suggest a suitable statisti-
cal test that can be used to attack a document embed-
ded with content using our approach. Finally, we con-
clude the paper in Section 5 and discuss future work.
2 THEORY
We assume that a text document, $S$, can be defined
as a finite sequence of paragraphs,
$S = \langle P_1, \ldots, P_n \rangle$.
Any other content can be ignored. We call the set
of all possible paragraphs $\mathcal{P}$. Hence, the
paragraphs of any sequence form a finite subset,
$S \subseteq \mathcal{P}$. We also call the set
of all possible sequences of paragraphs (i.e. documents)
$\mathcal{S}$. An example of such a set would be one
constructed from the hypothetical Hyperwebster dictionary
(Stewart et al., 1996). Furthermore, a paragraph,
$P = \langle c_1, \ldots, c_m \rangle$, is a sequence of
some finite number of characters with different
multiplicities, which we can capture through a function,
$ch\_of : \mathcal{P} \rightarrow \wp(\mathcal{C})$,
where $\mathcal{C}$ is the multi-set of all possible
characters in some language (e.g. English), and
$\wp(\mathcal{C})$ is a power-set preserving the
multiplicity (but not the order) of each character in
this multi-set (as defined by Axiom V in
(Blizard, 1988)):
$$ch\_of(P) = \{c : c \in P\}$$
Assuming every paragraph has at least one sentence
terminated by a punctuation mark, e.g. '!', '?' or '.',
the condition that
$\forall P \in \mathcal{P} : |ch\_of(P)| \in \mathbb{N}^{+}$
will always hold.
The same applies to documents, as we assume there
is no empty document, $S = \langle\rangle$, since it
would be useless for embedding any hidden content. In
general, a document and its paragraphs will need to
satisfy a number of conditions, such as those above, to
ensure that they are grammatically and structurally
sound. These conditions must remain invariant under the
embedding operations defined in the next section.
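As an illustration only, the multiset view of a paragraph and the two non-emptiness conditions above could be sketched in Python as follows. The names `ch_of`, `size` and `is_valid_document` are our own hypothetical helpers, and representing a paragraph as a plain string is an assumption rather than part of the formal model.

```python
from collections import Counter


def ch_of(paragraph: str) -> Counter:
    """Multiset of the characters of a paragraph.

    Counter keeps the multiplicity of each character but not its order.
    """
    return Counter(paragraph)


def size(paragraph: str) -> int:
    """Number of characters in the paragraph, counted with multiplicity."""
    return sum(ch_of(paragraph).values())


def is_valid_document(doc: list[str]) -> bool:
    """Assumed invariants: the document and all of its paragraphs are non-empty."""
    return len(doc) > 0 and all(size(p) > 0 for p in doc)
```

Under this representation, `size(p)` coincides with `len(p)`, so later sketches can compare paragraph sizes with `len` directly.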
We define the union of paragraphs, $\cup_P$, as follows:
$$\langle x_1, \ldots, x_n \rangle \cup_P \langle y_1, \ldots, y_m \rangle = \langle x_1, \ldots, x_n, \text{SPACE}, y_1, \ldots, y_m \rangle$$
This simply states that the paragraph union is
equivalent to the composition of two sequences of
characters (i.e. two paragraphs) separated by a space
character.
Now, we can define the following function,
$R : \mathcal{P} \times \mathcal{P} \rightarrow \{0, 1\}$,
to be a function on pairs of paragraphs
resulting in one of the values 0 or 1. We can define $R$
in any way we want, but for the rest of our model, we
will define $R$ as the size comparison function:
$$R(P_l, P_r) = \begin{cases} 0 & \text{if } |ch\_of(P_l)| \leq |ch\_of(P_r)| \\ 1 & \text{otherwise} \end{cases}$$
Using the $R$ function, a document consisting of $n$
paragraphs can be seen as a sequence of $(n - 1)$ zeroes
and ones. We call this sequence the document's
R sequence:
$$\langle R_1, \ldots, R_{n-1} \rangle$$
where $R_i = R(P_i, P_{i+1})$.
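The R sequence can be computed directly from a list of paragraphs. The sketch below is a minimal illustration, assuming that paragraphs are plain strings, that the size of a paragraph is its character count (spaces and punctuation included), and that a tie in size yields 0. The names `r_bit` and `r_sequence` are hypothetical.

```python
def r_bit(left: str, right: str) -> int:
    """Size comparison: 0 if the left paragraph is no longer than the right, else 1."""
    return 0 if len(left) <= len(right) else 1


def r_sequence(doc: list[str]) -> list[int]:
    """One bit per pair of adjacent paragraphs, so n paragraphs give n - 1 bits."""
    return [r_bit(a, b) for a, b in zip(doc, doc[1:])]
```

For instance, three paragraphs of 4, 2 and 3 characters give the bits 1 (4 > 2) and 0 (2 ≤ 3), while a single-paragraph document yields the empty sequence.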
$R$ is important; it constitutes our embedding function
and its definition will determine what secret message,
$M \in \{\langle x_i \rangle_{i=1}^{n-1} \mid x_i \in \{0, 1\}\}$,
we can communicate.
3 OUR PROPOSED METHOD
We present here our method for embedding and ex-
tracting a secret message. This method can be classi-
fied under the structural (or format-based) approaches
for text steganography, since it affects the structure of
the document rather than its content.
3.1 The Embedding Process
In order to embed a secret message in a document,
we first define a language for altering the paragraph
structure of documents. We call this language Λ, and
define its syntax as follows:
$$\Lambda ::= Op \mid \Lambda \,;\, \Lambda$$
$$Op ::= P \oplus P \mid \oslash P \mid P \leftarrow P \mid P \rightarrow P$$
Informally, Λ is either a document altering operation,
$Op$, or a sequential composition of two Λ terms,
written as $\Lambda \,;\, \Lambda$. A document altering
operation, $Op$, can be one of the following:
$P \oplus P$ is the paragraph merge operation, which
takes a pair of paragraphs and merges them into a single
paragraph. $\oslash P$ is the paragraph split operation,
which splits a paragraph into two paragraphs at the end
of some sentence, with the two new paragraphs having
random proportions. $P \leftarrow P$ is the left-shift
operation, which takes some characters from the right
(in reality, the lower) paragraph and adds them to the
left (in reality, the upper) paragraph, such that the
number of characters in the former becomes less than
that in the latter. Finally, $P \rightarrow P$ is the
right-shift operation, which takes some characters from
the left (in reality, the upper) paragraph and adds them
to the right (in reality, the lower) paragraph, such that
the number of characters in the latter becomes more than
that in the former. All of these operations must respect
the invariant conditions on the grammar and structure of
a paragraph and the document, for example, that a
paragraph must always end with a complete sentence, as
marked by a punctuation mark.
We define the formal operational meaning of Λ as
in Figure 1, using the semantic operator
$[\![\Lambda \triangleright S]\!] \in \mathcal{S}$,
defined inductively over the terms of the language
and with respect to an existing document $S$. We now
explain the semantic rules. Rule (Op1) defines the
meaning of a merge operation on a pair of paragraphs
that belong to a document by simply merging their
sentences and preserving the position of the new para-
graph as the position of the two old ones. Rule (Op2)
defines the meaning of a split operation for a para-
graph that belongs to a document, by splitting the
paragraph into two paragraphs, such that the posi-
tion of the new paragraphs occupies the position of
the old one. The two new paragraphs have random
sizes, but must together make up $ch\_of(P)$. Rule (Op3)
defines the meaning of the left-shift operation on a pair
of paragraphs that belong to a document as the mod-
ification of those paragraphs such that the number of
characters of the left paragraph now exceeds the right
one. Similarly, Rule (Op4) defines the meaning of the
right-shift operation on a pair of paragraphs such that
the new pair has more characters in the right para-
graph than the left. Finally, Rule (Λ0) defines the
meaning of the sequential composition of two Λ terms
as the application of the left term first on a document
and then the right one after that on the resulting doc-
ument.
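To make the four operations more concrete, the following Python sketch shows one possible reading of them. It is not the formal semantics of Figure 1: paragraphs are assumed to be plain strings, sentences are found with a deliberately naive splitter, paragraphs are addressed by a list index `i`, and the split point `k` is passed explicitly instead of being chosen at random. All function names are hypothetical.

```python
import re


def sentences(paragraph: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]


def merge(doc: list[str], i: int) -> list[str]:
    """Merge paragraphs i and i+1 into one, joined by a single space."""
    return doc[:i] + [doc[i] + " " + doc[i + 1]] + doc[i + 2:]


def split(doc: list[str], i: int, k: int) -> list[str]:
    """Split paragraph i into two paragraphs after its k-th sentence."""
    s = sentences(doc[i])
    if not 0 < k < len(s):
        raise ValueError("both new paragraphs must keep at least one sentence")
    return doc[:i] + [" ".join(s[:k]), " ".join(s[k:])] + doc[i + 1:]


def left_shift(doc: list[str], i: int) -> list[str]:
    """Move leading sentences of paragraph i+1 up into paragraph i
    until paragraph i is strictly longer than paragraph i+1."""
    left, right = sentences(doc[i]), sentences(doc[i + 1])
    while len(" ".join(left)) <= len(" ".join(right)):
        if len(right) <= 1:
            raise ValueError("cannot shift without emptying the right paragraph")
        left.append(right.pop(0))
    return doc[:i] + [" ".join(left), " ".join(right)] + doc[i + 2:]


def right_shift(doc: list[str], i: int) -> list[str]:
    """Move trailing sentences of paragraph i down into paragraph i+1
    until paragraph i+1 is strictly longer than paragraph i."""
    left, right = sentences(doc[i]), sentences(doc[i + 1])
    while len(" ".join(left)) >= len(" ".join(right)):
        if len(left) <= 1:
            raise ValueError("cannot shift without emptying the left paragraph")
        right.insert(0, left.pop())
    return doc[:i] + [" ".join(left), " ".join(right)] + doc[i + 2:]
```

Moving whole sentences, rather than individual characters, is a design choice made here so that every paragraph still ends with a sentence terminator after each operation. Sequential composition then corresponds to ordinary function composition, e.g. `right_shift(left_shift(doc, 1), 0)`.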
The embedding process now consists of applying
terms of the Λ language to any document $S$, such that
$[\![\Lambda \triangleright S]\!] = \langle P_1, \ldots, P_n \rangle$
and:
$$\langle R(P_1, P_2), \ldots, R(P_{n-1}, P_n) \rangle = M$$
where $M$ is the secret message being communicated.
It is also possible simply to choose an existing textual
document, or to construct one from scratch,
$\langle P_1, \ldots, P_n \rangle$, such that the
document satisfies the above equation with $M$.
As a simple example, let us consider the excerpt,
$S_{Dickens}$, defined in Figure 2 and taken from Charles
Dickens’s "Oliver Twist". If we want to transmit the
message $M = \langle 0, 1 \rangle$, then we must alter
$S_{Dickens}$ such that $R(P_1, P_2) = 0$ and
$R(P_2, P_3) = 1$. One way of achieving this would be to
apply:
$$(P_2 \leftarrow P_3) \triangleright S_{Dickens}$$
under an invariant condition that states that a
paragraph must end with a full stop. This then results
in a new excerpt, $S'_{Dickens}$, as shown in Figure 3.
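The example can be checked mechanically. The sketch below types in the two layouts of the excerpt and computes their R sequences, assuming that paragraph size is the character count of the string and that a tie yields 0. The variable names and the `r_sequence` helper are ours.

```python
def r_sequence(doc: list[str]) -> list[int]:
    """0 if the upper paragraph is no longer than the lower one, else 1."""
    return [0 if len(a) <= len(b) else 1 for a, b in zip(doc, doc[1:])]


P1 = ('"Bow to the Board," said Bumble. Oliver brushed away two or three '
      'tears that were lingering in his eyes, and seeing no board but the '
      'table, fortunately bowed to that.')
P2 = "\"What's your name, boy?\" said the gentleman in the high chair."
P3_HEAD = ('Oliver was frightened at the sight of so many gentlemen, which '
           'made him tremble; and the beadle gave him another tap behind, '
           'which made him cry; and these two causes made him answer in a '
           'very low and hesitating voice; whereupon a gentleman in a white '
           'waistcoat said he was a fool.')
P3_TAIL = ('Which was a capital way of raising his spirits, and putting him '
           'quite at ease.')

# Original layout: three paragraphs, the last one holding two sentences.
before = [P1, P2, P3_HEAD + " " + P3_TAIL]
# New layout: the first sentence of the third paragraph moves up into the second.
after = [P1, P2 + " " + P3_HEAD, P3_TAIL]

print(r_sequence(before))  # [1, 0]
print(r_sequence(after))   # [0, 1]
```

The original layout carries the bits 1, 0, since the first paragraph is longer than the second and the second is shorter than the third. After the shift, the second paragraph is the longest of the three, which gives the intended message 0, 1 without changing a single word of the text.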
3.2 The Extraction Process
The extraction process reverses the embedding process.
In order to extract a message, the recipient will
need to have agreed on the definition of $R$ with the
sender beforehand. With that in mind, the extraction
logic can be defined as follows:
$$Y\,\omega\;\langle P_1, \ldots, P_n \rangle\;\langle\rangle = \omega\;(Y\,\omega)\;\langle P_1, \ldots, P_n \rangle\;\langle\rangle$$
where $Y$ is Curry’s fixed-point combinator (Curry
and Feys, 1958, p.178), $\langle P_1, \ldots, P_n \rangle$
is the received text document, $\langle\rangle$ is the
empty sequence, to be filled with the bits of the secret
message, and $\omega$ is defined as the following
λ-calculus expression (Church, 1932):
$$\omega = \lambda f.\lambda s.\lambda m.\ \text{if } |s| \leq 1 \text{ then } m \text{ else } f\,(s \setminus fst(s))\,(m : R(fst(s), snd(s)))$$
Here, $fst : \mathcal{S} \rightharpoonup \mathcal{P}$ is a
partial function that returns the first paragraph
element in a sequence,
$snd : \mathcal{S} \rightharpoonup \mathcal{P}$ is a
partial function that returns the second paragraph
element in a sequence and
$\setminus : \mathcal{S} \times \mathcal{P} \rightharpoonup \mathcal{S}$
is a partial function that takes a sequence and a
paragraph, and returns a new sequence resulting from the
removal of that paragraph from the input sequence.
Finally, $(m : n)$ joins an element $n$ to the tail of an
existing sequence $m$ such that $n$ becomes the last
element of the new sequence. For example,
$\langle 1, 0, 1 \rangle : 1 = \langle 1, 0, 1, 1 \rangle$.
Both $fst$ and $snd$ are partial since they are not
defined over the empty sequence and the 1-element
sequence, respectively. $\setminus$ is partial since the
element being removed from a sequence may not be a
member. The definition of $\omega$ returns the current
message sequence unaltered if the size of the document
is one or zero paragraphs, since such a document cannot
hold any secret messages. Moreover, this is also used as
the condition to terminate the fixed point calculation.
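To extract the message from the excerpt of Figure
3, we apply the following, where the fixed point unfolds
three times (on sequences of three, two and one
paragraphs) before terminating:
$$Y\,\omega\;S'_{Dickens}\;\langle\rangle = \langle 0, 1 \rangle$$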
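The recursion behind the extraction logic can be mirrored by an ordinary recursive function, with the language's own recursion playing the role of the fixed-point combinator. This is a sketch under stated assumptions: paragraphs are strings, size is the character count, a tie yields 0, and the names `r_bit` and `extract` are hypothetical.

```python
def r_bit(left: str, right: str) -> int:
    """The agreed R function: 0 if left is no longer than right, else 1."""
    return 0 if len(left) <= len(right) else 1


def extract(doc: list[str], message: tuple[int, ...] = ()) -> list[int]:
    """Recursive extraction of the hidden bits.

    doc     -- the remaining paragraphs (the sequence s)
    message -- the bits collected so far (initially the empty sequence)
    """
    if len(doc) <= 1:
        # Zero or one paragraph left: nothing more to compare, so stop.
        return list(message)
    bit = r_bit(doc[0], doc[1])
    # Drop the first paragraph and append the new bit to the tail.
    return extract(doc[1:], (*message, bit))
```

For example, `extract(["aa", "bbbb", "c"])` compares the first two paragraphs (bit 0), then the last two (bit 1), and stops on the remaining single paragraph. For very long documents an iterative loop over adjacent pairs would avoid Python's recursion limit while computing the same sequence.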
4 PROPOSED STEGANALYSIS
According to (Taleby Ahvanooey et al., 2019), there
are generally three methods for attacking text doc-
uments with hidden content: visual attacks that in-
volve a human in comparing two documents, struc-
tural attacks that involve modifying the structure of
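(Op1) $[\![(P_l \oplus P_r) \triangleright \langle P_1, \ldots, P_{l-1}, P_l, P_r, P_{r+1}, \ldots, P_n \rangle]\!] = \langle P_1, \ldots, P_{l-1}, P, P_{r+1}, \ldots, P_{n-1} \rangle$
where $P = P_l \cup_P P_r$
(Op2) $[\![(\oslash P) \triangleright \langle P_1, \ldots, P_{l-1}, P, P_{r+1}, \ldots, P_n \rangle]\!] = \langle P_1, \ldots, P_{l-1}, P_l, P_r, P_{r+1}, \ldots, P_{n+1} \rangle$
where $P = P_l \cup_P P_r$
(Op3) $[\![(P_l \leftarrow P_r) \triangleright \langle P_1, \ldots, P_l, P_r, \ldots, P_n \rangle]\!] = \langle P_1, \ldots, P'_l, P'_r, \ldots, P_n \rangle$
where $|ch\_of(P_l)| \leq |ch\_of(P_r)|$ and $|ch\_of(P'_l)| > |ch\_of(P'_r)|$
(Op4) $[\![(P_l \rightarrow P_r) \triangleright \langle P_1, \ldots, P_l, P_r, \ldots, P_n \rangle]\!] = \langle P_1, \ldots, P'_l, P'_r, \ldots, P_n \rangle$
where $|ch\_of(P_l)| \geq |ch\_of(P_r)|$ and $|ch\_of(P'_l)| < |ch\_of(P'_r)|$
(Λ0) $[\![(\Lambda_l \,;\, \Lambda_r) \triangleright S]\!] = [\![\Lambda_r \triangleright [\![\Lambda_l \triangleright S]\!]]\!]$
Figure 1: Semantics of the document alteration language Λ.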
P1: "Bow to the Board," said Bumble. Oliver brushed away two or three tears that were lingering
in his eyes, and seeing no board but the table, fortunately bowed to that.
P2: "What’s your name, boy?" said the gentleman in the high chair.
P3: Oliver was frightened at the sight of so many gentlemen, which made him tremble; and the
beadle gave him another tap behind, which made him cry; and these two causes made him answer in
a very low and hesitating voice; whereupon a gentleman in a white waistcoat said he was a fool.
Which was a capital way of raising his spirits, and putting him quite at ease.
Figure 2: A three-paragraph excerpt, $S_{Dickens}$, taken from Charles Dickens’s "Oliver Twist".
P1: "Bow to the Board," said Bumble. Oliver brushed away two or three tears that were lingering
in his eyes, and seeing no board but the table, fortunately bowed to that.
P2: "What’s your name, boy?" said the gentleman in the high chair. Oliver was frightened at the
sight of so many gentlemen, which made him tremble; and the beadle gave him another tap behind,
which made him cry; and these two causes made him answer in a very low and hesitating voice;
whereupon a gentleman in a white waistcoat said he was a fool.
P3: Which was a capital way of raising his spirits, and putting him quite at ease.
Figure 3: The excerpt of Figure 2 with a new layout after encoding the secret message 0,1.
the suspected documents, hence destroying their embedded
content, and finally, statistical attacks, where the
attacker uses statistical methods to estimate the
probability that a document has some hidden
content.
In general, the first method always succeeds in
detecting hidden content if the attacker has access to
the cover document. Therefore, our embedding method
is unable to withstand such attacks. On the other hand,
our method resists structural or stylistic changes
unless these involve resizing paragraphs, in which case
the hidden message is affected. The most interesting
method is the statistical one, and we outline below one
such attack based on the chi-squared test (Pearson,
1900; Plackett, 1983). The generality of this test ren-
ders it a suitable one, albeit a rough one, as it is based
only on detecting similarities in numbers of 0s and
1s with the norm, but not other attributes, for exam-
ple their ordering. Other popular statistical tests, such
as Jaro-Winkler’s similarity test (Jaro, 1989; Winkler,
1990), are less suitable since they rely on changes in
the textual content itself (e.g. character or word re-
placement), which we avoid in our Λ embedding.
In our case, Pearson’s chi-squared test would be
defined by the following equation:
$$\chi^2 = \sum_{i=0}^{1} \frac{(O_i - E_i)^2}{E_i}$$
where $O_i$ is the observed value, i.e. the number of
occurrences of bit $i$ (0s or 1s) in a document’s R
sequence, and $E_i$ is the expected value of that count,
based, for example, on the results of some empirical
study carried out over some text corpus. There
are only two categories to sum over, 0 and 1. If we
assume that such an empirical study found the rate of
occurrence of 0s in a document’s R sequence to be
$Pr_0$, then the null hypothesis ($H_0$) for
our chi-squared test would be stated as follows:
$H_0$: The average rate at which a 0 occurs in a
document’s R sequence is $Pr_0$, where $0 \leq Pr_0 \leq 1$.
By contrast, the alternative hypothesis ($H_1$) is:
$H_1$: The average rate at which a 0 occurs in
a document’s R sequence is some other value, $Pr'_0$,
where $0 \leq Pr'_0 \leq 1$ and $Pr'_0 \neq Pr_0$.
In testing a suspicious document, $S$, we then perform
the following test:
$$\chi^2 = \frac{(O_0 - (Pr_0 \times |S|))^2}{Pr_0 \times |S|} + \frac{(O_1 - ((1 - Pr_0) \times |S|))^2}{(1 - Pr_0) \times |S|}$$
where $1 - Pr_0$ is the rate of occurrence of 1s in the
normal case. If, after this test, $H_0$ cannot be
rejected, meaning that the statistic does not exceed the
upper-tail critical value, i.e. $\chi^2 \leq 3.841$ for
a significance level of 0.05 with one degree of freedom,
then $S$ is considered clean.
On the other hand, if $H_0$ is rejected in favour of
$H_1$, meaning that $\chi^2 > 3.841$, then
this indicates that $S$ is more likely to contain some
secret message.
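A minimal sketch of this test in Python is given below. It assumes that the expected counts are taken over the length of the R sequence being tested, and that the baseline rate `pr0` comes from some corpus study. The value 0.5 used in the usage note is purely hypothetical, since no empirical value has been established here. The function names are ours.

```python
CRITICAL_VALUE = 3.841  # upper-tail chi-squared value, 1 degree of freedom, alpha = 0.05


def chi_squared(bits: list[int], pr0: float) -> float:
    """Pearson's chi-squared statistic for the 0/1 counts of an R sequence.

    bits -- the R sequence of the suspicious document
    pr0  -- assumed baseline rate of 0s in clean documents (strictly between 0 and 1)
    """
    if not 0.0 < pr0 < 1.0:
        raise ValueError("pr0 must lie strictly between 0 and 1")
    total = len(bits)
    if total == 0:
        raise ValueError("the R sequence is empty")
    observed0 = bits.count(0)
    observed1 = total - observed0
    expected0 = pr0 * total
    expected1 = (1.0 - pr0) * total
    return ((observed0 - expected0) ** 2 / expected0
            + (observed1 - expected1) ** 2 / expected1)


def looks_suspicious(bits: list[int], pr0: float) -> bool:
    """True if the null hypothesis is rejected at the 0.05 level."""
    return chi_squared(bits, pr0) > CRITICAL_VALUE
```

With a hypothetical baseline of `pr0 = 0.5`, a sequence of ten 0s and ten 1s gives a statistic of 0 and is not flagged, whereas eighteen 0s and two 1s gives (8² / 10) + (8² / 10) = 12.8, which exceeds 3.841 and is flagged. The chi-squared approximation is generally considered unreliable when expected counts are very small, so short documents would need a different test.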
5 CONCLUSION
We briefly presented in this paper a new structural
method for embedding and extracting secret messages
in text documents. The new method manipulates the
layout of a document by resizing its paragraphs, and
then uses the size comparison of two adjacent
paragraphs to embed a bit of information. In general,
our method is capable of embedding $n - 1$ bits
in a document consisting of $n$ paragraphs. We
defined formally the semantics of the embedding
language and suggested the chi-squared test as a suitable
attack against a document with a hidden message.
Future work will focus on validating the new
method by carrying out extensive experiments
on various text corpora. This will involve establishing
the typical distribution of 0s and 1s, in order to
define the value of $Pr_0$ for clean documents. Another
direction of future work would be to enrich the formal
language and its semantics with a logical theory
defining the invariant conditions needed for maintaining
the structural and grammatical integrity of paragraphs
and documents. Finally, it is also possible to give
alternative definitions of $R$, in particular, ones that
compare the sizes of non-adjacent paragraphs. Such
definitions of $R$ would be keyed, where the key
indicates which paragraphs are to be compared. Although
such keyed versions would be more secure, they
introduce the additional problem of key distribution.
REFERENCES
Agarwal, M. (2013). Text steganographic approaches: A
comparison. International Journal of Network Secu-
rity and its Applications, 5:91–106.
Barmawi, A. (2016). Linguistic based steganography using
lexical substitution and syntactical transformation. In
2016 6th International Conference on IT Convergence
and Security (ICITCS), pages 1–6.
Bergmair, R. and Katzenbeisser, S. (2006). Content-aware
steganography: About lazy prisoners and narrow-
minded wardens. In Information Hiding, volume
4437, pages 109–123.
Blizard, W. D. (1988). Multiset theory. Notre Dame Journal
of Formal Logic, 30(1):36–66.
Chand, V. and Orgun, C. O. (2006). Exploiting linguistic
features in lexical steganography: Design and proof-
of-concept implementation. In Proceedings of the
39th Annual Hawaii International Conference on Sys-
tem Sciences (HICSS’06), volume 6, pages 126b–
126b.
Church, A. (1932). A set of postulates for the foundation of
logic. Annals of Mathematics, 33(2):346–366.
Curry, H. B. and Feys, R. (1958). Combinatory Logic.
Number v. 1 in Combinatory Logic. North-Holland
Publishing Company.
Grosvald, M. and Orgun, C. O. (2012). Human-versus
computer-generated text-based steganography: Real-
world tests of two algorithms. Journal of Inf. Hiding
and Multimedia Signal Processing, 3(1):24–33.
Guan, B., Gong, L., and Shen, Y. (2022). A novel cover-
less text steganographic algorithm based on polyno-
mial encryption. Sec. and Comm. Networks, 2022.
Huanhuan, H., Xin, Z., Weiming, Z., and Nenghai, Y.
(2017). Adaptive text steganography by exploring sta-
tistical and linguistical distortion. In 2017 IEEE Sec-
ond International Conference on Data Science in Cy-
berspace (DSC), pages 145–150. IEEE.
Jaro, M. A. (1989). Advances in record-linkage methodology
as applied to matching the 1985 census of Tampa,
Florida. Journal of the American Statistical Association,
84(406):414–420.
Kang, H., Wu, H., and Zhang, X. (2020). Generative
text steganography based on lstm network and atten-
tion mechanism with keywords. Electronic Imaging,
2020(4):291–1.
Kumar, R. and Singh, H. (2020). Recent trends in text
steganography with experimental study. In Handbook
of Computer Networks and Cyber Security, pages
849–872. Springer.
Kwan, M. (2013). The SNOW Home Page.
Li, Y., Zhang, J., Yang, Z., and Zhang, R. (2021).
Topic-aware neural linguistic steganography based on
knowledge graphs. ACM Trans. on Data Science,
2(2):1–13.
Liang, O. W. and Iranmanesh, V. (2016). Information hid-
ing using whitespace technique in Microsoft word. In
Proceedings of the 2016 International Conference on
Virtual Systems and Multimedia, VSMM 2016. Insti-
tute of Electrical and Electronics Engineers Inc.
Lockwood, R. and Curran, K. (2017). Text based steganog-
raphy. International Journal of Information Privacy,
Security and Integrity, 3(2):134–153.
Majeed, M. A., Sulaiman, R., Shukur, Z., and Hasan, M. K.
(2021). A review on text steganography techniques.
Mathematics, 9(21).
Malik, A., Sikka, G., and Verma, H. K. (2017). A high
capacity text steganography scheme based on LZW
compression and color coding. Engineering Science and
Technology, an International Journal, 20(1):72–79.
Naharuddin, A., Wibawa, A. D., and Sumpeno, S. (2018).
A high capacity and imperceptible text steganography
using binary digit mapping on ASCII characters. In
2018 International Seminar on Intelligent Technology
and Its Applications (ISITIA), pages 287–292. IEEE.
Pearson, K. (1900). On the criterion that a given system of
deviations from the probable in the case of a correlated
system of variables is such that it can be reasonably
supposed to have arisen from random sampling. The
London, Edinburgh, and Dublin Philosophical Maga-
zine and Journal of Science, 50(302):157–175.
Plackett, R. L. (1983). Karl Pearson and the Chi-Squared
Test. International Statistical Review / Revue Interna-
tionale de Statistique, 51(1):59–72.
Roy, S. and Manasmita, M. (2011). A novel approach
to format based text steganography. In Proceedings
of the 2011 International Conference on Communication,
Computing and Security, ICCCS ’11, pages
511–516, New York, NY, USA. Association for
Computing Machinery.
Samanta, S. and Pattyanayak, S. (2020). A significant sur-
vey on text steganalysis techniques. Int. Journal on
Computer Science and Engineering, pages 187–193.
Shirali-Shahreza, M. H. and Shirali-Shahreza, M. (2007).
Text steganography in chat. In 2007 3rd IEEE/IFIP
International Conference in Central Asia on Internet,
pages 1–5.
Stewart, I. et al. (1996). From here to infinity. Oxford Pa-
perbacks.
Taleby Ahvanooey, M., Li, Q., Hou, J., Rajput, A. R., and
Chen, Y. (2019). Modern text hiding, text steganal-
ysis, and applications: A comparative analysis. En-
tropy, 21(4).
Wang, K. and Gao, Q. (2019). A coverless plain text
steganography based on character features. IEEE Ac-
cess, 7:95665–95676.
Wilson, A., Blunsom, P., and Ker, A. (2015). Detection of
steganographic techniques on Twitter. In Proceedings
of the 2015 Conference on Empirical Methods in Natural
Language Processing, pages 2564–2569.
Winkler, W. E. (1990). String comparator metrics and
enhanced decision rules in the Fellegi-Sunter model of
record linkage. Technical report, U.S. Bureau of the
Census.
Wu, N., Shang, P., Fan, J., Yang, Z., Ma, W., and Liu,
Z. (2019). Coverless text steganography based on
maximum variable bit embedding rules. In Journal
of Physics: Conference Series, volume 1237:2, page
022078. IOP Publishing.
Wu, N., Yang, Z., Yang, Y., Li, L., Shang, P., Ma, W., and
Liu, Z. (2020). Stbs-stega: Coverless text steganog-
raphy based on state transition-binary sequence. In-
ternational Journal of Distributed Sensor Networks,
16(3):1550147720914257.
Wu, Y., Chen, X., and Sun, X. (2018). Coverless steganog-
raphy based on english texts using binary tags proto-
col. Journal of Internet Technology, 19(2):599–606.
Xiang, L., Sun, X., Luo, G., and Gan, C. (2007). Research
on steganalysis for text steganography based on font
format. In Third International Symposium on Infor-
mation Assurance and Security, pages 490–495.
Yajam, H. A., Mousavi, A. S., and Amirmazlaghani, M.
(2014). A new linguistic steganography scheme based
on lexical substitution. In 2014 11th International ISC
Conference on Information Security and Cryptology,
pages 155–160.
Yang, Z., He, J., Zhang, S., Yang, J., and Huang, Y.
(2021a). Tstego-thu: Large-scale text steganalysis
dataset. In International Conference on Artificial In-
telligence and Security, pages 335–344. Springer.
Yang, Z., Xiang, L., Zhang, S., Sun, X., and Huang, Y.
(2021b). Linguistic generative steganography with en-
hanced cognitive-imperceptibility. IEEE Signal Pro-
cessing Letters, 28:409–413.