A NEW STEGANOGRAPHIC SCHEME
BASED ON FIRST ORDER REED MULLER CODES
Houda Jouhari and El Mamoun Souidi
Laboratoire de Mathématiques, Informatique et Applications, Faculté des Sciences, Université Mohammed V-Agdal
B. P. 1014, Rabat, Morocco
Keywords:
Steganography, Error correcting codes, Reed-Muller codes $\mathcal{RM}(1,m)$, Boolean functions.
Abstract:
Reed-Muller codes are widely used in communications and have fast decoding algorithms. In this paper we present an improved data hiding technique based on syndrome coding with first-order binary Reed-Muller codes. The proposed method hides the same amount of data as known methods while reducing the time complexity from $2^m(2^m-1)2^{m+1}$ binary operations to $2^m(2^m-1)m$ binary operations.
1 INTRODUCTION
Steganography is the art and science of invisible communication. It is used, sometimes together with cryptography, to protect information from unwanted third parties. In contrast with cryptography, where the enemy is able to detect, intercept and modify the transmitted information (Kahn, 1996), steganography is used primarily when the very fact of communicating needs to be kept secret. This is accomplished by embedding the secret messages within other, apparently innocuous, messages (called covers). Today's typical covers are computer files, mainly image, video and audio files (owing to the limited power of the human visual and auditory systems); but in fact, whenever an electronic document contains irrelevant or redundant information, it can be used as a cover for hiding secrets. For example, despite their known weaknesses, the most popular steganographic systems are LSB (least significant bit) techniques. In their most elementary form, the encoder selects a pixel of a bitmap image and replaces its LSB with a bit of information. More elaborate versions make it possible to hide information in JPEG and other image formats.
Nowadays, steganographic techniques are used to guarantee security and privacy on open systems (such as the Internet). They also play a role in electronic commerce, where they are used to prevent illegal uses of digital information (by means of watermarking for example, see (Cox et al., 2007)). For a more complete description of uses and applications of steganography, see (Bender et al., 2000), (Moulin and Koetter, 2005).
The design of a steganographic system has (at least) two facets. The first is the choice of appropriate covers and the search for strategies to modify them in an imperceptible way; this study relies on a variety of methods, including psycho-visual and statistical criteria. The second is the design of efficient algorithms for embedding and extracting the information. Here we concentrate on this second problem.
Our goal in this paper is to improve the efficiency of these embedding/retrieval algorithms by using coding theory techniques. Recall that error-correcting codes are commonly used for detecting and correcting errors in data transmission. Their use in steganography is not new: it was first suggested by Crandall (Crandall, 1998), who called it matrix encoding, and was later implicitly used by Westfeld in the design of F5 (Westfeld, 2001).
There is a close relationship between steganographic protocols and error-correcting codes: error-correcting codes can be used to construct good steganographic protocols and to study their properties. An explicit description of this relationship is given in (Zhang and Li, 2008), (Munuera, 2007).
Here, we focus on a particular family of error-correcting codes: the first-order binary Reed-Muller codes, denoted $\mathcal{RM}(1,m)$. These codes are widely used in long-distance communications; for example, a Reed-Muller code was used by Mariner 9 to transmit black and white photographs of Mars.

Jouhari H. and Souidi E.
A NEW STEGANOGRAPHIC SCHEME BASED ON FIRST ORDER REED MULLER CODES - A New Steganographic Scheme.
DOI: 10.5220/0003512703510356
In Proceedings of the International Conference on Security and Cryptography (SECRYPT-2011), pages 351-356
ISBN: 978-989-8425-71-3
Copyright © 2011 SCITEPRESS (Science and Technology Publications, Lda.)
This paper is organized as follows. After this introduction, Section 2 presents syndrome coding and first-order Reed-Muller codes, and discusses their interest for steganography once they are written in terms of Boolean functions. Section 3 contains our contribution: an improved algorithm based on list decoding, which embeds faster than the matrix embedding approach. The last section is devoted to discussion, comparison and conclusion.
Notations. $\mathbb{F}_2$ denotes the Galois field $\{0,1\}$; $d_H$ and $\omega_H$ denote the Hamming distance and the Hamming weight respectively.
2 CODING THEORY AND
STEGANOGRAPHY
2.1 Syndrome Coding
Let $\mathcal{C}$ be an $[n,k]$ code with parity check matrix $H$, and let $s \in \mathbb{F}_2^{n-k}$. For $x \in \mathbb{F}_2^n$, the syndrome of $x$ is defined to be $xH^t$. We let $\mathrm{Coset}(s)$ denote the set of all vectors in $\mathbb{F}_2^n$ with syndrome $s$. A vector of smallest weight in $\mathrm{Coset}(s)$ is called the leader of $\mathrm{Coset}(s)$, and we denote it by $I_s$ (if there is more than one such vector, simply take one at random). Clearly $\mathrm{Coset}(s) = \mathcal{C} + I_s$.
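Now, when decoding a vector $y$, we compute $yH^t = s$ and take the associated leader $I_s$ in $\mathrm{Coset}(s)$. The nearest element to $y$ in $\mathcal{C}$ is then $c = y - I_s$. To see this, note that
$$d_H(y,c) = \omega_H(y - c) = \omega_H(I_s),$$
and, since $y - a$ ranges over $\mathrm{Coset}(s)$ as $a$ ranges over $\mathcal{C}$ and $I_s$ has minimum weight in that coset,
$$\min_{a \in \mathcal{C}} d_H(y,a) = d_H(y,c).$$
Thus we decode $y$ as $y - I_s$. This procedure can be adapted to perform the embedding process.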
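As an illustration of this decoding procedure, the following Python sketch builds the coset leaders of a small code by brute force and decodes a vector $y$ as $y - I_s$. The $[7,4]$ Hamming code and the helper names (`syndrome`, `coset_leaders`, `decode`) are assumptions chosen for the example; they are not part of the scheme described in this paper.

```python
from itertools import product

def syndrome(x, H):
    """Return x.H^t over F_2 as a tuple (H is given as a list of rows)."""
    return tuple(sum(xi & hi for xi, hi in zip(x, row)) % 2 for row in H)

def coset_leaders(H, n):
    """Map every syndrome to one minimum-weight vector of its coset (brute force)."""
    leaders = {}
    for x in product((0, 1), repeat=n):
        s = syndrome(x, H)
        if s not in leaders or sum(x) < sum(leaders[s]):
            leaders[s] = x
    return leaders

def decode(y, H, leaders):
    """Nearest codeword c = y - I_s (subtraction is XOR over F_2)."""
    leader = leaders[syndrome(y, H)]
    return tuple(yi ^ li for yi, li in zip(y, leader))

# Illustrative choice: parity check matrix of the [7,4] Hamming code.
H = [
    (1, 0, 1, 0, 1, 0, 1),
    (0, 1, 1, 0, 0, 1, 1),
    (0, 0, 0, 1, 1, 1, 1),
]
leaders = coset_leaders(H, 7)
y = (1, 1, 0, 1, 0, 0, 0)
c = decode(y, H, leaders)   # a codeword at Hamming distance 1 from y
```

The table of leaders has $2^{n-k}$ entries and is built by scanning all $2^n$ vectors, which is only feasible for very short codes; this is the cost that the rest of the paper aims to avoid.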
2.2 Syndrome Coding and
Steganography
The behaviour of a steganographic algorithm can be
sketched in the following way: a cover-data x is mod-
ified into y to embed a message M; y is sometimes
called the stego-data. Here, we assume that the de-
tectability of the embedding increases with the num-
ber of bits that must be changed to transform x to y,
see (Westfeld, 2001) for some examples.
Syndrome coding deals with this number of changes. The key idea is to use a syndrome computation to embed the message $M$ into the cover-data $x$. More precisely, this scheme uses the cosets of a linear code $\mathcal{C}$ to hide $M$: a word $y$ hides the message $M$ if $y$ lies in a particular coset of $\mathcal{C}$ related to $M$. Since cosets are uniquely identified by their syndromes, embedding consists exactly in searching for a word $y$ with syndrome $M$ that is close enough to $x$.
We now set up the notation and describe the syndrome coding scheme precisely, together with its inherent problems. We are looking for two mappings, embedding $\mathrm{Emb}$ and extraction $\mathrm{Ext}$, such that:
$$\forall (x,M) \in \mathbb{F}_2^n \times \mathbb{F}_2^r, \quad \mathrm{Ext}(\mathrm{Emb}(x,M)) = M \tag{1}$$
$$\forall (x,M) \in \mathbb{F}_2^n \times \mathbb{F}_2^r, \quad d(x,\mathrm{Emb}(x,M)) \le T \tag{2}$$
Equation 1 means that we want to recover the message in all cases; Equation 2 means that we authorize the modification of at most $T$ coordinates in the vector $x$.
It is quite easy to show that the scheme makes it possible to embed messages of length $n-k$ in a cover-data of length $n$, while modifying at most $T$ ($\le \rho$, see footnote 1) elements of the cover-data. The embedding and extraction functions are defined after (Fontaine and Galand, 2007) by:
$$\mathrm{Emb}(x,M) = x + e = y \tag{3}$$
$$\mathrm{Ext}(y) = yH^t = M \tag{4}$$
where $e$ is an element of smallest weight (at most $\rho$) such that:
$$eH^t = M - xH^t = s \tag{5}$$
Note that the effective computation of $e\ (= I_s)$ is the complete syndrome decoding problem, which is a very hard problem.
The hidden message can be recovered from $y$ by:
$$yH^t = xH^t + eH^t = xH^t + M - xH^t = M \tag{6}$$
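Equations 3 to 6 can be checked on a toy example. The sketch below is only an illustration under stated assumptions: it uses the $[7,4]$ Hamming code (so $n - k = 3$ message bits are hidden in $n = 7$ cover bits), finds $e$ by exhaustive search, and the names `emb` and `ext` are hypothetical.

```python
from itertools import product

# Assumed example code: parity check matrix of the [7,4] Hamming code.
H = [
    (1, 0, 1, 0, 1, 0, 1),
    (0, 1, 1, 0, 0, 1, 1),
    (0, 0, 0, 1, 1, 1, 1),
]
N = 7

def ext(y):
    """Ext(y) = y.H^t over F_2 (Equation 4)."""
    return tuple(sum(a & b for a, b in zip(y, row)) % 2 for row in H)

def emb(x, M):
    """Emb(x, M) = x + e, with e a minimum-weight vector of syndrome M - x.H^t."""
    s = tuple(mi ^ si for mi, si in zip(M, ext(x)))   # subtraction is XOR over F_2
    candidates = (v for v in product((0, 1), repeat=N) if ext(v) == s)
    e = min(candidates, key=sum)                      # exhaustive coset-leader search
    return tuple(xi ^ ei for xi, ei in zip(x, e))

x = (1, 0, 1, 1, 0, 0, 1)   # cover-data
M = (1, 0, 1)               # message
y = emb(x, M)               # stego-data; ext(y) recovers M
```

With this particular code the covering radius is 1, so at most one cover bit is changed for any message.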
In this paper, the embedding process is divided into two steps. In the first step, an exhaustive search is used to obtain a first sequence $q = (q_1,\cdots,q_n)$ of the coset. This coset member $q$ can be identified simply, and independently of $x$, by looking for a sequence $q$ that fulfils
$$qH^t = s$$
In the second step of the embedding process, this coset member $q$ is used to determine a sequence at minimum distance from the cover sequence.
Using exhaustive search, we compare the coset member $q$ directly with the $2^k$ codewords. Since the time needed to find the first coset member $q$ is negligible (Schönfeld and Winkler, 2007), we obtain a coset leader in $O(n(n-1)2^k)$ binary operations. Indeed, $I_s = q - c$, where $c$ satisfies $d_H(q,c) = d_H(q,\mathcal{C})$.

Footnote 1. By definition, $\rho = \max_{x \in \mathbb{F}_2^n} \min_{c \in \mathcal{C}} d(x,c)$ is the covering radius of $\mathcal{C}$.
For large codeword lengths $n$, finding the optimal solution, and thus finding a coset leader, is known to be an NP-complete problem.
Since embedding with the classical approach, which finds a coset leader by exhaustive search, is complex and therefore time consuming, we focus on embedding strategies that reduce the embedding complexity without reducing the embedding efficiency.
To reduce the complexity of syndrome coding for embedding, it suffices to reduce the complexity of finding a vector $e$ of minimal weight satisfying Equation 5 ($e$ is then a leader of $\mathrm{Coset}(s)$).
2.3 First-order Binary Reed-Muller
Codes
The recursive nature of the construction of first-order binary Reed-Muller codes ($\mathcal{RM}(1,m)$) suggests that there is a recursive approach to decoding as well.
Roughly speaking, the $\mathcal{RM}(1,m)$ code of length $n = 2^m$ is a subspace of dimension $k = m+1$ which consists of affine functions. We can define this code as follows: a word $(u_0,u_1,\cdots,u_m)$ of length $k = m+1$ represents the affine function $f \in \mathcal{RM}(1,m)$ defined by the equality:
$$f(x) = u_0 + \langle u,x \rangle \tag{7}$$
where $u = (u_1,\cdots,u_m) \in \mathbb{F}_2^m$, $u_0 \in \mathbb{F}_2$ and $\langle u,x \rangle = \sum_{i=1}^{m} u_i x_i$ is the scalar product.
The encoded word is then given by the vector
$$(f(0), f(1), \cdots, f(2^m-1))$$
The minimum distance of $\mathcal{RM}(1,m)$ is $d = 2^{m-1}$. So this code can correct $t$ errors, where
$$t = \left\lfloor \frac{d-1}{2} \right\rfloor = 2^{m-2} - 1.$$
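The definition can be made concrete with a short Python sketch that enumerates the $2^{m+1}$ codewords for a small $m$ and checks the minimum distance. The function names and the convention that the integer $x \in \{0,\dots,2^m-1\}$ encodes $(x_1,\cdots,x_m)$ with $x_1$ as its least significant bit are assumptions made for the example.

```python
from itertools import product

def rm1_codeword(u0, u, m):
    """Evaluate f(x) = u0 + <u, x> at x = 0, 1, ..., 2^m - 1."""
    word = []
    for x in range(2 ** m):
        bits = [(x >> i) & 1 for i in range(m)]   # assumed order: x_1 is the lowest bit
        word.append((u0 + sum(ui * xi for ui, xi in zip(u, bits))) % 2)
    return tuple(word)

def rm1_code(m):
    """All 2^(m+1) codewords of the first-order Reed-Muller code of length 2^m."""
    return [rm1_codeword(u0, u, m) for u0 in (0, 1) for u in product((0, 1), repeat=m)]

m = 3
code = rm1_code(m)
# For a linear code the minimum distance is the minimum nonzero weight.
min_distance = min(sum(c) for c in code if any(c))
t = (min_distance - 1) // 2
```

For $m = 3$ this yields 16 codewords of length 8 with minimum distance 4, in agreement with $d = 2^{m-1}$ and $t = 2^{m-2} - 1 = 1$.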
By the support of a function $f$ we mean the set:
$$\mathrm{supp}(f) = \{x \in \mathbb{F}_2^m : f(x) \neq 0\}$$
and the weight of $f$ is the cardinality of its support:
$$\omega_H(f) = \mathrm{Card}(\mathrm{supp}(f)).$$
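Before presenting the decoding algorithm of $\mathcal{RM}(1,m)$ codes, we need to recall some definitions:
Definition 1. Let $f : \mathbb{F}_2^m \to \mathbb{F}_2$ be a Boolean function. Its Fourier transform is $\hat{f} : \mathbb{F}_2^m \to \mathbb{Z}$ defined by:
$$\hat{f}(v) = \sum_{x \in \mathbb{F}_2^m} f(x)(-1)^{\langle v,x \rangle} = \sum_{x \in \mathrm{supp}(f)} (-1)^{\langle v,x \rangle}.$$
We can show by induction on $m$ that
$$\sum_{x \in \mathbb{F}_2^m} (-1)^{\langle v,x \rangle} = 2^m \delta_0(v)$$
where $\delta_0$ is the Dirac function defined by:
$$\delta_0(v) = \begin{cases} 1 & \text{if } v = 0 \\ 0 & \text{otherwise} \end{cases}$$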
Definition 2. The Walsh-Hadamard transform (WHT) of a Boolean function $f$ is the real-valued function defined for all $v \in \mathbb{F}_2^m$ as the Fourier transform of its sign function $\chi_f(v) = (-1)^{f(v)}$:
$$\hat{\chi}_f(v) = \sum_{x \in \mathbb{F}_2^m} (-1)^{f(x)} (-1)^{\langle v,x \rangle}$$
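Let $f$ be a codeword of $\mathcal{RM}(1,m)$. We can write $f$ as $f(x) = u_0 + \langle u,x \rangle$, where $u \in \mathbb{F}_2^m$ and $u_0 \in \mathbb{F}_2$.
Consequently, all Walsh-Hadamard coefficients are zero except the one of index $u$:
$$\hat{\chi}_f(v) = \begin{cases} 2^m (-1)^{u_0} & \text{if } v = u \\ 0 & \text{otherwise} \end{cases}$$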
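This property is easy to observe numerically. The following sketch computes the Walsh-Hadamard transform directly from Definition 2 (in $4^m$ operations, so for small $m$ only) and applies it to one codeword. Representing $v$ and $x$ as $m$-bit integers, and the names `inner` and `walsh_hadamard`, are assumptions of the example.

```python
def inner(v, x):
    """<v, x> over F_2, for v and x given as m-bit integers."""
    return bin(v & x).count("1") % 2

def walsh_hadamard(f, m):
    """Naive transform: sum over x of (-1)^(f(x) + <v, x>), for every v."""
    return [
        sum((-1) ** (f[x] ^ inner(v, x)) for x in range(2 ** m))
        for v in range(2 ** m)
    ]

m, u0, u = 3, 1, 0b101
f = [u0 ^ inner(u, x) for x in range(2 ** m)]   # codeword f(x) = u0 + <u, x>
spectrum = walsh_hadamard(f, m)
# Only the coefficient of index u is nonzero, and it equals 2^m * (-1)^u0.
```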
3 THE PROPOSED
STEGANOGRAPHIC SCHEMES
In this section we describe our contribution, which is to use syndrome coding with a first-order binary Reed-Muller code, a code that has very efficient decoding methods.
Our problem is the following. We have two vectors $f = (f_1,\cdots,f_n)$ and $g = (g_1,\cdots,g_n)$ of length $n = 2^m$ with symbols in $\mathbb{F}_2$, and a message $M = (M_1,\cdots,M_{n-k})$ of length $n-k$. We want to modify $f$ into $g$ such that $M$ is embedded in $g$, changing at most $T$ coordinates in $f$.
3.1 Hiding using Fast Walsh Transform
(FWT)
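For $v \in \mathbb{F}_2^m$ we define the Boolean function $x \mapsto \langle x,v \rangle$ and
$$d(g,v) = |\{x \in \mathbb{F}_2^m : g(x) \neq \langle x,v \rangle\}|$$
Given a Boolean function $g$, the relationship between the Walsh transform of $g$ at $v$ and the distance between $g$ and $v$ is then given by:
$$\hat{\chi}_g(v) = 2^m - 2\,d(g,v) \tag{8}$$
Indeed,
$$\begin{aligned}
\hat{\chi}_g(v) &= |\{x \in \mathbb{F}_2^m : g(x) = \langle v,x \rangle\}| - |\{x \in \mathbb{F}_2^m : g(x) \neq \langle v,x \rangle\}| \\
&= 2^m - 2\,|\{x \in \mathbb{F}_2^m : g(x) \neq \langle v,x \rangle\}| \\
&= 2^m - 2\,d(g,v)
\end{aligned}$$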
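Relation (8) can be verified on an arbitrary Boolean function. In the sketch below the function $g$ is an arbitrary example on $\mathbb{F}_2^3$, and the helper names are hypothetical; vectors of $\mathbb{F}_2^m$ are again assumed to be encoded as $m$-bit integers.

```python
def inner(v, x):
    """<v, x> over F_2, for v and x given as m-bit integers."""
    return bin(v & x).count("1") % 2

def walsh(g, v):
    """Walsh coefficient of g at v, computed from the definition."""
    return sum((-1) ** (g[x] ^ inner(v, x)) for x in range(len(g)))

def distance_to_linear(g, v):
    """d(g, v): number of points where g differs from x -> <x, v>."""
    return sum(g[x] != inner(v, x) for x in range(len(g)))

m = 3
g = [0, 1, 1, 0, 1, 0, 0, 0]   # arbitrary Boolean function on F_2^3
checks = [walsh(g, v) == 2 ** m - 2 * distance_to_linear(g, v) for v in range(2 ** m)]
```

Every entry of `checks` is true, so maximising $|\hat{\chi}_g(v)|$ over $v$ amounts to finding the linear function closest to $g$ or to its complement.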
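Let $q$ be a member of $\mathrm{Coset}(s)$, that is, $qH^t = s$. To find the coset leader $e\ (= I_s)$ we look for $u \in \mathbb{F}_2^m$ such that $|\hat{\chi}_q(u)| = \max_{v \in \mathbb{F}_2^m} |\hat{\chi}_q(v)|$. By Equation 8, the codeword nearest to $q$ is then $c = (c(0),\cdots,c(2^m-1))$, which satisfies
$$c(x) = u_0 + \langle x,u \rangle$$
and
$$u_0 = \begin{cases} 0 & \text{if } \hat{\chi}_q(u) \ge 0 \\ 1 & \text{otherwise} \end{cases}$$
and the coset leader is $e = q - c$.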
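The principal idea consists of decomposing the sum depending on whether one of the coordinates (in practice we consider $x_m$ of $x = (x_1,\cdots,x_m)$) is 1 or 0:
$$\begin{aligned}
\hat{\chi}_q(v) &= \sum_{x \in \mathbb{F}_2^m,\, x_m = 0} (-1)^{q(x)} (-1)^{\langle v,x \rangle} + \sum_{x \in \mathbb{F}_2^m,\, x_m = 1} (-1)^{q(x)} (-1)^{\langle v,x \rangle} \\
&= \sum_{x \in \mathbb{F}_2^{m-1}} (-1)^{q(x,0) + \langle (v_1,\cdots,v_{m-1}),x \rangle} + \sum_{x \in \mathbb{F}_2^{m-1}} (-1)^{q(x,1) + \langle (v_1,\cdots,v_{m-1}),x \rangle + v_m} \\
&= \hat{\chi}_{q(\cdot,0)}((v_1,\cdots,v_{m-1})) + (-1)^{v_m}\, \hat{\chi}_{q(\cdot,1)}((v_1,\cdots,v_{m-1}))
\end{aligned}$$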
So, once $\hat{\chi}_{q(\cdot,0)}$ and $\hat{\chi}_{q(\cdot,1)}$ are calculated, $2^{m-1}$ additions and $2^{m-1}$ subtractions remain to obtain $\hat{\chi}_q$. Continuing the decomposition ($m$ times in all), we obtain $\hat{\chi}_q(v)$ for all $v$ in $m \cdot 2^m$ additions/subtractions. From a practical point of view, we can compute $\hat{\chi}_q$ in place, using an array of size $2^m$ with $\mathbb{F}_2^m$ lexicographically ordered.
Thus we have reduced the complexity from $2^m(2^m-1)2^{m+1}$ binary operations to $2^m(2^m-1)m$.
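The decomposition above is the usual butterfly of the fast Walsh transform. A minimal Python sketch of the resulting coset-leader search is given below; it is an illustration rather than the authors' implementation. It assumes that $q$ is a list of $2^m$ bits indexed by the integer encoding of $x$, and the function names are hypothetical.

```python
def fast_walsh_transform(q):
    """Return the Walsh coefficients of q for all v, using m * 2^m additions/subtractions."""
    a = [1 - 2 * bit for bit in q]          # sign function (-1)^q(x)
    h = 1
    while h < len(a):
        for start in range(0, len(a), 2 * h):
            for i in range(start, start + h):
                # One butterfly: combine the two half-size transforms.
                a[i], a[i + h] = a[i] + a[i + h], a[i] - a[i + h]
        h *= 2
    return a

def nearest_rm1_codeword(q):
    """Codeword c(x) = u0 + <x, u> with u maximising the absolute Walsh coefficient."""
    spectrum = fast_walsh_transform(q)
    u = max(range(len(q)), key=lambda v: abs(spectrum[v]))
    u0 = 0 if spectrum[u] >= 0 else 1
    return [u0 ^ (bin(u & x).count("1") % 2) for x in range(len(q))]

def coset_leader(q):
    """e = q - c, where c is a codeword nearest to the coset member q."""
    c = nearest_rm1_codeword(q)
    return [qi ^ ci for qi, ci in zip(q, c)]

q = [1, 1, 0, 1, 0, 0, 1, 1]   # m = 3: a codeword with a single flipped bit
e = coset_leader(q)            # weight-1 vector marking the flipped position
```

When several $v$ reach the maximum, `max` keeps the first one, which matches the convention of taking any one of the minimum-weight vectors as leader.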
Moreover, the Hamming weight of $e$ is precisely the number of changes we apply to go from $f$ to $g$; so, we need $\omega_H(e) \le T$.
When $T$ is equal to the covering radius of the code corresponding to $H$, such a vector $e$ always exists. But the explicit computation of such a vector $e$, known as the bounded syndrome decoding problem, is proved to be NP-complete for general linear codes. Even for well-structured codes, we usually do not have a polynomial-time algorithm to solve the bounded syndrome decoding problem up to the covering radius. The list decoding of $\mathcal{RM}(1,m)$ codes overcomes this problem in a nice fashion.
3.2 Hiding using List Decoding
List decoding (Sudan, 2000) is of interest in coding theory, for example when the weight of the error exceeds the correction capability (in which case there may be several solutions, or the (good) solution may be further from the noisy vector than the solution returned by maximum likelihood decoding).
3.2.1 List Decoding Algorithm
From a vector $q$, this algorithm computes a vector $c \in \mathcal{RM}(1,m)$ such that $d_H(q,c) \le T$.
List decoding with radius $T$ (a parameter fixed in advance) outputs the list $L_{T,m}(q) = \{c \in \mathcal{RM}(1,m) \mid d_H(q,c) \le T\}$ of all codewords of the code $\mathcal{RM}(1,m)$ located within distance $T$ of the vector $q$.
Let $d = 2^{m-1}$ denote the minimum distance of $\mathcal{RM}(1,m)$. The following Johnson upper bound on the list size will be useful below. See (Bassalygo, 1965) for a simple proof of this bound over an arbitrary alphabet.
Proposition 1. Any code $\mathcal{C}$ satisfies the inequality
$$|L_{T,\mathcal{C}}(q)| \le \frac{d}{d - 2n^{-1}T(n-T)} \tag{9}$$
In this paper, we consider list decoding for codes $\mathcal{RM}(1,m)$ with decoding radius $T = (1-\varepsilon)d$, where $\varepsilon > 0$. The corresponding list is denoted by
$$L_{\varepsilon,m}(q) = \{c \in \mathcal{RM}(1,m) \mid d_H(q,c) \le (1-\varepsilon)d\}$$
It follows from Proposition 1, and since the list size does not exceed $n$, that
$$|L_{\varepsilon,m}(q)| \le \min\{\varepsilon^{-2}, n\} \tag{10}$$
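The step from Proposition 1 to (10) can be made explicit. The following worked computation uses only $n = 2^m = 2d$ and $T = (1-\varepsilon)d$, as defined above:

```latex
\begin{aligned}
T(n-T) &= (1-\varepsilon)d \cdot \bigl(2d - (1-\varepsilon)d\bigr)
        = (1-\varepsilon)(1+\varepsilon)d^2 = (1-\varepsilon^2)d^2, \\
2n^{-1}T(n-T) &= \frac{2}{2d}\,(1-\varepsilon^2)d^2 = (1-\varepsilon^2)d, \\
\frac{d}{d - 2n^{-1}T(n-T)} &= \frac{d}{d - (1-\varepsilon^2)d}
        = \frac{d}{\varepsilon^2 d} = \varepsilon^{-2}.
\end{aligned}
```

For instance, $\varepsilon = 1/2$ gives a list of at most 4 codewords, whatever the length $n$.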
Let $c(x_1,\cdots,x_m)$ be an arbitrary linear Boolean function, and let $c^{(j)} = c_1x_1 + \cdots + c_jx_j$ be its $j$-th prefix.
Let $L^{(j)}_{\varepsilon,m}(q)$ be the list of the $j$-th prefixes of all functions $c(x_1,\cdots,x_m) \in L_{\varepsilon,m}(q)$. We consider the $j$-dimensional faces $S_a = \{(x_1,\cdots,x_j,a_{j+1},\cdots,a_m)\}$, where the variables $x_1,\cdots,x_j$ take arbitrary values, whereas the variables $x_{j+1} = a_{j+1},\cdots,x_m = a_m$ are fixed.
Given any Boolean functions $f$ and $g$ (also considered as vectors), let $d_H(f,g|S_a)$ denote the Hamming distance between their restrictions to a $j$-dimensional face $S_a$:
$$d_H(f,g|S_a) = \sum_{x \in S_a} d_H(f(x),g(x)).$$
Obviously,
$$d_H(f,g) = \sum_{a \in \mathbb{F}_2^{m-j}} d_H(f,g|S_a).$$
We also use the definition
$$\Delta(f,g|S_a) := \min\{d(f,g|S_a),\ d(f,g \oplus 1|S_a)\}$$
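Thus, for any (received) vector $q$,
$$\Delta(q,c^{(j)}|S_a) \le d(q,c|S_a)$$
since on the face $S_a$ the function $c$ coincides with either $c^{(j)}$ or $c^{(j)} \oplus 1$.
Let us define the $j$-th distance between the vectors $f$ and $g$ as
$$\Delta^{(j)}(f,g) = \sum_{a \in \mathbb{F}_2^{m-j}} \Delta(f,g|S_a).$$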
Lemma 1. For any affine function $c = c_1x_1 + \cdots + c_mx_m + c_0$ and for any prefix $c^{(j)} = c_1x_1 + \cdots + c_jx_j$, we have
$$\Delta^{(j)}(q,c^{(j)}) \le d(q,c).$$
We say that a prefix $c^{(j)} = c_1x_1 + \cdots + c_jx_j$ satisfies the sum criterion if
$$\Delta^{(j)}(q,c^{(j)}) \le (1-\varepsilon)d \tag{11}$$
In accordance with this criterion, define the list
$$\hat{L}^{(j)}_{\varepsilon,m}(q) = \{c^{(j)} = c_1x_1 + \cdots + c_jx_j \text{ such that } \Delta^{(j)}(q,c^{(j)}) \le (1-\varepsilon)d\}$$
It follows from Lemma 1 that:
$$L^{(j)}_{\varepsilon,m} \subseteq \hat{L}^{(j)}_{\varepsilon,m}.$$
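To make the $j$-th distance concrete, the sketch below computes $\Delta^{(j)}(q,c^{(j)})$ face by face and checks Lemma 1 exhaustively for one vector $q$ with $m = 3$. The integer encoding (bit $i-1$ of an index or of a prefix mask stands for $x_i$ or $c_i$) and the function names are assumptions of this example.

```python
def parity(v):
    """Parity of the number of 1 bits of the integer v."""
    return bin(v).count("1") % 2

def j_distance(q, prefix, j, m):
    """Delta^(j)(q, c^(j)): sum over the faces S_a of min(d, 2^j - d)."""
    total = 0
    for a in range(1 << (m - j)):                  # a = (a_{j+1}, ..., a_m) selects the face
        d = sum(q[(a << j) | low] != parity(prefix & low) for low in range(1 << j))
        total += min(d, (1 << j) - d)              # Delta(q, c^(j) | S_a)
    return total

def distance(q, u0, u):
    """Hamming distance between q and the affine function u0 + <u, x>."""
    return sum(q[x] != u0 ^ parity(u & x) for x in range(len(q)))

m = 3
q = [1, 1, 0, 1, 0, 0, 1, 1]
# Lemma 1: the j-th distance of any prefix never exceeds the distance to the full codeword.
lemma_holds = all(
    j_distance(q, u & ((1 << j) - 1), j, m) <= distance(q, u0, u)
    for u0 in (0, 1) for u in range(1 << m) for j in range(1, m + 1)
)
```

Because the bound holds for every $j$, discarding a prefix that violates the sum criterion can never discard a codeword of $L_{\varepsilon,m}(q)$.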
3.2.2 The Proposed Embedding Scheme
Our proposed approach, which we call the Sum Criterion embedding scheme, uses list decoding, which is executed by consecutive calculation of the lists of (suspicious) prefixes using the sum criterion.
The principle of this algorithm is to define at each step $j$ a test that eliminates a certain number of linear functions in $j$ variables, namely those which we are confident cannot be the prefix of a solution of the problem.
At each step $j$ we thus extract information that invalidates certain sets of functions.
Given in step $j$ a list $\tilde{L}^{(j)}_{\varepsilon,m}(q)$ such that
$$L^{(j)}_{\varepsilon,m}(q) \subseteq \tilde{L}^{(j)}_{\varepsilon,m}(q) \subseteq \hat{L}^{(j)}_{\varepsilon,m}(q) \tag{12}$$
in the $(j+1)$-th step the algorithm processes all possible extensions $c^{(j)}(x_1,\cdots,x_j) + c_{j+1}x_{j+1}$ of the preceding prefixes, where $c^{(j)} \in \tilde{L}^{(j)}_{\varepsilon,m}(q)$ and $c_{j+1} \in \{0,1\}$. Among these extended prefixes, the SC-algorithm leaves only those that satisfy the sum criterion.
The latter prefixes in turn form a new list $\tilde{L}^{(j+1)}_{\varepsilon,m}(q)$, which satisfies Relationship (12) for $j := j+1$. Therefore, in the last step (Step $m$), the list $\tilde{L}^{(m)}_{\varepsilon,m}(q)$ coincides with the list $L^{(m)}_{\varepsilon,m}(q)$.
The Sum Criterion Algorithm for Embedding
Inputs: $f = (f_0,\cdots,f_{n-1})$, the cover data;
    $M = (M_0,\cdots,M_{n-k-1})$, the message to hide;
    $\varepsilon > 0$ such that $T = (1-\varepsilon)d$ is the allowed distortion;
    $d$: minimum distance of the $\mathcal{RM}(1,m)$ code;
    $H$: its parity check matrix.
Outputs: $g = (g_0,\cdots,g_{n-1})$, the stego-data, such that $d(g,f) \le T$.
1. Compute $s = M - fH^t$.
2. If $s = 0$ then $e = 0$ (the cover already carries $M$; nothing to modify)
   else
     Find a coset member $q$ such that $qH^t = s$
     For each codeword $c \in \mathcal{RM}(1,m)$:
       $j = 1$
       While ($j \le m$ and $\Delta^{(j)}(q,c^{(j)}) \le (1-\varepsilon)d$) do:
         $c^{(j+1)} = c^{(j)}(x_1,\cdots,x_j) + c_{j+1}x_{j+1}$, where $c^{(j)} \in L^{(j)}_{\varepsilon,m}$
         $j = j + 1$
       Endwhile
       If $j > m$ then $e = q - c^{(m)}$ (where $\omega_H(e) = d(q,c^{(m)}) \le T$)
       else check the next $c \in \mathcal{RM}(1,m)$
     EndFor
3. $g = f + e$ (return $g$).
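The steps above can be sketched in Python for very small $m$. This is a sketch under several assumptions, not the authors' implementation: the lists of prefixes are extended breadth-first (as in the description of the SC-algorithm) instead of looping over codewords one by one; the parity check matrix is built from the monomials of degree at most $m-2$, using the standard fact that the dual of $\mathcal{RM}(1,m)$ is $\mathcal{RM}(m-2,m)$; the coset member $q$ is found by brute force; and all function names are hypothetical.

```python
from itertools import combinations, product

def parity(v):
    return bin(v).count("1") % 2

def j_distance(q, prefix, j, m):
    """Delta^(j)(q, c^(j)): sum over the faces S_a of min(d, 2^j - d)."""
    total = 0
    for a in range(1 << (m - j)):
        d = sum(q[(a << j) | low] != parity(prefix & low) for low in range(1 << j))
        total += min(d, (1 << j) - d)
    return total

def sum_criterion_list(q, m, T):
    """Linear parts u (as bit masks) of all codewords within distance T of q."""
    prefixes = [0]                                   # empty prefix
    for j in range(1, m + 1):
        extended = [p | (bit << (j - 1)) for p in prefixes for bit in (0, 1)]
        prefixes = [p for p in extended if j_distance(q, p, j, m) <= T]
    return prefixes

def parity_check_matrix(m):
    """Rows: evaluation vectors of the monomials of degree <= m - 2."""
    rows = []
    for deg in range(m - 1):
        for S in combinations(range(m), deg):
            mask = sum(1 << i for i in S)
            rows.append([int((x & mask) == mask) for x in range(1 << m)])
    return rows

def syndrome(x, H):
    return [sum(a & b for a, b in zip(x, row)) % 2 for row in H]

def embed(f, M, m, T):
    """Return stego-data g with syndrome M and d(f, g) <= T, or None if none is found."""
    H = parity_check_matrix(m)
    s = [a ^ b for a, b in zip(M, syndrome(f, H))]
    # Brute-force coset member: acceptable for tiny m only.
    q = next(list(v) for v in product((0, 1), repeat=1 << m) if syndrome(v, H) == s)
    best = None
    for u in sum_criterion_list(q, m, T):
        for u0 in (0, 1):                            # try both c and c + 1
            e = [qi ^ u0 ^ parity(u & x) for x, qi in enumerate(q)]
            if sum(e) <= T and (best is None or sum(e) < sum(best)):
                best = e
    if best is None:
        return None                                  # no codeword within radius T of q
    return [fi ^ ei for fi, ei in zip(f, best)]

f = [1, 0, 1, 1, 0, 0, 1, 0]     # cover data, n = 8 (m = 3)
M = [1, 0, 1, 1]                 # n - k = 4 message bits
g = embed(f, M, 3, 2)            # T = 2 corresponds to epsilon = 1/2 with d = 4
```

For $m = 3$ the covering radius of the code is 2, so with $T = 2$ an admissible $e$ is found for every cover and message; for larger $m$ the radius $(1-\varepsilon)d$ may be smaller than the covering radius, in which case this sketch returns `None`.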
4 DISCUSSION
The proposed data hiding schemes based on $\mathcal{RM}(1,m)$ syndrome coding are compared with the scheme that uses a classical exhaustive search. The basic contribution of these methods is the reduction of time complexity: they achieve a significant improvement over the existing classical approach.
The first algorithm, based on the fast Walsh transform, allows us to find the Hamming distances from the coset member $q$ to all $2^k$ codewords in $O(n \log_2(n))$ binary operations.
The second proposed scheme, based on the sum criterion list decoding algorithm for $\mathcal{RM}(1,m)$ codes, allows us to reconstruct all codewords located within the ball of radius $(1-\varepsilon)d$ about the coset member in $O(n \ln^2(\min\{\varepsilon^{-2},n\}))$ binary operations (Dumer et al., 2007).
We have shown in this paper that first-order binary Reed-Muller codes are good candidates for designing efficient steganographic schemes. Contributions of this paper include the reduction of time complexity and of storage complexity as well. The time complexity of our methods is reduced compared to the existing methods. Moreover, it is easy to extend this method to large $n$, which will allow us to hide data with lower complexity.
REFERENCES
Bassalygo (1965). New upper bounds for error correcting codes. Problemy Peredachi Informatsii, 1(4):41–44.
Bender, W., Butera, W., Gruhl, D., Hwang, R., Paiz, F. J., and Pogreb, S. (2000). Applications for data hiding. IBM Systems Journal, 39(3&4):547–568.
Cox, I., Miller, M., Bloom, J., Fridrich, J., and Kalker, T. (2007). Digital Watermarking and Steganography. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2nd edition.
Crandall, R. (1998). Some notes on steganography. http://os.inf.tu-dresden.de/~westfeld/crandall.pdf.
Dumer, I. I., Kabatiansky, G. A., and Tavernier, C. (2007). First-order binary Reed-Muller codes. Problemy Peredachi Informatsii, 43(3):66–74.
Fontaine, C. and Galand, F. (2007). How can Reed-Solomon codes improve steganographic schemes? In Information Hiding, 9th International Workshop, IH 2007, volume 4567 of Lecture Notes in Computer Science, pages 130–144.
Kahn, D. (1996). The history of steganography. In Information Hiding, volume 1174 of Lecture Notes in Computer Science, pages 1–5.
Moulin, P. and Koetter, R. (2005). Data-hiding codes. Proceedings IEEE, 93(12):2083–2127.
Munuera, C. (2007). Steganography and error-correcting codes. Signal Processing, 87(6):1528–1533.
Schönfeld, D. and Winkler, A. (2007). Reducing the complexity of syndrome coding for embedding. In Information Hiding, 9th International Workshop, IH 2007, volume 4567 of Lecture Notes in Computer Science, pages 145–158.
Sudan, M. (2000). List decoding: Algorithms and applications. In IFIP TCS, volume 1872 of Lecture Notes in Computer Science, pages 25–41.
Westfeld, A. (2001). F5-a steganographic algorithm. In Information Hiding, volume 2137 of Lecture Notes in Computer Science, pages 289–302.
Zhang, W. and Li, S. (2008). A coding problem in steganography. Des. Codes Cryptography, 46(1):67–81.