VERSATILE EVALUATION OF EFFECTS ON DCT-BASED LOSSY COMPRESSION OF EMG SIGNALS ON MEDICAL PARAMETERS

Tiia Siiskonen, Tapio Grönfors and Niina Päivinen

Department of Computer Science, University of Kuopio, Yliopistonranta 5, Kuopio, Finland

Keywords: Lossy data compression, Electromyography, Discrete Cosine Transform and Medical parameters.

Abstract: Commonly used simplified error measures, such as the mean squared error (MSE), do not reveal everything about the clinical quality of lossy-compressed medical signals. Errors have to be interpreted via essential medical parameters. The medical parameters depend on the type of the signal, and only the preservation of essential medical parameters can guarantee the correct clinical quality. In this study, short electromyography (EMG) signals are compressed with a DCT-based lossy compression method. The compression is gained with irreversible masking and scalar quantization of the DCT coefficients. The most prominent medical parameters of an EMG signal are the mean frequency (MNF) and the median frequency (MDF). The behaviour of these parameters is studied both by fitting a regression line and by examining the mean absolute errors frequency by frequency over the clinically interesting frequency range. This reveals the frequency dependency of the errors of the medical parameters and suggests that the generated linear model can be used for estimating the correct value of the processed medical parameter.

1 INTRODUCTION

The compression ratio, the computational efficiency of the method, and the quality of the result are the most essential features of lossy signal compression (Salomon, 2004). The quality of the result is typically characterized by a measurable mathematical error, that is, the distance between the original and the processed (compressed-decompressed) signal. It has not been validated that a simplified error measure, such as the mean squared error (MSE) (Carotti et al., 2006), the signal-to-noise ratio (SNR) (Guerrero and Mailhes, 1997) or the percent root-mean-square difference (PRD) (Wellig et al., 1998), can establish the preservation of medical parameters. Only the preservation of essential medical parameters can guarantee the correct clinical quality. In spite of this, many medical signal compression studies rely only on simplified error measurements. However, some thorough studies have concentrated on distinguishing proper medical parameters (Chan, Lovely and Hudgins, 1997; Carotti et al., 2006; Grönfors, Reinikainen and Sihvonen, 2006).

The lossy compression of electromyography (EMG) signals has not been studied intensively, although the first methods were published almost ten years ago (Guerrero and Mailhes, 1997). Nevertheless, many current EMG technologies, for example wireless measurement and archiving in patient records, need effective data compression. In this study, a DCT-based transform approach has been used (Guerrero and Mailhes, 1997; Berger et al., 2003) because the algorithm is well known and has efficient implementations.

The most prominent spectral features of an EMG signal are the mean frequency (MNF) and the median frequency (MDF) (Farina and Merletti, 2000; Filligoi and Felici, 1999), whose time evolution has been used for clinical assessment of EMG recordings. A simplified error measure gives a suggestive average estimate of the error in the medical parameters, but it cannot be used to predict where in the dynamic range the error is concentrated. In this study, we focus on a versatile evaluation of compression effects on medical parameters. Both systematic and random errors in the medical parameters are examined over their dynamic ranges.


Siiskonen T., Grönfors T. and Päivinen N. (2007).

VERSATILE EVALUATION OF EFFECTS ON DCT-BASED LOSSY COMPRESSION OF EMG SIGNALS ON MEDICAL PARAMETERS.

In Proceedings of the Fourth International Conference on Informatics in Control, Automation and Robotics, pages 149-156

DOI: 10.5220/0001640501490156

Copyright © SciTePress

2 MATERIALS AND METHODS

We have used real EMG recordings in this study. All

the tests and simulations were done with Matlab

(Versions 6.5.0.180913a Release 13 and 7.14

Release 14).

2.1 Test Signals

We used EMG signals measured from the paraspinal muscles of healthy young volunteers. The measurements and classification were done by an experienced clinical neurophysiologist. Every signal was 20 seconds long and sampled at 1 kHz, giving 20000 twelve-bit integer values per signal, and was measured with the DCU-600 lightweight EMG system (Sihvonen et al., 2004). Each signal contains several periods of muscle activity.

We randomly picked five 20000-sample EMG signals as training material and another five 20000-sample EMG signals as testing material. In other words, we used two independent sets of material for training and testing, each consisting of 100000 samples.

2.2 Spectral Features Mean Frequency and Median Frequency

The mean and median frequencies are calculated from the frequency spectrum of the segmented signal. The segments slide over the signal with a step of one sample, so they overlap heavily. The frequency spectrum is obtained by taking the FFT of the segment, using a Hanning window of length 1024. The frequency spectrum consists of 512 amplitude coefficients, $A_i$.

The mean frequency MNF is the amplitude-weighted average of the frequencies,

$$\mathrm{MNF} = \frac{\sum_{i=1}^{M} f_i A_i}{\sum_{i=1}^{M} A_i} \qquad (1)$$

Graphically, the median frequency is the frequency dividing the area of the amplitude spectrum into equal halves. The value can be computed using a cumulative function

$$c_{f_k} = \frac{\sum_{m=1}^{k} A_m}{\sum_{m=1}^{M} A_m} \qquad (2)$$

The median frequency MDF is the value of $f_k$ for which the value of $c_{f_k}$ is as close to 1/2 as possible.
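The two definitions can be illustrated with a minimal sketch that follows Equations 1 and 2 directly. The function names and the five-bin spectrum are hypothetical and chosen only for illustration; in the study the spectrum would come from a Hanning-windowed FFT of a 1024-sample segment.

```python
def mean_frequency(freqs, amps):
    """Amplitude-weighted mean of the frequencies (Equation 1)."""
    total = sum(amps)
    return sum(f * a for f, a in zip(freqs, amps)) / total


def median_frequency(freqs, amps):
    """Frequency f_k whose cumulative share c_fk is closest to 1/2 (Equation 2)."""
    total = sum(amps)
    best_f, best_gap, running = freqs[0], float("inf"), 0.0
    for f, a in zip(freqs, amps):
        running += a                      # numerator of Equation 2
        gap = abs(running / total - 0.5)  # distance of c_fk from 1/2
        if gap < best_gap:
            best_f, best_gap = f, gap
    return best_f


# Hypothetical five-bin amplitude spectrum (illustrative values only).
freqs = [10.0, 20.0, 30.0, 40.0, 50.0]
amps = [1.0, 2.0, 3.0, 2.0, 2.0]
mnf = mean_frequency(freqs, amps)
mdf = median_frequency(freqs, amps)
```

With a skewed spectrum such as this one the two parameters differ, which is one reason both are tracked.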

2.3 The DCT Method

The proposed compression technique is based on the discrete cosine transform (DCT), a very popular transform used in many compression schemes, especially in image compression standards such as JPEG. There are also DCT-based applications for biomedical signal compression (Guerrero and Mailhes, 1997; Berger et al., 2003).

The idea of transform coding is that a sequence of n data samples in one domain is rotated to another domain with the equation

$$Y = TX \qquad (3)$$

where X is the vector of original signal coefficients, Y is the vector of transformed coefficients and T is the transform matrix. In the one-dimensional case, the DCT coefficients of n data samples are given by (Salomon, 2004)

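$$G_f = \sqrt{\frac{2}{n}}\, C_f \sum_{t=0}^{n-1} p_t \cos\left[\frac{(2t+1) f \pi}{2n}\right] \qquad (4)$$

where

$$C_f = \begin{cases} \frac{1}{\sqrt{2}}, & f = 0, \\ 1, & f > 0, \end{cases} \qquad \text{for } f = 0, 1, \ldots, n-1. \qquad (5)$$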

The input vector of n data values is $p_t$ and the output vector is a set of n DCT coefficients $G_f$. The inverse DCT is given by (Salomon, 2004)

$$p_t = \sqrt{\frac{2}{n}} \sum_{j=0}^{n-1} C_j G_j \cos\left[\frac{(2t+1) j \pi}{2n}\right], \qquad \text{for } t = 0, 1, \ldots, n-1. \qquad (6)$$

The DCT concentrates the signal energy into a small number of DCT coefficients, and the compression is usually achieved by eliminating the coefficients that contain less information.
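As a minimal sketch, the forward and inverse transforms of Equations 4-6 can be written directly from their definitions. This is a plain O(n²) illustration and not the Matlab routine used in the study; the function names and the eight-sample block are hypothetical.

```python
import math


def dct(p):
    """Forward one-dimensional DCT of a block, following Equations 4 and 5."""
    n = len(p)
    out = []
    for f in range(n):
        c_f = 1.0 / math.sqrt(2.0) if f == 0 else 1.0
        s = sum(p[t] * math.cos((2 * t + 1) * f * math.pi / (2 * n))
                for t in range(n))
        out.append(math.sqrt(2.0 / n) * c_f * s)
    return out


def idct(g):
    """Inverse one-dimensional DCT of a block, following Equation 6."""
    n = len(g)
    out = []
    for t in range(n):
        s = 0.0
        for j in range(n):
            c_j = 1.0 / math.sqrt(2.0) if j == 0 else 1.0
            s += c_j * g[j] * math.cos((2 * t + 1) * j * math.pi / (2 * n))
        out.append(math.sqrt(2.0 / n) * s)
    return out


# Hypothetical block of eight samples (illustrative values only).
block = [12.0, 10.0, 8.0, 10.0, 12.0, 10.0, 8.0, 11.0]
coeffs = dct(block)
restored = idct(coeffs)  # equals the block up to rounding when nothing is discarded
```

Without masking or quantization the round trip is lossless up to floating-point rounding, which is consistent with the statement that the transform itself gives no compression.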

The DCT method applied here is based on three

steps:

ICINCO 2007 - International Conference on Informatics in Control, Automation and Robotics


• DCT
• Eliminating some of the DCT coefficients by using a masking vector
• Scalar quantization of the coefficients

The first step was to calculate the DCT of the original signal using blocks of 16, 24 or 32 signal coefficients. In these tests the DCT was computed with Matlab's DCT function. After that, some of the coefficients were eliminated by using a binary mask vector. The mask vector has the same size as the DCT block. If the value of the mask vector at some index is zero, the DCT coefficient at the corresponding index is eliminated. Otherwise the value of the mask vector is one and the DCT coefficient at the corresponding index is kept.

The mask vector is constant during the whole compression process, and the same vector is used both when the signal is compressed and when it is decompressed. Before the inverse DCT (IDCT), the receiver inserts zeros at those indexes of the DCT block where coefficients have been eliminated, so that the correct number of signal coefficients is reconstructed.

In this study, we used masking to eliminate the high-end DCT coefficients. For a block size of 16 coefficients we masked out the last 3, 5 and 7 DCT coefficients, for a block size of 24 the last 4, 8 and 12, and for a block size of 32 the last 5, 10 and 15.

After masking the selected coefficients, the remaining coefficients are scalar quantized. Compression in this method thus comes both from masking some of the DCT coefficients and from scalar quantization.

Decompression is done by looking up the DCT values corresponding to the indexes in the codebook, inserting zeros at those places of the DCT block where coefficients have been eliminated, and computing the IDCT.
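The masking step and the zero insertion on the receiver side can be sketched as follows. The helper names are hypothetical, and the block holds made-up coefficient values; quantization is left out so that only the effect of the mask is visible.

```python
def tail_mask(block_size, n_masked):
    """Binary mask vector that eliminates the last n_masked coefficients."""
    return [1] * (block_size - n_masked) + [0] * n_masked


def apply_mask(block, mask):
    """Keep only the coefficients whose mask value is one."""
    return [c for c, m in zip(block, mask) if m == 1]


def restore_block(kept, mask):
    """Insert zeros at the masked indexes before the inverse DCT."""
    it = iter(kept)
    return [next(it) if m == 1 else 0.0 for m in mask]


# Block size 16 with the last 3 coefficients masked out, as in one test variation.
mask = tail_mask(16, 3)
block = [float(i) for i in range(1, 17)]  # hypothetical DCT coefficients
kept = apply_mask(block, mask)            # 13 coefficients go on to quantization
restored = restore_block(kept, mask)      # 16 coefficients again, last 3 are zero
```

Because the mask is constant and known to both sides, it does not have to be stored with every block.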

2.4 Scalar Quantization of Coefficients

In this study, a non-uniform scalar quantization method was used to quantize the DCT coefficients. In uniform scalar quantization the difference between adjacent codebook values is always the same, whereas in non-uniform scalar quantization the difference between codebook values depends on the probability distribution of the coefficients. In intervals where a coefficient is likely to fall, the codebook values are closely spaced, and in intervals where a coefficient is unlikely to fall, the codebook values are further apart.
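A minimal sketch of encoding and decoding with a non-uniform codebook is given below. The codebook values are invented for illustration (dense near zero, sparse in the tails) and are not the codebooks trained in this study; nearest-value encoding is assumed.

```python
def quantize(values, codebook):
    """Replace each value by the index of the nearest codebook value."""
    return [min(range(len(codebook)), key=lambda k: abs(codebook[k] - v))
            for v in values]


def dequantize(indexes, codebook):
    """Look up the codebook value for each stored index."""
    return [codebook[k] for k in indexes]


# Hypothetical non-uniform codebook: closely spaced near zero, sparse in the tails.
codebook = [-40.0, -10.0, -3.0, -1.0, 0.0, 1.0, 3.0, 10.0, 40.0]
coeffs = [0.4, -2.2, 7.0, 55.0]
indexes = quantize(coeffs, codebook)    # only these indexes are stored
decoded = dequantize(indexes, codebook)
```

Small coefficients are reproduced with small errors, while rare large coefficients tolerate a larger error.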

Table 1: Raw remaining sizes and mean-squared-errors (MSE) of compressed signals in percentages by variations.

Variation                      Remaining size   MSE
Codebook size 64 (6 bit)
  Segment length 16 samples
    Without mask               50%              25.6498
    Masking last 3             41%              25.9935
    Masking last 5             34%              27.6196
    Masking last 7             28%              36.1951
  Segment length 24 samples
    Without mask               50%              17.0290
    Masking last 4             42%              17.2294
    Masking last 8             33%              19.0580
    Masking last 12            25%              36.3946
  Segment length 32 samples
    Without mask               50%              19.5787
    Masking last 5             42%              19.7208
    Masking last 10            34%              20.8712
    Masking last 15            27%              31.1169
Codebook size 256 (8 bit)
  Segment length 16 samples
    Without mask               67%              20.2835
    Masking last 3             54%              20.6467
    Masking last 5             46%              22.2934
    Masking last 7             38%              30.9039
  Segment length 24 samples
    Without mask               67%              13.6706
    Masking last 4             56%              13.8864
    Masking last 8             44%              15.7420
    Masking last 12            33%              33.1424
  Segment length 32 samples
    Without mask               67%              18.4853
    Masking last 5             56%              18.6404
    Masking last 10            46%              19.8118
    Masking last 15            35%              30.1023

We constructed the codebooks by using Matlab's KMEANS function. Before calling KMEANS, the DCT of the training signal was calculated using the same DCT block size that is used when compressing the test signal. The KMEANS function was given the following parameters: the training signal, which has 50000 samples; the number of replicates 'rep', which was 3 and improved the clustering result; the maximum number of iterations 'maxiter', which was 800; and 'EmptyAction', which was 'singleton', so that an empty cluster is replaced by a new cluster consisting of the one point furthest from its centroid. We tested codebook sizes 64 and 256. With a codebook size of 64, all codebook indexes can be represented with 6 bits, and with a codebook size of 256 the indexes are represented with 8 bits.
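The idea of training a scalar codebook by clustering can be sketched with a plain one-dimensional k-means (Lloyd) iteration. This is a simplified stand-in and not Matlab's KMEANS: it uses a deterministic start, a single run instead of replicates, and leaves empty clusters at their previous centroid instead of applying the 'singleton' action. The function name and the training values are hypothetical.

```python
def train_codebook(samples, size, max_iter=100):
    """One-dimensional k-means (Lloyd iteration) returning a sorted codebook."""
    data = sorted(samples)
    # Deterministic start: evenly spaced order statistics of the training data.
    centroids = [data[(2 * k + 1) * len(data) // (2 * size)] for k in range(size)]
    for _ in range(max_iter):
        clusters = [[] for _ in range(size)]
        for x in data:
            nearest = min(range(size), key=lambda j: abs(centroids[j] - x))
            clusters[nearest].append(x)
        # An empty cluster keeps its previous centroid in this sketch.
        updated = [sum(c) / len(c) if c else centroids[k]
                   for k, c in enumerate(clusters)]
        if updated == centroids:
            break  # assignments no longer change
        centroids = updated
    return sorted(centroids)


# Hypothetical training coefficients: many small values, a few large ones.
samples = [-9.0, -8.0, -1.0, -0.5, 0.0, 0.5, 1.0, 8.0, 9.0]
codebook = train_codebook(samples, 3)
```

Since the centroids settle where the training coefficients are concentrated, the resulting codebook is non-uniform in the sense described in Section 2.4.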


3 RESULTS

The transformation itself has no compression effect; all the compression is gained with irreversible masking and scalar quantization of the DCT coefficients.

The achieved compression ratios and the related MSE values for the processing variations are listed in Table 1. The general observation is that the MSE increases when more coefficients are masked out and decreases when the codebook size increases.
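The remaining sizes in Table 1 appear to follow from simple bit counting, as the sketch below shows. It assumes that the raw size counts only the codebook index bits of the unmasked coefficients against the 12-bit original samples, with no overhead for the codebook or the mask vector; this is an interpretation that happens to match the table, and the function name is hypothetical.

```python
def remaining_size_percent(block_size, n_masked, index_bits, sample_bits=12):
    """Raw size of the compressed block as a percentage of the original block."""
    kept = block_size - n_masked
    return 100.0 * kept * index_bits / (block_size * sample_bits)


# Three of the variations listed in Table 1.
size_16_3_6bit = round(remaining_size_percent(16, 3, 6))    # 41
size_24_12_8bit = round(remaining_size_percent(24, 12, 8))  # 33
size_32_0_8bit = round(remaining_size_percent(32, 0, 8))    # 67
```

For example, block size 16 with the last 3 coefficients masked and 6-bit indexes gives 13 · 6 / (16 · 12) ≈ 41%.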

3.1 The Parameter Model

The mean frequency and median frequency values are calculated from sliding segments for the original test signal and for all compressed-decompressed signals. In every case we got 98974 (MNF, MDF) pairs from every signal. These values are compared time-synchronously against the values of the original unprocessed test material. In this way we got a new set of value pairs:

$$\left(\mathrm{MNF}_i^{\mathrm{original}}, \mathrm{MNF}_i^{\mathrm{processed}}\right), \quad \left(\mathrm{MDF}_i^{\mathrm{original}}, \mathrm{MDF}_i^{\mathrm{processed}}\right) \qquad (7)$$

where $i = 0, \ldots, 98973$ is the segment number.

The pairs of values make it possible to evaluate the effects of lossy compression on the essential medical parameters frequency by frequency. In the ideal case, there are no differences.

Figure 1: Idea of fitting the regression line.

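To model the behaviour of the original MNF and MDF values against the processed values, we fitted regression lines to all sets with Matlab's POLYFIT function:

$$\mathrm{MNF}^{\mathrm{processed}} = a\,\mathrm{MNF}^{\mathrm{original}} + b, \qquad \mathrm{MDF}^{\mathrm{processed}} = c\,\mathrm{MDF}^{\mathrm{original}} + d \qquad (8)$$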

In Figure 1, the best-fit line can be seen inside the cloud of data points. Both axes are in frequency (Hz), and each point is plotted with the original value on the X-axis and the processed value on the Y-axis. The line coefficients and the norm of the residuals are listed in Tables 2-5. If the line is exactly diagonal, there is no error between the medical parameters of the original and processed signals.
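The line fit can be reproduced with an ordinary least-squares sketch. It stands in for a first-degree POLYFIT call and is not the authors' script; the norm of the residuals is taken here as the Euclidean norm of the differences between the processed values and the fitted line, which is an assumption about how the tabulated norms were obtained. The data points are synthetic.

```python
import math


def fit_line(x, y):
    """Least-squares line y = a*x + b and the Euclidean norm of its residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = mean_y - a * mean_x
    residual_norm = math.sqrt(sum((yi - (a * xi + b)) ** 2
                                  for xi, yi in zip(x, y)))
    return a, b, residual_norm


# Synthetic pairs lying exactly on processed = 0.95 * original + 4.
original = [60.0, 80.0, 100.0, 120.0, 140.0]
processed = [0.95 * v + 4.0 for v in original]
a, b, norm = fit_line(original, processed)
```

A slope below one combined with a positive intercept produces a positive error at low frequencies and a negative error at high frequencies.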

The error of the MNF value is typically positive at low frequencies (the MNF of the processed signal is higher than the MNF of the original signal) and negative at high frequencies. The reversal point is around 80 Hz. The negative error at high frequencies is smaller in the unmasked cases, and masking increases it. The error of the MDF value behaves similarly to that of the MNF value, but is typically smaller in absolute value.

The line coefficients and the norms of the residuals do not seem to depend on the segment length. Comparing the MSE values in Table 1 with the norms of the residuals in Tables 2-5 shows that the results are more or less correlated with each other.

Table 2: Line coefficients and the norm of the residuals of MNF values.

Variation                      a        b        Norm of residuals
Codebook size 64 (6 bit)
  Segment length 16 samples
    Without mask               0.9611   5.0618   526.1498
    Masking last 3             0.9479   5.6428   592.4270
    Masking last 5             0.9401   6.1727   727.8094
    Masking last 7             0.9425   5.9066   954.0903
  Segment length 24 samples
    Without mask               0.9824   2.5583   280.5778
    Masking last 4             0.9616   3.6733   486.0408
    Masking last 8             0.9421   4.6243   833.8518
    Masking last 12            0.9110   6.0588   1.1791e+003
  Segment length 32 samples
    Without mask               0.9780   2.8869   301.1080
    Masking last 5             0.9635   3.5139   394.1623
    Masking last 10            0.9432   4.5146   626.5258
    Masking last 15            0.9121   5.4824   972.4219


Table 3: Line coefficients and the norm of the residuals of MNF values.

Variation                      a        b        Norm of residuals
Codebook size 256 (8 bit)
  Segment length 16 samples
    Without mask               0.9721   3.4008   412.3716
    Masking last 3             0.9607   3.9323   544.9072
    Masking last 5             0.9552   4.3098   711.5584
    Masking last 7             0.9553   4.3155   946.9896
  Segment length 24 samples
    Without mask               0.9865   1.7056   242.4786
    Masking last 4             0.9660   2.8420   505.2355
    Masking last 8             0.9469   3.8802   832.9791
    Masking last 12            0.9160   5.3868   1.1716e+003
  Segment length 32 samples
    Without mask               0.9752   2.8877   292.1546
    Masking last 5             0.9605   3.5969   423.3512
    Masking last 10            0.9412   4.5570   651.8788
    Masking last 15            0.9119   5.4024   979.2064

Table 4: Line coefficients and the norm of the residuals of MDF values.

Variation                      c        d        Norm of residuals
Codebook size 64 (6 bit)
  Segment length 16 samples
    Without mask               0.9943   0.9374   337.6673
    Masking last 3             0.9895   1.0218   344.2408
    Masking last 5             0.9837   1.1714   385.5636
    Masking last 7             0.9689   1.4947   501.9240
  Segment length 24 samples
    Without mask               0.9949   0.6864   291.5252
    Masking last 4             0.9890   0.8036   355.9760
    Masking last 8             0.9774   1.0882   477.4020
    Masking last 12            0.9328   2.3365   708.1398
  Segment length 32 samples
    Without mask               0.9960   0.6308   279.4163
    Masking last 5             0.9910   0.7421   295.6508
    Masking last 10            0.9807   1.0079   361.8950
    Masking last 15            0.9445   1.9601   574.9671

Table 5: Line coefficients and the norm of the residuals of MDF values.

Variation                      c        d        Norm of residuals
Codebook size 256 (8 bit)
  Segment length 16 samples
    Without mask               0.9955   0.5822   261.3096
    Masking last 3             0.9914   0.6590   289.8141
    Masking last 5             0.9863   0.7767   344.2559
    Masking last 7             0.9717   1.1063   469.8654
  Segment length 24 samples
    Without mask               0.9978   0.3327   243.8635
    Masking last 4             0.9918   0.4807   296.6823
    Masking last 8             0.9806   0.7793   422.6246
    Masking last 12            0.9365   2.0190   672.4723
  Segment length 32 samples
    Without mask               0.9932   0.7448   261.7802
    Masking last 5             0.9884   0.8634   287.3707
    Masking last 10            0.9779   1.1613   362.2217
    Masking last 15            0.9426   2.0834   570.4948

3.2 Contemplation of Error

Examining the mean absolute error of the MNF and MDF values frequency by frequency over the clinically interesting frequency range from 40 Hz to 180 Hz is an entirely novel approach.

The mean absolute error (MAE) is calculated by sorting the value pairs (Equation 7) in increasing order and averaging the absolute differences between the original and the processed value of the pairs. It must be noted that the distribution of the value pairs is not uniform, so the average value is in some cases coarse.
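One way to sketch the frequency-by-frequency MAE is to group the value pairs by the original value and average the absolute differences in each group. The 1 Hz bin width, the rounding rule and the function name are assumptions made for illustration, since the grouping resolution is not stated; the value pairs are made up.

```python
def mae_by_frequency(pairs, lo=40, hi=180):
    """Mean absolute error per 1 Hz bin of the original parameter value."""
    sums, counts = {}, {}
    for original, processed in pairs:
        f = int(round(original))  # assumed bin: nearest whole hertz
        if lo <= f <= hi:
            sums[f] = sums.get(f, 0.0) + abs(processed - original)
            counts[f] = counts.get(f, 0) + 1
    return {f: sums[f] / counts[f] for f in sorted(sums)}


# Hypothetical (original, processed) value pairs in Hz.
pairs = [(79.8, 81.0), (80.2, 80.0), (119.6, 118.0), (30.0, 35.0)]
mae = mae_by_frequency(pairs)  # bins at 80 Hz and 120 Hz; 30 Hz is out of range
```

Bins that receive only a few pairs give a coarse average, which reflects the non-uniform distribution of the value pairs.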

By examining Figures 2-4, it can easily be seen that the mean absolute errors of MNF and MDF are smallest between 80 and 120 Hz in all processing variations. The error is very moderate within this range, and the segment length itself does not dominate the error.

In the range below 80 Hz, the error increases when more coefficients are masked out. However, the behaviour is similar for the MNF and MDF values and also for codebook sizes 64 (6 bit) and 256 (8 bit).

The most prominent differences can be seen in the range above 120 Hz. The error is many times larger than in the other ranges and increases heavily when more coefficients are masked out. In this range the errors are also more dependent on the codebook size.

Generally, the MNF error is larger than the MDF error. The segment length has no fundamental effect on the error. Again, comparing the MSE values in Table 1 with the peak level of the MAE in the range above 120 Hz in Figures 2-4 shows that the results are more or less correlated with each other, but not as evidently as in the case of the norms of the residuals.

4 CONCLUSIONS

The main value of this study was to reveal the complexity of error evaluation in studies of lossy EMG signal compression. Guerrero and Mailhes (1997) used a standard-deviation-estimator-based SNR to evaluate the quality of the process. Wellig et al. (1998) used both SNR and PRD for quality evaluation. Berger et al. (2003) used an energy-based SNR as a tool for quality evaluation. None of these studies covers any medical parameters. Chan, Lovely and Hudgins (1997) were the first to use medical parameters in performance evaluation. Carotti et al. (2006) used both MSE and some medical parameters, including MNF and MDF, for quality evaluation. Their examination was made at four force levels, and the results show a valid correlation between MSE, MNF, and MDF values. Grönfors, Reinikainen and Sihvonen (2006) used the PRD value and the percentage differences of the MNF and MDF values in quality evaluation. These values also indicate correlated behaviour. The use of values averaged over signals is common to all the cited studies.

The averaged processing errors of the medical parameters, together with their standard deviations, form the baseline for the evaluation of a lossy compression method. However, there are pitfalls in the use of averaged error values. Only an examination of the error over the whole clinically interesting range of parameter values exposes the fidelity.

In this study we have used a frequency-by-frequency view and compared time-synchronously generated medical parameters of the original and processed signals. We have found that there is more or less correlation between the MSE values and the errors in the medical parameters. However, this interdependency can only reveal the coarse amount of error, not the errors characteristic of a specific range of MNF or MDF values.

The contemplation of error approach (Section 3.2) has strong analytic use in finding the range of values for which the medical parameters are valid. The parameter model approach (Section 3.1) has both theoretical, analytical value and practical, predictive use. The generated regression line can be used for estimating the true value of the processed parameter. Together the two approaches can produce a tool for calculating the corrected MNF and MDF values and an index of their quality.
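The predictive use of the regression line can be sketched by inverting Equation 8. The coefficients below are those listed in Table 2 for codebook size 64, segment length 16, without mask; the processed value and the function name are hypothetical, and the sketch ignores the random error around the line.

```python
def correct_parameter(processed, slope, intercept):
    """Estimate the original value by inverting processed = slope * original + intercept."""
    return (processed - intercept) / slope


# MNF line coefficients from Table 2 (codebook size 64, segment length 16, no mask).
a, b = 0.9611, 5.0618
processed_mnf = 101.1718  # hypothetical MNF measured from a processed signal, in Hz
estimate = correct_parameter(processed_mnf, a, b)  # approximately 100 Hz
```

The norm of the residuals of the same fit indicates how much scatter remains after such a correction.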

Some of the achieved results are as expected, such as the best achieved compression ratio having the worst MSE and the effect of masking on the error in the high-frequency range. With the DCT-based method, the segment length does not seem to have as prominent an effect on the error as it has with a direct vector-quantization-based method (Grönfors and Päivinen, 2006). The method should be further tested with larger datasets and with a larger number of different lossy compression methods.

Figure 2: Mean absolute errors of MNF and MDF values for segment length 16. Solid line for codebook size 64 and dotted line for codebook size 256.


Figure 3: Mean absolute errors of MNF and MDF values for segment length 24. Solid line for codebook size 64 and dotted line for codebook size 256.

Figure 4: Mean absolute errors of MNF and MDF values for segment length 32. Solid line for codebook size 64 and dotted line for codebook size 256.


ACKNOWLEDGEMENTS

The authors thank MD PhD Teuvo Sihvonen for his

valuable comments and support in EMG data

collection.

REFERENCES

Berger, P., Nascimento, F., Carmo, J., Rocha, A., dos Santos, I., 2003. Algorithm for compression of EMG signals, In Proc. 25th Annual International Conf. IEEE Engineering in Medicine and Biology Society, Cancun, Mexico, 1299-1302.

Carotti, E., De Martin, J., Merletti, R., Farina, D., 2006. Compression of surface EMG signals with algebraic code excited linear prediction, Medical Engineering & Physics, Article in press.

Chan, A., Lovely, D., Hudgins, B., 1997. Errors associated with the use of adaptive differential pulse code modulation in the compression of isometric and dynamic myo-electric signals, Medical and Biological Engineering and Computing, 36, 215-219.

Farina, D., Merletti, R., 2000. Comparison of algorithms for estimation of EMG variables during voluntary isometric contractions, Journal of Electromyography and Kinesiology, 10, 337-349.

Filligoi, GD., Felici, F., 1999. Detection of hidden rhythms in surface EMG signals with a non-linear time-series tool, Medical Engineering & Physics, 21, 439-448.

Grönfors, T., Päivinen, N., 2006. The effect of vector length and gain quantization level on medical parameters of EMG signals on lossy compression, In Proc. 3rd International Conference on Advances in Medical, Signal and Information Processing MEDSIP, Glasgow, UK.

Grönfors, T., Reinikainen, M., Sihvonen, T., 2006. Vector quantization as a method for integer EMG signal compression, Journal of Medical Engineering & Technology, 30(1), 41-52.

Guerrero, A., Mailhes, C., 1997. On the choice of an electromyogram data compression method, In Proc. 19th Annual International Conf. IEEE Engineering in Medicine and Biology Society, Chicago, IL, USA, 1558-1561.

Salomon, D., 2004. Data Compression: The Complete Reference, Springer-Verlag, New York.

Sihvonen, T., Sihvonen, P., Kuusrainen, S., Grönfors, T., 2004. Lightweight embedded system for acquiring simultaneous electromyogenic activity and movement data, In Proceedings of the 6th Nordic Signal Processing Symposium - NORSIG 2004, June 9-11, 2004, Espoo, Finland, 177-179.

Wellig, P., Cheng, Z., Semling, M., Moschytz, G., 1998. Electromyogram data compression using single-tree and modified zero-tree wavelet encoding, In Proc. 20th Annual International Conf. IEEE Engineering in Medicine and Biology Society, Hong Kong SAR, China, 1303-1306.
