
The main purpose of this work is to use, as much 
as possible, the embedded information, taking 
advantage of the huge amount of analysis work 
performed by the MPEG video encoder. 
Furthermore, only a few parameters have to be
adjusted in the detector, according to the class of the
moving object in the scene. This class is related to the
dimension of the moving object and to the distance
between the object and the camera. This
classification increases the accuracy of the motion
detector.
In the following sections we describe an efficient,
low-complexity scene change detection algorithm that
detects significant visual events from a partially
decoded MPEG bit stream. Section 2 introduces the
MPEG standard and section 3 describes the proposed
algorithm. Some results are shown in section 4 and
conclusions are presented in section 5.
2  MPEG BIT STREAM INFORMATION
MPEG encoders use a hybrid algorithm to compress
video, classifying and processing each frame as
intra coded (I frame) or motion-compensated inter
coded (P and B frames) (F. Pereira and T. Ebrahimi, 2002).
Intra frames are encoded using only pixels within
the frame, exploiting the spatial redundancy: 8×8
blocks are transformed with the DCT (Discrete
Cosine Transform), and the resulting DC and AC
coefficients are entropy coded. P frames are encoded
using motion-compensated prediction from a past I/P
frame, in order to remove the temporal redundancy.
B frames are encoded using motion-compensated
prediction from past and/or future encoded I/P frames.
Video frames are organized in regular structures
called groups of pictures (GOP). Each frame (video
object plane, VOP, in MPEG-4 terminology) is divided
into blocks of 16×16 pixels, called macroblocks (MB).
Each macroblock in turn comprises six 8×8 blocks
(four luminance and two chrominance blocks in the
usual 4:2:0 format). After motion compensation, the
residual image may also be divided into 8×8 pixel
blocks, which are transform coded like intra blocks.
Thus, a macroblock contains information about the 
type of temporal prediction used (or not) for motion 
compensation, which can be classified as intra 
coded, forward referenced, backward referenced, 
interpolated or direct. While MBs inside an I frame 
are intra coded, each MB in a P frame is either 
forward predicted, intra coded or skipped. Similarly, 
each MB in a B frame is either forward predicted, 
backward predicted, bidirectionally predicted, intra 
coded or skipped.  
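The frame structure described above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the
function and constant names are hypothetical, the 352×288 (CIF) frame
size is an example that does not come from the text, and the table of
macroblock types simply restates the lists given above.

```python
from collections import Counter

# Macroblock types allowed per frame type, as listed in the text.
ALLOWED_MB_TYPES = {
    "I": {"intra"},
    "P": {"forward", "intra", "skipped"},
    "B": {"forward", "backward", "bidirectional", "intra", "skipped"},
}


def macroblock_layout(width, height, mb_size=16, blocks_per_mb=6):
    """Return (mb_cols, mb_rows, total_mbs, total_blocks) for one frame."""
    # Ceiling division: dimensions that are not multiples of 16 are
    # assumed to be padded up to the next macroblock boundary.
    mb_cols = -(-width // mb_size)
    mb_rows = -(-height // mb_size)
    total_mbs = mb_cols * mb_rows
    return mb_cols, mb_rows, total_mbs, total_mbs * blocks_per_mb


def mb_type_histogram(frame_type, mb_types):
    """Count macroblock types, rejecting types not valid for the frame."""
    allowed = ALLOWED_MB_TYPES[frame_type]
    for mb_type in mb_types:
        if mb_type not in allowed:
            raise ValueError(f"{mb_type!r} is not valid in a {frame_type} frame")
    return Counter(mb_types)


# Example frame size (an assumption): 22 x 18 = 396 macroblocks.
print(macroblock_layout(352, 288))  # (22, 18, 396, 2376)
```

A histogram of this kind can be obtained from macroblock headers alone,
which is the sort of embedded information a compressed-domain detector
can read without reconstructing pixels.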
3  COMPRESSED DOMAIN MOTION DETECTION
In this section, we explain how motion detection is
performed without fully decoding the bit stream.
The proposed method relies mainly on the analysis
of the signal of the AC coefficients of I frames
(section 3.1) and on the motion vector information
of P and B coded frames (section 3.2). The main
objective is to detect only motion related to the
moving objects in the scene, eliminating camera
switching (scene cuts) and some typical camera
movements that occur in video surveillance scenes.
3.1 Motion detection  
In most surveillance applications, systems acquire
and store images continuously, so a huge amount
of information has to be stored and a high
compression ratio is desirable. It is also common
that, for long periods of time, there is no motion
in the scene. Thus, VOPs of type I can be sparser,
which significantly increases the compression ratio.
For this setting, we propose a hierarchical algorithm
that processes the compressed video information in
two stages.
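The control flow of such a two-stage scheme might look like the
following minimal sketch. It is not the authors' implementation: the
function name, the (type, payload) representation of a VOP and the two
callables `stage1` and `stage2` are hypothetical, and the sketch assumes
that refinement stays active until the next I VOP is examined.

```python
def hierarchical_detect(vops, stage1, stage2):
    """Return the indices of VOPs flagged as containing motion.

    vops   : iterable of (vop_type, payload), vop_type in {"I", "P", "B"}
    stage1 : callable(payload) -> bool, coarse test applied to I VOPs
    stage2 : callable(payload) -> bool, refinement applied to P/B VOPs
    """
    flagged = []
    refine = False  # True while the most recent I VOP showed motion
    for index, (vop_type, payload) in enumerate(vops):
        if vop_type == "I":
            # Stage 1: only I VOPs are inspected.
            refine = stage1(payload)
            if refine:
                flagged.append(index)
        elif refine and stage2(payload):
            # Stage 2: inter VOPs are inspected only after a positive
            # first-stage result (assumption: until the next I VOP).
            flagged.append(index)
    return flagged
```

With this structure, inter VOPs that follow a quiet I VOP are skipped
entirely, which is where the saving in processing comes from.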
At the first stage, only I VOPs are analyzed, in
order to check the signal variations between AC
coefficients of two co-located blocks in
consecutive I VOPs. To speed up the process, only
a small set of significant coefficients is checked,
and only blocks in which more than 5 coefficients
show a signal variation are counted. When the
number of such blocks exceeds a certain threshold,
the image is regarded as containing a moving object.
This threshold is derived from the average and the
variance of the number of blocks containing more
than 5 signal variations. We also have to deal with
homogeneous surfaces and illumination changes,
which tend to be detected as motion. When an I VOP
is detected as containing moving objects, the
algorithm moves to the second stage to refine the
motion detection.
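The first stage can be illustrated with the sketch below, which rests
on several assumptions that the text does not confirm: a "signal
variation" is interpreted as a change of sign of a coefficient, the
threshold is taken as the mean plus k standard deviations of past
counts (the text only says it depends on the average and the variance),
and all names, the index set `coeff_idx` and the value of `k` are
hypothetical.

```python
import statistics


def sign(value):
    """Return -1, 0 or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def count_changed_blocks(prev_blocks, curr_blocks, coeff_idx, min_changes=5):
    """Count co-located blocks with more than min_changes sign changes.

    prev_blocks, curr_blocks: sequences of per-block AC coefficient lists
    taken from two consecutive I VOPs, in the same block order.
    coeff_idx: indices of the few coefficients that are checked.
    """
    changed = 0
    for prev, curr in zip(prev_blocks, curr_blocks):
        # A coefficient going to or from zero also counts as a change.
        n = sum(1 for i in coeff_idx if sign(prev[i]) != sign(curr[i]))
        if n > min_changes:
            changed += 1
    return changed


def adaptive_threshold(history, k=2.0):
    """Mean + k * standard deviation of past changed-block counts.

    history must be non-empty. The form mean + k*std and the default k
    are assumptions made for this sketch.
    """
    mean = statistics.fmean(history)
    std = statistics.pvariance(history) ** 0.5
    return mean + k * std


def i_vop_has_motion(changed, history, k=2.0):
    """Flag the I VOP when its count exceeds the adaptive threshold."""
    return changed > adaptive_threshold(history, k)
```

The sketch does not address homogeneous surfaces or illumination
changes; those cases would need additional handling.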
At the second stage, the motion vectors (MV) of P
and B VOPs are analyzed, in order to check the
number of motion vectors used to encode each inter
frame. If the number of non-zero MVs exceeds the
threshold given for that class of surveillance scene
(section 3.3), the VOP is regarded as containing a
moving object.
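A minimal sketch of this second-stage test follows. The class names
and threshold values are placeholders chosen for illustration only;
the actual classes and thresholds are those of section 3.3 and are not
reproduced here. Motion vectors are assumed to be (dx, dy) pairs.

```python
# Hypothetical per-class thresholds (placeholder names and values).
CLASS_THRESHOLDS = {"near": 40, "far": 8}


def count_nonzero_mvs(motion_vectors):
    """Count motion vectors with at least one non-zero component."""
    return sum(1 for dx, dy in motion_vectors if dx != 0 or dy != 0)


def inter_vop_has_motion(motion_vectors, scene_class,
                         thresholds=CLASS_THRESHOLDS):
    """Flag a P or B VOP whose non-zero MV count exceeds the threshold."""
    return count_nonzero_mvs(motion_vectors) > thresholds[scene_class]


# Example: 10 non-zero vectors among 300 macroblocks.
mvs = [(0, 0)] * 290 + [(1, -2)] * 10
print(inter_vop_has_motion(mvs, "far"))   # True  (10 > 8)
print(inter_vop_has_motion(mvs, "near"))  # False (10 <= 40)
```

Keeping the thresholds in a table keyed by scene class reflects the
idea that only a few parameters need adjusting per class of scene.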
After this step, some motion detections may still be
false, due to camera switching (scene cuts) or
camera motion. These false motion