
audio. Peeters (2011) presents an approach that is 
also based on audio. First, the onset positions are 
evaluated by an energy function. Based on this 
function, vector representations of rhythm 
characteristics are computed. To classify these 
rhythms, four feature sets derived from these vectors 
by applying the DFT and the ACF are studied. Next, 
various ratios of the local tempo are applied to these 
vectors. Finally, a classification task 
measures the ability of these periodicity 
representations to describe the rhythm characteristics 
of audio items. 
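As a rough illustration of such periodicity features, ACF and DFT representations of an onset energy function can be computed as follows. This is a minimal sketch; the energy function, lag count, and normalisation are illustrative choices, not Peeters' actual parameters:

```python
import numpy as np

def periodicity_features(onset_energy, n_lags=64):
    """Derive two periodicity representations from an onset energy
    function: its autocorrelation (ACF) and its magnitude spectrum
    (DFT). Illustrative sketch, not Peeters' exact feature sets."""
    x = np.asarray(onset_energy, dtype=float)
    x = x - x.mean()                      # remove the DC offset
    # ACF: similarity of the energy function with lagged copies of itself
    acf = np.correlate(x, x, mode="full")[len(x) - 1:len(x) - 1 + n_lags]
    if acf[0] != 0:
        acf = acf / acf[0]                # normalise so lag 0 equals 1
    # DFT: strength of the periodic components of the energy function
    dft = np.abs(np.fft.rfft(x))
    return acf, dft

# A strictly periodic energy function peaks in the ACF at its period.
energy = np.tile([1.0, 0.0, 0.0, 0.0], 16)   # impulse every 4 frames
acf, dft = periodicity_features(energy)
assert np.argmax(acf[1:]) + 1 == 4           # strongest lag = period 4
```

A peak at lag 4 in the ACF (or the corresponding DFT bin) is exactly the kind of periodicity evidence such feature sets encode.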
Pattern-based Approaches: Ellis and Arroyo 
(2004) present an approach that uses Principal 
Components Analysis (PCA) to classify drum 
patterns. First, measure length and downbeat 
position are estimated for each track of a collection 
of 100 drum beat sequences given in General MIDI 
files. From each of these input patterns, a short 
sequence is passed to the PCA, resulting in a set of 
basic patterns. A classification task performed with 
these basic patterns yields about 20 % correctly classified 
results. Murakami and Miura (2008) present an 
approach to classify drum-rhythm patterns into 
“basic rhythm” and “fill-in” patterns. Based on 
symbolic representations of music, i.e. General 
MIDI tracks, instruments are grouped by their 
estimated importance for playing roles in either 
“basic rhythm” patterns, “fill-in” patterns, or both. 
These three groups model drum rhythm patterns. 
Expecting a minimum input of one measure in 4/4 
time, the classification is performed by 
neighbourhood comparison. They achieve 
classification results of up to 76 %. 
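The PCA step of such pattern-based approaches can be sketched as follows; the pattern encoding (one row per pattern, one column per metrical slot) is an illustrative assumption, and Ellis and Arroyo's pre-processing differs in detail:

```python
import numpy as np

def basis_patterns(patterns, n_components=3):
    """Reduce a collection of fixed-length drum patterns to a small
    set of basic patterns via PCA. Illustrative sketch only."""
    X = np.asarray(patterns, dtype=float)
    X = X - X.mean(axis=0)                  # centre each slot
    # Principal components = right singular vectors of the centred data
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[:n_components]                # each row is one basis pattern

# Toy collection of eight-slot patterns built from two prototypes.
a = np.array([1, 0, 0, 0, 1, 0, 0, 0], dtype=float)
b = np.array([0, 0, 1, 0, 0, 0, 1, 0], dtype=float)
basis = basis_patterns(np.vstack([a, a, b, b]), n_components=2)
assert basis.shape == (2, 8)
```

Input patterns are then classified by how strongly they project onto these basis patterns.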
Source Separation-based Approaches: Tsunoo, 
Ono & Sagayama (2009) propose an audio-based 
method to describe rhythm by classifying track 
spectrograms. First, percussive and harmonic 
components of a track are separated by the 
method described in Ono et al. (2008); the 
percussive part is then clustered using a combination 
of the One-Pass Dynamic Programming algorithm and 
k-means clustering. Finally, each frame of a track is 
assigned to a cluster, and the corresponding track’s 
spectrogram is used to classify the rhythms. They 
achieve accuracies of up to 97.8 % for House music. 
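The frame-clustering step can be illustrated with plain k-means over spectrogram frames. This sketch omits both the separation front end and the One-Pass Dynamic Programming component, and all names are illustrative:

```python
import numpy as np

def kmeans_frames(frames, k=2, n_iter=20, seed=0):
    """Assign each (percussive) spectrogram frame to one of k clusters
    with plain k-means. Illustrative sketch of the clustering step only."""
    X = np.asarray(frames, dtype=float)
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # E-step: assign every frame to its nearest centre
        d = np.linalg.norm(X[:, None] - centres[None], axis=2)
        labels = d.argmin(axis=1)
        # M-step: recompute each centre from its assigned frames
        for j in range(k):
            if np.any(labels == j):
                centres[j] = X[labels == j].mean(axis=0)
    return labels, centres

# Two well-separated groups of 3-bin 'frames' land in distinct clusters.
frames = np.array([[1, 0, 0], [1.1, 0, 0], [0, 0, 1], [0, 0, 1.2]])
labels, _ = kmeans_frames(frames, k=2)
assert labels[0] == labels[1] and labels[2] == labels[3]
assert labels[0] != labels[2]
```

The resulting per-frame cluster labels are what characterise a track's rhythmic content in this family of approaches.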
Psychoacoustic-based Approach: Rauber, 
Pampalk and Merkl (2002) propose a method to 
automatically create a hierarchical organization of 
music archives based on perceived sound similarity. 
First, several pre-processing steps are applied. All 
tracks of the archive are divided into segments of 
fixed length, followed by the extraction of frequency 
spectra on the Bark scale in order to reproduce 
human perception of frequency. Finally, the specific 
loudness sensation in Sone is calculated. After these 
pre-processing steps, a time-invariant representation 
of each piece of music is generated. In the last step 
of processing, these patterns are used for 
classification via Self-Organizing Maps. The method 
is based on audio.  
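The Bark-scale front end can be sketched with Zwicker's closed-form approximation of the critical-band rate. The band count and FFT parameters below are illustrative, and the full Sone loudness model of Rauber et al. involves further steps not shown here:

```python
import numpy as np

def hz_to_bark(f_hz):
    """Zwicker's approximation of the Bark scale: maps frequency in Hz
    to critical-band rate, mimicking human frequency perception."""
    f = np.asarray(f_hz, dtype=float)
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)

def bark_band_energies(power_spectrum, sample_rate, n_bands=24):
    """Sum a power spectrum into Bark-scaled critical bands, a typical
    pre-processing step before computing loudness in Sone. Sketch only."""
    freqs = np.fft.rfftfreq(2 * (len(power_spectrum) - 1), d=1.0 / sample_rate)
    bands = np.minimum(hz_to_bark(freqs).astype(int), n_bands - 1)
    energies = np.zeros(n_bands)
    np.add.at(energies, bands, power_spectrum)   # accumulate per band
    return energies

spectrum = np.ones(513)                     # flat spectrum, 1024-pt FFT
e = bark_band_energies(spectrum, sample_rate=44100)
assert e.sum() == spectrum.sum()            # energy is conserved
```

Because Bark bands widen with frequency, a flat spectrum concentrates many more FFT bins in the upper bands, which is exactly the perceptual warping this pre-processing is meant to introduce.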
Although approaches to the problem of 
rhythm classification have already been presented, 
their success rates can only be regarded as 
satisfying for specific genres, e.g. Popular music or 
House music (Ono et al., 2008) or ballroom dance 
music (Peeters, 2011). Furthermore, the majority of 
approaches (Paulus and Klapuri, 2002; Tzanetakis 
and Cook, 2002; Peeters, 2011; Tsunoo, Ono & 
Sagayama, 2009; Ono et al., 2008; Rauber, Pampalk 
and Merkl, 2002) rely on audio. Thus, further effort 
is required to improve classification methods that 
address symbolic data. 
3  CLASSIFYING RHYTHM PATTERNS 
In this paper we present an approach for the 
classification of music rhythms that treats rhythm as 
a sequence of N notes with a time difference 
between the onsets of adjacent notes. Our method is 
based on symbolic data in order to be able to access 
all necessary information for each note directly. 
Thus, by not using audio, we can exclude further 
sources of error, e.g. detecting the onset positions of 
notes. Although numerous onset detection 
approaches are known, their reliability is still 
inadequate for excluding them as a possible source 
of error (Collins, 2005). 
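This representation can be read almost directly from symbolic data. A minimal sketch, assuming a hypothetical note list of (onset time, velocity) tuples rather than the paper's actual data structures:

```python
def rhythm_from_notes(notes):
    """Turn a list of (onset_time, velocity) tuples from symbolic data
    into a rhythm: inter-onset intervals plus accentuation. The tuple
    layout is a hypothetical stand-in for a MIDI note list."""
    notes = sorted(notes, key=lambda n: n[0])
    onsets = [t for t, _ in notes]
    accents = [v for _, v in notes]
    # N notes yield N-1 inter-onset intervals
    iois = [b - a for a, b in zip(onsets, onsets[1:])]
    return iois, accents

# A straight quarter-note pulse at 120 bpm: one onset every 0.5 s.
notes = [(0.0, 100), (0.5, 80), (1.0, 100), (1.5, 80)]
iois, accents = rhythm_from_notes(notes)
assert iois == [0.5, 0.5, 0.5]
```

Since onset times and velocities are stored explicitly in the symbolic data, no onset detection is required at any point.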
We compare and classify rhythms in four steps. 
Step one covers all necessary preliminary 
computations; step two extracts all possible, i.e. 
hypothetical, rhythm patterns; step 
three reduces the number of rhythm hypotheses; and 
finally, step four performs the classification task 
utilizing a knowledge base. Fig. 1 illustrates this 
concept. 
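As a rough illustration, the four steps can be composed into a pipeline of placeholder functions. All function names and bodies below are hypothetical stand-ins, not the actual implementation:

```python
# Hypothetical skeleton of the four-step process; every function body
# is a placeholder, not the paper's actual implementation.

def preliminary_computations(onsets):
    # Step 1 (placeholder): sort onsets, form inter-onset intervals.
    onsets = sorted(onsets)
    return [b - a for a, b in zip(onsets, onsets[1:])]

def extract_hypotheses(iois):
    # Step 2 (placeholder): every contiguous sub-sequence is a candidate.
    return [tuple(iois[i:j]) for i in range(len(iois))
            for j in range(i + 1, len(iois) + 1)]

def reduce_hypotheses(hypotheses, min_len=2):
    # Step 3 (placeholder): drop very short candidates.
    return [h for h in hypotheses if len(h) >= min_len]

def classify(hypotheses, knowledge_base):
    # Step 4 (placeholder): label via exact match against known patterns.
    return [label for pattern, label in knowledge_base.items()
            if pattern in hypotheses]

kb = {(0.5, 0.5, 0.5): "straight quarters"}
iois = preliminary_computations([0.0, 0.5, 1.0, 1.5])
result = classify(reduce_hypotheses(extract_hypotheses(iois)), kb)
assert result == ["straight quarters"]
```

The point of the skeleton is the data flow: candidate patterns are generated exhaustively, pruned, and only then matched against the knowledge base.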
However, to limit the number of possible sources 
of error, we focus only on drum rhythm patterns and 
limit our method to the use of temporal information 
and accentuation as features for the classification 
task. Furthermore, evaluated sequences are limited 
to a length of 30 s in order to reduce computational 
complexity. 
 
 
ICPRAM 2014 - International Conference on Pattern Recognition Applications and Methods
748