
 
reelin, which is an essential protein involved in the 
development of the six-layer cortex of the human 
brain.  Fractal analysis was applied to the HAR1 
nucleotide sequence and the homologous sequence 
in the chimpanzee genome.  Analysis shows that the 
differences in fractal dimension can be used as a 
marker of evolution.  The 118-bp region in HAR1 contains 
18 point substitutions over an evolutionary span of 5 
million years when comparing the human to the 
chimpanzee.  However, the same 118-bp region only 
contains two point substitutions over a span of 300 
million years when comparing the chicken to the 
chimpanzee.  The implications of evolution and 
positive selection have been discussed in recent 
literature (Pollard et al., 2006b).  
2  MATERIALS & METHODS 
The nucleotide sequences were downloaded from 
Genbank. The accession numbers of the mtDNA 
database are listed in the Appendix.  The studied 
primates are human, Neanderthal, chimp (chimp and 
pygmy chimp), gorilla (western and western 
lowland), and orangutan (Bornean and Sumatran).    
The ATCG sequence was converted to a 
numerical sequence by assigning to each nucleotide 
its total atomic number, i.e. the total number of 
protons: A(70), T(66), C(58), G(78).  The assigned 
number is roughly proportional to the nucleotide 
mass.  This assignment is consistent with the 
recently reported mass fractal analysis of a ribosome 
sequence (Lee 2006).  The A-T and C-G pairs in 
double-stranded DNA would have the same value of 
136. 
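The conversion described above can be sketched in a few lines of Python. This is only an illustration: the names PROTON_COUNT and to_numeric are hypothetical, and skipping symbols other than A, T, C and G (for example N) is an assumption that the text does not address.

```python
# Proton counts per nucleotide, taken from the text: A(70), T(66), C(58), G(78).
# The dictionary and function names are hypothetical, chosen for illustration.
PROTON_COUNT = {"A": 70, "T": 66, "C": 58, "G": 78}


def to_numeric(seq):
    """Map an ATCG string to a list of proton counts.

    Assumption: symbols other than A, T, C, G (e.g. N) are skipped.
    """
    return [PROTON_COUNT[base] for base in seq.upper() if base in PROTON_COUNT]


# Both base pairs sum to the same value, as stated in the text.
pair_sums = (PROTON_COUNT["A"] + PROTON_COUNT["T"],
             PROTON_COUNT["C"] + PROTON_COUNT["G"])
```

The resulting list is the spatial intensity series to which the fractal analysis is applied.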
Fractal dimension analysis can be used in the 
study of correlated randomness.  Among the various 
fractal dimension methods, the Higuchi fractal 
method is well suited for studying signal fluctuation 
(Higuchi, 1998).  The signal from the sequence 
represents a random spatial intensity series.  The 
spatial intensity (Int) random series with equal 
intervals could be used to generate a difference 
series (Int(j)-Int(i)) for different lags in the spatial 
variable.  The non-normalized apparent length of the 
spatial series curve is simply L(k) = Σ |Int(j) - Int(i)|, 
summed over all pairs with lag (j - i) = k, for 
k = 1, 2, 3…  The number of terms in a k-series 
varies with k, so normalization must be used to get 
the series length.  If Int(i) is a fractal function, then 
a plot of log (L(k)) versus log (1/k) should be a 
straight line with the slope equal to the fractal 
dimension.  Higuchi 
incorporated a calibration division step (divide by k) 
such that the maximum theoretical value is 
calibrated to the topological value of 2.  The detailed 
calculation is given in the literature (Higuchi, 1998). 
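As a rough illustration of the procedure outlined above, the following Python sketch follows the commonly published form of the Higuchi algorithm. It is not the code used in this study: the function name higuchi_fd and the default kmax = 8 are assumptions, and the choice of kmax is left to the reader.

```python
import math


def higuchi_fd(series, kmax=8):
    """Estimate the Higuchi fractal dimension of a 1-D series.

    Sketch of the standard formulation; kmax=8 is an assumed default.
    A constant series has zero curve length and raises ValueError in math.log.
    """
    x = [float(v) for v in series]
    n = len(x)
    if kmax < 2 or n <= kmax:
        raise ValueError("need kmax >= 2 and a series longer than kmax")
    log_inv_k, log_len = [], []
    for k in range(1, kmax + 1):
        lengths = []
        for m in range(k):  # starting offsets 0 .. k-1
            n_steps = (n - 1 - m) // k
            if n_steps < 1:
                continue
            # Sum of absolute differences between points separated by lag k.
            dist = sum(abs(x[m + i * k] - x[m + (i - 1) * k])
                       for i in range(1, n_steps + 1))
            # Normalize for the varying number of terms, then divide by k.
            norm = (n - 1) / (n_steps * k)
            lengths.append(dist * norm / k)
        log_inv_k.append(math.log(1.0 / k))
        log_len.append(math.log(sum(lengths) / len(lengths)))
    # Least-squares slope of log L(k) versus log(1/k).
    mean_x = sum(log_inv_k) / len(log_inv_k)
    mean_y = sum(log_len) / len(log_len)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_inv_k, log_len))
    sxx = sum((a - mean_x) ** 2 for a in log_inv_k)
    return sxy / sxx
```

With this normalization a straight line gives a value of 1 and uncorrelated noise gives a value close to 2, which matches the topological limits mentioned above.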
When comparing the dimension of two fractal 
forms, the popular method of taking the difference 
of the two Higuchi fractal dimension values is valid 
to within a constant regardless of the calibration 
division step.  The Higuchi fractal algorithm used in 
this project was calibrated with the Weierstrass 
function.  This function has the form 
$W(x) = \sum_{n} a^{-nh} \cos(2 \pi a^{n} x)$, summed 
over all the n values 0, 1, 2, 3…   The 
fractal dimension of the Weierstrass function is 
given by (2 - h), where h takes on an arbitrary value 
between zero and one.  
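A truncated version of the Weierstrass series can be evaluated as in the sketch below. The base a = 2 and the cutoff of 30 terms are assumptions made for illustration; the text specifies neither, and the function name weierstrass is hypothetical.

```python
import math


def weierstrass(x, h, a=2.0, n_terms=30):
    """Truncated Weierstrass sum: sum over n of a**(-n*h) * cos(2*pi*a**n*x).

    Assumptions: base a=2 and a cutoff of 30 terms (n = 0 .. n_terms-1).
    """
    return sum(a ** (-n * h) * math.cos(2.0 * math.pi * a ** n * x)
               for n in range(n_terms))


def sample_weierstrass(h, n_points=1024, a=2.0, n_terms=30):
    """Sample the truncated series on an evenly spaced grid over [0, 1)."""
    return [weierstrass(i / n_points, h, a, n_terms) for i in range(n_points)]
```

Sampling the function for several values of h and comparing the estimated dimension against (2 - h) is one way to check a fractal dimension routine. The agreement depends on the grid spacing and on the number of terms kept, so small deviations from (2 - h) should be expected.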
The Shannon entropy of a sequence can be used 
to monitor the level of functional constraints acting 
on the gene (Parkhomchuk, 2006).  A sequence with 
relatively low nucleotide variety would have a low 
Shannon entropy (more constraint) in terms of the 
set of 16 possible di-nucleotide pairs.  A sequence’s 
entropy can be computed as $H = -\sum_i p_i \log_2 p_i$ 
over all states i, where the probability $p_i$ can be 
obtained from the empirical histogram of the 16 
di-nucleotide pairs.  The maximum entropy is 4 binary 
bits per pair for 16 possibilities ($2^4$).  The maximum 
entropy is 2 bits per mono-nucleotide with 4 
possibilities ($2^2$).   
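The entropy calculation can be sketched as follows. The function names are hypothetical, and counting overlapping di-nucleotide pairs (positions 1-2, 2-3, 3-4, …) is an assumption, since the text does not state whether the pairs overlap.

```python
import math
from collections import Counter


def shannon_entropy(symbols):
    """Shannon entropy in bits, from the empirical histogram of the symbols."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def mono_entropy(seq):
    """Entropy over single nucleotides (maximum 2 bits for 4 symbols)."""
    return shannon_entropy(seq.upper())


def di_entropy(seq):
    """Entropy over di-nucleotide pairs (maximum 4 bits for 16 pairs).

    Assumption: pairs are taken with overlap, one per adjacent position.
    """
    s = seq.upper()
    return shannon_entropy(s[i:i + 2] for i in range(len(s) - 1))
```

For example, a sequence in which all four nucleotides are equally frequent reaches the 2-bit mono-nucleotide maximum, while a sequence of a single repeated nucleotide has zero entropy.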
3  RESULTS & DISCUSSION 
For the 16S rRNA sequences, the C+G percent 
correlates with the fractal dimension (FD), with an 
R-square value of 0.88 (N = 8; Figure 1).  Dropping 
the human and Neanderthal data would increase the 
R-square value to ~0.91, because these two points lie 
in the middle of the range as small outliers. 
[Figure 1 plot: linear fit y = 0.7981x + 1.6265, R² = 0.881; x-axis range 0.42 to 0.44, y-axis range 1.96 to 1.98] 
Figure 1: The C+G percent (x-axis) versus FD (y-axis) for 
the studied 16S rRNA sequences. 
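A correlation of this kind can be reproduced with a short sketch such as the one below, which computes the C+G fraction of a sequence and an ordinary least-squares fit with its R-square value. The function names are hypothetical, and no data from this study are included; the caller supplies one C+G value and one FD value per sequence.

```python
def gc_fraction(seq):
    """Fraction of C and G among the A, C, G, T symbols of a sequence."""
    s = seq.upper()
    total = sum(s.count(base) for base in "ACGT")
    return (s.count("C") + s.count("G")) / total if total else 0.0


def linear_fit_r2(xs, ys):
    """Ordinary least-squares line y = slope*x + intercept and its R-square.

    Assumes at least two points and non-constant xs and ys.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = sxy ** 2 / (sxx * syy)
    return slope, intercept, r2
```

Repeating the fit with selected points removed shows how sensitive the R-square value is to individual sequences when N is as small as 8.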
The mono-nucleotide entropy correlates with the 
di-nucleotide entropy in the 16S rRNA sequences, with 
an R-square value of ~0.88, N = 8 (Figure 2).   
Dropping human and Neanderthal increases R-
BIOINFORMATICS 2010 - International Conference on Bioinformatics