Development of Agents that Create Melodies based on Estimating
Gaussian Functions in the Pitch Space of Consonance
Hidefumi Ohmura 1, Takuro Shibayama 2, Keiji Hirata 3 and Satoshi Tojo 4
1 Department of Information Sciences, Tokyo University of Science, 2641 Yamazaki, Noda-shi, Chiba, Japan (ORCID: https://orcid.org/0000-0003-4373-0890)
2 Department of Information Systems and Design, Tokyo Denki University, Ishizaka, Hatoyama-cho, Hikigun, Saitama, Japan
3 Department of Complex and Intelligent Systems, Future University Hakodate, 116-2 Kamedanakano-cho, Hakodate-shi, Hokkaido, Japan
4 Graduate School of Information Science, Japan Advanced Institute of Science and Technology, 1-1 Asahidai, Nomi-shi, Ishikawa, Japan
Keywords:
Music, Melody, Lattice Space, GMM, EM Algorithm.
Abstract:
Music is organized around simple physical structures, such as the relationships between the frequencies of tones. We have focused on the frequency ratios between notes and have proposed lattice spaces that express the ratios of pitches and pulses. Agents produce melodies using distributions in the lattice spaces. In this study, we upgrade the system so that it can analyze existing music, obtain the distribution of pitches in the pitch lattice space, and create melodies. We confirm that the system fits musical features, such as the modes and scales of existing music, as Gaussian mixture models (GMMs). The probability density function in the pitch lattice space therefore appears suitable for expressing the primitive musical structure of pitch. However, some challenges remain, such as adapting to 12-equal temperament and handling dynamic variation of the mode; in this study, we also focus on these challenges.
1 INTRODUCTION
Music is essential in various cultures, and people have
used music for various purposes (DeNora, 2000). It is
often thought that only professional musicians create
music; however, this is not true because almost ev-
eryone creates music, for example, when they hum
and whistle a melody by intuition in the bathroom
(Jordania, 2010). Why do people with limited mu-
sical education enjoy listening to music and creating
melodies? We believe that the reason comes from
the gestalt perception of humans. Music is organized
by simple physical structures, such as the relation-
ship between the frequencies of tones. Humans can
understand musical structures because they can often
discern the relationship between frequencies. Lerdahl and Jackendoff proposed a theory for analyzing music based on musical gestalt perception (Meyer, 1956; Lerdahl and Jackendoff, 1983).
We have focused on frequency ratios as the fundamental relations between tones, and we have developed agents that create melodies as a system (Ohmura et al., 2018; Ohmura et al., 2019). The frequency ratio refers to the interval between the fundamental frequencies of two tones and to the note value between the pulse frequencies of their sound timings. The agents in the system produce
notes based on the probability density function. There
are two types of spaces, one for pitch and another for
musical values. The agents have a probability density function consisting of one or two normal distributions in each of the two spaces. This system produces simple melodies like humming and whistling. Moreover, the system reproduces structures of music theory, such as musical modes and complex rhythms. Therefore,
it was suggested that the spaces based on frequency
ratios could express musical structures quantitatively.
However, the system was only capable of creating
melodies and was unable to analyze existing music.
In this study, we make improvements to the system
to analyze existing music and express the probability
density functions of the spaces based on frequency
ratios. First, we provide a system for analyzing the pitches of existing music. This system can read a Standard MIDI file (SMF) as existing music.
Figure 1: Relationships between pitches (intervals): perfect consonances (octave 1:2, perfect fifth 2:3, perfect fourth 3:4) built around 660 Hz (E5).
The system analyzes the file and creates a density function for each melody. The system can then output melodies based on each density function. Moreover, users can readjust the parameters of the distributions to control the musical structures of the outputs.
2 LATTICE SPACES BASED ON
FREQUENCY RATIOS
2.1 Interval and Musical Temperament
The pitch of a note is defined by the frequency of air
vibration. Real sounds consist of various frequencies,
and humans recognize the lowest frequency, which is
called the fundamental frequency, as the pitch of the
note. The relationship between two notes, which is
called the interval, is defined as the frequency ratio.
Four intervals are called perfect consonances: the unison, the perfect fourth, the perfect fifth, and the octave. Humans perceive these as the best-matched intervals.
These consonances are based on primitive ratios. A fre-
quency ratio of 1:1 between two pitches creates a uni-
son. A frequency ratio of 1:2 between two pitches
creates an octave. A frequency ratio of 2:3 between
two pitches creates a perfect fifth. A frequency ratio
of 3:4 between two pitches creates a perfect fourth.
Figure 1 shows the pitches of notes created by perfect consonances based on 660 Hz (E5).
The frequency values are derived from 660 Hz using Pythagorean temperament, which is one of the musical temperaments. Pythagorean temperament is based only on the ratios 1:2, 2:3, and 3:4, and it provides accurate values for the perfect consonances.
Figure 2: Comparing Pythagorean tuning with 12-equal temperament.
However, Pythagorean temperament
does not define exactly 12 notes because of the Pythagorean comma, i.e., the discrepancy between $(3/2)^{12}$ and $2^{7}$. The most popular tem-
perament is 12-equal temperament, which divides an octave into twelve equal semitones. The 12-equal temperament treats the 12 notes equally but does not provide accurate values of the consonances. Figure 2 shows the differences between Pythagorean temperament and 12-equal temperament. The spiral represents pitch values, with outer positions higher than inner ones. Pitches at the same angle are octave-related (1:2:4:8, ...). The twelve lines show the pitch notation of 12-equal temperament. The circles show the positions obtained by stacking perfect fifths ($(3/2)^{n}$, $n = 1, 2, 3$) from the red circle. In this study, because the input is SMF, we adopt 12-equal temperament in the system.
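To make this inaccuracy concrete, the two quantities can be computed directly (a short worked calculation; the numerical values are standard):
\[
\frac{(3/2)^{12}}{2^{7}} = \frac{531441/4096}{128} \approx 1.0136,
\qquad
2^{7/12} \approx 1.4983 < \frac{3}{2}
\]
Twelve pure fifths thus overshoot seven octaves by roughly 1.4% (the Pythagorean comma), while the equal-tempered fifth $2^{7/12}$ falls slightly short of the pure 3:2 ratio; this is the inaccuracy of consonance mentioned above.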
2.2 Lattice Spaces
There are two spaces in the system. The first space
expresses pitch frequencies, called the pitch lattice
space, and the other space expresses frequencies of
the sound timing of pulses, called the rhythm lattice
space. Figure 3 shows the pitch lattice space expanded from Figure 1.
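As an illustration of how such a lattice can be generated programmatically, the following JavaScript sketch derives pitch frequencies by stacking the ratios described above. This is a minimal sketch: the function name and the A4 = 440 Hz reference are our own illustrative choices, not taken from the system's implementation.

```javascript
// Minimal sketch of the pitch lattice based on the octave (1:2) and the
// perfect fifth (2:3); a perfect fourth (3:4) is a fifth down plus an octave.
// The function name and the A4 = 440 Hz reference are illustrative assumptions.
function pitchLatticeFrequency(fifths, octaves, reference = 440.0) {
  return reference * Math.pow(3 / 2, fifths) * Math.pow(2, octaves);
}

console.log(pitchLatticeFrequency(1, 0));  // 660   -> E5, a perfect fifth above A4
console.log(pitchLatticeFrequency(2, 0));  // 990   -> B5, two fifths above A4
console.log(pitchLatticeFrequency(-1, 1)); // 586.7 -> D5, a perfect fourth above A4
```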
Next, we explain the rhythm lattice space. In rhythm, the frequencies of pulses are vital features. When a listener hears two pulses whose relationship is 1:2, they may feel a duple meter; when the relationship is 1:3, they may feel a triple meter (Figure 4 shows these relations between pulses). Moreover, when the relationship is 1:5, the listener may feel a quintuple meter. However, one generally perceives a quintuple meter as a 2 + 3 meter, because a quintuple meter is relatively challenging for the human ear to perceive.
Figure 3: The pitch lattice space based on the frequency ratios 2:3 and 3:4 (steps of ×3/2, ×4/3, ×9/8, and ×2 between neighboring notes).
Figure 4: Relations between pulses (duple: ×2, ×1/2; triple: ×3, ×1/3).

Actual music
consists of many pulses. A listener feels the strongest or most frequent pulse as the meter of the music, and the less frequent pulses as weak beats and upbeats. Monophony, however, lacks beats, so a listener at times may not feel any meter. For example, some pieces of Gregorian chant allow several rhythmic interpretations, which is also true of hummed melodies.
We provide the rhythm lattice space, which is likewise based on the ratios 1:2 and 1:3 (see Figure 5). The
unit in this lattice space is bpm (beats per minute).
In this figure, 72 bpm is the basic frequency of the
pulse. The x-axis indicates triple relationships, and
the y-axis shows the duple relationships. Each point
of intersection is the frequency of a pulse. In this fig-
ure, there are symbols of musical notes; a quarter note
is 72 bpm.
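The same construction applies to the rhythm lattice: pulse frequencies in bpm are reached from the 72 bpm reference by repeated multiplication (or division) by 2 and 3. The following sketch is illustrative only; the function name is our own.

```javascript
// Minimal sketch of the rhythm lattice: the x-axis carries triple (x3)
// relationships and the y-axis duple (x2) relationships, as in Figure 5.
// Negative steps divide instead of multiply. The function name is an assumption.
function rhythmLatticeBpm(triples, duples, referenceBpm = 72) {
  return referenceBpm * Math.pow(3, triples) * Math.pow(2, duples);
}

console.log(rhythmLatticeBpm(0, 1));  // 144 bpm (duple relation to 72 bpm)
console.log(rhythmLatticeBpm(1, 0));  // 216 bpm (triple relation to 72 bpm)
console.log(rhythmLatticeBpm(0, -1)); // 36 bpm
```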
2.3 Outputting Notes with a GMM
In the system, there are probability density functions
in each space.
Figure 5: The lattice space for musical values with duple and triple relationships.
The sound timing and pitches of an
output note depend on each function. The probability
density functions consist of one or two normal distri-
butions.
A normal distribution is expressed by the following formula:
\[
N(x) = \frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right)
\]
where $\mu$ is the mean and $\sigma^{2}$ is the variance.
The function is extended to two dimensions as follows:
\[
N(\mathbf{x}) = \frac{1}{2\pi\,|\Sigma|^{1/2}}\exp\left(-\frac{(\mathbf{x}-\boldsymbol{\mu})^{T}\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})}{2}\right)
\]
The details of each value are as follows:
\[
\mathbf{x} = \begin{pmatrix} x \\ y \end{pmatrix}, \quad
\boldsymbol{\mu} = \begin{pmatrix} \mu_{x} \\ \mu_{y} \end{pmatrix}, \quad
\Sigma = \begin{pmatrix} \sigma_{x}^{2} & \mathrm{Cov} \\ \mathrm{Cov} & \sigma_{y}^{2} \end{pmatrix}
\]
$\mathrm{Cov}$ denotes the covariance. $\rho$ denotes the coefficient of correlation between the values on the x- and y-axes and is calculated from $\mathrm{Cov}$ as follows:
\[
\rho = \frac{\mathrm{Cov}}{\sigma_{x}\,\sigma_{y}}
\]
Using $\sigma_{x}$, $\sigma_{y}$, $\mu_{x}$, $\mu_{y}$, and $\rho$, the two-dimensional normal distribution is expressed as follows:
\[
N(x,y) = \frac{1}{2\pi\sigma_{x}\sigma_{y}\sqrt{1-\rho^{2}}}
\exp\left(-\frac{1}{2(1-\rho^{2})}\left(\frac{(x-\mu_{x})^{2}}{\sigma_{x}^{2}}
- 2\rho\,\frac{(x-\mu_{x})(y-\mu_{y})}{\sigma_{x}\sigma_{y}}
+ \frac{(y-\mu_{y})^{2}}{\sigma_{y}^{2}}\right)\right)
\]
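A direct transcription of this density into JavaScript, the language the system is implemented in, might look as follows. This is a minimal sketch; the function name is our own and is reused in later sketches.

```javascript
// Two-dimensional normal density N(x, y) parameterized by the means,
// standard deviations, and correlation coefficient rho, as in the formula above.
function gaussian2d(x, y, muX, muY, sigmaX, sigmaY, rho) {
  const dx = (x - muX) / sigmaX;
  const dy = (y - muY) / sigmaY;
  const norm = 1 / (2 * Math.PI * sigmaX * sigmaY * Math.sqrt(1 - rho * rho));
  const quad = (dx * dx - 2 * rho * dx * dy + dy * dy) / (2 * (1 - rho * rho));
  return norm * Math.exp(-quad);
}

// Sanity check: the density at the mean of an uncorrelated unit Gaussian is 1 / (2*pi).
console.log(gaussian2d(0, 0, 0, 0, 1, 1, 0)); // ~0.1592
```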
Figure 6: Miyako-bushi scale in the lattice space of pitches (gray notations indicate the pitches concerned).
If an agent has a normal distribution in the pitch lattice space, it can create musical modes. An agent with a single normal distribution can only create a simple musical mode. If the agent needs to create a melody with a more complicated mode, it must have a more complex distribution. For example, if an agent creates a melody in the Miyako-bushi scale, which is a traditional Japanese mode (see Figure 6), it must have two normal distributions. For these reasons, each agent in the system has two normal distributions in each space.
When more than one normal distribution is used, the probability density function becomes a Gaussian mixture model (GMM), which is expressed as follows:
\[
N(\mathbf{x}\mid \boldsymbol{\mu},\Sigma,\mathbf{w}) = \sum_{k=1}^{K} w_{k}\,N(\mathbf{x}\mid \boldsymbol{\mu}_{k},\Sigma_{k}) \tag{1}
\]
Here, there are two normal distribution functions ($K = 2$). $w_{k}$ is the weight of each function, and $w_{1} + w_{2} = 1$. Each agent has these parameters for creating melodies. Users can adjust the parameters with sliders in the interface of the system.
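For instance, the mixture density of Formula (1) with $K = 2$ can be evaluated at a lattice coordinate as a weighted sum of the two component densities. The sketch below reuses the gaussian2d function from the earlier sketch; the parameter-object layout is our own assumption, not the system's actual data structure.

```javascript
// Mixture density of Formula (1) with K = 2: a weighted sum of two 2-D Gaussians.
// Each component holds { w, muX, muY, sigmaX, sigmaY, rho }, with w1 + w2 = 1.
// Assumes gaussian2d(x, y, muX, muY, sigmaX, sigmaY, rho) from the earlier sketch.
function gmmDensity(x, y, components) {
  return components.reduce(
    (sum, c) => sum + c.w * gaussian2d(x, y, c.muX, c.muY, c.sigmaX, c.sigmaY, c.rho),
    0
  );
}

// Example: two components weighted 0.6 / 0.4 in lattice coordinates.
const components = [
  { w: 0.6, muX: 0, muY: 0, sigmaX: 1.0, sigmaY: 1.0, rho: 0.0 },
  { w: 0.4, muX: 2, muY: 1, sigmaX: 0.8, sigmaY: 0.8, rho: 0.0 },
];
console.log(gmmDensity(0, 0, components));
```

To select a note, the densities over the finite set of lattice cells can be normalized into a categorical distribution and sampled; this is one way to realize the selection steps described next.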
Here, we explain the execution flow of the program. When users push the play button, the following iterative processing occurs (a sketch of this loop is given after the list):
1. Select a pulse from the rhythm lattice space according to its probability density function.
2. Does the selected pulse coincide with the current timing?
yes: Select a pitch from the pitch lattice space according to its probability density function and output it.
no: Do nothing.
3. Return to step 1 for the next cycle.
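The loop above might be sketched as follows. This is an illustrative reconstruction of the three steps, not the actual implementation; samplePulse, samplePitch, and playNote are hypothetical stand-ins supplied by the caller for the GMM sampling routines and the Web Audio output.

```javascript
// Illustrative reconstruction of the playback loop described in steps 1-3.
// samplePulse, samplePitch, and playNote are hypothetical stand-ins for the
// system's GMM sampling routines and its Web Audio output.
function startPlayback({ cycleMs, samplePulse, samplePitch, playNote }) {
  const timer = setInterval(() => {
    const pulseHits = samplePulse();  // step 1: draw a pulse from the rhythm lattice GMM
    if (pulseHits) {                  // step 2: does the pulse fall on the current timing?
      playNote(samplePitch());        //   yes: draw a pitch from the pitch lattice GMM and output it
    }                                 //   no: do nothing
  }, cycleMs);                        // step 3: repeat on the next program cycle
  return () => clearInterval(timer);  // call the returned function to stop playback
}
```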
3 PROPOSED SYSTEM
In this study, we improve the existing system by adding new features. The improved system analyzes existing music and creates a GMM in the pitch lattice space, and it accepts Standard MIDI Files (SMFs) as existing music. First, we explain SMF, and then we elaborate on how the GMM is fitted.
3.1 Standard MIDI File
MIDI (Musical Instrument Digital Interface) is a standard for connecting electronic instruments and exchanging musical performance information between them. A Standard MIDI File (SMF) is a MIDI file format for saving musical data. This system analyzes the pitch data of an SMF as data of existing music. There are three SMF formats; however, owing to the implementation, the system accepts only format 1.
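As an illustration, the pitch data of each track can be collected with the tone.js MIDI parser (@tonejs/midi) referenced in Section 3.3. The Midi class and note fields below follow that library's documented API as we understand it; the function name is our own.

```javascript
// Collect the MIDI note numbers of each track from a Standard MIDI File.
// Uses @tonejs/midi (the tone.js MIDI parser referenced in Section 3.3);
// field names follow that library's documented API as we understand it.
import { Midi } from "@tonejs/midi";

function pitchesPerTrack(arrayBuffer) {
  const midi = new Midi(arrayBuffer);      // parse the SMF
  return midi.tracks.map((track) =>
    track.notes.map((note) => note.midi)   // MIDI note numbers, e.g. 69 = A4
  );
}

// Example usage in the browser, e.g. with a file from the 'Choose file' button:
// const tracks = pitchesPerTrack(await file.arrayBuffer());
```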
3.2 Fitting GMM
The system fits a Gaussian mixture model (GMM) consisting of two normal distributions. We adopt the EM algorithm to approximate the distribution of the existing music. The probability density function consisting of two normal distributions is expressed by Formula (1). Therefore, the log-likelihood function is as follows:
\[
\ln p(X\mid \boldsymbol{\mu},\Sigma,\mathbf{w})
= \ln \prod_{n=1}^{N} \sum_{k=1}^{K} w_{k}\,N(\mathbf{x}_{n}\mid \boldsymbol{\mu}_{k},\Sigma_{k})
= \sum_{n=1}^{N} \ln \sum_{k=1}^{K} w_{k}\,N(\mathbf{x}_{n}\mid \boldsymbol{\mu}_{k},\Sigma_{k})
\]
Let us define $z_{nk}$ as hidden variables indicating that datum $\mathbf{x}_{n}$ belongs to class $k$. The posterior probability $\gamma(z_{nk})$ is calculated using Bayes' theorem as follows:
\[
\gamma(z_{nk}) = \frac{w_{k}\,N(\mathbf{x}_{n}\mid \boldsymbol{\mu}_{k},\Sigma_{k})}{\sum_{i=1}^{K} w_{i}\,N(\mathbf{x}_{n}\mid \boldsymbol{\mu}_{i},\Sigma_{i})} \tag{2}
\]
The partial differentiation with respect to each parameter provides us with the update formulas as follows:
\[
N_{k} = \sum_{n=1}^{N} \gamma(z_{nk})
\]
\[
\boldsymbol{\mu}_{k}^{\mathrm{new}} = \frac{1}{N_{k}} \sum_{n=1}^{N} \gamma(z_{nk})\,\mathbf{x}_{n}
\]
\[
\Sigma_{k}^{\mathrm{new}} = \frac{1}{N_{k}} \sum_{n=1}^{N} \gamma(z_{nk})\,(\mathbf{x}_{n}-\boldsymbol{\mu}_{k}^{\mathrm{new}})(\mathbf{x}_{n}-\boldsymbol{\mu}_{k}^{\mathrm{new}})^{T}
\]
\[
w_{k}^{\mathrm{new}} = \frac{N_{k}}{N} \tag{3}
\]
By alternating these two computation processes, Formula (2) (the E-step) and Formula (3) (the M-step), the system can find optimal values of the parameters.
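For illustration, one EM iteration transcribing Formulas (2) and (3) for a two-component two-dimensional GMM is sketched below. The actual system delegates this computation to tempura.js; this sketch is our own, assumes the gaussian2d function from Section 2.3, and represents each covariance matrix by (sigmaX, sigmaY, rho).

```javascript
// One EM iteration for a two-component 2-D GMM: E-step (Formula 2) and
// M-step (Formula 3). Assumes gaussian2d(x, y, muX, muY, sigmaX, sigmaY, rho)
// from the earlier sketch. `data` is an array of [x, y] lattice coordinates;
// `comps` is an array of components { w, muX, muY, sigmaX, sigmaY, rho }.
function emStep(data, comps) {
  const N = data.length;

  // E-step: responsibilities gamma[n][k] from Formula (2).
  const gamma = data.map(([x, y]) => {
    const weighted = comps.map(
      (c) => c.w * gaussian2d(x, y, c.muX, c.muY, c.sigmaX, c.sigmaY, c.rho)
    );
    const total = weighted.reduce((a, b) => a + b, 0);
    return weighted.map((v) => v / total);
  });

  // M-step: updated weights, means, and covariances from Formula (3).
  return comps.map((_, k) => {
    const Nk = gamma.reduce((sum, g) => sum + g[k], 0);
    const muX = gamma.reduce((s, g, n) => s + g[k] * data[n][0], 0) / Nk;
    const muY = gamma.reduce((s, g, n) => s + g[k] * data[n][1], 0) / Nk;
    let sxx = 0, syy = 0, sxy = 0;
    data.forEach(([x, y], n) => {
      const dx = x - muX, dy = y - muY;
      sxx += gamma[n][k] * dx * dx;
      syy += gamma[n][k] * dy * dy;
      sxy += gamma[n][k] * dx * dy;
    });
    const sigmaX = Math.sqrt(sxx / Nk);
    const sigmaY = Math.sqrt(syy / Nk);
    return { w: Nk / N, muX, muY, sigmaX, sigmaY, rho: sxy / Nk / (sigmaX * sigmaY) };
  });
}

// Iterating emStep until the parameters stop changing yields the fitted GMM.
```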
Figure 7: Sound Control Panel.
3.3 Implementation
We implemented the proposed agents as a system for creating music using HTML and JavaScript (http://ohmura.sakura.ne.jp/program/pitchMaker/pitchMaker010/). We used the Web Audio API for sounding notes. Moreover, we used tone.js (https://github.com/Tonejs/Midi) for analyzing SMF and tempura.js (http://mil-tokyo.github.io/tempura/) for executing the EM algorithm. We confirmed the operation of the system in Google Chrome.
An agent creates a melody line. The system out-
puts up to three melody lines because the system has
three agents. Users control parameters of the proba-
bility density functions for pitch and rhythm.
We prepared some preset data for examples which
provide musical modes, such as the Miyako-bushi
scale (Figure 6).
The system reads an SMF as the existing music and determines the pitches of each track. It then calculates the optimal GMM parameters from the pitch data using the EM algorithm.
3.4 System Operating Instructions
The operational screen consists of three panels, the
sound control panel, the pitch control panel, and the
note-value control panel. Herein, we provide a step-
by-step explanation of their usage.
3.4.1 Sound Control Panel
At [Sound Control] (Figure 7), users can control
play/stop, volume, tempo, duration, waveform, and
melody lines of the output. The header of the oper-
ation screen also includes a play/stop button. Slid-
ers control the values of the volume, tempo, and du-
ration. The value of the tempo indicates the pro-
gram cycle time in bpm. The value of the duration
is the length of time of each note. By controlling this value, melodies show articulations such as staccato and tenuto. With the waveform selector, users can select from "Sin," "Square," "SawTooth," and "Triangle." Users can also select "Bongo" and "Piano" as actual sound source samples. The [Sound Control] panel includes a preset selector that provides a setting for each musical mode. Using the 'Choose file' button, the system can read an arbitrary SMF. Moreover, the [Sound Control] panel includes a selector of preset SMFs.
Figure 8: Pitch control panel.
3.4.2 Pitch Control Panel
At [pitch control] (Figure 8), users can control the pa-
rameters of each probability density function for the
pitch of the melody lines using sliders. Each value of
the probability density function is shown in the upper
right [pitch cells]. The values of the melody lines are
shown in different colors. The first line is cyan, the
second line is magenta, and the third line is yellow.
A darker color indicates a higher value. Using but-
tons at the bottom of the [pitch cells], each probability
density function is set as visible or invisible. The op-
erations of the melody lines are independent. Using
the upper left buttons, the users select an operating
melody line. Sliders control the parameters of the pri-
mary function in the [Main-function Settings]. The
sliders control the parameters of the subfunction in
the [subfunction settings]. During system execution,
the selected pitches are shown at the bottom right [circle of fifths]. Therefore, users can confirm the output pitches in real time.
3.4.3 Note Value Control Panel
At [note value control] (Figure 9), users can control
the parameters of each probability density function
Development of Agents that Create Melodies based on Estimating Gaussian Functions in the Pitch Space of Consonance
367
Figure 9: Note value control panel.
for the note values of the melody lines using slid-
ers. Each value of the probability density function
is shown in the upper right [note value cells]. As is
the case with [pitch control], the values of the melody
lines are shown in different colors. The first line is
cyan, the second line is magenta, and the third line is
yellow. A darker color indicates a higher value. Using
the buttons at the bottom in the [note value cells], each
probability density function is set as visible or invisi-
ble. The operations of the melody lines are indepen-
dent, as in the case of [pitch control]. During system
execution, the selected note values are shown at the
bottom right [pulses]. Therefore, users can confirm
the output pulses of the note values in real-time. The
pulses can be zoomed using buttons and displayed on
a log scale using a toggle button.
4 DISCUSSION
When the system reads an SMF, it captures the musical scale and mode of that SMF. For example, using the SMF preset Usagi (a Japanese nursery song), the system displays the distributions shown in Figure 10 in 'Pitch Cells'. As seen from the figure, the system fits the Miyako-bushi scale using two normal distribution functions. However, there are some challenges with this system, as discussed below.
For example, using the SMF preset Debussy Prelude, the system displays the distributions shown in Figure 11 in 'Pitch Cells'. As seen from the figure, the areas of the two distributions are far from each other. The reason is that this SMF is written in G-flat major, which includes G♭, A♭, B♭, B, D♭, E♭, and F, whereas the center of the pitch lattice space is D. As a solution, the system could analyze the mode and scale of the existing music and then set the key of the mode at the center of the pitch lattice space.
Figure 10: Distribution of Usagi (Japanese nursery song) in the lattice space for pitch.
Figure 11: Distribution of Debussy 1-8 in the lattice space for pitch.
Another challenge is that the pitch lattice space extends infinitely in Pythagorean tuning, whereas in 12-equal temperament the space loops over twelve notes. If the system targets 12-equal temperament, we may need to adopt the von Mises distribution, which handles directional (circular) data, rather than the normal distribution. We should update the system so that it can treat various temperaments.
Furthermore, this system cannot express the dynamic variation of music because it reads the music as a whole and creates a single probability density function. For example, the probability density function obtained for the first movement of Beethoven's Moonlight Sonata includes many notes along the x-axis because the music modulates through various keys. As a solution, the system needs to
consider dynamic variations.
Additionally, the new analysis function is limited to the pitch of the music. In the future, we will add a corresponding function for note values and rhythms to the system. When the system has this function, it will be able to consider dynamic variations.
The lattice spaces based on frequency ratios that we have proposed are inspired by human musical cognition, which differs from creating music based on musical scores; therefore, the system cannot handle macro structures, but it can create primitive structures. In the future, the system might be able to treat not only rhythm and note values but also musical forms such as reprises and developments.
5 CONCLUSIONS
We have focused on the frequency ratio between notes
based on pitch and sound timings and have developed
an agent that creates music using a web system. We
have proposed lattice spaces that express the ratios of
pitches and pulses. Agents create melodies based on
the GMM of the lattice spaces. In this study, we upgraded the system to analyze existing music, so that it can obtain the distribution of pitches in the pitch lattice space and create melodies. We confirmed that the system fits musical features, such as the modes and scales of existing music, as GMMs. This suggests that the pitch lattice space and GMM are suitable for expressing primitive musical structures of pitch. However, some challenges remain, such as adapting to 12-equal temperament and handling dynamic variation of the mode. We will address these problems in future work.
ACKNOWLEDGEMENTS
This work was supported by JSPS KAKENHI Grant
Numbers JP17K12808, JP17K02347, JP16H01744
and JP19K00227.
REFERENCES
DeNora, T. (2000). Music in everyday life. Cambridge, UK:
Cambridge University Press.
Jordania, J. (2010). Music and emotions: humming in hu-
man prehistory. Proceedings of the International Sym-
posium on Traditional Polyphony (Tbilisi), pages 41–
49.
Lerdahl, F. and Jackendoff, R. (1983). A Generative Theory
of Tonal Music. MIT Press.
Meyer, L. B. (1956). Emotion and meaning in music. Uni-
versity of Chicago Press.
Ohmura, H., Shibayama, T., Hirata, K., and Tojo, S. (2018).
Music generation system based on a human instinctive
creativity. In Proceedings of Computer Simulation of
Musical Creativity (CSMC2018).
Ohmura, H., Shibayama, T., Hirata, K., and Tojo, S. (2019).
Development of agents for creating melodies and in-
vestigation of interaction between the agents. In
ICAART2019: Proceedings of the 11th International
Conference on Agents and Artificial Intelligence, vol-
ume 1: HAMT, pages 553–569.