Implementation of Emotion in Music Composing: Evidence of
Sadness, Happiness and Calmness
Zeen Li
La Jolla Country Day School, San Diego, U.S.A.
Keywords: Emotion in Music, Computational Music Composition, Music Generation Models, Emotional Expression, AI
in Music.
Abstract: Emotion plays a crucial role in music creation, influencing how listeners perceive and
react to musical works. With the advancement of artificial intelligence (especially deep learning), generating
music that can convey specific emotions such as sadness, happiness, and calmness has become increasingly feasible. This study explores the implementation of emotional expression in AI music creation, utilizing
models such as long short-term memory (LSTM) networks, generative adversarial networks (GANs), and
transformer-based architectures. This study analyses the ability of these models to generate emotionally resonant music and evaluates the results using quantitative, algorithm-based metrics (e.g., note density, harmonic content) and qualitative, human-centered evaluations from listeners. The
results show that while these models can successfully produce music that matches the desired emotional
characteristics, their effectiveness varies depending on the model and the target emotion. For example, GANs
are particularly effective in generating happy music with unique rhythmic patterns, while Transformers excel at creating calm, coherent pieces. This study highlights the potential of AI for emotionally adaptive music
applications, with important implications for areas such as therapeutic practice, interactive media, and
personalized learning. Future work will focus on improving model accuracy and exploring cross-cultural
emotional interpretation in music generation.
1 INTRODUCTION
Incorporating emotional expressions into music has
long been an interest for human music composers and
more recently for artificial intelligence (AI)
programmers. Emotional expression is essential in music composition: composers control specific musical variables to deliver a distinct emotion. Music is a powerful medium for conveying
emotions, and its emotional impact on listeners is
well-established in psychology and musicology research (Juslin & Västfjäll, 2008; Gabrielsson, 2011). As the field of computer-generated music
continues to progress, there is an increasing interest
in generating compositions that not only exhibit human-like creativity but also express distinct emotions. The
development of computational music has seen many significant advancements, from early rule-based programs to today's deep-learning models, which can generate complex musical pieces that convey a range of emotions (Todd &
Loy, 1991; Briot et al., 2020). AI-driven music
composition tools, such as OpenAI's MuseNet and
Google Magenta, are capable of producing music
across various genres and styles with increasing
sophistication and emotional depth.
In recent years, research has increasingly focused on enhancing the effectiveness and accuracy of emotional expression in AI-generated music, moving beyond simply replicating musical notes to incorporating subtle emotional cues and details that enrich the pieces (Ferreira & Whitehead, 2019;
Herremans et al., 2020). For example, current AI music generators accept textual descriptions of the desired emotions and genres, and the resulting output often matches the requested emotion and genre closely, sometimes with richer detail than expected. Emotion-driven music composition utilizes
advanced machine learning techniques, including
Recurrent Neural Networks (RNNs), Generative
Adversarial Networks (GANs), and Transformer-
based models, to produce music that aligns with
human emotional perceptions (Chuan et al., 2020;
Yang et al., 2022). For instance, models such as
EmoMusic and EMOPIA have shown promising
results in generating music that reflects specific
emotions like sadness, happiness, and calmness,
tailored to listeners’ expectations (Zhu et al., 2021).
These advancements demonstrate AI’s increasing
ability to understand and simulate the complexity of
human emotions through music, creating new
possibilities for personalized music creation, music
therapy, interactive digital art forms, and so on.
This study aims to explore how to implement
emotional expression in music creation, especially
sadness, happiness, and calmness, and to introduce implementation methods based on artificial intelligence models through specific examples and
references. The framework of this study includes a
comprehensive analysis of variables for different
emotions, an evaluation of their effectiveness, and a
discussion of their limitations and potential
improvements. In the following sections, this study
will first outline how to create music with different
emotions by controlling different variables, e.g.,
melody, harmony, tempo, and dynamics. Then, it examines how specific emotions are implemented through computational models, introduces typical results and principles, and evaluates the outcomes. Finally, this research highlights
the main findings, challenges, and future prospects of
this field.
2 MODELS AND EVALUATIONS
Implementing emotion in AI music creation relies on advanced models built on deep learning,
generative algorithms, and music theory principles.
These models are designed to generate music that
reflects specific emotional states, such as sadness,
happiness, or calmness. This section explores key
models used in emotion-driven music creation, tools
and software that facilitate this process, and methods
used to evaluate the quality and effectiveness of
generated music.
One of the most notable models used in
emotion-based music generation is the Recurrent
Neural Network (RNN), particularly the Long Short-
Term Memory (LSTM) variant. LSTM networks excel at sequence prediction problems, making
them ideal for generating music as they can capture
temporal dependencies in musical compositions
(Briot, Hadjeres, & Pachet, 2020). LSTMs have been
widely used to generate sequences of notes that align
with the emotional tone specified by the input data.
For instance, an LSTM model trained on a dataset of
sad classical music pieces can generate compositions
that simulate the emotional patterns and
characteristics found in the training data, such as
minor keys, slower tempos, and low dynamics.
However, the effectiveness of LSTM-based models
largely depends on the quality and diversity of the
training datasets, as well as the model's architecture
and hyperparameters (Ferreira & Whitehead, 2019).
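To make this concrete, the following is a minimal sketch of an LSTM next-event predictor in PyTorch. It assumes the music has already been tokenized into integer event IDs (for example, pitch and duration tokens extracted from MIDI); the vocabulary size, layer widths, and the sampling helper are illustrative assumptions rather than the architecture of any cited system.

```python
# Minimal PyTorch sketch of an LSTM next-event predictor for music generation.
# Assumes pieces are tokenized into integer event IDs (e.g., pitch/duration
# tokens from MIDI); sizes are illustrative assumptions.
import torch
import torch.nn as nn

class NoteLSTM(nn.Module):
    def __init__(self, vocab_size=256, embed_dim=128, hidden_dim=512, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, state=None):
        # tokens: (batch, seq_len) integer event IDs
        x = self.embed(tokens)
        out, state = self.lstm(x, state)
        return self.head(out), state  # logits over the next event at each step

@torch.no_grad()
def generate(model, seed, steps=200, temperature=1.0):
    """Autoregressively sample new events after a seed sequence."""
    model.eval()
    tokens = seed.clone()                       # (1, seed_len)
    logits, state = model(tokens)               # warm up on the whole seed
    for _ in range(steps):
        probs = torch.softmax(logits[:, -1] / temperature, dim=-1)
        next_tok = torch.multinomial(probs, num_samples=1)
        tokens = torch.cat([tokens, next_tok], dim=1)
        logits, state = model(next_tok, state)  # feed only the new event
    return tokens
```

Training such a model on excerpts labelled as sad (or on a sadness-only corpus) is what lets the sampled continuations inherit the minor keys, slow tempos, and low dynamics described above.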
Another model that has gained popularity for its
ability to generate emotionally rich music is the
Generative Adversarial Network (GAN). GANs
consist of two neural networks, which are a generator
and a discriminator. They are trained simultaneously
through a competitive process. In the context of music
generation, the generator generates music samples
based on the emotion of the input, while the
discriminator evaluates the authenticity and
emotional consistency of these samples based on real
music data (Yang et al., 2017). Variants of GANs,
such as Conditional GANs (cGANs), have been used to generate music conditioned on specific emotional labels, yielding more targeted outputs. The advantage of
using GANs is their ability to learn complex
distributions and generate diverse musical
compositions. However, training GANs is computationally intensive and requires careful tuning
to avoid common pitfalls such as mode collapse
(Herremans et al., 2020).
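As an illustration of how such conditioning can be wired together, the sketch below concatenates a learned emotion-label embedding with the inputs of both the generator and the discriminator. The piano-roll segment shape, layer sizes, and three-emotion label set are assumptions made for this sketch, not a reproduction of the cited systems.

```python
# Illustrative PyTorch sketch of a conditional GAN (cGAN) over fixed-length
# piano-roll segments; shapes and the emotion set are assumptions.
import torch
import torch.nn as nn

NUM_EMOTIONS = 3          # sad, happy, calm (assumed label set)
LATENT_DIM = 100          # size of the random noise vector
ROLL_SHAPE = (64, 128)    # (time steps, pitches), assumed segment size

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.label_embed = nn.Embedding(NUM_EMOTIONS, 32)
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM + 32, 1024), nn.ReLU(),
            nn.Linear(1024, ROLL_SHAPE[0] * ROLL_SHAPE[1]), nn.Sigmoid(),
        )

    def forward(self, z, emotion):
        # Concatenating noise with the emotion embedding makes the output
        # depend on the requested emotion label.
        h = torch.cat([z, self.label_embed(emotion)], dim=1)
        return self.net(h).view(-1, *ROLL_SHAPE)

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.label_embed = nn.Embedding(NUM_EMOTIONS, 32)
        self.net = nn.Sequential(
            nn.Linear(ROLL_SHAPE[0] * ROLL_SHAPE[1] + 32, 512), nn.LeakyReLU(0.2),
            nn.Linear(512, 1),   # real/fake score for this (music, emotion) pair
        )

    def forward(self, roll, emotion):
        h = torch.cat([roll.flatten(1), self.label_embed(emotion)], dim=1)
        return self.net(h)
```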
Transformer-based models have also been used
for music generation tasks due to their powerful
sequence modeling capabilities. The Transformer
architecture has achieved great success in natural
language processing (NLP). It has been adapted for
music generation by representing musical elements as
sequences similar to words in a sentence. Models
such as the Music Transformer and GPT-based
architectures (e.g., OpenAI’s MuseNet) have
effectively captured long-term dependencies and
complex structures in music, enabling the generation
of compositions that evoke specific emotions (Huang
et al., 2018). Transformers can be fine-tuned on
emotion-labeled datasets to align the generated music
with the desired emotional expression. This
approach has been shown to successfully generate
coherent and expressive music in various genres and
emotional contexts.
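One common way to realize such conditioning, sketched below under stated assumptions, is to prepend an emotion control token to every tokenized piece and fine-tune a causal Transformer language model on the result. A small GPT-2 configuration from the Hugging Face transformers library stands in for a music Transformer here, and the vocabulary layout and token IDs are hypothetical.

```python
# Sketch of emotion conditioning for a Transformer music model: an emotion
# control token is prepended to each event sequence, so the model learns to
# continue in the style of that label. Vocabulary layout is an assumption.
import torch
from transformers import GPT2Config, GPT2LMHeadModel

MUSIC_VOCAB = 500                                          # assumed music-event vocabulary size
EMOTION_TOKENS = {"sad": 500, "happy": 501, "calm": 502}   # control tokens appended to the vocab

config = GPT2Config(vocab_size=MUSIC_VOCAB + len(EMOTION_TOKENS),
                    n_positions=1024, n_embd=256, n_layer=6, n_head=8)
model = GPT2LMHeadModel(config)

def make_training_example(event_ids, emotion):
    """Prepend the emotion control token to a tokenized piece."""
    return torch.tensor([EMOTION_TOKENS[emotion]] + event_ids).unsqueeze(0)

# Fine-tuning step (labels == inputs gives the standard causal LM loss).
batch = make_training_example(list(range(32)), "calm")     # dummy event IDs
loss = model(input_ids=batch, labels=batch).loss
loss.backward()

# Generation: seed with the desired emotion token and sample events from it.
seed = torch.tensor([[EMOTION_TOKENS["calm"]]])
generated = model.generate(seed, max_length=128, do_sample=True,
                           temperature=0.9, pad_token_id=0)
```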
To evaluate the quality and emotional accuracy of
AI-generated music, researchers have employed both
quantitative and qualitative methods. Quantitative
methods typically involve metrics such as note
density, pitch range, and rhythmic complexity, which
can be statistically analyzed to determine how well
the generated music matches specific emotional
profiles (Liu et al., 2021). For example, music
classified as “sad” may exhibit a lower average tempo
and use more minor chords than “happy” music.
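A minimal sketch of such quantitative profiling is shown below, using the pretty_midi library to compute note density, pitch range, and an estimated tempo from a generated MIDI file; the file name is hypothetical, and the reading of the numbers follows the heuristic ranges described in this section.

```python
# Illustrative profile metrics for a generated MIDI file using pretty_midi.
# The expected "sad" vs. "happy" ranges follow the heuristics in the text.
import pretty_midi

def emotion_profile(midi_path):
    pm = pretty_midi.PrettyMIDI(midi_path)
    notes = [n for inst in pm.instruments if not inst.is_drum for n in inst.notes]
    duration = max(pm.get_end_time(), 1e-6)
    pitches = [n.pitch for n in notes]
    return {
        "note_density": len(notes) / duration,                      # notes per second
        "pitch_range": max(pitches) - min(pitches) if pitches else 0,
        "estimated_tempo": pm.estimate_tempo(),                     # beats per minute
    }

# Hypothetical usage: a "sad" piece would be expected to show a low estimated
# tempo and low note density, a "happy" piece the opposite.
profile = emotion_profile("generated_piece.mid")
print(profile)
```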
Qualitative evaluations, on the other hand, rely on
human listeners to recognize the emotional impact
and aesthetic quality of the generated music.
Participants are often asked to rate the music on a
scale associated with specific emotions (e.g., happy,
sad, calm) or provide feedback on how well the music
matches the emotional expectations (Hung et al.,
2023).
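The sketch below shows one simple way such ratings might be aggregated, computing the mean score and its spread per piece and target emotion; the data layout and the 1-5 rating scale are assumptions.

```python
# Aggregating hypothetical listener ratings: mean and spread per (piece, emotion).
from collections import defaultdict
from statistics import mean, pstdev

# (piece_id, target_emotion, listener rating on a 1-5 "matches the emotion" scale)
ratings = [
    ("piece_01", "sad", 4), ("piece_01", "sad", 5), ("piece_01", "sad", 4),
    ("piece_02", "happy", 3), ("piece_02", "happy", 4), ("piece_02", "happy", 2),
]

by_piece = defaultdict(list)
for piece, emotion, score in ratings:
    by_piece[(piece, emotion)].append(score)

for (piece, emotion), scores in by_piece.items():
    print(f"{piece} ({emotion}): mean={mean(scores):.2f}, spread={pstdev(scores):.2f}")
```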
In addition to these evaluation methods, various
tools and software platforms exist to help with
emotion-driven music generation and evaluation. For
example, Google's Magenta Studio provides a suite of
music creation tools driven by machine learning
models, while OpenAI's MuseNet can generate music
of various styles and emotional tones. These
platforms provide user-friendly interfaces that allow composers and researchers without programming expertise to input specific emotional parameters and experiment with different AI models to generate music that meets their emotional criteria.
Overall, RNN, GAN, and Transformer-based models,
along with powerful evaluation frameworks, form the
basis of emotion-based music generation.
3 REALIZATIONS OF SADNESS
The emotion of sadness in music is often
characterized by slower tempos, minor keys, low
dynamics, and smoother legato phrases. AI models
generate music that conveys sadness by combining
these musical features with data from pieces that
evoke similar emotions. One prominent approach is
to use long short-term memory (LSTM) networks,
which are effective at modeling sequences where the
order of elements matters, such as in music. LSTM
models have been widely used to generate music with
emotional content due to their ability to handle
temporal dependencies in sequential data (Ferreira &
Whitehead, 2019). To generate sad music, LSTM
networks are trained on a dataset of sad pieces. These
models learn typical structural and expressive
elements of sad music, such as minor chord
progressions, slow tempos, and smooth legato phrasing. In the
generation phase, LSTM models can compose new
pieces by predicting subsequent notes and chords that
are consistent with the emotional tone of sadness. The
generated music often reflects a melancholic
atmosphere with long note durations and minimal
rhythmic complexity.
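As a simple illustration (and not the training-based approach described above), the sketch below biases sampling from a trained next-note model toward the sad surface features just listed by restricting candidate pitches to a minor scale and lowering the sampling temperature; the assumption that token IDs 0-127 map directly to MIDI pitches is hypothetical.

```python
# Heuristic "sad" sampling: mask next-note probabilities to an A-minor pitch
# set and sample at a low temperature. Token IDs 0-127 are assumed to be
# MIDI pitches; this is an illustration, not the cited training procedure.
import torch

A_MINOR_PITCH_CLASSES = {9, 11, 0, 2, 4, 5, 7}   # A natural minor

def sample_sad_note(logits, temperature=0.7):
    """Sample the next pitch token, restricted to the A-minor pitch set."""
    probs = torch.softmax(logits / temperature, dim=-1)
    mask = torch.zeros_like(probs)
    for token in range(min(probs.shape[-1], 128)):   # only pitch tokens considered
        if token % 12 in A_MINOR_PITCH_CLASSES:
            mask[..., token] = 1.0
    probs = probs * mask
    probs = probs / probs.sum(dim=-1, keepdim=True)  # renormalize over allowed pitches
    return torch.multinomial(probs, num_samples=1)
```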
Evaluating the effectiveness of LSTM-generated
sad music involves both quantitative and qualitative
measures. Quantitative metrics might include
analyzing the frequency of minor chords, tempo
variations, and note densities to ensure they fall
within the typical range associated with sadness in
music. Qualitatively, human listeners are asked to rate
the generated music based on how well it evokes
feelings of sadness. Studies show that LSTM-
generated sad music can effectively convey the
intended emotion, as participants often rate these
pieces highly in terms of sadness perception (Hung et
al., 2023).
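A minimal sketch of the minor-chord analysis mentioned above is given next, using the music21 library to chordify a generated MIDI file, count minor versus major triads, and estimate its key; the file name is hypothetical, and reading a high minor-triad share as "sadder" is the heuristic of this section, not a formal classifier.

```python
# Count minor vs. major triads and estimate the key of a generated MIDI file
# with music21 (a rough, illustrative check, not a formal emotion classifier).
from music21 import converter

def minor_triad_share(midi_path):
    score = converter.parse(midi_path)
    chords = list(score.chordify().recurse().getElementsByClass('Chord'))
    minor = sum(1 for c in chords if c.isMinorTriad())
    major = sum(1 for c in chords if c.isMajorTriad())
    total = minor + major
    share = minor / total if total else 0.0     # closer to 1.0 -> predominantly minor harmony
    estimated_key = score.analyze('key')        # e.g. "a minor" for a sad-leaning piece
    return share, estimated_key

share, key = minor_triad_share("lstm_sad_output.mid")   # hypothetical generated file
```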
Example Results: Research indicates that music
generated by the model is often perceived as sad when
it adheres to conventions such as slow tempos
(around 60-70 beats per minute), minor chord
progressions, and sparse melodic lines. An example
study by Ferreira and Whitehead found that LSTM-
generated music trained on a dataset of sad music
(negative) pieces resulted in compositions that human
listeners consistently rated as sad (negative),
confirming the model’s ability to replicate emotional
content effectively (see Fig. 1) (Ferreira &
Whitehead, 2019).
Figure 1: Annotation tasks for realizations of emotion (Ferreira & Whitehead, 2019).
4 REALIZATIONS OF
HAPPINESS
Happiness in music is often associated with fast
tempos, major keys, high dynamics, rhythmic
regularity, and bright timbres. AI models aimed at
generating happy music focus on combining these
elements to deliver a sense of joy and energy.
Generative Adversarial Networks (GANs),
particularly Conditional GANs (cGANs), are effective in generating music that conveys happiness by allowing the model to be conditioned on specific
emotional labels during the training process. The
cGANs involve a generator that creates music
samples conditioned on a “happy” label and a
discriminator that evaluates these samples against
real music data annotated as happy. The generator
learns to produce music that fools the discriminator
into thinking it is genuine “happy” music (Yang et al.,
2017). The training data typically includes music
pieces with fast tempos, major scales, syncopated
rhythms, and higher register melodies, all of which
are musical features that convey happiness. Through this adversarial process, the cGAN refines its output, generating increasingly realistic and emotionally consistent happy music.
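To make the adversarial process concrete, the sketch below shows one simplified cGAN training step over flattened piano-roll segments, here conditioning on a one-hot emotion vector rather than a learned embedding; the shapes, the label index for "happy", and the dummy data are assumptions for illustration only.

```python
# One simplified cGAN training step with an emotion condition (one-hot label).
# Sizes, the "happy" label index, and the random stand-in data are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

LATENT, LABELS, ROLL = 100, 3, 64 * 128    # noise dim, emotion count, flattened segment
HAPPY = 1                                  # assumed index of the "happy" label

G = nn.Sequential(nn.Linear(LATENT + LABELS, 1024), nn.ReLU(),
                  nn.Linear(1024, ROLL), nn.Sigmoid())
D = nn.Sequential(nn.Linear(ROLL + LABELS, 512), nn.LeakyReLU(0.2),
                  nn.Linear(512, 1))
bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

def train_step(real_rolls, labels):
    """Discriminator learns to separate real from generated 'happy' segments;
    the generator learns to fool it for the same emotion label."""
    batch = real_rolls.size(0)
    onehot = F.one_hot(labels, LABELS).float()
    fake = G(torch.cat([torch.randn(batch, LATENT), onehot], dim=1))

    # Discriminator step: real segments -> 1, generated segments -> 0.
    d_loss = (bce(D(torch.cat([real_rolls, onehot], dim=1)), torch.ones(batch, 1)) +
              bce(D(torch.cat([fake.detach(), onehot], dim=1)), torch.zeros(batch, 1)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: make D score the fakes as real for this emotion label.
    g_loss = bce(D(torch.cat([fake, onehot], dim=1)), torch.ones(batch, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()

# Dummy usage: random tensors stand in for real happy-labelled training segments.
train_step(torch.rand(8, ROLL), torch.full((8,), HAPPY, dtype=torch.long))
```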
The effectiveness of happy music generated by
cGANs is evaluated with both objective and
subjective criteria. Objective (quantitative) measures
may include tempo analysis, frequency of major
chords, and rhythmic patterns, while subjective
(qualitative) evaluations involve listener studies
where participants rate the perceived happiness of the
music. Research has shown that cGANs can
effectively capture the dynamics of happy music, and
human evaluators frequently agree with the model’s
classification of happiness based on emotional
content (Herremans et al., 2020). The sketch of the
overall modelling is shown in Fig. 2.
In one experiment, cGAN-generated pieces with tempos above 120 beats per minute, frequent major triads, and syncopated rhythmic patterns were consistently rated as “happy” by listeners. The use of bright-sounding instruments,
like pianos and brass, further enhanced the perceived
happiness in the compositions (Huang et al., 2018).
5 REALIZATIONS OF
CALMNESS
Calmness in music is characterized by smooth, flowing melodies, consistent tempos, soft dynamics, and often ambient or minimalist textures. AI
models, particularly Transformer-based models like
the Music Transformer and MuseNet, have been
effective in generating calm music by capturing long-
range dependencies and patterns that contribute to a
soothing and serene auditory experience.
Transformer models have revolutionized
sequence modeling in various domains, including
music generation, through their ability to handle long-term
dependencies and parallelize the learning process
(Huang et al., 2018). In generating calm music, these
models are trained on datasets containing pieces
labelled as calm or serene, such as ambient music,
slow piano pieces, or certain types of classical
compositions. By learning from these examples, the
Transformer model can generate music that reflects
the harmonic simplicity, smooth phrasing, and steady
tempos typical of calm music. The attention
mechanisms within Transformers allow the model to
focus on key features that contribute to calmness,
such as sustained notes and minimal harmonic tension.
Figure 2: Concept map for automatic music generation systems (Herremans et al., 2020).
The evaluation of calm music generated by Transformer models combines both algorithmic analysis and human-centered evaluations.
Algorithmically, the generated music can be
evaluated for smooth transitions, consistency in
tempo, and minimal use of dissonant chords. Human
listeners are then asked to rate the calmness of the
music on scales, providing subjective feedback that
can help validate the AI model’s ability to evoke a
sense of calmness. Studies have shown that calm
music generated by Transformers is often perceived
as relaxing and peaceful, validating the effectiveness
of the model (Briot et al., 2020). A study on calm
music generation using Music Transformer showed
that compositions featuring long, sustained chords,
slow-moving melodies, and soft dynamics were
consistently rated as “calm” by listeners. The use of
gentle timbres, such as soft synthesizers or mellow
strings, further contributed to the expression of
calmness, confirming the model’s capability to
generate music that aligns with the intended emotion
(Herremans et al., 2020).
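The sketch below illustrates two of the algorithmic checks described in this section: tempo consistency, measured as the spread of inter-onset intervals, and limited dissonance, measured as the share of harsh simultaneous intervals. It uses the pretty_midi library, and the choice of which interval classes count as dissonant is an assumption of the sketch.

```python
# Two rough "calmness" checks on a generated MIDI file: a steady pulse (low
# spread of inter-onset intervals) and few dissonant simultaneous intervals.
# The dissonant interval classes below are an assumption of this sketch.
import itertools
import statistics
import pretty_midi

DISSONANT_INTERVALS = {1, 2, 6, 10, 11}   # semitone classes: m2, M2, tritone, m7, M7

def calmness_checks(midi_path):
    pm = pretty_midi.PrettyMIDI(midi_path)
    notes = sorted((n for inst in pm.instruments if not inst.is_drum for n in inst.notes),
                   key=lambda n: n.start)
    onsets = sorted({round(n.start, 3) for n in notes})
    iois = [b - a for a, b in zip(onsets, onsets[1:])]       # inter-onset intervals

    # Count dissonant intervals among note pairs that sound at the same time
    # (quadratic in the number of notes, fine for short excerpts).
    dissonant = total = 0
    for a, b in itertools.combinations(notes, 2):
        if a.start < b.end and b.start < a.end:
            total += 1
            if abs(a.pitch - b.pitch) % 12 in DISSONANT_INTERVALS:
                dissonant += 1

    return {
        "ioi_spread": statistics.pstdev(iois) if len(iois) > 1 else 0.0,  # lower = steadier tempo
        "dissonance_ratio": dissonant / total if total else 0.0,          # lower = smoother harmony
    }
```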
6 CONCLUSIONS
To sum up, this study explored the implementation of
emotional expression in AI-driven music creation,
focusing on generating music that conveys sadness,
happiness, and calmness using deep learning models
such as LSTM, GAN, and Transformers. The results
show that while each model has advantages (e.g., GANs excel at generating happy music, Transformers at creating calming compositions), they still have limitations
in achieving nuanced emotional expression. Future
research should aim to improve model accuracy and
explore the cultural dimensions of emotional
interpretation in music. This work helps advance the application of AI in therapeutic, educational, and entertainment settings and supports the development of emotionally adaptive music systems.
REFERENCES
Briot, J. P., Hadjeres, G., Pachet, F. D., 2020. Deep
Learning Techniques for Music Generation. Springer.
Chuan, C.-H., Agres, K., Herremans, D., 2020. From
context to concept: Exploring semantic meaning in
music with transformer-based models. Proceedings of
the 21st International Society for Music Information
Retrieval Conference, 11.
Ferreira, L., Whitehead, J., 2019. Learning to Generate
Music with Sentiment. Proceedings of the 14th
International Conference on the Foundations of Digital
Games, 7.
Gabrielsson, A., 2011. Strong experiences with music:
Music is much more than just music. Oxford University
Press.
Herremans, D., Chuan, C. H., Chew, E., 2020. A functional
taxonomy of music generation systems. ACM
Computing Surveys (CSUR), 53(3).
Huang, C. Z. A., Vaswani, A., Uszkoreit, J., Shazeer, N.,
Simon, I., Hawthorne, C., Dai, A. M., Hoffman, M. D.,
Dinculescu, M., Eck, D., 2018. Music Transformer:
Generating Music with Long-Term Structure.
International Conference on Learning Representations
(ICLR), 18.
Hung, S. H., Chen, W. Y., Su, J. L., 2023. EMOPIA:
Emotionally Adaptive Music Generation via
Transformer Models. Proceedings of the 18th
International Conference on Music Perception and
Cognition, 22.
Juslin, P. N., Västfjäll, D., 2008. Emotional responses to
music: The need to consider underlying mechanisms.
Behavioral and Brain Sciences, 31(5), 559-621.
Todd, P. M., Loy, D. G., 1991. Music and Connectionism.
MIT Press.
Yang, L. C., Pasquier, P., Herremans, D., 2022. Music
Emotion Recognition: A State of the Art Review. ACM
Transactions on Multimedia Computing,
Communications, and Applications (TOMM), 18(1).
Zhu, J., Wang, L., Cai, X., 2021. EmoMusic: A Dataset for
Music Emotion Recognition. IEEE Transactions on
Affective Computing, 12.