expanding to multimodal inputs, and refining
evaluation metrics to create music that is not only
technically proficient but also emotionally and
creatively compelling.
6 CONCLUSION
This paper has examined the functionality, computational efficiency, and creative potential of four key transformer-based models for music generation: Transformer-VAE, Multitrack Music Transformer, MuseGAN, and Pop Music Transformer. Each model has distinct strengths and limitations, from the flexibility and structural control of Transformer-VAE to the efficiency and multitrack harmony of the Multitrack Music Transformer. MuseGAN excels in human-AI collaboration, while the Pop Music Transformer generates rhythmically focused compositions well suited to pop music. Together, these models demonstrate how AI can contribute to the creative process by generating coherent, structured music across a range of genres and use cases. Future work in this field holds considerable promise through improvements in reinforcement learning, multimodal integration, and evaluation metrics that better capture creativity and emotional expressiveness; such advances will further strengthen the ability of AI models to produce innovative and emotionally compelling compositions. In conclusion, these results contribute to the growing body of research on algorithmic composition, highlighting the strengths and challenges of current models while identifying areas for future development. By pushing the boundaries of AI-generated music, these models represent significant strides toward bridging the gap between human creativity and machine learning.
REFERENCES
Agostinelli, A., Denk, T. I., Borsos, Z., et al., 2023. MusicLM: Generating Music from Text. arXiv preprint arXiv:2301.11325.
Ames, C., 1987. Automated Composition in Retrospect: 1956–1986. Leonardo, 20(2), 169-185.
Ames, C., 1989. The Markov Process as a Compositional Model: A Survey and Tutorial. Leonardo, 22(2), 175.
Dai, S., Yu, H., Dannenberg, R. B., 2022. What is Missing in Deep Music Generation? A Study of Repetition and Structure in Popular Music. 23rd International Society for Music Information Retrieval Conference, 11.
Dong, H. W., Hsiao, W. Y., Yang, L. C., Yang, Y. H., 2018. MuseGAN: Multitrack Sequential Generative Adversarial Networks for Symbolic Music Generation and Accompaniment. Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence and Thirtieth Innovative Applications of Artificial Intelligence Conference and Eighth AAAI Symposium, 32(1).
Dong, H. W., Chen, K., Dubnov, S., McAuley, J., Berg-Kirkpatrick, T., 2023. Multitrack Music Transformer. ICASSP 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5.
Huang, C. Z., Vaswani, A., Uszkoreit, J., Shazeer, N.,
Hawthorne, C., Dai, A., Hoffman, M., Eck, D., 2018.
Music Transformer: Generating Music with Long-Term
Structure. arXiv preprint arXiv:1809.04281.
Huang, Y. S., Yang, Y. H., 2020. Pop Music Transformer:
Beat-based Modeling and Generation of Expressive
Pop Piano Compositions. Proceedings of the 28th
ACM International Conference on Multimedia, 1180–
1188.
Jiang, J., Xia, G. G., Carlton, D. B., Anderson, C. N., Miyakawa, R. H., 2020. Transformer VAE: A Hierarchical Model for Structure-Aware and Interpretable Music Representation Learning. ICASSP 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 516-520.
Dhariwal, P., Jun, H., Payne, C., Kim, J. W., Radford, A., Sutskever, I., 2020. Jukebox: A Generative Model for Music. arXiv preprint arXiv:2005.00341.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., Polosukhin, I., 2017. Attention Is All You Need. Advances in Neural Information Processing Systems. Curran Associates, Inc.