Supervised Fusion Music Composition Through Long Short-Term Memory and Stochastic Modelling
DOI: https://doi.org/10.47611/jsrhs.v12i4.5919

Keywords: RNN, LSTM, Fusion Music, Music Composition, AI, Machine Learning

Abstract
Music composition has witnessed significant advancements with the infusion of artificial intelligence, particularly through Long Short-Term Memory (LSTM) networks. However, most existing algorithms offer composers minimal control over the genre fusion process, potentially undermining their creative preferences. This study introduces a novel, two-phase algorithm for personalized fusion music generation that reflects the composer's individual preferences. In the first phase, melodies are generated for individual genres using Recurrent Neural Networks (RNNs) built as sequential models with dense layers and one-hot-encoded inputs. These generated melodies serve as input for the second phase, where an LSTM network fuses them into a coherent composition. Notably, the algorithm incorporates weights set by the composer for each genre, allowing for a personalized composition. A stochastic approach is employed in both phases to introduce creative variance while preserving structural coherence. We demonstrate this balance through several quantitative metrics, offering a more tailored fusion music generation experience enriched by stochastic modelling.
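The composer-weighted fusion and stochastic sampling described above can be sketched in a minimal NumPy illustration. This is not the paper's implementation: the function names, four-note vocabulary, genre labels, and temperature parameter are all illustrative assumptions, standing in for the per-genre model outputs and the LSTM fusion stage.

```python
import numpy as np

def fuse_distributions(genre_probs, weights):
    """Blend per-genre next-note distributions using composer-set weights.

    genre_probs: dict mapping genre name -> probability vector over the note vocabulary.
    weights: dict mapping genre name -> non-negative weight (normalized internally).
    """
    total = sum(weights.values())
    fused = sum((weights[g] / total) * genre_probs[g] for g in genre_probs)
    return fused / fused.sum()  # renormalize to a valid distribution

def sample_note(probs, temperature=1.0, rng=None):
    """Stochastic sampling: temperature < 1 favours coherence, > 1 favours variance."""
    rng = rng or np.random.default_rng()
    logits = np.log(probs + 1e-9) / temperature
    p = np.exp(logits - logits.max())  # softmax, shifted for numerical stability
    p /= p.sum()
    return rng.choice(len(p), p=p)

# Toy example: two genres over a 4-note vocabulary, with jazz weighted 70/30 over classical.
jazz = np.array([0.6, 0.2, 0.1, 0.1])
classical = np.array([0.1, 0.1, 0.2, 0.6])
fused = fuse_distributions({"jazz": jazz, "classical": classical},
                           {"jazz": 0.7, "classical": 0.3})
note = sample_note(fused, temperature=0.8)
```

Raising a genre's weight pulls the fused distribution toward that genre's melodic tendencies, while the sampling temperature trades off creative variance against structural coherence, mirroring the balance the algorithm aims for.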
Copyright (c) 2023 Matthew Lee
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Copyright holder(s) granted JSR a perpetual, non-exclusive license to distribute & display this article.