diff --git a/README.md b/README.md
index 652966d..61f1a14 100644
--- a/README.md
+++ b/README.md
@@ -172,6 +172,7 @@ However, these surveys do not cover music information retrieval tasks that are i
 | 2017 | [Designing efficient architectures for modeling temporal features with convolutional neural networks](http://ieeexplore.ieee.org/document/7952601/) | [GitHub](https://github.com/jordipons/ICASSP2017) |
 | 2017 | [Timbre analysis of music audio signals with convolutional neural networks](https://github.com/ronggong/EUSIPCO2017) | [GitHub](https://github.com/jordipons/EUSIPCO2017) |
 | 2017 | [Deep learning and intelligent audio mixing](http://www.semanticaudio.co.uk/wp-content/uploads/2017/09/WIMP2017_Martinez-RamirezReiss.pdf) | No |
+| 2017 | [A SeqGAN for Polyphonic Music Generation](https://arxiv.org/pdf/1710.11418v2.pdf) | [GitHub](https://github.com/L0SG/seqgan-music) |
 | 2017 | [Deep learning for event detection, sequence labelling and similarity estimation in music signals](http://ofai.at/~jan.schlueter/pubs/phd/phd.pdf) | No |
 | 2017 | [Music feature maps with convolutional neural networks for music genre classification](https://www.researchgate.net/profile/Thomas_Pellegrini/publication/319326354_Music_Feature_Maps_with_Convolutional_Neural_Networks_for_Music_Genre_Classification/links/59ba5ae3458515bb9c4c6724/Music-Feature-Maps-with-Convolutional-Neural-Networks-for-Music-Genre-Classification.pdf?origin=publication_detail&ev=pub_int_prw_xdl&msrp=wzXuHZAa5zAnqEmErYyZwIRr2H0q01LnNEd4Wd7A15CQfdVLwdy98pmE-AdnrDvoc3-bVENSFrHt0yhaOiE2mQrYllVS9CJZOk-c9R0j_R1rbgcZugS6RtQ_.AUjPuJSF5P_DMngf-woH7W-7jdnQlbNQziR4_h6NnCHfR_zGcEa8vOyyOz5gx5nc4azqKTPQ5ZgGGLUxkLj1qCQLEQ5ThkhGlWHLyA.s6MBZE20-EO_RjRGCOCV4wk0WSFdN56Aloiraxz9hKCbJwRM2Et27RHVUA8jj9H8qvXIB6f7zSIrQgjXGrL2yCpyQlLffuf57rzSwg.KMMXbZrHsihV8DJM53xkHAWf3VebCJESi4KU4btNv9nQsyK2KnkhSQaTILKv0DSZY3c70a61LzywCBuoHtIhVOFhW5hVZN2n5O9uKQ) | No |
 | 2017 | [Automatic drum transcription for polyphonic recordings using soft attention mechanisms and convolutional neural networks](https://carlsouthall.files.wordpress.com/2017/12/ismir2017adt.pdf) | [GitHub](https://github.com/CarlSouthall/ADTLib) |
@@ -191,7 +192,10 @@ However, these surveys do not cover music information retrieval tasks that are i
 | 2017 | [Attention and localization based on a deep convolutional recurrent model for weakly supervised audio tagging](https://arxiv.org/pdf/1703.06052.pdf) | [GitHub](https://github.com/yongxuUSTC/att_loc_cgrnn) |
 | 2017 | [Surrey-CVSSP system for DCASE2017 challenge task4](https://www.cs.tut.fi/sgn/arg/dcase2017/documents/challenge_technical_reports/DCASE2017_Xu_146.pdf) | [GitHub](https://github.com/yongxuUSTC/dcase2017_task4_cvssp) |
 | 2017 | [A study on LSTM networks for polyphonic music sequence modelling](https://qmro.qmul.ac.uk/xmlui/handle/123456789/24946) | [Website](http://www.eecs.qmul.ac.uk/~ay304/code/ismir17) |
+| 2018 | [Music Transformer: Generating Music with Long-Term Structure](https://arxiv.org/pdf/1809.04281.pdf) | No |
 | 2018 | [MuseGAN: Multi-track sequential generative adversarial networks for symbolic music generation and accompaniment](https://arxiv.org/pdf/1709.06298.pdf) | [GitHub](https://github.com/salu133445/musegan) |
+| 2018 | [Music Theory Inspired Policy Gradient Method for Piano Music Transcription](https://nips2018creativity.github.io/doc/music_theory_inspired_policy_gradient.pdf) | No |
+| 2019 | [Generating Long Sequences with Sparse Transformers](https://arxiv.org/pdf/1904.10509.pdf) | [GitHub](https://github.com/openai/sparse_attention) |
 
 [Go back to top](https://github.com/ybayle/awesome-deep-learning-music#deep-learning-for-music-dl4m-)
 
@@ -238,24 +242,24 @@ Each entry in [dl4m.bib](dl4m.bib) also displays additional information:
 
 ## Statistics and visualisations
 
-- 160 papers referenced. See the details in [dl4m.bib](dl4m.bib).
+- 164 papers referenced. See the details in [dl4m.bib](dl4m.bib).
 There are more papers from 2017 than all other years combined.
 Number of articles per year:
 ![Number of articles per year](fig/articles_per_year.png)
-- If you are applying DL to music, there are [329 other researchers](authors.md) in your field.
-- 33 tasks investigated. See the list of [tasks](tasks.md).
+- If you are applying DL to music, there are [348 other researchers](authors.md) in your field.
+- 35 tasks investigated. See the list of [tasks](tasks.md).
 Tasks pie chart:
 ![Tasks pie chart](fig/pie_chart_task.png)
-- 48 datasets used. See the list of [datasets](datasets.md).
+- 51 datasets used. See the list of [datasets](datasets.md).
 Datasets pie chart:
 ![Datasets pie chart](fig/pie_chart_dataset.png)
-- 27 architectures used. See the list of [architectures](architectures.md).
+- 29 architectures used. See the list of [architectures](architectures.md).
 Architectures pie chart:
 ![Architectures pie chart](fig/pie_chart_architecture.png)
-- 9 frameworks used. See the list of [frameworks](frameworks.md).
+- 10 frameworks used. See the list of [frameworks](frameworks.md).
 Frameworks pie chart:
 ![Frameworks pie chart](fig/pie_chart_framework.png)
-- Only 42 articles (26%) provide their source code.
+- Only 44 articles (27%) provide their source code.
 Repeatability is the key to good science, so check out the [list of useful resources on reproducibility for MIR and ML](reproducibility.md).
 
 [Go back to top](https://github.com/ybayle/awesome-deep-learning-music#deep-learning-for-music-dl4m-)
diff --git a/architectures.md b/architectures.md
index 7cc692c..ae61667 100644
--- a/architectures.md
+++ b/architectures.md
@@ -27,5 +27,7 @@ Please refer to the list of useful acronyms used in deep learning and music: [ac
 - RNN
 - RNN-LSTM
 - ResNet
+- SeqGAN
+- Transformer
 - U-Net
 - VPNN
diff --git a/authors.md b/authors.md
index 35fce04..107fc80 100644
--- a/authors.md
+++ b/authors.md
@@ -1,8 +1,11 @@
 # List of authors
 
 - Adavanne, Sharath
+- Alec Radford
+- Andrew M. Dai
 - Arumugam, Muthumari
 - Arzt, Andreas
+- Ashish Vaswani
 - Badeau, Roland
 - Bammer, Roswitha
 - Barbieri, Francesco
@@ -33,6 +36,7 @@
 - Chen, Tanfang
 - Chen, Wenxiao
 - Cheng, Wen-Huang
+- Cheng-Zhi Anna Huang
 - Chesmore, David
 - Chiang, Chin-Chin
 - Cho, Kyunghyun
@@ -41,6 +45,7 @@
 - Costa, Yandre MG
 - Courville, Aaron
 - Coutinho, Eduardo
+- Curtis Hawthorne
 - Dannenberg, Roger B
 - David, Bertrand
 - De Haas, W Bas
@@ -52,6 +57,7 @@
 - Doerfler, Monika
 - Dong, Hao-Wen
 - Dorfer, Matthias
+- Douglas Eck
 - Drossos, Konstantinos
 - Duppada, Venkatesh
 - Durand, Simon
@@ -111,9 +117,11 @@
 - Hutchings, P.
 - Huttunen, Heikki
 - Ide, Ichiro
+- Ilya Sutskever
 - Imenina, Alina
 - Jackson, Philip J. B.
 - Jain, Shubham
+- Jakob Uszkoreit
 - Janer Mestres, Jordi
 - Janer, Jordi
 - Jang, Jyh-Shing R
@@ -158,6 +166,7 @@
 - Lee, Tan
 - Leglaive, Simon
 - Lewis, J. P.
+- Li, Juncheng
 - Li, Lihua
 - Li, Peter
 - Li, Siyan
@@ -178,11 +187,13 @@
 - Materka, Andrzej
 - Mathulaprangsan, Seksan
 - Matityaho, Benyamin
+- Matthew D. Hoffman
 - McFee, Brian
 - Medhat, Fady
 - Mehri, Soroush
 - Meng, Fanhang
 - Mertins, Alfred
+- Metze, Florian
 - Mimilakis, Stylianos Ioannis
 - Miron, Marius
 - Mitsufuji, Yuki
@@ -198,6 +209,7 @@
 - Nielsen, Frank
 - Nieto, Oriol
 - Niewiadomski, Adam
+- Noam Shazeer
 - Ogihara, Mitsunori
 - Oliveira, Luiz S
 - Oramas, Sergio
@@ -223,10 +235,12 @@
 - Prockup, Matthew
 - Qian, Jiyuan
 - Qian, Sheng
+- Qu, Shuhui
 - Radenen, Mathieu
 - Ramírez, Marco A. Martínez
 - Reiss, Joshua D.
 - Ren, Gang
+- Rewon Child
 - Richard, Gaël
 - Riedmiller, Martin
 - Rigaud, François
@@ -234,6 +248,7 @@
 - Roma, Gerard
 - Rosasco, Lorenzo
 - Sandler, Mark Brian
+- Sang-gil Lee
 - Santos, João Felipe
 - Santoso, Andri
 - Saurous, Rif A.
@@ -246,7 +261,9 @@
 - Schuller, Björn W
 - Schuller, Gerald
 - Schultz, Tanja
+- Scott Gray
 - Senac, Christine
+- Seonwoo Min
 - Serra, Xavier
 - Seybold, Bryan
 - Shi, Zhengshan
@@ -265,6 +282,7 @@
 - Stoller, Daniel
 - Sturm, Bob L.
 - Su, Hong
+- Sungroh Yoon
 - Takahashi, Naoya
 - Takiguchi, Tetsuya
 - Tanaka, Hidehiko
@@ -277,6 +295,7 @@
 - Tsaptsinos, Alexandros
 - Tsipas, Nikolaos
 - Uhlich, Stefan
+- Uiwon Hwang
 - Ullrich, Karen
 - Valin, Jean-Marc
 - Van Gemert, JC
diff --git a/datasets.md b/datasets.md
index 72fa920..d2eda0a 100644
--- a/datasets.md
+++ b/datasets.md
@@ -23,6 +23,7 @@ Please refer to the list of useful acronyms used in deep learning and music: [ac
 - [Homburg](http://www-ai.cs.uni-dortmund.de/audio.html)
 - [IDMT-SMT-Drums](https://www.idmt.fraunhofer.de/en/business_units/m2d/smt/drums.html)
 - [IRMAS](https://www.upf.edu/web/mtg/irmas)
+- [J.S. Bach chorales dataset](https://github.com/czhuang/JSB-Chorales-dataset)
 - [JSB Chorales](ftp://i11ftp.ira.uka.de/pub/neuro/dominik/midifiles/bach.zip)
 - [Jamendo](http://www.mathieuramona.com/wp/data/jamendo/)
 - [LMD](https://sites.google.com/site/carlossillajr/resources/the-latin-music-database-lmd)
@@ -39,7 +40,9 @@ Please refer to the list of useful acronyms used in deep learning and music: [ac
 - [MedleyDB](http://medleydb.weebly.com/)
 - [MusicNet](https://homes.cs.washington.edu/~thickstn/musicnet.html)
 - [NTT MLS](http://www.ntt-at.com/product/speech/)
+- [Nottingham dataset](http://abc.sourceforge.net/NMD/)
 - [Open Multitrack Testbed](http://www.semanticaudio.co.uk/projects/omtb/)
+- [Piano-e-Competition dataset (competition history)](http://www.piano-e-competition.com/)
 - [Piano-midi.de](Piano-midi.de)
 - [RWC](https://staff.aist.go.jp/m.goto/RWC-MDB/)
 - [SALAMI](http://ddmal.music.mcgill.ca/research/salami/annotations)
diff --git a/dl4m.bib b/dl4m.bib
index 2514d59..3c4b973 100644
--- a/dl4m.bib
+++ b/dl4m.bib
@@ -1,3 +1,5 @@
+@comment{}
+
 @inproceedings{Bharucha1988,
 author = {Bharucha, J.},
 booktitle = {Proceedings of the First Workshop on Artificial Intelligence and Music},
@@ -1782,6 +1784,23 @@ @inproceedings{Ramirez2017
 year = {2017}
 }
 
+@inproceedings{Lee2017,
+architecture = {SeqGAN},
+author = {Sang{-}gil Lee and Uiwon Hwang and Seonwoo Min and Sungroh Yoon},
+batch = {No},
+booktitle = {CoRR},
+code = {https://github.com/L0SG/seqgan-music},
+dataaugmentation = {No},
+dataset = {[Nottingham dataset](http://abc.sourceforge.net/NMD/)},
+framework = {Tensorflow},
+input = {MIDI},
+link = {https://arxiv.org/pdf/1710.11418v2.pdf},
+loss = {No},
+task = {Polyphonic music sequence modelling},
+title = {A SeqGAN for Polyphonic Music Generation},
+year = {2017}
+}
+
 @phdthesis{Schlueter2017,
 author = {Schlüter, Jan},
 link = {http://ofai.at/~jan.schlueter/pubs/phd/phd.pdf},
@@ -2034,6 +2053,22 @@ @inproceedings{Ycart2017
 year = {2017}
 }
 
+@inproceedings{Huang2018,
+architecture = {Transformer & RNN},
+author = {Cheng{-}Zhi Anna Huang and Ashish Vaswani and Jakob Uszkoreit and Noam Shazeer and Curtis Hawthorne and Andrew M. Dai and Matthew D. Hoffman and Douglas Eck},
+batch = {No},
+booktitle = {CoRR},
+dataaugmentation = {Time stretches & pitch transpositions},
+dataset = {[J.S. Bach chorales dataset](https://github.com/czhuang/JSB-Chorales-dataset) & [Piano-e-Competition dataset (competition history)](http://www.piano-e-competition.com/)},
+framework = {tensor2tensor},
+input = {MIDI},
+link = {https://arxiv.org/pdf/1809.04281.pdf},
+loss = {No},
+task = {Polyphonic music sequence modelling},
+title = {Music Transformer: Generating Music with Long-Term Structure},
+year = {2018}
+}
+
 @inproceedings{Dong2018,
 activation = {ReLU & Leaky ReLU},
 architecture = {GAN & CNN},
@@ -2065,3 +2100,27 @@ @inproceedings{Dong2018
 year = {2018}
 }
 
+@article{Li2018,
+architecture = {CNN & RNN},
+author = {Li, Juncheng and Qu, Shuhui and Metze, Florian},
+link = {https://nips2018creativity.github.io/doc/music_theory_inspired_policy_gradient.pdf},
+task = {Music Transcription},
+title = {Music Theory Inspired Policy Gradient Method for Piano Music Transcription},
+year = {2018}
+}
+
+@unpublished{Child2019,
+architecture = {Transformer},
+author = {Rewon Child and Scott Gray and Alec Radford and Ilya Sutskever},
+batch = {No},
+code = {https://github.com/openai/sparse_attention},
+input = {Raw Audio},
+link = {https://arxiv.org/pdf/1904.10509.pdf},
+loss = {No},
+note = {This paper is mainly about how sparse Transformers are implemented},
+pages = {8--9},
+task = {audio generation},
+title = {Generating Long Sequences with Sparse Transformers},
+year = {2019}
+}
+
diff --git a/dl4m.tsv b/dl4m.tsv
index 3384c0f..072ba34 100644
--- a/dl4m.tsv
+++ b/dl4m.tsv
@@ -139,6 +139,7 @@ Year Entrytype Title Author Link Code Task Reproducible Dataset Framework Archit
 2017 inproceedings Designing efficient architectures for modeling temporal features with convolutional neural networks Pons, Jordi and Serra, Xavier http://ieeexplore.ieee.org/document/7952601/ https://github.com/jordipons/ICASSP2017 MGR [Ballroom](http://mtg.upf.edu/ismir2004/contest/tempoContest/node5.html) CNN
 2017 inproceedings Timbre analysis of music audio signals with convolutional neural networks Pons, Jordi and Slizovskaia, Olga and Gong, Rong and Gómez, Emilia and Serra, Xavier https://github.com/ronggong/EUSIPCO2017 https://github.com/jordipons/EUSIPCO2017 CNN
 2017 inproceedings Deep learning and intelligent audio mixing Ramírez, Marco A. Martínez and Reiss, Joshua D. http://www.semanticaudio.co.uk/wp-content/uploads/2017/09/WIMP2017_Martinez-RamirezReiss.pdf No Mixing [Open Multitrack Testbed](http://www.semanticaudio.co.uk/projects/omtb/) DAE No Adam
+2017 inproceedings A SeqGAN for Polyphonic Music Generation Sang-gil Lee and Uiwon Hwang and Seonwoo Min and Sungroh Yoon https://arxiv.org/pdf/1710.11418v2.pdf https://github.com/L0SG/seqgan-music Polyphonic music sequence modelling [Nottingham dataset](http://abc.sourceforge.net/NMD/) Tensorflow SeqGAN No No MIDI No
 2017 phdthesis Deep learning for event detection, sequence labelling and similarity estimation in music signals Schlüter, Jan http://ofai.at/~jan.schlueter/pubs/phd/phd.pdf
 2017 inproceedings Music feature maps with convolutional neural networks for music genre classification Senac, Christine and Pellegrini, Thomas and Mouret, Florian and Pinquier, Julien https://www.researchgate.net/profile/Thomas_Pellegrini/publication/319326354_Music_Feature_Maps_with_Convolutional_Neural_Networks_for_Music_Genre_Classification/links/59ba5ae3458515bb9c4c6724/Music-Feature-Maps-with-Convolutional-Neural-Networks-for-Music-Genre-Classification.pdf?origin=publication_detail&ev=pub_int_prw_xdl&msrp=wzXuHZAa5zAnqEmErYyZwIRr2H0q01LnNEd4Wd7A15CQfdVLwdy98pmE-AdnrDvoc3-bVENSFrHt0yhaOiE2mQrYllVS9CJZOk-c9R0j_R1rbgcZugS6RtQ_.AUjPuJSF5P_DMngf-woH7W-7jdnQlbNQziR4_h6NnCHfR_zGcEa8vOyyOz5gx5nc4azqKTPQ5ZgGGLUxkLj1qCQLEQ5ThkhGlWHLyA.s6MBZE20-EO_RjRGCOCV4wk0WSFdN56Aloiraxz9hKCbJwRM2Et27RHVUA8jj9H8qvXIB6f7zSIrQgjXGrL2yCpyQlLffuf57rzSwg.KMMXbZrHsihV8DJM53xkHAWf3VebCJESi4KU4btNv9nQsyK2KnkhSQaTILKv0DSZY3c70a61LzywCBuoHtIhVOFhW5hVZN2n5O9uKQ MGR [GTzan](http://marsyas.info/downloads/datasets.html) CNN Spectrograms & common audio features
 2017 inproceedings Automatic drum transcription for polyphonic recordings using soft attention mechanisms and convolutional neural networks Southall, Carl and Stables, Ryan and Hockman, Jason https://carlsouthall.files.wordpress.com/2017/12/ismir2017adt.pdf https://github.com/CarlSouthall/ADTLib Transcription [IDMT-SMT-Drums](https://www.idmt.fraunhofer.de/en/business_units/m2d/smt/drums.html) CNN & BRNN
@@ -158,4 +159,7 @@ Year Entrytype Title Author Link Code Task Reproducible Dataset Framework Archit
 2017 inproceedings Attention and localization based on a deep convolutional recurrent model for weakly supervised audio tagging Xu, Yong and Kong, Qiuqiang and Huang, Qiang and Wang, Wenwu and Plumbley, Mark D. https://arxiv.org/pdf/1703.06052.pdf https://github.com/yongxuUSTC/att_loc_cgrnn DCASE 2016 Task 4 Domestic audio tagging CRNN
 2017 techreport Surrey-CVSSP system for DCASE2017 challenge task4 Xu, Yong and Kong, Qiuqiang and Wang, Wenwu and Plumbley, Mark D. https://www.cs.tut.fi/sgn/arg/dcase2017/documents/challenge_technical_reports/DCASE2017_Xu_146.pdf https://github.com/yongxuUSTC/dcase2017_task4_cvssp Event recognition
 2017 inproceedings A study on LSTM networks for polyphonic music sequence modelling Ycart, Adrien and Benetos, Emmanouil https://qmro.qmul.ac.uk/xmlui/handle/123456789/24946 http://www.eecs.qmul.ac.uk/~ay304/code/ismir17 Polyphonic music sequence modelling Inhouse & [Piano-midi.de](Piano-midi.de) RNN-LSTM Pitch shift
+2018 inproceedings Music Transformer: Generating Music with Long-Term Structure Cheng-Zhi Anna Huang and Ashish Vaswani and Jakob Uszkoreit and Noam Shazeer and Curtis Hawthorne and Andrew M. Dai and Matthew D. Hoffman and Douglas Eck https://arxiv.org/pdf/1809.04281.pdf Polyphonic music sequence modelling [J.S. Bach chorales dataset](https://github.com/czhuang/JSB-Chorales-dataset) & [Piano-e-Competition dataset (competition history)](http://www.piano-e-competition.com/) tensor2tensor Transformer & RNN No Time stretches & pitch transpositions MIDI No
 2018 inproceedings MuseGAN: Multi-track sequential generative adversarial networks for symbolic music generation and accompaniment Dong, Hao-Wen and Hsiao, Wen-Yi and Yang, Li-Chia and Yang, Yi-Hsuan https://arxiv.org/pdf/1709.06298.pdf https://github.com/salu133445/musegan Composition No [Lakh Pianoroll Dataset](https://github.com/salu133445/musegan/blob/master/docs/dataset.md) No GAN & CNN No No No No Piano-roll 1D ReLU & Leaky ReLU No No Adam 1 Tesla K40m
+2018 article Music Theory Inspired Policy Gradient Method for Piano Music Transcription Li, Juncheng and Qu, Shuhui and Metze, Florian https://nips2018creativity.github.io/doc/music_theory_inspired_policy_gradient.pdf Music Transcription CNN & RNN
+2019 unpublished Generating Long Sequences with Sparse Transformers Rewon Child and Scott Gray and Alec Radford and Ilya Sutskever https://arxiv.org/pdf/1904.10509.pdf https://github.com/openai/sparse_attention audio generation Transformer No Raw Audio No
diff --git a/fig/articles_per_year.png b/fig/articles_per_year.png
index c5f3351..1e0a96f 100644
Binary files a/fig/articles_per_year.png and b/fig/articles_per_year.png differ
diff --git a/fig/pie_chart_architecture.png b/fig/pie_chart_architecture.png
index 52467b6..a3b6846 100644
Binary files a/fig/pie_chart_architecture.png and b/fig/pie_chart_architecture.png differ
diff --git a/fig/pie_chart_dataset.png b/fig/pie_chart_dataset.png
index 3723118..c567ce0 100644
Binary files a/fig/pie_chart_dataset.png and b/fig/pie_chart_dataset.png differ
diff --git a/fig/pie_chart_framework.png b/fig/pie_chart_framework.png
index 72a607f..6273ad3 100644
Binary files a/fig/pie_chart_framework.png and b/fig/pie_chart_framework.png differ
diff --git a/fig/pie_chart_task.png b/fig/pie_chart_task.png
index 964a7c2..2a747ba 100644
Binary files a/fig/pie_chart_task.png and b/fig/pie_chart_task.png differ
diff --git a/frameworks.md b/frameworks.md
index bd6916d..ddcb6ed 100644
--- a/frameworks.md
+++ b/frameworks.md
@@ -11,3 +11,4 @@ Please refer to the list of useful acronyms used in deep learning and music: [ac
 - PyTorch
 - Tensorflow
 - Theano
+- tensor2tensor
diff --git a/publication_type.md b/publication_type.md
index ed47497..ca2c35c 100644
--- a/publication_type.md
+++ b/publication_type.md
@@ -22,6 +22,7 @@
 - Biennial Symposium for Arts and Technology
 - CBMI
 - CSMC
+- CoRR
 - Connectionist Models Summer School
 - Convention of Electrical and Electronics Engineers
 - DLRS
diff --git a/tasks.md b/tasks.md
index a652f4b..fb7c26e 100644
--- a/tasks.md
+++ b/tasks.md
@@ -18,6 +18,7 @@ Please refer to the list of useful acronyms used in deep learning and music: [ac
 - MSR
 - Manifesto
 - Mixing
+- Music Transcription
 - Music/Noise segmentation
 - Noise suppression
 - Onset detection
@@ -35,3 +36,4 @@ Please refer to the list of useful acronyms used in deep learning and music: [ac
 - Syllable segmentation
 - Transcription
 - VAD
+- audio generation