diff --git a/README.md b/README.md index 9e3d4c4..9081cc2 100644 --- a/README.md +++ b/README.md @@ -154,6 +154,7 @@ However, these surveys do not cover music information retrieval tasks that are i | 2017 | [Multi-level and multi-scale feature aggregation using pre-trained convolutional neural networks for music auto-tagging](https://arxiv.org/pdf/1703.01793v2.pdf) | No | | 2017 | [Multi-level and multi-scale feature aggregation using sample-level deep convolutional neural networks for music classification](https://arxiv.org/pdf/1706.06810.pdf) | [GitHub](https://github.com/jongpillee/musicTagging_MSD) | | 2017 | [Sample-level deep convolutional neural networks for music auto-tagging using raw waveforms](https://arxiv.org/pdf/1703.01789v2.pdf) | No | +| 2017 | [A SeqGAN for Polyphonic Music Generation](https://arxiv.org/pdf/1710.11418.pdf) | [GitHub](https://github.com/L0SG/seqgan-music) | | 2017 | [Harmonic and percussive source separation using a convolutional auto encoder](http://www.eurasip.org/Proceedings/Eusipco/Eusipco2017/papers/1570346835.pdf) | No | | 2017 | [Stacked convolutional and recurrent neural networks for music emotion recognition](https://arxiv.org/pdf/1706.02292.pdf) | No | | 2017 | [A deep learning approach to source separation and remixing of hiphop music](https://repositori.upf.edu/bitstream/handle/10230/32919/Martel_2017.pdf?sequence=1&isAllowed=y) | No | @@ -172,7 +173,6 @@ However, these surveys do not cover music information retrieval tasks that are i | 2017 | [Designing efficient architectures for modeling temporal features with convolutional neural networks](http://ieeexplore.ieee.org/document/7952601/) | [GitHub](https://github.com/jordipons/ICASSP2017) | | 2017 | [Timbre analysis of music audio signals with convolutional neural networks](https://github.com/ronggong/EUSIPCO2017) | [GitHub](https://github.com/jordipons/EUSIPCO2017) | | 2017 | [Deep learning and intelligent audio mixing](http://www.semanticaudio.co.uk/wp-content/uploads/2017/09/WIMP2017_Martinez-RamirezReiss.pdf) | No | -| 2017 | [A SeqGAN for Polyphonic Music Generation](https://arxiv.org/pdf/1710.11418v2.pdf) | [GitHub](https://github.com/L0SG/seqgan-music) | | 2017 | [Deep learning for event detection, sequence labelling and similarity estimation in music signals](http://ofai.at/~jan.schlueter/pubs/phd/phd.pdf) | No | | 2017 | [Music feature maps with convolutional neural networks for music genre classification](https://www.researchgate.net/profile/Thomas_Pellegrini/publication/319326354_Music_Feature_Maps_with_Convolutional_Neural_Networks_for_Music_Genre_Classification/links/59ba5ae3458515bb9c4c6724/Music-Feature-Maps-with-Convolutional-Neural-Networks-for-Music-Genre-Classification.pdf?origin=publication_detail&ev=pub_int_prw_xdl&msrp=wzXuHZAa5zAnqEmErYyZwIRr2H0q01LnNEd4Wd7A15CQfdVLwdy98pmE-AdnrDvoc3-bVENSFrHt0yhaOiE2mQrYllVS9CJZOk-c9R0j_R1rbgcZugS6RtQ_.AUjPuJSF5P_DMngf-woH7W-7jdnQlbNQziR4_h6NnCHfR_zGcEa8vOyyOz5gx5nc4azqKTPQ5ZgGGLUxkLj1qCQLEQ5ThkhGlWHLyA.s6MBZE20-EO_RjRGCOCV4wk0WSFdN56Aloiraxz9hKCbJwRM2Et27RHVUA8jj9H8qvXIB6f7zSIrQgjXGrL2yCpyQlLffuf57rzSwg.KMMXbZrHsihV8DJM53xkHAWf3VebCJESi4KU4btNv9nQsyK2KnkhSQaTILKv0DSZY3c70a61LzywCBuoHtIhVOFhW5hVZN2n5O9uKQ) | No | | 2017 | [Automatic drum transcription for polyphonic recordings using soft attention mechanisms and convolutional neural networks](https://carlsouthall.files.wordpress.com/2017/12/ismir2017adt.pdf) | [GitHub](https://github.com/CarlSouthall/ADTLib) | @@ -192,10 +192,17 @@ However, these surveys do not cover music information 
retrieval tasks that are i | 2017 | [Attention and localization based on a deep convolutional recurrent model for weakly supervised audio tagging](https://arxiv.org/pdf/1703.06052.pdf) | [GitHub](https://github.com/yongxuUSTC/att_loc_cgrnn) | | 2017 | [Surrey-CVSSP system for DCASE2017 challenge task4](https://www.cs.tut.fi/sgn/arg/dcase2017/documents/challenge_technical_reports/DCASE2017_Xu_146.pdf) | [GitHub](https://github.com/yongxuUSTC/dcase2017_task4_cvssp) | | 2017 | [A study on LSTM networks for polyphonic music sequence modelling](https://qmro.qmul.ac.uk/xmlui/handle/123456789/24946) | [Website](http://www.eecs.qmul.ac.uk/~ay304/code/ismir17) | -| 2018 | [MUSIC TRANSFORMER:GENERATING MUSIC WITH LONG-TERM STRUCTURE](https://arxiv.org/pdf/1809.04281.pdf) | No | | 2018 | [MuseGAN: Multi-track sequential generative adversarial networks for symbolic music generation and accompaniment](https://arxiv.org/pdf/1709.06298.pdf) | [GitHub](https://github.com/salu133445/musegan) | -| 2018 | [Music Theory Inspired Policy Gradient Method for Piano Music Transcription](https://nips2018creativity.github.io/doc/music_theory_inspired_policy_gradient.pdf) | No | +| 2018 | [Music transformer: Generating music with long-term structure](https://arxiv.org/pdf/1809.04281.pdf) | No | +| 2018 | [Music theory inspired policy gradient method for piano music transcription](https://nips2018creativity.github.io/doc/music_theory_inspired_policy_gradient.pdf) | No | | 2019 | [Generating Long Sequences with Sparse Transformers](https://arxiv.org/pdf/1904.10509.pdf) | [GitHub](https://github.com/openai/sparse_attention) | [Go back to top](https://github.com/ybayle/awesome-deep-learning-music#deep-learning-for-music-dl4m-) @@ -247,17 +254,17 @@ Each entry in [dl4m.bib](dl4m.bib) also displays additional information: There are more papers from 2017 than any other years combined. Number of articles per year: ![Number of articles per year](fig/articles_per_year.png) -- If you are applying DL to music, there are [348 other researchers](authors.md) in your field. -- 35 tasks investigated. See the list of [tasks](tasks.md). +- If you are applying DL to music, there are [352 other researchers](authors.md) in your field. +- 34 tasks investigated. See the list of [tasks](tasks.md). 
Tasks pie chart: ![Tasks pie chart](fig/pie_chart_task.png) -- 51 datasets used. See the list of [datasets](datasets.md). +- 52 datasets used. See the list of [datasets](datasets.md). Datasets pie chart: ![Datasets pie chart](fig/pie_chart_dataset.png) -- 29 architectures used. See the list of [architectures](architectures.md). +- 30 architectures used. See the list of [architectures](architectures.md). Architectures pie chart: ![Architectures pie chart](fig/pie_chart_architecture.png) -- 10 frameworks used. See the list of [frameworks](frameworks.md). +- 9 frameworks used. See the list of [frameworks](frameworks.md). Frameworks pie chart: ![Frameworks pie chart](fig/pie_chart_framework.png) - Only 44 articles (26%) provide their source code. diff --git a/architectures.md b/architectures.md index ae61667..c3af7cc 100644 --- a/architectures.md +++ b/architectures.md @@ -31,3 +31,4 @@ Please refer to the list of useful acronyms used in deep learning and music: [ac - Transformer - U-Net - VPNN +- tensor2tensor diff --git a/authors.md b/authors.md index 107fc80..0ffc58e 100644 --- a/authors.md +++ b/authors.md @@ -2,10 +2,8 @@ - Adavanne, Sharath - Alec Radford -- Andrew M. Dai - Arumugam, Muthumari - Arzt, Andreas -- Ashish Vaswani - Badeau, Roland - Bammer, Roswitha - Barbieri, Francesco @@ -36,7 +34,6 @@ - Chen, Tanfang - Chen, Wenxiao - Cheng, Wen-Huang -- Cheng{-}Zhi Anna Huang - Chesmore, David - Chiang, Chin-Chin - Cho, Kyunghyun @@ -45,19 +42,20 @@ - Costa, Yandre MG - Courville, Aaron - Coutinho, Eduardo -- Curtis Hawthorne +- Dai, Andrew M. - Dannenberg, Roger B +- Das, Samarjit - David, Bertrand - De Haas, W Bas - De Lyon, Insa - Deng, Junqi - Dieleman, Sander - Dimoulas, Charalampos +- Dinculescu, Monica - Dixon, Simon - Doerfler, Monika - Dong, Hao-Wen - Dorfer, Matthias -- Douglas Eck - Drossos, Konstantinos - Duppada, Venkatesh - Durand, Simon @@ -99,6 +97,7 @@ - Han, Yoonchang - Hanjalic, A - Harchaoui, Zaid +- Hawthorne, Curtis - He, Wenqi - Hennequin, Romain - Herrera, Jorge @@ -107,21 +106,23 @@ - Hiray, Sushant - Hirvonen, Toni - Hockman, Jason +- Hoffman, Matthew D. - Holzapfel, Andre - Hsiao, Wen-Yi - Hsu, Yu-Lun - Hu, Min-Chun - Huang, Allen +- Huang, Cheng-Zhi Anna - Huang, Qiang - Humphrey, Eric J. - Hutchings, P. - Huttunen, Heikki +- Hwang, Uiwon - Ide, Ichiro - Ilya Sutskever - Imenina, Alina - Jackson, Philip J. B. - Jain, Shubham -- Jakob Uszkoreit - Janer Mestres, Jordi - Janer, Jordi - Jang, Jyh-Shing R @@ -162,6 +163,7 @@ - Lee, Honglak - Lee, Jongpil - Lee, Kyogu +- Lee, Sang-gil - Lee, Taejin - Lee, Tan - Leglaive, Simon @@ -171,6 +173,7 @@ - Li, Peter - Li, Siyan - Li, Tom LH +- Li, Xinjian - Li, Xinxing - Lidy, Thomas - Liem, CCS @@ -187,7 +190,6 @@ - Materka, Andrzej - Mathulaprangsan, Seksan - Matityaho, Benyamin -- Matthew D. Hoffman - McFee, Brian - Medhat, Fady - Mehri, Soroush @@ -195,6 +197,7 @@ - Mertins, Alfred - Metze, Florian - Mimilakis, Stylianos Ioannis +- Min, Seonwoo - Miron, Marius - Mitsufuji, Yuki - Montecchio, Nicola @@ -209,7 +212,6 @@ - Nielsen, Frank - Nieto, Oriol - Niewiadomski, Adam -- Noam Shazeer - Ogihara, Mitsunori - Oliveira, Luiz S - Oramas, Sergio @@ -248,7 +250,6 @@ - Roma, Gerard - Rosasco, Lorenzo - Sandler, Mark Brian -- Sang{-}gil Lee - Santos, João Felipe - Santoso, Andri - Saurous, Rif A. @@ -263,12 +264,13 @@ - Schultz, Tanja - Scott Gray - Senac, Christine -- Seonwoo Min - Serra, Xavier - Seybold, Bryan +- Shazeer, Noam - Shi, Zhengshan - Sigtia, Siddharth - Silla, Carlos N +- Simon, Ian - Simpson, Andrew J. R. 
- Slaney, Malcolm - Slizovskaia, Olga @@ -282,7 +284,6 @@ - Stoller, Daniel - Sturm, Bob L. - Su, Hong -- Sungroh Yoon - Takahashi, Naoya - Takiguchi, Tetsuya - Tanaka, Hidehiko @@ -295,12 +296,13 @@ - Tsaptsinos, Alexandros - Tsipas, Nikolaos - Uhlich, Stefan -- Uiwon Hwang - Ullrich, Karen +- Uszkoreit, Jakob - Valin, Jean-Marc - Van Gemert, JC - Van Gool, Luc - Van den Oord, Aaron +- Vaswani, Ashish - Velarde, Gissel - Veličković, Petar - Virtanen, Tuomas @@ -316,6 +318,7 @@ - Wang, Wenwu - Wang, Xinxi - Wang, Ye +- Wang, Yun - Wang, Yuyi - Wang, Ziyuan - Watson, David @@ -340,6 +343,7 @@ - Yang, Yi-Hsuan - Ycart, Adrien - Yoo, Chang D +- Yoon, Sungroh - Zhang, Chiyuan - Zhang, Hui - Zhang, Pengjing diff --git a/datasets.md b/datasets.md index d2eda0a..c871ebf 100644 --- a/datasets.md +++ b/datasets.md @@ -5,6 +5,7 @@ Please refer to the list of useful acronyms used in deep learning and music: [ac - Inhouse - No - [32 Beethoven’s piano sonatas gathered from https://archive.org](https://soundcloud.com/samplernn/sets) +- [413 hours of recorded solo piano music](http://papers.nips.cc/paper/8023-the-challenge-of-realistic-music-generation-modelling-raw-audio-at-scale-supplemental.zip) - [7digital](https://7digital.com) - [ADC2004](http://labrosa.ee.columbia.edu/projects/melody/) - [Acoustic Event](https://data.vision.ee.ethz.ch/cvl/ae_dataset/) @@ -24,7 +25,7 @@ Please refer to the list of useful acronyms used in deep learning and music: [ac - [IDMT-SMT-Drums](https://www.idmt.fraunhofer.de/en/business_units/m2d/smt/drums.html) - [IRMAS](https://www.upf.edu/web/mtg/irmas) - [J.S. Bach chorales dataset](https://github.com/czhuang/JSB-Chorales-dataset) -- [JSB Chorales](ftp://i11ftp.ira.uka.de/pub/neuro/dominik/midifiles/bach.zip) +- [JSB Chorales](https://github.com/czhuang/JSB-Chorales-dataset) - [Jamendo](http://www.mathieuramona.com/wp/data/jamendo/) - [LMD](https://sites.google.com/site/carlossillajr/resources/the-latin-music-database-lmd) - [LSDB](lsdb.flow-machines.com/) diff --git a/dl4m.bib b/dl4m.bib index 868fa81..eb647ba 100644 --- a/dl4m.bib +++ b/dl4m.bib @@ -1518,6 +1518,36 @@ @inproceedings{Lee2017a year = {2017} } +@unpublished{Lee2017d, + activation = {No}, + architecture = {SeqGAN}, + author = {Lee, Sang-gil and Hwang, Uiwon and Min, Seonwoo and Yoon, Sungroh}, + batch = {No}, + booktitle = {CoRR}, + code = {https://github.com/L0SG/seqgan-music}, + computationtime = {No}, + dataaugmentation = {No}, + dataset = {[Nottingham dataset](http://abc.sourceforge.net/NMD/)}, + dimension = {1D}, + dropout = {No}, + epochs = {100}, + framework = {Tensorflow}, + gpu = {No}, + input = {MIDI}, + layers = {5}, + learningrate = {0.01 & 0.001 & 0.0001}, + link = {https://arxiv.org/pdf/1710.11418.pdf}, + loss = {No}, + metric = {No}, + momentum = {No}, + optimizer = {No}, + pages = {1--8}, + reproducible = {No}, + task = {Polyphonic music sequence modelling}, + title = {A SeqGAN for Polyphonic Music Generation}, + year = {2017} +} + @inproceedings{Lim2017, architecture = {CNN}, author = {Lim, Wootaek and Lee, Taejin}, @@ -1782,36 +1812,6 @@ @inproceedings{Ramirez2017 year = {2017} } -@unpublished{Lee2017d, - architecture = {SeqGAN}, - author = {Lee, Sang-gil and Hwang, Uiwon and Min, Seonwoo and Yoon, Sungroh}, - batch = {No}, - booktitle = {CoRR}, - code = {https://github.com/L0SG/seqgan-music}, - dataaugmentation = {No}, - dataset = {[Nottingham dataset](http://abc.sourceforge.net/NMD/)}, - framework = {Tensorflow}, - input = {MIDI}, - link = {https://arxiv.org/pdf/1710.11418.pdf}, - loss 
= {No}, - task = {Polyphonic music sequence modelling}, - title = {A SeqGAN for Polyphonic Music Generation}, - year = {2017} - epochs = {100}, - learningrate = {0.01 & 0.001 & 0.0001}, - layers = {5}, - dropout = {No}, - gpu = {No}, - pages = {1--8}, - reproducible = {No}, - optimizer = {No}, - momentum = {No}, - metric = {No}, - activation = {No}, - computationtime = {No}, - dimension = {1D}, -} - @phdthesis{Schlueter2017, author = {Schlüter, Jan}, link = {http://ofai.at/~jan.schlueter/pubs/phd/phd.pdf}, @@ -2064,37 +2064,6 @@ @inproceedings{Ycart2017 year = {2017} } -@unpublished{Huang2018, - activation = {No}, - address = {No}, - architecture = {Transformer & RNN & tensor2tensor}, - author = {Huang, Cheng-Zhi Anna and Vaswani, Ashish and Uszkoreit, Jakob and Shazeer, Noam and Simon, Ian and Hawthorne, Curtis and Dai, Andrew M. and Hoffman, Matthew D. and Dinculescu, Monica and Eck, Douglas}, - batch = {1}, - dataaugmentation = {Time Stretches & pitch transcription}, - dataset = {[J.S. Bach chorales dataset](https://github.com/czhuang/JSB-Chorales-dataset) & [Piano-e-Competition dataset (competition history)](http://www.piano-e-competition.com/)}, - framework = {No}, - input = {MIDI}, - link = {https://arxiv.org/pdf/1809.04281.pdf}, - loss = {No}, - task = {Polyphonic music sequence modelling}, - title = {Music transformer: Generating music with long-term structure}, - year = {2018}, - note = {Submitted to ICLR 2019 Conference Paper1531 Area Chair1 cf https://openreview.net/forum?id=rJe4ShAcF7} - code = {No}, - computationtime = {No}, - dimension = {1D}, - dropout = {0.1}, - learningrate = {0.1}, - epochs = {No}, - gpu = {No}, - layers = {4 & 5 & 6}, - momentum = {}, - optimizer = {No}, - pages = {1--14}, - reproducible = {No}, - metric = {Negative Log-likelihood}, -} - @inproceedings{Dong2018, activation = {ReLU & Leaky ReLU}, architecture = {GAN & CNN}, @@ -2126,64 +2095,96 @@ @inproceedings{Dong2018 year = {2018} } +@unpublished{Huang2018, + activation = {No}, + address = {No}, + architecture = {Transformer & RNN & tensor2tensor}, + author = {Huang, Cheng-Zhi Anna and Vaswani, Ashish and Uszkoreit, Jakob and Shazeer, Noam and Simon, Ian and Hawthorne, Curtis and Dai, Andrew M. and Hoffman, Matthew D. and Dinculescu, Monica and Eck, Douglas}, + batch = {1}, + code = {No}, + computationtime = {No}, + dataaugmentation = {Time Stretches & pitch transposition}, + dataset = {[J.S. 
Bach chorales dataset](https://github.com/czhuang/JSB-Chorales-dataset) & [Piano-e-Competition dataset (competition history)](http://www.piano-e-competition.com/)}, + dimension = {1D}, + dropout = {0.1}, + epochs = {No}, + framework = {No}, + gpu = {No}, + input = {MIDI}, + layers = {4 & 5 & 6}, + learningrate = {0.1}, + link = {https://arxiv.org/pdf/1809.04281.pdf}, + loss = {No}, + metric = {Negative Log-likelihood}, + momentum = {No}, + note = {Submitted to ICLR 2019 Conference Paper1531 Area Chair1 cf https://openreview.net/forum?id=rJe4ShAcF7}, + optimizer = {No}, + pages = {1--14}, + reproducible = {No}, + task = {Polyphonic music sequence modelling}, + title = {Music transformer: Generating music with long-term structure}, + year = {2018} +} + @inproceedings{Li2018, + activation = {No}, + address = {Montreal, Canada}, architecture = {CNN & RNN}, author = {Li, Juncheng and Qu, Shuhui and Wang, Yun and Li, Xinjian and Das, Samarjit and Metze, Florian}, - link = {https://nips2018creativity.github.io/doc/music_theory_inspired_policy_gradient.pdf}, - task = {Transcription}, - title = {Music theory inspired policy gradient method for piano music transcription}, - year = {2018} - address = {Montreal, Canada}, + batch = {8}, booktitle = {[NIPS](https://nips.cc/)}, code = {No}, - reproducible = {No}, - pages = {1--10}, - batch = {8}, - learningrate = {0.0006}, - optimizer = {Adam}, - metric = {Precision & Recall & F1}, - month = {Dec.}, - dataset = {[MAPS](http://www.tsi.telecom-paristech.fr/aao/en/2010/07/08/maps-database-a-piano-database-for-multipitch-estimation-and-automatic-transcription-of-music/)}, - layers = {4}, - loss = {binary cross-entropy}, - momentum = {No}, computationtime = {No}, dataaugmentation = {No}, - activation = {No}, + dataset = {[MAPS](http://www.tsi.telecom-paristech.fr/aao/en/2010/07/08/maps-database-a-piano-database-for-multipitch-estimation-and-automatic-transcription-of-music/)}, + dimension = {2D}, dropout = {No}, epochs = {No}, - note = {RL Reinforcement Learning} - framework = {No}, + framework = {No}, gpu = {No}, - dimension = {2D}, input = {Log Mel-spectrogram with 48 bins per octave and 512 hop-size and 2018 window size and 16 kHz sample rate}, + layers = {4}, + learningrate = {0.0006}, + link = {https://nips2018creativity.github.io/doc/music_theory_inspired_policy_gradient.pdf}, + loss = {binary cross-entropy}, + metric = {Precision & Recall & F1}, + momentum = {No}, + month = {Dec.}, + note = {RL Reinforcement Learning}, + optimizer = {Adam}, + pages = {1--10}, + reproducible = {No}, + task = {Transcription}, + title = {Music theory inspired policy gradient method for piano music transcription}, + year = {2018} } @unpublished{Child2019, + activation = {No}, architecture = {Transformer}, author = {Rewon Child and Scott Gray and Alec Radford and Ilya Sutskever}, batch = {No}, code = {https://github.com/openai/sparse_attention}, + dataaugmentation = {No}, + dataset = {[413 hours of recorded solo piano music](http://papers.nips.cc/paper/8023-the-challenge-of-realistic-music-generation-modelling-raw-audio-at-scale-supplemental.zip)}, + dimension = {1D}, + dropout = {0.25}, + epochs = {120}, + framework = {Tensorflow}, + gpu = {8 NVIDIA Tesla V100}, input = {Raw Audio}, + layers = {128}, + learningrate = {0.00035}, link = {https://arxiv.org/pdf/1904.10509.pdf}, loss = {No}, + metric = {Human listening}, + momentum = {No}, note = {this paper is mainly about how sparse transformer are implemented}, + optimizer = {Adam}, pages = {8--9}, + 
reproducible = {No}, task = {Audio generation}, title = {Generating Long Sequences with Sparse Transformers}, year = {2019} - reproducible = {No}, - dataaugmentation = {No}, - dataset = {[413 hours of recorded solo piano music](http://papers.nips.cc/paper/8023-the-challenge-of-realistic-music-generation-modelling-raw-audio-at-scale-supplemental.zip)}, - activation = {No}, - dimension = {1D}, - epochs = {120}, - dropout = {0.25}, - learningrate = {0.00035}, - layers = {128}, - metric = {Human listening}, - gpu = {8 NVIDIA Tesla V100}, - optimizer = {Adam}, - momentum = {No}, - framework = {Tensorflow}, } + diff --git a/dl4m.tsv b/dl4m.tsv index 072ba34..f5f70dd 100644 --- a/dl4m.tsv +++ b/dl4m.tsv @@ -87,7 +87,7 @@ Year Entrytype Title Author Link Code Task Reproducible Dataset Framework Archit 2016 unpublished Deep convolutional neural networks and data augmentation for acoustic event detection Takahashi, Naoya and Gygli, Michael and Pfister, Beat and Van Gool, Luc https://arxiv.org/pdf/1604.07160.pdf https://bitbucket.org/naoya1/aenet_release Event recognition [Acoustic Event](https://data.vision.ee.ethz.ch/cvl/ae_dataset/) CNN Mixing 2017 unpublished Gabor frames and deep scattering networks in audio processing Bammer, Roswitha and Doerfler, Monika https://arxiv.org/pdf/1706.08818.pdf 2017 inproceedings Vision-based detection of acoustic timed events: A case study on clarinet note onsets Bazzica, Alessio and Van Gemert, JC and Liem, CCS and Hanjalic, A http://dorienherremans.com/dlm2017/papers/bazzica2017clarinet.pdf Onset detection [C4S](http://mmc.tudelft.nl/users/alessio-bazzica#C4S-dataset) CNN -2017 unpublished Deep learning techniques for music generation - A survey Briot, Jean-Pierre and Hadjeres, Gaëtan and Pachet, François https://arxiv.org/pdf/1709.01620.pdf Survey & Composition [JSB Chorales](ftp://i11ftp.ira.uka.de/pub/neuro/dominik/midifiles/bach.zip) & [MusicNet](https://homes.cs.washington.edu/~thickstn/musicnet.html) & [Symbolic music data](http://users.cecs.anu.edu.au/~christian.walder/) & [LSDB](lsdb.flow-machines.com/) No +2017 unpublished Deep learning techniques for music generation - A survey Briot, Jean-Pierre and Hadjeres, Gaëtan and Pachet, François https://arxiv.org/pdf/1709.01620.pdf Survey & Composition [JSB Chorales](https://github.com/czhuang/JSB-Chorales-dataset) & [MusicNet](https://homes.cs.washington.edu/~thickstn/musicnet.html) & [Symbolic music data](http://users.cecs.anu.edu.au/~christian.walder/) & [LSDB](lsdb.flow-machines.com/) No 2017 inproceedings JamBot: Music theory aware chord based generation of polyphonic music with LSTMs Brunner, Gino and Wang, Yuyi and Wattenhofer, Roger and Wiesendanger, Jonas https://arxiv.org/pdf/1711.07682.pdf https://github.com/brunnergino/JamBot Composition [Lakh MIDI](https://labrosa.ee.columbia.edu/sounds/music/) Keras-TensorFlow RNN-LSTM No No 4 No Softmax 0.00001 Adam 1 2017 unpublished XFlow: 1D <-> 2D cross-modal deep neural networks for audiovisual classification Cangea, Cătălina and Veličković, Petar and Liò, Pietro https://arxiv.org/pdf/1709.00572.pdf 2017 inproceedings Machine listening intelligence Cella, Carmine-Emanuele http://dorienherremans.com/dlm2017/papers/cella2017mli.pdf No Manifesto No No No No No No No No No No No No @@ -121,6 +121,7 @@ Year Entrytype Title Author Link Code Task Reproducible Dataset Framework Archit 2017 article Multi-level and multi-scale feature aggregation using pre-trained convolutional neural networks for music auto-tagging Lee, Jongpil and Nam, Juhan 
https://arxiv.org/pdf/1703.01793v2.pdf 2017 unpublished Multi-level and multi-scale feature aggregation using sample-level deep convolutional neural networks for music classification Lee, Jongpil and Nam, Juhan https://arxiv.org/pdf/1706.06810.pdf https://github.com/jongpillee/musicTagging_MSD [MSD](https://labrosa.ee.columbia.edu/millionsong/) 2017 inproceedings Sample-level deep convolutional neural networks for music auto-tagging using raw waveforms Lee, Jongpil and Park, Jiyoung and Kim, Keunhyoung Luke and Nam, Juhan https://arxiv.org/pdf/1703.01789v2.pdf +2017 unpublished A SeqGAN for Polyphonic Music Generation Lee, Sang-gil and Hwang, Uiwon and Min, Seonwoo and Yoon, Sungroh https://arxiv.org/pdf/1710.11418.pdf https://github.com/L0SG/seqgan-music Polyphonic music sequence modelling No [Nottingham dataset](http://abc.sourceforge.net/NMD/) Tensorflow SeqGAN No No 100 No MIDI 1D No No 0.01 & 0.001 & 0.0001 No No 2017 inproceedings Harmonic and percussive source separation using a convolutional auto encoder Lim, Wootaek and Lee, Taejin http://www.eurasip.org/Proceedings/Eusipco/Eusipco2017/papers/1570346835.pdf Source separation [DSD100](http://sisec17.audiolabs-erlangen.de/#/dataset) CNN 2017 unpublished Stacked convolutional and recurrent neural networks for music emotion recognition Malik, Miroslav and Adavanne, Sharath and Drossos, Konstantinos and Virtanen, Tuomas and Ticha, Dasa and Jarina, Roman https://arxiv.org/pdf/1706.02292.pdf No MER [Free music archive](http://freemusicarchive.org/) & [MedleyDB](http://medleydb.weebly.com/) & [Jamendo](http://www.mathieuramona.com/wp/data/jamendo/) Keras-Theano CRNN No RMSE Adam 2017 mastersthesis A deep learning approach to source separation and remixing of hiphop music Martel Baro, Héctor https://repositori.upf.edu/bitstream/handle/10230/32919/Martel_2017.pdf?sequence=1&isAllowed=y Source separation & Remixing [DSD100](http://sisec17.audiolabs-erlangen.de/#/dataset) & [HHDS](https://drive.google.com/drive/folders/0B1zpiGdDzFNlbmJyYU1VVFR3OEE) DNN & CNN & RNN Mixing & Circular Shift & Instrument augmentation @@ -139,7 +140,6 @@ Year Entrytype Title Author Link Code Task Reproducible Dataset Framework Archit 2017 inproceedings Designing efficient architectures for modeling temporal features with convolutional neural networks Pons, Jordi and Serra, Xavier http://ieeexplore.ieee.org/document/7952601/ https://github.com/jordipons/ICASSP2017 MGR [Ballroom](http://mtg.upf.edu/ismir2004/contest/tempoContest/node5.html) CNN 2017 inproceedings Timbre analysis of music audio signals with convolutional neural networks Pons, Jordi and Slizovskaia, Olga and Gong, Rong and Gómez, Emilia and Serra, Xavier https://github.com/ronggong/EUSIPCO2017 https://github.com/jordipons/EUSIPCO2017 CNN 2017 inproceedings Deep learning and intelligent audio mixing Ramírez, Marco A. Martínez and Reiss, Joshua D. 
http://www.semanticaudio.co.uk/wp-content/uploads/2017/09/WIMP2017_Martinez-RamirezReiss.pdf No Mixing [Open Multitrack Testbed](http://www.semanticaudio.co.uk/projects/omtb/) DAE No Adam -2017 inproceedings A SeqGAN for Polyphonic Music Generation Sang{-}gil Lee and Uiwon Hwang and Seonwoo Min and Sungroh Yoon https://arxiv.org/pdf/1710.11418v2.pdf https://github.com/L0SG/seqgan-music Polyphonic music sequence modelling [Nottingham dataset](http://abc.sourceforge.net/NMD/) Tensorflow SeqGAN No No MIDI No 2017 phdthesis Deep learning for event detection, sequence labelling and similarity estimation in music signals Schlüter, Jan http://ofai.at/~jan.schlueter/pubs/phd/phd.pdf 2017 inproceedings Music feature maps with convolutional neural networks for music genre classification Senac, Christine and Pellegrini, Thomas and Mouret, Florian and Pinquier, Julien https://www.researchgate.net/profile/Thomas_Pellegrini/publication/319326354_Music_Feature_Maps_with_Convolutional_Neural_Networks_for_Music_Genre_Classification/links/59ba5ae3458515bb9c4c6724/Music-Feature-Maps-with-Convolutional-Neural-Networks-for-Music-Genre-Classification.pdf?origin=publication_detail&ev=pub_int_prw_xdl&msrp=wzXuHZAa5zAnqEmErYyZwIRr2H0q01LnNEd4Wd7A15CQfdVLwdy98pmE-AdnrDvoc3-bVENSFrHt0yhaOiE2mQrYllVS9CJZOk-c9R0j_R1rbgcZugS6RtQ_.AUjPuJSF5P_DMngf-woH7W-7jdnQlbNQziR4_h6NnCHfR_zGcEa8vOyyOz5gx5nc4azqKTPQ5ZgGGLUxkLj1qCQLEQ5ThkhGlWHLyA.s6MBZE20-EO_RjRGCOCV4wk0WSFdN56Aloiraxz9hKCbJwRM2Et27RHVUA8jj9H8qvXIB6f7zSIrQgjXGrL2yCpyQlLffuf57rzSwg.KMMXbZrHsihV8DJM53xkHAWf3VebCJESi4KU4btNv9nQsyK2KnkhSQaTILKv0DSZY3c70a61LzywCBuoHtIhVOFhW5hVZN2n5O9uKQ MGR [GTzan](http://marsyas.info/downloads/datasets.html) CNN Spectrograms & common audio features 2017 inproceedings Automatic drum transcription for polyphonic recordings using soft attention mechanisms and convolutional neural networks Southall, Carl and Stables, Ryan and Hockman, Jason https://carlsouthall.files.wordpress.com/2017/12/ismir2017adt.pdf https://github.com/CarlSouthall/ADTLib Transcription [IDMT-SMT-Drums](https://www.idmt.fraunhofer.de/en/business_units/m2d/smt/drums.html) CNN & BRNN @@ -159,7 +159,7 @@ Year Entrytype Title Author Link Code Task Reproducible Dataset Framework Archit 2017 inproceedings Attention and localization based on a deep convolutional recurrent model for weakly supervised audio tagging Xu, Yong and Kong, Qiuqiang and Huang, Qiang and Wang, Wenwu and Plumbley, Mark D. https://arxiv.org/pdf/1703.06052.pdf https://github.com/yongxuUSTC/att_loc_cgrnn DCASE 2016 Task 4 Domestic audio tagging CRNN 2017 techreport Surrey-CVSSP system for DCASE2017 challenge task4 Xu, Yong and Kong, Qiuqiang and Wang, Wenwu and Plumbley, Mark D. https://www.cs.tut.fi/sgn/arg/dcase2017/documents/challenge_technical_reports/DCASE2017_Xu_146.pdf https://github.com/yongxuUSTC/dcase2017_task4_cvssp Event recognition 2017 inproceedings A study on LSTM networks for polyphonic music sequence modelling Ycart, Adrien and Benetos, Emmanouil https://qmro.qmul.ac.uk/xmlui/handle/123456789/24946 http://www.eecs.qmul.ac.uk/~ay304/code/ismir17 Polyphonic music sequence modelling Inhouse & [Piano-midi.de](Piano-midi.de) RNN-LSTM Pitch shift -2018 inproceedings MUSIC TRANSFORMER:GENERATING MUSIC WITH LONG-TERM STRUCTURE Cheng{-}Zhi Anna Huang and Ashish Vaswani and Jakob Uszkoreit and Noam Shazeer and Curtis Hawthorne and Andrew M. Dai and Matthew D. Hoffman and Douglas Eck https://arxiv.org/pdf/1809.04281.pdf Polyphonic music sequence modelling [J.S. 
Bach chorales dataset](https://github.com/czhuang/JSB-Chorales-dataset) & [Piano-e-Competition dataset (competition history)](http://www.piano-e-competition.com/) tensor2tensor Transformer & RNN No Time Stretches & pitch transcription MIDI No 2018 inproceedings MuseGAN: Multi-track sequential generative adversarial networks for symbolic music generation and accompaniment Dong, Hao-Wen and Hsiao, Wen-Yi and Yang, Li-Chia and Yang, Yi-Hsuan https://arxiv.org/pdf/1709.06298.pdf https://github.com/salu133445/musegan Composition No [Lakh Pianoroll Datase](https://github.com/salu133445/musegan/blob/master/docs/dataset.md) No GAN & CNN No No No No Piano-roll 1D ReLU & Leaky ReLU No No Adam 1 Tesla K40m -2018 article Music Theory Inspired Policy Gradient Method for Piano Music Transcription Li, Juncheng and Qu, Shuhui and Metze, Florian https://nips2018creativity.github.io/doc/music_theory_inspired_policy_gradient.pdf Music Transcription CNN & RNN -2019 unpublished Generating Long Sequences with Sparse Transformers Rewon Child and Scott Gray and Alec Radford and Ilya Sutskever https://arxiv.org/pdf/1904.10509.pdf https://github.com/openai/sparse_attention audio generation Transformer No Raw Audio No +2018 unpublished Music transformer: Generating music with long-term structure Huang, Cheng-Zhi Anna and Vaswani, Ashish and Uszkoreit, Jakob and Shazeer, Noam and Simon, Ian and Hawthorne, Curtis and Dai, Andrew M. and Hoffman, Matthew D. and Dinculescu, Monica and Eck, Douglas https://arxiv.org/pdf/1809.04281.pdf Polyphonic music sequence modelling No [J.S. Bach chorales dataset](https://github.com/czhuang/JSB-Chorales-dataset) & [Piano-e-Competition dataset (competition history)](http://www.piano-e-competition.com/) No Transformer & RNN & tensor2tensor 0.1 1 No Time Stretches & pitch transposition MIDI 1D No No 0.1 No No +2018 inproceedings Music theory inspired policy gradient method for piano music transcription Li, Juncheng and Qu, Shuhui and Wang, Yun and Li, Xinjian and Das, Samarjit and Metze, Florian https://nips2018creativity.github.io/doc/music_theory_inspired_policy_gradient.pdf No Transcription No [MAPS](http://www.tsi.telecom-paristech.fr/aao/en/2010/07/08/maps-database-a-piano-database-for-multipitch-estimation-and-automatic-transcription-of-music/) CNN & RNN No 8 No No Log Mel-spectrogram with 48 bins per octave and 512 hop-size and 2018 window size and 16 kHz sample rate 2D No binary cross-entropy 0.0006 Adam No +2019 unpublished Generating Long Sequences with Sparse Transformers Rewon Child and Scott Gray and Alec Radford and Ilya Sutskever https://arxiv.org/pdf/1904.10509.pdf https://github.com/openai/sparse_attention Audio generation No [413 hours of recorded solo piano music](http://papers.nips.cc/paper/8023-the-challenge-of-realistic-music-generation-modelling-raw-audio-at-scale-supplemental.zip) Tensorflow Transformer 0.25 No 120 No Raw Audio 1D No No 0.00035 Adam 8 NVIDIA Tesla V100 diff --git a/fig/articles_per_year.png b/fig/articles_per_year.png index 1e0a96f..fc46ecd 100644 Binary files a/fig/articles_per_year.png and b/fig/articles_per_year.png differ diff --git a/fig/pie_chart_architecture.png b/fig/pie_chart_architecture.png index a3b6846..f2c08cf 100644 Binary files a/fig/pie_chart_architecture.png and b/fig/pie_chart_architecture.png differ diff --git a/fig/pie_chart_dataset.png b/fig/pie_chart_dataset.png index c567ce0..7d54925 100644 Binary files a/fig/pie_chart_dataset.png and b/fig/pie_chart_dataset.png differ diff --git a/fig/pie_chart_framework.png 
b/fig/pie_chart_framework.png index 6273ad3..b939797 100644 Binary files a/fig/pie_chart_framework.png and b/fig/pie_chart_framework.png differ diff --git a/fig/pie_chart_task.png b/fig/pie_chart_task.png index 2a747ba..964a7c2 100644 Binary files a/fig/pie_chart_task.png and b/fig/pie_chart_task.png differ diff --git a/frameworks.md b/frameworks.md index ddcb6ed..bd6916d 100644 --- a/frameworks.md +++ b/frameworks.md @@ -11,4 +11,3 @@ Please refer to the list of useful acronyms used in deep learning and music: [ac - PyTorch - Tensorflow - Theano -- tensor2tensor diff --git a/publication_type.md b/publication_type.md index ca2c35c..ed47497 100644 --- a/publication_type.md +++ b/publication_type.md @@ -22,7 +22,6 @@ - Biennial Symposium for Arts and Technology - CBMI - CSMC -- CoRR - Connectionist Models Summer School - Convention of Electrical and Electronics Engineers - DLRS diff --git a/tasks.md b/tasks.md index fb7c26e..de9bf1b 100644 --- a/tasks.md +++ b/tasks.md @@ -3,6 +3,7 @@ Please refer to the list of useful acronyms used in deep learning and music: [acronyms.md](acronyms.md). - Artist recognition +- Audio generation - Beat detection - Boundary detection - Chord recognition @@ -18,7 +19,6 @@ Please refer to the list of useful acronyms used in deep learning and music: [ac - MSR - Manifesto - Mixing -- Music Transcription - Music/Noise segmentation - Noise suppression - Onset detection @@ -36,4 +36,3 @@ Please refer to the list of useful acronyms used in deep learning and music: [ac - Syllable segmentation - Transcription - VAD -- audio generation
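
The summary counts touched by this diff (researchers, tasks, datasets, architectures, frameworks) are derived from dl4m.tsv. As a sanity check when reviewing changes like the ones above, they can be recomputed directly from the table. The sketch below is not the repository's own generator script; it only assumes the tab-separated layout visible in the dl4m.tsv hunks, where multi-valued cells are joined with " & ", the literal string "No" marks a missing value, and the column names ("Task", "Dataset", "Architecture", "Framework") match the header row.

```python
import csv
from collections import Counter

# Minimal sketch, not the repository's own tooling: recount the unique values
# per column of dl4m.tsv so the hand-edited README statistics can be checked.
def unique_values(rows, column):
    counts = Counter()
    for row in rows:
        cell = (row.get(column) or "").strip()
        if not cell or cell == "No":
            continue  # empty cells and the "No" placeholder are not values
        for value in cell.split(" & "):  # multi-valued cells use " & "
            counts[value.strip()] += 1
    return counts

with open("dl4m.tsv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f, delimiter="\t"))

# Column names below are assumptions based on the visible header of dl4m.tsv.
for column in ("Task", "Dataset", "Architecture", "Framework"):
    print(f"{column}: {len(unique_values(rows, column))} unique values")
```

A mismatch between these counts and the numbers in README.md is exactly the kind of drift the statistics hunk above corrects.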