MusicGen advice #3
Comments
I'll take a look at your parameters. I don't think there's much public
expertise on musicgen; probably nobody even knows all the genres it can and
cannot do.
That being said, I might invest some money into generating more data about
musicgen.
On Fri, Jun 28, 2024, 3:05 PM Christopher Lowden <***@***.***> wrote:
Hello
This is an issue but maybe you have an idea that can help me. I am using
your MusicGen interface with the metadata below:
{
  "_version": "0.0.1",
  "_hash_version": "0.0.3",
  "_type": "musicgen",
  "_audiocraft_version": "1.3.0",
  "models": {},
  "prompt": "((piano)) acoustic key F minor minimalist low energy 4/4, 150bpm 320kbps 48.0kHz Stereo",
  "hash": "05f88c6f4049307a5209b74c368f62fda1575c7ab45668e06b39f54806e0fbcd",
  "date": "2024-06-21_23-06-50",
  "text": "((piano)) acoustic key F minor minimalist low energy 4/4, 150bpm 320kbps 48.0kHz Stereo",
  "melody": "94240fe69b46edc19d55977a5b38598da85708c28bd932d51dbbd5f00e609076",
  "model": "facebook/musicgen-stereo-melody-large",
  "duration": 360,
  "topk": 250,
  "topp": 0,
  "temperature": 1,
  "cfg_coef": 3,
  "seed": "1538577670",
  "use_multi_band_diffusion": false
}
The melody reference is Philip Glass Metamorphosis 5
https://www.youtube.com/watch?v=Rebr_F53db8
This seemed to me to be reasonably possible for a general AI model. After
hours of playing around with different settings, I still don't get anything
near the ref. I get an audio "soup" at best. As there seems to be rather
little written on Musicgen, I was wondering if you have any ideas about the
limits of the Musicgen model and what it might have been trained on?
Any thoughts are most welcome.
Thank you.
|
Note - the following comment is copyrighted and is not meant to be freely redistributed on other pages or platforms. To anyone who wishes to reproduce this - please reach out to me.
I have done some work on the prompt. Though this might not be the prompt you wanted it to be, this seems to work better: (default settings, musicgen stereo melody large, seed: 4240372830)
audio.51.mp4
Note that the elements are separated more. Although this reduces the expressiveness of the model, it can be better to split the prompt with more commas when the AI doesn't seem to notice something. I also generated it without any melody. I noticed that the audio is normalized and the quality isn't perfect, but that might always be a challenge with all generative AI models. I felt the audio was a bit too fast, so I tried a lower BPM with this reference:
audio.53.mp4
By the way, I see that you generate a 360s long piece. Even on an A100 I see 3 seconds of generation time for 1 second of audio, so I recommend doing short 1-3 second generations first. When one sounds like something worthwhile, reuse the last seed and increase the generation time bit by bit. That's also what I did for some of this library, to see what the model is able to follow and what it isn't. I tried to add some more tweaks, but it seems to need some skillful prompt engineering (as much as prompt engineering is a meme in other circles, here it's exactly what we need).
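The short-first workflow described above can be sketched in plain Python. This is only an illustration: the 3 seconds of compute per second of audio is the A100 figure quoted in this comment, while the doubling factor and the function names `generation_seconds` and `duration_schedule` are my own assumptions, not part of any MusicGen interface.

```python
def generation_seconds(audio_seconds: float, cost_per_second: float = 3.0) -> float:
    """Estimate wall-clock time, assuming a fixed cost per second of audio."""
    return audio_seconds * cost_per_second


def duration_schedule(target: float, start: float = 3.0, factor: float = 2.0) -> list:
    """Durations to try in order, growing geometrically until `target` is reached."""
    if start <= 0 or factor <= 1:
        raise ValueError("start must be positive and factor must exceed 1")
    durations = []
    current = start
    while current < target:
        durations.append(current)
        current *= factor
    durations.append(target)  # always finish on the length actually wanted
    return durations


SEED = 4240372830  # seed quoted in the comment; kept fixed across every step

for seconds in duration_schedule(target=360):
    minutes = generation_seconds(seconds) / 60
    print(f"seed={SEED} duration={seconds:>5.0f}s  estimated {minutes:.1f} min")
```

Under these assumptions a full 360 s render costs about 18 minutes, so rejecting a bad prompt after a 3 s test is far cheaper than waiting for the whole piece.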
With longer durations, musicgen begins to splice and gives suboptimal transitions.
audio.54.mp4
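One plausible contributor to the splicing, offered as an assumption rather than something confirmed in this thread: as far as I know, audiocraft produces long clips in overlapping passes, each covering at most about 30 s, with the next pass starting 18 s later (`extend_stride`) and reusing the overlap as context. The sketch below is a simplified model of that scheme, not the library's code; `generation_windows` and its defaults are hypothetical names and values chosen for illustration.

```python
def generation_windows(duration, max_window=30.0, stride=18.0):
    """Return (start, end) times in seconds for each generation pass.

    Simplified model: every pass covers at most `max_window` seconds, and the
    next pass starts `stride` seconds later, reusing the overlap as context.
    """
    windows = []
    offset = 0.0   # start time of the current pass
    context = 0.0  # seconds carried over from the previous pass
    while offset + context < duration:
        length = min(duration - offset, max_window)
        windows.append((offset, offset + length))
        context = max(length - stride, 0.0)
        offset += stride
    return windows


windows = generation_windows(360)
print(len(windows), "passes,", len(windows) - 1, "seams")
print(windows[:3], "...", windows[-1])
```

In this model a 360 s request needs 20 passes, so there are 19 points where a transition can go wrong, while anything up to 30 s is a single pass with no seam at all.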
audio.56.mp4
The problem that tends to happen is that it gets fairly monotone; however, it does slowly move forward. Increasing the temperature gives a worse result:
audio.57.mp4
Even increasing CFG to 5 does not fix it:
audio.58.mp4
Lowering temperature to 0.9 with CFG still at 5:
audio.59.mp4
Dropping CFG to 2 improves the result:
audio.61.mp4
However, overall it seems that the model wants to have more information in the prompt. At 0.7 CFG the model was too creative and added silence gaps in the music.
audio.62.mp4
Even CFG 1.4 is not enough:
audio.63.mp4
CFG 1.7 gets better, but monotone:
audio.64.mp4
CFG 1.7 with temperature 1.1 again is not really useful:
audio.65.mp4
Tweaking CFG, temperature and adding more to the prompt:
audio.66.mp4
Now, trying to expand that to 6 minutes, we run into problems, and the audio slowly degenerates completely (warning, loud noises):
audio.67.mp4
|
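To make the CFG and temperature experiments above easier to reason about, here is a toy sketch of how the two knobs usually interact. It assumes the common classifier-free guidance formulation, `uncond + cfg_coef * (cond - uncond)`, followed by a temperature-scaled softmax. The logits are made-up numbers, the helper names are hypothetical, and nothing here is taken from the MusicGen source.

```python
import math


def apply_cfg(cond_logits, uncond_logits, cfg_coef):
    """Blend prompted and unprompted logits (common CFG formulation)."""
    return [u + cfg_coef * (c - u) for c, u in zip(cond_logits, uncond_logits)]


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Toy logits for three candidate tokens (invented for illustration).
cond = [2.0, 1.0, 0.0]    # model output with the text prompt
uncond = [1.0, 1.0, 1.0]  # model output without the text prompt

# (5, 0.9) and (1.7, 1.1) are pairs named in the comment; temperature 1.0
# for the other CFG values is an assumption, since the comment does not say.
for cfg_coef, temperature in [(0.7, 1.0), (1.4, 1.0), (1.7, 1.1), (2.0, 1.0), (5.0, 0.9)]:
    probs = softmax(apply_cfg(cond, uncond, cfg_coef), temperature)
    print(f"cfg={cfg_coef} temp={temperature}", [round(p, 3) for p in probs])
```

Under this formulation a coefficient below 1 pulls the output towards the unprompted distribution, which fits the "too creative" result at 0.7. High values concentrate probability on a few tokens, which may relate to the monotone results, though that link is a guess.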
Also notably it used 33.2GB of VRAM |
I've confirmed that Stable Audio destroys musicgen in this case: VRAM, speed, quality, control, ease of use. The only drawback is the license, although musicgen does not have a very permissive license either. |
Thank you so much for all your time. What you tried is very interesting. PS. I have been listening to the SD radio channel |
I think melody is only like a relief, the prompt itself is very important. And yes, unfortunately we don't have CLIP for music that works. |