
Expected results? #4

Open
brianc118 opened this issue Oct 2, 2019 · 20 comments

@brianc118

I get the following when evaluating on MAPS after training the model over 100k iterations.

These metrics appear quite low, especially the frame metrics, which are 0.65/0.65/0.64, whereas the MAESTRO paper reports 0.90/0.95/0.81.

Is this expected?

Thanks!

                            note precision                : 0.795 ± 0.096
                            note recall                   : 0.756 ± 0.109
                            note f1                       : 0.773 ± 0.096
                            note overlap                  : 0.541 ± 0.101
               note-with-offsets precision                : 0.362 ± 0.127
               note-with-offsets recall                   : 0.345 ± 0.126
               note-with-offsets f1                       : 0.352 ± 0.125
               note-with-offsets overlap                  : 0.808 ± 0.092
              note-with-velocity precision                : 0.739 ± 0.093
              note-with-velocity recall                   : 0.704 ± 0.110
              note-with-velocity f1                       : 0.719 ± 0.096
              note-with-velocity overlap                  : 0.543 ± 0.102
  note-with-offsets-and-velocity precision                : 0.341 ± 0.123
  note-with-offsets-and-velocity recall                   : 0.325 ± 0.124
  note-with-offsets-and-velocity f1                       : 0.332 ± 0.122
  note-with-offsets-and-velocity overlap                  : 0.807 ± 0.092
                           frame f1                       : 0.636 ± 0.108
                           frame precision                : 0.649 ± 0.163
                           frame recall                   : 0.654 ± 0.102
                           frame accuracy                 : 0.475 ± 0.115
                           frame substitution_error       : 0.106 ± 0.058
                           frame miss_error               : 0.240 ± 0.108
                           frame false_alarm_error        : 0.337 ± 0.338
                           frame total_error              : 0.683 ± 0.337
                           frame chroma_precision         : 0.686 ± 0.155
                           frame chroma_recall            : 0.696 ± 0.102
                           frame chroma_accuracy          : 0.516 ± 0.106
                           frame chroma_substitution_error: 0.064 ± 0.033
                           frame chroma_miss_error        : 0.240 ± 0.108
                           frame chroma_false_alarm_error : 0.337 ± 0.338
                           frame chroma_total_error       : 0.641 ± 0.315
@jongwook
Owner

jongwook commented Oct 2, 2019

I have also noticed low performance on MAPS. However, the fair comparison in the MAESTRO paper is the second row of Table 6, the one tested on MAPS without data augmentation: 0.82/0.83/0.61.

I suspect this is partly related to #3, although I haven't had the bandwidth to verify that.

I'll be able to get back to this this month, before ISMIR starts.

@brianc118
Author

@jongwook I'm going to try the proposed fix in #3. Wouldn't the fairer comparison be the one trained on MAESTRO, not MAPS (fourth row of Table 6)?

@jongwook
Owner

jongwook commented Oct 4, 2019

Thanks! As far as I understand, all of their experiments are trained on MAESTRO, and Table 6 (rows 1-2) shows how the model generalizes to the MAPS dataset; I wasn't able to reproduce the same level of generalizability with my implementation. When trained and tested on MAESTRO, this implementation can achieve performance similar to row 4 of Table 6.

@hanjuTsai

I have also trained the model for 100k iterations, but I got the following result. Should I evaluate an earlier iteration's checkpoint? Or maybe something went wrong?

/home/hanju/miniconda3/envs/hanju/lib/python3.6/site-packages/mir_eval/transcription.py:167: UserWarning: Estimated notes are empty.
  warnings.warn("Estimated notes are empty.")
/home/hanju/miniconda3/envs/hanju/lib/python3.6/site-packages/mir_eval/multipitch.py:275: UserWarning: Estimate frequencies are all empty.
  warnings.warn("Estimate frequencies are all empty.")
100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 60/60 [02:55<00:00,  2.24s/it]
                            note precision                : 0.000 ± 0.000
                            note recall                   : 0.000 ± 0.000
                            note f1                       : 0.000 ± 0.000
                            note overlap                  : 0.000 ± 0.000
               note-with-offsets precision                : 0.000 ± 0.000
               note-with-offsets recall                   : 0.000 ± 0.000
               note-with-offsets f1                       : 0.000 ± 0.000
               note-with-offsets overlap                  : 0.000 ± 0.000
              note-with-velocity precision                : 0.000 ± 0.000
              note-with-velocity recall                   : 0.000 ± 0.000
              note-with-velocity f1                       : 0.000 ± 0.000
              note-with-velocity overlap                  : 0.000 ± 0.000
                           frame f1                       : 0.000 ± 0.000
                           frame precision                : 0.000 ± 0.000
                           frame recall                   : 0.000 ± 0.000
                           frame accuracy                 : 0.000 ± 0.000
                           frame substitution_error       : 0.000 ± 0.000
                           frame miss_error               : 1.000 ± 0.000
                           frame false_alarm_error        : 0.000 ± 0.000
                           frame total_error              : 1.000 ± 0.000
                           frame chroma_precision         : 0.000 ± 0.000
                           frame chroma_recall            : 0.000 ± 0.000
                           frame chroma_accuracy          : 0.000 ± 0.000
                           frame chroma_substitution_error: 0.000 ± 0.000
                           frame chroma_miss_error        : 1.000 ± 0.000
                           frame chroma_false_alarm_error : 0.000 ± 0.000
                           frame chroma_total_error       : 1.000 ± 0.000

@brianc118
Author

@hanjuTsai see #1

@jongwook
Owner

jongwook commented Nov 4, 2019

Thanks @brianc118!
@hanjuTsai I've updated requirements.txt to install the latest mir_eval commit directly from GitHub. Now just running

pip install -r requirements.txt

should install all required dependencies.
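For reference, a pip requirement that installs a package straight from a Git repository looks roughly like this (the exact pin may differ from what is now in requirements.txt):

    # Install mir_eval directly from its GitHub repository instead of PyPI
    git+https://github.com/craffel/mir_eval.git#egg=mir_eval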

@justachetan

Hi @jongwook, I have been trying to train on MAPS since yesterday; however, I am still seeing the UserWarning shown above. Additionally, the metrics.json file in my case is completely empty. Could you point out what I might be doing wrong?

The command I am using is:

python3 train.py with train_on="MAPS" logdir=runs/baseline iterations=10000 validation_interval=100 checkpoint_interval=100

I checked up to 1100 iterations, but there was no change.

@jongwook
Owner

@justachetan Is your loss decreasing? You'll need ~100,000 iterations (ideally more) to see sensible results.

@justachetan

I am not even able to see the loss. The metrics.json file that gets generated is completely empty. As I understand it, when running the above command the metrics should get logged every 100 iterations, right? I saw in another issue that someone could see results after 500 iterations, once the UserWarning about empty reference frames went away.

Hence, I was confused as to why this is happening.

@jongwook
Owner

train.py writes TensorBoard logs, from which you should be able to see the loss curves.
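If you prefer reading the values programmatically, a minimal sketch along these lines should work (the logdir matches the one from your command; the scalar tag name is an assumption, so print the available tags to see what the run actually logged):

    # Minimal sketch: read loss values back out of a TensorBoard event log.
    # 'runs/baseline' matches the logdir used earlier in this thread; the
    # scalar tag below is a guess -- print the available tags to find yours.
    from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

    acc = EventAccumulator('runs/baseline')
    acc.Reload()                     # parse the event files on disk
    print(acc.Tags()['scalars'])     # list the scalar tags that were logged
    for event in acc.Scalars('loss/loss-total'):   # hypothetical tag name
        print(event.step, event.value)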

@justachetan

justachetan commented Mar 21, 2020

[Screenshot: loss plot]

This is what the loss plot looks like. The metrics.json file is still empty, though.

EDIT: The same plot, slightly zoomed out. It fluctuates wildly, and I don't know why.

[Screenshot: the same loss plot, zoomed out]

EDIT 2: Additionally, the plots of frame metrics and note metrics don't even get loaded, I am guessing because of the UserWarning from mir_eval. These results are after around 4.1k iterations of training on the MAPS dataset.

@jongwook
Owner

I think you'll need to let the training run for a lot longer, until at least 100k iterations. The loss will stay at ~0.2 for a short while and then decrease steadily:

[Image: smoothed loss curve]

BTW, I don't remember saving metrics.json during training. Are you sure that file is generated by my code?

@justachetan

Yep, I have not made any changes to your code as of now. I assumed that it was getting generated from there.

@justachetan

> I think you'll need to let the training run for a lot longer, until at least 100k iterations. The loss will stay at ~0.2 for a short while and then decrease steadily:
>
> [Image: smoothed loss curve]
>
> BTW, I don't remember saving metrics.json during training. Are you sure that file is generated by my code?

Your loss plot does not seem to have as many fluctuations as mine. Is this while training on MAPS only?

@jongwook
Owner

FYI, your plot contains lines from multiple TensorBoard log files, which is why it looks messy. Also note that my plots are smoothed significantly; the dim curves in the background are the actual data points. If you train until ~100k iterations, it will look similar.
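To inspect a single run, point TensorBoard at just that run's directory, e.g. using the logdir from earlier in this thread:

    tensorboard --logdir runs/baseline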

@justachetan

So even in your case, while training on MAPS, the Accuracy/Recall plots for notes and frames are not available till 100k iterations?

@jongwook
Owner

I don't have the numbers for MAPS at hand, but it will be broadly similar. See Figure 6 of https://arxiv.org/pdf/1906.08512.pdf; the blue baseline curve is for the MAESTRO dataset.

@justachetan

justachetan commented Mar 21, 2020

The plot seems to indicate that you were not getting any values for Frame F1 or Note F1 until about 100k iterations, probably due to the mir_eval issue itself. Could you confirm whether this is correct?

@jongwook
Owner

You'll get sensible frame/note F1 values after around 100k iterations, as I said earlier. mir_eval will stop complaining once the predictions start containing some notes.
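For what it's worth, the warning itself is easy to reproduce: mir_eval returns all-zero note metrics (along with the "Estimated notes are empty." warning) whenever the estimated note list is empty, which is exactly what an under-trained model produces. A minimal sketch:

    # Minimal sketch: mir_eval note metrics with an empty estimate, which
    # reproduces the "Estimated notes are empty." warning seen above.
    import numpy as np
    import mir_eval

    # One reference note: 440 Hz sounding from t=0.0 s to t=1.0 s.
    ref_intervals = np.array([[0.0, 1.0]])
    ref_pitches = np.array([440.0])

    # An under-trained model predicts no notes at all.
    est_intervals = np.empty((0, 2))
    est_pitches = np.empty(0)

    precision, recall, f1, overlap = mir_eval.transcription.precision_recall_f1_overlap(
        ref_intervals, ref_pitches, est_intervals, est_pitches)
    print(precision, recall, f1)  # 0.0 0.0 0.0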

@AkasaTanabe

Hi @jongwook, I have read your paper "Adversarial Learning for Improved Onsets and Frames Music Transcription". In it, you report good experimental results for both Onsets and Frames and your proposed method.
I think you may have solved this problem already; if so, could you share how you did it?
Thanks!
