Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

resample_multipitch improvement #336

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

cwitkowitz
Copy link

Using the current functionality, if I try to resample the following pitch list,

image

which has no empty pitch observations (taken directly from a JAMS file), I get the following:

image

With the changes, specifying a tolerance of 50 ms, I get the following:

image

A good rule of thumb would be setting the tolerance (if you want to use it) to twice the minimum timing difference in your pitch list.

…ly) fills large gap with empty pitch estimates
@craffel
Copy link
Collaborator

craffel commented Feb 15, 2021

@rabitt can you take a look?

@rabitt
Copy link
Contributor

rabitt commented Feb 15, 2021

@cwitkowitz good catch on this bug! I think the bug boils down to a false assumption the function makes - that the time series is uniformly spaced / time stamps with no pitches are reported with an empty frequency list.

@craffel meta-question - do we want to force multipitch time series to be uniform like we do for melody? @cwitkowitz's solution is more general and would allow non-uniform time scales, but the properties of the metrics are really different if we do/don't report silent frames.

@craffel
Copy link
Collaborator

craffel commented Feb 15, 2021

The high-level rule is to try to stick to external conventions, i.e. if in all past datasets/papers/competitions the multipitch time series are uniform, then we can assume that. But if there is existing convention that time series can be non-uniform, we should support that. Either way, we should document the convention. Does that make sense in this case?

@rabitt
Copy link
Contributor

rabitt commented Feb 15, 2021

The high-level rule is to try to stick to external conventions, i.e. if in all past datasets/papers/competitions the multipitch time series are uniform, then we can assume that. But if there is existing convention that time series can be non-uniform, we should support that. Either way, we should document the convention. Does that make sense in this case?

I've never seen non-uniform time series for multipitch. and the mirex data from when I implemented this were uniform. Paper-wise, it's one of those details we all tend to leave out. All that said, I'd lean strongly towards uniform.

@craffel
Copy link
Collaborator

craffel commented Feb 15, 2021

Ok, in that case, if the code is broken for non-uniform timescales, we should throw an exception when a non-uniform timescale is passed in.

@cwitkowitz
Copy link
Author

Here is some information on my use-case. I am using GuitarSet, which has note and pitch annotations in JAMS. When evaluating MF0E performance, I would like to use the pitch annotations, as I believe they are more correct than converting the notes into discrete pitch activity during discrete times.

The pitch annotations are only specified for positive samples (i.e. times where a pitch is active). If I want to use that data with mir_eval, it needs to be converted into uniform sampling first. I thought this was a good place to put that conversion, however, I am open to other suggestions. Maybe the conversion to uniform is appropriate as its own function or somewhere in the JAMS repo.

@craffel
Copy link
Collaborator

craffel commented Feb 15, 2021

We could add a separate utility function that converts non-uniform time series to uniform.

@cwitkowitz
Copy link
Author

@rabitt / @craffel How does the following sound for a solution? Scrap what I have so far and replace it with a warning similar to that in the melody resampling function. This warns the user if the multipitch times are not uniform.

Then, I can create a utility function for making time-series uniform. In this function, I can estimate the true spacing by ignoring outliers and taking the mean of the differentials (something to that effect). Then I can re-project the original data onto the new collection of times, which would start at 0 and be spaced by the estimated spacing.

Still, this would not always handle sporadic data very well, e.g. if my positive samples are not spaced uniformly or have no consistent sampling in the first place. However, in the case of JAMS data, where pitch contours should be aligned with some sampling period, I believe it would work just fine.

It might also make sense to put the utility function in the JAMS repo, since JAMS pitch_contour seems to be the only case where this has ever been encountered.

@craffel
Copy link
Collaborator

craffel commented Feb 16, 2021

This seems like a reasonable solution to me at a high level.

@cwitkowitz
Copy link
Author

This seems like a reasonable solution to me at a high level.

Great. I can get started on this then.

@rabitt
Copy link
Contributor

rabitt commented Feb 18, 2021

@cwitkowitz If you haven't gotten too far - I have a old script that does exactly this (filling in missing time stamps) if you want to use it as a starting point.

@cwitkowitz
Copy link
Author

@rabitt - The script you linked would certainly work, however it requires that the sampling rate and hop length are known. In my case these values are not explicitly defined.

I went a little down the path of implementing what I mentioned above, i.e. estimating the hop length by taking the mean of first-order differentials wherever the second-order differential was close to zero. I wasn't sure what to do at that point. I can step from times[0] to times[-1] using this estimated hop length, but it doesn't necessarily line up perfectly, and it seems we would need 1d interpolation again. There is also no knowledge of the duration, so we can fill in empties from t=0 until times[0], but we cannot fill in empties from times[-1] to t=duration.

Any thoughts?

@rabitt
Copy link
Contributor

rabitt commented Feb 22, 2021

@cwitkowitz

I went a little down the path of implementing what I mentioned above, i.e. estimating the hop length by taking the mean of first-order differentials wherever the second-order differential was close to zero. I wasn't sure what to do at that point. I can step from times[0] to times[-1] using this estimated hop length, but it doesn't necessarily line up perfectly, and it seems we would need 1d interpolation again.
There is also no knowledge of the duration, so we can fill in empties from t=0 until times[0], but we cannot fill in empties from times[-1] to t=duration.

Maybe it's safer to have hop_length and duration as arguments to the script - presumably that information should be part of the datasets the data comes from, and then there's no room for error. And maybe you could create a separate helper function called estimate_hop_length that returns a number that the user can use/not?

@cwitkowitz
Copy link
Author

@rabitt - how does this look? I can add test cases once we are happy with the changes.

Copy link
Contributor

@rabitt rabitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! One small comment on the hop size estimation

mir_eval/util.py Outdated
if not np.sum(non_gaps):
raise ValueError("Time observations are too irregular.")

return np.diff(times)[non_gaps].mean()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think .median() would be more stable, since in theory the spacing should be uniform. Then one or two outliers won't skew the hop size, which will over time propagate the error

@cwitkowitz
Copy link
Author

cwitkowitz commented Mar 2, 2021

Good suggestion @rabitt

I made that change and added a few other small changes necessary to get the test cases working.

Here a a few final considerations before completing the PR:

  1. NumPy diff() function was not always giving me exact values, which is why I needed the A_TOL comparison rather than strict equality.
  2. The time_series_to_uniform() function works only for observations specified by a list of ndarrays. This is fine for my purposes, but I am not sure if we would like the function to be more general for mir_eval as a whole.
  3. In my test cases, I needed to compare frequencies, so I used repeat code from test_multipitch.py. I did not see any current examples where code within tests/ was being reused, so I wasn't sure if we wanted to start doing that or just ignore it for now.

…on was provided, but realized that in this case a hop length cannot be estimated and thus it does not make sense to support this case in time_series_to_uniform().
@cwitkowitz
Copy link
Author

@rabitt - I recently came back to this and ran into an error where sometimes the final (original) observation, while still occurring within the specified duration, extends in time beyond what we represent in the new times without including an extra frame that goes over the specified duration. I am still a little bit unsure about the best way to handle this issue. In order to reproduce the error, time_series_to_uniform() can simply be ran with the JAMS data from source 3 of the track 01_BN1-129-Eb_solo in GuitarSet.

My confusion led me to check the mirdata package, to see what was being done there for GuitarSet (see my use-case in the above comment). I then realized that nearly the exact same functionality introduced with this PR in time_series_to_uniform() is supported in _fill_pitch_contour() of the GuitarSet wrapper in the mirdata package.

I also noticed that in the load_pitch_contour() function of the wrapper, the duration provided in the JAMS data is not being passed into _fill_pitch_contour(), which means that potentially 1-2 frames of original observations could be discarded due to the check if time > max_time or t_idx >= n_stamps: in _fill_pitch_contour(). If I am grasping this correctly, one frame may get discarded because of the issue I described above, and another frame may get discarded because the new times won't extend past the final time in the original observations (as opposed to the true duration of the file, which may be larger than the maximum time).

Coming back to this with a fresh look also made me begin to question if we really have the best solution here (in this PR and in mirdata). We are essentially just shifting the times of all of the observations without modifying the values in any way. Wouldn't it be more correct to do some sort of interpolation? I guess that is the point of the resample_multipitch() function in mir_eval, which we can't immediately use here due to the non-uniform time series in the JAMS data (the whole point of this PR).

Sorry for the really long and detailed comment. I needed to write all of this out to organize my own thoughts 😁.

Since some of this discussion involves mirdata, would it be better to bring any of this up in that repository? Also, would it make more sense to support this functionality within the JAMS package, so that it can be used in multiple different contexts?

@cwitkowitz
Copy link
Author

I think the best solution here is simply to add an extra frame in all cases, even if this means the nearest frame on the uniform time grid for the last observation may occur slightly after the specified duration.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants