WIP: BSSEval v4 #272
Conversation
Thanks for all your hard work. Provided the regression tests still pass, I am open to giving this some sort of fast-track, but I personally won't be able to do any kind of review until after Feb 9. It looks like this is still "WIP", though: I see commented-out code, early return statements, print statements, etc. in the tests. In principle I would be most happy for there to be a single reference implementation of bss_eval in Python, instead of two separate versions (one in mir_eval, one outside).
I'm not sure what you mean by this or why it's helpful.
I agree but for most other implementations in mir_eval there were existing reference implementations outside of mir_eval. So this would be something shiny and new here ;-)
We would like to use bsseval v4 in the SiSEC evaluation as part of the Python musdb parser. We are running out of time, so we basically have to use the WIP version for the evaluation campaign. I would therefore either copy the separation code into the parser or make a separate bsseval package. Having participants install our fork directly from GitHub does not seem like a great option. Do you think there are better options?
I support you doing whatever is most convenient, but I do think it would be awkward if we ended up with two PyPI packages: bss_eval_v4, and mir_eval which contains bss_eval_v4. pip can install directly from a GitHub repo/fork; maybe that's a decent option?
I personally don't mind installing mir_eval just to run separation metrics, as the full package is lightweight in terms of size and dependencies, and it's nice to not fragment the community too much.
@craffel okay, @aliutkus and I decided for now to copy bsseval v4 into our musdb18 evaluation package to reduce confusion for users (installing a mir_eval fork from GitHub is still a bit fragile for users without virtual environments). Before starting the actual code review, it would be great if we could agree on the API changes and function naming, so that we can easily switch to the new version of mir_eval in our evaluation tool as soon as this PR is merged and a new release is issued:
Seems reasonable to me. As I said, as long as things are backwards compatible and there is community consensus that this way is "correct", I am OK.
Yes. In some cases I would be interested to hear what the reaction of the larger SiSEC community is to this.
Okay, I would then suggest waiting for the LVA/ICA conference in the summer. We will then have good feedback from the community. Also, we are in contact with Emmanuel Vincent, so there is a chance that bss_eval v4 will 'officially' replace the older version and be added to the bsseval website. Until then, v4 will live in https://github.com/sigsep/sigsep-mus-eval/blob/master/museval/metrics.py. I would suggest closing the PR for now; I will reopen it when we are ready.
SG!
Okay, here is some news regarding the source separation evaluation model:
@aliutkus and I made a spontaneous rewrite of the bsseval method to fix some shortcomings in the principles of bsseval (see #270 and #271).
The BSSEval metrics, as implemented in the MATLAB toolboxes and in their re-implementation here in mir_eval, are widely used in the audio separation literature. One particularity of BSSEval is that it computes the metrics after optimally matching the estimates to the true sources through linear distortion filters. This arguably makes the criteria robust to some linear mismatches. Apart from the optional computation of all possible permutations of the sources, this matching accounts for most of the computational cost of BSSEval, especially since it is done for each evaluation window when the metrics are computed on a framewise basis.
In this year's SiSEC, we decided to drop the assumption that the distortion filters can vary over time, and instead consider them fixed over the whole length of the track. First, this significantly reduces the computational cost of evaluation, because matching needs to be done only once for the whole signal. Second, this introduces much more dynamics into the evaluation, because time-varying matching filters turn out to over-estimate performance. Third, this makes matching more robust, because the true sources are not silent throughout the whole recording, while they often are within short windows.
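To make the matching step concrete, here is a minimal, single-source sketch of least-squares distortion-filter estimation done once for the whole signal, as in v4. This is not the actual mir_eval/museval code: the function name, filter length, and test signal are my own choices, and the real implementation projects onto all sources jointly.

```python
import numpy as np

def ls_distortion_filter(ref, est, flen=10):
    """Fit an FIR filter h (length flen) mapping ref -> est by least squares.
    Simplified single-source sketch of the BSSEval projection step."""
    n = len(ref)
    # Matrix of delayed copies of the reference (one column per filter tap)
    A = np.zeros((n, flen))
    for d in range(flen):
        A[d:, d] = ref[:n - d]
    h, *_ = np.linalg.lstsq(A, est, rcond=None)
    return h, A @ h  # filter taps and the projected (matched) target signal

rng = np.random.default_rng(0)
src = rng.standard_normal(44100)
# Estimate = filtered reference plus a little noise
est = np.convolve(src, [0.9, 0.05], mode="full")[:44100]
est += 0.01 * rng.standard_normal(44100)

# v4-style: estimate one distortion filter for the entire track
h, s_target = ls_distortion_filter(src, est)
e_res = est - s_target
sdr = 10 * np.log10(np.sum(s_target**2) / np.sum(e_res**2))
```

With framewise filters (v3-style), the `lstsq` solve above would be repeated for every evaluation window, which is where most of the runtime goes.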
Technically, we are offering a new evaluation function `bss_eval()` which covers all previous functionality (images/sources, framewise/global) and additionally allows the new `v4` mode. We maintained compatibility with the old functions by wrapping `bss_eval()`. The benefit is mainly that `v4` is significantly faster: depending on the frame size, it can be up to 50%.
Long story short: we touched most of the code but are 100% backwards compatible (regression tests are passing), and now we are considering options for getting this code out as soon as possible. The reason is that we want to use it in the evaluation kit that we give out to the SiSEC participants as soon as possible (as early as next week).
Currently, we don't know if it would be a good idea to move the new bsseval version directly into mir_eval. Here are my concerns:
So maybe the best option is to have bsseval v4 in a separate Python package first and then merge it into mir_eval later this year. @craffel, are you okay with releasing a stripped-down version of mir_eval (all modules removed except separation) under a different package name (probably `bsseval`) on PyPI?