-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pitch Detection Algorithm Is Missing a Step from the McLeod Paper #23
Comments
Thank you for taking the time to look into this and do a few tests! I don't have any objection to this, just give me a few days to digest the proposed changes to make sure they don't conflict with any other of the detectors implemented in the library. Also, feel free to open a pull request, if you'd like to retain authorship of this fix. It's cool to see this little library being used in the real world :) |
Hey, no problem. I'm not familiar enough with the other algorithms to know if they would have a similar parameter, but they might if they also need to pick from multiple peaks corresponding to different octaves. After making this change, it's always possible for a user to just set the pick_threshold to zero if they want to retain the original functionality of picking the first peak above the clarity threshold. I won't bother with the pull request. As I mentioned, I barely know Rust, so I'd rather let someone more capable with Rust take care of implementing this properly. I'm just happy to be able to use this library in my side projects. |
And by the way, in case you are interested, this is my project that is currently running a wasm-compiled version of your library with the changes I mentioned above. It is now much less common to see random jumps to different octaves. |
Cool game! |
I do not have it in a github repository, but you can view it all by inspecting my site: https://imacslive.com/filedata/imacs-live/sing/ look at these files:
The code to initialize the wasm:
Then to use the detector, just pass it an array of samples from your audio stream:
Let me know if something is unclear |
Sorry, I'm totally new to this. I was expecting some rust files... |
Oh if you are trying to use it in a web application where you will be compiling to wasm anyway, then I suggest just using the wasm I already have compiled from the Rust. If you aren't using in a web app, and you need the Rust source code, I think your best bet is to just copy the code from this Rust library and make the changes to the code that I mentioned in my original comment at the top of this thread. Depending on how exactly you use it, you might have to adjust some of the objects to accept the new inputs. I barely know Rust, so I just hacked it together for now. Hopefully alesgenova is willing to update his library with the change done properly. |
Oops, sorry, I totally missed those! |
see: alesgenova#23 (comment) Thanks to @music-game for this code.
@music-game Thanks again for the improvement! |
Nice! Did you notice an improvement with the change? |
I'm still testing. :) |
Yes! It's massively more likely to pick the right octave. |
Awesome! Glad to hear the change works for you too. |
Have you seen https://github.com/sevagh/pitch-detection/tree/master/misc/mcleod ? |
I have not seen that. I did find that repo when I was originally researching pitch detection, but I thought it was just C++ code. I didn't see the misc/mcleod folder. Thanks for pointing it out. |
Let me start by saying this is a great library and works quite well. However, after reviewing the code and working with it in a real application, I believe you are missing this part of the algorithm defined in the McLeod Paper:
From the key maxima we define a threshold which is equal to the value of the highest maximum, nmax, multiplied by a constant k. We then take the first key maximum which is above this threshold and assign its delay, τ , as the pitch period. The constant, k, has to be large enough to avoid choosing peaks caused by strong harmonics, such as those in Figure 2, but low enough to not choose the unwanted beat or sub-harmonic signals. Choosing an incorrect key maximum causes a pitch error, usually a ’wrong octave’.
Your "clarity threshold" somewhat implements this, however you don't do any scaling by the maximum peak. In the two example NSDF plots below, picking the red line (~0.7) as the threshold works, but what if you have a bit more noise in the signal? Perhaps the clarity of all the peaks reduce, by ~0.2 for example. In that scenario, you might incorrectly pick the second peak in the first image, and perhaps pick no peaks in the second image, if you kept your threshold as 0.7.
If instead, you scale the selection threshold by the maximum peak, you would still be able to select the correct peak in either case, because the threshold would move up and down with the clarity of the overall signal. You should still have a minimum clarity threshold below which you don't want to return any result. But when it comes to picking between the various maxima above the clarity threshold, you can get much better octave-selection by applying this third input.
I was able to modify your code to accept a third input, which I've called "pick_threshold" that implements the constant from the McLeod paper. With my change, you now will select the first peak that is above both the "clarity threshold" and "maximum_peak * pick_threshold". In my testing, this change led to much fewer octave-errors. I used these settings: power_thresh = 3, clarity_thresh = 0.7, and pick_thresh = 0.98. The pick threshold can be adjusted a bit lower if you are sometimes picking up subharmonics, but I found 0.98 to be a sweet spot to avoid picking up higher order harmonics, which was happening more frequently with the original code.
The only main changes to the code I made were in the pitch_from_peaks function and the choose_peak function. I barely know Rust (I'm compiling this to wasm for a web application), so I'm sure there is a more efficient way to extract the maximum peak than what I did here (duplicating the peaks iterator), but this code did compile and worked a lot better in terms of picking the correct octave.
`pub fn pitch_from_peaks(
input: &[T],
sample_rate: usize,
clarity_threshold: T,
pick_threshold: T,
correction: PeakCorrection,
) -> Option<Pitch>
where
T: Float,
{
let sample_rate = T::from_usize(sample_rate).unwrap();
let peaks = detect_peaks(input);
let peaks2 = detect_peaks(input);
let maxpeak = peaks2.max_by(|x, y| x.1.partial_cmp(&y.1).unwrap_or_else(|| Ordering::Equal));
let thresh2 = match maxpeak {
None => T::from(0).unwrap(),
Some(p) => p.1 * pick_threshold,
};
choose_peak(peaks, thresh2, clarity_threshold)
.map(|peak| correct_peak(peak, input, correction))
.map(|peak| Pitch {
frequency: sample_rate / peak.0,
clarity: peak.1 / input[0],
})
}
pub fn choose_peak<I: Iterator<Item = (usize, T)>, T: Float>(
mut peaks: I,
threshold: T,
threshold2: T,
) -> Option<(usize, T)> {
peaks.find(|p| p.1 > threshold && p.1 > threshold2)
}`
The text was updated successfully, but these errors were encountered: