-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Whiten refactor #441
Whiten refactor #441
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #441 +/- ##
=======================================
Coverage 99.84% 99.84%
=======================================
Files 114 114
Lines 9210 9252 +42
=======================================
+ Hits 9196 9238 +42
Misses 14 14
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Hi @d-chambers , yes - this is am important speed up. Nice! About the smoothing - I am not sure I understand the 5-sample limitation. We can convolve with a shorter window as long as its length is >=2, no? Also, for testing, can you make an example of a plot for which one side shows the data whitened directly, and the other side shows the result of transforming the data to an FFT patch, whitening it, and then going back to the time domain? Ariel |
fac4975
to
e639d79
Compare
dascore/proc/whiten.py
Outdated
# Get an envelope which can be used to normalize the spectra so each freq | ||
# has roughly the same amplitude (but phase remains unchanged). | ||
envelope = _get_amplitude_envelope(fft_patch, fft_dim, smooth_size, freq_range) | ||
|
||
# Apply the tukey window to the original envelope so only the freqs | ||
# in the selected range are used. | ||
if freq_range is not None: | ||
envelope = _filter_array(envelope, fft_patch, fft_dim, freq_range, tukey_alpha) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please clarify why we calculate envelope
first in line 260 and then change it in 265 if the selected range are used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @ahmadtourei, I updated the comment so it is hopefully explained better now.
dascore/proc/whiten.py
Outdated
msg = "Frequency range is too narrow" | ||
raise ParameterError(msg) | ||
|
||
# We need at least 5 smoothing points. Before, we could increase fft resolution |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you could share a reference on this, it would be greatly appreciated!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was how I interpreted the original implementation. Perhaps we don't need it anymore? See the following comment.
Yes, that is right. I guess I was following these lines in the original code: # Virtually increase computation resolution so that the smoothing window
# contains 5 discrete frequencies. However, if the input value is smaller
# than the default resolution, raise a parameter error because the
# computational cost would go up
if smooth_size < default_df:
msg = "Frequency smoothing size is smaller than default frequency resolution"
raise ParameterError(msg)
if math.floor(smooth_size / default_df) < 5:
comp_nsamp = math.floor(nsamp * 5 / (smooth_size / default_df))
else:
comp_nsamp = nsamp Which increases the fft resolution if < 5 samples are available. I figured you were doing that to avoid some kind of stability issue, but if not we should be good to drop down to a 2 sample window. I agree on testing this a bit more. To test this I implemented a new test which uses a patch composed of chirp from 5 to 25 Hz (see #442). First, I tested the behavior of the current implementation using the current master branch (c7e3178). First, I setup some plotting code: import matplotlib.pyplot as plt
import dascore as dc
def plot_time_and_frequency(patch):
""" Make plots of time and frequency of patch with single channel."""
sq_patch = patch.select(distance=0, samples=True).squeeze()
time = patch.get_array("time")
fig, (td_ax, fd_ax) = plt.subplots(1, 2, figsize=(10, 4))
# Plot in time domain
td_ax.plot(time, sq_patch.data)
td_ax.set_title("time domain")
td_ax.set_xlabel("time (s)")
# Plot in freq domain
ft_patch = sq_patch.dft("time", real=True)
freq = ft_patch.get_array("ft_time")
fd_ax.plot(freq, ft_patch.abs().data)
fd_ax.set_xlabel("Frequency (Hz)")
# fd_ax.set_xlim(0, 1000)
return fig Then plot the chirp patch: import dascore as dc
patch = dc.get_example_patch("chirp", f0=5, f1=50, channel_count=2)
fig = plot_time_and_frequency(patch) Plus the whitened patch: white_patch = patch.whiten()
_ = plot_time_and_frequency(white_patch) Which looks a bit odd to me, I would have expected the spectrum to simply smooth out rather than change shape so drastically. Looking at the new implementation (in this PR) by switching to branch So I don't think I have changed the behavior. Testing if the fft version looks the same: fft_patch = patch.dft("time", real=True)
fft_white = fft_patch.whiten()
idft_white = fft_white.idft()
_ = plot_time_and_frequency(idft_white) It clearly doesn't! So you are right, there is something going on that we need to dive more into. However, I suspect that the original function and time domain version on this branch may not be correct to begin with. What do you think? edit: fix chirp range |
@ahmadtourei, if you have time feel free to pick this up. I have some other research I need to catch up on, but can probably circle back early next week. There are a few other whiten implementations we might check out (putting this here for future reference) |
Wow, @d-chambers @ahmadtourei looks like I am better at giving advice than coding properly. Judging by Derrick's examples, it looks like my original implementation was messed up. But also, the chirp seems weird - you generate it between 5 and 25, yet the spectrum shows 5 to 50. Unrelated to the whitening, but still probably good to fix |
@ariellellouch, sounds good if you have the time. In light of recent events, I didn't want to dump anything on your plate in case you have more pressing things to deal with, but if you can work on this it would be much appreciated! I was also wondering if we need the convolution-based smoothing of the amplitude spectra or if we could just use a savgol filter? I don't know if the convolution is the issue but swapping it out might simplify the code.
Ah good point. I double-checked and the chirp is working, I just copied the wrong cell from my notebook into the github comment. I just updated the comment. |
Hey guys @d-chambers @ahmadtourei I checked; It is not really a bug, but a problem with the definition of the smoothing. When you use the default, smoothing is over the whole spectrum, so a pretty big window. The problem is the convolution in that case starts to "wrap around" (or not), but in case we are not dividing by something very smooth. There are 2 ideas I have: in whiten.py, change the convolved1d mode from 'constant' to reflect (looks much better) About the 5-samples smoothing - this is for stability. We can change it to 2, worst case there won't be much effect. Still not sure about the FFT implementation difference though; Could also be an edge case. We can test the difference on a 5 Hz smoothing window maybe? |
Hey guys, Quick update. I am struggling to finish up a draft of my dissertation which I need to submit to my committee in about 2 weeks. I probably wont have time to work on this until I get that submitted. If either of you want take over this PR feel free to just push to this branch. Otherwise, I will circle back when I can. |
Thanks for checking @ariellellouch and for the update @d-chambers. I'd be also busy with my proposal defense till early November but I'll update you guys here if I find time to work on it. |
Hi @d-chambers , can you summarize the changes in those versions? |
Ok, I found some time to work on this. After thinking about it and looking at some other implementations I made the following changes:
This does break backwards compatibility so I will add a very clear error message. Hopefully it wont cause too much inconvenience. Here is the tutorial page I am going to add on WhitenThe To demonstrate, we first create some boiler plate plotting code and create example patches. import matplotlib.pyplot as plt
import numpy as np
rng = np.random.default_rng()
import dascore as dc
from dascore.utils.time import to_float
def plot_time_and_frequency(patch, channel=0):
""" Make plots of time and frequency of patch with single channel."""
sq_patch = patch.select(distance=channel, samples=True).squeeze()
time_array = to_float(patch.get_array("time"))
time = time_array - np.min(time_array)
fig, (td_ax, fd_ax, phase_ax) = plt.subplots(1, 3, figsize=(15, 4))
# Plot in time domain
td_ax.plot(time, sq_patch.data, color="tab:blue")
td_ax.set_title("Time Domain")
td_ax.set_xlabel("time (s)")
# Plot freq amplitdue
ft_patch = sq_patch.dft("time", real=True)
freq = ft_patch.get_array("ft_time")
fd_ax.plot(freq, ft_patch.abs().data, color="tab:red")
fd_ax.set_xlabel("Frequency (Hz)")
fd_ax.set_title("Amplitude Spectra")
# plot freq phase
phase_ax.plot(freq, np.angle(ft_patch.data), color="tab:cyan")
phase_ax.set_xlabel("Frequency (Hz)")
phase_ax.set_title("Phase Angle")
# fd_ax.set_xlim(0, 1000)
return fig
def make_noisy_sine_patch():
"""Make a noisy sine wave patch."""
patch = dc.get_example_patch(
"sin_wav",
frequency=[1, 10, 20, 54, 66],
amplitude=[1, 2, 3, 4, 9],
duration=1,
sample_rate=200,
)
rand_noise = (rng.random(patch.data.shape) - 0.5) * 10
patch = patch.new(data = patch.data + rand_noise)
return patch
def make_chirp_patch():
# Get a patch with a chirp from 5 to 50 Hz.
return dc.get_example_patch("chirp", f0=5, f1=50, channel_count=2) patch = make_noisy_sine_patch()
plot_time_and_frequency(patch); The default whitening makes all spectral amplitudes equal. white_patch = patch.whiten()
plot_time_and_frequency(white_patch); The whitening can be restricted to certain frequency ranges by specifying the dimension and a frequency range. white_patch = patch.whiten(time=(20, 80))
plot_time_and_frequency(white_patch); Four values can be used to control the taper. white_patch = patch.whiten(time=(20, 40, 60, 80))
plot_time_and_frequency(white_patch); Whiten also supports a smoothed amplitude for normalization which reduces the effect depending on the smooth_size parameter. white_patch = patch.whiten(smooth_size=1)
_ = plot_time_and_frequency(white_patch) white_patch = patch.whiten(smooth_size=2, time=(10, 20, 80, 90))
_ = plot_time_and_frequency(white_patch) Whiten can accept a patch in the frequency domain, in which case a frequency domain patch is returned. This might be useful if whiten is only a part of a frequency domain workflow. fft_patch = patch.dft("time", real=True)
dft_white = fft_patch.whiten(smooth_size=1, time=(20, 40, 60, 80))
td_patch = dft_white.idft()
plot_time_and_frequency(td_patch); The |
Let me know what you think of the changes and if they will be suitable for your workflows. When you guys are happy with the design/outputs I will add the docs. |
Hi @d-chambers , |
Thanks for your work and including these examples. This looks great to me! I'll be applying this function on my Alaska dataset and will re-open this PR and add more comments if I notice any unexpected behavior. |
Ok, thanks guys. I will merge this now. If we find issues we can always open another PR/issue. |
Description
This PR refactors
Patch.whiten
. The primary motivation was to enable performing whitening on a patch that is already in the Fourier domain (@ahmadtourei is fine-tunning a DASCore-based ambient noise processing pipeline and it helps not perform the dft/idft more than needed). Also @ariellellouch pointed out in #431 that the way we accessed some of the coordinate attributes (the step) could be improved.I tried to simplify the function by breaking into a few sub-functions: one to check the inputs, one to get an envelope for the normalization, and one to get the envelope for limiting the analysis to certain frequencies. TBH, I am not sure if I really made it any simpler but it does now accept a patch in the Fourier domain without re-doing the transform. All the old tests still pass as well, so I think its ok.
A few changes:
Following the behavior of
Patch.correlate
the idft argument is now deprecated. If a transformed patch is passed intowhiten
it does not undo the dft. Conversely, if a time domain patch is passed in the idft is always performed after whitening.Before the refactor, if no kwargs were passed the last dimension was whitened. Now if no kwargs are passed a dimension called "time" will be transformed. If it doesn't exist an error is raised.
In the previous version the smoothing parameter allowed for smoothing windows to be between 1 and 5 frequency steps. If a smoothing size of < 5 samples was selected the number of samples in the fft could be increased so the smoothing window still had 5 samples. In this version, that is no longer possible since the patch may already be in the Fourier domain, so if the smoothing window is < 5 samples and error is just raised.
I tweaked the behavior utils.patch.history so that if exactly the same patch is returned from a patch function no history string is added.
I would really appreciate it if @ariellellouch could take a look, and sorry if my "improvements" made your original work more complicated 🤷🏻.
Checklist
I have (if applicable):