Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Whiten refactor #441

Merged
merged 18 commits into from
Nov 2, 2024
Merged

Whiten refactor #441

merged 18 commits into from
Nov 2, 2024

Conversation

d-chambers
Copy link
Contributor

Description

This PR refactors Patch.whiten. The primary motivation was to enable performing whitening on a patch that is already in the Fourier domain (@ahmadtourei is fine-tunning a DASCore-based ambient noise processing pipeline and it helps not perform the dft/idft more than needed). Also @ariellellouch pointed out in #431 that the way we accessed some of the coordinate attributes (the step) could be improved.

I tried to simplify the function by breaking into a few sub-functions: one to check the inputs, one to get an envelope for the normalization, and one to get the envelope for limiting the analysis to certain frequencies. TBH, I am not sure if I really made it any simpler but it does now accept a patch in the Fourier domain without re-doing the transform. All the old tests still pass as well, so I think its ok.

A few changes:

  1. Following the behavior of Patch.correlate the idft argument is now deprecated. If a transformed patch is passed into whiten it does not undo the dft. Conversely, if a time domain patch is passed in the idft is always performed after whitening.

  2. Before the refactor, if no kwargs were passed the last dimension was whitened. Now if no kwargs are passed a dimension called "time" will be transformed. If it doesn't exist an error is raised.

  3. In the previous version the smoothing parameter allowed for smoothing windows to be between 1 and 5 frequency steps. If a smoothing size of < 5 samples was selected the number of samples in the fft could be increased so the smoothing window still had 5 samples. In this version, that is no longer possible since the patch may already be in the Fourier domain, so if the smoothing window is < 5 samples and error is just raised.

  4. I tweaked the behavior utils.patch.history so that if exactly the same patch is returned from a patch function no history string is added.

I would really appreciate it if @ariellellouch could take a look, and sorry if my "improvements" made your original work more complicated 🤷🏻.

Checklist

I have (if applicable):

  • referenced the GitHub issue this PR closes.
  • documented the new feature with docstrings or appropriate doc page.
  • included a test. See testing guidelines.
  • your name has been added to the contributors page (docs/contributors.md).
  • added the "ready_for_review" tag once the PR is ready to be reviewed.

@d-chambers d-chambers added the ready_for_review PR is ready for review label Sep 28, 2024
Copy link

codecov bot commented Sep 28, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.84%. Comparing base (b2551d2) to head (6841e0b).
Report is 2 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #441   +/-   ##
=======================================
  Coverage   99.84%   99.84%           
=======================================
  Files         114      114           
  Lines        9210     9252   +42     
=======================================
+ Hits         9196     9238   +42     
  Misses         14       14           
Flag Coverage Δ
unittests 99.84% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ariellellouch
Copy link
Contributor

Hi @d-chambers , yes - this is am important speed up. Nice!

About the smoothing - I am not sure I understand the 5-sample limitation. We can convolve with a shorter window as long as its length is >=2, no?

Also, for testing, can you make an example of a plot for which one side shows the data whitened directly, and the other side shows the result of transforming the data to an FFT patch, whitening it, and then going back to the time domain?
I don't know if it will be mathematically equal (it should, but there are a lot of quirky details along the way for that to happen), but visual classification should work.

Ariel

Comment on lines 258 to 265
# Get an envelope which can be used to normalize the spectra so each freq
# has roughly the same amplitude (but phase remains unchanged).
envelope = _get_amplitude_envelope(fft_patch, fft_dim, smooth_size, freq_range)

# Apply the tukey window to the original envelope so only the freqs
# in the selected range are used.
if freq_range is not None:
envelope = _filter_array(envelope, fft_patch, fft_dim, freq_range, tukey_alpha)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please clarify why we calculate envelope first in line 260 and then change it in 265 if the selected range are used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @ahmadtourei, I updated the comment so it is hopefully explained better now.

msg = "Frequency range is too narrow"
raise ParameterError(msg)

# We need at least 5 smoothing points. Before, we could increase fft resolution
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you could share a reference on this, it would be greatly appreciated!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was how I interpreted the original implementation. Perhaps we don't need it anymore? See the following comment.

@d-chambers
Copy link
Contributor Author

d-chambers commented Oct 1, 2024

About the smoothing - I am not sure I understand the 5-sample limitation. We can convolve with a shorter window as long as its length is >=2, no?

Yes, that is right. I guess I was following these lines in the original code:

    # Virtually increase computation resolution so that the smoothing window
    # contains 5 discrete frequencies. However, if the input value is smaller
    # than the default resolution, raise a parameter error because the
    # computational cost would go up
    if smooth_size < default_df:
        msg = "Frequency smoothing size is smaller than default frequency resolution"
        raise ParameterError(msg)


    if math.floor(smooth_size / default_df) < 5:
        comp_nsamp = math.floor(nsamp * 5 / (smooth_size / default_df))
    else:
        comp_nsamp = nsamp

Which increases the fft resolution if < 5 samples are available. I figured you were doing that to avoid some kind of stability issue, but if not we should be good to drop down to a 2 sample window.

I agree on testing this a bit more.

To test this I implemented a new test which uses a patch composed of chirp from 5 to 25 Hz (see #442). First, I tested the behavior of the current implementation using the current master branch (c7e3178).

First, I setup some plotting code:

import matplotlib.pyplot as plt

import dascore as dc


def plot_time_and_frequency(patch):
    """ Make plots of time and frequency of patch with single channel."""
    sq_patch = patch.select(distance=0, samples=True).squeeze()
    time = patch.get_array("time")

    fig, (td_ax, fd_ax) = plt.subplots(1, 2, figsize=(10, 4))

    # Plot in time domain
    td_ax.plot(time, sq_patch.data)
    td_ax.set_title("time domain")
    td_ax.set_xlabel("time (s)")

    # Plot in freq domain
    ft_patch = sq_patch.dft("time", real=True)
    freq = ft_patch.get_array("ft_time")
    fd_ax.plot(freq, ft_patch.abs().data)
    fd_ax.set_xlabel("Frequency (Hz)")
    # fd_ax.set_xlim(0, 1000)

    return fig

Then plot the chirp patch:

import dascore as dc

patch = dc.get_example_patch("chirp", f0=5, f1=50, channel_count=2)

fig = plot_time_and_frequency(patch)

image

Plus the whitened patch:

white_patch = patch.whiten()
_ = plot_time_and_frequency(white_patch)

image

Which looks a bit odd to me, I would have expected the spectrum to simply smooth out rather than change shape so drastically.

Looking at the new implementation (in this PR) by switching to branch whiten_refactor and re-running the above code we get:

image

So I don't think I have changed the behavior. Testing if the fft version looks the same:

fft_patch = patch.dft("time", real=True)
fft_white = fft_patch.whiten()
idft_white = fft_white.idft()
_ = plot_time_and_frequency(idft_white)

image

It clearly doesn't! So you are right, there is something going on that we need to dive more into. However, I suspect that the original function and time domain version on this branch may not be correct to begin with. What do you think?

edit: fix chirp range

@d-chambers
Copy link
Contributor Author

@ahmadtourei, if you have time feel free to pick this up. I have some other research I need to catch up on, but can probably circle back early next week.

There are a few other whiten implementations we might check out (putting this here for future reference)

seismo live Ambient Seismic Noise Analysis whiten

NoisePy's whiten2D

MSNoise's whiten

@ariellellouch
Copy link
Contributor

ariellellouch commented Oct 2, 2024

Wow, @d-chambers @ahmadtourei looks like I am better at giving advice than coding properly. Judging by Derrick's examples, it looks like my original implementation was messed up.
I will dive into this and see where I was wrong

But also, the chirp seems weird - you generate it between 5 and 25, yet the spectrum shows 5 to 50. Unrelated to the whitening, but still probably good to fix

@d-chambers
Copy link
Contributor Author

@ariellellouch, sounds good if you have the time. In light of recent events, I didn't want to dump anything on your plate in case you have more pressing things to deal with, but if you can work on this it would be much appreciated!

I was also wondering if we need the convolution-based smoothing of the amplitude spectra or if we could just use a savgol filter? I don't know if the convolution is the issue but swapping it out might simplify the code.

But also, the chirp seems weird - you generate it between 5 and 25, yet the spectrum shows 5 to 50. Unrelated to the whitening, but still probably good to fix

Ah good point. I double-checked and the chirp is working, I just copied the wrong cell from my notebook into the github comment. I just updated the comment.

@ariellellouch
Copy link
Contributor

ariellellouch commented Oct 3, 2024

Hey guys @d-chambers @ahmadtourei

I checked; It is not really a bug, but a problem with the definition of the smoothing. When you use the default, smoothing is over the whole spectrum, so a pretty big window. The problem is the convolution in that case starts to "wrap around" (or not), but in case we are not dividing by something very smooth. There are 2 ideas I have:

in whiten.py, change the convolved1d mode from 'constant' to reflect (looks much better)
change the default smoothing size to lets say 10% of the Nyquist (instead of the Full Nyquist)

About the 5-samples smoothing - this is for stability. We can change it to 2, worst case there won't be much effect.

Still not sure about the FFT implementation difference though; Could also be an edge case. We can test the difference on a 5 Hz smoothing window maybe?

@d-chambers
Copy link
Contributor Author

Hey guys,

Quick update. I am struggling to finish up a draft of my dissertation which I need to submit to my committee in about 2 weeks. I probably wont have time to work on this until I get that submitted. If either of you want take over this PR feel free to just push to this branch. Otherwise, I will circle back when I can.

@ahmadtourei
Copy link
Collaborator

Thanks for checking @ariellellouch and for the update @d-chambers. I'd be also busy with my proposal defense till early November but I'll update you guys here if I find time to work on it.

@ariellellouch
Copy link
Contributor

Hi @d-chambers , can you summarize the changes in those versions?
Thanks!

@d-chambers
Copy link
Contributor Author

d-chambers commented Oct 31, 2024

Ok, I found some time to work on this.

After thinking about it and looking at some other implementations I made the following changes:

  • I created a patch.taper_range method which makes it easier to apply the frequency windowing. Now the taper windows are applied using kwargs to specify a 2 or 4 length tuple. This makes the tukey_alpha argument unnecessary.

  • I changed the default behavior when smooth_size is None. I decided some users might want a strong whiten (like in the seismo live implementation) which essentially removes the amplitude information. This can also be applied in a band-limited way.

  • I added an optional water_level argument for stabilization of low power signals.

  • I replaced the convolution smoothing with a uniform_filter which some report to be 50x faster.

This does break backwards compatibility so I will add a very clear error message. Hopefully it wont cause too much inconvenience.

Here is the tutorial page I am going to add on Patch.whiten

Whiten

The Patch.whiten function performs spectral whitening by balancing the amplitude spectra of the patch while leaving the phase (largely) unchanged. Spectral whitening is often a pre-processing step in ambient noise correlation workflows.

To demonstrate, we first create some boiler plate plotting code and create example patches.

import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng()

import dascore as dc
from dascore.utils.time import to_float


def plot_time_and_frequency(patch, channel=0):
    """ Make plots of time and frequency of patch with single channel."""
    sq_patch = patch.select(distance=channel, samples=True).squeeze()
    time_array = to_float(patch.get_array("time"))
    time = time_array - np.min(time_array)

    fig, (td_ax, fd_ax, phase_ax) = plt.subplots(1, 3, figsize=(15, 4))

    # Plot in time domain
    td_ax.plot(time, sq_patch.data, color="tab:blue")
    td_ax.set_title("Time Domain")
    td_ax.set_xlabel("time (s)")

    # Plot freq amplitdue
    ft_patch = sq_patch.dft("time", real=True)
    freq = ft_patch.get_array("ft_time")
    fd_ax.plot(freq, ft_patch.abs().data, color="tab:red")
    fd_ax.set_xlabel("Frequency (Hz)")
    fd_ax.set_title("Amplitude Spectra")

    # plot freq phase
    phase_ax.plot(freq, np.angle(ft_patch.data), color="tab:cyan")
    phase_ax.set_xlabel("Frequency (Hz)")
    phase_ax.set_title("Phase Angle")
    
    # fd_ax.set_xlim(0, 1000)
    return fig


def make_noisy_sine_patch():
    """Make a noisy sine wave patch.""" 
    patch = dc.get_example_patch(
        "sin_wav", 
        frequency=[1, 10, 20, 54, 66], 
        amplitude=[1, 2, 3, 4, 9], 
        duration=1, 
        sample_rate=200,
    )
    rand_noise = (rng.random(patch.data.shape) - 0.5) * 10
    patch = patch.new(data = patch.data + rand_noise)
    return patch


def make_chirp_patch():
    # Get a patch with a chirp from 5 to 50 Hz.
    return dc.get_example_patch("chirp", f0=5, f1=50, channel_count=2)
patch = make_noisy_sine_patch()
plot_time_and_frequency(patch);

image

The default whitening makes all spectral amplitudes equal.

white_patch = patch.whiten()
plot_time_and_frequency(white_patch);

image

The whitening can be restricted to certain frequency ranges by specifying the dimension and a frequency range.

white_patch = patch.whiten(time=(20, 80))
plot_time_and_frequency(white_patch);

image

Four values can be used to control the taper.

white_patch = patch.whiten(time=(20, 40, 60, 80))
plot_time_and_frequency(white_patch);

image

Whiten also supports a smoothed amplitude for normalization which reduces the effect depending on the smooth_size parameter.

white_patch = patch.whiten(smooth_size=1)
_ = plot_time_and_frequency(white_patch)

image

white_patch = patch.whiten(smooth_size=2, time=(10, 20, 80, 90))
_ = plot_time_and_frequency(white_patch)

image

Whiten can accept a patch in the frequency domain, in which case a frequency domain patch is returned. This might be useful if whiten is only a part of a frequency domain workflow.

fft_patch = patch.dft("time", real=True)
dft_white = fft_patch.whiten(smooth_size=1, time=(20, 40, 60, 80))
td_patch = dft_white.idft()
plot_time_and_frequency(td_patch);

image

The water_level parameter is useful for stabilizing frequencies that may have low values.

@d-chambers
Copy link
Contributor Author

Let me know what you think of the changes and if they will be suitable for your workflows. When you guys are happy with the design/outputs I will add the docs.

@ariellellouch
Copy link
Contributor

Hi @d-chambers ,
The behavior looks great to me, and it is what I would test. Nice!!!

@ahmadtourei
Copy link
Collaborator

Let me know what you think of the changes and if they will be suitable for your workflows. When you guys are happy with the design/outputs I will add the docs.

Thanks for your work and including these examples. This looks great to me! I'll be applying this function on my Alaska dataset and will re-open this PR and add more comments if I notice any unexpected behavior.

@d-chambers
Copy link
Contributor Author

Ok, thanks guys. I will merge this now. If we find issues we can always open another PR/issue.

@d-chambers d-chambers merged commit fa38582 into master Nov 2, 2024
15 checks passed
@d-chambers d-chambers deleted the whiten_refactor branch November 2, 2024 00:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready_for_review PR is ready for review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants