Whiten refactor #441

d-chambers · 2024-09-28T22:58:49Z

Description

This PR refactors Patch.whiten. The primary motivation was to enable performing whitening on a patch that is already in the Fourier domain (@ahmadtourei is fine-tunning a DASCore-based ambient noise processing pipeline and it helps not perform the dft/idft more than needed). Also @ariellellouch pointed out in #431 that the way we accessed some of the coordinate attributes (the step) could be improved.

I tried to simplify the function by breaking into a few sub-functions: one to check the inputs, one to get an envelope for the normalization, and one to get the envelope for limiting the analysis to certain frequencies. TBH, I am not sure if I really made it any simpler but it does now accept a patch in the Fourier domain without re-doing the transform. All the old tests still pass as well, so I think its ok.

A few changes:

Following the behavior of Patch.correlate the idft argument is now deprecated. If a transformed patch is passed into whiten it does not undo the dft. Conversely, if a time domain patch is passed in the idft is always performed after whitening.
Before the refactor, if no kwargs were passed the last dimension was whitened. Now if no kwargs are passed a dimension called "time" will be transformed. If it doesn't exist an error is raised.
In the previous version the smoothing parameter allowed for smoothing windows to be between 1 and 5 frequency steps. If a smoothing size of < 5 samples was selected the number of samples in the fft could be increased so the smoothing window still had 5 samples. In this version, that is no longer possible since the patch may already be in the Fourier domain, so if the smoothing window is < 5 samples and error is just raised.
I tweaked the behavior utils.patch.history so that if exactly the same patch is returned from a patch function no history string is added.

I would really appreciate it if @ariellellouch could take a look, and sorry if my "improvements" made your original work more complicated 🤷🏻.

Checklist

I have (if applicable):

referenced the GitHub issue this PR closes.
documented the new feature with docstrings or appropriate doc page.
included a test. See testing guidelines.
your name has been added to the contributors page (docs/contributors.md).
added the "ready_for_review" tag once the PR is ready to be reviewed.

codecov · 2024-09-28T23:01:07Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.84%. Comparing base (b2551d2) to head (6841e0b).
Report is 2 commits behind head on master.

Additional details and impacted files

@@           Coverage Diff           @@
##           master     #441   +/-   ##
=======================================
  Coverage   99.84%   99.84%           
=======================================
  Files         114      114           
  Lines        9210     9252   +42     
=======================================
+ Hits         9196     9238   +42     
  Misses         14       14

Flag	Coverage Δ
unittests	`99.84% <100.00%> (+<0.01%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

ariellellouch · 2024-09-29T07:52:58Z

Hi @d-chambers , yes - this is am important speed up. Nice!

About the smoothing - I am not sure I understand the 5-sample limitation. We can convolve with a shorter window as long as its length is >=2, no?

Also, for testing, can you make an example of a plot for which one side shows the data whitened directly, and the other side shows the result of transforming the data to an FFT patch, whitening it, and then going back to the time domain?
I don't know if it will be mathematically equal (it should, but there are a lot of quirky details along the way for that to happen), but visual classification should work.

Ariel

ahmadtourei · 2024-10-01T21:42:08Z

dascore/proc/whiten.py

+    # Get an envelope which can be used to normalize the spectra so each freq
+    # has roughly the same amplitude (but phase remains unchanged).
+    envelope = _get_amplitude_envelope(fft_patch, fft_dim, smooth_size, freq_range)
+
+    # Apply the tukey window to the original envelope so only the freqs
+    # in the selected range are used.
+    if freq_range is not None:
+        envelope = _filter_array(envelope, fft_patch, fft_dim, freq_range, tukey_alpha)


Could you please clarify why we calculate envelope first in line 260 and then change it in 265 if the selected range are used?

Hey @ahmadtourei, I updated the comment so it is hopefully explained better now.

ahmadtourei · 2024-10-01T21:45:10Z

dascore/proc/whiten.py

+        msg = "Frequency range is too narrow"
+        raise ParameterError(msg)
+
+    # We need at least 5 smoothing points. Before, we could increase fft resolution


If you could share a reference on this, it would be greatly appreciated!

This was how I interpreted the original implementation. Perhaps we don't need it anymore? See the following comment.

d-chambers · 2024-10-01T22:12:47Z

About the smoothing - I am not sure I understand the 5-sample limitation. We can convolve with a shorter window as long as its length is >=2, no?

Yes, that is right. I guess I was following these lines in the original code:

    # Virtually increase computation resolution so that the smoothing window
    # contains 5 discrete frequencies. However, if the input value is smaller
    # than the default resolution, raise a parameter error because the
    # computational cost would go up
    if smooth_size < default_df:
        msg = "Frequency smoothing size is smaller than default frequency resolution"
        raise ParameterError(msg)


    if math.floor(smooth_size / default_df) < 5:
        comp_nsamp = math.floor(nsamp * 5 / (smooth_size / default_df))
    else:
        comp_nsamp = nsamp

Which increases the fft resolution if < 5 samples are available. I figured you were doing that to avoid some kind of stability issue, but if not we should be good to drop down to a 2 sample window.

I agree on testing this a bit more.

To test this I implemented a new test which uses a patch composed of chirp from 5 to 25 Hz (see #442). First, I tested the behavior of the current implementation using the current master branch (c7e3178).

First, I setup some plotting code:

import matplotlib.pyplot as plt

import dascore as dc


def plot_time_and_frequency(patch):
    """ Make plots of time and frequency of patch with single channel."""
    sq_patch = patch.select(distance=0, samples=True).squeeze()
    time = patch.get_array("time")

    fig, (td_ax, fd_ax) = plt.subplots(1, 2, figsize=(10, 4))

    # Plot in time domain
    td_ax.plot(time, sq_patch.data)
    td_ax.set_title("time domain")
    td_ax.set_xlabel("time (s)")

    # Plot in freq domain
    ft_patch = sq_patch.dft("time", real=True)
    freq = ft_patch.get_array("ft_time")
    fd_ax.plot(freq, ft_patch.abs().data)
    fd_ax.set_xlabel("Frequency (Hz)")
    # fd_ax.set_xlim(0, 1000)

    return fig

Then plot the chirp patch:

import dascore as dc

patch = dc.get_example_patch("chirp", f0=5, f1=50, channel_count=2)

fig = plot_time_and_frequency(patch)

Plus the whitened patch:

white_patch = patch.whiten()
_ = plot_time_and_frequency(white_patch)

Which looks a bit odd to me, I would have expected the spectrum to simply smooth out rather than change shape so drastically.

Looking at the new implementation (in this PR) by switching to branch whiten_refactor and re-running the above code we get:

So I don't think I have changed the behavior. Testing if the fft version looks the same:

fft_patch = patch.dft("time", real=True)
fft_white = fft_patch.whiten()
idft_white = fft_white.idft()
_ = plot_time_and_frequency(idft_white)

It clearly doesn't! So you are right, there is something going on that we need to dive more into. However, I suspect that the original function and time domain version on this branch may not be correct to begin with. What do you think?

edit: fix chirp range

d-chambers · 2024-10-01T22:31:48Z

@ahmadtourei, if you have time feel free to pick this up. I have some other research I need to catch up on, but can probably circle back early next week.

There are a few other whiten implementations we might check out (putting this here for future reference)

seismo live Ambient Seismic Noise Analysis whiten

NoisePy's whiten2D

MSNoise's whiten

ariellellouch · 2024-10-02T10:30:37Z

Wow, @d-chambers @ahmadtourei looks like I am better at giving advice than coding properly. Judging by Derrick's examples, it looks like my original implementation was messed up.
I will dive into this and see where I was wrong

But also, the chirp seems weird - you generate it between 5 and 25, yet the spectrum shows 5 to 50. Unrelated to the whitening, but still probably good to fix

d-chambers · 2024-10-02T17:41:43Z

@ariellellouch, sounds good if you have the time. In light of recent events, I didn't want to dump anything on your plate in case you have more pressing things to deal with, but if you can work on this it would be much appreciated!

I was also wondering if we need the convolution-based smoothing of the amplitude spectra or if we could just use a savgol filter? I don't know if the convolution is the issue but swapping it out might simplify the code.

But also, the chirp seems weird - you generate it between 5 and 25, yet the spectrum shows 5 to 50. Unrelated to the whitening, but still probably good to fix

Ah good point. I double-checked and the chirp is working, I just copied the wrong cell from my notebook into the github comment. I just updated the comment.

ariellellouch · 2024-10-03T19:30:42Z

Hey guys @d-chambers @ahmadtourei

I checked; It is not really a bug, but a problem with the definition of the smoothing. When you use the default, smoothing is over the whole spectrum, so a pretty big window. The problem is the convolution in that case starts to "wrap around" (or not), but in case we are not dividing by something very smooth. There are 2 ideas I have:

in whiten.py, change the convolved1d mode from 'constant' to reflect (looks much better)
change the default smoothing size to lets say 10% of the Nyquist (instead of the Full Nyquist)

About the 5-samples smoothing - this is for stability. We can change it to 2, worst case there won't be much effect.

Still not sure about the FFT implementation difference though; Could also be an edge case. We can test the difference on a 5 Hz smoothing window maybe?

d-chambers · 2024-10-09T17:15:05Z

Hey guys,

Quick update. I am struggling to finish up a draft of my dissertation which I need to submit to my committee in about 2 weeks. I probably wont have time to work on this until I get that submitted. If either of you want take over this PR feel free to just push to this branch. Otherwise, I will circle back when I can.

ahmadtourei · 2024-10-10T00:21:09Z

Thanks for checking @ariellellouch and for the update @d-chambers. I'd be also busy with my proposal defense till early November but I'll update you guys here if I find time to work on it.

ariellellouch · 2024-10-31T16:57:09Z

Hi @d-chambers , can you summarize the changes in those versions?
Thanks!

d-chambers · 2024-10-31T19:35:38Z

Ok, I found some time to work on this.

After thinking about it and looking at some other implementations I made the following changes:

I created a patch.taper_range method which makes it easier to apply the frequency windowing. Now the taper windows are applied using kwargs to specify a 2 or 4 length tuple. This makes the tukey_alpha argument unnecessary.
I changed the default behavior when smooth_size is None. I decided some users might want a strong whiten (like in the seismo live implementation) which essentially removes the amplitude information. This can also be applied in a band-limited way.
I added an optional water_level argument for stabilization of low power signals.
I replaced the convolution smoothing with a uniform_filter which some report to be 50x faster.

This does break backwards compatibility so I will add a very clear error message. Hopefully it wont cause too much inconvenience.

Here is the tutorial page I am going to add on Patch.whiten

Whiten

The Patch.whiten function performs spectral whitening by balancing the amplitude spectra of the patch while leaving the phase (largely) unchanged. Spectral whitening is often a pre-processing step in ambient noise correlation workflows.

To demonstrate, we first create some boiler plate plotting code and create example patches.

import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng()

import dascore as dc
from dascore.utils.time import to_float


def plot_time_and_frequency(patch, channel=0):
    """ Make plots of time and frequency of patch with single channel."""
    sq_patch = patch.select(distance=channel, samples=True).squeeze()
    time_array = to_float(patch.get_array("time"))
    time = time_array - np.min(time_array)

    fig, (td_ax, fd_ax, phase_ax) = plt.subplots(1, 3, figsize=(15, 4))

    # Plot in time domain
    td_ax.plot(time, sq_patch.data, color="tab:blue")
    td_ax.set_title("Time Domain")
    td_ax.set_xlabel("time (s)")

    # Plot freq amplitdue
    ft_patch = sq_patch.dft("time", real=True)
    freq = ft_patch.get_array("ft_time")
    fd_ax.plot(freq, ft_patch.abs().data, color="tab:red")
    fd_ax.set_xlabel("Frequency (Hz)")
    fd_ax.set_title("Amplitude Spectra")

    # plot freq phase
    phase_ax.plot(freq, np.angle(ft_patch.data), color="tab:cyan")
    phase_ax.set_xlabel("Frequency (Hz)")
    phase_ax.set_title("Phase Angle")
    
    # fd_ax.set_xlim(0, 1000)
    return fig


def make_noisy_sine_patch():
    """Make a noisy sine wave patch.""" 
    patch = dc.get_example_patch(
        "sin_wav", 
        frequency=[1, 10, 20, 54, 66], 
        amplitude=[1, 2, 3, 4, 9], 
        duration=1, 
        sample_rate=200,
    )
    rand_noise = (rng.random(patch.data.shape) - 0.5) * 10
    patch = patch.new(data = patch.data + rand_noise)
    return patch


def make_chirp_patch():
    # Get a patch with a chirp from 5 to 50 Hz.
    return dc.get_example_patch("chirp", f0=5, f1=50, channel_count=2)

patch = make_noisy_sine_patch()
plot_time_and_frequency(patch);

The default whitening makes all spectral amplitudes equal.

white_patch = patch.whiten()
plot_time_and_frequency(white_patch);

The whitening can be restricted to certain frequency ranges by specifying the dimension and a frequency range.

white_patch = patch.whiten(time=(20, 80))
plot_time_and_frequency(white_patch);

Four values can be used to control the taper.

white_patch = patch.whiten(time=(20, 40, 60, 80))
plot_time_and_frequency(white_patch);

Whiten also supports a smoothed amplitude for normalization which reduces the effect depending on the smooth_size parameter.

white_patch = patch.whiten(smooth_size=1)
_ = plot_time_and_frequency(white_patch)

white_patch = patch.whiten(smooth_size=2, time=(10, 20, 80, 90))
_ = plot_time_and_frequency(white_patch)

Whiten can accept a patch in the frequency domain, in which case a frequency domain patch is returned. This might be useful if whiten is only a part of a frequency domain workflow.

fft_patch = patch.dft("time", real=True)
dft_white = fft_patch.whiten(smooth_size=1, time=(20, 40, 60, 80))
td_patch = dft_white.idft()
plot_time_and_frequency(td_patch);

The water_level parameter is useful for stabilizing frequencies that may have low values.

d-chambers · 2024-10-31T19:45:45Z

Let me know what you think of the changes and if they will be suitable for your workflows. When you guys are happy with the design/outputs I will add the docs.

ariellellouch · 2024-11-01T07:14:16Z

Hi @d-chambers ,
The behavior looks great to me, and it is what I would test. Nice!!!

ahmadtourei · 2024-11-01T21:44:45Z

Let me know what you think of the changes and if they will be suitable for your workflows. When you guys are happy with the design/outputs I will add the docs.

Thanks for your work and including these examples. This looks great to me! I'll be applying this function on my Alaska dataset and will re-open this PR and add more comments if I notice any unexpected behavior.

d-chambers · 2024-11-02T00:05:54Z

Ok, thanks guys. I will merge this now. If we find issues we can always open another PR/issue.

ahmadtourei and others added 9 commits September 25, 2024 19:00

reformat whiten and added a test

b22fe98

add pdbpp hack for windows

215d1c8

start refactor

982fa2f

Merge branch 'refactor_whitening' into whiten_refactor

4bdea57

continue refactor

6901b1c

fix a few whiten checks

395dd88

update

938a3dc

fix history tests

31c769a

improve coverage

51040e9

d-chambers added the ready_for_review PR is ready for review label Sep 28, 2024

try fix linter

e639d79

d-chambers force-pushed the whiten_refactor branch from fac4975 to e639d79 Compare October 1, 2024 20:08

ahmadtourei reviewed Oct 1, 2024

View reviewed changes

merge master

435070e

clarify envelop frequency manipulation

55ec543

d-chambers and others added 3 commits October 30, 2024 18:02

work on whiten refactor

e51569b

use unfirom filter, optional water level

c9460e3

merge master

301cd42

d-chambers added 2 commits October 31, 2024 14:02

add docs, fix test coverage

b7c2f39

fix linter

8817f50

support multiple windows

6841e0b

d-chambers merged commit fa38582 into master Nov 2, 2024
15 checks passed

d-chambers deleted the whiten_refactor branch November 2, 2024 00:06

ahmadtourei mentioned this pull request Nov 14, 2024

Patch.whiten on a dft patch expects kwargs range in time instead of freq #438

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Whiten refactor #441

Whiten refactor #441

d-chambers commented Sep 28, 2024

codecov bot commented Sep 28, 2024 •

edited

Loading

ariellellouch commented Sep 29, 2024

ahmadtourei Oct 1, 2024

d-chambers Oct 1, 2024

ahmadtourei Oct 1, 2024

d-chambers Oct 1, 2024

d-chambers commented Oct 1, 2024 •

edited

Loading

d-chambers commented Oct 1, 2024

ariellellouch commented Oct 2, 2024 •

edited

Loading

d-chambers commented Oct 2, 2024

ariellellouch commented Oct 3, 2024 •

edited

Loading

d-chambers commented Oct 9, 2024

ahmadtourei commented Oct 10, 2024

ariellellouch commented Oct 31, 2024

d-chambers commented Oct 31, 2024 •

edited

Loading

d-chambers commented Oct 31, 2024

ariellellouch commented Nov 1, 2024

ahmadtourei commented Nov 1, 2024

d-chambers commented Nov 2, 2024

Whiten refactor #441

Whiten refactor #441

Conversation

d-chambers commented Sep 28, 2024

Description

Checklist

codecov bot commented Sep 28, 2024 • edited Loading

Codecov Report

ariellellouch commented Sep 29, 2024

ahmadtourei Oct 1, 2024

Choose a reason for hiding this comment

d-chambers Oct 1, 2024

Choose a reason for hiding this comment

ahmadtourei Oct 1, 2024

Choose a reason for hiding this comment

d-chambers Oct 1, 2024

Choose a reason for hiding this comment

d-chambers commented Oct 1, 2024 • edited Loading

d-chambers commented Oct 1, 2024

ariellellouch commented Oct 2, 2024 • edited Loading

d-chambers commented Oct 2, 2024

ariellellouch commented Oct 3, 2024 • edited Loading

d-chambers commented Oct 9, 2024

ahmadtourei commented Oct 10, 2024

ariellellouch commented Oct 31, 2024

d-chambers commented Oct 31, 2024 • edited Loading

Whiten

d-chambers commented Oct 31, 2024

ariellellouch commented Nov 1, 2024

ahmadtourei commented Nov 1, 2024

d-chambers commented Nov 2, 2024

codecov bot commented Sep 28, 2024 •

edited

Loading

d-chambers commented Oct 1, 2024 •

edited

Loading

ariellellouch commented Oct 2, 2024 •

edited

Loading

ariellellouch commented Oct 3, 2024 •

edited

Loading

d-chambers commented Oct 31, 2024 •

edited

Loading