
Support for FC Level in MTH5 #278

Open
7 of 9 tasks
kkappler opened this issue Jul 8, 2023 · 1 comment
kkappler commented Jul 8, 2023

As of July 2023, there is a branch of mth5 that supports archiving spectrogram data, i.e. the so-called Fourier coefficients (FCs), aka short-time Fourier transform (STFT) data.

Testing of these archiving and retrieval capabilities is underway, but for this to be practically useful, we must also be able to generate transfer functions (TFs) from the FCs.

The FC layer is currently not required by aurora, but it may be in the future. It is expected to provide the following advantages:

  • simplify the implementation of multiple-station processing
  • significantly speed up re-processing, at a cost of drive storage
  • simplify the implementation of parallel processing by
    • providing an interface after the STFT stage of processing (decimation is not easy to do in parallel)
    • allowing embarrassing parallelization of the STFT over each run, as a separate process from the computation of TFs

Here are some considerations in using the FC levels:

  1. How can we ensure that the stored FCs are consistent with what the user has requested in their processing config?
    Methods have been added to TF Kernel to check that the "FC recipe" from the processing config is reflected by the stored FCs.
  2. How can we ensure that stored FCs between various runs at various stations are compatible?
    In general this cannot be guaranteed. Using a standardized set of windowing parameters will help. If the stored FCs are not compatible with what the processing config requests, then new FCs must be generated.
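The compatibility check in point 2 can be sketched as follows. This is a minimal illustration only: plain dicts stand in for the actual mt_metadata objects, and the parameter key names are assumptions, not the real schema.

```python
# Hypothetical sketch: compare the windowing parameters of stored FCs against
# those requested by the processing config. Key names are illustrative.
REQUIRED_KEYS = ("window_type", "num_samples_window", "num_samples_overlap", "sample_rate")

def fc_is_compatible(stored_fc_params: dict, requested_params: dict) -> bool:
    """True if the stored FCs were built with the windowing parameters the
    processing config requests; otherwise new FCs must be generated."""
    return all(
        stored_fc_params.get(key) == requested_params.get(key)
        for key in REQUIRED_KEYS
    )

stored = {"window_type": "hamming", "num_samples_window": 128,
          "num_samples_overlap": 32, "sample_rate": 1.0}
requested = dict(stored, num_samples_overlap=64)  # user asks for more overlap

print(fc_is_compatible(stored, stored))     # True
print(fc_is_compatible(stored, requested))  # False
```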

In general, there are a lot of things that can go wrong if assumptions are made about the stored FCs. These concerns were never an issue with direct processing, because one config dictated everything about the process moving forward.

To avoid these concerns, what if …
We keep the existing processing class to drive the TF generation, but enable checks for existing FCs and use them when available, or when a flag like use_existing_fc is set to True?
This would have no impact on the existing TF computation; it would just change the source of the input data from compute-on-the-fly to the stored FCs.
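A minimal sketch of this gating idea, using hypothetical stand-ins (run_fcs_exist, load_stored_fcs, and compute_fcs_on_the_fly are illustrative names, not the actual mth5/aurora API); the point is that downstream TF code sees the same spectrogram object either way:

```python
def get_spectrogram(run_obj, dec_level_config, use_existing_fc=True):
    """Return STFT data for one run/decimation level, preferring stored FCs."""
    if use_existing_fc and run_fcs_exist(run_obj, dec_level_config):
        return load_stored_fcs(run_obj, dec_level_config)      # archived FC layer
    return compute_fcs_on_the_fly(run_obj, dec_level_config)   # legacy path

# Minimal fakes so the sketch executes; real calls would hit the H5 file:
def run_fcs_exist(run_obj, cfg): return run_obj.get("has_fcs", False)
def load_stored_fcs(run_obj, cfg): return "stored"
def compute_fcs_on_the_fly(run_obj, cfg): return "computed"

print(get_spectrogram({"has_fcs": True}, None))                         # stored
print(get_spectrogram({"has_fcs": True}, None, use_existing_fc=False))  # computed
```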

To implement a strategy like this, the following will need to be taken into consideration:

  • User will need to extract info from the FC Layers, and check their configurations against the processing object.

    • This is implemented, but could be streamlined somewhat by moving the checking into methods of mth5.groups.fourier_coefficients.FCDecimationGroup and mth5.groups.fourier_coefficients.FCGroup. Prototype methods were staged in transfer_function_kernel.py in Aug/Sept 2023.
  • Decide whether to tolerate partial existence of FCs, computing some on the fly and using some stored.
    It would probably be easiest to take an "all-or-none" approach in the beginning, but eventually modify the processing_summary to have a Boolean column indicating, for a given station-run-decimation_level, whether to source from ["FC Level", "compute"].
    Decision made: "all or none" in the first implementation.

  • Validation: check that an existing FC Level does indeed conform to the processing config

    • If clock_zero is specified, it must be the same for local and remote stations for each contiguous processing run block. Note that there are "infinitely many solutions"; this will be some sort of delta clock zero modulo window advance that will need to be checked.
    • Checking for channels will only check {input_channels, output_channels} or reference_channels for a given station-run, and not both.
  • It may be that we want to support variations of decimation levels; if so, some sort of fc_summary index object that can be used to efficiently and cleanly search stored FCs, and create new layers if needed, may come into play.

  • Currently, recoloring data is forced to be True; add this as an attribute of decimation_level, and add logic to apply_recoloring to check its Boolean value.

  • While deep into these modifications, the extent to which aurora's RunSummary df property can be replaced with mth5's built-in RunSummary should be reviewed.
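The "delta clock zero modulo window advance" check mentioned in the validation bullet above can be sketched like this. This is an assumption about the intended check, not existing code: two clock_zero values produce the same window alignment if their difference is an integer multiple of the window advance (in seconds).

```python
# Hypothetical sketch of the clock-zero equivalence check. All names are
# illustrative; times are in seconds.
def clock_zeros_equivalent(clock_zero_a, clock_zero_b, window_advance, tol=1e-9):
    """True if the two clock zeros differ by an integer number of window
    advances, so the STFT windows line up between stations."""
    delta = abs(clock_zero_a - clock_zero_b)
    remainder = delta % window_advance
    return remainder < tol or (window_advance - remainder) < tol

# advance = (num_samples_window - num_samples_overlap) / sample_rate
advance = (128 - 32) / 1.0
print(clock_zeros_equivalent(0.0, 4 * advance, advance))        # True
print(clock_zeros_equivalent(0.0, 4 * advance + 1.0, advance))  # False
```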

The key change in logic is that the loop that creates the FCs will first check whether they exist … if they do, no operations on the time series are needed, and it will simply bypass the calculation and load them in place.

Consider the cases user wants to:

  • Add an FC layer to an mth5: this is working (Sept 1, 2023). In the future we may want to consider the ability to store multiple versions of FCs; this is not currently tested and probably wouldn't work.

  • Process data with existing FCs: the strategy will be to support this by default. If there is an FC Layer in the H5, it will be checked for compatibility with the processing config. If compatible data are already stored, then this is what will be passed to TF estimation. If compatible data are not already stored, FCs will be computed. Whether or not FCs are saved depends on whether the processing config has save_fc = True/False.
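The default behavior described in the bullet above can be summarized as a small decision helper. This is a hypothetical illustration of the intended control flow, not the actual aurora API:

```python
# Illustrative decision table: where FCs come from and whether new ones are
# written back to the H5. Names are assumptions for this sketch.
def resolve_fc_source(fc_layer_exists: bool, fc_layer_compatible: bool, save_fc: bool):
    """Return (source, will_save) for one station-run-decimation_level."""
    if fc_layer_exists and fc_layer_compatible:
        return ("stored", False)        # reuse archived FCs; nothing to save
    return ("computed", bool(save_fc))  # compute fresh; save only if requested

print(resolve_fc_source(True, True, save_fc=False))    # ('stored', False)
print(resolve_fc_source(True, False, save_fc=True))    # ('computed', True)
print(resolve_fc_source(False, False, save_fc=False))  # ('computed', False)
```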

QUESTION
Should this be split into process_mth5.py & process_mth5_2.py, or kept as process_mth5.py with more complex logic?

@kkappler kkappler self-assigned this Jul 8, 2023
kkappler added a commit that referenced this issue Jul 23, 2023
- Add unit test to cover basic creation of fcs for synthetic data
- add core FC building methods to pipelines
- modify make_mth5_from_ascii so that
   - main takes version as a kwarg
   - ensure all returned mth5_paths are instances of pathlib.Path

Relates to issue #278
kkappler commented Sep 2, 2023

Technical note:
Inside process_mth5, when adding FCs, there is a need to do:
fc_decimation_level = fc_group.add_decimation_level(f"{i_dec_level}")
but the fc_decimation_level above initializes to default values, as prescribed by FC standards in mt_metadata. It therefore does not contain the specific STFT parameters from the processing config, which is the recipe that was used to make the stft_obj.

A preferable way to initialize is:
fc_decimation_level = fc_group.add_decimation_level(f"{i_dec_level}", decimation_level_metadata=dec_level_config)

That way, fc_decimation_level will get its attributes from the config that was used to make the STFTs.
This does not work out of the box, however, because dec_level_config and fc_decimation_level are different data structures. (It seems there is little to lose, except time, and much to gain by making dec_level_config and fc_decimation_level as close to one another as possible.)

If we set the kwarg decimation_level_metadata=dec_level_config, an AttributeError is encountered. This occurs around line 296 of mth5.groups.base:
return_obj.metadata = group_metadata
In the line above, group_metadata is the dec_level_config.

A solution would be a reformatter that populates an FCDecimation metadata object
with the info from the dec_level_config.

One way to do this would be to initialize a dummy layer to get the skeleton, and then fill it in.
The function could be called fc_decimation_level_from_processing_config_decimation_level().
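The "dummy layer as skeleton" idea can be sketched as below. Plain dicts stand in for the real mt_metadata classes, and the default values are invented for illustration; only the fill-in pattern is the point:

```python
# Hypothetical skeleton for fc_decimation_level_from_processing_config_decimation_level().
# The defaults below are placeholders, not the actual mt_metadata FC standards.
FC_DECIMATION_DEFAULTS = {
    "decimation_factor": 1, "window_type": "boxcar",
    "num_samples_window": 256, "num_samples_overlap": 192,
}

def fc_decimation_level_from_processing_config_decimation_level(dec_level_config):
    fc_metadata = dict(FC_DECIMATION_DEFAULTS)  # dummy layer = the skeleton
    for key in fc_metadata:
        if key in dec_level_config:             # fill in from the STFT recipe
            fc_metadata[key] = dec_level_config[key]
    return fc_metadata

cfg = {"window_type": "hamming", "num_samples_window": 128,
       "num_samples_overlap": 32, "anti_alias_filter": "default"}
fc = fc_decimation_level_from_processing_config_decimation_level(cfg)
print(fc["window_type"])       # hamming (taken from the processing config)
print(fc["decimation_factor"]) # 1 (default retained)
```

Keys present in the config but absent from the FC skeleton (e.g. anti_alias_filter here) are deliberately ignored, which sidesteps the AttributeError caused by assigning one data structure's metadata to the other.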

kkappler added a commit that referenced this issue Sep 2, 2023
- Created make_fc_metadata_from_processing_config
  - there are still some things to iron out, but it's working on the first synthetic test
- Tuned some logic in tf_kernel