Applying aggregations to only certain subsets of catalogs? #544

riley-brady · 2022-11-04T21:11:25Z

Is there a way to apply aggregations to only certain subsets of a catalog?

For instance, imagine I have a setup like this:

# source here could be 'CMIP6'
groupby_attrs=["source", "scenario", "frequency"],
aggregations=[
    {"type": "union", "attribute_name": "variable"},
    # aggregation that I want applied only to certain `source`s
    {
        "type": "join_new",
        "attribute_name": "model",
        "options": {"coords": "minimal", "compat": "override"},
    },
    {
        "type": "join_new",
        "attribute_name": "member",
        "options": {"coords": "minimal", "compat": "override"},
    },
],

I would like intake-esm to concatenate over a model dimension, only for certain subsets of the catalog.

For datasets from source='a', every model has the same resolution, so they can be concatenated along a model dimension (e.g. to compute a multi-model ensemble mean). Datasets from source='b' (e.g. CMIP6) all have different resolutions and cannot be concatenated.

If I take a catalog subset with only source 'a', this aggregation works perfectly, but it of course throws an error if the subset includes source 'b'.

Possible solutions (not ideal):

When including source 'b', just run subset.to_dataset_dict(aggregate=False)
Create an entirely separate catalog for source 'b' that does not have that model rule for concatenating
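
The second workaround amounts to keeping two aggregation configurations that differ only in the `model` rule. A minimal sketch of that idea with plain dictionaries is below. `BASE_CONTROL`, `control_for`, and `concat_sources` are hypothetical names used for illustration and are not part of intake-esm; moving `model` into `groupby_attrs` for the sources that are not concatenated is also an assumption about how one might want those datasets keyed.

```python
import copy

# Mirrors the groupby/aggregation settings from the setup above as plain data.
BASE_CONTROL = {
    "groupby_attrs": ["source", "scenario", "frequency"],
    "aggregations": [
        {"type": "union", "attribute_name": "variable"},
        {
            "type": "join_new",
            "attribute_name": "model",
            "options": {"coords": "minimal", "compat": "override"},
        },
        {
            "type": "join_new",
            "attribute_name": "member",
            "options": {"coords": "minimal", "compat": "override"},
        },
    ],
}


def control_for(source, concat_sources=("a",)):
    """Return a copy of BASE_CONTROL, dropping the `model` join for other sources."""
    control = copy.deepcopy(BASE_CONTROL)
    if source not in concat_sources:
        # Drop the rule that concatenates over `model` ...
        control["aggregations"] = [
            agg for agg in control["aggregations"] if agg["attribute_name"] != "model"
        ]
        # ... and group by `model` instead, so each model stays a separate dataset.
        control["groupby_attrs"].append("model")
    return control


control_a = control_for("a")  # keeps the `model` join
control_b = control_for("b")  # no `model` join, one group per model
```

Each resulting dictionary could then be written into its own catalog description, one per group of sources.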

andersy005 (Maintainer) · 2022-11-15T23:31:35Z

Unfortunately, this is not supported. However, if you are okay with working with two catalog objects, you could try the following:

import intake
from intake_esm.cat import Aggregation

cat = intake.open_esm_datastore(.... )

source='a'

cat_subset_1 = cat.search(source='a')

# Remove `source` from `groupby_attrs`
cat_subset_1.esmcat.aggregation_control.groupby_attrs = ['scenario', 'frequency']
# Instantiate an `Aggregation` for `source` and add it to the existing list of aggregations
aggregation = Aggregation(type='join_new', attribute_name='source', options={'coords': 'minimal', 'compat': 'override'})
# `aggregations` is a list, so append the single object (`+=` expects an iterable)
cat_subset_1.esmcat.aggregation_control.aggregations.append(aggregation)

# Load the data
dsets_1  = cat_subset_1.to_dataset_dict(.....)

source='b'

cat_subset_2 = cat.search(source='b')
dsets_2 = cat_subset_2.to_dataset_dict(....)
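
If a single mapping is more convenient downstream, the two dictionaries can be combined afterwards. The sketch below is a plain-Python helper and not an intake-esm function: `merge_dataset_dicts` is a hypothetical name, and the keys and string values are stand-ins for the real dataset keys and xarray datasets. It assumes the two dictionaries should never share a key, so it raises an error instead of silently overwriting one.

```python
def merge_dataset_dicts(*dataset_dicts):
    """Merge several {key: dataset} mappings, refusing to overwrite a key."""
    merged = {}
    for dsets in dataset_dicts:
        for key, ds in dsets.items():
            if key in merged:
                raise KeyError(f"duplicate dataset key: {key!r}")
            merged[key] = ds
    return merged


# Stand-in values; in practice these would be the results of the two
# `to_dataset_dict` calls above.
dsets_1 = {"ssp585.monthly": "dataset for source 'a'"}
dsets_2 = {"b.ssp585.monthly": "dataset for source 'b'"}

dsets = merge_dataset_dicts(dsets_1, dsets_2)
```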
