Generate version history json file - for use by ES-DOCS #2

Open
durack1 opened this issue Sep 18, 2017 · 56 comments

@durack1
Contributor

durack1 commented Sep 18, 2017

@eguil @davidhassell this issue has been generated following the email correspondence regarding https://es-doc.org/cmip6-ensembles-conformance/

It will be useful to iterate over the format of the json version info within this issue

@durack1
Contributor Author

durack1 commented Sep 18, 2017

@davidhassell @eguil here is a first pass at a version lookup file - please review and let me know if fields should be ordered differently to make it easier to use.

PCMDI/input4MIPs-cmor-tables/Versions/6.2.0.json

Once we have finalised the format, I can regenerate the versions back in time. Any future release will have a new 6.x.y.json file generated

@durack1
Contributor Author

durack1 commented Sep 19, 2017

@davidhassell @eguil, I'm wondering whether sorting these by target_mip keys, with the institution_id values as a second level will make these easier for you to use?

@durack1 durack1 removed their assignment Sep 21, 2017
@durack1
Contributor Author

durack1 commented Sep 26, 2017

@davidhassell @eguil it would be great to get your feedback soon on the format, as I am anticipating significant changes as new datasets are generated and published, and without feedback you're going to be stuck using the existing format of the json info

@durack1
Contributor Author

durack1 commented Oct 23, 2017

@davidhassell @eguil I have made a change to the format, please take a look at the files now building in PCMDI/input4MIPs-cmor-tables/Versions

@MartinaSt
Collaborator

@davidhassell @eguil This JSON file with the versions is very helpful. It would give users even more information if you added a short reason for deprecation. Currently, I can only mention that the data is deprecated and point the user to the current version and the version google doc (see e.g.: https://doi.org/10.22033/ESGF/input4MIPs.1120 ).

@durack1
Contributor Author

durack1 commented Oct 24, 2017

@MartinaSt thanks for the feedback. If you see these files as useful for you, then please feel free to suggest changes/augmentations/amendments to the format and content, so that it's easiest for you to use. My plan was that, once a format had been finalized, I would generate versions extending all the way back to the original release v6.0.0 (20th December 2016) as noted in the google doc

I think having each of the DOIs for published/DOI-minted data would also be a useful addition

@MartinaSt
Collaborator

@durack1 Thanks, Paul. Having the change of the DRS and my matching in mind, it would be great if you could add:

  • the source_id to have the complete (and after republication very few) DRS components in the JSON: %(institution)s.%(source_id)s.
  • the reason and date of deprecation for the deprecated data collections (deprecatedVersionNotes)

Could you avoid '*' notations, e.g. '2017-05-18 (*-AIR-*)', and replace these by all individual versions? The current notation is difficult to parse.

In the currentVersionNotes you have split the note into multiple list entries. It would be good to have a single note per data version. Example from the 6.2.1 JSON:
"currentVersionNotes":[
"latest AIR datasets are 2017-08-30 (except",
" SO2), and SO2 aircraft emission files 2017-10-05",
", which deprecate 2017-05-18"

It would be great if you could make these changes. Is this information sufficient or do you need more information from me?
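The request for a single note per data version can be illustrated with a small sketch. This is only an assumption about how the split fragments might be recombined; `merge_notes` is a hypothetical helper, not part of the existing tooling.

```python
def merge_notes(fragments):
    """Concatenate note fragments and normalise runs of whitespace."""
    joined = "".join(fragments)
    return " ".join(joined.split())


# Fragments copied from the 6.2.1 JSON example quoted above.
notes = [
    "latest AIR datasets are 2017-08-30 (except",
    " SO2), and SO2 aircraft emission files 2017-10-05",
    ", which deprecate 2017-05-18",
]

single_note = merge_notes(notes)
print(single_note)
```

The fragments already carry their own leading spaces and commas, so plain concatenation is enough; the whitespace normalisation only guards against stray double spaces.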

Adding the DOIs is an excellent idea. It would be easiest if we had the DRS of the data collection on the DOI granularity directly in the JSON, e.g. %(mip_era)s.%(activity)s.%(institution)s.%(source_id)s [CMIP6.input4MIPs.PCMDI.PCMDI-AMIP-1-1-2].
(new DRS after republication)
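Because the DRS template is written in Python's `%`-style mapping syntax, it can be filled in directly. The following is a minimal sketch; the function name `drs_id` is an assumption introduced for illustration.

```python
# Template copied from the comment above (new DRS after republication).
DRS_TEMPLATE = "%(mip_era)s.%(activity)s.%(institution)s.%(source_id)s"


def drs_id(mip_era, activity, institution, source_id):
    """Build a DRS id at DOI granularity from its four components."""
    return DRS_TEMPLATE % {
        "mip_era": mip_era,
        "activity": activity,
        "institution": institution,
        "source_id": source_id,
    }


example_id = drs_id("CMIP6", "input4MIPs", "PCMDI", "PCMDI-AMIP-1-1-2")
print(example_id)
```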

@durack1
Contributor Author

durack1 commented Oct 31, 2017

@MartinaSt I hadn't thought about your use of this, glad it will be useful for you. Can you take a pass at editing the current Versions/6.2.1.json version of the file to the format that you want? If I have an example of the changes that you want implemented, it'll be easier for me to propagate the changes across all datasets in the collection.

@MartinaSt
Collaborator

@durack1 From the citation point of view, the ideal structure would be as follows, using ImperialCollege and PNNL-JGCRI as examples in the new DRS:

{
"input4MIPs_version":{
"data":{
"CMIP6.input4MIPs.ImperialCollege.ImperialCollege-1-0":{
"institution_id":"ImperialCollege",
"source_id":"ImperialCollege-1-0",
"mip_table":["C4MIP","OMIP"],
"data_type":"atmosphericState",
"version":"1.0",
"VersionInfo":"deprecated",
"VersionNotes":"...to be added: reason for deprecation...",
"doi":"10.22033/ESGF/input4MIPs.1162"
},
"CMIP6.input4MIPs.ImperialCollege.ImperialCollege-1-1":{
"institution_id":"ImperialCollege",
"source_id":"ImperialCollege-1-1",
"mip_table":["C4MIP","OMIP"],
"data_type":"atmosphericState",
"version":"1.1",
"VersionInfo":"current",
"VersionNotes":"...",
"doi":"10.22033/ESGF/input4MIPs.1601"
},
"CMIP6.input4MIPs.ImperialCollege.ImperialCollege-2-0":{
"institution_id":"ImperialCollege",
"source_id":"ImperialCollege-2-0",
"mip_table":["C4MIP","OMIP"],
"data_type":"atmosphericState",
"version":"2.0",
"VersionInfo":"current",
"VersionNotes":"...",
"doi":"10.22033/ESGF/input4MIPs.1602"
},
"CMIP6.input4MIPs.PNNL-JGCRI.CEDS-2017-05-18":{
"institution_id":"PNNL-JGCRI",
"source_id":"CEDS-2017-05-18",
"mip_table":["CMIP"],
"data_type":"emissions",
"version":"2017-05-18",
"VersionInfo":"current",
"VersionNotes":"...",
"doi":"10.22033/ESGF/input4MIPs.1241"
},
"CMIP6.input4MIPs.PNNL-JGCRI.CEDS-2017-05-18-supplemental-data":{
"institution_id":"PNNL-JGCRI",
"source_id":"CEDS-2017-05-18-supplemental-data",
"mip_table":["CMIP"],
"data_type":"emissions",
"version":"2017-05-18-supplemental-data",
"VersionInfo":"current",
"VersionNotes":"...",
"doi":"10.22033/ESGF/input4MIPs.1242"
},
"CMIP6.input4MIPs.PNNL-JGCRI.CEDS-2017-08-30":{
"institution_id":"PNNL-JGCRI",
"source_id":"CEDS-2017-08-30",
"mip_table":["CMIP"],
"data_type":"emissions",
"version":"2017-08-30",
"VersionInfo":"current",
"VersionNotes":"latest AIR datasets are 2017-08-30 (except SO2), and SO2 aircraft emission files 2017-10-05, which deprecate 2017-05-18",
"doi":"10.22033/ESGF/input4MIPs.1604"
},
"CMIP6.input4MIPs.PNNL-JGCRI.CEDS-2017-08-30-supplemental-data":{
"institution_id":"PNNL-JGCRI",
"source_id":"CEDS-2017-08-30-supplemental-data",
"mip_table":["CMIP"],
"data_type":"emissions",
"version":"2017-08-30-supplemental-data",
"VersionInfo":"current",
"VersionNotes":"latest AIR datasets are 2017-08-30 (except SO2), and SO2 aircraft emission files 2017-10-05, which deprecate 2017-05-18",
"doi":"10.22033/ESGF/input4MIPs.1605"
},
"CMIP6.input4MIPs.PNNL-JGCRI.CEDS-2017-10-05":{
"institution_id":"PNNL-JGCRI",
"source_id":"CEDS-2017-10-05",
"mip_table":["CMIP"],
"data_type":"emissions",
"version":"2017-10-05",
"VersionInfo":"current",
"VersionNotes":"latest AIR datasets are 2017-08-30 (except SO2), and SO2 aircraft emission files 2017-10-05, which deprecate 2017-05-18",
"doi":""
},
"CMIP6.input4MIPs.PNNL-JGCRI.CEDS-v2016-06-18":{
"institution_id":"PNNL-JGCRI",
"source_id":"CEDS-v2016-06-18",
"mip_table":["CMIP"],
"data_type":"emissions",
"version":"2016-06-18",
"VersionInfo":"deprecated",
"VersionNotes":"...to be added: reason for deprecation...",
"doi":"10.22033/ESGF/input4MIPs.1123"
},
"CMIP6.input4MIPs.PNNL-JGCRI.CEDS-v2016-06-18-sectorDimV2":{
"institution_id":"PNNL-JGCRI",
"source_id":"CEDS-v2016-06-18-sectorDimV2",
"mip_table":["CMIP"],
"data_type":"emissions",
"version":"2016-06-18-sectorDimV2",
"VersionInfo":"deprecated",
"VersionNotes":"...to be added: reason for deprecation...",
"doi":"10.22033/ESGF/input4MIPs.1126"
},
"CMIP6.input4MIPs.PNNL-JGCRI.CEDS-v2016-07-26":{
"institution_id":"PNNL-JGCRI",
"source_id":"CEDS-v2016-07-26",
"mip_table":["CMIP"],
"data_type":"emissions",
"version":"2016-07-26",
"VersionInfo":"deprecated",
"VersionNotes":"...to be added: reason for deprecation...",
"doi":"10.22033/ESGF/input4MIPs.1116"
},
"CMIP6.input4MIPs.PNNL-JGCRI.CEDS-v2016-07-26-sectorDim":{
"institution_id":"PNNL-JGCRI",
"source_id":"CEDS-v2016-07-26-sectorDim",
"mip_table":["CMIP"],
"data_type":"emissions",
"version":"2016-07-26-sectorDim",
"VersionInfo":"deprecated",
"VersionNotes":"...to be added: reason for deprecation...",
"doi":"10.22033/ESGF/input4MIPs.1114"
},
"CMIP6.input4MIPs.PNNL-JGCRI.CEDS-v2016-07-26-sectorDim-supplemental-data":{
"institution_id":"PNNL-JGCRI",
"source_id":"CEDS-v2016-07-26-sectorDim-supplemental-data",
"mip_table":["CMIP"],
"data_type":"emissions",
"version":"2016-07-26-sectorDim-supplemental-data",
"VersionInfo":"deprecated",
"VersionNotes":"...to be added: reason for deprecation...",
"doi":"10.22033/ESGF/input4MIPs.1124"
}
},
"version":"6.2.1_ms",
"version_release":"2017-11-01"
}
}

DOI information is accessible as JSON using the above DRS_ids via:
https://cera-www.dkrz.de/WDCC/meta/CMIP6/<DRS_id>.json
As we currently have several different DRS in use, an example for the citation JSON is available via:
https://cera-www.dkrz.de/WDCC/meta/CMIP6/input4MIPs.PNNL-JGCRI.emissions.CMIP.CEDS-2017-08-30.json
Note that in the non-DOI case you will find the URL of the landing page in the JSON instead of the DOI.
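As a rough sketch of how the URL pattern above could be used from a script: the helper names `citation_url` and `fetch_citation` are assumptions, and nothing is assumed here about the fields inside the returned JSON.

```python
import json
import urllib.request

# Base URL taken from the pattern quoted above.
CITATION_BASE = "https://cera-www.dkrz.de/WDCC/meta/CMIP6"


def citation_url(drs_id):
    """Return the citation JSON URL for a DRS id."""
    return f"{CITATION_BASE}/{drs_id}.json"


def fetch_citation(drs_id, timeout=30):
    """Download and parse the citation JSON (requires network access)."""
    with urllib.request.urlopen(citation_url(drs_id), timeout=timeout) as response:
        return json.load(response)


url = citation_url("input4MIPs.PNNL-JGCRI.emissions.CMIP.CEDS-2017-08-30")
print(url)
```

`fetch_citation` is defined but deliberately not called, so the example runs without network access.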

@durack1
Contributor Author

durack1 commented Nov 3, 2017

@MartinaSt thanks for this. Citation was not the target of the existing format, so I'll have to consider merging the two.

@davidhassell @eguil it would be really useful for you to chime in, as once a format has been settled you'll have to deal with it any way you can

@agstephens

At what level of the DRS hierarchy are we planning to publish DOIs? If we want to auto-generate version history files then it is important to know which level of the directory structure they apply to.

@esdoc-system-user

  1. Be consistent with key naming convention, i.e. either lower_case_underscore (à la Python) or camelCase (à la native JSON).

  2. The data field should be an array not an object.
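One possible reading of these two points, sketched under assumptions: each DRS-keyed entry becomes a record in a list, with its key moved into a `drs_id` field (a name chosen here for illustration), and all keys normalised to lower_case_underscore. The helper names are hypothetical.

```python
import re


def to_snake_case(name):
    """Convert camelCase or PascalCase keys to lower_case_underscore."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()


def data_as_array(data):
    """Turn an object keyed by DRS id into a list of uniform records."""
    records = []
    for drs_id, attrs in data.items():
        record = {"drs_id": drs_id}
        record.update({to_snake_case(key): value for key, value in attrs.items()})
        records.append(record)
    return records


# Abbreviated entry from the example JSON earlier in this thread.
data = {
    "CMIP6.input4MIPs.ImperialCollege.ImperialCollege-1-0": {
        "institution_id": "ImperialCollege",
        "version": "1.0",
        "VersionInfo": "deprecated",
    }
}
records = data_as_array(data)
print(records)
```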

@MartinaSt
Collaborator

@agstephens You can see the citation granularity, which is in use for input4MIPs, in my example. I have used the DRS_id on the citation granularity as key.
E.g.:
old DRS: input4MIPs.PNNL-JGCRI.emissions.CMIP.CEDS-2017-08-30
new DRS: CMIP6.input4MIPs.PNNL-JGCRI.CEDS-2017-08-30
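The mapping between the two forms can be sketched as below. The component names assigned to the old DRS (activity, institution, data type, target MIP, source_id) are inferred from the example only and should be treated as an assumption, as should the function name.

```python
def old_to_new_drs(old_id, mip_era="CMIP6"):
    """Map an old-style DRS id to the new DOI-granularity form.

    Assumes the old id has five dot-separated components in the order
    activity, institution, data type, target MIP, source_id.
    """
    activity, institution, _data_type, _target_mip, source_id = old_id.split(".", 4)
    return ".".join([mip_era, activity, institution, source_id])


new_id = old_to_new_drs("input4MIPs.PNNL-JGCRI.emissions.CMIP.CEDS-2017-08-30")
print(new_id)
```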

@durack1
Contributor Author

durack1 commented Dec 1, 2017

@esdoc-system-user your comment above "data field should be an array not an object", can you further explain? The current file version/format can be viewed here

@davidhassell

It may be some use to summarize how ES-DOC will be storing dataset descriptions. The properties we might collect are (summarized from the CIM definition)

  • citations (Set of pertinent citations)
  • responsible_parties (Individuals and organisations responsible for the data)
  • name (Name of dataset)
  • description (Textual description of dataset)
  • availability (Where the data is located, and how it is accessed)
  • drs_datasets (Data available in the DRS)
  • produced_by (Makes a link back to originating activity)
  • related_to_dataset (Related dataset)

Apart from name, all properties are optional.

Currently, the name, availability and description are captured in the ES-DOC CMIP6 experiments spreadsheet, which is rendered in the ES-DOC viewer (https://search.es-doc.org , e.g. the descriptions of pre-industrial aerosols for esm-piControl may be seen here)

@durack1
Contributor Author

durack1 commented Dec 12, 2017

@davidhassell thanks for this, I believe these properties are exactly what I was hoping to gather, so will consider these along with the requirements outlined by @MartinaSt above #2 and propose a new format before preparing the 6.0.0 -> 6.2.3 version json files

@MartinaSt
Collaborator

@durack1, independent of the format for the version information, the current information is outdated (version 6.2.3, November 2017). When do you plan an update?

@durack1
Contributor Author

durack1 commented Oct 2, 2018

Hi folks, as discussed at the WIP call this morning, we need to work on the input4MIPs dataset version history so that model simulations can be accurately documented (i.e. which combination of the numerous available forcing datasets was used). It would be useful for @davidhassell @charliepascoe to engage on this so that we can generate an easy-to-use format that can be updated live as additional datasets are updated and contributed to the project

@eguil @momipsl @taylor13 @MartinaSt

@durack1
Contributor Author

durack1 commented Oct 4, 2018

It will be necessary to assign the input4MIPs collection version, currently 6.2.14 (see here), to each valid dataset. When a dataset is deprecated its collection version remains static, and the new version gets the new collection tag. For example, when a new volcanic forcing dataset (v4) is released, the input4MIPs collection is incremented to 6.2.15: in the 6.2.14.json file the v3 file had collection version 6.2.14; in the 6.2.15.json file the v3 file continues to have collection version 6.2.14, whereas v4 will have 6.2.15
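The carry-forward rule described here can be sketched as follows. The function name and the dataset labels `volcanic-v3` and `volcanic-v4` are placeholders for illustration, not real dataset identifiers.

```python
def assign_collection_versions(previous, datasets, new_collection_version):
    """Carry forward existing collection versions; tag only new datasets.

    previous: mapping of dataset id -> collection version recorded in the
              preceding <collection version>.json file.
    datasets: all dataset ids that should appear in the new file.
    """
    updated = dict(previous)
    for dataset in datasets:
        # Datasets already recorded keep the version they were first given.
        updated.setdefault(dataset, new_collection_version)
    return updated


v6_2_14 = {"volcanic-v3": "6.2.14"}
v6_2_15 = assign_collection_versions(
    v6_2_14, ["volcanic-v3", "volcanic-v4"], "6.2.15"
)
print(v6_2_15)
```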

@MartinaSt
Collaborator

@durack1 thanks for coming back to this issue. Please either keep the version information that data providers included in the dataset names, and/or add the ESGF version under which the dataset was published. Otherwise I will lose the connection to the dataset version in the citation.

@durack1
Contributor Author

durack1 commented Nov 19, 2018

@eguil @davidhassell @momipsl this is the conversation that we can hopefully spend some time finalizing tomorrow - the format that @MartinaSt suggested is above #2

@davidhassell

Hi @durack1 and all, Thanks for taking the time to talk through this a couple of days ago.

To summarize, these are the attributes in the JSON files that I think we can use for ES-DOC:

  • DRS id (e.g. input4MIPs.CMIP6.CMIP.NCAR.NCAR-CCMI-2-0.atmos.monC.drynhx.gn)
  • Title (e.g. CCMI nitrogen surface fluxes in support of CMIP6 - version 2.0)
  • Identifier DOI (e.g. http://doi.org/10.22033/ESGF/input4MIPs.1125)
  • current dataset versions for the input4MIPs version (e.g. ["1.1.2", "1.1.3"])
  • deprecated dataset versions for the input4MIPs version (e.g. ["1.0.0", "1.0.1", "1.1.0", "1.1.1"])

I understand that all of these items are readily available. Of course any extra attributes that are needed, e.g. for citations are all fine and will not affect ES-DOC.

This could be, more or less, a mingling of Martina's and Paul's JSON examples:

{
"data":{
"CMIP6.input4MIPs.ImperialCollege.ImperialCollege-1-0":{
"institution_id":"ImperialCollege",
"source_id":"ImperialCollege-1-0",
"mip_table":["C4MIP","OMIP"],
"data_type":"atmosphericState",
"version":"1.0",
"VersionInfo":"deprecated",
"VersionNotes":"...to be added: reason for deprecation...",
"doi":"10.22033/ESGF/input4MIPs.1162",
"Title": "Compiled Historical Record of Atmospheric delta13CO2 version 1.1",
"id": "input4MIPs.CMIP6.C4MIP.ImperialCollege.ImperialCollege-1-1.atmos.yr.delta13co2-in-air.gm",
"currentVersion":["1.1","2.0"],
"deprecatedVersion": ["1.0"]
},

Thanks, David
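The current and deprecated version lists could, in principle, be derived from the per-entry status rather than maintained by hand. The sketch below assumes entries shaped like the earlier example and groups them by `institution_id`; that grouping key and the function name are illustrative assumptions only.

```python
from collections import defaultdict


def version_lists(data):
    """Collect current and deprecated versions per institution."""
    lists = defaultdict(lambda: {"currentVersion": [], "deprecatedVersion": []})
    for attrs in data.values():
        if attrs["VersionInfo"] == "current":
            key = "currentVersion"
        else:
            key = "deprecatedVersion"
        lists[attrs["institution_id"]][key].append(attrs["version"])
    return dict(lists)


# Abbreviated ImperialCollege entries from the example JSON in this thread.
data = {
    "CMIP6.input4MIPs.ImperialCollege.ImperialCollege-1-0": {
        "institution_id": "ImperialCollege", "version": "1.0", "VersionInfo": "deprecated"},
    "CMIP6.input4MIPs.ImperialCollege.ImperialCollege-1-1": {
        "institution_id": "ImperialCollege", "version": "1.1", "VersionInfo": "current"},
    "CMIP6.input4MIPs.ImperialCollege.ImperialCollege-2-0": {
        "institution_id": "ImperialCollege", "version": "2.0", "VersionInfo": "current"},
}
summary = version_lists(data)
print(summary)
```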

@MartinaSt
Collaborator

@durack1, @davidhassell: Following today's discussion in the input4MIPs meeting, I propose that we add an attribute "VersionLink", which would allow linking to a document (PDF) with a detailed description of the issue with the dataset version.

@durack1
Contributor Author

durack1 commented May 14, 2019

@MartinaSt just circling back around on this. Is there an API to call to query the DOIs issued by the DKRZ citation service? As the archive has grown so much now, I'm reluctant to try and hand-spin this versioning information.

I'm looking into harvesting all the metadata attributes from the ESGF project so I can populate all the fields comprehensively

@MartinaSt
Collaborator

@durack1 : I'd like to come back to this version documentation issue. As the errata are not accessible for the data citation (an access by DRS CV is required) the revised version information is the only possibility to access+display version/errata information on the DOI landing page.

What are your plans with this version documentation? Any idea about a schedule?

@MartinaSt
Collaborator

@durack1, my use for this version information is to inform users about errors on the DOI granularity, which is for @mauzey1's example: 'input4MIPs.CMIP6.OMIP.MRI.MRI-JRA55-do-1-4-0'. Thus what I need is:

  • DRS on DOI granularity (see above)
  • deprecated [True/False] as information if this is the latest version or not
  • reason for deprecation, which I called 'VersionNotes' in an old comment for this issue

Apart from that, you had the idea to include the doi in the version information as well. If we still want to do that, that would define the granularity for the version information.

@durack1
Contributor Author

durack1 commented Jan 15, 2020

@mauzey1 it'd be great to get back to this and get it done. What did you need from me to get this finalized?

@mauzey1

mauzey1 commented Feb 26, 2020

@durack1
input4MIPs_report.txt
Here is the latest table I have made from input4MIPs. Below is an excerpt of the table.

...
        "input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-2": {
            "institutionId": "PCMDI",
            "sourceId": "PCMDI-AMIP-1-1-2",
            "mipTable": "CMIP",
            "datatype": "SSTsAndSeaIce",
            "version": "1.1.2",
            "id": {
                "input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-2.ocean.fx.areacello.gn.v20170419": "latest",
                "input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-2.ocean.fx.sftof.gn.v20170419": "latest",
                "input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-2.ocean.mon.tos.gn.v20170419": "latest",
                "input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-2.ocean.mon.tosbcs.gn.v20170419": "latest",
                "input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-2.seaIce.mon.siconc.gn.v20170419": "latest",
                "input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-2.seaIce.mon.siconcbcs.gn.v20170419": "latest"
            },
            "doi": "10.22033/ESGF/input4MIPs.1161",
            "title": "PCMDI AMIP SST and sea-ice boundary conditions version 1.1.2"
        },
        "input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-3": {
            "institutionId": "PCMDI",
            "sourceId": "PCMDI-AMIP-1-1-3",
            "mipTable": "CMIP",
            "datatype": "SSTsAndSeaIce",
            "version": "1.1.3",
            "id": {
                "input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-3.ocean.fx.areacello.gn.v20171031": "latest",
                "input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-3.ocean.fx.sftof.gn.v20171031": "latest",
                "input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-3.ocean.mon.tos.gn.v20171031": "latest",
                "input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-3.ocean.mon.tosbcs.gn.v20171031": "latest",
                "input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-3.seaIce.mon.siconc.gn.v20171031": "latest",
                "input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-3.seaIce.mon.siconcbcs.gn.v20171031": "latest"
            },
            "doi": "10.22033/ESGF/input4MIPs.1735",
            "title": "PCMDI AMIP SST and sea-ice boundary conditions version 1.1.3"
        },
        "input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-4": {
            "institutionId": "PCMDI",
            "sourceId": "PCMDI-AMIP-1-1-4",
            "mipTable": "CMIP",
            "datatype": "SSTsAndSeaIce",
            "version": "1.1.4",
            "id": {
                "input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-4.ocean.fx.areacello.gn.v20180427": "deprecated",
                "input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-4.ocean.fx.sftof.gn.v20180427": "deprecated",
                "input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-4.ocean.mon.tos.gn.v20180427": "deprecated",
                "input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-4.ocean.mon.tosbcs.gn.v20180427": "deprecated",
                "input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-4.seaIce.mon.siconc.gn.v20180427": "deprecated",
                "input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-4.seaIce.mon.siconcbcs.gn.v20180427": "deprecated"
            },
            "doi": "10.22033/ESGF/input4MIPs.2204",
            "title": "PCMDI AMIP SST and sea-ice boundary conditions version 1.1.4"
        },
...

Each entry has the activity_id, mip_era, target_mip_list, institution_id, source_id, source_version, and dataset_category of a group of datasets. Each group of activity_id, mip_era, target_mip, institution_id, and source_id are used to get a DOI from https://cera-www.dkrz.de/WDCC/ui/cerasearch/cerarest/exportcmip6. Each entry has the list of dataset IDs along with their dataset_status (deprecated/latest/None).

This table was made with this Python script: https://github.com/mauzey1/esgf-utils/blob/d1e4215fd36ffa67f3a46bba7a2cd324ce5121b2/update-reports/input4MIPs_report.py
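For readers who do not open the linked script, the grouping step can be illustrated with a short sketch. This is not the script itself: it assumes dataset IDs whose first five dot-separated components are activity_id, mip_era, target_mip, institution_id and source_id (as in the excerpt), and it omits the DOI lookup entirely. The function name is hypothetical.

```python
def build_table(statuses):
    """Group dataset ids by their first five DRS components.

    statuses: mapping of full dataset id -> "latest" / "deprecated" / None.
    """
    table = {}
    for dataset_id, status in statuses.items():
        parts = dataset_id.split(".")
        _activity_id, _mip_era, target_mip, institution_id, source_id = parts[:5]
        group = ".".join(parts[:5])
        entry = table.setdefault(group, {
            "institutionId": institution_id,
            "sourceId": source_id,
            "mipTable": target_mip,
            "id": {},
        })
        entry["id"][dataset_id] = status
    return table


# Three dataset ids copied from the excerpt above.
statuses = {
    "input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-3.ocean.mon.tos.gn.v20171031": "latest",
    "input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-3.seaIce.mon.siconc.gn.v20171031": "latest",
    "input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-4.ocean.mon.tos.gn.v20180427": "deprecated",
}
table = build_table(statuses)
print(sorted(table))
```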

@durack1
Contributor Author

durack1 commented May 20, 2020

@MartinaSt @davidhassell we should really circle around on this so we can finalize the forcing versioning json and you guys can start using it. How does the above format #2 look?

@MartinaSt
Collaborator

MartinaSt commented May 25, 2020

@durack1, thanks for moving this forward.

I use this information to document version and error information on the DOI landing pages. Therefore I need information on (see above):

  • version: version number
  • versionInfo: "deprecated" in case of error
  • versionNotes: information on the error in case of a deprecated data version

@durack1
Contributor Author

durack1 commented May 25, 2020

@MartinaSt, it seems the above includes such information, so e.g.

version/versionInfo:
"id": {
"input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-3.ocean.fx.areacello.gn.v20171031": "latest",
...
"id": {
"input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-4.ocean.fx.areacello.gn.v20180427": "deprecated",

Regarding the versionNotes, what format would you want for this, a drop-down selection, or free-form text? Just wondering the use case and whether a char limit etc is required.

@davidhassell please chime in now, as once this format is set it's not likely to be revisited and may require ES-DOCs to harvest information separate to this version json data

@MartinaSt
Collaborator

@durack1 Sorry, I did not scroll to the right to see the deprecated information. Regarding the errata information: the content is up to the data creators, so free text. The important part for me is that I can show reliable and meaningful errata information for deprecated data, or the reason for deprecation, on the DOI landing page.

@durack1
Contributor Author

durack1 commented May 26, 2020

@MartinaSt no problem, so how about if we have a situation where two datasets are latest? With the PCMDI-AMIP-X-Y-Z data, the 1.1.3 (from memory) was the official CMIP6 release, whereas normally a 6 monthly update is released, which deprecates the previous version (but not 1.1.3 which will always be available as "latest"). Is such logic a problem?

Are there any other considerations we need to factor in whilst finalizing the format?
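The "more than one latest" logic can be made concrete with a sketch. It assumes releases are listed oldest to newest and that certain versions are pinned so that they are never deprecated; the function name, the `pinned` parameter and the version strings are illustrative and do not reflect the actual status of any published dataset.

```python
def release_statuses(versions, pinned=()):
    """Mark the newest release and any pinned releases as "latest".

    versions: release identifiers ordered from oldest to newest.
    pinned:   releases that stay "latest" even after newer ones appear.
    """
    newest = versions[-1]
    return {
        version: "latest" if version == newest or version in pinned else "deprecated"
        for version in versions
    }


# Illustrative version strings only.
statuses = release_statuses(["1.1.2", "1.1.3", "1.1.4", "1.1.5"], pinned={"1.1.3"})
print(statuses)
```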

@MartinaSt
Collaborator

MartinaSt commented May 26, 2020

@durack1 yes, the errata might be related to some but not all datasets. The "versionNote" is directly related to the "deprecated/latest/None" information and thus including it breaks the proposed json.

Maybe we can assume that in such a case there is only one reason for the deprecation of some (but not all) datasets, and include it in the upper level alongside "version"?

@durack1
Contributor Author

durack1 commented May 26, 2020

@MartinaSt well how about we proceed this way, we'll work to generate the 6.2.37 version of the json, review this and once we have a finalized format generate all the previous versions back to initial 6.0.0 (20th December 2016) release, sound good?

@MartinaSt
Collaborator

@durack1 Any suggestion to get this finalized is a good one! So from my view: Go ahead!

Just two comments: I will wait for the final format before I make any code changes, and I will make the change if the version includes the errata information for the users (otherwise the deprecation flag does not add much information to the already available version, which is part of the DRS). It's a matter of spending my time most efficiently on the different projects I am involved in...

@durack1
Contributor Author

durack1 commented May 28, 2020

@MartinaSt completely understood, and agree (spending valuable time appropriately).

@mauzey1 is addressing a number of high priority requirements, and this is in the queue after these, so I'd hope we can get the latest version json finalized first, then we can double check it contains everything in the format required and then roll back to the start.

There are a couple of new datasets that have started to appear for review, so 6.2.37 will be incrementing over the coming months

@durack1
Contributor Author

durack1 commented Jul 2, 2020

@MartinaSt @mauzey1 is working on this as a second priority to the CMIP publication page. He has already generated the attached, and so we'll need to tweak this format to get to the finish line
input4MIPs_report.json.txt

@MartinaSt
Collaborator

@durack1 @mauzey1 Ok, now I am on the right page. Thanks for the JSON and your effort. It looks good except for:

Sorry to be persistent, but the most important information for me is the errata information, or in other words the reason for deprecation in the JSON for the "deprecated" cases. Is it possible to add such information to the JSON?

@durack1
Contributor Author

durack1 commented Jul 2, 2020

@MartinaSt yes, sorry for my loose comments. That should be possible: if a dataset is "deprecated", we could add an additional field such as deprecationNotes with a brief description. Do you have any suggestion regarding character counts etc., or do you have no limitations?

@MartinaSt
Collaborator

Thanks @durack1 !
I am not aware of a hard character limitation on my end. But brief is certainly good.

@mauzey1

mauzey1 commented Jul 23, 2020

@durack1 The ESGF database only provides whether or not a dataset was deprecated; it does not provide any notes about why it was deprecated. How will we get this information? Would we just contact the people who published the datasets for the reason for deprecation, and manually add it to the rest of the information?

@durack1
Contributor Author

durack1 commented Jul 23, 2020

@mauzey1 thanks for circling around on this. I have this information, so let me know where this should be put, so we can integrate it

@mauzey1

mauzey1 commented Jul 23, 2020

@durack1 Is there a github repo where you could store that information? It would make that info more accessible and easy to update.

@durack1
Contributor Author

durack1 commented Jul 23, 2020

I was hoping that this information would be stored alongside the version info. There is no current github based index, rather I would need to generate this for inclusion alongside the versioning info

@mauzey1

mauzey1 commented Jul 23, 2020

@durack1 Is there a file with this information you can post here? I would like to rewrite my program that created the input4MIPs version info file to include that information. That way if changes were to happen to the status of the datasets, we can update the deprecation info and rebuild the version info file.
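The rebuild step described here might look roughly like the sketch below, which merges a separately maintained mapping of deprecation reasons into the report. The `deprecationNotes` field name follows the suggestion earlier in this thread; the function name, the structure of the notes mapping and the placeholder reason text are assumptions.

```python
def add_deprecation_notes(report, notes):
    """Return a copy of the report with notes attached to deprecated groups.

    report: mapping of group DRS id -> entry with an "id" status mapping.
    notes:  mapping of group DRS id -> free-text reason for deprecation.
    """
    updated = {}
    for group, entry in report.items():
        entry = dict(entry)
        if "deprecated" in entry["id"].values():
            # Empty string when no reason has been supplied yet.
            entry["deprecationNotes"] = notes.get(group, "")
        updated[group] = entry
    return updated


report = {
    "input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-3": {
        "id": {"input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-3.ocean.mon.tos.gn.v20171031": "latest"},
    },
    "input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-4": {
        "id": {"input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-4.ocean.mon.tos.gn.v20180427": "deprecated"},
    },
}
# Placeholder text; real reasons would come from the data providers.
notes = {"input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-4": "<reason supplied by the data provider>"}
merged = add_deprecation_notes(report, notes)
print(merged["input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-4"]["deprecationNotes"])
```

Keeping the notes in their own mapping means the report can be regenerated from ESGF at any time without losing the hand-collected reasons.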

@durack1
Contributor Author

durack1 commented Jul 23, 2020

@mauzey1 that will be something that I need to go through notes (not digital) to generate - so a job for me. Do you have a list of the datasets that have been identified as deprecated? That will simplify my job, and I should be able to post text within this thread for simplicity.

We will obviously have to figure out where to put this going forward, but get up-to-date first

@mauzey1

mauzey1 commented Jul 27, 2020

@durack1 Here is a list of datasets listed as "deprecated" from input4MIPs.
deprecated_list.txt

@durack1 durack1 transferred this issue from PCMDI/input4MIPs-cmor-tables May 19, 2023
@durack1
Contributor Author

durack1 commented May 19, 2023

Migrating this issue from https://github.com/PCMDI/input4MIPs-cmor-tables to this repo

@durack1 durack1 added the CMIP6 Related to previous-era data label Aug 21, 2024
@durack1 durack1 self-assigned this Aug 28, 2024
@durack1
Contributor Author

durack1 commented Aug 28, 2024

@davidhassell @charliepascoe just pinging you guys on this thread. We've (for years) had an aspiration to bring the forcing datasets used for simulations into the documentation list, so this will be our best way to achieve this - we now have a live repo with the latest data identity and status, we just need to wrap this up for documentation purposes
