[Go-essp-tech] Handling missing data in the CMIP5 archive

ag.stephens at stfc.ac.uk
Thu Apr 28 04:12:51 MDT 2011


Dear all, 

At BADC we have come across our first "missing data" issue in the CMIP5 datasets we are ingesting. We have an example of some missing months for a particular set of variables that was revealed when running the QC code from DKRZ.

It would be very useful for the CMIP5 archive managers to make an authoritative statement about how we should handle missing data time steps in the archive.

I propose the following response when a Data Node receives a dataset in which time steps are missing:

 1. The QC manager (i.e. whoever runs the QC code) informs the Data Provider that data is missing from a dataset, specifying the full DRS structure and the missing date range.

 2a. If the Data Provider says "no, cannot provide this data", then the affected datasets cannot get a DOI and cannot be part of the "crystallised archive". STOP

 2b. If the Data Provider can re-generate the files: the data is re-ingested, a new version is generated, QC is re-run, all is good. STOP

 2c. If the Data Provider cannot re-generate the files but wants to pass QC, it needs to create the required files, filled with missing data values.

 3. (Following 2c) The Data Provider creates the missing-data files and sends them; the data is re-ingested, a new version is generated, QC is re-run, all good. STOP
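
As a rough illustration of step 1, the sketch below shows one way to work out which months are missing and to group them into date ranges for the report to the Data Provider. It is not the DKRZ QC code. The function names (missing_months, group_ranges) are hypothetical, and it assumes monthly data where each time step can be reduced to a (year, month) pair.

```python
from typing import Iterable, List, Tuple

YearMonth = Tuple[int, int]


def _index(ym: YearMonth) -> int:
    """Map (year, month) to a running month count so months can be compared."""
    return ym[0] * 12 + (ym[1] - 1)


def missing_months(present: Iterable[YearMonth],
                   start: YearMonth,
                   end: YearMonth) -> List[YearMonth]:
    """Return every (year, month) in [start, end] that is absent from `present`."""
    have = set(present)
    gaps = []
    for idx in range(_index(start), _index(end) + 1):
        ym = (idx // 12, idx % 12 + 1)
        if ym not in have:
            gaps.append(ym)
    return gaps


def group_ranges(gaps: List[YearMonth]) -> List[Tuple[YearMonth, YearMonth]]:
    """Collapse a sorted list of missing months into (first, last) ranges."""
    ranges: List[Tuple[YearMonth, YearMonth]] = []
    for ym in gaps:
        if ranges and _index(ym) == _index(ranges[-1][1]) + 1:
            # Consecutive with the previous gap: extend the current range.
            ranges[-1] = (ranges[-1][0], ym)
        else:
            ranges.append((ym, ym))
    return ranges


# Hypothetical example: a dataset expected to cover Nov 1999 to Apr 2000.
present = [(1999, 11), (2000, 2), (2000, 4)]
gaps = missing_months(present, (1999, 11), (2000, 4))
ranges = group_ranges(gaps)
```

The ranges produced this way are what would be quoted, together with the full DRS structure, when contacting the Data Provider.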

In cases 2a and 2c it would also be very useful if the dataset were annotated to inform the user which dates are missing (2a) or have been FILLED with missing data (2c). This would, I believe, be in the QC logs, but we might want a more prominent record of this if possible.
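
To make the idea of a "more prominent record" concrete, here is a minimal sketch of what such an annotation might contain. The field names, the status values and the fill_annotation helper are all assumptions for illustration only; nothing here is an agreed CMIP5 or DRS format, and the dataset identifier is a placeholder.

```python
import json
from typing import Dict, List, Tuple


def fill_annotation(dataset_id: str,
                    date_ranges: List[Tuple[str, str]],
                    status: str) -> Dict:
    """Build a small, serialisable record of affected date ranges.

    `status` is a hypothetical label, e.g. "missing" (case 2a) or
    "filled" (case 2c, files created with missing data values).
    """
    return {
        "dataset_id": dataset_id,
        "status": status,
        "date_ranges": [{"start": s, "end": e} for s, e in date_ranges],
    }


# Placeholder identifier; a real record would carry the full DRS structure.
note = fill_annotation("<drs-dataset-id>", [("1999-12", "2000-01")], "filled")
text = json.dumps(note, indent=2)
```

A record like this could sit alongside the dataset rather than only in the QC logs, so that users see the affected dates without having to read the logs.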

Cheers,

Ag
BADC