[Go-essp-tech] Handling missing data in the CMIP5 archive

ag.stephens at stfc.ac.uk ag.stephens at stfc.ac.uk
Tue May 3 05:01:22 MDT 2011


Dear all,

Thanks for your responses. I concur that it is very important to agree on when the QC manager might be able to "relax" the criteria for passing QC-L2. As long as we document the guidelines and they are clear to QC managers and data providers, I think we are safe. I look forward to hearing the outcomes of the GO-ESSP discussions on this topic.

Thanks,

Ag

From: go-essp-tech-bounces at ucar.edu [mailto:go-essp-tech-bounces at ucar.edu] On Behalf Of Karl Taylor
Sent: 29 April 2011 21:53
To: Michael Lautenschlager
Cc: go-essp-tech at ucar.edu
Subject: Re: [Go-essp-tech] Handling missing data in the CMIP5 archive

Hi Michael,

I agree we should decide the criteria, and we'll have an opportunity for this at the GO-ESSP meeting.

Have a good weekend,
Karl

On 4/29/11 3:10 AM, Michael Lautenschlager wrote:

Hi Karl et al.,



yes, I agree that we cannot expect all variables from all models and all
centres. We have to deal with differing completeness for DOI-published
data entities. This is not a problem for DOI data publication because
each individual DOI points to the description and to the complete list of
data behind this entity.



But what we should discuss and decide on are the criteria for
completeness and for the "QC-L2 passed" flag (see Frank's earlier
response) for accepting data entities into the CMIP5/IPCC-AR5 reference
data archive and for DOI data publication. This is exactly one of the
topics I would like to discuss at the ESGF day and during the GO-ESSP
meeting.



Ag's request raises the question from a theoretical to a practical level,
and we have to agree on a common procedure in order to guarantee
consistency across archives in QC level assignment and in the completeness
of data entities.



   Best wishes, Michael





On 28.04.2011 19:38, Karl Taylor wrote:

Dear Ag,



There is another possible way of handling the "missing data" issue.
I'm not sure that a dataset should be required to be complete
(i.e., required to include all time slices) to be considered eligible
for DOI assignment. That is, we could relax the criteria. Note that
I don't think we require *all* variables requested within a single
dataset to be present, so some datasets will indeed be incomplete but
still be eligible for a DOI. I think the QC procedure should be to check
with the modeling group, and if they can't supply the missing
time slices, then we somehow note this flaw in the dataset
documentation and, if the other QC checks are passed, assign it a DOI.



The criteria for getting a DOI should be that there are no known
errors in the data itself and that there are no major problems with
the metadata. In this case the data will be reliable, and analysts
will be welcome to use it and publish results, so I think it should be
assigned a DOI.
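Karl's relaxed criteria can be sketched as a small decision function. This is a minimal illustration under stated assumptions, not an agreed CMIP5 rule: the names `QCReport` and `doi_decision` are hypothetical, and the two boolean checks stand in for whatever the real QC levels actually test. Missing time slices that the modeling group cannot supply do not block the DOI; they only produce notes for the dataset documentation.

```python
from dataclasses import dataclass, field


@dataclass
class QCReport:
    # Hypothetical summary of a dataset's QC results (not a real ESGF/QC structure).
    known_data_errors: bool        # any known errors in the data values themselves
    major_metadata_problems: bool  # any major problems with the metadata
    # Gaps the modeling group confirmed it cannot supply, e.g. "2005-03 to 2005-04".
    unrecoverable_gaps: list[str] = field(default_factory=list)


def doi_decision(report: QCReport) -> tuple[bool, list[str]]:
    """Return (eligible_for_doi, documentation_notes) under the relaxed criteria."""
    if report.known_data_errors or report.major_metadata_problems:
        # Known data errors or major metadata problems still block the DOI.
        return False, []
    # Incompleteness alone does not block the DOI; it is recorded instead.
    notes = [f"Missing time slices (not available from modeling group): {gap}"
             for gap in report.unrecoverable_gaps]
    return True, notes
```

For example, a dataset with one unrecoverable gap and no other problems would be eligible, with a single documentation note describing the gap.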



What do others think?

Best regards,
Karl





On 4/28/11 3:12 AM, ag.stephens at stfc.ac.uk wrote:

Dear all,



At BADC we have come across our first "missing data" issue in the CMIP5 datasets we are ingesting. We have an example of some missing months for a particular set of variables that was revealed when running the QC code from DKRZ.
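For monthly data, the kind of gap described here can be located by comparing the months present on the time axis with the full expected range. This is a minimal sketch, not the DKRZ QC code: `find_missing_months` is a hypothetical helper, it assumes one time step per calendar month, and it takes dates already decoded from the files' time coordinate. Calendar handling (e.g. 360-day model calendars) is deliberately left out.

```python
from datetime import date


def _month_index(year: int, month: int) -> int:
    # Running month count, so consecutive calendar months differ by exactly 1.
    return year * 12 + (month - 1)


def find_missing_months(time_steps: list[date], start: date,
                        end: date) -> list[tuple[int, int]]:
    """Return the (year, month) pairs between start and end, inclusive,
    for which no time step is present."""
    present = {_month_index(d.year, d.month) for d in time_steps}
    missing = []
    first = _month_index(start.year, start.month)
    last = _month_index(end.year, end.month)
    for i in range(first, last + 1):
        if i not in present:
            year, month0 = divmod(i, 12)
            missing.append((year, month0 + 1))
    return missing
```

For example, time steps for January, February and May 2005, checked against January to May 2005, give `[(2005, 3), (2005, 4)]`.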



It would be very useful for the CMIP5 archive managers to make an authoritative statement about how we should handle missing data time steps in the archive.



I propose the following response when a Data Node receives a dataset in which time steps are missing:



  1. QC manager (i.e. whoever runs the QC code) informs Data Provider that there is missing data in a dataset (specifying full DRS structure and date range missing).



  2a. If Data Provider says "no, cannot provide this data" then the affected datasets cannot get a DOI and cannot be part of the "crystallised archive". STOP



  2b. Data Provider re-generates files, data is re-ingested, new version is generated, QC is re-run, all is good. STOP



  2c. Data Provider cannot re-generate but wants to pass QC - so needs to create the required files full of missing data.



  3. Data Provider creates missing data files and sends, data re-ingested, new version is generated, QC re-run, all good. STOP
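The proposed responses can be summarised as a lookup from the Data Provider's answer to the outcome for the affected dataset. The sketch is hypothetical (the names `ProviderResponse`, `Outcome` and `resolve` are illustrative only) and assumes that in cases 2b and 3 the re-run QC passes, which is what "all is good" means in the proposal.

```python
from dataclasses import dataclass
from enum import Enum


class ProviderResponse(Enum):
    CANNOT_PROVIDE = "2a"      # "no, cannot provide this data"
    REGENERATE = "2b"          # provider re-generates the real files
    FILL_WITH_MISSING = "2c"   # provider sends files full of missing data (step 3)


@dataclass(frozen=True)
class Outcome:
    doi_eligible: bool         # may get a DOI and join the crystallised archive
    new_version: bool          # files re-ingested and a new dataset version generated
    filled_with_missing: bool  # some time steps are explicitly filled with missing data


# Outcomes after step 1 (QC manager reports the missing dates to the Data Provider).
OUTCOMES = {
    ProviderResponse.CANNOT_PROVIDE: Outcome(False, False, False),   # 2a, STOP
    ProviderResponse.REGENERATE: Outcome(True, True, False),         # 2b, QC re-run, STOP
    ProviderResponse.FILL_WITH_MISSING: Outcome(True, True, True),   # 2c then 3, STOP
}


def resolve(response: ProviderResponse) -> Outcome:
    """Look up what happens to the affected dataset for a given provider response."""
    return OUTCOMES[response]
```

Keeping the outcomes in one table makes it easy to check that every Data Node applies the same rule, which is the consistency across archives the thread is asking for.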



In cases 2a and 2c it would also be very useful if the dataset is annotated to inform the user which dates have been FILLED with missing data. This would, I believe, be in the QC logs but we might want a more prominent record of this if possible.
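A prominent record of filled dates could be a short list of contiguous ranges rather than a raw month list. The sketch below is one possible way to build such a record, assuming monthly data given as (year, month) pairs; `month_ranges` is a hypothetical helper, and where the record would be stored (QC logs, dataset documentation or metadata) is left open, as in the proposal.

```python
def month_ranges(months: list[tuple[int, int]]) -> list[str]:
    """Collapse (year, month) pairs into 'YYYY-MM' or 'YYYY-MM to YYYY-MM' ranges."""
    # Running month count; duplicates are dropped and the result is sorted.
    idx = sorted({y * 12 + (m - 1) for y, m in months})
    ranges: list[tuple[int, int]] = []
    for i in idx:
        if ranges and i == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], i)  # extend the current contiguous run
        else:
            ranges.append((i, i))            # start a new run

    def fmt(i: int) -> str:
        year, month0 = divmod(i, 12)
        return f"{year:04d}-{month0 + 1:02d}"

    return [fmt(a) if a == b else f"{fmt(a)} to {fmt(b)}" for a, b in ranges]
```

For example, `[(2005, 3), (2005, 4), (2006, 1), (2005, 12)]` becomes `["2005-03 to 2005-04", "2005-12 to 2006-01"]`.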



Cheers,



Ag

BADC
--

Scanned by iCritical.





_______________________________________________

GO-ESSP-TECH mailing list

GO-ESSP-TECH at ucar.edu

http://mailman.ucar.edu/mailman/listinfo/go-essp-tech



