[Go-essp-tech] Handling missing data in the CMIP5 archive

Karl Taylor taylor13 at llnl.gov
Fri Apr 29 14:52:39 MDT 2011


Hi Michael,

I agree we should decide the criteria, and we'll have an opportunity for 
this at the GO-ESSP meeting.

Have a good weekend,
Karl

On 4/29/11 3:10 AM, Michael Lautenschlager wrote:
> Hi Karl et al.,
>
> yes, I agree that we cannot expect all variables from all models and all
> centres. We have to deal with differing completeness for DOI published
> data entities. This is not a problem for DOI data publication because
> each individual DOI points to the description and to the complete list of
> data which is behind this entity.
>
> But what we should discuss and decide on are the criteria for
> completeness and for the "QC-L2 passed" flag (see Frank's earlier
> response) for accepting data entities in the CMIP5/IPCC-AR5 reference
> data archive and for DOI data publication. This is exactly one of the
> topics I would like to discuss at the ESGF day and during the GO-ESSP
> meeting.
>
> Ag's request raises the question from a theoretical to a practical level
> and we have to agree on a common procedure in order to guarantee
> consistency across archives in QC level assignment and in completeness
> of data entities.
>
>     Best wishes, Michael
>
>
> On 28.04.2011 19:38, Karl Taylor wrote:
>> Dear Ag,
>>
>> There is another possible way of handling the "missing data" issue.
>> I'm not sure that a dataset should be required to be complete
>> (i.e., required to include all time slices) to be considered eligible
>> for DOI assignment.  That is, we could relax the criteria.  Note that
>> I don't think we require *all* variables requested within a single
>> dataset to be present, so some datasets will indeed be incomplete but
>> be eligible for a DOI.  I think the QC procedure should be to check
>> with the modeling group, and if they can't supply the missing
>> time-slices, then we somehow note this flaw in the dataset
>> documentation and if other QC checks are passed, assign it a DOI.
>>
>> The criteria for getting a DOI should be that there are no known
>> errors in the data itself, and that there are no major problems with
>> the metadata.  In this case the data will be reliable, and analysts
>> will be welcome to use it and publish results, so I think it should be
>> assigned a DOI.
>>
>> What do others think?
>>
>> Best regards,
>> Karl
>>
>>
>> On 4/28/11 3:12 AM, ag.stephens at stfc.ac.uk wrote:
>>> Dear all,
>>>
>>> At BADC we have come across our first "missing data" issue in the CMIP5 datasets we are ingesting. We have an example of some missing months for a particular set of variables that was revealed when running the QC code from DKRZ.
>>>
>>> It would be very useful for the CMIP5 archive managers to make an authoritative statement about how we should handle missing data time steps in the archive.
>>>
>>> I propose the following response when a Data Node receives a dataset in which time steps are missing:
>>>
>>>    1. QC manager (i.e. whoever runs the QC code) informs Data Provider that there is missing data in a dataset (specifying full DRS structure and date range missing).
>>>
>>>    2a. If Data Provider says "no, cannot provide this data" then the affected datasets cannot get a DOI and cannot be part of the "crystallised archive". STOP
>>>
>>>    2b. Data Provider re-generates files, data is re-ingested, new version is generated, QC is re-run, all is good. STOP
>>>
>>>    2c. Data Provider cannot re-generate but wants to pass QC - so needs to create the required files full of missing data.
>>>
>>>    3. Data Provider creates missing data files and sends, data re-ingested, new version is generated, QC re-run, all good. STOP
>>>
>>> In cases 2a and 2c it would also be very useful if the dataset is annotated to inform the user which dates have been FILLED with missing data. This would, I believe, be in the QC logs but we might want a more prominent record of this if possible.
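
The check behind step 1 (finding which time steps are missing and reporting them as date ranges) can be sketched as follows. This is a minimal illustration for monthly data only, not the DKRZ QC code; the function names missing_months and to_ranges, and the idea of passing in a plain list of dates, are assumptions made for the example.

```python
from datetime import date


def missing_months(present, start, end):
    """Return the (year, month) pairs between start and end (inclusive)
    that do not occur in `present`, a list of datetime.date objects."""
    have = {(d.year, d.month) for d in present}
    missing = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        if (year, month) not in have:
            missing.append((year, month))
        # Advance one calendar month, rolling over at December.
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return missing


def to_ranges(months):
    """Collapse a sorted list of (year, month) pairs into contiguous
    (first, last) ranges, suitable for a report to the Data Provider."""
    ranges = []
    for ym in months:
        if ranges:
            last_year, last_month = ranges[-1][1]
            if last_month == 12:
                following = (last_year + 1, 1)
            else:
                following = (last_year, last_month + 1)
            if ym == following:
                # Extend the current range by one month.
                ranges[-1] = (ranges[-1][0], ym)
                continue
        ranges.append((ym, ym))
    return ranges


# Hypothetical dataset: 2000-01 to 2001-01 with three months absent.
present = [date(2000, m, 1) for m in range(1, 13) if m not in (3, 4, 12)]
present.append(date(2001, 1, 1))
gaps = to_ranges(missing_months(present, date(2000, 1, 1), date(2001, 1, 1)))
```

The resulting list of ranges could also serve as the more prominent record of filled dates suggested above, whichever of the outcomes 2a to 2c applies.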
>>>
>>> Cheers,
>>>
>>> Ag
>>> BADC