[Go-essp-tech] Comments on Tuesday telco on QC and DOI

Bob Drach drach at llnl.gov
Tue Mar 16 13:35:57 MDT 2010


Hi Michael,

It was mentioned in today's telco that the ESG publisher currently  
performs some QC checks automatically. Specifically, the publisher checks that:

- Discovery data - especially DRS fields - are identifiable and have  
correct values. If any mandatory fields are missing or invalid, an  
error is raised and the data cannot be published.
- Standard names are valid. A warning is issued if the standard name  
is missing or unrecognized.
- Coordinate axes - particularly the time axis - are recognizable,  
and a calendar is defined.
- Time values are monotonic and do not overlap between files. This is  
checked when aggregations are generated. It is not considered an  
error if timepoints are missing.
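To make the last two checks concrete, here is a minimal sketch of what they might look like. The field names, function names, and error conventions below are illustrative assumptions, not the actual ESG publisher code:

```python
# Hypothetical sketch of two of the publisher's automatic QC checks:
# mandatory DRS field validation and time-axis monotonicity/overlap
# checking across files. Names are illustrative, not the real API.

MANDATORY_DRS_FIELDS = ["activity", "product", "institute", "model",
                        "experiment", "frequency", "realm", "variable"]

def check_drs_fields(metadata):
    """Raise an error if any mandatory DRS field is missing or empty,
    in which case the dataset cannot be published."""
    missing = [f for f in MANDATORY_DRS_FIELDS if not metadata.get(f)]
    if missing:
        raise ValueError("Cannot publish: missing DRS fields %s" % missing)

def check_time_axis(file_time_ranges):
    """Check that time values are monotonic and do not overlap
    between files.

    file_time_ranges: list of (first_time, last_time) tuples, one per
    file, in publication order. Gaps (missing timepoints) are allowed;
    overlaps and non-monotonic values are errors.
    """
    prev_last = None
    for first, last in file_time_ranges:
        if last < first:
            raise ValueError("Non-monotonic time values within a file")
        if prev_last is not None and first <= prev_last:
            raise ValueError("Time ranges overlap between files")
        prev_last = last
```

In this sketch a gap between one file's last timepoint and the next file's first timepoint passes silently, matching the point above that missing timepoints are not treated as an error.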

There seems to be reasonable consensus that the quality control flag  
will be created and updated by the publisher, will be associated with  
the publication-level dataset, and will be displayed with that dataset  
on the gateway. The question remains how to deal with datasets for which  
either (a) some of the variables in the dataset were not generated by  
the modelling group, or (b) a small number of variables (whatever that  
means) did not pass quality control:

(a) Experience suggests that some groups will not submit all  
variables for an experiment, or will not generate and submit them at  
the same time. When the publishing group is not the same as the  
modelling group (e.g. where a center has submitted data to one of the  
core nodes for publication and archival), it is not always obvious  
when a dataset is 'complete'. Should the publisher wait until the  
modellers say 'the dataset is complete', or publish partial datasets  
with the idea that only QC level 3 would require the 'complete' dataset?

(b) If one or a few variables in a dataset are found to be in error,  
there may be a considerable delay before the modelling center can  
replace the erroneous data. Again, should the remaining valid data be  
published, with the idea that some users will not care about the  
missing variables but will want prompt access to the remaining variables?

One last comment: in AR4 some datasets (and some variables within  
those datasets) were much more heavily subscribed than others. In  
particular, the 20th century historical runs were downloaded with  
greater frequency. If it were possible to anticipate which datasets  
would be of greatest interest, it would be a good idea to prioritize  
the associated QC and publication.

Best regards,

Bob

On Mar 15, 2010, at 8:06 AM, Michael Lautenschlager wrote:

> Dear all,
>
> as Stephen just announced, I merged the contributions into two  
> documents for tomorrow's telco. The QC document contains the  
> complete set of flow charts and highlights open issues and points  
> of discussion.
>
> Best wishes, Michael
>
> -- 
> ---------------
> Dr. Michael Lautenschlager
>
> German Climate Computing Centre (DKRZ)
> World Data Center Climate (WDCC)
> ADDRESS: Bundesstrasse 45a, D-20146 Hamburg, Germany
> PHONE:   +4940-460094-118
> E-Mail:  lautenschlager at dkrz.de
>
> URL:    http://www.dkrz.de/
>         http://www.wdc-climate.de/
>
> Attachments: <data-citations-100311-mil-bnl.pdf> <CMIP5-AR5-QualityControl-20100315.pdf>
_______________________________________________
GO-ESSP-TECH mailing list
GO-ESSP-TECH at ucar.edu
http://mailman.ucar.edu/mailman/listinfo/go-essp-tech



More information about the GO-ESSP-TECH mailing list