[Go-essp-tech] Comments on Tuesday telco on QC and DOI
Bob Drach
drach at llnl.gov
Tue Mar 16 13:35:57 MDT 2010
Hi Michael,
It was mentioned in today's telco that the ESG publisher currently
performs some QC checks automatically. Specifically, the publisher
checks that:
- Discovery data - especially DRS fields - are identifiable and have
correct values. If any mandatory fields are missing or invalid, an
error is raised and the data cannot be published.
- Standard names are valid. A warning is issued if the standard name
is missing or unrecognized.
- Coordinate axes are recognizable - particularly time. A calendar is
defined.
- Time values are monotonic and do not overlap between files. This is
checked when aggregations are generated. It is not considered an
error if timepoints are missing.
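The last check above - that time values are monotonic within a file and
do not overlap between files, while gaps are tolerated - could be
sketched roughly as follows. This is only an illustration of the idea,
not the actual publisher code; the function name and the representation
of per-file time values as plain lists are assumptions:

```python
from typing import Dict, List


def check_time_axes(file_times: Dict[str, List[float]]) -> List[str]:
    """Return a list of problems found in the time axes of a set of files.

    Time values must increase strictly within each file and must not
    overlap between consecutive files; gaps (missing timepoints) are
    allowed, matching the behaviour described for the publisher.
    """
    problems = []
    # Order files by their first time value, as an aggregation would.
    ordered = sorted(file_times.items(), key=lambda kv: kv[1][0])
    prev_name, prev_last = None, None
    for name, times in ordered:
        # Within-file check: strictly increasing time values.
        if any(b <= a for a, b in zip(times, times[1:])):
            problems.append(f"{name}: time values are not monotonically increasing")
        # Between-file check: no overlap with the preceding file.
        if prev_last is not None and times[0] <= prev_last:
            problems.append(f"{name}: time range overlaps {prev_name}")
        prev_name, prev_last = name, times[-1]
    return problems
```

For example, two files covering days 0-60 and 90-120 would pass (the
gap is not an error), while a second file starting at day 60 would be
flagged as overlapping the first.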
There seems to be reasonable consensus that the quality control flag
will be created and updated by the publisher, will be associated with
the publication-level dataset and displayed with that dataset on the
gateway. The question remains how to deal with datasets for which
either (a) some of the variables in the dataset were not generated by
the modelling group, or (b) a small number of variables (whatever that
means) did not pass quality control:
(a) Experience suggests that some groups will not submit all
variables for an experiment, or will not generate and submit them at
the same time. When the publishing group is not the same as the
modelling group (e.g. where a center has submitted data to one of the
core nodes for publication and archival) it is not always obvious
when a dataset is 'complete'. Should the publisher wait until the
modellers say 'the dataset is complete', or publish partial datasets
with the idea that only QC level 3 would require the 'complete' dataset?
(b) If one or a few variables in a dataset are found to be in error,
there may be a considerable delay before the modelling center can
replace the erroneous data. Again, should the remaining valid data be
published, with the idea that some users will not care about the
missing variables but want prompt access to the remaining variables?
One last comment: in AR4 some datasets (and some variables within
those datasets) were much more heavily subscribed than others. In
particular, the 20th century historical runs were downloaded with
greater frequency. If it were possible to anticipate which datasets
would be of greatest interest, it would be a good idea to prioritize
the associated QC and publication.
Best regards,
Bob
On Mar 15, 2010, at 8:06 AM, Michael Lautenschlager wrote:
> Dear all,
>
> as Stephen just announced, I merged the contributions into two
> documents for tomorrow's telco. The QC document contains the
> complete set of flow charts and highlights open issues and points
> of discussion.
>
> Best wishes, Michael
>
> --
> ---------------
> Dr. Michael Lautenschlager
>
> German Climate Computing Centre (DKRZ)
> World Data Center Climate (WDCC)
> ADDRESS: Bundesstrasse 45a, D-20146 Hamburg, Germany
> PHONE: +4940-460094-118
> E-Mail: lautenschlager at dkrz.de
>
> URL: http://www.dkrz.de/
> http://www.wdc-climate.de/
>
> Attachments: data-citations-100311-mil-bnl.pdf,
> CMIP5-AR5-QualityControl-20100315.pdf
> _______________________________________________
> GO-ESSP-TECH mailing list
> GO-ESSP-TECH at ucar.edu
> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech