[Go-essp-tech] Fwd: Expected number of variables for which quality control will be needed

martin.juckes at stfc.ac.uk
Fri Jan 7 08:46:56 MST 2011


Hello Martina,

I'll pass your comments on the QC tool on to Kevin, who is running it for us. 

Before releasing this data for replication testing, we would like to complete the QC L2 process. To do that, we need to know the criteria for successful completion of QC L2. If, for instance, we have a collection of 1880 files from the rcp45 experiment and have run the QC L2 software on it, how do we determine whether the collection passes or fails?

Regards,
Martin


-----Original Message-----
From: Martina Stockhause [mailto:martina.stockhause at zmaw.de] 
Sent: 07 January 2011 15:37
To: Juckes, Martin (STFC,RAL,SSTD)
Cc: lautenschlager at dkrz.de; go-essp-tech at ucar.edu
Subject: Re: [Go-essp-tech] Fwd: Expected number of variables for which quality control will be needed

  Dear Martin, dear all,

We are happy to read that you have successfully applied the QC tool to
your data. We would expect the results in the QCDB (running on
qc.dkrz.de), but we cannot find them in qc1, qc2 or test_badc.
Please insert the QC results into the QCDB using the Wrapper with the
option --noqc ("qcWrapper --configure=<configure file> --noqc").
For information on the additional QCDB entries in the configure file,
please refer to the qc.conf template file in the qcWrapper directory
of our svn:
(http://svn-mad.zmaw.de/svn/mad/Model/QualCheck/QCWrapper/trunc/qc.conf).
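
As a minimal sketch, the wrapper call quoted above could be scripted from Python. Only the command name and the two options (--configure, --noqc) come from this message; the helper names, the example config path, and the assumption that qcWrapper is on the PATH are illustrative.

```python
import subprocess


def build_qcwrapper_command(config_path):
    """Return the argument list for the qcWrapper call quoted in the message.

    config_path is a hypothetical path to a configure file based on the
    qc.conf template.
    """
    return ["qcWrapper", f"--configure={config_path}", "--noqc"]


def insert_results(config_path):
    """Run the wrapper; assumes qcWrapper is installed and on the PATH."""
    # check=True raises CalledProcessError if the wrapper exits non-zero.
    subprocess.run(build_qcwrapper_command(config_path), check=True)
```

Calling insert_results("qc.conf") would then run the same command line as given above.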

A few remarks on the relevance of the Wrapper and the QCDB for the CMIP5
quality process:
The QC tool runs on data of QC level 1, i.e. data that has been run
through the versioning tool AND the ESG publisher and is therefore in
ESGF DRS syntax. The QCWrapper fills the QCDB with the results of the
QC tool to make them easily accessible for analysis (using
qcDbselect.py). In addition, the QCDB is the source of information for
the QC L3 checks. We cannot start QC L3 until the results of QC L2 are
available to us in the QCDB.

And, of course, we would like feedback on your experience of applying
the QC tool, e.g. the effective wall-clock time needed to run the QC on
a DRS experiment, or a mean value per atomic dataset check.

Our technical colleagues Estani and Stephan are keen to start testing
the data replication. Could we use that part of the UKMO data for
that?

Have a nice weekend,
Martina


On 01/06/2011 05:35 PM, martin.juckes at stfc.ac.uk wrote:
> Hello everyone,
>
> I hope we can reach a conclusion on what constitutes passing QC L2 before the Asheville meeting. We have some UKMO data which appears to pass all the tests, but it is hard to be sure because of the complexity of the output from the quality control code. We would like to declare this data as passed, so that we can move on to the next problem (replicating it to other centres).
>
> The quality control document (attached) lists 4 objective tests under QC level 2:
> (1) Number of records in each file consistent with metadata
> (2) Regular time steps
> (3) Metadata consistent with data request
> (4) Minimum and maximum checked against specified ranges (or a default based on the mean).
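
The four objective tests listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the QC tool's implementation: all function names, the plain-list inputs, and the tolerance are hypothetical, and the time-step check assumes time values on a uniform numeric axis (real calendars, e.g. monthly data, would need calendar-aware handling).

```python
def regular_time_steps(times, rel_tol=1e-6):
    """Test (2): all consecutive time differences equal the first one."""
    if len(times) < 2:
        return True
    steps = [b - a for a, b in zip(times, times[1:])]
    first = steps[0]
    return first > 0 and all(abs(s - first) <= rel_tol * abs(first) for s in steps)


def metadata_consistent(attrs, requested_attrs):
    """Test (3): every requested attribute is present with the requested value."""
    return all(attrs.get(key) == value for key, value in requested_attrs.items())


def within_range(values, valid_min, valid_max):
    """Test (4): minimum and maximum lie inside the specified range."""
    return valid_min <= min(values) and max(values) <= valid_max


def run_objective_checks(times, values, attrs, expected_records,
                         requested_attrs, valid_min, valid_max):
    """Run all four tests for one file and return a name -> bool mapping."""
    return {
        # Test (1): number of records matches what the metadata states.
        "record_count": len(times) == expected_records,
        "regular_time_steps": regular_time_steps(times),
        "metadata": metadata_consistent(attrs, requested_attrs),
        "value_range": within_range(values, valid_min, valid_max),
    }
```

Returning one boolean per test keeps the per-file result easy to record and retrieve; how such results should be combined into a pass or fail for a whole collection is exactly the open question in this thread.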
>
> An email from Martina this morning implied that subjective tests would only come in at the QC L3 stage or if there is some doubt about the objective tests.
>
> I'm not clear why the process is so complex when the document only specifies a small number of objective tests -- though the workflow itself is clearly complex (ensuring that the test results for all files are recorded and retrievable).
>
> Regards,
> Martin
>
> -----Original Message-----
> From: go-essp-tech-bounces at ucar.edu [mailto:go-essp-tech-bounces at ucar.edu] On Behalf Of Michael Lautenschlager
> Sent: 06 January 2011 16:16
> To: go-essp-tech at ucar.edu
> Subject: Re: [Go-essp-tech] Fwd: Expected number of variables for which quality control will be needed
>
> Hi Martina, Bryan, Martin, Karl, and .....
>
> this discussion fits one that Karl raised shortly before
> Christmas, when Martina and Frank were stranded in San Francisco because
> snow had shut down air traffic in Europe. Karl had been asking how to
> interpret the QC-L2 output. This is connected to two
> more questions:
> the complexity of QC-L2 checks (What can be achieved in an acceptable
> period of time?) and
> the definition of criteria for the assignment of the data flag "QC-L2
> passed".
>
> The discussion of the complexity of QC-L2 checks has just started on this
> thread and should be continued. But I would also like to see a
> discussion about the criteria for assigning "QC-L2 passed" to CMIP5
> data. The problem I see is that QC-L2 does not give a clear yes/no or
> black/white decision (as CMOR-2 compliance does). QC-L2 also produces
> grey results, and we have to deal with these shades of grey.
> How much grey can we accept without losing too much quality?
>
> I think we have to discuss precisely what tests can be achieved in QC-L2
> with respect to work load and benefit and develop guidelines to weight
> the results.
>
> Since I will not be continuously available over the next weeks, I suggest that
> Martina and Frank own this discussion for DKRZ, as Stephan and Estani do
> for the technical part.
>
> With respect to our upcoming GO-ESSP meeting in May 2011 in Asheville, I
> think we could present our CMIP5 quality control management and discuss
> it in a slightly wider community.
>
> Those are my thoughts for the moment on this thread on QC in CMIP5.
>
> Best wishes, Michael
>
> ---------------
> Dr. Michael Lautenschlager
>
> German Climate Computing Centre (DKRZ)
> World Data Center Climate (WDCC)
> ADDRESS: Bundesstrasse 45a, D-20146 Hamburg, Germany
> PHONE:   +4940-460094-118
> E-Mail:  lautenschlager at dkrz.de
>
> URL:    http://www.dkrz.de/
>           http://www.wdc-climate.de/
>
>
> Managing Director: Prof. Dr. Thomas Ludwig
> Registered office: Hamburg
> Hamburg District Court, HRB 39784
>
> On 06.01.2011 12:13, Martina Stockhause wrote:
>>     Hi, Bryan,
>>
>> I expect that
>> - existing errors are found and documented by the QC tool
>> - errors are analysed and categorised by the select script (part of the
>> wrapper package): we are still working on this categorisation and
>> evaluation of errors.
>>
>> The plots are an additional help if the person running the QC is in
>> doubt, and serve to document the QC results. If there were errors in the
>> data that the QC tool did not find, I would expect them to be visible
>> within the first few plots.
>>
>> During QC L3 checks we will double-check the QC L2 results using
>> logfiles and plots for spot checking.
>>
>> Best wishes,
>> Martina
>>
>>
>> On 01/05/2011 06:11 PM, Bryan Lawrence wrote:
>>> Hi Karl, Martina
>>>
>>> So are we really expecting PCMDI to look at more than 10,000 plots per
>>> day as part of QC level 2?
>>>
>>> (Martin's numbers are on an internal wiki, but a back of the envelope
>>> calculation goes something like:
>>>     - one plot per atomic data set,
>>>     - o(10^6) atomic datasets
>>>     - 100 days
>>>     - o(10^4) per day at PCMDI given BADC and DKRZ doing a negligible
>>> amount cf PCMDI under current plans)
>>>
>>> Cheers
>>> Bryan
>>>
>>> ---- Original Message ----
>>>> From: "Juckes, Martin (STFC,RAL,SSTD)"<martin.juckes at stfc.ac.uk>
>>>> To: "Lawrence, Bryan (STFC,RAL,SSTD)"<bryan.lawrence at stfc.ac.uk>, badc<badc-internal at zonda.badc.rl.ac.uk>
>>>> CC: "Pepler, Sam (STFC,RAL,SSTD)"<sam.pepler at stfc.ac.uk>
>>>> Subject: Expected number of variables for which quality control will be needed
>>>> Following the discussion in the CMIP5 meeting this morning, I've put
>>>> some estimates of numbers of variables in
>>>> http://proj.badc.rl.ac.uk/badc/wiki/Ar5Cmip5/MOHCCmip5/VolumePredictions
>>>>
>>>> It looks as though we will have up to 40,000 from UKMO (including
>>>> HiGEM). If we want to do this in around 100 working days (leaving
>>>> some room for repeats), we have 400 plots to verify per day. This
>>>> much is, I think, manageable. If we expand it to, say, a third of
>>>> the CMIP5 experiment, we will have 4,000 per day, which looks
>>>> problematic -- but if it has to be done we could probably keep the
>>>> plot inspection time quite small.
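
The back-of-the-envelope numbers in this thread can be reproduced with a few lines of Python. The figures (40,000 UKMO variables, roughly 10^6 atomic datasets, 100 working days, one plot per atomic dataset) are taken from the messages; the function name is hypothetical.

```python
def plots_per_day(n_datasets, working_days, plots_per_dataset=1):
    """Average number of plots to inspect per working day."""
    return n_datasets * plots_per_dataset / working_days


# UKMO estimate (including HiGEM): 40,000 plots over 100 working days.
ukmo_rate = plots_per_day(40_000, 100)

# Estimate for all of CMIP5: about 10^6 atomic datasets over 100 days.
cmip5_rate = plots_per_day(10**6, 100)

# The 4,000-per-day scenario corresponds to this many plots in 100 days.
scenario_total = 4_000 * 100

print(ukmo_rate, cmip5_rate, scenario_total)
```

This gives 400 plots per day for UKMO alone and 10,000 per day for the whole archive, while 4,000 per day implies 400,000 plots over the 100 days.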
>>>>
>>>> Cheers,
>>>> Martin
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Bryan Lawrence [mailto:bryan.lawrence at stfc.ac.uk]
>>>> Sent: 05 January 2011 15:51
>>>> To: badc
>>>> Cc: Pepler, Sam (STFC,RAL,SSTD); Juckes, Martin (STFC,RAL,SSTD)
>>>> Subject: cmip5 meeting summary
>>>>
>>>> Hi Folks
>>>>
>>>> I've posted a short (2 page) doc suitable for public viewing of where
>>>> we are at with cmip5 support on the MIRP website at
>>>> http://proj.badc.rl.ac.uk/mirp/wiki/CMIP5status
>>>>
>>>> If I can't make future meetings, could Sam or Martin please ensure
>>>> that a version of this document is updated for future meetings.
>>>>
>>>> Cheers
>>>> Bryan
>>>>
>>>>
>>>> --
>>>> Bryan Lawrence
>>>> Director of Environmental Archival and Associated Research
>>>> (NCAS/British Atmospheric Data Centre and NCEO/NERC NEODC)
>>>> STFC, Rutherford Appleton Laboratory
>>>> Phone +44 1235 445012; Fax ... 5848;
>>>> Web: home.badc.rl.ac.uk/lawrence

-- 
----------- DKRZ / Data Management -----------

Martina Stockhause
Deutsches Klimarechenzentrum
Bundesstr. 45a
D-20146 Hamburg
Germany

phone:	+49-40-460094-122
FAX:	+49-40-460094-106
e-mail:	martina.stockhause at zmaw.de

----------------------------------------------
