[Go-essp-tech] Versioning in CMIP5 including QC procedure

Martina Stockhause martina.stockhause at zmaw.de
Tue Apr 19 08:56:30 MDT 2011


Hi, Bob,

what I do not understand is that you have the right version syntax on 
the TDS (http://pcmdi3.llnl.gov/thredds/esgcet/catalog.html):
cmip5.output2.INM.inmcm4.1pctCO2.mon.seaIce.OImon.r1i1p1.v20110323/ 
<http://pcmdi3.llnl.gov/thredds/esgcet/18/cmip5.output2.INM.inmcm4.1pctCO2.mon.seaIce.OImon.r1i1p1.v20110323.html>
but the wrong one in the QC DB:
test1=> select * from v_last_version where exp_name='cmip5/output1/INM/inmcm4/1pctCO2';
 exp_id |             exp_name             | exp_type | ads_id |                              ads_name                              | ads_type
--------+----------------------------------+----------+--------+--------------------------------------------------------------------+-----------
   2738 | cmip5/output1/INM/inmcm4/1pctCO2 | DOI_EXP  |   2623 | cmip5/output1/INM/inmcm4/1pctCO2/mon/seaIce/OImon/r1i1p1/snomelt/1 | ATOMIC_DS

Moreover, even the positions of the old-style version and the variable 
are switched in the QC DB.

If you run the versioning tool before ESG publication and then the QC 
tool on the versioned data, you should have no problem applying the 
QC tool.
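
To make the mismatch concrete, here is a rough Python sketch that compares the 
two names quoted above. It is only an illustration, not part of the QC tool or 
the versioning tool: the function names are hypothetical, and it assumes that a 
date-style version is always "v" followed by YYYYMMDD, as in the TDS example.

```python
import re
from datetime import datetime

# Assumption: a date-style version is "v" followed by YYYYMMDD (as in v20110323).
DATE_VERSION = re.compile(r"^v(\d{8})$")


def is_date_version(token):
    """Return True if token looks like v20110323 and encodes a real calendar date."""
    match = DATE_VERSION.match(token)
    if match is None:
        return False
    try:
        datetime.strptime(match.group(1), "%Y%m%d")
    except ValueError:
        return False
    return True


def describe(name, sep):
    """Split a dataset name on sep and report the first date-style version found."""
    parts = name.split(sep)
    for index, part in enumerate(parts):
        if is_date_version(part):
            return {"parts": len(parts), "version": part, "position": index}
    return {"parts": len(parts), "version": None, "position": None}


# The two names quoted in this mail (hypothetical variable names).
tds_name = "cmip5.output2.INM.inmcm4.1pctCO2.mon.seaIce.OImon.r1i1p1.v20110323"
qcdb_name = "cmip5/output1/INM/inmcm4/1pctCO2/mon/seaIce/OImon/r1i1p1/snomelt/1"

print(describe(tds_name, "."))   # date-style version found as the last component
print(describe(qcdb_name, "/"))  # no date-style version found
```

With the names above, the TDS name yields a date-style version as its last 
component, while the QC DB name contains none and ends in the old-style "1" 
after the variable.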

Best wishes,
Martina


On 04/18/2011 09:44 PM, Drach, Bob wrote:
> I'm also in agreement on the salient points:
>
> - QC is an essential part of CMIP5
> - CMIP5 is using date-style versioning. (Yes, I'll remind the data node
> publishers).
>
> I don't think the implementation issues are (or should be) showstoppers.
> We'll find a way to make the QC tool work in our environment.
>
> Regards,
>
> Bob
>
>
> On 4/18/11 1:21 AM, "stephen.pascoe at stfc.ac.uk"<stephen.pascoe at stfc.ac.uk>
> wrote:
>
>> I completely agree with Martin.  For us the key event that dictates the
>> dataset version is when the data is placed in our archive -- the date of that
>> event is what becomes the version number.
>>
>> I also notice that none of the CCCMA, NASA-GISS, or BCC data nodes is using
>> date-versioning!
>>
>> Stephen.
>>
>> ---
>> Stephen Pascoe  +44 (0)1235 445980
>> Centre of Environmental Data Archival
>> STFC Rutherford Appleton Laboratory, Harwell Oxford, Didcot OX11 0QX, UK
>>
>>
>> -----Original Message-----
>> From: go-essp-tech-bounces at ucar.edu [mailto:go-essp-tech-bounces at ucar.edu] On
>> Behalf Of martin.juckes at stfc.ac.uk
>> Sent: 18 April 2011 08:48
>> To: martina.stockhause at zmaw.de; drach1 at llnl.gov; taylor13 at llnl.gov; Lawrence,
>> Bryan (STFC,RAL,RALSP)
>> Cc: go-essp-tech at ucar.edu; michael.lautenschlager at zmaw.de; painter1 at llnl.gov
>> Subject: Re: [Go-essp-tech] Versioning in CMIP5 including QC procedure
>>
>> Hello All,
>>
>> I'd like to back up Martina on this. We do have an agreed version control
>> system defined in the DRS document (or, more precisely, the DRS document
>> defines an aspect of the version control which should be implemented -- namely
>> a subdirectory level which indicates the version).
>>
>> We agreed that the version of a publication dataset should be reflected in the
>> directory structure, and it is clear that this requires that the version of
>> the publication dataset be determined before running the ESG publisher, not by
>> the ESG publisher. The QC software runs on the file system rather than
>> accessing data through the data node software, so consistency between the
>> file-system layout and the publication units is essential. While it is
>> possible to imagine archives in which all access to data is through the data
>> node and the file-system layout is of no interest to anyone but the data node
>> development team, that is clearly not the situation here -- no matter how
>> desirable it might be.
>>
>> QC is a very important part of ensuring consistency of quality in the archive;
>> we really need to make it work.
>>
>> At BADC, we started doing the layout on the same machine as the publishing,
>> but are now moving it to a different machine -- there is no fundamental
>> problem with this (the machine on which the layout is done can, of course,
>> communicate with the machine on which the publishing is done).
>>
>> regards,
>> Martin
>>
>>
>> From: go-essp-tech-bounces at ucar.edu [go-essp-tech-bounces at ucar.edu] on behalf
>> of Martina Stockhause [martina.stockhause at zmaw.de]
>> Sent: 18 April 2011 06:57
>> To: Drach, Bob
>> Cc: GO-ESSP; Painter, Jeff; michael.lautenschlager
>> Subject: Re: [Go-essp-tech] Versioning in CMIP5 including QC procedure
>>
>> Hi, Bob,
>>
>> since not every published dataset is part of the DOI (on the level of
>> experiment), I have to keep track of versions as well, on the dataset and on
>> the experiment (DOI) level. The inhomogeneity of the dataset version syntax is
>> more a problem of version control within the QC than one of the QC L2 checker,
>> the QC L2 analyzer, or the QC L2 result export for QC L3.
>>
>> I do not care whether the homogeneous version syntax is yours or that of BADC
>> and DKRZ, though the latter would save me adaptation effort; what matters is
>> *that* it is homogeneous. Maybe you could talk to Stephen to find an agreement
>> on the version syntax / version handling.
>>
>> I am sorry that I have to insist.
>>
>> Best wishes,
>> Martina
>>
>>
>> On 04/15/2011 11:18 PM, Drach, Bob wrote:
>> Hi Martina,
>>
>> There are a lot of things I like about the layout tool, but one aspect I'm not
>> happy with is that it chooses a dataset version. IMO that logic should reside
>> in the publisher, which has access to the history of dataset publication and
>> dataset definitions. In our environment the layout is done on a different
>> machine than publication, and does not have access to that history.
>> Consequently we support the DRS file layout with the exception of dataset
>> version numbers, which are defined later in the processing stream.
>>
>> Would it be difficult to provide an option for the QC tool to ignore
>> extraneous directories (not defined by DRS)?
>>
>> Best regards,
>>
>> Bob
>>
>>
>> On 4/15/11 4:52 AM, "Martina Stockhause"
>> <martina.stockhause at zmaw.de>  wrote:
>>
>>
>>   Hi, Dean, Karl, and Bob,
>>
>>   there was a discussion started about different types of versioning inside
>> ESGF for CMIP5 data on the QC request tracker (see:
>> http://redmine.dkrz.de/collaboration/issues/321). Jeff wrote: "
>>
>>
>> Bob Drach corrected me on one issue: our PCMDI version numbers are not DRS
>> version numbers, they are just a tool for keeping track of the data received
>> at PCMDI. Thus these version numbers are generated at PCMDI, while DRS version
>> numbers are generated by the data producer. PCMDI does not use Stephen's
>> versioning tool, or the DRS-style version numbers.
>>
>>
>> "
>>
>>
>>
>> Is that right? I thought that we agreed on a versioning procedure using
>> Stephen's tool.
>>
>>
>>
>> And I do have a problem with different ESG publication procedures (QC level 1
>> checks), i.e. different QC procedures at the three partners. Additionally, the
>> inconsistent naming conventions between WDCC / BADC on one side and PCMDI on
>> the other side cannot be handled by the QC Workflow. Since we do a federated
>> QC in three locations, we need to use not only the same tools with the same
>> configurations, so that QC results are comparable, but also the same
>> naming conventions, so that the overall QC process can continue with QC
>> L3 / DOI publication.
>>
>>
>>
>> Thus the question:
>>   Could PCMDI use Stephen's tool for CMIP5 data versioning as well?
>>
>>
>>
>> Best wishes,
>>   Martina
>>
>>
>>

-- 
----------- DKRZ / Data Management -----------

Martina Stockhause
Deutsches Klimarechenzentrum
Bundesstr. 45a
D-20146 Hamburg
Germany

phone:	+49-40-460094-122
FAX:	+49-40-460094-106
e-mail:	martina.stockhause at zmaw.de

----------------------------------------------
