[Go-essp-tech] Versioning in CMIP5 including QC procedure

stephen.pascoe at stfc.ac.uk
Mon Apr 18 02:21:10 MDT 2011


I completely agree with Martin.  For us the key event that dictates the dataset version is when the data is placed in our archive -- the date of that event is what becomes the version number.
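
As an illustration of the date-as-version idea, here is a minimal sketch. It is not the actual archive tooling; the function name and the `vYYYYMMDD` label format are assumptions made for this example.

```python
from datetime import date


def version_from_archive_date(archived_on: date) -> str:
    """Build a date-based version label from the archive ingestion date.

    Assumption: the label is 'v' followed by the date as YYYYMMDD.
    """
    return "v" + archived_on.strftime("%Y%m%d")


# Data placed in the archive on 18 April 2011
print(version_from_archive_date(date(2011, 4, 18)))  # v20110418
```

A label of this form sorts chronologically as a plain string, which makes it easy to pick the latest version from a directory listing.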

I also notice that none of the CCCMA, NASA-GISS or BCC data nodes is using date-versioning!

Stephen.

---
Stephen Pascoe  +44 (0)1235 445980
Centre for Environmental Data Archival
STFC Rutherford Appleton Laboratory, Harwell Oxford, Didcot OX11 0QX, UK


-----Original Message-----
From: go-essp-tech-bounces at ucar.edu [mailto:go-essp-tech-bounces at ucar.edu] On Behalf Of martin.juckes at stfc.ac.uk
Sent: 18 April 2011 08:48
To: martina.stockhause at zmaw.de; drach1 at llnl.gov; taylor13 at llnl.gov; Lawrence, Bryan (STFC,RAL,RALSP)
Cc: go-essp-tech at ucar.edu; michael.lautenschlager at zmaw.de; painter1 at llnl.gov
Subject: Re: [Go-essp-tech] Versioning in CMIP5 including QC procedure

Hello All,

I'd like to back up Martina on this. We do have an agreed version control system defined in the DRS document (or, more precisely, the DRS document defines an aspect of the version control which should be implemented -- namely a subdirectory level which indicates the version).

We agreed that the version of a publication dataset should be reflected in the directory structure, and it is clear that this requires that the version of the publication dataset be determined before running the ESG publisher, not by the ESG publisher. The QC software runs on the file system rather than accessing data through the data node software, so consistency between the file-system layout and the publication units is essential. While it is possible to imagine archives in which all access to data is through the data node and the file-system layout is of no interest to anyone but the data node development team, that is clearly not the situation here -- no matter how desirable it might be.
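
To make the dependency concrete: a tool that works on the file system can only learn the version from the path itself. The following is a rough sketch, not the real QC code. The function name, the `v<digits>` pattern, and the example paths are hypothetical.

```python
import re
from pathlib import PurePosixPath
from typing import Optional

# Assumption: the version subdirectory is named 'v' followed by digits.
VERSION_DIR = re.compile(r"v\d+")


def version_from_path(path: str) -> Optional[str]:
    """Return the first path component that looks like a version directory."""
    for part in PurePosixPath(path).parts:
        if VERSION_DIR.fullmatch(part):
            return part
    return None  # the layout carries no version level


# Hypothetical paths, for illustration only
print(version_from_path("/archive/cmip5/tas/r1i1p1/v20110418/tas_example.nc"))
print(version_from_path("/archive/cmip5/tas/r1i1p1/tas_example.nc"))
```

If the version level is missing from the layout, as in the second path, a file-system tool has nothing to match against the publication unit.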

QC is a very important part of ensuring consistent quality in the archive, and we really need to make it work.

At BADC, we started doing the layout on the same machine as the publishing, but are now moving it to a different machine -- there is no fundamental problem with this (the machine on which the layout is done can, of course, communicate with the machine on which the publishing is done).

regards,
Martin


From: go-essp-tech-bounces at ucar.edu [go-essp-tech-bounces at ucar.edu] on behalf of Martina Stockhause [martina.stockhause at zmaw.de]
Sent: 18 April 2011 06:57
To: Drach, Bob
Cc: GO-ESSP; Painter, Jeff; michael.lautenschlager
Subject: Re: [Go-essp-tech] Versioning in CMIP5 including QC procedure

Hi, Bob,

Since not every published dataset is part of the DOI (which is assigned at the experiment level), I have to keep track of versions as well, at both the dataset and the experiment (DOI) level. The inhomogeneity of the dataset version syntax is more a problem of version control within the QC than one of the QC L2 checker, the QC L2 analyzer, or the QC L2 result export for QC L3.

I do not mind whether the common version syntax is yours or that of BADC and DKRZ (though the latter would save me adaptation effort); what matters is *that* it is homogeneous. Maybe you could talk to Stephen to agree on the version syntax and version handling.
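
To show what a homogeneity check could look like, here is a small sketch. It is not part of the QC tools. It assumes two label styles purely for illustration: a date style (`v` plus eight digits) and a counter style (`v` plus any other number of digits).

```python
import re
from typing import Iterable


def version_style(label: str) -> str:
    """Classify a version label as 'date', 'counter' or 'unknown'.

    Assumption: 'v' + 8 digits is a date label, 'v' + other digits a counter.
    """
    if re.fullmatch(r"v\d{8}", label):
        return "date"
    if re.fullmatch(r"v\d+", label):
        return "counter"
    return "unknown"


def is_homogeneous(labels: Iterable[str]) -> bool:
    """True if all labels share one style (an empty input counts as homogeneous)."""
    return len({version_style(label) for label in labels}) <= 1


print(is_homogeneous(["v20110415", "v20110418"]))  # True
print(is_homogeneous(["v20110418", "v2"]))         # False
```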

I am sorry that I have to insist.

Best wishes,
Martina


On 04/15/2011 11:18 PM, Drach, Bob wrote:
Hi Martina,

There are a lot of things I like about the layout tool, but one aspect I'm not happy with is that it chooses a dataset version. IMO that logic should reside in the publisher, which has access to the history of dataset publication and dataset definitions. In our environment the layout is done on a different machine than publication, and does not have access to that history. Consequently we support the DRS file layout with the exception of dataset version numbers, which are defined later in the processing stream.

Would it be difficult to provide an option for the QC tool to ignore extraneous directories (not defined by DRS)?
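
A rough sketch of what such an option might do, assuming the tool parses directory paths component by component. The function name, the pattern list, and the example path are hypothetical and are not taken from the QC tool.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import Sequence


def drop_ignored_dirs(path: str, ignore_patterns: Sequence[str]) -> str:
    """Remove path components matching any shell-style ignore pattern."""
    parts = PurePosixPath(path).parts
    kept = [
        part for part in parts
        if not any(fnmatchcase(part, pattern) for pattern in ignore_patterns)
    ]
    return str(PurePosixPath(*kept))


# Hypothetical site-specific directory 'staging_3' that is not a DRS level
print(drop_ignored_dirs("cmip5/output/staging_3/tas/file.nc", ["staging_*"]))
```

The remaining components could then be handed to the normal DRS parsing step unchanged.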

Best regards,

Bob


On 4/15/11 4:52 AM, "Martina Stockhause" <martina.stockhause at zmaw.de> wrote:


 Hi, Dean, Karl, and Bob,

 A discussion about different types of versioning inside ESGF for CMIP5 data was started on the QC request tracker (see: http://redmine.dkrz.de/collaboration/issues/321). Jeff wrote:

"Bob Drach corrected me on one issue: our PCMDI version numbers are not DRS version numbers, they are just a tool for keeping track of the data received at PCMDI. Thus these version numbers are generated at PCMDI, while DRS version numbers are generated by the data producer. PCMDI does not use Stephen's versioning tool, or the DRS-style version numbers."



Is that right? I thought that we agreed on a versioning procedure using Stephen's tool.

And I do have a problem with different ESG publication procedures (QC level 1 checks), i.e. different QC procedures at the three partners. Additionally, the inconsistent naming conventions between WDCC / BADC on one side and PCMDI on the other side cannot be handled by the QC Workflow. Since we do a federated QC in three locations, we need to use not only the same tools with the same configurations so that QC results are comparable, but also the same naming conventions so that the overall QC process can continue through to QC L3 / DOI publication.



Thus the question:
 Could PCMDI use Stephen's tool for CMIP5 data versioning as well?



Best wishes,
 Martina




-- 
Scanned by iCritical.
_______________________________________________
GO-ESSP-TECH mailing list
GO-ESSP-TECH at ucar.edu
http://mailman.ucar.edu/mailman/listinfo/go-essp-tech