[Go-essp-tech] Proposed version directory structure document

Bob Drach drach1 at llnl.gov
Thu Apr 15 17:14:07 MDT 2010


Hi Stephen,

Let me clarify a few points in the description of ESG Publisher:

The document states: "ESG Publisher version system is built around  
mutable datasets.  It does not attempt to maintain references to  
previous data and the dataset version number is not part of the  
dataset id unless the publisher is configured to include it from the  
dataset metadata.  This means that it is not straight forward at this  
time to publish multiple versions of an atomic dataset unless each  
version is published as a separate dataset.  This approach would  
effectively ignore ESG Publisher's version system and manage all  
versions independently."

- As of Version 2, the unit of publication is in fact a 'dataset
version', a term that came out of the December meeting in
Boulder. A dataset version is an immutable object which can represent
a 'DRS dataset including version number'. The published 'dataset  
version' itself has an identifier which typically consists of  
dataset_id+version number; this appears in the THREDDS catalog. As you  
stated in the document, whether or not the published dataset  
corresponds to a DRS dataset is a matter of publisher configuration,  
not an inherent property of the publisher.
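
To illustrate the idea of an immutable dataset version, here is a minimal Python sketch. The class DatasetVersion, its fields and the example dataset id are hypothetical; they are not ESG Publisher's actual data model. The ref property simply uses the dataset_id#n notation this message uses for referring to a specific version; the exact identifier format written to the THREDDS catalog is not assumed here.

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class DatasetVersion:
    """Hypothetical model of one published dataset version.

    frozen=True makes instances read-only, mirroring the idea that a
    published dataset version never changes after publication.
    """

    dataset_id: str         # e.g. a DRS dataset id (illustrative only)
    version: int            # version number within that dataset
    files: frozenset[str]   # file names making up this version

    @property
    def ref(self) -> str:
        # Refer to this version using the dataset_id#n notation.
        return f"{self.dataset_id}#{self.version}"


v1 = DatasetVersion("example.dataset", 1, frozenset({"tas_a.nc"}))
print(v1.ref)  # example.dataset#1

try:
    v1.version = 2  # a change must be published as a new version instead
except dataclasses.FrozenInstanceError:
    print("dataset versions are immutable")
```

Whether the dataset_id corresponds to a DRS dataset is left to the caller, just as it is left to publisher configuration.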

- The node database does in fact maintain references to the  
composition of previous dataset versions. It is possible to have  
multiple versions published simultaneously, to list all published  
versions of a dataset, and for any given dataset version the files  
contained in that version can be listed.
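
The listing operations in the previous point can be sketched in Python as a minimal in-memory stand-in, assuming the database simply maps each dataset id to its versions and each version to its file list. The class NodeDatabase, its methods and the example names are hypothetical; this is not the real node database schema or API.

```python
class NodeDatabase:
    """Hypothetical stand-in for the node database: it keeps the file
    composition of every dataset version, old and new."""

    def __init__(self) -> None:
        # dataset_id -> {version number -> frozenset of file names}
        self._versions: dict[str, dict[int, frozenset[str]]] = {}

    def record(self, dataset_id: str, version: int, files: set[str]) -> None:
        # Store a version's composition; an existing version is never
        # overwritten, since published versions are immutable.
        versions = self._versions.setdefault(dataset_id, {})
        if version in versions:
            raise ValueError(f"{dataset_id}#{version} already exists")
        versions[version] = frozenset(files)

    def list_versions(self, dataset_id: str) -> list[int]:
        # All versions of a dataset, oldest first.
        return sorted(self._versions.get(dataset_id, {}))

    def list_files(self, dataset_id: str, version: int) -> list[str]:
        # Files contained in one specific version.
        return sorted(self._versions[dataset_id][version])


db = NodeDatabase()
db.record("example.dataset", 1, {"tas_a.nc", "tas_b.nc"})
db.record("example.dataset", 2, {"tas_a.nc", "tas_c.nc"})
print(db.list_versions("example.dataset"))  # [1, 2]
print(db.list_files("example.dataset", 1))  # ['tas_a.nc', 'tas_b.nc']
```

Because version 1 is kept alongside version 2, both can remain published at the same time.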

- The intention of the publisher design is to automate versioning as  
much as possible. A 'dataset' is considered to be a collection of  
dataset versions. Consequently, 'publishing a dataset' really means  
'publishing a dataset version where the version number is incremented  
relative to the previous version.' Similarly, 'unpublishing' a dataset  
by default unpublishes all versions of a dataset. The notation
dataset_id#n can be used to refer to a specific version.
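
The versioning behaviour in this point can be sketched in Python. This is a hypothetical illustration, not ESG Publisher code: the class Publisher, the function parse_ref, the option of unpublishing a single version by passing dataset_id#n, and the rule that version numbers are never reused are all assumptions made for the example. The message itself only states that publishing increments the version and that unpublishing removes all versions by default.

```python
from typing import Optional


def parse_ref(ref: str) -> tuple[str, Optional[int]]:
    """Split 'dataset_id#n' into (dataset_id, n); a bare id gives (dataset_id, None)."""
    dataset_id, sep, version = ref.partition("#")
    return dataset_id, (int(version) if sep else None)


class Publisher:
    """Hypothetical sketch of automatic dataset versioning."""

    def __init__(self) -> None:
        # dataset_id -> {version number -> frozenset of file names}
        self.published: dict[str, dict[int, frozenset[str]]] = {}
        # Highest version number ever issued per dataset (assumption:
        # numbers are never reused, even after unpublishing).
        self._latest: dict[str, int] = {}

    def publish(self, dataset_id: str, files: set[str]) -> str:
        # 'Publishing a dataset' publishes a new version, incremented
        # relative to the previous version (starting at 1).
        new_version = self._latest.get(dataset_id, 0) + 1
        self._latest[dataset_id] = new_version
        self.published.setdefault(dataset_id, {})[new_version] = frozenset(files)
        return f"{dataset_id}#{new_version}"

    def unpublish(self, ref: str) -> None:
        # A bare dataset_id unpublishes every version (the default);
        # dataset_id#n unpublishes only version n (assumed option).
        dataset_id, version = parse_ref(ref)
        if version is None:
            self.published.pop(dataset_id, None)
        else:
            self.published.get(dataset_id, {}).pop(version, None)


pub = Publisher()
print(pub.publish("example.dataset", {"tas_a.nc"}))              # example.dataset#1
print(pub.publish("example.dataset", {"tas_a.nc", "tas_b.nc"}))  # example.dataset#2
pub.unpublish("example.dataset")  # removes both versions
print(pub.published)  # {}
```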

In short, there is no fundamental mismatch between the DRS model and
the ESG Publisher.

Best regards,

Bob

On Apr 15, 2010, at 3:24 AM, <stephen.pascoe at stfc.ac.uk> wrote:

> Hi everyone,
>
> Attached is my view on how we should structure the archive to  
> support multiple versions.  It divides into two main sections: the
> first is a fairly lengthy summary of why this problem isn't solved
> yet in terms of the differences between the ESG datanode software  
> and the DRS document.  The second section lays out the proposed  
> structure and how we would manage symbolic links and moving from one  
> version to another.  I restrict myself to directories below the  
> atomic dataset level.
>
> Lots of issues are left to resolve, in particular how ESG
> Publisher can make use of this structure.  I'll try and draw
> attention to these points in the agenda for Tuesday's telco which  
> will follow later today.
>
> Cheers,
> Stephen.
>
> ---
> Stephen Pascoe  +44 (0)1235 445980
> British Atmospheric Data Centre
> Rutherford Appleton Laboratory
>
>
> <ESGF_version_structure.odt>
