[Go-essp-tech] publishing by realm

Bob Drach drach at llnl.gov
Thu Feb 25 15:48:18 MST 2010


Hi Bryan,

Are you assuming that CMOR will assign the version numbers (either  
atomic_dataset | realm_dataset | file)? That's not the case, and I'm  
not sure that CMOR has sufficient information to do so.

It's worth recapping how the ESG publisher currently deals with  
versioning:

- The publisher is given a dataset id and a list of files to be  
published. Let's assume that dataset == realm_dataset here.
- If this is a new dataset, the dataset is assigned dataset_version=1  
by default. Each file is assigned file_version=1. Dataset_version and  
file_version are completely independent.
- If the dataset exists, each file is compared with its existing
counterpart in the dataset (if present), based on a set of metadata:
checksum, file length, modification date, etc. If a file has changed,
its file_version is incremented and that value is recorded in the
THREDDS catalog. Similarly, if the dataset has any files that have
been added, deleted, or modified, its dataset_version is incremented  
and this is also recorded in THREDDS.
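
The steps above can be sketched roughly as follows. This is only an
illustration, not the actual publisher code: the names FileMeta,
Dataset and publish are hypothetical, the metadata is reduced to
checksum, length and modification time, and it assumes the file list
handed to the publisher is the complete dataset, so that a file
missing from the list counts as deleted.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FileMeta:
    """Hypothetical subset of the metadata used to detect a changed file."""
    checksum: str
    length: int
    mtime: float


@dataclass
class Dataset:
    """Hypothetical in-memory stand-in for a published dataset."""
    version: int = 0                            # 0 means "not yet published"
    files: dict = field(default_factory=dict)   # name -> (FileMeta, file_version)


def publish(dataset, incoming):
    """Publish `incoming` (name -> FileMeta) as the full content of `dataset`.

    Assumption: `incoming` lists every file in the dataset, so a name that
    is absent from it is treated as a deletion.
    """
    if dataset.version == 0:
        # New dataset: dataset_version=1 and every file gets file_version=1.
        dataset.files = {name: (meta, 1) for name, meta in incoming.items()}
        dataset.version = 1
        return dataset

    changed = False
    new_files = {}
    for name, meta in incoming.items():
        if name not in dataset.files:
            new_files[name] = (meta, 1)         # added file
            changed = True
            continue
        old_meta, old_version = dataset.files[name]
        if meta != old_meta:
            new_files[name] = (meta, old_version + 1)   # modified file
            changed = True
        else:
            new_files[name] = (old_meta, old_version)   # unchanged file

    if set(dataset.files) - set(incoming):
        changed = True                          # at least one file was deleted

    dataset.files = new_files
    if changed:
        # The dataset version moves independently of any file version.
        dataset.version += 1
    return dataset
```

Republishing an unchanged file list leaves both version numbers alone
in this sketch; changing one file bumps that file's file_version and
the dataset_version, and nothing else.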

So suppose that we publish at the realm_dataset granularity and one  
of the files in that dataset is updated. Then the file has a new  
file_version, the dataset has a new dataset_version, both are  
recorded in the TDS catalog. It should be possible for the replica  
manager to compare old and new dataset versions - by comparing old  
and new catalogs - to determine which files have changed, and only  
transfer those files to the replica site.
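
As a rough sketch of that comparison, and assuming each catalog has
already been parsed into a simple mapping of file name to
file_version (the real TDS catalog is XML, and the parsing is not
shown here), the replica manager could do something like the
following. The function name diff_catalogs is hypothetical.

```python
def diff_catalogs(old, new):
    """Compare two catalogs given as {file name: file_version} mappings.

    Returns (to_transfer, to_remove):
      to_transfer: files that are new, or whose file_version differs
      to_remove:   files present in the old catalog but not the new one
    """
    to_transfer = sorted(name for name, version in new.items()
                         if old.get(name) != version)
    to_remove = sorted(name for name in old if name not in new)
    return to_transfer, to_remove


# Hypothetical catalogs for dataset_version 1 and 2 of one realm dataset.
old_catalog = {"a.nc": 1, "b.nc": 1, "c.nc": 1}
new_catalog = {"a.nc": 1, "b.nc": 2, "d.nc": 1}

to_transfer, to_remove = diff_catalogs(old_catalog, new_catalog)
print(to_transfer)   # ['b.nc', 'd.nc']
print(to_remove)     # ['c.nc']
```

Only the changed and added files would need to move to the replica
site; unchanged files such as a.nc stay where they are.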

Bob


On Feb 25, 2010, at 12:08 PM, Bryan Lawrence wrote:

> Hi Bob
>
> On Thursday 25 February 2010 19:27:15 Bob Drach wrote:
>> Where would 'atomic dataset version' be stored? In ESG there would
>> only be realm-dataset versions and individual file versions.
>
> The DRS is writing a version associated with the atomic dataset as  
> defined within it. We expect modelling groups would conform to  
> that, and update versions according to it. We could rewrite the  
> DRS ... (and hence CMOR presumably ... but it's a bit late for  
> that ... or maybe I'm missing something).
>
> That means, if we leave things the way they are, there is a
> logical disconnect, and the risk of either vastly more data
> movement than is necessary, or a complex resolution problem (is my
> replicated "realm" level dataset the same as yours, if we've done
> replication at the file level?).
>
> cheers
> Bryan
>
> -- 
> Bryan Lawrence
> Director of Environmental Archival and Associated Research
> (NCAS/British Atmospheric Data Centre and NCEO/NERC NEODC)
> STFC, Rutherford Appleton Laboratory
> Phone +44 1235 445012; Fax ... 5848;
> Web: home.badc.rl.ac.uk/lawrence


