[Go-essp-tech] publishing by realm -- required DRS modifications.

martin.juckes at stfc.ac.uk
Fri Feb 26 08:43:58 MST 2010


Hello,

I have finally got round to checking this idea against Karl's specification of the replication subset. That subset would not consist of complete realm-level datasets (e.g. not all of the ocean monthly data is to be replicated). This means that we would have to revise the replication plan, because we had been counting on replicating complete published units of the "requested" product. An alternative approach might be to replace the "requested" product with a "replicated" product. The ESG data node would then have "output" and "replicated" products, the latter being a subset of the former in terms of both temporal coverage and the number of variables included. The entire "replicated" product would then, as the name suggests, be replicated.

A second DRS modification required by realm-level publishing would be to scrap atomic dataset versioning and replace it with versioning at the realm level.

cheers,
Martin

-----Original Message-----
From: go-essp-tech-bounces at ucar.edu on behalf of Bob Drach
Sent: Thu 25/02/2010 22:48
To: Lawrence, Bryan (STFC,RAL,SSTD)
Cc: go-essp-tech at ucar.edu
Subject: Re: [Go-essp-tech] publishing by realm
 
Hi Bryan,

Are you assuming that CMOR will assign the version numbers (either  
atomic_dataset | realm_dataset | file)? That's not the case, and I'm  
not sure that CMOR has sufficient information to do so.

It's worth recapping how the ESG publisher currently deals with  
versioning:

- The publisher is given a dataset id and a list of files to be  
published. Let's assume that dataset == realm_dataset here.
- If this is a new dataset, the dataset is assigned dataset_version=1  
by default. Each file is assigned file_version=1. Dataset_version and  
file_version are completely independent.
- If the dataset exists, each file is compared with its existing  
counterpart in the dataset (if present), based on a set of metadata:  
checksum, file length, modification date, etc. If a file has changed,  
its file_version is incremented and that value is recorded in the  
THREDDS catalog. Similarly, if the dataset has any files that have  
been added, deleted, or modified, its dataset_version is incremented  
and this is also recorded in THREDDS.
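
The versioning rules listed above can be sketched roughly as follows. This is only an illustration, not the ESG publisher's actual code: the names FileMeta, Dataset and publish are hypothetical, the metadata is reduced to three fields, and the assumption that a file added to an existing dataset starts at file_version=1 is mine rather than something stated above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FileMeta:
    """Hypothetical stand-in for the metadata used to detect a changed file."""
    checksum: str
    size: int
    mtime: float


@dataclass
class Dataset:
    """Hypothetical published dataset: a version plus per-file versions."""
    dataset_version: int = 0
    # Maps file path -> (FileMeta, file_version).
    files: dict = field(default_factory=dict)


def publish(existing, incoming):
    """Return the Dataset that results from publishing `incoming`.

    existing: the previously published Dataset, or None for a new dataset.
    incoming: dict mapping file path -> FileMeta for the files to publish.
    """
    if existing is None:
        # New dataset: dataset_version=1 and file_version=1 for every file.
        return Dataset(1, {path: (meta, 1) for path, meta in incoming.items()})

    # Any difference in the set of paths means files were added or deleted.
    changed = set(existing.files) != set(incoming)
    files = {}
    for path, meta in incoming.items():
        if path not in existing.files:
            files[path] = (meta, 1)  # assumption: added files start at 1
            continue
        old_meta, old_version = existing.files[path]
        if meta != old_meta:
            # Changed file: bump its file_version independently.
            files[path] = (meta, old_version + 1)
            changed = True
        else:
            files[path] = (old_meta, old_version)

    # The dataset_version is bumped at most once per publication.
    new_version = existing.dataset_version + (1 if changed else 0)
    return Dataset(new_version, files)
```

Note that in this sketch republishing an unchanged file list leaves both dataset_version and every file_version untouched.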

So suppose that we publish at the realm_dataset granularity and one  
of the files in that dataset is updated. Then the file has a new  
file_version, the dataset has a new dataset_version, both are  
recorded in the TDS catalog. It should be possible for the replica  
manager to compare old and new dataset versions - by comparing old  
and new catalogs - to determine which files have changed, and only  
transfer those files to the replica site.
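
The catalog comparison described above might look something like the sketch below. It assumes each catalog has already been parsed into a plain mapping of file path to file_version; parsing the THREDDS XML itself is not shown, and the function name files_to_transfer is hypothetical.

```python
def files_to_transfer(old_catalog, new_catalog):
    """Compare two versions of a dataset catalog.

    old_catalog, new_catalog: dicts mapping file path -> file_version,
    assumed to have been extracted from the old and new catalogs.

    Returns (to_fetch, to_delete): paths the replica site needs to
    transfer, and paths that no longer exist in the new dataset version.
    """
    # A file must be fetched if it is new or its file_version differs.
    to_fetch = sorted(
        path for path, version in new_catalog.items()
        if old_catalog.get(path) != version
    )
    # A file present only in the old catalog was deleted from the dataset.
    to_delete = sorted(path for path in old_catalog if path not in new_catalog)
    return to_fetch, to_delete
```

With this approach only the paths in to_fetch would be moved to the replica site, however many files the realm-level dataset contains.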

Bob


On Feb 25, 2010, at 12:08 PM, Bryan Lawrence wrote:

> Hi Bob
>
> On Thursday 25 February 2010 19:27:15 Bob Drach wrote:
>> Where would 'atomic dataset version' be stored? In ESG there would
>> only be realm-dataset versions and individual file versions.
>
> The DRS is writing a version associated with the atomic dataset as  
> defined within it. We expect modelling groups would conform to  
> that, and update versions according to it. We could rewrite the  
> DRS ... (and hence CMOR presumably ... but it's a bit late for  
> that ... or maybe I'm missing something).
>
> That means, if we leave things the way they are, there is a  
> logical disconnect, and the risk of either vastly more data  
> movement than is necessary, or a complex resolution problem (is my  
> replicated "realm" level dataset the same as yours, if we've done  
> replication at the file level).
>
> cheers
> Bryan
>
> -- 
> Bryan Lawrence
> Director of Environmental Archival and Associated Research
> (NCAS/British Atmospheric Data Centre and NCEO/NERC NEODC)
> STFC, Rutherford Appleton Laboratory
> Phone +44 1235 445012; Fax ... 5848;
> Web: home.badc.rl.ac.uk/lawrence
