[Go-essp-tech] publishing by realm -- required DRS modifications.
martin.juckes at stfc.ac.uk
Fri Feb 26 08:43:58 MST 2010
Hello,
I have finally got round to checking this idea against Karl's specification of the replication subset. That subset would not consist of complete realm-level datasets (e.g. not all of the monthly ocean data is to be replicated). This means that we would have to revise the replication plan, because we had been counting on replicating complete published units of the "requested" product. An alternative approach might be to replace the "requested" product with a "replicated" product. The ESG data node would then have "output" and "replicated" products, the latter being a subset of the former in terms of both temporal coverage and the number of variables included. The entire "replicated" product would then, as the name suggests, be replicated.
A second DRS modification required by realm-level publishing is to scrap atomic dataset versioning and replace it with versioning at the realm level.
cheers,
Martin
-----Original Message-----
From: go-essp-tech-bounces at ucar.edu on behalf of Bob Drach
Sent: Thu 25/02/2010 22:48
To: Lawrence, Bryan (STFC,RAL,SSTD)
Cc: go-essp-tech at ucar.edu
Subject: Re: [Go-essp-tech] publishing by realm
Hi Bryan,
Are you assuming that CMOR will assign the version numbers (either
atomic_dataset | realm_dataset | file)? That's not the case, and I'm
not sure that CMOR has sufficient information to do so.
It's worth recapping how the ESG publisher currently deals with
versioning:
- The publisher is given a dataset id and a list of files to be
published. Let's assume that dataset == realm_dataset here.
- If this is a new dataset, the dataset is assigned dataset_version=1
by default. Each file is assigned file_version=1. Dataset_version and
file_version are completely independent.
- If the dataset exists, each file is compared with its existing
counterpart in the dataset (if present), based on a set of metadata:
checksum, file length, modification date, etc. If a file has changed,
its file_version is incremented and that value is recorded in the
THREDDS catalog. Similarly, if the dataset has any files that have
been added, deleted, or modified, its dataset_version is incremented
and this is also recorded in THREDDS.
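The rules above can be sketched in a few lines of Python. This is only an illustration of the logic as described, not the ESG publisher's actual code: the names FileMeta, Dataset and publish are hypothetical, and the metadata compared is reduced to checksum, length and modification time.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FileMeta:
    # Hypothetical stand-in for the metadata the publisher compares.
    checksum: str
    length: int
    mtime: float


@dataclass
class Dataset:
    dataset_version: int = 1
    # Maps file path -> (FileMeta, file_version).
    files: dict = field(default_factory=dict)


def publish(existing, incoming):
    """Apply the versioning rules; incoming maps path -> FileMeta."""
    if existing is None:
        # New dataset: dataset_version=1 and every file_version=1.
        return Dataset(1, {p: (m, 1) for p, m in incoming.items()})

    # Any added or deleted path counts as a change to the dataset.
    changed = set(existing.files) != set(incoming)
    files = {}
    for path, meta in incoming.items():
        if path in existing.files:
            old_meta, old_version = existing.files[path]
            if meta != old_meta:
                # Modified file: bump its file_version independently.
                files[path] = (meta, old_version + 1)
                changed = True
            else:
                files[path] = (old_meta, old_version)
        else:
            files[path] = (meta, 1)

    version = existing.dataset_version + (1 if changed else 0)
    return Dataset(version, files)
```

Note that dataset_version and file_version are tracked separately here, as in the description: a file can reach version 3 while the dataset is at version 7, or the reverse.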
So suppose that we publish at the realm_dataset granularity and one
of the files in that dataset is updated. Then the file has a new
file_version, the dataset has a new dataset_version, both are
recorded in the TDS catalog. It should be possible for the replica
manager to compare old and new dataset versions - by comparing old
and new catalogs - to determine which files have changed, and only
transfer those files to the replica site.
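The catalog comparison could look roughly like the following. This is a sketch under assumptions: each catalog is taken to be already reduced to a mapping from file path to file_version (parsing the real THREDDS XML is left out), and the function names are made up for illustration.

```python
def files_to_transfer(old_catalog, new_catalog):
    """Paths that are new, or whose file_version differs, in new_catalog.

    Both arguments are assumed to be dicts of path -> file_version.
    """
    return sorted(
        path
        for path, version in new_catalog.items()
        if old_catalog.get(path) != version
    )


def files_to_remove(old_catalog, new_catalog):
    """Paths present in the old catalog but dropped from the new one."""
    return sorted(path for path in old_catalog if path not in new_catalog)


if __name__ == "__main__":
    old = {"tos.nc": 1, "sos.nc": 1, "zos.nc": 2}
    new = {"tos.nc": 2, "sos.nc": 1, "thetao.nc": 1}
    print(files_to_transfer(old, new))
    print(files_to_remove(old, new))
```

Only the paths returned by files_to_transfer would need to move to the replica site; unchanged files stay where they are.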
Bob
On Feb 25, 2010, at 12:08 PM, Bryan Lawrence wrote:
> Hi Bob
>
> On Thursday 25 February 2010 19:27:15 Bob Drach wrote:
>> Where would 'atomic dataset version' be stored? In ESG there would
>> only be realm-dataset versions and individual file versions.
>
> The DRS is writing a version associated with the atomic dataset as
> defined within it. We expect modelling groups would conform to
> that, and update versions according to it. We could rewrite the
> DRS ... (and hence CMOR presumably ... but it's a bit late for
> that ... or maybe I'm missing something).
>
> That means, if we leave things the way they are: there is a
> logical disconnect, and the risk of either vastly more data
> movement than is necessary, or a complex resolution problem (is my
> replicated "realm" level dataset the same as yours, if we've done
> replication at the file level).
>
> cheers
> Bryan
>
> --
> Bryan Lawrence
> Director of Environmental Archival and Associated Research
> (NCAS/British Atmospheric Data Centre and NCEO/NERC NEODC)
> STFC, Rutherford Appleton Laboratory
> Phone +44 1235 445012; Fax ... 5848;
> Web: home.badc.rl.ac.uk/lawrence
_______________________________________________
GO-ESSP-TECH mailing list
GO-ESSP-TECH at ucar.edu
http://mailman.ucar.edu/mailman/listinfo/go-essp-tech