[Go-essp-tech] publishing by realm
Bob Drach
drach at llnl.gov
Thu Feb 25 15:48:18 MST 2010
Hi Bryan,
Are you assuming that CMOR will assign the version numbers (either
atomic_dataset | realm_dataset | file)? That's not the case, and I'm
not sure that CMOR has sufficient information to do so.
It's worth recapping how the ESG publisher currently deals with
versioning:
- The publisher is given a dataset id and a list of files to be
published. Let's assume that dataset == realm_dataset here.
- If this is a new dataset, the dataset is assigned dataset_version=1
by default. Each file is assigned file_version=1. Dataset_version and
file_version are completely independent.
- If the dataset exists, each file is compared with its existing
counterpart in the dataset (if present), based on a set of metadata:
checksum, file length, modification date, etc. If a file has changed,
its file_version is incremented and that value is recorded in the
THREDDS catalog. Similarly, if the dataset has any files that have
been added, deleted, or modified, its dataset_version is incremented
and this is also recorded in THREDDS.
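The steps above can be sketched roughly as follows. This is only an illustration of the versioning rules as described, not the actual ESG publisher code: the names FileMeta, Dataset and publish are hypothetical, the comparison metadata is reduced to three fields, and two points not stated above are assumed (a file newly added to an existing dataset starts at file_version=1, and any file missing from the supplied list counts as deleted).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FileMeta:
    # Metadata used to detect a changed file (hypothetical field names).
    checksum: str
    size: int
    mtime: float


@dataclass
class Dataset:
    dataset_version: int = 1
    # Maps file path -> (FileMeta, file_version).
    files: dict = field(default_factory=dict)


def publish(existing, incoming):
    """Return the dataset state after publishing `incoming` (path -> FileMeta).

    `existing` is the previously published Dataset, or None for a new dataset.
    """
    if existing is None:
        # New dataset: dataset_version=1 and every file_version=1.
        return Dataset(1, {path: (meta, 1) for path, meta in incoming.items()})

    changed = False
    new_files = {}
    for path, meta in incoming.items():
        if path not in existing.files:
            # Added file. Assumption: it starts at file_version=1.
            new_files[path] = (meta, 1)
            changed = True
            continue
        old_meta, old_version = existing.files[path]
        if meta != old_meta:
            # Modified file: bump its file_version.
            new_files[path] = (meta, old_version + 1)
            changed = True
        else:
            new_files[path] = (old_meta, old_version)

    # Assumption: files absent from the incoming list have been deleted.
    if set(existing.files) - set(incoming):
        changed = True

    # Any add, delete or modify bumps dataset_version exactly once.
    version = existing.dataset_version + (1 if changed else 0)
    return Dataset(version, new_files)
```

Note that the two counters move independently: republishing with one modified file raises that file's file_version and the dataset_version by one each, while the other files keep their versions.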
So suppose that we publish at the realm_dataset granularity and one
of the files in that dataset is updated. Then the file has a new
file_version, the dataset has a new dataset_version, and both are
recorded in the TDS catalog. It should be possible for the replica
manager to compare old and new dataset versions - by comparing old
and new catalogs - to determine which files have changed, and only
transfer those files to the replica site.
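As a rough sketch of that comparison, assume each catalog has already been reduced to a mapping of file path to file_version. The function names and the dict representation are hypothetical; this does not parse real THREDDS catalog XML and is not the replica manager's actual interface.

```python
def files_to_transfer(old_catalog, new_catalog):
    """Paths that are new, or whose file_version differs from the old catalog."""
    return sorted(
        path
        for path, version in new_catalog.items()
        if old_catalog.get(path) != version
    )


def files_to_remove(old_catalog, new_catalog):
    """Paths present in the old catalog but absent from the new one."""
    return sorted(set(old_catalog) - set(new_catalog))
```

For example, with an old catalog {"a.nc": 1, "b.nc": 1} and a new catalog {"a.nc": 1, "b.nc": 2}, only b.nc would need to be transferred to the replica site.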
Bob
On Feb 25, 2010, at 12:08 PM, Bryan Lawrence wrote:
> Hi Bob
>
> On Thursday 25 February 2010 19:27:15 Bob Drach wrote:
>> Where would 'atomic dataset version' be stored? In ESG there would
>> only be realm-dataset versions and individual file versions.
>
> The DRS is writing a version associated with the atomic dataset as
> defined within it. We expect modelling groups would conform to
> that, and update versions according to it. We could rewrite the
> DRS ... (and hence CMOR presumably ... but it's a bit late for
> that ... or maybe I'm missing something).
>
> That means, if we leave things the way they are: there is a
> logical disconnect, and the risk of either vastly more data
> movement than is necessary, or a complex resolution problem (is my
> replicated "realm" level dataset the same as yours, if we've done
> replication at the file level).
>
> cheers
> Bryan
>
> --
> Bryan Lawrence
> Director of Environmental Archival and Associated Research
> (NCAS/British Atmospheric Data Centre and NCEO/NERC NEODC)
> STFC, Rutherford Appleton Laboratory
> Phone +44 1235 445012; Fax ... 5848;
> Web: home.badc.rl.ac.uk/lawrence