[Go-essp-tech] publishing by realm -- required DRS modifications.

Luca Cinquini luca at ucar.edu
Fri Feb 26 09:51:28 MST 2010


Hi Martin,
	the way I understood it, there would be two datasets for each
realm, for example "atmos/output" and "atmos/requested", the latter
being a subset of the former, and each of those could be replicated
as a whole.
Luca

On Feb 26, 2010, at 8:43 AM, <martin.juckes at stfc.ac.uk> wrote:

>
> Hello,
>
> I have eventually got round to checking this idea against Karl's  
> specification of the replication subset. The latter would not consist
> of complete realm-level datasets (e.g. not all monthly ocean data is
> to be replicated). This means that we would have to revise the
> replication plan, because we had been counting on replicating  
> complete published units of the "requested" product. An alternative  
> approach might be to replace the "requested" product with a  
> "replicated" product. The ESG data node would then have "output" and  
> "replicated" products, the latter being a subset of the former both  
> in terms of the temporal coverage and the number of variables
> included. The entire "replicated" product would then, as the name
> suggests, be replicated.
>
> A second DRS modification which would be required by the realm level  
> publishing is the scrapping of the atomic dataset versioning and  
> replacing this with versioning at the realm level.
>
> cheers,
> Martin
>
> -----Original Message-----
> From: go-essp-tech-bounces at ucar.edu on behalf of Bob Drach
> Sent: Thu 25/02/2010 22:48
> To: Lawrence, Bryan (STFC,RAL,SSTD)
> Cc: go-essp-tech at ucar.edu
> Subject: Re: [Go-essp-tech] publishing by realm
>
> Hi Bryan,
>
> Are you assuming that CMOR will assign the version numbers (either
> atomic_dataset | realm_dataset | file)? That's not the case, and I'm
> not sure that CMOR has sufficient information to do so.
>
> It's worth recapping how the ESG publisher currently deals with
> versioning:
>
> - The publisher is given a dataset id and a list of files to be
> published. Let's assume that dataset == realm_dataset here.
> - If this is a new dataset, the dataset is assigned dataset_version=1
> by default. Each file is assigned file_version=1. Dataset_version and
> file_version are completely independent.
> - If the dataset exists, each file is compared with its existing
> counterpart in the dataset (if present), based on a set of metadata:
> checksum, file length, modification date, etc. If a file has changed,
> its file_version is incremented and that value is recorded in the
> THREDDS catalog. Similarly, if the dataset has any files that have
> been added, deleted, or modified, its dataset_version is incremented
> and this is also recorded in THREDDS.
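
The versioning rules listed above can be sketched roughly as follows. This is a minimal illustration, not the ESG publisher's actual code: the names `FileRecord`, `Dataset` and `publish`, and the use of a plain dict as a stand-in for the THREDDS catalog, are all assumptions made for the example. It also assumes that each publication request carries the complete file list for the dataset, so a file missing from the list counts as deleted.

```python
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    # Metadata used to decide whether a file has changed.
    checksum: str
    size: int
    mtime: float
    file_version: int = 1


@dataclass
class Dataset:
    dataset_id: str
    dataset_version: int = 1
    files: dict = field(default_factory=dict)  # path -> FileRecord


def publish(catalog, dataset_id, incoming):
    """Apply the versioning rules to one publication request.

    catalog:  dict of dataset_id -> Dataset (hypothetical catalog stand-in)
    incoming: dict of path -> (checksum, size, mtime), assumed to be the
              complete file list for the dataset
    """
    existing = catalog.get(dataset_id)
    if existing is None:
        # New dataset: dataset_version=1 and every file_version=1.
        files = {path: FileRecord(*meta) for path, meta in incoming.items()}
        catalog[dataset_id] = Dataset(dataset_id, 1, files)
        return catalog[dataset_id]

    changed = False
    new_files = {}
    for path, meta in incoming.items():
        old = existing.files.get(path)
        if old is None:
            # File added to an existing dataset.
            new_files[path] = FileRecord(*meta)
            changed = True
        elif (old.checksum, old.size, old.mtime) != tuple(meta):
            # File modified: bump its own version only.
            new_files[path] = FileRecord(*meta, file_version=old.file_version + 1)
            changed = True
        else:
            new_files[path] = old

    # Files present before but absent now count as deleted.
    if set(existing.files) - set(incoming):
        changed = True

    existing.files = new_files
    if changed:
        # Any addition, deletion or modification bumps the dataset version once.
        existing.dataset_version += 1
    return existing
```

Note that file_version and dataset_version move independently here: republishing an unchanged file list leaves both untouched, while a single modified file bumps that file's version and the dataset version by one each.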
>
> So suppose that we publish at the realm_dataset granularity and one
> of the files in that dataset is updated. Then the file has a new
> file_version, the dataset has a new dataset_version, both are
> recorded in the TDS catalog. It should be possible for the replica
> manager to compare old and new dataset versions - by comparing old
> and new catalogs - to determine which files have changed, and only
> transfer those files to the replica site.
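
The catalog comparison suggested above might look something like this. It is only a sketch under stated assumptions: each catalog is reduced to a hypothetical dict mapping file path to file_version, and `files_to_transfer` is an invented name, not part of any existing replica manager or of the TDS API.

```python
def files_to_transfer(old_catalog, new_catalog):
    """Compare two catalog snapshots of the same realm-level dataset.

    Each snapshot is a dict of path -> file_version (assumed layout).
    Returns (to_fetch, to_delete): files that are new or have a different
    file_version, and files that no longer appear in the new catalog.
    """
    to_fetch = sorted(
        path for path, version in new_catalog.items()
        if old_catalog.get(path) != version
    )
    to_delete = sorted(path for path in old_catalog if path not in new_catalog)
    return to_fetch, to_delete


# Usage: one file updated and one added between dataset versions.
old = {"tas.nc": 1, "pr.nc": 1, "ua.nc": 2}
new = {"tas.nc": 2, "pr.nc": 1, "va.nc": 1}
fetch, delete = files_to_transfer(old, new)
# fetch  -> ["tas.nc", "va.nc"]
# delete -> ["ua.nc"]
```

Only the files in the first list would need to cross the network, which is the point of comparing catalogs rather than re-transferring the whole realm-level dataset.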
>
> Bob
>
>
> On Feb 25, 2010, at 12:08 PM, Bryan Lawrence wrote:
>
>> Hi Bob
>>
>> On Thursday 25 February 2010 19:27:15 Bob Drach wrote:
>>> Where would 'atomic dataset version' be stored? In ESG there would
>>> only be realm-dataset versions and individual file versions.
>>
>> The DRS is writing a version associated with the atomic dataset as
>> defined within it. We expect modelling groups would conform to
>> that, and update versions according to it. We could rewrite the
>> DRS ... (and hence CMOR presumably ... but it's a bit late for
>> that ... or maybe I'm missing something).
>>
>> That means, if we leave things the way they are, there is a
>> logical disconnect, and the risk of either vastly more data
>> movement than is necessary, or a complex resolution problem (is my
>> replicated "realm" level dataset the same as yours, if we've done
>> replication at the file level?).
>>
>> cheers
>> Bryan
>>
>> -- 
>> Bryan Lawrence
>> Director of Environmental Archival and Associated Research
>> (NCAS/British Atmospheric Data Centre and NCEO/NERC NEODC)
>> STFC, Rutherford Appleton Laboratory
>> Phone +44 1235 445012; Fax ... 5848;
>> Web: home.badc.rl.ac.uk/lawrence
>
> _______________________________________________
> GO-ESSP-TECH mailing list
> GO-ESSP-TECH at ucar.edu
> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech


