[Go-essp-tech] Proposal for adjusting our definition of an atomic dataset

Mon Dec 7 15:02:41 MST 2009

A bunch of the ESG developers are at NCAR this week talking in detail
about versioning and representing replicas in the datanode and gateway.  
We have come to the conclusion that in order to implement replication 
we need to confine ourselves to replicating entire atomic datasets.  
We would like to work with the following principles:

1. CMIP5 archive is a set of atomic datasets
2. The CMIP5 standard output is a subset of the CMIP5 archive
3. We only replicate entire atomic datasets.

From previous emails it is apparent that the standard output does not
correspond to a set of atomic datasets because in some cases standard
output is a temporal subset of an atomic dataset.  This implies that a
replica of the standard output would, in those cases, contain only a
temporal subset of an atomic dataset rather than the whole dataset.

Therefore we propose adjusting the definition of an atomic dataset to
allow us to only replicate entire atomic datasets.  We suggest 2 ways
of achieving this:

 1. Add an extra attribute to the DRS syntax to represent the
 difference between standard and non-standard output.  Atomic datasets
 that currently span standard and non-standard output would be split
 into 2 atomic datasets.  Other atomic datasets would exist in one
 category or the other.

 2. Split all experiments (as defined in the DRS) that contain atomic
 datasets that span standard and non-standard output into 2
 experiments e.g. "<expt>_standard", "<expt>_optional".
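
To make the two options concrete, here is a rough Python sketch of how
a dataset that spans standard and non-standard output might be split
under each.  It is only an illustration: the AtomicDataset class, the
"output_class" attribute name, the use of whole years to describe
temporal coverage, and the example experiment and variable names are
all assumptions made for the sketch, not part of the DRS.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class AtomicDataset:
    """Hypothetical, simplified stand-in for a DRS atomic dataset."""
    experiment: str
    variable: str
    years: FrozenSet[int]              # temporal coverage, simplified to years
    output_class: str = "unspecified"  # assumed name for the extra attribute


def split_by_attribute(ds: AtomicDataset,
                       standard_years: FrozenSet[int]) -> List[AtomicDataset]:
    """Option 1: keep the experiment name, tag each part with an attribute."""
    std = ds.years & standard_years
    opt = ds.years - standard_years
    parts = []
    if std:
        parts.append(AtomicDataset(ds.experiment, ds.variable, std, "standard"))
    if opt:
        parts.append(AtomicDataset(ds.experiment, ds.variable, opt, "optional"))
    return parts


def split_by_experiment(ds: AtomicDataset,
                        standard_years: FrozenSet[int]) -> List[AtomicDataset]:
    """Option 2: move each part into "<expt>_standard" or "<expt>_optional"."""
    std = ds.years & standard_years
    opt = ds.years - standard_years
    parts = []
    if std:
        parts.append(AtomicDataset(ds.experiment + "_standard", ds.variable, std))
    if opt:
        parts.append(AtomicDataset(ds.experiment + "_optional", ds.variable, opt))
    return parts


# Example: ten years of output, of which only the first five are standard.
full = AtomicDataset("exampleExpt", "tas", frozenset(range(1, 11)))
standard = frozenset(range(1, 6))
by_attribute = split_by_attribute(full, standard)
by_experiment = split_by_experiment(full, standard)
```

In both cases each resulting piece is either wholly standard or wholly
non-standard, so it can be replicated in full.  The difference is only
where the distinction is recorded: in a new attribute (option 1) or in
the experiment name (option 2).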

We'd like to discuss this proposal at the telco tomorrow.  Comments welcome.

Thanks,
Stephen.
