[Go-essp-tech] Are atomic datasets mutable?

Bryan Lawrence bryan.lawrence at stfc.ac.uk
Mon Nov 23 04:19:04 MST 2009


Hi Stephen, Karl

Stephen's got me worried.

On Monday 23 November 2009 11:00:10 Pascoe, Stephen (STFC,RAL,SSTD) wrote:
> It has implications for our data management system, particularly how we
> replicate the core.  It means that our atomic datasets aren't atomic!
> Consider an atomic dataset that has both core and tier 1 data.  The core
> portion will be replicated at PCMDI, BADC, etc.  These replicas will
> contain less data than at the originating node.  This will be extremely
> confusing to users who go download replicas of the core.

We probably need to be a bit careful about the use of the word "core": in the sense of core and tier1, it refers only to the nature of the experiment (Ron was insistent I didn't call it "priority").

In the "core" data centres, we expect to keep copies of the standard output from experiments, which in practice means a *subset* of the data from all the core, tier1 and tier2 experiments.

Ideally, those subsets correspond *exactly* with a selection of atomic datasets, which allows us to have a separation of concerns between replication and higher level functions. We want to replicate the versions of atomic datasets as appropriate.

So we have a problem if we go with amalgamating experiments from core and tier1 and the standard output requirements differ from one to the other. (I don't know if they do, I'd have to look.)

We also have a problem with those standard output requirements where they are, for example, the "last thirty years" of the simulation. If (and only if) the originating modelling centre copies all the years onto their ESG datanode (so it's one atomic dataset), and we then copy a subset off to the ESG "Core" nodes, the atomic dataset there has the same name but different content. Bad for users. Impossible for replication handling, at least as I had envisaged it. Maybe replication is ok the way Gavin had envisaged it :-) (i.e. via the catalog).
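
To make the "same name, different content" case concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration only: the AtomicDataset record, the dataset and node names, and the year ranges are hypothetical and do not reflect any actual ESG or catalog API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtomicDataset:
    """Hypothetical record: a dataset name plus the years held at one node."""
    name: str
    node: str
    years: frozenset


def is_faithful_replica(origin: AtomicDataset, replica: AtomicDataset) -> bool:
    """A replica is faithful only if both the name and the content match."""
    return origin.name == replica.name and origin.years == replica.years


# The originating node holds every year of the simulation (illustrative values).
origin = AtomicDataset(
    "example.atomic.dataset", "modelling-centre", frozenset(range(1900, 2000))
)
# A "Core" node holds only the last thirty years, under the same name.
replica = AtomicDataset(
    "example.atomic.dataset", "core-node", frozenset(range(1970, 2000))
)

same_name = origin.name == replica.name
faithful = is_faithful_replica(origin, replica)
missing_years = sorted(origin.years - replica.years)
```

In this sketch the two records share a name but fail the content comparison, which is exactly the situation a name-based replication scheme cannot detect on its own.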

It's still bad for users though. We could have two atomic datasets with rather different contents ... if the modelling centres load more than their standard output onto their ESG nodes (and we expect them to).

I'm going to think on this, and hope someone else knows the resolution.

Bryan

-- 
Bryan Lawrence
Director of Environmental Archival and Associated Research
(NCAS/British Atmospheric Data Centre and NCEO/NERC NEODC)
STFC, Rutherford Appleton Laboratory
Phone +44 1235 445012; Fax ... 5848; 
Web: home.badc.rl.ac.uk/lawrence

