[Go-essp-tech] Are atomic datasets mutable?

Bryan Lawrence bryan.lawrence at stfc.ac.uk
Thu Nov 19 06:15:13 MST 2009


Hi Martin

I agree.

We need to have some clear blue water between a disk subsystem and publication.

At one end of the spectrum of the ways one could do this, we would have a supercomputer
writing directly to an ESG data node disk (yeah, I know how likely that isn't) ...

... at the other end of the spectrum we have a modelling group waiting until their data
was perfectly polished and sending us one copy, just once, via some nice shiny data
transfer protocol (disk, BDM, whatever) ...

Neither is particularly likely. Some atomic datasets might look like the latter, but not, I suspect, all the atomic datasets from any given institution ...

I think those of us running ESG data nodes, whether "on behalf of" (like us) or "for real" (like, for example, IPSL), need to conform to some agreed principles (which is why we have "operational procedures" and "quality control" as things to agree on, and probably revisit, regularly).

One of those principles would be to "update" atomic datasets at a "friendly" frequency. Regardless of whether these things are versioned at source, they'd still have to be replicated and synchronised ... and that process would need to be managed and verified. That suggests to me that some sort of friendly agreement on update frequency would be advantageous. This is not to say updates need be particularly infrequent, but for the foreseeable future real q.c. and verification are likely to involve human inspection, and that puts an upper limit on what counts as "friendly".

Having said "regardless of versioning", it's clear that replication and synchronisation will be much easier with versioning of atomic datasets, because then we can keep the metadata in sync (otherwise we might as well give up and point the metadata at individual files).

So, when we get data from the Met Office, they can give it to us in whatever chunks they like, but it'd be up to us to decide how often to "ESG publish" it, and I think each time we do that, it should be a new version ...

Further, I think Sébastien's suggestion is a good one, and I would vote for it: you can publish once you reach
the minimum duration, and then you can republish later with a longer simulation if you want to, but in a sense that will be a new version (just a longer one). That would probably forestall most of the issues with "friendly" frequencies ...
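A minimal sketch of the publication rule above, assuming a data node keeps an append-only list of versions per atomic dataset. `AtomicDataset`, `publish`, `latest` and `MIN_DURATION_YEARS` are hypothetical names, and the 30-year threshold is a placeholder; nothing in this thread fixes either the minimum duration or any API.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical threshold: the thread does not say what the minimum duration is.
MIN_DURATION_YEARS = 30


@dataclass
class AtomicDataset:
    """Hypothetical record of an atomic dataset held on an ESG data node."""

    name: str
    # Each entry is (version number, simulated years covered by that version).
    versions: List[Tuple[int, int]] = field(default_factory=list)

    def publish(self, simulated_years: int) -> int:
        """Publish what is currently held as a new version and return its number.

        Publication is refused until the minimum duration is reached; every
        later publication (e.g. a longer simulation) gets a fresh version
        number instead of modifying an existing version.
        """
        if simulated_years < MIN_DURATION_YEARS:
            raise ValueError(
                f"{self.name}: {simulated_years} years is below the "
                f"minimum of {MIN_DURATION_YEARS}"
            )
        version = len(self.versions) + 1
        self.versions.append((version, simulated_years))
        return version

    def latest(self) -> Optional[Tuple[int, int]]:
        """Return the most recent (version, years) pair, or None if unpublished."""
        return self.versions[-1] if self.versions else None
```

Publishing 30 and then 50 simulated years would give versions 1 and 2, with version 1 left untouched. Keeping the version list append-only is what lets metadata describing an earlier version stay valid while replicas catch up.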

Bryan


-- 
Bryan Lawrence
Director of Environmental Archival and Associated Research
(NCAS/British Atmospheric Data Centre and NCEO/NERC NEODC)
STFC, Rutherford Appleton Laboratory
Phone +44 1235 445012; Fax ... 5848; 
Web: home.badc.rl.ac.uk/lawrence

