[Go-essp-tech] Are atomic datasets mutable?

Bentley, Philip philip.bentley at metoffice.gov.uk
Thu Nov 19 06:49:09 MST 2009


Hi Folks,

I think I'm the only UKMO person who subscribes to this list, so I should probably chip in ;-)

Our line of thinking yesterday was that, for these long simulations, we'd only be *extending* the timespan covered by the atomic dataset, not actually *modifying* the content of the existing netCDF files that comprise it, so we weren't sure whether this really constituted a new version.

However, the arguments and examples put forward on this list do indeed suggest that we are dealing with a new version of the atomic dataset - i.e. when thought of as a holistic collection of files. So although the file content may not have changed, the definition of the dataset has (e.g. vN = 10 files, vN+1 = 11 files).
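
To make that concrete, here's a minimal sketch of how one might express it (the directory layout, the checksum-manifest idea and the version names below are purely illustrative assumptions of mine, not anything we've agreed):

    # Illustrative only: treat each dataset version as a manifest of its
    # member files (name -> checksum). Paths and MD5 are hypothetical choices.
    import hashlib
    from pathlib import Path

    def manifest(directory):
        """Map each netCDF file name in `directory` to its MD5 checksum."""
        return {f.name: hashlib.md5(f.read_bytes()).hexdigest()
                for f in Path(directory).glob("*.nc")}

    v1 = manifest("atomic_dataset/v1")   # e.g. 10 files
    v2 = manifest("atomic_dataset/v2")   # the same 10 files plus 1 new one

    # None of the original files has changed ...
    assert all(v2.get(name) == md5 for name, md5 in v1.items())
    # ... but the set of member files (i.e. the dataset definition) has,
    # so v2 is a new version.
    assert set(v2) != set(v1)

In other words: identical checksums for the existing files, but a different file set, hence a new version of the atomic dataset.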

Presumably there also needs to be a corresponding process to handle versioned updates to the associated metadata. But that's another story, I guess... :-)

Regards,
Phil

> -----Original Message-----
> From: go-essp-tech-bounces at ucar.edu 
> [mailto:go-essp-tech-bounces at ucar.edu] On Behalf Of Bryan Lawrence
> Sent: 19 November 2009 13:15
> To: go-essp-tech at ucar.edu
> Subject: Re: [Go-essp-tech] Are atomic datasets mutable?
> 
> Hi Martin
> 
> I agree.
> 
> We need to have some clear blue water between a disk 
> subsystem, and publication.
> 
> At one end of the spectrum of the ways one could do this, we 
> would have a supercomputer writing directly to an ESG data 
> node disk (yeah, I know how likely that isn't) ...
> 
> ... at the other end of the spectrum we have a modelling
> group waiting until their data is perfectly polished and
> sending us one copy, just once, via some nice shiny data
> transfer protocol (disk, BDM, whatever) ....
> 
> Neither is particularly likely. Some atomic datasets
> might look like the latter, but not, I suspect, all of the
> atomic datasets from any given institution ...
> 
> I think those of us running ESG data nodes, whether "on 
> behalf of" (like us) or "for real" (like, for example, IPSL), 
> need to conform to some agreed principles (which is why we 
> have "operational procedures" and "quality control" as things 
> to agree on, and probably revisit, regularly).
> 
> One of those principles would be to "update" atomic datasets 
> at a "friendly" frequency. Regardless of whether these things 
> are versioned at source, they'd still have to be replicated 
> and synchronised ... and that process would need to be 
> managed and verified ... which I think suggests some sort of 
> friendly agreement on update frequency would be advantageous 
> (which is not to say it need be particularly infrequent,
> but for the foreseeable future, real q.c. and verification
> is likely to involve human inspection, and so that puts an
> upper limit on the definition of "friendly").
> 
> Having said "regardless of versioning" it's clear that 
> replication and synchronisation will be much easier with 
> versioning of atomic datasets, because then we can keep the
> metadata in sync (otherwise we might as well give up and
> point the metadata to individual files).
> 
> So, when we get data from the Met Office, they can give it to
> us in whatever chunks they like, but it'd be up to us to 
> decide how often to "ESG publish" it, and I think each time 
> we do that, it should be a new version ... 
> 
> Further, I think Sébastien's suggestion is a good one. I
> would vote for it: you can publish once you reach the
> minimum duration, and then republish later with a longer
> simulation if you want to, but in a sense it will be a new
> version (just a longer version). That would probably
> forestall most of the issues with "friendly" frequencies ...
> 
> Bryan
> _______________________________________________
> GO-ESSP-TECH mailing list
> GO-ESSP-TECH at ucar.edu
> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech
> 