[Go-essp-tech] Are atomic datasets mutable?
Bentley, Philip
philip.bentley at metoffice.gov.uk
Thu Nov 19 06:49:09 MST 2009
Hi Folks,
I think I'm the only UKMO person who subscribes to this list so I should probably chip in ;-)
Our line of thinking yesterday was that, for these long simulations, because we'd only be *extending* the timespan covered by the atomic dataset and not actually *modifying* the content of the existing NetCDF files that make up the dataset, we weren't sure whether this really constituted a new version.
However, the arguments and examples put forward on this list do indeed suggest that we are dealing with a new version of the atomic dataset - i.e. when thought of as a holistic collection of files. So although the file content may not have changed, the definition of the dataset has (e.g. vN = 10 files, vN+1 = 11 files).
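To make that concrete, here is a quick sketch (Python, purely illustrative; the DatasetVersion class and the file names are invented for this example, not anything from the ESG publisher) of treating each dataset version as an immutable manifest of files, so that appending one month's file gives vN+1 even though the original files are byte-for-byte unchanged:

from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class DatasetVersion:
    """One immutable version of an atomic dataset: an ordered manifest of files."""
    version: int
    files: Tuple[str, ...]

    def extend(self, new_file: str) -> "DatasetVersion":
        """Extending the timespan appends a file and bumps the version;
        the existing files are untouched."""
        return DatasetVersion(self.version + 1, self.files + (new_file,))

# vN: ten monthly files already published (hypothetical file names)
v_n = DatasetVersion(1, tuple(f"tas_2009{m:02d}.nc" for m in range(1, 11)))

# vN+1: the same ten files plus one new month, i.e. a new dataset version
v_n1 = v_n.extend("tas_200911.nc")

assert len(v_n.files) == 10 and len(v_n1.files) == 11
assert v_n1.files[:10] == v_n.files      # file content/identity unchanged
assert v_n1.version == v_n.version + 1   # but the dataset definition has changed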
Presumably there needs to be some corresponding process to handle versioned updates to the associated metadata. But that's another story, I guess... :-)
Regards,
Phil
> -----Original Message-----
> From: go-essp-tech-bounces at ucar.edu
> [mailto:go-essp-tech-bounces at ucar.edu] On Behalf Of Bryan Lawrence
> Sent: 19 November 2009 13:15
> To: go-essp-tech at ucar.edu
> Subject: Re: [Go-essp-tech] Are atomic datasets mutable?
>
> Hi Martin
>
> I agree.
>
> We need to have some clear blue water between a disk
> subsystem and publication.
>
> At one end of the spectrum of the ways one could do this, we
> would have a supercomputer writing directly to an ESG data
> node disk (yeah, I know how likely that isn't) ...
>
> ... at the other end of the spectrum we have a modelling
> group waiting until their data was perfectly polished and
> sending us one copy, just once, via some nice shiny data
> transfer protocol (disk, BDM, whatever) ....
>
> Neither is particularly likely, but some atomic datasets
> might look like the latter; not all of them from any given
> institution will, I suspect ...
>
> I think those of us running ESG data nodes, whether "on
> behalf of" (like us) or "for real" (like, for example, IPSL),
> need to conform to some agreed principles (which is why we
> have "operational procedures" and "quality control" as things
> to agree on, and probably revisit, regularly).
>
> One of those principles would be to "update" atomic datasets
> at a "friendly" frequency. Regardless of whether these things
> are versioned at source, they'd still have to be replicated
> and synchronised ... and that process would need to be
> managed and verified ... which I think suggests some sort of
> friendly agreement on update frequency would be advantageous
> (which is not to say it need be particularly intermittent,
> but for the foreseeable future, real q.c. and verification is
> likely to involve human inspection, and so that puts an upper
> limit on the definition of "friendly").
>
> Having said "regardless of versioning", it's clear that
> replication and synchronisation will be much easier with
> versioning of atomic datasets, because then we can keep the
> metadata in sync (otherwise we might as well give up and
> point the metadata to individual files).
>
> So, when we get data from the Met Office, they can give it to
> us in whatever chunks they like, but it'd be up to us to
> decide how often to "ESG publish" it, and I think each time
> we do that, it should be a new version ...
>
> Further, I think Sébastien's suggestion is a good one. I
> would vote for: you can publish once you reach the minimum
> duration, and then republish later with a longer simulation
> if you want to, but in a sense it will be a new version
> (just a longer one). That would probably forestall most of
> the issues with "friendly" frequencies ...
>
> Bryan
> _______________________________________________
> GO-ESSP-TECH mailing list
> GO-ESSP-TECH at ucar.edu
> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech
>