[Go-essp-tech] Are atomic datasets mutable?

Fri Nov 20 02:37:31 MST 2009

Hello, 

> > Option 1) 21st century data for the RCP4.5 future run is received and
> > identified as version 1.  Then the continuation of that run to the end
> > of the 23rd century is received and stored as version 2.  New users will
> > have to download both version 1 and version 2 to get the complete run.
>
I'm not sure about the last point (from Karl): I would expect the latest
version of the atomic dataset to "contain" all the files. I've used
quotes because there was a stage at which "atomic datasets" were
considered as abstract data collections rather than data directories. If
we are now talking about data directories we obviously would not want to
copy all the files from version-1 into version-2. 
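
To illustrate what I mean by "contain", here is a minimal sketch only. It assumes a per-version manifest that maps each file name to the directory where that file is physically stored; the function name, directory names and file names are all hypothetical. Version 2 lists every file in the run while storing only the new ones:

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable


def build_manifest(previous: Dict[str, str], new_files: Iterable[str],
                   version_dir: str) -> Dict[str, str]:
    """Return the manifest (file name -> storage path) for a new version.

    Hypothetical scheme: the new version inherits references to every file
    listed in the previous manifest, and only the files in new_files are
    physically stored under version_dir. Nothing is copied.
    """
    manifest = dict(previous)  # reference earlier files rather than copying them
    for name in new_files:
        # A file with the same name replaces the inherited reference.
        manifest[name] = str(PurePosixPath(version_dir) / name)
    return manifest


v1 = build_manifest({}, ["tas_2006-2100.nc"], "v1")
v2 = build_manifest(v1, ["tas_2101-2300.nc"], "v2")
print(v2)
```

A user who asks for the latest version gets the full run from the version-2 manifest, even though the 21st-century file still lives only in the "v1" directory.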

On Bryan's point about different experiments, I note that there is some
ambiguity in the text of the DRS as to whether the controlled vocabulary
item "scenario/experiment" refers to experiment or scenario, but the
appendix does imply that the item will be the scenario (e.g. "rcp45"),
and I think it will be easier for the users if we take this approach.

On the distinction between extensions and changes: could we use decimal
versions, starting at v1.00, moving to v1.01 when data is added, and to
v2.00 when any data in the atomic dataset (even a single file) is changed?

This would mean that we could use a single directory "v1" to "contain"
v1.00, v1.01, etc., combining a reasonably stable directory structure
with a more detailed version history in the metadata held at the atomic
dataset level.
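
As a rough sketch of the rule I have in mind: this assumes each version records a checksum per file (my assumption, not something the DRS specifies), the function names are hypothetical, and a removed file is treated as a change.

```python
from typing import Dict, Tuple

Version = Tuple[int, int]  # (major, minor), e.g. (1, 1) for v1.01


def next_version(current: Version, old: Dict[str, str],
                 new: Dict[str, str]) -> Version:
    """Decide the next version number from two file -> checksum listings."""
    major, minor = current
    # Any existing file altered or removed counts as a change: new major version.
    changed = any(new.get(name) != checksum for name, checksum in old.items())
    if changed:
        return (major + 1, 0)
    # Only new files added: an extension, so bump the minor number.
    if set(new) - set(old):
        return (major, minor + 1)
    return current  # nothing different


def version_label(version: Version) -> str:
    return f"v{version[0]}.{version[1]:02d}"


def directory_name(version: Version) -> str:
    # All minor versions share one directory, e.g. v1.00 and v1.01 live in "v1".
    return f"v{version[0]}"


extended = next_version((1, 0), {"a.nc": "c1"}, {"a.nc": "c1", "b.nc": "c2"})
print(version_label(extended), directory_name(extended))  # v1.01 v1
```

The point of splitting the number this way is that the directory only moves when existing data changes, while the full history (v1.00, v1.01, ...) is kept in the atomic dataset metadata.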

Cheers,
Martin

> -----Original Message-----
> From: go-essp-tech-bounces at ucar.edu [mailto:go-essp-tech-
> bounces at ucar.edu] On Behalf Of Bryan Lawrence
> Sent: 20 November 2009 06:09
> To: go-essp-tech at ucar.edu
> Subject: Re: [Go-essp-tech] Are atomic datasets mutable?
> 
> Hi Karl
> 
> I've promised Don to write something up about the use cases for
> versioning as we understand them, and that would include the use cases
> you describe ... but to cut to the chase, the bottom line is: you can't
> have a DOI pointing at an object and allow it to change (even by
> extension).
> 
> > I think the users will be confused if a new "version" of model output
> > (that has been modified in some way) is indistinguishable from model
> > output that has been simply extended.  Both of the following options
> > will be confusing:
> 
> The key word is "indistinguishable". How? You do need to start relying
> on metadata (internal AND external to files) or this data management
> job is just impossible.
> 
> > Option 1) 21st century data for the RCP4.5 future run is received and
> > identified as version 1.  Then the continuation of that run to the end
> > of the 23rd century is received and stored as version 2.  New users will
> > have to download both version 1 and version 2 to get the complete run.
> 
> There are two classes of this example:
>  - some of these examples are going to come from tier1 experiments
> which extend core experiments.
>  - some are going to be because of the way folk have done their runs
> 
> The first instance is covered by the fact that technically the second
> tranche of data is a different atomic dataset, and analysis should
> exploit a concatenation of two atomic datasets ... so the issue, I think,
> is about the second case, and I think we covered that in discussion
> yesterday.
> 
> > Option 2) 21st century data for the RCP4.5 future run is received and
> > identified as version 1.   Then the continuation of that run to the end
> > of the 23rd century is received and stored as version 2 along with a
> > copy of the data already stored as version 1.  In this case a new user
> > will get all the data by downloading version 2, but an old user who
> > already downloaded version 1 won't know if what's in version 2 is a
> > duplicate of the data he already has, or is replacement data which has
> > corrected some problems in the earlier version.
> 
> Metadata metadata metadata.
> 
> > I would suggest therefore that for a single experiment, it would be best
> > from a user's perspective to not assign a new version to model output
> > that simply extends a previous run.  We will have to find a method by
> > which to advise old users who already downloaded data that the runs have
> > now been extended.
> 
> I don't think this is the right solution; it makes for even more
> confusion in the long run, because your two users have used *different*
> data in their analyses and yet they have assigned it the same
> "reference" (name including version, with or without a DOI).
> 
> Speaking as a potential DOI authority, there can be no chance of
> allowing something to change underfoot otherwise we'd have no
> credibility. Extension is change.
> 
> As I say, we'll try and write something coherent on this over the next
> fortnight.
> 
> Bryan
> 
> --
> Bryan Lawrence
> Director of Environmental Archival and Associated Research
> (NCAS/British Atmospheric Data Centre and NCEO/NERC NEODC)
> STFC, Rutherford Appleton Laboratory
> Phone +44 1235 445012; Fax ... 5848;
> Web: home.badc.rl.ac.uk/lawrence
> _______________________________________________
> GO-ESSP-TECH mailing list
> GO-ESSP-TECH at ucar.edu
> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech