[Go-essp-tech] CMIP5 Version directory structure update

Bryan Lawrence bryan.lawrence at stfc.ac.uk
Thu Jun 10 08:53:33 MDT 2010


Hi Stephen

First: the confusion.

This is the category error that I was talking about.  If you have two
real world objects A and B, with descriptions a and b, and a
relationship R described by r, then experience (see ebRIM et al) tells
us not to put the relationship r inside either a or b, because
invariably you have to version either a or b or both, and once you put
r inside one (or both), you then have to version both - and potentially
change both even if only one has changed.

Better is to make r a first class object, which is logically equivalent
to creating a new class s which looks rather like a triple: all it does
is say a r b ...

Now, if you change either a or b, you do need to change s, but not the 
other one (which shouldn't change, because it hasn't!)
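
A minimal Python sketch of that idea, purely illustrative: the class and
function names (Description, Association, revise, relink) are my own
assumptions for this example and are not taken from the CIM or ebRIM.
The association s holds references to specific versions of a and b, so
revising a produces a new a and a new s, while b is left untouched.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Description:
    """A versioned description of a real-world object (hypothetical type)."""
    name: str
    version: int
    body: str


@dataclass(frozen=True)
class Association:
    """The relationship r held as its own object s: 'subject r target'."""
    subject: tuple   # (name, version) of a
    relation: str
    target: tuple    # (name, version) of b
    version: int


def ref(d):
    """Reference a description by name and version, not by embedding it."""
    return (d.name, d.version)


def revise(d, new_body):
    """Return a new version of a description; the old one is unchanged."""
    return replace(d, version=d.version + 1, body=new_body)


def relink(s, subject=None, target=None):
    """Return a new version of the association pointing at revised ends."""
    return Association(
        subject=ref(subject) if subject else s.subject,
        relation=s.relation,
        target=ref(target) if target else s.target,
        version=s.version + 1,
    )


a = Description("a", 1, "description of A")
b = Description("b", 1, "description of B")
s = Association(ref(a), "r", ref(b), 1)

# Only a changes: a and s get new versions, b stays at v1.
a2 = revise(a, "corrected description of A")
s2 = relink(s, subject=a2)
```

Had r been stored inside b, the change to a would have forced a new
version of b as well, which is exactly what we want to avoid.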

So the use case: just as folks will fix data versions, they'll fix
metadata descriptions. What if I wrote a simulation description where I
said I used HadCM3, but actually I used HadGEM3? Somehow (qc failure), a
DOI was created that pointed to that *simulation description*. (The DOIs
will point to the metadata, not directly to the data.)  Someone writes a
paper about how the convection parameterisation did xyz ... wrong! We
then need a corrigendum and a new DOI pointing to the correct metadata.

This is one of the reasons for the doc stereotype in metafor: amongst
other things, it indicates something for which version history is
independently interesting.

So we don't want objects having version increments if they haven't
changed ... that's a pretty good rule of thumb for provenance.
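
To make the use case and the rule of thumb concrete, here is a toy
Python sketch. Everything in it is an assumption for illustration: the
MetadataStore class, its methods, and the "doi:example/..." strings are
placeholders, not real DOIs or any existing API. It also assumes that a
DOI, once minted, is never re-pointed, so a correction always gets a new
one.

```python
class MetadataStore:
    """Toy in-memory store; all names here are illustrative."""

    def __init__(self):
        self.versions = {}   # document name -> list of bodies (index = version - 1)
        self.dois = {}       # DOI string -> (document name, version)

    def publish(self, name, body):
        """Store a body; increment the version only if the body changed."""
        history = self.versions.setdefault(name, [])
        if history and history[-1] == body:
            return len(history)          # unchanged: no version increment
        history.append(body)
        return len(history)

    def mint_doi(self, doi, name, version):
        """Pin a DOI to one specific version of one document."""
        if doi in self.dois:
            raise ValueError("a DOI is never re-pointed")
        self.dois[doi] = (name, version)

    def resolve(self, doi):
        """Return the metadata body that a DOI was pinned to."""
        name, version = self.dois[doi]
        return self.versions[name][version - 1]


store = MetadataStore()

# The faulty description slips through qc and gets a DOI.
v1 = store.publish("simulation", "model: HadCM3")
store.mint_doi("doi:example/1", "simulation", v1)

# Re-publishing identical content does not create a new version.
same = store.publish("simulation", "model: HadCM3")

# The correction is a new version with its own DOI; the old DOI still
# resolves to what was originally cited.
v2 = store.publish("simulation", "model: HadGEM3")
store.mint_doi("doi:example/2", "simulation", v2)
```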

Cheers
Bryan

On Thursday 10 Jun 2010 15:34:57 Pascoe, Stephen (STFC,RAL,SSTD) wrote:
> Hi Bryan,
> 
> > * Yes, and we'll have to increment those documents' versions to
> > conform with any changes to the realm datasets, even if they
> > themselves don't change.
> >
> > This nicely contradicts something I was saying at the sprint: that
> > we could do the associations with a query. I should have known
> > better. The registry world deals with these sorts of associations
> > as first class objects in their own right. So, eg, we should have:
> >  s: a simulation description (v1)
> >  d1, and d2: two realm descriptions (v1)
> >  a1: the association from s to d1 and d2 (at v1)
> 
> This is confusing me a little.  Are you suggesting s, d<n> and a<n>
>  are all independently versioned?  Any class that you want to version
>  effectively introduces a second class, the "class version".  E.g. in
>  CMIP5 we have "datasets" and "dataset versions".  I can see us
>  getting into a mess if we have "simulation versions" and
>  "association versions". Would it be better to just have "association
>  versions" and point the DOI to that?  What is the use case for
>  versioning the simulation metadata?
> 
> S.
> 
> ---
> Stephen Pascoe  +44 (0)1235 445980
> British Atmospheric Data Centre
> Rutherford Appleton Laboratory
> 
> -----Original Message-----
> From: Bryan Lawrence [mailto:bryan.lawrence at stfc.ac.uk]
> Sent: 10 June 2010 15:09
> To: Martina Stockhause
> Cc: go-essp-tech at ucar.edu; Pascoe, Stephen (STFC,RAL,SSTD); A.
> Treshansky
> Subject: Re: [Go-essp-tech] CMIP5 Version directory structure update
> 
> Hi Martina
> 
> > Sorry. I meant that we assign DOIs in two granularities: One as you
> > described for the group of datasets belonging to a simulation
> >  (metafor) / experiment (DRS level); and the second for coarser
> > citations for the data produced by a whole modelling center or by
> > one GCM of a modelling center. This was for citation convenience for
> > publications that analyse and compare many CMIP5 simulations.
> 
> OK, I understand what you intend (but didn't know - or forgot - you
> planned to do so), but what do you expect those DOIs to land on?
> 
> Currently we expect the DOIs for the simulation to resolve to the
> metafor page for the simulation (but see below *)
> 
> We currently have no logical CIM document which makes sense for
>  landing on a centre or a model.... and I'm not sure it makes real
>  sense either: academia doesn't have a DOI for normal publication
>  aggregations - apart from those which are published as deliberate
>  anthologies ... when would one publish these aggregations? At the
>  end of CMIP5, at the close off date for the IPCC ...???  How would
>  we deal with these in the q.c. sense? We'd end up with a new level
>  of versioning for the centre and the model ... (and only wrt
>  their use, not them themselves).
> 
> > > My understanding is that there
> > > will be DOIs that point to simulations (which are run by
> > > models at institutes on platforms in conformance to experiments).
> > > Those documents will point to multiple realm datasets.
> 
> * Yes, and we'll have to increment those documents' versions to conform
> with any changes to the realm datasets, even if they themselves don't
> change.
> 
> This nicely contradicts something I was saying at the sprint: that we
> could do the associations with a query. I should have known better.
>  The registry world deals with these sorts of associations as first
>  class objects in their own right. So, eg, we should have:
>  s: a simulation description (v1)
>  d1, and d2: two realm descriptions (v1)
>  a1: the association from s to d1 and d2 (at v1)
> 
> The citation will need to point to a1, not s, and be rendered
> accordingly (one doesn't render a1, one renders s with the
>  associations denoted in a1).
> 
> If new versions of data are released, then even if s doesn't change,
>  a new association group can be created, and a new DOI minted to point
>  at that.
> 
> I don't think the CIM (currently) handles this gracefully. It will
> within a week or so :-)  (It would have had we really gone down the
>  OGC route properly where this stuff has already been thought through
>  in the context of ebRIM).
> 
> Hopefully that'll handle these issues.
> 
> Cheers
> Bryan
> 
> --
> Bryan Lawrence
> Director of Environmental Archival and Associated Research
>  (NCAS/British Atmospheric Data Centre and NCEO/NERC NEODC) STFC,
>  Rutherford Appleton Laboratory Phone +44 1235 445012; Fax ... 5848;
> Web: home.badc.rl.ac.uk/lawrence
> 

-- 
Bryan Lawrence
Director of Environmental Archival and Associated Research
(NCAS/British Atmospheric Data Centre and NCEO/NERC NEODC)
STFC, Rutherford Appleton Laboratory
Phone +44 1235 445012; Fax ... 5848; 
Web: home.badc.rl.ac.uk/lawrence

