[Go-essp-tech] CMIP5 Version directory structure update

Bryan Lawrence bryan.lawrence at stfc.ac.uk
Thu Jun 10 09:36:58 MDT 2010


Hi Stephen

We have always planned to version simulation metadata ... see 
http://metaforclimate.eu/trac/wiki/CMIP5/VarURIStructure

And the questionnaire already supports this concept: once a metadata 
document has been published, changing it and then republishing it 
automatically creates a new version.
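
As a rough illustration of that republish behaviour, here is a minimal 
Python sketch. The MetadataDocument class, its fields and its publish() 
method are hypothetical names invented for this example; they are not 
the questionnaire's actual code, and the real system may well decide 
"changed" differently.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MetadataDocument:
    """Hypothetical stand-in for a questionnaire document (not Metafor code)."""
    doc_id: str
    draft: str
    published: List[str] = field(default_factory=list)  # index 0 holds version 1

    @property
    def version(self) -> Optional[int]:
        # None until the document has been published at least once.
        return len(self.published) or None

    def publish(self) -> int:
        # Assumption: a new version is created only when the draft differs
        # from the most recently published content.
        if not self.published or self.published[-1] != self.draft:
            self.published.append(self.draft)
        return len(self.published)


doc = MetadataDocument("sim-001", "model: HadCM3")
v1 = doc.publish()            # first publication
v1_again = doc.publish()      # unchanged, so no new version
doc.draft = "model: HadGEM3"  # the description is corrected
v2 = doc.publish()            # changed and republished, so a new version
```

The earlier published content stays in the list, so anything that 
already cites version 1 still has something to resolve to.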

The only piece that hasn't been dealt with properly is the implied 
recordset (as opposed to an actual physical recordset).

Metafor has very little to do to deal with this.
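
For anyone skimming, the s / d1 / d2 / a1 arrangement from the quoted 
messages below might be sketched like this in Python. All class, 
function and DOI names here are assumptions made up for illustration 
(this is not the CIM schema): the point is only that the association is 
its own versioned object, so releasing new data bumps the realm 
description and the association, but not the simulation description.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class DocVersion:
    """One immutable version of a description (illustrative name)."""
    name: str
    version: int


@dataclass(frozen=True)
class Association:
    """First-class record relating a simulation to its realm descriptions."""
    name: str
    version: int
    simulation: DocVersion
    realms: Tuple[DocVersion, ...]


def reissue(assoc: Association, old: DocVersion, new: DocVersion) -> Association:
    """Return the next association version with `old` swapped for `new`."""
    if old not in assoc.realms:
        raise ValueError(f"{old} is not part of {assoc.name}")
    realms = tuple(new if r == old else r for r in assoc.realms)
    return Association(assoc.name, assoc.version + 1, assoc.simulation, realms)


s = DocVersion("s", 1)
d1, d2 = DocVersion("d1", 1), DocVersion("d2", 1)
a1 = Association("a", 1, s, (d1, d2))

# New data for realm d1 is released: d1 and the association move on,
# while s and d2 keep their version numbers.
d1_v2 = DocVersion("d1", 2)
a2 = reissue(a1, d1, d1_v2)

# Hypothetical DOI strings; each DOI resolves to one association version.
dois: Dict[str, Association] = {"doi:example/a.v1": a1, "doi:example/a.v2": a2}
```

Rendering a citation would then mean looking up the association behind 
the DOI and displaying its simulation together with the realm versions 
it lists.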

Cheers
Bryan

On Thursday 10 Jun 2010 16:21:51 Pascoe, Stephen (STFC,RAL,SSTD) wrote:
> Bryan,
> 
> So you are going to version simulation metadata.  Are you going to
> independently version all the other bits of CIM too?  What happens if
> there is a qc failure at the experiment requirements level -- does
>  that imply generating a whole new set of DOIs?
> 
> I guess what I'm getting at is that you may want to pick your
>  priorities to avoid the system getting too complex.
> 
> I get the category error issue.  It's equivalent to the need for
> database normalisation in the RDBMS world: if you have two entities in
> a many-to-many mapping you need an association table.
> 
> S.
> 
> 
> ---
> Stephen Pascoe  +44 (0)1235 445980
> British Atmospheric Data Centre
> Rutherford Appleton Laboratory
> 
> -----Original Message-----
> From: Bryan Lawrence [mailto:bryan.lawrence at stfc.ac.uk]
> Sent: 10 June 2010 15:54
> To: Pascoe, Stephen (STFC,RAL,SSTD); Metafor List
> Cc: Martina Stockhause; go-essp-tech at ucar.edu; A. Treshansky
> Subject: Re: [Go-essp-tech] CMIP5 Version directory structure update
> 
> Hi Stephen
> 
> First: the confusion.
> 
> This is the category error that I was talking about.  If you have two
> real world objects A and B, with descriptions a and b, and a
> relationship R described by r
> 
> Then experience (see ebRIM et al) tells us not to put the
>  relationship r inside either a or b, because invariably you have to
>  version either a or b or both, and once you put r inside one (or
>  both), you then have to version both - and potentially change both
>  even if only one has changed.
> 
> Better is to make r a first class object, which is logically
> equivalent to creating a new class s which looks rather like a
> triple: all it does is say a r b ...
> 
> Now, if you change either a or b, you do need to change s, but not
>  the other one (which shouldn't change, because it hasn't!)
> 
> So the use case: just as folks will fix data versions, they'll fix
> metadata descriptions. What if I wrote a simulation description where
> I said I used HadCM3, but actually I used HadGEM3? Somehow (qc
> failure), a DOI was created that pointed to that *simulation
> description*. (The DOIs will point to the metadata, not directly to
> the data).  Someone writes a paper about how the convection
> parameterisation did xyz ... wrong! Need to have a corrigendum and a
> new DOI pointing to the correct metadata.
> 
> This is one of the reasons for the doc stereotype in metafor: amongst
> other things, it indicates something for which version history is
> independently interesting.
> 
> So we don't want objects having version increments if they haven't
> changed ... that's a  pretty good rule of thumb for provenance.
> 
> Cheers
> Bryan
> 
> On Thursday 10 Jun 2010 15:34:57 Pascoe, Stephen (STFC,RAL,SSTD) wrote:
> > Hi Bryan,
> >
> > > * Yes, and we'll have to increment those documents' version to
> > > conform with any changes to the realm datasets, even if they
> > > themselves don't change.
> > >
> > > This nicely contradicts something I was saying at the sprint:
> > > that we could do the associations with a query. I should have
> > > known better. The registry world deals with these sorts of
> > > associations as first class objects in their own right. So, eg,
> > > we should have:
> > >  s: a simulation description (v1)
> > >  d1, and d2: two realm descriptions (v1)
> > >  a1: the association from s to d1 and d2 (at v1)
> >
> > This is confusing me a little.  Are you suggesting s, d<n> and a<n>
> > are all independently versioned?  Any class that you want to
> > version effectively introduces a second class, the "class version".
> >  E.g. in CMIP5 we have "datasets" and "dataset versions".  I can
> > see us getting into a mess if we have "simulation versions" and
> > "association versions". Would it be better to just have
> > "association versions" and point the DOI to that?  What is the use
> > case for versioning the simulation metadata?
> >
> > S.
> >
> > ---
> > Stephen Pascoe  +44 (0)1235 445980
> > British Atmospheric Data Centre
> > Rutherford Appleton Laboratory
> >
> > -----Original Message-----
> > From: Bryan Lawrence [mailto:bryan.lawrence at stfc.ac.uk]
> > Sent: 10 June 2010 15:09
> > To: Martina Stockhause
> > Cc: go-essp-tech at ucar.edu; Pascoe, Stephen (STFC,RAL,SSTD); A.
> > Treshansky
> > Subject: Re: [Go-essp-tech] CMIP5 Version directory structure
> > update
> >
> > Hi Martina
> >
> > > Sorry. I meant that we assign DOIs in two granularities: One as
> > > you described for the group of datasets belonging to a simulation
> > > (metafor) / experiment (DRS level); and the second for coarser
> > > citations for the data produced by a whole modelling center or by
> > > one GCM of a modelling center. This was for citation convenience
> > > for publications that analyse and compare many CMIP5 simulations.
> >
> > OK, I understand what you intend (but didn't know - or forgot - you
> > planned to do so), but what do you expect those DOIs to land on?
> >
> > Currently we expect the DOIs for the simulation to resolve to the
> > metafor page for the simulation (but see below *)
> >
> > We currently have no logical CIM document which makes sense for
> > landing on a centre or a model.... and I'm not sure it makes real
> > sense either: academia doesn't have a DOI for normal publication
> > aggregations - apart from those which are published as deliberate
> > anthologies ... when would one publish these aggregations? At the
> > end of CMIP5, at the close off date for the IPCC ...???  How would
> > we deal with these in the q.c. sense? We'd end up with a new level
> > of versioning for the centre and the model ... (and only wrt their
> > use, not them themselves).
> >
> > > > My understanding is that there
> > > > will be DOIs that point to simulations (which are run by
> > > > models at institutes on platforms in conformance to
> > > > experiments). Those documents will point to multiple realm
> > > > datasets.
> >
> > * Yes, and we'll have to increment those documents' version to
> > conform with any changes to the realm datasets, even if they
> > themselves don't change.
> >
> > This nicely contradicts something I was saying at the sprint: that
> > we could do the associations with a query. I should have known
> > better. The registry world deals with these sorts of associations as
> > first class objects in their own right. So, eg, we should have:
> >  s: a simulation description (v1)
> >  d1, and d2: two realm descriptions (v1)
> >  a1: the association from s to d1 and d2 (at v1)
> >
> > The citation will need to point to a1, not s, and be rendered
> > accordingly (one doesn't render a1, one renders s with the
> > associations denoted in a1).
> >
> > If new versions of data are released, then even if s doesn't
> > change, a new association group can be created, and a new doi to
> > point at that.
> >
> > I don't think the CIM (currently) handles this gracefully. It will
> > within a week or so :-)  (It would have had we really gone down the
> > OGC route properly where this stuff has already been thought
> > through in the context of ebRIM).
> >
> > Hopefully that'll handle these issues.
> >
> > Cheers
> > Bryan
> >
> > --
> > Bryan Lawrence
> > Director of Environmental Archival and Associated Research
> > (NCAS/British Atmospheric Data Centre and NCEO/NERC NEODC) STFC,
> > Rutherford Appleton Laboratory Phone +44 1235 445012; Fax ... 5848;
> > Web: home.badc.rl.ac.uk/lawrence
> 

-- 
Bryan Lawrence
Director of Environmental Archival and Associated Research
(NCAS/British Atmospheric Data Centre and NCEO/NERC NEODC)
STFC, Rutherford Appleton Laboratory
Phone +44 1235 445012; Fax ... 5848; 
Web: home.badc.rl.ac.uk/lawrence

