[Go-essp-tech] CMIP5 Version directory structure update

stephen.pascoe at stfc.ac.uk stephen.pascoe at stfc.ac.uk
Thu Jun 10 09:21:51 MDT 2010


Bryan,

So you are going to version simulation metadata.  Are you going to
independently version all the other bits of CIM too?  What happens if
there is a qc failure at the experiment requirements level -- does that
imply generating a whole new set of DOIs?

I guess what I'm getting at is that you may want to pick your priorities
to avoid the system getting too complex.

I get the category error issue.  It's equivalent to the need for
database normalisation in the RDBMS world: if you have two entities in a
many-to-many mapping you need an association table.
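
For anyone less familiar with the database analogy, here is a minimal
sketch using Python's built-in sqlite3 module. The table and column
names (simulation, dataset, simulation_dataset) are hypothetical and
chosen only for illustration; they are not taken from the CIM or any
CMIP5 schema.

```python
import sqlite3

# Hypothetical schema: all table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE simulation (id TEXT PRIMARY KEY);
    CREATE TABLE dataset (id TEXT PRIMARY KEY);
    -- The association table: one row per (simulation, dataset) link,
    -- so neither entity table has to embed the relationship.
    CREATE TABLE simulation_dataset (
        simulation_id TEXT NOT NULL REFERENCES simulation(id),
        dataset_id TEXT NOT NULL REFERENCES dataset(id),
        PRIMARY KEY (simulation_id, dataset_id)
    );
""")
conn.executemany("INSERT INTO simulation VALUES (?)", [("s",), ("s2",)])
conn.executemany("INSERT INTO dataset VALUES (?)", [("d1",), ("d2",)])
conn.executemany(
    "INSERT INTO simulation_dataset VALUES (?, ?)",
    [("s", "d1"), ("s", "d2"), ("s2", "d1")],
)


def datasets_for(simulation_id):
    """Return the dataset ids linked to one simulation, sorted by id."""
    rows = conn.execute(
        "SELECT dataset_id FROM simulation_dataset "
        "WHERE simulation_id = ? ORDER BY dataset_id",
        (simulation_id,),
    )
    return [row[0] for row in rows]
```

Adding or removing a link only touches simulation_dataset; the rows in
simulation and dataset stay as they were.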

S.


---
Stephen Pascoe  +44 (0)1235 445980
British Atmospheric Data Centre
Rutherford Appleton Laboratory

-----Original Message-----
From: Bryan Lawrence [mailto:bryan.lawrence at stfc.ac.uk] 
Sent: 10 June 2010 15:54
To: Pascoe, Stephen (STFC,RAL,SSTD); Metafor List
Cc: Martina Stockhause; go-essp-tech at ucar.edu; A. Treshansky
Subject: Re: [Go-essp-tech] CMIP5 Version directory structure update

Hi Stephen

First: the confusion.

This is the category error that I was talking about.  If you have two
real world objects A and B, with descriptions a and b, and a
relationship R described by r, then experience (see ebRIM et al) tells
us not to put the relationship r inside either a or b. Invariably you
have to version either a or b or both, and once you put r inside one (or
both), you then have to version both - and potentially change both even
if only one has changed.

Better is to make r a first class object, which is logically equivalent
to creating a new class s which looks rather like a triple: all it does
is say a r b ...

Now, if you change either a or b, you do need to change s, but not the
other one (which shouldn't change, because it hasn't!)
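
A rough sketch of that idea in Python, assuming immutable versioned
descriptions and a triple-like association object. The class and
function names (Description, Association, ref, revise, relink) are
hypothetical and are not part of the CIM or ebRIM.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Description:
    """A versioned description (a or b in the text above)."""
    name: str
    version: int
    text: str


@dataclass(frozen=True)
class Association:
    """The first class object s: it only says 'source r target'."""
    source: tuple    # (name, version) of a
    relation: str    # r
    target: tuple    # (name, version) of b
    version: int


def ref(description):
    """Identify one specific version of a description."""
    return (description.name, description.version)


def revise(description, text):
    """Return a new version of a description with corrected text."""
    return replace(description, version=description.version + 1, text=text)


def relink(association, a, b):
    """Return a new version of s pointing at the given versions of a and b."""
    return Association(ref(a), association.relation, ref(b),
                       association.version + 1)


a = Description("a", 1, "simulation description")
b = Description("b", 1, "realm dataset description")
s = Association(ref(a), "r", ref(b), 1)

# b changes, so s has to change, but a keeps its version.
b2 = revise(b, "corrected realm dataset description")
s2 = relink(s, a, b2)
```

Here only b and s gain a new version; a is never rewritten because
nothing about it changed.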

So the use case: just as folks will fix data versions, they'll fix
metadata descriptions. What if I wrote a simulation description where I
said I used HadCM3, but actually I used HadGEM3? Somehow (qc failure), a
DOI was created that pointed to that *simulation description*. (The DOIs
will point to the metadata, not directly to the data).  Someone writes a
paper about how the convection parameterisation did xyz ... wrong! Need
to have a corrigendum and a new DOI pointing to the correct metadata.
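
To make the use case concrete, a small sketch, assuming DOIs are simply
keys that resolve to one fixed version of a metadata description. The
in-memory dictionaries, the publish/resolve helpers, the document id
"sim-1" and the "doi:example/..." strings are all made up for
illustration; a real DOI resolver is an external service.

```python
# Hypothetical in-memory stand-ins for a metadata store and a DOI resolver.
descriptions = {}   # (doc_id, version) -> description text
dois = {}           # doi string -> (doc_id, version)


def publish(doc_id, text, doi):
    """Store the text as the next version of doc_id and point doi at it."""
    latest = max((v for (d, v) in descriptions if d == doc_id), default=0)
    version = latest + 1
    descriptions[(doc_id, version)] = text
    dois[doi] = (doc_id, version)
    return version


def resolve(doi):
    """Return the description text that a DOI points to."""
    return descriptions[dois[doi]]


# The erroneous description slips through qc and gets a DOI.
publish("sim-1", "model: HadCM3", "doi:example/sim-1.v1")
# The correction is a new version with its own DOI; the old DOI still
# resolves to what was originally cited.
publish("sim-1", "model: HadGEM3", "doi:example/sim-1.v2")
```

The old DOI keeps resolving to the description that the paper actually
cited, which is what makes a corrigendum possible.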

This is one of the reasons for the doc stereotype in metafor: amongst
other things, it indicates something for which version history is
independently interesting.

So we don't want objects having version increments if they haven't
changed ... that's a  pretty good rule of thumb for provenance.

Cheers
Bryan

On Thursday 10 Jun 2010 15:34:57 Pascoe, Stephen (STFC,RAL,SSTD) wrote:
> Hi Bryan,
> 
> > * Yes, and we'll have to increment those documents' version to conform
> > with any changes to the realm datasets, even if they themselves don't
> > change.
> >
> > This nicely contradicts something I was saying at the sprint: that we
> > could do the associations with a query. I should have known better.
> > The registry world deals with these sorts of associations as first
> > class objects in their own right. So, eg, we should have:
> >  s: a simulation description (v1)
> >  d1, and d2: two realm descriptions (v1)
> >  a1: the association from s to d1 and d2 (at v1)
> 
> This is confusing me a little.  Are you suggesting s, d<n> and a<n>
> are all independently versioned?  Any class that you want to version
> effectively introduces a second class, the "class version".  E.g. in
> CMIP5 we have "datasets" and "dataset versions".  I can see us
> getting into a mess if we have "simulation versions" and "association
> versions". Would it be better to just have "association versions" and
> point the DOI to that?  What is the use case for versioning the
> simulation metadata?
> 
> S.
> 
> ---
> Stephen Pascoe  +44 (0)1235 445980
> British Atmospheric Data Centre
> Rutherford Appleton Laboratory
> 
> -----Original Message-----
> From: Bryan Lawrence [mailto:bryan.lawrence at stfc.ac.uk]
> Sent: 10 June 2010 15:09
> To: Martina Stockhause
> Cc: go-essp-tech at ucar.edu; Pascoe, Stephen (STFC,RAL,SSTD); A.
> Treshansky
> Subject: Re: [Go-essp-tech] CMIP5 Version directory structure update
> 
> Hi Martina
> 
> > Sorry. I meant that we assign DOIs in two granularities: One as you
> > described for the group of datasets belonging to a simulation
> > (metafor) / experiment (DRS level); and the second for coarser
> > citations for the data produced by a whole modelling center or by
> > one GCM of a modelling center. This was for citation convenience for
> > publications that analyse and compare many CMIP5 simulations.
> 
> OK, I understand what you intend (but didn't know - or forgot - you 
> planned to do so), but what do you expect those DOIs to land on?
> 
> Currently we expect the DOIs for the simulation to resolve to the 
> metafor page for the simulation (but see below *)
> 
> We currently have no logical CIM document which makes sense for
> landing on a centre or a model.... and I'm not sure it makes real
> sense either: academia doesn't have a DOI for normal publication
> aggregations - apart from those which are published as deliberate
> anthologies ... when would one publish these aggregations? At the end
> of CMIP5, at the close off date for the IPCC ...???  How would we
> deal with these in the q.c. sense? We'd end up with a new level of
> versioning for the centre and the model ... (and only wrt their
> use, not them themselves).
> 
> > > My understanding is that there will be DOIs that point to
> > > simulations (which are run by models at institutes on platforms
> > > in conformance to experiments).
> > > Those documents will point to multiple realm datasets.
> 
> * Yes, and we'll have to increment those documents' version to conform
> with any changes to the realm datasets, even if they themselves don't
> change.
> 
> This nicely contradicts something I was saying at the sprint: that we
> could do the associations with a query. I should have known better.
> The registry world deals with these sorts of associations as first
> class objects in their own right. So, eg, we should have:
>  s: a simulation description (v1)
>  d1, and d2: two realm descriptions (v1)
>  a1: the association from s to d1 and d2 (at v1)
> 
> The citation will need to point to a1, not s, and be rendered 
> accordingly (one doesn't render a1, one renders s with the  
> associations denoted in a1).
> 
> If new versions of data are released, then even if s doesn't change,
> a new association group can be created, and a new DOI to point at
> that.
> 
> I don't think the CIM (currently) handles this gracefully. It will 
> within a week or so :-)  (It would have had we really gone down the  
> OGC route properly where this stuff has already been thought through  
> in the context of ebRIM).
> 
> Hopefully that'll handle these issues.
> 
> Cheers
> Bryan
> 
> --
> Bryan Lawrence
> Director of Environmental Archival and Associated Research  
> (NCAS/British Atmospheric Data Centre and NCEO/NERC NEODC) STFC,  
> Rutherford Appleton Laboratory Phone +44 1235 445012; Fax ... 5848;
> Web: home.badc.rl.ac.uk/lawrence
> 

--
Bryan Lawrence
Director of Environmental Archival and Associated Research (NCAS/British
Atmospheric Data Centre and NCEO/NERC NEODC) STFC, Rutherford Appleton
Laboratory Phone +44 1235 445012; Fax ... 5848;
Web: home.badc.rl.ac.uk/lawrence

