[Go-essp-tech] Are atomic datasets mutable?

Karl Taylor taylor13 at llnl.gov
Sun Nov 22 22:34:13 MST 2009


Dear Stephen,

Here are some responses to your email:

stephen.pascoe at stfc.ac.uk wrote:
> CMOR2 will write output into the same directory. 
>
> Does this mean that a single atomic dataset could contain data from 2
> different tiers (core and tier 1)? 
Yes.
>
> On the more general point I completely agree with Bryan.  The DOI
> point is the most concise statement of why we need immutable atomic
> datasets.  What we need to agree on is what is a "version".  We probably
> have different ideas what a version is so I'll share my perspective,
> which comes from software engineering and version control systems.
>
> VCS systems have a concept of an atomic unit: a file.  Any change to a
> file is considered a new version (or revision, the terminology varies).
> It doesn't differentiate between additions and changes -- a change could
> be as trivial as a newline at the end of the file.  The point is that
> the system's knowledge of the internal structure of objects needs to
> stop somewhere and that's the atomic unit.  
>
> Therefore I think extension should imply a new version.  As Bryan says
> the job of explaining the relationship between 2 versions is the job of
> metadata.  Also the problem of how we efficiently store 2 versions one
> of which is an extension of the first can be solved separately.
>
>   
From a user's perspective, I think the common understanding of 
"version" will be that it differs in some substantive way from other 
versions, not that it simply contains data not previously contributed to 
the archive.  Anyone publishing a scientific article will have to say 
which time-period he analyzed, and this will not be evident simply by 
specifying the version, since many papers will be based on some subset 
of the total output available under a single version number.
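
To make the distinction concrete, here is a minimal sketch of how an archive might tell an extension apart from a substantive change. It assumes each atomic dataset can be summarized as a mapping from file name to checksum; the function name, the labels it returns, and the file names are all hypothetical and are not part of CMOR2 or any existing archive software.

```python
def classify_submission(old, new):
    """Compare two file-name -> checksum mappings for one atomic dataset.

    Hypothetical helper: returns "new version" if any previously published
    file was altered or withdrawn, "extension" if files were only added,
    and "unchanged" otherwise.
    """
    changed = [name for name in old if name in new and new[name] != old[name]]
    removed = [name for name in old if name not in new]
    added = [name for name in new if name not in old]

    if changed or removed:
        return "new version"   # previously published data differs
    if added:
        return "extension"     # earlier data intact, more time periods added
    return "unchanged"


# Illustrative file names and checksums only.
first_delivery = {"tas_2006-2100.nc": "aaa"}
extended_run = {"tas_2006-2100.nc": "aaa", "tas_2101-2300.nc": "bbb"}
corrected_run = {"tas_2006-2100.nc": "ccc"}

print(classify_submission(first_delivery, extended_run))
print(classify_submission(first_delivery, corrected_run))
```

Under these assumptions only the second case would call for a new version number; the first would just need a notice to earlier users that more of the run is now available.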

best regards,
Karl
> S.
>
> ---
> Stephen Pascoe  +44 (0)1235 445980
> British Atmospheric Data Centre
> Rutherford Appleton Laboratory
>
> -----Original Message-----
> From: Karl Taylor [mailto:taylor13 at llnl.gov] 
> Sent: 19 November 2009 20:34
> To: Pascoe, Stephen (STFC,RAL,SSTD)
> Cc: go-essp-tech at ucar.edu
> Subject: Re: [Go-essp-tech] Are atomic datasets mutable?
>
> Hi all,
>
> Another common case that we'll have to deal with in CMIP5 and which
> should be considered in defining what a "version" is:  For the future
> (so-called RCP) runs, CMIP5 calls for runs initiated from the end of the
> historical run and as part of the core set of expts., running to the end
> of the 21st century.  At a lower priority some of these runs will be
> extended to the end of the 23rd century.  Groups will likely carry out
> these simulations in stages, sending us the 21st century output long 
> before the 22nd and 23rd century output becomes available.   The 
> component time-periods are part of the same experiment and CMOR2 will
> write output into the same directory. 
>
> I think the users will be confused if a new "version" of model output
> (that has been modified in some way) is indistinguishable from model
> output that has been simply extended.  Both of the following options
> will be confusing:
>
> Option 1) 21st century data for the RCP4.5 future run is received and
> identified as version 1.  Then the continuation of that run to the end
> of the 23rd century is received and stored as version 2.  New users will
> have to download both version 1 and version 2 to get the complete run.
>
> Option 2) 21st century data for the RCP4.5 future run is received and 
> identified as version 1.   Then the continuation of that run to the end 
> of the 23rd century is received and stored as version 2 along with a
> copy of the data already stored as version 1.  In this case a new user
> will get all the data by downloading version 2, but an old user who
> already downloaded version 1 won't know whether what's in version 2 is a
> duplicate of the data he already has, or is replacement data which has
> corrected some problems in the earlier version.
>
> I would suggest therefore that for a single experiment, it would be best
> from a user's perspective to not assign a new version to model output
> that simply extends a previous run.  We will have to find a method by
> which to advise old users who already downloaded data that the runs have
> now been extended.
>
> Best regards,
> Karl
>
>
> stephen.pascoe at stfc.ac.uk wrote:
>   
>> Hi all,
>>  
>> The UKMO has flagged up a use case where an atomic dataset might 
>> change over time without being a new version.  The example is the 1000
>> year piControl run where UKMO is likely to deliver it in several time 
>> chunks and would want it to be published before the full run is 
>> complete.  Since atomic datasets represent the whole time period these
>> datasets will grow over time.
>>  
>> I am tempted to say each addition to the dataset triggers a new 
>> version that deprecates the previous one but UKMO wasn't too keen on 
>> that.  Any ideas?
>>  
>> S.
>>  
>> ---
>> Stephen Pascoe  +44 (0)1235 445980
>> British Atmospheric Data Centre
>> Rutherford Appleton Laboratory
>>  
>> _______________________________________________
>> GO-ESSP-TECH mailing list
>> GO-ESSP-TECH at ucar.edu
>> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech