[Go-essp-tech] CMIP5 parameter versions

stephen.pascoe at stfc.ac.uk stephen.pascoe at stfc.ac.uk
Tue Sep 6 08:16:22 MDT 2011


Hi Estani

> First, regarding the Gateway, I think it's not a bad Idea to "hide" 
> older versions... that's what mark mentioned. Old versions are "always" 
> bad... unless you have already used one, that is.

Sure, I don't disagree, but I'd like old versions to be a little more visible in the Gateway.  At the moment end users wouldn't find the version information useful unless they know where to look and understand THREDDS.

> There are three different events that can happen to the files from an 
> older version when a new one gets published:
> - old files that are available in the new version
> - old files that get replaced in the new version (same file name)
> - old files that are missing in the new version (because they were 
> removed, or because they are replaced by a file named differently (e.g. 
> different chunking))
>
> So, how does drs_lib cope with those three events? How does it tell a 
> changed file apart from an unchanged one?

The first 2 cases are handled by drslib but the last one is *to be implemented*.  We have worked around this problem by creating empty files for the ones we want deleted, then removing the symbolic links by hand after the DRS structure is created and before the new version is published.  I know implementing it is a priority.
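
For illustration, the three cases can be told apart by comparing file names and checksums between two version directories. The following is only a minimal sketch under stated assumptions, not how drslib does it: the function names and the flat directory layout are hypothetical, and SHA-256 is simply one convenient way to detect a changed file.

```python
import hashlib
from pathlib import Path


def checksum(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in 1 MiB blocks so large files are not loaded into memory at once.
        for block in iter(lambda: handle.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()


def classify_files(old_dir: Path, new_dir: Path) -> dict:
    """Classify the files of an old version against a new version.

    Hypothetical helper (not part of drslib). Assumes each version is a
    flat directory of files; symbolic links are followed.
    """
    old = {p.name: p for p in old_dir.iterdir() if p.is_file()}
    new = {p.name: p for p in new_dir.iterdir() if p.is_file()}
    result = {"unchanged": [], "replaced": [], "missing": [], "added": []}
    for name, path in sorted(old.items()):
        if name not in new:
            # Case 3: removed, or replaced by a differently named file.
            result["missing"].append(name)
        elif checksum(path) == checksum(new[name]):
            # Case 1: same name, same content.
            result["unchanged"].append(name)
        else:
            # Case 2: same name, different content.
            result["replaced"].append(name)
    # Files that exist only in the new version (e.g. different chunking).
    result["added"] = sorted(set(new) - set(old))
    return result
```

Calling `classify_files(Path("v20110801"), Path("v20110906"))` (example directory names) returns a dictionary of file-name lists. The "missing" list is the set of files that currently needs the manual workaround described above.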

> If that is solved, I now have 2 "complete" versions. Now what I need to 
> do is to somehow "attach" a comment to the data and notify people who 
> have already downloaded the older version about it.

I see this as the job of notification.  You can add comments to esgpublish events as a start -- I don't know whether these comments are integrated with notification.  We are also putting together a lightweight web-app for documenting version changes across the archive.  Bear with us for a couple of days and we'll open it up for evaluation (and I really mean a couple of days -- it will be publicised after the BADC maintenance downtime).

Cheers,
Stephen.

---
Stephen Pascoe  +44 (0)1235 445980
Centre of Environmental Data Archival
STFC Rutherford Appleton Laboratory, Harwell Oxford, Didcot OX11 0QX, UK


-----Original Message-----
From: Estanislao Gonzalez [mailto:gonzalez at dkrz.de] 
Sent: 06 September 2011 14:35
To: Pascoe, Stephen (STFC,RAL,RALSP)
Cc: abhipsl at ipsl.jussieu.fr; Stephen.Jeffrey at climatechange.qld.gov.au; Leon.Rotstayn at csiro.au; go-essp-tech at ucar.edu; Ben.Evans at anu.edu.au; Mark.Collier at csiro.au
Subject: Re: [Go-essp-tech] CMIP5 parameter versions

Hi Stephen,

First, regarding the Gateway, I think it's not a bad Idea to "hide" 
older versions... that's what mark mentioned. Old versions are "always" 
bad... unless you have already used one, that is. This distinction 
between people who have already used the older version and those who 
haven't is probably what complicates Gateway development with regard to 
how other versions should be displayed.

Anyway, I'm going to publish a new version and there are a couple of 
things I don't fully understand; maybe you or somebody else who has 
published a new version can help me out.

There are three different events that can happen to the files from an 
older version when a new one gets published:
- old files that are available in the new version
- old files that get replaced in the new version (same file name)
- old files that are missing in the new version (because they were 
removed, or because they are replaced by a file named differently (e.g. 
different chunking))

So, how does drs_lib cope with those three events? How does it tell a 
changed file apart from an unchanged one?
I have quite a few more questions, but it all depends on whether drs_lib 
expects to get a complete dataset, which it then compares somehow with 
the older one, or a description of what should be done plus the new 
files.

If that is solved, I now have 2 "complete" versions. Now what I need to 
do is to somehow "attach" a comment to the data and notify people who 
have already downloaded the older version about it.

How is this done? I think we are writing a comment in the file header, 
but we don't want the users to download all files again and search in 
the headers to see what has been changed (and possibly throwing this new 
version away, because the changes weren't important for the study 
they've already started). And in any case, that's wrong, since the 
comment is not useful for this new version (why would I want to know 
about errors in the past?). I think what the user wants to know is what 
has changed since he/she downloaded the data, not what happened before that.

We thought about writing this info into a wiki and sending the link to 
all people that have downloaded a superseded version. I hope I can use 
the notification system in the Gateway for this, but I have no clue how 
this works. (Any idea?)

How have you done this? I'm pretty sure I'm missing a couple of 
points... any other ideas?

Thanks,
Estani

Am 06.09.2011 14:13, schrieb stephen.pascoe at stfc.ac.uk:
> Hi Ashish,
>
> I have recently checked Gateway 1.3.1 and it appears only the most recent version is downloadable.  Information on previous versions can be seen in the dataset's History tab, including the URL of the THREDDS catalog but the URL is not clickable.
>
> The NCAR developers may be able to comment but my impression is that version support in the Gateway has been left at an early stage.  I saw mock-ups of full version support at a workshop many months ago but I expect they left implementation until the requirement was clearer.
>
> BADC is also trying to get multi-version support working. Following the Gateway upgrade tomorrow I will be trying to expose as much version information as possible through our Gateway.  We currently have quite a few datasets with 2 versions, a few with 3 and some version upgrades we are waiting to process.
>
> Cheers,
> Stephen.
>
> ---
> Stephen Pascoe  +44 (0)1235 445980
> Centre of Environmental Data Archival
> STFC Rutherford Appleton Laboratory, Harwell Oxford, Didcot OX11 0QX, UK
>
>
> -----Original Message-----
> From: Ashish Bhardwaj [mailto:abhipsl at ipsl.jussieu.fr]
> Sent: 06 September 2011 10:53
> To: Pascoe, Stephen (STFC,RAL,RALSP)
> Cc: Mark.Collier at csiro.au; go-essp-tech at ucar.edu; Stephen.Jeffrey at climatechange.qld.gov.au; Leon.Rotstayn at csiro.au; Ben.Evans at anu.edu.au
> Subject: Re: [Go-essp-tech] CMIP5 parameter versions
>
> Hi Stephen,
>
>
> stephen.pascoe at stfc.ac.uk wrote:
>> Hi Mark,
>>
>> 1) if the ESG system is designed to always supply the latest version
>> (and hide away old versions)?
>>
>> Yes.  The ESG publisher keeps THREDDS catalogs of old versions but the Gateway will always display the most recent version.  Previous versions aren't directly accessible through a Gateway (at least not in the versions I've evaluated, 1.2.x and 1.3.x) but the existence of previous versions is visible through a Gateway.
>>
>> Of course this presumes you have separated the files of old and new versions in some way and correctly told ESG publisher which files constitute a new version.  See below.
>>
>>
> If I add a new version of a dataset containing new variables, is it
> possible for both versions to be accessible through the Gateway?
>
> Thanks.
> Ashish
>
>
>
>>> 2) as the file system that is scanned by the ESG system is shared by
>>> our analysts, as it stands old versions can still be seen and copied
>>> and potentially end up in local archives, especially when distributed
>>> by 3rd parties. We would like to try and avoid this from happening
>>> _without_ messing up the ESG metadata.
>>>
>> Are you using the DRS directory structure which includes a version directory "vYYYYMMDD"?  This is how we are separating the current version from previous versions.  The tool drslib[1] is designed to manage duplicate files across versions using symbolic links.
>>
>> [1] http://esgf.org/esgf-drslib-site
>>
>> Cheers,
>> Stephen.
>>
>> ---
>> Stephen Pascoe  +44 (0)1235 445980
>> Centre of Environmental Data Archival
>> STFC Rutherford Appleton Laboratory, Harwell Oxford, Didcot OX11 0QX, UK
>>
>> -----Original Message-----
>> From: go-essp-tech-bounces at ucar.edu [mailto:go-essp-tech-bounces at ucar.edu] On Behalf Of mark collier
>> Sent: 06 September 2011 02:30
>> To: go-essp-tech at ucar.edu
>> Cc: Jeffrey Stephen; Rotstayn Leon (CMAR, Aspendale); Ben Evans
>> Subject: [Go-essp-tech] CMIP5 parameter versions
>>
>> Hi,
>> just a general question about versions of parameters.
>>
>> As we identify problems our list of new versions is growing, however,
>> still manageable.
>>
>> What concerns us more is the possibility that outdated versions may
>> still be accessed. For reconciling file differences it can be good to
>> have all versions available; however, it is nightmarish if the wrong
>> files get into analysts' repositories and are used accidentally (or
>> unknowingly, if the analysts aren't aware that the files are outdated).
>>
>> I would like to know:
>>
>> 1) if the ESG system is designed to always supply the latest version
>> (and hide away old versions)?
>>
>> 2) as the filesystem that is scanned by the ESG system is shared by
>> our analysts, as it stands old versions can still be seen and copied
>> and potentially end up in local archives, especially when distributed
>> by 3rd parties. We would like to try and avoid this from happening
>> _without_ messing up the ESG metadata.
>>
>> One solution for 2) is to ("in-situ") change the variable name from
>> say pr to pr_error in the wrong/outdated file versions - the DRS
>> structure (including filename) will stay the same but anyone trying to
>> read the file will instantly be alerted to the problem.
>>
>> This is probably our biggest concern at the moment in terms of making
>> data available to the CMIP5 community.
>>
>> Regards,
>> Mark Collier.
>> _______________________________________________
>> GO-ESSP-TECH mailing list
>> GO-ESSP-TECH at ucar.edu
>> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech
>>
>


-- 
Estanislao Gonzalez

Max-Planck-Institut für Meteorologie (MPI-M)
Deutsches Klimarechenzentrum (DKRZ) - German Climate Computing Centre
Room 108 - Bundesstrasse 45a, D-20146 Hamburg, Germany

Phone:   +49 (40) 46 00 94-126
E-Mail:  gonzalez at dkrz.de
