[Go-essp-tech] Grouping files of differing versions

stephen.pascoe at stfc.ac.uk
Fri Jul 22 04:47:52 MDT 2011


Nebojsa,

> I need a criterion for assigning files differing only in the version number into separate groups.

I think the problem is confusing because ESGF versions datasets, not files.  If you want the files of the latest version of a dataset, search for all versions of that dataset and then list the files in the latest dataset version.

If you start with the files I believe there will be problems.  As you say, neither version nor product can be determined from the filename.  In theory, any given filename should be in the same product for all versions.  However, I wouldn't depend on this.  For example, a datanode could discover that it has assigned some data to the wrong product and republish new datasets containing the same files to fix this.  You would then see 2 datasets containing the same files but with different products.

E.g.

Originally you could have a single dataset version:

cmip5.output1.blabla.v20110720: f1.nc, f2.nc, f3.nc, f4.nc

The datanode realises f3.nc and f4.nc should be in output2, so it publishes 2 new datasets alongside the original:

cmip5.output1.blabla.v20110720: f1.nc, f2.nc, f3.nc, f4.nc
cmip5.output1.blabla.v20110721: f1.nc, f2.nc
cmip5.output2.blabla.v20110721: f3.nc, f4.nc

We hope this won't happen but it might.  If you start by looking for all datasets containing f3.nc you will find 2 with different version and product: cmip5.output1.blabla.v20110720 and cmip5.output2.blabla.v20110721.
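As a rough illustration of the file-first lookup, here is a minimal Python sketch. The dictionary simply restates the example listing above; the name `datasets_containing` is made up for this sketch and is not part of any ESGF API.

```python
from collections import defaultdict

# Dataset versions from the example above (illustrative names only).
DATASETS = {
    "cmip5.output1.blabla.v20110720": ["f1.nc", "f2.nc", "f3.nc", "f4.nc"],
    "cmip5.output1.blabla.v20110721": ["f1.nc", "f2.nc"],
    "cmip5.output2.blabla.v20110721": ["f3.nc", "f4.nc"],
}


def datasets_containing(datasets):
    """Invert dataset -> files into filename -> sorted list of dataset versions."""
    index = defaultdict(list)
    for dataset_id, files in datasets.items():
        for name in files:
            index[name].append(dataset_id)
    return {name: sorted(ids) for name, ids in index.items()}


by_file = datasets_containing(DATASETS)
print(by_file["f3.nc"])
```

Starting from f3.nc, the lookup returns dataset versions from two different products, so the filename alone does not identify one group.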

I believe it is safer to search at the dataset level then drill down to individual files.
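A minimal Python sketch of the dataset-first approach, assuming dataset version identifiers end in ".v" followed by a date-style number as in the example above. The helper names `split_version` and `latest_files` are hypothetical, and a real client would get the listing from the catalogue rather than a hard-coded dictionary.

```python
# Dataset versions and their files, as in the example (illustrative only).
DATASET_VERSIONS = {
    "cmip5.output1.blabla.v20110720": ["f1.nc", "f2.nc", "f3.nc", "f4.nc"],
    "cmip5.output1.blabla.v20110721": ["f1.nc", "f2.nc"],
    "cmip5.output2.blabla.v20110721": ["f3.nc", "f4.nc"],
}


def split_version(dataset_version_id):
    """Split 'a.b.c.v20110721' into ('a.b.c', 20110721).

    Assumes the identifier ends in '.v<digits>'.
    """
    dataset_id, _, version = dataset_version_id.rpartition(".v")
    return dataset_id, int(version)


def latest_files(dataset_versions):
    """Return {dataset_id: (latest_version, files)}, grouping by dataset first."""
    latest = {}
    for dataset_version_id, files in dataset_versions.items():
        dataset_id, version = split_version(dataset_version_id)
        if dataset_id not in latest or version > latest[dataset_id][0]:
            latest[dataset_id] = (version, list(files))
    return latest


result = latest_files(DATASET_VERSIONS)
for dataset_id, (version, files) in sorted(result.items()):
    print(dataset_id, version, files)
```

Because the grouping key is the dataset rather than the filename, f3.nc and f4.nc are reported only under output2 at its latest version, and the superseded output1 listing is dropped.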

Cheers,
Stephen.



---
Stephen Pascoe  +44 (0)1235 445980
Centre of Environmental Data Archival
STFC Rutherford Appleton Laboratory, Harwell Oxford, Didcot OX11 0QX, UK


-----Original Message-----
From: Nebojsa Balic [mailto:balic at dkrz.de] 
Sent: 22 July 2011 11:10
To: Pascoe, Stephen (STFC,RAL,RALSP)
Cc: go-essp-tech at ucar.edu
Subject: Re: [Go-essp-tech] Grouping files of differing versions

  Stephen,
Thank you for the prompt answer.
A new demand on the search interface in the vERC portal is to provide 
the files of the most recent version that satisfy the given set of 
search constraints (DRS components, elements of the geospatial and 
temporal coverage). In order to determine the files with the latest 
version, I need a criterion for assigning files differing only in the 
version number into separate groups. The files cannot be grouped by 
their IDs because they all have different ones. The version number is 
also not an option because files of different models, simulations, 
experiments etc. can have the same version number. The name seems to be 
the best grouping criterion since it does not contain the version of a 
file but DRS content. But does it mean that files with the same name 
vary only in version? The product can be an additional grouping criterion 
since the name does not contain this information. The search is 
performed on the CMIP5 data so the files are all of the same activity.
So if I group files by their names and activity and for each of these 
groups I determine the file with the highest version number, do I get 
the files of the latest version?
Thanks
Nebojsa

On 07/22/2011 11:08 AM, stephen.pascoe at stfc.ac.uk wrote:
> Nebojsa,
>
> Since CMOR is not version-aware, files have no indication of their version number.  Version should be explicit in the THREDDS dataset@ID attribute and the property[@name="version"] attribute.
>> assumption that they all belong to the same product
> Can you give us an example, as I'm not sure what you mean?  Files shouldn't move product between versions unless they were misclassified initially.
>
> I would use the property[@name="version"] attribute to distinguish between versions as the dataset@ID could be changed in the future.  I think of it as an internal THREDDS identifier.
>
> Cheers,
> Stephen.
>
> ---
> Stephen Pascoe  +44 (0)1235 445980
> Centre of Environmental Data Archival
> STFC Rutherford Appleton Laboratory, Harwell Oxford, Didcot OX11 0QX, UK
>
> -----Original Message-----
> From: go-essp-tech-bounces at ucar.edu [mailto:go-essp-tech-bounces at ucar.edu] On Behalf Of Nebojsa Balic
> Sent: 22 July 2011 09:58
> To: go-essp-tech at ucar.edu
> Subject: [Go-essp-tech] Grouping files of differing versions
>
>    Dear All,
> I am trying to group files differing only in the version number in order
> to determine the files of the latest version. I have come to conclusion
> that files differing only in the version have all the same name but
> different ID under the assumption that they all belong to the same
> product. Is this a necessary and sufficient condition for grouping files
> with different versions?
> Thanks
> Nebojsa Balic
> MPI-M
> Hamburg
> _______________________________________________
> GO-ESSP-TECH mailing list
> GO-ESSP-TECH at ucar.edu
> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech
