[Go-essp-tech] Grouping files of differing versions

martin.juckes at stfc.ac.uk
Fri Jul 22 05:08:27 MDT 2011


Hello Nebojsa,

Just to add to what Stephen says, we could have a situation where a file is present in an earlier version of the dataset but not in a later one:

cmip5.output1.blabla.v20110720: foo1.nc, foo2.nc, foo3.nc
cmip5.output1.blabla.v20110721: foo1.nc, foo2.nc

The collection of files in the most recent version is not the same as the collection of most recent files: the latter would also include foo3.nc from v20110720. You say you want the latter, but I think it would be better to provide the former (i.e. foo1.nc and foo2.nc) in this case, so it might be worth clarifying the requirements.
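A minimal Python sketch of the two interpretations, using the two dataset versions above. It assumes the version is the last dot-separated component of the dataset ID and has the fixed-width vYYYYMMDD form, so plain string comparison orders versions correctly. The function names are hypothetical, not part of any ESGF API.

```python
# Hypothetical catalogue: each dataset version maps to the files it contains.
# Dataset IDs and file names are taken from the example above.
versions = {
    "cmip5.output1.blabla.v20110720": ["foo1.nc", "foo2.nc", "foo3.nc"],
    "cmip5.output1.blabla.v20110721": ["foo1.nc", "foo2.nc"],
}


def version_of(dataset_id: str) -> str:
    """Return the trailing version component, e.g. 'v20110721' (assumed format)."""
    return dataset_id.rsplit(".", 1)[1]


def files_in_latest_version(versions: dict[str, list[str]]) -> set[str]:
    """The collection of files in the most recent dataset version."""
    latest = max(versions, key=version_of)
    return set(versions[latest])


def latest_file_per_name(versions: dict[str, list[str]]) -> dict[str, str]:
    """The collection of most recent files: newest version containing each name."""
    newest: dict[str, str] = {}
    for dataset_id in sorted(versions, key=version_of):
        for name in versions[dataset_id]:
            newest[name] = dataset_id  # later versions overwrite earlier ones
    return newest


print(sorted(files_in_latest_version(versions)))
print(sorted(latest_file_per_name(versions)))
```

The first call yields foo1.nc and foo2.nc; the second also yields foo3.nc, which only exists in v20110720.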

Cheers,
Martin

> >-----Original Message-----
> >From: go-essp-tech-bounces at ucar.edu [mailto:go-essp-tech-bounces at ucar.edu] On Behalf Of stephen.pascoe at stfc.ac.uk
> >Sent: 22 July 2011 11:48
> >To: balic at dkrz.de
> >Cc: go-essp-tech at ucar.edu
> >Subject: Re: [Go-essp-tech] Grouping files of differing versions
> >
> >Nebojsa,
> >
> >> I need a criterion for grouping together files that differ only in
> >> the version number.
> >
> >I think the confusion may arise because ESGF versions datasets, not
> >files. If you want to find the files of the latest version of a
> >dataset, search for all versions of that dataset, pick the latest one,
> >and then list the files in that dataset version.
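The dataset-level approach could be sketched in Python as follows, using the dataset versions from the example further down as hypothetical search results. It assumes each dataset-version ID ends in ".vYYYYMMDD" and that stripping this suffix identifies the dataset; in practice the version would be read from the catalog metadata rather than parsed from the ID.

```python
from collections import defaultdict

# Hypothetical search results: dataset-version IDs mapped to their files.
dataset_versions = {
    "cmip5.output1.blabla.v20110720": ["f1.nc", "f2.nc", "f3.nc", "f4.nc"],
    "cmip5.output1.blabla.v20110721": ["f1.nc", "f2.nc"],
    "cmip5.output2.blabla.v20110721": ["f3.nc", "f4.nc"],
}


def split_version(dataset_id: str) -> tuple[str, str]:
    """Split 'cmip5.output1.blabla.v20110721' into dataset and version parts."""
    dataset, version = dataset_id.rsplit(".", 1)
    return dataset, version


def latest_files(dataset_versions: dict[str, list[str]]) -> dict[str, list[str]]:
    """Pick the newest version of each dataset, then list that version's files."""
    by_dataset: dict[str, list[str]] = defaultdict(list)
    for dataset_id in dataset_versions:
        dataset, version = split_version(dataset_id)
        by_dataset[dataset].append(version)
    result = {}
    for dataset, versions in by_dataset.items():
        newest = f"{dataset}.{max(versions)}"  # vYYYYMMDD sorts as a string
        result[newest] = dataset_versions[newest]
    return result


for dataset_id, files in sorted(latest_files(dataset_versions).items()):
    print(dataset_id, files)
```

Because files are only looked up after the latest version of each dataset has been chosen, f3.nc and f4.nc are reported once, from output2, and the superseded v20110720 is ignored.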
> >
> >If you start with the files, I believe there will be problems. As you
> >say, neither the version nor the product can be determined from the
> >filename. *In theory* any given filename should be in the same product
> >for all versions. However, I wouldn't depend on this. For example, a
> >datanode could discover that it has assigned some data to the wrong
> >product and republish new datasets containing the same files to fix
> >this. You would then see 2 datasets containing the same files but with
> >different products.
> >
> >E.g.
> >
> >Originally you could have a dataset version:
> >
> >cmip5.output1.blabla.v20110720: f1.nc, f2.nc, f3.nc, f4.nc
> >
> >The datanode realises f3.nc and f4.nc should be in output2, so it
> >publishes 2 new datasets alongside the original one:
> >
> >cmip5.output1.blabla.v20110720: f1.nc, f2.nc, f3.nc, f4.nc
> >cmip5.output1.blabla.v20110721: f1.nc, f2.nc
> >cmip5.output2.blabla.v20110721: f3.nc, f4.nc
> >
> >We hope this won't happen, but it might. If you start by looking for
> >all datasets containing f3.nc, you will find 2 with different versions
> >and products: cmip5.output1.blabla.v20110720 and
> >cmip5.output2.blabla.v20110721.
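To illustrate why grouping by filename is fragile, here is a hypothetical sketch of grouping by file name plus product (the extra criterion suggested in Nebojsa's message quoted below), keeping the highest version in each group, applied to this example. Taking the product as the second component of the dataset ID and the version as the last is an assumption about the ID layout used here.

```python
# Dataset versions from the example above; "blabla" stands for the
# remaining DRS components, as in the original message.
dataset_versions = {
    "cmip5.output1.blabla.v20110720": ["f1.nc", "f2.nc", "f3.nc", "f4.nc"],
    "cmip5.output1.blabla.v20110721": ["f1.nc", "f2.nc"],
    "cmip5.output2.blabla.v20110721": ["f3.nc", "f4.nc"],
}


def group_by_name_and_product(
    dataset_versions: dict[str, list[str]],
) -> dict[tuple[str, str], str]:
    """Map (file name, product) to the highest version containing that file."""
    groups: dict[tuple[str, str], str] = {}
    for dataset_id, files in dataset_versions.items():
        product = dataset_id.split(".")[1]      # assumed: activity.product....
        version = dataset_id.rsplit(".", 1)[1]  # assumed: ....vYYYYMMDD
        for name in files:
            key = (name, product)
            if key not in groups or version > groups[key]:
                groups[key] = version
    return groups


for (name, product), version in sorted(group_by_name_and_product(dataset_versions).items()):
    if name == "f3.nc":
        print(name, product, version)
```

f3.nc ends up in two groups, so the superseded output1 copy from v20110720 would be reported as "latest" alongside the output2 copy from v20110721.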
> >
> >I believe it is safer to search at the dataset level then drill down
> >to individual files.
> >
> >Cheers,
> >Stephen.
> >
> >
> >
> >---
> >Stephen Pascoe  +44 (0)1235 445980
> >Centre of Environmental Data Archival
> >STFC Rutherford Appleton Laboratory, Harwell Oxford, Didcot OX11 0QX, UK
> >
> >
> >-----Original Message-----
> >From: Nebojsa Balic [mailto:balic at dkrz.de]
> >Sent: 22 July 2011 11:10
> >To: Pascoe, Stephen (STFC,RAL,RALSP)
> >Cc: go-essp-tech at ucar.edu
> >Subject: Re: [Go-essp-tech] Grouping files of differing versions
> >
> >  Stephen,
> >Thank you for a prompt answer.
> >A new requirement for the search interface in the vERC portal is to
> >provide the files of the most recent version that satisfy a given set
> >of search constraints (DRS components, elements of the geospatial and
> >temporal coverage). In order to determine the files of the latest
> >version, I need a criterion for grouping together files that differ
> >only in the version number. The files cannot be grouped by their IDs
> >because each has a different one. The version number is also not an
> >option because files of different models, simulations, experiments
> >etc. can have the same version number. The name seems to be the best
> >grouping criterion since it contains the DRS content but not the
> >version of a file. But does that mean that files with the same name
> >vary only in version? The product could be an additional grouping
> >criterion since the name does not contain this information. The search
> >is performed on CMIP5 data, so the files all belong to the same
> >activity. So if I group files by their name and activity, and for each
> >of these groups determine the file with the highest version number, do
> >I get the files of the latest version?
> >Thanks
> >Nebojsa
> >
> >On 07/22/2011 11:08 AM, stephen.pascoe at stfc.ac.uk wrote:
> >> Nebojsa,
> >>
> >> Since CMOR is not version-aware, files have no indication of their
> >> version number. The version should be explicit in the THREDDS
> >> dataset@ID attribute and the property[@name="version"] attribute.
> >>> assumption that they all belong to the same product
> >> Can you give us an example? I'm not sure what you mean. Files
> >> shouldn't move product between versions unless they were
> >> misclassified initially.
> >>
> >> I would use the property[@name="version"] attribute to distinguish
> >> between versions, as the dataset@ID could change in the future. I
> >> think of it as an internal THREDDS identifier.
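A minimal sketch of reading the version from property[@name="version"] with Python's xml.etree.ElementTree instead of parsing it out of the ID. The catalog fragment is hypothetical and heavily trimmed, and the shape of the version value ("20110720") is an assumption; the namespace is the standard THREDDS InvCatalog 1.0 one.

```python
import xml.etree.ElementTree as ET

THREDDS_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
NS = {"t": THREDDS_NS}

# Hypothetical catalog fragment: one dataset version containing one file.
CATALOG = f"""<catalog xmlns="{THREDDS_NS}">
  <dataset ID="cmip5.output1.blabla.v20110720" name="cmip5.output1.blabla">
    <property name="version" value="20110720"/>
    <dataset ID="cmip5.output1.blabla.v20110720.f1.nc" name="f1.nc"/>
  </dataset>
</catalog>"""


def read_versions(catalog_xml: str) -> dict[str, str]:
    """Map each dataset's ID to the value of its version property, if any."""
    root = ET.fromstring(catalog_xml)
    result = {}
    for ds in root.iter(f"{{{THREDDS_NS}}}dataset"):
        # Only direct <property name="version"> children are considered.
        prop = ds.find("t:property[@name='version']", NS)
        if prop is not None:
            result[ds.get("ID")] = prop.get("value")
    return result


print(read_versions(CATALOG))
```

The ID is used here only as a key; the version itself comes from the property, so a change to the ID scheme would not affect version comparison.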
> >>
> >> Cheers,
> >> Stephen.
> >>
> >> ---
> >> Stephen Pascoe  +44 (0)1235 445980
> >> Centre of Environmental Data Archival
> >> STFC Rutherford Appleton Laboratory, Harwell Oxford, Didcot OX11 0QX, UK
> >>
> >> -----Original Message-----
> >> From: go-essp-tech-bounces at ucar.edu [mailto:go-essp-tech-bounces at ucar.edu] On Behalf Of Nebojsa Balic
> >> Sent: 22 July 2011 09:58
> >> To: go-essp-tech at ucar.edu
> >> Subject: [Go-essp-tech] Grouping files of differing versions
> >>
> >>    Dear All,
> >> I am trying to group files differing only in the version number in
> >> order to determine the files of the latest version. I have come to the
> >> conclusion that files differing only in the version all have the same
> >> name but different IDs, under the assumption that they all belong to
> >> the same product. Is this a necessary and sufficient condition for
> >> grouping files with different versions?
> >> Thanks
> >> Nebojsa Balic
> >> MPI-M
> >> Hamburg
> >> _______________________________________________
> >> GO-ESSP-TECH mailing list
> >> GO-ESSP-TECH at ucar.edu
> >> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech
> >
> >--
> >Scanned by iCritical.

