[Go-essp-tech] THREDDS group wiki

martin.juckes at stfc.ac.uk martin.juckes at stfc.ac.uk
Fri Feb 24 03:00:51 MST 2012


Hello,

There is another area of vagueness, as far as I can see, related to the current discussions about how terms in the THREDDS catalogue are translated into the terms that appear in the user interface. The <property name="...." value="...."/> elements are often associated with a controlled vocabulary, but neither the catalogue nor our interface specification indicates where these vocabularies are defined.

I can see 3 options:

*         List the vocabularies in the interface specification;

*         List places where the vocabularies are defined in the interface specification;

*         Provide a mechanism in the THREDDS profile for the catalogue to indicate the relevant vocabularies;

The last option would be preferable, but I can't see how to achieve it within the THREDDS schema.
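Whichever option is chosen, a client ultimately needs a mapping from each property name to its vocabulary. The minimal Python sketch below (standard library only) assumes such a mapping already exists. The VOCABULARIES table, its terms and the check_properties helper are hypothetical illustrations, not part of any agreed ESGF specification. The sketch collects every <property name="..." value="..."/> element in a catalogue and reports values that fall outside the vocabulary registered for that name.

```python
import xml.etree.ElementTree as ET

# THREDDS InvCatalog 1.0 namespace used by catalogue documents
THREDDS_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"

# Hypothetical vocabulary table, for illustration only: in practice the
# allowed values would come from wherever the specification says the
# vocabularies are defined.
VOCABULARIES = {
    "experiment": {"historical", "rcp45", "rcp85"},
    "product": {"output1", "output2"},
}


def check_properties(catalog_xml, vocabularies):
    """Return (name, value) pairs whose value is outside the known vocabulary.

    Properties whose name has no registered vocabulary are not checked.
    """
    root = ET.fromstring(catalog_xml)
    problems = []
    for prop in root.iter("{%s}property" % THREDDS_NS):
        name, value = prop.get("name"), prop.get("value")
        allowed = vocabularies.get(name)
        if allowed is not None and value not in allowed:
            problems.append((name, value))
    return problems


SAMPLE = """<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0">
  <dataset name="example">
    <property name="experiment" value="historical"/>
    <property name="product" value="output9"/>
    <property name="institute" value="MOHC"/>
  </dataset>
</catalog>"""

print(check_properties(SAMPLE, VOCABULARIES))  # [('product', 'output9')]
```

Note that "institute" passes silently because no vocabulary is registered for it; this is exactly the gap described above, where nothing in the catalogue says which vocabulary applies.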

Cheers,
Martin

From: Cinquini, Luca (3880) [mailto:Luca.Cinquini at jpl.nasa.gov]
Sent: 23 February 2012 16:46
To: Pascoe, Stephen (STFC,RAL,RALSP)
Cc: Juckes, Martin (STFC,RAL,RALSP); go-essp-tech at ucar.edu
Subject: Re: [Go-essp-tech] Corrected THREDDS group wiki page link

Hi Stephen:
On Feb 23, 2012, at 8:54 AM, <stephen.pascoe at stfc.ac.uk> wrote:


Hi Luca,

> o Each catalog is harvested as a single discoverable dataset - the reason being that hopefully the data provider thought about how to generate
> the catalogs, and decided on what should be the single unit of discovery
>
> o For each catalog, all files are assigned to the top-level dataset container - so if there were many nested datasets with files, it still would result
> in a single discoverable dataset with as many files

Does this mean that each catalog should contain 0 or 1 top-level datasets and any further nesting below that is collapsed down?  That sounds quite sensible.  What happens to any properties in any dataset below the top-level one?
I may be mistaken, but doesn't a THREDDS catalog always contain only one top-level dataset? At least that used to be the case, I believe. At the very least, I don't know of any catalog that has more than one top-level dataset.

As for the properties - if they are associated with mid-level datasets, they are currently ignored. This could change if we had examples to work with.

thanks, L



Cheers,
Stephen.


---
Stephen Pascoe  +44 (0)1235 445980
Centre of Environmental Data Archival
STFC Rutherford Appleton Laboratory, Harwell Oxford, Didcot OX11 0QX, UK

From: Cinquini, Luca (3880) [mailto:Luca.Cinquini at jpl.nasa.gov]
Sent: 23 February 2012 15:02
To: Pascoe, Stephen (STFC,RAL,RALSP)
Cc: Juckes, Martin (STFC,RAL,RALSP); go-essp-tech at ucar.edu
Subject: Re: [Go-essp-tech] Corrected THREDDS group wiki page link

Hi Stephen and Martin,
Just for clarification, this is what the P2P harvesting software currently does (this doesn't mean that it cannot be changed if desired):

o Each catalog can contain an arbitrary hierarchy of datasets and catalogRefs

o Each catalog is harvested as a single discoverable dataset - the reason being that hopefully the data provider thought about how to generate the catalogs, and decided on what should be the single unit of discovery

o For each catalog, all files are assigned to the top-level dataset container - so if there were many nested datasets with files, it still would result in a single discoverable dataset with as many files

o And obviously, all catalogRefs are followed during harvesting, and each one generates a separate discoverable dataset.
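The rules above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the P2P harvester's actual code: harvest_catalog is a hypothetical helper, a nested <dataset> carrying a urlPath attribute is assumed to mark a file, and fetching the catalogRef targets is left out (they are simply returned for separate harvesting).

```python
import xml.etree.ElementTree as ET

THREDDS_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
XLINK_NS = "http://www.w3.org/1999/xlink"


def tag(name):
    """Qualify a THREDDS element name with its namespace."""
    return "{%s}%s" % (THREDDS_NS, name)


def harvest_catalog(catalog_xml):
    """Collapse one catalog into at most one discoverable record.

    Returns (record, refs). record is None when the catalog has no top-level
    dataset; refs lists catalogRef targets, each to be harvested separately.
    """
    root = ET.fromstring(catalog_xml)
    refs = [ref.get("{%s}href" % XLINK_NS) for ref in root.iter(tag("catalogRef"))]
    top = root.find(tag("dataset"))  # the (first) top-level dataset
    if top is None:
        return None, refs
    record = {
        "name": top.get("name"),
        # Only properties attached directly to the top-level dataset are
        # kept; properties on mid-level datasets are ignored.
        "properties": {p.get("name"): p.get("value") for p in top.findall(tag("property"))},
        "files": [],
    }
    # Every nested dataset with a urlPath, at any depth, becomes a file of
    # the single top-level record (assumption: urlPath marks a file).
    for ds in top.iter(tag("dataset")):
        if ds is not top and ds.get("urlPath") is not None:
            record["files"].append(ds.get("urlPath"))
    return record, refs


SAMPLE = """<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
         xmlns:xlink="http://www.w3.org/1999/xlink">
  <dataset name="example.dataset.v1">
    <property name="experiment" value="historical"/>
    <dataset name="mid-level">
      <property name="ignored" value="yes"/>
      <dataset name="a.nc" urlPath="example/a.nc"/>
      <dataset name="b.nc" urlPath="example/b.nc"/>
    </dataset>
    <dataset name="c.nc" urlPath="example/c.nc"/>
  </dataset>
  <catalogRef xlink:href="sub/catalog.xml" xlink:title="sub"/>
</catalog>"""

record, refs = harvest_catalog(SAMPLE)
print(record["files"])  # ['example/a.nc', 'example/b.nc', 'example/c.nc']
print(refs)             # ['sub/catalog.xml']
```

The files under "mid-level" end up in the same record as c.nc, and the "ignored" property is dropped, matching the behaviour described for mid-level datasets.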

thanks, Luca

On Feb 23, 2012, at 7:33 AM, <stephen.pascoe at stfc.ac.uk> wrote:



Thanks Martin.  There is a catalog_version attribute already, although I don't think there is any documentation on what it means.

On the hierarchy, I personally believe the spec could allow any number of intermediate catalogues containing <catalogRef> elements. Datanodes currently produce only 2 levels, .../thredds/catalog.xml and .../thredds/esgcet/catalog.xml, but there would be no harm in deeper nesting. What I think is less flexible is the constraint that "leaf catalogs" contain a single container <dataset> element and a set of child <dataset> elements representing files and aggregations. This design is what LAS and other parts of ESGF rely on. General THREDDS allows you to mix catalogRef, container datasets and "real" datasets throughout the hierarchy.
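The leaf-catalog constraint is simple enough to check mechanically. The Python sketch below is a hypothetical checker (is_leaf_catalog is not an existing ESGF tool) that assumes a leaf catalog must have exactly one top-level container <dataset>, no <catalogRef> anywhere, and child datasets that contain no further nested datasets.

```python
import xml.etree.ElementTree as ET

THREDDS_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"


def tag(name):
    """Qualify a THREDDS element name with its namespace."""
    return "{%s}%s" % (THREDDS_NS, name)


def is_leaf_catalog(catalog_xml):
    """Return True if the catalog has the restricted leaf-catalog shape.

    Exactly one top-level container dataset, no catalogRef anywhere, and
    every child dataset of the container is a "real" dataset (file or
    aggregation) with no datasets nested inside it.
    """
    root = ET.fromstring(catalog_xml)
    top_level = root.findall(tag("dataset"))
    if len(top_level) != 1:
        return False
    if next(root.iter(tag("catalogRef")), None) is not None:
        return False
    container = top_level[0]
    for child in container.findall(tag("dataset")):
        if child.find(tag("dataset")) is not None:
            return False  # nested container: legal in THREDDS, not here
    return True


LEAF = """<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0">
  <dataset name="example.dataset.v1">
    <dataset name="a.nc" urlPath="example/a.nc"/>
    <dataset name="agg" urlPath="example/agg"/>
  </dataset>
</catalog>"""

NESTED = """<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0">
  <dataset name="example.dataset.v1">
    <dataset name="mid">
      <dataset name="a.nc" urlPath="example/a.nc"/>
    </dataset>
  </dataset>
</catalog>"""

print(is_leaf_catalog(LEAF))    # True
print(is_leaf_catalog(NESTED))  # False
```

NESTED is a perfectly valid general THREDDS catalog; it only fails the stricter ESGF leaf shape that LAS and other components depend on.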

Anyone, please chip in if you disagree.

Cheers,
Stephen.


From: Juckes, Martin (STFC,RAL,RALSP)
Sent: 23 February 2012 12:11
To: Pascoe, Stephen (STFC,RAL,RALSP); go-essp-tech at ucar.edu
Subject: RE: Corrected THREDDS group wiki page link

Hello All,

Sorry I had to leave the telco early - but it was a useful discussion.

After leaving, I had a couple of thoughts:

(1)    The syntax should be versioned, and the version should be indicated somewhere in the catalogue - whatever we agree on, there is bound to be a need for change in the future, and changes will be much easier to manage if the version is in the catalogue. There could be independent syntax versions for the top-level catalogue and the "publication unit" catalogue. The cleanest way to do this would be with an xsd document referenced in the schemaLocation attribute. We could set this up initially with a "permissive" xsd schema imposing some necessary constraints, but not all of the required ones.
(2)    The decision to stick to a 2-level hierarchy of THREDDS documents (a top-level catalogue with a list of "catalogRef"s and a sub-catalogue for each publication unit) is certainly right for now, but may be too restrictive in the medium term. The specification of "catalogRef" means that very little information is in the top level, and at the next level you have to fetch everything. Having a 3rd level (e.g. one for each simulation) would allow more flexibility in recording changes and pointing to documentation.
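To illustrate point (1), a harvester could read the declared schema, and hence the syntax version, from the standard xsi:schemaLocation attribute on the catalogue root. The Python sketch below assumes a hypothetical versioned profile XSD at example.org; no such schema exists yet, and declared_schemas is an illustrative helper. schemaLocation is a whitespace-separated list of (namespace, schema URL) pairs.

```python
import xml.etree.ElementTree as ET

THREDDS_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"


def declared_schemas(catalog_xml):
    """Map each namespace to the XSD URL given in the root xsi:schemaLocation.

    xsi:schemaLocation holds whitespace-separated (namespace, URL) pairs;
    an absent attribute yields an empty mapping.
    """
    root = ET.fromstring(catalog_xml)
    tokens = root.get("{%s}schemaLocation" % XSI_NS, "").split()
    return dict(zip(tokens[0::2], tokens[1::2]))


# Hypothetical versioned profile schema URL, for illustration only.
SAMPLE = """<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0 http://example.org/esgf/thredds-leaf-profile-v1.xsd"/>"""

print(declared_schemas(SAMPLE)[THREDDS_NS])
# http://example.org/esgf/thredds-leaf-profile-v1.xsd
```

This only reads the declaration; the standard-library parser does not validate against an XSD. Actual validation against a "permissive" schema would need a library such as lxml (lxml.etree.XMLSchema).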

Cheers,
Martin

From: go-essp-tech-bounces at ucar.edu [mailto:go-essp-tech-bounces at ucar.edu] On Behalf Of stephen.pascoe at stfc.ac.uk
Sent: 21 February 2012 16:14
To: go-essp-tech at ucar.edu
Subject: [Go-essp-tech] Corrected THREDDS group wiki page link


http://esgf.org/wiki/ESGFInterfaceGroups/ThreddsGroup




--
Scanned by iCritical.




_______________________________________________
GO-ESSP-TECH mailing list
GO-ESSP-TECH at ucar.edu
http://mailman.ucar.edu/mailman/listinfo/go-essp-tech









