[Go-essp-tech] Corrected THREDDS group wiki page link

stephen.pascoe at stfc.ac.uk
Fri Feb 24 06:46:24 MST 2012


I agree, Estani.  The whole point is that we need to define where we are not supporting the whole schema.  But we should assume our profile is compatible with a subset of the THREDDS schema; if it isn't, we need to say why.  In this case the schema seems to contradict the implementation, and it also contradicts Luca's statement that each catalog element has only one dataset child.  In my view that's useful information for writing a profile that is "compatible with THREDDS".

> But besides this, I don't think we have any compelling reason for supporting the whole TDS schema.
> IMHO we should define what we now have in an XSD as a subset of what is allowed (but considering the previous point regarding scalability).
I both agree and disagree here.  We should capture what is done now but I'm not sure about XSD.  Most of the things we want to pin down are beyond the expressiveness of XSD, such as the semantics of the replica properties and the relationship between properties and identifiers.

I think ISO metadata is at a much higher level, covering things like who owns and maintains the dataset.  We've had lots of experience with ISO and I think expressing ESGF metadata in ISO is a completely different activity.

Stephen.

---
Stephen Pascoe  +44 (0)1235 445980
Centre of Environmental Data Archival
STFC Rutherford Appleton Laboratory, Harwell Oxford, Didcot OX11 0QX, UK

From: Estanislao Gonzalez [mailto:gonzalez at dkrz.de]
Sent: 24 February 2012 13:22
To: Juckes, Martin (STFC,RAL,RALSP)
Cc: Pascoe, Stephen (STFC,RAL,RALSP); Luca.Cinquini at jpl.nasa.gov; go-essp-tech at ucar.edu
Subject: Re: [Go-essp-tech] Corrected THREDDS group wiki page link

Hi all,

I think we are drifting from the main point, which was to define a subset of the catalog schema tailored for ESG, so we can know what to expect.

The TDS is much more complex than what we use, and the inheritance forces us to store state while going down the tree, which we don't use at all (nor probably want to).
We should already plan for having many more datasets in the future. I'm already promoting going to the atomic dataset level for CORDEX, which will simplify data management but will inflate the number of datasets (40x).
Displaying a list of ~6000 datasets, which is what we have now, takes some time and, at the moment, a lot of memory on the TDS side. So we might want to add an intermediate level to address this (assuming this solves anything...)

But besides this, I don't think we have any compelling reason for supporting the whole TDS schema.
IMHO we should define what we now have in an XSD as a subset of what is allowed (but considering the previous point regarding scalability).

Furthermore, because the TDS is a metadata "producer" for the ESG, we could add some "general" catalogs with other metadata in the same structure. This could be the modification time of the catalogs or other data not handled directly by the catalogs (e.g. controlled vocabulary service endpoints). Just to depict what I mean (I'm not "proposing" anything, just presenting some possibilities):

Root/
|-MPI <all MPI catalogs>
|   |-metadata
|      |- last_mod_time
|-MOHC <all MOHC catalogs, for us, replicas>
|   |-metadata
|      |- last_mod_time
|-metadata
   |- controlled vocabularies referenced

Though I'd prefer to rely on a standard (ISO?), or a subset of it, instead of the TDS's own metadata format.

Just my 2c,
Estani

On 23.02.2012 18:28, martin.juckes at stfc.ac.uk wrote:
There is a page on the schema here: http://www.unidata.ucar.edu/projects/THREDDS/tech/catalog/InvCatalogSpec.html#catalog

They state pretty clearly that datasets at the top level are allowed .....,

Cheers,
Martin


From: Pascoe, Stephen (STFC,RAL,RALSP)
Sent: 23 February 2012 17:19
To: 'Cinquini, Luca (3880)'
Cc: Juckes, Martin (STFC,RAL,RALSP); go-essp-tech at ucar.edu
Subject: RE: [Go-essp-tech] Corrected THREDDS group wiki page link

Hi Luca,

This is what I found googling "THREDDS Schema":

  <!-- xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx -->
  <!-- Catalog element -->
  <xsd:element name="catalog">

    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="service" minOccurs="0" maxOccurs="unbounded"/>
        <xsd:element ref="datasetRoot" minOccurs="0" maxOccurs="unbounded"/>
        <xsd:element ref="property" minOccurs="0" maxOccurs="unbounded"/>
        <xsd:element ref="dataset" minOccurs="1" maxOccurs="unbounded" />
      </xsd:sequence>

      <xsd:attribute name="base" type="xsd:anyURI" />
      <xsd:attribute name="name" type="xsd:string"/>
      <xsd:attribute name="expires" type="dateType"/>
      <xsd:attribute name="version" type="xsd:token" default="1.0.2"/>
    </xsd:complexType>

So that says you can have 1..* dataset elements. But we know that real THREDDS catalogs can have no dataset elements, just catalogRef elements, so it looks like we should take the schema with a pinch of salt.
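
To make the mismatch concrete, here is a minimal Python sketch that counts the direct dataset and catalogRef children of a catalog. It assumes the standard InvCatalog 1.0 namespace; the function name and the sample catalog are invented for illustration and are not taken from any real data node.

```python
import xml.etree.ElementTree as ET

# Namespace used by THREDDS InvCatalog 1.0 documents.
NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"

def count_top_level(catalog_xml):
    """Return (datasets, catalogRefs) found as direct children of <catalog>."""
    root = ET.fromstring(catalog_xml)
    datasets = root.findall(f"{{{NS}}}dataset")
    refs = root.findall(f"{{{NS}}}catalogRef")
    return len(datasets), len(refs)

# Hypothetical catalog holding only catalogRef elements.
EXAMPLE = f"""<catalog xmlns="{NS}"
         xmlns:xlink="http://www.w3.org/1999/xlink" name="example">
  <catalogRef xlink:title="a" xlink:href="a/catalog.xml" name=""/>
  <catalogRef xlink:title="b" xlink:href="b/catalog.xml" name=""/>
</catalog>"""

n_datasets, n_refs = count_top_level(EXAMPLE)
print(n_datasets, n_refs)
```

A catalog like this one has no dataset children at all, which is the case the excerpt above appears to rule out with minOccurs="1".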

I agree ignoring mid-level properties is probably the right thing to do.

Cheers,
Stephen.


From: Cinquini, Luca (3880) [mailto:Luca.Cinquini at jpl.nasa.gov]
Sent: 23 February 2012 16:46
To: Pascoe, Stephen (STFC,RAL,RALSP)
Cc: Juckes, Martin (STFC,RAL,RALSP); go-essp-tech at ucar.edu
Subject: Re: [Go-essp-tech] Corrected THREDDS group wiki page link

Hi Stephen:
On Feb 23, 2012, at 8:54 AM, <stephen.pascoe at stfc.ac.uk> wrote:

> Hi Luca,
>
>> o Each catalog is harvested as a single discoverable dataset - the reason being that hopefully the data provider thought about how to generate
>> the catalogs, and decided on what should be the single unit of discovery
>>
>> o For each catalog, all files are assigned to the top-level dataset container - so if there were many nested datasets with files, it still would result
>> in a single discoverable dataset with as many files
>
> Does this mean that each catalog should contain 0 or 1 top-level datasets and any further nesting below that is collapsed down?  That sounds quite sensible.  What happens to any properties in any dataset below the top-level one?

I may be mistaken, but doesn't a THREDDS catalog always contain only one top-level dataset? At least that used to be the case, I believe. At the very least, I don't know of any catalog that has many top-level datasets.

As for the properties - if they are associated with mid-level datasets, they would currently be ignored. This could change, if we had examples to work with.

thanks, L

> Cheers,
> Stephen.



From: Cinquini, Luca (3880) [mailto:Luca.Cinquini at jpl.nasa.gov]
Sent: 23 February 2012 15:02
To: Pascoe, Stephen (STFC,RAL,RALSP)
Cc: Juckes, Martin (STFC,RAL,RALSP); go-essp-tech at ucar.edu
Subject: Re: [Go-essp-tech] Corrected THREDDS group wiki page link

Hi Stephen and Martin,
just for clarification, this is what the P2P harvesting software currently does. This doesn't mean that it cannot be changed if desired:

o Each catalog can contain an arbitrary hierarchy of datasets and catalogRefs

o Each catalog is harvested as a single discoverable dataset - the reason being that hopefully the data provider thought about how to generate the catalogs, and decided on what should be the single unit of discovery

o For each catalog, all files are assigned to the top-level dataset container - so if there were many nested datasets with files, it still would result in a single discoverable dataset with as many files

o And obviously, all catalogRefs are followed in harvesting and generate separate discoverable datasets.
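
The behaviour listed above can be sketched roughly as follows. This is not the P2P harvester's code, just a small Python illustration of the same collapsing rule. It assumes that any dataset element carrying a urlPath attribute counts as a file, and the function name and sample catalog are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
XLINK = "http://www.w3.org/1999/xlink"

def harvest(catalog_xml):
    """Collapse one catalog into (top_level_name, file_paths, catalog_ref_hrefs)."""
    root = ET.fromstring(catalog_xml)
    top = root.find(f"{{{NS}}}dataset")
    name = top.get("name") if top is not None else None
    # Assumption: a dataset with a urlPath is a file, however deeply it is nested.
    files = [d.get("urlPath") for d in root.iter(f"{{{NS}}}dataset") if d.get("urlPath")]
    # catalogRefs are only collected here; the caller would harvest each one
    # separately, producing a separate discoverable dataset per catalog.
    refs = [r.get(f"{{{XLINK}}}href") for r in root.iter(f"{{{NS}}}catalogRef")]
    return name, files, refs

# Hypothetical catalog with one level of nesting below the top-level dataset.
EXAMPLE = f"""<catalog xmlns="{NS}" xmlns:xlink="{XLINK}" name="example">
  <dataset name="top" ID="example.top">
    <dataset name="nested">
      <dataset name="f1.nc" urlPath="example/f1.nc"/>
    </dataset>
    <dataset name="f2.nc" urlPath="example/f2.nc"/>
    <catalogRef xlink:title="other" xlink:href="other/catalog.xml" name=""/>
  </dataset>
</catalog>"""

name, files, refs = harvest(EXAMPLE)
print(name, files, refs)
```

Both files end up attached to the single top-level dataset, regardless of the nesting in between.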

thanks, Luca

On Feb 23, 2012, at 7:33 AM, <stephen.pascoe at stfc.ac.uk> wrote:



Thanks Martin.  There is a catalog_version attribute already, although I don't think there is any documentation on what it means.

On the hierarchy, I personally believe we could allow any number of intermediate catalogues containing <catalogRef> elements in the spec.  Datanodes currently only produce 2 levels .../thredds/catalog.xml and .../thredds/esgcet/catalog.xml, but there would be no harm in having deeper nesting.  What I think is less flexible is the constraint that "leaf-catalogs" contain a single container <dataset> element and a set of child <dataset> elements representing files and aggregations.  This design is what LAS and other bits of ESGF rely on. General THREDDS allows you to mix catalogRef, container datasets and "real" datasets throughout the hierarchy.
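
As a rough illustration of the leaf-catalog constraint, the following Python sketch reports where a catalog departs from the "one container dataset with file/aggregation children" layout. The exact rules checked here are my reading of the paragraph above rather than an agreed profile, and the function name and sample catalogs are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
DATASET = f"{{{NS}}}dataset"
CATALOG_REF = f"{{{NS}}}catalogRef"

def leaf_catalog_problems(catalog_xml):
    """Return a list of ways the catalog departs from the leaf-catalog layout."""
    root = ET.fromstring(catalog_xml)
    problems = []
    top = root.findall(DATASET)
    if len(top) != 1:
        problems.append(f"expected 1 container dataset, found {len(top)}")
    if root.find(f".//{CATALOG_REF}") is not None:
        problems.append("catalogRef found in a leaf catalog")
    for container in top:
        children = container.findall(DATASET)
        if not children:
            problems.append("container dataset has no child datasets")
        # Files and aggregations should not themselves contain datasets.
        if any(child.find(DATASET) is not None for child in children):
            problems.append("datasets nested below the file/aggregation level")
    return problems

# Hypothetical catalog that follows the layout.
GOOD = f"""<catalog xmlns="{NS}" name="leaf">
  <dataset name="container" ID="example.dataset.v1">
    <dataset name="file1.nc" urlPath="example/file1.nc"/>
    <dataset name="aggregation" urlPath="example/aggregation"/>
  </dataset>
</catalog>"""

# Hypothetical catalog mixing datasets and a catalogRef, as general THREDDS allows.
MIXED = f"""<catalog xmlns="{NS}" xmlns:xlink="http://www.w3.org/1999/xlink" name="mixed">
  <dataset name="a"/>
  <dataset name="b"><dataset name="f.nc" urlPath="f.nc"/></dataset>
  <catalogRef xlink:title="c" xlink:href="c/catalog.xml" name=""/>
</catalog>"""

print(leaf_catalog_problems(GOOD))
print(leaf_catalog_problems(MIXED))
```

Rules of this kind are straightforward to check in code, and a report of problems is easier to discuss than a plain pass/fail.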

Anyone, please chip in if you disagree.

Cheers,
Stephen.


From: Juckes, Martin (STFC,RAL,RALSP)
Sent: 23 February 2012 12:11
To: Pascoe, Stephen (STFC,RAL,RALSP); go-essp-tech at ucar.edu
Subject: RE: Corrected THREDDS group wiki page link

Hello All,

Sorry I had to leave the telco early - but it was a useful discussion.

After leaving, I had a couple of thoughts:

(1)    The syntax should be versioned, and the version should be indicated in the catalogue somewhere. Whatever we agree on, there is bound to be a need for change in the future, and changes will be much easier to manage if we have the version in the catalogue. There could be independent syntax versions for the top level catalogue and the "publication unit" catalogue. The cleanest way to do this would be with an xsd document referenced in the schemaLocation attribute. We could set this up initially with a "permissive" xsd schema imposing some necessary constraints, but not all of the required ones.
(2)    The decision to stick to a 2-level hierarchy of THREDDS documents (a top-level catalogue with a list of "catalogRef"s and a sub-catalogue for each publication unit) is certainly right for now, but may be too restrictive in the medium term. The specification of "catalogRef" means that very little information is in the top level, and at the next level you have to fetch everything. Having a 3rd level - e.g. for each simulation - would allow more flexibility in recording changes and pointing to documentation.
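
For point (1), a client could discover the syntax version along these lines. This is only a sketch: it reads the catalog's version attribute and the xsi:schemaLocation pairs, and the schema URL in the sample is a placeholder, not an existing ESGF document.

```python
import xml.etree.ElementTree as ET

NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
XSI = "http://www.w3.org/2001/XMLSchema-instance"

def syntax_info(catalog_xml):
    """Return (version attribute, {namespace: schema URL}) for a catalog."""
    root = ET.fromstring(catalog_xml)
    tokens = root.get(f"{{{XSI}}}schemaLocation", "").split()
    # schemaLocation holds whitespace-separated namespace/location pairs.
    locations = dict(zip(tokens[0::2], tokens[1::2]))
    return root.get("version"), locations

# Hypothetical catalog; the .xsd URL is a placeholder for a future profile schema.
EXAMPLE = f"""<catalog xmlns="{NS}" xmlns:xsi="{XSI}"
         xsi:schemaLocation="{NS} http://example.org/esgf-profile-1.0.xsd"
         name="example" version="1.0.2"/>"""

version, locations = syntax_info(EXAMPLE)
print(version, locations)
```

A harvester could then choose its parsing rules based on the schema URL, which is where a versioned profile would pay off.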

Cheers,
Martin

From: go-essp-tech-bounces at ucar.edu [mailto:go-essp-tech-bounces at ucar.edu] On Behalf Of stephen.pascoe at stfc.ac.uk
Sent: 21 February 2012 16:14
To: go-essp-tech at ucar.edu
Subject: [Go-essp-tech] Corrected THREDDS group wiki page link


http://esgf.org/wiki/ESGFInterfaceGroups/ThreddsGroup




--
Scanned by iCritical.




_______________________________________________
GO-ESSP-TECH mailing list
GO-ESSP-TECH at ucar.edu
http://mailman.ucar.edu/mailman/listinfo/go-essp-tech



--
Estanislao Gonzalez

Max-Planck-Institut für Meteorologie (MPI-M)
Deutsches Klimarechenzentrum (DKRZ) - German Climate Computing Centre
Room 108 - Bundesstrasse 45a, D-20146 Hamburg, Germany

Phone:   +49 (40) 46 00 94-126
E-Mail:  gonzalez at dkrz.de

