[Go-essp-tech] [esg-node-dev] Use of <metadata> element in THREDDS catalogs

Cinquini, Luca (3880) Luca.Cinquini at jpl.nasa.gov
Wed Jun 1 09:18:01 MDT 2011


Hi Stephen,
to answer some of your questions...

o The p2p index will harvest all properties in the THREDDS catalogs. In fact, I was able to run a quick job and ingest that catalog into our prototype system; you can search for "cordex" at this URL:

http://esg-datanode.jpl.nasa.gov/esgf-web-fe/

As you can see, I have defined two facets: "CORDEX_domain" and "Frequency" (upper case!) that relate to the metadata in that catalog. As I was mentioning, the metadata just flows through.

o Note that I think some of the metadata property names should really be lower case instead of upper case; at least that's the CMIP5 convention. Of course, we could change the case while parsing the catalogs.

o Your last point about inheriting metadata is exactly what we were discussing with Charles and others in previous days. Charles asked that, in order to make the search for files more powerful, we tag all files that belong to a dataset with the properties that belong to the dataset: this way, you could search for files subject to the constraints experiment=X, frequency=Y and model=Z. This is not difficult to do, but we haven't done it yet because it means "interpreting" the catalogs as opposed to just "parsing" them. But it looks like there is enough momentum behind this requirement that we should go ahead and do it...

o Finally, note that so far the p2p search only looks for Datasets; this is to limit the number of results. We could also search for Files from the web interface, if we wanted.
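
To illustrate the file-tagging and lower-casing points above, here is a minimal sketch in plain Python. It is not the p2p indexer's code: the function names (tag_files, search), the record layout, and the sample values such as "AFR-44" and the file names are all hypothetical. It assumes each dataset and file has already been reduced to a dictionary of properties.

```python
def tag_files(dataset_props, files, lowercase_names=True):
    """Return file records that also carry the parent dataset's properties.

    Hypothetical helper: property names are optionally lower-cased,
    following the CMIP5 convention mentioned in the thread.
    """
    def norm(name):
        return name.lower() if lowercase_names else name

    inherited = {norm(k): v for k, v in dataset_props.items()}
    tagged = []
    for f in files:
        record = dict(inherited)
        # Assumption: a file-level property overrides a dataset-level one.
        record.update({norm(k): v for k, v in f.items()})
        tagged.append(record)
    return tagged


def search(records, **constraints):
    """Keep only the records that match every key=value constraint."""
    return [r for r in records
            if all(r.get(k) == v for k, v in constraints.items())]


# Made-up sample data; only the property names come from the thread.
dataset_props = {"CORDEX_domain": "AFR-44", "Frequency": "day",
                 "driving_model_id": "ERAINT"}
files = [{"name": "tas_day.nc", "variable": "tas"},
         {"name": "pr_day.nc", "variable": "pr"}]

records = tag_files(dataset_props, files)
hits = search(records, frequency="day", variable="tas")
print([h["name"] for h in hits])  # prints ['tas_day.nc']
```

Once every file record carries the dataset-level properties, a file search with constraints such as frequency=Y becomes a simple filter, which is the "interpreting" step as opposed to plain "parsing".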

thanks, Luca


On Jun 1, 2011, at 7:59 AM, <stephen.pascoe at stfc.ac.uk> wrote:

(note I CC'd gonzalez at dkrz.de by mistake -- I meant go-essp)

Hi Roland,

I suppose what I'm getting at is: would the Gateway detect that driving_model_id=ERAINT should result in a facet value for that dataset, or would it just ignore it? Also, I think the P2P index node will index files and datasets separately. In theory it should therefore add this facet to both the dataset and all the files it contains, but will it now, and should it in the future?

More generally, do we want to use this inheritance feature for key/value pairs that result in facets in our user interfaces and search APIs? This gets to an underlying design decision about what information we expose at the file level and what at the dataset level. Each CMIP5 file has a model_id, but this property isn't exposed in the THREDDS catalogs as a file property, only as a dataset property.

Cheers,
Stephen.

---
Stephen Pascoe  +44 (0)1235 445980
Centre of Environmental Data Archival
STFC Rutherford Appleton Laboratory, Harwell Oxford, Didcot OX11 0QX, UK

From: Roland Schweitzer [mailto:Roland.Schweitzer at noaa.gov]
Sent: 01 June 2011 14:49
To: Pascoe, Stephen (STFC,RAL,RALSP)
Cc: esg-node-dev at lists.llnl.gov; gonzalez at dkrz.de
Subject: Re: [esg-node-dev] Use of <metadata> element in THREDDS catalogs

Hi All,

I agree that we need to formalize an ESG profile.

To that end, the THREDDS XML schema allows for the inheritance of metadata to be controlled by an attribute.  And the schema allows for more than one <metadata> element with different inheritance in a particular <dataset>.  Perhaps all that is needed is to get the inheritance right.

But isn't it the case that, in the example you sent, the inheritance is in fact correct? A variable in this data set has the property driving_model_id=ERAINT, for example. What are the properties that were added that should not be inherited?
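
As a rough sketch of how the inherited attribute on <metadata> could be interpreted, the Python below walks a small, made-up catalog and computes the properties that apply to each dataset. The catalog content, the IDs, and the effective_properties helper are hypothetical, and this is not how the Gateway or the P2P index actually processes catalogs. It assumes that a <metadata> element with inherited="true" applies to the containing dataset and everything nested in it, while other <metadata> elements and direct <property> elements apply to the containing dataset only.

```python
import xml.etree.ElementTree as ET

# THREDDS InvCatalog 1.0 namespace.
NS = {"t": "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"}

# Made-up catalog for illustration; only driving_model_id=ERAINT is from the thread.
CATALOG = """<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0">
  <dataset name="cordex.example" ID="cordex.example">
    <property name="dataset_only" value="yes"/>
    <metadata inherited="true">
      <property name="driving_model_id" value="ERAINT"/>
    </metadata>
    <metadata inherited="false">
      <property name="not_inherited" value="top-level only"/>
    </metadata>
    <dataset name="file1.nc" ID="cordex.example.file1.nc"/>
  </dataset>
</catalog>"""


def _props(element):
    """Collect the direct <property> children of an element as a dict."""
    return {p.get("name"): p.get("value")
            for p in element.findall("t:property", NS)}


def effective_properties(dataset, inherited=None, out=None):
    """Map each dataset ID to the properties that apply to it."""
    out = {} if out is None else out
    own = dict(inherited or {})          # what applies to this dataset
    passed_down = dict(inherited or {})  # what nested datasets will see
    for md in dataset.findall("t:metadata", NS):
        props = _props(md)
        own.update(props)
        # Simplification: only the literal "true" is treated as inherited.
        if md.get("inherited") == "true":
            passed_down.update(props)
    own.update(_props(dataset))          # direct properties are not passed down
    out[dataset.get("ID")] = own
    for child in dataset.findall("t:dataset", NS):
        effective_properties(child, passed_down, out)
    return out


root = ET.fromstring(CATALOG)
result = effective_properties(root.find("t:dataset", NS))
print(result["cordex.example.file1.nc"])  # prints {'driving_model_id': 'ERAINT'}
```

Under these assumptions the nested file picks up driving_model_id but not the two properties that are scoped to the top-level dataset.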

Roland

On 06/01/2011 04:18 AM, stephen.pascoe at stfc.ac.uk wrote:
Hi all,

I've just received an excellent example of why we need to formalise an ESG profile for THREDDS catalogs.  Henrik Wiberg has added some extra THREDDS properties to support the CORDEX project (see the attached email for links).  He's put these properties in a <metadata> element within the top-level dataset element.  This is valid THREDDS but I'm not sure what ESG would do with it.

Putting properties in <metadata> elements implies that they apply to all dataset elements contained within the current one. Now that the new search engine will index properties in files as well as datasets, we need to decide whether we are going to support this feature of THREDDS. My guess is that the Gateway and P2P index wouldn't process this correctly.

My instinct is that there should be a clear distinction between properties associated with a dataset and those associated with the files it contains -- therefore in this case we'd need to move the properties out of the metadata section.

Cheers,
Stephen.

---
Stephen Pascoe  +44 (0)1235 445980
Centre of Environmental Data Archival
STFC Rutherford Appleton Laboratory, Harwell Oxford, Didcot OX11 0QX, UK



--
Scanned by iCritical.




