[Go-essp-tech] Minutes: 6/15/10 Metadata Pipeline Telco

Sylvia Murphy Sylvia.Murphy at noaa.gov
Fri Jun 18 16:24:22 MDT 2010


Present:
Sylvia Murphy
Bryan Lawrence
Nate Wilhelmi
Luca Cinquini
Roland Schweitzer
Serguei Nikonov
Dean Williams
Eric Nienhouse
Craig Ward
Bob Drach
Steven Pascoe
Martina Stockhause
Stephan Kindermann
Frank Toussaint
V. Balaji


Hi Everyone,

Below is a summary of the metadata pipeline call held on Tuesday 15 June, 2010.  I have included the notes embedded in the published metadata pipeline that I emailed out before.  

SCHEDULES:

Please note that Curator's timelines are synced with those of ESG. Below is ESG's current release schedule, which can also be viewed at https://wiki.ucar.edu/display/esgcet/ESG+Gateway+Release+Roadmap.

Please note that code freezes occur TWO WEEKS before these dates. Only bug fixes are allowed after a freeze.  Also note that new features should be demonstrated AT LEAST THREE WEEKS before any release so that changes can be made before the freeze.  I have put these dates into the list below. 

1.1 July 15th (Freeze: July 1st, Demonstration Deadline: June 24th)
1.2 August 15th (Freeze: August 2nd, Demonstration Deadline: July 26th)
1.3 September 16th (Freeze: September 2nd, Demonstration Deadline: August 26th)
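
For anyone who wants to recompute these dates if a release moves, here is a minimal sketch in Python. It applies the two-week freeze and three-week demonstration rules stated above. The weekend handling is an assumption: the 1.2 dates in the list only match if a deadline that lands on a weekend is moved to the following Monday, which the schedule itself does not state. The function names are illustrative only.

```python
from datetime import date, timedelta


def roll_to_weekday(d):
    """Move a Saturday or Sunday forward to the following Monday (assumed rule)."""
    while d.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        d += timedelta(days=1)
    return d


def release_milestones(release):
    """Return the freeze and demonstration deadlines for a release date."""
    return {
        "freeze": roll_to_weekday(release - timedelta(weeks=2)),
        "demonstration": roll_to_weekday(release - timedelta(weeks=3)),
    }


releases = {
    "1.1": date(2010, 7, 15),
    "1.2": date(2010, 8, 15),
    "1.3": date(2010, 9, 16),
}

for version, release in releases.items():
    m = release_milestones(release)
    print(version, release.isoformat(),
          "freeze:", m["freeze"].isoformat(),
          "demo:", m["demonstration"].isoformat())
```

With the three release dates above, this reproduces the freeze and demonstration dates in the list.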

Resource Constraints: 
 
* Curator's developer Julien will be on travel starting 21 August.  He will be gone one month. His last days to get anything done will be August 19th, 20th, and 23rd. Because of this, Curator's primary contributions to CMIP need to be finalized by ESG's 1.2 release (Freeze 2 August). 

At the bottom of the email there is a draft outline of capabilities by release.

Curator also syncs its efforts to those of METAFOR.  Below I am quoting Charlotte's latest questionnaire release schedule (dated 3 June).

"Here is the release schedule showing just the current version of the timeline – to avoid confusion.
14th June – Beta testing starts" (Bryan indicated that the beta testing is likely to slip another week)
"13th-20th July – Release of the questionnaire."
*Middle of August: First CMIP5 files received from METAFOR (Target based upon conversations with METAFOR)

Demonstration Schedule:

a) XML Harvest demonstration.  XML Harvest demonstrations have been occurring on a regular basis.   They are primarily for the METAFOR and go-essp-tech communities.  They show a sample questionnaire output (given to us by METAFOR) and are meant to primarily QC the harvesting process but also give folks an idea of what the questionnaire output looks like so that changes in the questionnaire and/or the ESG display can be identified. The next demonstration is targeted for the week of 28 June.  This date is dependent upon receipt of an updated XML file from Gerry and upon Julien who will need to add code to harvest the improved inputs and conformances.  See 4) below.

b) Scientist demonstration.  The entire gateway (data browse, data search, model search, metadata content) needs to be demonstrated to scientists.  

Sylvia asked the group if we should go ahead and conduct this demonstration in July prior to the 1.2 ESG release even if a complete example is not available.  She indicated that the consequence of waiting is that further changes to the system will have to be made while the system is operational.  Both Balaji and Bryan commented that they thought it was too early to demonstrate the system.  Eric indicated that ESG is anticipating having to make changes while operational anyway.  The group agreed to wait to conduct this demonstration until after real data is in the system and a complete metadata example is available.  Target: November 2010. 


METADATA PATHS:

There are three primary paths:  the gridspec path, the netCDF path, and the questionnaire path. Tasks to be completed are embedded under the specific metadata pathway step they refer to.  This is a summary that includes portions of the ESG/CMIP5 effort outside of the Curator project scope and may have gaps and errors. 

Gridspec Path:

   * What is the status of Gridspec command line program? Balaji indicated that the libCF software will be published in September 2010.  It will contain regrid capabilities. By July GFDL plans to publish a native-grid data set and instructions on what you can do with it.  There are also other plans to get libCF into other tools like NCL.  
   
   * Do we anticipate getting gridspec files now? Balaji reported that it is very unlikely that any group besides GFDL will produce gridspec files. Additionally, he knows of no one who is planning on producing multi-mosaic files.  He also indicated that CMOR2 is still going to produce a link to external gridspec files for users that want that.  
   
   * How do we balance gridspec metadata from the new grid info coming from the Questionnaire? Balaji indicated we should forget about all the gridspec display that was done before and just display the grid information that will come in via the questionnaire. 

netCDF Path: 

* Is the DRS document final? Bryan indicated that the document is finished but that there is another document being circulated that needs to be discussed.  This document is about the metadata that is supposed to appear in the headers of the netCDF files.  It really confuses simulations, ensembles, and experiments. METAFOR has examined this document and found inconsistencies and impossibilities.  They will be composing an email to Karl about this.  It was revealed later in the call that Karl had made changes within the last month to experiment names, so the DRS may not be as final as Bryan thinks.

Action: METAFOR to email Karl and the group about this document.  Sylvia to track that the loop gets closed on this issue. 

* What is the status of CMOR2?
Balaji indicated that CMOR 2.0 was released yesterday.  They expect the software to be final even though they expect the CMOR tables to keep evolving. Eric wanted to know if there were any new attributes coming out of CMOR2.  Bob did not think so.  That means that at this point ESG is currently harvesting all the CMOR2 attributes. 

* What is the status of the netCDF harvesting software?
Bob indicated that version 2.4 is out and should be officially announced this week.  It has numerous upgrades for CMIP5.  

* Are there any missing pieces with respect to netCDF metadata harvesting to ESG display? See the discussion above about CMOR2.  At this point it is believed that all the CMOR2 attributes are accounted for in the ESG system. 

Needed: The netCDF path discussion resulted in the identification of a need for a file giving the syntax of the official vocabularies (e.g. model, institution, DRS, and experiment names).  This file will need to be used by both the ESG publisher and the questionnaire.  

Action:  Bob Drach will produce this document (Target: 18 June) and send it to the list and also put it on the CMIP5 web site.  

Action: METAFOR will take this and put it into a vocabulary server, which will ensure that the experiment descriptions are consistent and that their usage is consistent in the questionnaire.
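
To illustrate how a shared vocabulary file could be used on both the publisher and questionnaire side, here is a minimal sketch in Python. Everything in it is hypothetical: the field names, the placeholder terms, and the check_terms function are made up for illustration, and the real lists will come from the document Bob is producing.

```python
# Placeholder vocabulary; the real terms are NOT known here and will come
# from the official vocabulary document.
VOCAB = {
    "institution": {"example-institution-a", "example-institution-b"},
    "model": {"example-model-1", "example-model-2"},
    "experiment": {"example-experiment-x"},
}


def check_terms(record, vocab):
    """Return (field, value) pairs whose value is not in the vocabulary."""
    problems = []
    for field, allowed in vocab.items():
        value = record.get(field)
        if value not in allowed:
            problems.append((field, value))
    return problems


record = {
    "institution": "example-institution-a",
    "model": "Example-Model-1",  # wrong case, so it is rejected
    "experiment": "example-experiment-x",
}
print(check_terms(record, VOCAB))
```

The point of the sketch is only that both tools would read the same list, so a name accepted by one is accepted by the other.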

Needed: The discussion also resulted in the identification of a need for a document that describes what information models are being used within the federation.  Bryan indicated that these need to be tabulated and versioned.

Questionnaire Path:  

1) Modelling centers fill out online questionnaire (being developed by METAFOR for 12 July release).

2) METAFOR converts output from the questionnaire into a CIM compliant XML file.

3) METAFOR sends the XML output from the modelling centers to ESG via Atom.
   *In Progress: ESG to write software to periodically query METAFOR's Atom server for new files and to download those files. 
   
   Luca indicated that the infrastructure to do this is built but that it still needs to be integrated with Julien's harvesting code. He is also waiting on Bryan's latest changes to their feed, which won't come out until the current questionnaire branches are merged and the beta version of the questionnaire is released (scheduled for 23 June).  Bryan commented that a good target date for the completion of this task is one week after the beta release. 
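
As a rough illustration of the periodic query described in this step, here is a minimal sketch in Python that parses an Atom feed and reports the entries that are new or updated since the last poll. It is not Luca's code. The sample feed, the entry IDs, and the example.org links are made up, and a real poller would fetch the feed over HTTP on a timer rather than read a string.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # standard Atom namespace

# Made-up feed content for illustration only.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed (hypothetical)</title>
  <entry>
    <id>urn:example:doc-1</id>
    <updated>2010-06-23T12:00:00Z</updated>
    <link href="http://example.org/docs/doc-1.xml"/>
  </entry>
  <entry>
    <id>urn:example:doc-2</id>
    <updated>2010-06-24T09:30:00Z</updated>
    <link href="http://example.org/docs/doc-2.xml"/>
  </entry>
</feed>"""


def new_entries(feed_xml, seen):
    """Return (id, updated, href) for entries not yet harvested.

    `seen` maps entry id -> the 'updated' timestamp already harvested.
    """
    root = ET.fromstring(feed_xml)
    fresh = []
    for entry in root.findall(ATOM + "entry"):
        entry_id = entry.findtext(ATOM + "id")
        updated = entry.findtext(ATOM + "updated")
        link = entry.find(ATOM + "link")
        href = link.get("href") if link is not None else None
        if seen.get(entry_id) != updated:
            fresh.append((entry_id, updated, href))
    return fresh


already_harvested = {"urn:example:doc-1": "2010-06-23T12:00:00Z"}
for entry_id, updated, href in new_entries(SAMPLE_FEED, already_harvested):
    print("download", href, "for", entry_id, "updated", updated)
```

Each href returned would then be downloaded and handed to the harvesting code.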
   
Action:  Sylvia to check with Gerry about uploading of his example into the feed, which will give Luca something to play with.  

 Luca wanted to know if there was any way to tell if the XML had changed.  Bryan responded that to answer that question, one needed to know why it was updated in the first place.  If it was to fix the metadata then the version will be incremented in the file but the document ID will remain constant. If, however, it is the object that has changed there will be a new document ID in the file. Bryan further indicated that there will be a string in the document itself that will self-identify things. 
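
 The rule Bryan described can be sketched as a small check on the harvester side. This is a minimal sketch in Python under assumptions: it supposes the harvester keeps a table of document IDs and the highest version seen for each, and that versions are integers. Neither detail was stated on the call, and the function name and IDs are made up.

```python
def classify_update(known, doc_id, version):
    """Classify an incoming document against what has been harvested.

    `known` maps document ID -> highest version already harvested.
    """
    if doc_id not in known:
        # A new ID means a new document, or an object that has changed.
        return "new-document-id"
    if version > known[doc_id]:
        # Same ID with a higher version means the metadata was fixed.
        return "metadata-fix"
    return "unchanged"


known = {"doc-a": 1}
print(classify_update(known, "doc-a", 2))
print(classify_update(known, "doc-b", 1))
print(classify_update(known, "doc-a", 1))
```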
 
 Luca asked if there was a document describing all the things that METAFOR  planned with respect to the documents.  Bryan indicated that everything they were planning on doing was contained in a ticket on the METAFOR system and that Luca could review all those tickets to figure this out. 
 
 Eric wanted to know whether there was support for the deletion of documents.  Bryan responded no, that one should never delete documents, but only make them obsolete. 
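
 One way to picture "obsolete, never delete" is a store that has no delete operation at all. This is a minimal sketch in Python. The class and method names are made up for illustration and say nothing about how METAFOR or ESG will actually store documents.

```python
class DocumentStore:
    """Keeps every document; retiring one only flags it as obsolete."""

    def __init__(self):
        self._docs = {}  # doc_id -> {"content": ..., "obsolete": bool}

    def add(self, doc_id, content):
        self._docs[doc_id] = {"content": content, "obsolete": False}

    def retire(self, doc_id):
        # There is deliberately no delete method; we only set a flag.
        self._docs[doc_id]["obsolete"] = True

    def current(self):
        """IDs of documents that are not obsolete, in sorted order."""
        return sorted(d for d, rec in self._docs.items() if not rec["obsolete"])

    def get(self, doc_id):
        """Obsolete documents remain retrievable."""
        return self._docs[doc_id]["content"]


store = DocumentStore()
store.add("doc-a", "<first/>")
store.add("doc-b", "<second/>")
store.retire("doc-a")
print(store.current())
print(store.get("doc-a"))
```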

4) ESG uses software (to be developed by ESG) to convert the XML into an OWL file for ingestion into the Sesame Triple Store.
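
To give a feel for what this conversion step involves, here is a minimal sketch in Python that turns a tiny XML record into subject/predicate/object triples written as N-Triples lines. It is not Julien's converter and it does not produce an OWL ontology. The element names, the example.org namespace, and the sample record are all made up, since the real CIM structure is not described here.

```python
import xml.etree.ElementTree as ET

BASE = "http://example.org/cim/"  # hypothetical namespace

# Made-up record for illustration only.
SAMPLE = """<model id="example-model">
  <shortName>EX-1</shortName>
  <institution>Example Institute</institution>
</model>"""


def escape_literal(text):
    """Escape backslashes, quotes, and newlines for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def xml_to_ntriples(xml_text):
    """Emit one triple per child element that carries text."""
    root = ET.fromstring(xml_text)
    subject = "<%s%s/%s>" % (BASE, root.tag, root.get("id"))
    lines = []
    for child in root:
        text = (child.text or "").strip()
        if text:
            predicate = "<%s%s>" % (BASE, child.tag)
            lines.append('%s %s "%s" .' % (subject, predicate, escape_literal(text)))
    return lines


for line in xml_to_ntriples(SAMPLE):
    print(line)
```

The real converter also has to deal with nested components, references between documents, and the OWL classes themselves, none of which this sketch attempts.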

Sylvia reported that she and Julien have been receiving periodic XML samples from METAFOR, but that they have not received one from the latest version of the Questionnaire.  The reason for this is that the latest version contained numerous bugs that were preventing Gerry from inputting information. Gerry reported today that those were now fixed, but that the Questionnaire-to-XML software was broken.  Rupert reported today that he intended to work on this right away.  

The discussion of when ESG can expect things from METAFOR prompted a discussion about whether or not we should wait until every aspect of the questionnaire can be displayed before uploading and displaying real instances.  Bryan suggested we not wait, but that we get the harvesting program fully automated and publish the metadata even if it is only the title, and that the only way to test the system is to actually use it.  Dean had to leave the call before he could comment on this suggestion, so Bryan agreed to continue the discussion in an offline email.  

Bryan indicated that he is no longer on the critical path for the questionnaire and that this has been handed off to Gerry.  Unfortunately, Gerry is also the person who is creating the XML example, so he can't finish the questionnaire and get us a sample at the same time.  Bryan was suggesting that we allow him to finish getting the questionnaire deployed and then ask for a sample. 

For the above reasons I have removed all of the details about the sample XML file that ESG would like to receive until Bryan's suggestion is clarified. 

Bryan wanted to put forth some XML handling rules that we should expect ESG's XML harvester to follow:  a) gracefully ignore content it doesn't understand, and b) gracefully ignore structure that mismatches with what is understood, and in both cases raise errors. Sylvia indicated that Julien's program already does this.  
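
For readers who want a concrete picture of these two rules, here is a minimal sketch in Python. It is not Julien's program. It reads "raise errors" as "report the problem and carry on" rather than "abort the harvest", which is an interpretation. The field names and the sample document are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical set of fields the harvester understands; each is expected
# to be a simple element containing text.
KNOWN_FIELDS = {"title", "institution"}

# Made-up input for illustration only.
SAMPLE = """<document>
  <title>Example model description</title>
  <institution><name>Nested where text was expected</name></institution>
  <colour>blue</colour>
</document>"""


def harvest(xml_text):
    """Return (record, problems) without failing on unexpected input."""
    record, problems = {}, []
    root = ET.fromstring(xml_text)
    for child in root:
        text = (child.text or "").strip()
        if child.tag not in KNOWN_FIELDS:
            # Rule a: content the harvester does not understand.
            problems.append("unknown element ignored: <%s>" % child.tag)
        elif len(child) > 0 or not text:
            # Rule b: known element whose structure does not match.
            problems.append("unexpected structure ignored: <%s>" % child.tag)
        else:
            record[child.tag] = text
    return record, problems


record, problems = harvest(SAMPLE)
print(record)
for problem in problems:
    print("ERROR:", problem)
```

The harvest still yields the title, and the two problems are reported for someone to follow up.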
  
Luca asked how we were going to prevent people from uploading data before the METAFOR metadata was present (Quality Control Level 2).  Bryan indicated that first off people will be allowed to publish QC Level 1 data (passed through CMOR2) to their own data nodes immediately and give access to their local users through their own mechanisms.  At this point, the data will begin the replication process, which will take a while.  During this time the METAFOR metadata can be finalized and checked.  Sylvia asked what will happen when a user hits submit on the questionnaire.  Will the results automatically be placed into the Atom feed?  Bryan indicated that no, it will be farmed off and reviewed by a human before being placed in the feed. Bryan suggested that we not put in barriers to the information and data flow initially because if we do, there will be no way to test the system.

5) ESG continuity of operations
    * Don identified the need to develop a plan to reinitialize the sesame triple store with the CMIP5 metadata given a catastrophic failure.  Eric reported that what Don wanted to do was be able to recreate the metadata (Triple Store) for many reasons including disaster recovery.  He said that they needed to start the discussion about this but did not plan to get to it in the near term, so no release has been targeted. They do expect to get to it sometime near July of 2011. Luca indicated that he thought the feed software might actually cover most of these needs, and that the 30 minute latency that would result between when the portal was reinitialized and when the data would appear could be solved by a backup copy of the Triple Store. Either way, this issue does not need to be tracked for CMIP5.
 
6) The XML instance is displayed on the web. The XML instances will contain all the information about each CMIP5 model and simulation.  This will include information about the platform the simulation was run on, the descriptive scientific properties of each component, etc. 
   * Needed: Next XML Harvesting demo (Target: week of 28 June)
   * Needed: All of the CMIP5 experiment information needs to be represented in the system.  A separate XML conversion program needs to be written to convert METAFOR's experiment XML files to OWL.  (Target: 14 July)
   
 Bryan's suggestion that we display what we have applies to this item as well.  As new pieces of metadata are identified in the questionnaire output, they will be added to the display.   
  

7) Users can use the ESG search page to find model metadata. My last conversation with Don indicated that ESG was scheduling this task for September.  Is that too late?
    * Needed: Community-wide consensus on the facets displayed for the search
    * Needed: ESG modify the search pages to conform to community desires 
    * Needed: ESG finalize the look and feel of the search page and fix any identified bugs
    
Sylvia indicated that she has been trying to get the group to consider the search facets without success and asked if this needed to be dealt with by ESG Release 1.2 instead of ESG's current target date of September.  Luca thought the list should be finalized earlier than September.  Balaji asked how quickly the portal can be modified.  Luca responded that it only took minutes. Balaji then indicated he thought the facets should not be finalized until there is more data in the system to test the search against.  Bryan agreed with this and the group decided to postpone any work on the search facets for the foreseeable future.  

SUMMARY OF METADATA-RELATED CAPABILITIES BY ESG RELEASE

Note:   Here is a current list of baseline model metadata capabilities.  Only future capabilities  are listed under the releases.  Please assume that the baseline carries forward:

Baseline capabilities: 
* Component navigation
* Technical properties displayed
* Basic properties (e.g. institution, contacts etc) displayed
* Pop-up definitions of attributes
* Associated grids displayed
* Datahook
* Initial conditions/boundary conditions displayed
* Conformance displayed
* Scientific properties displayed
* Experiment information displayed

ESG Release 1.2 (Freeze: 2 August)
* XML Harvest software complete and made operational
* Experiment information harvested and host files reconciled
* Trackback page display adjusted for users coming via the data browse
* Citation formatting improved
* Genealogy displayed
* Simulation to data connection and publishing process made more robust
* Ensemble information displayed

ESG Release 1.3 (Freeze: 2 September)
* Finalize search interface
* User's changes to the component navigation retained in the session
* Link behavior throughout the site made more consistent
* Loading of the component tree made more efficient

***********************************
Sylvia Murphy
NESII/CIRES/NOAA Earth System Research Laboratory
325 Broadway, Boulder CO 80305
Email: sylvia.murphy at noaa.gov
Phone: 303-497-7753




