[Go-essp-tech] Metadata pipeline paths and proposed schedule

Sylvia Murphy Sylvia.Murphy at noaa.gov
Tue Jan 5 17:36:37 MST 2010

Hi Everyone,

Cecelia asked me to provide you with a summary of the CMIP5 metadata paths and a proposed schedule in preparation for the upcoming metadata pipeline call on 11 January. There are three primary paths: the questionnaire path, the gridspec path, and the netCDF path. Tasks still to be completed are listed under the pathway step they refer to. This summary includes portions of the ESG/CMIP5 effort that are outside the Curator project scope, so it may have gaps and errors.
Please note that Curator's timelines are synced with those of ESG. Below is ESG's current release schedule, which can also be viewed at https://wiki.ucar.edu/display/esgcet/1.0.0+Release+Plan

1.0.0-Beta 1: Jan 7, 2010

1.0.0-Beta 2: Jan 30, 2010

1.0.0-Release Candidate 1: Feb 30, 2010

1.0.0: March 30, 2010

At the bottom of the email there is a draft outline of capabilities by release.

Questionnaire Path:  

1) Modelling centers fill out online questionnaire (being developed by METAFOR for 1 February release).

2) METAFOR converts output from the questionnaire into a CIM compliant XML file.

3) METAFOR sends the XML output from the modelling centers to ESG (method yet to be determined - anticipated first arrival mid March).
     * Needed: Bryan and Luca decide on transfer method (e.g. OAI versus RSS) (Target 11 January). 

4) ESG uses software (to be developed by ESG) to convert the XML into an RDF file or to deposit it directly into the Sesame Triple Store.
    * Needed:
         a) Number of Attributes estimate:  Required from METAFOR (Target 7 January).  This is required in order to determine how long it will take to complete the XML upload code.                        
         b) Better sample XML file from METAFOR: Required from METAFOR (Target 12 January).  We have a partial sample we are starting with but will need better samples soon to guide the coding.          
         c) Complete sample XML from METAFOR (Target 1 February).            
         d) Curator/ESG (Luca/Julien) to complete the XML to RDF/Sesame software (Target 12 February).   We are estimating it will take one month to do this task assuming we get good information early.  
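
As a rough illustration of the conversion described in step 4, the sketch below turns a small XML document into RDF triples in N-Triples text form. It is only a sketch under stated assumptions: the element names (`document`, `modelComponent`, `shortName`, `institution`) and the base URI are hypothetical placeholders rather than the CIM schema, and it produces text instead of loading anything into Sesame. Counting the resulting triples also gives a crude feel for the attribute count requested in item a).

```python
import xml.etree.ElementTree as ET

# Hypothetical base URI; not an actual ESG or METAFOR namespace.
BASE = "http://example.org/cim/"


def escape_literal(text):
    """Escape a string for use inside an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def xml_to_ntriples(xml_text):
    """Emit one triple per non-empty leaf element of each child of the root."""
    root = ET.fromstring(xml_text)
    lines = []
    for index, record in enumerate(root):
        # Each direct child of the root becomes one subject.
        subject = "<%s%s/%d>" % (BASE, record.tag, index)
        for leaf in record:
            value = (leaf.text or "").strip()
            if not value:
                continue  # skip attributes with no value
            predicate = "<%s%s>" % (BASE, leaf.tag)
            lines.append('%s %s "%s" .'
                         % (subject, predicate, escape_literal(value)))
    return lines


# Hypothetical sample; a real questionnaire export will look different.
SAMPLE = """<document>
  <modelComponent>
    <shortName>atmosphere</shortName>
    <institution>GFDL</institution>
  </modelComponent>
</document>"""

triples = xml_to_ntriples(SAMPLE)
for line in triples:
    print(line)
```

A real converter would need to handle nested components and XML namespaces, which is part of why better sample files are needed early.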

5) The XML instance is displayed on the web. The XML instances will contain all the information about each CMIP5 model and simulation. This will include information about the platform the simulation was run on, the descriptive scientific properties of each component, etc. Below is a list of display-related tasks scheduled for completion between now and 12 February 2010.
  Inputs (e.g. Initial Conditions and Boundary Conditions)
      *Curator demonstrated at the 5 January METAFOR telecon. Requests for changes received.             
            a) Curator (Sylvia) enters ontology changes requested by METAFOR (Sophie) (Target 6 January).
            b) Curator (Julien) modifies the display based upon the changes requested by METAFOR (Sophie) (Target 6 January).
            c) METAFOR to write and proof the definitions for inputs on the wiki page with definitions/ticket 280 (http://metaforclimate.eu/trac/wiki/tickets/280) (Target 7 January).
            d) Curator (Julien) modifies the input display to create a collapsible tree (Target 7 January)
            e) GFDL provides (for demonstration purposes) example of two initial conditions and two boundary conditions for both the atmosphere and ocean models (Target 8 January).             
       *Curator to demonstrate modifications during the 12 January METAFOR Telecon.  

  Conformance:
        * Needed:
            a) METAFOR (Charlotte)/Curator telecon to discuss conformance display requirements (Target 7 January).
            b) METAFOR to proof the conformance attributes and definitions on the wiki with definitions/ticket 280 (http://metaforclimate.eu/trac/wiki/tickets/280) (Target 8 January).
            c) Curator (Sylvia) enters into the ontology (Target 12 January).
            d) GFDL provides example (for demonstration purposes) (Target 14 January).                                      
            e) Curator (Julien) enters into the display (Target 15 January).
       *Curator to demonstrate the week of 18 January. 
  Output Variables:
        * Needed:
             a) METAFOR (Gerry Devine)/Curator telecon to discuss output variable display requirements within the model metadata portion of the display (Target 8 January).
             b) Curator (Sylvia) enters into the ontology (Target 12 January).
             c) Curator (Julien) enters into the display (Target 15 January).                                           
             d) GFDL provides example variable names (for demonstration purposes) (5 each for the atmosphere and ocean components) (Target 14 January).
         * Curator to demonstrate the week of 18 January. 

  Scientific Properties:
              a) METAFOR (Rupert) to provide text list of all scientific properties (Target 7 January).                                         
              b) METAFOR (Rupert) to finish the mindmap to RDF conversion software (Target 15 January).                       
              c) METAFOR to proof all of the definitions for the scientific properties (Target 13 January). The definitions exist in the mindmaps and will be automatically ingested and displayed within ESG.
              d) GFDL to fill out the online questionnaire and provide XML output to ESG (Target 8 February).                      
              e) Curator (Julien) enters into the display (Target 12 February).                             
         * Curator has prototyped the general appearance of scientific properties in the display.  The full set will be demonstrated during the February demonstration scheduled for the week of 15 February.      

   Data Hook:
         * Needed:                                           
               a) GFDL to provide (Target 8 January) data files for the ESM2M demonstration simulation to ESG.  This is required to demonstrate the model metadata to data connection.  If we don't receive a real file we will create a synthetic one.
         * Curator has already demonstrated the ability to connect model metadata to datasets.  We will review this feature in subsequent demonstrations.

Gridspec Path:

The dates below ensure that ESG has sufficient time to verify that the gridspec metadata pipeline works and that the gateway can properly display harvested gridspec metadata. 

1) Modelling centers run the gridspec command line program (being developed by GFDL) and generate gridspec files
     * Needed: GFDL ensures the program works for all grids represented in CMIP5 (Target 8 January).

2) These files are transferred to a data node.

3) Software will run on the data node and harvest the grid metadata and input that metadata into ESG's system. We are not sure if GFDL or PCMDI is doing this (Curator is not).     
     * Needed: Someone to develop metadata harvesting program (Target 22 January).

4) The grids can now be connected to specific simulations (method to be determined, may be manual) and displayed.       
     * Needed: The gridspec information is totally separate from the questionnaire information. ESG/METAFOR/Curator to determine how to associate grid files with particular models and simulations (Target 15 January). We need a method, preferably automatic, to link the two.

netCDF Path: 

1) Modelling centers output DRS compliant netCDF data files.

2) Modelling centers send data to a data node.

3) Publishing software is run at the data node.
    * Needed:                 
       a) ESG (Eric) to modify the publishing software to extract the new DRS fields and propagate this information through THREDDS.
       b) ESG (Luca) to modify the publishing software to extract the new DRS fields and propagate this information into the database and the Sesame Triple Store.
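
To make the DRS extraction in step 3 concrete, here is a minimal sketch that pulls fields out of a file name. It assumes file names of the form variable_table_model_experiment_ensemble[_timerange].nc; the field list, the field order, and the example file name are assumptions for illustration, and the DRS document itself remains the authority. It does not touch the publishing software, THREDDS, or the triple store.

```python
import os

# Assumed field order for illustration; check against the DRS document.
DRS_FILENAME_FIELDS = ("variable", "mip_table", "model", "experiment", "ensemble")


def parse_drs_filename(path):
    """Split a DRS-style netCDF file name into a dict of named fields."""
    name = os.path.basename(path)
    stem, ext = os.path.splitext(name)
    if ext != ".nc":
        raise ValueError("not a netCDF file: %r" % name)
    parts = stem.split("_")
    required = len(DRS_FILENAME_FIELDS)
    # The trailing time range is treated as optional.
    if len(parts) not in (required, required + 1):
        raise ValueError("unexpected number of fields in %r" % name)
    fields = dict(zip(DRS_FILENAME_FIELDS, parts))
    fields["time_range"] = parts[required] if len(parts) > required else None
    return fields


# Hypothetical file name built around the ESM2M demonstration simulation.
example = parse_drs_filename("/data/tas_Amon_ESM2M_historical_r1i1p1_186101-186512.nc")
print(example)
```

Rejecting names with the wrong number of fields, rather than guessing, keeps non-compliant files from being published with wrong metadata.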


Note: We are not anticipating versioning model metadata before the March 1.0.0 release. Below is the current list of baseline model metadata capabilities. Only new capabilities are listed under each later release. Please assume that the baseline carries forward:

Baseline capabilities: 1.0.0-Beta 1: Jan 7, 2010
Demonstration: 5 January and previous
Final Review: 6 January
* Component navigation
* Technical properties displayed
* Basic properties (e.g. institution, contacts, etc.) displayed
* Pop-up definitions of attributes
* Associated grids displayed
* Datahook

1.0.0-Beta 2: Jan 30, 2010
Demonstration: week of 18 January
Final Review: week of 25 January
* Initial conditions/boundary conditions displayed
* Conformance displayed
* Output variables displayed
* netCDF harvesting modified to handle new DRS fields

1.0.0-Release Candidate 1: Feb 30, 2010
Demonstration: week of 15 February
Final Review: week of 22 February, but depends on extent of changes
* XML upload software written
* All scientific properties displayed

1.0.0: March 30, 2010



Sylvia Murphy
sylvia.murphy at noaa.gov
