[Go-essp-tech] Directory structure proposal and a doodle poll

Tue Mar 12 11:14:47 MDT 2013

Hi everybody,

Several groups have expressed interest in publishing downscaled climate 
datasets on ESGF. A standardized solution to publishing (directory 
structure elements) would contribute to the prompt identification of 
datasets. To discuss needs and options for directory structure elements 
we had an initial teleconference about a month ago. With this email we 
are expanding our reach to other groups, such as the go-essp group, in 
order to have a wider discussion of these elements.

As agreed during our first teleconference, Aparna and Galia worked on a 
proposal for a Directory Structure for publishing downscaled datasets on 
ESGF. We would like to focus our next teleconference on discussing this 
proposal. Below please find a doodle poll for a potential next 
teleconference.

http://doodle.com/hrwthqs2g5pgsyv6

**********************************************************************
Details of each element of the proposed directory structure:

Proposed elements -
/projectID/sub-project/product/institution/predictorModel/experimentID/frequency/realm/MIPtable/Predictor_experiment_rip/predictorversion/downscalingMethod/predictand (variableName)/region/DownscaledDataversion/file_name.nc

Example:

/ncpp2013/perfectModel/downscaled/NOAA-GFDL/GFDL-HIRAM-C360-coarsened/amip/day/atmos/day/r1i1p1/v20121024/GFDL-ARRMv1/tasmax/US48/v20120227/tasmax_day_amip_r1i1p1_downscaled_US48_GFDLARRMv1_19790101-19831231.nc
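
As a minimal sketch of how a publishing script could assemble this path, the snippet below joins the proposed elements in the order listed above. The names FACET_ORDER and build_path are hypothetical and only illustrate the proposal; they are not part of any ESGF tool.

```python
# Proposed directory elements, in the order given in this email.
FACET_ORDER = [
    "projectID", "sub-project", "product", "institution",
    "predictorModel", "experimentID", "frequency", "realm", "MIPtable",
    "Predictor_experiment_rip", "predictorversion",
    "downscalingMethod", "predictand", "region", "DownscaledDataversion",
]


def build_path(facets, file_name):
    """Join the facet values in the proposed order and append the file name."""
    missing = [name for name in FACET_ORDER if name not in facets]
    if missing:
        raise KeyError(f"missing directory elements: {missing}")
    parts = [facets[name] for name in FACET_ORDER] + [file_name]
    return "/" + "/".join(parts)


# Values taken from the example path in this email.
example_facets = {
    "projectID": "ncpp2013",
    "sub-project": "perfectModel",
    "product": "downscaled",
    "institution": "NOAA-GFDL",
    "predictorModel": "GFDL-HIRAM-C360-coarsened",
    "experimentID": "amip",
    "frequency": "day",
    "realm": "atmos",
    "MIPtable": "day",
    "Predictor_experiment_rip": "r1i1p1",
    "predictorversion": "v20121024",
    "downscalingMethod": "GFDL-ARRMv1",
    "predictand": "tasmax",
    "region": "US48",
    "DownscaledDataversion": "v20120227",
}

example_path = build_path(
    example_facets,
    "tasmax_day_amip_r1i1p1_downscaled_US48_GFDLARRMv1_19790101-19831231.nc",
)
print(example_path)
```

The file name is passed in as-is because this email does not define a file name template.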

The new element sub-project tells users whether the method was trained 
on observations (standard setting) or on a model that was considered to 
be the truth (perfect model setting).
The options there could be: PerfectModel or Standard, where possibly 
there could be a different name instead of 'standard' for the standard 
downscaling setting.

For NASA datasets some of the directories could be:

project = NEX
product = downscaled
institution = NASA-Ames
predictorModel - original model value
experimentID = historical
frequency = mon
realm = atmos
Predictor_experiment_rip - original model value
variable = precipitation or temperature
region = CONUS

DownscalingMethod will also be included as a directory to allow for 
search on method.

**********************
There is a set of sub-directories that refer to the predictor model:
/predictorModel/experimentID/frequency/realm/MIPtable/Predictor_experiment_rip/predictorversion

Where:

  * predictorModel - the specific GCM that is the source of the
    predictor data set; GFDL-HIRAM-C360-coarsened in the above example
  * experimentID - the specific experiment; amip in this case
  * frequency - the temporal scale of the predictor fields; day (daily)
    in this example
  * realm - the realm of the predictors; in this case atmos(phere)
  * MIPtable - the name of the model intercomparison (MIP) table; day in
    this example, could be Amon for atmospheric monthly data
  * Predictor_experiment_rip - follows the standard rip notation from
    CMIP5; r1i1p1 in this example
  * predictorversion - the version date of the global model that
    provided the predictor dataset; v20121024 in this example

The elements above follow quite closely the structure for CMIP5 model 
output directory elements.

There is a set of sub-directories that refer to the downscaling method:
/downscalingMethod/predictand (variableName)/region/DownscaledDataversion

Where:

  * downscalingMethod - the downscaling method abbreviation; in this
    case GFDL-ARRMv1. The GFDL in the name indicates that this is a
    setting applied by GFDL where there were two sets of predictors,
    based on the ARRM method of K. Hayhoe; v1 indicates which version
    of the ARRM method was used (the original version). More details
    about the method are given in the global attributes of the file.
  * predictand (variableName) - the specific predictand variable that
    was downscaled; tasmax in this case
  * region - the region to which the method was applied; US48 in this
    case
  * DownscaledDataversion - the version of the downscaled dataset;
    v20120227 in this example
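
To illustrate how the predictor and downscaling groups of sub-directories could be recovered from a published path, here is a small sketch. The function name split_path and the group labels are hypothetical, and the code assumes the predictor-first order used in the example.

```python
LEADING = ["projectID", "sub-project", "product", "institution"]
PREDICTOR = [
    "predictorModel", "experimentID", "frequency", "realm", "MIPtable",
    "Predictor_experiment_rip", "predictorversion",
]
DOWNSCALING = [
    "downscalingMethod", "predictand", "region", "DownscaledDataversion",
]


def split_path(path):
    """Split a predictor-first path into its leading, predictor and
    downscaling groups, plus the file name."""
    parts = path.strip("/").split("/")
    names = LEADING + PREDICTOR + DOWNSCALING
    if len(parts) != len(names) + 1:  # +1 for the file name
        raise ValueError(
            f"expected {len(names) + 1} path components, got {len(parts)}"
        )
    facets = dict(zip(names, parts))
    return {
        "leading": {name: facets[name] for name in LEADING},
        "predictor": {name: facets[name] for name in PREDICTOR},
        "downscaling": {name: facets[name] for name in DOWNSCALING},
        "file_name": parts[-1],
    }


groups = split_path(
    "/ncpp2013/perfectModel/downscaled/NOAA-GFDL/GFDL-HIRAM-C360-coarsened"
    "/amip/day/atmos/day/r1i1p1/v20121024/GFDL-ARRMv1/tasmax/US48/v20120227"
    "/tasmax_day_amip_r1i1p1_downscaled_US48_GFDLARRMv1_19790101-19831231.nc"
)
print(groups["predictor"])
print(groups["downscaling"])
```

A fixed element count is what makes this kind of positional parsing possible, which is one argument for a standardized structure.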

*For the purposes of standardization there are two directions to consider:*

1) One is to have *one standard directory* structure that will be used by 
all - for example, following the example of GFDL to have the details of 
the predictor model first and then the downscaling method details:

  * ProjectID - sub-project - product - Institution - Predictor dataset
    details - Downscaling method details - Filename

Having a standardized approach would help any automated service/web 
service to detect the directory path for a particular dataset.

2) During our last teleconference there was a proposal to follow the 
downscaling practice and describe the downscaling method first and then 
the predictor model. This leads to *two paths*:

  * ProjectID - Standard or Perfect Model sub-project facet - product -
    Institution - then:
      - (if Perfect model setting) Predictor dataset details -
        Downscaling method details
      - (if Standard setting) Downscaling method details -
        Predictor dataset details

The NCPP Core team accepts that it may be reasonable to have one 
directory structure where the method description comes first and 
another where the predictor description comes first, followed by the 
methods applied. *NCPP will support either approach* (one overall 
directory structure, or two separate pathways). If the second approach 
is chosen (with two different sub-directory sequences), we would like 
to promote and support the standardization of both pathways; that is, 
we will support two standardized directory structures to accommodate 
two common practices.

******************
Additional details:

*Variable-level attributes*
The published dataset should also conform to the CF conventions, e.g.:

                 tasmax:long_name = "Downscaled Daily Maximum Near-Surface Air Temperature" ;
                 tasmax:units = "K" ;
                 tasmax:missing_value = 1.e+20f ;
                 tasmax:_FillValue = 1.e+20f ;
                 tasmax:standard_name = "air_temperature" ;
                 tasmax:original_units = "K" ;
                 tasmax:downscaling_method = "GFDL-ARRMv1" ;

*Global attributes* (listing a few here; several CMIP-style attributes 
will be inherited)

"predictorModel" will replace "model_id".
For the 'downscaling model', as agreed with Luca on the call, the 
attribute would be 'downscalingMethod'.

                 :Conventions = "CF-1.4" ;
                 :references = "info about model, training datasets etc will be provided here" ;
                 :info = "additional info about the downscaling method" ;
                 :creation_date = "2011-08-19T21:57:06Z" ;
                 :institution = "NOAA GFDL(201 Forrestal Rd, Princeton, NJ, 08540)" ;
                 :history = "info on file processing, e.g. processed by toolX." ;
                 :projectID = "ncpp2013" ;
                 :subprojectID = "perfectModel" ;
                 :product = "downscaled" ;
                 :institution = "NOAA-GFDL" ;
                 :predictorModel = "GFDL-HIRAM-C360-coarsened" ;
                 :experimentID = "amip" ;
                 :frequency = "day" ;
                 :modeling_realm = "atmos" ;
                 :Predictor_experiment_rip = "r1i1p1" ;
                 :region = "US48" ;
                 :table_id = "day" ;
                 :version = "v20120227" ;
                 :downscalingMethod = "GFDL-ARRMv1" ;
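
Since most directory elements also appear as global attributes, a publisher could check that the two agree before publishing. The sketch below is an assumption-laden illustration: the pairing of directory elements with attribute names is inferred from the example values in this email (for instance, that "version" holds the DownscaledDataversion), and find_mismatches is a hypothetical helper.

```python
# Directory element -> global attribute name.
# Inferred from the example values above; an assumption, not an agreed mapping.
PATH_TO_ATTRIBUTE = {
    "projectID": "projectID",
    "sub-project": "subprojectID",
    "product": "product",
    "institution": "institution",
    "predictorModel": "predictorModel",
    "experimentID": "experimentID",
    "frequency": "frequency",
    "realm": "modeling_realm",
    "MIPtable": "table_id",
    "Predictor_experiment_rip": "Predictor_experiment_rip",
    "downscalingMethod": "downscalingMethod",
    "region": "region",
    "DownscaledDataversion": "version",
}


def find_mismatches(path_facets, global_attrs):
    """Return (element, path value, attribute value) for every disagreement."""
    mismatches = []
    for facet, attr in PATH_TO_ATTRIBUTE.items():
        path_value = path_facets.get(facet)
        attr_value = global_attrs.get(attr)
        if path_value != attr_value:
            mismatches.append((facet, path_value, attr_value))
    return mismatches


# Values from the example path and the example global attributes.
path_facets = {
    "projectID": "ncpp2013", "sub-project": "perfectModel",
    "product": "downscaled", "institution": "NOAA-GFDL",
    "predictorModel": "GFDL-HIRAM-C360-coarsened", "experimentID": "amip",
    "frequency": "day", "realm": "atmos", "MIPtable": "day",
    "Predictor_experiment_rip": "r1i1p1", "downscalingMethod": "GFDL-ARRMv1",
    "region": "US48", "DownscaledDataversion": "v20120227",
}
global_attrs = {
    "projectID": "ncpp2013", "subprojectID": "perfectModel",
    "product": "downscaled", "institution": "NOAA-GFDL",
    "predictorModel": "GFDL-HIRAM-C360-coarsened", "experimentID": "amip",
    "frequency": "day", "modeling_realm": "atmos", "table_id": "day",
    "Predictor_experiment_rip": "r1i1p1", "downscalingMethod": "GFDL-ARRMv1",
    "region": "US48", "version": "v20120227",
}

mismatches = find_mismatches(path_facets, global_attrs)
print(mismatches)
```

The predictorversion and predictand elements are left out of the mapping because no matching global attribute is listed above.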
**************************************************

Best regards,
Galia and Aparna

-- 
Galia Guentchev, PhD
Project Scientist
National Climate
Predictions and
Projections
Platform (NCPP)
NCAR RAL CSAP
FL2 3103
3450 Mitchell Lane
Boulder, CO, 80301
phone: 303 497 2743
