[Go-essp-tech] metadata specifications and data infrastructure

Robert S. Drach drach1 at llnl.gov
Fri Jan 28 13:44:09 MST 2011


Hi Charlotte,

charlotte.pascoe at stfc.ac.uk wrote:
>
> Dear Go-ESSP tech
>
>  
>
> The latest version of the CMIP5 DRS document contains some changes to 
> experiment names.
>
> http://cmip-pcmdi.llnl.gov/cmip5/docs/cmip5_data_reference_syntax.pdf
>
> We will be able to configure the Questionnaire to take account of 
> these changes.
>
> However it is not clear whether the rest of the CMIP5 data system is 
> able to handle or even knows about the new experiment names.
>
In addition to the DRS spec, the other important CMIP5/DRS document is 
the CMIP5 controlled vocabulary 
(http://esg-pcmdi.llnl.gov/internal/esg-data-node-documentation/cmip5_controlled_vocab.txt/view). 
It summarizes the vocabularies defined in the DRS doc, and adds 
vocabularies for institution and CMOR table. It also ties the 
vocabularies to their implementation in the esgpublish configuration.
>
> Similarly it is not clear that all the modelling centres know about 
> the new experiment names either.
>
You're right - it's important that the data publishers be aware of 
updates to the DRS. In general, when an ESG data node is built, the 
publisher configuration corresponds to the latest DRS spec, but after 
that it's up to the node managers / publishers to track the changes.
>
> This raises the following issues.
>
>  
>
> 1.       Do we have any data in the ESG data nodes with “old” 
> experiment names, e.g. with suffix E, I or S?
>
> 2.       What will we do with data that arrives using old experiment 
> names – can we handle changing possibly thousands of file names?
>
Also what if the metadata in the files is not up-to-date? Rather than 
rewrite thousands of files, it would make a lot more sense to map the 
metadata to the correct values. For example, the publisher could be 
configured to map experiment historicalAA (removed) => historicalMisc.
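
A minimal sketch of the mapping idea, assuming the metadata has already 
been read into a plain dictionary. The names EXPERIMENT_MAP, 
map_experiment and map_metadata are hypothetical and are not part of the 
esgpublish configuration; only the historicalAA entry comes from the 
example above.

```python
# Hypothetical lookup table from removed experiment names to current ones.
# Only the historicalAA entry is taken from the message; others would be
# added from the DRS document as needed.
EXPERIMENT_MAP = {
    "historicalAA": "historicalMisc",
}


def map_experiment(name, mapping=EXPERIMENT_MAP):
    """Return the current experiment name for a possibly outdated one.

    Names that are not in the mapping are returned unchanged.
    """
    return mapping.get(name, name)


def map_metadata(metadata, mapping=EXPERIMENT_MAP):
    """Return a copy of a metadata dict with the experiment field remapped.

    The input dict is left untouched, mirroring the idea that the files
    themselves are not rewritten.
    """
    updated = dict(metadata)
    if "experiment" in updated:
        updated["experiment"] = map_experiment(updated["experiment"], mapping)
    return updated
```

Calling map_metadata({"experiment": "historicalAA"}) would give a new 
dictionary whose experiment is historicalMisc, while the original 
dictionary keeps the old value.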
>
> 3.       Changes to metadata specifications can have a bearing on the 
> way data is handled throughout the ESG federation – do we know what 
> these dependencies are?
>
> 4.       There are probably more issues too
>
I'm also concerned about issues of case-sensitivity in metadata and 
search. If we're not careful we could end up with ten different 
variations on a gateway for the same model. I've tried to stress to the 
data producers that, while they are free to choose their own model and 
institution IDs, once chosen they should be absolutely consistent in 
their usage.
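
One way to catch such variations before publication is to group IDs that 
differ only in letter case. This is a sketch under the assumption that 
the IDs are available as a simple list of strings; find_case_variants 
and the ModelX spellings are made up for illustration.

```python
from collections import defaultdict


def find_case_variants(ids):
    """Group identifiers that differ only by letter case.

    Returns a dict keyed by the case-folded identifier, containing only
    those groups that have more than one distinct spelling.
    """
    groups = defaultdict(set)
    for identifier in ids:
        groups[identifier.casefold()].add(identifier)
    return {
        key: sorted(variants)
        for key, variants in groups.items()
        if len(variants) > 1
    }


# Hypothetical model IDs; "ModelX" appears in three spellings.
variants = find_case_variants(["ModelX", "modelx", "MODELX", "OtherModel"])
```

An empty result means every ID is used with a single, consistent 
spelling.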
>
>  
>
> It is important to put some thought into where and how metadata is 
> used to organise data within the ESG federation before CMIP5 data 
> starts arriving in earnest.  Then if metadata specifications change we 
> will have a clear idea of what the consequences are and a plan for how 
> to deal with them.
>
Absolutely correct.

Best regards,

Bob D.
>
>  
>
> Kind regards,
>
>  
>
> *Charlotte*
>
> ----------------------------------------------------------
>
> Dr Charlotte Pascoe
>
> NCAS British Atmospheric Data Centre
>
> STFC Rutherford Appleton Laboratory
>
> Phone +44 1235 445869; Fax ...5848
>
> e-mail charlotte.pascoe at stfc.ac.uk <mailto:c.l.pascoe at rl.ac.uk>
>
> ----------------------------------------------------------
>
>  
>
>  
>
>


