[Go-essp-tech] [metafor] MOHC ESG CMOR interface issues

Tue Jun 22 08:50:18 MDT 2010

Hi Mark et al,

I'll answer the Curator question.  You asked if 
>> Curator only presents metadata
>> from the questionnaire + the DRS facets.  Does it use the netcdf
>> header content in any way?  

This is correct.  Curator's piece of the metadata pie is to deal with the metadata coming in via the questionnaire.  This information will be displayed on the "trackback" pages.  The metadata that exists in the netCDF headers gets inserted into the THREDDS catalogs and is harvested by ESG and ends up on the data side of the house.  If you have questions about how that metadata is displayed, Eric is the person you need to talk to.  If you have further questions on the netCDF metadata, I can forward them on.   

>> We haven't seen the final connection
>> from Curator to the data files so can't be sure what is happening
>> with this.]

We have demonstrated this connection a couple of times, but in case you missed those demonstrations, you can see the connection from a Curator "trackback" page to the data files in the live ESG portal.  We have sample CMIP5 metadata and sample datasets you can explore.  To see this, do the following:

1) Go to:  http://www.earthsystemgrid.org/home.htm
2) In the Search pull-down menu, select Simulations
3) Type "ESM2M" in the text box and hit the Search button
4) One result is returned: "GFDL ESM2M Control-1860 r1i1"
5) Click on the returned link
6) You should now be viewing the GFDL CMIP5 sample trackback instance
7) Click on the outputs tab and expand the data collections accordion
8) Click on any of the links.  This will take you to the data.

I would be happy to give you a personal demonstration of any part of the system you are interested in.

Cheers,
Sylvia

On Jun 22, 2010, at 7:55 AM, Bryan Lawrence wrote:

> Hi Mark
> 
> I don't have answers to many of your questions, so I'm going to reply 
> directly here, and to the lists, so hopefully we can get specific answers 
> from relevant folks. Can I trouble you to repost these questions every 
> week until you are satisfied you have good answers?
> 
>> At the METAFOR telecon last week I said I would list the issues that
>> we (as data producers) have with the questionnaire/ESG development
>> that we have raised one way or another either with Karl/Charles or
>> through the METAFOR group.  In general the METAFOR group have logged
>> issues on their TRAC site, but we don't have visibility of any such
>> issues list for ESG (aside from the email discussions).
> 
> This is a major issue, as you know, and we are taking steps to do 
> something about it. I believe it will be a few weeks before we see 
> anything happen. Meanwhile the list is the final arbiter ...
> 
>> I should make clear that we don't want these issues to distract from
>> the development progress - system availability is going to become
>> our biggest issue in a couple of months time.  We would just
>> appreciate some guidance on how these issues will be accommodated in
>> the initial operational system so that we can avoid having to do
>> rework on our processing system.  An awful lot of the functionality
>> decisions have been made on guesses on how the CMIP5 data system
>> will work, most of which have been OK - but some of the more recent
>> ESG information has resulted in reworking/additional functionality,
>> and we can't afford to do much more of that.
>> 
>> 	1) Forcing Vocabulary
>> 	The vocabularies used for forcings in the netcdf files and
>> questionnaire (Q) are not the same and may lead to
>> misunderstandings.  Should we plan on having to deal with both
>> approaches, or will CMOR allow us to 'point' to the Q content, or will
>> the Q adopt the netcdf vocabulary?
> 
> I don't understand this question, which probably means both. The netcdf 
> fields are asking for a specific set of information about forcing in a 
> very generic way. The questionnaire asks similar but more detailed 
> questions (via direct questions about specific boundary conditions and 
> conformance to experimental requirements), which have been adjusted 
> (http://metaforclimate.eu/trac/ticket/671) where appropriate.  
> 
>> 	2) Master metadata
>> 	Given the overlap between the netcdf headers and the questionnaire -
> 
> which is much more limited than you imply; see 
> http://metaforclimate.eu/trac/ticket/732
> What, explicitly, is the problem?
> 
>> how do we deal with the fact that it is possible that we will update
>> the questionnaire metadata after the data has been published (or
>> indeed, as it is highly likely that different people will be involved
>> in the CMOR configuration and in completing the Q response, how
>> will we deal with differences in the metadata from the get-go)? 
> 
> Yes, but what really can be different and not caught in the qc level one 
> checks (which force compliance to controlled vocabs)? I don't know the 
> answer, but if there is stuff that *ought* to be the same and isn't, then 
> that's a qc level 2 issue, and you need to fix it!
> 
>> Will
>> we need to reprocess the data to get the netcdf headers to match the
>> questionnaire content?  Assuming that this is unrealistic
> 
> Why? I can appreciate it might be if you start from the pp, but why not 
> just run a script over the netcdf data?
> 
>> - what are
>> the mandatory fields in the netcdf header that must be consistent
>> with the questionnaire metadata?
> 
> Good question. But see my previous. I'm still in the dark as to the 
> specific risks here.  Most of the vocab terms (models etc) are being 
> forced to be the same. Hence
> http://esg-pcmdi.llnl.gov/internal/esg-data-node-documentation/cmip5_controlled_vocab.txt/view
> 
>> 	[BTW - it is our assumption that the Curator only presents metadata
>> from the questionnaire + the DRS facets.  Does it use the netcdf
>> header content in any way?  We haven't seen the final connection
>> from Curator to the data files so can't be sure what is happening
>> with this.]
> 
> Good question. Don't know. Sylvia?
> 
>> 	3) r:i:p usage
>> 	The latest version of the CMIP5 Model Output Requirements document
>> has some more detail on the use of r:i:p.   We have spent some time
>> working out how this will apply to our experiment setup.  We have a
>> view now, but it is a reasonably complex process to assign r:i:p
>> across all the experiments we are doing, and we are worried that
>> there is clearly room for different interpretations of this scheme
>> which may result in rework or confusion for end users.  For
>> example, is the initialisation method incremented across the whole
>> set of data deliveries or only within a single experiment?  Further
>> examples which map directly to the CMIP5 experiment plan would be
>> really useful for data producers who are not directly involved in
>> ESG to check our interpretation of r:i:p.
> 
> I think we should take this up directly with Karl. As far as the 
> questionnaire goes, the questionnaire simply asks for the r.i.p for 
> ensemble members, so there ought to be no opportunity for confusion.
> 
>> 	4) quality control
>> 	How do we get access to information on the quality control levels
>> achieved by our datasets within the ESG system (e.g. attainment of
>> level 1 and 2).  For individual datasets this shouldn't be a problem
>> with normal email channels, but when we are delivering multiple
>> datasets in parallel, we may need a more structured interface to get
>> this information.  We have already developed such an interface for
>> the BADC ingest delivery step - but we will need to know the status
>> of all datasets post-ingest.
> 
> :-)  Who wrote the quality control package for the CIM?  I rather hope 
> we can edit CIM qc records. It's not on the critical path to the 
> questionnaire delivery, but Gerry, Mark and I have discussed the 
> importance of getting an interface to it in the work plan directly 
> thereafter. It's also the subject of ongoing discussions between us and 
> DKRZ.  Hopefully we'll have a good solution pretty soon.
> 
>> 	Where are the publishing controls inferred by the QC levels going to
>> be implemented - will it be in the publishing interface that our
>> scientists use, or some other system?
> 
> What are you inferring? :-)
> 
>> 	6) CMOR updates
>> 	There is a possibility that MIP tables will be updated after data
>> has been published.   They are now held in a separate repository, so
>> we are assuming there will be a notification scheme when they change
>> (correct?).
> 
> Charles?
> 
>> If a future change requires data to be reprocessed, I
>> assume each data producer will need to take the decision as to
>> whether they can afford to do the rework.  If we decide not to
>> reprocess, will the affected atomic datasets need to be deleted from
>> the system, or can they remain, with the end-user using the MIP table
>> revision details to decide which changes have been applied to the
>> data?
> 
> I think we should take a decision not to finalise DOIs until the MIP 
> tables have been finalised, and we should expect data to conform to the 
> MIP tables in order to get a DOI. What that says is:
> reprocess the data to a new version if necessary, but we don't want a 
> lot of that. So:
> 
> Karl/Charles: when is the cut-off date for NO MORE MIP table changes, 
> after which we just live with them as they are (for things in the CMIP5 
> DRS hierarchy)?
> 
>> 	7) Versioning
>> 	(I think Stephen is working on these issues)
>> 	It is our understanding that if we make a change to one atomic
>> dataset and redeliver it to BADC, the version number of all atomic
>> datasets within that realm will be incremented.  As the data node
>> puts the files into the DRS structure (including Version) will the
>> ESG also harvest information on which atomic dataset has been
>> modified so that end-users can identify which atomic datasets they
>> may need to download to upgrade to the new version, or is there
>> other information that we need to send to BADC to provide
>> information on our changes (e.g. which datasets we changed at Version
>> N and why?).
> 
> Stephen can handle that one.
> 
> Cheers
> Bryan
> 
> -- 
> Bryan Lawrence
> Director of Environmental Archival and Associated Research
> (NCAS/British Atmospheric Data Centre and NCEO/NERC NEODC)
> STFC, Rutherford Appleton Laboratory
> Phone +44 1235 445012; Fax ... 5848; 
> Web: home.badc.rl.ac.uk/lawrence
> _______________________________________________
> metafor mailing list
> metafor at lists.enes.org
> https://lists.enes.org/mailman/listinfo/metafor

***********************************
Sylvia Murphy
NESII/CIRES/NOAA Earth System Research Laboratory
325 Broadway, Boulder CO 80305
Email: sylvia.murphy at noaa.gov
Phone: 303-497-7753