[Go-essp-tech] MOHC ESG CMOR interface issues

Bryan Lawrence bryan.lawrence at stfc.ac.uk
Tue Jun 22 07:55:55 MDT 2010


Hi Mark

I don't have answers to many of your questions, so I'm going to reply 
directly here, and to the lists, so hopefully we can get specific answers 
from relevant folks. Can I trouble you to repost these questions every 
week until you are satisfied you have good answers?

> At the METAFOR telecon last week I said I would list the issues that
>  we (as data producers) have with the questionnaire/ESG development
>  that we have raised one way or another either with Karl/Charles or
>  through the METAFOR group.  In general the METAFOR group have logged
>  issues on their TRAC site, but we don't have visibility of any such
>  issues list for ESG (aside from the email discussions).

This is a major issue, as you know, and we are taking steps to do 
something about it. I believe it will be a few weeks before we see 
anything happen. Meanwhile the list is the final arbiter ...

> I should make clear that we don't want these issues to distract from
>  the development progress - system availability is going to become
>  our biggest issue in a couple of months' time.  We would just
>  appreciate some guidance on how these issues will be accommodated in
>  the initial operational system so that we can avoid having to do
>  rework on our processing system.  An awful lot of the functionality
>  decisions have been made on guesses on how the CMIP5 data system
>  will work, most of which have been OK - but some of the more recent
>  ESG information has resulted in reworking/additional functionality,
>  and we can't afford to do much more of that.
> 
> 	1) Forcing Vocabulary
> 	The vocabularies used for forcings in the netcdf files and
>  questionnaire (Q) are not the same and may lead to
>  misunderstandings.  Should we plan on having to deal with both
>  approaches or will CMOR allow us to 'point' to the Q content or will
>  the Q adopt the netcdf vocabulary?

I don't understand this question, which probably means the answer is 
"both". The netcdf fields ask for a specific set of information about 
forcing in a very generic way. The questionnaire asks similar but more 
detailed questions (via direct questions about specific boundary 
conditions and conformance to experimental requirements), which have 
been adjusted (http://metaforclimate.eu/trac/ticket/671) where 
appropriate.
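
For what it's worth, the netcdf side is just a comma-separated list of 
agreed abbreviations in the 'forcing' global attribute, so checking what 
a file actually claims is cheap. A rough sketch in Python with netCDF4 
(the vocabulary subset listed is illustrative, not the definitive list):

    from netCDF4 import Dataset

    # Illustrative subset only; the full list lives in the CMIP5 docs.
    KNOWN_FORCINGS = {"GHG", "SA", "Oz", "LU", "SS", "BC", "OC", "Sl", "Vl"}

    def forcing_terms(path):
        ds = Dataset(path)
        raw = getattr(ds, "forcing", "")   # comma-separated abbreviations
        ds.close()
        terms = [t.strip() for t in raw.split(",") if t.strip()]
        unknown = [t for t in terms if t not in KNOWN_FORCINGS]
        return terms, unknown

The questionnaire is where the detail lives (what the forcing actually 
was, per experiment), so my guess is the two stay complementary rather 
than one pointing at the other, but Karl/Charles should confirm.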

> 	2) Master metadata
> 	Given the overlap between the netcdf headers and the questionnaire -

which is much more limited than you imply; see 
http://metaforclimate.eu/trac/ticket/732. What, explicitly, is the 
problem?

>  how do we deal with the fact that it is possible that we will update
>  the questionnaire metadata after the data has been published (or
>  indeed, as it is highly likely that different people will be involved
>  in the CMOR configuration and in completing the Q response, how
>  will we deal with differences in the metadata from the get-go).

Yes, but what really can be different and not caught in the qc level one 
checks (which force compliance to controlled vocabs)? I don't know the 
answer, but if there is stuff that *ought* to be the same and isn't, then 
that's a qc level 2 issue, and you need to fix it!

>  Will we need to reprocess the data to get the netcdf headers to match
>  the questionnaire content?  Assuming that this is unrealistic

Why? I can appreciate it might be if you start from the pp, but why not 
just run a script over the netcdf data?
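
To be concrete about "a script": rewriting the offending global 
attributes in place is cheap compared with regenerating from PP. A 
sketch with Python and netCDF4 (attribute names and values are 
placeholders; NCO's ncatted would do the same job from the shell):

    import glob
    from netCDF4 import Dataset

    def patch_global_attributes(pattern, updates):
        """Rewrite selected global attributes across a set of files."""
        for path in glob.glob(pattern):
            ds = Dataset(path, "a")   # append mode: edit headers in place
            for name, value in updates.items():
                ds.setncattr(name, value)
            ds.close()

    # e.g. patch_global_attributes("output/*.nc", {"institute_id": "MOHC"})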

>  - what are
>  the mandatory fields in the netcdf header that must be consistent
>  with the questionnaire metadata.

Good question. But see my previous answer; I'm still in the dark as to 
the specific risks here. Most of the vocab terms (models etc.) are being 
forced to be the same. Hence 
http://esg-pcmdi.llnl.gov/internal/esg-data-node-documentation/cmip5_controlled_vocab.txt/view
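
As a stopgap while the mandatory list gets pinned down, a 
pre-publication check of the headers against whatever you intend to 
enter in the questionnaire would catch the obvious mismatches. A sketch 
(the attribute set and expected values are placeholders, not an agreed 
qc level 1 definition):

    from netCDF4 import Dataset

    def header_mismatches(path, expected):
        """Compare selected global attributes with intended questionnaire values."""
        ds = Dataset(path)
        bad = {}
        for name, want in expected.items():
            got = getattr(ds, name, None)
            if got != want:
                bad[name] = (got, want)
        ds.close()
        return bad

    # e.g. header_mismatches("tas_Amon_HadGEM2-ES_historical_r1i1p1.nc",
    #                        {"institute_id": "MOHC", "model_id": "HadGEM2-ES"})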

> 	[BTW - it is our assumption that the Curator only presents metadata
>  from the questionnaire + the DRS facets.  Does it use the netcdf
>  header content in any way?  We haven't seen the final connection
>  from Curator to the data files so can't be sure what is happening
>  with this.]

Good question. Don't know. Sylvia?

> 	3) r:i:p usage
> 	The latest version of the CMIP5 Model Output Requirements document
>  has some more detail on the use of r:i:p.   We have spent some time
>  working out how this will apply to our experiment setup.  We have a
>  view now, but it is a reasonably complex process to assign r:i:p
>  across all the experiments we are doing, and we are worried that
>  there is clearly room for different interpretations of this scheme
>  which may result in rework or confusion for end  users.  For
>  example, is the initialisation method incremented across the whole
>  set of data deliveries or only within a single experiment.  Further
>  examples which map directly to the CMIP5 experiment plan would be
>  really useful for data producers who are not directly involved in
>  ESG to check our interpretation of r:i:p.

I think we should take this up directly with Karl. As far as the 
questionnaire goes, it simply asks for the r:i:p of each ensemble 
member, so there ought to be no opportunity for confusion.
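
For reference, composing the identifier itself is mechanical; the hard 
part is exactly the interpretation you raise (e.g. whether i is scoped 
per experiment or across the whole submission). A sketch, with the 
numbering purely illustrative:

    def rip(realization, initialization, physics):
        """Compose the CMIP5 ensemble member identifier r<N>i<M>p<L>."""
        return "r%di%dp%d" % (realization, initialization, physics)

    # e.g. three realizations of one experiment, one initialisation
    # method, one physics version:
    members = [rip(r, 1, 1) for r in (1, 2, 3)]  # ['r1i1p1', 'r2i1p1', 'r3i1p1']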

> 	4) quality control
> 	How do we get access to information on the quality control levels
>  achieved by our datasets within the ESG system (e.g. attainment of
>  level 1 and 2).  For individual datasets this shouldn't be a problem
>  with normal email channels, but when we are delivering multiple
>  datasets in parallel, we may need a more structured interface to get
>  this information.  We have already developed such an interface for
>  the BADC ingest delivery step - but we will need to know the status
>  of all datasets post ingest.

:-)  Who wrote the quality control package for the CIM?  I rather hope 
we can edit CIM qc records. It's not on the critical path to the 
questionnaire delivery, but Gerry, Mark and I have discussed the 
importance of getting an interface to it in the work plan directly 
thereafter. It's also the subject of ongoing discussions between us and 
DKRZ.  Hopefully we'll have a good solution pretty soon.
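
Just to make the "structured interface" point concrete, and purely as a 
strawman (nothing like this exists yet): what you presumably want to 
consume is a machine-readable listing of dataset identifiers against the 
QC level each has reached, something like:

    import json

    def datasets_below_level(status_file, level):
        """Strawman only: read a hypothetical per-dataset QC status listing
        and report datasets that have not yet reached the given level."""
        with open(status_file) as f:
            # assumed form: {"<drs dataset id>": {"qc_level": 2}, ...}
            status = json.load(f)
        return sorted(d for d, s in status.items()
                      if s.get("qc_level", 0) < level)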

> 	Where are the publishing controls inferred by the QC levels going to
>  be implemented - will it be in the publishing interface that our
>  scientists use, or some other system?

What are you inferring ? :-)

> 	6) CMOR updates
> 	There is a possibility that MIP tables will be updated after data
>  has been published.   They are now held in a separate repository, so
>  we are assuming there will be a notification scheme when they change
>  (correct?).

Charles?

>  If a future change requires data to be reprocessed I
>  assume each data producer will need to take the decision as to
>  whether they can afford to do the rework.  If we decide not to
>  reprocess will the affected atomic datasets need to be deleted from
>  the system or can they remain with the end-user using the MIP table
>  revision details to decide which changes have been applied to the
>  data.

I think we should take a decision not to finalise DOIs until the MIP 
tables have been finalised, and we should expect data to conform to the 
MIP tables in order to get a DOI. What that says is: reprocess the data 
to a new version if necessary. But we don't want a lot of that, so:

Karl/Charles: when is the cut-off date for NO MORE MIP table changes, 
after which we just live with them as they are (for things in the CMIP5 
DRS hierarchy)?
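
One mitigation that needs no new infrastructure: the files already 
record which table revision they were written against (CMOR puts it in 
the table_id global attribute), so users and QC can at least see what a 
given atomic dataset conforms to. A sketch; the exact format of the 
string is CMOR's business, so it is just reported verbatim:

    from netCDF4 import Dataset

    def recorded_mip_table(path):
        """Return the MIP table revision string recorded in the file, if any."""
        ds = Dataset(path)
        table_id = getattr(ds, "table_id", None)  # e.g. 'Table Amon (<date>)'
        ds.close()
        return table_id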

> 	7) Versioning
> 	(I think Stephen is working on these issues)
> 	It is our understanding that if we make a change to one atomic
>  dataset and redeliver it to BADC, the version number of all atomic
>  datasets within that realm will be incremented.  As the data node
>  puts the files into the DRS structure (including Version) will the
>  ESG also harvest information on which atomic dataset has been
>  modified so that end-users can identify which atomic datasets they
>  may need to download to upgrade to the new version, or is there
>  other information that we need to send to BADC to provide
>  information on our changes (e.g which datasets we changed at Version
>  N and why?).

Stephen can handle that one.
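
While Stephen sorts out the authoritative answer, one thing data 
producers can do unilaterally is keep a checksum manifest per delivery, 
so that "which atomic datasets changed at version N" is answerable by 
diffing manifests. A rough sketch (paths are placeholders):

    import hashlib
    import os

    def manifest(root):
        """Map each file under root (relative path) to its md5 checksum."""
        sums = {}
        for dirpath, _, files in os.walk(root):
            for name in files:
                path = os.path.join(dirpath, name)
                with open(path, "rb") as f:
                    sums[os.path.relpath(path, root)] = hashlib.md5(f.read()).hexdigest()
        return sums

    def changed_files(old_root, new_root):
        """Files in the new delivery whose content differs or is new."""
        old, new = manifest(old_root), manifest(new_root)
        return sorted(k for k in new if old.get(k) != new[k])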

Cheers
Bryan

-- 
Bryan Lawrence
Director of Environmental Archival and Associated Research
(NCAS/British Atmospheric Data Centre and NCEO/NERC NEODC)
STFC, Rutherford Appleton Laboratory
Phone +44 1235 445012; Fax ... 5848; 
Web: home.badc.rl.ac.uk/lawrence

