[Go-essp-tech] CIM Quality Examples

Fri Jul 8 10:21:36 MDT 2011

Hi Martina,

Let me explain what I mean by the data and model-metadata side of  
ESG.  Some other comments are inline below.

There are two places where metadata is displayed within ESG and they  
come from totally different sources and are being developed by two  
slightly different groups of people.

Data:
Metadata is being harvested from the netCDF files and placed within a  
TDS catalog.  Some of this metadata is being displayed on pages that  
serve the datasets themselves.   This is being developed at PCMDI (and  
perhaps others) through CMOR  and the Gateway publishing software etc.

Model-metadata:
This is the metadata coming in from the Metafor questionnaire via XML  
files that are converted to RDF and displayed in the ESG trackback  
pages.  This is being developed in a collaboration between the Curator  
project and the NCAR ESG team.

I am attaching a screen shot showing how the model metadata side lists  
the data collections associated with the rendered simulation:

-------------- next part --------------
A non-text attachment was scrubbed...
Name: data.tiff
Type: image/tiff
Size: 291604 bytes
Desc: not available
Url : http://mailman.ucar.edu/pipermail/go-essp-tech/attachments/20110708/f1a1050a/attachment-0001.tiff 
-------------- next part --------------

Perhaps the above image will help inform my question.  Is there going  
to be a QC value for each of the links above, for all the links as a  
set (representing all the data that the simulation produces), or at  
the level of individual files within each link listed above?

more comments inline below...

On Jul 8, 2011, at 7:23 AM, Martina Stockhause wrote:

>  Hi, Sylvia,
>
> I add the answers for 2) and 3)
>
>> 2) I am trying to understand what you mean by the granularity of an  
>> ESG dataset / publication unit and experiment.  Again I just do see  
>> this side of things....
>> The current query from the model metadata side displays  all the  
>> datasets that have model=x and experiment=y in their TDS.  Is there  
>> one QC value for this entire set or are they on the level of the  
>> individual datasets in that set (e.g. from a data search.....    
>> project=CMIP5 / IPCC Fifth Assessment Report, model=HadGEM2-ES, Met  
>> Office Hadley Centre, experiment=historical, time_frequency=6hr,  
>> modeling realm=atmos, ensemble=r1i1p1, version=20101208)
>>
>> The reason I ask is that if there is one QC value for the entire  
>> set, then displaying this information with the model metadata makes  
>> sense.  If it is at the level of the search listing above, then it  
>> may best be displayed with the data page itself since there is a  
>> one to one correspondence.
>>
> I am not sure what you mean by model metadata side. If I speak of
> experiment I mean the TDS sense of experiment (in metafor that would
> be a simulation) and I mean the full DRS name of the experiment, e.g.
> cmip5.output1.IPSL.IPSL-CM5A-LR.amip4K. For me the model
> ("IPSL-CM5A-LR") is part of this DRS experiment name.
> If your model metadata is the metadata of the realization of one
> experiment performed by one model, then this should be my
> "experiment".

Just to restate how we think of this: to us, an experiment is  
something like CMIP5 rcp45.  It is a list of requirements.  Every  
model participating in CMIP5 will run a simulation against this  
experiment.  Therefore, a simulation is a model run that, for CMIP5,  
conforms to a particular experiment.  It produces a bunch of files.  I  
think this is your "experiment".

>
> The information for the QC status of an experiment is not sufficient,
> esp. when it comes to DOIs. DOIs are related to the latest version (in
> our example: version=20101208) of all datasets belonging to an
> experiment at the time of DOI assignment. Afterwards the version(s) of
> the individual datasets belonging to the DOI are fixed. If data is
> published afterwards under a new version, it is not part of the DOI.
> So, if you have a quality information/status for an experiment, not
> all of the datasets and not always the latest versions of every
> dataset are connected to it. At the time of QC level assignment for
> the experiment and CIM experiment document publication, the CIM
> dataset documents tell you which versions of the datasets are part of
> that assignment.
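
To check my reading of the versioning point above, here is a minimal Python sketch, not anything that exists in ESG or at DKRZ. It assumes the DOI simply freezes a dataset-to-version map at assignment time. The function names and the later version v20110601 are hypothetical and only there for illustration.

```python
def freeze_versions(latest_versions: dict) -> dict:
    """Copy the dataset -> version map as it stands at DOI assignment."""
    return dict(latest_versions)


def covered_by_doi(snapshot: dict, dataset: str, version: str) -> bool:
    """True only if this exact version was frozen into the DOI."""
    return snapshot.get(dataset) == version


dataset = "cmip5.output1.IPSL.IPSL-CM5A-LR.amip4K.3hr.atmos.3hr.r1i1p1"

# State of the archive when the DOI is assigned.
published = {dataset: "v20110429"}
doi_snapshot = freeze_versions(published)

# Hypothetical later republication under a new version.
published[dataset] = "v20110601"

print(covered_by_doi(doi_snapshot, dataset, "v20110429"))  # True
print(covered_by_doi(doi_snapshot, dataset, "v20110601"))  # False
```

If that reading is right, a QC status shown per experiment would need this kind of version list next to it to say which data it actually covers.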

Not sure what you mean by a CIM experiment document and a CIM dataset  
document here.  To my knowledge the gateways are receiving one CIM  
document set, the content of the questionnaire.  Not sure what these  
others are.  Of course your new QC XML is another document but that is  
new.

>
> Difficult to explain, I hope you get it nevertheless.
>> 3) The primary suggestion I would make is that if there is key  
>> information that we need to extract and display or extract to  
>> connect to something else, that this should not be buried in a long  
>> string.  It should be a stand alone entity.
> I agree. I just did not find a proper tag. I would be grateful for any
> suggestions. I hoped that the metafor people would assist me, but
> they seem to be occupied by other topics. Besides, I still have not
> understood their concept of how to map metafor UUIDs to CMIP5/Thredds
> DRS_IDs.

Alas, I am not the person to assist with this.  I can tell you that  
ESG has gotten away from using the DRS ID in its entirety for  
connecting model metadata and datasets and is now just focusing on two  
parts of it: the model name + the experiment or, if that does not  
exist, the model name + the simulation.

If those two pieces of information were to exist in the XML file, we  
could connect the QC value to simulation metadata (again assuming it  
exists on that granularity).
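
As a rough illustration of the model name + experiment link, the following Python sketch splits a DRS ID into named parts and keeps only those two. The component order in DRS_FIELDS is my assumption about the CMIP5 DRS layout, not something defined in this thread, and parse_drs_id and link_key are hypothetical helper names, not ESG code.

```python
from typing import NamedTuple

# Assumed component order of a CMIP5 DRS dataset ID (an assumption,
# not taken from the QC documents themselves).
DRS_FIELDS = (
    "activity", "product", "institute", "model", "experiment",
    "frequency", "realm", "mip_table", "ensemble", "version",
)


class LinkKey(NamedTuple):
    model: str
    experiment: str


def parse_drs_id(drs_id: str) -> dict:
    """Split a dotted DRS ID into a field-name -> value mapping."""
    parts = drs_id.split(".")
    if len(parts) != len(DRS_FIELDS):
        raise ValueError(
            f"expected {len(DRS_FIELDS)} components, got {len(parts)}: {drs_id!r}"
        )
    return dict(zip(DRS_FIELDS, parts))


def link_key(drs_id: str) -> LinkKey:
    """Keep only the two parts used to join QC values to model metadata."""
    fields = parse_drs_id(drs_id)
    return LinkKey(fields["model"], fields["experiment"])


key = link_key(
    "cmip5.output1.IPSL.IPSL-CM5A-LR.amip4K.3hr.atmos.3hr.r1i1p1.v20110429"
)
print(key.model, key.experiment)  # IPSL-CM5A-LR amip4K
```

The split on "." only works if no component contains a dot itself, which is one more reason to carry model and experiment as stand-alone elements in the XML.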

>
> Have a nice weekend,
> Martina
>
>>
>> On Jul 7, 2011, at 4:52 AM, Martina Stockhause wrote:
>>
>>>  Hi, Sylvia,
>>>
>>> we offer two preliminary CIM Quality examples to give you a first
>>> impression of the quality documents to be harvested by the gateways:
>>>
>>> http://anticyclone.dkrz.de:8088/geonetwork/srv/en/atom.latestCIM
>>> internal_ids=1896,1897
>>>
>>> A few explanations where to find the core information:
>>>
>>> - Quality information will be provided on ESG dataset / publication
>>> unit level and experiment level. The example with internal_id=1897 is
>>> a dataset and the internal_id=1896 is the related experiment.
>>>
>>> - pass/fail information: "pass", with pass=0 (failed) or pass=1
>>> (passed). The examples haven't completed QC L2 and therefore are
>>> pass=0. The pass=1 documents are the more interesting ones. Therefore
>>> I suggest publishing them by AtomFeed only for QC status changes,
>>> i.e. only with pass=1.
>>>
>>> - QC Level: "nameOfMeasure" or "measureIdentification/code" (string
>>> includes "2" or "3"). These are examples for QC L2.
>>>
>>> - DRS_ID: I did not find an exclusive tag for that, therefore it is
>>> part of "evaluationMethodDescription", e.g.
>>> dataset_id=cmip5.output1.IPSL.IPSL-CM5A-LR.amip4K.3hr.atmos.3hr.r1i1p1.v20110429
>>> or part of the "title". I would not suggest using the "title". We
>>> keep the version because for QC L3 it is important to assign QC
>>> level 3 only to a specific version of a dataset.
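
For what it is worth, pulling the dataset_id back out of a free-text field could look roughly like the Python sketch below. The sample description string is made up for illustration (the real content of "evaluationMethodDescription" may differ), and extract_dataset_id and has_passed are hypothetical names. Only the dataset_id= prefix and the pass=0/pass=1 convention come from the list above.

```python
import re
from typing import Optional

# Characters allowed in the ID are an assumption based on the example above.
DATASET_ID_RE = re.compile(r"dataset_id=([A-Za-z0-9_.\-]+)")


def extract_dataset_id(description: str) -> Optional[str]:
    """Return the dataset_id embedded in a description string, if any."""
    match = DATASET_ID_RE.search(description)
    if match is None:
        return None
    # Drop a trailing full stop in case the ID ends a sentence.
    return match.group(1).rstrip(".")


def has_passed(pass_value: str) -> bool:
    """Interpret the pass flag: "1" means passed, "0" means failed."""
    return pass_value.strip() == "1"


# Made-up description text, for illustration only.
sample = (
    "QC L2 checks; dataset_id=cmip5.output1.IPSL.IPSL-CM5A-LR."
    "amip4K.3hr.atmos.3hr.r1i1p1.v20110429; further notes"
)
print(extract_dataset_id(sample))
print(has_passed("0"))
```

The pattern matching is the fragile part here: any change to the surrounding text can break it, which a dedicated tag would avoid.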
>>>
>>>
>>> Status and expected changes:
>>> - Hans and I were handed over Bryan's QC tool for CIM quality
>>> document creation. We still need a bit of development and discussion
>>> about the application with Bryan, before we can use it. Therefore the
>>> CIM document might change (CIM schema v1.5 should be stable, but
>>> semantics might change).
>>> - The field for the DRS_ID might change. "measureIdentification/code"
>>> and "pass" will remain. We can give you distinct
>>> "measureIdentification/code" values for QC L2 and QC L3 when we are
>>> ready.
>>> - Address of AtomFeed might change.
>>> - Schema location for validation will be added, document_id etc. with
>>> the use of Bryan's QC tool.
>>>
>>> If you need more information or have suggestions for changes, please
>>> let us know.
>>>
>>> Best wishes,
>>> Martina and Hans
>>>
>>> _______________________________________________
>>> GO-ESSP-TECH mailing list
>>> GO-ESSP-TECH at ucar.edu
>>> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech
>> ***********************************
>> Sylvia Murphy
>> NESII/CIRES/NOAA Earth System Research Laboratory
>> 325 Broadway, Boulder CO 80305
>> Email: sylvia.murphy at noaa.gov
>> Phone: 303-497-7753
>> Fax: 303-497-7649
>>
>
>
> -- 
> ------------------ DKRZ / Data Management ------------------
> Martina Stockhause	
> Deutsches Klimarechenzentrum	phone:	+49-40-460094-122
> Bundesstr. 45a			FAX:	+49-40-460094-106
> D-20146 Hamburg, Germany	e-mail:	stockhause at dkrz.de
> ------------------------------------------------------------