[Go-essp-tech] AtomFeed for simulationRun documents and CIM qctool

Bryan Lawrence bryan.lawrence at stfc.ac.uk
Fri Oct 22 23:27:16 MDT 2010


Hi Martina

Thanks for this (and for sending Tobias to our sprint meeting primed 
with questions). I think this email pretty much covers what Tobias, Hans 
and I talked about too ...

> then the CIM simulation level is the same as my quality level -
> including multiple ensemble members and multiple realms. You write of
> a URL or URI - a URL to where? A CIM document to which the quality
> information belongs?

The concept is that a quality record describes the quality of some 
resource located at a specific URL. The type of the resource at that URL 
comes from the type controlled vocabulary. If the thing of interest is 
"buried" at the remote URL, then you can add an identifier/path/query to 
pin down the resource of interest.

So, the URL does not locate the quality record itself; it locates the 
resource *about which* you have quality assertions to make.

In our use case, it might be a URL to a specific page in an ESG catalog, 
and the type will tell us whether it is the page itself or the thing it 
describes.
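
As a rough illustration only (the field names below are my own shorthand, 
not the actual CIM schema), the target of a quality assertion conceptually 
bundles the URL, the type from the controlled vocabulary, and an optional 
identifier:

    # Purely illustrative shorthand - not the CIM schema.
    quality_target = {
        "url": "http://esg.example/catalog/some_dataset_page",  # hypothetical catalogue page
        "type": "dataset",           # value drawn from the type controlled vocabulary (assumed)
        "identifier": "inner/path",  # optional path/query picking out a "buried" resource
    }
    # The quality reports then hang off this target, i.e. off the resource
    # being described, not off the location of the quality document itself.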

> Well, I meant that I need a GUI-less qctool for quality information
> ingest, with an example call.

You do :-)

> Let me try to describe a workflow for the ingest of quality
> information in CIM:
> 
> 1. We two register the measurement descriptions for QC L2 and QC L3
> checks.

Yes.

> 2. The people who are responsible for QC L2 checks register with
> their names and email addresses.
> 
> Both could be done within the qc questionnaire.

Yes.

> 3. Add quality check result for QC L2 and assign QC level 2 using a
> python tool instead of the questionnaire:
> 
> qctool.py --level=2 --simulation=<drs experiment> --contact=<email
> address or name> --report=<report file location> --uploadlog=<logfile
> location> [--uploadpdf=<pdf location>] [--uploadbinary=<result
> location>]
> 
> example:
> qctool.py --level=2
> --simulation=cmip5.output.MPI-M.ECHAM6-MPIOM-TR.amip
> --contact=martina.stockhause at zmaw.de
> --report=QCL2_cimresult_cmip5_output_MPI-M_ECHAM6-MPIOM-TR_amip.xml
> --uploadlog=QCL2_cimlogfile_cmip5_output_MPI-M_ECHAM6-MPIOM-TR_amip.log
> --uploadpdf=QCL2_cimpdf_cmip5_output_MPI-M_ECHAM6-MPIOM-TR_amip.pdf
> --uploadbinary=QCL2_cimresults_cmip5_output_MPI-M_ECHAM6-MPIOM-TR_amip.tar
> 
> It could be different or shorter. 

Yes. Something like that. We will sort the exact API out later, but the 
concept is right. I've just discussed the timing of when this piece of 
code would be available: probably the end of November (we might try to 
make it earlier, but don't count on it). I don't think we're going to be 
qc'ing a lot of data between now and then, so using the manual interface 
via the GUI is probably practical until then.
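
Purely as an illustration of the proposed interface (the flag names follow 
your proposal above, but the exact API is still to be agreed), the option 
handling in Python might look something like this:

    import argparse

    def parse_args(argv=None):
        # All option names below are provisional, taken from the proposal above.
        parser = argparse.ArgumentParser(
            prog="qctool.py",
            description="Upload QC results to the CIM qctool (sketch only)")
        parser.add_argument("--level", type=int, choices=[2, 3], required=True,
                            help="QC level being asserted")
        parser.add_argument("--simulation", required=True,
                            help="DRS experiment name")
        parser.add_argument("--contact", required=True,
                            help="email address or name of the QC contact")
        parser.add_argument("--report", required=True,
                            help="location of the XML report file")
        parser.add_argument("--uploadlog", required=True,
                            help="location of the QC logfile to upload")
        parser.add_argument("--uploadpdf", help="optional PDF to upload")
        parser.add_argument("--uploadbinary",
                            help="optional binary result (e.g. tar) to upload")
        return parser.parse_args(argv)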

> I would use the information as
> follows:
> 
> - level: Link the measurement description for QC L2 to the uploaded
> result section
> - simulation: Link QC L2 quality information to a CIM simulation
> - contact: Link the qc contact to the uploaded result section
> - uploadlog: Give the location of the qc logfile for upload to CIM.
> uploadpdf, uploadbinary alike.
> - report: XML to be specified.

We can finalise those details in the next round of discussion (I can't 
spare the time until at least the end of the first week of November). 
Sorry about that. I'll try and get a detailed sequence/flow diagram to 
you before we start writing the code though - and of course, we'll 
discuss the API in more detail before then too. (We think the code 
writing will only take a day or so, if that ... but I'm even more 
overloaded now for a couple of weeks.)

> Alternatively, I could send a full quality document together with the
> DRS experiment name it belongs to. Or any solution between these
> extremes.

We think the Python tool will process the arguments and post a JSON 
document (or some such) up to the qctool; XML will then be available 
from the qc tool.
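
As a minimal sketch of that posting step (the endpoint URL, payload field 
names and content handling below are all assumptions, not the agreed 
interface), the tool might do something like:

    import json
    import urllib.request

    # Hypothetical endpoint - the real qctool URL and payload format are
    # still to be agreed.
    QCTOOL_ENDPOINT = "http://qctool.example/api/report"

    def post_qc_report(args):
        # args is the namespace returned by the argument parsing sketched
        # above; the payload field names are illustrative only.
        payload = {
            "level": args.level,
            "simulation": args.simulation,
            "contact": args.contact,
            "report": open(args.report).read(),    # XML report content
            "log": open(args.uploadlog).read(),    # QC logfile content
        }
        request = urllib.request.Request(
            QCTOOL_ENDPOINT,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(request) as response:
            return response.read()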

> You can remove the second measure describing QC L2 that I added. It was
> just for testing.
> I still encounter an error when I add a report and fill in the form's
> field for explanation in the qc questionnaire.

We have deployed version 0.21, which I hope has fixed the bug you hit. 
Let us know if it's still there, and if so, raise a ticket on the 
metafor trac with details (screenshots etc.), and make sure the ticket 
has qctool in the keywords.

> For QC L3 there is added quality information, added citations for the
> URN and DOI, and there might be changes in contacts and citations.

We think that while authorship etc. is changing, the right way to handle 
that is by fixing the metadata in the questionnaire. When the DOI is 
finally assigned, it will be assigned to a *version* of the material 
exported from the questionnaire - until that time, you can keep fixing 
things, and the version number will keep updating (indeed, you'll be 
able to keep changing material in the questionnaire after that, but the 
DOI will point to a specific version which will have been exported and 
loaded into the metafor portal).

Cheers
Bryan

> 
> Best wishes
> Martina
> 
> On 10/18/2010 04:52 PM, Bryan Lawrence wrote:
> >> Hi Martina
> >> 
> >>> which is the AtomFeed address for access of simulationRun
> >>> documents of CIM? This is needed for QC L3. It would be
> >>> necessary to have one or two examples in the AtomFeed for the
> >>> tool development.
> >> 
> >> Gerry is the right person to answer this one now!
> >> 
> >>> And at last I tested your qc questionnaire. Moreover, I seem to
> >>> understand most of what it does.
> >> 
> >> The granularity of quality entries is not clear to me: I have
> >> summed results for a DRS experiment (metafor simulationRun),
> >> which I can send to you or duplicate if a finer granularity is
> >> needed, e.g. realm.
> > 
> > The tool is agnostic (I hope). Currently it allows you to provide
> > both a URL and URI, so you can make qc assertions about any target
> > URL and any identifier within it.
> > 
> > We should probably decide on best practice. My suspicion is that
> > it would be easier to raise them on URLs at the realm dataset
> > level for data, and at the simulation level for metadata. How we
> > ensure that both are closed to pass up a qc level would then be an
> > issue. We can talk about this to decide what is best.
> > 
> >> By the way, how does a metafor simulationRun correspond to the
> >> new DRS syntax in the TDS? In the TDS we have
> >> realm+ensemble+version as a dataset. Is it realm+version with all
> >> ensembles in a simulationRun entry?
> > 
> > A metafor simulation can include both multiple ensemble members and
> > multiple realms. So, a simulation metadata record will describe
> > multiple datasets.  We probably need to sort that out in the CIM
> > data record. Ideally the output data record associated with a
> > simulation would then include a nested set of records
> > corresponding to the datasets as the publisher describes them.
> > 
> >> Remarks to the qc questionnaire and the CIM qctool:
> >> 
> >> - We need the offline version of the CIM qctool.
> > 
> > By which I presume you mean, you need the ability to upload XML to
> > the qctool? (We could add a simple python tool to post this stuff
> > as well, would that help ... see also below where I have a **)
> > 
> >> - 'issue's: My idea was to send the complete quality metadata
> >> after the QC checks for assignment of QC L2. This would include
> >> only 'report's.
> > 
> > I think that's appropriate. Issues are there for other ... issues
> > ...
> > 
> >> - For new authors the email should be set required for contact if
> >> questions arise during QC L3 regarding QC L2 results.
> > 
> > Not sure what you mean here. Can you pls explain further?
> > 
> >> - The 'report/measureDescription' part describing the QC checks
> >> itself (not their results) has two values: one for QC L2 and one
> >> for QCL3. Therefore these two need to be entered only once and
> >> then referenced when adding 'report's.
> > 
> > There are two measures: QC level 2 and QC level 3. I would expect
> > one report as to each. Is that not what you expect? (Clearly a qc
> > level 3 has already passed qc level 2.). I'm not sure I understand
> > what you are suggesting/asking.
> > 
> >> - The 'report/explanation' part is the QC result. Unfortunately, I
> >> get an error before I can view the metadata (see the end of the
> >> message). But it would be good to add all additional
> >> information at once (logfile and pdf). The logfile should be
> >> mandatory in order to have at least this piece of information
> >> about the qc results available.
> > 
> > You can add one logfile/plot at a time currently. Are you
> > suggesting you would like the facility (through the interactive
> > tool) to add multiple ones? (I suspect this would be better
> > supported via a script, we could add that to the tool discussed
> > above ** if that's what you wanted).
> > 
> >> For the CIM qctool the effort would be minimal if I can simply add
> >> a report/explanation with the option 'QCL2' or 'QCL3' and the
> >> email of the user as reference to the contact. I expect that
> >> another option has to be the DRS name of the experiment.
> > 
> > I think the effort should be that minimal. The measures will all be
> > predefined, so you only need to enter a resource description (once)
> > and then reports against the (pre-defined) qc measures (QCL2 and
> > QCL3). (I see you added a measure that was a copy of the one that
> > was there!) I am not sure why one would need to put in either a user
> > or a drs name, since you will have a URL pointing to these
> > things - and a portal should harvest the feed and bind them to the
> > target ....
> > 
> >> What are your ideas? Could we specify the tool options in advance
> >> and soon, please? With a concrete example?
> >> The report/explanation will be an xml?
> > 
> > It is now (xml!) ... as you can see ... concrete examples? Well
> > this is a live tool, we need some data to put some live reports
> > against.
> > 
> >> For QCL3 this is not sufficient since citations (at least DOI and
> >> URN of the data) and contacts for the DRS experiment
> >> (simulationRun) will or may be changed by the data author.
> > 
> > Sure, but the qc report shouldn't change?
> > 
> >> Good to get a step further!
> >> 
> > :-)
> > 
> > Cheers
> > Bryan
> > 
> >> Best wishes,
> >> Martina
> >> 
> >> On 10/06/2010 04:07 PM, Martina Stockhause wrote:
> >>> Hello Bryan,
> >>> 
> >>> at the moment we use the AtomFeed for experiments at
> >>> http://q.cmip5.ceda.ac.uk/feeds/cmip5/experiment/
> >>> 
> >>> But we would need the AtomFeed for simulations as well for qc
> >>> level 3 cross-checks. Is the below address the right one or is
> >>> there another which Sylvia and the ESG portal people use?
> >>> http://q.cmip5.ceda.ac.uk/feeds/cmip5/simulation/
> >>> 
> >>> Thanks a lot and best wishes,
> >>> Martina
> >>> 
> >>    KeyError at /report/
> >> 
> >> 'explanation'
> >> 
> >> Request Method: 	POST
> >> Request URL: 	http://qc.cmip5.ceda.ac.uk/report/
> >> Django Version: 	1.2.3
> >> Exception Type: 	KeyError
> >> Exception Value:
> >> 
> >> 'explanation'
> >> 
> >> Exception Location:
> >> /usr/local/cmip5qc/develop/QCTool/qcproj/qcapp/forms.py in clean,
> >> line 71
> > 
> > Bryan Lawrence
> > Director of Environmental Archival and Associated Research
> > (NCAS/British Atmospheric Data Centre and NCEO/NERC NEODC)
> > STFC, Rutherford Appleton Laboratory
> > Phone +44 1235 445012; Fax ... 5848;
> > Web: home.badc.rl.ac.uk/lawrence
Bryan Lawrence
Director of Environmental Archival and Associated Research
(NCAS/British Atmospheric Data Centre and NCEO/NERC NEODC)
STFC, Rutherford Appleton Laboratory
Phone +44 1235 445012; Fax ... 5848; 
Web: home.badc.rl.ac.uk/lawrence

