[Go-essp-tech] ESG CMIP5 notification and inquiry requirements

Martina Stockhause martina.stockhause at zmaw.de
Wed Nov 10 07:39:48 MST 2010


Hi Bryan, hi Karl,

a few remarks on the withdrawal of data, separated into data without an 
assigned DOI and data with an assigned DOI.

no assigned DOI
- Where is the reason for a data withdrawal documented? In the questionnaire?
- And by whom is it done? By the modelling centers?
- If we decide that a QC Level 2 or 3 cannot be assigned, should we, the 
ESGF staff, document the reason in the metadata? The BADC qctool would be 
able to handle that.

assigned DOI
- As the DOI publication agency, we at WDCC would like to know if errors 
are found in the data. How can we access that information?
- 4) We have agreed at WDCC to document if we get new QC L2 checked data 
for a DRS experiment with an assigned DOI. We will add references of the 
type isNewVersionOf to the new DOI pointing to the old DOI and 
isOldVersionOf to the old DOI pointing to the new DOI (according to 
STD-DOI schema 
http://dc110dmz.gfz-potsdam.de/contenido/std-doi/upload/pdf/STD_metadata_kernel_v3.pdf: 
page 8 - relationTypes). We will be able to ingest them into CIM 
metadata via the QC L3 interface to CIM (update of references and 
contacts), which is currently under development at BADC. However, we do 
not know the reason for the delivery of new data if it is not sent to 
us or made accessible to us.

If we speak about errors in datasets belonging to a DOI, we should not 
speak of withdrawal, because we withdraw neither the DOI nor its data nor 
its metadata. Instead, we set the QC flag to "approved by author, but 
suspended" and add the reference isOldVersionOf to it.

Reporting of errors to the modelling groups after DOI assignment is 
possible via the DOI target page. This page includes:
- DOI citation
- contact email address of the responsible modeling center
- link to metadata
- link to data

Best wishes,
Martina


On 11/09/2010 07:59 PM, Bryan Lawrence wrote:
> Hi Karl
>
> A quick response to 1). This is exactly why I want a tracking id
> service. (Which I'm not yet sure we've built. Hans, what's the status of
> that?)
>
> If you have a file, with a tracking id, you want to be able to cut and
> paste it into a "find my data" page, and go straight to the parent
> metadata, which should show it has been withdrawn, and what the current
> version is.
>
> Cheers
> Bryan
>
>> Dear all,
>>
>> I am concerned that we will be unable to help users learn when CMIP5
>> data they have downloaded has been withdrawn (presumably because it
>> is flawed).  Here are some common "use cases" that ESG should be
>> able to handle (but I don't think it currently does).
>>
>> 1.  A user downloads some files on December 12, 2010.  Three months
>> later he wants to know
>>      a) if any files he downloaded were withdrawn (i.e., found to be
>> flawed).
>>      b) if similar data from other models (or replacement data from
>> models he has already downloaded) has become available.
>>      c) the reasons for data being withdrawn or replaced.  (For
>> example, was the data in the file flawed?  Was the data mislabeled?
>>   Were some of the attributes incorrect?  If so, which ones?)
>>
>> 2.  A user wishes to be informed by email when files he has
>> downloaded have been withdrawn, and he wants to know the reasons for
>> their withdrawal.
>>
>> 3.  A user wishes to be informed by email when new files in his area
>> of interest become available.  (The user would define "area of
>> interest" in terms of a set of DRS identifiers, e.g., experiment,
>> variable name, MIP table).
>>
>> 4.  A reader of a journal article wants to know whether any of the
>> data used in a study has been withdrawn and the reasons for its
>> withdrawal. The DOIs for the dataset(s) are included in the
>> article, and the user knows what variables are used from that
>> dataset.  How does he learn whether the data were subsequently
>> withdrawn, and the reasons?
>>
>> It is my understanding that the assignment of versions in the present
>> system is based on "dataset", whereas most users will only be
>> interested in a tiny portion of the dataset (e.g., a single
>> variable, rather than the perhaps 100 variables that might be
>> included in the dataset).   It would be very little help if the user
>> could learn only about changes at the dataset level (which might
>> occur because a single variable was added, withdrawn, or replaced).
>> Also the *reason* for any changes to a dataset should always be made
>> clear.  So, the challenges would seem to be:
>>
>> 1.  Making sure data providers recorded information about why changes
>> were made to their datasets.
>>
>> 2.  Being able to report changes applied only to the subset of files
>> in a dataset that are of interest to any particular user.  This is
>> especially important, since if a user only is interested in 1 out of
>> 100 variables, he doesn't want to be bothered with messages about
>> changes to the dataset that didn't affect the variable he is
>> interested in.
>>
>> Another thing we should plan on doing is making it easy for users to
>> report suspected errors in the data they are analyzing directly to
>> the responsible modeling group(s).  How are we going to handle all
>> the emails from users who think they've discovered problems?
>>
>> If we don't have some way of doing the above by the first month or
>> two of 2011, I think we're going to be in for lots of complaints.  I
>> therefore hope we can make this a very high priority.  Are there any
>> higher priorities? (I'm sure there are, just wondering what they
>> are.)
>>
>> Best regards,
>> Karl
>>
>> p.s. feel free to post or forward to whomever you think might be able
>> to help.
> --
> Bryan Lawrence
> Director of Environmental Archival and Associated Research
> (NCAS/British Atmospheric Data Centre and NCEO/NERC NEODC)
> STFC, Rutherford Appleton Laboratory
> Phone +44 1235 445012; Fax ... 5848;
> Web: home.badc.rl.ac.uk/lawrence
> _______________________________________________
> GO-ESSP-TECH mailing list
> GO-ESSP-TECH at ucar.edu
> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech
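
Karl's second challenge in the quoted message (reporting only those 
changes that affect a user's "area of interest") might be sketched as a 
simple filter over per-file change records. This is an assumption-laden 
illustration, not an existing ESG service: the record layout, the facet 
keys, the function names and the sample values are all hypothetical.

```python
def matches(interest: dict, facets: dict) -> bool:
    """True if every DRS facet named in the interest has the same value."""
    return all(facets.get(key) == value for key, value in interest.items())


def changes_for_user(interest: dict, changes: list) -> list:
    """Keep only the change records that fall inside the area of interest."""
    return [change for change in changes if matches(interest, change["facets"])]


# Made-up change records, one per affected file, each carrying a reason.
changes = [
    {"file": "a.nc", "action": "withdrawn", "reason": "mislabeled data",
     "facets": {"experiment": "exp1", "variable": "tas", "mip_table": "Amon"}},
    {"file": "b.nc", "action": "replaced", "reason": "attribute corrected",
     "facets": {"experiment": "exp1", "variable": "pr", "mip_table": "Amon"}},
]

# A user interested in a single variable is only told about that variable.
interest = {"experiment": "exp1", "variable": "tas"}
for change in changes_for_user(interest, changes):
    print(change["file"], change["action"], change["reason"])
```

Such a filter only works if changes are recorded per file together with 
their reason, which corresponds to Karl's first challenge.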

-- 
----------- DKRZ / Data Management -----------

Martina Stockhause
Deutsches Klimarechenzentrum
Bundesstr. 45a
D-20146 Hamburg
Germany

phone:	+49-40-460094-122
FAX:	+49-40-460094-106
e-mail:	martina.stockhause at zmaw.de

----------------------------------------------


