[Go-essp-tech] ESG CMIP5 notification and inquiry requirements
Karl Taylor
taylor13 at llnl.gov
Tue Nov 9 11:31:42 MST 2010
Dear all,
I am concerned that we will be unable to help users learn when CMIP5
data they have downloaded has been withdrawn (presumably because it is
flawed). Here are some common "use cases" that ESG should be able to
handle (but I don't think it currently does).
1. A user downloads some files on December 12, 2010. Three months
later he wants to know
a) if any files he downloaded were withdrawn (i.e., found to be
flawed).
b) if similar data from other models (or replacement data from
models he has already downloaded) has become available.
c) the reasons for data being withdrawn or replaced. (For example,
was the data in the file flawed? Was the data mislabeled? Were some
of the attributes incorrect? If so, which ones?)
2. A user wishes to be informed by email when files he has downloaded
have been withdrawn, and he wants to know the reasons for their withdrawal.
3. A user wishes to be informed by email when new files in his area of
interest become available. (The user would define "area of interest" in
terms of a set of DRS identifiers, e.g., experiment, variable name, MIP
table).
4. A reader of a journal article wants to know whether any of the data
used in a study has been withdrawn and the reasons for its withdrawal.
The DOIs for the dataset(s) are included in the article, and the user
knows what variables are used from that dataset. How does he learn
whether the data were subsequently withdrawn, and the reasons?
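The matching that use cases 1–3 require could be sketched in a few lines. This is only an illustration under assumed structures (the class names, the DRS fields shown, and the notice format are all hypothetical, not part of any actual ESG service): a user's downloaded files and the withdrawal notices are both keyed by DRS identifiers, so relevant notices can be filtered out per user.

```python
from dataclasses import dataclass

# Hypothetical sketch: DRS-style identifiers and withdrawal notices.
# The fields (experiment, variable, mip_table) follow the DRS components
# mentioned above; none of this reflects an actual ESG implementation.

@dataclass(frozen=True)
class DrsId:
    experiment: str
    variable: str
    mip_table: str

@dataclass
class WithdrawalNotice:
    drs_id: DrsId
    reason: str  # e.g. "flawed data", "mislabeled", "incorrect attribute"

def notices_for_user(downloaded: set, notices: list) -> list:
    """Return only the withdrawal notices affecting files the user holds."""
    return [n for n in notices if n.drs_id in downloaded]

# Usage: a user who downloaded one variable is not bothered by
# withdrawals of other variables from the same dataset.
held = {DrsId("historical", "tas", "Amon")}
notices = [
    WithdrawalNotice(DrsId("historical", "tas", "Amon"), "flawed data"),
    WithdrawalNotice(DrsId("historical", "pr", "Amon"), "mislabeled"),
]
relevant = notices_for_user(held, notices)
```

The same lookup works whether `downloaded` records actual downloads (use cases 1 and 2) or a declared "area of interest" (use case 3).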
It is my understanding that the assignment of versions in the present
system is based on "dataset", whereas most users will only be interested
in a tiny portion of the dataset (e.g., a single variable, rather than
the perhaps 100 variables that might be included in the dataset). It
would be of little help if the user could learn only about changes at
the dataset level (which might occur because a single variable was
added, withdrawn, or replaced). Also, the *reason* for any change to a
dataset should always be made clear. So, the challenges would seem to be:
1. Making sure data providers recorded information about why changes
were made to their datasets.
2. Being able to report changes applied only to the subset of files in
a dataset that are of interest to any particular user. This is
especially important: if a user is interested in only 1 of 100
variables, he doesn't want to be bothered with messages about changes
to the dataset that didn't affect that variable.
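Both challenges above could be met by a change log kept per variable rather than per dataset, with the reason recorded as a required field at submission time. A minimal sketch, assuming a simple (version, variable, action, reason) record — the real ESG versioning scheme is dataset-level and may differ:

```python
from dataclasses import dataclass

# Hypothetical per-variable change record; "reason" is deliberately a
# required field, so data providers must state why a change was made
# (challenge 1). Filtering by variable addresses challenge 2.

@dataclass
class ChangeRecord:
    dataset_version: str  # e.g. "v20110115" (assumed version style)
    variable: str
    action: str           # "added" | "withdrawn" | "replaced"
    reason: str

def changes_affecting(log: list, variable: str) -> list:
    """Report only the changes that touched the given variable, so a user
    interested in 1 of 100 variables sees only relevant messages."""
    return [rec for rec in log if rec.variable == variable]

log = [
    ChangeRecord("v20110115", "tas", "replaced", "incorrect units attribute"),
    ChangeRecord("v20110115", "pr", "withdrawn", "flawed data"),
]
tas_changes = changes_affecting(log, "tas")
```

A user subscribed to "tas" would receive only the first record, even though the dataset version changed because of both variables.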
Another thing we should plan on doing is making it easy for users to
report suspected errors in the data they are analyzing directly to the
responsible modeling group(s). How are we going to handle all the
emails from users who think they've discovered problems?
If we don't have some way of doing the above by the first month or two
of 2011, I think we're going to be in for lots of complaints. I
therefore hope we can make this a very high priority. Are there any
higher priorities? (I'm sure there are, just wondering what they are.)
Best regards,
Karl
p.s. feel free to post or forward to whomever you think might be able to
help.