[Go-essp-tech] ESG CMIP5 notification and inquiry requirements
Bryan Lawrence
bryan.lawrence at stfc.ac.uk
Tue Nov 9 11:59:48 MST 2010
Hi Karl
A quick response to 1). This is exactly why I want a tracking id
service. (Which I'm not yet sure we've built. Hans, what's the status of
that?)
If you have a file with a tracking id, you want to be able to cut and
paste that id into a "find my data" page and go straight to the parent
metadata, which should show whether it has been withdrawn and what the
current version is.
Cheers
Bryan
> Dear all,
>
> I am concerned that we will be unable to help users learn when CMIP5
> data they have downloaded has been withdrawn (presumably because it
> is flawed). Here are some common "use cases" that ESG should be
> able to handle (but I don't think it currently does).
>
> 1. A user downloads some files on December 12, 2010. Three months
> later he wants to know
> a) if any files he downloaded were withdrawn (i.e., found to be
> flawed).
> b) if similar data from other models (or replacement data from
> models he has already downloaded) has become available.
> c) the reasons for data being withdrawn or replaced. (For
> example, was the data in the file flawed? Was the data mislabeled?
> Were some of the attributes incorrect? If so, which ones?)
>
> 2. A user wishes to be informed by email when files he has
> downloaded have been withdrawn, and he wants to know the reasons for
> their withdrawal.
>
> 3. A user wishes to be informed by email when new files in his area
> of interest become available. (The user would define "area of
> interest" in terms of a set of DRS identifiers, e.g., experiment,
> variable name, MIP table).
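[An inline note on 3: the "area of interest" matching could be as simple as comparing DRS components. A sketch, assuming dot-separated DRS dataset identifiers; the component ordering below is illustrative, not the definitive DRS specification:

```python
# Sketch: match a new dataset against a user's "area of interest",
# expressed as constraints on DRS components. The component ordering
# here is an illustrative assumption about the DRS layout.
DRS_COMPONENTS = ["activity", "product", "institute", "model",
                  "experiment", "frequency", "realm", "table", "variable"]

def parse_drs(drs_id):
    """Split a dot-separated DRS identifier into named components."""
    return dict(zip(DRS_COMPONENTS, drs_id.split(".")))

def matches_interest(drs_id, interest):
    """True if every constrained component matches the dataset id."""
    parts = parse_drs(drs_id)
    return all(parts.get(k) == v for k, v in interest.items())

# A user interested in 'tas' from the 'historical' experiment:
interest = {"experiment": "historical", "variable": "tas"}
drs_id = "cmip5.output1.MOHC.HadGEM2-ES.historical.mon.atmos.Amon.tas"
print(matches_interest(drs_id, interest))  # True
```

Newly published dataset ids would be run through each stored subscription, and a match triggers the notification email. -- Bryan]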
>
> 4. A reader of a journal article wants to know whether any of the
> data used in a study has been withdrawn and the reasons for its
> withdrawal. The DOIs for the dataset(s) are included in the
> article, and the user knows what variables are used from that
> dataset. How does he learn whether the data were subsequently
> withdrawn, and the reasons?
>
> It is my understanding that the assignment of versions in the present
> system is based on "dataset", whereas most users will only be
> interested in a tiny portion of the dataset (e.g., a single
> variable, rather than the perhaps 100 variables that might be
> included in the dataset). It would be very little help if the user
> could learn only about changes at the dataset level (which might
> occur because a single variable was added, withdrawn, or replaced).
> Also, the *reason* for any changes to a dataset should always be made
> clear. So, the challenges would seem to be:
>
> 1. Making sure data providers recorded information about why changes
> were made to their datasets.
>
> 2. Being able to report changes applied only to the subset of files
> in a dataset that are of interest to any particular user. This is
> especially important: if a user is interested in only 1 out of
> 100 variables, he doesn't want to be bothered with messages about
> changes to the dataset that didn't affect the variable he is
> interested in.
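[Inline note on 2: filtering the change log per user is straightforward once the changes are recorded per variable. A sketch, with hypothetical change-record and subscription structures:

```python
# Sketch: notify a user only about dataset changes that touch
# variables he actually downloaded. The change-record structure
# is hypothetical.

def relevant_changes(changes, downloaded_variables):
    """Filter a dataset's change log down to the user's variables.

    changes: list of dicts with 'variable', 'action', 'reason'
    downloaded_variables: set of variable names the user holds
    """
    return [c for c in changes if c["variable"] in downloaded_variables]

changes = [
    {"variable": "tas", "action": "withdrawn", "reason": "flawed values"},
    {"variable": "pr", "action": "replaced", "reason": "mislabeled attribute"},
]

# A user who downloaded only 'pr' sees one message, not two.
for c in relevant_changes(changes, {"pr"}):
    print(c["action"], c["variable"], "-", c["reason"])
```

Note this only works if challenge 1 is met: the per-variable reason has to be captured at publication time, or there is nothing to filter. -- Bryan]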
>
> Another thing we should plan on doing is making it easy for users to
> report suspected errors in the data they are analyzing directly to
> the responsible modeling group(s). How are we going to handle all
> the emails from users who think they've discovered problems?
>
> If we don't have some way of doing the above by the first month or
> two of 2011, I think we're going to be in for lots of complaints. I
> therefore hope we can make this a very high priority. Are there any
> higher priorities? (I'm sure there are, just wondering what they
> are.)
>
> Best regards,
> Karl
>
> p.s. feel free to post or forward to whomever you think might be able
> to help.
--
Bryan Lawrence
Director of Environmental Archival and Associated Research
(NCAS/British Atmospheric Data Centre and NCEO/NERC NEODC)
STFC, Rutherford Appleton Laboratory
Phone +44 1235 445012; Fax ... 5848;
Web: home.badc.rl.ac.uk/lawrence