[Go-essp-tech] [metafor] ESG CMIP5 notification and inquiry requirements

Bryan Lawrence bryan.lawrence at stfc.ac.uk
Wed Nov 10 12:45:01 MST 2010


Hi Mark

Hmmm. I think Mark E's use case is somewhat different from the one I 
have been advocating. When I'm back in the UK I'll try to write 
something more explicit about the use case Karl and I have been 
discussing. (There are some written docs around, but they don't seem to 
have migrated to your collection.)

The key underlying service will depend on Hans's Thredds-to-CIM 
importer being able to map *all* the tracking ids in data files to the 
relevant upper-level ids (the DRS id and the Metafor document ids), 
which means it depends on all the tracking ids being discoverable via 
the Thredds interface.
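
To make that concrete, here is a minimal sketch (in Python) of the kind 
of crawl involved. The catalog URL is a placeholder, and I'm assuming 
the tracking id is exposed as a "property" element on each dataset in 
the catalog XML -- which is exactly the point I'd like confirmed:

  # Sketch only: walk one Thredds catalog page and collect tracking ids.
  # Assumes tracking_id appears as a <property> on each <dataset>.
  import urllib.request
  import xml.etree.ElementTree as ET

  NS = {"t": "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"}

  def tracking_ids(catalog_url):
      """Yield (dataset urlPath, tracking_id) pairs from one catalog."""
      tree = ET.parse(urllib.request.urlopen(catalog_url))
      for ds in tree.iter("{%s}dataset" % NS["t"]):
          for prop in ds.findall("t:property", NS):
              if prop.get("name") == "tracking_id":
                  yield ds.get("urlPath"), prop.get("value")

  # Placeholder URL; point this at a real ESG Thredds catalog to test.
  for path, tid in tracking_ids("http://example.org/thredds/catalog.xml"):
      print(tid, "->", path)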

I think someone confirmed that this was possible once upon a time, but I 
don't have access to CMIP5 data (with tracking ids) via an ESG Thredds 
interface to check. Can anyone confirm that it is possible ... Bob?

Thanks
Bryan

> Hi
> 
> The "Tracking ID" service, or rather "ID Resolution" service, is at
> the initial development phase.  The use case is summed up by Mark
> Elkington in the two rtf attachments.  The design is detailed up in
> the Metafor deliverable D5.5 attached.
> 
> How groups update the "parent metadata" is unclear ... via the CMIP5
> questionnaire?
> 
> Regards
> 
> Mark
> 
> > Hi Karl
> > 
> > A quick response to 1). This is exactly why I want a tracking id
> > service. (Which I'm not yet sure we've built. Hans, what's the
> > status of that?)
> > 
> > If you have a file with a tracking id, you want to be able to cut
> > and paste that id into a "find my data" page and go straight to the
> > parent metadata, which should show whether it has been withdrawn,
> > and what the current version is.
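> >
> > Purely to make the idea concrete, a toy resolver in Python.
> > Everything here is invented -- the real mapping would be built by
> > the Thredds-to-CIM importer and would sit behind a web page:
> >
> >   # Toy sketch: resolve a file's tracking_id to its parent records.
> >   # All ids and field names below are illustrative, not real.
> >   RESOLVER = {
> >       # tracking_id: (drs_id, cim_doc_id, withdrawn, current_version)
> >       "example-tracking-id": ("cmip5.example.drs.id", "cim-doc-123",
> >                               True, "v2"),
> >   }
> >
> >   def resolve(tracking_id):
> >       entry = RESOLVER.get(tracking_id)
> >       if entry is None:
> >           return "unknown tracking id"
> >       drs_id, cim_id, withdrawn, current = entry
> >       status = "WITHDRAWN" if withdrawn else "current"
> >       return "%s (%s): %s; latest version %s" % (
> >           drs_id, cim_id, status, current)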
> > 
> > Cheers
> > Bryan
> > 
> >> Dear all,
> >> 
> >> I am concerned that we will be unable to help users learn when
> >> CMIP5 data they have downloaded has been withdrawn (presumably
> >> because it is flawed).  Here are some common "use cases" that ESG
> >> should be able to handle (but which I don't think it currently can).
> >> 
> >> 1.  A user downloads some files on December 12, 2010.  Three
> >> months later he wants to know:
> >> 
> >>     a) if any files he downloaded were withdrawn (i.e., found to be
> >>     flawed).
> >> 
> >>     b) if similar data from other models (or replacement data from
> >>     models he has already downloaded) has become available.
> >> 
> >>     c) the reasons for data being withdrawn or replaced.  (For
> >>     example, was the data in the file flawed?  Was the data
> >>     mislabeled?  Were some of the attributes incorrect?  If so,
> >>     which ones?)
> >> 
> >> 2.  A user wishes to be informed by email when files he has
> >> downloaded have been withdrawn, and he wants to know the reasons
> >> for their withdrawal.
> >> 
> >> 3.  A user wishes to be informed by email when new files in his
> >> area of interest become available.  (The user would define "area
> >> of interest" in terms of a set of DRS identifiers, e.g.,
> >> experiment, variable name, MIP table; see the sketch after this
> >> list.)
> >> 
> >> 4.  A reader of a journal article wants to know whether any of the
> >> data used in a study has been withdrawn and the reasons for its
> >> withdrawal.  The DOIs for the dataset(s) are included in the
> >> article, and the reader knows which variables were used from that
> >> dataset.  How does he learn whether the data were subsequently
> >> withdrawn, and why?
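> >>
> >> To make use case 3 concrete, here is a sketch of the sort of
> >> matching a subscription service would do.  The component names and
> >> filter syntax are invented, and the real DRS has more components
> >> than shown:
> >>
> >>   # Sketch: match a dotted DRS dataset id against a user's "area
> >>   # of interest".  Component names/order are illustrative only.
> >>   COMPONENTS = ("activity", "product", "institute", "model",
> >>                 "experiment", "frequency", "realm", "variable")
> >>
> >>   def matches(drs_id, interest):
> >>       values = dict(zip(COMPONENTS, drs_id.split(".")))
> >>       return all(values.get(k) == v for k, v in interest.items())
> >>
> >>   interest = {"experiment": "rcp45", "variable": "tas"}
> >>   print(matches("cmip5.output1.MOHC.HadGEM2-ES.rcp45.mon.atmos.tas",
> >>                 interest))   # -> True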
> >> 
> >> It is my understanding that the assignment of versions in the
> >> present system is based on the "dataset", whereas most users will
> >> be interested in only a tiny portion of the dataset (e.g., a
> >> single variable, rather than the perhaps 100 variables that might
> >> be included in the dataset).  It would be of very little help if
> >> the user could learn only about changes at the dataset level
> >> (which might occur because a single variable was added, withdrawn,
> >> or replaced).  Also, the *reason* for any change to a dataset
> >> should always be made clear.  So, the challenges would seem to
> >> be:
> >> 
> >> 1.  Making sure data providers recorded information about why
> >> changes were made to their datasets.
> >> 
> >> 2.  Being able to report changes applied only to the subset of
> >> files in a dataset that is of interest to any particular user.
> >> This is especially important: a user who is interested in only 1
> >> of 100 variables doesn't want to be bothered with messages about
> >> changes to the dataset that didn't affect the variable he is
> >> interested in.  (A sketch of what this might look like follows.)
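> >>
> >> A sketch of what per-variable change records might look like;
> >> every field name here is invented, purely for illustration:
> >>
> >>   # Sketch: changes recorded per (dataset, variable) with a reason,
> >>   # so users can be told only about the variables they hold.
> >>   changes = [
> >>       {"dataset": "cmip5.example.drs.id", "variable": "tas",
> >>        "action": "withdrawn", "reason": "incorrect units attribute",
> >>        "replaced_by": "v2"},
> >>   ]
> >>
> >>   def changes_for(dataset, user_variables):
> >>       """Only the changes touching variables this user downloaded."""
> >>       return [c for c in changes
> >>               if c["dataset"] == dataset
> >>               and c["variable"] in user_variables]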
> >> 
> >> Another thing we should plan on doing is making it easy for users
> >> to report suspected errors in the data they are analyzing
> >> directly to the responsible modeling group(s).  How are we going
> >> to handle all the emails from users who think they've discovered
> >> problems?
> >> 
> >> If we don't have some way of doing the above by the first month or
> >> two of 2011, I think we're going to be in for lots of complaints.
> >> I therefore hope we can make this a very high priority.  Are
> >> there any higher priorities?  (I'm sure there are; I'm just
> >> wondering what they are.)
> >> 
> >> Best regards,
> >> Karl
> >> 
> >> p.s. feel free to post or forward to whomever you think might be
> >> able to help.
> > 

--
Bryan Lawrence
Director of Environmental Archival and Associated Research
(NCAS/British Atmospheric Data Centre and NCEO/NERC NEODC)
STFC, Rutherford Appleton Laboratory
Phone +44 1235 445012; Fax ... 5848; 
Web: home.badc.rl.ac.uk/lawrence

