<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#ffffff" text="#000000">

    <font face="Times New Roman">Hi Bryan,<br>

      <br>

      This will be very useful.<br>

      <br>

      Your email reminded me that some data providers might withdraw

      data simply by erasing the files.&nbsp; How is ESG supposed to be

      informed of this?&nbsp; If they "republish" the dataset that the files

      are part of, will a new version be assigned, and will anyone know

      the files have been withdrawn?<br>

      <br>

      cheers,<br>

      Karl<br>

    </font><br>

    On 11/9/10 10:59 AM, Bryan Lawrence wrote:

    <blockquote cite="mid:201011091859.48195.bryan.lawrence@stfc.ac.uk"

      type="cite">

      <pre wrap="">Hi Karl

A quick response to 1). This is exactly why I want a tracking id 

service. (Which I'm not yet sure we've built. Hans, what's the status of 

that?)

If you have a file with a tracking id, you want to be able to cut and 

paste it into a "find my data" page, and go straight to the parent 

metadata, which should show it has been withdrawn, and what the current 

version is.

Cheers

Bryan

</pre>

      <blockquote type="cite">

        <pre wrap="">Dear all,

I am concerned that we will be unable to help users learn when CMIP5

data they have downloaded has been withdrawn (presumably because it

is flawed).  Here are some common "use cases" that ESG should be

able to handle (but I don't think it currently does).

1.  A user downloads some files on December 12, 2010.  Three months

later he wants to know

    a) if any files he downloaded were withdrawn (i.e., found to be

flawed).

    b) if similar data from other models (or replacement data from

models he has already downloaded) has become available.

    c) the reasons for data being withdrawn or replaced.  (For

example, was the data in the file flawed?  Was the data mislabeled? 

 Were some of the attributes incorrect?  If so, which ones?)

2.  A user wishes to be informed by email when files he has

downloaded have been withdrawn, and he wants to know the reasons for

their withdrawal.

3.  A user wishes to be informed by email when new files in his area

of interest become available.  (The user would define "area of

interest" in terms of a set of DRS identifiers, e.g., experiment,

variable name, MIP table).

4.  A reader of a journal article wants to know whether any of the

data used in a study has been withdrawn and the reasons for its

withdrawal. The DOIs for the dataset(s) are included in the

article, and the user knows what variables are used from that

dataset.  How does he learn whether the data were subsequently

withdrawn, and the reasons?

It is my understanding that the assignment of versions in the present

system is based on "dataset", whereas most users will only be

interested in a tiny portion of the dataset (e.g., a single

variable, rather than the perhaps 100 variables that might be

included in the dataset).   It would be very little help if the user

could learn only about changes at the dataset level (which might

occur because a single variable was added, withdrawn, or replaced).

Also the *reason* for any changes to a dataset should always be made

clear.  So, the challenges would seem to be:

1.  Making sure data providers record information about why changes

were made to their datasets.

2.  Being able to report changes applied only to the subset of files

in a dataset that are of interest to any particular user.  This is

especially important, since if a user is interested in only 1 out of

100 variables, he doesn't want to be bothered with messages about

changes to the dataset that didn't affect the variable he is

interested in.

Another thing we should plan on doing is making it easy for users to

report suspected errors in the data they are analyzing directly to

the responsible modeling group(s).  How are we going to handle all

the emails from users who think they've discovered problems?

If we don't have some way of doing the above by the first month or

two of 2011, I think we're going to be in for lots of complaints.  I

therefore hope we can make this a very high priority.  Are there any

higher priorities? (I'm sure there are, just wondering what they

are.)

Best regards,

Karl

p.s. feel free to post or forward to whomever you think might be able

to help.

</pre>

      </blockquote>

      <pre wrap="">

--

Bryan Lawrence

Director of Environmental Archival and Associated Research

(NCAS/British Atmospheric Data Centre and NCEO/NERC NEODC)

STFC, Rutherford Appleton Laboratory

Phone +44 1235 445012; Fax ... 5848; 

Web: home.badc.rl.ac.uk/lawrence

</pre>

    </blockquote>

  </body>

</html>