<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
</head>
<body bgcolor="#ffffff" text="#000000">
<font face="Times New Roman">Dear all,<br>
<br>
I am concerned that we will be unable to help users learn when
CMIP5 data they have downloaded has been withdrawn (presumably
because it is flawed). Here are some common "use cases" that ESG
should be able to handle (but which I don't think it currently can). <br>
<br>
1. A user downloads some files on December 12, 2010. Three
months later he wants to know <br>
a) if any files he downloaded were withdrawn (i.e., found to be flawed).
<br>
<font face="Times New Roman"> b) if similar data from other models
(or replacement data from models he has already downloaded) has
become available. <br>
c) the reasons for data being withdrawn or replaced. (For
example, was the data in the file flawed? Was the data
mislabeled? Were some of the attributes incorrect? If so, which
ones?)<br>
<br>
2. A user wishes to be informed by email when files he has
downloaded have been withdrawn, and he wants to know the reasons
for their withdrawal.<br>
<br>
3. A user wishes to be informed by email when new files in his
area of interest become available. (The user would define "area
of interest" in terms of a set of DRS identifiers, e.g.,
experiment, variable name, MIP table; see the sketch after this list).<br>
<br>
4. A reader of a journal article wants to know whether any of the
data used in a study has been withdrawn and the reasons for its
withdrawal. The DOIs for the dataset(s) are included in the
article, and the user knows what variables are used from that
dataset. How does he learn whether the data were subsequently
withdrawn, and the reasons?<br>
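<br>
To make use cases 1 and 3 a little more concrete, here is a rough
sketch (in Python) of what a client-side check might look like. It
assumes a hypothetical "withdrawn.json" listing published alongside
the archive, keyed by each file's tracking_id global attribute and
carrying a reason and date; no such listing exists today, and the
field names are only illustrative. It also assumes the netCDF4
package is available for reading tracking_id from the downloaded
files.<br>
</font>
<pre>
# Rough sketch only.  The "withdrawn.json" listing, its field names, and
# the place it would be published are assumptions, not an existing ESG
# service.
import json
from pathlib import Path

from netCDF4 import Dataset  # assumes the netCDF4 package is installed


def local_tracking_ids(download_dir):
    """Map each downloaded file's tracking_id to its local path."""
    ids = {}
    for path in Path(download_dir).expanduser().glob("**/*.nc"):
        with Dataset(str(path)) as nc:
            tid = getattr(nc, "tracking_id", None)  # CMIP5 global attribute
            if tid:
                ids[tid] = path
    return ids


def withdrawn_entries(listing_path):
    """Load the hypothetical withdrawn-file listing.

    Assumed format: [{"tracking_id": ..., "reason": ..., "date": ...}, ...]
    """
    with open(listing_path) as f:
        return {entry["tracking_id"]: entry for entry in json.load(f)}


def report_withdrawn(download_dir, listing_path):
    """Use case 1a/1c: which of my files were withdrawn, and why?"""
    withdrawn = withdrawn_entries(listing_path)
    for tid, path in local_tracking_ids(download_dir).items():
        if tid in withdrawn:
            entry = withdrawn[tid]
            print(f"{path}: withdrawn {entry['date']} -- {entry['reason']}")


# Use case 3: an "area of interest" need be no more than a set of DRS
# facet values; any new (or withdrawn) file matching all of them would
# trigger an email to the subscriber.
area_of_interest = {
    "experiment": {"historical", "rcp45"},
    "variable": {"tas"},
    "mip_table": {"Amon"},
}

if __name__ == "__main__":
    report_withdrawn("~/cmip5_downloads", "withdrawn.json")
</pre>
<font face="Times New Roman">
The same DRS facets that define an "area of interest" could also
drive the notification emails in use cases 2 and 3: whenever a newly
published or withdrawn file matches all of a subscriber's facet
values, a message goes out.<br>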
<br>
It is my understanding that the assignment of versions in the
present system is based on "dataset", whereas most users will only
be interested in a tiny portion of the dataset (e.g., a single
variable, rather than the perhaps 100 variables that might be
included in the dataset). It would be very little help if the
user could learn only about changes at the dataset level (which
might occur because a single variable was added, withdrawn, or
replaced). Also, the *reason* for any changes to a dataset should
always be made clear. So, the challenges would seem to be:<br>
<br>
1. Making sure data providers recorded information about why
changes were made to their datasets.<br>
<br>
2. Being able to report changes applied only to the subset of
files in a dataset that are of interest to any particular user.
This is especially important: if a user is interested in only 1 of
100 variables, he doesn't want to be bothered with messages about
changes to the dataset that didn't affect the variable he cares
about (see the sketch below).<br>
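<br>
To illustrate the two challenges, here is a sketch (again in Python,
and again purely illustrative) of a per-file change record that
always carries a reason, together with a filter that reports only
the changes touching variables a given user actually downloaded.
The field names are my own assumptions, not an agreed ESG schema.<br>
</font>
<pre>
# Illustrative only: the record lives at the file/variable level rather
# than the dataset level, and the reason field is mandatory (challenge 1).
from dataclasses import dataclass
from datetime import date


@dataclass
class FileChange:
    dataset_id: str    # DRS dataset identifier
    variable: str      # the single variable held in this file
    mip_table: str
    tracking_id: str
    action: str        # "withdrawn" or "replaced"
    reason: str        # must always be supplied by the data provider
    when: date


def changes_affecting_user(changes, users_variables):
    """Challenge 2: keep only changes that touch variables the user holds."""
    return [c for c in changes if c.variable in users_variables]


# A user who downloaded only "tas" never hears about the "pr" fix.
changes = [
    FileChange("cmip5.output1.X.modelA.historical", "pr", "Amon", "uuid-1",
               "replaced", "incorrect units attribute", date(2011, 1, 15)),
    FileChange("cmip5.output1.X.modelA.historical", "tas", "Amon", "uuid-2",
               "withdrawn", "flawed values in years 1900-1950",
               date(2011, 2, 3)),
]
for change in changes_affecting_user(changes, {"tas"}):
    print(change.when, change.action, change.variable, "-", change.reason)
</pre>
<font face="Times New Roman">
The notification emails in use case 2 would then simply be the
output of this filter, generated whenever a data provider publishes
a change.<br>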
<br>
Another thing we should plan on doing is making it easy for users
to report suspected errors in the data they are analyzing directly
to the responsible modeling group(s). How are we going to handle
all the emails from users who think they've discovered problems?<br>
<br>
If we don't have some way of doing the above by the first month or
two of 2011, I think we're going to be in for lots of complaints.
I therefore hope we can make this a very high priority. Are there
any higher priorities? (I'm sure there are; I'm just wondering what
they are.)<br>
<br>
Best regards,<br>
Karl<br>
<br>
P.S. Feel free to post or forward this to whoever you think might be
able to help.<br>
</font>
</body>
</html>