<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
<font face="Times New Roman">Hi all,<br>
<br>
In this discussion, I think we should distinguish between two
cases:<br>
<br>
1) Isolated time-slices are missing from a time-series (say, less
than 0.1% of samples are missing). This might happen, for
example, if a few "history" files get lost and can't be
regenerated. We shouldn't expect entire files produced by CMOR to
be missing in this case, just some time samples. <br>
<br>
2) A group decides only to provide model output for a subset of
the requested period, and so there are whole portions of a
time-series missing. For example, suppose a group only chooses to
save 3-hourly data from its historical run for the years 1960-1969
and years 1990-2005. The data requested for years 1970-1989 would
be missing.<br>
<br>
Keep in mind also that some requested time-series contain gaps by
design. For example, the 3-d aero data for the RCP runs are to be
collected only for years 2010, 2020, 2040, 2060, 2080, and 2100.
Note that the gaps are not all of equal length. These data are
monthly, so most months are "missing" by design. <br>
<br>
I think when large portions of a time-series are missing (case 2),
the user will easily notice this by inspecting the file names, as
long as the gap is *not* contained within the file itself. This
leads to the suggestion that entire files may be omitted, but
within a single file the data should be complete (although
isolated time-slices might be entirely filled with "missing"
values). I don't think we can generate new types of files for
CMIP5 that are "empty"; it's too late for changes of this kind.
Also I don't think seeing a file of size near zero is any easier
than checking the time periods explicitly given in the names of
the files.<br>
<br>
Revisiting what to do about the *isolated* missing time-slices of
case 1, my original suggestion was to omit these (or fill them with
missing values), but Bryan felt strongly they should always be
included and filled with missing values. Others have pointed out
that one can fairly easily infer from the time-coordinate whether
or not a time slice has been omitted, whereas if the entire time
slice were filled with "missing values", one would have to read in
the data itself to determine whether there was any valid data.
On the other hand if anyone failed to read in the
time-coordinates, and instead simply read all the time-slices that
were available, and then *assumed* no time-slices were omitted,
they would likely perform a flawed analysis and might never
notice. They would be less likely to do this if all the
time-slices were actually written, but isolated ones were filled
with missing values.<br>
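<br>
To illustrate the point about inferring omitted time-slices from the
time-coordinate alone, here is a minimal sketch in Python. It assumes a
uniform sampling interval (true for 3-hourly data, not for monthly data
expressed in days); the function name and inputs are hypothetical and
not part of CMOR or any CMIP5 tool.<br>
<pre>
```python
def find_omitted_slices(times, step):
    """Return the expected time values that are absent from `times`.

    `times` is an increasing sequence of time-coordinate values and `step`
    is the nominal sampling interval in the same units (assumed uniform).
    """
    omitted = []
    for prev, curr in zip(times, times[1:]):
        # Number of nominal steps between two consecutive stored samples.
        n_steps = round((curr - prev) / step)
        # n_steps == 1 means nothing was omitted between prev and curr.
        for k in range(1, n_steps):
            omitted.append(prev + k * step)
    return omitted


# 3-hourly samples (in hours) with the 9 h and 12 h slices omitted.
times = [0.0, 3.0, 6.0, 15.0, 18.0]
print(find_omitted_slices(times, step=3.0))
```
</pre>
Note that this check only reads the time-coordinate; it says nothing
about slices that are present but filled with missing values.<br>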
<br>
So, I'm inclined to allow some flexibility, summarized here, since
unless folks are careful they'll make mistakes no matter what we
decide:<br>
<br>
When isolated time-slices in a dataset are lost and it is
impossible to recover them, it is recommended that those isolated
missing time-slices be:<br>
1) filled entirely with the "missing data" value, or<br>
2) entirely omitted from the file (making sure the
time-coordinate reflects their absence)<br>
<br>
When significant portions of a time-series are omitted (either by
design or otherwise), one should simply not create files for those
portions of the time-series. This might require the user to
divide data normally found in a single file into two files. For
example, if 100 years of monthly mean data are normally packaged
into a single file, but a decade (i.e., 120 consecutive samples)
is unavailable (say years 40-49), the user should write instead
two files, the first with 40 years of data and the second with the
last 50 years of data. <br>
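<br>
The splitting rule above can be sketched as follows. This is only an
illustration: the function name is hypothetical, and the years are
assumed to be numbered 0-99 so that the counts match the example (40
years before the gap, 50 after).<br>
<pre>
```python
def contiguous_segments(years):
    """Group sorted years into (first, last) runs with no internal gap."""
    segments = []
    start = prev = years[0]
    for year in years[1:]:
        if year != prev + 1:
            # A gap begins here: close the current run, start a new one.
            segments.append((start, prev))
            start = year
        prev = year
    segments.append((start, prev))
    return segments


# 100 years numbered 0-99 (an assumption), with years 40-49 unavailable.
available = [y for y in range(100) if y not in range(40, 50)]
for first, last in contiguous_segments(available):
    print(f"write one file covering years {first}-{last}")
```
</pre>
Each resulting segment would go into its own file, so that no file
contains a large internal gap and the file names show what is missing.<br>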
</font><font face="Times New Roman"><br>
Further discussion invited.<br>
<br>
Best regards,<br>
Karl<br>
<br>
<br>
<br>
<br>
<br>
</font><br>
On 6/24/11 7:47 AM, Bentley, Philip wrote:
<blockquote
cite="mid:E51EDFEBF10BE44BB4BDAF5FC2F024B903636837@EXXMAIL02.desktop.frd.metoffice.com"
type="cite">
<pre wrap="">Hi George,
Your chosen solution to use a metadata attribute ('nodata' in your case)
to flag that a given file is an empty/null file is exactly the solution
that I had in mind for CMIP5 files comprised of all missing data.
Unfortunately - and the reason I didn't pursue it on mailing lists -
such files would not, I think, be CF-compliant and as such would likely
trip up current netCDF client software (and certainly the tools likely
to be used for analysing CMIP5 datasets).
Although it's probably too late to use this device on the CMIP5 project,
nonetheless I wonder if it isn't worth making a proposal along these
lines to the CF mailing list?
Regards,
Phil
</pre>
<blockquote type="cite">
<pre wrap="">-----Original Message-----
From: <a class="moz-txt-link-abbreviated" href="mailto:go-essp-tech-bounces@ucar.edu">go-essp-tech-bounces@ucar.edu</a>
[<a class="moz-txt-link-freetext" href="mailto:go-essp-tech-bounces@ucar.edu">mailto:go-essp-tech-bounces@ucar.edu</a>] On Behalf Of George J. Huffman
Sent: 24 June 2011 14:23
To: Kettleborough, Jamie
Cc: <a class="moz-txt-link-abbreviated" href="mailto:go-essp-tech@ucar.edu">go-essp-tech@ucar.edu</a>
Subject: Re: [Go-essp-tech] Handling missingdata in the CMIP5 archive
Hi all - to quote a different context ... in the
Precipitation Processing System, which is the processing
center for TRMM and GPM satellite project data, the choice is
to provide a file whether or not the data actually exist. If
parts of the file are missing, they are filled with the
missing value, as you'd expect. If the entire contents of
the file are unavailable, the metadata in the header includes
a "nodata=true" flag and no space is wasted. To follow up on
the "early failure" comments, if you process the header for
the nodata flag, you'd immediately hit it, and if you don't,
you'd immediately hit read failures. As a visual check, the
all-missing file's size is tiny compared to the usual file
that has data.
George
</pre>
</blockquote>
</blockquote>
</body>
</html>