<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#ffffff" text="#000000">

    <font face="Times New Roman">Hi all,<br>

      <br>

      In this discussion, I think we should distinguish between two

      cases:<br>

      <br>

      1) Isolated time-slices are missing from a time-series (say, less

      than 0.1% of samples are missing).&nbsp; This might happen, for

      example, if a few "history" files get lost and can't be

      regenerated.&nbsp; We shouldn't expect entire files produced by CMOR to

      be missing in this case, just some time samples.&nbsp; <br>

      <br>

      2) A group decides only to provide model output for a subset of

      the requested period, and so there are whole portions of a

      time-series missing.&nbsp; For example, suppose a group chooses to
      save 3-hourly data from its historical run only for the years
      1960-1969 and 1990-2005.&nbsp; The data requested for years 1970-1989 would

      be missing.<br>

      <br>

      Keep in mind also that some requested time-series contain gaps by

      design.&nbsp; For example, the 3-d aero data for the RCP runs are to be

      collected only for years 2010, 2020, 2040, 2060, 2080, and 2100.&nbsp;

      Note that the gaps are not all of equal length.&nbsp; These data are

      monthly, so most months are "missing" by design.&nbsp; <br>

      <br>

      I think when large portions of a time-series are missing (case 2),

      the user will easily notice this by inspecting the file names, as

      long as the gap is *not* contained within the file itself.&nbsp; This

      leads to the suggestion that entire files may be omitted, but

      within a single file the data should be complete (although

      isolated time-slices might be entirely filled with "missing"

      values).&nbsp; I don't think we can generate new types of files for

      CMIP5 that are "empty"; it's too late for changes of this kind.&nbsp;

      Also I don't think seeing a file of size near zero is any easier

      than checking the time periods explicitly given in the names of

      the files.<br>
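      <br>
      As a rough illustration of checking the time periods given in the
      file names, here is a minimal sketch.&nbsp; It assumes each file name
      ends in a "_start-end.nc" time range whose first four digits are
      the year; the file names, the "MODEL" placeholder, and the helper
      functions are hypothetical and not part of any CMIP5 tool.<br>
<pre>
```python
import re

# Assumption: every file name ends in "_START-END.nc", where START and END
# begin with a four-digit year (any further date digits are ignored).
_RANGE = re.compile(r"_(\d{4})\d*-(\d{4})\d*\.nc$")


def year_ranges(filenames):
    """Return sorted (first_year, last_year) pairs parsed from file names."""
    ranges = []
    for name in filenames:
        match = _RANGE.search(name)
        if match:
            ranges.append((int(match.group(1)), int(match.group(2))))
    return sorted(ranges)


def missing_years(filenames, first, last):
    """List the years from first to last (inclusive) covered by no file."""
    covered = set()
    for start, end in year_ranges(filenames):
        covered.update(range(start, end + 1))
    return [year for year in range(first, last + 1) if year not in covered]


# Hypothetical file names for the 3-hourly example of case 2.
files = [
    "pr_3hr_MODEL_historical_r1i1p1_1960010100-1969123121.nc",
    "pr_3hr_MODEL_historical_r1i1p1_1990010100-2005123121.nc",
]

gap = missing_years(files, 1960, 2005)
print(gap[0], gap[-1])  # first and last year with no file
```
</pre>
      For the two hypothetical files above, the years reported as
      uncovered are 1970 through 1989.<br>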

      <br>

      Revisiting what to do about the *isolated* missing time-slices of

      case 1, my original suggestion was to omit these (or fill them with

      missing values), but Bryan felt strongly they should always be

      included and filled with missing values.&nbsp; Others have pointed out

      that one can fairly easily infer from the time-coordinate whether

      or not a time slice has been omitted, whereas if the entire time

      slice were filled with "missing values", one would have to read in

      the data itself to determine whether there was any valid data.&nbsp;&nbsp;

      On the other hand, if anyone failed to read in the

      time-coordinates, and instead simply read all the time-slices that

      were available, and then *assumed* no time-slices were omitted,

      they would likely perform a flawed analysis and might never

      notice.&nbsp; They would be less likely to do this if all the

      time-slices were actually written, but isolated ones were filled

      with missing values.<br>
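      <br>
      The difference between the two checks can be sketched as
      follows.&nbsp; This is only an illustration: it assumes a regularly
      spaced time-coordinate held in a plain list, and the function
      names, sample values, and the fill value 1.0e20 are assumptions
      made for the example.<br>
<pre>
```python
def omitted_steps(times, step):
    """Infer omitted time values from the time-coordinate alone.

    Assumes the coordinate is sorted and nominally spaced by `step`.
    """
    missing = []
    for earlier, later in zip(times, times[1:]):
        n_steps = round((later - earlier) / step)
        missing.extend(earlier + k * step for k in range(1, n_steps))
    return missing


def all_missing_steps(times, slices, fill_value):
    """Find times whose whole slice is the fill value (needs the data itself)."""
    return [
        t
        for t, values in zip(times, slices)
        if all(v == fill_value for v in values)
    ]


FILL = 1.0e20  # assumed "missing data" value

# The lost time-slice at t = 3 is omitted from the file.
times_omitted = [0, 1, 2, 4, 5]

# The lost time-slice at t = 3 is written but filled with the missing value.
times_filled = [0, 1, 2, 3, 4, 5]
slices_filled = [
    [280.0, 281.0],
    [280.5, 281.2],
    [279.9, 280.7],
    [FILL, FILL],
    [281.1, 282.0],
    [280.3, 281.5],
]

print(omitted_steps(times_omitted, 1))
print(all_missing_steps(times_filled, slices_filled, FILL))
```
</pre>
      Both calls identify t = 3, but the first needs only the
      time-coordinate while the second has to scan the data values.<br>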

      <br>

      So I'm inclined to allow some flexibility, summarized below, since
      unless folks are careful they'll make mistakes no matter what we
      decide:<br>

      <br>

      When isolated time-slices in a dataset are lost and it is

      impossible to recover them, it is recommended that those isolated

      missing time-slices be:<br>

      1) filled entirely with the "missing data" value, or<br>

      2) entirely omitted from the file (making sure the

      time-coordinate reflects their absence)<br>

      <br>

      When significant portions of a time-series are omitted (either by
      design or otherwise), one should simply not create files for those
      portions of the time-series.&nbsp; This might require the user to
      divide data normally found in a single file into two files.&nbsp; For
      example, if 100 years of monthly mean data are normally packaged
      into a single file, but a decade (i.e., 120 consecutive samples)
      is unavailable (say years 40-49), the user should instead write
      two files, the first with 40 years of data and the second with the
      last 50 years of data. <br>
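      <br>
      The division into files can be sketched as below.&nbsp; The helper
      name is hypothetical, and the years are counted from 0 (an
      assumption chosen so that the two segments come out as 40 and 50
      years, matching the example).<br>
<pre>
```python
def contiguous_runs(years):
    """Group integer years into (first, last) runs with no internal gaps."""
    runs = []
    for year in sorted(years):
        if runs and year == runs[-1][1] + 1:
            # Extend the current run by one year.
            runs[-1] = (runs[-1][0], year)
        else:
            # A gap (or the very first year) starts a new run, i.e. a new file.
            runs.append((year, year))
    return runs


# 100 years counted from 0, with the decade 40-49 unavailable.
available = [year for year in range(100) if year not in range(40, 50)]

runs = contiguous_runs(available)
for first, last in runs:
    print(first, last, last - first + 1)  # first year, last year, length
```
</pre>
      Each run corresponds to one file to be written, so the gap shows
      up between files rather than inside one.<br>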

    </font><font face="Times New Roman"><br>

      Further discussion invited.<br>

      <br>

      Best regards,<br>

      Karl<br>

      <br>

      <br>

      <br>

      <br>

      <br>

    </font><br>

    On 6/24/11 7:47 AM, Bentley, Philip wrote:

    <blockquote

cite="mid:E51EDFEBF10BE44BB4BDAF5FC2F024B903636837@EXXMAIL02.desktop.frd.metoffice.com"

      type="cite">

      <pre wrap="">Hi George,

Your chosen solution to use a metadata attribute ('nodata' in your case)
to flag that a given file is an empty/null file is exactly the solution
that I had in mind for CMIP5 files comprised of all missing data.
Unfortunately - and the reason I didn't pursue it on mailing lists -
such files would not, I think, be CF-compliant and as such would likely
trip up current netCDF client software (and certainly the tools likely
to be used for analysing CMIP5 datasets).

Although it's probably too late to use this device on the CMIP5 project,
nonetheless I wonder if it isn't worth making a proposal along these
lines to the CF mailing list?

Regards,
Phil
</pre>

      <blockquote type="cite">

        <pre wrap="">-----Original Message-----
From: <a class="moz-txt-link-abbreviated" href="mailto:go-essp-tech-bounces@ucar.edu">go-essp-tech-bounces@ucar.edu</a>
[<a class="moz-txt-link-freetext" href="mailto:go-essp-tech-bounces@ucar.edu">mailto:go-essp-tech-bounces@ucar.edu</a>] On Behalf Of George J. Huffman
Sent: 24 June 2011 14:23
To: Kettleborough, Jamie
Cc: <a class="moz-txt-link-abbreviated" href="mailto:go-essp-tech@ucar.edu">go-essp-tech@ucar.edu</a>
Subject: Re: [Go-essp-tech] Handling missingdata in the CMIP5 archive

Hi all - to quote a different context ... in the
Precipitation Processing System, which is the processing
center for TRMM and GPM satellite project data, the choice is
to provide a file whether or not the data actually exist.  If
parts of the file are missing, they are filled with the
missing value, as you'd expect.  If the entire contents of
the file are unavailable, the metadata in the header includes
a "nodata=true" flag and no space is wasted.  To follow up on
the "early failure" comments, if you process the header for
the nodata flag, you'd immediately hit it, and if you don't,
you'd immediately hit read failures.  As a visual check, the
all-missing file's size is tiny compared to the usual file
that has data.

George
</pre>

      </blockquote>


    </blockquote>

  </body>

</html>