[Go-essp-tech] Handling missing data in the CMIP5 archive

Tue Jun 28 10:23:01 MDT 2011

Hello Karl,

I think this is a better solution.  Something still bothers me about it
a bit, but it may just be the way I think of the problem.  Your 'large
chunk' case, where you omit files, is similar to having gaps in a time
coordinate, so you may still end up with users having to handle both
cases.  I think the large-chunk case and missing time-coordinate values
are similar because users will have some concept of a dataset in their
code.  It may take different guises, but one example is a DRS atomic
dataset.  In practice, this atomic dataset is an aggregation of files.
If a file is omitted because the data are unavailable, the time
coordinate of the atomic dataset will have missing times.  So the user
may still have to deal with both detecting missing data values and
detecting an incomplete time series in the time coordinate.  (Does that
make sense?)  I don't want to labour this unnecessarily, as I think the
decision has to be made, so feel free to ignore this.
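Jamie's point can be made concrete with a minimal sketch. It assumes each file has already been read into a list of time values (days since a common reference date) and a list of values, with each time-slice reduced to a single number for brevity, and with 1.0e20 as the missing-value flag. The names `aggregate`, `find_time_gaps` and `find_missing_slices` are hypothetical, not part of any CMIP5 tool. The sketch shows that a user of the aggregated atomic dataset needs two separate checks: one for filled values and one for gaps in the time coordinate.

```python
# Hypothetical sketch: the two checks a user of an aggregated DRS atomic
# dataset may need. Assumptions: each file is a (times, values) pair,
# times are in days since a common reference date, each time-slice is
# reduced to one number, and FILL_VALUE marks missing data.

FILL_VALUE = 1.0e20  # assumed missing-value flag


def aggregate(files):
    """Concatenate per-file times and values, ordered by each file's first time."""
    times, values = [], []
    for file_times, file_values in sorted(files, key=lambda f: f[0][0]):
        times.extend(file_times)
        values.extend(file_values)
    return times, values


def find_time_gaps(times, interval, tol=0.2):
    """Return (before, after) pairs whose spacing exceeds interval * (1 + tol)."""
    return [(a, b) for a, b in zip(times, times[1:])
            if b - a > interval * (1.0 + tol)]


def find_missing_slices(times, values):
    """Return the times whose value equals the missing-value flag."""
    return [t for t, v in zip(times, values) if v == FILL_VALUE]


# Daily data: file_a has one filled slice; the file for days 3-4 was omitted.
file_a = ([0.0, 1.0, 2.0], [280.1, FILL_VALUE, 281.0])
file_c = ([5.0, 6.0], [279.5, 280.2])
times, values = aggregate([file_c, file_a])
print(find_time_gaps(times, 1.0))          # → [(2.0, 5.0)]
print(find_missing_slices(times, values))  # → [1.0]
```

The omitted file shows up only through the gap check, and the filled slice only through the missing-value check, so neither test alone covers both cases.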

How large is a 'large chunk'?  I think we have lost a month's worth of
3-hourly data (and don't think we can recreate it without a huge amount
of effort).  Is this large enough to warrant leaving out a file?  (I
can't remember the details of the case that triggered this debate.)

Presumably, if the recommendation is to fill this month with missing
values, we can do it in a single file?  So we would have one file
spanning the missing month, with all the correct metadata (time
coordinate included) but with missing data values.  The files on either
side of this file in the time series would be standard files, with the
data values as output by the model, etc.  Users have to deal with a
series of files in an atomic dataset anyway, so we wouldn't be adding
anything new if we did this.
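A minimal sketch of what that single placeholder file would contain, assuming instantaneous 3-hourly samples at 00, 03, ..., 21 UTC, times in days since the dataset's reference date, and 1.0e20 as the missing-value flag. `placeholder_month` is a hypothetical helper; a real file would be written through CMOR or a netCDF library with the full CMIP5 metadata, which is not shown here. The point is that the time coordinate stays complete and contiguous with the neighbouring files while every data value is missing.

```python
# Hypothetical sketch: time coordinate and all-missing values for a
# placeholder file covering one lost month of 3-hourly data. Assumptions:
# times are in days since the dataset's reference date, samples fall at
# 00, 03, ..., 21 UTC, and FILL_VALUE marks missing data.

FILL_VALUE = 1.0e20  # assumed missing-value flag


def placeholder_month(start_day, days_in_month, steps_per_day=8):
    """Return (times, values) for one month with every value set to FILL_VALUE."""
    step = 1.0 / steps_per_day          # 3 hours = 0.125 days
    n = days_in_month * steps_per_day   # number of time-slices in the month
    times = [start_day + i * step for i in range(n)]
    values = [FILL_VALUE] * n
    return times, values


# A 29-day month starting 31 days after the reference date.
times, values = placeholder_month(31.0, 29)
print(len(times))   # → 232
print(times[-1])    # → 59.875; the next file would start at day 60.0
```

Because the times run on without a break into the following file, a gap check on the aggregated dataset passes, and only a missing-value check flags this month.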

Jamie

> 
> Date: Mon, 27 Jun 2011 09:44:24 -0700
> From: Karl Taylor <taylor13 at llnl.gov>
> Subject: Re: [Go-essp-tech] Handling missing data in the CMIP5 archive
> To: go-essp-tech at ucar.edu
> 
> Hi Jamie,
> 
> Well, I really think large chunks of contiguous missing time-slices 
> should be treated differently from occasional missing time-slices.
> 
> In the latter case, your argument has merit: if we allow flexibility,
> users would have to perform two tests rather than one.  I have just
> checked with Charles and found out that CMOR will exit with an error if
> you try to write successive time-slices to a file that are not spaced
> within 20% of the specified interval.  So if you try to write monthly
> data and skip a month, CMOR will exit with an error.  CMOR can't check
> whether time-slices are missing *between* two files.
> 
> Anyway, as a practical matter, most users will have to fill
> time-slices that can't be recovered with missing values, or CMOR will
> exit with an error.  So, I guess I vote to drop the 2nd option I
> proposed.  Thus, you shouldn't omit isolated time-slices entirely; you
> should fill them with missing values.
> 
> [Large chunks of contiguous missing time-slices can be omitted by 
> constructing files such that the missing time-slices fall 
> between files.]
> 
> Any objections?
> 
> Best regards,
> Karl
> 
>
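The spacing rule Karl describes can be sketched as below. This is not CMOR's implementation; it only restates the rule as given in his message (successive time-slices must be spaced within 20% of the specified interval), using hypothetical helper names, times in days, and 1.0e20 as the missing-value flag. `fill_isolated_gaps` assumes a fixed interval, so it suits daily or sub-daily data; monthly data would need calendar-aware time values.

```python
# Hypothetical sketch of the spacing rule described above (not CMOR code)
# and of filling isolated missing time-slices. Assumptions: times are in
# days, interval is the nominal spacing, FILL_VALUE marks missing data.

FILL_VALUE = 1.0e20  # assumed missing-value flag


def spacing_ok(times, interval, tol=0.2):
    """True if each successive spacing is within tol * interval of interval."""
    return all(abs((b - a) - interval) <= tol * interval
               for a, b in zip(times, times[1:]))


def fill_isolated_gaps(times, values, interval):
    """Insert FILL_VALUE slices wherever whole intervals were skipped.

    Assumes a fixed interval, so it is only suitable for regularly
    spaced data (e.g. daily), not calendar months.
    """
    if not times:
        return [], []
    out_t, out_v = [times[0]], [values[0]]
    for t, v in zip(times[1:], values[1:]):
        prev = out_t[-1]
        n_missing = round((t - prev) / interval) - 1
        for k in range(1, n_missing + 1):
            out_t.append(prev + k * interval)
            out_v.append(FILL_VALUE)
        out_t.append(t)
        out_v.append(v)
    return out_t, out_v


# Mid-month times (days, 365-day calendar): skipping a month breaks the rule.
print(spacing_ok([15.5, 45.0, 74.5], 30.0))  # → True
print(spacing_ok([15.5, 74.5], 30.0))        # → False

# Daily data with day 2 lost: fill it, and the rule is satisfied again.
t, v = fill_isolated_gaps([0.0, 1.0, 3.0, 4.0], [10.0, 11.0, 13.0, 14.0], 1.0)
print(t)                    # → [0.0, 1.0, 2.0, 3.0, 4.0]
print(spacing_ok(t, 1.0))   # → True
```

As the message notes, a check like `spacing_ok` only sees the time-slices within one file, so a large chunk omitted between two files would not trigger it.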