[Go-essp-tech] Handling missing data in the CMIP5 archive

Karl Taylor taylor13 at llnl.gov
Mon Jun 27 10:44:24 MDT 2011


Hi Jamie,

Well, I really think large chunks of contiguous missing time-slices 
should be treated differently from occasional missing time-slices.

In the latter case, your argument has merit in that if we allow 
flexibility, users would have to perform two tests rather than one.  I have 
just checked with Charles and found out that CMOR will error exit if you 
try to write successive time-slices to a file that are not spaced within 
20% of the specified interval.  So if you try to write monthly data and 
skip a month, CMOR will error exit.  CMOR can't check whether 
time-slices are missing *between* two files.
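
As a rough illustration of the kind of spacing test described above, here 
is a minimal Python sketch.  It is not CMOR's actual code: the function 
name, its arguments, and the sample time values are hypothetical, and only 
the 20% tolerance comes from the description above.

```python
def find_spacing_violations(times, interval, tolerance=0.2):
    """Return each index i where times[i + 1] - times[i] differs from
    `interval` by more than `tolerance` (a fraction of `interval`)."""
    bad = []
    for i in range(len(times) - 1):
        step = times[i + 1] - times[i]
        if abs(step - interval) > tolerance * interval:
            bad.append(i)
    return bad


# Hypothetical monthly mid-points (days since 1 January), with April skipped.
times = [15.5, 45.0, 74.5, 135.5, 166.0]
print(find_spacing_violations(times, interval=30.0))  # -> [2]
```

A check like this only sees the time axis of one file, which is why a gap 
that falls between two files goes unnoticed.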

Anyway, as a practical matter then, most users will have to fill 
time-slices that can't be recovered with missing values or CMOR will 
error exit.  So, I guess I vote to drop the 2nd option I proposed.  
Thus, you shouldn't omit isolated time-slices entirely; you should fill 
them with missing values.
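
A minimal sketch of that filling step, in plain Python, might look like 
the following.  The function name, the dictionary layout, and the 
missing-value constant of 1.0e20 are assumptions made for illustration; 
use whatever missing value your variable actually declares.

```python
MISSING = 1.0e20  # assumed missing-data value, for illustration only


def fill_missing_slices(expected_times, available, slice_size, missing=MISSING):
    """Return one slice per expected time, in order.  A time with no
    recovered data gets a slice filled entirely with `missing`."""
    filled = []
    for t in expected_times:
        if t in available:
            filled.append(list(available[t]))
        else:
            filled.append([missing] * slice_size)
    return filled


# Hypothetical example: three monthly slices of two grid points each,
# with the middle month lost.
available = {0: [280.1, 281.3], 2: [282.0, 283.4]}
slices = fill_missing_slices([0, 1, 2], available, slice_size=2)
print(slices[1])  # the lost month, filled with the missing value
```

The time coordinate stays regularly spaced, so a spacing check on the 
file still passes.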

[Large chunks of contiguous missing time-slices can be omitted by 
constructing files such that the missing time-slices fall between files.]
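
One way to work out where the file boundaries would fall is sketched 
below in plain Python.  The function name and arguments are hypothetical, 
and reusing the 20% tolerance to detect a gap is an assumption made for 
illustration, not a prescribed procedure.

```python
def split_at_gaps(times, interval, tolerance=0.2):
    """Split a sorted list of times into contiguous runs, starting a new
    run (that is, a new file) wherever the spacing is out of tolerance."""
    if not times:
        return []
    segments = []
    current = [times[0]]
    for prev, cur in zip(times, times[1:]):
        if abs((cur - prev) - interval) > tolerance * interval:
            segments.append(current)
            current = []
        current.append(cur)
    segments.append(current)
    return segments


# The example from the original proposal, using annual indices for
# brevity: years 0-99 with years 40-49 unavailable.
years = list(range(0, 40)) + list(range(50, 100))
files = split_at_gaps(years, interval=1)
print([len(f) for f in files])  # -> [40, 50]
```

Each run is then written to its own file, so the gap falls between files.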

Any objections?

Best regards,
Karl


On 6/27/11 9:15 AM, Kettleborough, Jamie wrote:
> Hello Karl,
>
> Is this extra flexibility really needed?  I think that giving
> alternative 'representations' of the fact that some data sets have
> unavailable time slices may make it harder for the data user.  Don't
> they have to write code to deal with all the alternatives?
>
> Sorry this is so terse and there are other things in your e-mail I
> haven't commented on - because I haven't had chance to think through
> what you are saying.  I'm also aware you need to get a decision made on
> this so you can inform people what to do.
>
> Jamie
>
>
>
>> So, I'm inclined to allow some flexibility summarized here
>> since unless folks are careful, they'll make mistakes no
>> matter what we decide:
>>
>> When isolated time-slices in a dataset are lost and it is
>> impossible to recover them, it is recommended that those
>> isolated missing time-slices be:
>> 1) filled entirely with the "missing data" value, or
>> 2) entirely omitted from the file (making sure the
>> time-coordinate reflects their absence)
>>
>> When significant portions of a time-series are omitted
>> (either by design or otherwise), one should simply not create
>> files for those portions of the time-series.  This might
>> require the user to divide data normally found in a single
>> file into two files.  For example, if 100-years of monthly
>> mean data are normally packaged into a single file, but a
>> decade (i.e., 120 consecutive samples) is unavailable (say
>> years 40-49), the user should instead write two files, the
>> first with 40 years of data and the second with the last 50
>> years of data.
>>
>> Further discussion invited.
>>
> _______________________________________________
> GO-ESSP-TECH mailing list
> GO-ESSP-TECH at ucar.edu
> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech