[Go-essp-tech] Handling missing data in the CMIP5 archive
Karl Taylor
taylor13 at llnl.gov
Mon Jun 27 10:44:24 MDT 2011
Hi Jamie,
Well, I really think large chunks of contiguous missing time-slices
should be treated differently from occasional missing time-slices.
In the latter case, your argument has merit in that if we allow
flexibility, users would have to perform two tests rather than one. I have
just checked with Charles and found out that CMOR will error exit if you
try to write successive time-slices to a file that are not spaced within
20% of the specified interval. So if you try to write monthly data and
skip a month, CMOR will error exit. CMOR can't check whether
time-slices are missing *between* two files.
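
To make the spacing rule concrete, here is a minimal sketch of that kind of check. It is not CMOR's actual code: the function name, the reading of "within 20%" as a symmetric tolerance on each step, and the idealized 30-day months are all assumptions made for illustration.

```python
def find_spacing_violations(times, interval, tolerance=0.2):
    """Return the indices i where times[i] - times[i-1] differs from
    `interval` by more than `tolerance * interval`.

    Hypothetical helper; the symmetric-tolerance reading of the
    "within 20%" rule is an assumption, not CMOR's documented logic.
    """
    bad = []
    for i in range(1, len(times)):
        step = times[i] - times[i - 1]
        if abs(step - interval) > tolerance * interval:
            bad.append(i)
    return bad


# Monthly data on an idealized 30-day calendar, with the fourth month skipped.
times = [15.0, 45.0, 75.0, 135.0, 165.0]
print(find_spacing_violations(times, interval=30.0))  # [3]
```

A check like this only sees the time values handed to it for one file, which is why a gap that falls between two files goes unnoticed.
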
Anyway, as a practical matter then, most users will have to fill
time-slices that can't be recovered with missing values or CMOR will
error exit. So, I guess I vote to drop the 2nd option I proposed.
Thus, you shouldn't omit isolated time-slices entirely; you should fill
them with missing values.
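
As a rough sketch of what filling looks like in practice (the helper name, the list-of-lists data layout, and the 1.0e20 missing value are assumptions for illustration; use whatever missing value your file declares):

```python
MISSING = 1.0e20  # assumed missing-data value, not taken from this thread


def fill_isolated_gaps(times, slices, interval):
    """Insert all-missing slices wherever one or more time steps were
    skipped, so that the output is evenly spaced.

    Hypothetical helper: `times` is a sorted list of time values and
    `slices` is a parallel list of equally sized lists of data values.
    """
    if not times:
        return [], []
    out_times = [times[0]]
    out_slices = [slices[0]]
    for t, s in zip(times[1:], slices[1:]):
        # Number of whole time steps skipped before this slice.
        n_missing = round((t - out_times[-1]) / interval) - 1
        for _ in range(n_missing):
            out_times.append(out_times[-1] + interval)
            out_slices.append([MISSING] * len(s))
        out_times.append(t)
        out_slices.append(s)
    return out_times, out_slices


# The third time step (t = 75) is lost and gets a slice of missing values.
t, s = fill_isolated_gaps(
    [15.0, 45.0, 105.0],
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    interval=30.0,
)
print(t)  # [15.0, 45.0, 75.0, 105.0]
```
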
[Large chunks of contiguous missing time-slices can be omitted by
constructing files such that the missing time-slices fall between files.]
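
One way to decide where the file boundaries go is sketched below. The helper and its `max_fill` threshold are hypothetical; nothing in this thread fixes how many consecutive missing slices count as a "large chunk".

```python
def split_at_large_gaps(times, interval, max_fill=1):
    """Group time indices into per-file chunks, starting a new file
    wherever more than `max_fill` consecutive slices are missing.

    Hypothetical helper; `max_fill` is an illustrative threshold only.
    Shorter gaps stay inside a chunk and would be filled with missing
    values rather than omitted.
    """
    chunks = [[0]] if times else []
    for i in range(1, len(times)):
        n_missing = round((times[i] - times[i - 1]) / interval) - 1
        if n_missing > max_fill:
            chunks.append([i])
        else:
            chunks[-1].append(i)
    return chunks


# 100 years of monthly data (month index 0..1199) with years 40-49,
# counted from year 0, unavailable: months 480..599 are absent.
months = [m for m in range(1200) if not 480 <= m < 600]
chunks = split_at_large_gaps(months, interval=1)
print([len(c) // 12 for c in chunks])  # [40, 50]
```

The result is two files, one of 40 years and one of 50 years, with the missing decade falling between them.
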
Any objections?
Best regards,
Karl
On 6/27/11 9:15 AM, Kettleborough, Jamie wrote:
> Hello Karl,
>
> Is this extra flexibility really needed? I think that giving
> alternative 'representations' of the fact that some data sets have
> unavailable time slices may make it harder for the data user. Don't
> they have to write code to deal with all the alternatives?
>
> Sorry this is so terse and there are other things in your e-mail I
> haven't commented on - because I haven't had chance to think through
> what you are saying. I'm also aware you need to get a decision made on
> this so you can inform people what to do.
>
> Jamie
>
>
>
>> So, I'm inclined to allow some flexibility summarized here
>> since unless folks are careful, they'll make mistakes no
>> matter what we decide:
>>
>> When isolated time-slices in a dataset are lost and it is
>> impossible to recover them, it is recommended that those
>> isolated missing time-slices be:
>> 1) filled entirely with the "missing data" value, or
>> 2) entirely omitted from the file (making sure the
>> time-coordinate reflects their absence)
>>
>> When significant portions of a time-series are omitted
>> (either by design or otherwise), one should simply not create
>> files for those portions of the time-series. This might
>> require the user to divide data normally found in a single
>> file into two files. For example, if 100 years of monthly
>> mean data are normally packaged into a single file, but a
>> decade (i.e., 120 consecutive samples) is unavailable (say
>> years 40-49), the user should instead write two files, the
>> first with 40 years of data and the second with the last 50
>> years of data.
>>
>> Further discussion invited.
>>
> _______________________________________________
> GO-ESSP-TECH mailing list
> GO-ESSP-TECH at ucar.edu
> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech