[Go-essp-tech] Handling missing data in the CMIP5 archive

Mon Jul 11 03:30:04 MDT 2011

Hello Karl,

We've come across some more 'lost diagnostics' in one of our ocean runs.
We have lost a year's worth of some annual mean ocean diagnostics.  We
don't think these are recoverable.

Based on the discussion in this thread, and what is easiest for us in
the time available, we intend to submit the lost year as a separate file
with the diagnostics filled with missing data.  So an example of files
from the series will be:

bddtalk_Oyr_HadGEM2-ES_historical_r1i1p1_1860-1880.nc
bddtalk_Oyr_HadGEM2-ES_historical_r1i1p1_1881-1881.nc
    (this file will contain missing data only; I think I have the
    file name right, but I just made it up)
bddtalk_Oyr_HadGEM2-ES_historical_r1i1p1_1882-1959.nc
bddtalk_Oyr_HadGEM2-ES_historical_r1i1p1_1960-2005.nc

The variables with a complete time series will look like:

talk_Oyr_HadGEM2-ES_historical_r1i1p1_1860-1959.nc
talk_Oyr_HadGEM2-ES_historical_r1i1p1_1960-2005.nc
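
For anyone wanting to check a file series like the ones above for gaps, here is a minimal sketch. It assumes the year range is always the last field of the file name, in the form _<start>-<end>.nc, as in the examples in this message. The function names year_range and missing_years are hypothetical and not part of any CMIP5 tool.

```python
import re

# Assumption: file names end in _<startyear>-<endyear>.nc, as in the
# examples in this message.
RANGE_RE = re.compile(r"_(\d{4})-(\d{4})\.nc$")


def year_range(filename):
    """Extract (start_year, end_year) from a file name."""
    match = RANGE_RE.search(filename)
    if match is None:
        raise ValueError(f"no year range found in {filename!r}")
    return int(match.group(1)), int(match.group(2))


def missing_years(filenames):
    """Return the years not covered between the first and last file."""
    ranges = sorted(year_range(name) for name in filenames)
    if not ranges:
        return []
    gaps = []
    expected = ranges[0][0]  # next year we expect a file to start at
    for start, end in ranges:
        gaps.extend(range(expected, start))  # empty when start <= expected
        expected = max(expected, end + 1)
    return gaps


with_filler = [
    "bddtalk_Oyr_HadGEM2-ES_historical_r1i1p1_1860-1880.nc",
    "bddtalk_Oyr_HadGEM2-ES_historical_r1i1p1_1881-1881.nc",
    "bddtalk_Oyr_HadGEM2-ES_historical_r1i1p1_1882-1959.nc",
    "bddtalk_Oyr_HadGEM2-ES_historical_r1i1p1_1960-2005.nc",
]
# Same series with the missing-data file for 1881 left out.
without_filler = [name for name in with_filler if "1881-1881" not in name]

print(missing_years(with_filler))
print(missing_years(without_filler))
```

With the filler file present the file names alone show a complete series, so a user still has to test the data values for the missing-data flag. With the file omitted, the gap shows up in the time coverage instead.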

If you think this is going to cause users or data services problems then
please let us know.

Thanks,
Jamie

> -----Original Message-----
> From: Kettleborough, Jamie 
> Sent: 28 June 2011 17:23
> To: 'go-essp-tech at ucar.edu'
> Cc: Kettleborough, Jamie
> Subject: Re: [Go-essp-tech] Handling missing data in the CMIP5 archive
> 
> Hello Karl,
> 
> I think this is a better solution.  There is something still 
> bothering me about it a bit, but it may just be the way I 
> think of the problem.  I think your 'large chunk' case where 
> you omit files is similar to having missing data in a time 
> coordinate, so you may still end up with users having to 
> handle both cases.  The reason I think that the large chunk 
> and missing time coordinate values are similar is because I 
> think there is a concept that users will have in code of a 
> dataset - it may take different guises - but one example is 
> something like a DRS atomic data set.  This atomic data set, 
> practically, is made up of an aggregation of files.  If there 
> is a file omitted, because of unavailable data, then this 
> will lead to omitted times in the time coordinate of the 
> atomic data set.  So the user may still have to deal with 
> both detecting missing data values and an incomplete time 
> series in a time coordinate.  (Does that make sense?)  I 
> don't want to labour this unnecessarily as I think the 
> decision has to be made, so feel free to ignore this.
> 
> How large is 'large chunk'?  I think we have a month's worth 
> of 3-hourly data that we have lost (and don't think we can 
> recreate without a huge amount of effort).  Is this 
> sufficiently large to warrant leaving out a file? (I can't 
> remember the details of the case that triggered this debate).
> 
> Presumably if the recommendation is to put in missing data 
> values for this month then we can do it in a single file?  So 
> we would have one file spanning the missing month, with all 
> the correct meta-data - time coordinate included - but with 
> missing data.  The files either side (in the time series) of 
> this file would be standard files - with the data values as 
> output by the model etc. I think users have to deal with a 
> series of files in an atomic dataset anyway so we wouldn't be 
> adding anything new if we did this.
> 
> 
> Jamie
> 
> 
> > 
> > Message: 3
> > Date: Mon, 27 Jun 2011 09:44:24 -0700
> > From: Karl Taylor <taylor13 at llnl.gov>
> > Subject: Re: [Go-essp-tech] Handling missing data in the CMIP5 archive
> > To: go-essp-tech at ucar.edu
> > Message-ID: <4E08B368.80501 at llnl.gov>
> > Content-Type: text/plain; charset="iso-8859-1"
> > 
> > Hi Jamie,
> > 
> > Well, I really think large chunks of contiguous missing time-slices 
> > should be treated differently from occasional missing time-slices.
> > 
> > In the latter case, your argument has merit in that if we allow
> > flexibility, users would have to perform 2 rather than 1 test.  I have
> > just checked with Charles and found out that CMOR will error exit if
> > you try to write successive time-slices to a file that are not spaced
> > within 20% of the specified interval.  So if you try to write monthly
> > data and skip a month, CMOR will error exit.  CMOR can't check whether
> > time-slices are missing *between* two files.
> > 
> > Anyway, as a practical matter then, most users will have to fill
> > time-slices that can't be recovered with missing values or CMOR will
> > error exit.  So, I guess I vote to drop the 2nd option I proposed.
> > Thus, you shouldn't omit isolated time-slices entirely; you should 
> > fill them with missing values.
> > 
> > [Large chunks of contiguous missing time-slices can be omitted by 
> > constructing files such that the missing time-slices fall between 
> > files.]
> > 
> > Any objections?
> > 
> > Best regards,
> > Karl
> > 
> > 
>
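
The spacing rule Karl describes in the quoted message (successive time-slices must be within 20% of the specified interval) can be illustrated with a short sketch. This is not CMOR's code or API; the function check_spacing, its arguments, and the use of a nominal 30-day month are assumptions made purely for illustration.

```python
def check_spacing(times, interval, tolerance=0.2):
    """Return the indices i where times[i] - times[i-1] falls outside
    interval * (1 +/- tolerance).

    Illustration only: this mimics the rule described in the thread,
    not CMOR's actual implementation.
    """
    lowest = interval * (1 - tolerance)
    highest = interval * (1 + tolerance)
    return [
        i
        for i in range(1, len(times))
        if not lowest <= times[i] - times[i - 1] <= highest
    ]


# Mid-month times in days since 1 January of a non-leap year (Jan to May).
monthly = [15.5, 45.0, 74.5, 105.0, 135.5]
# The same series with April (105.0) skipped.
april_skipped = [15.5, 45.0, 74.5, 135.5]

print(check_spacing(monthly, interval=30.0))
print(check_spacing(april_skipped, interval=30.0))
```

Within a single list of times a skipped month stands out as a step of about 61 days. A check like this only sees the time values it is given, which matches the point that missing time-slices *between* two files cannot be detected when each file is written separately.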