[Go-essp-tech] DRS question - temporal subset

Thu Feb 9 16:28:31 MST 2012

Hi Estani, Stephen, et al.

Charles and I looked at the CMOR2 code and confirmed that all files that 
were successfully written should have two times in their file names 
marking the first and last time slices written.  If a file has only one 
time written, this indicates it was a *temporary* file that was opened 
by CMOR (at the time it attempted to write the first time-slice), but it 
was never closed successfully by CMOR.  This means the file probably is 
either incomplete or has errors and has not passed QCL1.
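A quick way to spot such leftover temporary files is to check whether the temporal subset at the end of the file name holds two time stamps. The sketch below is a hypothetical C helper (has_time_range is not part of CMOR or drslib). It assumes the time range is the last underscore-separated field, followed only by an optional "-clim" and ".nc", and that both stamps have the same number of digits.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Count the run of ASCII digits starting at s. */
static size_t digit_run(const char *s)
{
    size_t n = 0;
    while (isdigit((unsigned char)s[n]))
        n++;
    return n;
}

/* Hypothetical check: return 1 if the name ends in "_N1-N2.nc" or
   "_N1-N2-clim.nc" with N1 and N2 of equal length, 0 otherwise
   (for example a lone "_N1.nc" left behind by an unclosed file). */
static int has_time_range(const char *filename)
{
    const char *field = strrchr(filename, '_');
    size_t n1, n2;

    if (field == NULL)
        return 0;
    field++;                          /* skip the underscore */
    n1 = digit_run(field);
    if (n1 < 4 || field[n1] != '-')   /* need at least "yyyy", then '-' */
        return 0;
    n2 = digit_run(field + n1 + 1);
    if (n2 != n1)                     /* both stamps share one precision */
        return 0;
    field += n1 + 1 + n2;
    if (strncmp(field, "-clim", 5) == 0)
        field += 5;                   /* optional climatology suffix */
    return strcmp(field, ".nc") == 0;
}
```

With this check, the single-stamp name clmcalipso_cf3hr_MPI-ESM-LR_amip_r1i1p1_200810221030.nc discussed in this thread would be flagged, while a hypothetical name ending in _200810220130-200810222230.nc would pass.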

CMOR's algorithm for generating file names is copied below.  In short, 
when we get to sub-6-hourly data we include minutes, because time-mean 
3-hourly data, for example, is reported for periods centered at 1.5, 
4.5, 7.5, ..., 22.5 hours, which would be recorded in the filename as 
0130, 0430, 0730, ..., 2230.  For sub-hourly data we also include 
seconds, because such data could be reported at times centered at 
half-minutes (e.g., if reported every 25 minutes, perhaps at 12 min 
30 sec, 37 min 30 sec, etc.).  To allow for these possibilities, we felt 
it best that all models include the same precision, even if the finest 
unit is unnecessary in a particular case.
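To make the 3-hourly and sub-hourly examples concrete, here is a small C sketch that reproduces the truncating arithmetic of the CMOR code quoted below. The helper names split_hour and append_hms are hypothetical, and the time of day is assumed to be given as a fractional hour, as CMOR's comptime.hour is.

```c
#include <stdio.h>
#include <string.h>

/* Split a fractional hour of the day into whole hours, minutes and
   seconds, truncating each field the way the CMOR excerpt does. */
static void split_hour(double hour, int *hh, int *mm, int *ss)
{
    double frac = hour - (int)hour;   /* fraction of the current hour */

    *hh = (int)hour;
    *mm = (int)(frac * 60.);
    *ss = (int)(frac * 3600.) - *mm * 60;
}

/* Append "hhmm" (or "hhmmss" when with_seconds is nonzero) to buf,
   whose total capacity is size bytes. */
static void append_hms(char *buf, size_t size, double hour, int with_seconds)
{
    size_t len = strlen(buf);
    int hh, mm, ss;

    if (len + 1 >= size)
        return;                       /* no room left */
    split_hour(hour, &hh, &mm, &ss);
    if (with_seconds)
        snprintf(buf + len, size - len, "%.2i%.2i%.2i", hh, mm, ss);
    else
        snprintf(buf + len, size - len, "%.2i%.2i", hh, mm);
}
```

With it, a 3-hourly mean centered at 1.5 h gives 0130, and a sub-hourly sample at 37 min 30 sec past midnight (0.625 h) gives 003730 once seconds are included. Because the fields are truncated rather than rounded, an hour value that is not exactly representable in binary (such as 12.5 minutes, i.e. 5/24 h) could in principle lose a second.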

The bottom line is that the number of digits should depend only on the 
CMOR table, not the model.  Also, the DRS should be revised, because it 
implies that the precision required might depend on when sampling 
occurs, when in fact the precision is determined by the CMOR table 
(i.e., based only on the sampling interval).
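Put differently, under the thresholds in the CMOR code below, the length of each time stamp is a function of the sampling interval alone. A minimal C sketch of that mapping follows; stamp_digits is a hypothetical name, and the interval is assumed to be the nominal spacing in seconds implied by the table frequency (e.g., 30 days for monthly data).

```c
/* Hypothetical helper: number of digits in each time stamp as a function
   of the approximate sampling interval in seconds, using the CMOR
   thresholds. Nothing model-specific enters the calculation. */
static int stamp_digits(double interval)
{
    int digits = 4;                      /* yyyy is always present */

    if (interval < 29.E6) digits += 2;   /* month: sub-annual */
    if (interval < 2.E6)  digits += 2;   /* day: sub-monthly */
    if (interval < 86000) digits += 2;   /* hour: sub-daily */
    if (interval < 21000) digits += 2;   /* minutes: sub-6-hourly */
    if (interval < 3000)  digits += 2;   /* seconds: sub-hourly */
    return digits;
}
```

For nominal intervals of one year, 30 days, 1 day, 6 hours, 3 hours and 30 minutes this gives 4, 6, 8, 10, 12 and 14 digits respectively, whichever model produced the data.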

Best regards,
Karl

Note:  below, "interval" is the approximate interval of time (in seconds) 
between successive time-slices.

        /* first time point */
        strncat(outname,"_",CMOR_MAX_STRING-strlen(outname));
        snprintf(msg2,CMOR_MAX_STRING,"%.4ld",comptime.year);
        strncat(outname,msg2,CMOR_MAX_STRING-strlen(outname));
        if (interval<29.E6) { /* less than a year */
            snprintf(msg2,CMOR_MAX_STRING,"%.2i",comptime.month);
            strncat(outname,msg2,CMOR_MAX_STRING-strlen(outname));
        }
        if (interval<2.E6) { /* less than a month */
            snprintf(msg2,CMOR_MAX_STRING,"%.2i",comptime.day);
            strncat(outname,msg2,CMOR_MAX_STRING-strlen(outname));
        }
        if (interval<86000) { /* less than a day */
            snprintf(msg2,CMOR_MAX_STRING,"%.2i",(int)comptime.hour);
            strncat(outname,msg2,CMOR_MAX_STRING-strlen(outname));
        }
        if (interval<21000) { /* less than 6hr */
            /* from now on add 1 more level of precision since that frequency */
            /* ierr holds the minutes past the hour */
            ierr = (int)((comptime.hour-(int)(comptime.hour))*60.);
            snprintf(msg2,CMOR_MAX_STRING,"%.2i",ierr);
            strncat(outname,msg2,CMOR_MAX_STRING-strlen(outname));
        }
        if (interval<3000) { /* less than an hour */
            /* seconds past the minute */
            snprintf(msg2,CMOR_MAX_STRING,"%.2i",(int)((comptime.hour-(int)(comptime.hour))*3600.)-ierr*60);
            strncat(outname,msg2,CMOR_MAX_STRING-strlen(outname));
        }

        /* separator between first and last time */
        strncat(outname,"-",CMOR_MAX_STRING-strlen(outname));
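For anyone who wants to experiment with this outside CMOR, here is a self-contained C sketch of the same logic. It is not CMOR code: the comp_time struct stands in for CMOR's comptime, the helper names are hypothetical, every write is bounded by the space left in the buffer, and it assumes the last time point is formatted exactly like the first.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the calendar components CMOR keeps in comptime. */
typedef struct {
    long year;
    int month;
    int day;
    double hour;   /* fractional hour of day: 1.5 means 01:30 */
} comp_time;

/* Append value, zero-padded to at least width digits, without overrunning buf. */
static void append_num(char *buf, size_t size, int width, long value)
{
    size_t len = strlen(buf);

    if (len + 1 < size)
        snprintf(buf + len, size - len, "%.*ld", width, value);
}

/* Append one time stamp, adding finer units only as the sampling interval
   (seconds) shrinks; thresholds are those of the CMOR excerpt. */
static void append_stamp(char *buf, size_t size, comp_time t, double interval)
{
    double frac = t.hour - (int)t.hour;   /* fraction of the current hour */
    long minutes = (long)(frac * 60.);

    append_num(buf, size, 4, t.year);
    if (interval < 29.E6)                 /* less than a year */
        append_num(buf, size, 2, t.month);
    if (interval < 2.E6)                  /* less than a month */
        append_num(buf, size, 2, t.day);
    if (interval < 86000)                 /* less than a day */
        append_num(buf, size, 2, (long)t.hour);
    if (interval < 21000)                 /* less than 6 hours: minutes */
        append_num(buf, size, 2, minutes);
    if (interval < 3000)                  /* less than an hour: seconds */
        append_num(buf, size, 2, (long)(frac * 3600.) - minutes * 60);
}

/* Build "N1-N2" from the first and last time slices written to a file. */
static void time_range(char *buf, size_t size, comp_time first,
                       comp_time last, double interval)
{
    buf[0] = '\0';
    append_stamp(buf, size, first, interval);
    if (strlen(buf) + 1 < size)
        strcat(buf, "-");
    append_stamp(buf, size, last, interval);
}
```

For 3-hourly means from 01:30 to 22:30 on 22 October 2008 (interval 10800 s), time_range yields 200810220130-200810222230; for monthly data from January 1960 to December 1989 (interval taken as 30 days) it yields 196001-198912.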

On 2/9/12 9:10 AM, Estanislao Gonzalez wrote:
> Hi Karl, Charles,
>
> AFAIK these files were created by CMOR.  If this naming scheme is not 
> valid, CMOR should be changed to prevent this from happening. drslib is 
> a utility tool, but CMOR is part of QCL1, so this shouldn't be 
> allowed at all, IMHO.
>
> By chance I stumbled upon a subhourly file, which I wanted to 
> replicate, whose time stamps are specified down to the second. You 
> mentioned, just as the documentation says, that the finest 
> resolution is minutes. Is this acceptable? Does it make sense? (I'm 
> still waiting for an answer as to whether this was intentional at all, 
> since the seconds are always '00'.)
>
> Thanks,
> Estani
>
>
> On 09.02.2012 17:48, stephen.pascoe at stfc.ac.uk wrote:
>>
>> Thanks Karl,
>>
>> The code in the repository has been fixed to do the right thing with 
>> only one time-slice, but I'm glad we shouldn't be seeing these files in 
>> general, as other parts of the pipeline could be affected.
>>
>> Stephen.
>>
>> ---
>>
>> Stephen Pascoe  +44 (0)1235 445980
>>
>> Centre of Environmental Data Archival
>>
>> STFC Rutherford Appleton Laboratory, Harwell Oxford, Didcot OX11 0QX, UK
>>
>> *From:* Karl Taylor [mailto:taylor13 at llnl.gov]
>> *Sent:* 09 February 2012 16:40
>> *To:* Estanislao Gonzalez
>> *Cc:* Pascoe, Stephen (STFC,RAL,RALSP); legutke at dkrz.de; 
>> go-essp-tech at ucar.edu; Charles Doutriaux
>> *Subject:* Re: [Go-essp-tech] DRS question - temporal subset
>>
>> Hi Estani and Charles,
>>
>> Unless only a single time-slice is stored, both the beginning and 
>> ending times should be included in the file name, so the file you are 
>> considering has an incorrect name.  The bit about "sufficient 
>> suffixes" relates to the precision with which the times are specified 
>> (stopping, for example, at month for monthly means, but going all the 
>> way to minutes for sub-hourly).
>>
>> I don't think drslib needs to handle the case of a single time-slice 
>> for CMIP5 because I think CMOR invariably includes both the beginning 
>> and ending times (even if they are identical).  Were the files 
>> written with CMOR?
>>
>> Charles can tell us if I've said anything incorrect.
>>
>> thanks,
>> Karl
>>
>>
>>
>>
>> On 2/9/12 4:30 AM, Estanislao Gonzalez wrote:
>>
>> Sure,
>>
>> trans.filename_to_drs('clmcalipso_cf3hr_MPI-ESM-LR_amip_r1i1p1_200810221030.nc')
>> Traceback (most recent call last):
>>     File "<stdin>", line 1, in <module>
>>     File "/usr/local/cdat/lib/python2.6/site-packages/drslib-0.3.0a3-py2.6.egg/drslib/translate.py", line 413, in filename_to_drs
>>       t.filename_to_drs(context)
>>     File "/usr/local/cdat/lib/python2.6/site-packages/drslib-0.3.0a3-py2.6.egg/drslib/translate.py", line 355, in filename_to_drs
>>       context.drs.subset = (_to_date(n1), _to_date(n2), clim)
>>     File "/usr/local/cdat/lib/python2.6/site-packages/drslib-0.3.0a3-py2.6.egg/drslib/translate.py", line 503, in _to_date
>>       mo = re.match(r'(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?', date_str)
>>     File "/usr/local/cdat/lib/python2.6/re.py", line 137, in match
>>       return _compile(pattern, flags).match(string)
>> TypeError: expected string or buffer
>>
>> But I'm more concerned with the question of how it should be. Without
>> that info, in some cases you can't infer the end date (and thus the
>> start date of the next chunk) without knowing the calendar type. If you
>> need to open the file to read the calendar type and *then* infer the
>> name, the usefulness of having a naming scheme would be diminished.
>>
>> As always, I think we should comply with whatever rules and guidelines
>> we have; as a computer scientist I can't help but see the benefits of
>> homogeneity and the burden caused by exceptions.
>> That's why my question is: how should it be? I'm not sure I'm
>> interpreting the documentation properly.
>>
>> Thanks,
>> Estani
>>
>> On 09.02.2012 13:14, stephen.pascoe at stfc.ac.uk wrote:
>>
>>     I think drslib will cope -- if not, it's a bug -- but the product deduction code often needs both time bounds.
>>
>>     Can you send me the error message?
>>
>>     Thanks,
>>     Stephen.
>>
>>     -----Original Message-----
>>     From: go-essp-tech-bounces at ucar.edu [mailto:go-essp-tech-bounces at ucar.edu] On Behalf Of Estanislao Gonzalez
>>     Sent: 09 February 2012 11:06
>>     To: go-essp-tech at ucar.edu
>>     Cc: Stephanie Legutke
>>     Subject: [Go-essp-tech] DRS question - temporal subset
>>
>>     Hi all,
>>
>>     This is what the DRS reference says:
>>
>>     "Temporal Subsets: Time instants and periods (N1(-N2))
>>
>>     Time instants and periods will be represented by
>>     ‘yyyy[mm[dd[hh][mm]]][-clim]’, where ‘yyyy’, ‘mm’, ‘dd’, ‘hh’ ‘mm’
>>     are integer year, month, day, hour, and minute, respectively, and
>>     enough (and just enough) of the suffixes should be added to
>>     unambiguously resolve the interval between time-samples contained in
>>     the file or virtual file URL. (For example, monthly mean data would
>>     include “mm”, but not “dd”, “hh”, or “mm”; “subhr” data would include
>>     all suffixes.) The optional “-clim” is appended when the file contains
>>     a climatology. For example, a file with sampling frequency of “mo” and
>>     the time designation 196001-198912-clim represents the monthly mean
>>     climatology (12 time values) computed for the period extending from
>>     1/1960-12/1989. Note that the DRS does not explicitly specify the
>>     calendar type (e.g., Julian, Gregorian), but the calendar will be
>>     indicated by one of the attributes in each netCDF file."[1]
>>
>>     From the title I infer that only the start point is required as part
>>     of the temporal subset, though the text speaks about "enough
>>     information to unambiguously resolve the interval between
>>     time-samples".
>>
>>     The problem is that I get 3hr and subhr data with only the starting
>>     time-stamp, and drslib cannot cope with this.
>>     Should drslib be adapted to accept that case, or should all files be
>>     renamed?
>>
>>     What about the rest of the system? Has anyone already tried publishing
>>     such files? Does this make sense?
>>     And what was the intention of the parentheses in the Temporal Subsets
>>     title?
>>
>>     Thanks,
>>     Estani
>>
>>     [1] http://cmip-pcmdi.llnl.gov/cmip5/docs/cmip5_data_reference_syntax.pdf
>>
>
>
> -- 
> Estanislao Gonzalez
>
> Max-Planck-Institut für Meteorologie (MPI-M)
> Deutsches Klimarechenzentrum (DKRZ) - German Climate Computing Centre
> Room 108 - Bundesstrasse 45a, D-20146 Hamburg, Germany
>
> Phone:   +49 (40) 46 00 94-126
> E-Mail: gonzalez at dkrz.de