<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<font face="Times New Roman">Hi Stephen, Martin, and all,<br>
<br>
Thanks very much for thinking carefully about this. I've
responded to your input below:<br>
</font><br>
Stephen:<br>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Extending
the use of the "-suffix" part of temporal subset to include
averaging looks reasonable. The geographic subset section is
rather complex and I worry that it will be difficult to
implement unambiguous parsers for it. This may not matter
provided we can always interpret it as an opaque string in
filenames of the form:
"c1_c2_...cn_[temporal-subset]_[geospatial-info].nc". My
specific concerns about parsing are below. <o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size: 11pt; font-family:
"Calibri","sans-serif"; color: rgb(31, 73,
125);">Also, more generally, I wonder whether we are repeating
too much information from the CF metadata in the filename. I
think the temporal subset already pushes the limit of what can
be effectively represented in a filename, and this could push
it too far. Filenames within a dataset should be unique, but
maybe we could let data providers decide how they are labelled?
<br>
</span></p>
<p class="MsoNormal"><br>
Karl: Yes, this is an option. Including a uniform way of
embedding the time in the filename was essential, since we wanted
to be able to split time-series across files. The motivation for
treating simple spatial subsetting and averaging in a standard way
is that we hope to return requested regional datasets to users,
with the data extracted on the server side. Shouldn't a user expect
the files to be named similarly, even if they were created from
different ESG nodes?<br>
<span
style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size: 11pt; font-family:
"Calibri","sans-serif"; color: rgb(31, 73,
125);">Stephen:<br>
If we continue to add detailed syntax to the filename it would
greatly help to have a formal grammar in BNF notation
(<a class="moz-txt-link-freetext" href="http://en.wikipedia.org/wiki/Backus%E2%80%93Naur_Form">http://en.wikipedia.org/wiki/Backus%E2%80%93Naur_Form</a>).<br>
</span></p>
<p class="MsoNormal">Karl: I hope someone familiar with BNF might
do this if it's deemed important.<br>
<span
style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p><br>
Stephen:<br>
</o:p></span></p>
<p class="MsoNormal"><b><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Section
2.4 Geographic subsets<o:p></o:p></span></b></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">As
described, the format is "g[-XXXX][-YYYY]", where both XXXX and
YYYY are optional and YYYY = "[yyy][-zzz]". XXXX can be omitted
when YYYY is present as in the example "g-ocn-areaavg".<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size: 11pt; font-family:
"Calibri","sans-serif"; color: rgb(31, 73,
125);">I foresee problems in writing parsers that disambiguate
the case "g-XXXX" from "g-YYYY" particularly in the case where
XXXX is a named region. If we wanted to extend the valid
vocabulary of YYYY, we would have to check for clashes with all
named regions used in XXXX. This would seem like a hostage to
fortune, particularly if users start defining their own regions.</span></p>
<p class="MsoNormal">Karl: Perhaps, to simplify things, we should
require XXXX (and prohibit hyphens within XXXX).<br>
</p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Stephen:<br>
Similarly how do we disambiguate these cases:<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">g-XXXX-yyy<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">g-yyy-zzz<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">With
a sufficiently complex parser we can differentiate these, because
yyy and zzz are from controlled vocabularies, but writing a
generic parser that foresees extensions to these vocabularies
will be tricky and error-prone. <br>
<o:p></o:p></span></p>
<p class="MsoNormal">Karl: Requiring XXXX would eliminate this
problem.<br>
</p>
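<p>With XXXX required whenever YYYY is present and hyphens banned inside XXXX (Karl's proposal above), the geographic field can be split by position, so region names never have to be checked against the yyy/zzz vocabularies. The Python sketch below illustrates this under stated assumptions: the function name "parse_geo" is hypothetical, the yyy vocabulary is assumed to be just {"ocn", "lnd"} as mentioned in this thread, and the region name "global" used in the examples is illustrative only.</p>
<pre>
```python
# Hypothetical sketch: parse "g[-XXXX[-yyy][-zzz]]" assuming XXXX is
# present whenever anything follows "g" and contains no hyphens.

# Assumed surface-type vocabulary for yyy, taken from the examples in
# this thread ("ocn", "lnd"); the real list may grow.
SURFACE_TYPES = {"ocn", "lnd"}


def parse_geo(field):
    """Return (region, surface_type, operation); absent parts are None."""
    parts = field.split("-")
    # Expect "g", then at most a region and two qualifiers, none empty.
    if parts[0] != "g" or len(parts) > 4 or "" in parts:
        raise ValueError("malformed geographic field: " + field)
    if len(parts) == 1:
        return None, None, None  # bare "g": nothing further specified
    region = parts[1]  # always the second token, whatever it spells
    rest = parts[2:]
    surface = None
    # Only the short yyy vocabulary is consulted, never a gazetteer.
    if rest and rest[0] in SURFACE_TYPES:
        surface = rest.pop(0)
    if len(rest) > 1:
        raise ValueError("unexpected qualifiers in: " + field)
    operation = rest[0] if rest else None
    return region, surface, operation
```
</pre>
<p>For example, parse_geo("g-lat20S20Nlon170p5W130p5W-ocn-areaavg") returns ("lat20S20Nlon170p5W130p5W", "ocn", "areaavg"). A field with one qualifier, such as "g-global-areaavg", still needs the yyy list to decide that "areaavg" is not a surface type, but that check involves a short controlled list rather than every user-defined region.</p>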
<p class="MsoNormal">Martin:<br>
</p>
<blockquote
cite="mid:E21FBC3F00D7304687CB46529F9676D7266103E6@EXCHMBX01.fed.cclrc.ac.uk"
type="cite">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size: 11pt; font-family:
"Calibri","sans-serif"; color: rgb(31,
73, 125);">(1) Like Stephen, I’m concerned about the
complexity of the “XXXX” section. My first suggestion would
be to drop the first hyphen and “lat” and “lon”, changing
“g-lat20S20Nlon170p5W130p5W” to “g20S20N170p5W130p5W”. I’d
also be tempted to drop the “p5” terms: for some grids (e.g.
Gaussian) the exact limits will have many decimal places and
so there will need to be some specification of the level of
truncation expected, and I think the most convenient would
be to round to the nearest integer.</span></p>
</div>
</blockquote>
<br>
Karl: I like the suggestion to round to the nearest integer. I'd
like to hear others weigh in on whether to eliminate "lat" and
"lon". I guess this would be o.k.<br>
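<p>Martin's two suggestions can be checked together: if each bound is rounded to the nearest integer and tagged with a hemisphere letter, latitudes (N/S) and longitudes (E/W) stay distinguishable without "lat" and "lon". The Python sketch below assumes longitudes in the range -180 to 180, bounds given in the order south, north, west, east, and ties rounded away from zero; none of these choices is fixed by the text quoted here, and "bounds_token" and "parse_bounds" are hypothetical names. It builds only the XXXX part, leaving the "g" or "g-" prefix to whichever convention is adopted.</p>
<pre>
```python
import math
import re


def _fmt(value, pos, neg):
    """Round |value| to the nearest integer (ties away from zero) and tag it."""
    mag = int(math.floor(abs(value) + 0.5))
    # A bound that rounds to 0 gets the positive letter (e.g. "0N", "0E").
    return str(mag) + (pos if value >= 0 or mag == 0 else neg)


def bounds_token(lat_s, lat_n, lon_w, lon_e):
    """Encode a lat/lon box as e.g. '20S20N171W131W' (no 'lat'/'lon')."""
    return (_fmt(lat_s, "N", "S") + _fmt(lat_n, "N", "S")
            + _fmt(lon_w, "E", "W") + _fmt(lon_e, "E", "W"))


# N/S mark latitudes and E/W mark longitudes, so the token still parses.
_TOKEN = re.compile(r"(\d+)([NS])(\d+)([NS])(\d+)([EW])(\d+)([EW])")


def parse_bounds(token):
    """Decode a token back to signed integer degrees (S and W negative)."""
    m = _TOKEN.fullmatch(token)
    if m is None:
        raise ValueError("not a bounds token: " + token)
    g = m.groups()
    sign = {"N": 1, "S": -1, "E": 1, "W": -1}
    return tuple(sign[g[i + 1]] * int(g[i]) for i in range(0, 8, 2))
```
</pre>
<p>Applied to the region in Martin's example, bounds_token(-20, 20, -170.5, -130.5) gives "20S20N171W131W", and parse_bounds on that string returns (-20, 20, -171, -131).</p>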
<br>
<blockquote
cite="mid:E21FBC3F00D7304687CB46529F9676D7266103E6@EXCHMBX01.fed.cclrc.ac.uk"
type="cite">
<div class="WordSection1">
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">(2)
To make parsing of the overall file name easier, you could
use c1_c2_..... [_&lt;time range&gt;][.&lt;spatial
info&gt;].nc; using a “.” instead of “_” makes life easier
for file parsers. Technically this is not necessary, as the
“g” already makes it unambiguous, but parsers have to deal
with the special case of gridspec files, and adding more
variants makes life more complicated. Using “.” will make it
easier to separate the parsing of the existing components
from the new ones. <o:p></o:p></span></p>
</div>
</blockquote>
Karl: I don't find this argument compelling. I think it's pretty
easy to write a parser that can deal with the two optional suffixes
(i.e., the temporal subset and the geographical info). The first consists
only of numerals (and a hyphen), whereas the second begins with
"g-". Also, I think some software doesn't like "." in filenames except to
separate the final "file-type" suffix (e.g., ".nc").<br>
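<p>Karl's point can be illustrated with a short Python sketch that peels the two optional suffixes off the end of the name: a last field beginning with "g-" is the geographic info, and a remaining last field made of numerals and hyphens is the temporal subset. The function name, the exact temporal pattern (including an optional lowercase "-suffix" such as the averaging suffix Stephen mentions), and the model name "MODEL" in the examples are assumptions for illustration; the geographic field is kept as an opaque string, as Stephen proposed.</p>
<pre>
```python
import re

# Assumed shape of the optional temporal subset: numerals, an optional
# "-" and end time, and an optional lowercase "-suffix" (e.g. for means).
TEMPORAL = re.compile(r"\d+(-\d+)?(-[a-z]+)?")


def split_drs_filename(name):
    """Split 'c1_..._cn[_temporal][_g-...].nc' into (components, temporal, geo).

    The geographic field is returned as an opaque string; absent optional
    fields are None. Assumes no c1..cn component is purely numeric.
    """
    if not name.endswith(".nc"):
        raise ValueError("expected a .nc file name: " + name)
    fields = name[: -len(".nc")].split("_")
    geo = None
    temporal = None
    # Geographic info, when present, is the last field and starts with "g-".
    if fields[-1].startswith("g-"):
        geo = fields.pop()
    # The temporal subset, when present, is then the last field.
    if fields and TEMPORAL.fullmatch(fields[-1]):
        temporal = fields.pop()
    return fields, temporal, geo
```
</pre>
<p>For the file name in Jamie's file_id example, split_drs_filename("vo_Omon_CNRM-CM5_rcp45_r1i1p1_203601-204512.nc") returns (["vo", "Omon", "CNRM-CM5", "rcp45", "r1i1p1"], "203601-204512", None).</p>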
<blockquote
cite="mid:E21FBC3F00D7304687CB46529F9676D7266103E6@EXCHMBX01.fed.cclrc.ac.uk"
type="cite">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size: 11pt; font-family:
"Calibri","sans-serif"; color: rgb(31,
73, 125);">(3) As Stephen points out, in the present form,
in a string “g-aaa-bbb” the term “aaa” could be either a
region from a gazetteer or a designation of a type of
surface (“ocn” or “lnd”). Having to look through multiple
vocabularies is a problem for file name interpretation, even
if one of them has only two elements. To get over this, I’d
suggest something like: “.....[_&lt;time
range&gt;][.gXXXX_pYYY-ZZZ].nc”, where the “gXXXX” and
“pYYY-ZZZ” terms are both optional and the underscore is
only present if both are present. This approach will only
work if you accept the use of “.” suggested in (2) to make a
clear break between the first part of the name and the new.
This would give us a first section of the file name in which
components are identified by position and a 2<sup>nd</sup>
section in which components are identified by the first
letter of the component.</span></p>
</div>
</blockquote>
Karl: I prefer simply requiring XXXX if you want to include YYYY.<br>
<br>
<br>
I've made the changes inspired by your input in the attached file.
Further comments/suggestions are welcome.<br>
<br>
Best regards,<br>
Karl<br>
<blockquote
cite="mid:E21FBC3F00D7304687CB46529F9676D7266103E6@EXCHMBX01.fed.cclrc.ac.uk"
type="cite">
<div class="WordSection1">
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Regards,<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Martin
<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<div style="border:none;border-left:solid blue 1.5pt;padding:0cm
0cm 0cm 4.0pt">
<div>
<div style="border:none;border-top:solid #B5C4DF
1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal"><b><span
style="font-size:10.0pt;font-family:"Tahoma","sans-serif";color:windowtext"
lang="EN-US">From:</span></b><span
style="font-size:10.0pt;font-family:"Tahoma","sans-serif";color:windowtext"
lang="EN-US"> Karl Taylor [<a class="moz-txt-link-freetext" href="mailto:taylor13@llnl.gov">mailto:taylor13@llnl.gov</a>]
<br>
<b>Sent:</b> 06 June 2012 21:58<br>
<b>To:</b> Kettleborough, Jamie; V. Balaji; Steve
Hankin; Juckes, Martin (STFC,RAL,RALSP); Lawrence,
Bryan (STFC,RAL,RALSP); Pascoe, Stephen
(STFC,RAL,RALSP); <a class="moz-txt-link-abbreviated" href="mailto:go-essp-tech@ucar.edu">go-essp-tech@ucar.edu</a><br>
<b>Subject:</b> Re: [Go-essp-tech] DRS corrections and
extensions<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Dear all,<br>
<br>
In February I asked for comments on my proposal to extend
the DRS to include information about spatio-temporal
subsets or means. I heard from Jamie, but no one else. I
respond to Jamie below, but I would also like your input
specifically about:<br>
<br>
1. Is this method of describing spatio-temporal subsets
acceptable?<br>
2. Is it worth taking this step if we don't say anything
about other "processed" output, for example how to
describe "regridded" data or multi-model means?<br>
<br>
I've attached the proposed version of the DRS, which differs
from the one I sent in January only in a couple of modifications
made in response to Jamie.<br>
<br>
Best regards,<br>
Karl<br>
<br>
On 2/13/12 6:47 AM, Kettleborough, Jamie wrote: <o:p></o:p></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:blue">Hello
Karl,</span><o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:blue">this
will be terse, as I have time to review but not
necessarily to get the words right; I hope I don't say
anything too bad because of this.</span><o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:blue">1.
section 2.3: I'm not sure 'output' should be mentioned under
'product'. I don't think 'output' ever makes it to
publication level, so it does not need to appear in a
publication-level id. I know CMOR produces it, but I
think that's kind of historical, isn't it, rather than
necessary? Maybe it's too late for details like this?</span><o:p></o:p></p>
<p class="MsoNormal">It's true that in the end the CMIP5
output should not remain as "output", but be assigned to
"output1" or "output2". Nevertheless, I don't think there
is any harm in keeping it in the DRS.
<br>
<br>
<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:blue">2.
section 2.3 version number: to be consistent with what we
really have in CMIP5 I think you need to note that v1, v2
are also present, though any *new* versions should use
vYYYYMMDD.</span><o:p></o:p></p>
<p class="MsoNormal">I have modified the text to indicate that
software cannot rely on the version number reflecting a
date.<br>
<br>
<o:p></o:p></p>
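<p>Because both "v1"-style and "vYYYYMMDD"-style versions occur, software that wants a date has to treat it as optional. A minimal Python sketch of that check follows; the function name is hypothetical, and it simply returns None for any version that is not an eight-digit valid date rather than guessing.</p>
<pre>
```python
import datetime
import re

_VERSION = re.compile(r"v(\d+)")


def version_date(version):
    """Return the date encoded in a 'vYYYYMMDD' version, or None.

    Versions such as 'v1' or 'v2' also occur, so callers must not assume
    that every version carries a date.
    """
    m = _VERSION.fullmatch(version)
    if m is None:
        raise ValueError("not a DRS version: " + version)
    digits = m.group(1)
    if len(digits) != 8:
        return None  # e.g. 'v1': a plain counter, not a date
    try:
        return datetime.datetime.strptime(digits, "%Y%m%d").date()
    except ValueError:
        return None  # eight digits, but not a real calendar date
```
</pre>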
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:blue">3.
section 2.3 version: I wonder if you need to say more
(maybe not here, but if not, where?) about what triggers a
new version. I think it's
</span><o:p></o:p></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:blue">
a. anything that changes the content of a file already
published and</span><o:p></o:p></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:blue">
b. the addition or deletion of files from any publication
data set. </span><o:p></o:p></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:blue"> Pure
'data management' metadata changes (addition of
checksums, moves to new URLs) need not trigger a new
version.</span><o:p></o:p></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:blue">
Do you also need to say there is no guarantee that old
versions will be kept (unless they have a DOI)?</span><o:p></o:p></p>
<p class="MsoNormal">I've added some of this information now
to the document.<br>
<br>
<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:blue">4.
section 2.4 Temporal Subsets or means: I don't understand
the 'avg' example, or, if I do, I don't know if it's right
(but the point is relatively minor). I think the example
you quote has one 6-month mean field in it, based
on 1-day means. I think it's a little anomalous to keep
the frequency as 'day' in this case. That's not quite
consistent with the definition (and, I think, all other
uses) of frequency. Strictly speaking, frequency should be
6mon, no? (I may have misunderstood.)</span><o:p></o:p></p>
<p class="MsoNormal">I think you're right. I'm not sure why I
thought this was the right way to do it. I've changed the
example.
<br>
<br>
<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:blue">5.
section 3.5. Does this need clarifying? I think the
current wording is potentially confusing; I think it
should say something like:</span><o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:blue">'URLs
referencing the data files will have a site dependent
prefix (that may change due to site-specific data
management tasks) followed by the directory structure.
This directory structure should (but may not) follow the
recommendations of section 3.3'</span><o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">I've modified the text as suggested.<br>
<br>
<o:p></o:p></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:blue">6.
I've noticed that the THREDDS catalogs also expose a thing
called the file_id, e.g.:</span><o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:blue"><property
name="file_id"
value="cmip5.output1.CNRM-CERFACS.CNRM-CM5.rcp45.mon.ocean.Omon.r1i1p1.vo_Omon_CNRM-CM5_rcp45_r1i1p1_203601-204512.nc"/></span><o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:blue">I
don't know if they need a mention as being anything
important (we don't use them as they don't give any
version info).</span><o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">We've already given 5 use cases, which I
think is enough. The DRS is used in a number of other ways.<br>
<br>
<o:p></o:p></p>
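<p>For reference, the file_id in Jamie's example appears to be the dataset identifier (without a version) joined to the file name by a "."; a small Python sketch that separates the two follows. The count of nine dataset fields is taken from that single example, not from the DRS text, and the function name is hypothetical.</p>
<pre>
```python
# Assumed number of "."-separated fields before the file name, counted
# from Jamie's example (cmip5 ... r1i1p1, with no version field).
N_DATASET_FIELDS = 9


def split_file_id(file_id):
    """Split a THREDDS file_id into (dataset_id, filename)."""
    # Split only at the first nine dots, so the ".nc" stays in the name.
    parts = file_id.split(".", N_DATASET_FIELDS)
    if len(parts) != N_DATASET_FIELDS + 1:
        raise ValueError("unexpected file_id: " + file_id)
    return ".".join(parts[:-1]), parts[-1]
```
</pre>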
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:blue">Hope
this is useful,</span><o:p></o:p></p>
<p class="MsoNormal">Yes, thanks very much!<br>
Karl<br>
<br>
<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:blue">Jamie</span><o:p></o:p></p>
<blockquote style="border:none;border-left:solid blue
1.5pt;padding:0cm 0cm 0cm
4.0pt;margin-left:3.75pt;margin-top:5.0pt;margin-right:0cm;margin-bottom:5.0pt">
<p class="MsoNormal"><o:p> </o:p></p>
<div class="MsoNormal" style="text-align:center"
align="center"><span lang="EN-US">
<hr align="center" size="2" width="100%">
</span></div>
<p class="MsoNormal" style="margin-bottom:12.0pt"><b><span
style="font-size:10.0pt;font-family:"Tahoma","sans-serif""
lang="EN-US">From:</span></b><span
style="font-size:10.0pt;font-family:"Tahoma","sans-serif""
lang="EN-US">
<a moz-do-not-send="true"
href="mailto:go-essp-tech-bounces@ucar.edu">go-essp-tech-bounces@ucar.edu</a>
[<a moz-do-not-send="true"
href="mailto:go-essp-tech-bounces@ucar.edu">mailto:go-essp-tech-bounces@ucar.edu</a>]
<b>On Behalf Of </b>Karl Taylor<br>
<b>Sent:</b> 10 February 2012 01:32<br>
<b>To:</b> V. Balaji; Steve Hankin; Martin Juckes; Bryan
Lawrence; Stephen Pascoe;
<a moz-do-not-send="true"
href="mailto:go-essp-tech@ucar.edu">go-essp-tech@ucar.edu</a><br>
<b>Subject:</b> [Go-essp-tech] DRS corrections and
extensions</span><span lang="EN-US"><o:p></o:p></span></p>
<p class="MsoNormal">Dear all,<br>
<br>
Attached is my attempt to make the DRS consistent with
CMIP5 (in describing the precision of "time instants"),
but primarily to extend it to a more complete treatment of
spatio-temporal subsets or means. I've also corrected a
few typos.<br>
<br>
Comments most welcome. In particular could someone
recheck sections 3.3-3.5 (which haven't been changed by
me) to see if they remain consistent with CMIP5?<br>
<br>
thanks and best regards,<br>
Karl<o:p></o:p></p>
</blockquote>
</div>
</div>
<br>
<br>
</blockquote>
<br>
<br>
</body>
</html>