<html>

  <head>

    <meta content="text/html; charset=windows-1252"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <font face="Times New Roman">Hi Stephen, Martin, and all,<br>

      <br>

      Thanks very much for thinking carefully about this.  I've

      responded to your input below:<br>

    </font><br>

    Stephen:<br>

    <p class="MsoNormal"><span

style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1F497D">Extending

        the use of the "-suffix" part of temporal subset to include

        averaging looks reasonable.  The geographic subset section is

        rather complex and I worry that it will be difficult to

        implement unambiguous parsers for it.  This may not matter

        provided we can always interpret it as an opaque string in

        filenames of the form:

        "c1_c2_...cn_[temporal-subset]_[geospatial-info].nc".  My

        specific concerns about parsing are below.  <o:p></o:p></span></p>

    <p class="MsoNormal"><span style="font-size: 11pt; font-family:

        &quot;Calibri&quot;,&quot;sans-serif&quot;; color: rgb(31, 73,

        125);">Also, more generally, I wonder whether we are repeating

        too much information from the CF metadata in the filename.  I

        think the temporal subset  is already pushing to the limit what

        can be effectively represented in a filename and this could push

        it too far.  Filenames within a dataset should be unique but

        maybe we could let data providers decide how they are labelled? 

        <br>

      </span></p>

    <p class="MsoNormal"><br>

      Karl:  Yes, this is an option.  Including a uniform way of

      embedding the time in the filename was essential since we wanted

      to be able to split time-series across files.  The motivation for

      treating simple spatial subsetting and averaging in a standard way

      is that we hope to return to users requested regional datasets,

      extracting the data on the server side.  Shouldn't a user expect

      the files to be named similarly, even if they were created from

      different ESG nodes?<br>

      <span

style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1F497D"><o:p></o:p></span></p>

    <p class="MsoNormal"><span style="font-size: 11pt; font-family:

        &quot;Calibri&quot;,&quot;sans-serif&quot;; color: rgb(31, 73,

        125);">Stephen:<br>

        If we continue to add detailed syntax to the filename it would

        greatly help to have a formal grammar in BNF notation

        (<a class="moz-txt-link-freetext" href="http://en.wikipedia.org/wiki/Backus%E2%80%93Naur_Form">http://en.wikipedia.org/wiki/Backus%E2%80%93Naur_Form</a>).<br>

      </span></p>

    <p class="MsoNormal">Karl:  I hope someone familiar with BNF might

      do this if it's deemed important.<br>

      <span

style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1F497D"><o:p><br>

          Stephen:<br>

        </o:p></span></p>

    <p class="MsoNormal"><b><span

style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1F497D">Section

          2.4 Geographic subsets<o:p></o:p></span></b></p>

    <p class="MsoNormal"><span

style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1F497D">As

        described the format is "g[-XXXX][-YYYY]" where both XXXX and

        YYYY are optional and YYYY = "[yyy][-zzz]". XXXX can be omitted

        when YYYY is present as in the example "g-ocn-areaavg".<o:p></o:p></span></p>

    <p class="MsoNormal"><span style="font-size: 11pt; font-family:

        &quot;Calibri&quot;,&quot;sans-serif&quot;; color: rgb(31, 73,

        125);">I foresee problems in writing parsers that disambiguate

        the case "g-XXXX" from "g-YYYY" particularly in the case where

        XXXX is a named region.  If we wanted to extend the valid

        vocabulary of YYYY we would have to check for clashes with all

        named regions used in XXXX.  This would seem like a hostage to

        fortune, particularly if users start defining their own regions.</span></p>

    <p class="MsoNormal"> Karl:  Perhaps to simplify things we should

      require XXXX (and prohibit hyphens within XXXX).  <br>

    </p>

    <p class="MsoNormal"><span

style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1F497D">Stephen:<br>

        Similarly how do we disambiguate these cases:<o:p></o:p></span></p>

    <p class="MsoNormal"><span

style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1F497D">g-XXXX-yyy<o:p></o:p></span></p>

    <p class="MsoNormal"><span

style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1F497D">g-yyy-zzz<o:p></o:p></span></p>

    <p class="MsoNormal"><span

style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1F497D">With

        a sufficiently complex parser we can differentiate these because

        yyy and zzz are from controlled vocabularies but writing a

        generic parser that foresees extensions to these vocabularies

        will be tricky and error-prone.  <br>

        <o:p></o:p></span></p>

    <p class="MsoNormal"> Karl:  requiring XXXX would eliminate this

      problem.<br>

    </p>
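    <p class="MsoNormal">A minimal Python sketch (an illustration only, not
      part of the DRS proposal) of why requiring XXXX helps: if XXXX is always
      present and contains no hyphen, a parser can pick it out by position
      alone.  The function name parse_geo and the limit of two trailing
      qualifiers are assumptions made for this example.</p>
<pre>
```python
def parse_geo(component):
    """Split 'g-XXXX[-yyy][-zzz]' by position.

    Assumes XXXX is required and contains no hyphen, as suggested above.
    """
    tokens = component.split("-")
    # Expect "g", then XXXX, then at most two qualifiers, none of them empty.
    if len(tokens) not in (2, 3, 4) or tokens[0] != "g" or "" in tokens:
        raise ValueError("not a geographic component: " + component)
    region = tokens[1]       # XXXX is always the second token
    qualifiers = tokens[2:]  # yyy and/or zzz, left uninterpreted here
    return region, qualifiers


if __name__ == "__main__":
    # Hypothetical component built from the examples in this thread.
    print(parse_geo("g-lat20S20Nlon170p5W130p5W-ocn-areaavg"))
```
</pre>
    <p class="MsoNormal">Under this rule the region is found without
      consulting any vocabulary.  Telling yyy from zzz when only one of them
      is present would still depend on the controlled vocabularies.</p>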

    <p class="MsoNormal">Martin:<br>

    </p>

    <blockquote

cite="mid:E21FBC3F00D7304687CB46529F9676D7266103E6@EXCHMBX01.fed.cclrc.ac.uk"

      type="cite">

      <div class="WordSection1">

        <p class="MsoNormal"><span style="font-size: 11pt; font-family:

            &quot;Calibri&quot;,&quot;sans-serif&quot;; color: rgb(31,

            73, 125);">(1) Like Stephen, I’m concerned about the

            complexity of the “XXXX” section. My first suggestion would

            be to drop the first hyphen and “lat” and “lon”, changing

            “g-lat20S20Nlon170p5W130p5W” to “g20S20N170p5W130p5W”. I’d

            also be tempted to drop the “p5” terms: for some grids (e.g.

            Gaussian) the exact limits will have many decimal places and

            so there will need to be some specification of the level of

            truncation expected, and I think the most convenient would

            be to round to the nearest integer.</span></p>

      </div>

    </blockquote>

    <br>

    Karl:  I like the suggestion to round to the nearest integer.  I'd

    like to hear others weigh in on whether to eliminate "lat" and

    "lon".   I guess this would be o.k.<br>

    <br>
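    <p class="MsoNormal">For illustration, a Python sketch of the compact,
      integer-rounded form Martin suggests.  The function names, the use of
      signed degrees as input, and the choice to round halves upward are all
      assumptions; the proposal does not yet fix a rounding rule.</p>
<pre>
```python
import math


def round_half_up(value):
    # Python's built-in round() rounds halves to even, so use floor instead.
    return int(math.floor(value + 0.5))


def bound(value, positive, negative):
    """Format one signed bound as whole degrees plus a hemisphere letter."""
    # value != abs(value) is true only for negative values.
    letter = negative if value != abs(value) else positive
    return str(round_half_up(abs(value))) + letter


def geo_box(south, north, west, east):
    """Build the compact form without the "lat" and "lon" markers."""
    return ("g" + bound(south, "N", "S") + bound(north, "N", "S")
            + bound(west, "E", "W") + bound(east, "E", "W"))


if __name__ == "__main__":
    # The box from the example above: 20S to 20N, 170.5W to 130.5W.
    print(geo_box(-20, 20, -170.5, -130.5))
```
</pre>
    <p class="MsoNormal">With these assumptions the example box becomes
      "g20S20N171W131W".  A different rule for halves would give different
      strings, so the rule would need to be stated in the DRS.</p>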

    <blockquote

cite="mid:E21FBC3F00D7304687CB46529F9676D7266103E6@EXCHMBX01.fed.cclrc.ac.uk"

      type="cite">

      <div class="WordSection1">

        <p class="MsoNormal"><span

style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1F497D">(2)

            To make parsing of the overall file name easier, you could

            use c1_c2_..... [_&lt;time range&gt;][.&lt;spatial

            info&gt;].nc – using a “.” instead of “_” makes life easier

            for file parsers. Technically this is not necessary, as the

            “g” already makes it unambiguous, but parsers have to deal

            with the special case of gridspec files and adding more

            variants makes life more complicated. Using “.” will make it

            easier to separate the parsing of the existing components

            from the new ones. <o:p></o:p></span></p>

      </div>

    </blockquote>

    Karl:  I don't find this argument compelling.  I think it's pretty

    easy to write a parser that can deal with the two optional suffixes

    (i.e., temporal subset and geographical info.)  The first consists

    of only numerals (and a hyphen), whereas the second begins with

    "g-".  I think some software doesn't like "." in filenames except to

    separate the final "file-type" suffix (e.g., ".nc").<br>
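    <p class="MsoNormal">A minimal Python sketch of such a parser, assuming
      the two shapes described above: a temporal subset made of digits and a
      hyphen, and geographical info beginning with "g-".  The function name
      split_suffixes is hypothetical, and any further "-suffix" on the
      temporal subset is not handled here.</p>
<pre>
```python
import re

# Assumed shape of the temporal subset: digits, a hyphen, digits.
TEMPORAL = re.compile(r"^[0-9]+-[0-9]+$")


def split_suffixes(filename):
    """Return (leading components, temporal subset, geographical info)."""
    stem = filename[:-3] if filename.endswith(".nc") else filename
    parts = stem.split("_")
    geo = None
    temporal = None
    # Geographical info, if present, is the last component.
    if parts and parts[-1].startswith("g-"):
        geo = parts.pop()
    # The temporal subset, if present, is now the last component.
    if parts and TEMPORAL.match(parts[-1]):
        temporal = parts.pop()
    return parts, temporal, geo


if __name__ == "__main__":
    # The second name is hypothetical: a CMIP5 name plus a geographic suffix.
    print(split_suffixes("vo_Omon_CNRM-CM5_rcp45_r1i1p1_203601-204512.nc"))
    print(split_suffixes(
        "vo_Omon_CNRM-CM5_rcp45_r1i1p1_203601-204512_g-lat20S20Nlon170p5W130p5W.nc"))
```
</pre>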

    <blockquote

cite="mid:E21FBC3F00D7304687CB46529F9676D7266103E6@EXCHMBX01.fed.cclrc.ac.uk"

      type="cite">

      <div class="WordSection1">

        <p class="MsoNormal"><span style="font-size: 11pt; font-family:

            &quot;Calibri&quot;,&quot;sans-serif&quot;; color: rgb(31,

            73, 125);">(3) As Stephen points out, in the present form,

            in a string “g-aaa-bbb” the term “aaa” could be either a

            region from a gazetteer or a designation of a type of

            surface (“ocn” or “lnd”). Having to look through multiple

            vocabularies is a problem for file name interpretation, even

            if one of them only has two elements. To get over this, I’d

            suggest something like: “.....[_&lt;time

            range&gt;][.gXXXX_pYYY-ZZZ].nc”, where the “gXXXX” and

            “pYYY-ZZZ” terms are both optional and the underscore is

            only present if both are present. This approach will only

            work if you accept the use of “.” suggested in (2) to make a

            clear break between the first part of the name and the new.

            This would give us a first section of the file name in which

            components are identified by position and a 2<sup>nd</sup>

            section in which components are identified by the first

            letter of the component.</span></p>

      </div>

    </blockquote>

    Karl:  I prefer simply requiring XXXX if you want to include YYYY.<br>

    <br>

    <br>

    I've made the changes inspired by your input in the attached file. 

    Further comments/suggestions are welcome.<br>

    <br>

    Best regards,<br>

    Karl<br>

    <blockquote

cite="mid:E21FBC3F00D7304687CB46529F9676D7266103E6@EXCHMBX01.fed.cclrc.ac.uk"

      type="cite">

      <div class="WordSection1">

        <p class="MsoNormal"><span

style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1F497D"><o:p></o:p></span></p>

        <p class="MsoNormal"><span

style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1F497D"><o:p> </o:p></span></p>

        <p class="MsoNormal"><span

style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1F497D">Regards,<o:p></o:p></span></p>

        <p class="MsoNormal"><span

style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1F497D">Martin

              <o:p></o:p></span></p>

        <p class="MsoNormal"><span

style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1F497D"><o:p> </o:p></span></p>

        <div style="border:none;border-left:solid blue 1.5pt;padding:0cm

          0cm 0cm 4.0pt">

          <div>

            <div style="border:none;border-top:solid #B5C4DF

              1.0pt;padding:3.0pt 0cm 0cm 0cm">

              <p class="MsoNormal"><b><span

style="font-size:10.0pt;font-family:&quot;Tahoma&quot;,&quot;sans-serif&quot;;color:windowtext"

                    lang="EN-US">From:</span></b><span

style="font-size:10.0pt;font-family:&quot;Tahoma&quot;,&quot;sans-serif&quot;;color:windowtext"

                  lang="EN-US"> Karl Taylor [<a class="moz-txt-link-freetext" href="mailto:taylor13@llnl.gov">mailto:taylor13@llnl.gov</a>]

                  <br>

                  <b>Sent:</b> 06 June 2012 21:58<br>

                  <b>To:</b> Kettleborough, Jamie; V. Balaji; Steve

                  Hankin; Juckes, Martin (STFC,RAL,RALSP); Lawrence,

                  Bryan (STFC,RAL,RALSP); Pascoe, Stephen

                  (STFC,RAL,RALSP); <a class="moz-txt-link-abbreviated" href="mailto:go-essp-tech@ucar.edu">go-essp-tech@ucar.edu</a><br>

                  <b>Subject:</b> Re: [Go-essp-tech] DRS corrections and

                  extensions<o:p></o:p></span></p>

            </div>

          </div>

          <p class="MsoNormal"><o:p> </o:p></p>

          <p class="MsoNormal">Dear all,<br>

            <br>

            In February I asked for comments on my proposal to extend

            the DRS to  include information about spatio-temporal

            subsets or means.  I heard from Jamie, but no one else.  I

            respond to Jamie below, but I also would like your input

            specifically about:<br>

            <br>

            1.  Is this method of describing spatio-temporal subsets

            acceptable?<br>

            2.  Is it worth taking this step if we don't say anything

            about other "processed" output?   For example how to

            describe "regridded" data or multi-model means.<br>

            <br>

            I've attached the proposed version of the DRS, which differs

            from the one I sent in January only in a couple mods made in

            response to Jamie.<br>

            <br>

            Best regards,<br>

            Karl<br>

            <br>

            On 2/13/12 6:47 AM, Kettleborough, Jamie wrote: <o:p></o:p></p>

          <p class="MsoNormal"><span

style="font-size:10.0pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;color:blue">Hello

              Karl,</span><o:p></o:p></p>

          <p class="MsoNormal"> <o:p></o:p></p>

          <p class="MsoNormal"><span

style="font-size:10.0pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;color:blue">this

              will be terse as I have time to review, but not to

              necessarily get the words right - hope I don't say

              anything too bad because of this.</span><o:p></o:p></p>

          <p class="MsoNormal"> <o:p></o:p></p>

          <p class="MsoNormal"><span

style="font-size:10.0pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;color:blue">1.

              section 2.3,  Not sure 'output' should be mentioned under

              'product'.  I don't think 'output' ever makes it to

              publication level, so does not need to appear in a

              publication level id.  I know cmor produces it, but I

              think that's kind of historical isn't it, rather than

              necessary?  Maybe it's too late for details like this?</span><o:p></o:p></p>

          <p class="MsoNormal">It's true that in the end the CMIP5

            output should not remain as "output", but be assigned to

            "output1" or "output2".  Nevertheless, I don't think there

            is any harm in keeping it in the DRS. 

            <br>

            <br>

            <o:p></o:p></p>

          <p class="MsoNormal"> <o:p></o:p></p>

          <p class="MsoNormal"><span

style="font-size:10.0pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;color:blue">2.

              section 2.3 version number: to be consistent with what we

              really have in CMIP5 I think you need to note that v1, v2

              are also present, though any *new* versions should use

              vYYYYMMDD.</span><o:p></o:p></p>

          <p class="MsoNormal">I have modified the text to indicate that

            software cannot rely on the version number reflecting a

            date.<br>

            <br>

            <o:p></o:p></p>

          <p class="MsoNormal"> <o:p></o:p></p>

          <p class="MsoNormal"><span

style="font-size:10.0pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;color:blue">3.

              section 2.3 version:  I wonder if you need to say more

              (maybe not here, but if not where?) about what triggers a

              new version.  I think it's

            </span><o:p></o:p></p>

          <p class="MsoNormal"><span

style="font-size:10.0pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;color:blue"> 

              a. anything that changes the content of a file already

              published and</span><o:p></o:p></p>

          <p class="MsoNormal"><span

style="font-size:10.0pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;color:blue"> 

              b. the addition or deletion of files from any publication

              data set. </span><o:p></o:p></p>

          <p class="MsoNormal"><span

style="font-size:10.0pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;color:blue">  Pure

              'data management' meta data changes (addition of

              checksums, move to new URLs) need not trigger a new

              version.</span><o:p></o:p></p>

          <p class="MsoNormal"><span

style="font-size:10.0pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;color:blue"> 

              Do you also need to say there is no guarantee that old

              versions will be kept (unless they have a DOI)?</span><o:p></o:p></p>

          <p class="MsoNormal">I've added some of this information now

            to the document.<br>

            <br>

            <o:p></o:p></p>

          <p class="MsoNormal"> <o:p></o:p></p>

          <p class="MsoNormal"><span

style="font-size:10.0pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;color:blue">4.

              section 2.4 Temporal Subsets or means: I don't understand

              the 'avg' example, or if I do I don't know if it's right

              (but the point is relatively minor).  I think the example

              you quote has one 6-month mean field in it.  This is based

              on 1 day means.  I think it's a little anomalous to keep

              the frequency as 'day' in this case.  That's not quite

              consistent with the definition (and I think all other

              uses) of frequency.  Strictly speaking frequency should be

              6mon no?  (I may have misunderstood).</span><o:p></o:p></p>

          <p class="MsoNormal">I think you're right.  I'm not sure why I

            thought this was the right way to do it.  I've changed the

            example.

            <br>

            <br>

            <o:p></o:p></p>

          <p class="MsoNormal"> <o:p></o:p></p>

          <p class="MsoNormal"><span

style="font-size:10.0pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;color:blue">5.

              section 3.5.  Does this need clarifying? I think the

              current wording is potentially confusing,  I think it

              should say something like:</span><o:p></o:p></p>

          <p class="MsoNormal"> <o:p></o:p></p>

          <p class="MsoNormal"><span

style="font-size:10.0pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;color:blue">'URLs

              referencing the data files will have a site dependent

              prefix (that may change due to site-specific data

              management tasks) followed by the directory structure. 

              This directory structure should (but may not) follow the

              recommendations of section 3.3'</span><o:p></o:p></p>

          <p class="MsoNormal"><o:p> </o:p></p>

          <p class="MsoNormal">I've modified the text as suggested.<br>

            <br>

            <o:p></o:p></p>

          <p class="MsoNormal"><span

style="font-size:10.0pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;color:blue">6.

              I've noticed that the thredds catalogs also expose a thing

              called the file_id, e.g.</span><o:p></o:p></p>

          <p class="MsoNormal"> <o:p></o:p></p>

          <p class="MsoNormal"><span

style="font-size:10.0pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;color:blue">&lt;property

              name="file_id"

value="cmip5.output1.CNRM-CERFACS.CNRM-CM5.rcp45.mon.ocean.Omon.r1i1p1.vo_Omon_CNRM-CM5_rcp45_r1i1p1_203601-204512.nc"/&gt;</span><o:p></o:p></p>

          <p class="MsoNormal"> <o:p></o:p></p>

          <p class="MsoNormal"><span

style="font-size:10.0pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;color:blue">I

              don't know if they need a mention as being anything

              important (we don't use them as they don't give any

              version info).</span><o:p></o:p></p>

          <p class="MsoNormal"><o:p> </o:p></p>

          <p class="MsoNormal">We've already given 5 use cases, which I

            think is enough.  The DRS is used in a number of other ways.<br>

            <br>

            <o:p></o:p></p>

          <p class="MsoNormal"><span

style="font-size:10.0pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;color:blue">Hope

              this is useful,</span><o:p></o:p></p>

          <p class="MsoNormal">Yes thanks very much!<br>

            Karl<br>

            <br>

            <o:p></o:p></p>

          <p class="MsoNormal"> <o:p></o:p></p>

          <p class="MsoNormal"><span

style="font-size:10.0pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;color:blue">Jamie</span><o:p></o:p></p>

          <blockquote style="border:none;border-left:solid blue

            1.5pt;padding:0cm 0cm 0cm

4.0pt;margin-left:3.75pt;margin-top:5.0pt;margin-right:0cm;margin-bottom:5.0pt">

            <p class="MsoNormal"><o:p> </o:p></p>

            <div class="MsoNormal" style="text-align:center"

              align="center"><span lang="EN-US">

                <hr align="center" size="2" width="100%">

              </span></div>

            <p class="MsoNormal" style="margin-bottom:12.0pt"><b><span

style="font-size:10.0pt;font-family:&quot;Tahoma&quot;,&quot;sans-serif&quot;"

                  lang="EN-US">From:</span></b><span

style="font-size:10.0pt;font-family:&quot;Tahoma&quot;,&quot;sans-serif&quot;"

                lang="EN-US">

                <a moz-do-not-send="true"

                  href="mailto:go-essp-tech-bounces@ucar.edu">go-essp-tech-bounces@ucar.edu</a>

                [<a moz-do-not-send="true"

                  href="mailto:go-essp-tech-bounces@ucar.edu">mailto:go-essp-tech-bounces@ucar.edu</a>]

                <b>On Behalf Of </b>Karl Taylor<br>

                <b>Sent:</b> 10 February 2012 01:32<br>

                <b>To:</b> V. Balaji; Steve Hankin; Martin Juckes; Bryan

                Lawrence; Stephen Pascoe;

                <a moz-do-not-send="true"

                  href="mailto:go-essp-tech@ucar.edu">go-essp-tech@ucar.edu</a><br>

                <b>Subject:</b> [Go-essp-tech] DRS corrections and

                extensions</span><span lang="EN-US"><o:p></o:p></span></p>

            <p class="MsoNormal">Dear all,<br>

              <br>

              Attached is my attempt to make the DRS consistent with

              CMIP5 (in describing the precision of "time instants"),

              but primarily to extend it to a more complete treatment of

              spatio-temporal subsets or means.  I've also corrected a

              few typos.<br>

              <br>

              Comments most welcome.  In particular could someone

              recheck sections 3.3-3.5 (which haven't been changed by

              me) to see if they remain consistent with CMIP5?<br>

              <br>

              thanks and best regards,<br>

              Karl<o:p></o:p></p>

          </blockquote>

        </div>

      </div>

      <br>


      <br>

    </blockquote>

    <br>

    <br>

  </body>

</html>