[Go-essp-tech] Non-DRS File structure at data nodes

Estanislao Gonzalez gonzalez at dkrz.de
Fri Sep 2 05:51:11 MDT 2011


Hi Jamie,

That should (mostly) work, although I hope you see how cumbersome it 
is. Not to mention that there's no requirement for the catalogs to 
contain this information, AFAIK. The information contained in the 
catalogs is also only part of the "best-practices" document; it is not 
mandatory at all.

To build the DRS path you'll also need the MIP table name, which might 
be in the catalogs as well, and the product separation, as it's not 
always part of the id (BCC, for example, hasn't included it). drslib's 
output1/output2 separation is AFAIK precise, but it requires data from 
the model run to configure. I think esgpublish has an option for 
achieving a similar result without this information. In this particular 
case I think you should rely on that, as I don't think it will be easy 
to get the information required to configure product detection in the 
drslib tool.

Most of the information required for building the DRS structure can be 
extracted from the files themselves. A few pieces can't be (checksum 
and version).

I recall Karl saying that the version was mandatory, and although I 
haven't seen this requirement written down anywhere, I think the 
publisher always assigns a version to any published dataset. Checksums 
are a bit different. All other parameters can be extracted from the 
files; it might not be fast, but it is certainly doable.
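
Just to illustrate, here is a rough (untested) sketch of how one could 
rebuild the DRS components from a file's global attributes, assuming 
the netCDF4-python bindings and the standard CMOR2/CMIP5 attribute 
names; the version still has to be supplied from outside:

import os
from netCDF4 import Dataset

def drs_path(filename, version):
    """Best-effort DRS path from CMOR2/CMIP5 global attributes.
    The version cannot be recovered from the file and must be given."""
    nc = Dataset(filename)
    g = lambda name: getattr(nc, name)
    # ensemble member, e.g. r1i1p1
    ensemble = "r%si%sp%s" % (g("realization"),
                              g("initialization_method"),
                              g("physics_version"))
    # table_id looks like "Table Amon (27 April 2011) ..." -> "Amon"
    table = g("table_id").split()[1]
    fname = os.path.basename(filename)
    variable = fname.split("_")[0]  # CMIP5 filenames start with the variable
    # NB: "product" in the file is just "output"; the output1/output2
    # split still needs drslib or esgpublish, as said above.
    parts = [g("project_id").lower(), g("product"), g("institute_id"),
             g("model_id"), g("experiment_id"), g("frequency"),
             g("modeling_realm"), table, ensemble, version,
             variable, fname]
    nc.close()
    return "/".join(str(p) for p in parts)

Provided the attributes are all there, this yields something like 
cmip5/output/MPI-M/MPI-ESM-LR/historical/mon/atmos/Amon/r1i1p1/v20110902/tas/tas_Amon_... 
-- slow, as said, but doable.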

Thanks,
Estani



On 02.09.2011 13:31, Kettleborough, Jamie wrote:
> Hello,
>
> Martin, like you, we want to store data locally in a DRS-like way.  I think this means we will not preserve the actual directory structure as implied by the URL, but will infer the DRS location from the publication dataset version id (I think that's the right term) and the filename in the URL.  Do you know of any reason (or examples) why this won't work?
>
> In practice this means we will take the publication dataset version id from the esgcet/thredds/catalog.xml <catalogRef xlink:title> (hope you understand the shorthand - I don't know XPath...).  We will infer the variable from the pre-'_' part of the filename in the URL, and the filename will be taken from the URL.  From what I understood of the DRS document, this seems to be the most reliable way of deriving the implied DRS directory structure.  We'll get the URL from the THREDDS dataset catalogue (<catalogRef xlink:href> in esgcet/thredds/catalog.xml) using <dataset><dataset urlPath>.
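>
> Roughly, something like this (untested, and the href resolution is probably naive) is what we have in mind, assuming the usual esgcet catalog layout:
>
> import urllib2
> import xml.etree.ElementTree as ET
>
> TDS = "{http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0}"
> XLINK = "{http://www.w3.org/1999/xlink}"
>
> def dataset_files(top_catalog_url):
>     """Yield (dataset_version_id, variable, filename, urlPath) tuples."""
>     top = ET.parse(urllib2.urlopen(top_catalog_url)).getroot()
>     base = top_catalog_url.rsplit("/", 1)[0]
>     for ref in top.iter(TDS + "catalogRef"):
>         version_id = ref.get(XLINK + "title")
>         sub_url = base + "/" + ref.get(XLINK + "href")
>         sub = ET.parse(urllib2.urlopen(sub_url)).getroot()
>         for ds in sub.iter(TDS + "dataset"):
>             url_path = ds.get("urlPath")
>             if not url_path:  # container datasets carry no urlPath
>                 continue
>             filename = url_path.rsplit("/", 1)[-1]
>             yield version_id, filename.split("_")[0], filename, url_path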
>
> Any pitfalls?  (Apart from those pointed out a while ago by Estani on the use of the THREDDS catalogue rather than the gateway API - but my guess is the gateways get populated by harvesting this exact same information - or is this guess wrong?)
>
> Thanks,
>
> Jamie
>
>
>> -----Original Message-----
>> From: go-essp-tech-bounces at ucar.edu
>> [mailto:go-essp-tech-bounces at ucar.edu] On Behalf Of
>> martin.juckes at stfc.ac.uk
>> Sent: 02 September 2011 11:59
>> To: gonzalez at dkrz.de; taylor13 at llnl.gov
>> Cc: go-essp-tech at ucar.edu; Laura.E.Carriere at nasa.gov
>> Subject: Re: [Go-essp-tech] Non-DRS File structure at data nodes
>>
>> Hello All,
>>
>> Just a few comments. As Karl says, it is clear that users can
>> get at data which is not in the DRS directory structure, and
>> in many cases will not be aware of the distinction. In
>> addition to the points Estani raises, some users may wish to
>> preserve the directory structure in their local copies and
>> will be faced with a range of different directory structures
>> -- so it is clear that lack of standardisation is going to cause
>> problems for some users.
>>
>> Another aspect is version control: as Karl points out, CMOR
>> is generally going to be run before it is possible to
>> determine the version of the dataset to which a file will be
>> assigned. So the version needs to be assigned later. We
>> talked a great deal about the importance of having version
>> control of data implemented at the data nodes, and I was
>> under the impression that it would be mandatory -- but
>> perhaps we didn't get that far.
>>
>> Data which is replicated to BADC will be available through a
>> range of interfaces, including direct file system access to
>> users logged onto local machines. We will convert data into
>> the DRS directory structure (having different structures for
>> data from different groups is far too complicated to be worth
>> considering). This directory structure is also required for
>> quality control. We do have a requirement to ensure that
>> copies of data published at the archive centres (PCMDI, DKRZ
>> and BADC) are identical to those published at the providing
>> centres. The plan was to exploit the DRS directory structure
>> to meet this requirement -- if directory structures vary
>> between copies we may struggle here -- though it should be
>> possible to find a solution using file checksums.
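>>
>> (For that fallback, something as simple as a per-file MD5 would do
>> on our side -- an untested sketch, to be compared against whatever
>> checksum the publisher has recorded in the THREDDS catalogue:
>>
>> import hashlib
>>
>> def md5sum(path, blocksize=1 << 20):
>>     """MD5 of a file, read in chunks so that large files need not
>>     fit in memory."""
>>     md5 = hashlib.md5()
>>     with open(path, "rb") as f:
>>         for chunk in iter(lambda: f.read(blocksize), b""):
>>             md5.update(chunk)
>>     return md5.hexdigest()
>>
>> though agreeing on which checksum algorithm the catalogues record
>> is of course part of the problem.)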
>>
>> cheers,
>> Martin
>>
>>
>>
>>
>> ________________________________
>> From: go-essp-tech-bounces at ucar.edu
>> [go-essp-tech-bounces at ucar.edu] on behalf of Estanislao
>> Gonzalez [gonzalez at dkrz.de]
>> Sent: 02 September 2011 10:55
>> To: Karl Taylor
>> Cc: go-essp-tech at ucar.edu; Laura Carriere
>> Subject: Re: [Go-essp-tech] Non-DRS File structure at data nodes
>>
>> Dear Laura, Karl
>>
>> Regarding Karl's three points:
>>
>> 1) Indeed, what Karl said is true. Our whole discussion around the
>> DRS is precisely because it's not mandated.
>> I think we made quite a few mistakes here; if we had delivered
>> proper tools in time, there would have been no need for data
>> centers to come up with different directory structures.
>>
>> 2) drslib is not intended for CMIP3, though it will/might be used
>> for that purpose. It mainly produces a valid DRS structure out of
>> files in any other structure (including CMOR2 output). I think
>> Stephen can comment more on this if required.
>>
>> 3) In my opinion, the recommendation is useful for data centers,
>> but not at the archive level. We must cope with data centers not
>> complying with it, so it's the same as if there were no
>> recommendation at all.
>>
>> I know the main idea is to create a middleware layer that would
>> make file structures obsolete. But then we will have to write all
>> the tools again to interact with this intermediate level, or at
>> least patch them somehow. gridFTP, as well as ftp, is only useful
>> as a transmission protocol; you can't write your own script to use
>> them, you have to rely on either the gateway or the data node to
>> find what you are looking for.
>> In my opinion, we will be relying too much on the ESG
>> infrastructure. What would happen if we lose the publisher
>> database? How would we tell one version from another, if this is
>> not represented in the directory structure?
>> My fear is that if we keep separating the metadata from the data
>> itself, we add a new weak link to the chain. If we lose the
>> metadata, the data will also be useless (this would indeed be the
>> worst-case scenario). In 10 years we will have no idea what these
>> interfaces were like; probably both data node and gateways will be
>> superseded by newer versions that can't translate our old
>> requirements. But as I said, that's a problem for LTAs only. In any
>> case, we need the middleware to provide some services and speed
>> things up, but I don't think we should rely blindly on it.
>>
>> And regarding CMOR2, indeed it was designed to be flexible, but
>> drslib also relies on the same CMOR tables to separate output1 from
>> output2. And there's no magic in drslib regarding versioning; it
>> must be input by hand. Why this functionality was kept out of CMOR2
>> is not really clear to me. Whatever the reason, I'm not sure it
>> works best for all configurations of who creates, post-processes
>> and publishes the data.
>>
>> I don't mean we should change any of this; it's too late, and that
>> wasn't the point anyway. I just thought it was worth discussing,
>> especially for the future.
>>
>> Thanks,
>> Estani
>>
>> On 02.09.2011 00:02, Karl Taylor wrote:
>> Dear Laura,
>>
>> Thank you for providing an important perspective on this.  I
>> agree that misunderstanding and poor communication about this
>> has caused considerable confusion.
>>
>> Here are some short answers to your questions, followed by a
>> more complete discussion that others may also want to read carefully:
>>
>> 1.  It is *not* true that CMIP5 or ESG mandates a specific
>> directory structure, although the DRS document recommends a
>> specific directory structure for CMIP5.  Note that for reanalysis
>> data, which falls under the "obs4MIPs" project, the recommended
>> (again, not required) directory structure differs from CMIP5's.
>>
>> 2.  The directory structure produced by CMOR2 is not
>> identical to the directory structure for CMIP*3* data stored
>> at PCMDI.  It also differs from the "final" form of the
>> recommended (not required) directory structure for CMIP5. I'm
>> not sure if drslib
>> (http://esgf.org/esgf-drslib-site/index.html) can convert
>> from CMIP3 to final recommended CMIP5 directory structure,
>> but I know it can convert from the default CMOR2-produced
>> directory structure to final CMIP5 structure (although I
>> didn't see this mentioned in the drslib documentation).
>>
>> 3.  The recommended procedure for treatment of CMIP5 data is
>> to write it using CMOR2 (without overriding the default
>> directory structure it produces)  and then use drslib (or
>> equivalent) to produce the final directory structure.
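>>
>> (For anyone without drslib at hand, the path transformation itself
>> is roughly the following -- an untested sketch, assuming I have
>> stated the two layouts correctly, i.e. that section 3.1 paths end
>> in .../<variable>/<ensemble>/<filename> and that CMIP5 filenames
>> look like <variable>_<MIPtable>_...:
>>
>> import os
>>
>> def cmor2_to_drs(path, version):
>>     """Rewrite a CMOR2-layout path (DRS doc section 3.1) into the
>>     "final" layout (section 3.3): insert the MIP table before the
>>     ensemble member, then version, then variable."""
>>     parts = path.split(os.sep)
>>     prefix, variable, ensemble, filename = (parts[:-3], parts[-3],
>>                                             parts[-2], parts[-1])
>>     mip_table = filename.split("_")[1]
>>     return os.sep.join(prefix + [mip_table, ensemble, version,
>>                                  variable, filename])
>>
>> drslib of course does much more than this (versioning, the
>> output1/output2 split), which is why I recommend it.)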
>>
>> Now for some discussion....
>>
>> For ESG, there is no directory structure imposed.  When
>> datasets are published, information is recorded that enables
>> users (through gateways) to access the data they want
>> (without any knowledge of directory structures).  The
>> directory structures recommended for CMIP5 and for the
>> "obs4MIPs" activity are different, but this does not hamper
>> ESG from serving them and searching them, because it doesn't
>> really care about directory structure.
>>
>> For CMIP5 (which is only one of the projects served by ESG),
>> CMOR2 creates a directory structure that is a reasonable way
>> to organize the output, and CMOR2 can generate filenames
>> according to a template required by CMIP5, as described in
>> the DRS document.
>>
>> For CMIP5  the DRS document recommends (but does *not*
>> require) a final directory structure.   Because this is only
>> a recommendation, individual data nodes may choose to
>> organize their data to fit their own local requirements.
>>
>> The DRS specifies a controlled vocabulary, and various
>> "descriptors" of CMIP5 datasets that are stored in catalogs
>> at the data nodes.   This information can be accessed in
>> various ways, but by "reading" the catalogs (which are xml
>> files), a user can obtain the URL that can be used to get the
>> data.  The uniformity in structure for all CMIP5 catalogs
>> ensures that software can be written to automatically
>> translate between a set of DRS descriptors that uniquely
>> identify the data being sought and a list of (possibly
>> *non-uniformly* structured) directories/filenames  containing
>> that data.    Thus the ESG gateway can generate wget scripts
>> that can be run to download the data even when the directory
>> structures differ from one node to another.  Presumably other
>> tools could get the URLs similarly.
>>
>> By the way, CMOR2 was designed to meet the needs of many
>> different projects, not just CMIP5, so having it automatically
>> generate directory structures consistent with the requirements of
>> these different projects is difficult.  For one thing, the
>> "output" descriptor called for by the DRS requires a complicated
>> algorithm unique to CMIP5, and thus this information is unknown
>> to CMOR2.  Also, the version
>> number (which appears in the final recommended DRS directory
>> structure) is based on the ("publication") date of the
>> dataset.  Since a dataset comprises many different variables,
>> perhaps written on different days, it would be impossible for
>> CMOR2 to assign this date automatically, which is why the
>> version number is assigned when the data are published.
>> Thus, the full, final directory structure *recommended* by
>> CMIP5 cannot be assigned by CMOR2.
>>
>> So, those are the rules for CMIP5:  the directory structure
>> is not mandated, but it is certainly recommended.  I think
>> that using drslib is a good way to put CMOR2 output in the
>> recommended DRS directory structure, and I don't *think*
>> other steps are required.
>>
>> Please let me know if you have questions, and please feel
>> free to respond.
>>
>> Best regards,
>> Karl
>>
>>
>>
>> On 9/1/11 12:55 PM, Laura Carriere wrote:
>>
>> For what it's worth, I'm going to add my own perspective, one
>> that comes from someone who is managing the team that is
>> publishing the data at NASA/GSFC but is not involved in
>> writing the code or producing the data.  In other words, I'm
>> sure there's lots I don't understand, but here's what I have
>> managed to decipher.
>>
>> I'll start by saying that we don't have a strong opinion
>> about what directory structure is used.  Our focus is on
>> providing users quick access to data that is accurate and
>> easily identified.  Our initial understanding was that CMOR2
>> would create the correct DRS file structure but we have since
>> learned that this is not the case.  We were also under the
>> impression that the DRS file structure was "recommended" not
>> "required".  This, also, appears not to be the case.
>>
>> After learning that we weren't using the correct file
>> structure, we re-read the documentation more carefully but we
>> were still left not really knowing what the expectations were.
>>
>> First I read the CMIP5 Data Reference Syntax (DRS) and
>> Controlled Vocabulary documentation:
>>
>> http://cmip-pcmdi.llnl.gov/cmip5/docs/cmip5_data_reference_syntax.pdf
>>
>> Section 3.1 shows the DRS structure we were creating by using
>> CMOR2, and section 3.3 shows the DRS structure that we are
>> supposed to be creating.
>>
>> It also states that there is an expectation that we are
>> responsible for "transforming" the CMOR2 structure into the
>> recommended structure.  I found this surprising, so I checked the
>> CMOR2 release notes and found no reference to an option for
>> producing the new DRS structure, so it became clear that we needed
>> to do this ourselves.
>>
>> I then looked at the drslib page:
>>
>> http://esgf.org/esgf-drslib-site/index.html
>>
>> This is a utility to convert a CMIP3 directory structure to
>> DRS-compliant form, but since our team is quite new to the IPCC
>> activity, we don't know whether what CMOR2 creates is CMIP3 or not.
>>
>> That left us not knowing if there were any tools to do what we had
>> been asked to do.  The data provider was willing to recreate the
>> data with the missing directories, so we republished all the data
>> we had at the time.  However, that doesn't really help us with the
>> next data provider, who is just now starting to give us data.
>>
>> What I would like to be able to find is a simple way for the
>> data providers (who are running CMOR2 but are not publishing
>> the data) to prepare the directory structure in a way that is
>> compliant.  I would rather not ask them to wade through all
>> the above documentation and translate the directory structure
>> themselves because they are busy enough as it is.
>>
>> Ideally I would like to be able to tell them to use a
>> particular option to CMOR2 to create the right structure but
>> such an option doesn't exist.  The second best option would
>> be some clarification on the use of drslib.  Specifically,
>> can it be run on the directory structure that
>> CMOR2 produces and will it then produce a compliant directory
>> structure that we can publish?  And are there any additional
>> steps required?
>>
>> So, in the interests of improving communication, I suggest that
>> someone remove the word "recommended" from sections 3.1 and 3.3 in
>> the DRS document, explain why it's "required" and the repercussions
>> of not complying, and add instructions on how to get to the
>> "required" structure.  In an ideal world, an option would be added
>> to CMOR2 to do this there.
>>
>> As I said, this is just my perspective from the data publication side.
>> Please feel free to enlighten me on what I've missed.  Thanks.
>>
>>     Laura Carriere
>>
>>
>> On 9/1/2011 4:58 AM, Kettleborough, Jamie wrote:
>>
>>
>> Hello,
>>
>> Isn't one issue that for some applications the *interface* with
>> the data is at the *file system level* - not the catalogues?
>> Version management and QC look like examples, and replication may
>> be too (and I think these are pretty much federation-wide
>> activities/applications).  So if you want to minimise the
>> complexity (~= minimise time to develop, cost of maintenance) in
>> the way these applications interact with the data, you want to
>> ensure consistency in the way data is stored in the file system.
>> Bryan - I wasn't sure what interfaces you were talking about... Sorry.
>>
>> I'm going to be a bit pedantic here - but I don't think the DRS
>> document says that data nodes must follow the DRS directory
>> structure; it's only a recommendation.  Though there *may* be a
>> slight inconsistency in the way the DRS is written, as it says the
>> URLs *will* be a site-dependent prefix followed by the *DRS
>> directory structure*.  At least that's my reading of the 1.2
>> version dated 9th March.  I don't think all nodes are following
>> the DRS specification for the URLs because they don't have the
>> same underlying directory structure.  I don't know if the way the
>> DRS is written or being interpreted is one of the sources of
>> misunderstanding over this issue of DRS directory structure?
>> (This is not a criticism; it's an acceptance that communicating
>> specifications and plans is a hard problem to crack.)
>>
>> Another (possibly weak) motivation for keeping all data in the DRS
>> directory structure is that it gives you a last-ditch backup
>> strategy - if you lose the catalogues you can regenerate the
>> version info etc. from the file system.
>>
>> Jamie
>>
>>
>>
>> -----Original Message-----
>> From: go-essp-tech-bounces at ucar.edu
>> [mailto:go-essp-tech-bounces at ucar.edu] On Behalf Of Bryan Lawrence
>> Sent: 01 September 2011 08:55
>> To: go-essp-tech at ucar.edu
>> Cc: stockhause at dkrz.de
>> Subject: Re: [Go-essp-tech] Non-DRS File structure at data nodes
>>
>> Hi Folks
>>
>>
>>> At least it's now clear to me that we can't rely on the DRS
>>> structure, so we should try to cope with this.
>>
>> I'm just coming back to this, and I haven't read all of this
>> thread, but I don't agree with this statement!  If we can't
>> rely on the DRS *at the interface level*, then ESGF is
>> fundamentally doomed as a distributed activity, because we'll
>> never have the resource to support all the possible variants.
>>
>> Behind those interfaces, more flexibility might be possible, but
>> components would need to be pretty targeted in their functionality.
>>
>> Bryan
>>
>>
>>
>>
>> Thanks,
>> Estani
>>
>> On 31.08.2011 12:55, stephen.pascoe at stfc.ac.uk wrote:
>>
>>
>> Hi Estani,
>>
>> I see you have some code in esgf-contrib.git for managing a replica
>> database.  There's quite a lot of DRS-parsing code there.  Is there
>> any reason why this couldn't use drslib?
>>
>> Cheers,
>> Stephen.
>>
>> ---
>> Stephen Pascoe  +44 (0)1235 445980
>> Centre of Environmental Data Archival
>> STFC Rutherford Appleton Laboratory, Harwell Oxford, Didcot OX11 0QX, UK
>>
>>
>> -----Original Message-----
>> From: go-essp-tech-bounces at ucar.edu
>> [mailto:go-essp-tech-bounces at ucar.edu] On Behalf Of Estanislao Gonzalez
>> Sent: 31 August 2011 10:23
>> To: Juckes, Martin (STFC,RAL,RALSP)
>> Cc: stockhause at dkrz.de; go-essp-tech at ucar.edu
>> Subject: Re: [Go-essp-tech] Non-DRS File structure at data nodes
>>
>> Hi Martin,
>>
>> Are you planning to publish that data as a new instance or as a
>> replica?
>>
>> If I recall it right, Karl said he thought the replica was attached
>> at a semantic level. But I have my doubts and haven't got any
>> feedback on this. Does anyone know if the gateway can handle a
>> replica with a different URL path? (Dataset and version "should" be
>> the same, although keeping the same version will be difficult,
>> because no tool can handle this AFAIK, i.e. replicating or
>> publishing multiple datasets with different versions.)
>>
>> And regarding replication (independently of the previous question),
>> how are you going to cope with new versions? Do you already have
>> tools for harvesting the TDS and building a list of which files
>> need to be replicated, given what you already have? The catalog
>> will just publish a dataset and version along with a bunch of
>> files; you would need to keep a DB with the files you've already
>> downloaded and compare it with the catalog to realize what should
>> be done next. This information is what drslib should use to create
>> the next version. Is that what will happen? If not, how will you
>> be solving this?
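>>
>> (The local bookkeeping itself would be trivial -- an untested
>> sketch, assuming you harvest (checksum, urlPath) pairs from the
>> TDS:
>>
>> import sqlite3
>>
>> def missing_files(catalog_files, db_path="replica.db"):
>>     """Return the catalog entries not yet fetched locally."""
>>     db = sqlite3.connect(db_path)
>>     db.execute("CREATE TABLE IF NOT EXISTS fetched "
>>                "(checksum TEXT PRIMARY KEY, path TEXT)")
>>     db.commit()
>>     have = set(row[0] for row in
>>                db.execute("SELECT checksum FROM fetched"))
>>     return [(c, p) for (c, p) in catalog_files if c not in have]
>>
>> The hard part is harvesting the catalogs and deciding what
>> constitutes a new version.)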
>>
>>
>> Thanks,
>> Estani
>>
>> On 31.08.2011 10:54, martin.juckes at stfc.ac.uk wrote:
>>
>>
>> Hello Martina,
>>
>> For BADC, I don't think we are considering storing data in
>> anything other than the DRS structure -- we just don't have the
>> time to build systems around multiple structures. This means that
>> data that comes from a node with a different directory structure
>> will have to be re-mapped. Verification of file identities will
>> rely on checksums, as it always will when dealing with files from
>> archives from which we have no curation guarantees.
>>
>> cheers,
>> Martin
>>
>> ________________________________
>> From: go-essp-tech-bounces at ucar.edu [go-essp-tech-bounces at ucar.edu]
>> on behalf of Martina Stockhause [stockhause at dkrz.de]
>> Sent: 31 August 2011 09:44
>> To: go-essp-tech at ucar.edu
>> Subject: [Go-essp-tech] Non-DRS File structure at data nodes
>>
>> Hi everyone,
>>
>> we promised to describe the problems regarding the non-DRS file
>> structures at the data nodes. Estani has already started the
>> discussion on the replication/user download problems (see attached
>> email and document).
>>
>> Implications for the QC:
>>
>> - In the QCDB we need DRS syntax. The DOI process, creation of CIM
>> documents, and identification of the data the QC results are
>> connected to rely on that.
>> - The QC needs to know the version of the data checked. The DOI at
>> the end of the QC process is assigned to a specific, unchangeable
>> data version. At least at DKRZ we have to guarantee that the data
>> is not changed after assignment of the DOI; therefore we store a
>> data copy in our archive.
>> - The QC checker tool runs on files in a given directory structure
>> and creates results in a copy of this structure. The QC wrapper
>> can deal with recombinations of path parts.
>>
>> So, if the directory structure includes all parts of the DRS
>> syntax, the wrapper can create the DRS syntax before inserting
>> into the QCDB. But we deal with structures at the data nodes where
>> some information is missing in the directory path, i.e. version
>> and MIP table. Therefore additional information would be needed
>> for that mapping.
>>
>>
>> Possible solutions to map the given file structure to the DRS
>> directory structure before inserting into the QCDB:
>>
>> 1. The data nodes of the three gateways that store replicas
>> (PCMDI, BADC, DKRZ) publish data in the DRS directory structure.
>> Then the QC run is possible without mapping. Replication problems?
>>
>> 2. The directory structures of the data nodes are replicated as
>> they are. We store the data under a certain version. How? Are
>> there implications for the replication from the data nodes? The
>> individual file structures down to the chunk level are stored
>> together with their DRS identification in a repository, and a
>> service is created to access the DRS id for a given file in the
>> given file structure. The QC and maybe other user data services
>> use this service for mapping. That will slow down the QC insert
>> process: before each insert of a chunk name, a QC result for a
>> specific variable, or the QC result on the experiment level, that
>> service has to be called. And who can set up and maintain such a
>> repository? DKRZ does not have the manpower to do that in the
>> next months.
>>
>>
>> Cheers,
>> Martina
>>
>>
>>
>> -------- Original Message --------
>> Subject: RE: ESG discussion
>> Date: Wed, 10 Aug 2011 15:35:04 +0100
>> From: Kettleborough, Jamie <jamie.kettleborough at metoffice.gov.uk>
>> To: Karl Taylor <taylor13 at llnl.gov>, Wood, Richard <richard.wood at metoffice.gov.uk>
>> CC: Carter, Mick <mick.carter at metoffice.gov.uk>, Elkington, Mark <mark.elkington at metoffice.gov.uk>, Bentley, Philip <philip.bentley at metoffice.gov.uk>, Senior, Cath <cath.senior at metoffice.gov.uk>, Hines, Adrian <adrian.hines at metoffice.gov.uk>, Dean N. Williams <williams13 at llnl.gov>, Estanislao Gonzalez <gonzalez at dkrz.de>, <martin.juckes at stfc.ac.uk>, Kettleborough, Jamie <jamie.kettleborough at metoffice.gov.uk>
>>
>>
>> Hello Karl, Dean,
>>
>> Thanks for your reply on this, and for taking our concerns
>> seriously. You are right to challenge us for the specific issues,
>> rather than us just highlighting the things that don't meet our
>> (possibly idealised) expectations of how the system should look.
>> As a result, we have had a thorough review of our key issues. I
>> think some of them are issues that make it harder for us to do
>> things now; other issues are maybe more concerns about problems
>> being stored up. This document has been prepared with the help of
>> Estani Gonzalez.  We would like to have Martin Juckes's input on
>> this too - but he is currently away on holiday.  I hope he can add
>> to this when he returns - he has spent a lot of time thinking
>> about the implications of data node directory structure on
>> versioning.  I hope this helps clarify issues; if not, please let
>> us know.
>>
>> Thanks,
>>
>> Jamie
>>
>> ________________________________
>> From: Karl Taylor [mailto:taylor13 at llnl.gov]
>> Sent: 09 August 2011 01:48
>> To: Wood, Richard
>> Cc: Carter, Mick; Kettleborough, Jamie; Elkington, Mark; Bentley, Philip; Senior, Cath; Hines, Adrian; Dean N. Williams
>> Subject: Re: ESG discussion
>>
>> Dear all,
>>
>> Thanks for taking the time to bring to my attention the ESG issues
>> that I hope can be addressed reasonably soon.  I think we're in
>> general agreement that the user's experience should be improved.
>>
>> I've discussed this briefly with Dean.  I plan to meet with him
>> and others here, and, drawing on your suggestions, we'll attempt
>> to find solutions and methods of communication that might improve
>> matters.
>>
>> Before doing this, it would help if you could briefly answer the
>> following questions:
>>
>> 1.  Is the main issue that it is currently difficult to script
>> downloads from all the nodes because only some support PKI?  What
>> other uniformity among nodes is required for you to be able to do
>> what you want to do (i.e., what do you specifically want to do
>> that is difficult to do now)?  [nb. all data nodes are scheduled
>> to be operating with PKI authentication by September 1.]
>>
>> 2.  Is there anything from the perspective of a data *provider*
>> that needs to be done (other than make things easier for data
>> users)?
>>
>>
>> 3.  Currently ESG and CMIP5 do not dictate the directory structure
>> found at each data node (although most nodes are adhering to the
>> recommendations of the DRS).  The gateway software and catalog
>> make it possible to get to the data regardless of directory
>> structure.  It is possible that "versioning" might impose
>> additional constraints on the directory structure, but I'm not
>> sure about this.  (By the way, I'm not sure what the "versioning"
>> issue is, since currently I think it's impossible for users to
>> know about more than one version; is that the issue?)  From a
>> user's or provider's perspective, is there any essential reason
>> that the directory structure should be the same at each node?
>>
>> 4.  ESG allows considerable flexibility in publishing data, and
>> CMIP5 has suggested "best practices" to reduce differences.  Only
>> some of the "best practices" are currently requirements.  A
>> certain amount of flexibility is essential, since different data
>> providers have different resources to support the potential
>> capabilities of ESG (e.g., not all can support server-side
>> calculations, which will be put in place at some nodes).
>> Likewise, a provider can currently turn off the "checksum" if this
>> is deemed to slow publication too much (although we could insist
>> that checksums be stored in the THREDDS catalogue).  Nevertheless,
>> it is unlikely that every data node will be identically configured
>> for all options.  What are the *essential* ways in which the data
>> nodes should respond identically (we may not be able to insist on
>> uniformity that isn't essential for serving our users)?
>>
>> Thanks again for your input, and I look forward to your
>> further help with this.
>>
>> Best regards,
>> Karl
>>
>>
>> On 8/5/11 10:43 AM, Wood, Richard wrote:
>>
>> Dear Karl,
>>
>> Following on from our phone call I had a discussion with technical
>> colleagues here (Mick Carter, Jamie Kettleborough, Mark Elkington,
>> also earlier with Phil Bentley), and with Adrian Hines, who is
>> coordinating our CMIP5 analysis work, about ideas for future
>> development of the ESG.  Our observations are from the user
>> perspective, and based on what we can gather from mailing lists
>> and our own experience.  Coming out of our discussion we have a
>> couple of suggestions that could help with visibility for data
>> providers and users:
>>
>> - Some areas need agreement among the data nodes as to the
>> technical solution, and then implementation across all the nodes,
>> while others need a specific solution to be developed in one place
>> and rolled out.  The group teleconferences that Dean organises
>> appear to be a good forum for airing specific technical ideas and
>> solutions.  However, in our experience it can be difficult in that
>> kind of forum to discuss planning and prioritisation questions.
>> From our perspective we don't have visibility of the more
>> project-related issues such as key technical decisions,
>> prioritisation and timelines, or of whether issues that have
>> arisen in the mailing list discussions are being followed up.  We
>> guess these may be discussed in separate project teleconferences
>> involving the technical leads from the data nodes.  As users we
>> would not necessarily expect to be involved in those discussions,
>> but as data providers and downloaders it would be very helpful for
>> our planning to see the outcomes of the discussions.  The sort of
>> thing we had in mind would be a simple web page showing the
>> priority development areas, agreed solutions and estimated dates
>> for completion/release.  Some solutions will need to be
>> implemented separately across all the participating data nodes,
>> and in these cases it would be useful to see the estimated
>> timeframe for implementation at each node.  This would not be
>> intended as a 'big stick' for the partners, but simply as a
>> planning aid, so that everyone can see what's available when and
>> the project can identify any potential bottlenecks or issues in
>> advance.  Also, the intention is not to generate a lot of extra
>> work; hopefully providing this information would be pretty light
>> on people's time.
>>
>>
>> - From where we sit it appears that some nodes are quite
>> successful in following best practice and implementing the
>> federation policies as far as they are aware of them.  Could what
>> these nodes do be made helpful to all the data nodes (e.g. by
>> using identical software)?  We realise there may be real
>> differences between some data nodes - but where possible we think
>> that what is similar could be enforced, or made explicitly the
>> same, through sharing the software components and tools.
>>
>>
>> To set the discussion on priorities rolling, Jamie has prepared,
>> in consultation with others here, a short document showing the Met
>> Office view of current priority issues (attached).  If you could
>> update us on the status of work on these issues, that would be
>> very useful (ideally via the web pages proposed above, which we
>> think would be of interest to many users, or via email in the
>> interim).  Many thanks for the update on tokenless
>> authentication, which is very good news.
>>
>> Once again, our thanks to you, Dean and the team for all the hard
>> work we know is going into this.  Please let us know what you
>> think of the above ideas and the attachment, and if there is
>> anything we can do to help.
>>
>>           Best wishes,
>>
>>            Richard
>>
>> --------------
>> Richard Wood
>> Met Office Fellow and Head (Oceans, Cryosphere and Dangerous Climate
>> Change)
>> Met Office Hadley Centre
>> FitzRoy Road, Exeter EX1 3PB, UK
>> Phone +44 (0)1392 886641  Fax +44 (0)1392 885681
>> Email richard.wood at metoffice.gov.uk
>> http://www.metoffice.gov.uk
>> Personal web page:
>> http://www.metoffice.gov.uk/research/scientists/cryosphere-oceans/richard-wood
>>
>> *** Please note I also work as Theme Leader (Climate System)
>> for the Natural Environment Research Council ***
>> *** Where possible please send emails on NERC matters to
>> rwtl at nerc.ac.uk ***
>>
>> --
>> Bryan Lawrence
>> University of Reading: Professor of Weather and Climate Computing
>> National Centre for Atmospheric Science: Director of Models and Data
>> STFC: Director of the Centre of Environmental Data Archival
>> Phone +44 1235 445012; Web: home.badc.rl.ac.uk/lawrence
>>
>> --
>>
>>     Laura Carriere    laura.carriere at nasa.gov
>>     SAIC              301 614-5064
>>


-- 
Estanislao Gonzalez

Max-Planck-Institut für Meteorologie (MPI-M)
Deutsches Klimarechenzentrum (DKRZ) - German Climate Computing Centre
Room 108 - Bundesstrasse 45a, D-20146 Hamburg, Germany

Phone:   +49 (40) 46 00 94-126
E-Mail:  gonzalez at dkrz.de


