[Go-essp-tech] Non-DRS File structure at data nodes

Estanislao Gonzalez gonzalez at dkrz.de
Fri Sep 2 03:55:42 MDT 2011


Dear Laura and Karl,

Regarding Karl's three points:

1) Indeed, what Karl said is true. Our discussion around the DRS exists 
precisely because it is not mandated.
I think we made quite a few mistakes here; if we had delivered proper 
tools in time, there would have been no need for data centers to come 
up with different directory structures.

2) drslib is not intended for CMIP3, though it will/might be used for 
that purpose. It mainly produces a valid DRS structure out of files in 
any other structure (including the one CMOR2 produces). I think Stephen 
can comment further on this if required.
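To make concrete what that conversion involves, here is a rough sketch (my own toy illustration, not drslib code; the component orders follow sections 3.1 and 3.3 of the DRS document, and the real tool consults CMOR tables and handles many cases this does not):

```python
# Sketch only: the CMOR2-to-DRS conversion is essentially a reordering of
# path components plus an injected MIP table and version directory.
# CMOR2 default:  activity/product/institute/model/experiment/frequency/
#                 realm/variable/ensemble/file.nc          (DRS doc 3.1)
# Final DRS:      .../realm/mip_table/ensemble/version/variable/file.nc
#                                                          (DRS doc 3.3)

def cmor2_to_drs(cmor2_path, version):
    """Map a CMOR2-produced path to the final recommended DRS layout."""
    (activity, product, institute, model, experiment,
     frequency, realm, variable, ensemble,
     filename) = cmor2_path.strip("/").split("/")
    # The MIP table can be recovered from the filename, e.g. "tas_Amon_...".
    mip_table = filename.split("_")[1]
    return "/".join([activity, product, institute, model, experiment,
                     frequency, realm, mip_table, ensemble,
                     version, variable, filename])
```

Note the version component has to be supplied by the caller; as discussed below, nothing in the CMOR2 output determines it.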

3) In my opinion, the recommendation is useful for data centers, but 
not at the archive level. We must cope with data centers not complying 
with it, so it is the same as if there were no recommendation at all.

I know the main idea is to create a middleware layer that would make 
file structures obsolete. But then we would have to rewrite all our 
tools to interact with this intermediate level, or at least patch them 
somehow. gridFTP, like plain FTP, is only useful as a transmission 
protocol; you can't write your own script on top of it alone, you have 
to rely on either the gateway or the data node to find what you are 
looking for.
In my opinion, we would be relying too much on the ESG infrastructure. 
What would happen if we lost the publisher database? How would we tell 
one version from another if this is not represented in the directory 
structure?
My fear is that if we keep separating the metadata from the data 
itself, we add a new weak link to the chain. If we lose the metadata, 
the data will also become useless (this would indeed be the worst-case 
scenario). In 10 years we will have no idea what these interfaces were 
like; probably both data nodes and gateways will have been superseded 
by newer versions that can't translate our old requirements. But as I 
said, that's a problem for LTAs only. In any case, we need the 
middleware to provide some services and speed things up, but I don't 
think we should rely blindly on it.
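On the point of telling versions apart without the publisher database: when the version is encoded in the path, it can be recovered from the file system alone. A trivial sketch (my own illustration; the vYYYYMMDD directory convention is the one recommended by the DRS document):

```python
# Sketch: recover the dataset version from a DRS-style path, with no
# catalog or database needed. Returns None when the path does not
# carry a version component -- exactly the problematic case above.
import re

def version_from_path(path):
    """Return the DRS version component (vYYYYMMDD) of a path, or None."""
    match = re.search(r"/(v\d{8})(/|$)", path)
    return match.group(1) if match else None
```

Against a non-DRS structure the same lookup simply yields nothing, which is the weak link I mean.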

And regarding CMOR2: indeed it was designed to be flexible, but drslib 
also relies on the same CMOR tables to separate what belongs in output1 
and output2. And there's no magic in drslib regarding versioning; the 
version must be entered by hand. Why this functionality was kept out of 
CMOR2 is not really clear to me. Whatever the reason, I'm not sure it 
works best for all configurations of who creates, post-processes and 
publishes the data.

I don't mean we should change any of this; it's too late, and that 
wasn't the point anyway. I just thought it was worth discussing, 
especially for the future.

Thanks,
Estani

On 02.09.2011 00:02, Karl Taylor wrote:
> Dear Laura,
>
> Thank you for providing an important perspective on this.  I agree 
> that misunderstanding and poor communication about this has caused 
> considerable confusion.
>
> Here are some short answers to your questions, followed by a more 
> complete discussion that others may also want to read carefully:
>
> 1.  It is *not* true that CMIP5 or ESG mandates a specific directory 
> structure, although the DRS document recommends a specific directory 
> structure for CMIP5.  Note that for reanalysis data, which falls under 
> the "obs4MIPs" project, the recommended (again, not required) directory 
> structure differs from CMIP5's.
>
> 2.  The directory structure produced by CMOR2 is not identical to the 
> directory structure for CMIP*3* data stored at PCMDI.  It also differs 
> from the "final" form of the recommended (not required) directory 
> structure for CMIP5. I'm not sure if drslib 
> (http://esgf.org/esgf-drslib-site/index.html) can convert from CMIP3 
> to final recommended CMIP5 directory structure, but I know it can 
> convert from the default CMOR2-produced directory structure to final 
> CMIP5 structure (although I didn't see this mentioned in the drslib 
> documentation).
>
> 3. The recommended procedure for treatment of CMIP5 data is to write 
> it using CMOR2 (without overriding the default directory structure it 
> produces)  and then use drslib (or equivalent) to produce the final 
> directory structure.
>
> Now for some discussion....
>
> For ESG, there is no directory structure imposed.  When datasets are 
> published, information is recorded that enables users (through 
> gateways) to access the data they want (without any knowledge of 
> directory structures).  The directory structures recommended for CMIP5 
> and for the "obs4MIPs" activity are different, but this does not 
> hamper ESG from serving them and searching them, because it doesn't 
> really care about directory structure.
>
> For CMIP5 (which is only one of the projects served by ESG), CMOR2 
> creates a directory structure that is a reasonable way to organize the 
> output, and CMOR2 can generate filenames according to a template 
> required by CMIP5, as described in the DRS document.
>
> For CMIP5  the DRS document recommends (but does *not* require) a 
> final directory structure.   Because this is only a recommendation, 
> individual data nodes may choose to organize their data to fit their 
> own local requirements.
>
> The DRS specifies a controlled vocabulary, and various "descriptors" 
> of CMIP5 datasets that are stored in catalogs at the data nodes.   
> This information can be accessed in various ways, but by "reading" the 
> catalogs (which are xml files), a user can obtain the URL that can be 
> used to get the data.  The uniformity in structure for all CMIP5 
> catalogs ensures that software can be written to automatically 
> translate between a set of DRS descriptors that uniquely identify the 
> data being sought and a list of (possibly *non-uniformly* structured) 
> directories/filenames  containing that data.    Thus the ESG gateway 
> can generate wget scripts that can be run to download the data even 
> when the directory structures differ from one node to another.  
> Presumably other tools could get the URL's similarly.
>
> By the way, CMOR2 was designed to meet the needs of many different 
> projects, not just CMIP5, so having it automatically generate 
> directory structures consistent with the requirements of these 
> different projects is difficult.   For one thing, the "output" 
> descriptor called for by the DRS requires a complicated algorithm 
> unique to CMIP5, so this information is unknown to CMOR2.  Also 
> the version number (which appears in the final recommended DRS 
> directory structure) is based on the ("publication") date of the 
> dataset.  Since a dataset comprises many different variables, perhaps 
> written on different days, it would be impossible for CMOR2 to assign 
> this date automatically, which is why the version number is assigned 
> when the data are published.  Thus, the full, final directory 
> structure *recommended* by CMIP5 cannot be assigned by CMOR2.
>
> So, those are the rules for CMIP5:  the directory structure is not 
> mandated, but it is certainly recommended.  I think that using drslib 
> is a good way to put CMOR2 output in the recommended DRS directory 
> structure, and I don't *think* other steps are required.
>
> Please let me know if you have questions, and please feel free to respond.
>
> Best regards,
> Karl
>
>
>
> On 9/1/11 12:55 PM, Laura Carriere wrote:
>> For what it's worth, I'm going to add my own perspective, one that comes
>> from someone who is managing the team that is publishing the data at
>> NASA/GSFC but is not involved in writing the code or producing the
>> data.  In other words, I'm sure there's lots I don't understand, but
>> here's what I have managed to decipher.
>>
>> I'll start by saying that we don't have a strong opinion about what
>> directory structure is used.  Our focus is on providing users quick
>> access to data that is accurate and easily identified.  Our initial
>> understanding was that CMOR2 would create the correct DRS file structure
>> but we have since learned that this is not the case.  We were also under
>> the impression that the DRS file structure was "recommended" not
>> "required".  This, also, appears not to be the case.
>>
>> After learning that we weren't using the correct file structure, we
>> re-read the documentation more carefully but we were still left not
>> really knowing what the expectations were.
>>
>> First I read the CMIP5 Data Reference Syntax (DRS) and Controlled
>> Vocabulary documentation:
>>
>> http://cmip-pcmdi.llnl.gov/cmip5/docs/cmip5_data_reference_syntax.pdf
>>
>> Section 3.1 shows the DRS structure we were creating by using CMOR2,
>> and 3.3 shows the DRS structure that we are supposed to be creating.
>>
>> It also states that there is an expectation that we are responsible for
>> "transforming" the CMOR2 structure to the recommended structure.  I
>> found this surprising so I checked the CMOR2 release notes and found
>> that there's no reference to modifying CMOR2 to have an option to
>> produce the new DRS structure so it became clear that we needed to do
>> this ourselves.
>>
>> I then looked at the drslib page:
>>
>> http://esgf.org/esgf-drslib-site/index.html
>>
>> This is a utility to convert a CMIP3 directory structure to
>> DRS-compliant form but since our team is quite new to the IPCC activity,
>> we don't know if what CMOR2 creates is CMIP3 or not.
>>
>> That left us not knowing whether there were any tools to do what we had
>> been asked to do.  The data provider was willing to recreate the data
>> with the missing directories, so we republished all the data we had at
>> the time.  However, that doesn't really help us with the next data
>> provider, who is just now starting to give us data.
>>
>> What I would like to be able to find is a simple way for the data
>> providers (who are running CMOR2 but are not publishing the data) to
>> prepare the directory structure in a way that is compliant.  I would
>> rather not ask them to wade through all the above documentation and
>> translate the directory structure themselves because they are busy
>> enough as it is.
>>
>> Ideally I would like to be able to tell them to use a particular option
>> to CMOR2 to create the right structure but such an option doesn't
>> exist.  The second best option would be some clarification on the use of
>> drslib.  Specifically, can it be run on the directory structure that
>> CMOR2 produces and will it then produce a compliant directory structure
>> that we can publish?  And are there any additional steps required?
>>
>> So, in the interests of improving communication, I suggest that someone
>> remove the word "recommended" from sections 3.1 and 3.3 in the DRS
>> document, explain why it's "required" and the repercussions of not
>> complying and also add instructions on how to get to the "required"
>> structure.  In an ideal world, an option would be added to CMOR2 to do
>> this there.
>>
>> As I said, this is just my perspective from the data publication side.
>> Please feel free to enlighten me on what I've missed.  Thanks.
>>
>>     Laura Carriere
>>
>>
>> On 9/1/2011 4:58 AM, Kettleborough, Jamie wrote:
>>> Hello,
>>>
>>> Isn't one issue that for some applications the *interface* with the data
>>> is at the *file system level* - not the catalogues? Version management,
>>> QC look like they are examples, and replication may be too (and I think
>>> these are pretty much federation wide activities/applications).  So if
>>> you want to minimise the complexity (~= minimise time to develop, cost
>>> of maintenance) in the way these applications interact with the data,
>>> you want to ensure consistency in the way data is stored in the file
>>> system.
>>> Bryan - I wasn't sure what interfaces you were talking about... Sorry.
>>>
>>> I'm going to be a bit pedantic here, but I don't think the DRS document
>>> says that data nodes must follow the DRS directory structure; it's only
>>> a recommendation.  Though there *may* be a slight inconsistency in the
>>> way the DRS is written, as it says the URLs *will* be a site-dependent
>>> prefix followed by the *DRS directory structure*.  At least that's my
>>> reading of the 1.2 version dated 9th March. I don't think all nodes are
>>> following the DRS specification for the URLs because they don't have
>>> the same underlying directory structure.  I don't know if the way the
>>> DRS is written or being interpreted is one of the sources of
>>> misunderstanding over this issue of DRS directory structure?  (This is
>>> not a criticism; it's an acceptance that communicating specifications
>>> and plans is a hard problem to crack.)
>>>
>>> Another (possibly weak) motivation for keeping all data in the DRS
>>> directory structure is that it gives you a last-ditch backup strategy:
>>> if you lose the catalogues you can regenerate the version info etc.
>>> from the file system.
>>>
>>> Jamie
>>>
>>>> -----Original Message-----
>>>> From:go-essp-tech-bounces at ucar.edu
>>>> [mailto:go-essp-tech-bounces at ucar.edu] On Behalf Of Bryan Lawrence
>>>> Sent: 01 September 2011 08:55
>>>> To:go-essp-tech at ucar.edu
>>>> Cc:stockhause at dkrz.de
>>>> Subject: Re: [Go-essp-tech] Non-DRS File structure at data nodes
>>>>
>>>> Hi Folks
>>>>
>>>>> At least it's now clear to me that we can't rely on the DRS
>>>>> structure, so we should try to cope with this.
>>>> I'm just coming back to this, and I haven't read all of this
>>>> thread, but I don't agree with this statement!  If we can't
>>>> rely on the DRS *at the interface level*, then ESGF is
>>>> fundamentally doomed as a distributed activity, because we'll
>>>> never have the resource to support all the possible variants.
>>>>
>>>> Behind those interfaces, more flexibility might be possible,
>>>> but components would need to be pretty targeted in their
>>>> functionality.
>>>>
>>>> Bryan
>>>>
>>>>
>>>>> Thanks,
>>>>> Estani
>>>>>
>>>>> On 31.08.2011 12:55, stephen.pascoe at stfc.ac.uk wrote:
>>>>>> Hi Estani,
>>>>>>
>>>>>> I see you have some code in esgf-contrib.git for managing a replica
>>>>>> database.  There's quite a lot of DRS-parsing code there.  Is there
>>>>>> any reason why this couldn't use drslib?
>>>>>>
>>>>>> Cheers,
>>>>>> Stephen.
>>>>>>
>>>>>> ---
>>>>>> Stephen Pascoe  +44 (0)1235 445980
>>>>>> Centre of Environmental Data Archival STFC Rutherford Appleton
>>>>>> Laboratory, Harwell Oxford, Didcot OX11 0QX, UK
>>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From:go-essp-tech-bounces at ucar.edu
>>>>>> [mailto:go-essp-tech-bounces at ucar.edu] On Behalf Of Estanislao
>>>>>> Gonzalez
>>>>>> Sent: 31 August 2011 10:23
>>>>>> To: Juckes, Martin (STFC,RAL,RALSP)
>>>>>> Cc:stockhause at dkrz.de;go-essp-tech at ucar.edu
>>>>>> Subject: Re: [Go-essp-tech] Non-DRS File structure at data nodes
>>>>>>
>>>>>> Hi Martin,
>>>>>>
>>>>>> Are you planning to publish that data as a new instance
>>>> or as a replica?
>>>>>> If I recall right, Karl said he thought the replica was attached at
>>>>>> a semantic level. But I have my doubts and haven't got any feedback
>>>>>> on this. Does anyone know if the gateway can handle a replica with a
>>>>>> different URL path? (Dataset and version "should" be the same,
>>>>>> although keeping the same version will be difficult, because no tool
>>>>>> can handle this AFAIK, i.e. replicating or publishing multiple
>>>>>> datasets with different versions.)
>>>>>>
>>>>>> And regarding replication (independently of the previous question),
>>>>>> how are you going to cope with new versions? Do you already have
>>>>>> tools for harvesting the TDS and building a list of which files need
>>>>>> to be replicated, given what you already have? The catalog will just
>>>>>> publish a dataset and version along with a bunch of files; you would
>>>>>> need to keep a DB with the files you've already downloaded and
>>>>>> compare it with the catalog to work out what should be done next.
>>>>>> This information is what drslib should use to create the next
>>>>>> version. Is that what will happen? If not, how will you solve this?
>>>>>> Thanks,
>>>>>> Estani
>>>>>>
>>>>>> On 31.08.2011 10:54, martin.juckes at stfc.ac.uk wrote:
>>>>>>> Hello Martina,
>>>>>>>
>>>>>>> For BADC, I don't think we are considering storing data in
>>>>>>> anything other than the DRS structure -- we just don't have the
>>>>>>> time to build systems around multiple structures. This means that
>>>>>>> data that comes from a node with a different directory structure
>>>>>>> will have to be re-mapped. Verification of file identities will
>>>>>>> rely on checksums, as it always will when dealing with files from
>>>>>>> archives from which we have no curation guarantees,
>>>>>>>
>>>>>>> cheers,
>>>>>>> Martin
>>>>>>>
>>>>>>> ________________________________
>>>>>>> From:go-essp-tech-bounces at ucar.edu
>>>> [go-essp-tech-bounces at ucar.edu]
>>>>>>> on behalf of Martina Stockhause [stockhause at dkrz.de] Sent: 31
>>>>>>> August 2011
>>>>>>> 09:44
>>>>>>> To:go-essp-tech at ucar.edu
>>>>>>> Subject: [Go-essp-tech] Non-DRS File structure at data nodes
>>>>>>>
>>>>>>> Hi everyone,
>>>>>>>
>>>>>>> we promised to describe the problems regarding the non-DRS file
>>>>>>> structures at the data nodes. Estani has already started the
>>>>>>> discussion on the replication/user download problems (see the
>>>>>>> attached email and document).
>>>>>>>
>>>>>>> Implications for the QC:
>>>>>>> - In the QCDB we need DRS syntax. The DOI process, creation of CIM
>>>>>>> documents, and identification of the data the QC results are
>>>>>>> connected to rely on it.
>>>>>>> - The QC needs to know the version of the data checked. The DOI at
>>>>>>> the end of the QC process is assigned to a specific, unchangeable
>>>>>>> data version. At least at DKRZ we have to guarantee that the data
>>>>>>> is not changed after assignment of the DOI; therefore we store a
>>>>>>> data copy in our archive.
>>>>>>> - The QC checker tool runs on files in a given directory structure
>>>>>>> and creates results in a copy of this structure. The QC wrapper
>>>>>>> can deal with recombinations of path parts.
>>>>>>> So, if the directory structure includes all parts of the DRS
>>>>>>> syntax, the wrapper can create the DRS syntax before insertion
>>>>>>> into the QCDB. But we deal with structures at the data nodes where
>>>>>>> some information is missing from the directory path, i.e. version
>>>>>>> and MIP table. Therefore additional information would be needed
>>>>>>> for that mapping.
>>>>>>> Possible solutions to map the given file structure to the DRS
>>>>>>> directory structure before insertion into the QCDB:
>>>>>>>
>>>>>>> 1. The data nodes of the three gateways that store replicas
>>>>>>> (PCMDI, BADC, DKRZ) publish data in the DRS directory structure.
>>>>>>> Then the QC run is possible without mapping. Replication problems?
>>>>>>>
>>>>>>> 2. The directory structures of the data nodes are replicated as
>>>>>>> they are. We store the data under a certain version. How? Are
>>>>>>> there implications for the replication from the data nodes? The
>>>>>>> individual file structures down to the chunk level are stored
>>>>>>> together with their DRS identification in a repository, and a
>>>>>>> service is created to look up the DRS id for a given file in the
>>>>>>> given file structure. The QC and maybe other user data services
>>>>>>> use this service for mapping. That will slow down the QC insert
>>>>>>> process: before each insert of a chunk name, a QC result for a
>>>>>>> specific variable, or the QC result on the experiment level, that
>>>>>>> service has to be called. And who can set up and maintain such a
>>>>>>> repository? DKRZ does not have the manpower to do that in the
>>>>>>> next months.
>>>>>>> Cheers,
>>>>>>> Martina
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> -------- Original Message --------
>>>>>>> Subject: RE: ESG discussion
>>>>>>> Date: Wed, 10 Aug 2011 15:35:04 +0100
>>>>>>> From: Kettleborough, Jamie <jamie.kettleborough at metoffice.gov.uk>
>>>>>>> To: Karl Taylor <taylor13 at llnl.gov>; Wood, Richard
>>>>>>> <richard.wood at metoffice.gov.uk>
>>>>>>> CC: Carter, Mick <mick.carter at metoffice.gov.uk>; Elkington, Mark
>>>>>>> <mark.elkington at metoffice.gov.uk>; Bentley, Philip
>>>>>>> <philip.bentley at metoffice.gov.uk>; Senior, Cath
>>>>>>> <cath.senior at metoffice.gov.uk>; Hines, Adrian
>>>>>>> <adrian.hines at metoffice.gov.uk>; Dean N. Williams
>>>>>>> <williams13 at llnl.gov>; Estanislao Gonzalez <gonzalez at dkrz.de>;
>>>>>>> <martin.juckes at stfc.ac.uk>; Kettleborough, Jamie
>>>>>>> <jamie.kettleborough at metoffice.gov.uk>
>>>>>>>
>>>>>>> Hello Karl, Dean,
>>>>>>>
>>>>>>> Thanks for your reply on this, and for taking our
>>>>>>> concerns seriously. You are right to challenge us for
>>>> the specific
>>>>>>> issues, rather than us just highlighting the things that
>>>> don't meet
>>>>>>> our (possibly idealised) expectations of how the system should
>>>>>>> look.  As a result, we have had a thorough review of our key
>>>>>>> issues. I think some of them are issues that make it
>>>> harder for us
>>>>>>> to do things now; other issues are maybe more concerns
>>>> of problems
>>>>>>> being stored up. This document has been prepared with the help of
>>>>>>> Estani Gonzalez.  We would like to have Martin Juckes'
>>>> input on this
>>>>>>> too - but he is currently away on holiday.  I hope he can add to
>>>>>>> this when he returns - he has spent a lot of time thinking about
>>>>>>> the implications of data node directory structure on
>>>> versioning. I
>>>>>>> hope this helps clarify issues; if not, please let us
>>>> know. Thanks,
>>>>>>> Jamie
>>>>>>>
>>>>>>> ________________________________
>>>>>>> From: Karl Taylor [mailto:taylor13 at llnl.gov]
>>>>>>> Sent: 09 August 2011 01:48
>>>>>>> To: Wood, Richard
>>>>>>> Cc: Carter, Mick; Kettleborough, Jamie; Elkington, Mark;
>>>> Bentley,
>>>>>>> Philip; Senior, Cath; Hines, Adrian; Dean N. Williams
>>>> Subject: Re:
>>>>>>> ESG discussion
>>>>>>>
>>>>>>> Dear all,
>>>>>>>
>>>>>>> Thanks for taking the time to bring to my attention the
>>>> ESG issues
>>>>>>> that I hope can be addressed reasonably soon.  I think we're in
>>>>>>> general agreement that the user's experience should be improved.
>>>>>>>
>>>>>>> I've discussed this briefly with Dean.  I plan to meet
>>>> with him and
>>>>>>> others here, and, drawing on your suggestions, we'll attempt to
>>>>>>> find solutions and methods of communication that might
>>>> improve matters.
>>>>>>> Before doing this, it would help if you could briefly answer the
>>>>>>> following questions:
>>>>>>>
>>>>>>> 1.  Is the main issue that it is currently difficult to script
>>>>>>> downloads from all the nodes because only some support
>>>> PKI?  What
>>>>>>> other uniformity among nodes is required for you to be
>>>> able to do
>>>>>>> what you want to do (i.e., what do you specifically want
>>>> to do that
>>>>>>> is difficult to do now)?  [nb. all data nodes are
>>>> scheduled to be
>>>>>>> operating with PKI authentication by September 1.]
>>>>>>>
>>>>>>> 2.  Is there anything from the perspective of a data *provider*
>>>>>>> that needs to be done (other than make things easier for
>>>> data users)?
>>>>>>> 3.  Currently ESG and CMIP5 do not dictate the directory
>>>> structure
>>>>>>> found at each data node (although most nodes are adhering to the
>>>>>>> recommendations of the DRS).   The gateway software and
>>>> catalog make it
>>>>>>> possible to get to the data regardless of directory
>>>> structure.  It
>>>>>>> is possible that "versioning" might impose additional
>>>> constraints
>>>>>>> on the directory structure, but I'm not sure about this.
>>>>    (By the
>>>>>>> way, I'm not sure what the "versioning" issue is since
>>>> currently I
>>>>>>> think it's impossible for users to know about more than one
>>>>>>> version; is that the
>>>>>>> issue?)  From a user's or provider's perspective, is there any
>>>>>>> essential reason that the directory structure should be
>>>> the same at
>>>>>>> each node?
>>>>>>>
>>>>>>> 4.  ESG allows considerable flexibility in publishing data, and
>>>>>>> CMIP5 has suggested "best practices" to reduce
>>>> differences.  Only
>>>>>>> some of the "best practices" are currently requirements.
>>>>    A certain
>>>>>>> amount of flexibility is essential since different data
>>>> providers
>>>>>> have differing resources to support the potential capabilities of
>>>> ESG (e.g.,
>>>>>>> not all can support server-side calculations, which will
>>>> be put in place at some nodes).
>>>>>>> Likewise a provider can currently turn off the
>>>> "checksum", if this
>>>>>>> is deemed to slow publication too much (although we could insist
>>>>>>> that checksums be stored in the thredds catalogue).
>>>> Nevertheless,
>>>>>>> it is unlikely that every data node will be identically
>>>> configured for all
>>>>>>> options.    What are the *essential* ways that the data
>>>> nodes should
>>>>>>> respond identically (we may not be able to insist on uniformity
>>>>>>> that isn't essential for serving our users)?
>>>>>>>
>>>>>>> Thanks again for your input, and I look forward to your further
>>>>>>> help with this.
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Karl
>>>>>>>
>>>>>>>
>>>>>>> On 8/5/11 10:43 AM, Wood, Richard wrote:
>>>>>>>
>>>>>>> Dear Karl,
>>>>>>>
>>>>>>>       Following on from our phone call I had a discussion with
>>>>>>> technical
>>>>>>>
>>>>>>> colleagues here (Mick Carter, Jamie Kettleborough, Mark
>>>> Elkington,
>>>>>>> also earlier with Phil Bentley), and with Adrian Hines who is
>>>>>>> coordinating our CMIP5 analysis work, about ideas for
>>>> future development of the ESG.
>>>>>>> Our observations are from the user perspective, and
>>>> based on what
>>>>>>> we can gather from mailing lists and our own experience.
>>>> Coming out
>>>>>>> of our discussion we have a couple of suggestions that
>>>> could help
>>>>>>> with visibility for data providers and users:
>>>>>>>
>>>>>>> - Some areas need agreement among the data nodes as to the
>>>>>>> technical solution, and then implementation across all
>>>> the nodes,
>>>>>>> while others need a specific solution to be developed in
>>>> one place and rolled out.
>>>>>>> The group teleconferences that Dean organises appear to
>>>> be a good
>>>>>>> forum for airing specific technical ideas and solutions.
>>>> However,
>>>>>>> in our experience it can be  difficult in that kind of forum to
>>>>>>> discuss planning and prioritisation questions. From our
>>>> perspective
>>>>>>> we don't have visibility of the more project-related
>>>> issues such as
>>>>>>> key technical decisions, prioritisation and timelines, or of
>>>>>>> whether issues that have arisen in the mailing list
>>>> discussions are
>>>>>>> being followed up. We guess these may be discussed in separate
>>>>>>> project teleconferences involving the technical leads
>>>> from the data
>>>>>>> nodes. As users we would not necessarily expect to be
>>>> involved in
>>>>>>> those discussions, but as data providers and downloaders
>>>> it would be
>>>>>>> very helpful for our planning to see the outcomes of the
>>>>>>> discussions. The sort of thing we had in mind would be a
>>>> simple web
>>>>>>> page showing the priority development areas, agreed
>>>> solutions and
>>>>>>> estimated dates for completion/release. Some solutions
>>>> will need to
>>>>>>> be implemented separately across all the participating
>>>> data nodes,
>>>>>>> and in these cases it would be useful to see the
>>>> estimated timeframe for implementation at each node.
>>>>>>> This would not be intended as a 'big stick' to the partners, but
>>>>>>> simply as a planning aid so that everyone can see what's
>>>> available
>>>>>>> when and the project can identify any potential
>>>> bottlenecks or issues in advance.
>>>>>>> Also the intention is not to generate a lot of extra work.
>>>>>>> Hopefully providing this information would be pretty
>>>> light on people's time.
>>>>>>> - From where we sit it appears that some nodes are quite
>>>> successful
>>>>>>> in following best practice and implementing the
>>>> federation policies
>>>>>>> as far as they are aware of them. Could what these nodes
>>>> do be made
>>>>>>> helpful to all the data nodes (e.g. by using identical
>>>> software)?
>>>>>>> We realise there may be real differences between some
>>>> data nodes -
>>>>>>> but where possible we think that what is similar could
>>>> be enforced
>>>>>>> or made explicitly the same through sharing the software
>>>> components and tools.
>>>>>>> To set the discussion on priorities rolling, Jamie has
>>>> prepared, in
>>>>>>> consultation with others here, a short document showing the Met
>>>>>>> Office view of current priority issues (attached). If you could
>>>>>>> update us on the status of work on these issues, that
>>>> would be very
>>>>>>> useful (ideally via the web pages proposed above, which we think
>>>>>>> would be of interest to many users, or via email in the
>>>> interim).
>>>>>>> Many thanks for the update on tokenless authentication,
>>>> which is very good news.
>>>>>>>       Once again, our thanks to you, Dean and the team for
>>>> all the hard
>>>>>>>       work
>>>>>>>
>>>>>>> we know is going into this. Please let us know what you think of
>>>>>>> the above ideas and the attachment, and if there is
>>>> anything we can
>>>>>>> do to help.
>>>>>>>
>>>>>>>           Best wishes,
>>>>>>>
>>>>>>>            Richard
>>>>>>>
>>>>>>> --------------
>>>>>>> Richard Wood
>>>>>>> Met Office Fellow and Head (Oceans, Cryosphere and Dangerous
>>>>>>> Climate
>>>>>>> Change)
>>>>>>> Met Office Hadley Centre
>>>>>>> FitzRoy Road, Exeter EX1 3PB, UK
>>>>>>> Phone +44 (0)1392 886641  Fax +44 (0)1392 885681 Email
>>>>>>>
>>>> richard.wood at metoffice.gov.uk<mailto:richard.wood at metoffice.gov.uk>
>>>>>>> http://www.metoffice.gov.uk  Personal web page
>>>>>>>
>>>> http://www.metoffice.gov.uk/research/scientists/cryosphere-oceans/r
>>>>>>> ichar
>>>>>>> d-wood
>>>>>>>
>>>>>>> *** Please note I also work as Theme Leader (Climate System) for
>>>>>>> the Natural Environment Research Council ***
>>>>>>> *** Where possible please send emails on NERC matters to
>>>>>>> rwtl at nerc.ac.uk<mailto:rwtl at nerc.ac.uk>   ***
>>>> --
>>>> Bryan Lawrence
>>>> University of Reading:  Professor of Weather and Climate
>>>> Computing National Centre for Atmospheric Science: Director
>>>> of Models and Data
>>>> STFC: Director of the Centre of Environmental Data Archival
>>>> Phone +44 1235 445012; Web: home.badc.rl.ac.uk/lawrence
>>>> _______________________________________________
>>>> GO-ESSP-TECH mailing list
>>>> GO-ESSP-TECH at ucar.edu
>>>> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech
>>>>
>> --
>>
>>     Laura Carriere    laura.carriere at nasa.gov
>>     SAIC                                 301 614-5064
>>
>
>


-- 
Estanislao Gonzalez

Max-Planck-Institut für Meteorologie (MPI-M)
Deutsches Klimarechenzentrum (DKRZ) - German Climate Computing Centre
Room 108 - Bundesstrasse 45a, D-20146 Hamburg, Germany

Phone:   +49 (40) 46 00 94-126
E-Mail:  gonzalez at dkrz.de
