[Go-essp-tech] Checksums on data nodes

Kettleborough, Jamie jamie.kettleborough at metoffice.gov.uk
Thu Jul 7 07:53:32 MDT 2011


Hello Martin,

As you have been pulling data back from different nodes, how often has
the checksum picked up a corrupt transfer?  How often could this
corruption have been spotted just by checking the file size?

Thanks,

Jamie 
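Jamie's question turns on what each check can detect. A size comparison only catches transfers that are truncated or padded; a checksum also catches corruption that leaves the length unchanged. The sketch below (Python, standard library only) checks both. It assumes the catalogue supplies an expected size and a hex digest for each file; the helper names are hypothetical, and MD5 is only an assumed default (the algorithm is a parameter) because the thread does not say which one the THREDDS catalogues use.

```python
import hashlib
import os
import tempfile


def file_digest(path, algorithm="md5", chunk_size=1 << 20):
    """Hex digest of a file, read in 1 MiB chunks so it is never loaded whole."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_download(path, expected_size, expected_digest, algorithm="md5"):
    """Return (size_ok, checksum_ok) for a downloaded file.

    Hypothetical helper: expected_size and expected_digest are assumed to
    come from the data node's catalogue entry for the file.
    """
    size_ok = os.path.getsize(path) == expected_size
    checksum_ok = file_digest(path, algorithm) == expected_digest.lower()
    return size_ok, checksum_ok


if __name__ == "__main__":
    payload = bytes(range(256)) * 4          # stand-in for a real data file
    expected_size = len(payload)
    expected_md5 = hashlib.md5(payload).hexdigest()

    flipped = bytearray(payload)
    flipped[100] ^= 0xFF                     # same length, one byte changed

    cases = [("intact", payload), ("bit-flip", bytes(flipped)), ("truncated", payload[:-10])]
    with tempfile.TemporaryDirectory() as tmp:
        for name, content in cases:
            path = os.path.join(tmp, name + ".nc")
            with open(path, "wb") as f:
                f.write(content)
            print(name, verify_download(path, expected_size, expected_md5))
```

Run as a script, it prints `intact (True, True)`, `bit-flip (True, False)` and `truncated (False, False)`: the size check misses the same-length corruption, which is the case only a checksum catches.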

> -----Original Message-----
> From: martin.juckes at stfc.ac.uk [mailto:martin.juckes at stfc.ac.uk] 
> Sent: 06 July 2011 22:43
> To: Kettleborough, Jamie; gavin at llnl.gov
> Cc: go-essp-tech at ucar.edu
> Subject: Checksums and PKI access control on data nodes
> 
> Hi Jamie,
> 
> just picking up something on one of your data node 
> authorization threads. 
> 
> I think programmatic access to data requires PKI security -- 
> I don't see any prospect of adequate data access with the 
> http token approach. 
> 
> I think that checksums are also necessary to guarantee data 
> integrity -- these are given in the THREDDS catalogues of 
> BADC, IPSL, and CNRM -- and CCCMA is in the process of adding them. 
> 
> I aim to continue contacting data nodes over the coming weeks 
> and hope that there will be steady progress in levelling the 
> quality of service upwards,
> 
> cheers,
> Martin
> 
> ________________________________________
> From: go-essp-tech-bounces at ucar.edu 
> [go-essp-tech-bounces at ucar.edu] on behalf of Kettleborough, 
> Jamie [jamie.kettleborough at metoffice.gov.uk]
> Sent: 05 July 2011 14:48
> To: Gavin M. Bell
> Cc: go-essp-tech at ucar.edu
> Subject: Re: [Go-essp-tech] Data node authorization
> 
> Hello Gavin,
> 
> thanks for this.  This looks useful.  Any idea when any
> live/production data nodes will have this version of the
> service on them? I couldn't find any (but that's part of
> the problem, of course). When it is available, how up to date
> will the registry be? For example, are there constraints on it,
> such as it only knowing about data nodes running the same release?
> 
> I know you were just answering my tangent.  But I think the 
> original question is still only half answered.  As I 
> understand it there are two ways this might go:
> 
> 1. all data nodes upgrade to the PKI infrastructure
> 
> 2. the ESGF continues to support (for some time) both PKI and 
> the HTTP query string token (I don't know the right name for 
> this, sorry).
> 
> (there is a 3rd option of everyone moving to just the HTTP
> query string token - but I don't think that is really under
> discussion).
> 
> My guess is that 2. is the most likely outcome and data users 
> will have to cope with both.  So...
> 
> 1. How do you programmatically get data using the HTTP query
> string token? (I think Martin is following this up with Bob -
> can we have a summary posted to the list?)
> 
> 2. How does a user know which method to use for which nodes?
> (This may be in the data-node registry, when available, but
> it wasn't obvious to me from the sample Luca sent round -
> again, I may be missing something though.)
> 
> Apologies if I'm coming across as over-demanding here - I
> realise I'm coming to this discussion relatively late in the
> day.  It's just that I'm aware we have scientists who want to get
> data so they can start the analysis and writing of multi-model
> papers in time for the 1st draft of the AR5. At the
> moment I'm really uncertain how they can get the data while
> minimising the effort they have to put into finding and fetching it.
> 
> Thanks,
> 
> Jamie
> 
> 
> ________________________________
> 
>         From: Gavin M. Bell [mailto:gavin at llnl.gov]
>         Sent: 01 July 2011 20:35
>         To: Kettleborough, Jamie
>         Cc: Cinquini, Luca (3880); go-essp-tech at ucar.edu
>         Subject: Re: [Go-essp-tech] Data node authorization
> 
> 
>         Hello Jamie,
> 
>         Allow me to solely indulge your tangent for a moment... :-)
> 
>         The issue of knowing who is where etc. is solved by
>         using a sufficiently recent version of the ESGF "data" Node
>         (v0.5.1+).
>         The node-manager's registry component will automatically
>         generate a continuously updating descriptive (xml) document
>         of nodes currently present in the federation at a given time.
>         This would have ameliorated your task considerably.
> 
>         If you go to the esgf-node-manager page of each of the sites
>         you have collected and look at the bottom left corner for the
>         version, you will see that they are all earlier than v0.5.1
>         and hence do not have the automatic federation feature in place.
> 
>         Ex:
>         http://esgnode1.nci.org.au/esgf-node-manager/  (v0.5.0)
>         http://vesg.ipsl.fr/esgf-node-manager/  (v0.4.0)
>         http://esg.cnrm-game-meteo.fr/esgf-node-manager/  (v0.4.0)
>         http://dap.cccma.uvic.ca/esgf-node-manager/  (v0.5.0)
>         http://cmip-dn.badc.rl.ac.uk/esgf-node-manager/  (v0.4.0)
> 
>         (NASA-GISS are not running a node manager at all)
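The check Gavin describes (read the version shown on each node manager page and compare it with v0.5.1) is a simple version comparison. A minimal Python sketch, assuming version labels of the form shown above (`v0.5.0`); retrieving the label from each page is not shown, and the helper names are hypothetical. Comparing tuples of integers avoids the string-comparison trap in which "v0.10.0" would sort before "v0.5.1".

```python
import re

# Per the thread, node managers at v0.5.1 or later serve registration.xml.
MIN_REGISTRY_VERSION = (0, 5, 1)

# Node-manager versions as listed in the thread (July 2011).
NODE_VERSIONS = {
    "esgnode1.nci.org.au": "v0.5.0",
    "vesg.ipsl.fr": "v0.4.0",
    "esg.cnrm-game-meteo.fr": "v0.4.0",
    "dap.cccma.uvic.ca": "v0.5.0",
    "cmip-dn.badc.rl.ac.uk": "v0.4.0",
}


def parse_version(label):
    """Turn a label such as 'v0.5.0' into a tuple of ints, e.g. (0, 5, 0)."""
    match = re.fullmatch(r"v?(\d+(?:\.\d+)*)", label.strip())
    if match is None:
        raise ValueError(f"unrecognised version label: {label!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def has_registry(label, minimum=MIN_REGISTRY_VERSION):
    """True if the node-manager version is at least `minimum`."""
    return parse_version(label) >= minimum


if __name__ == "__main__":
    for host, label in NODE_VERSIONS.items():
        status = "has registry" if has_registry(label) else "needs upgrade"
        print(f"{host:24} {label:7} {status}")
```

For the five sites listed, every line ends in "needs upgrade", matching Gavin's reading of the version labels.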
> 
>         If you look at more recent node installations (version
>         0.5.1+) you will see that there is a registration.xml document
>         served under esgf-node-manager.  It is an active document that
>         is automatically updated by the node manager's registry service
>         to always reflect the current state of the federation.
>         This is a feature of the new ESGF Node.  Gateways do not run
>         node managers, so they are not present in the registration.xml
>         document.  However, you can find out about gateways indirectly
>         by looking at an ESGF Node's registration entry and its
>         "adminPeer" attribute, which indicates that node's target IDP
>         service (in older ESG parlance, a "gateway").  The new ESGF
>         Nodes are built on a modular component architecture in which
>         sets of components embody functionality; these sets are what
>         we call ESGF Node "types".  There are 4 node types.  The type
>         currently being installed is the well known "data" type, a.k.a.
>         the "data node".  The other types are not mutually exclusive
>         and extend the ESGF Node's functionality to include familiar
>         features such as:
>         - User credential management and single sign-on support
>         - Attribute management
>         - Enhanced Federation-wide searching (with new search front-end)
> 
>         As well as recent features since v0.5.1 and pending
>         features coming online, such as:
>         - Automatic fail-over and fault tolerance
>         - New administrative front ends
>         - Computation / Visualization tools
>         - and more...
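Given a registration.xml document, the gateways that a set of nodes point to can be collected from the "adminPeer" attributes, as Gavin suggests. The thread does not describe the document's schema, so this Python sketch assumes only that adminPeer appears as an XML attribute on whichever elements describe nodes; the element names, host names and the URL path in the hypothetical `fetch_registration` helper are illustrative assumptions. Unprefixed XML attributes are not placed in a default namespace, so `elem.get("adminPeer")` works whether or not the document declares one.

```python
import urllib.request
import xml.etree.ElementTree as ET


def fetch_registration(host, timeout=30):
    """Download a node's registration document (hypothetical URL path)."""
    # ASSUMPTION: the thread only says the document is served "under esgf-node-manager".
    url = f"http://{host}/esgf-node-manager/registration.xml"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def gateways_from_registration(xml_text):
    """Return the sorted, distinct adminPeer values found anywhere in the document."""
    root = ET.fromstring(xml_text)
    peers = {elem.get("adminPeer") for elem in root.iter()}
    peers.discard(None)  # elements that carry no adminPeer attribute
    return sorted(peers)


# Hypothetical document layout, for illustration only.
SAMPLE = """<Registration>
  <Node hostname="node-a.example.org" adminPeer="gateway-1.example.org"/>
  <Node hostname="node-b.example.org" adminPeer="gateway-1.example.org"/>
  <Node hostname="node-c.example.org" adminPeer="gateway-2.example.org"/>
</Registration>"""

if __name__ == "__main__":
    print(gateways_from_registration(SAMPLE))
```

Against the sample it prints `['gateway-1.example.org', 'gateway-2.example.org']`; with a live node, the bytes returned by `fetch_registration` can be passed straight to `gateways_from_registration`.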
> 
>         I would suggest upgrading :-).
> 
>         The installation/upgrade process has been streamlined to
>         make things more straightforward, and the team and I are
>         always glad to help if needed.  There are further enhancements
>         in the queue to make installation/upgrading as turn-key as
>         possible.  There are also enhancements to the federation
>         protocol, and new features, that will soon be available in an
>         upcoming v0.5.3 release that is currently in test.
> 
>         FYI:
>         The current installer installs the ESGF Node at v0.5.1.
>         In staging is v0.5.2
>         In test is v0.5.3.
> 
>         Note: the versions listed above are those of the node manager
>         component.  As it is a component of the ESGF Node, the node
>         itself has its own version, currently ESG Node v1.0.4+
>         (Stuyvesant release).
> 
>         The new ESGF Node augments the data node and is a complete
>         solution in and of itself while remaining compatible with the
>         current Gateway.  It should be considered a useful tool to help
>         the climate community and an addition to the ESG ecosystem of
>         utilities :-).
> 
>         Whew... (that was a long email)
>         I hope this was somewhat useful information in the context of
>         your tangent. :-)
> 
> 
>         On 7/1/11 6:49 AM, Kettleborough, Jamie wrote:
> 
>                 I created this table by: looking at each gateway,
>                 figuring out which modelling institutes contributed to
>                 the CMIP5 project, selecting a sample data-set, creating
>                 a wget script, and then inspecting the url in the script.
>                 (I couldn't get to any NCC data as I didn't have access.)
>                 I only sampled one dataset.
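The last step Jamie describes, inspecting the URLs in a generated wget script to see which data node serves the files, can be automated. A minimal Python sketch, assuming only that the script contains the download locations as plain http(s) URLs; the helper name and the sample script text are hypothetical. A real script may also contain URLs that are not data-node URLs (a gateway address, for example), so the result still needs a human check.

```python
import re
from urllib.parse import urlsplit

# Any http(s) URL, stopping at whitespace or a quote character.
URL_PATTERN = re.compile(r"""https?://[^\s'"]+""")


def data_node_hosts(script_text):
    """Return the sorted, distinct host names of URLs found in a wget script."""
    hosts = {urlsplit(url).hostname for url in URL_PATTERN.findall(script_text)}
    hosts.discard(None)  # malformed URLs with no host part
    return sorted(hosts)


# Hypothetical excerpt of a generated wget script, for illustration only.
SAMPLE_SCRIPT = """
'tas_a.nc' 'http://datanode-a.example.org/thredds/fileServer/cmip5/tas_a.nc'
'tas_b.nc' 'http://datanode-a.example.org:8080/thredds/fileServer/cmip5/tas_b.nc'
'pr_a.nc' 'https://DataNode-B.example.org/thredds/fileServer/cmip5/pr_a.nc'
"""

if __name__ == "__main__":
    print(data_node_hosts(SAMPLE_SCRIPT))
```

`urlsplit(...).hostname` drops the port and lowercases the name, so the sample collapses to `['datanode-a.example.org', 'datanode-b.example.org']`.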
> 
>                 This feels a bit long-winded - what is the expected way
>                 to do this?  Although today I was just gathering
>                 information on what data nodes are out there, I can
>                 imagine this as part of a real-life use case (a very
>                 common use case).  If I want to gather a diagnostic, such
>                 as monthly mean surface temperature, from as many models
>                 as I can, I think I'd have to do this sort of trawling.
>                 OK, I may only have to do the initial mapping of
>                 institute to data node once, but I think there is still
>                 a trawl needed between gateways to get the data.  I may
>                 be missing something, or I may have taken some
>                 unnecessary steps; please let me know if this is the
>                 case.  Estani, Martin, Sebastien - it sounds like you
>                 have already started to do this sort of thing?
> 
>                 I also note that not all gateways know about all
>                 institutes - I think this is a known problem.  For
>                 instance, PCMDI doesn't know about IPSL, and only NCI
>                 seems to know about CSIRO.  Any idea when this might be
>                 resolved?
> 
> 
> 
> 
>         --
>         Gavin M. Bell
>         Lawrence Livermore National Labs
>         --
> 
>          "Never mistake a clear view for a short distance."
>                        -Paul Saffo
> 
>         (GPG Key - http://rainbow.llnl.gov/dist/keys/gavin.asc)
> 
>          A796 CE39 9C31 68A4 52A7  1F6B 66B7 B250 21D5 6D3E
> 
> 
> _______________________________________________
> GO-ESSP-TECH mailing list
> GO-ESSP-TECH at ucar.edu
> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech
> 

