[Go-essp-tech] Checksums on data nodes
Kettleborough, Jamie
jamie.kettleborough at metoffice.gov.uk
Thu Jul 7 07:53:32 MDT 2011
Hello Martin,
As you have been pulling data back from different nodes how often has
the checksum picked up a corrupt transfer? How often could this
corruption have been spotted by just checking the file size?
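To make the question concrete, here is a minimal sketch of how each pulled file could be classified, so that size-only failures and checksum-only failures can be counted separately. The function names are hypothetical, and the hash algorithm is left as a parameter because the thread does not say which algorithm the catalogues publish; sha256 is only a placeholder default.

```python
import hashlib
import os


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def classify_transfer(path, expected_size, expected_checksum, algorithm="sha256"):
    """Classify a downloaded file as 'ok', 'size_mismatch' or 'checksum_mismatch'.

    'checksum_mismatch' means the size was right but the content was not,
    i.e. corruption that a file-size check alone would have missed.
    """
    if os.path.getsize(path) != expected_size:
        return "size_mismatch"
    if file_checksum(path, algorithm) != expected_checksum.lower():
        return "checksum_mismatch"
    return "ok"
```

Tallying the three outcomes over a batch of transfers would answer both questions: the share of corrupt files overall, and the share that a size check would have caught.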
Thanks,
Jamie
> -----Original Message-----
> From: martin.juckes at stfc.ac.uk [mailto:martin.juckes at stfc.ac.uk]
> Sent: 06 July 2011 22:43
> To: Kettleborough, Jamie; gavin at llnl.gov
> Cc: go-essp-tech at ucar.edu
> Subject: Checksums and PKI access control on data nodes
>
> Hi Jamie,
>
> just picking up something on one of your data node
> authorization threads.
>
> I think programmatic access to data requires PKI security --
> I don't see any prospect of adequate data access with the
> http token approach.
>
> I think that checksums are also necessary to guarantee data
> integrity -- these are given in the THREDDS catalogues of
> BADC, IPSL, and CNRM -- and CCCMA is in the process of adding them.
>
> I aim to continue contacting data nodes over the coming weeks
> and hope that there will be steady progress in levelling the
> quality of service upwards,
>
> cheers,
> Martin
>
> ________________________________________
> From: go-essp-tech-bounces at ucar.edu
> [go-essp-tech-bounces at ucar.edu] on behalf of Kettleborough,
> Jamie [jamie.kettleborough at metoffice.gov.uk]
> Sent: 05 July 2011 14:48
> To: Gavin M. Bell
> Cc: go-essp-tech at ucar.edu
> Subject: Re: [Go-essp-tech] Data node authorization
>
> Hello Gavin,
>
> thanks for this. This looks useful. Any ideas when any
> live/production data nodes will have this version of the
> service on them? - I couldn't find any (but that's part of
> the problem of course). When available how up to date will
> the registry be e.g. are there constraints on it like it will
> only know about data nodes running the same releases?
>
> I know you were just answering my tangent. But I think the
> original question is still only half answered. As I
> understand it there are two ways this might go:
>
> 1. all data nodes upgrade to the PKI infrastructure
>
> 2. the ESGF continues to support (for some time) both PKI and
> the HTTP query string token (I don't know the right name for
> this, sorry).
>
> (there is a 3rd option of everyone moving to just the HTTP
> query string token - but I don't think that is really under
> discussion).
>
> My guess is that 2. is the most likely outcome and data users
> will have to cope with both. So...
>
> 1. How do you programmatically get data using the HTTP query
> string token (I think Martin is following this up with Bob -
> can we have a summary posted to the list?)
>
> 2. How does a user know which method to use for which nodes.
> (This may be in the data-node registry, when available, but
> it wasn't obvious to me from the sample Luca sent round? -
> again I may be missing something though).
>
> Apologies if I'm coming across as over demanding here - I
> realise I'm coming to this discussion relatively late in the
> day. Just I'm aware that we have scientists who want to get
> data so they can start the analysis and writing of multi
> model papers in time for the 1st draft of the AR5. At the
> moment I'm really uncertain on how they can get the data
> while minimising the effort they have to put into finding and fetching it.
>
> Thanks,
>
> Jamie
>
>
> ________________________________
>
> From: Gavin M. Bell [mailto:gavin at llnl.gov]
> Sent: 01 July 2011 20:35
> To: Kettleborough, Jamie
> Cc: Cinquini, Luca (3880); go-essp-tech at ucar.edu
> Subject: Re: [Go-essp-tech] Data node authorization
>
>
> Hello Jamie,
>
> Allow me to solely indulge your tangent for a moment... :-)
>
> The issue of knowing who is where etc. is solved by
> using a sufficiently recent version of the ESGF "data" Node
> (v0.5.1+).
> The node-manager's registry component will
> automatically generate a continuously updating descriptive
> (xml) document of nodes currently present in the federation
> at a given time. This would have ameliorated your task considerably.
>
> If you look at the sites you have collected; go to
> the esgf-node-manager page and look at the bottom left corner
> for the version.
> They are all earlier than v0.5.1 and hence do not
> have the automatic federation feature in place.
>
> Ex:
> http://esgnode1.nci.org.au/esgf-node-manager/ (v0.5.0)
> http://vesg.ipsl.fr/esgf-node-manager/ (v0.4.0)
> http://esg.cnrm-game-meteo.fr/esgf-node-manager/ (v0.4.0)
> http://dap.cccma.uvic.ca/esgf-node-manager/ (v0.5.0)
> http://cmip-dn.badc.rl.ac.uk/esgf-node-manager/ (v0.4.0)
>
> (NASA-GISS are not running a node manager at all)
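The version check described above can be sketched as follows. The host/version pairs are the ones quoted in the list; the helper names are hypothetical. Versions are compared as integer tuples rather than strings, so that a future "0.10.0" would sort after "0.5.1".

```python
# Node manager versions as quoted in the message above.
observed = {
    "esgnode1.nci.org.au": "0.5.0",
    "vesg.ipsl.fr": "0.4.0",
    "esg.cnrm-game-meteo.fr": "0.4.0",
    "dap.cccma.uvic.ca": "0.5.0",
    "cmip-dn.badc.rl.ac.uk": "0.4.0",
}

# First version said to include the automatic federation feature.
MIN_FEDERATION_VERSION = "0.5.1"


def parse_version(text):
    """Turn 'v0.5.1' or '0.5.1' into a comparable tuple such as (0, 5, 1)."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def needs_upgrade(version, minimum=MIN_FEDERATION_VERSION):
    """True if the version predates the automatic federation feature."""
    return parse_version(version) < parse_version(minimum)


outdated = sorted(host for host, v in observed.items() if needs_upgrade(v))
print(outdated)
```

For the five hosts listed, every entry falls below v0.5.1, which matches the statement that none of them has the feature in place.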
>
> If you look at more recent node installations
> (version 0.5.1+) you will see that there is a
> registration.xml document that is served under
> esgf-node-manager. It is an active document that is
> automatically updated by the node manager's registry service
> to always reflect the current state of the federation.
> This is a feature of the new ESGF Node. Gateways are
> not running node managers so they are not present in the
> registration.xml document. However, you can find out about
> gateways indirectly by looking at the ESGF Node's
> registration entry and looking at the attribute "adminPeer".
> This indicates that node's target IDP service, which in older
> ESG parlance is a "gateway". The new ESGF Nodes are
> built on a modular component architecture in which
> sets of components embody functionality; these sets are what we
> call ESGF Node "types". There are 4 node types. The node
> type that is currently being installed is the well known
> "data" type, a.k.a. the "data node". The other types are not
> mutually exclusive and extend the ESGF Node's functionality to
> include familiar features such as:
> - User credential management and single sign on support
> - Attribute management
> - Enhanced Federation-wide searching (with new search
> front-end)
>
> As well as recent features since v0.5.1 and pending
> features coming on line such as:
> - Automatic fail-over and fault tolerance
> - New administrative front ends
> - Computation / Visualization tools
> - and more...
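As a rough illustration of the "adminPeer" lookup mentioned above, the sketch below groups nodes by their adminPeer value. Only the attribute name "adminPeer" comes from the message; the element names, the "hostname" attribute and the sample hosts are assumptions made for illustration, and the real registration.xml layout may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a registration.xml document. The actual
# schema is not shown in this thread; only "adminPeer" is taken from it.
SAMPLE = """<registration>
  <node hostname="node-a.example.org" adminPeer="idp-1.example.org"/>
  <node hostname="node-b.example.org" adminPeer="idp-1.example.org"/>
  <node hostname="node-c.example.org" adminPeer="idp-2.example.org"/>
</registration>"""


def admin_peers(xml_text):
    """Map each adminPeer (the 'gateway' in older terms) to its nodes."""
    root = ET.fromstring(xml_text)
    peers = {}
    for node in root.iter("node"):
        peer = node.get("adminPeer")
        if peer:
            peers.setdefault(peer, []).append(node.get("hostname"))
    return peers


for peer, nodes in admin_peers(SAMPLE).items():
    print(peer, nodes)
```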
>
> I would suggest upgrading :-).
>
> The installation/upgrading process has been
> streamlined to make things more straightforward - and the
> team and I are always glad to help if needed. There are
> further enhancements in the queue that will further
> streamline the process to make installation/upgrading as
> turn-key as possible. There are also enhancements to the
> federation protocol and new features as well, that will soon
> be available in an upcoming v0.5.3 release that is currently in test.
>
> FYI:
> The current installer installs the ESGF Node at v0.5.1.
> In staging is v0.5.2
> In test is v0.5.3.
>
> Note: The versions listed above are those of the node manager
> component.
> As it is a component of the ESGF Node, the node itself has a
> version, currently ESG Node v1.0.4+ (Stuyvesant release).
>
> The new ESGF Node augments the data node and is a
> complete solution in and of itself while being compatible
> with the current Gateway. It should be considered a useful
> tool to help the climate community and an addition to the ESG
> ecosystem of utilities :-).
>
> Whew... (that was a long email)
> I hope this was somewhat useful information in the
> context of your tangent. :-)
>
>
> On 7/1/11 6:49 AM, Kettleborough, Jamie wrote:
>
> I created this table by: looking at each
> gateway, figuring out which
> modelling institutes contributed to the CMIP5
> project, selecting a
> sample data-set, creating a wget script, and
> then inspecting the url in
> the script. (I couldn't get to any NCC data
> as I didn't have access).
> I only sampled one dataset.
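The "inspect the url in the script" step above could be automated along these lines. This is a sketch only: the function name is hypothetical, and the sample script text is made up (the two hostnames appear elsewhere in this thread, but the paths are placeholders, not real dataset locations).

```python
import re
from urllib.parse import urlsplit

# Matches http(s) URLs up to the next whitespace or quote character.
URL_PATTERN = re.compile(r"""https?://[^\s'"]+""")

# Placeholder script text; real gateway-generated wget scripts will differ.
SAMPLE_SCRIPT = """
wget 'http://cmip-dn.badc.rl.ac.uk/example/path/file1.nc'
wget 'http://vesg.ipsl.fr/example/path/file2.nc'
wget 'http://cmip-dn.badc.rl.ac.uk/example/path/file3.nc'
"""


def data_node_hosts(script_text):
    """Return the distinct hostnames of all URLs found in a wget script."""
    hosts = {urlsplit(url).hostname for url in URL_PATTERN.findall(script_text)}
    return sorted(host for host in hosts if host)


print(data_node_hosts(SAMPLE_SCRIPT))
```

Running this over one script per institute would give the institute-to-data-node mapping without opening each script by hand.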
>
> This feels a bit long winded - what is the
> expected way to do this?
> Although today I was just gathering
> information on what data nodes are
> out there I can imagine this as a part of a
> real life use case (a very
> common use case). If I want to gather a
> diagnostic, such as monthly
> mean surface temperature from as many models
> as I can, I think I'd have
> to do this sort of trawling. OK I maybe only
> have to do the initial
> mapping of institute to data node once, but I
> think there is still a
> trawl needed between gateways to get the
> data. I may be missing
> something - and may have taken some unnecessary
> steps. Please let me know if
> this is the case. Estani, Martin, Sebastien
> - sounds like you have
> already started to do this sort of thing?
>
> I also note that not all gateways know about
> all institutes - I think
> this is a known problem. For instance PCMDI
> doesn't know about IPSL,
> and only NCI seems to know about CSIRO. Any
> ideas when this might be
> resolved?
>
>
>
>
> --
> Gavin M. Bell
> Lawrence Livermore National Labs
> --
>
> "Never mistake a clear view for a short distance."
> -Paul Saffo
>
> (GPG Key - http://rainbow.llnl.gov/dist/keys/gavin.asc)
>
> A796 CE39 9C31 68A4 52A7 1F6B 66B7 B250 21D5 6D3E
>
>
> _______________________________________________
> GO-ESSP-TECH mailing list
> GO-ESSP-TECH at ucar.edu
> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech
>