[Go-essp-tech] Checksums on data nodes

martin.juckes at stfc.ac.uk martin.juckes at stfc.ac.uk
Fri Jul 8 02:02:38 MDT 2011


Hi Jamie,

I have been checking file size before checking the checksum -- but I'm afraid I don't have statistics on failure rates. 

I believe that Alan Iwi has had experience (on another project) of corrupted files showing up with the correct size. That may have been associated with a parallel FTP client, and so may not be directly relevant to wget transfers, but I think it is best to be cautious.
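
The size-then-checksum check described above can be sketched roughly as follows. This is only an illustration: it assumes MD5 checksums (the algorithm is not stated in this thread), and `file_md5` and `verify_download` are hypothetical helper names. The expected size and checksum would come from wherever the data node publishes them, e.g. its THREDDS catalogue.

```python
import hashlib
import os


def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    # MD5 is an assumption here; swap in the algorithm the node publishes.
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_size, expected_md5):
    """Check the file size first (cheap), then the checksum.

    Returns "ok", "size mismatch" or "checksum mismatch".
    """
    if os.path.getsize(path) != expected_size:
        return "size mismatch"
    if file_md5(path) != expected_md5.lower():
        return "checksum mismatch"
    return "ok"
```

The size comparison is cheap and catches truncated transfers; the checksum is what catches corruption that leaves the size unchanged.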

I have data from CNRM which I transferred before they started publishing checksums -- I need to go through that now and will let you know the results.

Cheers,
Martin

> >-----Original Message-----
> >From: Kettleborough, Jamie
> >[mailto:jamie.kettleborough at metoffice.gov.uk]
> >Sent: 07 July 2011 14:54
> >To: Juckes, Martin (STFC,RAL,RALSP); gavin at llnl.gov
> >Cc: go-essp-tech at ucar.edu; Kettleborough, Jamie
> >Subject: RE: Checksums on data nodes
> >
> >Hello Martin,
> >
> >As you have been pulling data back from different nodes how often has
> >the checksum picked up a corrupt transfer?  How often could this
> >corruption have been spotted by just checking the file size?
> >
> >Thanks,
> >
> >Jamie
> >
> >> -----Original Message-----
> >> From: martin.juckes at stfc.ac.uk [mailto:martin.juckes at stfc.ac.uk]
> >> Sent: 06 July 2011 22:43
> >> To: Kettleborough, Jamie; gavin at llnl.gov
> >> Cc: go-essp-tech at ucar.edu
> >> Subject: Checksums and PKI access control on data nodes
> >>
> >> Hi Jamie,
> >>
> >> just picking up something on one of your data node
> >> authorization threads.
> >>
> >> I think programmatic access to data requires PKI security --
> >> I don't see any prospect of adequate data access with the
> >> http token approach.
> >>
> >> I think that checksums are also necessary to guarantee data
> >> integrity -- these are given in the THREDDS catalogues of
> >> BADC, IPSL, and CNRM -- and CCCMA is in the process of adding them.
> >>
> >> I aim to continue contacting data nodes over the coming weeks
> >> and hope that there will be steady progress in levelling the
> >> quality of service upwards,
> >>
> >> cheers,
> >> Martin
> >>
> >> ________________________________________
> >> From: go-essp-tech-bounces at ucar.edu
> >> [go-essp-tech-bounces at ucar.edu] on behalf of Kettleborough,
> >> Jamie [jamie.kettleborough at metoffice.gov.uk]
> >> Sent: 05 July 2011 14:48
> >> To: Gavin M. Bell
> >> Cc: go-essp-tech at ucar.edu
> >> Subject: Re: [Go-essp-tech] Data node authorization
> >>
> >> Hello Gavin,
> >>
> >> thanks for this.  This looks useful.  Any ideas when any
> >> live/production data nodes will have this version of the
> >> service on them? - I couldn't find any (but that's part of
> >> the problem of course). When available, how up to date will
> >> the registry be, e.g. are there constraints on it, such as it
> >> only knowing about data nodes running the same releases?
> >>
> >> I know you were just answering my tangent.  But I think the
> >> original question is still only half answered.  As I
> >> understand it there are two ways this might go:
> >>
> >> 1. all data nodes change to the PKI infrastructure
> >>
> >> 2. the ESGF continues to support (for some time) both PKI and
> >> the HTTP query string token (I don't know the right name for
> >> this, sorry).
> >>
> >> (there is a 3rd option of everyone moving to just the HTTP
> >> query string token - but I don't think that is really under
> >> discussion).
> >>
> >> My guess is that 2. is the most likely outcome and data users
> >> will have to cope with both.  So...
> >>
> >> 1. How do you programmatically get data using the HTTP query
> >> string token (I think Martin is following this up with Bob -
> >> can we have a summary posted to the list?)
> >>
> >> 2. How does a user know which method to use for which nodes?
> >> (This may be in the data-node registry, when available, but
> >> it wasn't obvious to me from the sample Luca sent round -
> >> again I may be missing something though).
> >>
> >> Apologies if I'm coming across as over demanding here - I
> >> realise I'm coming to this discussion relatively late in the
> >> day.  I'm just aware that we have scientists who want to get
> >> data so they can start the analysis and writing of multi-model
> >> papers in time for the 1st draft of the AR5. At the
> >> moment I'm really uncertain how they can get the data
> >> while minimising the effort they have to put into finding and
> >> fetching it.
> >>
> >> Thanks,
> >>
> >> Jamie
> >>
> >>
> >> ________________________________
> >>
> >>         From: Gavin M. Bell [mailto:gavin at llnl.gov]
> >>         Sent: 01 July 2011 20:35
> >>         To: Kettleborough, Jamie
> >>         Cc: Cinquini, Luca (3880); go-essp-tech at ucar.edu
> >>         Subject: Re: [Go-essp-tech] Data node authorization
> >>
> >>
> >>         Hello Jamie,
> >>
> >>         Allow me to solely indulge your tangent for a moment... :-)
> >>
> >>         The issue of knowing who is where etc. is solved by
> >> using a sufficiently recent version of the  ESGF "data" Node
> >> (v0.5.1+).
> >>         The node-manager's registry component will
> >> automatically generate a continuously updating descriptive
> >> (xml) document of nodes currently present in the federation
> >> at a given time.  This would have ameliorated your task
> >> considerably.
> >>
> >>         For the sites you have collected, go to
> >> the esgf-node-manager page and look at the bottom left corner
> >> for the version.
> >>         They are all earlier than v0.5.1 and hence do not
> >> have the automatic federation feature in place.
> >>
> >>         Ex:
> >>         http://esgnode1.nci.org.au/esgf-node-manager/  (v0.5.0)
> >>         http://vesg.ipsl.fr/esgf-node-manager/  (v0.4.0)
> >>         http://esg.cnrm-game-meteo.fr/esgf-node-manager/  (v0.4.0)
> >>         http://dap.cccma.uvic.ca/esgf-node-manager/  (v0.5.0)
> >>         http://cmip-dn.badc.rl.ac.uk/esgf-node-manager/  (v0.4.0)
> >>
> >>         (NASA-GISS are not running a node manager at all)
> >>
> >>         If you look at more recent node installations
> >> (version 0.5.1+) you will see that there is a
> >> registration.xml document that is served under
> >> esgf-node-manager.  It is an active document that is
> >> automatically updated by the node manager's registry service
> >> to always reflect the current state of the federation.
> >>         This is a feature of the new ESGF Node.  Gateways are
> >> not running node managers so they are not present in the
> >> registration.xml document.  However, you can find out about
> >> gateways indirectly by looking at the ESGF Node's
> >> registration entry and looking at the attribute "adminPeer";
> >> this indicates that node's target IDP service, which in older
> >> ESG parlance indicates a "gateway".  The new ESGF Nodes are
> >> built on a modular component architecture such that
> >> sets of components embody functionality; these are what we
> >> call ESGF Node "types".  There are 4 node types. The node
> >> type that is currently being installed is the well-known
> >> "data" type, a.k.a. the "data node"; the other types are not
> >> mutually exclusive and extend the ESGF Node's functionality to
> >> include familiar features such as:
> >>         - User credential management and single sign on support
> >>         - Attribute management
> >>         - Enhanced Federation-wide searching (with new search
> >> front-end)
> >>
> >>         As well as recent features since v0.5.1 and pending
> >> features coming on line such as:
> >>         - Automatic fail-over and fault tolerance
> >>         - New administrative front ends
> >>         - Computation / Visualization tools
> >>         - and more...
> >>
> >>         I would suggest upgrading :-).
> >>
> >>         The installation/upgrading process has been
> >> streamlined to make things more straightforward - and the
> >> team and I are always glad to help if needed.  There are
> >> further enhancements in the queue that will further
> >> streamline the process to make installation/upgrading as
> >> turn-key as possible.  There are also enhancements to the
> >> federation protocol, and new features as well, that will soon
> >> be available in an upcoming v0.5.3 release that is currently in
> >> test.
> >>
> >>         FYI:
> >>         The current installer installs the ESGF Node at v0.5.1.
> >>         In staging is v0.5.2
> >>         In test is v0.5.3.
> >>
> >>         Note: The versions listed above are those of the node
> >> manager component.  As it is a component of the ESGF Node, the
> >> node itself has its own version, currently ESG Node v1.0.4+
> >> (Stuyvesant release).
> >>
> >>         The new ESGF Node augments the data node and is a
> >> complete solution in and of itself while being compatible
> >> with the current Gateway.  It should be considered a useful
> >> tool to help the climate community and an addition to the ESG
> >> ecosystem of utilities :-).
> >>
> >>         Whew... (that was a long email)
> >>         I hope this was somewhat useful information in the
> >> context of your tangent. :-)
> >>
> >>
> >>         On 7/1/11 6:49 AM, Kettleborough, Jamie wrote:
> >>
> >>                 I created this table by: looking at each gateway,
> >>                 figuring out which modelling institutes contributed
> >>                 to the CMIP5 project, selecting a sample data-set,
> >>                 creating a wget script, and then inspecting the URL
> >>                 in the script.  (I couldn't get to any NCC data as I
> >>                 didn't have access.)  I only sampled one dataset.
> >>
> >>                 This feels a bit long-winded - what is the expected
> >>                 way to do this?  Although today I was just gathering
> >>                 information on what data nodes are out there, I can
> >>                 imagine this as part of a real-life use case (a very
> >>                 common use case).  If I want to gather a diagnostic,
> >>                 such as monthly mean surface temperature, from as
> >>                 many models as I can, I think I'd have to do this
> >>                 sort of trawling.  OK, I may only have to do the
> >>                 initial mapping of institute to data node once, but I
> >>                 think there is still a trawl needed between gateways
> >>                 to get the data.  I may be missing something, and may
> >>                 have taken some unnecessary steps. Please let me know
> >>                 if this is the case.  Estani, Martin, Sebastien -
> >>                 sounds like you have already started to do this sort
> >>                 of thing?
> >>
> >>                 I also note that not all gateways know about all
> >>                 institutes - I think this is a known problem.  For
> >>                 instance PCMDI doesn't know about IPSL, and only NCI
> >>                 seems to know about CSIRO. Any ideas when this might
> >>                 be resolved?
> >>
> >>
> >>
> >>
> >>         --
> >>         Gavin M. Bell
> >>         Lawrence Livermore National Labs
> >>         --
> >>
> >>          "Never mistake a clear view for a short distance."
> >>                        -Paul Saffo
> >>
> >>         (GPG Key - http://rainbow.llnl.gov/dist/keys/gavin.asc)
> >>
> >>          A796 CE39 9C31 68A4 52A7  1F6B 66B7 B250 21D5 6D3E
> >>
> >>
> >> _______________________________________________
> >> GO-ESSP-TECH mailing list
> >> GO-ESSP-TECH at ucar.edu
> >> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech