[Go-essp-tech] Data node authorization

Kettleborough, Jamie jamie.kettleborough at metoffice.gov.uk
Tue Jul 5 07:48:11 MDT 2011


Hello Gavin,
 
Thanks for this.  This looks useful.  Any idea when live/production
data nodes will have this version of the service on them? I couldn't
find any (but that's part of the problem, of course). When it is
available, how up to date will the registry be, e.g. are there
constraints on it, such as only knowing about data nodes running the
same release?
 
I know you were just answering my tangent.  But I think the original
question is still only half answered.  As I understand it there are two
ways this might go:
 
1. all data nodes change over to the PKI infrastructure
 
2. the ESGF continues to support (for some time) both PKI and the HTTP
query string token (I don't know the right name for this, sorry).
 
(there is a 3rd option of everyone moving to just the HTTP query string
token - but I don't think that is really under discussion).
 
My guess is that 2. is the most likely outcome and data users will have
to cope with both.  So...
 
1. How do you programmatically get data using the HTTP query string
token (I think Martin is following this up with Bob - can we have a
summary posted to the list?)
 
2. How does a user know which method to use for which nodes? (This may
be in the data-node registry, when available, but it wasn't obvious to
me from the sample Luca sent round - again, I may be missing something,
though.)
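For concreteness, the dual-support situation in option 2 could look roughly like the Python sketch below from a client's point of view. It is only an illustration under loud assumptions: the query-string parameter name `token` is HYPOTHETICAL (the thread does not give it), and the per-node labels `"pki"`/`"token"` are ASSUMED to be known somehow, which is exactly open question 2.

```python
from __future__ import annotations

from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# HYPOTHETICAL: the real name of the query-string token parameter is not
# given in this thread; replace it with whatever the data nodes expect.
TOKEN_PARAM = "token"


def with_query_token(url: str, token: str, param: str = TOKEN_PARAM) -> str:
    """Return `url` with `param=token` appended, keeping any existing query."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((param, token))
    return urlunsplit(parts._replace(query=urlencode(query)))


def plan_download(url: str, auth_method: str, token: str | None = None) -> dict:
    """Decide how to request one file, given the node's auth method.

    `auth_method` is an ASSUMED per-node label ("pki" or "token"); the
    thread does not say where a client would look this up.
    """
    if auth_method == "token":
        if token is None:
            raise ValueError("a token-based node needs a token")
        return {"url": with_query_token(url, token), "client_cert": False}
    if auth_method == "pki":
        # URL unchanged; the HTTPS client must present the user's
        # certificate, e.g. via ssl.SSLContext.load_cert_chain(certfile, keyfile).
        return {"url": url, "client_cert": True}
    raise ValueError(f"unknown auth method: {auth_method!r}")
```

For example, `plan_download("http://dn.example.org/a.nc", "token", "abc")` yields the URL `http://dn.example.org/a.nc?token=abc`, while a `"pki"` node keeps the URL and flags that a client certificate is needed.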

Apologies if I'm coming across as over-demanding here - I realise I'm
coming to this discussion relatively late in the day.  It's just that I'm
aware we have scientists who want to get data so they can start the
analysis and writing of multi-model papers in time for the first draft
of AR5. At the moment I'm really uncertain how they can get the data
while minimising the effort they have to put into finding and fetching it.

Thanks,

Jamie


________________________________

	From: Gavin M. Bell [mailto:gavin at llnl.gov] 
	Sent: 01 July 2011 20:35
	To: Kettleborough, Jamie
	Cc: Cinquini, Luca (3880); go-essp-tech at ucar.edu
	Subject: Re: [Go-essp-tech] Data node authorization
	
	
	Hello Jamie, 
	
	Allow me to solely indulge your tangent for a moment... :-)
	
	The issue of knowing who is where etc. is solved by using a
	sufficiently recent version of the ESGF "data" Node (v0.5.1+).
	The node manager's registry component automatically generates a
	continuously updated descriptive (XML) document of the nodes
	currently present in the federation at any given time. This would
	have made your task considerably easier.
	
	If you look at the sites you have collected, go to each
	esgf-node-manager page and look at the bottom left corner for the
	version. They are all earlier than v0.5.1 and hence do not have the
	automatic federation feature in place.
	
	Ex:
	http://esgnode1.nci.org.au/esgf-node-manager/  (v0.5.0)
	http://vesg.ipsl.fr/esgf-node-manager/  (v0.4.0)
	http://esg.cnrm-game-meteo.fr/esgf-node-manager/  (v0.4.0)
	http://dap.cccma.uvic.ca/esgf-node-manager/  (v0.5.0)
	http://cmip-dn.badc.rl.ac.uk/esgf-node-manager/  (v0.4.0)
	
	(NASA-GISS are not running a node manager at all)
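The version check above boils down to a tuple comparison against v0.5.1. A minimal Python sketch, assuming the version labels are typed in by hand from each esgf-node-manager page (scraping the page is not attempted, since its markup is not described here):

```python
from __future__ import annotations

import re

# From the thread: the automatic registry/federation feature needs node
# manager v0.5.1 or later.
REGISTRY_MIN = (0, 5, 1)


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a label such as "v0.5.0" into a comparable tuple (0, 5, 0)."""
    match = re.search(r"(\d+(?:\.\d+)*)", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def has_registry(version: str) -> bool:
    """True if this node-manager version should serve registration.xml."""
    return parse_version(version) >= REGISTRY_MIN


# Versions as quoted above, entered by hand from each esgf-node-manager page.
SITES = {
    "esgnode1.nci.org.au": "v0.5.0",
    "vesg.ipsl.fr": "v0.4.0",
    "esg.cnrm-game-meteo.fr": "v0.4.0",
    "dap.cccma.uvic.ca": "v0.5.0",
    "cmip-dn.badc.rl.ac.uk": "v0.4.0",
}

if __name__ == "__main__":
    for host, version in SITES.items():
        print(f"{host:28} {version:8} registry: {has_registry(version)}")
```

Comparing integer tuples rather than strings avoids the usual trap where "0.10.0" would sort before "0.5.1" as text.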
	
	If you look at more recent node installations (version 0.5.1+),
	you will see that there is a registration.xml document served under
	esgf-node-manager. It is an active document that the node manager's
	registry service updates automatically so that it always reflects the
	current state of the federation.
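A client could, in principle, fetch that document and list the nodes it describes. The Python sketch below is an assumption-laden illustration: the exact URL path (`/esgf-node-manager/registration.xml`) is inferred from "served under esgf-node-manager", and the element name `Node` is HYPOTHETICAL, since the document's schema is not shown in this thread.

```python
from __future__ import annotations

import urllib.request
import xml.etree.ElementTree as ET


def registration_url(host: str) -> str:
    # ASSUMED path: the thread only says the document is "served under
    # esgf-node-manager".
    return f"http://{host}/esgf-node-manager/registration.xml"


def fetch_registration(host: str, timeout: float = 30.0) -> bytes:
    """Download the raw registration.xml bytes from one v0.5.1+ node."""
    with urllib.request.urlopen(registration_url(host), timeout=timeout) as resp:
        return resp.read()


def local_name(tag: str) -> str:
    """Drop any "{namespace}" prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def list_nodes(xml_doc: str | bytes, node_tag: str = "Node") -> list[dict[str, str]]:
    """Return the attributes of every element whose local name is `node_tag`.

    `node_tag` is a HYPOTHETICAL element name; check a real
    registration.xml and adjust it.
    """
    root = ET.fromstring(xml_doc)
    return [dict(elem.attrib) for elem in root.iter() if local_name(elem.tag) == node_tag]
```

Typical use would be `list_nodes(fetch_registration("<some v0.5.1+ host>"))`; parsing is kept separate from fetching so it can be tried on a saved copy of the file.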
	This is a feature of the new ESGF Node. Gateways do not run node
	managers, so they are not present in the registration.xml document.
	However, you can find out about gateways indirectly by looking at an
	ESGF Node's registration entry and its "adminPeer" attribute, which
	indicates that node's target IDP service (in older ESG parlance, a
	"gateway"). The new ESGF Nodes are built on a modular component
	architecture in which sets of components embody functionality; these
	sets are what we call ESGF Node "types". There are 4 node types. The
	type currently being installed is the well-known "data" type, a.k.a.
	the "data node". The other types are not mutually exclusive and extend
	the ESGF Node's functionality to include familiar features such as:
	- User credential management and single sign on support
	- Attribute management
	- Enhanced Federation-wide searching (with new search front-end)
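Recovering the gateways indirectly via "adminPeer", as described above, amounts to grouping node entries by that attribute. A minimal Python sketch, assuming "adminPeer" is an XML attribute on each node's entry element; the attribute name `hostname` for the node's own host is HYPOTHETICAL:

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from collections import defaultdict


def nodes_by_admin_peer(xml_doc: str | bytes, host_attr: str = "hostname") -> dict[str, list[str]]:
    """Group registration entries by their "adminPeer" attribute.

    Every element carrying an adminPeer attribute is treated as a node
    entry; `host_attr` is a HYPOTHETICAL name for the attribute holding
    the node's own host name.
    """
    root = ET.fromstring(xml_doc)
    groups: dict[str, list[str]] = defaultdict(list)
    for elem in root.iter():
        peer = elem.get("adminPeer")
        if peer:
            groups[peer].append(elem.get(host_attr, "<unknown>"))
    # Keys are the IDP services, i.e. the "gateways" in older ESG terms.
    return {peer: sorted(hosts) for peer, hosts in groups.items()}
```

Keying on the attribute rather than on an element name keeps the sketch independent of how the entries themselves are tagged.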
	
	As well as features added since v0.5.1, and pending features
	coming online such as:
	- Automatic fail-over and fault tolerance
	- New administrative front ends
	- Computation / Visualization tools
	- and more...
	
	I would suggest upgrading :-).
	
	The installation/upgrade process has been streamlined to make things
	more straightforward, and the team and I are always glad to help if
	needed. Further enhancements in the queue will make installation and
	upgrading as turn-key as possible. Enhancements to the federation
	protocol, along with new features, will also soon be available in the
	upcoming v0.5.3 release, which is currently in test.
	
	FYI:
	The current installer installs the ESGF Node at v0.5.1.
	In staging is v0.5.2
	In test is v0.5.3.
	
	Note: The versions listed above are those of the node manager
	component. The ESGF Node, of which it is a component, has its own
	version: currently ESG Node v1.0.4+ (Stuyvesant release).
	
	The new ESGF Node augments the data node and is a complete solution
	in and of itself, while remaining compatible with the current
	Gateway. It should be considered a useful tool for the climate
	community and an addition to the ESG ecosystem of utilities :-).
	
	Whew... (that was a long email)
	I hope this was somewhat useful information in the context of your
	tangent. :-)
	
	
	On 7/1/11 6:49 AM, Kettleborough, Jamie wrote: 

		I created this table by looking at each gateway, figuring out which
		modelling institutes contributed to the CMIP5 project, selecting a
		sample dataset, creating a wget script, and then inspecting the URL
		in the script. (I couldn't get to any NCC data as I didn't have
		access.) I only sampled one dataset.
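The "inspect the URL in the wget script" step can at least be automated. A minimal Python sketch that pulls every http(s) URL out of a script's text and returns the distinct host names; it makes no assumption about the script's layout beyond URLs being delimited by whitespace or quotes, and the result may also include non-data-node hosts (such as a gateway) that need filtering by hand:

```python
from __future__ import annotations

import re
from urllib.parse import urlsplit

# An http(s) URL runs until whitespace or a quote character.
URL_RE = re.compile(r"""https?://[^\s'"]+""")


def hosts_in_wget_script(script_text: str) -> list[str]:
    """Return the distinct host names of all URLs found in a wget script."""
    hosts = {urlsplit(url).hostname for url in URL_RE.findall(script_text)}
    hosts.discard(None)
    return sorted(hosts)
```

Run over one script per institute, this gives the institute-to-data-node mapping without reading each script by eye.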
		
		This feels a bit long-winded - what is the expected way to do this?
		Although today I was just gathering information on what data nodes
		are out there, I can imagine this as part of a real-life use case (a
		very common use case). If I want to gather a diagnostic, such as
		monthly mean surface temperature, from as many models as I can, I
		think I'd have to do this sort of trawling. OK, maybe I only have to
		do the initial mapping of institute to data node once, but I think a
		trawl between gateways is still needed to get the data. I may be
		missing something, or I may have taken some unnecessary steps;
		please let me know if this is the case. Estani, Martin, Sebastien -
		it sounds like you have already started to do this sort of thing?
		
		I also note that not all gateways know about all institutes - I
		think this is a known problem. For instance, PCMDI doesn't know about
		IPSL, and only NCI seems to know about CSIRO. Any ideas when this
		might be resolved?
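Gaps like these can be listed mechanically once each gateway's set of known institutes has been collected. A minimal Python sketch; the toy data in the test is illustrative only (`INST_X` and `GW_Y` are placeholders), and merely mirrors the two observations above rather than any real gateway contents:

```python
from __future__ import annotations


def missing_per_gateway(known: dict[str, set[str]]) -> dict[str, set[str]]:
    """For each gateway, the institutes some other gateway knows but it doesn't."""
    everyone = set().union(*known.values())
    return {gw: everyone - insts for gw, insts in known.items()}


def single_gateway_institutes(known: dict[str, set[str]]) -> dict[str, str]:
    """Institutes visible through exactly one gateway, mapped to that gateway."""
    seen: dict[str, list[str]] = {}
    for gw, insts in known.items():
        for inst in insts:
            seen.setdefault(inst, []).append(gw)
    return {inst: gws[0] for inst, gws in seen.items() if len(gws) == 1}
```

The second function flags the riskier case: an institute reachable through only one gateway disappears from view entirely if that gateway is down.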
		
		


	-- 
	Gavin M. Bell
	Lawrence Livermore National Labs
	--
	
	 "Never mistake a clear view for a short distance."
	       	       -Paul Saffo
	
	(GPG Key - http://rainbow.llnl.gov/dist/keys/gavin.asc)
	
	 A796 CE39 9C31 68A4 52A7  1F6B 66B7 B250 21D5 6D3E
	


