[Go-essp-tech] Selecting GridFTP Server Types in the Replication Client

Craig E. Ward cward at isi.edu
Mon May 23 12:50:42 MDT 2011


This doesn't actually address the issue at hand. The problem is distinguishing 
the different flavors of GridFTP service in the gateway meta data. As the meta 
data from the gateway is already being retrieved and processed, it is simpler 
to modify how it is processed than to add yet another retrieval and additional 
processing.

As I understand the meta data, it contains two elements that allow clients to 
create URLs for files. One element is called "data_access_capability" and 
contains attributes naming the service, identifying the type of service, and 
specifying a base URI. The other element is "file_access_point," which contains 
an attribute "data_access_capability" with the name of the service the URI 
string is applicable to. The "file_access_point" element is a child of the 
"file" element and "data_access_capability" is a child of "dataset."

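To make that structure concrete, here is a minimal sketch in Python of how a 
client could join the two elements to build file URLs. The element names are 
the ones described above. The attribute names "base_uri" and "uri", the sample 
XML, and the host name are my assumptions for illustration, not the gateway's 
actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document. The attribute names "base_uri" and "uri"
# are assumed; only the element names come from the description above.
SAMPLE = """
<dataset name="example.dataset">
  <data_access_capability name="GridFTP" type="GridFTP"
                          base_uri="gsiftp://host.example.org:2811/"/>
  <data_access_capability name="HTTPServer" type="HTTPServer"
                          base_uri="http://host.example.org/thredds/fileServer/"/>
  <file name="tas.nc">
    <file_access_point data_access_capability="GridFTP" uri="data/tas.nc"/>
    <file_access_point data_access_capability="HTTPServer" uri="data/tas.nc"/>
  </file>
</dataset>
"""


def file_urls(dataset, service_name):
    """Return full URLs for every file reachable through service_name."""
    # Map each service name to its base URI (children of "dataset").
    bases = {
        cap.get("name"): cap.get("base_uri")
        for cap in dataset.findall("data_access_capability")
    }
    base = bases.get(service_name)
    if base is None:
        return []
    urls = []
    # Each "file" carries access points that refer to a service by name.
    for file_elem in dataset.findall("file"):
        for point in file_elem.findall("file_access_point"):
            if point.get("data_access_capability") == service_name:
                urls.append(base + point.get("uri"))
    return urls


dataset = ET.fromstring(SAMPLE)
gridftp_urls = file_urls(dataset, "GridFTP")
```

The point of the sketch is that the service name is the only key joining a 
file's access point to a base URI, so whatever distinguishes the GridFTP 
flavors has to be visible on the "data_access_capability" element.
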
Ideally, the "type" attribute of the "data_access_capability" element could be 
set to indicate the type of GridFTP service, i.e. "normal" or "ESG/BDM." In 
another thread, Bob Drach may have written that the values for "type" are 
restricted. We need this clarified.

The "name" attribute for the element may be more free form and we could use 
that. This name may come from the value assigned to "thredds_file_services" in 
the esg.ini file. If this is correct, the naming convention could be 
standardized administratively and not require code changes outside of the 
replication client itself.
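
As a rough sketch of what the name-based approach might look like inside the 
replication client, assuming a convention that does not exist yet: the names 
"GridFTP" and "GridFTP-BDM", the flavor keys, and the function below are all 
hypothetical and only illustrate the idea.

```python
# Hypothetical naming convention, not an agreed standard: "GridFTP" marks a
# normal server and "GridFTP-BDM" marks the ESG-security-specific (BDM) one.
NAME_FOR_FLAVOR = {
    "normal": "GridFTP",
    "bdm": "GridFTP-BDM",
}


def select_service(capabilities, flavor):
    """Return the first capability whose name matches the requested flavor.

    capabilities is a list of dicts with at least a "name" key, standing in
    for the parsed "data_access_capability" elements. Returns None when no
    service with the conventional name is advertised.
    """
    wanted = NAME_FOR_FLAVOR[flavor]
    for cap in capabilities:
        if cap["name"] == wanted:
            return cap
    return None


# Usage with made-up services; the flavor would come from a client option.
services = [
    {"name": "HTTPServer", "type": "HTTPServer"},
    {"name": "GridFTP", "type": "GridFTP"},
    {"name": "GridFTP-BDM", "type": "GridFTP"},
]
chosen = select_service(services, "bdm")
```

Note that both GridFTP entries keep the same "type" value here, so this would 
still work if the values for "type" turn out to be restricted.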

One way or another, a choice needs to be made.

The most expeditious route to a solution is to make use of either the type 
attribute or name attribute of the data_access_capability element. Trying to 
take other routes will require significant redesign and reimplementation and 
take more time than I think this project has at this stage.

Craig

p.s. I removed the list "esgf-mechanic at lists.llnl.gov" because I am not a 
subscriber to it. I assume that everyone who needs to know about this is on 
the "go-essp-tech" list.

On 5/17/11 7:39 PM, Gavin M. Bell wrote:
> Hi Craig,
>
> Perhaps I can help elucidate some things.  This is at least how I see
> things moving forward.
> The key thing is that you need information about nodes and what they
> have running on them, specifically GridFTP and what configuration is
> supported.
>
> In the current ESGF P2P Node there is a registry service that comes with
> every node and is present in all configurations.  The registry's
> external view is an XML file that adheres to an xsd that Rachana, Neill,
> Philip, Eric and myself put together.  This xml file will contain
> essentially a manifest of all nodes participating in the P2P federation
> including what services are running on them.  For services like the
> gateway, that currently doesn't run directly in the P2P architecture,
> any peer can be contacted for this information.  For example PCMDI3 will
> be a peer that can be contacted for inspection of the registry, but
> PCMDI3 is no more special than any other node that is present in the P2P
> federation - so by convention instead of by construction PCMDI3 is a
> name we all know and love and will be able to go to.  So, you may
> inspect the registration information on the node you are running the
> replication client on for the information you need i.e. gridftp nodes,
> their configuration and ports and such.
>
> I have attached a quick registration.xml document from a test node, it
> only contains two node entries at the moment, but that should be enough
> to illustrate that there may be many more nodes represented in the same
> fashion.  Note the context that you should read this in is that each
> node is advertising the services it has running.  It is up to the
> "reader" to glean from this collection other derivative information,
> like, 'which nodes are running gridftp services in a particular
> configuration', 'which nodes are pointing to a particular IDP', etc.
>
> I think this solves your questions, essentially get the info straight
> from the horse's mouth (aka the registry running on the nodes where the
> data lives).  Does this help?  Let me know if there are bits of
> information that you don't find accommodated so we can have them present
> if needed. :-)  Between the registry and the search api you should be
> golden. I know you have recently installed the P2P Node, so you may
> already have the manager in place.  The version is v0.5.0 (check
> /etc/esg.install_log and look for the esgf-node-manager entry).
>
> P.S.
> I would suggest that folks start upgrading their node installations.  We
> are doing a bit of cleanup and spit and polish on our respective ESGF
> P2P Node components, but what is posted right now, is ready - modulo
> spit and/or polish. :-).  I am the last man standing with some
> installation cleanup, however, by end of next week we will "release the
> hounds"! The intrepid may go at it now. ...(Hmmm.. that reminds me... we
> have a lot of documentation to post as well.)
>
> On 5/17/11 3:23 PM, Craig E. Ward wrote:
>> On the GO-ESSP call today, we bounced around some ideas about how replication
>> could distinguish between the different types of GridFTP servers that the ESG
>> Federation will have. This is what I heard. I'd like to make sure we're all on
>> the same page.
>>
>> A naming convention will be applied to a relevant property (Which
>> property?) that will mark a particular service as either "normal" GridFTP or
>> the ESG-security-specific (i.e. BDM) GridFTP. The replication client will use
>> an appropriate name when selecting services for the user's preferred data
>> movement agent.
>>
>> This also requires a new configuration option for the replication client that
>> allows the user to control which service to use in the transfer
>> control file.
>>
>> In the meta data XML, the replication client is looking at the name stored for
>> "data_access_capability." The default value to match is "GridFTP." It isn't
>> clear to me where this value is coming from, but the "serviceType" attribute of
>> the "service" element in the TDS catalog is set to "GridFTP" for that type of
>> protocol. Is the gateway placing the "serviceType" value into the
>> "data_access_capability" attribute?
>>
>> In another thread, Bob wrote about restrictions on what names were allowed at
>> certain points. If so, this could complicate the issue, but not prevent this
>> solution from working.
>>
>> Who remembers things differently?
>>
>> Thanks,
>>
>> Craig
>>
>

-- 
Craig E. Ward
USC Information Sciences Institute
310-448-8271
cward at ISI.EDU

