[Go-essp-tech] Selecting GridFTP Server Types in the Replication Client

Gavin M. Bell gavin at llnl.gov
Mon May 23 19:09:13 MDT 2011


Hi Craig,

I am not thinking about the gateway at all.  I am thinking about
limiting the number of actors in this transaction.  By all means press
on.  I think there is another way to accomplish this task.


On 5/23/11 11:50 AM, Craig E. Ward wrote:
> This doesn't actually address the issue at hand. The problem is distinguishing 
> the different flavors of GridFTP service in the gateway meta data. As the meta 
> data from the gateway is already being retrieved and processed, it is simpler 
> to modify how it is processed than to add yet another retrieval and additional 
> processing.
>
> As I understand the meta data, it contains two elements that allow clients to 
> create URLs for files. One element is called "data_access_capability" and 
> contains attributes naming the service, identifying the type of service, and 
> specifying a base URI. The other element is "file_access_point," which contains 
> an attribute "data_access_capability" with the name of the service the URI 
> string is applicable to. The "file_access_point" element is a child of the 
> "file" element and "data_access_capability" is a child of "dataset."
>
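A minimal sketch of the URL construction described in the paragraph above, assuming the base URI lives in an attribute called "base_uri" and the per-file string in an attribute called "uri". Both attribute names, the host names, and the sample document are hypothetical; only the element names and the "name", "type", and "data_access_capability" attributes come from the description.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata fragment. "base_uri" and "uri" are assumed
# attribute names; the hosts and paths are placeholders.
SAMPLE = """
<dataset name="example.dataset">
  <data_access_capability name="GridFTP" type="GridFTP"
      base_uri="gsiftp://host.example.org:2811/"/>
  <data_access_capability name="HTTPServer" type="HTTPServer"
      base_uri="http://host.example.org/thredds/fileServer/"/>
  <file name="tas.nc">
    <file_access_point data_access_capability="GridFTP" uri="data/tas.nc"/>
    <file_access_point data_access_capability="HTTPServer" uri="data/tas.nc"/>
  </file>
</dataset>
"""


def file_urls(xml_text, service_name):
    """Return full URLs for every file reachable through the named service."""
    root = ET.fromstring(xml_text)
    # Dataset-level elements: map each service name to its base URI.
    bases = {cap.get("name"): cap.get("base_uri")
             for cap in root.findall("data_access_capability")}
    base = bases.get(service_name)
    if base is None:
        return []
    urls = []
    # File-level elements: keep only access points that name this service.
    for point in root.findall("file/file_access_point"):
        if point.get("data_access_capability") == service_name:
            urls.append(base.rstrip("/") + "/" + point.get("uri").lstrip("/"))
    return urls


print(file_urls(SAMPLE, "GridFTP"))
```

The lookup is keyed on the service name rather than the type, which is why two GridFTP flavors published under different names could be told apart without changing the "type" values.
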
> Ideally, the "type" attribute of the "data_access_capability" element could be 
> set to indicate the type of GridFTP service, i.e. "normal" or "ESG/BDM." In 
> another thread, Bob Drach may have written that the values for "type" are 
> restricted. We need this clarified.
>
> The "name" attribute for the element may be more free form and we could use 
> that. This name may come from the value assigned to "thredds_file_services" in 
> the esg.ini file. If this is correct, the naming convention could be 
> standardized administratively and not require code changes outside of the 
> replication client itself.
>
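The name-based option above could look roughly like this on the replication client side. This is only a sketch: the service names "GridFTP" and "GridFTP-BDM", the flavor labels, and the function name are hypothetical stand-ins for whatever convention is agreed administratively.

```python
# Hypothetical naming convention: each agreed service name maps to a flavor.
NAME_TO_FLAVOR = {
    "GridFTP": "normal",
    "GridFTP-BDM": "bdm",
}


def select_services(capabilities, preferred_flavor):
    """Filter (name, base_uri) pairs down to the flavor the user asked for.

    `capabilities` is assumed to be the list of service names and base URIs
    already extracted from the metadata; names outside the convention are
    ignored rather than guessed at.
    """
    return [(name, uri) for name, uri in capabilities
            if NAME_TO_FLAVOR.get(name) == preferred_flavor]


services = [
    ("GridFTP", "gsiftp://host.example.org:2811/"),
    ("GridFTP-BDM", "gsiftp://host.example.org:2812/"),
    ("HTTPServer", "http://host.example.org/thredds/fileServer/"),
]
print(select_services(services, "bdm"))
```

Here `preferred_flavor` would come from a client configuration option, so the only code change is in the replication client itself.
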
> One way or another, a choice needs to be made.
>
> The most expeditious route to a solution is to make use of either the type 
> attribute or name attribute of the data_access_capability element. Trying to 
> take other routes will require significant redesign and reimplementation and 
> take more time than I think this project has for this stage.
>
> Craig
>
> p.s. I removed the list "esgf-mechanic at lists.llnl.gov" because I am not a 
> subscriber to it. I assume that everyone that needs to know about this is on 
> the "go-essp-tech" list.
>
> On 5/17/11 7:39 PM, Gavin M. Bell wrote:
>> Hi Craig,
>>
>> Perhaps I can help elucidate some things.  This is at least how I see
>> things moving forward.
>> The key thing is that you need information about nodes and what they
>> have running on them, specifically GridFTP and what configuration is
>> supported.
>>
>> In the current ESGF P2P Node there is a registry service that comes with
>> every node and is present in all configurations.  The registry's
>> external view is an XML file that adheres to an xsd that Rachana, Neill,
>> Philip, Eric and myself put together.  This xml file will contain
>> essentially a manifest of all nodes participating in the P2P federation
>> including what services are running on them.  For services like the
>> gateway, that currently doesn't run directly in the P2P architecture,
>> any peer can be contacted for this information.  For example PCMDI3 will
>> be a peer that can be contacted for inspection of the registry, but
>> PCMDI3 is no more special than any other node that is present in the P2P
>> federation - so by convention instead of by construction PCMDI3 is a
>> name we all know and love and will be able to go to.  So, you may
>> inspect the registration information on the node you are running the
>> replication client on for the information you need i.e. gridftp nodes,
>> their configuration and ports and such.
>>
>> I have attached a quick registration.xml document from a test node, it
>> only contains two node entries at the moment, but that should be enough
>> to illustrate that there may be many more nodes represented in the same
>> fashion.  Note the context that you should read this in is that each
>> node is advertising the services it has running.  It is up to the
>> "reader" to glean from this collection other derivative information,
>> like, 'which nodes are running gridftp services in a particular
>> configuration', 'which nodes are pointing to a particular IDP', etc.
>>
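One way a "reader" might derive 'which nodes are running gridftp services in a particular configuration' from the registry document is sketched below. The attached registration.xml and its XSD are not reproduced in this thread, so every element and attribute name here ("registration", "node", "gridftp_service", "configuration", "hostname", "type", "port") is a hypothetical placeholder, as are the hosts and ports.

```python
import xml.etree.ElementTree as ET

# Hypothetical registry layout; the real schema is NOT shown in this thread.
SAMPLE_REGISTRY = """
<registration>
  <node hostname="node1.example.org">
    <gridftp_service>
      <configuration type="normal" port="2811"/>
      <configuration type="bdm" port="2812"/>
    </gridftp_service>
  </node>
  <node hostname="node2.example.org">
    <gridftp_service>
      <configuration type="normal" port="2811"/>
    </gridftp_service>
  </node>
</registration>
"""


def gridftp_endpoints(xml_text, config_type):
    """Return (hostname, port) for each node advertising the given configuration."""
    root = ET.fromstring(xml_text)
    found = []
    for node in root.findall("node"):
        # Each node advertises its own services; scan its GridFTP configurations.
        for cfg in node.findall("gridftp_service/configuration"):
            if cfg.get("type") == config_type:
                found.append((node.get("hostname"), int(cfg.get("port"))))
    return found


print(gridftp_endpoints(SAMPLE_REGISTRY, "bdm"))
```

The same loop shape would apply to other derived questions, such as grouping nodes by the IDP they point to, once the real element names are substituted.
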
>> I think this answers your questions: essentially, get the info straight
>> from the horse's mouth (aka the registry running on the nodes where the
>> data lives).  Does this help?  Let me know if there are bits of
>> information that you don't find accommodated so we can have them present
>> if needed. :-)  Between the registry and the search api you should be
>> golden. I know you have recently installed the P2P Node, so you may
>> already have the manager in place.  The version is v0.5.0 (check
>> /etc/esg.install_log and look for the esgf-node-manager entry).
>>
>> P.S.
>> I would suggest that folks start upgrading their node installations.  We
>> are doing a bit of cleanup and spit and polish on our respective ESGF
>> P2P Node components, but what is posted right now, is ready - modulo
>> spit and/or polish. :-).  I am the last man standing with some
>> installation cleanup, however, by end of next week we will "release the
>> hounds"! The intrepid may go at it now. ...(Hmmm.. that reminds me... we
>> have a lot of documentation to post as well.)
>>
>> On 5/17/11 3:23 PM, Craig E. Ward wrote:
>>> On the GO-ESSP call today, we bounced around some ideas about how replication
>>> could distinguish between the different types of GridFTP servers that the ESG
>>> Federation will have. This is what I heard. I'd like to make sure we're all on
>>> the same page.
>>>
>>> A naming convention will be applied to a relevant property (Which
>>> property?) that will mark a particular service as either "normal" GridFTP or
>>> the ESG-security-specific (i.e. BDM) GridFTP. The replication client will use
>>> an appropriate name when selecting services for the user's preferred data
>>> movement agent.
>>>
>>> This also requires a new configuration option for the replication client that
>>> allows the user to control which service to use in the transfer
>>> control file.
>>>
>>> In the meta data XML, the replication client is looking at the name stored for
>>> "data_access_capability." The default value to match is "GridFTP." It isn't
>>> clear to me where this value is coming from, but the "serviceType" attribute of
>>> the "service" element in the TDS catalog is set to "GridFTP" for that type of
>>> protocol. Is the gateway placing the "serviceType" value into the
>>> "data_access_capability" attribute?
>>>
>>> In another thread, Bob wrote about restrictions on what names were allowed at
>>> certain points. If so, this could complicate the issue, but not prevent this
>>> solution from working.
>>>
>>> Who remembers things differently?
>>>
>>> Thanks,
>>>
>>> Craig
>>>

-- 
Gavin M. Bell
--

 "Never mistake a clear view for a short distance."
       	       -Paul Saffo



