[Go-essp-tech] Selecting GridFTP Server Types in the Replication Client

Craig E. Ward cward at isi.edu
Tue May 24 09:56:38 MDT 2011


As the saying goes, there's more than one way to skin a catfish. From my 
perspective, there are issues with the way replication is integrated into ESG, 
but I don't think we can solve all of them all at once. The proposed 
gateway-centric solution solves the immediate problem without limiting changes 
at a future stage of the project.
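
To make the gateway-centric idea concrete, here is a minimal sketch of how the
replication client might select a GridFTP flavor by the "name" attribute of
"data_access_capability" (described in the quoted message below). Only the
element names and the "name", "type" and "data_access_capability" attributes
come from this thread. The "base_uri" and "uri" attribute names, the service
names "GridFTP" and "GridFTP-BDM", the hosts and the function name are all
assumptions for illustration; the real gateway metadata may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample metadata. Element names follow the thread; the
# "base_uri"/"uri" attributes, service names and hosts are made up.
SAMPLE = """
<dataset name="example.dataset">
  <data_access_capability name="GridFTP" type="GridFTP"
      base_uri="gsiftp://host-a.example.org:2811/"/>
  <data_access_capability name="GridFTP-BDM" type="GridFTP"
      base_uri="gsiftp://host-a.example.org:2812/"/>
  <file name="tas.nc">
    <file_access_point data_access_capability="GridFTP" uri="data/tas.nc"/>
    <file_access_point data_access_capability="GridFTP-BDM" uri="data/tas.nc"/>
  </file>
</dataset>
"""


def file_urls(xml_text, service_name):
    """Return full URLs for files served by the capability named service_name."""
    root = ET.fromstring(xml_text)
    # Map each capability name to its base URI.
    bases = {
        cap.get("name"): cap.get("base_uri")
        for cap in root.iter("data_access_capability")
    }
    base = bases.get(service_name)
    if base is None:
        # No capability with that name: nothing to transfer with this service.
        return []
    # Keep only access points that refer to the chosen capability.
    return [
        base + point.get("uri")
        for point in root.iter("file_access_point")
        if point.get("data_access_capability") == service_name
    ]


if __name__ == "__main__":
    print(file_urls(SAMPLE, "GridFTP-BDM"))
```

With a standardized naming convention, the user-facing configuration option
would only need to carry the service name to match, and the rest of the
metadata processing would stay as it is.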

Craig

On 5/23/11 6:09 PM, Gavin M. Bell wrote:
> Hi Craig,
>
> I am not thinking about the gateway at all.  I am thinking about
> limiting the number of actors in this transaction.  By all means press
> on.  I think there is another way to accomplish this task.
>
>
> On 5/23/11 11:50 AM, Craig E. Ward wrote:
>> This doesn't actually address the issue at hand. The problem is distinguishing
>> the different flavors of GridFTP service in the gateway meta data. As the meta
>> data from the gateway is already being retrieved and processed, it is simpler
>> to modify how it is processed than to add yet another retrieval and additional
>> processing.
>>
>> As I understand the meta data, it contains two elements that allow clients to
>> create URLs for files. One element is called "data_access_capability" and
>> contains attributes naming the service, identifying the type of service, and
>> specifying a base URI. The other element is "file_access_point," which contains
>> an attribute "data_access_capability" with the name of the service the URI
>> string is applicable to. The "file_access_point" element is a child of the
>> "file" element and "data_access_capability" is a child of "dataset."
>>
>> Ideally, the "type" attribute of the "data_access_capability" element could be
>> set to indicate the type of GridFTP service, i.e. "normal" or "ESG/BDM." In
>> another thread, Bob Drach may have written that the values for "type" are
>> restricted. We need this clarified.
>>
>> The "name" attribute for the element may be more free form and we could use
>> that. This name may come from the value assigned to "thredds_file_services" in
>> the esg.ini file. If this is correct, the naming convention could be
>> standardized administratively and not require code changes outside of the
>> replication client itself.
>>
>> One way or another, a choice needs to be made.
>>
>> The most expeditious route to a solution is to make use of either the type
>> attribute or name attribute of the data_access_capability element. Trying to
>> take other routes will require significant redesign and reimplementation and
>> take more time than I think this project has for this stage.
>>
>> Craig
>>
>> p.s. I removed the list "esgf-mechanic at lists.llnl.gov" because I am not a
>> subscriber to it. I assume that everyone that needs to know about this is on
>> the "go-essp-tech" list.
>>
>> On 5/17/11 7:39 PM, Gavin M. Bell wrote:
>>> Hi Craig,
>>>
>>> Perhaps I can help elucidate some things.  This is at least how I see
>>> things moving forward.
>>> The key thing is that you need information about nodes and what they
>>> have running on them, specifically GridFTP and what configuration is
>>> supported.
>>>
>>> In the current ESGF P2P Node there is a registry service that comes with
>>> every node and is present in all configurations.  The registry's
>>> external view is an XML file that adheres to an xsd that Rachana, Neill,
>>> Philip, Eric and myself put together.  This xml file will contain
>>> essentially a manifest of all nodes participating in the P2P federation
>>> including what services are running on them.  For services like the
>>> gateway, that currently doesn't run directly in the P2P architecture,
>>> any peer can be contacted for this information.  For example PCMDI3 will
>>> be a peer that can be contacted for inspection of the registry, but
>>> PCMDI3 is no more special than any other node that is present in the P2P
>>> federation - so by convention instead of by construction PCMDI3 is a
>>> name we all know and love and will be able to go to.  So, you may
>>> inspect the registration information on the node you are running the
>>> replication client on for the information you need i.e. gridftp nodes,
>>> their configuration and ports and such.
>>>
>>> I have attached a quick registration.xml document from a test node, it
>>> only contains two node entries at the moment, but that should be enough
>>> to illustrate that there may be many more nodes represented in the same
>>> fashion.  Note the context that you should read this in is that each
>>> node is advertising the services it has running.  It is up to the
>>> "reader" to glean from this collection other derivative information,
>>> like, 'which nodes are running gridftp services in a particular
>>> configuration', 'which nodes are pointing to a particular IDP', etc.
>>>
>>> I think this solves your questions, essentially get the info straight
>>> from the horse's mouth (aka the registry running on the nodes where the
>>> data lives).  Does this help?  Let me know if there are bits of
>>> information that you don't find accommodated so we can have them present
>>> if needed. :-)  Between the registry and the search api you should be
>>> golden. I know you have recently installed the P2P Node, so you may
>>> already have the manager in place.  The version is v0.5.0 (check
>>> /etc/esg.install_log and look for the esgf-node-manager entry).
>>>
>>> P.S.
>>> I would suggest that folks start upgrading their node installations.  We
>>> are doing a bit of cleanup and spit and polish on our respective ESGF
>>> P2P Node components, but what is posted right now, is ready - modulo
>>> spit and/or polish. :-).  I am the last man standing with some
>>> installation cleanup, however, by end of next week we will "release the
>>> hounds"! The intrepid may go at it now. ...(Hmmm.. that reminds me... we
>>> have a lot of documentation to post as well.)
>>>
>>> On 5/17/11 3:23 PM, Craig E. Ward wrote:
>>>> On the GO-ESSP call today, we bounced around some ideas about how replication
>>>> could distinguish between the different types of GridFTP servers that the ESG
>>>> Federation will have. This is what I heard. I'd like to make sure we're all on
>>>> the same page.
>>>>
>>>> A naming convention will be applied to a relevant property (Which
>>>> property?) that will mark a particular service as either "normal" GridFTP or
>>>> the ESG-security-specific (i.e. BDM) GridFTP. The replication client will use
>>>> an appropriate name when selecting services for the user's preferred data
>>>> movement agent.
>>>>
>>>> This also requires a new configuration option for the replication client that
>>>> allows the user to control which service to use in the transfer
>>>> control file.
>>>>
>>>> In the meta data XML, the replication client is looking at the name stored for
>>>> "data_access_capability." The default value to match is "GridFTP." It isn't
>>>> clear to me where this value is coming from, but the "serviceType" attribute of
>>>> the "service" element in the TDS catalog is set to "GridFTP" for that type of
>>>> protocol. Is the gateway placing the "serviceType" value into the
>>>> "data_access_capability" attribute?
>>>>
>>>> In another thread, Bob wrote about restrictions on what names were allowed at
>>>> certain points. If so, this could complicate the issue, but not prevent this
>>>> solution from working.
>>>>
>>>> Who remembers things differently?
>>>>
>>>> Thanks,
>>>>
>>>> Craig
>>>>
>

-- 
Craig E. Ward
USC Information Sciences Institute
310-448-8271
cward at ISI.EDU

