[Go-essp-tech] Selecting GridFTP Server Types in the Replication Client

Gavin M. Bell gavin at llnl.gov
Tue May 24 10:02:33 MDT 2011


Hi Craig,

Indeed, as long as said catfish gets skinned in the end :-).
I am going to investigate another way.  If my investigation bears fruit
we can trade notes and put this thing to bed.


On 5/24/11 8:56 AM, Craig E. Ward wrote:
> As the saying goes, there's more than one way to skin a catfish. From my 
> perspective, there are issues with the way replication is integrated into ESG, 
> but I don't think we can solve all of them at once. The proposed 
> gateway-centric solution solves the immediate problem without limiting changes 
> at a future stage of the project.
>
> Craig
>
> On 5/23/11 6:09 PM, Gavin M. Bell wrote:
>> Hi Craig,
>>
>> I am not thinking about the gateway at all.  I am thinking about
>> limiting the number of actors in this transaction.  By all means press
>> on.  I think there is another way to accomplish this task.
>>
>>
>> On 5/23/11 11:50 AM, Craig E. Ward wrote:
>>> This doesn't actually address the issue at hand. The problem is distinguishing
>>> the different flavors of GridFTP service in the gateway meta data. As the meta
>>> data from the gateway is already being retrieved and processed, it is simpler
>>> to modify how it is processed than to add yet another retrieval and additional
>>> processing.
>>>
>>> As I understand the meta data, it contains two elements that allow clients to
>>> create URLs for files. One element is called "data_access_capability" and
>>> contains attributes naming the service, identifying the type of service, and
>>> specifying a base URI. The other element is "file_access_point," which contains
>>> an attribute "data_access_capability" with the name of the service the URI
>>> string is applicable to. The "file_access_point" element is a child of the
>>> "file" element and "data_access_capability" is a child of "dataset."
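A minimal Python sketch of how a client could combine the two elements described above into full file URLs. The element names come from the thread. The attribute names "name", "type", "base_uri" and "uri", and the sample values, are assumptions for illustration, not the actual gateway schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata fragment: element names are from the thread, but the
# attribute names "name", "type", "base_uri" and "uri" are assumptions.
SAMPLE = """
<dataset id="example.dataset">
  <data_access_capability name="GridFTP" type="GridFTP"
                          base_uri="gsiftp://data.example.org:2811/"/>
  <file id="f1">
    <file_access_point data_access_capability="GridFTP"
                       uri="cmip5/tas_Amon.nc"/>
  </file>
</dataset>
"""

def file_urls(xml_text):
    """Map each file id to the URLs built from its file_access_point elements."""
    root = ET.fromstring(xml_text.strip())
    # Capability name -> base URI, taken from the dataset-level elements.
    bases = {cap.get("name"): cap.get("base_uri")
             for cap in root.iter("data_access_capability")}
    urls = {}
    for f in root.iter("file"):
        for fap in f.findall("file_access_point"):
            # This attribute names the capability the relative URI belongs to.
            base = bases.get(fap.get("data_access_capability"))
            if base is None:
                continue  # no matching capability; skip rather than guess
            full = base.rstrip("/") + "/" + fap.get("uri").lstrip("/")
            urls.setdefault(f.get("id"), []).append(full)
    return urls
```

Plain string joining is used because urllib.parse.urljoin does not resolve relative paths for schemes it does not know, such as gsiftp.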
>>>
>>> Ideally, the "type" attribute of the "data_access_capability" element could be
>>> set to indicate the type of GridFTP service, i.e. "normal" or "ESG/BDM." In
>>> another thread, Bob Drach may have written that the values for "type" are
>>> restricted. We need this clarified.
>>>
>>> The "name" attribute for the element may be more free form and we could use
>>> that. This name may come from the value assigned to "thredds_file_services" in
>>> the esg.ini file. If this is correct, the naming convention could be
>>> standardized administratively and not require code changes outside of the
>>> replication client itself.
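The two options above (a distinct "type" value, or a naming convention on "name") can be sketched as one classification function. Both conventions used here, the "GridFTP-BDM" type value and the "BDM" name suffix, are hypothetical placeholders for whatever values the project actually agrees on.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Capability:
    """Minimal stand-in for a data_access_capability element."""
    name: str
    type: str
    base_uri: str

# Hypothetical conventions, for illustration only: ESG has not defined either
# the "GridFTP-BDM" type value or the "BDM" name suffix.
BDM_TYPE = "GridFTP-BDM"
BDM_NAME_SUFFIX = "BDM"

def gridftp_flavor(cap: Capability) -> Optional[str]:
    """Return "ESG/BDM", "normal", or None if cap is not a GridFTP service."""
    # Option 1: a distinct value in the "type" attribute.
    if cap.type == BDM_TYPE:
        return "ESG/BDM"
    if cap.type != "GridFTP":
        return None
    # Option 2: "type" is restricted to "GridFTP", so fall back to a naming
    # convention on the freer-form "name" attribute (e.g. set via esg.ini).
    if cap.name.upper().endswith(BDM_NAME_SUFFIX):
        return "ESG/BDM"
    return "normal"
```

Checking "type" first lets the client use a dedicated type value if one is allowed, while the name fallback still works if the allowed type values turn out to be restricted.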
>>>
>>> One way or another, a choice needs to be made.
>>>
>>> The most expeditious route to a solution is to make use of either the type
>>> attribute or name attribute of the data_access_capability element. Trying to
>>> take other routes will require significant redesign and reimplementation and
>>> take more time than I think this project has for this stage.
>>>
>>> Craig
>>>
>>> p.s. I removed the list "esgf-mechanic at lists.llnl.gov" because I am not a
>>> subscriber to it. I assume that everyone that needs to know about this is on
>>> the "go-essp-tech" list.
>>>
>>> On 5/17/11 7:39 PM, Gavin M. Bell wrote:
>>>> Hi Craig,
>>>>
>>>> Perhaps I can help elucidate some things.  This is at least how I see
>>>> things moving forward.
>>>> The key thing is that you need information about nodes and what they
>>>> have running on them, specifically GridFTP and what configuration is
>>>> supported.
>>>>
>>>> In the current ESGF P2P Node there is a registry service that comes with
>>>> every node and is present in all configurations.  The registry's
>>>> external view is an XML file that adheres to an XSD that Rachana, Neill,
>>>> Philip, Eric and I put together.  This XML file will contain
>>>> essentially a manifest of all nodes participating in the P2P federation,
>>>> including what services are running on them.  For services like the
>>>> gateway, which currently doesn't run directly in the P2P architecture,
>>>> any peer can be contacted for this information.  For example, PCMDI3 will
>>>> be a peer that can be contacted for inspection of the registry, but
>>>> PCMDI3 is no more special than any other node present in the P2P
>>>> federation - so by convention, not by construction, PCMDI3 is a
>>>> name we all know and love and will be able to go to.  So you may
>>>> inspect the registration information on the node where you run the
>>>> replication client to get the information you need, i.e. GridFTP nodes,
>>>> their configurations, ports and so on.
>>>>
>>>> I have attached a quick registration.xml document from a test node.  It
>>>> only contains two node entries at the moment, but that should be enough
>>>> to illustrate that many more nodes may be represented in the same
>>>> fashion.  Read it in this context: each node is advertising the
>>>> services it has running.  It is up to the "reader" to glean other
>>>> derivative information from this collection, such as 'which nodes are
>>>> running GridFTP services in a particular configuration', 'which nodes
>>>> are pointing to a particular IDP', etc.
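As an illustration of the kind of derivative query described above, here is a Python sketch that scans a registry document for GridFTP services in a given configuration. The element and attribute names ("registry", "node", "service", "type", "config", "port") are hypothetical; the real names are defined by the ESGF registration XSD, which is not reproduced in this thread. If the real document declares an XML namespace, the tag names passed to ElementTree would need to be namespace-qualified.

```python
import xml.etree.ElementTree as ET

# Hypothetical registry layout; NOT the actual ESGF registration schema.
REGISTRY = """
<registry>
  <node hostname="node1.example.org">
    <service type="GridFTP" config="normal" port="2811"/>
    <service type="IdP" endpoint="https://node1.example.org/idp"/>
  </node>
  <node hostname="node2.example.org">
    <service type="GridFTP" config="ESG/BDM" port="2812"/>
  </node>
</registry>
"""

def gridftp_nodes(xml_text, config):
    """Return (hostname, port) for every node advertising GridFTP in `config`."""
    root = ET.fromstring(xml_text.strip())
    found = []
    for node in root.findall("node"):
        for svc in node.findall("service"):
            # Each node advertises its own services; the reader filters them.
            if svc.get("type") == "GridFTP" and svc.get("config") == config:
                found.append((node.get("hostname"), int(svc.get("port"))))
    return found
```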
>>>>
>>>> I think this answers your questions: essentially, get the info straight
>>>> from the horse's mouth (aka the registry running on the nodes where the
>>>> data lives).  Does this help?  Let me know if there is information you
>>>> need that isn't accommodated, so we can add it if needed. :-)
>>>> Between the registry and the search API you should be
>>>> golden. I know you have recently installed the P2P Node, so you may
>>>> already have the manager in place.  The version is v0.5.0 (check
>>>> /etc/esg.install_log and look for the esgf-node-manager entry).
>>>>
>>>> P.S.
>>>> I would suggest that folks start upgrading their node installations.  We
>>>> are doing a bit of cleanup and spit and polish on our respective ESGF
>>>> P2P Node components, but what is posted right now is ready, modulo
>>>> spit and/or polish. :-).  I am the last man standing with some
>>>> installation cleanup, however, by end of next week we will "release the
>>>> hounds"! The intrepid may go at it now. ...(Hmmm.. that reminds me... we
>>>> have a lot of documentation to post as well.)
>>>>
>>>> On 5/17/11 3:23 PM, Craig E. Ward wrote:
>>>>> On the GO-ESSP call today, we bounced around some ideas about how replication
>>>>> could distinguish between the different types of GridFTP servers that the ESG
>>>>> Federation will have. This is what I heard. I'd like to make sure we're all on
>>>>> the same page.
>>>>>
>>>>> A naming convention will be applied to the relevant property (which
>>>>> property?) that will mark a particular service as either "normal" GridFTP or
>>>>> the ESG-security-specific (i.e. BDM) GridFTP. The replication client will use
>>>>> an appropriate name when selecting services for the user's preferred data
>>>>> movement agent.
>>>>>
>>>>> This also requires a new configuration option for the replication client that
>>>>> allows the user to control which service to use in the transfer
>>>>> control file.
>>>>>
>>>>> In the meta data XML, the replication client is looking at the name stored for
>>>>> "data_access_capability." The default value to match is "GridFTP." It isn't
>>>>> clear to me where this value is coming from, but the "serviceType" attribute of
>>>>> the "service" element in the TDS catalog is set to "GridFTP" for that type of
>>>>> protocol. Is the gateway placing the "serviceType" value into the
>>>>> "data_access_capability" attribute?
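A sketch of the proposed configuration option, assuming an INI-style settings file read with Python's configparser. The section name "replication" and the option name "gridftp_service_name" are hypothetical; the fallback reproduces the current default match value "GridFTP".

```python
import configparser

# Hypothetical section/option names; only the "GridFTP" default is from the thread.
DEFAULT_SERVICE_NAME = "GridFTP"

def preferred_service_name(ini_text):
    """Read the user's preferred data_access_capability name, or the default."""
    cfg = configparser.ConfigParser()
    cfg.read_string(ini_text)
    # fallback is returned if either the section or the option is missing.
    return cfg.get("replication", "gridftp_service_name",
                   fallback=DEFAULT_SERVICE_NAME)

def select_capabilities(capabilities, ini_text):
    """Keep only capabilities whose name matches the configured value."""
    wanted = preferred_service_name(ini_text)
    return [c for c in capabilities if c["name"] == wanted]
```

For example, a user preferring the BDM server would add a "[replication]" section with "gridftp_service_name = GridFTP-BDM" (again, a hypothetical name).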
>>>>>
>>>>> In another thread, Bob wrote about restrictions on what names were allowed at
>>>>> certain points. If so, this could complicate the issue, but not prevent this
>>>>> solution from working.
>>>>>
>>>>> Who remembers things differently?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Craig
>>>>>

-- 
Gavin M. Bell
--

 "Never mistake a clear view for a short distance."
       	       -Paul Saffo

