<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#ffffff" text="#000000">

    Hi,<br>

    <br>

    Indeed, there are multiple ways to accomplish this, but let's stick
    to what we have already discussed and agreed on for the time being.<br>

    <br>

    And that's:<br>

    - Name the BDM endpoint "GridFTP-BDM" when publishing (Craig, please
    perform a case-insensitive comparison on the name attribute of
    data_access_capability).<br>

    - Add a flag to the replication client to set this value in case
    the default is not wanted (in that case I'd suggest a case-sensitive
    comparison). This allows us to extract data from HTTP services.
    (If we don't publish BDM endpoints to the server, they won't be
    available; AFAIK you get them from the publisher, and I'm not sure
    whether it queries the local DB or the Gateway API.)<br>
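    <br>
    A minimal sketch of the two rules above, in Python since the
    replication client is Python. It assumes the names have already been
    read from the data_access_capability elements; the --service-name flag
    and both function names are hypothetical, not existing esgreplicate.py
    options:<br>
<pre wrap="">
```python
import argparse

# Agreed default name for the BDM endpoint (first point above).
DEFAULT_SERVICE_NAME = "GridFTP-BDM"


def parse_args(argv):
    """Parse the hypothetical --service-name override flag."""
    parser = argparse.ArgumentParser(description="replication client sketch")
    parser.add_argument(
        "--service-name",
        default=None,
        help="data_access_capability name to use instead of the default "
             "(compared case-sensitively)",
    )
    return parser.parse_args(argv)


def select_capabilities(capability_names, requested=None):
    """Return the data_access_capability names that match the request.

    With no override, the default name is compared case-insensitively;
    an explicit override is compared case-sensitively.
    """
    if requested is None:
        wanted = DEFAULT_SERVICE_NAME.casefold()
        return [n for n in capability_names if n.casefold() == wanted]
    return [n for n in capability_names if n == requested]
```
</pre>
    With no flag, select_capabilities(["GridFTP", "gridftp-bdm",
    "HTTPServer"]) returns ["gridftp-bdm"]; passing
    parse_args(["--service-name", "HTTPServer"]).service_name as the second
    argument returns ["HTTPServer"] instead.<br>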

    <br>

    This defines the interface; it doesn't really matter where or how
    it gets published.<br>

    <br>

    Anyway, Craig, I'm having problems with globus-url-copy, as it
    keeps breaking. This forces me to fall back on my backup plan, which
    doesn't involve esgreplicate.py at all. The problems with the
    script are that it's extremely decoupled from the file transfer and
    that it still has no checksums (probably because the publisher does
    not provide them).<br>

    My backup plan involves Bash and SQLite. I'm wondering whether you'd
    want to integrate this into your replication script; in that case I'd
    port it to Python + SQLAlchemy (+ SQLite).<br>

    The idea is to keep track of the files, along with their sizes and
    checksums, to avoid re-downloading and to ensure that no data
    corruption happens in the process. I contact the Gateway directly to
    extract this data (and then transform it for GridFTP-BDM, a slightly
    different procedure, I'm afraid), but if the data node provided such
    an API, I'd contact the data node directly. In my opinion, this
    should ease the work.<br>
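    <br>
    A minimal sketch of that bookkeeping, using Python's built-in sqlite3
    module instead of SQLAlchemy to stay self-contained. The table layout,
    the function names and the choice of MD5 as the checksum are
    assumptions for illustration; the expected size and checksum would come
    from the Gateway (or from the data node, if it offered such an API):<br>
<pre wrap="">
```python
import hashlib
import os
import sqlite3

# Hypothetical schema: one row per file we expect to replicate.
SCHEMA = """
CREATE TABLE IF NOT EXISTS files (
    path     TEXT PRIMARY KEY,          -- local destination path
    size     INTEGER NOT NULL,          -- expected size in bytes
    checksum TEXT NOT NULL,             -- expected checksum (MD5 assumed)
    verified INTEGER NOT NULL DEFAULT 0 -- 1 once the local copy matched
)
"""


def open_db(db_path):
    conn = sqlite3.connect(db_path)
    conn.execute(SCHEMA)
    return conn


def register(conn, path, size, checksum):
    """Record what the file should look like; resets its verified flag."""
    with conn:  # commits on success
        conn.execute(
            "INSERT OR REPLACE INTO files (path, size, checksum, verified) "
            "VALUES (?, ?, ?, 0)",
            (path, size, checksum),
        )


def md5sum(path, chunk_size=2 ** 20):
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_download(conn, path):
    """True unless the local file exists and matches size and checksum."""
    row = conn.execute(
        "SELECT size, checksum FROM files WHERE path = ?", (path,)
    ).fetchone()
    if row is None:
        return True  # unknown file: nothing to compare against
    size, checksum = row
    # Cheap size check first, then the expensive checksum.
    if not os.path.exists(path) or os.path.getsize(path) != size:
        return True
    if md5sum(path) != checksum:
        return True  # corrupted or incomplete transfer
    with conn:
        conn.execute("UPDATE files SET verified = 1 WHERE path = ?", (path,))
    return False
```
</pre>
    needs_download() returns True for unknown, missing, truncated or
    corrupted files, so a transfer loop would only fetch those and skip
    files whose local copy already matches.<br>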

    <br>

    Thanks,<br>

    Estani<br>

    <br>

    On 24.05.2011 03:09, Gavin M. Bell wrote:

    <blockquote cite="mid:4DDB0539.5090001@llnl.gov" type="cite">

      <meta content="text/html; charset=ISO-8859-1"

        http-equiv="Content-Type">

      Hi Craig, <br>

      <br>

      I am not thinking about the gateway at all.&nbsp; I am thinking about

      limiting the number of actors in this transaction.&nbsp; By all means

      press on.&nbsp; I think there is another way to accomplish this task.<br>

      <br>

      <br>

      On 5/23/11 11:50 AM, Craig E. Ward wrote:

      <blockquote cite="mid:4DDAAC82.9090001@ISI.EDU" type="cite">

        <pre wrap="">This doesn't actually address the issue at hand. The problem is distinguishing 

the different flavors of GridFTP service in the gateway meta data. As the meta 

data from the gateway is already being retrieved and processed, it is simpler 

to modify how it is processed than to add yet another retrieval and additional 

processing.

As I understand the meta data, it contains two elements that allow clients to 

create URLs for files. One element is called "data_access_capability" and 

contains attributes naming the service, identifying the type of service, and 

specifying a base URI. The other element is "file_access_point," which contains 

an attribute "data_access_capability" with the name of the service the URI 

string is applicable to. The "file_access_point" element is a child of the 

"file" element and "data_access_capability" is a child of "dataset."

Ideally, the "type" attribute of the "data_access_capability" element could be 

set to indicate the type of GridFTP service, i.e. "normal" or "ESG/BDM." In 

another thread, Bob Drach may have written that the values for "type" are 

restricted. We need this clarified.

The "name" attribute for the element may be more free form and we could use 

that. This name may come from the value assigned to "thredds_file_services" in 

the esg.ini file. If this is correct, the naming convention could be 

standardized administratively and not require code changes outside of the 

replication client itself.
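
A minimal sketch of how a client could turn the two elements described
above into URLs for one named service, matching the name
case-insensitively. The attribute holding the base URI, the one holding
the file's URI string, and the file "id" attribute are not named above,
so BASE_ATTR, PATH_ATTR and "id" are assumptions, as is joining the two
strings by plain concatenation:

```python
import xml.etree.ElementTree as ET

# ASSUMED attribute names; check them against the real gateway metadata.
BASE_ATTR = "base_uri"  # base URI on data_access_capability (hypothetical)
PATH_ATTR = "uri"       # URI string on file_access_point (hypothetical)


def file_urls(metadata_xml, service_name):
    """Map file id to URL for the named data_access_capability service.

    data_access_capability is a child of dataset; file_access_point is a
    child of file and names its service in its data_access_capability
    attribute.
    """
    root = ET.fromstring(metadata_xml)
    wanted = service_name.casefold()
    bases = {}
    for cap in root.iter("data_access_capability"):
        if cap.get("name", "").casefold() == wanted:
            bases[cap.get("name")] = cap.get(BASE_ATTR, "")
    urls = {}
    for f in root.iter("file"):
        for fap in f.findall("file_access_point"):
            base = bases.get(fap.get("data_access_capability"))
            if base is not None:
                urls[f.get("id")] = base + fap.get(PATH_ATTR, "")
    return urls
```

For a dataset with a "GridFTP-BDM" capability, file_urls(xml,
"gridftp-bdm") returns a dict from each file id to base URI plus path;
files without a matching access point are left out.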

One way or another, a choice needs to be made.

The most expeditious route to a solution is to make use of either the type 

attribute or name attribute of the data_access_capability element. Trying to 

take other routes will require significant redesign and reimplementation and 

take more time than I think this project has for this stage.

Craig

p.s. I removed the list <a moz-do-not-send="true" class="moz-txt-link-rfc2396E" href="mailto:esgf-mechanic@lists.llnl.gov">"esgf-mechanic@lists.llnl.gov"</a> because I am not a 

subscriber to it. I assume that everyone that needs to know about this is on 

the "go-essp-tech" list.

On 5/17/11 7:39 PM, Gavin M. Bell wrote:

</pre>

        <blockquote type="cite">

          <pre wrap="">Hi Craig,

Perhaps I can help elucidate some things.  This is at least how I see

things moving forward.

The key thing is that you need information about nodes and what they

have running on them, specifically GridFTP and what configuration is

supported.

In the current ESGF P2P Node there is a registry service that comes with

every node and is present in all configurations.  The registry's

external view is an XML file that adheres to an XSD that Rachana, Neill,

Philip, Eric and I put together.  This XML file will contain

essentially a manifest of all nodes participating in the P2P federation,

including what services are running on them.  For services like the

gateway, which currently doesn't run directly in the P2P architecture,

any peer can be contacted for this information.  For example, PCMDI3 will

be a peer that can be contacted for inspection of the registry, but

PCMDI3 is no more special than any other node that is present in the P2P

federation - so by convention rather than by construction, PCMDI3 is a

name we all know and love and will be able to go to.  So, you may

inspect the registration information on the node you are running the

replication client on for the information you need i.e. gridftp nodes,

their configuration and ports and such.

I have attached a quick registration.xml document from a test node.  It

only contains two node entries at the moment, but that should be enough

to illustrate that there may be many more nodes represented in the same

fashion.  Note that this should be read in the context of each

node advertising the services it has running.  It is up to the

"reader" to glean other derivative information from this collection,

like 'which nodes are running GridFTP services in a particular

configuration', 'which nodes are pointing to a particular IDP', etc.
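
A minimal sketch of such a "reader" query: find the nodes that advertise
a given service, optionally with particular attribute values. The XSD is
not reproduced here, so the element and attribute names used below
(Node, hostname, and whatever service tag is passed in) are placeholders,
not the real registration schema; namespace prefixes are ignored:

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    # Drop the "{namespace}" prefix that ElementTree puts on tags.
    return tag.rsplit("}", 1)[-1]


def nodes_offering(registry_xml, service_tag, node_tag="Node",
                   host_attr="hostname", **required_attrs):
    """Return host names of nodes that advertise the given service.

    Tag and attribute names are assumptions; required_attrs filters on
    attributes of the service element (e.g. a port or configuration).
    """
    root = ET.fromstring(registry_xml)
    hosts = []
    for node in root.iter():
        if local_name(node.tag) != node_tag:
            continue
        for svc in node.iter():
            if local_name(svc.tag) != service_tag:
                continue
            if all(svc.get(k) == v for k, v in required_attrs.items()):
                hosts.append(node.get(host_attr))
                break  # one matching service per node is enough
    return hosts
```

With a hypothetical GridFTPService element carrying a config attribute,
nodes_offering(xml, "GridFTPService", config="bdm") would answer "which
nodes run GridFTP in the BDM configuration".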

I think this answers your questions: essentially, get the info straight

from the horse's mouth (aka the registry running on the nodes where the

data lives).  Does this help?  Let me know if there are bits of

information that you don't find accommodated so we can have them present

if needed. :-)  Between the registry and the search api you should be

golden. I know you have recently installed the P2P Node, so you may

already have the manager in place.  The version is v0.5.0 (check

/etc/esg.install_log and look for the esgf-node-manager entry).
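
A minimal sketch of that check. The log's line format is not described
here, so this just returns every line that mentions the component and
leaves reading the version off those lines to you; the function name is
hypothetical:

```python
def install_log_entries(component="esgf-node-manager",
                        log_path="/etc/esg.install_log"):
    """Return the lines of the install log that mention a component."""
    with open(log_path, encoding="utf-8", errors="replace") as fh:
        return [line.rstrip("\n") for line in fh if component in line]
```

On a node, install_log_entries() should return the esgf-node-manager
entry (or entries) to compare against v0.5.0.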

P.S.

I would suggest that folks start upgrading their node installations.  We

are doing a bit of cleanup and spit and polish on our respective ESGF

P2P Node components, but what is posted right now is ready - modulo

spit and/or polish. :-).  I am the last man standing with some

installation cleanup; however, by the end of next week we will "release the

hounds"! The intrepid may go at it now. ...(Hmmm.. that reminds me... we

have a lot of documentation to post as well.)

On 5/17/11 3:23 PM, Craig E. Ward wrote:

</pre>

          <blockquote type="cite">

            <pre wrap="">On the GO-ESSP call today, we bounced around some ideas about how replication

could distinguish between the different types of GridFTP servers that the ESG

Federation will have. This is what I heard. I'd like to make sure we're all on

the same page.

A naming convention will be applied to a relevant property (which

property?) that will mark a particular service as either "normal" GridFTP or

the ESG-security-specific (i.e. BDM) GridFTP. The replication client will use

an appropriate name when selecting services for the user's preferred data

movement agent.

This also requires a new configuration option for the replication client that

allows the user to control which service to use in the transfer

control file.

In the meta data XML, the replication client is looking at the name stored for

"data_access_capability." The default value to match is "GridFTP." It isn't

clear to me where this value is coming from, but the "serviceType" attribute of

the "service" element in the TDS catalog is set to "GridFTP" for that type of

protocol. Is the gateway placing the "serviceType" value into the

"data_access_capability" attribute?

In another thread, Bob wrote about restrictions on what names were allowed at

certain points. If so, this could complicate the issue, but not prevent this

solution from working.
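
On the serviceType question above: in a TDS catalog, a normal and a BDM
GridFTP service would both carry serviceType "GridFTP" and could differ
only in their name attribute. A minimal sketch that lists them, assuming
the standard THREDDS InvCatalog 1.0 namespace; the function name is
hypothetical, and whether the gateway copies serviceType or name into
data_access_capability is still the open question:

```python
import xml.etree.ElementTree as ET

# Namespace of THREDDS InvCatalog 1.0 documents.
THREDDS_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"


def gridftp_services(catalog_xml):
    """Return (name, base) for every service whose serviceType is GridFTP.

    iter() also descends into Compound services, so nested GridFTP
    entries are found as well.
    """
    root = ET.fromstring(catalog_xml)
    found = []
    for svc in root.iter("{%s}service" % THREDDS_NS):
        if svc.get("serviceType", "").lower() == "gridftp":
            found.append((svc.get("name"), svc.get("base")))
    return found
```

If this returns two entries with the same serviceType, only the name
can tell the replication client which one is the BDM server.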

Who remembers things differently?

Thanks,

Craig

</pre>

          </blockquote>

        </blockquote>

      </blockquote>

      <br>

      <pre class="moz-signature" cols="72">-- 

Gavin M. Bell

--

 "Never mistake a clear view for a short distance."

                      -Paul Saffo

</pre>

      <pre wrap="">

<fieldset class="mimeAttachmentHeader"></fieldset>

_______________________________________________

GO-ESSP-TECH mailing list

<a class="moz-txt-link-abbreviated" href="mailto:GO-ESSP-TECH@ucar.edu">GO-ESSP-TECH@ucar.edu</a>

<a class="moz-txt-link-freetext" href="http://mailman.ucar.edu/mailman/listinfo/go-essp-tech">http://mailman.ucar.edu/mailman/listinfo/go-essp-tech</a>

</pre>

    </blockquote>

    <br>

    <br>

    <pre class="moz-signature" cols="72">-- 

Estanislao Gonzalez

Max-Planck-Institut f&uuml;r Meteorologie (MPI-M)

Deutsches Klimarechenzentrum (DKRZ) - German Climate Computing Centre

Room 108 - Bundesstrasse 45a, D-20146 Hamburg, Germany

Phone:   +49 (40) 46 00 94-126

E-Mail:  <a class="moz-txt-link-abbreviated" href="mailto:gonzalez@dkrz.de">gonzalez@dkrz.de</a> </pre>

  </body>

</html>