[Go-essp-tech] Bulk data moving and the UNIDATA LDM

Rachana Ananthakrishnan ranantha at mcs.anl.gov
Tue Jun 15 13:43:46 MDT 2010


Hi Balaji,

>
> For large files, you might consider the data channel integrity
> option, rather than the end of file transfer checksum check. In the
> data channel integrity option, every data block is integrity protected
> and checked. I am confirming that this option, with the retries support
> built into globus-url-copy, will only retry the failed data block.
> I'll get back to you on that piece.
>

This was confirmed. With globus-url-copy, if you use the dumpfile (-df)
option, it transmits only the missing pieces, even when you use the
-dcsafe option (checking each packet). More info on retries with
dumpfile is available at http://www.globus.org/toolkit/docs/latest-stable/data/gridftp/user/#gridftp-user-advanced-failures
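
As a rough illustration of the options mentioned above, the following Python
sketch assembles a globus-url-copy command line that combines restarts, a
dumpfile and data channel integrity. The host names, file paths and the helper
name build_guc_command are hypothetical, and the flags should be checked
against the help output of the installed globus-url-copy version.

```python
import shlex

def build_guc_command(src_url, dst_url, dumpfile, retries=5):
    """Assemble a globus-url-copy argument list (illustrative only)."""
    return [
        "globus-url-copy",
        "-rst",                        # restart failed transfers
        "-rst-retries", str(retries),  # cap on restart attempts
        "-df", dumpfile,               # dumpfile recording what is still untransferred
        "-dcsafe",                     # integrity-protect the data channel
        src_url,
        dst_url,
    ]

# Hypothetical endpoints and paths, for illustration only.
cmd = build_guc_command(
    "gsiftp://source.example.org/data/big.nc",
    "gsiftp://dest.example.org/data/big.nc",
    "/tmp/big.nc.dump",
)
print(shlex.join(cmd))
```

The list form can be handed to subprocess.run(cmd, check=True) on a machine
where globus-url-copy and valid credentials are available.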

Cheers,
Rachana

> rsync does provide an excellent tool for keeping two directories in
> sync. In fact, the GridFTP team is working on building such support
> in, where the same features as rsync are provided (in terms of
> determining the changed file set or file blocks that need to be
> moved), but the underlying transfer protocol will be gsiftp. I expect
> that this would be useful, over rsync, if the changes are
> significantly large. This feature has just been completed, and
> performance studies have not been done yet. We'll know more soon.
>
> Rachana
>
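
To make "determining the changed file set" concrete, here is a minimal Python
sketch that compares per-file checksums between a source and a destination
manifest. It is not the GridFTP or rsync implementation; the function names
and the sample file names are assumptions made for illustration.

```python
import hashlib
from pathlib import Path

def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large files need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {p.relative_to(root).as_posix(): file_digest(p)
            for p in root.rglob("*") if p.is_file()}

def changed_files(source_manifest, dest_manifest):
    """Paths that are missing at the destination or differ in content."""
    return sorted(path for path, digest in source_manifest.items()
                  if dest_manifest.get(path) != digest)

# Hypothetical manifests standing in for two sites.
source = {"tas_day.nc": "aa11", "pr_day.nc": "bb22", "ua_mon.nc": "cc33"}
dest = {"tas_day.nc": "aa11", "pr_day.nc": "stale"}
to_send = changed_files(source, dest)
print(to_send)
```

Only the paths in to_send would then be handed to the transfer tool; files
whose checksums already match are skipped.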
> On Jun 14, 2010, at 9:55 AM, V. Balaji wrote:
>
>> The talk of loss rates and so on is bringing up a question for
>> me related to your previous email, Rachana. There you stated that
>> gridftp data movement clients have support for checksumming. As far
>> as we can tell (and we are heavy users of gridftp between ORNL and
>> GFDL...) we have had to write our own checksum software at either
>> end; globus-url-copy does not. It's not the most efficient way to do
>> things, for if a large file transfer fails to pass a checksum test,
>> we have to retransmit the whole file. Whereas rsync, for instance,
>> can checksum each packet in flight and only retransmit the failed
>> ones.
>>
>> Can we confirm whether the end-to-end checksumming for bulk data
>> movement will come from gridftp, or will it have to be built by the
>> ESGF?
>>
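
The difference between a whole-file checksum and per-block checking can be
sketched as follows. This is an illustration of the general idea only, not of
how rsync or GridFTP work internally; the block size and the function names
are assumptions.

```python
import hashlib

def block_digests(data, block_size):
    """Checksum every fixed-size block of a byte string."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]

def failed_blocks(sent_digests, received, block_size):
    """Indices of blocks that are missing or corrupted at the receiver."""
    got = block_digests(received, block_size)
    return [i for i, digest in enumerate(sent_digests)
            if i >= len(got) or got[i] != digest]

original = b"abcdefghijkl"   # three blocks of four bytes
corrupted = b"abcdXfghijkl"  # one byte damaged in the second block
sent = block_digests(original, 4)
print(failed_blocks(sent, corrupted, 4))
```

With a single end-of-file checksum, the damaged byte would force the whole
file to be resent; with per-block digests only the listed block indices need
to be retransmitted.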
>> Thanks,
>>
>> Rachana Ananthakrishnan writes:
>>
>>> Hi Martin,
>>>
>>> At this point the globus.org work is pre-production, although it
>>> builds on our stable data transfer software, GridFTP.  We are
>>> actively engaging with user communities, and working with them to
>>> use this solution for their data transfer needs. We have seen
>>> success in the specific functionality required by the communities
>>> we have engaged with, and are in the process of prioritizing
>>> feature additions based on feedback we are getting. It would be
>>> good for us to evaluate globus.org to see if it fits as a solution
>>> for our use case. If there is interest, I am happy to take back
>>> requirements from this group to evaluate the globus.org solution in
>>> terms of features offered, and planned roadmap. It will help us
>>> determine if we can leverage it, and how and when we could engage.
>>>
>>> Regarding the intercontinental transfers, I took the question back
>>> to our data transfer folks, and here is their response. Attached is
>>> also a reference paper from them on studies with using UDT with
>>> GridFTP for such transfers:
>>>
>>> "We have seen data rates of only around 50 Mbps or less using TCP
>>> between the US and both Australia and New Zealand. We haven't done any
>>> detailed analysis of the packet loss rates but this is probably due
>>> to the inherent limitations in TCP on high-latency links. The
>>> latency to Australia and NZ is more than 200 ms. Using UDT as an
>>> alternative to TCP has helped improve the performance by 4x.
>>> The enclosed paper has some data comparing GridFTP over TCP and
>>> GridFTP over UDT for transfers between the US and NZ.
>>> Note that these are straight globus-url-copy transfers and not
>>> Globus.org transfers."
>>>
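
The "inherent limitations in TCP on high-latency links" can be illustrated
with the bandwidth-delay product: a single TCP stream cannot move more than
one window of data per round trip. The sketch below assumes that the 200 ms
figure is a round-trip time and ignores packet loss entirely; the 64 KiB
window is an assumed example value, not a measurement from this thread.

```python
def max_throughput_bps(window_bytes, rtt_seconds):
    """Upper bound on single-stream TCP throughput: one window per round trip."""
    return window_bytes * 8 / rtt_seconds

def window_for_rate(rate_bps, rtt_seconds):
    """Window size in bytes needed to sustain a given rate at a given RTT."""
    return rate_bps * rtt_seconds / 8

rtt = 0.2  # seconds, assumed round-trip time
small_window_rate = max_throughput_bps(64 * 1024, rtt)
needed_window = window_for_rate(50e6, rtt)
print(f"64 KiB window at {rtt * 1000:.0f} ms RTT: {small_window_rate / 1e6:.2f} Mbps")
print(f"Window needed for 50 Mbps: {needed_window / 1e6:.2f} MB")
```

Under these assumptions a 64 KiB window caps a stream at a few Mbps, and
sustaining 50 Mbps needs a window of over a megabyte, which is one reason
parallel streams or alternative transports are considered on such links.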
>>> Also, in the original link I sent, you pointed out the retries
>>> mentioned in the study. Those retries were actually for transfers
>>> within Australia, and not intercontinental transfers. Run 5 and Run
>>> 6 in the link http://www.mcs.anl.gov/~childers/VPAC/ were for
>>> sites within Australia.
>>>
>>> Hope this helps,
>>> Rachana
>>>
>>> On Jun 10, 2010, at 2:50 AM, <martin.juckes at stfc.ac.uk> wrote:
>>>
>>>> Hello Rachana,
>>>> Thanks for those useful examples. My main interest with LDM was
>>>> the fire-and-forget feature, which simplifies management of a data
>>>> stream.  The phrase I used, "verify success", was not really
>>>> accurate -- what I should have said was guarantee success.
>>>> It sounds as though I also got a misleading view of the maturity
>>>> of globus.org from the web sites I found on the topic -- my
>>>> apologies for that.
>>>> On a slightly different topic, a quick look at the links you sent
>>>> appears to reflect similar problems with long range transfers to
>>>> ones we are encountering. We have also been doing trials with
>>>> GridFTP looking at transfer rates between here (southern England)
>>>> and France, Germany and the US. Transfer rates to the US are
>>>> substantially slower, reflecting high packet loss rates. It looks
>>>> as though there is an important distinction to be made between
>>>> wide area networks within Europe or within the US, for which very
>>>> high transfer rates can be achieved, and global networks, for
>>>> which packet loss appears to seriously throttle transfer rates.
>>>> This problem is reflected in the first link you sent, with high
>>>> resend rates to Australia compared with very small numbers in
>>>> trials with larger files in the US.
>>>> This is an important issue for us because we need to move a lot of
>>>> data around the world and the current transfer rates we are
>>>> getting are not likely to be sufficient. Do you know of any work
>>>> on assessing and alleviating packet loss on intercontinental
>>>> transfers? (I'm not expecting LDM to provide any improvement in
>>>> this area -- but it would be a very welcome surprise if it did).
>>>> regards,
>>>> Martin
>>>> -----Original Message-----
>>>> From: Rachana Ananthakrishnan [mailto:ranantha at mcs.anl.gov]
>>>> Sent: Thu 10/06/2010 03:46
>>>> To: Juckes, Martin (STFC,RAL,SSTD)
>>>> Cc: don at ucar.edu; go-essp-tech at ucar.edu
>>>> Subject: Re: [Go-essp-tech] Bulk data moving and the UNIDATA LDM
>>>> Hi,
>>>> I wanted to address some points raised on this thread, and also
>>>> provide pointers to some relevant material.
>>>> On the GridFTP transfer with small files, there has been significant
>>>> work to improve performance with small files. Here is a reference
>>>> that explains this work:
>>>> http://www.mcs.anl.gov/~kettimut/publications/Pipelining.pdf
>>>> Is the concern of using GridFTP for small files a performance
>>>> issue or a reliability issue?
>>>> It is hard to compare performance without hard numbers, but taking
>>>> this thread back to our GridFTP team, here are some questions that
>>>> were raised:
>>>> - Checksums at end of transfer are supported by the GridFTP client
>>>>   libraries. What additional features are covered by the statement
>>>>   "LDM appears to have a lot of good features built in to verify
>>>>   success of data transfer between sites"?
>>>> - GridFTP supports both concurrent transfer of multiple files and
>>>>   concurrent transfer of multiple chunks within a file. This
>>>>   combination can outperform just concurrent transfer of multiple
>>>>   files on high latency links. Is there a comparable feature with LDM?
>>>> - What security model does LDM support?
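
The "concurrent transfer of multiple chunks within a file" idea can be
sketched in Python by splitting a payload into byte ranges and copying the
ranges in parallel. This is an in-memory stand-in for network streams, not
GridFTP's implementation; the function names, chunk size and worker count are
assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

def chunk_ranges(total_size, chunk_size):
    """Split [0, total_size) into (offset, length) pairs."""
    return [(offset, min(chunk_size, total_size - offset))
            for offset in range(0, total_size, chunk_size)]

def parallel_copy(source, chunk_size, workers=4):
    """Copy a bytes object chunk by chunk using several worker threads."""
    dest = bytearray(len(source))

    def copy_chunk(byte_range):
        offset, length = byte_range
        # Each worker writes a disjoint slice, so the order of completion is irrelevant.
        dest[offset:offset + length] = source[offset:offset + length]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(copy_chunk, chunk_ranges(len(source), chunk_size)))
    return bytes(dest)

payload = b"0123456789" * 100
copied = parallel_copy(payload, chunk_size=64)
print(copied == payload)
```

Because every chunk carries its own offset, chunks can arrive in any order and
a failed chunk can be retried on its own, which is what makes this approach
attractive on high latency links.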
>>>> Regarding globus.org, it is built around GridFTP server and client
>>>> technology to provide the fire and forget mode for file transfers.
>>>> Like Pauline mentions, it provides, at a minimum, reliability and
>>>> restarts for the transfers. It is provided as a Software as a
>>>> Service solution around GridFTP software that is both mature and
>>>> robust, and will be provided as a hosted solution operated by us
>>>> (Argonne National Lab in the US) for user communities. We are
>>>> working with individual communities to do performance and
>>>> reliability testing, and to gather specific requirements from them.
>>>> Integrity check is one of the features in the pipeline. Some
>>>> references to performance numbers thus far:
>>>> Transfer managed from a site in US to Australia: http://www.mcs.anl.gov/~childers/VPAC/
>>>> Transfer test for tutorials: http://www.mcs.anl.gov/~childers/GlobusWorld2010/results.html
>>>> Alex referenced a Super Computing Bandwidth Challenge work that used
>>>> globus.org to replicate 10TB of CMIP3 data between two sites in the
>>>> US: http://www.mcs.anl.gov/~kettimut/publications/HPDC10_BWC_Final.pdf
>>>> If there is interest in this group, like Pauline suggested, we
>>>> could have someone from our team present about the service to this
>>>> group. We are certainly interested to hear experiences with LDM,
>>>> and experiences on using both these solutions.
>>>> Thanks,
>>>> Rachana
>>>> On Jun 7, 2010, at 12:04 PM, <martin.juckes at stfc.ac.uk> wrote:
>>>>> Hello Don, Pauline,
>>>>> I had a quick look at globus.org -- it looks good but is only
>>>>> available as a beta release, whereas LDM is on version 6.8. Having a
>>>>> tried and tested bit of software to handle a key part of our data
>>>>> management would make life a lot easier.
>>>>> From Don's message, it appears that LDM does not have GridFTP's
>>>>> ability to deal with large files. So, we have a choice between LDM,
>>>>> which supports a "fire-and-forget" approach but needs small files, or
>>>>> GridFTP, which will deal with small files but requires additional
>>>>> software to determine whether transfer succeeded.
>>>>> It looks to me as though LDM has a big advantage in terms of
>>>>> maturity. I think the overhead of having to split and rejoin files is
>>>>> likely to be significantly less problematic than getting reliable
>>>>> site to site communication about thousands of files being
>>>>> transferred.
>>>>> Cheers,
>>>>> Martin
>>>>>> -----Original Message-----
>>>>>> From: Don Middleton [mailto:don at ucar.edu]
>>>>>> Sent: 07 June 2010 13:11
>>>>>> To: Juckes, Martin (STFC,RAL,SSTD)
>>>>>> Cc: Don Middleton; Lawrence, Bryan (STFC,RAL,SSTD);
>>>>>> go-essp-tech at ucar.edu
>>>>>> Subject: Re: [Go-essp-tech] Bulk data moving and the UNIDATA LDM
>>>>>> I don't have figures, but some general information. In the TIGGE
>>>>>> context, LDM is dealing with pretty small packages: 2D grids, which
>>>>>> can be reassembled at the receiving end into individual 3D fields
>>>>>> for each timestep and forecast. Cost of transmission failure for
>>>>>> large files is reported to be high, but LDM does have features to
>>>>>> support retry until success. I don't know how difficult it might be
>>>>>> to replace LDM's transport layer with GridFTP, or even if it makes
>>>>>> any sense to think about that, particularly given Pauline's message.
>>>>>> LDM does appear to be working quite well in an operational context.
>>>>>> don
>>>>>> On Jun 7, 2010, at 1:50 AM, <martin.juckes at stfc.ac.uk> wrote:
>>>>>>> Thanks Don, that would be interesting to see. The TIGGE data
>>>>>>> transfers are the same scale as we will have to deal with, so the
>>>>>>> figures will be very relevant.
>>>>>>> I was interested in LDM because it appears to offer a lot of
>>>>>>> support for data management which might significantly reduce the
>>>>>>> amount of coding and design we need to do ourselves -- relative to
>>>>>>> what would be necessary with GridFTP.
>>>>>>> Regards,
>>>>>>> Martin
>>>>>>>> -----Original Message-----
>>>>>>>> From: go-essp-tech-bounces at ucar.edu
>>>>>>>> [mailto:go-essp-tech-bounces at ucar.edu] On Behalf Of Don Middleton
>>>>>>>> Sent: 04 June 2010 16:29
>>>>>>>> To: Lawrence, Bryan (STFC,RAL,SSTD)
>>>>>>>> Cc: go-essp-tech at ucar.edu
>>>>>>>> Subject: Re: [Go-essp-tech] Bulk data moving and the UNIDATA LDM
>>>>>>>> We're using LDM for TIGGE, and replicating a couple of
>>>>>>>> terabytes of forecast data a week, around the world. I'll inquire
>>>>>>>> about filesizes and rates.
>>>>>>>> don
>>>>>>>> On Jun 4, 2010, at 9:15 AM, Bryan Lawrence wrote:
>>>>>>>>> My understanding is that it doesn't compare on a file by file
>>>>>>>>> basis: GridFTP is clearly optimised to move big files fast.
>>>>>>>>> However, if we are moving more than dozens of files (as we are),
>>>>>>>>> then as I understand it, LDM would open multiple file transfer
>>>>>>>>> streams, so GridFTP's advantage will boil down to the (not
>>>>>>>>> inconsiderable) negotiated window size.
>>>>>>>>> Someone else ought to be able to give much better information
>>>>>>>>> than that :-)
>>>>>>>>> Cheers
>>>>>>>>> Bryan
>>>>>>>>> On Friday 04 Jun 2010 15:58:33 Alex Sim wrote:
>>>>>>>>>> Can you tell us about the wide-area transfer performance with
>>>>>>>>>> Unidata LDM compared to GridFTP server based transfers?
>>>>>>>>>> -- Alex
>>>>>>>>>> On 6/4/10 3:39 AM, martin.juckes at stfc.ac.uk wrote:
>>>>>>>>>> Hello all,
>>>>>>>>>> There may be a simple answer to this, but is there a reason why
>>>>>>>>>> we shouldn't use the Unidata Local Data Manager (LDM) for bulk
>>>>>>>>>> data movement within the CMIP5 distributed archive? It appears
>>>>>>>>>> to have a lot of good features built in to verify success of
>>>>>>>>>> data transfer between sites, and runs successfully at many
>>>>>>>>>> operational sites. This would simplify the work flow, since all
>>>>>>>>>> the complexity of the site to site transfers would be dealt with
>>>>>>>>>> by a tried and tested system.
>>>>>>>>>> Cheers,
>>>>>>>>>> Martin
>>>>>>>>> --
>>>>>>>>> Bryan Lawrence
>>>>>>>>> Director of Environmental Archival and Associated Research
>>>>>>>>> (NCAS/British Atmospheric Data Centre and NCEO/NERC NEODC)
>>>>>>>>> STFC, Rutherford Appleton Laboratory
>>>>>>>>> Phone +44 1235 445012; Fax ... 5848;
>>>>>>>>> Web: home.badc.rl.ac.uk/lawrence
>>>>>>>>> _______________________________________________
>>>>>>>>> GO-ESSP-TECH mailing list
>>>>>>>>> GO-ESSP-TECH at ucar.edu
>>>>>>>>> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech
>>>>>>> --
>>>>>>> Scanned by iCritical.
>>>> Rachana Ananthakrishnan
>>>> Argonne National Lab | University of Chicago
>>>
>>> Rachana Ananthakrishnan
>>> Argonne National Lab | University of Chicago
>>
>> -- 
>>
>> V. Balaji                               Office:  +1-609-452-6516
>> Head, Modeling Systems Group, GFDL      Home:    +1-212-253-6662
>> Princeton University                    Email: v.balaji at noaa.gov
>
> Rachana Ananthakrishnan
> Argonne National Lab | University of Chicago
>

Rachana Ananthakrishnan
Argonne National Lab | University of Chicago


