[Go-essp-tech] Bulk data moving and the UNIDATA LDM

Alex Sim asim at lbl.gov
Mon Jun 21 22:13:19 MDT 2010


On 6/15/10 6:56 AM, Rachana Ananthakrishnan wrote:
> Hi Balaji,
>
> True, the globus-url-copy command line does not have the "verify"
> option that would automatically pull down the file checksum from the
> server and compare it. Currently, the server supports the option, but the
> client library does not. This has been filed in our task backlog, and
> is being worked on. I suspect that in the ESG replication case, Bulk
> Data Mover is querying the GridFTP server for the file checksum and
> comparing it.
>   

Bulk Data Mover (BDM) calculates the checksum locally at the end of the
transfer request (at the end of all file transfers, rather than after
each file transfer), as described in the NCAR wiki page
(https://wiki.ucar.edu/display/esgcet/Bulk+Data+Movement).
Currently the catalog does not contain any checksum values to compare
with. Once the publishing mechanism populates the checksum values, and
those values are provided as part of the input to BDM, it will compare
each file against its catalog value after the local checksum is
calculated.
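A minimal sketch of the batch-level verification described above, assuming MD5 as the checksum algorithm and a plain dict of catalog values (neither is specified here). `verify_batch` and `file_checksum` are hypothetical names, not BDM's interface. Every file is checksummed only after the whole request completes, and files with no catalog value are reported as unverified rather than failed:

```python
import hashlib

# Hypothetical helpers for illustration; not part of BDM or GridFTP.

def file_checksum(path, algo="md5", bufsize=1 << 20):
    """Hash a file in fixed-size reads so large files need not fit in memory."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(bufsize), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_batch(paths, expected=None, algo="md5"):
    """Checksum every file after the whole request has finished.

    `expected` maps a path to its catalog checksum (hex). Files with no
    catalog value are reported as "unverified" instead of failed, which
    mirrors the situation before checksums are published.
    """
    expected = expected or {}
    report = {}
    for p in paths:
        local = file_checksum(p, algo)
        want = expected.get(p)
        if want is None:
            status = "unverified"
        elif want.lower() == local:
            status = "ok"
        else:
            status = "mismatch"
        report[p] = (status, local)
    return report
```

Deferring all checksums to the end keeps the transfer loop simple; the trade-off is that a bad file is only detected after the whole batch finishes.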
The wiki page is a bit old, but the essential information is still the same.
As for running requirements, you need a GridFTP server at the source,
as BDM pulls the data to the underlying local target storage. The underlying
transfer protocol is GridFTP, and BDM takes advantage of what GridFTP
provides, in addition to some customized features for ESG, such as
checksum calculation at the end of the transfer request and asynchronous
status checks. GSI is needed for GridFTP transfers, and the correct CA files
are assumed to be in the proper place. You can run the BDM process in the
background and go home. It will give you a "token" with which you can check
the request status, such as how much of the transfer is done. The same
technology in BDM has been used for several years in the HEP community,
with which we have been involved.
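The asynchronous "token" model can be illustrated with a small Python sketch. `TransferManager`, `submit`, `status` and `transfer_one` are hypothetical names chosen for illustration, not BDM's interface; a real mover would also persist its state so a token survives a restart of the process.

```python
import threading
import uuid

class TransferManager:
    """Hypothetical sketch: submit a batch, get a token, poll progress later."""

    def __init__(self):
        self._lock = threading.Lock()
        self._progress = {}  # token -> [files_done, files_total]
        self._threads = {}   # token -> background worker thread

    def submit(self, files, transfer_one):
        """Start copying `files` in the background and return a token."""
        token = uuid.uuid4().hex
        with self._lock:
            self._progress[token] = [0, len(files)]

        def run():
            for f in files:
                transfer_one(f)  # caller-supplied copy of a single file
                with self._lock:
                    self._progress[token][0] += 1

        worker = threading.Thread(target=run, daemon=True)
        self._threads[token] = worker
        worker.start()
        return token

    def status(self, token):
        """Report how much of the request identified by `token` is done."""
        with self._lock:
            done, total = self._progress[token]
        return {"done": done, "total": total, "finished": done == total}

    def wait(self, token):
        """Block until the request finishes (handy in tests)."""
        self._threads[token].join()
```

The caller keeps only the token; progress lives with the manager, so the submitting session can exit and check back later.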

--Alex


> For large files, you might consider the data channel integrity
> option rather than the end-of-file-transfer checksum check. With the
> data channel integrity option, every data block is integrity-protected
> and checked. I am confirming whether this option, combined with the
> retry support built into globus-url-copy, will retry only the failed
> data block. I'll get back to you on that piece.
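To show why per-block integrity checking limits the cost of a corrupted transfer, here is a toy sketch in which each block carries its own digest and only blocks that fail verification are resent. The channel is simulated by a caller-supplied `send_block` function; MD5 and the tiny block size are assumptions for readability, and this is not how GridFTP's data channel integrity option is implemented internally.

```python
import hashlib

BLOCK = 4  # tiny block size for illustration; real systems use much larger blocks

def block_digests(data, block=BLOCK):
    """Digest of each fixed-size block, computed at the sender."""
    return [hashlib.md5(data[i:i + block]).hexdigest()
            for i in range(0, len(data), block)]

def transfer_with_block_retry(source, send_block, block=BLOCK, max_tries=3):
    """Send `source` block by block, resending only blocks whose digest fails.

    `send_block(idx, chunk)` is a hypothetical stand-in for the data channel;
    it returns the bytes as received (possibly corrupted).
    Returns (reassembled_bytes, number_of_resent_blocks).
    """
    expected = block_digests(source, block)
    received, resent = [], 0
    for idx, want in enumerate(expected):
        chunk = source[idx * block:(idx + 1) * block]
        for _ in range(max_tries):
            got = send_block(idx, chunk)
            if hashlib.md5(got).hexdigest() == want:
                received.append(got)
                break
            resent += 1  # only this block is retransmitted
        else:
            raise IOError(f"block {idx} failed {max_tries} times")
    return b"".join(received), resent
```

If one block is corrupted once, only that block is sent a second time, whereas an end-of-file checksum would force the whole file to be resent.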
>
> rsync does provide an excellent tool for keeping two directories in
> sync. In fact, the GridFTP team is working on building such support,
> in which the same features as rsync are provided (in terms of
> determining the changed file set or file blocks that need to be
> moved), but the underlying transfer protocol will be gsiftp. I expect
> that this would be more useful than rsync if the changes are
> significantly large. This feature has just been completed, and
> performance studies have not been done yet. We'll know more soon.
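A simplified sketch of rsync-style change detection: hash aligned blocks on both sides and move only the blocks that differ. rsync itself uses a rolling weak checksum plus a strong hash so it can match blocks that have shifted position; this fixed-offset version, with hypothetical function names, only illustrates the idea of sending the changed block set.

```python
import hashlib

def block_hashes(data, block):
    """Strong hash of each aligned block."""
    return [hashlib.sha1(data[i:i + block]).digest()
            for i in range(0, len(data), block)]

def changed_blocks(src, dst, block=4):
    """Indices of aligned blocks in src that dst lacks or holds differently.

    Simplification: compares fixed offsets only, unlike rsync's rolling match.
    """
    s, d = block_hashes(src, block), block_hashes(dst, block)
    return [i for i, h in enumerate(s) if i >= len(d) or d[i] != h]

def apply_delta(src, dst, block=4):
    """Rebuild src at the destination by copying only the changed blocks."""
    out = bytearray(dst[:len(src)])           # reuse what the target already has
    out.extend(b"\0" * (len(src) - len(out)))  # pad if the source grew
    for i in changed_blocks(src, dst, block):
        out[i * block:(i + 1) * block] = src[i * block:(i + 1) * block]
    return bytes(out)
```

Only the blocks listed by `changed_blocks` would cross the network; the payoff grows with file size when changes are localized.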
>
> Rachana
>
> On Jun 14, 2010, at 9:55 AM, V. Balaji wrote:
>
>   
>> The talk of loss rates and so on is bringing up a question for
>> me related to your previous email, Rachana. There you stated that
>> GridFTP data movement clients have support for checksumming. As far
>> as we can tell (and we are heavy users of GridFTP between ORNL and
>> GFDL...), we have had to write our own checksum software at either
>> end; globus-url-copy does not do it. It's not the most efficient way
>> to do things, for if a large file transfer fails a checksum test,
>> we have to retransmit the whole file, whereas rsync, for instance,
>> can checksum each packet in flight and only retransmit the failed
>> ones.
>>
>> Can we confirm whether the end-to-end checksumming for bulk data
>> movement will come from GridFTP, or will it have to be built by the ESGF?
>>
>> Thanks,
>>
>> Rachana Ananthakrishnan writes:
>>
>>     
>>> Hi Martin,
>>>
>>> At this point the globus.org work is pre-production, although it  
>>> builds on our stable data transfer software, GridFTP.  We are  
>>> actively engaging with user communities, and working with them to  
>>> use this solution for their data transfer needs. We have seen  
>>> success in the specific functionality required by the communities  
>>> we have engaged with, and are in the process of prioritizing  
>>> feature additions based on feedback we are getting. It would be  
>>> good for us to evaluate globus.org to see if it fits as a solution  
>>> for our use case. If there is interest, I am happy to take back  
>>> requirements from this group to evaluate the globus.org solution in  
>>> terms of features offered, and planned roadmap. It will help us  
>>> determine if we can leverage it, and how and when we could engage.
>>>
>>> Regarding the intercontinental transfers, I took the question back
>>> to our data transfer folks, and here is their response. Also attached
>>> is a reference paper from them on studies using UDT with
>>> GridFTP for such transfers:
>>>
>>> "We have seen data rates of only around 50 Mbps or less using TCP
>>> between the US and both Australia and New Zealand. We haven't done any
>>> detailed analysis of the packet loss rates, but this is probably due
>>> to the inherent limitations of TCP on high-latency links. The
>>> latency to Australia and NZ is more than 200 ms. Using UDT as an
>>> alternative to TCP has improved performance by 4x. The
>>> enclosed paper has some data comparing GridFTP over TCP and GridFTP
>>> over UDT for transfers between the US and NZ.
>>> Note that these are straight globus-url-copy transfers and not
>>> Globus.org transfers."
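The 200 ms latency quoted above can be turned into a bandwidth-delay product, i.e. how many bytes a single TCP stream must keep in flight to sustain a given rate. A small sketch; the function names and the 1 Gbps link rate are illustrative assumptions, not figures from this thread:

```python
def bdp_bytes(rate_bps, rtt_s):
    """Bandwidth-delay product: bytes that must be unacknowledged in flight
    for one stream to sustain rate_bps over a path with round trip rtt_s."""
    return rate_bps * rtt_s / 8

def window_limited_bps(window_bytes, rtt_s):
    """Upper bound for one stream: at most one window per round trip."""
    return window_bytes * 8 / rtt_s

if __name__ == "__main__":
    # Assumed 1 Gbps link vs. the ~50 Mbps observed, both at 200 ms RTT.
    print(f"1 Gbps at 200 ms needs {bdp_bytes(1e9, 0.2) / 1e6:.2f} MB in flight")
    print(f"50 Mbps at 200 ms implies {bdp_bytes(50e6, 0.2) / 1e6:.2f} MB in flight")
    print(f"64 KiB window at 200 ms: {window_limited_bps(65536, 0.2) / 1e6:.1f} Mbps")
```

Filling a 1 Gbps path at 200 ms needs about 25 MB in flight, the observed 50 Mbps corresponds to roughly 1.25 MB, and an untuned 64 KiB window caps a single stream at about 2.6 Mbps.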
>>>
>>> Also, in the original link I sent, you pointed out the retries
>>> mentioned in the study. Those retries were actually for transfers
>>> within Australia, not intercontinental transfers. Runs 5 and
>>> 6 at http://www.mcs.anl.gov/~childers/VPAC/ were for
>>> sites within Australia.
>>>
>>> Hope this helps,
>>> Rachana
>>>
>>> On Jun 10, 2010, at 2:50 AM, <martin.juckes at stfc.ac.uk> wrote:
>>>
>>>> Hello Rachana,
>>>> Thanks for those useful examples. My main interest in LDM was
>>>> the fire-and-forget feature, which simplifies management of a
>>>> data stream. The phrase I used, "verify success", was not really
>>>> accurate -- what I should have said was "guarantee success".
>>>> It sounds as though I also got a misleading view of the maturity
>>>> of globus.org from the web sites I found on the topic -- my
>>>> apologies for that.
>>>> On a slightly different topic, a quick look at the links you sent
>>>> appears to reflect problems with long-range transfers similar to
>>>> the ones we are encountering. We have also been doing trials with
>>>> GridFTP, looking at transfer rates between here (southern England)
>>>> and France, Germany and the US. Transfer rates to the US are
>>>> substantially slower, reflecting high packet loss rates. It looks
>>>> as though there is an important distinction to be made between
>>>> wide area networks within Europe or within the US, for which very
>>>> high transfer rates can be achieved, and global networks, for
>>>> which packet loss appears to seriously throttle transfer rates.
>>>> This problem is reflected in the first link you sent, with high
>>>> resend rates to Australia compared with a very small number in
>>>> trials with larger files in the US.
>>>> This is an important issue for us because we need to move a lot of
>>>> data around the world, and the current transfer rates we are
>>>> getting are not likely to be sufficient. Do you know of any work
>>>> on assessing and alleviating packet loss on intercontinental
>>>> transfers? (I'm not expecting LDM to provide any improvement in
>>>> this area -- but it would be a very welcome surprise if it did.)
>>>> regards,
>>>> Martin
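The observation that packet loss throttles long-range rates matches a widely used approximation for loss-limited TCP (Reno-style) throughput from Mathis et al.: rate ≈ (MSS / RTT) * C / sqrt(p), with C ≈ sqrt(3/2). The sketch below assumes that model; modern congestion control (e.g. CUBIC) deviates from it, and the loss rate and round-trip times are illustrative assumptions, not measurements from this thread:

```python
import math

def mathis_bps(mss_bytes, rtt_s, loss_rate, c=math.sqrt(1.5)):
    """Approximate steady-state throughput of one loss-limited TCP stream
    (Mathis et al. model; an approximation, not an exact bound)."""
    return (mss_bytes * 8 / rtt_s) * c / math.sqrt(loss_rate)

if __name__ == "__main__":
    # Same assumed loss rate, European-scale vs intercontinental round trip.
    for rtt in (0.02, 0.2):
        print(f"RTT {rtt * 1000:.0f} ms, p=1e-4: "
              f"{mathis_bps(1460, rtt, 1e-4) / 1e6:.1f} Mbps")
```

With a 1460-byte MSS and p = 1e-4 this gives roughly 72 Mbps at 20 ms but only about 7 Mbps at 200 ms: throughput falls as 1/RTT and as 1/sqrt(p), so the same loss rate hurts a long path far more than a short one.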
>>>> -----Original Message-----
>>>> From: Rachana Ananthakrishnan [mailto:ranantha at mcs.anl.gov]
>>>> Sent: Thu 10/06/2010 03:46
>>>> To: Juckes, Martin (STFC,RAL,SSTD)
>>>> Cc: don at ucar.edu; go-essp-tech at ucar.edu
>>>> Subject: Re: [Go-essp-tech] Bulk data moving and the UNIDATA LDM
>>>> Hi,
>>>> I wanted to address some points raised on this thread, and also
>>>> provide pointers to some relevant material.
>>>> Regarding GridFTP transfers of small files, there has been significant
>>>> work to improve performance. Here is a reference that explains this work:
>>>> http://www.mcs.anl.gov/~kettimut/publications/Pipelining.pdf
>>>> Is the concern about using GridFTP for small files a performance issue
>>>> or a reliability issue?
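The small-file problem is largely about per-file round trips on the control channel. A toy cost model (our own assumption, not taken from the pipelining paper) makes the effect visible: without pipelining every file pays one round trip before its data moves; with pipelining the commands are queued back to back and the round trip is paid roughly once per batch.

```python
def batch_seconds(n_files, file_bytes, rate_bps, rtt_s, pipelined):
    """Toy estimate of the time to move n equal-size files on one channel.

    Ignores TCP slow start, disk I/O and server-side overheads.
    """
    data_time = n_files * file_bytes * 8 / rate_bps
    rtt_cost = rtt_s if pipelined else n_files * rtt_s
    return data_time + rtt_cost

if __name__ == "__main__":
    args = (10_000, 100_000, 1e9, 0.1)  # assumed: 10k files of 100 kB, 1 Gbps, 100 ms RTT
    print("serial   :", batch_seconds(*args, pipelined=False), "s")
    print("pipelined:", batch_seconds(*args, pipelined=True), "s")
```

Under these assumed numbers the data itself needs 8 s, but unpipelined round trips add about 1000 s, which is why small files look so slow on long links.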
>>>> It is hard to compare performance without hard numbers, but I took
>>>> this thread back to our GridFTP team, and here are some questions
>>>> they raised:
>>>> - Checksums at the end of transfer are supported by the GridFTP client
>>>> libraries. What additional features are covered by the statement "LDM
>>>> appears to have a lot of good features built in to verify success of
>>>> data transfer between sites"?
>>>> - GridFTP supports both concurrent transfer of multiple files and
>>>> concurrent transfer of multiple chunks within a file. This combination
>>>> can outperform concurrent transfer of multiple files alone on
>>>> high-latency links. Is there a comparable feature in LDM?
>>>> - What security model does LDM support?
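Concurrency within a file amounts to splitting it into byte ranges that separate streams fetch at the same time, with the receiver writing each range back at its offset. A minimal sketch using Python threads, where `read_range(offset, length)` is a hypothetical stand-in for a ranged read from the remote server (GridFTP's own parallel-stream and striping machinery is more involved):

```python
from concurrent.futures import ThreadPoolExecutor

def byte_ranges(size, n_parts):
    """Split [0, size) into at most n_parts contiguous (offset, length) pieces."""
    if size == 0:
        return []
    step = -(-size // n_parts)  # ceiling division
    return [(off, min(step, size - off)) for off in range(0, size, step)]

def parallel_fetch(size, read_range, n_parts=4, workers=4):
    """Fetch one file as concurrent range reads and reassemble it in order."""
    buf = bytearray(size)

    def fetch(rng):
        off, length = rng
        chunk = read_range(off, length)
        if len(chunk) != length:
            raise IOError(f"short read at offset {off}")
        buf[off:off + length] = chunk  # ranges are disjoint, so no overlap

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for every task and re-raises any worker error
        list(pool.map(fetch, byte_ranges(size, n_parts)))
    return bytes(buf)
```

Several files can be submitted to the same pool, which gives the combination of file-level and chunk-level concurrency described in the second question.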
>>>> globus.org is built around GridFTP server and client technology to
>>>> provide a fire-and-forget mode for file transfers. As Pauline
>>>> mentions, at a minimum it provides reliability and restarts for
>>>> transfers. It is provided as a Software-as-a-Service solution around
>>>> GridFTP software that is both mature and robust, and will be offered
>>>> as a hosted solution operated by us (Argonne National Lab in the US)
>>>> for user communities. We are working with individual communities to
>>>> do performance and reliability testing, and to gather specific
>>>> requirements from them. Integrity checking is one of the features in
>>>> the pipeline. Some references to performance numbers thus far:
>>>> Transfer managed from a site in US to Australia: http://www.mcs.anl.gov/~childers/VPAC/
>>>> Transfer test for tutorials: http://www.mcs.anl.gov/~childers/GlobusWorld2010/results.html
>>>> Alex referenced a Supercomputing Bandwidth Challenge effort that used
>>>> globus.org to replicate 10 TB of CMIP3 data between two sites in the
>>>> US: http://www.mcs.anl.gov/~kettimut/publications/HPDC10_BWC_Final.pdf
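The "restarts" mentioned above can be pictured as resuming a transfer from the last confirmed offset rather than from byte zero. A sketch under that assumption; `resumable_copy`, `read_range` and `write_at` are hypothetical names and do not reflect the actual restart-marker protocol GridFTP uses:

```python
def resumable_copy(read_range, size, write_at, chunk=4, max_restarts=5):
    """Copy `size` bytes, resuming from the last good offset after a failure.

    `read_range(offset, length)` and `write_at(offset, data)` are
    hypothetical stand-ins for the remote source and local target.
    Returns the number of restarts that were needed.
    """
    offset, restarts = 0, 0
    while offset < size:
        try:
            data = read_range(offset, min(chunk, size - offset))
        except ConnectionError:
            restarts += 1
            if restarts > max_restarts:
                raise
            continue  # retry from the same offset, not from zero
        if not data:
            raise IOError(f"source returned no data at offset {offset}")
        write_at(offset, data)
        offset += len(data)  # the committed offset acts as the restart marker
    return restarts
```

The tiny default chunk is only for illustration; the point is that a dropped connection costs at most one chunk of rework.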
>>>> If there is interest in this group, as Pauline suggested, we
>>>> could have someone from our team present the service to this
>>>> group. We are certainly interested to hear about experiences with
>>>> LDM, and experiences using both of these solutions.
>>>> Thanks,
>>>> Rachana
>>>> On Jun 7, 2010, at 12:04 PM, <martin.juckes at stfc.ac.uk> wrote:
>>>>> Hello Don, Pauline,
>>>>> I had a quick look at globus.org -- it looks good but is only
>>>>> available as a beta release, whereas LDM is on version 6.8. Having a
>>>>> tried and tested bit of software to handle a key part of our data
>>>>> management would make life a lot easier.
>>>>> From Don's message, it appears that LDM does not have GridFTP's
>>>>> ability to deal with large files. So, we have a choice between LDM,
>>>>> which supports a "fire-and-forget" approach but needs small files, or
>>>>> GridFTP, which will deal with small files but requires additional
>>>>> software to determine whether a transfer succeeded.
>>>>> It looks to me as though LDM has a big advantage in terms of
>>>>> maturity. I think the overhead of having to split and rejoin files is
>>>>> likely to be significantly less problematic than getting reliable
>>>>> site-to-site communication about thousands of files being transferred.
>>>>> Cheers,
>>>>> Martin
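The split-and-rejoin overhead mentioned above needs little more than numbered parts plus a manifest. A sketch assuming parts may arrive out of order and that a SHA-256 of the whole file travels with the manifest; the function names are hypothetical and LDM's own product handling is not shown:

```python
import hashlib

def split_file(data, part_size):
    """Split a payload into numbered parts plus a manifest for reassembly."""
    parts = {i: data[off:off + part_size]
             for i, off in enumerate(range(0, len(data), part_size))}
    manifest = {"n_parts": len(parts),
                "sha256": hashlib.sha256(data).hexdigest()}
    return parts, manifest

def rejoin(parts, manifest):
    """Reassemble parts that may arrive out of order, then verify the whole."""
    missing = [i for i in range(manifest["n_parts"]) if i not in parts]
    if missing:
        raise ValueError(f"missing parts: {missing}")
    data = b"".join(parts[i] for i in range(manifest["n_parts"]))
    if hashlib.sha256(data).hexdigest() != manifest["sha256"]:
        raise ValueError("reassembled file fails checksum")
    return data
```

Because each part is small, a failed part can be resent on its own, and the whole-file checksum confirms the reassembly.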
>>>>>           
>>>>>> -----Original Message-----
>>>>>> From: Don Middleton [mailto:don at ucar.edu]
>>>>>> Sent: 07 June 2010 13:11
>>>>>> To: Juckes, Martin (STFC,RAL,SSTD)
>>>>>> Cc: Don Middleton; Lawrence, Bryan (STFC,RAL,SSTD); go-essp-tech at ucar.edu
>>>>>> Subject: Re: [Go-essp-tech] Bulk data moving and the UNIDATA LDM
>>>>>> I don't have figures, but here is some general information. In the
>>>>>> TIGGE context, LDM is dealing with pretty small packages: 2D grids,
>>>>>> which can be reassembled at the receiving end into individual 3D
>>>>>> fields for each timestep and forecast. The cost of transmission
>>>>>> failure for large files is reported to be high, but LDM does have
>>>>>> features to support retry until success. I don't know how difficult
>>>>>> it might be to replace LDM's transport layer with GridFTP, or even
>>>>>> whether it makes any sense to think about that, particularly given
>>>>>> Pauline's message. LDM does appear to be working quite well in an
>>>>>> operational context.
>>>>>> don
>>>>>> On Jun 7, 2010, at 1:50 AM, <martin.juckes at stfc.ac.uk> wrote:
>>>>>>> Thanks Don, that would be interesting to see. The TIGGE data
>>>>>>> transfers are on the same scale as we will have to deal with, so
>>>>>>> the figures will be very relevant.
>>>>>>> I was interested in LDM because it appears to offer a lot of
>>>>>>> support for data management, which might significantly reduce the
>>>>>>> amount of coding and design we need to do ourselves, relative to
>>>>>>> what would be necessary with GridFTP.
>>>>>>> Regards,
>>>>>>> Martin
>>>>>>>               
>>>>>>>> -----Original Message-----
>>>>>>>> From: go-essp-tech-bounces at ucar.edu [mailto:go-essp-tech-bounces at ucar.edu] On Behalf Of Don Middleton
>>>>>>>> Sent: 04 June 2010 16:29
>>>>>>>> To: Lawrence, Bryan (STFC,RAL,SSTD)
>>>>>>>> Cc: go-essp-tech at ucar.edu
>>>>>>>> Subject: Re: [Go-essp-tech] Bulk data moving and the UNIDATA LDM
>>>>>>>> We're using LDM for TIGGE, and replicating a couple of terabytes of
>>>>>>>> forecast data a week around the world. I'll inquire about file sizes
>>>>>>>> and rates.
>>>>>>>> don
>>>>>>>> On Jun 4, 2010, at 9:15 AM, Bryan Lawrence wrote:
>>>>>>>>                 
>>>>>>>>> My understanding is that it doesn't compare on a file-by-file
>>>>>>>>> basis: GridFTP is clearly optimised to move big files fast.
>>>>>>>>> However, if we are moving more than dozens of files (as we are),
>>>>>>>>> then as I understand it, LDM would open multiple file transfer
>>>>>>>>> streams, so GridFTP's advantage will boil down to the (not
>>>>>>>>> inconsiderable) negotiated window size.
>>>>>>>>> Someone else ought to be able to give much better information
>>>>>>>>> than that :-)
>>>>>>>>> Cheers
>>>>>>>>> Bryan
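The trade-off between many streams and a negotiated window size can be quantified: if each stream is limited to one window per round trip, the number of parallel streams needed to reach a target rate follows directly. A sketch assuming window-limited (not loss-limited) streams and an otherwise idle link; the numbers are illustrative assumptions:

```python
import math

def streams_needed(target_bps, rtt_s, window_bytes):
    """Parallel streams needed when each stream sends one window per RTT."""
    per_stream_bps = window_bytes * 8 / rtt_s
    return math.ceil(target_bps / per_stream_bps)

if __name__ == "__main__":
    # Assumed: 1 Gbps target, 100 ms RTT; default vs tuned window.
    for window in (64 * 1024, 4 * 1024 * 1024):
        print(f"{window // 1024} KiB window -> "
              f"{streams_needed(1e9, 0.1, window)} streams")
```

With these assumed numbers a 64 KiB window needs 191 streams to fill 1 Gbps, while a 4 MiB window needs 3, which is why the negotiated window size matters so much.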
>>>>>>>>> On Friday 04 Jun 2010 15:58:33 Alex Sim wrote:
>>>>>>>>>                   
>>>>>>>>>> Can you tell us about the wide-area transfer performance of
>>>>>>>>>> Unidata LDM compared to GridFTP server-based transfers?
>>>>>>>>>> -- Alex
>>>>>>>>>> On 6/4/10 3:39 AM, martin.juckes at stfc.ac.uk wrote:
>>>>>>>>>> Hello all,
>>>>>>>>>> There may be a simple answer to this, but is there a reason why
>>>>>>>>>> we shouldn't use the Unidata Local Data Manager (LDM) for bulk
>>>>>>>>>> data movement within the CMIP5 distributed archive? It appears
>>>>>>>>>> to have a lot of good features built in to verify success of data
>>>>>>>>>> transfer between sites, and runs successfully at many operational
>>>>>>>>>> sites. This would simplify the workflow, since all the complexity
>>>>>>>>>> of the site-to-site transfers would be dealt with by a tried and
>>>>>>>>>> tested system.
>>>>>>>>>> Cheers,
>>>>>>>>>> Martin
>>>>>>>>>>                     
>>>>>>>>> --
>>>>>>>>> Bryan Lawrence
>>>>>>>>> Director of Environmental Archival and Associated Research
>>>>>>>>> (NCAS/British Atmospheric Data Centre and NCEO/NERC NEODC)
>>>>>>>>> STFC, Rutherford Appleton Laboratory
>>>>>>>>> Phone +44 1235 445012; Fax ... 5848;
>>>>>>>>> Web: home.badc.rl.ac.uk/lawrence
>>>>>>>>> _______________________________________________
>>>>>>>>> GO-ESSP-TECH mailing list
>>>>>>>>> GO-ESSP-TECH at ucar.edu
>>>>>>>>> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech
>>>>>>>>>                   
>>>> Rachana Ananthakrishnan
>>>> Argonne National Lab | University of Chicago
>>> Rachana Ananthakrishnan
>>> Argonne National Lab | University of Chicago
>>>       
>> -- 
>>
>> V. Balaji                               Office:  +1-609-452-6516
>> Head, Modeling Systems Group, GFDL      Home:    +1-212-253-6662
>> Princeton University                    Email: v.balaji at noaa.gov
>>     
> Rachana Ananthakrishnan
> Argonne National Lab | University of Chicago
>

