[Go-essp-tech] CMIP5 data archive size estimate

Alex Sim asim at lbl.gov
Sun Dec 13 20:50:27 MST 2009


BDM takes care of "lots of small files" in transfers: a few features
come from gridftp library - channel caching, pipelining as in the link
you mentioned and in addition, a a few more optimization in bdm -
queuing, auto-tuning on concurrency, parellelism, tcp buffer and block
size, etc.
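
To illustrate the queuing/concurrency side of that list, here is a
minimal sketch (plain Python, not BDM code; the file list and the
transfer routine are placeholders):

    # Minimal sketch: queue many small files and move them with a bounded
    # pool of concurrent workers, so per-file setup cost is amortized.
    # Illustrative only -- the paths and transfer() are made up.
    import queue
    import threading

    def transfer(src, dst):
        # stand-in for a real per-file GridFTP transfer
        print("moving %s -> %s" % (src, dst))

    def worker(q):
        while True:
            try:
                src, dst = q.get_nowait()
            except queue.Empty:
                return
            transfer(src, dst)

    def bulk_move(pairs, concurrency=8):
        q = queue.Queue()
        for pair in pairs:
            q.put(pair)
        threads = [threading.Thread(target=worker, args=(q,))
                   for _ in range(concurrency)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

    if __name__ == "__main__":
        pairs = [("/archive/cmip5/f%05d.nc" % i, "/scratch/f%05d.nc" % i)
                 for i in range(1000)]
        bulk_move(pairs)

The real gains come from keeping the control/data channels open and
tuning the per-stream parameters, which the sketch above does not
attempt.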

-- Alex



On 12/13/09 4:26 PM, Pauline Mak wrote:
> Hi all,
>
> Bryan Lawrence wrote:
>   
>> hi Stephen
>>
>> Hmm. I just spoke to you and suggested you were wrong, but I did my calculation again, and it was probably me who was wrong.
>>
>> I think the number of different outputs requested is of o(500) (*)
>> I think the number of experiments is of o(50)
>> The number of modelling centres is of o(20)
>> The number of ensembles is of o(3)
>> Number of atomic datasets = 500 x 50 x 20 x 3 = 1.5E6.
>>
>> So, what's a factor of two between friends :-)
>>
>> But this also implies 1 PB / 2 million = 0.5 GB per atomic dataset. We know/think that GridFTP doesn't like small files ... is this big enough? Does BDM aggregate things to transfer faster?
>>
>>     
>
> I was under the impression that the new implementation of the client (at 
> least for Java, see http://dev.globus.org/wiki/CoG_JGlobus_1.5.0) 
> improves transfers for many small files.  Although, I'm not entirely 
> sure what that means!
>
> Cheers,
>
> -Pauline.
>
>   
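
For what it's worth, a quick check of the arithmetic quoted above (the
~1 PB total is the figure assumed in the thread, not a measurement):

    # Back-of-the-envelope check of Bryan's estimate.
    outputs, experiments, centres, ensembles = 500, 50, 20, 3
    atomic_datasets = outputs * experiments * centres * ensembles
    print(atomic_datasets)                    # 1500000, i.e. o(1.5E6)

    archive_bytes = 1e15                      # ~1 PB, assumed total
    print(archive_bytes / 2e6 / 1e9, "GB")    # ~0.5 GB per atomic dataset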

