[Go-essp-tech] Replication plan

Ann Chervenak annc at isi.edu
Thu Mar 3 09:49:24 MST 2011


Hi, Estani,

I agree that such a replication plan is a smart idea. Otherwise, we are 
likely to create hot spots where a newly published data set gets a lot 
of simultaneous access, slowing down everyone using that site.

Your suggestion of two sites each downloading half the data set and then 
acting as alternate sources for replication operations makes sense.
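
As a rough sketch of how such a split could be planned (the function name, the round-robin assignment, and the file names are assumptions for illustration, not an existing tool): each site takes a share of the files from the origin in a first pass, and in the second pass every remaining file can come from either the origin or the peer that already holds it.

```python
def split_plan(files, sites, origin):
    """Hypothetical planner: divide files among sites for a first pass,
    then list the alternate sources available for the second pass."""
    # First pass: round-robin assignment, so each site pulls a distinct share.
    first_pass = {site: [] for site in sites}
    for i, name in enumerate(sorted(files)):
        first_pass[sites[i % len(sites)]].append(name)

    # Second pass: for every file a site still lacks, it may use the origin
    # or the peer site that fetched that file in the first pass.
    second_pass = {}
    for site in sites:
        second_pass[site] = {
            name: [origin, holder]
            for holder in sites if holder != site
            for name in first_pass[holder]
        }
    return first_pass, second_pass


first, second = split_plan(
    ["a.nc", "b.nc", "c.nc", "d.nc"], ["DKRZ", "PCMDI"], "BADC"
)
```

With two sites, each pulls half the files from BADC, and each then has two possible sources for every file in the other half.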

Such a replication plan can get fairly sophisticated--e.g., using a tree 
configuration to disseminate data to multiple mirror sites, with each 
newly created replica acting as a source site for subsequent replication 
operations.
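
To make the tree idea concrete, here is a minimal sketch under simple assumptions: each site that already holds a replica serves at most one new site per round, and transfers within a round take about the same time. The function name and "SITE_C" are placeholders, not real components.

```python
def tree_schedule(origin, mirrors):
    """Hypothetical scheduler: return rounds of (source, target) transfers
    in which every current replica holder feeds one new mirror per round."""
    holders = [origin]        # sites that already hold the data set
    pending = list(mirrors)   # sites still waiting for a replica
    rounds = []
    while pending:
        transfers = []
        for source in list(holders):
            if not pending:
                break
            transfers.append((source, pending.pop(0)))
        # New replicas become sources in the next round.
        holders.extend(target for _, target in transfers)
        rounds.append(transfers)
    return rounds


plan = tree_schedule("BADC", ["DKRZ", "PCMDI", "SITE_C"])
```

Under these assumptions the number of replica holders doubles each round, so the origin serves only one transfer at a time.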

We could also think about scheduling downloads based on when system and 
network loads are likely to be lighter.

Are you able to schedule replication operations (i.e., can you expect 
BADC to publish certain data sets at certain times), or are replication 
operations more reactive (initiated when you see that a new data set has 
been published)?

Ann


On 3/3/11 12:09 AM, Estanislao Gonzalez wrote:
> Hi,
>
> I decided to split my last email, as this is something a little
> different, though it originated from what I said there.
>
> Shouldn't we have a replication plan?
> For example, if PCMDI and DKRZ replicate the same datasets from BADC at
> the same time, it will be a waste of time. We (DKRZ) could replicate one
> half and PCMDI the other, and for the second half we will be able to
> download simultaneously from two other gateways (actually datanodes, but
> you get the idea).
>
> Any thoughts on how this could be achievable? Or if it even makes sense?
>
> Thanks,
> Estani

