[Go-essp-tech] Replication plan
Ann Chervenak
annc at isi.edu
Thu Mar 3 09:49:24 MST 2011
Hi, Estani,
I agree that such a replication plan is a smart idea. Otherwise, we are
likely to create hot spots where a newly published data set gets a lot
of simultaneous access, slowing down everyone using that site.
Your suggestion of two sites each downloading half the data set and then
acting as alternate sources for replication operations makes sense.
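A minimal sketch of the two-site split, in Python. It assumes the data set is just a list of file names and that a plan is a list of (phase, file, source, destination) tuples. The function name `plan_split_replication` and the alternating split are illustrative choices, not part of any ESG tool. In phase 1 each replicating site pulls a disjoint half from the publishing site. In phase 2 each site pulls the missing half from its peer, so the publisher serves every file only once.

```python
from typing import List, Tuple

# (phase, file, source site, destination site)
Transfer = Tuple[int, str, str, str]


def plan_split_replication(
    files: List[str], site_a: str, site_b: str, origin: str
) -> List[Transfer]:
    """Hypothetical planner: two sites share the load of pulling from origin.

    Phase 1: each site fetches a disjoint half of the files from the origin.
    Phase 2: each site fetches the other half from its peer, not the origin.
    """
    # Alternate by index so both halves have (nearly) the same file count.
    half_a = files[0::2]
    half_b = files[1::2]

    plan: List[Transfer] = []
    # Phase 1: the origin serves each file exactly once.
    plan += [(1, f, origin, site_a) for f in half_a]
    plan += [(1, f, origin, site_b) for f in half_b]
    # Phase 2: each new replica acts as an alternate source for its peer.
    plan += [(2, f, site_b, site_a) for f in half_b]
    plan += [(2, f, site_a, site_b) for f in half_a]
    return plan
```

With BADC as the publisher and DKRZ and PCMDI as the replicating sites, BADC serves each file once instead of twice. When file sizes vary widely, a size-balanced split (for example, assigning each file to whichever site currently has fewer bytes) would even out transfer times better than alternating by index.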
Such a replication plan can get fairly sophisticated, e.g., using a tree
configuration to disseminate data to multiple mirror sites, with each
newly created replica acting as a source site for subsequent replication
operations.
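The tree configuration can be sketched as a round-based broadcast, again in Python and under simplifying assumptions. Every transfer of the full data set is assumed to take the same time, and each site holding a complete replica is assumed to serve exactly one new mirror per round. `plan_tree_replication` and the mirror names are hypothetical. Because the number of source sites doubles each round, n mirrors are covered in ceil(log2(n + 1)) rounds.

```python
from typing import List, Tuple


def plan_tree_replication(
    origin: str, mirrors: List[str]
) -> List[List[Tuple[str, str]]]:
    """Hypothetical planner: group (source, destination) transfers into rounds.

    Assumes equal transfer times and that every site holding a full replica
    serves at most one new mirror per round, so the set of sources doubles
    each round (a binomial-tree broadcast).
    """
    holders = [origin]        # sites that already have a complete replica
    pending = list(mirrors)   # sites still waiting for the data set
    rounds: List[List[Tuple[str, str]]] = []
    while pending:
        transfers = []
        for src in holders:
            if not pending:
                break
            transfers.append((src, pending.pop(0)))
        # New replicas only become sources starting in the next round.
        holders.extend(dst for _, dst in transfers)
        rounds.append(transfers)
    return rounds
```

For seven mirrors this gives three rounds, and the publishing site serves three transfers instead of the seven it would serve if every mirror pulled from it directly.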
We could also think about scheduling downloads based on when system and
network loads are likely to be lighter.
Are you able to schedule replication operations (i.e., can you expect
BADC to publish certain data sets at certain times), or are replication
operations more reactive (initiated when you see that a new data set has
been published)?
Ann
On 3/3/11 12:09 AM, Estanislao Gonzalez wrote:
> Hi,
>
> I decided to split my last email, as this is something a little
> different, although it originated from what I said there.
>
> Shouldn't we have a replication plan?
> For example, if PCMDI and DKRZ replicate the same datasets from BADC at
> the same time, it will be a waste of time. We (DKRZ) could replicate one
> half and PCMDI the other, and for the second half we will be able to
> download simultaneously from two other gateways (actually datanodes, but
> you get the idea).
>
> Any thoughts on how this could be achieved, or whether it even makes sense?
>
> Thanks,
> Estani
>
>