[Go-essp-tech] Thoughts Relating to your phone conference on replication....

Gavin M Bell gavin at llnl.gov
Tue Nov 3 12:41:46 MST 2009


Hello Gentle-people,

I just wanted to say a few words.  I should have said something at the
end of our phone conference but I thought it wiser to collect my
thoughts a bit for a more cogent and durable presentation.

Okay, please be patient and read through the entire email - the
complete thought.  Please send feedback so we can hash this out....
Bear with me.

The main idea I want to get across is for us to have a *catalog-centric*
view of the system.  It is the catalog that is the primary currency of
the system.

- The catalog gets generated and published from the data via the
data-node/publisher to the gateway.

- The gateway is simply, in the context of this model, a searchable
index over a collection of catalogs.

- Changes to catalogs are what get versioned.

- Changes to catalogs are what trigger notifications.

- Replication should be about replicating catalogs, where file
transfers are the necessary side-effect of proper catalog replication.

It is the catalog that is the central 'document' that we are interested
in.  It is the single entity that contains the necessary information
used in all levels of this system.
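
To make that concrete, here is a toy sketch of a catalog as a data
structure, in Python.  The field names are mine, purely for
illustration - this is not the actual THREDDS/ESG schema.  (The
'header' checksum over the immutable body comes up again below.)

    # Toy model of a catalog; field names are illustrative assumptions.
    from dataclasses import dataclass
    import hashlib

    @dataclass(frozen=True)
    class FileEntry:
        path: str       # logical path of the file within the dataset
        size: int       # size in bytes
        checksum: str   # e.g. an md5 over the file contents

    @dataclass
    class Catalog:
        dataset: str    # dataset name
        gateway: str    # authoritative gateway
        version: int    # bumped on every (re)publication
        files: tuple = ()   # the immutable 'body': the file inventory

        def body_checksum(self):
            """The 'header' checksum over the immutable 'body'."""
            h = hashlib.md5()
            for f in sorted(self.files, key=lambda f: f.path):
                h.update(f"{f.path}|{f.size}|{f.checksum}".encode())
            return h.hexdigest()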

A very good point brought up on the call was: what is the interface
between the parts of the system?  It has become clear to me that if
each part of the system understood the catalog, each could operate
quite well, gleaning the information it needs out of catalogs.

The topic today was replication:
So... in a catalog-centric model, the question of replication becomes
simply: what datasets have changed?  This is equivalent to asking:
what catalogs have changed?  This can be answered by the gateway
which, in this context, is essentially a catalog store.  The gateway
knows when a catalog changes because a new catalog usurps an older
one - this can be detected at publication time, versioned
appropriately, and announced (via notification).  The replication
agent is interested in these notifications and thus should be de
facto subscribed to such notification messages.
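
A rough sketch of that gateway behavior, building on the toy Catalog
above (the in-process callback list is just a stand-in for whatever
notification transport we actually use):

    # Toy gateway-as-catalog-store: a newly published catalog usurps
    # the old one, gets a bumped version, and the change is announced.
    class Gateway:
        def __init__(self):
            self.catalogs = {}      # dataset name -> latest Catalog
            self.subscribers = []   # notification callbacks

        def subscribe(self, callback):
            self.subscribers.append(callback)

        def publish(self, catalog):
            old = self.catalogs.get(catalog.dataset)
            if old and old.body_checksum() == catalog.body_checksum():
                return  # nothing changed: no new version, no announcement
            catalog.version = old.version + 1 if old else 1
            self.catalogs[catalog.dataset] = catalog  # usurp the old one
            for notify in self.subscribers:
                notify(catalog.dataset, catalog.version)
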
When the replication agent is notified, it looks at its own system to
see whether the notification refers to something on its list of
datasets to hold replicas of - its "replication list".  If so, it can
pull down the catalog, some subset (diff) of that catalog, or simply
the tuple necessary to find the location(s) of the holders of the
newest catalog.  The catalog will always carry its authoritative
source (dataset name and gateway); this can be resolved to the actual
data node that has the new version of that catalog (and to any other
replicas that are up-to-date).
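
And a sketch of the agent side, under the same toy assumptions (the
direct dictionary lookup stands in for resolving the authoritative
data node from the catalog and pulling from it; catalog_diff is
sketched a bit further down):

    # Toy replication agent: on notification, consult the local
    # "replication list"; if the dataset is ours to mirror, pull the
    # fresh catalog and sync.
    class ReplicationAgent:
        def __init__(self, gateway, replication_list):
            self.gateway = gateway
            self.replication_list = set(replication_list)
            self.local = {}  # dataset -> our (possibly stale) Catalog
            gateway.subscribe(self.on_notification)  # de facto subscription

        def on_notification(self, dataset, version):
            if dataset not in self.replication_list:
                return  # not on our replication list; ignore
            self.sync(self.gateway.catalogs[dataset])

        def sync(self, fresh):
            stale = self.local.get(fresh.dataset)
            to_fetch = catalog_diff(stale, fresh)
            # ... transfer 'to_fetch' from the data node, verify, then:
            self.local[fresh.dataset] = fresh
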
It is then the job of the replication agent that wants to be updated
to contact the authoritative data node, or any up-to-date replica
holder, and sync catalogs.  Syncing catalogs means grabbing the
latest catalog from the authoritative source (or from an updated
data-node replica) and diffing it against the stale catalog currently
held... the result of the diff is the set of files that need to be
transferred in order to make the state of the stale node equivalent
to the state of the latest catalog.  It is the catalog that contains
the 'inventory' and all other necessary information.
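
The diff itself is just a comparison of the two inventories - a toy
sketch using the FileEntry fields assumed above:

    # "Syncing catalogs": diff the stale inventory against the fresh
    # one; the result is exactly the set of files to transfer.
    def catalog_diff(stale, fresh):
        have = {f.path: f for f in (stale.files if stale else ())}
        need = []
        for f in fresh.files:
            mine = have.get(f.path)
            if mine is None or mine.checksum != f.checksum:
                need.append(f)  # missing or changed: must be fetched
        return need
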
Once files are transferred, integrity checking can be accomplished at
a few levels.  The first is to have the stale node generate its own
catalog and check it against the reference (up-to-date) catalog it
got from the source.  If replication has been done successfully, they
should be identical!  The catalog should have a 'header' portion that
contains the checksum of the immutable 'body' portion of the catalog.
The first-level integrity check is simply to see whether the
generated catalog and the reference are the same; if not, a
second-level check is required that walks the catalog's (XML) tree
and compares the two trees.  It is in this latter check that
individual file entries are examined to detect which files may need
to be fetched again.
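
Those two levels might look like this, with the toy model from above
(catalog_diff is the function sketched earlier):

    # Two-level integrity check.
    def verify(generated, reference):
        # Level 1: compare the 'header' checksums over the body.
        if generated.body_checksum() == reference.body_checksum():
            return []  # identical - replication succeeded
        # Level 2: walk the two inventories entry by entry; the
        # entries that disagree may need to be fetched again.
        return catalog_diff(generated, reference)
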
Also, if the connection goes down or fails in some way, generating a
catalog over the partial set of files that have already been
downloaded, and comparing it with the source catalog, will tell the
replication agent where to pick up from.  The source catalog could be
cached by the replication agent and purged after replication is done;
or, to be more up-to-date, the agent can refetch a catalog from any
holder on the list of already up-to-date replicas.
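
Generating a catalog over whatever is actually on disk is the same
operation in both cases (verifying a finished transfer and resuming a
failed one).  A rough sketch, assuming md5 checksums and a dataset
laid out under a local root directory:

    import os
    import hashlib

    # Build a Catalog over the files currently present under 'root'.
    def generate_catalog(dataset, gateway, root):
        entries = []
        for dirpath, _, names in os.walk(root):
            for name in names:
                path = os.path.join(dirpath, name)
                with open(path, "rb") as fp:
                    digest = hashlib.md5(fp.read()).hexdigest()
                entries.append(FileEntry(os.path.relpath(path, root),
                                         os.path.getsize(path), digest))
        return Catalog(dataset, gateway, version=0, files=tuple(entries))

Diffing that generated catalog against the source catalog then yields
exactly the files still left to fetch.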

The model is consistent.  Perhaps what needs to happen is for every part
of this system to be able to parse and glean information from the catalog.

There are system tweaks and optimizations that can be made (e.g.,
subscribing to notifications for specific entities versus doing a
general subscription blast; refetching the latest catalog from the
source or from up-to-date replicas versus holding on to the copy you
already have - a question of freshness; etc.).  But the model of
being catalog-centric is consistent and complete.  I think this is
the direction we should go in if we want this system to be scalable
and to provide clean interfaces between the different parts.
Furthermore, testing becomes easier, because essentially all you
would need is a bag-of-catalogs to ingest into your piece.
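
For instance, a test harness could be as small as this - FakeIndex is
a made-up stand-in for whatever piece is under test:

    # Feed a bag of catalogs to a component and inspect the result.
    class FakeIndex:
        def __init__(self):
            self.seen = {}

        def ingest(self, catalog):
            self.seen[catalog.dataset] = catalog

    def test_with_bag_of_catalogs(component, bag):
        for catalog in bag:     # 'bag': any iterable of Catalogs
            component.ingest(catalog)
        return component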


Thanks for putting up with my stream of consciousness. :-)
I hope I was cogent.  Feel free to contact me if any additional
clarification is required.

Gavin.

-- 
Gavin M. Bell
Lawrence Livermore National Labs
--

 "Never mistake a clear view for a short distance."
       	       -Paul Saffo


