[Go-essp-tech] Reasoning for the use of symbolic links in drslib
Kettleborough, Jamie
jamie.kettleborough at metoffice.gov.uk
Wed Sep 21 04:46:07 MDT 2011
Hello Gavin,
so is that a consensus - every data node should record the checksums for
every file in the thredds catalogues?
(Urm... not sure it's really my role to say this - so sorry if I've
stepped out of line).
Jamie
p.s. do I have friends? I thought I was just making enemies
________________________________
From: Gavin M. Bell [mailto:gavin at llnl.gov]
Sent: 20 September 2011 17:26
To: Kettleborough, Jamie
Cc: V. Balaji; go-essp-tech at ucar.edu; esg-node-dev at lists.llnl.gov
Subject: Re: [Go-essp-tech] Reasoning for the use of symbolic links in drslib
Jamie and friends.
You've answered your own questions :-)...
It is the catalog where these checksums (and other features) are
recorded.
And thus using the catalog you can see what has changed.
There is a new catalog for every version of a dataset. Given
that...
you can quickly and easily inspect catalog_v1 and catalog_v2 to
find what the changes are.
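A minimal sketch of that comparison, in Python. It assumes the file names and checksums have already been extracted from each catalog into a plain mapping; the function name `diff_catalogs` and the mapping layout are hypothetical, chosen only for illustration, and how the values are read out of a THREDDS catalog is deliberately left out.

```python
def diff_catalogs(old, new):
    """Compare two {file_name: checksum} mappings, one per dataset version.

    Both mappings are assumed to have been built from the catalogs of two
    versions of the same dataset (hypothetical input format).
    """
    old_names, new_names = set(old), set(new)
    common = old_names & new_names
    return {
        # Present only in the newer catalog: must be downloaded.
        "added": sorted(new_names - old_names),
        # Present only in the older catalog: withdrawn in the new version.
        "removed": sorted(old_names - new_names),
        # Same name, different checksum: content changed, download again.
        "changed": sorted(f for f in common if old[f] != new[f]),
        # Same name, same checksum: the local copy can be reused.
        "unchanged": sorted(f for f in common if old[f] == new[f]),
    }


if __name__ == "__main__":
    catalog_v1 = {"a.nc": "111", "b.nc": "222", "c.nc": "333"}
    catalog_v2 = {"a.nc": "111", "b.nc": "999", "d.nc": "444"}
    for status, names in diff_catalogs(catalog_v1, catalog_v2).items():
        print(status, names)
```

Only the "added" and "changed" files would need to be fetched; this says nothing about how they are fetched.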
This all answers the question of "WHAT" (to download)... the
other question of "HOW" is a different, but related story.
The trick is to not conflate the two issues, which is what
filesystem discussions do. When talking about filesystems you are
stipulating the WHAT but implicitly conflating the HOW, because you are
implicitly designing for tools that intrinsically use the filesystem.
It is a muddying of the waters that doesn't separate the two concerns.
We need to deal with these two concepts independently, in a way that
does not limit the system or cause undue burden on institutions by
requiring a rigid structure.
As I mentioned... it's not the filesystem we need to look at...
it's the catalogs.
just my $0.02 - I'll stop flogging this particular horse... but
I hope I have done a better job expressing the issues and where the
solution lies (IMHO).
On 9/20/11 8:14 AM, Kettleborough, Jamie wrote:
Hello Balaji,
I agree - getting all nodes to make the checksums available would be a
good thing. It gives you both the data integrity check on download, and
the ability to see what files really have changed from one publication
version to the next.
I don't know how hard it is to do this, particularly for data that is
already published.
Jamie
-----Original Message-----
From: V. Balaji [mailto:V.Balaji at noaa.gov]
Sent: 20 September 2011 16:01
To: Kettleborough, Jamie
Cc: Karl Taylor; go-essp-tech at ucar.edu; esg-node-dev at lists.llnl.gov
Subject: Re: [Go-essp-tech] Reasoning for the use of symbolic links in drslib
If nodes can currently choose to record checksums or not, I'd
strongly recommend this be a non-optional requirement.. how
could anyone download any data with confidence without being
able to checksum?
You can of course check timestamps and filesizes and so on,
but you have to consider those optimizations... a fast option
for the less paranoid to avoid the sum computation, which has
to be the gold standard.
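The distinction drawn here, a cheap size check as an optimization and the checksum as the definitive test, can be sketched in Python as follows. This is an illustration only, not the behaviour of any existing ESG tool: the function names `file_checksum` and `verify_download` are hypothetical, and MD5 is merely an assumed default, since the thread does not say which algorithm a node publishes.

```python
import hashlib
import os


def file_checksum(path, algorithm="md5", chunk_size=1 << 20):
    """Return the hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_checksum, expected_size=None, algorithm="md5"):
    """Check a downloaded file against the values published for it.

    The size comparison is only a fast way to reject an obviously bad file;
    a matching size proves nothing, so the checksum is always computed
    before the file is accepted.
    """
    if expected_size is not None and os.path.getsize(path) != expected_size:
        return False
    return file_checksum(path, algorithm) == expected_checksum.lower()
```

A caller would pass the checksum (and, if available, the size) recorded for the file by the data node, and fetch the file again whenever `verify_download` returns `False`.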
"Trust but checksum".
Kettleborough, Jamie writes:
Hello Karl, everyone,
For replicating the latest version, I agree that your alternate
structure poses difficulties (but it seems like there must be a way to
smartly determine whether you already have a file and simply need to
move it, rather than bring it over again).
Doesn't every user (not just the replication system) have this problem:
they want to know what files have changed (or not changed) at a new
publication version. No one wants to be using bandwidth or storage
space to fetch and store files they already have. How is a user
expected to know what has really changed? Estani mentions checksums
- OK, but I don't think all nodes expose them (is this right?). You
may try to infer from modification dates (not sure, I haven't looked at
them that closely). You may try to infer from the TRACKING_ID - but
I'm not sure how reliable this is (I can imagine scenarios where
different files share the same TRACKING_ID - e.g. if they have been
modified with an nco tool).
Is there a recommended method for users to understand what *files*
have actually changed when a new publication version appears?
Thanks,
Jamie
--
V. Balaji                               Office: +1-609-452-6516
Head, Modeling Systems Group, GFDL      Home:   +1-212-253-6662
Princeton University                    Email:  v.balaji at noaa.gov
_______________________________________________
GO-ESSP-TECH mailing list
GO-ESSP-TECH at ucar.edu
http://mailman.ucar.edu/mailman/listinfo/go-essp-tech
--
Gavin M. Bell
--
"Never mistake a clear view for a short distance."
-Paul Saffo