[Go-essp-tech] Reasoning for the use of symbolic links in drslib

Gavin M. Bell gavin at llnl.gov
Tue Sep 20 10:26:18 MDT 2011


Jamie and friends.

You've answered your own questions :-)...
It is the catalog where these checksums (and other features) are recorded.
And thus using the catalog you can see what has changed.
There is a new catalog for every version of a dataset. Given that...
you can quickly and easily inspect catalog_v1 and catalog_v2 to find
what the changes are.
This all answers the question of "WHAT" (to download)... the other
question of "HOW" is a different, but related, story.
The trick is not to conflate the two issues, which is what filesystem
discussions do.  When you talk about filesystems you are stipulating the
WHAT, but you are also implicitly deciding the HOW, because you are
designing for tools that intrinsically use the filesystem.  That muddies
the waters instead of separating the two concerns.  We need to deal with
these two concepts independently, in a way that does not limit the
system or cause undue burden on institutions by requiring a rigid
structure.
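
To make the catalog_v1 vs. catalog_v2 comparison concrete, here is a
minimal sketch in Python.  It assumes each catalog has already been
reduced to a plain mapping of file name -> checksum; the real catalogs
are richer than that, and the names diff_catalogs, to_download and the
sample file names and checksums are made up purely for illustration.

```python
from typing import Dict, List, NamedTuple, Set


class CatalogDiff(NamedTuple):
    """Result of comparing two catalog versions of the same dataset."""
    added: Set[str]      # in the new catalog only
    removed: Set[str]    # in the old catalog only
    changed: Set[str]    # in both, but the checksum differs
    unchanged: Set[str]  # in both, with the same checksum


def diff_catalogs(old: Dict[str, str], new: Dict[str, str]) -> CatalogDiff:
    """Compare two {file name: checksum} mappings (hypothetical format)."""
    old_names, new_names = set(old), set(new)
    common = old_names & new_names
    changed = {name for name in common if old[name] != new[name]}
    return CatalogDiff(
        added=new_names - old_names,
        removed=old_names - new_names,
        changed=changed,
        unchanged=common - changed,
    )


def to_download(diff: CatalogDiff) -> List[str]:
    """The WHAT: only files that are new or whose checksum changed."""
    return sorted(diff.added | diff.changed)


# Illustrative data only; these are not real file names or checksums.
catalog_v1 = {"tas_a.nc": "aaa", "tas_b.nc": "bbb", "pr_a.nc": "ccc"}
catalog_v2 = {"tas_a.nc": "aaa", "tas_b.nc": "bb2", "hus_a.nc": "ddd"}

diff = diff_catalogs(catalog_v1, catalog_v2)
print(to_download(diff))
```

Nothing in the sketch looks at a directory layout: it only decides WHAT
has changed, and HOW those files are then fetched or stored is left to
whatever tool consumes the list.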

As I mentioned... it's not the filesystem we need to look at... it's the
catalogs.

just my $0.02 - I'll stop flogging this particular horse... but I hope I
have done a better job expressing the issues and where the solution lies
(IMHO).

On 9/20/11 8:14 AM, Kettleborough, Jamie wrote:
> Hello Balaji,
>
> I agree - getting all nodes to make the checksums available would be a
> good thing.  It gives you both the data integrity check on download, and
> the ability to see what files really have changed from one publication
> version to the next.
>
> I don't know how hard it is to do this, particularly for data that is
> already published.
>
> Jamie 
>
>> -----Original Message-----
>> From: V. Balaji [mailto:V.Balaji at noaa.gov] 
>> Sent: 20 September 2011 16:01
>> To: Kettleborough, Jamie
>> Cc: Karl Taylor; go-essp-tech at ucar.edu; esg-node-dev at lists.llnl.gov
>> Subject: Re: [Go-essp-tech] Reasoning for the use of symbolic 
>> links in drslib
>>
>> If nodes can currently choose to record checksums or not, I'd 
>> strongly recommend this be a non-optional requirement.. how 
>> could anyone download any data with confidence without being 
>> able to checksum?
>>
>> You can of course check timestamps and filesizes and so on, 
>> but you have to consider those optimizations... a fast option 
>> for the less paranoid to avoid the sum computation, which has 
>> to be the gold standard.
>>
>> "Trust but checksum".
>>
>> Kettleborough, Jamie writes:
>>
>>> Hello Karl, everyone,
>>>
>>>
>>> 	For replicating the latest version, I agree that your alternate
>>> structure poses difficulties (but it seems like there must be a way
>>> to smartly determine whether you already have a file and simply
>>> need to move it, rather than bring it over again).
>>>
>>>
>>> Doesn't every user (not just the replication system) have this problem:
>>> they want to know what files have changed (or not changed) at a new
>>> publication version.  No one wants to be using bandwidth or storage
>>> space to fetch and store files they already have.  How is a user
>>> expected to know what has really changed?  Estani mentions checksums
>>> - OK, but I don't think all nodes expose them (is this right?).  You
>>> may try to infer from modification dates (not sure, I haven't looked
>>> at them that closely).  You may try to infer from the TRACKING_ID -
>>> but I'm not sure how reliable this is (I can imagine scenarios where
>>> different files share the same TRACKING_ID - e.g. if they have been
>>> modified with an nco tool).
>>>
>>> Is there a recommended method for users to understand what *files* 
>>> have actually changed when a new publication version appears?
>>>
>>> Thanks,
>>>
>>> Jamie
>>>
>> -- 
>>
>> V. Balaji                               Office:  +1-609-452-6516
>> Head, Modeling Systems Group, GFDL      Home:    +1-212-253-6662
>> Princeton University                    Email: v.balaji at noaa.gov
>>
> _______________________________________________
> GO-ESSP-TECH mailing list
> GO-ESSP-TECH at ucar.edu
> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech

-- 
Gavin M. Bell
--

 "Never mistake a clear view for a short distance."
       	       -Paul Saffo

