[Go-essp-tech] Reasoning for the use of symbolic links in drslib

Mark Morgan momipsl at ipsl.jussieu.fr
Tue Sep 20 09:23:19 MDT 2011


Hi

esgNode.mandatory(checksums + PKI) = a better night's sleep.

Mark


On 20 Sep 2011, at 17:14, Kettleborough, Jamie wrote:

> Hello Balaji,
> 
> I agree - getting all nodes to make the checksums available would be a
> good thing.  It gives you both the data integrity check on download, and
> the ability to see what files really have changed from one publication
> version to the next.
> 
> I don't know how hard it is to do this, particularly for data that is
> already published.
> 
> Jamie 
> 
>> -----Original Message-----
>> From: V. Balaji [mailto:V.Balaji at noaa.gov] 
>> Sent: 20 September 2011 16:01
>> To: Kettleborough, Jamie
>> Cc: Karl Taylor; go-essp-tech at ucar.edu; esg-node-dev at lists.llnl.gov
>> Subject: Re: [Go-essp-tech] Reasoning for the use of symbolic 
>> links in drslib
>> 
>> If nodes can currently choose to record checksums or not, I'd 
>> strongly recommend this be a non-optional requirement. How 
>> could anyone download any data with confidence without being 
>> able to checksum?
>> 
>> You can of course check timestamps and filesizes and so on, 
>> but you have to consider those optimizations... a fast option 
>> for the less paranoid to avoid the sum computation, which has 
>> to be the gold standard.
>> 
>> "Trust but checksum".
>> 
>> Kettleborough, Jamie writes:
>> 
>>> Hello Karl, everyone,
>>> 
>>> 
>>> 	For replicating the latest version, I agree that your alternate
>>> structure poses difficulties (but it seems like there must be a way
>>> to smartly determine whether you already have a file and simply
>>> need to move it, rather than bring it over again).
>>> 
>>> 
>>> Doesn't every user (not just the replication system) have this
>>> problem: they want to know what files have changed (or not changed)
>>> at a new publication version?  No one wants to be using bandwidth or
>>> storage space to fetch and store files they already have.  How is a
>>> user expected to know what has really changed?  Estani mentions
>>> checksums - OK, but I don't think all nodes expose them (is this
>>> right?).  You may try to infer from modification dates (not sure, I
>>> haven't looked at them that closely).  You may try to infer from the
>>> TRACKING_ID - but I'm not sure how reliable this is (I can imagine
>>> scenarios where different files share the same TRACKING_ID - e.g. if
>>> they have been modified with an nco tool).
>>> 
>>> Is there a recommended method for users to understand what *files* 
>>> have actually changed when a new publication version appears?
>>> 
>>> Thanks,
>>> 
>>> Jamie
>>> 
>> 
>> -- 
>> 
>> V. Balaji                               Office:  +1-609-452-6516
>> Head, Modeling Systems Group, GFDL      Home:    +1-212-253-6662
>> Princeton University                    Email: v.balaji at noaa.gov
>> 

---------------------------------------------------
Mark Morgan
Software Architect / Engineer
Institut Pierre Simon Laplace (IPSL),
Université Pierre Marie Curie,
4 Place Jussieu,
Tour 45-55, Salle #207,
Paris 75005
France.
Tel : +33 (0) 1 44 27 49 10
Email: momipsl at ipsl.jussieu.fr
---------------------------------------------------
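
The approach discussed in this thread (use file size as a fast first check, treat the checksum as the gold standard, and compare checksums across publication versions to see which files really changed) could be sketched roughly as below. This is only an illustration under stated assumptions: the function names, the choice of SHA-256, and the idea of a per-version manifest mapping file names to checksums are hypothetical and are not taken from drslib or any ESG node software.

```python
import hashlib
from pathlib import Path


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks.

    The default algorithm is an assumption; use whatever the node publishes.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path, expected_size, expected_checksum, algorithm="sha256"):
    """Check a local file against a published size and checksum."""
    path = Path(path)
    if not path.is_file():
        return False
    # Fast path: a size mismatch proves the file differs, no hashing needed.
    if path.stat().st_size != expected_size:
        return False
    # Gold standard: sizes match, so confirm with the checksum.
    return file_checksum(path, algorithm) == expected_checksum


def changed_files(old_manifest, new_manifest):
    """Compare two hypothetical {file name: checksum} manifests.

    Returns which files were added, removed, or modified between versions.
    """
    old_names, new_names = set(old_manifest), set(new_manifest)
    return {
        "added": sorted(new_names - old_names),
        "removed": sorted(old_names - new_names),
        "modified": sorted(
            name
            for name in old_names & new_names
            if old_manifest[name] != new_manifest[name]
        ),
    }
```

With manifests like these for two publication versions, a user or a replication system would only need to fetch the files listed under "added" and "modified". Note that this compares by file name, so a file that was merely moved or renamed would show up as one removal plus one addition unless the manifests are also matched by checksum.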


