[Go-essp-tech] Reasoning for the use of symbolic links in drslib

Kettleborough, Jamie jamie.kettleborough at metoffice.gov.uk
Tue Sep 20 09:14:23 MDT 2011


Hello Balaji,

I agree - getting all nodes to make the checksums available would be a
good thing.  It gives you both the data integrity check on download, and
the ability to see what files really have changed from one publication
version to the next.

I don't know how hard it is to do this, particularly for data that is
already published.
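
As a rough illustration of the two uses of checksums mentioned above
(verifying a download, and working out which files changed between two
publication versions), here is a minimal Python sketch.  The function
names, the choice of SHA-256, and the idea of a per-version
{filename: checksum} mapping are all assumptions made for the example;
they are not drslib or ESG node interfaces.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks.

    The algorithm is an assumption; use whatever the node publishes.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_checksum, algorithm="sha256"):
    """Return True if the local file matches the published checksum."""
    return file_checksum(path, algorithm) == expected_checksum.lower()


def changed_files(old_manifest, new_manifest):
    """Compare two hypothetical {filename: checksum} mappings.

    Each mapping describes one publication version of a dataset.
    """
    old_names, new_names = set(old_manifest), set(new_manifest)
    common = old_names & new_names
    return {
        "added": sorted(new_names - old_names),
        "removed": sorted(old_names - new_names),
        "modified": sorted(n for n in common if old_manifest[n] != new_manifest[n]),
        "unchanged": sorted(n for n in common if old_manifest[n] == new_manifest[n]),
    }
```

With something like this, only the files listed under "added" and
"modified" would need to be fetched for a new version, provided every
node exposes checksums for each version it publishes.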

Jamie 

> -----Original Message-----
> From: V. Balaji [mailto:V.Balaji at noaa.gov] 
> Sent: 20 September 2011 16:01
> To: Kettleborough, Jamie
> Cc: Karl Taylor; go-essp-tech at ucar.edu; esg-node-dev at lists.llnl.gov
> Subject: Re: [Go-essp-tech] Reasoning for the use of symbolic 
> links in drslib
> 
> If nodes can currently choose to record checksums or not, I'd 
> strongly recommend this be a non-optional requirement. How 
> could anyone download any data with confidence without being 
> able to checksum?
> 
> You can of course check timestamps and filesizes and so on, 
> but you have to consider those optimizations... a fast option 
> for the less paranoid to avoid the sum computation, which has 
> to be the gold standard.
> 
> "Trust but checksum".
> 
> Kettleborough, Jamie writes:
> 
> > Hello Karl, everyone,
> >
> >
> > 	For replicating the latest version, I agree that your alternate
> > structure poses difficulties (but it seems like there must be a way to
> > smartly determine whether you already have a file and simply
> > need to move it, rather than bring it over again).
> >
> >
> > Doesn't every user (not just the replication system) have this problem:
> > they want to know what files have changed (or not changed) at a new
> > publication version.  No one wants to be using bandwidth or storage
> > space to fetch and store files they already have.  How is a user
> > expected to know what has really changed?  Estani mentions checksums
> > - OK, but I don't think all nodes expose them (is this right?).  You
> > may try to infer from modification dates (not sure, I haven't looked at
> > them that closely).  You may try to infer from the TRACKING_ID - but
> > I'm not sure how reliable this is (I can imagine scenarios where
> > different files share the same TRACKING_ID - e.g. if they have been
> > modified with an nco tool).
> >
> > Is there a recommended method for users to understand what *files* 
> > have actually changed when a new publication version appears?
> >
> > Thanks,
> >
> > Jamie
> >
> 
> -- 
> 
> V. Balaji                               Office:  +1-609-452-6516
> Head, Modeling Systems Group, GFDL      Home:    +1-212-253-6662
> Princeton University                    Email: v.balaji at noaa.gov
> 

