[Go-essp-tech] [versioning] some issues (related to atomic dataset concept)

Karl Taylor taylor13 at llnl.gov
Wed Mar 11 15:43:15 MDT 2009


Bryan, Charlotte et al.,

I have a couple of comments/questions below:

Should the following read: "but *not* retracted"?

> 3) The state of information which has been replaced but retracted is not clear, and it needs to be, given that we state at the end that older version data can still be obtained.  This means that the earlier statement that files will not be directly versioned needs to be married to the ability to retain older versions, and if we stick with atomic datasets we
> have to replicate atomic datasets even if only a few files within them have changed.  We ought to be able to handle that more cleverly (i.e. older pieces of replaced data are kept as new "old sub atomic datasets" or whatever - and the new "parent" atomic dataset will of course include the entire set of current data).
> 

As I understand it, a set of files (an "atomic dataset"??), but not 
individual files, will be "versioned", and as part of the information 
made available for each version, a list of files and whether they are 
old, new or replacement files would be provided.  Is that correct?  Is 
it really necessary to replicate the actual "old" files in each new 
version?  Couldn't you simply point to the old files themselves (or does 
that go against the indivisible "atom" concept)?
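
To make the question concrete, here is a minimal Python sketch of a version
manifest that points at unchanged files instead of copying them.  It is only
an illustration of the idea: the names (FileEntry, DatasetVersion,
next_version), the status labels and the storage paths are all hypothetical
and are not taken from any existing design.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class FileEntry:
    path: str    # where the bytes actually live (hypothetical storage layout)
    status: str  # "old", "new" or "replacement", relative to the previous version


@dataclass
class DatasetVersion:
    number: int
    # Keyed by logical file name within the atomic dataset.
    files: Dict[str, FileEntry] = field(default_factory=dict)


def next_version(prev: DatasetVersion, changed: Dict[str, str]) -> DatasetVersion:
    """Build the version after `prev`.

    `changed` maps a logical file name to the storage path of its new content.
    Unchanged files are carried over by reference (same path), not copied.
    """
    files: Dict[str, FileEntry] = {}
    for name, entry in prev.files.items():
        if name not in changed:
            files[name] = FileEntry(entry.path, "old")
    for name, path in changed.items():
        status = "replacement" if name in prev.files else "new"
        files[name] = FileEntry(path, status)
    return DatasetVersion(prev.number + 1, files)


v1 = DatasetVersion(1, {
    "tas.nc": FileEntry("/store/v1/tas.nc", "new"),
    "pr.nc": FileEntry("/store/v1/pr.nc", "new"),
})
v2 = next_version(v1, {"pr.nc": "/store/v2/pr.nc", "psl.nc": "/store/v2/psl.nc"})
```

In this sketch version 2 still lists every current file, so the dataset stays
"atomic" from the user's point of view, while the replaced pr.nc remains
reachable through the version 1 manifest.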

> 4) We cannot see how changes in data would not be reflected in metadata, however, since all the metadata does not have a one-to-one relationship to data, it's probably cleaner to say: that where appropriate, external metadata will be versioned and/or updated to reflect the data versioning. So, we're not at all comfortable with the last bullet point 
> under data versioning breakdown. In particular, activities that give rise to data changes are pretty important!

I had a similar reaction to this.

> 
> 6) If someone holds retracted data, we think it should show up in the (a?) catalogue. The whole reason for having it would be for evidential reasons. We can certainly make it non-trivial to accidentally use it, but it should be discoverable somehow!

Perhaps at the very least there should be an option for those 
"discovering" data to hide (or not) the retracted data.
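
As a rough illustration of such an option, the Python sketch below keeps
retracted entries in the catalogue but hides them unless the searcher asks
for them.  The record fields, the discover() function and the
include_retracted flag are hypothetical names chosen for this example, and
the dataset identifiers are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CatalogueRecord:
    dataset_id: str
    version: int
    retracted: bool = False


def discover(records: List[CatalogueRecord],
             include_retracted: bool = False) -> List[CatalogueRecord]:
    """Return catalogue records, hiding retracted ones by default."""
    return [r for r in records if include_retracted or not r.retracted]


catalogue = [
    CatalogueRecord("example.tas", 1, retracted=True),
    CatalogueRecord("example.tas", 2),
    CatalogueRecord("example.pr", 1),
]

visible = discover(catalogue)                             # default view
everything = discover(catalogue, include_retracted=True)  # evidential view
```

Defaulting the flag to False makes it hard to pick up retracted data by
accident, while the records themselves stay discoverable.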

Best regards,
Karl


