[Go-essp-tech] Replica support in Gateway 2.0

martin.juckes at stfc.ac.uk
Thu Nov 10 11:02:50 MST 2011


I don't think so. We had a user query a couple of weeks ago, "How do I know the data hasn't changed?", and the response was "contact the data node managers". If that is the basis of our DOIs, they don't really do what people expect. We have promised a reference archive, which is one of the reasons used to justify copying the data here. And the IPCC authors have been promised that the DOI will point to the reference archive (I think).

Cheers,
Martin 

> >-----Original Message-----
> >From: Bryan Lawrence [mailto:bryan.lawrence at ncas.ac.uk]
> >Sent: 10 November 2011 17:55
> >To: Juckes, Martin (STFC,RAL,RALSP)
> >Cc: go-essp-tech at ucar.edu; taylor13 at llnl.gov
> >Subject: Re: [Go-essp-tech] Replica support in Gateway 2.0
> >
> >> I don't think so; the point is that we can't guarantee that the data
> >> being pointed to is the same as that which has been quality
> >> controlled.
> >>
> >> We could try to explain: if you consistently get data with the wrong
> >> checksum, it might mean the file has been changed and the DOI'ed data
> >> is no longer there, and you could try one of the other replicas. But I
> >> think we should stick to pointing at locations where we have an
> >> appropriate guarantee that the data won't change.
> >
> >Hi Martin. I think that's essentially a step beyond the mandate we
> >have. The system has to point to all the data ... but the DOI doesn't,
> >of course. The issue then is how easily we can support both "everything
> >visible" and "here is the DOI'd copy" ... with the minimum of s/w and
> >effort. We're probably saying the same thing now :-)
> >
> >Cheers
> >Bryan
> >
> >
> >>
> >> Cheers,
> >> Martin
> >>
> >> > >-----Original Message-----
> >> > >From: Bryan Lawrence [mailto:bryan.lawrence at ncas.ac.uk]
> >> > >Sent: 10 November 2011 17:04
> >> > >To: go-essp-tech at ucar.edu
> >> > >Cc: Juckes, Martin (STFC,RAL,RALSP); taylor13 at llnl.gov
> >> > >Subject: Re: [Go-essp-tech] Replica support in Gateway 2.0
> >> > >
> >> > >
> >> > >Hi Martin
> >> > >
> >> > >> Unfortunately not. We can say "please don't touch the files once
> >> > >> you've published" to the data node managers, but we can't tell the
> >> > >> users that this is a guarantee. The checksums in the catalogue do not
> >> > >> guarantee that the underlying files have not been changed.
> >> > >
> >> > >Well that's true, but isn't that the same use case as a user who
> >> > >downloads data and doesn't use the checksums? If she/he did use them
> >> > >in this instance, they would find a problem ... (unfortunately they'd
> >> > >probably carry on trying to download the data until the checksums
> >> > >matched ... but they might eventually give up and download a replica
> >> > >where the checksums did match).
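A minimal sketch of the behaviour described above: the user verifies each download against the catalogue checksum and, on a mismatch, moves on to another replica instead of retrying the same copy. It assumes MD5 checksums and a caller-supplied `fetch` function; `file_checksum` and `fetch_verified` are hypothetical names, not part of any ESGF tool.

```python
import hashlib
from typing import Callable, Iterable, Optional, Tuple


def file_checksum(data: bytes, algorithm: str = "md5") -> str:
    """Hex digest of a file's bytes; MD5 is an assumed default."""
    return hashlib.new(algorithm, data).hexdigest()


def fetch_verified(
    urls: Iterable[str],
    expected: str,
    fetch: Callable[[str], bytes],
    algorithm: str = "md5",
) -> Optional[Tuple[str, bytes]]:
    """Return (url, data) for the first copy whose checksum matches the
    catalogue value, or None if no copy matches."""
    for url in urls:
        data = fetch(url)
        if file_checksum(data, algorithm) == expected.lower():
            return url, data
        # Mismatch: this copy may have changed since it was checksummed,
        # so try the next replica rather than downloading this one again.
    return None


if __name__ == "__main__":
    # Hypothetical in-memory "data nodes" standing in for real downloads.
    nodes = {
        "http://original.example/tas.nc": b"tas (modified after publication)",
        "http://replica.example/tas.nc": b"tas",
    }
    print(fetch_verified(nodes, file_checksum(b"tas"), nodes.__getitem__))
```

Here the original copy fails verification and the replica is returned, which is exactly the "eventually give up and download a replica" path, done deliberately rather than by accident.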
> >> > >
> >> > >If the originator changed the checksums, then we'd know it was a
> >> > >different version, although that "knowing" might require a QC tool to
> >> > >regularly check that all replicas (and the original) still share the
> >> > >same checksums.
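The periodic QC check suggested above could look roughly like the following sketch. It assumes each node can report a freshly recomputed {filename: checksum} listing for its copy; the input shape and the name `inconsistent_files` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set


def inconsistent_files(
    copies: Dict[str, Dict[str, str]]
) -> Dict[str, Dict[str, Set[str]]]:
    """Find files whose checksums disagree across the original and replicas.

    `copies` maps a copy name (original or replica node) to {filename: checksum}.
    The result maps each disagreeing filename to {checksum: set of copy names};
    a file absent from a copy is reported under the pseudo-checksum "MISSING".
    """
    all_files: Set[str] = set()
    for listing in copies.values():
        all_files.update(listing)

    report: Dict[str, Dict[str, Set[str]]] = {}
    for fname in sorted(all_files):
        seen: Dict[str, Set[str]] = defaultdict(set)
        for copy_name, listing in copies.items():
            checksum = listing.get(fname)
            key = checksum.lower() if checksum is not None else "MISSING"
            seen[key].add(copy_name)
        if len(seen) > 1:  # more than one distinct value: the copies have diverged
            report[fname] = dict(seen)
    return report
```

An empty report means every copy still agrees; any entry points at the file, and the nodes, that need attention.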
> >> > >
> >> > >Bottom line: anyone who doesn't use the checksums deserves to get
> >> > >burnt!
> >> > >
> >> > >Bryan
> >> > >
> >> > >
> >> > >> Cheers,
> >> > >> Martin
> >> > >>
> >> > >> From: go-essp-tech-bounces at ucar.edu [mailto:go-essp-tech-bounces at ucar.edu] On Behalf Of Karl Taylor
> >> > >> Sent: 10 November 2011 16:52
> >> > >> To: go-essp-tech at ucar.edu
> >> > >> Subject: Re: [Go-essp-tech] Replica support in Gateway 2.0
> >> > >>
> >> > >> Dear all,
> >> > >>
> >> > >> If we guard against "replicas" appearing in ESG that have a
> >> > >> different checksum from the original, then the same QC status should
> >> > >> apply to all replicas and it shouldn't matter which one we point to as
> >> > >> having passed a given QC level.
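Karl's guard could be applied at the point a replica is published: refuse it, or at least withhold the original's QC status, unless its file list and checksums match the original exactly. A sketch under those assumptions; the function names and the convention that 0 means "no inherited QC level" are hypothetical.

```python
from typing import Dict, List


def replica_problems(original: Dict[str, str], replica: Dict[str, str]) -> List[str]:
    """List reasons a replica should not be treated as a copy of the original.

    Both arguments map filename -> checksum; comparison ignores hex case.
    """
    problems: List[str] = []
    for name in sorted(original.keys() - replica.keys()):
        problems.append(f"missing: {name}")
    for name in sorted(replica.keys() - original.keys()):
        problems.append(f"extra: {name}")
    for name in sorted(original.keys() & replica.keys()):
        if original[name].lower() != replica[name].lower():
            problems.append(f"checksum mismatch: {name}")
    return problems


def replica_qc_level(
    original: Dict[str, str], replica: Dict[str, str], original_qc: int
) -> int:
    """A replica inherits the original's QC level only if it matches file for file."""
    return original_qc if not replica_problems(original, replica) else 0
```

If every published replica has passed this check, the QC level can be attached to the dataset rather than to any one copy, which is the point of Karl's argument.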
> >> > >>
> >> > >> This seems to be another argument to add checksums to all datasets
> >> > >> as soon as we can. Once this happens, wouldn't that mean that the
> >> > >> current gateways are adequate?
> >> > >>
> >> > >> cheers,
> >> > >> Karl
> >> > >>
> >> > >> On 11/10/11 7:15 AM, Bryan Lawrence wrote:
> >> > >>
> >> > >> Martin has been quite vociferous (quite rightly) in personal email
> >> > >> to me, arguing that as far as QC goes, the dataset which gets through
> >> > >> QC2 will *not* be the original dataset; we have no control over the
> >> > >> original dataset's permanence and/or immutability.
> >> > >>
> >> > >> This raises some interesting issues about the role of ESGF ... and
> >> > >> its interaction with the data owner and the publication process,
> >> > >> which is governed by DKRZ as the Publisher (and in the future probably
> >> > >> multiple publication processes and multiple Publishers). The correct
> >> > >> analogy here, as I said in an earlier email today, is to consider the
> >> > >> original dataset as a preprint of a Published dataset (at QC level 3).
> >> > >>
> >> > >> Incidentally, this distinction might offer us a possible (distinct)
> >> > >> future for two different types of gateways into ESGF: the Published
> >> > >> datasets view (which makes the QC'd copy pre-eminent) and the
> >> > >> published view (which makes pre-eminent whatever someone sticks on a
> >> > >> data node).
> >> > >>
> >> > >> But meanwhile, I think we can live with what you proposed, as long as
> >> > >> the QC status of the replicas is clearly visible and the DOI points
> >> > >> to a landing page that somehow prioritises those versions, which
> >> > >> would be trivial if your page was organised in the same way
> >> > >> (prioritising the replicas at QC level 3, then replicas at QC level 2,
> >> > >> and then originals).
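The landing-page ordering proposed above (QC level 3 replicas, then QC level 2 replicas, then originals) reduces to a sort on two keys. A sketch, assuming each copy carries a hypothetical `is_original` flag and an integer `qc_level`; replicas at lower QC levels, which the email does not mention, simply fall between the QC level 2 replicas and the originals.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetCopy:
    url: str
    is_original: bool  # hypothetical flag: True for the publisher's original copy
    qc_level: int      # hypothetical integer QC level, e.g. 1, 2 or 3


def landing_page_order(copies: List[DatasetCopy]) -> List[DatasetCopy]:
    """Replicas before originals; within each group, higher QC level first.

    False sorts before True, so replicas (is_original=False) come first;
    negating qc_level puts level 3 ahead of level 2.
    """
    return sorted(copies, key=lambda c: (c.is_original, -c.qc_level))


if __name__ == "__main__":
    copies = [
        DatasetCopy("http://datanode.example/orig", True, 3),
        DatasetCopy("http://replica-a.example/r", False, 2),
        DatasetCopy("http://replica-b.example/r", False, 3),
    ]
    for c in landing_page_order(copies):
        print(c.qc_level, c.url)
```

Because Python's sort is stable, copies that tie on both keys keep whatever order the catalogue returned them in.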
> >> > >>
> >> > >> Cheers
> >> > >>
> >> > >> Bryan
> >> > >>
> >> > >> Hi Stephen,
> >> > >>
> >> > >> On 11/10/2011 05:23 AM, stephen.pascoe at stfc.ac.uk wrote:
> >> > >>
> >> > >> Hi Eric,
> >> > >>
> >> > >> Replicas are beginning to show up in CMIP5 and this is exposing some
> >> > >> gaps in what Gateway 1.x can do. I know you are reimplementing replica
> >> > >> support in Gateway 2.0, so I'd like to raise these issues now.
> >> > >>
> >> > >> We need to be able to publish a replica to the same Gateway that hosts
> >> > >> the original. I can't imagine this being possible with Gateway 1.x
> >> > >> since the URL http://<GATEWAY>/dataset/<dataset-id>.html only points
> >> > >> to one dataset on that Gateway. Either that page needs to link to the
> >> > >> original and all replicas for that dataset or we need separate URLs
> >> > >> for each replica/original, or both.
> >> > >>
> >> > >> The current direction for the implementation would be to have a single
> >> > >> page for the original dataset and have that page list where replicas
> >> > >> are located.
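One way to read Nate's proposal is as a grouping step: every catalogue record for a dataset id collapses onto the single page http://<GATEWAY>/dataset/<dataset-id>.html (the URL pattern from Stephen's message), which then lists the original and each replica location. This would also let a replica be published to the same Gateway as its original without a URL clash. The (dataset_id, data_node, is_replica) record shape and the function name are hypothetical, not the Gateway 2.0 design.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def dataset_pages(
    records: Iterable[Tuple[str, str, bool]], gateway: str
) -> Dict[str, List[str]]:
    """Collapse catalogue records onto one page per dataset id.

    Each record is (dataset_id, data_node, is_replica). The result maps the
    page URL to the locations it lists: original first, then replicas.
    """
    grouped: Dict[str, Dict[str, List[str]]] = defaultdict(
        lambda: {"original": [], "replicas": []}
    )
    for dataset_id, node, is_replica in records:
        grouped[dataset_id]["replicas" if is_replica else "original"].append(node)

    pages: Dict[str, List[str]] = {}
    for dataset_id, found in grouped.items():
        url = f"http://{gateway}/dataset/{dataset_id}.html"
        pages[url] = [f"original: {n}" for n in found["original"]] + [
            f"replica: {n}" for n in sorted(found["replicas"])
        ]
    return pages
```

The alternative Stephen raises (separate URLs per replica) would key pages on (dataset_id, data_node) instead; the grouping above trades that per-copy addressability for a single stable page per dataset.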
> >> > >>
> >> > >> If there are use cases for the other options, we should get those
> >> > >> identified.
> >> > >>
> >> > >> Thanks!
> >> > >>
> >> > >> -Nate
> >> > >>
> >> > >> Is this part of your design for Gateway 2.0's replica support?
> >> > >>
> >> > >> Thanks,
> >> > >>
> >> > >> Stephen.
> >> > >>
> >> > >> ---
> >> > >> Stephen Pascoe +44 (0)1235 445980
> >> > >> Centre for Environmental Data Archival
> >> > >> STFC Rutherford Appleton Laboratory, Harwell Oxford, Didcot OX11 0QX, UK
> >> > >>
> >> > >> --
> >> > >> Scanned by iCritical.
> >> > >>
> >> > >> _______________________________________________
> >> > >> GO-ESSP-TECH mailing list
> >> > >> GO-ESSP-TECH at ucar.edu
> >> > >> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech
> >> > >>
> >> > >> --
> >> > >> Bryan Lawrence
> >> > >> University of Reading:  Professor of Weather and Climate Computing.
> >> > >> National Centre for Atmospheric Science: Director of Models and Data.
> >> > >> STFC: Director of the Centre for Environmental Data Archival.
> >> > >> Ph: +44 118 3786507 or 1235 445012; Web:home.badc.rl.ac.uk/lawrence
> >> > >>