[Go-essp-tech] Replica support in Gateway 2.0

Estanislao Gonzalez gonzalez at dkrz.de
Thu Nov 10 15:38:52 MST 2011


Hi Martin,

I think I wasn't clear enough. It's not "done" yet, but I think the
"original" copy located at a specific data node (the pre-print) could be
replaced by a replicated copy of that data, "copied" to a new data node
(so I don't use the word publish again) and used to create a new set of
TDS catalogs marked as "originals" and not as replicas (the master
Gateway knows nothing about this; the catalogs are ready to be ingested
and should match the original ones). After this, the master Gateway
swaps the "original" copy pointing at the pre-print node for the one on
the new data node. The pre-print is then no longer required and might
even be deleted to free some space. (Reusing that copy could be trickier
and would require changes to the system.)

Completely removing the pre-print might not be required, depending on
the uniqueness constraint on the dataset id, as asked in my previous
mail. Swapping involves altering the URLs in the Gateway DB, nothing
more.
The data published is not the one held by the author, but the one held
by the publishers (archives).
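A minimal sketch of what that URL swap could look like, assuming a hypothetical single table `access_url(dataset_id, url)` (the real Gateway DB schema is not described in this thread) and using SQLite only as a stand-in database. The dataset id and host names are made up for illustration:

```python
import sqlite3

# Hypothetical schema: the real Gateway DB layout is not described in this
# thread. One row per file access URL, keyed by dataset id.
SCHEMA = """
CREATE TABLE access_url (
    dataset_id TEXT NOT NULL,
    url        TEXT NOT NULL
)
"""


def swap_data_node(conn, dataset_id, old_prefix, new_prefix):
    """Repoint every access URL of dataset_id from old_prefix to new_prefix.

    Only the URLs change; the dataset id stays the same, so the Gateway
    keeps showing a single "original" for this dataset. Returns the number
    of rows updated.
    """
    with conn:  # commits on success, rolls back if the UPDATE fails
        cur = conn.execute(
            "UPDATE access_url "
            "SET url = ? || substr(url, ?) "
            "WHERE dataset_id = ? AND substr(url, 1, ?) = ?",
            (new_prefix, len(old_prefix) + 1,
             dataset_id, len(old_prefix), old_prefix),
        )
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO access_url VALUES (?, ?)",
        [
            ("example.dataset.v1", "http://preprint.example.org/thredds/a.nc"),
            ("example.dataset.v1", "http://preprint.example.org/thredds/b.nc"),
        ],
    )
    changed = swap_data_node(conn, "example.dataset.v1",
                             "http://preprint.example.org/",
                             "http://archive.example.org/")
    print(changed)  # 2
```

Replacing only the URL prefix keeps the paths below the data node root unchanged, which assumes the new node mirrors the pre-print's layout (consistent with the new catalogs matching the original ones).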

But never mind, it was just a possible solution.
Thanks,
Estani

On 10.11.2011 08:25, martin.juckes at stfc.ac.uk wrote:
> Hi Estani,
>
> You missed the start -- the bit which is not achievable is publishing
> a replica to the same gateway used for the original publication of
> that data, e.g. IPSL data published to BADC.
>
> Cheers,
> Martin
>
>> >-----Original Message-----
>> >From: go-essp-tech-bounces at ucar.edu
>> >[mailto:go-essp-tech-bounces at ucar.edu] On Behalf Of Estanislao Gonzalez
>> >Sent: 10 November 2011 16:20
>> >To: go-essp-tech at ucar.edu; esg-gateway-dev at earthsystemgrid.org
>> >Subject: Re: [Go-essp-tech] Replica support in Gateway 2.0
>> >
>> >Hi,
>> >
>> >this analogy seems perfect. Now, regarding the options we have at
>> >the moment:
>> >1) How "unique" is the dataset id in a Gateway? Unique across the
>> >Federation, the local Gateway, or the project?
>> >2) Depending on (1), the procedure could involve "moving" the
>> >published data to some other project, Gateway, or Federation :-)
>> >
>> >I think this could be achievable:
>> >1) Data gets replicated to some Gateway (redundancy enforced).
>> >2) The originating Gateway, if it's also replicating, should
>> >replicate (just data, no publication yet) from the QC-checked replica.
>> >3) The "pre-print" gets removed (which means either moving it to a
>> >different project, Gateway, etc., or completely deleting it from the
>> >Gateway).
>> >4) The replica gets published.
>> >
>> >I might be omitting something, but it seems achievable right now.
>> >
>> >My 2c,
>> >Estani
>> >
>> >On 10.11.2011 07:15, Bryan Lawrence wrote:
>> >> Martin has been quite vociferous (quite rightly) in pointing out to
>> >> me in personal email that, as far as QC goes, the dataset which gets
>> >> through QC2 will *not* be the original dataset - we have no control
>> >> over the original dataset's permanence and/or immutability.
>> >>
>> >> This raises some interesting issues about the role of ESGF ... and
>> >> its interaction with the data owner and the publication process,
>> >> which is governed by DKRZ as the Publisher (and in the future
>> >> probably multiple publication processes and multiple Publishers).
>> >> The correct analogy here, as I said in an earlier email today, is
>> >> to consider the original dataset as a preprint of a Published
>> >> dataset (at QC level 3).
>> >>
>> >> Incidentally, this distinction might offer us a possible (distinct)
>> >> future for two different types of gateways into ESGF: the Published
>> >> datasets view (which makes the QC'd copy pre-eminent) and the
>> >> published view (which makes pre-eminent whatever someone sticks on
>> >> a data node).
>> >>
>> >> But meanwhile, I think we can live with what you proposed, as long
>> >> as the QC status of the replicas is clearly visible - and the DOI
>> >> points to a landing page that somehow prioritises those versions,
>> >> which would be trivial if your page was organised in the same way
>> >> (prioritising the replicas of QC level 3, then the replicas of QC
>> >> level 2, and then the originals).
>> >>
>> >> Cheers
>> >> Bryan
>> >>
>> >>
>> >>> Hi Stephen,
>> >>>
>> >>> On 11/10/2011 05:23 AM, stephen.pascoe at stfc.ac.uk wrote:
>> >>>> Hi Eric,
>> >>>>
>> >>>> Replicas are beginning to show up in CMIP5 and this is exposing
>> >>>> some gaps in what Gateway 1.x can do. I know you are
>> >>>> reimplementing replica support in Gateway 2.0, so I'd like to
>> >>>> raise these issues now.
>> >>>>
>> >>>> We need to be able to publish a replica to the same Gateway that
>> >>>> hosts the original. I can't imagine this being possible with
>> >>>> Gateway 1.x, since the URL http://<GATEWAY>/dataset/<dataset-id>.html
>> >>>> only points to one dataset on that Gateway. Either that page needs
>> >>>> to link to the original and all replicas for that dataset, or we
>> >>>> need separate URLs for each replica/original, or both.
>> >>> The current direction for the implementation would be to have one
>> >>> page for the original dataset and have that page list where the
>> >>> replicas are located.
>> >>>
>> >>> If there are use cases for the other options, we should get those
>> >>> identified.
>> >>>
>> >>> Thanks!
>> >>> -Nate
>> >>>
>> >>>
>> >>>> Is this part of your design for Gateway 2.0's replica support?
>> >>>>
>> >>>> Thanks,
>> >>>>
>> >>>> Stephen.
>> >>>>
>> >>>> ---
>> >>>>
>> >>>> Stephen Pascoe +44 (0)1235 445980
>> >>>>
>> >>>> Centre for Environmental Data Archival
>> >>>>
>> >>>> STFC Rutherford Appleton Laboratory, Harwell Oxford, Didcot OX11 0QX, UK
>> >>>>
>> >>>>
>> >>>> _______________________________________________
>> >>>> GO-ESSP-TECH mailing list
>> >>>> GO-ESSP-TECH at ucar.edu
>> >>>> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech
>> >>>
>> >> --
>> >> Bryan Lawrence
>> >> University of Reading:  Professor of Weather and Climate Computing.
>> >> National Centre for Atmospheric Science: Director of Models and Data.
>> >> STFC: Director of the Centre for Environmental Data Archival.
>> >> Ph: +44 118 3786507 or 1235 445012; Web:home.badc.rl.ac.uk/lawrence
>> >
>> >
>> >--
>> >Estanislao Gonzalez
>> >
>> >Max-Planck-Institut für Meteorologie (MPI-M)
>> >Deutsches Klimarechenzentrum (DKRZ) - German Climate Computing Centre
>> >Room 108 - Bundesstrasse 45a, D-20146 Hamburg, Germany
>> >
>> >Phone:   +49 (40) 46 00 94-126
>> >E-Mail:  gonzalez at dkrz.de
>> >
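Bryan's suggested landing-page ordering (QC level 3 replicas first, then QC level 2 replicas, then originals) fits Nate's plan of one page per original dataset that lists where its replicas are located: the page only needs a sort key over the copies it lists. A minimal sketch in Python; `DatasetCopy` and its fields are hypothetical, not Gateway 2.0's actual data model, and since the thread does not say where replicas below QC level 2 belong, they are assumed to go last:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DatasetCopy:
    """One copy of a dataset as a landing page might list it (hypothetical)."""
    data_node: str
    is_replica: bool
    qc_level: Optional[int] = None  # None when no QC level is recorded


def display_rank(copy: DatasetCopy) -> int:
    """Lower rank is shown earlier on the landing page."""
    if copy.is_replica and copy.qc_level == 3:
        return 0
    if copy.is_replica and copy.qc_level == 2:
        return 1
    if not copy.is_replica:
        return 2
    # Assumption: replicas below QC level 2 (or with no level) come last.
    return 3


def order_for_landing_page(copies: List[DatasetCopy]) -> List[DatasetCopy]:
    # sorted() is stable, so copies with the same rank keep their input order.
    return sorted(copies, key=display_rank)


if __name__ == "__main__":
    page = order_for_landing_page([
        DatasetCopy("pre-print.example.org", is_replica=False),
        DatasetCopy("qc2.example.org", is_replica=True, qc_level=2),
        DatasetCopy("qc3.example.org", is_replica=True, qc_level=3),
    ])
    print([c.data_node for c in page])
    # ['qc3.example.org', 'qc2.example.org', 'pre-print.example.org']
```

Keeping the policy in a single rank function means a different view could reorder or filter copies without touching the page itself.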


