<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
Hi Sébastien,<br>
<br>
[&lt;dropping the help-desk link, as this is more of a GO-ESSP
subject&gt;]<br>
<br>
Thanks for the feedback. The replication problem is a beast that was
left to grow alone... so pretty much every institution has a
different procedure that's not being communicated properly. Sadly,
there are many other issues that keep us from getting to the core
of this problem. Nevertheless, and as I said,
this is an ongoing effort and everyone is really doing their best
with the spare time they get after fixing other more urgent
problems.<br>
<br>
In our case replication is done "intelligently", if I may say so
(i.e. only deltas are moved around, everything is kept track
of, etc.), but it is not as automated as we'd like it to be... too many
exceptions, too little time. For instance, and AFAIK, GridFTP is
hardly used anywhere, and very few institutions
provide fast access to their data for data replicators (i.e. when
replicating we go through the same channel as all other users, which slows
things down considerably). We have 2 GridFTP servers and one dedicated to
replication; BADC has pretty much the same setup, although they have
to keep both servers in sync; and PCMDI has some GridFTP, but the last
time I checked (a while ago) only a few datasets were available.
AFAIK no other institutions have those resources (and we three
provide them because of our "commitment" as archives).<br>
<br>
But basically, the obstacle to publishing replicas is what's
holding us back... what's the point of having replicas if they break
functionality? They were meant to help users (not to mention that
archives rely on them), but right now the procedure hinders
them by taking up precious resources (bandwidth, especially at
institutions that don't provide a separate means of accessing data for
replication) while offering nothing in exchange, as we just can't
publish replicas at this time.<br>
<br>
And regarding the write permission, I'd say nothing should be a matter
of trust. We just don't have a proper paradigm in place. <br>
There are truly two different types of "replicas": the archive one,
meant for persistence and LTA (long-term archiving), and the
redundant one, meant for speeding up access and used as a back-up.<br>
The problem is that the first one can be used for redundancy as
well... though they have a very different nature: while the
redundant one is a truly subordinate copy (i.e. it must follow the
"original" one), the archive copy is not; it's a completely new
entity. In programming terms, the redundant one is a pointer with cached
data, while the archive one is a deep copy.<br>
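<br>
To make the pointer versus deep copy comparison concrete, here is a
minimal Python sketch. The class and function names are made up for
illustration (they are not part of any gateway or ESGF software), and the
file name and checksums are placeholders; only the dataset ID and the two
version labels come from this thread.<br>
<pre>
```python
import copy


class Dataset:
    """Hypothetical stand-in for a published dataset (not a real ESGF class)."""

    def __init__(self, dataset_id, version, files):
        self.dataset_id = dataset_id
        self.version = version
        self.files = dict(files)  # file name -> checksum (placeholder values)


class RedundantReplica:
    """Subordinate copy: a reference to the original plus a local cache."""

    def __init__(self, original):
        self.original = original  # the "pointer"
        self.cached_version = original.version
        self.cached_files = dict(original.files)

    def is_stale(self):
        # The replica must follow the original, so any upstream change
        # makes the cached state out of date.
        return self.cached_version != self.original.version

    def refresh(self):
        # Catch up with whatever the original currently holds.
        self.cached_version = self.original.version
        self.cached_files = dict(self.original.files)


def archive_copy(original):
    """Archive copy: an independent entity that no longer follows the original."""
    return copy.deepcopy(original)


original = Dataset(
    "cmip5.output1.IPSL.IPSL-CM5A-LR.piControl.mon.ocean.Omon.r1i1p1",
    "v20110324",
    {"a.nc": "111"},
)
redundant = RedundantReplica(original)
archived = archive_copy(original)

# The data node publishes a new version of the original.
original.version = "v20111010"
original.files["a.nc"] = "222"

print(redundant.is_stale())  # True: the subordinate copy has to catch up
print(archived.version)      # v20110324: the archive keeps what it stored
```
</pre>
The redundant replica notices that it is out of date and has to be
refreshed, while the archive copy is unaffected by whatever happens to the
"original" afterwards.<br>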
<br>
Sorry for the long mail, but I think the community should know about
the current status of replication.<br>
As usual, feedback is more than welcome.<br>
<br>
Thanks,<br>
Estani<br>
<br>
On 23.01.2012 15:40, Sébastien Denvil wrote:
<blockquote cite="mid:4F1D7175.3030801@ipsl.jussieu.fr" type="cite">
Hi Estani,<br>
<br>
On 23/01/2012 12:40, Estanislao Gonzalez wrote:
<blockquote cite="mid:4F1D4746.9030609@dkrz.de" type="cite">
Hi Sébastien,<br>
<br>
This is a known problem with replicas. I'm removing all
replicas from our system (just from the Gateway) hoping this
will get solved. This shouldn't inhibit replication via the BDM,
but it will prevent discovery... Anyway, I was waiting/hoping for
a solution to this, but I see no other option than to withdraw
them. Should be ready soon...<br>
<br>
</blockquote>
<br>
Ok, thanks for letting me know. What about PCMDI
replicated/published datasets?<br>
<br>
<blockquote cite="mid:4F1D4746.9030609@dkrz.de" type="cite">
Regarding your last point:<br>
> Can you confirm that *not* all users authorised to publish
at DKRZ are able to modify this dataset? <br>
<br>
Well, indeed they can. Since there's only one person authorized
to publish to DKRZ (me) and only one person able to change
that authorization (myself), I don't think this is a problem... </blockquote>
<br>
Ok, that was my assumption. If it's only you then no problem. In
the future it could be that other users can publish to the DKRZ
gateway. At that point it would be good to change the permissions (just
to avoid mistakes).<br>
<br>
<blockquote cite="mid:4F1D4746.9030609@dkrz.de" type="cite">I
guess your question is more about why I am able to "write" an IPSL
dataset. Well, please remember that we are talking about
replicas, so I can't alter an IPSL dataset, but I can publish a
replica. Furthermore, I'm even able to publish wrong
information, i.e. another dataset or a corrupt one, and mark it
as a replica of IPSL's. This is something we don't really want to
happen.<br>
</blockquote>
<br>
Again it's a matter of trust. I'm sure you perform all the
necessary checks to avoid publication of corrupted replicas.<br>
<br>
<blockquote cite="mid:4F1D4746.9030609@dkrz.de" type="cite"> On
the other hand, IPSL may remove, alter or do whatever it likes
with the "original", and that's again something archives don't
want, at least not if "our" copy is treated as the "replica".<br>
<br>
</blockquote>
<br>
Up to now we have preserved dataset versions and followed the CMIP5
procedures precisely. The benefit for you is that you have time to
define the best replica/publication strategy.<br>
<br>
<blockquote cite="mid:4F1D4746.9030609@dkrz.de" type="cite"> We do
have a lot to define regarding replicas. This is an ongoing
conversation, so I'll kindly ask our stakeholders to speak their
mind.<br>
</blockquote>
<br>
I believe gateways should expose all dataset versions (especially
the latest one). <br>
<br>
Because replicating/publishing takes time, it would then mean
that the latest version may not have been replicated yet but
should still appear as such in gateways.<br>
<br>
It would also mean that the most important thing the replication software
must achieve is to download only what has changed
(based on checksums when available), following the drslib
strategy of building links when nothing has changed.<br>
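<br>
As a rough sketch of that idea (Python, with hypothetical function names;
it does not use the drslib API, and using hard links is only an assumption
about how the links could be built), a replicator could compare the
checksums of the version it already holds with those of the latest
version, link what is unchanged and download the rest. The file names and
checksums below are placeholders.<br>
<pre>
```python
import os


def plan_replication(previous, latest):
    """Split the latest version's files into links and downloads.

    Both arguments map file name -> checksum. A file is linked when the
    previous version holds the same name with the same checksum; a file
    that changed, is new, or has no checksum available is downloaded.
    """
    to_link, to_download = [], []
    for name, checksum in sorted(latest.items()):
        if checksum is not None and previous.get(name) == checksum:
            to_link.append(name)
        else:
            to_download.append(name)
    return to_link, to_download


def replicate(prev_dir, new_dir, previous, latest, fetch):
    """Build new_dir from hard links into prev_dir plus fresh downloads.

    fetch(name, destination) is a hypothetical callable that transfers
    one file from the remote data node to the given local path.
    """
    os.makedirs(new_dir, exist_ok=True)
    to_link, to_download = plan_replication(previous, latest)
    for name in to_link:
        # Unchanged file: reuse the bytes already on disk.
        os.link(os.path.join(prev_dir, name), os.path.join(new_dir, name))
    for name in to_download:
        fetch(name, os.path.join(new_dir, name))
    return to_link, to_download


# Placeholder listings for an old and a new version of one dataset.
previous = {"thetao_1.nc": "aaa", "so_1.nc": "bbb"}
latest = {"thetao_1.nc": "aaa", "so_1.nc": "ccc", "tos_1.nc": None}

to_link, to_download = plan_replication(previous, latest)
print(to_link)      # ['thetao_1.nc']
print(to_download)  # ['so_1.nc', 'tos_1.nc']
```
</pre>
With these placeholder listings only thetao_1.nc is linked; so_1.nc is
downloaded again because its checksum changed, and tos_1.nc because it is
new and no checksum is available for it.<br>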
<br>
Thanks.<br>
Sébastien<br>
<br>
<blockquote cite="mid:4F1D4746.9030609@dkrz.de" type="cite"> <br>
Thanks,<br>
Estani<br>
<br>
On 23.01.2012 10:33, Sébastien Denvil wrote:
<blockquote cite="mid:4F1D2957.8060905@ipsl.jussieu.fr"
type="cite">Dear all, <br>
<br>
browsing the PCMDI, BADC and DKRZ gateways using the
underlying facets of the following dataset, I observed some strange
behaviour:
cmip5.output1.IPSL.IPSL-CM5A-LR.piControl.mon.ocean.Omon.r1i1p1
<br>
<br>
This dataset has 2 versions, v20110324 and v20111010. <br>
<br>
Only the BADC gateway displays the latest version. The other
two gateways display the old one and never mention the
existence of a new version. I believe this is a major issue
due to replication side effects. <br>
<br>
Because there isn't any "version" facet, it would be important
to make every version of a dataset visible in a homogeneous
way across gateways. <br>
<br>
<a moz-do-not-send="true" class="moz-txt-link-freetext"
href="http://cmip-gw.badc.rl.ac.uk/dataset/cmip5.output1.IPSL.IPSL-CM5A-LR.piControl.mon.ocean.Omon.r1i1p1.html">http://cmip-gw.badc.rl.ac.uk/dataset/cmip5.output1.IPSL.IPSL-CM5A-LR.piControl.mon.ocean.Omon.r1i1p1.html</a>
<br>
<a moz-do-not-send="true" class="moz-txt-link-freetext"
href="http://pcmdi3.llnl.gov/esgcet/dataset/cmip5.output1.IPSL.IPSL-CM5A-LR.piControl.mon.ocean.Omon.r1i1p1.html">http://pcmdi3.llnl.gov/esgcet/dataset/cmip5.output1.IPSL.IPSL-CM5A-LR.piControl.mon.ocean.Omon.r1i1p1.html</a>
<br>
<a moz-do-not-send="true" class="moz-txt-link-freetext"
href="http://ipcc-ar5.dkrz.de/dataset/cmip5.output1.IPSL.IPSL-CM5A-LR.piControl.mon.ocean.Omon.r1i1p1.html">http://ipcc-ar5.dkrz.de/dataset/cmip5.output1.IPSL.IPSL-CM5A-LR.piControl.mon.ocean.Omon.r1i1p1.html</a>
<br>
<br>
Also, selecting the administration tab on the DKRZ gateway, I
can read the following: <br>
Groups authorized for Writing: Users authorized to publish at
DKRZ <br>
Gateway Administrators <br>
<br>
Can you confirm that *not* all users authorised to publish at
DKRZ are able to modify this dataset? <br>
<br>
Regards. <br>
Sébastien <br>
<br>
<br>
<br>
<pre wrap="">_______________________________________________
GO-ESSP-TECH mailing list
<a moz-do-not-send="true" class="moz-txt-link-abbreviated" href="mailto:GO-ESSP-TECH@ucar.edu">GO-ESSP-TECH@ucar.edu</a>
<a moz-do-not-send="true" class="moz-txt-link-freetext" href="http://mailman.ucar.edu/mailman/listinfo/go-essp-tech">http://mailman.ucar.edu/mailman/listinfo/go-essp-tech</a>
</pre>
</blockquote>
<br>
<br>
<pre class="moz-signature" cols="72">--
Estanislao Gonzalez
Max-Planck-Institut für Meteorologie (MPI-M)
Deutsches Klimarechenzentrum (DKRZ) - German Climate Computing Centre
Room 108 - Bundesstrasse 45a, D-20146 Hamburg, Germany
Phone: +49 (40) 46 00 94-126
E-Mail: <a moz-do-not-send="true" class="moz-txt-link-abbreviated" href="mailto:gonzalez@dkrz.de">gonzalez@dkrz.de</a> </pre>
</blockquote>
<br>
<br>
<pre class="moz-signature" cols="72">--
Sébastien Denvil
IPSL, Pôle de modélisation du climat
UPMC, Case 101, 4 place Jussieu,
75252 Paris Cedex 5
Tour 45-55 2ème étage Bureau 209
Tel: 33 1 44 27 21 10
Fax: 33 1 44 27 39 02
</pre>
</blockquote>
<br>
<br>
<pre class="moz-signature" cols="72">--
Estanislao Gonzalez
Max-Planck-Institut für Meteorologie (MPI-M)
Deutsches Klimarechenzentrum (DKRZ) - German Climate Computing Centre
Room 108 - Bundesstrasse 45a, D-20146 Hamburg, Germany
Phone: +49 (40) 46 00 94-126
E-Mail: <a class="moz-txt-link-abbreviated" href="mailto:gonzalez@dkrz.de">gonzalez@dkrz.de</a> </pre>
</body>
</html>