<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
</head>
<body bgcolor="#FFFFCC" text="#000000">
<br>
<div class="moz-forward-container"><br>
<br>
-------- Original Message --------
<table class="moz-email-headers-table" border="0" cellpadding="0"
cellspacing="0">
<tbody>
<tr>
<th align="RIGHT" nowrap="nowrap" valign="BASELINE">Subject:
</th>
<td>Re: [esgf-devel] Bug reports from IS-ENES2 installation
sprint</td>
</tr>
<tr>
<th align="RIGHT" nowrap="nowrap" valign="BASELINE">Date: </th>
<td>Mon, 11 Nov 2013 18:42:22 -0800</td>
</tr>
<tr>
<th align="RIGHT" nowrap="nowrap" valign="BASELINE">From: </th>
<td>Gavin M. Bell <a class="moz-txt-link-rfc2396E" href="mailto:gavin@llnl.gov">&lt;gavin@llnl.gov&gt;</a></td>
</tr>
<tr>
<th align="RIGHT" nowrap="nowrap" valign="BASELINE">To: </th>
<td>Stephen Pascoe <a class="moz-txt-link-rfc2396E" href="mailto:stephen.pascoe@lirico.co.uk">&lt;stephen.pascoe@lirico.co.uk&gt;</a></td>
</tr>
<tr>
<th align="RIGHT" nowrap="nowrap" valign="BASELINE">CC: </th>
<td><a class="moz-txt-link-abbreviated" href="mailto:esgf-devel@lists.llnl.gov">esgf-devel@lists.llnl.gov</a>, IS-ENES-2 Data-WPs
<a class="moz-txt-link-rfc2396E" href="mailto:is-enes2-data@lists.enes.org">&lt;is-enes2-data@lists.enes.org&gt;</a></td>
</tr>
</tbody>
</table>
<br>
<br>
Hey All, <br>
<br>
Thanks for running the installer through its paces... we are
certainly too resource constrained to test all the configurations
ourselves.<br>
The one we run the most is DATA+INDEX+IDP, and another
additionally with COMPUTE. So I am glad to see the other
permutations getting some exercise. <br>
<br>
Some of these issues have been partially dealt with but are in
need of a more deep tissue massage (#19), others are soon to be
moot (#18), and others are simple oversights (entry 3). Some are
new dependencies brought in by updates to components, like UV-CDAT
needing gfortran (entry 5).<br>
<br>
There are now only a handful of issues left open... some of which
are on the way to being closed. Others... 'the juice is not [yet]
worth the squeeze'. Keep in mind that going forward we are going
to move to a VM(-ish) based solution, where installation at this
level will only ever be seen and done by a handful of folks, so I
caution us to be parsimonious with the effort we may initially
want to marshal to this end.<br>
<br>
(rest of response is interleaved)<br>
<br>
<div class="moz-cite-prefix">On 11/8/13 6:08 AM, Stephen Pascoe
wrote:<br>
</div>
<blockquote
cite="mid:ED9FAF2B-1166-4628-90D5-F8693989CEA6@lirico.co.uk"
type="cite">
<div>European node managers have been meeting in Paris over the
last 3 days to test installation of ESG nodes. Below is a
summary of issues we have found. Where appropriate I have
raised an equivalent ticket on github as indicated below.</div>
<div><br>
</div>
<div>Our judgement is that 1.6 requires some work before it is
safe to upgrade production nodes.</div>
<div><br>
</div>
<div>Speaking purely as BADC, we will try to help resolve some
of these issues by issuing pull requests.</div>
<div><br>
</div>
<div><br>
</div>
<div><b><br>
</b></div>
<div><b>1. Myproxy failing to install correctly [issue #18]</b></div>
<div><br>
</div>
<div>We spent some time diagnosing an apparent failure to
install myproxy. The symptoms included no /etc/init.d/myproxy
being installed and "esg-node generate-globus-key-and-csr"
failing. In the end we discovered that this was because we
had answered "N" when asked to install globus for a second
time. </div>
<div><br>
</div>
<div>The script first installs gridftp and tries to install
myproxy. When an existing globus installation is detected it
asks whether you want to install globus again, defaulting to
"N". If you follow the default, myproxy is not installed.</div>
<div><br>
</div>
<div>We recommend at the minimum the default should be to
install globus. Ideally the installer would only query the
user to install globus once and would know whether to install
gridftp and/or myproxy.</div>
</blockquote>
<br>
[response]<br>
I have paid little attention to the globus script; quite frankly it
is a bit of a mess and will be entirely replaced with an rpm
solution.<br>
That work will be available in v1.7.0. For now, just answer *yes*
to every question presented in the globus realm of the
installation, regardless of the defaults. I worked quite a bit to
make the prompts 'smart' so you can essentially hit [return] all
the way through, but that didn't happen in the globus script. The
idea there is to make it through the FULL globus install by saying
yes to EVERYTHING you are asked and, once you have done it, never
do it again :-) (sort of). This is why the subsequent install
prompts having to do with globus steer you away from re-entering
that install process with default "N" answers.<br>
<br>
Again, this is going away entirely so there is no point in
investing any time with addressing these issues.<br>
<br>
<br>
<blockquote
cite="mid:ED9FAF2B-1166-4628-90D5-F8693989CEA6@lirico.co.uk"
type="cite">
<div><br>
</div>
<div><b>2. THREDDS fails to start with data-only install [issue
#19]</b></div>
<div><b><br>
</b></div>
<div>When not installing all components the directory
/esg/content/thredds is not created and the script fails. The
solution is to create /esg/content/thredds and re-run the
script.</div>
<div><br>
</div>
<div>When not installing the compute component thredds will not
start because it is looking for las_servers.xml. The solution
is to comment out the ipFilter declaration in thredds'
web.xml.</div>
<div><br>
</div>
<div>We recommend the ipFilter declaration in web.xml should
only be included in the compute configuration.</div>
<div><br>
</div>
</blockquote>
<br>
[response]<br>
Luca will relax the filter's behavior when it cannot find the
las_servers.xml files. This was partially addressed in
adedd261ee7404b84cde8b8ec0b8a329dd3109df but indeed it was still
in the context of installing a compute node.<br>
With relaxing the ipFilter this will go away. <b>No need to do
any edits to the Thredds web.xml file</b>. We don't want any
one-off cases of editing that file... it is not meant to be edited
casually (as in during the course of an installation).<br>
<br>
<br>
<blockquote
cite="mid:ED9FAF2B-1166-4628-90D5-F8693989CEA6@lirico.co.uk"
type="cite">
<div><b>3. Recent changes to rainbow prevent installation of git</b></div>
<div><br>
</div>
<div>On Thursday the installer changed the git version, and this
version was not downloadable from rainbow, so clean
installations stopped working. This demonstrates why we need
reproducible installations. In this particular case we should
just depend on the Git RPM.</div>
</blockquote>
<br>
[response]<br>
This push was inadvertent. The source has since been posted to
rainbow; however, the installer has been updated to use git's
distribution server to pull down the source for building. <a
moz-do-not-send="true" class="moz-txt-link-freetext"
href="https://github.com/ESGF/esgf-installer/issues/17">https://github.com/ESGF/esgf-installer/issues/17</a><br>
<br>
<br>
<br>
<blockquote
cite="mid:ED9FAF2B-1166-4628-90D5-F8693989CEA6@lirico.co.uk"
type="cite">
<div><br>
</div>
<div><b>4. Failing to re-install replica shards causes script
failure [issue #20]</b></div>
<div><b><br>
</b></div>
<div>If you accept the default response for "Replica shard entry
for port X is already present. Would you like to install it
again?" the script fails. The default is N; you need Y.</div>
<div><br>
</div>
<div><b>5. Undocumented prerequisites</b></div>
<div><b><br>
</b></div>
<div>gcc-gfortran is now a dependency but is undocumented.</div>
</blockquote>
<br>
[response]<br>
Added the following to the wiki section that describes
pre-requisites: (<a moz-do-not-send="true"
class="moz-txt-link-freetext"
href="https://github.com/ESGF/esgf.github.io/wiki/ESGFNode%7CFAQ">https://github.com/ESGF/esgf.github.io/wiki/ESGFNode%7CFAQ</a>)<br>
<i>NOTE: There are additional prerequisites from the UV-CDAT tool
that is installed as part of the DATA configuration of the
stack. Please see them here: <a moz-do-not-send="true"
class="moz-txt-link-freetext"
href="https://github.com/UV-CDAT/uvcdat/wiki/System-Requirements">https://github.com/UV-CDAT/uvcdat/wiki/System-Requirements</a>,
most notably the need for gfortran. (In newer versions of
UV-CDAT, gfortran is part of the installation procedure.)</i><br>
<br>
<br>
<br>
<blockquote
cite="mid:ED9FAF2B-1166-4628-90D5-F8693989CEA6@lirico.co.uk"
type="cite">
<div><br>
</div>
<div><b>6. Compute configuration fails with "Argument too long"
bash error [issue #21]</b></div>
<div><br>
</div>
<div>This issue has occurred on data/compute and
idp/index/data/compute nodes installed from scratch. Rerunning
the installation does not solve the problem. The
work-around we found is to remove /usr/local/ferret and rerun
the installation. </div>
<div><br>
</div>
<blockquote class="webkit-indent-blockquote" style="margin: 0 0
0 40px; border: none; padding: 0px;">
<div>*******************************</div>
<div>Setting up LAS Product Server...</div>
<div>*******************************</div>
<div><br>
</div>
<div>Getting LAS...</div>
<div>Don't see LAS tar file las-esgf-v8.1.tar.gz Downloading
LAS from las-esgf-v8.1.tar.gz -to->
/usr/local/src/esgf/workbench/esg/ferret/8.1/las-esgf-v8.1.tar.gz</div>
<div>wget -O 'las-esgf-v8.1.tar.gz' '<a moz-do-not-send="true"
href="ftp://ftp.pmel.noaa.gov/pub/las/las-esgf-v8.1.tar.gz">ftp://ftp.pmel.noaa.gov/pub/las/las-esgf-v8.1.tar.gz</a>'</div>
<div>/usr/local/bin/esg-product-server: line 426:
/usr/bin/wget: Argument list too long</div>
<div> ERROR: Could not download LAS:las-esgf-v8.1.tar.gz</div>
</blockquote>
<div><br>
</div>
<div>This would appear to be a low-level bash bug (because the
command being executed definitely only has 3 arguments).</div>
<div>One possible work-around would be to try increasing ulimit
-s, which is 10240 by default. On Linux, the maximum amount of
space for command arguments is 1/4 of the amount of
available stack space.</div>
</blockquote>
<br>
[response]<br>
Zed hit it on the head in his email of 11/8/13 @9:32am PDT. There
are things that we could perhaps do to try to scrub up behind us
regarding the environment, but nothing that comes to mind.<br>
<br>
Hence the solution currently is that you can do DATA+INDEX+IDP in
one pass... and then do COMPUTE in another. There are other
tricks that can be done... but at the moment. <br>
<br>
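A rough way to check the environment theory: on Linux the limit behind
"Argument list too long" covers the command arguments and the exported
environment together, so a three-argument wget can still fail if the
session has accumulated a lot of (or very large) exported variables by
that point. The sketch below is an illustration only and not part of
the installer; the function names are made up for this example and the
byte count is an approximation. It would need to be run from the same
shell session in which the failure occurs.<br>
<pre>
```python
import os


def environment_bytes(env=None):
    """Approximate number of bytes the exported environment occupies."""
    env = os.environ if env is None else env
    # Each entry is handed to a new process as "NAME=value" plus a NUL byte,
    # hence the extra 2 bytes per variable.
    return sum(len(os.fsencode(k)) + len(os.fsencode(v)) + 2
               for k, v in env.items())


def largest_variables(env=None, count=5):
    """Return (name, value length) pairs for the biggest variables."""
    env = os.environ if env is None else env
    ranked = sorted(env.items(), key=lambda kv: len(kv[1]), reverse=True)
    return [(name, len(value)) for name, value in ranked[:count]]


if __name__ == "__main__":
    # SC_ARG_MAX is the system limit for arguments plus environment.
    limit = os.sysconf("SC_ARG_MAX")
    print(f"environment uses about {environment_bytes()} of {limit} bytes")
    for name, size in largest_variables():
        print(f"{size:>8}  {name}")
```
</pre>
If one variable dominates the list, unsetting it before the COMPUTE
pass would be the first thing to try.<br>
<br>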
<blockquote
cite="mid:ED9FAF2B-1166-4628-90D5-F8693989CEA6@lirico.co.uk"
type="cite">
<div><br>
</div>
<div><b>7. Assorted minor issues</b></div>
<div><br>
</div>
<div>
<div>We observed that when Solr shards time out, the logs do
not show which shard failed. Showing this would be very
helpful for diagnosing issues.</div>
</div>
<div>
<div><br>
</div>
<div>If you don't include "--verify" the esgcet/catalog.xml
file is not created.</div>
<div><br>
</div>
<div><br>
</div>
</div>
</blockquote>
[response]<br>
There are a bunch of shard/search related flags to show the state
of shards... If shards fail, which I haven't seen happen thus far,
then they would be seen as timing out locally. The --verify flag
is pretty much there to run the test_* functions that sanity check
that things are running and on the 'right' port. To your point, it
is not quite as useful as initially intended... but still
marginally functional for sanity checking the install. I do not
use that flag much in the wild. Pretty much --install does what
is needed for most occasions... (same goes for --update, which I
believe I took out). The --install flag is idempotent and only
changes things that are out of version range or not present, which
is what you want.<br>
<br>
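To illustrate what "out of version range or not present" means for an
idempotent install step, here is a minimal sketch. It is not the
installer's actual logic: the function names, the dotted-integer
version format and the optional upper bound are all assumptions made
for this example.<br>
<pre>
```python
def parse_version(text):
    """Turn a dotted version such as '1.6.0' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def needs_install(installed, minimum, maximum=None):
    """Decide whether a component has to be (re)installed.

    installed is None when the component is not present at all.
    """
    if installed is None:
        return True
    current = parse_version(installed)
    if parse_version(minimum) > current:
        return True  # older than the accepted range
    if maximum is not None and current > parse_version(maximum):
        return True  # newer than the accepted range
    return False  # already in range: running the step again changes nothing
```
</pre>
With a check like this in front of every step, re-running the install
on an up-to-date node is a no-op, which is the behaviour described
above for --install.<br>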
<br>
P.S.<br>
I see a bunch of work being done on the fork over at badc; from
what I can see there are 12 divergent commits so far. We should
have an install meeting to discuss those plans.<br>
<br>
<pre class="moz-signature" cols="72">--
Senior Computer Scientist / Mathematics Programmer
Gavin M. Bell
Lawrence Livermore National Labs
--
"Never mistake a clear view for a short distance."
         -Paul Saffo
(GPG Key - <a moz-do-not-send="true" class="moz-txt-link-freetext" href="http://rainbow.llnl.gov/dist/keys/gavin.asc">http://rainbow.llnl.gov/dist/keys/gavin.asc</a>)
A796 CE39 9C31 68A4 52A7 1F6B 66B7 B250 21D5 6D3E
</pre>
<br>
</div>
<br>
</body>
</html>