[Wrf-users] Results variability depending on processor count
Jan Ploski
Jan.Ploski at offis.de
Wed Sep 24 08:24:29 MDT 2008
"Gustafson, William I" <william.gustafson at pnl.gov> wrote on 09/19/2008
05:50:49 PM:
> Jan,
>
> This issue is actually a pretty complicated one that is a bit machine and
> compiler dependent.
Bill,
Thanks for responding. Based on my most recent tests, I'm afraid that the
issue is not just machine and compiler dependent. It is also dependent on
the input data. That is, one namelist.input will give bit-identical
results regardless of the processor count used, whereas another won't.
Example:
I have a two-domain nested configuration with 100x151 grid points per
domain. I ran it with 1-8 processors and almost every run produced a
slightly different output after 1 hour of integration. Only the 4
processor and 8 processor runs agreed with each other. Then I changed this
configuration, ceteris paribus, to 300x351 grid points per domain, in
order to execute tests with up to 32 processors (there is a known problem
with applying too many processors to a small domain in WRF 3.0.1.1, which
is why I had to enlarge the domain). Surprisingly, this new configuration
resulted in bit-identical results for all runs with 4-32 processors (I
didn't perform 1-3 processor runs because there was not enough memory).
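To be precise about terms: "bit-identical" means that every output value has the same binary representation in both runs, not merely that the values agree within some tolerance. A minimal Python sketch of such a check follows. The function names `float_bits` and `bit_identical` and the sample values are hypothetical; a real comparison would read the fields from the wrfout files of the two runs.

```python
import struct

def float_bits(x):
    """Return the raw 64-bit pattern of a Python float as an integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def bit_identical(a, b):
    """True only if both sequences have equal length and every pair of
    values has exactly the same bit pattern."""
    return len(a) == len(b) and all(
        float_bits(x) == float_bits(y) for x, y in zip(a, b)
    )

# Made-up stand-ins for one field taken from two runs
run_a = [285.15, 0.1 + 0.2, 1013.25]
run_b = [285.15, 0.3, 1013.25]

print(bit_identical(run_a, run_a))  # True
print(bit_identical(run_a, run_b))  # False: 0.1 + 0.2 is not the same double as 0.3
```

Comparing bit patterns rather than using `==` also distinguishes 0.0 from -0.0, which a tolerance-based comparison would treat as equal.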
This is on an Opteron x86_64 cluster, using the PGI 6.2-5 compiler. WRF was
compiled with the -Kieee option (IEEE-compliant floating point operations)
for the above tests. Without the -Kieee option, I couldn't even observe
agreement between the 4 and 8 processor runs in the small case. In the large
case, there was again perfect agreement among all runs with varying
processor counts, even though the results were different from the ones
obtained with -Kieee.
I'm not concerned about the differences resulting from different compiler
options - fair enough - just about the differences due to different
processor counts, which I cannot explain.
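As general background, one generic way in which the results of a parallel code can depend on the processor count is a change in the order of floating-point additions, because floating-point addition is not associative. The Python sketch below illustrates only this general effect. It is not taken from WRF, and whether WRF performs any such order-dependent reduction is not established here. The function names `ordered_sum` and `partitioned_sum` and the input values are hypothetical.

```python
def ordered_sum(values):
    """Add the values strictly left to right in double precision."""
    total = 0.0
    for v in values:
        total += v
    return total

def partitioned_sum(values, parts):
    """Split `values` into `parts` contiguous chunks (one per hypothetical
    processor), sum each chunk, then add the partial sums in order."""
    n = len(values)
    bounds = [n * i // parts for i in range(parts + 1)]
    partials = [ordered_sum(values[bounds[i]:bounds[i + 1]])
                for i in range(parts)]
    return ordered_sum(partials)

values = [1.0e16, 1.0, -1.0e16, 1.0]

print(partitioned_sum(values, 1))  # 1.0
print(partitioned_sum(values, 2))  # 0.0
```

The same four numbers give 1.0 with one chunk and 0.0 with two, because each 1.0 is lost to rounding when it is added to a value of magnitude 1e16 inside a chunk.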
> In fact, my understanding is that for any of us
> developing code to be released with WRF, we must be able to reproduce the
> wrfout files and get a bit-for-bit match when we change processor counts.
That is what I'd like to achieve, ideally between different clusters that
share the x86_64 architecture. However, getting invariant results under
varying processor counts in a single cluster would be a good first step.
> The reason for the differences are many. The most problematic is bugs in the
> code, e.g. an array not being given a value before being used, which leads
> to random results.
Wouldn't bugs cause easily reproducible deviations? Is it likely that
varying the processor count triggers these kinds of bugs?
> Another possibility is how optimization is done. Each CPU
> has a set of registers that hold values used for calculation and
> intermediary results. These registers typically operate at a higher
> precision than the numbers held in memory. So, when numbers are passed from
> a register to memory and brought back to another register, a small amount of
> precision is lost.
> The implication is that if multiple calculations can be
> done entirely in the registers, one can gain a little accuracy. However, if
> during the same series of calculations one has to use memory space to hold
> an intermediary value, the result could differ at the end. Compilers often
> have options to prevent these differences by forcing round-off error to be
> handled consistently, e.g. with ifort one would add "-fp-model precise".
I think you are referring to the x87 architecture (which has 80-bit
registers and a 64-bit in-memory representation, according to what I have
read). However, as far as I understand, the SSE/SSE2 instruction set used on
Opteron is different and does not suffer from this particular problem. In
any case, I used the -Kieee and -Mnobuiltin options of the PGI compiler
just to be on the safe side (I thought). However, as reported above, the
results still varied.
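The register-versus-memory effect described in the quoted text can be imitated in a few lines of Python. This is only an analogy under stated assumptions: Python has no 80-bit type, so 64-bit doubles stand in for the wide register format and 32-bit floats stand in for the narrower in-memory format. The helper `to_f32` and the variable names are hypothetical.

```python
import struct

def to_f32(x):
    """Round a 64-bit Python float to the nearest 32-bit float and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

a = to_f32(1.0 + 2.0**-12)  # exactly representable in 32 bits
b = a
c = -1.0

# Intermediate product stays in the wide format; one rounding at the end
kept_in_register = to_f32(a * b + c)

# Intermediate product is rounded to the narrow format first ("spilled")
product_spilled = to_f32(a * b)
spilled_to_memory = to_f32(product_spilled + c)

print(kept_in_register == spilled_to_memory)            # False
print(kept_in_register - spilled_to_memory == 2.0**-24) # True
```

The two evaluation paths differ by 2**-24, which is the part of the product that the early rounding discards.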
Right now I tend to believe that there is a bug somewhere in the WRF code,
based on the observations that it *can* produce bit-identical results
regardless of the processor count for *some* namelists (and input data).
Regards,
Jan Ploski
--
Dipl.-Inform. (FH) Jan Ploski
OFFIS
FuE Bereich Energie | R&D Division Energy
Escherweg 2 - 26121 Oldenburg - Germany
Phone/Fax: +49 441 9722 - 184 / 202
E-Mail: Jan.Ploski at offis.de
URL: http://www.offis.de