[Wrf-users] same run - different results?
Sam Trahan
samtrahan at samtrahan.com
Mon Nov 5 10:04:30 MST 2012
Brian,
You mentioned you were seeing different results for different processor
configurations. There are several likely causes we've seen in EMC when
tracking down HWRF and GFS issues over the years:
1. RRTMG -- the WRF implementation of RRTMG uses a processor-local random
number generator, which will always provide different results for
different processor counts. The only way to prevent this is to use a grid
of gridsquare-local random number generators, such as the one in
frame/bobrand.c, which is used by one of the SAS implementations in WRF.
That would require some small changes to RRTMG.
2. OpenMP (SM parallel) -- a lot of OpenMP implementations do not
guarantee the same results even when the same run is repeated on the same
number of processors, because the order of floating-point operations
changes nearly randomly from run to run. Some OpenMP implementations do
guarantee unchanged results if you enable some option while compiling or
running (like -qstrict in IBM XL Fortran).
3. MPI_Reduce (DM parallel) -- the MPI_Reduce call will produce different
results for different processor counts and topologies, since the order
of operations will change. Not every WRF configuration calls MPI_Reduce,
though. If you use the same processor count and topology, the results
should not change from run to run. The only way to avoid the dependence on
processor count is to either not run MPI_Reduce, or make sure your
MPI_Reduce is done in 128-bit precision, which is extraordinarily
expensive and not cross-platform (though that is what the GSI does for
cross-platform testing).
4. Bugs -- frequently, out-of-bounds accesses or uninitialized variables
will pop up and cause these problems. The Intel compiler is especially
good at detecting these sorts of problems, if you compile with "-O0 -debug
all -check all". It cannot detect every possible case though, since there
are ways of (accidentally) fooling the detection, but it does detect a lot
of them.
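To make items 1 and 3 concrete, here is a small Python sketch. It is not
WRF, RRTMG, or MPI code, and it does not reproduce frame/bobrand.c. All
function names are made up for illustration, and the "ranks" are just
slices of a list. It shows (a) that summing partial sums in a different
grouping changes a floating-point total, (b) that an exact accumulator
(Python's Fraction, used here as a stand-in for a wider and more expensive
reduction) removes that dependence, and (c) that seeding one generator per
grid cell, rather than one per rank, gives the same random field for any
decomposition.

```python
import random
from fractions import Fraction


def split_bounds(n, nprocs):
    """Start/end indices of each hypothetical rank's contiguous slice."""
    return [n * r // nprocs for r in range(nprocs + 1)]


def naive_sum(xs):
    """Plain left-to-right floating-point sum (no compensation)."""
    total = 0.0
    for x in xs:
        total += x
    return total


def chunked_sum(values, nprocs):
    """Mimic a reduce: each rank sums its slice, then partials are added."""
    b = split_bounds(len(values), nprocs)
    partials = [naive_sum(values[b[r]:b[r + 1]]) for r in range(nprocs)]
    return naive_sum(partials)


def exact_chunked_sum(values, nprocs):
    """Same grouping, but accumulate exactly; round only once at the end."""
    b = split_bounds(len(values), nprocs)
    partials = [sum(map(Fraction, values[b[r]:b[r + 1]]), Fraction(0))
                for r in range(nprocs)]
    return float(sum(partials, Fraction(0)))


def field_rank_local_rng(ncells, nprocs, seed=1234):
    """One generator per rank: values depend on how cells are divided."""
    b = split_bounds(ncells, nprocs)
    out = []
    for r in range(nprocs):
        rng = random.Random(seed + r)
        out.extend(rng.random() for _ in range(b[r], b[r + 1]))
    return out


def field_cell_local_rng(ncells, nprocs, seed=1234):
    """One generator per cell, seeded from the global cell index."""
    b = split_bounds(ncells, nprocs)
    out = []
    for r in range(nprocs):
        for i in range(b[r], b[r + 1]):
            out.append(random.Random(seed * 1000003 + i).random())
    return out


if __name__ == "__main__":
    vals = [1e16, 1.0, -1e16, 1.0]   # exact sum is 2.0
    for p in (1, 2, 4):
        print(p, chunked_sum(vals, p), exact_chunked_sum(vals, p))
    print(field_rank_local_rng(8, 1) == field_rank_local_rng(8, 2))
    print(field_cell_local_rng(8, 1) == field_cell_local_rng(8, 2))
```

With the four test values, the plain chunked sum gives 1.0 on one "rank"
and 0.0 on two, while the exact version gives 2.0 either way. The same
reordering effect is what item 2 describes for OpenMP threads. A real fix
would live in the model's reduction and random number code; this only
shows why the processor count matters.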
Sincerely,
Sam Trahan
On Fri, 2 Nov 2012, Brian Jewett wrote:
[NON-Text Body part not included]