[Wrf-users] same run - different results?

Sam Trahan samtrahan at samtrahan.com
Mon Nov 5 10:04:30 MST 2012


Brian,

You mentioned you were seeing different results for different processor 
configurations.  There are several likely causes we've seen in EMC when 
tracking down HWRF and GFS issues over the years:

1. RRTMG -- the WRF implementation of RRTMG uses a processor-local random
number generator, which will always give different results for different
processor counts.  The only way to prevent this is to use a grid of
gridsquare-local random number generators, such as the one in
frame/bobrand.c (used by one of the SAS implementations in WRF), but that
would require some small changes to RRTMG.

2. OpenMP (SM parallel) -- a lot of OpenMP implementations do not
guarantee the same results even when the same run is repeated on the same
number of processors, because the order of operations changes nearly
randomly from run to run.  Some OpenMP implementations do guarantee
unchanged results if you enable some option while compiling or running
(like -qstrict in IBM XL Fortran).

3. MPI_Reduce (DM parallel) -- the MPI_Reduce call will produce different
results for different processor counts and topologies, since the order of
operations will change.  Not every WRF configuration calls MPI_Reduce,
though.  If you use the same processor count and topology, the results
should not change from run to run.  The only ways to avoid the differences
between processor counts are either to not call MPI_Reduce, or to do the
reduction in 128-bit precision, which is extraordinarily expensive and not
cross-platform (though that is what the GSI does for cross-platform
testing).

4. Bugs -- out-of-bounds accesses or uninitialized variables frequently
pop up and cause these problems.  The Intel compiler is especially good at
detecting these sorts of problems if you compile with "-O0 -debug all
-check all".  It cannot detect every possible case, since there are ways
of (accidentally) fooling the detection, but it does detect a lot of them.
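
To illustrate point 1, here is a minimal Python sketch of the difference
between a processor-local and a gridsquare-local random number generator.
It is not a reproduction of frame/bobrand.c or of anything in RRTMG: the
1-D tile layout, the function names and the seeding scheme are all
assumptions made up for the example.

```python
import random

def tile_bounds(nx, ntiles):
    """Split global indices 0..nx-1 into contiguous tiles (hypothetical layout)."""
    size = -(-nx // ntiles)  # ceiling division
    return [(s, min(s + size, nx)) for s in range(0, nx, size)]

def processor_local(nx, ntiles, seed=1234):
    """One generator per tile: the values depend on how the grid is split."""
    field = []
    for tile, (start, end) in enumerate(tile_bounds(nx, ntiles)):
        rng = random.Random(seed + tile)  # assumed per-processor seed
        field.extend(rng.random() for _ in range(start, end))
    return field

def gridpoint_local(nx, ntiles, seed=1234, step=0):
    """One generator per grid point, seeded from the global index and time step."""
    field = []
    for start, end in tile_bounds(nx, ntiles):
        for i in range(start, end):
            rng = random.Random(f"{seed}:{step}:{i}")  # assumed seeding scheme
            field.append(rng.random())
    return field

# The processor-local field changes with the tile count; the
# gridpoint-local field does not.
print(processor_local(16, 1) == processor_local(16, 4))
print(gridpoint_local(16, 1) == gridpoint_local(16, 4))
```

Because each point's stream depends only on its global index (and the time
step), the decomposition drops out of the result.  The cost is that a
generator has to be seeded or stored per grid point.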

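The order-of-operations effect in points 2 and 3 can be shown in a few
lines of Python.  This is only a sketch: it combines the partial sums in
rank order, whereas a real MPI_Reduce or OpenMP reduction may combine them
in some other order, and math.fsum is used here just as a stand-in for a
more precise accumulation, not as what any MPI library does.

```python
import math

def naive_sum(values):
    """Plain left-to-right floating-point sum."""
    total = 0.0
    for v in values:
        total += v
    return total

def chunked_sum(values, nranks):
    """Sum each rank's chunk locally, then add the partial sums in rank order."""
    size = -(-len(values) // nranks)  # ceiling division
    partials = [naive_sum(values[i:i + size])
                for i in range(0, len(values), size)]
    return naive_sum(partials)

values = [1e16, 1.0, -1e16, 1.0]   # the exact sum is 2.0

print(chunked_sum(values, 1))  # 1.0: the first 1.0 is lost against 1e16
print(chunked_sum(values, 2))  # 0.0: both 1.0 terms are lost
print(math.fsum(values))       # 2.0: correctly rounded sum, order-independent
```

The same four numbers give three different answers depending only on how
the additions are grouped, which is all that changing the processor count
does to a reduction.
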
Sincerely,
Sam Trahan

On Fri, 2 Nov 2012, Brian Jewett wrote:



