[Wrf-users] same run - different results?
Jialun Li
jialunl at uci.edu
Wed Nov 7 09:50:15 MST 2012
Hi All,
Are there any WRF users running on SGI UV and/or SGI Altix ICE? Is there any
method to fix the problems that appear when compiling and running the code on
these systems with Intel and Intel MPI on ICE, or Intel and SGI MPT on UV?
I would appreciate any help.
Jialun Li
On 11/5/2012 9:04 AM, Sam Trahan wrote:
> Brian,
>
> You mentioned you were seeing different results for different processor
> configurations. There are several likely causes we've seen in EMC when
> tracking down HWRF and GFS issues over the years:
>
> 1. RRTMG -- the WRF implementation of RRTMG uses a processor-local random
> number generator, which will always provide different results for
> different processor counts. The only way to prevent this is to use a grid
> of gridsquare-local random number generators such as frame/bobrand.c,
> which is used by one of the SAS implementations in WRF, but that would
> require some small changes to RRTMG.
>
> 2. OpenMP (SM parallel) -- a lot of OpenMP implementations do not
> guarantee the same results even with the same execution on the same number
> of processors, because the order of operations changes nearly randomly
> from run to run. Some OpenMP implementations do guarantee unchanged
> results if you enable some option while compiling or running (like
> -qstrict in IBM XL Fortran).
>
> 3. MPI_Reduce (DM parallel) -- the MPI_Reduce call will produce different
> results for different processor counts and topologies, since the order
> of operations will change. Not every WRF configuration calls MPI_Reduce,
> though. If you use the same processor count and topology, the
> results should not change from run to run. The only ways to avoid the
> differences across processor counts are to not run MPI_Reduce at all, or
> to make sure your MPI_Reduce is 128-bit, which is extraordinarily
> expensive and not cross-platform (though that is what the GSI does for
> cross-platform testing).
>
> 4. Bugs -- out-of-bounds accesses and uninitialized variables frequently
> pop up and cause these problems. The Intel compiler is especially
> good at detecting these sorts of problems if you compile with "-O0 -debug
> all -check all". It cannot detect every possible case, since there
> are ways of (accidentally) fooling the detection, but it does catch many
> of them.
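To make item 1 concrete, here is a minimal Python sketch of the difference between a processor-local random stream and a gridsquare-local one. It is not WRF or RRTMG code, and it does not reproduce the algorithm in frame/bobrand.c. The 1-D domain size NX, the helper names (split, processor_local, cell_local) and the seeding scheme are all hypothetical, chosen only to show why seeding by global grid index removes the dependence on the decomposition.

```python
import random

NX = 12  # hypothetical number of grid cells in a toy 1-D domain


def split(nx, nprocs):
    """Return contiguous (start, end) index ranges, one per process."""
    base, extra = divmod(nx, nprocs)
    ranges, start = [], 0
    for p in range(nprocs):
        end = start + base + (1 if p < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


def processor_local(nx, nprocs, seed=1234):
    """One generator per process: each cell's value depends on which
    process owns it and on its position inside that process's patch."""
    field = [0.0] * nx
    for p, (start, end) in enumerate(split(nx, nprocs)):
        rng = random.Random(seed + p)  # assumed per-process seeding
        for i in range(start, end):
            field[i] = rng.random()
    return field


def cell_local(nx, nprocs, seed=1234):
    """One generator per grid cell, seeded from the global cell index:
    the value at cell i no longer depends on the decomposition."""
    field = [0.0] * nx
    for start, end in split(nx, nprocs):
        for i in range(start, end):
            field[i] = random.Random(seed * 1_000_003 + i).random()
    return field


if __name__ == "__main__":
    print(processor_local(NX, 2) == processor_local(NX, 3))
    print(cell_local(NX, 2) == cell_local(NX, 3))
```

In this sketch the per-cell version returns the same field for any number of processes, at the cost of creating one generator per cell. A real implementation would presumably keep per-cell generator state rather than re-seeding on every call.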
>
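The order-of-operations effect behind items 2 and 3 can be reproduced without MPI or OpenMP. The sketch below, in plain Python, splits one list of values into contiguous chunks, sums each chunk, and then adds the partial sums, which is roughly what a reduction over a changing processor count does. The function names and the test values are hypothetical. The exact-arithmetic variant uses fractions.Fraction as a stand-in for a wider accumulator; it is not the 128-bit MPI_Reduce mentioned above and not what WRF or the GSI actually does.

```python
from fractions import Fraction


def naive_sum(xs):
    """Left-to-right floating-point sum with no compensation."""
    total = 0.0
    for x in xs:
        total += x
    return total


def chunks(values, nprocs):
    """Split values into nprocs contiguous chunks, like a 1-D decomposition."""
    n = len(values)
    bounds = [n * p // nprocs for p in range(nprocs + 1)]
    return [values[bounds[p]:bounds[p + 1]] for p in range(nprocs)]


def reduce_float(values, nprocs):
    """Per-chunk partial sums in double precision, then a sum of the partials."""
    return naive_sum([naive_sum(c) for c in chunks(values, nprocs)])


def reduce_exact(values, nprocs):
    """Same reduction with exact rational partial sums, rounded once at the end."""
    partials = [sum((Fraction(x) for x in c), Fraction(0))
                for c in chunks(values, nprocs)]
    return float(sum(partials, Fraction(0)))


# Hypothetical values chosen so that rounding depends on the grouping.
values = [1e16, 1.0, -1e16, 1.0]

if __name__ == "__main__":
    for nprocs in (1, 2, 4):
        print(nprocs, reduce_float(values, nprocs), reduce_exact(values, nprocs))
```

With these values the double-precision reduction gives 1.0 on one chunk and 0.0 on two chunks, while the exact sum is 2.0 for every chunk count. The data are identical in each case; only the grouping of the additions changes.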
> Sincerely,
> Sam Trahan
>
> On Fri, 2 Nov 2012, Brian Jewett wrote:
>
>
> _______________________________________________
> Wrf-users mailing list
> Wrf-users at ucar.edu
> http://mailman.ucar.edu/mailman/listinfo/wrf-users