[Wrf-users] Odd WRF crashes

Dmitry N. Mikushin maemarcus at gmail.com
Sat Aug 25 18:34:56 MDT 2012


Hi Bart,

> Any suggestions for how to track down this error?

Valgrind may help:

1) compile WRF with debug information included and without
optimization, i.e. -g -O0
2) run the MPI app as usual, but with valgrind in the middle: mpirun -np N
valgrind app_name app_args
3) redirect terminal output to a disk file
4) wait for the app to crash
5) open the output and analyze the traces: if debug info is present,
valgrind will show the stack frames annotated with source files and line
numbers, describing exactly how the app crashed.
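
Steps 2-5 could be scripted roughly as below. This is only a sketch: the rank count, the ./wrf.exe name, the vg_logs directory and the RUN switch are my placeholders, not anything from your setup. --log-file with %p (one log per process) and --track-origins=yes are standard valgrind options, but both are optional. By default the script only prints the command it would run; set RUN=1 to actually launch it.

```shell
#!/bin/sh
# Sketch of steps 2-5. NP, APP, LOG_DIR and RUN are placeholder names
# chosen for this example; adjust them to your own run.
NP="${NP:-6}"                  # number of MPI ranks
APP="${APP:-./wrf.exe}"        # executable built with -g -O0
LOG_DIR="${LOG_DIR:-vg_logs}"  # where the per-process valgrind logs go

# Print the command line: valgrind sits between mpirun and the app.
# valgrind expands %p to the process ID, giving one log file per rank.
valgrind_cmd() {
    printf 'mpirun -np %s valgrind --track-origins=yes --log-file=%s/vg.%%p.log %s\n' \
        "$NP" "$LOG_DIR" "$APP"
}

# Show the first invalid access or fatal signal in each log,
# prefixed with its line number inside that log.
summarize_logs() {
    for f in "$LOG_DIR"/vg.*.log; do
        [ -f "$f" ] || continue
        echo "== $f"
        grep -n -m 1 -E 'Invalid (read|write)|Process terminating' "$f"
    done
}

if [ "${RUN:-0}" = "1" ]; then
    mkdir -p "$LOG_DIR"
    # Steps 2-4: run under valgrind, redirect terminal output to a file,
    # and wait for the crash.
    sh -c "$(valgrind_cmd)" > "$LOG_DIR/terminal.out" 2>&1
    # Step 5: point at the first suspicious entry in each log.
    summarize_logs
else
    valgrind_cmd   # dry run: only print what would be executed
fi
```

Expect the run to be much slower than normal under valgrind, so it is worth starting from a restart file close to the crash time if you have one. The line numbers printed by summarize_logs tell you where to start reading in each log; the frames below that line are the trace mentioned in step 5.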

Good luck,
- D.

2012/8/16 Bart Brashers <bbrashers at environcorp.com>:
> WRFv3.4, PGI compilers, WPSv3.4 using ERA-Interim + RTG SST initialization, OpenMPI 1.4.3, on 2x3=6 cores of an AMD 6100 series processor.  Some of my settings:
>
>  max_dom                             = 3,
>  e_we                                = 165, 100, 133,
>  e_sn                                = 129, 100, 100,
>  e_vert                              = 34,   34,  34,
>  dx                                  = 36000, 12000, 4000,
>  mp_physics                          = 10,   10,   10,
>  mp_zero_out                         = 1,
>  mp_zero_out_thresh                  = 1.e-8,
>  ra_lw_physics                       = 4,     4,    4,
>  ra_sw_physics                       = 4,     4,    4,
>  radt                                = 30,   10,    5,
>  sf_sfclay_physics                   = 2,     2,    2,
>  sf_surface_physics                  = 2,     2,    2,
>  sf_urban_physics                    = 0,     0,    0,
>  bl_pbl_physics                      = 2,     2,    2,
>  bldt                                = 0,     0,    0,
>  cu_physics                          = 5,     5,    0,
>  cudt                                = 0,     0,    0,
>  ishallow                            = 0,
>  prec_acc_dt                         = 0.,   0.,   0.,
>
> Many of my 5.5-day inits run OK, but a few here and there are crashing.  Here's an example of the frustrating lack of details:
>
> # tail -20 rsl.error.0002
>   OBS NUDGING FOR IN,J,KTAU,XTIME,IVAR,IPL:  3 10   936   171.14  2  2 rindx=45.0
>  OBS NUDGING: Reading new obs for time window TBACK =    1.902 TFORWD =    3.902 for grid =  3
>  ****** CALL IN4DOB AT KTAU =   954 AND XTIME =     174.13:  NSTA =     134 ******
>  ++++++CALL ERROB AT KTAU =   954 AND INEST =  3:  NSTA =   134 ++++++
>   OBS NUDGING FOR IN,J,KTAU,XTIME,IVAR,IPL:  3 10   954   174.13  3  3 rindx=45.0
>   OBS NUDGING FOR IN,J,KTAU,XTIME,IVAR,IPL:  3 10   954   174.13  4  4 rindx=45.0
>   OBS NUDGING FOR IN,J,KTAU,XTIME,IVAR,IPL:  3 10   954   174.13  1  1 rindx=45.0
>   OBS NUDGING FOR IN,J,KTAU,XTIME,IVAR,IPL:  3 10   954   174.13  2  2 rindx=45.0
>  OBS NUDGING: Reading new obs for time window TBACK =    1.956 TFORWD =    3.956 for grid =  3
>  ****** CALL IN4DOB AT KTAU =   972 AND XTIME =     177.35:  NSTA =     134 ******
>  ++++++CALL ERROB AT KTAU =   972 AND INEST =  3:  NSTA =   134 ++++++
>   OBS NUDGING FOR IN,J,KTAU,XTIME,IVAR,IPL:  3 10   972   177.35  3  3 rindx=45.0
>   OBS NUDGING FOR IN,J,KTAU,XTIME,IVAR,IPL:  3 10   972   177.35  4  4 rindx=45.0
>   OBS NUDGING FOR IN,J,KTAU,XTIME,IVAR,IPL:  3 10   972   177.35  1  1 rindx=45.0
>   OBS NUDGING FOR IN,J,KTAU,XTIME,IVAR,IPL:  3 10   972   177.35  2  2 rindx=45.0
> [compute-0-3:19366] *** Process received signal ***
> [compute-0-3:19366] Signal: Segmentation fault (11)
> [compute-0-3:19366] Signal code:  (128)
> [compute-0-3:19366] Failing at address: (nil)
> [compute-0-3:19366] *** End of error message ***
>
> It's repeatable.  Happens whether I use adaptive time stepping or not.  Happens whether I use OBS nudging or not.
>
> Any suggestions for how to track down this error?
>
> Bart Brashers