[Wrf-users] WPSV3 metgrid.exe seg fault GFS/GFDL

Eric_Meyers emeyers3 at atmos.uiuc.edu
Tue Jun 24 09:41:46 MDT 2008


Dear WPS Users:


SUMMARY: 

metgrid.exe segmentation fault when combining GFS and GFDL input AND processing domains whose total number of grid points exceeds some mysterious upper bound.



PROCESS:

After trying to process GFDL input (Vtable.GFDL) for one domain (9-km grid spacing) with WPSV3 and watching metgrid.exe fail for lack of the necessary soil parameters, I now run WPSV3 with two data sources, GFS and GFDL: the latter takes precedence for all duplicate fields, and the former supplies the necessary soil parameters.

First, I run WPSV3 with the GFS input, skipping metgrid.exe (i.e., running only geogrid.exe & ungrib.exe; Vtable.GFS; GFS input):

     &ungrib
      out_format = 'WPS',
      prefix = 'GFS',
     /

in a directory such as ~/WPSV3_GFS1dm/.  This produces, for example:

     GFS:2005-07-08_00
     GFS:2005-07-08_03
     GFS:2005-07-08_06
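For reference, the &share block for this first run might look like the sketch below.  The dates match the GFS files listed above and interval_seconds assumes the 3-hourly spacing shown, but the exact values are my illustration, not copied from my actual namelist.wps:

```
 &share
  wrf_core = 'ARW',
  max_dom = 1,
  start_date = '2005-07-08_00:00:00',
  end_date   = '2005-07-08_06:00:00',
  interval_seconds = 10800,
 /
```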

Second, I run WPSV3 again (this time, geogrid.exe, ungrib.exe, AND metgrid.exe; Vtable.GFDL; GFDL input), in a separate directory, with fg_name specified to have metgrid.exe process the GFS input first (path directed to previous WPSV3 run GFS:* output from ungrib.exe) and the GFDL input second:

     &ungrib
      out_format = 'WPS',
      prefix = 'GFDL',
     /

     &metgrid
      fg_name = '~/WPSV3_GFS1dm/GFS', 'GFDL'
      io_form_metgrid = 2,
     /



RESULTS:

For SMALL DOMAIN SIZES, such as 100x79, metgrid.exe SUCCEEDS in producing, for example,

     met_em.d01.2005-07-08_00:00:00.nc
     met_em.d01.2005-07-08_03:00:00.nc
     met_em.d01.2005-07-08_06:00:00.nc

HOWEVER, when I simply try LARGER DOMAIN SIZES, such as 340x340, metgrid.exe FAILS after processing all of the GFS fields for the initial time, during "Processing SKINTEMP at level 200100.000000" from the GFDL input:

     Processing domain 1 of 1
      Processing 2005-07-08_00
         ~/WPSV3_GFS1dm/GFS
         GFDL
     forrtl: severe (174): SIGSEGV, segmentation fault occurred
     Image              PC                Routine            Line        Source
     metgrid.exe        400000000005DDD0  interp_module_mp_         377  interp_module.f90
     metgrid.exe        4000000000054B30  interp_module_mp_         193  interp_module.f90
     metgrid.exe        4000000000085AC0  interp_module_mp_         751  interp_module.f90
     metgrid.exe        40000000000556B0  interp_module_mp_         203  interp_module.f90
     metgrid.exe        4000000000077600  interp_module_mp_         585  interp_module.f90
     metgrid.exe        40000000000539F0  interp_module_mp_         178  interp_module.f90
     metgrid.exe        400000000008D070  interp_module_mp_         821  interp_module.f90
     metgrid.exe        4000000000052E70  interp_module_mp_         168  interp_module.f90
     metgrid.exe        400000000009C7F0  interp_module_mp_         942  interp_module.f90
     metgrid.exe        4000000000054570  interp_module_mp_         188  interp_module.f90
     metgrid.exe        400000000025AF70  process_domain_mo        1619  process_domain_module.f90
     metgrid.exe        400000000024BA20  process_domain_mo        1518  process_domain_module.f90
     metgrid.exe        4000000000212FC0  process_domain_mo         883  process_domain_module.f90
     metgrid.exe        40000000001D69D0  process_domain_mo         137  process_domain_module.f90
     metgrid.exe        400000000002BC00  MAIN__                     66  metgrid.f90
     metgrid.exe        4000000000003E90  Unknown               Unknown  Unknown
     libc.so.6.1        2000000000138060  Unknown               Unknown  Unknown
     metgrid.exe        40000000000038C0  Unknown               Unknown  Unknown

The topmost frame in the traceback, corresponding to line 377 of interp_module.f90, is inside routine search_extrap:

     if (array(qdata%x,qdata%y,izz) /= msgval .and. mask_array(qdata%x,qdata%y) /= maskval) then

The next frame, corresponding to line 193, is the call to search_extrap:

     interp_sequence = search_extrap(xx, yy, izz, array, start_x, end_x, &...



CHECK:

I compiled with -check all, which should have caught any out-of-bounds access to array() or mask_array().
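As a hypothetical illustration of why bounds checking and a segfault can coexist: per-access checks like -check all catch bad indices, but they do not protect against stack exhaustion from a deep call chain.  The Python sketch below is my own analogy, not WPS code; Python raises RecursionError where Fortran would typically segfault:

```python
import sys

def masked_search(depth=0):
    # Every array access here is bounds-checked (the analogue of -check all),
    # yet a sufficiently deep call chain still exhausts the stack.
    data = [0.0] * 4
    _ = data[3]            # in-bounds, checked access: never the problem
    return masked_search(depth + 1)

sys.setrecursionlimit(10000)
try:
    masked_search()
    outcome = "no failure"
except RecursionError:
    outcome = "stack limit reached without any out-of-bounds access"

print(outcome)
```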

I'm certain the seg fault does not depend on the input time, for I've encountered the same seg fault with various input times.  In addition, the specified domains (e.g., 340x340) never exceed the boundaries of either the GFS or GFDL input.

The fact that smaller domains process successfully demonstrates that my procedure for running WPSV3 with the multiple GFS/GFDL input sources is correct, but there is evidently some limitation on the total number of grid points in the compiled code.



EXPLORATION:

I tried running WPSV3 with more processors, issuing 'unlimit' before running metgrid.exe, and changing optimization levels, but I witnessed the seg fault regardless.

In addition to 100x79, metgrid.exe SUCCEEDED for domain sizes 202x160 and 250x220, but it FAILED (with the above seg fault) for LARGER domains.  I performed two tests: 1) increasing only the x dimension from 250 to the desired 340; 2) increasing only the y dimension from 220 to the desired 340.  These tests were meant to show whether some region of the GFDL input to the East or North refuses to work with the 9-km domain once it grows beyond 250x220, or whether simply exceeding an upper limit on the total number of grid points composing the domain causes the seg fault.  Both 1) and 2) failed, suggesting the latter.
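The arithmetic behind that inference, as a quick sanity check (the 340x220 and 250x340 shapes are my reading of tests 1) and 2) above, with only one dimension increased):

```python
# Total grid points for each tested domain (sizes quoted in the text above).
succeeded = {"100x79": 100 * 79, "202x160": 202 * 160, "250x220": 250 * 220}
failed    = {"340x220": 340 * 220, "250x340": 250 * 340, "340x340": 340 * 340}

largest_ok = max(succeeded.values())   # 55000 points (250x220)
# Every failing domain exceeds the largest succeeding total, consistent
# with a limit on total grid points rather than on either dimension alone.
assert all(n > largest_ok for n in failed.values())
for name, n in sorted(failed.items(), key=lambda kv: kv[1]):
    print(name, n)
```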

HOWEVER, when I processed the GFS input ONLY (i.e., no GFDL; single WPSV3 run - geogrid.exe, ungrib.exe, and metgrid.exe; Vtable.GFS; GFS input; fg_name = 'GFS') for the 340x340 domain, metgrid.exe SUCCEEDED.  So the total-grid-point constraint applies only when GFDL input is combined with GFS input.  In other words, the seg fault is confined to large grid point counts (roughly > 250x220 = 55000) AND combined GFS/GFDL data sources.

Although seemingly redundant, I tried processing the GFS data twice, as if the two data sets were different, to see whether my methodology for processing multiple data sources (even though here the content is identical) was itself producing the seg fault for larger domains.  I followed the same process described in the "PROCESS" section above, but replaced the link to the GFDL input with a link to the GFS input, so that only GFS input would be processed, the second pass using Vtable.GFDL.  In other words, I ran WPSV3 again (geogrid.exe, ungrib.exe, AND metgrid.exe; Vtable.GFDL; GFS input THIS TIME) in a separate directory, with fg_name specified so that metgrid.exe processes the GFS input first (the path pointing to the earlier WPSV3 run's GFS:* ungrib.exe output, in directory ~/WPSV3_GFS1dm/, for example) and the GFS input again second.  metgrid.exe SUCCEEDED for the 340x340 domain.  This proves that the GFDL input in particular, not my methodology for processing multiple data sources, is causing the seg fault.  But why only for large total grid point counts (i.e., roughly > 250x220)?

As another sensitivity test (again with combined GFS/GFDL input, following the "PROCESS" section above, but with the fg_name order reversed), and although not what I intend, since I want GFDL input to take priority over GFS, I changed

     fg_name = '~/WPSV3_GFS1dm/GFS', 'GFDL' 

to

     fg_name = 'GFDL', '~/WPSV3_GFS1dm/GFS'

With the second ordering, the GFDL fields (in addition to the GFS fields, as usual) processed without error for the 340x340 domain.  I find it quite peculiar that merely switching the processing order (from GFS, GFDL to GFDL, GFS, the latter-listed source taking priority in each case) resulted in SUCCESSFUL processing of the GFDL input, but of course not in the manner intended (i.e., the bogus vortex is now "erased" by the subsequently processed, relatively coarse GFS input).



CONCLUSION:

metgrid.exe failure is linked to the use of combined GFS/GFDL input; for some mysterious reason it occurs only when the domain's total grid point count exceeds roughly 250x220 and GFDL takes priority.  The seg fault does not occur when processing GFS input only.

The 340x340 GFDL 9-km domain would benefit my analysis, although its resolution is not strictly necessary for my simulation.  (For now I'm using a 27-km grid, which I could process with the methodology described: despite covering the same area intended for the 9-km domain, it is composed of fewer than 250x220 grid points, roughly 1/9 of 340x340.)  Ideally, I would like to nest a 3-km domain with WPSV3 for enhanced terrain resolution, but that would require even greater nx and ny, certain to produce the seg fault.
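The grid-point comparison behind the 27-km workaround, as a rough back-of-the-envelope check (I approximate the 27-km dimensions as one third of the 9-km ones):

```python
nx = ny = 340                      # desired 9-km domain (fails in metgrid)
fine = nx * ny                     # 115600 total grid points
coarse = (nx // 3) * (ny // 3)     # same area at 27-km spacing: 113 x 113
print(fine, coarse)                # coarse is well under the 55000 that worked
```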

*****If anyone has experience processing GFDL input with WPS and/or can help me resolve this problem, please reply.  More broadly, if there is a way other than WPSV3 nesting of finer-resolution grids to obtain enhanced terrain resolution, please help.*****

THANK YOU!


-- 
--------------------------------------------------------------
Eric C. Meyers
Graduate Research Assistant
University of Illinois at Urbana-Champaign
Department of Atmospheric Sciences
emeyers3 at atmos.uiuc.edu
--------------------------------------------------------------

