[ncl-install] Segmentation fault when compiling NCL from source on Amazon Linux 2 (ARM64)

michael.graf at meteoprime.ch
Tue Sep 7 03:42:47 MDT 2021


Thanks for the helpful suggestions. I started by isolating the segmentation fault and found that it occurs when the function gsn_contour() is called. I then narrowed it down within that function: the fault is triggered in the block shown below, between the two print statements. I don't know exactly what this part does, but it seems to be related to the contour plotting routine. I also ran with ncl -x, but it gave no additional information about the error. The next step will be to compile with compiler-based debugging features enabled.

 

*****

if (is_lb_mode) then
  if(res2.and.isatt(res2,"trGridType")) then
    plot_object = create wksname + "_contour" contourPlotClass wks
      "cnScalarFieldData"     : data_object
      "pmLabelBarDisplayMode" : lb_mode
      "trXTensionF"           : xtension
      "trYTensionF"           : ytension
      "trGridType"            : res2@trGridType
    end create
    delete(res2@trGridType)
  else
    print("START CREATE")
    plot_object = create wksname + "_contour" contourPlotClass wks
      "cnScalarFieldData"     : data_object
      "pmLabelBarDisplayMode" : lb_mode
      "trXTensionF"           : xtension
      "trYTensionF"           : ytension
    end create                ; <-- Segmentation fault
    print("END CREATE")
  end if

 

*****

opts2 = opts
delete_attrs(opts2)                 ; Clean up.
print("RUN gsn_contour()")
;print(wks)
;print(data)
;print(opts2)
cn = gsn_contour(wks,data,opts2)    ; Create the plot. <-- Segmentation fault
print("FINISH gsn_contour()")
_SetMainTitle(nc_file,wks,cn,opts)  ; Set some titles
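
When bisecting like this, it can help to have the shell confirm that a given run really died from SIGSEGV rather than exiting with an ordinary NCL error. A minimal sketch, assuming a POSIX shell on Linux, where death by signal N is reported as exit status 128+N and SIGSEGV is signal 11; segv_check is a hypothetical helper name, not part of NCL:

```shell
#!/bin/sh
# Run a command quietly and report whether it was killed by SIGSEGV.
# Assumption: Linux, where SIGSEGV is signal 11, so the shell sees status 139.
segv_check() {
    "$@" >/dev/null 2>&1
    status=$?
    if [ "$status" -eq 139 ]; then
        echo "SEGV: $*"
    else
        echo "exit $status: $*"
    fi
}

# Harmless demonstration with a command that simply succeeds.
segv_check true
```

Calling segv_check ncl wrf_mucape_cin.ncl, and then the same on a reduced test script, shows whether the reduction still reproduces the crash.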

 

From: Dave Allured - NOAA Affiliate <dave.allured at noaa.gov>
Sent: Monday, 6 September 2021 21:51
To: michael.graf at meteoprime.ch
Cc: ncl-install at mailman.ucar.edu
Subject: Re: [ncl-install] Segmentation fault when compiling NCL from source on Amazon Linux 2 (ARM64)

 

Here are a few more suggestions in between trial and error, and deeper debugging.  I don't have anything better than these general suggestions, sorry.

 

* Use the debug mode ncl -x to further isolate the lower level NCL statement that triggers the error.

 

* wrf_contour is actually NCL code, inside $NCARG_ROOT/lib/ncarg/nclscripts/wrf/WRFUserARW.ncl.   Make your own clone, and isolate the lower level NCL statement that triggers the error.  You may be able to bypass the problem with alternative coding, or simply eliminate a non-essential section, such as logging.

 

* Rebuild NCAR/NCL with compiler-based debugging features enabled, such as -g -O0 -fbacktrace -fcheck=all -ffpe-trap=invalid,zero,overflow.  


* Try the latest NCARG/NCL development version from https://github.com/NCAR/ncl.  Take the "develop" branch.  There have been several bug fixes and build improvements since the 6.6.2 release.

 

* Upgrade your GCC/gfortran version.  There have been improvements in ARM support.  Check to see what is available in the Extras package for Amazon Linux.  Consider building your own GCC/gfortran to the latest version, currently 11.2.  If you switch GCC/gfortran versions, you may also need to rebuild some of your dependencies.
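
A few of these suggestions translate directly into shell commands. The sketch below reports the installed gfortran version and leaves the clone and copy steps commented out, since they need network access and a configured NCARG_ROOT. The names major_of and WRFUserARW_debug.ncl are hypothetical, chosen only for illustration:

```shell
#!/bin/sh
# Print the major component of a dotted version string, e.g. 7.3.1 -> 7.
major_of() {
    printf '%s\n' "${1%%.*}"
}

# Report the installed gfortran version, if there is one on PATH.
if command -v gfortran >/dev/null 2>&1; then
    ver=$(gfortran -dumpversion)
    echo "gfortran $ver (major $(major_of "$ver"))"
else
    echo "gfortran not found on PATH"
fi

# Fetch the development sources (uncomment to run; needs git and network access).
# git clone --branch develop https://github.com/NCAR/ncl.git

# Make a private copy of the WRF script library to instrument with print() calls.
# cp "$NCARG_ROOT/lib/ncarg/nclscripts/wrf/WRFUserARW.ncl" ./WRFUserARW_debug.ncl
```

How the debugging flags from the third suggestion are passed into the NCL build is not shown here, because that depends on the build configuration.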

 

 

On Mon, Sep 6, 2021 at 5:12 AM <michael.graf at meteoprime.ch> wrote:

Thanks for the hint. Now (after copying the font files from another distribution) I can compile NCL without error messages on Amazon Linux 2. However, when I run a script with NCL I still get the message 'Segmentation fault'. I also compiled with the -g option, but that gives no additional hints (see below). I found that the segmentation fault occurs when the function wrf_contour() is called. Everything else seems to work well: I can read NetCDF-4 files without problems and calculate CAPE and other diagnostics. I also installed NCL version 6.6.2 from EPEL8 on RHEL8 (ARM64, AWS Graviton2) without any problems, but exactly the same segmentation fault occurred when calling wrf_contour(). It seems to me that parts other than the fontcap compilation have similar problems. Maybe the problem is related to the AWS Graviton2 CPU, although it is also little endian.

 

*****

> ncl wrf_mucape_cin.ncl

 

Copyright (C) 1995-2019 - All Rights Reserved
University Corporation for Atmospheric Research
NCAR Command Language Version 6.6.2
The use of this software is governed by a License Agreement.
See http://www.ncl.ucar.edu/ for more details.
(0)Working on time: 2021-08-31_00:00:00
Segmentation fault

 

*****

> lscpu

 

Architecture:        aarch64
Byte Order:          Little Endian
CPU(s):              2
On-line CPU(s) list: 0,1
Thread(s) per core:  1
Core(s) per socket:  2
Socket(s):           1
NUMA node(s):        1
Model:               1
BogoMIPS:            243.75
L1d cache:           64K
L1i cache:           64K
L2 cache:            1024K
L3 cache:            32768K
NUMA node0 CPU(s):   0,1
Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs

 

 

On Fri, Sep 3, 2021 at 4:39 PM Dave Allured - NOAA Affiliate <dave.allured at noaa.gov> wrote:

I think fontc is a standalone program that is used only during the NCL build process.  You may be able to sidestep the program issue completely, by simply copying over the compiled fontcap files from a different build.  Look at one of the X86 binary distributions, or a working install on any X86 system.  I suspect that the only compatibility issue is endianness of 16- and 32-bit integers.  ARM64 and X86 should both be little endian; not sure because I lack ARM experience.  
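
The byte order can be checked directly on both machines before copying the compiled fontcap files across. A small sketch using only printf and od, which are assumed to be available as on any POSIX system; host_byte_order is a hypothetical helper name:

```shell
#!/bin/sh
# Print "little" or "big" for the byte order of the machine running the script.
host_byte_order() {
    # Write the two bytes 01 00 and read them back as one unsigned 16-bit
    # integer: a little-endian host reads 1, a big-endian host reads 256.
    v=$(printf '\001\000' | od -A n -t u2 | tr -d ' \n')
    if [ "$v" = "1" ]; then
        echo little
    else
        echo big
    fi
}

host_byte_order
```

On Linux, the "Byte Order" line of lscpu reports the same property.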

 

 

On Wed, Sep 1, 2021 at 11:37 AM Michael Graf via ncl-install <ncl-install at mailman.ucar.edu> wrote:

Dear all, 

Thanks for adding me to the NCL mailing list.

I am trying to compile the latest NCL version, 6.6.2, from scratch on Amazon
Linux 2 (ARM64 architecture). Everything works fine except that a
segmentation fault occurs when the fontcaps are compiled, that is, when
the fontc binary processes the fontcaps (see output below). No other error
occurs. The ncl binary is built and starts without problems,
but when I run a plotting script a segmentation fault occurs that is
probably related to the compilation error in fontcap.

I also compiled a minimal version with as few dependencies as possible (no
GDAL, HDF5, NetCDF-4 and so on) to rule them out as the cause, but this had
no effect. I have also tried various compiler options
for the compilation in the fontcap folder, but the error always remains the
same. I suspect that the compiler is causing the problem, but there is no
alternative on Amazon Linux 2 so far. I'm using gfortran (version 7.3.1) and
gcc (version 7.3.1), but only Fortran 77 code seems to be compiled here.

It would be great if somebody had a hint on how to overcome this problem. Maybe
there is another option, so that I don't have to build it from scratch. The
installation with conda does not work on ARM64.

Best, Michael

************************************************************************
Making ./common/src/fontcap
make[4]: Entering directory
`/home/ec2-user/wrf/NCL/ncl_ncarg-6.6.2/common/src/fontcap'
gfortran -fPIC -fno-second-underscore -fno-range-check -fopenmp  -O   -c
cfaamn.f
gfortran -fPIC -fno-second-underscore -fno-range-check -fopenmp  -O    -c -o
cfrdln.o cfrdln.f
gfortran -fPIC -fno-second-underscore -fno-range-check -fopenmp  -O    -c -o
cfwrit.o cfwrit.f
gfortran -fPIC -fno-second-underscore -fno-range-check -fopenmp  -O    -c -o
ffgttk.o ffgttk.f
gfortran -fPIC -fno-second-underscore -fno-range-check -fopenmp  -O    -c -o
ffinfo.o ffinfo.f
gfortran -fPIC -fno-second-underscore -fno-range-check -fopenmp  -O    -c -o
ffphol.o ffphol.f
gfortran -fPIC -fno-second-underscore -fno-range-check -fopenmp  -O    -c -o
ffppkt.o ffppkt.f
gfortran -fPIC -fno-second-underscore -fno-range-check -fopenmp  -O    -c -o
ffprcf.o ffprcf.f
gfortran -fPIC -fno-second-underscore -fno-range-check -fopenmp  -O    -c -o
ffprsa.o ffprsa.f
gfortran -fPIC -fno-second-underscore -fno-range-check -fopenmp  -O    -c -o
fftbkd.o fftbkd.f
gfortran -fPIC -fno-second-underscore -fno-range-check -fopenmp  -O    -c -o
fftkin.o fftkin.f
gfortran -fPIC -fno-second-underscore -fno-range-check -fopenmp  -O    -c -o
sffndc.o sffndc.f
gfortran -fPIC -fno-second-underscore -fno-range-check -fopenmp  -O    -c -o
sfgtin.o sfgtin.f
gfortran -fPIC -fno-second-underscore -fno-range-check -fopenmp  -O    -c -o
sfgtkw.o sfgtkw.f
gfortran -fPIC -fno-second-underscore -fno-range-check -fopenmp  -O    -c -o
sfprcf.o sfprcf.f
gfortran -fPIC -fno-second-underscore -fno-range-check -fopenmp  -O    -c -o
sfskbk.o sfskbk.f
gfortran -fPIC -fno-second-underscore -fno-range-check -fopenmp  -O    -c -o
sftbkd.o sftbkd.f
gfortran -fPIC -fno-second-underscore -fno-range-check -fopenmp  -O    -o
fontc cfaamn.o  cfrdln.o  cfwrit.o  ffgttk.o  ffinfo.o  ffphol.o  ffppkt.o
ffprcf.o ffprsa.o  fftbkd.o  fftkin.o  sffndc.o  sfgtin.o  sfgtkw.o
sfprcf.o  sfskbk.o sftbkd.o
-L../../.././common/src/libncarg_c -lncarg_c -L/usr/local/ncarg_gdal/lib
-L/usr/local/lib
Processing fontcap font1

Program received signal SIGSEGV: Segmentation fault - invalid memory
reference.

Backtrace for this error:
#0  0x40001dcb99a3
#1  0x40001dcb888f
#2  0x40001dc90667
#3  0x403bdc
#4  0x403c63
#5  0x4032af
#6  0x400efb
#7  0x401213
#8  0x40001df5ace3
#9  0x400d07
make[4]: *** [font1] Segmentation fault
make[4]: Leaving directory
`/home/ec2-user/wrf/NCL/ncl_ncarg-6.6.2/common/src/fontcap'

************************************************************************
gfortran -g -fbacktrace -Wall -fcheck=all      -o fontc cfaamn.o  cfrdln.o
cfwrit.o  ffgttk.o  ffinfo.o  ffphol.o  ffppkt.o  ffprcf.o ffprsa.o
fftbkd.o  fftkin.o  sffndc.o
  sfgtin.o  sfgtkw.o  sfprcf.o  sfskbk.o sftbkd.o
-L../../.././common/src/libncarg_c -lncarg_c -L/usr/local/ncarg/lib
-L/usr/local/lib    
Processing fontcap font1

Program received signal SIGSEGV: Segmentation fault - invalid memory
reference.

Backtrace for this error:
#0  0x40001a29595b in ???
#1  0x40001a29488f in ???
#2  0x40001a26c667 in ???
#3  0x40723c in ???
#4  0x4072e7 in ???
#5  0x405c97 in sfgtwk_
at /home/ec2-user/wrf/NCL/ncl_ncarg-6.6.2/common/src/fontcap/sfgtkw.f:95
#6  0x4061cb in sfprcf_
at /home/ec2-user/wrf/NCL/ncl_ncarg-6.6.2/common/src/fontcap/sfprcf.f:108
#7  0x40119b in cfaamn
at /home/ec2-user/wrf/NCL/ncl_ncarg-6.6.2/common/src/fontcap/cfaamn.f:304
#8  0x401633 in main
at /home/ec2-user/wrf/NCL/ncl_ncarg-6.6.2/common/src/fontcap/cfaamn.f:358
make: *** [font1] Segmentation fault
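
Frames printed as ??? carry no debug information, but addresses that fall inside the fontc executable can sometimes be mapped to function names with addr2line from binutils (assumed to be installed). The sketch below only extracts the addresses from backtrace text. The names frame_addresses and backtrace.txt are hypothetical. Addresses that belong to shared libraries (here presumably the ones starting with 0x40001a) will not resolve against fontc:

```shell
#!/bin/sh
# Read gfortran backtrace text on stdin and print one frame address per line.
# Only lines of the form "#<n>  0x<hex> ..." are matched; the "at file:line"
# continuation lines are skipped.
frame_addresses() {
    sed -n 's/^#[0-9][0-9]*  *\(0x[0-9a-fA-F][0-9a-fA-F]*\).*/\1/p'
}

# Demonstration on two frames copied from the backtrace above.
frame_addresses <<'EOF'
#3  0x40723c in ???
#5  0x405c97 in sfgtwk_
at /home/ec2-user/wrf/NCL/ncl_ncarg-6.6.2/common/src/fontcap/sfgtkw.f:95
EOF

# Possible use against the real binary (not run here):
# frame_addresses < backtrace.txt | xargs addr2line -f -e ./fontc
```

If the unresolved frames come from code that was built without -g, addr2line can still print the function name but not a source line.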

************************************************************************
gfortran -g -fsanitize=address,undefined      -o fontc cfaamn.o  cfrdln.o
cfwrit.o  ffgttk.o  ffinfo.o  ffphol.o  ffppkt.o  ffprcf.o ffprsa.o
fftbkd.o  fftkin.o  sffndc.o 
 sfgtin.o  sfgtkw.o  sfprcf.o  sfskbk.o sftbkd.o
-L../../.././common/src/libncarg_c -lncarg_c -L/usr/local/ncarg/lib
-L/usr/local/lib    
Processing fontcap font1
ASAN:DEADLYSIGNAL
=================================================================
==2477==ERROR: AddressSanitizer: SEGV on unknown address 0x100005104df40 (pc
0x00000040ffd0 bp 0xffffd104daf0 sp 0xffffd104daf0 T0)
==2477==The signal is caused by a READ memory access.
    #0 0x40ffcf in gbyte_
(/home/ec2-user/wrf/NCL/ncl_ncarg-6.6.2/common/src/fontcap/fontc+0x40ffcf)
    #1 0x410057 in gbytes_
(/home/ec2-user/wrf/NCL/ncl_ncarg-6.6.2/common/src/fontcap/fontc+0x410057)
    #2 0x40deab in sfprcf_
/home/ec2-user/wrf/NCL/ncl_ncarg-6.6.2/common/src/fontcap/sfprcf.f:117
    #3 0x40213f in cfaamn
/home/ec2-user/wrf/NCL/ncl_ncarg-6.6.2/common/src/fontcap/cfaamn.f:304
    #4 0x402d7b in main
/home/ec2-user/wrf/NCL/ncl_ncarg-6.6.2/common/src/fontcap/cfaamn.f:358
    #5 0x40002cbc7ce3 in __libc_start_main (/lib64/libc.so.6+0x1fce3)
    #6 0x4018a7
(/home/ec2-user/wrf/NCL/ncl_ncarg-6.6.2/common/src/fontcap/fontc+0x4018a7)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV
(/home/ec2-user/wrf/NCL/ncl_ncarg-6.6.2/common/src/fontcap/fontc+0x40ffcf)
in gbyte_
==2477==ABORTING
make: *** [font1] Error 1

