[ncl-install] Segmentation fault when compiling NCL from source on Amazon Linux 2 (ARM64)

Dave Allured - NOAA Affiliate dave.allured at noaa.gov
Mon Sep 6 13:51:24 MDT 2021


Here are a few more suggestions, somewhere between trial and error and
deeper debugging.  I don't have anything better than these general
suggestions, sorry.

* Use the debug mode *ncl -x* to further isolate the lower level NCL
statement that triggers the error.

* wrf_contour is actually NCL code, inside
$NCARG_ROOT/lib/ncarg/nclscripts/wrf/WRFUserARW.ncl.   Make your own clone,
and isolate the lower level NCL statement that triggers the error.  You may
be able to bypass the problem with alternative coding, or simply eliminate
a non-essential section, such as logging.

* Rebuild NCAR/NCL with compiler-based debugging features enabled, such as
*-g -O0 -fbacktrace -fcheck=all -ffpe-trap=invalid,zero,overflow*.

* Try the latest NCARG/NCL development version from
https://github.com/NCAR/ncl.  Take the "develop" branch.  There have been
several bug fixes and build improvements since the 6.6.2 release.

* Upgrade your GCC/gfortran version.  There have been improvements in ARM
support.  Check to see what is available in the Extras package for Amazon
Linux.  Consider building your own GCC/gfortran to the latest version,
currently 11.2.  If you switch GCC/gfortran versions, you may also need to
rebuild some of your dependencies.
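
As a rough sketch of the first two suggestions and the compiler check, the
shell fragment below copies WRFUserARW.ncl into a working directory, runs a
script under ncl -x while keeping a log, and compares the installed gfortran
version against a target.  The helper names (clone_wrf_script, trace_ncl,
version_ge), the log file name, and the reliance on "sort -V" from GNU
coreutils are assumptions made for illustration; none of them are part of
NCL.

```shell
#!/bin/sh
# Hypothetical helpers; names and file layout are illustrative assumptions.

# Copy the stock WRFUserARW.ncl into a working directory (default: current
# directory) so it can be edited freely. Prints the path of the copy.
clone_wrf_script() {
    if [ -z "${NCARG_ROOT:-}" ]; then
        echo "NCARG_ROOT is not set" >&2
        return 1
    fi
    src="$NCARG_ROOT/lib/ncarg/nclscripts/wrf/WRFUserARW.ncl"
    dest="${1:-.}/WRFUserARW.ncl"
    cp "$src" "$dest" && echo "$dest"
}

# Run a script with "ncl -x", which echoes NCL commands as they are
# encountered, and keep the output in a log. The tail of the log shows the
# last commands echoed before a crash.
trace_ncl() {
    script="$1"
    log="${2:-ncl_trace.log}"
    ncl -x "$script" > "$log" 2>&1
    status=$?
    tail -n 20 "$log"
    return "$status"
}

# Succeed if version $1 is greater than or equal to version $2.
# Relies on "sort -V" (version sort) from GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example: report whether the installed gfortran is at least 11.2.
# Passing both -dump options yields the full version on old and new GCC.
if command -v gfortran >/dev/null 2>&1; then
    have="$(gfortran -dumpfullversion -dumpversion)"
    if version_ge "$have" 11.2; then
        echo "gfortran $have is at least 11.2"
    else
        echo "gfortran $have is older than 11.2"
    fi
fi
```

A possible workflow is to call clone_wrf_script in the directory that holds
the plotting script, edit the script by hand so that it loads the local copy,
and then run trace_ncl on it.  How a local copy interacts with scripts that
NCL loads on its own is worth verifying on your installation.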


On Mon, Sep 6, 2021 at 5:12 AM <michael.graf at meteoprime.ch> wrote:

> Thanks for the hint. Now (after copying the compiled fontcap files from
> another distribution) I can compile NCL without error messages on Amazon
> Linux 2. However, when I run a script with NCL I’m still receiving a
> ‘Segmentation fault’ message. I’m also compiling with the -g option, but I
> don’t get any additional hints (see below).  I found out that the
> segmentation fault occurs when the function wrf_contour() is called. Other
> things seem to work well: I can read NetCDF-4 files without problems and
> calculate CAPE and other diagnostics. I also managed to install NCL version
> 6.6.2 from EPEL8 on RHEL8 (ARM64, AWS Graviton2) without any problems.
> However, exactly the same issue (segmentation fault) occurred when calling
> wrf_contour(). It seems to me that parts other than the fontcap compilation
> have similar problems. Maybe the problem is related to the AWS Graviton2
> CPU, but it’s also little endian.
>
>
> *****
> > ncl wrf_mucape_cin.ncl
>
> Copyright (C) 1995-2019 - All Rights Reserved
> University Corporation for Atmospheric Research
> NCAR Command Language Version 6.6.2
> The use of this software is governed by a License Agreement.
> See http://www.ncl.ucar.edu/ for more details.
> (0)Working on time: 2021-08-31_00:00:00
> Segmentation fault
>
> *****
> > lscpu
>
> Architecture: aarch64
> Byte Order: Little Endian
> CPU(s): 2
> On-line CPU(s) list: 0,1
> Thread(s) per core: 1
> Core(s) per socket: 2
> Socket(s): 1
> NUMA node(s): 1
> Model: 1
> BogoMIPS: 243.75
> L1d cache: 64K
> L1i cache: 64K
> L2 cache: 1024K
> L3 cache: 32768K
> NUMA node0 CPU(s): 0,1
> Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp
> cpuid asimdrdm lrcpc dcpop asimddp ssbs
>
>
> On Fri, Sep 3, 2021 at 4:39 PM Dave Allured - NOAA Affiliate <
> dave.allured at noaa.gov> wrote:
>
> I think fontc is a standalone program that is used only during the NCL
> build process.  You may be able to sidestep the program issue completely,
> by simply copying over the compiled fontcap files from a different build.
> Look at one of the X86 binary distributions, or a working install on any
> X86 system.  I suspect that the only compatibility issue is endianness of
> 16- and 32-bit integers.  ARM64 and X86 should both be little endian; not
> sure because I lack ARM experience.
>
>
>
>
>
> On Wed, Sep 1, 2021 at 11:37 AM Michael Graf via ncl-install <
> ncl-install at mailman.ucar.edu> wrote:
>
> Dear all,
>
> Thanks for adding me to the NCL mailing list.
>
> I am trying to compile the latest NCL version, 6.6.2, from scratch on Amazon
> Linux 2 (ARM64 architecture). Everything works fine except that a
> segmentation fault occurs when the fontcaps are compiled, or more precisely
> when the fontc binary is processing the fontcaps (see output below). No
> other error occurs. The ncl binary is compiled and it can be started without
> problems, but when I run a plotting script a segmentation fault occurs that
> is probably related to the compilation error in fontcap.
>
> I also compiled a minimal version with as few dependencies as possible (no
> GDAL, HDF5, NetCDF-4 and so on) to rule them out as the cause, but without
> any effect. I have also tried different compiler options at random for the
> compilation in the fontcap folder, but the error always remains the same. I
> suspect that the compiler is causing the problem, but there is no
> alternative on Amazon Linux 2 so far. I'm using gfortran (version 7.3.1)
> and gcc (version 7.3.1), but only Fortran 77 code seems to be compiled here.
>
> It would be great if somebody had a hint on how to overcome this problem.
> Maybe there is another option, so that I don't have to build it from
> scratch. The installation with conda does not work on ARM64.
>
> Best, Michael
>
> ************************************************************************
> Making ./common/src/fontcap
> make[4]: Entering directory `/home/ec2-user/wrf/NCL/ncl_ncarg-6.6.2/common/src/fontcap'
> gfortran -fPIC -fno-second-underscore -fno-range-check -fopenmp  -O   -c cfaamn.f
> gfortran -fPIC -fno-second-underscore -fno-range-check -fopenmp  -O    -c -o cfrdln.o cfrdln.f
> gfortran -fPIC -fno-second-underscore -fno-range-check -fopenmp  -O    -c -o cfwrit.o cfwrit.f
> gfortran -fPIC -fno-second-underscore -fno-range-check -fopenmp  -O    -c -o ffgttk.o ffgttk.f
> gfortran -fPIC -fno-second-underscore -fno-range-check -fopenmp  -O    -c -o ffinfo.o ffinfo.f
> gfortran -fPIC -fno-second-underscore -fno-range-check -fopenmp  -O    -c -o ffphol.o ffphol.f
> gfortran -fPIC -fno-second-underscore -fno-range-check -fopenmp  -O    -c -o ffppkt.o ffppkt.f
> gfortran -fPIC -fno-second-underscore -fno-range-check -fopenmp  -O    -c -o ffprcf.o ffprcf.f
> gfortran -fPIC -fno-second-underscore -fno-range-check -fopenmp  -O    -c -o ffprsa.o ffprsa.f
> gfortran -fPIC -fno-second-underscore -fno-range-check -fopenmp  -O    -c -o fftbkd.o fftbkd.f
> gfortran -fPIC -fno-second-underscore -fno-range-check -fopenmp  -O    -c -o fftkin.o fftkin.f
> gfortran -fPIC -fno-second-underscore -fno-range-check -fopenmp  -O    -c -o sffndc.o sffndc.f
> gfortran -fPIC -fno-second-underscore -fno-range-check -fopenmp  -O    -c -o sfgtin.o sfgtin.f
> gfortran -fPIC -fno-second-underscore -fno-range-check -fopenmp  -O    -c -o sfgtkw.o sfgtkw.f
> gfortran -fPIC -fno-second-underscore -fno-range-check -fopenmp  -O    -c -o sfprcf.o sfprcf.f
> gfortran -fPIC -fno-second-underscore -fno-range-check -fopenmp  -O    -c -o sfskbk.o sfskbk.f
> gfortran -fPIC -fno-second-underscore -fno-range-check -fopenmp  -O    -c -o sftbkd.o sftbkd.f
> gfortran -fPIC -fno-second-underscore -fno-range-check -fopenmp  -O    -o fontc
> cfaamn.o  cfrdln.o  cfwrit.o  ffgttk.o  ffinfo.o  ffphol.o  ffppkt.o  ffprcf.o
> ffprsa.o  fftbkd.o  fftkin.o  sffndc.o  sfgtin.o  sfgtkw.o  sfprcf.o  sfskbk.o
> sftbkd.o -L../../.././common/src/libncarg_c -lncarg_c
> -L/usr/local/ncarg_gdal/lib -L/usr/local/lib
> Processing fontcap font1
>
> Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
>
> Backtrace for this error:
> #0  0x40001dcb99a3
> #1  0x40001dcb888f
> #2  0x40001dc90667
> #3  0x403bdc
> #4  0x403c63
> #5  0x4032af
> #6  0x400efb
> #7  0x401213
> #8  0x40001df5ace3
> #9  0x400d07
> make[4]: *** [font1] Segmentation fault
> make[4]: Leaving directory `/home/ec2-user/wrf/NCL/ncl_ncarg-6.6.2/common/src/fontcap'
>
> ************************************************************************
> gfortran -g -fbacktrace -Wall -fcheck=all      -o fontc cfaamn.o  cfrdln.o
> cfwrit.o  ffgttk.o  ffinfo.o  ffphol.o  ffppkt.o  ffprcf.o ffprsa.o
> fftbkd.o  fftkin.o  sffndc.o
>   sfgtin.o  sfgtkw.o  sfprcf.o  sfskbk.o sftbkd.o
> -L../../.././common/src/libncarg_c -lncarg_c -L/usr/local/ncarg/lib
> -L/usr/local/lib
> Processing fontcap font1
>
> Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
>
> Backtrace for this error:
> #0  0x40001a29595b in ???
> #1  0x40001a29488f in ???
> #2  0x40001a26c667 in ???
> #3  0x40723c in ???
> #4  0x4072e7 in ???
> #5  0x405c97 in sfgtwk_
> at /home/ec2-user/wrf/NCL/ncl_ncarg-6.6.2/common/src/fontcap/sfgtkw.f:95
> #6  0x4061cb in sfprcf_
> at /home/ec2-user/wrf/NCL/ncl_ncarg-6.6.2/common/src/fontcap/sfprcf.f:108
> #7  0x40119b in cfaamn
> at /home/ec2-user/wrf/NCL/ncl_ncarg-6.6.2/common/src/fontcap/cfaamn.f:304
> #8  0x401633 in main
> at /home/ec2-user/wrf/NCL/ncl_ncarg-6.6.2/common/src/fontcap/cfaamn.f:358
> make: *** [font1] Segmentation fault
>
> ************************************************************************
> gfortran -g -fsanitize=address,undefined      -o fontc cfaamn.o  cfrdln.o
> cfwrit.o  ffgttk.o  ffinfo.o  ffphol.o  ffppkt.o  ffprcf.o ffprsa.o
> fftbkd.o  fftkin.o  sffndc.o
>  sfgtin.o  sfgtkw.o  sfprcf.o  sfskbk.o sftbkd.o
> -L../../.././common/src/libncarg_c -lncarg_c -L/usr/local/ncarg/lib
> -L/usr/local/lib
> Processing fontcap font1
> ASAN:DEADLYSIGNAL
> =================================================================
> ==2477==ERROR: AddressSanitizer: SEGV on unknown address 0x100005104df40
> (pc 0x00000040ffd0 bp 0xffffd104daf0 sp 0xffffd104daf0 T0)
> ==2477==The signal is caused by a READ memory access.
>     #0 0x40ffcf in gbyte_
> (/home/ec2-user/wrf/NCL/ncl_ncarg-6.6.2/common/src/fontcap/fontc+0x40ffcf)
>     #1 0x410057 in gbytes_
> (/home/ec2-user/wrf/NCL/ncl_ncarg-6.6.2/common/src/fontcap/fontc+0x410057)
>     #2 0x40deab in sfprcf_
> /home/ec2-user/wrf/NCL/ncl_ncarg-6.6.2/common/src/fontcap/sfprcf.f:117
>     #3 0x40213f in cfaamn
> /home/ec2-user/wrf/NCL/ncl_ncarg-6.6.2/common/src/fontcap/cfaamn.f:304
>     #4 0x402d7b in main
> /home/ec2-user/wrf/NCL/ncl_ncarg-6.6.2/common/src/fontcap/cfaamn.f:358
>     #5 0x40002cbc7ce3 in __libc_start_main (/lib64/libc.so.6+0x1fce3)
>     #6 0x4018a7
> (/home/ec2-user/wrf/NCL/ncl_ncarg-6.6.2/common/src/fontcap/fontc+0x4018a7)
>
> AddressSanitizer can not provide additional info.
> SUMMARY: AddressSanitizer: SEGV
> (/home/ec2-user/wrf/NCL/ncl_ncarg-6.6.2/common/src/fontcap/fontc+0x40ffcf)
> in gbyte_
> ==2477==ABORTING
> make: *** [font1] Error 1
>
> _______________________________________________
> ncl-install mailing list
> List instructions, subscriber options, unsubscribe:
> https://mailman.ucar.edu/mailman/listinfo/ncl-install