[ncl-install] Segmentation fault when compiling NCL from source on Amazon Linux 2 (ARM64)

Dave Allured - NOAA Affiliate dave.allured at noaa.gov
Fri Sep 10 11:29:41 MDT 2021


Michael, thank you for testing those various strategies.  Your reports and
build logs are helping me understand some of the current problems in
NCL/NCARG.

NCL internally uses a lot of Fortran.  Runtime errors such as "index above
upper bound" are detected in Fortran code when *-fcheck=all* is used.  Some
of these are caused by deliberate, old-style array methods that are now
considered unsafe and have better alternatives.  Bounds checking is very
useful for locating segfaults due to Fortran array mismanagement.  However,
it seems that bounds checking has never been seriously applied to
NCL/NCARG.  There may be numerous cases to work through before arriving at
your original problem in that *create* block within *gsn_contour*, called
from *wrf_contour*.  Even then it may not help, because that particular
fault might be in C code rather than Fortran.
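A minimal C sketch of what the bounds check does, for readers less familiar with Fortran. The `f_array` type and `checked_get` helper are hypothetical; the message wording is modeled on the gfortran errors in the build logs. The sketch also shows one common old-style idiom: a dummy argument declared with extent 1 (e.g. `INTEGER ID(1)`) while the caller passes a longer array. Whether the `id` array in *Iftran.f* is declared this way is an assumption, though it would match the "upper bound of 1" message.

```c
#include <stdio.h>

/* Hypothetical model of a Fortran dummy array: 1-based indexing, with the
 * upper bound that the called routine *declares*, which may be smaller
 * than the storage the caller actually passes in. */
typedef struct {
    const int *data;      /* caller-owned storage                  */
    int declared_ubound;  /* upper bound as declared in the callee */
} f_array;

/* Checked read of element i, in the spirit of gfortran -fcheck=bounds.
 * On a violation it prints a message modeled on gfortran's runtime error
 * and leaves *out untouched.  Returns 0 on success, -1 on failure. */
int checked_get(f_array a, int i, const char *name, int *out)
{
    if (i < 1) {
        fprintf(stderr, "Index '%d' of dimension 1 of array '%s' "
                        "below lower bound of 1\n", i, name);
        return -1;
    }
    if (i > a.declared_ubound) {
        fprintf(stderr, "Index '%d' of dimension 1 of array '%s' "
                        "above upper bound of %d\n",
                i, name, a.declared_ubound);
        return -1;
    }
    *out = a.data[i - 1];  /* 1-based Fortran index -> 0-based C offset */
    return 0;
}
```

For example, with `int buf[4] = {10, 20, 30, 40};`, the old-style view `f_array id = {buf, 1};` makes `checked_get(id, 2, "id", &v)` fail with "Index '2' of dimension 1 of array 'id' above upper bound of 1", while `{buf, 4}`, corresponding to a declaration with the true extent such as `ID(N)`, returns 0 and sets `v` to 20. Without the check, the old idiom silently reads `buf[1]`, which is why it appears to work in normal builds.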

The undefined reference errors are usually secondary errors that cascade
after some primary build error.  As such, they are not of much concern
until the primary errors are solved.

For what it's worth, here is a fix for the bounds error in *binput.f*.
This should fix the *graphc* executable and thereby fix the graphcap
section of the full build process.  This worked for me with gfortran 10
and 11 on an x86 Mac.  Original code near the end of *binput.f*:

      DO 1111 II = 1,DUMSIZ
        DUMSPC(II) = 0
 1111 CONTINUE

Change the first line, and insert another line at the end:

      DO 1111 II = 1,DUMSM1
        DUMSPC(II) = 0
 1111 CONTINUE
      ENDDSP = 0
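The fix can be pictured with a small C model. This is a sketch under stated assumptions, not the actual *binput.f* logic: it assumes, as the names suggest, that DUMSM1 = DUMSIZ - 1, and that ENDDSP is stored immediately after DUMSPC (for instance in the same COMMON block), so the original loop zeroed ENDDSP by running one element past the end of DUMSPC. The sizes 326 and 327 come from the runtime error in the quoted build log.

```c
/* Sizes taken from the runtime error in the quoted log: DUMSPC has an
 * upper bound of 326, and the original loop reached index 327.
 * ASSUMPTION: DUMSM1 = DUMSIZ - 1, as the name suggests. */
#define DUMSM1 326
#define DUMSIZ (DUMSM1 + 1)

/* ASSUMED layout, not copied from binput.f: ENDDSP is stored right after
 * DUMSPC, as it could be in a Fortran COMMON block. */
typedef struct {
    int dumspc[DUMSM1];
    int enddsp;
} dumspc_block;

/* Original idea (do NOT use): writing DUMSIZ elements into dumspc lets
 * the last store spill into whatever follows the array, here enddsp.
 * In C that is undefined behavior; in Fortran it is what -fcheck=bounds
 * reports as "Index '327' ... above upper bound of 326".
 *
 *     for (ii = 0; ii < DUMSIZ; ii++) b->dumspc[ii] = 0;
 */

/* Fixed version: clear exactly the DUMSM1 elements of DUMSPC, then clear
 * ENDDSP by name, mirroring the corrected loop plus ENDDSP = 0. */
void reset_block(dumspc_block *b)
{
    for (int ii = 0; ii < DUMSM1; ii++) {
        b->dumspc[ii] = 0;
    }
    b->enddsp = 0;
}
```

The design point is that each variable is cleared through its own name, so the code no longer depends on how the compiler lays out storage, and bounds checking no longer has anything to report.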

I don't have much time to spend on this, but I will try to look at some of
the other problems when possible.


On Fri, Sep 10, 2021 at 6:09 AM <michael.graf at meteoprime.ch> wrote:

> I have now performed the last experiments, without success. It seems that
> compiling NCL on ARM64 architectures (Amazon Graviton 2) is very tricky,
> so I will now use a workaround where WRF and NCL run on two different
> virtual machines with different architectures (ARM64 and X86_64).
> It would be more convenient to have both on the same virtual machine.
> However, at the moment the effort needed for this seems too large.
> Nevertheless, I have attached the make-output files (compiled with GCC
> 11.2) for others as a debugging aid.
>
>
>
> The compilation with the develop branch produced the same errors as with
> the main branch. Afterwards I tried to compile it with the newest GCC
> version, 11.2. Now many *type mismatch* and *rank mismatch* errors occur
> (see attached make-output files). I turned them off with the option
> *-fallow-argument-mismatch*, which was followed by numerous errors of
> type “Error: BOZ literal constant at (1) is neither a data-stmt-constant
> nor an actual argument to INT, REAL, DBLE, or CMPLX intrinsic function”
> that I overrode with the *-fallow-invalid-boz* option, but without
> success. Many “undefined reference” errors remain, and some “Error:
> Operands of binary numeric operator ‘/’ at (1) are INTEGER(4)/BOZ” errors
> also appear.
>
>
>
> *From:* ncl-install <ncl-install-bounces at mailman.ucar.edu> *On Behalf Of*
> Michael Graf via ncl-install
> *Sent:* Thursday, September 9, 2021 08:50
> *To:* 'Dave Allured - NOAA Affiliate' <dave.allured at noaa.gov>
> *Cc:* ncl-install at mailman.ucar.edu
> *Subject:* Re: [ncl-install] Segmentation fault when compiling NCL from
> source on Amazon Linux 2 (ARM64)
>
>
>
> As suggested, I compiled NCL with compiler-based debugging features
> enabled. Now several errors occur (see below for some examples; the zipped
> make-output file is attached). For example, the error ‘*At line 4397 of
> file Iftran.f - Fortran runtime error: Index '2' of dimension 1 of array
> 'id' above upper bound of 1*’ occurs several times, and many ‘*undefined
> reference to `xxx'*’ messages now appear. The ncl binary can no longer be
> created. It’s not clear to me what is now causing these errors, because
> they aren’t present when debugging features are disabled. Maybe you have
> a suggestion.
>
>
>
> My next steps will be to compile the develop branch of NCL and to try the
> newest version of GCC (11.2).
>
>
>
> *****
>
> Processing graphcap adm5
>
> At line 316 of file binput.f
> Fortran runtime error: Index '327' of dimension 1 of array 'dumspc' above
> upper bound of 326
>
> Error termination. Backtrace:
> #0  0x40001b72195b in ???
> #1  0x40001b722893 in ???
> #2  0x40001b722ccb in ???
> #3  0x401983 in binput_
>         at /home/ec2-user/wrf/NCL/ncl_ncarg-6.6.2/common/src/graphcap/binput.f:316
> #4  0x40131f in capchg
>         at /home/ec2-user/wrf/NCL/ncl_ncarg-6.6.2/common/src/graphcap/capchg.f:811
> #5  0x401477 in main
>         at /home/ec2-user/wrf/NCL/ncl_ncarg-6.6.2/common/src/graphcap/capchg.f:839
> make[4]: *** [adm5] Error 2
> make[4]: Leaving directory
> `/home/ec2-user/wrf/NCL/ncl_ncarg-6.6.2/common/src/graphcap'
>
> *****
>
> gcc -g -O0 -ansi -fPIC -fopenmp -std=c99  -O    -o Fsplit Fsplit.o
> -L../../../.././common/src/libncarg_c -lncarg_c -L/usr/local/ncarg/lib
> -L/usr/local/lib
> make[5]: Leaving directory
> `/home/ec2-user/wrf/NCL/ncl_ncarg-6.6.2/ncarg2d/src/libncarg/Iftran'
> Making ./ncarg2d/src/libncarg/areas
> make[5]: Entering directory
> `/home/ec2-user/wrf/NCL/ncl_ncarg-6.6.2/ncarg2d/src/libncarg/areas'
>
> At line 4397 of file Iftran.f
> Fortran runtime error: Index '2' of dimension 1 of array 'id' above upper
> bound of 1
>
> Error termination. Backtrace:
> #0  0x4000104a895b in ???
> #1  0x4000104a9893 in ???
> #2  0x4000104a9ccb in ???
> #3  0x400f1f in xmit_
>         at /home/ec2-user/wrf/NCL/ncl_ncarg-6.6.2/ncarg2d/src/libncarg/Iftran/Iftran.f:4397
> #4  0x4016cb in iftrio_
>         at /home/ec2-user/wrf/NCL/ncl_ncarg-6.6.2/ncarg2d/src/libncarg/Iftran/Iftran.f:2060
> #5  0x403403 in rdcrd_
>         at /home/ec2-user/wrf/NCL/ncl_ncarg-6.6.2/ncarg2d/src/libncarg/Iftran/Iftran.f:3036
> #6  0x40ca6f in getsta_
>         at /home/ec2-user/wrf/NCL/ncl_ncarg-6.6.2/ncarg2d/src/libncarg/Iftran/Iftran.f:1701
> #7  0x4116d7 in iftrax_
>         at /home/ec2-user/wrf/NCL/ncl_ncarg-6.6.2/ncarg2d/src/libncarg/Iftran/Iftran.f:225
> #8  0x413093 in iftran
>         at /home/ec2-user/wrf/NCL/ncl_ncarg-6.6.2/ncarg2d/src/libncarg/Iftran/Iftran.f:89
> #9  0x413213 in main
>         at /home/ec2-user/wrf/NCL/ncl_ncarg-6.6.2/ncarg2d/src/libncarg/Iftran/Iftran.f:99
> make[5]: *** [IftranRun] Error 2
> make[5]: Leaving directory
> `/home/ec2-user/wrf/NCL/ncl_ncarg-6.6.2/ncarg2d/src/libncarg/areas'
>
> *****
>
> make[5]: Entering directory
> `/home/ec2-user/wrf/NCL/ncl_ncarg-6.6.2/ncarg2d/src/bin/ezmapdemo'
> gfortran -g -O0 -fbacktrace -fcheck=all -ffpe-trap=invalid,zero,overflow
> -fPIC -fno-second-underscore -fno-range-check -fopenmp  -O    -o ezmapdemo
> EzmapDemo.o -L../../../.././ncarg2d/src/libncarg -lncarg
> -L../../../.././ncarg2d/src/libncarg_gks -lncarg_gks
> -L../../../.././common/src/libncarg_c -lncarg_c -lcairo -lXrender
> -lfontconfig -lpixman-1 -lfreetype -lexpat -lpng -lz -lbz2 -lpng -lz
> -L/usr/local/ncarg/lib -L/usr/local/lib  -lX11 -lXext
> EzmapDemo.o: In function `colora_':
> /home/ec2-user/wrf/NCL/ncl_ncarg-6.6.2/ncarg2d/src/bin/ezmapdemo/EzmapDemo.f:2968:
> undefined reference to `mapaci_'
> EzmapDemo.o: In function `drawla_':
> /home/ec2-user/wrf/NCL/ncl_ncarg-6.6.2/ncarg2d/src/bin/ezmapdemo/EzmapDemo.f:2984:
> undefined reference to `mapaci_'
> EzmapDemo.o: In function `coninv_':
> /home/ec2-user/wrf/NCL/ncl_ncarg-6.6.2/ncarg2d/src/bin/ezmapdemo/EzmapDemo.f:4360:
> undefined reference to `cpsetr_'
> /home/ec2-user/wrf/NCL/ncl_ncarg-6.6.2/ncarg2d/src/bin/ezmapdemo/EzmapDemo.f:4361:
> undefined reference to `cpsetr_'
> /home/ec2-user/wrf/NCL/ncl_ncarg-6.6.2/ncarg2d/src/bin/ezmapdemo/EzmapDemo.f:4362:
> undefined reference to `cpsetr_'
>

<snip>



> *From:* ncl-install <ncl-install-bounces at mailman.ucar.edu> *On Behalf Of*
> Michael Graf via ncl-install
> *Sent:* Tuesday, September 7, 2021 11:43
> *To:* 'Dave Allured - NOAA Affiliate' <dave.allured at noaa.gov>
> *Cc:* ncl-install at mailman.ucar.edu
> *Subject:* Re: [ncl-install] Segmentation fault when compiling NCL from
> source on Amazon Linux 2 (ARM64)
>
>
>
> Thanks for the helpful suggestions. I started by isolating the
> segmentation fault and found that it occurs when the function
> gsn_contour() is called. Then I checked where in this function the
> segmentation fault is triggered, and found that it is in the block shown
> below, between the two print statements. I don't know exactly what this
> part is doing, but it seems to be related to the contour plotting routine.
> I also used ncl -x but didn't find any additional information about the
> error. The next step will be compiling with compiler-based debugging
> features enabled.
>
>
>
> *****
>
>   if (is_lb_mode) then
>     if(res2.and.isatt(res2,"trGridType")) then
>       plot_object = create wksname + "_contour" contourPlotClass wks
>           "cnScalarFieldData"     : data_object
>           "pmLabelBarDisplayMode" : lb_mode
>           "trXTensionF"           : xtension
>           "trYTensionF"           : ytension
>           "trGridType"            : res2@trGridType
>       end create
>       delete(res2@trGridType)
>     else
>       print("START CREATE")
>       plot_object = create wksname + "_contour" contourPlotClass wks
>           "cnScalarFieldData"     : data_object
>           "pmLabelBarDisplayMode" : lb_mode
>           "trXTensionF"           : xtension
>           "trYTensionF"           : ytension
>       end create                   ; <-- Segmentation fault
>       print("END CREATE")
>     end if
>
> *****
>
>   opts2 = opts
>   delete_attrs(opts2)                 ; Clean up.
>   print("RUN gsn_contour()")
>   ;print(wks)
>   ;print(data)
>   ;print(opts2)
>   cn = gsn_contour(wks,data,opts2)    ; Create the plot.  <-- Segmentation fault
>   print("FINISH gsn_contour()")
>   _SetMainTitle(nc_file,wks,cn,opts)  ; Set some titles
>
> *From:* Dave Allured - NOAA Affiliate <dave.allured at noaa.gov>
> *Sent:* Monday, September 6, 2021 21:51
> *To:* michael.graf at meteoprime.ch
> *Cc:* ncl-install at mailman.ucar.edu
> *Subject:* Re: [ncl-install] Segmentation fault when compiling NCL from
> source on Amazon Linux 2 (ARM64)
>
>
>
> Here are a few more suggestions in between trial and error, and deeper
> debugging.  I don't have anything better than these general suggestions,
> sorry.
>
>
>
> * Use the debug mode *ncl -x* to further isolate the lower level NCL
> statement that triggers the error.
>
>
>
> * wrf_contour is actually NCL code, inside
> $NCARG_ROOT/lib/ncarg/nclscripts/wrf/WRFUserARW.ncl.   Make your own clone,
> and isolate the lower level NCL statement that triggers the error.  You may
> be able to bypass the problem with alternative coding, or simply eliminate
> a non-essential section, such as logging.
>
>
>
> * Rebuild NCARG/NCL with compiler-based debugging features enabled, such as
> *-g -O0 -fbacktrace -fcheck=all -ffpe-trap=invalid,zero,overflow*.
>
>
> * Try the latest NCARG/NCL development version from
> https://github.com/NCAR/ncl.  Take the "develop" branch.  There have been
> several bug fixes and build improvements since the 6.6.2 release.
>
>
>
> * Upgrade your GCC/gfortran version.  There have been improvements in ARM
> support.  Check to see what is available in the Extras package for Amazon
> Linux.  Consider building your own GCC/gfortran to the latest version,
> currently 11.2.  If you switch GCC/gfortran versions, you may also need to
> rebuild some of your dependencies.
>
>
>
>
>
> On Mon, Sep 6, 2021 at 5:12 AM <michael.graf at meteoprime.ch> wrote:
>
> Thanks for the hint. Now (after copying the font files from another
> distribution) I can compile NCL without error messages on Amazon Linux 2.
> However, when I run a script with NCL I still receive the message
> ‘Segmentation fault’. I’m also compiling with the -g option, but don’t
> get any additional hints (see below). I found out that the segmentation
> fault occurs when the function wrf_contour() is called. Other things seem
> to work well: I can read NetCDF-4 files without problems and calculate
> CAPE and other diagnostics. I also managed to install NCL version 6.6.2
> from EPEL8 on RHEL8 (ARM64, AWS *Graviton2*) without any problems.
> However, exactly the same issue (segmentation fault) occurred when calling
> wrf_contour(). It seems to me that parts other than the fontcap
> compilation have similar problems. Maybe the problem is related to the
> AWS *Graviton2* CPU, but it is also little endian.
>
>
>
> *****
>
> > ncl wrf_mucape_cin.ncl
>
> Copyright (C) 1995-2019 - All Rights Reserved
> University Corporation for Atmospheric Research
> NCAR Command Language Version 6.6.2
> The use of this software is governed by a License Agreement.
> See http://www.ncl.ucar.edu/ for more details.
> (0)Working on time: 2021-08-31_00:00:00
> Segmentation fault
>
> *****
>
> > lscpu
>
> Architecture: aarch64
> Byte Order: Little Endian
> CPU(s): 2
> On-line CPU(s) list: 0,1
> Thread(s) per core: 1
> Core(s) per socket: 2
> Socket(s): 1
> NUMA node(s): 1
> Model: 1
> BogoMIPS: 243.75
> L1d cache: 64K
> L1i cache: 64K
> L2 cache: 1024K
> L3 cache: 32768K
> NUMA node0 CPU(s): 0,1
> Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp
> cpuid asimdrdm lrcpc dcpop asimddp ssbs
>
>
>
>
>
> On Fri, Sep 3, 2021 at 4:39 PM Dave Allured - NOAA Affiliate <
> dave.allured at noaa.gov> wrote:
>
> I think fontc is a standalone program that is used only during the NCL
> build process.  You may be able to sidestep the program issue completely
> by simply copying over the compiled fontcap files from a different build.
> Look at one of the X86 binary distributions, or a working install on any
> X86 system.  I suspect that the only compatibility issue is the endianness
> of 16- and 32-bit integers.  ARM64 and X86 should both be little endian,
> but I am not sure, because I lack ARM experience.
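The byte-order question can be checked directly on each machine. The following is a minimal C sketch, not part of NCARG: `host_is_little_endian` reports the host byte order (on the Graviton2 instance, `lscpu` already shows Little Endian), and the hypothetical helpers `read_le16` and `read_le32` decode 16- and 32-bit little-endian integers from a byte buffer regardless of host order. Nothing here assumes anything about the actual fontcap file layout.

```c
#include <stdint.h>
#include <string.h>

/* Return 1 if the host stores multi-byte integers least-significant byte
 * first (little endian), 0 otherwise.  Copying the bytes of a known
 * 32-bit value avoids relying on pointer type-punning. */
int host_is_little_endian(void)
{
    uint32_t probe = 0x01020304u;
    unsigned char bytes[sizeof probe];
    memcpy(bytes, &probe, sizeof probe);
    return bytes[0] == 0x04;
}

/* Decode a 16-bit little-endian value from a byte buffer, independent of
 * the host's own byte order. */
uint16_t read_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Decode a 32-bit little-endian value from a byte buffer. */
uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

If `host_is_little_endian()` returns 1 on both the X86 source system and the ARM64 target, byte order alone should not make copied fontcap files incompatible. That would be consistent with the copying workaround succeeding, as reported in the Sep 6 reply.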
>
>
>
>
>
> On Wed, Sep 1, 2021 at 11:37 AM Michael Graf via ncl-install <
> ncl-install at mailman.ucar.edu> wrote:
>
> Dear all,
>
> Thanks for adding me to the NCL mailing list.
>
> I am trying to compile the latest NCL version 6.6.2 from scratch on Amazon
> Linux 2 (ARM64 architecture). Everything works fine, except that a
> segmentation fault occurs when the fontcaps are compiled, that is, when
> the fontc binary is processing fontcaps (see output below). No other error
> occurs. The ncl binary is compiled and can be started without problems,
> but when I run a plotting script a segmentation fault occurs that is
> probably related to the compilation error in fontcap.
>
> I also compiled a minimal version with as few dependencies as possible (no
> GDAL, HDF5, NetCDF-4 and so on) to rule out that they cause the problem,
> but without any effect. I have also randomly tried different compiler
> options for the compilation in the fontcap folder, but the error always
> remains the same. I suspect that the compiler is causing the problem, but
> there is no alternative on Amazon Linux 2 so far. I'm using gfortran
> (version 7.3.1) and gcc (version 7.3.1), but only Fortran 77 code seems to
> be compiled here.
>
> It would be great if somebody has a hint on how to overcome this problem.
> Maybe there is another option, so that I don't have to build it from
> scratch. The installation with conda does not work on ARM64.
>
> Best, Michael
>
> ************************************************************************
> Making ./common/src/fontcap
> make[4]: Entering directory
> `/home/ec2-user/wrf/NCL/ncl_ncarg-6.6.2/common/src/fontcap'
>
>
<snip>

> gfortran -g -fbacktrace -Wall -fcheck=all      -o fontc cfaamn.o  cfrdln.o
> cfwrit.o  ffgttk.o  ffinfo.o  ffphol.o  ffppkt.o  ffprcf.o ffprsa.o
> fftbkd.o  fftkin.o  sffndc.o
>   sfgtin.o  sfgtkw.o  sfprcf.o  sfskbk.o sftbkd.o
> -L../../.././common/src/libncarg_c -lncarg_c -L/usr/local/ncarg/lib
> -L/usr/local/lib
> Processing fontcap font1
>
> Program received signal SIGSEGV: Segmentation fault - invalid memory
> reference.
>
> Backtrace for this error:
> #0  0x40001a29595b in ???
> #1  0x40001a29488f in ???
> #2  0x40001a26c667 in ???
> #3  0x40723c in ???
> #4  0x4072e7 in ???
> #5  0x405c97 in sfgtwk_
> at /home/ec2-user/wrf/NCL/ncl_ncarg-6.6.2/common/src/fontcap/sfgtkw.f:95
> #6  0x4061cb in sfprcf_
> at /home/ec2-user/wrf/NCL/ncl_ncarg-6.6.2/common/src/fontcap/sfprcf.f:108
> #7  0x40119b in cfaamn
> at /home/ec2-user/wrf/NCL/ncl_ncarg-6.6.2/common/src/fontcap/cfaamn.f:304
> #8  0x401633 in main
> at /home/ec2-user/wrf/NCL/ncl_ncarg-6.6.2/common/src/fontcap/cfaamn.f:358
> make: *** [font1] Segmentation fault
>
> ************************************************************************
> gfortran -g -fsanitize=address,undefined      -o fontc cfaamn.o  cfrdln.o
> cfwrit.o  ffgttk.o  ffinfo.o  ffphol.o  ffppkt.o  ffprcf.o ffprsa.o
> fftbkd.o  fftkin.o  sffndc.o
>  sfgtin.o  sfgtkw.o  sfprcf.o  sfskbk.o sftbkd.o
> -L../../.././common/src/libncarg_c -lncarg_c -L/usr/local/ncarg/lib
> -L/usr/local/lib
> Processing fontcap font1
> ASAN:DEADLYSIGNAL
> =================================================================
> ==2477==ERROR: AddressSanitizer: SEGV on unknown address 0x100005104df40
> (pc 0x00000040ffd0 bp 0xffffd104daf0 sp 0xffffd104daf0 T0)
> ==2477==The signal is caused by a READ memory access.
>     #0 0x40ffcf in gbyte_
> (/home/ec2-user/wrf/NCL/ncl_ncarg-6.6.2/common/src/fontcap/fontc+0x40ffcf)
>     #1 0x410057 in gbytes_
> (/home/ec2-user/wrf/NCL/ncl_ncarg-6.6.2/common/src/fontcap/fontc+0x410057)
>     #2 0x40deab in sfprcf_
> /home/ec2-user/wrf/NCL/ncl_ncarg-6.6.2/common/src/fontcap/sfprcf.f:117
>     #3 0x40213f in cfaamn
> /home/ec2-user/wrf/NCL/ncl_ncarg-6.6.2/common/src/fontcap/cfaamn.f:304
>     #4 0x402d7b in main
> /home/ec2-user/wrf/NCL/ncl_ncarg-6.6.2/common/src/fontcap/cfaamn.f:358
>     #5 0x40002cbc7ce3 in __libc_start_main (/lib64/libc.so.6+0x1fce3)
>     #6 0x4018a7
> (/home/ec2-user/wrf/NCL/ncl_ncarg-6.6.2/common/src/fontcap/fontc+0x4018a7)
>
> AddressSanitizer can not provide additional info.
> SUMMARY: AddressSanitizer: SEGV
> (/home/ec2-user/wrf/NCL/ncl_ncarg-6.6.2/common/src/fontcap/fontc+0x40ffcf)
> in gbyte_
> ==2477==ABORTING
> make: *** [font1] Error 1
>
> _______________________________________________
> ncl-install mailing list
> List instructions, subscriber options, unsubscribe:
> https://mailman.ucar.edu/mailman/listinfo/ncl-install
>
>