Porting build from GNU to Intel, runtime error: "Subscript #1 of the array DES has value 0 which is less than the lower bound of 1"
Short issue:
We are getting the runtime error:
forrtl: severe (408): fort: (3): Subscript #1 of the array DES has value 0 which is less than the lower bound of 1
In more detail:
We are using the latest version of SHiELD, i.e. SHiELD_BUILD_VERSION="FV3-202204-public", FV3_VERSION="FV3-202210-public", FMS_VERSION="2022.04". We're running on an Ubuntu 22.04 Linux AWS EC2 instance, and have built and run SHiELD successfully for many months using OpenMPI/gfortran.
We are now switching our build over from OpenMPI/gfortran (MKMF_TEMPLATE=linux-ubuntu-trusty-gnu.mk) to IntelMPI/ifort (MKMF_TEMPLATE="intel.mk"). We are using Intel version:
mpiifort for the Intel(R) MPI Library 2021.10 for Linux*
Copyright Intel Corporation.
ifort version 2021.10.0
Our build is based as closely as possible on this SHiELD_build repo. We're testing a 1-hour C96 simulation: with our original OpenMPI/gfortran build it completes successfully (~300 seconds on 24 cores). With IntelMPI/ifort the model builds successfully, but from the same experiment directory where the GNU build runs without error, the Intel build gives the following error at runtime:
---------------------------------------------
NOTE from PE 0: READING FROM SST_restart DISABLED
Before adi: W max = 1.573370 min = -1.371867
NOTE from PE 0: Performing adiabatic init 1 times
forrtl: severe (408): fort: (3): Subscript #1 of the array DES has value 0 which is less than the lower bound of 1
Image PC Routine Line Source
shield_nh.prod.32 00000000015BCBBE gfdl_mp_mod_mp_qs 7233 gfdl_mp.F90
shield_nh.prod.32 00000000015BD7D4 gfdl_mp_mod_mp_iq 7369 gfdl_mp.F90
shield_nh.prod.32 00000000015174C5 gfdl_mp_mod_mp_cl 4621 gfdl_mp.F90
shield_nh.prod.32 0000000001428F26 gfdl_mp_mod_mp_mp 1429 gfdl_mp.F90
shield_nh.prod.32 00000000015589F5 gfdl_mp_mod_mp_fa 5648 gfdl_mp.F90
shield_nh.prod.32 00000000018EB123 intermediate_phys 257 intermediate_phys.F90
libiomp5.so 000014B302363493 __kmp_invoke_micr Unknown Unknown
libiomp5.so 000014B3022D1CA4 __kmp_fork_call Unknown Unknown
libiomp5.so 000014B302289D23 __kmpc_fork_call Unknown Unknown
shield_nh.prod.32 00000000018C8D09 intermediate_phys 186 intermediate_phys.F90
shield_nh.prod.32 0000000000BE6BC0 fv_mapz_mod_mp_la 841 fv_mapz.F90
shield_nh.prod.32 00000000019FA0A1 fv_dynamics_mod_m 590 fv_dynamics.F90
shield_nh.prod.32 0000000002D31F23 atmosphere_mod_mp 1553 atmosphere.F90
shield_nh.prod.32 0000000002C61BFA atmosphere_mod_mp 431 atmosphere.F90
shield_nh.prod.32 0000000002280A56 atmos_model_mod_m 395 atmos_model.F90
shield_nh.prod.32 0000000000EDE999 coupler_main_IP_c 417 coupler_main.F90
shield_nh.prod.32 0000000000ED93FF MAIN__ 146 coupler_main.F90
shield_nh.prod.32 000000000041504D Unknown Unknown Unknown
libc.so.6 000014B301E29D90 Unknown Unknown Unknown
libc.so.6 000014B301E29E40 __libc_start_main Unknown Unknown
shield_nh.prod.32 0000000000414F65 Unknown Unknown Unknown
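The topmost frames are the saturation vapor pressure lookups in gfdl_mp.F90, which index precomputed tables (the DES array in the error message appears to be the derivative table used for interpolation). One way a subscript of 0 can arise there is a NaN or uninitialized temperature reaching the lookup, since converting a NaN to an integer index is undefined. Below is a minimal sketch of that failure mode; the indexing pattern and constants are assumed from the public GFDL microphysics source, not copied from our exact revision:
program lookup_sketch
  ! Illustrative sketch only -- not the actual gfdl_mp.F90 code. Shows how
  ! a qs-style table index can become 0 when the temperature is NaN
  ! (e.g. from reading uninitialized memory).
  implicit none
  real :: ta, ap1, zero
  integer :: it
  real, parameter :: tmin = 273.16 - 160.   ! assumed table lower bound

  ta = 250.                                 ! a sane temperature (K)
  ap1 = min (2621., 10. * dim (ta, tmin) + 1.)
  it = ap1                                  ! always >= 1 for finite ta
  print *, 'finite ta -> it =', it          ! des(it) stays in bounds

  zero = 0.
  ta = zero / zero                          ! manufacture a NaN at runtime
  ap1 = min (2621., 10. * dim (ta, tmin) + 1.)
  it = ap1                                  ! undefined for NaN; 0 is a common
  print *, 'NaN ta    -> it =', it          ! result, and des(0) then trips
end program lookup_sketch                   ! the bounds checker
If that is the mechanism, rebuilding the Intel debug executable with -init=snan,arrays and -fpe0 (standard ifort options) should move the failure to the first use of the offending value rather than the table lookup; gfortran and ifort leave uninitialized memory in different states in practice, which could explain why only the Intel build trips.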
Further down, the traceback points to intermediate_phys.F90, line 257: https://github.com/NOAA-GFDL/GFDL_atmos_cubed_sphere/blob/d2e5bef344b64d6a10524479b3288717239fb2a2/model/intermediate_phys.F90#L257
! fast saturation adjustment
call fast_sat_adj (abs (mdt), is, ie, kmp, km, hydrostatic, consv .gt. consv_min, &
adj_vmr (is:ie, kmp:km), te (is:ie, j, kmp:km), dte (is:ie), q (is:ie, j, kmp:km, sphum), &
q (is:ie, j, kmp:km, liq_wat), q (is:ie, j, kmp:km, rainwat), &
q (is:ie, j, kmp:km, ice_wat), q (is:ie, j, kmp:km, snowwat), &
q (is:ie, j, kmp:km, graupel), q (is:ie, j, kmp:km, cld_amt), &
q2 (is:ie, kmp:km), q3 (is:ie, kmp:km), hs (is:ie, j), &
dz (is:ie, kmp:km), pt (is:ie, j, kmp:km), delp (is:ie, j, kmp:km), &
#ifdef USE_COND
q_con (is:ie, j, kmp:km), &
#else
q_con (isd:, jsd, 1:), &
#endif
#ifdef MOIST_CAPPA
cappa (is:ie, j, kmp:km), &
#else
cappa (isd:, jsd, 1:), &
#endif
gsize, last_step, inline_mp%cond (is:ie, j), inline_mp%reevap (is:ie, j), &
inline_mp%dep (is:ie, j), inline_mp%sub (is:ie, j), do_sat_adj)
I checked our build logs, and we are compiling with both USE_COND and MOIST_CAPPA, which are activated by the 'nh' setting.
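On the chance that a non-finite temperature is what reaches the lookup, a throwaway check just before the fast_sat_adj call shown above would localize it (names reused from that snippet; this is a diagnostic sketch, not a proposed patch):
! Hypothetical diagnostic, placed immediately before the fast_sat_adj
! call; i and k are local loop indices, 100-400 K are loose sanity bounds.
do k = kmp, km
  do i = is, ie
    ! NaN test: x /= x is true only for NaN (use ieee_is_nan from
    ! ieee_arithmetic if -fp-model fast optimizes this away)
    if (pt (i, j, k) /= pt (i, j, k)) then
      print *, 'NaN pt at i,j,k =', i, j, k
    else if (pt (i, j, k) < 100. .or. pt (i, j, k) > 400.) then
      print *, 'suspect pt at i,j,k =', i, j, k, pt (i, j, k)
    end if
  end do
end do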
I noticed this is called from coupler_main.F90 (the MAIN__ frame at line 146 in the traceback): https://github.com/NOAA-GFDL/SHiELD_physics/blob/2882fdeb429abc2349a8e881803ac67b154532c3/simple_coupler/coupler_main.F90#L146C19-L146C19
call fms_init()
call mpp_init()
initClock = mpp_clock_id( 'Initialization' )
call mpp_clock_begin (initClock) !nesting problem
call fms_init
call constants_init
call fms_affinity_init
call sat_vapor_pres_init
call coupler_init
As an additional piece of information, we have also generated our own control/coupler file, and with it the Intel build does not hit this runtime error. In our version we comment out fms_init and fms_affinity_init, since fms_init is called twice above and fms_affinity_init was later removed in https://github.com/NOAA-GFDL/FMScoupler/blob/main/SHiELD/coupler_main.F90:
! call fms_init(mpi_comm_fv3)
if (dodebug) print *, "fv3_shield_cap:: calling constants_init..."
call constants_init
! if (dodebug) print *, "fv3_shield_cap:: calling fms_affinity_init..."
! call fms_affinity_init
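For what it's worth, the duplicated fms_init call by itself should be benign: FMS initialization routines conventionally guard against re-entry with a module flag, along the lines of the sketch below (a generic pattern, not the actual fms_init source).
! Generic initialize-once guard, illustrative only.
module init_guard_mod
  implicit none
  logical :: module_is_initialized = .false.
contains
  subroutine example_init ()
    if (module_is_initialized) return   ! second call is a no-op
    ! ... one-time setup would happen here ...
    module_is_initialized = .true.
  end subroutine example_init
end module init_guard_mod
So if our own coupler file really does avoid the error, the more interesting difference may be fms_affinity_init or some other divergence in the code path, rather than the double fms_init.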
I've tried building the IntelMPI/ifort executable both in a Docker container and via a bash script directly on the EC2 instance, and I've tried building in both 'prod' and 'debug' modes, but all give the same error above.
I've tried removing "export FMS_CPPDEFS=-DHAVE_GETTID" from the build options, but in that case the FMS make step fails.
I found a similar issue report in E3SM caused by an upgrade of the Intel compiler. In their case it was related to a compiler bug, but I'm not sure if that is true here: https://github.com/E3SM-Project/E3SM/issues/2051
Have you seen this error before, and do you have any idea what might be causing it? I recall getting a similar error in Dec 2022; I believe the FMS version was part of the problem then, and it was resolved by upgrading FMS. However, the FMS versions are the same between the two builds in this case.