Porting build from GNU to Intel, runtime error: "Subscript #1 of the array DES has value 0 which is less than the lower bound of 1"
Short issue:
We are getting the runtime error:
forrtl: severe (408): fort: (3): Subscript #1 of the array DES has value 0 which is less than the lower bound of 1
In more detail:
We are using the latest version of SHiELD, i.e. SHiELD_BUILD_VERSION="FV3-202204-public", FV3_VERSION="FV3-202210-public", FMS_VERSION="2022.04". We're running on an Ubuntu 22.04 Linux AWS EC2 instance, and have built and run SHiELD successfully for many months using OpenMPI/gfortran.
We are now switching our build over from OpenMPI/gfortran (MKMF_TEMPLATE=linux-ubuntu-trusty-gnu.mk) to IntelMPI/ifort (MKMF_TEMPLATE="intel.mk"). We are using Intel version:
mpiifort for the Intel(R) MPI Library 2021.10 for Linux*
Copyright Intel Corporation.
ifort version 2021.10.0
Our build is based as closely as possible on this SHiELD_build repo. We're testing a 1-hour C96 simulation: with our original OpenMPI/gfortran build it completes successfully (~300 seconds on 24 cores). With IntelMPI/ifort the model builds successfully, but from the same experiment directory where the GNU build runs without error, the Intel build gives the following error at runtime:
---------------------------------------------
NOTE from PE 0: READING FROM SST_restart DISABLED
Before adi: W max = 1.573370 min = -1.371867
NOTE from PE 0: Performing adiabatic init 1 times
forrtl: severe (408): fort: (3): Subscript #1 of the array DES has value 0 which is less than the lower bound of 1
Image PC Routine Line Source
shield_nh.prod.32 00000000015BCBBE gfdl_mp_mod_mp_qs 7233 gfdl_mp.F90
shield_nh.prod.32 00000000015BD7D4 gfdl_mp_mod_mp_iq 7369 gfdl_mp.F90
shield_nh.prod.32 00000000015174C5 gfdl_mp_mod_mp_cl 4621 gfdl_mp.F90
shield_nh.prod.32 0000000001428F26 gfdl_mp_mod_mp_mp 1429 gfdl_mp.F90
shield_nh.prod.32 00000000015589F5 gfdl_mp_mod_mp_fa 5648 gfdl_mp.F90
shield_nh.prod.32 00000000018EB123 intermediate_phys 257 intermediate_phys.F90
libiomp5.so 000014B302363493 __kmp_invoke_micr Unknown Unknown
libiomp5.so 000014B3022D1CA4 __kmp_fork_call Unknown Unknown
libiomp5.so 000014B302289D23 __kmpc_fork_call Unknown Unknown
shield_nh.prod.32 00000000018C8D09 intermediate_phys 186 intermediate_phys.F90
shield_nh.prod.32 0000000000BE6BC0 fv_mapz_mod_mp_la 841 fv_mapz.F90
shield_nh.prod.32 00000000019FA0A1 fv_dynamics_mod_m 590 fv_dynamics.F90
shield_nh.prod.32 0000000002D31F23 atmosphere_mod_mp 1553 atmosphere.F90
shield_nh.prod.32 0000000002C61BFA atmosphere_mod_mp 431 atmosphere.F90
shield_nh.prod.32 0000000002280A56 atmos_model_mod_m 395 atmos_model.F90
shield_nh.prod.32 0000000000EDE999 coupler_main_IP_c 417 coupler_main.F90
shield_nh.prod.32 0000000000ED93FF MAIN__ 146 coupler_main.F90
shield_nh.prod.32 000000000041504D Unknown Unknown Unknown
libc.so.6 000014B301E29D90 Unknown Unknown Unknown
libc.so.6 000014B301E29E40 __libc_start_main Unknown Unknown
shield_nh.prod.32 0000000000414F65 Unknown Unknown Unknown
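The topmost frames are the saturation vapor pressure lookups in gfdl_mp.F90, which index precomputed tables (the DES array in the error message appears to be the derivative table used for interpolation). One way a subscript of 0 can arise there is a NaN or uninitialized temperature reaching the lookup, since converting a NaN to an integer index is undefined. Below is a minimal sketch of that failure mode; the indexing pattern and constants are assumed from the public GFDL microphysics source, not copied from our exact revision:
program lookup_sketch
  ! Illustrative sketch only -- not the actual gfdl_mp.F90 code. Shows how
  ! a qs-style table index can become 0 when the temperature is NaN
  ! (e.g. from reading uninitialized memory).
  implicit none
  real :: ta, ap1, zero
  integer :: it
  real, parameter :: tmin = 273.16 - 160.   ! assumed table lower bound

  ta = 250.                                 ! a sane temperature (K)
  ap1 = min (2621., 10. * dim (ta, tmin) + 1.)
  it = ap1                                  ! always >= 1 for finite ta
  print *, 'finite ta -> it =', it          ! des(it) stays in bounds

  zero = 0.
  ta = zero / zero                          ! manufacture a NaN at runtime
  ap1 = min (2621., 10. * dim (ta, tmin) + 1.)
  it = ap1                                  ! undefined for NaN; 0 is a common
  print *, 'NaN ta    -> it =', it          ! result, and des(0) then trips
end program lookup_sketch                   ! the bounds checker
If that is the mechanism, rebuilding the Intel debug executable with -init=snan,arrays and -fpe0 (standard ifort options) should move the failure to the first use of the offending value rather than the table lookup; gfortran and ifort leave uninitialized memory in different states in practice, which could explain why only the Intel build trips.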
Further down, the traceback points to intermediate_phys.F90, line 257: https://github.com/NOAA-GFDL/GFDL_atmos_cubed_sphere/blob/d2e5bef344b64d6a10524479b3288717239fb2a2/model/intermediate_phys.F90#L257
! fast saturation adjustment
call fast_sat_adj (abs (mdt), is, ie, kmp, km, hydrostatic, consv .gt. consv_min, &
adj_vmr (is:ie, kmp:km), te (is:ie, j, kmp:km), dte (is:ie), q (is:ie, j, kmp:km, sphum), &
q (is:ie, j, kmp:km, liq_wat), q (is:ie, j, kmp:km, rainwat), &
q (is:ie, j, kmp:km, ice_wat), q (is:ie, j, kmp:km, snowwat), &
q (is:ie, j, kmp:km, graupel), q (is:ie, j, kmp:km, cld_amt), &
q2 (is:ie, kmp:km), q3 (is:ie, kmp:km), hs (is:ie, j), &
dz (is:ie, kmp:km), pt (is:ie, j, kmp:km), delp (is:ie, j, kmp:km), &
#ifdef USE_COND
q_con (is:ie, j, kmp:km), &
#else
q_con (isd:, jsd, 1:), &
#endif
#ifdef MOIST_CAPPA
cappa (is:ie, j, kmp:km), &
#else
cappa (isd:, jsd, 1:), &
#endif
gsize, last_step, inline_mp%cond (is:ie, j), inline_mp%reevap (is:ie, j), &
inline_mp%dep (is:ie, j), inline_mp%sub (is:ie, j), do_sat_adj)
I checked our build logs, and we are compiling with both USE_COND and MOIST_CAPPA, which are activated by the 'nh' setting.
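On the chance that a non-finite temperature is what reaches the lookup, a throwaway check just before the fast_sat_adj call shown above would localize it (names reused from that snippet; this is a diagnostic sketch, not a proposed patch):
! Hypothetical diagnostic, placed immediately before the fast_sat_adj
! call; i and k are local loop indices, 100-400 K are loose sanity bounds.
do k = kmp, km
  do i = is, ie
    ! NaN test: x /= x is true only for NaN (use ieee_is_nan from
    ! ieee_arithmetic if -fp-model fast optimizes this away)
    if (pt (i, j, k) /= pt (i, j, k)) then
      print *, 'NaN pt at i,j,k =', i, j, k
    else if (pt (i, j, k) < 100. .or. pt (i, j, k) > 400.) then
      print *, 'suspect pt at i,j,k =', i, j, k, pt (i, j, k)
    end if
  end do
end do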
I noticed this is called from coupler_main.F90 (the MAIN__ frame at line 146 in the traceback): https://github.com/NOAA-GFDL/SHiELD_physics/blob/2882fdeb429abc2349a8e881803ac67b154532c3/simple_coupler/coupler_main.F90#L146C19-L146C19
call fms_init()
call mpp_init()
initClock = mpp_clock_id( 'Initialization' )
call mpp_clock_begin (initClock) !nesting problem
call fms_init
call constants_init
call fms_affinity_init
call sat_vapor_pres_init
call coupler_init
As an additional piece of information, we have also generated our own control/coupler file, and with it the Intel build does not hit this runtime error. In our version we comment out fms_init and fms_affinity_init, since fms_init is called twice above and fms_affinity_init was later removed in https://github.com/NOAA-GFDL/FMScoupler/blob/main/SHiELD/coupler_main.F90:
! call fms_init(mpi_comm_fv3)
if (dodebug) print *, "fv3_shield_cap:: calling constants_init..."
call constants_init
! if (dodebug) print *, "fv3_shield_cap:: calling fms_affinity_init..."
! call fms_affinity_init
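For what it's worth, the duplicated fms_init call by itself should be benign: FMS initialization routines conventionally guard against re-entry with a module flag, along the lines of the sketch below (a generic pattern, not the actual fms_init source).
! Generic initialize-once guard, illustrative only.
module init_guard_mod
  implicit none
  logical :: module_is_initialized = .false.
contains
  subroutine example_init ()
    if (module_is_initialized) return   ! second call is a no-op
    ! ... one-time setup would happen here ...
    module_is_initialized = .true.
  end subroutine example_init
end module init_guard_mod
So if our own coupler file really does avoid the error, the more interesting difference may be fms_affinity_init or some other divergence in the code path, rather than the double fms_init.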
I've tried building the IntelMPI/ifort executable both in a Docker container and via a bash script directly on the EC2 instance, and I've tried building in both 'prod' and 'debug' modes, but all give the same error above.
I've tried removing "export FMS_CPPDEFS=-DHAVE_GETTID" from the build options, but in that case the FMS make step fails.
I found a similar issue report in E3SM caused by an upgrade of the Intel compiler. In their case it was related to a compiler bug, but I'm not sure if that is true here: https://github.com/E3SM-Project/E3SM/issues/2051
Have you seen this error before, and do you have any idea what might be causing it? I recall getting a similar error in Dec 2022; I believe the FMS version was part of the problem then, and it was resolved by upgrading FMS. However, the FMS versions are the same between the two builds in this case.